NGINX Health Check

  • Checks the pool members registered in an upstream block

Passive Health Check

Upstream Configuration

  • Configured per server within the upstream block

    • Sample Config - 1

      upstream test_pool {
          zone test_pool 128k;
          server 10.250.11.101:80 max_fails=1 fail_timeout=10;  ## default parameters, shown explicitly
          server 10.250.11.102:80;
          server 10.250.11.201:80;
          server 10.250.11.202:80;
      }
      server {
          listen       80 default_server;
          server_name  localhost;
      
          access_log  /var/log/nginx/host.access.log  main;
      
          location / {
          proxy_pass http://test_pool;
          }
      }
      
  • Various options can be configured for the upstream servers

    • Options

      max_fails=number sets the number of unsuccessful attempts to communicate with the server that should happen in the duration set by the fail_timeout parameter to consider the server unavailable for a duration also set by the fail_timeout parameter. By default, the number of unsuccessful attempts is set to 1. The zero value disables the accounting of attempts.


      fail_timeout=time the time during which the specified number of unsuccessful attempts to communicate with the server should happen to consider the server unavailable; and the period of time the server will be considered unavailable. By default, the parameter is set to 10 seconds.


      slow_start=time (NGINX Plus only) sets the time during which the server will recover its weight from zero to a nominal value, when an unhealthy server becomes healthy, or when the server becomes available after a period of time it was considered unavailable. Default value is zero, i.e. slow start is disabled.


      The slow_start parameter cannot be used along with the hash, ip_hash, and random load balancing methods.
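
  • Sample Config - tuned parameters

    An illustrative sketch with loosened thresholds (the values are examples, not recommendations): each server is marked unavailable only after 3 failed attempts within 30 seconds, and then stays out of rotation for 30 seconds.

    upstream test_pool {
            zone test_pool 128k;
            server 10.250.11.101:80 max_fails=3 fail_timeout=30s;  ## non-default thresholds
            server 10.250.11.102:80 max_fails=3 fail_timeout=30s;
            server 10.250.11.201:80;  ## defaults: max_fails=1 fail_timeout=10s
            server 10.250.11.202:80;
    }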

Passive Health Check - Logging

  • Written to the nginx error.log; check the configured logging level

  • When a failed upstream server becomes usable again, traffic is sent to it without any separate log entry

  • Case 1. One upstream server fails

    • Access attempts against the failed server continue to be logged

    2023/04/05 17:00:57 [error] 5855#5855: *2189 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
    2023/04/05 17:00:57 [warn] 5855#5855: *2189 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
    2023/04/05 17:01:19 [error] 5855#5855: *2215 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "HEAD / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
    2023/04/05 17:01:19 [warn] 5855#5855: *2215 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "HEAD / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
    2023/04/05 17:01:34 [error] 5855#5855: *2253 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "HEAD / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
    2023/04/05 17:01:34 [warn] 5855#5855: *2253 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "HEAD / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
    
  • Case 2. All upstream servers fail

    2023/04/05 17:18:46 [error] 6065#6065: *12475 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.102:80/", host: "0"
    2023/04/05 17:18:46 [warn] 6065#6065: *12475 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.102:80/", host: "0"
    2023/04/05 17:18:46 [error] 6065#6065: *12475 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
    2023/04/05 17:18:46 [warn] 6065#6065: *12475 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
    2023/04/05 17:18:46 [error] 6065#6065: *12475 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.202:80/", host: "0"
    2023/04/05 17:18:46 [warn] 6065#6065: *12475 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.202:80/", host: "0"
    2023/04/05 17:18:46 [error] 6065#6065: *12475 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.201:80/", host: "0"
    2023/04/05 17:18:46 [warn] 6065#6065: *12475 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.201:80/", host: "0"
    
  • Case 3. Upstream Restore - no separate log entry
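
  • The "upstream server temporarily disabled" messages above are emitted at the warn level, so the error_log directive must be set to warn or a lower severity for them to appear, e.g.:

    error_log /var/log/nginx/error.log warn;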


Active Health Check

Upstream Configuration

  • Requires a zone directive (server state is stored in shared memory)

Location Configuration

  • health_check;

    • Defaults: interval=5, fails=1, passes=1, uri=/, port = the upstream server's port
    • Requests the / URI and checks the response code (2xx, 3xx: healthy / 4xx, 5xx: fail)
  • Sample Config - 1

    upstream test_pool {
            zone test_pool 128k;
            server 10.250.11.101:80;
            server 10.250.11.102:80;
            server 10.250.11.201:80;
            server 10.250.11.202:80;
    }
    server {
        listen       80 default_server;
        server_name  localhost;
        access_log  /var/log/nginx/host.access.log  main;
        location / {
            proxy_pass http://test_pool;
            health_check;
        }
    }
    
  • health_check Options

    • Various conditions can be configured

      interval=time sets the interval between two consecutive health checks, by default, 5 seconds.


      jitter=time sets the time within which each health check will be randomly delayed, by default, there is no delay. 

      fails=number sets the number of consecutive failed health checks of a particular server after which this server will be considered unhealthy, by default, 1.


      passes=number sets the number of consecutive passed health checks of a particular server after which the server will be considered healthy, by default, 1. 

      uri=uri defines the URI used in health check requests, by default, “/”. 

      mandatory [persistent] sets the initial “checking” state for a server until the first health check is completed (1.11.7). Client requests are not passed to servers in the “checking” state. If the parameter is not specified, the server will be initially considered healthy. The persistent parameter (1.19.7) sets the initial “up” state for a server after reload if the server was considered healthy before reload.


      match=name specifies the match block configuring the tests that a response should pass in order for a health check to pass. By default, the response should have status code 2xx or 3xx.


      port=number defines the port used when connecting to a server to perform a health check (1.9.7). By default, equals the server port.


      type=grpc [grpc_service=name] [grpc_status=code] enables periodic health checks of a gRPC server or a particular gRPC service specified with the optional grpc_service parameter (1.19.5). If the server does not support the gRPC Health Checking Protocol, the optional grpc_status parameter can be used to specify non-zero gRPC status (for example, status code “12” / “UNIMPLEMENTED”) that will be treated as healthy: health_check mandatory type=grpc grpc_status=12; The type=grpc parameter must be specified after all other directive parameters, grpc_service and grpc_status must follow type=grpc. The parameter is not compatible with uri or match parameters.

      keepalive_time=time enables keepalive connections for health checks and specifies the time during which requests can be processed through one keepalive connection (1.21.7). By default, keepalive connections are disabled.
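
      As an illustrative sketch (the parameter values here are arbitrary), the mandatory and jitter parameters can be combined so that a new server receives no traffic until its first check completes and checks are spread out in time:

      location / {
          proxy_pass http://test_pool;
          health_check interval=10 jitter=3 mandatory;
      }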

  • Sample Config - 2

    upstream test_pool {
            zone test_pool 128k;
            server 10.250.11.101:80;
            server 10.250.11.102:80;
            server 10.250.11.201:80;
            server 10.250.11.202:80;
    }
    server {
        listen       80 default_server;
        server_name  localhost;
        access_log  /var/log/nginx/host.access.log  main;
        location / {
            proxy_pass http://test_pool;
            health_check interval=5 fails=2 passes=2 uri=/ port=80 keepalive_time=10;
        }
    }
    
  • Creating health_check conditions

    • Uses a match block (tests the response code, headers, or body)

    • Details

      status

      status 200; - status is 200 

      status ! 500; - status is not 500 

      status 200 204; - status is 200 or 204

      status ! 301 302; - status is neither 301 nor 302 

      status 200-399; - status is in the range from 200 to 399 

      status ! 400-599; - status is not in the range from 400 to 599 

      status 301-303 307; - status is either 301, 302, 303, or 307


      header

      header Content-Type = text/html; - header contains “Content-Type” with value text/html

      header Content-Type != text/html; - header contains “Content-Type” with value other than text/html

      header Connection ~ close; - header contains “Connection” with value matching regular expression close

      header Connection !~ close; - header contains “Connection” with value not matching regular expression close 

      header Host; - header contains “Host”

      header ! X-Accel-Redirect; - header lacks “X-Accel-Redirect”


      body

      body ~ "Welcome to nginx!"; - body matches regular expression “Welcome to nginx!”

      body !~ "Welcome to nginx!"; - body does not match regular expression “Welcome to nginx!”

      require

      require $variable ...; - all specified variables are not empty and not equal to “0” (1.15.9)



  • Sample Config - 3

    upstream test_pool {
            zone test_pool 128k;
            server 10.250.11.101:80;
            server 10.250.11.102:80;
            server 10.250.11.201:80;
            server 10.250.11.202:80;
    
    }
    match openbase {
            status 200;
            header Content-Type = text/html;
            body ~ "Welcome";
    }
    
    server {
        listen       80 default_server;
        server_name  localhost;
    
        access_log  /var/log/nginx/host.access.log  main;
    
        location / {
            proxy_pass http://test_pool;
            health_check match=openbase;
        }
    }
    

Active Health Check - Logging

  • Written to the nginx error.log; check the configured logging level

  • Case 1. Upstream Fail - logs are generated continuously

    2023/04/05 16:43:10 [error] 5774#5774: connect() failed (111: Connection refused) while connecting to upstream, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
    2023/04/05 16:43:10 [warn] 5774#5774: peer is unhealthy while connecting to upstream, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
    2023/04/05 16:43:15 [error] 5774#5774: connect() failed (111: Connection refused) while connecting to upstream, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
    2023/04/05 16:43:20 [error] 5774#5774: connect() failed (111: Connection refused) while connecting to upstream, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
    2023/04/05 16:43:25 [error] 5774#5774: connect() failed (111: Connection refused) while connecting to upstream, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
    
  • Case 2. All Upstream Fail

    2023/04/05 16:38:07 [error] 5761#5761: *1014 no live upstreams while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://test_pool/", host: "0"
    2023/04/05 16:38:09 [error] 5761#5761: *1015 no live upstreams while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://test_pool/", host: "0"
    
  • Case 3. Upstream Restore

    2023/04/05 16:43:55 [notice] 5774#5774: peer is healthy while checking status code, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
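
Upstream Status via API ( NGINX Plus )

  • Besides the error log, NGINX Plus can expose the live state of each upstream server through its REST API ( ngx_http_api_module ). A minimal sketch; the listen port is arbitrary, and the API version number in the example path depends on the NGINX Plus release:

    server {
        listen 8080;
        location /api {
            api;   ## read-only by default (write=off)
        }
    }

    ## Example query:
    ## curl http://127.0.0.1:8080/api/9/http/upstreams/test_pool/servers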