NGINX Health Check
- Checks the pool members registered in an upstream block
Passive Health Check
- Available in both NGINX Plus and NGINX OSS
- Monitors servers as transactions occur and retries failed connections on another server
- No separate log entry is written while an upstream server is healthy
- Official Docs : https://nginx.org/en/docs/http/ngx_http_upstream_module.html#server
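What counts as a failed attempt, and when NGINX retries the request on the next upstream server, is controlled by the proxy_next_upstream family of directives. A minimal sketch (the condition set and limits below are illustrative, not part of the original config):

    location / {
        proxy_pass http://test_pool;
        ## Conditions treated as a failed attempt, triggering a retry
        ## on the next server in the upstream (illustrative set)
        proxy_next_upstream error timeout http_502 http_503;
        proxy_next_upstream_tries 3;       ## cap retries per request
        proxy_next_upstream_timeout 10s;   ## overall time budget for retries
    }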
Upstream Configuration
Configured per server inside the upstream block
Sample Config - 1
upstream test_pool {
    zone test_pool 128k;
    server 10.250.11.101:80 max_fails=1 fail_timeout=10;   ## Default Parameters
    server 10.250.11.102:80;
    server 10.250.11.201:80;
    server 10.250.11.202:80;
}

server {
    listen 80 default_server;
    server_name localhost;

    access_log /var/log/nginx/host.access.log main;

    location / {
        proxy_pass http://test_pool;
    }
}
Various options can be set per upstream server
Options
max_fails=number sets the number of unsuccessful attempts to communicate with the server that should happen in the duration set by the fail_timeout parameter to consider the server unavailable for a duration also set by the fail_timeout parameter. By default, the number of unsuccessful attempts is set to 1. The zero value disables the accounting of attempts.
fail_timeout=time the time during which the specified number of unsuccessful attempts to communicate with the server should happen to consider the server unavailable; and the period of time the server will be considered unavailable. By default, the parameter is set to 10 seconds.
slow_start=time (NGINX Plus only) sets the time during which the server will recover its weight from zero to a nominal value when an unhealthy server becomes healthy, or when the server becomes available after a period of time it was considered unavailable. Default value is zero, i.e. slow start is disabled.
The parameter cannot be used along with the hash, ip_hash, and random load balancing methods.
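A sketch combining these server parameters (the values are illustrative; slow_start requires NGINX Plus and, per the note above, a compatible load-balancing method):

    upstream test_pool {
        zone test_pool 128k;
        ## Mark the server unavailable after 3 failures within 30s and
        ## keep it out of rotation for 30s; once it recovers, NGINX Plus
        ## ramps its weight back up over 30s instead of sending full load
        server 10.250.11.101:80 max_fails=3 fail_timeout=30s slow_start=30s;
        server 10.250.11.102:80;
    }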
Passive Health Check - Logging
Logged to the nginx error.log / check the configured logging level
Once a failed upstream server becomes usable again, traffic is sent to it without any separate log entry
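The "upstream server temporarily disabled" messages in the cases below are emitted at the warn level, so error_log must be configured at warn or a more verbose level to capture them (the path shown is illustrative):

    ## warn (or a more verbose level) is needed to see the
    ## "upstream server temporarily disabled" messages
    error_log /var/log/nginx/error.log warn;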
Case 1. One Upstream Server Fails
- Access attempts against the failed server still occur and are logged as below
2023/04/05 17:00:57 [error] 5855#5855: *2189 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
2023/04/05 17:00:57 [warn] 5855#5855: *2189 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
2023/04/05 17:01:19 [error] 5855#5855: *2215 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "HEAD / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
2023/04/05 17:01:19 [warn] 5855#5855: *2215 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "HEAD / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
2023/04/05 17:01:34 [error] 5855#5855: *2253 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "HEAD / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
2023/04/05 17:01:34 [warn] 5855#5855: *2253 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "HEAD / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
Case 2. All Upstream Servers Fail
2023/04/05 17:18:46 [error] 6065#6065: *12475 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.102:80/", host: "0"
2023/04/05 17:18:46 [warn] 6065#6065: *12475 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.102:80/", host: "0"
2023/04/05 17:18:46 [error] 6065#6065: *12475 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
2023/04/05 17:18:46 [warn] 6065#6065: *12475 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.101:80/", host: "0"
2023/04/05 17:18:46 [error] 6065#6065: *12475 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.202:80/", host: "0"
2023/04/05 17:18:46 [warn] 6065#6065: *12475 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.202:80/", host: "0"
2023/04/05 17:18:46 [error] 6065#6065: *12475 connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.201:80/", host: "0"
2023/04/05 17:18:46 [warn] 6065#6065: *12475 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://10.250.11.201:80/", host: "0"
Case 3. Upstream Restore - no separate log entry is written
Active Health Check
- NGINX Plus Only
- Official Docs : https://nginx.org/en/docs/http/ngx_http_upstream_hc_module.html#health_check
Upstream Configuration
- Requires a zone ( state is stored in shared memory )
Location Configuration
health_check;
- Defaults : interval=5, fails=1, passes=1, uri=/, port = the upstream server's port
- Requests the / path and checks the response code ( 2xx, 3xx : healthy / 4xx, 5xx : fail )
Sample Config - 1
upstream test_pool {
    zone test_pool 128k;
    server 10.250.11.101:80;
    server 10.250.11.102:80;
    server 10.250.11.201:80;
    server 10.250.11.202:80;
}

server {
    listen 80 default_server;
    server_name localhost;

    access_log /var/log/nginx/host.access.log main;

    location / {
        proxy_pass http://test_pool;
        health_check;
    }
}
health_check Options
Various conditions can be configured
interval=time sets the interval between two consecutive health checks, by default, 5 seconds.
jitter=time sets the time within which each health check will be randomly delayed, by default, there is no delay.
fails=number sets the number of consecutive failed health checks of a particular server after which this server will be considered unhealthy, by default, 1.
passes=number sets the number of consecutive passed health checks of a particular server after which the server will be considered healthy, by default, 1.
uri=uri defines the URI used in health check requests, by default, “/”.
mandatory [persistent] sets the initial “checking” state for a server until the first health check is completed (1.11.7). Client requests are not passed to servers in the “checking” state. If the parameter is not specified, the server will be initially considered healthy. The persistent parameter (1.19.7) sets the initial “up” state for a server after reload if the server was considered healthy before reload.
match=name specifies the match block configuring the tests that a response should pass in order for a health check to pass. By default, the response should have status code 2xx or 3xx.
port=number defines the port used when connecting to a server to perform a health check (1.9.7). By default, equals the server port.
type=grpc [grpc_service=name] [grpc_status=code] enables periodic health checks of a gRPC server or a particular gRPC service specified with the optional grpc_service parameter (1.19.5). If the server does not support the gRPC Health Checking Protocol, the optional grpc_status parameter can be used to specify a non-zero gRPC status (for example, status code “12” / “UNIMPLEMENTED”) that will be treated as healthy:
health_check mandatory type=grpc grpc_status=12;
The type=grpc parameter must be specified after all other directive parameters; grpc_service and grpc_status must follow type=grpc. The parameter is not compatible with the uri or match parameters.
keepalive_time=time enables keepalive connections for health checks and specifies the time during which requests can be processed through one keepalive connection (1.21.7). By default, keepalive connections are disabled.
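Sample Config - 2 below exercises the timing parameters; the mandatory, persistent, and jitter parameters described above can be combined as in this sketch (the values are illustrative):

    location / {
        proxy_pass http://test_pool;
        ## mandatory: new servers start in the "checking" state and get
        ## no traffic until their first check passes; persistent keeps a
        ## healthy server "up" across reloads; jitter delays each check
        ## by up to 2s so checks do not fire in bursts
        health_check mandatory persistent interval=5 jitter=2;
    }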
Sample Config - 2
upstream test_pool {
    zone test_pool 128k;
    server 10.250.11.101:80;
    server 10.250.11.102:80;
    server 10.250.11.201:80;
    server 10.250.11.202:80;
}

server {
    listen 80 default_server;
    server_name localhost;

    access_log /var/log/nginx/host.access.log main;

    location / {
        proxy_pass http://test_pool;
        health_check interval=5 fails=2 passes=2 uri=/ port=80 keepalive_time=10;
    }
}
Creating health_check Conditions
Uses a match block ( based on the response code, headers, or body )
Details
status
status 200; - status is 200
status ! 500; - status is not 500
status 200 204; - status is 200 or 204
status ! 301 302; - status is neither 301 nor 302
status 200-399; - status is in the range from 200 to 399
status ! 400-599; - status is not in the range from 400 to 599
status 301-303 307; - status is either 301, 302, 303, or 307
header
header Content-Type = text/html; - header contains “Content-Type” with value text/html
header Content-Type != text/html; - header contains “Content-Type” with value other than text/html
header Connection ~ close; - header contains “Connection” with value matching regular expression close
header Connection !~ close; - header contains “Connection” with value not matching regular expression close
header Host; - header contains “Host”
header ! X-Accel-Redirect; - header lacks “X-Accel-Redirect”
body
body ~ "Welcome to nginx!"; - body matches regular expression “Welcome to nginx!”
body !~ "Welcome to nginx!"; - body does not match regular expression “Welcome to nginx!”
require
require $variable ...; - all specified variables are not empty and not equal to “0” (1.15.9)
Sample Config - 3
upstream test_pool {
    zone test_pool 128k;
    server 10.250.11.101:80;
    server 10.250.11.102:80;
    server 10.250.11.201:80;
    server 10.250.11.202:80;
}

match openbase {
    status 200;
    header Content-Type = text/html;
    body ~ "Welcome";
}

server {
    listen 80 default_server;
    server_name localhost;

    access_log /var/log/nginx/host.access.log main;

    location / {
        proxy_pass http://test_pool;
        health_check match=openbase;
    }
}
Active Health Check - Logging
Logged to the nginx error.log / check the configured logging level
Case 1. Upstream Fail - logs are generated continuously
2023/04/05 16:43:10 [error] 5774#5774: connect() failed (111: Connection refused) while connecting to upstream, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
2023/04/05 16:43:10 [warn] 5774#5774: peer is unhealthy while connecting to upstream, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
2023/04/05 16:43:15 [error] 5774#5774: connect() failed (111: Connection refused) while connecting to upstream, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
2023/04/05 16:43:20 [error] 5774#5774: connect() failed (111: Connection refused) while connecting to upstream, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
2023/04/05 16:43:25 [error] 5774#5774: connect() failed (111: Connection refused) while connecting to upstream, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"
Case 2. All Upstream Servers Fail
2023/04/05 16:38:07 [error] 5761#5761: *1014 no live upstreams while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://test_pool/", host: "0"
2023/04/05 16:38:09 [error] 5761#5761: *1015 no live upstreams while connecting to upstream, client: 127.0.0.1, server: localhost, request: "GET / HTTP/1.1", upstream: "http://test_pool/", host: "0"
Case 3. Upstream Restore
2023/04/05 16:43:55 [notice] 5774#5774: peer is healthy while checking status code, health check "openbase" of peer 10.250.11.201:80 in upstream "test_pool"