Symptom
The application ran into the following error:
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Analysis
Connection timed out
One of the log entries recorded on the database side:
could not receive data from client: Connection timed out
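The database log contains no entry matching the client-side message, so the connection was most likely dropped and the client reported the error itself. The next step was to reproduce the error.
My guess was that it is related to a dropped connection, which is easy to simulate: on the server, terminate the backend process of the connection with kill -15, and the client gets a similar error on its next operation.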
[postgres@work ~]$ psql -h 192.168.149.131 -U antdb
Password for user antdb:
psql (11.5, server 11.6)
Type "help" for help.
192.168.149.131:5432 antdb@test=# begin;
BEGIN
192.168.149.131:5432 antdb@test=*# commit;
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
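A real application never terminates connections deliberately like this, so the analysis continued. This brought to mind the database's mechanism for handling idle connections, which is controlled by the following parameters: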
tcp_keepalives_count | Maximum number of TCP keepalive retransmits.
tcp_keepalives_idle | Time between issuing TCP keepalives.
tcp_keepalives_interval | Time between TCP keepalive retransmits.
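These are similar to the operating system's keepalive controls for TCP sockets.
On the server, check the network state of the database connection with ss or netstat: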
[antdb@node1 ~]$ ss -apte|grep 105408
ESTAB 0 0 192.168.149.131:postgres 192.168.149.100:44338
users:(("postgres",pid=105408,fd=11)) timer:(keepalive,19sec,0) uid:1004 ino:118045263
sk:ffff92e9416a4d80 <->
[antdb@node1 ~]$ netstat -antpo|grep 105408
tcp 0 0 192.168.149.131:5432 192.168.149.100:44338 ESTABLISHED 105408/postgres: an keepalive (10.05/0/0)
Both outputs show the keepalive timer of the connection.
Check the TCP-related parameters in the database:
[local]:5432 postgres@test=# select name,setting,reset_val from pg_settings where name ~ 'tcp';
name | setting | reset_val
-------------------------+---------+-----------
tcp_keepalives_count | 0 | 3
tcp_keepalives_idle | 0 | 60
tcp_keepalives_interval | 0 | 10
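The setting column shows 0, and the values actually in effect appear to be those in the reset_val column; I did not look into this in detail. (PostgreSQL documents that these parameters always read as zero in sessions connected over a Unix-domain socket, which is the case for this [local] session.) Taken together, the three parameters mean that a timer runs for every connection: once a connection has seen no network traffic for tcp_keepalives_idle seconds, a keepalive probe is sent from the database server to the client. The probes can be captured with tcpdump: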
16:57:08.366531 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294551424 ecr 123129537], length 0
16:57:08.366555 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123189623 ecr 294491338], length 0
16:58:08.398387 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294611456 ecr 123189623], length 0
16:58:08.398409 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123249654 ecr 294491338], length 0
16:59:08.558872 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294671616 ecr 123249654], length 0
16:59:08.558907 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123309815 ecr 294491338], length 0
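As long as the client answers the probe, the database does not treat the connection as abandoned and does not close it.
So when does the database close a connection on its own? According to the keepalive mechanism, the connection is closed once the number of probes sent by the database reaches the configured count without any response from the client. How can an unresponsive client be simulated? I tried kill with various signals, calling sleep in the client code, and closing the TCP socket from gdb, but none of these reproduced it. In the end I disabled the network interface of the client machine (a virtual machine), which simulates a communication failure caused by a network outage.
The command output shows that after 3 attempts the database closed the connection: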
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (20.44/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (19.29/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (18.80/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (2.86/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (1.10/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (0.29/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (9.45/0/1)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (6.29/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (4.47/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (0.30/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (8.72/0/3)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 ESTABLISHED 63841/postgres: ant keepalive (3.19/0/3)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
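The counter that climbs from 0 to 3 in the output above can be extracted programmatically. The sketch below assumes the usual reading of the netstat -o timer column, namely (time remaining / retransmissions / keepalive probes sent); that interpretation and the helper name parse_keepalive_timer are assumptions made for illustration, not something stated in the original text.

```python
import re
from typing import NamedTuple, Optional


class KeepaliveTimer(NamedTuple):
    seconds_left: float  # time until the keepalive timer fires next
    retransmits: int     # second field of the timer tuple
    probes_sent: int     # keepalive probes sent so far without an answer


# Matches e.g. "keepalive (9.45/0/1)" at the end of a netstat -o line.
_TIMER_RE = re.compile(r"keepalive \((\d+(?:\.\d+)?)/(\d+)/(\d+)\)")


def parse_keepalive_timer(line: str) -> Optional[KeepaliveTimer]:
    """Return the keepalive timer fields from one netstat -o line, or None."""
    match = _TIMER_RE.search(line)
    if match is None:
        return None
    return KeepaliveTimer(
        float(match.group(1)), int(match.group(2)), int(match.group(3))
    )


if __name__ == "__main__":
    sample = (
        "tcp 0 0 192.168.149.131:5432 192.168.149.100:47752 "
        "ESTABLISHED 63841/postgres: ant keepalive (9.45/0/1)"
    )
    print(parse_keepalive_timer(sample))
```

Read this way, the last sample with output, keepalive (3.19/0/3), means that three probes had gone unanswered; the next netstat call no longer finds the connection.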
The disconnection is visible in the database log:
2020-07-30 17:55:32.955 CST,,,63841,"192.168.149.100:47752",5f229914.f961,1,"",2020-07-30 17:55:32 CST,,0,LOG,00000,"connection received: host=192.168.149.100 port=47752",,,,,,,,,""
2020-07-30 17:55:32.957 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,2,"authentication",2020-07-30 17:55:32 CST,3/2434,0,LOG,00000,"connection authorized: user=antdb database=test",,,,,,,,,""
2020-07-30 17:57:03.087 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,3,"idle",2020-07-30 17:55:32 CST,3/0,0,LOG,XX000,"could not receive data from client: Connection timed out",,,,,,,,,"psql"
2020-07-30 17:57:03.087 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,4,"idle",2020-07-30 17:55:32 CST,,0,LOG,00000,"disconnection: session time: 0:01:30.131 user=antdb database=test host=192.168.149.100 port=47752",,,,,,,,,"psql"
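The session time of 0:01:30.131 in the log above lines up with the keepalive values shown earlier (idle 60, interval 10, count 3). A minimal sketch of the arithmetic follows, assuming the Linux behaviour in which the first probe goes out after the idle period and the connection is dropped once count probes have gone unanswered; the function name is hypothetical.

```python
def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Seconds of silence after which an unresponsive peer is declared dead.

    Assumes the Linux behaviour: the first probe is sent after `idle`
    seconds, further probes follow every `interval` seconds, and the
    connection is dropped once `count` probes have gone unanswered.
    """
    return idle + interval * count


if __name__ == "__main__":
    # reset_val values from pg_settings above: idle=60, interval=10, count=3
    print(keepalive_detection_seconds(60, 10, 3))
```

With 60, 10 and 3 the result is 90 seconds, close to the 90.131 seconds between connection start and disconnection recorded in the log.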
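Filtering the production logs for the entries produced by this simulation turned up the same messages, so it is fairly safe to conclude that the error is caused by the database closing the connection after a keepalive timeout. But why do keepalive timeouts occur in production? There, the application connects to the database through an F5 load balancer; the preliminary analysis is that the F5 stops responding to the keepalive probes sent by the database, which leads to the disconnection.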
Connection reset by peer
could not receive data from client: Connection reset by peer
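This log entry is different from the timeout error. Although the client sees server closed the connection unexpectedly in both cases, the database log records a different error. Connection reset by peer is a typical TCP-level problem: after a TCP connection has been established normally, if one side aborts it (for example by sending an RST packet), the other side gets this error. The disconnection is not initiated by the database. Capturing packets for the connection on the application side and analyzing them with Wireshark shows that the F5 sent an RST on the connection and tore it down. In the capture, the RST (packet 65) arrives about 301 seconds after the last packet before it (packet 64).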
No. Time Source Destination Protocol Length Info
1 0.000000 10.159.101.45 10.154.52.156 TCP 77 58721 → 15432 [PSH, ACK] Seq=1 Ack=1 Win=19561 Len=11 TSval=899685638 TSecr=3959865574
2 0.001125 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=1 Ack=12 Win=65535 Len=0 TSval=3960160103 TSecr=899685638
3 0.004199 10.154.52.156 10.159.101.45 TCP 83 15432 → 58721 [PSH, ACK] Seq=1 Ack=12 Win=65535 Len=17 TSval=3960160106 TSecr=899685638
4 0.004215 10.159.101.45 10.154.52.156 TCP 66 58721 → 15432 [ACK] Seq=12 Ack=18 Win=19561 Len=0 TSval=899685642 TSecr=3960160106
5 0.004390 10.159.101.45 10.154.52.156 TCP 1514 58721 → 15432 [ACK] Seq=12 Ack=18 Win=19561 Len=1448 TSval=899685642 TSecr=3960160106
6 0.004398 10.159.101.45 10.154.52.156 TCP 209 58721 → 15432 [PSH, ACK] Seq=1460 Ack=18 Win=19561 Len=143 TSval=899685642 TSecr=3960160106
7 0.005488 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=18 Ack=1603 Win=65535 Len=0 TSval=3960160108 TSecr=899685642
8 0.078162 10.154.52.156 10.159.101.45 TCP 314 15432 → 58721 [PSH, ACK] Seq=18 Ack=1603 Win=65535 Len=248 TSval=3960160180 TSecr=899685642
9 0.078262 10.159.101.45 10.154.52.156 TCP 7306 58721 → 15432 [ACK] Seq=1603 Ack=266 Win=20823 Len=7240 TSval=899685716 TSecr=3960160180
10 0.078278 10.159.101.45 10.154.52.156 TCP 7306 58721 → 15432 [ACK] Seq=8843 Ack=266 Win=20823 Len=7240 TSval=899685716 TSecr=3960160180
11 0.079582 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=4499 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
12 0.079590 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=7395 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
13 0.079595 10.159.101.45 10.154.52.156 TCP 7306 58721 → 15432 [ACK] Seq=16083 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
14 0.079616 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=10291 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
15 0.079620 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=13187 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
16 0.079624 10.159.101.45 10.154.52.156 TCP 7306 58721 → 15432 [ACK] Seq=23323 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
17 0.079630 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=16083 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
18 0.079633 10.159.101.45 10.154.52.156 TCP 7306 58721 → 15432 [ACK] Seq=30563 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
19 0.081022 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=18979 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
20 0.081077 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=21875 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
21 0.081089 10.159.101.45 10.154.52.156 TCP 1514 58721 → 15432 [ACK] Seq=37803 Ack=266 Win=20823 Len=1448 TSval=899685719 TSecr=3960160183
22 0.081126 10.159.101.45 10.154.52.156 TCP 988 58721 → 15432 [PSH, ACK] Seq=39251 Ack=266 Win=20823 Len=922 TSval=899685719 TSecr=3960160183
23 0.081132 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=24771 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
24 0.081136 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=27667 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
25 0.081140 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=30563 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
26 0.081144 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=33459 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
27 0.081147 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=36355 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
28 0.082225 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=39251 Win=65535 Len=0 TSval=3960160184 TSecr=899685717
29 0.082251 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=266 Ack=40173 Win=65535 Len=0 TSval=3960160184 TSecr=899685719
30 0.104067 10.154.52.156 10.159.101.45 TCP 85 15432 → 58721 [PSH, ACK] Seq=266 Ack=40173 Win=65535 Len=19 TSval=3960160206 TSecr=899685719
31 0.104154 10.159.101.45 10.154.52.156 TCP 78 58721 → 15432 [PSH, ACK] Seq=40173 Ack=285 Win=20823 Len=12 TSval=899685742 TSecr=3960160206
32 0.105258 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=285 Ack=40185 Win=65535 Len=0 TSval=3960160207 TSecr=899685742
33 0.313198 10.154.52.156 10.159.101.45 TCP 84 15432 → 58721 [PSH, ACK] Seq=285 Ack=40185 Win=65535 Len=18 TSval=3960160415 TSecr=899685742
34 0.324214 10.159.101.45 10.154.52.156 TCP 77 58721 → 15432 [PSH, ACK] Seq=40185 Ack=303 Win=20823 Len=11 TSval=899685962 TSecr=3960160415
35 0.325344 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=303 Ack=40196 Win=65535 Len=0 TSval=3960160427 TSecr=899685962
36 0.337175 10.154.52.156 10.159.101.45 TCP 83 15432 → 58721 [PSH, ACK] Seq=303 Ack=40196 Win=65535 Len=17 TSval=3960160439 TSecr=899685962
37 0.337250 10.159.101.45 10.154.52.156 TCP 1514 58721 → 15432 [ACK] Seq=40196 Ack=320 Win=20823 Len=1448 TSval=899685975 TSecr=3960160439
38 0.337258 10.159.101.45 10.154.52.156 TCP 209 58721 → 15432 [PSH, ACK] Seq=41644 Ack=320 Win=20823 Len=143 TSval=899685975 TSecr=3960160439
39 0.338360 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=320 Ack=41787 Win=65535 Len=0 TSval=3960160440 TSecr=899685975
40 0.390031 10.154.52.156 10.159.101.45 TCP 314 15432 → 58721 [PSH, ACK] Seq=320 Ack=41787 Win=65535 Len=248 TSval=3960160492 TSecr=899685975
41 0.390156 10.159.101.45 10.154.52.156 TCP 4410 58721 → 15432 [ACK] Seq=41787 Ack=568 Win=22085 Len=4344 TSval=899686028 TSecr=3960160492
42 0.390169 10.159.101.45 10.154.52.156 TCP 90 58721 → 15432 [PSH, ACK] Seq=46131 Ack=568 Win=22085 Len=24 TSval=899686028 TSecr=3960160492
43 0.391292 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=568 Ack=44683 Win=65535 Len=0 TSval=3960160493 TSecr=899686028
44 0.391301 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=568 Ack=46155 Win=65535 Len=0 TSval=3960160493 TSecr=899686028
45 0.416738 10.154.52.156 10.159.101.45 TCP 84 15432 → 58721 [PSH, ACK] Seq=568 Ack=46155 Win=65535 Len=18 TSval=3960160519 TSecr=899686028
46 0.416830 10.159.101.45 10.154.52.156 TCP 78 58721 → 15432 [PSH, ACK] Seq=46155 Ack=586 Win=22085 Len=12 TSval=899686055 TSecr=3960160519
47 0.417929 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=586 Ack=46167 Win=65535 Len=0 TSval=3960160520 TSecr=899686055
48 0.622674 10.154.52.156 10.159.101.45 TCP 84 15432 → 58721 [PSH, ACK] Seq=586 Ack=46167 Win=65535 Len=18 TSval=3960160725 TSecr=899686055
49 0.662720 10.159.101.45 10.154.52.156 TCP 66 58721 → 15432 [ACK] Seq=46167 Ack=604 Win=22085 Len=0 TSval=899686301 TSecr=3960160725
50 5.646245 10.159.101.45 10.154.52.156 TCP 77 58721 → 15432 [PSH, ACK] Seq=46167 Ack=604 Win=22085 Len=11 TSval=899691284 TSecr=3960160725
51 5.647397 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=604 Ack=46178 Win=65535 Len=0 TSval=3960165749 TSecr=899691284
52 5.650477 10.154.52.156 10.159.101.45 TCP 83 15432 → 58721 [PSH, ACK] Seq=604 Ack=46178 Win=65535 Len=17 TSval=3960165753 TSecr=899691284
53 5.650490 10.159.101.45 10.154.52.156 TCP 66 58721 → 15432 [ACK] Seq=46178 Ack=621 Win=22085 Len=0 TSval=899691288 TSecr=3960165753
54 5.650699 10.159.101.45 10.154.52.156 TCP 1514 58721 → 15432 [ACK] Seq=46178 Ack=621 Win=22085 Len=1448 TSval=899691289 TSecr=3960165753
55 5.650725 10.159.101.45 10.154.52.156 TCP 209 58721 → 15432 [PSH, ACK] Seq=47626 Ack=621 Win=22085 Len=143 TSval=899691289 TSecr=3960165753
56 5.651861 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=621 Ack=47769 Win=65535 Len=0 TSval=3960165754 TSecr=899691289
57 5.655522 10.154.52.156 10.159.101.45 TCP 314 15432 → 58721 [PSH, ACK] Seq=621 Ack=47769 Win=65535 Len=248 TSval=3960165758 TSecr=899691289
58 5.655759 10.159.101.45 10.154.52.156 TCP 1290 58721 → 15432 [PSH, ACK] Seq=47769 Ack=869 Win=23347 Len=1224 TSval=899691294 TSecr=3960165758
59 5.656885 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=869 Ack=48993 Win=65535 Len=0 TSval=3960165759 TSecr=899691294
60 5.663736 10.154.52.156 10.159.101.45 TCP 84 15432 → 58721 [PSH, ACK] Seq=869 Ack=48993 Win=65535 Len=18 TSval=3960165766 TSecr=899691294
61 5.663830 10.159.101.45 10.154.52.156 TCP 78 58721 → 15432 [PSH, ACK] Seq=48993 Ack=887 Win=23347 Len=12 TSval=899691302 TSecr=3960165766
62 5.664924 10.154.52.156 10.159.101.45 TCP 66 15432 → 58721 [ACK] Seq=887 Ack=49005 Win=65535 Len=0 TSval=3960165767 TSecr=899691302
63 5.799961 10.154.52.156 10.159.101.45 TCP 84 15432 → 58721 [PSH, ACK] Seq=887 Ack=49005 Win=65535 Len=18 TSval=3960165902 TSecr=899691302
64 5.839732 10.159.101.45 10.154.52.156 TCP 66 58721 → 15432 [ACK] Seq=49005 Ack=905 Win=23347 Len=0 TSval=899691478 TSecr=3960165902
65 306.902943 10.154.52.156 10.159.101.45 TCP 60 15432 → 58721 [RST, ACK] Seq=905 Ack=49005 Win=65535 Len=0
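To find resets like packet 65 in a longer capture, the text export can be scanned for RST rows together with the silence that preceded each of them. The sketch below assumes rows in the same layout as the export above (packet number first, relative time second); the function name idle_gaps_before_rst is hypothetical.

```python
import re
from typing import List, Tuple

# An RST flag anywhere inside the bracketed flag list, e.g. "[RST, ACK]".
_RST_RE = re.compile(r"\[[^\]]*\bRST\b")


def idle_gaps_before_rst(rows: List[str]) -> List[Tuple[int, float]]:
    """Return (packet number, seconds since previous packet) for each RST row.

    Rows that do not start with a packet number and a relative time,
    such as the header line, are skipped.
    """
    gaps = []
    previous_time = None
    for row in rows:
        fields = row.split()
        try:
            number, rel_time = int(fields[0]), float(fields[1])
        except (IndexError, ValueError):
            continue  # header line or anything else that is not a packet row
        if _RST_RE.search(row) and previous_time is not None:
            gaps.append((number, rel_time - previous_time))
        previous_time = rel_time
    return gaps


if __name__ == "__main__":
    capture = [
        "No. Time Source Destination Protocol Length Info",
        "64 5.839732 10.159.101.45 10.154.52.156 TCP 66 58721 → 15432 [ACK] Seq=49005 Ack=905 Win=23347 Len=0",
        "65 306.902943 10.154.52.156 10.159.101.45 TCP 60 15432 → 58721 [RST, ACK] Seq=905 Ack=49005 Win=65535 Len=0",
    ]
    for number, gap in idle_gaps_before_rst(capture):
        print(f"packet {number}: RST after {gap:.1f} s of silence")
```

A gap that repeats at roughly the same value across several resets would point at an idle timeout on the device sending the RST rather than at a random network fault.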
The fix is to adjust the F5 settings that govern connection resets, or to add a retry mechanism on the application side.
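The application-side retry could look roughly like the following. This is a generic sketch, not the application's actual code: run_with_retry and its parameters are hypothetical, and the exception types to retry on depend on the driver in use (with psycopg2, for example, a lost connection typically surfaces as OperationalError).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retry(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    attempts: int = 3,
    delay_seconds: float = 0.5,
) -> T:
    """Call operation(), retrying when it raises one of the retry_on errors.

    operation is expected to obtain a fresh connection on every call,
    because the previous one is unusable once it has been dropped.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay_seconds * attempt)  # simple linear backoff
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = []

    def flaky_query() -> str:
        # Stand-in for a database call that fails twice with a reset.
        calls.append(1)
        if len(calls) < 3:
            raise ConnectionResetError("connection reset by peer")
        return "ok"

    print(run_with_retry(flaky_query, delay_seconds=0.1), len(calls))
```

Retrying blindly is only safe for operations that can be repeated without side effects; if the connection is lost in the middle of a transaction, the application cannot tell from the error alone whether the commit went through.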
Relationship between operating system and database keepalive parameters
Set the operating system keepalive values lower:
[root@node1 ~]# sysctl -a|grep keepalive
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60
Set the database keepalive values higher:
[local]:5432 antdb@test=# select setting,reset_val from pg_settings where name ~ 'tcp';
setting | reset_val
---------+-----------
0 | 3
0 | 300
0 | 30
(3 rows)
Watching the keepalive timer of the database connection shows that the database settings take precedence:
[antdb@node1 pg_log]$ netstat -antpo|grep 122214
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47904 ESTABLISHED 122214/postgres: an keepalive (281.39/0/0)
[antdb@node1 pg_log]$ netstat -antpo|grep 122214
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.149.131:5432 192.168.149.100:47904 ESTABLISHED 122214/postgres: an keepalive (271.24/0/0)
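A packet capture taken on the client at the same time confirms that the database keepalive parameters are the ones in effect:
[root@work ~]# tcpdump -i ens33 tcp port 47904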
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
18:20:24.590429 IP 192.168.149.131.postgres > work.47904: Flags [.], ack 2175538745, win 227, options [nop,nop,TS val 299547648 ecr 127884958], length 0
18:20:24.590492 IP work.47904 > 192.168.149.131.postgres: Flags [.], ack 1, win 239, options [nop,nop,TS val 128185847 ecr 299246720], length 0
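The database log gives the connection start time. The interval between it and the first keepalive probe (18:15:23 to 18:20:24, about 300 seconds) matches the idle value set in the database: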
2020-07-30 18:15:23.659 CST,,,122214,"192.168.149.100:47904",5f229dbb.1dd66,1,"",2020-07-30 18:15:23 CST,,0,LOG,00000,"connection received: host=192.168.149.100 port=47904",,,,,,,,,""
2020-07-30 18:15:23.661 CST,"antdb","test",122214,"192.168.149.100:47904",5f229dbb.1dd66,2,"authentication",2020-07-30 18:15:23 CST,3/2455,0,LOG,00000,"connection authorized: user=antdb database=test",,,,,,,,,""
The conclusion is that keepalive on database connections is governed by the database keepalive parameters.
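This result is consistent with how keepalive options work on Linux: values set on an individual socket with setsockopt take precedence over the net.ipv4.tcp_keepalive_* defaults for that socket. The sketch below shows the mechanism on a plain Python socket; it assumes a Linux host (TCP_KEEPIDLE and friends are not available everywhere), and the helper names are hypothetical. It illustrates the per-socket override only and is not PostgreSQL's own code.

```python
import socket
from typing import Dict


def enable_keepalive(sock: socket.socket, idle: int, interval: int, count: int) -> None:
    """Turn on TCP keepalive for one socket with its own timing values (Linux)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)       # seconds before first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # seconds between probes
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)       # unanswered probes allowed


def read_keepalive(sock: socket.socket) -> Dict[str, int]:
    """Read back the keepalive timing values in effect for this socket."""
    return {
        "idle": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE),
        "interval": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL),
        "count": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT),
    }


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        print("system defaults:", read_keepalive(sock))
        enable_keepalive(sock, idle=300, interval=30, count=3)
        print("per-socket values:", read_keepalive(sock))
```

On a host configured like the one above, the first line would show the sysctl values (60/10/3) and the second the per-socket values (300/30/3), mirroring what netstat and tcpdump showed for the database connection.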