PostgreSQL TCP connection timeout issues

Symptoms

The application ran into the following error:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Problem analysis

Connection timed out

One of the entries recorded in the database log:

could not receive data from client: Connection timed out

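The database log does not contain the same message that the client reports, so the connection was most likely dropped and the client returned this error on its own. The next step was to find a way to reproduce it.

Suspecting that a dropped connection was involved, I started with the easy case: on the server, kill the backend process of the connection with kill -15, and the client gets a similar error on its next operation.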

[postgres@work ~]$ psql -h 192.168.149.131 -U antdb
Password for user antdb: 
psql (11.5, server 11.6)
Type "help" for help.

192.168.149.131:5432 antdb@test=# begin;
BEGIN
192.168.149.131:5432 antdb@test=*# commit;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

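A real application would not have anyone terminating connections on purpose, so the analysis continued. This brought to mind the database's mechanism for dealing with idle connections, which is controlled by several parameters: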

 tcp_keepalives_count            | Maximum number of TCP keepalive retransmits.
 tcp_keepalives_idle             | Time between issuing TCP keepalives.
 tcp_keepalives_interval         | Time between TCP keepalive retransmits.

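These are analogous to the operating system's TCP socket keepalive controls.

On the server, inspect the network state of the database connection with ss or netstat: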

[antdb@node1 ~]$ ss -apte|grep 105408
ESTAB      0      0      192.168.149.131:postgres             192.168.149.100:44338               
users:(("postgres",pid=105408,fd=11)) timer:(keepalive,19sec,0) uid:1004 ino:118045263 
sk:ffff92e9416a4d80 <->

[antdb@node1 ~]$ netstat -antpo|grep 105408
tcp        0      0 192.168.149.131:5432    192.168.149.100:44338   ESTABLISHED 105408/postgres: an  keepalive (10.05/0/0)

The keepalive timer is visible in the output.
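To follow the timer across repeated runs of netstat, the timer column can be pulled out with a small script. This is a minimal sketch, not part of the original investigation: the function name parse_timer is made up for illustration, and it assumes the three numbers in parentheses are the time left on the timer, the retransmission count, and the number of keepalive probes sent. The last reading is consistent with the later output in this post, where the third number climbs from 0 to 3 once the client stops answering.

```python
import re

# Matches the timer column printed by `netstat -o`, e.g. "keepalive (10.05/0/0)".
TIMER_RE = re.compile(r"(\w+) \((\d+(?:\.\d+)?)/(\d+)/(\d+)\)")


def parse_timer(line):
    """Return (timer_name, seconds_left, retransmits, probes_sent), or None."""
    match = TIMER_RE.search(line)
    if match is None:
        return None
    name, seconds, retransmits, probes = match.groups()
    return name, float(seconds), int(retransmits), int(probes)


# Sample line copied from the netstat output above.
sample = ("tcp        0      0 192.168.149.131:5432    192.168.149.100:44338   "
          "ESTABLISHED 105408/postgres: an  keepalive (10.05/0/0)")
print(parse_timer(sample))
```

A line without a timer column makes the function return None, so it can be applied to every line of the output without filtering first.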

Check the TCP-related parameters in the database:

[local]:5432 postgres@test=# select name,setting,reset_val from pg_settings where name ~ 'tcp';
          name           | setting | reset_val 
-------------------------+---------+-----------
 tcp_keepalives_count    | 0       | 3
 tcp_keepalives_idle     | 0       | 60
 tcp_keepalives_interval | 0       | 10

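The setting column did not change (it still shows 0); the values in effect appear to be those in the reset_val column, though I did not look into this in detail. (This session is connected over a Unix-domain socket, shown as [local] in the prompt, and the PostgreSQL documentation states that these parameters always read as zero in such sessions, which would explain the zeros.) Taken together, the three parameters mean that every connection has something like a timer: once a connection has seen no network traffic for tcp_keepalives_idle seconds, the database side sends a keepalive probe to the client. The probes can be captured with tcpdump: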

16:57:08.366531 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294551424 ecr 123129537], length 0
16:57:08.366555 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123189623 ecr 294491338], length 0
16:58:08.398387 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294611456 ecr 123189623], length 0
16:58:08.398409 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123249654 ecr 294491338], length 0
16:59:08.558872 IP 192.168.149.131.postgres > work.47230: Flags [.], ack 160, win 227, options [nop,nop,TS val 294671616 ecr 123249654], length 0
16:59:08.558907 IP work.47230 > 192.168.149.131.postgres: Flags [.], ack 316, win 239, options [nop,nop,TS val 123309815 ecr 294491338], length 0

As long as the client answers the keepalive probes, the database does not consider the connection dead and does not go on to terminate it.

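So when does the database actively terminate a connection? Going by the keepalive mechanism, once the number of probes sent to the client exceeds the configured count and the client has not answered any of them, the connection is killed. But how can an unresponsive client be simulated? I tried kill with various signals, calling sleep in the code, and closing the TCP socket with gdb, and none of them reproduced it (my skills are limited here). A likely reason is that keepalive probes are answered by the client's kernel rather than by the application process, so stalling the process does not stop the replies. In the end I disabled the network interface on the client machine (a virtual machine), which simulates a network outage that breaks communication.

The netstat output shows that after 3 unanswered probes the database terminated the connection.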

[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (20.44/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (19.29/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (18.80/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (2.86/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (1.10/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (0.29/0/0)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (9.45/0/1)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (6.29/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (4.47/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (0.30/0/2)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (8.72/0/3)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47752   ESTABLISHED 63841/postgres: ant  keepalive (3.19/0/3)
[antdb@node1 ~]$ netstat -antpo|grep 63841
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)

The disconnection can be seen in the database log.

2020-07-30 17:55:32.955 CST,,,63841,"192.168.149.100:47752",5f229914.f961,1,"",2020-07-30 17:55:32 CST,,0,LOG,00000,"connection received: host=192.168.149.100 port=47752",,,,,,,,,""
2020-07-30 17:55:32.957 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,2,"authentication",2020-07-30 17:55:32 CST,3/2434,0,LOG,00000,"connection authorized: user=antdb database=test",,,,,,,,,""
2020-07-30 17:57:03.087 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,3,"idle",2020-07-30 17:55:32 CST,3/0,0,LOG,XX000,"could not receive data from client: Connection timed out",,,,,,,,,"psql"
2020-07-30 17:57:03.087 CST,"antdb","test",63841,"192.168.149.100:47752",5f229914.f961,4,"idle",2020-07-30 17:55:32 CST,,0,LOG,00000,"disconnection: session time: 0:01:30.131 user=antdb database=test host=192.168.149.100 port=47752",,,,,,,,,"psql"
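The session time in the last log line (0:01:30.131) matches what the three parameters predict. As a rough sketch, assuming the usual Linux behaviour in which the first probe is sent after the idle time and the connection is declared dead one interval after the last unanswered probe, the detection time is idle + interval * count. The function name below is made up for illustration.

```python
def keepalive_detection_seconds(idle, interval, count):
    """Approximate time until an unreachable peer is detected.

    idle:     seconds without traffic before the first probe
    interval: seconds between probes
    count:    unanswered probes tolerated before the connection is dropped
    """
    return idle + interval * count


# Values from reset_val in the pg_settings query earlier: 60, 10, 3.
print(keepalive_detection_seconds(60, 10, 3))
```

With 60, 10 and 3 this gives 90 seconds, in line with the roughly 90 seconds between "connection received" (17:55:32) and "Connection timed out" (17:57:03) in the log. The session was idle from the start in this test, which is why the two numbers line up so closely.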

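Filtering the production logs for the entries produced in this simulation turned up identical entries, so it is fairly safe to conclude that the error is caused by the database terminating the connection after a keepalive timeout. But why would keepalive time out in production? The production application connects to the database through an F5 load balancer, and the preliminary analysis is that the F5 stopped answering the keepalive probes sent by the database, which led to the disconnection.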

Connection reset by peer

could not receive data from client: Connection reset by peer

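This log entry is different from the timeout error. The client sees "server closed the connection unexpectedly" in both cases, but the database log records a different error here. "Connection reset by peer" is a typical TCP connection problem: after a TCP connection has been established normally, if one end actively tears it down (for example by sending an RST packet), the other end gets this error. The database did not initiate this disconnection, so I captured the connection's packets on the application side and analyzed them with Wireshark. The capture shows that the F5 actively sent an RST on the connection and tore it down.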

No. Time    Source  Destination Protocol    Length  Info
1   0.000000    10.159.101.45   10.154.52.156   TCP 77  58721 → 15432 [PSH, ACK] Seq=1 Ack=1 Win=19561 Len=11 TSval=899685638 TSecr=3959865574
2   0.001125    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=1 Ack=12 Win=65535 Len=0 TSval=3960160103 TSecr=899685638
3   0.004199    10.154.52.156   10.159.101.45   TCP 83  15432 → 58721 [PSH, ACK] Seq=1 Ack=12 Win=65535 Len=17 TSval=3960160106 TSecr=899685638
4   0.004215    10.159.101.45   10.154.52.156   TCP 66  58721 → 15432 [ACK] Seq=12 Ack=18 Win=19561 Len=0 TSval=899685642 TSecr=3960160106
5   0.004390    10.159.101.45   10.154.52.156   TCP 1514    58721 → 15432 [ACK] Seq=12 Ack=18 Win=19561 Len=1448 TSval=899685642 TSecr=3960160106
6   0.004398    10.159.101.45   10.154.52.156   TCP 209 58721 → 15432 [PSH, ACK] Seq=1460 Ack=18 Win=19561 Len=143 TSval=899685642 TSecr=3960160106
7   0.005488    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=18 Ack=1603 Win=65535 Len=0 TSval=3960160108 TSecr=899685642
8   0.078162    10.154.52.156   10.159.101.45   TCP 314 15432 → 58721 [PSH, ACK] Seq=18 Ack=1603 Win=65535 Len=248 TSval=3960160180 TSecr=899685642
9   0.078262    10.159.101.45   10.154.52.156   TCP 7306    58721 → 15432 [ACK] Seq=1603 Ack=266 Win=20823 Len=7240 TSval=899685716 TSecr=3960160180
10  0.078278    10.159.101.45   10.154.52.156   TCP 7306    58721 → 15432 [ACK] Seq=8843 Ack=266 Win=20823 Len=7240 TSval=899685716 TSecr=3960160180
11  0.079582    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=4499 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
12  0.079590    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=7395 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
13  0.079595    10.159.101.45   10.154.52.156   TCP 7306    58721 → 15432 [ACK] Seq=16083 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
14  0.079616    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=10291 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
15  0.079620    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=13187 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
16  0.079624    10.159.101.45   10.154.52.156   TCP 7306    58721 → 15432 [ACK] Seq=23323 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
17  0.079630    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=16083 Win=65535 Len=0 TSval=3960160182 TSecr=899685716
18  0.079633    10.159.101.45   10.154.52.156   TCP 7306    58721 → 15432 [ACK] Seq=30563 Ack=266 Win=20823 Len=7240 TSval=899685717 TSecr=3960160182
19  0.081022    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=18979 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
20  0.081077    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=21875 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
21  0.081089    10.159.101.45   10.154.52.156   TCP 1514    58721 → 15432 [ACK] Seq=37803 Ack=266 Win=20823 Len=1448 TSval=899685719 TSecr=3960160183
22  0.081126    10.159.101.45   10.154.52.156   TCP 988 58721 → 15432 [PSH, ACK] Seq=39251 Ack=266 Win=20823 Len=922 TSval=899685719 TSecr=3960160183
23  0.081132    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=24771 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
24  0.081136    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=27667 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
25  0.081140    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=30563 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
26  0.081144    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=33459 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
27  0.081147    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=36355 Win=65535 Len=0 TSval=3960160183 TSecr=899685717
28  0.082225    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=39251 Win=65535 Len=0 TSval=3960160184 TSecr=899685717
29  0.082251    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=266 Ack=40173 Win=65535 Len=0 TSval=3960160184 TSecr=899685719
30  0.104067    10.154.52.156   10.159.101.45   TCP 85  15432 → 58721 [PSH, ACK] Seq=266 Ack=40173 Win=65535 Len=19 TSval=3960160206 TSecr=899685719
31  0.104154    10.159.101.45   10.154.52.156   TCP 78  58721 → 15432 [PSH, ACK] Seq=40173 Ack=285 Win=20823 Len=12 TSval=899685742 TSecr=3960160206
32  0.105258    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=285 Ack=40185 Win=65535 Len=0 TSval=3960160207 TSecr=899685742
33  0.313198    10.154.52.156   10.159.101.45   TCP 84  15432 → 58721 [PSH, ACK] Seq=285 Ack=40185 Win=65535 Len=18 TSval=3960160415 TSecr=899685742
34  0.324214    10.159.101.45   10.154.52.156   TCP 77  58721 → 15432 [PSH, ACK] Seq=40185 Ack=303 Win=20823 Len=11 TSval=899685962 TSecr=3960160415
35  0.325344    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=303 Ack=40196 Win=65535 Len=0 TSval=3960160427 TSecr=899685962
36  0.337175    10.154.52.156   10.159.101.45   TCP 83  15432 → 58721 [PSH, ACK] Seq=303 Ack=40196 Win=65535 Len=17 TSval=3960160439 TSecr=899685962
37  0.337250    10.159.101.45   10.154.52.156   TCP 1514    58721 → 15432 [ACK] Seq=40196 Ack=320 Win=20823 Len=1448 TSval=899685975 TSecr=3960160439
38  0.337258    10.159.101.45   10.154.52.156   TCP 209 58721 → 15432 [PSH, ACK] Seq=41644 Ack=320 Win=20823 Len=143 TSval=899685975 TSecr=3960160439
39  0.338360    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=320 Ack=41787 Win=65535 Len=0 TSval=3960160440 TSecr=899685975
40  0.390031    10.154.52.156   10.159.101.45   TCP 314 15432 → 58721 [PSH, ACK] Seq=320 Ack=41787 Win=65535 Len=248 TSval=3960160492 TSecr=899685975
41  0.390156    10.159.101.45   10.154.52.156   TCP 4410    58721 → 15432 [ACK] Seq=41787 Ack=568 Win=22085 Len=4344 TSval=899686028 TSecr=3960160492
42  0.390169    10.159.101.45   10.154.52.156   TCP 90  58721 → 15432 [PSH, ACK] Seq=46131 Ack=568 Win=22085 Len=24 TSval=899686028 TSecr=3960160492
43  0.391292    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=568 Ack=44683 Win=65535 Len=0 TSval=3960160493 TSecr=899686028
44  0.391301    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=568 Ack=46155 Win=65535 Len=0 TSval=3960160493 TSecr=899686028
45  0.416738    10.154.52.156   10.159.101.45   TCP 84  15432 → 58721 [PSH, ACK] Seq=568 Ack=46155 Win=65535 Len=18 TSval=3960160519 TSecr=899686028
46  0.416830    10.159.101.45   10.154.52.156   TCP 78  58721 → 15432 [PSH, ACK] Seq=46155 Ack=586 Win=22085 Len=12 TSval=899686055 TSecr=3960160519
47  0.417929    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=586 Ack=46167 Win=65535 Len=0 TSval=3960160520 TSecr=899686055
48  0.622674    10.154.52.156   10.159.101.45   TCP 84  15432 → 58721 [PSH, ACK] Seq=586 Ack=46167 Win=65535 Len=18 TSval=3960160725 TSecr=899686055
49  0.662720    10.159.101.45   10.154.52.156   TCP 66  58721 → 15432 [ACK] Seq=46167 Ack=604 Win=22085 Len=0 TSval=899686301 TSecr=3960160725
50  5.646245    10.159.101.45   10.154.52.156   TCP 77  58721 → 15432 [PSH, ACK] Seq=46167 Ack=604 Win=22085 Len=11 TSval=899691284 TSecr=3960160725
51  5.647397    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=604 Ack=46178 Win=65535 Len=0 TSval=3960165749 TSecr=899691284
52  5.650477    10.154.52.156   10.159.101.45   TCP 83  15432 → 58721 [PSH, ACK] Seq=604 Ack=46178 Win=65535 Len=17 TSval=3960165753 TSecr=899691284
53  5.650490    10.159.101.45   10.154.52.156   TCP 66  58721 → 15432 [ACK] Seq=46178 Ack=621 Win=22085 Len=0 TSval=899691288 TSecr=3960165753
54  5.650699    10.159.101.45   10.154.52.156   TCP 1514    58721 → 15432 [ACK] Seq=46178 Ack=621 Win=22085 Len=1448 TSval=899691289 TSecr=3960165753
55  5.650725    10.159.101.45   10.154.52.156   TCP 209 58721 → 15432 [PSH, ACK] Seq=47626 Ack=621 Win=22085 Len=143 TSval=899691289 TSecr=3960165753
56  5.651861    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=621 Ack=47769 Win=65535 Len=0 TSval=3960165754 TSecr=899691289
57  5.655522    10.154.52.156   10.159.101.45   TCP 314 15432 → 58721 [PSH, ACK] Seq=621 Ack=47769 Win=65535 Len=248 TSval=3960165758 TSecr=899691289
58  5.655759    10.159.101.45   10.154.52.156   TCP 1290    58721 → 15432 [PSH, ACK] Seq=47769 Ack=869 Win=23347 Len=1224 TSval=899691294 TSecr=3960165758
59  5.656885    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=869 Ack=48993 Win=65535 Len=0 TSval=3960165759 TSecr=899691294
60  5.663736    10.154.52.156   10.159.101.45   TCP 84  15432 → 58721 [PSH, ACK] Seq=869 Ack=48993 Win=65535 Len=18 TSval=3960165766 TSecr=899691294
61  5.663830    10.159.101.45   10.154.52.156   TCP 78  58721 → 15432 [PSH, ACK] Seq=48993 Ack=887 Win=23347 Len=12 TSval=899691302 TSecr=3960165766
62  5.664924    10.154.52.156   10.159.101.45   TCP 66  15432 → 58721 [ACK] Seq=887 Ack=49005 Win=65535 Len=0 TSval=3960165767 TSecr=899691302
63  5.799961    10.154.52.156   10.159.101.45   TCP 84  15432 → 58721 [PSH, ACK] Seq=887 Ack=49005 Win=65535 Len=18 TSval=3960165902 TSecr=899691302
64  5.839732    10.159.101.45   10.154.52.156   TCP 66  58721 → 15432 [ACK] Seq=49005 Ack=905 Win=23347 Len=0 TSval=899691478 TSecr=3960165902
65  306.902943  10.154.52.156   10.159.101.45   TCP 60  15432 → 58721 [RST, ACK] Seq=905 Ack=49005 Win=65535 Len=0
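One detail worth pulling out of the capture is how long the connection sat idle before the reset. The sketch below is an illustration only: the three rows are copied from the end of the capture above, and the helper name idle_before_reset is made up.

```python
# (packet number, relative time in seconds, TCP flags), copied from the last
# three rows of the capture above.
packets = [
    (63, 5.799961, "PSH, ACK"),
    (64, 5.839732, "ACK"),
    (65, 306.902943, "RST, ACK"),
]


def idle_before_reset(rows):
    """Seconds between the first RST packet and the packet just before it."""
    for previous, current in zip(rows, rows[1:]):
        if "RST" in current[2]:
            return current[1] - previous[1]
    return None


gap = idle_before_reset(packets)
print(round(gap, 3))
```

The gap comes out at about 301 seconds. That would be consistent with an idle timeout of roughly 300 seconds on the device that sent the reset, but that reading is an assumption; the capture itself only shows the gap.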

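The fix is either to change the F5 settings that control when it resets connections, or to add a retry mechanism on the application side.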
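The application-side retry could look roughly like the following. This is a minimal sketch under stated assumptions, not the code used in production: run_with_retry, operation and reconnect are hypothetical names, and the built-in ConnectionError stands in for whatever exception the real database driver raises when the connection is lost.

```python
import time


def run_with_retry(operation, reconnect, attempts=3, delay=0.5,
                   retry_on=(ConnectionError,)):
    """Call operation(); after a connection error, reconnect and try again.

    operation and reconnect are caller-supplied callables (hypothetical here).
    The last error is re-raised once all attempts are used up.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            last_error = exc
            if attempt == attempts:
                break
            time.sleep(delay)   # brief pause before reconnecting
            reconnect()
    raise last_error


# Demonstration with a fake operation that fails once, then succeeds.
calls = {"operations": 0, "reconnects": 0}


def flaky_query():
    calls["operations"] += 1
    if calls["operations"] == 1:
        raise ConnectionResetError("server closed the connection unexpectedly")
    return "ok"


def fake_reconnect():
    calls["reconnects"] += 1


result = run_with_retry(flaky_query, fake_reconnect, delay=0)
print(result, calls)
```

Blind retries are only safe for work that can be repeated without side effects, so in practice the retry belongs at a transaction boundary, with the whole transaction replayed on a fresh connection.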

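Relationship between the OS keepalive parameters and the database keepalive parameters

Set the operating system keepalive values low: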

[root@node1 ~]# sysctl -a|grep keepalive
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 60

Set the database keepalive values high:

[local]:5432 antdb@test=# select setting,reset_val from pg_settings where name ~ 'tcp';
 setting | reset_val 
---------+-----------
 0       | 3
 0       | 300
 0       | 30
(3 rows)

Watching the keepalive timer on the database connection shows that the database settings are the ones that apply.

[antdb@node1 pg_log]$ netstat -antpo|grep 122214
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47904   ESTABLISHED 122214/postgres: an  keepalive (281.39/0/0)
[antdb@node1 pg_log]$ netstat -antpo|grep 122214
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.149.131:5432    192.168.149.100:47904   ESTABLISHED 122214/postgres: an  keepalive (271.24/0/0)

A packet capture on the client at the same time confirms that the database keepalive parameters are the ones in effect.

[root@work ~]# tcpdump -i ens33 tcp port 47904
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens33, link-type EN10MB (Ethernet), capture size 262144 bytes
18:20:24.590429 IP 192.168.149.131.postgres > work.47904: Flags [.], ack 2175538745, win 227, options [nop,nop,TS val 299547648 ecr 127884958], length 0
18:20:24.590492 IP work.47904 > 192.168.149.131.postgres: Flags [.], ack 1, win 239, options [nop,nop,TS val 128185847 ecr 299246720], length 0

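Comparing the connection start time in the database log with the time of the first keepalive probe, the interval matches the database keepalive idle setting (about 300 seconds).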

2020-07-30 18:15:23.659 CST,,,122214,"192.168.149.100:47904",5f229dbb.1dd66,1,"",2020-07-30 18:15:23 CST,,0,LOG,00000,"connection received: host=192.168.149.100 port=47904",,,,,,,,,""
2020-07-30 18:15:23.661 CST,"antdb","test",122214,"192.168.149.100:47904",5f229dbb.1dd66,2,"authentication",2020-07-30 18:15:23 CST,3/2455,0,LOG,00000,"connection authorized: user=antdb database=test",,,,,,,,,""

So the conclusion is that keepalive on database connections is governed by the database keepalive parameters.
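This is what one would expect if the database applies its settings per socket, since per-socket keepalive options override the system-wide sysctl defaults. The sketch below illustrates that override on Linux using the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options. It is an illustration of the mechanism rather than PostgreSQL's own code; the helper names are made up, and it assumes a Linux host where Python exposes these constants.

```python
import socket

# Linux socket options corresponding to idle time, probe interval, probe count.
KEEPALIVE_OPTIONS = (socket.TCP_KEEPIDLE, socket.TCP_KEEPINTVL, socket.TCP_KEEPCNT)


def get_keepalive(sock):
    """Return (idle, interval, count) currently associated with the socket."""
    return tuple(sock.getsockopt(socket.IPPROTO_TCP, opt) for opt in KEEPALIVE_OPTIONS)


def set_keepalive(sock, idle, interval, count):
    """Enable keepalive on the socket and override the system-wide defaults."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for opt, value in zip(KEEPALIVE_OPTIONS, (idle, interval, count)):
        sock.setsockopt(socket.IPPROTO_TCP, opt, value)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    before = get_keepalive(sock)      # inherited from net.ipv4.tcp_keepalive_*
    set_keepalive(sock, 300, 30, 3)   # the database values used in this test
    after = get_keepalive(sock)

print("system defaults:", before)
print("per-socket values:", after)
```

On the test server above, the first line would be expected to show the sysctl values (60, 10, 3) and the second the per-socket values (300, 30, 3), matching the 300-second timer seen in netstat.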
