% prstat 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
2160 QAtest 941M 886M cpu0 0 0 80:03:57 99% myserver/41
28352 patrol 7888K 6032K sleep 59 0 4:49:37 0.1% bgscollect/1
24720 QAtest 1872K 1656K cpu3 59 0 0:00:00 0.0% prstat/1
59 root 4064K 3288K sleep 59 0 0:27:56 0.0% picld/6
2132 QAtest 478M 431M sleep 59 0 0:15:45 0.0% someserver.exe/901
I started off with my favorite tool, truss, and found that the recv() system call was being called tons of times with no corresponding send().
% truss -c -p 2160
^Csyscall seconds calls errors
time .001 115
lwp_park .001 51 24
lwp_unpark .000 23
poll .002 34
recv 61.554 2512863
-------- ------ ----
sys totals: 61.561 2513086 24
usr time: 12.008
elapsed: 68.350
Interestingly, the return value of all the recv() calls is 0 (EOF). A return value of 0 indicates that the other end has nothing more to write and is ready to close the socket (connection).
% head /tmp/truss.log
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
A typical recv() call that actually returns data looks like this:
recv(55, 0x05CCB010, 4096, 0) = 2958
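To see which descriptors are spinning on EOF, the truss log can be tallied with a few lines of script. The following is only a rough sketch: the function name `count_zero_reads` and the regular expression are my own, and the pattern assumes log lines shaped like the ones shown above.

```python
import re
from collections import Counter

# Matches truss lines such as "2160/216: recv(294, 0x277C9410, 32768, 0) = 0".
# The "pid/lwp:" prefix is optional so that bare lines also match.
RECV_LINE = re.compile(r"^\s*(?:\d+/\d+:\s+)?recv\((\d+),.*\)\s+=\s+(-?\d+)")

def count_zero_reads(lines):
    """Map each socket descriptor to the number of recv() calls that returned 0."""
    zero_reads = Counter()
    for line in lines:
        match = RECV_LINE.match(line)
        if match and int(match.group(2)) == 0:
            zero_reads[int(match.group(1))] += 1
    return zero_reads

# Sample lines copied from the truss output above.
sample = [
    "2160/216: recv(294, 0x277C9410, 32768, 0) = 0",
    "2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0",
    "2160/216: recv(294, 0x277C9410, 32768, 0) = 0",
    "recv(55, 0x05CCB010, 4096, 0) = 2958",
]

for fd, count in count_zero_reads(sample).most_common():
    print(fd, count)
```

A descriptor that shows up here with a very large count and never returns data is a good candidate for a closer look.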
I then collected the network statistics and found quite a number of connections in the CLOSE_WAIT state:
% netstat -an
...
127.0.0.1.54356 127.0.0.1.9810 49152 0 49152 0 ESTABLISHED
127.0.0.1.9810 127.0.0.1.54356 49152 0 49152 0 ESTABLISHED
127.0.0.1.54687 127.0.0.1.9810 49152 0 49152 0 ESTABLISHED
127.0.0.1.9810 127.0.0.1.54687 49152 0 49152 0 ESTABLISHED
...
127.0.0.1.9710 127.0.0.1.55830 49152 0 49152 0 CLOSE_WAIT
127.0.0.1.9810 127.0.0.1.57701 49152 0 49152 0 CLOSE_WAIT
127.0.0.1.9710 127.0.0.1.59209 49152 0 49152 0 CLOSE_WAIT
127.0.0.1.9810 127.0.0.1.60694 49152 0 49152 0 CLOSE_WAIT
127.0.0.1.9810 127.0.0.1.61133 49152 0 49152 0 CLOSE_WAIT
127.0.0.1.9810 127.0.0.1.61136 49152 0 49152 0 CLOSE_WAIT
...
(I later realized that these half-closed socket connections had been lying there for more than two days.)
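When the netstat listing is long, a quick per-state tally makes a pile-up of CLOSE_WAIT connections obvious. This is a hedged sketch, not something from the original investigation: `count_tcp_states` is a made-up helper, and it assumes the seven-column layout seen in the excerpt above (local address, remote address, four numeric columns, state).

```python
from collections import Counter

def count_tcp_states(netstat_lines):
    """Count connections per TCP state in `netstat -an` style output."""
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Assumed layout: local addr, remote addr, four numeric columns, state.
        # Anything else (headers, "..." markers, blank lines) is skipped.
        if len(fields) == 7:
            counts[fields[6]] += 1
    return counts

# Sample lines copied from the netstat output above.
netstat_sample = [
    "127.0.0.1.54356 127.0.0.1.9810 49152 0 49152 0 ESTABLISHED",
    "127.0.0.1.9810 127.0.0.1.54356 49152 0 49152 0 ESTABLISHED",
    "...",
    "127.0.0.1.9710 127.0.0.1.55830 49152 0 49152 0 CLOSE_WAIT",
    "127.0.0.1.9810 127.0.0.1.57701 49152 0 49152 0 CLOSE_WAIT",
    "127.0.0.1.9710 127.0.0.1.59209 49152 0 49152 0 CLOSE_WAIT",
]

for state, count in count_tcp_states(netstat_sample).most_common():
    print(state, count)
```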
2160/216: recv(294, 0x277C9410, 32768, 0) = 0 <- from truss
The next step is to find out the state of the network connection behind socket descriptor 294. The Solaris pfiles utility reports information for all open files in each process. It makes sense to use it here, since a socket descriptor is nothing but a file descriptor. (On UNIX, nearly everything is mapped to a file, including raw devices.)
% pfiles 2160
2160: /export/home/QAtest/572bliss/web/bin/myserver
Current rlimit: 1024 file descriptors
...
294: S_IFSOCK mode:0666 dev:259,0 ino:35150 uid:0 gid:0 size:0
O_RDWR
sockname: AF_INET 127.0.0.1 port: 9710
peername: AF_INET 127.0.0.1 port: 59209
Now it is fairly easy to identify the connection with the port numbers reported in the pfiles output:
% netstat -an | grep 59209
127.0.0.1.9710 127.0.0.1.59209 49152 0 49152 0 CLOSE_WAIT
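The same lookup can be scripted when there are many descriptors to check. The sketch below is illustrative only: the helper names `socket_endpoints` and `find_connection` are hypothetical, and the patterns assume pfiles and netstat output formatted exactly like the excerpts above (AF_INET sockets, addresses written as `ip.port`).

```python
import re

# "294: S_IFSOCK mode:0666 ..." starts the entry for descriptor 294.
FD_LINE = re.compile(r"^\s*(\d+):\s+S_IF")
# "sockname: AF_INET 127.0.0.1 port: 9710" and the matching peername line.
NAME_LINE = re.compile(r"^\s*(sockname|peername):\s+AF_INET\s+(\S+)\s+port:\s+(\d+)")

def socket_endpoints(pfiles_lines):
    """Map descriptor -> {'sockname': 'ip.port', 'peername': 'ip.port'}."""
    endpoints = {}
    current_fd = None
    for line in pfiles_lines:
        fd_match = FD_LINE.match(line)
        if fd_match:
            current_fd = int(fd_match.group(1))
            continue
        name_match = NAME_LINE.match(line)
        if name_match and current_fd is not None:
            kind, host, port = name_match.groups()
            endpoints.setdefault(current_fd, {})[kind] = f"{host}.{port}"
    return endpoints

def find_connection(endpoint, netstat_lines):
    """Return the netstat line whose first two columns match the endpoint pair."""
    wanted = [endpoint.get("sockname"), endpoint.get("peername")]
    for line in netstat_lines:
        if line.split()[:2] == wanted:
            return line
    return None

# Sample lines copied from the pfiles and netstat output above.
pfiles_sample = [
    "2160: /export/home/QAtest/572bliss/web/bin/myserver",
    "294: S_IFSOCK mode:0666 dev:259,0 ino:35150 uid:0 gid:0 size:0",
    "O_RDWR",
    "sockname: AF_INET 127.0.0.1 port: 9710",
    "peername: AF_INET 127.0.0.1 port: 59209",
]
netstat_lines = [
    "127.0.0.1.9810 127.0.0.1.57701 49152 0 49152 0 CLOSE_WAIT",
    "127.0.0.1.9710 127.0.0.1.59209 49152 0 49152 0 CLOSE_WAIT",
]

print(find_connection(socket_endpoints(pfiles_sample)[294], netstat_lines))
```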
A closer look at the other socket descriptors from the truss output indicated that the server is continuously trying to read data from connections that are in the CLOSE_WAIT state. Here are the corresponding statistics for TCP:
% netstat -s
TCP tcpRtoAlgorithm = 4 tcpRtoMin = 400
tcpRtoMax = 60000 tcpMaxConn = -1
tcpActiveOpens =4593219 tcpPassiveOpens =2259153
tcpAttemptFails =4036987 tcpEstabResets = 20254
tcpCurrEstab = 75 tcpOutSegs =1264739589
tcpOutDataSegs =645683085 tcpOutDataBytes =1480883468
tcpRetransSegs =682053 tcpRetransBytes =759804724
tcpOutAck =618848538 tcpOutAckDelayed =40226142
tcpOutUrg = 351 tcpOutWinUpdate =155203
tcpOutWinProbe = 3278 tcpOutControl =18622247
tcpOutRsts =8970930 tcpOutFastRetrans = 60772
tcpInSegs =1622143125
tcpInAckSegs =443838358 tcpInAckBytes =1459391481
tcpInDupAck =3254927 tcpInAckUnsent = 0
tcpInInorderSegs =1462796453 tcpInInorderBytes =550228772
tcpInUnorderSegs = 12095 tcpInUnorderBytes =10680481
tcpInDupSegs = 60814 tcpInDupBytes =30969565
tcpInPartDupSegs = 29 tcpInPartDupBytes = 19498
tcpInPastWinSegs = 66 tcpInPastWinBytes =102280302
tcpInWinProbe = 2142 tcpInWinUpdate = 3092
tcpInClosed = 1218 tcpRttNoUpdate =391989
tcpRttUpdate =441925010 tcpTimRetrans =185795
tcpTimRetransDrop = 456 tcpTimKeepalive = 8077
tcpTimKeepaliveProbe= 3054 tcpTimKeepaliveDrop = 0
tcpListenDrop = 18265 tcpListenDropQ0 = 0
tcpHalfOpenDrop = 0 tcpOutSackRetrans =255744
Apparently one end of the connection (the server, in this scenario) ignored the 0-length read (EOF) and kept trying to read data from the connection as if it were still a duplex connection.
But how do you check whether the other end has really closed the connection?
According to the man page of recv:
Upon successful completion, recv() returns the length of the message in bytes. If no messages are available to be received and the peer has performed an orderly shutdown, recv() returns 0. Otherwise, -1 is returned and errno is set to indicate the error.
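A minimal sketch of a read loop that honors that rule is shown below. It is written in Python rather than C for brevity (Python's `socket.recv()` returns an empty bytes object where the C call returns 0), it is not the actual server code, and the names `drain_until_eof` and `demo` are invented for this example. The demo uses a throwaway loopback connection on an ephemeral port.

```python
import socket

def drain_until_eof(conn, bufsize=4096):
    """Read from conn until recv() reports EOF, then close the socket."""
    received = bytearray()
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:
            # Empty result: the peer performed an orderly shutdown.
            # Stop reading instead of calling recv() again and again.
            break
        received.extend(chunk)
    conn.close()  # lets the local end leave CLOSE_WAIT
    return bytes(received)

def demo():
    # Throwaway loopback listener on an ephemeral port (assumption for the demo).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server_side, _ = listener.accept()
    server_side.settimeout(5)
    listener.close()

    client.sendall(b"hello")
    client.close()  # peer closes: the server side will see EOF after the data

    data = drain_until_eof(server_side)
    return data, server_side.fileno()  # fileno() is -1 once the socket is closed

if __name__ == "__main__":
    print(demo())
```

The important part is the `break`: once EOF has been seen, there is nothing more to read, so the loop must end and the descriptor must be released.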
So a simple check on the return value of recv() would do. Just to make sure that the other end really intended to close the connection, and is not sending null strings (very unlikely though), try this: after a series of EOFs (i.e., return value 0) from recv(), try to write some data to the socket. If the peer has fully closed its end, the write provokes a reset from it, which shows up as a "connection reset" (ECONNRESET) error; a subsequent (second) write results in a "broken pipe" (EPIPE) error. (The exact error, and the call on which it surfaces, can vary with platform and timing.) Then it is safe to assume that the other end has closed the connection.
I suggested that the responsible engineer check the return value of recv() and close the connection when it is safe to do so (see above).
About CLOSE_WAIT state:
CLOSE_WAIT state means the other end of the connection has been closed while the local end is still waiting for the application to close. That's normal. But an indefinite CLOSE_WAIT state normally indicates an application-level bug. TCP connections move to the CLOSE_WAIT state from the ESTABLISHED state after receiving a FIN from the remote system but before close has been called by the local application.
The CLOSE_WAIT state signifies that the endpoint has received a FIN from the peer, indicating that the peer has finished writing, i.e., it has no more data to send. This is indicated by a 0-length read on the input. The connection is now half-closed, or a simplex (one-way) connection; the receiver of the FIN still has the option of writing more data. The state can persist indefinitely, as it is a perfectly valid, synchronized TCP state. The peer should be in FIN_WAIT_2 (i.e., sent FIN, received ACK, waiting for FIN). It is the application's fault only if it ignores the EOF (0-length read) and carries on as if the connection were still a duplex connection.
Note that an application that only intends to receive data, and not send any, might close its end of the connection, which leaves the other end in CLOSE_WAIT until the process at that end is done sending data and issues a close. (But that's not the case in this scenario.)
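The half-closed case can be tried out on a loopback connection. The sketch below is only an illustration under my own assumptions (Python instead of C, an ephemeral loopback port, and the invented names `read_until_eof` and `half_close_demo`): the client shuts down just its sending side, the server sees EOF, and the server can still write a reply before closing.

```python
import socket

def read_until_eof(sock, bufsize=4096):
    """Collect data until recv() returns an empty result (EOF)."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

def half_close_demo():
    # Throwaway loopback listener on an ephemeral port (assumption for the demo).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    server_side, _ = listener.accept()
    server_side.settimeout(5)
    listener.close()

    client.sendall(b"request")
    client.shutdown(socket.SHUT_WR)        # client sends FIN, keeps reading

    request = read_until_eof(server_side)  # server sees data, then EOF
    server_side.sendall(b"reply")          # writing is still allowed after EOF
    server_side.close()                    # server's own FIN ends the connection

    reply = read_until_eof(client)         # client gets the reply, then EOF
    client.close()
    return request, reply

if __name__ == "__main__":
    print(half_close_demo())
```

Between the client's `shutdown()` and the server's `close()`, the server side of this connection is in the state described above: it has received a FIN but has not yet closed.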
State diagram for the closing phase of a TCP connection:
Server Client
| Fin |
CLOSE_WAIT|<-------------- | FIN_WAIT_1
| |
| Ack |
|--------------->| FIN_WAIT_2
| |
| |
| |
| |
| |
| |
| Fin |
LAST_ACK |--------------->| TIME_WAIT
| |
| Ack |
|<-------------- |
CLOSED | |
| |
Reference:
Sun Alert document:
TCP: Why do I have tcp connections in the CLOSE_WAIT state?
Suggested reading:
RFC 793: Transmission Control Protocol