

Thursday, May 12, 2005
 
CPU hog with connections in CLOSE_WAIT

A couple of days back I got a call to look into an issue at our partner's site, where one server process was hogging all the processing power with absolutely no load on the machine. The server process runs on Solaris 9.

% prstat 1 1
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  2160 QAtest    941M  886M cpu0     0    0  80:03:57  99% myserver/41
 28352 patrol   7888K 6032K sleep   59    0   4:49:37 0.1% bgscollect/1
 24720 QAtest   1872K 1656K cpu3    59    0   0:00:00 0.0% prstat/1
    59 root     4064K 3288K sleep   59    0   0:27:56 0.0% picld/6
  2132 QAtest    478M  431M sleep   59    0   0:15:45 0.0% someserver.exe/901

I started off with my favorite tool, truss, and found that the recv() system call was being called tons of times with no corresponding send().

% truss -c -p 2160
^C
syscall               seconds   calls  errors
time                     .001     115
lwp_park                 .001      51      24
lwp_unpark               .000      23
poll                     .002      34
recv                   61.554 2512863
                     --------  ------    ----
sys totals:            61.561 2513086      24
usr time:              12.008
elapsed:               68.350
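
The summary above is from truss -c; the per-call trace in /tmp/truss.log quoted below was presumably captured with something along these lines (the exact invocation is not in my notes):

% truss -o /tmp/truss.log -p 2160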

Interestingly, the return value of every recv() call is 0 (EOF). A return value of 0 is an indication that the other end has nothing more to write and is ready to close the socket (connection).

% head /tmp/truss.log
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0
2160/216: recv(294, 0x277C9410, 32768, 0) = 0
2160/222: recv(59, 0x1F4CB410, 32768, 0) = 0
2160/216: recv(294, 0x277C9410, 32768, 0) = 0

A typical recv() call would look like this:
recv(55, 0x05CCB010, 4096, 0)                   = 2958
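
Putting the truss data together, the server looks like it is sitting in a read loop that never checks for end-of-file. Below is a minimal, hypothetical sketch of that kind of buggy loop; it is an assumption about what the server code probably does, not an excerpt from it (handle_request() is a placeholder):

/* Hypothetical sketch of the busy-loop pattern implied by the truss output:
 * recv() keeps returning 0 (the peer has sent a FIN), but the loop never
 * breaks out, so the thread spins on recv() and pins a CPU. */
#include <sys/types.h>
#include <sys/socket.h>

/* placeholder for whatever the real server does with the data */
static void handle_request(const char *buf, ssize_t len) { (void)buf; (void)len; }

static void serve_connection(int sockfd)
{
    char buf[32768];

    for (;;) {
        ssize_t n = recv(sockfd, buf, sizeof (buf), 0);
        if (n > 0)
            handle_request(buf, n);
        /* BUG: n == 0 (EOF, peer closed) and n == -1 (error) fall through,
         * so recv() is called again immediately, forever. */
    }
}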

Then I collected the network statistics and found quite a number of connections in the CLOSE_WAIT state:

% netstat -an
...
127.0.0.1.54356 127.0.0.1.9810 49152 0 49152 0 ESTABLISHED
127.0.0.1.9810 127.0.0.1.54356 49152 0 49152 0 ESTABLISHED
127.0.0.1.54687 127.0.0.1.9810 49152 0 49152 0 ESTABLISHED
127.0.0.1.9810 127.0.0.1.54687 49152 0 49152 0 ESTABLISHED
...
127.0.0.1.9710 127.0.0.1.55830 49152 0 49152 0 CLOSE_WAIT
127.0.0.1.9810 127.0.0.1.57701 49152 0 49152 0 CLOSE_WAIT
127.0.0.1.9710 127.0.0.1.59209 49152 0 49152 0 CLOSE_WAIT
127.0.0.1.9810 127.0.0.1.60694 49152 0 49152 0 CLOSE_WAIT
127.0.0.1.9810 127.0.0.1.61133 49152 0 49152 0 CLOSE_WAIT
127.0.0.1.9810 127.0.0.1.61136 49152 0 49152 0 CLOSE_WAIT
...

(I later realized that these half-closed socket connections had been lying around for more than two days.)
2160/216:       recv(294, 0x277C9410, 32768, 0)                 = 0 <- from truss

The next step was to find out the state of the network connection with socket id 294. The pfiles utility of Solaris reports information about all of the open files in each process. It makes sense to use this utility, as a socket descriptor is nothing but a file descriptor. (On UNIX, everything is mapped to a file, including raw devices.)

% pfiles 2160
2160: /export/home/QAtest/572bliss/web/bin/myserver
Current rlimit: 1024 file descriptors
...
294: S_IFSOCK mode:0666 dev:259,0 ino:35150 uid:0 gid:0 size:0
O_RDWR
sockname: AF_INET 127.0.0.1 port: 9710
peername: AF_INET 127.0.0.1 port: 59209

Now it is fairly easy to identify the connection using the port numbers reported in the pfiles output:

% netstat -an | grep 59209
127.0.0.1.9710 127.0.0.1.59209 49152 0 49152 0 CLOSE_WAIT
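
For what it's worth, the same mapping from a socket descriptor to its endpoints can be done from inside the process with getsockname() and getpeername(). A minimal sketch, assuming AF_INET sockets like the ones in the pfiles output above (print_endpoints() is just my illustration):

/* Sketch: print the local and remote endpoints of a connected socket,
 * i.e. the same sockname/peername information that pfiles reports. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

static void print_endpoints(int sockfd)
{
    struct sockaddr_in local, peer;
    socklen_t len;

    len = sizeof (local);
    if (getsockname(sockfd, (struct sockaddr *)&local, &len) == 0)
        printf("sockname: AF_INET %s  port: %d\n",
            inet_ntoa(local.sin_addr), (int)ntohs(local.sin_port));

    len = sizeof (peer);
    if (getpeername(sockfd, (struct sockaddr *)&peer, &len) == 0)
        printf("peername: AF_INET %s  port: %d\n",
            inet_ntoa(peer.sin_addr), (int)ntohs(peer.sin_port));
}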

A closer look at the other socket ids from truss indicated that the server was continuously trying to read data from connections that were in the CLOSE_WAIT state. Here are the corresponding TCP statistics:
% netstat -s

TCP tcpRtoAlgorithm = 4 tcpRtoMin = 400
tcpRtoMax = 60000 tcpMaxConn = -1
tcpActiveOpens =4593219 tcpPassiveOpens =2259153
tcpAttemptFails =4036987 tcpEstabResets = 20254
tcpCurrEstab = 75 tcpOutSegs =1264739589
tcpOutDataSegs =645683085 tcpOutDataBytes =1480883468
tcpRetransSegs =682053 tcpRetransBytes =759804724
tcpOutAck =618848538 tcpOutAckDelayed =40226142
tcpOutUrg = 351 tcpOutWinUpdate =155203
tcpOutWinProbe = 3278 tcpOutControl =18622247
tcpOutRsts =8970930 tcpOutFastRetrans = 60772
tcpInSegs =1622143125
tcpInAckSegs =443838358 tcpInAckBytes =1459391481
tcpInDupAck =3254927 tcpInAckUnsent = 0
tcpInInorderSegs =1462796453 tcpInInorderBytes =550228772
tcpInUnorderSegs = 12095 tcpInUnorderBytes =10680481
tcpInDupSegs = 60814 tcpInDupBytes =30969565
tcpInPartDupSegs = 29 tcpInPartDupBytes = 19498
tcpInPastWinSegs = 66 tcpInPastWinBytes =102280302
tcpInWinProbe = 2142 tcpInWinUpdate = 3092
tcpInClosed = 1218 tcpRttNoUpdate =391989
tcpRttUpdate =441925010 tcpTimRetrans =185795
tcpTimRetransDrop = 456 tcpTimKeepalive = 8077
tcpTimKeepaliveProbe= 3054 tcpTimKeepaliveDrop = 0
tcpListenDrop = 18265 tcpListenDropQ0 = 0
tcpHalfOpenDrop = 0 tcpOutSackRetrans =255744


Apparently one end of the connection (the server, in this scenario) ignored the 0-length read (EOF) and kept trying to read data from the connection as if it were still a duplex connection.

But how do we check whether the other end has really closed the connection?
According to the man page of recv():
Upon successful completion, recv() returns the length of the message in bytes. If no messages are available to be received and the peer has performed an orderly shutdown, recv() returns 0. Otherwise, -1 is returned and errno is set to indicate the error.

So a simple check on the return value of recv() would do. Just to make sure that the other end really intends to close the connection and is not merely sending null strings (very unlikely, though), try this: after a series of EOFs (i.e., return value 0) from recv(), try to write some data to the socket. It will result in a "connection reset" (ECONNRESET) error, and a subsequent (second) write results in a "broken pipe" (EPIPE) error. Then it is safe to assume that the other end has closed the connection.

I suggested that the responsible engineer check the return value of recv() and close the connection when it is safe to do so (see above).
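
A minimal sketch of what that handling could look like is shown below. This is my illustration of the suggestion, not the actual server code; the probe write described above could be added just before the close() if one wants to be doubly sure:

/* Sketch of the suggested handling: treat a 0-byte recv() as EOF and close
 * the descriptor instead of calling recv() again in a tight loop. */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

static void serve_connection(int sockfd)
{
    char buf[32768];

    for (;;) {
        ssize_t n = recv(sockfd, buf, sizeof (buf), 0);
        if (n > 0) {
            /* process the received data as before */
            continue;
        }
        if (n == 0) {
            /* EOF: the peer sent a FIN and has nothing more to send;
             * close our end so the socket does not linger in CLOSE_WAIT */
            close(sockfd);
            break;
        }
        if (errno == EINTR)
            continue;           /* interrupted by a signal; just retry */
        perror("recv");         /* a real error */
        close(sockfd);
        break;
    }
}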

About CLOSE_WAIT state:

The CLOSE_WAIT state means the other end of the connection has been closed while the local end is still waiting for the application to close it. That is normal, but an indefinite CLOSE_WAIT state usually indicates an application-level bug. A TCP connection moves from the ESTABLISHED state to the CLOSE_WAIT state after receiving a FIN from the remote system, but before close() has been called by the local application.

The CLOSE_WAIT state signifies that the endpoint has received a FIN from the peer, indicating that the peer has finished writing, i.e., it has no more data to send. This is indicated by a 0-length read on the input. The connection is now half-closed, or a simplex (one-way) connection: the receiver of the FIN still has the option of writing more data. The state can persist indefinitely, as it is a perfectly valid, synchronized TCP state. The peer should be in FIN_WAIT_2 (i.e., it sent a FIN, received the ACK, and is waiting for a FIN). It is only the application's fault if it ignores the EOF (the 0-length read) and carries on as if the connection were still a duplex connection.

Note that an application that intends only to receive data and not send any might close (half-close) its end of the connection, which leaves the other end in CLOSE_WAIT until the process at that end is done sending data and issues a close. (But that is not the case in this scenario.)
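
To make that half-close concrete, here is a minimal, hypothetical sketch of such a receive-only side: it shuts down its write direction (which sends the FIN and puts the peer into CLOSE_WAIT) and then keeps reading until the peer is done and closes its own end. receive_only() is just my illustration, not code from the server in question.

/* Sketch of a deliberate half-close from the side that has nothing to send. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

static void receive_only(int sockfd)
{
    char buf[4096];
    ssize_t n;

    /* nothing to send: half-close the write direction; this sends a FIN
     * and leaves the peer in CLOSE_WAIT while it finishes sending */
    if (shutdown(sockfd, SHUT_WR) != 0) {
        perror("shutdown");
        return;
    }

    /* the read side is still open: keep reading until the peer is done
     * and closes its end, at which point recv() returns 0 */
    while ((n = recv(sockfd, buf, sizeof (buf), 0)) > 0)
        ;                       /* process the data here */

    close(sockfd);
}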

State diagram for the closing phase of a TCP connection:

    Server                Client
          |      FIN        |
CLOSE_WAIT|<----------------| FIN_WAIT_1
          |                 |
          |      ACK        |
          |---------------->| FIN_WAIT_2
          |                 |
          |                 |
          |      FIN        |
  LAST_ACK|---------------->| TIME_WAIT
          |                 |
          |      ACK        |
          |<----------------|
    CLOSED|                 |



Reference:
Sun Alert document:
TCP: Why do I have tcp connections in the CLOSE_WAIT state?

Suggested reading:
RFC 793, Transmission Control Protocol

