Mandalika's scratchpad: 05.05


Thursday, May 26, 2005

Solaris: 32-bits , fopen() and max number of open files

Last Friday I was assigned to look into an issue where an application was no longer able to write to files once it had been up for more than a week. It is a 32-bit application running on Solaris (SPARC platform), and the error message says "too many open files". With little effort, we found that all of those errors came from the application's calls to fopen().

A little background on stdio's fopen():

fopen() is part of the stdio API. For a 32-bit application, the stdio library's FILE structure represents the underlying file descriptor as an unsigned char (8 bits), limiting the range of file descriptors that can be opened as FILEs to 0-255 inclusive.
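To make the 8-bit restriction concrete, here is a minimal sketch. The struct toy_file and both helper functions are hypothetical stand-ins invented for illustration; this is not the real Solaris FILE layout, only a model of a descriptor field that is one unsigned char wide.

```c
#include <limits.h>

/* Hypothetical stand-in for a 32-bit stdio FILE (NOT the real Solaris layout). */
struct toy_file {
        unsigned char fd;       /* 8 bits: can only hold 0..UCHAR_MAX (255) */
};

/* Returns 1 if the descriptor fits in the 8-bit field, 0 otherwise. */
int fits_in_toy_file(int fd)
{
        return fd >= 0 && fd <= UCHAR_MAX;
}

/* Stores fd in the toy structure; returns 0 on success,
 * -1 if the descriptor cannot be represented in 8 bits. */
int toy_file_set_fd(struct toy_file *f, int fd)
{
        if (!fits_in_toy_file(fd))
                return -1;
        f->fd = (unsigned char)fd;
        return 0;
}
```

With this model, descriptor 255 can be stored but descriptor 256 is rejected, which is the same boundary the 32-bit fopen() runs into.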

A commonly known problem (perhaps a "fact") is that when the 32-bit stdio is used in large applications on Solaris, the 255 limit on stdio file descriptors is frequently reached. The operating system allocates file descriptors in numerical order starting at 0, always handing out the lowest number available. Descriptors 0, 1 and 2 are opened for every process at startup as stdin, stdout and stderr.

The open() system call can also be used to open files from a C program. Both open() and fopen() use file descriptors that are taken from the total number of file descriptors allowed by the environment. The system also allocates descriptors from the same pool for calls to popen(), socket(), accept() and any other call that returns a descriptor. So if the application makes numerous calls to these functions and does not close the descriptors promptly, it is very likely that a call to fopen() will fail well before the application has the 253 (256 - 3 = 253) stdio files open that the 8-bit descriptor field would otherwise permit.

However, if the program uses open/popen/socket/accept exclusively, it will be able to open as many files/pipes/sockets/connections as the current soft limit allows. There are actually two environmental limits: the soft limit and the hard limit. The soft limit is the number of files a process can open by default. The hard limit is the maximum number of files a process can open if it increases the soft limit.
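The two limits can be inspected from C with getrlimit() and RLIMIT_NOFILE. The following is a minimal sketch; the function names get_fd_limits and raise_fd_soft_limit are my own, and whether raising the soft limit all the way to the hard limit succeeds depends on the platform.

```c
#include <sys/resource.h>

/* Reads the soft and hard limits on open file descriptors.
 * Returns 0 on success, -1 on error (errno set by getrlimit). */
int get_fd_limits(rlim_t *soft, rlim_t *hard)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
                return -1;
        *soft = rl.rlim_cur;    /* limit in effect right now */
        *hard = rl.rlim_max;    /* ceiling for the soft limit */
        return 0;
}

/* Tries to raise the soft limit up to the hard limit.
 * Returns 0 on success, -1 on error. */
int raise_fd_soft_limit(void)
{
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
                return -1;
        rl.rlim_cur = rl.rlim_max;
        return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Note that raising the soft limit does nothing for the 32-bit fopen() problem: the 8-bit descriptor field in FILE is the bottleneck there, not the resource limit.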

The following C program illustrates the limitation of the number of open files with fopen():


% cat fopen.c
#include <stdio.h>
#include <errno.h>

#define MAXFOPEN 275

int main() {
        FILE *fps[MAXFOPEN];
        char fname[15];
        int i, j;

        /*
        * Test total number of fopen()'s which can be completed
        */

        for (i = 0; i < MAXFOPEN; i++) {
                sprintf(fname, "fopen_%d", i);
                if ((fps[i] = fopen(fname, "w+")) == NULL) {
                        perror("fopen failed");
                        break;
                }
        }

        printf("fopen() completes: %d\n", i);

        /*
        * Close the file descriptors
        */

        for (j =0; j < i; j++) {
                if (fclose(fps[j]) == EOF) {
                        perror("fclose failed");
                }
        }
        return (0);
}
% cc -o fopen fopen.c

% file fopen
fopen:          ELF 32-bit MSB executable SPARC32PLUS Version 1, V8+ Required, 
                   dynamically linked, not stripped

% ./fopen
fopen failed: Too many open files
fopen() completes: 253

How to resolve this issue:
Make it a 64-bit binary; it will allow the program to have 65536 open files

% cc -xarch=v9 -o fopen fopen.c

% file fopen
fopen:          ELF 64-bit MSB executable SPARCV9 Version 1, dynamically linked, not 
                   stripped

% ./fopen
fopen() completes: 275

As we can see, fopen() was able to overcome the 253 open files limitation with a 64-bit executable.

Note:
To run 64-bit binaries, both the processor and the OS must support them.

Since I cannot recompile the code at the customer site, the option of creating 64-bit binaries has been ruled out.

A closer look at the output of lsof (LiSt Open Files) gave me a clue that most of the open files are actually TCP sockets/connections.


% grep TCP openfiles.log
app 6913 giri 11u  IPv4 0x30013e4d3c0       0t0 TCP *:49152 (LISTEN)
app 6913 giri 12u  IPv4 0x30011722200       0t0 TCP *:49153 (LISTEN)
app 6913 giri 13u  IPv4 0x300126e8680       0t0 TCP *:49154 (LISTEN)
app 6913 giri 14u  IPv4 0x30010faf300       0t0 TCP *:49155 (LISTEN)
app 6913 giri 15u  IPv4 0x300082c2d00       0t0 TCP *:49156 (LISTEN)
app 6913 giri 16u  IPv4 0x30011e3a180       0t0 TCP *:49157 (LISTEN)
app 6913 giri 17u  IPv4 0x30018e36700       0t0 TCP *:1571 (LISTEN)
app 6913 giri 18u  IPv4 0x30009bad900       0t0 TCP *:49158 (LISTEN)
app 6913 giri 43u  IPv4 0x3001aa0a700       0t0 TCP as7:44232->as7:49156 (ESTABLISHED)
app 6913 giri 46u  IPv4 0x30011cff800       0t0 TCP as7:1571->as3:27025 (ESTABLISHED)
app 6913 giri 49u  IPv4 0x3000ce48d40       0t0 TCP as7:1571->as3:27026 (ESTABLISHED)
app 6913 giri 51u  IPv4 0x300199d3980  0t722051 TCP as7:44238->repo:1521 (ESTABLISHED)
app 6913 giri 52u  IPv4 0x30014d40c40  0t793865 TCP as7:44239->repo:1521 (ESTABLISHED)
app 6913 giri 55u  IPv4 0x300197db340       0t0 TCP as7:1571->as3:27027 (ESTABLISHED)
app 6913 giri 56u  IPv4 0x30011b5f800  0t675177 TCP as7:44243->repo:1521 (ESTABLISHED)
app 6913 giri 57u  IPv4 0x30012853880       0t0 TCP as7:1571->as3:27028 (ESTABLISHED)
app 6913 giri 58u  IPv4 0x30011d94d00  0t723190 TCP as7:44244->repo:1521 (ESTABLISHED)
app 6913 giri 62u  IPv4 0x30016d5b240       0t0 TCP as7:1571->as3:27029 (ESTABLISHED)
app 6913 giri 63u  IPv4 0x3001126d9c0  0t575246 TCP as7:44247->repo:1521 (ESTABLISHED)
app 6913 giri 64u  IPv4 0x3000a825900       0t0 TCP as7:1571->as3:27030 (ESTABLISHED)
...
...
app 6913 giri 250u IPv4 0x300139899c0       0t0 TCP as7:1571->as3:27076 (ESTABLISHED)
app 6913 giri 251u IPv4 0x30017fc4700       0t0 TCP as7:1571->as3:27077 (ESTABLISHED)
app 6913 giri 252u IPv4 0x30011c3b900  0t403370 TCP as7:44390->repo:1521 (ESTABLISHED)
app 6913 giri 253u IPv4 0x3000cd32c40  0t445290 TCP as7:44391->repo:1521 (ESTABLISHED)
app 6913 giri 257u IPv4 0x30017f640c0       0t0 TCP as7:1571->as3:27078 (ESTABLISHED)
app 6913 giri 258u IPv4 0x300141f1280       0t0 TCP as7:1571->as3:27079 (ESTABLISHED)

So, to find the actual number of calls to fopen(), I created a simple interposing library with a single interface that interposes on the actual fopen() function.


% cat logfopen.c

#include <dlfcn.h>
#include <stdio.h>
#include <stdarg.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/errno.h>
#include <thread.h>
#include <synch.h>
#include <fcntl.h>

FILE *fopen(const char *filename, const char *mode) {
        FILE *fp;
        /* pointer to the real fopen(), resolved on first use */
        static FILE *(*func)(const char *, const char *);

        if (!func) {
                func = (FILE *(*)(const char *, const char *)) dlsym(RTLD_NEXT, "fopen");
                if (func == NULL) {
                        (void) fprintf(stderr, "dlsym(): %s\n", dlerror());
                        return (NULL);
                }
        }

        /* forward the call to the real fopen() and log the outcome */
        fp = func(filename, mode);
        if (fp != NULL) {
                fprintf(stderr, "\nfopen(): fd = %d filename = %s mode = %s",
                       fileno(fp), filename, mode);
        } else {
                fprintf(stderr, "\nfopen() failed; returned NULL. Tried to open %s "
                       "with mode: %s", filename, mode);
        }
        return (fp);
}

Interestingly, the interposer caught only two calls to fopen() during a 10-minute real-world simulation run of the application. This was confirmed by running the truss tool.


% grep fopen stderrout.log
 fopen(): fd = 32 filename = /export/home/oracle/network/names/.sdns.ora mode = r
/export/home/C/liblogfopen.so:fopen+0x3c
 fopen(): fd = 32 filename = /export/home/oracle/network/admin/tnsnames.ora mode = r
/export/home/C/liblogfopen.so:fopen+0x3c

% grep fopen truss.log
6913/14@14:     -> libc:fopen(0xe48f6178, 0xe48f627c, 0x1, 0x61)
6913/14@14:     <- libc:fopen() = 0
6913/14@14:     -> libc:fopen(0xe48f8ad0, 0xe48f8bd4, 0x0, 0x61)
6913/14@14:     <- libc:fopen() = 0xfdae884c

This observation made my job a little simpler. To resolve this particular customer issue, the application just needs to reserve the low-numbered file descriptors (<= 255) for use by fopen().

How to reserve the file descriptors?

Use the file control function fcntl() with the F_DUPFD command to obtain the lowest-numbered file descriptor greater than or equal to 256 that is not already associated with an open file. fcntl() takes the OS-assigned file descriptor and returns a new file descriptor greater than or equal to the value passed as the third argument. That is, once fcntl() successfully returns a new file descriptor, we have two file descriptors pointing to the same open file. Since our intention is to leave as many low-numbered descriptors as possible available for fopen(), and since we don't need two descriptors for the same file, we can safely close the OS-assigned file descriptor.
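The duplicate-then-close step can be sketched as a small standalone helper. The name relocate_fd and its min_fd parameter are hypothetical, chosen for illustration; the only system interfaces used are fcntl() with F_DUPFD and close().

```c
#include <fcntl.h>
#include <unistd.h>

/* Moves fd to the lowest free descriptor >= min_fd and closes the original.
 * Returns the descriptor the caller should use from now on:
 *   - the new descriptor on success,
 *   - the original fd if it is invalid, already >= min_fd,
 *     or if fcntl() fails (for example, because the soft limit is too low). */
int relocate_fd(int fd, int min_fd)
{
        int newfd;

        if (fd < 0 || fd >= min_fd)
                return fd;

        /* The duplicate refers to the same open file. Note that F_DUPFD
         * leaves the close-on-exec flag cleared on the new descriptor. */
        newfd = fcntl(fd, F_DUPFD, min_fd);
        if (newfd == -1)
                return fd;

        close(fd);      /* give the low-numbered descriptor back */
        return newfd;
}
```

A caller would wrap descriptor-returning calls with it, for example sd = relocate_fd(socket(domain, type, protocol), 256). For this to work with 256, the soft descriptor limit must be greater than 256.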

In fact, database management systems like Oracle, Sybase, Informix addressed the fopen issue by employing this technique of reserving the low numbered file descriptors for exclusive use by stdio routines.

As changing the application code is not feasible (and not possible), this can be done very easily with an interposing library that provides interfaces for open(), popen(), socket() and accept(). (A brief introduction to library interposition is available at: Solaris: hijacking a function call (interposing)). The interfaces of the interposing library catch all calls to open(), popen(), socket(), accept() etc. before the actual implementation receives them, duplicate the resulting file descriptor with the help of fcntl() to get a new file descriptor that is >= 256, and return the OS-assigned file descriptor to the pool of available file descriptors, i.e., to the OS.

Interposing code for socket():


% cat fopenfix.c
#include <stdio.h>
#include <dlfcn.h>
#include <ucontext.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

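int socket(int domain, int type, int protocol) {
        int sd, newsd;
        /* pointer to the real socket(), resolved on first use */
        static int (*func)(int, int, int);

        if (!func) {
                func = (int (*)(int, int, int)) dlsym(RTLD_NEXT, "socket");
                if (func == NULL) {
                        fprintf(stderr, "dlsym(): %s\n", dlerror());
                        return (-1);
                }
        }

        sd = func(domain, type, protocol);
        if (sd == -1 || sd >= 256) {
                /* the call failed, or the descriptor is already out of stdio's range */
                return (sd);
        }

        /* duplicate onto the lowest free descriptor >= 256 */
        newsd = fcntl(sd, F_DUPFD, 256);
        if (newsd == -1) {
                fprintf(stderr, "\nfcntl() failed. Cannot return %d to OS", sd);
                return (sd);
        }

        /* release the low-numbered descriptor for use by fopen() */
        close(sd);
        return (newsd);
}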

The same functionality has to be replicated for the other interfaces:

int open(const char *path, int oflag, ...);
int accept(int s, struct sockaddr *addr, void *addrlen);
...

After preloading the interposing library, all the file descriptors returned by open, popen, socket, accept etc. were mapped to descriptors >= 256, leaving enough room for fopen() to have nearly 253 open files.
_______________________

Because of issues like this, it is recommended to use open() and its family of functions, for file handling instead of stdio's fopen() and its subordinates. It is a good practice for the developers to think about problems like this and handle them properly (just like Oracle/Informix/.. did), during the early stages of the design/development of the application.

Posted by: Giri Mandalika. # 5:57 PM 3 comments.

Friday, May 20, 2005

Behavior of Sun C++ Compiler While Compiling Templates

When the C++ compiler finds a template declaration in a header (.h) file, it needs the template's definition only when the template is actually used. Looking the definition up on demand allows compilations to proceed faster (on average), because template definitions that are not used don't need to be processed. Hence the compiler automatically searches for a .cc, .C, .cpp, etc. file with the same name as the header. If such a file exists, it is automatically included in the current compilation. Note that it is not separately compiled. This compiler behavior means that some source code organizations won't work with our compiler.

For example, the compilation of the following driver program fails with a "Multiple declaration" error for a variable. The driver program calls a function template, multiply, which in turn uses a class template, Array. The definitions of multiply and Array are in different source (.cpp) files. Since multiply depends on Array, the compiler tries to include both source files in the compilation unit, hence the failure.


% cat array.h
#ifndef _ARRAY_H_
#define _ARRAY_H_

const int ArraySize = 20;
template <class Type> class Array {

private:
        Type* data;
        int size;
public:
        Array(int sz=ArraySize);
        int GetSize();
};

#endif // _ARRAY_H_

% cat array.cpp
static const char file_id [ ] = "$Header: array.cpp 1 04/05/05 1:35p Giri $";

#include "array.h"

template <class Type> Array<Type>::Array(int sz) {
        size = sz;
        data = new Type[size];
}

template <class Type> int Array<Type>::GetSize() {
        return size;
}

% cat multiply.h
#include "array.h"
int AnyNumber;

template <class Number>

Number multiply(Number original);

% cat multiply.cpp
static const char file_id [ ] = "$Header: multiply.cpp 1 04/05/05 1:35p Giri $" ;

template <class Number>

Number multiply( Number original ) {
        Array<int> IntArray;
        int size = IntArray.GetSize();

        return (size * original);
}

% cat driver.cpp
#include "multiply.h"
#include <stdio.h>

int main( ) {
        printf("\n ** %d **\n", multiply(50));
}

% CC -o driver driver.cpp
"array.cpp", line 1: Error: file_id is initialized twice.
"array.cpp", line 1: Error: Multiple declaration for file_id.
2 Error(s) detected.

% truss -f -o truss.log CC -o driver driver.cpp
% cat truss.log | egrep "multiply|array"
24813:  open("multiply.h", O_RDONLY)                    = 6
24813:  open("multiply.cpp", O_RDONLY)                  = 5
24813:  open("array.h", O_RDONLY)                       = 6
24813:  open("array.cpp", O_RDONLY)                     = 5

Note that both multiply.cpp and array.cpp are syntactically correct; the problem shows up only during the compilation of the multiply routine. This compiler behavior is documented in the Compiling Templates chapter of the Sun C++ User's Guide.

The Sun C++ User's Guide suggests employing the "definitions separate" template compilation model. This model can best be described as follows: if file x.h has any template declarations, a file called x.cc, x.C, x.cpp, etc. must contain the definitions of those templates, and nothing else; no #include directives, and no definitions of anything other than the templates declared in x.h.

To comply with the definitions separate template compilation model, array.cpp and multiply.cpp files can be modified as follows:


% cat array.cpp
template <class Type> Array<Type>::Array(int sz) {
        size = sz;
        data = new Type[size];
}

template <class Type> int Array<Type>::GetSize() {
        return size;
}

% cat multiply.cpp

template <class Number>

Number multiply( Number original ) {
        Array<int> IntArray;
        int size = IntArray.GetSize();

        return (size * original);
}

Now the compilation of driver.cpp should succeed, due to the implementation of definitions separate template compilation model.


% CC -o driver driver.cpp
% ./driver

 ** 1000 **

As we can see, the compilation succeeds and the driver program prints the expected result on console.

If the source code organization does not follow this model, you can use the compiler option -template=no%extdef. This option tells the compiler not to look for template definitions in associated files. With this compiler option, the compilation succeeds, but the linking may fail. For example, compiling the original source files with the -template=no%extdef compiler option fails during the linking phase with the following error:


% CC -o driver -template=no%extdef driver.cpp
Undefined                       first referenced
 symbol                             in file
__type_0 multiply(__type_0)            driver.o
ld: fatal: Symbol referencing errors. No output written to driver

Moral of the story: rely on the "definitions separate template compilation" model as suggested by the documentation, rather than on temporary workarounds.

If source code changes are not feasible, carefully guarding the common interfaces, variable names etc. with #ifdef directives will do the trick, and the compilation and eventually the linking succeed.

Acknowledgements:
Steve Clamage of Sun Microsystems

Posted by: Giri Mandalika. # 5:46 PM 0 comments.

Thursday, May 19, 2005

Solaris: hijacking a function call (interposing)

Sometimes it is necessary to alter the functionality of a routine, or to collect some data from a malfunctioning routine, for debugging. That works well as long as we have access to the source code. But what if we don't have access to the source code, or changes to the source code are not feasible? With dynamic libraries, it is very easy to intercept any call to a routine of choice and do whatever we wish in that routine, including calling the real routine the client intended to call.

In simple words, the hacker (who writes the interposing library, in this context) writes a new library with the exact interfaces of the routines that (s)he wishes to intercept, and preloads the new library before starting up the application. It works well as long as the targeted interfaces are not protected. On Solaris, with the linker's -Bsymbolic option or the Sun Studio compiler's -xldscope=symbolic option, all symbols of a library can be made non-interposable (those symbols are called protected symbols, since no one else can interpose on them). If the targeted routine is interposable, the dynamic linker simply passes control to the first symbol it encounters that matches the function call (callee).

With the preloaded library in force, the hacker gets control over the routine. At this point, it is up to the hacker whether to pass control to the actual routine that the client intended to call. If the intention is just to collect data and let go, the required data can be collected and control passed to the actual routine with the help of libdl routines. Note that control has to be passed explicitly to the actual routine; as far as the dynamic linker is concerned, its job is done once it passes control to the function (the interposer in this case). If the idea is to completely change the behavior of the routine (it is easy to write a new routine with the new behavior, but the library and the clients would have to be rebuilt to make use of it), the new implementation will be part of the interposing routine and control will never be passed to the actual routine. In the worst case, a malicious hacker can intercept data that is supposed to be confidential (e.g., passwords, account numbers, etc.) and may do more harm at will.

[Off-topic] To guard against such attacks, it is recommended to make most of the symbols local in scope, with the help of linker supported map files or compiler supported linker scoping mechanism. Read http://developers.sun.com/tools/cc/articles/symbol_scope.html to learn more about linker scoping.

The above-mentioned technique is commonly referred to as library interposition; as we can see, it is quite useful for debugging, collecting run-time data, and performance tuning of an application.

It would be more interesting to see an interceptor in action. So, let's build a very small library with only one routine, fopen(). The idea is to count the calls to fopen() and to find out which files are being opened. Our interceptor simply prints a message on the console with the name of the file to be opened every time the application calls fopen(). Then it passes control to the fopen() routine of libc. For this, we first need the signature of fopen(). fopen() is declared in stdio.h as follows:
FILE *fopen(const char *filename, const char *mode);

Here is the source code for the interposer:


% cat interceptfopen.c
#include <stdio.h>
#include <dlfcn.h>

FILE *fopen(const char *filename, const char *mode) {
        FILE *fp = NULL;
        /* pointer to the real fopen() in libc, resolved on first use */
        static FILE *(*actualfunction)(const char *, const char *);

        if (!actualfunction) {
                actualfunction = (FILE *(*)(const char *, const char *))
                    dlsym(RTLD_NEXT, "fopen");
        }

        printf("\nfopen() has been called. file name = %s, mode = %s\n"
             "Forwarding the control to fopen() of libc", filename, mode);
        fp = actualfunction(filename, mode);
        return (fp);
}

% cc -G -o libfopenhack.so interceptfopen.c
% ls -lh libfopenhack.so
-rwxrwxr-x   1 build    engr        3.7K May 19 19:02 libfopenhack.so*

actualfunction is a function pointer to the actual fopen() routine, which is in libc. dlsym is part of libdl and the RTLD_NEXT argument directs the dynamic linker (ld.so.1) to find the next reference to the specified function, using the normal dynamic linker search sequence.

Let's proceed to write a simple C program, that writes and reads a string to and from a file.


% cat fopenclient.c
#include <stdio.h>

int main () {
        FILE * pFile;
        char string[30];

        pFile = fopen ("myfile.txt", "w");
        if (pFile != NULL) {
                fputs ("Some Random String", pFile);
                fclose (pFile);
        }

        pFile = fopen ("myfile.txt", "r");
        if (pFile != NULL) {
                fgets (string , 30 , pFile);
                printf("\nstring = %s", string);
                fclose (pFile);
        } else {
                perror("fopen()");
        }
        return 0;
}
% cc -o fopenclient fopenclient.c
% ./fopenclient
string = Some Random String

With no interceptor, everything works as expected. Now let's introduce the interceptor and collect the data, during run-time.


% setenv LD_PRELOAD ./libfopenhack.so

% ./fopenclient
fopen() has been called. file name = myfile.txt, mode = w
Forwarding the control to fopen() of libc
fopen() has been called. file name = myfile.txt, mode = r
Forwarding the control to fopen() of libc
string = Some Random String

% unsetenv LD_PRELOAD

As we can see from the above output, the interceptor received the calls to fopen() instead of the actual implementation in libc. The advantages of this technique are evident from this simple example, and it is up to the hacker whether to take advantage of, or abuse, the flexibility of symbol interposition.

Posted by: Giri Mandalika. # 7:24 PM 2 comments.

Wednesday, May 18, 2005

Sun C/C++: Reducing symbol scope with Linker Scoping feature

This article was published on Sun developer's portal at: http://developers.sun.com/tools/cc/articles/symbol_scope.html

I worked on this article for more than 3 months, and I am glad to have learned quite a few new things from the extensive feedback of Lawrence Crowl and Steve Clamage of the Sun C/C++ compiler team.

Keywords:
Linker Scoping, Global, Symbolic, Hidden, __global, __symbolic, __hidden, __declspec, dllexport, dllimport, xldscope, xldscoperef, linker map files

Posted by: Giri Mandalika. # 1:51 PM 0 comments.

Friday, May 13, 2005

Csh: Arguments too long error

Symptom:
C-shell fails to execute commands with arguments using wildcard characters.

e.g.,

% \rm -rf *
Arguments too long

% ls -l | wc
    8462   76151  550202

The reason for this failure is that the expanded wildcard exceeded a C-shell limit. The command in this example evaluates to a very long string, which overwhelmed the csh limit of 1706 on the maximum number of arguments to a command for which filename expansion applies.

Workarounds:

Use multiple commands, or

Use the xargs utility

% \rm -rf *
Arguments too long
% ls | xargs rm -rf
% ls
%

% \rm -rf *
Arguments too long
% find . -name "*" | xargs rm -rf
% ls
%

From Jerry Peek's Handle Too-Long Command Lines with xargs:

xargs reads a group of arguments from its standard input, then runs a UNIX command with that group of arguments. It keeps reading arguments and running the command until it runs out of arguments.
The shell's backquotes do the same kind of thing, but they give all the arguments to the command at once. That's the main reason for the Arguments too long error, when the shell reaches its limitations.

Reference:
Man page of csh

Thanks to Chris Quenelle for suggesting the xargs workaround

Posted by: Giri Mandalika. # 9:21 PM 1 comments.

Thursday, May 12, 2005

CPU hog with connections in CLOSE_WAIT

A couple of days back, I got a call to look into an issue at our partner's site, where one process (server) was hogging all the processing power with absolutely no load on the server. The server process is running on Solaris 9.


% prstat 1 1
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  2160 QAtest    941M  886M cpu0     0    0  80:03:57  99% myserver/41
 28352 patrol   7888K 6032K sleep   59    0   4:49:37 0.1% bgscollect/1
 24720 QAtest   1872K 1656K cpu3    59    0   0:00:00 0.0% prstat/1
    59 root     4064K 3288K sleep   59    0   0:27:56 0.0% picld/6
  2132 QAtest    478M  431M sleep   59    0   0:15:45 0.0% someserver.exe/901

I started off with my favorite tool, truss, and found that the recv() system call was being called a huge number of times with no corresponding send().


% truss -c -p 2160
^Csyscall               seconds   calls  errors
time                     .001     115
lwp_park                 .001      51     24
lwp_unpark               .000      23
poll                     .002      34
recv                   61.554 2512863
                     --------  ------   ----
sys totals:            61.561 2513086     24
usr time:              12.008
elapsed:               68.350

Interestingly, the return value of all the recv() calls is 0 (EOF). A return value of 0 is an indication that the other end has nothing more to write and is ready to close the socket (connection).


% head /tmp/truss.log
2160/216:       recv(294, 0x277C9410, 32768, 0)                 = 0
2160/222:       recv(59, 0x1F4CB410, 32768, 0)                  = 0
2160/216:       recv(294, 0x277C9410, 32768, 0)                 = 0
2160/222:       recv(59, 0x1F4CB410, 32768, 0)                  = 0
2160/216:       recv(294, 0x277C9410, 32768, 0)                 = 0
2160/222:       recv(59, 0x1F4CB410, 32768, 0)                  = 0
2160/216:       recv(294, 0x277C9410, 32768, 0)                 = 0
2160/222:       recv(59, 0x1F4CB410, 32768, 0)                  = 0
2160/216:       recv(294, 0x277C9410, 32768, 0)                 = 0
2160/222:       recv(59, 0x1F4CB410, 32768, 0)                  = 0
2160/216:       recv(294, 0x277C9410, 32768, 0)                 = 0

A typical recv() call looks like this:

recv(55, 0x05CCB010, 4096, 0)                   = 2958

I then collected the network statistics, and found quite a number of connections in the CLOSE_WAIT state.


% netstat -an
...
127.0.0.1.54356      127.0.0.1.9810       49152      0 49152      0 ESTABLISHED
127.0.0.1.9810       127.0.0.1.54356      49152      0 49152      0 ESTABLISHED
127.0.0.1.54687      127.0.0.1.9810       49152      0 49152      0 ESTABLISHED
127.0.0.1.9810       127.0.0.1.54687      49152      0 49152      0 ESTABLISHED
...
127.0.0.1.9710       127.0.0.1.55830      49152      0 49152      0 CLOSE_WAIT
127.0.0.1.9810       127.0.0.1.57701      49152      0 49152      0 CLOSE_WAIT
127.0.0.1.9710       127.0.0.1.59209      49152      0 49152      0 CLOSE_WAIT
127.0.0.1.9810       127.0.0.1.60694      49152      0 49152      0 CLOSE_WAIT
127.0.0.1.9810       127.0.0.1.61133      49152      0 49152      0 CLOSE_WAIT
127.0.0.1.9810       127.0.0.1.61136      49152      0 49152      0 CLOSE_WAIT
...

(I later realized that these half-closed socket connections had been lying there for more than two days.)

2160/216:       recv(294, 0x277C9410, 32768, 0)                 = 0 <- from truss

The next step is to find out the state of the network connection with socket descriptor 294. The pfiles utility of Solaris reports information about all open files in a process. It makes sense to use this utility, as a socket descriptor is nothing but a file descriptor. (On UNIX, everything is mapped to a file, including raw devices.)


% pfiles 2160
2160:   /export/home/QAtest/572bliss/web/bin/myserver
  Current rlimit: 1024 file descriptors
...
 294: S_IFSOCK mode:0666 dev:259,0 ino:35150 uid:0 gid:0 size:0
      O_RDWR
        sockname: AF_INET 127.0.0.1  port: 9710
        peername: AF_INET 127.0.0.1  port: 59209

Now it is fairly easy to identify the connection with the port numbers reported in the pfiles output.


% netstat -an | grep 59209
127.0.0.1.9710       127.0.0.1.59209      49152      0 49152      0 CLOSE_WAIT

A closer look at the other socket ids from truss indicated that the server is continuously trying to read data from connections that are in CLOSE_WAIT state. Here are the corresponding statistics for TCP:

% netstat -s
 
TCP     tcpRtoAlgorithm     =     4     tcpRtoMin           =   400
        tcpRtoMax           = 60000     tcpMaxConn          =    -1
        tcpActiveOpens      =4593219    tcpPassiveOpens     =2259153
        tcpAttemptFails     =4036987    tcpEstabResets      = 20254
        tcpCurrEstab        =    75     tcpOutSegs          =1264739589
        tcpOutDataSegs      =645683085  tcpOutDataBytes     =1480883468
        tcpRetransSegs      =682053     tcpRetransBytes     =759804724
        tcpOutAck           =618848538  tcpOutAckDelayed    =40226142
        tcpOutUrg           =   351     tcpOutWinUpdate     =155203
        tcpOutWinProbe      =  3278     tcpOutControl       =18622247
        tcpOutRsts          =8970930    tcpOutFastRetrans   = 60772
        tcpInSegs           =1622143125
        tcpInAckSegs        =443838358  tcpInAckBytes       =1459391481
        tcpInDupAck         =3254927    tcpInAckUnsent      =     0
        tcpInInorderSegs    =1462796453 tcpInInorderBytes   =550228772
        tcpInUnorderSegs    = 12095     tcpInUnorderBytes   =10680481
        tcpInDupSegs        = 60814     tcpInDupBytes       =30969565
        tcpInPartDupSegs    =    29     tcpInPartDupBytes   = 19498
        tcpInPastWinSegs    =    66     tcpInPastWinBytes   =102280302
        tcpInWinProbe       =  2142     tcpInWinUpdate      =  3092
        tcpInClosed         =  1218     tcpRttNoUpdate      =391989
        tcpRttUpdate        =441925010  tcpTimRetrans       =185795
        tcpTimRetransDrop   =   456     tcpTimKeepalive     =  8077
        tcpTimKeepaliveProbe=  3054     tcpTimKeepaliveDrop =     0
        tcpListenDrop       = 18265     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   =255744

Apparently one end of the connection (the server, in this scenario) ignored the 0-length read (EOF) and kept trying to read data from the connection as if it were still a duplex connection.

But how to check if the other end has really closed the connection?
According to the man page of recv():
Upon successful completion, recv() returns the length of the message in bytes. If no messages are available to be received and the peer has performed an orderly shutdown, recv() returns 0. Otherwise, -1 is returned and errno is set to indicate the error.

So a simple check on the return value of recv() would do. To make sure that the other end really intended to close the connection, and is not just sending null strings (very unlikely, though), try this: after a series of EOFs (i.e., return value 0) from recv(), try to write some data to the socket. It would result in a "connection reset" (ECONNRESET) error. A subsequent (second) write results in a "broken pipe" (EPIPE) error. Then it is safe to assume that the other end has closed the connection.

I suggested that the responsible engineer check the return value of recv() and close the connection when it is safe to do so (see above).
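The check on the return value of recv() can be sketched as follows. This is only a minimal illustration, not the application's actual code: the helper name read_or_close() and the conn_status values are hypothetical, and it assumes the application has nothing more to send, so it closes the descriptor as soon as it sees EOF.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes for one read attempt on a connected socket. */
enum conn_status { CONN_DATA, CONN_PEER_CLOSED, CONN_RETRY, CONN_ERROR };

/*
 * Read once from fd into buf. On CONN_DATA, *nread holds the byte count.
 * On CONN_PEER_CLOSED (recv() returned 0) or CONN_ERROR the descriptor is
 * closed here, so the connection does not linger in CLOSE_WAIT.
 */
enum conn_status read_or_close(int fd, char *buf, size_t len, ssize_t *nread)
{
        ssize_t n;

        assert(buf != NULL && nread != NULL && len > 0);

        n = recv(fd, buf, len, 0);
        *nread = n;
        if (n > 0)
                return (CONN_DATA);
        if (n == 0) {
                /* orderly shutdown by the peer (EOF) */
                close(fd);
                return (CONN_PEER_CLOSED);
        }
        if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                return (CONN_RETRY);    /* transient: keep the descriptor */
        close(fd);                      /* hard error such as ECONNRESET */
        return (CONN_ERROR);
}
```

A server loop would stop polling a descriptor once this helper returns CONN_PEER_CLOSED or CONN_ERROR, instead of reading from it again and again.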

About CLOSE_WAIT state:

The CLOSE_WAIT state means the other end of the connection has been closed while the local end is still waiting for the application to close. That's normal. But a connection that stays in CLOSE_WAIT indefinitely normally indicates an application-level bug. TCP connections move from the ESTABLISHED state to the CLOSE_WAIT state after receiving a FIN from the remote system, but before close() has been called by the local application.

The CLOSE_WAIT state signifies that the endpoint has received a FIN from the peer, indicating that the peer has finished writing, i.e., it has no more data to send. This is indicated by a 0-length read on the input. The connection is now half-closed, or a simplex (one-way) connection; the receiver of the FIN still has the option of writing more data. The state can persist indefinitely, as it is a perfectly valid, synchronized TCP state. The peer should be in FIN_WAIT_2 (i.e., sent FIN, received ACK, waiting for FIN). It is the application's fault only if it ignores the EOF (0-length read) and carries on as if the connection were still a duplex connection.

Note that an application that only intends to receive data and not send any, might close its end of the connection, which leaves the other end in CLOSE_WAIT until the process at that end is done sending data and issues a close. (But that's not the case in this scenario.)
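The half-closed (simplex) state described above can be reproduced with shutdown(). The following is a small sketch under my own assumptions: the function name half_close_roundtrip() is hypothetical, and a and b are expected to be the two ends of an already connected stream socket pair.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * a and b are the two ends of a connected stream socket.
 * Returns 0 if, after a finishes writing, b sees EOF yet can still
 * send a byte that a receives; returns -1 otherwise.
 */
int half_close_roundtrip(int a, int b)
{
        char c = 0;

        assert(a >= 0 && b >= 0 && a != b);

        if (shutdown(a, SHUT_WR) != 0)  /* a: no more data to send */
                return (-1);
        if (recv(b, &c, 1, 0) != 0)     /* b: 0-length read, i.e. EOF */
                return (-1);
        if (send(b, "x", 1, 0) != 1)    /* b: may still write */
                return (-1);
        if (recv(a, &c, 1, 0) != 1 || c != 'x')  /* a: may still read */
                return (-1);
        return (0);
}
```

On a TCP connection, b would sit in CLOSE_WAIT and a in FIN_WAIT_2 from the shutdown() call until b finally closes its descriptor.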

State diagram for the closing phase of a TCP connection:


               Server          Client
                 |      Fin       |
      CLOSE_WAIT |<-------------- | FIN_WAIT_1
                 |                |                
                 |      Ack       |                
                 |--------------->| FIN_WAIT_2     
                 |                |                
                 |                |                
                 |                |                
                 |                |
                 |                |
                 |                |
                 |      Fin       |
        LAST_ACK |--------------->| TIME_WAIT
                 |                |
                 |      Ack       |
                 |<-------------- |
          CLOSED |                | 
                 |                |

Reference:
Sun Alert document:
TCP: Why do I have tcp connections in the CLOSE_WAIT state?

Suggested reading:
RFC 793 Transmission Control protocol

Posted by: Giri Mandalika. # 3:45 PM 0 comments.

Saturday, May 07, 2005

Solaris: Mounting a CD-ROM manually

Get the device name in cxtydzsn format, associated with the CD drive

% iostat -En

c1t0d0           Soft Errors: 149 Hard Errors: 0 Transport Errors: 0 
Vendor: MATSHITA Product: CDRW/DVD UJDA740 Revision: 1.00 Serial No:  
Size: 0.56GB <555350016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 149 Predictive Failure Analysis: 0

-E displays all device error statistics, and -n shows the names in descriptive format. We are interested only in the logical device name (c1t0d0 here), though.
Slice 0 is the default, so the device to mount is /dev/dsk/c1t0d0s0.
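To make the cxtydzsn naming concrete, here is a small sketch that assembles the logical device path from its parts: controller (c), target (t), disk (d) and slice (s). The helper name dsk_path() is hypothetical, and it assumes a device name that includes a target number.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build "/dev/dsk/cXtYdZsN" from controller, target, disk and slice
 * numbers. Returns 0 on success, -1 if the buffer is too small.
 */
int dsk_path(char *out, size_t outlen, int ctrl, int target, int disk,
    int slice)
{
        int n;

        assert(out != NULL && outlen > 0);

        n = snprintf(out, outlen, "/dev/dsk/c%dt%dd%ds%d",
            ctrl, target, disk, slice);
        return ((n < 0 || (size_t)n >= outlen) ? -1 : 0);
}
```

Calling dsk_path(buf, sizeof (buf), 1, 0, 0, 0) fills buf with /dev/dsk/c1t0d0s0, the device used in the mount step.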

Mount the device associated with CD-ROM

% mount -F hsfs -o ro /dev/dsk/c1t0d0s0 /cdrom

- Ensure the existence of the mount point /cdrom before running the mount command.
- -F specifies the type of the file system on which to operate. High Sierra File System (HSFS) is the file system for CD-ROM.
- -o specifies the file system options. ro stands for read-only. Default is rw (read-write).

Check the file system

% df

/                  (/dev/dsk/c0d0s0   ):11467100 blocks   959851 files
/devices           (/devices          ):       0 blocks        0 files
/system/contract   (ctfs              ):       0 blocks 2147483616 files
/proc              (proc              ):       0 blocks     7782 files
/etc/mnttab        (mnttab            ):       0 blocks        0 files
/etc/svc/volatile  (swap              ): 1083672 blocks   120236 files
/system/object     (objfs             ):       0 blocks 2147483503 files
/dev/fd            (fd                ):       0 blocks        0 files
/tmp               (swap              ): 1083672 blocks   120236 files
/var/run           (swap              ): 1083672 blocks   120236 files
/export/home       (/dev/dsk/c0d0s7   ): 7351834 blocks  2086532 files
/cdrom             (/dev/dsk/c1t0d0s0 ):       0 blocks        0 files

% ls /cdrom
cdi/ ext/ mpegav/ segment/ vcd/

Posted by: Giri Mandalika. # 3:48 PM 2 comments.

Wednesday, May 04, 2005

C/C++: global const variables, symbol collisions & symbolic scoping

(Most of the following is "generic" C/C++. Sun Studio compilers were used to compile the code and to propose a solution to the symbol collision problem.)

The way C++ handles global const variables is different from C.

In C++, a global const variable that is not explicitly declared extern has internal (static) linkage.

In C, global const variables have extern linkage by default, and global variables can be declared more than once. As long as at most a single initialization is used for the same variable, the linker resolves all the repeated declarations into a single entity, and the initialization takes place when the program starts up, before entry to the main function.
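The "declared more than once" rule for C can be seen even within a single source file. The sketch below uses hypothetical names (demo_version, version_at_least()) and is meant to be compiled as C only; a C++ compiler rejects the repeated definitions.

```c
#include <assert.h>

float demo_version;             /* tentative definition, no initializer */
float demo_version;             /* repeated declaration: still legal C */
float demo_version = 2.2f;      /* the single initialization */

/* Returns 1 if the one resolved object is at least `minimum`, else 0. */
int version_at_least(float minimum)
{
        assert(minimum >= 0.0f);
        return (demo_version >= minimum);
}
```

All three declarations refer to one object, which holds 2.2 by the time main() runs.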

This can be illustrated with a simple C program that produces different results when compiled with C and C++ compilers:


% cat mylib.h
const float libraryversion = 2.2;
float getlibversion();
int checklibversion();

% cat mylib.c
#include <stdio.h>
#include "mylib.h"

float getlibversion() {
        printf("\nmylib.c: libraryversion = %f", libraryversion);
        return (libraryversion);
}

int checklibversion() {
        float ver;
        ver = getlibversion();
        printf("\nmylib.c: ver = %f", ver);
        if (ver < 2.0) {
                return (1);
        } else {
                return (0);
        }
}

% cat thirdpartylib.h
extern const float libraryversion = 1.5;
float getlibversion();

% cat thirdpartylib.c
#include <stdio.h>
#include "thirdpartylib.h"

float getlibversion() {
        printf("\nthirdparty.c: libraryversion = %f", libraryversion);
        return (libraryversion);
}

% cat versioncheck.c
#include <stdio.h>
#include "mylib.h"

int main() {
        printf("\n** versioncheck.c: libraryversion = %f", libraryversion);
        int retval = 0;
        retval = checklibversion();
        if (retval) {
                printf("\n** Obsolete version being used .. Can\'t proceed further! **\n");
        } else {
                printf("\n** Met the library version requirement .. Good to Go! ** \n");
        }
        return (0);
}

Case 1:
Compile with Sun Studio C compiler:


% cc -G -o libmylib.so mylib.c
% cc -G -o libthirdparty.so thirdpartylib.c
% cc -o vercheck -lthirdparty -lmylib versioncheck.c
% ./vercheck

** versioncheck.c: libraryversion = 2.200000
thirdparty.c: libraryversion = 2.200000
mylib.c: ver = 2.200000
** Met the library version requirement .. Good to Go! **

From this output, the program appears to work as expected, although there is a symbol collision between the libmylib and libthirdparty load modules over the libraryversion symbol.

Case 2:
Compile with Sun Studio C++ compiler:


% CC -G -o libmylib.so mylib.c
% CC -G -o libthirdparty.so thirdpartylib.c
% CC -o vercheck -lthirdparty -lmylib versioncheck.c
% ./vercheck

** versioncheck.c: libraryversion = 2.200000
thirdparty.c: libraryversion = 1.500000
mylib.c: ver = 1.500000
** Obsolete version being used .. Can't proceed further! **

The inherent symbol collision was exposed when the code was compiled with the C++ compiler.

It is well known that global const variables, such as libraryversion in this example, are bound to cause problems.

The following is an alternative implementation of the above example, which shows consistent behavior when compiled with C and C++ compilers.


% cat mylib_public.h
float getlibversion();
int checklibversion();

% cat mylib_private.h
#include "mylib_public.h"
const float libraryversion = 2.2;

% cat mylib.c
#include <stdio.h>
#include "mylib_private.h"

float getlibversion() {
        printf("\nmylib.c: libraryversion = %f", libraryversion);
        return (libraryversion);
}

int checklibversion() {
        float ver;
        ver = getlibversion();
        printf("\nmylib.c: ver = %f", ver);
        if (ver < 2.0) {
                return (1);
        } else {
                return (0);
        }
}

% cat versioncheck.c
#include <stdio.h>
#include "mylib_public.h"

int main() {
        int retval = 0;
        retval = checklibversion();
        if (retval) {
                printf("\n** Obsolete version being used .. Can\'t proceed further! **\n");
        } else {
                printf("\n** Met the library version requirement .. Good to Go! ** \n");
        }
        return (0);
}

Since we cannot control the 3rd party implementation, it was kept intact in this example.

Case 1:
Compile with Sun Studio C compiler:


% cc -G -o libmylib.so mylib.c
% cc -G -o libthirdparty.so thirdpartylib.c
% cc -o vercheck -lthirdparty -lmylib versioncheck.c
% ./vercheck

thirdparty.c: libraryversion = 1.500000
mylib.c: ver = 1.500000
** Obsolete version being used .. Can't proceed further! **

Case 2:
Compile with Sun Studio C++ compiler:


% CC -G -o libmylib.so mylib.c
% CC -G -o libthirdparty.so thirdpartylib.c
% CC -o vercheck -lthirdparty -lmylib versioncheck.c
% ./vercheck

thirdparty.c: libraryversion = 1.500000
mylib.c: ver = 1.500000
** Obsolete version being used .. Can't proceed further! **

With the new implementation, the code behaves the same way with both C and C++ compilers. The collision itself remains, though: the call to getlibversion() still binds to the third-party library.

The following section proposes a solution common to both C and C++ to resolve the symbol collision. With C++, symbol collisions can also be minimized using namespaces.

symbolic (protected) scope
All symbols of a library get symbolic scope when the library is built with Sun Studio's -xldscope=symbolic compiler option.

Symbolic scoping is more restrictive than global linker scoping: all references within a library that match definitions within the library bind to those definitions. Outside of the library, the symbol appears as though it were global. That is, the link-editor first tries to find the definition of the symbol being used in the same shared library. If found, the symbol is bound to that definition at link time; otherwise the search continues outside the library, as is the case with global symbols. This explanation holds for functions; for variables, there is the extra complication of copy relocations.

Let's see how symbolic scope works practically, by compiling the same code again with -xldscope=symbolic option.

Case 1:
Compile with Sun Studio C compiler:


% cc -G -o libmylib.so -xldscope=symbolic mylib.c
% cc -G -o libthirdparty.so thirdpartylib.c
% cc -o vercheck -lthirdparty -lmylib versioncheck.c
% ./vercheck

mylib.c: libraryversion = 2.200000
mylib.c: ver = 2.200000
** Met the library version requirement .. Good to Go! **

Case 2:
Compile with Sun Studio C++ compiler:


% CC -G -o libmylib.so -xldscope=symbolic mylib.c
% CC -G -o libthirdparty.so thirdpartylib.c
% CC -o vercheck -lthirdparty -lmylib versioncheck.c
% ./vercheck

mylib.c: libraryversion = 2.200000
mylib.c: ver = 2.200000
** Met the library version requirement .. Good to Go! **

With symbolic (protected) scoping, the reference to the symbol libraryversion was bound to its definition within the load module libmylib and the program showed the intended behavior.

However, the main drawback of -xldscope=symbolic is that it also applies to the implementation symbols of C++ and so may interfere with their interposition. These implementation interfaces often must remain global within a group of similar dynamic objects, as one interface must interpose on all the others for the correct execution of the application. Because of this, the use of -xldscope=symbolic is strongly discouraged.

Sun Studio compilers (version 8 or later) provide a declaration specifier called __symbolic. Applying the __symbolic specifier only to the symbols that need symbolic scope (protected symbols) is the recommended approach.
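Per-symbol scoping might look like the sketch below. The DEMO_SYMBOLIC macro and the demo_ function names are hypothetical. The __symbolic branch is the Sun Studio specifier mentioned above; the GCC/Clang branch uses protected visibility as an approximate analog, which is my assumption and was not part of the experiments in this post.

```c
#include <assert.h>

/* Pick a per-symbol scoping keyword for the compiler in use. */
#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#define DEMO_SYMBOLIC __symbolic        /* Sun Studio 8 or later */
#elif defined(__GNUC__)
#define DEMO_SYMBOLIC __attribute__((visibility("protected")))  /* assumed analog */
#else
#define DEMO_SYMBOLIC                   /* no-op elsewhere */
#endif

/* Only these two interfaces get symbolic scope; the rest stay global. */
DEMO_SYMBOLIC float demo_getlibversion(void);
DEMO_SYMBOLIC int demo_checklibversion(float required);

float demo_getlibversion(void)
{
        return (2.2f);
}

/* Returns 1 if the library is older than `required`, 0 otherwise. */
int demo_checklibversion(float required)
{
        assert(required > 0.0f);
        /* With symbolic scope this call binds to the definition above. */
        return (demo_getlibversion() < required ? 1 : 0);
}
```

Built into a shared library, the call inside demo_checklibversion() should then bind to the library's own demo_getlibversion() even if another load module defines a function with the same name, without forcing symbolic scope on every other symbol.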

Posted by: Giri Mandalika. # 5:57 PM 0 comments.

Monday, May 02, 2005

C/C++: Printing Stack Trace with printstack() on Solaris

libc on Solaris 9 and later provides a useful function called printstack() to print a symbolic stack trace to the specified file descriptor. This is useful for reporting errors from an application during run-time.

If the stack trace appears corrupted, or if the stack cannot be read, printstack() returns -1.

Programmatic example:


% more stacktrace.c
#include <stdio.h>
#include <ucontext.h>

int callee(int file) {
        printstack(file);
        return (0);
}

int caller() {
        int a;
        a = callee (fileno(stdout));
        return (a);
}

int main() {
        caller();
        return (0);
}

% cc -o stacktrace stacktrace.c
% ./stacktrace
/tmp/stacktrace:callee+0x18
/tmp/stacktrace:caller+0x22
/tmp/stacktrace:main+0x14
/tmp/stacktrace:0x6d2

The printstack() function uses dladdr1() to obtain symbolic symbol names. As a result, only global symbols are reported as symbol names by printstack().


% CC -o stacktrace stacktrace.c
% ./stacktrace
/tmp/stacktrace:__1cGcallee6Fi_i_+0x18
/tmp/stacktrace:__1cGcaller6F_i_+0x22
/tmp/stacktrace:main+0x14
/tmp/stacktrace:0x91a

The stack trace from a C++ program will have all the symbols in their mangled form. So, for now, programmers may need their own wrapper functions to print the stack trace in unmangled form.

There is an RFE (Request For Enhancement) in place against Solaris' libc to print the stack trace in unmangled form when printstack() is called from a C++ program. This will be released as a libc patch for Solaris 8, 9 & 10 some time in the near future.


% elfdump -CsN.symtab libc.so | grep printstack
    [5275]  0x00052629 0x00000051  FUNC GLOB  D    0 .text       _printstack
    [6332]  0x00052629 0x00000051  FUNC WEAK  D    0 .text       printstack

Since the object code is automatically linked with libc during the creation of an executable or a dynamic library, the programmer need not specify -lc on the compile line.

Suggested Reading:
Man page of walkcontext or printstack

Posted by: Giri Mandalika. # 1:23 AM 9 comments.
