Pages

Thursday, May 31, 2007

Solaris OS Solutions to 32-bit stdio's 256 File Descriptors Limitation

.. technical article is up on Sun Developer Network (SDN) web site at http://developers.sun.com/solaris/articles/stdio_256.html. In order to comply with the SDN guidelines, the copy-editor at Sun made quite a number of changes to the original draft. But I still like the original draft "as is". I'm reproducing the draft that I submitted, in this blog post.

Sun Solaris Solution to 32-bit stdio's 256 File Descriptors Limitation


The following discussion is relevant only in the 32-bit world. 64-bit applications are immune to 256 file descriptors limitation.

Historic Background


A quick web search with keywords Solaris stdio open file descriptors results in numerous references to stdio's limitation of 256 open file descriptors on Sun Solaris. 1085341: 32-bit stdio routines should support file descriptors >255, a 15 year old RFE 1 explains the problem and the bug report links to handful of other bugs which are some how related to stdio's 256 file descriptors limitation.

The reason for this limitation on Solaris operating system is that an unsigned char is used to store the value of the file descriptor associated with a standard I/O stream. Have a look at the definition of the FILE structure that you will find in header, /usr/include/stdio_impl.h, on any Solaris system that does not have the solution under discussion.
struct __FILE_TAG       /* needs to be binary-compatible with old versions */
{
#ifdef _STDIO_REVERSE
unsigned char *_ptr; /* next character from/to here in buffer */
int _cnt; /* number of available characters in buffer */
#else
int _cnt; /* number of available characters in buffer */
unsigned char *_ptr; /* next character from/to here in buffer */
#endif
unsigned char *_base; /* the buffer */
unsigned char _flag; /* the state of the stream */
unsigned char _file; /* UNIX System file descriptor */
unsigned __orientation:2; /* the orientation of the stream */
unsigned __ionolock:1; /* turn off implicit locking */
unsigned __seekable:1; /* is file seekable? */
unsigned __filler:4;
};

__FILE_TAG is just an alias for FILE (see /usr/include/stdio_tag.h).

_file holds the file descriptor, which was declared as an unsigned char. An unsigned char occupies 8 bits in memory; hence _file can hold a maximum value of 2^8 = 256. In other words, _file restricts the access to 256 file descriptors per 32-bit process. This limitation was clearly documented in the man page of stdio(3C).

Sun didn't make changes from an 8-bit unsigned char to a 16-bit int to accomodate more file descriptors because this would break the much promised binary compatibility with earlier releases of Solaris, as it changes the size of the structure.

Solutions


Starting with the upcoming release of Solaris 10 8/07 2, Sun offers run-time and programming solutions in the form of extended FILE facility to alleviate stdio's 256 file descriptors limitation. Systems running Solaris Express (SX) or any OpenSolaris distribution after build 39 will have these solutions. Check Patches, Bugs section for instructions to get extended FILE facility installed on systems running any release of Solaris 10 3/05 through Solaris 10 11/06.

1) Run-time Solution


As the title suggests, a run-time solution does not require any source code changes or re-compilation of the objects to overcome the 256 file descriptors limitation with the stdio(3C) C library functions. However the default behavior of existing 32-bit applications won't change unless the extended FILE facility is enabled explicitly. Applications that enable this feature will be able to associate any valid file descriptor with a standard I/O (stdio) stream. Any value that lies within the range 3 3 and the value returned by ulimit -n from the shell used to launch the application is a valid file descriptor. The per process maximum number of file descriptors in a shell can be increased from the default 256 to any value that is less than or equal to the value returned by the command, echo 'rlim_fd_max/D' | mdb -k | awk '{ print $2 }'. To adjust the file descriptor limit in a shell, run ulimit -n <max_file_descriptors> in sh/ksh/bash or limit descriptors <max_file_descriptors> in csh, where max_file_descriptors is the maximum number of file descriptors desired.

65,536 is the default hard limit for the number of files a process can have opened at any time. This limit can be tuned with the system tunable rlim_fd_max. Although a very large number of files can be opened by tuning rlim_fd_max parameter, virtual memory space becomes the limit for 32-bit processes when there are hundreds of thousands of open files. When the process reaches the limits of virtual memory, stdio calls fail with Not enough space error.

Before running the 32-bit application, enable the extended FILE facility by:

  1. Raising the maximum number of file descriptors in a shell, and
  2. Pre-loading the extended FILE facility, /usr/lib/extendedFILE.so.1.


Note that extendedFILE.so.1 is not a library, but an enabler of the extended FILE facility.

Here is how you'd enable the extended FILE facility from ksh:

% ulimit -n
256

% echo 'rlim_fd_max/D' | mdb -k | awk '{ print $2 }'
65536

% ulimit -n 65537
ksh: ulimit: exceeds allowable limit

% ulimit -n 65536

% ulimit -n
65536

% export LD_PRELOAD_32=/usr/lib/extendedFILE.so.1
% application [arg1 arg2 .. argn]

The following example shows the behavior of a simple 32-bit process with and without the extended FILE facility enabled. The test case, a simple C program, tries to open 65536 files with fopen() interface.

% cat fopentestcase.c

#include <stdio.h>
#include <stdlib.h>

#define NoOfFILES 65536

int main()
{
char filename[10];
FILE *fds[NoOfFILES];
int i;

for (i = 0; i < NoOfFILES; ++i)
{
sprintf (filename, "/tmp/%d.log", i);
fds[i] = fopen(filename, "w");

if (fds[i] == NULL)
{
printf("\nNumber of open files = %d. fopen() failed with error: ", i);
perror("");
exit(1);
}
else
{
fprintf (fds[i], "some string");
}
}
return (0);
}

Reproduce the failure with the default maximum number of file descriptors in a shell.

% cc -o fopentestcase fopentestcase.c

% ulimit -a | grep descriptors
nofiles(descriptors) 256

% ./fopentestcase
Number of open files = 253. fopen() failed with error: Too many open files

Raise the file descriptor limit, enable extended FILE facility; and run the test case again to see the run-time solution at work.
% ulimit -n 5000

% ulimit -a | grep descriptors
nofiles(descriptors) 5000

% export LD_PRELOAD_32=/usr/lib/extendedFILE.so.1

% ./fopentestcase
Number of open files = 4996. fopen() failed with error: Too many open files

% ulimit -n 65536

% ulimit -a | grep descriptors
nofiles(descriptors) 65536

% ./fopentestcase
Number of open files = 65532. fopen() failed with error: Too many open files

Observe the shortage of one file descriptor (excluding 0, 1 and 2 for stdin, stdout and stderr respectively) in the above examples. When extended FILE facility is enabled, by default the file descriptor 196 will be made unallocatable to minimize silent data corruption. See the next section, Environment Variables, for more.

Here is the pfiles output to confirm the above proposition:
% pfiles `pgrep fopentestcase` | egrep "log|:"
...
195: S_IFREG mode:0644 dev:102,7 ino:7380 uid:209044 gid:1 size:0
/tmp/192.log
197: S_IFREG mode:0644 dev:102,7 ino:7381 uid:209044 gid:1 size:0
/tmp/193.log
...

Environment Variables

The following two environment variables control the behavior of the extended FILE facility:

  1. _STDIO_BADFD

    This variable takes any integer value in the range 3 - 255, which will be made unallocatable as file descriptor. Setting this environment variable will provide a protection mechanism to software with unknown behaviors such as 3rd party libraries without source code, so applications do not experience silent data corruption. In the absence of this environment variable, a default value of 196 will be marked as unallocatable file descriptor during run-time.

    It is apparent that object code built on Solaris in pre-extended FILE era will not be expecting any file descriptor that doesn't fit in an 8-bit unsigned char; and will not understand how to handle extended FILE pointers. For these reasons the range has been restricted to 3 - 255 so the code that retrieves the file descriptor value by de-referencing FILE -> _file rather than fileno(3C) function will receive the unallocatable (bad) file descriptor when the actual descriptor is indeed an extended file descriptor i.e., any value > 255. Another environment variable _STDIO_BADFD_SIGNAL has been introduced to specify the signal to be sent to the process when the uninspected code tries to modify the unallocatable file descriptor.

  2. _STDIO_BADFD_SIGNAL

    This variable takes an integer or string representing any valid signal. See signal.h(3HEAD) for valid values or strings for all supported signals on Solaris. This variable causes the specified signal to be sent to the application if certain exceptional cases are detected during the use of the extended FILE facility. The default signal is SIGABRT.

When Not To Use This Solution?

Do not enable extended FILE facility if the application:
  1. directly de-references the _file field of FILE structure, or
  2. uses the long removed fileno() macro 4 rather than the function fileno(FILE) to get the value of the underlying file descriptor.

When this feature is enabled, file descriptors > 255 will be stored in an auxiliary location unknown to the application, and an unallocatable (bad) file descriptor held by environment variable _STDIO_BADFD will be stored in FILE -> _file field. Improper access by the application to the FILE -> _file field will yield the unallocatable bad file descriptor when the actual underlying file descriptor is greater than 255, thus leading to silent data corruption.

Also data corruption can occur if the process truncates the value returned by the fileno(FILE) function. For example, if the 16-bit or 32-bit int value returned by fileno() function is stored in an 8-bit unsigned char variable, truncation occurs; and accessing the truncated file descriptor may yield errors.

The following error message during the run-time is a clear indication that the application is modifying the internal file descriptor field of the FILE structure from stdio.
Application violated extended FILE safety mechanism.
Please read the man page for extendedFILE.
Aborting

When confronted with such an error message, stop using extended FILE facility with the application; if possible, fix the source by replacing all references to FILE -> _file with calls to fileno(FILE). Ignoring the above run-time error could lead to data corruption.

Example

The following trivial example illustrates the usage of environment variables, _STDIO_BADFD and _STDIO_BADFD_SIGNAL; and shows the subsequent program crash when the code violates the extended FILE safety mechanism.

Compile the following code and build a library on any system running Solaris 5 without the extended FILE solutions.
% cat thirdpartysrc.c

#include <stdio.h>

void manipulatefd (FILE *fptr)
{
;
;
fprintf(stdout, "\n%s : manipulatefd(): underlying file descriptor = %d\n", \
__FILE__, fptr -> _file);
fptr -> _file = 123;
fprintf(fptr, "This call is gonna fail!\n");

;
;
}

% cc -G -o /tmp/libthirdparty.so thirdpartysrc.c

Compile the following code and build an executable by linking the object code with the library created in the above step,
on any system running Solaris with the extended FILE solutions.
% cat enableextfile.c

#include <stdio.h>
#include <stdlib.h>

#define NoOfFiles 500

void manipulatefd(FILE *);

int main ()
{
FILE *fptr;
int i;

for (i = 0; i < NoOfFiles; i++)
{
fptr = fopen("/tmp/enable_test.txt", "w");

if (fptr == NULL)
{
perror("fopen failed. ");
exit(1);
}

printf("\nfd = %d", fileno(fptr));

if (fileno(fptr) % 400 == 0)
{
manipulatefd(fptr);
}
}

return(0);
}

% export LD_LIBRARY_PATH=/tmp:$LD_LIBRARY_PATH
% cc -o enableextfile -lthirdparty enableextfile.c

Raise the maximum file descriptor limit per process to any number > 255, set the environment variables _STDIO_BADFD and _STDIO_BADFD_SIGNAL, enable the extended FILE facility by pre-loading /usr/lib/extendedFILE.so.1; and finally run the executable.
% ulimit -n
256

% ulimit -n 500

% ulimit -n
500

% export _STDIO_BADFD=196
% export _STDIO_BADFD_SIGNAL=SIGABRT
% export LD_PRELOAD_32=/usr/lib/extendedFILE.so.1

% ./enableextfile
fd = 3
fd = 4
fd = 5
...
...
fd = 398
fd = 399
fd = 400
thirdpartysrc.c : manipulatefd(): underlying file descriptor = 196
Application violated extended FILE safety mechanism.
Please read the man page for extendedFILE.
Aborting

Abort(coredump)

% /usr/bin/pstack core
core 'core' of 10172: ./enableextfile
d1f28e65 _lwp_kill (1, 6) + 15
d1ee2102 raise (6) + 22
d1ec0dad abort (0, 80677e0, d1f60000, 804638c, 804638c, 80463c4) + cd
d1f01d54 _file_get (80677e0) + b4
d1efeb21 _findbuf (80677e0) + 31
d1ef2f16 _ndoprnt (d1f70344, 80471d4, 80677e0, 0) + 46
d1ef669f fprintf (80677e0, d1f70344) + 9f
d1f702cb manipulatefd (80677e0) + 3b
0805097f main (1, 8047214, 804721c) + 9f
0805084a _start (1, 8047360, 0, 8047370, 8047382, 8047393) + 7a

If the application does not show any of the above mentioned patterns, it can take advantage of this run-time solution irrespective of its age i.e., even the applications built on Solaris 7 or prior versions may continue to work flawlessly.

See the manual page of extendedFILE(5) for more examples.

2) Programming Solution(s)


This section is intended for new applications and applications that can easily be modified.

There are two programmatic interfaces which will allow access to the larger than 256 file descriptor FILE pool provided the maximum file descriptors resource limit has been raised. Note that the default maximum file descriptors limit is still 256.

i) Enhanced Standard I/O Open Calls fopen(3C), fdopen(3C) and popen(3C)

In order to reduce the effort in modifying the existing sources to take advantage of the extended FILE feature, the existing mode string of stdio open calls like fopen(3C), fdopen(3C), popen(3C) have been augmented with a new flag 'F'.

eg.,
FILE *fptr = fopen("dummy.txt", "rF");

int fd = creat("dummy2.txt", S_IWUSR);
FILE *stream = fdopen(fd, "wF");

FILE *ptr = popen("/usr/bin/ls *.txt", "rF");

If the last character of the mode string is an 'F', 32-bit processes will be allowed to associate a stream with a file accessed by a file descriptor with a value greater than 255. In case of 64-bit applications, the character 'F' in the mode string will be silently ignored. Except for this minor enhancement, the existing semantics of stdio open calls haven't changed.

The 'F' in the mode string of stdio open calls is intended for code that doesn't:
  1. directly de-references fields in the FILE structure, and
  2. return a FILE pointer to the caller.

If the application exhibits any of the above mentioned patterns, the character 'F' must not be appended in the mode string to enable the extended FILE feature. Data corruption could occur, if the fields in the FILE structure are used directly by 32-bit applications when the last character of mode is 'F'. Also there is an equal chance of data corruption when the extended FILE pointer is returned to the binary code unknown to the user (uninspected code, that is), as the caller might not understand how to handle it. This interface does not provide any safe guards against misuse of extended FILE pointers. If a FILE pointer must be returned to any uninspected code consider using enable_extended_FILE_stdio(3C) at a higher level in the code.

Usage example:

Re-build the test case after changing the following line:

fds[i] = fopen(filename, "w");

to
fds[i] = fopen(filename, "wF");

Raise the file descriptor limit from the shell, and run the test case again to see the results.
% cc -o fopentestcaseF fopentestcase.c

% ulimit -n 10000

% ulimit -a | grep descriptors
nofiles(descriptors) 10000

% ./fopentestcaseF
Number of open files = 9996. fopen() failed with error: Too many open files

Note the absence of linking against any special libraries to make it work. All the stdio routines are still part of libc.

Have a look at the man pages of fopen(3C), fdopen(3C) and popen(3C) for more.

ii) New Programming Interface enable_extended_FILE_stdio(3C)

If the FILE pointer is not confined within the context of a single function, the new programming interface enable_extended_FILE_stdio() can be used to enable the extended FILE facility. This interface minimizes the data corruption by providing some protection mechanism to software with unknown behaviors such as 3rd party libraries without source code. For instance, by using this interface the user can choose any signal to be sent to the process during run-time when the application de-references FILE -> _file inappropriately.

This new interface was defined in /usr/include/stdio_ext.h header as follows:

int enable_extended_FILE_stdio(int, int);


The first argument, an integer, specifies the file descriptor in the range 3 - 255 that the application wants to be selected as the unallocatable file descriptor. Alternatively setting it to -1 will request enable_extended_FILE_stdio(3C) to select a reasonable unallocatable file descriptor. This is the equivalent of setting environment variable, _STDIO_BADFD, when enabling the run-time solution for extended FILEs.

The second argument, an integer, specifies the signal to be sent to the process when the unallocatable file descriptor is used as a file descriptor argument to any system call except close(2) or closefrom(3C). Some applications may attempt to close file descriptors that they did not open. This exception prevents application crash from such harmless calls. If -1 is passed, the default signal SIGABRT will be sent to the process. A value of 0 ignores any FILE -> _file de-references by disabling the sending of a signal. Otherwise, the specified signal will be sent to the process. See signal.h(3HEAD) for the complete list of signals on Solaris. This is the equivalent of setting environment variable, _STDIO_BADFD_SIGNAL, when enabling the run-time solution for extended FILEs.

The enable_extended_FILE_stdio(3C) function is available only in the 32-bit compilation environment.

For the extended FILE facility to be effective, raise the default maximum file descriptor limit for the process from 256 to any number less than or equal to the hard limit for the number of files a process can have opened at any time (see the kernel tunable, rlim_fd_max, for more). This can be done either from the shell with ulimit/limit commands or programmatically by using getrlimit(2)/setrlimit(2) functions defined in /usr/include/sys/resource.h header.

The following trivial programming example demonstrates:
  • setting the file descriptor limit through getrlimit(2)/setrlimit(2) interfaces
  • the usage of this new function to enable the extended FILE facility, and
  • the application crash when an uninspected code abuses the underlying file descriptor by directly changing the value of FILE -> _file.

Compile the following code and build an executable by linking the object code with the library, libthirdparty.so, created in the Run-time Solution section of this article.
% cat enableextfilestdio.c

#include <stdio.h>
#include <stdio_ext.h>
#include <stdlib.h>
#include <sys/resource.h>

#define NoOfFiles 500

void manipulatefd(FILE *);

int main ()
{
FILE *fptr;
struct rlimit rlp;
int i;

(void) getrlimit (RLIMIT_NOFILE, &rlp);
rlp.rlim_cur = NoOfFiles;
/* set the desired number of file descriptors */

if (setrlimit (RLIMIT_NOFILE, &rlp) == -1)
{
perror ("setrlimit(): ");
exit (1);
}

if (enable_extended_FILE_stdio (-1, -1) == -1)
{
perror ("enable_extended_FILE_stdio(3C): ");
exit (1);
}

for (i = 0; i < NoOfFiles; i++)
{
fptr = fopen ("/tmp/enable_test.txt", "w");

if (fptr == NULL)
{
perror("\nfopen failed. ");
exit (1);
}

printf ("\nfd = %d", fileno(fptr));

if (fileno (fptr) % 400 == 0)
{
manipulatefd (fptr);
}
}

return (0);
}

% export LD_LIBRARY_PATH=/tmp:$LD_LIBRARY_PATH
% cc -o enableextfilestdio -lthirdparty enableextfilestdio.c

% ./enableextfilestdio
fd = 3
fd = 4
fd = 5
...
...
fd = 398
fd = 399
fd = 400
thirdpartysrc.c : manipulatefd(): underlying file descriptor = 196
Application violated extended FILE safety mechanism.
Please read the man page for extendedFILE.
Aborting
Abort
(core dumped)

See the man page of enable_extended_FILE_stdio(3C) for more.

Alert: _file becomes _magic in the Next Major Customer Release of Solaris

The following alert does not apply to Solaris 10 including update releases.

In order to ensure the safety in using the extended FILE mechanism, _file in FILE structure has been intentionally renamed to _magic in Solaris Express (a monthly snapshot of the next major customer release of Solaris currently under development) and OpenSolaris after build 39. This change would break the compilation of source code containing any references to FILE -> _file.

The following diff output shows the changes introduced in the definition of FILE structure to accommodate the extended FILE facility.
-        unsigned char   _file;  /* UNIX System file descriptor */
+ unsigned char _magic; /* Old home of the file descriptor */
+ /* Only fileno(3C) can retrieve the value now */


- unsigned __filler:4;
+ unsigned __extendedfd:1; /* enable extended FILE */
+ unsigned __xf_nocheck:1; /* no extended FILE runtime check */

+ unsigned __filler:10;

Hence the compilation of code with references to FILE -> _file will fail with the following error message on systems running Solaris Express (SX) or any OpenSolaris distribution after build 39, and of course the next major customer release of Solaris when it is available.
"filename.c", line xx: undefined struct/union member: _file
cc: acomp failed for filename.c

The value found in the field formerly known as _file might no longer contain the FILE's file descriptor. If the code is simply reading the value of _file, replace all such references with the more appropriate fileno(FILE) function (see fileno(3C)). The developers should no longer assign a new value to _file.

Impact on Run-time Performance


When extended FILE facility is enabled, there is no performance impact when accessing file descriptors <= 255; but there will be a slight performance degradation in accessing file descriptors >= 256 due to the storage/retrieval of the file descriptor in an auxiliary location.

Patches, Bugs


If your system is running any existing version of the Solaris 10 OS -- that is, Solaris 10 3/05 through Solaris 10 11/06 -- you can install the extended FILE facility on the system with the following set of three patches (or later revisions) for your hardware platform:

SPARC platform:

125100-04 Kernel Update patch
120473-05 libc nss ldap PAM zfs patch
125800-01 Fault Manager Patch

x86/x64 platform:

125101-04 Kernel Update patch
120037-15 libc nss ldap PAM zfs patch
125801-01 Fault Manager Patch

If the application code links with STLport C++ standard library that was shipped with Sun Studio compiler suite {using -library=stlport4 compiler option}, make sure Sun Studio 11 is patched with 121017-07 or later on SPARC, and 121018-07 or later on x86/x64 platforms to take advantage of extended FILEs.

Report extended FILE bugs, if any, at bugs.opensolaris.org; and use OpenSolaris discussion forums for any clarifications.

References

  1. PSARC/2006/162 Extended FILE space for 32-bit Solaris processes
  2. Manual pages of stdio(3C), extendedFILE(5), enable_extended_FILE_stdio(3C), fopen(3C), fdopen(3C) and popen(3C).

Footnotes


  1. RFE stands for Request For Enhancement.
  2. Solaris 10 8/07 will informally be referred as 'Solaris 10 Update 4'.
  3. By default file descriptors 0, 1 and 2 are reserved for use as the default stdin, stdout and stderr I/O streams.
  4. fileno() macro was removed from the headers in Solaris release 2.7
  5. Unpatched systems running Solaris 10 releases 3/05 through 11/06 wouldn't have the extended FILE solutions. However it is possible to get the extended FILE facility installed on those systems by applying the latest kernel and libc patches. See Patches, Bugs section for instructions.

Acknowledgments


Craig Mohrman, Peter Shoults, Chien Yen, Tom Gould and HN.


Related Posts:
* Patches to get extendedFILE solution on Solaris 10
* Solaris: Workaround to stdio's 255 open file descriptors limitation
* Solaris: 32-bits , fopen() and max number of open files
_____________
Technorati Tags:
|

No comments:

Post a Comment