Friday, October 14, 2005

Handling SIGFPE

A couple of days ago I was asked to look into a simple C program that deliberately divides by zero. It installs a signal handler for SIGFPE (Floating Point Exception) that prints a short message when the signal is caught. Here's the code (courtesy: Dheeraj):
% cat fpe.c

#include <sys/types.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

void signal_handler (int signo, siginfo_t *si, void *data) {
    switch (signo) {
        case SIGFPE:
            fprintf(stdout, "Caught FPE\n");
            break;
        default:
            fprintf(stdout, "default handler\n");
    }
}

int main (void) {
    struct sigaction sa = { 0 }, osa;    /* zeroed so that sa_mask is not left uninitialized */
    unsigned int b = UINT_MAX;

    sa.sa_flags = SA_ONSTACK | SA_RESTART | SA_SIGINFO;
    sa.sa_sigaction = signal_handler;
    sigaction(SIGFPE, &sa, &osa);

    b /= 0x0;    /* deliberate division by zero */

    return b;
}
At run time, the OS delivers SIGFPE as soon as the statement b /= 0x0; executes (despite its name, SIGFPE also covers integer arithmetic errors such as this one). Since a handler is installed for the signal, I expected it to print Caught FPE once on the console and then return from main(). Strangely enough, the signal was caught over and over, as though the program were in an infinite loop, and the process never exited.
% cc -o fpe fpe.c
"fpe.c", line 25: warning: division by 0

% ./fpe
Caught FPE
Caught FPE
Caught FPE
Caught FPE
Caught FPE
Caught FPE
Caught FPE
^C
This turns out to be the expected behavior. It appears that when an instruction traps on an unmasked arithmetic exception, the hardware leaves the instruction pointer at the beginning of that same instruction. When the handler returns, the instruction is executed again and traps again, which explains the repeated SIGFPEs from the same process (and from the same instruction).
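The re-delivery can be observed directly. The following is a minimal sketch, not the original test program: it runs the division in a child process, counts SIGFPE deliveries, compares the faulting address reported in si_addr each time, and leaves through _exit() on the third delivery. The names count_handler, count_fpe_deliveries, fpe_count, and first_addr are made up for illustration, and the sketch assumes a processor on which integer division by zero traps (some, such as AArch64, do not trap).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t fpe_count = 0;  /* deliveries seen so far */
static void *volatile first_addr = NULL;     /* si_addr of the first delivery */

/* Count deliveries; give up on the third one instead of looping forever. */
static void count_handler(int signo, siginfo_t *si, void *data) {
    (void)signo;
    (void)data;
    if (fpe_count == 0)
        first_addr = si->si_addr;
    else if (si->si_addr != first_addr)
        _exit(100);               /* a different instruction trapped */
    if (++fpe_count == 3)
        _exit(3);                 /* _exit() is async-signal-safe */
    /* Returning here restarts the same division, which traps again. */
}

/*
 * Hypothetical helper: divide by zero in a child process and return the
 * child's exit status: 3 after three deliveries from one address, 0 if the
 * division did not trap at all, -1 on any other outcome.
 */
int count_fpe_deliveries(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_SIGINFO;
        sa.sa_sigaction = count_handler;
        if (sigaction(SIGFPE, &sa, NULL) != 0)
            _exit(101);

        volatile int zero = 0;    /* volatile: force a run-time division */
        volatile int result = 1 / zero;
        (void)result;
        _exit(0);                 /* reached only if nothing trapped */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A return value of 3 from count_fpe_deliveries() means the handler ran three times for the same instruction address, which is the loop seen in the output above cut short by hand.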

Now the developer has the following choices:
  1. Abort the program.

  2. Modify the operands of the instruction so that the exception does not occur, then continue by re-executing the instruction. This supplies a result for the trapping instruction.

    --And/Or--

  3. Update the instruction pointer (PC) so that execution continues at the next instruction (nPC).
I chose the last option and simply set the program counter to the address of the next instruction, as follows. The new lines are the #include <ucontext.h> directive and the ones that use uc.
% cat fpe.c

#include <sys/types.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <ucontext.h>

void signal_handler (int signo, siginfo_t *si, void *data) {
    ucontext_t *uc;
    uc = (ucontext_t *) data;    /* the third argument is the saved user context */

    switch (signo) {
        case SIGFPE:
            fprintf(stdout, "Caught FPE\n");
            uc->uc_mcontext.gregs[REG_PC] = uc->uc_mcontext.gregs[REG_nPC];    /* resume at the next instruction */
            break;
        default:
            fprintf(stdout, "default handler\n");
    }
}

int main (void) {
    struct sigaction sa = { 0 }, osa;    /* zeroed so that sa_mask is not left uninitialized */
    unsigned int b = UINT_MAX;

    sa.sa_flags = SA_ONSTACK | SA_RESTART | SA_SIGINFO;
    sa.sa_sigaction = signal_handler;
    sigaction(SIGFPE, &sa, &osa);

    b /= 0x0;    /* deliberate division by zero */

    return b;
}

% cc -o fpe fpe.c
"fpe.c", line 30: warning: division by 0

% ./fpe
Caught FPE

uc points to the user context, described by the structure ucontext_t. The user context includes the contents of the calling process's machine registers, the signal mask, and the current execution stack. uc_mcontext is a member of ucontext_t, of type mcontext_t. gregs, the general register set, is a member of mcontext_t. gregs[REG_PC] holds the address of the current instruction, and gregs[REG_nPC] holds the address of the next instruction.
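Unlike the register set, the signal mask stored in the user context can be read without any processor-specific names. Here is a small sketch, assuming only POSIX signal interfaces: the handler looks at uc_sigmask, the mask that was in effect when the signal arrived and that is restored when the handler returns. The helper inspect_saved_mask and the choice of SIGUSR1 and SIGUSR2 are illustrative and not part of the program above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_in_saved_mask = -1;
static volatile sig_atomic_t usr2_in_saved_mask = -1;

/* With SA_SIGINFO, the third argument points to a ucontext_t. */
static void inspect_handler(int signo, siginfo_t *si, void *data) {
    ucontext_t *uc = (ucontext_t *)data;
    (void)signo;
    (void)si;
    /* uc_sigmask is the mask of the interrupted code, not the handler's. */
    usr1_in_saved_mask = sigismember(&uc->uc_sigmask, SIGUSR1);
    usr2_in_saved_mask = sigismember(&uc->uc_sigmask, SIGUSR2);
}

/*
 * Hypothetical helper: block SIGUSR2, deliver SIGUSR1 to ourselves, and
 * report what the handler found in the saved context.
 * Returns 0 on success, -1 if a call failed.
 */
int inspect_saved_mask(int *saw_usr1, int *saw_usr2) {
    struct sigaction sa, old_sa;
    sigset_t mask, old_mask;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = inspect_handler;
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    /* Current mask plus SIGUSR2, minus SIGUSR1 so that it can be delivered. */
    if (sigprocmask(SIG_SETMASK, NULL, &mask) != 0)
        return -1;
    sigaddset(&mask, SIGUSR2);
    sigdelset(&mask, SIGUSR1);
    if (sigprocmask(SIG_SETMASK, &mask, &old_mask) != 0)
        return -1;

    raise(SIGUSR1);    /* the handler runs before raise() returns */

    /* Put the original mask and handler back. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);

    *saw_usr1 = usr1_in_saved_mask;
    *saw_usr2 = usr2_in_saved_mask;
    return 0;
}
```

After a successful call, *saw_usr2 is 1 because SIGUSR2 was blocked in the interrupted code, and *saw_usr1 is 0 even though SIGUSR1 is blocked while its own handler runs: the saved context describes the interrupted code, which is also why changing its PC affects where execution resumes.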

So the statement uc->uc_mcontext.gregs[REG_PC] = uc->uc_mcontext.gregs[REG_nPC]; moves the saved program counter past the trapping instruction. Because the handler modifies the saved user context, the process resumes at the next instruction when the handler returns.

----
This code works "as is" on SPARC, since REG_nPC is available on SPARC. Other processors have no nPC register, so the code needs to be changed a little to work there.
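One way to sidestep the register names altogether, sketched here as an alternative rather than a port of the code above, is to leave the handler with siglongjmp() instead of editing the saved PC. This is a different technique: execution does not resume at the next instruction but jumps back to a point saved earlier with sigsetjmp(). The function name checked_div and its return convention are invented for the example, and the explicit divisor test after the division is there for processors on which integer division by zero does not trap.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;    /* where the handler jumps back to */

/* Installed only while checked_div() is running. */
static void fpe_handler(int signo) {
    (void)signo;
    /* Abandon the trapping instruction; also restores the saved mask. */
    siglongjmp(recover_point, 1);
}

/*
 * Hypothetical helper: store a / b in *quotient and return 0, or return -1
 * and leave *quotient untouched if the division is not possible.
 */
int checked_div(int a, int b, int *quotient) {
    struct sigaction sa, old_sa;
    volatile int status = -1;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = fpe_handler;
    if (sigaction(SIGFPE, &sa, &old_sa) != 0)
        return -1;

    /* Second argument 1: save the signal mask so siglongjmp() restores it. */
    if (sigsetjmp(recover_point, 1) == 0) {
        volatile int divisor = b;    /* volatile: force a run-time division */
        int q = a / divisor;         /* may raise SIGFPE */
        if (b != 0) {                /* covers processors that do not trap */
            *quotient = q;
            status = 0;
        }
    }
    /* Reached both normally and after siglongjmp(). */
    sigaction(SIGFPE, &old_sa, NULL);
    return status;
}
```

Saving the mask in sigsetjmp() matters because SIGFPE is blocked while its handler runs; without the restore, a later division by zero in the same process would find the signal still blocked.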
