Debugging Fun with JNI

Our proprietary system, written in a mix of Java and C++, was crashing quite often. It only started happening after some new features were added, so naturally that became the prime suspect, but my colleague claimed it was because the process ran out of memory.

I wrote a small test case to see if an out-of-memory situation would cause this, and true enough, it did. The JVM generated a crash log file that looked like this (I’ve omitted the unimportant parts):

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x6d941e62, pid=996, tid=10040
#

---------------  T H R E A D  ---------------

Current thread (0x0b2dec00):  JavaThread "pool-1-thread-49" ...

siginfo: ExceptionCode=0xc0000005, reading address 0x00000000
...

Stack: [0x0b7e0000,0x0b830000],  sp=0x0b82d92c,  free space=310k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0xa1e62]
C  [JNITask.dll+0x1ec1]  Java_JNITask_doWork+0xc41
C  [JNITask.dll+0x1399]  Java_JNITask_doWork+0x119
j  JNITask.doWork()Ljava/util/Hashtable;+0
j  JNITask.run()V+31
j  java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Ljava/lang/Runnable;)V+59
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+28
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [jvm.dll+0xfad0b]
V  [jvm.dll+0x18c241]
...

It looks like the problem is occurring within a call made from C++ back into the JVM. After tracing the DLL file through IDA Pro, I found this code fragment:

jdoubleArray numbersArr = env->NewDoubleArray(DBL_ARR_SIZE);
env->SetDoubleArrayRegion(numbersArr, 0, DBL_ARR_SIZE-1, (jdouble *) f);

It looks like we skimped on error checking code. The newly-created numbersArr returned by the JVM wasn’t being checked. When the JVM ran out of memory, it failed to create the array and returned NULL, and this NULL went into the next JVM call.

The correct solution here is to check for NULL and return immediately. When we try to allocate the array and it fails due to insufficient memory, the JVM will record an exception but because the native code is still executing, this exception will not be thrown until the native code returns.

As soon as we returned from the native code, the JVM throws the exception:

Exception in thread "pool-1-thread-17" java.lang.OutOfMemoryError: Java heap space
        at JNITask.doWork(Native Method)
        at JNITask.run(Caller.java:23)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Some people have suggested passing the -Xcheck:jni argument to the Sun/Oracle JVM to hunt down JNI issues. Let’s see how useful it is in our case:

WARNING in native method: JNI call made with exception pending
        at JNITask.doWork(Native Method)
        at JNITask.run(Caller.java:23)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

Hmmm it stopped the JVM from crashing, but I’m not sure if that would have helped if we didn’t know what to look out for. It’s basically complaining about the subsequent call after the call to NewDoubleArray failed.

In most cases, the lack of error checking would have gone unnoticed, but because of the amount of memory being consumed, the JVM eventually runs out of memory and increases the chances of this happening.

An alternative method could have been to throw an exception, but this requires you to use env->FindClass to first get a handle to the exception class, then calling env->ThrowNew to create an instance. I doubt it would work if the JVM was already running out of memory.

This entry was posted in code.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.