2009-02-23

log4j causing OutOfMemory error

OutOfMemory Error on application server.

Thread dump analysis reported 15 threads active out of 650. The others were in wait. Inactive threads stack trace reported the following:

at java.lang.Thread.sleep(Native Method)
at org.apache.log4j.helpers.FileWatchdog.run(FileWatchdog.java(Compiled Code))

In here http://www.theserverside.com/discussions/thread.tss?thread_id=27315 I found the following:

I found from the thread dump that lot of threads are created for log4j and not getting recycled.(see the thread dump below). I masked the log4j logging from my code. These log4j threads still get generated. Any ideas???

Thread Dump
"Thread-10" daemon prio=5 tid=0x113015d0 nid=0x838c waiting on monitor [0x133bf000..0x133bfdbc]
at java.lang.Thread.sleep(Native Method)
at org.apache.log4j.helpers.FileWatchdog.run(FileWatchdog.java:95)

SOLUTION [in short: log4j configureAndWatch() spreads lots of threads without deallocating them, eventually causing OOM exceptions]:

I have not worked on JBoss. I am having Weblogic Server. But I am pretty much sure that JBoss might let you do similar stuff.

1. After you have launched the server, run your application in a loop.
2. At the point where the server says that it cannot create new native threads, get a thread dump. In weblogic I did it by running the command :

java weblogic.Admin -url hostname:7001 -username admin -password mypassword THREAD_DUMP

3. Take a look at all the threads that are waiting on monitor. Chances are that 90% of the threads will be spawned by same process. In my case, it was

"Thread-14" daemon prio=5 tid=0x11999e90 nid=0x3388 waiting on monitor [0x135bf000..0x135bfdbc]
at java.lang.Thread.sleep(Native Method)
at org.apache.log4j.helpers.FileWatchdog.run(FileWatchdog.java:95)

I knew that this has something to do with log4j framework that I was using for logging.

4. So I started masking the logging lines, to see where this was being generated from. Eventually I found that the line
DOMConfigurator.configureAndWatch(fileFileName, logFileWatchTime)
was generating the threads that were not getting recycled. So I just went ahead and masked them.

Now everything works fine. It is going to be a different day I will be spending trouble shooting why this is happening. I was using this, because I liked the idea that by doing so, you could dynamically change log4j properties file and not restart the server.

But for now, I can leave with the idea of restarting the server if I change any configuration in log4j properties file. ;->

What i would suggest is that instead of looping 1000 times, just loop 20 times. Get a thread dump before you run your client and after you run youyr client. See what threads are created new, and if most of them are for same process, you need to see where they care created. It might be a bug in your code or third party library that you are using.

No comments:

Post a Comment