How To Analyze Java Thread Dumps

While building large-scale systems in Java, you are bound to encounter performance problems and application crashes at some point. These issues require proper diagnosis, and Java thread dumps are a powerful tool for diagnosing and troubleshooting them. When used correctly, a thread dump provides valuable insight into the runtime behavior of an application. In this post, we will discuss best practices and techniques for analyzing Java thread dumps. By the end of this post, you should have a better understanding of how to use thread dumps to optimize the performance of your Java applications and minimize downtime.

What is a Java Thread Dump?

A Java thread dump is a diagnostic snapshot that captures the current state of all the threads running in a Java Virtual Machine (JVM) at a particular moment in time. It contains information about each thread, such as its ID, name, and state, as well as the method call stack of each thread.

When is it useful to capture and analyze a thread dump?

Thread dumps are typically used to diagnose performance and concurrency issues in Java applications. By analyzing a thread dump, you can identify threads that are blocked, waiting, or running for lengthy periods of time, and you can identify the methods that each thread is currently executing. This information can help you understand the behavior of your Java application and diagnose issues such as deadlocks, resource contention, and excessive CPU usage.

So how do I capture a thread dump?

Thread dumps can be captured using various tools:

  • Using jstack: The jstack tool is a command-line utility bundled with the JDK. It captures a thread dump of a
    running Java application, given the process ID (PID) of the JVM running the application. The following command
    takes a thread dump with jstack and redirects it to a file:

    jstack <PID> > thread-dump.txt
  • Using the kill command: On Linux platforms, you can also capture a thread dump by sending the QUIT signal to the JVM:

    kill -QUIT <PID>

    The JVM prints the thread dump to its own standard output, not to the terminal where you ran kill. So if your
    application already redirects its standard output to a log file, you will find the thread dump there.

  • Using VisualVM or other profilers: You can also use VisualVM and other Java profiling tools to capture a thread dump.
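
Thread dumps can also be captured programmatically from within the application itself, using the standard java.lang.management API. Here is a minimal sketch (the class name is illustrative):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpCapture {

    // Builds a thread dump of the current JVM as a string.
    public static String captureThreadDump() {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        // Request locked monitors and synchronizers so lock owners show up in the output.
        for (ThreadInfo info : threadMXBean.dumpAllThreads(true, true)) {
            // Note: ThreadInfo.toString() truncates very deep stacks; iterate
            // info.getStackTrace() yourself if you need every frame.
            dump.append(info.toString());
        }
        return dump.toString();
    }

    public static void main(String[] args) {
        System.out.println(captureThreadDump());
    }
}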

Guidelines to analyze a thread dump

Once you have captured a thread dump, you can follow these guidelines to analyze it:

Understand the format of thread dumps

The first step is to understand the format of a thread dump. Thread dumps are typically presented in stack-trace form, with each thread’s stack trace displayed one after another. The exact format varies with the tool used to capture the dump, but most thread dumps include each thread’s ID, name, and state, along with the list of method calls on its stack. Here is an example of the stack trace of a single thread in a thread dump:


"main" #1 prio=5 os\_prio=0 tid=0x00007fc251808800 nid=0x1a03 waiting on condition [0x000070000deee000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076c80bba0> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
at com.example.MyClass.doSomething(MyClass.java:24)
at com.example.MyClass.main(MyClass.java:10)

In this example, the thread’s name is “main”, its JVM-internal thread ID (tid) is 0x00007fc251808800, and its native thread ID (nid) is 0x1a03. The thread is in the WAITING state, parked on a CountDownLatch whose internal synchronizer is the object <0x000000076c80bba0>. The rest of the stack trace shows the sequence of method calls that led to the thread’s current state.
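
To see how such a trace arises, here is a minimal, hypothetical reconstruction of MyClass in which the main thread awaits a latch that is never counted down:

import java.util.concurrent.CountDownLatch;

public class MyClass {

    public static void main(String[] args) throws InterruptedException {
        doSomething();
    }

    private static void doSomething() throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        // No other thread ever calls latch.countDown(), so the main thread
        // parks here indefinitely and shows up in the thread dump as
        // WAITING (parking) on a CountDownLatch$Sync.
        latch.await();
    }
}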

Here is another example of a thread dump where a deadlock is detected:


"Thread-1" #11 prio=5 os\_prio=0 tid=0x0000000021dfe800 nid=0x2a3 waiting for monitor entry [0x0000000022cfe000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.example.MyClass.method1(MyClass.java:20)
- waiting to lock <0x000000076c80bba0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at com.example.MyClass.method2(MyClass.java:30) - locked <0x000000076c80bb90> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) 
at com.example.MyClass.run(MyClass.java:40) 

"Thread-2" #12 prio=5 os\_prio=0 tid=0x0000000021dff800 nid=0x2a4 waiting for monitor entry [0x0000000022dfd000]
java.lang.Thread.State: BLOCKED (on objec7t monitor)
at com.example.MyClass.method2(MyClass.java:30)
- waiting to lock <0x000000076c80bb90> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at com.example.MyClass.method1(MyClass.java:20)
- locked <0x000000076c80bba0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) 
at com.example.MyClass.run(MyClass.java:40)

In this example, two threads, “Thread-1” and “Thread-2”, are both waiting for monitor entry, that is, waiting to enter a synchronized block. Each thread holds the monitor the other one needs: “Thread-1” is waiting to acquire the lock on <0x000000076c80bba0> while holding the lock on <0x000000076c80bb90>, and “Thread-2” is waiting to acquire the lock on <0x000000076c80bb90> while holding the lock on <0x000000076c80bba0>.

In this case, both threads are blocked, each waiting for the other to release the lock it needs, so neither can make progress: a classic deadlock. To resolve it, you need to analyze the code to understand why the threads acquire the locks in inconsistent order and restructure it so the deadlock cannot occur.
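
A deadlock like this is easy to reproduce. The following minimal sketch (hypothetical code, not the application from the trace above) acquires two monitor locks in opposite orders from two threads; run it, take a thread dump, and you will see traces much like the ones above. The standard fix is to make every thread acquire the locks in the same global order.

public class DeadlockDemo {

    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    public static void main(String[] args) {
        // Thread-1 locks A, then tries to lock B.
        new Thread(() -> {
            synchronized (LOCK_A) {
                sleep(100); // give the other thread time to grab LOCK_B
                synchronized (LOCK_B) {
                    System.out.println("Thread-1 acquired both locks");
                }
            }
        }, "Thread-1").start();

        // Thread-2 locks B, then tries to lock A: the opposite order, so
        // each thread ends up BLOCKED waiting for the other's monitor.
        new Thread(() -> {
            synchronized (LOCK_B) {
                sleep(100);
                synchronized (LOCK_A) {
                    System.out.println("Thread-2 acquired both locks");
                }
            }
        }, "Thread-2").start();
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Note that jstack detects monitor deadlocks automatically and appends a “Found one Java-level deadlock” section to the dump, naming the threads and locks involved.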

Use automated tools for analysis

Another approach to analyzing large thread dumps is to use an automated tool (such as https://fastthread.io/): you upload your thread dump file and the tool analyzes it and reports useful observations. You may still need to dig into the dump yourself to pinpoint the actual problem, but such tools can give you a good head start. Of course, consider data privacy before uploading a thread dump to any online analysis tool.

Consider these tips to make your life easier

In real-world Java enterprise applications, you can sometimes find a few hundred (or even more than a thousand!) threads in a single dump, which makes analysis tedious. It is therefore good practice to give user-defined threads meaningful names when developing a multi-threaded Java application. When you later analyze a thread dump, you will know the names of the threads most likely to be involved in the issue you are investigating, and you can search for those names directly to see what they are up to; the sketch below shows how to name threads. Similarly, if you take multiple thread dumps (say 3 or 4) at intervals of 4-5 minutes and notice the same threads stuck on the same task each time, you can quickly identify the area causing the issue.
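
For example, you can name a thread directly at construction time, and give executor pools a ThreadFactory that assigns a recognizable prefix. This is a minimal sketch; the names and pool size are illustrative:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreads {

    public static void main(String[] args) {
        // Name a thread explicitly when you create it.
        Thread worker = new Thread(NamedThreads::doWork, "order-processor");
        worker.start();

        // Give pool threads a recognizable prefix via a ThreadFactory, so the
        // dump shows "payment-worker-1" instead of the default "pool-1-thread-1".
        ThreadFactory factory = new ThreadFactory() {
            private final AtomicInteger counter = new AtomicInteger(1);
            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, "payment-worker-" + counter.getAndIncrement());
            }
        };
        ExecutorService pool = Executors.newFixedThreadPool(4, factory);
        pool.submit(NamedThreads::doWork);
        pool.shutdown();
    }

    private static void doWork() {
        // placeholder for real work
    }
}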

A real-world example

I would like to share an example I encountered while analyzing a performance issue in one of our applications. In this case, fifteen threads had a similar stack trace; here is the trace for one of them:


"Thread-1" #99 prio=5 os\_prio=0 tid=0x0000000053d3e800 nid=0x44b4 waiting for monitor entry [0x000000005aaae000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.example.MyClass.doSomething(MyClass.java:134)
waiting to lock <0x000000008e6f5da0> (a java.lang.Object) …

Fourteen more threads had a similar stack trace and were likewise in the BLOCKED state. As soon as you spot this sort of situation, search the thread dump for the string “locked <0x000000008e6f5da0>”. Here <0x000000008e6f5da0> is the ID of the lock object used in a synchronized block to serialize execution of a specific task, and the search leads you to the thread that currently holds the lock. Searching for “locked <0x000000008e6f5da0>”, I found the following stack trace in the dump:


"Thread-16" #107 prio=5 os\_prio=0 tid=0x0000000051b52800 nid=0xd74 runnable [0x000000005b2ae000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.close0(Native Method)
at java.io.FileOutputStream.access$000(FileOutputStream.java:53)
at java.io.FileOutputStream$1.close(FileOutputStream.java:356)
at java.io.FileDescriptor.closeAll(FileDescriptor.java:216)
- locked <0x000000260e5c8458> (a java.io.FileDescriptor)
at java.io.FileOutputStream.close(FileOutputStream.java:354)
at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:320)
at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
- locked <0x000000260e5c85a0> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
at java.io.BufferedWriter.close(BufferedWriter.java:266)
- locked <0x000000260e5c85a0> (a java.io.OutputStreamWriter)
at java.io.PrintWriter.close(PrintWriter.java:339)
- locked <0x000000260e5c8560> (a java.io.BufferedWriter)
at com.example.MyClass.doSomething(MyClass.java:145)
- locked <0x000000008e6f5da0> (a java.lang.Object) …

I captured two more thread dumps 3-4 minutes apart and saw similar stack traces in each of them. This clearly indicated the problem area: one thread was taking a long time to perform an I/O operation (in this case, writing to a file and then taking a long time to close the FileOutputStream) while holding the lock that the fourteen other threads were waiting for. It turned out to be an issue with the underlying file system, and once that was rectified, things started working well again.
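
The pattern behind that dump is simple: many threads funneling through a single synchronized block that performs slow file I/O. Here is a hypothetical reconstruction (the class, method, and file names are illustrative, not from the original application); note how closing the PrintWriter cascades through BufferedWriter and OutputStreamWriter down to FileOutputStream, exactly as in the trace above:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class ContendedIoDemo {

    private static final Object FILE_LOCK = new Object();

    public static void doSomething(String line) {
        // Only one thread at a time can write. If close() stalls on a slow
        // file system, every other caller piles up in the BLOCKED state,
        // waiting to lock FILE_LOCK, which is the exact pattern seen in the dump.
        synchronized (FILE_LOCK) {
            try (PrintWriter out = new PrintWriter(
                    new BufferedWriter(new FileWriter("app.log", true)))) {
                out.println(line);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 16; i++) {
            final int id = i;
            new Thread(() -> doSomething("message " + id), "Thread-" + id).start();
        }
    }
}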

Summary

In summary, you can follow these points while analyzing a thread dump:

  • Identify the problematic threads: The initial step in analyzing a thread dump is to identify the threads that are causing performance issues. Look for threads that are in a “blocked” or “waiting” state, as these threads are likely causing contention issues.
  • Analyze the stack trace: Once you have identified the problematic threads, analyze their stack traces to understand what they are doing. Look for long-running methods or methods that are called frequently. These methods could be the source of performance issues.
  • Identify the root cause: After analyzing the stack traces, try to identify the root cause of the performance issue. It could be a synchronization issue, a deadlock, an environmental issue, or a resource contention issue. Look for common patterns in the thread dump to identify the root cause.
  • Act: Once you have identified the root cause of the performance issue, take action to resolve it. This could involve optimizing code, improving resource utilization, resolving the environmental issue, or redesigning the application architecture.

Written By: Ashish Sarna