Investigating java.lang.OutOfMemoryError
We rarely encounter this error in the early stages of application development. Functional testing sometimes reveals it, but more often it surfaces during performance testing. It is only an annoyance when it occurs before the application goes to production; the real pain starts when it hits in production. When a production server goes down or hangs with java.lang.OutOfMemoryError, it can lead to heavy monetary loss. You get some breathing space if the server runs for a reasonable time between two memory shortages. If you have a clustered environment and only one of the servers is running out of memory, the situation is still manageable, because the business can continue on the remaining servers. Even so, fixing the problem is always urgent. Because of its technical nature, the problem lands in the techies' tray. In the rest of this discussion, I assume that the reader has a basic understanding of Java, the JVM, application servers and memory concepts.
Though the consequence of this problem is so severe that the JVM itself is brought down, investigating it is not straightforward. What makes the analysis so difficult? First is the urgency to fix it, and second are the challenges of the analysis itself. Here are the challenges faced most often.
- Detailed information is unavailable
- Access to the production environment is limited
- No profiling tool is available in the production environment
- The log level is only enough to give basic information about the problem
- After an OutOfMemoryError, the JVM can provide almost no information
Under these circumstances, investigating the problem becomes a really tough job. Finding the exact cause is sometimes as hard as finding a needle in the ocean. Solving it therefore requires a systematic approach that leads step by step towards the root of the problem. Next we look at the problem in detail and then devise a strategy to solve it.
What Does OutOfMemoryError Mean?
The meaning is simple: the JVM needs more memory and cannot get any. But why does the JVM end up with no memory available? We all know that Java manages object cleanup, and therefore memory, on its own; the programmer does not need to worry about it. (This is one of the reasons why Java is adopted so widely.) To answer the question, let us first understand how the JVM manages the memory available to it. When additional memory is required to create more objects, the JVM takes one of the following steps -
- Use free memory from the allocated memory
- Clean up some of the used memory by garbage collecting some of the unreferenced objects
If, even after these attempts, the JVM cannot get any memory for the thread requiring it, it throws this error and the work stops. It sounds simple, but the underlying algorithms the JVM runs before ending up in this nasty error are not. We leave that complexity for another discussion.
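As a rough illustration of the first step, the memory the JVM can still hand out can be read from the standard Runtime API. This is only a sketch: the class and method names (HeapHeadroom, headroomBytes) are made up for this example, and the numbers it prints depend entirely on the JVM it runs in.

```java
final class HeapHeadroom {
    /** Bytes still obtainable before the heap ceiling: maximum minus what is in use. */
    static long headroomBytes(long maxBytes, long totalBytes, long freeBytes) {
        long usedBytes = totalBytes - freeBytes; // part of the allocated heap holding objects
        return maxBytes - usedBytes;
    }

    /** The same calculation against the running JVM. */
    static long currentHeadroomBytes() {
        Runtime rt = Runtime.getRuntime();
        return headroomBytes(rt.maxMemory(), rt.totalMemory(), rt.freeMemory());
    }

    public static void main(String[] args) {
        System.out.println("Headroom (MB): " + currentHeadroomBytes() / (1024 * 1024));
    }
}
```

When this headroom stays near zero even right after garbage collection, the JVM is close to the situation described above.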
The reasons behind this shortfall of memory can be many. A few important and common ones are –
- Not enough memory allocated to JVM
- Memory intensive operations
- Memory leaks
To know more about these reasons, we dig further into each of them so that it can lead us to the possible root causes.
Not Enough Memory Allocated to JVM:
In other words, this is an infrastructure sizing issue. When the application is prepared for deployment, we determine the size of the infrastructure, mainly its processing power and memory. (There are many more things to consider, but for now these are the important ones.) This applies to both the application server and the database server. Out of the available memory, some is allocated to the (application server) JVM using start-up (-X) parameters. So far, I haven't encountered any parameter to control the processing power (CPU), other than removing a processor from the box, so the CPU is used by the JVM as and when needed. The infrastructure of concern here is therefore the processing power provided by the CPU and the memory allocated to the JVM.
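To see what a running JVM was actually given, the start-up arguments and the resulting limits can be read from inside the process. The following is a minimal sketch using the standard java.lang.management API; the class name JvmSizing and the helper heapFlags are hypothetical names chosen for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class JvmSizing {
    /** Keeps only the heap sizing flags (-Xms..., -Xmx...) from a list of JVM arguments. */
    static List<String> heapFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xms") || arg.startsWith("-Xmx")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Heap flags: " + heapFlags(jvmArgs));
        System.out.println("Max heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        System.out.println("Processors visible to the JVM: " + Runtime.getRuntime().availableProcessors());
    }
}
```

An empty flag list simply means the JVM is running with its default heap sizing.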
Memory Intensive Operations:
There are a couple of scenarios in which considerable memory is occupied and the JVM cannot reclaim it quickly, resulting in a shortage of available memory.
- A single transaction creates a huge number of objects, or a smaller number of heavy objects. Such transactions hold a considerable chunk of JVM memory for the transaction's entire lifetime.
- Transactions that last a very long time keep a moderate amount of memory locked for their entire life. If there are many such transactions, the available memory keeps shrinking. The garbage collector cannot reclaim it either, as the objects involved are still referenced.
- Sometimes web application (HTTP) sessions keep a lot of data in the session, which also restricts the memory available for other processing.
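To make the first scenario concrete, here is a hedged sketch of one common way to keep a single transaction from holding everything at once: handling the data in fixed-size batches, so that only one batch is referenced at a time. The class BatchProcessor and its method forEachBatch are invented for this illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class BatchProcessor {
    /**
     * Feeds items to the handler in batches of at most batchSize.
     * Returns the number of batches handled.
     */
    static <T> int forEachBatch(Iterator<T> source, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                handler.accept(batch);
                batch = new ArrayList<>(batchSize); // drop the old batch so it can be collected
                batches++;
            }
        }
        if (!batch.isEmpty()) { // leftover items that did not fill a whole batch
            handler.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

Whether batching is possible at all depends on the transaction; it only helps when the handler does not itself keep references to earlier batches.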
Memory Leak:
Whenever people encounter an OutOfMemoryError, the first suspect is a memory leak. A memory leak means the garbage collector is not able to reclaim certain memory even though it is logically free, that is, the application no longer needs it but something still holds a reference to it. Static collections that keep holding references to their elements and unnecessary lingering references are some of the primary suspects in a memory leak scenario. I have heard a few people cite connection leaks as a cause of memory leaks, but a connection leak mostly does not result in out of memory. Finding a memory leak is really a big task.
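The static-collection suspect is easiest to see by contrast. A static HashMap used as a cache grows for as long as entries are added, because the map keeps every entry reachable. The sketch below shows one way to bound such a cache using LinkedHashMap's removeEldestEntry hook; the class name BoundedCache and the chosen limit are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order, so the eldest entry is the least recently used
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a limit of 3, inserting keys 0 to 4 leaves only keys 2, 3 and 4 in the map; the evicted entries become eligible for garbage collection once nothing else references them.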
Analysis Technique: Elimination
So far we have gone through the theory of the out of memory error. Now we define a methodology to locate the root cause, which can then be fixed with a suitable solution. At this stage, we know that there are many possible reasons behind a memory shortage. We take each possibility in sequence below and eliminate it based on the data collected. Yes, we need some data from the production environment to analyze this problem. Let us see what we need to pull out and what it contains. Most of this information should be collected from around the time of the error.
- JVM graphs from application server: JVM memory behavior
- Application logs: Application events and GC activity
- Error logs: Error traces for application event
- Garbage collection activity logs (if it is not enabled then enable verbose garbage collection)
- Session/message queue/EJB instance count: Memory utilizing artifacts
- Thread dump: Thread snapshots of JVM
- Heap dump: Object snapshot of JVM
- Access logs: HTTP events
- JDBC logs: JDBC related details
- Database query reports: Query analysis
- CPU utilization graphs: Processor usage details
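Of the items above, the thread dump is one that can also be produced from inside the JVM when no external tool is available. The following is a minimal sketch built on the standard ThreadMXBean; the class name ThreadDump and the output layout are choices made for this example, not a standard format.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

final class ThreadDump {
    /** Returns the name, state and stack of every live thread as plain text. */
    static String capture() {
        StringBuilder out = new StringBuilder();
        // false, false: skip monitor and synchronizer details, which not every JVM supports
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(capture());
    }
}
```

Writing such a snapshot to the application log at intervals gives the periodic dumps that are useful later in the analysis.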
Now it is time to eliminate the possibilities one by one. Let us list them first.
- CPU shortage
- Not enough memory allocated
- Memory intensive operations
- Memory Leak
We take these points one by one till we identify the actual culprit.
CPU Shortage:
Check the CPU utilization graphs/numbers. If the numbers are constantly hitting 100%, there might be a problem with the processor infrastructure, and you may want to suggest an infrastructure improvement. To justify that investment, you can go through the remaining possibilities and eliminate those as well.
Not Enough Memory Allocated:
If the CPU is not the problem, then check the JVM memory behavior graphs. The observations pointing towards this problem are –
- JVM memory usage is always around 100%
- Frequency of (complete) GC activity is high
If you observe either of these, try allocating additional memory to the JVM in use. For some applications this is the application server's JVM, while sometimes it is an external JVM. You can change the heap allocation using the -Xms and -Xmx JVM start-up parameters, for the initial (minimum) and maximum heap size respectively.
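Where no graphs are available, both observations can be approximated from the standard management beans. This is a sketch under assumptions: the class name HeapPressure and its two helpers are hypothetical, and what counts as "too high" is left for you to decide for your application.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapPressure {
    /** Heap in use as a percentage of the maximum; -1 when the maximum is undefined. */
    static double usedPercent(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0;
        }
        return 100.0 * usedBytes / maxBytes;
    }

    /** Share of JVM uptime spent in garbage collection, as a percentage. */
    static double gcTimePercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("Heap used: %.1f%%%n", usedPercent(heap.getUsed(), heap.getMax()));

        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
            gcMillis += Math.max(0, gc.getCollectionTime()); // -1 means the collector does not report time
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("Time in GC: %.2f%%%n", gcTimePercent(gcMillis, uptime));
    }
}
```

A heap that sits close to 100% together with a large share of time spent in GC matches the two observations listed above.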
Memory Intensive Operations:
Again we go back to the JVM memory graphs, looking for a peculiar behavior: there is a sudden increase in memory usage, and the GC tries to reclaim memory but, even after multiple attempts, is able to reclaim very little. The time taken to clean up memory is also very high compared to the rest of the GC activity. You can verify this using the server/application/error logs that contain verbose GC output. Here is an example of such output.
[memory ] 32116.512-32119.934: GC 1740800K->1642039K (1740800K), 1846.869 ms
[memory ] 32123.702-32127.357: GC 1740800K->1652810K (1740800K), 3655.000 ms
[memory ] 32130.501-32134.321: GC 1740800K->1647695K (1740800K), 3820.000 ms
[memory ] 32160.123-32163.956: GC 1740800K->1591811K (1740800K), 3833.000 ms
[memory ] 32234.630-32238.500: GC 1740800K->1505867K (1740800K), -544.961 ms
[memory ] 32283.039-32286.981: GC 1740800K->1423925K (1740800K), 3942.000 ms
[memory ] 32355.097-32358.892: GC 1740800K->1534577K (1740800K), -619.947 ms
[memory ] 32431.978-32435.834: GC 1740800K->1268371K (1740800K), -1336.950 ms
[memory ] 32641.995-32645.666: GC 1740800K->1197137K (1740800K), -1522.033 ms
[memory ] 32892.624-32892.769: GC 1740800K->83716K (1740800K), 145.000 ms
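Reading such output by eye gets tedious, so here is a small sketch that extracts the before/after figures from lines in the format shown above and computes how much each GC reclaimed. It assumes exactly this line layout (other JVMs and versions print verbose GC differently), ignores the duration field, and uses a made-up class name, GcLogLine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcLogLine {
    // Matches the "GC <before>K-><after>K (<heap>K)" part of a line
    private static final Pattern LINE = Pattern.compile("GC (\\d+)K->(\\d+)K \\((\\d+)K\\)");

    final long beforeKb;
    final long afterKb;
    final long heapKb;

    private GcLogLine(long beforeKb, long afterKb, long heapKb) {
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.heapKb = heapKb;
    }

    /** Parses one log line; returns null when the line is not a GC entry in this format. */
    static GcLogLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new GcLogLine(Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), Long.parseLong(m.group(3)));
    }

    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    double reclaimedPercent() {
        return 100.0 * reclaimedKb() / heapKb;
    }

    public static void main(String[] args) {
        String[] sample = {
            "[memory ] 32116.512-32119.934: GC 1740800K->1642039K (1740800K), 1846.869 ms",
            "[memory ] 32892.624-32892.769: GC 1740800K->83716K (1740800K), 145.000 ms"
        };
        for (String line : sample) {
            GcLogLine gc = parse(line);
            System.out.printf("reclaimed %dK (%.1f%% of heap)%n", gc.reclaimedKb(), gc.reclaimedPercent());
        }
    }
}
```

For the first line of the example the GC reclaims 98761K, under 6% of the heap, while the last line reclaims 1657084K, about 95%. That contrast is the pattern to look for: a run of expensive collections that free very little, ending only when the heavy operation finishes.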
Now go back to the application logs and check the events during such behavior; they should point you to the memory intensive operation at the root. You can also check the database queries and the amount of data returned to the Java layer. It may be a huge amount of data that flows out.
Other observations on the JVM memory graph that point towards long running heavy transactions and web application session problems are the memory graph moving upwards in steps and staying at the higher level even after GC activity. The method is the same again: go back to the logs and see what was triggered during this time. It should give you the required pointers to application events.
Sometimes you can improve JVM performance by tuning it. The JVM provides many parameters (which of course give operating system dependent results) that can be altered to get the best throughput from the JVM.
Memory Leak:
Here the tricky business starts. One of the symptoms is that every GC leaves less available memory than the GC before it, and this continues GC after GC. In that case there is a possibility of a memory leak. To investigate further, I would recommend starting with a heap dump and a thread dump. A dump taken just prior to the error is best, but a few periodic dumps also help. There are many tools available that can help analyze this data. The output will show threads that have been hanging around for a very long time and occupying memory, and objects that are not getting garbage collected. These will help you identify the use cases (business scenarios) that result in the memory leak. Now we have a few selected use cases that might be producing objects which cannot be garbage collected. Further analysis can be done using a profiling tool in an environment similar to production. I found this interesting article which explains methods to be followed in this analysis. Most mature profiling tools provide similar data, so the analysis can be done along the same lines.
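The symptom described at the start of this section, each GC leaving less available memory than the one before, can be checked mechanically once you have the "used after GC" figures from the GC log. The sketch below is deliberately crude: it demands a strict rise at every step, which real and noisy data will rarely show, so treat it as a starting point. The class and method names are hypothetical.

```java
final class LeakTrend {
    /** True when every post-GC reading is higher than the one before it. */
    static boolean usedAfterGcKeepsRising(long[] usedAfterGcKb) {
        if (usedAfterGcKb.length < 2) {
            return false; // a single reading cannot show a trend
        }
        for (int i = 1; i < usedAfterGcKb.length; i++) {
            if (usedAfterGcKb[i] <= usedAfterGcKb[i - 1]) {
                return false; // this GC recovered at least as much as the previous one
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] suspicious = {410000L, 455000L, 512000L, 590000L};
        long[] healthy = {410000L, 380000L, 455000L, 400000L};
        System.out.println("suspicious: " + usedAfterGcKeepsRising(suspicious));
        System.out.println("healthy: " + usedAfterGcKeepsRising(healthy));
    }
}
```

The sample arrays are invented values, not measurements. A positive result only says a leak is possible; the heap dump analysis is still needed to find which objects are being retained.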
After going through the above elimination process, it should be possible to find the root cause of the OutOfMemoryError. You can then identify the fix that suits the application's business needs.









