How Bigger Heaps Might Slow Down An Application
In this article, we’ll learn how the JVM stores objects and represents them in memory. Additionally, we’ll dig deeper into the performance implications and how we can use them to our benefit.
We’ll also check how to use -XX:+UseCompressedOops, how it might affect the performance of our application, and how it’s connected to the heap’s size.
1. The Size of an Object
The JVM represents a Java object in memory using the following structure: a mark word, a klass pointer, an array length (for arrays only), internal padding, the instance fields, and external padding.
Let’s review these sections in more detail to understand their purpose and the data they store.
1.1. Mark Word
This section stores the runtime data of an object. The JVM might place here information about locks, survivor counts, garbage collection marks, and even hash codes. The size of this section depends entirely on the JVM’s architecture: 32-bit architectures use four bytes for the mark word, while 64-bit architectures use eight. Although we cannot change its size directly, the architecture might affect the application’s performance.
1.2. Klass Pointer
The klass pointer (the unusual spelling comes from HotSpot itself) is the pointer to the object’s class. It refers to the class metadata, which is needed for many things: method invocation, calculating field offsets, memory allocation, garbage collection, etc. The pointer itself takes either four or eight bytes. The most interesting part of this section is that we can alter its size with JVM arguments, which we’ll discuss later in the article.
1.3. Array Length
Array length is set only for arrays and takes four bytes on any system. However, this doesn’t mean we can use every value that fits into those four bytes for indexing: array indexes are signed ints, each JVM may impose its own restrictions, and trying to allocate a huge array might cause an OutOfMemoryError with the following message: Requested array size exceeds VM limit.
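We can trigger this error with a one-liner (a sketch; the exact limit is VM-specific, and the example assumes a HotSpot JVM):

```java
public class HugeArrayDemo {
    public static void main(String[] args) {
        // On HotSpot, the maximum array length is slightly below
        // Integer.MAX_VALUE, so this throws
        // "java.lang.OutOfMemoryError: Requested array size exceeds VM limit"
        // regardless of how large the heap is.
        int[] huge = new int[Integer.MAX_VALUE];
        System.out.println(huge.length);
    }
}
```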
1.4. Internal Padding
Internal padding is optional and aligns the size of the object header. Its size depends on the JVM architecture: 32-bit architectures use four-byte alignment, while 64-bit architectures use eight bytes by default. We can alter this behavior, but that discussion is outside the article’s scope.
1.5. Instance Fields
Instance fields contain the values of the object’s fields. The size of this section depends solely on the field types and their composition. The JVM might perform an optimization and rearrange the order of the fields; this is called field packing, and it aims to reduce the memory wasted on padding.
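We can observe field packing with the OpenJDK JOL tool (a sketch assuming the org.openjdk.jol:jol-core dependency; the exact order is JVM-specific):

```java
import org.openjdk.jol.info.ClassLayout;

public class FieldPackingDemo {
    static class Mixed {
        byte b;
        long l;
        int i;
    }

    public static void main(String[] args) {
        // The printout shows the actual field offsets: HotSpot typically
        // places the long before the int and the byte, regardless of the
        // declared order, to minimize padding between fields.
        System.out.println(ClassLayout.parseInstance(new Mixed()).toPrintable());
    }
}
```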
1.6. External Padding
This final padding ensures the correct alignment of the next object. Sometimes an object already ends on the correct boundary; sometimes additional padding is needed. We can exploit this knowledge and add more fields to a class without increasing its memory footprint, as the sketch below shows.
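Here is a sketch using JOL again; the concrete numbers assume a 64-bit JVM with a 12-byte header and 8-byte alignment:

```java
import org.openjdk.jol.info.ClassLayout;

public class FreePaddingDemo {
    static class OneField { long a; }         // 12 (header) + 8 = 20, padded to 24
    static class TwoFields { long a; int b; } // 12 + 8 + 4 = 24, still 24

    public static void main(String[] args) {
        // Both classes typically report 24 bytes: the extra int fits into
        // space that would otherwise be wasted on padding.
        System.out.println(ClassLayout.parseClass(OneField.class).instanceSize());
        System.out.println(ClassLayout.parseClass(TwoFields.class).instanceSize());
    }
}
```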
2. Calculating an Object’s Size
Let’s calculate the size of an empty object and an empty array on different JVM architectures. We’ll consider the default 8-byte alignment of objects, though we can change it. The general formula would look like this:
Empty Object’s Size = Mark Word + Klass Pointer + Padding
Thus, in general, we’ll have the following results:
* 32-bit JVM: 4 (mark word) + 4 (klass pointer) = 8 bytes; no padding needed.
* 64-bit JVM, compressed klass pointers: 8 + 4 = 12 bytes; padded to 16 bytes.
* 64-bit JVM, uncompressed klass pointers: 8 + 8 = 16 bytes; no padding needed.
For arrays, we’ll have a similar formula that will just include the array’s length:
Empty Array’s Size = Mark Word + Klass Pointer + Array Length + Padding
We can have similar calculations for empty array sizes:
* 32-bit JVM: 4 + 4 + 4 (array length) = 12 bytes; padded to 16 bytes.
* 64-bit JVM, compressed klass pointers: 8 + 4 + 4 = 16 bytes; no padding needed.
* 64-bit JVM, uncompressed klass pointers: 8 + 8 + 4 = 20 bytes; padded to 24 bytes.
Note that we cannot use 64-bit pointers on 32-bit JVMs.
3. Object Pointers
The JVM stores objects in a heap, and to work with these objects, we should be able to reference them by their addresses. If we have 4 GB of memory and want to address every byte individually, four-byte (32-bit) addresses are enough, since 2^32 bytes is exactly 4 GB.
Does this mean we need eight-byte addresses for anything larger? Not necessarily: the JVM has an optimization that restricts where an object may start, aligning it to eight bytes by default.
With this alignment, the last three bits of an object’s address are always zeros, so we can omit them when storing the address, which technically compresses a 32-bit address into 29 bits. Put another way, a stored 32-bit value can represent a 35-bit address, which increases the addressable memory eight times(!): from 4 GB to 32 GB (2^32 addresses × 8-byte alignment = 2^35 bytes):
Pointer Compression
When we want a full address from the compressed one, we can shift it three bits to the left to decompress it. However, as nothing comes for free, we’ll spend CPU cycles on this operation:
00000000000000000000000000101 << 3 = 00000000000000000000000000101000
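In Java-like code, the decompression step boils down to one shift (a sketch assuming a zero-based heap, i.e., no heap base address to add):

```java
// Restores a 64-bit address from a 32-bit compressed pointer by re-adding
// the three zero bits that 8-byte alignment lets us drop.
static long decompress(int compressedOop) {
    return Integer.toUnsignedLong(compressedOop) << 3;
}
```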
If we use a heap larger than 32 GB, the JVM, by default, allocates 64 bits for addresses, so the size of our objects increases. At the same time, the size of an empty object on 64-bit JVMs won’t change, thanks to the alignment rules.
This knowledge might come in handy, as we can sometimes store more information without significantly increasing memory consumption. However, relying on the performance benefits of utilizing padding space isn’t recommended.
4. Performance Implications
We’ll be using a LinkedList of Integers with one million elements. A minimal JMH setup could look like this (the ListState name and SIZE constant are illustrative, not necessarily the original listing):
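```java
import java.util.LinkedList;
import java.util.List;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Illustrative sketch: names and structure may differ from the original.
@State(Scope.Benchmark)
public class ListState {
    public static final int SIZE = 1_000_000;
    public List<Integer> list;

    @Setup
    public void setup() {
        // A pointer-heavy structure: every element adds a LinkedList node
        // plus a boxed Integer, so traversal is dominated by dereferencing.
        list = new LinkedList<>();
        for (int i = 0; i < SIZE; i++) {
            list.add(i);
        }
    }
}
```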
Also, we’ll have simple logic that filters and counts the odd and even numbers from this list; a sketch of it might look like this:
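```java
import java.util.List;

// Illustrative sketch of the filtering logic.
public class Filters {
    public static long countOdd(List<Integer> list) {
        return list.stream().filter(i -> i % 2 != 0).count();
    }

    public static long countEven(List<Integer> list) {
        return list.stream().filter(i -> i % 2 == 0).count();
    }
}
```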
4.1. Decompression and CPU Cycles
Let’s run the following benchmark (again a sketch, reusing the ListState and Filters classes from above; the original method may differ in details):
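```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class CompressedOopsBenchmark {
    @Benchmark
    public void filteringList(ListState state, Blackhole bh) {
        // Almost no allocation; the time is spent chasing node pointers.
        bh.consume(Filters.countOdd(state.list));
        bh.consume(Filters.countEven(state.list));
    }
}
```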
This code produces barely any garbage, so let’s run it with different heap sizes: 2 GB, 4 GB, and 8 GB. In theory, the heap size alone should not affect the performance. We’ll also use -XX:+AlwaysPreTouch to avoid heap resizing issues, launching each run as shown below:
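For example, an 8 GB run might look like this (the jar name is a placeholder for the JMH-built artifact):

java -Xms8g -Xmx8g -XX:+AlwaysPreTouch -jar benchmarks.jar filteringList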
Performance analysis for the filteringList() benchmark
Compressed pointers reduce object sizes but, at the same time, require additional CPU cycles for decompression. Because the only thing our benchmark does is dereference the nodes of the LinkedList, we get higher overhead when using compressed pointers.
Thus, such applications can suffer a significant performance penalty. In our benchmark, the difference is substantial: ≈194 ops/s against ≈242 ops/s.
The issue is that the JVM applies this shifted addressing automatically once the heap size exceeds 4 GB; below that boundary, it can use the 32-bit addresses directly, without the shift. Compressed pointers themselves have been enabled by default since Java 6u23. In other words, sometimes we can make an application slower by adding more memory.
That’s why we need to pay closer attention to heap size management and balance the application to allocate the “just right” amount of memory. Applications that don’t create much garbage can perform better on smaller heaps.
For heaps larger than 32 GB, the JVM doesn’t use compression at all, which skips the decompression step. At the same time, the memory footprint grows because the klass pointers and object references take up eight bytes each.
4.2. Memory Footprint
Let’s check another benchmark with a higher object creation rate; a sketch might look like this:
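```java
import java.util.LinkedList;
import java.util.List;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class CreationBenchmark {
    @Benchmark
    public void creatingList(Blackhole bh) {
        // Allocates a fresh LinkedList on every call: roughly two million
        // objects (nodes plus boxed Integers) per invocation.
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
        }
        bh.consume(list);
    }
}
```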
Performance analysis for the creatingList() benchmark
Here, we have the opposite result. The benchmarks with compressed pointers are more performant. The reason is that the objects take up less space.
4.3. Overall Performance
Let’s combine these two benchmarks to get a more representative result, as applications usually both create and iterate over objects. A sketch of the combined benchmark, reusing the Filters class from above:
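```java
import java.util.LinkedList;
import java.util.List;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class CombinedBenchmark {
    @Benchmark
    public void creatingAndFilteringList(Blackhole bh) {
        // Creation (allocation pressure) followed by iteration
        // (pointer chasing) in a single benchmark method.
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
        }
        bh.consume(Filters.countOdd(list));
        bh.consume(Filters.countEven(list));
    }
}
```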
This benchmark combines the memory-consumption benefits we get from smaller headers with the cycles wasted on decompression:
Performance analysis for the creatingAndFilteringList() benchmark
In this setup, compressed and uncompressed pointers behave similarly, but overall, the compressed pointers show better performance. The result won’t necessarily be the same for every application: the impact depends on the access patterns, the object creation rate, and other factors.
5. Garbage Collection
Let’s check the behavior of the garbage collector during the previous benchmarks. We’ll consider the results of the runs on the 8 GB heap. To analyze the behavior, we’ll use reports from GCeasy.
5.1. Low Creation Rate
First, let’s check our filtering benchmark. This one didn’t create any garbage, and the report is quite boring. The application behaves similarly on all tested heap sizes.
The only difference is the peak heap usage. It’s mainly based on the size of the initial List. Thus, the version that uses uncompressed pointers consumes slightly more memory.
Checking reports and visuals doesn’t make much sense. The benchmarks don’t produce any garbage, and we pre-initialized the heap with -Xmx, -Xms, and -XX:+AlwaysPreTouch, so the only line in the garbage collection logs is the following:
[0.005s][info][gc] Using G1
5.2. High Creation Rate
In the case of the creatingList benchmark, there’s a significant difference between the creation rates of the two configurations. With uncompressed pointers, the objects are larger, so the reported creation rate (in bytes per second) is higher. We can also see the difference in the garbage collection activity: uncompressed pointers lead to more garbage collection cycles, while compressed pointers require fewer. Fewer cycles are reasonable, as the smaller objects take longer to fill the heap. As a result, with compressed pointers, we get slightly better performance and faster garbage collection cycles.
6. Conclusion
Using compressed object pointers can improve the performance of our application, but it can also make it run slower, or sometimes it won’t have any effect at all. Everything depends on the application and our goals.
To identify the actual effect of -XX:+UseCompressedOops, we should profile the application and see how it behaves in different circumstances. Understanding the internals of the JVM provides more insight and prevents the use of JVM flags that won’t make any difference. At the same time, it’s always possible to check our assumptions using yCrash and ensure that a theoretical optimization provides real benefits.