How Bigger Heaps Might Slow Down An Application
In this article, we’ll learn how the JVM stores objects and represents them in memory. Additionally, we’ll dig deeper into the performance implications and how we can use them to our benefit.
We’ll also check how to use -XX:+UseCompressedOops, how it might affect the performance of our application, and how it’s connected to the heap’s size.
1. The Size of an Object
The JVM represents a Java object in memory using the following structure: a mark word, a klass pointer, an array length (for arrays only), internal padding, the instance fields, and external padding.
Let’s review these sections in more detail to understand their purpose and the data they store.
1.1. Mark Word
This section stores the runtime data of an object. The JVM might place here information about locks, survivor counts, garbage collection marks, and even hash codes. The size of this section depends entirely on the JVM’s architecture: 32-bit architectures use four bytes for the mark word, while 64-bit architectures use eight. Although we cannot change its size directly, the architecture might affect the application’s performance.
1.2. Klass Pointer
The klass pointer (the unusual spelling comes from HotSpot itself) is the pointer to the object’s class. It refers to the class metadata, which is needed for many things: method invocation, calculating field offsets, memory allocation, garbage collection, etc. The pointer itself takes either four or eight bytes. The most interesting part of this section is that we can alter its size with JVM arguments, which we’ll discuss later in the article.
1.3. Array Length
Array length is set only for arrays and takes four bytes on any system. However, this doesn’t mean we can use every value that fits into those four bytes for indexing: array indexes are signed ints, each JVM may impose its own restrictions, and trying to allocate a huge array might cause an OutOfMemoryError with the following message: Requested array size exceeds VM limit.
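We can trigger this error with a one-liner (a sketch; the exact limit is VM-specific, and the example assumes a HotSpot JVM):

```java
public class HugeArrayDemo {
    public static void main(String[] args) {
        // On HotSpot, the maximum array length is slightly below
        // Integer.MAX_VALUE, so this throws
        // "java.lang.OutOfMemoryError: Requested array size exceeds VM limit"
        // regardless of how large the heap is.
        int[] huge = new int[Integer.MAX_VALUE];
        System.out.println(huge.length);
    }
}
```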
1.4. Internal Padding
Internal padding is optional and aligns the size of the object header. Its size depends on the JVM architecture: 32-bit architectures use four-byte alignment, while 64-bit architectures use eight bytes by default. We can alter this behavior, but that discussion is outside the article’s scope.
1.5. Instance Fields
Instance fields contain the values of the object’s fields. The size of this section depends solely on the field types and their composition. The JVM might perform an optimization and rearrange the order of the fields; this is called field packing, and it aims to reduce the memory wasted on padding.
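We can observe field packing with the OpenJDK JOL tool (a sketch assuming the org.openjdk.jol:jol-core dependency; the exact order is JVM-specific):

```java
import org.openjdk.jol.info.ClassLayout;

public class FieldPackingDemo {
    static class Mixed {
        byte b;
        long l;
        int i;
    }

    public static void main(String[] args) {
        // The printout shows the actual field offsets: HotSpot typically
        // places the long before the int and the byte, regardless of the
        // declared order, to minimize padding between fields.
        System.out.println(ClassLayout.parseInstance(new Mixed()).toPrintable());
    }
}
```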
1.6. External Padding
This final padding ensures the correct alignment of the next object. Sometimes an object already ends on the correct boundary; sometimes additional padding is needed. We can exploit this knowledge and add more fields to a class without increasing its memory footprint, as the sketch below shows.
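Here is a sketch using JOL again; the concrete numbers assume a 64-bit JVM with a 12-byte header and 8-byte alignment:

```java
import org.openjdk.jol.info.ClassLayout;

public class FreePaddingDemo {
    static class OneField { long a; }         // 12 (header) + 8 = 20, padded to 24
    static class TwoFields { long a; int b; } // 12 + 8 + 4 = 24, still 24

    public static void main(String[] args) {
        // Both classes typically report 24 bytes: the extra int fits into
        // space that would otherwise be wasted on padding.
        System.out.println(ClassLayout.parseClass(OneField.class).instanceSize());
        System.out.println(ClassLayout.parseClass(TwoFields.class).instanceSize());
    }
}
```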
2. Calculating an Object’s Size
Let’s calculate the size of an empty object and an empty array on different JVM architectures. We’ll consider the default 8-byte alignment of objects, though we can change it. The general formula would look like this:
Empty Object’s Size = Mark Word + Klass Pointer + Padding
Thus, in general, we’ll have the following results:
* 32-bit JVM: 4 (mark word) + 4 (klass pointer) = 8 bytes; no padding needed.
* 64-bit JVM, compressed klass pointers: 8 + 4 = 12 bytes; padded to 16 bytes.
* 64-bit JVM, uncompressed klass pointers: 8 + 8 = 16 bytes; no padding needed.
For arrays, we’ll have a similar formula that will just include the array’s length:
Empty Array’s Size = Mark Word + Klass Pointer + Array Length + Padding
We can have similar calculations for empty array sizes:
* 32-bit JVM: 4 + 4 + 4 (array length) = 12 bytes; padded to 16 bytes.
* 64-bit JVM, compressed klass pointers: 8 + 4 + 4 = 16 bytes; no padding needed.
* 64-bit JVM, uncompressed klass pointers: 8 + 8 + 4 = 20 bytes; padded to 24 bytes.
Note that we cannot use 64-bit pointers on 32-bit JVMs.
3. Object Pointers
The JVM stores objects in a heap, and to work with these objects, we should be able to reference them by their addresses. If we have 4 GB of memory and want to address every byte individually, four-byte (32-bit) addresses are enough, since 2^32 bytes is exactly 4 GB.
Does this mean we need eight-byte addresses for anything larger? Not necessarily: the JVM has an optimization that restricts where an object may start, aligning it to eight bytes by default.
With this alignment, the last three bits of an object’s address are always zeros, so we can omit them when storing the address, which technically compresses a 32-bit address into 29 bits. Put another way, a stored 32-bit value can represent a 35-bit address, which increases the addressable memory eight times(!): from 4 GB to 32 GB (2^32 addresses × 8-byte alignment = 2^35 bytes):
Pointer Compression
When we want a full address from the compressed one, we can shift it three bits to the left to decompress it. However, as nothing comes for free, we’ll spend CPU cycles on this operation:
00000000000000000000000000101 << 3 = 00000000000000000000000000101000
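In Java-like code, the decompression step boils down to one shift (a sketch assuming a zero-based heap, i.e., no heap base address to add):

```java
// Restores a 64-bit address from a 32-bit compressed pointer by re-adding
// the three zero bits that 8-byte alignment lets us drop.
static long decompress(int compressedOop) {
    return Integer.toUnsignedLong(compressedOop) << 3;
}
```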
If we use a heap larger than 32 GB, the JVM, by default, allocates 64 bits for addresses, so the size of our objects increases. At the same time, the size of an empty object on 64-bit JVMs won’t change, thanks to the alignment rules.
This knowledge might come in handy, as we can sometimes store more information without significantly increasing memory consumption. However, relying on the performance benefits of utilizing padding space isn’t recommended.
4. Performance Implications
We’ll be using a LinkedList of Integers with one million elements. A minimal JMH setup could look like this (the ListState name and SIZE constant are illustrative, not necessarily the original listing):
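```java
import java.util.LinkedList;
import java.util.List;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Illustrative sketch: names and structure may differ from the original.
@State(Scope.Benchmark)
public class ListState {
    public static final int SIZE = 1_000_000;
    public List<Integer> list;

    @Setup
    public void setup() {
        // A pointer-heavy structure: every element adds a LinkedList node
        // plus a boxed Integer, so traversal is dominated by dereferencing.
        list = new LinkedList<>();
        for (int i = 0; i < SIZE; i++) {
            list.add(i);
        }
    }
}
```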
Also, we’ll have simple logic that filters and counts the odd and even numbers from this list; a sketch of it might look like this:
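```java
import java.util.List;

// Illustrative sketch of the filtering logic.
public class Filters {
    public static long countOdd(List<Integer> list) {
        return list.stream().filter(i -> i % 2 != 0).count();
    }

    public static long countEven(List<Integer> list) {
        return list.stream().filter(i -> i % 2 == 0).count();
    }
}
```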
4.1. Decompression and CPU Cycles
Let’s run the following benchmark (again a sketch, reusing the ListState and Filters classes from above; the original method may differ in details):
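```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class CompressedOopsBenchmark {
    @Benchmark
    public void filteringList(ListState state, Blackhole bh) {
        // Almost no allocation; the time is spent chasing node pointers.
        bh.consume(Filters.countOdd(state.list));
        bh.consume(Filters.countEven(state.list));
    }
}
```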
This code produces barely any garbage, so let’s run it with different heap sizes: 2 GB, 4 GB, and 8 GB. In theory, the heap size alone should not affect the performance. We’ll also use -XX:+AlwaysPreTouch to avoid heap resizing issues, launching each run as shown below:
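For example, an 8 GB run might look like this (the jar name is a placeholder for the JMH-built artifact):

java -Xms8g -Xmx8g -XX:+AlwaysPreTouch -jar benchmarks.jar filteringList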
Performance analysis for the filteringList() benchmark
Compressed pointers reduce object sizes but, at the same time, require additional CPU cycles for decompression. Because the only thing our benchmark does is dereference the nodes of the LinkedList, we get higher overhead when using compressed pointers.
Thus, such applications can suffer a significant performance penalty. In our benchmark, the difference is substantial: ≈194 ops/s against ≈242 ops/s.
The issue is that the JVM applies this shifted addressing automatically once the heap size exceeds 4 GB; below that boundary, it can use the 32-bit addresses directly, without the shift. Compressed pointers themselves have been enabled by default since Java 6u23. In other words, sometimes we can make an application slower by adding more memory.
That’s why we need to pay closer attention to heap size management and balance the application to allocate the “just right” amount of memory. Applications that don’t create much garbage can perform better on smaller heaps.
For heaps larger than 32 GB, the JVM doesn’t use compression at all, which skips the decompression step. At the same time, the memory footprint grows because the klass pointers and object references take up eight bytes each.
4.2. Memory Footprint
Let’s check another benchmark with a higher object creation rate; a sketch might look like this:
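```java
import java.util.LinkedList;
import java.util.List;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class CreationBenchmark {
    @Benchmark
    public void creatingList(Blackhole bh) {
        // Allocates a fresh LinkedList on every call: roughly two million
        // objects (nodes plus boxed Integers) per invocation.
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
        }
        bh.consume(list);
    }
}
```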
Performance analysis for the creatingList() benchmark
Here, we have the opposite result. The benchmarks with compressed pointers are more performant. The reason is that the objects take up less space.
4.3. Overall Performance
Let’s combine these two benchmarks to get a more representative result, as applications usually both create and iterate over objects. A sketch of the combined benchmark, reusing the Filters class from above:
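```java
import java.util.LinkedList;
import java.util.List;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class CombinedBenchmark {
    @Benchmark
    public void creatingAndFilteringList(Blackhole bh) {
        // Creation (allocation pressure) followed by iteration
        // (pointer chasing) in a single benchmark method.
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
        }
        bh.consume(Filters.countOdd(list));
        bh.consume(Filters.countEven(list));
    }
}
```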
This benchmark combines the memory-consumption benefits we get from smaller headers with the cycles wasted on decompression:
Performance analysis for the creatingAndFilteringList() benchmark
In this setup, compressed and uncompressed pointers behave similarly, but overall, the compressed pointers show better performance. The result won’t necessarily be the same for every application: the impact depends on the access patterns, the object creation rate, and other factors.
5. Garbage Collection
Let’s check the behavior of the garbage collector during the previous benchmarks. We’ll consider the results of the runs on the 8 GB heap. To analyze the behavior, we’ll use reports from GCeasy.
5.1. Low Creation Rate
First, let’s check our filtering benchmark. This one didn’t create any garbage, and the report is quite boring. The application behaves similarly on all tested heap sizes.
The only difference is the peak heap usage. It’s mainly based on the size of the initial List. Thus, the version that uses uncompressed pointers consumes slightly more memory.
Checking reports and visuals doesn’t make much sense. The benchmarks don’t produce any garbage, and we pre-initialized the heap with -Xmx, -Xms, and -XX:+AlwaysPreTouch, so the only line in the garbage collection logs is the following:
[0.005s][info][gc] Using G1
5.2. High Creation Rate
In the case of the creatingList benchmark, there’s a significant difference between the creation rates of the two configurations. With uncompressed pointers, the objects are larger, so the reported creation rate (in bytes per second) is higher. We can also see the difference in the garbage collection activity: uncompressed pointers lead to more garbage collection cycles, while compressed pointers require fewer. Fewer cycles are reasonable, as the smaller objects take longer to fill the heap. As a result, with compressed pointers, we get slightly better performance and faster garbage collection cycles.
6. Conclusion
Using compressed object pointers can improve the performance of our application, but it can also make it run slower, or sometimes it won’t have any effect at all. Everything depends on the application and our goals.
To identify the actual effect of -XX:+UseCompressedOops, we should profile the application and see how it behaves in different circumstances. Understanding the internals of the JVM provides more insight and prevents the use of JVM flags that won’t make any difference. At the same time, it’s always possible to check our assumptions using yCrash and ensure that a theoretical optimization provides real benefits.