Cut Our AWS Bill by 45%

Cut Our AWS Bill by 45%

Identifying the Resource Gluttons

Application architecture was fairly standard for a modern Java application:

  • Spring Boot 2.7 backend services

  • PostgreSQL RDS instances for persistent storage

  • Redis for caching

  • EC2 instances in an auto-scaling group behind a load balancer

The first step was understanding exactly where our cloud dollars were going. I set up detailed cost allocation tags and analyzed our spending patterns using AWS Cost Explorer.

The results were surprising:

  • EC2 instances: 58% of total cost

  • RDS PostgreSQL: 25% of total cost

  • Data transfer: 12% of total cost

  • Other services (Redis, S3, etc.): 5% of total cost

Our EC2 costs were the obvious target, and digging deeper revealed something even more interesting: we were running far more instances than our traffic should have required. Our auto-scaling was frequently triggered, spinning up new instances that would remain underutilized.

The Resource Utilization Mystery

Our monitoring showed a peculiar pattern. Each EC2 instance would start with healthy metrics but gradually experience:

  1. Increasing CPU utilization (eventually hitting 70–80%)

  2. Growing heap usage in the JVM

  3. Slower response times

  4. Decreased request throughput

The Database Connection Revelation

After enabling detailed performance monitoring and log analysis, we discovered something surprising: our application was creating an excessive number of database connections, and many weren't being properly closed.

A typical API request flow would look something like this:

But our connection usage didn't align with this clean pattern. Instead, we were seeing:

These leaked connections would accumulate until our connection pool was exhausted, causing performance degradation that would eventually trigger auto-scaling.

Application was using Spring Data JPA with some custom repository implementations that weren't properly managing transaction boundaries and connection lifetimes.

The Spring Boot 3.5 Revelation

Around this time, Spring Boot 3.5 was released with several notable improvements to database connection management and ORM performance. The release notes mentioned "significantly improved resource utilization," which caught my attention.

After some research, I discovered that Spring Boot 3.5 included:

  1. Enhanced connection pool integration

  2. Improved transaction management

  3. Smarter resource cleanup

  4. Better handling of lazy loading scenarios

Could an upgrade help with our specific problems? It seemed worth investigating.

The Configuration Changes That Made All the Difference

We decided to upgrade to Spring Boot 3.5 and implement several critical configuration changes focused on database connection management. Here are the specific changes that had the biggest impact:

1. Connection Pool Optimization

We switched from the default HikariCP settings to a configuration optimized for our workload:

The critical addition here was the , which helped us identify and log potentially leaked connections. Setting a lower value also prevented us from keeping unnecessary connections open during quieter periods.

2. Transaction Management Improvements

We refined our transaction management configuration:

The setting was a game-changer. It ensures that database connections are acquired at the last possible moment and released as soon as the transaction completes.

3. JPA Query Optimization

We implemented several JPA and Hibernate optimization:

The new query optimizer in Spring Boot 3.5 was particularly effective at reducing the number of database queries required for our common operations.

4. Statement Caching

We enabled prepared statement caching, which had a remarkable impact on database performance:

These settings ensure that frequently used SQL statements are cached, reducing the overhead of statement preparation.

5. Context-Specific Connection Management

For our most resource-intensive endpoints, we implemented context-specific transaction and connection settings:

This allowed us to provide different transaction and fetch behaviors for specific operations rather than using a one-size-fits-all approach.

The Implementation and Immediate Results

Implementing these changes required careful testing, as database connection issues can be subtle and environment-specific. We:

  1. Created a staging environment that mirrored production

  2. Upgraded to Spring Boot 3.5 and implemented the new configurations

  3. Ran extensive load tests to validate the changes

  4. Monitored connection usage patterns before and after

The initial results were promising:

  • Average database connections per instance dropped from 7.8 to 3.2

  • Connection acquisition times decreased by 68%

  • No connection leaks were detected during 72-hour load tests

Understanding Why It Worked: The Technical Details

To truly appreciate why these changes had such a dramatic impact, it's worth understanding the technical improvements in Spring Boot 3.5 and how our configuration changes leveraged them.

Connection Lifecycle Improvements

Spring Boot 3.5 fundamentally changed how database connections are managed. In previous versions, the framework would often acquire connections earlier than necessary and hold them longer than required. The new setting ensures connections are:

  1. Acquired only when SQL is about to be executed

  2. Released immediately when the transaction completes

This dramatically reduces the connection holding time, allowing a smaller pool to handle the same workload.

Query Optimization Engine

The new query optimizer in Spring Boot 3.5 addresses several common inefficiencies:

  1. N+1 Query Prevention: It detects potential N+1 query patterns and converts them to more efficient batch queries

  2. Join Optimization: It analyzes entity relationships and chooses more efficient join strategies

  3. Fetch Size Tuning: It adjusts JDBC fetch sizes based on the expected result set size

Statement Caching

Database statement preparation has a non-trivial cost. By enabling statement caching, we allowed frequently used queries to bypass this preparation phase:

For our application, which runs a relatively consistent set of queries, this reduced database CPU usage and improved response times.

Leak Detection and Prevention

The leak detection configuration was critical for identifying remaining issues:

This setting logs detailed stack traces when a connection is held for longer than the specified threshold, allowing us to identify and fix the remaining connection management issues in our code.

Beyond Configuration: Code Changes That Complemented Our Approach

While the configuration changes provided the bulk of our improvements, we also made several code changes to complement them:

1. Simplified Repository Methods

We refactored complex repository methods to leverage Spring Data JPA's query derivation:

With Spring Boot 3.5's enhanced query optimizer, the simplified method performs better as the framework can make smarter decisions about fetch strategies.

2. Explicit Transaction Boundaries

We made transaction boundaries more explicit, especially for read-only operations:

The hint allows Spring and the database to optimize query execution further.

3. Async Processing for Batch Operations

For resource-intensive operations, we implemented asynchronous processing with controlled resource usage:

This approach ensured that batch operations didn't consume excessive database connections or CPU resources.

Lessons Learned: Best Practices for Spring Boot Database Configuration

Through this process, we developed several best practices that other teams might find useful:

1. Align Connection Pool Size with Available Database Connections

Your connection pool maximum size should be calculated based on:

For our RDS instance with 100 max connections and 5 application instances:

We rounded to 20 to allow for slight variations.

2. Monitor and Log Connection Usage

Enable detailed connection monitoring:

This provides visibility into connection usage patterns and helps identify issues.

3. Use Environment-Specific Connection Settings

Different environments have different needs. We implemented environment-specific profiles:

This prevented development environments from consuming unnecessary resources.

4. Regularly Review Query Performance

We implemented an SQL performance monitoring solution that logs slow queries and their execution plans:

This helped us identify and optimize the most resource-intensive queries.

The Unexpected Benefits Beyond Cost Savings

While the AWS cost reduction was our primary goal, we discovered several additional benefits:

1. Improved Developer Experience

With better connection management and clearer error messages, developers encountered fewer mysterious timeouts and connection issues during development.

2. More Accurate Load Testing

Our load tests became more predictable and representative of production behavior, allowing for better capacity planning.

3. Reduced Operational Incidents

In the six months following these changes, we experienced:

  • 87% fewer connection-related alerts

  • 92% fewer auto-scaling events

  • Zero production incidents related to database connectivity

4. Environmental Impact

Reducing our server count from 24 to 10 wasn't just a cost saving — it also represented a significant reduction in energy consumption and carbon footprint.

Conclusion: Configuration as a First-Class Optimization Strategy

As software engineers, we often focus on code optimizations, algorithmic improvements, and architectural changes when facing performance challenges. Our experience shows that configuration — particularly database connection configuration — deserves equal attention as a first-class optimization strategy.

Spring Boot 3.5's improvements provided the foundation, but it was our careful configuration tuning that unlocked the full potential of these enhancements. The result wasn't just cost savings but a more reliable, efficient, and environmentally friendly application.

Chirag Bolakani

Software Engineer | Backend | Python, Django, FastAPI | REST APIs | Ex Nokia |🏆 Devsoc'21 CodeChefVIT Winner | 4x Hackathon Winner 🏆

2mo

To view or add a comment, sign in

Others also viewed

Explore topics