Lessons in Scaling Applications for Account Aggregator Tech Products
Authors: Sagar Bodapati and Kunal Kishan
Abstract
Over the past few years, use cases in the Account Aggregator ecosystem have exploded, overcoming the early apprehensions about the ecosystem. As systems evolved, Account Aggregator Tech gradually became a B2C pipeline and demanded higher uptimes, availability, and low latency. This blog details the evolution of the AA Tech platform to meet quality-of-service goals such as reliability and resiliency. It showcases the lessons learned and the strategies implemented to handle the large traffic expected from a B2C pipeline. The journey involves addressing challenges across various layers of the architecture, including the Web Application Firewall, web servers, the application layer, caching mechanisms, databases, and file storage, and covers the challenges faced in each of these components along with the trade-offs. It details how a monolith was converted to microservices for horizontal scalability. Compatibility between SaaS and on-premises deployments, and maintaining similar reliability given the limitations of each, is also explored.
Background:
An Account Aggregator (AA) ecosystem is a network of participants that exchange users' financial information. We built the tech modules that enable Banks and NBFCs to join the network. Beyond that, we built several use cases that help financial entities leverage the AA ecosystem.
Business Objectives:
- Deliver financial information (FI) with low latency; delivery latency forms our key success metric.
- Deliver authentic information, which strengthens trust in the ecosystem.
- Build systems for future scale, i.e., the future of FI delivery within the regulatory framework.
The whole application serves a standard specification while internally integrating with heterogeneous client APIs, databases, and native as well as cloud environments. Hence, it is crucial to separate concerns and maintain isolation between tenants. When thinking of scale, it is also essential that developers working across multiple tenants can deliver quickly.
Engineering Objectives:
1. Integration of heterogeneous systems: includes core banking systems that vary broadly in media types (CSV, XML, JSON, streams), security mechanisms (RSA, PGP, AES, Diffie-Hellman), protocols (REST, SOAP, ISO 8583), content delivery, synchronous vs. polling models, and pagination.
2. Building a universal platform that can serve on cloud or bare-metal environments, with technology restrictions from on-premises environments.
3. Custom throttling per customer.
4. Supporting multiple databases to cater to client requirements.
5. Adherence to regulatory compliance.
6. A single code base, yet with sandboxing of clients and support for customisations.
7. Building for a network of entities where any system can fail, with cascading effects.
Approach:
Your house should be in order before you consider scale. Some critical aspects are modelling the system and choosing design patterns based on its characteristics.
The Seed Architecture:
Heterogeneous Integrations:
To address this, the first step was to separate concerns and isolate the complexity of variance into individual modules.
Inspired by the JUnit Platform, we created a specification library to define the response schema of each core banking integration module. This allowed us to separate and parallelise the development of the core engine from the independent core banking integrations. With the Spring Framework's component-scan feature, we could dynamically load each core banking integration driven by configuration. The resource allocation and context for each module were localised. This allowed us to keep the core engine pure despite the customisations in core banking integrations.
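A minimal sketch of what this configuration-driven loading can look like with Spring Boot's component scan and conditional beans. The `CoreBankingConnector` interface, the connector classes, and the `connectors.*.enabled` properties are illustrative names, not the actual module contract.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;

// Hypothetical contract implemented by every core banking integration module.
interface CoreBankingConnector {
    String bankId();
    String fetchStatement(String accountRef); // payload simplified for illustration
}

// Loaded only when the corresponding flag is set in configuration, so a
// deployment carries exactly the integrations it needs.
@Component
@ConditionalOnProperty(name = "connectors.alpha-bank.enabled", havingValue = "true")
class AlphaBankConnector implements CoreBankingConnector {
    @Override public String bankId() { return "ALPHA"; }
    @Override public String fetchStatement(String accountRef) { return "<stmt/>"; } // stub
}

@Component
@ConditionalOnProperty(name = "connectors.beta-bank.enabled", havingValue = "true")
class BetaBankConnector implements CoreBankingConnector {
    @Override public String bankId() { return "BETA"; }
    @Override public String fetchStatement(String accountRef) { return "{}"; } // stub
}

// The core engine depends only on the interface; component scan injects
// whichever connectors the configuration enabled.
@Service
class ConnectorRegistry {
    private final Map<String, CoreBankingConnector> byBank;

    ConnectorRegistry(List<CoreBankingConnector> connectors) {
        this.byBank = connectors.stream()
                .collect(Collectors.toMap(CoreBankingConnector::bankId, Function.identity()));
    }

    CoreBankingConnector forBank(String bankId) {
        return byBank.get(bankId);
    }
}
```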
Polyglot Persistence:
To support different databases, we used Spring Data JPA with JPQL. However, JPQL did not produce optimised queries for certain cases on certain databases, so we had to fall back to JPA native queries. But native queries embedded in Spring repositories cannot serve multiple databases from the same code. JPA offers a way to externalise native queries, which enabled us to maintain separate native queries for MySQL, Oracle, and PostgreSQL.
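A sketch of the externalised-query approach, assuming Spring Data JPA's named-query resolution; the entity, repository, and mapping-file names are illustrative, and the `jakarta.persistence` imports assume a recent Spring Boot (older versions use `javax.persistence`). The repository method carries no SQL of its own; the named native query behind it lives in a database-specific mapping file shipped with each deployment.

```java
import java.time.Instant;
import java.util.List;

import jakarta.persistence.Entity;
import jakarta.persistence.Id;

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.repository.query.Param;

@Entity
class Consent {
    @Id
    Long id;
    Instant updatedAt;
}

// No SQL here: Spring Data resolves findStaleConsents against a named query
// called "Consent.findStaleConsents". We declare it as a <named-native-query>
// in a database-specific mapping file (e.g., META-INF/orm-mysql.xml vs
// META-INF/orm-oracle.xml), so each deployment ships SQL tuned for its database.
interface ConsentRepository extends JpaRepository<Consent, Long> {

    List<Consent> findStaleConsents(@Param("cutoff") Instant cutoff);
}
```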
Universal Deployments:
To support deployments across multiple clouds and bare metal, our approach was to minimise cloud-specific services and keep the tech stack within the acceptance criteria of on-premises banking environments. For the cases where the cloud adds operational efficiency and cost reduction, we created adapters to support multi-cloud service integrations.
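A sketch of such an adapter for file storage, assuming the AWS SDK v2 for the cloud variant and a plain NFS mount for on-premises; the interface and class names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// The core engine depends only on this interface; the deployment wires in
// the S3-backed adapter on cloud and the shared-filesystem adapter on premises.
interface FileStorageAdapter {
    void put(String key, byte[] content) throws IOException;
    byte[] get(String key) throws IOException;
}

class S3StorageAdapter implements FileStorageAdapter {
    private final S3Client s3;
    private final String bucket;

    S3StorageAdapter(S3Client s3, String bucket) {
        this.s3 = s3;
        this.bucket = bucket;
    }

    @Override
    public void put(String key, byte[] content) {
        s3.putObject(PutObjectRequest.builder().bucket(bucket).key(key).build(),
                RequestBody.fromBytes(content));
    }

    @Override
    public byte[] get(String key) {
        return s3.getObjectAsBytes(GetObjectRequest.builder().bucket(bucket).key(key).build())
                .asByteArray();
    }
}

class NfsStorageAdapter implements FileStorageAdapter {
    private final Path root; // NFS mount shared by all application nodes

    NfsStorageAdapter(Path root) {
        this.root = root;
    }

    @Override
    public void put(String key, byte[] content) throws IOException {
        Path target = root.resolve(key);
        Files.createDirectories(target.getParent());
        Files.write(target, content);
    }

    @Override
    public byte[] get(String key) throws IOException {
        return Files.readAllBytes(root.resolve(key));
    }
}
```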
API Consumption Throttling:
When consuming any API, it is essential to consider the capacity of the upstream system, the burst it can absorb, and its recovery duration. While we must refrain from overloading the upstream system, it is equally important to leverage the maximum throughput the upstream can support. We implemented smart rate controls using Bucket4j with a Redis-backed distributed rate limiter to address this.
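A simplified sketch of per-upstream rate control with Bucket4j; the limits and upstream identifiers are illustrative. In a multi-node deployment, the local bucket shown here would be replaced with Bucket4j's Redis-backed proxy manager so that all nodes share one budget, as described above.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Refill;

// One token bucket per upstream system: allow a short burst up to the bucket
// capacity while refilling at the sustained rate the upstream has agreed to.
class UpstreamRateLimiter {

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    private Bucket bucketFor(String upstreamId) {
        return buckets.computeIfAbsent(upstreamId, id ->
                Bucket.builder()
                        .addLimit(Bandwidth.classic(50,                       // burst capacity
                                Refill.greedy(20, Duration.ofSeconds(1))))    // sustained 20 req/s
                        .build());
    }

    // Returns false when the budget is exhausted; the caller can queue or retry later.
    boolean tryAcquire(String upstreamId) {
        return bucketFor(upstreamId).tryConsume(1);
    }
}
```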
High Availability: Think about the State!
The application components must be redundant and have a failover mechanism for HA. For the app server to run as multiple nodes, it cannot keep state locally: requests can reach any app server, and each node must have access to the latest state.
The application primarily stored the state in the following places.
- In-memory Cache
- File System
- Database
- Schedulers
To share the cache across nodes, we used Redis, a distributed in-memory cache with persistence capabilities. To share the file system across nodes, two popular options are cloud object storage (S3) and the Elastic File System (EFS); we latched onto S3 for its simplicity and rich features, and used a network file system for on-premises deployments. To avoid duplicate runs of schedulers, a leader election mechanism was implemented through Redis. Since the database is common to all application nodes, we were able to remove state from the application.
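A minimal sketch of the Redis-based leader election for schedulers, assuming Spring Data Redis and Spring's scheduling support (`@EnableScheduling` configured elsewhere); the key name, node identity, and intervals are illustrative.

```java
import java.time.Duration;
import java.util.UUID;
import java.util.concurrent.TimeUnit;

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Only the node currently holding the Redis lease runs the job; if that node
// dies, the key expires and another node takes over on its next tick.
@Component
class ReportScheduler {

    private static final String LEADER_KEY = "scheduler:report:leader"; // illustrative key
    private static final Duration LEASE = Duration.ofSeconds(30);

    private final String nodeId = UUID.randomUUID().toString();
    private final StringRedisTemplate redis;

    ReportScheduler(StringRedisTemplate redis) {
        this.redis = redis;
    }

    @Scheduled(fixedDelay = 10_000)
    void run() {
        if (!isLeader()) {
            return; // another node owns the lease
        }
        // ... execute the scheduled work exactly once across the cluster
    }

    private boolean isLeader() {
        // SET NX with TTL: first node to claim the key becomes leader.
        Boolean acquired = redis.opsForValue().setIfAbsent(LEADER_KEY, nodeId, LEASE);
        if (Boolean.TRUE.equals(acquired)) {
            return true;
        }
        // Renew the lease only if this node is already the holder.
        if (nodeId.equals(redis.opsForValue().get(LEADER_KEY))) {
            redis.expire(LEADER_KEY, LEASE.getSeconds(), TimeUnit.SECONDS);
            return true;
        }
        return false;
    }
}
```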
Thus, we have developed a system design in which each component can be individually targeted and optimised for scalability. Our house is in order! Now, we focus on scale, i.e., supporting High TPS.
Achieving Horizontal Auto Scaling:
First, we containerised our application using Docker, following best practices for building efficient images. We leveraged lightweight Alpine-based images in production for better security through a smaller attack surface. We chose the open-source platform Kubernetes for container orchestration, as it can be deployed even on bare metal if needed.
Separation of audit & metrics from Transactional System:
As traffic scaled, the core engine spent significant time writing audit data to the database. Because audit data and transactional data lived in the same database, the DB server spent significant disk I/O and resources on non-transactional work. Additionally, analytics and reporting systems connecting to the same database put heavy pressure on it.
Applying the separation-of-concerns pattern, we first moved the audit and metrics data out of the transactional database. However, the application still spent significant resources writing audit data to the audit database, typically 1–3 ms per write, which remains a concern at scale. This led us to stream the events from the application to a high-throughput intermediate message broker and have a separate consumer application process those events and write them to the audit database. The popular choice was Kafka, which can be scaled to stream millions of messages per second. For on-premises deployments, however, Kafka was too complex a choice for the scale of a dedicated client, so we leveraged Redis to stream events in such scenarios.
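A simplified sketch of the event-streaming path using the plain Kafka producer API; the topic name, configuration values, and publisher class are illustrative, and the Redis-based variant for on-premises deployments would sit behind the same publishing interface.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Audit events are published asynchronously; a separate consumer service reads
// the topic and writes to the audit database, keeping the 1-3 ms write off the
// transactional request path.
class AuditEventPublisher {

    private final KafkaProducer<String, String> producer;

    AuditEventPublisher(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.LINGER_MS_CONFIG, 5); // small batching window
        props.put(ProducerConfig.ACKS_CONFIG, "1");
        this.producer = new KafkaProducer<>(props);
    }

    void publish(String txnId, String auditJson) {
        // send() is asynchronous; the request thread does not wait for the broker.
        producer.send(new ProducerRecord<>("audit-events", txnId, auditJson),
                (metadata, ex) -> {
                    if (ex != null) {
                        // log and fall back (e.g., a local buffer) rather than failing the transaction
                    }
                });
    }
}
```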
Scaling Databases!!
In scaling distributed applications, databases are often the hardest component to scale since they hold the state. While there are several techniques for horizontal scaling, most are hard to manage and expensive. Below are some of the areas where we improved the database's performance.
Denormalisation to reduce the joins:
For most cases in our transactional system, complex joins could be avoided entirely at the cost of storage redundancy.
Applying CQRS Pattern:
Leveraging the replica (slave) database for read operations can reduce the load on the master. We had to implement this pattern carefully, remembering that the replica may not always be up to date.
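One way to implement this read/write split in Spring is a routing `DataSource`; the sketch below is illustrative, with the lookup keys and wiring simplified.

```java
import java.util.HashMap;
import java.util.Map;

import javax.sql.DataSource;

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

// Routes read-only work to the replica; everything else goes to the primary.
class ReplicaAwareRoutingDataSource extends AbstractRoutingDataSource {

    private static final ThreadLocal<Boolean> READ_ONLY = ThreadLocal.withInitial(() -> false);

    // Read paths call markReadOnly(true) before querying (for example from an
    // aspect around @Transactional(readOnly = true) methods) and reset it after.
    static void markReadOnly(boolean readOnly) {
        READ_ONLY.set(readOnly);
    }

    @Override
    protected Object determineCurrentLookupKey() {
        return READ_ONLY.get() ? "replica" : "primary";
    }

    static DataSource of(DataSource primary, DataSource replica) {
        ReplicaAwareRoutingDataSource ds = new ReplicaAwareRoutingDataSource();
        Map<Object, Object> targets = new HashMap<>();
        targets.put("primary", primary);
        targets.put("replica", replica);
        ds.setTargetDataSources(targets);
        ds.setDefaultTargetDataSource(primary); // fall back to the primary
        ds.afterPropertiesSet();
        return ds;
    }
}
```

Keep in mind the caveat above: anything routed to the replica must tolerate a small amount of lag.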
Reduce the number of hits on DB with caching:
For many queries during the journey, similar data was being queried by different components. Leveraging cache and optimising for a high cache hit ratio reduced the impact on the database for repeated queries.
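A minimal sketch using Spring's caching abstraction (Redis-backed in our deployments); the service, cache name, and key are illustrative.

```java
import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

record FipMetadata(String fipId, String baseUrl) { }

@Configuration
@EnableCaching
class CacheConfig { }

@Service
class FipDirectoryService {

    // Repeated lookups of the same FIP within a journey are answered from the
    // cache instead of hitting the database again.
    @Cacheable(cacheNames = "fip-metadata", key = "#fipId")
    public FipMetadata lookup(String fipId) {
        return loadFromDatabase(fipId); // executed only on a cache miss
    }

    // Evict when the directory entry changes so stale data is not served.
    @CacheEvict(cacheNames = "fip-metadata", key = "#fipId")
    public void invalidate(String fipId) { }

    private FipMetadata loadFromDatabase(String fipId) {
        return new FipMetadata(fipId, "https://fip.example/api"); // placeholder for the real query
    }
}
```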
Avoid locking in databases:
For one case of synchronous transaction updates, we initially used pessimistic locking and frequently ran into deadlocks and high lock wait times in specific scenarios. Moving such locks to Redis instead helped dramatically.
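A sketch of such a Redis-based lock. Redisson is used here purely for illustration (the post only states that the locks moved to Redis), and the timeouts and key names are placeholders.

```java
import java.util.concurrent.TimeUnit;

import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

class TransactionLock {

    private final RedissonClient redisson;

    TransactionLock() {
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379"); // illustrative address
        this.redisson = Redisson.create(config);
    }

    void updateWithLock(String txnId, Runnable update) throws InterruptedException {
        RLock lock = redisson.getLock("txn-lock:" + txnId);
        // Wait at most 200 ms for the lock; auto-release after 5 s so a stuck
        // holder cannot cause the lock waits we saw with database-level locks.
        if (lock.tryLock(200, 5_000, TimeUnit.MILLISECONDS)) {
            try {
                update.run();
            } finally {
                lock.unlock();
            }
        } else {
            throw new IllegalStateException("Could not acquire lock for " + txnId);
        }
    }
}
```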
Avoid high-impact DDLs or ALTER TABLE queries during peak times:
It is critical to be aware of the impact of the queries and know the query plan and consequences. The MySQL documentation for online DDLs was handy for us.
Critical microservices can have a dedicated database:
High-impact microservices require low disk I/O, CPU, and buffer wait times. Having a dedicated database for such components gave us lower latencies.
Choose the Right Instance Type (AWS):
Choose a T3 or C5 family instance for your database, and you will soon be trying to work out why it starts choking on I/O. Scale up, add some RAM and CPU. Any better, you ask? The instance type strongly determines how I/O will scale: it fixes the baseline I/O and offers a burstable peak capacity, after which throughput falls back to the baseline. The burst can typically be sustained for only about 30 minutes. There you go! Study the application's sustained I/O demand and choose the instance type accordingly. Do refer to the documentation on Amazon EBS-optimised performance.
Partition the tables:
MySQL InnoDB supports storing each table in a separate file. However, for large tables, having all the data in a single file reduces query performance and puts a heavy load on disk I/O. To partition a table, we must choose a partitioning key: on what basis would you like the data to be split?
e.g.:
- Each month's data in a separate file → range-based partitioning
- Each client's data in a separate file → list-based partitioning (based on a list of values)
InnoDB supports several types of partitioning. Popular options include:
- Range-based partitioning
- List-based partitioning
- Hash-based partitioning
- Composite partitioning
For most of our tables, range-based partitioning was a natural choice. For some tables where data was maintained long term, hash-based partitioning helped with efficient lookups on the partitioning key.
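For illustration, a range-partitioned table roughly like the ones described above can be created as follows. The table, columns, and partition boundaries are hypothetical, and the DDL is executed via Spring's JdbcTemplate only for convenience; a plain SQL migration achieves the same result.

```java
import org.springframework.jdbc.core.JdbcTemplate;

// Creates a table range-partitioned by month on created_timestamp, so each
// month's rows live in their own InnoDB file.
class PartitionedTableMigration {

    private final JdbcTemplate jdbc;

    PartitionedTableMigration(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    void createFiRequestLog() {
        jdbc.execute("""
            CREATE TABLE fi_request_log (
              id BIGINT NOT NULL AUTO_INCREMENT,
              client_id VARCHAR(64) NOT NULL,
              status VARCHAR(32) NOT NULL,
              created_timestamp DATETIME NOT NULL,
              -- the partitioning column must be part of every unique key
              PRIMARY KEY (id, created_timestamp)
            )
            PARTITION BY RANGE COLUMNS (created_timestamp) (
              PARTITION p2024_01 VALUES LESS THAN ('2024-02-01'),
              PARTITION p2024_02 VALUES LESS THAN ('2024-03-01'),
              PARTITION p_future VALUES LESS THAN (MAXVALUE)
            )
            """);
    }
}
```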
Efficient Data Archival Using Partitioning:
In traditional data archiving, a tool like the Percona archiver is used. If historical data has to be removed, the tool first queries the data for the relevant window and dumps it to a file in batches. This process involves many disk seeks across a large number of records. After that, the data has to be deleted from the master database, which also takes significant time and disk space, and even then the freed space is not returned to the operating system.
In our case, most tables are range-partitioned on the created_timestamp column, so each month's data is stored in a separate file. For most tables, our transactional system does not require data older than a month, so at the start of every month the T-2 month's data can be archived. Although that data sits entirely in a separate file, there is no direct way to dump it and truncate the partition.
MySQL InnoDB has a feature called EXCHANGE PARTITION: a partition's data can be swapped with an empty table that has the same schema but no partitioning definition. So we created an archive schema and an empty, non-partitioned table with the same schema, and exchanged the T-2 partition with it. At the end of this step, the T-2 partition is empty and the archive table holds the T-2 month's data. The archive schema can now be dumped with mysqldump without much effort. This method eliminates the burden of data archival on the master database.
The traditional archiver approach can be used for tables where this method is not applicable. In our case, partition-based archival was very effective.
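A sketch of the exchange-partition archival flow, again with illustrative table and partition names and JdbcTemplate standing in for whatever runs the statements. It assumes the swap table has an identical structure and no foreign keys, as MySQL requires for EXCHANGE PARTITION.

```java
import org.springframework.jdbc.core.JdbcTemplate;

// Archives one monthly partition of fi_request_log without copying or deleting
// rows on the transactional tables.
class PartitionArchiver {

    private final JdbcTemplate jdbc;

    PartitionArchiver(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    void archive(String partitionName, String archiveTableName) {
        // 1. Empty clone with identical structure; drop its partitioning so it
        //    qualifies for EXCHANGE PARTITION.
        jdbc.execute("CREATE TABLE fi_request_log_swap LIKE fi_request_log");
        jdbc.execute("ALTER TABLE fi_request_log_swap REMOVE PARTITIONING");

        // 2. Metadata-only swap: the old month's rows move into the clone and
        //    the partition in the live table becomes empty.
        jdbc.execute("ALTER TABLE fi_request_log EXCHANGE PARTITION " + partitionName
                + " WITH TABLE fi_request_log_swap");

        // 3. Move the filled clone into the archive schema; it can then be
        //    dumped with mysqldump and dropped at leisure.
        jdbc.execute("RENAME TABLE fi_request_log_swap TO archive." + archiveTableName);
    }
}
```

For example, at the start of March the job would run `archive("p2024_01", "fi_request_log_2024_01")`.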
Indexing: Leverage composite indices.
Indexing is essential for large-scale databases; even a poorly chosen set of indexes can slow the database down. We optimised the number of rows scanned by using appropriate composite indexes, reducing the impact on disk I/O.
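For illustration, a composite index declared on a hypothetical JPA entity; the column order puts the equality predicates (client, status) before the time range so that lookups scan only the relevant slice of rows.

```java
import java.time.Instant;

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Index;
import jakarta.persistence.Table;

// A composite index matching the common access path: equality on client_id and
// status, then a range scan on created_timestamp.
@Entity
@Table(name = "fi_request_log", indexes = {
        @Index(name = "idx_client_status_created",
               columnList = "client_id, status, created_timestamp")
})
class FiRequestLog {

    @Id
    private Long id;

    @Column(name = "client_id")
    private String clientId;

    @Column(name = "status")
    private String status;

    @Column(name = "created_timestamp")
    private Instant createdTimestamp;
}
```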
Have slow query logging and deadlocks monitored:
Having a mechanism to actively capture and monitor slow query logs helped us identify scenarios where the proper indexes were not being picked. We implemented a weekly review of slow queries until they were brought down to zero.
Observability:
The application should have its telemetry data recorded via tools like the ELK stack, with system SLAs kept in check via monitoring systems like Nagios and paging systems like PagerDuty. Telemetry data is the single source from which developers can optimise their applications.
Resiliency and Fault Tolerance:
The platform has to deal with several external network participants over HTTP, as defined by the specification. If any participant turns slow, the cascading effect can impact other components of the system. To address this, we used the bulkhead design pattern to isolate the threads associated with each participant, and the circuit breaker pattern to deal with repeated failures.
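A sketch of such a per-participant guard, shown here with Resilience4j (the post does not name a specific library); the thresholds and names are illustrative.

```java
import java.time.Duration;
import java.util.function.Supplier;

import io.github.resilience4j.bulkhead.Bulkhead;
import io.github.resilience4j.bulkhead.BulkheadConfig;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.decorators.Decorators;

class ParticipantClient {

    private final CircuitBreaker circuitBreaker;
    private final Bulkhead bulkhead;

    ParticipantClient(String participantId) {
        // A separate breaker and bulkhead per participant, so one slow FIP
        // cannot exhaust threads shared with the others.
        this.circuitBreaker = CircuitBreaker.of(participantId, CircuitBreakerConfig.custom()
                .failureRateThreshold(50)                        // open after 50% failures
                .waitDurationInOpenState(Duration.ofSeconds(30)) // probe again after 30 s
                .build());
        this.bulkhead = Bulkhead.of(participantId, BulkheadConfig.custom()
                .maxConcurrentCalls(20)                          // cap concurrency per participant
                .maxWaitDuration(Duration.ofMillis(100))
                .build());
    }

    String call(Supplier<String> httpCall) {
        Supplier<String> guarded = Decorators.ofSupplier(httpCall)
                .withCircuitBreaker(circuitBreaker)
                .withBulkhead(bulkhead)
                .decorate();
        // Throws CallNotPermittedException / BulkheadFullException when tripped,
        // failing fast instead of tying up threads on a struggling participant.
        return guarded.get();
    }
}
```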
We introduced messaging queues to handle backpressure, along with custom retry strategies that allow graceful recovery of the downstream systems.
In summary, building large-scale applications requires a clear thought process. Model the system, know the scale of your product, and know the business it caters to. Then apply design patterns that break the system down into separate, loosely coupled components. Each element plays its part, yet together they become an orchestra. Systems need constant behavioural assessment via observability so that they can be continuously fine-tuned.