API Rate Limiting
These articles are part of a series on #tinysystemdesign. You can follow me on LinkedIn for immediate notifications about posts that interest you.
🌟 Here the goal is to focus on clarity over tech jargon, making it perfect for newcomers. Dive into essential topics without the overwhelm.
A rate limit is a restriction on the number of requests that a user or client can make to a server within a specified time period. The rate limit can be applied to requests based on various criteria, such as the user’s IP address, authentication credentials, or the requested URL.
The role of each component/service in the above diagram is noted below.
- Client: This represents the user or system making the API request.
- API Gateway: This is the AWS service that handles incoming API requests.
- Usage Plan: This is where the API key is checked and associated rate limits and quotas are retrieved.
- Rate Limit & Quota: This represents the actual rate limits (e.g., requests per minute) and quotas (e.g., total requests per day) set for the API key.
- Allow/Deny Request: Based on the rate limit and quota checks, the request is either allowed to proceed or denied.
- Backend Service: If the request is allowed, it’s forwarded to the actual backend service that handles the business logic.
- Throttle Response: If the request exceeds the rate limits, a throttle response is sent back to the client.
For example, an API might have a rate limit of 100 requests per minute for unauthenticated users, and a higher limit of 1000 requests per minute for authenticated users. Rate limits can also be applied per URL: for example, all requests made to a specific endpoint like /api/users might be limited to 50 requests per minute, while other endpoints have different limits.
It’s also common for rate limits to be based on a “rolling window” of time, rather than a fixed time period. In this case, the number of requests is tracked over a moving window of time, and once the limit is reached, the user will have to wait until the oldest request in the window falls out before making any more requests.
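To make the rolling-window idea concrete, here is a minimal single-process sketch in Python (the class name is illustrative; a real deployment would typically back this with a shared store such as Redis so all servers see the same counts):
import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()  # request times, oldest first

    def allow(self):
        now = time.monotonic()
        # Evict requests that have fallen out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

# 100 requests per rolling 60-second window.
limiter = SlidingWindowLimiter(limit=100, window_seconds=60)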
A note on the code demos: treat them as pseudocode, since in real applications you might need to add a lot more checks to meet your final goal. You may also use a third-party library or a cloud gateway service instead, but the overall concepts remain the same.
Some Theory and background
API rate limiting is a technique used to control the number of requests that a client can make to an API within a specified period. It is an essential aspect of API management, ensuring the stability and availability of services by preventing excessive use that could lead to server overloads or abuse. Here’s an in-depth explanation of API rate limiting, including its importance and implementation methods:
Why Rate Limiting is Important
Preventing Abuse: Rate limiting protects against Denial-of-Service (DoS) attacks and abusive behaviors such as spamming API endpoints.
Resource Management: By limiting requests, servers can efficiently manage resources, ensuring that no single client monopolizes server capacity.
Cost Control: Rate limiting can help control costs, especially in cloud environments where resources are billed based on usage.
Ensuring Fairness: It ensures fair access to resources for all clients, preventing one client from negatively impacting the experience of others.
Compliance: Some services may have legal or contractual obligations to limit data throughput or access, which can be enforced through rate limiting.
Common Rate Limiting Strategies
Token Bucket Algorithm:
— This algorithm allows a fixed number of tokens to be issued at a regular interval.
— Each request consumes a token. When tokens are exhausted, requests are denied until more tokens are added.
— It allows for a burst of traffic as long as tokens are available (see the sketch after this list).
Leaky Bucket Algorithm:
— Requests are processed at a constant rate. Excess requests are queued and handled in order.
— This approach smooths out traffic spikes but can introduce delays for queued requests.
Fixed Window:
— Limits the number of requests in a fixed time window (e.g., 100 requests per minute).
— Simple to implement but can allow bursts at the edge of windows.
Sliding Window:
— Similar to the fixed window but tracks requests in a rolling window to prevent bursts at the edges.
— More complex but provides smoother enforcement.
Quota:
— Limits the total number of requests over a longer period (e.g., 10,000 requests per day).
— Often used in combination with other rate limiting techniques.
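To make the token bucket idea concrete, here is a minimal single-process sketch in Python (the class and parameter names are illustrative, not from any particular library; the fixed-window approach appears in the Flask demo further below):
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum burst size, in tokens
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Top up tokens for the elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return True
        return False

# Allow bursts of up to 10 requests, refilling at 5 tokens per second.
bucket = TokenBucket(capacity=10, refill_rate=5)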
Implementation of Rate Limiting
Rate limiting can be implemented at various layers of an API architecture:
Server-side Implementation:
Use built-in capabilities in API gateways like AWS API Gateway, Azure API Management, or Google Cloud Endpoints.
Implement middleware in your API server using libraries like express-rate-limit for Express.js or flask-limiter for Flask.
Client-side Implementation:
Sometimes, clients voluntarily enforce rate limits by respecting server-defined limits.
Network-level Implementation:
Use network appliances or proxies like NGINX or HAProxy to enforce rate limits at the network edge.
Code Demo (Python Example)
There are many ways to implement rate limiting in code, depending on the programming language, framework, and specific use case. Here is an example of how rate limiting could be implemented in Python using the Flask web framework and the Redis key-value store:
from flask import Flask, request
from redis import Redis

app = Flask(__name__)
redis = Redis()

RATE_LIMIT = 100       # requests per minute
WINDOW_SECONDS = 60    # fixed window length

@app.before_request
def rate_limit():
    # Authenticated users are identified by user id, everyone else by IP.
    user_id = request.headers.get("X-User-Id")
    identifier = user_id or request.remote_addr
    key = f"rate_limit:{identifier}"

    # Atomically count this request within the current window.
    count = redis.incr(key)
    if count == 1:
        # First request in a new window: start the expiry clock.
        redis.expire(key, WINDOW_SECONDS)

    if count > RATE_LIMIT:
        return "Rate limit exceeded", 429

@app.route("/api/users")
def get_users():
    return "Returning list of users"
Here, requests are rate-limited by user id for authenticated users and by IP address for unauthenticated ones. Redis is used to store the per-client counters, accessed through the redis object instantiated at the beginning of the script. The @app.before_request decorator applies the rate-limiting function to all routes, with the limit set to 100 requests per minute. This is just one example; it can be implemented in other ways too, for instance using libraries like ratelimiter or similar, depending on the technology stack.
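As an illustration of the library route, here is a minimal sketch using the third-party flask-limiter package (its constructor signature has changed across versions; this follows the 3.x style, so treat it as an approximation):
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Key clients by IP address and apply a default limit to every route.
limiter = Limiter(get_remote_address, app=app, default_limits=["100 per minute"])

@app.route("/api/users")
@limiter.limit("50 per minute")  # stricter limit for this route only
def get_users():
    return "Returning list of users"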
Code Demo (Node.js Example) — Rate Limit Based on URL
Here’s an example of how rate limiting could be implemented in Node.js using the Express.js framework and the express-rate-limit middleware:
const express = require('express');
const rateLimit = require("express-rate-limit");

const app = express();

const apiLimiter = rateLimit({
    windowMs: 60 * 1000, // 1 minute
    max: 100, // limit each IP to 100 requests per windowMs
    message: "Too many requests, please try again later"
});

// Apply the rate limiter to all requests to the /api endpoint
app.use("/api", apiLimiter);

app.get("/api/users", (req, res) => {
    res.send("Returning list of users");
});
This example uses the express-rate-limit middleware to rate limit requests to the /api endpoint. The rate limiter is configured to allow a maximum of 100 requests per minute per IP address. If a client exceeds this limit, the middleware returns a 429 Too Many Requests response with the message "Too many requests, please try again later".
You can also apply rate limiting to a specific route. For example, to limit requests to /api/users:
const userLimiter = rateLimit({
    windowMs: 60 * 1000, // 1 minute
    max: 50, // limit each IP to 50 requests per windowMs
    message: "Too many requests, please try again later"
});

app.use("/api/users", userLimiter);
This creates a new rate limiter instance with a maximum of 50 requests per minute per IP address, and applies it only to requests made to the /api/users route.
Please note that this is an example implementation and you may need to modify it depending on your use case and requirements. Also, the rate limiting here is based on IP address; there are other ways to identify clients, such as a session ID, an API key, and so on.
Apply to login endpoint using Express.js
Here’s an example of how rate limiting could be implemented for a login endpoint using the Express.js framework and the express-rate-limit middleware:
const express = require('express');
const rateLimit = require("express-rate-limit");

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is available

const loginLimiter = rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    max: 5, // limit each IP to 5 login attempts per windowMs
    message: "Too many login attempts, please try again later"
});

app.use("/api/login", loginLimiter);

app.post("/api/login", (req, res) => {
    // handle login logic
    const { email, password } = req.body;
    // check email and password
    // If valid, return success
    // otherwise, return error
});
This example uses the express-rate-limit middleware to rate limit login attempts to the /api/login endpoint. The rate limiter is configured to allow a maximum of 5 login attempts per 15 minutes per IP address. If a client exceeds this limit, the middleware returns a 429 Too Many Requests response with the message "Too many login attempts, please try again later".
You can also apply rate limiting based on the user’s email address instead of IP address to make it more secure. To do this, you will have to modify the middleware to extract the email address from the login request, and use it as the key for the rate limiter.
const loginLimiter = rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    max: 5, // limit each email to 5 login attempts per windowMs
    // body parsing (e.g., app.use(express.json())) must run before this
    // middleware so that req.body.email is populated
    keyGenerator: (req) => req.body.email,
    message: "Too many login attempts, please try again later"
});
With this modification, the rate limiter keeps track of login attempts per email address rather than per IP address. Please note that this is an example implementation and you may need to modify it depending on your use case and requirements. Also, this is only one way to harden rate limiting; you may use other techniques or combine it with security mechanisms like CAPTCHA or 2FA to provide an additional layer of security.
Rate Limiting with Nginx
Rate limiting can also be implemented at the server level using a web server such as Nginx.
Nginx has a built-in module called ngx_http_limit_req_module (https://nginx.org/en/docs/http/ngx_http_limit_req_module.html) that can be used to rate limit requests. Here’s an example configuration that limits requests to a /api/login endpoint to 5 requests per minute per IP address:
http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=5r/m;

    server {
        ...
        location /api/login {
            limit_req zone=one burst=5;
            ...
        }
    }
}
This configuration sets up a shared memory zone called one that is 10 MB in size. The $binary_remote_addr variable is used as the key for the rate limiter, which means the rate limit is applied to each individual IP address. The rate=5r/m parameter sets the rate limit to 5 requests per minute.
The location block then applies the rate limit to requests to the /api/login endpoint using the limit_req directive. The burst parameter controls how many excess requests are queued, rather than rejected immediately, when the rate is exceeded.
Also note that this is an example configuration; real-world setups can be more complex. You can also configure rate limiting based on other request parameters such as the user agent, referrer, and more. Additionally, Nginx can be integrated with tools like Redis or Lua for more fine-grained rate limiting.
It’s also worth noting that rate limiting at the Nginx level does not provide the same granularity as rate limiting at the application level: you cannot easily rate limit based on the user’s email or session id, and customizing the error response is more limited.
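That said, the rejection status code itself is configurable. The limit_req_status directive (part of the same module, available since Nginx 1.3.15) replaces the default 503 response for rejected requests:
location /api/login {
    limit_req zone=one burst=5;
    limit_req_status 429;  # respond with 429 instead of the default 503
}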
Using API key with Nginx for Rate Limiting
You can use API keys to rate limit requests in Nginx by extracting the API key from the request headers and using it as the key for the rate limiter. Here’s an example configuration that rate limits requests to a /api endpoint using a custom X-API-Key header:
http {
    limit_req_zone $http_x_api_key zone=api_keys:10m rate=1000r/s;

    server {
        ...
        location /api {
            limit_req zone=api_keys burst=1000;
            ...
        }
    }
}
This configuration sets up a shared memory zone called api_keys that is 10 MB in size. The $http_x_api_key variable is used as the key for the rate limiter, which means the rate limit is applied to each individual API key. The rate=1000r/s parameter sets the rate limit to 1000 requests per second.
As before, the location block applies the rate limit to the /api endpoint using the limit_req directive, with the burst parameter controlling how many excess requests are queued when the rate is exceeded.
It’s worth noting that this is just one way to rate limit based on API keys; you can also extract the key from a query parameter, a cookie, or elsewhere, and use that instead.
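As a sketch of the query-parameter variant, Nginx’s map directive can fall back to a query parameter when the header is absent (the api_key parameter name is illustrative; $http_x_api_key and $arg_api_key are Nginx’s built-in header and query-argument variables):
http {
    # Prefer the X-API-Key header; fall back to the api_key query parameter.
    map $http_x_api_key $client_key {
        ""      $arg_api_key;
        default $http_x_api_key;
    }

    limit_req_zone $client_key zone=api_keys:10m rate=1000r/s;
}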
Also, make sure that your application sends the API key in the request header and that the key is unique, associated with a specific user or client, and kept secure.
It is also important to mention that rate limiting alone is not enough to protect your API from malicious usage or abuse. You should have multiple security mechanisms in place, such as authentication and proper key handling, and monitor the rate-limiting rules and usage patterns to identify potential attacks.
Rate Limiting using Amazon API Gateway
Below is the process for setting up rate limiting with AWS API Gateway (watch out for the official docs, as these steps might have changed or been improved upon). It is documented here for reference purposes.
- Log in to the AWS Management Console:
— Navigate to the Amazon API Gateway service.
- Choose or Create an API:
— If you already have an API, select it. Otherwise, create a new API.
- Set Default Method Throttling:
— In the navigation pane, choose “Stages”.
— Select the stage for which you want to set up rate limiting.
— In the “Settings” tab, under “Default Method Throttling”, you’ll see two settings: Rate (the number of requests allowed per second) and Burst (the number of burst requests allowed, typically used with the token bucket algorithm).
— Set your desired values for “Rate” and “Burst”.
- Override Method-Level Throttling (Optional):
— If you want to set custom throttling for specific methods, in the navigation pane, under your selected stage, expand the resource tree to find the method you want to customize.
— Select the method, and in the “Method Throttling” section, override the default settings by specifying custom “Rate” and “Burst” values.
- Enable API Caching (Optional):
— API caching can reduce the number of calls made to your endpoint and also reduce latency for your end users.
— In the “Settings” tab of your stage, enable caching and define the cache capacity.
- Set Up Usage Plans and API Keys, for more granular control over who can access your API and at what rate:
— In the navigation pane, choose “Usage Plans”.
— Click “Create” and define the name, description, and associated API stages.
— Under the “Throttle” tab, set the desired rate and burst values.
— Under the “Quota” tab, you can set a maximum number of requests in a given time period (e.g., per day or month).
— After creating the usage plan, you can create and associate API keys with this plan. You can then distribute these keys to your clients, who must include them in their requests.
- Deploy Your API:
— After setting up rate limiting, ensure you deploy your API to apply the changes.
— In the navigation pane, choose “Resources”.
— Click on “Deploy API”, select the desired deployment stage, and deploy.
- Monitor and Adjust:
— Use Amazon CloudWatch to monitor the API calls and throttled requests. Based on the metrics, you can adjust your rate limits as needed.
- Additional Considerations:
— Remember that rate limiting settings in API Gateway are separate from any rate limits or throttling behavior implemented in the backend services your API might be integrated with.
— Ensure that your backend can handle the maximum allowed request rate.
By following these steps, you can effectively set up rate limiting for your APIs using AWS API Gateway. This helps in protecting your backend services from excessive load and potential abuse.
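For reference, a usage plan with throttle and quota settings can also be created from the AWS CLI; a rough sketch (the plan name, API id, and stage below are placeholders):
aws apigateway create-usage-plan \
    --name "standard-plan" \
    --throttle burstLimit=200,rateLimit=100 \
    --quota limit=10000,period=DAY \
    --api-stages apiId=a1b2c3d4e5,stage=prod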
AWS has exhaustive settings and controls for applying rate limiting.
Read more here: https://docs.aws.amazon.com/waf/latest/developerguide/waf-rule-statement-type-rate-based.html
Similarly, GCP (Google Cloud) has its own configuration.
Read more here: https://cloud.google.com/armor/docs/rate-limiting-overview
Follow me on LinkedIn for the latest insights tailored for budding tech enthusiasts: Rajesh Pillai | LinkedIn
PS: Please refer to the respective cloud provider’s documentation for the exact steps, as minor changes may occur in certain configurations.