Please Give Me Back My Network Cables! On Networking Limits in AWS
1. Please give me back my Network Cables!
On Networking Limits in AWS
Steffen Gebert
Miklos Tirpak
SREcon 2025 Americas
2025-03-26
3. 100G Ethernet
Throughput Limits
§ 148,809,524pps!
@64 bytes/frame
§ Per direction!
Hopefully correct, for illustration purposes only. Don’t dimension your systems based on this.
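The packet rate above can be reproduced with a short calculation. This is a minimal sketch, assuming the standard Ethernet per-frame overhead on the wire (8 bytes preamble and start-of-frame delimiter plus 12 bytes inter-frame gap); the function name is made up for illustration.

```python
# Per-frame overhead on the wire, in addition to the frame itself (assumption:
# standard Ethernet preamble + start-of-frame delimiter and inter-frame gap).
PREAMBLE_SFD_BYTES = 8
INTER_FRAME_GAP_BYTES = 12


def max_frames_per_second(link_bps: float, frame_bytes: int = 64) -> float:
    """Theoretical frame rate of a link, per direction."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTER_FRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


if __name__ == "__main__":
    # 100G Ethernet with minimum-size frames
    print(f"{max_frames_per_second(100e9):,.0f} pps")
```

As on the slide: for illustration only, not for dimensioning.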
7. Network Functions
Throughput Limits
[Diagram: router / firewall etc. with two NICs, carrying 750 Mbps downstream and 750 Mbps upstream]
With a 15 Gbps instance, we can transfer bi-directional 750 Mbps - peak!
8. Disclaimer
§ We like working with AWS, it makes our life so
much easier!
§ But we want to share our journeys through
sleepless nights etc.
Thanks to..
§ Our colleagues, who went with us through this
§ AWS account team and network specialists
One Note...
Limits are critical in multi-tenant environments –
they protect (also us) from noisy neighbors!
9. emnify IoT SuperNetwork
Deployed in
7 breakout regions
Global IoT SIM card
Integration into customer
networks
Cellular connectivity for the
Internet of Things (IoT)
IoT-specific security features
Headquartered in Germany
Engineering EU remote
🇪🇺
🇩🇪
10. Why are AWS limits so relevant for us?
IoT device IoT backend
Internet
Mobile Network
Operators (MNO)
customer-owned nobody owns it
emnify-contracted emnify-owned
Unpredictable
traffic
Customer pays for
the traffic
EC2 instance type
economics
11. Burstable Instance Types (“up to X Gbps”)
Throughput Limits
https://docs.aws.amazon.com/ec2/latest/instancetypes/co.html#co_network
§ Documentation is clear about
baseline / burst bandwidth
§ Credits-based mechanism
§ On a best effort basis
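A credits-based mechanism can be pictured as a token bucket. The sketch below is only a toy model under loud assumptions: the class name, the bucket size and the rates are hypothetical, AWS does not publish the real parameters, and the "best effort" part (burst bandwidth being shared) is not modelled at all.

```python
class CreditBucket:
    """Toy model of baseline / burst bandwidth (all numbers hypothetical)."""

    def __init__(self, baseline_gbps: float, burst_gbps: float, max_credits_gbit: float):
        self.baseline = baseline_gbps
        self.burst = burst_gbps
        self.max_credits = max_credits_gbit
        self.credits = max_credits_gbit  # start with a full bucket

    def send(self, offered_gbps: float, seconds: float) -> float:
        """Return the rate (Gbps) granted during this interval."""
        # Credits are earned at the baseline rate, up to the bucket size.
        self.credits = min(self.max_credits, self.credits + self.baseline * seconds)
        # Traffic may burst up to the burst rate while credits last.
        wanted_gbit = min(offered_gbps, self.burst) * seconds
        granted_gbit = min(wanted_gbit, self.credits)
        self.credits -= granted_gbit
        return granted_gbit / seconds


if __name__ == "__main__":
    bucket = CreditBucket(baseline_gbps=1.5, burst_gbps=12.5, max_credits_gbit=25.0)
    for second in range(5):
        print(second, bucket.send(offered_gbps=12.5, seconds=1.0))
```

With these made-up numbers, a sender offering the full burst rate gets it for a short while and then falls back to the baseline once the credits are spent.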
12. Monitoring
Throughput Limits
§ Metrics
• bw_in_allowance_exceeded
• bw_out_allowance_exceeded
• pps_allowance_exceeded
§ Indicate queuing (and potential packet drops)
§ Amazon CloudWatch
§ ethtool -S ens5 to read from NIC driver (ENA)
• CloudWatch Agent install + configuration
• Prometheus node_exporter, with ethtool collector (disabled by default)
§ Monitoring interval: Beware of bursts
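For a quick look without CloudWatch or node_exporter, the counters can be read straight from the driver. A minimal sketch, assuming the usual "name: value" layout of ethtool -S output and the interface name ens5 from the slide; the function names are our own.

```python
import subprocess


def parse_allowance_counters(ethtool_output: str) -> dict[str, int]:
    """Extract the *_allowance_exceeded counters from `ethtool -S` output."""
    counters = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name.endswith("_allowance_exceeded"):
            counters[name] = int(value.strip())
    return counters


def read_allowance_counters(interface: str = "ens5") -> dict[str, int]:
    """Run ethtool on the given interface (needs ethtool and the ENA driver)."""
    result = subprocess.run(
        ["ethtool", "-S", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_allowance_counters(result.stdout)


if __name__ == "__main__":
    for name, value in read_allowance_counters().items():
        print(f"{name} = {value}")
```

The counters are cumulative, so what matters is the difference between two readings; a coarse polling interval will not show how short the burst was.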
13. Microbursts
Throughput Limits
§ Packet drops on a c6i.xlarge with very low traffic
• Bandwidth utilization is 15 Mbps
• bw_out_allowance_exceeded increases every 5 min
§ File upload to S3 every 5 min
• File size: 9 MB
• Data transmission completes within 26 ms
• → 9 MB / 26 ms ≈ 2.7 Gbps peak
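The gap between peak and average is easy to check. A small sketch of the arithmetic, assuming 1 MB = 10^6 bytes and that the upload is the only traffic in the 5-minute window; the helper name is ours.

```python
def rate_gbps(num_bytes: float, seconds: float) -> float:
    """Average rate in Gbps for num_bytes sent within the given time."""
    return num_bytes * 8 / seconds / 1e9


FILE_BYTES = 9e6          # 9 MB upload (assuming 1 MB = 10^6 bytes)
BURST_SECONDS = 0.026     # transmit completes within 26 ms
INTERVAL_SECONDS = 300    # one upload every 5 minutes

peak = rate_gbps(FILE_BYTES, BURST_SECONDS)
average = rate_gbps(FILE_BYTES, INTERVAL_SECONDS)

if __name__ == "__main__":
    print(f"peak during the burst:  {peak:.2f} Gbps")
    print(f"average over 5 minutes: {average * 1000:.2f} Mbps")
```

The same 9 MB look harmless in a 5-minute average, while the burst itself runs at gigabit rates.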
14. Microbursts
Throughput Limits
§ Limits are enforced at a much finer granularity than per second.
§ bw_out_allowance_exceeded does not always mean dropped packets; packets may only have been queued.
§ Make sure you monitor your application behavior.
§ “Bandwidth is a shared resource”
Instance burst is on a best effort basis, even when the instance has credits available, as burst bandwidth is a shared resource.
AWS Support
Amazonian: You write that you observe the bandwidth out allowance exceeded metric increasing on an instance that is barely used, which should have enough network credits.
EM: Yes, the metric increases every 5 min.
Amazonian: Your packet size is only 1500 bytes. Can you add a VPC endpoint (VPCe)?
EM: Okay, now it increases even faster.
Amazonian: Probably a microburst on the PPS limit.
EM: The PPS metric is not increased.
Amazonian: Scale up the instance!
EM: No, you won’t get more money!
15. Countermeasures
Throughput Limits
Increasing Limits
§ Add more network interfaces?
§ Change the instance type
• Scale up! 💸
• Use network-enhanced types 💸
• Use newer generation (7th gen increased limits)
§ EC2 instance bandwidth weighting: 25% more network capacity (requires an 8th gen instance)
§ Use jumbo frames inside AWS or via DX
Avoid Microbursts
§ AWS-CLI S3: default.s3.max_bandwidth
§ SO_MAX_PACING_RATE socket option for own applications
§ Limit bandwidth using tc
§ Increase the Tx ring size with ethtool
https://repost.aws/knowledge-center/ec2-instance-exceeding-network-limits
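For own applications, the pacing option can be set per socket. A minimal sketch for Linux, with assumptions stated loudly: Python's socket module may not export SO_MAX_PACING_RATE, so the code falls back to the numeric value 47 used on common architectures (x86, arm); the helper names and the 100 Mbps figure are only examples.

```python
import socket

# Assumption: 47 is the value of SO_MAX_PACING_RATE on x86/arm Linux
# (asm-generic/socket.h); it differs on a few other architectures.
SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)


def mbps_to_bytes_per_second(mbps: float) -> int:
    """The socket option expects the rate in bytes per second."""
    return int(mbps * 1_000_000 / 8)


def cap_socket_rate(sock: socket.socket, mbps: float) -> int:
    """Ask the kernel to pace this socket's transmissions (Linux only)."""
    rate = mbps_to_bytes_per_second(mbps)
    sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, rate)
    return rate


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("pacing at", cap_socket_rate(s, 100), "bytes/s")
```

Pacing spreads a transfer over time instead of sending it as one burst, which is the point of this column. Whether the rate is actually enforced depends on the kernel and qdisc setup, so verify it on your own system.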
16. Issues
Fragmented Packets
§ pps_allowance_exceeded increased, while far away from the PPS limit
[Diagram: Packet Gateway, UDP 1500 bytes]
AWS Support
Amazonian: You write that everything is fine but you have packet loss metrics increasing.
EM: Yes, totally.
Amazonian: Fragmented packets?
EM: Oh, yes! How many can I send?
Amazonian: Can’t tell ya.
EM: Hold my beer. Like 1000?
Amazonian: Exactly. And btw, you won’t get more when you scale up.
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ena-nitro-perf.html#ena-nitro-perf-exceptions
17. Understanding
Fragmented Packets
§ Fragments do not use Nitro’s hardware acceleration (fast path)
§ Differs between
• Ingress: Nitro standard rate (slow path)*
• Egress: 1024 pps (intentionally slow path)
§ Same limit (1024 pps) as link-local traffic, but different bucket.
* Undocumented limit, but grows with instance size
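To get a feeling for what 1024 pps means in bandwidth terms, here is a rough upper bound. It is a sketch under the assumption that every fragment counts as one packet against the limit and that all fragments are full-size; the function name is hypothetical.

```python
FRAGMENT_EGRESS_PPS = 1024  # egress limit for fragments, from the slide


def fragmented_egress_ceiling_mbps(fragment_bytes: int,
                                   pps: int = FRAGMENT_EGRESS_PPS) -> float:
    """Upper bound on fragmented egress traffic in Mbps."""
    return pps * fragment_bytes * 8 / 1e6


if __name__ == "__main__":
    # Full-size fragments on a 1500-byte MTU
    print(f"{fragmented_egress_ceiling_mbps(1500):.3f} Mbps")
```

Even in this best case the ceiling stays in the low two-digit Mbps range, far below any instance bandwidth limit.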
18. Monitoring
Fragmented Packets
§ pps_allowance_exceeded increased, while far away from the PPS limit
§ Run tcpdump 'ip[6] = 32' to capture packets with the More Fragments flag set
§ Ask AWS support!
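What the tcpdump filter looks at can be shown on a raw IPv4 header. A minimal sketch: byte 6 holds the flags and the upper bits of the fragment offset, and the value 32 (0x20) is the More Fragments flag. The broader check below also catches the last fragment of a packet, which the slide's filter does not; the equivalent capture filter would be 'ip[6:2] & 0x3fff != 0'. Function names are ours.

```python
import struct

MORE_FRAGMENTS = 0x2000   # MF flag within the 16-bit flags/offset field
OFFSET_MASK = 0x1FFF      # 13-bit fragment offset (in units of 8 bytes)


def matches_slide_filter(ipv4_header: bytes) -> bool:
    """Same test as tcpdump 'ip[6] = 32'."""
    return ipv4_header[6] == 32


def is_fragment(ipv4_header: bytes) -> bool:
    """True for any fragment: MF set, or a non-zero fragment offset."""
    (flags_and_offset,) = struct.unpack_from("!H", ipv4_header, 6)
    return bool(flags_and_offset & (MORE_FRAGMENTS | OFFSET_MASK))


def header_with(flags_and_offset: int) -> bytes:
    """Build a dummy 20-byte header where only bytes 6-7 are meaningful."""
    return bytes(6) + struct.pack("!H", flags_and_offset) + bytes(12)


if __name__ == "__main__":
    first = header_with(MORE_FRAGMENTS)   # first fragment: MF set, offset 0
    last = header_with(185)               # last fragment: offset 1480 / 8
    print(matches_slide_filter(first), is_fragment(first))
    print(matches_slide_filter(last), is_fragment(last))
```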
19. Countermeasures
Fragmented Packets
Avoid Fragmentation
§ Don’t fragment!
§ VPN / encapsulation use cases
• Lower MTU of tunnel interfaces: Fragment the inner packets (hide them from AWS)
• Make sure path MTU discovery works
• TCP MSS clamping
Increase Throughput
§ Move fragmentation to TGW, if applicable
§ Use ENA’s fragment bypass support (v2.13.3, released last week)
• enable_frag_bypass
• Increases egress throughput from 1024 pps to the “standard rate” (not accelerated)
• Competes for Nitro CPU resources
https://github.com/amzn/amzn-drivers/blob/master/kernel/linux/ena/RELEASENOTES.md#r2133-release-notes
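The tunnel MTU and the MSS to clamp to follow from simple subtraction. A minimal sketch, assuming IPv4 and TCP headers without options (20 bytes each); the GRE overhead of 24 bytes (20 bytes outer IPv4 + 4 bytes basic GRE header) is only an example, use the overhead of your own encapsulation.

```python
IPV4_HEADER_BYTES = 20   # without options
TCP_HEADER_BYTES = 20    # without options


def tunnel_mtu(path_mtu: int, encap_overhead_bytes: int) -> int:
    """Largest inner packet that fits into one outer packet."""
    return path_mtu - encap_overhead_bytes


def clamped_mss(mtu: int) -> int:
    """TCP MSS so that a full segment fits into the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


if __name__ == "__main__":
    gre_overhead = 24  # example: outer IPv4 header + basic GRE header
    mtu = tunnel_mtu(1500, gre_overhead)
    print("tunnel MTU:", mtu, "clamped MSS:", clamped_mss(mtu))
```

If the tunnel interface advertises this MTU and TCP is clamped accordingly, the outer packets never exceed the path MTU and AWS does not see fragments.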
21. Introduction
Connection Tracking
State of Connections
§ Stateful firewall:
• Allow the returning flow of a connection
• Similar to NAT table, but slightly different
§ In Linux, mostly using iptables
[Diagram: firewall between two NICs]
Source IP Address   Source Port   Destination IP Address   Destination Port   Protocol   State
1.2.3.4             54321         5.6.7.8                  80                 TCP        ESTABLISHED
1.2.3.5             54322         5.6.7.8                  80                 TCP        SYN_SENT
1.2.3.5             43210         6.7.8.9                  123                UDP        SEEN_REPLY
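The idea of the table can be sketched in a few lines. This is a toy model only, not how Linux or AWS implement it: the class and method names are invented, there are no timeouts or protocol states, and the size limit is a made-up parameter that stands in for a table with a limited number of entries.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str

    def reply(self) -> "Flow":
        """The returning flow: source and destination swapped."""
        return Flow(self.dst_ip, self.dst_port, self.src_ip, self.src_port, self.protocol)


class ConnTrackTable:
    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.entries: set[Flow] = set()
        self.refused = 0

    def track_outbound(self, flow: Flow) -> bool:
        """Record a new connection; refuse it when the table is full."""
        if flow in self.entries:
            return True
        if len(self.entries) >= self.max_entries:
            self.refused += 1
            return False
        self.entries.add(flow)
        return True

    def allows_inbound(self, flow: Flow) -> bool:
        """Inbound packets pass only if they answer a tracked connection."""
        return flow.reply() in self.entries


if __name__ == "__main__":
    table = ConnTrackTable(max_entries=1)
    table.track_outbound(Flow("1.2.3.4", 54321, "5.6.7.8", 80, "TCP"))
    print(table.allows_inbound(Flow("5.6.7.8", 80, "1.2.3.4", 54321, "TCP")))
    print(table.track_outbound(Flow("1.2.3.5", 43210, "6.7.8.9", 123, "UDP")))
```

Once the table is full, existing connections keep working while new ones are refused.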
22. Introduction
Connection Tracking
AWS
§ Security groups: connection tracking is what makes them stateful
§ Limited number of entries
• Varies per EC2 instance type
• No official documentation of limits
23. Consequences
Connection Tracking
§ New TCP / UDP connections are blocked
• Incoming connections might fail
• DNS might (sometimes) not work
• AWS SSM might (sometimes) not work
[Diagram labels: PGW, PGW, VPN]
25. Monitoring Challenges
Connection Tracking
§ Requires a recent enough ENA driver version, which is not included in Ubuntu 24.04 LTS. We compile our own version.
§ The two counters have to be interpreted differently
• Exceeded counter: refused connections per interface (use the sum of all interfaces)
• Available counter: each interface shows the same value, the remaining allowance for the whole EC2 instance (use avg or pick a single interface)
§ Observed different values for the available counter on different interfaces: only the 1st interface was up to date (bug #291, fixed)
§ Flow logs can help, but setup is cumbersome
https://github.com/amzn/amzn-drivers/issues/291
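The two aggregation rules can be written down as follows. A minimal sketch, assuming the counters are exposed by the ENA driver as conntrack_allowance_exceeded and conntrack_allowance_available (the slide does not spell out the names) and that they have already been collected per interface; the function name, interface names and sample values are invented.

```python
from statistics import mean

EXCEEDED = "conntrack_allowance_exceeded"    # assumed counter name
AVAILABLE = "conntrack_allowance_available"  # assumed counter name


def aggregate_conntrack(per_interface: dict[str, dict[str, int]]) -> dict[str, float]:
    """Combine per-interface counters into instance-level values."""
    stats = list(per_interface.values())
    return {
        # refused connections are counted per interface: add them up
        "exceeded_total": sum(s[EXCEEDED] for s in stats),
        # the remaining allowance is per instance: do not add it up
        "available": mean(s[AVAILABLE] for s in stats),
    }


if __name__ == "__main__":
    sample = {
        "ens5": {EXCEEDED: 5, AVAILABLE: 136812},
        "ens6": {EXCEEDED: 2, AVAILABLE: 136812},
    }
    print(aggregate_conntrack(sample))
```

Summing the available counter would overstate the remaining allowance by the number of interfaces.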
26. Countermeasures
Connection Tracking
Disable connection tracking
§ Have a rule in the reverse direction allowing traffic from all IP addresses¹:
§ Network functions, imagine NAT instance
§ Beware of the security implications!
¹ Except “automatically tracked” connections
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-connection-tracking.html
Ingress Rules
Local Port Source IP
80 0.0.0.0/0
Ingress Rules
Local Port Source IP
* 0.0.0.0/0
Egress Rules
Destination IP Remote Port
0.0.0.0/0 *
Egress Rules
Destination IP Remote Port
0.0.0.0/0 *
28. Countermeasures
Connection Tracking
Live with it
§ Change EC2 instance
• Scale up the instance size, as the limit doubles 💸
• Change instance type (cf. table before)
§ Lower the idle timeouts of the network interface
§ Lower cardinality (easier said than done)
29. Summary
§ The cloud has limits to protect from noisy neighbors
• Often not clear what the limits are
• AWS documentation got a lot better (we should read it more often!)
§ Monitoring capabilities are helpful
• We can’t rescue every packet
• Sometimes still a mystery
§ Other clouds?
30. Resources
§ This is My Architecture: emnify
§ Evaluation SIM Cards
§ AWS re:Invent 2024 - EC2 Nitro networking under the hood (NET402)
§ EC2 User Guide - Instance network bandwidth
§ AWS Blogs - Amazon EC2 instance-level network performance metrics uncover new insights
§ AWS re:Post - Why does my Amazon EC2 instance exceed its network limits when average utilization is low?