By Jesper Dangaard Brouer <jdb@comx.dk>, Master of Computer Science, ComX Networks A/S. OpenSourceDays 2008, 4 October 2008.
Who am I: Jesper Dangaard Brouer. Education: Computer Science at the University of Copenhagen, with a focus on networking, distributed systems and operating systems. Linux user since 1996, professionally since 1998, as a sysadmin, developer and embedded developer. Open-source projects: author of ADSL-optimizer and the CPAN module IPTables::libiptc; patches accepted into the kernel, iproute2 and iptables.
Presentation overview: you will learn about a Danish ISP's extreme use of iptables; how to avoid bad routing performance, where traffic categorization is the key to performance; how iptables rulesets are processed in userspace, and how to use that userspace processing to your advantage; and the improvements made to let iptables scale.
ComX Networks A/S: I work for ComX Networks A/S, a Danish fiber broadband provider offering a variety of services (TV, IPTV, VoIP, Internet). This talk is about our Internet product, where Netfilter is a core component providing basic access control, bandwidth control and a personal firewall.
Physical surroundings: ComX delivers fiber-based solutions. Our primary customers are apartment buildings, but we have a relationship with each end user. The network has a ring-based topology with POPs (Points of Presence); the POPs have fiber strings to the apartment buildings, and a CPE box in each apartment separates the services into VLANs.
The Linux box: the iptables box(es) this talk is all about are placed at each POP, near the core routers. Each one is a high-end server PC with only two network cards. Internet traffic from several apartment buildings is layer-2 terminated via VLANs on one network card and routed out the other. This is cost-efficient, but it needs to scale to a large number of customers: the goal is 5000 customers per machine.
Issues and limitations: the first-generation solution was in production, but as the business grew and customers were added, several scalability issues arose. The two primary ones were reduced routing performance (20 kpps) and slow rule changes. I was hired to rethink the system.
Overview: the presentation is split into two subjects: 1) routing performance, solved using effective traffic categorization; 2) slow rule changes, solved by modifying iptables to use binary search.
Issue: bad routing performance. The first-generation solution took a naive approach: a long list of rules in a single chain. The routing performance degradation all comes down to traffic categorization, i.e. binding packets to a customer, where a customer can have several IP addresses. We needed a scalable categorization mechanism.
Existing solutions: I looked for existing solutions to the categorization task, but ended up using standard iptables chains. nf-hipac is a universal solution that optimizes the ruleset for memory lookups per packet, but it did not work with current kernels. ipset provides sets of IPs that can be matched and given an action.
The categorization tasks: given the kind of categorization needed, why did I end up using standard iptables chains? Access control is a simple open/close decision and could use ipset. Bandwidth control requires an individual shaper per customer, so it cannot use ipset. The personal firewall is the most complicated: it needs an individual set of rules per customer, so it cannot use ipset either.
Solution: SubnetSkeleton. The solution was to build a search tree for IP addresses, based on subnet partitioning, using standard iptables chains and jump rules.
SubnetSkeleton: algorithm. The IP space is partitioned in advance, based on a user-defined list of CIDR prefixes. The depth of the tree is determined by the length of the CIDR list, and the maximum number of children of a node is 2^n, where n is the number of bits between consecutive CIDR prefixes. The tree is created by bit-masking the IP with each prefix in the CIDR list. Example: with the CIDR list [/8, /16, /24], the IP 10.1.2.3 is placed at 10.0.0.0/8 -> 10.1.0.0/16 -> 10.1.2.0/24 -> 10.1.2.3.
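The masking step can be sketched in a few lines of Perl (IPv4 only). subnet_path(), ip_to_int() and int_to_ip() are hypothetical helpers for illustration, not functions of IPTables::SubnetSkeleton; the sketch only computes which tree nodes an address belongs to for a given CIDR list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Dotted-quad IPv4 string -> 32-bit integer.
sub ip_to_int {
    my @o = split /\./, shift;
    die "not an IPv4 address\n" unless @o == 4;
    return ($o[0] << 24) | ($o[1] << 16) | ($o[2] << 8) | $o[3];
}

# 32-bit integer -> dotted-quad IPv4 string.
sub int_to_ip {
    my $n = shift;
    return join '.', map { ($n >> $_) & 255 } 24, 16, 8, 0;
}

# Mask the IP with each prefix length of the CIDR list, giving the tree
# nodes (root first) that the IP belongs to.
sub subnet_path {
    my ($ip, @cidr) = @_;
    my $addr = ip_to_int($ip);
    my @path;
    for my $len (@cidr) {
        my $mask = $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
        push @path, int_to_ip($addr & $mask) . "/$len";
    }
    return @path;
}

print join(' -> ', subnet_path('10.1.2.3', 8, 16, 24)), "\n";
```

For 10.1.2.3 and the CIDR list (8, 16, 24) it prints 10.0.0.0/8 -> 10.1.0.0/16 -> 10.1.2.0/24, the same path as in the example.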
SubnetSkeleton: CIDR partitioning. Choosing the CIDR list is essential; base it on the IP space that needs to be covered. For example, our IP address space is limited to that of our AS number, AS31661, which is 156,672 IPs, and the largest subnet we announce is a /16. Our CIDR list is [8, 18, 20, 22, 24, 26, 28]. The /8 is needed because our subnets differ in the first byte: "0-8" allows 2^8 = 256 children, but only 4 different subnets are used. Between "8-18" there can be at most 2^10 = 1024 children, but since we know our subnets are at most a /16, only 2^2 = 4 of them are possible per subnet. Between each of the remaining prefixes there are 2 bits, so there are at most 4 children per node. Last, "28-32" gives 2^4 = 16, i.e. at most 16 direct IP matches.
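The per-level fan-out follows directly from the CIDR list: a node at /a whose children sit at /b has at most 2^(b-a) children. A small Perl sketch of that arithmetic, assuming the tree root is /0 and the leaves match single hosts (/32); fanout() is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# For a CIDR list, return [from, to, max_children] per tree level, assuming
# the root is /0 and the leaves match single hosts (/32).
sub fanout {
    my @bounds = (0, @_, 32);
    return map { [ $bounds[$_], $bounds[$_ + 1], 2 ** ($bounds[$_ + 1] - $bounds[$_]) ] }
        0 .. $#bounds - 1;
}

printf "/%-2d -> /%-2d: at most %d children\n", @$_ for fanout(8, 18, 20, 22, 24, 26, 28);
```

For [8, 18, 20, 22, 24, 26, 28] it reports 256, 1024, five levels of 4 and finally 16, matching the numbers on this slide. These are upper bounds; the actual number of children depends on which subnets are in use.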
SubnetSkeleton: iptables. Expressing the tree using iptables: each node in the tree is an iptables chain, and the child pointers in a node are jump rules. A leaf has IP-specific jump rules to a user-defined chain, and several leaves are allowed to jump to the same user-defined chain. Note that the children (jump rules) within a chain are processed linearly.
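To make the chain/jump mapping concrete, here is a sketch that turns a CIDR list and a set of IP-to-chain mappings into iptables commands. It is not how IPTables::SubnetSkeleton names or creates its chains (that module works through libiptc and commits once); the chain naming scheme <name>_<network>_<len> and build_rules() are hypothetical, only source-address matching is shown, and the user-defined target chains are assumed to exist already.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub ip_to_int {
    my @o = split /\./, shift;
    return ($o[0] << 24) | ($o[1] << 16) | ($o[2] << 8) | $o[3];
}

sub int_to_ip {
    my $n = shift;
    return join '.', map { ($n >> $_) & 255 } 24, 16, 8, 0;
}

# Network address of $ip for prefix length $len, e.g. ("10.1.2.3", 16) -> "10.1.0.0".
sub mask_ip {
    my ($ip, $len) = @_;
    my $mask = $len == 0 ? 0 : (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF;
    return int_to_ip(ip_to_int($ip) & $mask);
}

# One chain per tree node, jump rules as child pointers, and
# host-specific jump rules in the leaves.
sub build_rules {
    my ($name, $table, $cidr, $elements) = @_;
    my (%chains, %seen, @rules);
    $chains{$name} = 1;                                   # root of the tree
    for my $ip (sort keys %$elements) {
        my $parent = $name;
        for my $len (@$cidr) {
            my $net   = mask_ip($ip, $len);
            my $child = "${name}_${net}_$len";
            $chains{$child} = 1;
            my $rule = "-A $parent -s $net/$len -j $child";
            push @rules, $rule unless $seen{$rule}++;     # one jump per child
            $parent = $child;
        }
        push @rules, "-A $parent -s $ip/32 -j $elements->{$ip}";   # leaf
    }
    my @cmds = map { "iptables -t $table -N $_" } sort keys %chains;
    push @cmds, "iptables -t $table -A FORWARD -j $name";
    push @cmds, map { "iptables -t $table $_" } @rules;
    return @cmds;
}

my @commands = build_rules('bw', 'mangle', [8, 16, 24], {
    '10.2.11.33' => 'userchain1',
    '10.2.10.66' => 'userchain2',
    '10.1.2.42'  => 'userchain3',
    '10.1.3.123' => 'userchain3',
});
print "$_\n" for @commands;
```

With the four addresses from the Perl example on the next slide, this yields 8 tree chains and 11 rules. A packet is then compared only against the jump rules of the chains on its path, instead of against one rule per customer in a single long chain.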
Perl - IPTables::SubnetSkeleton

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IPTables::SubnetSkeleton;

    my @CIDR  = (8, 16, 24);  # prefix list
    my $name  = "bw";         # short name for bandwidth
    my $table = "mangle";     # use the "mangle" table

    my $subnet_src = IPTables::SubnetSkeleton::new("$name", "src", $table, @CIDR);

    # Connect the subnet skeleton to the built-in chain "FORWARD"
    $subnet_src->connect_to("FORWARD");

    # Insert the IPs to match into the tree
    $subnet_src->insert_element("10.2.11.33", "userchain1");
    $subnet_src->insert_element("10.2.10.66", "userchain2");
    $subnet_src->insert_element("10.1.2.42",  "userchain3");
    $subnet_src->insert_element("10.1.3.123", "userchain3");

    # Remember to commit the ruleset to the kernel
    $subnet_src->iptables_commit();
Full routing performance achieved: full routing performance was achieved when using SubnetSkeleton, and the HTB shaper seems to scale well. Kernel 2.6.25 gave a good performance boost: better conntrack locking and a faster conntrack hash function cut the CPU load in half. Thanks, Patrick McHardy! Parameter tuning: increase the route cache; increase the number of conntrack entries, and remember the conntrack hash bucket size (/sys/module/nf_conntrack/parameters/hashsize); adjust the ARP/neighbor table size and thresholds. Back to the other subject: slow ruleset changes.
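Regarding the hash bucket reminder: the conntrack table is a hash table with chained buckets, so raising the maximum number of entries without also raising hashsize makes each bucket's chain longer. A small read-only Perl sketch to check that ratio; it assumes the sysfs path from the slide and the /proc/sys/net/netfilter/nf_conntrack_max sysctl of current kernels, and read_value() and entries_per_bucket() are hypothetical helpers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the first line of a sysfs/procfs file; returns undef if unavailable.
sub read_value {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line;
    chomp $line;
    return $line;
}

# Average conntrack entries per hash bucket once the table is full.
sub entries_per_bucket {
    my ($max, $hashsize) = @_;
    die "hashsize must be positive\n" unless $hashsize > 0;
    return $max / $hashsize;
}

my $hashsize = read_value('/sys/module/nf_conntrack/parameters/hashsize');
my $max      = read_value('/proc/sys/net/netfilter/nf_conntrack_max');
if (defined $hashsize && defined $max && $hashsize > 0) {
    printf "hashsize=%d nf_conntrack_max=%d -> %.1f entries per bucket when full\n",
        $hashsize, $max, entries_per_bucket($max, $hashsize);
} else {
    print "nf_conntrack parameters not readable on this system\n";
}
```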
Issue: iptables command slow. The next scalability issue: rule changes were slow! Rebuilding the entire ruleset could take hours. Discovering how iptables works: the entire ruleset is copied to userspace and, after possibly multiple changes, copied back to the kernel. This is performed by the iptables cache library "libiptc"; iptables.c is a command-line parser that uses this library. Profiling identified the first scalability issue: the initial ruleset parsing during the "pull-out". The fix could be postponed...
Take advantage of libiptc: use its pull-out and commit system. Pull out the ruleset (paying the initial ruleset parsing penalty once), make all the modifications needed, then commit the ruleset to the kernel. This is how iptables-restore works. Extra bonus: several rule changes appear atomic, so all rules related to a customer can be updated at once, with no need for temporary chains and renaming.
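For scripts, the same pull-out, modify and commit idea is available through iptables-restore: all changes in one *table ... COMMIT block are handed to the kernel together. A minimal Perl sketch that regenerates one customer's chain as such a block; the chain name cust_<id> and customer_restore_blob() are hypothetical, and the sketch only builds the text. With --noflush, iptables-restore leaves other chains untouched; declaring an existing user-defined chain with a ':' line is assumed here to flush it before the new rules are added.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build iptables-restore input that replaces every rule of one customer's
# chain in a single commit.
sub customer_restore_blob {
    my ($table, $customer, @rules) = @_;
    my $chain = "cust_$customer";
    my @lines = ("*$table", ":$chain - [0:0]");   # declare the customer's chain
    push @lines, "-A $chain $_" for @rules;
    push @lines, 'COMMIT';                         # everything above is applied together
    return join("\n", @lines) . "\n";
}

my $blob = customer_restore_blob('filter', 42,
    '-p tcp --dport 25 -j DROP',
    '-j RETURN');
print $blob;
# As root, the text could be applied with:  iptables-restore --noflush < file
```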
Perl - IPTables::libiptc: SubnetSkeleton cannot use iptables-restore/save, because it needs an is_chain() test function. So I created the CPAN module IPTables::libiptc. Chains are handled by direct libiptc calls; rules go through a command-like interface by linking against iptables.c, which dynamically loads the iptables extensions available on the system, so there is no need to maintain or port iptables extensions. Remember to call commit(). Using this module, I could postpone fixing the "initial ruleset parsing".
Next scalability issue: chain lookup. Chain name lookup was slow, e.g. is_chain() testing (internally iptcc_find_label()). It was caused by a linear list search with strcmp() and affected almost everything: rule creation, deletion and even listing, as well as multiple rule changes, e.g. iptables-restore and SubnetSkeleton. Listing the rules (iptables -nL) with 50k chains took approx. 5 minutes! After my fix, it was reduced to 0.5 sec.
Chain lookup: solution. Binary search on chain names, based on an important property: the chain list is sorted by name. The original linked-list data structure is kept, and a new data structure, the "chain index", is added: an array of pointers into the linked list at a given spacing (40). The result is better starting points when searching the linked list. In mainline: iptables ver. 1.4.1, git: 2008-01-15.
[Figure: the chain index (an array, entries 0-3) pointing into the chain list (a linked list sorted by chain name)]
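The chain index idea can be sketched in Perl: a sorted linked list of chain nodes, an index array pointing at every 40th node, a binary search over the index to pick a starting node, and a short linear walk from there. This illustrates the technique, not the libiptc C code; all sub names are hypothetical, and comparison counts are returned to contrast the two searches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant BUCKET_LEN => 40;   # spacing between chain index entries (from the slide)

# Sorted singly linked list of chains. The array only keeps every node
# referenced so Perl can free a long list without deep recursion.
sub build_list {
    my @nodes = map { +{ name => $_, next => undef } } sort @_;
    $nodes[$_]{next} = $nodes[$_ + 1] for 0 .. $#nodes - 1;
    return \@nodes;
}

# Chain index: pointers to every BUCKET_LEN-th node of the list.
sub build_index {
    my ($head) = @_;
    my ($i, @index) = (0);
    for (my $n = $head; $n; $n = $n->{next}) {
        push @index, $n if $i++ % BUCKET_LEN == 0;
    }
    return \@index;
}

# Original approach: walk the list comparing names.
sub find_chain_linear {
    my ($head, $name) = @_;
    my $cmps = 0;
    for (my $n = $head; $n; $n = $n->{next}) {
        $cmps++;
        return ($n, $cmps) if $n->{name} eq $name;
    }
    return (undef, $cmps);
}

# Indexed approach: binary search the index for the last entry whose name
# is <= $name, then walk the linked list from there.
sub find_chain {
    my ($index, $name) = @_;
    my ($lo, $hi, $pos, $cmps) = (0, $#$index, -1, 0);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        $cmps++;
        my $c = $index->[$mid]{name} cmp $name;
        return ($index->[$mid], $cmps) if $c == 0;
        if ($c < 0) { ($pos, $lo) = ($mid, $mid + 1) } else { $hi = $mid - 1 }
    }
    return (undef, $cmps) if $pos < 0;          # sorts before every chain
    for (my $n = $index->[$pos]{next}; $n; $n = $n->{next}) {
        $cmps++;
        my $c = $n->{name} cmp $name;
        return ($n, $cmps) if $c == 0;
        last if $c > 0;                         # passed its sorted position
    }
    return (undef, $cmps);
}

my $nodes = build_list(map { sprintf('chain%05d', $_) } 0 .. 49_999);
my $index = build_index($nodes->[0]);
my (undef, $linear_cmps) = find_chain_linear($nodes->[0], 'chain49999');
my ($found, $index_cmps)  = find_chain($index, 'chain49999');
printf "linear: %d comparisons, indexed: %d comparisons\n", $linear_cmps, $index_cmps;
```

With 50,000 chains, looking up the last chain takes 50,000 comparisons linearly, but at most about 50 with the index: at most 11 binary-search steps over 1,250 index entries, plus a walk of fewer than 40 nodes.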
Chain index: insert chain. Handling insertion/creation of new chains: inserting does not change the correctness of the chain index, it only causes longer lists between index entries, so the index is rebuilt after a threshold number of inserts (355). Inserting before the first element is a special case.
[Figure: the chain index array (0-3) and the sorted chain list, with new chains inserted, including one before the first element]
Chain index: delete chain. Handling deletion of chains: deleting a chain that is not pointed to by the chain index has no effect on the index. Deleting a chain that is pointed to by the chain index may cause a rebuild: the index pointer is replaced with the next pointer, but only if that next element is not already part of the chain index; otherwise the array is rebuilt.
[Figure: the chain index array (0-3) and the sorted chain list, illustrating deletion and rebuilding of the array]
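Index maintenance on insert and delete can be sketched the same way, using the spacing (40) and rebuild threshold (355) from the slides. Inserts never invalidate the index, an insert before the first element repoints entry 0, and a delete either slides an index pointer to the next node or, when that node is missing or already indexed, rebuilds the array. The structure and the sub names (new_state(), locate(), insert_chain(), delete_chain()) are hypothetical; the real libiptc code may differ in its details.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant BUCKET_LEN        => 40;    # index spacing (from the slides)
use constant REBUILD_THRESHOLD => 355;   # inserts before the index is rebuilt

sub rebuild_index {
    my ($st) = @_;
    my ($i, @index) = (0);
    for (my $n = $st->{head}; $n; $n = $n->{next}) {
        push @index, $n if $i++ % BUCKET_LEN == 0;
    }
    @$st{qw(index inserts)} = (\@index, 0);
    $st->{rebuilds}++;
}

sub new_state {
    my $st = { head => undef, rebuilds => 0 };
    $st->{head} = { name => $_, next => $st->{head} } for reverse sort @_;
    rebuild_index($st);
    $st->{rebuilds} = 0;                       # don't count the initial build
    return $st;
}

# ($prev, $cur): $cur is the first node whose name ge $name, $prev its predecessor.
sub locate {
    my ($st, $name) = @_;
    my $idx = $st->{index};
    my ($lo, $hi, $pos) = (0, $#$idx, -1);
    while ($lo <= $hi) {                       # last index entry with name lt $name
        my $mid = int(($lo + $hi) / 2);
        if ($idx->[$mid]{name} lt $name) { ($pos, $lo) = ($mid, $mid + 1) }
        else                             { $hi = $mid - 1 }
    }
    my ($prev, $cur) = $pos < 0 ? (undef, $st->{head}) : ($idx->[$pos], $idx->[$pos]{next});
    ($prev, $cur) = ($cur, $cur->{next}) while $cur && $cur->{name} lt $name;
    return ($prev, $cur);
}

sub insert_chain {
    my ($st, $name) = @_;
    my ($prev, $cur) = locate($st, $name);
    return 0 if $cur && $cur->{name} eq $name;          # chain already exists
    my $node = { name => $name, next => $cur };
    if ($prev) { $prev->{next} = $node }
    else       { $st->{head} = $st->{index}[0] = $node } # before first element: special case
    rebuild_index($st) if ++$st->{inserts} >= REBUILD_THRESHOLD;
    return 1;
}

sub delete_chain {
    my ($st, $name) = @_;
    my ($prev, $cur) = locate($st, $name);
    return 0 unless $cur && $cur->{name} eq $name;      # no such chain
    my $next = $cur->{next};
    if ($prev) { $prev->{next} = $next } else { $st->{head} = $next }
    my $idx = $st->{index};
    for my $j (0 .. $#$idx) {
        next unless $idx->[$j] == $cur;                  # not indexed: index unaffected
        if ($next && !($j < $#$idx && $idx->[$j + 1] == $next)) {
            $idx->[$j] = $next;                          # slide the pointer to the next node
        } else {
            rebuild_index($st);                          # next missing or already indexed
        }
        last;
    }
    return 1;
}

sub names {
    my ($st) = @_;
    my @out;
    for (my $node = $st->{head}; $node; $node = $node->{next}) { push @out, $node->{name} }
    return @out;
}

my $st = new_state(map { sprintf('c%04d', 2 * $_) } 0 .. 999);   # c0000, c0002, ..., c1998
insert_chain($st, 'a0000');                                     # before the first element
insert_chain($st, sprintf('c%04d', 2 * $_ + 1)) for 0 .. 399;   # odd names
delete_chain($st, 'c0039');                                     # indexed after the rebuild
my @all = names($st);
printf "%d chains, %d index entries, %d rebuild(s)\n",
    scalar @all, scalar @{ $st->{index} }, $st->{rebuilds};
```

In the example, the 355th insert triggers the single rebuild, and deleting c0039, which the rebuilt index points to, just moves that index entry to c0040.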
Solving: initial ruleset parsing. Back to fixing the "initial ruleset parsing". I did have a fix (2007-11-26), but it was not 64-bit compliant. Problem: resolving jump rules is slow, because for each jump rule a linear, offset-based search of the chain list is done. Solution: reuse the binary search algorithm and data structure, having realized that the chain list is sorted both by name and by offset, since the ruleset from the kernel is already sorted. In mainline: iptables ver. 1.4.2-rc1, git: 2008-07-03.
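Resolving jump targets then becomes a binary search by offset. A minimal Perl sketch, assuming each chain is represented by its name and start offset in an array sorted by offset (and, as the slide notes, also by name). The offsets, chain names and sub names are hypothetical; the real layout is the kernel's ip_tables ruleset blob.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Binary search an array of chains sorted by start offset; undef if no chain
# starts exactly at $off.
sub chain_at_offset {
    my ($chains, $off) = @_;
    my ($lo, $hi) = (0, $#$chains);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $o = $chains->[$mid]{offset};
        return $chains->[$mid] if $o == $off;
        if ($o < $off) { $lo = $mid + 1 } else { $hi = $mid - 1 }
    }
    return;
}

# Replace each jump rule's numeric target offset with the target chain's name.
sub resolve_jumps {
    my ($chains, $rules) = @_;
    for my $r (@$rules) {
        my $c = chain_at_offset($chains, $r->{target_offset})
            // die "no chain starts at offset $r->{target_offset}\n";
        $r->{target} = $c->{name};
    }
    return $rules;
}

# Hypothetical layout: 10,000 chains, each occupying 200 bytes of the blob.
my @chains = map { +{ name => sprintf('cust%05d', $_), offset => 1000 + 200 * $_ } } 0 .. 9_999;
my @rules  = map { +{ target_offset => 1000 + 200 * (($_ * 7919) % 10_000) } } 0 .. 49_999;
resolve_jumps(\@chains, \@rules);
printf "resolved %d jump rules, e.g. %s\n", scalar @rules,
    join(', ', map { $_->{target} } @rules[0 .. 2]);
```

Each lookup costs O(log n) comparisons instead of a scan of the chain list, so resolving r jump rules costs O(r log n) rather than O(r n).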
Summary: load time. Personal firewall: reloading all rules on a production machine with 5789 chains and 22827 rules. The machine with the most customers has 9827 chains and 36532 rules in the filter table.
Summary: Open Source status. Chain lookup fix: in iptables version 1.4.1; listing 50k chains went from 5 min to 0.5 sec. Initial ruleset parsing fix: in iptables version 1.4.2-rc1; in production, from 10 sec to 0.046 sec. IPTables::libiptc: released on CPAN. IPTables::SubnetSkeleton: available via http://people.netfilter.org/hawk/
Summary: goal reached? The goal was 5000 pieces of equipment per machine; production has reached 3400. CPU load is 30% on average and 62% at peak, on a 3.2 GHz Xeon (Hyperthreading) with 1 MB cache. Filter table: 9827 chains, 36532 rules.
The End. Goodbye, and thank you for accepting the patches... 81.161.128.0/18, 195.135.216.0/22, 87.72.0.0/16, 82.211.224.0/19
Extra slides: bonus slides, if time permits or funny questions arise.
Route cache performance: improved route cache from kernel 2.6.15 to 2.6.25, thanks to Eric Dumazet.
CPU utilization, softirq: softirq CPU usage dropped from kernel 2.6.15 to 2.6.25, thanks to Patrick McHardy's improved conntrack locking.
More libiptc stats: the machine with the most customers has 2105 customers and 3477 pieces of equipment. Filter table: 9827 chains, 36532 rules. Mangle table: 2770 chains, 14275 rules. "Init" time: 0.10719919 s; "is_chain" time: 0.00001473 s.
BSD pf firewalling: my limited knowledge of Open/FreeBSD's firewall facility, pf (packet filter). pf doesn't have chains with rules like iptables; it uses one list/chain. To compensate, it has an "ipset"-like facility called "tables", which is quite smart, using a radix tree. pf also has a basic ruleset optimizer that performs four tasks: removing duplicate rules, removing rules that are a subset of another rule, combining multiple rules into a table when advantageous, and reordering the rules to improve evaluation performance. I don't think pf would solve my categorization needs: for the same reasons that I could not use ipset, I cannot use pf tables.

Netfilter: Making large iptables rulesets scale


Editor's Notes

  • #8: TALK: First I'll focus on the routing performance issue; I'll come back to slow rule changes later.
  • #13: (To solve the routing performance issue, I had to make traffic categorization scale, reducing the lookup time from O(n) to O(log(n)).)
  • #23: (named skip-list search infrastructure by Thomas Jacob <jacob@internet24.de>) (In mainline: iptables version 1.4.1, git: 2008-01-15.) The runtime complexity is actually also affected by this "bucket" size concept, so it is O(log(n/k) + k), where k is CHAIN_INDEX_BUCKET_LEN.
  • #26: (In mainline: iptables ver.1.4.2-rc1, git: 2008-07-03)
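The O(log(n/k) + k) cost of a chain-index lookup is a binary search over the n/k index entries followed by a walk of at most k list nodes. Plugging in the figures from the slides (50k chains, spacing k = 40) gives a rough worst-case comparison count:

$$
\underbrace{\left\lceil \log_2 \frac{n}{k} \right\rceil}_{\text{index search}} + \underbrace{k}_{\text{list walk}}
= \lceil \log_2 1250 \rceil + 40 = 11 + 40 = 51
\qquad \text{vs.} \qquad n = 50\,000 \text{ for a linear scan.}
$$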