Evaluating Lustre 2.9 and OpenStack
James Beal
The Sanger Institute
LSF 9
~10,000 cores in main compute farm
~10,000 cores across smaller project-specific farms
15PB Lustre storage
Mostly everything is available everywhere; “isolation” is based on POSIX file permissions.
Our OpenStack History
2015, June: sysadmin training
July: experiments with RHOSP6 (Juno)
August: RHOSP7 (Kilo) released
December: pilot “beta” system opened to testers
2016, first half: Science As A Service
July: pilot “gamma” system opened using proper Ceph hardware
August: datacentre shutdown
September: production system hardware installation
HPC and Cloud computing are complementary
Traditional HPC
The highest possible
performance for Sanger
workloads.
A mature and centrally
managed compute platform.
High performance Lustre
filesystems.
Flexible Compute
Full segregation between
projects ensures data security
throughout computation
tasks.
Developers and collaborators
are no longer tied to a single
system. They are free to
follow the latest technologies
and trends.
Motivations
Traditional pipelines require a shared POSIX filesystem, while cloud-native workloads use object stores.
We have a large number of traditional/legacy pipelines.
We do not always have the source code or expertise to migrate.
We require multi-gigabyte-per-second performance.
The tenant will have root and could impersonate any user.
We want the system to be as simple as possible for the tenant and for the administrator.
Lustre 2.9 features
• Each tenant’s I/O is squashed to their own unique UID/GID
• Each tenant is restricted to their own subdirectory of the Lustre filesystem
It might be possible to treat general access outside of OpenStack as a separate tenant, with a UID space reserved for a number of OpenStack tenants and only a subdirectory exported for standard usage.
Production OpenStack (I)
• 107 Compute nodes (Supermicro), each with:
• 512 GB of RAM, 2 * 25 Gb/s network interfaces,
• 1 * 960 GB local SSD, 2 * Intel E5-2690 v4 (14 cores @ 2.6 GHz)
• 6 Control nodes (Supermicro) allow 2 OpenStack instances:
• 256 GB RAM, 2 * 100 Gb/s network interfaces,
• 1 * 120 GB local SSD, 1 * Intel P3600 NVMe (/var),
• 2 * Intel E5-2690 v4 (14 cores @ 2.6 GHz)
• Total of 53 TB of RAM, 2996 cores, 5992 with hyperthreading.
• Red Hat OpenStack (Liberty) deployed with TripleO.
Production OpenStack (II)
• 9 Storage nodes (Supermicro), each with:
• 512 GB of RAM,
• 2 * 100 Gb/s network interfaces,
• 60 * 6 TB SAS discs, 2 system SSDs,
• 2 * Intel E5-2690 v4 (14 cores @ 2.6 GHz),
• 4 TB of Intel P3600 NVMe used for journals.
• Ubuntu Xenial.
• 3 PB of disc space, 1 PB usable.
• Single instance: 1.3 GBytes/sec write, 200 MBytes/sec read.
• Ceph benchmarks imply 7 GBytes/second.
• Rebuild traffic of 20 GBytes/second.
Production OpenStack (III)
• 3 racks of equipment, 24 kW load per rack.
• 10 Arista 7060CX-32S switches.
• 1U, 32 * 100 Gb/s -> 128 * 25 Gb/s.
• Hardware VXLAN support integrated with OpenStack *.
• Layer-two traffic limited to the rack; VXLAN used inter-rack.
• Layer three between racks and interconnect to legacy systems.
• All network switch software can be upgraded without disruption.
• True Linux systems.
• 400 Gb/s from racks to spine, 160 Gb/s from spine to legacy systems.
(* VXLAN in the ML2 plugin was not used in the first iteration because of software issues.)
UID mapping
Allows UIDs from a set of NIDs to be mapped to a different set of UIDs.
These commands are run on the MGS:
lctl nodemap_add ${TENANT_NAME}
lctl nodemap_modify --name ${TENANT_NAME} --property trusted --value 0
lctl nodemap_modify --name ${TENANT_NAME} --property admin --value 0
lctl nodemap_modify --name ${TENANT_NAME} --property squash_uid --value ${TENANT_UID}
lctl nodemap_modify --name ${TENANT_NAME} --property squash_gid --value ${TENANT_UID}
lctl nodemap_add_idmap --name ${TENANT_NAME} --idtype uid --idmap 1000:${TENANT_UID}
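The result can be checked on the MGS before any client mounts. A minimal sketch, using the nodemap_activate command and the nodemap.* parameters described in the Lustre manual (the comments are illustrative):
lctl nodemap_activate 1                          # nodemap enforcement is off by default
lctl get_param nodemap.${TENANT_NAME}.idmap      # should show the 1000:${TENANT_UID} mapping
lctl get_param nodemap.${TENANT_NAME}.squash_uid # should show ${TENANT_UID}
lctl get_param nodemap.${TENANT_NAME}.ranges     # NID ranges, added under “Map nodemap to network”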
Subdirectory mounts
Restricts a tenant’s access to a subdirectory of the filesystem.
These commands are run on an admin host:
mkdir /lustre/secure/${TENANT_NAME}
chown ${TENANT_NAME} /lustre/secure/${TENANT_NAME}
This command is run on the MGS:
lctl set_param -P nodemap.${TENANT_NAME}.fileset=/${TENANT_NAME}
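From a client inside the tenant’s nodemap the filesystem is then mounted as usual, and the fileset makes the tenant’s subdirectory appear as the filesystem root. A sketch, assuming the filesystem is called secure and that ${MGS_NID} is the MGS NID reachable through the tenant’s Lustre router:
mount -t lustre ${MGS_NID}:/secure /mnt/secure
# /mnt/secure is now the ${TENANT_NAME} subdirectory; all I/O from this
# client arrives at the servers squashed to ${TENANT_UID}.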
Map nodemap to network
This command is run on the MGS:
lctl nodemap_add_range --name ${TENANT_NAME} --range [0-255].[0-255].[0-255].[0-255]@tcp${TENANT_UID}
This command adds a route via a Lustre router. It is run on all MDS and OSS nodes (or the route is added to /etc/modprobe.d/lustre.conf):
lnetctl route add --net tcp${TENANT_UID} --gateway ${LUSTRE_ROUTER_IP}@tcp
In the same way, a similar command is needed on each client to add a route to the servers’ tcp network (see the sketch below).
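For completeness, a sketch of the matching LNet configuration on the Lustre router and on a tenant client. The interface names, tcp5 standing in for tcp${TENANT_UID}, and ${LUSTRE_ROUTER_TENANT_IP} (the router’s address on the tenant network) are assumptions for illustration:
# Lustre router: NIDs on both the server network (tcp) and the tenant
# network (tcp5), with forwarding enabled (/etc/modprobe.d/lustre.conf).
options lnet networks="tcp(eth0),tcp5(eth1)" forwarding="enabled"
# Tenant client: a NID on tcp5 only (/etc/modprobe.d/lustre.conf) ...
options lnet networks="tcp5(eth0)"
# ... plus a route back to the server network via the router's tcp5 NID.
lnetctl route add --net tcp --gateway ${LUSTRE_ROUTER_TENANT_IP}@tcp5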
OpenStack configuration
neutron net-create <name> --shared --provider:network_type vlan \
  --provider:physical_network datacentre --provider:segmentation_id 109
neutron subnet-create --enable-dhcp \
  --dns-nameserver 172.18.255.1 --dns-nameserver 172.18.255.2 --dns-nameserver 172.18.255.3 \
  --no-gateway de208f24-999d-4ca3-98da-5d0edd2184ad --name LNet-subnet-5 \
  --allocation-pool start=172.27.202.17,end=172.27.203.240 172.27.202.0/23
openstack role create Lnet-5
openstack role add --project <project ID> --user <user ID> <role ID>
OpenStack policy
Edit /etc/neutron/policy.json so that the get_network rule is:
"get_network": "rule:get_network_local"
/etc/neutron/policy.d/get_networks_local.json defines the new rule and keeps the change to /etc/neutron/policy.json simple:
{
  "get_network_local": "rule:admin_or_owner or rule:external or rule:context_is_advsvc or rule:show_providers or ( not rule:provider_networks and rule:shared )"
}
OpenStack policy
/etc/neutron/policy.d/provider.json is used to define networks and their mapping to roles:
{
  "net_LNet-1": "field:networks:id=d18f2aca-163b-4fc7-a493-237e383c1aa9",
  "show_LNet-1": "rule:net_LNet-1 and role:LNet-1_ok",
  "net_LNet-2": "field:networks:id=169b54c9-4292-478b-ac72-272725a26263",
  "show_LNet-2": "rule:net_LNet-2 and role:LNet-2_ok",
  "provider_networks": "rule:net_LNet-1 or rule:net_LNet-2",
  "show_providers": "rule:show_LNet-1 or rule:show_LNet-2"
}
Restart neutron for the policy changes to take effect.
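A sketch of the restart and a quick check; neutron-server is the service name on a TripleO/RHOSP controller, and the listing should be tried as users with and without the relevant role:
systemctl restart neutron-server   # on each controller
openstack network list             # the LNet provider network should only appear for users holding the matching role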
Evaluation hardware
6+ year old hardware
• Lustre servers
• Dual Intel E5620 @ 2.40GHz
• 256GB RAM
• Dual 10G network
• lustre: 2.9.0.ddnsec2
• https://jira.hpdd.intel.com/browse/LU-9289
• SFA-10k
• 300 * SATA, 7,200 rpm, 1 TB
We have seen this system reach 6 GBytes/second in production.
Physical router configuration.
• Repurposed compute node
• Redhat 7.3
• lustre 2.9.0.ddnsec2
• Mellanox ConnectX-4 (2 * 25 Gb/s)
• Dual Intel E5-2690 v4 @ 2.60GHz
• 512 GB RAM
Connected in a single rack, so packets from other racks have to traverse the spine. No changes from default settings.
Virtual client
• 2 CPU
• 4 GB of RAM
• CentOS Linux release 7.3.1611 (Core)
• lustre: 2.9.0.ddnsec2
• Dual NIC:
• Tenant network
• Shared Lustre network
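A sketch of booting such a client with both NICs attached; the image, flavour and network names here are placeholders:
openstack server create --image CentOS-7 --flavor m1.medium \
  --nic net-id=<tenant network ID> \
  --nic net-id=<LNet provider network ID> \
  lustre-client-1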
Testing procedure - vdbench
http://bit.ly/2rjRuPP (the Oracle download page, version 5.04.06)
Creates a large pool of files on which tests are later run.
Sequential and random I/O, block sizes of 4k, 64k, 512k, 1M, 4M, 16M.
Each test section is run for 5 minutes.
Threads are used to increase performance.
No performance tuning attempted.
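As an illustration of that procedure, a minimal vdbench parameter file might look like the sketch below; the anchor path, file counts and transfer size are assumptions, not the exact settings used for these results:
fsd=fsd1,anchor=/mnt/secure/vdbench,depth=2,width=8,files=64,size=1g
fwd=fwd1,fsd=fsd1,operation=read,fileio=sequential,fileselect=random,xfersize=1m,threads=16
rd=rd1,fwd=fwd1,fwdrate=max,format=yes,elapsed=300,interval=5
Run with ./vdbench -f <parameter file>; elapsed=300 matches the 5-minute test sections and threads= is the knob used to increase concurrency.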
Single machine performance
Filesets and UID mapping have no effect on performance.
Instance size has little effect on performance.
Single machine performance
Filesets and UID mapping overhead is insignificant.
Read performance (virtual machine, old kernel) ≈ 350 MBytes/second
Write performance (virtual machine, old kernel) ≈ 750 MBytes/second
Read performance (virtual machine, new kernel) ≈ 1300 MBytes/second
Write performance (virtual machine, new kernel) ≈ 950 MBytes/second
Read performance (physical machine) ≈ 3200 MBytes/second
Write performance (physical machine) ≈ 1700 MBytes/second
Multiple VMs, with bare metal routers.
Virtualised Lustre routers.
We could see that bare metal Lustre routers gave acceptable
performance. We wanted to know if we could virtualise these
routers.
Each tenant could have their own set of virtual routers.
• Fault isolation
• Ease of provisioning routers.
• No additional cost.
• Increases east-west traffic.
Improved security
As each tenant has its own set of Lustre routers:
• Traffic to a different tenant does not pass through a shared router.
• A Lustre router could be compromised without directly compromising another tenant’s data: the filesystem servers will not route data for @tcp1 to the router on @tcp2.
• Either a second Lustre router or the Lustre servers would need to be compromised to re-route the data.
Port security...
The routed Lustre network (e.g. tcp1) required that port security be disabled on the Lustre router ports.
neutron port-list | grep 172.27.70.36 | awk '{print $2}'
08a1808a-fe4a-463c-b755-397aedd0b36c
neutron port-update --no-security-groups 08a1808a-fe4a-463c-b755-397aedd0b36c
neutron port-update 08a1808a-fe4a-463c-b755-397aedd0b36c --port-security-enabled=False
http://kimizhang.com/neutron-ml2-port-security/
We would need to run iptables inside the instance rather than rely on iptables in OVS on the hypervisor; the tests do not include this, but a sketch follows.
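A minimal iptables sketch of what would be needed inside the instance; eth1 as the Lustre-facing interface and ${LUSTRE_ROUTER_TENANT_IP} are assumptions, and TCP port 988 is the default LNet/socklnd port:
iptables -A INPUT -i eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -i eth1 -p tcp --dport 988 -s ${LUSTRE_ROUTER_TENANT_IP} -j ACCEPT
iptables -A INPUT -i eth1 -j DROP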
Sequential performance.
Random performance.
Asymmetric routing?
http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.rpf.html
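If asymmetric routing is the cause, the usual suspect with two NICs is strict reverse-path filtering dropping replies that arrive on the “wrong” interface. A sketch of relaxing it to loose mode on the Lustre-facing interface (eth1 is an assumption):
sysctl -w net.ipv4.conf.all.rp_filter=2
sysctl -w net.ipv4.conf.eth1.rp_filter=2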
Conclusion
• Isolated POSIX islands can be deployed with Lustre 2.9.
• Performance is acceptable given the hardware.
• Lustre routers require little CPU and memory.
• Physical routers work and can give good locality for network usage.
• Virtual routers work, are “easy” to scale, and can give additional security benefits; however, multiple routers will need to be deployed and additional east-west traffic will need to be accommodated.
Acknowledgements
DDN: Sébastien Buisson, Thomas Favre-Bulle, James Coomer
Current group staff: Pete Clapham, James Beal, Helen Brimmer, John Constable,
Helen Cousins, Brett Hartley, Dave Holland, Jon Nicholson, Matthew Vernon.
Previous group staff: Simon Fraser, Andrew Perry, Matthew Rahtz