ibm.com/redbooks
Front cover
NIC Virtualization on IBM Flex System

Scott Irwin
Scott Lorditch
Matt Slavin
Ilya Krutov

Introduces NIC virtualization concepts and technologies
Discusses vNIC deployment scenarios
Provides vNIC configuration examples
International Technical Support Organization
NIC Virtualization on IBM Flex System
May 2014
SG24-8223-00
© Copyright International Business Machines Corporation 2014. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
First Edition (May 2014)
This edition applies to:
- IBM Networking Operating System 7.7
- IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch
- IBM Flex System Fabric EN4093R 10Gb Scalable Switch
- IBM RackSwitch G8264CS
- IBM Flex System Embedded 10Gb Virtual Fabric Adapter
- IBM Flex System CN4054 10Gb Virtual Fabric Adapter
- IBM Flex System CN4054R 10Gb Virtual Fabric Adapter
This document was created or updated on May 1, 2014.
Note: Before using this information and the product it supports, read the information in “Notices” on
page vii.
Contents
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Authors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Chapter 1. Introduction to I/O module and NIC virtualization features in the IBM Flex
System environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Overview of Flex System I/O module virtualization technologies . . . . . . . . . . . . . . . . . . 2
1.1.1 Introduction to converged fabrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Introduction to vLAG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Introduction to stacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.4 Introduction to SPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.5 Easy Connect Q-in-Q solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.6 Introduction to the Failover feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 Introduction to NIC virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.1 vNIC based NIC virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.2 Unified Fabric Port based NIC virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.3 Comparing vNIC modes and UFP modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Chapter 2. Converged networking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1 What convergence is. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.1 Calling it what it is . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Vision of convergence in data centers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 The interest in convergence now . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Fibre Channel SANs today . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5 Ethernet-based storage today. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6 Benefits of convergence in storage and network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.7 Challenge of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.9 Fibre Channel over Ethernet protocol stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.10 iSCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.11 iSCSI versus FCoE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.11.1 Key similarities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.11.2 Key differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Chapter 3. IBM Flex System networking architecture and portfolio. . . . . . . . . . . . . . . 27
3.1 Enterprise Chassis I/O architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 IBM Flex System Ethernet I/O modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.1 IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches . . . . . 31
3.2.2 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch. . . . . . . . . . 36
3.2.3 IBM Flex System Fabric SI4093 System Interconnect Module. . . . . . . . . . . . . . . 42
3.2.4 I/O modules and cables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.3 IBM Flex System Ethernet adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.1 Embedded 10Gb Virtual Fabric Adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3.2 IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters. . . . . . . . . . . 48
3.3.3 IBM Flex System CN4022 2-port 10Gb Converged Adapter . . . . . . . . . . . . . . . . 50
3.3.4 IBM Flex System x222 Compute Node LOM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Chapter 4. NIC virtualization considerations on the switch side . . . . . . . . . . . . . . . . . 55
4.1 Virtual Fabric vNIC solution capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.1.1 Virtual Fabric mode vNIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.1.2 Switch Independent mode vNIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2 Unified Fabric Port feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2.1 UFP Access and Trunk modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.2.2 UFP Tunnel mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.3 UFP FCoE mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2.4 UFP Auto mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2.5 The following rules and attributes are associated with UFP vPorts . . . . . . . . . . . 69
4.3 Compute node NIC to I/O module connectivity mapping . . . . . . . . . . . . . . . . . . . . . . . 70
4.3.1 Embedded 10Gb VFA (LoM) - Mezzanine 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.3.2 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1. . . . . . . . . . . . . . 72
4.3.3 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1 and 2. . . . . . . . . 72
4.3.4 IBM Flex System x222 Compute Node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Chapter 5. NIC virtualization considerations on the server side. . . . . . . . . . . . . . . . . 75
5.1 Introduction to enabling Virtual NICs on the server. . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.1.1 Getting in to the virtual NIC configuration section of UEFI . . . . . . . . . . . . . . . . . . 76
5.1.2 Initially enabling virtual NIC functionality via UEFI . . . . . . . . . . . . . . . . . . . . . . . . 85
5.1.3 Special settings for the different modes of virtual NIC via UEFI . . . . . . . . . . . . . . 86
5.1.4 Setting the Emulex virtual NIC settings back to factory default. . . . . . . . . . . . . . . 91
5.2 Other methods for configuring virtual NICs on the server . . . . . . . . . . . . . . . . . . . . . . . 92
5.2.1 FSM Configuration Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.3 Utilizing physical and virtual NICs in the OS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.3.1 Introduction to teaming/bonding on the server . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.3.2 OS side teaming/bonding and upstream network requirements . . . . . . . . . . . . . 122
5.3.3 Discussion of physical NIC connections and logical enumeration . . . . . . . . . . . 128
Chapter 6. Flex System NIC virtualization deployment scenarios . . . . . . . . . . 133
6.1 Introduction to deployment examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.2 UFP mode virtual NIC and Layer 2 Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.2.1 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.2.2 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.2.3 Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.2.4 Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.2.5 Confirming operation of the environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
6.3 UFP mode virtual NIC with vLAG and FCoE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.3.1 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.3.2 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.3.3 Use cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.3.4 Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.3.5 Confirming operation of the environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.4 pNIC and vNIC Virtual Fabric modes with Layer 2 Failover . . . . . . . . . . . . . . . . . . . . 163
6.4.1 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.4.2 Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.4.3 Use cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
6.4.4 Configurations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.4.5 Verifying operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6.5 Switch Independent mode with SPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.5.1 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.5.2 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.5.3 Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
6.5.4 Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
6.5.5 Verifying operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
Abbreviations and acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION
PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Any performance data contained herein was determined in a controlled environment. Therefore, the results
obtained in other operating environments may vary significantly. Some measurements may have been made
on development-level systems and there is no guarantee that these measurements will be the same on
generally available systems. Furthermore, some measurements may have been estimated through
extrapolation. Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation in the United States, other countries, or both. These and other IBM trademarked terms are
marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US
registered or common law trademarks owned by IBM at the time this information was published. Such
trademarks may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
Blade Network Technologies®
BladeCenter®
BNT®
IBM®
IBM Flex System®
Power Systems™
PowerVM®
PureFlex®
RackSwitch™
Redbooks®
Redbooks (logo) ®
System x®
VMready®
The following terms are trademarks of other companies:
Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Preface
The deployment of server virtualization technologies in data centers requires significant
efforts in providing sufficient network I/O bandwidth to satisfy the demand of virtualized
applications and services. For example, every virtualized system can host several dozen
network applications and services. Each of these services requires a certain amount of
bandwidth (or speed) to function properly. Furthermore, because different service types
generate different network traffic patterns, these traffic flows can interfere with each other
and lead to serious network problems, including the inability of a service to perform its
functions.
The NIC virtualization solutions on IBM® Flex System address these issues. The solutions
are based on the IBM Flex System® Enterprise Chassis with a 10 Gbps Converged
Enhanced Ethernet infrastructure. This infrastructure is built on IBM RackSwitch™ G8264
and G8264CS Top of Rack (ToR) switches, IBM Flex System Fabric CN4093 and EN4093R
10 Gbps Ethernet switch modules, and IBM Flex System SI4093 Switch Interconnect
modules in the chassis and the Emulex and Broadcom Virtual Fabric Adapters in each
compute node.
This IBM Redbooks® publication provides configuration scenarios that use leading edge IBM
networking technologies combined with the Emulex Virtual Fabric adapters. This book is for
IBM, IBM Business Partner and client networking professionals who want to learn how to
implement NIC virtualization solutions and switch interconnect technologies on IBM Flex
System by using the IBM Unified Fabric Port (UFP) mode, Switch Independent mode, and
IBM Virtual Fabric mode.
Authors
This book was produced by a team of specialists from around the world working at the
International Technical Support Organization, Raleigh Center.
Ilya Krutov is a Project Leader at the ITSO Center in Raleigh and has been with IBM since
1998. Before he joined the ITSO, Ilya served in IBM as a Run Rate Team Leader, Portfolio
Manager, Brand Manager, Technical Sales Specialist, and Certified Instructor. Ilya has expert
knowledge in IBM System x®, BladeCenter®, and Flex System products and technologies,
virtualization and cloud computing, and data center networking. He has authored over 150
books, papers, product guides, and solution guides. He has a bachelor’s degree in Computer
Engineering from the Moscow Engineering and Physics Institute.
Scott Irwin is a Consulting Systems Engineer (CSE) for IBM System Networking. He joined
IBM in November of 2010 as part of the Blade Network Technologies® (BNT®) acquisition.
His networking background spans well over 16 years as both a customer support escalation
engineer and a customer-facing field systems engineer. In May of 2007, he was promoted
to Consulting Systems Engineer with a focus on deep customer troubleshooting. His
responsibilities include supporting customer proofs of concept, assisting with paid
installations and training, and providing both pre-sales and post-sales support across all
verticals (Public Sector, High Frequency Trading, Service Provider, Mid Market, and
Enterprise).
Scott Lorditch is a Consulting Systems Engineer for IBM System Networking. He performs
network architecture assessments, and develops designs and proposals for implementing
GbE Switch Module products for the IBM BladeCenter. He also developed several training
and lab sessions for IBM technical and sales personnel. Previously, Scott spent almost 20
years working on networking in various industries, serving as a senior network architect, a
product manager for managed hosting services, and a manager of electronic securities transfer
projects. Scott holds a BS degree in Operations Research with a specialization in computer
science from Cornell University.
Matt Slavin is a Consulting Systems Engineer for IBM System Networking, based out of
Tulsa, Oklahoma, and currently provides network consulting to the Americas. He has a
background of over 30 years of hands-on systems and network design, installation, and
troubleshooting. Most recently, he has focused on data center networking, where he is leading
client efforts to adopt new and potentially game-changing technologies into their day-to-day
operations. Matt joined IBM through the acquisition of Blade Network Technologies, and
before that he worked at some of the top systems and networking companies in the world.
Thanks to the following people for their contributions to this project:
Tamikia Barrow, Cheryl Gera, Chris Rayns, Jon Tate, David Watts, Debbie Willmschen
International Technical Support Organization, Raleigh Center
Nghiem Chu, Sai Chan, Michael Easterly, Heidi Griffin, Richard Mancini, Shekhar Mishra,
Heather Richardson, Hector Sanchez, Tim Shaughnessy
IBM
Jeff Lin
Emulex
Now you can become a published author, too!
Here’s an opportunity to spotlight your skills, grow your career, and become a published
author—all at the same time! Join an ITSO residency project and help write a book in your
area of expertise, while honing your experience using leading-edge technologies. Your efforts
will help to increase product acceptance and customer satisfaction, as you expand your
network of technical contacts and relationships. Residencies run from two to six weeks in
length, and you can participate either in person or as a remote resident working from your
home base.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
- Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
- Send your comments in an email to:
redbooks@us.ibm.com
- Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Stay connected to IBM Redbooks
- Find us on Facebook:
http://www.facebook.com/IBMRedbooks
- Follow us on Twitter:
http://twitter.com/ibmredbooks
- Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
- Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks
weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
- Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
Chapter 1. Introduction to I/O module and
NIC virtualization features in the
IBM Flex System environment
This chapter introduces the various virtualization features available with certain I/O modules
and converged network adapters (CNAs) in the IBM PureFlex® System environment. The
primary focus of this paper is the EN4093R, CN4093, and SI4093 I/O modules, along with the
related server-side CNA virtualization features. Although other I/O modules are available for
the Flex System Enterprise Chassis environment, unless otherwise noted, those other I/O
modules do not support the virtualization features discussed in this document and are not
covered here.
This chapter includes the following sections:
- 1.1, “Overview of Flex System I/O module virtualization technologies” on page 2
- 1.2, “Introduction to NIC virtualization” on page 10
1.1 Overview of Flex System I/O module virtualization
technologies
The term virtualization can mean many different things to different people, and in different
contexts.
For example, in the server world it is often associated with taking bare metal platforms and
putting in a layer of software (referred to as a hypervisor) that permits multiple virtual
machines (VMs) to run on that single physical platform, with each VM thinking it owns the
entire hardware platform.
In the network world, there are many different concepts of virtualization. One example is
overlay technology, which lets a user run one network on top of another network, usually with
the goal of hiding the complexities of the underlying network (often referred to as overlay
networking). Another form of network virtualization is OpenFlow technology, which decouples
the control plane from the switch and allows switching path decisions to be made from a
central control point.
And then there are other forms of virtualization, such as cross chassis aggregation (also
known as cross-switch aggregation), virtualized NIC technologies, and converged fabrics.
This paper is focused on the latter set of virtualization forms, specifically the following set of
features:
- Converged fabrics - Fibre Channel over Ethernet (FCoE) and Internet Small Computer
System Interface (iSCSI)
- Virtual Link Aggregation (vLAG) - A form of cross-switch aggregation
- Stacking - Virtualizing the management plane and the switching fabric
- Switch Partitioning (SPAR) - Masking the I/O module from the host and upstream network
- Easy Connect Q-in-Q solutions - More ways to mask the I/O modules from connecting
devices
- NIC virtualization - Allowing a single physical 10 Gb NIC to represent multiple NICs to the
host OS
Although we will be introducing all of these topics in this section, the primary focus of this
paper will be around how the last item (NIC virtualization) integrates into the various other
features, and the surrounding customer environment. The specific NIC virtualization features
that will be discussed in detail in this paper include the following:
- IBM Virtual Fabric mode - also known as vNIC Virtual Fabric mode, including both
Dedicated Uplink Mode (default) and Shared Uplink Mode (optional) operations
- Switch Independent Mode - also known as vNIC Switch Independent Mode
- Unified Fabric Port - also known as IBM Unified Fabric Protocol, or just UFP - all modes
Important: The term vNIC can be used both generically for all virtual NIC technologies, or
as a vendor specific term. For example, VMware calls the virtual NIC that resides inside a
VM a vNIC. Unless otherwise noted, the use of the term vNIC in this paper is referring to a
specific feature available on the Flex System I/O modules and Emulex CNAs inside
physical hosts. In a related fashion, the term vPort has multiple connotations, for example,
used by Microsoft for their Hyper-V environment. Unless otherwise noted, the use of the
term vPort in this paper is referring to the UFP feature on the Flex System I/O modules and
Emulex CNAs inside physical hosts.
1.1.1 Introduction to converged fabrics
As the name implies, converged fabrics take a set of protocols and data that were designed
to run on top of one kind of physical medium and allow them to be carried on top of a different
physical medium. This approach provides a number of cost benefits, such as reducing the
number of physical cabling plants that are required, removing the need for separate physical
NICs and HBAs, and potentially reducing power and cooling requirements. From an OpEx
perspective, it can reduce the cost associated with managing separate physical
infrastructures. In the data center world, the two most common forms of converged fabrics
are FCoE and iSCSI.
FCoE allows a host to use its 10 Gb Ethernet connections to access Fibre Channel attached
remote storage, as if it were physically Fibre Channel attached to the host, when in fact the
FC traffic is encapsulated into FCoE frames and carried to the remote storage via an Ethernet
network.
iSCSI takes a protocol that was originally designed for hosts to communicate with nearby
physical storage over SCSI cabling and encapsulates it in IP so that it can run over an
Ethernet network, allowing access to storage well beyond the distance limitations of a
physical SCSI-based solution.
Both of these topics are discussed in more detail in Chapter 2, “Converged networking” on
page 15.
1.1.2 Introduction to vLAG
In its simplest terms, vLAG is a technology designed to enhance traditional Ethernet link
aggregations (sometimes referred to generically as Portchannels or Etherchannels). It is
important to note that vLAG is not a form of aggregation in its own right, but an enhancement
to aggregations.
As background, under current IEEE specifications an aggregation is defined as a bundle of
similar links between two, and only two, devices that are bound together to operate as a
single logical link. By today’s standards-based definitions, you cannot create an aggregation
on one device and have the links of that aggregation connect to more than a single device on
the other side. Limiting an aggregation to only two devices in this fashion restricts the ability
to offer certain robust designs.
Although the standards bodies are working on a solution that provides split aggregations
across devices, most vendors have developed their own versions of this multi-chassis
aggregation. For example, Cisco has virtual Port Channel (vPC) on NX OS products, and
Virtual Switch System (VSS) on the 6500 IOS products. IBM offers virtual Link Aggregation
(vLAG) on many of the IBM Top of Rack (ToR) solutions, and on the EN4093R and CN4093
Flex System I/O modules.
The primary goal of virtual link aggregation is to overcome the limit imposed by the current
standards-based aggregation, and provide a distributed aggregation across a pair of switches
instead of a single switch. Doing so results in a reduction of single points of failure, while still
maintaining a loop-free, non-blocking environment.
Important: All I/O module features discussed in this paper are based on the latest
available firmware at the time of this writing (7.7.9 for the EN4093R and CN4093, and 7.7.8
for the SI4093 System Interconnect Module).
Figure 1-1 shows an example of how vLAG can create a single common uplink out of a pair
of embedded I/O Modules. This creates a non-looped path with no blocking links, offering the
maximum amount of bandwidth for the links, and no single point of failure.
Figure 1-1 Non-looped design using multi-chassis aggregation on both sides
Although this vLAG-based design is considered optimal, not all I/O module virtualization
options support this topology; for example, neither Virtual Fabric vNIC mode nor SPAR is
supported with vLAG.
Another potentially limiting factor with vLAG (and other such cross-chassis aggregations such
as vPC and VSS) is that it only supports a pair of switches acting as one for this cross-chassis
aggregation, and not more than two. If the desire is to split an aggregation across more than
two switches, stacking might be an option to consider.
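To give a feel for how vLAG is enabled, the following sketch outlines a minimal vLAG
configuration in IBM Networking OS ISCLI. It is illustrative only and is not taken from the
deployment scenarios in this book: the tier ID, LACP keys, port numbers, and health-check
peer address are assumptions, and the exact commands should be verified against the IBM
Networking OS 7.7 documentation and the working configurations in Chapter 6.

Example: Minimal vLAG configuration sketch (illustrative only)

   ! ISL between the two vLAG peer I/O modules (repeat on both peers)
   interface port EXT8,EXT9
      lacp mode active
      lacp key 200
      exit
   ! Uplink members of the cross-chassis aggregation toward the upstream pair
   interface port EXT1,EXT2
      lacp mode active
      lacp key 1000
      exit
   ! Bind the aggregations together as a vLAG
   vlag tier-id 10
   vlag isl adminkey 200
   vlag hlthchk peer-ip 10.10.10.2
   vlag adminkey 1000 enable
   vlag enable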
1.1.3 Introduction to stacking
Stacking provides the ability to take up to eight physical I/O modules and treat them as a
single logical switch from a port usage and management perspective. This means ports on
different I/O modules in the stack can be part of a common aggregation, and you only log in to
a single IP address to manage all I/O modules in the stack. For devices that are attaching to
the stack, the stack looks and acts like a single large switch.
Stacking is supported on the EN4093R and CN4093 I/O modules. It is implemented by
reserving a group of uplinks as stacking links and creating a ring of I/O modules with these
links. The ring design ensures that the loss of a single link or a single I/O module in the stack
does not disrupt the stack.
Before the v7.7 releases of code, the EN4093R could be stacked only into a common stack of
like-model I/O modules. In v7.7 and later code, support was added for including a pair of
CN4093s in a hybrid stack of EN4093Rs to add Fibre Channel Forwarder (FCF) capability to
the stack. The limit for this hybrid stacking is a maximum of six EN4093Rs and two CN4093s
in a common stack.
Important: When using the EN4093R and CN4093 in hybrid stacking, only the CN4093 is
allowed to act as a stack master or stack backup master for the stack.
Stacking the Flex System chassis I/O modules with IBM Top of Rack switches that also
support stacking is not allowed. Connections from a stack of Flex System chassis I/O
modules to upstream switches can be made with normal single or aggregated connections,
including the use of vLAG/vPC on the upstream switches to connect links across stack
members into a common non-blocking fabric between the stack and the Top of Rack switches.
An example of four I/O modules in a highly available stacking design is shown in Figure 1-2.
Figure 1-2 Example of stacking in the Flex System environment
This example shows a design with no single points of failures, via a stack of four I/O modules
in a single stack, and a pair of upstream vLAG/vPC connected switches.
One of the potential limitations of the current implementation of stacking is that if an upgrade
of code is needed, a reload of the entire stack must occur. Because upgrades are uncommon
and should be scheduled for non-production hours anyway, a single stack design is usually
efficient and acceptable. But some customers do not want to have any downtime (scheduled
or otherwise) and a single stack design is thus not an acceptable solution. For these users
that still want to make the most use of stacking, a two-stack design might be an option. This
design features stacking a set of I/O modules in bay 1 into one stack, and a set of I/O modules
in bay 2 in a second stack.
The primary advantage to a two-stack design is that each stack can be upgraded one at a
time, with the running stack maintaining connectivity for the compute nodes during the
upgrade and reload of the other stack. The downside of the two-stack design is that traffic that
is flowing from one stack to another stack must go through the upstream network to reach the
other stack.
As can be seen, stacking might not be suitable for all customers. However, if it is desired, it is
another tool that is available for building a robust infrastructure by using the Flex System I/O
modules.
1.1.4 Introduction to SPAR
Switch partitioning (SPAR) is a feature that, among other things, allows a physical I/O module
to be divided into multiple logical switches. After SPAR is configured, ports within a given
SPAR group can communicate only with each other. Ports that are members of different
SPAR groups on the same I/O module cannot communicate directly with each other without
going outside the I/O module.
The EN4093R, CN4093, and SI4093 I/O modules support SPAR.
SPAR features two modes of operation:
- Pass-through domain mode (also known as transparent mode)
This mode of SPAR uses a Q-in-Q function to encapsulate all traffic passing through the
switch in a second layer of VLAN tagging. This is the default mode when SPAR is enabled
and is VLAN agnostic owing to this Q-in-Q operation. It passes tagged and untagged
packets through the SPAR session without looking at or interfering with any
customer-assigned tag.
Pass-through domain mode supports passing FCoE packets to an upstream FCF, but
without the benefit of FIP snooping within the SPAR group.
- Local domain mode
This mode is not VLAN agnostic and requires the user to create any required VLANs in the
SPAR group. Currently, there is a limit of 256 VLANs in Local domain mode.
Support is available for FIP snooping on FCoE sessions in Local domain mode. Unlike
pass-through domain mode, Local domain mode provides strict control of end-host VLAN
usage.
Consider the following points regarding SPAR:
- SPAR is disabled by default on the EN4093R and CN4093. SPAR is enabled by default on
the SI4093, with all base-licensed internal and external ports defaulting to a single
pass-through SPAR group. This default SI4093 configuration can be changed if desired.
- Any port can be a member of only a single SPAR group at one time.
- Only a single uplink path is allowed per SPAR group (it can be a single link, a single static
aggregation, or a single LACP aggregation). This SPAR-enforced restriction ensures that
no network loops are possible with ports in a SPAR group.
- SPAR cannot be used with UFP or Virtual Fabric vNIC at this time. Switch Independent
Mode vNIC is supported with SPAR. UFP support is slated for a possible future release.
- Up to eight SPAR groups per I/O module are supported. This number might be increased
in a future release.
- SPAR is not supported with the vLAG, stacking, or tagpvid-ingress features.
SPAR can be a useful solution in environments where simplicity is paramount.
1.1.5 Easy Connect Q-in-Q solutions
The Easy Connect concept, often referred to as Easy Connect mode, or Transparent mode, is
not a specific feature but a way of using one of four different existing features to attempt to
minimize ongoing I/O module management requirements. The primary goal of Easy Connect
is to make an I/O module transparent to the hosts and the upstream network they need to
access, thus reducing the management requirements for I/O Modules in an Easy Connect
mode.
As noted, there are actually several features that can be used to accomplish an Easy Connect
solution, with the following being common aspects of Easy Connect solutions:
- At the heart of Easy Connect is some form of Q-in-Q tagging that masks packets traveling
through the I/O module. This is a fundamental requirement of any Easy Connect solution,
and it lets the attached hosts and the upstream network communicate using any VLAN
(tagged or untagged). The I/O module passes those packets through to the other side by
wrapping them in an outer VLAN tag and then removing that outer VLAN tag as the packet
exits the I/O module, thus making the I/O module VLAN agnostic. This Q-in-Q operation
removes the need to manage VLANs on the I/O module, which is usually one of the larger
ongoing management requirements of a deployed I/O module.
- Pre-creating an aggregation of the uplinks (in some cases, all of the uplinks) to remove the
likelihood of loops (if all uplinks are not used, any unused uplinks or ports should be
disabled to ensure that loops are not possible).
- Optionally disabling spanning tree so that the upstream network does not receive any
spanning-tree BPDUs. This is especially important for upstream devices that shut down a
port if BPDUs are received, such as a Cisco FEX device or an upstream switch running
some form of BPDU guard.
After it is configured, an I/O module in Easy Connect mode does not require ongoing
configuration changes as a customer adds and removes VLANs on the hosts and the
upstream network. In essence, Easy Connect turns the I/O module into a VLAN-agnostic port
aggregator that can grow to the maximum bandwidth of the product (for example, by adding
Feature on Demand (FoD) upgrade keys to the I/O module to increase the number of 10 Gb
links to Compute Nodes and 10 Gb and 40 Gb links to the upstream networks).
The following are the two primary methods for deploying an Easy Connect solution:
- Use an I/O module that defaults to a form of Easy Connect:
– For customers that want an Easy Connect type of solution that is immediately ready for
use out of the box (zero touch I/O module deployment), the SI4093 provides this by
default. The SI4093 accomplishes this by having the following factory default
configuration:
• All base licensed internal and external ports are put into a single SPAR group.
• All uplinks are put into a single common LACP aggregation and the LACP
suspend-port feature is enabled.
• The failover feature is enabled on the common LACP key.
• No spanning-tree support (the SI4093 is designed to never permit more than a
single uplink path per SPAR, so it can not create a loop and does not support
spanning-tree).
- For customers that want the option to use advanced features but also want an Easy
Connect mode solution, the EN4093R and CN4093 offer configurable options that can
make them transparent to the attaching Compute Nodes and upstream network switches,
while maintaining the option of changing to more advanced modes of configuration when
needed.
As noted, the SI4093 accomplishes this by defaulting to the SPAR feature in pass-through
mode, which puts all compute node ports and all uplinks into a common Q-in-Q group.
For the EN4093R and CN4093, there are a number of features that can be implemented to
accomplish this Easy Connect support. The primary difference between these I/O modules
and the SI4093 is that you must first perform a small set of configuration steps to set up the
EN4093R and CN4093 into an Easy Connect mode, after which minimal management of the
I/O module is required.
For these I/O modules, this Easy Connect mode can be configured by using one of the
following four features:
- Configure the SPAR feature (the default on the SI4093), which is also available on the
EN4093R and CN4093
- Utilize the tagpvid-ingress feature
- Configure vNIC Virtual Fabric Dedicated Uplink Mode
- Configure UFP vPort tunnel mode
In general, all of these features provide Easy Connect functionality, each with its own pros
and cons. For example, if the desire is to use Easy Connect with vLAG, you should use
tagpvid-ingress mode or UFP vPort tunnel mode (SPAR and Virtual Fabric vNIC do not permit
the vLAG ISL). However, if you want to use Easy Connect with FCoE today, you cannot use
tagpvid-ingress and must utilize a different form of Easy Connect, such as vNIC Virtual Fabric
Dedicated Uplink Mode or UFP tunnel mode (SPAR pass-through mode allows FCoE but
does not support FIP snooping, which may or may not be a concern for some customers).
As an example of how Easy Connect works (in all Easy Connect modes), consider the
tagpvid-ingress Easy Connect mode operation shown in Figure 1-3. All internal ports and the
desired uplink ports are placed into a common PVID/Native VLAN (4091 in this example) and
tagpvid-ingress is enabled on these ports (with whatever aggregation protocol is required on
the uplinks to match the other end of those links). All ports with a matching Native/PVID
setting on this I/O module are then part of a single Q-in-Q tunnel. The Native/PVID VLAN on
the port acts as the outer tag, and the I/O module switches traffic based on this outer tag
VLAN. The inner customer tag rides through the fabric encapsulated in this Native/PVID
VLAN to the destination port (or ports) in the tunnel, and the outer tag is stripped off as the
packet exits the I/O module, thus re-exposing the original customer-facing tag (or no tag) to
the device attached to that egress port.
Figure 1-3 Packet flow with Easy Connect
In all modes of Easy Connect, local switching based on destination MAC address is still used.
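The following sketch suggests what the tagpvid-ingress form of Easy Connect might look like
in IBM Networking OS ISCLI. The port list, the outer VLAN of 4091, and the aggregation
choice are assumptions made for this illustration only; the tested Easy Connect and
deployment configurations appear in Chapter 6.

Example: tagpvid-ingress Easy Connect sketch (illustrative only)

   ! Put internal ports and the chosen uplinks into a common outer VLAN (PVID 4091)
   ! and enable the Q-in-Q style tagpvid-ingress behavior on those ports
   interface port INTA1-INTA14,EXT1,EXT2
      pvid 4091
      tagpvid-ingress
      exit
   ! Aggregate the uplinks so that only a single logical path leaves the I/O module
   portchannel 1 port EXT1,EXT2
   portchannel 1 enable
   ! Optionally disable spanning tree so that no BPDUs are sent upstream
   spanning-tree mode disable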
Some considerations on what form of Easy Connect mode makes the most sense for a given
situation:
- For users that require virtualized NICs and are already using vNIC Virtual Fabric mode,
and are more comfortable staying with it, vNIC Virtual Fabric Easy Connect mode might
be the best solution.
- For users that require virtualized NICs and have no particular opinion on which mode of
virtualized NIC they prefer, UFP tunnel mode would be the best choice for Easy Connect
mode, since the UFP feature is the future direction of virtualized NICs in the Flex System
I/O module solutions.
- For users planning to make use of the vLAG feature, this would require either UFP tunnel
mode or tagpvid-ingress mode forms of Easy Connect (vNIC Virtual Fabric mode and
SPAR Easy Connect modes do not work with the vLAG feature).
- For users that do not need vLAG or virtual NIC functionality, SPAR is a very simple and
clean solution to implement as an Easy Connect solution.
1.1.6 Introduction to the Failover feature
Failover, sometimes referred to as Layer 2 Failover or Trunk Failover, is not a virtualization
feature in its own right, but it can play an important role when NICs on a server are making
use of teaming/bonding (forms of NIC virtualization in the OS). Failover is particularly
important in an embedded environment, such as in a Flex System chassis.
When NICs are teamed/bonded in an operating system, the teaming driver needs to know
when a NIC can no longer reach the upstream network so that it can decide whether to use
that NIC in the team. Most commonly this is a simple link up/link down check in the server: if
the link is reporting up, use the NIC; if the link is reporting down, do not use the NIC.
In an embedded environment, this can be a problem if the uplinks out of the embedded I/O
module go down, but the internal link to the server is still up. In that case, the server will still
be reporting the NIC link as up, even though there is no path to the upstream network, and
that leads to the server sending traffic out a NIC that has no path out of the embedded I/O
module, and disrupts server communications.
The Failover feature can be implemented in these environments. When the set of uplinks that
the Failover feature is tracking goes down, configurable internal ports are also taken down,
alerting the embedded server to a path fault in that direction. The server can then use its
team/bond to select a different NIC and maintain network connectivity.
An example of how failover can protect Compute Nodes in a PureFlex chassis when there is
an uplink fault out of one of the I/O modules can be seen in Figure 1-4.
Figure 1-4 Example of Failover in action

The callout in Figure 1-4 shows the following sequence of events:
1. All uplinks out of I/O module 1 go down (because of a link failure, a failure of ToR switch 1,
and so forth).
2. Trunk failover takes down the internal link to NIC 1 to notify the compute node that the path
out of I/O module 1 is gone.
3. NIC teaming on the compute node begins utilizing the still-functioning NIC 2 for all
communications.
Without failover or some other form of remote link failure detection, embedded servers would
potentially be exposed to loss of connectivity if the uplink path on one of the embedded I/O
modules were to fail.
Note that designs that utilize vLAG or some other form of cross-chassis aggregation, such as
stacking, are not exposed to this issue (and thus do not need the Failover feature) because
they have a different method for dealing with uplinks out of an I/O module going down (for
example, with vLAG, packets that need to get upstream can cross the vLAG ISL and use the
other I/O module's uplinks to reach the upstream network).
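As a sketch of how the Failover feature ties uplinks to internal ports, the following IBM
Networking OS ISCLI fragment monitors an uplink aggregation and, if it fails, takes down the
internal server-facing ports. The trigger number, monitored LACP admin key, and controlled
port range are assumptions for this illustration; see 6.2, "UFP mode virtual NIC and Layer 2
Failover" for a tested configuration.

Example: Layer 2 Failover sketch (illustrative only)

   ! Monitor the uplink aggregation (LACP admin key 1000 in this example)
   failover trigger 1 mmon monitor admin-key 1000
   ! If the monitored uplinks go down, disable these internal ports
   failover trigger 1 mmon control member INTA1-INTA14
   failover trigger 1 enable
   failover enable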
1.2 Introduction to NIC virtualization
As noted previously, although we have introduced a number of virtualization elements, this
book is primarily focused on the various options to virtualize NIC technology within the
PureFlex System and Flex System environment. This section introduces the two primary
types of NIC virtualization (vNIC and UFP) available on the Flex System I/O modules, as well
as introduces the various sub-elements of these virtual NIC technologies.
At the core of all virtual NICs discussed in this section is the ability to take a single physical 10
GbE NIC and carve it up into up to three or four NICs for use in the attaching host.
The virtual NIC technologies discussed for the I/O module here are all directly tied to the
Emulex CNA offerings for the Flex System environment, which are documented in 3.3, “IBM
Flex System Ethernet adapters” on page 47.
1.2.1 vNIC based NIC virtualization
vNIC is the original virtual NIC technology utilized in the IBM BladeCenter 10Gb Virtual Fabric
Switch Module, and has been brought forward into the PureFlex System environment to allow
customers that have standardized on vNIC to still use it with the PureFlex System solutions.
vNIC has three primary modes:
- vNIC Virtual Fabric - Dedicated Uplink Mode
– Provides a Q-in-Q tunneling action for each vNIC group
– Each vNIC group must have its own dedicated uplink path out
– vNICs in one vNIC group cannot talk with vNICs in any other vNIC group without first
exiting to the upstream network
- vNIC Virtual Fabric - Shared Uplink Mode
– Each vNIC group provides a single VLAN for all vNICs in that group
– Each vNIC group must use a unique VLAN (the same VLAN cannot be used on more
than a single vNIC group)
– Servers cannot use tagging when Shared Uplink Mode is enabled
– As with vNICs in Dedicated Uplink Mode, vNICs in one vNIC group cannot talk with
vNICs in any other vNIC group without first exiting to the upstream network
- vNIC Switch Independent Mode
– Offers virtual NICs to the server with no special I/O module-side configuration
– The switch is completely unaware that the 10 GbE NIC is being seen as multiple logical
NICs in the OS
Details for enabling and configuring these modes can be found in Chapter 5, “NIC
virtualization considerations on the server side” on page 75 and Chapter 6, “Flex System NIC
virtualization deployment scenarios” on page 133.
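For orientation, the following sketch shows the general shape of a vNIC Virtual Fabric
(Dedicated Uplink Mode) configuration in IBM Networking OS ISCLI. The bandwidth value,
vNIC group number, group VLAN, and port selections are assumptions made for this
illustration; complete, tested configurations are provided in Chapter 6.

Example: vNIC Virtual Fabric Dedicated Uplink Mode sketch (illustrative only)

   ! Enable the vNIC feature and carve vNIC 1 out of physical port INTA1
   vnic enable
   vnic port INTA1 index 1
      bandwidth 25
      enable
      exit
   ! Place the vNIC and its dedicated uplink into a vNIC group (outer tag VLAN 100)
   vnic vnicgroup 1
      vlan 100
      member INTA1.1
      port EXT1
      failover
      enable
      exit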
1.2.2 Unified Fabric Port based NIC virtualization
UFP is the current direction of IBM NIC virtualization and provides a more feature-rich
solution compared to the original vNIC Virtual Fabric mode. Like Virtual Fabric mode vNIC,
UFP allows carving up a single 10 Gb port into four virtual NICs. UFP has a number of modes
associated with it, including:
- Tunnel mode
Provides a mode very similar to vNIC Virtual Fabric Dedicated Uplink Mode
- Trunk mode
Provides a traditional 802.1Q trunk mode to the virtual NIC (vPort) interface
- Access mode
Provides a traditional access mode (single untagged VLAN) to the virtual NIC (vPort)
interface
- FCoE mode
Provides FCoE functionality to the vPort
- Auto-VLAN mode
Provides automatic VLAN creation for 802.1Qbg and IBM VMready® environments
Only vPort 2 can be bound to FCoE. If FCoE is not desired, vPort 2 can be configured for one
of the other modes.
Details for enabling and configuring these modes can be found in Chapter 5, “NIC
virtualization considerations on the server side” on page 75 and Chapter 6, “Flex System NIC
virtualization deployment scenarios” on page 133.
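As a sketch of the switch-side UFP configuration, the following IBM Networking OS ISCLI
fragment defines vPort 1 on an internal port in tunnel mode with a guaranteed share of
bandwidth. The port, default VLAN, and bandwidth numbers are assumptions for this
illustration; vPort 2 is the one that would be configured for FCoE mode if converged traffic is
required, and the verified configurations appear in Chapter 6.

Example: UFP vPort configuration sketch (illustrative only)

   ! Define vPort 1 on physical port INTA1 as a tunnel mode virtual NIC
   ufp port INTA1 vport 1
      network mode tunnel
      network default-vlan 4091
      qos bandwidth minimum 25
      qos bandwidth maximum 100
      enable
      exit
   ! Enable UFP on the port and globally
   ufp port INTA1 enable
   ufp enable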
1.2.3 Comparing vNIC modes and UFP modes
As a general rule of thumb, if a customer desires virtualized NICs in the PureFlex System
environment, UFP is usually the preferred solution, as all new feature development is going
into UFP.
If a customer has standardized on the original vNIC Virtual Fabric mode, then they can still
continue to use that mode in a fully supported fashion.
If a customer does not want any of the virtual NIC functionality controlled by the I/O module
(only controlled and configured on the server side), then Switch Independent mode vNIC is
the solution of choice. This mode has the advantage of being I/O module independent, so
any upstream I/O module can be utilized. The downsides of this mode are that bandwidth
restrictions can be enforced only from the server side, not the I/O module side, and changing
the bandwidth requires a reload of the server. (Bandwidth control for the other virtual NIC
modes discussed here is configured from the switch side, enforces bandwidth restrictions
bidirectionally, and can be changed on the fly with no reboot required.)
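As a simple illustration of the bandwidth-control point, the following Python sketch checks that the shares carved out of a single 10 Gb port stay within the physical capacity. The helper function and the example percentages are hypothetical; with Virtual Fabric vNIC and UFP this allocation is made in the I/O module configuration, and with Switch Independent mode it is made on the server side.

# Illustrative sketch: checks that per-virtual-NIC bandwidth shares carved out
# of a single 10 Gb physical port add up correctly. The values are examples only.

PHYSICAL_PORT_GBPS = 10

def allocate(shares_percent):
    """shares_percent: per-vNIC/vPort shares as percentages of the 10 Gb port."""
    if len(shares_percent) > 4:
        raise ValueError("A 10 Gb port is carved into at most four virtual NICs")
    if sum(shares_percent) > 100:
        raise ValueError("Shares exceed the physical port capacity")
    return [PHYSICAL_PORT_GBPS * s / 100 for s in shares_percent]

# Example: four virtual NICs at 40/30/20/10 percent of the 10 Gb port.
print(allocate([40, 30, 20, 10]))   # [4.0, 3.0, 2.0, 1.0] Gbps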
Table 1-1 shows some of the items that may affect the decision-making process.
Table 1-1 Attributes of virtual NIC options

Capability                                                 | Virtual Fabric vNIC, Dedicated uplink | Virtual Fabric vNIC, Shared uplink | Switch independent mode vNIC | UFP
Requires support in the I/O module                         | Yes | Yes | No  | Yes
Requires support in the NIC/CNA                            | Yes | Yes | Yes | Yes
Supports adapter transmit rate control                     | Yes | Yes | Yes | Yes
Supports I/O module transmit rate control                  | Yes | Yes | No  | Yes
Supports changing rate without restart of node             | Yes | Yes | No  | Yes
Requires a dedicated uplink path per vNIC group or vPort   | Yes | No  | No  | Yes for vPorts in Tunnel mode
Support for node OS-based tagging                          | Yes | No  | Yes | Yes
Support for failover per vNIC group/UFP vPort              | Yes | Yes | No  | Yes
Support for more than one uplink path per vNIC/vPort group | No  | Yes | Yes | Yes for vPorts in Trunk and Access modes
Supported regardless of the model of upstream I/O module   | No  | No  | Yes | No
Supported with vLAG                                        | No  | No  | Yes | Yes for uplinks out of the I/O module carrying vPort traffic
Supported with SPAR                                        | No  | No  | Yes | No
Supported with stacking                                    | Yes | Yes | Yes | No (UFP and stacking on EN/CN4093 in a coming release of code)
Supported with an SI4093                                   | No  | No  | Yes | No today, but supported in a coming release
Supported with EN4093                                      | Yes | Yes | Yes | Yes
Supported with CN4093                                      | Yes | Yes | Yes | Yes

For a deeper dive into virtual NIC operational characteristics from the switch side see
Chapter 4, “NIC virtualization considerations on the switch side” on page 55. For virtual NIC
operational characteristics from the server side, see Chapter 5, “NIC virtualization
considerations on the server side” on page 75.
Chapter 2. Converged networking
This chapter introduces storage and network convergence, highlighting the impact on data
centers and the vision behind it.
This chapter includes the following sections:
򐂰 2.1, “What convergence is” on page 16
򐂰 2.2, “Vision of convergence in data centers” on page 16
򐂰 2.3, “The interest in convergence now” on page 17
򐂰 2.4, “Fibre Channel SANs today” on page 17
򐂰 2.5, “Ethernet-based storage today” on page 18
򐂰 2.6, “Benefits of convergence in storage and network” on page 19
򐂰 2.7, “Challenge of convergence” on page 20
򐂰 2.8, “Conclusion” on page 22
򐂰 2.9, “Fibre Channel over Ethernet protocol stack” on page 23
򐂰 2.10, “iSCSI” on page 24
򐂰 2.11, “iSCSI versus FCoE” on page 25
2.1 What convergence is
Dictionaries describe convergence as follows:
򐂰 The degree or point at which lines, objects, and so on, converge¹
򐂰 The merging of distinct technologies, industries, or devices into a unified whole²
In the context of this book, convergence addresses the fusion of local area networks (LANs)
and storage area networks (SANs), including servers and storage systems, into a unified
network. In other words, the same infrastructure is used for both data (LAN) and storage
(SAN) networking; the components of this infrastructure are primarily those traditionally used
for LANs.
2.1.1 Calling it what it is
Many terms and acronyms are used to describe convergence in a network environment.
These terms are described in later chapters of this book. For a better understanding of the
basics, let us start with the core.
Data Center Bridging (DCB)
The Institute of Electrical and Electronics Engineers (IEEE) uses the term DCB to group the
required extensions to enable an enhanced Ethernet that is capable of deploying a converged
network where different applications, relying on different link layer technologies, can be run
over a single physical infrastructure. The Data Center Bridging Task Group (DCB TG), part of
the IEEE 802.1 Working Group, provided the required extensions to existing 802.1 bridge
specifications in several projects.
Converged Enhanced Ethernet (CEE)
This is a trademark term that was registered by IBM in 2007 and abandoned in 2008. The
initial plan was to donate (transfer) the term to the industry (IEEE 802 or the Ethernet
Alliance) after it was registered. Several vendors started using or referring to CEE in the meantime.
Data Center Ethernet (DCE)
Cisco registered the trademark DCE for their initial activity in the converged network area.
Bringing it all together
All three terms describe more or less the same thing. Some of them were introduced before
an industry standard (or name) was available. Because manufacturers have used different
command names and terms, different terms might be used in this book; they can be treated
as interchangeable, which should help prevent confusion. While all of these terms are
still heard, the open industry standard term, Data Center Bridging (DCB), is preferred.
Command syntax in some of the IBM products used for testing in this book includes
the CEE acronym.
2.2 Vision of convergence in data centers
The density - processing and storage capability per square foot - of the data center footprint is
increasing over time, allowing the same processing power and storage capacity in significantly
¹ Dictionary.com. Retrieved July 08, 2013 from http://dictionary.reference.com/browse/convergence
² Merriam-Webster.com. Retrieved July 08, 2013 from http://www.merriam-webster.com/dictionary/convergence
smaller space. At the same time, information technology is embracing infrastructure
virtualization more rapidly than ever.
One way to reduce the storage and network infrastructure footprint is to implement a
converged network. Vendors are adopting industry standards which support convergence
when developing products.
Fibre Channel over Ethernet (FCoE) and iSCSI are two of the enablers of storage and network
convergence. Enterprises can preserve investments in traditional Fibre Channel (FC) storage
and at the same time adapt to higher Ethernet throughput demands which arise from server
virtualization. Most of the vendors in the networking market offer 10 Gbps Network Interface
Cards; 40Gbps NICs are also available today. Similarly, data center network switches
increasingly offer an option to choose 40 Gbps for ports, and 100 Gbps is expected relatively
soon.
Convergence has long had a role in networking, but now it takes on a new significance. The
following sections describe storage and networking in data centers today, explain what is
changing, and highlight approaches to storage and network convergence that are explored in
this book.
2.3 The interest in convergence now
Several factors are driving new interest in combining storage and data infrastructure. The
Ethernet community has a history of continually moving to transmission speeds that were
thought impossible only a few years earlier. Although 100 Mbps Ethernet was once
considered fast, 10 Gbps Ethernet is commonplace today, and 40 Gbps Ethernet is
becoming more widely available, with 100 Gbps Ethernet following shortly. From a
simple data transmission speed perspective, Ethernet can now meet or exceed the speeds
that are available by using FC.
The IEEE 802.3 work group is already working on the 400 Gbps standard (results are
expected in 2017), so this process will continue.
A second factor that is enabling convergence is the addition of capabilities that make Ethernet
lower latency and “lossless,” making it more similar to FC. The Data Center Bridging (DCB)
protocols provide several capabilities that substantially enhance the performance of Ethernet
and initially enable its usage for storage traffic.
One of the primary motivations for storage and networking convergence is improved asset
utilization and cost of ownership, similar to the convergence of voice and data networks that
occurred in previous years. By using a single infrastructure for multiple types of network
traffic, the costs of procuring, installing, managing, and operating the data center
infrastructure can be lowered. Where multiple types of adapters, switches, and cables were
once required for separate networks, a single set of infrastructure will take its place, providing
savings in equipment, cabling, and power requirements. The improved speeds and
capabilities of lossless 10 and 40 Gbps Ethernet are enabling such improvements.
2.4 Fibre Channel SANs today
Fibre Channel SANs are generally regarded as the high-performance approach to storage
networking. With a Fibre Channel SAN, storage arrays are equipped with FC ports that
connect to FC switches. Similarly, servers are equipped with Fibre Channel host bus adapters
(HBAs) that also connect to Fibre Channel switches. Therefore, the Fibre Channel SAN,
which is the set of FC switches, is a separate network for storage traffic.
Fibre Channel (FC) was standardized in the early 1990s and became the technology of
choice for enterprise-class storage networks. Compared to its alternatives, FC offered
relatively high-speed, low-latency, and back-pressure mechanisms that provide lossless
connectivity. That is, FC is designed not to drop packets during periods of network
congestion.
Just as the maximum speed of Ethernet networks has increased repeatedly, Fibre Channel
networks have offered increased speed, typically by factors of two, from 4 to 8 to
16 Gbps, with 32 Gbps becoming available.
FC has many desirable characteristics for a storage network, but with some considerations.
First, because FC is a separate network from the enterprise data Ethernet network, additional
cost and infrastructure are required.
Second, FC is a different technology from Ethernet. Therefore, the skill set required to design,
install, operate and manage the FC SAN is different from the skill set required for Ethernet,
which adds cost in terms of personnel requirements.
Third, despite many years of maturity in the FC marketplace, vendor interoperability within a
SAN fabric is limited. Such technologies as N_Port Virtualization (NPV) or N_Port ID
Virtualization (NPIV) allow the equipment of one vendor to attach at the edge of the SAN
fabric of another vendor. However, interoperability over inter-switch links (ISLs; E_Port links)
within a Fibre Channel SAN is generally viewed as problematic.
2.5 Ethernet-based storage today
Storage arrays can also be networked by using technologies based on Ethernet. Two major
approaches are the Internet Small Computer System Interface (iSCSI) protocol and various
NAS protocols.
iSCSI provides block-level access to data over IP networks. With iSCSI, the storage arrays
and servers use Ethernet adapters. Servers and storage exchange SCSI commands over an
Ethernet network to store and retrieve data.
iSCSI provides a similar capability to FC, but by using a native Ethernet network. For this
reason, iSCSI is sometimes referred to as IP SAN. By using iSCSI, designers and
administrators can take advantage of familiar Ethernet skills for designing and maintaining
networks. Also, unlike FC devices, Ethernet devices are widely interoperable. Ethernet
infrastructure can also be significantly less expensive than FC gear.
When compared to FC, iSCSI also has challenges. FC is lossless and provides low latency
in-sequence data transfer. However, traditional Ethernet drops packets when traffic
congestion occurs, so that higher-layer protocols are required to ensure that no packets are
lost. For iSCSI, TCP/IP is used above an Ethernet network to guarantee that no storage
packets are lost. Therefore, iSCSI traffic undergoes a further layer of encapsulation as it is
transmitted across an Ethernet network.
Until recently, Ethernet technology was available only at speeds significantly lower than those
speeds of FC. Although FC offered speeds of 2, 4, 8, or 16 Gbps, with 32 Gbps just arriving,
Ethernet traditionally operated at 100 Mbps and 1 Gbps. Now, 10 Gbps is common, and 40
Gbps is not far behind. iSCSI might offer a lower cost overall than an FC infrastructure, but it
historically has tended to offer lower performance because of its extra encapsulation and lower
speeds. Therefore, iSCSI has been viewed as a lower cost, lower performance storage
networking approach compared to FC. Today, the DCB standards which are a prerequisite for
FCoE to operate with lossless transmission and packets arriving in order can also be used for
iSCSI, resulting in improved performance.
NAS also operates over Ethernet. NAS protocols, such as Network File System (NFS) and
Common Internet File System (CIFS), provide file-level access to data, not block-level
access. The server that accesses the NAS over a network detects a file system, not a disk.
The operating system in the NAS device converts file-level commands that are received from
the server to block-level commands. The operating system then accesses the data on its
disks and returns information to the server.
NAS appliances are attractive because, similar to iSCSI, they use a traditional Ethernet
infrastructure and offer a simple file-level access method. However, similar to iSCSI, they
have been limited by Ethernet’s capabilities. NAS protocols are encapsulated in an upper
layer protocol (such as TCP or RPC) to ensure no packet loss. Because NAS works at the
file level, additional processing is possible on the NAS device, because it is aware of the
stored content (for example, deduplication or incremental backup). On the other hand, NAS
systems require more processing power, because they must also handle all file-system
related operations, which requires more resources than pure block-level handling.
2.6 Benefits of convergence in storage and network
The term convergence has had various meanings in the history of networking. Convergence
is used generally to refer to the notion of combining or consolidating storage traffic and
traditional data traffic on a single network (or fabric). Because Fibre Channel (FC) storage
area networks (SANs) are generally called “fabrics,” the term fabric is now also commonly
used for an Ethernet network that carries storage traffic.
Convergence of network and storage consolidates data and storage traffic into a single,
highly scalable, highly available, high-performance, and highly reliable network
infrastructure.
Converging storage and network brings many benefits that outweigh the initial investment.
Here are some of the key benefits:
򐂰 Simplicity, cost savings, and reliability
򐂰 Scalability and easier-to-move workloads in the virtual world
򐂰 Low latency and higher throughput
򐂰 One single, high-speed network infrastructure for both storage and network
򐂰 Better utilization of server resources and simplified management
To get an idea of how traditional and converged data centers differ, see the following
figures. Both figures include three major components: servers, storage, and the networks
that connect them. The required number of switches in each network depends on the size of
the environment.
Figure 2-1 on page 20 shows a simplified picture of a traditional data center without
convergence. Servers and storage devices might require multiple interfaces to connect to
the different networks. In addition, each network requires dedicated switches, which leads to
higher investment in multiple devices and more effort for configuration and management.
Figure 2-1 Conceptual view of a data center without implemented convergence
Using converged network technologies, as shown by the converged data center in Figure 2-2,
only one converged enhanced Ethernet network is needed. This results in fewer required
switches and decreases the number of devices that require management, which can reduce
the TCO. The servers, clients, and storage devices also require only one type of adapter to
be connected. For redundancy, performance, or segmentation purposes, it might still make
sense to use multiple adapters.
Figure 2-2 Conceptual view of a converged data center
2.7 Challenge of convergence
Fibre Channel SANs have different design requirements than Ethernet. To provide a better
understanding, they can be compared with two different transportation systems. Each system
moves people or goods from point A to point B.
Railroads
Trains run on rails and tracks. This can be compared with a Fibre Channel SAN.
Figure 2-3 Trains running on rails
Specific aspects of trains that map to network traffic are as follows:
򐂰 The route is already defined by rails (shortest path first).
򐂰 All participating trains are registered and known (nameserver).
򐂰 The network is isolated, but accidents (dropped packets) have a huge impact.
򐂰 The number of trains in one track segment is limited (buffer-to-buffer credits for a lossless
connection).
򐂰 Signals and railway switches all over the tracks define the allowed routes (zoning).
򐂰 They have high capacity (frames of up to 2148 bytes).
Roads
Cars can use roads with paved or even unpaved lanes. This can be compared with traditional
Ethernet traffic.
Figure 2-4 Cars using roads
Specific aspects of roads that map to network traffic are as follows:
򐂰 An unknown number of participants may be using the road at the same time. Metering
lights can be used only as a reactive method to slow down traffic (no confirmation of
available receiving capacity before sending).
򐂰 Accidents are more or less common and expected (packet loss).
򐂰 All roads lead to Rome (no point-to-point topology).
򐂰 Navigation is required to prevent moving in circles (requirement for TRILL, Spanning
Tree, or SDN).
򐂰 Everybody can join and hop on or off almost everywhere (no zoning).
򐂰 They have limited capacity (1500-byte payload), while bigger buses and trucks can carry
more (jumbo frames).
Convergence approaches
Maintaining two transportation infrastructure systems, with separate vehicles and different
stations and routes, is complex to manage and expensive. Convergence for storage and
networks can mean “running trains on the road”, to stay with the analogy. The two potential
vehicles that can run as trains on the road are iSCSI and Fibre Channel over
Ethernet (FCoE).
iSCSI can be used in existing (lossy) and new (lossless) Ethernet infrastructure, with different
performance characteristics. However, FCoE requires a lossless converged enhanced
Ethernet network and it relies on additional functionality known from Fibre Channel (for
example, nameserver, zoning).
The Emulex Converged Network Adapters (CNAs) that are used in compute nodes in the
Flex chassis can support either iSCSI or FCoE in their onboard ASIC - that is, in hardware.
Their configuration and use are described in the chapters that follow. Testing was done for the
purpose of this book using FCoE as the storage protocol of choice, because it is more
commonly used at this time and because there are more configuration steps required to
implement FCoE in a Flex environment than to implement iSCSI. Many of the scenarios
presented in the chapters that follow can readily be adapted for deployment in an
environment which includes iSCSI storage networking.
2.8 Conclusion
Convergence is the future. Network convergence can reduce cost, simplify deployment, better
leverage expensive resources, and enable a smaller data center infrastructure footprint. The
IT industry is adopting FCoE more rapidly because the technology is becoming more mature
and offers higher throughput in the form of 40 and 100 Gbps Ethernet. Sooner or later, CIOs
will realize the cost benefits and advantages of convergence and will adopt storage and
network convergence more rapidly.
The bulk of the chapters of this book focus on the insights and capabilities of FCoE on IBM
Flex System and introduce the available IBM switches and storage solutions with support for
converged networks. Most of the content of the previous book, which focused more on IBM
BladeCenter converged solutions, is still valid and is an integrated part of this book.
2.9 Fibre Channel over Ethernet protocol stack
FCoE assumes the existence of a lossless Ethernet, such as one that implements the Data
Center Bridging (DCB) extensions to Ethernet. This section highlights, at a high level, the
concepts of FCoE as defined in FC-BB-5. The EN4093R, CN4093, G8264 and G8264CS
switches support FCoE; the G8264 and EN4093R function as FCoE transit switches, while
the CN4093 and G8264CS have Omni Ports that can be set to function as either FC ports or
Ethernet ports, as specified in the switch configuration.
The basic notion of FCoE is that the upper layers of FC are mapped onto Ethernet, as shown
in Figure 2-5. The upper layer protocols and services of FC remain the same in an FCoE
deployment. Zoning, fabric services, and similar services still exist with FCoE.
Figure 2-5 FCoE protocol mapping
The difference is that the lower layers of FC are replaced by lossless Ethernet, which also
implies that FC concepts, such as port types and lower-layer initialization protocols, must be
replaced by new constructs in FCoE. Such mappings are defined by the FC-BB-5 standard
and are briefly addressed here.
(Figure 2-5 depicts the Fibre Channel protocol stack, FC-0 through FC-4, beside the FCoE
protocol stack, in which the lower FC layers are replaced by the Ethernet PHY, Ethernet MAC,
and an FCoE entity, while FC-2V, FC-3, and FC-4 remain unchanged on top.)
Figure 2-6 shows another perspective on FCoE layering compared to other storage
networking technologies. In this figure, FC and FCoE layers are shown with other storage
networking protocols, including iSCSI.
Figure 2-6 Storage Network Protocol Layering
Based on this protocol structure, Figure 2-7 shows a conceptual view of an FCoE frame.
Figure 2-7 Conceptual view of an FCoE frame
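The frame layout in Figure 2-7 can be summarized programmatically. The following Python sketch lists the fields in the order shown in the figure; the byte counts are commonly cited values added only for illustration (the figure itself asserts only the field order and the FCoE Ethertype of 8906h).

# Conceptual sketch of the FCoE frame layout shown in Figure 2-7.
# Field sizes are common values given for illustration, not taken from the figure.

fcoe_frame_layout = [
    ("Ethernet header (Ethertype 8906h)", 14),
    ("FCoE header (version, SOF)",        14),
    ("FC header",                          24),
    ("FC payload (up to)",               2112),
    ("FC CRC",                              4),
    ("EOF and padding",                     4),
    ("Ethernet FCS",                        4),
]

total_bytes = sum(size for _, size in fcoe_frame_layout)
print(f"Largest FCoE frame in this sketch: {total_bytes} bytes")
# The result is well above a standard 1500-byte Ethernet payload, which is why
# FCoE requires a network path that supports larger ("baby jumbo") frames.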
2.10 iSCSI
The iSCSI protocol allows for longer distances between a server and its storage when
compared to the traditionally restrictive parallel SCSI solutions or the newer serial-attached
SCSI (SAS). iSCSI technology can use a hardware initiator, such as a host bus adapter
(HBA), or a software initiator to issue requests to target devices. Within iSCSI storage
terminology, the initiator is typically known as a client, and the target is the storage device.
The iSCSI protocol encapsulates SCSI commands into protocol data units (PDUs) within the
TCP/IP protocol and then transports them over the network to the target device. The disk is
presented locally to the client as shown in Figure 2-8.
Figure 2-8 iSCSI architecture overview
The iSCSI protocol is a transport for SCSI over TCP/IP. Figure 2-6 on page 24 illustrates a
protocol stack comparison between Fibre Channel and iSCSI. iSCSI provides block-level
access to storage, as does Fibre Channel, but uses TCP/IP over Ethernet instead of Fibre
Channel protocol. iSCSI is defined in RFC 3720, which you can find at:
http://www.ietf.org/rfc/rfc3720.txt
iSCSI uses Ethernet-based TCP/IP rather than a dedicated (and different) storage area
network (SAN) technology. Therefore, it is attractive for its relative simplicity and usage of
widely available Ethernet skills. Its chief limitations historically have been the relatively lower
speeds of Ethernet compared to Fibre Channel and the extra TCP/IP encapsulation required.
With lossless 10 Gbps Ethernet now available, the attractiveness of iSCSI is expected to grow
rapidly. TCP/IP encapsulation will still be used, but 10 Gbps Ethernet speeds will dramatically
increase the appeal of iSCSI.
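To make the extra encapsulation concrete, the following Python sketch adds up typical header sizes for an iSCSI data frame. The sizes are common values for option-free headers and are an assumption made for illustration; they are not measurements from the equipment described in this book.

# Minimal sketch of the iSCSI encapsulation order described above: a SCSI
# command or data segment rides in an iSCSI PDU, inside TCP/IP, over Ethernet.

layers = [
    ("Ethernet header + FCS",            18),
    ("IPv4 header",                      20),
    ("TCP header",                       20),
    ("iSCSI Basic Header Segment (BHS)", 48),
]

overhead = sum(size for _, size in layers)
print(f"Per-frame iSCSI overhead in this sketch: {overhead} bytes")
# With a standard 1500-byte Ethernet MTU this overhead is a noticeable fraction
# of each frame; with jumbo frames (9000-byte MTU) it becomes proportionally
# much smaller, which is one reason iSCSI benefits from jumbo frame support.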
2.11 iSCSI versus FCoE
This section highlights the similarities and differences between iSCSI and FCoE. However, in
most cases, considerations other than purely technical ones will influence your decision in
choosing one over the other.
2.11.1 Key similarities
iSCSI and FCoE have the following similarities:
򐂰 Both protocols are block-oriented storage protocols. That is, the file system logic for
accessing storage with either of them is on the computer where the initiator is, not on the
storage hardware. Therefore, they are both different from typical network-attached storage
(NAS) technologies, which are file oriented.
򐂰 Both protocols implement Ethernet-attached storage.
򐂰 Both protocols can be implemented in hardware, which is detected by the operating
system of the host as an HBA.
򐂰 Both protocols can also be implemented by using software initiators which are available in
various server operating systems. However, this approach uses resources of the main
processor to perform tasks which would otherwise be performed by the hardware of an
HBA.
򐂰 Both protocols can use the Converged Enhanced Ethernet (CEE), also referred to as Data
Center Bridging (DCB), standards to deliver “lossless” traffic over Ethernet.
򐂰 Both protocols are alternatives to traditional FC storage and FC SANs.
2.11.2 Key differences
iSCSI and FCoE have the following differences:
򐂰 iSCSI uses TCP/IP as its transport, and FCoE uses Ethernet. iSCSI can use media other
than Ethernet, such as InfiniBand, and iSCSI can use Layer 3 routing in an IP network.
򐂰 Numerous vendors provide local iSCSI storage targets, some of which also support Fibre
Channel and other storage technologies. Relatively few native FCoE targets are available
at this time, which might allow iSCSI to be implemented at a lower overall capital cost.
򐂰 FCoE requires a gateway function, usually called a Fibre Channel Forwarder (FCF),
which allows FCoE access to traditional FC-attached storage. This approach allows FCoE
and traditional FC storage access to coexist either as a long-term approach or as part of a
migration. The G8264CS and CN4093 switches can be used to provide FCF functionality.
򐂰 iSCSI-to-FC gateways exist but are not required when a storage device is used that can
accept iSCSI traffic directly.
򐂰 Except in the case of a local FCoE storage target, the last leg of the connection uses FC to
reach the storage. FC uses 8b/10b encoding, which means that sending 8 bits of data
requires transmitting 10 bits over the wire, a 25% overhead that is added to protect the
data from corruption. 10 Gbps Ethernet uses 64b/66b encoding, which has a far smaller
overhead (see the worked example after this list).
򐂰 iSCSI includes IP headers and Ethernet (or other media) headers with every frame, which
adds overhead.
򐂰 The largest payload that can be sent in an FCoE frame is 2112 bytes. iSCSI can use jumbo
frame support on Ethernet and send 9 KB or more in a single frame.
򐂰 iSCSI has been on the market for several years longer than FCoE. Therefore, the iSCSI
standards are more mature than FCoE.
򐂰 Troubleshooting FCoE end-to-end requires Ethernet networking skills and FC SAN skills.
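The following Python sketch works through the encoding-overhead comparison mentioned in the list above. The arithmetic uses only the encoding ratios (8b/10b versus 64b/66b) and nominal link speeds; it ignores protocol headers and the exact FC baud rates, so the numbers are illustrative rather than measured.

def efficiency(data_bits, coded_bits):
    # Fraction of transmitted bits that carry data for a given line encoding.
    return data_bits / coded_bits

print(f"8b/10b efficiency:  {efficiency(8, 10):.1%}")    # 80.0% (25% overhead on top of the data)
print(f"64b/66b efficiency: {efficiency(64, 66):.1%}")   # ~97.0%

# Approximate usable data rate on nominal 8 Gbps FC and 10 Gbps Ethernet links:
print(f"8 Gbps FC, usable:        {8 * efficiency(8, 10):.2f} Gbps")     # 6.40 Gbps
print(f"10 Gbps Ethernet, usable: {10 * efficiency(64, 66):.2f} Gbps")   # 9.70 Gbps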
Chapter 3. IBM Flex System networking
architecture and portfolio
IBM Flex System, a new category of computing and the next generation of Smarter
Computing, offers intelligent workload deployment and management for maximum business
agility. This chassis delivers high-speed performance complete with integrated servers,
storage, and networking for multi-chassis management in data center compute environments.
Furthermore, its flexible design can meet the needs of varying workloads with independently
scalable IT resource pools for higher usage and lower cost per workload. Increased security
and resiliency protect vital information and promote maximum uptime, while the integrated,
easy-to-use management system reduces setup time and complexity, which provides a
quicker path to return on investment (ROI).
This chapter includes the following topics:
򐂰 3.1, “Enterprise Chassis I/O architecture” on page 28
򐂰 3.2, “IBM Flex System Ethernet I/O modules” on page 31
򐂰 3.3, “IBM Flex System Ethernet adapters” on page 47
3.1 Enterprise Chassis I/O architecture
The Ethernet networking I/O architecture for the IBM Flex System Enterprise Chassis
includes an array of connectivity options for server nodes that are installed in the enclosure.
Users can decide to use a local switching model that provides superior performance, cable
reduction and a rich feature set, or use pass-through technology and allow all Ethernet
networking decisions to be made external to the Enterprise Chassis.
By far, the most versatile option is to use modules that provide local switching capabilities and
advanced features that are fully integrated into the operation and management of the
Enterprise Chassis. In particular, the EN4093 10Gb Scalable Switch module offers the
maximum port density, highest throughput, and most advanced data center-class features to
support the most demanding compute environments.
From a physical I/O module bay perspective, the Enterprise Chassis has four I/O bays in the
rear of the chassis. The physical layout of these I/O module bays is shown in Figure 3-1.
Figure 3-1 Rear view of the Enterprise Chassis showing I/O module bays
From a midplane wiring point of view, the Enterprise Chassis provides 16 lanes out of each
half-wide node bay (toward the rear I/O bays) with each lane capable of 16 Gbps or higher
speeds. How these lanes are used is a function of which adapters are installed in a node,
which I/O module is installed in the rear, and which port licenses are enabled on the I/O
module.
I/O module
bay 1
I/O module
bay 3
I/O module
bay 2
I/O module
bay 4
How the midplane lanes connect between the node bays upfront and the I/O bays in the rear
is shown in Figure 3-2. The concept of an I/O module Upgrade Feature on Demand (FoD)
also is shown in Figure 3-2. From a physical perspective, an upgrade FoD in this context is a
bank of 14 ports and some number of uplinks that can be enabled and used on a switch
module. By default, all I/O modules include the base set of ports, and thus have 14 internal
ports, one each connected to the 14 compute node bays in the front. By adding an upgrade
license to the I/O module, it is possible to add more banks of 14 ports (plus some number of
uplinks) to an I/O module. The node needs an adapter that has the necessary physical ports
to connect to the new lanes enabled by the upgrades. Those lanes connect to the ports in the
I/O module enabled by the upgrade.
Figure 3-2 Sixteen lanes total of a single half-wide node bay toward the I/O bays
For example, if a node were installed with only the dual port LAN on system board (LOM)
adapter, only two of the 16 lanes are used (one to I/O bay 1 and one to I/O bay 2), as shown
in Figure 3-3 on page 30.
If a node was installed without LOM and two quad port adapters were installed, eight of the 16
lanes are used (two to each of the four I/O bays).
This installation can potentially provide up to 320 Gb of full duplex Ethernet bandwidth (16
lanes x 10 Gb x 2) to a single half-wide node and over half a terabit (Tb) per second of
bandwidth to a full-wide node.
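The bandwidth figures above follow from simple multiplication, as the following Python sketch shows. The constants come straight from the text (16 lanes per half-wide node bay, 10 Gb per lane, doubled for full duplex); the helper function name is only for illustration.

# Worked example of the midplane bandwidth arithmetic described above.

LANES_PER_HALF_WIDE_BAY = 16
GBPS_PER_LANE = 10

def potential_bandwidth_gbps(lanes_used, full_duplex=True):
    return lanes_used * GBPS_PER_LANE * (2 if full_duplex else 1)

# Dual-port LOM: 2 of the 16 lanes used (one to I/O bay 1, one to I/O bay 2).
print(potential_bandwidth_gbps(2))                            # 40 Gb full duplex
# All 16 lanes of a half-wide node bay (the theoretical maximum in the text):
print(potential_bandwidth_gbps(LANES_PER_HALF_WIDE_BAY))      # 320 Gb full duplex
# A full-wide node spans two bays' worth of lanes: over half a terabit per second.
print(2 * potential_bandwidth_gbps(LANES_PER_HALF_WIDE_BAY))  # 640 Gb full duplex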
Figure 3-3 Dual port LOM connecting to ports on I/O bays 1 and 2 (all other lanes unused)
Today, there are limits on the port density of the current I/O modules, in that only the first three
lanes are potentially available from the I/O module.
By default, each I/O module provides a single connection (lane) to each of the 14 half-wide
node bays upfront. By adding port licenses, an EN2092 1Gb Ethernet Switch can offer two
1 Gb ports to each half-wide node bay, and an EN4093R 10Gb Scalable Switch, CN4093
10Gb Converged Scalable Switch or SI4093 System Interconnect Module can each provide
up to three 10 Gb ports to each of the 14 half-wide node bays. Because it is a one-for-one
14-port pass-through, the EN4091 10Gb Ethernet Pass-thru I/O module can only ever offer a
single link to each of the half-wide node bays.
As an example, if two 8-port adapters were installed and four I/O modules were installed with
all upgrades, the node has access to 12 10 Gb lanes (three to each switch). On each 8-port
adapter, two lanes are unavailable at this time.
Concerning port licensing, the default available upstream connections also are associated
with port licenses. For more information about these connections and the node-facing links,
see 3.2, “IBM Flex System Ethernet I/O modules” on page 31.
All I/O modules include a base set of 14 downstream ports, with the pass-through module
supporting only the single set of 14 server facing ports. The Ethernet switching and
interconnect I/O modules support more than the base set of ports, and the ports are enabled
by the upgrades. For more information, see the respective I/O module section in 3.2, “IBM
Flex System Ethernet I/O modules” on page 31.
As of this writing, no I/O module and node adapter combination can use all 16 lanes between
a compute node bay and the I/O bays, but the lanes exist to ensure that the Enterprise
Chassis can take advantage of capacity that becomes available in the future.
Beyond the physical aspects of the hardware, there are certain logical aspects that ensure
that the Enterprise Chassis can integrate seamlessly into any modern data center's
infrastructure.
Many of these enhancements, such as vNIC, VMready, and 802.1Qbg, revolve around
integrating virtualized servers into the environment. Fibre Channel over Ethernet (FCoE)
allows users to converge their Fibre Channel traffic onto their 10 Gb Ethernet network, which
reduces the number of cables and points of management that are necessary to connect the
Enterprise Chassis to the upstream infrastructure.
The wide range of physical and logical Ethernet networking options that are available today
and in development ensure that the Enterprise Chassis can meet the most demanding I/O
connectivity challenges now and as the data center evolves.
3.2 IBM Flex System Ethernet I/O modules
The IBM Flex System Enterprise Chassis features a number of Ethernet I/O module solutions
that provide a combination of 1 Gb and 10 Gb ports to the servers and 1 Gb, 10 Gb, and
40 Gb for uplink connectivity to the outside upstream infrastructure. The IBM Flex System
Enterprise Chassis ensures that a suitable selection is available to meet the needs of the
server nodes.
The following Ethernet I/O modules are available for deployment with the Enterprise Chassis:
򐂰 3.2.1, “IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches”
򐂰 3.2.2, “IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch” on page 36
򐂰 3.2.3, “IBM Flex System Fabric SI4093 System Interconnect Module” on page 42
򐂰 3.2.4, “I/O modules and cables” on page 46
These modules are described next.
3.2.1 IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches
The EN4093 and EN4093R 10Gb Scalable Switches are primarily 10 Gb switches that can
provide up to 42 x 10 Gb node-facing ports, and up to 14 SFP+ 10 Gb and two QSFP+ 40 Gb
external upstream facing ports, depending on the applied upgrade licenses.
Note: The EN4093, non R, is no longer available for purchase.
A view of the face plate of the EN4093/EN4093R 10Gb Scalable Switch is shown in
Figure 3-4.
Figure 3-4 The IBM Flex System Fabric EN4093/EN4093R 10Gb Scalable Switch
As listed in Table 3-1, the switch is initially licensed with 14 internal 10 Gb ports and 10
external 10 Gb uplink ports enabled. More ports can be enabled, including two external
40 Gb uplink ports with the Upgrade 1 license and four more 10 Gb SFP+ ports with the
Upgrade 2 license. Upgrade 1 must be applied before Upgrade 2 can be applied.
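The effect of the upgrade licenses on the enabled port counts, as summarized in Table 3-1, can be modeled with a small Python sketch. The dictionary keys and the helper function are assumptions made for this example; the authoritative numbers are in the table that follows.

# Small sketch of how Feature on Demand upgrades change the enabled port
# counts on an EN4093/EN4093R, derived from Table 3-1 below.

BASE = {"internal_10gb": 14, "uplink_10gb": 10, "uplink_40gb": 0}

def apply_upgrades(upgrade1=False, upgrade2=False):
    if upgrade2 and not upgrade1:
        raise ValueError("Upgrade 1 must be applied before Upgrade 2")
    ports = dict(BASE)
    if upgrade1:
        ports["internal_10gb"] += 14   # adds 14 internal 10 Gb ports
        ports["uplink_40gb"] += 2      # enables the two 40 Gb QSFP+ uplinks
    if upgrade2:
        ports["internal_10gb"] += 14   # adds another 14 internal 10 Gb ports
        ports["uplink_10gb"] += 4      # enables four more 10 Gb SFP+ uplinks
    return ports

print(apply_upgrades(upgrade1=True, upgrade2=True))
# {'internal_10gb': 42, 'uplink_10gb': 14, 'uplink_40gb': 2}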
Table 3-1 IBM Flex System Fabric EN4093 10Gb Scalable Switch part numbers and port upgrades

Part number | Feature code (a) | Product description                                                                                              | Internal | 10 Gb uplink | 40 Gb uplink
49Y4270     | A0TB / 3593      | IBM Flex System Fabric EN4093 10Gb Scalable Switch (10x external 10 Gb uplinks, 14x internal 10 Gb ports)        | 14       | 10           | 0
05Y3309     | A3J6 / ESW7      | IBM Flex System Fabric EN4093R 10Gb Scalable Switch (10x external 10 Gb uplinks, 14x internal 10 Gb ports)       | 14       | 10           | 0
49Y4798     | A1EL / 3596      | IBM Flex System Fabric EN4093 10Gb Scalable Switch (Upgrade 1) (adds 2x external 40 Gb uplinks and 14x internal 10 Gb ports) | 28 | 10 | 2
88Y6037     | A1EM / 3597      | IBM Flex System Fabric EN4093 10Gb Scalable Switch (Upgrade 2, requires Upgrade 1) (adds 4x external 10 Gb uplinks and 14x internal 10 Gb ports) | 42 | 14 | 2

a. The first feature code that is listed is for configurations that are ordered through System x sales channels (HVEC)
by using x-config. The second feature code is for configurations that are ordered through the IBM Power Systems
channel (AAS) by using e-config.

The IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches have the
following features and specifications:
򐂰 Internal ports:
– A total of 42 internal full-duplex 10 Gigabit ports (14 ports are enabled by default;
optional FoD licenses are required to activate the remaining 28 ports).
– Two internal full-duplex 1 GbE ports that are connected to the chassis management
module.
򐂰 External ports:
– A total of 14 ports for 1 Gb or 10 Gb Ethernet SFP+ transceivers (support for
1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10 GBASE-SR, or 10 GBASE-LR) or
SFP+ copper direct-attach cables (DAC). There are 10 ports enabled by default and an
optional FoD license is required to activate the remaining four ports. SFP+ modules
and DAC cables are not included and must be purchased separately.
– Two ports for 40 Gb Ethernet QSFP+ transceivers or QSFP+ DACs (these ports are
disabled by default; an optional FoD license is required to activate them). QSFP+
modules and DAC cables are not included and must be purchased separately.
– One RS-232 serial port (mini-USB connector) that provides another means to
configure the switch module.
򐂰 Scalability and performance:
– 40 Gb Ethernet ports for extreme uplink bandwidth and performance
– Fixed-speed external 10 Gb Ethernet ports to use 10 Gb core infrastructure
– Support for 1G speeds on uplinks via proper SFP selection
– Non-blocking architecture with wire-speed forwarding of traffic and aggregated
throughput of 1.28 Tbps
– Media access control (MAC) address learning:
• Automatic update
• Support of up to 128,000 MAC addresses
– Up to 128 IP interfaces per switch
– Static and LACP (IEEE 802.1AX; previously known as 802.3ad) link aggregation with
up to:
• 220 Gb of total uplink bandwidth per switch
• 64 trunk groups
• 16 ports per group
– Support for cross switch aggregations via vLAG
– Support for jumbo frames (up to 9,216 bytes)
– Broadcast/multicast storm control
– IGMP snooping to limit flooding of IP multicast traffic
– IGMP filtering to control multicast traffic for hosts that participate in multicast groups
– Configurable traffic distribution schemes over aggregated links
– Fast port forwarding and fast uplink convergence for rapid STP convergence
򐂰 Availability and redundancy:
– VRRP for Layer 3 router redundancy
– IEEE 802.1D Spanning Tree to provide L2 redundancy, including support for:
• Multiple STP (MSTP) for topology optimization, up to 32 STP instances are
supported by single switch (previously known as 802.1s)
• Rapid STP (RSTP) provides rapid STP convergence for critical delay-sensitive
traffic, such as voice or video (previously known as 802.1w)
• Per-VLAN Rapid STP (PVRST) to seamlessly integrate into Cisco infrastructures
– Layer 2 Trunk Failover to support active and standby configurations of network adapter
that team on compute nodes
– Hot Links provides basic link redundancy with fast recovery for network topologies that
require Spanning Tree to be turned off
򐂰 VLAN support:
– Up to 4095 active VLANs supported per switch, with VLAN numbers that range from 1
to 4094 (4095 is used for internal management functions only)
– 802.1Q VLAN tagging support on all ports
– Private VLANs
򐂰 Security:
– VLAN-based, MAC-based, and IP-based ACLs
– 802.1x port-based authentication
– Multiple user IDs and passwords
– User access control
– Radius, TACACS+, and LDAP authentication and authorization
򐂰 QoS:
– Support for IEEE 802.1p, IP ToS/DSCP, and ACL-based (MAC/IP source and
destination addresses, VLANs) traffic classification and processing
– Traffic shaping and re-marking based on defined policies
– Eight WRR priority queues per port for processing qualified traffic
򐂰 IP v4 Layer 3 functions:
– Host management
– IP forwarding
– IP filtering with ACLs, up to 896 ACLs supported
– VRRP for router redundancy
– Support for up to 128 static routes
– Routing protocol support (RIP v1, RIP v2, OSPF v2, and BGP-4), up to 2048 entries in
a routing table
– Support for DHCP Relay
– Support for IGMP snooping and IGMP relay
– Support for Protocol Independent Multicast (PIM) in Sparse Mode (PIM-SM) and
Dense Mode (PIM-DM).
򐂰 IP v6 Layer 3 functions:
– IPv6 host management (except default switch management IP address)
– IPv6 forwarding
– Up to 128 static routes
– Support of OSPF v3 routing protocol
– IPv6 filtering with ACLs
򐂰 Virtualization:
– Virtual NICs (vNICs): Ethernet, iSCSI, or FCoE traffic is supported on vNICs
– Unified fabric ports (UFPs): Ethernet or FCoE traffic is supported on UFPs
– Virtual link aggregation groups (vLAGs)
– 802.1Qbg Edge Virtual Bridging (EVB) is an emerging IEEE standard for allowing
networks to become virtual machine (VM)-aware.
Virtual Ethernet Bridging (VEB) and Virtual Ethernet Port Aggregator (VEPA) are
mechanisms for switching between VMs on the same hypervisor.
Edge Control Protocol (ECP) is a transport protocol that operates between two peers
over an IEEE 802 LAN providing reliable, in-order delivery of upper layer protocol data
units.
Virtual Station Interface (VSI) Discovery and Configuration Protocol (VDP) allows
centralized configuration of network policies that will persist with the VM, independent
of its location.
EVB Type-Length-Value (TLV) is used to discover and configure VEPA, ECP, and VDP.
– VMready
– Switch partitioning (SPAR)
򐂰 Converged Enhanced Ethernet:
– Priority-Based Flow Control (PFC) (IEEE 802.1Qbb) extends 802.3x standard flow
control to allow the switch to pause traffic that is based on the 802.1p priority value in
the VLAN tag of each packet.
– Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz) provides a method for
allocating link bandwidth that is based on the 802.1p priority value in the VLAN tag of
each packet.
– Data Center Bridging Capability Exchange Protocol (DCBX) (IEEE 802.1AB) allows
neighboring network devices to exchange information about their capabilities.
򐂰 FCoE:
– FC-BB5 FCoE specification compliant
– FCoE transit switch operations
– FCoE Initialization Protocol (FIP) support for automatic ACL configuration
– FCoE Link Aggregation Group (LAG) support
– Multi-hop RDMA over Converged Ethernet (RoCE) with LAG support
򐂰 Stacking:
– Up to eight switches in a stack
– Hybrid stacking support (from two to six EN4093/EN4093R switches with two CN4093
switches)
– FCoE support (EN4093R only)
– vNIC support
– 802.1Qbg support
򐂰 Manageability:
– Simple Network Management Protocol (SNMP V1, V2, and V3)
– HTTP browser GUI
– Telnet interface for CLI
– SSH
– Serial interface for CLI
– Scriptable CLI
– Firmware image update (TFTP and FTP)
– Network Time Protocol (NTP) for switch clock synchronization
򐂰 Monitoring:
– Switch LEDs for external port status and switch module status indication
– RMON agent to collect statistics and proactively monitor switch performance
– Port mirroring for analyzing network traffic that passes through switch
– Change tracking and remote logging with syslog feature
– Support for sFLOW agent for monitoring traffic in data networks (separate sFLOW
analyzer required elsewhere)
– POST diagnostic testing
Table 3-2 compares the EN4093 to the EN4093R.
Table 3-2 EN4093 and EN4093R supported features

Feature                             | EN4093 | EN4093R
Layer 2 switching                   | Yes    | Yes
Layer 3 switching                   | Yes    | Yes
Switch Stacking                     | Yes    | Yes
Virtual NIC (stand-alone)           | Yes    | Yes
Virtual NIC (stacking)              | Yes    | Yes
Unified Fabric Port (stand-alone)   | Yes    | Yes
Unified Fabric Port (stacking)      | No     | No
Edge virtual bridging (stand-alone) | Yes    | Yes
Edge virtual bridging (stacking)    | Yes    | Yes
CEE/FCoE (stand-alone)              | Yes    | Yes
CEE/FCoE (stacking)                 | No     | Yes

For more information, see IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable
Switches, TIPS0864, which is available at this website:
http://www.redbooks.ibm.com/abstracts/tips0864.html?Open

3.2.2 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch
The IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch provides unmatched
scalability, performance, convergence, and network virtualization, while also delivering
innovations to help address a number of networking concerns and providing capabilities that
help you prepare for the future.
The switch offers full Layer 2/3 switching and FCoE Full Fabric and Fibre Channel NPV
Gateway operations to deliver a converged and integrated solution. It is installed within the I/O
module bays of the IBM Flex System Enterprise Chassis. The switch can help you migrate to
a 10 Gb or 40 Gb converged Ethernet infrastructure and offers virtualization features such as
Virtual Fabric and IBM VMready, plus the ability to work with IBM Distributed Virtual Switch
5000V.
Figure 3-5 shows the IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch.
Figure 3-5 IBM Flex System Fabric CN4093 10 Gb Converged Scalable Switch
The CN4093 switch is initially licensed for 14 10-GbE internal ports, two external 10-GbE
SFP+ ports, and six external Omni Ports enabled.
The following other ports can be enabled:
򐂰 A total of 14 more internal ports and two external 40 GbE QSFP+ uplink ports with
Upgrade 1.
򐂰 A total of 14 more internal ports and six more external Omni Ports with the Upgrade 2
license options.
򐂰 Upgrade 1 and Upgrade 2 can be applied on the switch independently from each other or
in combination for full feature capability.
Table 3-3 shows the part numbers for ordering the switches and the upgrades.
Table 3-3 Part numbers and feature codes for ordering

Description                                                          | Part number | Feature code (x-config / e-config)
Switch module:
IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch         | 00D5823     | A3HH / ESW2
Features on Demand upgrades:
IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 1)  | 00D5845     | A3HL / ESU1
IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 2)  | 00D5847     | A3HM / ESU2

Neither QSFP+ nor SFP+ transceivers or cables are included with the switch. They must be
ordered separately.
The switch does not include a serial management cable. However, IBM Flex System
Management Serial Access Cable, 90Y9338, is supported and contains two cables, a
mini-USB-to-RJ45 serial cable and a mini-USB-to-DB9 serial cable, either of which can be
used to connect to the switch locally for configuration tasks and firmware updates.
The following base switch and upgrades are available:
򐂰 00D5823 is the part number for the physical device, which comes with 14 internal 10 GbE
ports enabled (one to each node bay), two external 10 GbE SFP+ ports that are enabled
to connect to a top-of-rack switch or other devices identified as EXT1 and EXT2, and six
Omni Ports enabled to connect to Ethernet or Fibre Channel networking infrastructure,
depending on the SFP+ cable or transceiver that is used. The six Omni ports are from the
12 that are labeled on the switch as EXT11 through EXT22.
򐂰 00D5845 (Upgrade 1) can be applied on the base switch when you need more uplink
bandwidth with two 40 GbE QSFP+ ports that can be converted into 4x 10 GbE SFP+
DAC links with the optional break-out cables. These are labeled EXT3, EXT7 or
EXT3-EXT6, EXT7-EXT10 if converted. This upgrade also enables 14 more internal ports,
for a total of 28 ports, to provide more bandwidth to the compute nodes using 4-port
expansion cards.
򐂰 00D5847 (Upgrade 2) can be applied on the base switch when you need more external
Omni Ports on the switch or if you want more internal bandwidth to the node bays. The
upgrade enables the remaining six external Omni Ports from range EXT11 through
EXT22, plus 14 more internal 10 Gb ports, for a total of 28 internal ports, to provide more
bandwidth to the compute nodes by using 4-port expansion cards.
򐂰 Both 00D5845 (Upgrade 1) and 00D5847 (Upgrade 2) can be applied on the switch at the
same time so that you can use six ports on an 8-port expansion card, and use all the
external ports on the switch.
Table 3-4 shows the switch upgrades and the ports they enable.
Table 3-4 CN4093 10 Gb Converged Scalable Switch part numbers and port upgrades

Part number        | Feature code (a)          | Description                      | Internal 10 Gb | External 10 Gb SFP+ | External 10 Gb Omni | External 40 Gb QSFP+
00D5823            | A3HH / ESW2               | Base switch (no upgrades)        | 14 | 2 | 6  | 0
00D5845            | A3HL / ESU1               | Add Upgrade 1                    | 28 | 2 | 6  | 2
00D5847            | A3HM / ESU2               | Add Upgrade 2                    | 28 | 2 | 12 | 0
00D5845 + 00D5847  | A3HL / ESU1, A3HM / ESU2  | Add both Upgrade 1 and Upgrade 2 | 42 | 2 | 12 | 2

a. The first feature code that is listed is for configurations that are ordered through System x sales channels (HVEC)
by using x-config. The second feature code is for configurations that are ordered through the IBM Power Systems
channel (AAS) by using e-config.

The IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch has the following
features and specifications:
򐂰 Internal ports:
– A total of 42 internal full-duplex 10 Gigabit ports. (A total of 14 ports are enabled by
default. Optional FoD licenses are required to activate the remaining 28 ports.)
– Two internal full-duplex 1 GbE ports that are connected to the Chassis Management
Module.
򐂰 External ports:
– Two ports for 1 Gb or 10 Gb Ethernet SFP+ transceivers (support for 1000BASE-SX,
1000BASE-LX, 1000BASE-T, 10GBASE-SR, 10GBASE-LR, or SFP+ copper
direct-attach cables (DACs)). These two ports are enabled by default. SFP+ modules
and DACs are not included and must be purchased separately.
– Twelve IBM Omni Ports. Each of them can operate as 10 Gb Ethernet (support for
10GBASE-SR, 10GBASE-LR, or 10 GbE SFP+ DACs), or auto-negotiating as 4/8 Gb
Fibre Channel, depending on the SFP+ transceiver that is installed in the port. The first
six ports are enabled by default. An optional FoD license is required to activate the
remaining six ports. SFP+ modules and DACs are not included and must be purchased
separately.
– Two ports for 40 Gb Ethernet QSFP+ transceivers or QSFP+ DACs. (Ports are disabled
by default. An optional FoD license is required to activate them.) Also, you can use
break-out cables to break out each 40 GbE port into four 10 GbE SFP+ connections.
QSFP+ modules and DACs are not included and must be purchased separately.
– One RS-232 serial port (mini-USB connector) that provides another means to
configure the switch module.
򐂰 Scalability and performance:
– 40 Gb Ethernet ports for extreme uplink bandwidth and performance.
– Fixed-speed external 10 Gb Ethernet ports to use the 10 Gb core infrastructure.
– Non-blocking architecture with wire-speed forwarding of traffic and aggregated
throughput of 1.28 Tbps on Ethernet ports.
– MAC address learning: Automatic update, and support for up to 128,000 MAC
addresses.
– Up to 128 IP interfaces per switch.
– Static and LACP (IEEE 802.3ad) link aggregation, up to 220 Gb of total uplink
bandwidth per switch, up to 64 trunk groups, and up to 16 ports per group.
– Support for jumbo frames (up to 9,216 bytes).
– Broadcast/multicast storm control.
– IGMP snooping to limit flooding of IP multicast traffic.
– IGMP filtering to control multicast traffic for hosts that participate in multicast groups.
– Configurable traffic distribution schemes over trunk links that are based on
source/destination IP or MAC addresses or both.
– Fast port forwarding and fast uplink convergence for rapid STP convergence.
򐂰 Availability and redundancy:
– VRRP for Layer 3 router redundancy.
– IEEE 802.1D STP for providing L2 redundancy.
– IEEE 802.1s MSTP for topology optimization. Up to 32 STP instances are supported
by a single switch.
– IEEE 802.1w RSTP provides rapid STP convergence for critical delay-sensitive traffic,
such as voice or video.
– PVRST enhancements.
Note: Omni Ports do not support 1 Gb Ethernet operations.
– Layer 2 Trunk Failover to support active/standby configurations of network adapter
teaming on compute nodes.
– Hot Links provides basic link redundancy with fast recovery for network topologies that
require Spanning Tree to be turned off.
򐂰 VLAN support:
– Up to 1024 VLANs supported per switch, with VLAN numbers from 1 - 4095. (4095 is
used for management module’s connection only).
– 802.1Q VLAN tagging support on all ports.
– Private VLANs.
򐂰 Security:
– VLAN-based, MAC-based, and IP-based access control lists (ACLs).
– 802.1x port-based authentication.
– Multiple user IDs and passwords.
– User access control.
– Radius, TACACS+, and LDAP authentication and authorization.
򐂰 QoS
– Support for IEEE 802.1p, IP ToS/DSCP, and ACL-based (MAC/IP source and
destination addresses, VLANs) traffic classification and processing.
– Traffic shaping and re-marking based on defined policies.
– Eight WRR priority queues per port for processing qualified traffic.
򐂰 IP v4 Layer 3 functions:
– Host management.
– IP forwarding.
– IP filtering with ACLs, with up to 896 ACLs supported.
– VRRP for router redundancy.
– Support for up to 128 static routes.
– Routing protocol support (RIP v1, RIP v2, OSPF v2, and BGP-4), for up to 2048 entries
in a routing table.
– Support for DHCP Relay.
– Support for IGMP snooping and IGMP relay.
– Support for PIM in PIM-SM and PIM-DM.
򐂰 IP v6 Layer 3 functions:
– IPv6 host management (except for a default switch management IP address).
– IPv6 forwarding.
– Up to 128 static routes.
– Support for OSPF v3 routing protocol.
– IPv6 filtering with ACLs.
򐂰 Virtualization:
– vNICs: Ethernet, iSCSI, or FCoE traffic is supported on vNICs.
– UFPs: Ethernet or FCoE traffic is supported on UFPs.
– 802.1Qbg Edge Virtual Bridging (EVB) is an emerging IEEE standard for allowing
networks to become virtual machine (VM)-aware:
• Virtual Ethernet Bridging (VEB) and Virtual Ethernet Port Aggregator (VEPA) are
mechanisms for switching between VMs on the same hypervisor.
• Edge Control Protocol (ECP) is a transport protocol that operates between two
peers over an IEEE 802 LAN providing reliable and in-order delivery of upper layer
protocol data units.
• Virtual Station Interface (VSI) Discovery and Configuration Protocol (VDP) allows
centralized configuration of network policies that persists with the VM, independent
of its location.
• EVB Type-Length-Value (TLV) is used to discover and configure VEPA, ECP, and
VDP.
– VMready.
򐂰 Converged Enhanced Ethernet
– Priority-Based Flow Control (PFC) (IEEE 802.1Qbb) extends 802.3x standard flow
control to allow the switch to pause traffic that is based on the 802.1p priority value in
each packet’s VLAN tag.
– Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz) provides a method for
allocating link bandwidth that is based on the 802.1p priority value in each packet’s
VLAN tag.
– Data center Bridging Capability Exchange Protocol (DCBX) (IEEE 802.1AB) allows
neighboring network devices to exchange information about their capabilities.
򐂰 Fibre Channel over Ethernet (FCoE)
– FC-BB5 FCoE specification compliant.
– Native FC Forwarder switch operations.
– End-to-end FCoE support (initiator to target).
– FCoE Initialization Protocol (FIP) support.
򐂰 Fibre Channel
– Omni Ports support 4/8 Gb FC when FC SFP+ transceivers are installed in these ports.
– Full Fabric mode for end-to-end FCoE or NPV Gateway mode for external FC SAN
attachments (support for IBM B-type, Brocade, and Cisco MDS external SANs).
– Fabric services in Full Fabric mode:
• Name Server
• Registered State Change Notification (RSCN)
• Login services
• Zoning
򐂰 Stacking
– Hybrid stacking support (from two to six EN4093/EN4093R switches with two CN4093
switches)
– FCoE support
– vNIC support
– 802.1Qbg support
򐂰 Manageability
– Simple Network Management Protocol (SNMP V1, V2, and V3).
– HTTP browser GUI.
– Telnet interface for CLI.
– SSH.
– Secure FTP (sFTP).
– Service Location Protocol (SLP).
– Serial interface for CLI.
– Scriptable CLI.
– Firmware image update (TFTP and FTP).
– Network Time Protocol (NTP) for switch clock synchronization.
򐂰 Monitoring
– Switch LEDs for external port status and switch module status indication.
– Remote Monitoring (RMON) agent to collect statistics and proactively monitor switch
performance.
– Port mirroring for analyzing network traffic that passes through a switch.
– Change tracking and remote logging with syslog feature.
– Support for sFLOW agent for monitoring traffic in data networks (separate sFLOW
analyzer is required elsewhere).
– POST diagnostic tests.
For more information, see the IBM Redbooks Product Guide IBM Flex System Fabric
CN4093 10Gb Converged Scalable Switch, TIPS0910, found at:
http://www.redbooks.ibm.com/abstracts/tips0910.html?Open
3.2.3 IBM Flex System Fabric SI4093 System Interconnect Module
The IBM Flex System Fabric SI4093 System Interconnect Module enables simplified
integration of IBM Flex System into your existing networking infrastructure.
The SI4093 System Interconnect Module requires no management for most data center
environments. This eliminates the need to configure each networking device or individual
ports, which reduces the number of management points. It provides a low latency, loop-free
interface that does not rely upon spanning tree protocols, which removes one of the greatest
deployment and management complexities of a traditional switch.
The SI4093 System Interconnect Module offers administrators a simplified deployment
experience while maintaining the performance of intra-chassis connectivity.
The SI4093 System Interconnect Module is shown in Figure 3-6 on page 42.
Figure 3-6 IBM Flex System Fabric SI4093 System Interconnect Module
The SI4093 System Interconnect Module is initially licensed with 14 internal 10 Gb ports and 10 external 10 Gb uplink ports enabled. More ports can be enabled: 14 more internal ports and two external 40 Gb uplink ports with Upgrade 1, and 14 more internal ports and four external SFP+ 10 Gb ports with Upgrade 2. Upgrade 1 must be applied before Upgrade 2 can be applied.
Table 3-5 shows the part numbers for ordering the switches and the upgrades.
Table 3-5 SI4093 ordering information
Description Part number Feature code (x-config / e-config)
Interconnect module:
IBM Flex System Fabric SI4093 System Interconnect Module 95Y3313 A45T / ESWA
Features on Demand upgrades:
SI4093 System Interconnect Module (Upgrade 1) 95Y3318 A45U / ESW8
SI4093 System Interconnect Module (Upgrade 2) 95Y3320 A45V / ESW9
The following base switch and upgrades are available:
򐂰 95Y3313 is the part number for the physical device, and it comes with 14 internal 10 Gb
ports enabled (one to each node bay) and 10 external 10 Gb ports enabled for
connectivity to an upstream network, plus external servers and storage. All external 10 Gb
ports are SFP+ based connections.
򐂰 95Y3318 (Upgrade 1) can be applied on the base interconnect module to make full use of
4-port adapters that are installed in each compute node. This upgrade enables 14 more
internal ports, for a total of 28 ports. The upgrade also enables two 40 Gb uplinks with
QSFP+ connectors. These QSFP+ ports can also be converted to four 10 Gb SFP+ DAC
connections by using the appropriate fan-out cable. This upgrade requires the base
interconnect module.
򐂰 95Y3320 (Upgrade 2) can be applied on top of Upgrade 1 when you want more uplink
bandwidth on the interconnect module or if you want more internal bandwidth to the
compute nodes with the adapters capable of supporting six ports (like CN4058). The
upgrade enables the remaining four external 10 Gb uplinks with SFP+ connectors, plus 14
internal 10 Gb ports, for a total of 42 ports (three to each compute node).
Table 3-6 lists the supported port combinations on the interconnect module and the required
upgrades.
Table 3-6 Supported port combinations
Supported port combinations (quantity required: Base switch 95Y3313 / Upgrade 1 95Y3318 / Upgrade 2 95Y3320)
14x internal 10 GbE, 10x external 10 GbE: 1 / 0 / 0
28x internal 10 GbE, 10x external 10 GbE, 2x external 40 GbE: 1 / 1 / 0
42x internal 10 GbE (a), 14x external 10 GbE, 2x external 40 GbE: 1 / 1 / 1
a. This configuration uses six of the eight ports on the CN4058 adapter that are available for IBM Power Systems™ compute nodes.
Important: SFP and SFP+ (small form-factor pluggable plus) transceivers or cables are not included with the switch. They must be ordered separately. See Table 3-6 on page 43.
The SI4093 System Interconnect Module has the following features and specifications:
򐂰 Modes of operations:
– Transparent (or VLAN-agnostic) mode
In VLAN-agnostic mode (default configuration), the SI4093 transparently forwards
VLAN tagged frames without filtering on the customer VLAN tag, which provides an
end host view to the upstream network. The interconnect module provides traffic
consolidation in the chassis to minimize TOR port usage, and it enables
server-to-server communication for optimum performance (for example, vMotion). It
can be connected to the FCoE transit switch or FCoE gateway (FC Forwarder) device.
– Local Domain (or VLAN-aware) mode
In VLAN-aware mode (optional configuration), the SI4093 provides more security for
multi-tenant environments by extending client VLAN traffic isolation to the interconnect
module and its uplinks. VLAN-based access control lists (ACLs) can be configured on
the SI4093. When FCoE is used, the SI4093 operates as an FCoE transit switch, and it
should be connected to the FCF device.
򐂰 Internal ports:
– A total of 42 internal full-duplex 10 Gigabit ports; 14 ports are enabled by default.
Optional FoD licenses are required to activate the remaining 28 ports.
– Two internal full-duplex 1 GbE ports that are connected to the chassis management
module.
򐂰 External ports:
– A total of 14 ports for 1 Gb or 10 Gb Ethernet SFP+ transceivers (support for
1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10GBASE-SR, or 10GBASE-LR) or
SFP+ copper direct-attach cables (DAC). A total of 10 ports are enabled by default. An
optional FoD license is required to activate the remaining four ports. SFP+ modules
and DACs are not included and must be purchased separately.
– Two ports for 40 Gb Ethernet QSFP+ transceivers or QSFP+ DACs. (Ports are disabled
by default. An optional FoD license is required to activate them.) QSFP+ modules and
DACs are not included and must be purchased separately.
– One RS-232 serial port (mini-USB connector) that provides another means to
configure the switch module.
򐂰 Scalability and performance:
– 40 Gb Ethernet ports for extreme uplink bandwidth and performance.
– External 10 Gb Ethernet ports to use 10 Gb upstream infrastructure.
– Non-blocking architecture with wire-speed forwarding of traffic and aggregated
throughput of 1.28 Tbps.
– Media access control (MAC) address learning: automatic update, support for up to
128,000 MAC addresses.
– Static and LACP (IEEE 802.3ad) link aggregation, up to 220 Gb of total uplink
bandwidth per interconnect module.
– Support for jumbo frames (up to 9,216 bytes).
򐂰 Availability and redundancy:
– Layer 2 Trunk Failover to support active and standby configurations of network adapter
teaming on compute nodes.
– Built in link redundancy with loop prevention without a need for Spanning Tree protocol.
򐂰 VLAN support:
– Up to 32 VLANs supported per interconnect module SPAR partition, with VLAN
numbers 1 - 4095. (4095 is used for management module’s connection only.)
– 802.1Q VLAN tagging support on all ports.
򐂰 Security:
– VLAN-based access control lists (ACLs) (VLAN-aware mode).
– Multiple user IDs and passwords.
– User access control.
– Radius, TACACS+, and LDAP authentication and authorization.
򐂰 QoS
Support for IEEE 802.1p traffic classification and processing.
򐂰 Virtualization:
– Switch Independent Virtual NIC (vNIC2): Ethernet, iSCSI, or FCoE traffic is supported
on vNICs.
– SPAR:
• SPAR forms separate virtual switching contexts by segmenting the data plane of the
switch. Data plane traffic is not shared between SPARs on the same switch.
• SPAR operates as a Layer 2 broadcast network. Hosts on the same VLAN attached
to a SPAR can communicate with each other and with the upstream switch. Hosts
on the same VLAN but attached to different SPARs communicate through the
upstream switch.
• SPAR is implemented as a dedicated VLAN with a set of internal server ports and a
single uplink port or link aggregation (LAG). Multiple uplink ports or LAGs are not
allowed in SPAR. A port can be a member of only one SPAR.
򐂰 Converged Enhanced Ethernet:
– Priority-Based Flow Control (PFC) (IEEE 802.1Qbb) extends 802.3x standard flow
control to allow the switch to pause traffic based on the 802.1p priority value in each
packet’s VLAN tag.
– Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz) provides a method for
allocating link bandwidth based on the 802.1p priority value in each packet’s VLAN tag.
– Data Center Bridging Capability Exchange Protocol (DCBX) (IEEE 802.1AB) allows
neighboring network devices to exchange information about their capabilities.
򐂰 FCoE:
– FC-BB5 FCoE specification compliant
– FCoE transit switch operations
– FCoE Initialization Protocol (FIP) support
򐂰 Manageability:
– IPv4 and IPv6 host management.
– Simple Network Management Protocol (SNMP V1, V2, and V3).
– Industry standard command-line interface (IS-CLI) through Telnet, SSH, and serial
port.
– Secure FTP (sFTP).
– Service Location Protocol (SLP).
– Firmware image update (TFTP and FTP/sFTP).
– Network Time Protocol (NTP) for clock synchronization.
– IBM System Networking Switch Center (SNSC) support.
򐂰 Monitoring:
– Switch LEDs for external port status and switch module status indication.
– Change tracking and remote logging with syslog feature.
– POST diagnostic tests.
For more information, see IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable
Switches, TIPS0864, which is available at this website:
http://www.redbooks.ibm.com/abstracts/tips0864.html?Open
3.2.4 I/O modules and cables
The interface modules and cables that the Ethernet I/O modules support are shown in Table 3-7.
Table 3-7 Modules and cables supported in Ethernet I/O modules
Part number Description EN4093 EN4093R CN4093 SI4093
44W4408 10GbE 850 nm Fiber SFP+ Transceiver (SR) Yes Yes Yes Yes
46C3447 IBM SFP+ SR Transceiver Yes Yes Yes Yes
90Y9412 IBM SFP+ LR Transceiver Yes Yes Yes Yes
81Y1622 IBM SFP SX Transceiver Yes Yes Yes Yes
81Y1618 IBM SFP RJ45 Transceiver Yes Yes Yes Yes
90Y9424 IBM SFP LX Transceiver Yes Yes Yes Yes
49Y7884 IBM QSFP+ SR Transceiver Yes Yes Yes Yes
90Y9427 1m IBM Passive DAC SFP+ Cable Yes Yes Yes Yes
00AY764 1.5m IBM Passive DAC SFP+ Cable No Yes Yes Yes
00AY765 2m IBM Passive DAC SFP+ Cable No Yes Yes Yes
90Y9430 3m IBM Passive DAC SFP+ Cable Yes Yes Yes Yes
90Y9433 5m IBM Passive DAC SFP+ Cable Yes Yes Yes Yes
00D6151 7m IBM Passive DAC SFP+ Cable No Yes Yes Yes
49Y7886 1m IBM QSFP+ DAC Break Out Cbl. Yes Yes Yes Yes
49Y7887 3m IBM QSFP+ DAC Break Out Cbl. Yes Yes Yes Yes
49Y7888 5m IBM QSFP+ DAC Break Out Cbl. Yes Yes Yes Yes
90Y3519 10m IBM QSFP+ MTP Optical cable Yes Yes Yes Yes
90Y3521 30m IBM QSFP+ MTP Optical cable Yes Yes Yes Yes
49Y7890 1m IBM QSFP+-to-QSFP+ cable Yes Yes Yes Yes
49Y7891 3m IBM QSFP+-to-QSFP+ cable Yes Yes Yes Yes
00D5810 5m IBM QSFP+ to QSFP+ Cable No Yes Yes Yes
00D5813 7m IBM QSFP+ to QSFP+ Cable No Yes Yes Yes
All Ethernet I/O modules are restricted to the use of the SFP/SFP+ modules that are listed in Table 3-7 on page 46.
3.3 IBM Flex System Ethernet adapters
The IBM Flex System portfolio contains a number of Ethernet I/O adapters. The cards are a
combination of 1 Gb, 10 Gb, and 40 Gb ports and advanced function support that includes
converged networks and virtual NICs.
The following Ethernet I/O adapters are described:
򐂰 3.3.1, “Embedded 10Gb Virtual Fabric Adapter”
򐂰 3.3.2, “IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters” on page 48
򐂰 3.3.3, “IBM Flex System CN4022 2-port 10Gb Converged Adapter” on page 50
򐂰 3.3.4, “IBM Flex System x222 Compute Node LOM” on page 52
3.3.1 Embedded 10Gb Virtual Fabric Adapter
Some models of the x240 (those with a model of the form 8737-x4x) include an Embedded 10Gb Virtual Fabric Adapter (VFA, also known as LAN on Motherboard or LOM) built into the system board. Table 2 lists the x240 models that include the Embedded 10Gb Virtual Fabric Adapter. Each x240 model that includes the embedded 10 Gb VFA also has the Compute Node Fabric Connector installed in I/O connector 1 (and physically screwed onto the system board) to provide connectivity to the Enterprise Chassis midplane. Figure 3 shows the location of the Fabric Connector.
The Fabric Connector enables port 1 on the embedded 10Gb VFA to be routed to I/O module
bay 1 and port 2 to be routed to I/O module bay 2. The Fabric Connector can be unscrewed
and removed, if required, to allow the installation of an I/O adapter on I/O connector 1.
The Embedded 10Gb VFA is based on the Emulex BladeEngine 3R (BE3R), which is a
single-chip, dual-port 10 Gigabit Ethernet (10GbE) Ethernet Controller.
These are some of the features of the Embedded 10Gb VFA:
򐂰 PCI-Express Gen2 x8 host bus interface
򐂰 Supports connection to 10 Gb and 1 Gb Flex System Ethernet switches
򐂰 Supports multiple virtual NIC (vNIC) functions
򐂰 TCP/IP Offload Engine (TOE enabled)
򐂰 SR-IOV capable
򐂰 RDMA over TCP/IP capable
򐂰 iSCSI and FCoE upgrade offering via FoD
The following table lists the ordering information for the IBM Virtual Fabric Advanced Software
Upgrade (LOM), which enables the iSCSI and FCoE support on the Embedded 10Gb Virtual
Fabric Adapter.
Table 3-8 Feature on Demand upgrade for FCoE and iSCSI support
Part number x-config feature code e-config feature code 7863-10X feature code Description
90Y9310 A2TD None IBM Virtual Fabric Advanced Software Upgrade (LOM)
3.3.2 IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters
The IBM Flex System CN4054 and CN4054R 10Gb Virtual Fabric Adapters are 4-port 10 Gb converged network adapters. They can scale to up to 16 virtual ports and support multiple protocols, such as Ethernet, iSCSI, and FCoE.
Figure 3-7 shows the IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters.
Figure 3-7 The CN4054/CN4054R 10Gb Virtual Fabric Adapter for IBM Flex System
Table 3-9 lists the ordering part numbers and feature codes.
Table 3-9 IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapter ordering information
Part number x-config feature code e-config feature code 7863-10X feature code Description
90Y3554 A1R1 None 1759 CN4054 10Gb Virtual Fabric Adapter
90Y3558 A1R0 None 1760 CN4054 Virtual Fabric Adapter Upgrade
00Y3306 A4K2 None A4K2 CN4054R 10Gb Virtual Fabric Adapter
The IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapter has the following
features and specifications:
򐂰 Two ASICs per adapter.
– CN4054: Dual-ASIC Emulex BladeEngine 3 (BE3) controller.
– CN4054R: Dual-ASIC Emulex BladeEngine 3R (BE3R) controller.
򐂰 Operates as a 4-port 1/10 Gb Ethernet adapter, or supports up to 16 Virtual Network
Interface Cards (vNICs).
򐂰 In virtual NIC (vNIC) mode, it supports:
– Virtual port bandwidth allocation in 100 Mbps increments.
– Up to 16 virtual ports per adapter (four per port).
– With the CN4054/CN4054R Virtual Fabric Adapter Upgrade, 90Y3558, four of the 16
vNICs (one per port) support iSCSI or FCoE.
򐂰 Support for two vNIC modes: IBM Virtual Fabric Mode and Switch Independent Mode.
򐂰 Wake On LAN support.
򐂰 With the CN4054/CN4054R Virtual Fabric Adapter Upgrade, 90Y3558, the adapter adds
FCoE and iSCSI hardware initiator support. iSCSI support is implemented as a full offload
and presents an iSCSI adapter to the operating system.
򐂰 TCP offload Engine (TOE) support with Windows Server 2003, 2008, and 2008 R2 (TCP
Chimney) and Linux.
򐂰 The connection and its state are passed to the TCP offload engine.
򐂰 Data transmit and receive is handled by the adapter.
򐂰 Supported by iSCSI.
򐂰 Connection to either 1 Gb or 10 Gb data center infrastructure (1 Gb and 10 Gb
auto-negotiation).
򐂰 PCI Express 3.0 x8 host interface.
򐂰 Full-duplex capability.
򐂰 Bus-mastering support.
򐂰 DMA support.
򐂰 PXE support.
򐂰 IPv4/IPv6 TCP, UDP checksum offload:
– Large send offload
– Large receive offload
– RSS
– IPv4 TCP Chimney offload
– TCP Segmentation offload
򐂰 VLAN insertion and extraction.
򐂰 Jumbo frames up to 9000 bytes.
򐂰 Load balancing and failover support, including AFT, SFT, ALB, teaming support, and IEEE
802.3ad.
򐂰 Enhanced Ethernet (draft):
– Enhanced Transmission Selection (ETS) (P802.1Qaz).
– Priority-based Flow Control (PFC) (P802.1Qbb).
– Data Center Bridging Capabilities eXchange Protocol, CIN-DCBX, and CEE-DCBX
(P802.1Qaz).
򐂰 Supports Serial over LAN (SoL).
For more information, see IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapter
and EN4054 4-port 10Gb Ethernet Adapter, TIPS0868, which can be found at this website:
http://www.redbooks.ibm.com/abstracts/tips0868.html
3.3.3 IBM Flex System CN4022 2-port 10Gb Converged Adapter
The IBM Flex System CN4022 2-port 10Gb Converged Adapter is a dual-port 10 Gigabit Ethernet network adapter that supports Ethernet, Fibre Channel over Ethernet (FCoE), and Internet Small Computer System Interface (iSCSI) protocols out of the box. Clients now have a choice of multiple adapter vendors without compromising features. This adapter also supports virtual network interface controller (vNIC) capability, which helps clients reduce cost and complexity. The CN4022 adapter is based on the Broadcom 57840 controller and offers a PCIe 2.0 x8 host interface.
The CN4022 2-port 10Gb Converged Adapter is shown in Figure 3-8.
Figure 3-8 IBM Flex System CN4022 2-port 10Gb Converged Adapter
This CN4022 adapter is based on the industry-standard PCIe architecture and is ideal for clients that use 10 GbE in their network infrastructure and that are looking for an entry price point for FCoE or iSCSI capabilities. The adapter ships standard with support for FCoE and iSCSI and with vNIC features that allow each physical port of the adapter to be virtualized into four virtual NICs (vNICs).
Table 3-10 lists the ordering part numbers and feature codes.
Table 3-10 IBM Flex System CN4022 2-port 10 Gb Converged Adapter ordering information
Part number x-config feature code e-config feature code Description
88Y5920 A4K3 A4K3 IBM Flex System CN4022 2-port 10Gb Converged Adapter
The IBM Flex System CN4022 2-port 10Gb Converged Adapter has the following features and specifications:
򐂰 One Broadcom BCM57840 ASIC
򐂰 Connection to 10 Gb data center infrastructure
򐂰 PCI Express 2.0 x8 host interface
򐂰 Full line-rate performance
򐂰 Supports 10 Gb Ethernet, FCoE, and iSCSI
򐂰 IBM Flex System Manager support (Tier 2 support only, no alerting)
򐂰 Ethernet features
– Ethernet frame: 1500 byte or 9600 byte (jumbo frame)
– Virtual LAN (VLAN) support with VLAN tagging
– vNIC support:
• Supports Switch Independent Mode (vNIC2 mode)
• UFP mode support planned in 2014
• Four vNIC/NPAR Ethernet devices per 10Gb physical port
• Support either for two iSCSI ports or for one iSCSI port and one FCoE port, per
10 Gb physical port
򐂰 Stateless offload
– IP, TCP, and UDP checksum offloads
– IPv4 and IPv6 offloads
– Large send offload (LSO)
򐂰 Performance optimization
– Receive Side Scaling (RSS)
– Transmit Side Scaling (TSS)
– MSI and MSI-X support
– RX/TX multiqueue
– TCP Offload Engine (TOE) support
򐂰 SR-IOV-ready
򐂰 Wake on LAN
򐂰 Preboot eXecution Environment (PXE) support
򐂰 Network teaming, failover, and load balancing
– Smart Load Balancing (SLB)
– Link Aggregation Control Protocol (LACP) and generic trunking
– Management using Broadcom Advanced Control Suite management application
򐂰 Compliance
– IEEE 802.3ae (10 Gb Ethernet)
– IEEE 802.3ad (Link aggregation)
– IEEE 802.3ap Clause73 1G/10G Autonegotiation for 10GBase-KR channels
– IEEE 802.1q (VLAN)
– IEEE 802.1p (Priority Encoding)
– IEEE 802.3x (Flow Control)
– IEEE 802.1Qau (Congestion Notification)
– IPv4 (RFC 791)
– IPv6 (RFC 2460)
– IEEE 1588/802.1as (Precision Time Protocol (PTP))
– IEEE 802.1Qbb Priority Flow Control (PFC)
– IEEE 802.1Qaz Enhanced Transmission Selection (ETS)
򐂰 iSCSI features
– iSCSI initiator hardware offload and boot support
– Protocols
• RFC 3347 (iSCSI Requirements and Design Considerations)
• Challenge Handshake Authentication Protocol (CHAP)
• iSNS
• Service Location Protocol (SLP)
򐂰 FCoE features
– 3,500 N_Port ID Virtualization (NPIV) interfaces (total for adapter)
– Support for FIP and FCoE Ethertypes
– Fabric Provided Media Access Control (MAC) Addressing (FPMA) support
– 2,048 concurrent port logins (RPIs) per port
– 1,024 active exchanges (XRIs) per port
For more information, see IBM Flex System CN4022 2-port 10Gb Converged Adapter, TIPS1087,
which can be found at this website:
http://www.redbooks.ibm.com/abstracts/tips1087.html?Open
3.3.4 IBM Flex System x222 Compute Node LOM
The IBM Flex System x222 Compute Node is a high-density dual-server offering that is
designed for virtualization, dense cloud deployments, and hosted clients. The x222 has two
independent servers in one mechanical package, which means that the x222 has a
double-density design that allows up to 28 servers to be housed in a single 10U Flex System
Enterprise Chassis.
Notes:
򐂰 FCoE is not supported with Red Hat Enterprise Linux KVM
򐂰 FCoE support for VLAN discovery only with the port PVID = 1
򐂰 FCoE SAN boot is not supported
The following figure shows the IBM Flex System x222 Compute Node.
Figure 3-9 IBM Flex System x222 Compute Node
More information about the specifics of the x222 can be found at the following Redbooks publication link:
http://www.redbooks.ibm.com/abstracts/tips1036.html
Embedded 10Gb Virtual Fabric Adapter on the x222
Each server in the x222 includes an Embedded 10Gb Virtual Fabric Adapter (VFA, also
known as LAN on Motherboard or LOM) built in to the system board. The x222 has one Fabric
Connector (which is physically on the lower server) and the Ethernet connections from both
Embedded 10 Gb VFAs are routed through it. Figure 5 shows the physical location of the
Fabric Connector.
Figure 3-10 below shows the internal connections between the Embedded 10Gb VFAs and
the switches in chassis bays 1 and 2.
Figure 3-10 Embedded 10 Gb VFA connectivity to the switches (switch port upgrades apply to EN4093, EN4093R, CN4093, and SI4093 switches)
In Figure 3-10 on page 53:
򐂰 The blue lines show that the two Ethernet ports in the upper server route to switches in bay 1 and bay 2. These connections require that the switch have Upgrade 1 enabled to activate the second bank of internal ports, ports 15-28 (alias ports INTB1-INTB14).
򐂰 The red lines show that the two Ethernet ports in the lower server also route to switches in bay 1 and bay 2. These connections both go to the base ports of the switch, ports 1-14 (alias ports INTA1-INTA14).
Switch upgrade 1 required: For EN4093, EN4093R, CN4093 and SI4093 switches,
Upgrade 1 must be enabled in the two switches. Without this feature upgrade, the upper
server will not have any Ethernet connectivity.
The Embedded 10Gb VFA is based on the Emulex BladeEngine 3 (BE3), which is a
single-chip, dual-port 10 Gigabit Ethernet (10GbE) Ethernet Controller. These are some of
the features of the Embedded 10Gb VFA:
򐂰 PCI-Express Gen2 x8 host bus interface
򐂰 Supports multiple virtual NIC (vNIC) functions
򐂰 TCP/IP Offload Engine (TOE enabled)
򐂰 SR-IOV capable
򐂰 RDMA over TCP/IP capable
򐂰 iSCSI and FCoE upgrade offering through FoD
Table 3-11 on page 54 lists the ordering information for the IBM Flex System Embedded
10Gb Virtual Fabric Upgrade, which enables the iSCSI and FCoE support on the Embedded
10Gb Virtual Fabric Adapter.
Table 3-11 Feature on Demand upgrade for FCoE and iSCSI support
Part Number Feature Code Description Maximum supported
90Y9310 A2TD IBM Virtual Fabric Advanced Software Upgrade (LOM) 1 per server (2 per x222 Compute Node)
TIP: Two licenses required: To enable the FCoE/iSCSI upgrade for both servers in the x222 Compute Node, two licenses are required.
Supported switches
The x222 supports only Ethernet scalable switches with at least the first internal port upgrade
enabled.
Table 3-12 Supported Switches
Adapter: Embedded 10 GbE Virtual Fabric Adapter
Switches supported (minimum required switch upgrade):
򐂰 EN4093R 10Gb Scalable Switch (95Y3309): Upgrade 1 (49Y4798)
򐂰 CN4093 10Gb Converged Scalable Switch (00D5823): Upgrade 1 (00D5845) or Upgrade 2 (00D5847)
򐂰 SI4093 System Interconnect Module (95Y3313): Upgrade 1 (95Y3318)
Chapter 4. NIC virtualization considerations
on the switch side
This paper is primarily focused on the various options to virtualize NIC technology. This
section introduces the two primary types of NIC Virtualization (vNIC and UFP) available on
the Flex System switches, as well as introduces and discusses considerations of the various
sub-elements of these virtual NIC technologies.
At the core of all virtual NICs discussed in this section is the ability to take a single physical 10 GbE NIC and carve it into as many as four virtual NICs for use by the attaching host.
This chapter focuses on various deployment considerations when looking at making the right
choice in NIC virtualization within a PureFlex System environment.
The following topics are covered:
򐂰 4.1, “Virtual Fabric vNIC solution capabilities” on page 56
򐂰 4.2, “Unified Fabric Port feature” on page 64
򐂰 4.3, “Compute node NIC to I/O module connectivity mapping” on page 70
4.1 Virtual Fabric vNIC solution capabilities
Virtual Network Interface Controller (called vNIC in this paper) was the original way IBM
switches provided the ability to divide a physical NIC into smaller logical NICs, so that the OS
has more ways to logically connect to the infrastructure. The vNIC feature is supported only
on 10 Gb ports that face the compute nodes within the chassis, and only on certain Ethernet
I/O modules. These currently include the EN4093R 10Gb Scalable Switch and CN4093 10Gb Converged Scalable Switch. vNIC also requires an adapter in the compute node that supports this functionality.
As of this writing, there are two primary forms of vNIC available: Virtual Fabric mode (or
Switch dependent mode) and Switch independent mode.
The Virtual Fabric mode of vNIC also is subdivided into two sub-modes: Dedicated uplink
vNIC mode and Shared uplink vNIC mode.
All vNIC modes share the following common elements:
򐂰 They are supported only on 10 Gb connections.
򐂰 Each vNIC mode allows a NIC to be divided into up to four vNICs per physical NIC (can be
less than four, but not more).
򐂰 They all require an adapter that has support for one or more of the vNIC modes.
򐂰 When vNICs are created, the default bandwidth is 2.5 Gb for each vNIC, but they can be
configured to be anywhere from 100 Mb up to the full bandwidth of the NIC.
򐂰 The bandwidth of all configured vNICs on a physical NIC cannot exceed 10 Gb.
򐂰 All modes support FCoE.
A summary of some of the differences and similarities of these modes is shown in Table 4-1.
These differences and similarities are covered in more detail next.
Table 4-1 Attributes of vNIC modes
Tip: Other documentation occasionally refers to these modes as vNIC 1 (Virtual Fabric mode vNIC) and vNIC 2 (Switch Independent mode vNIC).
Capability: IBM Virtual Fabric mode (Dedicated uplink) / IBM Virtual Fabric mode (Shared uplink) / Switch independent mode
Requires support in the I/O module Yes Yes No
Requires support in the NIC/CNA Yes Yes Yes
Supports adapter transmit rate control Yes Yes Yes
Support I/O module transmit rate control Yes Yes No
Supports changing rate without restart of node Yes Yes No
Requires a dedicated uplink per vNIC group Yes No No
Support for node OS-based tagging Yes No Yes
Support for failover per vNIC group Yes Yes N/A
Support for more than one uplink path per vNIC No No Yes
4.1.1 Virtual Fabric mode vNIC
Virtual Fabric mode vNIC depends on the switch in the I/O module bay to participate in the
vNIC process. Specifically, the IBM Flex System Fabric EN4093R 10Gb Scalable Switch and
the CN4093 10Gb Converged Scalable Switch support this mode. It also requires an adapter
on the Compute node that supports the vNIC Virtual Fabric mode feature.
In Virtual Fabric mode vNIC, configuration is performed on the switch and the configuration
information is communicated between the switch and the adapter so that both sides agree on
and enforce bandwidth controls. The bandwidth allocation can be changed at any time without reloading the OS or the I/O module.
As noted, there are two types of Virtual Fabric vNIC modes: Dedicated uplink mode and
Shared uplink mode. Both modes incorporate the concept of a vNIC group on the switch that
is used to associate vNICs and physical ports into virtual switches within the chassis. How
these vNIC groups are used is the primary difference between dedicated uplink mode and
shared uplink mode.
Virtual Fabric vNIC modes share the following common attributes:
򐂰 They are based on the concept of a vNIC group that must be created on the I/O module.
򐂰 Similar vNICs are bundled together into common vNIC groups.
򐂰 Each vNIC group is treated as a virtual switch within the I/O module. Packets in one vNIC
group can get only to a different vNIC group by going to an external switch/router.
򐂰 For the purposes of Spanning tree and packet flow, each vNIC group is treated as a
unique switch by upstream connecting switches/routers.
򐂰 Both modes support the addition of physical NICs (pNIC) (the NICs from nodes that are
not using vNIC) to vNIC groups for internal communication to other pNICs and vNICs in
that vNIC group, and share any uplink that is associated with that vNIC group.
Dedicated uplink mode
Dedicated uplink mode is the default mode when vNIC is enabled on the I/O module. In
dedicated uplink mode, each vNIC group must have its own dedicated physical or logical
(aggregation) uplink. In this mode, no more than one physical or logical uplink to a vNIC
group can be assigned and it assumed that high availability is achieved by some combination
of aggregation on the uplink or NIC teaming on the server.
In dedicated uplink mode, vNIC groups are VLAN-independent to the nodes and the rest of
the network, which means that you do not need to create VLANs for each VLAN that is used
by the nodes. The vNIC group takes each packet (tagged or untagged) and moves it through
the switch. This mode is accomplished by the use of a form of Q-in-Q tagging. Each vNIC
group is assigned some VLAN that is unique to each vNIC group. Any packet (tagged or
untagged) that comes in on a downstream or upstream port in that vNIC group has a tag
placed on it equal to the vNIC group VLAN. As that packet leaves the vNIC into the node or
out an uplink, that tag is removed and the original tag (or no tag, depending on the original
packet) is revealed.
Example Configuration
Example 4-1 on page 58 shows an example Virtual Fabric vNIC Dedicated Uplink mode configuration. The example enables VLAN 4091 as the outer Q-in-Q VLAN ID for vNIC index 1 on port INTA1. By default, the bandwidth is set to 25% on each of the four indexes, totaling 100%. As noted above, these values can be adjusted as needed, but the total across all four indexes cannot exceed 100%.
The previous paragraph discussed the internal (INT) vNIC port settings, but how does this relate to the external (EXT) port for network access? Within the vnic vnicgroup 1 configuration, one of three options can be chosen to provide network access (a sketch that uses the key option follows Example 4-1):
򐂰 port: a single physical port
򐂰 trunk: a static (trunk) port channel
򐂰 key: an LACP (802.3ad) port channel
The failover command, also located within the vnic vnicgroup section, allows for the monitoring of an EXT port or port channel. In the event of a link failure on the EXT port or port channel, the I/O module disables all related members within that vnicgroup.
Example 4-1 Virtual Fabric vNIC Dedicated Uplink mode example configuration
vnic enable
vnic port INTA1 index 1
bandwidth 25
enable
exit
!
vnic vnicgroup 1
vlan 4091
enable
failover
member INTA1.1
port EXT1
exit
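The following is a minimal sketch that builds on Example 4-1 and illustrates the key uplink option and an uneven bandwidth split. It assumes that an LACP aggregation with admin key 100 is already configured on the wanted EXT ports; the key value, port INTA1, and the 50/30/10/10 split are illustrative assumptions only, not a verified configuration.
vnic enable
vnic port INTA1 index 1
bandwidth 50
enable
exit
!
vnic port INTA1 index 2
bandwidth 30
enable
exit
!
vnic port INTA1 index 3
bandwidth 10
enable
exit
!
vnic port INTA1 index 4
bandwidth 10
enable
exit
!
vnic vnicgroup 1
vlan 4091
enable
failover
member INTA1.1
key 100
exit
In dedicated uplink mode, each of the remaining indexes (INTA1.2 through INTA1.4) would be placed in its own vNIC group with its own dedicated uplink, because an uplink cannot be shared between groups in this mode.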
In Figure 4-1 on page 59, Virtual Fabric vNIC Dedicated Uplink Mode uses vNIC Groups to
partition the vSwitch within the ESXi Host. Note that this is not specific to VMware and is
supported on all Intel Platform Operating Systems with the Emulex Virtual Fabric Adapter.
In this example, vNIC Groups 1, 2, 3, and 4 use separate uplinks because normal VLAN traffic is transparently switched within each group by using Q-in-Q.
Because all traffic is transparent and is contained within its own vNIC group and I/O module, it is possible to run the same VLAN or VLANs within multiple vNIC groups and still maintain VLAN isolation. For instance, in Figure 4-1 on page 59, VLAN 20 is used within two separate ESXi vSwitches. However, because each vSwitch has its own physical uplink and the I/O module is running Virtual Fabric vNIC Dedicated Uplink mode, VLAN 20 in the two vSwitches remains isolated.
Virtual Fabric vNIC Dedicated Uplink mode is shown in Figure 4-1 below.
Figure 4-1 IBM Virtual Fabric vNIC Dedicated Uplink Mode
Shared Uplink mode
Shared uplink mode is a global option that can be enabled on an I/O Module that has the
vNIC feature enabled. As the name suggests, it allows an uplink to be shared by more than
one group, which reduces the possible number of uplinks that are required.
It also changes the way that the vNIC groups process packets for tagging. In Shared Uplink
mode, it is expected that the servers no longer use tags. Instead, the vNIC group VLAN acts
as the tag that is placed on the packet. When a server sends a packet into the vNIC group, it
has a tag placed on it equal to the vNIC group VLAN and then sends it out the uplink tagged
with that VLAN.
Only one VLAN can be assigned to a vNIC group. Because Shared Uplink mode is a global parameter, Dedicated Uplink mode cannot be used on the same I/O module when Shared Uplink mode is enabled. Unified Fabric Port (UFP) does not have the restrictions that the Virtual Fabric Dedicated and Shared Uplink modes have.
Example Configuration
Example 4-2 on page 60 shows an example of Shared Uplink mode.
The following parameters must be set in order for Shared Uplink mode to operate properly. Note that most parameters in this example are identical to the settings in the Dedicated Uplink mode section above, except for the vnic uplink-share command and the VLAN number, which in Shared Uplink mode is the customer VLAN itself.
򐂰 The default VLAN must be set on both the INT and EXT Port or PortChannel participating
in the Shared Uplink vNIC mode configuration.
򐂰 Tagging must be enabled on the EXT port or port channel. All VLANs set within the vnicgroup are tagged to the upstream customer network.
Example 4-2 Virtual Fabric vNIC Shared Uplink mode example configuration
vnic enable
vnic uplink-share
vnic port INTA1 index 1
bandwidth 25
enable
exit
!
vnic vnicgroup 1
vlan 100
enable
failover
member INTA1.1
port EXT1
exit
!
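The sharing aspect can be illustrated by extending Example 4-2 with a second vNIC group that carries another customer VLAN over the same uplink. This is a sketch only; VLAN 200 and the use of INTA1 index 2 are arbitrary choices, and the prerequisite default VLAN and tagging settings listed above still apply.
vnic port INTA1 index 2
bandwidth 25
enable
exit
!
vnic vnicgroup 2
vlan 200
enable
failover
member INTA1.2
port EXT1
exit
Because the I/O module is in shared uplink mode, both vNIC groups tag their traffic (VLAN 100 and VLAN 200) onto the same EXT1 uplink.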
In Figure 4-2 on page 61, Virtual Fabric vNIC Shared Uplink Mode uses vNIC Groups to
partition the vSwitch within the ESXi Host. Note that this is not specific to VMware and is
supported on all Intel Platform Operating Systems with the Emulex Virtual Fabric Adapter.
In this example, vNIC Groups 1, 2, and 3 all share the same uplink port out of the I/O module to communicate with the network. vNIC Group 4, however, uses a separate uplink, giving flexibility and control over physical connectivity into the network.
The biggest drawback to Virtual Fabric vNIC Shared Uplink mode is the inability to apply VLANs via the operating system.
Virtual Fabric vNIC Shared Uplink mode is shown in Figure 4-2 below.
Figure 4-2 IBM Virtual Fabric vNIC Shared Uplink Mode
4.1.2 Switch Independent mode vNIC
Switch Independent mode vNIC is configured only on the node, and the I/O Module is
unaware of this virtualization. The I/O Module acts as a normal switch in all ways (any VLAN
that must be carried through the I/O Module must be created on the I/O Module and allowed
on the wanted ports). This mode is enabled at the compute node directly (via F1 setup at boot
time or via FSM configuration pattern controls), and has similar rules as Virtual Fabric vNIC
mode regarding how you can divide the vNICs. However, any bandwidth settings are limited to how
the node sends traffic, not how the I/O Module sends traffic back to the node (since the I/O
Module is unaware of the vNIC virtualization taking place on the Compute Node). Also, the
bandwidth settings cannot be changed in real time, because they require a reload of the
compute node for any speed change to take effect.
Switch Independent mode requires setting an LPVID value in the Compute Node NIC
configuration, and this is a catch-all VLAN for the vNIC to which it is assigned. Any untagged
packet from the OS sent to the vNIC is sent to the switch with the tag of the LPVID for that
vNIC. Any tagged packet sent from the OS to the vNIC is sent to the switch with the tag set by
the OS (the LPVID is ignored). Owing to this interaction, most users set the LPVID to some
unused VLAN, and then tag all packets in the OS. One exception to this is for a Compute
Node that needs PXE to boot the base OS. In that case, the LPVID for the vNIC that is
providing the PXE service must be set for the wanted PXE VLAN.
Because all packets that are coming into the I/O module from a NIC that is configured for
Switch Independent mode vNIC are always tagged (by the OS or by the LPVID setting if the
OS is not tagging), all VLANs that are allowed on the port on the I/O Module side should be
tagging as well. This means set the PVID/Native VLAN on the switch port to some unused
VLAN, or set it to one that is used and enable PVID tagging to ensure the port sends and
receives PVID and Native VLAN packets as tagged.
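As a rough sketch of the switch-side settings that are described above (the VLAN numbers 100 and 200, the unused PVID 4094, and the port names are assumptions for illustration, not a verified configuration):
! Put the PVID/native VLAN on an otherwise unused VLAN and enable tagging on the internal port
interface port INTA1
tagging
pvid 4094
exit
!
! Create the customer VLANs used by the OS and allow them on the internal and uplink ports
vlan 100
enable
member INTA1
member EXT1
!
vlan 200
enable
member INTA1
member EXT1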
In most OSs, Switch Independent mode vNIC supports as many VLANs as the OS supports.
One exception is with bare metal Windows OS installations, where in Switch Independent
mode, only a limited number of VLANs are supported per vNIC (maximum of 63 VLANs, but
less in some cases, depending on version of Windows and what driver is in use). See the
documentation for your NIC for details about any limitations for Windows and Switch
Independent mode vNIC.
Example Configuration
In Figure 4-3 on page 63, Switch Independent Mode is being utilized to present multiple
vmnic instances to the hypervisor. Each vmnic can be used to connect to its own vSwitch with multiple Port Groups.
In this example each vmnic is configured to support 1 or more Port Groups. Those Port
Groups without a VLAN defined will utilize the LPVID VLAN ID to communicate with the
Network. For instance, vmnic 0 has an untagged Port Group defined that is part of the LPVID
200 vNIC. For that specific Port Group each VM client will end up on the network TAGGED
with VLAN 200. Those Port Groups that do contain a VLAN TAG will utilize its own TAG and
will bypass the LPVID. The same thing goes for the untagged Port Group connected to vmnic
2 except that VM client will utilize the LPVID VLAN 300 to communicate with the Network.
The I/O module, on the other hand, sees these ports as physical 10 Gb ports that use traditional network VLANs and switching technology.
Figure 4-3 IBM Switch Independent vNIC mode
Summary of Virtual Fabric mode vNIC options
In this section, we have described the various modes of vNIC. The mode that is best-suited
for a user depends on the user’s requirements. Virtual Fabric Dedicated Uplink mode offers
the most control, and Shared Uplink mode and Switch Independent mode offer the most
flexibility with uplink connectivity.
4.2 Unified Fabric Port feature
Unified Fabric Port (UFP) is another approach to NIC virtualization. It is similar to vNIC but
with enhanced flexibility and should be considered the direction for future development in the
virtual NIC area for IBM switching solutions. UFP is supported today on the EN4093R 10Gb
Scalable Switch and CN4093 10Gb Converged Scalable Switch and utilizes LLDP TLVs to
communicate between the physical switch port and the physical NIC within the Compute
Node.
UFP and vNIC are mutually exclusive in that you cannot enable UFP and vNIC at the same
time on the same switch. If a comparison were to be made between UFP and vNIC, UFP is
most closely related to vNIC Virtual Fabric mode in that both sides, the switch and the NIC/CNA, share in controlling bandwidth usage, but there are significant differences.
Compared to vNIC, UFP supports the following modes of operation per virtual NIC (vPort):
4.2.1 UFP Access and Trunk modes
򐂰 Access: The vPort only allows the default VLAN, which is similar to a physical port in
access mode.
򐂰 Trunk: The vPort permits host side tagging and supports up to 32 customer-defined
VLANs on each vPort.
Example Configuration
Example 4-3 shows one vPort configured for Access mode and another vPort, within the
same physical port, configured for Trunk mode. vPort 1 is set as an access port with default VLAN 10, allowing only a single untagged VLAN for this vPort. vPort 2 is set with VLAN 20 as its native VLAN, with VLANs 30 and 40 tagged over that same vPort.
Example 4-3 vPort Access and Trunk mode example configuration
ufp port INTA1 vport 1
network mode access
network default-vlan 10
enable
exit
!
ufp port INTA1 vport 2
network mode trunk
network default-vlan 20
enable
exit
!
vlan 30,40
enable
vmember INTA1.2
Note: The following criteria must be met before an I/O module port can be enabled to support UFP:
򐂰 VLAN 1 is the only VLAN that can be assigned to the physical port.
򐂰 Tagging must be enabled. (When UFP is enabled on a physical port, tagging is enabled automatically.)
Note: Before configuring vPort mode, UFP must be enabled globally (ufp enable
command) and on the port (ufp port port identifier enable).
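For the customer VLANs in Example 4-3 to reach the upstream network, the uplink also needs to be a member of those VLANs. The following is a minimal sketch only; EXT1 is an assumption, and the complete uplink configuration, including any aggregation, is beyond this sketch.
vlan 10,20,30,40
enable
member EXT1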
Optionally, Example 4-4 shows how to add the ability to detect uplink failures, which is referred to as failover. Failover is a feature that monitors an uplink port or port channel; upon detection of a failed link or port channel, the I/O module disables any associated members (INT ports) or vmembers (UFP vPorts).
Example 4-4 UFP Failover of a vmembers
failover trigger 1 mmon monitor member EXT1
failover trigger 1 mmon control vmember INTA1.1
failover trigger 1 enable
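If a link failure on EXT1 should bring down all of the vPorts on the physical port rather than just vPort 1, the trigger can control additional vmembers. This sketch assumes that each control vmember command adds another vPort to the same trigger:
failover trigger 1 mmon monitor member EXT1
failover trigger 1 mmon control vmember INTA1.1
failover trigger 1 mmon control vmember INTA1.2
failover trigger 1 mmon control vmember INTA1.3
failover trigger 1 mmon control vmember INTA1.4
failover trigger 1 enable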
Configuration validation and state of a UFP vPort
While it is easy enough to read and understand how to configure an I/O module for UFP, there are several troubleshooting commands that can be used to validate the configuration and the state of a vPort, as seen in Example 4-5 and Figure 4-7 on page 73.
Example 4-5 shows the results of a successfully configured vPort with UFP selected and running on the compute node.
Example 4-5 Individual UFP vPort configuration and status
PF_CN4093a#show ufp information vport port 3 vport 1
-------------------------------------------------------------------
vPort state evbprof mode svid defvlan deftag VLANs
--------- ----- ------- ---- ---- ------- ------ ---------
INTA3.1 up dis trunk 4002 10 dis 10 20 30
The following list explains each of the fields shown in Example 4-5:
򐂰 vPort = the virtual port ID [port.vport]
򐂰 state = the state of the vPort (up, down, or disabled)
򐂰 evbprof = used only when an Edge Virtual Bridging profile is in use (for example, with the 5000V)
򐂰 mode = the vPort mode type (access, trunk, tunnel, fcoe, or auto)
򐂰 svid = the reserved VLAN (4001-4004) used for UFP vPort communication with the Emulex NIC
򐂰 defvlan = the default VLAN, which is the PVID/native VLAN for that vPort (untagged)
򐂰 deftag = default tag, disabled by default; allows the option to tag the defvlan
򐂰 VLANs = the list of VLANs assigned to that vPort
Some other useful UFP vPort troubleshooting commands are shown in Example 4-6.
Example 4-6 Multiple UFP vPort configuration and status
PF_CN4093a(config)#show ufp information port
-----------------------------------------------------------------
Alias Port state vPorts chan 1 chan 2 chan 3 chan 4
------- ---- ----- ------ --------- --------- --------- ---------
INTA1 1 dis 0 disabled disabled disabled disabled
INTA2 2 dis 0 disabled disabled disabled disabled
INTA3 3 ena 1 up disabled disabled disabled
.
.
.
PF_CN4093a(config)#show ufp information vport
-------------------------------------------------------------------
vPort state evbprof mode svid defvlan deftag VLANs
--------- ----- ------- ---- ---- ------- ------ ---------
INTA1.1 dis dis tunnel 0 0 dis
INTA1.2 dis dis tunnel 0 0 dis
INTA1.3 dis dis tunnel 0 0 dis
INTA1.4 dis dis tunnel 0 0 dis
INTA2.1 dis dis tunnel 0 0 dis
INTA2.2 dis dis tunnel 0 0 dis
INTA2.3 dis dis tunnel 0 0 dis
INTA2.4 dis dis tunnel 0 0 dis
INTA3.1 up dis trunk 4002 10 dis 10 20 30
.
4.2.2 UFP Tunnel mode
Tunnel mode is a Q-in-Q mode in which the vPort is customer VLAN-independent (this is the closest to vNIC Virtual Fabric Dedicated Uplink mode). It is the default mode for a vPort.
Example Configuration
Example 4-7 shows port INTA1 vPort 3 configured in Tunnel mode (Q-in-Q), which can carry multiple VLANs through a single outer tagged VLAN ID. In this example, VLAN 4091 is used as the tunnel VLAN. When configuring UFP Tunnel mode, at least one EXT port must be configured to support the outer VLAN ID, as seen in the example.
Example 4-7 vPort Tunnel mode example configuration
ufp port INTA1 vport 3
network mode tunnel
network default-vlan 4091
enable
exit
!
interface port EXT1
tagpvid-ingress
pvid 4091
exit
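Because tunnel mode is customer VLAN-independent, another compute node port can join the same tunnel by reusing the same outer VLAN. A brief sketch follows (port INTA2 is an assumption for illustration only):
ufp port INTA2 vport 3
network mode tunnel
network default-vlan 4091
enable
exit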
Configuration validation and state of a UFP vPort - Tunnel mode
While it is easy enough to read and understand how to configure an I/O module for UFP, there are several troubleshooting commands that can be used to validate the configuration and the state of a vPort. (See Example 4-6 on page 65.)
Note: Before configuring vPort mode, UFP must be enabled globally (ufp enable
command) and on the port (ufp port port identifier enable).
4.2.3 UFP FCoE mode
UFP FCoE mode dedicates the specific vPort (vPort 2 only) for FCoE traffic when enabled
within the UEFI. See Chapter 5, “NIC virtualization considerations on the server side” on
page 75 on how to enable FCoE within a compute node.
Example Configuration
Example 4-8 shows vPort 2 set to FCoE mode using VLAN 1001. The QoS minimum bandwidth is set to 50% of a 10 GbE port, with the default maximum burst of 100%.
Example 4-8 vPort FCoE Mode example configuration
ufp port INTA1 vport 2
network mode fcoe
network default-vlan 1001
qos bandwidth min 50
enable
exit
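Beyond the vPort setting itself, FCoE also requires switch-global features such as CEE and FIP snooping, as the note below indicates. The following is a minimal sketch that assumes the typical IBM Networking OS global commands for these features; see Chapter 6 for the complete configurations.
! Enable Converged Enhanced Ethernet (DCB) and FIP snooping globally
cee enable
fcoe fips enable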
In Figure 4-4 on page 68, IBM Unified Fabric Port uses vPorts to create isolation between virtual NICs within the compute node and maintains that isolation within the I/O module. Virtual NICs (up to four per 10 Gb NIC) are created within the compute node and can be assigned to separate vSwitches or be seen as a virtual HBA within the hypervisor or bare-metal operating system.
In this example, vPort (.1) is used for ESXi management and connectivity to vCenter, and vPort (.3) is used for vMotion; both are set to Access mode. vPort (.2) is enabled for FCoE mode. vPort (.4), which is set to Tunnel mode, is used to tunnel VM data between the hypervisor and the upstream network.
Configuration validation and state of a UFP vPort
While it is easy enough to read and understand how to configure an I/O module for UFP, there are several troubleshooting commands that can be used to validate the configuration and the state of a vPort. (See Example 4-6 on page 65.)
Note: This is only the vPort setting that is required to carry FCoE. CEE, FCoE FIP snooping, and other settings must also be enabled, as shown in Chapter 6, “Flex System NIC virtualization deployment scenarios” on page 133.
Note: Before configuring vPort mode, UFP must be enabled globally (ufp enable
command) and on the port (ufp port port identifier enable).
Figure 4-4 IBM Unified Fabric Port Mode
4.2.4 UFP Auto mode
The UFP vPort Auto mode feature is based on IBM VMready and IEEE 802.1Qbg implementations.
IBM VMready and IEEE 802.1Qbg Edge Virtual Bridging are software solutions that support open standards virtualization. They allow administrators to create groups of virtual machine port groups and to administer and migrate them from a central location. VMready works with all major hypervisor software, including VMware, Microsoft Hyper-V, Linux Kernel-based Virtual Machine (KVM), and Citrix XenServer. Although IBM PowerVM® is supported with VMready, UFP is specific to Intel based compute nodes. It requires no proprietary tagging or changes to the hypervisor software.
UFP vPort Auto mode dynamically creates and removes VLANs that are learned from the vPort. When a VLAN is learned and added to a vPort, that same VLAN ID is also added to the uplink that is associated with that vPort. This behavior can be intrusive to a network if there is more than one uplink path (not a port channel) out of a switch to a single destination carrying the same VLAN. Use caution when implementing VMready.
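The following is a minimal sketch of a vPort set to Auto mode. The mode keyword auto is taken from the mode values listed in the show ufp output earlier in this chapter, and the port and default VLAN are placeholders, so treat it as an illustration rather than a verified configuration.
ufp port INTA1 vport 4
network mode auto
network default-vlan 50
enable
exit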
More information about implementing VMready can be found in the following Redbooks publication:
http://www.redbooks.ibm.com/abstracts/sg247985.html
4.2.5 UFP vPort rules and attributes
The following rules and attributes are associated with UFP vPorts:
򐂰 vPorts are supported only on 10 Gb internal interfaces.
򐂰 UFP allows a physical NIC to be divided into up to four virtual NICs, called vPorts (there can be fewer than four vPorts per physical NIC, but not more).
򐂰 Each vPort can be set for a different mode or same mode (with the exception of the FCoE
mode, which is limited only to a single vPort on a UFP port, and specifically only vPort 2).
򐂰 UFP requires the proper support in the Compute Node for any port using UFP.
򐂰 By default, each vPort is ensured 2.5 Gb and can burst up to the full 10 Gb if other vPorts do not need the bandwidth. The ensured minimum bandwidth and maximum bandwidth for each vPort are configurable.
򐂰 The minimum bandwidth settings of all configured vPorts on a physical NIC cannot exceed 10 Gb in total.
򐂰 Each vPort must have a default VLAN assigned. This default VLAN is used for different
purposes in different modes.
򐂰 This default VLAN must be unique across the other three vPorts for this physical port,
which means that vPort 1.1 must have a different default VLAN assigned than vPort 1.2,
1.3 or 1.4.
򐂰 When in trunk or access mode, this default VLAN is untagged by default, but it can be
configured for tagging if desired. This configuration is similar to tagging the native or PVID
VLAN on a physical port. In tunnel mode, the default VLAN is the outer tag for the Q-in-Q
tunnel through the switch and is not seen by the end hosts and upstream network.
򐂰 vPort 2 is the only vPort that supports the FCoE setting. vPort 2 can also be used for other modes (for example, access, trunk, or tunnel). However, if you want the physical port to support FCoE, this function can be defined only on vPort 2.
򐂰 The physical port must be set to VLAN 1 as the PVID, with tagging enabled and no other VLANs defined for that port (a configuration sketch follows this list).
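The following sketch illustrates the physical port rule above, with the port left on PVID 1, tagging enabled, and no other VLANs added. The interface-level commands (tagging and pvid) reflect common IBM Networking OS isCLI usage and the port name is an example; verify both against the documentation for your switch firmware level. The qos bandwidth min values that are then assigned across vPorts 1 through 4 of that port (for example, 25 each, using the syntax from Example 4-8) must total no more than 100 percent.
interface port INTA1
pvid 1
tagging
exit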
Table 4-2 offers some checkpoints to help with selecting a UFP mode.
Table 4-2 Attributes of UFP modes
Capability                                            IBM UFP vPort mode options
                                                      Access   Trunk   Tunnel   FCoE
Support for a single untagged VLAN on the vPort (a)   Yes      Yes     Yes      No
Support for VLAN restrictions on vPort (b)            Yes      Yes     No       Yes
VLAN-independent pass-through for customer VLANs      No       No      Yes      No
Support for FCoE on vPort                             No       No      No       Yes
Support to carry more than 256 VLANs on a vPort       No       No      Yes      No
a. Typically a user sets the vPort for access mode if the OS uses this vPort as a simple untagged link. Both trunk and tunnel mode can also support this, but are not necessary to carry only a single untagged VLAN.
b. Access and FCoE mode restrict VLANs to only the default VLAN that is set on the vPort. Trunk mode restricts VLANs to the ones that are specifically allowed for the vPort on the switch (up to 32).
Summary of whether or not Virtual Fabric or UFP should be considered
What are some of the criteria to decide whether a UFP or vNIC solution should be implemented to provide the virtual NIC capability?
In an environment that has not standardized on any specific virtual NIC technology, UFP is the recommended choice. As noted, all future virtual NIC development will be on UFP. UFP has the advantage of being able to emulate the vNIC Virtual Fabric modes (via tunnel mode for dedicated uplink vNIC and access mode for shared uplink vNIC), but it can also offer virtual NIC support with customer VLAN awareness (trunk mode) and shared virtual group uplinks for access and trunk mode vPorts.
If an environment has already standardized on Virtual Fabric mode vNIC and plans to stay
with it, Virtual Fabric mode vNIC is recommended.
Note that Switch Independent mode vNIC is actually outside of this decision-making process. Switch Independent mode has its own unique attributes, one being that it is truly switch independent, which allows a user to configure the switch without restrictions related to the virtual NIC technology, other than allowing the proper VLANs. UFP and Virtual Fabric mode vNIC each have a number of unique switch-side requirements and configurations. The downside to Switch Independent mode vNIC is the inability to make changes to the vNIC without first reloading the server, and the lack of support for bidirectional bandwidth allocation.
4.3 Compute node NIC to I/O module connectivity mapping
Port mapping between CNA NICs and I/O module slots is often misunderstood and confusing to explain. Each type of mezzanine card option can connect differently to each I/O module slot, depending on the number of ports and the number of ports per ASIC. One thing is always the same: each mezzanine slot consists of four lanes, and each lane can drive either 1 Gb or 10 Gb Ethernet speeds. In total, a single mezzanine slot is capable of driving up to 40 Gb of Ethernet to each I/O module.
4.3.1 Embedded 10Gb VFA (LoM) - Mezzanine 1
Figure 4-5 shows the Embedded 10Gb Virtual Fabric Adapter (VFA, also known as LAN on Motherboard or LoM) for the x86 compute nodes, which can be replaced with another option card by removing the riser card from Mezzanine Slot 1. The 2-port LoM is capable of pNIC, FCoE, and iSCSI operation (a license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode, and Unified Fabric Protocol. The dual-port LoM consists of a single ASIC with two 10 GbE ports that are wired directly through the midplane to I/O Module bays 1 and 2 for port redundancy.
Figure 4-5 2 port LoM 10G VFA Mezz 1 connectivity to I/O Modules 1 and 2
4.3.2 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1
Figure 4-6 shows the CN4054 4-port 10Gb Virtual Fabric Adapter for the x86 compute nodes, which can be placed into either Mezzanine Slot 1 or 2. The 4-port CNA is capable of pNIC, FCoE, and iSCSI operation (a license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode, and Unified Fabric Protocol. The four-port CNA card consists of dual ASICs with two 10 GbE ports each, wired directly through the midplane to I/O Module bays 1 and 2 for port redundancy when the card is placed into Mezzanine Slot 1.
Figure 4-6 4 port CN4054/R 10G VFA Mezz 1 connectivity to I/O Modules 1 and 2
4.3.3 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1 and 2
Figure 4-7 on page 73 shows two 4-port CN4054 10Gb Virtual Fabric Adapters for the x86 compute nodes, placed into both Mezzanine Slots 1 and 2. The 4-port CNA is capable of pNIC, FCoE, and iSCSI operation (a license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode, and Unified Fabric Protocol. The four-port CNA card consists of dual ASICs with two 10 GbE ports each, wired directly through the midplane to I/O Module bays 1 and 2 for Mezzanine 1 and I/O Module bays 3 and 4 for Mezzanine 2. This provides a highly redundant environment in which up to 80 Gb of bandwidth can be delivered to each half-wide compute node.
Figure 4-7 Two 4-port CN4054/CN4054R 10Gb VFA Mezz 1 and 2 connectivity to I/O Modules
4.3.4 IBM Flex System x222 Compute Node
Each server in the x222 includes an Embedded 10Gb Virtual Fabric Adapter (VFA, also known as LAN on Motherboard or LOM) built into the system board. The x222 has one Fabric Connector (which is physically on the lower server), and the Ethernet connections from both Embedded 10Gb VFAs are routed through it. Figure 4-8 shows how each server connects to the I/O module. Each 2-port CNA is capable of pNIC, FCoE, and iSCSI operation (a license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode, and Unified Fabric Protocol.
Figure 4-8 x222 Node Server connectivity to I/O Module
Switch upgrade 1 required: For EN4093, EN4093R, CN4093 and SI4093 switches, you
must have Upgrade 1 enabled in the two switches. Without this feature upgrade, the upper
server will not have any Ethernet connectivity.
Chapter 5. NIC virtualization considerations
on the server side
In 3.3, “IBM Flex System Ethernet adapters” on page 47 we introduced the physical Emulex
NICs that support virtual NIC functionality in the PureFlex System environment and in
Chapter 4, “NIC virtualization considerations on the switch side” on page 55 we discussed the
I/O Module virtualization features. In this chapter we go into detail on how to enable the NIC
virtualization from the server side, as well as some design considerations for utilizing these
NICs within various operating systems.
The following topics are covered:
򐂰 5.1, “Introduction to enabling Virtual NICs on the server” on page 76
򐂰 5.2, “Other methods for configuring virtual NICs on the server” on page 92
򐂰 5.3, “Utilizing physical and virtual NICs in the OS” on page 115
5.1 Introduction to enabling Virtual NICs on the server
Regardless of what mode of virtual NIC is desired, all modes have at least some small element of low-level configuration that must be performed on the server side. Some Emulex NICs may ship with vNIC Virtual Fabric mode already enabled, but even those can be changed to a different mode, or have vNIC disabled altogether if desired.
Exactly how to enable and/or change the virtual NIC function on the Emulex NICs has varied
over the years, but for the most part it can always be done via the UEFI configuration from the
F1 setup on the server.
It is also possible to control and automate the setting of virtual NIC options via certain tools, such as Configuration Patterns in the FSM, which is also introduced in this section. However, we primarily focus on using the F1 setup method for configuring the virtual NIC on the server side.
5.1.1 Getting in to the virtual NIC configuration section of UEFI
When manually performing the virtual NIC configuration on the server, it is necessary to enter
UEFI via the F1 setup option during server boot. Once you are into F1 setup you need to drill
into the section that permits enabling and changing the desired virtual NIC mode and perform
any changes and then save those changes.
Important: The steps to get into UEFI in this section assume the reader knows how to get to the console of a Compute Node. For reference, this is commonly done by connecting via browser to the IMM IP address of that host, clicking the Remote Control button, and then clicking the option to start remote control in either single-user or multi-user mode.
The following are the exact steps to get to the virtual NIC configuration screens when utilizing version 4.6.281.26 of the Emulex firmware:
1. Power on the server, and when the screen shown in Figure 5-1 is present, press the F1
key to enter in to UEFI setup.
Figure 5-1 Example of screen to press the F1 key to enter UEFI setup
2. On the main System Configuration and Boot Management screen, as seen in Figure 5-2 on page 78, use the arrow keys to scroll down to the System Settings option and press Enter.
Figure 5-2 Example of first screen viewed after pressing the F1 key to enter UEFI setup
3. On the System Settings screen as seen in Figure 5-3, scroll down to the Network option
and press Enter.
Figure 5-3 Example of screen to enter network set up
4. On the Network screen, scroll down to the desired NIC and press Enter
– Exactly how many NICs you see on the Network screen will vary, depending on what
model NIC is installed (dual port, quad port and so on), how many of these NICs are
installed (LoM only, MEZZ1 and/or MEZZ2 slots used), and if a virtual NIC mode is
already enabled or not. For example, if this were a Compute Node with only the LoM
dual port NIC, and no virtual NIC had previously been enabled, you would only see the
two physical NICs on this screen, as seen in Figure 5-4.
– If this were the same dual port NIC and virtual NIC had already been enabled, you
would see between six and eight NICs on this screen (depending on if FCoE/iSCSI had
also been previously enabled or not).
Figure 5-4 Example of Network screen with dual port LoM, before any virtual NIC has been enabled
– Figure 5-5 shows how the Network screen might look on a dual port NIC after some
form of virtual NIC had been enabled, and the system restarted.
Figure 5-5 Example of Network screen after vNIC has been enabled and the system restarted
– The images in Figure 5-4 on page 80 and Figure 5-5 on page 81 also illustrate an important concept: after a NIC has been placed into a virtual NIC mode and the system has been reloaded, if a user comes back into this Network screen and wants to drill back into the NICs to review or change the virtual NIC settings, the two top NICs (in this example of a dual-NIC solution) are the only ones that allow those changes. If you drill into the third through eighth NICs in this list, you are not presented with an option to change the virtual NIC settings. Only the first two NICs in the list of eight NICs in this example allow those changes.
5. Once a user highlights the desired NIC in the Network screen and presses the Enter key, a
screen for just that one NIC will be shown, something like what is shown in Figure 5-6.
Figure 5-6 Example of the individual NIC screen
6. On the screen shown in Figure 5-6 on page 82, highlight the NIC itself and press Enter to drill one step deeper into that NIC's configuration. This brings up a screen called Emulex NIC Selection, which looks something like Figure 5-7 (it may vary depending on the firmware version of the NIC).
Figure 5-7 Example of the Emulex NIC Selection screen (virtual NIC disabled)
Some important items with regard to Figure 5-7:
– If Multichannel mode is disabled, then regardless of the Personality setting (NIC, FCoE
or iSCSI), the OS will be presented with just the physical NICs
– If Multichannel mode is set to any form of virtual NIC mode, then the Personality setting
impacts how many virtual NICs are presented to the OS.
• If NIC is selected in Personality, 4 NICs will be presented to the OS for each 10G
NIC set to a form of virtual NIC
• If FCoE or iSCSI is selected in Personality, 3 NICs will be presented to the OS for
each 10G NIC set to a form of virtual NIC. An example of 3 ports on each NIC on a
dual port NIC (6 ports total) can be seen in Figure 5-8 on page 84
Figure 5-8 NICs available on dual port NIC with virtual NIC enabled, and iSCSI or FCoE Personality
enabled
– The Multichannel mode is how the virtual NIC feature is enabled, and should bring up a
window as shown in Figure 5-9 when Multichannel is selected and the Enter key is
pressed:
Figure 5-9 Emulex Multichannel (virtual NIC) mode options
It should have these four options listed:
• Switch Independent Mode (this is Switch Independent Mode vNIC)
• IBM Virtual Fabric Mode (this is vNIC Virtual Fabric mode)
• IBM Unified Fabric Protocol Mode (this is UFP)
• Disable (when selected, turns off all NIC virtualization on this ASIC)
– Controller configuration is where you can make some changes for the vNIC modes of virtual NIC (once enabled and saved in UEFI, all remaining configuration for the UFP mode of virtual NIC is done via the I/O Module).
Important: If you do not see all three virtual NIC options (Switch Independent Mode, IBM Virtual Fabric Mode, and IBM Unified Fabric Protocol Mode), more than likely the NIC is on down-level firmware and should be upgraded before going any further.
5.1.2 Initially enabling virtual NIC functionality via UEFI
Starting from the Emulex NIC selection screen, perform the following steps to select a virtual
NIC mode:
1. Scroll down to the Multichannel Mode option and press Enter to see the selections as
shown in Figure 5-10.
Figure 5-10 Selecting a multichannel mode
2. In the screen shown in Figure 5-10 scroll to the desired virtual NIC mode and press the
Enter key to enable the version of virtual NIC to be used (or disable it if the Disable option
is selected)
3. What needs to happen next depends on what mode is selected:
– If Switch Independent Mode is selected, you must now go into the Controller Configuration portion of the Emulex NIC Selection screen and set the LPVID (Logical Port VLAN Identifier) and the Bandwidth (in older firmware you also had to enable or disable each virtual NIC individually, but that is not necessary in newer firmware). See the “Special settings for vNIC Switch Independent Mode” section for details. With this mode of virtual NIC, there are no special settings that need to be performed on the I/O Modules.
– If IBM Virtual Fabric Mode is selected, you can optionally go into the Controller Configuration section and set the LPVID (as seen in the “Special settings for vNIC Virtual Fabric mode” section), but you must perform specific configuration steps on the I/O Modules to complete this mode of virtual NIC. See Chapter 4 for details on the necessary settings on the I/O Modules to complete this configuration.
– If IBM Unified Fabric Protocol Mode is selected, no other configuration in the UEFI is permitted, but you must perform specific configuration on the I/O Modules themselves to complete this mode of virtual NIC. See Chapter 4 for details on the necessary settings on the I/O Modules to complete this configuration.
Regardless of the mode selected, it is necessary to eventually exit out of UEFI and save the
changes before any of these options take effect.
It is important to note that enabling a type of virtual NIC in the Multichannel mode section of the Emulex NIC Selection screen impacts all NICs on an ASIC, not just that single NIC. If you are working with the dual-port NIC (a single-ASIC solution), enabling a virtual NIC mode on one NIC enables the feature on both NICs. If you are working with the 4-port or 8-port Emulex NIC (both dual-ASIC solutions) and want virtual NICs on all NICs, you must enable it twice, once for each ASIC (in the case of the 8-port NIC, when you enable it on a single port on an ASIC, the other three ports on that same ASIC are also enabled for this function). See Chapter 4 for details on ASIC-to-NIC mapping in relationship to I/O Module connectivity.
5.1.3 Special settings for the different modes of virtual NIC via UEFI
As noted, when UFP is enabled there are no other settings necessary in UEFI, but both vNIC modes of virtual NIC have more settings that can be performed within UEFI. These extra settings are mandatory with Switch Independent Mode vNIC and optional for Virtual Fabric Mode vNIC. The following are the extra settings for these modes.
Important: Unlike when enabling the virtual NIC feature itself, which affects all ports on the same ASIC, you must complete these extra settings on a per-physical-port basis. So if this is a dual-port NIC, once you have set and saved the first NIC, you must exit back to the Network screen, select the second physical NIC, and repeat the process.
Special settings for vNIC Switch Independent Mode
After the Multichannel Mode has been set to Switch Independent Mode, it is now mandatory
to scroll down to the Controller Configuration option and complete other steps to bring these
virtual NICs fully operational. After selecting the Controller Configuration option and pressing
Enter you will be taken to a screen similar to that seen in Figure 5-11.
Figure 5-11 Example options available in Switch Independent Mode
As can be seen, the Controller Configuration screen for Switch Independent Mode offers four options:
1. View Configuration - Views the most recently saved configuration (changes that have been made but not yet saved via the Save Current Configuration option on this screen are not seen here)
2. Configure Bandwidth - Defaults to 0G per vNIC; bandwidth must be set and saved before the vNICs become operational in the OS
3. Configure LPVID - Must be set and saved before these vNICs become operational in the OS
4. Save Current Configuration - Configuration changes must be saved before leaving this screen or the changes are lost
Important: One of the most common issues noted in the field is the changes not being
saved in this screen before exiting. Remember to always save here if any changes are
made in this area. It may be a good idea after saving changes and exiting this screen, to go
back into this screen and reconfirm the configurations for LPVID and Bandwidth were truly
saved.
The following provides more details on these specific options.
Configure Bandwidth:
After scrolling to the Configure Bandwidth option and pressing the Enter key, a screen similar
to Figure 5-12 will be shown:
Figure 5-12 Example of Bandwidth settings in Switch Independent Mode showing default settings
Users must properly set the desired minimum and maximum bandwidths before this
configuration can be saved. The following are some guidelines with regard to these
Bandwidth settings:
򐂰 All values are in percentages of 10G (for example, setting a 10 in here represents 10% of
10G, meaning it is set for 1G)
򐂰 All values are between 0 to 100 in increments of 1 (1% of 10G = 100M)
򐂰 The total value of all the minimums must equal 100%, or the save is not allowed
򐂰 The value of any given vNIC maximum must be equal to or greater than the minimum for that vNIC
򐂰 If hard enforcement of bandwidth is desired, set the minimum and maximum values the
same for each vNIC. An example of this would be setting both the minimum and
maximums values all to 25, which would hard lock the values to 2.5G per each vNIC.
򐂰 If it is desired to allow vNICs to use excess bandwidth not in use by other vNICs, set the maximum to a higher value than the minimum. An example of this would be setting all of the minimums to 25 and all of the maximums to some higher value, in which case each vNIC is guaranteed 25% but can use up to its maximum percentage if other vNICs are not using their full minimum allotment.
򐂰 It is possible to set the maximum for all vNICs to 100%, meaning each vNIC is guaranteed
the minimum set, but can use up to 100% of the remaining bandwidth if it is not in use by
other vNICs
Configure LPVID
After scrolling to the Configure LPVID option and pressing Enter, a screen similar to Figure 5-13 is shown.
Figure 5-13 Example of default LPVID settings in Switch Independent Mode
The LPVID is a concept unique to the vNIC-based options (both Virtual Fabric mode and Switch Independent Mode). From an end-user perspective, the LPVID value can be considered the default VLAN for that vNIC. The LPVID value is only used if the OS is sending untagged packets. If the OS sends an untagged packet toward the I/O Module, that packet gets a tag equal to the LPVID for that vNIC before being sent on its way to the I/O Module (return packets have the LPVID VLAN stripped off before being sent back to the OS). If the OS is sending tagged packets, the LPVID is ignored and the OS VLAN tag is passed to the upstream I/O Module unmodified. One side effect of this LPVID usage is that all packets coming from a host running Switch Independent Mode are delivered to the upstream I/O Module tagged (if the OS sends an untagged packet, it is sent to the I/O Module tagged with the value of the LPVID setting for that vNIC, and if the OS sends the packet tagged, it is sent to the I/O Module with whatever tag the OS put on the packet).
The following are some guidelines with regard to the LPVID settings:
򐂰 Valid LPVID values are 2-4094
򐂰 For Switch Independent mode, you must set the LPVID on all vNICs before a save will be
allowed (this is an optional setting on Virtual Fabric vNIC mode)
򐂰 Each vNIC on a given physical port must use a unique LPVID. In most cases, the partner NIC's LPVIDs are set to the same values, but they can be set differently.
򐂰 Because all packets arrive tagged at the I/O Module, the interface on the I/O Module side must have tagging enabled. If the host needs to use the currently assigned PVID/Native VLAN on the I/O Module side, then the tag-pvid option must be configured on that interface on the I/O Module (a configuration sketch follows this list). Another solution is to set the PVID/Native VLAN for this port on the I/O Module to some unused value and not use the PVID/Native VLAN.
򐂰 If bare metal PXE boot is not required on the host, one option is to set the LPVID values to
some unused VLANs, and then only send tagged packets from the OS. The same
restriction from the previous bullet (all packets tagged) still applies, but the end user no
longer needs to keep track of which VLANs need to be tagged in the OS and which do not
(just tag them all at all times).
򐂰 If bare metal PXE boot is required, then the LPVID for the vNIC that needs to PXE boot must be set to the VLAN that the PXE packet is expected to arrive on.
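The following is a minimal sketch of the I/O module interface settings that are described in the tag-pvid bullet above, assuming IBM Networking OS isCLI syntax, an example internal port INTA3, and an example PVID of 10; the exact commands and port names are assumptions and should be verified against the switch documentation for your firmware level.
interface port INTA3
tagging
tag-pvid
pvid 10
exit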
Once the LPVID and Bandwidth settings are properly set, the user must perform a save before exiting the Controller Configuration screen. Older versions of the firmware allowed a user to escape out of this screen without saving and did not provide any warning. The version of the firmware used during the writing of this paper (and hopefully all newer versions) puts up a warning, as seen in Figure 5-14, if the changes have not been saved.
Figure 5-14 Example of attempting to exit Switch Independent Mode vNIC Controller Configuration
screen without saving
Special settings for vNIC Virtual Fabric mode
When the Multichannel mode of IBM Virtual Fabric Mode vNIC is enabled, the only UEFI option is to configure the LPVID value (Bandwidth is controlled from the I/O Module). Unlike Switch Independent Mode, this setting is strictly optional.
Also unlike Switch Independent Mode, it is not necessary to set LPVID values on all vNICs before the configuration can be saved. If desired, a single vNIC, several, or all vNICs can have an LPVID assigned, or they can remain at 0 (0 meaning the vNIC passes untagged traffic untagged to the upstream I/O module), and the save is still allowed.
For any vNICs that do have an LPVID assigned, the operation is the same as for Switch Independent Mode (if the host sends an untagged packet, that packet is sent to the I/O Module tagged with the value of the LPVID; if the host sends a tagged packet, the LPVID is ignored and the tag the host set is sent to the I/O Module).
As noted, if no LPVID value is assigned (default for Virtual Fabric vNIC mode), any untagged
packet sent from the OS will be sent to the I/O module untagged, and arrive on the
Native/PVID VLAN assigned to the I/O Module port connecting to this host.
5.1.4 Setting the Emulex virtual NIC settings back to factory default
If necessary, it is possible to reset the Emulex NICs back to factory default. This not only resets all of the Bandwidth and LPVID settings, but also disables Multichannel for this ASIC. The option to perform this factory default reset can be found by scrolling to the bottom of the Emulex NIC Selection screen, selecting Erase Configuration, and pressing the Enter key. An example of the results of pressing Enter on this selection is shown in Figure 5-15 on page 92.
Important: As noted previously, after the Bandwidth and LPVID values are configured and saved on one NIC, this process must be completed for the other physical NIC of this pair (you must exit back to the Network screen, select the other NIC, and drill back in to the LPVID and Bandwidth settings to make and save the changes). This is different from the settings in the Emulex NIC Selection screen, where changes to things like Multichannel mode and Personality are carried to all NICs on the common ASIC.
Important: Until both the LPVID and Bandwidth values are properly set and saved, the vNICs show as disconnected in the OS. Be sure to complete these operations on all Switch Independent Mode configured NICs before attempting to utilize these NICs in the OS.
Important: Regardless of whether you set any LPVID values, the Virtual Fabric mode of vNIC requires you to go into the I/O module to complete the configuration process (enable vNIC, create vNIC groups, and assign other variables). Until the I/O module step is done, the OS reports the vNIC as not connected. See Chapter 6, “Flex System NIC virtualization deployment scenarios” on page 133 for examples of configuring the I/O module side for Virtual Fabric vNIC.
Figure 5-15 Example of setting Emulex NIC back to factory default
5.2 Other methods for configuring virtual NICs on the server
Although the primary method used in this document for enabling virtual NICs on the server is
via the UEFI F1 setup path, there are other tools available to help automate this process. This
section introduces one such tool - FSM configuration patterns.
5.2.1 FSM Configuration Patterns
With certain Emulex NICs it is possible to automate the deployment of the NIC settings via the
FSM. Some examples of items that can be automated via the FSM:
򐂰 Change the personality between NIC, FCoE, or iSCSI (assuming FoDs installed)
򐂰 Enable a desired mode of Virtual NIC, or disable it
򐂰 For the vNIC modes of virtual NICs that offer other configuration options, we can change
those options, such as LPVID or Bandwidth
Currently the Embedded 10Gb Virtual Fabric Ethernet Controller (LOM) and IBM Flex System
CN4054 10Gb Virtual Fabric Adapter are supported with FSM configuration patterns.
The most important aspect of utilizing configuration patterns is the ability to push out changes to many servers, without having to perform the tedious process of manually going into F1 setup on every server on which virtual NICs need to be changed.
After making any such changes with FSM Configuration Patterns the server must be reloaded
for those changes to take effect.
The process of configuring NIC settings via configuration patterns includes the following steps:
1. Creating port patterns that describe desired vNIC mode, protocols, and port settings
2. Creating adapter patterns that describe adapter types and desired protocols
3. Creating server patterns that describe node configuration including I/O adapter settings
4. Deploying server patterns on x86 compute node targets
Consider the following hypothetical example. You need to configure vNIC Switch Independent
mode with Ethernet only vNICs on the integrated LOM and vNIC UFP mode on the CN4054
adapter installed in slot 2 of the x240 compute node. The first ASIC of the CN4054 adapter
needs to be configured with Ethernet only vNICs, and the second ASIC requires both
Ethernet and FCoE vNICs.
By default, both LOM and CN4054 adapters are not configured with any vNICs, as shown
in Figure 5-16.
Figure 5-16 Initial NIC configuration
PFA 12:0:0 and PFA 12:0:1 represent the two physical LOM ports, PFA 22:0:0 and PFA 22:0:1 represent the two physical ports on the first ASIC of the CN4054, and PFA 27:0:0 and PFA 27:0:1 represent the two physical ports on the second ASIC of the CN4054, for a total of six network ports.
Opening server configuration patterns
Perform the following steps to open server configuration patterns:
1. Launch FSM Explorer from the Home tab of the FSM interface, as shown in Figure 5-17.
Figure 5-17 Launch FSM Explorer
2. Open Configuration Patterns in the FSM Explorer interface by selecting Systems → Configuration Patterns, as shown in Figure 5-18.
Figure 5-18 Open Configuration Patterns
3. Select Server Patterns to manage server configuration patterns, as shown in Figure 5-19.
Figure 5-19 Server Patterns
Creating port patterns
In our example, we are creating three port patterns:
򐂰 vNIC switch independent mode with Ethernet only ports
򐂰 Universal fabric port (UFP) mode with Ethernet only ports
򐂰 Universal fabric port (UFP) mode with Ethernet and FCoE ports
Perform the following steps to create desired port patterns:
1. Click New icon and select New Port Pattern, as shown in Figure 5-20.
Figure 5-20 New Port Pattern
2. In the New Port Pattern window shown in Figure 5-21, specify the port pattern name and
select desired parameters and click Create. In our example, we are creating switch
independent vNIC mode with Ethernet only network ports. For switch independent vNIC,
we should also assign bandwidth parameters and VLAN tags (VLAN tags represent the
LPVID setting as seen in F1 setup for the NICs, as shown in Figure 5-13 on page 89).
Figure 5-21 Port pattern: Configuring vNIC switch independent mode
3. Repeat steps 1 and 2 for the remaining port configurations. In our example, we are
creating two more port patterns: UFP mode with Ethernet only ports and UFP mode with
Ethernet and FCoE ports, as shown in Figure 5-22 on page 98 and Figure 5-23 on
page 99.
Figure 5-22 Port Pattern: Configuring vNIC UFP mode
Figure 5-23 Port Pattern: Configuring UFP mode with FCoE
4. Configured patterns are displayed in the Server Patterns window, as shown in Figure 5-24.
Figure 5-24 List of configured port patterns
Creating adapter patterns
We are creating two adapter patterns:
򐂰 vNIC switch independent mode with Ethernet only ports for the integrated LOM
򐂰 vNIC UFP mode with Ethernet only ports for the first ASIC of the CN4054 and Ethernet
and FCoE ports for the second ASIC of the CN4054
Perform the following steps to create adapter patterns:
1. Select New Adapter Pattern from the New Patterns drop-down menu, as shown in
Figure 5-25.
Figure 5-25 New Adapter Pattern
2. In the New Adapter Pattern window, specify the adapter pattern name, adapter type,
operational mode and protocols, as shown in Figure 5-26. We are creating the pattern for
the integrated LOM in vNIC switch independent mode with Ethernet only ports. Click
Create.
Figure 5-26 LOM adapter pattern settings
3. Repeat steps 1 and 2 for the remaining patterns. In our example, we are configuring the pattern for the CN4054 in UFP mode with Ethernet only ports on the first ASIC (Configuration port group 1) and Ethernet and FCoE ports on the second ASIC (Configuration port group 2), as shown in Figure 5-27. Click Create.
Figure 5-27 CN4054 adapter pattern settings
Creating new server pattern
We are creating the new server pattern that configures x240 compute node networking
components as follows:
򐂰 Integrated LOM is set to vNIC switch independent mode with Ethernet only ports.
򐂰 The first ASIC of the CN4054 expansion card installed in slot 2 is set to UFP mode with
Ethernet only ports.
򐂰 The second ASIC of the CN4054 expansion card installed in slot 2 is set to UFP mode
with Ethernet and FCoE ports.
Perform the following steps to create server patterns:
1. Select New Server Pattern from the drop-down menu as shown in Figure 5-28.
Figure 5-28 Creating a new server pattern
2. Select Create a new pattern from scratch as shown in Figure 5-29 and click Next.
Figure 5-29 Creating a new pattern from scratch
3. Specify the pattern name and form factor as shown in Figure 5-30 and click Next.
Figure 5-30 New Server Pattern Wizard: General
4. Leave Keep existing storage configuration selected as shown in Figure 5-31 and click
Next.
Figure 5-31 New Server Pattern Wizard: Local Storage
5. Expand Compute Node twistie, then click Add I/O Adapter 1 or LOM, as shown in
Figure 5-32.
Figure 5-32 Adding I/O adapter 1 or LOM
6. In the Add I/O Adapter window, select the adapter type (LOM) from the adapter list as
shown in Figure 5-33, then click Add.
Figure 5-33 Selecting the adapter type: LOM
7. On the next screen, select previously configured adapter and port patterns, as shown in
Figure 5-34. Click Add. In our example, we choose vNIC Switch Independent LOM
adapter pattern and vNIC switch independent port pattern that we created earlier.
Figure 5-34 Selecting adapter and port patterns
8. From the I/O Adapters screen (see Figure 5-32 on page 106) click Add I/O Adapter 2.
9. In the Add I/O Adapter window, select the adapter type (CN4054) from the adapter list as
shown in Figure 5-35, then click Add.
Figure 5-35 Selecting the adapter type: CN4054
10.On the next screen, select previously configured adapter and port patterns, as shown in
Figure 5-36. In our example, we select previously configured vNIC UFP FCoE CN4054
adapter pattern and vNIC UFP and vNIC UFP FCoE port patterns. Click Add.
Figure 5-36 Selecting adapter and port patterns: CN4054
11.The configured adapter settings are summarized in Figure 5-37. Click Next.
Figure 5-37 New Server Pattern Wizard: I/O Adapters summary
12.Leave Keep existing boot mode selected as shown in Figure 5-38 and click Save.
Figure 5-38 New Server Pattern Wizard: Save
13.You can see the created server pattern in the list of patterns, as shown in Figure 5-39.
Figure 5-39 Newly created server pattern
Deploying server pattern
Perform the following steps to deploy a server pattern:
1. Right click a server pattern that you are going to deploy and select Deploy from the
context menu, as shown in Figure 5-40.
Figure 5-40 Deploying server pattern
2. Select target nodes (we selected x240_03) as shown in Figure 5-41, then click Deploy.
Figure 5-41 Selecting target compute nodes
3. Click Deploy in the confirmation window that appears. A new job is started and the confirmation is displayed, as shown in Figure 5-42. Click Close.
Figure 5-42 Deployment job start confirmation
4. You can check the job status in the Jobs pod by clicking Jobs → Active and moving the mouse pointer over the job name, as shown in Figure 5-43.
Figure 5-43 Server Profile activation job status
5. Click Server Profiles on the left side of the Configuration Patterns window (see
Figure 5-43 on page 111). You see the profile deployment status in the Profile Column, as
shown in Figure 5-44.
Figure 5-44 Profile activation status
6. When profile activation completes successfully, the profile status changes to Profile
assigned, as shown in Figure 5-45.
Figure 5-45 Profile assigned
Server NICs are now configured. Now, let’s have a look at what changed in the UEFI for the
network setup.
Go to UEFI by pressing F1 during the compute node boot phase, then select System Settings → Network. Figure 5-46 and Figure 5-47 on page 113 show vNICs configured on the LOM and the CN4054 adapter using configuration patterns.
Figure 5-46 Network Device List (Part 1)
Figure 5-47 Network Device List (Part 2)
Select Onboard PFA 12:0:0 (Integrated LOM) from the device list, press Enter two times,
and verify vNIC parameters, as shown in Figure 5-48. LOM is configured with vNIC Switch
Independent mode and NIC personality (Ethernet only ports).
Figure 5-48 LOM vNIC configuration
Go back to the network device list by pressing Esc two times and select Slot PFA 22:0:0 (the
first ASIC of the CN4054) from the device list, press Enter two times, and verify vNIC
parameters, as shown in Figure 5-49. The first ASIC is configured with vNIC UFP mode and
NIC personality (Ethernet only ports).
Figure 5-49 CN4054 vNIC configuration: First ASIC
Go back to the network device list by pressing Esc two times and select Slot PFA 27:0:0 (the
second ASIC of the CN4054) from the device list, press Enter two times, and verify vNIC
parameters, as shown in Figure 5-50. The second ASIC is configured with vNIC UFP mode
and FCoE personality (Ethernet and FCoE ports).
Figure 5-50 CN4054 vNIC configuration: Second ASIC
See the following link for more details on utilizing FSM configuration patterns:
http://www.redbooks.ibm.com/abstracts/sg248060.html
5.3 Utilizing physical and virtual NICs in the OS
Regardless of whether the user is using virtual NICs or physical NICs, most operating systems have various ways to utilize those NICs, either as individual links or in teamed/bonded modes for better performance, high availability, or both. This section provides guidance on various aspects of NIC teaming/bonding usage by the OS.
5.3.1 Introduction to teaming/bonding on the server
The terms bonding and teaming are different words for the same thing. In general, in Linux it is referred to as bonding; in Windows and VMware it is referred to as teaming. Regardless of the term, these technologies provide a way to allow two or more NICs to appear and operate as a single logical interface, for the purpose of either high availability or increased performance (all modes of teaming/bonding provide high availability; some modes also provide increased performance via load balancing).
Each OS has its own way of providing these services, with most having native built-in support, but some older operating systems still require a third-party application to provide this functionality.
All teaming/bonding modes come in two primary types, Switch Dependent mode and Switch Independent mode, which are discussed here in more detail.
Switch Dependent modes of teaming/bonding
These are any teaming/bonding modes in the OS that also require a specific architecture in
the connecting switches, and special configurations in these upstream switches (in other
words, they are dependent on the upstream switch design and configuration to operate
correctly).
Some comments on these modes:
򐂰 All of these modes are some form of link aggregation, either static aggregation or dynamic
aggregation (Link Aggregation Control Protocol - LACP).
򐂰 Most OS’s support both an LACP and a static form of teaming/bonding, and these are all
forms of active/active teaming/bonding, usually load balancing traffic on a per-session
basis (what constitutes a session is usually controlled by settings on each side of the
device supporting this mode of teaming/bonding and is beyond the scope of this
document)
򐂰 Any teaming/bonding mode that utilizes either static or LACP aggregation requires that all ports in that team/bond go to a single upstream switch, or to a group of switches that can appear as a single switch to the NIC teaming (for example, switches running Cisco vPC or IBM vLAG, or stacked switches).
򐂰 Any of these modes must also have a corresponding mode of aggregation configured on the upstream I/O Modules to work properly; this is what makes them Switch Dependent.
Important: Currently, using any aggregation-based mode of teaming/bonding is not supported on a server if any of the virtual NIC options (Switch Independent mode, VF vNIC mode, or UFP) has been implemented. This is based on the current limitation that aggregation on IBM switches is on the physical port, not the logical port. An upcoming release of code should permit aggregations on UFP vPorts.
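As a hedged sketch of what the switch-side dependency can look like for an LACP-based team on physical (non-virtualized) NICs, the switch ports facing the teamed server NICs are given a matching LACP admin key on the I/O module. The port names and key value are examples only, the lacp commands shown reflect common IBM Networking OS isCLI usage, and which internal ports face a given compute node depends on the adapter mapping described in Chapter 4; if the two NICs land on different I/O modules, vLAG or stacking is also required, as described above.
interface port INTA1
lacp mode active
lacp key 1001
exit
interface port INTB1
lacp mode active
lacp key 1001
exit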
Figure 5-51 shows some examples of Switch Dependent mode teaming/bonding and their
relationship to the upstream network connections.
Figure 5-51 Examples of architectures and their interaction with Switch Dependent modes of
teaming/bonding
Switch Independent modes of teaming/bonding
These are teaming/bonding modes that do not require any form of aggregation to be
configured on the switch and thus are not dependent on any special switch side design or
configuration (just ensure all ports connecting to the team carry a common set of VLANs and
any other normal switch settings the host requires).
Some comments on these modes:
򐂰 Some Switch independent modes offer simple Active/Standby NIC teaming, where only
the active NIC is used, and the standby NIC comes into play only if the active NIC fails
򐂰 All operating systems offer more advanced kinds of server side teaming that deliver
Active/Active NIC usage by attempting to load balance the NICs in the team in such a way
that only the server knows or cares about this load balancing (in turn, the switch side of
this team/bond can load balance the return traffic based on how the host uses MACs to
send traffic out)
򐂰 Attempting to configure some form of aggregation on the I/O Module ports facing the NICs in Switch Independent mode will almost always fail and lead to issues.
Figure 5-52 shows examples of Switch Independent mode teaming/bonding and their
relationship to the upstream network connections.
Figure 5-52 Examples of architectures and their interaction with Switch Independent modes of
teaming/bonding
Important: Figure 5-52 always shows some sort of path between the pair of upstream switches, and never two switches isolated from one another. Although that path may be directly between the upstream pair (as shown here), or may be somewhere further up in the architecture, it must be present to ensure a failover path between points in the event of a path fault. See the section titled “The need for end to end paths between NICs in a team” later in this chapter for more detail.
Important: The term Switch Independent has been used in this document in relation to a form of virtual NIC that operates independently of the I/O Module, and now as a mode of teaming/bonding on the server that is also independent of the I/O Module. Although they are both independent of the I/O Module, other than the name and this independence, they are unrelated features.
Understanding the terms Active and Standby with teaming/bonding
The use of the phrases Active/Standby, Active/Passive, Active/Backup, and Active/Active can occasionally be misunderstood and confusing. This section attempts to clarify these terms.
Active/Standby, Active/Passive, and Active/Backup
These are all different names for the same thing: a NIC in a team/bond is selected to be active (passing traffic), and the other NIC is put into a standby state (not passing traffic) and is only used in the event the active NIC goes down. In some cases the team/bond might have multiple active NICs and only a single standby NIC, or the reverse (one active NIC and multiple standby NICs); the point is that one or more NICs in this mode are unused for any traffic until needed.
Most users understand the operation of these modes of teaming, but there is occasionally
some confusion in the context of how the connecting I/O Modules are utilized. The I/O
Modules themselves are not in any sort of special Active/Standby config. I/O Modules
supporting servers running Active/Standby will both be active, simply forwarding traffic as it is
received from the server, following the rules of that I/O Module (usually L2 switching based on
MAC addresses). So the I/O Modules are not in any sort of Active/Standby mode and depend
on the servers to decide which I/O module to utilize (based on the NIC selected as active in
the OS team/bond).
Since the server admin can control which NICs are active or standby, it is possible to configure some servers to use a NIC going to I/O Module bay 1 as the active NIC, and other servers in the same PureFlex chassis to use a NIC pointing to the other I/O Module in bay 2, and in doing so achieve some form of load balancing (albeit a chassis-based form of load balancing). For example, the server admin could configure half of the servers to utilize the NIC going to I/O Module bay 1 as the active NIC, and the other servers to use the NIC going to I/O Module bay 2 as the active NIC.
One possible downside of this type of Active/Standby per-chassis load balancing is that any server within this PureFlex chassis that is using I/O Module bay 1 and has to talk to another server in the same chassis using bay 2 as the active path usually must have that traffic travel to the upstream network and back down to get between the two I/O bays and their associated active server NICs.
Overall, these Active/Standby modes tend to be the simplest to implement and require no special switch-side configuration. However, they provide only high availability (no load balancing for a single server), and thus are wasteful of the overall bandwidth available to a given server.
Active/Active
While most agree on the meaning of the phrase Active/Standby, the phrase Active/Active is
frequently a point of contention when parties do not define what the term Active means.
In this document, the term active means the OS is free to actively use the NIC in any way that
agrees with the teaming/bonding mode selected in the OS, and does not leave that NIC in
some sort of standby mode. Within the term Active/Active, there are both Switch Dependent
modes and Switch Independent modes of teaming/bonding.
The following are some comments on Switch Dependent modes of Active/Active and some
examples of these modes:
򐂰 Like all Switch Dependent modes of teaming/bonding, any of these Active/Active modes use some form of aggregation and require an accompanying upstream network architecture and I/O Module configuration to support this aggregation on the server-side NICs. Today these aggregation modes are exclusively either LACP or static aggregation.
򐂰 These modes use the aggregation hash algorithm to determine what NIC is used for a
session of traffic, and a session of traffic may be based on MAC address, IP address
and/or other components of the packets being transferred
򐂰 The outbound path used for this mode of teaming/bonding is decoupled from the return
traffic, in that each side of the aggregation decides, based on its own hash, which NIC to use
for a given session.
򐂰 These modes provide a higher chance of better overall load balance, but do not
guarantee any load balancing. For example, if all traffic in a given session is between just
two hosts (for instance, a large file copy from one host to another), that traffic will generally
only use a single NIC in the team. The return traffic will use whatever link the switch side
hash selects, but will also only pick a single link for that single session. This means that for
a given session, a sending device can only utilize the bandwidth of a single NIC in the team.
򐂰 As noted previously, these aggregation based modes of teaming/bonding are not
supported today when using any of the virtual NIC features available from the Emulex NIC.
That means that if the server has been configured for UFP, Virtual Fabric mode vNIC, or Switch
Independent mode vNIC, these teaming/bonding modes should not be implemented.
򐂰 Some examples of Active/Active teaming modes in this category for various OSs are:
– Linux: Bonding mode 2 - Static aggregation
– Linux: Bonding mode 4 - LACP aggregation
– ESX vSwitch teaming mode Route based on IP hash - Static aggregation
– ESX dvSwitch teaming mode Route based on IP hash - Static or LACP, depending on
LACP setting enabled or disabled in the dvSwitch
The following are some comments on Switch Independent modes of Active/Active, along with
some examples:
򐂰 Like all Switch Independent modes of teaming/bonding, there are no special switch side
architectures or configurations required, and the switch should not be configured for any
form of aggregation
򐂰 These modes use some server side decision making process to select what NIC to use for
what session. In this case, a session is often all traffic from a given VM, or a given process
in a bare metal OS, or destination IP or MAC and so on. The point being that the server
decides how it will load balance the traffic over the NICs
򐂰 The outbound path used for this mode of teaming/bonding is usually not decoupled from
the inbound traffic, in that in most cases, whatever NIC is used to send outgoing traffic
from the host, the switch side will use the same NIC/link for any return traffic (the switch
bases its decision on the MAC learned when the host sent a packet, using that MAC to
return the traffic on the link learned).
򐂰 These modes can provide quite satisfactory load balancing, are not dependent on
having a specific switch architecture or configuration above the host as the Switch
Dependent modes are, and are available in all major operating systems
򐂰 Unlike the aggregation based modes of active/active teaming/bonding, these switch
independent modes of active/active teaming/bonding work fine with any of the virtual NIC
functions available in the Emulex adapters.
򐂰 Some examples of Active/Active Switch Independent modes of teaming/bonding in various
OSs are:
– Linux: Bonding mode 5 - Adaptive Transmit load balance
– Linux: Bonding mode 6 - Adaptive load balance
– ESX vSwitch teaming mode Route based on originating virtual port ID
– ESX dvSwitch teaming mode Route based on source MAC hash
In general, the Switch Dependent modes of Active/Active bonding/teaming have a greater
potential (though no guarantee) of better overall load balance in the team/bond, but they add
complexity, only support certain upstream network architectures, and require the server team
to coordinate with the network team to match the aggregation configurations correctly.
Switch Independent modes of Active/Active, on the other hand, do not require any special
upstream architecture or switch configuration, and can be completely controlled and configured
from the server side, with no need for the server team to coordinate with the
network team (except, of course, for which VLANs to utilize and how (tagged or untagged),
which is always necessary, with or without any sort of teaming/bonding).
Link and path fault detection in teaming/bonding
All teaming/bonding solutions need a way to know if a NIC in the team/bond is available for
use. Most use simple link up/down as the primary method. Some add a layer beyond simple
link up/down to attempt to detect remote failures beyond the direct link (upstream path
failures). In general, most of these remote fault methods use some sort of arp or ping or probe
packet to determine if the path to the other NIC or some upstream device is available, and if
not, take that NIC out of service.
Some examples of non-link fault failure detection technologies:
򐂰 Linux arp-monitoring
򐂰 VMware Beacon probing
򐂰 Broadcom Livelink (third party teaming tool)
All of these remote fault methods have their limitations and can be prone to false positives
(reporting a NIC unavailable when it can still service packets). Some examples of issues with
these remote fault detection methods:
򐂰 In a large data center with potentially thousands of hosts using Linux ARP monitoring and
constantly ARPing the default gateway, this monitoring traffic could eventually become (or at
least be perceived as) a denial of service attack on the default gateway
򐂰 If Beacon probing in ESX is used on a two-NIC team and a failure occurs with both NICs still
in an up state (for example, a path fault not directly at the host, but somewhere in the upstream
L2 network), ESX cannot tell which NIC is having the path issue and begins to send all packets
out both ports, potentially overloading the network and creating new issues. Owing to this,
VMware does not recommend using Beacon probing with two-NIC teams, although it will let
you configure it on one.
Rather than using any of these OS-based remote fault detection methods, it is usually
preferable to utilize the Failover feature of IBM switches. Other vendors often support a
similar failover feature, such as Cisco's Link State Tracking.
See Chapter 6, “Flex System NIC virtualization deployment scenarios” on page 133 for some
examples of Failover configurations in a PureFlex System environment.
The need for end to end paths between NICs in a team
For teaming to work properly, there must be an end to end layer 2 path between the two (or
more) NICs in the team. In other words, if you have a pair of teamed NICs, and a host needs
to use VLAN 10, then VLAN 10 must be carried to both NICs, and that VLAN 10 must have an
external path in the upstream network to connect these two NICs together.
This is required for both failover, and in some configurations, load balancing and normal
traffic, and is true regardless of teaming type (switch dependent or switch independent
modes). This also has implications when using multi-switch aggregations (that is, vPC or vLAG).
In a typical vLAG/vPC environment, a user might have a pair of enclosure switches, running a
vLAG aggregation toward the upstream network. Since the upstream switch thinks this pair of
enclosure based switches are one switch, a host on the enclosure might send a packet that
goes up on a port on one enclosure switch, but the response comes down on a port on the
other enclosure switch (based on the other side's load balancing of transmitted packets).
Owing to this, you must ensure that not only is that VLAN carried on all ports to the server
team, and all ports to the upstream aggregation, but it must also be carried on the ISL links of
the vLAG/vPC. If this were a switch dependent mode of teaming (i.e. aggregation) this VLAN
on the ISL is needed in the event of failover. If this is a switch independent mode of teaming,
then this VLAN on the ISL is required for both failover and normal communications.
5.3.2 OS side teaming/bonding and upstream network requirements
This section looks at the most common NIC teaming and bonding modes for various OSs and
relates them to requirements for the upstream connecting network.
Linux bonding
Linux bonding has evolved over the years to become easier to deploy and more robust. This
section discusses the various modes of bonding available on most Linux implementations.
Most flavors of Linux today come with the bonding module prepackaged, but some versions
still have to have it installed before bonding can be implemented.
Linux offers many different modes of bonding, and not all modes of bonding exist in all flavors
of Linux. But most implementations of Linux support bonding modes 0 through 6, which will
be discussed here.
Linux bonding offers two primary ways to determine if a link is available for server use:
򐂰 mii-mon – This is simple link status up/down, and is the default for bonding
򐂰 ARP monitor – Sends an ARP packet to a specified device and expects a response
There are some helpful documents available on the web that explain bonding, but it is
important to note that much of the Linux bonding documentation has been written by server
admins, not network admins. Thus some of the terms used in these documents and help files
can be confusing to a network admin.
One of the better places to learn about Linux bonding is the following link:
https://www.kernel.org/doc/Documentation/networking/bonding.txt
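A minimal sketch follows (not taken from the bonding documentation above; the interface names eth0 and eth1 are placeholders for the OS enumerated NICs on the Compute Node) showing how a switch independent bond (mode 1, active-backup) and a switch dependent bond (mode 4, 802.3ad LACP) might be created with the iproute2 tools on a recent Linux distribution:

# Switch Independent: active-backup (bonding mode 1), monitored with mii-mon
ip link add bond0 type bond mode active-backup miimon 100
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
ip link set bond0 up

# Switch Dependent: 802.3ad LACP (bonding mode 4); requires a matching LACP
# aggregation on the I/O Module ports facing this server
ip link add bond1 type bond mode 802.3ad miimon 100 xmit_hash_policy layer2+3

# Verify the bonding mode and the state of each slave NIC
cat /proc/net/bonding/bond0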
Table 5-1 on page 122 provides a cross reference between Linux OS side modes of bonding
and their associated switch side requirements when using various Linux bonding modes:
Table 5-1 Linux bonding modes and their associated switch side dependencies, if any

Bond mode | Linux side comments | Type | Type of Agg | Switch side requirements and comments
0 | Round Robin Transmit - also called balance-rr - Xmit load balance per packet | D | Static | Xmit load balance based on hash setting of the switch
1 | Active/Standby - no load balancing - just fault tolerant | I | None | No load balancing of traffic
2 | XOR of hash - also called balance-xor - Xmit load balance based on setting of xmit_hash_policy, Xmit per session load balance | D | Static | Xmit load balance based on hash setting of the switch
3 | Broadcast - Xmits everything out all member interfaces, no load balancing, just fault tolerant | D | Static | Xmit load balance based on hash setting of the switch - can work without switch side aggregation support - see note below
4 | LACP - also called 802.3ad - Xmit load balance based on setting of xmit_hash_policy, Xmit per session load balance | D | LACP | Xmit load balance based on hash setting of the switch
5 | Adaptive Transmit Load balance - also called balance-tlb - Xmit based on current load of NICs in bond | I | None | According to Linux documentation, return traffic is not load balanced (only goes to slave NIC)
6 | Adaptive Load balance - also called balance-alb - Xmit per session load balance | I | None | Load balances return traffic to host based on MAC usage of the host side
Some comments on Table 5-1 on page 122:
򐂰 Type I = A switch Independent mode of bonding
򐂰 Type D = A switch Dependent mode of bonding
򐂰 Mode 0 may lead to out of order packet reception on the receiving device (this mode is
usually only used in some very specific environments, for example, where out of order
packet reception is not an issue)
򐂰 Mode 2 is most aligned with the policies of typical static aggregation on a switch
򐂰 Mode 3 duplicates all packets on each port (this is not a common selection and is rarely
utilized). It could also potentially be used without static aggregation, if each NIC in the
bond went to different physical networks or devices upstream
򐂰 Mode 4 is aligned with the policies of LACP aggregation on a switch
򐂰 Modes 1, 5 and 6 do not require any sort of aggregation configured on the switch side
VMware ESX teaming
VMware ESX offers teaming on its virtual switches including both the stand alone vSwitch
and the distributed vSwitch (dvSwitch).
The forms of teaming available vary slightly between an ESX stand alone vSwitch and the
distributed dvSwitch, with the stand alone vSwitch offering the following four options:
򐂰 Route based on originating virtual Port ID (this is the default - load balances on a per-VM
basis)
򐂰 Route based on IP hash (this is a static aggregation)
򐂰 Route based on source MAC hash (similar to the default)
򐂰 Use Explicit failover order (high availability only, no load balancing)
The dvSwitch offers some of the same modes, but with more options. The following is the list
of teaming options available on the dvSwitch:
򐂰 Route based on originating virtual port (same as stand alone vSwitch)
򐂰 Route based on IP hash (defaults to static aggregation) (same as stand alone vSwitch)
򐂰 Route based on IP hash (Optionally configured for LACP)
򐂰 Route based on source MAC hash (same as stand alone vSwitch)
򐂰 Route based on physical NIC load (attempts to take into account the load on a NIC as they
are allocated to the VMs)
򐂰 Use Explicit failover order (same as stand alone vSwitch)
VMware offers two modes of detecting when a path is down:
򐂰 Link Status – this is simple link up/link down and is the default
򐂰 Beacon Probing – Only useful in vSwitches/dvSwitches with more than 2 NICs
– Do not use beacon probing on vSwitches/dvSwitches with only two NICs
– If the upstream switch offers a failover option (as all of the 4093 models do) it is
encouraged to use that over Beacon Probing
An older document that does a very good job of explaining VMware ESX networking and the
kinds of teams supported can be found at the following link (does not include the modes
available in the dvSwitch):
http://www.vmware.com/files/pdf/virtual_networking_concepts.pdf
Some good information specific to the dvSwitch can be found in the following link:
http://www.vmware.com/files/pdf/vsphere-vnetwork-ds-migration-configuration-wp.pdf
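As a rough sketch (assuming ESXi 5.x, shell access to the host, and a standard vSwitch named vSwitch0; these names are placeholders), the teaming and failure detection policies described above can also be set from the command line with esxcli:

# Switch Independent: route based on originating virtual port ID, link status failure detection
esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --load-balancing=portid --failure-detection=link

# Switch Dependent: route based on IP hash (requires a matching static aggregation on the I/O Module)
esxcli network vswitch standard policy failover set --vswitch-name=vSwitch0 --load-balancing=iphash

# Display the resulting teaming policy for verification
esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0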
Table 5-2 provides a cross reference between OS side modes of teaming and their
associated switch side requirements when utilizing VMware ESX teaming:
Table 5-2 VMware teaming modes and their associated switch side dependencies, if any

Mode of teaming | VMware side comments | Type | Type of Agg | Switch side requirements and comments
Route based on originating virtual port ID | Load balances NICs in the vSwitch on a per-VM basis - this is the default teaming mode for an ESX vSwitch | I | None | Load balances return traffic to host based on MAC usage of the host side
Route based on IP hash | This is a static aggregation on the ESX links in the stand alone vSwitch. When used on a dvSwitch portgroup with the uplinks configured for LACP, this is an LACP aggregation - see below | D | Static | Xmit load balance based on hash setting of the switch
Route based on source MAC hash | Similar to the default teaming mode (per-VM), except it selects the outbound NIC based on the source MAC rather than the originating virtual port ID | I | None | Load balances return traffic to host based on MAC usage of the host side
Use explicit failover order | Always uses the highest order uplink from the list of Active adapters that is up. No load balancing | I | None | No load balance
LACP | Only available on the Distributed vSwitch - can only be configured from the vSphere Web Client (not the traditional vSphere Client) - when configured, all PortGroups using this uplink pair must be set to Route based on IP hash | D | LACP | Xmit load balance based on hash setting of the switch
Route based on physical NIC load | Chooses path based on physical NIC load - only available on the Distributed vSwitch (dvSwitch) | I | None | Load balances return traffic to host based on MAC usage of the host side
Some comments on Table 5-2 on page 124:
򐂰 Type I = A switch Independent mode of teaming
򐂰 Type D = A switch Dependent mode of teaming
򐂰 When NICs are added to a vSwitch, they can be assigned to active or standby roles
independently of the mode of teaming assigned.
򐂰 vSwitch teaming modes can be overridden by vSwitch PortGroup teaming settings
Windows Server teaming
Teaming in a Windows Server environment can be quite varied. For Windows Server 2008
and 2003, teaming was only provided by a third-party application supplied by the NIC
vendor. Starting with Windows Server 2012, there is a choice of using either a vendor's
third-party application or the built-in teaming provided by Windows Server 2012.
For Windows versions (2012) that have native teaming ability, it is usually best to use the
built-in native teaming, and only install a third-party vendor application if some special feature
is needed that is not available in the built-in teaming in Windows.
Teaming using the native modes available in Windows Server 2012
As noted, Windows Server 2012 offers built in NIC teaming, also referred to as LBFO (Load
Balance/Failover) in some of their documentation.
Microsoft refers to their teaming options as either switch independent mode, or switch
dependent mode, with the same meaning we have been applying in this chapter.
When selecting the teaming mode in Windows 2012, the user is presented with three options:
򐂰 Static Teaming
Also referred to as generic aggregation in some Microsoft documentation, represents a
static aggregation and is switch dependent, requiring a static aggregation to be configured
on the switch.
򐂰 Switch Independent
As the name implies, represents a switch independent mode of teaming (no aggregation
configuration needed on the switch). How it utilizes the NICs for load balance is a separate
setting.
򐂰 LACP
Also referred to as 802.1AX in some Microsoft documentation (AX being the latest IEEE
standard for LACP, replacing the older 802.3ad LACP standard), is a switch dependent
mode of teaming that requires LACP be configured on the upstream switch.
Separate from the teaming mode, a user can then select load balance method. In the initial
versions of Windows Server 2012, two types of load balance options existed: Address hash
and Hyper-V Port. Address hash utilizes information from the IP addresses in the packets to
determine load balance. Hyper-V port attempts to load balance on a per vPort basis (not
related to the term vPort as configured in IBM UFP virtual NIC settings).
As of Windows Server 2012 R2, Microsoft has added a third load balance option, dynamic
load balance, that attempts to also factor in NIC utilization to distribute the loads. Details on
this and other aspects of teaming load balancing for Windows Server 2012 can be found in a
document available from the following location:
http://www.microsoft.com/en-us/download/confirmation.aspx?id=40319
As noted, Windows Server 2012 also still allows third-party NIC vendor teaming applications,
but Microsoft strongly recommends that no system administrator ever run two teaming
solutions (built-in Windows teaming and third-party vendor teaming) at the same time on the
same server. So use the built-in teaming or a third-party tool, but never both at the same time.
Another good Microsoft document explaining Windows Server 2012 NIC teaming can be
found at the following link:
http://www.microsoft.com/en-us/download/details.aspx?id=30160
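As a brief sketch (the team and NIC names used here are placeholders, and the Dynamic load balancing option assumes 2012 R2), the three teaming modes described above map to the -TeamingMode parameter of the Windows Server 2012 PowerShell cmdlet New-NetLbfoTeam, with the load balance method selected separately through -LoadBalancingAlgorithm:

# Switch Independent teaming with Hyper-V port load balancing (no switch side aggregation needed)
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# Switch Dependent LACP teaming (requires LACP configured on the connecting I/O Module ports)
New-NetLbfoTeam -Name "Team2" -TeamMembers "NIC3","NIC4" -TeamingMode LACP -LoadBalancingAlgorithm Dynamic

# Review the resulting teams
Get-NetLbfoTeam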
Table 5-3 provides a cross reference between Windows Server 2012 OS side modes of
teaming and their associated switch side requirements:
Table 5-3 Windows 2012 teaming modes and their associated switch side dependencies, if any

Mode of teaming | Windows 2012 side comments | Type | Type of Agg | Switch side requirements and comments
Switch Independent (all load balancing is controlled by the server side) | Load balance options are set independently of the teaming mode selection: Address hash (attempts to load balance based on IP addressing information in the packets), Hyper-V port (load balances the NICs on a per-VM basis), Dynamic (only with R2 or later; attempts to assign outbound flows based on IP addresses, TCP ports, and NIC utilization) | I | None | Load balances return traffic to host based on MAC usage of the host side
Static Teaming | Microsoft uses the names Generic Trunking and IEEE 802.3ad draft v1 in some of their documentation to refer to a static aggregation | D | Static | Xmit load balance based on hash setting of the switch
LACP | Microsoft uses the name IEEE 802.1AX LACP in some of their documentation to mean an LACP aggregation | D | LACP | Xmit load balance based on hash setting of the switch
Some comments on Table 5-3 on page 126:
򐂰 Type I = A switch Independent mode of teaming
򐂰 Type D = A switch Dependent mode of teaming
򐂰 Both static teaming and LACP modes can also be set to use one of the three available
hash methods (Address hash, Hyper-V port, and, with 2012 R2, Dynamic)
򐂰 Active/Standby teaming is available as a function of building one of the above mode
teams, and then choosing to put a member of the team into standby
Teaming using third party vendor applications for Windows
As noted, for Windows Server 2008 or Windows Server 2003, a vendor supplied application is
required to implement any form of NIC teaming.
Which vendor application you choose is mostly based on the vendor of the NIC in use on the server.
This section discusses two of the more common NIC vendors, Broadcom and Emulex, and
their tools.
Broadcom provides an application named Broadcom Advanced Server Program (BASP) that
runs inside of the Broadcom Advanced Control Suite (BACS) to provide teaming services in
Windows 2003/2008. It supports many of the Broadcom NICs as well as some Intel NICs. For
a list of supported NICs and an introduction to this product, see the following link:
http://www.broadcom.com/support/ethernet_nic/management_applications.php
Broadcom BASP supports four primary teaming modes, as noted in Table 5-4, and also has a
form of remote path failure detection known as LiveLink. LiveLink requires an IP address on
the team interface and separate IP addresses on each of the physical NICs. As with all forms of
NIC teaming remote path detection discussed in this document, a more robust choice is
usually to make use of the switch side Failover feature.
A good document on using BASP can be found at the following link:
http://www.broadcom.com/docs/support/ethernet_nic/Broadcom_NetXtremeII_Server_T7.8.pdf
Table 5-4 provides a cross reference between OS side modes of teaming and their
associated switch side requirements, when utilizing Windows and the Broadcom Advanced
Server Program:
Table 5-4 Broadcom third party teaming modes and their associated switch side dependencies, if any

Mode of teaming | Windows/Broadcom side comments | Type | Type of Agg | Switch side requirements and comments
Active/Standby | Active NIC carries all traffic until it fails, then the standby NIC takes over. No load balancing | I | None | No load balance
Smart Load Balance (SLB) - with or without auto failback | Attempts to load balance based on IP flows. With failback enabled, if a NIC that had failed comes back up, teaming will attempt to switch traffic back to that NIC | I | None | Load balances return traffic to host based on MAC usage of the host side
Generic Trunking (FEC/GEC)/802.3ad-Draft Static | This is a typical static aggregation implementation. Broadcom also refers to this as (FEC/GEC)-802.3ad-Draft Static | D | Static | Xmit load balance based on hash setting of the switch
Link Aggregation (802.3ad) | This works with LACP aggregations | D | LACP | Xmit load balance based on hash setting of the switch
Some comments on Table 5-4 on page 127:
򐂰 Type I = A switch Independent mode of teaming
򐂰 Type D = A switch Dependent mode of teaming
򐂰 This BASP tool can also be used to create VLAN tagged interfaces
Emulex is another vendor that offers third party teaming for Windows Server 2003 and 2008
platforms. Emulex refers to their teaming application as OneCommand NIC Teaming and
VLAN Manager, and also offers four primary modes of teaming, as noted in Table 5-5 on
page 128.
Emulex also uses the terms switch independent and switch dependent modes of teaming in
their documentation, which can be found at the following link:
http://www-dl.emulex.com/support/windows/windows/240005/nic_teaming_manager.pdf
Table 5-5 provides a cross reference between OS side modes of teaming and their
associated switch side requirements, when utilizing Windows and the Emulex OneCommand
application.
Table 5-5 Emulex third party teaming modes and their associated switch side dependencies, if any

Mode of teaming | Windows/Emulex side comments | Type | Type of Agg | Switch side requirements and comments
Failover (FO) | Simple Active/Standby - no load balancing | I | None | No load balance
Smart Load Balance (SLB) - also called just "Load Balance" | Attempts to load balance based on IP hash setting | I | None | Load balances return traffic to host based on MAC usage of the host side
Generic trunking - Link aggregation static mode (802.3ad static aggregation) | This is a typical static aggregation implementation | D | Static | Xmit load balance based on hash setting of the switch
Link Aggregation Control Protocol (LACP) | This works with LACP aggregations | D | LACP | Xmit load balance based on hash setting of the switch
Some comments on Table 5-5 on page 128:
򐂰 Type I = A switch Independent mode of teaming
򐂰 Type D = A switch Dependent mode of teaming
򐂰 The Emulex tool can also be used to create VLAN tagged interfaces
5.3.3 Discussion of physical NIC connections and logical enumeration
From a physical perspective, all physical NICs are hard wired to a specific I/O Module bay and
specific port on those I/O Modules in the Flex System chassis. Examples of these fixed
physical connections can be seen in 3.1, “Enterprise Chassis I/O architecture” on page 28.
Any virtual NICs that are created on top of a physical NIC can naturally only connect to
wherever the physical NIC it was created from connects to. Although a given physical NIC
always goes to a specific physical I/O Module and port, how the OS enumerates (names)
these NICs can be confusing and downright illogical at times.
Knowing what OS enumerated NIC physically rides on top of what physical NIC (and thus
where it connects to what I/O Module in the Flex System) is important for the server
administrator. Understanding this logical to physical mapping allows proper NIC selection
when building teamed/bonded designs. If this relationship is not understood, and a team or
bond is built from two NICs that happen to go to the same switch, the team would provide increased
bandwidth and NIC redundancy, but it would not provide redundancy in the event of an I/O
Module failure.
As an example of OS enumeration, Figure 5-53 represents a Compute Node in a PureFlex
System environment, not configured for any virtual NIC technology, and how VMware ESX
might typically enumerate those physical NICs.
Figure 5-53 Dual port physical NIC enumerated in a VMware ESX host
As can be seen, the OS enumerated NIC vmnic0 has been associated with physical NIC 0
that connects to the I/O Module in bay 1, and the OS enumerated vmnic1 has been
associated with the physical NIC1 that connects to I/O Module bay 2. In this case, putting
these two NICs in a team/bond would provide full redundancy, which is straightforward and orderly.
If we then look at an ESX host that had been installed when the NICs had been set for one of
the virtual NIC modes we might see what is represented in Figure 5-54 (NICs configured for
virtual fabric mode with no iSCSI or FCoE personality selected).
Figure 5-54 Dual port physical NIC in a virtual NIC mode enumerated in a VMware ESX host
Notice the enumeration sequence as seen Figure 5-54 is also very orderly, and could be
readily utilized to determine best pairs of NICs for teaming/bonding (for example, vmnic0 and
vmnic1 in a team/bond, vmnic2 and vmnic3 in a team/bond, and so on) to provide I/O Module
redundancy.
Although this orderly enumeration is frequently the case, it is not always how it works out (true
of all operating systems, not just ESX shown in this example). In some cases, the
enumeration may be in a completely different order than might be expected. For example, if a
user had installed VMware when virtual NICs were enabled, and then disabled the virtual
NICs and booted back up into the OS, the remaining physical NICs may not be sequential or
logical in the OS.
In the case of underlying NIC configuration changes, one way (although disruptive) to force
the OS to re-enumerate the NICs in proper order is to reinstall the OS, and let it rediscover the
current NIC structure. Perhaps simpler is to rename the NICs in the OS (some OSs provide
this ability).
Even with a reinstall though, there are times when the OS just seems to want to provide less
than obvious enumerations of the NICs, and this can be problematic. How can a user
determine what OS named NIC is mapped to what physical NIC and I/O Module?
There are several ways to help figure out what OS NIC is associated with what physical NIC.
One of the simplest is to go into the I/O Module and shut down one of the physical ports toward
the Compute Node, and see which NICs the OS then reports as disconnected. Of course, this
is a disruptive operation, so it is not necessarily a good choice in a production environment.
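For example, using the isCLI conventions shown in Chapter 6 (a hedged sketch; INTA3 is a placeholder for the internal port being tested), a single internal port could be temporarily disabled and then re-enabled:

interface port INTA3
 shutdown
! check which NIC the OS now reports as disconnected, then restore the port
 no shutdown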
Perhaps a less disruptive way is to make note of MAC addresses in the OS, and look in the
I/O Module MAC address table to determine what physical port they came in on. But this can
be a little more complicated with OSs that do not use the physical NIC MACs.
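As a hedged example of this MAC based approach with an ESXi host, the host side MAC addresses and PCI devices can be listed and then compared against the MAC addresses learned by the I/O Module:

# On the ESXi host: list each vmnic with its PCI device and MAC address
esxcli network nic list

# On the I/O Module (isCLI): display the learned MAC address table, including the port each MAC was learned on
show mac-address-table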
One fairly accurate, if somewhat time consuming, method to make this determination is to go into the
UEFI F1 setup, into the Network screen for the NICs, and make note of the information there
to compare to information related to each logical NIC in the OS.
Figure 5-55 represents an example of what might be seen on this Network screen:
Figure 5-55 Example of MAC and PCI Function Address numbering of virtual NICs
This screen provides both the MAC address and PCI Function Address (PFA) information for
each physical or logical NIC, which can then be used in the server OS to figure out what OS
enumerated names are related to the physical (or logical) NICs in hardware. The following two
examples show the MAC and PFA info for comparison and contrast between the physical and
then converted NICs for a dual port LoM NIC. Example 5-1 represents the values as seen for
an onboard NIC not in any virtual NIC mode, along with which physical I/O Module bays those
physical NICs connect to. Example 5-2 represents that same onboard NIC after conversion to
some form of virtual NIC mode.
Example 5-1 Example of onboard dual port NIC not in any virtual NIC mode
MAC: 34:40:B5:BE:83:D0 Onboard PFA 12:0:0  physical NIC-0 to I/O Module bay 1
MAC: 34:40:B5:BE:83:D4 Onboard PFA 12:0:1  physical NIC-1 to I/O Module bay 2
As can be seen in Example 5-2, the original physical NIC and PFA information have been
inherited by the first two virtual NICs, followed by the other 6 virtual NICs and their associated
MAC, PFA info, and what I/O Module (based on the underlying physical connections of the
physical NIC) they connect to.
Example 5-2 Example of onboard dual port NIC after converting in to virtual NIC mode
MAC: 34:40:B5:BE:83:D0 Onboard PFA 12:0:0  physical NIC-0 to I/O Module bay 1
MAC: 34:40:B5:BE:83:D4 Onboard PFA 12:0:1  physical NIC-1 to I/O Module bay 2
MAC: 34:40:B5:BE:83:D1 Onboard PFA 12:0:2  physical NIC-0 to I/O Module bay 1
MAC: 34:40:B5:BE:83:D5 Onboard PFA 12:0:3  physical NIC-1 to I/O Module bay 2
MAC: 34:40:B5:BE:83:D2 Onboard PFA 12:0:4  physical NIC-0 to I/O Module bay 1
MAC: 34:40:B5:BE:83:D6 Onboard PFA 12:0:5  physical NIC-1 to I/O Module bay 2
MAC: 34:40:B5:BE:83:D3 Onboard PFA 12:0:6  physical NIC-0 to I/O Module bay 1
MAC: 34:40:B5:BE:83:D7 Onboard PFA 12:0:7  physical NIC-1 to I/O Module bay 2
As noted, now that we know this MAC and PFA information (as well as their relationship to the
underlying physical NIC and where it connects to), it is usually possible to go into the OS and
locate either the MAC or PFA information associated with the OS enumerated name (for
example, in the Device Manager in Windows Server 2012), and thus regardless of the
enumerated name, know where each vNIC connects to.
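On Windows Server 2012, as a hedged sketch, the same MAC and PCI function information can also be gathered with PowerShell rather than the Device Manager GUI:

# List adapters with their OS enumerated names and MAC addresses
Get-NetAdapter | Format-Table Name, InterfaceDescription, MacAddress

# Show the PCI bus/device/function of each adapter for comparison with the PFA values from UEFI F1 Setup
Get-NetAdapterHardwareInfo | Format-Table Name, Bus, Device, Function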
Regardless of how it is determined, getting the proper pair of NICs into a team/bond is always
important to ensure the desired high availability is achieved.
Chapter 6. Flex System NIC virtualization
deployment scenarios
This chapter provides details on various aspects of NIC virtualization as well as their
interactions with a number of I/O Module features.
The following topics are covered:
򐂰 6.1, “Introduction to deployment examples” on page 134
򐂰 6.2, “UFP mode virtual NIC and Layer 2 Failover” on page 137
򐂰 6.3, “UFP mode virtual NIC with vLAG and FCoE” on page 149
򐂰 6.4, “pNIC and vNIC Virtual Fabric modes with Layer 2 Failover” on page 163
򐂰 6.5, “Switch Independent mode with SPAR” on page 189
6.1 Introduction to deployment examples
This chapter provides examples for deploying the PureFlex I/O Modules and virtual NIC
functionality in a number of different scenarios. Also provided are helpful commands to
confirm the environment is operating as designed.
It is important to note that the examples provided may or may not reflect an exact combination
of features that an average environment might include; they were chosen primarily to demonstrate
the interoperation of features and their associated configurations. The following combinations of
features will be presented in this chapter:
򐂰 UFP mode virtual NIC and Layer 2 Failover
򐂰 UFP mode virtual NIC with FCoE and vLAG
򐂰 Virtual Fabric mode vNIC and Physical NIC with Layer 2 Failover
򐂰 Switch Independent mode vNIC with SPAR
The above combinations are not necessarily indicative of any specific restriction as to what
works with what, or on what model of I/O Module, but some features and combinations of
features do not interoperate with others, or are not available on all I/O Modules. Some
considerations in this regard:
򐂰 NIC virtualization features
– All forms of vNIC are mutually exclusive of each other on the server side. In other
words, a given server can be set for UFP or Virtual Fabric mode vNIC, or Switch
Independent mode (or disabled for virtual NIC), but not more than one of these can be
set at one time on that server.
– On the switch side related to virtual NICs, UFP and Virtual Fabric mode vNIC are also
mutually exclusive of each other, in that you can enable one or the other, but not both at
the same time. Switch Independent mode vNIC can be enabled on a host, connecting
to an I/O Module that is configured for UFP or Virtual Fabric mode vNIC, but only if the
I/O Module ports facing this host are in physical mode (not enabled/configured for UFP
or Virtual Fabric mode vNIC).
򐂰 Switch virtualization features
– SPAR, vLAG and Stacking are all mutually exclusive on a given I/O Module.
– SI4093 does not support vLAG or Stacking, but does support SPAR. The SI4093 also
does not support any of the I/O Module based virtual NIC technologies (UFP or Virtual
Fabric vNIC) but like all I/O Modules, supports Switch Independent mode vNIC running
on the host.
– The I/O Module based Failover feature is supported with all modes of virtual NIC, but
implemented differently depending on the mode of virtual NIC (Virtual Fabric vNIC is
configured on a per vNIC group basis, Switch Independent vNIC is configured using
global failover per physical port, and UFP is also configured using global failover, but
on a per vPort basis).
The above notations are provided as examples, but other restrictions may apply. These
are noted in more depth in chapters 3 and 4.
Important: Unless otherwise noted, all configuration examples and commands in this
document are based on using the industry standard CLI (isCLI) of the PureFlex I/O
Modules. By default today, the EN4093R and CN4093 use the menu driven CLI (this may
change in the future). If an I/O Module is in the menu driven CLI mode, to make use of
these examples it is first necessary to change to isCLI mode. The simplest way to get into
isCLI mode from the menu driven CLI is to issue the menu CLI command /boot/prompt
ena, and then exit out and log back in. Upon logging back in, you will be offered the option
to select the desired CLI.
6.2 UFP mode virtual NIC and Layer 2 Failover
Unified Fabric Port (UFP) provides the ability to carve up 10 Gb ports into virtual NICs, as seen
in Chapter 5, “NIC virtualization considerations on the server side” on page 75. Layer 2
Failover, seen in other chapters throughout this book, provides the ability to detect uplink
failures and systematically disable all INT ports. Layer 2 Failover with UFP takes that process
a step further and automates the shutdown not only of a physical NIC but also of a UFP vPort
virtual NIC.
This section will provide diagrams and configuration examples for setting up UFP and Layer 2
Failover. The following topics are covered:
򐂰 6.2.1, “Components”
򐂰 6.2.2, “Topology”
򐂰 6.2.3, “Use Cases” on page 139
򐂰 6.2.4, “Configuration” on page 139
6.2.1 Components
This deployment scenario uses the following equipment:
򐂰 Flex System Enterprise Chassis
򐂰 x240 Compute Node (in bay 3)
– Running ESXi 5.5
– Dual port Emulex LoM NIC
• Physical NIC disabled in UEFI
– Quad port CN4054 NIC in Mezz slot 2
• First 2 physical NICs have UFP configured and FCoE personality enabled
• Second 2 NICs have Virtual NIC disabled (in physical NIC mode)
򐂰 Two CN4093s in switch bays 3 and 4
򐂰 Two G8264 to act as upstream Ethernet connectivity running vLAG
6.2.2 Topology
The x240 Compute Node OS running ESXi will be utilizing vSwitch0 using its default NIC
team setting route based on originating virtual port to the pair of CN4093s. The first two
ports within the UEFI of the CN4054R Emulex Quad Port NIC will be running in UFP mode.
The CN4093s are running as independent I/O Modules with UFP enabled on vPort (.1) in
Tunnel mode and vPort (.2) in FCoE mode.
Tunnel mode is utilizing EXT1 and EXT2 which are in an IEEE 802.3ad LACP PortChannel
with adminkey 4344. The PortChannel, along with INT port 4 UFP vPort (.1) are members of
a failover trigger.
As seen in Figure 6-1 below, a single I/O Module is presented to show connectivity
between the Compute Node and the external network.
Figure 6-1 failover trigger with an active failure
In Figure 6-1 above, EXT1 and EXT2 form a PortChannel, which is also a member of a
failover trigger. The failover trigger is configured to allow only a single port to fail before it
disables the associated INT vPorts. In this example we are using Auto Monitor with VLAN awareness.
There are two forms of failover triggers that can be configured:
򐂰 AMON - Auto Monitor, which allows for tracking of a physical uplink, static PortChannel, or
LACP PortChannel. When the uplink fails, the I/O Module will automatically disable any
associated INT ports or vPorts that are associated with any of the VLANs also assigned to
the Monitor Port.
򐂰 MMON - Manual Monitor, which also allows for tracking of the same uplink types as AMON;
upon failure, it will disable any manually configured INT ports or vPorts associated with that
trigger.
򐂰 Limit is a mechanism that is part of failover and can be applied on a per trigger basis. In
this example, the limit is set to 1 within trigger 1. Limit 1 represents the number of ports that
must be up and forwarding before a failover is triggered. Once the limit is met, failover will
trigger an event and disable all INT ports or vPorts associated with that trigger.
There are a couple of different ways a failure can occur. The most clearly understood way is
a loss of link on the physical port. The second way a failure can occur is through the
spanning-tree state. When a VLAN that has spanning-tree enabled on the uplink or
PortChannel enters a non-forwarding state, the I/O Module sees this as a failure and
triggers a failover event, disabling any of the INT ports and/or vPorts associated with that
trigger. A spanning-tree non-forwarding failure event can occur with either the AMON or MMON
type of Failover.
6.2.3 Use Cases
Failover can be extremely useful when NIC teaming/bonding is utilized on the Compute
Nodes. Because the I/O Module sits between the Compute Node and the upstream network, a
Compute Node has no way of detecting an outage beyond its physical connection and can
end up sending traffic to a black-holed I/O Module. For this reason, failover is a significant
feature that allows customers to implement an HA environment with peace of mind that, if
a failure does occur, their applications can survive with full access through their redundant
connection to the network.
6.2.4 Configuration
This section includes the configurations and steps necessary to configure the various
components. This will not include the upstream G8264s as that is not the focus of this section
(but it will include the configuration for the uplinks in the CN4093’s toward the G8264).
Host side configuring (OS/UEFI)
The process of configuring the UEFI is the same for any operating system that resides on an
Intel based Compute Node.
In Figure 6-2 below, the UEFI Emulex NIC Selection page is found within System
Settings  Network  Network Device List {the NIC on which to enable UFP}. Once here,
select Multichannel Mode  IBM Unified Fabric Protocol Mode. After making the
change, step all the way back out to System Configuration and Boot Management by pressing
ESC and select Save Settings. Once enabled on one port of a two-port ASIC, the settings will
automatically be applied to the other port(s).
Figure 6-2 UEFI Emulex NIC Selection settings
In Figure 6-3 vmnic2 and vmnic3 are associated with UFP port 4 vPort (.1) on each of the
CN4093s. This is representing a healthy management network as both vmnics are being
listed as Connected.
Figure 6-3 ESXi Management with both redundant ports showing Connected
In Figure 6-4 below, the associated vSwitch, which is utilizing vmnic2 and vmnic3, is also
showing connected.
Figure 6-4 ESXi vSwitch with redundant vmnics
Switch side configuration
This subsection explains switch side configuration. The following options are covered:
򐂰 “Base Configuration of I/O Module”
򐂰 “Auto Monitor (AMON)” on page 142
򐂰 “Manual Monitor (MMON)” on page 143
򐂰 “View from Flex System Chassis with 2x CN4093s” on page 144
Base Configuration of I/O Module
Although the base configuration and the following failover configurations all utilize a pair of
CN4093 I/O Modules, the steps below can also apply to the EN4093/R, with potentially minor
EXT port reassignments, because the CN4093 has a different EXT port alignment than either of
the EN4093 I/O Modules.
1. The first step, if utilizing a PortChannel as the uplink, is to create an LACP 802.3ad
PortChannel. In Example 6-1 on page 141, four ports are utilized as the uplink, providing
40 Gb of unidirectional bandwidth. Also configured is the tagpvid-ingress setting, because
the vPort will be running in tunnel mode.
Example 6-1 Setting up LACP as the uplink
interface port EXT11-EXT14
lacp mode active
lacp key 5356
tagpvid-ingress
2. The second step is to create the UFP vPorts that will be utilized as the vmembers within
the failover trigger. The Example 6-2 below shows how to setup UFP with a vPort running
in Tunnel Mode.
Example 6-2 Setting up UFP vPort 1 in Tunnel mode
ufp port INTA3,INTA4 vport 1
network mode tunnel
network default-vlan 4091
qos bandwidth min 50
enable
exit
ufp port INTA3 enable
ufp port INTA4 enable
ufp enable
3. Since the I/O Modules will be running in UFP Tunnel mode and not participating in
spanning tree the option of shutting down spanning-tree globally is provided in
Example 6-3 below.
Example 6-3 globally disabling spanning-tree
spanning-tree mode disable
Now that the I/O Module has been completely set up to support both the uplink PortChannel
and the UFP INT ports, the next step is to decide whether to utilize Auto Monitor (AMON) or
Manual Monitor (MMON). Both AMON and MMON have their advantages.
With AMON, in combination with UFP globally enabled, VLAN monitoring must be enabled
before you can enable a failover trigger. VLAN monitoring allows the I/O Module to disable only
those vPorts that carry the same VLAN ID as the uplink or PortChannel assigned to
that failover trigger. All other vPorts remain unaffected, even those on the same physical INT
port as a failed vPort.
With MMON, the meaning of the word “Manual” is exactly that. The I/O Module must be
configured with both the Monitor port or PortChannel (EXT ports) and the Control members and/or
vmembers (INT ports). MMON is perhaps the more commonly used option, as it provides greater
control of what gets disabled during an uplink outage.
Auto Monitor (AMON)
In Figure 6-5 below, two of the four uplinks within the LACP PortChannel have failed. Because the
limit is set to 2 (that is, two ports left up), a failover event occurs in that I/O Module, causing all
vPorts associated with the same VLANs carried on the PortChannel to also fail.
Figure 6-5 Auto Monitor failure
The I/O Module configuration in Example 6-4 below consists of a trigger with Auto Monitor
enabled. This trigger is set to fail all control members and/or vmembers, with a limit of 2,
if the number of forwarding uplink ports drops to the specified failover limit.
Example 6-4 Failover Trigger with amon configuration
failover enable
failover vlan
failover trigger 1 limit 2
failover trigger 1 amon admin-key 5356
failover trigger 1 enable
Note: The VLAN trigger requirement with AMON is only necessary if UFP is enabled. AMON
failover also works without VLAN tracking when UFP is not enabled.
Manual Monitor (MMON)
In Figure 6-6 below, two of the four uplinks within the LACP PortChannel have failed. Because the
limit is set to 2 (that is, two ports left up), a failover event occurs in that I/O Module, causing all
vPorts manually configured as control ports to also fail.
Figure 6-6 Manual Monitor failure
The I/O Module configuration in Example 6-5 below consists of a trigger with Manual Monitor
enabled. This trigger is set to fail all control members and/or vmembers, with a limit of 2,
if the number of forwarding uplink ports drops to the specified failover limit.
Example 6-5 Failover Trigger with mmon configuration
failover enable
failover trigger 1 limit 2
failover trigger 1 mmon monitor admin-key 5356
failover trigger 1 mmon control vmember INTA3.1
failover trigger 1 mmon control vmember INTA4.1
failover trigger 1 enable
The biggest difference between AMON and MMON is that AMON uses the VLANs associated
with the EXT ports and triggers a failure event that disables only those vPorts associated with
the same VLANs as the uplink defined within the trigger.
Verification of proper configuration, with show commands, can be seen in “Confirming
operation of the environment” on page 144.
View from Flex System Chassis with 2x CN4093s
Figure 6-7 shows a view of two CN4093s with UFP and Failover enabled. This scenario is
identical to the two scenarios above, allowing the redundant link to take 100% of the
bandwidth after a failure of the primary ESXi vmnic.
Figure 6-7 Flex Chassis with 2x CN4093s with failover enabled
6.2.5 Confirming operation of the environment
Upon completion of the above steps there are several show commands that can display
whether or not Failover is working as expected with the desired configuration.
The first and easiest example, as seen in Example 6-6 on page 145, is to display the
status of the vPorts. Issuing the show ufp information port command displays the
health of each vPort. INTA3 and INTA4 Channel 1 (that is, vPort (.1)) are both showing disabled;
however, notice the asterisk next to the word disabled. This indicates, as also noted at the
bottom of the example, that the vPort has been disabled due to a UFP failover trigger,
meaning that the number of failed uplinks, either the entire uplink(s) or the limit, has been
reached.
Important: Channel 2 (i.e. vPort (.2)) is still up and forwarding as those vPorts were not
members of a trigger with a failed event.
Example 6-6 UFP vPort status
CN4093a(config)#show ufp information port
-----------------------------------------------------------------
Alias Port state vPorts chan 1 chan 2 chan 3 chan 4
------- ---- ----- ------ --------- --------- --------- ---------
INTA1 1 dis 0 disabled disabled disabled disabled
INTA2 2 dis 0 disabled disabled disabled disabled
INTA3 3 ena 2 disabled* up disabled disabled
INTA4 4 ena 2 disabled* up disabled disabled
INTA5 5 dis 0 disabled disabled disabled disabled
.
.
.
* = vPort disabled due to UFP teaming failover
Example 6-7 shows the results of the command show portchannel information, which displays
the number of ports that have failed. As you can see below, the number of ports left up and
forwarding is two. The failover trigger limit is also set to 2, so the limit has been reached, which
forced a failure event and disabled all INT vPorts. This command is especially useful for
determining whether the failure event was caused by link status or by spanning-tree block
status.
Example 6-7 displaying which ports within a PortChannel are still forwarding
CN4093a(config)#show portchannel information
PortChannel 65: Enabled
Protocol - LACP
Port State:
EXT13: STG 1 forwarding
EXT14: STG 1 forwarding
The next two examples, Example 6-8 and Example 6-9 on page 146, display the full status of a
failover trigger. This might be the easiest command to run to find out whether a trigger has
been activated or not.
In Example 6-8 notice that the limit is set to 2 with three of the four ports still remaining in
Operational status. Because the limit has not been met the failover trigger has not kicked in.
Example 6-8 Healthy Trigger state
CN4093a(config)#show failover trigger 1 information
Trigger 1 Manual Monitor: Enabled
Trigger 1 limit: 2
Monitor State: Up
Member Status
--------- -----------
adminkey 5356
EXT11 Operational
EXT12 Failed
EXT13 Operational
EXT14 Operational
Control State: Auto Controlled
Member Status
--------- -----------
Virtual ports
INTA3.1 Operational
INTA4.1 Operational
In Example 6-9 notice that the limit is set to 2 and there are only two ports remaining in
Operational status. Because the limit has now been met the failover trigger has kicked in and
put the associated vPorts into a Failed state.
Example 6-9 Failed Trigger state
CN4093a(config)#show failover trigger 1 information
Trigger 1 Manual Monitor: Enabled
Trigger 1 limit: 2
Monitor State: Down
Member Status
--------- -----------
adminkey 5356
EXT11 Failed
EXT12 Failed
EXT13 Operational
EXT14 Operational
Control State: Auto Disabled
Member Status
--------- -----------
Virtual ports
INTA3.1 Failed
INTA4.1 Failed
We can also see disconnects from the host side, indicating that a physical or logical
connection has been terminated.
In Figure 6-8, vmnic2 shows Disconnected because the uplinks from the I/O Module to the network have
been severed (or are spanning-tree blocked), causing a failover trigger response on the
associated vPorts.
Figure 6-8 vmnic2 failure - VMware Management
In Figure 6-9 below, vSwitch0 is now displaying disconnected on vmnic2 and has failed over to
its redundant (standby) vmnic3. When this happens, the traffic that was originally on vmnic2
is now running over vmnic3 and up through I/O Module 4.
Figure 6-9 vmnic2 failure - vSwitch
In Example 6-10, using a Linux command line, a failure of about 2 seconds (two lost ICMP
pings) was experienced during a failover trigger event.
Example 6-10 ICMP ping loss due to failover trigger between I/O Modules
64 bytes from 9.42.171.170: icmp_seq=580 ttl=64 time=0.580 ms
Request timeout for icmp_seq 581
Request timeout for icmp_seq 582
64 bytes from 9.42.171.170: icmp_seq=583 ttl=64 time=0.468 ms
Important: During a failure event between I/O Modules it is normal to experience up to 3
seconds of packet loss due to network reconvergence.
6.3 UFP mode virtual NIC with vLAG and FCoE
This section discusses the implementation of UFP virtual NIC with FCoE and vLAG
aggregations on the uplinks of a pair of CN4093s.
6.3.1 Components
This deployment scenario will make use of the following equipment:
򐂰 Flex System Enterprise Chassis
򐂰 x240 Compute Node in bay 3
– Running ESX 5.5
– Dual port Emulex LoM CNA
• Not used in this scenario
– Quad port CN4054 CNA in Mezz slot 2
• First two physical CNA ports have UFP configured and FCoE personality enabled
• Second two CNA ports have Virtual NIC disabled - Not used in this scenario
򐂰 v7000 Storage Node in bays 11 - 14 of the Flex System chassis
– Providing remote storage for Compute Node in bay 3
򐂰 Two CN4093 I/O Modules
– Installed in I/O Module bays 3 and 4 for this scenario
– Both with Upgrade 1 FoD installed
– Providing the FCF function between the Compute Node in bay 3, and the storage array
in bays 11-14
򐂰 Two G8264 switches to act as upstream Ethernet connectivity out of the vLAG pair of
CN4093’s
6.3.2 Topology
This scenario takes advantage of the vLAG feature available on the CN4093 to virtualize the
data plane and support cross-switch aggregation, UFP to provide virtual NIC support to the
Compute Node, and FCoE within UFP to offer FCoE-attached storage to the Compute Node
in bay 3.
Some comments on what is being demonstrated:
򐂰 We are using vLAG to provide cross-switch aggregation out of the CN4093s toward the
upstream Top of Rack switches. This provides both HA and improved performance for
these connections to the upstream network.
– We are not doing any vLAG aggregations from the CN4093s toward the Compute
Node in bay 3 (aggregations toward servers running any form of virtual NIC are not
supported at this time)
򐂰 For UFP we will be demonstrating four different vPorts:
– vPort1 in tunnel mode - using the vLAG aggregation of EXT11 on both CN4093s for
the tunnel uplinks out of the I/O Modules
• The uplink for vPorts in tunnel mode should use the tagpvid-ingress command to
strip the outer tag toward the upstream network and re-add it on inbound packets
entering the tunnel
– vPort2 in FCoE mode
• If FCoE is desired, only vPort2 can provide that function. All other vPorts can be any
mode except FCoE
– vPort3 in Access mode, allowing only VLAN 40, untagged
– vPort4 in Trunk mode, allowing VLANs 50 and 60 (VLAN 50 untagged)
• vPort3 and vPort4 sharing vLAG aggregations on ports EXT12 and EXT13 on both
CN4093’s for their uplinks
Figure 6-10 shows how the components of this design come together.
Figure 6-10 Example of vLAG aggregations upstream, UFP and FCoE using CN4093s
6.3.3 Use cases
This scenario is aimed at customers who want highly available upstream connections (vLAG),
virtual NICs on the servers (UFP), and converged storage access (FCoE). As noted
previously, none of these features requires the others (we can have vLAG without UFP, or
UFP without FCoE, and so forth). They are demonstrated together here to show a flexible
and robust design.
6.3.4 Configuration
This section includes the configurations and steps necessary to configure the various
components. The examples here do not include the upstream G8264 configurations, as that
is not the focus of this paper (but they do include the configuration for the uplinks on the
CN4093s toward the G8264s). Also not included here is the creation of the LUNs used in this
scenario; it is assumed they already exist at the time this scenario is built.
The steps required to complete this scenario are broken up into five primary sections:
򐂰 “Host side enablement (UEFI Setup)”
򐂰 “Miscellaneous I/O Module settings”
򐂰 “vLAG and aggregation configurations” on page 152
򐂰 “UFP configuration” on page 154
򐂰 “FCoE configuration” on page 156
Host side enablement (UEFI Setup)
For this example, we need to go into UEFI, configure the desired virtual NIC type (UFP), and
set the personality to FCoE. Not shown are the installation of ESX and the configuration of
the vSwitches and a test VM (Figure 6-10 on page 150 represents the final vSwitch vmnic
usage).
To configure the host to support UFP and FCoE, reboot the server and, when prompted,
press the F1 key to enter Setup. In Setup, go to System Settings → Network, then highlight
the desired NIC and press Enter twice. This takes us to the Emulex NIC Selection menu.
Change Personality to FCoE (this assumes the FCoE FoD key is already installed) and
change Multichannel Mode to Unified Fabric Port (UFP).
After setting FCoE and UFP virtual NIC mode in UEFI, exit UEFI Setup, save the
configuration when prompted, and reboot the Compute Node.
Detailed instructions and screen shots of this process can be found in Chapter 5, “NIC
virtualization considerations on the server side” on page 75.
Miscellaneous I/O Module settings
The following are some preparatory steps before configuring the main features of this
scenario.
Some comments on these commands:
򐂰 In this example we are only using a limited subset of ports (for example, INTA3 and
INTA13-INTA14), but in most cases many ports would perform the same roles, so some of
the commands shown here affect both the ports we use to demonstrate this scenario and
ports that we do not use in this specific scenario.
򐂰 Tagging needs to be enabled on all ports carrying FCoE VLANs, as well as on any port
carrying more than a single VLAN
Important: Changing the Personality and Multichannel modes affects all CNA ports on the
ASIC associated with the port being changed. This means it is only necessary to set this in
one place to enable both CNA ports on the onboard Emulex LOM, or in two places for the
quad-port CN4054 NIC (the CN4054 and CN4054R have two ASICs).
Important: While performing the configurations on the I/O Modules, all uplinks should be
disconnected or disabled until instructed to bring the links up. Making certain configuration
changes on an I/O Module with live connections to an upstream network can cause
instability in the network.
Important: All switch configuration instructions assume we are starting from a factory
default configuration on the I/O Modules. All configuration commands shown executed are
from the conf t mode of the isCLI interface of the I/O Module.
򐂰 A host name is configured (for clarity)
򐂰 An idle logout timer is configured (for reference)
򐂰 Port names are applied to the ports going to the internal v7000 (for clarity)
The commands used to perform these miscellaneous tasks can be seen in Example 6-11:
Example 6-11 Example of preparing switch with base commands
! Enable tagging on all desired ports
! 1-28 = INTA1-INTB14, 43-44 = EXT1-EXT2 (vLAG ISL)
! 54-55 = EXT11-EXT12 (uplink)
int port 1-28,43-44,54-55
tagging
!
! Add host name and set idle time out to 60 minutes
hostname PF_CN4093a
system idle 60
!
! Add port names on INTA13 and INTA14
int port 13-14
name v7000_Storage
Repeat the above steps for the second switch, changing hostname to PF_CN4093b. Once
these base commands are applied we can proceed to creating the vLAG and aggregations.
vLAG and aggregation configurations
Configuring vLAG and aggregation is a multistep process and will include the following steps:
1. Create the aggregation for the vLAG ISL and set the PVID to an unused VLAN (using an
unused VLAN for the PVID on the vLAG ISL helps to increase the stability of the ISL). We
use LACP for all aggregations, but static aggregations could also be used. The LACP keys
are chosen to be unique for each aggregation; the specific key numbers have no other
special meaning
2. Disable Spanning-tree on the PVID VLAN of the ISL (also helps ensure stability of the ISL)
3. Create the local aggregations on the uplinks
4. Configure the health check (in this example we use the EXTM ports back to back to
provide the vLAG health check), using an otherwise unused IP subnet (1.1.1.x/30) for this
health check connection
5. Configure and enable vLAG
6. The vLAG tier ID must be different from that of any upstream connecting vLAG pair and
must be the same on both CN4093 I/O Modules in the vLAG pair
7. Once all configurations are complete, plug in the back-to-back health check cable
between the EXTM ports, then plug in the ISL links
8. Once the ISL is up, plug in the uplinks to the upstream networks to complete the physical steps
Important: All of the examples provided here can be directly cut and pasted into the I/O
Module.
The commands used to perform these tasks are provided in Example 6-12.
Example 6-12 Example of configuring vLAG and aggregations
! Create the ISL aggregation and set the PVID to an unused VLAN
int port 43-44
lacp mode active
lacp key 4344
pvid 4090
!
! Exit from interface config mode and then globally disable instance of STP
! for ISL PVID VLAN
exit
no spanning-tree stp 26 enable
spanning-tree stp 26 vlan 4090
! Configure upstream aggregations: EXT11 (53) for the UFP tunnel uplink,
! EXT12-EXT13 (54-55) for the UFP trunk and access uplinks
int port 53
lacp mode active
lacp key 1111
!
int port 54-55
lacp mode active
lacp key 1213
!
! Configure EXTM ports for use as vLAG healthcheck
! Interface IP 127 is tied to EXTM
int ip 127
ip address 1.1.1.1 255.255.255.252 enable
! Configure VLAG
! Hlthck points to IP of other CN4093 in this vLAG pair
! ISL adminkey is the admin keys on ports EXT1-EXT2
! Other adminkeys are for uplink aggregations previously configured
vlag enable
vlag tier-id 11
vlag hlthchk peer-ip 1.1.1.2
vlag isl adminkey 4344
vlag adminkey 1111 enable
vlag adminkey 1213 enable
!
Once the above steps are complete, repeat for the second I/O Module, changing the following
two lines in the above config:
򐂰 Change ip address 1.1.1.1 255.255.255.252 enable to ip address 1.1.1.2
255.255.255.252 enable
򐂰 Change vlag hlthchk peer-ip 1.1.1.2 to vlag hlthchk peer-ip 1.1.1.1
Once both switches are configured per the above, perform the following steps:
1. Bring up the ISL links between the pair of CN4093s (no shut EXT1-EXT2 and/or plug in
the cables as necessary)
2. Bring up the management ports on both CN4093s (no shut EXTM and/or plug in the
cable as necessary)
3. Confirm links on EXT1, EXT2, and EXTM ports on both CN4093s are up (show int
status)
4. Confirm aggregation on EXT1-EXT2 is Up (show lacp info)
5. Confirm vLAG ISL and health check are up using the command show vlag info and
confirm Health check is Up and ISL state is Up
6. Once vLAG and health checks are confirmed operational, bring up uplink aggregations on
EXT11-EXT13 on both I/O Modules (no shut EXT11-EXT13 and/or plug in the cables as
necessary)
7. Confirm links are up (show int status), aggregations are up (show lacp info) and vLAG
shows state formed (show vlag info) for both upstream aggregations.
Details on output of above commands for correctly functioning I/O Modules are provided in
6.3.5, “Confirming operation of the environment” on page 158.
UFP configuration
In this step we will enable and configure UFP on the INTA3 interface and add desired VLANs
to uplink ports to complete the path out for the UFP vPorts.
Some comments on these steps:
򐂰 Before we start configuring vPorts, we enable CEE
– If a vPort is configured for FCoE, UFP cannot be enabled until CEE is enabled
– Enabling CEE automatically turns off standard flow control on all internal ports, in
order to switch to the Priority Flow Control used by CEE
– When the flow control state changes, the ports are briefly shut and re-enabled
automatically to force the new flow control state
򐂰 In this example we will be configuring four vPorts
– vPort 1 will be in UFP tunnel mode and will use a tunnel VLAN of 4091. 4091 will be the
outer tag used on packets flowing on this tunnel, and will be stripped off on the uplink
EXT11 interface using the tagpvid-ingress command.
– vPort 2 will be used for FCoE traffic and set for VLAN 1001 or 1002, depending on the
switch. FCoE VLANs should be different on the separate switches to reduce the
likelihood of a fabric merge.
– vPort 3 will be configured as a simple access vPort, using an access/untagged VLAN
of 40.
– vPort 4 will be configured as an 802.1Q trunk vPort, using an untagged VLAN of 50,
and allowing a tagged VLAN of 60.
Important: It is assumed that the upstream connecting switches have already been
properly configured for any necessary aggregations and vLAG/vPC before bringing up the
links to the upstream network. Failure to ensure upstream configuration is complete before
plugging in cables can lead to a network down situation.
򐂰 The vPort bandwidths used in this example can be changed if desired, but it is
recommended not to set the FCoE vPort 2 minimum bandwidth lower than 40%, to ensure
that FCoE traffic is guaranteed the necessary bandwidth
򐂰 While in this example we show four different types of vPorts being used (tunnel, FCoE,
access, and trunk), we could have used different arrangements of types (for example, all
trunk vPorts, or all tunnel or access vPorts). The exception is vPort 2: if FCoE is in use,
vPort 2 must be the FCoE vPort
– For each tunnel mode vPort, assuming the tunnel is being broken out (outer tag
stripped off) on the uplink, that tunnel must have a separate uplink path (it cannot share
uplink paths with other tunnel mode vPorts, or even with access or trunk mode vPorts)
– All vPorts on a physical port must use unique VLANs
The commands used to configure UFP and some associated VLAN and tunnel parameters
can be seen in Example 6-13:
Example 6-13 Example of configuring UFP and vPorts on INTA3
! Enabling CEE at this point as it must be enabled before enabling a UFP vPort
! that has FCoE configured
cee enable
! Create and configure all of the vPorts on INTA3 and enable UFP
ufp port INTA3 vport 1
network mode tunnel
network default-vlan 4091
qos bandwidth min 10
enable
ufp port INTA3 vport 2
network mode fcoe
network default-vlan 1001
qos bandwidth min 40
enable
ufp port INTA3 vport 3
network mode access
network default-vlan 40
qos bandwidth min 20
enable
ufp port INTA3 vport 4
network mode trunk
network default-vlan 50
qos bandwidth min 30
enable
ufp port INTA3 enable
ufp enable
! When UFP is enabled, it will automatically create and enable the assigned
! default-vlan for each vPort, and add the vPort as a member of that default VLAN
! Create any extra VLANS and assign VLANs to uplinks and ISL for failover paths
! VLANs 40 and 50 will have automatically been assigned to the vPorts with
! the default-vlan of the same. We need to now add the ISL and uplink ports
vlan 40
enable
member EXT1-EXT2,EXT12-EXT13
!
vlan 50
enable
member EXT1-EXT2,EXT12-EXT13
!
! For VLAN 60, this is the only non default-vlan VLAN we will be using
! so we must also manually add the vPort to this VLAN using the vmember command
vlan 60
enable
member EXT1-EXT2,EXT12-EXT13
vmember INTA3.4
!
! VLAN 4091 is our tunnel mode VLAN, and vPort1 is automatically a member, but we
! must add the ISL links and desired uplink as members to carry traffic in and out
vlan 4091
enable
member EXT1-EXT2,EXT11
! We will add the FCoE VLAN to desired ports in the next step.
! Set tagpvid-ingress on upstream port EXT11 to act as tunnel endpoint for vPort 1
! Will remove tunnel VLAN for outbound packets
! Will add tunnel VLAN for inbound packets
int port 53
tagpvid-ingress
Repeat the above steps for the second switch, changing the following line in the above config:
򐂰 Change the vPort 2 command network default-vlan 1001 to network default-vlan
1002.
Once both switches are configured per the above, perform the following checks:
1. Run the command show run | section ufp and confirm UFP config is in place
2. Run the command show ufp info vport port inta3 and confirm all vPorts are up and
carrying desired VLANs and in desired modes
3. Run the command show int trunk and confirm VLANs are correct and tagpvid-ingress is
on upstream EXT11
Details on proper output of above commands for correctly functioning I/O Modules are
provided in 6.3.5, “Confirming operation of the environment” on page 158
Once these UFP commands are applied we can proceed to configuring FCoE.
FCoE configuration
In this section we perform the commands necessary to enable FCoE. It is assumed that the
above steps have already been completed, most importantly that CEE has already been
enabled in a previous step.
Important: It is assumed that the steps to set the Multichannel mode to UFP and the
personality to FCoE in the UEFI of Compute Node 3 have already been completed.
The steps we will be performing and some comments:
1. CEE was enabled in a previous step, but if it had not, it must be enabled now
2. We will be using EXT15-EXT16 as our FCF ports
– We will not be attaching any cables to these ports in our example as all FCoE traffic will
stay internal to the CN4093, between the host on INTA3 and the FCoE attached
storage on ports INTA13-INTA14 - but we still must assign FC ports to communicate to
the FC component of the CN4093
– Assigning a minimum of 2 FC ports is mandatory for any FCF function to work
– Assigning more FC ports provides higher bandwidth
– FC ports are always assigned in pairs (even numbers used)
– Only the 12 omni ports (EXT11-EXT22) can be assigned as FC ports
3. Configure the desired FCoE ports to carry VLAN 1001 or 1002 tagged
– The FCoE VLAN must be tagged on any port that carries it
4. Enable VLAN 1001 or 1002 for FCF functionality
– VLAN 1001 is considered the industry default FCoE VLAN, but almost any VLAN can
be used for FCoE (VLAN 1 and a few other reserved VLANs cannot be used)
– Although it is possible to use the same FCoE VLAN on both switches (as long as that
VLAN is not carried between the two switches), it is not recommended; using different
VLANs ensures a fabric merge does not occur if the FCoE VLAN is accidentally
bridged between the I/O Modules
5. Disable the spanning tree instance associated with the FCoE VLAN
6. Configure any desired zoning
– We will be applying zoning that lets all hosts see all available LUNs. This is not what
most production designs will incorporate and is only used here for simplified operation
– In normal zoning, whenever changes are made to zoning, the zoneset activate name
xxxxx command (where xxxxx is the name of the zoneset to be activated) must be
executed before any zoning changes take effect. The zoneset activate command is not
necessary with the zoning syntax we are using in this scenario
7. Save the configuration to NVRAM when completed
The commands used to perform these tasks can be seen in Example 6-14:
Example 6-14 Example of configuring FCoE
! Enable FIP Snooping to ensure FCoE end to end security
fcoe fips enable
! Designate the desired omni ports as FC ports
system port EXT15,EXT16 type fc
! Name FCoE VLAN, add v7000 facing ports and FC ports and enable the FCF support
vlan 1001
enable
name FCoE_FAB-A
member INTA13-INTA14,EXT15-EXT16
fcf enable
! Disable STP on instance of STP associated with FCoE VLAN
no spanning-tree stp 112 enable
spanning-tree stp 112 vlan 1001
! Add catch-all zoning (not suitable for most production environments)
zone default-zone permit
zone name allow-all
zoneset name default
! Save the configuration changes made to NVRAM
copy running startup
! If prompted to save to flash press the y key
! If prompted to change to active config block, press the y key
Repeat the above steps for the second switch, changing the following lines (the changed
portion is consolidated in the sketch after this list):
򐂰 Change vlan 1001 to vlan 1002
򐂰 Change name FCoE_FAB-A to name FCoE_FAB-B
򐂰 Change no spanning-tree stp 112 enable to no spanning-tree stp 113 enable
򐂰 Change spanning-tree stp 112 vlan 1001 to spanning-tree stp 113 vlan 1002
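For reference, the changed portion of the second switch's configuration would look like the
following sketch (the remaining commands from Example 6-14 are applied unchanged; verify
the port assignments match your own environment):
! FCoE VLAN and FCF support for fabric B on the second CN4093
vlan 1002
enable
name FCoE_FAB-B
member INTA13-INTA14,EXT15-EXT16
fcf enable
! Disable STP on the instance associated with the fabric B FCoE VLAN
no spanning-tree stp 113 enable
spanning-tree stp 113 vlan 1002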
Once both switches are configured per the above, perform the following checks:
1. Run the command show fcoe fips fcf and confirm we see an FCF entry for each FC port
that was configured. The FCF function should come up regardless of FCoE sessions.
2. Run the command show fcoe fips fcoe and confirm we see an FCoE session for each
V7000 port on INTA13 and INTA14, and one for the server on INTA3.
3. Run the command show fcoe fips vlan and confirm desired interfaces are present for the
FCoE VLAN.
Details on the proper output of the above commands, along with other helpful troubleshooting
commands for this environment, are provided in 6.3.5, “Confirming operation of the
environment” on page 158.
6.3.5 Confirming operation of the environment
This section contains helpful commands and their associated output to ensure the scenario
demonstrated is healthy and operating as expected. Note there are many helpful commands
for many tasks, but this section is focused on the specific commands for this environment.
Also note that the output for most of this information can also be obtained from a show tech
command.
Details on confirming the health of vLAG and aggregations
The examples provided in Example 6-15 on page 159 show truncated output with embedded
comments added to the command output.
The examples here are all run on the bay 3 I/O Module. When troubleshooting, always look
at both I/O Modules in the design.
Important: It is assumed that the OS has already been installed on the Compute Node
and proper FCoE drivers are operational within the OS. It is also assumed the V7000
storage has been configured and is presenting storage to the host.
Important: In an effort to reduce extraneous output, many non-essential lines have been
removed from the output of the commands executed in this section. Where removed, they
have been replaced by an ellipsis (...)
Example 6-15 Example of commands to check the health of vLAG (after all configs applied)
! First check the link status. Make sure the ISL ports (EXT1-EXT2), INTA3, INTA13,
! INTA14, and uplinks EXT11-EXT13 are Link up, as well as EXTM for the vLAG health check
PF_CN4093a#show int status
------------------------------------------------------------------
Alias Port Speed Duplex Flow Ctrl Link Name
------- ---- ----- -------- --TX-----RX-- ------ ------
INTA3 3 10000 full no no up INTA3
...
INTA13 13 10000 full no no up v7000_Storage
INTA14 14 10000 full no no up v7000_Storage
...
EXT1 43 10000 full no no up EXT1
EXT2 44 10000 full no no up EXT2
...
EXT11 53 10000 full no no up EXT11
EXT12 54 10000 full no no up EXT12
EXT13 55 10000 full no no up EXT13
...
EXTM 65 1000 full no no up EXTM
...
! Confirm aggregation is now up not only for the ISL but for each of the upstream
! aggregations
PF_CN4093a#sho lacp info
------------------------------------------------------------------
port mode adminkey operkey selected prio aggr trunk status minlinks
---------------------------------------------------------------------------------
...
EXT1 active 4344 4344 yes 32768 43 65 up 1
EXT2 active 4344 4344 yes 32768 43 65 up 1
...
EXT11 active 1111 1111 yes 32768 53 66 up 1
EXT12 active 1213 1213 yes 32768 54 67 up 1
EXT13 active 1213 1213 yes 32768 54 67 up 1
...
! Confirm vLAG is fully healthy and both upstream vLAGed aggregations show state
! formed (formed = at least one uplink from each switch in a vLAGed aggregation is
! up and operational)
PF_CN4093a#sho vlag info
vLAG system MAC: 08:17:f4:c3:dd:0a
Local MAC 74:99:75:5d:dc:00 Priority 0 Admin Role PRIMARY (Operational Role
PRIMARY)
Peer MAC a8:97:dc:10:44:00 Priority 0
Health local 1.1.1.1 peer 1.1.1.2 State UP
ISL trunk id 65
ISL state Up
Auto Recovery Interval: 300s (Finished)
Startup Delay Interval: 120s (Finished)
vLAG 65: config with admin key 1111, associated trunk down, state formed
vLAG 66: config with admin key 1213, associated trunk down, state formed
! For reference, aside from state formed, there are three possible other states
! state local up = At least one link from the vLAG agg is up on this switch, but
! no links for this vLAG agg are up on the other switch
! state remote up = the reverse of local up; in other words, a port is up for
! this vLAG agg on the other switch, but none on this switch
! state down = no links on either switch are up for this vLAG agg
Details on confirming the health of UFP
The examples provided in Example 6-16 show truncated output with embedded comments
added to the command output for checking the health of UFP:
Example 6-16 Example of commands to check the health of UFP (after all configs applied)
! First check that the desired UFP commands are present in the running config
! by filtering on just the UFP sections
PF_CN4093a#show run | section ufp
ufp port INTA3 vport 1
network mode tunnel
network default-vlan 4091
qos bandwidth min 10
enable
exit
!
ufp port INTA3 vport 2
network mode fcoe
network default-vlan 1001
qos bandwidth min 40
enable
exit
!
ufp port INTA3 vport 3
network mode access
network default-vlan 40
qos bandwidth min 20
enable
exit
!
ufp port INTA3 vport 4
network mode trunk
network default-vlan 50
qos bandwidth min 30
enable
exit
!
ufp port INTA3 enable
!
ufp enable
!
! Get a real time snapshot of vPort state and VLANs in use, as well as the mode
! configured for each vPort.
PF_CN4093a#show ufp info vport port inta3
-------------------------------------------------------------------------------
vPort state evbprof mode svid defvlan deftag VLANs
--------- ----- ------- ---- ---- ------- ------ ----------------------
INTA3.1 up dis tunnel 4091 4091 dis 4091
INTA3.2 up dis fcoe 1001 1001 dis 1001
INTA3.3 up dis access 4004 40 dis 40
INTA3.4 up dis trunk 4005 50 dis 50 60
! Get real time information on VLAN allowed status as well as the status of
! tagpvid-ingress on the uplink for the tunnel vPort (EXT11) as seen by the
! accompanying # symbol.
PF_CN4093a#show int trunk
Alias Port Tag Type RMON Lrn Fld PVID NAME VLAN(s)
------- ---- --- ---------- ---- --- --- ------ -------------- ------------------
...
INTA3 3 y Internal d e e 1 INTA3 1 40 50 60 1001
4091
...
EXT1 43 y External d e e 4090 EXT1 1 40 50 60 4090
4091
EXT2 44 y External d e e 4090 EXT2 1 40 50 60 4090
4091
...
EXT11 53 n External d e e 4091# EXT11 4091
EXT12 54 y External d e e 1 EXT12 1 40 50 60
EXT13 55 y External d e e 1 EXT13 1 40 50 60
...
* = PVID is tagged.
# = PVID is ingress tagged.
Details on confirming the health of FCoE
The examples provided in Example 6-17 show truncated output with embedded comments
added to the command output for checking the health of FCoE:
Example 6-17 Example of commands to check health of FCoE (after all configs applied)
! Confirm the FCF is detected and has an entry for each of the FC ports assigned
! to this purpose
PF_CN4093a#show fcoe fips fcf
Total number of FCFs detected: 2
FCF MAC Port Vlan
-----------------------------------
a8:97:dc:10:44:c7 EXT15 1001
a8:97:dc:10:44:c8 EXT16 1001
! Confirm the FCoE sessions have been established for each device that is using
! FCoE (the host on INTA3 and the ports toward the v7000 storage (INTA13 and
! INTA14)
PF_CN4093a#show fcoe fips fcoe
Total number of FCoE connections: 3
VN_PORT MAC FCF MAC Port Vlan
------------------------------------------------------
0e:fc:00:01:11:00 a8:97:dc:10:44:c8 INTA3 1001
0e:fc:00:01:10:00 a8:97:dc:10:44:c7 INTA13 1001
0e:fc:00:01:10:01 a8:97:dc:10:44:c7 INTA14 1001
! Check that all ports that need access to the FCoE VLAN are included:
PF_CN4093a#show fcoe fips vlan
Vlan App creator Ports
---- ----------------- -------------------------------------------------------
1001 UFP INTA3 INTA13 INTA14 EXT15 EXT16
! The following commands are only available when in full fabric mode (FCF enabled)
! and can be helpful when troubleshooting
! Make sure the FCoE database is populated with all hosts
PF_CN4093a#show fcoe database
-----------------------------------------------------------------------
VLAN FCID WWN MAC Port
-----------------------------------------------------------------------
1001 011100 10:00:00:00:c9:f8:0a:59 0e:fc:00:01:11:00 INTA3
1001 011000 50:05:07:68:05:08:03:70 0e:fc:00:01:10:00 INTA13
1001 011001 50:05:07:68:05:08:03:71 0e:fc:00:01:10:01 INTA14
Total number of entries = 3.
-----------------------------------------------------------------------
! Make sure we see a fabric login for each device:
PF_CN4093a#show flogi database
-----------------------------------------------------------------------
Port FCID Port-WWN Node-WWN
-----------------------------------------------------------------------
INTA13 011000 50:05:07:68:05:08:03:70 50:05:07:68:05:00:03:70
INTA14 011001 50:05:07:68:05:08:03:71 50:05:07:68:05:00:03:71
INTA3 011100 10:00:00:00:c9:f8:0a:59 20:00:00:00:c9:f8:0a:59
Total number of entries = 3.
-----------------------------------------------------------------------
For further commands on reviewing the health of an I/O Module, see the appropriate
Application Guide for that product. A good source for guides for PureFlex I/O Modules is the
following link:
http://publib.boulder.ibm.com/infocenter/flexsys/information/topic/com.ibm.acc.networkdevices.doc/network_iomodule.html
6.4 pNIC and vNIC Virtual Fabric modes with Layer 2 Failover
This section presents several scenarios for use of the Emulex LOMs and mezzanine
adapters in Flex System compute nodes. The presented scenarios are:
򐂰 Physical NIC mode with Layer 2 failover
򐂰 Physical NIC mode with Layer 2 failover and FCoE storage
򐂰 Virtual Fabric vNIC mode with failover
򐂰 Virtual Fabric vNIC mode with failover and FCoE storage
Physical NIC mode presents each port of the Emulex LOM or card as a single 10Gb physical
port. A two-port card would be seen by the OS of the compute node as two 10Gb NICs, each
of which would go to a different embedded I/O Module in the Flex chassis. A four-port
mezzanine card would be seen as four 10Gb ports; two ports would go to one I/O Module (for
example bay 1) and two to another (bay 2), using internal ports INTAx and INTBx on the
switches. To make full use of a four port card such as the CN4054, an upgrade would be
required on embedded switch modules (EN4093R, CN4093, or SI4093).
Physical NIC mode with FCoE changes the presentation of the card so that each physical
port is seen as a NIC and a corresponding FCoE HBA. (It is also possible to select the iSCSI
personality on the card, and the storage side would be seen as an iSCSI HBA. This scenario
is not tested here.)
Virtual Fabric vNIC mode, also known as IBM Virtual Fabric mode, presents each port of an
Emulex LOM or card as up to four virtualized ports. The bandwidth of these ports is
configurable with both a minimum guaranteed bandwidth allocation and a maximum limit on
bandwidth usage. The OS of the compute node will see up to eight NICs, with bandwidth
equal to the maximum limit configured on the Emulex hardware. Even though the OS might
see eight NICs, each with a bandwidth of 10Gb, there are still only two 10Gb physical ports
behind them. Four of the vNICs will share the 10Gb bandwidth of each physical port. (If a four
port card such as the EN4054 is used, vNIC will present up to sixteen virtualized NICs to the
OS from each EN4054, but there are still only four 10Gb physical ports and the total available
bandwidth is 40Gb.)
Virtual Fabric vNIC mode with FCoE reserves one of the four vNIC instances for each
physical port for storage networking. In this case, the OS will see fewer virtualized NIC
instances but will see the storage functionality reflected as an HBA. For example, a two port
LOM configured in this way would be seen by the OS as six virtualized NIC instances and a
two port HBA. The two port LOM still has only two physical 10Gb ports, and each one would
be shared by three vNIC instances and one HBA. As in Physical NIC mode, an iSCSI
personality is also available.
Layer 2 Failover is a configurable function of most of the embedded switch I/O Modules on
the Flex System chassis. It allows the state of a set of ports - typically external ports which
connect to an upstream network - to control the state of other ports, typically internal facing
ports which connect across the chassis backplane to compute nodes. This feature is typically
used to protect against a specific type of network failure which can occur in chassis-based
systems, where an embedded switch is operational but disconnected from the remainder of
the network. Layer 2 failover can administratively disable server-facing ports when such a
failure occurs, triggering the servers’ NIC teaming (or bonding) capability to use a surviving
port which still has a viable connection to the network.
6.4.1 Components
The testing in this chapter was done using the following hardware and software:
򐂰 Flex System Enterprise Chassis
򐂰 x240 Compute Node in bay 1
– Running ESX 5.1
– Dual port Emulex LOM CNA
– DS4800 external storage attached via FC ports on G8264 switches
򐂰 Two EN4093s in I/O Module bays 1 and 2
– Both with Upgrade 1 FoD installed
򐂰 Two G8264CS switches to act as upstream Ethernet connectivity out of the vLAG pair of
EN4093s
– Providing the FCF function and physical connectivity to the DS4800 on Fibre Channel
port 53
6.4.2 Topologies
The base topology for the scenarios presented in this section is shown in Figure 6-11 on
page 165 and shows the connections between the components listed above. Specific
topology diagrams will be included in the sections below for specific scenarios.
Note: There are two distinct ways to configure L2 failover on the 4093 switches. The
failover command and its associated subcommands and operands operate on full physical
ports, and have been enhanced to function for UFP vPorts as well.
There is also a failover option within the configuration of a vnic group; this option allows a
failure in the uplink associated with a vnic group to cause the vnic members of that group
to be administratively disabled.
The vmember option of the failover command is intended for UFP vPorts; it allows a vNIC
instance to be specified, but it will not provide the desired failover function for Virtual
Fabric vNICs (see the sketch after this note).
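To make the distinction concrete, the following sketch shows both styles. It is illustrative only
and is not part of the tested configurations in this chapter: the LACP admin key, port names,
and vPort numbers are assumptions, and the vmember keyword is used as described in the
note above (for UFP vPorts only). The vnic group style is shown in full context in
Example 6-24 on page 175.
! Style 1: standard failover trigger controlling UFP vPorts via the vmember option
! (assumes the monitored uplinks use LACP admin key 5356, as in the UFP with
! L2 Failover scenario earlier in this chapter; see Example 6-8)
failover enable
failover trigger 1 mmon monitor admin-key 5356
failover trigger 1 mmon control vmember INTA3.1,INTA4.1
failover trigger 1 enable
!
! Style 2: Virtual Fabric vNIC failover, configured inside the vnic group itself
vnic vnicgroup 1
vlan 3001
member INTA1.1
port ext5
failover
enable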
Figure 6-11 Base topology for scenarios
6.4.3 Use cases
Physical NIC mode (pNIC) is the default for the Flex environment. It presents the LOM and
NIC mezzanine cards to the server’s OS with the same number of ports as the card actually
has (2-port or 4-port). In this mode, converged networking can be enabled, so that these
cards present two or four NIC ports and two or four HBA ports for storage (FCoE or iSCSI).
Redundancy can be achieved in pNIC mode for data networking through the use of NIC
teaming options on the various operating systems. The storage protocols each have their own
multi-pathing options which provide a similar capability as long as both HBA ports have
access to the storage LUNs.
The embedded and top-of-rack switches can be configured with the failover command, which
works in concert with NIC teaming. This scenario would use active-standby teaming on
Windows and Linux; it could use a form of active-active teaming with VMware. These options
are discussed in section 5.3, “Utilizing physical and virtual NICs in the OS” on page 115. In
addition, with Virtual Link Aggregation (vLAG) on the switches, active-active NIC teaming
modes can be supported, which typically provides a more rapid failover and failback.
Virtual Fabric vNIC mode was the first virtualization option available from Emulex and IBM. It
allows the Emulex converged NIC to be seen by operating systems as four NIC ports per
physical port, or three NIC ports and one HBA per physical port. There are topology
constraints in Virtual Fabric vNIC mode which are largely relaxed in the newer UFP
virtualization mode which is recommended for new implementations. UFP is discussed in
section 6.3, “UFP mode virtual NIC with vLAG and FCoE”.
6.4.4 Configurations
The following configuration options are covered:
򐂰 “Physical NIC mode”
򐂰 “Use of vLAG with failover” on page 167
򐂰 “Physical NIC mode with FCoE storage” on page 168
򐂰 “Virtual Fabric vNIC mode” on page 174
򐂰 “Virtual Fabric vNIC mode with FCoE” on page 176
Physical NIC mode
The failover function on the EN4093R switches can be configured on static or dynamic
(LACP) aggregations. Auto monitoring (amon) operates on aggregations, so if a single port is
to be monitored it can be configured as a one-port aggregation and then placed into a failover
trigger. The configuration would be done as follows, assuming that the uplink ports to be
monitored are EXT5 and EXT7. (The upstream switch would have to configure LACP on the
corresponding ports.)
EXT7. (The upstream switch would have to configure LACP on the corresponding ports.)
With this configuration, when both EXT5 and EXT7 fail, internally facing ports with the same
VLANs will be administratively brought down. The limit option shown can be used to cause
the internal ports to be brought down when either EXT5 or EXT7 fails - that is, when there are
one or fewer ports active. The commands are shown in Example 6-18.
Example 6-18 Failover configuration - pNIC mode - Auto monitor
interface port EXT5,EXT7
lacp key 5757
lacp mode active
failover enable
failover trigger 1 amon admin-key 5757
failover trigger 1 enable
failover trigger 1 limit 1 (optional)
It is sometimes desirable to configure failover with more flexibility than the amon option
provides. This can be done with manual configuration, also known as mmon. A configuration
to do the same failover as is shown above using manual monitoring is shown below. Note that
the controlled ports are explicitly specified, and can be a subset of the internal facing ports, or
can include external ports such as when a server is connected to them. In Example 6-19, only
ports INTA1 and INTA2 are to be disabled in the event of an uplink failure.
Example 6-19 Failover configuration - pNIC mode - Manual monitor
interface port EXT5,EXT7
lacp key 5757
lacp mode active
failover enable
failover trigger 1 mmon monitor admin-key 5757
failover trigger 1 mmon control member INTA1,INTA2
failover trigger 1 enable
failover trigger 1 limit 1 (optional)
Manual monitor failover can also be configured to monitor individual ports with the following
command syntax: failover trigger 1 mmon monitor member EXT5,EXT7.
Multiple triggers can be configured but a given resource - one or more ports - can only be
controlled by one trigger at a time. A given trigger instance number can be either in amon or
mmon mode.
Example 6-20 shows manual monitoring of a static Port Channel.
Example 6-20 Failover configuration - manual with static PortChannel
portchannel 10 port EXT5,EXT7
portchannel 10 enable
failover enable
failover trigger 2 mmon monitor portchannel 10
failover trigger 2 mmon control member INTA1,INTA2
failover trigger 2 enable
failover trigger 2 limit 1 (optional)
Use of vLAG with failover
The vLAG feature allows a port aggregation to be connected from a switch, including an
EN4093 switch, to a pair of upstream switches which are connected and configured
appropriately. This function is supported for both static and dynamic link aggregations.
Since the failover feature is intended for failures where a server NIC is connected to a switch
that has no uplink path, it is less useful when vLAG is used between a pair of 4093s. This is
because, if the uplink from one 4093 fails in such a topology, traffic crosses the inter-switch
link (ISL) configured as part of vLAG and uses the uplink from the other 4093. If the uplink
ports on both 4093s fail at the same time, then there is no uplink path available from the
chassis, and the failover feature will not help. However, failover can be configured to bring
down an internal port when both the uplinks and the ISL ports fail (which is likely to be a very
rare event); this is shown in Example 6-21.
Example 6-21 Failover configuration when vLAG is in use
!*** Uplink ports ***
int port EXT5,EXT7
lacp key 5757
lacp mode active
! *** ISL ports ***
int port ext9,ext10
lacp key 910
lacp mode active
!*** vLAG configuration ***
vlag enable
vlag tier-id 20
vlag isl adminkey 910
!vlag hlthchk ... typically uses EXTM port and interface 127 on embedded switches
vlag adminkey 5757 enable
failover enable
failover trigger 3 mmon monitor admin-key 5757
failover trigger 3 mmon monitor admin-key 910
failover trigger 3 mmon control member INTA1,INTA2....
failover trigger 3 enable
Physical NIC mode with FCoE storage
Physical NIC mode with storage is not very different from pNIC with no storage; the difference
is that there is a dedicated VLAN for the storage traffic which must be carried to a
Fibre-Channel Forwarder (FCF), which is where FC and Ethernet addressing is correlated
and where FCoE traffic can be converted to standard FC traffic if the topology calls for this.
Failover is configured in the same way with FCoE in use as it is without it. Uplink and downlink
(server-facing) ports should be configured to carry the FCoE VLAN, and the cee enable and
fcoe fips enable commands need to be part of the configuration.
On a CN4093 or G8264CS, additional configuration is necessary to configure the Omni ports
and the FCF function; this is discussed in “FCoE configuration” on page 156.
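As a minimal sketch of those additions (it assumes the same testbed used in this section,
with the server on INTA1, FCoE VLAN 1001, tagging already enabled on the listed ports, and
EXT5/EXT7 already aggregated with LACP key 5757 as in Example 6-19; the complete
tested configuration is in Example 6-22):
! FCoE prerequisites: converged enhanced Ethernet and FIP snooping
cee enable
fcoe fips enable
! Carry the FCoE VLAN on the server-facing port and the uplinks
vlan 1001
enable
member INTA1,EXT5,EXT7
! The failover trigger itself is unchanged from the non-FCoE case (Example 6-19)
failover enable
failover trigger 1 mmon monitor admin-key 5757
failover trigger 1 mmon control member INTA1,INTA2
failover trigger 1 enable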
Design choices
For pNIC mode (or vNIC) with storage, it is generally suggested that the two HBA ports and
the associated switches use different FCoE VLANs, and if vLAG is in use in such a topology,
then the FCoE VLANs should not cross the ISL between the vLAG partner switches. This
works well with the typical SAN design where redundancy is provided by having two distinct
SAN networks (SAN-A, SAN-B) which can both reach the physical storage but which share
few or no components between the servers and the storage.
It is possible to either send FCoE traffic on the same uplinks as data traffic, or to use separate
uplinks for the different types of traffic. In the tested scenario, storage and data traffic were
both forwarded to the same upstream switches, but this is not required. Even when the traffic
is sent to the same upstream switches, the option to segregate the two types of traffic is
available.
Topologies which show this and relevant parts of the switch configurations are shown in
Figure 6-12 on page 169 and Figure 6-13 on page 170. In the configuration examples shown
in Example 6-22 on page 170 and Example 6-23 on page 172, VLANs 1001 and 1002 (on the
second EN4093) are used to carry FCoE traffic, and VLANs 1 and 2 carry data traffic.
The traffic could be segregated by changing the configuration in the following ways (a sketch
of the EN4093 side follows this list):
򐂰 On the EN4093 switches:
– Breaking the aggregation between links EXT5 and EXT7 which uses LACP key 5757
– Assigning VLAN 1001 (or 1002) to EXT5 and VLAN 1 and 2 to EXT7 (or vice-versa).
򐂰 On the G8264CS switches:
– Breaking the aggregation between links 42 and 52 which uses LACP key 4252
– Assigning the VLANs to the links to match what was done on the EN4093s
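A sketch of the segregated variant on the first EN4093, expressed as the resulting end state
(it assumes the shared EXT5/EXT7 LACP aggregation using key 5757 has been removed,
for example by setting lacp mode off on those ports, and that the G8264CS side is changed
to match; the port and VLAN choices follow Example 6-22):
! FCoE VLAN 1001 now rides only on uplink EXT5
vlan 1001
enable
member INTA1,EXT5
!
! Data VLAN 2 now rides only on uplink EXT7 (plus the vLAG ISL on EXT9-EXT10)
vlan 2
enable
member INTA1,EXT7,EXT9-EXT10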
Figure 6-12 pNIC with FCoE: single shared uplink aggregation
Figure 6-13 pNIC + FCoE - with FCoE traffic on segregated uplink
Example 6-22 EN4093 config excerpts for vLAG topology with pNIC and FCoE
version 7.7.9
switch-type IBM Flex System Fabric EN4093R 10Gb Scalable Switch(Upgrade1)
...
interface port INTA1
tagging
no flowcontrol
exit
...
!
interface port EXT5
tagging
exit
...
!
interface port EXT7
tagging
exit
...
interface port EXT9
tagging
pvid 4090
exit
!
interface port EXT10
tagging
pvid 4090
exit
!
vlan 2
enable
name VLAN 2
member INTA1,EXT5,EXT7,EXT9-EXT10
...
! Note: SAN-B will use VLAN 1002 here
vlan 1001
enable
name FCoE SAN-A
member INTA1,EXT5,EXT7
!
vlan 4090
enable
name ISL
member EXT9-EXT10
!
!
portchannel 10 port EXT9
portchannel 10 port EXT10
portchannel 10 enable
!
!
!
interface port EXT5
no spanning-tree stp 112 enable
exit
!
interface port EXT7
no spanning-tree stp 112 enable
exit
...
!
interface port EXT5
lacp mode active
lacp key 5757
!
...
!
interface port EXT7
lacp mode active
lacp key 5757
!
vlag enable
vlag tier-id 20
vlag isl portchannel 10
vlag hlthchk peer-ip 1.1.1.22
vlag adminkey 5757 enable
no fcoe fips automatic-vlan
!
fcoe fips enable
cee enable
!
interface ip 127
ip address 1.1.1.11 255.255.255.0
enable
exit
Example 6-23 G8264CS config for vLAG topology with pNIC and FCoE
version 7.8.1
switch-type IBM Networking Operating System RackSwitch G8264CS
...
system port 53,54 type fc
interface fc 53
switchport trunk allowed vlan 1,1001
interface fc 54
switchport trunk allowed vlan 1,1001
!
...
interface port 17
description ISL
switchport mode trunk
switchport trunk allowed vlan 1-2,10,4090
switchport trunk native vlan 4090
exit
!
interface port 18
description ISL
switchport mode trunk
switchport trunk allowed vlan 1-2,10,4090
switchport trunk native vlan 4090
exit
...
interface port 42
description 4093 downlink
switchport mode trunk
switchport trunk allowed vlan 1-2,1001
exit
!
interface port 52
description 4093 downlink
switchport mode trunk
switchport trunk allowed vlan 1-2,1001
exit
!
vlan 2
name VLAN 2
!
!
! note that SAN-B (8264-2) will use vlan 1002 here and in the allowed vlan
! statements
vlan 1001
name FCoE SAN-A
fcf enable
!
vlan 4090
name ISL
...
!
interface port 17
lacp mode active
lacp key 1718
!
interface port 18
lacp mode active
lacp key 1718
!
interface port 42
lacp mode active
lacp key 4252
!
interface port 52
lacp mode active
lacp key 4252
!
!
!
vlag enable
vlag tier-id 10
vlag hlthchk peer-ip 9.42.171.24
vlag isl adminkey 1718
vlag adminkey 4252 enable
!
fcoe fips enable
cee enable
!
!
zone default-zone permit
!
!
!
!
!
interface ip 128
ip address 9.42.171.23 255.255.254.0
enable
exit
!
ip gateway 4 address 9.42.170.1
ip gateway 4 enable
Virtual Fabric vNIC mode
Virtual Fabric vNIC (or vNIC1) mode is the first NIC virtualization mode developed for use with
Emulex adapters on IBM servers. It has largely been supplanted by UFP mode, which is more
versatile. However, Virtual Fabric vNIC has its own failover configuration commands which
are part of the vNIC group configuration.
vNICs, vNIC groups, and uplinks
An overall discussion of the available options for vNIC and their initial configuration can be
found starting in Section 5.1, “Introduction to enabling Virtual NICs on the server” on page 76.
Virtual Fabric vNIC mode introduces the following concepts:
򐂰 vNIC - an instance of a virtualized NIC which is associated with a specific physical port
and which appears as a NIC or as an HBA as seen by a server’s OS or hypervisor
򐂰 vNIC group - a set of vNICs which are used together and which are each associated with
a different physical port
򐂰 vNIC group uplink - a single port, or a static or LACP port aggregation, associated with a
vNIC group
򐂰 vNIC group VLAN - a VLAN used for tunneling traffic from the vNICs and any
non-virtualized internal ports associated with the group through the group’s uplink to the
wider network.
Configuration of the Virtual Fabric vNIC feature is done according to the following
requirements:
򐂰 A physical port can have up to four vNICs activated. No more than one can be for FCoE
traffic and it will always be vNIC instance 2.
򐂰 Bandwidth of vNICs is specified in 100 Mb increments; each increment is also one
percent of the bandwidth of a 10Gb port. Minimum bandwidth is 1 Gb, which is specified
as 10 in the configuration.
򐂰 Each data vNIC must be associated with a vNIC group. FCoE vNIC instances cannot be
associated with a vNIC group.
򐂰 A vNIC group can have a single logical uplink, as discussed above. If there is no
requirement for traffic from the group to be forwarded outside of the chassis, then an
uplink is not needed.
򐂰 Each vNIC group must be configured with a vNIC group VLAN. This VLAN is never seen
outside of the embedded switch in the chassis, and is used as an outer tag for 802.1Q
double-tagging by the switching ASIC.
򐂰 vNIC group VLAN numbers are not strictly required to be unique within the network, but
making them unique may avoid confusion when troubleshooting.
Note: The command “zone default-zone permit” allows any server to access any storage
where the LUN is made accessible. However, the default zoning configuration when FCF
mode is used on a G8264CS or a CN4093 is to deny all access. Therefore either explicit
zoning or the default-zone option is necessary. The status of zoning can be seen with the
show zone command on converged switches.
This does not apply when NPV mode is used; in that case, zoning is configured on an
upstream SAN switch.
NIC teaming configuration on servers
NIC teaming is a feature included in current operating systems which allows multiple physical
or virtual NICs to be treated as a single logical interface. Teaming can be active/active or
active/standby, and the capabilities of the various teaming modes differ across the various
operating systems. A discussion of teaming features and their configuration can be found in
section 5.3.2, “OS side teaming/bonding and upstream network requirements” on page 122.
vNIC sample failover configuration
A sample failover configuration is shown in Example 6-24, including the associated vNIC and
vNIC group configuration commands. In this configuration, ports EXT5 and EXT7 are uplink
ports. Only one server (in slot 1 and reached via port INTA1) is shown; the configuration
would be similar for other servers but the bandwidth allocations need not be identical. This
configuration fragment would typically be used identically in each of a pair of 4093 switches in
a chassis, especially when failover is used.
Example 6-24 vNIC configuration with failover configured as part of the vNIC group
vnic enable
vnic port INTA1 index 1
enable
bandwidth 40
vnic port INTA1 index 2
enable
bandwidth 30
vnic port INTA1 index 3
enable
bandwidth 20
vnic port INTA1 index 4
enable
bandwidth 10
vnic vnicgroup 1
vlan 3001
member INTA1.1
(additional server vnics can go here)
port ext5
failover
enable
vnic vnicgroup 2
vlan 3002
member INTA1.2
(additional server vnics can go here)
failover
enable
(vnic groups 3 and 4 would be configured similarly and would need additional
uplink ports to carry traffic outside the chassis)
The failover command in the above example is used instead of the failover configuration
shown elsewhere in this section when Virtual Fabric vNIC is used. vNIC failover would
function as follows for vNIC groups where it is configured:
򐂰 The uplink port - which can also be a static portchannel or LACP portchannel specified by
an adminkey - is monitored.
򐂰 If the uplink for a vnic group fails or is blocked due to spanning tree, then the vnic
members of the group would be administratively brought down.
򐂰 If the other switch in the chassis is configured with the same vnic and vnic group
configuration, and if the corresponding uplink in that switch is up, and if NIC teaming is
configured appropriately on the servers, then traffic will use the path through the other
switch.
򐂰 Options which are available in the standard failover trigger configuration, such as the limit
option, VLAN sensitivity, and the manual monitoring options are not available in the vNIC
failover feature. However, UFP uses standard failover triggers.
򐂰 vLAG can not be used with Virtual Fabric vNIC mode.
vNIC failover and shared uplink mode
Shared uplink mode with Virtual Fabric vNIC allows multiple vnic groups to share an uplink
port. This mode is enabled with the vnic uplink-share command, and by specifying the
uplink port (or aggregation) in those vNIC groups where it is desired. The vnic failover
command is specified in the same way when shared uplink mode is in use. Shared uplink
mode, like dedicated uplink mode, does not allow multiple uplinks to be specified in a given
vnic group. A fuller discussion of shared uplink mode and a comparison with the default
dedicated uplink mode can be found in section 4.1.1, “Virtual Fabric mode vNIC” on page 57.
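A brief sketch of what shared uplink mode could look like follows. It assumes two data vNIC
groups on VLANs 2 and 10 that both use EXT5 as their uplink; the VLAN numbers and
bandwidth values are illustrative. In shared uplink mode the group VLAN is the real data
VLAN carried upstream rather than an outer tunnel tag:
vnic enable
vnic uplink-share
vnic port INTA1 index 1
enable
bandwidth 40
vnic port INTA1 index 3
enable
bandwidth 20
vnic vnicgroup 1
vlan 2
member INTA1.1
port ext5
failover
enable
vnic vnicgroup 3
vlan 10
member INTA1.3
port ext5
failover
enable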
vLAG considerations
vLAG cannot be used on ports or vNIC instances which are members of a vNIC group. A
vNIC group can have only one uplink, and so it would not be possible to configure both an
uplink and an ISL to connect to a vLAG peer switch.
A pair of upstream switches, such as the G8264s used in our testing, can run vLAG between
them and connect to the uplink PortChannels of a pair of vNIC groups on different switches
such as EN4093s. The EN4093s cannot detect that vLAG is in use at the other end of their
uplinks. For this to work, the servers supported by the vNIC groups must configure the same
VLANs on the corresponding vNICs connecting to each physical port.
Virtual Fabric vNIC mode with FCoE
FCoE traffic is configured in Virtual Fabric vNIC mode as follows:
• FCoE traffic, if enabled, is always on vNIC instance 2.
• When instance 2 is used for FCoE, it is not included in any vNIC group.
• Since the FCoE instance is not configured in a vNIC group, failover for FCoE traffic
is not configured with the vnic group failover option.
• FCoE traffic does not flow over an uplink configured for a vnic group. It can flow over
an uplink in shared uplink mode.
• The standard failover trigger commands can be used to implement failover for FCoE
traffic if desired, but if this is done the entire server-facing port will be brought down,
not only the FCoE vNIC.
An example config of Virtual Fabric vNIC with FCoE is shown in Example 6-25. In this
example, port EXT7 is used to carry FCoE traffic upstream to the 8264CS switch where the
FCF is.
Example 6-25 Virtual Fabric vNIC with FCoE
vnic enable
vnic port INTA1 index 1
enable
bandwidth 40
vnic port INTA1 index 2
enable
bandwidth 30
vnic port INTA1 index 3
enable
bandwidth 20
vnic port INTA1 index 4
enable
bandwidth 10
vnic vnicgroup 1
vlan 3001
member INTA1.1
(additional server vnics can go here)
port ext5
failover
enable
.... the FCoE vnic can not be added to a vNIC group
.... additional groups for data vNICs would be configured here
failover trigger 3 mmon monitor member EXT7
failover trigger 3 mmon control member INTA1[,INTA2 ... etc.]
failover trigger 3 enable
... configuration for FCoE and for FCoE uplink to G8264CS....
cee enable
fcoe fips enable
int port ext7
vlan 1002
member ext7
The above configuration will implement failover for both the data and FCoE vNIC instances,
but it will behave in the following ways:
򐂰 If port EXT5 fails, vNIC INTA1.1 and others configured in vnic group 1 (which would be on
other servers) would be administratively down. The same would happen if an uplink port
configured in vnic groups 3 or 4 should fail; the vNICs associated with those groups would
be disabled.
򐂰 If the FCoE uplink, port EXT7, fails, then port INTA1 and other ports specified in the failover trigger would be administratively down. This would include all of the vNIC instances configured on those ports, even though they might still have a working path to the upstream network.
Because a failure on the FCoE uplink port would bring down all of the vNIC instances rather
than just the FCoE instance on vNIC 2, this configuration might not be desirable. Our testing
on ESX showed that FCoE has failover mechanisms of its own on the server. If the HBA ports
are configured so that both of them have access to the storage LUNs, and one of them loses
connectivity to the storage, such as due to an uplink failure, storage access will fail over to the
other HBA. The tests showed that bringing down the server-facing port (for example, INTA1) might slightly reduce the time that it takes to detect the loss of storage connectivity, but the difference did not appear to be significant.
A diagram of the topology in dedicated uplink mode is shown in Figure 6-14.
Figure 6-14 vNIC with FCoE: dedicated uplink mode
vNIC with FCoE and shared uplink mode
The configuration above would be changed in the following ways to use shared-uplink vNIC:
򐂰 On the EN4093’s
– The vnic uplink-share command would be used to enable shared uplink mode
– The VLAN for vnic group 1 would be set to VLAN 2. All vnic instances which are
assigned to group 1 would only carry VLAN 2.
– Ports EXT5 and EXT7 could optionally be aggregated together.
– The ports used to uplink vnic group 1 could also carry traffic from other vnic groups, on
their group VLANs.
– The uplink ports or aggregations for group 1 must be configured to include the FCoE
VLAN, 1001 or 1002.
򐂰 On the G8264CS switches:
– The port or aggregation used to downlink to the EN4093s must match the aggregation type and status, and the VLAN membership, that are configured on the EN4093s, including VLAN 1001 or 1002.
A topology diagram with shared uplink mode is shown in Figure 6-15.
Figure 6-15 vNIC with FCoE: shared uplink mode
Design choices
The choice between shared uplink mode and dedicated uplink mode is similar to the choice, discussed in the section on pNIC mode, between a single uplink and separate uplinks that segregate data and FCoE traffic. Shared uplink mode allows data and FCoE traffic to traverse the same uplink, but it restricts each data-bearing vNIC connected to a server to carrying only a single VLAN. UFP allows either shared uplinks or distinct uplinks without these restrictions.
6.4.5 Verifying operation
This section discusses commands that help verify correct operations.
Failover in pNIC mode
The failover trigger settings can be checked by using the show failover trigger command, as shown in Example 6-26 and Example 6-27.
Example 6-26 Show Failover command output - Manual Monitor
slot-1#sho failover trigger 1
Current Trigger 1 setting: enabled
limit 1
Auto Monitor settings:
Manual Monitor settings:
LACP port adminkey 7575
Manual Control settings:
ports INTA1 INTA2
Example 6-27 Show Failover command output - Auto Monitor
slot-2#show failover trigger 1
Current Trigger 1 setting: enabled
limit 1
Auto Monitor settings:
LACP port adminkey 5757
Manual Monitor settings:
Manual Control settings:
When a failover occurs, messages such as those shown in Example 6-28 are seen. Note that in this case, FCoE was part of the configuration, and the FCoE session failure also resulted in a message:
Example 6-28 Messages resulting from a failover event
slot-2(config)#int port ext7
slot-2(config-if)#shut
slot-2(config-if)#
Apr 15 16:02:26 slot-2 NOTICE link: link down on port EXT7
Apr 15 16:02:26 slot-2 NOTICE lacp: LACP is down on port EXT7
Apr 15 16:02:26 slot-2 WARNING failover: Trigger 1 is down, control ports are auto
disabled.
Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA1
Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA3
Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA4
Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA10
Apr 15 16:02:45 slot-2 NOTICE fcoe: FCOE connection between VN_PORT
0e:fc:00:01:0c:00 and FCF a8:97:dc:44:eb:c3 is down.
When internal (or other) ports are down due to a failover, they appear as disabled in a show
interface link command, as shown in Figure 6-16.
Figure 6-16 Links disabled after failover
slot-2(config-if)#sho int link
------------------------------------------------------------------
Alias Port Speed Duplex Flow Ctrl Link Name
------- ---- ----- -------- --TX-----RX-- ------ ------
INTA1 1 1G/10G full no no disabled INTA1
INTA2 2 1G/10G full no no disabled INTA2
.....
INTA14 14 1G/10G full no no disabled INTA14
When a failed link recovers, messages such as those shown in Figure 6-17 are seen.
Figure 6-17 Messages resulting from failover recovery
slot-2(config-if)#int port ext7
slot-2(config-if)#no shut
slot-2(config-if)#
Apr 15 16:07:35 slot-2 NOTICE link: link up on port EXT7
Apr 15 16:07:35 slot-2 NOTICE dcbx: Detected DCBX peer on port EXT7
Apr 15 16:07:39 slot-2 NOTICE lacp: LACP is up on port EXT7
Apr 15 16:07:39 slot-2 NOTICE failover: Trigger 1 is up, control ports are
auto controlled.
Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA1
Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA3
Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA4
Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA10
Apr 15 16:07:42 slot-2 NOTICE fcoe: FCOE connection between VN_PORT
0e:fc:00:01:0c:00 and FCF a8:97:dc:44:eb:c3 has been established.
The status of a disabled port can also be seen on the server, as both a NIC and an HBA, as shown in Figure 6-18 and Figure 6-19. Port vmnic0 is still active and carrying traffic, and has paths to the storage array.
Figure 6-18 VMware display showing port down
Figure 6-19 VMware storage adapter showing no paths to storage
Failover in vNIC mode
The FCoE vNIC instance (INTA1.2) still requires a dedicated uplink unless shared-uplink
mode is used.
The failover status of a non-FCoE vNIC is shown in the output of the show vnic vnicgroup command. However, there is no console message that indicates that the associated vNICs have been brought down; the change can be seen only by entering the same command again, as shown in Figure 6-20 on page 183.
Figure 6-20 Show vNIC vnicgroup before failover
slot-2#sho vnic vnicg 1
------------------------------------------------------------------------
vNIC Group 1: enabled
------------------------------------------------------------------------
VLAN : 3901
Failover : enabled
vNIC Link
---------- ---------
INTA1.1 up
Port Link
---------- ---------
UplinkPort Link
---------- ---------
EXT5* up
* = The uplink port has LACP admin key 555
As shown in Figure 6-21, there is no message showing that INTA1.1 has been brought down.
Figure 6-21 Messages resulting from shutting down vnic group’s uplink ports
slot-2(config)#int port ext5,ext6
slot-2(config-if)#shut
slot-2(config-if)#
Apr 15 18:51:55 slot-2 NOTICE link: link down on port EXT5
Apr 15 18:51:55 slot-2 NOTICE lacp: LACP is down on port EXT5
The command output, however, does show that the uplink port, EXT5, is down and that the
associated vNIC members of the group have been disabled, as shown in Figure 6-22.
Figure 6-22 show vnic vnicgroup after failover
slot-1(config-if)#sho vnic vnicg 1
------------------------------------------------------------------------
vNIC Group 1: enabled
------------------------------------------------------------------------
VLAN : 3901
Failover : enabled
vNIC Link
---------- ---------
INTA1.1 disabled
Port Link
---------- ---------
UplinkPort Link
---------- ---------
EXT5* down
* = The uplink port has LACP admin key 555
The FCoE vNIC instance cannot be configured into a vNIC group; it is managed by the failover trigger commands. In the configuration shown in Figure 6-23, the uplink for FCoE traffic is on port EXT7 and FCoE uses VLAN 1001. vNIC and pNIC modes are very similar in this regard.
Figure 6-23 Failover configuration for FCoE vnic
slot-1#sho run | section failover
failover enable
failover trigger 1 mmon monitor member EXT7
failover trigger 1 mmon control member INTA1
failover trigger 1 enable
vnic enable
vnic uplink-share
vnic port INTA1 index 1
bandwidth 25
enable
exit
!
vnic port INTA1 index 2
bandwidth 25
enable
exit
!
vnic port INTA1 index 3
bandwidth 25
enable
exit
!
vnic port INTA1 index 4
bandwidth 25
enable
exit
!
vnic vnicgroup 1
vlan 2
enable
failover
member INTA1.1
key 555
exit
When EXT7 is brought down, INTA1 (and other ports, if so configured) are brought down, as shown in the messages in Figure 6-24 on page 186. This brings down all of the vNIC instances that are associated with INTA1, so INTA1.1 is down; it is shown in Figure 6-24 as down rather than disabled, as is the case above. Because the uplinks that are associated with vNIC group 1 are still up, the remaining vNIC instances still have a viable path to the network.
Figure 6-24 Failover message flow from FCoE uplink failure
slot-1#sho vnic vnicgroup 1
------------------------------------------------------------------------
vNIC Group 1: enabled
------------------------------------------------------------------------
VLAN : 2
Failover : enabled
vNIC Link
---------- ---------
INTA1.1 up
Port Link
---------- ---------
UplinkPort Link
---------- ---------
EXT5* up
EXT6* up
* = The uplink port has LACP admin key 555
slot-1#config t
Enter configuration commands, one per line. End with Ctrl/Z.
slot-1(config)#int port ext7
slot-1(config-if)#shut
Apr 15 20:27:04 slot-1 NOTICE link: link down on port EXT7
Apr 15 20:27:04 slot-1 WARNING failover: Trigger 1 is down, control ports are
auto disabled.
Apr 15 20:27:04 slot-1 NOTICE server: link down on port INTA1
Apr 15 20:27:43 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c3 has been removed
because it had timed out.
Apr 15 20:27:43 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c4 has been removed
because it had timed out.
slot-1#sho vnic vnicg 1 (after EXT7 shut down)
------------------------------------------------------------------------
vNIC Group 1: enabled
------------------------------------------------------------------------
VLAN : 2
Failover : enabled
vNIC Link
---------- ---------
INTA1.1 down
UplinkPort Link
---------- ---------
EXT5* up
EXT6* up
* = The uplink port has LACP admin key 555
Failover with FCoE and shared-uplink vNIC
Failover in this mode is similar to the previous scenarios presented; the difference is that FCoE traffic and other data traffic share the same uplink(s). It is still appropriate to use both the failover trigger command and the vnic group failover option. The failover trigger can be used to bring down those internal-facing ports that depend specifically on the uplink, while the vnic group failover brings down the vNICs (and not entire internal ports) that depend on the uplink.
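As a minimal sketch (not the complete tested configuration), the two mechanisms might be combined as follows for the shared-uplink case, assuming trigger 1, an uplink aggregation on EXT5 with LACP admin key 555, and data vNIC INTA1.1 in group 1 on VLAN 2, which match the test setup whose failure messages appear in Figure 6-25:
failover enable
failover trigger 1 mmon monitor member EXT5
failover trigger 1 mmon control member INTA1
failover trigger 1 enable
!
vnic enable
vnic uplink-share
(vNIC instance definitions as shown in Figure 6-23 are omitted here)
vnic vnicgroup 1
vlan 2
enable
failover
member INTA1.1
key 555
exit
The trigger could also monitor the aggregation by its LACP admin key, as the Manual Monitor output in Example 6-26 suggests; the member form is used here because it matches the commands shown earlier in this chapter.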
As in previous cases, FCoE and the HBA drivers that support it have their own failover
capabilities on the servers so that if one HBA fails, the surviving HBA can continue to provide
storage access if properly configured to do so. From the testing performed on VMware, this
failover happens quickly.
The messages that result from an uplink failure in this scenario are similar to those in the non-shared scenario presented above; they are shown in Figure 6-25.
Figure 6-25 Message flow from uplink failure - shared uplink mode with FCoE
slot-1#config t
Enter configuration commands, one per line. End with Ctrl/Z.
slot-1(config)#int port ext5
slot-1(config-if)#shut
Apr 16 12:52:30 slot-1 NOTICE link: link down on port EXT5
Apr 16 12:52:30 slot-1 WARNING failover: Trigger 1 is down, control ports are
auto disabled.
Apr 16 12:52:30 slot-1 NOTICE lacp: LACP is down on port EXT5
Apr 16 12:52:30 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c4 has been removed
because trunk configuration on the fcf changed.
Apr 16 12:52:30 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c3 has been removed
because trunk configuration on the fcf changed.
Apr 16 12:52:30 slot-1 NOTICE server: link down on port INTA1
Apr 16 12:52:31 slot-1 NOTICE dcbx: Feature VNIC not supported by peer on
port INTA2
Apr 16 12:52:31 slot-1 NOTICE dcbx: Feature VNIC not supported by peer on
port INTA10
sho vnic vnicgroup 1
------------------------------------------------------------------------
vNIC Group 1: enabled
------------------------------------------------------------------------
VLAN : 2
Failover : enabled
vNIC Link
---------- ---------
INTA1.1 disabled
UplinkPort Link
---------- ---------
EXT5* down
* = The uplink port has LACP admin key 555
6.5 Switch Independent mode with SPAR
This section shows deployment examples that use vNIC Switch Independent mode with SPAR pass-thru mode. The combination of these features (Switch Independent mode on the Emulex adapter and SPAR pass-thru mode on the embedded switches: EN4093R, CN4093, and SI4093) minimizes the configuration effort on the embedded switches. Little to no embedded switch configuration is required when a new VLAN or a new compute node is added to a Flex chassis in this scenario.
6.5.1 Components
The following hardware and software was used in the examples in this chapter.
򐂰 Flex System Enterprise Chassis
򐂰 x240 Compute Node in bay 1
– Running ESX 5.1
– Dual port Emulex LOM CNA
– DS4800 external storage attached via FC ports on G8264 switches
򐂰 Two EN4093s in I/O Module bays 1 and 2
– Both with Upgrade 1 FoD installed
򐂰 Two G8264 switches to act as upstream Ethernet connectivity out of the pair of EN4093s
– Providing FCF function and physical connectivity to DS4800 on Fibre Channel port 53
6.5.2 Topology
Figure 6-26 on page 190 and Figure 6-27 on page 191 describe topologies that are used with
SPAR.
Figure 6-26 Topology with SPAR passthru mode
Figure 6-27 Local SPAR domain with Switch Independent vNIC and FCoE
6.5.3 Use Cases
SPAR Local and Passthru mode
SPAR (Switch Partition) is an option on the EN4093R, CN4093, and SI4093 IBM embedded
switches. The implementation of SPAR on the SI4093 is different from that on the other
switches and is not dealt with in this book.
SPAR allows the switches listed above to logically partition their available ports into multiple domains. In other words, the data plane of the switch is divided into multiple segments that do not communicate with each other (unless they are connected through an external device).
SPAR pass-thru mode is an option that uses 802.1Q-in-Q double tagging to allow customer VLANs to pass through a SPAR instance on a switch without any explicit configuration. This allows new VLANs to be added without any additional configuration on the embedded switch. It is possible to use the same VLAN number in multiple domains, but devices on a given VLAN in SPAR 1 will not be able to communicate with a device on that same VLAN in a different SPAR domain unless the domains are interconnected elsewhere in the network.
SPAR local domain mode provides the logical partitioning mentioned above, but does not
tunnel customer VLANs through the switch. Instead, each VLAN which is to be used in a
domain must be explicitly configured in that domain. However, it is still possible to define the
same VLAN number in multiple different domains; a device connected to a given VLAN (for
example, 10) in SPAR 1 will not be able to communicate with a device on that same VLAN in
SPAR 2 or SPAR 3 within the switch.
vNIC Switch Independent Mode
Switch Independent Mode is an option on the Emulex adapters, including the LOM included on several of the available Flex compute nodes and the CN4054 mezzanine card. This feature allows the Emulex chip to present up to four vNIC instances per physical port to a server, based on configuration options in the server's UEFI rather than on settings learned from an IBM switch. This mode can therefore be used with a variety of embedded I/O modules, including the EN4091 Pass-thru module, the SI4093 System Interconnect, and I/O modules from companies other than IBM. The testing that is outlined in this section was all done with IBM embedded switches, but the commands for vNIC and UFP functionality are not used.
In Switch Independent Mode, each vNIC associated with a port is assigned a default VLAN in UEFI, referred to as an LPVID (Local Port VLAN ID). Untagged traffic originating from the server on a vNIC will be tagged by the Emulex adapter with the configured LPVID VLAN. One consequence of this is that all server traffic entering the embedded switch from a server using this mode will be tagged.
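For example, if Switch Independent mode is used without SPAR, the LPVIDs simply need to exist as regular VLANs on the server-facing and uplink ports of the embedded switch. A minimal sketch, assuming the LPVIDs 3001, 3003, and 3004 that are used later in this section and an uplink on EXT5 (port tagging settings are omitted), might look like the following:
vlan 3001
enable
member INTA1,EXT5
!
vlan 3003
enable
member INTA1,EXT5
!
vlan 3004
enable
member INTA1,EXT5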
vNIC Switch Independent Mode with SPAR Passthru mode
Using these features together allows new VLANs to be created and used on servers (including guest operating systems running under a hypervisor) without being configured on the embedded switches at all. On servers, VLANs would be created in ways that include the following. This is covered in more detail in section 5.3, “Utilizing physical and virtual NICs in the OS” on page 115.
Here are some considerations regarding the creation of tagged VLANs for different operating systems:
򐂰 Windows Server 2012 - has network configuration tools that allow the creation of tagged VLANs. When this is done, an additional item is created in the Network Connections folder. The default (untagged) Network Connection would use the LPVID for the associated vNIC.
򐂰 Other versions of Windows would need to use the Emulex utility that provides the ability to create tagged VLANs.
򐂰 VMware - port groups that are attached to a vSwitch can have a specific VLAN associated with them; these VLANs are transmitted with tags. A port group configured with no VLAN (VLAN 0) will use the LPVID for the associated vNIC. VMware also allows a port group to be associated with VLAN 4095; when this is done, VLAN tagging is delegated to the operating systems of the guest systems.
򐂰 Linux - the vconfig command can create tagged VLAN interfaces attached to a specific NIC (or vNIC) as seen by the Linux OS. These interfaces default to names of the form ethX.VLAN; for example, eth0.10. The ifconfig command can be used to set the attributes of these interfaces once they are created, as shown in the sketch that follows this list. Various Linux distributions also have graphical tools that provide the same capabilities.
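For example, a minimal sketch of these commands (the interface name and IP addressing are illustrative assumptions only):
# Create VLAN 10 on eth0; the resulting interface is named eth0.10
vconfig add eth0 10
# Assign an address and bring the tagged interface up
ifconfig eth0.10 192.168.10.21 netmask 255.255.255.0 up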
vLAG topology considerations
vLAG cannot be used in concert with SPAR. Because each SPAR domain can have only one uplink, it would not be possible to configure both links to upstream switches and an ISL to a vLAG peer switch. Therefore, vLAG cannot be used on uplink or downlink (server-facing) ports that are included in a SPAR domain.
A switch running SPAR could be an access switch that uses a PortChannel to connect to two upstream vLAG switches, if desired. The SPAR domains would have to use the same VLANs, whether or not they are explicitly configured (passthru versus local mode), and could include an FCoE VLAN in either case. A topology such as this would be more robust than one that did not include the use of vLAG.
6.5.4 Configuration
This section describes the following configuration steps:
򐂰 “vNIC Switch Independent Mode”
򐂰 “Switch side configuration - FCoE” on page 195
򐂰 “SPAR (Switch Partition) configuration” on page 196
vNIC Switch Independent Mode
Server side - UEFI configuration
This topic is covered in detail in section 5.1.3, “Special settings for the different modes of
virtual NIC via UEFI” on page 86. For the examples in this section, the configuration on each
port of the Emulex card is as follows, and is shown in Figure 6-28, Figure 6-29 on page 194,
and Figure 6-30 on page 194:
򐂰 vnic instance 1 - LPVID 3001, min. bandwidth 10%, max bandwidth 100%
򐂰 vnic instance 2 - FCoE vNIC, no LPVID, min. bandwidth 40%, max bandwidth 100%
򐂰 vnic instance 3 - LPVID 3003, min. bandwidth 20%, max bandwidth 100%
򐂰 vnic instance 4 - LPVID 3004, min. bandwidth 30%, max bandwidth 100%
Figure 6-28 UEFI Configuration for Switch Independent Mode
Figure 6-29 UEFI Configuration - Bandwidth for Switch Independent Mode
Figure 6-30 Configuration display with Bandwidth and LPVID (2 of 4 vNIC’s shown)
Server Side - Operating System Configuration (VMware)
The host was configured with a port group for VLAN 2 and an additional port group to test
guest tagging, assigned to VLAN 4095. Guests can be moved from one port group to another
via the settings menu.
For a deeper discussion of networking configuration on VMware and other operating systems,
see section 5.3, “Utilizing physical and virtual NICs in the OS” on page 115.
Figure 6-31 VMware network configuration with two port groups
Switch side configuration - FCoE
There is a group of commands required to enable FCoE on an embedded switch with SPAR
and Switch Independent mode. The requirements differ depending on whether the switch is
an FCoE transit switch such as the EN4093R used in testing for this chapter, or a converged
switch such as the CN4093 used in testing for UFP. The transit switch requirements are
below; for a discussion of the configuration of the CN4093, see section 6.3, “UFP mode
virtual NIC with vLAG and FCoE” on page 149.
To configure an EN4093R as an FCoE transit switch, the requirements are as follows:
򐂰 Enable lossless Ethernet (or Converged Enhanced Ethernet) functionality with the cee
enable command.
򐂰 Enable FIP snooping with the fcoe fips enable command. This allows the switch to
become aware of FCoE initialization traffic and be ready to carry FCoE traffic.
򐂰 Define the VLAN(s) which will carry FCoE traffic and ensure that the appropriate server
facing ports and uplink ports are members of those VLAN(s).
– FCoE VLANs should not be the native VLAN on server facing ports. If vLAG is used, in
general the vLAG ISL should not carry the FCoE VLANs.
– It is common, but not required, to use two distinct VLANs for FCoE. This is usually done to connect to a redundant storage networking environment. In such an environment, there are two SAN fabrics, usually referred to as SAN-A and SAN-B.
Each of the fabrics would connect to its own FCoE VLAN. Typically, two FCoE transit
switches in a Flex chassis would each use a different VLAN for FCoE.
An example of the configuration commands required for FCoE transit is shown in
Example 6-29. It uses VLAN 1002 for FCoE traffic, which is the default:
Example 6-29 FCoE Transit Configuration
cee enable
fcoe fips enable
vlan 1002
enable
member INTA1-INTA14,EXT5,EXT7
There are no changes to the configuration above if Switch Independent mode is used. The
differences when SPAR passthru mode or SPAR local mode are used are shown in the
remainder of this section.
SPAR (Switch Partition) configuration
SPAR configuration is performed exclusively on switches; the servers are unaware of it. In
SPAR local mode, the VLANs configured on the server must be explicitly configured on the
switches, but this is also true when the SPAR feature is not used.
SPAR pass-through mode - Switch side
For the examples in this section, the configuration is as follows:
򐂰 SPAR 2 has at least the necessary ports (INTA1, EXT5, and EXT7) configured as members of the SPAR domain. (Additional internal ports are added to the domain but were not used in testing.)
򐂰 The two uplink ports are aggregated together using LACP key 5757.
򐂰 The VLAN associated with SPAR 2 is 3992; note that this is an outer-tag or tunnel VLAN
which never leaves the embedded switch on either server-facing or external-facing ports.
򐂰 The remaining internal and external ports on the embedded switches were not configured
in a SPAR domain and continue to be configured and to operate normally.
򐂰 The VLANs configured on the VMware server flow through the SPAR domain as a tunnel
and do not appear in its configuration. Those VLANs, along with the FCoE VLAN, are
configured on the upstream 8264’s.
򐂰 When FCoE is used with SPAR passthru mode, the only command that is used is the cee enable command. FIP snooping is performed on the switch upstream from the one where SPAR is used, which in our testing would be one of the upstream 8264CS switches.
Example 6-30 shows SPAR configuration for the pass-thru mode.
Example 6-30 SPAR Pass-through Mode - Switch Configuration for Embedded 4093 switches
spar 2
uplink adminkey 5757
domain default vlan 3992
domain default member INTA1,INTA12-INTA14
enable
exit
SPAR local mode - Switch side
The same server configuration was used to test a local SPAR domain in concert with Switch
Independent mode. The local SPAR domain was configured as follows:
򐂰 Ports INTA1, EXT5, and EXT7 are included in the domain. The two external ports are configured to use LACP key 5757.
򐂰 The default VLAN for the domain is 3001, which matches the LPVID for vnic 1.
򐂰 Local VLANs 3003 and 3004 are defined in the domain and associated with port INTA1. These VLANs carry the untagged traffic that originates on the server and is sent via the vNIC instances (the adapter tags that traffic with the corresponding LPVIDs).
򐂰 Local VLAN 2 is also defined in the domain; it is used to carry the traffic from the guest VMs that are attached to the port groups discussed in 6.4, “pNIC and vNIC Virtual Fabric modes with Layer 2 Failover” on page 163.
򐂰 The intended FCoE VLAN(s), 1001 or 1002, also need to be configured here if they are to
pass through the SPAR domain. When those VLANs are configured in the SPAR
configuration, the usual commands to create the VLANs and assign their members are not
used.
򐂰 Different server facing ports within the SPAR domain can have different VLAN
membership by specifying the ports desired for a specific VLAN in the domain local n
commands. This mirrors the ability to configure VLANs on a port with the usual
switchport allowed vlan or VLAN member commands.
򐂰 There is only a single uplink per SPAR domain, which can be an individual port, a static
portchannel, or a LACP portchannel. The uplink is always a member of all of the VLANs
defined within the SPAR local domain.
Example 6-31 shows SPAR configuration for the local mode.
Example 6-31 SPAR Local Mode - Switch Configuration for Embedded 4093 switches
slot-1#sho run | section spar
spar 2
uplink adminkey 5757
domain mode local
domain default vlan 3001
domain default member INTA1
domain local 1 vlan 3003
domain local 1 member INTA1
domain local 1 enable
domain local 2 vlan 3004
domain local 2 member INTA1
domain local 2 enable
domain local 3 vlan 1001 (1002 on second switch)
domain local 3 member INTA1
domain local 3 enable
domain local 4 vlan 2
domain local 4 member INTA1
domain local 4 enable
Upstream G8264 configuration for SPAR
The G8264 switches have no special configuration requirements when SPAR is used on the
downstream EN4093 switches. VLANs used on the servers must be configured on the G8264
switches, whether they are configured on the Emulex UEFI, the server operating system, or
learned by the servers as part of FCoE initialization. The configuration on the G8264 switches
for their side of the uplinks from the EN4093s must also match the configuration that is specified on the EN4093 switches.
If the upstream switches are to provide FCoE functions such as FCF or NPV, then those
functions would be part of their configuration in the usual way.
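As a minimal sketch, the G8264 side of such an uplink might look like the following. The port numbers (17 and 18) and the VLAN list are assumptions for illustration; the LACP admin key 5757 matches the SPAR uplink configuration shown earlier, and port tagging or trunking settings are omitted:
interface port 17-18
lacp mode active
lacp key 5757
exit
!
vlan 2
enable
member 17-18
!
vlan 1001
enable
member 17-18
Any additional data-bearing VLANs that are used by the servers would be added to the same ports in the same way.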
6.5.5 Verifying operation
In summary, it is possible to do the following if desired:
򐂰 Switch Independent mode with SPAR passthru domain
򐂰 Switch Independent mode with SPAR local domain
It is also possible to use these features separately from each other if desired. Switch
independent mode allows servers to see more NIC interfaces than are physically available
and allocate their bandwidth (outbound only). SPAR provides a way to partition the switches
on which it is available and tunnel VLANs through them with no additional configuration if
passthru mode is used.
VLAN numbering considerations
There are several different categories of VLANs which need to have assigned numbers with
these features, whether used separately or in concert. They are summarized below:
򐂰 Data-bearing VLANs - these are the VLANs that are defined both on the compute node
and in the upstream network and which actually carry data. They are typically assigned
and managed by the networking team in a customer environment. They are configured on
compute nodes in the Flex chassis and also on Top-of-Rack or other aggregation switches
which typically are immediately upstream of the embedded I/O modules.
򐂰 Switch Independent Mode LPVIDs - these are the VLANs that are configured in the UEFI pages for the Emulex adapter(s) on compute nodes. They are used as the VLANs for untagged traffic sent from a compute node on a vNIC instance, so they are similar to a native VLAN on a switch. LPVIDs can be actual data-bearing VLANs, which allows host or guest operating systems to send untagged traffic. One common approach, however, is to use numbers for these VLANs that are unlikely to be used for data-bearing VLANs, such as numbers in the 4000 range, and to then always send tagged traffic from hypervisors or guest operating systems. This approach allows VLAN assignments to be changed without the need to reboot the compute node and go through the UEFI configuration.
򐂰 SPAR domain default VLANs - these are used for the outer tag when traffic passes
through a SPAR passthru domain. They never leave the switch where they are configured.
They can be assigned the same number as a data-bearing VLAN number or a LPVID
number, although this may result in confusion when troubleshooting the environment.
򐂰 VLANs used in SPAR local domains - if a SPAR local domain is used, then any data-bearing VLANs, including the LPVID VLANs and others defined in operating systems, must be explicitly configured as the domain default VLAN or as local VLANs within the domain. Use of SPAR local domains does not avoid configuring VLANs on the embedded I/O modules, which is one of the key benefits of using a SPAR passthru domain.
Verifying Operations: SPAR Passthru Mode
The status of the SPAR is shown through the show spar command. To verify that traffic is flowing to the upstream switch, the show mac-address-table command is used on the downlink ports and/or the desired VLANs. In our test bed, addresses from the VMware management network, the virtual guest machines, and FCoE appear on the SPAR VLAN on the embedded switch, but on their proper VLANs on the upstream 8264 switch. If the MAC
addresses do not appear in both places, traffic is not flowing properly. The SPAR VLAN, 3992,
is not seen at all on the upstream switch.
The commands to verify SPAR operations and their output are listed in Example 6-32, Example 6-33, Example 6-34, and Example 6-35.
Example 6-32 Show SPAR command output
slot-1#sho spar ?
1-8 Show SPAR ID information
slot-1#sho spar 2
Current SPAR 2 Settings:
enabled, name SPAR 2
Current SPAR 2 Uplink Settings:
port 0, PortChannel 0, adminkey 5757
Current SPAR 2 Domain Settings:
mode passthrough
Current SPAR 2 Default VLAN Domain Settings:
sparvid 3992
server port list: INTA1,INTA12-INTA14
Example 6-33 MAC address display on embedded switch
slot-1#sho mac int port inta1
MAC address VLAN Port Trnk State Permanent Openflow
----------------- -------- ------- ---- ----- --------- --------
00:0c:29:4a:60:ae 3992 INTA1 FWD N
00:0c:29:54:38:d8 3992 INTA1 FWD N
0e:fc:00:01:0c:00 3992 INTA1 FWD N
34:40:b5:be:8e:91 3992 INTA1 FWD N
Example 6-34 MAC address display from 8264 switch - downlinks to 4093
8264cs-1#sho mac portchannel 67
MAC address VLAN Port Trnk State Permanent
----------------- -------- ------- ---- ----- ---------
00:0c:29:4a:60:ae 2 67 TRK
00:0c:29:54:38:d8 1 67 TRK
0e:fc:00:01:0c:00 1001 67 TRK P
34:40:b5:be:8e:91 1 67 TRK
34:40:b5:be:8e:91 1001 67 TRK
Example 6-35 SPAR VLAN on upstream 8264
8264cs-1#sho mac vlan 3992
No FDB entries for VLAN 3992.
8264cs-1#sho vlan 3992
VLAN Name Status Ports
---- -------------------------------- ------ -------------------------
VLAN 3992 doesn't exist.
8264cs-1#
Verifying Operations: SPAR Local Mode
SPAR local mode requires explicit VLAN configuration for every VLAN that will flow through the SPAR domain. These VLANs do appear in the MAC address table of the switch, but as shown in the configuration section above, they are configured by using the domain local n vlan command rather than the usual VLAN membership commands. In addition to the steps shown in the section on verifying SPAR pass-through mode, the MAC address display on both the embedded and upstream switches should show all of the VLANs that are to be used.
An example of a MAC display from a SPAR local server is shown in Figure 6-32. It includes
the SPAR domain default VLAN, which is also the LPVID for vNIC 1, as well as addresses
and VLANs used by FCoE.
Figure 6-32 MAC addresses for server in SPAR local domain
show mac int port inta1
MAC address VLAN Port Trnk State Permanent Openflow
----------------- -------- ------- ---- ----- --------- --------
00:0c:29:4a:60:ae 2 INTA1 FWD N
00:0c:29:54:38:ce 2 INTA1 FWD N
0e:fc:00:01:0c:00 1001 INTA1 FWD P N
34:40:b5:be:8e:90 2 INTA1 FWD N
34:40:b5:be:8e:91 1001 INTA1 FWD N
34:40:b5:be:8e:91 3001 INTA1 FWD N
The SPAR local VLANs would also need to be configured on the upstream switch(es). In this test case, they are the same as the vNIC LPVID VLANs.
Unlike SPAR pass-through mode, FIP snooping is configured in a SPAR local domain, and the show fcoe commands do work; they would need to be checked to verify proper operations, as shown in Figure 6-33.
Figure 6-33 FCoE information - 4093 switch - SPAR local domain mode
slot-1#sho fcoe fips fcoe
Total number of FCoE connections: 1
VN_PORT MAC FCF MAC Port Vlan
------------------------------------------------------
0e:fc:00:01:0c:00 a8:97:dc:0f:ed:c3 INTA1 1001
slot-1#sho fcoe fips fcf
Total number of FCFs detected: 2
FCF MAC Port Vlan
-----------------------------------
a8:97:dc:0f:ed:c3 PCH65 1001
a8:97:dc:0f:ed:c4 PCH65 1001
Verifying Operations: Switch Independent Mode
The status of the network can be seen from the presence of MAC address entries in the embedded and upstream switches as well as from the tools included in the operating system.
Examples of the MAC displays can be seen in Figure 6-32 on page 200 and Figure 6-33 on page 200. The network adapter display from VMware is shown in Figure 6-34 and Figure 6-35. VLAN 2 is configured on multiple vSwitches, and this works as intended, but it uses different vNICs as seen by the OS. The active vNIC instances can be seen below, followed by a display of all of the NICs known to the OS. The differing bandwidth configurations for the different vNICs on the two physical ports are reflected in the display, except for the FCoE vNICs, which do not appear in the network adapter display.
Figure 6-34 VMware vSwitches with multiple vNIC instances
Figure 6-35 VMware Network Adapter display showing all six vNIC’s
Verifying Operations: Storage Access
Because FCoE traffic, whichever VLAN it is using, is not detected as such on the embedded switches in SPAR pass-through mode, the commands to display its status will not show anything when issued on the embedded 4093s. To determine the status of FCoE, appropriate commands need to be issued on the upstream G8264 switch, as shown in Example 6-36 on page 202 and Example 6-37 on page 202.
Example 6-36 FCoE query on embedded 4093 using SPAR Pass-through
slot-1#sho fcoe fips fcoe
FIP snooping is currently disabled.
Example 6-37 FCoE query on upstream 8264
8264cs-1#sho fcoe fips fcoe
Total number of FCoE connections: 1
VN_PORT MAC FCF MAC Port Vlan
------------------------------------------------------
0e:fc:00:01:0c:00 a8:97:dc:0f:ed:c3 PCH67 1001
Access to network storage also needs to be verified from the servers accessing it. Three
LUNs are shown as visible to the server (see Figure 6-36); when there is a configuration error
or a failure on either adapter, the number of LUNs and paths drops to zero on that adapter.
Figure 6-36 Storage Adapter status from VMware host
Abbreviations and acronyms
10GbE 10 Gigabit Ethernet
ACLs access control lists
AMON Auto Monitor
BACS Broadcom Advanced Control Suite
BASP Broadcom Advanced Server Program
BE3 BladeEngine 3
BE3R BladeEngine 3R
BNT Blade Network Technologies
CEE Converged Enhanced Ethernet
CIFS Common Internet File System
CNAs converged network adapters
CSE Consulting System Engineer
DAC direct-attach cables
DACs direct-attach cables
DCB Data Center Bridging
DCE Data Center Ethernet
ECP Edge Control Protocol
ETS Enhanced Transmission Selection
EVB Edge Virtual Bridging
FC Fibre Channel
FCF Fibre Channel Forwarder
FCoE Fibre Channel over Ethernet
FIP FCoE Initialization Protocol
FO Failover
FoD Feature on Demand
HBA host bus adapter
HBAs host bus adapters
IBM International Business Machines Corporation
ISL inter-switch link
ITSO International Technical Support Organization
KVM Kernel-based Virtual Machine
LACP Link Aggregation Control Protocol
LAG Link Aggregation Group
LANs local area networks
LOM LAN on system board
MAC Media access control
MMON Manual Monitor
MSTP Multiple STP
NAS network-attached storage
NFS Network File System
NIC Network Interface Card
NPIV N_Port ID Virtualization
NPV N_Port Virtualization
NTP Network Time Protocol
PDUs protocol data units
PFA PCI Function Address
PFC Priority-based Flow Control
PIM Protocol Independent Multicast
PVRST Per-VLAN Rapid STP
RMON Remote Monitoring
ROI return on investment
RSCN Registered State Change Notification
RSTP Rapid STP
RoCE RDMA over Converged Ethernet
SAN storage area network
SANs storage area networks
SAS serial-attached SCSI
SLB Smart Load Balance
SLP Service Location Protocol
SNSC System Networking Switch Center
SPAR Switch Partitioning
SR SFP+ Transceiver
SoL Serial over LAN
TLV Type-Length-Value
TOE TCP offload Engine
Tb terabit
ToR Top of Rack
UFP Unified Fabric Port
UFPs Unified fabric ports
VEB Virtual Ethernet Bridging
VEPA Virtual Ethernet Port Aggregator
VM virtual machine
VMs virtual machines
VSI Virtual Station Interface
VSS Virtual Switch System
iSCSI Internet Small Computer System Interface
isCLI industry standard CLI
pNIC Physical NIC mode
sFTP Secure FTP
vLAG virtual Link Aggregation
vLAGs Virtual link aggregation groups
vNIC virtual Network Interface Card
vNICs Virtual NICs
vPC virtual Port Channel
vPort virtual port
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topic in this
document. Note that some publications referenced in this list might be available in softcopy
only.
򐂰 IBM Flex System Networking in an Enterprise Data Center, 2nd Edition, REDP-4834
򐂰 IBM Flex System and PureFlex System Network Implementation, SG24-8089
򐂰 Storage and Network Convergence Using FCoE and iSCSI, SG24-7986
򐂰 Implementing Systems Management of IBM PureFlex System, SG24-8060
򐂰 IBM PureFlex System and IBM Flex System Products and Technology, SG24-7984
You can search for, view, download or order these documents and other Redbooks,
Redpapers, Web Docs, draft and additional materials, at the following website:
ibm.com/redbooks
Help from IBM
IBM Support and downloads
ibm.com/support
IBM Global Services
ibm.com/services
Back cover

NIC Virtualization on IBM Flex System

The deployment of server virtualization technologies in data centers requires significant efforts in providing sufficient network I/O bandwidth to satisfy the demand of virtualized applications and services. For example, every virtualized system can host several dozen network applications and services. Each of these services requires certain bandwidth (or speed) to function properly. Furthermore, because of different network traffic patterns that are relevant to different service types, these traffic flows can interfere with each other. They can lead to serious network problems, including the inability of the service to perform its functions.

The NIC virtualization solutions on IBM® Flex System address these issues. The solutions are based on the IBM Flex System® Enterprise Chassis with a 10 Gbps Converged Enhanced Ethernet infrastructure. This infrastructure is built on IBM RackSwitch™ G8264 and G8264CS Top of Rack (ToR) switches, IBM Flex System Fabric CN4093 and EN4093R 10 Gbps Ethernet switch modules, and IBM Flex System SI4093 Switch Interconnect modules in the chassis and the Emulex Virtual Fabric Adapters in each compute node.

This IBM Redbooks® publication provides configuration scenarios that use leading edge IBM networking technologies combined with the Emulex Virtual Fabric adapters. This book is for IBM, IBM Business Partner, and client networking professionals who want to learn how to implement NIC virtualization solutions and switch interconnect technologies on IBM Flex System by using the IBM Unified Fabric Port (UFP) mode, Switch Independent mode, and IBM Virtual Fabric mode.

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment. For more information: ibm.com/redbooks

SG24-8223-00
More Related Content

PDF
Introduction to the EMC VNX Series VNX5100, VNX5300, VNX5500, VNX5700, and VN...
 
PDF
TechBook: Using EMC VNX Storage with VMware vSphere
 
PDF
VNX Snapshots
 
PDF
IBM AIX Version 7.1 Differences Guide
PDF
Sg248203
PDF
IBM Flex System Interoperability Guide
PDF
Techbook : Using EMC Symmetrix Storage in VMware vSphere Environments
 
PDF
Networking for Storage Virtualization and EMC RecoverPoint TechBook
 
Introduction to the EMC VNX Series VNX5100, VNX5300, VNX5500, VNX5700, and VN...
 
TechBook: Using EMC VNX Storage with VMware vSphere
 
VNX Snapshots
 
IBM AIX Version 7.1 Differences Guide
Sg248203
IBM Flex System Interoperability Guide
Techbook : Using EMC Symmetrix Storage in VMware vSphere Environments
 
Networking for Storage Virtualization and EMC RecoverPoint TechBook
 

What's hot (18)

PDF
Hypervisor Framework
PDF
DB2 10 for Linux on System z Using z/VM v6.2, Single System Image Clusters an...
PDF
IBM Power 710 and 730 Technical Overview and Introduction
PDF
Implementing omegamon xe for messaging v6.0 sg247357
PDF
I/O Scalability in Xen
PDF
TechBook: EMC VPLEX Metro Witness Technology and High Availability
 
PPTX
Hypervisors
PPTX
Hypervisor seminar
PDF
Redbook: Running IBM WebSphere Application Server on System p and AIX: Optimi...
PDF
IBM PowerVC Introduction and Configuration
PDF
Using EMC Symmetrix Storage in VMware vSphere Environments
 
PDF
Huong Dan Cau Hinh Cac Tinh Nang Co Ban Cho Cisco Router
PDF
Guide server virtualization_deployment
PDF
Ibm total storage productivity center for replication on windows 2003 sg247250
PDF
Ibm tivoli omegamon xe v3.1.0 deep dive on z os sg247155
PDF
Ibm tivoli storage manager bare machine recovery for aix with sysback - red...
PDF
Rfs7000 series switch troubleshooting guide
PPTX
2014.08.30 Virtual Machine Threat 세미나
Hypervisor Framework
DB2 10 for Linux on System z Using z/VM v6.2, Single System Image Clusters an...
IBM Power 710 and 730 Technical Overview and Introduction
Implementing omegamon xe for messaging v6.0 sg247357
I/O Scalability in Xen
TechBook: EMC VPLEX Metro Witness Technology and High Availability
 
Hypervisors
Hypervisor seminar
Redbook: Running IBM WebSphere Application Server on System p and AIX: Optimi...
IBM PowerVC Introduction and Configuration
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Huong Dan Cau Hinh Cac Tinh Nang Co Ban Cho Cisco Router
Guide server virtualization_deployment
Ibm total storage productivity center for replication on windows 2003 sg247250
Ibm tivoli omegamon xe v3.1.0 deep dive on z os sg247155
Ibm tivoli storage manager bare machine recovery for aix with sysback - red...
Rfs7000 series switch troubleshooting guide
2014.08.30 Virtual Machine Threat 세미나
Ad

Viewers also liked (20)

PPT
IBM System Networking Easy Connect Mode
PDF
Calender of Side Events #CSW59 #Beijing20 @UNWOMEN @UNNGLS
PPT
Ibm pure flex overview cust pr
KEY
Social Media for Events
PDF
احكام بيع التقسيط
PDF
مراجعة الصف الثانى الاعدادى
PPT
Usabilidad
PPT
Socialising the enterprise
PPT
Zeimer BNI Presentation June 8, 2011
PDF
C. V Hanaa Ahmed
PPTX
Budget Cuts And Their Effects
DOCX
Nelly Osorio Godoy
DOCX
Alba Lucia Sanchez Mejia
DOC
2 3ton per hour sand gold processing
ODT
Agribusiness
PDF
Cascalog at Hadoop Day
DOC
PPT
Engaging the older Participant
PDF
Congelamiento de precios productos en wal mart
PPSX
Daniel Avidor - Deciphering the Viral Code – The Secrets of Redmatch
IBM System Networking Easy Connect Mode
Calender of Side Events #CSW59 #Beijing20 @UNWOMEN @UNNGLS
Ibm pure flex overview cust pr
Social Media for Events
احكام بيع التقسيط
مراجعة الصف الثانى الاعدادى
Usabilidad
Socialising the enterprise
Zeimer BNI Presentation June 8, 2011
C. V Hanaa Ahmed
Budget Cuts And Their Effects
Nelly Osorio Godoy
Alba Lucia Sanchez Mejia
2 3ton per hour sand gold processing
Agribusiness
Cascalog at Hadoop Day
Engaging the older Participant
Congelamiento de precios productos en wal mart
Daniel Avidor - Deciphering the Viral Code – The Secrets of Redmatch
Ad

Similar to NIC Virtualization on IBM Flex Systems (20)

PDF
Ibm flex system and pure flex system network implementation with cisco systems
PDF
Advanced Networking Concepts Applied Using Linux on IBM System z
PDF
IBM Flex System Networking in an Enterprise Data Center
PDF
IBM PowerVM Best Practices
PDF
IBM PowerVM Virtualization Introduction and Configuration
PDF
Introducing and Implementing IBM FlashSystem V9000
PDF
IBM Flex System Interoperability Guide
PDF
AIX 5L Differences Guide Version 5.3 Edition
PDF
Ibm power vc version 1.2.3 introduction and configuration
PDF
Tcpip Tutorial And Technical Overview 7th Edition Ibm Redbooks
PDF
Ref arch for ve sg248155
PDF
IBM Data Center Networking: Planning for Virtualization and Cloud Computing
PDF
Getting Started with KVM for IBM z Systems
PDF
IBM Power 750 and 755 Technical Overview and Introduction
PDF
Implementing Linux With Ibm Disk Storage Ibm Redbooks
PDF
IBM Power10.pdf
PDF
IBM Flex System p260 and p460 Planning and Implementation Guide
PDF
Implementing IBM SmartCloud Entry on IBM PureFlex System
PDF
redp5222.pdf
PDF
IBM zEnterprise 114 Technical Guide
Ibm flex system and pure flex system network implementation with cisco systems
Advanced Networking Concepts Applied Using Linux on IBM System z
IBM Flex System Networking in an Enterprise Data Center
IBM PowerVM Best Practices
IBM PowerVM Virtualization Introduction and Configuration
Introducing and Implementing IBM FlashSystem V9000
IBM Flex System Interoperability Guide
AIX 5L Differences Guide Version 5.3 Edition
Ibm power vc version 1.2.3 introduction and configuration
Tcpip Tutorial And Technical Overview 7th Edition Ibm Redbooks
Ref arch for ve sg248155
IBM Data Center Networking: Planning for Virtualization and Cloud Computing
Getting Started with KVM for IBM z Systems
IBM Power 750 and 755 Technical Overview and Introduction
Implementing Linux With Ibm Disk Storage Ibm Redbooks
IBM Power10.pdf
IBM Flex System p260 and p460 Planning and Implementation Guide
Implementing IBM SmartCloud Entry on IBM PureFlex System
redp5222.pdf
IBM zEnterprise 114 Technical Guide

More from Angel Villar Garea (20)

PDF
SlideShare Stats 2014
PDF
Good bye and many thanks!!
PDF
Guía básica de configuración switches Flex System
PDF
Comparing Enterprise Server And Storage Networking Options
PDF
IBM System Networking Portfolio Update, June 2014
PDF
Flexible Port Mapping
PDF
IBM RackSwitch G7028 Pictures
PDF
IBM Flex Systems Interconnect Fabric
PDF
SHS IBM SAP HANA - 100-times boost
PDF
Recommended Security Practices on IBM Switches and Routers
PDF
Introducing SDN VE and network virtualization
PDF
IBM 40Gb Ethernet - A competitive alternative to Infiniband
PDF
IBM SDN VE Platform
PDF
IBM SDN for VE - Jan 2014
PDF
El software se abre camino por la senda de las redes con SDN
PDF
Angel 2013 Year on SlideShare
PDF
Let's build a bright 2014
PDF
Nuevos modelos de crecimiento para generar oportunidades de negocio y empleo
PDF
IBM Flash Systems, un paso adelante
PDF
Regionalgas Euskirchen beats competitors to new business
SlideShare Stats 2014
Good bye and many thanks!!
Guía básica de configuración switches Flex System
Comparing Enterprise Server And Storage Networking Options
IBM System Networking Portfolio Update, June 2014
Flexible Port Mapping
IBM RackSwitch G7028 Pictures
IBM Flex Systems Interconnect Fabric
SHS IBM SAP HANA - 100-times boost
Recommended Security Practices on IBM Switches and Routers
Introducing SDN VE and network virtualization
IBM 40Gb Ethernet - A competitive alternative to Infiniband
IBM SDN VE Platform
IBM SDN for VE - Jan 2014
El software se abre camino por la senda de las redes con SDN
Angel 2013 Year on SlideShare
Let's build a bright 2014
Nuevos modelos de crecimiento para generar oportunidades de negocio y empleo
IBM Flash Systems, un paso adelante
Regionalgas Euskirchen beats competitors to new business

Recently uploaded (20)

PPTX
Understanding_Digital_Forensics_Presentation.pptx
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PDF
Machine learning based COVID-19 study performance prediction
PDF
Encapsulation theory and applications.pdf
PDF
Modernizing your data center with Dell and AMD
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
Big Data Technologies - Introduction.pptx
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Electronic commerce courselecture one. Pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
MYSQL Presentation for SQL database connectivity
PPT
Teaching material agriculture food technology
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PPTX
Cloud computing and distributed systems.
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
NewMind AI Weekly Chronicles - August'25 Week I
Understanding_Digital_Forensics_Presentation.pptx
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
CIFDAQ's Market Insight: SEC Turns Pro Crypto
Machine learning based COVID-19 study performance prediction
Encapsulation theory and applications.pdf
Modernizing your data center with Dell and AMD
Mobile App Security Testing_ A Comprehensive Guide.pdf
Big Data Technologies - Introduction.pptx
Unlocking AI with Model Context Protocol (MCP)
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
The AUB Centre for AI in Media Proposal.docx
Electronic commerce courselecture one. Pdf
Advanced methodologies resolving dimensionality complications for autism neur...
MYSQL Presentation for SQL database connectivity
Teaching material agriculture food technology
“AI and Expert System Decision Support & Business Intelligence Systems”
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
Cloud computing and distributed systems.
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
NewMind AI Weekly Chronicles - August'25 Week I

NIC Virtualization on IBM Flex Systems

  • 5. © Copyright IBM Corp. 2014. All rights reserved. iii Draft Document for Review May 1, 2014 2:10 pm 8223TOC.fm Contents Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix Authors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .x Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi Chapter 1. Introduction to I/O module and NIC virtualization features in the IBM Flex System environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Overview of Flex System I/O module virtualization technologies . . . . . . . . . . . . . . . . . . 2 1.1.1 Introduction to converged fabrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.2 Introduction to vLAG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.3 Introduction to stacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.4 Introduction to SPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.1.5 Easy Connect Q-in-Q solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.1.6 Introduction to the Failover feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.2 Introduction to NIC virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.2.1 vNIC based NIC virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.2.2 Unified Fabric Port based NIC virtualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.2.3 Comparing vNIC modes and UFP modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Chapter 2. Converged networking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.1 What convergence is. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.1.1 Calling it what it is . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2 Vision of convergence in data centers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.3 The interest in convergence now . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.4 Fibre Channel SANs today . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.5 Ethernet-based storage today. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.6 Benefits of convergence in storage and network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
19 2.7 Challenge of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.9 Fibre Channel over Ethernet protocol stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.10 iSCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.11 iSCSI versus FCoE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.11.1 Key similarities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.11.2 Key differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 Chapter 3. IBM Flex System networking architecture and portfolio. . . . . . . . . . . . . . . 27 3.1 Enterprise Chassis I/O architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.2 IBM Flex System Ethernet I/O modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.2.1 IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches . . . . . 31 3.2.2 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch. . . . . . . . . . 36 3.2.3 IBM Flex System Fabric SI4093 System Interconnect Module. . . . . . . . . . . . . . . 42 3.2.4 I/O modules and cables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.3 IBM Flex System Ethernet adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.3.1 Embedded 10Gb Virtual Fabric Adapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.3.2 IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters. . . . . . . . . . . 48
  • 6. 8223TOC.fm Draft Document for Review May 1, 2014 2:10 pm iv NIC Virtualization on IBM Flex System 3.3.3 IBM Flex System CN4022 2-port 10Gb Converged Adapter . . . . . . . . . . . . . . . . 50 3.3.4 IBM Flex System x222 Compute Node LOM . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 Chapter 4. NIC virtualization considerations on the switch side . . . . . . . . . . . . . . . . . 55 4.1 Virtual Fabric vNIC solution capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.1.1 Virtual Fabric mode vNIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.1.2 Switch Independent mode vNIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.2 Unified Fabric Port feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.2.1 UFP Access and Trunk modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 4.2.2 UFP Tunnel mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.2.3 UFP FCoE mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 4.2.4 UFP Auto mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 4.2.5 The following rules and attributes are associated with UFP vPorts . . . . . . . . . . . 69 4.3 Compute node NIC to I/O module connectivity mapping . . . . . . . . . . . . . . . . . . . . . . . 70 4.3.1 Embedded 10Gb VFA (LoM) - Mezzanine 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 4.3.2 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1. . . . . . . . . . . . . . 72 4.3.3 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1 and 2. . . . . . . . . 72 4.3.4 IBM Flex System x222 Compute Node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 Chapter 5. NIC virtualization considerations on the server side. . . . . . . . . . . . . . . . . 75 5.1 Introduction to enabling Virtual NICs on the server. . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 5.1.1 Getting in to the virtual NIC configuration section of UEFI . . . . . . . . . . . . . . . . . . 76 5.1.2 Initially enabling virtual NIC functionality via UEFI . . . . . . . . . . . . . . . . . . . . . . . . 85 5.1.3 Special settings for the different modes of virtual NIC via UEFI . . . . . . . . . . . . . . 86 5.1.4 Setting the Emulex virtual NIC settings back to factory default. . . . . . . . . . . . . . . 91 5.2 Other methods for configuring virtual NICs on the server . . . . . . . . . . . . . . . . . . . . . . . 92 5.2.1 FSM Configuration Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 5.3 Utilizing physical and virtual NICs in the OS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5.3.1 Introduction to teaming/bonding on the server . . . . . . . . . . . . . . . . . . . . . . . . . . 115 5.3.2 OS side teaming/bonding and upstream network requirements . . . . . . . . . . . . . 122 5.3.3 Discussion of physical NIC connections and logical enumeration . . . . . . . . . . . 128 Chapter 6. Flex System NIC virtulization deployment scenarios . . . . . . . . . . . . . . . . 133 6.1 Introduction to deployment examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 6.2 UFP mode virtual NIC and Layer 2 Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . 137 6.2.1 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 6.2.2 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 6.2.3 Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 6.2.4 Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 6.2.5 Confirming operation of the environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 6.3 UFP mode virtual NIC with vLAG and FCoE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 6.3.1 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 6.3.2 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 6.3.3 Use cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 6.3.4 Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150 6.3.5 Confirming operation of the environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 6.4 pNIC and vNIC Virtual Fabric modes with Layer 2 Failover . . . . . . . . . . . . . . . . . . . . 163 6.4.1 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 6.4.2 Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 6.4.3 Use cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 6.4.4 Configurations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 6.4.5 Verifying operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 6.5 Switch Independent mode with SPAR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 6.5.1 Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
  • 7. Contents v Draft Document for Review May 1, 2014 2:10 pm 8223TOC.fm 6.5.2 Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189 6.5.3 Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 6.5.4 Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 6.5.5 Verifying operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198 Abbreviations and acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203 Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 Help from IBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
  • 9. © Copyright IBM Corp. 2014. All rights reserved. vii Draft Document for Review May 1, 2014 2:10 pm 8223spec.fm Notices This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. 
To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
  • 10. Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
Blade Network Technologies® BladeCenter® BNT® IBM® IBM Flex System® Power Systems™ PowerVM® PureFlex® RackSwitch™ Redbooks® Redbooks (logo) ® System x® VMready®

The following terms are trademarks of other companies:
Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
  • 11. Preface

The deployment of server virtualization technologies in data centers requires significant effort to provide sufficient network I/O bandwidth to satisfy the demand of virtualized applications and services. For example, every virtualized system can host several dozen network applications and services. Each of these services requires certain bandwidth (or speed) to function properly. Furthermore, because of the different network traffic patterns that are relevant to different service types, these traffic flows can interfere with each other. They can lead to serious network problems, including the inability of a service to perform its functions.

The NIC virtualization solutions on IBM® Flex System address these issues. The solutions are based on the IBM Flex System® Enterprise Chassis with a 10 Gbps Converged Enhanced Ethernet infrastructure. This infrastructure is built on IBM RackSwitch™ G8264 and G8264CS Top of Rack (ToR) switches, IBM Flex System Fabric CN4093 and EN4093R 10 Gbps Ethernet switch modules, and IBM Flex System SI4093 Switch Interconnect modules in the chassis, and the Emulex and Broadcom Virtual Fabric Adapters in each compute node.

This IBM Redbooks® publication provides configuration scenarios that use leading-edge IBM networking technologies combined with the Emulex Virtual Fabric adapters. This book is for IBM, IBM Business Partner, and client networking professionals who want to learn how to implement NIC virtualization solutions and switch interconnect technologies on IBM Flex System by using the IBM Unified Fabric Port (UFP) mode, Switch Independent mode, and IBM Virtual Fabric mode.

Authors

This book was produced by a team of specialists from around the world working at the International Technical Support Organization, Raleigh Center.

Ilya Krutov is a Project Leader at the ITSO Center in Raleigh and has been with IBM since 1998. Before he joined the ITSO, Ilya served in IBM as a Run Rate Team Leader, Portfolio Manager, Brand Manager, Technical Sales Specialist, and Certified Instructor. Ilya has expert knowledge in IBM System x®, BladeCenter®, and Flex System products and technologies, virtualization and cloud computing, and data center networking. He has authored over 150 books, papers, product guides, and solution guides. He has a bachelor’s degree in Computer Engineering from the Moscow Engineering and Physics Institute.

Scott Irwin is a Consulting System Engineer (CSE) for IBM System Networking. He joined IBM in November of 2010 as part of the Blade Network Technologies® (BNT®) acquisition. His networking background spans well over 16 years as both a Customer Support Escalation Engineer and a customer-facing Field Systems Engineer. In May of 2007, he was promoted to Consulting Systems Engineer with a focus on deep customer troubleshooting. His responsibilities are to support customer proofs of concept, assist with paid installations and training, and provide both pre-sales and post-sales support across all verticals (Public Sector, High Frequency Trading, Service Provider, Mid Market, and Enterprise).
  • 12. Scott Lorditch is a Consulting Systems Engineer for IBM System Networking. He performs network architecture assessments and develops designs and proposals for implementing GbE Switch Module products for the IBM BladeCenter. He also developed several training and lab sessions for IBM technical and sales personnel. Previously, Scott spent almost 20 years working on networking in various industries, as a senior network architect, a product manager for managed hosting services, and a manager of electronic securities transfer projects. Scott holds a BS degree in Operations Research with a specialization in computer science from Cornell University.

Matt Slavin is a Consulting Systems Engineer for IBM System Networking, based out of Tulsa, Oklahoma, and currently provides network consulting skills to the Americas. He has a background of over 30 years of hands-on systems and network design, installation, and troubleshooting. Most recently, he has focused on data center networking, where he is leading client efforts in adopting new and potentially game-changing technologies into their day-to-day operations. Matt joined IBM through the acquisition of Blade Network Technologies, and prior to that he worked at some of the top systems and networking companies in the world.

Thanks to the following people for their contributions to this project:

Tamikia Barrow, Cheryl Gera, Chris Rayns, Jon Tate, David Watts, Debbie Willmschen
International Technical Support Organization, Raleigh Center

Nghiem Chu, Sai Chan, Michael Easterly, Heidi Griffin, Richard Mancini, Shekhar Mishra, Heather Richardson, Hector Sanchez, Tim Shaughnessy
IBM

Jeff Lin
Emulex

Now you can become a published author, too!

Here’s an opportunity to spotlight your skills, grow your career, and become a published author—all at the same time! Join an ITSO residency project and help write a book in your area of expertise, while honing your experience using leading-edge technologies. Your efforts will help to increase product acceptance and customer satisfaction, as you expand your network of technical contacts and relationships. Residencies run from two to six weeks in length, and you can participate either in person or as a remote resident working from your home base. Find out more about the residency program, browse the residency index, and apply online at:

ibm.com/redbooks/residencies.html
  • 13. Comments welcome

Your comments are important to us! We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:
򐂰 Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
򐂰 Send your comments in an email to:
redbooks@us.ibm.com
򐂰 Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400

Stay connected to IBM Redbooks
򐂰 Find us on Facebook:
http://www.facebook.com/IBMRedbooks
򐂰 Follow us on Twitter:
http://twitter.com/ibmredbooks
򐂰 Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
򐂰 Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
򐂰 Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html
  • 15. Chapter 1. Introduction to I/O module and NIC virtualization features in the IBM Flex System environment

This chapter introduces the various virtualization features available with certain I/O modules and converged network adapters (CNAs) in the IBM PureFlex® System environment. The primary focus of this paper is the EN4093R, CN4093, and SI4093, along with the related server-side CNA virtualization features. Although other I/O modules are available for the Flex System Enterprise Chassis environment, unless otherwise noted, those other I/O modules do not support the virtualization features discussed in this document and are not covered here.

This chapter includes the following sections:
򐂰 1.1, “Overview of Flex System I/O module virtualization technologies” on page 2
򐂰 1.2, “Introduction to NIC virtualization” on page 10
  • 16. 1.1 Overview of Flex System I/O module virtualization technologies

The term virtualization can mean many different things to different people, and in different contexts. For example, in the server world it is often associated with taking bare metal platforms and putting in a layer of software (referred to as a hypervisor) that permits multiple virtual machines (VMs) to run on that single physical platform, with each VM thinking it owns the entire hardware platform.

In the network world, there are many different concepts of virtualization. One example is overlay technologies, which let a user run one network on top of another network, usually with the goal of hiding the complexities of the underlying network (often referred to as overlay networking). Another form of network virtualization is OpenFlow technology, which decouples a switch's control plane from the switch and allows the switching path decisions to be made from a central control point. And then there are other forms of virtualization, such as cross-chassis aggregation (also known as cross-switch aggregation), virtualized NIC technologies, and converged fabrics.

This paper is focused on the latter set of virtualization forms, specifically the following set of features:
򐂰 Converged fabrics - Fibre Channel over Ethernet (FCoE) and internet Small Computer Systems Interconnect (iSCSI)
򐂰 virtual Link Aggregation (vLAG) - A form of cross-switch aggregation
򐂰 Stacking - Virtualizing the management plane and the switching fabric
򐂰 Switch Partitioning (SPAR) - Masking the I/O module from the host and upstream network
򐂰 Easy Connect Q-in-Q solutions - More ways to mask the I/O modules from connecting devices
򐂰 NIC virtualization - Allowing a single physical 10G NIC to represent multiple NICs to the host OS

Although we will be introducing all of these topics in this section, the primary focus of this paper will be around how the last item (NIC virtualization) integrates into the various other features, and the surrounding customer environment. The specific NIC virtualization features that will be discussed in detail in this paper include the following:
򐂰 IBM Virtual Fabric mode - also known as vNIC Virtual Fabric mode, including both Dedicated Uplink Mode (default) and Shared Uplink Mode (optional) operations
򐂰 Switch Independent Mode - also known as vNIC Switch Independent Mode
򐂰 Unified Fabric Port - also known as IBM Unified Fabric Protocol, or just UFP - All modes

Important: The term vNIC can be used both generically for all virtual NIC technologies, or as a vendor-specific term. For example, VMware calls the virtual NIC that resides inside a VM a vNIC. Unless otherwise noted, the use of the term vNIC in this paper is referring to a specific feature available on the Flex System I/O modules and Emulex CNAs inside physical hosts. In a related fashion, the term vPort has multiple connotations, for example, used by Microsoft for their Hyper-V environment. Unless otherwise noted, the use of the term vPort in this paper is referring to the UFP feature on the Flex System I/O modules and Emulex CNAs inside physical hosts.
  • 17. 1.1.1 Introduction to converged fabrics

As the name implies, converged fabrics are all about taking a set of protocols and data designed to run on top of one kind of physical medium, and allowing them to be carried on top of a different physical medium. This provides a number of cost benefits, such as reducing the number of physical cabling plants that are required, removing the need for separate physical NICs and HBAs, and potentially reducing power and cooling requirements. From an OpEx perspective, it can reduce the cost associated with the management of separate physical infrastructures.

In the datacenter world, two of the most common forms of converged fabrics are FCoE and iSCSI. FCoE allows a host to use its 10 Gb Ethernet connections to access Fibre Channel attached remote storage, as if it were physically Fibre Channel attached to the host, when in fact the FC traffic is encapsulated into FCoE frames and carried to the remote storage via an Ethernet network. iSCSI takes a protocol that was originally designed for hosts to talk to relatively close physical storage over physical SCSI cables, and converts it to utilize IP and run over an Ethernet network, and thus be able to access storage far beyond the limitations of a physical SCSI-based solution. Both of these topics are discussed in more detail in Chapter 2, “Converged networking” on page 15.

1.1.2 Introduction to vLAG

In its simplest terms, vLAG is a technology designed to enhance traditional Ethernet link aggregations (sometimes referred to generically as Portchannels or Etherchannels). It is important to note that vLAG is not a form of aggregation in its own right, but an enhancement to aggregations. As some background, under current IEEE specifications, an aggregation is still defined as a bundle of similar links between two, and only two, devices, bound together to operate as a single logical link. By today's standards-based definitions, you cannot create an aggregation on one device and have the links of that aggregation connect to more than a single device on the other side of the aggregation. The use of only two devices in this fashion limits the ability to offer certain robust designs.

Although the standards bodies are working on a solution that provides split aggregations across devices, most vendors have developed their own versions of this multi-chassis aggregation. For example, Cisco has virtual Port Channel (vPC) on NX-OS products, and Virtual Switch System (VSS) on the 6500 IOS products. IBM offers virtual Link Aggregation (vLAG) on many of the IBM Top of Rack (ToR) solutions, and on the EN4093R and CN4093 Flex System I/O modules.

The primary goal of virtual link aggregation is to overcome the limit imposed by the current standards-based aggregation, and provide a distributed aggregation across a pair of switches instead of a single switch. Doing so results in a reduction of single points of failure, while still maintaining a loop-free, non-blocking environment.

Important: All I/O module features discussed in this paper are based on the latest available firmware at the time of this writing (7.7.9 for the EN4093R and CN4093, and 7.7.8 for the SI4093 System Interconnect Module).
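To give a feel for what vLAG involves on the I/O module side, the following is a minimal configuration sketch in the style of the IBM Networking OS ISCLI. The port ranges, LACP keys, and tier ID are placeholders, and the exact command syntax should be verified against the configuration examples later in this book and the Application Guide for your firmware level.

  ! Minimal vLAG sketch (apply a matching configuration on both vLAG peers)
  ! Inter-switch link (ISL) between the two vLAG peer switches
  interface port EXT1-EXT2
   lacp mode active
   lacp key 200
   exit
  ! Uplink ports that form the distributed aggregation toward the upstream network
  interface port EXT3-EXT4
   lacp mode active
   lacp key 1000
   exit
  ! Bind vLAG to the ISL and to the uplink aggregation, then enable the feature
  vlag tier-id 10
  vlag isl adminkey 200
  vlag adminkey 1000 enable
  vlag enable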
  • 18. Figure 1-1 shows an example of how vLAG can create a single common uplink out of a pair of embedded I/O modules. This creates a non-looped path with no blocking links, offering the maximum amount of bandwidth for the links, and no single point of failure.

Figure 1-1 Non-looped design using multi-chassis aggregation on both sides

Although this vLAG-based design is considered the most optimal, not all I/O module virtualization options support this topology; for example, Virtual Fabric vNIC mode and SPAR are not supported with vLAG. Another potentially limiting factor with vLAG (and other such cross-chassis aggregations such as vPC and VSS) is that it only supports a pair of switches acting as one for this cross-chassis aggregation, and not more than two. If the desire is to split an aggregation across more than two switches, stacking might be an option to consider.

1.1.3 Introduction to stacking

Stacking provides the ability to take up to eight physical I/O modules and treat them as a single logical switch from a port usage and management perspective. This means ports on different I/O modules in the stack can be part of a common aggregation, and you only log in to a single IP address to manage all I/O modules in the stack. For devices that are attaching to the stack, the stack looks and acts like a single large switch.

Stacking is supported on the EN4093R and CN4093 I/O modules. It is provided by reserving a group of uplinks as stacking links and creating a ring of I/O modules with these links. The ring design ensures the loss of a single link or single I/O module in the stack does not lead to a disruption of the stack.

Before v7.7 releases of code, it was possible to stack the EN4093R only into a common stack of like model I/O modules. However, in v7.7 and later code, support was added for adding a pair of CN4093s into a hybrid stack with EN4093Rs to add Fibre Channel Forwarder (FCF) capability into the stack. The limit for this hybrid stacking is a maximum of 6 x EN4093Rs and 2 x CN4093s in a common stack.

Important: When using the EN4093R and CN4093 in hybrid stacking, only the CN4093 is allowed to act as a stack master or stack backup master for the stack.
  • 19. Stacking the Flex System chassis I/O modules with IBM Top of Rack switches that also support stacking is not allowed. Connections from a stack of Flex System chassis I/O modules to upstream switches can be made with normal single or aggregated connections, including the use of vLAG/vPC on the upstream switches to connect links across stack members into a common non-blocking fabric between the stack and the Top of Rack switches. An example of four I/O modules in a highly available stacking design is shown in Figure 1-2.

Figure 1-2 Example of stacking in the Flex System environment

This example shows a design with no single point of failure, using four I/O modules in a single stack and a pair of upstream vLAG/vPC-connected switches.

One of the potential limitations of the current implementation of stacking is that if an upgrade of code is needed, a reload of the entire stack must occur. Because upgrades are uncommon and should be scheduled for non-production hours anyway, a single stack design is usually efficient and acceptable. But some customers do not want to have any downtime (scheduled or otherwise), and a single stack design is thus not an acceptable solution. For those users who still want to make the most use of stacking, a two-stack design might be an option. This design features stacking a set of I/O modules in bay 1 into one stack, and a set of I/O modules in bay 2 in a second stack. The primary advantage of a two-stack design is that each stack can be upgraded one at a time, with the running stack maintaining connectivity for the compute nodes during the upgrade and reload of the other stack. The downside of the two-stack design is that traffic flowing from one stack to the other must go through the upstream network.

As can be seen, stacking might not be suitable for all customers. However, if it is desired, it is another tool that is available for building a robust infrastructure by using the Flex System I/O modules.
  • 20. 1.1.4 Introduction to SPAR

Switch partitioning (SPAR) is a feature that, among other things, allows a physical I/O module to be divided into multiple logical switches. After SPAR is configured, ports within a given SPAR group can communicate only with each other. Ports that are members of different SPAR groups on the same I/O module cannot communicate directly with each other without going outside the I/O module. The EN4093R, CN4093, and SI4093 I/O modules support SPAR. SPAR features two modes of operation:
򐂰 Pass-through domain mode (also known as transparent mode)
This mode of SPAR uses a Q-in-Q function to encapsulate all traffic passing through the switch in a second layer of VLAN tagging. This is the default mode when SPAR is enabled and is VLAN agnostic owing to this Q-in-Q operation. It passes tagged and untagged packets through the SPAR session without looking at or interfering with any customer assigned tag. SPAR pass-through mode supports passing FCoE packets to an upstream FCF, but without the benefit of FIP snooping within the SPAR group in pass-through domain mode.
򐂰 Local domain mode
This mode is not VLAN agnostic and requires a user to create any required VLANs in the SPAR group. Currently, there is a limit of 256 VLANs in Local domain mode. Support is available for FIP Snooping on FCoE sessions in Local domain mode. Unlike pass-through domain mode, Local domain mode provides strict control of end host VLAN usage.

Consider the following points regarding SPAR:
򐂰 SPAR is disabled by default on the EN4093R and CN4093. SPAR is enabled by default on the SI4093, with all base licensed internal and external ports defaulting to a single pass-through SPAR group. This default SI4093 configuration can be changed if desired.
򐂰 Any port can be a member of only a single SPAR group at one time.
򐂰 Only a single uplink path is allowed per SPAR group (it can be a single link, a single static aggregation, or a single LACP aggregation). This SPAR enforced restriction ensures that no network loops are possible with ports in a SPAR group.
򐂰 SPAR cannot be used with UFP or Virtual Fabric vNIC at this time. Switch Independent Mode vNIC is supported with SPAR. UFP support is slated for a possible future release.
򐂰 Up to eight SPAR groups per I/O module are supported. This number might be increased in a future release.
򐂰 SPAR is not supported with the vLAG, stacking, or tagpvid-ingress features.

SPAR can be a useful solution in environments where simplicity is paramount.

1.1.5 Easy Connect Q-in-Q solutions

The Easy Connect concept, often referred to as Easy Connect mode, or Transparent mode, is not a specific feature but a way of using one of four different existing features to attempt to minimize ongoing I/O module management requirements. The primary goal of Easy Connect is to make an I/O module transparent to the hosts and the upstream network they need to access, thus reducing the management requirements for I/O modules in an Easy Connect mode.
  • 21. Chapter 1. Introduction to I/O module and NIC virtualization features in the IBM Flex System environment 7 Draft Document for Review May 1, 2014 2:10 pm Introduction.fm As noted, there are actually several features that can be used to accomplish an Easy Connect solution, with the following being common aspects of Easy Connect solutions: 򐂰 At the heart of Easy Connect is some form of Q-in-Q tagging, to mask packets traveling through the I/O module. This is a fundamental requirement of any Easy Connect solution and lets the attached hosts and upstream network communicate using any VLAN (tagged or untagged), and the I/O module will pass those packets through to the other side of the I/O module by wrapping them in an outer VLAN tag, and then removing that outer VLAN tag as the packet exits the I/O module, thus making the I/O module VLAN agnostic. This Q-in-Q operation is what removes the need to manage VLANs on the I/O module, which is usually one of the larger ongoing management requirements of a deployed I/O module. 򐂰 Pre-creating an aggregation of the uplinks, in some cases, all of the uplinks, to remove the likelihood of loops (if all uplinks are not used, any unused uplinks/ports should be disabled to ensure loops are not possible). 򐂰 Optionally disabling spanning-tree so the upstream network does not receive any spanning-tree BPDUs. This is especially important in the case of upstream devices that will shut down a port if BPDUs are received, such as a Cisco FEX device, or an upstream switch running some form of BPDU guard. After it is configured, an I/O module in Easy Connect mode does not require on-going configuration changes as a customer adds and removes VLANs to the hosts and upstream network. In essence, Easy Connect turns the I/O module into a VLAN agnostic port aggregator, with support for growing up to the maximum bandwidth of the product (for example, add upgrade Feature on Demand (FoD) keys to the I/O module to increase the 10 Gb links to Compute Nodes and 10 Gb and 40 Gb links to the upstream networks). The following are the two primary methods for deploying an Easy Connect solution: 򐂰 Use an I/O module that defaults to a form of Easy Connect: – For customers that want an Easy Connect type of solution that is immediately ready for use out of the box (zero touch I/O module deployment), the SI4093 provides this by default. The SI4093 accomplishes this by having the following factory default configuration: • All base licensed internal and external ports are put into a single SPAR group. • All uplinks are put into a single common LACP aggregation and the LACP suspend-port feature is enabled. • The failover feature is enabled on the common LACP key. • No spanning-tree support (the SI4093 is designed to never permit more than a single uplink path per SPAR, so it can not create a loop and does not support spanning-tree). 򐂰 For customers that want the option to be able to use advanced features, but also want an Easy Connect mode solution, the EN4093R and CN4093 offer configurable options that can make them transparent to the attaching Compute Nodes and upstream network switches. While maintaining the option of changing to more advanced modes of configuration when needed. As noted, the SI4093 accomplishes this by defaulting to the SPAR feature in pass-through mode, which puts all compute node ports and all uplinks into a common Q-in-Q group. For the EN4093R and CN4093, there are a number of features that can be implemented to accomplish this Easy Connect support. 
The primary difference between these I/O modules and the SI4093 is that you must first perform a small set of configuration steps to set up the EN4093R and CN4093 into an Easy Connect mode, after which minimal management of the I/O module is required.
  • 22. For these I/O modules, this Easy Connect mode can be configured by using one of the following four features:
򐂰 The SPAR feature that is default on the SI4093 can be configured on both the EN4093R and CN4093 as well
򐂰 Utilize the tagpvid-ingress feature
򐂰 Configure vNIC Virtual Fabric Dedicated Uplink Mode
򐂰 Configure UFP vPort tunnel mode

In general, all of these features provide this Easy Connect functionality, with each having some pros and cons. For example, if the desire is to use Easy Connect with vLAG, you should use the tagpvid-ingress mode or the UFP vPort tunnel mode (SPAR and Virtual Fabric vNIC do not permit the vLAG ISL). But, if you want to use Easy Connect with FCoE today, you cannot use tagpvid-ingress and must utilize a different form of Easy Connect, such as the vNIC Virtual Fabric Dedicated Uplink Mode or UFP tunnel mode (SPAR pass-through mode allows FCoE but does not support FIP snooping, which may or may not be a concern for some customers).

As an example of how Easy Connect works (in all Easy Connect modes), consider the tagpvid-ingress Easy Connect mode operation shown in Figure 1-3. When all internal ports and the desired uplink ports are placed into a common PVID/Native VLAN (4091 in this example) and tagpvid-ingress is enabled on these ports (with whatever aggregation protocol is required on the uplinks to match the other end of those links), all ports with a matching Native or PVID setting on this I/O module are part of a single Q-in-Q tunnel. The Native/PVID VLAN on the port acts as the outer tag, and the I/O module switches traffic based on this outer tag VLAN. The inner customer tag rides through the fabric encapsulated in this Native/PVID VLAN to the destination port (or ports) in this tunnel, and then has the outer tag stripped off as it exits the I/O module, thus re-exposing the original customer-facing tag (or no tag) to the device attaching to that egress port.

Figure 1-3 Packet flow with Easy Connect

In all modes of Easy Connect, local switching based on destination MAC address is still used.
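The following is a minimal sketch of the tagpvid-ingress form of Easy Connect, again using indicative IBM Networking OS ISCLI syntax. The internal and external port ranges, the LACP key, and the outer VLAN 4091 are placeholders; check the deployment scenarios in Chapter 6 and the Application Guide for the exact commands on your code level.

  ! Easy Connect via tagpvid-ingress: one outer VLAN, one uplink aggregation
  vlan 4091
   enable
   exit
  ! Aggregate the uplinks that carry the tunneled traffic
  interface port EXT1-EXT2
   lacp mode active
   lacp key 100
   exit
  ! Put internal server ports and the uplinks into the same PVID/Native VLAN and
  ! enable the Q-in-Q style tagpvid-ingress behavior on them
  interface port INTA1-INTA14,EXT1-EXT2
   pvid 4091
   tagpvid-ingress
   exit
  ! Optionally keep spanning-tree BPDUs from being sent toward the upstream network
  spanning-tree mode disable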
  • 23. Some considerations on what form of Easy Connect mode makes the most sense for a given situation:
򐂰 For users that require virtualized NICs and are already using vNIC Virtual Fabric mode, and are more comfortable staying with it, vNIC Virtual Fabric Easy Connect mode might be the best solution.
򐂰 For users that require virtualized NICs and have no particular opinion on which mode of virtualized NIC they prefer, UFP tunnel mode would be the best choice for Easy Connect mode, since the UFP feature is the future direction of virtualized NICs in the Flex System I/O module solutions.
򐂰 For users planning to make use of the vLAG feature, this would require either UFP tunnel mode or tagpvid-ingress mode forms of Easy Connect (vNIC Virtual Fabric mode and SPAR Easy Connect modes do not work with the vLAG feature).
򐂰 For users that do not need vLAG or virtual NIC functionality, SPAR is a very simple and clean solution to implement as an Easy Connect solution.

1.1.6 Introduction to the Failover feature

Failover, sometimes referred to as Layer 2 Failover or Trunk Failover, is not a virtualization feature in its own right, but it can play an important role when NICs on a server are making use of teaming/bonding (forms of NIC virtualization in the OS). Failover is particularly important in an embedded environment, such as in a Flex System chassis.

When NICs are teamed/bonded in an operating system, the OS needs to know when a NIC is no longer able to reach the upstream network, so it can decide whether to use that NIC in the team. Most commonly, this is a simple link up/link down check in the server: if the link is reporting up, the NIC is used; if the link is reporting down, the NIC is not used. In an embedded environment, this can be a problem if the uplinks out of the embedded I/O module go down but the internal link to the server is still up. In that case, the server still reports the NIC link as up, even though there is no path to the upstream network. That leads to the server sending traffic out a NIC that has no path out of the embedded I/O module, which disrupts server communications.

The Failover feature can be implemented in these environments. When the set of uplinks that the Failover feature is tracking goes down, configurable internal ports are also taken down, alerting the embedded server to a path fault in this direction; the server can then use the team/bond to select a different NIC and maintain network connectivity.
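As an illustration, a Layer 2 Failover trigger that ties the internal server ports to an LACP uplink aggregation might look like the following indicative ISCLI sketch. The admin key and the internal port range are placeholders, and the exact syntax should be confirmed against the Failover examples in Chapter 6.

  ! Layer 2 Failover: if the monitored uplink aggregation goes down, take down
  ! the controlled internal ports so NIC teaming on the compute node fails over
  failover trigger 1 mmon monitor admin-key 1000
  failover trigger 1 mmon control member INTA1-INTA14
  failover trigger 1 enable
  failover enable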
  • 24. An example of how failover can protect Compute Nodes in a PureFlex chassis when there is an uplink fault out of one of the I/O modules can be seen in Figure 1-4.

Figure 1-4 Example of Failover in action. How Failover works: 1. All uplinks out of I/O module 1 go down (because of a link failure, a failure of ToR switch 1, and so forth). 2. Trunk failover takes down the link to NIC 1 to notify the compute node that the path out of I/O module 1 is gone. 3. NIC teaming on the compute node begins utilizing the still functioning NIC 2 for all communications.

Without failover or some other form of remote link failure detection, embedded servers would potentially be exposed to loss of connectivity if the uplink path on one of the embedded I/O modules were to fail. Note that designs that utilize vLAG or some sort of cross-chassis aggregation such as stacking are not exposed to this issue (and thus do not need the Failover feature), because they have a different coping method for dealing with uplinks out of an I/O module going down (for example, with vLAG, the packets that need to get upstream can cross the vLAG ISL and use the other I/O module's uplinks to get to the upstream network).

1.2 Introduction to NIC virtualization

As noted previously, although we have introduced a number of virtualization elements, this book is primarily focused on the various options to virtualize NIC technology within the PureFlex System and Flex System environment. This section introduces the two primary types of NIC virtualization (vNIC and UFP) available on the Flex System I/O modules, as well as the various sub-elements of these virtual NIC technologies.

At the core of all virtual NICs discussed in this section is the ability to take a single physical 10 GbE NIC and carve it up into up to three or four NICs for use in the attaching host. The virtual NIC technologies discussed for the I/O module here are all directly tied to the Emulex CNA offerings for the Flex System environment, and documented in 3.3, “IBM Flex System Ethernet adapters” on page 47.
  • 25. 1.2.1 vNIC based NIC virtualization

vNIC is the original virtual NIC technology utilized in the IBM BladeCenter 10Gb Virtual Fabric Switch Module, and it has been brought forward into the PureFlex System environment to allow customers that have standardized on vNIC to still use it with the PureFlex System solutions. vNIC has three primary modes:
򐂰 vNIC Virtual Fabric - Dedicated Uplink Mode
– Provides a Q-in-Q tunneling action for each vNIC group
– Each vNIC group must have its own dedicated uplink path out
– Any vNICs in one vNIC group cannot talk with vNICs in any other vNIC group without first exiting to the upstream network
򐂰 vNIC Virtual Fabric - Shared Uplink Mode
– Each vNIC group provides a single VLAN for all vNICs in that group
– Each vNIC group must be a unique VLAN (the same VLAN cannot be used on more than a single vNIC group)
– Servers cannot use tagging when Shared Uplink Mode is enabled
– Like vNICs in Dedicated Uplink Mode, any vNICs in one vNIC group cannot talk with vNICs in any other vNIC group without first exiting to the upstream network
򐂰 vNIC Switch Independent Mode
– Offers virtual NICs to the server with no special I/O module side configuration
– The switch is completely unaware that the 10 GbE NIC is being seen as multiple logical NICs in the OS

Details for enabling and configuring these modes can be found in Chapter 5, “NIC virtualization considerations on the server side” on page 75 and Chapter 6, “Flex System NIC virtualization deployment scenarios” on page 133.

1.2.2 Unified Fabric Port based NIC virtualization

UFP is the current direction of IBM NIC virtualization, and it provides a more feature-rich solution compared to the original vNIC Virtual Fabric mode. Like Virtual Fabric mode vNIC, UFP allows carving up a single 10 Gb port into four virtual NICs. UFP also has a number of modes associated with it, including:
򐂰 Tunnel mode
Provides a mode very similar to vNIC Virtual Fabric Dedicated Uplink Mode
򐂰 Trunk mode
Provides a traditional 802.1Q trunk mode to the virtual NIC (vPort) interface
򐂰 Access mode
Provides a traditional access mode (single untagged VLAN) to the virtual NIC (vPort) interface
򐂰 FCoE mode
Provides FCoE functionality to the vPort
򐂰 Auto-VLAN mode
Auto VLAN creation for 802.1Qbg and IBM VMready® environments
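The following is an indicative sketch of what UFP configuration looks like on the I/O module, shown for a single internal port (INTA1) with vPort 1 in tunnel mode and vPort 2 carrying FCoE. The VLAN numbers and bandwidth values are placeholders, and the exact ISCLI syntax should be verified against the UFP deployment scenarios in Chapter 6.

  ! UFP on internal port INTA1: vPort 1 in tunnel mode, vPort 2 for FCoE
  ufp port INTA1 vport 1
   network mode tunnel
   network default-vlan 4091
   qos bandwidth min 20
   enable
   exit
  ufp port INTA1 vport 2
   network mode fcoe
   network default-vlan 1002
   qos bandwidth min 40
   enable
   exit
  ! Enable UFP on the port and globally
  ufp port INTA1 enable
  ufp enable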
Only vPort 2 can be bound to FCoE. If FCoE is not desired, vPort 2 can be configured for one of the other modes.

Details for enabling and configuring these modes can be found in Chapter 5, “NIC virtualization considerations on the server side” on page 75 and Chapter 6, “Flex System NIC virtualization deployment scenarios” on page 133.

1.2.3 Comparing vNIC modes and UFP modes

As a general rule of thumb, if a customer desires virtualized NICs in the PureFlex System environment, UFP is usually the preferred solution, because all new feature development is going into UFP. If a customer has standardized on the original vNIC Virtual Fabric mode, that mode continues to be fully supported.

If a customer does not want any of the virtual NIC functionality controlled by the I/O module (only controlled and configured on the server side), then Switch Independent Mode vNIC is the solution of choice. This mode has the advantage of being I/O module independent, so any upstream I/O module can be utilized. Its main drawbacks are that bandwidth restrictions can be enforced only from the server side, not the I/O module side, and that changing the bandwidth requires a reload of the server. (Bandwidth control for the other virtual NIC modes discussed here is changed from the switch side, enforces bandwidth restrictions bidirectionally, and can be changed on the fly with no reboot required.)

Table 1-1 shows some of the items that may affect the decision-making process.

Table 1-1 Attributes of virtual NIC options
(VF-D = Virtual Fabric vNIC, Dedicated uplink; VF-S = Virtual Fabric vNIC, Shared uplink; SI = Switch Independent Mode vNIC)

Capability                                                    VF-D   VF-S   SI     UFP
Requires support in the I/O module                            Yes    Yes    No     Yes
Requires support in the NIC/CNA                               Yes    Yes    Yes    Yes
Supports adapter transmit rate control                        Yes    Yes    Yes    Yes
Supports I/O module transmit rate control                     Yes    Yes    No     Yes
Supports changing rate without restart of node                Yes    Yes    No     Yes
Requires a dedicated uplink path per vNIC group or vPort      Yes    No     No     Yes for vPorts in Tunnel mode
Support for node OS-based tagging                             Yes    No     Yes    Yes
Support for failover per vNIC group/UFP vPort                 Yes    Yes    No     Yes
Support for more than one uplink path per vNIC/vPort group    No     Yes    Yes    Yes for vPorts in Trunk and Access modes
Supported regardless of the model of upstream I/O module      No     No     Yes    No
Supported with vLAG                                           No     No     Yes    Yes for uplinks out of the I/O module carrying vPort traffic
Supported with SPAR                                           No     No     Yes    No
Supported with stacking                                       Yes    Yes    Yes    No (UFP with stacking on the EN/CN4093 is planned for a coming release of code)
Supported with an SI4093                                      No     No     Yes    No today, but supported in a coming release
Supported with the EN4093                                     Yes    Yes    Yes    Yes
Supported with the CN4093                                     Yes    Yes    Yes    Yes

For a deeper dive into virtual NIC operational characteristics from the switch side, see Chapter 4, “NIC virtualization considerations on the switch side” on page 55. For virtual NIC operational characteristics from the server side, see Chapter 5, “NIC virtualization considerations on the server side” on page 75.
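To make the switch-side difference between the two approaches more concrete, the fragment below sketches how a Virtual Fabric mode vNIC group and a UFP vPort might be defined on an EN4093R or CN4093. This is a minimal, illustrative sketch only: the exact ISCLI keywords, port names, and defaults vary by IBM Networking Operating System release and are assumptions here, so use the worked configurations in Chapter 6 as the authoritative syntax. In both cases, the adapter must also be configured for the matching virtual NIC personality on the server side, as described in Chapter 5.

   ! Virtual Fabric vNIC (Dedicated Uplink Mode): carve up INTA1 and map vNIC 1 into vNIC group 1
   vnic enable
   vnic port INTA1 index 1
           bandwidth 25
           enable
   vnic vnicgroup 1
           vlan 100
           member INTA1.1
           port EXT1
           enable

   ! UFP: define vPort 1 on INTA1 in tunnel mode with a minimum bandwidth guarantee
   ufp port INTA1 vport 1
           network mode tunnel
           network default-vlan 4091
           qos bandwidth min 25
           enable
   ufp port INTA1 enable
   ufp enable

Note how the vNIC group carries its own uplink (EXT1) as part of the group definition, whereas a UFP vPort is shaped and assigned a mode individually; this is the configuration-level expression of the differences summarized in Table 1-1.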
Chapter 2. Converged networking

This chapter introduces storage and network convergence, highlighting the impact on data centers and the vision behind it. This chapter includes the following sections:

򐂰 2.1, “What convergence is” on page 16
򐂰 2.2, “Vision of convergence in data centers” on page 16
򐂰 2.3, “The interest in convergence now” on page 17
򐂰 2.4, “Fibre Channel SANs today” on page 17
򐂰 2.5, “Ethernet-based storage today” on page 18
򐂰 2.6, “Benefits of convergence in storage and network” on page 19
򐂰 2.7, “Challenge of convergence” on page 20
򐂰 2.8, “Conclusion” on page 22
򐂰 2.9, “Fibre Channel over Ethernet protocol stack” on page 23
򐂰 2.10, “iSCSI” on page 24
򐂰 2.11, “iSCSI versus FCoE” on page 25
2.1 What convergence is

Dictionaries describe convergence as follows:

򐂰 The degree or point at which lines, objects, and so on, converge¹
򐂰 The merging of distinct technologies, industries, or devices into a unified whole²

¹ Dictionary.com. Retrieved July 08, 2013 from http://dictionary.reference.com/browse/convergence
² Merriam-Webster.com. Retrieved July 08, 2013 from http://www.merriam-webster.com/dictionary/convergence

In the context of this book, convergence addresses the fusion of local area networks (LANs) and storage area networks (SANs), including servers and storage systems, into a unified network. In other words, the same infrastructure is used for both data (LAN) and storage (SAN) networking; the components of this infrastructure are primarily those traditionally used for LANs.

2.1.1 Calling it what it is

Many terms and acronyms are used to describe convergence in a network environment. These terms are described in later chapters of this book. For a better understanding of the basics, let us start with the core.

Data Center Bridging (DCB)
The Institute of Electrical and Electronics Engineers (IEEE) uses the term DCB to group the extensions required to enable an enhanced Ethernet that is capable of supporting a converged network, where different applications that rely on different link layer technologies can run over a single physical infrastructure. The Data Center Bridging Task Group (DCB TG), part of the IEEE 802.1 Working Group, provided the required extensions to existing 802.1 bridge specifications in several projects.

Converged Enhanced Ethernet (CEE)
This is a trademark term that was registered by IBM in 2007 and was abandoned in 2008. Initially, it was planned to donate (transfer) this term to the industry (IEEE 802 or the Ethernet Alliance). Several vendors started using or referring to CEE in the meantime.

Data Center Ethernet (DCE)
Cisco registered the trademark DCE for their initial activity in the converged network area.

Bringing it all together
All three terms describe more or less the same thing. Some of them were introduced before an industry standard (or name) was available. Because manufacturers have used different command names and terms, different terms might be used in this book. Understanding that these terms are interchangeable should help prevent confusion. While all of these terms are still heard, the open industry standard Data Center Bridging (DCB) terminology is preferred. Command syntax in some of the IBM products used for testing in this book includes the CEE acronym.

2.2 Vision of convergence in data centers

The density (processing and storage capability per square foot) of the data center footprint is increasing over time, allowing the same processing power and storage capacity in a significantly
smaller space. At the same time, information technology is embracing infrastructure virtualization more rapidly than ever. One way to reduce the storage and network infrastructure footprint is to implement a converged network. Vendors are adopting industry standards that support convergence when developing products.

Fibre Channel over Ethernet (FCoE) and iSCSI are two of the enablers of storage and network convergence. Enterprises can preserve investments in traditional Fibre Channel (FC) storage and at the same time adapt to the higher Ethernet throughput demands that arise from server virtualization. Most of the vendors in the networking market offer 10 Gbps Network Interface Cards; 40 Gbps NICs are also available today. Similarly, data center network switches increasingly offer an option to choose 40 Gbps for ports, and 100 Gbps is expected relatively soon.

Convergence has long had a role in networking, but now it takes on a new significance. The following sections describe storage and networking in data centers today, explain what is changing, and highlight approaches to storage and network convergence that are explored in this book.

2.3 The interest in convergence now

Several factors are driving new interest in combining storage and data infrastructure.

The Ethernet community has a history of continually moving to transmission speeds that were thought impossible only a few years earlier. Although 100 Mbps Ethernet was once considered fast, 10 Gbps Ethernet is commonplace today, 40 Gbps Ethernet is becoming more and more widely available, and 100 Gb Ethernet will follow shortly. From a simple data transmission speed perspective, Ethernet can now meet or exceed the speeds that are available by using FC. The IEEE 802.3 working group is already working on the 400 Gbps standard (results are expected in 2017), so this process will continue.

A second factor that is enabling convergence is the addition of capabilities that make Ethernet lower latency and “lossless,” making it more similar to FC. The Data Center Bridging (DCB) protocols provide several capabilities that substantially enhance the performance of Ethernet and initially enable its usage for storage traffic.

One of the primary motivations for storage and networking convergence is improved asset utilization and cost of ownership, similar to the convergence of voice and data networks that occurred in previous years. By using a single infrastructure for multiple types of network traffic, the costs of procuring, installing, managing, and operating the data center infrastructure can be lowered. Where multiple types of adapters, switches, and cables were once required for separate networks, a single set of infrastructure takes their place, providing savings in equipment, cabling, and power requirements. The improved speeds and capabilities of lossless 10 and 40 Gbps Ethernet are enabling such improvements.

2.4 Fibre Channel SANs today

Fibre Channel SANs are generally regarded as the high-performance approach to storage networking. With a Fibre Channel SAN, storage arrays are equipped with FC ports that connect to FC switches. Similarly, servers are equipped with Fibre Channel host bus adapters
(HBAs) that also connect to Fibre Channel switches. Therefore, the Fibre Channel SAN, which is the set of FC switches, is a separate network for storage traffic.

Fibre Channel (FC) was standardized in the early 1990s and became the technology of choice for enterprise-class storage networks. Compared to its alternatives, FC offered relatively high speed, low latency, and back-pressure mechanisms that provide lossless connectivity. That is, FC is designed not to drop packets during periods of network congestion. Just as the maximum speed of Ethernet networks has increased repeatedly, Fibre Channel networks have offered increased speed, typically by factors of 2, from 4 to 8 to 16 Gbps, with 32 Gbps becoming available.

FC has many desirable characteristics for a storage network, but with some considerations. First, because FC is a separate network from the enterprise data Ethernet network, additional cost and infrastructure are required. Second, FC is a different technology from Ethernet. Therefore, the skill set required to design, install, operate, and manage the FC SAN is different from the skill set required for Ethernet, which adds cost in terms of personnel requirements. Third, despite many years of maturity in the FC marketplace, vendor interoperability within a SAN fabric is limited. Such technologies as N_Port Virtualization (NPV) or N_Port ID Virtualization (NPIV) allow the equipment of one vendor to attach at the edge of the SAN fabric of another vendor. However, interoperability over inter-switch links (ISLs; E_Port links) within a Fibre Channel SAN is generally viewed as problematic.

2.5 Ethernet-based storage today

Storage arrays can also be networked by using technologies based on Ethernet. Two major approaches are the Internet Small Computer System Interface (iSCSI) protocol and various NAS protocols.

iSCSI provides block-level access to data over IP networks. With iSCSI, the storage arrays and servers use Ethernet adapters. Servers and storage exchange SCSI commands over an Ethernet network to store and retrieve data. iSCSI provides a similar capability to FC, but by using a native Ethernet network. For this reason, iSCSI is sometimes referred to as IP SAN. By using iSCSI, designers and administrators can take advantage of familiar Ethernet skills for designing and maintaining networks. Also, unlike FC devices, Ethernet devices are widely interoperable. Ethernet infrastructure can also be significantly less expensive than FC gear.

When compared to FC, iSCSI also has challenges. FC is lossless and provides low-latency, in-sequence data transfer. However, traditional Ethernet drops packets when traffic congestion occurs, so higher-layer protocols are required to ensure that no packets are lost. For iSCSI, TCP/IP is used above the Ethernet network to guarantee that no storage packets are lost. Therefore, iSCSI traffic undergoes a further layer of encapsulation as it is transmitted across an Ethernet network.

Until recently, Ethernet technology was available only at speeds significantly lower than those of FC. Although FC offered speeds of 2, 4, 8, or 16 Gbps, with 32 Gbps just arriving, Ethernet traditionally operated at 100 Mbps and 1 Gbps. Now, 10 Gbps is common, and 40 Gbps is not far behind.
iSCSI might offer a lower cost overall than an FC infrastructure, but it historically has tended to offer lower performance because of its extra encapsulation and lower
speeds. Therefore, iSCSI has been viewed as a lower cost, lower performance storage networking approach compared to FC. Today, the DCB standards, which are a prerequisite for FCoE to operate with lossless transmission and in-order packet arrival, can also be used for iSCSI, resulting in improved performance.

NAS also operates over Ethernet. NAS protocols, such as Network File System (NFS) and Common Internet File System (CIFS), provide file-level access to data, not block-level access. The server that accesses the NAS over a network detects a file system, not a disk. The operating system in the NAS device converts file-level commands that are received from the server to block-level commands. The operating system then accesses the data on its disks and returns information to the server.

NAS appliances are attractive because, similar to iSCSI, they use a traditional Ethernet infrastructure and offer a simple file-level access method. However, similar to iSCSI, they have been limited by Ethernet’s capabilities. NAS protocols are encapsulated in an upper layer protocol (such as TCP or RPC) to ensure no packet loss. Because NAS works at the file level and is aware of the stored content, additional processing is possible on the NAS device (for example, deduplication or incremental backup). On the other hand, NAS systems require more processing power, because they must also handle all file-system related operations, which requires more resources than pure block-level handling.

2.6 Benefits of convergence in storage and network

The term convergence has had various meanings in the history of networking. Convergence is used generally to refer to the notion of combining or consolidating storage traffic and traditional data traffic on a single network (or fabric). Because Fibre Channel (FC) storage area networks (SANs) are generally called “fabrics,” the term fabric is now also commonly used for an Ethernet network that carries storage traffic.

Convergence of network and storage consolidates data and storage traffic into a single, highly scalable, highly available, high-performance, and highly reliable storage network infrastructure. Converging storage and network brings many benefits that outweigh the initial investment. Here are some of the key benefits:

򐂰 Simplicity, cost savings, and reliability
򐂰 Scalability and easier-to-move workloads in the virtual world
򐂰 Low latency and higher throughput
򐂰 One single, high-speed network infrastructure for both storage and network
򐂰 Better utilization of server resources and simplified management

To get an idea of how traditional and converged data centers differ, see the following figures. Both figures include three major components: servers, storage, and the networks that establish the connections. The required number of switches in each network depends on the size of the environment.

Figure 2-1 on page 20 shows a simplified picture of a traditional data center without convergence. Both servers and storage devices might require multiple interfaces to connect to the different networks. In addition, each network requires dedicated switches, which leads to higher investment in multiple devices and more effort for configuration and management.
Figure 2-1 Conceptual view of a data center without implemented convergence

Using converged network technologies, as shown by the converged data center in Figure 2-2, only one converged enhanced Ethernet network is needed. This results in fewer required switches and decreases the number of devices that require management, which can lower the TCO. Even the servers, clients, and storage devices require only one type of adapter to be connected. For redundancy, performance, or segmentation purposes, it might still make sense to use multiple adapters.

Figure 2-2 Conceptual view of a converged data center

2.7 Challenge of convergence

Fibre Channel SANs have different design requirements than Ethernet. To provide a better understanding, they can be compared with two different transportation systems. Each system moves people or goods from point A to point B.
Railroads
Trains run on rails and tracks. This can be compared with a Fibre Channel SAN.

Figure 2-3 Trains running on rails

Specific aspects of rail traffic, with their networking analogies in parentheses, are as follows:

򐂰 The route is already defined by rails (shortest path first).
򐂰 All participating trains are registered and known (name server).
򐂰 The network is isolated, but accidents (dropped packets) have a huge impact.
򐂰 The number of trains in one track segment is limited (buffer-to-buffer credits for a lossless connection).
򐂰 Signals and railway switches all over the tracks define the allowed routes (zoning).
򐂰 They have high capacity (a maximum frame size of 2148 bytes).

Roads
Cars can use roads with paved or even unpaved lanes. This can be compared with traditional Ethernet traffic.

Figure 2-4 Cars using roads
Specific aspects of road traffic, with their networking analogies in parentheses, are as follows:

򐂰 An unknown number of participants may be using the road at the same time. Metering lights can be used only as a reactive method to slow down traffic (no confirmation of available receiving capacity before sending).
򐂰 Accidents are more or less common and expected (packet loss).
򐂰 All roads lead to Rome (no point-to-point topology).
򐂰 Navigation is required to prevent moving in circles (the requirement for TRILL, Spanning Tree, or SDN).
򐂰 Everybody can join and hop on or off almost everywhere (no zoning).
򐂰 They have limited capacity (a 1500-byte payload), although bigger buses and trucks can carry more (jumbo frames).

Convergence approaches
Maintaining two transportation infrastructure systems, with separate vehicles and different stations and routes, is complex to manage and expensive. To stay with the analogy, convergence for storage and networks can mean “running trains on the road.” The two potential vehicles that are enabled to run as trains on the road are iSCSI and Fibre Channel over Ethernet (FCoE). iSCSI can be used in existing (lossy) and new (lossless) Ethernet infrastructure, with different performance characteristics. However, FCoE requires a lossless converged enhanced Ethernet network, and it relies on additional functionality known from Fibre Channel (for example, name server and zoning).

The Emulex Converged Network Adapters (CNAs) that are used in compute nodes in the Flex chassis can support either iSCSI or FCoE in their onboard ASIC, that is, in hardware. Their configuration and use are described in the chapters that follow.

Testing was done for the purposes of this book using FCoE as the storage protocol of choice, because it is more commonly used at this time and because there are more configuration steps required to implement FCoE in a Flex environment than to implement iSCSI. Many of the scenarios presented in the chapters that follow can readily be adapted for deployment in an environment that includes iSCSI storage networking.

2.8 Conclusion

Convergence is the future. Network convergence can reduce cost, simplify deployment, better leverage expensive resources, and enable a smaller data center infrastructure footprint. The IT industry is adopting FCoE more rapidly because the technology is becoming more mature and offers higher throughput in terms of 40/100 Gbps. Sooner or later, CIOs will realize the cost benefits and advantages of convergence and will adopt storage and network convergence more rapidly.

The bulk of the chapters of this book focus on the insights and capabilities of FCoE on IBM Flex System and introduce the available IBM switches and storage solutions with support for converged networks. Most of the content of the previous book, which focused more on IBM BladeCenter converged solutions, is still valid and has been integrated into this book.
2.9 Fibre Channel over Ethernet protocol stack

FCoE assumes the existence of a lossless Ethernet, such as one that implements the Data Center Bridging (DCB) extensions to Ethernet. This section highlights, at a high level, the concepts of FCoE as defined in FC-BB-5.

The EN4093R, CN4093, G8264, and G8264CS switches support FCoE. The G8264 and EN4093R function as FCoE transit switches, while the CN4093 and G8264CS have Omni Ports that can be set to function as either FC ports or Ethernet ports, as specified in the switch configuration.

The basic notion of FCoE is that the upper layers of FC are mapped onto Ethernet, as shown in Figure 2-5. The upper layer protocols and services of FC remain the same in an FCoE deployment. Zoning, fabric services, and similar services still exist with FCoE.

Figure 2-5 FCoE protocol mapping (the FC-2V, FC-3, and FC-4 layers of the Fibre Channel protocol stack are retained in the FCoE protocol stack; the FC-0 through FC-2M layers are replaced by the Ethernet PHY, Ethernet MAC, and FCoE entity)

The difference is that the lower layers of FC are replaced by lossless Ethernet, which also implies that FC concepts, such as port types and lower-layer initialization protocols, must be replaced by new constructs in FCoE. Such mappings are defined by the FC-BB-5 standard and are briefly addressed here.
Figure 2-6 shows another perspective on FCoE layering compared to other storage networking technologies. In this figure, FC and FCoE layers are shown with other storage networking protocols, including iSCSI.

Figure 2-6 Storage Network Protocol Layering (SCSI traffic from operating systems and applications carried over FCP/FC at 1-16 Gbps, over iSCSI/TCP/IP, iFCP, or FCIP on Ethernet at 1-100 Gbps, over FCoE on lossless Ethernet, and over SRP on 10/20/40 Gbps InfiniBand)

Based on this protocol structure, Figure 2-7 shows a conceptual view of an FCoE frame.

Figure 2-7 Conceptual view of an FCoE frame (an Ethernet frame with Ethertype 8906h whose FCoE header carries control information such as version and the SOF/EOF ordered sets, encapsulating the same FC header, FC payload, and CRC as a physical FC frame, followed by the EOF and the Ethernet FCS)

2.10 iSCSI

The iSCSI protocol allows for longer distances between a server and its storage when compared to the traditionally restrictive parallel SCSI solutions or the newer serial-attached SCSI (SAS). iSCSI technology can use a hardware initiator, such as a host bus adapter (HBA), or a software initiator to issue requests to target devices. Within iSCSI storage
terminology, the initiator is typically known as a client, and the target is the storage device. The iSCSI protocol encapsulates SCSI commands into protocol data units (PDUs) within the TCP/IP protocol and then transports them over the network to the target device. The disk is presented locally to the client, as shown in Figure 2-8.

Figure 2-8 iSCSI architecture overview (the iSCSI initiator in the client connects over a TCP connection across the network to the iSCSI target)

The iSCSI protocol is a transport for SCSI over TCP/IP. Figure 2-6 on page 24 illustrates a protocol stack comparison between Fibre Channel and iSCSI. iSCSI provides block-level access to storage, as does Fibre Channel, but it uses TCP/IP over Ethernet instead of the Fibre Channel protocol. iSCSI is defined in RFC 3720, which you can find at:

http://www.ietf.org/rfc/rfc3720.txt

iSCSI uses Ethernet-based TCP/IP rather than a dedicated (and different) storage area network (SAN) technology. Therefore, it is attractive for its relative simplicity and usage of widely available Ethernet skills. Its chief limitations historically have been the relatively lower speeds of Ethernet compared to Fibre Channel and the extra TCP/IP encapsulation required. With lossless 10 Gbps Ethernet now available, the attractiveness of iSCSI is expected to grow rapidly. TCP/IP encapsulation will still be used, but 10 Gbps Ethernet speeds dramatically increase the appeal of iSCSI.

2.11 iSCSI versus FCoE

This section highlights the similarities and differences between iSCSI and FCoE. However, in most cases, considerations other than purely technical ones will influence your decision in choosing one over the other.

2.11.1 Key similarities

iSCSI and FCoE have the following similarities:

򐂰 Both protocols are block-oriented storage protocols. That is, the file system logic for accessing storage with either of them is on the computer where the initiator is, not on the storage hardware. Therefore, they are both different from typical network-attached storage (NAS) technologies, which are file oriented.
򐂰 Both protocols implement Ethernet-attached storage.
򐂰 Both protocols can be implemented in hardware, which is detected by the operating system of the host as an HBA.
򐂰 Both protocols can also be implemented by using software initiators, which are available in various server operating systems. However, this approach uses resources of the main processor to perform tasks that would otherwise be performed by the hardware of an HBA.
򐂰 Both protocols can use the Converged Enhanced Ethernet (CEE), also referred to as Data Center Bridging (DCB), standards to deliver “lossless” traffic over Ethernet.
򐂰 Both protocols are alternatives to traditional FC storage and FC SANs.

2.11.2 Key differences

iSCSI and FCoE have the following differences:

򐂰 iSCSI uses TCP/IP as its transport, and FCoE uses Ethernet. iSCSI can use media other than Ethernet, such as InfiniBand, and iSCSI can use Layer 3 routing in an IP network.
򐂰 Numerous vendors provide local iSCSI storage targets, some of which also support Fibre Channel and other storage technologies. Relatively few native FCoE targets are available at this time, which might allow iSCSI to be implemented at a lower overall capital cost.
򐂰 FCoE requires a gateway function, usually called a Fibre Channel Forwarder (FCF), which allows FCoE access to traditional FC-attached storage. This approach allows FCoE and traditional FC storage access to coexist, either as a long-term approach or as part of a migration. The G8264CS and CN4093 switches can be used to provide FCF functionality.
򐂰 iSCSI-to-FC gateways exist, but they are not required when a storage device is used that can accept iSCSI traffic directly.
򐂰 Except in the case of a local FCoE storage target, the last leg of the connection uses FC to reach the storage. FC uses 8b/10b encoding, which means that sending 8 bits of data requires transmitting 10 bits over the wire, a 25% overhead on the network that exists to prevent corruption of the data. 10 Gbps Ethernet uses 64b/66b encoding, which has a far smaller overhead (2 extra bits per 64 bits of data, or about 3%).
򐂰 iSCSI includes IP headers and Ethernet (or other media) headers with every frame, which adds overhead.
򐂰 The largest payload that can be sent in an FCoE frame is 2112 bytes. iSCSI can use jumbo frame support on Ethernet and send 9 KB or more in a single frame.
򐂰 iSCSI has been on the market for several years longer than FCoE. Therefore, the iSCSI standards are more mature than those for FCoE.
򐂰 Troubleshooting FCoE end-to-end requires both Ethernet networking skills and FC SAN skills.
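One practical difference also shows up in the I/O module configuration. iSCSI needs no storage-specific switch features (a VLAN, and optionally DCB for lossless behavior, is enough), whereas FCoE requires the converged Ethernet and FIP snooping functions to be enabled on the switch before any FCoE session can form. The fragment below is a minimal sketch of that FCoE prerequisite on an EN4093R or CN4093; the exact ISCLI keywords can vary by IBM Networking Operating System release and should be checked against the detailed FCoE deployment scenarios later in this book.

   ! Enable Converged Enhanced Ethernet (PFC, ETS, and DCBX) on the I/O module
   cee enable

   ! Enable FCoE Initialization Protocol (FIP) snooping so the switch can build
   ! the ACLs that protect FCoE sessions between initiators and the FCF
   fcoe fips enable

On a CN4093 acting as a Full Fabric FCF or NPV gateway, additional steps (Omni Port FC mode, VLAN-to-fabric mapping, and zoning) are also required; those are covered in the configuration chapters.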
Chapter 3. IBM Flex System networking architecture and portfolio

IBM Flex System, a new category of computing and the next generation of Smarter Computing, offers intelligent workload deployment and management for maximum business agility. This chassis delivers high-speed performance complete with integrated servers, storage, and networking for multi-chassis management in data center compute environments. Furthermore, its flexible design can meet the needs of varying workloads with independently scalable IT resource pools for higher usage and lower cost per workload. In addition, increased security and resiliency protect vital information and promote maximum uptime, while the integrated, easy-to-use management system reduces setup time and complexity, providing a quicker path to return on investment (ROI).

This chapter includes the following topics:

򐂰 3.1, “Enterprise Chassis I/O architecture” on page 28
򐂰 3.2, “IBM Flex System Ethernet I/O modules” on page 31
򐂰 3.3, “IBM Flex System Ethernet adapters” on page 47
3.1 Enterprise Chassis I/O architecture

The Ethernet networking I/O architecture for the IBM Flex System Enterprise Chassis includes an array of connectivity options for server nodes that are installed in the enclosure. Users can decide to use a local switching model that provides superior performance, cable reduction, and a rich feature set, or use pass-through technology and allow all Ethernet networking decisions to be made external to the Enterprise Chassis.

By far, the most versatile option is to use modules that provide local switching capabilities and advanced features that are fully integrated into the operation and management of the Enterprise Chassis. In particular, the EN4093 10Gb Scalable Switch module offers the maximum port density, highest throughput, and most advanced data center-class features to support the most demanding compute environments.

From a physical I/O module bay perspective, the Enterprise Chassis has four I/O bays in the rear of the chassis. The physical layout of these I/O module bays is shown in Figure 3-1.

Figure 3-1 Rear view of the Enterprise Chassis showing I/O module bays 1 through 4

From a midplane wiring point of view, the Enterprise Chassis provides 16 lanes out of each half-wide node bay (toward the rear I/O bays), with each lane capable of 16 Gbps or higher speeds. How these lanes are used is a function of which adapters are installed in a node, which I/O module is installed in the rear, and which port licenses are enabled on the I/O module.
How the midplane lanes connect between the node bays up front and the I/O bays in the rear is shown in Figure 3-2. The concept of an I/O module Upgrade Feature on Demand (FoD) is also shown in Figure 3-2.

From a physical perspective, an upgrade FoD in this context is a bank of 14 ports and some number of uplinks that can be enabled and used on a switch module. By default, all I/O modules include the base set of ports, and thus have 14 internal ports, one each connected to the 14 compute node bays in the front. By adding an upgrade license to the I/O module, it is possible to add more banks of 14 ports (plus some number of uplinks) to an I/O module. The node needs an adapter that has the necessary physical ports to connect to the new lanes enabled by the upgrades. Those lanes connect to the ports in the I/O module that are enabled by the upgrade.

Figure 3-2 Sixteen lanes total of a single half-wide node bay toward the I/O bays (the two interface connectors of node bay 1, one for the LOM or adapter 1 and one for adapter 2, connect through the midplane to the Base, Upgrade 1, Upgrade 2, and Future port banks in I/O bays 1 through 4)

For example, if a node were installed with only the dual port LAN on system board (LOM) adapter, only two of the 16 lanes are used (one to I/O bay 1 and one to I/O bay 2), as shown in Figure 3-3 on page 30. If a node was installed without LOM and two quad port adapters were installed, eight of the 16 lanes are used (two to each of the four I/O bays). This installation can potentially provide up to 320 Gb of full duplex Ethernet bandwidth (16 lanes x 10 Gb x 2) to a single half-wide node and over half a terabit (Tb) per second of bandwidth to a full-wide node.
Figure 3-3 Dual port LOM connecting to ports on I/O bays 1 and 2 (all other lanes unused)

Today, there are limits on the port density of the current I/O modules, in that only the first three lanes are potentially available from the I/O module. By default, each I/O module provides a single connection (lane) to each of the 14 half-wide node bays up front. By adding port licenses, an EN2092 1Gb Ethernet Switch can offer two 1 Gb ports to each half-wide node bay, and an EN4093R 10Gb Scalable Switch, CN4093 10Gb Converged Scalable Switch, or SI4093 System Interconnect Module can each provide up to three 10 Gb ports to each of the 14 half-wide node bays. Because it is a one-for-one 14-port pass-through, the EN4091 10Gb Ethernet Pass-thru I/O module can only ever offer a single link to each of the half-wide node bays.

As an example, if two 8-port adapters were installed and four I/O modules were installed with all upgrades, the node has access to 12 10 Gb lanes (three to each switch). On each 8-port adapter, two lanes are unavailable at this time.

Concerning port licensing, the default available upstream connections are also associated with port licenses. For more information about these connections and the node-facing links, see 3.2, “IBM Flex System Ethernet I/O modules” on page 31. All I/O modules include a base set of 14 downstream ports, with the pass-through module supporting only the single set of 14 server-facing ports. The Ethernet switching and interconnect I/O modules support more than the base set of ports, and those ports are enabled by the upgrades. For more information, see the respective I/O module section in 3.2, “IBM Flex System Ethernet I/O modules” on page 31.

As of this writing, no I/O module and node adapter combination can use all 16 lanes between a compute node bay and the I/O bays, but the lanes exist to ensure that the Enterprise Chassis can use future available capacity.
Beyond the physical aspects of the hardware, there are certain logical aspects that ensure that the Enterprise Chassis can integrate seamlessly into any modern data center's infrastructure. Many of these enhancements, such as vNIC, VMready, and 802.1Qbg, revolve around integrating virtualized servers into the environment. Fibre Channel over Ethernet (FCoE) allows users to converge their Fibre Channel traffic onto their 10 Gb Ethernet network, which reduces the number of cables and points of management that are necessary to connect the Enterprise Chassis to the upstream infrastructures.

The wide range of physical and logical Ethernet networking options that are available today and in development ensures that the Enterprise Chassis can meet the most demanding I/O connectivity challenges now and as the data center evolves.

3.2 IBM Flex System Ethernet I/O modules

The IBM Flex System Enterprise Chassis features a number of Ethernet I/O module solutions that provide a combination of 1 Gb and 10 Gb ports to the servers and 1 Gb, 10 Gb, and 40 Gb ports for uplink connectivity to the outside upstream infrastructure. The IBM Flex System Enterprise Chassis ensures that a suitable selection is available to meet the needs of the server nodes.

The following Ethernet I/O modules are available for deployment with the Enterprise Chassis:

򐂰 3.2.1, “IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches”
򐂰 3.2.2, “IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch” on page 36
򐂰 3.2.3, “IBM Flex System Fabric SI4093 System Interconnect Module” on page 42
򐂰 3.2.4, “I/O modules and cables” on page 46

These modules are described next.

3.2.1 IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches

The EN4093 and EN4093R 10Gb Scalable Switches are primarily 10 Gb switches that can provide up to 42 x 10 Gb node-facing ports, and up to 14 SFP+ 10 Gb and two QSFP+ 40 Gb external upstream-facing ports, depending on the applied upgrade licenses.

Note: The EN4093 (non-R) is no longer available for purchase.
A view of the face plate of the EN4093/EN4093R 10Gb Scalable Switch is shown in Figure 3-4.

Figure 3-4 The IBM Flex System Fabric EN4093/EN4093R 10Gb Scalable Switch

As listed in Table 3-1, the switch is initially licensed with 14 internal 10 Gb ports and ten external 10 Gb uplink ports enabled. More ports can be enabled, including the two external 40 Gb uplink ports with the Upgrade 1 license and four more external SFP+ 10 Gb ports with the Upgrade 2 license. Upgrade 1 must be applied before Upgrade 2 can be applied.

Table 3-1 IBM Flex System Fabric EN4093 10Gb Scalable Switch part numbers and port upgrades

Part number   Feature code (a)   Product description                                        Ports enabled (internal / 10 Gb uplink / 40 Gb uplink)
49Y4270       A0TB / 3593        IBM Flex System Fabric EN4093 10Gb Scalable Switch:        14 / 10 / 0
                                 10x external 10 Gb uplinks, 14x internal 10 Gb ports
05Y3309       A3J6 / ESW7        IBM Flex System Fabric EN4093R 10Gb Scalable Switch:       14 / 10 / 0
                                 10x external 10 Gb uplinks, 14x internal 10 Gb ports
49Y4798       A1EL / 3596        IBM Flex System Fabric EN4093 10Gb Scalable Switch         28 / 10 / 2
                                 (Upgrade 1): adds 2x external 40 Gb uplinks and
                                 14x internal 10 Gb ports
88Y6037       A1EM / 3597        IBM Flex System Fabric EN4093 10Gb Scalable Switch         42 / 14 / 2
                                 (Upgrade 2, requires Upgrade 1): adds 4x external
                                 10 Gb uplinks and 14x internal 10 Gb ports

a. The first feature code that is listed is for configurations that are ordered through System x sales channels (HVEC) by using x-config. The second feature code is for configurations that are ordered through the IBM Power Systems channel (AAS) by using e-config.

The IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches have the following features and specifications:

򐂰 Internal ports:
   – A total of 42 internal full-duplex 10 Gigabit ports (14 ports are enabled by default; optional FoD licenses are required to activate the remaining 28 ports).
   – Two internal full-duplex 1 GbE ports that are connected to the chassis management module.

򐂰 External ports:
   – A total of 14 ports for 1 Gb or 10 Gb Ethernet SFP+ transceivers (support for 1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10GBASE-SR, or 10GBASE-LR) or SFP+ copper direct-attach cables (DACs). Ten ports are enabled by default, and an optional FoD license is required to activate the remaining four ports. SFP+ modules and DAC cables are not included and must be purchased separately.
   – Two ports for 40 Gb Ethernet QSFP+ transceivers or QSFP+ DACs (these ports are disabled by default; an optional FoD license is required to activate them). QSFP+ modules and DAC cables are not included and must be purchased separately.
   – One RS-232 serial port (mini-USB connector) that provides another means to configure the switch module.

򐂰 Scalability and performance:
   – 40 Gb Ethernet ports for extreme uplink bandwidth and performance
   – Fixed-speed external 10 Gb Ethernet ports to use 10 Gb core infrastructure
   – Support for 1 Gb speeds on uplinks via proper SFP selection
   – Non-blocking architecture with wire-speed forwarding of traffic and aggregated throughput of 1.28 Tbps
   – Media access control (MAC) address learning:
      • Automatic update
      • Support for up to 128,000 MAC addresses
   – Up to 128 IP interfaces per switch
   – Static and LACP (IEEE 802.1AX; previously known as 802.3ad) link aggregation with up to:
      • 220 Gb of total uplink bandwidth per switch
      • 64 trunk groups
      • 16 ports per group
   – Support for cross-switch aggregation via vLAG
   – Support for jumbo frames (up to 9,216 bytes)
   – Broadcast/multicast storm control
   – IGMP snooping to limit flooding of IP multicast traffic
   – IGMP filtering to control multicast traffic for hosts that participate in multicast groups
   – Configurable traffic distribution schemes over aggregated links
   – Fast port forwarding and fast uplink convergence for rapid STP convergence

򐂰 Availability and redundancy:
   – VRRP for Layer 3 router redundancy
   – IEEE 802.1D Spanning Tree to provide L2 redundancy, including support for:
      • Multiple STP (MSTP) for topology optimization; up to 32 STP instances are supported by a single switch (previously known as 802.1s)
      • Rapid STP (RSTP), which provides rapid STP convergence for critical delay-sensitive traffic, such as voice or video (previously known as 802.1w)
      • Per-VLAN Rapid STP (PVRST) to seamlessly integrate into Cisco infrastructures
   – Layer 2 Trunk Failover to support active and standby configurations of network adapter teaming on compute nodes
   – Hot Links, which provides basic link redundancy with fast recovery for network topologies that require Spanning Tree to be turned off

򐂰 VLAN support:
   – Up to 4095 active VLANs supported per switch, with VLAN numbers that range from 1 to 4094 (4095 is used for internal management functions only)
   – 802.1Q VLAN tagging support on all ports
   – Private VLANs

򐂰 Security:
   – VLAN-based, MAC-based, and IP-based ACLs
   – 802.1x port-based authentication
   – Multiple user IDs and passwords
   – User access control
   – RADIUS, TACACS+, and LDAP authentication and authorization

򐂰 QoS:
   – Support for IEEE 802.1p, IP ToS/DSCP, and ACL-based (MAC/IP source and destination addresses, VLANs) traffic classification and processing
   – Traffic shaping and re-marking based on defined policies
   – Eight WRR priority queues per port for processing qualified traffic

򐂰 IPv4 Layer 3 functions:
   – Host management
   – IP forwarding
   – IP filtering with ACLs, up to 896 ACLs supported
   – VRRP for router redundancy
   – Support for up to 128 static routes
   – Routing protocol support (RIP v1, RIP v2, OSPF v2, and BGP-4), up to 2048 entries in a routing table
   – Support for DHCP Relay
   – Support for IGMP snooping and IGMP relay
   – Support for Protocol Independent Multicast (PIM) in Sparse Mode (PIM-SM) and Dense Mode (PIM-DM)

򐂰 IPv6 Layer 3 functions:
   – IPv6 host management (except default switch management IP address)
   – IPv6 forwarding
   – Up to 128 static routes
   – Support for the OSPF v3 routing protocol
   – IPv6 filtering with ACLs

򐂰 Virtualization:
   – Virtual NICs (vNICs): Ethernet, iSCSI, or FCoE traffic is supported on vNICs
   – Unified fabric ports (UFPs): Ethernet or FCoE traffic is supported on UFPs
   – Virtual link aggregation groups (vLAGs)
   – 802.1Qbg Edge Virtual Bridging (EVB), an emerging IEEE standard for allowing networks to become virtual machine (VM)-aware:
      • Virtual Ethernet Bridging (VEB) and Virtual Ethernet Port Aggregator (VEPA) are mechanisms for switching between VMs on the same hypervisor.
      • Edge Control Protocol (ECP) is a transport protocol that operates between two peers over an IEEE 802 LAN, providing reliable, in-order delivery of upper layer protocol data units.
      • Virtual Station Interface (VSI) Discovery and Configuration Protocol (VDP) allows centralized configuration of network policies that persist with the VM, independent of its location.
      • EVB Type-Length-Value (TLV) is used to discover and configure VEPA, ECP, and VDP.
   – VMready
   – Switch partitioning (SPAR)

򐂰 Converged Enhanced Ethernet:
   – Priority-Based Flow Control (PFC) (IEEE 802.1Qbb) extends 802.3x standard flow control to allow the switch to pause traffic based on the 802.1p priority value in the VLAN tag of each packet.
   – Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz) provides a method for allocating link bandwidth based on the 802.1p priority value in the VLAN tag of each packet.
   – Data Center Bridging Capability Exchange Protocol (DCBX) (IEEE 802.1AB) allows neighboring network devices to exchange information about their capabilities.

򐂰 FCoE:
   – FC-BB5 FCoE specification compliant
   – FCoE transit switch operations
   – FCoE Initialization Protocol (FIP) support for automatic ACL configuration
   – FCoE Link Aggregation Group (LAG) support
   – Multi-hop RDMA over Converged Ethernet (RoCE) with LAG support

򐂰 Stacking:
   – Up to eight switches in a stack
   – Hybrid stacking support (from two to six EN4093/EN4093R switches with two CN4093 switches)
   – FCoE support (EN4093R only)
   – vNIC support
   – 802.1Qbg support

򐂰 Manageability:
   – Simple Network Management Protocol (SNMP V1, V2, and V3)
   – HTTP browser GUI
   – Telnet interface for CLI
   – SSH
   – Serial interface for CLI
   – Scriptable CLI
   – Firmware image update (TFTP and FTP)
   – Network Time Protocol (NTP) for switch clock synchronization

򐂰 Monitoring:
   – Switch LEDs for external port status and switch module status indication
   – RMON agent to collect statistics and proactively monitor switch performance
   – Port mirroring for analyzing network traffic that passes through the switch
   – Change tracking and remote logging with the syslog feature
   – Support for sFLOW agent for monitoring traffic in data networks (separate sFLOW analyzer required elsewhere)
   – POST diagnostic testing

Table 3-2 compares the EN4093 to the EN4093R.

Table 3-2 EN4093 and EN4093R supported features

Feature                               EN4093   EN4093R
Layer 2 switching                     Yes      Yes
Layer 3 switching                     Yes      Yes
Switch stacking                       Yes      Yes
Virtual NIC (stand-alone)             Yes      Yes
Virtual NIC (stacking)                Yes      Yes
Unified Fabric Port (stand-alone)     Yes      Yes
Unified Fabric Port (stacking)        No       No
Edge virtual bridging (stand-alone)   Yes      Yes
Edge virtual bridging (stacking)      Yes      Yes
CEE/FCoE (stand-alone)                Yes      Yes
CEE/FCoE (stacking)                   No       Yes

For more information, see IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches, TIPS0864, which is available at this website:

http://www.redbooks.ibm.com/abstracts/tips0864.html?Open

3.2.2 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch

The IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch provides unmatched scalability, performance, convergence, and network virtualization, while also delivering innovations to help address a number of networking concerns and providing capabilities that help you prepare for the future.
The switch offers full Layer 2/3 switching, FCoE Full Fabric, and Fibre Channel NPV Gateway operations to deliver a converged and integrated solution. It is installed within the I/O module bays of the IBM Flex System Enterprise Chassis. The switch can help you migrate to a 10 Gb or 40 Gb converged Ethernet infrastructure and offers virtualization features such as Virtual Fabric and IBM VMready, plus the ability to work with the IBM Distributed Virtual Switch 5000V.

Figure 3-5 shows the IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch.

Figure 3-5 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch

The CN4093 switch is initially licensed with 14 internal 10 GbE ports, two external 10 GbE SFP+ ports, and six external Omni Ports enabled. The following other ports can be enabled:

򐂰 A total of 14 more internal ports and two external 40 GbE QSFP+ uplink ports with the Upgrade 1 license option.
򐂰 A total of 14 more internal ports and six more external Omni Ports with the Upgrade 2 license option.
򐂰 Upgrade 1 and Upgrade 2 can be applied on the switch independently of each other or in combination for full feature capability.

Table 3-3 shows the part numbers for ordering the switches and the upgrades.

Table 3-3 Part numbers and feature codes for ordering

Description                                                            Part number   Feature code (x-config / e-config)
Switch module:
IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch           00D5823       A3HH / ESW2
Features on Demand upgrades:
IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 1)    00D5845       A3HL / ESU1
IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 2)    00D5847       A3HM / ESU2

Neither QSFP+ nor SFP+ transceivers or cables are included with the switch. They must be ordered separately.
The switch does not include a serial management cable. However, the IBM Flex System Management Serial Access Cable, 90Y9338, is supported and contains two cables, a mini-USB-to-RJ45 serial cable and a mini-USB-to-DB9 serial cable, either of which can be used to connect to the switch locally for configuration tasks and firmware updates.

The following base switch and upgrades are available:

򐂰 00D5823 is the part number for the physical device, which comes with 14 internal 10 GbE ports enabled (one to each node bay), two external 10 GbE SFP+ ports enabled to connect to a top-of-rack switch or other devices (identified as EXT1 and EXT2), and six Omni Ports enabled to connect to Ethernet or Fibre Channel networking infrastructure, depending on the SFP+ cable or transceiver that is used. The six Omni Ports are from the 12 that are labeled on the switch as EXT11 through EXT22.
򐂰 00D5845 (Upgrade 1) can be applied on the base switch when you need more uplink bandwidth, with two 40 GbE QSFP+ ports that can be converted into 4x 10 GbE SFP+ DAC links with the optional break-out cables. These ports are labeled EXT3 and EXT7, or EXT3-EXT6 and EXT7-EXT10 if converted. This upgrade also enables 14 more internal ports, for a total of 28 ports, to provide more bandwidth to the compute nodes using 4-port expansion cards.
򐂰 00D5847 (Upgrade 2) can be applied on the base switch when you need more external Omni Ports on the switch or if you want more internal bandwidth to the node bays. The upgrade enables the remaining six external Omni Ports from the range EXT11 through EXT22, plus 14 more internal 10 Gb ports, for a total of 28 internal ports, to provide more bandwidth to the compute nodes by using 4-port expansion cards.
򐂰 Both 00D5845 (Upgrade 1) and 00D5847 (Upgrade 2) can be applied on the switch at the same time so that you can use six ports on an 8-port expansion card and use all the external ports on the switch.

Table 3-4 shows the switch upgrades and the ports they enable.

Table 3-4 CN4093 10Gb Converged Scalable Switch part numbers and port upgrades

Part number         Feature code (a)             Description                        Ports enabled (internal 10 Gb / external 10 Gb SFP+ / external 10 Gb Omni / external 40 Gb QSFP+)
00D5823             A3HH / ESW2                  Base switch (no upgrades)          14 / 2 / 6 / 0
00D5845             A3HL / ESU1                  Add Upgrade 1                      28 / 2 / 6 / 2
00D5847             A3HM / ESU2                  Add Upgrade 2                      28 / 2 / 12 / 0
00D5845 + 00D5847   A3HL / ESU1 + A3HM / ESU2    Add both Upgrade 1 and Upgrade 2   42 / 2 / 12 / 2

a. The first feature code that is listed is for configurations that are ordered through System x sales channels (HVEC) by using x-config. The second feature code is for configurations that are ordered through the IBM Power Systems channel (AAS) by using e-config.

The IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch has the following features and specifications:

򐂰 Internal ports:
   – A total of 42 internal full-duplex 10 Gigabit ports. (A total of 14 ports are enabled by default. Optional FoD licenses are required to activate the remaining 28 ports.)
   – Two internal full-duplex 1 GbE ports that are connected to the Chassis Management Module.
򐂰 External ports:
   – Two ports for 1 Gb or 10 Gb Ethernet SFP+ transceivers (support for 1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10GBASE-SR, 10GBASE-LR, or SFP+ copper direct-attach cables (DACs)). These two ports are enabled by default. SFP+ modules and DACs are not included and must be purchased separately.
   – Twelve IBM Omni Ports. Each of them can operate as 10 Gb Ethernet (support for 10GBASE-SR, 10GBASE-LR, or 10 GbE SFP+ DACs) or auto-negotiate as 4/8 Gb Fibre Channel, depending on the SFP+ transceiver that is installed in the port. The first six ports are enabled by default. An optional FoD license is required to activate the remaining six ports. SFP+ modules and DACs are not included and must be purchased separately.

     Note: Omni Ports do not support 1 Gb Ethernet operations.

   – Two ports for 40 Gb Ethernet QSFP+ transceivers or QSFP+ DACs. (These ports are disabled by default. An optional FoD license is required to activate them.) Also, you can use break-out cables to break out each 40 GbE port into four 10 GbE SFP+ connections. QSFP+ modules and DACs are not included and must be purchased separately.
   – One RS-232 serial port (mini-USB connector) that provides another means to configure the switch module.

򐂰 Scalability and performance:
   – 40 Gb Ethernet ports for extreme uplink bandwidth and performance.
   – Fixed-speed external 10 Gb Ethernet ports to use the 10 Gb core infrastructure.
   – Non-blocking architecture with wire-speed forwarding of traffic and aggregated throughput of 1.28 Tbps on Ethernet ports.
   – MAC address learning: automatic update, and support for up to 128,000 MAC addresses.
   – Up to 128 IP interfaces per switch.
   – Static and LACP (IEEE 802.3ad) link aggregation, up to 220 Gb of total uplink bandwidth per switch, up to 64 trunk groups, and up to 16 ports per group.
   – Support for jumbo frames (up to 9,216 bytes).
   – Broadcast/multicast storm control.
   – IGMP snooping to limit flooding of IP multicast traffic.
   – IGMP filtering to control multicast traffic for hosts that participate in multicast groups.
   – Configurable traffic distribution schemes over trunk links that are based on source/destination IP or MAC addresses, or both.
   – Fast port forwarding and fast uplink convergence for rapid STP convergence.

򐂰 Availability and redundancy:
   – VRRP for Layer 3 router redundancy.
   – IEEE 802.1D STP for providing L2 redundancy.
   – IEEE 802.1s MSTP for topology optimization; up to 32 STP instances are supported by a single switch.
   – IEEE 802.1w RSTP, which provides rapid STP convergence for critical delay-sensitive traffic, such as voice or video.
   – PVRST enhancements.
   – Layer 2 Trunk Failover to support active/standby configurations of network adapter teaming on compute nodes.
   – Hot Links, which provides basic link redundancy with fast recovery for network topologies that require Spanning Tree to be turned off.

򐂰 VLAN support:
   – Up to 1024 VLANs supported per switch, with VLAN numbers from 1 to 4095 (4095 is used for the management module's connection only).
   – 802.1Q VLAN tagging support on all ports.
   – Private VLANs.

򐂰 Security:
   – VLAN-based, MAC-based, and IP-based access control lists (ACLs).
   – 802.1x port-based authentication.
   – Multiple user IDs and passwords.
   – User access control.
   – RADIUS, TACACS+, and LDAP authentication and authorization.

򐂰 QoS:
   – Support for IEEE 802.1p, IP ToS/DSCP, and ACL-based (MAC/IP source and destination addresses, VLANs) traffic classification and processing.
   – Traffic shaping and re-marking based on defined policies.
   – Eight WRR priority queues per port for processing qualified traffic.

򐂰 IPv4 Layer 3 functions:
   – Host management.
   – IP forwarding.
   – IP filtering with ACLs, with up to 896 ACLs supported.
   – VRRP for router redundancy.
   – Support for up to 128 static routes.
   – Routing protocol support (RIP v1, RIP v2, OSPF v2, and BGP-4), for up to 2048 entries in a routing table.
   – Support for DHCP Relay.
   – Support for IGMP snooping and IGMP relay.
   – Support for PIM in PIM-SM and PIM-DM.

򐂰 IPv6 Layer 3 functions:
   – IPv6 host management (except for a default switch management IP address).
   – IPv6 forwarding.
   – Up to 128 static routes.
   – Support for the OSPF v3 routing protocol.
   – IPv6 filtering with ACLs.

򐂰 Virtualization:
   – vNICs: Ethernet, iSCSI, or FCoE traffic is supported on vNICs.
   – UFPs: Ethernet or FCoE traffic is supported on UFPs.
   – 802.1Qbg Edge Virtual Bridging (EVB), an emerging IEEE standard for allowing networks to become virtual machine (VM)-aware:
      • Virtual Ethernet Bridging (VEB) and Virtual Ethernet Port Aggregator (VEPA) are mechanisms for switching between VMs on the same hypervisor.
  • 55. Chapter 3. IBM Flex System networking architecture and portfolio 41 Draft Document for Review May 1, 2014 2:10 pm Flex System networking offerings.fm • Edge Control Protocol (ECP) is a transport protocol that operates between two peers over an IEEE 802 LAN providing reliable and in-order delivery of upper layer protocol data units. • Virtual Station Interface (VSI) Discovery and Configuration Protocol (VDP) allows centralized configuration of network policies that persists with the VM, independent of its location. • EVB Type-Length-Value (TLV) is used to discover and configure VEPA, ECP, and VDP. – VMready. 򐂰 Converged Enhanced Ethernet – Priority-Based Flow Control (PFC) (IEEE 802.1Qbb) extends 802.3x standard flow control to allow the switch to pause traffic that is based on the 802.1p priority value in each packet’s VLAN tag. – Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz) provides a method for allocating link bandwidth that is based on the 802.1p priority value in each packet’s VLAN tag. – Data center Bridging Capability Exchange Protocol (DCBX) (IEEE 802.1AB) allows neighboring network devices to exchange information about their capabilities. 򐂰 Fibre Channel over Ethernet (FCoE) – FC-BB5 FCoE specification compliant. – Native FC Forwarder switch operations. – End-to-end FCoE support (initiator to target). – FCoE Initialization Protocol (FIP) support. 򐂰 Fibre Channel – Omni Ports support 4/8 Gb FC when FC SFPs+ are installed in these ports. – Full Fabric mode for end-to-end FCoE or NPV Gateway mode for external FC SAN attachments (support for IBM B-type, Brocade, and Cisco MDS external SANs). – Fabric services in Full Fabric mode: • Name Server • Registered State Change Notification (RSCN) • Login services • Zoning 򐂰 Stacking – Hybrid stacking support (from two to six EN4093/EN4093R switches with two CN4093 switches) – FCoE support – vNIC support – 802.1Qbg support 򐂰 Manageability – Simple Network Management Protocol (SNMP V1, V2, and V3). – HTTP browser GUI. – Telnet interface for CLI. – SSH. – Secure FTP (sFTP).
  • 56. Flex System networking offerings.fm Draft Document for Review May 1, 2014 2:10 pm 42 NIC Virtualization on IBM Flex System – Service Location Protocol (SLP). – Serial interface for CLI. – Scriptable CLI. – Firmware image update (TFTP and FTP). – Network Time Protocol (NTP) for switch clock synchronization. 򐂰 Monitoring – Switch LEDs for external port status and switch module status indication. – Remote Monitoring (RMON) agent to collect statistics and proactively monitor switch performance. – Port mirroring for analyzing network traffic that passes through a switch. – Change tracking and remote logging with syslog feature. – Support for sFLOW agent for monitoring traffic in data networks (separate sFLOW analyzer is required elsewhere). – POST diagnostic tests. For more information, see the IBM Redbooks Product Guide IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch, TIPS0910, found at: http://www.redbooks.ibm.com/abstracts/tips0910.html?Open 3.2.3 IBM Flex System Fabric SI4093 System Interconnect Module The IBM Flex System Fabric SI4093 System Interconnect Module enables simplified integration of IBM Flex System into your existing networking infrastructure. The SI4093 System Interconnect Module requires no management for most data center environments. This eliminates the need to configure each networking device or individual ports, which reduces the number of management points. It provides a low latency, loop-free interface that does not rely upon spanning tree protocols, which removes one of the greatest deployment and management complexities of a traditional switch. The SI4093 System Interconnect Module offers administrators a simplified deployment experience while maintaining the performance of intra-chassis connectivity. The SI4093 System Interconnect Module is shown in Figure 3-6 on page 42. Figure 3-6 IBM Flex System Fabric SI4093 System Interconnect Module
  • 57. Chapter 3. IBM Flex System networking architecture and portfolio 43 Draft Document for Review May 1, 2014 2:10 pm Flex System networking offerings.fm The SI4093 System Interconnect Module is initially licensed for 14 10-Gb internal ports enabled and 10 10-Gb external uplink ports enabled. More ports can be enabled, including 14 internal ports and two 40 Gb external uplink ports with Upgrade 1, and 14 internal ports and four SFP+ 10 Gb external ports with Upgrade 2 license options. Upgrade 1 must be applied before Upgrade 2 can be applied. Table 3-5 shows the part numbers for ordering the switches and the upgrades. Table 3-5 SI4093 ordering information The following base switch and upgrades are available: 򐂰 95Y3313 is the part number for the physical device, and it comes with 14 internal 10 Gb ports enabled (one to each node bay) and 10 external 10 Gb ports enabled for connectivity to an upstream network, plus external servers and storage. All external 10 Gb ports are SFP+ based connections. 򐂰 95Y3318 (Upgrade 1) can be applied on the base interconnect module to make full use of 4-port adapters that are installed in each compute node. This upgrade enables 14 more internal ports, for a total of 28 ports. The upgrade also enables two 40 Gb uplinks with QSFP+ connectors. These QSFP+ ports can also be converted to four 10 Gb SFP+ DAC connections by using the appropriate fan-out cable. This upgrade requires the base interconnect module. 򐂰 95Y3320 (Upgrade 2) can be applied on top of Upgrade 1 when you want more uplink bandwidth on the interconnect module or if you want more internal bandwidth to the compute nodes with the adapters capable of supporting six ports (like CN4058). The upgrade enables the remaining four external 10 Gb uplinks with SFP+ connectors, plus 14 internal 10 Gb ports, for a total of 42 ports (three to each compute node). Table 3-6 lists the supported port combinations on the interconnect module and the required upgrades. Table 3-6 Supported port combinations Description Part number Feature code (x-config / e-config) Interconnect module IBM Flex System Fabric SI4093 System Interconnect Module 95Y3313 A45T / ESWA Features on Demand upgrades SI4093 System Interconnect Module (Upgrade 1) 95Y3318 A45U / ESW8 SI4093 System Interconnect Module (Upgrade 2) 95Y3320 A45V / ESW9 Important: SFP and SFP+ (small form-factor pluggable plus) transceivers or cables are not included with the switch. They must be ordered separately. See Table 3-6 on page 43. Quantity required Supported port combinations Base switch, 95Y3313 Upgrade 1, 95Y3318 Upgrade 2, 95Y3320 14x internal 10 GbE 10x external 10 GbE 1 0 0
  • 58. Flex System networking offerings.fm Draft Document for Review May 1, 2014 2:10 pm 44 NIC Virtualization on IBM Flex System The SI4093 System Interconnect Module has the following features and specifications: 򐂰 Modes of operations: – Transparent (or VLAN-agnostic) mode In VLAN-agnostic mode (default configuration), the SI4093 transparently forwards VLAN tagged frames without filtering on the customer VLAN tag, which provides an end host view to the upstream network. The interconnect module provides traffic consolidation in the chassis to minimize TOR port usage, and it enables server-to-server communication for optimum performance (for example, vMotion). It can be connected to the FCoE transit switch or FCoE gateway (FC Forwarder) device. – Local Domain (or VLAN-aware) mode In VLAN-aware mode (optional configuration), the SI4093 provides more security for multi-tenant environments by extending client VLAN traffic isolation to the interconnect module and its uplinks. VLAN-based access control lists (ACLs) can be configured on the SI4093. When FCoE is used, the SI4093 operates as an FCoE transit switch, and it should be connected to the FCF device. 򐂰 Internal ports: – A total of 42 internal full-duplex 10 Gigabit ports; 14 ports are enabled by default. Optional FoD licenses are required to activate the remaining 28 ports. – Two internal full-duplex 1 GbE ports that are connected to the chassis management module. 򐂰 External ports: – A total of 14 ports for 1 Gb or 10 Gb Ethernet SFP+ transceivers (support for 1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10GBASE-SR, or 10GBASE-LR) or SFP+ copper direct-attach cables (DAC). A total of 10 ports are enabled by default. An optional FoD license is required to activate the remaining four ports. SFP+ modules and DACs are not included and must be purchased separately. – Two ports for 40 Gb Ethernet QSFP+ transceivers or QSFP+ DACs. (Ports are disabled by default. An optional FoD license is required to activate them.) QSFP+ modules and DACs are not included and must be purchased separately. – One RS-232 serial port (mini-USB connector) that provides another means to configure the switch module. 򐂰 Scalability and performance: – 40 Gb Ethernet ports for extreme uplink bandwidth and performance. – External 10 Gb Ethernet ports to use 10 Gb upstream infrastructure. 28x internal 10 GbE 10x external 10 GbE 2x external 40 GbE 1 1 0 42x internal 10 GbEa 14x external 10 GbE 2x external 40 GbE 1 1 1 a. This configuration uses six of the eight ports on the CN4058 adapter that are available for IBM Power Systems™ compute nodes. Quantity required Supported port combinations Base switch, 95Y3313 Upgrade 1, 95Y3318 Upgrade 2, 95Y3320
  • 59. Chapter 3. IBM Flex System networking architecture and portfolio 45 Draft Document for Review May 1, 2014 2:10 pm Flex System networking offerings.fm – Non-blocking architecture with wire-speed forwarding of traffic and aggregated throughput of 1.28 Tbps. – Media access control (MAC) address learning: automatic update, support for up to 128,000 MAC addresses. – Static and LACP (IEEE 802.3ad) link aggregation, up to 220 Gb of total uplink bandwidth per interconnect module. – Support for jumbo frames (up to 9,216 bytes). 򐂰 Availability and redundancy: – Layer 2 Trunk Failover to support active and standby configurations of network adapter teaming on compute nodes. – Built in link redundancy with loop prevention without a need for Spanning Tree protocol. 򐂰 VLAN support: – Up to 32 VLANs supported per interconnect module SPAR partition, with VLAN numbers 1 - 4095. (4095 is used for management module’s connection only.) – 802.1Q VLAN tagging support on all ports. 򐂰 Security: – VLAN-based access control lists (ACLs) (VLAN-aware mode). – Multiple user IDs and passwords. – User access control. – Radius, TACACS+, and LDAP authentication and authorization. 򐂰 QoS Support for IEEE 802.1p traffic classification and processing. 򐂰 Virtualization: – Switch Independent Virtual NIC (vNIC2): Ethernet, iSCSI, or FCoE traffic is supported on vNICs. – SPAR: • SPAR forms separate virtual switching contexts by segmenting the data plane of the switch. Data plane traffic is not shared between SPARs on the same switch. • SPAR operates as a Layer 2 broadcast network. Hosts on the same VLAN attached to a SPAR can communicate with each other and with the upstream switch. Hosts on the same VLAN but attached to different SPARs communicate through the upstream switch. • SPAR is implemented as a dedicated VLAN with a set of internal server ports and a single uplink port or link aggregation (LAG). Multiple uplink ports or LAGs are not allowed in SPAR. A port can be a member of only one SPAR. 򐂰 Converged Enhanced Ethernet: – Priority-Based Flow Control (PFC) (IEEE 802.1Qbb) extends 802.3x standard flow control to allow the switch to pause traffic based on the 802.1p priority value in each packet’s VLAN tag. – Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz) provides a method for allocating link bandwidth based on the 802.1p priority value in each packet’s VLAN tag. – Data Center Bridging Capability Exchange Protocol (DCBX) (IEEE 802.1AB) allows neighboring network devices to exchange information about their capabilities.
  • 60. Flex System networking offerings.fm Draft Document for Review May 1, 2014 2:10 pm 46 NIC Virtualization on IBM Flex System 򐂰 FCoE: – FC-BB5 FCoE specification compliant – FCoE transit switch operations – FCoE Initialization Protocol (FIP) support 򐂰 Manageability: – IPv4 and IPv6 host management. – Simple Network Management Protocol (SNMP V1, V2, and V3). – Industry standard command-line interface (IS-CLI) through Telnet, SSH, and serial port. – Secure FTP (sFTP). – Service Location Protocol (SLP). – Firmware image update (TFTP and FTP/sFTP). – Network Time Protocol (NTP) for clock synchronization. – IBM System Networking Switch Center (SNSC) support. 򐂰 Monitoring: – Switch LEDs for external port status and switch module status indication. – Change tracking and remote logging with syslog feature. – POST diagnostic tests. For more information, see IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches, TIPS0864, which is available at this website: http://www.redbooks.ibm.com/abstracts/tips0864.html?Open 3.2.4 I/O modules and cables The Ethernet I/O modules support for interface modules and cables is shown in Table 3-7. Table 3-7 Modules and cables supported in Ethernet I/O modules Part number Description EN4093 EN4093R CN4093 SI4093 44W4408 10GbE 850 nm Fiber SFP+ Transceiver (SR) Yes Yes Yes Yes 46C3447 IBM SFP+ SR Transceiver Yes Yes Yes Yes 90Y9412 IBM SFP+ LR Transceiver Yes Yes Yes Yes 81Y1622 IBM SFP SX Transceiver Yes Yes Yes Yes 81Y1618 IBM SFP RJ45 Transceiver Yes Yes Yes Yes 90Y9424 IBM SFP LX Transceiver Yes Yes Yes Yes 49Y7884 IBM QSFP+ SR Transceiver Yes Yes Yes Yes 90Y9427 1m IBM Passive DAC SFP+ Cable Yes Yes Yes Yes 00AY764 1.5m IBM Passive DAC SFP+ Cable No Yes Yes Yes 00AY765 2m IBM Passive DAC SFP+ Cable No Yes Yes Yes 90Y9430 3m IBM Passive DAC SFP+ Cable Yes Yes Yes Yes 90Y9433 5m IBM Passive DAC SFP+ Cable Yes Yes Yes Yes
00D6151 7m IBM Passive DAC SFP+ Cable No Yes Yes Yes
49Y7886 1m IBM QSFP+ DAC Break Out Cbl. Yes Yes Yes Yes
49Y7887 3m IBM QSFP+ DAC Break Out Cbl. Yes Yes Yes Yes
49Y7888 5m IBM QSFP+ DAC Break Out Cbl. Yes Yes Yes Yes
90Y3519 10m IBM QSFP+ MTP Optical cable Yes Yes Yes Yes
90Y3521 30m IBM QSFP+ MTP Optical cable Yes Yes Yes Yes
49Y7890 1m IBM QSFP+-to-QSFP+ cable Yes Yes Yes Yes
49Y7891 3m IBM QSFP+-to-QSFP+ cable Yes Yes Yes Yes
00D5810 5m IBM QSFP+ to QSFP+ Cable No Yes Yes Yes
00D5813 7m IBM QSFP+ to QSFP+ Cable No Yes Yes Yes
All Ethernet I/O modules are restricted to the use of the SFP/SFP+ modules that are listed in Table 3-7 on page 46.
3.3 IBM Flex System Ethernet adapters
The IBM Flex System portfolio contains a number of Ethernet I/O adapters. The cards offer a combination of 1 Gb, 10 Gb, and 40 Gb ports and advanced function support that includes converged networks and virtual NICs. The following Ethernet I/O adapters are described:
򐂰 3.3.1, “Embedded 10Gb Virtual Fabric Adapter”
򐂰 3.3.2, “IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters” on page 48
򐂰 3.3.3, “IBM Flex System CN4022 2-port 10Gb Converged Adapter” on page 50
򐂰 3.3.4, “IBM Flex System x222 Compute Node LOM” on page 52
3.3.1 Embedded 10Gb Virtual Fabric Adapter
Some models of the x240 (those with a model number of the form 8737-x4x) include an Embedded 10Gb Virtual Fabric Adapter (VFA, also known as LAN on Motherboard or LOM) built into the system board. Table 2 lists the models of the x240 that include the Embedded 10Gb Virtual Fabric Adapter.
Each x240 model that includes the embedded 10 Gb VFA also has the Compute Node Fabric Connector installed in I/O connector 1 (and physically screwed onto the system board) to provide connectivity to the Enterprise Chassis midplane. Figure 3 shows the location of the Fabric Connector.
The Fabric Connector enables port 1 on the embedded 10Gb VFA to be routed to I/O module bay 1 and port 2 to be routed to I/O module bay 2. The Fabric Connector can be unscrewed and removed, if required, to allow the installation of an I/O adapter on I/O connector 1.
The Embedded 10Gb VFA is based on the Emulex BladeEngine 3R (BE3R), which is a single-chip, dual-port 10 Gigabit Ethernet (10GbE) controller.
These are some of the features of the Embedded 10Gb VFA:
򐂰 PCI-Express Gen2 x8 host bus interface
򐂰 Supports connection to 10 Gb and 1 Gb Flex System Ethernet switches
򐂰 Supports multiple virtual NIC (vNIC) functions
򐂰 TCP/IP Offload Engine (TOE enabled)
򐂰 SR-IOV capable
򐂰 RDMA over TCP/IP capable
򐂰 iSCSI and FCoE upgrade offering via FoD
The following table lists the ordering information for the IBM Virtual Fabric Advanced Software Upgrade (LOM), which enables the iSCSI and FCoE support on the Embedded 10Gb Virtual Fabric Adapter.
Table 3-8 Feature on Demand upgrade for FCoE and iSCSI support
Part number x-config feature code e-config feature code 7863-10X feature code Description
90Y9310 A2TD None IBM Virtual Fabric Advanced Software Upgrade (LOM)
3.3.2 IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters
The IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapter is a 4-port 10 Gb converged network adapter. It can scale up to 16 virtual ports and supports multiple protocols, such as Ethernet, iSCSI, and FCoE.
Figure 3-7 shows the IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters.
Figure 3-7 The CN4054/CN4054R 10Gb Virtual Fabric Adapter for IBM Flex System
Table 3-9 lists the ordering part numbers and feature codes.
  • 63. Chapter 3. IBM Flex System networking architecture and portfolio 49 Draft Document for Review May 1, 2014 2:10 pm Flex System networking offerings.fm Table 3-9 IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapter ordering information The IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapter has the following features and specifications: 򐂰 Two ASICs per adapter. – CN4054: Dual-ASIC Emulex BladeEngine 3 (BE3) controller. – CN4054R: Dual-ASIC Emulex BladeEngine 3R (BE3R) controller. 򐂰 Operates as a 4-port 1/10 Gb Ethernet adapter, or supports up to 16 Virtual Network Interface Cards (vNICs). 򐂰 In virtual NIC (vNIC) mode, it supports: – Virtual port bandwidth allocation in 100 Mbps increments. – Up to 16 virtual ports per adapter (four per port). – With the CN4054/CN4054R Virtual Fabric Adapter Upgrade, 90Y3558, four of the 16 vNICs (one per port) support iSCSI or FCoE. 򐂰 Support for two vNIC modes: IBM Virtual Fabric Mode and Switch Independent Mode. 򐂰 Wake On LAN support. 򐂰 With the CN4054/CN4054R Virtual Fabric Adapter Upgrade, 90Y3558, the adapter adds FCoE and iSCSI hardware initiator support. iSCSI support is implemented as a full offload and presents an iSCSI adapter to the operating system. 򐂰 TCP offload Engine (TOE) support with Windows Server 2003, 2008, and 2008 R2 (TCP Chimney) and Linux. 򐂰 The connection and its state are passed to the TCP offload engine. 򐂰 Data transmit and receive is handled by the adapter. 򐂰 Supported by iSCSI. 򐂰 Connection to either 1 Gb or 10 Gb data center infrastructure (1 Gb and 10 Gb auto-negotiation). 򐂰 PCI Express 3.0 x8 host interface. 򐂰 Full-duplex capability. 򐂰 Bus-mastering support. 򐂰 DMA support. 򐂰 PXE support. 򐂰 IPv4/IPv6 TCP, UDP checksum offload: – Large send offload – Large receive offload – RSS – IPv4 TCP Chimney offload Part number x-config feature code e-config feature code 7863-10X feature code Description 90Y3554 A1R1 None 1759 CN4054 10Gb Virtual Fabric Adapter 90Y3558 A1R0 None 1760 CN4054 Virtual Fabric Adapter Upgrade 00Y3306 A4K2 None A4K2 CN4054R 10Gb Virtual Fabric Adapter
– TCP Segmentation offload
򐂰 VLAN insertion and extraction.
򐂰 Jumbo frames up to 9000 bytes.
򐂰 Load balancing and failover support, including AFT, SFT, ALB, teaming support, and IEEE 802.3ad.
򐂰 Enhanced Ethernet (draft):
– Enhanced Transmission Selection (ETS) (P802.1Qaz).
– Priority-based Flow Control (PFC) (P802.1Qbb).
– Data Center Bridging Capabilities eXchange Protocol, CIN-DCBX, and CEE-DCBX (P802.1Qaz).
򐂰 Supports Serial over LAN (SoL).
For more information, see IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapter and EN4054 4-port 10Gb Ethernet Adapter, TIPS0868, which can be found at this website:
http://www.redbooks.ibm.com/abstracts/tips0868.html
3.3.3 IBM Flex System CN4022 2-port 10Gb Converged Adapter
The IBM Flex System CN4022 2-port 10Gb Converged Adapter is a dual-port 10 Gigabit Ethernet network adapter that supports Ethernet, Fibre Channel over Ethernet (FCoE), and Internet Small Computer System Interface (iSCSI) protocols out of the box. Clients now have a choice of multiple vendors without compromising on features. This adapter also supports virtual network interface controller (vNIC) capability, which helps clients reduce cost and complexity. The CN4022 adapter is based on the Broadcom 57840 controller and offers a PCIe 2.0 x8 host interface.
The CN4022 2-port 10Gb Converged Adapter is shown in Figure 3-8.
Figure 3-8 IBM Flex System CN4022 2-port 10Gb Converged Adapter
This CN4022 is based on the industry-standard PCIe architecture and is ideal for clients that use 10 GbE in their network infrastructure and that are looking for an entry price point for FCoE or iSCSI capabilities. The adapter ships standard with support for FCoE and iSCSI and with vNIC features that allow each physical port of the adapter to be virtualized into four virtual NICs (vNICs).
Table 3-10 lists the ordering part numbers and feature codes.
Table 3-10 IBM Flex System CN4022 2-port 10 Gb Converged Adapter ordering information
Part number x-config feature code e-config feature code Description
88Y5920 A4K3 A4K3 IBM Flex System CN4022 2-port 10Gb Converged Adapter
The IBM Flex System CN4022 2-port 10Gb Converged Adapter has the following features and specifications:
򐂰 One Broadcom BCM57840 ASIC
򐂰 Connection to 10 Gb data center infrastructure
򐂰 PCI Express 2.0 x8 host interface
򐂰 Full line-rate performance
򐂰 Supports 10 Gb Ethernet, FCoE, and iSCSI
򐂰 IBM Flex System Manager support (Tier 2 support only, no alerting)
򐂰 Ethernet features
– Ethernet frame: 1500 byte or 9600 byte (jumbo frame)
– Virtual LAN (VLAN) support with VLAN tagging
– vNIC support:
• Supports Switch Independent Mode (vNIC2 mode)
• UFP mode support planned in 2014
• Four vNIC/NPAR Ethernet devices per 10Gb physical port
• Support either for two iSCSI ports or for one iSCSI port and one FCoE port, per 10 Gb physical port
򐂰 Stateless offload
– IP, TCP, and UDP checksum offloads
– IPv4 and IPv6 offloads
– Large send offload (LSO)
򐂰 Performance optimization
– Receive Side Scaling (RSS)
– Transmit Side Scaling (TSS)
– MSI and MSI-X support
– RX/TX multiqueue
– TCP Offload Engine (TOE) support
򐂰 SR-IOV-ready
򐂰 Wake on LAN
򐂰 Preboot eXecution Environment (PXE) support
򐂰 Network teaming, failover, and load balancing
– Smart Load Balancing (SLB)
– Link Aggregation Control Protocol (LACP) and generic trunking
– Management using the Broadcom Advanced Control Suite management application
򐂰 Compliance
– IEEE 802.3ae (10 Gb Ethernet)
– IEEE 802.3ad (Link aggregation)
– IEEE 802.3ap Clause 73 1G/10G Autonegotiation for 10GBase-KR channels
– IEEE 802.1q (VLAN)
– IEEE 802.1p (Priority Encoding)
– IEEE 802.3x (Flow Control)
– IEEE 802.1au (Congestion Notification)
– IPv4 (RFC 791)
– IPv6 (RFC 2460)
– IEEE 1588/802.1as (Precision Time Protocol (PTP))
– IEEE 802.1Qbb Priority Flow Control (PFC)
– IEEE 802.1Qaz Enhanced Transmission Selection (ETS)
򐂰 iSCSI features
– iSCSI initiator hardware offload and boot support
– Protocols
• RFC 3347 (iSCSI Requirements and Design Considerations)
• Challenge Handshake Authentication Protocol (CHAP)
• iSNS
• Service Location Protocol (SLP)
򐂰 FCoE features
– 3,500 N_Port ID Virtualization (NPIV) interfaces (total for adapter)
– Support for FIP and FCoE Ethertypes
– Fabric Provided Media Access Control (MAC) Addressing (FPMA) support
– 2,048 concurrent port logins (RPIs) per port
– 1,024 active exchanges (XRIs) per port
Notes:
򐂰 FCoE is not supported with Red Hat Enterprise Linux KVM
򐂰 FCoE support for VLAN discovery only with the port PVID = 1
򐂰 FCoE SAN boot is not supported
For more information, see IBM Flex System CN4022 2-port 10Gb Converged Adapter, TIPS1087, which can be found at this website:
http://www.redbooks.ibm.com/abstracts/tips1087.html?Open
3.3.4 IBM Flex System x222 Compute Node LOM
The IBM Flex System x222 Compute Node is a high-density dual-server offering that is designed for virtualization, dense cloud deployments, and hosted clients. The x222 has two independent servers in one mechanical package; this double-density design allows up to 28 servers to be housed in a single 10U Flex System Enterprise Chassis.
  • 67. Chapter 3. IBM Flex System networking architecture and portfolio 53 Draft Document for Review May 1, 2014 2:10 pm Flex System networking offerings.fm The following figure shows the IBM Flex System x222 Compute Node. Figure 3-9 IBM Flex System x222 Compute Node More information on the specifics for the x222 can be found at the Redbooks Publication link below; http://www.redbooks.ibm.com/abstracts/tips1036.html Embedded 10Gb Virtual Fabric Adapter on the x222 Each server in the x222 includes an Embedded 10Gb Virtual Fabric Adapter (VFA, also known as LAN on Motherboard or LOM) built in to the system board. The x222 has one Fabric Connector (which is physically on the lower server) and the Ethernet connections from both Embedded 10 Gb VFAs are routed through it. Figure 5 shows the physical location of the Fabric Connector. Figure 3-10 below shows the internal connections between the Embedded 10Gb VFAs and the switches in chassis bays 1 and 2. Figure 3-10 Embedded 10 Gb VFA connectivity to the switches (switch port upgrades applies to EN4093, EN4093R, CN4093 and SI4093 switches)
  • 68. Flex System networking offerings.fm Draft Document for Review May 1, 2014 2:10 pm 54 NIC Virtualization on IBM Flex System In Figure 3-10 on page 53: 򐂰 The blue lines show that the two Ethernet ports in the upper server route to switches in bay 1 and bay 2. These connections require that the switch have Upgrade 1 enabled so as to enable the second bank of internal ports, ports 15-28 (Alias ports INTB1-INTB14). 򐂰 The red lines show that the two Ethernet ports in the lower server also route to switches in bay 1 and bay 2. These connections both go to the base ports of the switch, ports 1-14 (Alias ports INTA1-INTA14) Switch upgrade 1 required: For EN4093, EN4093R, CN4093 and SI4093 switches, Upgrade 1 must be enabled in the two switches. Without this feature upgrade, the upper server will not have any Ethernet connectivity. The Embedded 10Gb VFA is based on the Emulex BladeEngine 3 (BE3), which is a single-chip, dual-port 10 Gigabit Ethernet (10GbE) Ethernet Controller. These are some of the features of the Embedded 10Gb VFA: 򐂰 PCI-Express Gen2 x8 host bus interface 򐂰 Supports multiple virtual NIC (vNIC) functions 򐂰 TCP/IP Offload Engine (TOE enabled) 򐂰 SR-IOV capable 򐂰 RDMA over TCP/IP capable 򐂰 iSCSI and FCoE upgrade offering through FoD Table 3-11 on page 54 lists the ordering information for the IBM Flex System Embedded 10Gb Virtual Fabric Upgrade, which enables the iSCSI and FCoE support on the Embedded 10Gb Virtual Fabric Adapter. Table 3-11 Feature on Demand upgrade for FCoE and iSCSI support Supported switches The x222 supports only Ethernet scalable switches with at least the first internal port upgrade enabled. Table 3-12 Supported Switches Part Number Feature Code Description Maximum supported 90Y9310 A2TD IBM Virtual Fabric Advanced Software Upgrade (LOM) 1 per server 2 per x222 Compute Node TIP: Two licenses required: To enable the FCoE/iSCSI upgrade for both servers in the x222 Compute Node, two licenses are required. Adapter Switches supported Minimum required switch upgrades Embedded 10 GbE Virtual Fabric Adapter EN4093R 10Gb Scalable Switch (95Y3309) Upgrade 1 (49Y4798) CN4093 10Gb Converged Scalable Switch (00D5823) Upgrade 1 (00D5845) or Upgrade 2 (00D5847) SI4093 System Interconnect Module (95Y3313) Upgrade 1 (95Y3318)
Chapter 4. NIC virtualization considerations on the switch side
This paper is primarily focused on the various options to virtualize NIC technology. This chapter introduces the two primary types of NIC virtualization (vNIC and UFP) that are available on the Flex System switches, and discusses considerations for the various sub-elements of these virtual NIC technologies.
At the core of all virtual NICs discussed in this chapter is the ability to take a single physical 10 GbE NIC and carve it up into as many as four virtual NICs for use by the attaching host. This chapter focuses on various deployment considerations when making the right choice in NIC virtualization within a PureFlex System environment.
The following topics are covered:
򐂰 4.1, “Virtual Fabric vNIC solution capabilities” on page 56
򐂰 4.2, “Unified Fabric Port feature” on page 64
򐂰 4.3, “Compute node NIC to I/O module connectivity mapping” on page 70
4.1 Virtual Fabric vNIC solution capabilities
Virtual Network Interface Controller (called vNIC in this paper) was the original way that IBM switches provided the ability to divide a physical NIC into smaller logical NICs, so that the OS has more ways to logically connect to the infrastructure. The vNIC feature is supported only on 10 Gb ports that face the compute nodes within the chassis, and only on certain Ethernet I/O modules. These currently include the EN4093R 10Gb Scalable Switch and CN4093 10Gb Converged Scalable Switch. vNIC also requires a node adapter that supports this functionality.
As of this writing, there are two primary forms of vNIC available: Virtual Fabric mode (or Switch dependent mode) and Switch independent mode. The Virtual Fabric mode of vNIC is also subdivided into two sub-modes: Dedicated uplink vNIC mode and Shared uplink vNIC mode.
All vNIC modes share the following common elements:
򐂰 They are supported only on 10 Gb connections.
򐂰 Each vNIC mode allows a NIC to be divided into up to four vNICs per physical NIC (can be less than four, but not more).
򐂰 They all require an adapter that has support for one or more of the vNIC modes.
򐂰 When vNICs are created, the default bandwidth is 2.5 Gb for each vNIC, but they can be configured to be anywhere from 100 Mb up to the full bandwidth of the NIC.
򐂰 The bandwidth of all configured vNICs on a physical NIC cannot exceed 10 Gb.
򐂰 All modes support FCoE.
Tip: It will occasionally be seen in other documentation that these modes are called vNIC 1 (Virtual Fabric mode vNIC) and vNIC 2 (Switch independent mode vNIC).
A summary of some of the differences and similarities of these modes is shown in Table 4-1. These differences and similarities are covered in more detail next.
Table 4-1 Attributes of vNIC modes
Capability | IBM Virtual Fabric mode, Dedicated uplink | IBM Virtual Fabric mode, Shared uplink | Switch independent mode
Requires support in the I/O module | Yes | Yes | No
Requires support in the NIC/CNA | Yes | Yes | Yes
Supports adapter transmit rate control | Yes | Yes | Yes
Supports I/O module transmit rate control | Yes | Yes | No
Supports changing rate without restart of node | Yes | Yes | No
Requires a dedicated uplink per vNIC group | Yes | No | No
Support for node OS-based tagging | Yes | No | Yes
Support for failover per vNIC group | Yes | Yes | N/A
Support for more than one uplink path per vNIC | No | No | Yes
4.1.1 Virtual Fabric mode vNIC
Virtual Fabric mode vNIC depends on the switch in the I/O module bay to participate in the vNIC process. Specifically, the IBM Flex System Fabric EN4093R 10Gb Scalable Switch and the CN4093 10Gb Converged Scalable Switch support this mode. It also requires an adapter on the compute node that supports the vNIC Virtual Fabric mode feature.
In Virtual Fabric mode vNIC, configuration is performed on the switch, and the configuration information is communicated between the switch and the adapter so that both sides agree on and enforce bandwidth controls. The bandwidth allocation can be changed at any time without reloading the OS or the I/O module.
As noted, there are two types of Virtual Fabric vNIC modes: Dedicated uplink mode and Shared uplink mode. Both modes incorporate the concept of a vNIC group on the switch that is used to associate vNICs and physical ports into virtual switches within the chassis. How these vNIC groups are used is the primary difference between Dedicated uplink mode and Shared uplink mode.
Virtual Fabric vNIC modes share the following common attributes:
򐂰 They are based on the concept of a vNIC group that must be created on the I/O module.
򐂰 Similar vNICs are bundled together into common vNIC groups.
򐂰 Each vNIC group is treated as a virtual switch within the I/O module. Packets in one vNIC group can get to a different vNIC group only by going to an external switch/router.
򐂰 For the purposes of Spanning Tree and packet flow, each vNIC group is treated as a unique switch by upstream connecting switches/routers.
򐂰 Both modes support the addition of physical NICs (pNICs, the NICs from nodes that are not using vNIC) to vNIC groups for internal communication to other pNICs and vNICs in that vNIC group, and they share any uplink that is associated with that vNIC group.
Dedicated uplink mode
Dedicated uplink mode is the default mode when vNIC is enabled on the I/O module. In Dedicated uplink mode, each vNIC group must have its own dedicated physical or logical (aggregation) uplink. In this mode, no more than one physical or logical uplink can be assigned to a vNIC group, and it is assumed that high availability is achieved by some combination of aggregation on the uplink or NIC teaming on the server.
In Dedicated uplink mode, vNIC groups are VLAN-independent to the nodes and the rest of the network, which means that you do not need to create VLANs for each VLAN that is used by the nodes. The vNIC group takes each packet (tagged or untagged) and moves it through the switch. This mode is accomplished by the use of a form of Q-in-Q tagging. Each vNIC group is assigned a VLAN that is unique to that vNIC group. Any packet (tagged or untagged) that comes in on a downstream or upstream port in that vNIC group has a tag placed on it equal to the vNIC group VLAN. As that packet leaves the vNIC group toward the node or out an uplink, that tag is removed and the original tag (or no tag, depending on the original packet) is revealed.
Example Configuration
Example 4-1 on page 58 shows an example Virtual Fabric vNIC mode configuration. The example enables VLAN 4091 as the outer Q-in-Q VLAN ID for the first vNIC index on port INTA1. By default, the bandwidth configuration is set to 25% on all four indexes, equating to 100%.
As noted above, these values can be adjusted as needed, but the total across all four indexes cannot exceed 100%.
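For instance, a minimal sketch of an uneven split follows; the 40/40/10/10 percentages and the choice of port INTA1 are illustrative assumptions rather than values from this document, and the same stanza from Example 4-1 is simply repeated for each index:
vnic port INTA1 index 1
bandwidth 40
enable
exit
!
vnic port INTA1 index 2
bandwidth 40
enable
exit
!
vnic port INTA1 index 3
bandwidth 10
enable
exit
!
vnic port INTA1 index 4
bandwidth 10
enable
exit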
In the previous paragraph we discussed the INT vNIC port settings, but how does this relate to the EXT port for network access? Within the vnic vnicgroup 1 configuration, one of three options can be chosen to get network access:
򐂰 port: a single physical port
򐂰 trunk: a static/trunk port channel
򐂰 key: an LACP (802.3ad) port channel
The failover command, also located within the vnic vnicgroup section, allows for the monitoring of an EXT port or port channel. In the event of a link failure on the EXT port or port channel, the I/O module disables all related members within that vnicgroup.
Example 4-1 Virtual Fabric vNIC Dedicated Uplink mode example configuration
vnic enable
vnic port INTA1 index 1
bandwidth 25
enable
exit
!
vnic vnicgroup 1
vlan 4091
enable
failover
member INTA1.1
port EXT1
exit
In Figure 4-1 on page 59, Virtual Fabric vNIC Dedicated Uplink mode uses vNIC groups to partition the vSwitch within the ESXi host. Note that this is not specific to VMware and is supported on all Intel platform operating systems with the Emulex Virtual Fabric Adapter. In this example, vNIC Groups 1, 2, 3, and 4 utilize separate uplinks because normal VLAN traffic is being transparently switched within each group using Q-in-Q. Because all traffic is transparent and is contained within its own vNIC group and I/O module, it is possible to run the same VLAN or VLANs within multiple vNIC groups and still maintain VLAN isolation. For instance, in Figure 4-1 on page 59, VLAN 20 is being utilized within two separate ESXi vSwitches. However, because each vSwitch has its own physical uplink and the I/O module is running Virtual Fabric vNIC Dedicated Uplink mode, VLAN 20 on the two vSwitches remains isolated.
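The uplink for a vNIC group does not have to be a single physical port. The following minimal sketch illustrates the key option listed above by binding an LACP aggregation of two external ports to vNIC group 1; the admin key value of 1000 and the choice of ports EXT1 and EXT2 are assumptions for illustration and are not taken from this document, so verify the exact syntax against your switch firmware:
interface port EXT1
lacp mode active
lacp key 1000
exit
!
interface port EXT2
lacp mode active
lacp key 1000
exit
!
vnic vnicgroup 1
vlan 4091
enable
failover
member INTA1.1
key 1000
exit
With failover enabled, the vNIC group members are disabled only when the entire aggregation loses link, rather than when a single uplink port fails.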
Virtual Fabric vNIC Dedicated Uplink mode is shown in Figure 4-1 below.
Figure 4-1 IBM Virtual Fabric vNIC Dedicated Uplink Mode
Shared Uplink mode
Shared uplink mode is a global option that can be enabled on an I/O module that has the vNIC feature enabled. As the name suggests, it allows an uplink to be shared by more than one vNIC group, which reduces the number of uplinks that are required. It also changes the way that the vNIC groups process packets for tagging. In Shared Uplink mode, it is expected that the servers no longer use tags. Instead, the vNIC group VLAN acts as the tag that is placed on the packet. When a server sends a packet into the vNIC group, the switch places a tag on it equal to the vNIC group VLAN and sends it out the uplink tagged with that VLAN. Only one VLAN can be assigned to a vNIC group.
Because Shared Uplink mode is a global parameter, Dedicated Uplink mode cannot be utilized on the same I/O module when it is enabled. Unified Fabric Port (UFP) does not have the restrictions that the Virtual Fabric Dedicated and Shared Uplink modes carry.
Example Configuration
Example 4-2 on page 60 shows an example of Shared Uplink mode. The following parameters must be set in order for Shared Uplink mode to operate properly. Note that most parameters in this example are identical to the settings in the Dedicated Uplink mode section above, except for the vnic uplink-share command and the VLAN number, which in Shared Uplink mode matches the customer VLAN.
򐂰 The default VLAN must be set on both the INT and EXT port or port channel participating in the Shared Uplink vNIC mode configuration.
򐂰 Tagging must be enabled on the EXT port or port channel. All VLANs set within the vnicgroup will be tagged to the upstream customer network.
Example 4-2 Virtual Fabric vNIC Shared Uplink mode example configuration
vnic enable
vnic uplink-share
vnic port INTA1 index 1
bandwidth 25
enable
exit
!
vnic vnicgroup 1
vlan 100
enable
failover
member INTA1.1
port EXT1
exit
!
In Figure 4-2 on page 61, Virtual Fabric vNIC Shared Uplink mode uses vNIC groups to partition the vSwitch within the ESXi host. Note that this is not specific to VMware and is supported on all Intel platform operating systems with the Emulex Virtual Fabric Adapter. In this example, vNIC Groups 1, 2, and 3 all share the same uplink port out of the I/O module in order to communicate with the network. vNIC Group 4, however, utilizes a separate uplink, giving flexibility and control over physical connectivity into the network. The biggest drawback to Virtual Fabric vNIC Shared Uplink mode is the inability to apply VLANs via the operating system.
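The two bullet points above describe port-level settings that Example 4-2 does not show. A minimal sketch of those accompanying settings follows; the syntax is assumed from the IBM Networking OS conventions used elsewhere in this paper, the VLAN number 100 matches the vnicgroup above, and the port choices are illustrative, so treat this as a sketch rather than a verified listing:
interface port INTA1
pvid 100
exit
!
interface port EXT1
tagging
pvid 100
exit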
Virtual Fabric vNIC Shared Uplink mode is shown in Figure 4-2 below.
Figure 4-2 IBM Virtual Fabric vNIC Shared Uplink Mode
4.1.2 Switch Independent mode vNIC
Switch Independent mode vNIC is configured only on the node, and the I/O module is unaware of this virtualization. The I/O module acts as a normal switch in all ways (any VLAN that must be carried through the I/O module must be created on the I/O module and allowed on the wanted ports). This mode is enabled at the compute node directly (via F1 setup at boot time or via FSM configuration pattern controls), and it has similar rules as Virtual Fabric vNIC mode regarding how you can divide the vNICs. However, any bandwidth settings are limited to how the node sends traffic, not how the I/O module sends traffic back to the node (because the I/O module is unaware of the vNIC virtualization taking place on the compute node). Also, the bandwidth settings cannot be changed in real time, because they require a reload of the compute node for any speed change to take effect.
Switch Independent mode requires setting an LPVID value in the compute node NIC configuration, which is a catch-all VLAN for the vNIC to which it is assigned. Any untagged packet from the OS sent to the vNIC is sent to the switch with the tag of the LPVID for that vNIC. Any tagged packet sent from the OS to the vNIC is sent to the switch with the tag set by the OS (the LPVID is ignored). Owing to this interaction, most users set the LPVID to some unused VLAN, and then tag all packets in the OS. One exception to this is a compute node that needs PXE to boot the base OS. In that case, the LPVID for the vNIC that is providing the PXE service must be set to the wanted PXE VLAN.
Because all packets that are coming into the I/O module from a NIC that is configured for Switch Independent mode vNIC are always tagged (by the OS, or by the LPVID setting if the OS is not tagging), all VLANs that are allowed on the port on the I/O module side should be tagged as well. This means that you set the PVID/native VLAN on the switch port to some unused VLAN, or set it to one that is used and enable PVID tagging to ensure that the port sends and receives PVID and native VLAN packets as tagged.
In most OSs, Switch Independent mode vNIC supports as many VLANs as the OS supports. One exception is with bare metal Windows OS installations, where in Switch Independent mode only a limited number of VLANs are supported per vNIC (a maximum of 63 VLANs, but less in some cases, depending on the version of Windows and what driver is in use). See the documentation for your NIC for details about any limitations for Windows and Switch Independent mode vNIC.
Example Configuration
In Figure 4-3 on page 63, Switch Independent mode is being utilized to present multiple vmnic instances to the hypervisor. Each vmnic can be used to connect to its own vSwitch with multiple port groups. In this example, each vmnic is configured to support one or more port groups. Those port groups without a VLAN defined use the LPVID VLAN ID to communicate with the network. For instance, vmnic 0 has an untagged port group defined that is part of the LPVID 200 vNIC. For that specific port group, each VM client ends up on the network tagged with VLAN 200. Those port groups that do contain a VLAN tag use their own tag and bypass the LPVID. The same goes for the untagged port group connected to vmnic 2, except that the VM client uses the LPVID VLAN 300 to communicate with the network. The I/O module, on the other hand, sees these ports as physical 10 Gb ports using traditional VLAN and switching technology.
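On the switch side, this scenario needs nothing more than ordinary VLAN configuration. The following minimal sketch reflects the advice in the paragraph above (an unused VLAN 4091 as the PVID, with the customer VLANs 200 and 300 carried tagged on the internal and external ports); the VLAN and port numbers are illustrative assumptions and not a listing from this document:
interface port INTA1
tagging
pvid 4091
exit
!
interface port EXT1
tagging
pvid 4091
exit
!
vlan 200
enable
member INTA1
member EXT1
exit
!
vlan 300
enable
member INTA1
member EXT1
exit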
Figure 4-3 IBM Switch Independent vNIC mode
Summary of Virtual Fabric mode vNIC options
In this section, we described the various modes of vNIC. The mode that is best suited for a user depends on the user's requirements. Virtual Fabric Dedicated Uplink mode offers the most control, and Shared Uplink mode and Switch Independent mode offer the most flexibility with uplink connectivity.
4.2 Unified Fabric Port feature
Unified Fabric Port (UFP) is another approach to NIC virtualization. It is similar to vNIC but with enhanced flexibility, and it should be considered the direction for future development in the virtual NIC area for IBM switching solutions. UFP is supported today on the EN4093R 10Gb Scalable Switch and CN4093 10Gb Converged Scalable Switch, and it utilizes LLDP TLVs to communicate between the physical switch port and the physical NIC within the compute node. UFP and vNIC are mutually exclusive in that you cannot enable UFP and vNIC at the same time on the same switch.
If a comparison were to be made between UFP and vNIC, UFP is most closely related to vNIC Virtual Fabric mode in that both sides (the switch and the NIC/CNA) share in controlling bandwidth usage, but there are significant differences. Compared to vNIC, UFP supports several modes of operation per virtual NIC (vPort), which are described in the following sections.
4.2.1 UFP Access and Trunk modes
򐂰 Access: The vPort only allows the default VLAN, which is similar to a physical port in access mode.
򐂰 Trunk: The vPort permits host side tagging and supports up to 32 customer-defined VLANs on each vPort.
Note: The following criteria must be met before an I/O module port can be enabled to support UFP:
򐂰 VLAN 1 is the only VLAN that can be assigned.
򐂰 Tagging must be enabled. (When UFP is enabled on a physical port, tagging is enabled automatically.)
Note: Before configuring vPort mode, UFP must be enabled globally (ufp enable command) and on the port (ufp port port identifier enable).
Example Configuration
Example 4-3 shows one vPort configured for Access mode and another vPort, within the same physical port, configured for Trunk mode. vPort 1 is set to Access mode, allowing only VLAN 10 as a single untagged VLAN for that vPort. vPort 2 is set to Trunk mode with VLAN 20 as its native VLAN and VLANs 30 and 40 tagged over that same vPort.
Example 4-3 vPort Access and Trunk mode example configuration
ufp port INTA1 vport 1
network mode access
network default-vlan 10
enable
exit
!
ufp port INTA1 vport 2
network mode trunk
network default-vlan 20
enable
exit
!
vlan 30,40
enable
vmember INTA1.2
Optionally, Example 4-4 shows adding the ability to detect uplink failures, referred to as failover. Failover is a feature used to monitor an uplink port or port channel; upon detection of a failed link or port channel, the I/O module disables any associated members (INT ports) or vmembers (UFP vPorts).
Example 4-4 UFP failover of a vmember
failover trigger 1 mmon monitor member EXT1
failover trigger 1 mmon control vmember INTA1.1
failover trigger 1 enable
Configuration validation and state of a UFP vPort
While it is easy enough to read and understand how to configure an I/O module for UFP, there are several troubleshooting commands that can be utilized to validate the configuration and the state of a vPort, as seen in Example 4-5 and Figure 4-7 on page 73. Example 4-5 shows the results of a successfully configured vPort with UFP selected and running on the compute node.
Example 4-5 Displaying individual UFP vPort configuration and status
PF_CN4093a#show ufp information vport port 3 vport 1
-------------------------------------------------------------------
vPort state evbprof mode svid defvlan deftag VLANs
--------- ----- ------- ---- ---- ------- ------ ---------
INTA3.1 up dis trunk 4002 10 dis 10 20 30
The fields in Example 4-5 have the following meanings:
򐂰 vPort = the virtual port ID [port.vport]
򐂰 state = the state of the vPort (up, down, or disabled)
򐂰 evbprof = only used when an Edge Virtual Bridging profile is being utilized, for example with the 5000v
򐂰 mode = vPort mode type, for example access, trunk, tunnel, fcoe, or auto
򐂰 svid = reserved VLAN 4001-4004 for UFP vPort communication with the Emulex NIC
򐂰 defvlan = default VLAN, which is the PVID/native VLAN for that vPort (untagged)
򐂰 deftag = default tag, disabled by default; allows for the option to tag the defvlan
򐂰 VLANs = list of VLANs assigned to that vPort
Some other useful UFP vPort troubleshooting commands can be seen in Example 4-6.
Example 4-6 Displaying multiple UFP vPort configuration and status
PF_CN4093a(config)#show ufp information port
-----------------------------------------------------------------
Alias Port state vPorts chan 1 chan 2 chan 3 chan 4
------- ---- ----- ------ --------- --------- --------- ---------
INTA1 1 dis 0 disabled disabled disabled disabled
INTA2 2 dis 0 disabled disabled disabled disabled
INTA3 3 ena 1 up disabled disabled disabled
.
.
.
PF_CN4093a(config)#show ufp information vport
-------------------------------------------------------------------
vPort state evbprof mode svid defvlan deftag VLANs
--------- ----- ------- ---- ---- ------- ------ ---------
INTA1.1 dis dis tunnel 0 0 dis
INTA1.2 dis dis tunnel 0 0 dis
INTA1.3 dis dis tunnel 0 0 dis
INTA1.4 dis dis tunnel 0 0 dis
INTA2.1 dis dis tunnel 0 0 dis
INTA2.2 dis dis tunnel 0 0 dis
INTA2.3 dis dis tunnel 0 0 dis
INTA2.4 dis dis tunnel 0 0 dis
INTA3.1 up dis trunk 4002 10 dis 10 20 30
.
4.2.2 UFP Tunnel mode
Tunnel mode is a Q-in-Q mode in which the vPort is customer VLAN-independent (this is the closest to vNIC Virtual Fabric Dedicated Uplink mode). Tunnel mode is the default mode for a vPort.
Example Configuration
Example 4-7 shows port INTA1 vPort 3 configured in Tunnel mode (Q-in-Q), which can carry multiple VLANs through a single outer tagged VLAN ID. In this example, we are using VLAN 4091 as the tunnel VLAN. When configuring UFP Tunnel mode, at least one EXT port must be configured to support the outer VLAN ID, as seen in the example below.
Example 4-7 vPort Tunnel mode example configuration
ufp port INTA1 vport 3
network mode tunnel
network default-vlan 4091
enable
exit
!
interface port EXT1
tagpvid-ingress
pvid 4091
exit
Configuration validation and state of a UFP vPort - Tunnel mode
While it is easy enough to read and understand how to configure an I/O module for UFP, there are several troubleshooting commands that can be utilized to validate the configuration and the state of a vPort. (See Example 4-6 on page 65.)
Note: Before configuring vPort mode, UFP must be enabled globally (ufp enable command) and on the port (ufp port port identifier enable).
4.2.3 UFP FCoE mode
UFP FCoE mode dedicates a specific vPort (vPort 2 only) to FCoE traffic when enabled within the UEFI. See Chapter 5, “NIC virtualization considerations on the server side” on page 75 for how to enable FCoE within a compute node.
Note: This is only the vPort setting that is required to carry FCoE. CEE, FIP snooping, and other settings must also be enabled, as shown in Chapter 6, “Flex System NIC virtualization deployment scenarios” on page 133.
Example Configuration
Example 4-8 shows vPort 2 set in FCoE mode utilizing VLAN 1001. The QoS minimum bandwidth is set to 50% of a 10 GbE port, with the default maximum burst of 100%.
Example 4-8 vPort FCoE mode example configuration
ufp port INTA1 vport 2
network mode fcoe
network default-vlan 1001
qos bandwidth min 50
enable
exit
In Figure 4-4 on page 68, IBM Unified Fabric Port utilizes vPorts to create isolation between virtual NICs within the compute node and maintains that isolation within the I/O module. Virtual NICs are created within the compute node (up to four per 10 Gb NIC) that can be assigned to separate vSwitches or be seen as a virtual HBA within the hypervisor or bare-metal operating system. In this example, vPort (.1) is utilized for ESXi management connectivity to vCenter and vPort (.3) is utilized for vMotion, both of which are set to Access mode. vPort (.2) has been enabled for FCoE mode. vPort (.4), which is set to Tunnel mode, is utilized to tunnel VM data between the hypervisor and the upstream network.
Configuration validation and state of a UFP vPort
While it is easy enough to read and understand how to configure an I/O module for UFP, there are several troubleshooting commands that can be utilized to validate the configuration and the state of a vPort. (See Example 4-6 on page 65.)
Note: Before configuring vPort mode, UFP must be enabled globally (ufp enable command) and on the port (ufp port port identifier enable).
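The note above names the two prerequisite commands, but no listing in this section shows them together. As a minimal sketch (port INTA1 is simply the port used in the examples above), the global and per-port enablement would be applied before any of the vPort definitions in Examples 4-3 through 4-8:
ufp enable
ufp port INTA1 enable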
Figure 4-4 IBM Unified Fabric Port Mode
4.2.4 UFP Auto mode
The UFP vPort Auto mode feature is based on the IBM VMready and IEEE 802.1Qbg implementations. IBM VMready and IEEE 802.1Qbg Edge Virtual Bridging are software solutions that support open standards virtualization. They allow administrators to create groups of virtual machine port groups and to administer and migrate them from a central location. VMready works with all major hypervisor software, including VMware, Microsoft Hyper-V, Linux Kernel-based Virtual Machine (KVM), and Citrix XenServer. Although IBM PowerVM® is supported with VMready, UFP is specific to Intel-based compute nodes. It requires no proprietary tagging or changes to the hypervisor software.
UFP vPort Auto mode dynamically creates and removes VLANs that are learned from the vPort. When a VLAN is created and added to a vPort, that same VLAN ID is also added to the uplink associated with that vPort. This behavior, however, can be intrusive to a network if there is more than one uplink path out of a switch (that is not a port channel) to a single destination carrying the same VLAN. Caution should be taken when implementing VMready.
More information about implementing VMready can be found in the following Redbooks publication:
http://www.redbooks.ibm.com/abstracts/sg247985.html
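This section does not include a configuration listing for Auto mode. As a hedged sketch only, an Auto mode vPort would follow the same pattern as the other modes; the network mode auto keyword is inferred from the mode list in the show ufp information vport output shown earlier, and the port, vPort, and VLAN numbers are illustrative assumptions:
ufp port INTA1 vport 4
network mode auto
network default-vlan 50
enable
exit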
4.2.5 The following rules and attributes are associated with UFP vPorts
򐂰 They are supported only on 10 Gb internal interfaces.
򐂰 UFP allows a NIC to be divided into up to four virtual NICs called vPorts per physical NIC (can be less than four, but not more).
򐂰 Each vPort can be set for a different mode or the same mode (with the exception of the FCoE mode, which is limited to a single vPort on a UFP port, and specifically only vPort 2).
򐂰 UFP requires the proper support in the compute node for any port using UFP.
򐂰 By default, each vPort is ensured 2.5 Gb and can burst up to the full 10 Gb if other vPorts do not need the bandwidth. The ensured minimum bandwidth and maximum bandwidth for each vPort are configurable.
򐂰 The minimum bandwidth settings of all configured vPorts on a physical NIC cannot exceed 10 Gb.
򐂰 Each vPort must have a default VLAN assigned. This default VLAN is used for different purposes in different modes.
򐂰 This default VLAN must be unique across the other three vPorts for this physical port, which means that vPort 1.1 must have a different default VLAN assigned than vPort 1.2, 1.3, or 1.4.
򐂰 When in trunk or access mode, this default VLAN is untagged by default, but it can be configured for tagging if desired. This configuration is similar to tagging the native or PVID VLAN on a physical port. In tunnel mode, the default VLAN is the outer tag for the Q-in-Q tunnel through the switch and is not seen by the end hosts and upstream network.
򐂰 vPort 2 is the only vPort that supports the FCoE setting. vPort 2 can also be used for other modes (for example, access, trunk, or tunnel). However, if you want the physical port to support FCoE, this function can only be defined on vPort 2.
򐂰 The physical port must be set to VLAN 1 as the PVID with tagging enabled and no other VLANs defined for that port.
Table 4-2 offers some checkpoints to help in selecting a UFP mode.
Table 4-2 Attributes of UFP modes
Capability | IBM UFP vPort mode: Access | Trunk | Tunnel | FCoE
Support for a single untagged VLAN on the vPort (a) | Yes | Yes | Yes | No
Support for VLAN restrictions on vPort (b) | Yes | Yes | No | Yes
VLAN-independent pass-through for customer VLANs | No | No | Yes | No
Support for FCoE on vPort | No | No | No | Yes
Support to carry more than 256 VLANs on a vPort | No | No | Yes | No
a. Typically, a user sets the vPort for access mode if the OS uses this vPort as a simple untagged link. Both trunk and tunnel mode can also support this, but are not necessary to carry only a single untagged VLAN.
b. Access and FCoE mode restrict VLANs to only the default VLAN that is set on the vPort. Trunk mode restricts VLANs to ones that are specifically allowed per VLAN on the switch (up to 32).
Summary of whether or not Virtual Fabric or UFP should be considered
What are some of the criteria to decide if a UFP or vNIC solution should be implemented to provide the virtual NIC capability?
  • 84. NIC virtualization considerations - Switch side.fm Draft Document for Review May 1, 2014 2:10 pm 70 NIC Virtualization on IBM Flex System Summary of whether or not Virtual Fabric or UFP should be considered What are some of the criteria to decide whether a UFP or vNIC solution should be implemented to provide the virtual NIC capability? In an environment that has not standardized on any specific virtual NIC technology, UFP is the way to go. As noted, all future virtual NIC development will be on UFP. UFP has the advantage of being able to emulate the vNIC Virtual Fabric modes (via tunnel mode for dedicated uplink vNIC and access mode for shared uplink vNIC), but it can also offer virtual NIC support with customer VLAN awareness (trunk mode) and shared virtual group uplinks for access and trunk mode vPorts. If an environment has already standardized on Virtual Fabric mode vNIC and plans to stay with it, Virtual Fabric mode vNIC is recommended. Note that Switch Independent mode vNIC is actually exclusive of the above decision-making process. Switch Independent mode has its own unique attributes, one being that it is truly switch independent, which allows a user to configure the switch without restrictions related to the virtual NIC technology, other than allowing the proper VLANs. UFP and Virtual Fabric mode vNIC each have a number of unique switch-side requirements and configurations. The downside of Switch Independent mode vNIC is the inability to make changes to the vNIC settings without reloading the server, and the lack of support for bidirectional bandwidth allocation. 4.3 Compute node NIC to I/O module connectivity mapping Port mapping between CNA NICs and I/O module slots is often misunderstood and confusing to explain. Each type of mezzanine card option could have similar connectivity to each I/O module slot, or it might be completely different, depending on the number of ports and the ports per ASIC. One thing is always the same: each mezzanine slot consists of four lanes, and each lane can drive either 1 Gb or 10 Gb Ethernet speeds. In total, a single mezzanine slot is capable of driving up to 40 Gb of Ethernet bandwidth to the I/O modules.
  • 85. Chapter 4. NIC virtualization considerations on the switch side 71 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Switch side.fm 4.3.1 Embedded 10Gb VFA (LoM) - Mezzanine 1 Figure 4-5 shows an Embedded 10Gb Virtual Fabric Adapter (VFA, also known as LAN on Motherboard or LoM), specifically for the x86 compute nodes that can be replaced with another option card by removing the riser card from Mezzanine Slot 1. The 2-port LoM types are capable of pNIC, FCoE and iSCSI (license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode and Unified Fabric Protocol. The dual-port LoM consists of a single ASIC with two ports of 10 GbE that has physical direct wiring through the midplane to the I/O Module Slot 1 and 2 for port redundancy. Figure 4-5 2 port LoM 10G VFA Mezz 1 connectivity to I/O Modules 1 and 2
  • 86. NIC virtualization considerations - Switch side.fm Draft Document for Review May 1, 2014 2:10 pm 72 NIC Virtualization on IBM Flex System 4.3.2 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1 Figure 4-6 shows a CN4054 4-port 10Gb Virtual Fabric Adapter, specifically for the x86 compute nodes, that can be placed into either Mezzanine Slot 1 or 2. The 4-port CNA type is capable of pNIC, FCoE and iSCSI (license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode and Unified Fabric Protocol. The four-port CNA card consists of dual ASICs with two ports of 10 GbE each, which have physical direct wiring through the midplane to I/O Module Slots 1 and 2 for port redundancy when placed into Mezzanine Slot 1. Figure 4-6 4 port CN4054/R 10G VFA Mezz 1 connectivity to I/O Modules 1 and 2 4.3.3 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1 and 2 Figure 4-7 on page 73 shows two 4-port CN4054 10Gb Virtual Fabric Adapters, specifically for the x86 compute nodes, that are placed into both Mezzanine Slots 1 and 2. The 4-port CNA type is capable of pNIC, FCoE and iSCSI (license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode and Unified Fabric Protocol. The four-port CNA card consists of dual ASICs with two ports of 10 GbE each, which have physical direct wiring through the midplane to I/O Module Slots 1 and 2 for Mezzanine 1 and I/O Module Slots 3 and 4 for Mezzanine 2. This provides a highly redundant environment, and bandwidth of up to 80 Gb to each half-width compute node can be achieved with this option.
  • 87. Chapter 4. NIC virtualization considerations on the switch side 73 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Switch side.fm Figure 4-7 Two 4-port CN4054/CN4054R 10Gb VFA Mezz 1 and 2 connectivity to I/O Modules 4.3.4 IBM Flex System x222 Compute Node Each server in the x222 includes an Embedded 10Gb Virtual Fabric Adapter (VFA, also known as LAN on Motherboard or LOM) built in to the system board. The x222 has one Fabric Connector (which is physically on the lower server) and the Ethernet connections from both Embedded 10 Gb VFAs are routed through it. Figure 4-8 shows how each server connects to the I/O module. Each 2-port CNA type is capable of pNIC, FCoE and iSCSI (license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode and Unified Fabric Protocol. Figure 4-8 x222 Node Server connectivity to I/O Module
  • 88. NIC virtualization considerations - Switch side.fm Draft Document for Review May 1, 2014 2:10 pm 74 NIC Virtualization on IBM Flex System Switch upgrade 1 required: For EN4093, EN4093R, CN4093 and SI4093 switches, you must have Upgrade 1 enabled in the two switches. Without this feature upgrade, the upper server will not have any Ethernet connectivity.
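One quick way to confirm that the upgrade is in place is to check the installed Feature on Demand keys and the resulting internal ports from the I/O module CLI. The commands below are an assumed sketch only; the exact command names and output vary by platform and firmware level:

show software-key
show interface status

If Upgrade 1 is active, the additional internal ports that it enables (for example, the INTB bank on an EN4093R) appear in the interface listing, and the upper server of each x222 gains its Ethernet connectivity through them.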
  • 89. © Copyright IBM Corp. 2014. All rights reserved. 75 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Chapter 5. NIC virtualization considerations on the server side In 3.3, “IBM Flex System Ethernet adapters” on page 47 we introduced the physical Emulex NICs that support virtual NIC functionality in the PureFlex System environment and in Chapter 4, “NIC virtualization considerations on the switch side” on page 55 we discussed the I/O Module virtualization features. In this chapter we go into detail on how to enable the NIC virtualization from the server side, as well as some design considerations for utilizing these NICs within various operating systems. The following topics are covered: 򐂰 5.1, “Introduction to enabling Virtual NICs on the server” on page 76 򐂰 5.2, “Other methods for configuring virtual NICs on the server” on page 92 򐂰 5.3, “Utilizing physical and virtual NICs in the OS” on page 115 5
  • 90. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 76 NIC Virtualization on IBM Flex System 5.1 Introduction to enabling Virtual NICs on the server Regardless of what mode of virtual NIC is desired, all modes have at least some small element of low level configuration that must be performed on the server side. Some Emulex NICs may ship pre-configured for vNIC Virtual Fabric mode already enabled, but even those can be changed to a different mode, or have vNIC disabled all together if desired. Exactly how to enable and/or change the virtual NIC function on the Emulex NICs has varied over the years, but for the most part it can always be done via the UEFI configuration from the F1 setup on the server. It is also possible to control and automate setting virtual NIC options via certain tools, such as using Configuration Patterns in the FSM, and this will also be introduced in this section as well, but we will primarily focus on using the F1 setup method for configuring the virtual NIC on the server side. 5.1.1 Getting in to the virtual NIC configuration section of UEFI When manually performing the virtual NIC configuration on the server, it is necessary to enter UEFI via the F1 setup option during server boot. Once you are into F1 setup you need to drill into the section that permits enabling and changing the desired virtual NIC mode and perform any changes and then save those changes. Important: The steps to get into UEFI in this section assume the reader knows how to get to the console of a Compute Node. For reference, this is commonly done by connecting via browser to the IMM IP address of that host, and clicking on the Remote Control button, and the clicking on the option to start remote control in either single-user or multi-user mode.
  • 91. Chapter 5. NIC virtualization considerations on the server side 77 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm The following are the exact steps on how to get to the virtual NIC configuration screens when utilizing version 4.6.281.26 of the Emulex firmware 1. Power on the server, and when the screen shown in Figure 5-1 is present, press the F1 key to enter in to UEFI setup. Figure 5-1 Example of screen to press the F1 key to enter UEFI setup
  • 92. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 78 NIC Virtualization on IBM Flex System 2. On the main System Configuration and Boot Management screen as seen in Figure 5-2 on page 78 use the arrow keys to scroll down to System Settings option and press Enter. Figure 5-2 Example of first screen viewed after pressing the F1 key to enter UEFI setup
  • 93. Chapter 5. NIC virtualization considerations on the server side 79 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 3. On the System Settings screen as seen in Figure 5-3, scroll down to the Network option and press Enter. Figure 5-3 Example of screen to enter network set up
  • 94. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 80 NIC Virtualization on IBM Flex System 4. On the Network screen, scroll down to the desired NIC and press Enter – Exactly how many NICs you see on the Network screen will vary, depending on what model NIC is installed (dual port, quad port and so on), how many of these NICs are installed (LoM only, MEZZ1 and/or MEZZ2 slots used), and if a virtual NIC mode is already enabled or not. For example, if this were a Compute Node with only the LoM dual port NIC, and no virtual NIC had previously been enabled, you would only see the two physical NICs on this screen, as seen in Figure 5-4. – If this were the same dual port NIC and virtual NIC had already been enabled, you would see between six and eight NICs on this screen (depending on if FCoE/iSCSI had also been previously enabled or not). Figure 5-4 Example of Network screen with dual port LoM, before any virtual NIC has been enabled
  • 95. Chapter 5. NIC virtualization considerations on the server side 81 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm – Figure 5-5 shows how the Network screen might look on a dual port NIC after some form of virtual NIC has been enabled and the system restarted. Figure 5-5 Example of Network screen after vNIC has been enabled and the system restarted – The images in Figure 5-4 on page 80 and Figure 5-5 on page 81 also illustrate an important concept: once a NIC has been placed into a virtual NIC mode and the system reloaded, if a user comes back into this Network screen and wants to drill back into the NICs to review or change the virtual NIC settings, the top two NICs (in this example of a dual NIC solution) are the only ones that will let you make those changes. If you drill into the third through eighth NICs in this list, you will not be presented with an option to drill in to make changes to the virtual NIC settings. Only the first two NICs in the list of eight NICs in this example will let you make those changes.
  • 96. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 82 NIC Virtualization on IBM Flex System 5. Once a user highlights the desired NIC in the Network screen and presses the Enter key, a screen for just that one NIC will be shown, something like what is shown in Figure 5-6. Figure 5-6 Example of the individual NIC screen
  • 97. Chapter 5. NIC virtualization considerations on the server side 83 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 6. On the screen shown in Figure 5-6 on page 82, highlight the NIC itself and press Enter to drill one step deeper into that NICs configuration, which will bring up a screen called Emulex NIC selection, that will look something like Figure 5-7 (may vary depending on firmware version of the NIC). Figure 5-7 Example of the Emulex NIC Selection screen (virtual NIC disabled) Some important items with regard to Figure 5-7: – If Multichannel mode is disabled, then regardless of the Personality setting (NIC, FCoE or iSCSI), the OS will be presented with just the physical NICs – If Multichannel mode is set to any form of virtual NIC mode, then the Personality setting impacts how many virtual NICs are presented to the OS. • If NIC is selected in Personality, 4 NICs will be presented to the OS for each 10G NIC set to a form of virtual NIC • If FCoE or iSCSI is selected in Personality, 3 NICs will be presented to the OS for each 10G NIC set to a form of virtual NIC. An example of 3 ports on each NIC on a dual port NIC (6 ports total) can be seen in Figure 5-8 on page 84
  • 98. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 84 NIC Virtualization on IBM Flex System Figure 5-8 NICs available on dual port NIC with virtual NIC enabled, and iSCSI or FCoE Personality enabled – The Multichannel mode setting is how the virtual NIC feature is enabled; selecting Multichannel and pressing the Enter key should bring up a window as shown in Figure 5-9. Figure 5-9 Emulex Multichannel (virtual NIC) mode options The window should list these four options: • Switch Independent Mode (this is Switch Independent Mode vNIC) • IBM Virtual Fabric Mode (this is vNIC Virtual Fabric mode) • IBM Unified Fabric Protocol Mode (this is UFP) • Disable (when selected, turns off all NIC virtualization on this ASIC) – Controller configuration is where you can make some changes to the vNIC modes of virtual NIC (once UFP is enabled and saved in UEFI, all remaining configuration for the UFP mode of virtual NIC is done via the I/O Module). Important: If you do not see all three virtual NIC options (Switch Independent Mode, IBM Virtual Fabric Mode, and IBM Unified Fabric Protocol Mode), more than likely the NIC is on down-level firmware and should be upgraded before going any further.
  • 99. Chapter 5. NIC virtualization considerations on the server side 85 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 5.1.2 Initially enabling virtual NIC functionality via UEFI Starting from the Emulex NIC selection screen, perform the following steps to select a virtual NIC mode: 1. Scroll down to the Multichannel Mode option and press Enter to see the selections as shown in Figure 5-10. Figure 5-10 Selecting a multichannel mode 2. In the screen shown in Figure 5-10 scroll to the desired virtual NIC mode and press the Enter key to enable the version of virtual NIC to be used (or disable it if the Disable option is selected) 3. What needs to happen next depends on what mode is selected: – If Switch Independent Mode is selected, you must now go into the Controller Configuration portion of the Emulex NIC Selection screen, and set the LPVID (Logical Port VLAN Identifier), and the Bandwidth (in older firmware you also had to enable or disable each virtual NIC individually, but that is not necessary in newer firmware). See Special settings for vNIC Switch Independent Mode section for details. With this mode of Virtual NIC mode, there are no special settings that need to be performed on the I/O Modules. – If IBM Virtual Fabric Mode is selected, you can optionally go into the Controller Configuration section and set LPVID (as seen in Special settings for vNIC Virtual Fabric mode section), but you must perform specific configuration steps on the I/O Modules to complete this mode of virtual NIC. See chapter 4 for details on necessary settings on the I/O Modules to complete this configuration. – If IBM Unified Fabric Protocol Mode is selected, no other configuration in the UEFI is permitted, but you must perform specific configuration on the I/O Modules themselves
  • 100. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 86 NIC Virtualization on IBM Flex System to complete this mode of virtual NIC. See Chapter 4 for details on necessary settings on the I/O Modules to complete this configuration. Regardless of the mode selected, it is necessary to eventually exit out of UEFI and save the changes before any of these options take effect. It is important to note that enabling a type of virtual NIC in the Multichannel mode section of the Emulex NIC Selection screen impacts all NICs on an ASIC, not just that single NIC. If working with the dual port NIC (single ASIC solution), enabling a virtual NIC mode on one NIC enables the feature on both NICs. If working with the 4 or 8 port Emulex NIC (both dual ASIC solutions) and you want virtual NICs on all NICs, you must enable it twice, once for each ASIC (in the case of the 8 port NIC, when you enable it on a single port on an ASIC, the other 3 ports on that same ASIC are also enabled for this function). See Chapter 4 for details on ASIC NIC mapping in relationship to I/O Module connectivity. 5.1.3 Special settings for the different modes of virtual NIC via UEFI As noted, when UFP is enabled there are no other settings necessary in UEFI, but both modes of vNIC virtual NIC have more settings that can be performed within UEFI. These extra settings are mandatory with Switch Independent Mode vNIC, and optional for Virtual Fabric Mode vNIC. The following are the extra settings for these modes. Important: Unlike when enabling the virtual NIC feature itself, where it affects all ports on the same ASIC, you must complete these extra settings on a per-physical-port basis. So if this is a dual port NIC, once you have set and saved the first NIC, you must exit back to the Network screen, select the second physical NIC, and repeat the process.
  • 101. Chapter 5. NIC virtualization considerations on the server side 87 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Special settings for vNIC Switch Independent Mode After the Multichannel Mode has been set to Switch Independent Mode, it is now mandatory to scroll down to the Controller Configuration option and complete other steps to bring these virtual NICs fully operational. After selecting the Controller Configuration option and pressing Enter you will be taken to a screen similar to that seen in Figure 5-11. Figure 5-11 Example options available in Switch Independent Mode As can be seen, the Controller Configuration screen for Switch Independent Mode offers 4 options: 1. View configuration- Views the most recently saved configuration (changes that have been made but have not yet been saved via the Save Current Configurations option on this screen, will not be seen in here) 2. Configure Bandwidth - Defaults to 0G per vNIC, and must be set and saved before they become operational in the OS 3. Configure LPVID - Must be set and saved before these vNICs will become operational in the OS 4. Save Current Configuration - Must save config changes before leaving this screen or changes will be lost Important: One of the most common issues noted in the field is the changes not being saved in this screen before exiting. Remember to always save here if any changes are made in this area. It may be a good idea after saving changes and exiting this screen, to go back into this screen and reconfirm the configurations for LPVID and Bandwidth were truly saved.
  • 102. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 88 NIC Virtualization on IBM Flex System The following provides more details on these specific options. Configure Bandwidth: After scrolling to the Configure Bandwidth option and pressing the Enter key, a screen similar to Figure 5-12 will be shown: Figure 5-12 Example of Bandwidth settings in Switch Independent Mode showing default settings Users must properly set the desired minimum and maximum bandwidths before this configuration can be saved. The following are some guidelines with regard to these Bandwidth settings: 򐂰 All values are in percentages of 10G (for example, setting a 10 in here represents 10% of 10G, meaning it is set for 1G) 򐂰 All values are between 0 and 100 in increments of 1 (1% of 10G = 100M) 򐂰 The total value of all the minimums must equal 100%, or the save will not be allowed 򐂰 The value of any given vNIC maximum must be equal to or greater than the minimum for that vNIC 򐂰 If hard enforcement of bandwidth is desired, set the minimum and maximum values the same for each vNIC. An example of this would be setting both the minimum and maximum values all to 25, which would hard lock the values to 2.5G for each vNIC. 򐂰 If it is desired to allow vNICs to use excess bandwidth not in use by other vNICs, set the maximum to a higher value than the minimum. An example of this would be setting all of the minimums to 25, and all of the maximums to some higher value, in which case each vNIC is guaranteed 25%, but can use up to its maximum percentage if other vNICs are not using their full minimum allotment.
  • 103. Chapter 5. NIC virtualization considerations on the server side 89 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 򐂰 It is possible to set the maximum for all vNICs to 100%, meaning each vNIC is guaranteed the minimum set, but can use up to 100% of the remaining bandwidth if it is not in use by other vNICs Configure LPVID After scrolling to the Configure LPVID option and pressing Enter, a screen similar to Figure 5-13 will be shown. Figure 5-13 Example of default LPVID settings in Switch Independent Mode The LPVID is a concept unique to the vNIC based options (both Virtual Fabric mode and Switch Independent Mode). From an end user perspective, the LPVID value could be considered the default VLAN for that vNIC. This LPVID value is only used by the OS if the OS is sending untagged packets. If the OS is sending untagged packets toward the I/O Module, that packet will get a tag equal to the LPVID for that vNIC before being sent on its way to the I/O Module (return packets would have the LPVID VLAN stripped off before being sent back to the OS). If the OS is sending tagged packets, the LPVID is ignored and the OS VLAN tag is passed to the upstream I/O Module unchanged. One side effect of this LPVID usage is that all packets coming from a host running Switch Independent Mode will be delivered to the upstream I/O Module tagged (if the OS sends an untagged packet, it will be sent to the I/O Module tagged with the value of the LPVID setting for that vNIC, and if the OS sends the packet tagged, it will be sent to the I/O module with whatever tag the OS had put on the packet). The following are some guidelines with regard to the LPVID settings: 򐂰 Valid LPVID values are 2-4094 򐂰 For Switch Independent mode, you must set the LPVID on all vNICs before a save will be allowed (this is an optional setting on Virtual Fabric vNIC mode)
  • 104. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 90 NIC Virtualization on IBM Flex System 򐂰 Each vNIC on a given physical port must use a unique LPVID; in most cases, the partner NIC LPVIDs are set to the same value, but they can be set differently 򐂰 Owing to how all packets will arrive tagged at the I/O Module, on the I/O Module side the interface must be tagged, and if the host needs to use the currently assigned PVID/Native VLAN on the I/O Module side, then the tag-pvid option must be configured on this interface on the I/O Module (a configuration sketch follows at the end of this section). Another solution is to set the PVID/Native VLAN on the I/O Module for this port to some unused value and not use the PVID/Native VLAN 򐂰 If bare metal PXE boot is not required on the host, one option is to set the LPVID values to some unused VLANs, and then only send tagged packets from the OS. The same restriction from the previous bullet (all packets tagged) still applies, but the end user no longer needs to keep track of which VLANs need to be tagged in the OS and which do not (just tag them all at all times). 򐂰 If bare metal PXE boot is required, then the LPVID for the vNIC that needs to PXE boot must be set to the VLAN that the PXE packet is expected to arrive on Once the LPVID and bandwidth settings are properly set, before exiting the Controller configuration screen, the user must perform a save. Older versions of firmware would allow a user to escape out of this screen without saving and not provide any warning. The version of firmware used during the writing of this paper (and hopefully all newer versions) puts up a warning, as seen in Figure 5-14, if the changes have not been saved. Figure 5-14 Example of attempting to exit Switch Independent Mode vNIC Controller Configuration screen without saving
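Because every frame from a Switch Independent Mode host arrives at the I/O Module tagged, the switch-side interface must have tagging enabled, and the tag-pvid option is needed if the host must use the port PVID/Native VLAN. The following is a minimal ISCLI sketch of that bullet, assuming internal port INTA1 and an LPVID/PVID value of 10 (both values are assumptions for illustration only):

interface port INTA1
  tagging
  pvid 10
  tag-pvid
  exit

With this configuration, frames that the adapter tags with the LPVID of 10 are accepted on the PVID VLAN and are returned to the host tagged, matching the behavior described above.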
  • 105. Chapter 5. NIC virtualization considerations on the server side 91 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Special settings for vNIC Virtual Fabric mode When enabled for the Multichannel mode of IBM Virtual Fabric mode vNIC, the only UEFI option is to configure the LPVID value (Bandwidth is controlled from the I/O Module). Unlike Switch Independent mode, this setting is strictly optional. Also unlike Switch Independent Mode, it is not necessary to set the LPVID values on all vNICs in order to save the configuration. If desired, a single vNIC, several vNICs, or all vNICs can have an LPVID assigned or remain at 0 (0 meaning the vNIC passes untagged traffic untagged to the upstream I/O module), and the configuration will still be allowed. For any vNICs that do have an LPVID assigned, the operation is the same as for Switch Independent Mode (if the host sends an untagged packet, that packet will be sent to the I/O Module tagged with the value of the LPVID; if a host sends a tagged packet, the LPVID is ignored and the tag the host set gets sent to the I/O Module). As noted, if no LPVID value is assigned (the default for Virtual Fabric vNIC mode), any untagged packet sent from the OS will be sent to the I/O module untagged, and arrive on the Native/PVID VLAN assigned to the I/O Module port connecting to this host. 5.1.4 Setting the Emulex virtual NIC settings back to factory default If necessary, it is possible to reset the Emulex NICs back to factory default. This not only resets all of the Bandwidth and LPVID settings, but also disables Multichannel for this ASIC, returning it to factory default. The option to perform this factory default reset can be found by scrolling to the bottom of the Emulex NIC Selection screen, selecting Erase Configuration, and pressing the Enter key. An example of the results of pressing Enter on this selection is shown in Figure 5-15 on page 92. Important: As noted previously, after the Bandwidth and LPVID values are configured and saved on one NIC, this process must be completed for the other physical NIC of this pair (you must exit back to the Network screen, select the other NIC, and drill back in to the LPVID and Bandwidth settings and make and save the changes). This is different from the settings in the Emulex NIC selection screen, where changes to things like Multichannel mode and Personality are carried to all NICs on the common ASIC. Important: Until both the LPVID and Bandwidth values are properly set and saved, the vNICs will show as disconnected in the OS. Be sure to complete these operations on all Switch Independent Mode configured NICs before attempting to utilize these NICs in the OS. Important: Regardless of whether you set any LPVID values, the Virtual Fabric mode of vNIC requires you to go into the I/O module to complete the configuration process (enable vNIC, create vNIC groups, and assign other variables). Until the I/O module step is done, the OS will report the vNIC as not connected. See Chapter 6, “Flex System NIC virtualization deployment scenarios” on page 133 for examples of configuring the I/O module side for Virtual Fabric vNIC.
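As a preview of the switch-side work referenced in the Important note above, a Virtual Fabric vNIC configuration on the I/O module generally involves enabling the feature, setting the per-vNIC bandwidth, and placing the vNIC into a vNIC group with its uplink. The following is a minimal ISCLI sketch only; the port names, vNIC index, bandwidth, and VLAN are assumptions, and Chapter 6 contains the complete, validated examples:

! Enable the vNIC feature and define vNIC 1 on internal port INTA1
vnic enable
vnic port INTA1 index 1
  bandwidth 25
  enable
  exit
! Group the vNIC with an uplink and an outer VLAN
vnic vnicgroup 1
  vlan 100
  member INTA1.1
  port EXT1
  enable
  exit

Until a configuration along these lines exists on the I/O module, the OS reports the Virtual Fabric mode vNICs as not connected, as noted above.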
  • 106. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 92 NIC Virtualization on IBM Flex System Figure 5-15 Example of setting Emulex NIC back to factory default 5.2 Other methods for configuring virtual NICs on the server Although the primary method used in this document for enabling virtual NICs on the server is via the UEFI F1 setup path, there are other tools available to help automate this process. This section introduces one such tool - FSM configuration patterns. 5.2.1 FSM Configuration Patterns With certain Emulex NICs it is possible to automate the deployment of the NIC settings via the FSM. Some examples of items that can be automated via the FSM: 򐂰 Change the personality between NIC, FCoE, or iSCSI (assuming FoDs installed) 򐂰 Enable a desired mode of Virtual NIC, or disable it 򐂰 For the vNIC modes of virtual NICs that offer other configuration options, we can change those options, such as LPVID or Bandwidth Currently the Embedded 10Gb Virtual Fabric Ethernet Controller (LOM) and IBM Flex System CN4054 10Gb Virtual Fabric Adapter are supported with FSM configuration patterns. The most important aspect of utilizing configuration patterns, is the ability to push out changes to many servers, without having to perform the tedious process of manually going into F1 setup on every server that virtual NICs need to be changed on. After making any such changes with FSM Configuration Patterns the server must be reloaded for those changes to take effect.
  • 107. Chapter 5. NIC virtualization considerations on the server side 93 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm The process of configuring NIC settings via configuration patterns includes the following steps: 1. Creating port patterns that describe desired vNIC mode, protocols, and port settings 2. Creating adapter patterns that describe adapter types and desired protocols 3. Creating server patterns that describe node configuration including I/O adapter settings 4. Deploying server patterns on x86 compute node targets Consider the following hypothetical example. You need to configure vNIC Switch Independent mode with Ethernet only vNICs on the integrated LOM and vNIC UFP mode on the CN4054 adapter installed in slot 2 of the x240 compute node. The first ASIC of the CN4054 adapter needs to be configured with Ethernet only vNICs, and the second ASIC requires both Ethernet and FCoE vNICs. By default, both LOM and CN4054 adapters are not configured with any vNICs, as shown in Figure 5-16. Figure 5-16 Initial NIC configuration PFA 12:0:0 and PFA 12:0:1 represent two physical LOM ports, PFA 22:0:0 and PFA 22:0:1 represent two physical ports on the first ASIC of the CN4054, and PFA 27:0:0 and PFA 27:0:1 represent two physical ports on the second ASIC of the CN4054, for a total of six network ports.
  • 108. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 94 NIC Virtualization on IBM Flex System Opening server configuration patterns Perform the following steps to open server configuration patterns: 1. Launch FSM Explorer from the Home tab of the FSM interface, as shown in Figure 5-17. Figure 5-17 Launch FSM Explorer
  • 109. Chapter 5. NIC virtualization considerations on the server side 95 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 2. Open Configuration Patterns in the FSM Explorer interface by selecting Systems  Configuration Patterns, as shown in Figure 5-18. Figure 5-18 Open Configuration Patterns 3. Select Server Patterns to manage server configuration patterns, as shown in Figure 5-19. Figure 5-19 Server Patterns Creating port patterns In our example, we are creating three port patterns: 򐂰 vNIC switch independent mode with Ethernet only ports 򐂰 Universal fabric port (UFP) mode with Ethernet only ports 򐂰 Universal fabric port (UFP) mode with Ethernet and FCoE ports
  • 110. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 96 NIC Virtualization on IBM Flex System Perform the following steps to create desired port patterns: 1. Click New icon and select New Port Pattern, as shown in Figure 5-20. Figure 5-20 New Port Pattern
  • 111. Chapter 5. NIC virtualization considerations on the server side 97 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 2. In the New Port Pattern window shown in Figure 5-21, specify the port pattern name and select desired parameters and click Create. In our example, we are creating switch independent vNIC mode with Ethernet only network ports. For switch independent vNIC, we should also assign bandwidth parameters and VLAN tags (VLAN tags represent the LPVID setting as seen in F1 setup for the NICs, as shown in Figure 5-13 on page 89). Figure 5-21 Port pattern: Configuring vNIC switch independent mode 3. Repeat steps 1 and 2 for the remaining port configurations. In our example, we are creating two more port patterns: UFP mode with Ethernet only ports and UFP mode with
  • 112. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 98 NIC Virtualization on IBM Flex System Ethernet and FCoE ports, as shown in Figure 5-22 on page 98 and Figure 5-23 on page 99. Figure 5-22 Port Pattern: Configuring vNIC UFP mode
  • 113. Chapter 5. NIC virtualization considerations on the server side 99 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Figure 5-23 Port Pattern: Configuring UFP mode with FCoE
  • 114. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 100 NIC Virtualization on IBM Flex System 4. Configured patterns are displayed in the Server Patterns window, as shown in Figure 5-24. Figure 5-24 List of configured port patterns Creating adapter patterns We are creating two adapter patterns: 򐂰 vNIC switch independent mode with Ethernet only ports for the integrated LOM 򐂰 vNIC UFP mode with Ethernet only ports for the first ASIC of the CN4054 and Ethernet and FCoE ports for the second ASIC of the CN4054
  • 115. Chapter 5. NIC virtualization considerations on the server side 101 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Perform the following steps to create adapter patterns: 1. Select New Adapter Pattern from the New Patterns drop-down menu, as shown in Figure 5-25. Figure 5-25 New Adapter Pattern
  • 116. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 102 NIC Virtualization on IBM Flex System 2. In the New Adapter Pattern window, specify the adapter pattern name, adapter type, operational mode and protocols, as shown in Figure 5-26. We are creating the pattern for the integrated LOM in vNIC switch independent mode with Ethernet only ports. Click Create. Figure 5-26 LOM adapter pattern settings
  • 117. Chapter 5. NIC virtualization considerations on the server side 103 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 3. Repeat steps 1 and 2 for the remaining patterns. In our example, we are configuring the pattern for the CN4054 in UFP mode with Ethernet only ports on the first ASIC (Configuration port group 1) and Ethernet and FCoE ports on the second ASIC (Configuration port group 2), as shown in Figure 5-27. Click Create. Figure 5-27 CN4054 adapter pattern settings
  • 118. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 104 NIC Virtualization on IBM Flex System Creating new server pattern We are creating the new server pattern that configures x240 compute node networking components as follows: 򐂰 Integrated LOM is set to vNIC switch independent mode with Ethernet only ports. 򐂰 The first ASIC of the CN4054 expansion card installed in slot 2 is set to UFP mode with Ethernet only ports. 򐂰 The second ASIC of the CN4054 expansion card installed in slot 2 is set to UFP mode with Ethernet and FCoE ports. Perform the following steps to create server patterns: 1. Select New Server Pattern from the drop-down menu as shown in Figure 5-28. Figure 5-28 Creating a new server pattern 2. Select Create a new pattern from scratch as shown in Figure 5-29 and click Next. Figure 5-29 Creating a new pattern from scratch
  • 119. Chapter 5. NIC virtualization considerations on the server side 105 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 3. Specify the pattern name and form factor as shown in Figure 5-30 and click Next. Figure 5-30 New Server Pattern Wizard: General 4. Leave Keep existing storage configuration selected as shown in Figure 5-31 and click Next. Figure 5-31 New Server Pattern Wizard: Local Storage
  • 120. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 106 NIC Virtualization on IBM Flex System 5. Expand Compute Node twistie, then click Add I/O Adapter 1 or LOM, as shown in Figure 5-32. Figure 5-32 Adding I/O adapter 1 or LOM 6. In the Add I/O Adapter window, select the adapter type (LOM) from the adapter list as shown in Figure 5-33, then click Add. Figure 5-33 Selecting the adapter type: LOM
  • 121. Chapter 5. NIC virtualization considerations on the server side 107 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 7. On the next screen, select previously configured adapter and port patterns, as shown in Figure 5-34. Click Add. In our example, we choose vNIC Switch Independent LOM adapter pattern and vNIC switch independent port pattern that we created earlier. Figure 5-34 Selecting adapter and port patterns 8. From the I/O Adapters screen (see Figure 5-32 on page 106) click Add I/O Adapter 2. 9. In the Add I/O Adapter window, select the adapter type (CN4054) from the adapter list as shown in Figure 5-35, then click Add. Figure 5-35 Selecting the adapter type: CN4054
  • 122. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 108 NIC Virtualization on IBM Flex System 10.On the next screen, select previously configured adapter and port patterns, as shown in Figure 5-36. In our example, we select previously configured vNIC UFP FCoE CN4054 adapter pattern and vNIC UFP and vNIC UFP FCoE port patterns. Click Add. Figure 5-36 Selecting adapter and port patterns: CN4054 11.The configured adapter settings are summarized in Figure 5-37. Click Next. Figure 5-37 New Server Pattern Wizard: I/O Adapters summary
  • 123. Chapter 5. NIC virtualization considerations on the server side 109 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 12.Leave Keep existing boot mode selected as shown in Figure 5-38 and click Save. Figure 5-38 New Server Pattern Wizard: Save 13.You can see the created server pattern in the list of patterns, as shown in Figure 5-39. Figure 5-39 Newly created server pattern
  • 124. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 110 NIC Virtualization on IBM Flex System Deploying server pattern Perform the following steps to deploy a server pattern: 1. Right click a server pattern that you are going to deploy and select Deploy from the context menu, as shown in Figure 5-40. Figure 5-40 Deploying server pattern 2. Select target nodes (we selected x240_03) as shown in Figure 5-41, then click Deploy. Figure 5-41 Selecting target compute nodes
  • 125. Chapter 5. NIC virtualization considerations on the server side 111 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm 3. Click Deploy in the confirmation window that appears. A new job is started and the confirmation is displayed as shown in Figure 5-42. Click Close. Figure 5-42 Deployment job start confirmation 4. You can check the job status in the Jobs pod by clicking Jobs  Active and moving the mouse pointer over the job name, as shown in Figure 5-43. Figure 5-43 Server Profile activation job status
  • 126. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 112 NIC Virtualization on IBM Flex System 5. Click Server Profiles on the left side of the Configuration Patterns window (see Figure 5-43 on page 111). You see the profile deployment status in the Profile Column, as shown in Figure 5-44. Figure 5-44 Profile activation status 6. When profile activation completes successfully, the profile status changes to Profile assigned, as shown in Figure 5-45. Figure 5-45 Profile assigned Server NICs are now configured. Now, let’s have a look at what changed in the UEFI for the network setup.
  • 127. Chapter 5. NIC virtualization considerations on the server side 113 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Go to UEFI by pressing F1 during the compute node boot phase, then select System Settings  Network. Figure 5-46 and Figure 5-47 on page 113 show vNICs configured on the LOM and the CN4054 adapter using configuration patterns. Figure 5-46 Network Device List (Part 1) Figure 5-47 Network Device List (Part 2)
  • 128. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 114 NIC Virtualization on IBM Flex System Select Onboard PFA 12:0:0 (Integrated LOM) from the device list, press Enter two times, and verify vNIC parameters, as shown in Figure 5-48. LOM is configured with vNIC Switch Independent mode and NIC personality (Ethernet only ports). Figure 5-48 LOM vNIC configuration Go back to the network device list by pressing Esc two times and select Slot PFA 22:0:0 (the first ASIC of the CN4054) from the device list, press Enter two times, and verify vNIC parameters, as shown in Figure 5-49. The first ASIC is configured with vNIC UFP mode and NIC personality (Ethernet only ports). Figure 5-49 CN4054 vNIC configuration: First ASIC
  • 129. Chapter 5. NIC virtualization considerations on the server side 115 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Go back to the network device list by pressing Esc two times and select Slot PFA 27:0:0 (the second ASIC of the CN4054) from the device list, press Enter two times, and verify vNIC parameters, as shown in Figure 5-50. The second ASIC is configured with vNIC UFP mode and FCoE personality (Ethernet and FCoE ports). Figure 5-50 CN4054 vNIC configuration: Second ASIC See the following link for more details on utilizing FSM configuration patterns: http://www.redbooks.ibm.com/abstracts/sg248060.html 5.3 Utilizing physical and virtual NICs in the OS Regardless of whether the user is using virtual NICs or physical NICs, most Operating Systems have various ways to utilize those NICs, either as individual links or in teamed/bonded modes for better performance or high availability (or both). This section provides guidance on various aspects of NIC teaming/bonding usage by the OS. 5.3.1 Introduction to teaming/bonding on the server The terms bonding and teaming are different words for the same thing. In general, in Linux it is referred to as bonding; in Windows and VMware it is referred to as teaming. Regardless of the term, these technologies provide a way to allow two or more NICs to appear and operate as a single logical interface, for the purpose of either high availability or increased performance (all modes of teaming/bonding provide high availability, and some modes also provide increased performance via load balancing). Each OS has its own way of providing these services, with most having native built-in support, but some older Operating Systems still require a third party application to provide this functionality.
  • 130. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 116 NIC Virtualization on IBM Flex System All teaming/bonding modes come in two primary types, Switch Dependent mode, and Switch Independent mode, discussed here in more detail. Switch Dependent modes of teaming/bonding These are any teaming/bonding modes in the OS that also require a specific architecture in the connecting switches, and special configurations in these upstream switches (in other words, they are dependent on the upstream switch design and configuration to operate correctly). Some comments on these modes: 򐂰 All of these modes are some form of link aggregation, either static aggregation or dynamic aggregation (Link Aggregation Control Protocol - LACP). 򐂰 Most OS’s support both an LACP and a static form of teaming/bonding, and these are all forms of active/active teaming/bonding, usually load balancing traffic on a per-session basis (what constitutes a session is usually controlled by settings on each side of the device supporting this mode of teaming/bonding and is beyond the scope of this document) 򐂰 Any teaming/bonding mode that utilizes either static or LACP aggregation, requires that all ports in that team/bond, go to a single upstream switch, or a group of switches that can appear as a single switch to the NIC teaming (for example, switches running Cisco vPC or IBM vLAG, or stacked switches). 򐂰 Any of these modes also must have a corresponding mode of aggregation configured on the upstream I/O Modules to work properly - this is what makes them Switch Dependent Important: Currently using any aggregation based mode of teaming/bonding is not supported on a server if any of the virtual NIC options (Switch Independent mode, VF vNIC mode, or UFP) have been implemented. This is based on the current limitation that aggregation on IBM switches is on the physical port, not the logical port. An upcoming release of code should permit aggregations on UFP vPorts.
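For illustration only, the switch-side counterpart of a Switch Dependent (aggregation-based) team might look like the following ISCLI sketch. It assumes a bare-metal server whose two non-virtualized NICs land on internal ports of a vLAG pair of I/O modules, with one teamed NIC on each module; the port name, LACP key, and the presence of an already-configured vLAG ISL are all assumptions, and the same configuration must exist on both vLAG peers:

! On each vLAG peer, for the internal port facing the teamed NIC
interface port INTA1
  lacp mode active
  lacp key 1000
  exit
! Tie the two single-port LACP aggregations together as one vLAG instance
vlag adminkey 1000 enable

As the Important note above explains, this style of configuration applies only when the server NICs are running as plain physical NICs; it should not be combined with UFP, Virtual Fabric vNIC, or Switch Independent mode vNIC on those ports.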
  • 131. Chapter 5. NIC virtualization considerations on the server side 117 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Figure 5-51 shows some examples of Switch Dependent mode teaming/bonding and their relationship to the upstream network connections. Figure 5-51 Examples of architectures and their interaction with Switch Dependent modes of teaming/bonding (The figure panels depict a compute node with two teamed NICs attached to two independent stand-alone switches, a vLAG switch pair, a stacked switch pair, and a single stand-alone switch; only the configurations that appear as a single switch to the server support Switch Dependent mode teaming.) Switch Independent modes of teaming/bonding These are teaming/bonding modes that do not require any form of aggregation to be configured on the switch and thus are not dependent on any special switch side design or configuration (just ensure all ports connecting to the team carry a common set of VLANs and any other normal switch settings the host requires). Some comments on these modes: 򐂰 Some Switch Independent modes offer simple Active/Standby NIC teaming, where only the active NIC is used, and the standby NIC comes into play only if the active NIC fails 򐂰 All operating systems offer more advanced kinds of server side teaming that deliver Active/Active NIC usage by attempting to load balance the NICs in the team in such a way that only the server knows or cares about this load balancing (in turn, the switch side of this team/bond can load balance the return traffic based on how the host uses MACs to send traffic out) 򐂰 Attempting to configure some form of aggregation on the I/O Module ports facing the NICs in Switch Independent mode will almost always not work and lead to issues
  • 132. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 118 NIC Virtualization on IBM Flex System Figure 5-52 shows examples of Switch Independent mode teaming/bonding and their relationship to the upstream network connections. Figure 5-52 Examples of architectures and their interaction with Switch Independent modes of teaming/bonding (The figure panels depict a compute node with two teamed NICs attached to a stacked switch pair, a single stand-alone switch, and vLAG switch pairs; Switch Independent mode teaming is supported in each case except where an aggregation has been configured on the switch ports facing the NICs.) Understanding the terms Active and Standby with teaming/bonding The use of the phrases Active/Standby, Active/Passive, Active/Backup, and Active/Active can occasionally be misunderstood and confusing. This section attempts to clarify these terms. Active/Standby, Active/Passive, and Active/Backup These are all different names for the same thing: one NIC in a team/bond is selected to be active (passing traffic), and the other NIC is put into a standby state (not passing traffic) and is only used in the event the active NIC goes down. In some cases the team/bond might have multiple active NICs and only a single standby NIC, or the reverse (one active NIC and Important: Figure 5-52 always shows some sort of path between the pair of upstream switches, and never two switches isolated from one another. Although that path may be directly between the upstream pair (as shown here), or may be somewhere further up in the architecture, it must be present to ensure a failover path between points in the event of a path fault. See the section titled The need for end to end paths between NICs in a team later in this chapter for more detail. Important: The term Switch Independent has been used in this document in relation to a form of virtual NIC that operates independently of the I/O Module, and now as a mode of teaming/bonding on the server that is also independent of the I/O Module. Although they are both independent of the I/O Module, other than the name and this independence, they are unrelated features.
  • 133. Chapter 5. NIC virtualization considerations on the server side 119 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm multiple standby NICs), the point being that one or more NICs in this mode are unused for any traffic until needed. Most users understand the operation of these modes of teaming, but there is occasionally some confusion in the context of how the connecting I/O Modules are utilized. The I/O Modules themselves are not in any sort of special Active/Standby configuration. I/O Modules supporting servers running Active/Standby will both be active, simply forwarding traffic as it is received from the server, following the rules of that I/O Module (usually L2 switching based on MAC addresses). So the I/O Modules are not in any sort of Active/Standby mode and depend on the servers to decide which I/O Module to utilize (based on the NIC selected as active in the OS team/bond). Since the server admin can control which NICs are active or standby, it is possible to configure some servers to use a NIC going to I/O Module bay 1 as the active NIC, and other servers in the same PureFlex chassis to use a NIC pointing to the other I/O Module in bay 2, and in doing so achieve some form of load balancing (albeit a chassis-based form of load balancing). For example, the server admin could configure half of the servers to utilize the NIC going to I/O Module bay 1 as the active NIC, and the other half to utilize the NIC going to I/O Module bay 2 as the active NIC. One possible downside of this type of Active/Standby per-chassis load balancing is that any server within this PureFlex chassis that is using I/O Module bay 1, and has to talk to another server in the same chassis using bay 2 as the active path, usually must have that traffic travel to the upstream network and back down to get between the two I/O bays and their associated active server NICs. Overall, these Active/Standby modes tend to be the simplest to implement and require no special switch side configuration, but they provide only high availability (no load balancing for a single server), and thus are wasteful of the overall bandwidth available to a given server.
Today these aggregation modes are exclusively either LACP or static aggregation 򐂰 These modes use the aggregation hash algorithm to determine what NIC is used for a session of traffic, and a session of traffic may be based on MAC address, IP address and/or other components of the packets being transferred 򐂰 The outbound path used for this mode of teaming/bonding is decoupled from the return traffic, in that each side of the aggregation decides on their own hash what NIC to use for a given session 򐂰 These modes provide a higher chance of better over all load balance, but do not guarantee any load balancing. For example, for a given session if all traffic is between just two hosts (for instance, a large file copy from one host to another) that traffic will generally
  • 134. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 120 NIC Virtualization on IBM Flex System only use a single NIC in the team. The return traffic will use whatever link the switch side hash selects, but will also only pick a single link for this single session for that return traffic. This means that for a given session, a sending device can only utilize the bandwidth of a single NIC in the team. 򐂰 As noted previously, these aggregation based modes of teaming/bonding are not supported today when using any of the virtual NIC features available from the Emulex NIC. That means that if the server has been configured for UFP, VF mode vNIC, or Switch Independent mode vNIC, these teaming/bonding modes should not be implemented. 򐂰 Some examples of Active/Active teaming modes in this category for various OSs are: – Linux: Bonding mode 2 - Static aggregation – Linux: Bonding mode 4 - LACP aggregation – ESX vSwitch teaming mode Route based on IP hash - Static aggregation – ESX dvSwitch teaming mode Route based on IP hash - Static or LACP, depending on the LACP setting enabled or disabled in the dvSwitch The following are some comments on Switch Independent modes of Active/Active, along with some examples: 򐂰 Like all Switch Independent modes of teaming/bonding, there are no special switch-side architectures or configurations, and the switch should not be configured for any form of aggregation 򐂰 These modes use some server side decision making process to select what NIC to use for what session. In this case, a session is often all traffic from a given VM, or a given process in a bare metal OS, or a destination IP or MAC, and so on. The point is that the server decides how it will load balance the traffic over the NICs 򐂰 The outbound path used for this mode of teaming/bonding is usually not decoupled from the inbound traffic, in that in most cases, whatever NIC is used to send outgoing traffic from the host, the switch side will use the same NIC/link for any return traffic (the switch bases its decision on the MAC learned when the host sent a packet, using that MAC to return the traffic on the link it was learned on). 򐂰 These modes can provide quite satisfactory load balancing, are not dependent on having a specific switch architecture or configuration above the host, as the Switch Dependent modes are, and are available in all major Operating Systems 򐂰 Unlike the aggregation based modes of active/active teaming/bonding, these switch independent modes of active/active teaming/bonding work fine with any of the virtual NIC functions available in the Emulex adapters. 򐂰 Some examples of Active/Active Switch Independent modes of teaming/bonding in various OSs are: – Linux: Bonding mode 5 - Adaptive transmit load balance – Linux: Bonding mode 6 - Adaptive load balance – ESX vSwitch teaming mode Route based on originating virtual port ID – ESX dvSwitch teaming mode Route based on source MAC hash In general, the Switch Dependent modes of Active/Active bonding/teaming have a greater potential (but no guarantee) of better overall load balance in the team/bond, but they have added complexity, only support certain upstream network architectures, and require the server team to coordinate with the network team to match the aggregation configurations correctly. 
Switch Independent modes of Active/Active, in contrast, do not require any special upstream architecture or switch configuration and can be controlled and configured entirely from the server side, with no need for the server team to coordinate with the network team (other than agreeing on which VLANs to use and whether they are tagged or untagged, which is always necessary with or without any sort of teaming/bonding).
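To make the switch dependent versus switch independent distinction more concrete, the following is a minimal sketch of two Linux bonding configurations in the Red Hat style ifcfg format. The device names and LACP key are placeholders for illustration only, and the switch side fragment in the comments follows the isCLI style used elsewhere in this document; adjust both to the actual environment.

# Switch Independent example: bonding mode 6 (balance-alb)
# No aggregation is configured on the I/O Module; the server alone
# decides how sessions are spread across the NICs in the bond.
# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=balance-alb miimon=100"

# Switch Dependent example: bonding mode 4 (802.3ad/LACP)
# This requires a matching LACP aggregation on the connecting switch
# ports (and, if the NICs go to two different I/O Modules, a technology
# such as stacking that lets those modules aggregate as one), for example:
#   interface port INTA3
#    lacp mode active
#    lacp key 1003
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer2+3"

Remember that neither the static nor the LACP bonding mode should be used on NICs that are running UFP, Virtual Fabric mode vNIC, or Switch Independent mode vNIC, as noted earlier.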
  • 135. Chapter 5. NIC virtualization considerations on the server side 121 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Link and path fault detection in teaming/bonding All teaming/bonding solutions need a way to know if a NIC in the team/bond is available for use. Most use simple link up/down as the primary method. Some add a layer beyond simple link up/down to attempt to detect remote failures beyond the direct link (upstream path failures). In general, most of these remote fault methods use some sort of arp or ping or probe packet to determine if the path to the other NIC or some upstream device is available, and if not, take that NIC out of service. Some examples of non-link fault failure detection technologies: 򐂰 Linux arp-monitoring 򐂰 VMware Beacon probing 򐂰 Broadcom Livelink (third party teaming tool) All of these remote fault methods have their limitations and can be prone to false positives (reporting a NIC unavailable when it can still service packets). Some examples of issues with these remote fault detection methods: 򐂰 In a large DataCenter with potentially 1000’s of hosts using Linux arp-monitoring and constantly ARPing the default gateway, could eventually become (or at least be perceived as) a Denial of Service attack on the default gateway 򐂰 If Beacon probing in ESX is used on a two-NIC team, if it fails with both NICs still in an up state (for example, a path fault not directly at the host, but somewhere in the upstream L2 network) it will not know which NIC is having a path issue and will begin to blast all packets out both ports, potentially overloading the network and creating new issues (owing to this, VMware does not recommend using Beacon probing with two NIC teams, but it will let you configure it on a two NIC team). Rather then using any of these OS based remote fault detection methods, it is usually preferred to utilize the Failover feature of IBM switches. Other vendors often also support a similar failover feature, such as Cisco’s Link State Tracking See Chapter 6, “Flex System NIC virtulization deployment scenarios” on page 133 for some examples of Failover configurations in a PureFlex System environment. The need for end to end paths between NICs in a team For teaming to work properly, there must be an end to end layer 2 path between the two (or more) NICs in the team. In other words, If you have a pair of teamed NICs, and a host needs to use VLAN 10, then VLAN 10 must be carried to both NICs, and that VLAN 10 must have an external path in the upstream network to connect these two NICs together. This is required for both failover, and in some configurations, load balancing and normal traffic, and is true regardless of teaming type (switch dependent or switch independent modes). This also has implications when using multi-switch aggregations (i.e. vPC or vLAG) In a typical vLAG/vPC environment, a user might have a pair of enclosure switches, running a vLAG aggregation toward the upstream network. Since the upstream switch thinks this pair of enclosure based switches are one switch, a host on the enclosure might send a packet that goes up on a port on one enclosure switch, but the response comes down on a port on the other enclosure switch (based on the other sides load balancing transmit of packets). Owing to this, you must ensure that not only is that VLAN carried on all ports to the server team, and all ports to the upstream aggregation, but it must also be carried on the ISL links of the vLAG/vPC. 
With a switch dependent mode of teaming (that is, aggregation), this VLAN is needed on the ISL in the event of a failover. With a switch independent mode of teaming, this VLAN is required on the ISL for both failover and normal communications.
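As a simple illustration of this end to end VLAN requirement, the sketch below carries a host VLAN on the server-facing ports, the uplink aggregation, and the ISL ports of one I/O Module. The VLAN number and port aliases are placeholders, and the exact VLAN membership syntax can vary between IBM Networking OS releases, so treat this as illustrative rather than a definitive configuration.

! Illustrative only: carry host VLAN 10 end to end on one I/O Module
! INTA3-INTA4   = server-facing ports used by the team
! EXT1-EXT2     = vLAG ISL ports
! EXT11-EXT12   = uplink aggregation toward the upstream network
interface port INTA3-INTA4,EXT1-EXT2,EXT11-EXT12
 tagging
 exit
vlan 10
 enable
 member INTA3-INTA4,EXT1-EXT2,EXT11-EXT12
 exit

The same VLAN must, of course, also exist on the second I/O Module and on the upstream switches.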
  • 136. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 122 NIC Virtualization on IBM Flex System 5.3.2 OS side teaming/bonding and upstream network requirements This section looks at the most common NIC teaming and bonding modes for various OSs and relates them to requirements for the upstream connecting network. Linux bonding Linux bonding has evolved over the years to become easier to deploy and more robust. This section discusses the various modes of bonding available on most Linux implementations. Most flavors of Linux today come with the bonding module prepackaged, but some versions still have to have it installed before bonding can be implemented. Linux offers many different modes of bonding, and not all modes of bonding exist in all flavors of Linux. But most implementations of Linux support bonding modes 0 through 6, which will be discussed here. Linux bonding offers two primary ways to determine if a link is available for server use 򐂰 mii-mon – This is simple link status up/down, and is the default for bonding 򐂰 ARP monitor – Sends an ARP packet to a specified device and expects a response There are some helpful documents available on the web that explain bonding, but It is important to note that much of the Linux bonding documentation has been written by server admins, not network admins. Thus some of the terms used in these documents and help files can be confusing to a network admin. One of the better places to learn about Linux bonding is the following link: https://www.kernel.org/doc/Documentation/networking/bonding.txt Table 5-1 on page 122 provides a cross reference between Linux OS side modes of bonding and their associated switch side requirements when using various Linux bonding modes: Table 5-1 Linux Bonding modes and their associated switch side dependences if any Linux side bonding modes and comments Type Switch side requirements and comments Bond Mode Comments Type of Agg Comments 0 Round Robin Transmit – Also called balance-rr - Xmit load balance per packet D Static Xmit load balance based on hash setting of the switch 1 Active/Standby – No load balancing – just fault tolerant I None No load balancing of traffic 2 XOR of hash – Also called balance-XOR - Xmit load balance based on setting of xmit_hash_policy, Xmit per session load balance D Static Xmit load balance based on hash setting of the switch 3 Broadcast – Xmits everything out all member interfaces, No load balancing, just fault tolerant D Static Xmit load balance based on hash setting of the switch - can work without switch side aggregation support - see note below 4 LACP – Also called 802.3ad - Xmit load balance based on setting of xmit_hash_policy, Xmit per session load balance D LACP Xmit load balance based on hash setting of the switch
  • 137. Chapter 5. NIC virtualization considerations on the server side 123 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Some comments on Table 5-1 on page 122: 򐂰 Type I = A switch Independent mode of bonding 򐂰 Type D = A switch Dependent mode of bonding 򐂰 Mode 0 may lead to out of order packet reception on the receiving device (this mode is usually only used in some very specific environments, for example, where out of order packet reception is not an issue) 򐂰 Mode 2 is most aligned with the polices of typical static aggregation on a switch 򐂰 Mode 3 duplicates all packets on each port (this is not a common selection and is rarely utilized). It could also potentially be used without static aggregation, if each NIC in the bond went to different physical networks or devices upstream 򐂰 Mode 4 is aligned with the polices of LACP aggregation on a switch 򐂰 Modes 1, 5 and 6 do not require any sort of aggregation configured on the switch side VMware ESX teaming VMware ESX offers teaming on its virtual switches including both the stand alone vSwitch and the distributed vSwitch (dvSwitch). The forms of teaming available vary slightly between an ESX stand alone vSwitch and the distributed dvSwitch, with the stand alone vSwitch offering the following four options: 򐂰 Route based on originating virtual Port ID (this is the default - load balances on a per-VM basis) 򐂰 Route based on IP hash (this is a static aggregation) 򐂰 Route based on source MAC hash (similar to the default) 򐂰 Use Explicit failover order (high availability only, no load balancing) The dvSwitch offers some of the same modes, but with more options. The following is the list of teaming options available on the dvSwitch: 򐂰 Route based on originating virtual port (same as stand alone vSwitch) 򐂰 Route based on IP hash (defaults to static aggregation) (same as stand alone vSwitch) 򐂰 Route based on IP hash (Optionally configured for LACP) 򐂰 Route based on source MAC hash (same as stand alone vSwitch) 򐂰 Route based on physical NIC load (attempts to take into account the load on a NIC as they are allocated to the VMs) 򐂰 Use Explicit failover order (same as stand alone vSwitch) 5 Adaptive Transmit Load balance – Also called balance-tlb - Xmit based on current load of NICs in bond I None According to Linux documentation, return traffic is not load balanced (only goes to slave NIC) 6 Adaptive Load balance – Also called balance-alb - Xmit per session load balance I None Load balances return traffic to host based on MAC usage of the host side Linux side bonding modes and comments Type Switch side requirements and comments Bond Mode Comments Type of Agg Comments
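The mii-mon versus ARP monitor choice described above for Linux bonding maps to a small set of bonding options. The following is a minimal RHEL-style sketch; the interface name and ARP target address are placeholders, and in a Flex System environment most administrators would keep the default miimon setting and rely on the I/O Module Failover feature for upstream path detection.

# Link-status monitoring (the usual choice): poll link state every 100 ms
# /etc/sysconfig/network-scripts/ifcfg-bond0
BONDING_OPTS="mode=active-backup miimon=100"

# ARP monitoring alternative: probe a target (for example, the default
# gateway) every 1000 ms instead of relying on link state alone
BONDING_OPTS="mode=active-backup arp_interval=1000 arp_ip_target=192.0.2.1"

# Display the runtime state of the bond, including the monitoring in use
cat /proc/net/bonding/bond0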
  • 138. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 124 NIC Virtualization on IBM Flex System VMware offers two modes of detecting when a path is down: 򐂰 Link Status – this is simple link up/link down and is the default 򐂰 Beacon Probing – Only useful in vSwitches/dvSwitches with more then 2 NICs – Do not use beacon probing on vSwitches/dvSwitches with only two NICs – If the upstream switch offers a failover option (as all of the 4093 models do) it is encouraged to use that over Beacon Probing An older document that does a very good job of explaining VMware ESX networking and the kinds of teams supported can be found at the following link (does not include the modes available in the dvSwitch): http://www.vmware.com/files/pdf/virtual_networking_concepts.pdf Some good information specific to the dvSwitch can be found in the following link: http://www.vmware.com/files/pdf/vsphere-vnetwork-ds-migration-configuration-wp.pdf Table 5-2 provides a cross reference between OS side modes of teaming and their associated switch side requirements when utilizing VMware ESX teaming: Table 5-2 VMware teaming modes and their associated switch side dependences if any VMware side teaming modes and comments Type Switch side requirements and comments Mode of teaming Comments Type of Agg Comments Route based on originating virtual port ID Load balances NICs in vSwitch on a per-VM basis – this is the default teaming mode for an ESX vSwitch I None Load balances return traffic to host based on MAC usage of the host side Route based on IP hash This is a static aggregation on the ESX links in the stand alone vSwitch. When used on a dvSwitch portgroup and the uplinks configured for LACP, this is an LACP aggregation – see below) D Static Xmit load balance based on hash setting of the switch Route based on source MAC hash This is similar to the default teaming mode (per-VM) except it selects the outbound NIC based on the source MAC, and not the originating virtual port ID I None Load balances return traffic to host based on MAC usage of the host side Use explicit failover order Always use the highest order uplink from the list of Active adapters that is up. No load balancing I None No load balance LACP LACP – only available on Distributed vSwitch – can only configure from vSphere Web client (not the traditional vSphere client)– When configured, all PortGroups using this uplink pair must be set to Route based on IP hash D LACP Xmit load balance based on hash setting of the switch Route based on physical NIC load Chooses path based on physical NIC load – only available on Distributed vSwitch (dvSwitch) I None Load balances return traffic to host based on MAC usage of the host side
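From the ESXi command line, the teaming and failure detection policies summarized in Table 5-2 can also be set per standard vSwitch with esxcli. The sketch below assumes a vSwitch0 with uplinks vmnic2 and vmnic3 (placeholder names); it simply contrasts a switch independent policy with the IP hash policy that requires a static aggregation on the I/O Module ports, and is not a recommendation for every environment.

# Switch independent teaming: route based on originating virtual port ID,
# with link status failure detection (no aggregation on the switch side)
esxcli network vswitch standard policy failover set \
    --vswitch-name=vSwitch0 --load-balancing=portid \
    --failure-detection=link --active-uplinks=vmnic2,vmnic3

# Switch dependent alternative: route based on IP hash (requires a static
# aggregation on the connecting I/O Module ports)
esxcli network vswitch standard policy failover set \
    --vswitch-name=vSwitch0 --load-balancing=iphash \
    --active-uplinks=vmnic2,vmnic3

# Display the resulting policy
esxcli network vswitch standard policy failover get --vswitch-name=vSwitch0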
  • 139. Chapter 5. NIC virtualization considerations on the server side 125 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Some comments on Table 5-2 on page 124 򐂰 Type I = A switch Independent mode of teaming 򐂰 Type D = A switch Dependent mode of teaming 򐂰 When NICs are added to a vSwitch, they can be assigned to active or standby rolls independently of the mode of teaming assigned. 򐂰 vSwitch teaming modes can be overridden by vSwitch PortGroup teaming settings Windows Server teaming Teaming in a Windows Server environment can be quite varied. For Windows Server 2008 and 2003, any teaming was only provided by a third party application provided by the NIC vendor. Starting in Windows Server 2012 there is a choice of using either a vendors third party application, or built in teaming provided by Windows 2012. For Windows versions (2012) that have native teaming ability, it is usually best to use the built in native teaming, and only install a third party vendor if there is some special feature that is needed that is not available by the built in versions of teaming in Windows. Teaming using the native modes available in Windows Server 2012 As noted, Windows Server 2012 offers built in NIC teaming, also referred to as LBFO (Load Balance/Failover) in some of their documentation. Microsoft refers to their teaming options as either switch independent mode, or switch dependent mode, with the same meaning we have been applying in this chapter. When selecting the teaming mode in Windows 2012, the user is presented with three options: 򐂰 Static Teaming Also referred to as generic aggregation in some Microsoft documentation, represents a static aggregation and is switch dependent, requiring a static aggregation to be configured on the switch. 򐂰 Switch Independent As the name implies, represents a switch independent mode of teaming (no aggregation configuration needed on the switch). How it utilizes the NICs for load balance is a separate setting. 򐂰 LACP Also referred to as 802.1AX in some Microsoft documentation (AX being the latest IEEE standard for LACP, replacing the older 802.3ad LACP standard), is a switch dependent mode of teaming that requires LACP be configured on the upstream switch. Separate from the teaming mode, a user can then select load balance method. In the initial versions of Windows Server 2012, two types of load balance options existed. Address hash and Hyper-V Port. Address hash utilizes information from the IP addresses in the packets to determine load balance. Hyper-V port attempts to load balance on a per vPort basis (not related to the term vPort as configured in IBM UFP virtual NIC settings). As of Windows Server 2012 R2, Microsoft has added a third load balance option, dynamic load balance, that attempts to also factor in NIC utilization to distribute the loads. Details on this and other aspects of teaming load balancing for Windows Server 2012 can be found in a document available from the following location: http://www.microsoft.com/en-us/download/confirmation.aspx?id=40319
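On the Windows Server 2012 side, the native teaming modes and load balancing options described above are exposed through the NetLbfo PowerShell cmdlets as well as the GUI. The following minimal sketch uses placeholder adapter and team names, and shows the two modes as alternatives (create only one team per pair of NICs); the Dynamic load balancing value requires Windows Server 2012 R2.

# Switch independent team with Hyper-V port load balancing
# (no configuration needed on the I/O Module)
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# Switch dependent LACP team (requires a matching LACP aggregation
# on the connecting I/O Module ports)
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" `
    -TeamingMode Lacp -LoadBalancingAlgorithm TransportPorts

# Verify the team and its member state
Get-NetLbfoTeam
Get-NetLbfoTeamMember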
  • 140. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 126 NIC Virtualization on IBM Flex System As noted, Windows Server 2012 also still allows third party NIC vendors teaming applications, but states that it is strongly recommended that no system administrator ever run two teaming solutions (built in Windows teaming and third party vendor teaming) at the same time on the same server. So use built in, or use third party, but never use both at the same time. Another good Microsoft document explaining Windows Server 2012 NIC teaming can be found at the following link: http://www.microsoft.com/en-us/download/details.aspx?id=30160 Table 5-3 provides a cross reference between Windows Server 2012 OS side modes of teaming and their associated switch side requirements: Table 5-3 Windows 2012 teaming modes and their associated switch side dependences if any Some comments on Table 5-3 on page 126 򐂰 Type I = A switch Independent mode of teaming 򐂰 Type D = A switch Dependent mode of teaming 򐂰 Both static teaming and LACP modes can also be set to use one of the three available hash methods (Address hash, Hyper-Vport, and if 2012 R2, Dynamic) 򐂰 Active/Standby teaming is available as a function of building one of the above mode teams, and then choosing to put a a member of the team into standby Teaming using third party vendor applications for Windows As noted, for Windows Server 2008 or Windows Server 2003, a vendor supplied application is required to implement any form of NIC teaming. Windows 2012 side teaming modes and comments Type Switch side requirements and comments Mode of teaming Comments Type of Agg Comments Switch Independent (all load balancing is controlled by the server side) Load balance options are set independent of teaming mode selection. Available load balance options are: Address hash - attempts to load balance based on IP addressing information in the packets Hyper-V port - This is a per-VM load balance and load balances the NICs on a per-VM basis Dynamic (only with R2 or later) - Attempts to assign outbound flows based on IP addresses, TCP ports and NIC utilization I None Load balances return traffic to host based on MAC usage of the host side Static Teaming Microsoft uses the names Generic Trunking and IEEE 802.3ad draft v1 in some of their documentation to refer to a static aggregation D Static Xmit load balance based on hash setting of the switch LACP Microsoft uses the name IEEE 802.3AX LACP in some of their documentation to mean an LACP aggregation D LACP Xmit load balance based on hash setting of the switch
  • 141. Chapter 5. NIC virtualization considerations on the server side 127 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm Which vendor application you chose is mostly based on the vendor NIC in use on the server. This section discusses two of the more common NIC vendors, Broadcom and Emulex, and their tools. Broadcom provides an application named Broadcom Advanced Server Program (BASP) that runs inside of the Broadcom Advanced Control Suite (BACS) to provide teaming services in Windows 2003/2008. It supports many of the Broadcom NICs as well as some Intel NICs. for a list of supported NICs and an introduction to this product, see the following link: http://www.broadcom.com/support/ethernet_nic/management_applications.php Broadcom BASP supports four primary teaming modes as noted in Table 5-4, and also has a form of remote path failure detection, known as LiveLink. Livelink requires an IP address on the team interface and separate IP addresses on each of the physical NICs. Like all forms of NIC teaming remote path detection discussed in this document, a more robust choice is usually to make use of the switch side Failover feature. A good document on using BASP can be found at the following link: http://www.broadcom.com/docs/support/ethernet_nic/Broadcom_NetXtremeII_Server_T7.8 .pdf Table 5-4 provides a cross reference between OS side modes of teaming and their associated switch side requirements, when utilizing Windows and the Broadcom Advanced Server Program: Table 5-4 Broadcom third party teaming modes and their associated switch side dependences if any Some comments on Table 5-4 on page 127 򐂰 Type I = A switch Independent mode of teaming 򐂰 Type D = A switch Dependent mode of teaming 򐂰 This BASP tool can also be used to create VLAN tagged interfaces Emulex is another vendor that offers third party teaming for Windows Server 2003 and 2008 platforms. Emulex refers to their teaming application as OneCommand NIC Teaming and Windows/Broadcom side teaming modes and comments Type Switch side requirements and comments Mode of teaming Comments Type of Agg Comments Active/Standby Active NIC carries all traffic until it fails, then standby NIC takes over. No load balancing I None No load balance Smart Load Balance (SLB) – with or without auto failback Attempts to load balance based on IP flows. With failback enabled, if a NIC that had failed comes back up, teaming will attempt to switch traffic back to that NIC I None Load balances return traffic to host based on MAC usage of the host side Generic Trunking (FEC/GEC)/802.3 ad-Draft Static This is a typical static aggregation implementation. Broadcom also refers to this as (FEC/GEC)- 802.3ad-Draft Static D Static Xmit load balance based on hash setting of the switch Link Aggregation (802.3ad) This works with LACP aggregations D LACP Xmit load balance based on hash setting of the switch
  • 142. NIC virtualization considerations - Server side.fm Draft Document for Review May 1, 2014 2:10 pm 128 NIC Virtualization on IBM Flex System VLAN Manager, and also offers four primary modes of teaming, as noted in Table 5-5 on page 128. Emulex also uses the terms switch independent and switch dependent modes of teaming in their documentation, which can be found at the following link: http://www-dl.emulex.com/support/windows/windows/240005/nic_teaming_manager.pdf Table 5-5 provides a cross reference between OS side modes of teaming and their associated switch side requirements, when utilizing Windows and the Emulex OneCommand application. Table 5-5 Emulex third party teaming modes and their associated switch side dependences if any Some comments on Table 5-4 on page 127 򐂰 Type I = A switch Independent mode of teaming 򐂰 Type D = A switch Dependent mode of teaming 򐂰 The Emulex tool can also be used to create VLAN tagged interfaces 5.3.3 Discussion of physical NIC connections and logical enumeration From a physical perspective, all physical NICs are hard wired to a specific I/O Module bay and specific port on those I/O Modules in the Flex System chassis. Examples of these fixed physical connections can be seen in 3.1, “Enterprise Chassis I/O architecture” on page 28. Any virtual NICs that are created on top of a physical NIC can naturally only connect to wherever the physical NIC it was created from connects to. Although a given physical NIC always goes to a specific physical I/O Module and port, how the OS enumerates (names) these NICs can be confusing and downright illogical at times. Knowing what OS enumerated NIC physically rides on top of what physical NIC (and thus where it connects to what I/O Module in the Flex System) is important for the server administrator. Understanding this logical to physical mapping allows proper NIC selection when building teamed/bonded designs. If we do not understand this relationship, and build a team or bond of two NICs that happen to go to the same switch, although providing increased Windows/Emulex side teaming modes and comments Type Switch side requirements and comments Mode of teaming Comments Type of Agg Comments Failover (FO) Simple Active/Standby - no load balancing I None No load balance Smart Load Balance (SLB) - AKA just “Load Balance” Attempts to load balance based on IP hash setting I None Load balances return traffic to host based on MAC usage of the host side Generic trunking - Link aggregation static mode (802.3ad static aggregation) This is a typical static aggregation implementation D Static Xmit load balance based on hash setting of the switch Link Aggregation Control Protocol (LACP) This works with LACP aggregations D LACP Xmit load balance based on hash setting of the switch
bandwidth and NIC redundancy, it would not provide redundancy in the event of an I/O Module failure. As an example of OS enumeration, Figure 5-53 represents a Compute Node in a PureFlex System environment that is not configured for any virtual NIC technology, and shows how VMware ESX might typically enumerate those physical NICs. Figure 5-53 Dual port physical NIC enumerated in a VMware ESX host As can be seen, the OS-enumerated NIC vmnic0 has been associated with physical NIC 0, which connects to the I/O Module in bay 1, and the OS-enumerated vmnic1 has been associated with physical NIC 1, which connects to I/O Module bay 2. In this case, putting these two NICs in a team/bond would provide full redundancy; the mapping is straightforward and orderly.
If we then look at an ESX host that was installed when the NICs were set for one of the virtual NIC modes, we might see what is represented in Figure 5-54 (NICs configured for Virtual Fabric mode with no iSCSI or FCoE personality selected). Figure 5-54 Dual port physical NIC in a virtual NIC mode enumerated in a VMware ESX host Notice that the enumeration sequence seen in Figure 5-54 is also very orderly, and could be readily used to determine the best pairs of NICs for teaming/bonding (for example, vmnic0 and vmnic1 in a team/bond, vmnic2 and vmnic3 in a team/bond, and so on) to provide I/O Module redundancy. Although this orderly enumeration is frequently the case, it is not always how it works out (this is true of all operating systems, not just the ESX shown in this example). In some cases, the enumeration may be in a completely different order than might be expected. For example, if a user installed VMware when virtual NICs were enabled, and then disabled the virtual NICs and booted back up into the OS, the remaining physical NICs may not be enumerated sequentially or logically in the OS. In the case of underlying NIC configuration changes, one way (although disruptive) to force the OS to re-enumerate the NICs in the proper order is to reinstall the OS and let it rediscover the current NIC structure. Perhaps simpler is to rename the NICs in the OS (some OSs provide this ability). Even with a reinstall, though, there are times when the OS provides a less than obvious enumeration of the NICs, and this can be problematic. How can a user determine which OS-named NIC is mapped to which physical NIC and I/O Module? There are several ways to figure out which OS NIC is associated with which physical NIC. One of the simpler ways is to go into the I/O Module, shut down one of the physical ports toward the Compute Node, and see which NICs the OS then reports as disconnected. Of course, this is a disruptive operation, so it is not necessarily a good choice in a production environment. A less disruptive way is to make note of the MAC addresses in the OS and look in the I/O Module MAC address table to determine which physical port they came in on. But this can be a little more complicated with OSs that do not use the physical NIC MACs.
  • 145. Chapter 5. NIC virtualization considerations on the server side 131 Draft Document for Review May 1, 2014 2:10 pm NIC virtualization considerations - Server side.fm One fairly accurate, if not time consuming, method to make this determination is to go into the UEFI F1 setup, into the Network screen for the NICs, and make note of the information there to compare to information related to each logical NIC in the OS. Figure 5-55 represents an example of what might be seen on this Network screen: Figure 5-55 Example of MAC and PCI Function Address numbering of virtual NICs This screen provides both the MAC address and PCI Function Address (PFA) information for each physical or logical NIC, which can then be used in the server OS to figure out what OS enumerated names are related to the physical (or logical) NICs in hardware. The following two examples show the MAC and PFA info for comparison and contrast between the physical and then converted NICs for a dual port LoM NIC. Example 5-1 represents the values as seen for an onboard NICs not in any virtual NIC mode, along with what physical I/O Module bays those physical NICs connect to. Example 5-2 represents that same onboard NIC after conversion to some form of virtual NIC mode. Example 5-1 Example of onboard dual port NIC not in any virtual NIC mode MAC: 34:40:B5:BE:83:D0 Onboard PFA 12:0:0 physical NIC-0 to I/O Module bay 1 MAC: 34:40:B5:BE:83:D4 Onboard PFA 12:0:1 physical NIC-1 to I/O Module bay 2 As can be seen in Example 5-2, the original physical NIC and PFA information have been inherited by the first two virtual NICs, followed by the other 6 virtual NICs and their associated MAC, PFA info, and what I/O Module (based on the under lying physical connections of the physical NIC) they connect to. Example 5-2 Example of onboard dual port NIC after converting in to virtual NIC mode MAC: 34:40:B5:BE:83:D0 Onboard PFA 12:0:0 physical NIC-0 to I/O Module bay 1 MAC: 34:40:B5:BE:83:D4 Onboard PFA 12:0:1 physical NIC-1 to I/O Module bay 2 MAC: 34:40:B5:BE:83:D1 Onboard PFA 12:0:2 physical NIC-0 to I/O Module bay 1 MAC: 34:40:B5:BE:83:D5 Onboard PFA 12:0:3 physical NIC-1 to I/O Module bay 2 MAC: 34:40:B5:BE:83:D2 Onboard PFA 12:0:4 physical NIC-0 to I/O Module bay 1 MAC: 34:40:B5:BE:83:D6 Onboard PFA 12:0:5 physical NIC-1 to I/O Module bay 2 MAC: 34:40:B5:BE:83:D3 Onboard PFA 12:0:6 physical NIC-0 to I/O Module bay 1 MAC: 34:40:B5:BE:83:D7 Onboard PFA 12:0:7 physical NIC-1 to I/O Module bay 2 As noted, now that we know this MAC and PFA information (as well as their relationship to the underlying physical NIC and where it connects to), it is usually possible to go into the OS and locate either the MAC or PFA information associated with the OS enumerated name (for example, in the Device Manager in Windows Server 2012), and thus regardless of the enumerated name, know where each vNIC connects to. Regardless of how it is determined, getting the proper pair of NICs into a team/bond is always important to ensure the desired high availability is achieved.
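As a practical illustration of matching OS-enumerated NIC names against the MAC and PFA values recorded in F1 Setup (such as those in Example 5-2), the following commands report the MAC address and PCI function of each NIC. The interface names are placeholders, and the Windows cmdlets are mentioned as one commonly available option rather than the only method.

# ESXi: list vmnic names with their PCI addresses and MAC addresses
esxcli network nic list

# Linux: MAC address and PCI bus/device/function of a given interface
cat /sys/class/net/eth0/address
ethtool -i eth0        # the bus-info field is the PCI function address

# Windows Server 2012 (PowerShell): MAC and PCI location per adapter
Get-NetAdapter | Format-Table Name,MacAddress
Get-NetAdapterHardwareInfo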
Chapter 6. Flex System NIC virtualization deployment scenarios This chapter provides details on various aspects of NIC virtualization as well as their interactions with a number of I/O Module features. The following topics are covered: 򐂰 6.1, “Introduction to deployment examples” on page 134 򐂰 6.2, “UFP mode virtual NIC and Layer 2 Failover” on page 137 򐂰 6.3, “UFP mode virtual NIC with vLAG and FCoE” on page 149 򐂰 6.4, “pNIC and vNIC Virtual Fabric modes with Layer 2 Failover” on page 163 򐂰 6.5, “Switch Independent mode with SPAR” on page 189
  • 148. Deployment scenarios.fm Draft Document for Review May 1, 2014 2:10 pm 134 NIC Virtualization on IBM Flex System 6.1 Introduction to deployment examples This chapter provides examples for deploying the PureFlex I/O Modules and virtual NIC functionality in a number of different scenarios. Also provided are helpful commands to confirm the environment is operating as designed. It is important to note that the examples provided may or may not reflect an exact combination of features an average environment might include, but were more chosen to demonstrate the interoperation of features and their associated configurations. The following combinations of features will be presented in this chapter: 򐂰 UFP mode virtual NIC and Layer 2 Failover 򐂰 UFP mode virtual NIC with FCoE and vLAG 򐂰 Virtual Fabric mode vNIC and Physical NIC with Layer 2 Failover 򐂰 Switch Independent mode vNIC with SPAR The above combinations are not necessarily indicative of any specific restriction as to what works with what, or on what model I/O Module, but some features and combinations of features do indeed not interoperate with others, or on all I/O Modules. Some considerations in this regard: 򐂰 NIC virtualization features – All forms of vNIC are mutually exclusive of each other on the server side. In other words, a given server can be set for UFP or Virtual Fabric mode vNIC, or Switch Independent mode (or disabled for virtual NIC), but not more then one of these can be set at one time on that server. – On the switch side related to virtual NICs, UFP and Virtual Fabric mode vNIC are also mutual exclusive of each other, in that you can enable one or the other, but not both at the same time. Switch Independent mode vNIC can be enabled on a host, connecting to an I/O Module that is configured for UFP or Virtual Fabric mode vNIC, but only if the I/O Module ports facing this host are in physical mode (not enabled/configured for UFP or Virtual Fabric mode vNIC). 򐂰 Switch virtualization features – SPAR, vLAG and Stacking are all mutually exclusive on a given I/O Module. – SI4093 does not support vLAG or Stacking, but does support SPAR. The SI4093 also does not support any of the I/O Module based virtual NIC technologies (UFP or Virtual Fabric vNIC) but like all I/O Modules, supports Switch Independent mode vNIC running on the host. – The I/O Module based Failover feature is supported with all modes of virtual NIC, but implemented differently depending on the mode of virtual NIC (Virtual Fabric vNIC is configure on a per vNIC group basis, Switch Independent vNIC is configured using Important: Unless otherwise noted, all configuration examples and commands in this document are based on using the industry standard CLI (isCLI) of the PureFlex I/O Modules. By default today, the EN4093R and CN4093 use the menu driven CLI (this may change in the future). If an I/O Module is in the menu driven CLI mode, to make use of these examples it is first necessary to change to isCLI mode. The simplest way to get into the isCLI mode from the menu CLI mode, is to issue the menu CLI command /boot/prompt ena, and then exit out and log back in. Upon logging back in you will be offered the option to select the desired CLI.
global failover per physical port, and UFP is also configured using global failover, but on a per-vPort basis). The preceding notes are provided as examples; other restrictions may apply, and these are covered in more depth in Chapter 3 and Chapter 4.
  • 151. 137 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + L2 Failover.fm 6.2 UFP mode virtual NIC and Layer 2 Failover Unified Fabric Port provides for the ability of carving up 10 Gb ports into virtual NICs as seen in Chapter 5, “NIC virtualization considerations on the server side” on page 75. Layer 2 Failover, seen in other chapters throughout this book, provides for the ability to detect uplink failures and systematically disable all INT ports. Layer 2 Failover with UFP takes that process to the next level and automates the shutdown not only to a physical NIC but a UFP vPort virtual NIC. This section will provide diagrams and configuration examples for setting up UFP and Layer 2 Failover. The following topics are covered: 򐂰 6.2.1, “Components” 򐂰 6.2.2, “Topology” 򐂰 6.2.3, “Use Cases” on page 139 򐂰 6.2.4, “Configuration” on page 139 6.2.1 Components This deployment scenario uses the following equipment: 򐂰 Flex System Enterprise Chassis 򐂰 x240 Compute Node (in bay 3) – Running ESXi 5.5 – Dual port Emulex LoM NIC • Physical NIC disabled in UEFI – Quad port CN4054 NIC in Mezz slot 2 • First 2 physical NICs have UFP configured and FCoE personality enabled • Second 2 NICs have Virtual NIC disabled (in physical NIC mode) 򐂰 Two CN4093s in switch bays 3 and 4 򐂰 Two G8264 to act as upstream Ethernet connectivity running vLAG 6.2.2 Topology The x240 Compute Node OS running ESXi will be utilizing vSwitch0 using its default NIC team setting route based on originating virtual port to the pair of CN4093s. The first two ports within the UEFI of the CN4054R Emulex Quad Port NIC will be running in UFP mode. The CN4093s are running as independent I/O Modules with UFP enabled on vPort (.1) in Tunnel mode and vPort (.2) in FCoE mode. Tunnel mode is utilizing EXT1 and EXT2 which are in an IEEE 802.3ad LACP PortChannel with adminkey 4344. The PortChannel, along with INT port 4 UFP vPort (.1) are members of a failover trigger.
As seen in Figure 6-1 below, a single I/O Module is presented to show the connectivity between the Compute Node and the external network. Figure 6-1 Failover trigger with an active failure In Figure 6-1, EXT1 and EXT2 form a PortChannel that is also a member of a failover trigger. The failover trigger is configured so that the failure of a single port fails the associated INT vPorts. In this example we are using Auto Monitor with VLAN awareness. There are two forms of failover triggers that can be configured: 򐂰 AMON - Auto Monitor, which allows for tracking of a physical uplink, static PortChannel, or LACP PortChannel. When the uplink fails, the I/O Module automatically disables any associated INT ports or vPorts that carry any of the VLANs also assigned to the Monitor port. 򐂰 MMON - Manual Monitor, which allows for tracking of the same uplink types as AMON and, upon failure, disables any manually configured INT ports or vPorts associated with that trigger. 򐂰 Limit is a mechanism that is part of failover and can be applied on a per-trigger basis. In this example, the limit is set to 1 within trigger 1; when the number of ports that are up and forwarding drops to this limit, failover triggers an event and disables all INT ports or vPorts associated with that trigger. There are a couple of different ways a failure can occur. The most clearly understood way is a loss of link on the physical port. The second way a failure can occur is through spanning-tree state. When a VLAN that has spanning-tree enabled on the uplink or PortChannel enters a non-forwarding state, the I/O Module treats this as a failure and triggers a failover event, disabling any of the INT ports and/or vPorts associated with that trigger. A failure event caused by a spanning-tree non-forwarding state (rather than loss of link) can occur with either the AMON or the MMON type of failover.
  • 153. 139 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + L2 Failover.fm 6.2.3 Use Cases Failover can be extremely useful when NIC Teaming / Bonding is utilized on the Compute Nodes. Since the I/O Module is between the Compute Node and the upstream network a Compute Node has no way of detecting an outage further than its physical connection and can end up sending traffic to a black holed I/O Module. For this reason failover is a significant feature that will allow customers to implement an HA environment with a peace of mind that if a failure does occur their applications can survive with full access through its redundant connection to the network. 6.2.4 Configuration This section includes the configurations and steps necessary to configure the various components. This will not include the upstream G8264s as that is not the focus of this section (but it will include the configuration for the uplinks in the CN4093’s toward the G8264). Host side configuring (OS/UEFI) The process of configuring the UEFI is the same for any operating system that resides on an Intel based Compute Node. In Figure 6-2 below the UEFI Emulex NIC Selection page is found within the System Settings  Network  Network Device List {NIC wanting to enable UFP on}. Once here select Multichannel Mode  IBM Unified Fabric Protocol Mode. After making the change step all the way back out to System Configuration and Boot Management by pressing ESC and select Save Settings. Once enabled on one port of a two port ASIC the settings will automatically be applied to the other port/s. Figure 6-2 UEFI Emulex NIC Selection settings
  • 154. Deployment scenarios - UFP + L2 Failover.fm Draft Document for Review May 1, 2014 2:10 pm 140 NIC Virtualization on IBM Flex System In Figure 6-3 vmnic2 and vmnic3 are associated with UFP port 4 vPort (.1) on each of the CN4093s. This is representing a healthy management network as both vmnics are being listed as Connected. Figure 6-3 ESXi Management with both redundant ports showing Connected In Figure 6-4 below the associated vSwitch which is utilizing vmnic1 and vmnic2 are seen below and is also showing connected. Figure 6-4 ESXi vSwitch with redundant vmnics Switch side configuration This subsection explains switch side configuration. The following options are covered: 򐂰 “Base Configuration of I/O Module” 򐂰 “Auto Monitor (AMON)” on page 142 򐂰 “Manual Monitor (MMON)” on page 143 򐂰 “View from Flex System Chassis with 2x CN4093s” on page 144 Base Configuration of I/O Module Although the base configuration and following failover configurations are all utilizing a pair of CN4093 I/O Modules the steps below can also apply to the EN4093/R with potentially minor EXT Port reassignments since the CN4093 has a different EXT port alignment than either of the EN4093 I/O Modules. 1. The first step, if utilizing a PortChannel as the Uplink, is to create an LACP 802.3ad PortChannel. In this Example 6-1 on page 141 below there will be four ports utilized as the Uplink providing 40 Gb of unidirectional bandwidth. Also configured will be the tagpvid-ingress setting as the vPort will be running in tunnel mode.
  • 155. 141 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + L2 Failover.fm Example 6-1 Setting up LACP as the uplink interface port EXT11-EXT14 lacp mode active lacp key 5356 tagpvid-ingress 2. The second step is to create the UFP vPorts that will be utilized as the vmembers within the failover trigger. The Example 6-2 below shows how to setup UFP with a vPort running in Tunnel Mode. Example 6-2 Setting up UFP vPort 1 in Tunnel mode ufp port INTA3,INTA4 vport 1 network mode tunnel network default-vlan 4091 qos bandwidth min 50 enable exit ufp port INTA3 enable ufp port INTA4 enable ufp enable 3. Since the I/O Modules will be running in UFP Tunnel mode and not participating in spanning tree the option of shutting down spanning-tree globally is provided in Example 6-3 below. Example 6-3 globally disabling spanning-tree spanning-tree mode disable Now that the I/O Module has been completely setup to support both the uplink PortChannel and the UFP INT Ports the next step is to decide whether to utilize Auto Monitor (AMON) or Manual Monitor (MMON). Both AMON and MMON have there advantages. With AMON, in combination with UFP globally enabled, VLAN monitoring must be enabled before you can enable a failover trigger. VLAN monitoring allows the I/O Module to only disable those vPorts that carry the same VLAN ID as the Uplink or PortChannel assigned to that Failover Trigger. All other vPorts will remain unaffected even within the same physical INT port as the failed vPort. With MMON, the meaning of the word “Manual” is exactly that. The I/O Module must be defined with both the Monitor Port or PortChannel (EXT ports) and the Control members and or vmembers (INT ports). MMON, perhaps, might be more utilized as it provides for greater control of what gets disabled during an uplink outage.
  • 156. Deployment scenarios - UFP + L2 Failover.fm Draft Document for Review May 1, 2014 2:10 pm 142 NIC Virtualization on IBM Flex System Auto Monitor (AMON) In Figure 6-5 below two of the 4 uplinks within the LACP PortChannel have failed. Since the limit of ports is set to 2 (i.e. 2 ports left up) a failed event occurs in that I/O Module causing all vPorts associated with the same VLANs listed in the PortChannel to also fail. Figure 6-5 Auto Monitor failure I/O Module configuration in Example 6-4 below consists of a trigger set with auto monitor enabled. This trigger is also set to fail, with a limit of 2, all control members and/or vmembers if the number of forwarding ports is reached by the specified failover limit number. Example 6-4 Failover Trigger with amon configuration failover enable failover vlan failover trigger 1 limit 2 failover trigger 1 amon admin-key 5356 failover trigger 1 enable Note: VLAN trigger requirement with AMON is only necessary if UFP is enabled. AMON failover also works without vlan tracking when UFP is not enabled.
  • 157. 143 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + L2 Failover.fm Manual Monitor (MMON) In Figure 6-6 below two of the 4 uplinks within the LACP PortChannel have failed. Since the limit of ports is set to 2 (i.e. 2 ports left up) a failed event occurs in that I/O Module causing all vPorts, manually enabled as control ports, to also fail. Figure 6-6 Manual Monitor failure I/O Module configuration Example 6-5 below consists of a trigger set to MMON enabled. This trigger is also set to fail, with a limit of 2, all control members and or vmembers if the number of forwarding ports is reached by the specified failover limit number. Example 6-5 Failover Trigger with mmon configuration failover enable failover trigger 1 limit 2 failover trigger 1 mmon monitor admin-key 5356 failover trigger 1 mmon control vmember INTA3.1 failover trigger 1 mmon control vmember INTA4.1 failover trigger 1 enable The biggest difference between AMON and MMON is AMON uses VLANs associated with the EXT Port and triggers a failure event disabling only those vPorts associated with the same VLANs as the Uplink defined within the trigger. Verification of proper configuration, with show commands, can be seen in “Confirming operation of the environment” on page 144.
  • 158. Deployment scenarios - UFP + L2 Failover.fm Draft Document for Review May 1, 2014 2:10 pm 144 NIC Virtualization on IBM Flex System View from Flex System Chassis with 2x CN4093s Figure 6-7 shows a view of two CN4093s with UFP and Failover enabled. This scenario is identical from the two scenarios above allowing the redundant link to take 100% of the bandwidth after a failure to the primary ESXi vmnic. Figure 6-7 Flex Chassis with 2x CN4093s with failover enabled 6.2.5 Confirming operation of the environment Upon completion of the above steps there are several show commands that can display whether or not Failover is working as expected with the desired configuration. The first and easiest example, as seen below with Example 6-6 on page 145, is to display the status of the vPorts. By issuing a show ufp information port command this displays the health of each vPort. INTA3 and INTA4 Channel 1 (i.e. vPort (.1)) are both showing disabled, however, notice an asterisk next to the word disabled. This indicates, as also noted at the bottom of this example, that the vPort has been disabled due to a UFP failover trigger. This indicates that the number of failed uplinks, either the entire uplink/s or the limit, has been reached. Important: Channel 2 (i.e. vPort (.2)) is still up and forwarding as those vPorts were not members of a trigger with a failed event.
  • 159. 145 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + L2 Failover.fm Example 6-6 UFP vPort status CN4093a(config)#show ufp information port ----------------------------------------------------------------- Alias Port state vPorts chan 1 chan 2 chan 3 chan 4 ------- ---- ----- ------ --------- --------- --------- --------- INTA1 1 dis 0 disabled disabled disabled disabled INTA2 2 dis 0 disabled disabled disabled disabled INTA3 3 ena 2 disabled* up disabled disabled INTA4 4 ena 2 disabled* up disabled disabled INTA5 5 dis 0 disabled disabled disabled disabled . . . * = vPort disabled due to UFP teaming failover Example 6-7 shows results of the command show portchannel information which displays the number of ports that have failed. As you can see below, the number of ports left up and forwarding is two. The failover trigger is also set to 2 so the limit has been reached which forced a failure event and disabled all INT vPorts. This command is especially important to figure out if what caused the failure event was due to Link status or Spanning-Tree block status. Example 6-7 displaying which ports within a PortChannel are still forwarding CN4093a(config)#show portchannel information PortChannel 65: Enabled Protocol - LACP Port State: EXT13: STG 1 forwarding EXT14: STG 1 forwarding These next two, Example 6-8 and Example 6-9 on page 146,display the full status of a failover trigger. This just might be the easiest command to run to find whether a trigger has been activated or not. In Example 6-8 notice that the limit is set to 2 with three of the four ports still remaining in Operational status. Because the limit has not been met the failover trigger has not kicked in. Example 6-8 Healthy Trigger state CN4093a(config)#show failover trigger 1 information Trigger 1 Manual Monitor: Enabled Trigger 1 limit: 2 Monitor State: Up Member Status --------- ----------- adminkey 5356 EXT11 Operational EXT12 Failed EXT13 Operational EXT14 Operational Control State: Auto Controlled Member Status --------- -----------
  • 160. Deployment scenarios - UFP + L2 Failover.fm Draft Document for Review May 1, 2014 2:10 pm 146 NIC Virtualization on IBM Flex System Virtual ports INTA3.1 Operational INTA4.1 Operational In Example 6-9 notice that the limit is set to 2 and there are only two ports remaining in Operational status. Because the limit has now been met the failover trigger has kicked in and put the associated vPorts into a Failed state. Example 6-9 Failed Trigger state CN4093a(config)#show failover trigger 1 information Trigger 1 Manual Monitor: Enabled Trigger 1 limit: 2 Monitor State: Down Member Status --------- ----------- adminkey 5356 EXT11 Failed EXT12 Failed EXT13 Operational EXT14 Operational Control State: Auto Disabled Member Status --------- ----------- Virtual ports INTA3.1 Failed INTA4.1 Failed We can also see disconnects from the host side indicating that a physical or logical connection has been terminated. In Figure 6-8 vmnic 2 states Disconnected as the uplinks in the I/O Module to the network has been severed (or spanning-tree blocked) causing a Trigger Failover response to the associated vPorts. Figure 6-8 vmnic2 failure - VMware Management
  • 161. 147 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + L2 Failover.fm In Figure 6-9 below vSwitch0 is now displaying disconnected on vmnic2 and has failed over to its redundant (stand by) vmnic3. When this happens the traffic that was originally on vmnic 2 is now running over vmnic 3 and up through I/O Module 4. Figure 6-9 vmnic2 failure - vSwitch In Example 6-10, using a linux command line, a failure of 2 seconds (e.g. 2 ICMP Ping loss) was experienced during a failover trigger event. Example 6-10 ICMP ping loss due to failover trigger between I/O Modules 64 bytes from 9.42.171.170: icmp_seq=580 ttl=64 time=0.580 ms Request timeout for icmp_seq 581 Request timeout for icmp_seq 582 64 bytes from 9.42.171.170: icmp_seq=583 ttl=64 time=0.468 ms Important: During a failure event between I/O Modules it is normal to experience up to 3 seconds of packet loss due to network reconvergence.
  • 163. 149 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + FCoE + vLAG.fm 6.3 UFP mode virtual NIC with vLAG and FCoE This section discusses the implementation of UFP virtual NIC with FCoE, and vLAG aggregations on the uplinks of a pair of CN4093’s. 6.3.1 Components This deployment scenario will make use of the following equipment: 򐂰 Flex System Enterprise Chassis 򐂰 x240 Compute Node in bay 3 – Running ESX 5.5 – Dual port Emulex LoM CNA • Not used in this scenario – Quad port CN4054 CNA in Mezz slot 2 • First two physical CNA ports have UFP configured and FCoE personality enabled • Second two CNA ports have Virtual NIC disabled - Not used in this scenario 򐂰 v7000 Storage Node in bays 11 - 14 of the Flex System chassis – Providing remote storage for Compute Node in bay 3 򐂰 Two CN4093 I/O Modules – Installed in I/O Module bays 3 and 4 for this scenario – Both with Upgrade 1 FoD installed – Providing the FCF function between the Compute Node in bay 3, and the storage array in bays 11-14 򐂰 Two G8264 switches to act as upstream Ethernet connectivity out of the vLAG pair of CN4093’s 6.3.2 Topology This scenario will take advantage of the vLAG feature available on the CN4093 to virtualize the data plane to support cross switch aggregation. As well as UFP to provide virtual NIC support to the Compute Node, and FCoE within UFP to offer FCoE attached storage to the Compute Node in bay 3. Some comments on what is being demonstrated: 򐂰 We are using vLAG to provide cross-switch aggregation out of the CN4093’s toward the upstream direction to the Top of Rack switches. This provides both HA and improved performance for these connections to the upstream network – We are not doing any vLAG aggregations from the CN4093’s toward the Compute Node in bay 3 (aggregations toward servers running any form of virtual NIC is not supported at this time) 򐂰 For UFP we will be demonstrating four different vPorts: – vPort1 in tunnel mode - using the vLAG aggregation of EXT11 on both CN4093’s for the tunnel uplinks out of the I/O Modules • Uplink for vPorts using tunnel mode should use the tagpvid-ingress command to break out tunnel packets toward upstream and re-add outer tag on inbound packets back into the tunnel – vPort2 in FCoE mode • If FCoE is desired, only vPort2 can provide that function. All other vPorts can be any mode except FCoE – vPort3 in Access mode, allowing only vLAN 40, untagged
  • 164. Deployment scenarios - UFP + FCoE + vLAG.fm Draft Document for Review May 1, 2014 2:10 pm 150 NIC Virtualization on IBM Flex System – vPort4 in Trunk mode, allowing VLANs 50 and 60 (VLAN 50 untagged) • vPort3 and vPort4 sharing vLAG aggregations on ports EXT12 and EXT13 on both CN4093’s for their uplinks Figure 6-10 shows how the components of this design come together. Figure 6-10 Example of vLAG aggregations upstream, UFP and FCoE using CN4093s 6.3.3 Use cases For customers desiring highly available upstream connections (vLAG), virtual NICs on the servers (UFP) and converged storage access (FCoE). As noted previously, none of these features are directly a requirement of the other (we can have vLAG without UFP, or UFP without FCoE, and so forth). They are just demonstrated together here for the purposes of showing a potentially flexible and robust design. 6.3.4 Configuration This section includes the configurations and steps necessary to configure the various components. This examples here will not include the upstream G8264 configurations, as that is not the focus of this paper (but it will include the configuration for the uplinks in the CN4093s toward the G8264s). Also not included here is the act of creating the LUNs that will be used for this process. It is assumed they already exist at the time this scenario is built.
  • 165. 151 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + FCoE + vLAG.fm The steps required to complete this scenario are broken up into five primary sections: 򐂰 “Host side enablement (UEFI Setup)” 򐂰 “Miscellaneous I/O Module settings” 򐂰 “vLAG and aggregation configurations” on page 152 򐂰 “UFP configuration” on page 154 򐂰 “FCoE configuration” on page 156 Host side enablement (UEFI Setup) For this example we will need to go into UEFI and configure the desired virtual NIC type (UFP) and set the personality to FCoE. Not shown will be the install of ESX and the configuration of the vSwitches and a test VM (images in Figure 6-10 on page 150 represent the final vSwitch vmnic usage). To configure the host to support UFP and FCoE, reboot the server and when prompted, press the F1 key to enter Setup. In Setup, go to System Settings  Network, then highlight the desired NIC and press Enter twice. This should take us to the Emulex NIC Selection menu. Change Personality to FCoE (assumes FCoE FoD key already installed) and change Multichannel Mode to Unified Fabric Port (UFP). After setting the FCoE and UFP Virtual NIC in UEFI, escape back out of UEFI setup, and save the configuration when prompted, and reboot the Compute Node. Detailed instructions and screen shots of this process can be found in Chapter 5, “NIC virtualization considerations on the server side” on page 75. Miscellaneous I/O Module settings The following are some preparatory steps before configuring the main features of this scenario. Some comments on these commands: 򐂰 In this example we are only using a limited subset of ports for example, INTA3, INTA13-INTA14, and so on, but in most cases many ports would be performing the same roles, so some of the commands shown here will impact both the ports we will be using to demonstrate this scenario, as well as ports that we will not be using in this specific scenario. 򐂰 Tagging needs to be enabled on all ports carrying FCoE VLANs, as well as on any ports carrying more then a single VLAN Important: Changing the Personality and MultiChannel modes effects all CNA ports on the ASIC associated with the one being changed. Meaning it is only necessary to set this in one place, to enable two CNA ports if this is the onboard Emulex or in two places for Quad port CN4054 NIC (CN4054 and Cn4054R have two ASICs). Important: While performing the configurations on the I/O Modules, all uplinks should be disconnected or disabled until instructed to bring the links up. Making certain configuration changes on a I/O Modules with live connections to an upstream network can cause instability in the network. Important: All switch configuration instructions assume we are starting from a factory default configuration on the I/O Modules. All configuration commands shown executed are from the conf t mode of the isCLI interface of the I/O Module.
  • 166. Deployment scenarios - UFP + FCoE + vLAG.fm Draft Document for Review May 1, 2014 2:10 pm 152 NIC Virtualization on IBM Flex System 򐂰 A host name is configured (for clarity) 򐂰 An idle logout timer is configured (for reference) 򐂰 Apply name to ports going to internal v7000 (for clarity) The commands used to perform these miscellaneous tasks can be seen in Example 6-11: Example 6-11 Example of preparing switch with base commands ! Enable tagging on all desired ports ! 1-28 = INTA1-INTB14, 43-44 = EXT1-EXT2 (vLAG ISL) ! 54-55 = EXT11-EXT12 (uplink) int port 1-28,43-44,54-55 tagging ! ! Add host name and set idle time out to 60 minutes hostname PF_CN4093a system idle 60 ! ! Add port names on INTA13 and INTA14 int port 13-14 name v7000_Storage Repeat the above steps for the second switch, changing hostname to PF_CN4093b. Once these base commands are applied we can proceed to creating the vLAG and aggregations. vLAG and aggregation configurations Configuring vLAG and aggregation is a multistep process and will include the following steps: 1. Create the aggregation for the vLAG ISL and set PVID to an unused VLAN (using an unused VLAN for the PVID on the vLAG ISL helps to increase stability of the ISL). We will be using LACP for all aggregations, but static aggregations could also have been utilized. All LACP keys are chosen to be unique for each aggregation and do not denote anything else special by the use of these specific LACP key numbers 2. Disable Spanning-tree on the PVID VLAN of the ISL (also helps ensure stability of the ISL) 3. Create the local aggregations on the uplinks 4. Configure the health check (in this example we will be using the EXTM ports back to back to provide the vLAG health check). Will be using some unused IP subnet (1.1.1.X/30) for this health check connection 5. Configure and enable vLAG 6. vLAG Tier ID must be unique from any upstream connecting vLAG pair and must be same for both CN4093 I/O Modules in the same vLAG pair 7. Once all configurations are complete, plug in back-to-back health check cable between EXTM ports. Plug in ISL links 8. Once ISL is up, plug in uplinks to upstream networks to complete the physical steps Important: All of the examples provided here can be directly cut and pasted into the I/O Module.
  • 167. 153 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + FCoE + vLAG.fm The commands used to perform these tasks are provided in Example 6-12. Example 6-12 Example of configuring vLAG and aggregations ! Create the ISL aggregation and set the PVID to an unused VLAN int port 43-44 lacp mode active lacp key 4344 pvid 4090 ! ! Exit from interface config mode and then globally disable instance of STP ! for ISL PVID VLAN exit no spanning-tree stp 26 enable spanning-tree stp 26 vlan 4090 ! Configure upstream aggregations (using EXT11 (53) for UFP tunnel uplink ! Using EXT12-EXT13 (54-55) for UFP trunk and access uplinks int port 53 lacp mode active lacp key 1111 ! int port 54-55 lacp mode active lacp key 1213 ! ! Configure EXTM ports for use as vLAG healthcheck ! Interface IP 127 is tied to EXTM int ip 127 ip address 1.1.1.1 255.255.255.252 enable ! Configure VLAG ! Hlthck points to IP of other CN4093 in this vLAG pair ! ISL adminkey is the admin keys on ports EXT1-EXT2 ! Other adminkeys are for uplink aggregations previously configured vlag enable vlag tier-id 11 vlag hlthchk peer-ip 1.1.1.2 vlag isl adminkey 4344 vlag adminkey 1111 enable vlag adminkey 1213 enable ! Once the above steps are complete, repeat for the second I/O Module, changing the following two lines in the above config: 򐂰 Change ip address 1.1.1.1 255.255.255.252 enable to ip address 1.1.1.2 255.255.255.252 enable 򐂰 Change vlag hlthchk peer-ip 1.1.1.2 to vlag hlthchk peer-ip 1.1.1.1
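To make the mirrored settings explicit, the following is a minimal sketch of just the lines that differ on the second I/O Module (all other commands from Example 6-12 are entered unchanged; the hostname was already set to PF_CN4093b in the earlier base configuration step):
! On PF_CN4093b only - swap the local and peer vLAG health check addresses
int ip 127
ip address 1.1.1.2 255.255.255.252 enable
!
vlag hlthchk peer-ip 1.1.1.1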
Once both switches are configured per the above, perform the following steps:
1. Bring up the ISL links between the pair of CN4093s (no shut EXT1-EXT2 and/or plug in the cables as necessary)
2. Bring up the management ports on both CN4093s (no shut EXTM and/or plug in the cable as necessary)
3. Confirm links on EXT1, EXT2 and EXTM ports on both CN4093s are up (show int status)
4. Confirm aggregation on EXT1-EXT2 is Up (show lacp info)
5. Confirm vLAG ISL and health check are up using the command show vlag info and confirm Health check is Up and ISL state is Up
6. Once vLAG and health checks are confirmed operational, bring up uplink aggregations on EXT11-EXT13 on both I/O Modules (no shut EXT11-EXT13 and/or plug in the cables as necessary)
7. Confirm links are up (show int status), aggregations are up (show lacp info) and vLAG shows state formed (show vlag info) for both upstream aggregations.
Details on the output of the above commands for correctly functioning I/O Modules are provided in 6.3.5, “Confirming operation of the environment” on page 158.
UFP configuration
In this step we enable and configure UFP on the INTA3 interface and add the desired VLANs to uplink ports to complete the path out for the UFP vPorts. Some comments on these steps:
򐂰 Before we start configuring vPorts, we enable CEE
– If a vPort is configured for FCoE, UFP cannot be enabled until CEE is enabled
– Enabling CEE automatically turns off standard flow control on all internal ports, switching them to the Priority Flow Control (PFC) used by CEE
– When changing flow control states, the ports are briefly shut/no shut automatically to force the new flow control state
򐂰 In this example we will be configuring four vPorts
– vPort 1 will be in UFP tunnel mode and will use a tunnel VLAN of 4091. 4091 will be the outer tag used on packets flowing on this tunnel, and will be stripped off on the uplink EXT11 interface using the tagpvid-ingress command.
– vPort 2 will be used for FCoE traffic and set for VLAN 1001 or 1002, depending on the switch. FCoE VLANs should be different on the two switches to reduce the likelihood of a fabric merge.
– vPort 3 will be configured as a simple access vPort, using an access/untagged VLAN of 40.
– vPort 4 will be configured as an 802.1Q trunk vPort, using an untagged VLAN of 50, and allowing a tagged VLAN of 60.
Important: It is assumed that the upstream connecting switches have already been properly configured for any necessary aggregations and vLAG/vPC before bringing up the links to the upstream network. Failure to ensure the upstream configuration is complete before plugging in cables can lead to a network down situation.
򐂰 The vPort bandwidths used in this example can be changed if desired, but it is recommended not to set the FCoE vPort 2 minimum bandwidth lower than 40%, to ensure that FCoE traffic is guaranteed the necessary bandwidth
򐂰 While in this example we show four different types of vPorts being used (tunnel, FCoE, access and trunk), we could have used different arrangements of types (for example, all trunk vPorts, or all tunnel or access vPorts), except for vPort 2: if FCoE is in use, vPort 2 must be the FCoE vPort
– For each tunnel mode vPort, assuming the tunnel is being broken out (outer tag stripped off) on the uplink, that tunnel must have a separate uplink path (it cannot share uplink paths with other tunnel mode vPorts or even access or trunk mode vPorts)
– All vPorts on a physical port must use unique VLANs
The commands used to configure UFP and some associated VLAN and tunnel parameters can be seen in Example 6-13:
Example 6-13 Example of configuring UFP and vPorts on INTA3
! Enabling CEE at this point as it must be enabled before enabling a UFP vPort
! that has FCoE configured
cee enable
! Create and configure all of the vPorts on INTA3 and enable UFP
ufp port INTA3 vport 1
network mode tunnel
network default-vlan 4091
qos bandwidth min 10
enable
ufp port INTA3 vport 2
network mode fcoe
network default-vlan 1001
qos bandwidth min 40
enable
ufp port INTA3 vport 3
network mode access
network default-vlan 40
qos bandwidth min 20
enable
ufp port INTA3 vport 4
network mode trunk
network default-vlan 50
qos bandwidth min 30
enable
ufp port INTA3 enable
ufp enable
! When UFP is enabled, it will automatically create and enable the assigned
! default-vlan for each vPort, and add the vPort as a member of that default VLAN
! Create any extra VLANs and assign VLANs to uplinks and ISL for failover paths
! VLANs 40 and 50 will have automatically been assigned to the vPorts with
! the default-vlan of the same. We need to now add the ISL and uplink ports
vlan 40
enable
member EXT1-EXT2,EXT12-EXT13
!
vlan 50
enable
member EXT1-EXT2,EXT12-EXT13
!
! For VLAN 60, this is the only non default-vlan VLAN we will be using
! so we must also manually add the vPort to this VLAN using the vmember command
vlan 60
enable
member EXT1-EXT2,EXT12-EXT13
vmember INTA3.4
!
! VLAN 4091 is our tunnel mode VLAN, and vPort1 is automatically a member, but we
! must add the ISL links and desired uplink as members to carry traffic in and out
vlan 4091
enable
member EXT1-EXT2,EXT11
! We will add the FCoE VLAN to desired ports in the next step.
! Set tagpvid-ingress on upstream port EXT11 to act as tunnel endpoint for vPort 1
! Will remove tunnel VLAN for outbound packets
! Will add tunnel VLAN for inbound packets
int port 53
tagpvid-ingress
Repeat the above steps for the second switch, changing the following line in the above config:
򐂰 Change the vPort 2 command network default-vlan 1001 to network default-vlan 1002.
Once both switches are configured per the above, perform the following checks:
1. Run the command show run | section ufp and confirm the UFP config is in place
2. Run the command show ufp info vport port inta3 and confirm all vPorts are up, carrying the desired VLANs, and in the desired modes
3. Run the command show int trunk and confirm the VLANs are correct and tagpvid-ingress is set on upstream port EXT11
Details on the proper output of the above commands for correctly functioning I/O Modules are provided in 6.3.5, “Confirming operation of the environment” on page 158.
Once these UFP commands are applied we can proceed to configuring FCoE.
FCoE configuration
In this section we run the commands necessary to enable FCoE. It is assumed that the preceding steps have already been completed, most importantly that CEE has already been enabled in a previous step.
Important: It is assumed that the steps to set the Multichannel mode to UFP and the personality to FCoE in the UEFI of Compute Node 3 have already been completed.
The steps we will be performing, and some comments on them, are as follows:
1. CEE was enabled in a previous step, but if it had not been, it must be enabled now
2. We will be using EXT15-EXT16 as our FCF ports
– We will not be attaching any cables to these ports in our example, as all FCoE traffic stays internal to the CN4093, between the host on INTA3 and the FCoE attached storage on ports INTA13-INTA14 - but we still must assign FC ports to communicate with the FC component of the CN4093
– Assigning a minimum of 2 FC ports is mandatory for any FCF function to work
– Assigning more FC ports provides higher bandwidth
– FC ports are always assigned in pairs (even numbers used)
– Only the 12 omni ports (EXT11-EXT22) can be assigned as FC ports
3. Configure the desired FCoE ports to carry VLAN 1001 or 1002 tagged
– The FCoE VLAN must be a tagged VLAN on any ports that carry it
4. Enable VLAN 1001 or 1002 for FCF functionality
– 1001 is considered an industry default FCoE VLAN, but almost any VLAN can be used for FCoE (you cannot use VLAN 1 and a few other reserved VLANs)
– Although it is possible to use the same FCoE VLAN on both switches (as long as that VLAN is not carried between the two switches), it is not recommended; using different VLANs ensures a fabric merge does not occur if the FCoE VLAN did accidentally get bridged between the I/O Modules
5. Disable the instance of spanning-tree associated with the FCoE VLAN
6. Configure any desired zoning
– We will be applying zoning that lets all hosts see all available LUNs. This is not what most production designs will incorporate and is only used here for simplified operation
– In normal zoning, whenever changes are made to zoning, the zoneset activate name xxxxx command (where xxxxx is the name of the zoneset to be activated) must be executed before any zoning changes take effect. The zoneset activate command is not necessary with the zoning syntax we are using in this scenario
7. Save the configuration to NVRAM when completed
The commands used to perform these tasks can be seen in Example 6-14:
Example 6-14 Example of configuring FCoE
! Enable FIP Snooping to ensure FCoE end to end security
fcoe fips enable
! Designate the desired omni ports as FC ports
system port EXT15,EXT16 type fc
! Name FCoE VLAN, add v7000 facing ports and FC ports and enable the FCF support
vlan 1001
enable
name FCoE_FAB-A
member INTA13-INTA14,EXT15-EXT16
fcf enable
! Disable STP on instance of STP associated with FCoE VLAN
no spanning-tree stp 112 enable
  • 172. Deployment scenarios - UFP + FCoE + vLAG.fm Draft Document for Review May 1, 2014 2:10 pm 158 NIC Virtualization on IBM Flex System spanning-tree stp 112 vlan 1001 ! Add catch-all zoning (not suitable for most production environments) zone default-zone permit zone name allow-all zoneset name default ! Save the configuration changes made to NVRAM copy running startup ! If prompted to save to flash press the y key ! If prompted to change to active config block, press the y key Repeat the above steps for the second switch, changing the following lines: 򐂰 Change vlan 1001 to vlan 1002 򐂰 Change name FCoE_FAB-A to name FCoE_FAB-B 򐂰 Change no spanning-tree stp 112 enable to no spanning-tree stp 113 enable 򐂰 Change spanning-tree stp 112 vlan 1001 to spanning-tree stp 113 vlan 1002 Once both switches are configured per the above, perform the following checks: 1. Run the command show fcoe fips fcf and confirm we see an FCF entry for each FC port that was configured. The FCF function should come up regardless of FCoE sessions. 2. Run the command show fcoe fips fcoe and confirm we see an FCoE session for each V7000 port on INTA13 and INTA14, and one for the server on INTA3. 3. Run the command show fcoe fips vlan and confirm desired interfaces are present for the FCoE VLAN. Details on proper output of above commands, along with other helpful troubleshooting commands for this environment are provided in “Confirming operation of the environment” 6.3.5 Confirming operation of the environment This section contains helpful commands and their associated output to ensure the scenario demonstrated is healthy and operating as expected. Note there are many helpful commands for many tasks, but this section is focused on the specific commands for this environment. Also note that the output for most of this information can also be obtained from a show tech command. Details on confirming the health of vLAG and aggregations The examples provided in Example 6-15 on page 159 represent truncated output and added embedded comments on that command output: The examples here are all run on the bay 3 I/O Module. When troubleshooting, one should always look at both I/O Modules in the design. Important: It is assumed that the OS has already been installed on the Compute Node and proper FCoE drivers are operational within the OS. It is also assumed the V7000 storage has been configured and is presenting storage to the host. Important: In an effort to reduce extraneous output, many non-essential lines have been removed from the output of the commands executed in this section. Where removed, they have been replaced by an ellipsis (...)
  • 173. 159 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + FCoE + vLAG.fm Example 6-15 Example of commands to check the health of vLAG (after all configs applied) ! First check the link status. Make sure ISL ports (EXT1-EXT2), INTA3, INTA13, ! INTA14, EXT1, are Link up, as well as EXTM Link up for the vLAG health check PF_CN4093a#show int status ------------------------------------------------------------------ Alias Port Speed Duplex Flow Ctrl Link Name ------- ---- ----- -------- --TX-----RX-- ------ ------ INTA3 3 10000 full no no up INTA3 ... INTA13 13 10000 full no no up v7000_Storage INTA14 14 10000 full no no up v7000_Storage ... EXT1 43 10000 full no no up EXT1 EXT2 44 10000 full no no up EXT2 ... EXT11 53 10000 full no no up EXT11 EXT12 54 10000 full no no up EXT12 EXT13 55 10000 full no no up EXT13 ... EXTM 65 1000 full no no up EXTM ... ! Confirm aggregation is now up not only for ISL but each one of the upsteam ! aggeregations PF_CN4093a#sho lacp info ------------------------------------------------------------------ port mode adminkey operkey selected prio aggr trunk status minlinks --------------------------------------------------------------------------------- ... EXT1 active 4344 4344 yes 32768 43 65 up 1 EXT2 active 4344 4344 yes 32768 43 65 up 1 ... EXT11 active 1111 1111 yes 32768 53 66 up 1 EXT12 active 1213 1213 yes 32768 54 67 up 1 EXT13 active 1213 1213 yes 32768 54 67 up 1 ... ! Confirm vLAG is fully healthy and both upstream vLAGed aggregations show state ! formed (formed = at least one uplink from each switch in a vLAGed aggregation is ! up and operationsl) PF_CN4093a#sho vlag info vLAG system MAC: 08:17:f4:c3:dd:0a Local MAC 74:99:75:5d:dc:00 Priority 0 Admin Role PRIMARY (Operational Role PRIMARY) Peer MAC a8:97:dc:10:44:00 Priority 0 Health local 1.1.1.1 peer 1.1.1.2 State UP ISL trunk id 65 ISL state Up Auto Recovery Interval: 300s (Finished) Startup Delay Interval: 120s (Finished) vLAG 65: config with admin key 1111, associated trunk down, state formed vLAG 66: config with admin key 1213, associated trunk down, state formed ! For reference, aside from state formed, there are three possible other states ! state local up = At least one link from the vLAG agg is up on this switch, but
  • 174. Deployment scenarios - UFP + FCoE + vLAG.fm Draft Document for Review May 1, 2014 2:10 pm 160 NIC Virtualization on IBM Flex System ! no links for this vLAG agg are up on the other switch ! state remote up = the reverse of local up, in other words, there is port up for ! this vLAG agg on the other switch, but none on this switch ! state down = no links on either switch are up for this vLAG agg Details on confirming the health of UFP The examples provided in Example 6-16 represent truncated output and added embedded comments on that command output for checking the health of UFP: Example 6-16 Example of commands to check the health of UFP (after all configs applied) ! First check that the desired UFP commands are present in the running config ! by filering on just showing the UFP sections PF_CN4093a#show run | section ufp ufp port INTA3 vport 1 network mode tunnel network default-vlan 4091 qos bandwidth min 10 enable exit ! ufp port INTA3 vport 2 network mode fcoe network default-vlan 1001 qos bandwidth min 40 enable exit ! ufp port INTA3 vport 3 network mode access network default-vlan 40 qos bandwidth min 20 enable exit ! ufp port INTA3 vport 4 network mode trunk network default-vlan 50 qos bandwidth min 30 enable exit ! ufp port INTA3 enable ! ufp enable ! ! Get a real time snapshot of vPort state and VLANs in use, as well as the mode ! configured for each vPort. PF_CN4093a#show ufp info vport port inta3 ------------------------------------------------------------------------------- vPort state evbprof mode svid defvlan deftag VLANs --------- ----- ------- ---- ---- ------- ------ ---------------------- INTA3.1 up dis tunnel 4091 4091 dis 4091 INTA3.2 up dis fcoe 1001 1001 dis 1001 INTA3.3 up dis access 4004 40 dis 40
  • 175. 161 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - UFP + FCoE + vLAG.fm INTA3.4 up dis trunk 4005 50 dis 50 60 ! Get real time infomation on VLAN allowed status as well as the status of ! tagpvid-ingress on the uplink for the tunnel vPort (EXT11) as seen by the ! accompanying # symbol. PF_CN4093a#show int trunk Alias Port Tag Type RMON Lrn Fld PVID NAME VLAN(s) ------- ---- --- ---------- ---- --- --- ------ -------------- ------------------ ... INTA3 3 y Internal d e e 1 INTA3 1 40 50 60 1001 4091 ... EXT1 43 y External d e e 4090 EXT1 1 40 50 60 4090 4091 EXT2 44 y External d e e 4090 EXT2 1 40 50 60 4090 4091 ... EXT11 53 n External d e e 4091# EXT11 4091 EXT12 54 y External d e e 1 EXT12 1 40 50 60 EXT13 55 y External d e e 1 EXT13 1 40 50 60 ... * = PVID is tagged. # = PVID is ingress tagged. Details on confirming the health of FCoE The examples provided in Example 6-17 represent truncated output and added embedded comments on that command output for checking the health of FCoE: Example 6-17 Example of commands to check health of FCoE (after all configs applied) ! Confirm the FCF is detected and has an entry for each of the FC ports assigned ! to this purpose PF_CN4093a#show fcoe fips fcf Total number of FCFs detected: 2 FCF MAC Port Vlan ----------------------------------- a8:97:dc:10:44:c7 EXT15 1001 a8:97:dc:10:44:c8 EXT16 1001 ! Confirm the FCoE sessions have been establised for each device that is using ! FCoE (the host on INTA3 and the ports toward the v7000 storage (INTA13 and ! INTA14) PF_CN4093a#show fcoe fips fcoe Total number of FCoE connections: 3 VN_PORT MAC FCF MAC Port Vlan ------------------------------------------------------ 0e:fc:00:01:11:00 a8:97:dc:10:44:c8 INTA3 1001 0e:fc:00:01:10:00 a8:97:dc:10:44:c7 INTA13 1001 0e:fc:00:01:10:01 a8:97:dc:10:44:c7 INTA14 1001 ! Check that all ports that need access to the FCoE VLAN are included: PF_CN4093a#show fcoe fips vlan
  • 176. Deployment scenarios - UFP + FCoE + vLAG.fm Draft Document for Review May 1, 2014 2:10 pm 162 NIC Virtualization on IBM Flex System Vlan App creator Ports ---- ----------------- ------------------------------------------------------- 1001 UFP INTA3 INTA13 INTA14 EXT15 EXT16 ! The following commands are only available when in full fabric mode (FCF enabled) ! and can be helpful when troubleshooting ! Make sure the FCoE database is populated with all hosts PF_CN4093a#show fcoe database ----------------------------------------------------------------------- VLAN FCID WWN MAC Port ----------------------------------------------------------------------- 1001 011100 10:00:00:00:c9:f8:0a:59 0e:fc:00:01:11:00 INTA3 1001 011000 50:05:07:68:05:08:03:70 0e:fc:00:01:10:00 INTA13 1001 011001 50:05:07:68:05:08:03:71 0e:fc:00:01:10:01 INTA14 Total number of entries = 3. ----------------------------------------------------------------------- ! Make sure we see a fabric login for each device: PF_CN4093a#show flogi database ----------------------------------------------------------------------- Port FCID Port-WWN Node-WWN ----------------------------------------------------------------------- INTA13 011000 50:05:07:68:05:08:03:70 50:05:07:68:05:00:03:70 INTA14 011001 50:05:07:68:05:08:03:71 50:05:07:68:05:00:03:71 INTA3 011100 10:00:00:00:c9:f8:0a:59 20:00:00:00:c9:f8:0a:59 Total number of entries = 3. ----------------------------------------------------------------------- For further commands on reviewing the health of an I/O Module see the appropriate Application Guide for that product. A good source for guides for PureFlex I/O Modules is the following link: http://publib.boulder.ibm.com/infocenter/flexsys/information/topic/com.ibm.acc.net workdevices.doc/network_iomodule.html
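In addition to the commands above, because this scenario relies on the catch-all zoning applied in Example 6-14, the zoning status can also be reviewed while the I/O Module is in full fabric (FCF) mode. This is a sketch only; the output varies with the configured zoning:
PF_CN4093a#show zone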
  • 177. 163 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - pNICvNIC Virtual Fabric + L2 6.4 pNIC and vNIC Virtual Fabric modes with Layer 2 Failover This section presents several scenarios for use of the Emulex LOM’s and mezzanine adapters in Flex System compute nodes. The presented scenarios are: 򐂰 Physical NIC mode with Layer 2 failover 򐂰 Physical NIC mode with Layer 2 failover and FCoE storage 򐂰 Virtual Fabric vNIC mode with failover 򐂰 Virtual Fabric vNIC mode with failover and FCoE storage Physical NIC mode presents each port of the Emulex LOM or card as a single 10Gb physical port. A two-port card would be seen by the OS of the compute node as two 10Gb NICs, each of which would go to a different embedded I/O Module in the Flex chassis. A four-port mezzanine card would be seen as four 10Gb ports; two ports would go to one I/O Module (for example bay 1) and two to another (bay 2), using internal ports INTAx and INTBx on the switches. To make full use of a four port card such as the CN4054, an upgrade would be required on embedded switch modules (EN4093R, CN4093, or SI4093). Physical NIC mode with FCoE changes the presentation of the card so that each physical port is seen as a NIC and a corresponding FCoE HBA. (It is also possible to select the iSCSI personality on the card, and the storage side would be seen as an iSCSI HBA. This scenario is not tested here.) Virtual Fabric vNIC mode, also known as IBM Virtual Fabric mode presents each port of an Emulex LOM or card as up to four virtualized ports. The bandwidth of these ports is configurable with both a minimum guaranteed bandwidth allocation and a maximum limit on bandwidth usage. The OS of the compute node will see up to eight NICs, with bandwidth equal to the maximum limit configured on the Emulex hardware. Even though the OS might see eight NICs, each with a bandwidth of 10Gb, there are still only two 10Gb physical ports behind them. Four of the vNICs will share the 10Gb bandwidth of each physical port. (If a four port card such as the EN4054 is used, vNIC will present up to sixteen virtualized NICs to the OS from each EN4054, but there are still only four 10Gb physical ports and the total available bandwidth is 40Gb.) Virtual Fabric vNIC mode with FCoE reserves one of the four vNIC instances for each physical port for storage networking. In this case, the OS will see fewer virtualized NIC instances but will see the storage functionality reflected as an HBA. For example, a two port LOM configured in this way would be seen by the OS as six virtualized NIC instances and a two port HBA. The two port LOM still has only two physical 10Gb ports, and each one would be shared by three vNIC instances and one HBA. As in Physical NIC mode, an iSCSI personality is also available. Layer 2 Failover is a configurable function of most of the embedded switch I/O Modules on the Flex System chassis. It allows the state of a set of ports - typically external ports which connect to an upstream network - to control the state of other ports, typically internal facing ports which connect across the chassis backplane to compute nodes. This feature is typically used to protect against a specific type of network failure which can occur in chassis-based systems, where an embedded switch is operational but disconnected from the remainder of the network. 
Layer 2 failover can administratively disable server-facing ports when such a failure occurs, triggering the servers’ NIC teaming (or bonding) capability to use a surviving port which still has a viable connection to the network.
6.4.1 Components
The testing in this chapter was done using the following hardware and software:
򐂰 Flex System Enterprise Chassis
򐂰 x240 Compute Node in bay 1
– Running ESX 5.1
– Dual port Emulex LOM CNA
– DS4800 external storage attached via FC ports on the G8264CS switches
򐂰 Two EN4093s in I/O Module bays 1 and 2
– Both with Upgrade 1 FoD installed
򐂰 Two G8264CS switches acting as upstream Ethernet connectivity out of the vLAG pair of EN4093s
– Providing FCF function and physical connectivity to the DS4800 on Fibre Channel port 53
6.4.2 Topologies
The base topology for the scenarios presented in this section is shown in Figure 6-11 on page 165 and shows the connections between the components listed above. Specific topology diagrams are included in the sections below for specific scenarios.
Note: There are two distinct ways to configure L2 failover on the 4093 switches. The failover command and its associated subcommands and operands operate on full physical ports, and have been enhanced to function for UFP vPorts as well. There is also a failover option within the configuration of a vNIC group; this option allows a failure in the uplink associated with a vNIC group to cause the vNIC members of that group to be administratively disabled. The vmember option of the failover command, which is intended for UFP vPorts, will allow a vNIC instance to be specified but it will not provide the desired failover function.
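As a quick illustration of the two forms described in the note above, the following minimal sketch contrasts them. The port names, LACP key, VLAN, and group number are illustrative only; complete working examples appear later in this section:
! Form 1: a standard failover trigger monitoring an uplink aggregation
failover enable
failover trigger 1 amon admin-key 5757
failover trigger 1 enable
!
! Form 2: the failover option inside a vNIC group (Virtual Fabric vNIC mode only)
vnic vnicgroup 1
vlan 3001
member INTA1.1
port ext5
failover
enable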
Figure 6-11 Base topology for scenarios
6.4.3 Use cases
Physical NIC mode (pNIC) is the default for the Flex environment. It presents the LOM and NIC mezzanine cards to the server’s OS with the same number of ports as the card actually has (2-port or 4-port). In this mode, converged networking can be enabled, so that these cards present two or four NIC ports and two or four HBA ports for storage (FCoE or iSCSI).
Redundancy can be achieved in pNIC mode for data networking through the use of NIC teaming options on the various operating systems. The storage protocols each have their own multi-pathing options which provide a similar capability as long as both HBA ports have access to the storage LUNs. The embedded and top-of-rack switches can be configured with the failover command, which works in concert with NIC teaming. This scenario would use active-standby teaming on Windows and Linux; it could use a form of active-active teaming with VMware. These options are discussed in 5.3, “Utilizing physical and virtual NICs in the OS” on page 115.
In addition, with Virtual Link Aggregation (vLAG) on the switches, active-active NIC teaming modes can be supported. This will typically provide a more rapid failover and fail-back.
  • 180. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 166 NIC Virtualization on IBM Flex System Virtual Fabric vNIC mode was the first virtualization option available from Emulex and IBM. It allows the Emulex converged NIC to be seen by operating systems as four NIC ports per physical port, or three NIC ports and one HBA per physical port. There are topology constraints in Virtual Fabric vNIC mode which are largely relaxed in the newer UFP virtualization mode which is recommended for new implementations. UFP is discussed in section 6.3, “UFP mode virtual NIC with vLAG and FCoE”. 6.4.4 Configurations The following configuration options are covered: 򐂰 “Physical NIC mode” 򐂰 “Use of vLAG with failover” on page 167 򐂰 “Physical NIC mode with FCoE storage” on page 168 򐂰 “Virtual Fabric vNIC mode” on page 174 򐂰 “Virtual Fabric vNIC mode with FCoE” on page 176 Physical NIC mode The failover function on the EN4093R switches can be configured on static or dynamic (LACP) aggregations. If it is desired to use auto monitoring (amon) then a single port can be configured as an aggregation and then configured into a failover trigger. The configuration would be done as follows, assuming that the uplink ports to be monitored are EXT5 and EXT7. (The upstream switch would have to configure LACP on the corresponding ports.) With this configuration, when both EXT5 and EXT7 fail, internally facing ports with the same VLANs will be administratively brought down. The limit option shown can be used to cause the internal ports to be brought down when either EXT5 or EXT7 fails - that is, when there are one or fewer ports active. The commands are shown in Example 6-18. Example 6-18 Failover configuration - pNIC mode - Auto monitor interface port EXT5,EXT7 lacp key 5757 lacp mode active failover enable failover trigger 1 amon admin-key 5757 failover trigger 1 enable failover trigger 1 limit 1 (optional) It is sometimes desirable to configure failover with more flexibility than the amon option provides. This can be done with manual configuration, also known as mmon. A configuration to do the same failover as is shown above using manual monitoring is shown below. Note that the controlled ports are explicitly specified, and can be a subset of the internal facing ports, or can include external ports such as when a server is connected to them. In Example 6-19, only ports INTA1 and INTA2 are to be disabled in the event of an uplink failure. Example 6-19 Failover configuration - pNIC mode - Manual monitor interface port EXT5,EXT7 lacp key 5757 lacp mode active failover enable failover trigger 1 mmon monitor admin-key 5757
  • 181. 167 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - pNICvNIC Virtual Fabric + L2 failover trigger 1 mmon control member INTA1,INTA2 failover trigger 1 enable failover trigger 1 limit 1 (optional) Manual monitor failover can also be configured to monitor individual ports with the following command syntax: failover trigger 1 mmon monitor member EXT5,EXT7. Multiple triggers can be configured but a given resource - one or more ports - can only be controlled by one trigger at a time. A given trigger instance number can be either in amon or mmon mode. Example 6-20 shows manual monitoring of a static Port Channel. Example 6-20 Failover configuration - manual with static PortChannel portchannel 10 port EXT5,EXT7 portchannel 10 enable failover enable failover trigger 2 mmon monitor portchannel 10 failover trigger 2 mmon control member INTA1,INTA2 failover trigger 2 enable failover trigger 2 limit 1 (optional) Use of vLAG with failover The vLAG feature allows a port aggregation to be connected from a switch, including an EN4093 switch, to a pair of upstream switches which are connected and configured appropriately. This function is supported for both static and dynamic link aggregations. Since the failover feature is intended for failures where a server NIC is connected to a switch which has no uplink path, it is less useful when vLAG is used between a pair of 4093’s. This is because if the uplink from a 4093 fails in such a topology, traffic will cross the inter-switch link (ISL) configured as part of vLAG and use the uplink from the other 4093. If both 4093’s uplink ports fail at the same time, then there is no uplink path available from the chassis, and the failover feature will not help. However, failover can be configured to bring down an internal port when both the uplinks and the ISL ports fail (which is likely to be a very rare event); this is shown in Example 6-21. Example 6-21 Failover configuration when vLAG is in use !*** Uplink ports *** int port EXT5,EXT7 lacp key 5757 lacp mode active ! *** ISL ports *** int port ext9,ext10 lacp key 910 lacp mode active !*** vLAG configuration *** vlag enable vlag tier-id 20 vlag isl adminkey 910 !vlag hlthchk ... typically uses EXTM port and interface 127 on embedded switches vlag adminkey 5757 enable
  • 182. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 168 NIC Virtualization on IBM Flex System failover enable failover trigger 3 mmon monitor admin-key 5757 failover trigger 3 mmon monitor admin-key 910 failover trigger 3 mmon control member INTA1,INTA2.... failover trigger 3 enable Physical NIC mode with FCoE storage Physical NIC mode with storage is not very different from pNIC with no storage; the difference is that there is a dedicated VLAN for the storage traffic which must be carried to a Fibre-Channel Forwarder (FCF), which is where FC and Ethernet addressing is correlated and where FCoE traffic can be converted to standard FC traffic if the topology calls for this. Failover is configured in the same way with FCoE in use as it is without it. Uplink and downlink (server-facing) ports should be configured to carry the FCoE VLAN and the cee enable and fips enable command need to be part of the configuration. On a CN4093 or G8264CS, additional configuration is necessary to configure the Omniports and the FCF function; this is discussed under , “FCoE configuration” on page 156. Design choices For pNIC mode (or vNIC) with storage, it is generally suggested that the two HBA ports and the associated switches use different FCoE VLANs, and if vLAG is in use in such a topology, then the FCoE VLANs should not cross the ISL between the vLAG partner switches. This works well with the typical SAN design where redundancy is provided by having two distinct SAN networks (SAN-A, SAN-B) which can both reach the physical storage but which share few or no components between the servers and the storage. It is possible to either send FCoE traffic on the same uplinks as data traffic, or to use separate uplinks for the different types of traffic. In the tested scenario, storage and data traffic were both forwarded to the same upstream switches, but this is not required. Even when the traffic is sent to the same upstream switches, the option to segregate the two types of traffic is available. Topologies which show this and relevant parts of the switch configurations are shown in Figure 6-12 on page 169 and Figure 6-13 on page 170. In the configuration examples shown in Example 6-22 on page 170 and Example 6-23 on page 172, VLANs 1001 and 1002 (on the second EN4093) are used to carry FCoE traffic and VLANs 1 and 2 are carrying data traffic. The traffic could be segregated by changing the configuration in the following ways: 򐂰 On the EN4093 switches: – Breaking the aggregation between links EXT5 and EXT7 which uses LACP key 5757 – Assigning VLAN 1001 (or 1002) to EXT5 and VLAN 1 and 2 to EXT7 (or vice-versa). 򐂰 On the G8264CS switches: – Breaking the aggregation between links 42 and 52 which uses LACP key 4252 – Assigning the VLANs to the links to match what was done on the EN4093s
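As a rough sketch of the EN4093 side of that segregation (illustrative only; the negated forms shown assume the standard isCLI "no" syntax and should be verified against the Application Guide for the running release), the changes amount to breaking the shared aggregation and splitting the VLAN membership between the two uplinks:
! Break the shared EXT5/EXT7 aggregation so each uplink can be used on its own
interface port EXT5
lacp mode off
exit
interface port EXT7
lacp mode off
exit
! Carry only the FCoE VLAN on EXT5 and only the data VLANs on EXT7
vlan 1001
no member EXT7
member INTA1,EXT5
vlan 2
no member EXT5
member INTA1,EXT7,EXT9-EXT10
! Matching changes (removing LACP key 4252 and adjusting the allowed VLAN lists)
! are then made on the G8264CS downlink ports 42 and 52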
  • 183. 169 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - pNICvNIC Virtual Fabric + L2 Figure 6-12 pNIC with FCoE: single shared uplink aggregation
  • 184. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 170 NIC Virtualization on IBM Flex System Figure 6-13 pNIC + FCoE - with FCoE traffic on segregated uplink Example 6-22 EN4093 config excerpts for vLAG topology with pNIC and FCoE version 7.7.9 switch-type IBM Flex System Fabric EN4093R 10Gb Scalable Switch(Upgrade1) ... interface port INTA1 tagging no flowcontrol exit ... ! interface port EXT5 tagging exit ... ! interface port EXT7 tagging exit ...
  • 185. 171 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - pNICvNIC Virtual Fabric + L2 interface port EXT9 tagging pvid 4090 exit ! interface port EXT10 tagging pvid 4090 exit ! vlan 2 enable name VLAN 2 member INTA1,EXT5,EXT7,EXT9-EXT10 ... ! Note: SAN-B will use VLAN 1002 here vlan 1001 enable name FCoE SAN-A member INTA1,EXT5,EXT7 ! vlan 4090 enable name ISL member EXT9-EXT10 ! ! portchannel 10 port EXT9 portchannel 10 port EXT10 portchannel 10 enable ! ! ! interface port EXT5 no spanning-tree stp 112 enable exit ! interface port EXT7 no spanning-tree stp 112 enable exit ... ! interface port EXT5 lacp mode active lacp key 5757 ! ... ! interface port EXT7 lacp mode active lacp key 5757 ! vlag enable vlag tier-id 20 vlag isl portchannel 10
  • 186. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 172 NIC Virtualization on IBM Flex System vlag hlthchk peer-ip 1.1.1.22 vlag adminkey 5757 enable no fcoe fips automatic-vlan ! fcoe fips enable cee enable ! interface ip 127 ip address 1.1.1.11 255.255.255.0 enable exit Example 6-23 G8264CS config for vLAG topology with pNIC and FCoE version 7.8.1 switch-type IBM Networking Operating System RackSwitch G8264CS ... system port 53,54 type fc interface fc 53 switchport trunk allowed vlan 1,1001 interface fc 54 switchport trunk allowed vlan 1,1001 ! ... interface port 17 description ISL switchport mode trunk switchport trunk allowed vlan 1-2,10,4090 switchport trunk native vlan 4090 exit ! interface port 18 description ISL switchport mode trunk switchport trunk allowed vlan 1-2,10,4090 switchport trunk native vlan 4090 exit ... interface port 42 description 4093 downlink switchport mode trunk switchport trunk allowed vlan 1-2,1001 exit ! interface port 52 description 4093 downlink switchport mode trunk switchport trunk allowed vlan 1-2,1001 exit ! vlan 2 name VLAN 2
  • 187. 173 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - pNICvNIC Virtual Fabric + L2 ! ! ! note that SAN-B (8264-2) will use vlan 1002 here and in the allowed vlan statements vlan 1001 name FCoE SAN-A fcf enable ! vlan 4090 name ISL ... ! interface port 17 lacp mode active lacp key 1718 ! interface port 18 lacp mode active lacp key 1718 ! interface port 42 lacp mode active lacp key 4252 ! interface port 52 lacp mode active lacp key 4252 ! ! ! vlag enable vlag tier-id 10 vlag hlthchk peer-ip 9.42.171.24 vlag isl adminkey 1718 vlag adminkey 4252 enable ! fcoe fips enable cee enable! ! zone default-zone permit ! ! ! ! ! interface ip 128 ip address 9.42.171.23 255.255.254.0 enable exit ! ip gateway 4 address 9.42.170.1 ip gateway 4 enable
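Once the EN4093 and G8264CS configurations above are in place, the health of the environment can be checked with the same commands used in 6.3.5, “Confirming operation of the environment”. The following is a brief checklist sketch only (output omitted); the FCF, fabric login, and zoning commands are run on the G8264CS because that is where the FCF lives in this scenario:
! On the EN4093s - aggregation and vLAG state
show lacp info
show vlag info
! On the G8264CS - FCF, FCoE sessions, fabric logins, and zoning
show fcoe fips fcf
show fcoe fips fcoe
show fcoe database
show flogi database
show zone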
  • 188. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 174 NIC Virtualization on IBM Flex System Virtual Fabric vNIC mode Virtual Fabric vNIC (or vNIC1) mode is the first NIC virtualization mode developed for use with Emulex adapters on IBM servers. It has largely been supplanted by UFP mode, which is more versatile. However, Virtual Fabric vNIC has its own failover configuration commands which are part of the vNIC group configuration. vNICs, vNIC groups, and uplinks An overall discussion of the available options for vNIC and their initial configuration can be found starting in Section 5.1, “Introduction to enabling Virtual NICs on the server” on page 76. Virtual Fabric vNIC mode introduces the following concepts: 򐂰 vNIC - an instance of a virtualized NIC which is associated with a specific physical port and which appears as a NIC or as an HBA as seen by a server’s OS or hypervisor 򐂰 vNIC group - a set of vNIC’s which are used together and which are each associated with a different physical port 򐂰 vNIC group uplink - a single port or a static or LACP port aggregation associated with a vNIC group 򐂰 vNIC group VLAN - a VLAN used for tunneling traffic from the vNICs and any non-virtualized internal ports associated the group through the group’s uplink to the wider network. Configuration of the Virtual Fabric vNIC feature is done according to the following requirements: 򐂰 A physical port can have up to four vNICs activated. No more than one can be for FCoE traffic and it will always be vNIC instance 2. 򐂰 Bandwidth of vNIC’s is specified in 100 Mb increments; each increment is also one percent of the bandwidth of a 10Gb port. Minimum bandwidth is 1 Gb which is specified as 10 in the configuration. 򐂰 Each data vNIC must be associated with a vNIC group. FCoE vNIC instances can not be associated with a vNIC group. 򐂰 A vNIC group can have a single logical uplink, as discussed above. If there is no requirement for traffic from the group to be forwarded outside of the chassis, then an uplink is not needed. 򐂰 Each vNIC group must be configured with a vNIC VLAN. This VLAN is never seen outside of the embedded switch in the chassis, and is used as an outer tag for 802-1q double-tagging by the switching ASIC. 򐂰 vNIC group VLAN numbers are not strictly required to be unique within the network, but making them unique may avoid confusion when troubleshooting. Note: The command “zone default-zone permit” allows any server to access any storage where the LUN is made accessible. However, the default zoning configuration when FCF mode is used on a G8264CS or a CN4093 is to deny all access. Therefore either explicit zoning or the default-zone option is necessary. The status of zoning can be seen with the show zone command on converged switches. This does not apply when NPV mode is used; in that case, zoning is configured on an upstream SAN switch.
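Returning to the bandwidth requirement listed above, the following minimal sketch (values are illustrative only; Example 6-24 shows the full context) allocates a 4 Gb guaranteed minimum to one vNIC instance. The value is expressed in 100 Mb units, so 40 means 4 Gb and 10 is the smallest allowed value (1 Gb):
vnic port INTA1 index 1
bandwidth 40
enable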
  • 189. 175 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - pNICvNIC Virtual Fabric + L2 NIC teaming configuration on servers NIC teaming is a feature included in current operating systems which allows multiple physical or virtual NICs to be treated as a single logical interface. Teaming can be active/active or active/standby, and the capabilities of the various teaming modes differ across the various operating systems. A discussion of teaming features and their configuration can be found in section 5.3.2, “OS side teaming/bonding and upstream network requirements” on page 122. vNIC sample failover configuration A sample failover configuration is shown in Example 6-24, including the associated vNIC and vNIC group configuration commands. In this configuration, ports EXT5 and EXT7 are uplink ports. Only one server (in slot 1 and reached via port INTA1) is shown; the configuration would be similar for other servers but the bandwidth allocations need not be identical. This configuration fragment would typically be used identically in each of a pair of 4093 switches in a chassis, especially when failover is used. Example 6-24 vNIC configuration with failover configured as part of the vNIC group vnic enable vnic port INTA1 index 1 enable bandwidth 40 vnic port INTA1 index 2 enable bandwidth 30 vnic port INTA1 index 3 enable bandwidth 20 vnic port INTA1 index 4 enable bandwidth 10 vnic vnicgroup 1 vlan 3001 member INTA1.1 (additional server vnics can go here) port ext5 failover enable vnic vnicgroup 2 vlan 3002 member INTA1.2 (additional server vnics can go here) failover enable (vnic groups 3 and 4 would be configured similarly and would need additional uplink ports to carry traffic outside the chassis)
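For completeness, a sketch of what one of those additional groups might look like follows. The group VLAN and uplink port are assumptions for illustration only; in dedicated uplink mode each vNIC group needs its own otherwise unused uplink port or aggregation:
vnic vnicgroup 3
vlan 3003
member INTA1.3
port ext6
failover
enable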
  • 190. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 176 NIC Virtualization on IBM Flex System The failover command in the above example is used instead of the failover configuration shown elsewhere in this section when Virtual Fabric vNIC is used. vNIC failover would function as follows for vNIC groups where it is configured: 򐂰 The uplink port - which can also be a static portchannel or LACP portchannel specified by an adminkey - is monitored. 򐂰 If the uplink for a vnic group fails or is blocked due to spanning tree, then the vnic members of the group would be administratively brought down. 򐂰 If the other switch in the chassis is configured with the same vnic and vnic group configuration, and if the corresponding uplink in that switch is up, and if NIC teaming is configured appropriately on the servers, then traffic will use the path through the other switch. 򐂰 Options which are available in the standard failover trigger configuration, such as the limit option, VLAN sensitivity, and the manual monitoring options are not available in the vNIC failover feature. However, UFP uses standard failover triggers. 򐂰 vLAG can not be used with Virtual Fabric vNIC mode. vNIC failover and shared uplink mode Shared uplink mode with Virtual Fabric vNIC allows multiple vnic groups to share an uplink port. This mode is enabled with the vnic uplink-share command, and by specifying the uplink port (or aggregation) in those vNIC groups where it is desired. The vnic failover command is specified in the same way when shared uplink mode is in use. Shared uplink mode, like dedicated uplink mode, does not allow multiple uplinks to be specified in a given vnic group. A fuller discussion of shared uplink mode and a comparison with the default dedicated uplink mode can be found in section 4.1.1, “Virtual Fabric mode vNIC” on page 57. vLAG considerations vLAG cannot be used on ports or vNIC instances which are members of a vNIC group. A vNIC group can have only one uplink, and so it would not be possible to configure both an uplink and an ISL to connect to a vLAG peer switch. A pair of upstream switches such as the G8264s used in our testing can run vLAG between them and connect to the uplink PortChannels of a pair of vNIC groups on different switches such as EN4093’s. The EN4093’s cannot detect that vLAG is in use at the other end of their uplinks. For this to work, the servers supported by the vNIC groups must configure the same VLANS on corresponding vNIC’s connecting to each physical port. Virtual Fabric vNIC mode with FCoE FCoE traffic is configured in Virtual Fabric vNIC mode as follows: • FCoE traffic, if enabled, is always on vNIC instance 2. • When instance 2 is used for FCoE, it is not included in any vNIC group. • Since the FCoE instance is not configured in a vNIC group, failover for FCoE traffic is not configured with the vnic group failover option. • FCoE traffic does not flow over an uplink configured for a vnic group. It can flow over an uplink in shared uplink mode. • The standard failover trigger commands can be used to implement failover for FCoE traffic if desired, but if this is done the entire server-facing port will be brought down, not only the FCoE vNIC.
An example config of Virtual Fabric vNIC with FCoE is shown in Example 6-25. In this example, port EXT7 is used to carry FCoE traffic upstream to the 8264CS switch where the FCF is.
Example 6-25 Virtual Fabric vNIC with FCoE
vnic enable
vnic port INTA1 index 1
enable
bandwidth 40
vnic port INTA1 index 2
enable
bandwidth 30
vnic port INTA1 index 3
enable
bandwidth 20
vnic port INTA1 index 4
enable
bandwidth 10
vnic vnicgroup 1
vlan 3001
member INTA1.1
(additional server vnics can go here)
port ext5
failover
enable
.... the FCoE vnic cannot be added to a vNIC group
.... additional groups for data vNICs would be configured here
failover trigger 3 mmon monitor member EXT7
failover trigger 3 mmon control member INTA1[,INTA2 ... etc.]
failover trigger 3 enable
... configuration for FCoE and for FCoE uplink to G8264CS....
cee enable
fcoe fips enable
int port ext7
vlan 1002
member ext7
The above configuration implements failover for both the data and FCoE vNIC instances, but it behaves in the following ways:
򐂰 If port EXT5 fails, vNIC INTA1.1 and the others configured in vNIC group 1 (which would be on other servers) would be administratively down. The same would happen if an uplink port configured in vNIC groups 3 or 4 should fail; the vNICs associated with those groups would be disabled.
򐂰 If the FCoE uplink, port EXT7, fails, then port INTA1 and any other ports specified in the failover trigger would be administratively down. This includes all of the vNIC instances configured on those ports even though they might still have a working path to the upstream network.
  • 192. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 178 NIC Virtualization on IBM Flex System Because a failure on the FCoE uplink port would bring down all of the vNIC instances rather than just the FCoE instance on vNIC 2, this configuration might not be desirable. Our testing on ESX showed that FCoE has failover mechanisms of its own on the server. If the HBA ports are configured so that both of them have access to the storage LUNs, and one of them loses connectivity to the storage, such as due to an uplink failure, storage access will fail over to the other HBA. The tests performed showed that there might be a slight advantage in how long it takes to detect that storage connectivity is lost if the server-facing port (e.g. INTA1) is brought down, but it did not appear to be a significant advantage. A diagram of the topology in dedicated uplink mode is shown in Figure 6-14. Figure 6-14 vNIC with FCoE: dedicated uplink mode vNIC with FCoE and shared uplink mode The configuration above would be changed in the following ways to use shared-uplink vNIC: 򐂰 On the EN4093’s – The vnic uplink-share command would be used to enable shared uplink mode – The VLAN for vnic group 1 would be set to VLAN 2. All vnic instances which are assigned to group 1 would only carry VLAN 2. – Ports EXT5 and EXT7 could optionally be aggregated together.
– The ports used to uplink vNIC group 1 could also carry traffic from other vNIC groups, on their group VLANs.
– The uplink ports or aggregations for group 1 must be configured to include the FCoE VLAN, 1001 or 1002.
򐂰 On the G8264CS switches:
– The port or aggregation used to downlink to the EN4093s must match its aggregation type and status and its VLAN membership, including VLAN 1001 or 1002.
A topology diagram with shared uplink mode is shown in Figure 6-15.
Figure 6-15 vNIC with FCoE: shared uplink mode
Design Choices
The choice between shared uplink mode and dedicated uplink mode is similar to the choice between a single uplink and uplinks which segregate data and FCoE traffic, discussed in the section on pNIC mode. Shared uplink mode allows data and FCoE traffic to traverse the same uplink, but it restricts each data-bearing vNIC connected to a server to carrying only a single VLAN. UFP allows either shared uplinks or distinct uplinks without these restrictions.
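Pulling the shared uplink changes above together, a minimal sketch for the EN4093 side might look like the following. The VLAN numbers and LACP key are taken from this scenario’s examples and are illustrative only; the matching aggregation and VLANs, including the FCoE VLAN, must also exist on the G8264CS downlink:
vnic enable
vnic uplink-share
! Aggregate the shared uplink and reference it from the group by its LACP key
interface port EXT5,EXT7
lacp mode active
lacp key 555
exit
! Group 1 now carries only VLAN 2 for its data vNICs
vnic vnicgroup 1
vlan 2
enable
failover
member INTA1.1
key 555
exit
! The shared uplink must also carry the FCoE VLAN
vlan 1001
member EXT5,EXT7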
  • 194. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 180 NIC Virtualization on IBM Flex System 6.4.5 Verifying operation This section discusses commands that help verify correct operations. Failover in pNIC mode The failover trigger commands can be checked using the show failover command, as shown in Example 6-26 and Example 6-27. Example 6-26 Show Failover command output - Manual Monitor slot-1#sho failover trigger 1 Current Trigger 1 setting: enabled limit 1 Auto Monitor settings: Manual Monitor settings: LACP port adminkey 7575 Manual Control settings: ports INTA1 INTA2 Example 6-27 Show Failover command output - Auto Monitor slot-2#show failover trigger 1 Current Trigger 1 setting: enabled limit 1 Auto Monitor settings: LACP port adminkey 5757 Manual Monitor settings: Manual Control settings: When a failover occurs, the following messages are seen. Note that in this case, FCoE was part of the configuration and the FCoE session failure also resulted in a message shown in Example 6-28: Example 6-28 Messages resulting from a failover event slot-2(config)#int port ext7 slot-2(config-if)#shut slot-2(config-if)# Apr 15 16:02:26 slot-2 NOTICE link: link down on port EXT7 Apr 15 16:02:26 slot-2 NOTICE lacp: LACP is down on port EXT7 Apr 15 16:02:26 slot-2 WARNING failover: Trigger 1 is down, control ports are auto disabled. Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA1 Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA3 Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA4 Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA10 Apr 15 16:02:45 slot-2 NOTICE fcoe: FCOE connection between VN_PORT 0e:fc:00:01:0c:00 and FCF a8:97:dc:44:eb:c3 is down.
  • 195. 181 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - pNICvNIC Virtual Fabric + L2 When internal (or other) ports are down due to a failover, they appear as disabled in a show interface link command, as shown in Figure 6-16. Figure 6-16 Links disabled after failover When a failed link recovers, messages such as the following shown in Figure 6-17 are seen. Figure 6-17 Messages resulting from failover recovery slot-2(config-if)#sho int link ------------------------------------------------------------------ Alias Port Speed Duplex Flow Ctrl Link Name ------- ---- ----- -------- --TX-----RX-- ------ ------ INTA1 1 1G/10G full no no disabled INTA1 INTA2 2 1G/10G full no no disabled INTA2 ..... INTA14 14 1G/10G full no no disabled INTA14 slot-2(config-if)#int port ext7 slot-2(config-if)#no shut slot-2(config-if)# Apr 15 16:07:35 slot-2 NOTICE link: link up on port EXT7 Apr 15 16:07:35 slot-2 NOTICE dcbx: Detected DCBX peer on port EXT7 Apr 15 16:07:39 slot-2 NOTICE lacp: LACP is up on port EXT7 Apr 15 16:07:39 slot-2 NOTICE failover: Trigger 1 is up, control ports are auto controlled. Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA1 Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA3 Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA4 Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA10 Apr 15 16:07:42 slot-2 NOTICE fcoe: FCOE connection between VN_PORT 0e:fc:00:01:0c:00 and FCF a8:97:dc:44:eb:c3 has been established.
  • 196. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 182 NIC Virtualization on IBM Flex System The status of a port that is disabled by failover can also be seen on the server, both as a NIC and as an HBA, as shown in Figure 6-18 and Figure 6-19. Port vmnic0 is still active and carrying traffic, and still has paths to the storage array. Figure 6-18 VMware display showing port down Figure 6-19 VMware storage adapter showing no paths to storage Failover in vNIC mode The FCoE vNIC instance (INTA1.2) still requires a dedicated uplink unless shared-uplink mode is used. The failover status of a non-FCoE vNIC is shown by the show vnic vnicgroup command; its output before the failure is shown in Figure 6-20 on page 183. However, there is no console message that shows that the associated vNICs have been brought down; their state can be seen only by entering the same command again after the failure.
  • 197. 183 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - pNICvNIC Virtual Fabric + L2 Figure 6-20 Show vNIC vnicgroup before failover As shown in Figure 6-21, there is no message showing that INTA1.1 has been brought down. Figure 6-21 Messages resulting from shutting down vnic group’s uplink ports slot-2#sho vnic vnicg 1 ------------------------------------------------------------------------ vNIC Group 1: enabled ------------------------------------------------------------------------ VLAN : 3901 Failover : enabled vNIC Link ---------- --------- INTA1.1 up Port Link ---------- --------- UplinkPort Link ---------- --------- EXT5* up * = The uplink port has LACP admin key 555 slot-2(config)#int port ext5,ext6 slot-2(config-if)#shut slot-2(config-if)# Apr 15 18:51:55 slot-2 NOTICE link: link down on port EXT5 Apr 15 18:51:55 slot-2 NOTICE lacp: LACP is down on port EXT5
  • 198. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 184 NIC Virtualization on IBM Flex System The command output, however, does show that the uplink port, EXT5, is down and that the associated vNIC members of the group have been disabled, as shown in Figure 6-22. Figure 6-22 show vnic vnicgroup after failover slot-1(config-if)#sho vnic vnicg 1 ------------------------------------------------------------------------ vNIC Group 1: enabled ------------------------------------------------------------------------ VLAN : 3901 Failover : enabled vNIC Link ---------- --------- INTA1.1 disabled Port Link ---------- --------- UplinkPort Link ---------- --------- EXT5* down * = The uplink port has LACP admin key 555
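The vNIC group settings behind the output in Figure 6-20 and Figure 6-22 follow the same vnic vnicgroup syntax that is shown later in Figure 6-23. A sketch is shown below; the group VLAN (3901) and LACP admin key (555) are the values visible in the show output above, and the failover option inside the group is what causes the member vNICs to be marked disabled when the group's uplink goes down.

vnic vnicgroup 1
 vlan 3901
 failover
 member INTA1.1
 key 555
 enable
 exit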
  • 199. 185 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - pNICvNIC Virtual Fabric + L2 The FCoE vnic instance can not be configured into a vNIC group and is managed by the failover trigger commands. In the configuration shown in Figure 6-23, the uplink for FCoE traffic is on port EXT7 and FCoE uses VLAN 1001. vNIC and pNIC modes are very similar in this regard. Figure 6-23 Failover configuration for FCoE vnic When EXT7 is brought down, INTA1 (and other ports if so configured) are brought down as shown in the messages. This brings down all the vNIC instances associated with INTA1, so INTA1.1 is down and it is shown in Figure 6-24 on page 186 as down rather than disabled as is the case above. Since the uplinks associated with vnic group 1 are still up, the remaining vNIC instances still have a viable path to the network. slot-1#sho run | section failover failover enable failover trigger 1 mmon monitor member EXT7 failover trigger 1 mmon control member INTA1 failover trigger 1 enable vnic enable vnic uplink-share vnic port INTA1 index 1 bandwidth 25 enable exit ! vnic port INTA1 index 2 bandwidth 25 enable exit ! vnic port INTA1 index 3 bandwidth 25 enable exit ! vnic port INTA1 index 4 bandwidth 25 enable exit ! vnic vnicgroup 1 vlan 2 enable failover member INTA1.1 key 555 exit
  • 200. Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover.fm Draft Document for Review May 1, 2014 186 NIC Virtualization on IBM Flex System Figure 6-24 Failover message flow from FCoE uplink failure slot-1#sho vnic vnicgroup 1 ------------------------------------------------------------------------ vNIC Group 1: enabled ------------------------------------------------------------------------ VLAN : 2 Failover : enabled vNIC Link ---------- --------- INTA1.1 up Port Link ---------- --------- UplinkPort Link ---------- --------- EXT5* up EXT6* up * = The uplink port has LACP admin key 555 slot-1#config t Enter configuration commands, one per line. End with Ctrl/Z. slot-1(config)#int port ext7 slot-1(config-if)#shut Apr 15 20:27:04 slot-1 NOTICE link: link down on port EXT7 Apr 15 20:27:04 slot-1 WARNING failover: Trigger 1 is down, control ports are auto disabled. Apr 15 20:27:04 slot-1 NOTICE server: link down on port INTA1 Apr 15 20:27:43 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c3 has been removed because it had timed out. Apr 15 20:27:43 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c4 has been removed because it had timed out. slot-1#sho vnic vnicg 1 (after EXT7 shut down) ------------------------------------------------------------------------ vNIC Group 1: enabled ------------------------------------------------------------------------ VLAN : 2 Failover : enabled vNIC Link ---------- --------- INTA1.1 down UplinkPort Link ---------- --------- EXT5* up EXT6* up * = The uplink port has LACP admin key 555
  • 201. 187 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - pNICvNIC Virtual Fabric + L2 Failover with FCoE and shared-uplink vNIC Failover in this mode is similar to the previous scenarios presented; the difference is that FCoE traffic and other data traffic share the same uplink(s). It is still appropriate to use both the failover trigger command and the vnic group failover option. The failover trigger can be used to bring down those internal facing ports that depend specifically on the uplink while the vnic group failover will bring down vnic’s (and not entire internal ports) which depend on the uplink. As in previous cases, FCoE and the HBA drivers that support it have their own failover capabilities on the servers so that if one HBA fails, the surviving HBA can continue to provide storage access if properly configured to do so. From the testing performed on VMware, this failover happens quickly. The messages that result from an uplink failure in this scenario are similar to those in the non-shared scenario presented above but they are shown in Figure 6-25. Figure 6-25 Message flow from uplink failure - shared uplink mode with FCoE slot-1#config t Enter configuration commands, one per line. End with Ctrl/Z. slot-1(config)#int port ext5 slot-1(config-if)#shut Apr 16 12:52:30 slot-1 NOTICE link: link down on port EXT5 Apr 16 12:52:30 slot-1 WARNING failover: Trigger 1 is down, control ports are auto disabled. Apr 16 12:52:30 slot-1 NOTICE lacp: LACP is down on port EXT5 Apr 16 12:52:30 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c4 has been removed because trunk configuration on the fcf changed. Apr 16 12:52:30 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c3 has been removed because trunk configuration on the fcf changed. Apr 16 12:52:30 slot-1 NOTICE server: link down on port INTA1 Apr 16 12:52:31 slot-1 NOTICE dcbx: Feature VNIC not supported by peer on port INTA2 Apr 16 12:52:31 slot-1 NOTICE dcbx: Feature VNIC not supported by peer on port INTA10 sho vnic vnicgroup 1 ------------------------------------------------------------------------ vNIC Group 1: enabled ------------------------------------------------------------------------ VLAN : 2 Failover : enabled vNIC Link ---------- --------- INTA1.1 disabled UplinkPort Link ---------- --------- EXT5* down * = The uplink port has LACP admin key 555
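A minimal sketch that combines the two mechanisms described above for a shared uplink is shown below, reusing the command syntax from Figure 6-23. Whether the trigger monitors the physical uplink port or its LACP admin key, and which internal ports it controls, are design choices; the port, VLAN, and key values shown are those used in this scenario and are assumptions to adapt.

failover enable
failover trigger 1 mmon monitor member EXT5
failover trigger 1 mmon control member INTA1
failover trigger 1 enable
!
vnic enable
vnic uplink-share
! (the vnic port INTA1 index definitions shown in Figure 6-23 are also required)
vnic vnicgroup 1
 vlan 2
 failover
 member INTA1.1
 key 555
 enable
 exit

With this combination, loss of the shared uplink brings down INTA1 through the trigger and marks the group's vNIC members as disabled, so the teaming and multipathing drivers on the server can fail over to the second adapter as described above.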
  • 203. 189 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - vNIC Switch Independent + SPAR 6.5 Switch Independent mode with SPAR This section shows deployment examples that use vNIC Switch Independent mode with SPAR pass-thru mode. The combination of these features - Switch Independent mode on the Emulex adapter and SPAR pass-thru mode on the embedded switches (EN4093R, CN4093, SI4093) - minimizes the configuration effort on the embedded switches. Little to no embedded switch configuration is required when a new VLAN or a new compute node is added to a Flex chassis in this scenario. 6.5.1 Components The following hardware and software were used in the examples in this chapter: 򐂰 Flex System Enterprise Chassis 򐂰 x240 Compute Node in bay 1 – Running ESX 5.1 – Dual port Emulex LOM CNA – DS4800 external storage attached via FC ports on G8264 switches 򐂰 Two EN4093’s in I/O Module bays 1 and 2 – Both with Upgrade 1 FoD installed 򐂰 Two G8264 switches that provide upstream Ethernet connectivity out of the pair of EN4093’s – Providing FCF function and physical connectivity to DS4800 on Fibre Channel port 53 6.5.2 Topology Figure 6-26 on page 190 and Figure 6-27 on page 191 describe topologies that are used with SPAR.
  • 204. Deployment scenarios - vNIC Switch Independent + SPAR pass-thru.fm Draft Document for Review May 1, 190 NIC Virtualization on IBM Flex System Figure 6-26 Topology with SPAR passthru mode
  • 205. 191 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - vNIC Switch Independent + SPAR Figure 6-27 Local SPAR domain with Switch Independent vNIC and FCoE 6.5.3 Use Cases SPAR Local and Passthru mode SPAR (Switch Partition) is an option on the EN4093R, CN4093, and SI4093 IBM embedded switches. The implementation of SPAR on the SI4093 is different from that on the other switches and is not covered in this book. SPAR allows the switches listed above to logically partition their available ports into multiple domains. In other words, the data plane of the switch is divided into multiple segments that do not communicate with each other (unless through an external device). SPAR pass-thru mode is an option that uses 802.1Q-in-Q double tagging to allow customer VLANs to pass through a SPAR instance on a switch without any explicit configuration. This allows new VLANs to be added without any additional configuration on the embedded switch. It is possible to use the same VLAN number in multiple domains, but devices on a given VLAN in SPAR 1 will not be able to communicate with a device on that same VLAN in a different SPAR domain unless the domains are interconnected elsewhere in the network. SPAR local domain mode provides the logical partitioning mentioned above, but does not tunnel customer VLANs through the switch. Instead, each VLAN which is to be used in a
  • 206. Deployment scenarios - vNIC Switch Independent + SPAR pass-thru.fm Draft Document for Review May 1, 192 NIC Virtualization on IBM Flex System domain must be explicitly configured in that domain. However, it is still possible to define the same VLAN number in multiple different domains; a device connected to a given VLAN (for example, 10) in SPAR 1 will not be able to communicate with a device on that same VLAN in SPAR 2 or SPAR 3 within the switch. vNIC Switch Independent Mode Switch Independent Mode is an option on the Emulex adapters, including the LOM included on several of the available Flex compute nodes and the EN4054 mezzanine card. This feature allows the Emulex chip to present up to four vNIC instances to a server based on configuration options in the server’s UEFI rather than those learned from an IBM switch. This mode can therefore be used with a variety of embedded I/O modules, including the 4091 Pass-thru module, the SI4093 System Interconnect, and I/O modules from companies other than IBM. The testing that is outlined in this section was all done with IBM embedded switches, but the commands for vNIC and UFP functionality are not used. In Switch Independent Mode, each vNIC associated with a port is assigned a default VLAN in UEFI, referred to as an LPVID (Local Port VLAN ID). Untagged traffic originating from the server on a vNIC will be tagged by the Emulex adapter with the configured LPVID VLAN. One consequence of this is that all server traffic entering the embedded switch from a server using this mode will be tagged. vNIC Switch Independent Mode with SPAR Passthru mode Using these features together allows new VLANs to be created and used on servers (including guest OS’s running under a hypervisor) without being configured on the embedded switches at all. On servers, VLANs can be created in several ways, including the following; this is covered in more detail in section 5.3, “Utilizing physical and virtual NICs in the OS” on page 115. Here are some considerations regarding the creation of tagged VLANs for different operating systems: 򐂰 Windows Server 2012 - has network configuration tools that allow the creation of tagged VLANs. When this is done, an additional item is created in the Network Connections folder. The default (untagged) Network Connection would use the LPVID for the associated vNIC. 򐂰 Other versions of Windows would need to use the Emulex utility that provides the ability to create tagged VLANs. 򐂰 VMware - port groups which are attached to a vSwitch can have a specific VLAN associated with them; these VLANs are transmitted with tags. A port group configured with no VLAN (VLAN 0) will use the LPVID for the associated vNIC. VMware also allows a port group to be associated with VLAN 4095; when this is done, VLAN tagging is delegated to the OS’s of the guest systems. 򐂰 Linux - the vconfig command can create tagged VLAN interfaces attached to a specific NIC (or vNIC) as seen by the Linux OS. These interfaces default to names of the form ethx.vlan#; for example, eth0.10. The ifconfig command can be used to set the attributes of these interfaces once they are created. Various Linux distributions also have graphical tools which provide the same capabilities. vLAG topology considerations vLAG cannot be used in concert with SPAR. Since each SPAR domain can only have one uplink, it would not be possible to successfully configure links to upstream switches and also an ISL to a vLAG peer switch.
Therefore, vLAG cannot be used on uplink or downlink (server-facing) ports that are included in a SPAR domain.
  • 207. 193 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - vNIC Switch Independent + SPAR A switch running SPAR could be an access switch using a PortChannel to connect to two upstream vLAG switches if desired. The SPAR domains would have to use the same VLANs, whether or not they were explicitly configured (passthru vs. local mode), and could include a FCoE VLAN in either case. A topology such as this would be more robust than one which did not include the use of vLAG. 6.5.4 Configuration This section describes the following configuration steps: 򐂰 “vNIC Switch Independent Mode” 򐂰 “Switch side configuration - FCoE” on page 195 򐂰 “SPAR (Switch Partition) configuration” on page 196 vNIC Switch Independent Mode Server side - UEFI configuration This topic is covered in detail in section 5.1.3, “Special settings for the different modes of virtual NIC via UEFI” on page 86. For the examples in this section, the configuration on each port of the Emulex card is as follows, and is shown in Figure 6-28, Figure 6-29 on page 194, and Figure 6-30 on page 194: 򐂰 vnic instance 1 - LPVID 3001, min. bandwidth 10%, max bandwidth 100% 򐂰 vnic instance 2 - FCoE vNIC, no LPVID, min. bandwidth 40%, max bandwidth 100% 򐂰 vnic instance 3 - LPVID 3003, min. bandwidth 20%, max bandwidth 100% 򐂰 vnic instance 4 - LPVID 3004, min. bandwidth 30%, max bandwidth 100% Figure 6-28 UEFI Configuration for Switch Independent Mode
  • 208. Deployment scenarios - vNIC Switch Independent + SPAR pass-thru.fm Draft Document for Review May 1, 194 NIC Virtualization on IBM Flex System Figure 6-29 UEFI Configuration - Bandwidth for Switch Independent Mode Figure 6-30 Configuration display with Bandwidth and LPVID (2 of 4 vNIC’s shown)
  • 209. 195 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - vNIC Switch Independent + SPAR Server Side - Operating System Configuration (VMware) The host was configured with a port group for VLAN 2 and an additional port group to test guest tagging, assigned to VLAN 4095. Guests can be moved from one port group to another via the settings menu. For a deeper discussion of networking configuration on VMware and other operating systems, see section 5.3, “Utilizing physical and virtual NICs in the OS” on page 115. Figure 6-31 VMware network configuration with two port groups Switch side configuration - FCoE A group of commands is required to enable FCoE on an embedded switch with SPAR and Switch Independent mode. The requirements differ depending on whether the switch is an FCoE transit switch, such as the EN4093R used in testing for this chapter, or a converged switch, such as the CN4093 used in testing for UFP. The transit switch requirements are below; for a discussion of the configuration of the CN4093, see section 6.3, “UFP mode virtual NIC with vLAG and FCoE” on page 149. To configure an EN4093R as an FCoE transit switch, the requirements are as follows: 򐂰 Enable lossless Ethernet (or Converged Enhanced Ethernet) functionality with the cee enable command. 򐂰 Enable FIP snooping with the fcoe fips enable command. This allows the switch to become aware of FCoE initialization traffic and be ready to carry FCoE traffic. 򐂰 Define the VLAN(s) which will carry FCoE traffic and ensure that the appropriate server-facing ports and uplink ports are members of those VLAN(s). – FCoE VLANs should not be the native VLAN on server-facing ports. If vLAG is used, in general the vLAG ISL should not carry the FCoE VLANs. – It is common, but not required, to use two distinct VLANs for FCoE. This is usually done when connecting to a redundant storage networking environment. In such an environment, there are two SAN fabrics, usually referred to as SAN-A and SAN-B. Each of the fabrics would connect to its own FCoE VLAN. Typically, two FCoE transit switches in a Flex chassis would each use a different VLAN for FCoE.
  • 210. Deployment scenarios - vNIC Switch Independent + SPAR pass-thru.fm Draft Document for Review May 1, 196 NIC Virtualization on IBM Flex System An example of the configuration commands required for FCoE transit is shown in Example 6-29. It uses VLAN 1002 for FCoE traffic, which is the default: Example 6-29 FCoE Transit Configuration cee enable fcoe fips enable vlan 1002 enable member INTA1-INTA14,EXT5,EXT7 There are no changes to the configuration above if Switch Independent mode is used. The differences when SPAR passthru mode or SPAR local mode are used are shown in the remainder of this section. SPAR (Switch Partition) configuration SPAR configuration is performed exclusively on switches; the servers are unaware of it. In SPAR local mode, the VLANs configured on the server must be explicitly configured on the switches, but this is also true when the SPAR feature is not used. SPAR pass-through mode - Switch side For the examples in this section, the configuration is as follows: 򐂰 SPAR 2 has at least the necessary ports (INTA1, EXT5, and EXT7) configured as members of the SPAR domain. (Additional internal ports are added to the domain but were not used in testing.) 򐂰 The two uplink ports are aggregated together using LACP key 5757. 򐂰 The VLAN associated with SPAR 2 is 3992; note that this is an outer-tag or tunnel VLAN which never leaves the embedded switch on either server-facing or external-facing ports. 򐂰 The remaining internal and external ports on the embedded switches were not configured in a SPAR domain and continue to be configured and to operate normally. 򐂰 The VLANs configured on the VMware server flow through the SPAR domain as a tunnel and do not appear in its configuration. Those VLANs, along with the FCoE VLAN, are configured on the upstream 8264’s. 򐂰 When FCoE is used with SPAR passthru mode, the only command that is used is the cee enable command. FIP snooping is performed on the switch upstream from the one where SPAR is used, which in our testing would be one of the upstream 8264CS switches. Example 6-30 shows SPAR configuration for the pass-thru mode. Example 6-30 SPAR Pass-through Mode - Switch Configuration for Embedded 4093 switches spar 2 uplink adminkey 5757 domain default vlan 3992 domain default member INTA1,INTA12-INTA14 enable exit
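Example 6-30 refers to LACP admin key 5757 but does not show the port-level aggregation commands. A sketch of the corresponding uplink port configuration is shown below; it uses the interface port syntax seen elsewhere in this chapter, and the lacp keywords should be confirmed against the firmware level in use.

interface port EXT5,EXT7
 lacp mode active
 lacp key 5757
 exit

The admin key configured here is what the spar 2 uplink adminkey 5757 statement in Example 6-30 binds the SPAR uplink to, and the upstream G8264 ports facing these uplinks must be configured as a corresponding LACP aggregation.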
  • 211. 197 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - vNIC Switch Independent + SPAR SPAR local mode - Switch side The same server configuration was used to test a local SPAR domain in concert with Switch Independent mode. The local SPAR domain was configured as follows: 򐂰 Ports INTA1 and EXT5 and 7 are included in the domain. The two external ports are configured to use LACP key 5757. 򐂰 The default VLAN for the domain is 3001, which matches the LPVID for vnic 1. 򐂰 Local VLANs 3002, 3003, and 3004 are defined in the domain and associated with the INTA1 and EXT6 ports. These VLANs would carry untagged traffic originating on the server and sent via the vNIC instances. 򐂰 Local VLAN 2 is also defined on the server; it is used to carry the traffic from the guest VM’s which are attached to the port groups discussed in 6.4, “pNIC and vNIC Virtual Fabric modes with Layer 2 Failover” on page 163. 򐂰 The intended FCoE VLAN(s), 1001 or 1002, also need to be configured here if they are to pass through the SPAR domain. When those VLANs are configured in the SPAR configuration, the usual commands to create the VLANs and assign their members are not used. 򐂰 Different server facing ports within the SPAR domain can have different VLAN membership by specifying the ports desired for a specific VLAN in the domain local n commands. This mirrors the ability to configure VLANs on a port with the usual switchport allowed vlan or VLAN member commands. 򐂰 There is only a single uplink per SPAR domain, which can be an individual port, a static portchannel, or a LACP portchannel. The uplink is always a member of all of the VLANs defined within the SPAR local domain. Example 6-31 shows SPAR configuration for the local mode. Example 6-31 SPAR Local Mode - Switch Configuration for Embedded 4093 switches slot-1#sho run | section spar spar 2 uplink adminkey 5757 domain mode local domain default vlan 3001 domain default member INTA1 domain local 1 vlan 3003 domain local 1 member INTA1 domain local 1 enable domain local 2 vlan 3004 domain local 2 member INTA1 domain local 2 enable domain local 3 vlan 1001 (1002 on second switch) domain local 3 member INTA1 domain local 3 enable domain local 4 vlan 2 domain local 4 member INTA1 domain local 4 enable Upstream G8264 configuration for SPAR The G8264 switches have no special configuration requirements when SPAR is used on the downstream EN4093 switches. VLANs used on the servers must be configured on the G8264 switches, whether they are configured on the Emulex UEFI, the server operating system, or learned by the servers as part of FCoE initialization. The configuration on the G8264 switches
  • 212. Deployment scenarios - vNIC Switch Independent + SPAR pass-thru.fm Draft Document for Review May 1, 198 NIC Virtualization on IBM Flex System for their side of the uplinks from the EN4093’s also must be configured to match the configuration specified on the EN4093 switches. If the upstream switches are to provide FCoE functions such as FCF or NPV, then those functions would be part of their configuration in the usual way. 6.5.5 Verifying operation In summary, it is possible to do the following if desired: 򐂰 Switch Independent mode with SPAR passthru domain 򐂰 Switch Independent mode with SPAR local domain It is also possible to use these features separately from each other if desired. Switch Independent mode allows servers to see more NIC interfaces than are physically available and to allocate their bandwidth (outbound only). SPAR provides a way to partition the switches on which it is available and to tunnel VLANs through them with no additional configuration if passthru mode is used. VLAN numbering considerations There are several different categories of VLANs which need to have assigned numbers with these features, whether used separately or in concert. They are summarized below: 򐂰 Data-bearing VLANs - these are the VLANs that are defined both on the compute node and in the upstream network and which actually carry data. They are typically assigned and managed by the networking team in a customer environment. They are configured on compute nodes in the Flex chassis and also on Top-of-Rack or other aggregation switches which typically are immediately upstream of the embedded I/O modules. 򐂰 Switch Independent Mode LPVIDs - these are the VLANs which are configured in the UEFI page for the Emulex adapter(s) on compute nodes. They are used as the VLANs for untagged traffic sent from a compute node on a vNIC instance, so they are similar to a native VLAN on a switch. LPVIDs can be actual data-bearing VLANs, which allows host or guest OS’s to send untagged traffic. One common approach, however, is to use numbers for these VLANs which are unlikely to be used for data-bearing VLANs, such as numbers in the 4000 range, and to then always send tagged traffic from hypervisors or guest OS’s. This approach allows VLAN assignments to be changed without the need to reboot the compute node and go through the UEFI configuration. 򐂰 SPAR domain default VLANs - these are used for the outer tag when traffic passes through a SPAR passthru domain. They never leave the switch where they are configured. They can be assigned the same number as a data-bearing VLAN number or an LPVID number, although this may result in confusion when troubleshooting the environment. 򐂰 VLANs used in SPAR local domains - if a SPAR local domain is used, then any data-bearing VLANs, including the LPVID VLANs and others defined on OS’s, must be explicitly configured as the domain default VLAN or as local VLANs within the domain. Use of SPAR local domains does not provide the ability to avoid configuring VLANs on the embedded I/O modules, which is one of the key benefits of using a SPAR passthru domain. Verifying Operations: SPAR Passthru Mode The status of the SPAR is shown through the show spar command. To verify that traffic is flowing to the upstream switch, the show mac-address-table command is used on the downlink ports and/or the desired VLANs.
In our test bed, addresses from the VMware management network, the virtual guest machines, and FCoE appear on the SPAR VLAN on the embedded 4093 switch but on their proper VLANs on the upstream 8264 switch. If the MAC
  • 213. 199 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - vNIC Switch Independent + SPAR addresses do not appear in both places, traffic is not flowing properly. The SPAR VLAN, 3992, is not seen at all on the upstream switch. The commands to verify SPAR operations and their output are listed in Example 6-32, Example 6-33, Example 6-34, and Example 6-35 Example 6-32 Show SPAR command output slot-1#sho spar ? 1-8 Show SPAR ID information slot-1#sho spar 2 Current SPAR 2 Settings: enabled, name SPAR 2 Current SPAR 2 Uplink Settings: port 0, PortChannel 0, adminkey 5757 Current SPAR 2 Domain Settings: mode passthrough Current SPAR 2 Default VLAN Domain Settings: sparvid 3992 server port list: INTA1,INTA12-INTA14 Example 6-33 MAC address display on embedded switch slot-1#sho mac int port inta1 MAC address VLAN Port Trnk State Permanent Openflow ----------------- -------- ------- ---- ----- --------- -------- 00:0c:29:4a:60:ae 3992 INTA1 FWD N 00:0c:29:54:38:d8 3992 INTA1 FWD N 0e:fc:00:01:0c:00 3992 INTA1 FWD N 34:40:b5:be:8e:91 3992 INTA1 FWD N Example 6-34 MAC address display from 8264 switch - downlinks to 4093 8264cs-1#sho mac portchannel 67 MAC address VLAN Port Trnk State Permanent ----------------- -------- ------- ---- ----- --------- 00:0c:29:4a:60:ae 2 67 TRK 00:0c:29:54:38:d8 1 67 TRK 0e:fc:00:01:0c:00 1001 67 TRK P 34:40:b5:be:8e:91 1 67 TRK 34:40:b5:be:8e:91 1001 67 TRK Example 6-35 SPAR VLAN on upstream 8264 8264cs-1#sho mac vlan 3992 No FDB entries for VLAN 3992. 8264cs-1#sho vlan 3992 VLAN Name Status Ports ---- -------------------------------- ------ ------------------------- VLAN 3992 doesn't exist. 8264cs-1#
  • 214. Deployment scenarios - vNIC Switch Independent + SPAR pass-thru.fm Draft Document for Review May 1, 200 NIC Virtualization on IBM Flex System Verifying Operations: SPAR Local Mode SPAR local mode requires explicit VLAN configuration for every VLAN that will flow through the SPAR domain. These VLANs do appear in the MAC address table of the switch but, as shown in the configuration section above, they are configured using the domain local n vlan command rather than the usual VLAN membership commands. In addition to the steps shown in the section on verifying SPAR pass-through mode, the MAC address display on both the embedded and upstream switches should show all of the VLANs which are to be used. An example of a MAC display from a SPAR local server is shown in Figure 6-32. It includes the SPAR domain default VLAN, which is also the LPVID for vNIC 1, as well as addresses and VLANs used by FCoE. Figure 6-32 MAC addresses for server in SPAR local domain The SPAR local VLANs would also need to be configured on the upstream switch(es). In this test case, they are the same as the vNIC LPVID VLANs. Unlike SPAR pass-through mode, FIP snooping is configured in a SPAR local domain, and the show fcoe commands do work and would need to be checked to verify proper operations, as shown in Figure 6-33. Figure 6-33 FCoE information - 4093 switch - SPAR local domain mode Verifying Operations: Switch Independent Mode The status of the network can be seen from the presence of MAC address entries in the embedded and upstream switches as well as from the tools included in the operating system. show mac int port inta1 MAC address VLAN Port Trnk State Permanent Openflow ----------------- -------- ------- ---- ----- --------- -------- 00:0c:29:4a:60:ae 2 INTA1 FWD N 00:0c:29:54:38:ce 2 INTA1 FWD N 0e:fc:00:01:0c:00 1001 INTA1 FWD P N 34:40:b5:be:8e:90 2 INTA1 FWD N 34:40:b5:be:8e:91 1001 INTA1 FWD N 34:40:b5:be:8e:91 3001 INTA1 FWD N slot-1#sho fcoe fips fcoe Total number of FCoE connections: 1 VN_PORT MAC FCF MAC Port Vlan ------------------------------------------------------ 0e:fc:00:01:0c:00 a8:97:dc:0f:ed:c3 INTA1 1001 slot-1#sho fcoe fips fcf Total number of FCFs detected: 2 FCF MAC Port Vlan ----------------------------------- a8:97:dc:0f:ed:c3 PCH65 1001 a8:97:dc:0f:ed:c4 PCH65 1001
  • 215. 201 Draft Document for Review May 1, 2014 2:10 pm Deployment scenarios - vNIC Switch Independent + SPAR Examples of the MAC displays can be seen in Figure 6-32 on page 200 and Figure 6-33 on page 200. The network adapter display from VMware is shown in Figure 6-34 and Figure 6-35. VLAN 2 is configured on multiple vSwitches and this works as intended, but uses different vNIC’s as seen by the OS. The active vNIC instances can be seen below followed by a display of all of the NIC’s known to the OS. The differing bandwidth configurations for the different vNICs on the two physical ports are reflected in the display below, except for the FCoE vNIC’s which do not appear in the network adapter display. Figure 6-34 VMware vSwitches with multiple vNIC instances Figure 6-35 VMware Network Adapter display showing all six vNIC’s Verifying Operations: Storage Access Because FCoE traffic, whichever VLAN it is using, is not detected as such on the embedded switches in this mode, the commands to display its status will not show anything when issued on the embedded 4093’s. To determine the status of FCoE, appropriate commands need to be issued on the upstream G8264 switch, as shown in Example 6-36 on page 202 and Example 6-37 on page 202.
  • 216. Deployment scenarios - vNIC Switch Independent + SPAR pass-thru.fm Draft Document for Review May 1, 202 NIC Virtualization on IBM Flex System Example 6-36 FCoE query on embedded 4093 using SPAR Pass-through slot-1#sho fcoe fips fcoe FIP snooping is currently disabled. Example 6-37 FCoE query on upstream 8264 8264cs-1#sho fcoe fips fcoe Total number of FCoE connections: 1 VN_PORT MAC FCF MAC Port Vlan ------------------------------------------------------ 0e:fc:00:01:0c:00 a8:97:dc:0f:ed:c3 PCH67 1001 Access to network storage also needs to be verified from the servers accessing it. Three LUNs are shown as visible to the server (see Figure 6-36); when there is a configuration error or a failure on either adapter, the number of LUNs and paths drops to zero on that adapter. Figure 6-36 Storage Adapter status from VMware host
  • 217. © Copyright IBM Corp. 2014. All rights reserved. 203 Draft Document for Review May 1, 2014 2:10 pm 8223abrv.fm 10GbE 10 Gigabit Ethernet ACLs access control lists AMON Auto Monitor BACS Broadcom Advanced Control Suite BASP Broadcom Advanced Server Program BE3 BladeEngine 3 BE3R BladeEngine 3R BNT Blade Network Technologies CEE Converged Enhanced Ethernet CIFS Common Internet File System CNAs converged network adapters CSE Consulting System Engineer DAC direct-attach cables DACs direct-attach cables DCB Data Center Bridging DCE Data Center Ethernet ECP Edge Control Protocol ETS Enhanced Transmission Selection EVB Edge Virtual Bridging FC Fibre Channel FCF Fibre Channel Forwarder FCoE Fibre Channel over Ethernet FIP FCoE Initialization Protocol FO Failover FoD Feature on Demand HBA host bus adapter HBAs host bus adapters IBM International Business Machines Corporation ISL inter-switch link ITSO International Technical Support Organization KVM Kernel-based Virtual Machine LACP Link Aggregation Control Protocol LAG Link Aggregation Group LANs local area networks LOM LAN on system board MAC Media access control MMON Manual Monitor MSTP Multiple STP Abbreviations and acronyms NAS network-attached storage NFS Network File System NIC Network Interface Card NPIV N_Port ID Virtualization NPV N_Port Virtualization NTP Network Time Protocol PDUs protocol data units PFA PCI Function Address PFC Priority-based Flow Control PIM Protocol Independent Multicast PVRST Per-VLAN Rapid STP RMON Remote Monitoring ROI return on investment RSCN Registered State Change Notification RSTP Rapid STP RoCE RDMA over Converged Ethernet SAN storage area network SANs storage area networks SAS serial-attached SCSI SLB Smart Load Balance SLP Service Location Protocol SNSC System Networking Switch Center SPAR Switch Partitioning SR SFP+ Transceiver SoL Supports Serial over LAN TLV Type-Length-Value TOE TCP offload Engine Tb terabit ToR Top of Rack UFP Unified Fabric Port UFPs Unified fabric ports VEB Virtual Ethernet Bridging VEPA Virtual Ethernet Port Aggregator VM virtual machine VMs virtual machines VSI Virtual Station Interface VSS Virtual Switch System iSCSI Internet Small Computer System Interface isCLI industry standard CLI
  • 218. 8223abrv.fm Draft Document for Review May 1, 2014 2:10 pm 204 NIC Virtualization on IBM Flex System pNIC Physical NIC mode sFTP Secure FTP vLAG virtual Link Aggregation vLAGs Virtual link aggregation groups vNIC virtual Network Interface Card vNICs Virtual NICs vPC virtual Port Channel vPort virtual port
  • 219. © Copyright IBM Corp. 2014. All rights reserved. 205 Draft Document for Review May 1, 2014 2:10 pm 8223bibl.fm Related publications The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book. IBM Redbooks The following IBM Redbooks publications provide additional information about the topic in this document. Note that some publications referenced in this list might be available in softcopy only. 򐂰 IBM Flex System Networking in an Enterprise Data Center, 2nd Edition, REDP-4834 򐂰 IBM Flex System and PureFlex System Network Implementation, SG24-8089 򐂰 Storage and Network Convergence Using FCoE and iSCSI, SG24-7986 򐂰 Implementing Systems Management of IBM PureFlex System, SG24-8060 򐂰 IBM PureFlex System and IBM Flex System Products and Technology, SG24-7984 You can search for, view, download or order these documents and other Redbooks, Redpapers, Web Docs, draft and additional materials, at the following website: ibm.com/redbooks Help from IBM IBM Support and downloads ibm.com/support IBM Global Services ibm.com/services
  • 224. ® SG24-8223-00 ISBN Draft Document for Review May 1, 2014 2:11 pm INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment. For more information: ibm.com/redbooks ® NIC Virtualization on IBM Flex System Introduces NIC virtualization concepts and technologies Discusses vNIC deployment scenarios Provides vNIC configuration examples The deployment of server virtualization technologies in data centers requires significant efforts in providing sufficient network I/O bandwidth to satisfy the demand of virtualized applications and services. For example, every virtualized system can host several dozen network applications and services. Each of these services requires certain bandwidth (or speed) to function properly. Furthermore, because of different network traffic patterns that are relevant to different service types, these traffic flows can interfere with each other. They can lead to serious network problems, including the inability of the service to perform its functions. The NIC virtualization solutions on IBM® Flex System address these issues. The solutions are based on the IBM Flex System® Enterprise Chassis with a 10 Gbps Converged Enhanced Ethernet infrastructure. This infrastructure is built on IBM RackSwitch™ G8264 and G8264CS Top of Rack (ToR) switches, IBM Flex System Fabric CN4093 and EN4093R 10 Gbps Ethernet switch modules, and IBM Flex System SI4093 Switch Interconnect modules in the chassis and the Emulex Virtual Fabric Adapters in each compute node. This IBM Redbooks® publication provides configuration scenarios that use leading edge IBM networking technologies combined with the Emulex Virtual Fabric adapters. This book is for IBM, IBM Business Partner and client networking professionals who want to learn how to implement NIC virtualization solutions and switch interconnect technologies on IBM Flex System by using the IBM Unified Fabric Port (UFP) mode, Switch Independent mode, and IBM Virtual Fabric mode. Back cover