AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors offered better BERT machine learning performance
vs. M5n instances with 2nd Gen Intel Xeon Scalable processors and M6a instances with 3rd Gen AMD EPYC processors
Many machine learning workloads involve sorting, analyzing, and identifying relationships among images, but how can organizations quickly make sense of large amounts of text?
Bidirectional Encoder Representations from Transformers (BERT) is a machine learning
framework for natural language processing (NLP). To analyze text, BERT looks at all the words
around a given word to put it in the correct context. This allows applications such as search
engines to predict sentences, answer questions, or generate conversational responses.
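The bidirectional idea above can be illustrated with a short sketch: for a given word, gather the words on both its left and its right, the way BERT's attention considers context in both directions (unlike left-to-right language models). This is purely illustrative; real BERT uses transformer self-attention over subword tokens, and the sentence and window size here are arbitrary examples.

```python
def bidirectional_context(tokens, index, window=2):
    """Collect up to `window` words on BOTH sides of the target word.
    Illustrates bidirectional context only; not the BERT implementation."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right

sentence = "the bank raised interest rates last week".split()
# Context for "interest" (index 3) draws on words before AND after it.
print(bidirectional_context(sentence, 3))  # -> ['bank', 'raised', 'rates', 'last']
```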
Using Intel optimization for TensorFlow and ZenDNN integrated with TensorFlow, we compared the BERT machine learning performance of three types of Amazon Web Services (AWS) EC2 series instances: M6i instances with 3rd Gen Intel® Xeon® Scalable processors featuring Intel DL Boost with Vector Neural Network Instructions, M5n instances with 2nd Gen Intel Xeon Scalable processors, and M6a instances with 3rd Gen AMD EPYC™ processors.
In tests at multiple instance sizes, AWS M6i instances offered up to 45 percent better BERT performance on a benchmark from the Intel Model Zoo than the M5n instances with previous-gen processors and up to 6.4 times the BERT performance compared to M6a instances with 3rd Gen AMD EPYC processors. This means that organizations running similar BERT workloads in the cloud could get better performance per instance by choosing M6i instances featuring 3rd Gen Intel Xeon Scalable processors.
Up to 5.2x the queries per second vs. M6a instances (4 vCPUs). Up to 5.1x the queries per second vs. M6a instances (8 vCPUs). Up to 6.4x the queries per second vs. M6a instances (16 vCPUs).
A Principled Technologies report: Hands-on testing. Real-world results. June 2022
Figure 1: Key specifications for each instance size we tested. Source: Principled Technologies.
How we tested
We purchased three sets of instances from three general-purpose AWS EC2 series:
• M6i instances featuring 3rd Gen Intel Xeon Platinum 8375C processors (Ice Lake)
• M5n instances featuring 2nd Gen Intel Xeon Platinum 8259CL processors (Cascade Lake)
• M6a instances featuring 3rd Gen AMD EPYC 7R13 processors (Milan)
We ran each instance in the US East 1 region.
Figure 1 shows the specifications for the instances that we chose. To show how businesses of various sizes with
different machine learning demands can benefit from choosing M6i instances, we tested instances with 4 vCPUs,
8 vCPUs, and 16 vCPUs. To account for different types of datasets, we ran tests using a small batch size of 1
and a large batch size of 32—where batch size is the number of samples that go through the neural network at
a time. In this report, we present the comparisons between M6i and M5n instances first, and then present the
comparisons between M6i and M6a instances. (Note: For additional test results on even larger instances, see the
science behind the report.)
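Since batch size drives the comparisons throughout this report, a minimal sketch of what it means may help: batch size is simply how many samples the model processes per step, so the same dataset yields many small batches at batch size 1 and few large batches at batch size 32. The sample count of 64 below is an arbitrary stand-in, not from our tests.

```python
def make_batches(samples, batch_size):
    """Group samples into batches; batch size is the number of samples
    that go through the neural network at a time (we tested 1 and 32)."""
    return [samples[i:i + batch_size]
            for i in range(0, len(samples), batch_size)]

samples = list(range(64))               # stand-in for 64 tokenized text samples
print(len(make_batches(samples, 1)))    # batch size 1  -> 64 batches
print(len(make_batches(samples, 32)))   # batch size 32 -> 2 batches
```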
Testing BERT performance in the cloud
The BERT framework, which was trained on text from the English-language Wikipedia with over 2.5 million words, works by turning text into numbers to sort, analyze, and make predictions about that text.1 Depending on the dataset on which an organization needs to run BERT machine learning, the size of the AWS instances they choose will vary. To account for these different needs, we tested using two batch sizes across three different instance sizes. We used a BERT benchmark from Intel Model Zoo, which offers a range of machine learning models and tools. At the time of our testing, AMD EPYC processors did not support INT8 precision for BERT, so we also present FP32 precision results for M6i instances for comparison. In all three instance sizes, the M6i instances enabled by 3rd Gen Intel Xeon Scalable processors outperformed both the previous-gen M5n instances and the current-gen M6a instances.
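"Turning text into numbers" can be sketched in a few lines: each token maps to an integer ID the model can compute with. The tiny vocabulary below is hypothetical; real BERT uses a WordPiece vocabulary of roughly 30,000 subword entries.

```python
# Toy vocabulary mapping tokens to integer IDs (hypothetical; real BERT
# uses a learned WordPiece vocabulary with special tokens like [CLS]).
vocab = {"[PAD]": 0, "[UNK]": 1, "the": 2, "model": 3, "reads": 4, "text": 5}

def encode(sentence, vocab):
    """Map each whitespace-split token to its ID; unknown words fall back
    to the [UNK] ID."""
    return [vocab.get(tok, vocab["[UNK]"]) for tok in sentence.lower().split()]

print(encode("The model reads text", vocab))  # -> [2, 3, 4, 5]
```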
Why choose M6i instances with 3rd Gen Intel Xeon Scalable processors?
New M6i instances with 3rd Gen Intel Xeon Scalable processors offer the following:4
• All-core turbo frequency of up to 3.5 GHz
• Always-on memory encryption with Intel Total Memory Encryption (TME)
• Intel DL Boost with Vector Neural Network Instructions (VNNI) that accelerate INT8 performance
• Intel Advanced Vector Extensions 512 (Intel AVX-512) instructions for demanding machine
learning workloads
• Support for up to 128 vCPUs and 512 GB of memory per instance
• Up to 50 Gbps networking
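A rough sketch of why the INT8 acceleration in the list above matters: FP32 weights are mapped onto 8-bit integers with a scale factor, shrinking the data 4x and letting VNNI-style instructions process more values per cycle. The symmetric quantization below is a simplified illustration; real frameworks also handle calibration, zero points, and per-channel scales.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats onto integers in [-127, 127]
    using a single scale factor (simplified illustration)."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [v * scale for v in quantized]

w = [0.5, -1.27, 0.0, 1.27]
q, s = quantize_int8(w)
print(q)                  # small integers in [-127, 127]
approx = dequantize(q, s) # close to the originals, within quantization error
```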
About 3rd Generation Intel Xeon Scalable processors
According to Intel, 3rd Generation Intel Xeon Scalable processors are "[o]ptimized for cloud, enterprise, HPC, network, security, and IoT workloads with 8 to 40 powerful cores and a wide range of frequency, feature, and power levels."2 Intel continues to offer many models from the Platinum, Gold, Silver, and Bronze processor lines that they "designed through decades of innovation for the most common workload requirements."3
For more information, visit intel.com/xeonscalable.
Instances with 4 vCPUs: M6i vs. M5n
First, we compared BERT performance on smaller instances, looking at the relative amount of text the instance types analyzed on 4vCPU configurations. As Figure 2 shows, M6i instances enabled by 3rd Gen Intel Xeon Scalable processors analyzed up to 18 percent more examples per second than the M5n instances with 2nd Gen Intel Xeon Scalable processors.
Figure 2: Relative BERT performance for M6i and M5n instances using 4 vCPUs. Higher numbers are better.
Source: Principled Technologies.
Instances with 8 vCPUs: M6i vs. M5n
When we doubled the instance size to 8 vCPUs, M6i instances delivered a similar performance increase over previous-gen M5n instances. Figure 3 compares the relative amount of text the instance types analyzed on 8vCPU configurations. The M6i instances enabled by 3rd Gen Intel Xeon Scalable processors analyzed up to 11 percent more examples per second than the M5n instances with 2nd Gen Intel Xeon Scalable processors.
Figure 3: Relative BERT performance for M6i and M5n instances using 8 vCPUs. Higher numbers are better.
Source: Principled Technologies.
Figure 2 chart data: Relative BERT performance of m6i.xlarge vs. m5n.xlarge (larger is better). Batch size 1: M6i (INT8) 1.13, M5n (INT8) 1.00. Batch size 32: M6i (INT8) 1.18, M5n (INT8) 1.00. Up to 18% better throughput with 4 vCPUs.
Figure 3 chart data: Relative BERT performance of m6i.2xlarge vs. m5n.2xlarge (larger is better). Batch size 1: M6i (INT8) 1.11, M5n (INT8) 1.00. Batch size 32: M6i (INT8) 1.11, M5n (INT8) 1.00. Up to 11% better throughput with 8 vCPUs.
Instances with 16 vCPUs: M6i vs. M5n
As Figure 4 shows, M6i instances offered the greatest relative BERT performance increase over previous-gen M5n instances using larger 16vCPU configurations. The M6i instances enabled by 3rd Gen Intel Xeon Scalable processors analyzed up to 45 percent more examples per second than the M5n instances with 2nd Gen Intel Xeon Scalable processors. By improving textual data analysis throughput by 45 percent, organizations could reduce the number of instances they need to purchase and manage when they select the M6i instance type.
Figure 4: Relative BERT performance for M6i and M5n instances using 16 vCPUs. Higher numbers are better.
Source: Principled Technologies.
Figure 4 chart data: Relative BERT performance of m6i.4xlarge vs. m5n.4xlarge (larger is better). Batch size 1: M6i (INT8) 1.21, M5n (INT8) 1.00. Batch size 32: M6i (INT8) 1.45, M5n (INT8) 1.00. Up to 45% better throughput with 16 vCPUs.
Instances with 4 vCPUs: M6i vs. M6a
After comparing BERT performance of M6i instances against that of instances based on previous-gen processors, we compared those three sizes of M6i instances against M6a instances with AMD EPYC processors. Figure 5 compares the relative amount of text these instance types analyzed on 4vCPU configurations. The M6i instances enabled by 3rd Gen Intel Xeon Scalable processors with INT8 precision analyzed data 5.29 times as fast as the M6a instances with 3rd Gen AMD EPYC processors using FP32 precision. Note: At the time of testing, INT8 precision, which can improve performance for these types of machine learning, was not available for BERT workloads on AMD EPYC processors. Using FP32 precision, M6i instances improved performance over M6a instances by as much as 68 percent.
Figure 5: Relative BERT performance for M6i and M6a instances using 4 vCPUs. Higher numbers are better.
Source: Principled Technologies.
Instances with 8 vCPUs: M6i vs. M6a
When we increased the instance sizes to 8 vCPUs, performance increases were similar to the 4vCPU configurations. Figure 6 compares the relative amount of text the instance types analyzed on 8vCPU configurations. The M6i instances enabled by 3rd Gen Intel Xeon Scalable processors analyzed data up to 5.10 times as fast as the M6a instances with 3rd Gen AMD EPYC processors.
Figure 6: Relative BERT performance for M6i and M6a instances using 8 vCPUs. Higher numbers are better.
Source: Principled Technologies.
Figure 5 chart data: Relative BERT performance of m6i.xlarge vs. m6a.xlarge (larger is better). Batch size 1: M6i (INT8) 4.24, M6i (FP32) 1.68, M6a (FP32) 1.00. Batch size 32: M6i (INT8) 5.29, M6i (FP32) 1.57, M6a (FP32) 1.00. Up to 5.29x the throughput with 4 vCPUs.
Figure 6 chart data: Relative BERT performance of m6i.2xlarge vs. m6a.2xlarge (larger is better). Batch size 1: M6i (INT8) 4.44, M6i (FP32) 1.64, M6a (FP32) 1.00. Batch size 32: M6i (INT8) 5.10, M6i (FP32) 1.68, M6a (FP32) 1.00. Up to 5.10x the throughput with 8 vCPUs.
Instances with 16 vCPUs: M6i vs. M6a
The biggest relative difference in BERT performance occurred in our 16vCPU comparison of M6i and M6a configurations. Figure 7 compares the relative examples per second the instance types analyzed on 16vCPU configurations. The M6i instances enabled by 3rd Gen Intel Xeon Scalable processors analyzed data up to 6.40 times as fast as the M6a instances with 3rd Gen AMD EPYC processors. These results show that for these types of BERT workloads, selecting M6i instances that offer INT8 precision over M6a instances that don't could allow organizations to complete textual analysis workloads using fewer cloud instances.
Figure 7: Relative BERT performance for M6i and M6a instances using 16 vCPUs. Higher numbers are better.
Source: Principled Technologies.
Figure 7 chart data: Relative BERT performance of m6i.4xlarge vs. m6a.4xlarge (larger is better). Batch size 1: M6i (INT8) 6.37, M6i (FP32) 1.81, M6a (FP32) 1.00. Batch size 32: M6i (INT8) 6.40, M6i (FP32) 2.24, M6a (FP32) 1.00. Up to 6.40x the throughput with 16 vCPUs.
Scaling BERT workloads
Another consideration for assessing BERT performance is to see how the throughput scales as you increase
the size of the instance. Theoretically, performance could double as you double the vCPU count, which would
be perfect linear scaling. While resource allocation makes this unlikely in the real world, the closer an instance
approaches this ideal, the better.
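One way to make this concrete is scaling efficiency: measured speedup divided by the ideal linear speedup from the added vCPUs, where 1.0 would be perfect linear scaling. The sketch below applies that ratio to the batch-size-1 scaling figures reported in this section (3.40 for M6i INT8 and 3.16 for M6a FP32 at 16 vCPUs relative to 4 vCPUs).

```python
def scaling_efficiency(rel_throughput, rel_vcpus):
    """Measured speedup divided by ideal linear speedup; 1.0 is perfect
    linear scaling, lower values mean diminishing returns."""
    return rel_throughput / rel_vcpus

# Going from 4 to 16 vCPUs is a 4x resource increase.
m6i_int8 = scaling_efficiency(3.40, 4.0)
m6a_fp32 = scaling_efficiency(3.16, 4.0)
print(f"M6i (INT8): {m6i_int8:.2f}, M6a (FP32): {m6a_fp32:.2f}")
```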
As Figure 8 shows, using results from our batch size: 1 tests, the M6i instance with 3rd Gen Intel Xeon Scalable processors had better BERT performance scaling from 8 vCPUs to 16 vCPUs compared to the M6a instance with AMD EPYC processors, though slightly worse scaling from 4 vCPUs to 8 vCPUs.
Figure 8: How BERT performance scaled across instance sizes, compared to results from the 4vCPU tests with batch size 1.
Higher numbers are better. Source: Principled Technologies.
Figure 9 makes the same comparison, but uses results from our batch size: 32 testing. Again, the M6i instance with 3rd Gen Intel Xeon Scalable processors scaled more linearly from 4 to 16 vCPUs compared to the M6a instance.
Figure 9: How BERT performance scaled across instance sizes, compared to results from the 4vCPU tests with batch size
32. Higher numbers are better. Source: Principled Technologies.
Figure 8 chart data: Relative BERT performance scaling compared to 4 vCPUs with batch size 1 (larger is better). At 4, 8, and 16 vCPUs respectively: M6i (INT8) 1.00, 1.85, 3.40; M6i (FP32) 1.00, 1.86, 3.83; M6a (FP32) 1.00, 1.91, 3.16.
Figure 9 chart data: Relative BERT performance scaling compared to 4 vCPUs with batch size 32 (larger is better). At 4, 8, and 16 vCPUs respectively: M6i (INT8) 1.00, 1.86, 3.52; M6i (FP32) 1.00, 1.90, 3.79; M6a (FP32) 1.00, 1.78, 2.52.
By selecting M6i instances that offer more linear, predictable performance scaling, organizations could more reliably forecast their cloud operating budgets as textual analysis workloads continue to grow.
Conclusion
Organizations analyzing textual data using NLP through the BERT framework must decide which type of instance can deliver the BERT performance they need. In our tests, we found that across instance sizes, AWS M6i instances with 3rd Gen Intel Xeon Scalable processors outperformed both M5n instances with 2nd Gen Intel Xeon Scalable processors and M6a instances with 3rd Gen AMD EPYC processors for BERT machine learning. Plus, the M6i instances offered more predictable scaling at 16 vCPUs. These performance increases could help you get quicker insight from textual data to better satisfy consumers and increase revenues.
1. TechTarget, "BERT language model," accessed December 16, 2021, https://www.techtarget.com/searchenterpriseai/definition/BERT-language-model.
2. Intel, "3rd Gen Intel® Xeon® Scalable Processors," accessed December 14, 2021, https://www.intel.com/content/www/us/en/products/docs/processors/xeon/3rd-gen-xeon-scalable-processors-brief.html.
3. Intel, "3rd Gen Intel® Xeon® Scalable Processors."
4. Amazon, "Amazon EC2 M6i Instances," accessed December 14, 2021, https://aws.amazon.com/ec2/instance-types/m6i/.
Principled Technologies is a registered trademark of Principled Technologies, Inc. All other product names are the trademarks of their respective owners.
This project was commissioned by Intel.
Read the science behind this report at https://facts.pt/ZymIIA3
Principled Technologies® | Facts matter.®
June 2022 | 9
AWS EC2 M6i instances featuring 3rd
Gen Intel Xeon Scalable processors offered better BERT machine learning performance

More Related Content

PDF
AWS EC2 M6i instances with 3rd Gen Intel Xeon Scalable processors accelerated...
PDF
AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors improv...
PDF
Accelerate natural language processing with AWS EC2 M7i instances featuring 4...
PDF
Speed up deep learning tasks with Amazon Web Services instances featuring 2nd...
PDF
Make sense of important data faster with AWS EC2 M6i instances
PDF
Google Cloud N2 VM instances featuring 3rd Gen Intel Xeon Scalable processors...
PDF
Complete artificial intelligence workloads faster using Microsoft Azure virtu...
PDF
Finish Microsoft SQL Server data analysis faster with new M5n series instance...
AWS EC2 M6i instances with 3rd Gen Intel Xeon Scalable processors accelerated...
AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors improv...
Accelerate natural language processing with AWS EC2 M7i instances featuring 4...
Speed up deep learning tasks with Amazon Web Services instances featuring 2nd...
Make sense of important data faster with AWS EC2 M6i instances
Google Cloud N2 VM instances featuring 3rd Gen Intel Xeon Scalable processors...
Complete artificial intelligence workloads faster using Microsoft Azure virtu...
Finish Microsoft SQL Server data analysis faster with new M5n series instance...

Similar to AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors offered better BERT machine learning performance (20)

PDF
Complete online analytics processing work faster with Google Cloud Platform N...
PDF
January 2020 - re:Invent reCap slides - Denver Amazon Web Services Users' Group
PDF
Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...
PDF
Get higher performance for your MySQL databases with Dell APEX Private Cloud ...
PDF
Google Cloud N2 instances featuring 3rd Gen Intel Xeon Scalable processors ex...
PPTX
Elastic Compute Cloud (EC2) on AWS Presentation
PDF
Complete more PostgreSQL work with new Microsoft Azure Lsv3-series VMs featur...
PDF
Get a clearer picture of potential cloud performance by looking beyond SPECra...
PDF
Workstations powered by Intel can play a vital role in CPU-intensive AI devel...
PDF
Workstations powered by Intel for AI developer tasks: Q&A
PDF
Boost your MariaDB online transaction processing performance with N2 standard...
PDF
Open up new possibilities with higher transactional database performance from...
PDF
Get competitive logistic regression performance with servers with AMD EPYC 75...
PDF
Comprehensive Performance Evaluation on Multiplication of Matrices using MPI
PDF
Process data analytics queries faster with new Microsoft Azure Lsv3-series VM...
DOCX
1.multicore processors
PDF
Lesson 26. Optimization of 64-bit programs
DOCX
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Automatic scaling of internet applica...
PDF
How to use Apache TVM to optimize your ML models
PDF
Accelerating Real Time Applications on Heterogeneous Platforms
Complete online analytics processing work faster with Google Cloud Platform N...
January 2020 - re:Invent reCap slides - Denver Amazon Web Services Users' Group
Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...
Get higher performance for your MySQL databases with Dell APEX Private Cloud ...
Google Cloud N2 instances featuring 3rd Gen Intel Xeon Scalable processors ex...
Elastic Compute Cloud (EC2) on AWS Presentation
Complete more PostgreSQL work with new Microsoft Azure Lsv3-series VMs featur...
Get a clearer picture of potential cloud performance by looking beyond SPECra...
Workstations powered by Intel can play a vital role in CPU-intensive AI devel...
Workstations powered by Intel for AI developer tasks: Q&A
Boost your MariaDB online transaction processing performance with N2 standard...
Open up new possibilities with higher transactional database performance from...
Get competitive logistic regression performance with servers with AMD EPYC 75...
Comprehensive Performance Evaluation on Multiplication of Matrices using MPI
Process data analytics queries faster with new Microsoft Azure Lsv3-series VM...
1.multicore processors
Lesson 26. Optimization of 64-bit programs
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Automatic scaling of internet applica...
How to use Apache TVM to optimize your ML models
Accelerating Real Time Applications on Heterogeneous Platforms
Ad

More from Principled Technologies (20)

PDF
Modernizing your data center with Dell and AMD
PDF
Dell Pro 14 Plus: Be better prepared for what’s coming
PDF
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
PDF
Make GenAI investments go further with the Dell AI Factory - Infographic
PDF
Make GenAI investments go further with the Dell AI Factory
PDF
Unlock faster insights with Azure Databricks
PDF
Speed up your transactions and save with new Dell PowerEdge R7725 servers pow...
PDF
The case for on-premises AI
PDF
Dell PowerEdge server cooling: Choose the cooling options that match the need...
PDF
Speed up your transactions and save with new Dell PowerEdge R7725 servers pow...
PDF
Propel your business into the future by refreshing with new one-socket Dell P...
PDF
Propel your business into the future by refreshing with new one-socket Dell P...
PDF
Unlock flexibility, security, and scalability by migrating MySQL databases to...
PDF
Migrate your PostgreSQL databases to Microsoft Azure for plug‑and‑play simpli...
PDF
On-premises AI approaches: The advantages of a turnkey solution, HPE Private ...
PDF
A Dell PowerStore shared storage solution is more cost-effective than an HCI ...
PDF
Gain the flexibility that diverse modern workloads demand with Dell PowerStore
PDF
Save up to $2.8M per new server over five years by consolidating with new Sup...
PDF
Securing Red Hat workloads on Azure - Summary Presentation
PDF
Securing Red Hat workloads on Azure - Infographic
Modernizing your data center with Dell and AMD
Dell Pro 14 Plus: Be better prepared for what’s coming
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Make GenAI investments go further with the Dell AI Factory - Infographic
Make GenAI investments go further with the Dell AI Factory
Unlock faster insights with Azure Databricks
Speed up your transactions and save with new Dell PowerEdge R7725 servers pow...
The case for on-premises AI
Dell PowerEdge server cooling: Choose the cooling options that match the need...
Speed up your transactions and save with new Dell PowerEdge R7725 servers pow...
Propel your business into the future by refreshing with new one-socket Dell P...
Propel your business into the future by refreshing with new one-socket Dell P...
Unlock flexibility, security, and scalability by migrating MySQL databases to...
Migrate your PostgreSQL databases to Microsoft Azure for plug‑and‑play simpli...
On-premises AI approaches: The advantages of a turnkey solution, HPE Private ...
A Dell PowerStore shared storage solution is more cost-effective than an HCI ...
Gain the flexibility that diverse modern workloads demand with Dell PowerStore
Save up to $2.8M per new server over five years by consolidating with new Sup...
Securing Red Hat workloads on Azure - Summary Presentation
Securing Red Hat workloads on Azure - Infographic
Ad

Recently uploaded (20)

PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Unlocking AI with Model Context Protocol (MCP)
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Empathic Computing: Creating Shared Understanding
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
KodekX | Application Modernization Development
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPTX
Cloud computing and distributed systems.
PDF
Approach and Philosophy of On baking technology
PPT
Teaching material agriculture food technology
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PPTX
Big Data Technologies - Introduction.pptx
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Electronic commerce courselecture one. Pdf
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Unlocking AI with Model Context Protocol (MCP)
Digital-Transformation-Roadmap-for-Companies.pptx
Empathic Computing: Creating Shared Understanding
Review of recent advances in non-invasive hemoglobin estimation
Spectral efficient network and resource selection model in 5G networks
KodekX | Application Modernization Development
The Rise and Fall of 3GPP – Time for a Sabbatical?
Advanced methodologies resolving dimensionality complications for autism neur...
Cloud computing and distributed systems.
Approach and Philosophy of On baking technology
Teaching material agriculture food technology
NewMind AI Weekly Chronicles - August'25 Week I
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
“AI and Expert System Decision Support & Business Intelligence Systems”
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Big Data Technologies - Introduction.pptx
Chapter 3 Spatial Domain Image Processing.pdf
Electronic commerce courselecture one. Pdf
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...

AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors offered better BERT machine learning performance

  • 1. AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors offered better BERT machine learning performance vs. M5n instances with 2nd Gen Intel Xeon Scalable processors and M6a instances with 3rd Gen AMD EPYC processors Many machine learning workloads involve sorting, analyzing, and making relationships between images, but how can organizations quickly make sense of large amounts of text? Bidirectional Encoder Representations from Transformers (BERT) is a machine learning framework for natural language processing (NLP). To analyze text, BERT looks at all the words around a given word to put it in the correct context. This allows applications such as search engines to predict sentences, answer questions, or generate conversational responses. Using Intel optimization for TensorFlow and ZenDNN integrated with TensorFlow, we compared the BERT machine learning performance of three types of Amazon Web Services (AWS) EC2 series instances: M6i instances with 3rd Gen Intel® Xeon® Scalable processors featuring Intel DL Boost with Vector Neural Network Instructions, M5n instances with 2nd Gen Intel Xeon Scalable processors, and M6a instances with 3rd Gen AMD EPYC™ processors. In tests at multiple instance sizes, AWS M6i instances offered up to 45 percent better BERT performance on a benchmark from the Intel Model Zoo than the M5n instances with previous- gen processors and up to 6.4 times the BERT performance compared to M6a instances with 3rd Gen AMD EPYC processors. This means that organizations running similar BERT workloads in the cloud could get better performance per instance by choosing M6i instances featuring 3rd Gen Intel Xeon Scalable processors. Up to 5.2x the queries per second vs. M6a instances Up to 5.1x the queries per second vs. M6a instances Up to 6.4x the queries per second vs. 
M6a instances 4 vCPUs 8 vCPUs 16 vCPUs AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors offered better BERT machine learning performance June 2022 A Principled Technologies report: Hands-on testing. Real-world results.
  • 2. Figure 1: Key specifications for each instance size we tested. Source: Principled Technologies. How we tested We purchased three sets of instances from three general-purpose AWS EC2 series: • M6i instances featuring 3rd Gen Intel Xeon Platinum 8375C processors (Ice Lake) • M5n instances featuring 2nd Gen Intel Xeon Platinum 8259CL processors (Cascade Lake) • M6a instances featuring 3rd Gen AMD EPYC 7R13 processors (Milan) We ran each instance in the US East 1 region. Figure 1 shows the specifications for the instances that we chose. To show how businesses of various sizes with different machine learning demands can benefit from choosing M6i instances, we tested instances with 4 vCPUs, 8 vCPUs, and 16 vCPUs. To account for different types of datasets, we ran tests using a small batch size of 1 and a large batch size of 32—where batch size is the number of samples that go through the neural network at a time. In this report, we present the comparisons between M6i and M5n instances first, and then present the comparisons between M6i and M6a instances. (Note: For additional test results on even larger instances, see the science behind the report.) 4 vCPUs 8 vCPUs 16 vCPUs Testing BERT performance in the cloud The BERT framework, which was trained on text from the English language Wikipedia with over 2.5 million words, works by turning text into numbers to sort, analyze, and make predictions about that text.1 Depending on the dataset on which an organization needs to run BERT machine leaning, the size of the AWS instances they choose will vary. To account for these different needs, we tested using two batch sizes across three different instance sizes. We used a BERT benchmark from Intel Model Zoo, which offers a range of machine learning models and tools. At the time of our testing, AMD EPYC processors did not support INT8 precision for BERT, so we present FP32 precision results for M6i instances as well for comparison. 
In all three, the M6i instances enabled by 3rd Gen Intel Xeon Scalable processors outperformed both the previous-gen M5n instances and the current-gen M6a instances. June 2022 | 2 AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors offered better BERT machine learning performance
  • 3. Why choose M6i instances with 3rd Gen Intel Xeon Scalable processors? New M6i instances with 3rd Gen Intel Xeon Scalable processors offer the following:4 • All-core turbo frequency of up to 3.5 GHz • Always-on memory encryption with Intel Total Memory Encryption (TME) • Intel DL Boost with Vector Neural Network Instructions (VNNI) that accelerate INT8 performance • Intel Advanced Vector Extensions 512 (Intel AVX-512) instructions for demanding machine learning workloads • Support for up to 128 vCPUs and 512 GB of memory per instance • Up to 50Gbps networking About 3rd Generation Intel Xeon Scalable processors According to Intel, 3rd Generation Intel Xeon Scalable processors are “[o]ptimized for cloud, enterprise, HPC, network, security, and IoT workloads with 8 to 40 powerful cores and a wide range of frequency, feature, and power levels.”2 Intel continues to offer many models from the Platinum, Gold, Silver, and Bronze processor lines that they “designed through decades of innovation for the most common workload requirements.3 For more information, visit http://intel.com/xeonscalable. June 2022 | 3 AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors offered better BERT machine learning performance
  • 4. Instances with 4 vCPUs: M6i vs. M5n First, we compared BERT performance on smaller instances, looking at the relative amount of text the instance types analyzed on 4vCPU configurations. As Figure 2 shows, M6i instances enabled by 3rd Gen Intel Xeon Scalable processors analyzed up to 18 percent more examples per second than the M5n instances with 2nd Gen Intel Xeon Scalable processors. Figure 2: Relative BERT performance for M6i and M5n instances using 4 vCPUs. Higher numbers are better. Source: Principled Technologies. Instances with 8 vCPUs: M6i vs. M5n When we doubled the instance size to 8 vCPUs, M6i instances delivered a similar performance increase over previous-gen M5n instances. Figure 3 compares the relative amount of text the instance types analyzed on 8vCPU configurations. The M6i instances enabled by 3rd Gen Intel Xeon Scalable processors analyzed up to 11 percent more examples per second than the M5n instances with 2nd Gen Intel Xeon Scalable processors. Figure 3: Relative BERT performance for M6i and M5n instances using 8 vCPUs. Higher numbers are better. Source: Principled Technologies. Relative BERT performance of m6i.xlarge vs. m5n.xlarge Larger is better 0 0.20 0.40 0.60 0.80 1.00 1.40 1.20 1.00 Relative throughput (examples/sec) M6i (INT8) M5n (INT8) M6i (INT8) Batch size: 1 Batch size: 32 1.13 1.00 1.18 M5n (INT8) 1.60 Relative BERT performance of m6i.2xlarge vs. m5n.2xlarge Larger is better 0 0.20 0.40 0.60 0.80 1.00 1.40 1.20 1.00 Relative throughput (examples/sec) M6i (INT8) M5n (INT8) M6i (INT8) Batch size: 1 Batch size: 32 1.11 1.00 1.11 M5n (INT8) 1.60 up to 11% better throughput up to 18% better throughput 4 vCPUs 8 vCPUs June 2022 | 4 AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors offered better BERT machine learning performance
Instances with 16 vCPUs: M6i vs. M5n

As Figure 4 shows, M6i instances offered the greatest relative BERT performance increase over previous-gen M5n instances in the larger 16-vCPU configurations. The M6i instances enabled by 3rd Gen Intel Xeon Scalable processors analyzed up to 45 percent more examples per second than the M5n instances with 2nd Gen Intel Xeon Scalable processors. By improving textual data analysis throughput by 45 percent, organizations could reduce the number of instances they need to purchase and manage when they select the M6i instance type.

Figure 4: Relative BERT performance of m6i.4xlarge vs. m5n.4xlarge (INT8, throughput in examples/sec normalized to M5n at 1.00). M6i delivered 1.21x the throughput at batch size 1 and 1.45x (up to 45% better throughput) at batch size 32. Higher numbers are better. Source: Principled Technologies.
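The "up to 45 percent" figures in these comparisons are simple ratios of raw throughput in examples per second. As a minimal sketch (the helper function is ours, and the absolute examples/sec values are hypothetical; only the 1.45 ratio comes from Figure 4):

```python
def relative_throughput(candidate_eps: float, baseline_eps: float) -> float:
    """Normalize a candidate's examples/sec to a baseline, as in Figures 2-4."""
    return candidate_eps / baseline_eps

# Hypothetical absolute throughputs chosen so the ratio matches Figure 4's
# batch-size-32 result (1.45x, i.e. 45% more examples per second):
m5n_eps = 100.0
m6i_eps = 145.0
ratio = relative_throughput(m6i_eps, m5n_eps)
print(f"{ratio:.2f}x the throughput, {100 * (ratio - 1):.0f}% better")
```

Normalizing to the baseline instance this way is what lets the report compare instance sizes on one axis without publishing absolute benchmark numbers.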
Instances with 4 vCPUs: M6i vs. M6a

After comparing BERT performance of M6i instances against that of instances based on previous-gen processors, we compared those three sizes of M6i instances against M6a instances with AMD EPYC processors. Figure 5 compares the relative amount of text these instance types analyzed in 4-vCPU configurations. The M6i instances enabled by 3rd Gen Intel Xeon Scalable processors using INT8 precision analyzed data up to 5.29 times as fast as the M6a instances with 3rd Gen AMD EPYC processors using FP32 precision.

Note: At the time of testing, INT8 precision, which can improve performance for these types of machine learning workloads, was not available for BERT workloads on AMD EPYC processors. Even using FP32 precision, M6i instances improved performance over M6a instances by as much as 68 percent.

Figure 5: Relative BERT performance of m6i.xlarge vs. m6a.xlarge (throughput in examples/sec normalized to M6a FP32 at 1.00). At batch size 1, M6i delivered 4.24x the throughput with INT8 and 1.68x with FP32; at batch size 32, M6i delivered 5.29x the throughput with INT8 (up to 5.29x the throughput) and 1.57x with FP32. Higher numbers are better. Source: Principled Technologies.

Instances with 8 vCPUs: M6i vs. M6a

When we increased the instance sizes to 8 vCPUs, performance increases were similar to the 4-vCPU configurations. Figure 6 compares the relative amount of text the instance types analyzed in 8-vCPU configurations. The M6i instances enabled by 3rd Gen Intel Xeon Scalable processors analyzed data up to 5.10 times as fast as the M6a instances with 3rd Gen AMD EPYC processors.

Figure 6: Relative BERT performance of m6i.2xlarge vs. m6a.2xlarge (throughput in examples/sec normalized to M6a FP32 at 1.00). At batch size 1, M6i delivered 4.44x the throughput with INT8 and 1.64x with FP32; at batch size 32, M6i delivered 5.10x the throughput with INT8 (up to 5.10x the throughput) and 1.68x with FP32. Higher numbers are better. Source: Principled Technologies.
Instances with 16 vCPUs: M6i vs. M6a

The biggest relative difference in BERT performance occurred in our 16-vCPU comparison of M6i and M6a configurations. Figure 7 compares the relative examples per second the instance types analyzed in 16-vCPU configurations. The M6i instances enabled by 3rd Gen Intel Xeon Scalable processors analyzed data up to 6.40 times as fast as the M6a instances with 3rd Gen AMD EPYC processors. These results show that for these types of BERT workloads, selecting M6i instances that offer INT8 precision over M6a instances that don't could allow organizations to complete textual analysis workloads using fewer cloud instances.

Figure 7: Relative BERT performance of m6i.4xlarge vs. m6a.4xlarge (throughput in examples/sec normalized to M6a FP32 at 1.00). At batch size 1, M6i delivered 6.37x the throughput with INT8 and 1.81x with FP32; at batch size 32, M6i delivered 6.40x the throughput with INT8 (up to 6.40x the throughput) and 2.24x with FP32. Higher numbers are better. Source: Principled Technologies.
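The INT8 results above rely on quantization: approximating FP32 tensors with 8-bit integers so that VNNI can process many more values per instruction. This toy sketch is our own illustration of symmetric per-tensor quantization, not the quantization scheme the benchmark actually used:

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats onto integers in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any scale works
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the 8-bit representation."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
print(q)                      # small integers that fit in 8 bits
print(dequantize(q, scale))   # close to the original weights
```

The trade-off the report's results reflect: the INT8 representation uses a quarter of the storage and bandwidth of FP32 and maps onto VNNI's 8-bit multiply-accumulate path, at the cost of a small, usually acceptable loss of precision.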
Scaling BERT workloads

Another consideration when assessing BERT performance is how throughput scales as you increase the size of the instance. In theory, performance could double as you double the vCPU count, which would be perfect linear scaling. While resource allocation makes this unlikely in the real world, the closer an instance approaches this ideal, the better. As Figure 8 shows, using results from our batch-size-1 tests, the M6i instance with 3rd Gen Intel Xeon Scalable processors had better BERT performance scaling from 8 vCPUs to 16 vCPUs than the M6a instance with AMD EPYC processors, though slightly worse scaling from 4 vCPUs to 8 vCPUs.

Figure 8: How BERT performance scaled across instance sizes, compared to results from the 4-vCPU tests with batch size 1. Higher numbers are better. Source: Principled Technologies.

Figure 9 makes the same comparison but uses results from our batch-size-32 testing. Again, the M6i instance with 3rd Gen Intel Xeon Scalable processors scaled more linearly from 4 to 16 vCPUs than the M6a instance.

Figure 9: How BERT performance scaled across instance sizes, compared to results from the 4-vCPU tests with batch size 32. Higher numbers are better. Source: Principled Technologies.
Figure 8 data (batch size 1, relative throughput in examples/sec normalized to each configuration's own 4-vCPU result at 1.00): at 8 vCPUs, M6i (INT8) reached 1.85, M6i (FP32) 1.86, and M6a (FP32) 1.91; at 16 vCPUs, M6i (INT8) reached 3.83, M6i (FP32) 3.40, and M6a (FP32) 3.16.

Figure 9 data (batch size 32, same normalization): at 8 vCPUs, M6i (INT8) reached 1.86, M6i (FP32) 1.90, and M6a (FP32) 1.78; at 16 vCPUs, M6i (INT8) reached 3.79, M6i (FP32) 3.52, and M6a (FP32) 2.52.

By selecting M6i instances that offer more linear, predictable performance scaling, organizations could more reliably fix their cloud operating budgets as textual analysis workloads continue to grow.
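The scaling results in Figures 8 and 9 can be summarized as efficiency against the ideal of linear scaling (4x the throughput at 4x the vCPUs). A small sketch using the batch-size-1 numbers reported for the 16-vCPU instances; the helper function is our own:

```python
def scaling_efficiency(relative_throughput: float, vcpu_factor: float) -> float:
    """Fraction of ideal linear scaling achieved (1.0 = perfectly linear)."""
    return relative_throughput / vcpu_factor

# Batch size 1: throughput at 16 vCPUs relative to the same type at 4 vCPUs
# (a 4x increase in vCPUs, so the linear ideal is 4.0).
m6i_int8 = scaling_efficiency(3.83, 4.0)  # roughly 96% of ideal
m6a_fp32 = scaling_efficiency(3.16, 4.0)  # roughly 79% of ideal
print(f"M6i INT8: {m6i_int8:.0%} of linear, M6a FP32: {m6a_fp32:.0%} of linear")
```

Framing the results this way makes the budgeting point concrete: the closer the efficiency stays to 1.0 as instances grow, the more predictably capacity planning maps vCPU spend to throughput.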
Conclusion

Organizations analyzing textual data using NLP through the BERT framework must decide which type of instance can deliver the BERT performance they need. In our tests, we found that across instance sizes, AWS M6i instances with 3rd Gen Intel Xeon Scalable processors outperformed both M5n instances with 2nd Gen Intel Xeon Scalable processors and M6a instances with 3rd Gen AMD EPYC processors for BERT machine learning. In addition, the M6i instances offered more predictable scaling at 16 vCPUs. These performance increases could help you get quicker insight from textual data to better satisfy consumers and increase revenues.

1. TechTarget, "BERT language model," accessed December 16, 2021, https://www.techtarget.com/searchenterpriseai/definition/BERT-language-model.
2. Intel, "3rd Gen Intel® Xeon® Scalable Processors," accessed December 14, 2021, https://www.intel.com/content/www/us/en/products/docs/processors/xeon/3rd-gen-xeon-scalable-processors-brief.html.
3. Intel, "3rd Gen Intel® Xeon® Scalable Processors."
4. Amazon, "Amazon EC2 M6i Instances," accessed December 14, 2021, https://aws.amazon.com/ec2/instance-types/m6i/.

This project was commissioned by Intel. For additional information, read the science behind this report at https://facts.pt/ZymIIA3.

Principled Technologies is a registered trademark of Principled Technologies, Inc. All other product names are the trademarks of their respective owners.