Speed up deep learning tasks with Amazon Web
Services instances featuring 2nd Gen Intel Xeon
Scalable processors
Newer M5n instances featuring Intel Xeon Platinum 8272CL processors
performed more inference operations per second than M4 instances with
older processors
Big data—the massive amounts of information that organizations collect—is useful only if the
information is sorted and classified to deliver insights they can act on. By using deep learning
networks for quick image classification and prediction, computers can offer insight into business
patterns and offer suggestions to consumers in real time. Amazon Web Services (AWS) Elastic
Compute Cloud (EC2) offers several cloud instances that can support deep learning models,
including general-purpose M5n instances. These newer AWS M5n instances run on Intel® Xeon®
Platinum 8272CL processors, which include Intel Deep Learning Boost, a feature Intel
designed to improve machine learning workloads.
At Principled Technologies, we used two deep learning inference benchmarks from the Model
Zoo for Intel Architecture—ResNet50, which classifies images, and Wide & Deep recommendation
system, which generates advertisement recommendations—to compare the inference
performance of older M4 instances to newer M5n instances at various instance sizes. We found
that for both deep learning frameworks, the upgraded M5n instances offered significantly better
inference performance, which shows that M5n instances featuring 2nd Generation Intel Xeon
Scalable processors can help organizations make sense of their data faster.
Classify up to 6.17x the images per second
Generate ad recommendations up to 2.94x as fast
Speed up deep learning tasks with Amazon Web Services instances featuring 2nd Gen Intel Xeon Scalable processors September 2021
A Principled Technologies report: Hands-on testing. Real-world results.
Figure 1: Key specifications for each instance size we tested. Source: Principled Technologies.
How we tested
We purchased three sets of instances from two general-purpose AWS EC2 series:
• Newer M5n instances featuring 2nd Generation Intel Xeon Scalable processors (Cascade Lake)
• Older M4 instances featuring Intel Xeon E5-2686 v4 processors (Broadwell)
We ran each instance in the US East 1 region.
Figure 1 shows the specifications for the instances that we chose. To show how businesses of various sizes with
different deep learning demands can benefit from choosing M5n instances, we tested small (8 vCPU), medium
(16 vCPU), and large (64 vCPU) VM sizes.
Small: 8 vCPUs (m5n.2xlarge vs. m4.2xlarge)
Medium: 16 vCPUs (m5n.4xlarge vs. m4.4xlarge)
Large: 64 vCPUs (m5n.16xlarge vs. m4.16xlarge)
About 2nd Generation Intel Xeon Scalable processors with Intel Deep Learning Boost
The 2nd Generation Intel Xeon Scalable processor platform—codenamed Cascade Lake—features a wide
range of processor types, including Bronze, Silver, Gold, and Platinum, to support varying workload needs.
To accelerate machine learning inference, 2nd Gen Intel Xeon Scalable processors offer Intel Deep Learning
Boost (DL Boost). Intel DL Boost builds on Intel Advanced Vector Extensions 512 (AVX-512) instructions with
Intel Vector Neural Network Instructions (VNNI), combining multiple processor instructions into one to improve
machine learning inference performance through resource optimization.1
To learn more about Intel DL Boost built into 2nd Generation Intel Xeon Scalable processors, visit https://www.intel.com/content/dam/www/public/us/en/documents/product-overviews/dl-boost-product-overview.pdf.
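The instruction fusion DL Boost performs can be sketched in plain Python. This is purely illustrative; `vnni_lane` is a hypothetical name for what one 32-bit lane of the VNNI VPDPBUSD instruction computes in hardware:

```python
# Plain-Python sketch of one 32-bit lane of the AVX-512 VNNI VPDPBUSD
# instruction: multiply four unsigned 8-bit values by four signed
# 8-bit values and accumulate the products into a signed 32-bit
# accumulator, all in a single instruction.
def vnni_lane(acc, a_u8, b_s8):
    assert len(a_u8) == len(b_s8) == 4
    for a, b in zip(a_u8, b_s8):
        acc += a * b
    return acc

# Pre-VNNI AVX-512 needs a three-instruction sequence (VPMADDUBSW,
# VPMADDWD, VPADDD) for this multiply-accumulate; VNNI fuses it into one.
print(vnni_lane(0, [1, 2, 3, 4], [10, 20, 30, 40]))  # 300
```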
Why choose M5n instances?
Compared to older M4 instances, M5n instances offer:
• 2nd Generation Intel Xeon Scalable processors with a sustained all-core Turbo frequency
of 3.1 GHz, a maximum single-core turbo frequency of 3.5 GHz, and support for Intel Vector
Neural Network Instructions (AVX-512 VNNI)
• Peak network bandwidth of 25 Gbps for small instances or 100 Gbps for large instances
• EBS or NVMe™ SSDs physically attached to the host server
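A quick way to confirm that a Linux instance actually exposes the VNNI capability is to look for the `avx512_vnni` flag in /proc/cpuinfo. The helper below is a minimal sketch of that check; production code might use a CPU-feature library instead:

```python
# Scan the standard Linux /proc/cpuinfo flags line for the AVX-512
# VNNI feature that Intel DL Boost relies on.
def has_avx512_vnni(cpuinfo_path="/proc/cpuinfo"):
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                return "avx512_vnni" in line.split()
    return False
```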
Classifying images – ResNet50
From Model Zoo for Intel Architecture, which offers machine learning models, we chose the popular ResNet50
deep learning model for testing. ResNet50 is a convolutional neural network that runs 50 layers deep;
organizations use it to recognize and classify images. Deep learning for image classification is useful for real-
world applications such as self-driving cars or aiding in medical diagnoses. The benchmark reported throughput
in images per second that the solutions handled using this model, with higher scores indicating better
performance at this type of deep learning.
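The throughput metric the benchmark reports can be illustrated with a minimal timing sketch. The `infer_fn` callable and batch contents below are placeholders, not the Model Zoo harness:

```python
import time

# Minimal sketch of how an images/sec throughput figure is computed:
# run every batch through the model, time the whole run, and divide
# total images by elapsed seconds.
def measure_throughput(infer_fn, batches, batch_size):
    start = time.perf_counter()
    for batch in batches:
        infer_fn(batch)          # one inference pass per batch
    elapsed = time.perf_counter() - start
    return (len(batches) * batch_size) / elapsed  # images per second

# Example with a trivial "model" standing in for ResNet50:
images_per_sec = measure_throughput(sum, [[0.0] * 224] * 32, batch_size=32)
```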
Small instances
If your deep learning needs are on the smaller side, an AWS instance with 8 vCPUs could
meet your image classification needs. We found that a newer AWS M5n instance with 8 vCPUs
featuring 2nd Gen Intel Xeon Scalable processors classified 6.17 times as many images per
second on the ResNet50 benchmark (with INT8 precision) as the small M4 instance with
previous-generation Intel Xeon processors (with FP32 precision).
[Chart: 8 vCPU ResNet50 normalized throughput in images/sec, higher is better. m5n.2xlarge: 6.17; m4.2xlarge: 1.00]
Figure 2: Relative number of images per second that the small-size instances (8 vCPUs) classified using the
ResNet50 benchmark. Higher numbers are better. Source: Principled Technologies.
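The INT8 runs depend on quantization: mapping FP32 weights and activations into 8-bit integers so VNNI hardware can process them. The sketch below shows simple symmetric quantization for illustration only; it is not the calibration scheme the Model Zoo benchmark uses:

```python
# Symmetric INT8 quantization: scale FP32 values into [-127, 127]
# using the largest magnitude in the tensor, and map back on demand.
def quantize_int8(values):
    peak = max(abs(v) for v in values)
    scale = (peak / 127.0) or 1.0   # guard against all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    return [x * scale for x in q]
```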
Medium instances
Medium instances (16 vCPUs), which are better suited to larger models or datasets, also benefit from
newer processors. We found that a newer AWS M5n instance with 16 vCPUs featuring 2nd Gen Intel
Xeon Scalable processors classified 5.78 times as many images per second on the ResNet50
benchmark (with INT8 precision) as the medium M4 instance with previous-generation Intel
Xeon processors (with FP32 precision).
[Chart: 16 vCPU ResNet50 normalized throughput in images/sec, higher is better. m5n.4xlarge: 5.78; m4.4xlarge: 1.00]
Figure 3: Relative number of images per second that the medium-size instances (16 vCPUs) classified using the
ResNet50 benchmark. Higher numbers are better. Source: Principled Technologies.
Large instances
If your organization needs to run deep learning workloads that extract insights or make recommendations from
even larger datasets, instances with 64 vCPUs can better tackle your needs. We found that a newer AWS M5n
instance with 64 vCPUs featuring 2nd Gen Intel Xeon Scalable processors classified 5.23 times as many
images per second on the ResNet50 benchmark (with INT8 precision) as the large M4 instance with
previous-generation Intel Xeon processors (with FP32 precision).
[Chart: 64 vCPU ResNet50 normalized throughput in images/sec, higher is better. m5n.16xlarge: 5.23; m4.16xlarge: 1.00]
Figure 4: Relative number of images per second that the large-size instances (64 vCPUs) classified using the ResNet50
benchmark. Higher numbers are better. Source: Principled Technologies.
Get more value from your
cloud instances
Based on our machine learning test results,
newer AWS M5n instances offer up to 6.17
times the ResNet50 performance for as
little as 1.19 times the cost. This means
that upgraded M5n instances with 2nd
Gen Intel Xeon Scalable processors can
provide better overall value compared to
older M4 instances.
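The value claim is simple arithmetic on the two normalized figures from this report:

```python
# Normalized figures from this report: up to 6.17x the ResNet50
# throughput for as little as 1.19x the cost.
m5n_perf = 6.17
m5n_cost = 1.19
perf_per_dollar = m5n_perf / m5n_cost
print(f"Performance per dollar vs. M4: {perf_per_dollar:.2f}x")  # ~5.18x
```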
Real-time recommendations based on ad-click historical data – Model
Zoo for Intel Architecture Wide & Deep model
We used a TensorFlow-based model from the Model Zoo for Intel Architecture to conduct Wide & Deep testing.
Wide & Deep uses wide linear models and deep neural networks to infer meaningful relationships between data
and deliver recommendations based on that data.
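The model's two paths can be sketched in a few lines of plain Python. This is a toy illustration of the architecture only, with hypothetical feature and weight names, not the TensorFlow implementation we tested:

```python
# Wide & Deep in miniature: a linear "wide" path memorizes sparse
# cross-features, a small MLP "deep" path generalizes over dense
# features, and their outputs sum into a single recommendation logit.
def wide_part(sparse_features, weights):
    # Linear model over one-hot sparse features.
    return sum(weights.get(f, 0.0) for f in sparse_features)

def deep_part(dense_features, hidden_weights, output_weights):
    # One hidden ReLU layer over dense features.
    hidden = [max(0.0, sum(x * w for x, w in zip(dense_features, row)))
              for row in hidden_weights]
    return sum(h * w for h, w in zip(hidden, output_weights))

def wide_and_deep_logit(sparse_features, dense_features,
                        wide_weights, hidden_weights, output_weights):
    return (wide_part(sparse_features, wide_weights)
            + deep_part(dense_features, hidden_weights, output_weights))
```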
Small instances
Organizations with smaller models and datasets can run deep learning on instances configured with
8 vCPUs. We found that a newer AWS M5n instance with 8 vCPUs featuring 2nd Gen Intel Xeon Scalable
processors (with INT8 precision) handled 2.86 times as many samples per second on the Wide & Deep
benchmark as the small M4 instance with previous-generation processors (with FP32 precision).
[Chart: 8 vCPU Wide & Deep normalized throughput in samples/sec, higher is better. m5n.2xlarge: 2.86; m4.2xlarge: 1.00]
Figure 5: Relative number of samples per second that the small-size instances (8 vCPUs) handled using the Wide & Deep
benchmark. Higher numbers are better. Source: Principled Technologies.
Medium instances
For organizations seeking to make recommendations from mid-sized datasets, 16 vCPU instances
may be more appropriate. We found that a newer AWS M5n instance with 16 vCPUs featuring 2nd
Gen Intel Xeon Scalable processors (with INT8 precision) handled 2.94 times as many samples
per second on the Wide & Deep benchmark as the medium M4 instance with previous-generation
processors (with FP32 precision).
[Chart: 16 vCPU Wide & Deep normalized throughput in samples/sec, higher is better. m5n.4xlarge: 2.94; m4.4xlarge: 1.00]
Figure 6: Relative number of samples per second that the medium-size instances (16 vCPUs) handled using the
Wide & Deep benchmark. Higher numbers are better. Source: Principled Technologies.
Large instances
Organizations that need quick recommendations from larger datasets may require virtual machines with 64
vCPUs. We found that a newer AWS M5n instance with 64 vCPUs featuring 2nd Gen Intel Xeon
Scalable processors (with INT8 precision) handled 2.67 times as many samples per second
on the Wide & Deep benchmark as the large M4 instance with previous-generation
processors (with FP32 precision).
[Chart: 64 vCPU Wide & Deep normalized throughput in samples/sec, higher is better. m5n.16xlarge: 2.67; m4.16xlarge: 1.00]
Figure 7: Relative number of samples per second that the large-size instances (64 vCPUs) handled using the
Wide & Deep benchmark. Higher numbers are better. Source: Principled Technologies.
Choose AWS M5n instances featuring 2nd Gen Intel Xeon Scalable
processors for timely insights from data
Getting faster insights from never-ending data streams can improve business agility and lead to greater
success. Our test results show that newer AWS M5n instances featuring 2nd Gen Intel Xeon Scalable processors
with Intel Deep Learning Boost sped up deep learning inference performance for image classification and
recommendation models over older M4 instances. These performance improvements come at little added
cost, which means that M5n instances can offer better value per VM. By doing more deep learning work per
instance, your organization could ultimately require fewer instances overall, which can help keep budget
concerns in check.
By choosing AWS EC2 M5n instances with 2nd Gen Intel Xeon Scalable processors, your organization can get
deep learning insights from data faster than with older M4 instances.
1	Intel, “Intel Deep Learning Boost,” accessed July 29, 2021, https://www.intel.com/content/dam/www/public/us/en/documents/product-overviews/dl-boost-product-overview.pdf.
Principled Technologies is a registered trademark of Principled Technologies, Inc.
All other product names are the trademarks of their respective owners.
For additional information, review the science behind this report.
Principled
Technologies®
Facts matter.®
This project was commissioned by Intel.
Read the science behind this report at http://facts.pt/7J92SKA

More Related Content

PDF
DBaaS with VMware vCAC, EMC XtremIO, and Cisco UCS
PDF
Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...
PDF
Consolidate SAS 9.4 workloads with Intel Xeon processor E7 v3 and Intel SSD t...
PDF
Component upgrades from Intel and Dell can increase VM density and boost perf...
PDF
Boosting performance with the Dell Acceleration Appliance for Databases
PDF
Better email response time using Microsoft Exchange 2013 with the Dell PowerE...
PDF
Performance of three Intel-based SMB servers running Web, email, and database...
PDF
Make sense of important data faster with AWS EC2 M6i instances
DBaaS with VMware vCAC, EMC XtremIO, and Cisco UCS
Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...
Consolidate SAS 9.4 workloads with Intel Xeon processor E7 v3 and Intel SSD t...
Component upgrades from Intel and Dell can increase VM density and boost perf...
Boosting performance with the Dell Acceleration Appliance for Databases
Better email response time using Microsoft Exchange 2013 with the Dell PowerE...
Performance of three Intel-based SMB servers running Web, email, and database...
Make sense of important data faster with AWS EC2 M6i instances

What's hot (20)

PDF
Distributed DNN training: Infrastructure, challenges, and lessons learned
PDF
Phdvb citrix-datasheet
PPT
Build FAST Deep Learning Apps with Docker on OpenPOWER and GPUs
PDF
Increase density and performance with upgrades from Intel and Dell
PDF
Deep Learning Computer Build
PDF
Pod density comparison: VMware vSphere with Tanzu vs. a bare-metal approach ...
PDF
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
PDF
Intel Xeon processor E5-2690: Enterprise workload performance while running s...
PDF
PDF
Introduction to Parallel Distributed Computer Systems
PDF
Citrix Xen Desktop Solution White Paper
PDF
Accelerate performance on machine learning workloads with the Dell EMC PowerE...
PDF
Using VMTurbo to boost performance
PDF
Get the most out of your storage with the Dell EMC Unity XT 880F All-Flash ar...
PDF
Intel - Office 2010 Solution Brief
PDF
The All New 2010 Intel Core vPro Processor Family and Windows 7: Your Best Ch...
PDF
Perceiving and recovering degraded data on secure cloud
PDF
Classification of Virtualization Environment for Cloud Computing
PDF
Demartek Lenovo Storage S3200 MS Exchange Evaluation_2016-01
PDF
Harness Enterprise Data with SQL Server 2008 R2 and New Intel Xeon Processors
Distributed DNN training: Infrastructure, challenges, and lessons learned
Phdvb citrix-datasheet
Build FAST Deep Learning Apps with Docker on OpenPOWER and GPUs
Increase density and performance with upgrades from Intel and Dell
Deep Learning Computer Build
Pod density comparison: VMware vSphere with Tanzu vs. a bare-metal approach ...
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker
Intel Xeon processor E5-2690: Enterprise workload performance while running s...
Introduction to Parallel Distributed Computer Systems
Citrix Xen Desktop Solution White Paper
Accelerate performance on machine learning workloads with the Dell EMC PowerE...
Using VMTurbo to boost performance
Get the most out of your storage with the Dell EMC Unity XT 880F All-Flash ar...
Intel - Office 2010 Solution Brief
The All New 2010 Intel Core vPro Processor Family and Windows 7: Your Best Ch...
Perceiving and recovering degraded data on secure cloud
Classification of Virtualization Environment for Cloud Computing
Demartek Lenovo Storage S3200 MS Exchange Evaluation_2016-01
Harness Enterprise Data with SQL Server 2008 R2 and New Intel Xeon Processors
Ad

Similar to Speed up deep learning tasks with Amazon Web Services instances featuring 2nd Gen Intel Xeon Scalable processors (20)

PDF
AWS EC2 M6i instances with 3rd Gen Intel Xeon Scalable processors accelerated...
PDF
AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors improv...
PDF
AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors offere...
PDF
Google Cloud N2 VM instances featuring 3rd Gen Intel Xeon Scalable processors...
PDF
Finish Microsoft SQL Server data analysis faster with new M5n series instance...
PDF
Complete online analytics processing work faster with Google Cloud Platform N...
PDF
Google Cloud N2 instances featuring 3rd Gen Intel Xeon Scalable processors ex...
PDF
Accelerate natural language processing with AWS EC2 M7i instances featuring 4...
PDF
Process data analytics queries faster with new Microsoft Azure Lsv3-series VM...
PPTX
Anomaly Detection with Azure and .NET
PDF
Complete artificial intelligence workloads faster using Microsoft Azure virtu...
PPTX
Anomaly Detection with Azure and .net
PDF
Get higher performance for your MySQL databases with Dell APEX Private Cloud ...
PDF
Prepare images for machine learning faster with servers powered by AMD EPYC 7...
PDF
Get competitive logistic regression performance with servers with AMD EPYC 75...
PPTX
Squeezing Deep Learning Into Mobile Phones
PDF
Research Paper of Image Recognition .02.pdf
PDF
Metaphorical Analysis of diseases in Tomato leaves using Deep Learning Algori...
PDF
Podila QCon SF 2016
PDF
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
AWS EC2 M6i instances with 3rd Gen Intel Xeon Scalable processors accelerated...
AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors improv...
AWS EC2 M6i instances featuring 3rd Gen Intel Xeon Scalable processors offere...
Google Cloud N2 VM instances featuring 3rd Gen Intel Xeon Scalable processors...
Finish Microsoft SQL Server data analysis faster with new M5n series instance...
Complete online analytics processing work faster with Google Cloud Platform N...
Google Cloud N2 instances featuring 3rd Gen Intel Xeon Scalable processors ex...
Accelerate natural language processing with AWS EC2 M7i instances featuring 4...
Process data analytics queries faster with new Microsoft Azure Lsv3-series VM...
Anomaly Detection with Azure and .NET
Complete artificial intelligence workloads faster using Microsoft Azure virtu...
Anomaly Detection with Azure and .net
Get higher performance for your MySQL databases with Dell APEX Private Cloud ...
Prepare images for machine learning faster with servers powered by AMD EPYC 7...
Get competitive logistic regression performance with servers with AMD EPYC 75...
Squeezing Deep Learning Into Mobile Phones
Research Paper of Image Recognition .02.pdf
Metaphorical Analysis of diseases in Tomato leaves using Deep Learning Algori...
Podila QCon SF 2016
Saptashwa_Mitra_Sitakanta_Mishra_Final_Project_Report
Ad

More from Principled Technologies (20)

PDF
Modernizing your data center with Dell and AMD
PDF
Dell Pro 14 Plus: Be better prepared for what’s coming
PDF
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
PDF
Make GenAI investments go further with the Dell AI Factory - Infographic
PDF
Make GenAI investments go further with the Dell AI Factory
PDF
Unlock faster insights with Azure Databricks
PDF
Speed up your transactions and save with new Dell PowerEdge R7725 servers pow...
PDF
The case for on-premises AI
PDF
Dell PowerEdge server cooling: Choose the cooling options that match the need...
PDF
Speed up your transactions and save with new Dell PowerEdge R7725 servers pow...
PDF
Propel your business into the future by refreshing with new one-socket Dell P...
PDF
Propel your business into the future by refreshing with new one-socket Dell P...
PDF
Unlock flexibility, security, and scalability by migrating MySQL databases to...
PDF
Migrate your PostgreSQL databases to Microsoft Azure for plug‑and‑play simpli...
PDF
On-premises AI approaches: The advantages of a turnkey solution, HPE Private ...
PDF
A Dell PowerStore shared storage solution is more cost-effective than an HCI ...
PDF
Gain the flexibility that diverse modern workloads demand with Dell PowerStore
PDF
Save up to $2.8M per new server over five years by consolidating with new Sup...
PDF
Securing Red Hat workloads on Azure - Summary Presentation
PDF
Securing Red Hat workloads on Azure - Infographic
Modernizing your data center with Dell and AMD
Dell Pro 14 Plus: Be better prepared for what’s coming
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Make GenAI investments go further with the Dell AI Factory - Infographic
Make GenAI investments go further with the Dell AI Factory
Unlock faster insights with Azure Databricks
Speed up your transactions and save with new Dell PowerEdge R7725 servers pow...
The case for on-premises AI
Dell PowerEdge server cooling: Choose the cooling options that match the need...
Speed up your transactions and save with new Dell PowerEdge R7725 servers pow...
Propel your business into the future by refreshing with new one-socket Dell P...
Propel your business into the future by refreshing with new one-socket Dell P...
Unlock flexibility, security, and scalability by migrating MySQL databases to...
Migrate your PostgreSQL databases to Microsoft Azure for plug‑and‑play simpli...
On-premises AI approaches: The advantages of a turnkey solution, HPE Private ...
A Dell PowerStore shared storage solution is more cost-effective than an HCI ...
Gain the flexibility that diverse modern workloads demand with Dell PowerStore
Save up to $2.8M per new server over five years by consolidating with new Sup...
Securing Red Hat workloads on Azure - Summary Presentation
Securing Red Hat workloads on Azure - Infographic

Recently uploaded (20)

PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Network Security Unit 5.pdf for BCA BBA.
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Encapsulation theory and applications.pdf
PDF
KodekX | Application Modernization Development
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Empathic Computing: Creating Shared Understanding
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
cuic standard and advanced reporting.pdf
PDF
Bridging biosciences and deep learning for revolutionary discoveries: a compr...
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
NewMind AI Weekly Chronicles - August'25 Week I
20250228 LYD VKU AI Blended-Learning.pptx
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Network Security Unit 5.pdf for BCA BBA.
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
The AUB Centre for AI in Media Proposal.docx
Encapsulation theory and applications.pdf
KodekX | Application Modernization Development
Digital-Transformation-Roadmap-for-Companies.pptx
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Diabetes mellitus diagnosis method based random forest with bat algorithm
Empathic Computing: Creating Shared Understanding
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
cuic standard and advanced reporting.pdf
Bridging biosciences and deep learning for revolutionary discoveries: a compr...

Speed up deep learning tasks with Amazon Web Services instances featuring 2nd Gen Intel Xeon Scalable processors

  • 1. Speed up deep learning tasks with Amazon Web Services instances featuring 2nd Gen Intel Xeon Scalable processors Newer M5n instances featuring Intel Xeon Platinum 8272CL processors performed more inference operations per second than M4 instances with older processors Big data—the massive amounts of information that organizations collect—is useful only if the information is sorted and classified to deliver insights they can act on. By using deep learning networks for quick image classification and prediction, computers can offer insight into business patterns and offer suggestions to consumers in real time. Amazon Web Services (AWS) Elastic Compute Cloud (EC2) offers several cloud instances that can support deep learning models, including general-purpose M5n instances. These newer AWS M5n instances run on Intel® Xeon® Platinum 8272CL processors, which include a feature, Intel Deep Learning Boost, that Intel designed to improve machine learning workloads. At Principled Technologies, we used two deep learning inference benchmarks from the Model Zoo for Intel Architecture—ResNet50, which classifies images, and Wide & Deep recommendation system, which generates advertisement recommendations—to compare the inference performance of older M4 instances to newer M5n instances at various instance sizes. We found that for both deep learning frameworks, the upgraded M5n instances offered significantly better inference performance, which shows that M5n instances featuring 2nd Generation Intel Xeon Scalable processors can help organizations make sense of their data faster. Classify images up to 6.17x more images per second Generate ad recommendations up to 2.94x as fast Speed up deep learning tasks with Amazon Web Services instances featuring 2nd Gen Intel Xeon Scalable processors September 2021 A Principled Technologies report: Hands-on testing. Real-world results.
  • 2. Figure 1: Key specifications for each instance size we tested. Source: Principled Technologies. How we tested We purchased three sets of instances from two general-purpose AWS EC2 series: • Newer M5n instances featuring 2nd Generation Intel Xeon Scalable processors (Cascade Lake) • Older M4 instances featuring Intel Xeon E5-2686 v4 processors (Broadwell) We ran each instance in the US East 1 region. Figure 1 shows the specifications for the instances that we chose. To show how businesses of various sizes with different deep learning demands can benefit from choosing M5n instances, we tested small (8 vCPU), medium (16 vCPU), and large (64 vCPU) VM sizes. Small (E8s_v4) 8 vCPUs Medium (E16s_v4) 16 vCPUs Large (E64s_v4) 64 vCPUs About 2nd Generation Intel Xeon Scalable processors with Intel Deep Learning Boost The 2nd Generation Intel Xeon Scalable processor platform—codenamed Cascade Lake—features a wide range of processor types, including Bronze, Silver, Gold, and Platinum, to support varying workload needs. To accelerate machine learning inference, 2nd Gen Intel Xeon Scalable processors offer Intel Deep Learning Boost (DL Boost). Intel DL Boost builds on Intel Advanced Vector Extensions 512 (AVX-512) instructions with Intel Vector Neural Network Instructions (VNNI), combining multiple processor instructions into one to improve machine learning inference performance through resource optimization.1 To learn more about Intel DL Boost built into 2nd Generation Intel Xeon Scalable processors, visit https://www. intel.com/content/dam/www/public/us/en/documents/product-overviews/dl-boost-product-overview.pdf. S M L Speed up deep learning tasks with Amazon Web Services instances featuring 2nd Gen Intel Xeon Scalable processors September 2021 | 2
  • 3. Why choose M5n instances? Compared to older M4 instances, M5n instances offer: • 2nd Generation Intel Xeon Scalable Processors with a sustained all-core Turbo CPU frequency of 3.1 GHz, maximum single core turbo frequency of 3.5 GHz, and Intel Vector Neural Network support (AVX-512 VNNI) • Peak bandwidth of 25 Gbps for small instances or 100 Gbps for large instances • EBC or NVMe™ SSDs physically attached to the host server Classifying images – ResNet50 From Model Zoo for Intel Architecture, which offers machine learning models, we chose the popular ResNet50 deep learning model for testing. ResNet50 is a convolutional neural network that runs 50 layers deep; organizations use it to recognize and classify images. Deep learning for image classification is useful for real- world applications such as self-driving cars or aiding in medical diagnoses. The benchmark reported throughput in images per second that the solutions handled using this model, with higher scores indicating better performance at this type of deep learning. Speed up deep learning tasks with Amazon Web Services instances featuring 2nd Gen Intel Xeon Scalable processors September 2021 | 3
  • 4. Small instances If your deep learning needs are on the smaller side, selecting an AWS instance with 8 vCPUs could meet your image classification needs. We found that a newer AWS M5n instance with 8 vCPUs featuring 2nd Gen Intel Xeon Scalable processors classified 6.17 times the number of images per second using the ResNet50 benchmark (with INT8 precision) as the small-sized M4 instance with previous-generation Intel Xeon processors (with FP32 precision). 0 1 2 3 4 5 7 6 m5n.2xlarge 8 vCPU ResNet50 normalized images/sec throughput Images/sec Higher is better m4.2xlarge 1 6.17 Figure 2: Relative number of images per second that the small-size instances (8 vCPUs) classified using the ResNet50 benchmark. Higher numbers are better. Source: Principled Technologies. Medium instances Medium instances (16 vCPUs), which are better suited for larger models or datasets, also benefit from newer processors. We found that a newer AWS M5n instance with 16 vCPUs featuring 2nd Gen Intel Xeon Scalable processors classified 5.78 times the number of images per second using the ResNet50 benchmark (with INT8 precision) as the medium-sized M4 instance with previous-generation Intel Xeon processors (with FP32 precision). 16 vCPU ResNet50 normalized images/sec throughput Images/sec Higher is better m5n.4xlarge m4.4xlarge 0 1 2 3 4 5 7 6 5.78 1 Figure 3: Relative number of images per second that the medium-size instances (16 vCPUs) classified using the ResNet50 benchmark. Higher numbers are better. Source: Principled Technologies. 6.17x images/sec 5.78x images/sec Speed up deep learning tasks with Amazon Web Services instances featuring 2nd Gen Intel Xeon Scalable processors September 2021 | 4
  • 5. Large instances If your organization needs to run deep learning workloads to extract insights or make recommendations from even larger datasets, instances with 64 vCPUs can better tackle your needs. We found that a newer AWS M5n instance with 64 vCPUs featuring 2nd Gen Intel Xeon Scalable processors classified 5.23 times the number of images per second using the ResNet50 benchmark (with INT8 precision) as the large-sized M4 instance with previous-generation Intel Xeon processors (with FP32 precision). 64 vCPU ResNet50 normalized images/sec throughput Images/sec Higher is better m5n.16xlarge m4.16xlarge 0 1 2 3 4 5 7 6 1 5.23 Figure 4: Relative number of images per second that the large-size instances (64 vCPUs) classified using the ResNet50 benchmark. Higher numbers are better. Source: Principled Technologies. Get more value from your cloud instances Based on our machine learning test results, newer AWS M5n instances offer up to 6.17 times the ResNet50 performance for as little as 1.19 times the cost. This means that upgraded M5n instances with 2nd Gen Intel Xeon Scalable processors can provide better overall value compared to older M4 instances. 5.23x images/sec Speed up deep learning tasks with Amazon Web Services instances featuring 2nd Gen Intel Xeon Scalable processors September 2021 | 5
  • 6. Real-time recommendations based on ad-click historical data – Model Zoo for Intel Architecture Wide & Deep model We used a TensorFlow-based model from the Model Zoo for Intel Architecture to conduct Wide & Deep testing. Wide & Deep uses wide linear models and deep neural networks to infer meaningful relationships between data and deliver recommendations based on that data. Small instances Organizations with smaller model sizes and datasets can run deep learning on instances configured with 8 vCPUs. We found that a newer AWS M5n instance with 8 vCPUs featuring 2nd Gen Intel Xeon Scalable processors (with INT8 precision) handled 2.86 times the number of samples per second using the Wide & Deep benchmark as the small-sized M4 instance with previous-generation processors (with FP32 precision). 0 1 2 3 4 8 vCPU normalized throughput Samples/sec Higher is better m5n.2xlarge m4.2xlarge 1 2.86 Figure 5: Relative number of samples per second that the small-size instances (8 vCPUs) handled using the Wide & Deep benchmark. Higher numbers are better. Source: Principled Technologies. 2.86x samples/sec Speed up deep learning tasks with Amazon Web Services instances featuring 2nd Gen Intel Xeon Scalable processors September 2021 | 6
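The wide-plus-deep idea described above can be sketched as a forward pass: a single linear layer memorizes feature interactions while a small MLP generalizes to unseen combinations, and their outputs are summed into one click probability. The sketch below uses NumPy with hypothetical layer sizes and random weights; the actual Model Zoo benchmark runs a pretrained TensorFlow graph, not this code.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical feature widths, batch of 4 samples
x_wide = rng.normal(size=(4, 100))  # sparse/cross-product features
x_deep = rng.normal(size=(4, 32))   # dense embedding features

# Wide part: one linear layer (memorization)
W_wide = 0.1 * rng.normal(size=(100, 1))
wide_out = x_wide @ W_wide

# Deep part: small MLP (generalization)
W1 = 0.1 * rng.normal(size=(32, 64))
W2 = 0.1 * rng.normal(size=(64, 32))
W3 = 0.1 * rng.normal(size=(32, 1))
deep_out = relu(relu(x_deep @ W1) @ W2) @ W3

# Joint prediction: sum both branches, squash to a click probability
p_click = sigmoid(wide_out + deep_out)
print(p_click.shape)  # (4, 1)
```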
Medium instances

For organizations seeking to make recommendations based on mid-sized datasets, 16 vCPU instances may be more appropriate. We found that a newer AWS M5n instance with 16 vCPUs featuring 2nd Gen Intel Xeon Scalable processors (with INT8 precision) handled 2.94 times as many samples per second on the Wide & Deep benchmark as the medium-sized M4 instance with previous-generation processors (with FP32 precision).

Figure 6: Relative number of samples per second that the medium-size instances (16 vCPUs) handled using the Wide & Deep benchmark (m5n.4xlarge: 2.94; m4.4xlarge: 1). Higher numbers are better. Source: Principled Technologies.

Large instances

Those that need quick recommendations on larger datasets may require virtual machines with 64 vCPUs. We found that a newer AWS M5n instance with 64 vCPUs featuring 2nd Gen Intel Xeon Scalable processors (with INT8 precision) handled 2.67 times as many samples per second on the Wide & Deep benchmark as the large-sized M4 instance with previous-generation processors (with FP32 precision).

Figure 7: Relative number of samples per second that the large-size instances (64 vCPUs) handled using the Wide & Deep benchmark (m5n.16xlarge: 2.67; m4.16xlarge: 1). Higher numbers are better. Source: Principled Technologies.
Choose AWS M5n instances featuring 2nd Gen Intel Xeon Scalable processors for timely insights from data

Getting faster insights from never-ending data streams can improve business agility and lead to greater success. Our test results show that newer AWS M5n instances featuring 2nd Gen Intel Xeon Scalable processors with Intel Deep Learning Boost sped up deep learning inference performance for image classification and recommendation models over older M4 instances. These performance improvements come at little added cost, which means that M5n instances can offer better value per VM. By doing more deep learning work per instance, your organization could ultimately require fewer instances overall, which can help keep budget concerns in check.

By choosing AWS EC2 M5n instances with 2nd Gen Intel Xeon Scalable processors, your organization can get deep learning insights from data faster than with older M4 instances.

1 Intel, "Intel Deep Learning Boost," accessed July 29, 2021, https://www.intel.com/content/dam/www/public/us/en/documents/product-overviews/dl-boost-product-overview.pdf.

Principled Technologies is a registered trademark of Principled Technologies, Inc. All other product names are the trademarks of their respective owners. For additional information, review the science behind this report.

This project was commissioned by Intel. Read the science behind this report at http://facts.pt/7J92SKA