2018 Intel AI Developer Conference Keynote

1936: The Turing Machine
1938: Operant conditioning
1947: Transistor
1952: Spiking Neuron
1962: First computer science department
1964: First neuroscience department
1968: Intel founded
Mid 2000s: First 1 billion transistor processor
Mid 2000s: Deep learning prevalence

Convolution, Matrix multiplication, BatchNorm, pooling, normalization, activation
HELPS REALIZE THE INCREDIBLE BENEFITS OF DIRECT OPTIMIZATION
Intel/mkl-dnn
OPEN SOURCE

OPTIMIZING TENSORFLOW
Other names and brands may be claimed as the property of others

OPEN SOURCE COMPILER ENABLING FLEXIBILITY
TO RUN MODELS ACROSS A VARIETY OF
FRAMEWORKS AND HARDWARE
NervanaSystems/ngraph

ENABLING DEEP LEARNING TO TAKE ADVANTAGE
OF SCALABLE SPARK AND HADOOP CLUSTERS
intel-analytics/BigDL

USING DATA SCIENCE TO

Emergency Response, Financial services, Machine Vision, Cities/transportation,
Autonomous Vehicles, Responsive Retail, Manufacturing, Public sector

VISUAL INFERENCING AND NEURAL NETWORK OPTIMIZATION
DEPLOY COMPUTER VISION AND DEEP LEARNING CAPABILITIES TO THE EDGE
High Performance, high Efficiency for the edge
Write once + scale to Diverse Accelerators
Broad Framework support
VPU = Vision Processing Unit (Movidius)

Source: https://research.fb.com/publications/applied-machine-learning-at-facebook-a-datacenter-infrastructure-perspective/

A Scientific Collaboration between Intel and Novartis

224x224x3
ImageNet
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.

ImageNet: 224x224x3
1024x1280x3: 26x larger, multiple objects
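The "26x larger" figure can be checked from the two input shapes quoted on the slide. The sketch below is only a worked calculation; the helper name num_elements is made up for illustration and does not come from the presentation.

```python
# Input shapes quoted on the slide, as height x width x channels.
imagenet_shape = (224, 224, 3)
large_shape = (1024, 1280, 3)


def num_elements(shape):
    """Return the number of values in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


# Ratio of values per input image between the two shapes.
ratio = num_elements(large_shape) / num_elements(imagenet_shape)
print(f"{ratio:.1f}x larger")
```

Since both shapes have three channels, the ratio is the same whether it is counted in pixels or in tensor values.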

§ Configuration: CPU: Xeon 6148 @ 2.4GHz, Hyper-threading: Enabled. NIC: Intel® Omni-Path Host Fabric Interface, TensorFlow: v1.7.0, Horovod: 0.12.1, OpenMPI: 3.0.0. OS: CentOS 7.3, OpenMPU 23.0.0, Python 2.7.5
Time to Train to converge to 99% accuracy in model
Multiscale Convolution Neural Network
Optimized Libraries: Intel® MKL/MKL-DNN, clDNN, DAAL
Intel® Omni-Path Architecture
Scaling of Time to Train: Intel® Omni-Path Architecture, Horovod and TensorFlow®
Speedup compared to baseline 1.0, measured in time to train on 1 node
[Chart: speedup and total memory used at 1, 2, 4, and 8 nodes]
TOTAL MEMORY USED (192GB DDR4 PER INTEL® 2S XEON® 6148 PROCESSOR)
1 Node: 64.3GB
2 Nodes: 128.6GB
4 Nodes: 257.2GB
8 Nodes: 514.4GB
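The chart expresses scaling as speedup over a 1-node baseline in time to train. The sketch below shows how that speedup and the matching scaling efficiency are usually computed. The timing values are hypothetical placeholders, not measurements from the slide, and it is an assumption that the four memory figures correspond to 1, 2, 4, and 8 nodes in increasing order.

```python
def speedup(baseline_time, time_n):
    """Speedup relative to the baseline (1.0 means no improvement)."""
    return baseline_time / time_n


def scaling_efficiency(baseline_time, time_n, nodes):
    """Fraction of ideal linear scaling achieved on the given node count."""
    return speedup(baseline_time, time_n) / nodes


# HYPOTHETICAL time-to-train values in hours, for illustration only.
hypothetical_hours = {1: 8.0, 2: 4.2, 4: 2.2, 8: 1.2}
for nodes, hours in hypothetical_hours.items():
    s = speedup(hypothetical_hours[1], hours)
    e = scaling_efficiency(hypothetical_hours[1], hours, nodes)
    print(f"{nodes} node(s): speedup {s:.2f}, efficiency {e:.0%}")

# Total memory figures from the slide, ASSUMED to map to 1, 2, 4, 8 nodes.
total_memory_gb = {1: 64.3, 2: 128.6, 4: 257.2, 8: 514.4}
per_node_gb = {n: total / n for n, total in total_memory_gb.items()}
print(per_node_gb)
```

Under that assumed mapping, memory use per node stays constant as nodes are added, which is what data-parallel training with a fixed per-node batch would produce.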

ZIVA VFX
Authoring Tools: Shipped as a Maya Plugin
ZIVA FEM PHYSICS SOLVER: Intel MKL (Intel Pardiso, Intel BLAS, Intel LAPACK)
DISTRIBUTED TO SERVER FARM FOR SHOT RENDERS
ZIVA CHARACTERS SIMULATED IN PASSES: Bones/Muscles, Fascia, Fat and Skin

A.I. & M.L. TECHNOLOGIES: Character transfer/automation, Volumetric capture augmentation, Real-Time Training
DISTRIBUTED FUNCTIONALITY AND EMBEDDED PLAYERS: Embedded players in any software, Distributed graphs
ZIVA INTEGRATION FRAMEWORK: Native node graph UI with robust API
CLOUD SERVICES, TOOLS, ENGINES

FLEXIBLE REAL-TIME INFERENCING

• Deploy DNN and Computer Vision at the Edge
• Native FP16 and Fixed Point 8 bit support
• 4 TOPS with 1 TOPS of DNN Compute at 1W
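To make the "Fixed Point 8 bit" bullet concrete, here is a minimal sketch of symmetric linear quantization to signed 8-bit integers. It illustrates the general technique only. The slide does not say which quantization scheme the hardware or its toolchain uses, and the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale maps the largest magnitude onto 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map 8-bit integers back to approximate float values."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.27, 0.031, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The round-trip error is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Storing weights in 8 bits instead of 32 cuts memory traffic by a factor of four, at the cost of the bounded rounding error shown above.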

[Chart: Theoretical vs. Reality TOPs for Chip X and Lake Crest]
Chip X GEMM based on DeepBench training data for A(5124, 9124), B(9124,2560) matrix size GEMM operations performing DeepSpeech using FP16+ mixed precision at 27.43 TOPs.
Source: Lake Crest, Based on Intel measurements on limited distribution SDV, General Matrix-Matrix Multiplication; A(1536, 2048), B(2048, 1536).

Source: Lake Crest: Based on Intel measurements on limited distribution SDV
1 General Matrix-Matrix Multiplication; A(1536, 2048), B(2048, 1536)
2 Two chip vs. single chip GEMM performance; A(6144, 2048), B(2048, 1536)
3 Full Chip MRB-CHIP MRB data movement using send/recv, Tensor size = (1, 32), average across 50K iterations
GEMM OPERATION UTILIZATION (1): 96.4%, A(1536, 2048), B(2048, 1536)
MULTI-CHIP SCALING (2): 96.2%, A(6144, 2048), B(2048, 1536)
MULTI-CHIP COMMUNICATION (3): 2.4Tb/s OFF CHIP BANDWIDTH, <790ns LATENCY
Power: <210W
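The utilization figures compare achieved GEMM throughput with theoretical peak. The sketch below counts the operations in the two matrix sizes quoted in the footnotes, using the common convention of one multiply and one add per inner-product term. The peak throughput value is a hypothetical placeholder, since the slide gives only percentages.

```python
def gemm_ops(m, k, n):
    """Operations in an (m x k) by (k x n) matrix multiply: 2 * m * k * n."""
    return 2 * m * k * n


def utilization(achieved_ops_per_s, peak_ops_per_s):
    """Fraction of theoretical peak throughput actually achieved."""
    return achieved_ops_per_s / peak_ops_per_s


# Matrix sizes from the footnotes: A(1536, 2048) x B(2048, 1536), single chip,
# and A(6144, 2048) x B(2048, 1536) for the two-chip comparison.
single_chip_ops = gemm_ops(1536, 2048, 1536)
two_chip_ops = gemm_ops(6144, 2048, 1536)
print(single_chip_ops, two_chip_ops / single_chip_ops)

# HYPOTHETICAL peak throughput, for illustration only.
hypothetical_peak = 1.0e12
hypothetical_achieved = 0.964 * hypothetical_peak
print(f"{utilization(hypothetical_achieved, hypothetical_peak):.1%}")
```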

[Chart: Theoretical vs. Reality TOPs for Chip X, Lake Crest, and Spring Crest (Estimate)]
Source: Lake Crest - Based on Intel measurements on limited distribution SDV
Source: Spring Crest - Intel measurements on simulated product

First commercial NNP: Intel® Nervana™ NNP L-1000 in 2019
3-4x training performance of first generation Lake Crest product
PURPOSE BUILT DESIGN OPTIMIZED ACROSS
MEMORY BANDWIDTH, UTILIZATION, AND POWER

Machine Learning at Amazon: a long heritage
Personalized recommendations
Inventing entirely new customer experiences
Fulfillment automation / inventory management
Cargo
Voice driven interactions

Amazon SageMaker
BUILD: Pre-built notebooks for common problems; Built-in, high performance algorithms
TRAIN: One-click training; Hyperparameter optimization
DEPLOY: One-click deployment; Fully managed hosting with auto-scaling

AWS Machine Learning Stack
APPLICATION SERVICES: Polly, Rekognition, Lex, Rekognition Video, Transcribe, Translate, Comprehend
PLATFORM SERVICES: Amazon SageMaker, AWS DeepLens
FRAMEWORKS AND INTERFACES: Frameworks; Interfaces (Keras)
INFRASTRUCTURE: EC2 GPUs, EC2 CPUs, IoT, Edge

RL Coach: NervanaSystems/coach
Neural Network Distiller: NervanaSystems/distiller
NLP Architect: NervanaSystems/nlp-architect

GOAL IS TO SELECT A SET OF ML PROBLEMS, EACH
DEFINED BY A DATASET AND QUALITY TARGET, THEN
MEASURE THE WALL CLOCK TIME TO TRAIN A MODEL
FOR EACH PROBLEM.
For more information, see: https://mlperf.org/
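The measurement described above (wall clock time to train a model to a quality target) can be sketched as a simple timing loop. This is an illustrative sketch, not the benchmark's own harness. The function time_to_train, the ToyModel class, and the target value are all hypothetical stand-ins.

```python
import time


def time_to_train(train_one_epoch, evaluate, quality_target, max_epochs=100):
    """Return (elapsed seconds, epochs) until evaluate() meets the target."""
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        train_one_epoch()
        if evaluate() >= quality_target:
            return time.perf_counter() - start, epoch
    raise RuntimeError("quality target not reached within max_epochs")


class ToyModel:
    """Hypothetical stand-in whose accuracy rises by a fixed step per epoch."""

    def __init__(self):
        self.accuracy = 0.0

    def train_one_epoch(self):
        self.accuracy = min(1.0, self.accuracy + 0.25)

    def evaluate(self):
        return self.accuracy


model = ToyModel()
elapsed, epochs = time_to_train(
    model.train_one_epoch, model.evaluate, quality_target=0.75
)
print(f"reached target after {epochs} epochs in {elapsed:.6f} s")
```

Fixing the dataset and the quality target, then timing the whole run, makes results comparable across hardware and frameworks because every system has to do enough work to reach the same model quality.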

AICHALLENGE.INTEL.COM
*Idea implementation at the Olympic Games subject to approval

Today’s presentation contains forward-looking statements. All
statements made that are not historical facts are subject to a number of
risks and uncertainties, and actual results may differ materially. Please
refer to our most recent earnings release, Form 10-Q and 10-K filing
available on our website for more information on the risk factors that
could cause actual results to differ.
