© 2018 Walmart International | Confidential | For Internal Use Only
How GPU Computing literally saved me at work!
Abhishek Mungoli, Data Scientist, MTech (IIIT Hyderabad)
Date: June 11th, 2019
Prepared for Fifth Elephant
Content
• Problem Description
• Solutions for runtime optimization
• Focus on GPU: infrastructure used
• Task complexity details
• Solution framework
• CPU vs. GPU: theory and implementation
• Assumptions
• Take Away
Identifying Similar Items
• Recommendation systems
• Item alternatives
• Assortment
• Customer basket customization
Engineering Tools
• Distributed processing
• Spark
• GPU
Infrastructure Used
• Nvidia Volta V100 16 GB GPU
• Python
• Numba
Task Complexity
Size
• Number of items (n): 10⁵
• Dimension of each item (k): 64-D vector
Computation
• Task: identify the top 3 similar items to each item in the set
• Similarity measure: cosine similarity
• Finding the items with the highest similarity for one item: O(n*k)
• Finding the items with the highest similarity for all items: O(n*n*k)
Time
• On a subset of 10³ items: 17 seconds (about 3.7 * 10⁶ operations per second)
• On a subset of 10⁴ items: 1,700 seconds ≈ 28 minutes
• On the full set of 10⁵ items: 1,700 * 100 seconds ≈ 2,833 minutes ≈ 47.2 hours ≈ 2 days
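The extrapolation on this slide follows from the quadratic growth in the number of items. A minimal sketch of that arithmetic, assuming the measured 17 seconds on 10³ items is the only data point and that the dimension stays fixed at 64 (the function name `estimate_seconds` is illustrative, not from the talk):

```python
def estimate_seconds(n_items, base_items=1_000, base_seconds=17.0):
    """Scale a measured runtime by (n / base_n)^2 for an O(n*n*k) task with fixed k."""
    scale = (n_items / base_items) ** 2
    return base_seconds * scale


# Print the estimate for each set size mentioned on the slide.
for n in (10**3, 10**4, 10**5):
    secs = estimate_seconds(n)
    print(f"{n:>7} items: {secs:>9.0f} s = {secs / 3600:.1f} h")

# Implied throughput at the measured point: n * n * k multiply-adds in 17 s.
ops_per_second = (1_000 * 1_000 * 64) / 17.0
```

Growing the item count tenfold multiplies the estimate by one hundred, which is why the full set lands at roughly two days on the CPU.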
CPU vs. GPU
• CPU: few cores; single-thread optimization; low latency tolerance
• GPU: thousands of cores; multiple concurrent threads; high latency tolerance
• When to use a GPU?
Solution Framework
Assumptions
• Embeddings can be obtained at different levels of the hierarchy, i.e., Category, Sub-category, Fineline, UPC, or Item level.
• Similar items have similar embeddings.
Work Flow (diagram)
Take Away
• The estimated CPU time of about 2 days was brought down to 20.5 seconds on the GPU, an estimated speedup of roughly 8,000x.
• This was possible only because of the nature of the task: finding the top-3 similar items for Item ‘A’ is independent of finding the top-3 similar items for Item ‘B’.
• GPUs can also be used for fast text search in documents, fast search for a node in a graph, etc. We need to identify the parallelism in a task and exploit the GPU for speedup.
• We can identify the components of a system/module that are parallel in nature and speed them up. This way, a system/module can have some components running on the CPU and some on the GPU, as needed.
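The slides do not include the kernel itself, so the following is only a sketch of how the per-item independence could map to one CUDA thread per item with Numba. The names `blocks_for` and `make_top3_kernel`, the 128-thread block size, and the precomputed `norms` array are assumptions made for illustration. The Numba import is deferred so the launch arithmetic can be tried on a machine without a GPU.

```python
import math


def blocks_for(n_items, threads_per_block=128):
    """Number of thread blocks needed so that every item gets its own thread."""
    return math.ceil(n_items / threads_per_block)


def make_top3_kernel():
    """Build the CUDA kernel lazily; requires Numba and a CUDA-capable GPU."""
    from numba import cuda

    @cuda.jit
    def top3_kernel(emb, norms, out_idx):
        i = cuda.grid(1)          # global thread index, used as the item index
        n = emb.shape[0]
        k = emb.shape[1]
        if i >= n:                # guard threads past the last item
            return
        # Best three similarities seen so far (cosine is never below -1).
        b0 = -2.0
        b1 = -2.0
        b2 = -2.0
        i0 = -1
        i1 = -1
        i2 = -1
        for j in range(n):
            if j == i:
                continue
            dot = 0.0
            for d in range(k):
                dot += emb[i, d] * emb[j, d]
            sim = dot / (norms[i] * norms[j])
            # Insert sim into the sorted top-3, shifting lower entries down.
            if sim > b0:
                b2 = b1
                i2 = i1
                b1 = b0
                i1 = i0
                b0 = sim
                i0 = j
            elif sim > b1:
                b2 = b1
                i2 = i1
                b1 = sim
                i1 = j
            elif sim > b2:
                b2 = sim
                i2 = j
        out_idx[i, 0] = i0
        out_idx[i, 1] = i1
        out_idx[i, 2] = i2

    return top3_kernel
```

On a machine with a GPU, a launch would look like `make_top3_kernel()[blocks_for(n), 128](d_emb, d_norms, d_out)` after copying the arrays to the device with `cuda.to_device`. Each thread still does O(n*k) work; the speedup comes from running thousands of these independent loops concurrently.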
Q&A
Thank You
Abhishek Mungoli, Data Scientist, Walmart
LinkedIn - https://www.linkedin.com/in/abhishek-mungoli-39048355/
Medium - https://medium.com/@mungoliabhishek81
Instagram - https://www.instagram.com/simplyspartanx/

Editor's Notes

  • #4: In the retail domain, finding similar or closest entities is a common task. Given a list of items, each represented by k latent attributes, the task is to find the top-3 most similar items for every item in the list.
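
The task in this note can be written as a brute-force CPU reference. This is a minimal sketch in plain Python, not the code used in the talk; the names `cosine` and `top3_similar` and the toy 2-D vectors are illustrative, and every vector is assumed to be non-zero.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top3_similar(items):
    """For each item, return the indices of its 3 most similar other items.

    Compares every item with every other item, so the cost is O(n*n*k).
    """
    result = []
    for i, u in enumerate(items):
        scored = ((cosine(u, v), j) for j, v in enumerate(items) if j != i)
        best = heapq.nlargest(3, scored)
        result.append([j for _, j in best])
    return result


# Toy data: five items with k = 2 latent attributes each.
items = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
    [0.7, 0.7],
]
neighbours = top3_similar(items)
```

The outer loop over `i` never reads results from another iteration, which is the independence that makes the task a good fit for a GPU.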