Globe2Train: A Framework for Distributed
ML Model Training using IoT Devices
Across the Globe
Bharath Sudharsan, John G. Breslin, Muhammad Intizar Ali
Introduction
▪ Not every modern household owns a GPU, yet a typical home contains roughly a dozen IoT devices
➢ When idle IoT devices are efficiently connected, they can train mid-sized ML models locally,
within their home network, without depending on Cloud or GPU servers
➢ The latest GEFORCE RTX 2080 Ti GPU has 11 GB of RAM but costs $1,500, whereas one Alexa smart
speaker has 2 GB of RAM, and efficiently connecting 20 of them pools 40 GB of RAM collectively.
In this way, training can complete faster than on the expensive GPU and at a $0 investment, since
millions of IoT devices already exist globally and most of them are idle
▪ Challenges in distributed global training scenarios:
➢ Network Uncertainties: Congestion and Latency Variance in real-world IoT networks
➢ Staleness Effect: stale parameters, i.e., model parameters that arrive late and do not reflect
the latest updates. Staleness slows down convergence
Related Papers
[1] L. Zhu, Y. Lu, Y. Lin, and S. Han, “Distributed training across the world,” 2019
[2] J. Lin, C. Gan, and S. Han, “Training Kinetics in 15 minutes: Large-scale distributed training on videos,” arXiv preprint, 2019
[3] Y. Lin, and W. J. Dally, “Deep gradient compression: Reducing the communication bandwidth for distributed training,” arXiv preprint, 2017
[4] W. Xu, and N. Xiong, “Accelerating federated learning for IoT in big data analytics with pruning, quantization and selective updating,” IEEE Access, 2021
[5] S. Wang, and K. Chan, “When edge meets learning: Adaptive control for resource-constrained distributed machine learning,” in IEEE INFOCOM, 2018
Globe2Train
▪ The Globe2Train (G2T) framework enables distributed ML model training using IoT devices across the globe:
a. The framework's G2T-Cloud (G2T-C) component is deployed on a central server/cloud
b. The G2T-Device (G2T-D) component is deployed on the IoT devices involved in training
G2T-C Component
▪ The G2T-C component decomposes one multi-class problem into multiple binary problems, which the
IoT devices solve before reporting the weights back (a minimal sketch of this decomposition is
given below)
➢ Decomposes 1 multi-class problem into k(k-1)/2 binary problems
➢ For example, one smart speaker trains 3 binary classifiers: b_4 (to differentiate class 1 data
from class 5 data), b_5 (class 2 from class 3), and b_6 (class 2 from class 4)
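The sketch below illustrates this one-vs-one style decomposition and a simple assignment of the
resulting tasks to devices; the BinaryTask structure, decompose() and assign_round_robin() helpers,
and the device names are illustrative assumptions, not the framework's actual API.

```python
from itertools import combinations
from dataclasses import dataclass

@dataclass
class BinaryTask:
    """One binary sub-problem: separate class_a data from class_b data."""
    task_id: int
    class_a: int
    class_b: int

def decompose(num_classes: int):
    """Split a k-class problem into k(k-1)/2 one-vs-one binary tasks."""
    pairs = combinations(range(num_classes), 2)
    return [BinaryTask(i, a, b) for i, (a, b) in enumerate(pairs)]

def assign_round_robin(tasks, device_ids):
    """Spread the binary tasks across the available IoT devices."""
    schedule = {d: [] for d in device_ids}
    for i, task in enumerate(tasks):
        schedule[device_ids[i % len(device_ids)]].append(task)
    return schedule

tasks = decompose(5)                                   # 5 classes -> 10 binary tasks
plan = assign_round_robin(tasks, ["speaker", "thermostat", "camera"])
print(len(tasks), [len(v) for v in plan.values()])     # 10 [4, 3, 3]
```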
G2T-D Component
▪ Line 4: sets the weight threshold Wt high for training-involved devices that have a poor
internet connection
➢ Reduces frequent transmission of weights, saving network bandwidth, i.e., weights are
accumulated locally until they reach Wt
▪ Lines 6 to 8: shrink all the accumulated weights using the encode() function, which packs the
non-zero weight values
➢ Similar to how the parameters/gradients of vanilla and Nesterov momentum SGD are encoded
(a simplified sketch of this accumulate-then-encode logic is given below)
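The following is a minimal sketch of the thresholded accumulation and sparse encoding idea
described above; the threshold test, the (index, value) packing in encode(), and the transmit()
stub are simplifying assumptions rather than the exact G2T-D implementation.

```python
import numpy as np

class G2TDeviceSketch:
    """Toy device-side logic: accumulate weight updates locally and only
    transmit once their accumulated magnitude crosses the threshold Wt."""

    def __init__(self, num_weights: int, wt: float):
        self.wt = wt                          # set higher for poor connections
        self.accumulated = np.zeros(num_weights)

    def on_local_update(self, delta: np.ndarray):
        """Add a locally computed weight update to the accumulator."""
        self.accumulated += delta
        if np.abs(self.accumulated).sum() >= self.wt:
            payload = self.encode(self.accumulated)
            self.transmit(payload)
            self.accumulated[:] = 0.0         # reset after synchronization

    @staticmethod
    def encode(weights: np.ndarray):
        """Pack only the non-zero weight values as (index, value) pairs."""
        idx = np.nonzero(weights)[0]
        return list(zip(idx.tolist(), weights[idx].tolist()))

    def transmit(self, payload):
        # placeholder for the real uplink to the G2T-C component
        print(f"sync {len(payload)} non-zero weights")

dev = G2TDeviceSketch(num_weights=8, wt=1.0)                   # poor link -> large Wt
dev.on_local_update(np.array([0, .2, 0, 0, .1, 0, 0, 0.]))     # below Wt, accumulated
dev.on_local_update(np.array([0, .5, 0, 0, .4, 0, 0, 0.]))     # crosses Wt -> sync
```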
Results
▪ We initiate distributed training from the G2T-C component, which decomposes the given 95-class
problem into 4465 binary problems (i.e., K = 95 in K(K-1)/2; a quick check of this count is given
below). The resulting binary classifiers b_0 to b_4464 need to be trained by MCUs 1-3 over
numerous rounds, and in each round the MCUs report back the calculated weights
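A quick check of the task count implied by the K(K-1)/2 decomposition:

```python
K = 95
num_binary_tasks = K * (K - 1) // 2   # number of one-vs-one class pairs
print(num_binary_tasks)               # 4465
```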
Results
▪ Contributions of the G2T-D Framework Component
➢ Congestion and Latency Toleration: Reduces the weight synchronization frequency (less usage of
the congested network) by not transmitting weights frequently or whenever they become available.
The training process in the across-the-globe setting thus gains the ability to tolerate latency
➢ Improved Training Scalability: Sends the weights at intervals that depend on the network
condition - this improves scalability by reducing the communication-to-computation ratio
➢ IoT Hardware Friendliness: Implementation is only a few lines of code
▪ Contributions of the G2T-C Framework Component
➢ Overcoming Staleness Effect: Designed not to suffer from staleness and is more efficient than
SSGD and ASGD, as it intelligently splits the tasks that execute over multiple rounds on the IoT
devices
Conclusion
▪ We presented Globe2Train, a framework for training ML models on idle IoT devices, millions of which
exist across the globe
▪ The G2T-C and G2T-D framework components can improve distributed training scalability and speed
while reducing communication frequency and tolerating network latency
▪ Since G2T-C can decompose one multi-class problem into multiple binary problems, it can be used
to decompose a resource-demanding problem (one that can run only on GPU clusters) into multiple
resource-friendly tiny parts that execute in a distributed manner on numerous idle IoT devices
across the globe
▪ Since the G2T-D can significantly compress gradients/parameters during distributed training, it can be
the basis for a broad spectrum of decentralized and collaborative learning applications
Future Work
▪ Perform comprehensive real-world experimental evaluation of G2T
➢ Deploy G2T-D on a few geographically separated IoT devices, covering poor to good network conditions
➢ Deploy G2T-C on an Amazon AWS server and establish communication with the IoT devices
➢ Define an ML model on AWS, then instruct G2T-C to decompose the given multi-class problem into
multiple binary problems and assign them to the connected IoT devices
➢ Have the IoT devices solve the binary problems (by performing model training), then report back
the calculated weights using G2T-D
➢ At that point, we should have a full ML model that was trained in a distributed manner by
multiple IoT devices and is capable of solving a multi-class problem (one way to assemble it from
the binary classifiers is sketched below)
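As a rough illustration of that last step, the sketch below shows one common way (pairwise
majority voting over one-vs-one classifiers) to combine the binary classifiers returned by the
devices into a single multi-class predictor; the binary_predictors dictionary keyed by class
pairs is an assumption made for illustration, not an interface defined by the framework.

```python
from collections import Counter

def predict_multiclass(x, binary_predictors):
    """Combine one-vs-one binary classifiers by pairwise majority voting.

    binary_predictors: dict mapping (class_a, class_b) -> callable that
    returns class_a or class_b for the input sample x.
    """
    votes = Counter(predict(x) for predict in binary_predictors.values())
    return votes.most_common(1)[0][0]

# Hypothetical usage with three trained binary classifiers for classes {0, 1, 2}
binary_predictors = {
    (0, 1): lambda x: 0,
    (0, 2): lambda x: 2,
    (1, 2): lambda x: 2,
}
print(predict_multiclass([0.3, 0.7], binary_predictors))   # -> 2 (two of three votes)
```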
Contact: Bharath Sudharsan
Email: bharath.sudharsan@insight-centre.org
www.confirm.ie
