From the course: AWS Certified Machine Learning - Specialty (MLS-C01) Cert Prep: 1 Data Engineering
Handle ML-specific MapReduce
- [Instructor] AWS has a great mechanism for handling MapReduce-style workloads for machine learning: the ability to run Spark. Spark can run on a third-party platform like Databricks, or it can run inside the EMR ecosystem. The core capability of Spark is distributing a computation across the nodes of a cluster, and it works with the Hadoop Distributed File System (HDFS) for storage. On EMR, Spark can also read and write S3 through the same file-system interface, so S3 behaves like a distributed, parallel storage layer for the cluster. What this means is that if you have a notebook and it needs more and more disk I/O, behind the scenes the cluster can spin up additional virtual machines and distribute the compute and storage across the nodes, while S3 absorbs the reads and writes. And what's great about this is it allows you to fairly seamlessly write a function, for…
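As a rough illustration of the map-then-reduce pattern the instructor describes, here is a minimal sketch in plain Python. It is not Spark or EMR code: the threads stand in for cluster nodes, and the names `partition`, `map_partition`, `combine`, and `distributed_mean` are hypothetical helpers invented for this example. It only shows the idea of splitting data, summarizing each piece independently, and merging the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(records, num_partitions):
    """Split records into chunks, one per hypothetical worker node."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(chunk):
    """Map step: summarize one partition as a (sum, count) pair."""
    return (sum(chunk), len(chunk))


def combine(left, right):
    """Reduce step: merge two partial (sum, count) summaries."""
    return (left[0] + right[0], left[1] + right[1])


def distributed_mean(records, num_partitions=4):
    """Compute a mean by mapping over partitions and reducing the partials."""
    chunks = partition(records, num_partitions)
    # Threads stand in for cluster nodes here (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(map_partition, chunks))
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count if count else float("nan")
```

Calling `distributed_mean([1, 2, 3, 4, 5, 6, 7, 8])` returns `4.5`. The point of the design is that `map_partition` never needs to see the whole dataset, which is what lets a real cluster run it on many nodes at once.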