This chapter surveys methods for processing large amounts of data across distributed systems. It introduces MapReduce, the programming model Google developed to process vast datasets across thousands of servers: work is divided into independent tasks that transform the data (the map phase), and the intermediate results are then collected and aggregated (the reduce phase). The chapter also covers scaling computation by launching many independent virtual machines and distributing work to them through a messaging queue. Overall, it provides an overview of approaches to parallel and distributed processing of big data on cloud infrastructure.
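The map/reduce split described above can be sketched in plain Python. This is a minimal single-machine illustration of the idea, not Google's implementation: the word-count task, the function names, and the explicit shuffle step are all illustrative choices.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: each document is processed independently,
    # emitting intermediate (word, 1) pairs.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key, so each
    # reducer sees all values for the keys it owns.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values per key.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big systems", "big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"big": 3, "data": 2, "systems": 1}
```

Because each map task touches only its own input and each reduce task only its own key group, the two phases can be spread across many machines; in a real deployment the shuffle step is handled by the framework over the network.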
Related topics: