- MapReduce is a programming model for processing large datasets in parallel across a cluster of machines. The framework handles parallelization, load balancing, and recovery from machine failures automatically.
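- In MapReduce, a map function turns each input record into intermediate key-value pairs, the framework shuffles and sorts those pairs so that all values for the same key are grouped together, and a reduce function merges each group to produce the final output. This pattern applies to many large-scale computing problems.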
- Google has used MapReduce for tasks such as generating map tiles, processing log files, and mining user data at massive scale across thousands of machines. The programming model hides the complex details of distributed systems from developers.
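
The map, shuffle, and reduce phases summarized above can be illustrated with a small single-process sketch. This is only a toy word count in plain Python, not how a real MapReduce framework is implemented: the names `map_fn`, `shuffle`, `reduce_fn`, and `map_reduce` are hypothetical, and everything that makes the real system distributed (partitioning across machines, scheduling, fault tolerance) is left out.

```python
from collections import defaultdict
from itertools import chain


def map_fn(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_fn(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # In a real cluster the map calls would run on many machines;
    # here they run one after another in a single process (assumption).
    intermediate = chain.from_iterable(map_fn(doc) for doc in documents)
    return [reduce_fn(key, values) for key, values in shuffle(intermediate)]


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    for word, count in map_reduce(docs):
        print(word, count)
```

Because `map_fn` looks at one record at a time and `reduce_fn` looks at one key at a time, both steps could in principle be spread across machines without changing the user's code, which is the property the programming model relies on.
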
Related topics: