- MapReduce is a programming model for processing large datasets in a distributed manner across clusters of machines. It handles parallelization, load balancing, and hardware failures automatically.
- In MapReduce, the input data is mapped to intermediate key-value pairs, shuffled and sorted by the keys, then reduced to produce the final output. This pattern applies to many large-scale computing problems.
- Google uses MapReduce for tasks like generating map tiles, processing web search query logs, and mining user activity data from clickstreams. It hides the complex distributed systems details from programmers.
Related topics: