Hadoop is an open-source framework for the distributed storage and processing of large datasets across clusters of commodity hardware. It was created in 2006 by Doug Cutting and Mike Cafarella to support distributed processing for the Nutch search engine. Hadoop combines a distributed file system (HDFS) with the MapReduce programming model to store and process data fault-tolerantly across large clusters of servers. It became a top-level Apache project in 2008 and is now widely used by companies such as Yahoo, Facebook, and Amazon to manage big data.
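To give a concrete sense of the MapReduce programming model, below is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce): the mapper emits a (word, 1) pair for every word in the input, and the reducer sums the counts per word. The input and output paths passed as arguments are assumed to be directories in HDFS.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in each input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Such a job is typically packaged into a jar and submitted to the cluster with a command of the form `hadoop jar wordcount.jar WordCount <input dir> <output dir>`; the framework then handles splitting the input across mappers, shuffling intermediate pairs to reducers, and rerunning failed tasks, which is where the fault tolerance described above comes from.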