This document describes Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in HiveQL, a SQL-like declarative language; these queries are compiled into map-reduce jobs that are executed on Hadoop. Hive organizes data into tables, which are stored as directories in HDFS and can be further partitioned into subdirectories and files. It also includes a system catalog, the Hive Metastore, which stores table schemas and statistics used to optimize queries.
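The following minimal Python sketch makes the table-to-directory layout concrete by showing how a partitioned table might map onto HDFS paths. It assumes Hive's default warehouse root `/user/hive/warehouse` and the `key=value` subdirectory naming used for partition columns. The table `page_views`, its partition columns `ds` and `hr`, and the helpers `partition_path` and `prune` are hypothetical. The sketch also ignores details such as escaping special characters in partition values.

```python
from typing import Dict, List

# Assumed warehouse root (Hive's default for hive.metastore.warehouse.dir).
WAREHOUSE_ROOT = "/user/hive/warehouse"


def partition_path(table: str, partition: Dict[str, str]) -> str:
    """Hypothetical helper: build the HDFS directory for one partition of a table.

    Each partition column becomes one nested 'key=value' subdirectory,
    in the order the columns appear in the dict.
    """
    parts = [f"{key}={value}" for key, value in partition.items()]
    return "/".join([WAREHOUSE_ROOT, table, *parts])


def prune(table: str, partitions: List[Dict[str, str]], key: str, value: str) -> List[str]:
    """Hypothetical helper: keep only the directories whose partition matches key=value,
    so a query filtering on a partition column never touches the other directories."""
    return [partition_path(table, p) for p in partitions if p.get(key) == value]


if __name__ == "__main__":
    # Hypothetical partitions of a page_views table, keyed by date and hour.
    parts = [
        {"ds": "2009-03-20", "hr": "00"},
        {"ds": "2009-03-20", "hr": "12"},
        {"ds": "2009-03-21", "hr": "00"},
    ]
    for path in prune("page_views", parts, "ds", "2009-03-20"):
        print(path)
```

Because each partition lives in its own directory, a query that filters on a partition column can skip whole directories instead of scanning the entire table. The `prune` helper mimics that idea in miniature.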