This document discusses integrating Apache Flink with Apache Hive to unify stream and batch processing. The goals are to read Hive metadata and data from Flink, to store Flink metadata in the Hive metastore, and to support Hive's SQL grammar. The work proceeds in phases: first unified catalog APIs, then metadata and data integration between Flink and Hive, and finally SQL DDL and DML support. Progress so far includes the catalog designs, HiveCatalog for metadata integration, and HiveTableSource and HiveTableSink for data access. A live demo used the new SQL and Table API capabilities to query Hive data from Flink.