This document discusses streaming ETL with Apache Flink and Elasticsearch, used to ingest educational data from various sources and transform it into a unified format. It describes collecting raw event data from multiple applications, ingesting it into Kafka, and running Flink jobs that join entities, map them to Avro schemas, and write the results to Elasticsearch and S3. The ETL process handles flexible input schemas, performs joins that do not depend on input ordering, and supports versioned namespaces. Monitoring and deployment approaches are also outlined.
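To make the order-independent join step concrete, below is a minimal Flink sketch in Java. The topic names (`raw-events`, `user-profiles`), the comma-separated record format, and the join key are hypothetical stand-ins, and the Avro mapping and Elasticsearch/S3 sinks are omitted; the sketch only shows the core pattern of keying two Kafka streams by a shared id and buffering whichever side arrives first in keyed state, so the join works regardless of arrival order.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.util.Collector;

public class EventProfileJoinJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical Kafka topics carrying two entity types as CSV strings.
        KafkaSource<String> events = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("raw-events")
                .setGroupId("etl-join")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();
        KafkaSource<String> profiles = KafkaSource.<String>builder()
                .setBootstrapServers("kafka:9092")
                .setTopics("user-profiles")
                .setGroupId("etl-join")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> eventStream =
                env.fromSource(events, WatermarkStrategy.noWatermarks(), "events");
        DataStream<String> profileStream =
                env.fromSource(profiles, WatermarkStrategy.noWatermarks(), "profiles");

        // Key both streams by a shared id (here: the text before the first comma),
        // then join them with keyed state so neither side must arrive first.
        eventStream.keyBy(v -> v.split(",")[0])
                .connect(profileStream.keyBy(v -> v.split(",")[0]))
                .process(new BufferingJoin())
                .print(); // stand-in for the real Elasticsearch and S3 sinks

        env.execute("event-profile-join");
    }

    /** Buffers whichever side arrives first; emits once both sides are present. */
    static class BufferingJoin
            extends KeyedCoProcessFunction<String, String, String, String> {
        private transient ValueState<String> pendingEvent;
        private transient ValueState<String> pendingProfile;

        @Override
        public void open(Configuration parameters) {
            pendingEvent = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("pending-event", String.class));
            pendingProfile = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("pending-profile", String.class));
        }

        @Override
        public void processElement1(String event, Context ctx,
                                    Collector<String> out) throws Exception {
            String profile = pendingProfile.value();
            if (profile != null) {
                out.collect(event + "|" + profile); // both sides present: emit join
            } else {
                pendingEvent.update(event);         // buffer until the profile arrives
            }
        }

        @Override
        public void processElement2(String profile, Context ctx,
                                    Collector<String> out) throws Exception {
            String event = pendingEvent.value();
            if (event != null) {
                out.collect(event + "|" + profile);
            } else {
                pendingProfile.update(profile);
            }
        }
    }
}
```

A production job would additionally clear or expire the buffered state (for example via state TTL or timers) and serialize the joined output with the Avro schemas the document describes before writing to the sinks.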