This document outlines a streaming decision tree classifier built on Apache Flink. It motivates the need for a classifier that learns incrementally from streaming data rather than from a fixed training set. In the proposed architecture, Kafka streams are used both to ingest a stream of labeled data points and to broadcast the evolving decision tree model. To choose split points without storing all of the data, the algorithm maintains approximate histograms over the data features and updates them as points arrive. As a result, the classifier can keep learning from the stream while making predictions on it.
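The summary does not say which histogram algorithm is used. The sketch below assumes the bounded-bin histogram from Ben-Haim and Tom-Tov's streaming parallel decision tree work, which is a common choice for this problem. Each histogram keeps at most `max_bins` (centroid, count) pairs and merges the two closest bins when it overflows. Histograms built on different workers can be merged, and the number of points below a threshold can be estimated by interpolating between bins. All names here (`StreamingHistogram`, `best_split`, `gini`, `max_bins`) are hypothetical. Choosing midpoints between merged centroids as candidate thresholds and scoring them with Gini impurity are simplifications made for this illustration, not necessarily what the described system does.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class StreamingHistogram:
    """Approximate histogram holding at most `max_bins` (centroid, count) bins."""

    def __init__(self, max_bins: int) -> None:
        self.max_bins = max_bins
        self.bins: List[List[float]] = []  # kept sorted by centroid

    def _add_bin(self, p: float, m: float) -> None:
        # Insert a new bin, or add to an existing bin with the same centroid.
        keys = [b[0] for b in self.bins]
        i = bisect.bisect_left(keys, p)
        if i < len(self.bins) and self.bins[i][0] == p:
            self.bins[i][1] += m
        else:
            self.bins.insert(i, [p, m])

    def _compress(self) -> None:
        # Merge the two closest adjacent bins into their weighted mean until within budget.
        while len(self.bins) > self.max_bins:
            i = min(range(len(self.bins) - 1),
                    key=lambda j: self.bins[j + 1][0] - self.bins[j][0])
            (p1, m1), (p2, m2) = self.bins[i], self.bins[i + 1]
            self.bins[i:i + 2] = [[(p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2]]

    def update(self, x: float) -> None:
        # Absorb one observed feature value; memory stays bounded by max_bins.
        self._add_bin(x, 1.0)
        self._compress()

    def merge(self, other: "StreamingHistogram") -> None:
        # Combine a histogram built elsewhere (e.g. on another worker) into this one.
        for p, m in other.bins:
            self._add_bin(p, m)
        self._compress()

    def total(self) -> float:
        return sum(m for _, m in self.bins)

    def sum_upto(self, b: float) -> float:
        """Estimated number of points <= b, via trapezoid interpolation between bins."""
        if not self.bins or b < self.bins[0][0]:
            return 0.0
        if b >= self.bins[-1][0]:
            return self.total()
        keys = [p for p, _ in self.bins]
        i = bisect.bisect_right(keys, b) - 1  # so that p_i <= b < p_{i+1}
        (pi, mi), (pj, mj) = self.bins[i], self.bins[i + 1]
        mb = mi + (mj - mi) * (b - pi) / (pj - pi)
        s = (mi + mb) / 2.0 * (b - pi) / (pj - pi)
        return s + sum(m for _, m in self.bins[:i]) + mi / 2.0


def gini(counts: List[float]) -> float:
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def best_split(class_hists: Dict[str, StreamingHistogram],
               max_bins: int) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted Gini) of the best `x <= threshold` split, or None."""
    merged = StreamingHistogram(max_bins)
    for h in class_hists.values():
        merged.merge(h)
    centroids = [p for p, _ in merged.bins]
    # Candidate thresholds: midpoints between adjacent centroids of the merged histogram.
    candidates = [(a + b) / 2.0 for a, b in zip(centroids, centroids[1:])]
    totals = [h.total() for h in class_hists.values()]
    n = sum(totals)
    best: Optional[Tuple[float, float]] = None
    for c in candidates:
        left = [h.sum_upto(c) for h in class_hists.values()]
        right = [t - l for t, l in zip(totals, left)]
        nl = sum(left)
        score = (nl * gini(left) + (n - nl) * gini(right)) / n
        if best is None or score < best[1]:
            best = (c, score)
    return best


if __name__ == "__main__":
    import random
    random.seed(0)
    hists = {"a": StreamingHistogram(32), "b": StreamingHistogram(32)}
    for _ in range(5000):
        hists["a"].update(random.gauss(0.0, 1.0))
        hists["b"].update(random.gauss(4.0, 1.0))
    threshold, score = best_split(hists, 32)
    print(f"split at x <= {threshold:.2f}, weighted Gini {score:.3f}")
```

Only one histogram per (class, feature) is kept, so memory stays fixed no matter how long the stream runs. The cost of this bound is that counts near a threshold are interpolated rather than exact.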