This document discusses using hashing to approximate important data aggregates online, that is, in a single pass over a data stream. It notes that many aggregates, such as the number of distinct items, the frequencies of the most common items, and quantiles, cannot be computed exactly online because exact answers need memory that grows with the data. Most of them can, however, be approximated online. The key trick is hashing: mapping elements into fixed-size tables lets an algorithm track these aggregates with memory that grows sublinearly in the size of the input.
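The hashing idea above can be illustrated with a minimal Python sketch. The document does not say which data structures it uses, so this example swaps in two standard hash-based sketches: a Count-Min sketch for approximate item frequencies and HyperLogLog for approximate count distinct. The helper `_hash`, the seeds, and the default sizes (`width=1024`, `depth=4`, `p=12`) are hypothetical choices made for illustration, and items are assumed to be strings. Quantile sketches are not shown.

```python
import hashlib
import math


def _hash(item: str, seed: int) -> int:
    """Hypothetical helper: hash an item to 64 bits; each seed acts as a separate hash function."""
    salt = seed.to_bytes(16, "little")  # blake2b accepts a salt of up to 16 bytes
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


class CountMinSketch:
    """Approximate item frequencies using a fixed depth x width table of counters."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item: str, count: int = 1) -> None:
        # Each row hashes the item with its own seed and increments one counter.
        for row in range(self.depth):
            self.table[row][_hash(item, row) % self.width] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a counter, so the smallest of the item's
        # counters is the best estimate and it never undercounts.
        return min(self.table[row][_hash(item, row) % self.width]
                   for row in range(self.depth))


class HyperLogLog:
    """Approximate count distinct using 2**p small registers."""

    def __init__(self, p: int = 12, seed: int = 1000):
        self.p = p
        self.m = 1 << p
        self.seed = seed
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = _hash(item, self.seed)
        idx = h & (self.m - 1)  # the low p bits choose a register
        w = h >> self.p         # the remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the (64 - p)-bit value w.
        rank = (64 - self.p) - w.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias correction, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: use linear counting on the empty registers.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    # Two frequent items plus 1,000 items seen once each: 1,002 distinct items.
    stream = ["a"] * 500 + ["b"] * 200 + [f"x{i}" for i in range(1000)]
    cms, hll = CountMinSketch(), HyperLogLog()
    for item in stream:
        cms.add(item)
        hll.add(item)
    print(cms.estimate("a"), cms.estimate("b"))  # at least 500 and 200
    print(round(hll.estimate()))                 # approximately 1002
```

Both structures use a fixed amount of memory chosen up front, independent of stream length. For a Count-Min sketch with width about e/ε and depth about ln(1/δ), each estimate exceeds the true count by at most εN (N being the total count) with probability at least 1 − δ. HyperLogLog's relative error is roughly 1.04/√m for m registers, and each register only needs about log₂(log₂ n) bits, which is where the sublinear memory comes from.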