This document summarizes the current state of large-scale data processing in Python. It discusses Apache Spark, covering its RDD and Spark SQL APIs, vectorized (pandas) UDFs in PySpark, and Spark Structured Streaming. Dask, with its array, dataframe, and bag collections, is presented as an alternative to Spark. Ray is introduced as another distributed framework, one that can be used to scale pandas-based workloads. Google BigQuery and TensorFlow are also mentioned as options on cloud platforms. The document concludes by discussing functional programming and SQL as possible future directions.