Cloud computing makes it possible to process large-scale data by distributing the work across many machines. Apache Spark is an open-source cluster computing framework that handles large datasets efficiently by keeping working data in memory across the machines of a cluster. This lecture, given at NTU Data Science in 2015, discussed Spark as a framework for analyzing big data in the cloud.