Spark is a big data processing tool built with Scala that runs on the Java Virtual Machine (JVM). It is up to 100 times faster than Hadoop for iterative jobs because it keeps intermediate data in memory rather than writing to disk. Spark uses Resilient Distributed Datasets (RDDs) that allow data to be kept in memory across transformations and actions. RDDs also maintain lineage graphs to allow recovery from failures.