This document discusses cost minimization techniques for big data processing. It models big data processing as a two-dimensional Markov chain in order to evaluate the expected completion time. Cost minimization is then formulated as a mixed-integer nonlinear programming (MINLP) problem that jointly optimizes data assignment, data placement, and data migration across distributed data centers. A weighted Bloom filter approach is also presented, which reduces communication costs in distributed incomplete pattern matching.
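
To make the weighted Bloom filter idea more concrete, the following is a minimal sketch and not the construction used in the document. It assumes one plausible design: each filter position keeps a set of weights instead of a single bit, and a query returns the weights common to all hashed positions. The class name `WeightedBloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of SHA-256 for hashing are all illustrative assumptions.

```python
import hashlib


class WeightedBloomFilter:
    """Illustrative sketch: a Bloom filter whose positions hold weight sets.

    Assumption: this layout is chosen for illustration only and is not
    taken from the summarized document.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One (initially empty) set of weights per filter position.
        self.slots = [set() for _ in range(num_bits)]

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item, weight):
        # Record the weight at every position the item hashes to.
        for pos in self._positions(item):
            self.slots[pos].add(weight)

    def query(self, item):
        # Weights present at all hashed positions; an empty set means
        # the item was definitely never added.
        positions = list(self._positions(item))
        common = set(self.slots[positions[0]])
        for pos in positions[1:]:
            common &= self.slots[pos]
        return common


wbf = WeightedBloomFilter()
wbf.add("pattern-fragment-1", 2)
print(wbf.query("pattern-fragment-1"))
```

As with an ordinary Bloom filter, this sketch can return false positives (a weight reported for an item that was never added) but never false negatives. The communication saving comes from shipping the compact filter between nodes instead of the raw pattern data.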