This document examines the threat of data leakage attacks on Hadoop systems and proposes an investigation framework for detecting them. It first gives background on Hadoop and describes how sensitive data stored on Hadoop clusters could be targeted. The framework consists of data collectors, deployed in the host operating systems, that monitor access to important data and transmit logs to a central data analyzer. The data analyzer applies detection algorithms along four dimensions (abnormal directories, abnormal users, abnormal operations, and abnormal block proportions) to identify potential data leakage attacks in the collected logs and issue warnings. The document concludes that, by automatically detecting suspicious data leakage behaviors, the framework can help reconstruct attack scenarios.
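To make the four-dimension check concrete, the following is a minimal sketch of how a data analyzer might flag a single collected log record. The summary does not specify the log format or how each dimension is evaluated, so everything here is an assumption for illustration: the `AccessRecord` fields, the allow-lists, the 0.5 block-proportion threshold, and the function name `check_record` are hypothetical stand-ins, not the framework's actual algorithms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRecord:
    """One access event reported by a data collector (hypothetical schema)."""
    user: str
    directory: str
    operation: str
    blocks_accessed: int  # blocks of the file touched by this access
    blocks_total: int     # total blocks that make up the file


# Hypothetical baseline of "normal" behavior; a real deployment would
# derive these from site policy or from historical logs.
NORMAL_DIRECTORIES = {"/data/input", "/data/output"}
NORMAL_USERS = {"hdfs", "etl"}
NORMAL_OPERATIONS = {"read", "write"}
MAX_BLOCK_PROPORTION = 0.5  # assumed threshold, not taken from the source


def check_record(rec: AccessRecord) -> list[str]:
    """Return the names of the dimensions on which the record looks abnormal."""
    flags = []
    if rec.directory not in NORMAL_DIRECTORIES:
        flags.append("abnormal_directory")
    if rec.user not in NORMAL_USERS:
        flags.append("abnormal_user")
    if rec.operation not in NORMAL_OPERATIONS:
        flags.append("abnormal_operation")
    # Guard against division by zero for empty files.
    if rec.blocks_total > 0:
        proportion = rec.blocks_accessed / rec.blocks_total
        if proportion > MAX_BLOCK_PROPORTION:
            flags.append("abnormal_block_proportion")
    return flags


if __name__ == "__main__":
    suspicious = AccessRecord("mallory", "/tmp/staging", "copy", 9, 10)
    for flag in check_record(suspicious):
        print(f"WARNING: {flag} for user {suspicious.user}")
```

Returning a list of flagged dimensions, rather than a single yes/no verdict, keeps each warning traceable to the dimension that raised it, which may be useful when reconstructing an attack scenario from the logs.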