The document discusses techniques for detecting data leakage when sensitive data is shared with third parties (agents). It proposes:
1) A model that estimates, for each agent, the probability that the agent is guilty of leaking a set of data (S) discovered outside authorized channels. The model weighs the likelihood that the data in S was guessed or obtained from public sources against the likelihood that it was leaked by an agent.
2) Strategies for allocating data among agents in a way that improves the ability to identify leakers, such as distributing unique or rare records to single agents.
3) The optional addition of "fake" records to the data, in a manner similar to digital watermarks, allowing positive identification of leakers if fake records appear in the leaked data.
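
Items 1 and 3 can be illustrated with a minimal Python sketch. The summary does not state the formula, so the code assumes one simple independence-based estimate: each leaked record was obtained independently with probability p, and otherwise it is equally likely to have come from any agent that holds it. The function names, the agent labels, and the value of p are hypothetical and chosen only for illustration.

```python
def guilt_probabilities(leaked, allocations, p):
    """Estimate, per agent, the probability of having leaked at least one record.

    leaked      -- set S of records found outside authorized channels
    allocations -- dict mapping agent name -> set of records given to that agent
    p           -- assumed probability that a record was guessed or obtained
                   from another source (an illustrative parameter)
    """
    result = {}
    for agent, records in allocations.items():
        not_guilty = 1.0
        for record in leaked & records:
            # Number of agents holding this record; blame is shared among them.
            holders = sum(1 for held in allocations.values() if record in held)
            not_guilty *= 1.0 - (1.0 - p) / holders
        result[agent] = 1.0 - not_guilty
    return result


def agents_exposed_by_fakes(leaked, fakes_by_agent):
    """Return agents whose uniquely assigned fake records appear in the leak."""
    return {agent for agent, fakes in fakes_by_agent.items() if fakes & leaked}


# Hypothetical allocation: t1 is shared, t2 and t3 go to agent A only.
allocations = {"A": {"t1", "t2", "t3"}, "B": {"t1", "t4"}}
leaked = {"t1", "t2", "t3"}
probs = guilt_probabilities(leaked, allocations, p=0.2)

# Hypothetical fake records, each given to exactly one agent.
fakes_by_agent = {"A": {"f_a"}, "B": {"f_b"}}
exposed = agents_exposed_by_fakes(leaked | {"f_a"}, fakes_by_agent)
```

In this example agent A scores much higher than agent B because two of the leaked records were given to A alone, which is the intuition behind item 2: the less overlap between agents' allocations, the more sharply the estimate separates them.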