This document discusses the ContentMine project, which aims to extract factual information from the scientific literature using automated processes. Key points:
1) ContentMine aims to extract 100 million facts per year from scientific papers through a pipeline of crawling, scraping, extraction, and republication. The extracted facts will be made openly available under open licenses and in open standards.
2) The goal is to make the vast amount of data locked in scientific papers more accessible and useful by converting it to structured, semantic formats such as CSV, using techniques such as computer vision and natural language processing.
3) This will help address problems such as the estimated 85% of medical research that is wasted, in part because of poor data sharing and availability. Extracting facts at scale