The document discusses a project that aims to transform unstructured data from paper catalogues into a structured format using information extraction techniques. It highlights the challenges of achieving multilingual consistency and the need for controlled vocabularies to make data retrieval more efficient. The authors propose several methods, including named entity recognition and relation extraction, to streamline data processing in archaeological contexts.