The document discusses extracting canonical citations from classical texts at scale. It begins by explaining the importance of references in classics scholarship and trends toward enhanced reading. An approach is presented that uses named entity recognition, relation extraction, and disambiguation to extract citation components and assign identifiers. The extraction pipeline is evaluated on data from L'Année philologique, achieving a high F1 score. Overall, the approach aims to scale the extraction of citations to enable applications like search and network analysis over large corpora.
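As a rough illustration of the three stages named above (finding citation components, linking them, and assigning an identifier), here is a minimal Python sketch. It is not the pipeline described in the document: a regular expression stands in for the trained named entity recognition and relation extraction steps, and a small lookup table stands in for disambiguation. The names `AUTHOR_IDS`, `WORK_IDS`, `Citation`, and `extract_citations` are hypothetical, and the use of CTS URNs as the identifier scheme is an assumption, since the summary does not say which identifiers are assigned.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical lookup tables (assumption: CTS URN components for Homer).
AUTHOR_IDS = {"Hom.": "tlg0012"}
WORK_IDS = {("tlg0012", "Il."): "tlg001", ("tlg0012", "Od."): "tlg002"}

# One pattern captures author, work, and scope together, so entity
# recognition and relation linking collapse into a single match here.
CITATION_RE = re.compile(
    r"(?P<author>[A-Z][a-z]+\.)\s+(?P<work>[A-Z][a-z]+\.)\s+(?P<scope>\d+(?:\.\d+)*)"
)


@dataclass
class Citation:
    author: str
    work: str
    scope: str
    urn: Optional[str]  # None when the author or work is not in the tables


def extract_citations(text: str) -> List[Citation]:
    """Find 'Author. Work. scope' citations and resolve them to URNs."""
    citations = []
    for match in CITATION_RE.finditer(text):
        author, work, scope = match.group("author", "work", "scope")
        # Disambiguation stand-in: plain dictionary lookups.
        author_id = AUTHOR_IDS.get(author)
        work_id = WORK_IDS.get((author_id, work))
        urn = None
        if author_id and work_id:
            urn = f"urn:cts:greekLit:{author_id}.{work_id}:{scope}"
        citations.append(Citation(author, work, scope, urn))
    return citations


if __name__ == "__main__":
    for citation in extract_citations("See Hom. Il. 1.1 and Hom. Od. 9.39."):
        print(citation)
```

A rule-based sketch like this only handles one citation shape. Real citations vary widely in abbreviation and layout, which is presumably why the document's approach relies on learned components instead.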