The document evaluates several semantic answer similarity metrics for question-answering systems, highlighting the inadequacy of traditional lexical metrics and introducing new transformer-based models. It presents a new dataset of co-referent name pairs for training these models, with the aim of improving evaluation and correlation with human judgment. The study also emphasizes the need for automated metrics that address the complexities of semantic similarity in language.