This document surveys advances in the evaluation of abstractive summarization models from 2018 to mid-2020, highlighting the shift from traditional n-gram overlap metrics toward more semantic assessments of summary fidelity. It reviews both manual and automated evaluation methods and critiques metrics such as BLEU and ROUGE, whose surface-level lexical overlap correlates poorly with the factual accuracy of generated text. The survey emphasizes evaluating summaries through frameworks such as question answering and natural language inference, which compare the summary against the source document to assess its faithfulness.
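A minimal sketch of the contrast the survey draws between overlap metrics and entailment-based checks, assuming the `rouge-score` and `transformers` packages and an off-the-shelf MNLI model (`roberta-large-mnli`); these particular tools and the toy example are illustrative choices, not the specific systems studied in the surveyed papers.

```python
# Sketch: an n-gram overlap score can stay high while an NLI model flags
# a factual error. For brevity the source sentence doubles as the ROUGE
# reference; in practice ROUGE is computed against gold summaries.
import torch
from rouge_score import rouge_scorer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

source = "The company reported a loss of 2 million dollars in 2019."
summary = "The company reported a profit of 2 million dollars in 2019."

# 1) Lexical overlap: only one token differs, so ROUGE remains high
#    despite the "loss" -> "profit" factual error.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
print(scorer.score(source, summary))

# 2) NLI check: treat the source as premise and the summary as hypothesis;
#    a high contradiction probability signals an unfaithful summary.
name = "roberta-large-mnli"  # illustrative model choice
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
inputs = tokenizer(source, summary, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze()
for i, p in enumerate(probs.tolist()):
    print(f"{model.config.id2label[i]}: {p:.3f}")
```

In the same spirit, QA-based evaluation generates questions from the summary and checks whether the source document yields the same answers; both approaches probe meaning rather than n-gram overlap.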