This document provides an overview of semi-structured data and XML. It defines semi-structured data as a flexible, tree-based data model in which data items need not follow a single fixed schema, so individual records may differ in structure and may be combined from multiple sources. XML is introduced as a markup language whose tags describe what the data means rather than how it should be formatted. The document also covers key XML concepts: well-formedness, tags, and Document Type Definitions (DTDs), which specify the valid tag structure of a document.
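The following is a minimal sketch that illustrates these ideas using Python's standard `xml.etree.ElementTree` module. It parses a small XML tree whose two `book` elements have different child tags, which is the irregular structure typical of semi-structured data. It also shows that improperly nested tags make a document not well-formed. The sample documents and the helper `is_well_formed` are hypothetical illustrations rather than material from the source. `ElementTree` only checks well-formedness and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as well-formed XML.

    ElementTree checks well-formedness (one root element, matching and
    properly nested start/end tags) but does NOT validate against a DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample data. The tags say what each value means (title,
# author, year), not how to display it. The two <book> subtrees have
# different children, i.e. the data is semi-structured.
good = """<bookstore>
  <book><title>Sample Title A</title><author>Sample Author</author></book>
  <book><title>Sample Title B</title><year>2001</year></book>
</bookstore>"""

# Not well-formed: <book> is closed before the <author> nested inside it.
bad = "<book><author>Sample Author</book></author>"

root = ET.fromstring(good)
for book in root:
    # Each <book> is a subtree of the root; list the child tags it contains.
    print([child.tag for child in book])

print(is_well_formed(good), is_well_formed(bad))
```

Running it prints the child tags of each book (`['title', 'author']` and `['title', 'year']`), followed by `True False`. The parser accepts both book subtrees even though their structures differ, because well-formedness concerns only tag syntax and nesting. Restricting which tags may appear, and where, is the role of a DTD.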