This paper proposes a metadata vault model for the evolutionary integration of big data sets, focusing on the NCBI database for genetic variation. It argues that data warehouses must adapt to evolving data sources and business needs while preserving historical integrity and maintaining a single version of the truth through master data management. The study emphasizes combining relational and NoSQL technologies to manage large-scale biomedical data sets efficiently and to address the challenges posed by schema evolution.