This paper presents a novel hybrid approach for building document-oriented data warehouses from unstructured data by combining data analysis with user-requirements analysis. Using the Apache Spark engine, the method generates general schemas from data collections and maps users' decision-making needs, expressed in natural language, onto these schemas. The result is a customized decisional schema in JSON format, tailored to those needs, with the aim of making data use in the warehouse more efficient and relevant.
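The abstract does not specify the schema representation or the mapping technique, so the following is only a rough illustration of the two steps under stated assumptions, not the authors' method. It substitutes plain Python for Apache Spark and treats a "general schema" as the union of field paths and observed value types across a document collection. It also replaces natural-language analysis with naive keyword matching against field names. The function names (`infer_general_schema`, `map_requirement`), the split of matched fields into numeric measures and non-numeric dimensions, and the output JSON layout are all hypothetical.

```python
import json
from typing import Any


def infer_general_schema(docs: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Union of field paths seen across all documents, with the JSON value
    types observed at each path (a hypothetical 'general schema')."""
    types: dict[str, set[str]] = {}

    def walk(obj: dict[str, Any], prefix: str) -> None:
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                walk(value, path)  # nested document: descend, using dotted paths
            else:
                types.setdefault(path, set()).add(type(value).__name__)

    for doc in docs:
        walk(doc, "")
    return {path: sorted(ts) for path, ts in sorted(types.items())}


def map_requirement(requirement: str, schema: dict[str, list[str]]) -> dict[str, Any]:
    """Naive stand-in for NL analysis (assumption): a field matches when its
    last path segment appears as a word in the requirement. Numeric fields
    become measures; all other matches become dimensions."""
    words = {w.strip(",.?!").lower() for w in requirement.split()}
    matched = [p for p in schema if p.split(".")[-1].lower() in words]
    measures = [p for p in matched if any(t in ("int", "float") for t in schema[p])]
    dimensions = [p for p in matched if p not in measures]
    return {"requirement": requirement, "measures": measures, "dimensions": dimensions}


# Hypothetical example collection with heterogeneous documents.
docs = [
    {"product": "laptop", "price": 999.0, "store": {"city": "Lyon"}},
    {"product": "phone", "price": 499, "quantity": 3,
     "store": {"city": "Paris", "zip": "75001"}},
]

schema = infer_general_schema(docs)
decisional = map_requirement("Total price and quantity by city", schema)
print(json.dumps(decisional, indent=2))
```

Here the general schema records that `price` appears as both a float and an int across documents, and the requirement maps to the measures `price` and `quantity` and the dimension `store.city`. A real implementation along the lines the abstract describes would run schema extraction at scale in Spark and use proper natural-language processing rather than exact word matches.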