Deduplication & Fusion
info@sparsity-technologies.com

Index
Introduction
Process
Success stories
Demo

Introduction: Benefits
Identification of suspected duplicate records inside a database
Merging of data from several databases with different formats, detecting duplicate records
Validation tools for the detected similarities

Introduction: Deduplication
Configuration
Automatic execution
Validation of results
Personalized export

Introduction: Fusion
Configuration
Automatic execution
Validation of results
Personalized export

Introduction: Features

Process: Configurations
Workflow: CSV input, Configurations, Execution, Validation, Exportation (Excel, PDF, XML, CSV)
Input data file format: CSV
Select the relevant columns for linking records
Map columns between the different data sources (only when merging)
Assign types to columns so that the most suitable automatic filters can be applied
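
As an illustration of the column selection and column mapping steps, here is a minimal Python sketch. It is not Daurum's configuration format: the source names, the column names, and the `load_records` helper are all assumptions made for the example.

```python
import csv
import io

# Hypothetical mapping from each source's own column names to shared names.
# Columns that are not listed (e.g. "phone") are treated as not relevant.
COLUMN_MAP = {
    "source_a": {"full_name": "name", "birth": "birth_date"},
    "source_b": {"nombre": "name", "fecha_nac": "birth_date"},
}


def load_records(csv_text, source):
    """Read CSV text and keep only the mapped columns, renamed to shared names."""
    mapping = COLUMN_MAP[source]
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {shared: row[original] for original, shared in mapping.items()}
        for row in reader
    ]
```

After loading, records from both sources expose the same column names, so they can be compared column by column.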
Comparison type: exact value, approximate text match, approximate numerical match
Weight (percentage) of each column in the similarity computation; the weights add up to 100% (for example, 30% + 35% + 35% = 100%)
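
The weighted comparison can be sketched as follows. This is only an illustration under assumptions: the slides do not say which text or numerical measures Daurum uses, so `difflib.SequenceMatcher` and a relative-difference score stand in for them, and the column names are hypothetical. The 30/35/35 weights mirror the example above.

```python
from difflib import SequenceMatcher

# Hypothetical configuration: column -> (comparison type, weight in percent).
COLUMNS = {
    "name": ("text", 30),
    "city": ("exact", 35),
    "age": ("numeric", 35),
}


def column_similarity(kind, a, b):
    """Return a similarity in [0, 1] for one pair of column values."""
    if kind == "exact":
        return 1.0 if a == b else 0.0
    if kind == "text":
        # Stand-in text measure; the actual measure is not given in the slides.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if kind == "numeric":
        x, y = float(a), float(b)
        largest = max(abs(x), abs(y))
        if largest == 0:
            return 1.0
        return max(0.0, 1.0 - abs(x - y) / largest)
    raise ValueError(f"unknown comparison type: {kind}")


def record_similarity(rec_a, rec_b, columns=COLUMNS):
    """Weighted average of the per-column similarities, in [0, 1]."""
    total = sum(weight for _, weight in columns.values())
    if total != 100:
        raise ValueError("column weights must add up to 100%")
    score = sum(
        weight * column_similarity(kind, rec_a[col], rec_b[col])
        for col, (kind, weight) in columns.items()
    )
    return score / 100
```

Two records that differ only in `city` would score about 0.65 here, because the 35% weight of that exact-match column is lost.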
Specific percentage for records with null-valued columns
Use filters to standardize values
Automatic and specific filters are available for values such as names, dates, addresses, etc.
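
A filter of this kind could look like the sketch below. The slides do not list the actual filters, so the function names, the accepted date formats, and the column types are assumptions chosen for illustration.

```python
import re
import unicodedata
from datetime import datetime


def normalize_name(value):
    """Strip accents, collapse whitespace, and lowercase a name."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


# Hypothetical list of input date formats, tried in order.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def normalize_date(value):
    """Rewrite a date as ISO 8601 (YYYY-MM-DD); leave it unchanged if unparseable."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return text


# Column type -> filter; columns without a type are only trimmed.
FILTERS = {"name": normalize_name, "date": normalize_date}


def apply_filters(record, column_types):
    """Apply the filter that matches each column's assigned type."""
    return {
        col: FILTERS.get(column_types.get(col), str.strip)(val)
        for col, val in record.items()
    }
```

Running the filters before comparison means that values such as "José" and "JOSE" are already identical when the similarity is computed.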
Edit filters (create new filters, delete or update existing ones)
Use of dictionaries: name-converter dictionary (e.g. Pepe -> Jose)
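
A name-converter dictionary can be sketched as a simple lookup applied word by word. Only the Pepe/Jose pair comes from the slides; the `canonical_name` helper and the dictionary layout are assumptions.

```python
# Hypothetical name-converter dictionary: familiar form -> canonical form.
NAME_DICTIONARY = {"pepe": "jose"}


def canonical_name(full_name, dictionary=NAME_DICTIONARY):
    """Replace each word that has a dictionary entry with its canonical form."""
    tokens = full_name.lower().split()
    return " ".join(dictionary.get(token, token) for token in tokens)
```

With this lookup, "Pepe Garcia" and "Jose Garcia" both become "jose garcia", so an exact or text comparison treats them as the same name.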

Daurum: Introduction