Modern Management
of Data Pipelines
Improving management of data
over its lifetime
Hinders audit &
transparency
Incurs extra cost & time
Can’t satisfy regulatory
requirements
CLASSIFICATION
ANONYMIZATION
Two CloverDX-based Solutions
Harvester/Anonymizer
Machine-aided, enterprise-scale data discovery
and multi-tiered data delivery based on well-defined policies
Data Discovery
Classification
Anonymization
Retention
Access Permission Control
Data Modeling Bridge
Cultural shift to metadata-driven transparency, audit and ease of maintaining large numbers of integrations
Reporting
Reconciliation/Measurement
Auditing of Data/Process
Transparent Data Mapping
Visualization
Automated data discovery on data sources or targets
Scan data source and use matching algorithms to determine type of data
Collect information about data domains of live data
Where is PII located?
What kind of data is really in our databases?
Harvester: Data Discovery
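A minimal sketch of what "matching algorithms" for domain detection could look like, assuming simple regular expressions plus a Luhn checksum for card numbers. The matcher names and patterns are illustrative assumptions, not Harvester's actual detectors.

```python
import re
from typing import Optional

# Hypothetical pattern matchers; a real deployment would tune or replace these.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def luhn_valid(value: str) -> bool:
    """Return True if value looks like a card number and passes the Luhn check."""
    stripped = value.replace(" ", "").replace("-", "")
    if not stripped.isdigit() or not 13 <= len(stripped) <= 19:
        return False
    total = 0
    for i, ch in enumerate(reversed(stripped)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def detect_domain(value: str) -> Optional[str]:
    """Return the first data domain whose matcher accepts the value, else None."""
    for name, pattern in PATTERNS.items():
        if pattern.match(value):
            return name
    if luhn_valid(value):
        return "credit_card"
    return None


for sample in ["jane@example.com", "4111 1111 1111 1111", "hello"]:
    print(sample, "->", detect_domain(sample))
```

Running a detector like this over sampled values of every column is one way to answer "Where is PII located?" without reading schemas by hand.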
Harvester automates the discovery process
Data layouts (schemas, tables, fields and their types)
Assisted Classification
Domain information: customizable domain detection algorithms and sample data sets
Data density within a column for each domain
Can be automated using statistical outputs provided by Harvester
“Seed” core systems as basis for high-accuracy classification of derived systems
Manually correct outliers
Data Discovery and Classification
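The "data density within a column for each domain" idea can be sketched as the share of non-empty values that a domain detector accepts. The detectors, the threshold value and the function names below are assumptions for illustration; columns that fall under the threshold are left for manual correction.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional

Detector = Callable[[str], bool]


def domain_density(values: Iterable[str], detectors: Dict[str, Detector]) -> Dict[str, float]:
    """Fraction of non-empty values in a column that match each domain."""
    present = [v for v in values if v]
    if not present:
        return {}
    counts = Counter()
    for value in present:
        for name, detector in detectors.items():
            if detector(value):
                counts[name] += 1
    return {name: counts[name] / len(present) for name in detectors}


def classify_column(density: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Pick the densest domain if it clears the threshold; None means 'review manually'."""
    if not density:
        return None
    best = max(density, key=density.get)
    return best if density[best] >= threshold else None


# Toy detectors, deliberately simplistic.
detectors = {"email": lambda v: "@" in v, "numeric": str.isdigit}
column = ["a@x.org", "b@y.org", "c@z.org", "n/a", ""]

density = domain_density(column, detectors)
print(density)
print(classify_column(density, threshold=0.7))
print(classify_column(density, threshold=0.8))
```

A lower threshold automates more columns but produces more outliers to correct by hand, so the value is a policy choice rather than a fixed constant.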
Built on top of CloverDX platform
Data manipulation and orchestration implemented in CloverDX
Easy to add support for new data sources (e.g. different databases)
How Does It Work?
Configure
XML config file
Collect Data
CloverDX Server
Find Domains
CloverDX Server
Build Reports
CloverDX Server
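The deck does not show the XML config file, so the following is only a sketch of how a configuration step might feed the later stages. Every element and attribute name here (harvester, source, domain, threshold) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration layout; the real file format is not shown in the slides.
CONFIG = """
<harvester>
  <source name="crm" type="jdbc" url="jdbc:postgresql://localhost/crm"/>
  <domains>
    <domain name="email" threshold="0.8"/>
    <domain name="credit_card" threshold="0.9"/>
  </domains>
</harvester>
"""


def load_config(text: str) -> dict:
    """Parse the config into the sources to scan and the per-domain thresholds."""
    root = ET.fromstring(text.strip())
    sources = [dict(source.attrib) for source in root.findall("source")]
    thresholds = {
        domain.get("name"): float(domain.get("threshold"))
        for domain in root.findall("./domains/domain")
    }
    return {"sources": sources, "thresholds": thresholds}


config = load_config(CONFIG)
print(config["sources"])
print(config["thresholds"])
```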
Quick Demo
Sharing data within an organization is harder if the data contains PII
Anonymized data can be used for reporting and analytics
Anonymized data has little value outside the organization (no credit card numbers or identities to steal)
Anonymize data and only keep PII on production
Automatically apply anonymization policies to remove PII
Generate anonymizers based on metadata and models
Working With PII (And Helping With Regulations)
Rule-based data anonymization
Read data from source system and apply anonymization rules
Pre-defined rules for common patterns (e.g. credit cards, names, addresses, …)
Define custom rules to implement different anonymization strategies
Use data from Harvester to drive the configuration
Helps locate PII in the source
Configure rules based on data domains
Anonymizer
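Rule-based anonymization driven by data domains can be sketched as a lookup from a column's domain to a rule function. The rule set, the salt and the column-to-domain mapping below are assumptions; in the workflow described above, the mapping would come from the discovery results. Note that salted hashing is pseudonymization and may not be sufficient on its own for every regulatory requirement.

```python
import hashlib


def mask_credit_card(value: str) -> str:
    """Keep the last four digits and mask the rest."""
    digits = [c for c in value if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def pseudonymize_email(value: str, salt: str = "demo-salt") -> str:
    """Replace the local part with a salted hash; keep the mail domain for analytics."""
    local, _, domain = value.partition("@")
    digest = hashlib.sha256((salt + local).encode("utf-8")).hexdigest()
    return digest[:12] + "@" + domain


# Pre-defined rules keyed by data domain (hypothetical rule set).
RULES = {
    "credit_card": mask_credit_card,
    "email": pseudonymize_email,
    "name": lambda value: "REDACTED",
}


def anonymize_record(record: dict, column_domains: dict, rules: dict = RULES) -> dict:
    """Apply the rule for each column's domain; columns without a rule pass through."""
    result = {}
    for column, value in record.items():
        rule = rules.get(column_domains.get(column))
        result[column] = rule(value) if rule and value is not None else value
    return result


record = {"id": 5, "name": "Jane Doe", "card": "4111 1111 1111 1111", "email": "jane@example.com"}
column_domains = {"name": "name", "card": "credit_card", "email": "email"}
anonymized = anonymize_record(record, column_domains)
print(anonymized)
```

Custom strategies then amount to adding or swapping entries in the rule table without touching the code that reads and writes the data.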
Built on top of CloverDX platform
Connect to different data sources/targets and allow for flexible business logic implementation
Customizable anonymization rules
How Does It Work?
Configure
XML config file
Collect Data
CloverDX Server
Apply Rules
CloverDX Server
Write Data
CloverDX Server
Quick Demo
Anonymization gateway
Expose data through an API that will automatically produce anonymized output
Automatically generate the API based on privacy policies
Improving Data Privacy
Anonymization API can be automatically generated based on metadata
Apps authenticate with the API and have their own grants
Restricted app
App 1
Raw data (includes PII)
API
get CUSTOMER; id = 5; grant=PUBLIC
get CUSTOMER; id = 5; grant=ALL
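The two requests above can be read as the same lookup returning different views depending on the caller's grant. The sketch below assumes that the restricted app holds the PUBLIC grant and App 1 holds ALL; the app identifiers, sample record and PII field list are invented for illustration and would come from metadata in a generated API.

```python
# Raw data (includes PII); sample record invented for illustration.
RAW_CUSTOMERS = {
    5: {"id": 5, "name": "Jane Doe", "email": "jane@example.com", "country": "US"},
}

# Assumption: which fields are PII would be derived from metadata, not hard-coded.
PII_FIELDS = {"name", "email"}

# Assumption: each authenticated app is assigned exactly one grant.
APP_GRANTS = {"app-1": "ALL", "restricted-app": "PUBLIC"}


def get_customer(customer_id: int, app_id: str) -> dict:
    """Return the customer as seen through the calling app's grant."""
    grant = APP_GRANTS.get(app_id)
    if grant is None:
        raise PermissionError(f"unknown app: {app_id}")
    record = RAW_CUSTOMERS[customer_id]
    if grant == "ALL":
        return dict(record)
    # PUBLIC grant: mask every PII field, pass everything else through.
    return {key: ("***" if key in PII_FIELDS else value) for key, value in record.items()}


print(get_customer(5, "restricted-app"))
print(get_customer(5, "app-1"))
```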
Data Models → Runnable Transformation
Data and mapping descriptions → Generated executable transformation code
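One way to picture mapping descriptions becoming executable transformation code is a small interpreter that turns a declarative mapping into a callable. The mapping format and operation names below are assumptions made for this sketch and are not the Bridge's actual model.

```python
# Declarative mapping description (hypothetical format).
MAPPING = [
    {"target": "customer_id", "sources": ["id"], "op": "copy"},
    {"target": "full_name", "sources": ["first_name", "last_name"], "op": "concat"},
    {"target": "email", "sources": ["email"], "op": "lower"},
]

# Operations the mapping may refer to.
OPS = {
    "copy": lambda value: value,
    "lower": lambda value: value.lower(),
    "concat": lambda *parts: " ".join(parts),
}


def build_transformation(mapping):
    """Turn the mapping description into a function that transforms one record."""
    steps = [(m["target"], m["sources"], OPS[m["op"]]) for m in mapping]

    def transform(record: dict) -> dict:
        return {
            target: op(*(record[source] for source in sources))
            for target, sources, op in steps
        }

    return transform


transform = build_transformation(MAPPING)
print(transform({"id": 5, "first_name": "Jane", "last_name": "Doe", "email": "Jane@Example.com"}))
```

Because the mapping is plain data, the same description can be reviewed by analysts, executed at runtime and queried for audit purposes.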
Data models (which are common to everybody) transparently
become process models (again, common to everybody)
Runtime is tied to foundational data models
Single definition of sources and consumers
Single capture of mapping/transformation
Single point of control / governance
Data Models for Everything
Stakeholders and analysts work on data models only
Bridge closes the gap between models and executable code
Executable code is verified and ultimately deployed to production
Data modeling tool
Bridge Generator
CloverDX Server
Bridge Runtime
CloverDX Server
Production
Development
Databases
Files
APIs
Data models
Metadata
CloverDX project
How Does It Work?
Quick Demo
Change Impact Analysis and Data Lineage
Analyze change impact
No need to investigate the code, just query the metadata
Visualize where PII data is used
See where PII data ends up and if it is handled according to requirements
Data lineage for full transparency
Query the model to see where the data comes from
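Querying metadata for change impact and lineage can be sketched as graph traversal over field-to-field mappings. The field names and the edge list below are invented for illustration; in practice the edges would be read from the data models rather than written by hand.

```python
from collections import deque

# Hypothetical lineage metadata: each field maps to the fields derived from it.
LINEAGE = {
    "crm.customer.email": ["dwh.dim_customer.email"],
    "dwh.dim_customer.email": ["reports.campaign.recipient", "export.partner_feed.contact"],
    "crm.customer.country": ["dwh.dim_customer.country"],
}


def _reach(start: str, edges: dict) -> list:
    """Breadth-first walk returning every field reachable from start."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def downstream(field: str, edges: dict = LINEAGE) -> list:
    """Change impact: everything affected if this field changes."""
    return _reach(field, edges)


def upstream(field: str, edges: dict = LINEAGE) -> list:
    """Data lineage: everything this field is derived from."""
    inverted = {}
    for source, targets in edges.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return _reach(field, inverted)


print(downstream("crm.customer.email"))
print(upstream("reports.campaign.recipient"))
```

Starting the downstream query from every field classified as PII gives a list of the places where PII ends up, which can then be checked against handling requirements.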
Thank you
hello@cloverdx.com

Modern management of data pipelines made easier
