Get a Farm-to-Table View of Your Data
Tracking data quality and lineage on-premises
and in the cloud, on and off the cluster
Dr. Tendü Yoğurtçu, Chief Technology Officer
Today’s Speaker
Dr. Tendü Yoğurtçu
Chief Technology Officer, Syncsort
@TenduYogurtcu
www.linkedin.com/in/tenduyogurtcu
Syncsort Confidential and Proprietary - do not copy or distribute
Farm to Table
Technology Trends Advancing Data
• Advanced Business & Operational Analytics
• Cloud
• Data Science & Artificial Intelligence
• IoT & Streaming Data
• Data Governance
Data Governance
GOALS
• Regulatory compliance
• Understand data context, meaning
• Accuracy, completeness, consistency, relevancy,
timeliness, validity of data
CHALLENGES
• Multi-platform, data volume and complexity
• Diversity and consistency of sources
• Compliance demands: broader & deeper
→ Business imperative across platforms and deployment
models, on-premises and in the cloud
Data Governance
QUALITY
• Discover sources of, relationships between, data
• Apply business rules to measure data quality continuously
SECURITY
• Protect the confidentiality, integrity and availability
of data
LINEAGE
• Get insights into where data came from, what changes
were made and where it lands
→ Requires a multi-faceted approach
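
One way to read "apply business rules to measure data quality continuously" is as a set of named checks that run over every incoming batch and report a pass rate per rule. The sketch below illustrates only that idea. The rule names, field names, and sample records are assumptions made for the example and are not taken from any Syncsort or Trillium product.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, object]
Rule = Callable[[Record], bool]


def account_id_present(record: Record) -> bool:
    """Completeness check: the (hypothetical) account_id field is non-empty."""
    return bool(record.get("account_id"))


def amount_non_negative(record: Record) -> bool:
    """Validity check: the (hypothetical) amount field is a number >= 0."""
    amount = record.get("amount")
    return isinstance(amount, (int, float)) and amount >= 0


# Hypothetical rule set; a real deployment would define its own rules.
RULES: Dict[str, Rule] = {
    "account_id_present": account_id_present,
    "amount_non_negative": amount_non_negative,
}


def measure_quality(records: Iterable[Record],
                    rules: Dict[str, Rule] = RULES) -> Dict[str, float]:
    """Return the fraction of records that pass each rule."""
    batch = list(records)
    if not batch:
        # An empty batch has no failures to report.
        return {name: 1.0 for name in rules}
    return {
        name: sum(1 for record in batch if rule(record)) / len(batch)
        for name, rule in rules.items()
    }


# Illustrative batch: one record misses the id, another has a negative amount.
sample_batch = [
    {"account_id": "A1", "amount": 120.5},
    {"account_id": "", "amount": 40},
    {"account_id": "A3", "amount": -5},
    {"account_id": "A4", "amount": 10},
]
scores = measure_quality(sample_batch)
```

Running the same function on each new batch and tracking the scores over time gives a simple continuous measurement; alert thresholds on those scores would be a separate design decision.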
End to End Data Lineage in Cloudera Navigator
Diagram labels: Data Sources; Data Hub; Analytics, Visualization
• Syncsort accesses data from sources outside cluster.
• Syncsort onboards data, modifies on-the-fly to match Hadoop storage model.
• Syncsort changes, enhances, joins data in cluster with MapReduce or Spark.
• Syncsort passes source-to-cluster data lineage info to Navigator.
• Data changes made by MapReduce, Spark, HiveQL.
• Navigator gathers any other changes made to data on cluster.
• Analytics and visualizations get complete data.
• Data analyst gets end-to-end data lineage info from Navigator.
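
The flow on this slide can be thought of as a directed graph: each processing step records which datasets it read and which dataset it wrote, so any dataset can be traced back to its original sources. The sketch below models that idea in plain Python. It does not use the Cloudera Navigator or DMX-h APIs, and every dataset and operation name in it is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class LineageGraph:
    """Minimal in-memory lineage store (illustrative only)."""

    def __init__(self) -> None:
        # Maps a target dataset to the (source dataset, operation) pairs
        # that produced it.
        self._parents: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def record(self, sources: List[str], target: str, operation: str) -> None:
        """Record that `operation` read `sources` and wrote `target`."""
        for source in sources:
            self._parents[target].append((source, operation))

    def upstream(self, dataset: str) -> Set[str]:
        """Return every dataset that `dataset` was derived from."""
        seen: Set[str] = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for source, _operation in self._parents.get(current, []):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen

    def origins(self, dataset: str) -> Set[str]:
        """Return the upstream datasets that have no recorded parents."""
        return {d for d in self.upstream(dataset) if d not in self._parents}


# Hypothetical pipeline: off-cluster sources, on-cluster transformations.
graph = LineageGraph()
graph.record(["mainframe:accounts", "rdbms:transactions"],
             "hdfs:/landing/joined", "onboard and join")
graph.record(["hdfs:/landing/joined"], "hive:aml.enriched", "Spark enrichment")
graph.record(["hive:aml.enriched"], "hive:aml.report", "HiveQL aggregation")

report_origins = graph.origins("hive:aml.report")
```

Here `report_origins` contains the two off-cluster sources, which is the kind of origin-to-endpoint answer an analyst would look for in a lineage tool.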
Syncsort DMX-h + Cloudera Navigator for End-to-End Lineage
End-to-End Data Lineage in Apache Atlas
Diagram labels: Data Sources; Data Hub; Analytics, Visualization
• Syncsort accesses data from sources outside cluster.
• Syncsort onboards data, modifies on-the-fly to match Hadoop storage model.
• Syncsort changes, enhances, joins data in cluster with MapReduce or Spark.
• Syncsort passes source-to-cluster data lineage info to Atlas.
• Data changes made by MapReduce, Spark, HiveQL.
• Any other changes made to data on cluster are published to Atlas.
• Analytics and visualizations get complete data.
• Data analyst gets end-to-end data lineage info from Atlas.
Data Lineage + Data Quality = Foundations of Data Governance
Diagram labels: Data Sources; Discovery and Profiling; Data Hub; Analytics, Visualization; Data Lineage
• Multi-field fuzzy matching, de-duplication, cleansing, enrichment, standardization, business rule enforcement.
• Analytics and visualizations on clean, complete data you can trust.
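
To make "multi-field fuzzy matching" and "de-duplication" concrete, the sketch below scores two records by a weighted average of per-field string similarity and keeps only the first record of each near-duplicate group. The slides do not describe how Trillium Quality for Big Data matches records, so this uses `difflib.SequenceMatcher` from the Python standard library as a stand-in. The field names, weights, and the 0.85 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List

Record = Dict[str, str]


def normalize(value: str) -> str:
    """Basic standardization: lowercase and collapse whitespace."""
    return " ".join(value.lower().split())


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def record_similarity(a: Record, b: Record, weights: Dict[str, float]) -> float:
    """Weighted average of per-field similarities across several fields."""
    total_weight = sum(weights.values())
    weighted = sum(
        weight * field_similarity(a.get(field, ""), b.get(field, ""))
        for field, weight in weights.items()
    )
    return weighted / total_weight


def deduplicate(records: List[Record], weights: Dict[str, float],
                threshold: float = 0.85) -> List[Record]:
    """Keep the first record of each group of near-duplicates."""
    kept: List[Record] = []
    for record in records:
        if not any(record_similarity(record, k, weights) >= threshold
                   for k in kept):
            kept.append(record)
    return kept


# Hypothetical customer records; the second is a misspelled copy of the first.
customers = [
    {"name": "Jonathan Smith", "city": "London"},
    {"name": "Jonathon  Smith", "city": "london"},
    {"name": "Maria Garcia", "city": "Madrid"},
]
field_weights = {"name": 0.7, "city": 0.3}
unique_customers = deduplicate(customers, field_weights)
```

This pairwise comparison is quadratic in the number of records, so it only illustrates the scoring logic. At cluster scale, candidate pairs would normally be narrowed first, for example by grouping on a blocking key.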
Anti-Money Laundering Solution on Hadoop at Large Global Bank
Challenge: Meet AML transaction monitoring
and FCA compliance demands
– Data too large, diversely scattered to analyze
– Disparate data sources: Mainframe, RDBMS,
Cloud, etc.
Requirements:
– Consolidated, clean, verified data for all analytics
and reporting.
– MUST have complete, detailed data lineage from
origin to end point
– MUST be secure: Kerberos and LDAP integration
required
– Need unmodified copy of mainframe data stored
on Hadoop for backup, archive
Anti-Money Laundering Solution on Hadoop at Large Global Bank
Solution:
• Syncsort DMX-h to create “Golden Record” on
Hadoop for compliance archiving
• Trillium Quality for Big Data for cluster-native
data verification, enrichment, and demanding
multi-field entity resolution on Spark framework
• Full end-to-end lineage to Cloudera Navigator,
from all sources, through transformations, to
data landing, including HiveQL changes
Benefits:
• New financial crimes data hub produces high
performance results at massive scale
• Bank meets stringent Anti-Money Laundering
compliance requirements
Learn How Syncsort Solutions Can Help You
Data Infrastructure Optimization
• Mainframe Optimization
• Application Modernization
• EDW Optimization
• Cross-Platform Capacity Management
Data Availability
• High Availability & Disaster Recovery
• Mission-Critical Migration
• Cross-Platform Data Sharing
• IBM i Data Security & Audit
Data Integration
• Mainframe Access & Integration for Machine Data
• Mainframe Access & Integration for App Data
• High-performance ETL
• Change Data Capture
Data Quality
• Data Governance
• Customer 360
• Big Data Quality & Integration
• Data Enrichment & Validation
www.syncsort.com
THANK YOU



Editor's Notes

  • #3: Tendü Yoğurtçu, Ph.D., is Syncsort’s Chief Technology Officer (CTO). She has 20+ years of software industry experience, including extensive Big Data and Hadoop industry knowledge. As CTO, Tendü directs the company’s technology strategy and innovation, leading all product research and development programs. Prior to her CTO role, Tendü served as Syncsort’s General Manager of Big Data, leading the global software business for Data Integration, Hadoop and Cloud, including sales, marketing, engineering and support. Tendü has held several engineering management roles in which she directed the development of ETL, Sort, and Application Modernization products for Syncsort’s Data Integration business. She was also an Adjunct Faculty Member in the Computer Science Department at Stevens Institute of Technology.
  • #4: So, think about this scenario… You’re a chef at a high-end restaurant that has made a name for itself by providing meals of the highest quality made with the freshest organic food. Not only do you need to run a tight ship inside your kitchen (selecting the best ingredients, ensuring high standards for how your food is stored, prepared and cooked), but you also need to know everything about that food BEFORE it was delivered to your kitchen door. Where and how was your food produced? Were your chickens free range? Are all the organic certifications valid and up to date? How was the food handled, transported and stored along the way? Was it kept at the proper temperatures to maintain freshness and safety? Did it come into contact with any other foods it shouldn’t have? What about ingredients, like your rice, pasta and spices, that are imported from other countries? How do you trace them all the way back to where they were harvested or produced? And how long will it take? An end-to-end view of where your ingredients came from, and what happened to them along the way, is critically important to maintaining the restaurant’s reputation for high-quality food. However, this is obviously a complicated and difficult task. Although the industries are different, there are many similarities between the chef’s mission to track her food supply from the farm to the table and our need to have an end-to-end view of our data. We need to know all the various sources of the data we are working with, how it was moved, transformed and combined along the way, and then what’s happening to it once we are working with it in our Enterprise Data Hub or Data Lake. And, like the restaurant scenario, the task of doing this is critical to complying with regulations and ensuring high quality for our end consumers. And the task of accomplishing this end-to-end view is also very challenging.
  • #5: Some of the top technology trends are advancing what we do with data but also make it harder to get this view. Cloud: rapid growth in cloud data volumes; IT is a consumer, provider and broker of cloud services; hybrid is becoming the standard. And this all drives the need for Data Governance, but also makes it harder to achieve.
  • #6: Data Science & AI: investment is growing. Early adopters: Financial Services, Banking, Retail, Telco.
  • #7: IoT and Streaming Data: IoT adoption doubled between 2013 and 2017. IoT links to analytics, artificial intelligence (AI) and other critical digital initiatives.
  • #8: And this all drives the need for Data Governance – but also makes it harder to achieve
  • #11: A better way is needed – so that, just like the chef, we can have a complete view of our data, from the origin to the data hub – and know what has happened to it at every step of the way
  • #13: A better way is needed – so that, just like the chef, we can have a complete view of our data, from the origin to the data hub – and know what has happened to it at every step of the way
  • #14: As mentioned earlier – lineage is very important, but it’s not the only factor to consider for governance. Data Quality plays a critical role in enterprise data governance as well. To meet the needs of our customers with Hadoop and Spark, we recently released Trillium Quality for Big Data, which runs natively in the cluster to ensure that the data lake doesn’t turn into a data swamp, while harnessing the processing power of the cluster to scale for massive data volumes.
  • #15: Let’s take a look at how this works in the real world…
  • #16: Let’s take a look at how this works in the real world…
  • #19: #1 BIBD solutions provider # countries ?? # partners ??