Talend Data Integration and Management
Data Integration



   Data Integration involves combining data
 residing in different sources and providing the
        user with a unified view of the data


Data Management combines different disciplines
    to manage data as a valuable resource




                                         www.robertomarchetto.com
Talend


●   Talend is a company focused on Data
    Integration and Data Management solutions
●   Gartner named Talend a "Cool Vendor" (2010)
●   Present in more than 12 locations around the
    world
●   Fast-growing company




Talend Open Studio




Talend Open Studio

●   Open source, professional tool
●   Draw procedures by linking components; each
    component performs one operation
●   DB vendor-specific optimized components
●   Produces fully editable Java (or Perl) code
●   Deploys as small, fast compiled Java
    or as a Web Service
●   Eclipse-based IDE, excellent flexibility
●   BI platform independent, DB vendor independent
Automatic code generation, different
       deployment options




Extraction, Transformation, Loading


●   ETL is a common process in Data Integration
    ●   Extract: read data from different data sources
        (databases, flat files, spreadsheets, web
        services, etc.)
    ●   Transform: convert the data into a form that can
        be placed in another container (database, web
        service, file, etc.). Cleaning, computations and
        verifications are also performed at this stage
    ●   Load: write the data in the target format
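
The three ETL stages can be illustrated with a minimal sketch in plain Python. This is not what Talend generates (Talend produces Java or Perl); the sample data, the `users` table and all function names are assumptions made up for the illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a flat file with untidy values.
SOURCE_CSV = """id,name,amount
1, Alice ,10.5
2,Bob,not_a_number
3,  carol,7
"""


def extract(text):
    """Extract: read rows from a delimited flat file."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean and convert rows, rejecting those that fail checks."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append(
                (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
            )
        except ValueError:
            # A value could not be converted: keep the row aside for review.
            rejected.append(row)
    return clean, rejected


def load(conn, rows):
    """Load: write the converted rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run the three stages in order and report what was loaded or rejected."""
    clean, rejected = transform(extract(text))
    load(conn, clean)
    return clean, rejected
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two valid rows and returns the row with the unparseable amount as rejected, which mirrors the cleaning and verification done in the Transform stage.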



Tutorial, Source data




Tutorial, Destination data (Data warehouse)




Tutorial, Metadata


●   Talend requires a preliminary definition of the
    metadata
●   As in programming languages, a strong metadata
    definition often leads to fast, robust and
    maintainable applications
●   ..demo..
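
To make the analogy with typed programming languages concrete, here is a small sketch of an explicit schema definition and a row check against it. It is written in Python for illustration only and does not reproduce Talend's metadata format; `Column`, `USERS_SCHEMA` and `validate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    py_type: type
    nullable: bool = False


# Hypothetical schema for a users table; column names are illustrative only.
USERS_SCHEMA = (
    Column("id", int),
    Column("name", str),
    Column("email", str, nullable=True),
)


def validate(row, schema):
    """Return a list of problems found in `row`; an empty list means it conforms."""
    problems = []
    for col in schema:
        value = row.get(col.name)
        if value is None:
            if not col.nullable:
                problems.append(f"{col.name}: missing")
        elif not isinstance(value, col.py_type):
            problems.append(f"{col.name}: expected {col.py_type.__name__}")
    # Columns present in the row but unknown to the schema are reported too.
    extra = set(row) - {col.name for col in schema}
    problems.extend(f"{name}: not in schema" for name in sorted(extra))
    return problems
```

Declaring the schema once and checking every row against it catches type and nullability errors early, before they reach the target system.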




Tutorial, Talend jobs basics



●   Place components on the designer
●   Link components to build a transformation
●   Main type of link: row flow
●   Schema metadata is propagated along the links and
    must stay consistent
●   ..demo..
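
The idea of components linked by a row flow can be sketched with Python generators, where each function plays the role of one component and rows stream from one to the next. This is only an analogy under assumed names (`source`, `filter_rows`, `map_rows`, `sink`, `build_job`), not Talend's component API.

```python
def source(rows):
    """Input component: emits rows one at a time."""
    yield from rows


def filter_rows(rows, predicate):
    """Filter component: passes through only rows matching the predicate."""
    return (row for row in rows if predicate(row))


def map_rows(rows, func):
    """Map component: transforms each row; the output schema may differ."""
    return (func(row) for row in rows)


def sink(rows):
    """Output component: collects rows (a real job would write them out)."""
    return list(rows)


def build_job(data):
    """Link the components: each one consumes the row flow of the previous one."""
    flow = source(data)
    flow = filter_rows(flow, lambda r: r["active"])
    flow = map_rows(flow, lambda r: {"id": r["id"], "name": r["name"].upper()})
    return sink(flow)
```

Note that the map step changes the row schema (it drops `active`), so any component placed after it must expect the new schema; this is the consistency requirement mentioned above.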



Tutorial, users_dimension




Test the job




Tutorial, accounts_dimension




Tutorial, dates_dimension




Tutorial, write a Java library




Tutorial, opportunities_fact




Tutorial, define a root job




Deploy and run




Extensibility, community plugins


●   Many official components
●   Components for every task released by the
    community
●   Geospatial components, log analysis, Google
    Analytics, data encryption, etc.

Scheduler




And now.. reports, dashboards, OLAP,
        Geoanalysis, KPIs..




Do you trust your data?




What about data quality?

●   Customer A appears 5 times under different
    names
●   Null values can skew statistics such as the
    mean
●   Duplicate records
●   Blank values
●   Some records can contain errors (e.g. fields
    set to -1)
●   Some records can be garbage
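
The issues listed above can be measured with a few simple checks. The following Python sketch profiles a handful of made-up records; it is not how Talend Open Profiler works internally, and `RECORDS`, `profile`, `normalize_name` and `mean_amount` are hypothetical names chosen for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical records illustrating the issues above.
RECORDS = [
    {"customer": "Acme Corp", "amount": 100.0},
    {"customer": "ACME corp.", "amount": 100.0},  # same customer, different name
    {"customer": "Acme Corp", "amount": 100.0},   # exact duplicate of the first
    {"customer": "", "amount": None},             # blank name, null amount
    {"customer": "Beta Ltd", "amount": -1},       # -1 used as an error marker
]


def profile(records):
    """Count nulls, blanks, sentinel values and exact duplicates."""
    seen = Counter(tuple(sorted(r.items())) for r in records)
    return {
        "null_amount": sum(r["amount"] is None for r in records),
        "blank_customer": sum(r["customer"].strip() == "" for r in records),
        "sentinel_amount": sum(r["amount"] == -1 for r in records),
        "duplicates": sum(n - 1 for n in seen.values()),
    }


def normalize_name(name):
    """Crude name key: lowercase, drop punctuation and surrounding spaces."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()


def mean_amount(records, null_as_zero):
    """Mean of the amounts, ignoring -1 markers, with nulls as 0 or as missing."""
    values = [r["amount"] for r in records if r["amount"] != -1]
    if null_as_zero:
        values = [0 if v is None else v for v in values]
    else:
        values = [v for v in values if v is not None]
    return mean(values)
```

With these records, treating the single null as zero gives a mean of 75.0, while skipping it gives 100.0, which shows how nulls shift a statistic. `normalize_name` maps both spellings of the first customer to the same key; real name matching usually needs more than this crude normalization.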

Talend Open Profiler




What about data storage size?


●   Some fields can be oversized for the data they
    contain
●   Sometimes fields are related and can be
    calculated
●   Some keys or values are never used
●   As data grows, garbage grows with it
●   Data storage is not free (disks, electricity,
    backups, DB licenses)
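
Oversized and unused fields can be spotted by comparing each column's declared width with what is actually stored. The sketch below is a simplified Python illustration; `column_usage` and the declared widths passed to it are assumptions, and a real check would read the widths from the database catalog.

```python
def column_usage(rows, declared_widths):
    """For each column, compare the declared width with the longest value stored."""
    report = {}
    for name, declared in declared_widths.items():
        # Nulls and empty strings count as unused.
        values = [row[name] for row in rows if row.get(name) not in (None, "")]
        longest = max((len(str(v)) for v in values), default=0)
        report[name] = {
            "declared": declared,
            "longest": longest,
            "used": len(values),
        }
    return report
```

A column whose `longest` value is far below `declared` is a candidate for resizing, and a column with `used` equal to zero may not be needed at all.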

Data is "the black gold" that can produce
                knowledge


●   Data is a resource from which you can extract
    knowledge
●   A large amount of data can be distilled into
    concise information
●   Data storage is not free, and a lot of data can
    slow systems down
●   Data cleansing is a central process in statistical
    analysis and Data Mining




Talend Master Data Management





