Make (big) data meaningful | Softnix Technology
Softnix Big Data Platform Company
Talend Introduction
Agenda
• Introduction to Softnix
• Data Engineer vs Data Scientist
• Data Lake
• Talend Big Data Integration
• Demo
“Big Data Platform Company”
Logger
Logger Cloud for MSP
Logger for AWS
Logger for Azure
Collector
Edge Point
Data Platform
Authenticator
All-in-one Law & Standard Compliance
Security & IT Services
Monitoring by ZABBIX
Big Data Analytics
Products Module
- SDP for Log Analytic
- SDP for Cybersecurity
- SDP for IT Operation & Monitoring
Product Module
- ISO 27001 Report
Technology Partner
Leader
Big Data Operation Technology
Leader
Data Integration
Data Quality
ISV Partner
What's Softnix Data Platform
Softnix Data Platform
Big Data Analytic Platform
Any Device
Any Platform
Dashboard & Visualize
Enterprise Data Lake
Softnix Data Platform - Services Framework
Softnix Data Platform Architecture
Dashboard System
Data Engineer vs Data Scientist (three infographic slides, images not preserved)
Source: https://www.datacamp.com/community/blog/data-scientist-vs-data-engineer
Roles in a Big Data Project
• Data Analyst
- Responsibility: query, process, summarize, and visualize data
- Example jobs: create monthly reports and summary reports on business request
- Type of analytics: backward-looking (descriptive, diagnostic)
- Weakness: lacks the skills to analyze and manage big data
• Data Engineer
- Responsibility: build data pipelines that serve business requirements; keep data clean and accurate; prepare data for data analysts and data scientists; know data warehouses, data lakes, and data modeling
- Example jobs: set up and maintain big data servers, collect data, prepare data, import and export data into an analysis-friendly format
- Type of analytics: backward- and forward-looking
- Weakness: lacks data analysis skills
• Data Scientist
- Responsibility: apply machine learning and AI to analyze data
- Example jobs: work with raw data, focus on insight, find new meaning in the data
- Type of analytics: forward-looking (predictive and prescriptive)
- Weakness: operating big data infrastructure
What data scientists spend the most time doing
Data preparation accounts for about 80% of the work of data scientists
Source: https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#2bdb12bf6f63
Why Do We Need a Data Lake?
“Data lakes are enterprise-wide data management platforms for analyzing disparate sources of data in its native format.” (Gartner)
Business Value
Reducing cost
• ETL offload
• EDW offload/optimization
• Data archiving
Generating new opportunities
• Customer acquisition, retention..
• Real-time engagement
• Pricing optimization
• Demand forecasting
• Risk and fraud
• Predictive maintenance
• Smart products…
Data Lakes Bring New Challenges
High-end users
The rest of us
Complexity, poor governance and control, no reuse
Data Lake – Conceptual Architecture
• Acquire
• Ingest
• Understand & Improve
• Curate & Govern
• Deliver
• Self-service
SCALE
Ingestion Best Practices
Transactions
Messages & Events
Logs
Sensors
Data Analytics & Data Science
Real-time Data Visualization
Real-time Indicators / Scorecard
Collect - Distribute
Track
Streaming
Alert
NYC Taxi Data Streaming
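The track / stream / alert flow named on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the event fields (`zone`, `fare`), the window size, and the threshold are hypothetical and are not taken from the NYC taxi demo or from any Talend component.

```python
from collections import deque
from typing import Iterable, Iterator


def rolling_alerts(events: Iterable[dict], window: int = 3,
                   threshold: float = 50.0) -> Iterator[dict]:
    """Yield an alert when the mean fare of the last `window` events exceeds `threshold`."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` fares
    for event in events:
        recent.append(event["fare"])
        if len(recent) == window:
            mean_fare = sum(recent) / window
            if mean_fare > threshold:
                yield {"zone": event["zone"], "mean_fare": mean_fare}


# Hypothetical stream of taxi trips (made-up values)
trips = [
    {"zone": "A", "fare": 10.0},
    {"zone": "B", "fare": 20.0},
    {"zone": "C", "fare": 30.0},
    {"zone": "D", "fare": 120.0},
    {"zone": "E", "fare": 90.0},
]
alerts = list(rolling_alerts(trips))
```

Because `rolling_alerts` is a generator, it can consume an unbounded stream one event at a time; a real deployment would more likely read from a message broker than from a list.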
Big Data Ecosystem
• Hadoop: the core project
• HDFS: the Hadoop Distributed File System
• MapReduce: the software framework for distributed processing of large data sets
• Hive: a data warehouse infrastructure that provides data summarization and a query language
• Pig: a high-level data-flow language and execution framework for parallel computation
• HBase: the Hadoop database, used when you need random, real-time read/write access to your Big Data
• And many more: Sqoop, HCatalog, ZooKeeper, Oozie, Cassandra, MongoDB, Flume, Impala, Stinger, Neo4J, etc.
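To make the MapReduce idea concrete, here is a minimal word-count sketch in plain Python. It runs in a single process and only mimics the map, shuffle, and reduce phases; it does not use the Hadoop API, and the function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data", "Big Data platform", "data lake"])
```

On a cluster, the map and reduce functions would run in parallel on different nodes and the framework would handle the shuffle; the logic per key stays the same.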
Talend Big Data Integration
GARTNER DATA INTEGRATION MAGIC QUADRANT 2017
Talend Unique Integration Solution
Talend Big Data Capabilities
• Basic Concept
• Monitor the Hadoop cluster
• Create cluster metadata
• Reading and writing data in HDFS
• Reading and writing to HDFS and HBase
• Working with tables
• Import DB table with Sqoop
• Create table with Hive
• Processing data and tables in HDFS
• Processing tables with Hive
• Processing data with Pig
• Processing data with Big Data batch jobs (MapReduce or Spark)
Talend Product Strategy
Data Preparation Components
• Select – include/exclude data
• Merge – combine data from multiple sources
• Clean – handle missing values, duplicates, null values
• Transform – convert data into analysis format
• Enrich – derive attributes, generate fields
• Aggregate – convert transactional rows of data into summarized columns
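The six components above can be walked through on a toy data set. The sketch below uses only the Python standard library; the field names, sample rows, and the choice to replace a missing amount with zero are all assumptions made for illustration, not Talend behavior.

```python
from collections import defaultdict

# Hypothetical sample data
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": "100.0", "status": "paid"},
    {"order_id": 2, "customer_id": "c2", "amount": None, "status": "paid"},
    {"order_id": 2, "customer_id": "c2", "amount": None, "status": "paid"},  # duplicate row
    {"order_id": 3, "customer_id": "c1", "amount": "50.5", "status": "cancelled"},
    {"order_id": 4, "customer_id": "c2", "amount": "20.0", "status": "paid"},
]
customers = {"c1": {"segment": "retail"}, "c2": {"segment": "corporate"}}

# Select: keep only paid orders
selected = [o for o in orders if o["status"] == "paid"]

# Merge: attach the customer segment from the second source
merged = [{**o, **customers[o["customer_id"]]} for o in selected]

# Clean: drop duplicate order_ids and replace missing amounts with "0"
seen, cleaned = set(), []
for row in merged:
    if row["order_id"] in seen:
        continue
    seen.add(row["order_id"])
    amount = row["amount"] if row["amount"] is not None else "0"
    cleaned.append({**row, "amount": amount})

# Transform: convert amount strings into floats for analysis
transformed = [{**r, "amount": float(r["amount"])} for r in cleaned]

# Enrich: derive a new attribute from an existing one
enriched = [{**r, "is_large": r["amount"] >= 100.0} for r in transformed]

# Aggregate: summarize transactional rows into one total per segment
totals = defaultdict(float)
for r in enriched:
    totals[r["segment"]] += r["amount"]
totals = dict(totals)
```

Each step produces a new list rather than mutating the previous one, which makes it easy to inspect the data after any single stage.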
Vision: Democratize big data
Talend Big Data
Typical Use Case Scenario
Hadoop Connector
Demo
• Step 1: Import file into Cloudera Hadoop
• Step 2: Preparation process on HDFS
• Step 3: Set up Hive table
• Step 4: Load file to Hive
• Step 5: Use Hue to explore data
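For readers without the demo environment, steps 1, 3, and 4 correspond roughly to the hand-written commands below. This is a sketch under assumptions: the deck performs these steps with Talend, whereas the Python helpers here only build the equivalent `hdfs dfs -put` command and HiveQL statements as strings without running them. The file name, HDFS path, table name, and columns are hypothetical. Steps 2 and 5 (preparation on HDFS and exploring in Hue) are not covered.

```python
import shlex


def hdfs_put_command(local_path: str, hdfs_dir: str) -> str:
    """Step 1: copy a local file into HDFS with the `hdfs dfs -put` CLI."""
    return f"hdfs dfs -put {shlex.quote(local_path)} {shlex.quote(hdfs_dir)}"


def create_table_ddl(table: str, columns: dict) -> str:
    """Step 3: HiveQL DDL for a comma-delimited text table."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        "STORED AS TEXTFILE"
    )


def load_data_statement(hdfs_path: str, table: str) -> str:
    """Step 4: move a file that is already in HDFS into the Hive table."""
    return f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"


# Hypothetical file, path, and schema
put_cmd = hdfs_put_command("trips.csv", "/user/demo/raw")
ddl = create_table_ddl("trips", {"trip_id": "INT", "fare": "DOUBLE"})
load = load_data_statement("/user/demo/raw/trips.csv", "trips")
```

The generated strings can be pasted into a shell and a Hive client (or the Hue query editor) to reproduce the flow by hand.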
