Introduction to Big Data 
Three Engines for Harnessing the Power of Big Data 
Paul Barsch, Marketing Director
2 
What are Big Data? 
“Big data is not about size alone. This year's big data is next year's normal-sized data. Generally, volume quickly gives way to the more defining requirements of variety, velocity and complexity.”
- Mark Beyer, Douglas Laney, Gartner
“Examples include web logs, RFID, sensor networks, social networks, Internet text and documents, Internet search indexing, call detail records, genomics, astronomy, biological research, military surveillance, medical records, photography archives, video archives, and large scale eCommerce.”
- Wikipedia, Big Data
3 
We’ve Come A Long Way! 
• Larry Page and Sergey Brin 
managed to patch together 1TB 
of disk by spending $15K on their 
credit cards in 1998 
• In 1980, 1 Terabyte of disk 
storage could cost up to $14M. 
Amazon.com - $87.99
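The three price points above can be put side by side with a little arithmetic. This is a rough sketch, assuming the $87.99 Amazon.com figure refers to roughly 1 TB of disk (the slide implies this but does not state it); the names `cost_per_tb` and `price_drop_factor` are illustrative only.

```python
# Prices quoted on the slide. The Amazon.com price is ASSUMED to be for ~1 TB.
cost_per_tb = {
    "1980": 14_000_000.00,        # "could cost up to $14M"
    "1998": 15_000.00,            # Page and Brin's credit-card purchase
    "Amazon.com (slide)": 87.99,  # assumed: about 1 TB of disk
}


def price_drop_factor(earlier: float, later: float) -> float:
    """Return how many times more expensive the earlier price was."""
    return earlier / later


baseline = cost_per_tb["Amazon.com (slide)"]
for label, cost in cost_per_tb.items():
    per_gb = cost / 1000  # decimal units: 1 TB = 1000 GB
    factor = price_drop_factor(cost, baseline)
    print(f"{label}: ${cost:,.2f} per TB, ${per_gb:,.2f} per GB, "
          f"{factor:,.0f}x the Amazon.com price")
```

Under that assumption, the 1980 figure works out to roughly 159,000 times the Amazon.com price, and the 1998 figure to roughly 170 times.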
4 
Big Data: From Transactions to Interactions 
[Figure: data volume (gigabytes, terabytes, petabytes, exabytes) plotted against increasing data variety and complexity, with four nested tiers labeled ERP, CRM, WEB, and BIG DATA. Data types shown: purchase detail, purchase record, payment record, segmentation, offer details, customer touches, support contacts, web logs, offer history, A/B testing, dynamic pricing, affiliate networks, search marketing, behavioral targeting, dynamic funnels, user generated content, mobile web, SMS/MMS, sentiment, external demographics, HD video, speech to text, product/service logs, social network, business data feeds, user click stream, behavioral analytics.]
Not Just “Big Data” but All Data
5 
Myriad Data Sources 
According to IDC, 80 percent of enterprise data today is multi-structured data, and that data is growing exponentially, at an annual rate of 60 percent.
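To make the 60 percent figure concrete, the sketch below works out what a constant annual growth rate implies under simple compounding. Only the rate comes from the slide; the function names and the chosen time horizons are illustrative assumptions.

```python
import math

ANNUAL_GROWTH = 0.60  # the 60 percent annual rate attributed to IDC


def growth_factor(years: float, rate: float = ANNUAL_GROWTH) -> float:
    """Multiple of the starting volume after `years` of compound growth."""
    return (1.0 + rate) ** years


def doubling_time_years(rate: float = ANNUAL_GROWTH) -> float:
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + rate)


print(f"Doubling time: {doubling_time_years():.2f} years")
for years in (1, 3, 5):
    print(f"After {years} year(s): {growth_factor(years):.2f}x the starting volume")
```

At this rate the volume doubles in about a year and a half and grows more than tenfold in five years, assuming the rate stays constant.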
6 
Data Growth 
Source: IDC, sponsored by EMC, "As the Economy Contracts, the Digital Universe Expands," May 2009
10^24 bytes: Yottabyte
10^21 bytes: Zettabyte
10^18 bytes: Exabyte
10^15 bytes: Petabyte
10^12 bytes: Terabyte
10^9 bytes: Gigabyte
[Figure labels: Transactions, Interactions]
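The decimal scale above translates directly into a small conversion helper. This is a minimal sketch using the powers of ten from the slide; the helper names `to_bytes` and `convert` are hypothetical, and the sample values (5 petabytes, 235 TB) are figures quoted elsewhere in this deck.

```python
# Decimal (SI) storage units as powers of ten, as listed on the slide.
DECIMAL_EXPONENTS = {
    "gigabyte": 9,
    "terabyte": 12,
    "petabyte": 15,
    "exabyte": 18,
    "zettabyte": 21,
    "yottabyte": 24,
}


def to_bytes(value: float, unit: str) -> float:
    """Convert a quantity in the given unit to bytes."""
    return value * 10 ** DECIMAL_EXPONENTS[unit.lower()]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between two decimal storage units."""
    shift = DECIMAL_EXPONENTS[from_unit.lower()] - DECIMAL_EXPONENTS[to_unit.lower()]
    return value * 10.0 ** shift


print(convert(5, "petabyte", "terabyte"))    # eBay's 5 PB expressed in TB
print(convert(235, "terabyte", "petabyte"))  # 235 TB expressed in PB
```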
7 
235 TB of Data – as of 2011 
“The average company (over 1000 employees) in 14 of 17 sectors stores 
more data than does the US Library of Congress” 
Source: HortonWorks: Apache Hadoop Basics Whitepaper, June 2013
8 
The Teradata Club of Elite Power Players 
Teradata creates elite club for petabyte-plus data 
warehouse customers 
'Petabyte Power Players' includes eBay, Wal-Mart, Bank of America, Dell, unnamed bank 
October 14, 2008 (Computerworld) Teradata Corp. took its second step in two days to reaffirm itself as king of the 
data warehousing mountain, as it announced five customers running data warehouses larger than a petabyte in 
size. At its PARTNERS conference in Las Vegas on Tuesday, the Miamisburg, Oh. vendor said the five members of its 
newly-created 'Petabyte Power Players' club include eBay Inc., with 5 petabytes of data, Wal-Mart Stores Inc., 
which has 2.5 petabytes, Bank of America Corp., which is storing 1.5 petabytes, Dell Inc., which has a 1PB data 
warehouse, and a final bank, with a 1.4PB data warehouse that chief marketing officer Darryl McDonald said he 
couldn't name yet. McDonald said the club should grow quickly as Teradata convinces other petabyte-plus 
enterprises to come forward. However, the many rumored government and military customers that use Teradata 
will remain publicity-shy, he said. Most of the customers have been using Teradata for at least half a decade. Take 
eBay, which started in 2002 with a single 14TB system. Today, it processes 50PB of information each day while 
adding 40TB of auction and purchase data. Not only is the data warehouse large, it is speedy, with eBay doing real-time 
analytics alongside less timely data mining efforts, McDonald said …. 
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9117159
9
Financial, Customer, Transactional Data Most Important to Business Strategy
[Figure: bar chart showing, for each data type, the share of respondents rating it "Very important" or "Important" to business strategy. Data types as listed on the chart: Planning, budgeting, forecasting; Transactional-corporate apps; Customer; Transactional-custom apps; Spreadsheets; Unstructured internal; Product; System logs; Scientific; 3rd party; Partner; Video, imagery, audio; Sensor; Weblogs; Social network; Consumer mobile; Unstructured external.]
Base: 603 global decision-makers involved in business intelligence, data management, and governance initiatives
Source: Forrsights Strategy Spotlight: Business Intelligence And Big Data, Q4 2012
10 
Unified Data Architecture 
[Figure: architecture diagram. Components: Analytic Applications; Visualization & BI; Industry Accelerators; Event Processing; Big Data Architecture; Hadoop; Discovery Platform; Data Warehousing; Application Development; Systems Management; Collaboration; Access Layer; Data Integration and Management.]
11 
What is a Data Warehouse? 
• Subject oriented 
- A model of sales, inventory, finance, etc. with detailed data 
• Integrated 
- Consolidated data from many sources 
- Consistent, standardized data formats and values 
• Nonvolatile 
- Records kept unmodified for long periods of time 
• Time variant 
- Record versions with time stamps or temporal 
• Persistent storage 
- Not virtual, not federated 
Source: Gartner, Of Data Warehouses, Operational Data Stores, Data Marts and Data 'Outhouses', Dec 2005;
Inmon, Building the Data Warehouse, 1992, Wiley and Sons
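Two of the properties above, nonvolatile and time variant, can be illustrated with an append-only history table. The following is a minimal sketch using Python's built-in `sqlite3` module purely as a stand-in for a warehouse; the table `customer_history`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_history (
        customer_id INTEGER NOT NULL,
        city        TEXT    NOT NULL,
        valid_from  TEXT    NOT NULL,  -- ISO date this version took effect
        source      TEXT    NOT NULL   -- originating system (integrated data)
    )
""")


def record_version(customer_id, city, valid_from, source):
    """Nonvolatile: append a new version; never update or delete old rows."""
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?, ?)",
        (customer_id, city, valid_from, source),
    )


def city_as_of(customer_id, as_of_date):
    """Time variant: return the version that was current on a given date."""
    row = conn.execute(
        """SELECT city FROM customer_history
           WHERE customer_id = ? AND valid_from <= ?
           ORDER BY valid_from DESC LIMIT 1""",
        (customer_id, as_of_date),
    ).fetchone()
    return row[0] if row else None


record_version(1, "Dayton", "2012-01-15", "crm")
record_version(1, "San Diego", "2013-06-01", "web")

print(city_as_of(1, "2012-12-31"))
print(city_as_of(1, "2014-01-01"))
```

Because old versions are kept rather than overwritten, the same query can answer both "where is this customer now?" and "where was this customer at the end of 2012?".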
12 
Subject Areas: A Model of ‘Our’ Business 
• Price history
• Point of Sale
• Inventory
• Supplier
• Contracts
• Product/Services
• Labor
• E-Commerce
• Associate
• Channels
• Customer
• Sales transactions
• Carrier Shipment
• Campaigns
• Promotion
• Warehouse
Each subject area has numerous large FACT tables (=big joins)
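The "big joins" remark refers to combining fact tables with the tables that describe them. Here is a toy sketch of that pattern, again using `sqlite3` only as a convenient stand-in; the tables `sales_fact`, `product`, and `store` and their contents are invented for illustration and are not taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, region TEXT);
    -- Fact table: one row per sale, keyed to the descriptive tables.
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product(product_id),
        store_id   INTEGER REFERENCES store(store_id),
        quantity   INTEGER,
        amount     REAL
    );
    INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO store   VALUES (10, 'East'), (20, 'West');
    INSERT INTO sales_fact VALUES
        (1, 10, 3, 30.0), (1, 20, 1, 10.0),
        (2, 10, 2, 50.0), (2, 20, 4, 100.0);
""")


def revenue_by_region_and_product():
    """Join the fact table to both descriptive tables and aggregate."""
    return conn.execute("""
        SELECT s.region, p.name, SUM(f.amount)
        FROM sales_fact AS f
        JOIN product AS p ON p.product_id = f.product_id
        JOIN store   AS s ON s.store_id   = f.store_id
        GROUP BY s.region, p.name
        ORDER BY s.region, p.name
    """).fetchall()


for region, name, revenue in revenue_by_region_and_product():
    print(region, name, revenue)
```

In a real warehouse the fact table would hold billions of rows, which is why join performance matters so much at that scale.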
13
Attributes for Enterprise Class Data Warehousing
High Performance Database: RDBMS with powerful architecture and rich features
High Performance Components: Powerful, robust hardware that supports the most demanding needs
Reliable: No single point of failure
High Availability: Data warehouses are often mission critical
Scalable: Easily expand to meet high growth needs
High Concurrency: 10s to 1000s of concurrent users and multiple applications
Mixed Workloads: Reporting, ad hoc and complex queries on same platform
Secure: Full protection of customer data
Fully Managed: Single point of system operation
Investment Protection: Multiple generations of HW technologies in the same system
Data Center Compliant: Efficient systems that fit the enterprise data center processes
14 
BCBS North Carolina 
http://www.teradata.com/Resources/Videos/Blue-Cross-Blue-Shield-of-North-Carolina-High-Impact-Results-of-a-Data-Driven-Culture/?LangType=1033&LangSelect=true
15 
Why Data Discovery? 
• Discovery as a “process”*: 
– PoC/experimentation (8-10 weeks) 
– Rapid modeling before scaling out on a global basis
– Freedom to experiment without impacting 
production systems 
• Types of discovery analysis: 
– Customer Path 
– Fraud 
– Social Network 
– Attrition 
– Online testing/targeting 
• Go beyond expensive data scientists and 
“democratize” discovery 
[Figures: Customer Paths to Attrition; Fraudulent Paths]
* Content courtesy of Thomas Davenport
16 
If You Know SQL – You Can Do This! 
Some of the 100+ out-of-the-box analytical apps:
Path Analysis: Discover patterns in rows of sequential data
Text Analysis: Derive patterns and extract features in textual data
Statistical Analysis: High-performance processing of common statistical calculations
Segmentation: Discover natural groupings of data points
Marketing Analytics: Analyze customer interactions to optimize marketing decisions
Data Transformation: Transform data for more advanced analysis
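As a rough illustration of what path analysis means ("discover patterns in rows of sequential data"), the sketch below counts which event sequences most often precede a cancellation. It is plain Python rather than the SQL-based functions the slide refers to, and the event log, event names, and the function `paths_ending_in` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (customer, sequence number, event).
events = [
    ("c1", 1, "login"), ("c1", 2, "billing_page"),
    ("c1", 3, "support_call"), ("c1", 4, "cancel"),
    ("c2", 1, "login"), ("c2", 2, "purchase"),
    ("c3", 1, "billing_page"), ("c3", 2, "support_call"), ("c3", 3, "cancel"),
]


def paths_ending_in(rows, terminal_event, length=2):
    """Count the `length` events that immediately precede `terminal_event`."""
    by_customer = defaultdict(list)
    # Order each customer's rows by sequence number to rebuild their path.
    for customer, _seq, event in sorted(rows, key=lambda r: (r[0], r[1])):
        by_customer[customer].append(event)

    counts = Counter()
    for sequence in by_customer.values():
        for i, event in enumerate(sequence):
            if event == terminal_event and i >= length:
                counts[tuple(sequence[i - length:i])] += 1
    return counts


for path, n in paths_ending_in(events, "cancel").most_common():
    print(" -> ".join(path), "-> cancel:", n)
```

In this toy log, both customers who cancel visit the billing page and then call support first, which is the kind of "path to attrition" the discovery slide describes.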
17 
Barnes and Noble 
http://www.teradata.com/Resources/Videos/Data-Driven-Decision-Making/?LangType=1033&LangSelect=true
18 
Architecture Differences – File System vs. Relational Database
• Hadoop • Teradata
19 
What Goes in Hadoop? 
© 2014 Teradata
20 
Benefits of Hadoop 
• Runs on 10 to 4,000 servers 
– Extreme scalability 
• Data analyzed where it is stored 
– Move function to data 
– Don’t move data to the function 
• Use popular developer tools 
– Java, grep, Python, etc.
• Average programmers can do parallel processing
– Millions of Java programmers 
• All open source (free)
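The "move function to data" idea can be sketched locally: the same small function runs against each partition where it lives, and only the compact partial results are combined. This is a simplified single-machine illustration, not Hadoop's actual API; the thread pool stands in for cluster nodes, and the partitions and function names are invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical log lines, already split into partitions as if spread across nodes.
partitions = [
    ["error disk full", "ok"],
    ["ok", "error timeout"],
    ["ok", "ok", "error disk full"],
]


def map_partition(lines):
    """Map step: count words within one partition, next to its data."""
    return Counter(word for line in lines for word in line.split())


def word_count(parts):
    """Run the map step per partition in parallel, then reduce the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, parts))
    # Reduce step: only the small per-partition counters are merged.
    return reduce(lambda a, b: a + b, partials, Counter())


totals = word_count(partitions)
print(totals.most_common())
```

The raw lines never leave their partition; only the per-partition counts travel, which is the property that lets this style of processing scale out across many servers.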
21 
Yahoo! Hadoop Clusters 
• ≈42,000 machines running Hadoop 
• Largest Hadoop clusters are currently 4000 nodes 
• Several petabytes of user data (compressed, unreplicated) 
• Run hundreds of thousands of jobs every month
22
Yahoo! Japan
http://blogs.teradata.com/customers/yahoojapan-increasing-roi-through-predictive-analytics-to-solve-customers-challenges-for-a-better-japan/
23 
How They All Work Together 
[Figure: end-to-end architecture diagram. Labels: Source Data (ERP; CRM; SCM; Images, Audio & Video; Machine Logs, Text, Web, Social); Data Ingest; Data Integration; Data Infrastructure; Data Access; Analytic Users; Marketing; Sales; Customers; Marketing Execution; Campaign Management; Marketing Operations; BI and Visualization; Advanced Analytics; Data Mining; Predictive Models; Reports; Visualization Tools; Teradata Applications; Service Management; Lifecycle Development and Sustainment; Production Support and Operations.]
24 
Verizon Wireless 
http://www.teradata.com/Resources/Videos/Verizon-Wireless-Employing-Unified-Data-Architecture-to-serve-100-million-customers/
25 
Thank You! 
Questions and Answers

