An Introduction to
BIG DATA
CUSO Seminar on Big Data
Prof. Dr. Philippe Cudré-Mauroux
http://exascale.info
May 22, 2014
Fribourg–Switzerland
On the Menu Today
• Big Data: Context
• Big Data: Buzzwords
– 3 Vs of Big Data
• Big Data Landscape
• Hadoop
• Big Data in Switzerland
Instant Quiz
• 3 Vs of Big Data?
• CAP?
• Hadoop?
• Spark?
Exascale Data Deluge
• Science
– Biology
– Astronomy
– Remote Sensing
• Web companies
– eBay
– Yahoo
• Financial services, retail companies,
governments, etc.
© Wired 2009
➡ New data formats
➡ New machines
➡ Peta & exa-scale datasets
➡ Obsolescence of traditional
information infrastructures
The Web as the Main Driver
© Qmee
Big Data Central Theorem
Data + Technology ➡ Actionable Insight ➡ $$
Big Data Buzz
Between now and 2015, the firm expects big data to
create some 4.4 million IT jobs globally; of those,
1.9 million will be in the U.S. Applying an economic
multiplier to that estimate, Gartner expects each new
big-data-related IT job to create work for three more
people outside the tech industry, for a total of almost
6 million more U.S. jobs.
Growth in the Asia Pacific Big Data
market is expected to accelerate rapidly
in two to three years time, from a mere
US$258.5 million last year to in excess
of $1.76 billion in 2016, with highest
growth in the storage segment.
Big Data as a New Class of Asset
• The Age of Big Data (NYTimes Feb. 11, 2012)
http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html
“Welcome to the Age of Big Data. The new megarich of Silicon Valley, first at
Google and now Facebook, are masters at harnessing the data of the Web
— online searches, posts and messages — with Internet advertising. At the
World Economic Forum last month in Davos, Switzerland, Big Data was a
marquee topic. A report by the forum, “Big Data, Big Impact,” declared data
a new class of economic asset, like currency or gold.”
The 3 Vs of Big Data
• Volume
– Amount of data
• Velocity
– Speed of data in and out
• Variety
– Range of data types and sources
• [Gartner 2012] "Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to
enable enhanced decision making, insight discovery and process
optimization"
What can you do with the data?
• Reporting
– Post Hoc
– Real time
• Monitoring (fine-grained)
• Exploration
• Finding Patterns
• Root Cause Analysis
• Closed-loop Control
• Model construction
• Prediction
• …
© Mike Franklin
10 ways big data changes everything
• Some concrete examples
– http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/2/
1. Can gigabytes predict the next Lady Gaga?
2. How big data can curb the world’s energy consumption
3. Big data is now your company’s virtual assistant
4. The future of Foursquare is data-fueled recommendations
5. How Twitter data-tracked cholera in Haiti
6. Revolutionizing Web publishing with big data
7. Can cell phone data cure society’s ills?
8. How data can help predict and create video hits
9. The new face of data visualization
10. One hospital’s embrace of big data
Typical Big Data Success Story
• Modeling users through Big Data
– Online ads sale / placement [e.g., Facebook]
– Personalized Coupons [e.g., Target]
– Product Placement [e.g., Walmart]
– Content Generation [e.g., Netflix]
– Personalized learning [e.g., Duolingo]
– HR Recruiting [e.g., Gild]
More Data => Better Answers?
• Not that easy…
• More Rows: Algorithmic complexity kicks in
• More Columns: Exponentially more hypotheses
• Another formulation of the problem:
– Given an inferential goal and a fixed computational budget,
provide a guarantee that the quality of inference will increase
monotonically as data accrue (without bound)
• In other words:
=> Data should be a resource, not a load
© Mike Jordan
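One way to make "exponentially more hypotheses" concrete is to count candidate hypotheses as a function of the number of columns. The sketch below assumes, purely for illustration, that a hypothesis is any non-empty subset of columns that might jointly explain an outcome; the slide does not define the term, and the function names are hypothetical.

```python
from math import comb


def subset_hypotheses(num_columns: int) -> int:
    """Count non-empty subsets of columns: 2**d - 1 (assumed definition)."""
    return 2 ** num_columns - 1


def pairwise_hypotheses(num_columns: int) -> int:
    """Count unordered column pairs, a much smaller polynomial family."""
    return comb(num_columns, 2)


# Doubling the number of columns squares (roughly) the subset count,
# while the pairwise count only grows about fourfold.
for d in (10, 20, 40):
    print(d, pairwise_hypotheses(d), subset_hypotheses(d))
```

Under this assumption, adding rows grows the work per hypothesis, while adding columns grows the number of hypotheses to test, which is why a fixed computational budget runs out quickly.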
Big Data Infrastructures
A Concrete Example: Zynga
Leading the Pack of Wolves: Hadoop
• Google: MapReduce paper published 2004
• Open source variant: Hadoop
• MapReduce = high-level programming model and
implementation for large-scale parallel data processing
• Right now the most overhyped system in CS
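The programming model can be sketched with a word count, the usual introductory example. This is a minimal single-process illustration in plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the map and reduce steps in parallel across machines and handle the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce sequentially over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data is big", "data is a resource"])
print(counts)
```

The user writes only the map and reduce functions; because each map call sees one record and each reduce call sees one key, the framework is free to distribute both steps.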
What about Swiss Big Data?
• Competitive Research Groups
• Swiss Big Data User Group
• Swiss companies playing catch-up
– Productized Big Data systems at leading telcos & financial
companies
– Big Data is not a new technology: it's a fact
• Deal with it ➡ POCs in most banks, insurance companies, retailers
Tasty Bites of Big Data (1)
Thursday afternoon
• 13:30-15:00: Big Data Profiling
Felix Naumann (Hasso Plattner Institute)
• 15:15-16:45: Realtime Analytics
Christoph Koch (EPFL)
• 16:45-17:45: Current Trends and Challenges in Big Data
Benchmarking
Kai Sachs (SAP / SPEC)
Tasty Bites of Big Data (2)
Friday
• 9:00 - 10:30: Structured Data in Web Search
Alon Halevy (Google)
• 10:45 - 12:15: Human Computation for Big Data
Gianluca Demartini (UNIFR)
• 13:30-15:00: Analysing and Querying Big Scientific Data
Thomas Heinis (EPFL)
• 15:00-16:30: The Evolution of Big Data Frameworks
Carlo Curino (Microsoft Research)
Social Event, Friday – Beer Tasting!
Basse-Ville Fribourg / 15 CHF per Person
Everything You Always Wanted to Know About Beer* (*But Were Afraid to Ask!)
18:00 @ Café du Belvédère, Grand-Rue 36
19:00 @ Fri-Mousse, Rue de la Samaritaine 19
Limited places; registration is mandatory at:
http://xr.si
