Albert Wong - Manager Reporting Platform
Big Data Visual Analytics Oct 2013
Netflix Case Study
Netflix Data Platform
2013 Netflix / MicroStrategy / Amazon AWS Webinar
Production
S3
Query
Cloud Apps
Bonus
DSE Platform
S3
Honu
S3
Honu
DSE Platform
Processing
Honu
S3
DSE Platform
S3
Honu
DSE Platform
S3
Honu
Hive Server
Demo
Twitter - albertcwong
select
    dateint,
    hour,
    count(distinct other_properties['CUSTOMER_ID']) signups
from default.start_membership_event
where other_properties['SIGNUP_COUNTRY'] = 'NL'
  and other_properties['IS_TESTER'] = 'false'
  and ( dateint >= 20130911 OR (dateint = 20130910 AND hour >= 22) )
  and ( dateint <= 20130916 OR (dateint = 20130917 AND hour < 22) )
group by
    dateint,
    hour
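
The two date conditions in the query above select a half-open window from 2013-09-10 22:00 up to (but not including) 2013-09-17 22:00, expressed against the `dateint` and `hour` columns. As a rough sketch of how such a predicate could be generated rather than hand-written, the Python below builds an equivalent fragment. The function names `window_predicate` and `in_window` are hypothetical, and the sketch assumes `dateint` is an integer in yyyymmdd form and `hour` is an integer from 0 to 23, which the slides do not state explicitly.

```python
from datetime import datetime


def _key(ts: datetime) -> tuple:
    """Convert a timestamp to the (dateint, hour) pair used by the table (assumed layout)."""
    return int(ts.strftime("%Y%m%d")), ts.hour


def window_predicate(start: datetime, end: datetime) -> str:
    """Build a WHERE fragment selecting start <= (dateint, hour) < end.

    For integer dateint values, `dateint > D` is equivalent to the
    `dateint >= D+1` form used in the slide's query.
    """
    s_day, s_hour = _key(start)
    e_day, e_hour = _key(end)
    return (
        f"( dateint > {s_day} OR (dateint = {s_day} AND hour >= {s_hour}) )\n"
        f"and ( dateint < {e_day} OR (dateint = {e_day} AND hour < {e_hour}) )"
    )


def in_window(dateint: int, hour: int, start: datetime, end: datetime) -> bool:
    """Pure-Python check of the same condition, useful for sanity-testing the SQL."""
    return _key(start) <= (dateint, hour) < _key(end)


if __name__ == "__main__":
    # The window from the demo query: 22:00 on 10 Sep 2013 to 22:00 on 17 Sep 2013.
    print(window_predicate(datetime(2013, 9, 10, 22), datetime(2013, 9, 17, 22)))
```

Comparing `(dateint, hour)` pairs as tuples mirrors the nested OR/AND logic of the SQL, so the Python check can be used to confirm which partition hours fall at the edges of the window.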

Editor's Notes

  • #2: Hi, my name is Albert Wong. I am in charge of the reporting platform at Netflix. Today we'll look at how we've set up EMR, give an overview of where it's used in our data platform, and go over how MicroStrategy plugs into this architecture. Lastly, we'll end with a demo illustrating how we get data from EMR to MSTR.
  • #3: If you are new to Netflix, we are a TV/movie content streaming business. When you log in, we display content you can watch on demand. If you're a kid, we provide a section for kids' content next to 'Watch Instantly' at the top. We also have a DVD service. Now, clicking on one of the shows below does two things. First, you get to watch the show. Second, we, at the data platform, get data to analyze.
  • #4: Simplified view of our data pipeline.
  • #5: Full view of our data platform. Complicated at first glance, but easy to understand when we take the time to break it down. Let's break it down.
  • #6: Data platform when we were mainly a DVD company
  • #7: S3 is an effectively infinite hard drive. Low cost to store our data reliably. The name stands for Simple Storage Service. So what do we store?
  • #8: Honu is our event data pipeline. We get streaming events (e.g. when you start a movie/show, stop, pause, resume). It branched off a similar technology called Chukwa, conceived at Yahoo!, and was then modified to meet our needs.
  • #9: Cassandra is where we store our dimension data (e.g. titles, user accounts). It is an open source, distributed DBMS that reads quickly and writes quickly. And we've replaced Oracle with it.
  • #10: Review. Keep in mind, we eventually want this data to be reported out of MSTR.
  • #11: We needed something to process our increased data volumes, and Hadoop fit the bill. It's a framework for processing large data sets. A full discussion of how Hadoop works is out of scope for today, but one thing to highlight is that it's designed to scale: if we need to process more data, we just add more servers. And EMR allows us to add more servers with relative ease.
  • #12: Pig is an interface that makes it easy to write code to be executed within the Hadoop framework to extract data. Python is a high-level programming language that we use to aid in data transformation.
  • #13: Hive, like Pig, makes it easy to write code that executes within the Hadoop framework. The language resembles SQL. We use it for ad hoc queries and for creating aggregate/summary tables.
  • #14: Review once again: Cassandra and Honu data land in S3. For Cassandra, we have an intermediary data extraction step before S3. On the right, we have Hive, Pig, and Python being used to process and aggregate data for reporting. We then move that down to Teradata and then into MicroStrategy for reporting. That's a lot of steps. What if we want to just explore data without processing and skip the ETL process?
  • #15: Spin up a Hive server in EMR. Configure MicroStrategy to talk to it. Query data directly out of AWS.