Photo: Courtesy of O'Reilly Conference on Flickr 
How LinkedIn Democratizes Big Data Visualization
Jonathan Wu 
Praveen Neppalli Naga 
Chi-Yi Kuan
•313,000,000 members (end of Q2 2014)
•25,000,000,000 page views (Q2 2014)
•3,000,000+ endorsements
•3,500,000+ companies
What can we do with LinkedIn data?
•Sales
•Talent flow between companies
•Product & engineering
Is it simple? 
Member attributes 
Page View events data
Photo Credit: https://www.flickr.com/photos/johnjoh/1060267344
Data is the new vineyard
Data infra: collect & prepare data 
Collect & Prepare Data (MySQL, Oracle, Kafka + Hadoop) → Serve Data (Pinot) → Taste Data (easy-to-use visualization)
Hadoop
•Ingestion: Data Stores, Kafka, ETL
•Storage: HDFS
•Resource management: YARN
•Data Computation: Map-Reduce, Spark, Tez, Pig, Hive, Cubert
Data infra: Serve data 
Categories of interactive analytics products
Products for members/customers with real-time interactive analytics
•Who's Viewed Your Profile
•Ads Reporting
•Jobs Analytics
Interactive business analytics for internal use
•How feature X is performing
Real-time business monitoring
•Page view changes across mobile devices in different regions
Requirements for real-time interactive analytics
•Slice and dice billions of records across hundreds of dimensions
•End-to-end freshness of minutes, not hours
•Sub-second query response times
e.g., Which regions contribute the most to my profile views? Which industries within those regions?
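To make the example question concrete, here is a minimal in-memory sketch of the two-step drill-down: rank regions by profile views, then break the top regions down by industry. The record fields (`region`, `industry`, `views`) and the function names are hypothetical; a system like Pinot would answer the equivalent group-by queries over billions of records, not over a Python list.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ViewEvent(NamedTuple):
    # Hypothetical profile-view record; field names are illustrative only.
    region: str
    industry: str
    views: int


def top_regions(events: Iterable[ViewEvent], k: int) -> list[tuple[str, int]]:
    """Group by region, sum views, and return the k largest groups."""
    totals: Counter[str] = Counter()
    for e in events:
        totals[e.region] += e.views
    return totals.most_common(k)


def industries_in(events: Iterable[ViewEvent], regions: set[str]) -> dict[str, Counter[str]]:
    """Drill down: within each selected region, sum views per industry."""
    breakdown: dict[str, Counter[str]] = {r: Counter() for r in regions}
    for e in events:
        if e.region in regions:
            breakdown[e.region][e.industry] += e.views
    return breakdown


events = [
    ViewEvent("US", "Software", 40),
    ViewEvent("US", "Finance", 25),
    ViewEvent("IN", "Software", 30),
    ViewEvent("UK", "Finance", 10),
]
top = top_regions(events, k=2)
print(top)  # [('US', 65), ('IN', 30)]
print(industries_in(events, {r for r, _ in top}))
```

Each step is a filter plus a group-by; "slice and dice" amounts to issuing many such queries interactively, which is why sub-second response times matter.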
What is Pinot?
Pinot: a distributed analytics infrastructure that serves interactive analytics products at LinkedIn.
•Data indexes: compressed columnar indexes (memory-mapped or in-memory)
•Distributed system: Apache Helix for cluster management
•Ingestion: Apache Kafka (near real-time) and Hadoop
Data Indexes
Single Value Index
•Fixed bit length encoding
•Sorted Index
•Secondary Sorted Index
Multi Value Index
•Multi-value Fixed bit length encoding
•BitMap Multi-value Index
Inverted Index
•P4Delta
•Modified P4Delta
•BitMap
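Several of the index types above can be illustrated on a toy column. The sketch below dictionary-encodes a single-value column, packs the dictionary ids with a fixed bit width (about log2(cardinality) bits per row, at least 1), builds a bitmap inverted index (a Python int used as a bit set per value), and, for a column sorted by value, a sorted index that stores one contiguous [start, end) row range per value. It is a minimal sketch of the general techniques under these assumptions, not Pinot's on-disk format; all function names are hypothetical, and P4Delta compression is not shown.

```python
from bisect import bisect_left, bisect_right


def build_dictionary(column: list[str]) -> tuple[list[str], list[int]]:
    """Map each distinct value to a small integer id (sorted for determinism)."""
    dictionary = sorted(set(column))
    ids = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [ids[v] for v in column]


def bit_width(cardinality: int) -> int:
    """Bits needed to store ids in [0, cardinality); at least 1."""
    return max(1, (cardinality - 1).bit_length())


def pack_fixed_bits(ids: list[int], width: int) -> int:
    """Pack every id into one big integer, `width` bits per row."""
    packed = 0
    for row, value in enumerate(ids):
        packed |= value << (row * width)
    return packed


def read_fixed_bits(packed: int, width: int, row: int) -> int:
    """Random access: extract the id stored for `row`."""
    return (packed >> (row * width)) & ((1 << width) - 1)


def bitmap_inverted_index(ids: list[int], cardinality: int) -> list[int]:
    """For each dictionary id, a bitmap whose bit r is set if row r has that id."""
    bitmaps = [0] * cardinality
    for row, value in enumerate(ids):
        bitmaps[value] |= 1 << row
    return bitmaps


def sorted_index(sorted_ids: list[int], value_id: int) -> tuple[int, int]:
    """On rows sorted by id, each value occupies one [start, end) row range."""
    return bisect_left(sorted_ids, value_id), bisect_right(sorted_ids, value_id)


column = ["US", "IN", "US", "UK", "IN", "US"]
dictionary, ids = build_dictionary(column)   # ['IN', 'UK', 'US'], [2, 0, 2, 1, 0, 2]
width = bit_width(len(dictionary))           # 2 bits per row for 3 distinct values
packed = pack_fixed_bits(ids, width)
assert [read_fixed_bits(packed, width, r) for r in range(len(ids))] == ids

bitmaps = bitmap_inverted_index(ids, len(dictionary))
us_id = dictionary.index("US")
us_rows = [r for r in range(len(ids)) if (bitmaps[us_id] >> r) & 1]
print(us_rows)  # [0, 2, 5]

print(sorted_index(sorted(ids), us_id))  # (3, 6)
```

The trade-off this shows: the bit-packed forward index gives cheap random access per row, while the bitmap and sorted indexes answer "which rows have value X" without scanning the column.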
Cluster Management 
•Create Resources 
•Update resource metadata
•Expand/Contract partitions dynamically 
•Query Router
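Expanding or contracting a cluster boils down to recomputing which server hosts which data partition (segment), and the query router then consults the inverted mapping. The sketch below assumes simple round-robin placement with a fixed replica count; it does not use Apache Helix's API, and all names are hypothetical.

```python
from collections import defaultdict


def assign_segments(segments: list[str], servers: list[str], replicas: int) -> dict[str, list[str]]:
    """Return segment -> servers, placing `replicas` copies round-robin."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    assignment: dict[str, list[str]] = {}
    for i, segment in enumerate(segments):
        # Consecutive servers (mod the server count) hold the replicas, so copies never collide.
        assignment[segment] = [servers[(i + k) % len(servers)] for k in range(replicas)]
    return assignment


def routing_table(assignment: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the assignment: server -> segments it hosts (what a router consults)."""
    table: dict[str, list[str]] = defaultdict(list)
    for segment, hosts in assignment.items():
        for host in hosts:
            table[host].append(segment)
    return dict(table)


segments = [f"seg_{i}" for i in range(4)]
before = assign_segments(segments, ["server1", "server2"], replicas=2)
after = assign_segments(segments, ["server1", "server2", "server3"], replicas=2)  # expand
print(routing_table(after))
# {'server1': ['seg_0', 'seg_2', 'seg_3'], 'server2': ['seg_0', 'seg_1', 'seg_3'], 'server3': ['seg_1', 'seg_2']}
```

Plain round-robin is easy to reason about but moves many segments when a server is added (compare `before` and `after`); a production rebalancer would try to minimize that movement.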
Data Ingestion 
Kafka for Realtime 
Hadoop for Historical
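When the same metric arrives both from Hadoop (historical) and Kafka (real-time), a query has to avoid counting overlapping data twice. One common approach, assumed here rather than taken from the slides, is a time boundary: historical data answers for times up to the boundary and real-time data for times after it. The daily-count inputs and the boundary rule below are illustrative only.

```python
from typing import Mapping


def hybrid_total(
    offline: Mapping[int, int],   # day -> count from Hadoop-built (historical) data
    realtime: Mapping[int, int],  # day -> count from Kafka-consumed (real-time) data
    start: int,
    end: int,
) -> int:
    """Sum counts over days [start, end], taking each day from exactly one source."""
    # Assumed boundary: the last day present in the historical data.
    boundary = max(offline, default=start - 1)
    total = 0
    for day in range(start, end + 1):
        source = offline if day <= boundary else realtime
        total += source.get(day, 0)
    return total


offline = {1: 100, 2: 120, 3: 90}
realtime = {3: 85, 4: 110, 5: 40}    # day 3 overlaps; the historical value is used
print(hybrid_total(offline, realtime, 1, 5))  # 460
```

Preferring historical data for overlapping periods lets a later Hadoop load correct (reconcile) whatever the real-time path produced for those days.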
High Level Architecture
•Ingestion into Pinot: Hadoop (historical) and Kafka (real-time)
•Cluster manager: Controller, using Helix and ZooKeeper
•Brokers (Broker 1, Broker 2) route queries to servers (Server 1, Server 2, Server 3)
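The broker/server split can be sketched as scatter-gather: the broker sends the query to every server, each server aggregates over its own data, and the broker merges the partial results. The server names and rows below are hypothetical, and threads stand in for network calls.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each server holds (region, views) rows for its own segments.
SERVER_DATA: dict[str, list[tuple[str, int]]] = {
    "server1": [("US", 40), ("IN", 10)],
    "server2": [("US", 25), ("UK", 10)],
    "server3": [("IN", 20), ("UK", 5)],
}


def server_group_by(server: str) -> Counter[str]:
    """Runs on one server: partial SUM(views) GROUP BY region over local rows."""
    partial: Counter[str] = Counter()
    for region, views in SERVER_DATA[server]:
        partial[region] += views
    return partial


def broker_query(servers: list[str], k: int) -> list[tuple[str, int]]:
    """Broker: scatter the query to servers in parallel, merge partials, take top k."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partials = list(pool.map(server_group_by, servers))
    merged: Counter[str] = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts, so partial SUMs combine correctly
    return merged.most_common(k)


print(broker_query(["server1", "server2", "server3"], k=2))  # [('US', 65), ('IN', 30)]
```

Merging the full partial groups before taking the top k keeps the answer exact; trimming each server's result to its own top k first would be cheaper but can miss groups that are only large in aggregate.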
Core Features 
Low latency and high QPS OLAP Queries with real-time ingestion 
Support complex dimensions 
Operational simplicity 
Data bootstrapping & reconciliation
Usage @ LinkedIn
•About 18 member-facing products on LinkedIn.com
•Internal reporting
•Open source: coming soon
Reporting UI: serve & taste data 
Business need
"I want to access big data without running SQL"
Start a new dashboard with one click
Select what metrics/dimensions you want
Charts are rendered in just a few seconds
Zoom into a single chart
Filter on various dimensions
Access everywhere
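Even when users never write SQL, their clicks (metrics, dimensions, filters) still have to become a query behind the scenes. The slides do not describe the query language or API the reporting UI uses, so the sketch below assumes a hypothetical `ChartSpec` and emits a generic SQL-style aggregation string.

```python
from dataclasses import dataclass, field


@dataclass
class ChartSpec:
    # Hypothetical description of one chart built by clicking in the UI.
    table: str
    metrics: list[str]                  # columns to SUM
    dimensions: list[str]               # columns to GROUP BY
    filters: dict[str, list[str]] = field(default_factory=dict)  # column -> allowed values


def quote(value: str) -> str:
    """Single-quote a literal, doubling embedded quotes (standard SQL escaping)."""
    return "'" + value.replace("'", "''") + "'"


def to_query(spec: ChartSpec) -> str:
    """Build the query; column names are assumed to come from the UI's own schema list."""
    select = spec.dimensions + [f"SUM({m})" for m in spec.metrics]
    sql = f"SELECT {', '.join(select)} FROM {spec.table}"
    if spec.filters:
        clauses = [
            f"{col} IN ({', '.join(quote(v) for v in values)})"
            for col, values in spec.filters.items()
        ]
        sql += " WHERE " + " AND ".join(clauses)
    if spec.dimensions:
        sql += " GROUP BY " + ", ".join(spec.dimensions)
    return sql


spec = ChartSpec(
    table="pageViews",
    metrics=["views"],
    dimensions=["region"],
    filters={"device": ["iOS", "Android"]},
)
print(to_query(spec))
# SELECT region, SUM(views) FROM pageViews WHERE device IN ('iOS', 'Android') GROUP BY region
```

Zooming into a chart or adding a filter then only changes the `ChartSpec`; the UI regenerates the query rather than asking the user to edit it.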
Enterprise analytics portal
A portal that connects dashboards, internal reports, and internal wiki pages
Summary
•Scale of the data
•Pinot for interactive analysis
•Self-service visualization for insights
We are hiring 
Jonathan Wu: www.linkedin.com/in/jiyewu, jowu@linkedin.com
Praveen Neppalli Naga: www.linkedin.com/in/pneppalli, pneppalli@linkedin.com
Chi-Yi Kuan: www.linkedin.com/in/chiyikuan, ckuan@linkedin.com
650-605-2184 
650-962-3299 
650-426-6301