Big Data EcoSystem and Analytics @ LinkedIn
May 16, 2013
LinkedIn Confidential ©2013 All Rights Reserved
Srinu Adira
Manager, Data Services (Business Solutions)
LinkedIn Corporation
http://www.linkedin.com/in/srinuadira
Outline
LinkedIn Overview
Why Data is important for LinkedIn?
Big Data Ecosystem
Analytics at LinkedIn
Our Mission
Connect the world’s professionals
to make them more productive and successful
The LinkedIn Opportunity
Connect talent with opportunity at massive scale
+
Fundamentally transforming the way the world works
The World’s Largest Professional Network
200M+ members
Chart: LinkedIn Members (Millions), 2006 to 2012; plotted values 8, 17, 32, 55, 90, 147
88% of Fortune 100 Companies use LinkedIn to hire
~2/sec New Members joining
>2.9M Company Pages
~5.7B Professional searches in 2012
Outline
• LinkedIn Overview
• Why Data is important for LinkedIn?
• Big Data Ecosystem
• Analytics at LinkedIn
“If you are not embarrassed by the first version
of your product, you have launched it too late.”
Reid Hoffman, Founder & Chairman LinkedIn Corp
“What gets measured gets fixed.”
David Henke, SVP Technology Operations, LinkedIn Corp
The Power of LinkedIn’s Network Effects
Member growth and engagement
Relevant and valuable products, solutions & services
Critical mass of data
• Few Data Driven Products
  • People You May Know
  • Groups You May Like
  • Jobs You May Be Interested In
  • Who's Viewed Your Profile
  • Companies You May Want To Follow
Data Insights (Sample)
Data Solutions (Sample)
Java/MPP/Hadoop
ML/Statistical Packages
Hadoop/MPP
MPP
• Data Solutions Drivers
  • Business analytics (e.g., data mining, enable decision making)
  • Sales analytics (e.g., customer segmentation, targeting)
  • Marketing (e.g., campaigns)
  • Data insights for Customers (e.g., Career site analytics)
  • Business Operations (forecasting, business pulse)
Outline
• LinkedIn Overview
• Why Data is important at LinkedIn?
• Big Data Ecosystem
• Analytics at LinkedIn
Big Data at LinkedIn
* Chart from Philip Russom, Research Director, TDWI
Big Data at LinkedIn
• Platform and solutions that
  • Scale at cost with data complexity
  • Simplify the data continuum across online, near-line and offline
  • Enable business decisions
What does “big data” mean at LinkedIn?
Chart axes: Data Volume (0 to +∞); Analytical Challenge & Complexity (0 to +∞)
Data types plotted: ERP data, CRM data, Web data, Social data
3 major data dimensions at LinkedIn
Identity Data
Social Data
Behavioral Data
Big Data at LinkedIn
High-level data environment
Diagram components: Users, Application, Online Data Store, Near-Line Data Store, Web Logs, Offline Data Store
Challenges so complex that off-the-shelf or a few technologies can’t address
Built our own combination of toolsets/technologies to meet specific requirements
LinkedIn’s Sample Data Stack
Let’s do a deep dive to understand how the capabilities of
LinkedIn’s data stack meet our requirements
LinkedIn Data Stack – Online
Capabilities
• Rich structures (e.g., indexes)
• Change capture capability
LinkedIn Data Stack – Nearline
Systems: Voldemort, Zoie, Bobo, Sensei, D-Graph
Capabilities
• Distributed key-value store
• Search platform
• Distributed graph engine
LinkedIn Data Stack – Pipeline
Capabilities
• Messaging for site events, monitoring
• Change data capture streams
• Reliable, consistent, low latency pipe
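The "change data capture streams" capability above can be illustrated with a minimal sketch. It assumes an append-only log that a downstream store replays from its last applied offset. The names `ChangeLog` and `OfflineStore` are hypothetical and exist only for this illustration. This is plain Python, not the API of Kafka, Databus, or any LinkedIn system.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class ChangeLog:
    """Append-only log of change events (hypothetical, for illustration only)."""
    events: List[Tuple[str, Any]] = field(default_factory=list)

    def append(self, key: str, value: Any) -> int:
        """Record a change and return its offset in the log."""
        self.events.append((key, value))
        return len(self.events) - 1

    def read_from(self, offset: int) -> List[Tuple[str, Any]]:
        """Return every event at or after the given offset."""
        return self.events[offset:]


@dataclass
class OfflineStore:
    """Consumer that replays the log into its own copy of the data."""
    data: Dict[str, Any] = field(default_factory=dict)
    next_offset: int = 0  # offset of the first event not yet applied

    def catch_up(self, log: ChangeLog) -> int:
        """Apply unseen events in log order; return how many were applied."""
        pending = log.read_from(self.next_offset)
        for key, value in pending:
            self.data[key] = value  # later events overwrite earlier ones
        self.next_offset += len(pending)
        return len(pending)


# Made-up events: two profile changes, a sync, then one more change.
log = ChangeLog()
log.append("member:1", {"title": "Engineer"})
log.append("member:2", {"title": "Analyst"})

store = OfflineStore()
applied_first = store.catch_up(log)

log.append("member:1", {"title": "Manager"})
applied_second = store.catch_up(log)
```

Because the consumer tracks its own offset, it can fall behind and later catch up without the producer knowing about it, which is one common way to decouple online writes from offline processing.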
LinkedIn Data Stack – Offline
Capabilities
• Machine learning, ranking, relevance, solutions
• Warehouse and analytics
LinkedIn with Hadoop, Aster, and Teradata
Aster/Teradata Bi-Directional Connector
Aster/Teradata Hadoop Connectors

Data transformation & batch processing
• Image processing
• Search indexes
• Graph (PYMK)
• MapReduce
Batch data transformations for engineering groups using HDFS + MapReduce

Analytic Platform for data discovery
• nPath Pattern/Path
• Clickstream analysis
• A/B site testing
• Data Sciences discovery
• SQL-MapReduce
Interactive MapReduce analytics for the enterprise using MapReduce Analytics & SQL-MapReduce

Integrated Data Warehouse
• Exec Dashboards
• Adhoc/OLAP
• Complex SQL
• SQL
Integration with structured data, operational intelligence, scalable distribution of analytics
Outline
• LinkedIn Overview
• Why Data is important at LinkedIn?
• Big Data Ecosystem
• Analytics at LinkedIn
Several examples of business analytics evolution at LinkedIn
1. Products
2. Marketing
3. Sales
How we leverage data to support Marketing
Identity Data
Social Data
Behavioral Data
Overall Audience
Target Audience
The closed-loop analytical framework
Test: Execution
Measure: Reporting & business intelligence
Why?: Post campaign analysis
Predict: Model building and tuning
Design: Campaign planning & design
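The Measure step of this loop often reduces to comparing a new campaign against a baseline. The sketch below shows one common way to express that comparison as a percentage lift. The function names and all numbers are assumptions made up for illustration; they are not taken from LinkedIn's campaigns or tooling.

```python
def conversion_rate(conversions: int, recipients: int) -> float:
    """Fraction of recipients who converted."""
    if recipients <= 0:
        raise ValueError("recipients must be positive")
    return conversions / recipients


def lift_percent(treatment_rate: float, control_rate: float) -> float:
    """Relative improvement of treatment over control, in percent."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (treatment_rate - control_rate) / control_rate * 100.0


# Made-up example: the new model is sent to half the usual volume.
old_rate = conversion_rate(conversions=50, recipients=10_000)
new_rate = conversion_rate(conversions=100, recipients=5_000)
lift = lift_percent(new_rate, old_rate)
```

With these made-up inputs the lift comes out at roughly 300%. Comparing rates rather than raw counts is what makes a campaign sent at half volume comparable to the regular one.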
An example of using data to improve sales
Step 1: Which account?
Step 2: Who?
Step 3: How?
Identity Data
Social Data
Behavioral Data
How to provide 500 to 1000X impact?
Insights portal for sales org.
1. Easy: quickly find right info
2. Fast: few seconds response time for most insights
3. Scalable: 2M+ accounts/prospects
4. Accurate: mimic analyst/data scientist
Four stages of data analytics
Chart axes: Business Value (0 to High); Analytical Challenge & Complexity (0 to High)
Stages plotted: What happened? Why it happened? What is happening? What will happen?
Use data to solve product problems
-- A solution for answering A/B testing questions
Several thousand A/B tests are live; how to measure the performance?
1. Let technology work for us
2. Results first, methodology later
3. Bypass the charts and reports
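One way to read "results first, methodology later" is to score every live test automatically and surface only the verdicts. The sketch below assumes a standard two-proportion z-test on conversion counts. The slides do not say which statistical method LinkedIn used, so this is a stand-in technique. The test names and counts are hypothetical.

```python
import math
from typing import Dict, Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B convert equally."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants share one rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical live tests: (conversions A, users A, conversions B, users B).
live_tests: Dict[str, Tuple[int, int, int, int]] = {
    "new_headline": (200, 10_000, 260, 10_000),
    "button_color": (200, 10_000, 205, 10_000),
}

# Report only the tests whose difference is significant at the 5% level.
significant = sorted(
    name for name, counts in live_tests.items()
    if two_proportion_z_test(*counts)[1] < 0.05
)
```

Running the same function over every live test gives a short list of winners without anyone reading a chart. A real system at this scale would also need to account for multiple comparisons, which this sketch ignores.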
Nextplay : Web 3.0 – It’s all about data!!
We are hiring!
Thank you!
sadira@linkedin.com



Editor's Notes

  • #2: Good Morning! My Name is
  • #3: I am Srinu Adira, and I manage Business Solutions at LinkedIn. I am primarily responsible for providing and enabling data solutions for iterative business decisions. Today I am going to talk about the big data ecosystem at LinkedIn. We manage hundreds of terabytes of data by leveraging scalable infrastructure.
  • #5: LinkedIn’s mission is to connect the world’s professionals to make them more productive and successful. There are a total of 640M professionals in the world. As of last week, we have 200M registered members from all over the world. LinkedIn operates the world’s largest professional network on the Internet, in over 200 countries and territories.
  • #6: LinkedIn believes in connecting talent with opportunity at massive scale. LinkedIn relies heavily on scalable infrastructure to track user behavior. We believe in transforming the way the world works.
  • #7: LinkedIn's member base grew gradually during the initial years. For the past 3 years, growth has been phenomenal. Yup, we have reached the 200M mark. The current rate is approximately 2 new members per second, and over 2M companies have a presence on LinkedIn. More than 80% of Fortune 100 companies use LinkedIn to hire professionals. Approximately 5.7B searches were done on LinkedIn in 2012. The bottom line is that we are growing fast and expanding the professional network.
  • #8: Why is data important for LinkedIn? Data is everywhere, and LinkedIn believes in data.
  • #9: Our founder, Reid Hoffman, believes in constant iteration on product improvements. Our site reflects the same belief as we keep adding new features.
  • #10: Of course, our SVP is a firm believer in data: to fix something, we need to measure it. That’s where data plays a critical role. To extend it further, what gets measured gets improved. What gets analyzed gets monetized.
  • #11: How does LinkedIn’s network impact our members? LinkedIn analyzes large amounts of data on a daily basis. This analysis results in relevant and valuable products, business solutions and services. These improvements in products and services are reflected in member growth and engagement, which in turn generates more data. The cycle of improvements continues.
  • #12: Who are the main drivers behind these solutions? Our business analytics teams leverage these solutions to measure, analyze and predict growth. Our sales analytics teams use the results of these analyses to improve the sales cycle. Our marketing teams use them to fine-tune campaigns such as emails. Talent Connect leverages them to match jobs to members and vice versa. Last but not least, our business operations teams leverage these solutions to assess the pulse of our business.
  • #13: To enable business decisions, data insights provide much-needed analysis. For example, typical company-level details such as total members, connections and employees viewed give a good overview of growth and engagement on LinkedIn for a given company. As another example, the company cloud shows employee inflow and outflow. This kind of analysis equips our business with the data input it needs to improve further.
  • #14: What kind of solutions do we generate? Looking at this cycle, as step 1 we focus on segmenting our members with the help of standardization. This segmented data can be used for propensity modeling to fine-tune our target audience. In turn, this model helps in targeting our member base with a richer usage experience. Besides targeting, we leverage data to produce business forecasts that help our sales teams. We also analyze churn and calculate the lifetime value of a customer to improve our customer base. For our solutions, we leverage tools and technology extensively: MPP systems such as Teradata and Aster, and distributed systems such as Hadoop for storage and processing. Besides MPP and Hadoop, Java and machine learning technologies are used for various processes.
  • #15: Who are the main drivers behind these solutions? Our business analytics teams leverage these solutions to measure, analyze and predict growth. Our sales analytics teams use the results of these analyses to improve the sales cycle. Our marketing teams use them to fine-tune campaigns such as emails. Talent Connect leverages them to match jobs to members and vice versa. Last but not least, our business operations teams leverage these solutions to assess the pulse of our business.
  • #16: Now I will spend some time going over our big data ecosystem.
  • #17: We broadly divide our data challenges into three dimensions: volume, variety and velocity. This chart is sourced from TDWI, courtesy of Philip Russom. About volume: we process terabytes of data in the form of records, transactions, tables and files. About variety: we process various kinds of data, such as structured data (database tables), unstructured data (such as some tracking data) and semi-structured data. About velocity: our infrastructure is developed to incorporate various kinds of data streams, such as batch (files), near time (tracking data) and real time (transactional data), besides streams.
  • #18: To accommodate accelerating volumes and increasing variety and velocity, we are building platforms and solutions that can scale, simplify and enable business decisions.
  • #19: ERP data: transactional data, information. CRM: marketing, campaigns, usage, engagement. Web: engagement, pathing. Social data. What happened? (BI and reporting). Analysis. Real-time monitoring of the key business trends. Predictive analysis. Teradata too small.
  • #20: 3 major dimensions of data empower analytics. Behavioral data: site engagement, OL transactions, searches, navigation paths, RFM, comments, discussions, …. Demographic data: location, gender, title, function, seniority, education, …. Social data: connections, co-views, sentiment tracking, NPS, follows, endorsements, forwards, comments, shares, ….
  • #21: In the high-level data flow architecture, user interaction with the application generates various data sets: near-line lookup data, and an online data store that maintains user transactions such as profile information. Offline data is also generated in the form of web logs. In turn, all these data sets are centralized in the offline data store. As you can see from this high-level data flow, no single tool or technology can handle these needs. Hence, we have to build our own combination of tools and technologies to meet specific requirements.
  • #22: What do we use for the data stack? As you can see from this slide, we use a mix of commercial tools such as Teradata and Oracle and open source technologies such as Hadoop, Kafka and Voldemort. On the next slides we will look into where and how these tools are used.
  • #23: LinkedIn leverages and builds tools, and contributes to open source. Transactional data such as member profile data is maintained in Oracle and Espresso.
  • #24: For nearline, LinkedIn leverages Voldemort as a distributed key-value store, whereas D-Graph is used as a distributed graph engine.
  • #25: For pipelines, we leverage Kafka and Databus to transport data from the online stores and web logs to the offline data store.
  • #26: For data analysis and reporting, we leverage Hadoop and Teradata systems. Teradata and Hadoop are used for processing large data sets to enable machine learning and analytics.
  • #28: Now I will spend some time going over our big data ecosystem.
  • #30: 3 major dimensions of data empower analytics. Behavioral data: site engagement, OL transactions, searches, navigation paths, RFM, comments, discussions, …. Demographic data: location, gender, title, function, seniority, education, …. Social data: connections, co-views, sentiment tracking, NPS, follows, endorsements, forwards, ….
  • #31: Many companies are stumbling blindly into social media marketing without a measurement strategy. Measurement is king. The first new-model email campaign was launched using half of the regular campaign volume. Within 2 hours, it triggered the NOC alerting system because order volume doubled on a week-over-week basis. Within 9 hours, the new model had already surpassed the old model, which had been running for 14 days, on new sub acquisitions with a 7-day reminder email. Within 7 days, we saw 300+% lift from the Gen model and 480+% lift from the Sales model. The email open rate of the new model outperformed the old model. So far, we have not seen any increase in the opt-out rate. In fact, we have observed a slight decrease in the opt-out rate at a high level.
  • #32: 2M companies. We have thousands of sales people. How to prioritize sales? We predict which account, and how much revenue they will spend. We predict who within the account can make the decision; it is a mix of behaviors and engagement. We predict who within LinkedIn has the highest likelihood of closing the deal, and who can be leveraged to close the deal.
  • #33: 4 principles at LinkedIn
  • #34: 4 phases of analytics. We’d like to predict the future.
  • #35: 4 principles at LinkedIn
  • #36: As our CEO says, always look for nextplay! Yes! Web 3.0 is all about data!!!
  • #37: Thank You all! We are growing and need more professionals! We are hiring! Please reach out to me if you have questions!