Developing Data Products
Uber Tech Talk
Pete Skomoroch @peteskomoroch
December 5, 2012


©2012 LinkedIn Corporation. All Rights Reserved.
Examples, Techniques, & Lessons Learned

Developing Data Products
Our Mission
Connect the world’s professionals to make them more productive and successful.

Our Vision
Create economic opportunity for every professional in the world.

Members First!

LinkedIn is the leading professional network site

187M+ LinkedIn Members
640M+ Worldwide Professionals
3,300M+ Worldwide Workforce

LinkedIn profiles represent our professional identity

187M Members, 187M Member Profiles

We have a lot of data.
  And (like everyone else), we store it in Hadoop.
  And people build awesome things with that data.

What do we mean by data products?
Building products from data at LinkedIn

A few examples:

    People You May Know
    Skills and Endorsements
    Year in Review
    Network Updates Digest
    InMaps
    Who’s viewed my profile
    Collaborative Filtering
    Groups You May Like
    and more…

Collaborative Filtering: LinkedIn Skill Pages

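A common way to compute the "related skills" shown on a skill page is item-to-item collaborative filtering over the skills members list together. The sketch below is one minimal version, assuming toy in-memory skill sets per member; it scores each skill pair by the cosine similarity of the two skills' member sets. The function name `related_skills` and the data are illustrative only, not LinkedIn's implementation.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def related_skills(member_skills, top_k=3):
    """Item-to-item collaborative filtering over member skill sets.

    member_skills: list of sets, one set of skill names per member (toy data).
    Returns {skill: [(other_skill, score), ...]} ranked by the cosine
    similarity of the two skills' member sets.
    """
    skill_count = Counter()  # number of members listing each skill
    pair_count = Counter()   # number of members listing both skills of a pair
    for skills in member_skills:
        skill_count.update(skills)
        for a, b in combinations(sorted(skills), 2):
            pair_count[(a, b)] += 1

    scores = {skill: [] for skill in skill_count}
    for (a, b), both in pair_count.items():
        sim = both / sqrt(skill_count[a] * skill_count[b])
        scores[a].append((b, sim))
        scores[b].append((a, sim))
    # Highest similarity first; ties broken alphabetically for stable output.
    return {s: sorted(v, key=lambda x: (-x[1], x[0]))[:top_k] for s, v in scores.items()}


members = [
    {"hadoop", "java", "pig"},
    {"hadoop", "pig", "hive"},
    {"python", "statistics"},
    {"hadoop", "hive"},
]
print(related_skills(members)["pig"])
```

Cosine normalization keeps very common skills from dominating every list simply because many members have them.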
Classification: giving structure to unstructured data
[Figure: Extract]

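As a simple illustration of mapping free text onto a fixed structure, the sketch below matches phrases against a hypothetical standardized skill dictionary (`SKILL_DICTIONARY`) using greedy longest-match lookup. It is a dictionary-based stand-in, not a trained classifier, and not the method used in the talk.

```python
import re

# Hypothetical standardized skill vocabulary: normalized phrase -> canonical name.
SKILL_DICTIONARY = {
    "machine learning": "Machine Learning",
    "hadoop": "Hadoop",
    "python": "Python",
    "data mining": "Data Mining",
}
MAX_NGRAM = max(len(phrase.split()) for phrase in SKILL_DICTIONARY)


def extract_skills(text):
    """Map free text to canonical skill names via greedy longest-match lookup."""
    # Lowercase and split on anything that is not a letter, digit, '+' or '#',
    # so "machine-learning" becomes ["machine", "learning"] and "c++" survives.
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in SKILL_DICTIONARY:
                if SKILL_DICTIONARY[phrase] not in found:
                    found.append(SKILL_DICTIONARY[phrase])
                i += n
                break
        else:
            i += 1  # no dictionary phrase starts here; move on one token
    return found


print(extract_skills("Built Hadoop pipelines and machine-learning models in Python"))
```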
Clustering & Disambiguation

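Disambiguation means deciding which entity an ambiguous string refers to. A minimal sketch, assuming a hypothetical hand-built sense inventory (`SENSES`), picks the sense whose typical context words overlap most with the words around the term. This is a simplified overlap heuristic in the spirit of the Lesk algorithm, chosen here for brevity.

```python
# Hypothetical sense inventory: each meaning of an ambiguous term is described
# by context words that tend to appear near it.
SENSES = {
    "python": {
        "Python (programming language)": {"django", "numpy", "code", "software", "scripting"},
        "Python (snake)": {"reptile", "zoo", "snake", "wildlife"},
    },
}


def disambiguate(term, context_words):
    """Choose the sense whose context words overlap most with the surrounding text.

    Ties go to the alphabetically first sense name.
    """
    context = {word.lower() for word in context_words}
    senses = SENSES[term.lower()]
    # max() returns the first maximal element, so sorting makes ties deterministic.
    return max(sorted(senses), key=lambda sense: len(senses[sense] & context))


print(disambiguate("Python", ["Django", "web", "software"]))
```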
De-duplication and Normalization

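Normalization maps the many ways people write the same entity onto one canonical form so duplicates can be merged. The sketch below assumes a hypothetical alias table (`ALIASES`) and a short list of legal suffixes; it lowercases, strips accents and punctuation, drops suffixes, and then groups raw strings by their canonical key.

```python
import re
import unicodedata

# Hypothetical alias table mapping normalized variants to a canonical entity.
ALIASES = {
    "ibm": "IBM",
    "international business machines": "IBM",
    "linkedin": "LinkedIn",
}
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}


def normalize(name):
    """Lowercase, strip accents and punctuation, and drop legal suffixes."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # drop accents
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())  # "I.B.M." -> "ibm"
    return " ".join(token for token in text.split() if token not in SUFFIXES)


def canonicalize(names):
    """Group raw strings under one canonical key, falling back to the normalized form."""
    groups = {}
    for raw in names:
        key = normalize(raw)
        groups.setdefault(ALIASES.get(key, key), []).append(raw)
    return groups


print(canonicalize(["IBM", "I.B.M.", "International Business Machines Corp.",
                    "LinkedIn Corporation", "Linkedin"]))
```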
Network Algorithms: Relevance & Ranking

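The slide does not name specific algorithms. As one well-known example of ranking nodes in a network, the sketch below runs PageRank by power iteration on a small hypothetical directed graph, spreading the rank of nodes without outgoing edges uniformly over all nodes. It is a stand-in to show the idea, not LinkedIn's relevance model.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs.

    Nodes with no outgoing edges spread their rank uniformly over all nodes,
    so the scores always sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # Teleport share plus redistributed rank from dangling nodes.
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: three people point at carol, carol points at alice.
edges = [("alice", "carol"), ("bob", "carol"), ("carol", "alice"), ("dave", "carol")]
scores = pagerank(edges)
print(sorted(scores, key=scores.get, reverse=True))
```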
Prediction: Personalized Skill Recommendations

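One simple way to personalize skill suggestions is to score each skill a member has not listed by how often it co-occurs with the skills they already have. The sketch assumes toy skill sets for other members; `recommend_skills` is a hypothetical helper, and a real system could also draw on other profile signals.

```python
from collections import Counter, defaultdict


def recommend_skills(member_skills, other_members, top_k=3):
    """Suggest skills the member lacks, scored by co-occurrence with skills they have."""
    neighbors = defaultdict(Counter)  # skill -> counts of skills listed alongside it
    for skills in other_members:
        for a in skills:
            for b in skills:
                if a != b:
                    neighbors[a][b] += 1

    scores = Counter()
    for have in member_skills:
        for candidate, count in neighbors[have].items():
            if candidate not in member_skills:
                scores[candidate] += count
    # Highest score first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [skill for skill, _ in ranked[:top_k]]


others = [
    {"hadoop", "pig", "hive"},
    {"hadoop", "hive", "java"},
    {"python", "statistics", "r"},
    {"hadoop", "mapreduce"},
]
print(recommend_skills({"hadoop"}, others))
```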
Developing Data Products
Skill Endorsements

Social Proof and the Skill Endorsement Graph

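Social proof on an endorsement rests on how many distinct people vouch for a given member's skill. A minimal sketch, assuming a hypothetical log of `(endorser, member, skill)` triples, models the graph as edges into each (member, skill) pair and counts distinct endorsers, ignoring self-endorsements and repeats.

```python
from collections import defaultdict


def endorsement_counts(endorsements):
    """Count distinct endorsers per (member, skill) in an endorsement graph.

    endorsements: iterable of (endorser, member, skill) triples (hypothetical schema).
    Self-endorsements and repeated endorsements from the same person are ignored.
    """
    endorsers = defaultdict(set)
    for endorser, member, skill in endorsements:
        if endorser != member:
            endorsers[(member, skill)].add(endorser)
    return {key: len(people) for key, people in endorsers.items()}


log = [
    ("bob", "alice", "Hadoop"),
    ("carol", "alice", "Hadoop"),
    ("bob", "alice", "Hadoop"),    # duplicate endorsement
    ("alice", "alice", "Python"),  # self-endorsement
    ("carol", "bob", "Java"),
]
print(endorsement_counts(log))
```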
The Economic Graph: Skills, Jobs, People, Locations…

Lessons learned developing data products
Collect the right data at the right time
Large amounts of data can reveal new patterns
[Figure: Probability of Job Title vs. Months since graduation]

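A chart like this one can be computed as an empirical conditional distribution: for each number of months since graduation, the fraction of members holding each title. The sketch assumes hypothetical `(months, title)` records; with real data you would also want to standardize titles first and require a minimum count per month before trusting an estimate.

```python
from collections import Counter, defaultdict


def title_probability_by_month(records):
    """Estimate P(job title | months since graduation) from (months, title) pairs."""
    counts = defaultdict(Counter)
    for months, title in records:
        counts[months][title] += 1
    return {
        months: {title: n / sum(titles.values()) for title, n in titles.items()}
        for months, titles in counts.items()
    }


records = [(0, "Analyst"), (0, "Analyst"), (0, "Engineer"),
           (24, "Senior Analyst"), (24, "Analyst")]
probs = title_probability_by_month(records)
print(probs[0])
```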
Be wary of “black-box” approaches

Look at your data

Aggregate statistics can be misleading

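A small example of how summary statistics can hide structure: the two hypothetical samples below share the same mean and median, yet one never strays from 5 and the other never comes near it. Anscombe's quartet makes the same point with four datasets that have nearly identical means, variances, and regression lines but look completely different when plotted.

```python
from collections import Counter
from statistics import mean, median, pstdev

steady = [5, 5, 5, 5, 5, 5]
bimodal = [0, 0, 0, 10, 10, 10]

for name, data in [("steady", steady), ("bimodal", bimodal)]:
    # Mean and median agree across both samples; only the spread and the raw
    # value counts reveal that no "bimodal" value is anywhere near 5.
    print(name, mean(data), median(data), pstdev(data), sorted(Counter(data).items()))
```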
Build a viewer app, “micro-listen”

Algorithmic intuition: include data geeks in design

OODA: Think like a jet fighter

OODA: Observe, Orient, Decide, Act

OODA: The speed you can move determines victory

Red teaming: what can go wrong likely will

Error data is super valuable, analyze it and adapt

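A first pass at error analysis is often just counting: which kinds of failure occur most, and what share of all errors they represent. The sketch assumes a hypothetical log in which each entry carries a `category` field; the category names are made up for illustration.

```python
from collections import Counter


def top_error_categories(error_log, top_k=3):
    """Rank error categories by frequency and by share of all errors.

    error_log: iterable of dicts with a "category" key (hypothetical log schema).
    """
    counts = Counter(entry["category"] for entry in error_log)
    total = sum(counts.values())
    # most_common() orders equal counts by first appearance in the log.
    return [(category, n, n / total) for category, n in counts.most_common(top_k)]


errors = [
    {"category": "unknown_skill"},
    {"category": "bad_company_match"},
    {"category": "unknown_skill"},
    {"category": "unknown_skill"},
    {"category": "timeout"},
]
for category, n, share in top_error_categories(errors):
    print(f"{category}: {n} ({share:.0%})")
```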
Conclusion: tips for developing data products

    Collect the right data at the right time
    Large amounts of data can reveal new patterns
    Be wary of “black-box” approaches
    Look at your raw data
    Aggregate statistics can be misleading
    Build and use viewer apps
    Include data geeks in design process
    OODA: Think like a jet fighter
    Red-teaming: anticipate edge cases
    Find opportunity in your error data

Questions?

More info: data.linkedin.com
@peteskomoroch


Editor's Notes

  • #4: Mission: For us, fundamentally changing the way the world works begins with our mission statement: to connect the world’s professionals to make them more productive and successful. This means not only helping people find their dream jobs, but also enabling them to be great at the jobs they’re already in. Vision: But we’re just getting started. By our measure, there are more than 640 million professionals in the world, and roughly 3.3 billion people in the global workforce. Ultimately, our vision is to create economic opportunity for every professional, which we believe is an especially crucial objective in light of current macroeconomic trends. Our most important core value is that members come first.