Every Second – in over 50,000 Categories
eBay Analytics

•  >50 TB/day of new data
•  >150 PB/day processed
•  >100k data elements
•  >100 trillion pairs of information
•  >50k chains of logic
•  >7500 business users & analysts
•  Structured/Unstructured
•  Turning over a TB every second
•  24x7x365, always online
•  Millions of queries/day
•  99.98+% availability
•  Near-real-time
Big
Detail
Designing for the Unknown
>85% of analytical workload is NEW & Unknown

The metrics you know are cheap

The metrics you don’t know are expensive – but high in potential ROI

Exploration & Testing are core pillars of an analytics-driven organization
DATA
•  Volume: incremental storage
•  Velocity: processing, change
•  Variety: structured, semi-structured, un-structured
Value > Cost
                         $’s per year in incremental revenue




•    Data Growing Faster
2011 x.commerce Innovate Data Alchemy
•    Impact
Data


         questions later
         structure later



              ($0.04/GB, $80/2TB)

single HDFS instances >50PB




Value > Cost
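The two prices on this slide are the same figure in different units, and the arithmetic extends to the cluster size quoted. The sketch below is an illustration only: it assumes decimal units (1 TB = 1,000 GB) and counts raw disk alone, ignoring replication, servers, power and operations. The function name and constants are hypothetical.

```python
GB_PER_TB = 1_000        # decimal units, as drive capacities are usually quoted
GB_PER_PB = 1_000_000


def raw_disk_cost_usd(capacity_gb, drive_price_usd=80, drive_tb=2):
    """Raw disk cost in USD for capacity_gb, using the slide's $80 per 2 TB drive."""
    return capacity_gb * drive_price_usd / (drive_tb * GB_PER_TB)


price_per_gb = raw_disk_cost_usd(1)                # matches the slide's $0.04/GB
disk_for_50_pb = raw_disk_cost_usd(50 * GB_PER_PB)  # raw disk for one 50 PB instance
print(f"${price_per_gb:.2f}/GB, ${disk_for_50_pb:,.0f} of raw disk for 50 PB")
```

At these prices the disks for 50 PB come to about $2 million before any replication, which is the sense in which storing data first and asking questions later can satisfy Value > Cost.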
Synonyms derived from top queries in item query clusters

texas instruments ba ii plus         / ba ii plus
brighton handbag                     brighton purse
lenovo x200                          thinkpad x200
king bedspread                       king coverlet
rockabilly dress                     swing dress
1963 ford falcon                     63 falcon
jessica simpson hair extensions      jessica simpson hairdo

Abbreviations/acronyms derived from query transitions

stanford ky                          stanford kentucky
dc sub                               dc subwoofer
snowboard helmet l                   snowboard helmet large
motorcycle cam                       motorcycle camera
diamond amp                          diamond amplifier
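One way to picture "derived from query transitions" is to count how often users move between two queries that differ in a single token. The sketch below is an assumption, not the method used at eBay: the session log is invented, and the prefix test is a deliberately simple heuristic that would find "sub" and "subwoofer" but miss pairs such as "ky" and "kentucky".

```python
from collections import Counter


def abbreviation_candidates(transitions, min_count=2):
    """Count (short, long) token pairs seen in query transitions.

    A pair qualifies when both queries have the same number of tokens,
    differ in exactly one position, and the shorter token is a proper
    prefix of the longer one (an illustrative heuristic only).
    """
    counts = Counter()
    for before, after in transitions:
        a, b = before.lower().split(), after.lower().split()
        if len(a) != len(b):
            continue
        diffs = [(x, y) for x, y in zip(a, b) if x != y]
        if len(diffs) != 1:
            continue
        short, long_ = sorted(diffs[0], key=len)
        if len(short) < len(long_) and long_.startswith(short):
            counts[(short, long_)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical session log: (query, next query in the same session).
transitions = [
    ("dc sub", "dc subwoofer"),
    ("dc subwoofer", "dc sub"),
    ("motorcycle cam", "motorcycle camera"),
    ("motorcycle cam", "motorcycle camera"),
    ("snowboard helmet l", "snowboard helmet large"),
    ("king bedspread", "king coverlet"),
]
candidates = abbreviation_candidates(transitions)
print(candidates)
```

The min_count threshold stands in for the volume of evidence needed before a pair is trusted; with real traffic it would be far higher than 2.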
  
Toys and Hobbies
ATC   >   Artist trading card   in ART
ATC   >   Automatic Tool Change in Business and Industrial
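The ATC example shows that an acronym cannot be expanded without knowing the category. A minimal sketch of that lookup follows, assuming a table keyed by (token, category); the table, the function name and the sample queries are hypothetical, and the expansion for Toys and Hobbies is left out because it does not appear in the text above.

```python
# Expansions keyed by (acronym, category), taken from the two cases listed above.
ACRONYMS = {
    ("atc", "Art"): "artist trading card",
    ("atc", "Business and Industrial"): "automatic tool change",
}


def expand_query(query, category):
    """Replace each token with its category-specific expansion, if one is known."""
    tokens = query.lower().split()
    return " ".join(ACRONYMS.get((token, category), token) for token in tokens)


print(expand_query("ATC holder", "Art"))
print(expand_query("ATC spindle", "Business and Industrial"))
print(expand_query("ATC", "Toys and Hobbies"))  # no entry, so left unchanged
```

A search system would more likely match on both the original token and its expansion rather than replace one with the other; replacement keeps the sketch short.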
Offline                  Online                   Clients

Editorial                Service                  Search
Behavioral Logs          Code                     Selling
Document Data            Small Data               Others…
Human Judgment           Big Data Store
                         NoSQL

•  <3 milliseconds per query
•  1.2 billion queries per day
•  1,000’s of queries per second per machine
German Compound Words
 •    German compound words can be arbitrarily created and extremely long
          Adidastrainingsanzug (Adidas track suit)
          Rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz
                   (beef labeling regulation & delegation of supervision law)
 •    Syntactically, words can be combined and split in many ways.
 •    Some words shouldn’t be de-compounded.
          beiden (both) – bei(at) den(the)
 •    Too many candidates for
          Granitpflastersteine (granite paving stones)
          Granit(granite) pflastersteine(cobblestones)
          Granit(granite) pflaster(paving/band-aid) steine(stones)
 •    Binding characters
      Hochzeitsschuhe (grammatically correct, 593 hits on ebay.de)
      Hochzeitschuhe (129 hits on ebay.de).
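The problems listed above (many candidate splits, words that must stay whole, binding characters) can be sketched with a small dictionary-based splitter. This is an illustration under stated assumptions, not eBay's decompounder: the vocabulary, the no-split list and the single linker "s" are all invented for the example.

```python
def split_compound(word, vocab, linkers=("", "s"), no_split=frozenset()):
    """Return every way to split `word` into vocabulary words.

    Each result is a list of parts. A linker (binding character) may sit
    between two parts and is dropped from the output. Words in `no_split`
    are returned whole.
    """
    word = word.lower()
    if word in no_split:
        return [[word]]
    results = [[word]] if word in vocab else []
    for i in range(1, len(word)):
        head, rest = word[:i], word[i:]
        if head not in vocab:
            continue
        for linker in linkers:
            if not rest.startswith(linker):
                continue
            tail = rest[len(linker):]
            if not tail:
                continue
            for parts in split_compound(tail, vocab, linkers, no_split):
                results.append([head] + parts)
    return results


# Hypothetical vocabulary covering the examples on this slide.
VOCAB = {"granit", "pflaster", "steine", "pflastersteine",
         "hochzeit", "schuhe", "bei", "den"}

print(split_compound("Granitpflastersteine", VOCAB))
print(split_compound("Hochzeitsschuhe", VOCAB))
print(split_compound("Hochzeitschuhe", VOCAB))
print(split_compound("beiden", VOCAB, no_split=frozenset({"beiden"})))
```

Both spellings of the wedding-shoes query reduce to the same parts, so they can retrieve the same listings. The splitter returns every candidate and does not rank them; choosing between "pflastersteine" and "pflaster" + "steine" needs extra evidence such as query or listing frequencies.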
Analyze & Report                                                      Discover & Explore

Structured                     Semi-Structured                         Unstructured
SQL                            SQL++                                   Java/C++/Pig/Hive
Production Data Warehousing    Contextual-Complex Analytics            Structure the Unstructured
Large Concurrent User-base     Deep, Seasonal, Consumable Data Sets    Detect Patterns
Data Warehouse                 Data Warehouse + Behavioral             Hadoop
Enterprise-class System        Low End Enterprise-class System         Commodity Hardware System
8+PB                           60+PB                                   40+PB
Brian knows the satisfaction and importance of good search results,
and his team is responsible for ensuring that the millions of queries
entered on the eBay website return just that. The words “Did you
mean…?” are incredibly meaningful to Brian as he combs through a
universe of queries altered by synonyms, acronyms, attributes, and
expansions. He’s been doing this sort of work since he joined eBay
nine years ago. Brian has loved technology ever since junior high
school, when he played the game “Lunar Lander” on a paper
teletype before video games existed, and pulled pranks in the local
Radio Shack. When Brian gets outside, he goes backpacking on
Mount Whitney, enters triathlons, and walks on water (barefoot water
skiing).
