AI in multi billion (dollars, users,
documents) search engines
Career in Search and AI
Andrei Lopatenko, PhD
Vice President of Engineering, Zillow Group
Who am I
Core contributor to Google Search (2006-2010), Apple AppStore/iTunes
Co-designed and co-implemented Apple Maps Search (2010), Walmart Grocery
Exit: Adviser, Ozlo, acquired by Facebook, 2017
Has been leading search teams: Zillow (now), Walmart, eBay
PhD in Computer Science, The University of Manchester, UK
My path: from core contributor to Google Search to leading the search ecosystems of
market leaders in Real Estate (Zillow, Trulia), eCommerce (Walmart, eBay), digital
distribution (Apple)
My goal for this talk
To demonstrate that AI is a useful tool in every part of the search engine stack, from
data acquisition to ranking and beyond.
AI significantly improves customer experience, revenue/GMV and operational costs, and
helps to get more customers and serve them better
Some ways of AI development are better than others - and I’ll describe which ones
are better
To outline what makes a good AI engineer for a successful career in search or another
consumer oriented AI/ML based service, and how to succeed in the long term
My message
Successful AI in Search is an AI infrastructure, an engineering and science culture,
and toolsets that make it possible to continuously introduce, measure and improve AI
features in every part of the search engine, rather than several SOTA models in ranking
or query understanding
The same applies to every large scale consumer or business facing platform
(recommendation engines, call center analytics etc)
What’s AI
For the purpose of this talk we take an extended definition of AI which includes
machine learning, data science, statistical decision theory, statistics, natural
language processing - everything that lets machines make the right decisions for
search tasks
What’s search engine?
A customer facing engine which, given a query, returns a set of answers
The query can be a natural language/text query, a location (location based
search), an item from the catalog (recommendation engine), a user profile
(personalization) or any combination of them
Multi billion? (customers, dollars, documents)
We focus on search engines with many billions of dollars in revenue/GMV, billions of
users (or hundreds of millions) and billions of documents. At that scale AI improves
revenue by hundreds of millions or billions of dollars, or saves infrastructure costs
on a large scale, which justifies investment in AI infrastructure and in developing
many AI applications for search
Typical Search Engine - High Level View
[Diagram of components: Data Acquisition, Indexing, Ranking, Retrieval, Query Understanding, UX, Search Assistance, Logging, Monitoring, Experiment management, SRP logic, Other, Post Search Applications]
Notation
In this presentation, ‘documents’ will be items representing information about
atomic search results which might be documents, houses and other properties (real
estate), web pages (web search), ecommerce products (ecommerce),
Apps/Music/Video/Books (digital distribution), businesses (local search),
geographic entities (local search)
When I say “documents”, I mean information about objects of the search engine:
real estate properties, restaurants, streets, web pages, apps etc
Main Components
Aka Search Quality: search assistance services (autosuggest, dynamic facets etc),
query understanding, ranking, SERP logic (snippet building, universal search -
mixing results from different corpora)
Aka Search Infrastructure: retrieval, logging, monitoring, ops, index deployment
Aka Indexing: data acquisition (crawling the web, feeds, data imports), including data
enrichment (duplicate resolution, data cleaning, mapping into common dictionaries,
extraction), and indexing (building indexes for retrieval systems)
Main Components
Post Search applications: recommendation engines that serve results to users based on
their saved searches and saved items, notifications about new items, new prices,
availability
Search eXperience: look and design of search pages and result pages
AI is everywhere
Using AI components to improve data acquisition by 10%, indexing by 10%, query
understanding by 10%, ranking by 10% and the result page by 10% gives more gains in
customer satisfaction and revenue/GMV than applying SOTA and improving only
one component, such as query understanding or ranking, by 30%.
AI development should be driven by creating infrastructure, culture and toolsets
for continuous AI deployment, improvement and measurement everywhere in the search
stack. There are no ‘engineering’ teams, every team is an AI team
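One way to see why five 10% gains can beat a single 30% gain is a small arithmetic sketch. It assumes, purely for illustration, that the per-component gains are independent and compound multiplicatively on one end metric; the slide does not state this, and `compounded_gain` is a hypothetical helper.

```python
def compounded_gain(gains):
    """Total relative gain when each stage multiplies the end metric.

    Assumption (not from the talk): gains are independent and multiplicative.
    """
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


# Five components improved by 10% each vs. one component improved by 30%.
broad = compounded_gain([0.10] * 5)
narrow = compounded_gain([0.30])

print(f"five x 10%: {broad:.1%}, one x 30%: {narrow:.1%}")
```

Under this assumption the broad strategy yields about 61% against 30%. Real components interact, so the actual numbers have to come from end to end measurement.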
AI is not separable from engineering
AI development is not separable from other engineering development.
Improving index selection by 10% makes it possible either to accommodate other data
sources to improve coverage (by getting more documents) or to improve ranking (with
fewer documents to rank, more computation is available for more advanced ranking
functions)
Improving infrastructure to decrease latency by 10% makes it possible to deploy more
sophisticated ranking or query understanding functions (10% more time)
Good engineering quality and culture are not separable from AI development; they are a
mandatory part of AI culture
AI ‘rank labs’ or AI platforms
Search engine teams benefit from a unified environment to train, deploy and serve
models: it reduces work on infrastructure and MLOps, lets teams share metrics and
makes it easy to measure end to end metrics
There is no single environment which will handle all types of AI development, AI
serving and AI measurement.
Search systems are naturally complex: different tasks, different environments,
different languages, different CI/CD systems. Still, the never ending work on unifying
AI development infrastructure across search teams helps
Multiple ways to ‘deploy’ AI models
Deploy models to TF Serving, TorchServe etc
Deploy models served in containers
Compile a model directly into machine code or into source code of a search
component (GBDT models into C++/Java code to be used in ranking)
Relearn and change parameters of existing served models
Tons of other deployment scenarios
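To make the "compile a GBDT model into source code" scenario concrete, here is a minimal sketch that turns a toy tree ensemble into C++ if/else code. The dictionary tree format, `TREES`, `tree_to_cpp`, `ensemble_to_cpp` and `predict` are all hypothetical names for illustration; real GBDT libraries have their own model formats, and a production generator would also handle base scores, missing values and categorical splits.

```python
# Hypothetical toy ensemble: two depth-1 trees over a feature vector f.
TREES = [
    {"feature": 0, "threshold": 2.0,
     "left": {"value": 0.1}, "right": {"value": 0.4}},
    {"feature": 1, "threshold": 3.0,
     "left": {"value": -0.2}, "right": {"value": 0.3}},
]


def tree_to_cpp(node, indent=1):
    """Emit nested if/else C++ statements for one tree."""
    pad = "    " * indent
    if "value" in node:
        return f"{pad}return {node['value']};\n"
    return (
        f"{pad}if (f[{node['feature']}] <= {node['threshold']}) {{\n"
        + tree_to_cpp(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_cpp(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def ensemble_to_cpp(trees):
    """Emit one C++ function per tree plus a score() that sums them."""
    parts = [
        f"static double tree_{i}(const double* f) {{\n{tree_to_cpp(t)}}}\n"
        for i, t in enumerate(trees)
    ]
    calls = " + ".join(f"tree_{i}(f)" for i in range(len(trees)))
    parts.append(f"double score(const double* f) {{\n    return {calls};\n}}\n")
    return "\n".join(parts)


def predict(trees, f):
    """Python reference scorer, used to check the generated code against."""
    total = 0.0
    for node in trees:
        while "value" not in node:
            go_left = f[node["feature"]] <= node["threshold"]
            node = node["left"] if go_left else node["right"]
        total += node["value"]
    return total


cpp_source = ensemble_to_cpp(TREES)
print(cpp_source)
```

The generated code has no model loading and no interpreter overhead at query time, which is one reason this style is attractive inside a ranking loop. Keeping a reference scorer such as `predict` makes it possible to test that the generated code and the trained model agree.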
Multiple ways to serve AI models in Search
Streaming (ex: document updates)
Batch (ex: offline processing of queries or users or documents)
Serving runtime services (ranking, query understanding)
Multiple ways to improve AI in Search
Change evaluation methods and metrics, train models to new metrics
Change sampling procedures
Change training procedures
Change modeling techniques
Model previously unmodeled tasks
New data and new features
Infrastructure changes in serving etc etc etc
AI platforms
So, it’s almost impossible to make one platform that handles all types of AI
development and deployment (see the variety in the previous slides)
But unifying some of the tasks reduces development and operation effort and
costs and increases the velocity of AI development, bringing a lot of money and
customer satisfaction
Every AI driven search company has created a “Rank Lab” AI platform for search
Now there are open source tools such as Kubeflow and MLflow to simplify development
AI services in prod
It is good if they are decoupled, so multiple small teams can work on services
independently
But wiring is needed (a ‘signal’ from query understanding to be used in ranking etc)
Processing a query may call dozens of AI services; processing a document in
data acquisition and indexing may call 100s of AI services. Performance considerations
are extremely important
AI infrastructure benefits greatly from common software practices, protocols and
orchestration to organize this ‘AI chaos’ and make order out of it
AI in prod
Besides ML objective functions and metrics,
a. Latency
b. Resiliency
c. Throughput
d. Resource utilization
are very important factors in the design of every AI service in the search engine stack
AI services should be tested and benchmarked against them during development
Your model will serve billions of document updates or billions of queries; every 1 ms
of delay or 1 ms of downtime will cost either a bad user experience or millions in ops
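As an illustration of benchmarking an AI service against latency, the sketch below measures median and tail latency of a scoring function. `latency_percentiles` and `toy_model` are hypothetical; the nearest-rank percentile is one simple choice, and a real benchmark would also cover throughput, warm-up and concurrent load.

```python
import math
import time


def latency_percentiles(fn, inputs, percentiles=(50, 99)):
    """Call fn on every input and return latency percentiles in milliseconds."""
    if not inputs:
        raise ValueError("need at least one input to benchmark")
    samples = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    result = {}
    for p in percentiles:
        # Nearest-rank percentile: smallest sample covering p% of the calls.
        rank = max(1, math.ceil(p / 100 * len(samples)))
        result[p] = samples[rank - 1]
    return result


def toy_model(n):
    """Stand-in for a model call; replace with the real scoring function."""
    return sum(i * i for i in range(n))


stats = latency_percentiles(toy_model, [10_000] * 200)
print(f"p50={stats[50]:.3f} ms, p99={stats[99]:.3f} ms")
```

Tail percentiles matter more than the mean here: when a query fans out to dozens of services, the slowest call tends to determine the response time.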
Typically two types of AI platforms emerge
- Type A. Horizontal: ranking, NLP, forecasting, fraud detection etc, to be applied
everywhere
- Type B. Vertical: indexing, query understanding
- Different requirements
- Invest in infrastructure for reusable / repeatable AI cases
Next steps
I’ll describe how AI is used in every component of the search stack. It’s everywhere,
and everywhere it’s a tool to get significant improvement of that part of search.
The original presentation was 170 slides; I did almost random sampling to fit the
time given to me. A lot of important, high impact AI use cases are not
included.
After that I’ll describe what makes an AI engineer or data scientist successful in
search and what’s important for long term career success in the area
AI in Search
Use cases
Indexing - index selection
Perhaps one of the first AI applications in the search domain. Started in the 90s,
when web volume increased and index selection strategy started to be important
Which documents should be indexed? In which index layers should they be placed?
(many modern search engines are multi level: a smaller index for frequently searched
items, a very big and comprehensive index for rarely searched items)
AI for quality and popularity assessment of ‘documents’
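The tiering decision can be sketched as follows, assuming some model has already produced a quality/popularity score per document. The tier names, `assign_tiers`, the capacity and the threshold are all hypothetical; real systems use more layers and richer signals.

```python
def assign_tiers(doc_scores, hot_capacity, min_score):
    """Place documents into index tiers by a predicted quality/popularity score.

    doc_scores: {doc_id: score}, assumed to come from a separate model.
    Returns {doc_id: "hot" | "full" | "skip"}.
    """
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    tiers = {}
    for position, (doc_id, score) in enumerate(ranked):
        if score < min_score:
            tiers[doc_id] = "skip"   # not worth indexing at all
        elif position < hot_capacity:
            tiers[doc_id] = "hot"    # small index for frequently searched items
        else:
            tiers[doc_id] = "full"   # large comprehensive index
    return tiers


tiers = assign_tiers({"a": 0.9, "b": 0.5, "c": 0.05}, hot_capacity=1, min_score=0.1)
print(tiers)
```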
Indexing - duplicate resolution
Which ‘documents’ are essentially the same (represent the same item)? Or have
highly duplicated content, so the second document does not carry more information?
Which documents are the same with respect to a particular query? (we do not want to
show a collocated Target and Target Pharmacy for the local search query “target”, but
they are different entities for the query “pharmacy”)
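A common baseline for the "highly duplicated content" case is word shingling with Jaccard similarity, sketched below. This is not presented as the approach used in any of the systems mentioned; `shingles`, `jaccard`, `is_near_duplicate` and the 0.6 threshold are illustrative choices, and the query dependent case (Target vs Target Pharmacy) needs more than text overlap.

```python
def shingles(text, k=2):
    """Set of k-word shingles of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.6, k=2):
    """True when the shingle overlap of two documents reaches the threshold."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


listing_1 = "Target store 123 Main St Palo Alto"
listing_2 = "target store 123 main st palo alto"
listing_3 = "Blue Bottle Coffee 456 Oak Ave"

print(is_near_duplicate(listing_1, listing_2))
print(is_near_duplicate(listing_1, listing_3))
```

At billions of documents, pairwise comparison is infeasible, so this kind of similarity is usually combined with hashing techniques that only compare likely candidates.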
Indexing - attribute extraction
Given documents - full text descriptions of houses, ecommerce products, businesses
etc - extract the attributes that matter for search and describe the items of
interest
(size, wheel size, weight, number of pages, location, view)
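As a minimal illustration of the task's input and output, here is a rule based baseline for a real estate description. The patterns, attribute names and `extract_attributes` are hypothetical; production extraction would typically rely on learned sequence models, with rules like these serving as a baseline or as a source of weak labels.

```python
import re

# Hypothetical patterns for a few real estate attributes.
PATTERNS = {
    "bedrooms": re.compile(r"(\d+)\s*(?:bed(?:room)?s?|br)\b", re.I),
    "bathrooms": re.compile(r"(\d+(?:\.\d+)?)\s*(?:bath(?:room)?s?|ba)\b", re.I),
    "sqft": re.compile(r"(\d[\d,]*)\s*(?:sq\.?\s*ft\.?|square feet)", re.I),
}


def extract_attributes(text):
    """Return {attribute: numeric value} for every pattern found in text."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[name] = float(match.group(1).replace(",", ""))
    return found


description = "Sunny 3 bedroom, 2.5 bath home, 1,850 sq ft with bay view"
attributes = extract_attributes(description)
print(attributes)
```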
Indexing
Detection of adversarial content: explicit, spam
Indexing - cold start
Given a new item, predict its conversion rate, customer demand, probability of
purchase, propensity
Probability of being bought for a particular query, by a particular customer, in
conjunction with another item
Indexing - knowledge graph
Given a set of documents describing items, extract knowledge graphs: key items and
other entities, with an understanding of their relationships
Map information from a set of documents with full text descriptions (typically billions
of documents) into a structured knowledge graph representation of items of various
types and their relationships
Index - statistical tasks
Evaluation of the quality and size of the index.
Does our index provide good coverage? What categories are missing? What data
quality problems are there?
Evaluation of the index size of external systems
Index - data quality
Which attributes are important and must be mandatory, and which attributes can be
optional in data acquisition? Which types of data, which categories to acquire?
AI processes continuously look into search logs to determine customer priorities and
what drives conversion, and use this information to drive data acquisition
Demand generation beyond just data
The same type of AI evaluation procedures can compute and forecast future demand
for items, to drive purchase decisions if the search engine is used to sell items
AI for query understanding
Query understanding is the mapping of a customer’s query into a machine understandable
format, used to retrieve a set of relevant items and rank them by probability of
customer engagement (view, purchase, etc)
Synonym expansion for better retrieval, removal of insignificant terms, correction of
spelling and other errors, term weighting, attribute and entity extraction, compound
and phrase extraction, classification (novelty, price range etc)
Query understanding parsing
Dependency, constituency and other kinds of parsing as a part of the query
understanding stack. Typically serves other parts of the query understanding stack
Query understanding Classification
Mapping a query into a certain set of categories to be used in retrieval and ranking
-> most probable document category (italian -> restaurants in local search),
-> most probable distance (gas -> 5 miles distance, Michelin restaurant -> 50
miles distance in local search)
-> novelty: printers -> released within 1 year, pillows -> release date does not
matter
Typically: 100s of classifiers per search engine with significant impact on quality /
revenue
Query understanding Similar Queries
Given queries q1 and q2, how similar are they (how good would the results for one query
be as results for the other query)?
Tons of applications in query understanding and ranking: given features for one
query, apply them to another query for ranking, extend the retrieval set etc
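One simple way to approximate "results for one query are good for the other" is to compare what users clicked for each query. The sketch below uses cosine similarity over click counts per document; `click_cosine` and the toy click data are hypothetical, and embedding based similarity is a common alternative.

```python
import math


def click_cosine(clicks_a, clicks_b):
    """Cosine similarity between two {doc_id: click_count} dictionaries."""
    dot = sum(count * clicks_b.get(doc, 0) for doc, count in clicks_a.items())
    norm_a = math.sqrt(sum(c * c for c in clicks_a.values()))
    norm_b = math.sqrt(sum(c * c for c in clicks_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical click logs aggregated per query.
clicks = {
    "nyc apartments": {"doc1": 10, "doc2": 5},
    "apartments in new york": {"doc1": 8, "doc2": 6, "doc3": 1},
    "boat rentals": {"doc9": 7},
}

similar = click_cosine(clicks["nyc apartments"], clicks["apartments in new york"])
unrelated = click_cosine(clicks["nyc apartments"], clicks["boat rentals"])
print(round(similar, 3), unrelated)
```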
Query understanding: term weighting
Compute importance / weights of terms and phrases of queries to be used in
ranking
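A classic starting point for term weighting is inverse document frequency: terms that appear in few documents (or few past queries) carry more weight. The sketch below uses a smoothed IDF over a toy corpus; `term_weights` and the corpus are hypothetical, and learned term weighting models would normally replace or refine this.

```python
import math
from collections import Counter


def term_weights(query, corpus):
    """Smoothed IDF weight for each query term, computed over corpus texts."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(text.lower().split()))
    return {
        term: math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0
        for term in query.lower().split()
    }


corpus = [
    "cheap red shoes",
    "cheap blue shoes",
    "cheap nike shoes",
    "red dress",
]
weights = term_weights("cheap nike shoes", corpus)
print(weights)
```

Here "nike" gets a higher weight than "cheap" or "shoes", so a ranker or a query relaxation step would treat it as the term least safe to drop.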
Query understanding: entity and attribute
extraction
Given a query, map it into a structured representation of entities and attributes to be
used for better retrieval and ranking
Query understanding: backfilling
Given a query and its structured representation,
generate a query which represents the latent user need behind this query (not what the
user types, but what she/he wants to find)
Query understanding temporal aspects
Particular example of query classification:
Detect temporal topicality shifts in query/user interest and in documents
AI for ranking
Learning to rank / machine learning based ranking technologies to rank documents
(LeToR / MLR)
AI for unbiased ranking
User interaction based LeToR / counterfactual learning
Frequently it’s the biggest revenue driver of the search engine, but it’s
over-represented in AI in Search lectures, so just one slide from me
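A core idea behind counterfactual learning from user interactions is to correct clicks for position bias with inverse propensity weighting: a click at a rarely examined position counts for more. The sketch below shows only that weighting step; the propensity values, the log format and `ips_click_credit` are hypothetical, and estimating propensities is a separate problem.

```python
from collections import defaultdict


def ips_click_credit(click_log, propensity):
    """Sum inverse-propensity-weighted clicks per document.

    click_log: iterable of (doc_id, position, clicked) tuples.
    propensity: {position: probability that users examine that position}.
    """
    credit = defaultdict(float)
    for doc_id, position, clicked in click_log:
        if clicked:
            credit[doc_id] += 1.0 / propensity[position]
    return dict(credit)


# Assumed examination probabilities by rank position (illustrative values).
propensity = {1: 1.0, 2: 0.5, 3: 0.25}
click_log = [
    ("A", 3, True),
    ("A", 3, False),
    ("B", 1, True),
    ("B", 1, True),
]
credit = ips_click_credit(click_log, propensity)
print(credit)
```

Document A has fewer raw clicks than B but more weighted credit, because its single click happened at a position users rarely examine. These weights can then be used as example weights in a learning to rank loss.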
AI for search assistance
Typeahead prediction - language modeling, other contextual information, location of
the user, previous searches of the user
Query dependent, user dependent navigational panels and guided search
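The simplest typeahead baseline, before any language modeling or personalization, is to complete a prefix with the most frequent past queries. The sketch below does exactly that; `Typeahead` and the toy query log are hypothetical, and a real service would use a trie or similar index and add context such as location and user history.

```python
from collections import Counter


class Typeahead:
    """Frequency based prefix completion over a log of past queries."""

    def __init__(self, query_log):
        self.counts = Counter(q.strip().lower() for q in query_log)

    def suggest(self, prefix, k=3):
        """Top-k past queries starting with prefix, most frequent first."""
        prefix = prefix.lstrip().lower()
        matches = [(q, c) for q, c in self.counts.items() if q.startswith(prefix)]
        # Sort by descending frequency, then alphabetically for stable ties.
        matches.sort(key=lambda qc: (-qc[1], qc[0]))
        return [q for q, _ in matches[:k]]


typeahead = Typeahead(["pizza", "pizza near me", "Pizza", "pillow", "printer"])
suggestions = typeahead.suggest("pi")
print(suggestions)
```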
Vertical AI
Very heterogeneous set of AI applications. Examples:
Image search - understanding image similarity for image to image search, and
extraction of keywords about images from the image itself and from relevant documents
for text to image search
Deep understanding of geographical data for local search, ex: extraction of region
names from various text documents describing houses, businesses, geo entities
AI Whole page
Given an output of several search engines: how to combine them to construct the
best customer experience.
Ex: music, video, book, podcasts as in iTunes;
web search, maps, youtube, news, image, books, scholar etc in Google
AI SERP snippet
How to generate the best descriptions of items in the search result page, so
customers understand the relevance of items without clicking on them
How to select the best chunk of text representing the item, the picture, the formats -
depending on the query and the user
Indexing - infrastructure
AI to learn new, more efficient index structures
AI to predict index strategies (for search engines based on databases)
AI - Search ops
Improving data center efficiency
Predicting failing nodes
Power Usage Effectiveness (PUE) improvement
AI Result page
How to generate the page of the item / document?
How to place the image, title, full text descriptions, attributes, reviews, detailed
descriptions and recommended items to make it as convenient as possible for the
customer to make the right decision
AI price prediction
Predict the price of the item (for search engines that sell items)
which will maximize item conversion, customer satisfaction and revenue of the
company
(an economics problem, but tightly connected to search: depends on item position in
search, relevance, exposure, prices of other search results)
AI for search service performance
AI to predict load, in order to deploy new machines to serve increased traffic or
adapt to reduced traffic and reduce operations costs
AI for caching: predict which results/queries should be cached to improve
latency/decrease load
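A useful baseline for any cache prediction model is "cache the most frequent past queries", evaluated by hit rate on later traffic. The sketch below implements that baseline; `choose_cache_keys`, `hit_rate` and the toy traffic are hypothetical, and a learned predictor would be judged by how much it beats this number.

```python
from collections import Counter


def choose_cache_keys(past_queries, capacity):
    """Pick the `capacity` most frequent past queries to keep in the cache."""
    return {q for q, _ in Counter(past_queries).most_common(capacity)}


def hit_rate(cache_keys, future_queries):
    """Fraction of future queries that would be served from the cache."""
    if not future_queries:
        return 0.0
    hits = sum(1 for q in future_queries if q in cache_keys)
    return hits / len(future_queries)


past = ["a", "a", "a", "b", "b", "c"]
future = ["a", "b", "c", "d"]

cache_keys = choose_cache_keys(past, capacity=2)
rate = hit_rate(cache_keys, future)
print(sorted(cache_keys), rate)
```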
AI conversational search
Conversational interfaces for search, multi turn interactions with customers to
understand customer search intent and help her/him to express their intent or even
to find it by making latent intent explicit
NLP/NLU, dialog state management, deep reinforcement learning, text generation
ASR for voice based systems
AI SEO SEM
Getting legitimate traffic from Google and other search engines, bringing more users
to the search
How to present search engine outputs (result pages, search pages, other) so that
external search engines pick them up, index them, rank them higher, show them for more
queries and bring more users to the search engine
NLP, generation of search friendly pages (good structure, titles, anchors)
Analysis of the performance of marketing campaigns, finding better keywords and search
queries which are relevant and bring more users
AI anti SEO
Detection of users trying to affect ranking or retrieval to promote their items
Behaviour analysis, data science and text analytics to detect online behaviour (fake
clicks, views, queries, purchases) or data manipulation (keyword stuffing, attribute
manipulation) intended to affect retrieval and ranking and promote the pages of the
SEO manipulator
AI Question Answering
Answering questions by finding the best factual answer rather than a document
relevant to the query.
Which French restaurant is the most romantic in Palo Alto? What’s the phone
number of sales customer support? What’s the return policy? When will my items be
shipped?
Post search AI
Given a set of queries relevant to user (saved queries, previous sessions) and a set of
items relevant to users
Generate email and other notifications about new items, price changes, availability
changes - which will help users to find/buy/discover what they want
What makes a good search engineer
This part of the presentation is about the qualities of a good search engineer
and how to build a career in AI/Search
1. How to be successful in your search projects and what makes you a good
search engineer
2. How to be successful in long term career building
Qualities of a good Search engineer
Required knowledge for long term success in search (to be able to deliver multiple
successful projects with company level impact):
1. Machine Learning: new models, new features
2. Engineering: implementing software solutions with performance, quality, etc
requirements
3. Metrics / Customer: transforming customer experience into metrics which can
be used for ML training, experiments/analysis
4. Statistics: design and analysis of experiments
5. Business: understanding the business, how to transform business development into
metrics/OKRs, and consequently into new search features, new search
products
Qualities of a good Search engineer
Many search features require changes in many parts of search stack: indexing,
ranking, query understanding, evaluation setups
Requires collaboration with many different teams: engineering, MLE, research,
statisticians.
Ability to collaborate at large scale with multiple diverse teams: communications,
document writing, project organization at multiple levels from coding to project
management to product management
Qualities of a good Search engineer
Sometimes search development work requires a long effort by one person or a small team,
where help from management or from colleagues will not change much
Requires the ability to keep long term focus and to work in an isolated, result
focused environment (PhD style work)
Qualities of a good Search engineer
Ability to work on long term projects with no guaranteed outcomes
Many search projects are focused on improving certain customer satisfaction
metrics, (the number of local results, the number of new relevant results etc etc),
improving the model, feature set, something else.
Frequently, there is no guarantee that it’s achievable. Some search projects require
multiple unsuccessful tries before a good solution is found
Requires a certain persistence to go from failure to failure before finding a
successful solution
Qualities of a good Search engineer
Understanding the customer, and the skill of transforming an understanding of customer
needs into actionable metrics
A lot of search development is not about continuous improvement of a single metric
(relevance, query understanding, index size etc), but about discovering and
understanding various aspects of customer satisfaction and transforming this
understanding into new metrics, which can be used for training models, for measurement
and for improvement of the search
Qualities of a good Search engineer
Continuous awareness of new developments in many areas of ML/IR/NLP/statistics
which can be used to improve search
Continuous professional development, learning, reading, in machine
learning/AI/NLP/IR, engineering/programming, and other professional skills
Qualities of a good Search engineer
Success of many big projects and initiatives depends on collaboration with multiple
teams, from other technology teams to business departments (legal, marketing, etc)
Ability to find support and convince people with very different points of view about
the importance, success criteria and impact of technology projects
and
Ability to listen to feedback and proposals from very different people, from business
to tech, understand them objectively and incorporate them into technology development
Qualities of a good Search engineer
The engineering part is very important and frequently underestimated in many articles
and books. Only a small part of search development is training new models.
The rest is development of new product features, building infrastructure to
serve models, etc. Software engineering is a part of the job.
Search engines have strict performance limits, and the search engine is the face of
your business. If it’s down, the business is down. Quality engineering matters.
Knowing how to write good, high quality, performant code, and how to test, tune and
document it, is a crucial part of search engineering success.
Long term career success as a Search engineer
Reputation is the number 1 criterion of long term career success.
Your reputation as an engineer, MLE, leader, collaborator. The reputation of you,
the teams you built, etc
Reputation among engineering teams, business teams, your peers, partners and your
leadership.
Reputation based on different qualities, from building large scale systems to success
in ML projects to understanding business needs and transforming them into
engineering products
The first 15 years of a career are focused on building a reputation
Long term career success
Select only jobs which truly suit you
Next job offer: analyze the company: values, culture, technology area, business
vision - is it what you want?
Very important for the first job you get after college or a PhD - a good initial fit is
crucial
Assess companies: will you relate to their business, culture and people?
What you learn there will define your career for several decades
Do a systematic assessment of every job offer - but the first job after
college or a PhD is especially important
Long term career success
The best job is a job with a company that suits you
When you select the next step, be sure that the company culture, values, product and
engineering fit you, your development goals and your values. Do not move because of
a popular technology, a big title, a sudden unexpected salary increase, hype or other
reasons that are incidental to your long term career
Long term career success
Focus on development of long term professional relationships
Develop a diverse base of meaningful work connections with colleagues from different
technology departments, different lines of business, marketing, legal, recruiters etc,
based on joint work and the reputation you earn in your work with them
Long term career success
Within your company, move to more strategic projects with a big impact on the
company business
Strategic projects - more opportunities for career development, more meaningful
work connections, more things to learn for long term career goals, typically more
interesting technologies, more to learn about business, technology and customers, more
opportunity for self development, more skills, more knowledge
Long term career success
Move to more strategic and bigger impact contributions in your area of work
First job - develop models, develop software features as requested by a mentor or
manager
Move from individual projects to team projects, from coding and model training to
defining vision, strategy, roadmap, execution, building teams
In *every* role and project, widen your scope, take on more challenging tasks, aim for
bigger impact on the company business
Long term career success
Do not complain, make changes
It applies to code, technology, org structure, culture, relationships, products,
anything you believe can be improved
Do not just complain about things going wrong. Fix them whenever possible: by
coding, writing documentation, making people aware of what is wrong and
proposing solutions. At every level of your career you can make bigger changes than
are expected of you at that step. Bring changes rather than whine.
Even if a problem is well above your role, propose solutions, notify the relevant
people and bring value to solving it, rather than just complain.
Long term career success
Continuous professional development is crucial at every step of the career
Every year ask yourself questions
Over the last 12 months:
1. How much have I learned about the technologies, the products, the services, the
markets? What part of this knowledge is relevant to my work? How much did it
help to improve my performance (the performance of my team)?
2. How many new people have I gotten to know at work? How diverse is this
set of people? How many people have I improved relationships with?
Long term career success
Continuous professional development is crucial at every step of the career
Over the last 12 months:
1. What new results and accomplishments have I achieved? What have I launched or
improved? How much does it add to my reputation? My track record?
2. What new skills have I developed? Am I better at communication?
Technology? Analytical skills? Judgement? In which areas?
How can I do better next year? What should I improve? How can I apply these new
skills, relationships and knowledge?
Q&A

AI in Multi Billion Search Engines. Career building in AI / Search. What makes a good search engineer

  • 1. AI in multi billion (dollars, users, documents) search engines Career in Search and AI Andrei Lopatenko, PhD Vice President of Engineering, Zillow Group
  • 2. Who am I Core contributor to Google Search (2006-2010), Apple AppStore/iTunes Co-designed and co-implemented Apple Maps Search (2010), Walmart Grocery Exit: Adviser, Ozlo, acquired by Facebook, 2017 Has led search teams at Zillow (now), Walmart, eBay PhD in Computer Science, The University of Manchester, UK My path: from core contributor to Google Search to leading the search ecosystems of market leaders in real estate (Zillow, Trulia), eCommerce (Walmart, eBay), and digital distribution (Apple)
  • 3. My goal for this talk To demonstrate that AI is a useful tool in every part of the search engine stack, from data acquisition to ranking and beyond. AI significantly improves customer experience, revenue/GMV, and operational costs, and helps to get more customers and serve them better Some ways of AI development are better than others - and I’ll describe which ones are better To outline what makes a good AI engineer for a successful career in search or another consumer-oriented AI/ML-based service, and how to succeed in the long term
  • 4. My message Successful AI in Search is an AI infrastructure, an engineering and science culture, and toolsets that make it possible to continuously introduce, measure, and improve AI features in every part of the search engine, rather than several SOTA models in ranking or query understanding The same applies to every large scale consumer or business facing platform (recommendation engines, call center analytics, etc.)
  • 5. What’s AI For the purpose of this talk, we take an extended definition of AI which includes machine learning, data science, statistical decision theory, statistics, natural language processing - everything that lets machines make the right decisions for search tasks
  • 6. What’s a search engine? A customer facing engine which, given a query, returns a set of answers, where the query can be a natural language/text query, or a location (location based search), or an item from the catalog (recommendation engine), or a user profile (personalization), or any combination of them
  • 7. Multi billion? (customers, dollars, documents) We focus on search engines with many billions of dollars in revenue/GMV, billions of users (or hundreds of millions), and billions of documents. That scale justifies investment in building AI infrastructure and developing many AI applications for search, because it improves revenue by hundreds of millions or billions of dollars or saves infrastructure costs at large scale
  • 8. Typical Search Engine - High Level View Data Acquisition Indexing Ranking Retrieval Query Understanding UX Search Assistance Logging Monitoring Experiment management SRP logic Other Post Search Applications
  • 9. notation In this presentation, ‘documents’ will be items representing information about atomic search results which might be documents, houses and other properties (real estate), web pages (web search), ecommerce products (ecommerce), Apps/Music/Video/Books (digital distribution), businesses (local search), geographic entities (local search) When I say “documents”, I mean information about objects of the search engine: real estate properties, restaurants, streets, web pages, apps etc
  • 10. Main Components Aka Search Quality: Search assistance services (Autosuggest, dynamic facets etc), Query Understanding, Ranking, SERP logic (snippet building, universal search - mixing results of different corpora) Aka Search Infrastructure: retrieval, logging, monitoring, ops, deploying index Aka Indexing: data acquisition (crawling web, feeds, data imports), includes data enrichment (duplicate resolution, cleaning data, mapping into common dictionaries, extraction), indexing (building index for retrieval systems)
  • 11. Main Components Post Search applications: recommendation engines to serve results to users based on their saved searches and saved items, notifications about new items, new prices, availability Search eXperience: view and design of search pages, result pages
  • 12. AI is everywhere Using AI components to improve data acquisition by 10%, indexing by 10%, query understanding by 10%, ranking by 10%, result page by 10% gives more gains in customer satisfaction and revenue/GMV than applying SOTA and improving only one component, such as query understanding or ranking, by 30%. AI development should be driven by creating infrastructure, culture and toolsets for continuous AI deployment, improvement, and measurement everywhere in the search stack. There are no ‘engineering’ teams, every team is an AI team
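A back-of-the-envelope check of the arithmetic on slide 12. It assumes that per-component gains compound multiplicatively along the pipeline; the slide does not state how the gains combine, so this is only one plausible reading, and the function name is illustrative.

```python
# Assumption: a relative gain in each stage multiplies through to the end result.
# Real search stacks may combine gains differently; this is only a sanity check.
def compounded_gain(stage_gains):
    """Return the overall relative gain when each stage gain multiplies through."""
    total = 1.0
    for gain in stage_gains:
        total *= 1.0 + gain
    return total - 1.0

broad = compounded_gain([0.10] * 5)  # 10% in each of five components
narrow = compounded_gain([0.30])     # 30% in a single component

print(f"five components at 10% each: {broad:.1%}")
print(f"one component at 30%: {narrow:.1%}")
```

Under this assumption the broad approach roughly doubles the gain of the single-component one, which is consistent with the slide's claim.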
  • 13. AI is not separable from engineering AI development is not separable from other engineering developments. Improving index selection by 10% makes it possible either to accommodate other data sources to improve coverage (by getting more documents) or to improve ranking (there are fewer documents to rank, so more computation is available for more advanced functions) Improving infrastructure to decrease latency by 10% makes it possible to deploy more sophisticated ranking or query understanding functions (10% more time) Good engineering quality and culture is not separable from AI development but a mandatory part of AI culture
  • 14. AI ‘rank labs’ or AI platforms Search engine teams benefit from a unified environment to train, deploy, and serve models: it reduces work on infrastructure and MLOps, lets teams share metrics, and makes it easy to measure end to end metrics There is no single environment which will handle all types of AI development, AI serving, and AI measurement. Search systems are naturally complex, with different tasks, different environments, different languages, and different CI/CD systems, but the never ending work on unifying AI development infrastructure across search teams helps
  • 15. Multiple ways to ‘deploy’ AI models Deploy models to TF Serving, TorchServe etc Deploy models served in container Compile a model directly into a machine code or as a source code of search component (GBDT models into c++/java code to be used in ranking) Relearn and change parameters of existing models served Tons of other deployment scenarios Etc etc etc
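A minimal sketch of the "compile a GBDT model into source code" scenario from slide 15. The nested-dict tree format and the function names are hypothetical, invented for this illustration; real training libraries have their own model dump formats, which would need to be parsed first.

```python
# Hypothetical tree format: a leaf is {"value": v}; a split is
# {"feature": i, "threshold": t, "left": subtree, "right": subtree}.
def tree_to_cpp(node, indent="  "):
    """Emit the body of a C++ function that evaluates one decision tree."""
    if "value" in node:
        return f"{indent}return {node['value']};\n"
    return (
        f"{indent}if (f[{node['feature']}] < {node['threshold']}) {{\n"
        + tree_to_cpp(node["left"], indent + "  ")
        + f"{indent}}} else {{\n"
        + tree_to_cpp(node["right"], indent + "  ")
        + f"{indent}}}\n"
    )

def ensemble_to_cpp(trees):
    """Emit one C++ function per tree plus a score() that sums them."""
    parts = [
        f"static double tree_{i}(const double* f) {{\n{tree_to_cpp(t)}}}\n"
        for i, t in enumerate(trees)
    ]
    calls = " + ".join(f"tree_{i}(f)" for i in range(len(trees)))
    parts.append(f"double score(const double* f) {{\n  return {calls};\n}}\n")
    return "\n".join(parts)

example_trees = [
    {"feature": 0, "threshold": 0.5, "left": {"value": -1.0}, "right": {"value": 2.0}},
    {"feature": 1, "threshold": 3.0, "left": {"value": 0.25}, "right": {"value": 0.75}},
]
cpp_source = ensemble_to_cpp(example_trees)
print(cpp_source)
```

The generated code has no model-loading step and no interpreter in the hot path, which is the usual motivation for this deployment style in latency-sensitive ranking.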
  • 16. Multiple ways to serve AI models in Search Streaming (ex: document updates) Batch (ex: offline processing of queries or users or documents) Serving runtime services (ranking, query understanding)
  • 17. Multiple ways to improve AI in Search Change evaluation methods and metrics, train models to new metrics Change sampling procedures Change training procedures Change modeling techniques Model previously unmodeled tasks New data and new features Infrastructure changes in serving etc etc etc
  • 18. AI platforms So, it’s almost impossible to make one platform handle all types of AI development and deployment (see the variety in the previous slides) But unifying some of the tasks reduces development and operation effort and cost, and increases the velocity of AI development, bringing a lot of money and customer satisfaction Every AI driven Search company has created a “Rank Lab” AI platform for Search Now there are open source tools, such as Kubeflow and MLflow, that simplify development
  • 19. AI services in prod Good if they are decoupled, so multiple small teams can work on services independently But wiring is needed (a ‘signal’ from query understanding to be used in ranking etc) Processing of a query may call ~dozens of AI services, processing of a document in data acquisition and index may call 100s of AI services. Performance considerations are extremely important AI infrastructure benefits greatly from common software practices, protocols, orchestration to organize this ‘AI chaos’ and make order out of it
  • 20. AI in prod Besides ML objective functions and metrics, a. Latency b. Resiliency c. Throughput d. Resource utilization are super important factors in the design of every AI service in the search engine stack AI services should be tested and benchmarked against them during development Your model will serve billions of document updates or billions of queries; every 1 ms of delay, 1 ms of downtime, etc will cost either bad user experience or millions in ops
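A minimal sketch of benchmarking a service function against latency targets, as slide 20 recommends. The `benchmark` and `percentile` helpers are illustrative names, and the nearest-rank percentile is one simple choice among several definitions; a production setup would normally measure under realistic load rather than in a single-threaded loop.

```python
import time

def percentile(samples, p):
    """Nearest-rank style percentile of a non-empty list of numbers (p in 0..100)."""
    ordered = sorted(samples)
    rank = round(p / 100 * (len(ordered) - 1))
    rank = max(0, min(len(ordered) - 1, rank))
    return ordered[rank]

def benchmark(fn, inputs):
    """Call fn on each input and report p50/p99 latency in milliseconds."""
    latencies_ms = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return {"p50": percentile(latencies_ms, 50), "p99": percentile(latencies_ms, 99)}

# Stand-in for a model call; any callable taking one argument works here.
stats = benchmark(lambda q: sorted(q.split()), ["italian restaurant palo alto"] * 1000)
print(stats)
```

Tracking tail percentiles such as p99 rather than the mean matters because a query may fan out to many AI services, so the slowest call tends to dominate end to end latency.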
  • 21. Typically two types of AI platforms emerge - Type A. Horizontal: ranking, NLP, forecasting, fraud etc, to be applied everywhere - Type B. Vertical: indexing, query understanding - Different requirements - Invest in infrastructure for reusable / repeatable AI cases
  • 22. Next steps I’ll describe how AI is used in every component of the search stack. It’s everywhere, and everywhere it’s a tool to get significant improvement of that part of search. The original presentation was 170 slides; I did almost random sampling to fit the time given to me. A lot of important, high impact AI use cases are not included. After that I’ll describe what makes an AI engineer or data scientist successful in search and what’s important for long term career success in the area
  • 24. Indexing - index selection Perhaps one of the first AI applications in the search domain. It started in the 90s, when web volume increased and index selection strategy started to be important Which documents should be indexed? In which index layers should they be placed? (many modern search engines are multi level: a smaller index for frequently searched items, a very big and comprehensive index for rarely searched items) AI for quality and popularity assessment of ‘documents’
  • 25. Indexing - duplicate resolution Which ‘documents’ are essentially the same (represent the same item)? Or have highly duplicated content, so the second document does not carry more information? Which documents are the same with respect to a particular query? (we do not want to show a collocated Target and Target Pharmacy for the local search query target, but they are different entities for the query pharmacy)
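One simple way to flag the "highly duplicated content" case from slide 25 is Jaccard similarity over word shingles. This is a sketch of a classic technique, not necessarily what any of the search engines mentioned in the talk use; the example listings and the shingle size are made up for illustration.

```python
def shingles(text, k=3):
    """Set of k-word shingles of a text (the whole text if shorter than k words)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """Jaccard similarity of the shingle sets of two texts, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)

listing_a = "3 bed 2 bath house with large garden in Palo Alto"
listing_b = "3 bed 2 bath house with large garden in Palo Alto CA"
listing_c = "studio apartment near downtown Seattle with parking"

print(jaccard(listing_a, listing_b))  # near-duplicates score close to 1
print(jaccard(listing_a, listing_c))  # unrelated listings score close to 0
```

At the scale of billions of documents, pairwise comparison is not feasible, so approximations such as MinHash with locality-sensitive hashing are commonly used on top of the same idea. The query-dependent case on the slide (Target vs Target Pharmacy) needs more than content similarity.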
  • 26. Indexing - attributes extraction Given documents - full text descriptions of houses, ecommerce products, businesses etc - extract significant attributes important for search to understand items of interest (size, wheel size, weight, number of pages, location, view)
  • 27. Indexing Detection of adversarial content: explicit, spam
  • 28. Indexing - cold start Given a new item, predict its conversion rate, customer demand, probability to buy, propensity Probability to be bought for a particular query, customer, in conjunction with another item
  • 29. Indexing - knowledge graph Given a set of documents describing items, extract knowledge graphs, key items and other entities with an understanding of their relationships Map information from a set of documents with full text descriptions (typically billions of documents) into a structured knowledge graph representation of items of various types and their relationships
  • 30. Index - statistical tasks Evaluation of the quality and size of the index. Does our index provide good coverage? What categories are missing? What data quality problems are there? Evaluation of the index size of external systems
  • 31. Index - data quality What attributes are important and must be mandatory and which attributes can be optional in data acquisition? Which types of data, which categories to acquire? AI processes continuously looking into search logs to decide customer priorities, what drives conversion and using this information to drive data acquisition
  • 32. Demand generation beyond just data The same type of AI evaluation procedures to compute and forecast future demand of items, to drive purchase decision if search engine is used to sell items
  • 33. AI for query understanding Query understanding is mapping of customer’s query into a machine understandable format to retrieve a set of relevant items and rank them with highest probability of customer engagement (view, purchase, etc) Synonym expansion for better retrieval, removal of insignificant terms, correcting spelling and other errors, term weighting, attribute and entity extraction, compound and phrase extraction, classification (novelty, price range etc ) etc
  • 34. Query understanding parsing Dependency, constituency and other parsing as a part of the query understanding stack. Typically, it serves other parts of the query understanding stack
  • 35. Query understanding Classification Mapping a query into a certain set of categories to be used in retrieval and ranking -> most probable document category (italian -> restaurants in local search), -> most probable distance (gas -> 5 miles distance, Michelin restaurant -> 50 miles distance in local search) -> novelty: printers -> released within 1 year, pillows -> release date does not matter Typically: 100s of classifiers per search engine with significant impact on quality / revenue
  • 36. Query understanding Similar Queries Given queries q1 - q2 how similar they are (how results for one query will be good as results for the other query) Tons of applications in query understanding and ranking: given features for one query, apply them to another query for ranking, extend retrieval set etc etc
  • 37. Query understanding: term weighting Compute importance / weights of terms and phrases of queries to be used in ranking
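As a baseline illustration of term weighting, the sketch below assigns each query term a smoothed inverse document frequency (IDF) computed from a tiny made-up corpus. IDF is a standard starting point and is used here only as an example; the slide does not say which weighting method is used in production, where weights are often learned from user behaviour.

```python
import math
from collections import Counter

def idf_weights(corpus):
    """Smoothed IDF per term: log((N + 1) / (df + 1)) + 1."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc.lower().split()))
    return {t: math.log((n_docs + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}

def weight_query(query, idf, default_weight):
    """Map each query term to its weight; unseen terms get default_weight."""
    return {t: idf.get(t, default_weight) for t in query.lower().split()}

# Made-up corpus for illustration only.
corpus = [
    "italian restaurant in palo alto",
    "french restaurant in san jose",
    "gas station in palo alto",
]
idf = idf_weights(corpus)
weights = weight_query("italian restaurant in", idf, default_weight=max(idf.values()))
print(weights)
```

Rare terms such as "italian" get a higher weight than common ones such as "in", so a ranker can treat a missing rare term as a stronger signal of irrelevance.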
  • 38. Query understanding: entity and attribute extraction Given a query: map it into structured representation of entities and attributes to be used for better retrieval and ranking
  • 39. Query understanding: backfilling Given a query and its structured representation, generate a query which represents the latent user needs behind this query (not what the user types, but what she/he wants to find)
  • 40. Query understanding temporal aspects A particular example of query classification: detect temporal topicality shifts in query/user interest and in documents
  • 41. AI for ranking Learning to rank / Machine Learning based ranking technologies to rank documents (LeToR/MLR) AI for unbiased ranking User interaction based LeToR/Counterfactual Frequently it’s the biggest revenue driver of the search engine, but it’s over-represented in AI in Search lectures, so just one slide from me
  • 42. AI for search assistance Typeahead prediction - language modeling, other contextual information, location of the user, previous searches of the user Query dependent, user dependent navigational panels and guided search
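A minimal typeahead baseline: rank completions of a prefix by how often they appear in a query log. The log and function name are made up for illustration, and this ignores the language modeling, location, and user history signals the slide lists; those would typically be added as features on top of a frequency baseline like this one.

```python
from collections import Counter

def suggest(prefix, query_log, k=3):
    """Return up to k logged queries starting with prefix, most frequent first."""
    counts = Counter(q.lower() for q in query_log)
    matches = [(q, c) for q, c in counts.items() if q.startswith(prefix.lower())]
    # Sort by descending frequency, then alphabetically for a stable order.
    matches.sort(key=lambda qc: (-qc[1], qc[0]))
    return [q for q, _ in matches[:k]]

# Made-up query log for illustration only.
query_log = [
    "pizza near me", "pizza near me", "pizza hut", "pharmacy",
    "pizza near me", "pizza hut", "pillows",
]
print(suggest("pi", query_log))
print(suggest("ph", query_log))
```

In a real system the counts would live in a prefix tree or a precomputed prefix-to-suggestions table, since typeahead has to answer on every keystroke.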
  • 43. Vertical AI Very heterogeneous set of AI applications. examples: Image search - understanding image similarity for image-image search and extraction of keywords about images from image itself and relevant documents for text to image search Deep understanding of geographical data for local search, ex: extraction of region names from various text documents describing houses, businesses, geo entities
  • 44. AI Whole page Given an output of several search engines: how to combine them to construct the best customer experience. Ex: music, video, book, podcasts as in iTunes; web search, maps, youtube, news, image, books, scholar etc in Google
  • 45. AI SERP snippet How to generate the best descriptions of items in the search result page so customers understand relevance of items without clicking on them How to select the best chunk of text representing the item, picture, formats, - depending on the query and the user
  • 46. Indexing - infrastructure AI to learn new more efficient index structures AI to predict index strategies (for search engines based on databases)
  • 47. AI - Search ops Improving data center efficiency Predicting failing nodes Power Usage Effectiveness improvement
  • 48. AI Result page How to generate the page of the item / document? How to place image, title, full text descriptions, attributes, reviews, detailed descriptions, recommended items to maximize convenience to make the right decision for the customer
  • 49. AI price prediction Predict the price of the item (for selling search engines) Which will maximize item conversion and customer satisfaction and revenue of the company (economics problems, but tightly connected to search, depends on item position in search, relevance, exposure, prices of other search results)
  • 50. AI for search service performance AI to predict performance load in order to deploy new machines to serve increased traffic or adapt to reduced traffic to reduce operations costs AI for caching to predict which results/queries should be cached to improve latency/decrease load
  • 51. AI conversational search Conversational interfaces for search, multi turn interactions with customers to understand customer search intent and help her/him to express their intent or even to find it by making latent intent explicit NLP/NLU, dialog state management, deep reinforcement learning, text generation ASR for voice based systems
  • 52. AI SEO SEM Getting legitimate traffic from Google/Search engines, more users to the search How to present search engine outputs (result pages, search pages, other) so search engines pick them, index, rank higher, show for more queries, bring more users to the search engine NLP, generation of search friendly pages (good structure, titles, anchors) Analysis of performance of marketing campaigns, finding better keyword, search queries which are relevant and bring more users
  • 53. AI anti SEO Detection of users trying to affect the ranking or retrieval to promote their items Behaviour analysis, data science, text analytics to detect behaviour online (fake clicks, views, queries, purchases) or data manipulation (keyword stuffing, attribute manipulation) to affect retrieval and ranking of search engine to promote pages of SEO manipulator
  • 54. AI Question Answering Answering questions by finding the best factual answer rather than a document relevant to the query. Which French restaurant is the most romantic in Palo Alto? What’s the phone number of sales customer support? What’s the return policy? When will my items be shipped?
  • 55. Post search AI Given a set of queries relevant to user (saved queries, previous sessions) and a set of items relevant to users Generate email and other notifications about new items, price changes, availability changes - which will help users to find/buy/discover what they want
  • 56. What makes a good search engineer This part of the presentation is about what are qualities of a good search engineer and how to build career in AI/Search 1. How to be successful in your search projects and what makes you a good search engineer 2. How to be successful in a long term career building
  • 57. Qualities of a good Search engineer Required knowledge for long term success in search (to be able to deliver multiple successful projects with company level impact): 1. Machine Learning, new models, new features, 2. Engineering, implementing software solutions with performance, quality, etc requirements 3. Metrics / Customer, transforming customer experience into metrics which can be used for ML training, experiments/analysis 4. Statistics, design and analysis of experiments 5. Business, understanding business, how to transform business development into metrics/OKRs, and consequently into new search features, new search products
  • 58. Qualities of a good Search engineer Many search features require changes in many parts of search stack: indexing, ranking, query understanding, evaluation setups Requires collaboration with many different teams: engineering, MLE, research, statisticians. Ability to collaborate at large scale with multiple diverse teams: communications, document writing, project organization at multiple levels from coding to project management to product management
  • 59. Qualities of a good Search engineer Sometimes search development work requires a long effort by one person or a small team, where help from management or from colleagues will not change much It requires the ability to keep a long term focus and to work in an isolated, result focused environment (PhD style work)
  • 60. Qualities of a good Search engineer Ability to work on long term projects with no guaranteed outcomes Many search projects are focused on improving certain customer satisfaction metrics, (the number of local results, the number of new relevant results etc etc), improving the model, feature set, something else. Frequently, there is no guarantee that it’s achievable. Some search projects require work with multiple unsuccessful tries before finding a good solution Requires certain persistence to go through failure to failure before finding a successful solution
  • 61. Qualities of a good Search engineer Understanding the customer, and the skill of transforming an understanding of customer needs into actionable metrics A lot of search development is not about continuous improvement of one metric (relevance, query understanding, index size etc), but about discovering and understanding various aspects of customer satisfaction and transforming this understanding into new metrics, which can be used for training models and for measuring and improving the search
  • 62. Qualities of a good Search engineer Continuous awareness of new developments in many areas of ML/IR/NLP/statistics which can be used to improve search Continuous professional development, learning, reading, in machine learning/AI/NLP/IR, engineering/programming, and other professionals skills
  • 63. Qualities of a good Search engineer Success of many big projects and initiatives depends on collaboration with multiple teams from other technology teams to business departments (legal, marketing, etc) Ability to find a support and convince people with very different points of view about importance, criteria of success, impact of technology projects and Ability to listen to feedback and proposals of very different people from business to tech, objectively understand it and incorporate it into technology development
  • 64. Qualities of a good Search engineer The engineering part is super important and frequently underestimated in many articles and books. Only a small part of search development is the training of new models. The rest is development of new product features, building infrastructure to serve models, etc. Software engineering is a part of the job. Search engines have strict performance limits, and the search engine is the face of your business. If it’s down, the business is down. Quality engineering: knowing how to write good, high-quality, performant code, and how to test it, tune it, document it, etc, is a crucial part of search engineering success.
  • 65. Long term career success as a Search engineer Reputation is the number 1 success criterion of a long term career. Reputation of you as an engineer, MLE, leader, collaborator. Reputation of you, the teams you built, etc Reputation among engineering teams, business teams, your peers, partners and your leadership. Reputation based on different qualities, from building large scale systems to success in ML projects to understanding business needs and transforming them into engineering products The first 15 years of a career are focused on building a reputation
  • 66. Long term career success Select only jobs which truly suit you Next job offer: analyze the company: values, culture, technology area, business vision - is it what you want? This is very important for the first job you get after college or a PhD - a good initial fit is crucial Assess companies: will you relate to the business, culture and people? What you learn there will define your career for several decades Do a systematic assessment of every job offer - but especially the first job after college or a PhD, which is very important
  • 67. Long term career success The best job is a job with a company that suits you When you select your next step, be sure that the company’s culture, values, product, and engineering fit you, your development goals, and your values. Do not move because of a popular technology, a big title, a sudden unexpected salary increase, hype, or other reasons that are incidental to your long term career
  • 68. Long term career success Focus on development of long term professional relationships Develop a diverse base of meaningful work connections with colleagues from different technology departments, different lines of business, marketing, legal, recruiters etc, based on joint work and your reputation as you work with them
  • 69. Long term career success Within your company, Move to more strategic projects with big impact on the company business Strategic projects - More opportunities for career development, more meaningful work connections, more things to learn for long term career goals , typically more interesting technologies, more to learn about business, technology, customer, more opportunity for self development, more skills, more knowledge
  • 70. Long term career success Move to more strategic and bigger impact contributions in your area of work First job - develop models, develop software features as requested by your mentor or manager Move from individual projects to team projects, from coding and model training to defining vision, strategy, roadmap, execution, building teams In *every* role and project, widen your scope, do more challenging tasks, have a bigger impact on the company business
  • 71. Long term career success Do not complain, make changes It applies to code, technology, org structure, culture, relationships, products, anything you believe can be improved Do not just complain about things going wrong. Fix them whenever possible. By coding, writing documentation, making people aware of what is wrong and proposing solutions, at every level of your career, you can make bigger changes than are expected of you at this step of your career. Bring changes rather than whine. Even if a problem is well above your role, propose solutions, notify relevant people, bring value to solve it, rather than just complain.
  • 72. Long term career success Continuous professional development is crucial at every step of the career Every year ask yourself questions about the last 12 months 1. How much have I learned about the technologies, the products, the services, the markets? What part of this knowledge is relevant to my work? How much did it help to improve my performance (the performance of my team)? 2. How many new people have I gotten to know at work? How diverse is this set of people? How many people have I improved relationships with?
  • 73. Long term career success Continuous professional development is crucial at every step of the career Over the last 12 months 1. What new results and accomplishments have I achieved? What have I launched, improved? How much does it add to my reputation? Track record? 2. What new skills have I developed? Am I better in communications? Technology? Analytics skills? Judgement? In which areas? How can I do it better next year? What should I improve? How to apply these new skills, relationships, knowledge?
  • 74. QA