AI in Multi Billion Search Engines. Career building in AI / Search. What makes a good search engineer

AI in multi-billion (dollars, users, documents) search engines
Career in Search and AI
Andrei Lopatenko, PhD
Vice President of Engineering, Zillow Group

Who am I
Core contributor to Google Search (2006-2010), Apple AppStore/iTunes
Co-designed and co-implemented Apple Maps Search (2010), Walmart Grocery
Exit: Adviser, Ozlo, acquired by Facebook, 2017
Has led search teams at Zillow (now), Walmart, eBay
PhD in Computer Science, The University of Manchester, UK
My path: from core contributor to Google Search to leading the search ecosystems of market leaders in real estate (Zillow, Trulia), eCommerce (Walmart, eBay), and digital distribution (Apple)

My goal for this talk
To demonstrate that AI is a useful tool in every part of the search engine stack, from data acquisition to ranking and beyond.
AI significantly improves customer experience and revenue/GMV, reduces operational costs, and helps to get more customers and serve them better
Some ways of developing AI are better than others, and I’ll describe which ones are better
To outline what makes a good AI engineer for a successful career in search or other consumer-oriented, AI/ML-based services, and how to succeed in the long term

My message
Successful AI in Search is AI infrastructure, an engineering and science culture, and toolsets that make it possible to continuously introduce, measure, and improve AI features in every part of the search engine, rather than several SOTA models in ranking or query understanding
The same applies to every large-scale consumer- or business-facing platform (recommendation engines, call center analytics, etc.)

What’s AI
For the purpose of this talk, we take an extended definition of AI which includes machine learning, data science, statistical decision theory, statistics, and natural language processing: everything that lets machines make the right decisions for search tasks

What’s a search engine?
A customer-facing engine which, given a query, returns a set of answers
The query can be a natural language/text query, a location (location-based search), an item from the catalog (recommendation engine), a user profile (personalization), or any combination of them

Multi-billion? (customers, dollars, documents)
We focus on search engines with many billions of dollars of revenue/GMV, billions (or hundreds of millions) of users, and billions of documents. At this scale, investing in AI infrastructure pays off: it improves revenue by hundreds of millions or billions of dollars, or saves infrastructure costs at large scale, which justifies developing many AI applications for search

Typical Search Engine - High Level View
- Data Acquisition
- Indexing
- Ranking
- Retrieval
- Query Understanding
- UX
- Search Assistance
- Logging
- Monitoring
- Experiment management
- SRP logic
- Other
- Post Search Applications

Notation
In this presentation, ‘documents’ are items representing information about atomic search results, which might be documents, houses and other properties (real estate), web pages (web search), ecommerce products (ecommerce), apps/music/video/books (digital distribution), businesses (local search), or geographic entities (local search)
When I say “documents”, I mean information about the objects of the search engine: real estate properties, restaurants, streets, web pages, apps, etc.

Main Components
Aka Search Quality: search assistance services (autosuggest, dynamic facets, etc.), query understanding, ranking, SERP logic (snippet building, universal search: mixing results of different corpora)
Aka Search Infrastructure: retrieval, logging, monitoring, ops, index deployment
Aka Indexing: data acquisition (crawling the web, feeds, data imports), including data enrichment (duplicate resolution, data cleaning, mapping into common dictionaries, extraction), and indexing (building the index for retrieval systems)

Main Components
Post-search applications: recommendation engines that serve results to users based on their saved searches and saved items, and notifications about new items, new prices, availability
Search eXperience: view and design of search pages, result pages

AI is everywhere
Using AI components to improve data acquisition by 10%, indexing by 10%, query understanding by 10%, ranking by 10%, and the result page by 10% gives more gains in customer satisfaction and revenue/GMV than applying SOTA to improve only one component, such as query understanding or ranking, by 30%.
AI development should be driven by creating infrastructure, culture, and toolsets for continuous AI deployment, improvement, and measurement everywhere in the search stack. There are no ‘engineering’ teams; every team is an AI team
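A back-of-the-envelope check of the 10% vs. 30% claim, under an assumption of ours rather than the talk's: gains in different components compound multiplicatively and independently. Real components interact, so this only illustrates the arithmetic.

```python
from math import prod


def combined_lift(gains):
    """Overall relative lift from per-component relative gains.

    Assumes (hypothetically) that gains compound multiplicatively and independently.
    """
    return prod(1.0 + g for g in gains) - 1.0


# Five components improved by 10% each:
# data acquisition, indexing, query understanding, ranking, result page
broad = combined_lift([0.10] * 5)

# One component (e.g. ranking) improved by 30% with SOTA models
narrow = combined_lift([0.30])

print(f"broad: {broad:.1%}, narrow: {narrow:.1%}")
```

Under that assumption, five 10% improvements give roughly a 61% lift against 30% for the single component; even if the gains merely added up, the broad approach would still give 50%.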

AI is not separable from engineering
AI development is not separable from other engineering development.
Improving index selection by 10% makes it possible either to accommodate other data sources and improve coverage (more documents), or to improve ranking (fewer documents to rank frees computation for more advanced ranking functions)
Improving infrastructure to decrease latency by 10% makes it possible to deploy more sophisticated ranking or query understanding functions (10% more time)
Good engineering quality and culture are not separable from AI development; they are a mandatory part of AI culture

AI ‘rank labs’ or AI platforms
Search engine teams benefit from a unified environment to train, deploy, and serve models: it reduces work on infrastructure and MLOps, lets teams share metrics, and makes it easy to measure end-to-end metrics
There is no single environment that handles all types of AI development, AI serving, and AI measurement.
Search systems are naturally complex (different tasks, environments, languages, CI/CD systems), but never-ending work on unifying AI development infrastructure across search teams helps

Multiple ways to ‘deploy’ AI models
Deploy models to TF Serving, TorchServe, etc.
Deploy models served in a container
Compile a model directly into machine code or into the source code of a search component (ex: GBDT models compiled into C++/Java code used in ranking)
Relearn and change the parameters of existing served models
Many other deployment scenarios
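A minimal sketch of the "compile the model into source code" option: a small Python generator that turns a tree ensemble into C++ if/else code. The node format, the `<` split convention, and all function names are assumptions for illustration; real GBDT libraries have their own dump formats and split conventions (including missing-value handling), which a real converter must respect.

```python
# Hypothetical tree format (not any specific library's dump format):
#   internal node: {"feature": int, "threshold": float, "left": node, "right": node}
#   leaf:          {"leaf": float}
# Convention assumed here: samples with f[feature] < threshold go left.


def tree_to_cpp(node, indent=1):
    """Emit nested C++ if/else statements for one regression tree."""
    pad = "    " * indent
    if "leaf" in node:
        return f"{pad}return {node['leaf']!r};\n"
    return (
        f"{pad}if (f[{node['feature']}] < {node['threshold']!r}) {{\n"
        + tree_to_cpp(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_cpp(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def ensemble_to_cpp(trees, base_score=0.0):
    """Emit one static C++ function per tree plus a score() that sums them."""
    parts = [
        f"static double tree_{i}(const double* f) {{\n{tree_to_cpp(t)}}}\n"
        for i, t in enumerate(trees)
    ]
    calls = " + ".join(f"tree_{i}(f)" for i in range(len(trees))) or "0.0"
    parts.append(
        f"double score(const double* f) {{\n    return {base_score!r} + {calls};\n}}\n"
    )
    return "\n".join(parts)


def predict(trees, f, base_score=0.0):
    """Reference interpreter with the same semantics, for testing generated code."""
    total = base_score
    for node in trees:
        while "leaf" not in node:
            node = node["left"] if f[node["feature"]] < node["threshold"] else node["right"]
        total += node["leaf"]
    return total


trees = [
    {"feature": 0, "threshold": 0.5,
     "left": {"leaf": -0.2},
     "right": {"feature": 2, "threshold": 10.0,
               "left": {"leaf": 0.1}, "right": {"leaf": 0.4}}},
    {"feature": 1, "threshold": 3.0,
     "left": {"leaf": 0.05}, "right": {"leaf": -0.05}},
]
src = ensemble_to_cpp(trees, base_score=0.5)
print(src)
```

Generated code avoids tree walking and dictionary lookups at serving time; the Python `predict` serves as a reference to check the compiled function against on sample feature vectors.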

Multiple ways to serve AI models in Search
Streaming (ex: document updates)
Batch (ex: offline processing of queries or users or documents)
Serving runtime services (ranking, query understanding)

Multiple ways to improve AI in Search
Change evaluation methods and metrics, train models to new metrics
Change sampling procedures
Change training procedures
Change modeling techniques
Model previously unmodeled tasks
New data and new features
Infrastructure changes in serving, etc.

AI platforms
So, it’s almost impossible to build one platform that handles all types of AI development and deployment (see the variety in the previous slides)
But unifying some of the tasks reduces development and operational effort and cost, and increases the velocity of AI development, bringing a lot of money and customer satisfaction
Every AI-driven search company has created a “Rank Lab” AI platform for search
Now there are open-source tools, such as Kubeflow and MLflow, that simplify development

AI services in prod
It is good if they are decoupled, so multiple small teams can work on services independently
But wiring is needed (ex: a ‘signal’ from query understanding used in ranking)
Processing a query may call dozens of AI services; processing a document in data acquisition and indexing may call hundreds. Performance considerations are extremely important
AI infrastructure benefits greatly from common software practices, protocols, and orchestration to organize this ‘AI chaos’ and make order out of it

AI in prod
Besides ML objective functions and metrics
a. Latency
b. Resiliency
c. Throughput
d. Resource utilization
are super important factors in the design of every AI service in the search engine stack
AI services should be tested and benchmarked against them
Your model will serve billions of document updates or billions of queries: every 1 ms of delay or downtime will cost either a bad user experience or millions in ops
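A minimal sketch of how latency and resiliency requirements can be enforced at the call site: each AI service call gets a latency budget and a fallback value, and independent calls run concurrently. The service, the budgets, and the fallback policy are hypothetical; production systems add retries, circuit breakers, and load shedding.

```python
import asyncio


async def query_classifier(query: str, delay_s: float) -> str:
    """Hypothetical AI service; delay_s simulates model latency."""
    await asyncio.sleep(delay_s)
    return "restaurants" if "italian" in query else "unknown"


async def call_with_budget(coro, budget_s: float, fallback):
    """Return the service result, or `fallback` if it misses its latency budget or fails."""
    try:
        return await asyncio.wait_for(coro, timeout=budget_s)
    except Exception:  # timeouts and service errors both degrade to the fallback
        return fallback


async def understand(query: str):
    # Independent calls run concurrently, so total latency is bounded by the
    # largest budget rather than by the sum of the service latencies.
    fast, slow = await asyncio.gather(
        call_with_budget(query_classifier(query, 0.001), 0.1, "unknown"),
        call_with_budget(query_classifier(query, 0.5), 0.1, "unknown"),
    )
    return fast, slow


results = asyncio.run(understand("italian food"))
print(results)
```

The slow call is cut off at its 100 ms budget and replaced by the default signal, so the query is still served on time, just with less information for ranking.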

Typically, two types of AI platforms emerge
- Type A. Horizontal: ranking, NLP, forecasting, fraud, etc., applied everywhere
- Type B. Vertical: indexing, query understanding
- Different requirements
- Invest in infrastructure for reusable / repeatable AI cases

Next steps
I’ll describe how AI is used in every component of the search stack. It’s everywhere, and everywhere it’s a tool to get significant improvement in that part of search.
The original presentation was 170 slides; I sampled almost randomly to fit the time given to me. Many important, high-impact AI use cases are not included.
After that, I’ll describe what makes an AI engineer or data scientist successful in search and what’s important for long-term career success in the area

Indexing - index selection
Perhaps one of the first applications of AI in the search domain. It started in the 90s, when web volume increased and index selection strategy became important
Which documents should be indexed? In which index layers should they be placed? (many modern search engines are multi-level: a smaller index for frequently searched items, a very big and comprehensive index for rarely searched items)
AI for quality and popularity assessment of ‘documents’
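A minimal sketch of model-driven index selection for a multi-level index, assuming a model has already scored each document's expected popularity or quality. The two tiers, the capacity, and the threshold are hypothetical parameters.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    predicted_popularity: float  # hypothetical output of a quality/popularity model


def assign_tiers(docs, hot_capacity, min_score_to_index):
    """Hot tier: top docs by predicted popularity, up to hot_capacity.
    Full tier: remaining docs above the threshold. Skip: docs not worth indexing."""
    tiers = {"hot": [], "full": [], "skip": []}
    for d in sorted(docs, key=lambda x: x.predicted_popularity, reverse=True):
        if d.predicted_popularity < min_score_to_index:
            tiers["skip"].append(d.doc_id)
        elif len(tiers["hot"]) < hot_capacity:
            tiers["hot"].append(d.doc_id)
        else:
            tiers["full"].append(d.doc_id)
    return tiers


docs = [Doc("a", 0.9), Doc("b", 0.05), Doc("c", 0.6), Doc("d", 0.3), Doc("e", 0.001)]
tiers = assign_tiers(docs, hot_capacity=2, min_score_to_index=0.01)
print(tiers)
```

The quality of the tiering is only as good as the popularity model; the placement rule itself is deliberately simple.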

Indexing - duplicate resolution
Which ‘documents’ are essentially the same (represent the same item)? Or have highly duplicated content, so the second document does not carry more information?
Which documents are the same with respect to a particular query? (we do not want to show the collocated Target and Target Pharmacy for the local search query "target", but they are different entities for the query "pharmacy")
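A minimal sketch of content-based duplicate detection using word shingles and Jaccard similarity, a standard technique swapped in here for illustration. At billions of documents, exact pairwise Jaccard is replaced by approximations such as MinHash with locality-sensitive hashing, and the query-dependent case (Target vs. Target Pharmacy) needs entity and query signals this sketch does not model. The 0.8 threshold and k=3 are assumptions.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles (tuples of consecutive lowercased words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of intersection over size of union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, threshold=0.8, k=3):
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


a = "Spacious 3 bedroom house with a large garden, close to downtown Palo Alto"
b = "Spacious 3-bedroom house with a large garden close to downtown Palo Alto!"
c = "Cozy studio apartment near the waterfront in Seattle"
print(is_near_duplicate(a, b), is_near_duplicate(a, c))
```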

Indexing - attributes extraction
Given documents (full-text descriptions of houses, ecommerce products, businesses, etc.), extract significant attributes important for search, to understand the items of interest
(size, wheel size, weight, number of pages, location, view)

Indexing
Detection of adversarial content: explicit content, spam

Indexing - cold start
Given a new item, predict its conversion rate, customer demand, probability of purchase, propensity
Probability of being bought for a particular query, by a particular customer, or together with another item
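A minimal sketch of one simple cold-start baseline: shrinking a new item's conversion rate toward a category prior (Beta-Binomial smoothing). This is a swapped-in illustration; production systems typically predict cold-start conversion from content features with a learned model. The priors and `prior_strength` are hypothetical.

```python
# Hypothetical category-level conversion priors, e.g. estimated from historical logs.
CATEGORY_PRIOR = {"printers": 0.02, "pillows": 0.05}


def cold_start_conversion(category, conversions=0, impressions=0,
                          prior_strength=100.0, default_prior=0.01):
    """Shrink a new item's conversion rate toward its category prior:
    (conversions + prior * m) / (impressions + m), with m = prior_strength.
    With no data the estimate equals the prior; it moves toward the item's own
    observed rate as impressions accumulate."""
    prior = CATEGORY_PRIOR.get(category, default_prior)
    return (conversions + prior * prior_strength) / (impressions + prior_strength)


print(cold_start_conversion("printers"))  # brand new item: no impressions yet
print(cold_start_conversion("printers", conversions=1, impressions=10))
```

`prior_strength` acts like a number of pseudo-impressions: the larger it is, the more evidence an item needs before its own click-through data dominates the category prior.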

Indexing - knowledge graph
Given a set of documents describing items, extract knowledge graphs: key items and other entities, with an understanding of their relationships
Map information from a set of documents with full-text descriptions (typically billions of documents) into a structured knowledge graph representation of items of various types and their relationships

Index - statistical tasks
Evaluation of quality and size of the index.
Does our index provide good coverage? Which categories are missing? What data quality problems are there?
Evaluation of the index size of external systems
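A minimal sketch of an overlap (capture-recapture style) estimator for the size of an external index relative to one's own. It assumes both document samples are close to uniform, which is the hard part in practice (sampling through queries is biased toward popular documents). All numbers are hypothetical.

```python
def estimate_external_index_size(our_size, frac_ours_in_theirs, frac_theirs_in_ours):
    """Estimate the size of an external index B given our index A.

    |A & B| ~= frac_ours_in_theirs * |A|  and  |A & B| ~= frac_theirs_in_ours * |B|,
    so |B| ~= |A| * frac_ours_in_theirs / frac_theirs_in_ours.
    frac_ours_in_theirs: share of documents sampled uniformly from A that B contains.
    frac_theirs_in_ours: share of documents sampled uniformly from B that A contains.
    """
    if not 0 < frac_theirs_in_ours <= 1 or not 0 <= frac_ours_in_theirs <= 1:
        raise ValueError("fractions must be in (0, 1] and [0, 1] respectively")
    return our_size * frac_ours_in_theirs / frac_theirs_in_ours


# Hypothetical numbers: our index has 1B docs; 60% of our sample is found in theirs,
# 40% of their sample is found in ours.
print(f"{estimate_external_index_size(1_000_000_000, 0.6, 0.4):,.0f}")
```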

Index - data quality
What attributes are important and must be mandatory, and which attributes can be optional in data acquisition? Which types of data and which categories should we acquire?
AI processes continuously look into search logs to determine customer priorities and what drives conversion, and use this information to drive data acquisition

Demand generation beyond just data
The same type of AI evaluation procedures is used to compute and forecast future demand for items, to drive purchasing decisions if the search engine is used to sell items

AI for query understanding
Query understanding is mapping a customer’s query into a machine-understandable format to retrieve a set of relevant items and rank them by the highest probability of customer engagement (view, purchase, etc.)
Synonym expansion for better retrieval, removal of insignificant terms, correcting spelling and other errors, term weighting, attribute and entity extraction, compound and phrase extraction, classification (novelty, price range, etc.), etc.

Query understanding parsing
Dependency, constituency, and other parsing as part of the query understanding stack. Typically, it serves other parts of the query understanding stack

Query understanding Classification
Mapping a query into a certain set of categories to be used in retrieval and ranking
-> most probable document category (italian -> restaurants in local search)
-> most probable distance (gas -> 5 miles, Michelin restaurant -> 50 miles in local search)
-> novelty: printers -> released within 1 year, pillows -> release date does not matter
Typically, 100s of classifiers per search engine, with significant impact on quality / revenue
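A minimal sketch of the simplest signal for query-to-category classification: the distribution of categories of items clicked for that query in the logs. The log format and the confidence threshold are assumptions; a learned classifier is needed to generalize to queries with little or no history.

```python
from collections import Counter, defaultdict


def build_query_category_model(click_log):
    """click_log: iterable of (query, category_of_clicked_item) pairs (hypothetical format)."""
    model = defaultdict(Counter)
    for query, category in click_log:
        model[query.lower().strip()][category] += 1
    return model


def classify(model, query, min_confidence=0.6):
    """Return (category, share_of_clicks), or None if the query is unseen or ambiguous."""
    dist = model.get(query.lower().strip())
    if not dist:
        return None
    category, n = dist.most_common(1)[0]
    share = n / sum(dist.values())
    return (category, share) if share >= min_confidence else None


log = [("italian", "restaurants")] * 3 + [("italian", "grocery"), ("gas", "gas_stations")]
model = build_query_category_model(log)
print(classify(model, "Italian"), classify(model, "gas"), classify(model, "pizza"))
```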

Query understanding Similar Queries
Given queries q1 and q2, how similar are they (how good would the results for one query be as results for the other)?
Many applications in query understanding and ranking: given features for one query, apply them to another query for ranking, extend the retrieval set, etc.

Query understanding: term weighting
Compute the importance / weights of terms and phrases in queries, to be used in ranking
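A minimal sketch of a classic baseline for term weighting: smoothed inverse document frequency over the corpus, normalized per query. Production systems usually learn term weights from query logs and engagement; the corpus and the fallback weight for unseen terms are assumptions.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed IDF: log((N + 1) / (df + 1)) + 1 for every term in the corpus."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    n = len(docs)
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def weight_query(query, idf):
    """Normalized term weights for a query; unseen terms get the max IDF (assumed rare)."""
    unseen = max(idf.values(), default=1.0)
    weights = {t: idf.get(t, unseen) for t in query.lower().split()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total else {}


corpus = ["red running shoes", "blue running shoes", "red dress", "running watch"]
w = weight_query("red running shoes", idf_weights(corpus))
print({t: round(x, 3) for t, x in w.items()})
```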

Query understanding: entity and attribute extraction
Given a query, map it into a structured representation of entities and attributes, to be used for better retrieval and ranking

Query understanding: backfilling
Given a query and its structured representation,
generate a query which represents the latent user needs behind this query (not what the user types, but what she/he wants to find)

Query understanding temporal aspects
A particular example of query classification:
detect temporal topicality shifts in query/user interest and in documents

AI for ranking
Learning to rank / machine-learning-based ranking technologies to rank documents (LeToR/MLR)
AI for unbiased ranking
User-interaction-based LeToR / counterfactual LeToR
Frequently it’s the biggest revenue driver of the search engine, but it’s over-represented in AI in Search lectures, so just one slide from me
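A minimal sketch of the counterfactual idea behind unbiased learning to rank: clicks at lower positions are re-weighted by the inverse of the probability that users examined that position (inverse propensity scoring). The propensity model `(1/rank)**eta` is an assumption here; in practice propensities are estimated, for example from randomization experiments, and the weights usually feed the LTR loss rather than a per-document score.

```python
def propensity(rank, eta=1.0):
    """Assumed probability that a user examines 1-based position `rank`."""
    return (1.0 / rank) ** eta


def ips_relevance(impressions, eta=1.0):
    """impressions: list of (doc_id, rank, clicked).
    Returns an IPS-weighted click rate per document: each click is divided by the
    examination propensity of the position where it happened."""
    totals, counts = {}, {}
    for doc_id, rank, clicked in impressions:
        weight = 1.0 / propensity(rank, eta) if clicked else 0.0
        totals[doc_id] = totals.get(doc_id, 0.0) + weight
        counts[doc_id] = counts.get(doc_id, 0) + 1
    return {d: totals[d] / counts[d] for d in totals}


# Hypothetical log: "A" always shown at rank 1 (5 clicks / 10), "B" at rank 4 (2 clicks / 10)
log = [("A", 1, i < 5) for i in range(10)] + [("B", 4, i < 2) for i in range(10)]
scores = ips_relevance(log)
print(scores)
```

Raw CTR ranks A above B; after correcting for position bias, B looks more relevant. Individual IPS estimates can exceed 1 because they are noisy estimates that are unbiased only in expectation under the assumed propensity model.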

AI for search assistance
Typeahead prediction: language modeling, other contextual information, the user's location, the user's previous searches
Query-dependent, user-dependent navigational panels and guided search
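A minimal sketch of typeahead ranking: complete the typed prefix from past queries, ordered by frequency, with an optional multiplicative boost standing in for contextual signals such as location or previous searches. The boost mechanism is hypothetical, and a production system would use a trie or precomputed top-k lists per prefix plus a language model rather than a linear scan.

```python
from collections import Counter


class Typeahead:
    """Frequency-ranked prefix completion over a (hypothetical) query log."""

    def __init__(self, query_log):
        self.counts = Counter(q.lower().strip() for q in query_log)

    def suggest(self, prefix, k=3, boost=None):
        """Top-k completions by count times an optional context boost; ties sort alphabetically."""
        prefix = prefix.lower()
        boost = boost or {}
        matches = [q for q in self.counts if q.startswith(prefix)]
        matches.sort(key=lambda q: (-self.counts[q] * boost.get(q, 1.0), q))
        return matches[:k]


ta = Typeahead(["homes for sale", "homes for sale", "homes for rent",
                "home depot", "homes for sale seattle"])
print(ta.suggest("home"))
print(ta.suggest("home", boost={"homes for sale seattle": 3.0}))  # e.g. user in Seattle
```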

Vertical AI
Very heterogeneous set of AI applications. Examples:
Image search: understanding image similarity for image-to-image search, and extracting keywords about images from the image itself and from relevant documents for text-to-image search
Deep understanding of geographical data for local search, ex: extracting region names from various text documents describing houses, businesses, geo entities

AI Whole page
Given the output of several search engines, how do we combine them to construct the best customer experience?
Ex: music, video, books, podcasts as in iTunes;
web search, maps, YouTube, news, images, books, Scholar, etc. in Google

AI SERP snippet
How to generate the best descriptions of items on the search result page, so customers understand the relevance of items without clicking on them
How to select the best chunk of text, picture, and format representing the item, depending on the query and the user
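A minimal sketch of query-dependent snippet selection: split the description into sentences and show the one that covers the most query terms. Real snippet builders also weigh term importance, position, readability, and the user; this overlap score is an assumption for illustration.

```python
import re


def best_snippet(text, query, max_chars=160):
    """Pick the sentence that covers the most distinct query terms.

    Ties go to the earlier sentence (max() keeps the first maximal element).
    """
    q_terms = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    best = max(sentences, key=lambda s: len(q_terms & set(re.findall(r"\w+", s.lower()))))
    return best if len(best) <= max_chars else best[: max_chars - 3].rstrip() + "..."


listing = ("Charming 3-bedroom home built in 1925. "
           "Recently renovated kitchen with quartz counters. "
           "Large backyard with a pool and a vegetable garden.")
print(best_snippet(listing, "backyard pool"))
print(best_snippet(listing, "kitchen"))
```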

Indexing - infrastructure
AI to learn new, more efficient index structures
AI to predict index strategies (for search engines based on databases)

AI - Search ops
Improving data center efficiency
Predicting failing nodes
Power Usage Effectiveness improvement

AI Result page
How to generate the page of the item / document?
How to place the image, title, full-text description, attributes, reviews, detailed descriptions, and recommended items to make it most convenient for the customer to make the right decision

AI price prediction
Predict the price of the item (for search engines that sell items)
that will maximize item conversion, customer satisfaction, and the revenue of the company
(an economics problem, but tightly connected to search: it depends on the item's position in search, relevance, exposure, and the prices of other search results)

AI for search service performance
AI to predict performance load, to deploy new machines to serve increased traffic or adapt to reduced traffic to reduce operational costs
AI for caching: predicting which results/queries should be cached to improve latency / decrease load

AI conversational search
Conversational interfaces for search: multi-turn interactions with customers to understand their search intent and help them express it, or even to find it by making latent intent explicit
NLP/NLU, dialog state management, deep reinforcement learning, text generation
ASR for voice-based systems

AI SEO SEM
Getting legitimate traffic from Google/search engines: more users to the search engine
How to present search engine outputs (result pages, search pages, other) so that search engines pick them up, index them, rank them higher, show them for more queries, and bring more users to the search engine
NLP, generation of search-friendly pages (good structure, titles, anchors)
Analysis of the performance of marketing campaigns, finding better keywords and search queries that are relevant and bring more users

AI anti SEO
Detection of users trying to affect ranking or retrieval to promote their items
Behaviour analysis, data science, text analytics to detect online behaviour (fake clicks, views, queries, purchases) or data manipulation (keyword stuffing, attribute manipulation) intended to affect the search engine's retrieval and ranking and promote the SEO manipulator's pages
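A minimal sketch of one weak text signal for keyword stuffing: the share of a description taken by its single most frequent term. The threshold, the minimum length, and the examples are assumptions; as the slide says, real anti-SEO systems combine many text and behavioural signals, and a density check alone is easy to evade.

```python
import re
from collections import Counter


def term_density(text, min_len=3):
    """Return (share of the most frequent term, that term, number of counted words),
    counting only words with at least min_len characters."""
    words = [w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_len]
    if not words:
        return 0.0, None, 0
    term, n = Counter(words).most_common(1)[0]
    return n / len(words), term, len(words)


def looks_stuffed(text, threshold=0.3, min_words=10):
    """Flag texts where one term dominates; texts that are too short are not judged."""
    density, _, n_words = term_density(text)
    return n_words >= min_words and density >= threshold


spam = ("Cheap shoes cheap shoes best cheap shoes buy cheap shoes "
        "cheap shoes online cheap shoes")
normal = ("Comfortable running shoes with breathable mesh upper "
          "and cushioned sole for daily training")
print(looks_stuffed(spam), looks_stuffed(normal))
```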

AI Question Answering
Answering questions by finding the best factual answer rather than a document relevant to the query.
Which French restaurant is the most romantic in Palo Alto? What’s the phone number of sales customer support? What’s the return policy? When will my items be shipped?

Post search AI
Given a set of queries relevant to the user (saved queries, previous sessions) and a set of items relevant to the user,
generate email and other notifications about new items, price changes, and availability changes that will help users find/buy/discover what they want
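A minimal sketch of the candidate-matching step behind saved-search alerts, using hypothetical real estate fields. The harder AI parts the slide refers to (ranking the matches, deciding which notifications actually help and how often to send them) would sit on top of this rule-based filter.

```python
from dataclasses import dataclass


@dataclass
class SavedSearch:
    user_id: str
    city: str
    max_price: float
    min_bedrooms: int = 0


@dataclass
class Listing:
    listing_id: str
    city: str
    price: float
    bedrooms: int


def matches(search, item):
    """Hypothetical hard filters of a saved search."""
    return (item.city == search.city
            and item.price <= search.max_price
            and item.bedrooms >= search.min_bedrooms)


def notifications(saved_searches, new_items):
    """Map user_id -> ids of new items matching any of that user's saved searches."""
    out = {}
    for s in saved_searches:
        for it in new_items:
            if matches(s, it):
                ids = out.setdefault(s.user_id, [])
                if it.listing_id not in ids:  # a user may have overlapping saved searches
                    ids.append(it.listing_id)
    return out


searches = [SavedSearch("u1", "Seattle", 800_000, 3), SavedSearch("u2", "Palo Alto", 2_000_000)]
items = [Listing("L1", "Seattle", 750_000, 3), Listing("L2", "Seattle", 900_000, 4),
         Listing("L3", "Palo Alto", 1_900_000, 2)]
print(notifications(searches, items))
```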

What makes a good search engineer
This part of the presentation is about the qualities of a good search engineer and how to build a career in AI/Search
1. How to be successful in your search projects and what makes you a good search engineer
2. How to be successful in long-term career building

Qualities of a good Search engineer
Required knowledge for long-term success in search (to be able to deliver multiple successful projects with company-level impact):
1. Machine Learning: new models, new features
2. Engineering: implementing software solutions that meet performance, quality, and other requirements
3. Metrics / Customer: transforming customer experience into metrics which can be used for ML training and experiments/analysis
4. Statistics: design and analysis of experiments
5. Business: understanding the business and how to transform business development into metrics/OKRs, and consequently into new search features and new search products

Many search features require changes in many parts of the search stack: indexing, ranking, query understanding, evaluation setups
They require collaboration with many different teams: engineering, MLE, research, statisticians.
Ability to collaborate at large scale with multiple diverse teams: communication, document writing, project organization at multiple levels, from coding to project management to product management

Sometimes, search development requires long-term efforts by one person or a small team, where help from management or from colleagues will not change much
It requires the ability to keep a long-term focus and to work in an isolated, result-focused environment (PhD-style work)

Ability to work on long-term projects with no guaranteed outcomes
Many search projects are focused on improving certain customer satisfaction metrics (the number of local results, the number of new relevant results, etc.) by improving the model, the feature set, or something else.
Frequently, there is no guarantee that it’s achievable. Some search projects require multiple unsuccessful tries before finding a good solution
It requires persistence to go from failure to failure before finding a successful solution

Qualities of a good Search engineer
Understanding the customer, and the skill of transforming an understanding of customer needs into actionable metrics
A lot of search development is not about continuously improving one metric (relevance, query understanding, index size, etc.), but about discovering and understanding various aspects of customer satisfaction and transforming this understanding into new metrics, which can be used for training models, measurement, and improvement of the search

Continuous awareness of new developments in the many areas of ML/IR/NLP/statistics that can be used to improve search
Continuous professional development: learning and reading in machine learning/AI/NLP/IR, engineering/programming, and other professional skills

Success of many big projects and initiatives depends on collaboration with multiple teams, from other technology teams to business departments (legal, marketing, etc.)
Ability to find support and convince people with very different points of view about the importance, criteria of success, and impact of technology projects
Ability to listen to feedback and proposals from very different people, from business to tech, understand them objectively, and incorporate them into technology development

Qualities of a good Search Engineer
The engineering part is super important and frequently underestimated in many articles and books. Only a small part of search development is training new models. The other part is developing new product features, building infrastructure to serve models, etc.: software engineering is part of the job.
Search engines have strict performance limits, and the search engine is the face of your business. If it’s down, the business is down. Quality engineering matters.
Skills in writing good, high-quality, performant code, and in testing, tuning, and documenting it, are a crucial part of search engineering success.

Long term career success as a Search engineer
Reputation is the number 1 criterion of long-term career success.
Your reputation as an engineer, MLE, leader, collaborator. The reputation of you, the teams you built, etc.
Reputation among engineering teams, business teams, your peers, partners, and your leadership.
Reputation based on different qualities, from building large-scale systems, to success in ML projects, to understanding business needs and transforming them into engineering products
The first 15 years of a career are focused on building a reputation

Long term career success
Select only jobs which truly suit you
Next job offer: analyze the company's values, culture, technology area, business vision. Is it what you want?
Very important for the first job after college, PhD, etc.: a good initial fit is crucial
Assess companies: will you relate to their business, culture, and people?
What you learn there will define your career for several decades
Do a systematic assessment of every job offer, but especially of the first job after college or PhD

The best job is a job with a company that suits you
When you select your next step, make sure that the company's culture, values, product, and engineering fit you, your development goals, and your values. Do not move because of a popular technology, a big title, a sudden unexpected salary increase, hype, or other reasons accidental to your long-term career

Focus on development of long term professional relationships
Develop a diverse base of meaningful work connections with colleagues from different technology departments, different lines of business, marketing, legal, recruiters, etc., based on joint work and the reputation you earn as you work with them

Within your company, move to more strategic projects with big impact on the company's business
Strategic projects: more opportunities for career development, more meaningful work connections, more things to learn for long-term career goals, typically more interesting technologies, more to learn about business, technology, and the customer, more opportunity for self-development, more skills, more knowledge

Move to more strategic and bigger-impact contributions in your area of work
First job: develop models and software features as requested by your mentor or manager
Move from individual projects to team projects, from coding and model training to defining vision, strategy, roadmap, execution, building teams
In *every* role and project, widen your scope, take on more challenging tasks, make a bigger impact on the company's business

Do not complain, make changes
This applies to code, technology, org structure, culture, relationships, products, anything you believe can be improved
Do not just complain about things going wrong. Fix them whenever possible: by coding, writing documentation, making people aware of what is wrong, and proposing solutions. At every level of your career, you can make bigger changes than expected at that stage. Bring changes rather than whine.
Even if a problem is well above your role, propose solutions, notify the relevant people, and bring value to solve it, rather than just complain.

Continuous professional development is crucial at every step of the career
Every year, ask yourself these questions about the last 12 months:
1. How much have I learned about the technologies, the products, the services, the markets? What part of this knowledge is relevant to my work? How much did it help to improve my performance (the performance of my team)?
2. How many new people have I gotten to know at work? How diverse is this set of people? With how many people have I improved relationships?

Continuous professional development is crucial at every step of the career
Over the last 12 months
1. What new results and accomplishments have I achieved? What have I launched or improved? How much does it add to my reputation? My track record?
2. What new skills have I developed? Am I better at communication? Technology? Analytics? Judgement? In which areas?
How can I do it better next year? What should I improve? How can I apply these new skills, relationships, and knowledge?
