Hypothesis	Testing:		
How	to	Eliminate	Ideas	as	Soon	as	Possible

Roman	Zykov	
Retail	Rocket	
Boston,	RecSys	2016
Contents
• Intro	
• Offline	vs	Online	testing	
• Make	offline	testing	shorter	
• Artificial	diversity	metric	
• Online	tests
Retail	Rocket
• Personalised	real-time	recommendations	
• E-commerce	only	
• Multiple	channels	(site,	email,	…)	
• Founded	in	2012	
• Offices:	Amsterdam,	Barcelona,	Milan,	Moscow	
• 1000+	retail	partners	
• 100+	million	daily	events
Why is testing important?
• Highly competitive market
• It’s not hard to build your own recommendations
• Constant	changes	in	the	product	and	algorithms	
• Fast	and	reliable	decisions
Offline	vs	Online	testing
Offline	testing		forecasts	online	testing	results	
• Relatively	fast,	testing	of	minor	changes	requires	hours	
• Few	resources:	data,	computational	resources,	code,	1	dev	
• Hard	to	forecast	online	metrics	in	some	cases	
• Influence	of	an	algorithm	on	users'	behaviour	is	ignored	
• Bad	values	of	offline	metrics	prevent	online	implementation	
Online	test	-	final	decision	point	
• Takes a long time: at least two cycles of decision making
• Requires many resources: design, onsite production, etc.
Testing	facts
• Nine	out	of	ten	ideas	do	not	improve	anything	
• Most	ideas	have	minor	impact:	
o add	new	data:	extracted	from	text,	images,	etc	
o adjust	parameters	of	algorithm
Offline	testing
Offline	predicts	Online
Major	changes	or	new	algorithm	
• Always	check	by	online	experiment	
• Find an appropriate offline metric afterwards
• Try	different	definitions	of	users’	sessions	
• Try	different	events	sequences	
Minor	changes	
• Use offline tests if you have a proven offline metric
Make offline testing shorter
What	we	did	
• Functional programming in Scala on Spark. Four languages (Python, Java, Pig, Hive) had been used previously.
• Research in Scala/Spark notebooks with added R integration for graphics
• Offline evaluation framework for all of our tasks, with metric calculations. It is the most complicated project at Retail Rocket.
What	we	got	
• It takes hours to prove or disprove any simple idea, whereas previously it could take days
• Research is limited by the power of our cluster and the number of data scientists
Scala/Spark	notebook	with	R
Offline	framework
• Scala	on	Spark	
• Deals	with	existing	web	logs	
• Implicit	feedback	
• Major	metrics:	
o Recall,	Diversity,	Recall	with	NN,	Empty	Recs	
• Minor	metrics:	
o Serendipity,	Novelty,	Coverage	
• Different	types	of	events	sequences	
• Different	definitions	of	users’	sessions	
• Personalised	/	Non-personalised	recommendations	
• Adjustable number (top N) of visible recommendations
• Test	panel	of	sites	from	different	domains
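The metric calculations above can be illustrated with a minimal Python sketch of two of the listed metrics, Recall and Empty Recs, with an adjustable top. This is not the Retail Rocket framework (which runs in Scala on Spark); the function name `evaluate`, the input shapes and the sample data are all hypothetical.

```python
from typing import Dict, List, Tuple


def evaluate(recs: Dict[str, List[str]],
             pairs: List[Tuple[str, str]],
             top: int = 4) -> Dict[str, float]:
    """Score recommendations against observed (source item, real item) pairs.

    recs  : item -> ranked list of recommended items (hypothetical format)
    pairs : events taken from web logs, e.g. (viewed item, purchased item)
    top   : how many recommendations the user is assumed to see
    """
    hits = 0
    empty = 0
    for source, real_item in pairs:
        shown = recs.get(source, [])[:top]
        if not shown:
            empty += 1          # nothing to show for this source item
        elif real_item in shown:
            hits += 1           # the real item was among the visible recs
    n = len(pairs)
    return {
        "recall": hits / n if n else 0.0,
        "empty_recs": empty / n if n else 0.0,
    }


# Made-up data for illustration only.
recs = {"a": ["b", "c", "d"], "b": ["c"], "c": []}
pairs = [("a", "b"), ("a", "d"), ("b", "a"), ("c", "a"), ("x", "a")]

metrics_top2 = evaluate(recs, pairs, top=2)
metrics_top3 = evaluate(recs, pairs, top=3)
```

Widening the top from 2 to 3 turns the ("a", "d") pair into a hit, which is why the top has to be fixed to what a site actually displays before metrics are compared.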
Offline	events	sequences
Session: view1, view2, view3, cart1, cart2, view4, view5, view6, purchase1
• View2View: view1 -> view2, view2 -> view3, view3 -> view4, view4 -> view5, view5 -> view6
• View2Cart: view1 -> cart1, view2 -> cart1, view3 -> cart1, view4 -> cart1, view5 -> cart2, view6 -> cart2
• View2Purchase: view1 -> purchase1, view2 -> purchase1, view3 -> purchase1, view4 -> purchase1, view5 -> purchase1, view6 -> purchase1
• Cart2Purchase: cart1 -> purchase1, cart2 -> purchase1
• Cart2Cart: cart1 -> cart2
*	Events:	product	view,	add	to	cart,	purchase,	main	page	view,	search,	catalog	page,	…
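As a rough illustration of how such pairs might be extracted from a session, the Python sketch below pairs every event of one type with every later event of another type. It covers only the View2Purchase and Cart2Purchase sequences; the slides do not state the pairing rules for the other sequence types, so they are left out. The `Event` tuple and the `pairs` function are hypothetical names.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (event type, item label), a hypothetical format


def pairs(session: List[Event], src: str, dst: str) -> List[Tuple[str, str]]:
    """Pair every `src` event with every later `dst` event in one session."""
    out = []
    for i, (src_type, src_item) in enumerate(session):
        if src_type != src:
            continue
        for dst_type, dst_item in session[i + 1:]:
            if dst_type == dst:
                out.append((src_item, dst_item))
    return out


# The session from the slide, in its original order.
session = [
    ("view", "view1"), ("view", "view2"), ("view", "view3"),
    ("cart", "cart1"), ("cart", "cart2"),
    ("view", "view4"), ("view", "view5"), ("view", "view6"),
    ("purchase", "purchase1"),
]

view2purchase = pairs(session, "view", "purchase")
cart2purchase = pairs(session, "cart", "purchase")
```

Each extracted pair is then a (source item, real item) example against which a recommendation list can be scored.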
Offline	metric	examples	
Session: view1, view2, view3, cart1, cart2, view4, view5, view6, purchase1
What	Customers	Buy	After	Viewing	This	Item	
• View2Cart	
• View2Purchase	
• …	
Customers	Who	Bought	This	Item	Also	Bought		
• Cart2Cart	
• Cart2Purchase	
• View2Cart	
• …
Case:	Artificial	diversification
Artificial	diversification
Original
After
Problem: Recall cannot be used to evaluate artificially diversified recommendations
Recall	with	Nearest	Neighbours	(NN)
[Figure: top 4 recs, each with its nearest neighbours by content-based similarity, compared against the real item]
• Direct hit: 1.0
• Indirect hit: 0.5
• No hit: 0.0
Metric = Average over all sessions
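One possible reading of this metric, sketched in Python: a session scores 1.0 if the real item is among the top recs, 0.5 if it is a content-based nearest neighbour of one of them, and 0.0 otherwise, and the metric is the mean over sessions. Treating 0.5 as a fixed score for any indirect hit is an assumption, as are the function names and the sample data; how the neighbours are computed is not covered by the slides and is taken as a given lookup table here.

```python
from typing import Dict, List, Set, Tuple

# Scores taken from the slide; using them as fixed values is an assumption.
DIRECT_HIT, INDIRECT_HIT, NO_HIT = 1.0, 0.5, 0.0


def session_score(top_recs: List[str],
                  neighbours: Dict[str, Set[str]],
                  real_item: str) -> float:
    """Score one session against the item the user really interacted with."""
    if real_item in top_recs:
        return DIRECT_HIT
    for rec in top_recs:
        # Indirect hit: the real item is a nearest neighbour of a shown rec.
        if real_item in neighbours.get(rec, set()):
            return INDIRECT_HIT
    return NO_HIT


def recall_with_nn(sessions: List[Tuple[List[str], str]],
                   neighbours: Dict[str, Set[str]]) -> float:
    """Average the per-session scores over all sessions."""
    scores = [session_score(recs, neighbours, real) for recs, real in sessions]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up data: r1..r4 are the top 4 recs, n1..n3 their nearest neighbours.
neighbours = {"r1": {"n1", "n2"}, "r2": {"n3"}}
top4 = ["r1", "r2", "r3", "r4"]
sessions = [(top4, "r2"), (top4, "n3"), (top4, "zz")]

metric = recall_with_nn(sessions, neighbours)
```

The three sample sessions produce a direct hit, an indirect hit and no hit, so a diversified list that swaps an item for a close neighbour still earns partial credit instead of counting as a miss.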
Online	A/B	testing
AA/BB tests
• Control group: two A groups
• Test group: two B groups
[Figure: A, A, B, B results in the "Ideal" and "Dirty" cases]
Bayesian	approach
• Conversion	rates	
o Beta	distribution	with	normal	priors		
• Average	Order	Values	
o Normal	distribution	(after	log)	with	normal	priors	
• Priors	from	historical	data	before	experiment	
Anything	may	be	done	with	posteriors.	
E.g.: There is a 95% chance that A has a 1% lift over B
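A statement of that kind can be read off posterior samples. The Python sketch below does this for conversion rates only, and it swaps in a plain conjugate Beta prior with Monte Carlo sampling rather than the priors described on the slide. The function name `prob_lift`, the flat default prior and the counts are all hypothetical; in practice the prior would come from historical data.

```python
import random
from typing import Tuple


def prob_lift(conv_a: int, n_a: int, conv_b: int, n_b: int,
              min_lift: float = 0.01,
              prior: Tuple[float, float] = (1.0, 1.0),
              draws: int = 20000,
              seed: int = 0) -> float:
    """Estimate P(rate_A > rate_B * (1 + min_lift)) from Beta posteriors.

    prior is (alpha, beta) of a Beta prior shared by both groups; the flat
    (1, 1) default is only a placeholder for a prior built from history.
    """
    rng = random.Random(seed)
    alpha0, beta0 = prior
    wins = 0
    for _ in range(draws):
        # Beta prior + binomial data gives a Beta posterior for each group.
        rate_a = rng.betavariate(alpha0 + conv_a, beta0 + n_a - conv_a)
        rate_b = rng.betavariate(alpha0 + conv_b, beta0 + n_b - conv_b)
        if rate_a > rate_b * (1.0 + min_lift):
            wins += 1
    return wins / draws


# Made-up counts: 600 of 10,000 visitors convert in A, 500 of 10,000 in B.
p_a_beats_b = prob_lift(600, 10_000, 500, 10_000, min_lift=0.01)
p_no_difference = prob_lift(500, 10_000, 500, 10_000, min_lift=0.0)
```

With identical counts in both groups the probability stays near one half, which is the behaviour an AA comparison should show.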
Conclusion
• Offline	testing	can	predict	online	results	
• One	programming	language	for	R&D	reduces	the	test	time	
• The	Scala	language	is	a	good	alternative	for	ML	tasks	
• Different	event	sequences	for	offline	metrics	
• Recall	with	Nearest	Neighbours	(NN)	metric
Thank	you!
Roman	Zykov	
Retail	Rocket		
rzykov@retailrocket.net	
https://github.com/RetailRocket/SparkMultiTool
