Why Do Users Drop Out of Blade & Soul?
Leo HyeongNam Lee
Sep 14, 2018
1. Introduction
2. EDA & Data Preparation
- Preparing the Analysis
- Party Data
- Activity Data
- Trade Data
- Payment Data
- Guild Data
3. Modeling
- Model Selection
- Feature Engineering
- Grid Search
4. Analysis: Why Do Users Drop Out?
5. Conclusion
EDA & Data Preparation
1. Preparing the Analysis
2. Party Data
3. Activity Data
4. Trade Data
5. Payment Data
6. Guild Data
Load Libraries & Data
Create Custom Functions
1. Counting Functions: count things per user.
EX) participation in parties and guilds.
2. Converting Functions: convert raw fields into the target form,
usually a string into a list or a time value, and also a label into a score.
3. Connecting Functions: look up target information from an index or
an ID list.
4. Association Functions: analyze relationships among users, parties
and guilds, mainly using the party, guild and trade data.
5. Visualization Functions: ECDF plots, bar charts, etc.
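The deck shows its helper code only as images. The following is therefore a hypothetical stdlib Python sketch of one counting function and two converting functions. It rests on two assumptions that do not come from the deck. First, party membership is assumed to be stored as a comma-separated string of user IDs. Second, each churn label is assumed to map to an ordinal score (`LABEL_SCORE`).

```python
from collections import Counter

# Hypothetical ordinal score per churn label (the deck does not give the values).
LABEL_SCORE = {"week": 0, "month": 1, "2month": 2, "retained": 3}


def str_to_list(members: str, sep: str = ",") -> list:
    """Converting function: split a delimited string of user IDs into a list."""
    return [m.strip() for m in members.split(sep) if m.strip()]


def label_to_score(label: str) -> int:
    """Converting function: map a churn label to its (assumed) score."""
    return LABEL_SCORE[label]


def count_participation(party_members: list, sep: str = ",") -> Counter:
    """Counting function: number of parties each user took part in."""
    counts = Counter()
    for members in party_members:
        # set() so a user listed twice in one party is counted once
        counts.update(set(str_to_list(members, sep)))
    return counts
```

For example, `count_participation(["u1,u2", "u2,u3", "u2"])` counts `u2` three times and `u1` and `u3` once each.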
Party Data
1. Data Structure
2. Number of Participations
3. Duration of Party
4. Scale of Party
5. Week, Day, Hour
Data Structure
1. About 7M rows and 7 columns
2. No NA
3. No duplicates
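A structure check like the one above can be sketched in plain Python. The sketch assumes that rows arrive as tuples and that missing values are `None`; the deck does not show its loading code, and `describe_rows` is a hypothetical helper. The duplicate count follows the common convention of counting each repeat of an earlier row, so first occurrences are not counted.

```python
def describe_rows(rows: list) -> dict:
    """Summarize a table given as a list of equal-length tuples.

    Missing values are assumed to be None. A row counts as a duplicate
    when an identical row appeared earlier (first occurrences are not counted).
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    na_rows = sum(1 for row in rows if any(v is None for v in row))
    duplicate_rows = n_rows - len(set(rows))
    return {
        "rows": n_rows,
        "cols": n_cols,
        "na_rows": na_rows,
        "duplicate_rows": duplicate_rows,
        "duplicate_ratio": duplicate_rows / n_rows if n_rows else 0.0,
    }
```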
Number of Participations
1. ECDF Graph
2. Cut at 1,000 participations (excluding outliers)
3. Week = black, month = red, 2month = blue, retained = green
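An ECDF gives, for each observed value x, the share of users whose value is at most x. This minimal sketch assumes that the 1,000 cut drops values above the cap rather than clipping them, since the deck does not say which. `ecdf` and `ecdf_by_label` are hypothetical names. Plotting each label's points in its own color gives a graph in the style described above.

```python
def ecdf(values, cap=None):
    """Return [(x, F(x))] for each distinct x, where F(x) is the share of values <= x.

    If cap is given, values above it are dropped first (outlier cut).
    """
    xs = sorted(v for v in values if cap is None or v <= cap)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        # emit a point only at the last occurrence of each distinct value
        if i + 1 == n or xs[i + 1] != x:
            points.append((x, (i + 1) / n))
    return points


def ecdf_by_label(values_by_label, cap=None):
    """One ECDF per churn label, e.g. {"week": [...], "retained": [...]}."""
    return {label: ecdf(vals, cap) for label, vals in values_by_label.items()}
```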
Duration of Party
1. ECDF Graph
2. Count party participations by DURATION (UNIT: 5 min)
EX) 0~5, 5~10, … , 55~60, 60~
3. Two kinds of shape
4. Week = black, month = red, 2month = blue, retained = green
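The 5-minute binning can be sketched as follows. The sketch makes two assumptions: durations are given in minutes, and the bins are half-open, so a 5-minute party falls into 5~10 rather than 0~5. `duration_histogram` is a hypothetical helper. It returns one count per bin, and the last bin collects everything from 60 minutes up.

```python
def duration_bucket(minutes: float, width: int = 5, n_closed: int = 12) -> int:
    """Bucket index: 0 -> [0,5), 1 -> [5,10), ..., 11 -> [55,60), 12 -> 60 and over."""
    return min(int(minutes // width), n_closed)


def duration_histogram(durations, width: int = 5, n_closed: int = 12) -> list:
    """Count a user's party participations per duration bucket."""
    counts = [0] * (n_closed + 1)
    for minutes in durations:
        counts[duration_bucket(minutes, width, n_closed)] += 1
    return counts
```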
Scale of Party
1. ECDF Graph
2. Count party participations by
NUMBER OF PARTY MEMBERS. EX) 1, 2, … , 10, 11~
3. The 1, 6 and 11~ buckets are interesting.
4. Week = black, month = red, 2month = blue, retained = green
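Counting each user's participations by party size requires the membership of every party. This sketch assumes each party is given as a list of member IDs, which is a hypothetical input format. Sizes 1 to 10 each get their own bucket, and parties of 11 or more share the last bucket.

```python
from collections import defaultdict


def party_size_counts(parties, cap: int = 11) -> dict:
    """For each user, count joined parties by size.

    Index 0..cap-2 holds sizes 1..cap-1; the last index holds sizes >= cap.
    """
    per_user = defaultdict(lambda: [0] * cap)
    for members in parties:
        unique = set(members)
        if not unique:
            continue
        bucket = min(len(unique), cap) - 1
        for user in unique:
            per_user[user][bucket] += 1
    return dict(per_user)
```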
Week
1. ECDF Graph
2. Count party participations by
PARTY START WEEK. EX) 1, 2, … , 7, 8 AND mean, median, max
3. Weeks 1, 7 and 8 and the summaries are interesting.
4. Week = black, month = red, 2month = blue, retained = green
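The week counts and their summaries can be sketched as one feature vector per user. The sketch assumes weeks are numbered 1 to 8. It also reads "mean, median, max" as summaries over the user's 8 weekly counts, which is an interpretation rather than something the deck states. The same hypothetical helper works for party start days with `n_periods=7`.

```python
from statistics import mean, median


def period_features(starts, n_periods: int = 8) -> list:
    """Counts per period (numbered 1..n_periods), then mean, median, max of those counts."""
    counts = [0] * n_periods
    for p in starts:
        counts[p - 1] += 1
    return counts + [mean(counts), median(counts), max(counts)]
```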
Day
1. ECDF Graph
2. Count party participations by
PARTY START DAY. EX) 1, 2, … , 6, 7 AND mean, median, max
3. Days 1 and 7 and the max are interesting.
4. Week = black, month = red, 2month = blue, retained = green
Hour
1. ECDF Graph
2. Count party participations by PARTY START HOUR.
EX) 0~2, 3~5, … , 18~20, 21~23 AND mean, median, max
3. 6~8, 18~20 and the max are interesting.
4. Week = black, month = red, 2month = blue, retained = green
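The 3-hour buckets (0~2, 3~5, …, 21~23) correspond to `hour // 3`. This sketch assumes start times are strings in `HH:MM:SS` form, which is a hypothetical format; adjust the `strptime` pattern to match the real column. The mean, median and max can then be taken over the 8 bucket counts, in the same way as for the week counts.

```python
from datetime import datetime


def hour_bucket_counts(start_times, fmt: str = "%H:%M:%S") -> list:
    """Count party starts per 3-hour bucket: 0 -> 0~2, 1 -> 3~5, ..., 7 -> 21~23."""
    counts = [0] * 8
    for t in start_times:
        hour = datetime.strptime(t, fmt).hour
        counts[hour // 3] += 1
    return counts
```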
Activity Data
1. Data Structure
2. Data Preparation by Week
3. Interesting Weeks
4. Interesting Columns
Data Structure
1. About 440,000 rows and 38 columns
2. If a user did not log in to the game for a whole week,
no activity data was logged for that week.
3. No duplicates
4. Almost all columns are normalized.
The irregular columns are:
wk : activity week
acc_id : user id
cnt_dt : 1~7, int
Data Preparation by Week
1. Define a function
2. Separate the dataset
3. Extract values: 1st~8th week & sum, mean, median, max
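A user who never logged in during a week has no row for that week, so the weekly split has to fill those gaps. This minimal sketch handles one activity column and makes two assumptions. Rows are `(wk, acc_id, value)` tuples with `wk` in 1..8, and a missing week means zero activity. `weekly_features` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean, median


def weekly_features(rows, n_weeks: int = 8) -> dict:
    """rows: iterable of (wk, acc_id, value) for one activity column.

    Returns {acc_id: [week1..week8, sum, mean, median, max]}, with weeks
    that have no row filled with 0 (assumed to mean no activity).
    """
    per_user = defaultdict(lambda: [0.0] * n_weeks)
    for wk, acc_id, value in rows:
        per_user[acc_id][wk - 1] += value
    feats = {}
    for acc_id, weekly in per_user.items():
        feats[acc_id] = weekly + [sum(weekly), mean(weekly), median(weekly), max(weekly)]
    return feats
```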
Interesting Weeks
1. ECDF Graph
2. Two kinds of shape
3. In any case, it is HARD TO TELL MONTH AND 2MONTH APART.
They have extremely similar features.
4. Week = black, month = red, 2month = blue, retained = green
1st type: most of the 1st ~ 7th weeks
2nd type: the 8th week
Interesting Columns
1. ECDF Graph
2. Two kinds of shape
3. In many cases the labels look nearly identical to one another.
4. Week = black, month = red, 2month = blue, retained = green
1st type: about 5 features are meaningful
2nd type: the others are 0 or outliers
Trade Data
1. Data Structure
2. Association between Users
3. Scoring Association
4. Counting Trades by Week
5. Counting Trades by Day
6. Counting Trades by Hour
Data Structure
1. About 10M rows and 7 columns
2. No NA
3. 45% of rows are duplicates
Association between Users
1. Bar Graph
2. Count, for each user, the number of users they traded with more than twice.
3. The 2month label is special.
4. Week = black, month = red, 2month = blue, retained = green
Panels: In Sell Cases, In Buy Cases, In Both Cases
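This sketch reads the count above as the number of distinct partners each user traded with more than twice. It assumes the trade log reduces to `(seller_id, buyer_id)` pairs, and `frequent_partner_counts` is a hypothetical helper. The deck does not say whether the duplicated trade rows are real repeated trades, so whether to deduplicate before counting is left open.

```python
from collections import Counter


def frequent_partner_counts(trades, min_trades: int = 3) -> dict:
    """trades: iterable of (seller_id, buyer_id) pairs.

    Returns {user: (sell, buy, both)}: the number of distinct partners with at
    least min_trades trades when the user sold, when the user bought, and in
    both directions combined. min_trades=3 encodes "more than twice".
    """
    directed = Counter((s, b) for s, b in trades if s != b)
    sell, buy, both = Counter(), Counter(), Counter()
    undirected = Counter()
    for (seller, buyer), n in directed.items():
        if n >= min_trades:
            sell[seller] += 1
            buy[buyer] += 1
        # direction-free count for the "both cases" panel
        undirected[frozenset((seller, buyer))] += n
    for pair, n in undirected.items():
        if n >= min_trades:
            for user in pair:
                both[user] += 1
    users = set(sell) | set(buy) | set(both)
    return {u: (sell[u], buy[u], both[u]) for u in users}
```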
Scoring Association
1. Define a function
2. Panels: In Sell Cases, In Buy Cases
Counting Trades by Week
1. ECDF Graph
2. Count the number of trades by week
3. Not meaningful (only very slightly meaningful in sell cases)
4. Week = black, month = red, 2month = blue, retained = green
Counting Trades by Day
1. ECDF Graph
2. Count the number of trades by day
3. Not meaningful (only very slightly meaningful in sell cases)
4. Week = black, month = red, 2month = blue, retained = green
Counting Trades by Hour
1. ECDF Graph
2. Count the number of trades by hour
3. Not meaningful (only very slightly meaningful in sell cases)
4. Week = black, month = red, 2month = blue, retained = green
Payment Data
1. Data Structure
2. Weekly Payment
3. Paying or Not
Data Structure
1. 800,000 rows and 3 columns
2. No NA, even when there is no payment
3. No duplicates
4. Payment amounts are normalized.
Weekly Payment
1. ECDF Graph
2. Only a few users paid
3. Only retained users paid
4. Week = black, month = red, 2month = blue, retained = green
Paying or Not
1. Bar Graph
2. Retained users paid more often
3. Week = black, month = red, 2month = blue, retained = green
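The paying-or-not flag can be derived from the weekly amounts. Since there is no NA even without payment, non-paying weeks seem to be stored as rows too. The value that encodes "no payment" after normalization is not stated, so the sketch takes it as a parameter (`no_pay_value`, assumed to be 0.0). `payment_features` is a hypothetical helper.

```python
from collections import defaultdict


def payment_features(rows, no_pay_value: float = 0.0, n_weeks: int = 8) -> dict:
    """rows: iterable of (wk, acc_id, amount) with wk in 1..n_weeks.

    no_pay_value is the (assumed) amount that encodes "no payment" after normalization.
    """
    weekly = defaultdict(lambda: [no_pay_value] * n_weeks)
    for wk, acc_id, amount in rows:
        weekly[acc_id][wk - 1] = amount
    feats = {}
    for acc_id, amounts in weekly.items():
        paid_weeks = sum(1 for a in amounts if a != no_pay_value)
        feats[acc_id] = {"paid": int(paid_weeks > 0), "paid_weeks": paid_weeks, "weekly": amounts}
    return feats
```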
Guild Data
1. Data Structure
2. Association between Guild and Users
3. Scale of Guild
4. Joining a Guild or Not
Data Structure
1. About 10,000 rows and 2 columns
2. No NA
3. No duplicates
Association between Guild and Users
1. ECDF Graph
2. Scoring guilds: guild members who belong to the training data
get a score depending on their label,
and the scores are summarized.
3. Retained users stand out from the others.
4. Week = black, month = red, 2month = blue, retained = green
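Guild scoring can be sketched in two steps. First, average the label scores of the guild's training members. Then summarize over every guild a user belongs to. The label-to-score mapping and the choice of the mean as the summary are assumptions, and both function names are hypothetical. For training users, their own label enters their guild's score and may leak the target. A leave-one-out variant that excludes the user's own label avoids this.

```python
from statistics import mean

# Hypothetical ordinal score per churn label (not given in the deck).
LABEL_SCORE = {"week": 0, "month": 1, "2month": 2, "retained": 3}


def guild_scores(guilds, labels) -> dict:
    """guilds: {guild_id: [member ids]}; labels: {acc_id: label} for training users only.

    Returns {guild_id: mean score of labelled members}, or None if none is labelled.
    """
    out = {}
    for gid, members in guilds.items():
        scores = [LABEL_SCORE[labels[m]] for m in members if m in labels]
        out[gid] = mean(scores) if scores else None
    return out


def user_guild_score(user_guilds, scores):
    """Summarize the scores of all guilds a user belongs to (mean of known scores)."""
    vals = [scores[g] for g in user_guilds if scores.get(g) is not None]
    return mean(vals) if vals else None
```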
Scale of Guild
1. Scale: number of guild members
2. Some users join multiple guilds, so the scales are summarized per user.
3. Retained users stand out from the others.
4. ECDF Graph, Week = black, month = red, 2month = blue, retained = green
Joining a Guild or Not
1. 1 if the user joined a guild, 0 if not.
2. Retained users stand out from the others.
3. Bar Graph, Week = black, month = red, 2month = blue, retained = green
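Guild scale and the join flag can come from the same pass over the guild table. This sketch assumes the table is available as `{guild_id: [member ids]}`. It summarizes a user's guilds by count, mean size and max size; the deck does not name its summaries, so these choices are assumptions. `guild_user_features` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def guild_user_features(guilds, users) -> dict:
    """guilds: {guild_id: [member ids]}; users: all acc_ids to featurize.

    Returns {acc_id: (joined, n_guilds, mean_size, max_size)}; users in no guild get zeros.
    """
    size = {g: len(set(m)) for g, m in guilds.items()}
    member_of = defaultdict(list)
    for g, members in guilds.items():
        for u in set(members):
            member_of[u].append(size[g])
    feats = {}
    for u in users:
        sizes = member_of.get(u, [])
        joined = 1 if sizes else 0
        feats[u] = (joined, len(sizes), mean(sizes) if sizes else 0, max(sizes, default=0))
    return feats
```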
Modeling
1. Model selection
2. Feature Engineering
3. Grid Search
4. Prediction
Model Selection
1. Candidates: Regression, SVM, DNN and Tree
2. Characteristics of the Dataset
- Hundreds of features: many features are very similar across classes
- 100,000 rows
- Multi-class classification
3. Regression: not good with hundreds of features
4. SVM: not good with 100,000 rows
5. DNN: I only have a laptop
6. Tree: a tree ensemble model can handle this problem.
So, I chose LightGBM.
Feature Engineering
1. Validate feature importance against the EDA.
2. Feature importance follows the EDA results.
Meaningless features perform poorly
Meaningful features perform excellently
Grid Search
1. Start LightGBM from the default options
2. Learning rate: the default is best
3. Loss function: no meaningful differences
4. Number of rounds & number of tree leaves: searched
Panels: Default Option, Grid Search
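An exhaustive search over the number of rounds and the number of leaves can be sketched without tying it to a library. In LightGBM these settings correspond to the `num_leaves` parameter and the `num_boost_round` argument of `lightgbm.train`. Here `evaluate` is a hypothetical callable that would train with the given values and return a validation score where higher is better, such as negative log loss. The grid values are illustrative only and are not the author's grid.

```python
from itertools import product

# Illustrative grid; the deck does not list the values that were searched.
GRID = {"num_leaves": [15, 31, 63, 127], "num_boost_round": [100, 300, 500]}


def grid_search(evaluate, grid):
    """Try every combination in grid; return (best_params, best_score).

    evaluate(params) -> validation score, higher is better.
    """
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(params)
        if score > best_score:  # ties keep the first combination tried
            best_params, best_score = params, score
    return best_params, best_score
```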
Why Do Users Drop Out of Blade & Soul?
1. Retained Users
- Blade & Soul seems to be a part of their lives.
- They actively enjoy a variety of game content.
- They actively interact with others.
2. Others
- They play the game in an extreme way, not as a routine.
- They rarely play hardcore content.
- They do not interact with others.
I Failed This Way
1. Create pseudo rows:
Assume each of the 1st ~ 7th weeks, instead of the 8th week, is the last recorded week,
so that labels can be inferred for the new rows.
2. Oversampling
3. Predict each class separately instead of using multi-class classification
4. Ensemble models vertically
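The deck does not say which oversampling method was tried. One common choice is plain random oversampling, which duplicates randomly chosen rows of the smaller classes until every class matches the largest one. The following is a minimal sketch of that technique; `random_oversample` is a hypothetical helper, and the fixed seed is only for reproducibility.

```python
import random
from collections import defaultdict


def random_oversample(X, y, seed: int = 0):
    """Duplicate random rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    if not by_class:
        return [], []
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out
```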
Copyright
1. Image : NCSOFT Corp.
2. Fonts
- DX신사임당Bold : DXKOREA

Why game users drop out of blade & soul? - 2018 big contest
