11. 陳昇瑋 / 當學術研究者遇見線上遊戲 16
As A Researcher
A killer application
35% of Internet users; a larger business than movies and music
An emerging field
E.g., IEEE Transactions on AI and CI in Games since Sep 2008
Asia-based researchers have some niches
Large user base (50%)
Lots of local game companies
It’s fun!
13. 陳昇瑋 / 當學術研究者遇見線上遊戲 20
Game Bots
Game bots: automated AI programs that can perform
certain tasks in place of gamers
Popular in MMORPG and FPS games
MMORPGs (Role Playing Games)
accumulate rewards 24 hours a day
break the balance of power and economies in game
FPS games (First-Person Shooting Games)
a) improve aiming accuracy only
b) fully automated
achieve high rankings without skill or effort
14. 陳昇瑋 / 當學術研究者遇見線上遊戲 21
Bot Detection
Detecting whether a character is controlled by a bot is
difficult since a bot obeys the game rules perfectly
No general detection methods are available today
The state of practice is manual identification by humans
Detect by “bots may show regular patterns or peculiar
behavior”
Confirm by “bots cannot talk like humans”
Labor-intensive and may annoy innocent players
15. 陳昇瑋 / 當學術研究者遇見線上遊戲 22
CAPTCHA in a Japanese Online Game
(Completely Automated Public Turing test to tell Computers and Humans Apart)
16. 陳昇瑋 / 當學術研究者遇見線上遊戲 23
Our Goal of Bot Detection Solutions
Passive detection
No intrusion in players’ gaming experience
No client software support is required
Generalizable schemes (for other games and other
game genres)
17. 陳昇瑋 / 當學術研究者遇見線上遊戲 24
Our Solution I: Traffic Analysis
Game client Game server
Traffic stream
Q: Is a bot controlling a game client,
given the traffic stream it generates?
A: Yes or No
18. 陳昇瑋 / 當學術研究者遇見線上遊戲 25
Case Study: Ragnarok Online
(Figure courtesy of www.Ragnarok.co.kr)
19. 陳昇瑋 / 當學術研究者遇見線上遊戲 26
DreamRO -- A screen shot
World Map
View scope
Character
Status
20. 陳昇瑋 / 當學術研究者遇見線上遊戲 27
Trace Collection
Category        Tr#   ID                      Avg. Period   Avg. Pkt rate     Network
Human players   8     A, B, C, D              2.6 hr        1.0 / 3.2 pkt/s   ADSL, Cable Modem, Campus Network
Bots            11    K (Kore), R (DreamRO)   17 hr         1.0 / 2.2 pkt/s
207 hours, 3.8 million packets were traced in total
Heterogeneity in player skills and network conditions
Category        Participants           Client pkt rate   Avg. RTT      Avg. loss rate
Human players   2 rookies, 2 experts   0.8 ~ 1.2 pkt/s   45 ~ 192 ms   0.01% ~ 1.73%
Bots            2 bots                 0.5 ~ 1.7 pkt/s   33 ~ 97 ms    0.004% ~ 0.2%
21. 陳昇瑋 / 當學術研究者遇見線上遊戲 28
Command Timing
Client response time (response time):
time difference between the client packet departure time and the
most recent server packet arrival time
We expect the following patterns:
A large number of small response times (bots respond to server packets
immediately)
Regularity in response times
Observation
bots often issue their commands based on arrivals of server
packets, which carry the latest status of the character and
environment
(Figure: State Update -> Command, after a certain time t)
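The response-time definition on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: timestamps are in seconds and already sorted, and the function names `client_response_times` and `fraction_below` are hypothetical, not taken from the talk.

```python
from bisect import bisect_right

def client_response_times(client_departures, server_arrivals):
    """For each client packet, time elapsed since the most recent server packet.

    Both arguments are ascending lists of timestamps in seconds (assumed).
    Client packets sent before any server packet has arrived are skipped.
    """
    response_times = []
    for t in client_departures:
        # Number of server packets that arrived at or before time t.
        i = bisect_right(server_arrivals, t)
        if i == 0:
            continue  # no server packet seen yet
        response_times.append(t - server_arrivals[i - 1])
    return response_times

def fraction_below(values, threshold):
    """Fraction of values strictly below the threshold (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(1 for v in values if v < threshold) / len(values)
```

A large value of `fraction_below(response_times, small_threshold)` would correspond to the first expected pattern (many small response times); the threshold itself is a free parameter here.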
22. 陳昇瑋 / 當學術研究者遇見線上遊戲 29
CDF of Client Response Times
Kore: Zigzag pattern
(multiples of a certain
value)
DreamRO: > 50% response
times are very small
23. 陳昇瑋 / 當學術研究者遇見線上遊戲 30
Histograms of Response Times
(Figure: two response-time histograms, each annotated "1 ms" and "multiple peaks")
27. 陳昇瑋 / 當學術研究者遇見線上遊戲 37
Robustness against Counter Attacks
Adding random delays to the release time of client
commands
Command timing scheme will be ineffective
Schemes based on traffic burstiness and human reaction to
network conditions are robust
Adding random delay to command timing will not eliminate the
regularity unless the added delay is heavy-tailed or longer than the
updating interval by orders of magnitude
However, adding such long delays will make the bots incompetent, as
this will slow down the character's speed by orders of magnitude
28. 陳昇瑋 / 當學術研究者遇見線上遊戲 38
The IDC of the original packet arrival process
and that of intentionally-delayed versions
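A minimal sketch of how such a curve could be computed, assuming "IDC" denotes the index of dispersion for counts: the variance of the number of arrivals per window divided by its mean. The slide does not spell out the definition, so this reading, the window handling, and the function names are assumptions.

```python
from statistics import mean, pvariance

def counts_per_window(arrival_times, window, duration):
    """Count arrivals in consecutive windows of length `window` over [0, duration)."""
    n_windows = int(duration // window)
    counts = [0] * n_windows
    for t in arrival_times:
        k = int(t // window)
        if 0 <= k < n_windows:
            counts[k] += 1
    return counts

def idc(arrival_times, window, duration):
    """Index of dispersion for counts at one time scale: Var(N) / E[N]."""
    counts = counts_per_window(arrival_times, window, duration)
    if not counts:
        raise ValueError("duration must cover at least one window")
    m = mean(counts)
    if m == 0:
        raise ValueError("no arrivals in the observed windows")
    return pvariance(counts) / m
```

Evaluating `idc` over a range of window sizes gives one curve per trace. Under this definition a perfectly regular arrival process yields values near 0 and a Poisson process yields values near 1.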
29. 陳昇瑋 / 當學術研究者遇見線上遊戲 39
Our Solution II: Movement Trajectory
Based on the avatar’s movement trajectory in game
Applicable to all genres of games where players
control the avatar's movement directly
Avatar's trajectory is high-dimensional (in both the time
and spatial domains)
30. 陳昇瑋 / 當學術研究者遇見線上遊戲 40
The Rationale behind Our Scheme
The trajectory of the avatar controlled by a human
player is hard to simulate for two reasons:
Complex context information:
Players control the movement of avatars based on their
knowledge, experience, intuition, and a great deal of
environmental information in game.
Human behavior is not always logical and optimal
How to model and simulate realistic movements (for
game agents) is still an open question in the AI field.
31. 陳昇瑋 / 當學術研究者遇見線上遊戲 41
Bot Detection: A Decision Problem
Q: Is a bot controlling a game client, given
the movement trajectory of the avatar?
A: Yes / No?
35. 陳昇瑋 / 當學術研究者遇見線上遊戲 45
Data Collection
Human traces downloaded from fan sites including GotFrag
Quake, Planet Quake, Demo Squad, and Revilla Quake Site
Bot traces collected on our own Quake server
CR BOT 1.14
Eraser Bot 1.01
ICE Bot 1.0
A total of 143.8 hours of traces were
collected
41. 陳昇瑋 / 當學術研究者遇見線上遊戲 51
Movement Trail Analysis
Activity
mean/sd of ON/OFF periods
Pace
speed/offset in each time period
teleportation frequency
Path
linger frequency/length
smoothness
detourness
Turn
frequency of mild turn, U-turn, …
43. 陳昇瑋 / 當學術研究者遇見線上遊戲 53
Step 1. Pace Vector Construction
For each trace s_n, we compute the pace (distance) over each
successive two-second period.
We then compute the distribution (histogram) of the paces
with a fixed bin size,
where B is the number of bins in the distribution.
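The pace-vector construction described above can be sketched as follows. The slide's own formulas are not reproduced in this transcript, so the details are assumptions: positions are 2-D points sampled at a fixed interval, the pace is the straight-line displacement over non-overlapping periods, paces beyond the last bin are clamped into it, and the histogram is normalized. The names `paces` and `pace_vector` are hypothetical.

```python
from math import hypot

def paces(positions, sample_interval=1.0, period=2.0):
    """Displacement over successive non-overlapping periods of `period` seconds.

    positions: list of (x, y) tuples sampled every `sample_interval` seconds.
    """
    step = round(period / sample_interval)
    if step < 1:
        raise ValueError("period must be at least one sample interval")
    result = []
    for i in range(0, len(positions) - step, step):
        (x0, y0), (x1, y1) = positions[i], positions[i + step]
        result.append(hypot(x1 - x0, y1 - y0))
    return result

def pace_vector(pace_values, bin_size, num_bins):
    """Normalized histogram of paces with a fixed bin size and num_bins bins (B)."""
    counts = [0] * num_bins
    for p in pace_values:
        # Clamp large paces into the last bin (an assumption of this sketch).
        k = min(int(p // bin_size), num_bins - 1)
        counts[k] += 1
    total = len(pace_values)
    if total == 0:
        return [0.0] * num_bins
    return [c / total for c in counts]
```

Each trace then becomes one vector of length `num_bins`, which is the input to the classification step.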
44. 陳昇瑋 / 當學術研究者遇見線上遊戲 54
Pace Vector: An Example
B is set to 200 (dimensions) in this work
45. 陳昇瑋 / 當學術研究者遇見線上遊戲 55
Step 2. Dimension Reduction with Isomap
We adopt Isomap for nonlinear dimension reduction for
Better classification accuracy
Lower computation overhead in classification
Isomap
Assume data points lie on a manifold
1. Construct the neighborhood graph by kNN (k-nearest neighbor)
2. Compute the shortest geodesic path for each pair of points
3. Reconstruct data by MDS (multidimensional scaling)
Manifold: a mathematical space in which every point has a neighborhood which
resembles Euclidean space, but in which the global structure may be
more complicated. (Wikipedia)
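The first two Isomap steps listed above (neighborhood graph, then geodesic distances) can be sketched in plain Python. This is an illustration only: the final MDS step is omitted, the graph is made symmetric, Floyd-Warshall is used for the shortest paths, and the function names are hypothetical. The talk does not say which implementation was used.

```python
from math import dist, inf

def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as a matrix of edge lengths.

    Entry [i][j] is the Euclidean distance if i and j are linked, else inf.
    """
    n = len(points)
    graph = [[inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        # Sort the other points by distance to point i and keep the k closest.
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist(points[i], points[j]))[:k]
        for j in neighbours:
            d = dist(points[i], points[j])
            graph[i][j] = graph[j][i] = d
    return graph

def geodesic_distances(graph):
    """All-pairs shortest path lengths along graph edges (Floyd-Warshall)."""
    n = len(graph)
    d = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d
```

For points lying along a bent curve, the geodesic distance between the two ends follows the curve and is therefore longer than their straight-line distance, which is the property the MDS step then tries to preserve in fewer dimensions.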
48. 陳昇瑋 / 當學術研究者遇見線上遊戲 59
Five Methods for Comparison
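Method                   Data Input
kNN                      Original 200-dimension Pace Vectors
Linear SVM               Original 200-dimension Pace Vectors
Nonlinear SVM            Original 200-dimension Pace Vectors
Isomap + kNN             Isomap-reduced Pace Vectors
Isomap + Nonlinear SVM   Isomap-reduced Pace Vectors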
52. 陳昇瑋 / 當學術研究者遇見線上遊戲 64
Unsubscription Prediction
Game improvement
Players' unsubscription indicates low satisfaction
Surveys can be conducted to determine the causes of player dissatisfaction
and improve the game accordingly
More likely to receive useful comments before players quit
Prevent VIP players’ quitting (maintain revenue)
Under the "item mall" model, users' contribution (of revenue) is heavy-tailed
Losing VIP players may significantly harm the revenue
Network/system planning and diagnosis
By predicting which players tend to leave the game, we can investigate
whether there is any problem with network resource planning, network
congestion, or server arrangement
53. 陳昇瑋 / 當學術研究者遇見線上遊戲 65
Unsubscription Prediction: Our Proposal
Rationale: a player's satisfaction / enthusiasm / addiction
to a game is embedded in her game play history
(Figure: a player's login history over her subscription time, Jan to Dec 2007, is used to predict whether she will quit within 30 days: Quit or Stay)
56. 陳昇瑋 / 當學術研究者遇見線上遊戲 68
Data Collection Methodology
Create a game character
Use the command ‘who’
The command asks the game
server to reply with a list of
players who are currently
online
Write a specialized data-collection program (using C#,
VBScript, and Lua)
65. 陳昇瑋 / 當學術研究者遇見線上遊戲 78
Trend of Game Playing Time
37%
28%
20%
9%
6%
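No particular trend; it depends on the circumstances at the time
Play time gets shorter and shorter, and the number of login days also decreases
No obvious change
Play more in the later period
Periodic changes from month to month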
66. 陳昇瑋 / 當學術研究者遇見線上遊戲 79
Logistic Regression Model for
Unsubscription Prediction
Significant features (out of > 20 features)
Avg. session time
Daily session count
Variation of the login hour (when the player starts playing a
game each day)
Variation of daily play time (number of hours)
A naive logistic regression model achieves
approximately 75% prediction accuracy
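As a rough illustration of what a naive logistic regression model looks like, here is a minimal sketch trained by batch gradient descent. Everything specific in it is an assumption: the two features stand in for average session time and daily session count, the numbers are synthetic and invented purely for the demo, and the talk does not describe how its model was fitted.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, x):
    """Predicted probability of the positive class (here: the player quits)."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Synthetic demo data, invented for illustration only (not from the study):
# [avg. session time in hours, daily session count], label 1 = quit, 0 = stay.
X_demo = [[0.5, 1.0], [1.0, 1.0], [0.8, 2.0], [3.0, 4.0], [4.0, 3.0], [3.5, 5.0]]
y_demo = [1, 1, 1, 0, 0, 0]
w_demo, b_demo = fit_logistic(X_demo, y_demo)
print([round(predict_proba(w_demo, b_demo, x), 2) for x in X_demo])
```

In practice the inputs would be the significant features listed above, and accuracy would be measured on held-out players rather than on the training data.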
69. 陳昇瑋 / 當學術研究者遇見線上遊戲 82
How Aware Are Gamers of Service
Quality?
Real-time interactive online games are generally considered QoS-sensitive
Gamers are always complaining about high
“ping-times” or network lags
Online gaming is increasingly popular despite the best-effort
Internet
Q1: Are game players really sensitive to network
quality as they claim?
Q2: If so, how do they react to poor network
quality?
70. 陳昇瑋 / 當學術研究者遇見線上遊戲 83
Our Conjecture
Poor Network Quality -> Unstable Game Play -> Less Fun -> Shorter Game Play Time
(strongly associated; verified with real-life game traces)
71. 陳昇瑋 / 當學術研究者遇見線上遊戲 84
神州 Online
ShenZhouOnline
A commercial MMORPG in Taiwan
Thousands of players online at any time
TCP-based client-server architecture
72. 陳昇瑋 / 當學術研究者遇見線上遊戲 85
Trace Collection
(20 hours and 1,356 million packets)
Session #   Avg. Time   Top 20%     Bottom 20%
15,140      100 min     > 8 hours   < 40 min
73. 陳昇瑋 / 當學術研究者遇見線上遊戲 86
Round-Trip Times vs. Session Time
y-axis is
logarithmic
74. 陳昇瑋 / 當學術研究者遇見線上遊戲 87
Delay Jitter vs. Session Time
(std. dev. of the round-trip times)
75. 陳昇瑋 / 當學術研究者遇見線上遊戲 88
Hypothesis Testing -- Effect of Loss Rate
Null Hypothesis:
All the survival curves are equivalent
Log-rank test: P < 1e-20
We can claim with > 99.999% confidence that
loss rates are correlated with game playing times
(Figure: the CCDF of game session times, with curves for high, med, and low loss)
76. 陳昇瑋 / 當學術研究者遇見線上遊戲 89
Regression Modeling
Linear regression is not adequate
It violates the model assumptions (normal errors, equal variance, …)
The Cox regression model provides a good fit
Log-hazard function is proportional to the weighted sum of factors
Hazard function (conditional failure rate)
The instantaneous rate of quitting a game for a player (session)
where each session has factors Z (RTT=x, jitter=y, …)
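For reference, the Cox proportional hazards model is usually written as below. The notation is assumed rather than taken from the slide: h_0(t) is the baseline hazard, Z_k are the factors of a session, and beta_k are the fitted coefficients.

```latex
h(t \mid Z) = h_0(t)\,\exp\!\Big(\sum_{k} \beta_k Z_k\Big)
\quad\Longleftrightarrow\quad
\log h(t \mid Z) = \log h_0(t) + \sum_{k} \beta_k Z_k
```

Because h_0(t) is shared by all sessions, it cancels in the ratio of two sessions' hazards, so that ratio depends only on the differences in their factors.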
77. 陳昇瑋 / 當學術研究者遇見線上遊戲 90
Final Model & Interpretation
Interpretation
A: RTT = 200 ms
B: RTT = 100 ms, other factors same as A
Hazard ratio between A and B:
exp((log(0.2) – log(0.1)) × 1.27) ≈ 2.4
At any moment, A is about 2.4 times as likely as B to leave the
game
Variable      Coef   Std. Err.   Signif.
log(RTT)      1.27   0.04        < 1e-20
log(jitter)   0.68   0.03        < 1e-20
log(closs)    0.12   0.01        < 1e-20
log(sloss)    0.09   0.01        7e-13
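The hazard-ratio arithmetic in the interpretation can be checked with a short sketch. The coefficients are those in the table on this slide. The function name `hazard_ratio`, the dictionary keys, and the convention that factors absent from either session are treated as equal are assumptions of this example.

```python
from math import exp, log

# Coefficients of the log-transformed factors, as listed on this slide.
COEF = {"rtt": 1.27, "jitter": 0.68, "closs": 0.12, "sloss": 0.09}

def hazard_ratio(factors_a, factors_b, coef=COEF):
    """Hazard ratio of session A to session B under the fitted Cox model.

    factors_a and factors_b map factor names to positive values. A factor
    missing from either session is treated as identical and cancels out.
    """
    log_ratio = 0.0
    for name, beta in coef.items():
        if name in factors_a and name in factors_b:
            log_ratio += beta * (log(factors_a[name]) - log(factors_b[name]))
    return exp(log_ratio)

# RTT of 200 ms versus 100 ms, other factors equal (values in seconds).
print(round(hazard_ratio({"rtt": 0.2}, {"rtt": 0.1}), 1))
```

Since only the difference of logarithms enters, the result depends on the ratio of the two RTTs and not on the unit they are expressed in.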
79. 陳昇瑋 / 當學術研究者遇見線上遊戲 92
Relative Influence of QoS Factors
Latency = 20%
Delay jitter = 45%
Client packet loss = 20%
Server packet loss = 15%
80. 陳昇瑋 / 當學術研究者遇見線上遊戲 93
An Index for ShenZhou Online
Features
derived from real-life game sessions
accessible and computable in real time
implication: delay jitter is less tolerable than delay
RTT: round-trip times
jitter: level of network congestion
closs: loss rate of client packets
sloss: loss rate of server packets
81. 陳昇瑋 / 當學術研究者遇見線上遊戲 94
App #1: Evaluation of Alternative Designs
Suppose now we have two designs (e.g., protocols)
One leads to lower delay but high jitter:
100 ms, 120 ms, 100 ms, 120 ms, 100 ms, 120 ms, 100 ms, 120 ms, …
One leads to higher delay but lower jitter:
150 ms, 150 ms, 150 ms, 150 ms, 150 ms, 150 ms, 150 ms, 150 ms, …
Which design should we choose?
(Figure: network latency over time, with the 150 ms level marked)
82. 陳昇瑋 / 當學術研究者遇見線上遊戲 95
App #2: Overlay Path Selection
Internet
path delay jitter loss rate score
100 ms (G) 50 ms (P) 5% (P) 3.84
150 ms (A) 20 ms (G) 1% (A) 6.33
200 ms (P) 30 ms (A) 1% (A) 5.43
83. 陳昇瑋 / 當學術研究者遇見線上遊戲 96
Player Departure Behavior Analysis
The player departure rate decreases over time
The golden time is the first 10 minutes: the longer gamers play, the
more external factors affect their decisions to stay or leave
Implication: allocate more resources to players who have just entered the game
95. 陳昇瑋 / 當學術研究者遇見線上遊戲 108
Feature Engineering
A feature is a piece of information that might be useful for
prediction. Any attribute could be a feature, as long as it is
useful to the model.
"…some machine learning projects succeed and some fail.
What makes the difference? Easily the most important
factor is the features used."
-- Pedro Domingos,
"A Few Useful Things to Know about Machine Learning"