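When an Academic Researcher Meets Online Games (當學術研究者遇見線上遊戲)

陳昇瑋 (Sheng-Wei (Kuan-Ta) Chen)
Institute of Information Science, Academia Sinica (中央研究院資訊科學研究所)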

Entertainment Market Size (worldwide)
Ranked No. 1 to No. 4; categories shown: Video games, Movie, Music, Book
Values shown: US$ 63 billion, US$ 42 billion, US$ 35 billion, US$ 27 billion (Book)
Source: http://vgsales.wikia.com/wiki/Video_game_industry

Game Research: My Own Reasons
As A PC Gamer …
As A Programmer …
As A Researcher …

As A PC Gamer (1)
1988, 1989, 1990, 1991

As A PC Gamer (2)
1990, 1992, 1993, 1998

As A Programmer (1)
Age 10: wrote a football game in ROM BASIC
Junior high school: wrote a fighting game in dBASE & Pascal
Senior high school: wrote an RPG in C & Assembly
Richard Garriott
1980

As A Programmer (2)
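1999 – 2002: taught training courses at 資策會 (Institute for Information Industry) on C/C++, Winsock Programming, Delphi, and C++Builder, slipping game design material into them
1999 – 2001: columnist for 《遊戲設計大師》
2000: published 《Delphi 深度歷險》
2002: published 《C++Builder 深度歷險》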

As A Researcher
A killer application
35% Internet users & larger business than movie & music
An emerging field
E.g., IEEE Transactions on AI and CI in Games since Sep 2008
Asia-based researchers have some niches
Large user base (50%)
Lots of local game companies
It’s fun!

Security Topics
Game Bot Detection

Game Bots
Game bots: automated AI programs that can perform
certain tasks in place of gamers
Popular in MMORPG and FPS games
MMORPGs (Role-Playing Games)
accumulate rewards 24 hours a day
→ break the balance of power and the economy in the game
FPS games (First-Person Shooter Games)
a) improve aiming accuracy only
b) fully automated
→ achieve a high ranking without proficient skills or effort

Bot Detection
Detecting whether a character is controlled by a bot is
difficult since a bot obeys the game rules perfectly
No general detection method is available today
The state of practice is to identify bots via human intelligence
Detect by “bots may show regular patterns or peculiar
behavior”
Confirm by “bots cannot talk like humans”
Labor-intensive and may annoy innocent players

CAPTCHA in a Japanese Online Game
(Completely Automated Public Turing test to tell Computers and Humans Apart)

Our Goal of Bot Detection Solutions
Passive detection
→ No intrusion into players' gaming experience
No client software support is required
Generalizable schemes (for other games and other game genres)

Our Solution I: Traffic Analysis
Game client Game server
Traffic stream
Q: Is a bot controlling a game client, given the traffic stream it generates?
A: Yes or No

Case Study: Ragnarok Online
(Figure courtesy of www.Ragnarok.co.kr)

DreamRO -- A screen shot
World Map
View scope
Character
Status

Trace Collection
Category | # Traces | ID | Avg. Period | Avg. Pkt rate
Human players | 8 | A, B, C, D | 2.6 hr | 1.0 / 3.2 pkt/s
Bots | 11 | K (Kore), R (DreamRO) | 17 hr | 1.0 / 2.2 pkt/s
Network: ADSL, Cable Modem, Campus Network
207 hours, 3.8 million packets were traced in total
Heterogeneity in player skills and network conditions
Category | Participants | Client pkt rate | Avg. RTT | Avg. Loss rate
Human players | 2 rookies, 2 experts | 0.8 ~ 1.2 pkt/s | 45 ~ 192 ms | 0.01% ~ 1.73%
Bots | 2 bots | 0.5 ~ 1.7 pkt/s | 33 ~ 97 ms | 0.004% ~ 0.2%

Command Timing
Client response time (response time):
time difference between the client packet departure time and the
most recent server packet arrival time
We expect the following patterns:
A large number of small response times (bots respond to server packets immediately)
Regularity in response times
Observation
bots often issue their commands based on arrivals of server
packets, which carry the latest status of the character and
environment
State Update → (after a certain time t) → Command
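The response-time definition above can be sketched in a few lines. This is only an illustration, assuming packet timestamps are available as plain numbers in milliseconds; the function names and the sample timestamps are hypothetical and not taken from the study.

```python
from bisect import bisect_right


def client_response_times(server_arrivals, client_departures):
    """Return, for each client packet, the time elapsed since the most
    recent server packet arrival (client packets sent before any server
    packet has arrived are skipped)."""
    server_arrivals = sorted(server_arrivals)
    times = []
    for departure in client_departures:
        # Index of the first server arrival strictly after this departure.
        i = bisect_right(server_arrivals, departure)
        if i == 0:
            continue  # no server packet seen yet
        times.append(departure - server_arrivals[i - 1])
    return times


def fraction_below(times, threshold):
    """Share of response times that are at most `threshold`."""
    return sum(1 for t in times if t <= threshold) / len(times)


# Hypothetical timestamps in milliseconds.
server = [0, 1000, 2000]
client = [500, 1001, 2250]
print(client_response_times(server, client))
print(fraction_below(client_response_times(server, client), 10))
```

A trace dominated by very small response times, or by response times clustered at multiples of one value, would be the kind of pattern the slide expects from bots.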

CDF of Client Response Times
Kore: zigzag pattern (multiples of a certain value)
DreamRO: > 50% of response times are very small

Histograms of Response Times
(Figure: histograms of response times; both panels are annotated with "1 ms" and "multiple peaks")

Periodograms of Histograms of Response Times
Player 1 Player 2
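A periodogram of a response-time histogram could be computed roughly as follows. This is a minimal sketch using a direct discrete Fourier transform; the bin width, the number of bins, and the toy series are assumptions made for illustration, not the parameters used in the study.

```python
import cmath


def histogram(values, bin_width, num_bins):
    """Count values into `num_bins` bins of width `bin_width`, starting at 0.
    Values falling outside the covered range are ignored."""
    counts = [0] * num_bins
    for v in values:
        idx = int(v // bin_width)
        if 0 <= idx < num_bins:
            counts[idx] += 1
    return counts


def periodogram(series):
    """Return the power at frequencies k/n (cycles per bin), k = 1..n//2,
    of the mean-removed series, computed with a direct DFT."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2 / n)
    return power


# Toy histogram with a peak every 4 bins: the power concentrates at
# frequency 4/16 (index 3) and its harmonic, and is ~0 at 1/16 (index 0).
spectrum = periodogram([5, 0, 0, 0] * 4)
print(spectrum[3], spectrum[0])
```

A strong, isolated peak in the spectrum indicates that the histogram itself is periodic, which is the kind of regularity expected when response times cluster at multiples of a fixed interval.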

Examining the Trend of Traffic Burstiness

An Integrated Classifier
Conservative approach (10000 packets):
false positive rate ≈ 0% and 90% correct rate
Progressive approach (2000 packets):
false negative rate < 1% and 95% correct rate

Robustness against Counter Attacks
Adding random delays to the release time of client commands
The command timing scheme will be ineffective
Schemes based on traffic burstiness and human reaction to network conditions are robust
→ Adding random delay to command timing will not eliminate the regularity unless the added delay is heavy-tailed or longer than the updating interval by orders of magnitude
→ However, adding such long delays will make the bots incompetent, as this slows down the character's speed by orders of magnitude

The IDC of the original packet arrival process
and that of intentionally-delayed versions
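For reference, the index of dispersion for counts (IDC) at a given time scale is commonly defined as the variance of the number of arrivals per window divided by its mean. The sketch below follows that textbook definition; the windowing details (windows aligned to the first arrival, population variance) and the sample arrival times are assumptions, and the study may have computed it differently.

```python
from statistics import mean, pvariance


def idc(arrival_times, window):
    """Index of dispersion for counts: Var(N) / E[N], where N is the number
    of arrivals in consecutive windows of length `window`."""
    if not arrival_times:
        raise ValueError("need at least one arrival")
    start = min(arrival_times)
    end = max(arrival_times)
    num_windows = int((end - start) // window) + 1
    counts = [0] * num_windows
    for t in arrival_times:
        counts[int((t - start) // window)] += 1
    return pvariance(counts) / mean(counts)


# Perfectly regular arrivals give an IDC of 0; bursty arrivals give a
# large IDC (a Poisson process would be close to 1).
regular = list(range(100))
bursty = [0] * 10 + [50] * 10
print(idc(regular, 10), idc(bursty, 10))
```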

Our Solution II: Movement Trajectory
Based on the avatar's movement trajectory in the game
Applicable to all genres of games where players control the avatar's movement directly
The avatar's trajectory is high-dimensional (in both the time and spatial domains)

The Rationale behind Our Scheme
The trajectory of the avatar controlled by a human
player is hard to simulate for two reasons:
Complex context information:
Players control the movement of avatars based on their
knowledge, experience, intuition, and a great deal of
environmental information in game.
Human behavior is not always logical and optimal
How to model and simulate realistic movements (for
game agents) is still an open question in the AI field.

Bot Detection: A Decision Problem
Q: Is a bot controlling a game client, given the movement trajectory of the avatar?
A: Yes / No

User Movement Trails

3D Path Visualization Tool

Case Study: Quake 2

Data Collection
Human traces downloaded from fan sites, including GotFrag Quake, Planet Quake, Demo Squad, and Revilla Quake Site
Bot traces collected on our own Quake server
CR BOT 1.14
Eraser Bot 1.01
ICE Bot 1.0
In total, 143.8 hours of traces were collected

Data Representation
A trace is a sequence of (X,Y) positions over time t: (X,Y), (X,Y), (X,Y), (X,Y), …

Aggregate View of Trails (Human & 3 Bots)
Panels: Human, CR Bot, Eraser Bot, ICE Bot

Trails of Human Players

Trails of Eraser Bot

Trails of ICE Bot

Movement Trail Analysis
Activity
mean/sd of ON/OFF periods
Pace
speed/offset in each time period
teleportation frequency
Path
linger frequency/length
smoothness
detourness
Turn
frequency of mild turn, U-turn, …

Bot Detection Performance

Step 1. Pace Vector Construction
For each trace s_n, we compute the pace (the distance moved) in each successive two-second interval.
We then compute the distribution (histogram) of paces with a fixed bin size, where B is the number of bins in the distribution.

Pace Vector: An Example
B is set to 200 (dimensions) in this work
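One way to build such a pace vector is sketched below. Several details are assumptions made for illustration and are not stated on the slides: positions are sampled once per second, intervals do not overlap, paces at or above a hypothetical `max_pace` fall into the last bin, and the histogram is normalized to relative frequencies.

```python
import math


def pace_vector(positions, num_bins=200, max_pace=100.0, samples_per_step=2):
    """Histogram of distances moved in successive two-second intervals.

    positions: list of (x, y) tuples, assumed to be sampled once per second.
    num_bins: B, the number of histogram bins (200 in the slides).
    max_pace: hypothetical upper bound used to derive the fixed bin size.
    """
    paces = [
        math.dist(positions[i], positions[i + samples_per_step])
        for i in range(0, len(positions) - samples_per_step, samples_per_step)
    ]
    bin_size = max_pace / num_bins
    counts = [0] * num_bins
    for pace in paces:
        # Clamp very large paces into the last bin.
        idx = min(int(pace / bin_size), num_bins - 1)
        counts[idx] += 1
    total = len(paces)
    return [c / total for c in counts] if total else [0.0] * num_bins


# Tiny example with 4 bins of width 5: one interval of pace 10, one of pace 0.
trace = [(0, 0), (3, 4), (6, 8), (6, 8), (6, 8)]
print(pace_vector(trace, num_bins=4, max_pace=20.0))
```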

Step 2. Dimension Reduction with Isomap
We adopt Isomap for nonlinear dimension reduction for
Better classification accuracy
Lower computation overhead in classification
Isomap
Assume data points lie on a manifold
1. Construct the neighborhood graph by kNN (k-nearest neighbor)
2. Compute the shortest geodesic path for each pair of points
3. Reconstruct data by MDS (multidimensional scaling)
Manifold: a mathematical space in which every point has a neighborhood which resembles Euclidean space, but in which the global structure may be more complicated. (Wikipedia)
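The first two Isomap steps listed above can be sketched in plain Python: a symmetric kNN graph, then all-pairs shortest paths (Floyd-Warshall here) as the geodesic distances. Step 3 (MDS on the geodesic distance matrix) is omitted. The function names and the five sample points are illustrative assumptions, not the authors' implementation.

```python
import math


def knn_graph(points, k):
    """Step 1: weighted graph linking each point to its k nearest neighbors.
    Edges are made symmetric; missing edges have infinite weight."""
    n = len(points)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        by_distance = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in by_distance[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph):
    """Step 2: all-pairs shortest path lengths (Floyd-Warshall)."""
    n = len(graph)
    dist = [row[:] for row in graph]
    for mid in range(n):
        for i in range(n):
            for j in range(n):
                via = dist[i][mid] + dist[mid][j]
                if via < dist[i][j]:
                    dist[i][j] = via
    return dist


# Five points along a curved arc: the geodesic distance between the two
# ends is longer than their straight-line (Euclidean) distance.
arc = [(0, 0), (1, 0), (1.5, 0.9), (1, 1.8), (0, 1.8)]
geo = geodesic_distances(knn_graph(arc, k=2))
print(geo[0][4], math.dist(arc[0], arc[4]))
```

The gap between the two printed distances is what lets Isomap "unroll" a curved manifold where a linear method such as PCA would not.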

A Graphic Representation of Isomap

PCA (Linear) vs. Isomap (Nonlinear)

Five Methods for Comparison
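Method | Data Input
kNN | Original 200-dimension Pace Vectors
Linear SVM | Original 200-dimension Pace Vectors
Nonlinear SVM | Original 200-dimension Pace Vectors
Isomap + kNN | Isomap-reduced Pace Vectors
Isomap + Nonlinear SVM | Isomap-reduced Pace Vectors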

Evaluation Results
Error Rate
False Positive Rate False Negative Rate

Evaluation Results
Error Rate
False Positive Rate False Negative Rate

User Behavior Topics
Game-Play Time Prediction

Unsubscription Prediction
Game improvement
Players' unsubscription → low satisfaction
Surveys can be conducted to determine the causes of player dissatisfaction and improve the game accordingly
More likely to receive useful comments before players quit
Prevent VIP players from quitting (maintain revenue)
For the "item mall" model, users' contribution to revenue is heavy-tailed
Losing VIP players may significantly harm revenue
Network/system planning and diagnosis
By predicting which players tend to leave the game → investigate whether there is any problem with network resource planning, network congestion, or server arrangement

Unsubscription Prediction: Our Proposal
Rationale: a player's satisfaction / enthusiasm / addiction to a game is embedded in her game play history
(Diagram: login history over the subscription time, Jan to Dec 2007 → "Quit in 30 days?" → Quit / Stay)

World of Warcraft
The most popular MMOG for now

Data Collection Methodology
Create a game character
Use the command 'who'
The command asks the game server to reply with a list of players who are currently online
Write a specialized data-collection program (using C#, VBScript, and Lua)

Trace Summary

The Mystery of 福克斯大神 ("Great God Fox")?? (1)
ref. http://forum.gamebase.com.tw/content.jsp?no=4715&cno=47150002&sno=75201947
ref. http://www.wings-of-narnia.com/viewtopic.php?t=3012
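User A: I wonder whether Horde players on the 聖光之願 realm have noticed that, behind the shaman trainer in the starting village, a hunter player named 「福克斯大神」 is always standing there! I saw him in the starting village when I settled on 聖光之願 half a year ago, and he is still staying in that spot now... He never goes AFK, and you can inspect him = ="
Should this kind of thing be reported to a GM? Players creating new characters find him really creepy when they see him 囧
User B: me too
I got goosebumps the moment I saw him.....
User C: Is it the lingering grudge (a vengeful spirit @@) of a "departed" player?
Or someone from a sad love story, waiting devotedly for a loved one?
^^^^^^^^QQ
User D: Ha, lots of people are watching right now
A big crowd has gathered around him @@
It's a tourist attraction XD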

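The Mystery of 福克斯大神 ("Great God Fox")?? (2)
User E: I just went to take a look too. I created an orc warrior named “聽說有鬼” ("I heard there's a ghost"), sat on the barrel in front of him, and kept staring at him~ Suddenly!
<Away> 福克斯大神
He crouched down... a minute later.. he vanished =ˇ="
..
..
Now I'm feeling creeped out too..
User F: What a fierce ghost!!!!!! The Great God's power is terrifying, a bunch of followers died in front of him!!!!!!
User G: I went to look last time and even ran into two fellow enthusiasts. Watching it was really quite unbelievable...
It could be listed among the top 10 wonders of the Warcraft world!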

福克斯大神 and His Followers -_-

Questionnaire
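(Pie chart of the games respondents play)
Shares shown: 37%, 19%, 16%, 12%, 4%, 4%, 3%, 2%, 2%, 1%
Legend: WoW, 天堂 (Lineage), RO, 楓之谷 (MapleStory), 石器時代 (Stone Age), LUNA, 神州 (ShenZhou Online), Others, 洛汗 (ROHAN), 萬王之王 (King of Kings)
# samples: 1,747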

Reasons for User Unsubscription

Trend of Game Playing Time
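(Pie chart of self-reported trends in playing time)
Shares shown: 37%, 28%, 20%, 9%, 6%
Legend:
No particular trend; it depends on the situation at the time
Play sessions get shorter and the number of login days keeps decreasing
No obvious change
Play more in the later period
Varies periodically with the month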

Logistic Regression Model for Unsubscription Prediction
Significant features (out of > 20 features)
Avg. session time
Daily session count
Variation of the login hour (when the player starts playing a
game each day)
Variation of daily play time (number of hours)
A naive logistic regression model achieves
approximately 75% prediction accuracy
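The four significant features could be derived from a login history along these lines. This is a sketch under stated assumptions: the `(day, login_hour, duration_hours)` session format is hypothetical, "variation" is taken to mean the standard deviation, the daily session count is averaged over days with at least one login, and the login hour is treated as a plain number rather than a circular quantity.

```python
from collections import defaultdict
from statistics import mean, pstdev


def play_features(sessions):
    """Compute per-player features from a list of sessions.

    sessions: list of (day, login_hour, duration_hours) tuples
              (a hypothetical log format used only for this sketch).
    """
    per_day = defaultdict(list)
    for day, login_hour, duration in sessions:
        per_day[day].append((login_hour, duration))

    # Hour at which the player starts playing on each active day.
    first_login_hours = [min(h for h, _ in s) for s in per_day.values()]
    # Total hours played on each active day.
    daily_play_times = [sum(d for _, d in s) for s in per_day.values()]

    return {
        "avg_session_time": mean(d for _, _, d in sessions),
        "daily_session_count": len(sessions) / len(per_day),
        "login_hour_sd": pstdev(first_login_hours),
        "daily_play_time_sd": pstdev(daily_play_times),
    }


# Two active days: two sessions on day 1, one on day 2.
features = play_features([(1, 20, 2), (1, 23, 1), (2, 20, 3)])
print(features)
```

A feature dictionary like this would then be fed, one row per player, into whatever logistic regression implementation is at hand.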

Unsubscription Prediction Result

Networking Topics
How are gamers aware of service quality?
User Perception Measurement

How Are Gamers Aware of Service Quality?
Real-time interactive online games are generally considered QoS-sensitive
Gamers are always complaining about high "ping-times" or network lags
Online gaming is increasingly popular despite the best-effort Internet
Q1: Are game players really as sensitive to network quality as they claim?
Q2: If so, how do they react to poor network quality?

Our Conjecture
Poor Network Quality → Unstable Game Play → Less Fun → Shorter Game Play Time
Strongly associated; verified with real-life game traces

神州 Online (ShenZhou Online)
A commercial MMORPG in Taiwan
Thousands of players online at any time
TCP-based client-server architecture

Trace Collection
(20 hours and 1,356 million packets)
Session # | Avg. Time | Top 20% | Bottom 20%
15,140 | 100 min | > 8 hours | < 40 min

Round-Trip Times vs. Session Time
The y-axis is logarithmic

Delay Jitter vs. Session Time
(std. dev. of the round-trip times)

Hypothesis Testing -- Effect of Loss Rate
Null Hypothesis: all the survival curves are equivalent
Log-rank test: P < 1e-20
We have > 99.999% confidence in claiming that loss rates are correlated with game playing times
(Figure: the CCDF of game session times for the high-loss, medium-loss, and low-loss groups)

Regression Modeling
Linear regression is not adequate
Violating the assumptions (normal errors, equal variance, …)
The Cox regression model provides a good fit
Log-hazard function is proportional to the weighted sum of factors
Hazard function (conditional failure rate)
The instantaneous rate of quitting a game for a player (session)
where each session has factors Z (RTT=x, jitter=y, …)
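In its standard textbook form, which the slides describe in words but do not spell out, the Cox proportional hazards model can be written as below. Here h_0(t) is the baseline hazard and β is the vector of coefficients (weights); both symbols are notation introduced for this note, while Z is the factor vector named on the slide.

```latex
h(t \mid Z) = h_0(t)\,\exp\!\left(\beta^{\top} Z\right)
\quad\Longleftrightarrow\quad
\log h(t \mid Z) = \log h_0(t) + \beta^{\top} Z
```

Because h_0(t) is shared by all sessions, the ratio of two sessions' hazards depends only on the difference of their factor vectors.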

Final Model & Interpretation
Interpretation
A: RTT = 200 ms
B: RTT = 100 ms, other factors same as A
Hazard ratio between A and B:
exp((log(0.2) – log(0.1)) × 1.27) ≈ 2.4
At any moment, A is about 2.4 times as likely as B to leave the game
Variable | Coef | Std. Err. | Signif.
log(RTT) | 1.27 | 0.04 | < 1e-20
log(jitter) | 0.68 | 0.03 | < 1e-20
log(closs) | 0.12 | 0.01 | < 1e-20
log(sloss) | 0.09 | 0.01 | 7e-13
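The hazard-ratio arithmetic can be reproduced with the coefficients from the table. This is a small sketch; the jitter and loss values for sessions A and B are hypothetical placeholders that cancel out because they are identical in both sessions, and RTT is expressed in seconds as in the slide's expression.

```python
import math

# Coefficients of the log-transformed factors, taken from the table.
COEF = {"rtt": 1.27, "jitter": 0.68, "closs": 0.12, "sloss": 0.09}


def hazard_ratio(factors_a, factors_b, coef=COEF):
    """Ratio of the hazards of sessions A and B under a Cox model whose
    covariates are the logarithms of the given factors."""
    log_ratio = sum(
        coef[name] * (math.log(factors_a[name]) - math.log(factors_b[name]))
        for name in coef
    )
    return math.exp(log_ratio)


# A has RTT = 200 ms, B has RTT = 100 ms; other factors are equal
# (their values here are placeholders).
session_a = {"rtt": 0.2, "jitter": 0.05, "closs": 0.01, "sloss": 0.01}
session_b = dict(session_a, rtt=0.1)
print(round(hazard_ratio(session_a, session_b), 1))  # 2.4
```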

How well does the model fit?

Relative Influence of QoS Factors
Latency = 20%
Delay jitter = 45%
Client packet loss = 20%
Server packet loss = 15%

An Index for ShenZhou Online
Features
derived from real-life game sessions
accessible and computable in real time
implication: delay jitter is less tolerable than delay
RTT: round-trip times
jitter: level of network congestion
closs: loss rate of client packets
sloss: loss rate of server packets

App #1: Evaluation of Alternative Designs
Suppose we have two designs (e.g., protocols)
One leads to lower delay but higher jitter:
100 ms, 120 ms, 100 ms, 120 ms, 100 ms, 120 ms, 100 ms, 120 ms, …
One leads to higher delay but lower jitter:
150 ms, 150 ms, 150 ms, 150 ms, 150 ms, 150 ms, 150 ms, 150 ms, …
Which design should we choose?
(Figure: network latency over time, with a reference line at 150 ms)
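To make the trade-off concrete, the two latency sequences can be summarized by their mean delay and by their jitter, taking jitter as the standard deviation of the round-trip times as defined earlier in the deck. This sketch only computes those two summaries; it does not reproduce the index itself, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def delay_and_jitter(rtts_ms):
    """Return (mean delay, jitter) where jitter is the standard deviation
    of the round-trip times, both in milliseconds."""
    return mean(rtts_ms), pstdev(rtts_ms)


design_1 = [100, 120] * 4   # lower delay, higher jitter
design_2 = [150] * 8        # higher delay, no jitter

print(delay_and_jitter(design_1))
print(delay_and_jitter(design_2))
```

Since the fitted model weights jitter more heavily than delay in its relative-influence figures, a design with a slightly higher but steadier latency can still come out ahead.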

App #2: Overlay Path Selection
Internet
Path delay | Jitter | Loss rate | Score
100 ms (G) | 50 ms (P) | 5% (P) | 3.84
150 ms (A) | 20 ms (G) | 1% (A) | 6.33
200 ms (P) | 30 ms (A) | 1% (A) | 5.43

Player Departure Behavior Analysis
The player departure rate decreases over time
The golden time is the first 10 minutes: the longer gamers play, the more external factors affect their decisions to stay or leave
→ allocate more resources to players who have just entered

Predicting the Market Performance of Online Games

How Data Science Can Support Virtual Item Sales in Online Games

Virtual Items

Which Item Sells Best?

Differences in Item Sales
Total sales: 93,945; first-week sales: 55,947
Total sales: 1,268; first-week sales: 992

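What Does a Data Analysis Team Usually Do?
Player level
DAU, WAU, MAU
Online time
Average spending
Item level
Transaction volume of each item
Evolution of each item's transaction volume over time
Players vs. items
Players' preferences for particular items
Relationships among player attributes (gender, age, level, class, VIP status), purchase period, and items
Marketing practice
Use a recommender system to make personalized item recommendations to players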

Actually, There Is One Question We Really Want to Answer…

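Using Data Analysis to Help Design Virtual Items
Quantify the factors that determine how well a virtual item sells
Subjective factors
Image-signal factors
Provide design guidelines that designers can refer to
Build a systematic method that provides rules for adapting virtual item design to games operated in different regions and countries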

Goal
Design best-selling virtual items

Dataification
Total sales: 93,945; first-week sales: 55,947
Total sales: 1,268; first-week sales: 992

Feature Engineering
A feature is a piece of information that might be useful for prediction. Any attribute could be a feature, as long as it is useful to the model.
"…some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used."
Pedro Domingos, "A Few Useful Things to Know about Machine Learning"

http://jobs.netflix.com/jobs.php?id=NFX01466

Netflix Taggers
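Hires dedicated staff to watch and tag videos following an SOP (36 pages)
555 tags, 76,897 combinations (January 2014)
Builds its video recommender system on top of the tags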

Netflix Micro-genres for Videos

http://bountyworkers.net/

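Style Tags for Female Character Outfits
俏皮 (playful), 暗紅 (dark red), 撩人 (alluring), 溫婉 (gentle), 魔女 (witch), 和風 (Japanese style), 裸露 (revealing), 辣妹 (hot girl)
可愛 (cute), 火焰 (flame), 管家 (butler), 華麗 (gorgeous), 東洋 (oriental), 誘惑 (seductive), 媚惑 (bewitching), 學生 (student)
蓬裙 (puffy skirt), 火辣 (hot), 性感 (sexy), 淘氣 (mischievous), 萌萌 (moe), 制服 (uniform), 彩衣 (colorful outfit), 艷麗 (glamorous)
冷豔 (cool beauty), 惡魔 (devil), 女傭 (housemaid), 夢幻 (dreamy), 狂野 (wild), 神聖 (holy), 女僕 (maid), 飄逸 (flowing)
野性 (untamed), 青春 (youthful), 古典 (classical), 甜美 (sweet), 日式 (Japanese), 迷你裙 (miniskirt)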

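First-Week Sales Tell the Story
First-week sales account for half of an item's total sales
The correlation coefficient between first-week sales and total sales is > 0.9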

Sales Volume vs. Number of Active Players
Correlation coefficient: 0.83

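Virtual Item Sales Index (SI)
Compare the sales performance of equipment released in different periods
Remove the effect of release time (1)
Remove the effect of sales duration (2)
Remove the effect of players' purchasing power (3)
The sales index SI (Sale Index) of each piece of equipment is defined as its sales quantity normalized by (1), (2), and (3)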

Correlation Coefficients between Style Tags and SI

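Outfit with SI 0.0621: 彩衣 (colorful outfit) 0.667, 誘惑 (seductive) 0.143, 俏皮 (playful) 0.048, 火辣 (hot) 0
Outfit with SI 0.0013: 冷豔 (cool beauty) 0.548, 夢幻 (dreamy) 0.161, 俏皮 (playful) 0.065, 裸露 (revealing) 0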

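Predicting High vs. Low SI of Female Outfits from Style Tags
Predicted \ Actual | High | Low | Total
High | 19 | 2 | 21
Low | 2 | 14 | 16
Total | 21 | 16 |
Accuracy: 89.2%
Sensitivity: 90.5%
Specificity: 90.5%
AUC: 0.890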
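For readers who want to check such figures, the usual definitions of accuracy, sensitivity, and specificity can be applied to the four cells of the table. The sketch below assumes rows are predicted labels and columns are actual labels, with "high" as the positive class; that reading of the flattened slide is an assumption. Under it, accuracy and sensitivity reproduce the 89.2% and 90.5% shown, while the specificity it yields depends on that reading and should be treated as unverified.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard metrics from the four cells of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Cells as read from the table: predicted high/actual high = 19,
# predicted high/actual low = 2, predicted low/actual high = 2,
# predicted low/actual low = 14.
metrics = confusion_metrics(tp=19, fp=2, fn=2, tn=14)
print(round(100 * metrics["accuracy"], 1))     # 89.2
print(round(100 * metrics["sensitivity"], 1))  # 90.5
```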

Image Signal Analysis

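Distinguishing High vs. Low SI of Female Outfits from Image Signals
Predicted \ Actual | High | Low | Total
High | 16 | 2 | 18
Low | 5 | 19 | 24
Total | 21 | 21 |
Accuracy: 83.3%
Sensitivity: 88.9%
Specificity: 76.2%
AUC: 0.833
Slightly lower than with style tags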

Predicting the SI of Female Outfits from Style Tags and Image Signals
R^2: 0.669

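Of Course, This Is Only the Beginning…
Develop design guidelines for virtual items
Work with game planners and artists to establish, more concretely, how style tags and image signals relate to practical item design
Analyze the preferences of different player groups for item appearance
Male players buying female outfits vs. female players buying female outfits
Big spenders vs. occasional shoppers
Cross-cultural comparison
Shooting of advertisements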
