Chen‐Chi Wu¹, Kuan‐Ta Chen², Yu‐Chun Chang¹, Chin‐Laung Lei¹

¹Department of Electrical Engineering, National Taiwan University

²Institute of Information Science, Academia Sinica




IPTComm 2008                                                        1
Outline
    Motivation
    Methodology
    Performance evaluation
    Summary




Motivation
    VoIP is becoming popular because of
         Low call cost
         High voice quality
    Skype, a popular VoIP application
           over 10,000,000 concurrent users
    Accurately identifying VoIP flows in network traffic is required for
         Traffic analysis
         Traffic management

Motivation
    Challenges of VoIP flow identification
         Various signaling protocols: SIP, H.323, and various proprietary
         protocols
         Non‐standard port numbers
         Packet payload encryption

    The interaction of human conversation is unique
         It results in a distinctive characteristic of VoIP traffic



4‐State Traffic Pattern
    Infer the on/off (talking/silence) pattern from the level of the
    packet rate during a short period
    We model a two‐way conversation as a process with four states
         State A: A is talking and B is silent
         State B: B is talking and A is silent
         State D: both A and B are talking
         State M: mutual silence

[Figure: ON/OFF (talking/silence) periods of the two parties]
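The slide infers ON/OFF from the packet rate over short periods but does not give the period length or the rate threshold. The sketch below is a minimal illustration under those gaps: it assumes per-window packet counts for each direction are already available and uses a hypothetical threshold `on_threshold`; it then maps each window to one of the four states.

```python
from typing import List, Sequence


def infer_states(pkts_a: Sequence[int], pkts_b: Sequence[int],
                 on_threshold: int) -> List[str]:
    """Map per-window packet counts of both directions to states A, B, D, M.

    pkts_a[i] / pkts_b[i]: packets sent by party A / B in window i (assumed input).
    on_threshold: hypothetical packet count at or above which a party is ON.
    """
    if len(pkts_a) != len(pkts_b):
        raise ValueError("both directions must cover the same windows")
    states = []
    for a, b in zip(pkts_a, pkts_b):
        a_on, b_on = a >= on_threshold, b >= on_threshold
        if a_on and b_on:
            states.append("D")  # both parties talking
        elif a_on:
            states.append("A")  # only A talking
        elif b_on:
            states.append("B")  # only B talking
        else:
            states.append("M")  # mutual silence
    return states


print(infer_states([10, 9, 1, 0, 8], [0, 7, 9, 1, 0], on_threshold=5))
# ['A', 'D', 'B', 'M', 'A']
```

A fixed threshold is the simplest choice; an adaptive threshold per flow may suit codecs with different packet rates better.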

Intuition behind Our Approach
     The 4‐state traffic pattern of VoIP traffic differs distinctly from
     those of other network applications

[Figure: 4‐state traffic patterns of Web, P2P (BitTorrent), online game (WoW), TELNET, and VoIP (Skype) flows; legend: A or B, D, M]


Methodology
    Detect VoIP flows based on the unique human speech 
    conversation patterns embedded in voice traffic
    Derive features (attributes) from the conversation 
    patterns
    Adopt a naïve Bayesian classifier, a supervised machine
    learning tool, to divide traffic into VoIP and non‐VoIP
    classes
         The class label of each training flow is required



Methodology Overview
    Training phase
         Labeled training flows (VoIP or non‐VoIP)
         → Extract 4‐state traffic patterns and derive features
         → Flow vectors
         → Learn the parameters of the naïve Bayesian classifier
    Identification phase
         Incoming flows (unknown class)
         → Extract conversation patterns and derive features
         → Flow vectors
         → Classify with the naïve Bayesian classifier
         → Flow labels (VoIP or non‐VoIP)
Naïve Bayesian Classifier
 The naïve Bayesian classifier is based on Bayes’ theorem
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

       Each flow is represented by a vector X = (x1, x2, …, xn),
       whose components are the values of n features A1, A2, …, An

       Suppose there are m classes, C1, C2, …, Cm


Naïve Bayesian Classifier
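    Given a flow vector X, the classifier predicts that the flow
    belongs to class Ci iff
$$P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \ne i$$
    By Bayes’ theorem
$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
    P(X) is the same for all classes, so the task is to maximize
    P(X | Ci) P(Ci), where P(Ci) is the prior probability of class Ci;
    if the priors are assumed equal, this reduces to maximizing P(X | Ci)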
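To show why the prior matters, here is a minimal sketch of the decision rule with hypothetical likelihood and prior values (not taken from the paper). It compares the full rule, argmax of P(X | Ci) P(Ci) computed in log space, with the equal-prior shortcut, argmax of P(X | Ci).

```python
import math
from typing import Dict


def map_class(likelihood: Dict[str, float], prior: Dict[str, float]) -> str:
    """argmax_i P(X|Ci) P(Ci); P(X) is common to all classes and is dropped."""
    return max(likelihood, key=lambda c: math.log(likelihood[c]) + math.log(prior[c]))


def ml_class(likelihood: Dict[str, float]) -> str:
    """argmax_i P(X|Ci); matches map_class only when all priors are equal."""
    return max(likelihood, key=lambda c: likelihood[c])


# Hypothetical values for a single flow vector X
lik = {"VoIP": 0.02, "non-VoIP": 0.01}
pri = {"VoIP": 0.1, "non-VoIP": 0.9}
print(ml_class(lik), map_class(lik, pri))  # VoIP non-VoIP
```

With these numbers, 0.02 × 0.1 = 0.002 is smaller than 0.01 × 0.9 = 0.009, so ignoring a skewed prior flips the decision.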
Naïve Bayesian Classifier
    The naïve assumption is that the values of the features
    are conditionally independent of one another, given the class
$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$$

    P(x1 | Ci), P(x2 | Ci), …, P(xn | Ci) can be easily estimated
    from the training data
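The slides do not say how each P(xk | Ci) is estimated from the training data. The sketch below assumes one Gaussian per (class, feature) pair, a common choice for continuous features, and estimates class priors from class frequencies; the `var_floor` parameter and the toy two-feature data are hypothetical. Scores are summed in log space, which follows the product form above.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Naive Bayes with an assumed Gaussian density per (class, feature)."""

    def __init__(self, var_floor: float = 1e-9) -> None:
        self.var_floor = var_floor  # keeps variance positive for constant features
        self.priors: Dict[str, float] = {}
        self.params: Dict[str, List[Tuple[float, float]]] = {}  # class -> [(mean, var)]

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for xv, label in zip(X, y):
            by_class[label].append(xv)
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(y)  # P(Ci) from class frequency
            stats = []
            for col in zip(*rows):  # one column per feature
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col)
                stats.append((mean, max(var, self.var_floor)))
            self.params[label] = stats
        return self

    def log_joint(self, xv: Sequence[float], label: str) -> float:
        """log P(Ci) + sum_k log P(x_k | Ci), using conditional independence."""
        total = math.log(self.priors[label])
        for x, (mean, var) in zip(xv, self.params[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return total

    def predict(self, xv: Sequence[float]) -> str:
        return max(self.params, key=lambda c: self.log_joint(xv, c))


# Hypothetical training vectors: (normalized log-likelihood, one other feature)
X = [[-0.3, 1.2], [-0.4, 1.0], [-0.35, 1.1], [-2.0, 0.1], [-2.5, 0.2], [-1.8, 0.05]]
y = ["VoIP"] * 3 + ["non-VoIP"] * 3
clf = GaussianNaiveBayes().fit(X, y)
print(clf.predict([-0.5, 0.9]))  # VoIP
```

Other density estimates, such as discretized histograms, fit the same product form; only `log_joint` would change.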


How to derive features from the 4‐state traffic pattern?
    Use a Markov chain to model the VoIP traffic pattern
    Statistics of traffic patterns


[Figure: 4‐state traffic patterns of Web, P2P, WoW, TELNET, and VoIP flows; legend: A or B, D, M]
Markov Chain
    Build a Markov chain model based on a set of known VoIP 
    traffic patterns
    Derive a feature: the likelihood value
                                      Transition probabilities of the Markov chain (rows: from-state, columns: to-state)
                                                 A         B        D         M
                                       A      0.9022   0.0028    0.0380    0.0571
                                       B      0.0029   0.9030    0.0391    0.0550
                                       D      0.0607   0.0592    0.8763    0.0038
                                       M      0.0465   0.0439    0.0019    0.9078


[Figure: 4‐state Markov chain]
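The slide builds the chain "based on a set of known VoIP traffic patterns" without naming the estimator. A minimal sketch, assuming the standard maximum-likelihood estimate: count the observed transitions between consecutive states and normalize each row. The `alpha` smoothing parameter is an added assumption, not from the slides.

```python
from typing import Dict, Iterable, Sequence

STATES = ("A", "B", "D", "M")


def estimate_transitions(sequences: Iterable[Sequence[str]],
                         alpha: float = 0.0) -> Dict[str, Dict[str, float]]:
    """Row-normalized transition counts P[s][t] from labeled VoIP state sequences.

    alpha: optional additive smoothing (assumption). With alpha = 0, a state
    never observed as a source keeps an all-zero row.
    """
    counts = {s: {t: alpha for t in STATES} for s in STATES}
    for seq in sequences:
        for s, t in zip(seq, seq[1:]):  # consecutive pairs (S_i, S_{i+1})
            counts[s][t] += 1
    probs = {}
    for s in STATES:
        total = sum(counts[s].values())
        probs[s] = {t: (c / total if total > 0 else 0.0) for t, c in counts[s].items()}
    return probs


est = estimate_transitions([["A", "A", "D", "B", "B", "M", "A"]])
print(est["A"])  # {'A': 0.5, 'B': 0.0, 'D': 0.5, 'M': 0.0}
```

A small positive `alpha` avoids zero probabilities, which would otherwise make the log-likelihood of an unseen transition undefined.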
Likelihood of Traffic Patterns
    Given a traffic pattern with a state sequence S1, S2,…, Sn, 
    where Si ∈ { A, B, D, M }
    Compute the log‐likelihood value as
$$\log\left(P_{1,2} \times P_{2,3} \times \cdots \times P_{n-1,n}\right)$$
            Pi,j: the transition probability from Si to Sj
    Traffic flows may vary in length, so define the
    normalized log‐likelihood value as
$$\frac{\log\left(P_{1,2} \times P_{2,3} \times \cdots \times P_{n-1,n}\right)}{N}$$
               N: the length of the sequence
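The normalized log-likelihood can be computed directly with the transition matrix from the Markov chain slide. This sketch takes N as the number of states in the sequence, following the definition above, and sums logarithms instead of multiplying probabilities to avoid underflow on long flows. The two example sequences are hypothetical.

```python
import math
from typing import Dict, Sequence

# Transition probabilities from the Markov chain slide (row: from, column: to)
P = {
    "A": {"A": 0.9022, "B": 0.0028, "D": 0.0380, "M": 0.0571},
    "B": {"A": 0.0029, "B": 0.9030, "D": 0.0391, "M": 0.0550},
    "D": {"A": 0.0607, "B": 0.0592, "D": 0.8763, "M": 0.0038},
    "M": {"A": 0.0465, "B": 0.0439, "D": 0.0019, "M": 0.9078},
}


def normalized_log_likelihood(states: Sequence[str],
                              trans: Dict[str, Dict[str, float]]) -> float:
    """log(P_{1,2} * ... * P_{n-1,n}) / N with N = len(states).

    A zero transition probability makes math.log raise ValueError.
    """
    if len(states) < 2:
        raise ValueError("need at least two states to form a transition")
    log_prod = sum(math.log(trans[s][t]) for s, t in zip(states, states[1:]))
    return log_prod / len(states)


turn_taking = list("AAAAMMMMBBBB")  # long talk spurts, then a handover
ping_pong = list("ABABABAB")        # rapid, non-conversational alternation
print(normalized_log_likelihood(turn_taking, P))  # about -0.57
print(normalized_log_likelihood(ping_pong, P))    # about -5.13
```

The conversation-like sequence scores much higher, which matches the intuition on the next slide.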
Likelihood of Traffic Patterns
    The Markov chain represents typical human conversation

    VoIP flows => high log‐likelihood values

    Non‐VoIP flows => low log‐likelihood values
         They exhibit non‐human‐like behavior: non‐interactive,
         independent, unidirectional




Statistics of Traffic Patterns
    Mean and standard deviation of the length of each ON (talking)
    period of party A (or B)
         Bidirectional behavior

    Mean and standard deviation of the sojourn time in
    states A, B, D, M, respectively
         Interactive behavior

    State alternation frequency
         How fragmented and disordered the traffic pattern is
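A minimal sketch of these statistics, assuming the state sequence comes from fixed-length windows of `dt` seconds (a hypothetical parameter). Two further assumptions, not stated on the slides: a party counts as ON in its own single-talk state and in state D, and a state that never occurs gets (0.0, 0.0). The sojourn-time ratio from the feature summary is included because it uses the same runs.

```python
import statistics
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def runs(states: Sequence[str]) -> List[Tuple[str, int]]:
    """Collapse a state sequence into (state, run length in windows) pairs."""
    return [(s, len(list(g))) for s, g in groupby(states)]


def sojourn_stats(states: Sequence[str], dt: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Mean and population standard deviation of sojourn time (s) per state."""
    per_state: Dict[str, List[float]] = {s: [] for s in "ABDM"}
    for s, n in runs(states):
        per_state[s].append(n * dt)
    return {s: (statistics.mean(v), statistics.pstdev(v)) if v else (0.0, 0.0)
            for s, v in per_state.items()}


def sojourn_ratio(states: Sequence[str]) -> Dict[str, float]:
    """Fraction of the flow's duration spent in each state."""
    return {s: states.count(s) / len(states) for s in "ABDM"}


def talk_spurts(states: Sequence[str], party: str, dt: float = 1.0) -> List[float]:
    """Lengths (s) of contiguous periods in which `party` ('A' or 'B') is ON."""
    on = [s in (party, "D") for s in states]
    return [len(list(g)) * dt for is_on, g in groupby(on) if is_on]


seq = list("AADBBMA")
print(sojourn_stats(seq, dt=0.5)["A"])  # (0.75, 0.25)
print(talk_spurts(seq, "A", dt=0.5))    # [1.5, 0.5]
```

The mean and standard deviation of `talk_spurts` give the speech-period features for each party.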

Statistics of Traffic Patterns
    State alternation frequency
         Number of alternations between different states per unit time

         E.g., 6 alternations between different states in 20 sec.:
         6 / 20 = 0.3 alternations per second
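The example above can be reproduced with a short sketch. It assumes the flow's duration is the number of windows times a hypothetical window length `dt`, and counts an alternation whenever two consecutive states differ.

```python
from typing import Sequence


def alternation_rate(states: Sequence[str], dt: float = 1.0) -> float:
    """Changes between different consecutive states per second of flow time."""
    if not states:
        raise ValueError("empty state sequence")
    alternations = sum(1 for s, t in zip(states, states[1:]) if s != t)
    return alternations / (len(states) * dt)


# 20 one-second windows with 6 state changes
print(alternation_rate(list("AAAMMMBBBBDDAAAMMBBB")))  # 0.3
```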
Feature Summary
                                       Feature set
               Normalized log‐likelihood value based on the Markov 
               chain
               Speech period of party A or B (mean, standard deviation)
               Sojourn time in each state* (mean, standard deviation)
               Ratio of sojourn time in each state*
               Alternation rate between states*
               *states A, B, D, M




Trace Collection
       We collected network traffic from 5 categories of 
       applications
         VoIP (Skype), TELNET, Web, P2P (BitTorrent), online game 
         (World of Warcraft)

Category       # Connections   Duration (min)    # Packets   Bytes (MB)
VoIP                     462            2,388    4,728,240        4,318
TELNET                 2,008            4,729   10,559,261        7,331
Web                    1,406            1,537    2,528,359          680
P2P                   15,845            3,334   29,220,870       30,500
Online game            2,224              120   28,264,360       59,097

Performance Evaluation
    Detect VoIP flows as early as possible
         Detection time is a major concern
         95% accuracy with 4‐second detection time
         97% accuracy with 11‐second detection time




Performance Evaluation
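    Goal: detect VoIP flows
         VoIP flows → positives; non‐VoIP flows → negatives
    True positive rate
$$\mathrm{TPR} = \frac{\text{number of VoIP flows correctly identified}}{\text{total number of VoIP flows}}$$
    False positive rate
$$\mathrm{FPR} = \frac{\text{number of non-VoIP flows misidentified as VoIP}}{\text{total number of non-VoIP flows}}$$
    True negative rate
$$\mathrm{TNR} = \frac{\text{number of non-VoIP flows correctly identified}}{\text{total number of non-VoIP flows}} = 1 - \mathrm{FPR}$$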
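A minimal sketch of these rates, computed from per-flow ground-truth and predicted labels, where True stands for VoIP (the positive class). The example labels are hypothetical.

```python
from typing import Sequence, Tuple


def rates(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Return (TPR, FPR, TNR); True means VoIP. Needs both classes in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # VoIP flagged as VoIP
    fn = sum(1 for t, p in pairs if t and not p)      # VoIP missed
    fp = sum(1 for t, p in pairs if not t and p)      # non-VoIP flagged as VoIP
    tn = sum(1 for t, p in pairs if not t and not p)  # non-VoIP correctly rejected
    return tp / (tp + fn), fp / (fp + tn), tn / (fp + tn)


y_true = [True] * 4 + [False] * 5
y_pred = [True, True, True, False, False, False, False, True, False]
print(rates(y_true, y_pred))  # TPR 0.75, FPR 0.2, TNR 0.8 (up to float rounding)
```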
Performance Evaluation
    97% TPR with a detection time longer than 3 sec.
    Flows of World of Warcraft tend to be misidentified as VoIP
         90% TNR is achieved with a detection time longer than 10 sec.




ROC Curves
    ROC (Receiver Operating Characteristic)
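An ROC curve plots TPR against FPR as the decision threshold varies. The slides do not state which per-flow score was thresholded, so the sketch assumes a generic score where higher means more VoIP-like (for example, a posterior or the normalized log-likelihood); the scores and labels below are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping a threshold; score >= threshold means VoIP."""
    pos = sum(1 for lab in labels if lab)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both VoIP and non-VoIP flows")
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, lab in zip(scores, labels) if s >= thr and lab)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= thr and not lab)
        points.append((fp / neg, tp / pos))
    return points


print(roc_points([0.9, 0.8, 0.4, 0.3], [True, False, True, False]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

This version rescans all flows per threshold for clarity; sorting once and accumulating counts would make it linear after the sort.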




Summary
    Propose a VoIP flow identification scheme based on 
    human conversation patterns

    Our scheme achieves an identification accuracy of 95% within
    4 sec. of detection time, and 97% within 11 sec.

    High accuracy with a short detection time




Thanks for your attention




Detecting VoIP Traffic Based on Human Conversation Patterns