VARIABLE NEIGHBORHOOD PREDICTION OF TEMPORAL COLLECTIVE PROFILES
Presentation for EuroIoTA ’16
Speaker: Keun-Woo Lim
Telecom ParisTech
24-11-2016
Brief Overview
- What do we do in this work?
  - Analysis of temporal collective profiles (time series)
  - Use of mobile datasets (cellular, Wi-Fi)
  - Real-time and lightweight prediction (online prediction)
- What do we try to achieve?
  - Better prediction accuracy
  - Lower computational complexity
  - Better applications and use cases
Contents
- Introduction
- Methodology
- Prediction
- Outlier Detection
- Future Work
Introduction
Temporal collective profiles
- Representation of data that aggregates the behavior of a group of individuals over time
- Can be categorized in various ways
  - E.g., “daily profiles”
What is collected?
- Basic telephone calls and SMS?
- However, we want to focus on more specific matters
  - Specific application data
  - Usage of Internet services
Why do we analyze these data?
- For “online network analysis”
  - Real-time prediction of the near future
  - Recognition of sudden changes and outliers
  - Timely adaptation
- Use cases
  - Resource allocation
  - Traffic handling
  - Social behavior
Requirements
- Low computational complexity
  - Lightweight prediction methods
- Accuracy
  - Predictions still have to be accurate
Dataset
- Cellular mobile dataset
  - 1 week of data from 90 LACs (location area codes) in Paris
  - More than 500 daily profiles
- Wi-Fi cloud dataset
  - 122 days (March 1 to June 30, 2014)
  - 60 million URL connection logs (top 20 mobile applications)
Methodology
What should we do with daily profiles?
- Daily profiles can be:
  - Very similar to each other (same day, location, etc.)
  - Very different (outliers, events)
- We use methods to calculate similarity in order to:
  - Cluster similar profiles
  - Distinguish different profiles
Previous work (offline analysis) [1]
- Use of clustering methods (UPGMA)
  - With similarity comparison techniques (DTW, quantiles)
- Not ideal for online data analysis
  - Clustering may take a long time ($O(M^2 N^3)$ with DTW)
[1] K. Lim, S. Secci, L. Tabourier, B. Tebbani, “Characterizing and predicting mobile application usage,” https://hal.archives-ouvertes.fr/hal-01345824/document
Profile similarity
- We use two example similarity measures (for time series of M values):
  - Euclidean distance (ED): $\Theta(M)$
  - Dynamic time warping (DTW): $\Theta(M^2)$
- For a dataset containing N profiles, comparing all profiles with each other costs:
  - ED: $\Theta(N^2 M)$
  - DTW: $\Theta(N^2 M^2)$
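Below is a minimal pure-Python sketch of the two measures. It assumes each profile is a list of numbers on the same sampling grid. The function names are hypothetical. The absolute-difference local cost and the unconstrained warping path in DTW are common textbook choices; the slides do not confirm them. The nested loop in `dtw_distance` fills an $(M+1) \times (M+1)$ table, which is where $\Theta(M^2)$ comes from. `all_pairs` performs the $N(N-1)/2$ comparisons behind the $\Theta(N^2 \cdot)$ terms.

```python
import math


def euclidean_distance(a, b):
    """ED between two equal-length profiles: a single pass, Theta(M)."""
    if len(a) != len(b):
        raise ValueError("ED needs profiles of equal length")
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def dtw_distance(a, b):
    """Classic DTW filling an (M+1) x (M+1) cost table: Theta(M^2)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def all_pairs(profiles, dist):
    """Distances for every unordered pair of N profiles: N(N-1)/2 calls to `dist`."""
    n = len(profiles)
    return {(i, j): dist(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}
```

For example, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` returns `0.0` because warping absorbs the repeated 1. ED, by contrast, rejects profiles of different lengths.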
Weighted graph representation
- Using the similarity measures, we obtain a weighted graph whose edges connect neighboring profiles
  - E.g., if ED is used, a lower edge weight means more similar profiles
Filtering paths
- Filter out neighbors with a high distance
  - Depending on the value of α, the number of neighbors changes for each profile
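The slides do not give the exact filtering rule. The sketch below assumes the same relative threshold that VNP later uses for neighbor selection: profile i keeps neighbor j when $d(i, j) \le \alpha \cdot d(i, \text{closest})$. It builds the complete ED-weighted graph (lower weight means more similar) and then filters it. `build_graph` and `filter_graph` are hypothetical names.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def build_graph(profiles, dist=euclidean_distance):
    """Complete weighted graph as {i: {j: d(i, j)}}; a lower weight means more similar."""
    n = len(profiles)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):  # N(N-1)/2 distance evaluations
            d = dist(profiles[i], profiles[j])
            graph[i][j] = d
            graph[j][i] = d
    return graph


def filter_graph(graph, alpha):
    """ASSUMED rule: i keeps j iff d(i, j) <= alpha * (distance from i to its closest profile)."""
    filtered = {}
    for i, edges in graph.items():
        closest = min(edges.values()) if edges else 0.0
        filtered[i] = {j: d for j, d in edges.items() if d <= alpha * closest}
    return filtered
```

Each profile uses its own closest distance as the reference, so the filtered graph is directed: i may keep j while j drops i. Raising α enlarges every neighborhood, which matches the remark that the number of neighbors depends on α.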
Visualization of graph structure
- [Figure: example graph structure using ED on the cellular dataset]
Variable Neighborhood Prediction (VNP)
Principle of VNP
- For a new day $x_n(t)$, we set $t_0 = 0$, $t_1 \in [0, 24]$, $t_2 = 24$ (hours)
- Objective
  - Observation period: $x_n[t_0, t_1]$, i.e., $x_n$ restricted to $[t_0, t_1]$
  - Create a temporal profile to predict $x_n[t_1, t_2]$
- Find $x$ from the observation period
  - $x$ is the training profile whose $x[t_0, t_1]$ is closest to $x_n[t_0, t_1]$
Find the neighbors
- Using the closest neighbor $x$, we find the group of neighbors $N_n$ used for prediction
- For any other profile $y$ of the training set,
  $y \in N_n \iff s\big(x_n[t_0, t_1],\, y[t_0, t_1]\big) \le a \cdot s\big(x_n[t_0, t_1],\, x[t_0, t_1]\big)$
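The membership rule translates directly into a filter over the training set. This sketch again assumes hourly samples and uses ED for $s$. The helper name `neighborhood` is for illustration only.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def neighborhood(new_day, training, t1, a, t0=0, dist=euclidean_distance):
    """Indices y with s(x_n[t0,t1], y[t0,t1]) <= a * s(x_n[t0,t1], x[t0,t1]),
    where x is the training profile closest to the observation window."""
    observed = new_day[t0:t1]
    dists = [dist(observed, p[t0:t1]) for p in training]  # N window comparisons
    d_closest = min(dists)  # s(x_n, x) for the closest profile x
    return [i for i, d in enumerate(dists) if d <= a * d_closest]
```

With $a = 1$, only the closest profile (and exact ties) survives; this is the closest-neighbor baseline used in the prediction analysis. If the closest profile matches the observation exactly ($s = 0$), the threshold collapses to zero for every $a$.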
Creating the prediction profile
- Using $N_n$, formulate the prediction for $t \in [t_1, t_2]$:
  $\hat{x}_n(t) = \dfrac{\sum_{y \in N_n} s(x_n, y)\, y(t)}{\sum_{y \in N_n} s(x_n, y)}$
- Simply put, it is the weighted average of the profiles in the neighborhood
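The slide writes the weights as $s(x_n, y)$. However, the membership rule treats $s$ as a distance (smaller means closer), and weighting directly by a distance would give more weight to farther neighbors. The sketch therefore takes the weights as an input and leaves the choice open. Inverse distance is one option, but that is an assumption, not something the slides state. `weighted_prediction` is a hypothetical name.

```python
def weighted_prediction(neighbors, weights, t1, t2):
    """x_hat_n(t) = sum_y w_y * y(t) / sum_y w_y for t in [t1, t2) (hourly indices)."""
    if len(neighbors) != len(weights):
        raise ValueError("one weight per neighbor is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [sum(w * y[t] for y, w in zip(neighbors, weights)) / total
            for t in range(t1, t2)]
```

For example, `weighted_prediction([[0, 0, 2, 4], [0, 0, 4, 8]], [3, 1], 2, 4)` returns `[2.5, 5.0]`.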
Training Parameter $a$
- $a$ can be tuned to select the optimal number of neighbors
- Variable neighborhood search finds the $a$ that yields the highest accuracy over time
  - E.g., $1.0 < a < 10.0$
- Drawbacks
  - Increased complexity (the neighborhood must be recalculated for each $a$)
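This is a compact end-to-end sketch of tuning $a$, with three stated substitutions. A plain grid search over candidate values replaces the variable neighborhood search named on the slide. Each candidate is scored by predicting every past day from the remaining ones (a leave-one-out scheme assumed here). Inverse-distance weights stand in for $s(x_n, y)$. All function names are hypothetical.

```python
import math


def ed(a, b):
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def vnp_predict(new_day, training, t1, a, t2=24):
    """One VNP run: neighborhood on [0, t1), weighted average on [t1, t2)."""
    dists = [ed(new_day[:t1], p[:t1]) for p in training]
    d_min = min(dists)
    hood = [(p, d) for p, d in zip(training, dists) if d <= a * d_min]
    # ASSUMPTION: inverse-distance weights stand in for s(x_n, y).
    weights = [1.0 / (d + 1e-9) for _, d in hood]
    total = sum(weights)
    return [sum(w * p[t] for (p, _), w in zip(hood, weights)) / total
            for t in range(t1, t2)]


def relative_error(actual, predicted):
    """Sum of squared errors divided by the energy of the actual window."""
    num = sum((x - y) ** 2 for x, y in zip(actual, predicted))
    return num / sum(x ** 2 for x in actual)


def tune_a(history, t1, candidates, t2=24):
    """Grid search over `candidates` with leave-one-out scoring on past days."""
    best_a, best_err = None, math.inf
    for a in candidates:  # one full pass per candidate: the extra cost of tuning a
        errors = []
        for k, day in enumerate(history):
            others = history[:k] + history[k + 1:]
            forecast = vnp_predict(day, others, t1, a, t2)
            errors.append(relative_error(day[t1:t2], forecast))
        mean_err = sum(errors) / len(errors)
        if mean_err < best_err:
            best_a, best_err = a, mean_err
    return best_a, best_err
```

The outer loop over `candidates` is the drawback listed above: every extra value of $a$ costs one more full pass over the history, so a narrower grid reduces the cost proportionally.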
Calculating multiple $t_1$
- For a more fine-grained prediction, multiple values of $t_1$ can be used within one day
  - The VNP is repeated for each $t_1$ (e.g., per-hour analysis)
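Repeating VNP for several values of $t_1$ amounts to a loop around the single-step procedure. This sketch assumes hourly samples ($t_2 = 24$) and runs one step per hour. It uses uniform weights inside the neighborhood purely for brevity. `per_hour_vnp` is a hypothetical name. Only the first $t_1$ values of `new_day` are read at each step, so a full day can be passed in to replay the online process.

```python
import math


def ed(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def per_hour_vnp(new_day, training, a, hours=range(1, 24), t2=24):
    """Run one VNP step per split point t1; returns {t1: forecast of [t1, t2)}."""
    forecasts = {}
    for t1 in hours:
        # The observation window grows with t1, so later steps compare longer windows.
        dists = [ed(new_day[:t1], p[:t1]) for p in training]
        d_min = min(dists)
        hood = [p for p, d in zip(training, dists) if d <= a * d_min]
        # Uniform weights for brevity (an assumption; the slides weight by s(x_n, y)).
        forecasts[t1] = [sum(p[t] for p in hood) / len(hood) for t in range(t1, t2)]
    return forecasts
```

Each iteration recomputes distances on a longer window, so later hours cost more. The per-$a$ complexity figures apply to each iteration separately.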
Handling Complexity - VNP
- Cost of computing the neighborhood of the target day, per value of $a$:
  - ED: $\Theta(NM)$
  - DTW: $\Theta(NM^2)$
- Depending on N, this can be large in practice
- With multiple-$t_1$ analysis, a large M also increases the cost
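As a back-of-the-envelope illustration (not a figure from the slides), assume $K$ candidate values of $a$, hourly split points $t_1 = h$ for $h = 1, \dots, H$, and $r$ samples per hour. The observation window at $t_1 = h$ then holds $M_h = rh$ values. Summing the per-$a$ costs above over one day gives:

```latex
\begin{aligned}
\text{ED:}\quad  \sum_{h=1}^{H} K\,\Theta(N\, r h)     &= \Theta\!\left(K N r \,\frac{H(H+1)}{2}\right) \\
\text{DTW:}\quad \sum_{h=1}^{H} K\,\Theta(N\, r^2 h^2) &= \Theta\!\left(K N r^2 \,\frac{H(H+1)(2H+1)}{6}\right)
\end{aligned}
```

Take $H = 23$ (every hour that still leaves something to predict). The window factors are then $276\,r$ for ED and $4324\,r^2$ for DTW, which is why DTW needs more care when many values of $t_1$ are used.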
Handling Complexity - Graph
- Can be heavy:
  - ED: $\Theta(N^2 M)$
  - DTW: $\Theta(N^2 M^2)$
- Fortunately, the graph representation is only updated once per day
  - However, it is needed for several values of M in the case of multiple-$t_1$ analysis
- Space partitioning can also be used to reduce time
  - E.g., via a k-d tree
  - This can reduce the cost of an ED neighbor query to $\Theta(M \log N)$ (average case)
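This is a pure-Python sketch of the k-d tree idea for the ED case. It uses a textbook median-split tree, not necessarily the structure used in this work. A nearest-neighbor query gives $d_{\min}$. A radius query then collects every profile within $a \cdot d_{\min}$, which is exactly the VNP neighborhood under ED. The logarithmic behavior is an average-case property in low dimensions; as the number of dimensions $M$ grows, k-d tree queries tend toward a linear scan. All names are hypothetical.

```python
import math


def ed(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


class KDNode:
    """One k-d tree node: a profile index, its coordinates and the splitting axis."""

    def __init__(self, index, point, axis, left, right):
        self.index, self.point, self.axis = index, point, axis
        self.left, self.right = left, right


def build_kdtree(points, depth=0, indices=None):
    """Median-split k-d tree over equal-length profiles; the axis cycles with depth."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[indices[0]])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(indices[mid], points[indices[mid]], axis,
                  build_kdtree(points, depth + 1, indices[:mid]),
                  build_kdtree(points, depth + 1, indices[mid + 1:]))


def nearest(node, query, best=(None, math.inf)):
    """Nearest-neighbor search with pruning; returns (index, distance)."""
    if node is None:
        return best
    d = ed(query, node.point)
    if d < best[1]:
        best = (node.index, d)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    if abs(diff) < best[1]:  # the far side can only help if the splitting plane is closer
        best = nearest(far, query, best)
    return best


def within_radius(node, query, radius, out):
    """Append the index of every profile with ED <= radius to `out`."""
    if node is None:
        return out
    if ed(query, node.point) <= radius:
        out.append(node.index)
    diff = query[node.axis] - node.point[node.axis]
    if diff <= radius:   # left side holds coordinates <= the split value
        within_radius(node.left, query, radius, out)
    if diff >= -radius:  # right side holds coordinates >= the split value
        within_radius(node.right, query, radius, out)
    return out


def kd_neighborhood(tree, query, a):
    """VNP neighborhood under ED: find d_min, then every profile within a * d_min."""
    _, d_min = nearest(tree, query)
    return sorted(within_radius(tree, query, a * d_min, []))
```

A quick sanity check is to compare `kd_neighborhood` with a brute-force scan over the same profiles. Both apply the same `ed` function, so they should select the same set.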
Prediction Analysis
Prediction accuracy analysis
- Prediction accuracy is measured by the relative error, defined as
  $\varepsilon = \dfrac{\sum_{t=t_1}^{t_2} \left(x_n(t) - \hat{x}_n(t)\right)^2}{\sum_{t=t_1}^{t_2} x_n(t)^2}$
- Comparison with the closest neighbor ($a = 1$) and UPGMA
  - $t_1 = 12$
[Figure panels: cellular data, ED | cellular data, DTW]
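The relative error translates into a few lines of code. `relative_error` is a hypothetical name. The inputs are assumed to be the observed and predicted values over the same window $[t_1, t_2]$.

```python
def relative_error(actual, predicted):
    """epsilon = sum (x_n(t) - x_hat_n(t))^2 / sum x_n(t)^2 over the prediction window."""
    if len(actual) != len(predicted):
        raise ValueError("windows must have the same length")
    den = sum(x ** 2 for x in actual)
    if den == 0:
        raise ValueError("relative error is undefined for an all-zero window")
    return sum((x - y) ** 2 for x, y in zip(actual, predicted)) / den
```

A perfect forecast gives $\varepsilon = 0$, and an all-zero forecast gives $\varepsilon = 1$; for example, `relative_error([3, 4], [0, 0])` returns `1.0`.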
Effect of changing $t_1$
- Per-hour analysis
  - The length of the observation period may also affect prediction performance
[Figure panels: cellular data, ED | cellular data, DTW]
Time consumption
- The required time is acceptable for both methods in a per-hour analysis
- However, DTW needs caution when many profiles are used
[Figure panels: cellular data, ED | cellular data, DTW]
Distribution of α
- The optimal α is concentrated in the range [1, 2], which lets us easily restrict the search range of α
- The number of neighbors is heterogeneous, but most profiles have fewer than 20
Conclusion & Future work
- We have proposed a methodology for online prediction of mobile time-series datasets
  - The computation time is acceptable for our current dataset
  - It can be applied to other time-series datasets in various IoT environments
- Further studies include
  - Testing on a larger-scale dataset
Any Questions?
Appendix – Wi-Fi data prediction
[Figure panels: Wi-Fi data, ED | Wi-Fi data, DTW]

