Information fusion for location data analysis
Candidate: Alket Cecaj Supervisor: Prof. Marco Mamei
Doctorate School in Industrial Innovation Engineering
Thesis outline
• Introduction to Data Fusion Methods
• Location Data and Application Scenarios
• Data Fusion for Event Detection and Event Description
• Re-identification of Anonymized CDR Records Using Information Fusion
• Privacy issues
• Conclusions
Location data and application scenarios
Data
• Location data such as CDR (Call
Detail Records)
• Geo-tagged social network data or
data from LBS
• Open data with a location
dimension such as census data
Applications
• Socio-economic development
(D4D)
• Smart mobility applications, land use
and city management
• Ground truth information for
validation analysis
Introduction to data fusion
Introduction to data fusion methods
• Stage based methods.
• Feature level-based.
• Semantic meaning-based data fusion methods
Location data fusion : side effect
• Data fusion enables a huge number of applications
• Privacy risks for individual data
Data fusion for event detection / description by
using aggregated CDR data and geo-tagged social
network data
Detecting and describing events happening in urban
areas by analysing spatio – temporal data
Reference to the article
Presentation of PhD thesis on Location Data Fusion
The dataset
The dataset: spatio-temporal aggregation
Spatial Aggregation
Temporal aggregation
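As a rough illustration of spatio-temporal aggregation, the sketch below sums activity over coarser grid cells and fixed time bins. The record layout `(cell_id, epoch_seconds, activity)`, the row-by-row cell numbering, and the `block` and `bin_seconds` parameters are assumptions made for this example, not details taken from the thesis.

```python
from collections import defaultdict

def aggregate(records, grid_width, block=2, bin_seconds=3600):
    """Sum activity over block x block groups of cells and fixed time bins.

    records: iterable of (cell_id, epoch_seconds, activity).
    Assumption: cell ids number a square grid row by row, starting at 0.
    """
    totals = defaultdict(float)
    for cell_id, ts, activity in records:
        row, col = divmod(cell_id, grid_width)
        coarse_cell = (row // block, col // block)   # spatial aggregation
        time_bin = ts // bin_seconds                 # temporal aggregation
        totals[(coarse_cell, time_bin)] += activity
    return dict(totals)

# Hypothetical records on a 4 x 4 grid.
records = [(0, 0, 1.0), (1, 600, 2.0), (5, 3599, 3.0), (2, 100, 4.0), (0, 3600, 5.0)]
totals = aggregate(records, grid_width=4)
```

Fewer, coarser entries make the later statistical modelling cheaper to compute, at the cost of spatial and temporal resolution.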
Statistical modelling
Outlier detection
method
Median method :
[LB,UB] = [Q50 – k*Q50, Q50 + k*Q50]
IQR method :
[LB,UB] = [Q25 – k*IQR, Q75 + k*IQR]
Q75 method :
[LB,UB] = [Q25 – k*Q25, Q25 + k*Q75]
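The median and IQR bounds above can be sketched in a few lines. This is a minimal illustration, assuming linearly interpolated percentiles and a hypothetical series of activity levels for one grid cell; the helper names and the values of `k` are chosen for the example only. The Q75 method is left out because its formula cannot be checked from the slide alone.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty sequence."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def median_bounds(values, k):
    """Median method: [Q50 - k*Q50, Q50 + k*Q50]."""
    q50 = percentile(values, 50)
    return q50 - k * q50, q50 + k * q50

def iqr_bounds(values, k):
    """IQR method: [Q25 - k*IQR, Q75 + k*IQR]."""
    q25, q75 = percentile(values, 25), percentile(values, 75)
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr

# Hypothetical activity levels for one grid cell; 60 is the candidate event.
activity = [10, 12, 11, 13, 12, 11, 60]
lb, ub = iqr_bounds(activity, k=1.5)
outliers = [v for v in activity if v < lb or v > ub]
print(outliers)  # [60]
```

Values outside [LB, UB] are treated as candidate events; a larger `k` widens the interval and flags fewer of them.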
Ground-truth dataset
• Football matches
• Fairs
• Protests
• Other events
Events happening in the period of
time the data covers
Measuring precision and
recall of the system
True positives (tp)
False positives (fp)
False negatives (fn)
Precision = tp / (tp + fp)
Recall = tp / (tp + fn)
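A small sketch of how these two measures could be computed from a set of detected events and a ground-truth set. Identifying an event by a `(cell, date)` pair is an assumption made for this example, and the sample values are hypothetical.

```python
def precision_recall(detected, ground_truth):
    """Return (precision, recall) for two collections of event identifiers."""
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)   # detected and real
    fp = len(detected - ground_truth)   # detected but not real
    fn = len(ground_truth - detected)   # real but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical events identified by (cell, date).
detected = {("cell_12", "day_03"), ("cell_12", "day_10"), ("cell_40", "day_05")}
truth = {("cell_12", "day_03"), ("cell_07", "day_20")}
precision, recall = precision_recall(detected, truth)
```

Here one of three detections is correct and one of two real events is found, so precision is 1/3 and recall is 1/2.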
Precision – Recall of event detection system
Precision – Recall Milano vs Trentino SMS-Call
By combining the results from
the two datasets
• Improvement of precision – recall
performance of the method
• The improvement is limited in the
long run by the main dataset.
• The same improvement can be
observed also by joining the
results of the other datasets.
Improving event detection results by data fusion
By using the CDR the events
can be detected but not
described:
• By joining the results the data
can complement and enrich
each other.
• In this case the social dataset
can be used to describe
the events semantically
Data fusion for Event description
Comparing the results with other work on event
detection
• Two other similar works
• Using much more sophisticated algorithms
• Comparable results
Re-identification of CDR data by using social
network geo-tagged data
• Fine grained social and CDR user data
• Mobility paths
• Uniqueness of mobility prints
• Matching of user’s mobility path
• Re-identification probability evaluation
• The groundtruth problem.
Location data : CDR and social
CDR data
1. Massive dataset about millions of
users
2. Released in an anonymized format
3. Regularly sampled
4. Tower granularity (400 m to several km)
Geo-tagged social data
1. Sparse data following an exponential distribution
(many users, few events per user)
2. Not anonymized
3. Irregular sampling
4. Precise (GPS or triangulated location)
Re-identification of CDR data by using social
network geo-tagged data
• Anonymization.. and re-identification
• Movie ratings from NetFlix Prize dataset
• Medical records of Massachusetts Hospital using a voters list
• Re-identification of anonymous volunteers in a DNA study for Personal Genome
Project
• In line with our domain
• Unique in the Crowd: the privacy bounds of Human Mobility
• Markov chain models for de-anonymization of geo-located data
Data fusion process
Mobility measures : radius of gyration
Knowledge extraction : radius of gyration
Radius of gyration : CDR
Radius of gyration : Social Network Data
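For reference, the radius of gyration is commonly defined as the root-mean-square distance of a user's recorded positions from their centre of mass. The sketch below follows that common definition and is not taken from the thesis code; it assumes points are `(lat, lon)` pairs in degrees and approximates the centre of mass by averaging coordinates, which is reasonable only at city scale.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def radius_of_gyration_km(points):
    """Root-mean-square distance of the points from their centroid, in km."""
    n = len(points)
    centroid = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    return math.sqrt(sum(haversine_km(p, centroid) ** 2 for p in points) / n)

# Hypothetical trace: two positions 0.2 degrees of longitude apart.
trace = [(45.0, 9.0), (45.0, 9.2)]
rg = radius_of_gyration_km(trace)
```

For CDR data the points would be tower positions and for social data the geo-tags, so the two datasets can be compared with the same measure.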
Mobility measures and uniqueness of users mobility
Knowledge extraction : uniqueness of traces
Mobility measures and uniqueness of users mobility
Sample of 1000 users from each CDR dataset
Knowledge extraction : uniqueness of traces
Knowledge extraction : uniqueness of traces statistics
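One way to picture the uniqueness test: draw a few points from a user's trace and count how many users in the dataset have all of them. The sketch below is an illustration under assumptions (traces as sets of `(cell_id, time_bin)` points, random sampling with a fixed seed), not the procedure used in the thesis.

```python
import random

def fraction_unique(traces, p, seed=0):
    """Share of users singled out by p points drawn at random from their trace.

    traces: dict user_id -> set of (cell_id, time_bin) points.
    Users with fewer than p points are skipped.
    """
    rng = random.Random(seed)
    eligible = [u for u, t in traces.items() if len(t) >= p]
    unique = 0
    for user in eligible:
        sample = set(rng.sample(sorted(traces[user]), p))
        # Count every trace (including the user's own) containing the sample.
        matches = sum(1 for t in traces.values() if sample <= t)
        if matches == 1:
            unique += 1
    return unique / len(eligible) if eligible else 0.0

# Hypothetical traces: users "a" and "b" share two of their three points.
traces = {
    "a": {(1, 0), (2, 1), (3, 2)},
    "b": {(1, 0), (2, 1), (4, 2)},
    "c": {(5, 0), (6, 1), (7, 2)},
}
all_points = fraction_unique(traces, p=3)
one_point = fraction_unique(traces, p=1)
```

Repeating the measurement for increasing `p` gives the kind of curve the slides summarise: the share of users singled out grows with the number of known points.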
Knowledge extraction : matching users from different datasets CDR and
social dataset
Data fusion : matching algorithm
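A simplified sketch of the matching idea described in the editor's notes: CDR users seen in a different place at the same moment as the social user are excluded, and the remaining ones are ranked by the number of shared points. The discretisation into time bins and cells, and all names below, are assumptions made for the example; the thesis may use a different spatial and temporal tolerance.

```python
def match_candidates(social_trace, cdr_traces, min_points=1):
    """Rank CDR users by the number of points shared with one social user.

    Traces are dicts time_bin -> cell_id. A CDR user observed in a different
    cell during the same time bin is treated as incompatible and excluded.
    """
    candidates = {}
    for cdr_user, trace in cdr_traces.items():
        shared = social_trace.keys() & trace.keys()
        if any(social_trace[t] != trace[t] for t in shared):
            continue  # same moment, different place: cannot be the same person
        if len(shared) >= min_points:
            candidates[cdr_user] = len(shared)
    return sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical traces.
social = {10: "A", 20: "B", 30: "C"}
cdr = {
    "C1": {10: "A", 20: "B", 40: "D"},
    "C2": {10: "A"},
    "C3": {10: "A", 20: "Z"},   # contradicts the social user at time 20
    "C4": {30: "Q"},            # contradicts the social user at time 30
}
ranking = match_candidates(social, cdr)
print(ranking)  # [('C1', 2), ('C2', 1)]
```

C3 and C4 are ruled out, while nothing definite can be said about C1 and C2, which is what motivates a probabilistic evaluation of the remaining candidates.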
Knowledge extraction : matching statistics
• Matching by chance : Bonferroni principle
• Fake social-user events were created:
a) at random
b) by cloning real events (+1 km, +30 min)
• As a result, the number of matchings drops by 60% in the first
case and by 40% in the second case
Data fusion : considerations
As the real identity of CDR users is unknown, validating these results is
difficult.
A Flickr user is identified as a Twitter user (overlapping mobility traces and similar
usernames) and matches one (the only) CDR user.
The MCC field of the CDR record matches the language used in the
picture descriptions and in the tweet content.
Data fusion : groundtruth validation
Data fusion : considerations
Reidentifying CDR users : probabilistic approach
Given that CDR user Ci has Ni events (points) in common with FTi, how likely is it that the two
users are the same?
• A question that is both novel (no other work addresses it in this
domain) and fundamental
• Conditional probability
Re-identification : probabilistic approach
Given that CDR user Ci has Ni events (points) in common with FTi, how likely is it that the two
users are the same?
Re-identification : probabilistic modeling
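The slides name conditional probability without giving the model itself. As a generic illustration only, and not necessarily the formulation used in the thesis, the question can be written with Bayes' rule. Here S is a hypothetical symbol for the event "Ci and FTi are the same person", and N_i is the number of points they share.

```latex
P(S \mid N_i) =
  \frac{P(N_i \mid S)\, P(S)}
       {P(N_i \mid S)\, P(S) + P(N_i \mid \neg S)\, P(\neg S)}
```

The term P(N_i | ¬S) captures matching by chance: the more often unrelated users share N_i points, the less a match says about identity.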
Privacy risks for personal data
The revelatory power of location data
• Location of a person’s home: what kind of city area do they live in?
• Locations of the stores a person frequents: from this information, shopping patterns,
preferences and in some cases religious beliefs can be inferred.
• There are also other types of very sensitive data, such as health records, which can be
deduced from the locations of the doctors and hospitals the person visits.
• By linking two or more locations in time and space, mobility
paths may be inferred.
Privacy risks : privacy preserving techniques
• Data Anonymization
a) K-anonymity in different improved versions
b) Possible re-identification of location data, as already shown
• Data Suppression
a) Suppression and aggregation
b) Utility of the dataset after suppression dramatically reduced
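To make the k-anonymity idea concrete, the sketch below checks whether every combination of quasi-identifier values occurs at least k times. The field names `home_cell` and `work_cell` are hypothetical and the check is the textbook definition, not a technique evaluated in the thesis.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical records: the third user has a unique (home, work) pair.
records = [
    {"home_cell": 12, "work_cell": 40},
    {"home_cell": 12, "work_cell": 40},
    {"home_cell": 7, "work_cell": 40},
]
fine_grained = is_k_anonymous(records, ["home_cell", "work_cell"], k=2)
coarse = is_k_anonymous(records, ["work_cell"], k=3)
```

Dropping the home cell makes the data 3-anonymous but also removes information, which illustrates the loss of utility noted for suppression.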
Challenges
• One of the main challenges is the lack of common engineering standards for data
fusion systems. It has been one of the main impediments to integration and data
fusion.
• As different methods of data fusion behave differently in different applications, it
is not trivial to choose the best method for a specific task.
• Challenges during the data fusion design phase: at which level of abstraction,
reduction and simplification should the data be fused?
• The lack of a unified framework that could orient the process of data fusion
towards a “structured data fusion” vision.
Conclusions and future work
• Information fusion as an enabling process for novel applications
- Future work oriented towards the “structured data fusion” idea
• Privacy
- Assessment of variations of existing privacy preserving techniques (D.P.)
Publications
• Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli: “Collective Awareness for Human ICT Collaboration in Smart Cities”. IEEE WETICE International conference on state-of-the-art research in enabling technologies for collaboration, 17-20 2013.
• Alket Cecaj, Marco Mamei, Nicola Bicocchi: “Re-identification of Anonymized CDR datasets Using Social Network Data”. IEEE Percom International conference on Pervasive Computing and Communications. Budapest, Hungary, 24-28, 2014.
• Alket Cecaj, Marco Mamei (2016): “Data Fusion for City Life Event Detection”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–15.
• Nicola Bicocchi, Alket Cecaj, Damiano Fontana, Marco Mamei, Andrea Sassi, Franco Zambonelli (2014): “Social Collective Awareness in Socio-Technical Urban Superorganisms”. Social Collective Intelligence: Combining the Powers of Humans and Machines to Build a Smarter Society, Part III, Applications and Case Studies, page 227.
• Alket Cecaj, Marco Mamei, Franco Zambonelli (2015): “Re-identification and Information Fusion Between Anonymized CDR and Social Network Data”. In: Journal of Ambient Intelligence and Humanized Computing, pp. 1–14.


Editor's Notes

  • #3: Introduction to data/information fusion methods. Specifically, one speaks of data fusion or information fusion depending on whether the integration is low-level or high-level. The various types of geo-referenced data and the different applications these data can have. A study on the automatic detection of large events in urban areas using aggregated mobile phone data and geo-referenced social data. From aggregated data we move to anonymized CDR data, which expose individual mobility traces. This work studies several characteristics, such as the uniqueness of these traces and how it can affect privacy. Finally, together with the conclusions, several open challenges are presented, concerning both data fusion and privacy preservation.
  • #4: The large volume of data generated during daily routine includes geo-referenced data such as CDR (Call Detail Records), geo-referenced data obtainable from social networks (or LBS such as Foursquare), and open data such as census data. On the other hand, many applications derive from these data. From the point of view of social development, one can mention works that study geo-referenced data to understand how diseases spread or the poverty levels in different urban areas, all studies that help guide possible interventions. In a smart city context, such data make it possible to understand the dynamics of large cities, such as commute patterns and land use, all information useful for understanding and managing a city better. Although these data taken individually are very useful for the applications mentioned above, they can be much more powerful when combined or integrated into a single representation. For example, while CDRs indicate that a large gathering of people is taking place in a certain area, once combined with social data they can also reveal the reason for such an event.
  • #5: This process of combining and integrating data, or data fusion, aims to analyse the data so that each data set can interact with, inform and complete the other data sets. Record matching vs knowledge fusion.
  • #6: 1. Stage-based methods use different data sets at different stages of the data mining process. In this category the data sets are loosely coupled, without any requirements on their consistency. 2. Feature-level methods take features extracted from different data sets and concatenate them into a single array. This array can then be used in clustering and classification methods. 3. Semantic meaning-based methods take into consideration the relations between features in different data sets. This implies that the data miner knows what each data set represents, and why the data sets can be fused or why they reinforce each other in terms of enrichment of information.
  • #7: Data such as anonymized CDR or social network datasets
  • #8: Following the diagram in the first chapter, we present the steps for applying the data fusion methods.
  • #9: Milano grid and time series of the activity levels of one of the cells during the two-month period
  • #10: Big Data Challenge 2014: aggregated CDR data and geo-tagged social network data tables.
  • #11: Faster computation as there are fewer entries
  • #23: The data used in the previous study were aggregated. This means that no personal data were provided: they only give the level of mobile phone activity in a certain geographic area, identified by a square cell inside a grid. However, there are many cases in which CDR data are released at a fine-grained temporal and spatial scale, where anonymized personal data are provided. That means that individual mobility traces can be spotted and analyzed. In the same way, geo-tagged social data from location-based services such as Foursquare, or social networking services such as Twitter or Flickr, can reveal location traces of their users.
  • #32: Conclusions for this part: the uniqueness test shows the number of points needed to single out the mobility traces of 80-95% of all users. At most 7 points are needed to do this. This number is not affected by the time interval of the matching process. The same can be said for the percentage of user paths singled out.
  • #33: Having discovered the number of points sufficient to single out a CDR user, we proceed to match the CDR users with the social ones. The graphic shows a simple matching process between CDR and social data. While C4 and C3 can be excluded because they produce data in different locations at the same moment, nothing can be said about C2 and C1. That is why we use a probabilistic approach that can tell us (within a reasonable limit) whether a CDR user is the same as the social user whose events match.
  • #35: Conclusions for this part: the matching test shows the number of CDR users with which the social users match for a given number of points. That means that every social user has at least one point in common with (on average) 1000 CDR users, two points in common with (on average) 100 CDR users, and 13 points in common with (on average) 50 CDR users. Analogously, the percentage of CDR users with which the social users have 1, 2, 3…15 points in common decreases.