Semantic Approach to
Big Data and Event Processing
Listening to the pulse of our
cities fusing Social Media
Streams and Call Data Records
Emanuele Della Valle
DEIB - Politecnico di Milano
@manudellavalle
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org
8/10/2015 @manudellavalle - http://emanueledellavalle.org
Agenda
 Context
 Problem
 Experimental setting
 Solution
 Evaluation
 Conclusions
The digital reflection of our cities is sharpening
[photo: http://hoglundassociates.com/Images/Cloud_Gate.jpg]
because the urban environment
is captured in open datasets
and streams of information flow
through our cities thanks to
• the pervasive deployment of sensors
• the wide adoption of smartphones
• the usage of (location-based) social networks
and it is tracking changes with a decreasing delay
Data source         By when         Frequency       Delay
Census data         100s of years   years           months
Newspaper           100s of years   days            1 day
Weather sensors     10s of years    hours/minutes   hours/minutes
TV news             10s of years    hours           minutes
Traffic sensors     years           15 minutes      minutes
Call Data Records   years           15 minutes      hours
Social media        years           seconds         seconds
IoT                 recently        milliseconds    milliseconds
Data pile up without making decisions any easier
Mayor: "I have to decide: A or B? Why not C? What if D?"
But smarter Big Data can …
… advance our ability to feel the pulse of our cities
• fusing all those data sources
• making sense of the fused information
to improve decision making and deliver innovative services
Mayor: "Definitely E!"
Let's focus on a concrete research question
[photo: https://www.flickr.com/photos/debord/4932655275]
Can we collect, analyse and repurpose
• social media and
• Call Data Records
to allow
• perceiving emerging patterns and
• observing their dynamics?
More precisely, the research question is
Can we collect, analyse and repurpose
• social media captured at places and events and
• privacy-preserving aggregates of Call Data Records
to allow visually
• perceiving emerging patterns and
• observing their dynamics?
How to set up an experiment?
[photo: https://www.flickr.com/photos/myfuturedotcom/6053042920]
Question                 Answer
Which city?              Milan
Comparing what?          Milan Design Week vs. Milan in general
Experimental subjects?   Event managers and casual audience
What's Milan Design Week?
[map: http://www.fuorisalone.it]
The Milan Design Week (MDW) is a city-scale event
• held yearly in Milan,
• featuring around 1,200 events
• in 500+ places spread across the city and
• attracting about half a million people from all over the
world.
Ingredients of the proposed solution
 Big Data technologies
- Address "velocity" of data streams in memory
- Address "volume" of data that do not fit in memory
 Semantic technologies
- Address "variety" using Ontology-Based Data Access
- Named Entity Recognition and Linking
 Data science
- Statistical modelling
- Anomaly detection
 Visual analytics
- Allow non-expert access to data
- Tell stories out of data
CitySensing - a solution for event managers (2013)
F. Antonelli, M. Azzi, M. Balduini, P. Ciuccarelli, E. Della Valle, R. Larcher:
City sensing: visualising mobile and social data about a city scale event.
AVI 2014: 337-338
http://jol.telecomitalia.com/jolskil/citysensing/
CitySensing - a solution for casual audience (2014)
M.Balduini, E.Della Valle, M.Azzi, R.Larcher, F.Antonelli, and P.Ciuccarelli:
CitySensing: Fusing City Data for Visual Storytelling. IEEE MultiMedia. TO APPEAR
http://jol.telecomitalia.com/jolskil/citysensing/
http://citysensing.fuorisalone.it/
How CitySensing works – step 0
Set up a conceptual model (FraPPE) to master the variety in the data sources
M.Balduini, E. Della Valle: FraPPE: a vocabulary to represent heterogeneous
spatio-temporal data to support visual analytics. ISWC 2015
 FraPPE
• Goal: a vocabulary to represent heterogeneous spatio-temporal data to support visual analytics
 FraPPE offers a homogeneous view to the visual analytics interface built on heterogeneous data
How CitySensing works – step 1
For every pixel compute the volume of Call Data Records
(using privacy-preserving aggregation)
Real data recorded on 13 April 2013 between 13:00 and 00:00
How CitySensing works – step 2
Find the anomalous pixels by comparing the current
volumes with a model of the volumes in this time period
Real data recorded on 13 April 2013 between 13:00 and 00:00
How CitySensing works – step 3
Map anomalies to the districts of Milano Design Week
[map labels: Brera, Tortona, "What's this?"]
Real data recorded on 13 April 2013 between 13:00 and 00:00
How CitySensing works – step 4
For every anomalous pixel capture the hashtags and semantic
entities named in the social media streams
[map labels: Brera, Tortona, "What's this?"]
Real data recorded on 13 April 2013 between 13:00 and 00:00
How CitySensing works – step 5
Take away the hashtags and semantic entities that are
systematically used
[map labels: Brera, Tortona]
Real data recorded on 13 April 2013 between 13:00 and 00:00
Logical architecture of CitySensing – setup time
Analyse Data Stream
Build Models
Capture Data Stream Capture Static Data
MDW
Logical architecture of CitySensing – run time
Analyse Data Stream
Build Models
Detect Anomalies
Capture Data Stream
Visualize Analysis
Store Analysis
Capture Static Data
MDW
Capturing static data via FraPPE
 The frame duration was fixed to 15 minutes
 The Milano area was covered with
• 1 grid (100x100)
• 10,000 cells
• 250x250 meters per cell (the size of the mobile network cells in the centre of Milan)
 During the Milano Design Week a total of 5.76 million pixels were captured
 More than 1,000 events in more than 600 places were collected using the crowd-sourced databases of fuorisalone.it, breradesigndistrict.it and tortonaroundesign.com thanks to a partnership with studiolabo
[map: cells in which there are places hosting Milan Design Week 2013 events]
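The numbers on this slide fit together arithmetically if a pixel is read as one cell observed during one frame and if six days are covered (Tuesday to Sunday, as on the time axes of the evaluation charts). The sketch below is only an illustration under those assumptions: the function name `pixel_of`, the south-west grid origin and the use of local coordinates in metres are hypothetical and not taken from the original system.

```python
from datetime import datetime

FRAME_MINUTES = 15                           # frame duration from the slide
GRID_SIDE = 100                              # 100x100 grid
CELL_METERS = 250                            # 250x250 m per cell
FRAMES_PER_DAY = 24 * 60 // FRAME_MINUTES    # 96 frames per day


def pixel_of(x_m, y_m, when):
    """Map a position (metres from a hypothetical south-west grid origin)
    and a timestamp to (frame index within the day, cell index)."""
    col = int(x_m // CELL_METERS)
    row = int(y_m // CELL_METERS)
    if not (0 <= col < GRID_SIDE and 0 <= row < GRID_SIDE):
        raise ValueError("position outside the grid")
    cell = row * GRID_SIDE + col
    frame = (when.hour * 60 + when.minute) // FRAME_MINUTES
    return frame, cell


cells = GRID_SIDE * GRID_SIDE                # 10,000 cells
pixels_per_day = cells * FRAMES_PER_DAY      # 960,000 pixels per day
pixels_six_days = pixels_per_day * 6         # 5,760,000, i.e. 5.76 million
```

Counting Call Data Records per pixel then amounts to incrementing a counter keyed by the pair returned from `pixel_of`; the privacy-preserving aggregation mentioned in the talk is not shown here.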
Processing Telecom Italia Call Data Records
 1.92 million Gaussian models were built
• one for each pixel (i.e., for each frame and cell)
• grouping the frames by working and week-end days
• using two months of Call Data Records, and
• verifying with an Anderson-Darling test (significance level 0.05) that the volume of CDR has a Gaussian distribution
 Built on Pig, R and Cascalog
 The processing on 7 m1.large EC2 machines took 24 hours
[figures: histogram and Q-Q plot for a bad case and for a good case]
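The model count is consistent with 10,000 cells x 96 frames per day x 2 day types = 1.92 million. As a rough illustration of the per-pixel step, the sketch below fits a Gaussian to the volumes of one pixel and checks normality with an Anderson-Darling statistic. It is plain Python written for this transcript, not the Pig, R and Cascalog pipeline used in the project. The small-sample adjustment and the critical value 0.752 are the commonly tabulated ones for a 0.05 significance level when mean and standard deviation are estimated from the sample, and all function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(sample):
    """Return the small-sample-adjusted A^2 statistic for normality,
    with mean and standard deviation estimated from the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    # Clamp the CDF away from 0 and 1 so the logarithms stay finite.
    eps = 1e-12
    cdf = [min(max(fitted.cdf(x), eps), 1 - eps) for x in xs]
    s = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)


CRITICAL_5_PERCENT = 0.752   # tabulated value for significance 0.05


def fit_gaussian_model(volumes):
    """Fit (mu, sigma) for one pixel and report whether normality
    is not rejected at the 0.05 level."""
    a2 = anderson_darling_normal(volumes)
    return mean(volumes), stdev(volumes), a2 < CRITICAL_5_PERCENT


# Synthetic volumes: evenly spaced normal quantiles (a "good case")
# and evenly spaced exponential quantiles (a skewed "bad case").
normal_like = [NormalDist(100, 10).inv_cdf((i + 0.5) / 100) for i in range(100)]
skewed = [-math.log(1 - (i + 0.5) / 100) for i in range(100)]
good_case = fit_gaussian_model(normal_like)
bad_case = fit_gaussian_model(skewed)
```

In this sketch a pixel whose volumes fail the test simply gets a `False` flag; how the project handled such pixels is not stated on the slide.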
Processing Telecom Italia Call Data Records
 Volume of CDR captured in Milan during the Design Week
What                     2013          2014
Calls                    16,743,875    19,719,629
SMSs                     19,454,497    20,240,485
Internet data accesses   137,381,761   197,767,245
 Calls, SMS and Internet access were aggregated (with privacy-preserving methods) and an anomaly index was computed for each of the 5.76 million pixels
 The processing of 1 day on 7 m1.large EC2 machines took 20 minutes
[image: https://cerijayne.files.wordpress.com/2011/10/outliersss.png]
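The slides do not spell out how the anomaly index is defined. One plausible reading, given that each pixel has a Gaussian model and that the talk speaks of thresholds at two sigmas, is the deviation of the observed volume from the model mean expressed in standard deviations. The sketch below assumes exactly that; the function names and the example numbers are hypothetical.

```python
def anomaly_index(observed, mu, sigma):
    """Deviation of the observed volume from the pixel's Gaussian model,
    expressed in standard deviations (an assumed definition)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (observed - mu) / sigma


def is_anomalous(observed, mu, sigma, threshold=2.0):
    """Flag a pixel whose volume is more than `threshold` sigmas
    above or below what the model expects."""
    return abs(anomaly_index(observed, mu, sigma)) > threshold


# Hypothetical model for one pixel: 120 records expected, sigma of 15.
busy = is_anomalous(160, 120, 15)      # index is about +2.67
ordinary = is_anomalous(130, 120, 15)  # index is about +0.67
quiet = is_anomalous(80, 120, 15)      # index is about -2.67
```

Taking the absolute value treats unusually quiet pixels as anomalies too, which matches a threshold applied on both sides of the mean.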
Do CDR-anomalous pixels relate to events?
 CDR-anomalous pixels = pixels in which the anomaly index is extreme (above +2σ or below -2σ)
 To test whether the anomalous pixels were related to the events of the Milan Design Week
• We used three ground truths
– the pixels of Milan
– the pixels of Brera district
– the pixels of Tortona district
where there was at least one event of Milan Design Week 2013
• We computed
– Precision
– Recall
of the anomalous pixels in finding the pixels in those three ground truths
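Precision and recall here compare two sets of pixels: those flagged as anomalous and those known to host at least one event. A minimal sketch follows, with hypothetical cell identifiers and a hypothetical function name; returning 0.0 for an empty set is a convention chosen for the example.

```python
def precision_recall(detected, ground_truth):
    """Precision and recall of a set of detected cells against the set of
    cells known to host at least one event."""
    detected, ground_truth = set(detected), set(ground_truth)
    hits = len(detected & ground_truth)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical cell identifiers for one 15-minute frame.
anomalous_cells = {101, 102, 103, 250}
cells_with_events = {102, 103, 104, 105, 106}
p, r = precision_recall(anomalous_cells, cells_with_events)
```

With these sets two of the four anomalous cells host events (precision 0.5) and two of the five event cells are found (recall 0.4). Repeating the computation per frame and per ground truth gives one curve per district.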
[figure: precision of the CDR-anomalous pixels over time, one panel each for Milan, Brera and Tortona; x-axis: day and hour from 09 04:00 to 15 01:00 (Tuesday to Sunday); y-axis: 0 to 1]
[figure: recall of the CDR-anomalous pixels over time, one panel each for Milan, Brera and Tortona; x-axis: day and hour from 09 04:00 to 15 01:00 (Tuesday to Sunday); y-axis: 0 to 1]
[figure: precision and recall of the CDR-anomalous pixels over time, one panel each for Milan, Brera and Tortona; x-axis: day and hour from 09 04:00 to 15 01:00 (Tuesday to Sunday); y-axis: 0 to 1]
Processing Social Streams
 The machinery: the Streaming Linked Data framework
M.Balduini, E.Della Valle, D.Dell'Aglio, M.Tsytsarau, T.Palpanas, and C.Confalonieri:
Social Listening of City Scale Events Using the Streaming Linked Data Framework.
International Semantic Web Conference (2) 2013: 1-16
[architecture diagram; labels: Data Source, HTTP, Streaming Linked Data Server (Adapter, Decorator, Analyser, Publisher, Stream Bus), HTTP, HTML5 Browser (Visualizer), Stream]
Processing Social Streams
 Decoration at work
Micro-post: "Happily into a bottle of Heineken bear #heinekendesignweek @ the Heineken Magazzini"
City-Scale Event: Milano Design Week
Event: Heineken Design Week
Location: The Magazzini
[relations shown in the diagram: hosts, takesPlaceIn]
M.Balduini, A.Bozzon, E.Della Valle, Y.Huang, G-J Houben: Recommending Venues Using
Continuous Predictive Social Media Analytics. IEEE Internet Computing 18(5): 28-35
(2014)
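Decoration can be pictured as attaching known entities to a micro-post. The toy sketch below does this with a hashtag lookup. The real framework relies on named entity recognition and linking over semantic data, which is not reproduced here. The lookup tables, the field names and the `decorate` function are all hypothetical.

```python
import re

# Hypothetical lookup tables; a real system would query a knowledge base.
HASHTAG_TO_EVENT = {"#heinekendesignweek": "Heineken Design Week"}
EVENT_TO_LOCATION = {"Heineken Design Week": "The Magazzini"}
CITY_SCALE_EVENT = "Milano Design Week"


def decorate(text):
    """Attach event, location and city-scale event to a micro-post
    based on the hashtags it contains."""
    annotations = []
    for tag in re.findall(r"#\w+", text.lower()):
        event = HASHTAG_TO_EVENT.get(tag)
        if event is not None:
            annotations.append({
                "event": event,
                "location": EVENT_TO_LOCATION.get(event),
                "city_scale_event": CITY_SCALE_EVENT,
            })
    return {"text": text, "annotations": annotations}


post = ("Happily into a bottle of Heineken bear "
        "#heinekendesignweek @ the Heineken Magazzini")
decorated = decorate(post)
```

A post without a known hashtag comes back with an empty annotation list, so later steps can tell decorated posts from undecorated ones.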
Processing Social Streams
 Predictive models were built
• for hashtags and semantic entities systematically present
• using a Holt-Winters method
• grouping the frames by
– working and week-end days and
– early morning, morning, afternoon, evening, and late night
• analysing 300,000 geo-located micro-posts collected over 6 months in the Milano area (November 2013 to April 2014)
• Building a model takes a few seconds per hashtag/semantic entity on a 60€/month VM in an IaaS
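To make the idea concrete, the sketch below implements a basic additive Holt-Winters smoother in plain Python and uses it to subtract the expected usage of a hashtag from the observed one. It illustrates the general technique only. The smoothing parameters, the single seasonal cycle of five time slots per day, the initialisation and the synthetic counts are all assumptions, and no prediction intervals are computed.

```python
def holt_winters_additive(series, period, alpha=0.5, beta=0.1, gamma=0.3):
    """Fit additive Holt-Winters to `series` and return a function that
    forecasts h steps past the end of the series."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons")
    first = series[:period]
    second = series[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonal = [y - level for y in first]   # one component per phase
    for t in range(period, len(series)):
        y = series[t]
        previous_level = level
        s = seasonal[t % period]            # component from one season ago
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (y - level) + (1 - gamma) * s
    n = len(series)

    def forecast(h):
        # h = 1 is the first step after the end of the series
        return level + h * trend + seasonal[(n + h - 1) % period]

    return forecast


# Synthetic history: the same five-slot daily pattern repeated for 8 days.
pattern = [2, 5, 9, 14, 6]
history = pattern * 8
forecast = holt_winters_additive(history, period=5)

# A hypothetical event day with a surge in two of the five slots.
observed = [2, 5, 30, 40, 6]
predicted = [forecast(h) for h in range(1, 6)]
residual = [o - p for o, p in zip(observed, predicted)]
```

On this perfectly periodic history the forecast reproduces the daily pattern, so the residual is close to zero except in the two slots where the event adds 21 and 26 posts. That residual is what remains once systematic usage is taken away.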
[figure: Holt-Winters model of one series; legend: Data, Fitted, Forecast, Lower 2.5%, Upper 97.5%]
Processing Social Streams
 Usage of #milan in the weeks around Milan Design Week
[chart: usage of #milan by time slot (2:00 to 7:00, 7:00 to 11:00, 11:00 to 14:00, 14:00 to 19:00, 19:00 to 2:00) over alternating working days (WD) and week-ends (WE), with Milan Design Week marked]
 Subtracting the predicted usage of #milan
[chart: same time slots and days, showing the observed minus the predicted usage of #milan]
Processing Social Streams
 The difference between the observed and the predicted usage of #milan perfectly fits the usage of #mdw (the official hashtag of Milan Design Week)
[charts: anomalous usage of #milan and usage of #mdw, by time slot (2:00 to 7:00, 7:00 to 11:00, 11:00 to 14:00, 14:00 to 19:00, 19:00 to 2:00) over alternating working days (WD) and week-ends (WE), with Milan Design Week marked]
Processing Social Streams
 Geo-referenced micro-posts were captured, semantically annotated, cleansed using the predictive models and analysed in the Milan area
What                                  2013     2014
Geo-located micro-posts               57,154   21,782
Linked to Milano Design Week          3,569    3,499
Linked to a specific location/event   761      547
 For each pixel with at least 1 micro-post we computed
• The volume related to Milano Design Week
• The top-10 hashtags
• The top-3 locations/events
 Real-time processing was possible with our in-memory C-SPARQL engine and the Streaming Linked Data framework on a 20€/month VM in an IaaS
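The per-pixel ranking of hashtags can be illustrated with a few lines of plain Python. This is a batch sketch for illustration only, not the C-SPARQL continuous query used in the project. The pixel keys, the sample posts and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def top_hashtags_per_pixel(posts, k=10):
    """posts: iterable of (pixel, hashtags) pairs, where pixel is any
    hashable key such as (frame, cell). Returns the k most frequent
    hashtags for every pixel that has at least one micro-post."""
    counts = defaultdict(Counter)
    for pixel, hashtags in posts:
        counts[pixel].update(hashtags)
    return {pixel: [tag for tag, _ in c.most_common(k)]
            for pixel, c in counts.items()}


# Hypothetical micro-posts already mapped to pixels.
posts = [
    ((52, 4711), ["#mdw", "#design"]),
    ((52, 4711), ["#mdw", "#brera"]),
    ((52, 4711), ["#mdw", "#design"]),
    ((53, 1200), ["#tortona"]),
]
top = top_hashtags_per_pixel(posts, k=2)
```

The same pattern, with locations or events in place of hashtags and `k=3`, would give the top-3 locations/events per pixel.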
Do socially active pixels relate to events?
 Socially active pixels = pixels in which we captured social media that talk about Milan Design Week
 We computed
• precision
• recall
of the socially active pixels in finding the pixels in the three ground truths about Milan, Brera district and Tortona district
[figure: precision of the socially active pixels over time, one panel each for Milan, Brera and Tortona; x-axis: day and hour from 09 04:00 to 15 01:00 (Tuesday to Sunday); y-axis: 0 to 1]
[figure: recall of the socially active pixels over time, one panel each for Milan, Brera and Tortona; x-axis: day and hour from 09 04:00 to 15 01:00 (Tuesday to Sunday); y-axis: 0 to 1]
[figure: precision and recall of the socially active pixels over time, one panel each for Milan, Brera and Tortona; x-axis: day and hour from 09 04:00 to 15 01:00 (Tuesday to Sunday); y-axis: 0 to 1]
Are CDR-anomalous and socially active pixels similar?
 Which of the following four scenarios?
[table of four scenarios; columns: Anomalous, Socially active, Intersection, Similar?]
Are CDR-anomalous and socially active pixels similar?
 More formally
• Jaccard: J(A,B) = |A ∩ B| / |A ∪ B|
• E.g., [diagrams: two example pairs of sets A and B, one with J(A,B) = 8/11 and one with J(A,B) = 3/11]
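The Jaccard index is straightforward to compute on sets of pixels. In the sketch below the two pairs of sets are hypothetical and chosen only so that the results match the two example values quoted on the slide. Returning zero for two empty sets is a convention picked for this example.

```python
from fractions import Fraction


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets of pixels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return Fraction(0)   # convention chosen here for two empty sets
    return Fraction(len(a & b), len(union))


# Hypothetical sets chosen to reproduce the two values on the slide.
largely_overlapping = jaccard(range(0, 10), range(2, 11))   # 8 shared, 11 in total
barely_overlapping = jaccard(range(0, 7), range(4, 11))     # 3 shared, 11 in total
```

Using `Fraction` keeps the results exact (8/11 and 3/11); a float division would serve equally well for plotting.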
[figure: recall of the CDR-anomalous pixels, recall of the socially active pixels, and their Jaccard similarity over time, one panel each for Brera and Tortona; x-axis: day and hour from 09 04:00 to 15 01:00 (Tuesday to Sunday); y-axis: 0 to 1]
Visualizing for a casual audience
See it in action!
http://youtu.be/MOBie09NHxM
Evaluation methodology for the casual audience
 Guessability study
• Can you guess what I mean without any explanation?
 E.g.
Dinosaur extinction
"The Shining" by Stephen King
Evaluation of interface guessability
The patterns you should have got
 The CDR anomaly and the social activity are
Correlated / Partially correlated / Not correlated
Evaluation of interface guessability
Q: In Brera District the volume of social media signal is partially correlated with the value of mobile anomaly signal
A: [chart of the answers; y-axis: 0 to 1]
Q: In Porta Romana the volume of social media signal is strongly correlated with the value of mobile anomaly signal
A: [chart of the answers; y-axis: 0 to 1]
Q: In Tortona District the volume of social media signal is strongly correlated with the value of mobile anomaly signal
A: [chart of the answers; y-axis: 0 to 1]
Back to the research question
[photo: https://www.flickr.com/photos/debord/4932655275]
Can we collect, analyse and repurpose
• social media captured at places and events and
• privacy-preserving aggregates of Call Data Records
to allow visually
• perceiving emerging patterns and
• observing their dynamics?
Yes!
at least, in Milano Design Week 2013 and 2014
[photo: https://flic.kr/p/beuDaX ]
Take home message … guess it :-)
Emanuele Della Valle
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org
Acknowledgements
 Politecnico di Milano
• DEIB
– What
- Scientific direction
- Semantic technologies
- Stream Processing
- Data science
– Who
- Emanuele Della Valle
- Marco Balduini
• Density Design Lab
– What
- Visual analytics
– Who
- Paolo Ciuccarelli
- Matteo Azzi
 Telecom Italia
• SKIL Lab
– What
- Big Data technology
- Data Science
– Who
- Fabrizio Antonelli
- Roberto Larcher
 Funding agency
8/10/2015 @manudellavalle - http://emanueledellavalle.org
Semantic Approach to
Big Data and Event Processing
Listening to the pulse of our
cities fusing Social Media
Streams and Call Data Records
Emanuele Della Valle
DEIB - Politecnico di Milano
@manudellavalle
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org
8/10/2015 @manudellavalle - http://emanueledellavalle.org

More Related Content

PDF
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...
PPTX
Smart selfnovember2013
PDF
Big data, Behavioral Change and IOT Architecture
PPT
XEBICON Public November 2015
PPTX
Smart homeamsterdamoctober2013
PDF
Confluence2016
PPTX
City Data Fusion and City Sensing presented at EIT ICT Labs for EXPO 2015
PPTX
Digital delta & geodesign sept2014: connection water data with geo data infra...
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...
Smart selfnovember2013
Big data, Behavioral Change and IOT Architecture
XEBICON Public November 2015
Smart homeamsterdamoctober2013
Confluence2016
City Data Fusion and City Sensing presented at EIT ICT Labs for EXPO 2015
Digital delta & geodesign sept2014: connection water data with geo data infra...

What's hot (14)

PDF
Safecast long version oct 2015
PPTX
Semantics-empowered Smart City applications: today and tomorrow
PPT
The European CIO Conference - November 27th, 2014
PPTX
City Data Fusion: A Big Data Infrastructure to sense the pulse of the city in...
PPT
CUbRIK is
PPT
How to make cities "smarter"?
PPT
Data Analytics for Smart Cities: Looking Back, Looking Forward
PPTX
Big Data & Smart City Applications
PDF
Tim willoughby
PPTX
Big data and smart cities
PPTX
The Future Started Yesterday: The Top Ten Computer and IT Trends
PPTX
20160511 Sustainability in Local Government
PDF
Geospatial Information Management
PDF
Strawberry energy
Safecast long version oct 2015
Semantics-empowered Smart City applications: today and tomorrow
The European CIO Conference - November 27th, 2014
City Data Fusion: A Big Data Infrastructure to sense the pulse of the city in...
CUbRIK is
How to make cities "smarter"?
Data Analytics for Smart Cities: Looking Back, Looking Forward
Big Data & Smart City Applications
Tim willoughby
Big data and smart cities
The Future Started Yesterday: The Top Ten Computer and IT Trends
20160511 Sustainability in Local Government
Geospatial Information Management
Strawberry energy
Ad

Viewers also liked (9)

PDF
Mastering the variety dimension of Big Data with semantic technologies: high ...
PDF
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
PPTX
Examples of Real-World Big Data Application
PDF
Mastering the Velocity Dimension of Big Data
PPTX
Integrating Sensor and Social Data for Understanding City Events
PDF
Semantics Approach to Big Data and Event Processing: an introduction focused ...
PDF
Examples of Applied Semantic Technologies: Social Data Annotation
PPTX
Knoesis-Semantic filtering-Tutorials
PDF
RDF Streams and Continuous SPARQL (C-SPARQL)
Mastering the variety dimension of Big Data with semantic technologies: high ...
Examples of Applied Semantic Technologies: Application of Semantic Sensor Net...
Examples of Real-World Big Data Application
Mastering the Velocity Dimension of Big Data
Integrating Sensor and Social Data for Understanding City Events
Semantics Approach to Big Data and Event Processing: an introduction focused ...
Examples of Applied Semantic Technologies: Social Data Annotation
Knoesis-Semantic filtering-Tutorials
RDF Streams and Continuous SPARQL (C-SPARQL)
Ad

Listening to the pulse of our cities fusing Social Media Streams and Call Data Records