Analogies Observatories Semantic Signatures Challenges Next Steps
Exploring the Data Universe with
Semantic Signatures
Plous Lecture 2015
Krzysztof Janowicz
STKO Lab, University of California, Santa Barbara, USA
Exploring the Data Universe with Semantic Signatures K. Janowicz
Pudding and planets
Analogies & Atoms
Plum Pudding
Thomson’s Plum Pudding Model (1904)
Positive charge distributed equally in the atom, electrons embedded as raisins
Rutherford(-Bohr) Solar System Model (1911/13)
Small nucleus with a high mass and electrons that revolve around it
Analogies
‘And I cherish more than anything the Analogies,
my most trustworthy masters. They know all the
secrets of Nature, and they ought to be least
neglected in Geometry.’ (Johannes Kepler)
Analogies enable us to explore a new
domain (target) by mapping its structure
to another, more familiar domain (source).
They allow us to ask new questions which
only become meaningful in the new domain.
Observatories and sensors
Astronomical Observatories And Their Sensors
The Griffith Observatory
Griffith donated funds and land to build the observatory to make astronomy accessible to
the public. This was in clear contrast to the prevailing idea of locating observatories on
remote mountaintops and restricting them to scientists. Today, our society is willing to invest
billions to study phenomena that may not even exist anymore (e.g., the Pillars of Creation).
Observatories and Their Sensors
Whether on land or in space, observatories and their sensors serve
different purposes and are most useful when they work together.
Spectral Signatures, Bands, and Remote Sensing
Spectral signatures are the combination of emitted, reflected, or absorbed
electromagnetic radiation at varying wavelengths (bands) that uniquely
identify a feature type.
Spectral libraries, the idea of sharing spectral signatures, have
revolutionized remote sensing.
A Universe of Data?
The Data Universe: Synthesis Is The New Analysis
What is the common core of the digital universe, physical-cyber-social
systems, digital earth, 4th paradigm, big data, social machines, and so forth?
Synthesis is the new analysis
Observational science versus experimental science
(Unintended) reuse of existing data, semantic interoperability
Heterogeneity: multi-thematic, multi-perspective, multi-resolution
Towards Data Observatories
Web Science Trust: ‘A web observatory is a system that gives public access
to some specific aspects of the WWW and provides the infrastructure and
visualization techniques to support monitoring, analysis, and experiments.’
Web Science Trust wants to establish a network of observatories.
New questions: are there laws of the data universe?
Constructing the Analogy
Semantic Signatures and Bands
Semantic Signatures As Analogy To Spectral Signatures
Geospatial bands (based on geographic location): ANND, Ripley’s K bins, J measure, Dzero
Temporal bands (based on geo-social check-ins): 24 hours, 7 days, seasons
Thematic bands (based on venue tips and reviews): LDA topics, TF-IDF
Makes use of data heterogeneity
Thematic Bands
Thematic Bands & Geo-Indicativeness
Places at geographic location 34.43, -119.71 are:
of types city, county seat,...
at the coastline, near the mountains, have Mediterranean climate,...
described in terms of urban area, economy, tourism, government, employment,...
Interesting observation: some of these terms will co-occur by type, others per region.
A thematic band can be computed from unstructured text from sources such as Wikipedia, travel blogs, news articles, and so forth.
Non-georeferenced plain text is often still geo-indicative.
Different types of geographic features have different, diagnostic topics associated with them (out of 500 topics).
Indicative topics can be lifted to the type level.
Here, we modeled topics using latent Dirichlet allocation (LDA).
Thematic Bands & Geographic Feature Types
City topics: 204>450>104>282>267>497>443>484>277>97>...
Town topics: 425>450>419>367>104>429>266>69>204>308>...
Mountain topics: 27>110>5>172>208>459>232>398>453>183>...
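A ranking such as "City topics: 204>450>..." can be read as the topics of a type ordered by prominence. The following is a minimal sketch of one way to obtain such an ordering, assuming each document already has a topic distribution (e.g., from LDA) and a feature type label. The function name, the averaging step, and the four-topic toy data are illustrative assumptions, not the procedure used in the talk.

```python
from collections import defaultdict


def rank_topics_by_type(docs):
    """docs: iterable of (feature_type, topic_distribution) pairs.

    Returns {feature_type: [topic indices, most prominent first]},
    where prominence is the mean topic weight over all documents of a type.
    """
    sums = {}
    counts = defaultdict(int)
    for ftype, dist in docs:
        if ftype not in sums:
            sums[ftype] = [0.0] * len(dist)
        for k, weight in enumerate(dist):
            sums[ftype][k] += weight
        counts[ftype] += 1

    ranking = {}
    for ftype, total in sums.items():
        means = [s / counts[ftype] for s in total]
        ranking[ftype] = sorted(range(len(means)), key=lambda k: means[k], reverse=True)
    return ranking


# Hypothetical toy data: three documents, four topics (not the 500-topic model).
docs = [
    ("city", [0.1, 0.6, 0.2, 0.1]),
    ("city", [0.2, 0.5, 0.2, 0.1]),
    ("mountain", [0.7, 0.1, 0.1, 0.1]),
]
ranking = rank_topics_by_type(docs)
print(" > ".join(str(k) for k in ranking["city"]))
```

Comparing the leading topics of two types then indicates which topics are diagnostic for one type and which are shared.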
Temporal Bands
Study geo-social check-in data from location-based social networks.
Aggregate them to the feature type level and clean them.
Intuitively, people visit wineries in the afternoon and evening and bakeries in the mornings.
Combine weekly and hourly bands to create place type signatures.
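As a rough illustration of combining the weekly and hourly bands, the sketch below bins check-in timestamps into 7 x 24 = 168 hour-of-week slots and normalizes the counts. The function name and the sample timestamps are hypothetical; real data would first be aggregated per place type and cleaned as described above.

```python
from datetime import datetime


def temporal_signature(checkins):
    """Return a 168-bin (7 days x 24 hours) normalized check-in histogram.

    Bin index = weekday * 24 + hour, with Monday as weekday 0.
    """
    bins = [0] * (7 * 24)
    for ts in checkins:
        bins[ts.weekday() * 24 + ts.hour] += 1
    total = sum(bins)
    if total == 0:
        return [0.0] * len(bins)
    return [count / total for count in bins]


# Hypothetical check-ins for the place type "bakery".
bakery_checkins = [
    datetime(2015, 3, 2, 8, 15),   # Monday morning
    datetime(2015, 3, 2, 8, 40),   # Monday morning
    datetime(2015, 3, 3, 7, 50),   # Tuesday morning
    datetime(2015, 3, 7, 9, 5),    # Saturday morning
]
signature = temporal_signature(bakery_checkins)
print(signature[0 * 24 + 8])  # share of check-ins on Mondays between 8 and 9 am
```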
Spatial Bands
POI plotted by similarity to bar and post office in OpenStreetMap data (London)
Similarity measured as association strength in OSM change history
Bars (and similar features) tend to clump together
Post Offices (and similar features) are rather uniformly distributed
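The difference between clumped and evenly spread features can be made concrete with a nearest-neighbor statistic. The sketch below computes a plain mean nearest-neighbor distance on planar coordinates; it is an assumption-laden simplification (toy points, no comparison against the value expected under a random pattern), not the band definition used in the talk.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)


# Hypothetical features inside the same 10 x 10 study area.
clumped = [(0, 0), (1, 0), (0, 1), (1, 1)]        # e.g., bars
spread = [(0, 0), (10, 0), (0, 10), (10, 10)]     # e.g., post offices

d_clumped = mean_nearest_neighbor_distance(clumped)
d_spread = mean_nearest_neighbor_distance(spread)
print(d_clumped, d_spread)
```

For the same number of features in the same area, a smaller value points to clustering and a larger one to a more even distribution.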
Dzero measures the likelihood of features of a certain type to co-occur
within a specific semantic and spatial range.
General idea: generate recommendations and clean up data based on
type likelihood. ‘How likely is a post office directly next to an existing one?’
Sensor Resolution & Social Sensing
(Remote sensing) sensors can be characterized by their resolution
Spatial resolution: smallest feature that can be detected, i.e., the pixel size.
Temporal resolution: smallest time interval between repeated observations.
Spectral resolution: number, position, and width of spectral bands.
Radiometric resolution: smallest distinguishable difference in radiation magnitude.
Analogous social sensor resolutions, e.g., types of bands, number of topics.
Platial Resolution of Temporal Signatures
Circular temporal signature histograms for Theme Park (a,b,c) and Drugstore (d,e,f).
About 50% of ≈ 400 Point Of Interest (POI) types are regionally invariant in the USA.
Temporal Resolution of Temporal Signatures
[Figure: the ‘Foursquare-day’, plotted over hours 1 to 24]
How and when do people check in at places: manually or automatically?
Do they check out? If not, after what time are they checked out automatically?
Distinguishable Feature Types For Thematic Signatures From 500 Topics
Which classes in a feature type schema can be meaningfully distinguished?
Spatial Search Challenges
From Space to Place Through Time
1. Challenge: Mapping User Locations from Spaces to Places
Estimate the place visited by a user from the user’s spatial location
(e.g., as measured by their smartphone).
Baseline: Google Place API
Marker  Category                 Distance (m)
A       Bakery                   39.2
B       Nightclub                41.4
C       Nightclub                69.9
D       American Restaurant      62.7
E       Bakery                   73.7
F       Fast Food                65.0
G       Apparel Store            85.8
H       Ice Cream Shop           82.6
I       Movie Theater            94.2
J       Pub                      88.9
K       Cosmetics Shop           60.9
L       Diner                    70.0
M       Italian Restaurant       45.7
N       Furniture / Home Store   114.9
O       Grocery Store            147.8
P       BBQ Joint                82.3
Q       Burrito Place            88.1
R       Italian Restaurant       93.6
Geolocation APIs map geographic coordinates, e.g., from a user’s
smartphone, to an ordered set of nearby candidate POI.
These services typically return the n nearest POI within a certain radius and
use spatial distance to the provided coordinates to determine their order.
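The distance-only baseline described above can be sketched as follows. This is not the interface of any real geolocation service: the function names, the POI records, the coordinates, and the default radius are all hypothetical, and great-circle distance is computed with the haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_pois(lat, lon, pois, n=3, radius_m=150.0):
    """Return up to n (distance_m, name) pairs within radius_m, nearest first."""
    scored = [(haversine_m(lat, lon, p["lat"], p["lon"]), p["name"]) for p in pois]
    return sorted(s for s in scored if s[0] <= radius_m)[:n]


# Hypothetical POI records around a hypothetical user position.
pois = [
    {"name": "Grocery Store", "lat": 34.4220, "lon": -119.7000},
    {"name": "Nightclub", "lat": 34.4205, "lon": -119.7000},
    {"name": "Bakery", "lat": 34.4203, "lon": -119.7000},
]
candidates = nearest_pois(34.4200, -119.7000, pois)
for dist, name in candidates:
    print(f"{name}: {dist:.1f} m")
```

The ordering depends on distance alone, which is the limitation the temporal adjustment in the next slides addresses.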
Our Approach: Distort POI Locations Using Temporal Signatures
Marker  Category                 Distance (m)  Monday 10AM (10⁻³)  Saturday 11PM (10⁻³)
A       Bakery                   39.2          6.28                4.08
B       Nightclub                41.4          0.26                44.16
C       Nightclub                69.9          0.26                44.16
D       American Restaurant      62.7          1.61                9.50
E       Bakery                   73.7          6.28                4.08
F       Fast Food                65.0          4.80                5.78
G       Apparel Store            85.8          2.51                1.09
H       Ice Cream Shop           82.6          0.84                15.88
I       Movie Theater            94.2          1.44                11.00
J       Pub                      88.9          0.53                22.66
K       Cosmetics Shop           60.9          3.87                1.57
L       Diner                    70.0          5.49                7.56
M       Italian Restaurant       45.7          1.42                7.96
N       Furniture / Home Store   114.9         4.79                5.01
O       Grocery Store            147.8         4.53                1.38
P       BBQ Joint                82.3          0.43                9.35
Q       Burrito Place            88.1          0.54                3.16
R       Italian Restaurant       93.6          1.42                7.96
The likelihood of visiting a coffee shop, university, bakery, etc. at 7 pm is
rather low, while it is a peak hour for restaurants.
In analogy to scale distortion in cartography, we can modify the purely spatial
ranking by pulling and pushing places based on the check-in probability
of their temporal type signatures.
Different distortion models: linear, non-linear, symmetric, non-symmetric
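To make the pulling and pushing concrete, here is a minimal sketch of one possible linear, symmetric distortion. The formula, the `strength` parameter, and the function name are assumptions for illustration; the talk does not spell out its distortion models, so this sketch is not expected to reproduce the distorted distances reported on the slides. The three candidates reuse the distances and Monday 10AM probabilities of markers A, B, and M from the table above.

```python
def distort_distances(candidates, strength=30.0):
    """Re-rank candidates by a linearly distorted distance.

    candidates: list of (marker, distance_m, checkin_probability).
    Assumed model: a place whose check-in probability lies above the mean is
    pulled closer, one below the mean is pushed away, by at most `strength`
    meters in either direction.
    """
    probs = [p for _, _, p in candidates]
    mean_p = sum(probs) / len(probs)
    spread = (max(probs) - min(probs)) or 1.0  # avoid division by zero
    distorted = []
    for marker, dist, p in candidates:
        shift = strength * (mean_p - p) / spread
        distorted.append((marker, max(0.0, dist + shift)))
    return sorted(distorted, key=lambda item: item[1])


# Markers A (Bakery), B (Nightclub), M (Italian Restaurant), Monday 10AM.
candidates = [("A", 39.2, 6.28), ("B", 41.4, 0.26), ("M", 45.7, 1.42)]
ranked = distort_distances(candidates)
print([marker for marker, _ in ranked])
```

In this toy run the nightclub, second by distance alone, drops behind the Italian restaurant because a nightclub check-in on a Monday morning is unlikely.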
Marker  Actual Dist. (m)  Distorted Dist. (m)
A       39.2              25.8
B       41.4              71.4
C       69.9              99.9
D       62.7              79.8
E       73.7              60.3
F       65.0              59.5
G       85.8              95.6
H       82.6              106.7
I       94.2              112.8
J       88.9              116.1
K       60.9              61.1
L       70.0              60.6
M       45.7              64.5
N       114.9             109.5
O       147.8             143.9
P       82.3              110.5
Q       88.1              115.2
R       93.6              112.4

Method               MRR    SRR    nDCG   1st Pos.
Distance-Only        0.359  443.8  0.583  211
Temporally Adjusted  0.453  793.5  0.711  423
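Of the measures in this table, the mean reciprocal rank (MRR) is the simplest to state: for each query, take 1 divided by the position of the correct place in the returned list, then average. The sketch below uses made-up rankings and assumes a query contributes 0 when the correct place is missing from its list.

```python
def mean_reciprocal_rank(rankings, truths):
    """rankings: list of ranked candidate lists; truths: the correct item per list."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(truths)


# Hypothetical results for three queries whose correct place is always "A".
rankings = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
truths = ["A", "A", "A"]
mrr = mean_reciprocal_rank(rankings, truths)
print(round(mrr, 3))
```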
Vague Cognitive Regions: Where is SoCal?
2. Challenge: Vague Cognitive Regions
Where are SoCal and NorCal?
Baseline: Tests With Human Participants
44 participants, tessellation of 90 hexagons (≈ 4920 km² each)
Google Maps search for SoCal
[More on the extraction of polygons at spatial@ucsb.local2015]
Data and Correlations
Source        SoCal   NorCal  Total
Flickr        22132   19706   41838
Instagram     169648  116984  286632
Twitter       10376   3294    13670
Travel Blogs  107     78      185
Wikipedia     1450    700     2150

[Figure: empirical cumulative distribution (CDF) of Flickr photo counts per user]

Source                    ρ (M1)  τ (M1)
Flickr                    0.881   0.721
Instagram                 0.867   0.711
Twitter                   0.874   0.714
Travel Blogs & Wikipedia  0.897   0.74
Means                     0.870   0.712
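The symbols ρ and τ are read here as Spearman's rank correlation and Kendall's tau, the usual meaning of these letters for rank agreement. The sketch below computes both for two score lists over the same cells, assuming no tied values; the cell scores are invented, and the tau variant is the simple tau-a.

```python
def ranks(values):
    """Rank positions (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (no ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


# Hypothetical per-cell scores: human ratings vs. a score derived from one source.
human = [6.5, 5.0, 3.5, 2.0, 1.0]
source = [120, 150, 40, 30, 5]
rho = spearman_rho(human, source)
tau = kendall_tau(human, source)
print(round(rho, 3), round(tau, 3))
```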
Vague Cognitive Regions and Inter-rater Agreement
[Figure: map of California with per-cell values and a scale bar in miles. Legend: Insufficient Data; Very, Moderately, and Slightly Northern Californian; Equally Northern and Southern Californian; Slightly, Moderately, and Very Southern Californian. Standard deviations: < 0.01; 0.01 - 0.50; 0.51 - 1.00; 1.01 - 1.73; > 1.73.]
Source       Four Raters  Five Raters
Kendall’s W  0.953        0.929
p-value      < 0.001      < 0.001
Key idea: Data sources become raters/participants.
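Kendall's W (coefficient of concordance) measures how closely several raters agree on the ranking of the same items, from 0 (no agreement) to 1 (identical rankings). A minimal sketch, assuming no tied scores and using invented ratings in which each data source plays the role of one rater:

```python
def kendalls_w(ratings):
    """ratings: list of m score lists, each covering the same n items (no ties).

    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared deviations
    of the per-item rank sums from their mean.
    """
    m, n = len(ratings), len(ratings[0])
    rank_sums = [0] * n
    for scores in ratings:
        order = sorted(range(n), key=lambda i: scores[i])
        for rank, i in enumerate(order, start=1):
            rank_sums[i] += rank
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical "SoCal-ness" scores of four cells from three data sources.
sources = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, 3.5, 4.5],
    [2.0, 1.0, 3.0, 4.0],   # this source swaps the first two cells
]
w = kendalls_w(sources)
print(round(w, 3))
```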
Vague Cognitive Regions and Thematic Signatures
Do you even have to mine for the terms SoCal and NorCal directly?
Vague Cognitive Regions and Self-Similarity
[Figure: distribution of KL divergence (KLD) values for Northern California, Southern California, and both Northern & Southern California]
Based on 60 topics, the similarity among SoCal cells (and among NorCal cells) is
higher than between SoCal and NorCal cells.
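The comparison can be illustrated with the Kullback-Leibler divergence between the topic distributions of two cells, where a lower value means more similar cells. The sketch below uses three invented topics instead of 60; the small `eps` term is an assumption added to avoid division by zero when a topic has zero weight in the second cell.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions (natural log)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical topic distributions of three cells.
socal_a = [0.60, 0.30, 0.10]
socal_b = [0.50, 0.35, 0.15]
norcal = [0.10, 0.30, 0.60]

within = kl_divergence(socal_a, socal_b)
between = kl_divergence(socal_a, norcal)
print(round(within, 3), round(between, 3))
```

Note that the KL divergence is not symmetric, so D(p || q) and D(q || p) generally differ.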
Is the Data Universe Homogeneous and Isotropic?
Limits Of The Data Universe Analogy
At large scale, the physical universe is homogeneous and isotropic.
In terms of geospatial distribution, the Social Media Web is neither homogeneous
nor isotropic. If you direct your social sensing instrument at certain regions,
there will be no signal.
Next Steps
POI Pulse Observatory
POI Pulse Observatory: Explore the Pulse of Los Angeles Using Signatures
http://poipulse.com/
A Public Data Observatory at UCSB?
A tangible & public observatory at
UCSB; remember Griffith’s will.
Show & stream data from different
sources and show analysis results
Visualize privacy implications of data
Show citizens how their everyday data
is used for scientific discoveries
The Right Place
Exploring the Data Universe with Semantic Signatures K. Janowicz

More Related Content

PDF
Ontology Engineering: A View from the Trenches - WOP 2015 Keynote
PDF
Linked (Data) Scientometrics Keynote
PDF
Why the Data Train Needs Semantic Rails -- The Case of Linked Scientometrics ...
PDF
Pattern-based Ontology Engineering
PDF
PDF
Objective Fiction, i-semantics keynote
PDF
Ontology Virtualization for Smart Data -- A Semantics Perspective on Open Dat...
PDF
'The Why, What, and How of Geo-Information Observatories' GeoRich2014 Keynote
Ontology Engineering: A View from the Trenches - WOP 2015 Keynote
Linked (Data) Scientometrics Keynote
Why the Data Train Needs Semantic Rails -- The Case of Linked Scientometrics ...
Pattern-based Ontology Engineering
Objective Fiction, i-semantics keynote
Ontology Virtualization for Smart Data -- A Semantics Perspective on Open Dat...
'The Why, What, and How of Geo-Information Observatories' GeoRich2014 Keynote

Similar to Exploring the Data Universe with Semantic Signatures: Plous Lecture 2015 (20)

PDF
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
PDF
Supporting Geo-Ontology Engineering through Spatial Data Analytics
PDF
Supporting Geo-Ontology Engineering through Spatial Data Analytics
PDF
Sdwwg experiences and outlook
PPTX
Spatial Semantics for Better Interoperability and Analysis: Challenges and Ex...
PPTX
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
PPTX
Semantic Sensor Networks and Linked Stream Data
PDF
Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data
PPTX
Representing and Reasoning about Geographic Occurrences in the Sensor Web
PPTX
Spatial Semantics for Better Interoperability and Analysis: Challenges and Ex...
PPTX
Ingredients for Semantic Sensor Networks
PDF
VO Course 11: Spatial indexing
PDF
Dc32644652
PPT
Google Techtalk 2006
PDF
Semantic Sensor Web
PDF
VO Course 02: Astronomy & Standards
PPT
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
PPT
Lecture_2.ppt on networking system by mr desu
PPT
201109021 mcguinness ska_meeting
Brief State of the Art - Semantic Web technologies for geospatial data - Mode...
Supporting Geo-Ontology Engineering through Spatial Data Analytics
Supporting Geo-Ontology Engineering through Spatial Data Analytics
Sdwwg experiences and outlook
Spatial Semantics for Better Interoperability and Analysis: Challenges and Ex...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
Semantic Sensor Networks and Linked Stream Data
Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data
Representing and Reasoning about Geographic Occurrences in the Sensor Web
Spatial Semantics for Better Interoperability and Analysis: Challenges and Ex...
Ingredients for Semantic Sensor Networks
VO Course 11: Spatial indexing
Dc32644652
Google Techtalk 2006
Semantic Sensor Web
VO Course 02: Astronomy & Standards
SocialSensor Project: Sensing User Generated Input for Improved Media Discove...
Lecture_2.ppt on networking system by mr desu
201109021 mcguinness ska_meeting
Ad

More from kjanowicz (15)

PDF
Debiasing Knowledge Graphs: Why Female Presidents are not like Female Popes
PDF
Golledge Lecture May 2018
PDF
How “Alternative" are Alternative Facts? Towards Measuring Statement Coherenc...
PDF
Geo-Humanities 2017 Keynote at SIGSPATIAL 2017
PDF
Building Blocks for Distributed Geo-Knowledge Graphs
PDF
GeoVoCamp SB 2015 Welcome Slides
PDF
Heterogeneity is Here to Stay and Semantics is Not About Agreement
PDF
AAG 2014 Talk on Ontology Views, Reusue, Alignment
PDF
A Non-Technical, Example-Driven Introduction to Linked Data
PDF
Please don't agree: Introducing Descartes-Core
PDF
Where is the sweet spot for ontologies?
PDF
GEOSPATIAL SEMANTICS -- PROBLEMS AND PROJECTS
PDF
Semantics and Linked Data for CyberGIS -- AAG 2013 Frontiers and Roadmaps Se...
PDF
Big Geo Data
PDF
Introductory slides into Big Data in Geographic Information Science
Debiasing Knowledge Graphs: Why Female Presidents are not like Female Popes
Golledge Lecture May 2018
How “Alternative" are Alternative Facts? Towards Measuring Statement Coherenc...
Geo-Humanities 2017 Keynote at SIGSPATIAL 2017
Building Blocks for Distributed Geo-Knowledge Graphs
GeoVoCamp SB 2015 Welcome Slides
Heterogeneity is Here to Stay and Semantics is Not About Agreement
AAG 2014 Talk on Ontology Views, Reusue, Alignment
A Non-Technical, Example-Driven Introduction to Linked Data
Please don't agree: Introducing Descartes-Core
Where is the sweet spot for ontologies?
GEOSPATIAL SEMANTICS -- PROBLEMS AND PROJECTS
Semantics and Linked Data for CyberGIS -- AAG 2013 Frontiers and Roadmaps Se...
Big Geo Data
Introductory slides into Big Data in Geographic Information Science
Ad

Recently uploaded (20)

PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PPTX
The KM-GBF monitoring framework – status & key messages.pptx
PPTX
INTRODUCTION TO EVS | Concept of sustainability
DOCX
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
PPTX
Introduction to Cardiovascular system_structure and functions-1
PPTX
ECG_Course_Presentation د.محمد صقران ppt
PPTX
2. Earth - The Living Planet earth and life
PDF
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PPTX
2. Earth - The Living Planet Module 2ELS
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PPTX
Introduction to Fisheries Biotechnology_Lesson 1.pptx
PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
7. General Toxicologyfor clinical phrmacy.pptx
PDF
lecture 2026 of Sjogren's syndrome l .pdf
PDF
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
PPTX
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
PDF
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
PPTX
famous lake in india and its disturibution and importance
POSITIONING IN OPERATION THEATRE ROOM.ppt
The KM-GBF monitoring framework – status & key messages.pptx
INTRODUCTION TO EVS | Concept of sustainability
Q1_LE_Mathematics 8_Lesson 5_Week 5.docx
Introduction to Cardiovascular system_structure and functions-1
ECG_Course_Presentation د.محمد صقران ppt
2. Earth - The Living Planet earth and life
Mastering Bioreactors and Media Sterilization: A Complete Guide to Sterile Fe...
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
2. Earth - The Living Planet Module 2ELS
Classification Systems_TAXONOMY_SCIENCE8.pptx
Introduction to Fisheries Biotechnology_Lesson 1.pptx
Comparative Structure of Integument in Vertebrates.pptx
7. General Toxicologyfor clinical phrmacy.pptx
lecture 2026 of Sjogren's syndrome l .pdf
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
Protein & Amino Acid Structures Levels of protein structure (primary, seconda...
Cosmic Outliers: Low-spin Halos Explain the Abundance, Compactness, and Redsh...
famous lake in india and its disturibution and importance

Exploring the Data Universe with Semantic Signatures: Plous Lecture 2015

  • 1. Analogies Observatories Semantic Signatures Challenges Next Steps Exploring the Data Universe with Semantic Signatures Plous Lecture 2015 Krzysztof Janowicz STKO Lab, University of California, Santa Barbara, USA Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 2. Analogies Observatories Semantic Signatures Challenges Next Steps Puddingand planets Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 3. Analogies Observatories Semantic Signatures Challenges Next Steps Analogies & Atoms Plum Pudding Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 4. Analogies Observatories Semantic Signatures Challenges Next Steps Analogies & Atoms Thomson’s Plum Pudding Model (1904) Positive charge distributed equally in the atom, electrons embedded as raisins Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 5. Analogies Observatories Semantic Signatures Challenges Next Steps Analogies & Atoms Rutherford(-Bohr) Solar System Model (1911/13) Small nucleus with a high mass and electrons that revolve around it Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 6. Analogies Observatories Semantic Signatures Challenges Next Steps Analogies & Atoms Analogies ‘And I cherish more than anything the Analogies, my most trustworthy masters. They know all the secrets of Nature, and they ought to be least neglected in Geometry.’ (Johannes Kepler) Analogies enable us to explore a new domain (target) by mapping its structure to another, more familiar domain (source). They allow us to ask new questions which only become meaningful in the new domain. Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 7. Analogies Observatories Semantic Signatures Challenges Next Steps Observatoriesand sensors Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 8. Analogies Observatories Semantic Signatures Challenges Next Steps Astronomical Observatories And Their Sensors The Griffith Observatory Griffith donated funds and land to build the observatory to make astronomy accessible to the public. This was in clear contrast to the prevailing idea of locating observatories on remote mountaintops and restrict them to scientists. Today, our society is willing to invest billions to study phenomena that may not even exist anymore (e.g., the Pillars of Creation). Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 9. Analogies Observatories Semantic Signatures Challenges Next Steps Astronomical Observatories And Their Sensors Observatories and Their Sensors Whether on land or in space, observatories and their sensors serve different purposes and are most useful when they work together. Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 10. Analogies Observatories Semantic Signatures Challenges Next Steps Astronomical Observatories And Their Sensors Spectral Signatures, Bands, and Remote Sensing Spectral signatures are the combination of emitted, reflected, or absorbed electromagnetic radiation at varying wavelengths (bands) that uniquely identify a feature type. Spectral libraries, the idea of sharing spectral signatures, has revolutionized remote sensing. Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 11. Analogies Observatories Semantic Signatures Challenges Next Steps A Universe of Data? The Data Universe: Synthesis Is The New Analysis What is the common core of the digital universe, physical-cyber-social systems, digital earth, 4th paradigm, big data, social machines, and so forth? Synthesis is the new analysis Observational science versus experimental science (Unintended)reuse of existing data, semantic interoperability Heterogeneity: multi-thematic, multi-perspective, multi-resolution Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 12. Analogies Observatories Semantic Signatures Challenges Next Steps A Universe of Data? Towards Data Observatories Web Science Trust: ‘A web observatory is a system that gives public access to some specific aspects of the WWW and provides the infrastructure and visualization techniques to support monitoring, analysis, and experiments.’ Web Science Trust wants to establish a network of observatories. New questions: are there laws of the data universe? Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 13. Analogies Observatories Semantic Signatures Challenges Next Steps Constructing the Analogy Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 14. Analogies Observatories Semantic Signatures Challenges Next Steps Semantic Signatures and Bands Semantic Signatures As Analogy To Spectral Signatures Geospatial bands based on geographic location ANND Ripley’s K Bins J Measure Dzero Temporal bands based on geo-social check-ins 24 Hours 7 Days Seasons Thematic bands based on venue tips and reviews LDA topics TF-IDF Makes use of data heterogeneity Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 15. Analogies Observatories Semantic Signatures Challenges Next Steps Thematic Bands Thematic Bands & Geo-Indicativeness Places at geographic location 34.43, -119.71 are: of types city, county seat,... at the coastline, near the mountains, have Mediterranean climate,... described in terms of urban area, economy, tourism, government, employment,... Interesting observation: some of these terms will co-occur by type, others per region. Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 16. Analogies Observatories Semantic Signatures Challenges Next Steps Thematic Bands Thematic Bands & Geo-Indicativeness A thematic band can be computed out of unstructured text from sources such as Wikipedia, travel blogs, news articles, and so forth. Non-georeferenced plain text is often still geo-indicative Different types of geographic features have different, diagnostic topics associated to them (out of 500 topics) Indicative topics and be lifted to the type-level. Here, we modeled topics using latent Dirichlet allocation (LDA) Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 17. Analogies Observatories Semantic Signatures Challenges Next Steps Thematic Bands Thematic Bands & Geographic Feature Types City topics: 204>450>104>282>267>497>443>484>277>97>... Town topics: 425>450>419>367>104>429>266>69>204>308>... Mountain topics: 27>110>5>172>208>459>232>398>453>183>... Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 18. Analogies Observatories Semantic Signatures Challenges Next Steps Temporal Bands Temporal Bands Study geo-social check-in data to location-based social networks. Aggregate them to the feature type level and clean them. Intuitively, people visit wineries in the after-noon and evening and bakeries in the mornings. Combining weekly and hourly bands to create place type signatures. Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 19. Analogies Observatories Semantic Signatures Challenges Next Steps Spatial Bands Spatial Bands POI plotted by similarity to bar and post office in OpenStreetMap data (London) Similarity measured as association strength in OSM change history Bars (and similar features) tend to clump together Post Offices (and similar features) are rather uniformly distributed Exploring the Data Universe with Semantic Signatures K. Janowicz
  • 20. Spatial Bands
      Dzero measures the likelihood of features of a certain type to co-occur within a specific semantic and spatial range.
      General idea: generate recommendations and clean up data based on type likelihood.
      ‘How likely is a post office directly next to an existing one?’
  • 21. Sensor Resolution & Social Sensing
      (Remote sensing) sensors can be characterized by their resolution:
      Spatial resolution: smallest feature that can be detected, i.e., the pixel size.
      Temporal resolution: smallest time interval between repeated observations.
      Spectral resolution: number, position, and width of spectral bands.
      Radiometric resolution: smallest distinguishable difference in radiation magnitude.
      Analogous social sensor resolutions, e.g., types of bands, number of topics.
  • 22. Sensor Resolution & Social Sensing: Platial Resolution of Temporal Signatures
      Circular temporal signature histograms for Theme Park (a, b, c) and Drugstore (d, e, f).
      About 50% of ≈ 400 Point of Interest (POI) types are regionally invariant in the USA.
  • 23. Sensor Resolution & Social Sensing: Temporal Resolution of Temporal Signatures
      [Figure: the ‘Foursquare-day’, hours 1 to 24]
      How and when do people check in at places: manually or automatically?
      Do they check out? If not, after what time are they checked out automatically?
  • 24. Sensor Resolution & Social Sensing: Distinguishable Feature Types for Thematic Signatures from 500 Topics
      Which classes in a feature type schema can be meaningfully distinguished?
  • 25. Spatial Search Challenges
  • 26. From Space to Place Through Time. Challenge 1: Mapping User Locations from Spaces to Places
  • 27. From Space to Place Through Time. Challenge 1: Mapping User Locations from Spaces to Places
      Estimate the place visited by a user from the user’s spatial location (e.g., as measured by their smartphone).
  • 28. From Space to Place Through Time. Baseline: Google Places API
      Marker  Category                 Distance (m)
      A       Bakery                   39.2
      B       Nightclub                41.4
      C       Nightclub                69.9
      D       American Restaurant      62.7
      E       Bakery                   73.7
      F       Fast Food                65.0
      G       Apparel Store            85.8
      H       Ice Cream Shop           82.6
      I       Movie Theater            94.2
      J       Pub                      88.9
      K       Cosmetics Shop           60.9
      L       Diner                    70.0
      M       Italian Restaurant       45.7
      N       Furniture / Home Store   114.9
      O       Grocery Store            147.8
      P       BBQ Joint                82.3
      Q       Burrito Place            88.1
      R       Italian Restaurant       93.6
      Geolocation APIs map geographic coordinates, e.g., from a user’s smartphone, to an ordered set of nearby candidate POI.
      These services typically return the n nearest POI within a certain radius and order them by spatial distance to the provided coordinates.
  • 29. From Space to Place Through Time. Our Approach: Distort POI Locations Using Temporal Signatures
      Marker  Category                 Distance (m)  Monday 10AM (10^-3)  Saturday 11PM (10^-3)
      A       Bakery                   39.2          6.28                 4.08
      B       Nightclub                41.4          0.26                 44.16
      C       Nightclub                69.9          0.26                 44.16
      D       American Restaurant      62.7          1.61                 9.50
      E       Bakery                   73.7          6.28                 4.08
      F       Fast Food                65.0          4.80                 5.78
      G       Apparel Store            85.8          2.51                 1.09
      H       Ice Cream Shop           82.6          0.84                 15.88
      I       Movie Theater            94.2          1.44                 11.00
      J       Pub                      88.9          0.53                 22.66
      K       Cosmetics Shop           60.9          3.87                 1.57
      L       Diner                    70.0          5.49                 7.56
      M       Italian Restaurant       45.7          1.42                 7.96
      N       Furniture / Home Store   114.9         4.79                 5.01
      O       Grocery Store            147.8         4.53                 1.38
      P       BBQ Joint                82.3          0.43                 9.35
      Q       Burrito Place            88.1          0.54                 3.16
      R       Italian Restaurant       93.6          1.42                 7.96
      The likelihood of visiting a coffee shop, university, bakery, etc. at 7 pm is rather low, while it is a peak hour for restaurants.
      In analogy to scale distortion in cartography, we can modify the purely spatial ranking by pulling and pushing places based on the check-in probability of their temporal type signatures.
      Different distortion models: linear, non-linear, symmetric, non-symmetric.
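The pull-and-push idea can be sketched as a re-ranking step. The sketch below uses a linear, symmetric distortion relative to the mean check-in probability of the candidates; this particular model, the function name `distort_distances`, and the `max_shift` parameter are assumptions for illustration, and the sketch is not expected to reproduce the distorted distances reported in the talk. The three candidates are markers A, B, and M from the table above.

```python
def distort_distances(candidates, max_shift=30.0):
    """Re-rank candidates given as (marker, distance_m, checkin_probability).

    Assumed model (hypothetical): candidates whose check-in probability is
    above the candidate mean are pulled closer, the others are pushed away,
    by at most max_shift meters."""
    probs = [p for _, _, p in candidates]
    mean_p = sum(probs) / len(probs)
    # Largest deviation from the mean; fall back to 1.0 if all are equal.
    spread = max(abs(p - mean_p) for p in probs) or 1.0
    distorted = []
    for marker, dist, p in candidates:
        shift = max_shift * (p - mean_p) / spread
        distorted.append((marker, max(dist - shift, 0.0)))
    return sorted(distorted, key=lambda item: item[1])


# (marker, distance in m, check-in probability in units of 10^-3)
monday_10am = [("A", 39.2, 6.28), ("B", 41.4, 0.26), ("M", 45.7, 1.42)]
saturday_11pm = [("A", 39.2, 4.08), ("B", 41.4, 44.16), ("M", 45.7, 7.96)]

monday_order = [m for m, _ in distort_distances(monday_10am)]
saturday_order = [m for m, _ in distort_distances(saturday_11pm)]
```

Under this assumed model the bakery (A) stays first on Monday morning, while the nightclub (B) moves to the top on Saturday night even though the spatial distances are unchanged.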
  • 30. From Space to Place Through Time. Our Approach: Distort POI Locations Using Temporal Signatures
      Marker  Actual Dist. (m)  Distorted Dist. (m)
      A       39.2              25.8
      B       41.4              71.4
      C       69.9              99.9
      D       62.7              79.8
      E       73.7              60.3
      F       65.0              59.5
      G       85.8              95.6
      H       82.6              106.7
      I       94.2              112.8
      J       88.9              116.1
      K       60.9              61.1
      L       70.0              60.6
      M       45.7              64.5
      N       114.9             109.5
      O       147.8             143.9
      P       82.3              110.5
      Q       88.1              115.2
      R       93.6              112.4

      Method               MRR    SRR    nDCG   1st Pos.
      Distance-Only        0.359  443.8  0.583  211
      Temporally Adjusted  0.453  793.5  0.711  423
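As a reading aid for the MRR and 1st Pos. columns, here is a minimal sketch of how mean reciprocal rank and first-position counts are commonly computed for ranked candidate lists. The function names and the toy rankings are hypothetical; the evaluation data behind the table is not part of this transcript.

```python
def mean_reciprocal_rank(rankings, truths):
    """Average of 1/rank of the true place in each ranked candidate list.
    A list that does not contain the true place contributes 0."""
    total = 0.0
    for ranked, truth in zip(rankings, truths):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(rankings)


def first_position_count(rankings, truths):
    """Number of lists in which the true place is ranked first."""
    return sum(1 for ranked, truth in zip(rankings, truths)
               if ranked and ranked[0] == truth)


# Hypothetical toy example: the true place is "A" in all three queries.
rankings = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
truths = ["A", "A", "A"]
mrr = mean_reciprocal_rank(rankings, truths)
hits = first_position_count(rankings, truths)
```

Higher values are better for both measures, which is the direction of the improvement from Distance-Only to Temporally Adjusted in the table.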
  • 31. Vague Cognitive Regions: Where is SoCal? Challenge 2: Vague Cognitive Regions
      Where are SoCal and NorCal?
  • 32. Vague Cognitive Regions: Where is SoCal? Baseline: Tests With Human Participants
      44 participants, tessellation of 90 hexagons (≈ 4920 km² each).
      Google Maps search for SoCal.
      [More on the extraction of polygons at spatial@ucsb.local2015]
  • 33. Vague Cognitive Regions: Where is SoCal? Data and Correlations
      Source         SoCal    NorCal   Total
      Flickr         22132    19706    41838
      Instagram      169648   116984   286632
      Twitter        10376    3294     13670
      Travel Blogs   107      78       185
      Wikipedia      1450     700      2150

      [Figure: Empirical Cumulative Distribution; x-axis: Flickr photo counts per user; y-axis: CDF]

      Source                    ρ (M1)   τ (M1)
      Flickr                    0.881    0.721
      Instagram                 0.867    0.711
      Twitter                   0.874    0.714
      TravelBlogs & Wikipedia   0.897    0.74
      Means                     0.870    0.712
  • 34. Vague Cognitive Regions: Where is SoCal? Vague Cognitive Regions and Inter-rater Agreement
      [Figure: map of hexagon cells labeled with per-cell values, scale bar in miles. Legend: Insufficient Data; Very Northern Californian; Moderately Northern Californian; Slightly Northern Californian; Equally Northern and Southern Californian; Slightly Southern Californian; Moderately Southern Californian; Very Southern Californian. Standard Deviations: < 0.01; 0.01 - 0.50; 0.51 - 1.00; 1.01 - 1.73; > 1.73]

      Source        Four Raters  Five Raters
      Kendall’s W   0.953        0.929
      p-value       < 0.001      < 0.001

      Key idea: Data sources become raters/participants.
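Treating data sources as raters means each source ranks the same set of cells, and Kendall’s W summarizes how well those rankings agree (0 = no agreement, 1 = identical rankings). The sketch below uses the standard formula without tie correction, W = 12 S / (m² (n³ − n)), where m is the number of raters, n the number of items, and S the sum of squared deviations of the per-item rank sums from their mean. The function name and the toy rankings are hypothetical, and whether the talk applied a tie correction is not stated.

```python
def kendalls_w(rank_matrix):
    """Kendall's coefficient of concordance, without tie correction.

    rank_matrix[r][i] is the rank (1..n, no ties) that rater r
    assigns to item i."""
    m = len(rank_matrix)         # number of raters (here: data sources)
    n = len(rank_matrix[0])      # number of items (here: cells)
    rank_sums = [sum(rater[i] for rater in rank_matrix) for i in range(n)]
    mean_sum = m * (n + 1) / 2   # expected rank sum per item
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical toy rankings of four cells.
full_agreement = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
opposite = [[1, 2, 3], [3, 2, 1]]
w_full = kendalls_w(full_agreement)
w_opposite = kendalls_w(opposite)
```

Values such as 0.953 and 0.929 in the table are therefore close to the upper bound of the measure.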
  • 35. Vague Cognitive Regions: Where is SoCal? Vague Cognitive Regions and Thematic Signatures
      Do you even have to mine for the terms SoCal and NorCal directly?
  • 36. Vague Cognitive Regions: Where is SoCal? Vague Cognitive Regions and Self-Similarity
      [Figure: KLD divergence; series: Northern California, Southern California, Both Northern & Southern California]
      Based on 60 topics, the similarity among SoCal cells (and among NorCal cells) is higher than the similarity between SoCal and NorCal cells.
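The comparison above rests on a divergence between the topic distributions of two cells. A minimal sketch of the Kullback-Leibler divergence follows, assuming each cell is described by a probability vector over topics and that the second vector is non-zero wherever the first is. The function name and the two-topic toy vectors are hypothetical; the transcript does not say whether a symmetrized or smoothed variant was used.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats.

    p and q are probability vectors of equal length; q[i] must be
    positive wherever p[i] is positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical topic distributions of two cells (two topics only).
cell_a = [0.5, 0.5]
cell_b = [0.9, 0.1]
same = kl_divergence(cell_a, cell_a)
different = kl_divergence(cell_a, cell_b)
```

A lower divergence means more similar thematic signatures, so pairs of cells from the same region would be expected to cluster at the low end of the axis.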
  • 37. Is the Data Universe Homogeneous and Isotropic? Limits of the Data Universe Analogy
      At large scale, the physical universe is homogeneous and isotropic.
  • 38. Is the Data Universe Homogeneous and Isotropic? Limits of the Data Universe Analogy
      In terms of geospatial distribution, the Social Media Web is neither homogeneous nor isotropic.
      If you direct your social sensing instrument at certain regions, there will be no signal.
  • 39. Next Steps
  • 40. POI Pulse Observatory: Explore the Pulse of Los Angeles Using Signatures
      http://poipulse.com/
  • 41. POI Pulse Observatory: A Public Data Observatory at UCSB?
      A tangible & public observatory at UCSB; remember Griffith’s will.
      Show & stream data from different sources and show analysis results.
      Visualize privacy implications of data.
      Show citizens how their everyday data is used for scientific discoveries.
  • 42. The Right Place