Making sense of Users’ Web activities
Mathieu d'Aquin
Knowledge Media Institute, The Open University, UK
A bit of sci-fi to start with
“… from people who are afraid that someone else knows information that they don’t and is gaining an unfair advantage by it. For all the claims one hears about the liberating impact of the data-net, the truth is that it wished on most of us a brand-new reason for paranoia”
John Brunner, The Shockwave Rider, 1975
What we don’t know that they know
- Simple important things: What are all the websites that know my e-mail address?
- And more complex important things… What does amazon.co.uk or the website of my favorite airline know about me?
Is this Personal Information Management?
- Yes, but…
- Looking at an individual user’s information exchange and, more generally, activities on the Web
- This is: big, heterogeneous, distributed, fragmented, sometimes implicit, and hard to collect!
So, what do we do?
- Unrestricted monitoring of information exchange on the Web by an individual user
Local Logging Proxy
[Diagram: local Web agents (e.g., browser) exchange HTTP requests and HTTP responses with external Web sites through the proxy, which writes Web Exchange RDF logs]
<Request rdf:about="#request-1257949232709-1257949233757">
   <startedAt>1257949232709</startedAt>
   <endedAt>1257949233757</endedAt>
   <origin rdf:resource="127.0.0.1" />
   <onPort>80</onPort>
   <toHost rdf:resource="api.facebook.com" />
   <method rdf:resource="POST" />
   <toURL rdf:resource="http://api.facebook.com/restserver.php" />
   <HTTPVersion rdf:resource="HTTP-1.1" />
   <Host rdf:resource="api.facebook.com" />
   <Content-Type rdf:resource="application--x-www-form-urlencoded" />
   <User-Agent rdf:resource="Mozilla--5.0_(Macintosh;_U;_Intel_Mac_OS_X;_en)_AppleWebKit--526.9+_(KHTML._like_Gecko)_AdobeAIR--1.5.2" />
   <Referer rdf:resource="app:--TweetDeck.swf" />
   <X-Flash-Version rdf:resource="10.0.32.18" />
   <Accept rdf:resource="*--*" />
   <Accept-Language rdf:resource="en-us" />
   <Accept-Encoding rdf:resource="gzip._deflate" />
   <Cookie rdf:resource="__qca=1239783354-42963995-12118014;___utma=87286159.357565716.1239892196.1252686326.1257582307.16;___utmz=87286159.1257582307.16.16.utmccn=(referral)|utmcsr=facebook.com|utmcct=--tos.php|utmcmd=referral;_c_user=605559235;_cur_max_lag=2;_datr=1239398136-0711bf1215821a9c58848bf0ffd0020ec8450cfa7154b9e228c29;_lsd=P3Zpn;_lxe=metm.daquin%40virgin.net;_lxs=3;_s_vsn_facebookpoc_1=9874874320812" />
   <Content-Length rdf:resource="984" />
   <Connection rdf:resource="keep-alive" />
   <Proxy-Connection rdf:resource="keep-alive" />
   <data rdf:resource="data_c22b691f691dabd5ae893b9cb2f8add7" />
   <response>
      <Response rdf:about="#response-1257949232709--1257949233757">
         <HTTPVersion rdf:resource="HTTP--1.0" />
         <responseCode rdf:resource="200_OK" />
         <Cache-Control rdf:resource="private._no-store._no-cache._must-revalidate._post-check=0._pre-check=0" />
         <Content-Type rdf:resource="application--json" />
         <Expires rdf:resource="Mon._26_Jul_1997_05:00:00_GMT" />
         <Pragma rdf:resource="no-cache" />
         <Content-Encoding rdf:resource="gzip" />
         <Content-Length rdf:resource="5943" />
         <X-Cache rdf:resource="MISS_from_roeburn.open.ac.uk" />
         <Proxy-Connection rdf:resource="keep-alive" />
         <data rdf:resource="data_5ccf6054fd0fba3ee7eb444e178eaf19" />
      </Response>
   </response>
</Request>
2.5 months = 3 million HTTP requests, 100 million RDF triples
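A log entry of this shape can be read back with a few lines of code. The following is a minimal sketch rather than the tooling used in the talk: it assumes the entries are wrapped in an `rdf:RDF` root that declares the `rdf` namespace, and the trimmed `SAMPLE` record and the function name `summarize_requests` are illustrative only.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the log entry; the rdf:RDF wrapper is an assumption
# added here so that the rdf: prefix is declared and the text parses.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}">
  <Request rdf:about="#request-1257949232709-1257949233757">
    <startedAt>1257949232709</startedAt>
    <endedAt>1257949233757</endedAt>
    <toHost rdf:resource="api.facebook.com" />
    <method rdf:resource="POST" />
  </Request>
</rdf:RDF>"""


def summarize_requests(xml_text):
    """Return host, method and duration (in ms) for each logged request."""
    root = ET.fromstring(xml_text)
    resource = f"{{{RDF_NS}}}resource"  # expanded name of rdf:resource
    rows = []
    for request in root.iter("Request"):
        started = int(request.findtext("startedAt"))
        ended = int(request.findtext("endedAt"))
        rows.append({
            "host": request.find("toHost").get(resource),
            "method": request.find("method").get(resource),
            "duration_ms": ended - started,
        })
    return rows


print(summarize_requests(SAMPLE))
```

The timestamps are treated as milliseconds since the epoch, which is consistent with the values in the entry but is not stated in the slides.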
What this talk is about
Using ontologies and external datasets to:
- Generate abstractions of this low-level data
- Enrich it with external knowledge and models
- Interpret it to give back useful information to the user
[Diagram: Online Activities Ontology, with components labelled HTTP Ontology, Parameters and Website info., Personal Information, Web Site Information, Trust Model, Location Information]
HTTP Ontology
- Built bottom-up from the data
- Can help infer simple things from it
- And answer questions through SPARQL queries
[Diagram: classes InternetPoint (time: DateTime), Request (time: DateTime, toURL: URL, referer: URL), WebHost (domain: String), WebAgent (ID: String), Response (time: DateTime, responseCode: int), DataFile (ID: String), DataFormat (MineID: String); properties origine, toHost, User-Agent, hasResponse, Content, Content-Type]
Simple examples
- Requests per time of day
- Requests per User Agent
- Requests per Host
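These three counts are simple group-by aggregations over the logged requests. The sketch below shows the idea on a handful of hand-made records; the record layout, the field names and the helper names `hour_of_day` and `count_by` are assumptions for illustration, not the representation used in the talk.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical records: start time in ms since the epoch, host, user agent.
REQUESTS = [
    {"started_ms": 1257949232709, "host": "api.facebook.com", "agent": "TweetDeck"},
    {"started_ms": 1257952832709, "host": "www.google.com", "agent": "Firefox"},
    {"started_ms": 1257952833709, "host": "www.google.com", "agent": "Firefox"},
]


def hour_of_day(started_ms):
    """Hour (0-23, UTC) at which a request started."""
    return datetime.fromtimestamp(started_ms // 1000, tz=timezone.utc).hour


def count_by(records, key):
    """Count records grouped by the value that `key` extracts from each one."""
    return Counter(key(record) for record in records)


per_hour = count_by(REQUESTS, lambda r: hour_of_day(r["started_ms"]))
per_agent = count_by(REQUESTS, lambda r: r["agent"])
per_host = count_by(REQUESTS, lambda r: r["host"])
print(per_hour, per_agent, per_host)
```

On the full logs the same grouping would more naturally be expressed as an aggregate query over the RDF store.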
Integrating basic info
- Domain name → IP → Location
- “What!? What requests have I made to websites in Nigeria? What data did I send?”
- Can be answered with a SPARQL query
More information about websites
- The linked data cloud is full of it.
- Using the domain name to address this information:
CONSTRUCT { <domain_name> ?p ?y }
WHERE {
  { { ?x dbpedia:homepage <http://domain_name> } . { ?x ?p ?y } }
  UNION
  { { ?x owl:sameAs ?z } . { ?x dbpedia:homepage <http://domain_name> } . { ?x ?p ?y } }
}
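The `domain_name` placeholder in the query has to be filled in for each host found in the logs. A minimal sketch of that substitution follows. The function name `build_query` is hypothetical, the namespace bound to the `dbpedia:` prefix is an assumption (the slides do not give it), and the graph patterns are written without the nested groups, which does not change their meaning.

```python
import re

# The dbpedia: namespace below is an assumption; the slide only shows the prefix.
QUERY_TEMPLATE = """PREFIX dbpedia: <http://dbpedia.org/property/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT {{ <{domain}> ?p ?y }}
WHERE {{
  {{ ?x dbpedia:homepage <http://{domain}> . ?x ?p ?y }}
  UNION
  {{ ?x owl:sameAs ?z . ?x dbpedia:homepage <http://{domain}> . ?x ?p ?y }}
}}"""


def build_query(domain):
    """Fill the query template for one domain name taken from the logs."""
    # Accept only characters that can occur in a host name, so that a log
    # value cannot break out of the <...> IRI it is inserted into.
    if not re.fullmatch(r"[A-Za-z0-9.-]+", domain):
        raise ValueError(f"not a plain domain name: {domain!r}")
    return QUERY_TEMPLATE.format(domain=domain)


print(build_query("www.youtube.com"))
```

The resulting string can then be sent to whichever SPARQL endpoint holds the website descriptions.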
Examples
[Diagram: www.google-analytics.com, www.youtube.com and www.google.com described with data from DBpedia and Freebase. Node labels: Google Services, Entertainment Websites, Web Analytics, Internet Search Engine, Video sharing, Video Hosting, Company, Web Search Engine, Search Engine, google. Edge labels: subject/category, developer, type, owner, subsediaryOf, parent.]
Activities
- Can we now understand the user activities?
- Based on website categories and on their parameters:
GET http://uk.search.yahoo.com/beacon/module?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F
POST format=JSON&method=fql%2Emultiquery&api%5Fkey=51d350e8d92da1f5623512a9e801da2b&v=1%2E0&queries=%7B%22query2%22%3A%22SELECT%20app%5Fid%2C%20display%5Fname%20FROM%20application%20WHERE%20app%5Fid%20IN%20%28SELECT%20app%5Fid%20FROM%20%23query1%29%22%2C%22query1%22%3A%22SELECT%20post%5Fid%2C%20source%5Fid%2C%20created%5Ftime%2C%20updated%5Ftime%2C%20actor%5Fid%2C%20target%5Fid%2C%20app%5Fid%2C%20message%2C%20attachment%2C%20comments%2C%20likes%2C%20permalink%2C%20attribution%2C%20type%20FROM%20stream%20WHERE%20filter%5Fkey%20IN%20%28SELECT%20filter%5Fkey%20FROM%20stream%5Ffilter%20WHERE%20uid%20%3D%20605559235%20AND%20type%20%3D%20%27newsfeed%27%29%20AND%20%28created%5Ftime%20%3E%3D%201257443596%29%20AND%20%28%28created%5Ftime%20%3E%201257945423%29%20OR%20%28updated%5Ftime%20%21%3D%20created%5Ftime%29%29%20ORDER%20BY%20created%5Ftime%20DESC%20LIMIT%20200%22%7D&call%5Fid=12565739074246102&sig=01a13a72825ed83ed6d23bdf2791ad1a&session%5Fkey=be312ffdf9b9e1a5ec6c5768%2D605559235
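The parameters of such requests become readable once they are percent-decoded. The sketch below does this with Python's standard `urllib.parse` module for the Yahoo search request and for the first two fields of the POST body; the helper names `request_parameters` and `form_parameters` are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

GET_URL = ("http://uk.search.yahoo.com/beacon/module"
           "?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F")
POST_BODY = "format=JSON&method=fql%2Emultiquery"  # first two fields only


def form_parameters(encoded):
    """Decode a URL-encoded string, keeping the first value of each name."""
    return {name: values[0] for name, values in parse_qs(encoded).items()}


def request_parameters(url):
    """Return the host and the decoded query parameters of a GET request."""
    parts = urlsplit(url)
    return parts.hostname, form_parameters(parts.query)


host, params = request_parameters(GET_URL)
print(host, params)
print(form_parameters(POST_BODY))
```

Here the decoded `p` parameter is the search keyword and `url` is the result the user followed, which is the kind of information the activity categories rely on.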
Activities in an Ontology
- Derived in a bottom-up way from categories of activities/requests
- Can be used to characterize overall activities, individual activities, or correlations between activities
[Diagram: class hierarchy with ActivityBasedRequest, ImplicitActivity, ExplicitActivity, ReportToAnalytics, Search, CheckStatusFeed, SearchVideo, SearchImage, AutoCheckStatusFeed, FollowLink, ManualCheckStatusFeed, FollowSearchResult]
Example Activity: Search
- Search keywords
Example Activity: Search
- inverseOf(link-followed, referer)
- InformationalSearch = SearchRequest and min 2 link-followed
- NavigationalSearch = SearchRequest and =1 link-followed
- Prominence of Navigational Searches
- IndexedSite = exists referer NavigationalSearch
- IndexedSite(?x), NavigationalSearch(?y), referer(?x, ?y), searchTerm(?y, ?z) → IndexedWithKeyword(?x, ?z)
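The definitions on this slide can be read as a small classification procedure. The following sketch applies them to plain Python data instead of an ontology reasoner; the input layout (a search term plus the list of sites reached from that search) and the function names are assumptions made for illustration.

```python
def classify_search(links_followed):
    """Classify a search request by how many result links were followed."""
    if links_followed >= 2:
        return "InformationalSearch"  # min 2 link-followed
    if links_followed == 1:
        return "NavigationalSearch"   # exactly 1 link-followed
    return "SearchRequest"            # no result followed


def indexed_with_keyword(searches):
    """Pairs (site, keyword) for sites reached from a navigational search."""
    pairs = set()
    for search in searches:
        followed = search["followed"]
        if classify_search(len(followed)) == "NavigationalSearch":
            pairs.add((followed[0], search["term"]))
    return pairs


# Hypothetical search sessions; only the first is navigational.
SEARCHES = [
    {"term": "idiocracy", "followed": ["www.imdb.com"]},
    {"term": "semantic web", "followed": ["a.example", "b.example"]},
    {"term": "weather", "followed": []},
]
print(indexed_with_keyword(SEARCHES))
```

A site that is reached as the single result of a search is thereby treated as "indexed" under that keyword, which mirrors the last rule of the slide.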
Example Activity: Search
- Search keywords → OpenCalais → Topics of interest
Personal data exchange
[Diagram: Request Parameters, Personal Information (Profile), Trust Model]
A tool is used to create mappings between the data sent to websites (from the logs, on the right) and the user profile (on the left), effectively reconstructing the profile from the data.
User profile reconstructed from Web activities
- 36 attributes, 1,080 values, sent to 123 domains
- A model of which piece of personal information was sent where (can answer the questions)
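Such a model can be pictured as a table from profile attributes to the domains that received them. The sketch below builds one by exact matching of request parameter values against known profile values. This matching strategy is an assumption made for illustration (the talk describes a mapping tool, not this procedure), and all names and sample data are hypothetical.

```python
def sent_where(profile, requests):
    """Map each profile attribute to the set of domains that received it.

    profile:  attribute name -> set of known values for that attribute
    requests: iterable of (domain, parameters) pairs, parameters being a dict
    """
    model = {}
    for domain, parameters in requests:
        for value in parameters.values():
            for attribute, known_values in profile.items():
                if value in known_values:
                    model.setdefault(attribute, set()).add(domain)
    return model


# Hypothetical profile and decoded request parameters.
PROFILE = {"email": {"jane@example.org"}, "first_name": {"Jane"}}
REQUESTS = [
    ("shop.example.com", {"mail": "jane@example.org", "q": "books"}),
    ("news.example.net", {"user": "jane@example.org", "name": "Jane"}),
]

model = sent_where(PROFILE, REQUESTS)
print(sorted(model["email"]))
```

Looking up `model["email"]` then answers the opening question of which websites know the e-mail address, at least for the traffic that was logged.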
What that tells us about trust
- Taking the point of view of an external observer, we can derive an observed model of trust and of the criticality of data
- If this piece of data is critical to you and you give it to Bob, you must trust Bob
- If you give this piece of data to many untrusted people, you probably don’t consider it critical
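Formally
- Trust in a domain = max of the criticality of the data it received: $\mathrm{trust}(w) = \max_{d \in \mathrm{received}(w)} \mathrm{criticality}(d)$
- Criticality of a piece of data: $\mathrm{criticality}(d) = \dfrac{1}{1 + \sum_{w \in \mathrm{receivers}(d)} \bigl(1 - \mathrm{trust}(w)\bigr)}$
- Obviously, these two formulas are interdependent. They are treated as a sequence, with initial values at 0.5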
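The two interdependent formulas can be evaluated by simple iteration. The sketch below is one possible reading, under stated assumptions: the criticality formula is taken as 1 / (1 + Σ(1 − trust)), trust is updated before criticality within each round, and the number of rounds is fixed. None of these details is given in the slides, and the data items and domains are hypothetical.

```python
def observed_trust_model(sent, iterations=50, initial=0.5):
    """Iterate the trust and criticality formulas from a common start value.

    sent: data item -> set of domains that received it
    Returns (trust per domain, criticality per data item).
    """
    domains = {w for receivers in sent.values() for w in receivers}
    trust = {w: initial for w in domains}
    criticality = {d: initial for d in sent}
    for _ in range(iterations):
        # Trust in a domain: max criticality of the data it received.
        trust = {
            w: max(criticality[d] for d, receivers in sent.items() if w in receivers)
            for w in domains
        }
        # Criticality of a data item: lower when it goes to less trusted domains.
        criticality = {
            d: 1.0 / (1.0 + sum(1.0 - trust[w] for w in receivers))
            for d, receivers in sent.items()
        }
    return trust, criticality


# Hypothetical log summary: the e-mail address was sent widely,
# the card number to a single domain.
SENT = {
    "email": {"a.example", "b.example", "c.example"},
    "card_number": {"a.example"},
}
trust, criticality = observed_trust_model(SENT)
print(trust, criticality)
```

In this example the card number ends up more critical than the e-mail address, and the only domain that received it ends up more trusted than the others, which matches the informal reading of the model.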
Interacting with the model
- Expose the user to his own behavior as observed, so that he can try to align it with his intended behavior
Demo
Conclusion
- A first set of tools exploiting logs of personal Web activity
- Demonstrates the need for ways to abstract and interpret activity data, to support Web users
- Demonstrates the ability of semantic technologies, ontologies and enrichment through external data to provide such abilities
So much more to do
- Can I collect this tweet? From HTTPS? From my mobile phone?
- Can I link it to where I am? To what I’m doing? To what I have been doing? To the abstract of the presentation? To the slides on SlideShare.net? To blogs mentioning it?
- Can I cope with the scale of all this information? Can I decide what to share? Can I store all this securely? Can I get usable access to it? Can I learn something from it?
Thank you
m.daquin@open.ac.uk
@mdaquin

<Request rdf:about="#request-1257949232709-1257949233757"> <startedAt>1257949232709</startedAt> <endedAt>1257949233757</endedAt> <origin rdf:resource="127.0.0.1" /> <onPort>80</onPort> <toHostrdf:resource="api.facebook.com" /> <method rdf:resource="POST"/> <toURLrdf:resource="http://api.facebook.com/restserver.php" /> <HTTPVersionrdf:resource="HTTP-1.1" /> <Host rdf:resource="api.facebook.com" /> <Content-Type rdf:resource="application--x-www-form-urlencoded" /> <User-Agent rdf:resource="Mozilla--5.0_(Macintosh;_U;_Intel_Mac_OS_X;_en)_AppleWebKit--526.9+_(KHTML._like_Gecko)_AdobeAIR--1.5.2" /> <Refererrdf:resource="app:--TweetDeck.swf" /> <X-Flash-Version rdf:resource="10.0.32.18" /> <Accept rdf:resource="*--*" /> <Accept-Language rdf:resource="en-us" /> <Accept-Encoding rdf:resource="gzip._deflate" /> <Cookie rdf:resource= "__qca=1239783354-42963995-12118014;___utma=87286159.357565716.1239892196.1252686326.1257582307.16;___utmz=87286159.1257582307.16.16.utmccn= (referral)|utmcsr=facebook.com|utmcct=--tos.php|utmcmd=referral;_c_user=605559235;_cur_max_lag=2;_datr=1239398136-0711bf1215821a9c58848bf0ffd0020ec8450cfa7154b9e228c29;_lsd=P3Zpn;_lxe=metm.daquin%40virgin.net;_lxs=3;_s_vsn_facebookpoc_1=9874874320812" /> <Content-Length rdf:resource="984" /> <Connection rdf:resource="keep-alive" /> <Proxy-Connection rdf:resource="keep-alive" /> <data rdf:resource="data_c22b691f691dabd5ae893b9cb2f8add7" /> <response> <Response rdf:about="#response-1257949232709--1257949233757"> <HTTPVersionrdf:resource="HTTP--1.0" /> 
<responseCoderdf:resource="200_OK" /> <Cache-Control rdf:resource="private._no-store._no-cache._must-revalidate._post-check=0._pre-check=0" /> <Content-Type rdf:resource="application--json" /> <Expires rdf:resource="Mon._26_Jul_1997_05:00:00_GMT" /> <Pragmardf:resource="no-cache" /> <Content-Encoding rdf:resource="gzip" /> <Content-Length rdf:resource="5943" /> <X-Cache rdf:resource="MISS_from_roeburn.open.ac.uk" /> <Proxy-Connection rdf:resource="keep-alive" /> <data rdf:resource="data_5ccf6054fd0fba3ee7eb444e178eaf19" /> </Response></response></Request><Request rdf:about="#request-1257949232709-1257949233757"> <startedAt>1257949232709</startedAt> <endedAt>1257949233757</endedAt> <origin rdf:resource="127.0.0.1" /> <onPort>80</onPort> <toHostrdf:resource="api.facebook.com" /> <method rdf:resource="POST"/> <toURLrdf:resource="http://api.facebook.com/restserver.php" /> <HTTPVersionrdf:resource="HTTP-1.1" /> <Host rdf:resource="api.facebook.com" /> <Content-Type rdf:resource="application--x-www-form-urlencoded" /> <User-Agent rdf:resource="Mozilla--5.0_(Macintosh;_U;_Intel_Mac_OS_X;_en)_AppleWebKit--526.9+_(KHTML._like_Gecko)_AdobeAIR--1.5.2" /> <Refererrdf:resource="app:--TweetDeck.swf" /> <X-Flash-Version rdf:resource="10.0.32.18" /> <Accept rdf:resource="*--*" /> <Accept-Language rdf:resource="en-us" /> <Accept-Encoding rdf:resource="gzip._deflate" /> <Cookie rdf:resource= "__qca=1239783354-42963995-12118014;___utma=87286159.357565716.1239892196.1252686326.1257582307.16;___utmz=87286159.1257582307.16.16.utmccn= (referral)|utmcsr=facebook.com|utmcct=--tos.php|utmcmd=referral;_c_user=605559235;_cur_max_lag=2;_datr=1239398136-0711bf1215821a9c58848bf0ffd0020ec8450cfa7154b9e228c29;_lsd=P3Zpn;_lxe=metm.daquin%40virgin.net;_lxs=3;_s_vsn_facebookpoc_1=9874874320812" /> <Content-Length rdf:resource="984" /> <Connection rdf:resource="keep-alive" /> <Proxy-Connection rdf:resource="keep-alive" /> <data rdf:resource="data_c22b691f691dabd5ae893b9cb2f8add7" /> 
<response> <Response rdf:about="#response-1257949232709--1257949233757"> <HTTPVersionrdf:resource="HTTP--1.0" /> <responseCoderdf:resource="200_OK" /> <Cache-Control rdf:resource="private._no-store._no-cache._must-revalidate._post-check=0._pre-check=0" /> <Content-Type rdf:resource="application--json" /> <Expires rdf:resource="Mon._26_Jul_1997_05:00:00_GMT" /> <Pragmardf:resource="no-cache" /> <Content-Encoding rdf:resource="gzip" /> <Content-Length rdf:resource="5943" /> <X-Cache rdf:resource="MISS_from_roeburn.open.ac.uk" /> <Proxy-Connection rdf:resource="keep-alive" /> <data rdf:resource="data_5ccf6054fd0fba3ee7eb444e178eaf19" /> </Response></response></Request>
  • 8. What this talk is about: Using ontologies and external datasets to: generate abstractions of this low-level data; enrich it with external knowledge and models; interpret it to give back useful information to the user.
  • 9. Online Activities Ontology; HTTP Ontology; Parameters and Website info.; Personal Information; Web Site Information; Trust Model; Location Information
  • 10. HTTP Ontology: Built bottom-up from the data. Can help infer simple things from it and answer questions through SPARQL queries. Diagram labels: InternetPoint; time: DateTime; origine; Request; time: DateTime; toURL: URL; referer: URL; toHost; WebHost; domain: String; User-Agent; WebAgent; ID: String; hasResponse; Content; Content-Type; Response; time: DateTime; responseCode: int; DataFile; ID: String; Content; Content-Type; DataFormat; MineID: String
  • 11. Simple examples: Requests per time of day; Requests per User Agent; Requests per Host
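As a rough illustration of the counts listed on slide 11, the following Python sketch tallies requests per host, per user agent and per hour of day. The records are hypothetical and simplified: the field names only mirror the log properties (startedAt in epoch milliseconds, toHost, User-Agent), and the talk itself answers such questions with SPARQL over the RDF logs rather than with code like this.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical, simplified log records (not taken from the real logs).
# startedAt is in epoch milliseconds, as in the RDF log entries.
requests = [
    {"startedAt": 1257949232709, "toHost": "api.facebook.com", "userAgent": "AdobeAIR"},
    {"startedAt": 1257949300000, "toHost": "www.google.com", "userAgent": "Firefox"},
    {"startedAt": 1257952900000, "toHost": "api.facebook.com", "userAgent": "AdobeAIR"},
]

def hour_of_day(epoch_ms):
    """Return the UTC hour (0-23) of a millisecond timestamp."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).hour

# One counter per grouping named on the slide.
per_host = Counter(r["toHost"] for r in requests)
per_agent = Counter(r["userAgent"] for r in requests)
per_hour = Counter(hour_of_day(r["startedAt"]) for r in requests)

for host, count in per_host.most_common():
    print(host, count)
```

Using UTC for the hour is an arbitrary choice here; a local time zone would probably be more meaningful for "time of day".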
  • 12. Integrating basic info: Domain name, IP, Location. “What!? What requests have I made to websites in Nigeria? What data did I send?” Can be answered in a SPARQL query.
  • 13. More information about websites: The linked data cloud is full of it. Using the domain name to address this information. CONSTRUCT { <domain_name> ?p ?y } WHERE { { { ?x dbpedia:homepage <http://domain_name> } . { ?x ?p ?y } } UNION { { ?x owl:sameAs ?z } . { ?x dbpedia:homepage <http://domain_name> } . { ?x ?p ?y } } }
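The query on slide 13 is a template in which domain_name stands for the website being looked up. A minimal Python sketch of filling it in is given below. The function name build_query, the validation rule and both PREFIX declarations are assumptions added for illustration (owl: is the standard OWL namespace; the URI behind dbpedia: is a guess, since the slide does not give it).

```python
import re

# Template following the query on the slide. The PREFIX lines are assumptions:
# the slide does not say which namespace "dbpedia:" stands for.
QUERY_TEMPLATE = """\
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbpedia: <http://dbpedia.org/property/>
CONSTRUCT { <domain_name> ?p ?y }
WHERE {
  { { ?x dbpedia:homepage <http://domain_name> } . { ?x ?p ?y } }
  UNION
  { { ?x owl:sameAs ?z } . { ?x dbpedia:homepage <http://domain_name> } . { ?x ?p ?y } }
}
"""

# Only plain host names are accepted, so that a logged value cannot
# inject extra SPARQL syntax into the query text.
DOMAIN_RE = re.compile(r"[A-Za-z0-9.-]+")

def build_query(domain):
    """Return the CONSTRUCT query for one domain name (hypothetical helper)."""
    if not DOMAIN_RE.fullmatch(domain):
        raise ValueError("not a plain domain name: %r" % domain)
    return QUERY_TEMPLATE.replace("domain_name", domain)

print(build_query("www.youtube.com"))
```

The resulting string would then be sent to a SPARQL endpoint holding the relevant linked data; which endpoint the talk used is not stated.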
  • 14. Examples: Google Services; Entertainment Websites; Web Analytics; Internet Search Engine; subject/category; Video sharing; Video Hosting; www.google-analytics.com; Company; developer; Web Search Engine; Search Engine; type; subject/category; google; owner; subsediaryOf; www.youtube.com; www.google.com; parent; DBpedia; freebase
  • 15. Activities: Can we now understand the user activities? Based on website categories and on their parameters: GET http://uk.search.yahoo.com/beacon/module?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F POST format=JSON&method=fql%2Emultiquery&api%5Fkey=51d350e8d92da1f5623512a9e801da2b&v=1%2E0&queries=%7B%22query2%22%3A%22SELECT%20app%5Fid%2C%20display%5Fname%20FROM%20application%20WHERE%20app%5Fid%20IN%20%28SELECT%20app%5Fid%20FROM%20%23query1%29%22%2C%22query1%22%3A%22SELECT%20post%5Fid%2C%20source%5Fid%2C%20created%5Ftime%2C%20updated%5Ftime%2C%20actor%5Fid%2C%20target%5Fid%2C%20app%5Fid%2C%20message%2C%20attachment%2C%20comments%2C%20likes%2C%20permalink%2C%20attribution%2C%20type%20FROM%20stream%20WHERE%20filter%5Fkey%20IN%20%28SELECT%20filter%5Fkey%20FROM%20stream%5Ffilter%20WHERE%20uid%20%3D%20605559235%20AND%20type%20%3D%20%27newsfeed%27%29%20AND%20%28created%5Ftime%20%3E%3D%201257443596%29%20AND%20%28%28created%5Ftime%20%3E%201257945423%29%20OR%20%28updated%5Ftime%20%21%3D%20created%5Ftime%29%29%20ORDER%20BY%20created%5Ftime%20DESC%20LIMIT%20200%22%7D&call%5Fid=12565739074246102&sig=01a13a72825ed83ed6d23bdf2791ad1a&session%5Fkey=be312ffdf9b9e1a5ec6c5768%2D605559235
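To make the idea on slide 15 concrete, here is a small Python sketch that decodes the parameters of the GET request and reads a search term out of them. The mapping from host to parameter name (SEARCH_TERM_PARAM) and both helper functions are hypothetical; the talk derives this kind of knowledge from website categories and observed parameters, not from a hand-written table.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical table: which query parameter carries the search term per host.
SEARCH_TERM_PARAM = {"uk.search.yahoo.com": "p"}

def request_parameters(url):
    """Return (host, {parameter: first decoded value}) for a request URL."""
    parts = urlsplit(url)
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return parts.hostname, params

def search_term(url):
    """Return the search term if the host is a known search site, else None."""
    host, params = request_parameters(url)
    name = SEARCH_TERM_PARAM.get(host)
    return params.get(name) if name else None

url = ("http://uk.search.yahoo.com/beacon/module"
       "?p=idiocracy&url=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0387808%2F")
host, params = request_parameters(url)
print(host, params["p"], params["url"])
```

The same decoding applies to the form-encoded POST body, which parse_qs can also split into its format, method and queries fields.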
  • 16. Activities in an Ontology: Derived in a bottom-up way from categories of activities/requests. Can be used to characterize overall activities, individual activities or correlations between activities. Classes: ActivityBasedRequest; ImplicitActivity; ExplicitActivity; ReportToAnalytics; Search; CheckStatusFeed; SearchVideo; SearchImage; AutoCheckStatusFeed; FollowLink; ManualCheckStatusFeed; FollowSearchResult
  • 18. Example Activity: Search. inverseOf(linked-followed, referer); InformationalSearch = SearchRequest and min 2 link-followed; NavigationalSearch = SearchRequest and =1 link-followed. Prominence of Navigational Searches. IndexedSite = exists referer NavigationalSearch. IndexedSite(?x), NavigationalSearch(?y), referer(?x, ?y), searchTerm(?y, ?z) → IndexedWithKeyword(?x, ?z)
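The definitions on slide 18 can be read procedurally, as in the Python sketch below. It assumes the link-followed relation is available as (search, followed URL) pairs; the function names and the sample data are hypothetical. In the talk these are ontology class definitions and a rule evaluated by a reasoner, so this is only an approximation of their effect.

```python
from collections import Counter

def classify_searches(search_ids, link_followed):
    """Label each search by the number of links followed from it."""
    followed = Counter(sid for sid, _ in link_followed)
    labels = {}
    for sid in search_ids:
        if followed[sid] >= 2:
            labels[sid] = "InformationalSearch"   # min 2 link-followed
        elif followed[sid] == 1:
            labels[sid] = "NavigationalSearch"    # exactly 1 link-followed
        else:
            labels[sid] = "SearchRequest"         # no link followed
    return labels

def indexed_with_keyword(labels, search_terms, link_followed):
    """Pairs (site, keyword) where the site was reached from a navigational search."""
    return {
        (url, search_terms[sid])
        for sid, url in link_followed
        if labels.get(sid) == "NavigationalSearch" and sid in search_terms
    }

# Hypothetical data: two searches and the links followed from them.
search_terms = {"s1": "idiocracy", "s2": "semantic web"}
link_followed = [
    ("s1", "http://www.imdb.com/title/tt0387808/"),
    ("s2", "http://example.org/a"),
    ("s2", "http://example.org/b"),
]
labels = classify_searches(search_terms, link_followed)
keywords = indexed_with_keyword(labels, search_terms, link_followed)
print(labels, keywords)
```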
  • 19. Example Activity: Search. Search Keywords; OpenCalais; Topics of interest
  • 20. Personal data exchange: Request Parameters; Personal Information (Profile); Trust Model
  • 21. Tool used to create mappings between the data sent to websites (from the logs, on the right) and the user profile (on the left), effectively reconstructing the profile from the data.
  • 22. User profile reconstructed from Web activities: 36 attributes, 1,080 values, sent to 123 domains. A model of which piece of personal information was sent where (can answer the questions).
  • 23. What that tells us about trust: Taking the point of view of an external observer, we can derive an observed model of trust and criticality of data. If this piece of data is critical to you and you give it to bob, you must trust bob. If you give this piece of data to many untrusted people, you probably don’t consider it critical.
  • 24. Formally: Trust in a domain = max of the criticality of the data it received. Criticality of a piece of data = 1 / (1 + Σ (1 − trust in each website that received the data)). Obviously, these 2 formulas are interdependent. They are treated as a sequence, with initial values at 0.5.
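One way to read "treating them as a sequence" on slide 24 is as a fixed number of alternating updates, sketched below in Python. The update order (trust first, then criticality), the iteration count, the function name observe_trust and the sample data are all assumptions; the slide only gives the two formulas and the 0.5 starting values, with the criticality formula read as 1 / (1 + Σ (1 − trust)).

```python
def observe_trust(sent, iterations=20, initial=0.5):
    """sent maps each piece of data to the set of domains that received it."""
    domains = {d for receivers in sent.values() for d in receivers}
    trust = {d: initial for d in domains}
    criticality = {x: initial for x in sent}
    for _ in range(iterations):
        # Trust in a domain = max criticality of the data it received.
        trust = {
            d: max(criticality[x] for x, receivers in sent.items() if d in receivers)
            for d in domains
        }
        # Criticality = 1 / (1 + sum of (1 - trust) over the receiving domains).
        criticality = {
            x: 1.0 / (1.0 + sum(1.0 - trust[d] for d in receivers))
            for x, receivers in sent.items()
        }
    return trust, criticality

# Hypothetical profile: an e-mail address sent widely, a card number sent once.
sent = {
    "email": {"a.com", "b.com", "c.com"},
    "credit_card": {"a.com"},
}
trust, criticality = observe_trust(sent)
print(trust, criticality)
```

In this toy run the card number ends up more critical than the e-mail address, and a.com more trusted than b.com or c.com, which matches the intuition on the previous slide.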
  • 25. Interacting with the model: Expose the user to his own behavior as observed, so that he can try to align it with his intended behavior.
  • 26. Demo
  • 27. Conclusion: A first set of tools exploiting logs of personal Web activity. Demonstrates the need for ways to abstract and interpret activity data, to support Web users. Demonstrates the ability of semantic technologies, ontologies, and enrichment through external data to provide such abilities.
  • 28. So much more to do: Can I collect this tweet? From HTTPS? From my mobile phone? Can I link it to where I am? To what I’m doing? To what I have been doing? To the abstract of the presentation? To the slides on SlideShare.net? To blogs mentioning it? Can I cope with the scale of all this information? Can I decide what to share? Can I store all this securely? Can I get usable access to it? Can I learn something from it?