Monday, March 26, 2012
WP1 Overview

     • “Backend” shared datasets and services
     • Mappings, integration and common vocabulary
     • Extra datasets to support use-case scenarios




WP1: Year 3 Direction & Achievements

     • Moving from single ‘warehouse’ to distributed
       set of databases, datasets and services
     • Planning for sustainable life-after-project
     • Integrating feedback from end-to-end demos




Why WP1? Two roles

     • NoTube internal: a hub for data sharing
     • NoTube external: show how shared datasets
       and vocabularies help with user-facing “Web
       and TV” problems
     • “show”, critically, includes “thinking out loud”
       as we explore, via blog, email, Twitter etc.
           – scholarly articles rarely reach our target audiences

Outreach message

     • Let metadata flow widely - advertising content,
       rather than be a hidden asset
     • Identify and link content with useful URLs(*)
     • Open APIs to control TV and link devices [WP7c]

        ...from W3C TV & Web position paper (with Project Baird), Berlin 9 Feb 2011

       WP1 concerned primarily with the first two: getting metadata into the Web from
       source, rather than scraping, guessing, approximating.

Aside: RDFa went mainstream

     • Try ‘View source’ on IMDB, Rotten Tomatoes,
       BBC, tv.com sites to find RDF descriptions of
       TV content.
     • NoTube’s approach was to lead by example,
       to engage with industry and to plan from the
       beginning for the ‘afterlife’.
     • This strategy worked.

Facebook OGP




                                                   tv.com 'The Wire' page




                         ...simple, extensible standards are being adopted

                               OGP since 2010; schema.org since 2011...
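
As a rough illustration of what ‘View source’ reveals on such pages, the sketch below pulls Open Graph properties out of an HTML string using only Python's standard library. The sample markup, the example.org URL and the helper names (OGPExtractor, extract_ogp) are invented for illustration; they are not taken from tv.com or from NoTube code.

```python
from html.parser import HTMLParser


class OGPExtractor(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here via the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        if prop.startswith("og:") and attr_map.get("content") is not None:
            self.properties[prop] = attr_map["content"]


def extract_ogp(html_text):
    """Return a dict of Open Graph properties found in html_text."""
    parser = OGPExtractor()
    parser.feed(html_text)
    return parser.properties


# Invented sample page; real pages usually carry more properties.
SAMPLE_HTML = """
<html prefix="og: http://ogp.me/ns#">
<head>
  <title>The Wire</title>
  <meta property="og:title" content="The Wire" />
  <meta property="og:type" content="video.tv_show" />
  <meta property="og:url" content="http://example.org/shows/the-wire" />
</head>
<body></body>
</html>
"""

if __name__ == "__main__":
    for key, value in sorted(extract_ogp(SAMPLE_HTML).items()):
        print(key, "=", value)
```

The same approach would work for schema.org RDFa attributes, although a full RDFa processor handles prefixes and nesting that this sketch ignores.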


TV Data Warehouse

     • We still host several crawls of TV EPG data
     • Trend is for data to be more cleanly available
       from source, without scraping
     • Crawling, aggregation and integration still
       useful, but less scraping required
     • Crawled 'data warehouse' also used as a
       research testbed collection

WP1: Example Datasets

     • WP7c/WP3 use DBpedia/Wikipedia URLs for
       topics; covers all mainstream areas.
     • BBC also using Lonclass/UDC topic codes
       (we’re helping prepare this for sharing)
     • For Music, we adopt MusicBrainz IDs
     • Mapping diverse representations of ‘genre’
     • “Organic” item/topic similarity measures
       derived from user data from WP3
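
The slides do not say which similarity measure was derived from the WP3 user data. As one common possibility, the sketch below computes Jaccard similarity between items from the sets of users who interacted with them. The measure, the programme names and the user IDs are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def item_similarities(viewers_by_item):
    """Return {(item1, item2): score} for every unordered pair of items."""
    scores = {}
    for left, right in combinations(sorted(viewers_by_item), 2):
        scores[(left, right)] = jaccard(viewers_by_item[left], viewers_by_item[right])
    return scores


# Invented viewing data: programme -> set of anonymous user IDs.
VIEWERS = {
    "programme_a": {"u1", "u2", "u3"},
    "programme_b": {"u2", "u3", "u4"},
    "programme_c": {"u5"},
}

if __name__ == "__main__":
    for pair, score in item_similarities(VIEWERS).items():
        print(pair, round(score, 2))
```

Programmes that share many viewers score close to 1, and programmes with disjoint audiences score 0, which is the sense in which such a measure is “organic”: it comes from behaviour rather than from editorial topic codes.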
WP1: Data Services

     • Data Services exposed as static files:
           – Show how to embed RDFa in HTML
           – Publish as RDF/XML Linked Data
     • Interactive Data Services:
           – Using W3C SPARQL, SQL or SOLR/Lucene, over
             HTTP and/or XMPP.
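
To make “SPARQL over HTTP” concrete, here is a minimal sketch of the client side: building a SPARQL Protocol GET URL and flattening a response in the standard SPARQL JSON results layout. The endpoint URL, the query and the sample response are hypothetical, and no network request is made, so nothing here describes an actual NoTube service.

```python
import json
from urllib.parse import urlencode


def build_sparql_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query text goes in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_results(results_json):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    document = json.loads(results_json)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in document["results"]["bindings"]
    ]


# Hypothetical endpoint and query, for illustration only.
ENDPOINT = "http://example.org/sparql"
QUERY = """
SELECT ?programme ?topic WHERE {
  ?programme <http://purl.org/dc/terms/subject> ?topic .
} LIMIT 10
"""

# Hand-written response in the SPARQL JSON results format.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["programme", "topic"]},
  "results": {"bindings": [
    {"programme": {"type": "uri", "value": "http://example.org/programme/1"},
     "topic": {"type": "uri", "value": "http://dbpedia.org/resource/Baltimore"}}
  ]}
}
"""

if __name__ == "__main__":
    print(build_sparql_url(ENDPOINT, QUERY))
    print(rows_from_results(SAMPLE_RESPONSE))
```

A real client would send the URL with an Accept header of application/sparql-results+json and handle HTTP errors; those parts are left out to keep the sketch self-contained.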


WP1: Exploitation and Sustainability

     • WP1’s approach designed to outlive NoTube
     • Use, augment and contribute to external data
           – e.g. DBpedia, Archive.org, W3C & wider Web of
             data trend (e.g. RDFa adoption)
           – also we demonstrate e.g. on blog how we did it -
             so others can replicate it
           – WP4 enrichments can be fed back to externals,
             e.g. similarity metrics & clusters

WP1: Sustainability 2
     • NoTube’s 2010 W3C “Web & TV” position
       paper lobbied for unique IDs & public
       metadata for video content; this is now going
       mainstream.
     • VUA will continue hosting some data, using
       PURL.org so it can be passed on, e.g. to W3C, later.
     • Collab with Facebook OGP (helped with their
       RDFa adoption) and now the search engines'
       Schema.org (RDFa and extending TV vocab).
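
For readers unfamiliar with what schema.org markup in RDFa looks like, the sketch below generates a small HTML fragment for a TV episode using RDFa Lite attributes (vocab, typeof, property). The function name and the sample values are invented, and the term names (TVEpisode, TVSeries, episodeNumber, partOfSeries) follow the current schema.org vocabulary, which may differ from the terms in use when these slides were written.

```python
from html import escape


def tv_episode_rdfa(series_name, episode_name, episode_number):
    """Return an HTML fragment describing a TV episode with RDFa Lite attributes."""
    return (
        '<div vocab="http://schema.org/" typeof="TVEpisode">\n'
        f'  <span property="name">{escape(episode_name)}</span>\n'
        f'  (episode <span property="episodeNumber">{int(episode_number)}</span> of\n'
        '  <span property="partOfSeries" typeof="TVSeries">\n'
        f'    <span property="name">{escape(series_name)}</span>\n'
        '  </span>)\n'
        '</div>'
    )


if __name__ == "__main__":
    # Placeholder episode title and number; only the series name comes from the slides.
    print(tv_episode_rdfa("The Wire", "Example episode title", 1))
```

Because the attributes sit on ordinary HTML elements, the page still renders normally for viewers while crawlers can read the structured description from the same source.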
schema.org




Workpackage Links

     • Background data for all Workpackages
     • Collaborated with WP2 on BMF RDF models
     • Closer ties throughout WP3/7 developments
     • WP4 entity and topic URIs point to WP1
     • Outreach work around RDFa, Position Paper


2nd review comments
     • Not clear though how this work has built upon the results of year 1,
       and how the current progress is in line with the case studies.
           – Worked more closely and pragmatically with case studies in
             WP7, especially 7c and related WP3 work. Moved towards more
             decentralised model, instead of 'warehouse'.
           – 7c collaboration with KMI's 'Watch and Buy' scenario, and with
             WP4 timed ad insertion work, used EU p2pnext 'limo' work; also
             egtaMETA from EBU from 7c
           – WP1 work became more "hands-on"; we helped WP7 extract
             datasets such as TED.com and Archive.org which we expect will
             shortly be replaceable by cleaner information from 'official'
             sources.
2nd review comments
     • No relevant state of the art is documented and no details or
       citations on automated algorithms are given. Evaluation is
       restricted to examples and no quantitative data are given.
           – We accept weakness in report (lack of scholarly/scientific
             detail); chose to focus on more informal
             communication with outside world in final phase. A 2nd
             version of the doc was produced, but main changes
             were around 'life after project' themes rather than
             adding more scientific and scholarly detail.


2nd review comments

     • A close collaboration with WP7 is
       recommended in order to ensure that work
       meets the requirements of the use cases.
           – this very well describes our emphasis in final
             phase




Lessons Learned

     • It's hard to simulate an evolving global data
       ecosystem; but we've played a small part in
       some huge changes.
     • Publishers will adopt simple Semantic Web
       standards when they are given an incentive.
     • It's hard for a 4-year-old plan to stay relevant
       in such an environment; ability to be agile was
       critically important.
WP1 Summary

     • Used open standards (RDF) and largely open data (e.g.
       Wikipedia/DBpedia)
     • Integrated, mapped and data-mined
     • Contributing our additions back to the community /
       commons (highlight: BBC sims)
     • Documenting what we learned for external developers and
       subsequent projects

                                                                           Questions?



WP1: End-to-End issues

     • In final year, our End-to-End scenarios have
       more mature implementations
     • Feedback from WP3/7c: key issue is sparsity
       of large vocabularies when used for record
       matching. No single solution here.
     • Integrating techniques from WP4 (e.g.
       clustering, data-mining) critical for applying
       large and chaotic vocabularies for practical
       recommendations.
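
The sparsity problem can be shown with a toy example: two records tagged with different fine-grained topics never match exactly, but they do once each topic is replaced by a broader cluster label. The topic identifiers and cluster assignments below are invented, and the dictionary lookup merely stands in for whatever clustering WP4 actually produced.

```python
def shared_topics(record_a, record_b):
    """Exact-match overlap between two records' topic sets."""
    return record_a & record_b


def to_clusters(topics, cluster_of):
    """Replace each topic by its cluster label; topics without a cluster are kept as-is."""
    return {cluster_of.get(topic, topic) for topic in topics}


# Invented topic identifiers and cluster assignments (e.g. output of a clustering step).
CLUSTER_OF = {
    "dbpedia:Baltimore_Police_Department": "cluster:crime_drama",
    "dbpedia:Organized_crime": "cluster:crime_drama",
    "dbpedia:Gardening": "cluster:home_and_garden",
}

PROGRAMME_TOPICS = {"dbpedia:Baltimore_Police_Department"}
USER_INTERESTS = {"dbpedia:Organized_crime", "dbpedia:Gardening"}

if __name__ == "__main__":
    # With a large vocabulary, exact topic matches are rare.
    print(shared_topics(PROGRAMME_TOPICS, USER_INTERESTS))
    # After mapping to clusters, related topics can be matched.
    print(shared_topics(to_clusters(PROGRAMME_TOPICS, CLUSTER_OF),
                        to_clusters(USER_INTERESTS, CLUSTER_OF)))
```

The trade-off is precision: coarser clusters produce more matches but also more spurious ones, which is consistent with the slide's remark that there is no single solution.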

NoTube: Models & Semantics
