Web Scale Named Entity Mining
"There's simply too much information out there"
WI-IAT 2011
In memoriam Herbert A. Simon …
stuck
April 2011
Herbert Simon's Brookings Institution lecture
"Designing Organizations for an Information-Rich World"
Johns Hopkins University, September 1, 1969
1. Tales & legends
Find & procure a crystal-clear plastic replacement for LEXAN 943 polycarbonate
Main constraints:
• more resistant to detergent agents than LEXAN 943 (cracking under the combined effect of mechanical stress and exposure to detergent agents)
• compatible with existing tools: shrinkage must be close to that of LEXAN 943
• optical characteristics close to LEXAN 943
• weldable by ultrasonic welding
• compliant with fire & smoke resistance requirements: class 2 according to NF F 16-101/102 and V0 according to the UL 94 standard
deadline: one week
organization-centric search
Where is the SA-24 Grinch (9K338 Igla-S) portable air defense missile system sold or operated?
location-centric search
Recent information (past month) about the call for proposals "outils Web innovants en entreprise" (innovative Web tools in the enterprise)?
time-centric search
"Pro" searches focus on named entities: Orgs, People, Location, Time
2. Introducing WebNEM
product specifications → relevant query? → browsing/ranking results → query again? where? → get manufacturer or distributor → find compliant products
Attention-greedy & burdensome
"SA-24 Grinch
9K338 Igla-S"
Goal: an attention-saving process
Exploratory data analysis of high-dimensional data
"In exploratory data analysis of high dimensional data
one of the main tasks is the formation of a
simplified, usually visual, overview of data sets.
....
Clustering and projection
are among the examples of useful methods
to achieve this task."
Fernando Lourenço, Victor Lobo, Fernando Bação: Binary-based similarity measures for categorical data and their application in self-organizing maps. JOCLAD 2004 - XI Jornadas de Classificação e Análise de Dados, April 1-3, Lisbon (2004)
Lourenço, Lobo, Bação – JOCLAD 2004
WebNEM
= collection of relevant data, anywhere on the Web (topical web crawler)
+ projection onto the Named Entities space (named entity recognition)
+ visualization/exploratory analysis tools
"Web scale" collection: brute force
a priori, never-ending crawl of the "whole" Web
general index, "everywhere"
user query → fast answer, "any" topic
huge resources required (data-size based)
"Web scale" collection: our approach
user query → on-demand topical crawl, anywhere relevant
tailored index built to order: Web slices
delayed answer, but less garbage
"close to optimal" resources (usage based)
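A minimal sketch of an on-demand topical crawl, assuming a best-first strategy (the slides do not say which frontier policy Squido uses): the frontier is a priority queue keyed on the relevance of the page that linked to each URL, off-topic pages are dropped without following their links, and the crawl stops after a fetch budget. `fetch`, the term-overlap `relevance` score, `threshold` and the toy `pages` graph are hypothetical stand-ins for a real HTTP fetcher and topic classifier.

```python
import heapq
import re


def relevance(text: str, topic_terms: set) -> float:
    """Toy relevance score (assumption): share of topic terms found in the page text."""
    if not topic_terms:
        return 0.0
    words = set(re.findall(r"\w+", text.lower()))
    return len(topic_terms & words) / len(topic_terms)


def focused_crawl(seeds, fetch, topic, max_fetches=100, threshold=0.5):
    """Best-first topical crawl returning {url: relevance} for the kept Web slice.

    `fetch(url)` is a hypothetical callable returning (text, outlinks).
    """
    topic_terms = set(topic.lower().split())
    frontier = [(-1.0, url) for url in seeds]  # heapq is a min-heap: negate scores
    heapq.heapify(frontier)
    seen = set(seeds)
    web_slice = {}
    fetches = 0
    while frontier and fetches < max_fetches:
        _, url = heapq.heappop(frontier)
        text, outlinks = fetch(url)
        fetches += 1
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic: neither kept nor expanded
        web_slice[url] = score
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                # A link inherits its parent's score as its priority estimate.
                heapq.heappush(frontier, (-score, link))
    return web_slice


# Toy link graph standing in for the Web: url -> (text, outlinks).
pages = {
    "a": ("hydrogen storage for fuel cells", ["b", "c"]),
    "b": ("metal hydride hydrogen storage tanks", ["d"]),
    "c": ("celebrity gossip", ["e"]),
    "d": ("fuel cells in cars", []),
    "e": ("hydrogen fuel cells storage", []),
}
print(focused_crawl(["a"], lambda url: pages[url], "hydrogen storage fuel cells"))
```

Page `e` is on-topic but only reachable through the off-topic page `c`, so this sketch never fetches it: pruning trades some recall for a smaller, cleaner Web slice.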
Projection: when to extract entities?
Named Entity Recognition is resource intensive.
process step | data size | required response time
crawl time, whole Web | 10^10 | asynchronous
query time, collection | 10^2 | real-time
crawl time, Web slice | 10^4 | asynchronous
www.squido.fr: our SaaS Web mining system
large-scale Named Entity extraction (EN/FR)
beta released to customers in June 2011
WebNEM with Squido
topic → focused crawl → page cleaning → shallow entity extraction → index
user queries → search → visualization
user collections → deep entity extraction → visualization
Page cleaning
instead of working on the whole page, work on its main content only
fast heuristic DOM processing
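One possible fast heuristic, sketched with Python's standard `html.parser`; the deck does not detail its own DOM rules, so these are assumptions: ignore `script`, `style`, `nav`, `header`, `footer` and `aside` subtrees, split the rest into text blocks, and keep only blocks that are long enough and not dominated by link text. `BlockCollector`, `clean_page`, `min_words` and `max_link_density` are hypothetical names and tuning knobs.

```python
from html.parser import HTMLParser

# Assumed boilerplate containers and block-level tags (tuning choices, not a standard).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "article", "section"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks, counting words that sit inside links."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a subtree we ignore
        self.link_depth = 0  # > 0 while inside an <a> element
        self.blocks = []     # (text, word_count, link_word_count)
        self._words = []
        self._link_words = 0

    def flush(self):
        """Close the current text block, if any."""
        if self._words:
            self.blocks.append((" ".join(self._words), len(self._words), self._link_words))
        self._words, self._link_words = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)

    def handle_data(self, data):
        if self.skip_depth:
            return
        words = data.split()
        self._words.extend(words)
        if self.link_depth:
            self._link_words += len(words)


def clean_page(html, min_words=5, max_link_density=0.5):
    """Keep long, link-poor text blocks: a rough guess at the main content."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    kept = [text for text, n_words, n_link_words in parser.blocks
            if n_words >= min_words and n_link_words / n_words <= max_link_density]
    return "\n".join(kept)


html = """<html><head><style>p {color: red}</style></head><body>
<nav><a href="/">Home</a> <a href="/news">News</a></nav>
<div><a href="/a">Link one</a> <a href="/b">Link two</a> <a href="/c">Link three</a></div>
<p>Metal hydrides store hydrogen at moderate pressure for fuel cells.</p>
<footer>Copyright 2011</footer></body></html>"""
print(clean_page(html))
```

The link-only `div` is dropped by the link-density test and the `nav`/`footer` text never enters a block, so only the paragraph survives.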
Shallow extraction
Web docs → format parse → detect language → tokenize → sentence split → gazetteers + grammar → index
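The shallow chain can be sketched with a few regex-based steps. Everything here is a toy assumption, not Squido's implementation: the input is taken as already format-parsed plain text, stopword counting stands in for language detection (EN/FR only), the gazetteer is a small hypothetical dictionary matched longest-first, and the "grammar" is a single rule tagging capitalized words after a title as a Person. The naive sentence splitter would, for instance, break on "Dr.".

```python
import re

STOPWORDS = {
    "en": {"the", "of", "and", "is", "in", "for", "to"},
    "fr": {"le", "la", "les", "de", "et", "est", "pour", "des"},
}
# Hypothetical gazetteer: lower-cased token tuples -> entity type.
GAZETTEER = {
    ("austin",): "Location",
    ("ann", "arbor"): "Location",
    ("honda",): "Organization",
    ("university", "of", "michigan"): "Organization",
}
MAX_ENTRY_LEN = max(len(entry) for entry in GAZETTEER)
TITLES = {"president", "dr", "mr", "mrs"}


def detect_language(text):
    """Pick the language whose stopwords occur most often (EN/FR only)."""
    words = re.findall(r"\w+", text.lower())
    return max(STOPWORDS, key=lambda lang: sum(w in STOPWORDS[lang] for w in words))


def split_sentences(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Words (keeping inner hyphens/apostrophes, e.g. "Jean-Claude") or single punctuation marks.
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)


def match_gazetteer(tokens):
    """Greedy longest-match lookup of token spans in the gazetteer."""
    lowered = [t.lower() for t in tokens]
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_ENTRY_LEN, len(tokens) - i), 0, -1):
            entity_type = GAZETTEER.get(tuple(lowered[i:i + n]))
            if entity_type:
                entities.append((" ".join(tokens[i:i + n]), entity_type))
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return entities


def apply_grammar(tokens):
    """Toy rule: a title followed by capitalized words denotes a Person."""
    people = []
    for i, token in enumerate(tokens):
        if token.lower() in TITLES:
            j = i + 1
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            if j > i + 1:
                people.append((" ".join(tokens[i + 1:j]), "Person"))
    return people


def shallow_extract(text):
    language = detect_language(text)
    entities = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        entities.extend(match_gazetteer(tokens) + apply_grammar(tokens))
    return language, entities


print(shallow_extract("Honda President Takanobu Ito spoke in Austin. "
                      "The University of Michigan is in Ann Arbor."))
```

Case-based rules like `apply_grammar` are exactly what wrongly cased titles break, which is one reason the deck later calls for sanitizing parser output.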
Deep extraction ≅ shallow extraction + elaborate linguistics
morpho analyzer, POS tagger, NP/VP chunker, ortho matcher, grammar → index
3. Annoyances
Linguistic processing throughput
deep extraction: too expensive when crawling
shallow extraction: OK, but with a penalty on quality
workaround: asynchronous deep extraction on smaller collections + query-time sanitization
Page cleaning needs evaluation
goal: ↗ accuracy? cost: ↘ recall?
performance impact? ↘ one more processing step, ↗ less text in later steps
"Multiple dates" usage?
<DATE TYPE="DateDay" D="11" M="2" Y="2008">February 10-13, 2008</DATE>
<DATE TYPE="DateDay" D="11" M="2" Y="2008">February 9-13, 2008</DATE>
<DATE TYPE="DateDay" D="12" M="11" Y="2007">November 11-13, 2007</DATE>
<DATE TYPE="DateDay" D="14" M="10" Y="2008">October 12-17, 2008</DATE>
<DATE TYPE="DateDay" D="16" M="2" Y="2009">February 15-18, 2009</DATE>
<DATE TYPE="DateDay" D="17" M="9" Y="2007">September 16-19, 2007</DATE>
<DATE TYPE="DateDay" D="2" M="5" Y="2008">May 2, 2008</DATE>
<DATE TYPE="DateDay" D="26" M="5" Y="2009">May 24-29, 2009</DATE>
<DATE TYPE="DateDay" D="27" M="10" Y="2009">October 25-29, 2009</DATE>
<DATE TYPE="DateDay" D="7" M="10" Y="2008">October 5-9 2008</DATE>
<DATE TYPE="DateDay" D="8" M="2" Y="2009">February 7-10, 2009</DATE>
<DATE TYPE="DateDay" D="8" M="5" Y="2007">May 6-11, 2007</DATE>
<DATE TYPE="DateDay" D="9" M="10" Y="2007">October 7-12, 2007</DATE>
<DATE TYPE="DateMonth" M="11" Y="2009">November, 2009</DATE>
<DATE TYPE="DateMonth" M="2" Y="2009">February, 2009</DATE>
<DATE TYPE="DateMonth" M="8" Y="2008">August 2008</DATE>
retrieve by date? sort by date?
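The `DATE` annotations above carry normalized `D`/`M`/`Y` attributes, which is what makes date retrieval and sorting cheap. A minimal sketch, assuming each annotation is a standalone XML element as shown and that month-only dates (`DateMonth`, no `D`) are pinned to the 1st of the month; `parse_date_tag`, `sort_by_date` and `retrieve_by_date` are hypothetical names. The `D` value is taken as given, even when the surface text is a range.

```python
import xml.etree.ElementTree as ET
from datetime import date


def parse_date_tag(tag):
    """Turn one <DATE ...>text</DATE> annotation into (normalized date, surface text)."""
    element = ET.fromstring(tag)
    day = int(element.get("D", "1"))  # assumption: DateMonth has no D, pin it to the 1st
    return date(int(element.get("Y")), int(element.get("M")), day), element.text


def sort_by_date(tags):
    return sorted((parse_date_tag(tag) for tag in tags), key=lambda pair: pair[0])


def retrieve_by_date(tags, start, end):
    """Surface forms of the dates falling in [start, end], in chronological order."""
    return [text for day, text in sort_by_date(tags) if start <= day <= end]


tags = [
    '<DATE TYPE="DateDay" D="26" M="5" Y="2009">May 24-29, 2009</DATE>',
    '<DATE TYPE="DateMonth" M="2" Y="2009">February, 2009</DATE>',
    '<DATE TYPE="DateDay" D="2" M="5" Y="2008">May 2, 2008</DATE>',
    '<DATE TYPE="DateDay" D="16" M="2" Y="2009">February 15-18, 2009</DATE>',
]
print([text for _, text in sort_by_date(tags)])
print(retrieve_by_date(tags, date(2009, 1, 1), date(2009, 3, 31)))
```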
Publishing date?
critical for time-centric searches
e.g. a page published 05/2011 tagged as 7 Jul 2011
& many more…
• misspellings: Tapei → Taipei
• a location that is also a first name: "University of Michigan, Ann Arbor, MI" → Ann Arbor (person)
• compound first names: "Jean-Claude Marin" → Claude Marin
• wrong character case (very frequent in titles) breaks all case-based rules:
  barrack obama → not extracted
  How To Buy Electric Trucks → Buy Electric (organization)
  In Virginia Life Is Sweet → Virginia Life (person)
• polymorphism: "Nagy Bocsa", "Nagy-Bocsa", "Nagy"
• sanitize parser output for tokenization: transliteration, case, punctuation, …
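A minimal sanitization sketch for the polymorphism case, assuming a normalization key is enough to merge variants: accents are stripped via Unicode NFKD decomposition (a crude stand-in for transliteration; non-Latin scripts would simply be dropped), punctuation such as hyphens becomes spaces, and case is folded. `normalize_entity` and `group_variants` are hypothetical names, and the accented variant in the example is illustrative. Partial forms like "Nagy" alone stay in their own group; merging them needs extra evidence.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_entity(name):
    """Canonical key for an entity surface form: transliteration, punctuation, case."""
    # Crude transliteration: decompose accented letters, drop the combining marks.
    # Characters with no ASCII decomposition (e.g. Cyrillic) are dropped entirely.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Punctuation (hyphens, dots, apostrophes, ...) becomes word separators.
    spaced = re.sub(r"[^\w\s]", " ", ascii_name)
    # Case folding and whitespace collapsing.
    return " ".join(spaced.casefold().split())


def group_variants(names):
    """Group surface forms that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_entity(name)].append(name)
    return dict(groups)


print(group_variants(["Nagy Bocsa", "Nagy-Bocsa", "NAGY BÓCSA", "Nagy"]))
```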
4. Results
Reminder
The following results are obtained automatically from unstructured content picked up on the Web by an autonomous system, without prior knowledge of the topic or of the visited Web sites.
Let's try it with a use case: "hydrogen storage for fuel cells"
What's inside a collection of 66 highly ranked documents?
run a few cycles (shallow extraction only): some 10^4 pages
entity weight function (tf-idf, …)
People, Orgs, Location, Time
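The slides name a tf-idf-style entity weight function without giving its exact form. One plausible sketch, treating each page as a bag of extracted entity mentions and summing per-page tf-idf scores across the collection (the aggregation is an assumption); a top-N cut such as a top 50 is then a `most_common` call. `entity_tfidf` and the toy pages are hypothetical.

```python
import math
from collections import Counter


def entity_tfidf(pages):
    """pages: one list of extracted entity mentions per page.

    Returns a Counter {entity: weight}, summing per-page tf-idf over the
    collection (an aggregation assumption, not the deck's stated formula).
    """
    n_pages = len(pages)
    doc_freq = Counter()
    for mentions in pages:
        doc_freq.update(set(mentions))  # count each entity once per page
    weights = Counter()
    for mentions in pages:
        for entity, count in Counter(mentions).items():
            tf = count / len(mentions)
            idf = math.log(n_pages / doc_freq[entity])
            weights[entity] += tf * idf
    return weights


pages = [["Honda", "Austin", "Honda"], ["Austin", "Argonne"], ["Austin"]]
weights = entity_tfidf(pages)
print(weights.most_common(2))  # e.g. most_common(50) for a top 50
```

With this idf, an entity found on every page ("Austin" here) gets weight 0, which suppresses collection-wide noise but also hides the topic's most central entities; a smoothed idf is a common alternative.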
Special attention paid
to so-called outliers
More than 900 organizations: overload…
page cleaning + entity sanitization ⇒ better detail & accuracy
↗ attention, ↘ information: top 50
academic team? H2 military usage?
new questions pop up instantly
People
authors lead to relevant content (a classic IR method, even in libraries!)
Countries
political threats on lithium battery supplies: an argument in favor of H2 technology
Cities
"Austin is in a unique position to offer its electric grid as a real world proving ground"
"Direct Methanol Fuel Cells" ⇒ alternative to H2!
changeover from nickel to lithium will be complete by 2016 and 2018
Multiple-dates timeline: history and outlook, by domain and time
Honda President Takanobu Ito says around 10 percent of Honda’s global sales will be hybrids by 2015
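A multiple-dates timeline needs a pivot between history and outlook. A minimal sketch, assuming the pivot is the page's publishing date and that year-only expressions such as "by 2015" are pinned to January 1st; `split_timeline`, the reference date and the snippets are illustrative only.

```python
from datetime import date


def split_timeline(mentions, reference):
    """Split (date, snippet) pairs into history (before reference) and outlook, both sorted."""
    ordered = sorted(mentions)
    history = [m for m in ordered if m[0] < reference]
    outlook = [m for m in ordered if m[0] >= reference]
    return history, outlook


# Illustrative mentions; year-only expressions are pinned to January 1st (assumption).
mentions = [
    (date(2018, 1, 1), "nickel to lithium changeover complete (2018)"),
    (date(2016, 1, 1), "nickel to lithium changeover complete (2016)"),
    (date(2015, 1, 1), "about 10 percent of Honda's global sales are hybrids"),
    (date(2008, 5, 2), "May 2, 2008"),
]
# Hypothetical publishing date used as the history/outlook pivot.
history, outlook = split_timeline(mentions, reference=date(2011, 6, 1))
print(history)
print(outlook)
```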
In a few clicks...
hydrogen storage for fuel cells?
→ DMFC alternative to H2
→ Austin, TX
→ changeover from nickel to lithium by 2016/2018
5. Perspectives
To clean or not to clean?
"attention" impact: corpus → label examples +/- → compare clean set vs. full set
performance impact: run the pipeline with/without cleaning, time the full pipeline
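The with/without-cleaning experiment could be framed as below. This is a sketch under assumptions: the labeled examples are reduced to a set of correct (positive) entities, every extracted entity outside that set counts as a false positive, and the cleaning cost is included in the timing because the cleaned run calls the cleaner before extraction. `compare_cleaning`, `toy_extract`, `toy_cleaner` and the documents are hypothetical.

```python
import time


def precision_recall(found, gold):
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def compare_cleaning(extract, docs, cleaner, gold):
    """Run extraction with and without page cleaning; report quality and wall time."""
    runs = {
        "full": extract,
        "clean": lambda doc: extract(cleaner(doc)),  # cleaning cost is timed too
    }
    report = {}
    for label, run in runs.items():
        start = time.perf_counter()
        found = set()
        for doc in docs:
            found |= run(doc)
        seconds = time.perf_counter() - start
        precision, recall = precision_recall(found, gold)
        report[label] = {"precision": precision, "recall": recall, "seconds": seconds}
    return report


# Toy stand-ins: capitalized words are "entities", boilerplate follows a "|".
def toy_extract(text):
    return {word for word in text.split() if word.istitle()}


def toy_cleaner(text):
    return text.split("|")[0]


docs = ["Honda Austin | Home Contact", "Argonne | Login"]
gold = {"Honda", "Austin", "Argonne"}
for label, scores in compare_cleaning(toy_extract, docs, toy_cleaner, gold).items():
    print(label, scores["precision"], scores["recall"])
```

On real pages cleaning may also cut true entities, so recall is the number to watch alongside precision.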
Publishing date extraction
heuristic DOM processing: prototype ready
needs large-scale evaluation: build a gold standard from RSS feeds
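RSS 2.0 items usually declare a publication date in `<pubDate>` (RFC 822 date format), so feeds can serve as a gold standard for publishing-date extraction. A minimal sketch, assuming item links point to the pages to evaluate and that a match means the same calendar day; `gold_dates_from_rss` and `date_accuracy` are hypothetical names, and the feed and extractor output are made up, the first page reproducing the 05/2011 vs 7 Jul 2011 kind of error.

```python
import xml.etree.ElementTree as ET
from datetime import date
from email.utils import parsedate_to_datetime


def gold_dates_from_rss(rss_xml):
    """Map each RSS item's link to the calendar date of its <pubDate>."""
    root = ET.fromstring(rss_xml)
    gold = {}
    for item in root.iter("item"):
        link, pub_date = item.findtext("link"), item.findtext("pubDate")
        if link and pub_date:
            gold[link.strip()] = parsedate_to_datetime(pub_date.strip()).date()
    return gold


def date_accuracy(extracted, gold):
    """Share of gold pages whose extracted publishing date is the same calendar day."""
    if not gold:
        return 0.0
    return sum(extracted.get(url) == day for url, day in gold.items()) / len(gold)


rss = """<rss version="2.0"><channel><title>Example feed</title>
<item><link>http://example.org/a</link><pubDate>Tue, 10 May 2011 08:00:00 GMT</pubDate></item>
<item><link>http://example.org/b</link><pubDate>Thu, 07 Jul 2011 12:30:00 +0200</pubDate></item>
</channel></rss>"""
gold = gold_dates_from_rss(rss)
# Hypothetical extractor output: page "a" was published in May but tagged 7 Jul 2011.
extracted = {"http://example.org/a": date(2011, 7, 7), "http://example.org/b": date(2011, 7, 7)}
print(gold)
print(date_accuracy(extracted, gold))
```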
A zest of Linked Data?
too slow & fat for crawling...
use it "offline": disambiguation, gazetteers, infoboxes, ...
Play with graphs
entity co-occurrence, page similarity, ...
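Both graph ideas can start from the per-page entity sets. A minimal sketch, assuming an edge weight equal to the number of pages on which two entities co-occur and Jaccard overlap of entity sets as the page similarity (one common choice among many); `cooccurrence`, `page_similarity` and the data are illustrative.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(pages):
    """Edge weights: number of pages on which two entities appear together."""
    edges = Counter()
    for entities in pages:
        for pair in combinations(sorted(set(entities)), 2):
            edges[pair] += 1
    return edges


def page_similarity(entities_a, entities_b):
    """Jaccard overlap of two pages' entity sets (an assumed similarity measure)."""
    a, b = set(entities_a), set(entities_b)
    return len(a & b) / len(a | b) if a | b else 0.0


pages = [
    ["Austin", "DMFC", "University of Texas"],
    ["Austin", "DMFC"],
    ["Honda", "lithium"],
]
edges = cooccurrence(pages)
print(edges.most_common(1))
print(page_similarity(pages[0], pages[1]))
```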
UI/user experience
search facets, word clouds, maps, dashboards, infoboxes, highlighting, graphs
Lexical Taxonomy Induction
22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), Barcelona, Spain, July 19-22, 2011
another kind of projection
6. Digest
a. A real need for attention saving…
b. WebNEM results are encouraging
c. Work in progress, lots of paths to explore
"There's simply too much information out there."
"Leaders feel
misled. Stupid.
Trapped."
Final word by Herbert Simon
"Filtering by intelligent programs
is the main part of the answer"
[to information overload]
www.ixxo.fr
www.slideshare.net/fpouilloux
www.linkedin.com/pub/st%C3%A9phanie-jacquemont/20/271/767
www.linkedin.com/in/fpouilloux
MANY THANKS!
joint work of
CREDITS
Photos
2. Home page, The 2011 IEEE/WIC/ACM International Conference on Web Intelligence
4. Designing Organizations for an Information-Rich World, The Herbert A. Simon Collection
5. Vlad the Impaler, Wikimedia Commons
7. Missile 9M342 of the portable anti-aircraft missile system Igla-S, ©vitalykuzmin.net
10. Internet Map 2005, ©www.opte.org
33. The Inspector, ©DePatie-Freleng Enterprises
36. Nanomaterials for Solid State Hydrogen Storage, book cover, ©springer.com
40. EnerDel/Argonne lithium-ion battery, ©Argonne National Laboratory
40. Pennybacker Bridge - Austin, TX, ©Andy Heatwole
41. 20060206211301_132363.jpg, pulpo.org, ©Jumpedforjoy
44. Linking Open Data cloud diagram, ©Richard Cyganiak and Anja Jentzsch, lod-cloud.net
44. Taji crawl, ©The U.S. Army, www.flickr.com/soldiersmediacenter
48. Views of the solar corona by the Transition Region and Coronal Explorer, Stanford-Lockheed Institute for Space Research, NASA Small Explorer program
49. Hyperformance book cover, www.tjwaters.com
50. Dr Simon solving puzzles, The Herbert A. Simon Collection
Websites
wi-iat-2011.org
The Herbert A. Simon Collection, Carnegie Mellon University Libraries, diva.library.cmu.edu/webapp/simon/index.html
www.google.com
online.barrons.com
www.me.utexas.edu/~dmfc-muri
www.alsace-industrie.fr
www.hybridcars.com
www.me.utexas.edu/blogs/meyersresearchgroup
Bibliography
Simon, H. A. (1971), "Designing Organizations for an Information-Rich World", Carnegie Mellon University Libraries, diva.library.cmu.edu/webapp/simon/item.jsp?q=/box00055/fld04178/bdl0002/doc0001
Waters, T. J. (2011), "Hyperformance",
www.tjwaters.com/hyperformance-excerpt.html
Navigli, R., Velardi, P., Faralli, S. (2011), "A Graph-based Algorithm for Inducing Lexical Taxonomies from Scratch", Proc. of the 22nd International Joint Conference on Artificial Intelligence (IJCAI 2011), Barcelona, Spain, July 19-22, 2011, pp. 1872-1877.
