Automatically Labeling Facts in a Never-Ending Language Learning System
Estevam R. Hruschka Jr.
Federal University of São Carlos
Joint work with the Carnegie Mellon Read the Web Group

Never-Ending Learning
•  Main task: acquire a growing competence without asymptote
   – over years
   – multiple functions
   – where learning one thing improves the ability to learn the next
   – acquiring data from humans and the environment
•  Many candidate domains:
   – Robots
   – Softbots
   – Game players

NELL: Never-Ending Language Learner
Inputs:
•  initial ontology
•  handful of examples of each predicate in the ontology
•  the web
•  occasional interaction with human trainers
The task:
•  run 24x7, forever
•  each day:
   1. extract more facts from the web to populate the initial ontology
   2. learn to read (perform #1) better than yesterday
http://rtw.ml.cmu.edu

NELL: Never-Ending Language Learner
Goal:
•  run 24x7, forever
•  each day:
   1. extract more facts from the web to populate the given ontology
   2. learn to read better than yesterday
Today...
Running 24x7 since January 2010
Input:
•  ontology defining ~800 categories and relations
•  10-20 seed examples of each
•  1 billion web pages (ClueWeb – Jamie Callan)
Result:
•  continuously growing KB with more than 70,000,000 extracted beliefs

NELL: Never-Ending Language Learner
Knowledge Base Validation in NELL
•  Human Supervision: RTW group members;
•  Conversing Learning: NELL can autonomously talk to people in web communities and ask for help;
•  Web Querying: NELL can query the Web on specific facts to verify correctness, or to predict the validity of a new fact;
•  Hiring Labelers: NELL can autonomously hire people (using web services such as Mechanical Turk) to label data and help the system validate acquired knowledge.

Conversing Learning
Basic Steps:
•  Decide which task is going to be asked
•  Determine which oracles the ML system is going to consult
•  Propose a method of conversation with the oracles, often humans
•  Determine how to feed the community's input back into the ML system

Conversing Learning
Decide which task is going to be asked
•  Learned facts
•  Learned inference rules
•  Metadata (mainly for automatically extending the ontology)

Conversing Learning
Determine which oracles the ML system is going to consult
Yahoo! Answers
– very popular on the Web
– a lot of metadata to harvest
Twitter
– millions of users worldwide
– a system that was not designed to work as a QA environment
Both web communities provide an API to connect to their databases

Conversing Learning
Propose a method of conversation with the oracles, often humans
Macro Question-Answering
For each posted question:
–  Ask for simple yes/no answers
–  Try to understand every answer
–  Discard answers too difficult to understand
–  Conclude based only on fully understood answers
	
  
Conversing Learning
How to feed the community's input back into the ML system?
Suggested actions for NELL:
–  Synonym/co-reference resolution
–  Automatically update the Knowledge Base

Conversing Learning
Some Initial Results with First-Order Rules:
•  Take the top 10% of rules from the Rule Learner
•  60 rules were converted into questions and asked with both the regular and the Yes/No question approach
•  The 120 questions received a total of 350 answers.

Conversing Learning
Some Initial Results with First-Order Rules:
•  Rule extracted from NELL in Prolog format:
   stateLocatedInCountry(x,y) :- statehascapital(x,z), citylocatedincountry(z,y)
•  Converted into a question:
   Is this statement always true? If state X has capital Z and city Z is located in country Y, then state X is located in country Y.
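
One way to picture the rule-to-question conversion is a template lookup per predicate. The sketch below is an assumption-laden illustration, not NELL's actual converter: the `TEMPLATES` table, the regular expression, and the function names are all hypothetical, and only the rule string and the target sentence come from the slide.

```python
import re

# Hypothetical natural-language templates, one per predicate (not taken from NELL).
TEMPLATES = {
    "statelocatedincountry": "state {0} is located in country {1}",
    "statehascapital": "state {0} has capital {1}",
    "citylocatedincountry": "city {0} is located in country {1}",
}

# Matches one atom such as "statehascapital(x,z)".
ATOM = re.compile(r"(\w+)\(([^)]*)\)")


def verbalize(name, args):
    """Fill the predicate's template with upper-cased variable names."""
    variables = [a.strip().upper() for a in args.split(",")]
    return TEMPLATES[name.lower()].format(*variables)


def rule_to_question(rule):
    """Turn 'head :- body1, body2' into an 'always true?' question."""
    head, body = rule.split(":-")
    conclusion = verbalize(*ATOM.search(head).groups())
    conditions = [verbalize(*m.groups()) for m in ATOM.finditer(body)]
    return ("Is this statement always true? If " + " and ".join(conditions)
            + ", then " + conclusion + ".")


rule = "stateLocatedInCountry(x,y) :- statehascapital(x,z), citylocatedincountry(z,y)"
print(rule_to_question(rule))
```

A template table keeps the generated wording under manual control, at the cost of needing one entry per predicate.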
  	
  
Conversing Learning
Question: (Yes or No?) If athlete Z is member of team X and athlete Z plays in league Y, then team X plays in league Y.
•  Twitter answers sample:
   No. (Z in X) ∧ (Z in Y) → (X in Y)
•  Yahoo! Answers sample:
   NO, Not in EVERY case. Athlete Z could be a member of football team X and he could also play in his pub's Friday nights dart team. The Dart team could play in league Y (and Z therefore by definition plays in league Y). This does not mean that the football team plays in the darts league!

Conversing Learning
Some Initial Results with Facts Validation:
[Figures: facts validation results]

Some Initial Results with Metadata:
•  Question: Could you please give me some examples of clothing?
•  Answer 01: Snowshoes, rain ponchos, galoshes, sunhats, visors, scarves, mittens, and wellies are all examples of weather specific clothing!
•  Answer 02: pants
•  Answer 03: Training shoes can be worn by anyone for any purpose, but the term means to train in sports

Conversing Learning
Some Initial Results with Metadata:
•  Users replied with 552 seeds for 129 categories
•  Total of 5900 promotions with seeds created by NELL's developers
•  Total of 5300 promotions with seeds extracted from answers of Twitter users (similar precision)

Conversing Learning
Some Initial Results with Metadata:
•  For Relation Discovery Components
   – Symmetry: Is it always true that if a person P1 is a neighbor of a person P2, then P2 is a neighbor of P1?
   – Anti-symmetry: Is it always true that if a person P1 is the coach of a person P2, then P2 is not the coach of P1?
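
To make the two properties concrete, the sketch below generates the questions from a relation phrase and shows what a "yes" answer would imply for a set of relation pairs. The function names and the tiny example pair sets are illustrative assumptions; the slides do not describe how the Relation Discovery components use the answers.

```python
def symmetry_question(phrase):
    """Question text for a candidate symmetric relation, e.g. phrase='a neighbor of'."""
    return (f"Is it always true that if a person P1 is {phrase} a person P2, "
            f"then P2 is {phrase} P1?")


def antisymmetry_question(phrase):
    """Question text for a candidate anti-symmetric relation, e.g. phrase='the coach of'."""
    return (f"Is it always true that if a person P1 is {phrase} a person P2, "
            f"then P2 is not {phrase} P1?")


def is_symmetric(pairs):
    """True when every (a, b) in the set also appears as (b, a)."""
    return all((b, a) in pairs for (a, b) in pairs)


def is_antisymmetric(pairs):
    """True when no two distinct entities are related in both directions."""
    return not any(a != b and (b, a) in pairs for (a, b) in pairs)


# Made-up pairs, used only to exercise the checks.
neighbors = {("ann", "bob"), ("bob", "ann")}
coaches = {("carl", "dana")}
print(symmetry_question("a neighbor of"))
print(is_symmetric(neighbors), is_antisymmetric(coaches))
```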
  	
  
	
  
Conversing Learning
Some Initial Results with Metadata:
•  Feature Weighting/Selection for CMC
   – Logistic Regression features are based on noun phrase morphology
   – (true or false) hotel names tend to be compound noun phrases having "hotel" as the last word.
   – (true or false) a word having "burgh" as a suffix (e.g., Pittsburgh) tends to be a city name.
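
The two true/false statements above correspond to simple morphological features of a noun phrase. The following sketch shows how such features might be computed; the feature names and the function are assumptions for illustration and are not CMC's actual feature set.

```python
def morphology_features(noun_phrase):
    """Return a few boolean morphology features for a noun phrase."""
    words = noun_phrase.lower().split()
    last = words[-1] if words else ""
    return {
        # More than one word: a compound noun phrase.
        "is_compound": len(words) > 1,
        # The last word of the phrase is "hotel".
        "last_word=hotel": last == "hotel",
        # Some word in the phrase ends with the suffix "burgh".
        "has_suffix=burgh": any(w.endswith("burgh") for w in words),
    }


print(morphology_features("Grand Hotel"))
print(morphology_features("Pittsburgh"))
```

A classifier such as logistic regression would then assign a weight to each feature; asking people whether a feature is a good cue is a way to sanity-check those weights.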
  
	
  
Conversing Learning
Ongoing and future work
•  Asking the right community and the right person
•  Asking the right thing to maximize the results with a minimum number of questions (multi-view Active Learning)
•  Better Question-Answering methods
•  Asking in different languages and exploring time zones.

OpenEval: Web Information Query Evaluation
Mehdi Samadi, Manuela Veloso and Manuel Blum
Computer Science Department
Carnegie Mellon University, Pittsburgh, PA
AAAI 2013, July 16, Bellevue, WA, USA

Motivation
•  Querying by human or agent
•  Information validation
•  Open Web
•  Online/Anytime
•  Scalable
•  Few seed examples for training
•  Small ontology
[Figure: information validation example with the queries healthyFood(shrimp) and healthyFood(apple), the answer "Shrimp is healthy", the confidence values 0.72 and 0.88, and the reply "I can wait more…"]

Learning
[Figure: ontology fragment with the categories Food and Animal, the predicates healthyFood, unHealthyFood, ..., and the example instances Apple, Kale, Black Beans, Salmon, Walnut, Banana, …]

Learning
1- Given an input predicate instance and a keyword, OpenEval first formulates a search query.
A predicate instance: healthyFood(Apple)
Convert to a query: {"apple"}.
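
The query-formulation step can be illustrated with a small helper. This is a sketch under assumptions: the function name, the quoting of each argument, and the way an optional keyword is appended are choices made for the example, since the slide only shows healthyFood(Apple) becoming {"apple"}.

```python
import re


def formulate_query(predicate_instance, keyword=None):
    """Build a search string from e.g. 'healthyFood(Apple)' and an optional keyword."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", predicate_instance)
    if match is None:
        raise ValueError(f"not a predicate instance: {predicate_instance!r}")
    # Each argument becomes a lower-cased, quoted phrase.
    arguments = [a.strip().lower() for a in match.group(2).split(",")]
    terms = [f'"{a}"' for a in arguments]
    if keyword:
        terms.append(keyword.lower())
    return " ".join(terms)


print(formulate_query("healthyFood(Apple)"))
print(formulate_query("healthyFood(Apple)", keyword="Vitamin"))
print(formulate_query("worksfor(charles osgood, cbs)"))
```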
  
Learning
2- OpenEval queries the open Web and processes the retrieved unstructured Web pages.
A predicate instance: healthyFood(Apple)
Convert to a query: {"apple"}.

  
Extracting CBIs
3- OpenEval extracts a set of Context-Based Instances (CBI).
A predicate instance: healthyFood(Shrimp)
Convert to a query: {"shrimp"}.
Example CBI text:
X pomaceous fruit apple tree, species Malus domestica rose family widely known members genus Malus used humans. X grow small, deciduous trees. tree originated Central Asia, wild ancestor ...
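
The example CBI text reads like a web snippet with common words removed and the queried instance replaced by X. A rough sketch of that idea follows; the stop-word list, the window size, and the function name are assumptions for illustration, and OpenEval's actual extraction may differ.

```python
import re

# Small illustrative stop-word list (an assumption, not OpenEval's list).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def extract_cbi(snippet, instance, window=5):
    """Return the words around the first mention of `instance`, which is replaced by 'X'."""
    pattern = r"\b" + re.escape(instance) + r"\b"
    text = re.sub(pattern, " X ", snippet, flags=re.IGNORECASE)
    # Lower-case every token except the placeholder X.
    tokens = [t if t == "X" else t.lower()
              for t in re.findall(r"[A-Za-z0-9]+", text)]
    kept = [t for t in tokens if t not in STOP_WORDS]
    if "X" not in kept:
        return []
    i = kept.index("X")
    return kept[max(0, i - window): i + window + 1]


print(extract_cbi("The apple tree originated in Central Asia.", "apple", window=2))
```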
  
Learning
OpenEval extracts CBIs for each predicate.
[Figure: positive (+) and negative (-) CBIs gathered for the predicates healthyFood and unHealthyFood; ontology categories Food and Animal]
OpenEval trains an SVM for each predicate using the training CBIs.

What does OpenEval learn?
healthyFood(apple), healthyFood(apple) "vitamin"
Learn how to map instances to the appropriate predicate (i.e., sense) that they belong to.

Learning
[Figure: positive (+) and negative (-) CBIs for the predicates healthyFood and unHealthyFood]
•  Choose the predicate with maximum entropy.
•  Choose a keyword for the selected predicate.
•  Extract CBIs for the predicate using the selected keyword.
•  Re-train an SVM for the predicate.
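
The first step, choosing the predicate with maximum entropy, can be sketched as follows. The slides do not say how the entropy is computed, so this example assumes it is the mean binary entropy of a classifier's predicted probabilities on that predicate's training CBIs; the function names and the probability values are made up for illustration.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with P(positive) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def mean_entropy(probabilities):
    """Average uncertainty of a classifier over a list of predicted probabilities."""
    return sum(binary_entropy(p) for p in probabilities) / len(probabilities)


def choose_predicate(predictions):
    """Pick the predicate whose classifier is most uncertain.

    `predictions` maps a predicate name to predicted P(positive) values.
    """
    return max(predictions, key=lambda name: mean_entropy(predictions[name]))


# Made-up probabilities: the unHealthyFood classifier is the less certain one.
predictions = {
    "healthyFood": [0.95, 0.90, 0.05],
    "unHealthyFood": [0.55, 0.50, 0.40],
}
print(choose_predicate(predictions))
```

Under this assumption, the predicate whose classifier is least sure of itself is the one that receives a new keyword and fresh CBIs before re-training.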
  
Predicate Instance Evaluator
healthyFood(shrimp)?
Given the input time, which CBIs should be extracted?
keywords:
Vitamin    0.88
Calories   0.83
Grow       0.69
Tree       0.66
Amount     0.59
Minerals   0.49
...
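
One simple reading of "given the input time, which CBIs should be extracted" is to spend the time budget on the highest-scoring keywords first. The sketch below assumes a fixed cost per keyword and a greedy selection by score; both are assumptions, as is the function name. Only the keyword scores come from the slide.

```python
# Keyword scores as listed on the slide.
KEYWORD_SCORES = {
    "Vitamin": 0.88,
    "Calories": 0.83,
    "Grow": 0.69,
    "Tree": 0.66,
    "Amount": 0.59,
    "Minerals": 0.49,
}


def select_keywords(keyword_scores, time_budget, cost_per_keyword=1.0):
    """Return the best-scoring keywords that fit in the time budget.

    Assumes every keyword costs the same amount of time to process.
    """
    ranked = sorted(keyword_scores.items(), key=lambda kv: kv[1], reverse=True)
    affordable = int(time_budget // cost_per_keyword)
    return [keyword for keyword, _ in ranked[:affordable]]


print(select_keywords(KEYWORD_SCORES, time_budget=3))
```

With a larger budget the evaluator simply works further down the list, which matches the anytime behavior named in the motivation.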
  
NELL: Never-Ending Language Learner
OpenEval in the last iteration:
•  academicfield 0.8976357986206526
   Environmental Anthropology.
   Several excellent textbooks and readers in environmental anthropology have now appeared, establishing a basic survey of the field.
•  academicfield 0.912473775634353
   Anesthesiology.
   The Department of Anesthesiology is committed to excellence in clinical service, education, research and faculty development.
•  worksfor 0.9845774661303888
   (charles osgood, cbs).
   Charles Osgood, often referred to as CBS News' poet-in-residence, has been anchor of "CBS News Sunday Morning" since 1994.

NELL: Never-Ending Language Learner
Hiring Labelers:
•  Currently NELL can autonomously hire people (using Amazon's Mechanical Turk)
•  A default number of instances is sampled (uniformly distributed) from each Category and each Relation
•  Can be used to estimate precision
NELL: Never-Ending Language Learner
Hiring Labelers:
•  Task is to validate Category and Relation instances
   – Category instances: Is Google a company? Is Mountain View a city?
   – Relation instances: Is Google headquartered in Mountain View? Does Tom Mitchell work for Carnegie Mellon?

NELL: Never-Ending Language Learner
Hiring Labelers:
•  Research Questions:
   – Sampling Strategies/Adaptive Sampling
   – Quality of answers/turkers

NELL: Never-Ending Language Learner
NELL is grown enough for a new step
NELL turned 4 on Jan 12! Congratulations NELL!!
•  Knowledge on Demand: Ask NELL

estevam.hruschka@gmail.com
Thank you very much, Google Mountain View!
And thanks to Google, DARPA, NSF, and CNPq for partial funding! And thanks to Yahoo! for M45 computing, to Microsoft for the fellowship to Edith Law, to Carnegie Mellon University, and to the Federal University of São Carlos.
References	
  
•  [Fern,	
  2008]	
  Xiaoli	
  Z.	
  Fern,	
  CS	
  434:	
  Machine	
  Learning	
  and	
  Data	
  Mining,	
  	
  School	
  of	
  Electrical	
  Engineering	
  
and	
  Computer	
  Science,	
  Oregon	
  State	
  University,	
  Fall	
  	
  2008.	
  
•  [DARPA,	
  2012]	
  DARPA	
  Machine	
  Reading	
  Program,	
  hGp://www.darpa.mil/Our_Work/I2O/Programs/
Machine_Reading.aspx.	
  
•  [Mitchell,	
  2006]	
  Tom	
  M.	
  Mitchell,	
  The	
  Discipline	
  of	
  Machine	
  Learning,	
  my	
  perspec've	
  on	
  this	
  research	
  
field,	
  July	
  2006	
  (hGp://www.cs.cmu.edu/~tom/pubs/MachineLearning.pdf).	
  
•  [Mitchell,	
  1997]	
  Tom	
  M.	
  Mitchell,	
  Machine	
  Learning.	
  McGraw-­‐Hill,	
  1997.	
  
•  [Etzioni	
  et	
  al.,	
  2007]	
  Oren	
  Etzioni,	
  Michele	
  Banko,	
  and	
  Michael	
  J.	
  Cafarella,	
  Machine	
  Reading.The	
  2007	
  
AAAI	
  Spring	
  Symposium.	
  Published	
  by	
  The	
  AAAI	
  Press,	
  Menlo	
  Park,	
  California,	
  2007.	
  
•  [Clark	
  et	
  al.,	
  2007]	
  Peter	
  Clark,	
  Phil	
  Harrison,	
  John	
  Thompson,	
  Rick	
  Wojcik,	
  Tom	
  Jenkins,	
  David	
  Israel,	
  
Reading	
  to	
  Learn:	
  An	
  Inves'ga'on	
  into	
  Language	
  Understanding.	
  The	
  2007	
  AAAI	
  Spring	
  Symposium.	
  
Published	
  by	
  The	
  AAAI	
  Press,	
  Menlo	
  Park,	
  California,	
  2007.	
  
•  [Norvig,	
  2007]	
  Peter	
  Norvig,	
  	
  Inference	
  in	
  Text	
  Understanding.	
  The	
  2007	
  AAAI	
  Spring	
  Symposium.	
  
Published	
  by	
  The	
  AAAI	
  Press,	
  Menlo	
  Park,	
  California,	
  2007.	
  
•  [Wang	
  &	
  Cohen,	
  2007]	
  Richard	
  C.	
  Wang	
  and	
  William	
  W.	
  Cohen:	
  Language-­‐Independent	
  Set	
  Expansion	
  of	
  
Named	
  En''es	
  using	
  the	
  Web.	
  In	
  Proceedings	
  of	
  IEEE	
  InternaHonal	
  Conference	
  on	
  Data	
  Mining	
  (ICDM	
  
2007),	
  Omaha,	
  NE,	
  USA.	
  2007.	
  
•  [Etzioni,	
  2008]	
  Oren	
  Etzioni.	
  2008.	
  Machine	
  reading	
  at	
  web	
  scale.	
  In	
  Proceedings	
  of	
  the	
  internaHonal	
  
conference	
  on	
  Web	
  search	
  and	
  web	
  data	
  mining	
  (WSDM	
  '08).	
  ACM,	
  New	
  York,	
  NY,	
  USA,	
  2-­‐2.	
  
•  [Banko,	
  et	
  al.,	
  2007]	
  Michele	
  Banko,	
  Michael	
  J.	
  Cafarella,	
  Stephen	
  Soderland,	
  MaGhew	
  Broadhead,	
  Oren	
  
Etzioni:	
  Open	
  Informa'on	
  Extrac'on	
  from	
  the	
  Web.	
  IJCAI	
  2007:	
  2670-­‐2676	
  
References	
  
•  [Weikum	
  et	
  al.,	
  2009]	
  G.	
  Weikum,	
  G.,	
  Kasneci,	
  M.	
  Ramanath,	
  F.	
  Suchanek.	
  DB	
  &	
  IR	
  methods	
  for	
  	
  
•  knowledge	
  discovery.	
  Communica'ons	
  of	
  the	
  ACM	
  52(4),	
  2009.	
  
•  [Theobald	
  &	
  Weikum,	
  2012]	
  Mar'n	
  Theobald	
  and	
  Gerhard	
  Weikum.	
  From	
  Informa'on	
  to	
  Knowledge:	
  
Harves'ng	
  En''es	
  and	
  Rela'onships	
  from	
  Web	
  Sources.	
  Tutorial	
  at	
  PODS	
  2012	
  	
  
•  [Hoffart	
  et	
  al.,	
  2012]	
  Johannes	
  Hoffart,	
  Fabian	
  Suchanek,	
  Klaus	
  Berberich,	
  Gerhard	
  Weikum.	
  YAGO2:	
  A	
  
Spa'ally	
  and	
  Temporally	
  Enhanced	
  Knowledge	
  Base	
  from	
  Wikipedia.	
  Special	
  issue	
  of	
  the	
  Ar'ficial	
  
Intelligence	
  Journal,	
  2012	
  	
  
•  [Etzioni	
  et	
  al.,	
  2011]	
  Oren	
  Etzioni,	
  Anthony	
  Fader,	
  Janara	
  Christensen,	
  Stephen	
  Soderland,	
  and	
  Mausam	
  
"Open	
  Informa'on	
  Extrac'on:	
  the	
  Second	
  Genera'on“.	
  	
  Proceedings	
  of	
  the	
  22nd	
  InternaHonal	
  Joint	
  
Conference	
  on	
  ArHficial	
  Intelligence	
  (IJCAI	
  2011).	
  
•  [Hady	
  et	
  al.,	
  2011]	
  Hady	
  W.	
  Lauw,	
  Ralf	
  Schenkel,	
  Fabian	
  Suchanek,	
  Mar'n	
  Theobald,	
  and	
  Gerhard	
  
Weikum,	
  "Seman'c	
  Knowledge	
  Bases	
  from	
  Web	
  Sources"	
  at	
  IJCAI	
  2011,	
  Barcelona,	
  July	
  2011	
  
•  [Fader	
  et	
  al.,	
  2011]	
  Anthony	
  Fader,	
  Stephen	
  Soderland,	
  and	
  Oren	
  Etzioni.	
  "Iden'fying	
  Rela'ons	
  for	
  Open	
  
Informa'on	
  Extrac'on”.	
  Proceedings	
  of	
  the	
  2011	
  Conference	
  on	
  Empirical	
  Methods	
  in	
  Natural	
  Language	
  
Processing	
  (EMNLP	
  2011)	
  
•  SeGles,	
  B.:	
  Closing	
  the	
  loop:	
  Fast,	
  interac've	
  semi-­‐supervised	
  annota'on	
  with	
  queries	
  on	
  features	
  and	
  
instances.	
  In:	
  Proc.	
  of	
  the	
  EMNLP’11,	
  Edinburgh,	
  ACL	
  (2011)	
  1467–1478	
  5.	
  	
  
•  Carlson,	
  A.,	
  BeGeridge,	
  J.,	
  Kisiel,	
  B.,	
  SeGles,	
  B.,	
  Jr.,	
  E.R.H.,	
  Mitchell,	
  T.M.:	
  Toward	
  an	
  architecture	
  for	
  never-­‐
ending	
  language	
  learning.	
  In:	
  Proceedings	
  of	
  the	
  Twenty-­‐Fourth	
  Conference	
  on	
  Ar'ficial	
  Intelligence	
  (AAAI	
  
2010).	
  
•  Pedro,	
  S.D.S.,	
  Hruschka	
  Jr.,	
  E.R.:	
  Collec've	
  intelligence	
  as	
  a	
  source	
  for	
  machine	
  learning	
  self-­‐supervision.	
  
In:	
  Proc.	
  of	
  the	
  4th	
  Interna'onal	
  Workshop	
  on	
  Web	
  Intelligence	
  and	
  Communi'es.	
  WIC12,	
  NY,	
  USA,	
  ACM	
  
(2012)	
  5:1–5:9	
  
References	
  
•  [Appel	
  &	
  Hruschka	
  Jr.,	
  2011]	
  Appel,	
  A.P.,	
  Hruschka	
  Jr.,	
  E.R.:	
  Prophet	
  –	
  a	
  link-­‐predictor	
  to	
  learn	
new rules on Nell. In: Proceedings of the 2011 IEEE 11th International Conference on Data Mining Workshops, pp. 917–924. ICDMW '11, IEEE Computer Society, Washington, DC, USA (2011)
•  [Mohamed et al., 2011] Mohamed, T.P., Hruschka Jr., E.R., Mitchell, T.M.: Discovering relations between noun categories. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1447–1455. EMNLP '11, Association for Computational Linguistics, Stroudsburg, PA, USA (2011)
•  [Pedro & Hruschka Jr., 2012] Saulo D.S. Pedro and Estevam R. Hruschka Jr., Conversing Learning: active learning and active social interaction for human supervision in never-ending learning systems. XIII Ibero-American Conference on Artificial Intelligence, IBERAMIA 2012, 2012.
•  Krishnamurthy, J., Mitchell, T.M.: Which noun phrases denote which concepts. In: Proceedings of the Forty-Ninth Annual Meeting of the Association for Computational Linguistics (2011)
•  Lao, N., Mitchell, T., Cohen, W.W.: Random walk inference and learning in a large scale knowledge base. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 529–539. Association for Computational Linguistics, Edinburgh, Scotland, UK (July 2011), http://www.aclweb.org/anthology/D11-1049
•  E. R. Hruschka Jr., M. C. Duarte and M. C. Nicoletti. Coupling as Strategy for Reducing Concept-Drift in Never-ending Learning Environments. Fundamenta Informaticae, IOS Press, 2012.
•  Saulo D.S. Pedro, Ana Paula Appel, and Estevam R. Hruschka Jr. Autonomously reviewing and validating the knowledge base of a never-ending learning system. In Proceedings of the 22nd International Conference on World Wide Web Companion (WWW '13 Companion), 1195-120, 2013.
•  S. Verma and E. R. Hruschka Jr. Coupled Bayesian Sets Algorithm for Semi-supervised Learning and Information Extraction. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2012.
•  Navarro, L. F., Appel, A. P. and Hruschka Jr., E. R., GraphDB – Storing Large Graphs on Secondary Memory. In New Trends in Databases and Information. Advances in Intelligent Systems and Computing, Springer, 177-186, 2013.
References
•  Assuming Facts Are Expressed More Than Once. J. Betteridge, A. Ritter and T. Mitchell. In Proceedings of the 27th International Florida Artificial Intelligence Research Society Conference (FLAIRS-27), 2014.
•  Estimating Accuracy from Unlabeled Data. E. A. Platanios, A. Blum, T. Mitchell. In Uncertainty in Artificial Intelligence (UAI), 2014.
•  CTPs: Contextual Temporal Profiles for Time Scoping Facts via Entity State Change Detection. D.T. Wijaya, N. Nakashole and T.M. Mitchell. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
•  Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases. M. Gardner, P. Talukdar, J. Krishnamurthy and T.M. Mitchell. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
•  Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch. P. P. Talukdar and W. Cohen. In 17th International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
•  Programming with Personalized PageRank: A Locally Groundable First-Order Probabilistic Logic. W.Y. Wang, K. Mazaitis and W.W. Cohen. In Proceedings of the Conference on Information and Knowledge Management (CIKM), 2013.
•  Improving Learning and Inference in a Large Knowledge-base using Latent Syntactic Cues. Matt Gardner, Partha Pratim Talukdar, Bryan Kisiel, and Tom Mitchell. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), 2013.


Automatically Labeling Facts in a Never-Ending Language Learning System

  • 1. Automatically Labeling Facts in a Never-Ending Language Learning System
    Estevam R. Hruschka Jr., Federal University of São Carlos
    Joint work with the Carnegie Mellon Read the Web group
  • 5. Never-Ending Learning
    •  Main task: acquire a growing competence without asymptote
       •  over years
       •  multiple functions
       •  where learning one thing improves ability to learn the next
       •  acquiring data from humans, environment
    •  Many candidate domains:
       •  Robots
       •  Softbots
       •  Game players
  • 7. NELL: Never-Ending Language Learner
    Inputs:
    •  initial ontology
    •  handful of examples of each predicate in ontology
    •  the web
    •  occasional interaction with human trainers
    The task:
    •  run 24x7, forever
    •  each day:
       1.  extract more facts from the web to populate the initial ontology
       2.  learn to read (perform #1) better than yesterday
  • 9. NELL: Never-Ending Language Learner
    Goal:
    •  run 24x7, forever
    •  each day:
       1.  extract more facts from the web to populate given ontology
       2.  learn to read better than yesterday
    Today: running 24x7 since January 2010
    Input:
    •  ontology defining ~800 categories and relations
    •  10-20 seed examples of each
    •  1 billion web pages (ClueWeb – Jamie Callan)
    Result:
    •  continuously growing KB with more than 70,000,000 extracted beliefs
  • 11. [Figure; only the label "Human Advice" is recoverable]
  • 13. NELL: Never-Ending Language Learner
    Knowledge Base Validation in NELL
    •  Human Supervision: RTW group members;
    •  Conversing Learning: NELL can autonomously talk to people in web communities and ask for help;
    •  Web Querying: NELL can query the Web on specific facts to verify correctness, or to predict the validity of a new fact;
    •  Hiring Labelers: NELL can autonomously hire people (using web services such as Mechanical Turk) to label data and help the system to validate acquired knowledge.
  • 22. Conversing Learning
    Basic Steps:
    •  Decide which task is going to be asked
    •  Determine who are the oracles the ML system is going to consult
    •  Propose a method of conversation with oracles, often humans
    •  Determine how to feed the community inputs back into the ML system
  • 24. Conversing Learning
    Decide which task is going to be asked:
    •  Learned facts
    •  Learned inference rules
    •  Metadata (mainly for automatically extending the ontology)
  • 26. Conversing Learning
    Who are the oracles the ML system is going to consult?
    Yahoo! Answers
    –  very popular on the Web
    –  a lot of metadata to harvest
    Twitter
    –  millions of users worldwide
    –  a system that was not designed to work as a QA environment
    Both web communities have an API to connect to their database
  • 29. Conversing Learning
    Propose a method of conversation with oracles, often humans
    Macro Question-Answering
    For each posted question:
    –  Ask for simple yes/no answers
    –  Try to understand every answer
    –  Discard answers too difficult to understand
    –  Conclude based only on fully understood answers
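The macro question-answering steps above can be sketched in Python as follows. This is a minimal illustration, not NELL's actual code: the `parse_yes_no` helper, its regular expression and the majority-vote rule in `conclude` are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import Optional

def parse_yes_no(answer: str) -> Optional[bool]:
    """Return True/False for a clearly understood yes/no answer, else None.

    Hypothetical parser: only the first word of the reply is inspected.
    """
    match = re.match(r"\s*(yes|yeah|yep|no|nope)\b", answer, flags=re.IGNORECASE)
    if match is None:
        return None  # too difficult to understand: discard
    return match.group(1).lower() in {"yes", "yeah", "yep"}

def conclude(answers: list[str]) -> Optional[bool]:
    """Majority vote over the fully understood answers only (assumed rule)."""
    votes = Counter(v for v in map(parse_yes_no, answers) if v is not None)
    if not votes or votes[True] == votes[False]:
        return None  # nothing understood, or a tie: no conclusion
    return votes[True] > votes[False]
```

Returning `None` for unparsed replies keeps the "discard" and "conclude" steps separate, so a stricter or looser parser can be swapped in without touching the vote.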
  • 31. Conversing Learning
    How to feed the community inputs back into the ML system?
    Suggested actions to NELL:
    –  Synonym/co-reference resolution
    –  Automatically update the Knowledge Base
  • 32. Conversing Learning
    Some Initial Results with First Order Rules:
    •  Take top 10% of rules from Rule Learner
    •  60 rules were converted into questions and asked with both the regular and the Yes/No question approach
    •  The 120 questions received a total of 350 answers.
  • 33. Conversing Learning
    Some Initial Results with First Order Rules:
    •  Rule extracted from NELL in PROLOG format:
       stateLocatedInCountry(x,y) :- statehascapital(x,z), citylocatedincountry(z,y)
    •  Converted into question:
       Is this statement always true? If state X has capital Z and city Z is located in country Y then state X is located in country Y.
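One way to turn such a rule into a question is with per-predicate text templates. The sketch below assumes a small hand-written template table (`TEMPLATES`) and a tuple representation of the rule. Both are hypothetical, since the slides do not describe how NELL actually generates its questions.

```python
# Hypothetical templates: predicate name -> English pattern over its two arguments.
TEMPLATES = {
    "statehascapital": "state {0} has capital {1}",
    "citylocatedincountry": "city {0} is located in country {1}",
    "statelocatedincountry": "state {0} is located in country {1}",
}

def verbalize(literal):
    """Render one literal, e.g. ("statehascapital", "X", "Z"), as English."""
    predicate, *args = literal
    return TEMPLATES[predicate.lower()].format(*args)

def rule_to_question(head, body):
    """Build a yes/no question from a Horn rule `head :- body`."""
    conditions = " and ".join(verbalize(lit) for lit in body)
    return (f"Is this statement always true? If {conditions} "
            f"then {verbalize(head)}.")

question = rule_to_question(
    head=("stateLocatedInCountry", "X", "Y"),
    body=[("statehascapital", "X", "Z"), ("citylocatedincountry", "Z", "Y")],
)
print(question)
```

With these templates the printed question matches the wording on the slide. A predicate without a template raises `KeyError`, which makes missing coverage visible instead of producing a malformed question.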
  • 34. Conversing Learning
    Question: (Yes or No?) If athlete Z is member of team X and athlete Z plays in league Y, then team X plays in league Y.
    •  Twitter answers sample:
       No. (Z in X) ∧ (Z in Y) → (X in Y)
    •  Yahoo! Answers sample:
       NO, Not in EVERY case. Athlete Z could be a member of football team X and he could also play in his pub's Friday nights dart team. The Dart team could play in league Y (and Z therefore by definition plays in league Y). This does not mean that the football team plays in the darts league!
  • 37. Conversing Learning
    Some Initial Results with Facts Validation: [results shown as figures; not recoverable from the transcript]
  • 41. Conversing Learning
    Some Initial Results with Metadata:
    •  Question: Could you please give me some examples of clothing?
    •  Answer 01: Snowshoes, rain ponchos, galoshes, sunhats, visors, scarves, mittens, and wellies are all examples of weather specific clothing!
    •  Answer 02: pants
    •  Answer 03: Training shoes can be worn by anyone for any purpose, but the term means to train in sports
  • 42. Conversing Learning
    Some Initial Results with Metadata:
    •  Users replied with 552 seeds for 129 categories
    •  Total of 5900 promotions with seeds created by NELL's developers
    •  Total of 5300 promotions with seeds extracted from answers of Twitter users (similar precision)
  • 43. Conversing Learning
    Some Initial Results with Metadata:
    •  For Relation Discovery Components
       –  Symmetry: Is it always true that if a person P1 is neighbor of a person P2, then P2 is neighbor of P1?
       –  Anti-symmetry: Is it always true that if a person P1 is the coach of a person P2, then P2 is not coach of P1?
  • 44. Conversing Learning
    Some Initial Results with Metadata:
    •  Feature Weighting/Selection for CMC
       –  Logistic Regression features are based on noun phrase morphology
       –  (true or false) hotel names tend to be compound noun phrases having "hotel" as the last word.
       –  (true or false) a word having "burgh" as suffix (e.g. Pittsburgh) tends to be a city name.
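To make the morphology-based features concrete, here is a small Python sketch of the kind of binary features the two true/false questions refer to. The feature names and the function are hypothetical and are not taken from CMC itself.

```python
def morphology_features(noun_phrase: str) -> dict[str, bool]:
    """Binary morphological features of a noun phrase (illustrative only)."""
    words = noun_phrase.lower().split()
    return {
        # more than one word, as in "Hilton Hotel"
        "is_compound": len(words) > 1,
        # the phrase ends with the word "hotel"
        "last_word_is_hotel": bool(words) and words[-1] == "hotel",
        # the last word carries the suffix "burgh", as in "Pittsburgh"
        "has_suffix_burgh": bool(words) and words[-1].endswith("burgh"),
    }
```

A logistic regression model would assign one weight per feature. A community answer of "true" or "false" to each question could then be used to keep, drop or re-weight the matching feature.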
  • 45. Conversing Learning
    Ongoing and future work
    •  Asking the right community and the right person
    •  Asking the right thing to maximize the results with minimum questions (multi-view Active Learning)
    •  Better Question-Answering methods
    •  Asking in different languages and exploring time zones.
  • 48. OpenEval: Web Information Query Evaluation
    Mehdi Samadi, Manuela Veloso and Manuel Blum
    Computer Science Department, Carnegie Mellon University, Pittsburgh, PA
    AAAI 2013, July 16, Bellevue, WA, USA
  • 49. Motivation
    •  Querying by human or agent
    •  Information validation
    •  Open Web
    •  Online/Anytime
    •  Scalable
    •  Few seed examples for training
    •  Small ontology
    [Figure: information validation example with queries healthyFood(shrimp) and healthyFood(apple), speech bubbles "Shrimp is healthy" and "I can wait more…", and confidence scores 0.72 and 0.88]
  • 50. Learning
    [Figure: ontology with categories Food and Animal and predicates healthyFood, unHealthyFood, …; instances listed: Apple, Kale, Black Beans, Salmon, Walnut, Banana, …]
  • 51. Learning
    1. Given an input predicate instance and a keyword, OpenEval first formulates a search query.
       Predicate instance: healthyFood(Apple)
       Converted to the query: {"apple"}
  • 52. Learning
    2. OpenEval queries the open Web and processes the retrieved unstructured Web pages.
  • 53. Extracting CBIs
    3. OpenEval extracts a set of Context-Based Instances (CBIs).
       Predicate instance: healthyFood(Shrimp)
       Converted to the query: {"shrimp"}
       Example CBIs shown on the slide:
       –  X pomaceous fruit apple tree, species Malus domestica rose family widely known members genus Malus used humans.
       –  X grow small, deciduous trees. tree originated Central Asia, wild ancestor …
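The CBI examples above suggest that a context is built by replacing the queried instance with a placeholder `X` and dropping function words. The Python sketch below follows that reading. It is an assumption based only on the slide text: the stopword list, the tokenizer and the plural handling are hypothetical, and OpenEval's actual extraction may differ (for instance, the slide keeps "apple" in "apple tree").

```python
import re

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "by", "in", "is", "it", "of", "on", "that", "the", "to"}

def extract_cbi(sentence: str, instance: str) -> list[str]:
    """Turn one sentence mentioning `instance` into a bag-of-words context.

    Tokens equal to the instance (ignoring case and a plural "s") become the
    placeholder "X"; stopwords are dropped; other tokens are kept in order.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    target = instance.lower()
    context = []
    for token in tokens:
        if token in (target, target + "s"):
            context.append("X")
        elif token not in STOPWORDS:
            context.append(token)
    return context

print(extract_cbi("Apples grow on small, deciduous trees.", "apple"))
```

Contexts of this form can then be turned into feature vectors for the per-predicate classifier that the later slides describe.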
  • 54. Learning
    OpenEval extracts CBIs for each predicate.
    [Figure: positive (+) and negative (-) CBIs for healthyFood and unHealthyFood]
  • 55. Learning
    OpenEval extracts CBIs for each predicate.
    OpenEval trains an SVM for each predicate using training CBIs.
  • 56. What does OpenEval learn?
    healthyFood(apple)
    healthyFood(apple) "vitamin"
    Learn how to map instances to the appropriate predicate (i.e., sense) that they belong to.
  • 57. Learning
    [Figure: positive (+) and negative (-) CBIs for healthyFood and unHealthyFood]
  • 58. Learning
    •  Choose predicate with maximum entropy.
    •  Choose a keyword for the selected predicate.
    •  Extract CBIs for the predicate using the selected keyword.
    •  Re-train an SVM for the predicate.
  • 59. Predicate Instance Evaluator
    healthyFood(shrimp)?
    Given the input time, which CBIs should be extracted?
    Keywords:
       Vitamin   0.88
       Calories  0.83
       Grow      0.69
       Tree      0.66
       Amount    0.59
       Minerals  0.49
       …
  • 60. NELL: Never-Ending Language Learner
    OpenEval in the last iteration:
    academicfield 0.8976357986206526 Environmental Anthropology.
    Several excellent textbooks and readers in environmental anthropology have now appeared, establishing a basic survey of the field.
  • 61. NELL: Never-Ending Language Learner
    OpenEval in the last iteration:
    academicfield 0.912473775634353 Anesthesiology.
    The Department of Anesthesiology is committed to excellence in clinical service, education, research and faculty development.
  • 62. NELL: Never-Ending Language Learner
    OpenEval in the last iteration:
    worksfor 0.9845774661303888 (charles osgood, cbs).
    Charles Osgood, often referred to as CBS News' poet-in-residence, has been anchor of "CBS News Sunday Morning" since 1994.
  • 65. NELL: Never-Ending Language Learner
    Hiring Labelers:
    •  Currently NELL can autonomously hire people (using Amazon's Mechanical Turk)
    •  A default number of instances is sampled (uniformly) from each Category and each Relation
    •  Can be used for precision estimation
  • 66. NELL: Never-Ending Language Learner
    Hiring Labelers:
    •  Task is to validate Category and Relation instances
       –  Category instances: Is Google a company? Is Mountain View a city?
       –  Relation instances: Is Google headquartered in Mountain View? Does Tom Mitchell work for Carnegie Mellon?
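The sampling and precision-estimation idea from the hiring-labelers slides can be sketched as below. This is an illustrative Python example under stated assumptions: the knowledge base is modeled as a plain dictionary, `per_predicate` stands in for the "default number of instances", and the labels are whatever the hired labelers return. None of these names come from NELL.

```python
import random

def sample_for_labeling(kb, per_predicate=5, seed=0):
    """Uniformly sample up to `per_predicate` instances from each predicate.

    `kb` maps a category or relation name to its list of instances.
    """
    rng = random.Random(seed)
    return {
        predicate: rng.sample(instances, min(per_predicate, len(instances)))
        for predicate, instances in kb.items()
    }

def estimate_precision(labels):
    """Fraction of sampled instances that labelers marked as correct."""
    return sum(labels) / len(labels) if labels else None

# Toy knowledge base built from the examples on the slide.
kb = {
    "company": ["Google"],
    "city": ["Mountain View", "Pittsburgh"],
    "headquarteredIn": [("Google", "Mountain View")],
    "worksFor": [("Tom Mitchell", "Carnegie Mellon")],
}
batch = sample_for_labeling(kb, per_predicate=2)
```

Each sampled instance would be posted as a yes/no task such as "Is Google a company?". Passing the returned labels for one predicate to `estimate_precision` gives a per-predicate precision estimate.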
  • 67. NELL: Never-Ending Language Learner
    Hiring Labelers:
    •  Research Questions:
       –  Sampling Strategies/Adaptive Sampling
       –  Quality of answers/turkers
  • 68. NELL: Never-Ending Language Learner
    NELL is grown enough for a new step
    NELL turned 4 on Jan 12! Congratulations NELL!!
  • 72. NELL: Never-Ending Language Learner
    NELL is grown enough for a new step
    •  Knowledge on Demand – Ask NELL
  • 73. estevam.hruschka@gmail.com
    Thank you very much, Google Mountain View!
    And thanks to Google, DARPA, NSF, CNPq for partial funding, to Yahoo! for M45 computing, to Microsoft for the fellowship to Edith Law, to Carnegie Mellon University and to the Federal University of São Carlos.
Autonomously  reviewing  and  valida'ng  the   knowledge  base  of  a  never-­‐ending  learning  system.  In  Proceedings  of  the  22nd  internaHonal  conference  on  World   Wide  Web  companion  (WWW  '13  Companion),  1195-­‐120,  2013.   •  S.  Verma  and  E.  R.  Hruschka  Jr.  Coupled  Bayesian  Sets  Algorithm  for  Semi-­‐supervised  Learning  and  Informa'on   Extrac'on.  In  Proceedings  of  the  European  Conference  on  Machine  Learning  and  Principles  and  Prac'ce  of   Knowledge  Discovery  in  Databases  (ECML  PKDD),  2012.   •  Navarro,  L.  F.  and  Appel,  A.  P.  and  Hruschka  Jr.,  E.  R.,  GraphDB  –  Storing  Large  Graphs  on  Secondary  Memory.  In   New  Trends  in  Databases  and  Informa'on.  Advances  in  Intelligent  Systems  and  Compu'ng,  Springer,  177-­‐186,   2013.  
  • 77. References   •  Assuming  Facts  Are  Expressed  More  Than  Once.     J.  BeGeridge,  A.  RiGer  and  T.  Mitchell  In  Proceedings  of  the  27th  Interna'onal  Florida  Ar'ficial   Intelligence  Research  Society  Conference  (FLAIRS-­‐27),  2014.     •  EsBmaBng  Accuracy  from  Unlabeled  Data.     E.  A.  Platanios,  A.  Blum,  T.  Mitchell.  In  Uncertainty  in  Ar'ficial  Intelligence  (UAI),  2014.     •  CTPs:  Contextual  Temporal  Profiles  for  Time  Scoping  Facts  via  EnBty  State  Change  DetecBon.     D.T.  Wijaya,  N.  Nakashole  and  T.M.  Mitchell.  In  Proceedings  of  the  Conference  on  Empirical   Methods  in  Natural  Language  Processing  (EMNLP),  2014.   •  IncorporaBng  Vector  Space  Similarity  in  Random  Walk  Inference  over  Knowledge  Bases.   M.  Gardner,  P.  Talukdar,  J.  Krishnamurthy  and  T.M.  Mitchell.  In  Proceedings  of  the  Conference  on   Empirical  Methods  in  Natural  Language  Processing  (EMNLP),  2014.     •  Scaling  Graph-­‐based  Semi  Supervised  Learning  to  Large  Number  of  Labels  Using  Count-­‐Min   Sketch   P.  P.  Talukdar,  and  W.  Cohen  In  17th  Interna'onal  Conference  on  Ar'ficial  Intelligence  and   Sta's'cs  (AISTATS,  2014.     •  Programming  with  Personalized  PageRank:  A  Locally  Groundable  First-­‐Order  ProbabilisBc  Logic.     W.Y.  Wang,  K.  Mazai's  and  W.W.  Cohen.  In  Proceedings  of  the  Conference  on  Informa'on  and   Knowledge  Management  (CIKM),  2013.   •  Improving  Learning  and  Inference  in  a  Large  Knowledge-­‐base  using  Latent  SyntacBc  Cues.     MaG  Gardner,  Partha  Pra'm  Talukdar,  Bryan  Kisiel,  and  Tom  Mitchell.  In  Proceedings  of  the  2013   Conference  on  Empirical  Methods  in  Natural  Language  Processing  (EMNLP  2013),  2013.