Information Highlighting
Tim Ostler, Cognitive Architecture, Anaphora Ltd [email_address]
InfoVis’99, London, 16 July 1999

Summary
1 Highlighters
2 Highlighting as information visualisation
3 Past studies of visual cueing
4 User study
5 Heuristics
6 Identifying discourse markers
7 “Given” and “new” information
8 Future directions

1 Highlighters: 1 Origins; 2 Cognitive function; 3 Highlighting for others

Highlighters 1/3: Origins
- 1960s: use of yellow fibre or felt pens to highlight text begins in the USA
- 1971: Schwan-Stabilo of West Germany launches the first fluorescent highlighter pen

Highlighters 2/3: Cognitive function
- Highlighting feels as though it helps with revising, perhaps by encoding or priming material for incorporation into long-term memory
- Partly confirmed by research: Hult et al. (1984) found that note-taking does involve semantic encoding

Highlighters 3/3: Highlighting for others
- Also used to mark up a text for the selective attention of another person
- This function was chosen for study because of its clear application to information overload
- A user study was conducted to define suitable heuristics for text selection

2 Highlighting as information visualisation: 1 Syntax highlighting; 2 SeeSoft; 3 TextLight; 4 Readers vs. authors

Highlighting as info visualisation 1/4: Syntax highlighting
- Highlighting can be seen as a means of visualising the logical or conceptual structure of a text
  - Enhances understanding of the text
  - Guides the eye to the most important passages
- The principle is widely demonstrated by syntax highlighting in text editors for programmers
  - Useful: the need to visualise logical structure is acute
  - Easy: programming languages offer a finite and precise set of cues for editors to detect and colour

Highlighting as info visualisation 2/4: SeeSoft
- One of a suite of text structure visualisation tools from a team led by Stephen Eick at Lucent (formerly Bell) Laboratories
- Each line of code is reduced to a line of single-pixel thickness, coloured according to a range of user-specified criteria
- Thousands of lines of code can be displayed on the screen at once

Highlighting as info visualisation 3/4: TextLight
- Conceived as a tool to:
  - Detect certain attributes of a text’s cognitive structure
  - Encode them in visual, non-lexical form
  - Superimpose them in place on the corresponding text
- Like a GIS, it can reveal attributes of its data set that would otherwise be obscured, throwing the underlying structure into high relief

Highlighting as info visualisation 4/4: Readers vs. authors
- For readers, there are no benefits from using different colours for different categories of “new” information
- But for authors and text analysts, extending TextLight to identify text attributes is as valuable as colouring different CAD layers is to architects
- Revealing the pattern of distribution of attributes such as readability or levels of completion acts like a knowledge discovery system for authors

3 Past studies of visual cueing: 1 Judging importance; 2 Choosing words 1; 3 Choosing words 2; 4 Core content; 5 How many words?; 6 Large variance

Past studies 1/6: Judging importance
- Hubert Dreyfus: is the ability to tell the important from the unimportant a fundamentally human cognitive operation?
- Perhaps, but in some genres there is widespread agreement on the signals for different stages in a discourse
- So while we can’t tell what seems important to every person, we can assess what is being presented as important

Past studies 2/6: Choosing words 1
- Weakness of all research: no formal rules on which text to cue
- Foster (1979): 26 students and lecturers were given a 3400-word text and asked to underline sentences containing the key ideas the author was trying to put over
  - Half the subjects were told to underline no more than 16 sentences, half no more than 8
  - First case: 213 selections spanned 80 sentences, with only 9 sentences selected by 6 or more subjects
  - Second case: 102 selections were distributed over 52 sentences, with only 2 selected by 6 or more subjects
  - Foster’s conclusion: it is difficult to identify sections for cueing

Past studies 3/6: Choosing words 2
- Other experiments:
  - Klare et al. (1955) cued single words
  - Dearborn et al. (1949) emphasised the word carrying the “peak stress” in a sentence (but did not describe how the word was selected)
  - Crouse & Idstein (1972) cued statements or sentences

Past studies 4/6: “Core” content
- Most specific suggestions made by Hershberger & Terry (1965)
- “Core” content made up 1/3 of total text length:
  - New key words
  - Familiar key words
  - Key statements
  - Basic core statements
  - Key examples
  - Rephrasing of key statements

Past studies 5/6: How many words?
- Crouse & Idstein (1972): the density of cued material influences its effect
- Foster (1979): the optimal proportion of text to be highlighted is still not established

Past studies 6/6: Large variance
- Fowler & Barker (1974) pointed to the large variance (4% to 32%) observed in the proportion of text highlighted by members of the test group who were asked to highlight for themselves
- Rickards & August (1975): asked to highlight passages of structural importance, test subjects all chose passages that Rickards & August considered relatively unimportant

4 User study: 1 Experimental procedure; 2 Analytical procedure; 3 Analysis of results; 4 Observations

User study 1/4: Experimental procedure
- 11 subjects were provided with a 1111-word article from the Financial Times IT supplement, with instructions to imagine they were corporate librarians identifying the key points in an article for a board member
- Questionnaire sought:
  - Subjects’ past experience of highlighting
  - Criteria for text selection
  - At what points they made their selection
  - Other comments

User study 2/4: Analytical procedure
- The article was entered into a spreadsheet as the left axis, spanning 1111 rows (one word per row)
- The attributes of each word (36 categories) were entered along the top of the spreadsheet
- For each word, the probability of lying in a highlighted passage was given as a decimal figure between 0 and 1
- All other parameters were rebased to fall between 0 and 1
- This gave the correlation of any given parameter with the probability that a word fell within a highlighted group of words

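The analytical procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which correlation measure was used or how parameters were rebased, so Pearson correlation and min-max scaling are assumed here, and the function names and the six-word toy data are hypothetical.

```python
from statistics import mean

def highlight_probability(selections, n_words):
    """Fraction of subjects who highlighted each word.

    selections: one set of word indices per subject (hypothetical layout).
    """
    return [sum(i in s for s in selections) / len(selections)
            for i in range(n_words)]

def rebase(values):
    """Min-max scale a column to the range 0..1 (assumed meaning of 'rebased')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

# Toy data: six words and three subjects (not taken from the study).
words = ["the", "new", "processor", "is", "considerably", "faster"]
selections = [{1, 2}, {2, 4, 5}, {2}]

prob = highlight_probability(selections, len(words))
word_length = rebase([len(w) for w in words])  # one example attribute column
r = pearson(word_length, prob)
print(f"correlation of word length with highlight probability: {r:.2f}")
```

Each of the 36 attribute categories would be one more column passed through `rebase` and `pearson` in the same way.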
User study 3/4: Analysis of results
- Results show wide variance in the number of words highlighted
  - Minimum of 50 (4.5%)
  - Maximum of 396 (35.64%)
  - (Fowler & Barker 1974: 4-32%)
- Marked difference between male and female subjects
  - Males averaging 15%
  - Females 25.5%
- Little correlation between part of speech or syntactic role and probability of highlighting
- Noticeable association with longer words

User study 4/4: Observations
- None of the subjects made highlighting decisions before having read at least one paragraph
- A large majority (70%) delayed highlighting until the whole passage had been read
- Conclusion: decisions are made at a discourse-analytical and not a strictly linguistic level

5 Heuristics: 1 Correlation with average choice; 2 Key correlations; 3 Best heuristics; 4 Highlighting by humans 1; 5 Highlighting by humans 2; 6 Highlighting by best heuristics; 7 Performance of best heuristics

Heuristics 1/7: Correlation with average choice
- The average correlation between any one person’s highlighting decisions and the scores for the probability of given words being highlighted was 0.44
- For any individual word the probability varied between 0 and 0.83, offering clear guidelines for assessing any trial selection criteria

Heuristics 2/7: Key correlations

Heuristics 3/7: Best heuristics
- Most successful heuristics:
  1. Word should be part of the first statement in a discourse segment
  2. Word should be part of the first statement in any quote that is not an immediate continuation of a previous quote
  3. Word should be part of a list
  4. Word should be part of the “solution” stage

Heuristics 4/7: Highlighting by humans 1
- Areas where the probability of highlighting is greater than 0.4

Heuristics 5/7: Highlighting by humans 2
- Areas where the probability of highlighting is greater than 0.33

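Marking the areas where the probability of highlighting exceeds a cut-off amounts to finding contiguous runs of words above that value. A minimal sketch follows; the function name and the toy probabilities are hypothetical, and only the two cut-offs (0.4 and 0.33) come from the slides.

```python
def spans_above(probabilities, threshold):
    """Return (start, end) index pairs, end exclusive, for each run of
    consecutive words whose probability is strictly above the threshold."""
    spans = []
    start = None
    for i, p in enumerate(probabilities):
        if p > threshold:
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            spans.append((start, i))  # the run ended at the previous word
            start = None
    if start is not None:
        spans.append((start, len(probabilities)))
    return spans

# Toy per-word probabilities (not taken from the study).
probs = [0.1, 0.5, 0.45, 0.35, 0.0, 0.6]
print(spans_above(probs, 0.4))   # [(1, 3), (5, 6)]
print(spans_above(probs, 0.33))  # [(1, 4), (5, 6)]
```

Lowering the cut-off from 0.4 to 0.33 widens the first span by one word, which is the kind of difference the two slides contrast.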
Heuristics 6/7: Highlighting by best heuristics
- Key: first statement in a quote; “solution” stage; first statement in a discourse segment

Heuristics 7/7: Performance of best heuristics
- The best combination of heuristics produced a correlation with actual highlighting probability of 0.56 (average of 0.43 for test subjects)
- In other words, selecting text according to specified criteria achieved a correlation that was greater than all but one of the test subjects achieved, and considerably higher than the average
- But the challenge is to identify the markers denoting relevant features in a discourse

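One way to look for a best combination of heuristics is to try every subset and keep the one that correlates most strongly with the observed highlighting probability. The slides do not describe how heuristics were combined or scored, so the sketch below assumes that a word is selected when any heuristic in the subset applies, and that Pearson correlation is the measure. All names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def best_combination(flags, prob):
    """flags maps a heuristic name to a 0/1 value per word.

    Returns (score, names) for the subset whose union of flags
    correlates best with the highlighting probability.
    """
    best = (float("-inf"), ())
    names = sorted(flags)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            # Assumption: a word is selected if any heuristic in the subset fires.
            merged = [max(flags[n][i] for n in combo) for i in range(len(prob))]
            score = pearson(merged, prob)
            if score > best[0]:
                best = (score, combo)
    return best

# Toy data for six words (not taken from the study).
prob = [0.8, 0.7, 0.1, 0.0, 0.6, 0.1]
flags = {
    "segment_opening": [1, 1, 0, 0, 0, 0],
    "list_item":       [0, 0, 0, 0, 1, 0],
    "long_word":       [0, 1, 1, 0, 0, 1],
}
score, names = best_combination(flags, prob)
print(names, round(score, 2))
```

Exhaustive search is only practical for a handful of heuristics, since the number of subsets doubles with each one added.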
6 Identifying discourse markers: 1 Segments; 2 Statements; 3 Solution stages; 4 Stage labels; 5 Cue words as signals; 6 “Solution” signals

Identifying discourse markers 1/6: Segments
- Different means of discourse segmentation are beyond the scope of this paper
- Segment boundaries most often coincide with the beginning of paragraphs, and segments normally begin with a proposition or assertion
- Most effective technique found: select the opening statement in its simplest form

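As a rough illustration of selecting the opening statement, the sketch below treats each paragraph as a segment and takes its first sentence. Both simplifications are assumptions: the slides leave segmentation out of scope, and the regular expression used to find the end of a sentence will be fooled by abbreviations. The function name is hypothetical.

```python
import re

def opening_statements(text):
    """Return the first sentence of each blank-line-separated paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    openings = []
    for paragraph in paragraphs:
        # Shortest prefix ending in ., ! or ? that is followed by whitespace or the end.
        match = re.match(r".+?[.!?](?=\s|$)", paragraph, flags=re.DOTALL)
        openings.append(match.group(0) if match else paragraph)
    return openings

sample = (
    "Cars are a common way of getting from A to B. They cause congestion.\n\n"
    "The solution is to get people to use public transport. It is quick."
)
for statement in opening_statements(sample):
    print(statement)
```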
Identifying discourse markers 2/6: Statements
- Sometimes preceded or followed by a coherence relation: a question or other linguistic feature that makes the proposition’s relevance to the preceding text clear
- The following text tends to fill out details and/or provide supporting evidence for the assertion

Identifying discourse markers 3/6: Solution stage
- “Situation-problem-solution-evaluation” structure
- Narrative structures: boy meets girl – boy loses girl – boy regains girl – boy & girl live happily ever after
- Feature articles: dogs make great pets – however they can get fleas – Winalot have now launched a new anti-flea dog food – owners have declared it a success

Identifying discourse markers 4/6: Stage signals
- Hoey (1994): elements of structure are often signalled by characteristic words
- Stage signals are the most basic level
- Example: “Cars are a common way of getting from A to B. However, the congestion that they cause is a problem. The solution is to get people to use public transport. In this way everyone can get to work quickly.”

Identifying discourse markers 5/6: Cue words as signals
- Hoey (ibid.): discourse structure is essentially evaluative
- E.g. “If thyristors are used to control the motor of an electric car, the vehicle moves smoothly but with poor efficiency at low speeds”
- The “problem” stage is signalled by the negative evaluation “poor”
- So stages can be identified by spotting cue words or phrases

Identifying discourse markers 6/6: “Solution” signals
- TextLight need only be concerned with “solution” signals
- Two examples of such signals:
  - Words to do with “solving”, “developing” or “inventing”
  - Change of verb form into the present perfect tense, as in “have -ed”; the tense then reverts to simple present to denote that a new situation exists as a result of the solution

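The two “solution” signals can be approximated with simple pattern matching. The sketch below is not TextLight’s implementation (which the slides describe as written in Prolog with its own lexical dictionary); the stem list, the regular expression and the function name are assumptions made for illustration.

```python
import re

# Assumed stems for words to do with solving, developing or inventing.
SOLUTION_STEMS = ("solv", "solution", "develop", "invent")

# "has/have", an optional intervening word, then a word ending in "-ed".
# Irregular participles ("have built") are not caught by this pattern.
PRESENT_PERFECT = re.compile(r"\b(?:has|have)\s+(?:\w+\s+)?\w+ed\b", re.IGNORECASE)

def solution_signals(sentence):
    """Return the cue words found, plus a marker if a present perfect form appears."""
    words = re.findall(r"[a-z]+", sentence.lower())
    found = [w for w in words if w.startswith(SOLUTION_STEMS)]
    if PRESENT_PERFECT.search(sentence):
        found.append("present perfect")
    return found

print(solution_signals("Winalot have now launched a new anti-flea dog food"))
print(solution_signals("The solution is to get people to use public transport."))
print(solution_signals("Dogs make great pets"))
```

The tense test on its own is crude: “owners have declared it a success” also matches, although in the feature-article example it belongs to the evaluation stage, so a fuller version would also need to track the reversion to simple present.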
7 “Given” and “new” information: 1 Highlighting the new; 2 Narrative stages; 3 Importance; 4 Intonation; 5 First statement; 6 Lists; 7 “Solution” as “new”; 8 Quasi-revision; 9 Levels of “new”

“Given” and “new” information 1/9: Highlighting the new
- Why were the best heuristics more effective than the others?
- Prague school (1930s): information is composed of a mixture of “given” and “new” information
- Proposition: the essential factor behind the choice of text to highlight is that the best heuristics are all ways in which “new” information is signalled at the discourse level

“Given” and “new” information 2/9: Narrative stages
- The theory is supported by the fact that 80% of subjects stated that they were highlighting words that “marked significant stages in the narrative.”
- This implies information that is new in the context of the preceding text

“Given” and “new” information 3/9: Importance
- We can argue that an idea’s perceived importance is judged according to the extent to which it:
  - Is new as opposed to given
  - Matches a perceived gap in the structure of the reader’s domain knowledge
- When highlighting on behalf of others, we have to make an informed judgement on how the ultimate reader will define importance

“Given” and “new” information 4/9: Intonation
- Halliday (1970): in spoken discourse, intonation is used to signal to the listener what the speaker understands to be new information
- Could highlighting perform an equivalent function?

“Given” and “new” information 5/9: First statement
- The first statement in a paragraph can be considered as supporting structure for the statement at the beginning of the discourse segment that contains it
- It operates as one of the primary statements containing most of the “new” information in the document

“Given” and “new” information 6/9: Lists
- Lists typically act as a systematic tabulation of what the author believes to be important (i.e. “new” and relevant) information
- Often used for predictive purposes within a discourse, or for enumerating significant points
- People therefore tend to identify lists as concentrated sources of meaning, and as such eligible for highlighting
- A speaker might very well emphasise this by counting the points off on the fingers of one hand

“Given” and “new” information 7/9: “Solution” as “new”
- Solution stages comprise “new” information: a climactic point of novelty in the schema, justifying their status as “highlightable” text
- If an article is modelled as a histogram, with columns depicting sentences plotted against new information content, highlighting is like slicing across the graph using a threshold value

“Given” and “new” information 8/9: Quasi-revision
- Criteria and procedure would have been different for quasi-revision:
  - Shorter range
  - More spontaneously applied
- The reader has more detailed knowledge of what is “new” information for him/herself
- Highlighting can be done:
  - In real time
  - With greater precision

“Given” and “new” information 9/9: Levels of “newness”
- Information can also be perceived as “new” at several levels:
  - Within a sentence, particular words can be seen as new
  - Within a paragraph, some sentences can be interpreted as new and others as contextual or supporting information
  - Within a discourse segment or discourse, still longer passages may be perceived as containing “new” information

8 Future directions: 1 Highlighting long neglected; 2 Virtues of highlighting; 3 TextLight: to do

Future directions 1/3: Highlighting long neglected
- The study of the selection of words for highlighting has previously been neglected
- The potential of automatic highlighting as a tool to handle information overload has also been neglected

Future directions 2/3: Virtues of highlighters
- Output is familiar to users
- Highlighting has been shown to be helpful in content recall
- Addresses the issue of confidence
- Highlighting acts not as a censor but as a guide: non-selected text (and therefore the context) is always in view
- Suitable as a plug-in module for other programs

Future directions 3/3: TextLight: to do
- Incorporate discourse segmentation algorithms
- Complete the lexical dictionary for cue recognition
- Port from Prolog to Java for greater portability

TextLight URLs
- http://www.cogarch.demon.co.uk/textlight.html
- mailto:timo@cogarch.com

Information Highlighting

  • 1. Information Highlighting Tim Ostler Cognitive Architecture Anaphora Ltd [email_address] InfoVis’99 London 16 July 1999
  • 2. Summary 1 Highlighters 2 Highlighting as information visualisation 3 Past studies of visual cueing 4 User study 5 Heuristics 6 Identifying discourse markers 7 “Given” and “new” information 8 Future directions
  • 3. 1 Highlighters 1 Origins 2 Cognitive function 3 Highlighting for others
  • 4. Highlighters 1/3 Origins 1960s: use of yellow fibre or felt pens to highlight text begins in the USA 1971: Schwan-Stabilo of West Germany launches first fluorescent highlighter pen
  • 5. Highlighters 2/3 Cognitive function Highlighting feels as though it helps revising, perhaps by encoding or priming material for incorporation into long-term memory Partly confirmed by research: Hult et al. (1984) found that note-taking does involve semantic encoding
  • 6. Also used to mark up a text for selective attention of another person This function chosen for study, because of clear application to information overload Conducted user study to define suitable heuristics for text selection Highlighters 3/3 Highlighting for others
  • 7. Summary 1 Highlighters 2 Highlighting as information visualisation 3 Past studies of visual cueing 4 User study 5 Heuristics 6 Identifying discourse markers 7 “Given” and “new” information 8 Future directions
  • 8. 2 Highlighting as information visualisation 1 Syntax highlighting 2 SeeSoft 3 TextLight 4 Readers vs. Authors
  • 9. Highlighting as info visualisation 1/4 Syntax highlighting Highlighting can be seen as a means of visualising the logical or conceptual structure of a text Enhances understanding of text Guides eye to most important passages Principle is widely demonstrated by the syntax highlighting in text-editors for programmers Useful : need to visualize logical structure acute Easy : programming languages offer finite and precise set of cues for editors to detect and colour
  • 10. Highlighting as info visualisation 2/4 SeeSoft One of a suite of text structure visualisation tools from team led by Stephen Eick at Lucent (formerly Bell) Laboratories Each line of code reduced to a line of single pixel thickness , coloured according to a range of user-specified criteria Thousands of lines of code can be displayed on the screen at once
  • 11. Highlighting as info visualisation 3/4 TextLight TextLight Conceived as a tool to Detect certain attributes of a text’s cognitive structure Encode them in visual, non-lexical form Superimpose them in place on the corresponding text Like a GIS, can reveal attributes of its data set that would otherwise be obscured, throwing the underlying structure into high relief
  • 12. Highlighting as info visualisation 4/4 Readers vs. authors For readers , no benefits from using different colours for different categories of "new" information But for authors and text analysts extending TextLight to identify text attributes is as valuable as colouring different CAD layers to architects Revealing the pattern of distribution of attributes such as readability or levels of completion like a knowledge discovery system for authors
  • 13. Summary 1 Highlighters 2 Highlighting as information visualisation 3 Past studies of visual cueing 4 User study 5 Heuristics 6 Identifying discourse markers 7 “Given” and “new” information 8 Future directions
  • 14. 3 Past studies on visual cueing 1 Judging importance 2 Choosing words 1 3 Choosing words 2 4 Core content 5 How many words? 6 Large variance
  • 15. Herbert Dreyfus: is the ability to tell the important from the unimportant a fundamentally human cognitive operation? Perhaps, but in some genres widespread agreement on signals for different stages in a discourse So while we can’t tell what seems important for every person, we can assess what is being presented as important Past studies 1/6 Judging importance
  • 16. Weakness of all research: no formal rules on which text to cue Foster (1979): 26 students and lecturers given 3400-word text and asked to underline sentences containing key ideas author trying to put over Half subjects told not to underline more than 16 sentences, half not more than 8 First case: 213 selections spanned 80 sentences, with only 9 sentences selected by 6 or more Second case: 102 selections distributed over 52 sentences, with only 2 selected by 6 or more Foster’s conclusion: difficult to identify sections for cueing Past studies 2/6 Choosing words 1
  • 17. Other experiments Klare et al (1955) cued single words Dearborn et al (1949) emphasised word carrying the "peak stress" in a sentence (did not describe how word selected) Crouse & Ildstein (1972) cued statements or sentences Past studies 3/6 Choosing words 2
  • 18. Past studies 4/6 “Core” content Most specific suggestions by Hershberger & Terry (1965) “ Core” content made up 1/3 of total text length: New key words Familiar key words Key statements Basic core statements Key examples Rephrasing of key statements
  • 19. Crouse & Ildstein (1972) Density of cued material influences its effect Foster (1979) Optimal proportion of text to be highlighted still not established Past studies 5/6 How many words?
  • 20. Fowler & Barker (1974) Pointed to the large variance (4% to 32%) observed in the proportion of text highlighted by members of the test group who were asked to highlight for themselves Rickards & August (1975) Asked to highlight passages of structural importance, test subjects all chose passages that Rickards & August considered relatively unimportant Past studies 6/6 Large variance
  • 21. Summary 1 Highlighters 2 Highlighting as information visualisation 3 Past studies of visual cueing 4 User study 5 Heuristics 6 Identifying discourse markers 7 “Given” and “new” information 8 Future directions
  • 22. 4 User study 1 Experimental procedure 2 Analytical procedure 3 Analysis of results 4 Observations
  • 23. 11 subjects provided with an 1111-word article from the financial times IT supplement, with instructions to imagine they were corporate librarians identifying the key points in an article for a board member Questionnaire sought: Subjects’ past experience of highlighting Criteria for text selection At what points made their selection Other comments User study 1/4 Experimental procedure
  • 24. Article input into spreadsheet as left axis of spreadsheet spanning 1111 rows (one word per row) Along the top of the spreadsheet entered the attributes for each word (36 categories) For each word probability of lying in a highlighted passage given a decimal figure between 0 and 1 All other parameters rebased to fall between 0 and 1 Gave correlation of any given parameter with the probability that a word fell within a highlighted group of words User study 2/4 Analytical procedure
  • 25. Results show wide variance in number of words highlighted Minimum of 50 (4.5%) Maximum of 396 (35.64%) (Fowler & Barker 1974: 4-32%) Marked difference between male and female subjects Males averaging 15% Females 25.5% Little correlation between part of speech/syntactic role and probability of highlighting Noticeable association with longer words User study 3/4 Analysis of results
  • 26. None of subjects made highlighting decisions before having read at least one paragraph Large majority (70%) delayed highlighting until whole passage read Conclusion: decisions made at a discourse-analytical and not a strictly linguistic level User study 4/4 Observations
  • 27. Summary 1 Highlighters 2 Highlighting as information visualisation 3 Past studies of visual cueing 4 User study 5 Heuristics 6 Identifying discourse markers 7 “Given” and “new” information 8 Future directions
  • 28. 5 Heuristics 1 Correlation with average choice 2 Key correlations 3 Best heuristics 4 Highlighting by humans 1 5 Highlighting by humans 2 6 Highlighting by best heuristics 7 Performance of best heuristics
  • 29. Average correlation between any one person’s highlighting decisions and the scores for probability of given words being highlighted was 0.44 For any individual word probability varied between 0 and 0.83 , offering clear guidelines for assessing any trial selection criteria Heuristics 1/7 Correlation with average choice
  • 30. Heuristics 2/7 Key correlations
  • 31. Most successful heuristics: 1 Word should be part of first statement in a discourse segment 2 Word should be part of first statement in any quote not an immediate continuation of a previous quote 3 Word should be part of a list 4 Word should be part of “solution” stage Heuristics 3/7 Best heuristics
  • 32. Heuristics 4/7 Highlighting by humans 1 Areas where probability of highlighting is greater than 0.4
  • 33. Heuristics 5/7 Highlighting by humans 2 Areas where probability of highlighting is greater than 0.33
  • 34. Heuristics 6/7 Highlighting by best heuristics KEY First statement in a quote “ Solution” stage First statement in a discourse segment
  • 35. Best combination of heuristics produced correlation with actual highlighting probability of 0.56 (average of 0.43 for test subjects) In other words, selecting text according to specified criteria achieved a correlation that was greater than all but one of the test subjects achieved and considerably higher than the average BUT: challenge is to identify the markers denoting relevant features in a discourse Heuristics 7/7 Performance of best heuristics
  • 36. Summary 1 Highlighters 2 Highlighting as information visualisation 3 Past studies of visual cueing 4 User study 5 Heuristics 6 Identifying discourse markers 7 “Given” and “new” information 8 Future directions
  • 37. 6 Identifying discourse markers 1 Segments 2 Statements 3 Solution stages 4 Stage labels 5 Cue words as signals 6 “Solution” signals
  • 38. Identifying discourse markers 1/6 Segments: The different means of discourse segmentation are beyond the scope of this paper. Segment boundaries most often coincide with the beginnings of paragraphs, and segments normally begin with a proposition or assertion. Most effective technique found: select the opening statement in its simplest form.
  • 39. Identifying discourse markers 2/6 Statements: The opening statement is sometimes preceded or followed by a coherence relation: a question or other linguistic feature that makes the proposition’s relevance to the preceding text clear. The text that follows tends to fill out details and/or provide supporting evidence for the assertion.
  • 40. Identifying discourse markers 3/6 Solution stage: “Situation-problem-solution-evaluation” structure. Narrative structures (boy meets girl – boy loses girl – boy regains girl – boy & girl live happily ever after). Feature articles (dogs make great pets – however they can get fleas – Winalot have now launched a new anti-flea dog food – owners have declared it a success).
  • 41. Identifying discourse markers 4/6 Stage signals: Hoey (1994): elements of structure are often signalled by characteristic words. Stage signals at the most basic level: “Cars are a common way of getting from A to B. However, the congestion that they cause is a problem. The solution is to get people to use public transport. In this way everyone can get to work quickly.”
  • 42. Identifying discourse markers 5/6 Cue words as signals: Hoey (ibid.): discourse structure is essentially evaluative, e.g. “If thyristors are used to control the motor of an electric car, the vehicle moves smoothly but with poor efficiency at low speeds”. The “problem” stage is signalled by the negative evaluation “poor”. So stages can be identified by spotting cue words or phrases.
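Stage spotting by cue words can be sketched as a lookup against a small lexicon. The cue lists below are illustrative assumptions drawn only from the example sentences on these slides; they are not TextLight's dictionary, and a sentence with no cue is simply assumed to belong to the “situation” stage.

```python
import re

# Hypothetical cue lexicon; checked in this order, first match wins.
STAGE_CUES = {
    "problem": ("however", "problem", "poor"),
    "solution": ("solution", "solve"),
    "evaluation": ("in this way", "success"),
}


def label_stage(sentence, default="situation"):
    """Return the first stage whose cue word or phrase occurs in the sentence."""
    lowered = sentence.lower()
    for stage, cues in STAGE_CUES.items():
        if any(re.search(r"\b" + re.escape(cue), lowered) for cue in cues):
            return stage
    return default


sentences = [
    "Cars are a common way of getting from A to B.",
    "However, the congestion that they cause is a problem.",
    "The solution is to get people to use public transport.",
    "In this way everyone can get to work quickly.",
]
stages = [label_stage(s) for s in sentences]
```

A fixed lookup order is the simplest way to resolve sentences that contain cues for more than one stage; a fuller treatment would weigh the cues rather than take the first hit.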
  • 43. Identifying discourse markers 6/6 “Solution” signals: TextLight need only be concerned with “solution” signals. Two examples of such signals: (1) words to do with “solving”, “developing” or “inventing”; (2) a change of verb form into the present perfect tense, as in “have -ed”. The tense then reverts to the simple present to denote that a new situation exists as a result of the solution.
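The two “solution” signals can be approximated with regular expressions. This is a deliberately crude sketch: the patterns, their names and the function are assumptions for illustration, not TextLight's implementation. The tense pattern matches only “has/have” followed by a regular “-ed” form (optionally after one intervening word), so it misses irregular participles and will produce some false positives.

```python
import re

# Signal 1: words to do with solving, developing or inventing.
SOLUTION_CUES = re.compile(r"\b(?:solv|solution|develop|invent)\w*", re.IGNORECASE)

# Signal 2: present perfect, approximated as has/have (+ one word) + "-ed" word.
PRESENT_PERFECT = re.compile(r"\b(?:has|have)\s+(?:\w+\s+)?\w+ed\b", re.IGNORECASE)


def solution_signals(sentence):
    """Return the names of the solution signals found in a sentence."""
    signals = []
    if SOLUTION_CUES.search(sentence):
        signals.append("cue_word")
    if PRESENT_PERFECT.search(sentence):
        signals.append("present_perfect")
    return signals
```

For instance, `solution_signals("Winalot have now launched a new anti-flea dog food")` reports the tense signal only, because the sentence contains no solving, developing or inventing word.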
  • 45. 7 “Given” and “new” information 1 Highlighting the new 2 Narrative stages 3 Importance 4 Intonation 5 First statement 6 Lists 7 “Solution” as “new” 8 Quasi-revision 9 Levels of “new”
  • 46. “Given” and “new” information 1/9 Highlighting the new: Why were the best heuristics more effective than others? Prague school (1930s): information is composed of a mixture of “given” and “new” elements. Proposition: what the best heuristics have in common is that they are all ways in which “new” information is signalled at the discourse level, and this is the essential factor behind the choice of text to highlight.
  • 47. “Given” and “new” information 2/9 Narrative stages: The theory is supported by the fact that 80% of subjects stated that they were highlighting words that “marked significant stages in the narrative.” This implies information that is new in the context of the preceding text.
  • 48. “Given” and “new” information 3/9 Importance: We can argue that an idea’s perceived importance is judged according to the extent to which it (1) is new as opposed to given, and (2) matches a perceived gap in the structure of the reader’s domain knowledge. When highlighting on behalf of others, we have to make an informed judgement on how the ultimate reader will define importance.
  • 49. “Given” and “new” information 4/9 Intonation: Halliday (1970): in spoken discourse, intonation is used to signal to the listener what the speaker understands to be new information. Could highlighting perform an equivalent function?
  • 50. “Given” and “new” information 5/9 First statement: The first statement in a paragraph can be considered as supporting structure for the statement at the beginning of the discourse segment that contains it. It operates as one of the primary statements containing most of the “new” information in the document.
  • 51. “Given” and “new” information 6/9 Lists: Lists typically act as a systematic tabulation of what the author believes to be important (i.e. “new” and relevant) information. They are often used for predictive purposes within a discourse, or for enumerating significant points. People therefore tend to identify lists as concentrated sources of meaning, and as such eligible for highlighting. A speaker might very well emphasise this by counting the points off on the fingers of one hand.
  • 52. “Given” and “new” information 7/9 “Solution” as “new”: Solution stages comprise “new” information: a climactic point of novelty in the schema, justifying their status as “highlightable” text. If an article is modelled as a histogram, with one column per sentence plotted against its new-information content, highlighting is like slicing across the graph at a threshold value.
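The histogram analogy amounts to a simple threshold cut, sketched below. The per-sentence “newness” scores are placeholder numbers assumed for illustration; the slides do not define how such a score would be computed, and the double-bracket markup is likewise a hypothetical stand-in for visual highlighting.

```python
def highlight_by_threshold(scores, threshold):
    """Return indices of sentences whose score rises above the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


def mark_up(sentences, scores, threshold):
    """Wrap selected sentences in [[ ]]; the rest stay visible as context."""
    chosen = set(highlight_by_threshold(scores, threshold))
    return " ".join(
        f"[[{text}]]" if i in chosen else text
        for i, text in enumerate(sentences)
    )


# Placeholder scores: one hypothetical new-information value per sentence.
scores = [0.10, 0.50, 0.35, 0.80]
strict = highlight_by_threshold(scores, 0.4)
lenient = highlight_by_threshold(scores, 0.33)
```

Lowering the threshold selects every sentence the higher threshold did, plus more, which is the same effect as moving the slice down the histogram.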
  • 53. “Given” and “new” information 8/9 Quasi-revision: Criteria and procedure would have been different for quasi-revision: shorter range; more spontaneously applied; the reader has more detailed knowledge of what is “new” information for him/herself. Highlighting can be done in real time and with greater precision.
  • 54. “Given” and “new” information 9/9 Levels of “newness”: Information can also be perceived as “new” at several levels. Within a sentence, particular words can be seen as new. Within a paragraph, some sentences can be interpreted as new and others as contextual or supporting information. Within a discourse segment or discourse, still longer passages may be perceived as containing “new” information.
  • 56. 8 Future directions 1 Highlighting long neglected 2 Virtues of highlighting 3 TextLight: to do
  • 57. Future directions 1/3 Highlighting long neglected: The study of how words are selected for highlighting has previously been neglected. The potential of automatic highlighting as a tool for handling information overload has also been neglected.
  • 58. Future directions 2/3 Virtues of highlighters: Output is familiar to users. Highlighting has been shown to be helpful in content recall. Addresses the issue of confidence: highlighting acts not as a censor but as a guide, since non-selected text (and therefore the context) is always in view. Suitable as a plug-in module for other programs.
  • 59. Future directions 3/3 TextLight: to do: incorporate discourse segmentation algorithms; complete the lexical dictionary for cue recognition; port from Prolog to Java for greater portability.
  • 60. TextLight URLs: http://www.cogarch.demon.co.uk/textlight.html mailto:timo@cogarch.com