Course Overview:
An Introduction to Information
Retrieval and Applications
J. H. Wang
Feb. 17, 2014
IR, Spring 2014 www.Vidyarthiplus.com 2
Instructor & TA
• Instructor
– J. H. Wang ( 王正豪 )
– Associate Professor, CSIE, NTUT
– Office: R1534, Technology Building
– E-mail: jhwang@csie.ntut.edu.tw
– Tel: ext. 4238
– Office Hours: 9:00 am-12:00 noon, every Tuesday and
Thursday
• TA
– TBD (R1424, Technology Building)
Course Description
• Course web page (latest announcements and updates to the
schedule, slides, and homework):
– http://www.ntut.edu.tw/~jhwang/IR/
• Time: 9:10 am-12:00 noon, Fridays
• Classroom: R334, Technology Building
• Textbook:
– Christopher D. Manning, Prabhakar Raghavan and Hinrich
Schuetze, Introduction to Information Retrieval, Cambridge
University Press, 2008.
• Available online
• International Student Edition, imported by Kai-Fa ( 開發 )
Publishing
• Prerequisites:
– Basic knowledge of data structures and algorithms, linear
algebra, and probability theory
– Programming experience is *required* for homework and
projects
Target Audience
• Seniors
• Master’s students
• IMEECS (International Master’s Program
in Electrical Engineering and Computer
Science)
Additional References
• References:
– Ricardo Baeza-Yates and Berthier Ribeiro-Neto,
Modern Information Retrieval: The Concepts and Technology
behind Search, Addison-Wesley, 2011.
• This is the second edition of their 1999 book
Modern Information Retrieval. (華通)
– Bruce Croft, Donald Metzler, and Trevor Strohman,
Search Engines: Information Retrieval in Practice,
Addison-Wesley, 2010. (全華)
– Stefan Buettcher, Charles L. A. Clarke, and Gordon V.
Cormack,
Information Retrieval: Implementing and Evaluating
Search Engines, MIT Press, 2010.
More Books on IR
• Gerald Salton, Automatic Information Organization and
Retrieval, McGraw-Hill, 1968.
• Gerald Salton and M. J. McGill, Introduction to Modern
Information Retrieval, McGraw-Hill, 1983.
– Two classics, but out of print.
• C. J. van Rijsbergen, Information Retrieval, Butterworths,
1979.
– The classic. More than 30 years old, but still worth reading.
• K. Sparck Jones and P. Willett (eds.),
Readings in Information Retrieval, Morgan Kaufmann,
1997.
– A collection of classic IR papers (out of print).
• I. H. Witten, A. Moffat, and T. C. Bell,
Managing Gigabytes, 2nd edition, Morgan Kaufmann, 1999.
– The authority on index construction and compression.
Grading Policy
• Homework assignments and
programming exercises: ~40%
• Mid-term exam: ~25%
• Term project: ~35%
– Including proposal, presentation, and final
report
Programming Exercises and Term
Project
• About three programming exercises
– Team-based (at most two people per team)
– You may either write your own code or reuse existing
open-source code
• The term project
– Either team-based system development (same team rules
as the programming exercises)
– Or an academic paper presentation
• Individual only (one person per team)
– A proposal is *required* before the midterm (Apr. 11,
2014)
About the Term Project
• Your score depends on the functionality, difficulty,
and quality of your project
– For system development:
• System functions and correctness
– For an academic paper presentation:
• Quality of the paper and of your presentation
• Major methods/experimental results *must* be presented
• Papers from top conferences are strongly suggested
– e.g., SIGIR, WWW, CIKM, WSDM, JCDL, ICMR, …
• Proposals are *required* for each team and will count
toward the score
Online Submission
• Submission instructions
– Programs, project proposals, and project
reports in electronic files must be submitted to
the TA online at:
• Submissions website: (TBD)
– Before submitting:
• User name: your student ID
• Please change your default password the first time
you log in
What this Course is NOT about
• This course will NOT tell you
– Tips and tricks for using search engines,
although power users may come away with better ideas on
how to improve them
• There are plenty of books and websites on that…
– How to find books in libraries,
although that is somewhat related to basic IR
concepts
– How to make money on the Web,
although today’s largest search engine has done exactly that
What’s Information Retrieval?
• Things you do all day!
– Searching for something interesting: Web, news,
e-mail, images, videos, …
– Asking for advice
– …
• User interests change all the time…
– 2011: New Zealand earthquake
– 2012: Jeremy Lin
– 2013: Meteor in Russia
– 2014: ? (next slide)
What’s Information Retrieval?
In Google News
In Wikipedia
In Google Images
In Google Video Search
In Google Translate…
Or More Related Keywords
• Blast
• Explosion
• Chelyabinsk
• Asteroid 2012 DA14
• …
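The keyword lists above illustrate why a single query term can miss relevant documents that use different vocabulary. One standard remedy is query expansion: adding related terms to the user's query before retrieval. The following is a minimal Python sketch, not the course's method; the `RELATED_TERMS` table and the `expand_query` function are hypothetical, and the table is hand-built from the keywords on this slide, whereas practical systems typically derive related terms from a thesaurus, term co-occurrence statistics, or relevance feedback.

```python
# Hypothetical, hand-built table of related terms, seeded with the keywords
# from the slide. Real systems would learn or look up these relations.
RELATED_TERMS: dict[str, list[str]] = {
    "meteor": ["blast", "explosion", "chelyabinsk", "asteroid 2012 da14"],
}


def expand_query(query: str) -> list[str]:
    """Return the lowercased query terms, each followed by its related terms.

    Duplicates are dropped while preserving first-seen order.
    """
    expanded: list[str] = []
    for term in query.lower().split():
        for candidate in [term, *RELATED_TERMS.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


print(expand_query("Russia meteor"))
# ['russia', 'meteor', 'blast', 'explosion', 'chelyabinsk', 'asteroid 2012 da14']
```

The expanded terms would usually be combined with OR (often down-weighted relative to the original terms), since requiring all of them would shrink rather than widen the result set.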
What if We Search in Chinese?
And More…
• 流星 (meteor)
• 彗星 (comet)
• 隕石 (meteorite)
• 俄羅斯 (Russia)
• 地球 (Earth)
• …
• And other languages…
• And other search engines…
• And social websites…
In Google Trends
And More…
And Other Keywords…
And Social Search…
How Do I Know What People
Care About?
What Are People Searching for in
Taiwan?
What Is Information Retrieval?
• “Information retrieval is a field concerned
with the structure, analysis, organization,
storage, searching, and retrieval of
information.” (Salton, 1968)
Goal
• Information retrieval (IR): a research field
that aims at effectively and efficiently
searching for information in text and
multimedia documents
• In this course, we will introduce the basic
text and query models in IR, retrieval
evaluation, indexing and searching, and
applications of IR
A Big Picture
[Figure: architecture of an IR system. Components: User Interface, Text Operations, Query Expansion, Indexing, Inverted Index, Retrieval, Ranking, Document Collection. Data flows: user need, query, user feedback, logical view, doc representation, inverted file, retrieved docs, ranked docs.]
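The main path through this architecture (text operations, indexing into an inverted file, then retrieval) can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the tokenizer (lowercase, runs of letters and digits), the function names, and the toy three-document collection are all assumptions made for the example, and ranking, query expansion, and user feedback are left out. Queries are treated as a Boolean AND of their terms.

```python
import re


def tokenize(text: str) -> list[str]:
    """Text operations (assumed): lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Indexing: map each term to a sorted postings list of document IDs."""
    postings: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings.setdefault(term, set()).add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Merge two sorted postings lists in O(len(p1) + len(p2)) time."""
    i = j = 0
    result: list[int] = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index: dict[str, list[int]], query: str) -> list[int]:
    """Retrieval: return IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the shortest postings list so intermediate results stay small.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


# Toy document collection (made up for illustration).
docs = {
    1: "Meteor explodes over Russia",
    2: "Asteroid 2012 DA14 passes Earth",
    3: "Meteor blast in Russia",
}
index = build_inverted_index(docs)
print(boolean_and(index, "russia meteor"))  # [1, 3]
```

Keeping postings lists sorted is what makes the linear-time merge in `intersect` possible; a term missing from the index yields an empty list, so the whole AND query correctly returns no documents.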
Topics
• Text IR
– Indexing and searching
– Query languages and operations
• Retrieval evaluation
• Modeling
– Boolean model
– Vector space model
– Probabilistic model
• Applications of IR
– Multimedia IR
– Web search
– Digital libraries
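To make the vector space model in this list concrete, here is a rough Python sketch of ranking by tf-idf weights and cosine similarity. It assumes one common weighting variant, w = (1 + log10 tf) × log10(N / df), applied to both documents and query; the textbook (Ch. 6) discusses several alternatives. The function names and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters and digits (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vector(tokens: list[str], df: dict[str, int], n_docs: int) -> dict[str, float]:
    """Weight each term by (1 + log10 tf) * log10(N / df).

    Terms that never occur in the collection are skipped; terms that occur
    in every document get idf = 0 and therefore weight 0.
    """
    vec: dict[str, float] = {}
    for term, tf in Counter(tokens).items():
        if term in df:
            vec[term] = (1 + math.log10(tf)) * math.log10(n_docs / df[term])
    return vec


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as term -> weight."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(docs: dict[int, str], query: str) -> list[tuple[int, float]]:
    """Score every document against the query; best first, ties by doc ID."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    n_docs = len(docs)
    query_vec = tfidf_vector(tokenize(query), df, n_docs)
    scores = [
        (doc_id, cosine(query_vec, tfidf_vector(tokens, df, n_docs)))
        for doc_id, tokens in tokenized.items()
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


# Toy documents (made up for illustration).
docs = {
    1: "Meteor over Russia",
    2: "Asteroid passes Earth",
    3: "Meteor meteor blast",
}
print([doc_id for doc_id, _ in rank(docs, "meteor blast")])  # [3, 1, 2]
```

Unlike the Boolean model, every document gets a graded score, so partial matches (document 1 shares only "meteor") still appear in the ranking. Scoring every document is acceptable for a toy collection; a real system would typically walk the inverted index so that only documents sharing a query term are scored.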
Organization of the Textbook
• Basics in IR (focus)
– Inverted indexes for Boolean queries (Ch. 1-5)
– Term weighting and vector space model (Ch. 6-7)
– Evaluation in IR (Ch. 8)
• Advanced Topics
– Relevance feedback (Ch. 9)
– XML retrieval (Ch. 10)
– Probabilistic IR (Ch. 11)
– Language models (Ch. 12)
• Machine learning in IR (useful)
– Text classification (Ch. 13-15)
– Document clustering (Ch. 16-18)
• Web Search
– Web crawling and indexes (Ch. 19-20)
– Link analysis (Ch. 21)
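For the evaluation chapter (Ch. 8), the basic set-based measures are precision (the fraction of retrieved documents that are relevant), recall (the fraction of relevant documents that are retrieved), and F1, their harmonic mean; average precision extends this to a ranked list. The sketch below is a minimal Python illustration; the function names and the relevance judgments are made up for the example.

```python
def precision_recall_f1(retrieved: list[int], relevant: set[int]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 (harmonic mean of the two)."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def average_precision(ranking: list[int], relevant: set[int]) -> float:
    """Average of precision@k at each rank k that holds a relevant document.

    Relevant documents that are never retrieved contribute 0, so the sum is
    divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


# Made-up judgments: the system returned docs 1-4 in this order,
# and docs 1, 3, and 5 are relevant.
ranking = [1, 2, 3, 4]
relevant = {1, 3, 5}
print(precision_recall_f1(ranking, relevant))  # P = 1/2, R = 2/3, F1 = 4/7
print(average_precision(ranking, relevant))    # (1/1 + 2/3) / 3 = 5/9
```

Precision and recall ignore order, which is why ranked systems are usually compared with rank-aware measures such as average precision, averaged over many queries (mean average precision).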
Some Overlap with Other Fields
• Text mining
• Machine Learning
• Natural Language Processing
• Social Network Analysis
• …
Pointers to Other Topics
• Cross-language IR
• Image, video, and multimedia IR
• Speech retrieval
• Music retrieval
• User interfaces
• Parallel, distributed, and P2P IR
• Digital libraries
• Information science perspective
• Logic-based approaches to IR
• Natural language processing techniques
• …
Tentative Schedule
• Before midterm
– Boolean retrieval (1 wk)
– Indexing (2 wks)
– Vector space model and evaluation (2 wks)
– Relevance feedback (1 wk)
– Probabilistic IR (2 wks)
• After midterm
– Text classification (1-2 wks)
– Document clustering (1-2 wks)
– Web search (2 wks)
– Advanced topics: cross-language IR (CLIR), information extraction (IE), … (2 wks)
– Term project presentations (3 wks)
Generic Resources
• Wikipedia page on Information Retrieval:
http://en.wikipedia.org/wiki/Information_retrieval
• Information Retrieval Resources:
http://www-csli.stanford.edu/~hinrich/information-retrieval.html
Academic Resources
• Journals
– ACM TOIS: Transactions on Information Systems
– JASIST: Journal of the American Society for Information Science and Technology
– IP&M: Information Processing and Management
– IEEE TKDE: Transactions on Knowledge and Data Engineering
• Conferences
– ACM SIGIR: International Conference on Research and Development in Information Retrieval
– WWW: World Wide Web Conference
– ACM CIKM: Conference on Information and Knowledge
Management
– JCDL: ACM/IEEE Joint Conference on Digital Libraries
– ACM WSDM: International Conference on Web Search and Data
Mining
– TREC: Text Retrieval Conference
Teaching in English…
• Slides and lectures will be offered mainly
in English
• To help domestic students, important
concepts will also be briefly summarized
in Chinese
Thanks for Your Attention!
• Any questions or comments?
Please feel free to e-mail me at
jhwang@csie.ntut.edu.tw
or drop by my office
