3. Abstract
โข To tackle open-domain question answering
โข using Wikipedia as the unique knowledge source.
โข MRS: โMachine reading at scaleโ
1) Document Retriever (relevant articles ์ฐพ๊ธฐ) ์ ๋ณด ๊ฒ์
โข Search component (Bigram hashing + TF-IDF matching)
2) Document Reader (identifying the answer from articles) ์ ๋ณด ์ถ์ถ
โข Answer detection (multi-layer RNN)
โข Multitask learning + Distant supervision ๊ธฐ๋ฒ์ผ๋ก Full system์ฑ๋ฅ ํฅ์
3
4. Introduction
โข Wikipedia as a knowledge base (KB)
- A constantly evolving source of detailed information that could facilitate intelligent
machines.
- Contains up-to-date knowledge that humans are interested in.
- However, Wikipedia is designed for humans to read (not machines).
4
22. Distant supervision
โข Most machine learning techniques require a set of training data. A traditional approach for
collecting training data is to have humans label a set of documents. For example, for the
marriage relation, human annotators may label the pair "Bill Clinton" and "Hillary Clinton" as a
positive training example. This approach is expensive in terms of both time and money, and if
our corpus is large, will not yield enough data for our algorithms to work with. And because
humans make errors, the resulting training data will most likely be noisy.
โข An alternative approach to generating training data is distant supervision. In distant
supervision, we make use of an already existing database, such as Freebase or a domain-specific
database, to collect examples for the relation we want to extract. We then use these examples
to automatically generate our training data. For example, Freebase contains the fact that Barack
Obama and Michelle Obama are married. We take this fact, and then label each pair of "Barack
Obama" and "Michelle Obama" that appear in the same sentence as a positive example for our
marriage relation. This way we can easily generate a large amount of (possibly noisy) training
data. Applying distant supervision to get positive examples for a particular relation is easy,
but generating negative examples is more of an art than a science.
22