Introduction to Corpus Linguistics for Beginners

INTRODUCTION TO
CORPUS LINGUISTICS
karlinadenistia@staff.uns.ac.id
@karlinakuning
Karlina_Denistia

Corpus Linguistics – Karlina Denistia
https://scholar.google.de/citations?hl=en&user=D2U9r3cAAAAJ&view_op=list_works&sortby=pubdate

OUTLINE
• Background story
• What is corpus linguistics?
• Sources of corpus data
• Which sources for which research?

Language rules and systems
• Both of these are acceptable sentences:
  - We worked out the problem
  - We worked the problem out
• These two, however, may not be equally acceptable:
  - We worked out it
  - We worked it out
• The first one is likely to sound strange to many native speakers of English
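A corpus lets us check intuitions like these against actual usage. The sketch below is a minimal illustration, assuming a tiny hypothetical corpus (a Python list of sentences invented for this example) and a hypothetical helper `count_phrase`; it counts how often each word order occurs using regular expressions.

```python
import re

# Hypothetical mini-corpus, invented for illustration only; a real study
# would search millions of words of authentic text.
corpus = [
    "We worked out the problem before lunch.",
    "We worked the problem out together.",
    "We worked it out in the end.",
    "They worked it out quickly.",
    "She worked out the answer.",
]

def count_phrase(sentences, phrase):
    """Count case-insensitive, whole-word matches of `phrase`."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(s)) for s in sentences)

for phrase in ["worked out the problem", "worked the problem out",
               "worked out it", "worked it out"]:
    print(f"{phrase!r}: {count_phrase(corpus, phrase)}")
```

A zero count in a small corpus is not proof that a sentence is unacceptable; corpus evidence shows what speakers and writers tend to produce, and it becomes more convincing as the corpus grows.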

Language variation:
- Speaker
- Context
- Necessity

What is corpus linguistics?
• Corpus linguistics describes language variation and use by looking at large amounts of texts that people have actually produced
  - Written: news writing, text messaging or academic writing
  - Spoken: news reporting, face-to-face conversation or academic lectures
• A corpus is a representative collection of language that can be used to make statements about language use
  - It contains a fairly large number of examples
  - It can be read and processed by a computer
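Because a corpus is computer-readable, even a simple program can describe it. The sketch below builds a word frequency list, one of the most basic corpus descriptions. It assumes a crude tokenizer (lowercase runs of letters, allowing an internal apostrophe or hyphen) and two invented example texts; real tools handle tokenization far more carefully.

```python
import re
from collections import Counter

def tokenize(text):
    # Crude tokenizer (an assumption): lowercase the text and keep runs of
    # letters, allowing one internal apostrophe or hyphen between runs.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def frequency_list(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

# Two invented example texts standing in for a real corpus.
texts = [
    "The news report was short.",
    "The lecture was long, but the discussion was lively.",
]
freq = frequency_list(texts)
print(sum(freq.values()))   # number of tokens (running words)
print(len(freq))            # number of distinct word types
print(freq.most_common(3))  # the most frequent word types
```

The distinction between tokens (every running word) and types (distinct word forms) used here is the basis for many corpus measures, including corpus size.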

Sources of corpus data
• Corpora contain real-world examples
  - Books, papers, letters, spoken language, dialogues, news, chat histories, song lyrics, Twitter and Facebook posts, movie subtitles, etc.
• Size: often millions of words

Electronically available and computer-processable
• e.g., PDF → optical character recognition → text file
• e.g., audio file → speech-to-text software such as Siri → text file
• Built using semi-automated processes (e.g., web crawlers)
• Manually typed text, or news copied and pasted from the internet into a file?
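Once the texts exist as plain text files, a program can read them all in and measure the corpus. This is a minimal sketch, assuming one UTF-8 `.txt` file per text in a single folder; the folder name `my_corpus` and the helper names are hypothetical, and word counting is simplified to whitespace splitting.

```python
from pathlib import Path

def load_corpus(folder):
    """Read every .txt file in `folder` (UTF-8 assumed) into a dict
    mapping file name to text, in alphabetical order of file name."""
    texts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        texts[path.name] = path.read_text(encoding="utf-8")
    return texts

def corpus_size(texts):
    # Rough word count: whitespace-separated tokens across all files.
    return sum(len(text.split()) for text in texts.values())

if __name__ == "__main__":
    corpus = load_corpus("my_corpus")  # hypothetical folder name
    print(len(corpus), "texts,", corpus_size(corpus), "words")
```

Keeping one text per file makes it easy to report both the number of texts and the number of words, the two figures usually given when a corpus is described.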

https://corpora.uni-leipzig.de/en?corpusId=ind_mixed_2013&word=

What does it mean to say "I am doing corpus linguistics"?
• It is empirical: it analyzes the actual patterns of use in natural language texts
• It uses a large and principled collection of natural texts, known as a "corpus", as the basis for analysis
• It makes extensive use of computers for analysis, using both automatic and interactive techniques
• It depends on both quantitative and qualitative analytical techniques
(Biber, Conrad, & Reppen, 1998: 4)
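A typical interactive technique is the keyword-in-context (KWIC) concordance, which concordancers such as AntConc provide: every occurrence of a search word is shown with its surrounding context, so the analyst can read the patterns qualitatively. The sketch below is a simplified stand-in, assuming a hypothetical helper `kwic` and an invented sample text.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for every whole-word match of
    `keyword` (case-insensitive), with `width` characters of context."""
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text for illustration.
sample = ("She nodded her head. The head of the department agreed. "
          "He could not get his head around the idea.")
for line in kwic(sample, "head", width=20):
    print(line)
```

Counting the matches is the automatic, quantitative step; reading the aligned lines to decide which uses are literal and which are figurative is the qualitative one.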

Break and think:
What can we do with this corpus?
Morphology: Indonesian affix productivity
Semantics: figurative language with 'head'
Syntax: adverb mobility in Indonesian
Language use: new words in Indonesian corpora
Pragmatics: formal and informal constructions
Any other ideas?
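For the morphology question, one widely used measure is Baayen's category-conditioned productivity, P = n1 / N, where n1 is the number of word types containing the affix that occur only once (hapax legomena) and N is the total number of tokens containing the affix. The sketch below is a simplification under stated assumptions: the helper `productivity` is hypothetical, affixed words are identified by a plain string prefix rather than real morphological analysis, and the token list is invented.

```python
from collections import Counter

def productivity(tokens, prefix):
    """Baayen's category-conditioned productivity P = n1 / N.

    n1: number of word types with the affix that occur exactly once (hapaxes)
    N:  number of tokens with the affix
    Affixed words are found by a simple string prefix match, which is a
    simplification: it would wrongly count e.g. 'beras' ('rice') as ber- + as.
    Returns None when no affixed tokens are found (P is undefined).
    """
    affixed = Counter(t for t in tokens if t.startswith(prefix))
    n_tokens = sum(affixed.values())
    if n_tokens == 0:
        return None
    hapaxes = sum(1 for count in affixed.values() if count == 1)
    return hapaxes / n_tokens

# Hypothetical toy data; a real study would use a large Indonesian corpus.
tokens = ["bermain", "bermain", "berjalan", "berlari", "bersepeda",
          "makan", "bermain", "berjalan", "minum"]
print(productivity(tokens, "ber"))
```

In this toy list there are 7 ber- tokens, of which berlari and bersepeda occur once, so P = 2/7. A higher P suggests that new words with the affix keep appearing, which is why hapaxes are used as a sign of productivity.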

Which corpus for which research?
• British National Corpus
  - 4,048 texts (a variety of texts written in British English)
  - Around 100 million words
• Lake District Corpus
  - 28 texts (texts about the Lake District written in British English between 1700 and 1900)
  - 273,861 words

Know the aim of your research
(Gabrielatos, 2013)
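Because these two corpora differ in size by a factor of several hundred, raw counts from them cannot be compared directly. A common solution is to normalize frequencies, for example to occurrences per million words. The sketch below assumes a hypothetical helper `per_million`, uses the corpus sizes given above (the BNC figure is approximate), and uses invented raw counts.

```python
def per_million(raw_count, corpus_size):
    """Normalize a raw frequency to occurrences per million words."""
    return raw_count * 1_000_000 / corpus_size

BNC_SIZE = 100_000_000        # "around 100 million words" (approximate)
LAKE_DISTRICT_SIZE = 273_861  # words

# Hypothetical raw counts, invented for illustration only.
print(per_million(5_000, BNC_SIZE))                     # 50.0
print(round(per_million(50, LAKE_DISTRICT_SIZE), 1))    # 182.6
```

With these invented counts, 50 hits in the small corpus correspond to about 182.6 per million words, far more than 50 per million in the BNC, even though the raw count is 100 times smaller.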

Which one will you use?

Summary
• Corpus linguistics opens up more possibilities for describing linguistic phenomena based on language use
• There are various kinds of corpora that can be used as sources of information for language research
• The choice of corpus depends on the research question(s)

Any questions?
See you next week 
Note: please download AntConc before our next meeting

References
• Biber, Douglas, Susan Conrad & Randi Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
• Crawford, William J. & Eniko Csomay. 2016. Doing Corpus Linguistics. New York: Routledge.
• Gabrielatos, Costas. 2013. Sketching Muslims: A Corpus Driven Analysis of Representations Around the Word 'Muslim' in the British Press 1998–2009. Applied Linguistics 34(3): 255–278.
