What can a corpus tell us about discourse?
Mª Paz Muñoz, Elisabet Martínez, Ainoa Ortiz, Ester Ortega, Kautar Ouatik. 2nd year, Degree in English Studies
1. What is discourse?
The term “discourse” is both slippery (it eludes neat definition) and baggy (it embraces a wide range of linguistic and social phenomena). It has two basic senses: discourse as connected text (the formal sense) and discourse as language in use (the functional sense).
The reference to the context and the participants can include not just the immediate context of situation but the larger social and cultural context as well. According to Schiffrin, “To understand the language of discourse… we need to understand the world in which it resides.” We can distinguish between two broad areas under “the describable internal relationships”: cohesion across sentences and utterances, using grammatical and lexical devices (Halliday and Hasan 1976); and the organisation and management of discourse, including the distribution of given and new information and topic management (Brown and Yule 1983, Coulthard 1985 and McCarthy 1991).
2. What can a corpus tell us about discourse?
Analysts need to use quantitative methods with the aim of producing findings that are both descriptive and explanatory. The descriptive findings are generated by searching, using computational means, for particular discourse features in a corpus (a collection of texts of a specific register, or a single extended text such as a textbook or a novel).
Explaining the frequency, significance and use of these features generally involves reference to context, either the immediate co-textual environment or other texts and other corpora of texts. For, as Stubbs (2001a) reminds us, “in corpus work, context means two rather different things: co-text (a short span of a few words within one single text), and inter-text (repeated occurrences, often a very large number, of similar patterns across different, independent texts)”.
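Co-text in this sense is what a concordance (keyword-in-context) display shows. The following is a minimal sketch, assuming a very simple tokenizer; the function names `tokenize` and `kwic` and the sample sentence are illustrative and do not come from the source.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately simple)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, node, span=3):
    """Return (left co-text, node, right co-text) for every occurrence of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines

# Invented sample text, used only to show the display.
sample = "I walked into school and my friend saw me. Then my friend laughed."
for left, node, right in kwic(tokenize(sample), "friend", span=2):
    print(f"{left:>20} [{node}] {right}")
```

Running the same search over many independent texts, rather than one, would give the inter-textual view that Stubbs describes.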
The analyst may compare and contrast an individual text, or a sub-corpus of texts of a specific type, with texts of another type, or with a larger and more general corpus. Just as corpus-derived frequency information has revolutionized language description at the level of lexis and grammar, so too has the study of discourse hugely benefited from the kinds of quantitative data that corpora yield.
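Because corpora differ in size, a comparison of this kind usually relies on normalised rather than raw counts. The sketch below is illustrative only: the function name `per_thousand` and the two toy token lists are assumptions, not data from the study.

```python
from collections import Counter

def per_thousand(tokens, word):
    """Relative frequency of `word` per 1,000 tokens."""
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word] / len(tokens)

# Two invented mini-samples standing in for a sub-corpus and a reference corpus.
narratives = "and then i ran and i fell and everyone laughed".split()
reference = "the results suggest that the method works although the sample is small".split()

for word in ("and", "the"):
    print(word,
          round(per_thousand(narratives, word), 1),
          round(per_thousand(reference, word), 1))
```

With real data, a statistical keyness measure would normally replace this simple rate comparison.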
Using corpus tools to identify what makes individual texts cohesive, or to track their internal organization through the use of discourse markers, is more problematic. From a discourse perspective, corpus analysis typically involves identifying the micro-features of specific text types and extrapolating textual macro-features from them.
Corpus tools cannot easily detect cohesive ties, such as pronominal reference, unless they have been tagged as such, and it is another matter again to identify what a device is cohesive with. More amenable to corpus analysis are features of lexical cohesion (Halliday and Hasan 1976), including reiteration (the direct and indirect repetition of words, the use of synonyms, near-synonyms and general terms) and collocation.
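Collocation is amenable to automatic counting because it only requires looking at the words near a node word. A minimal sketch follows, assuming a fixed window either side of the node; the name `collocates`, the window size and the sample tokens are illustrative assumptions.

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample, used only to show the counting.
tokens = "my best friend and my other friend walked to school".split()
print(collocates(tokens, "friend", window=1).most_common())
```

Raw co-occurrence counts like these say nothing about what a word is cohesive with; they only show which words tend to appear together.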
Studies combining corpus data and other research tools, such as discourse completion tests (DCTs), provide insights into how the corpus data are realized in specific contexts and between specific participants. They combine corpus-based procedures with research methods from other disciplines, such as genre analysis, phraseology, pragmatics and ethnography. The same applies to research into the third level of discourse, discourse-as-social-practice.
3. What are the limitations of using a corpus in the study of discourse and how might we overcome them?
Discourse analysts have been slow to embrace the opportunities offered by corpus linguistics, due to the perception that corpora consist of de-contextualised text fragments, whereas discourse analysis requires whole texts, often of the same type. However, specialised corpora of specific registers, including spoken language, have proliferated, and most corpora of general English, including many that are freely available online, are tagged for text-type and register.
But there is a more fundamental problem facing the discourse analyst. While corpus tools allow researchers to track, tally and plot the surface features of discourse, these features do not necessarily correlate with the underlying semantic relations between parts of a text. This is a limitation of the study of cohesion in general. As Halliday and Hasan put it, “cohesion expresses the continuity that exists between one part of the text and another”. In the end, quantitative data alone are not going to answer all the questions that analysts bring to the study of discourse. A combination of computation and interpretation offers the most promising way forward.
4. How does a corpus-based approach work in practice?
The analysis follows these steps. First, compile frequency lists and run word searches. Second, identify linguistic regularities in the corpus. Third, construct a provisional schematic structure, which can then be checked and refined. Fourth, take account of the contextual and cultural factors of the texts. Finally, the researcher is in a position to speculate on how the formal features of the texts encode their communicative and social functions.
An example of a small corpus (10,000 words) of teenage written narratives is the Cringe Text Corpus, which was compiled using an online teenage magazine as the source. It was used to compile a list of the most frequent words in the corpus and to search these for linkers (eliminating instances of linkage at the phrase level, e.g. “I was so embarrassed”).
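These two operations can be sketched as follows. This is a minimal illustration, not the procedure actually used on the Cringe Text Corpus: the `LINKERS` set is a hypothetical list (the source does not say which linkers were searched for), and the two sample texts are invented.

```python
import re
from collections import Counter

# Hypothetical set of linkers; the source does not list the ones it searched for.
LINKERS = {"and", "but", "so", "then", "because", "when"}

def frequency_list(texts):
    """Count word tokens across all texts in the corpus."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def linker_frequencies(counts, linkers=LINKERS):
    """Pick the linkers out of a frequency list, most frequent first."""
    found = [(w, counts[w]) for w in linkers if counts[w] > 0]
    return sorted(found, key=lambda pair: (-pair[1], pair[0]))

# Invented sample texts standing in for corpus files.
corpus = [
    "I ran to my locker and I tripped and everyone saw.",
    "I was so embarrassed because my friend laughed.",
]
counts = frequency_list(corpus)
print(counts.most_common(3))
print(linker_frequencies(counts))
```

A raw count like this includes the “so” of “I was so embarrassed”, where the word intensifies an adjective rather than linking clauses, which is why the hits still have to be inspected and filtered by hand.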
For example, frequent use of “and” points to paratactic syntax, whereas frequent subordinate clauses point to a hypotactic style, as found in academic abstracts.
Dispersion plot: a graphical tool in which the position of the item in each of the sixteen texts in which it occurs is represented by a short vertical line in the column headed Plot.
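The idea behind such a plot can be approximated in plain text. The sketch below is an assumption-laden illustration rather than a reproduction of any concordancer's output: the function name `dispersion_row`, the row width and the two toy texts are all hypothetical.

```python
def dispersion_row(tokens, item, width=40):
    """Render one text as a row of `width` cells, marking each occurrence of `item` with '|'."""
    row = [" "] * width
    for i, tok in enumerate(tokens):
        if tok == item:
            # Scale the token position to a cell in the fixed-width row.
            row[i * width // len(tokens)] = "|"
    return "".join(row)

# Invented sample texts; a real plot would have one row per corpus file.
texts = {
    "text01": "and i ran and fell".split(),
    "text02": "i fell over and then cried".split(),
}
for name, tokens in texts.items():
    print(f"{name} [{dispersion_row(tokens, 'and', width=10)}]")
```

Scaling every text to the same width makes texts of different lengths comparable, at the cost of merging occurrences that fall into the same cell.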
These thirty words alone are a strong indicator of the thematic content of the narratives that comprise the Cringe Text Corpus. Many are semantically related, either because they belong to the same lexical set (school, locker, walked, ran) or to the same word family (friend, friends; walking, walked).
These semantic relations are typically instantiated in the form of cohesive chains, of which there are two types: identity chains and similarity chains. Identity chains typically run the length of the whole text. A similarity chain of words that are related simply because they commonly co-occur is called a cluster.
An identity chain is a set of items that are co-referential: every member of the set refers to the same person or event (I, me). The items in a similarity chain “belong to the same general field of meaning, referring to actions, events, and objects and their attributes”. Cross-checking with the individual texts in the corpus, we can clearly see that they follow a narrative structure that shares characteristics with the structure described by Labov and Waletzky: abstract, orientation, complication, evaluation, resolution, coda.
In short, story-telling is the way that women and teenage girls perform their gender. A more critical analysis might argue that such discursive practices maintain and reproduce asymmetrical power relations in society, and that teenage girls' magazines are complicit in a process of discursively positioning their readership as the helpless and disempowered objects of male derision. At this point, the statistical data that a corpus approach delivers can serve to corroborate the findings of a more impressionistic approach, to confirm or disconfirm hunches, and to suggest new directions for further interrogation of the texts themselves. This cyclical alternation between counting and interpreting accurately characterises the application of corpus analysis to discourse.
5. What kind of data do you need to study discourse?
Obviously, discourse analysis needs texts: if not whole conversations, at least fairly long fragments. Most discourse analysis focuses on the textual features of a specific text-type, so sufficient examples of that type are needed to yield generalisable data. However, this does not mean the corpus has to be enormous. As Partington says, in a collection of texts of a similar type, the interactional processes and the contexts remain fairly constant. Accordingly, O'Keeffe claims that “specialised lexis and structures are likely to occur with more regular patterning and distribution even with relatively small amounts of data”.
One corpus of this kind, targeting a specific text-type and register, is the Cringe Text Corpus. Another small, specific corpus compiled from texts available on the internet contains 24,000 words from an academic journal: 139 texts of about 174 words each. For descriptive and pedagogical purposes, a corpus this size can give us enough information. For a more specific study, some form of tagging (grammatical, semantic or phonological) is almost a requirement. Nevertheless, Baker says that corpus builders first have to think about what type of questions they want their corpus to answer, and then decide whether particular forms of tagging are necessary.
The advantage of a small and homogeneous corpus is that the context is precise and specific, and this is crucial if the goal is the study of discourse-as-language-in-context or the investigation of discourse-as-social-practice. Mahlberg notes that ‘the way in which an analysis of corpus data can be related to social situations depends on the information that is available on the origins and context of the texts’. Moreover, ‘if the texts in a corpus are selected according to transparent criteria and information on their contexts is stored together with the texts, corpora can provide useful insights into meanings that are relevant to a society and indicative of the ways in which society created itself’.
Developments in the application of complex systems theory to language acquisition and use suggest that we are experiencing the conjunction of two disciplines, corpus linguistics and psycholinguistics, which used to work in parallel even though both were concerned with frequency effects and the importance of usage. The association of these two fields has influenced, for instance, Ellis and Lu's research on formulaic expressions. Further investigation may also demonstrate that the frequency of occurrence of specific discourses, and variations in their texture, both influence and are influenced by the performance of these discourses, by individuals and across whole socio-cultural groups.






