Going beyond simple word-list
creation using CasualConc
Yasuhiro IMAO
Osaka University, Japan
casualconc@gmail.com
AACL 2018 at Georgia State University, Atlanta GA
A few questions
How many of you are Mac users?
How many of you have used CasualConc?
A few observations
Through attending presentations / reading papers
Methods of analysis
Use of statistics
Depend heavily on
access to resources
the tools one uses
specialized applications
programming skills
someone who can write scripts
To advance the field
easier-to-use and more accessible tools are necessary
Current Situation
AntConc and Antxxx
and other smaller, specialized applications
WordSmith Tools / Monoconc Pro
The gold standard?
Introducing CasualConc
A little bit of background
I started developing a concordancer around 2005
I released the first, limited version around 2008
It is a Mac native app!
KWIC, Word/n-gram Lists, Collocation
AACL 2018 - Going Beyond Simple Word-list Creation Using CasualConc
Before going into more detail
Small Scale Corpus Research
Building your own specialized corpus
Possibly adding annotations (POS, syntactic, etc.)
Which tools to use?
A suggestion (not the answer)
I have developed a few companion apps
CasualTranscriber (transcription helper)
CasualTextractor (text extractor/editor)
CasualTagger (tagging helper)
CasualPConc (parallel concordancer)
CasualTagger
KWIC search + short-cut tag insertion
Batch Processing
Tagging (TreeTagger, Stanford CoreNLP, MeCab)
Tokenizing (macOS built-in tagger)
Sentence Splitting (macOS built-in tagger, Stanford CoreNLP)
Misspelling detection (non-dictionary words)
Compound vs. two-word variation detection
Finally, the main point!
Today’s highlights
Corpus file management - going beyond loading files
More informative word/n-gram lists
Individual file word/n-gram lists
More keyword extraction functions - going beyond LL/χ2
Visualizing frequency data - incl. multivariate analyses
Some other niche functions
Corpus File Management
Accept Drag & Drop!
Where were those files I want to use?
Of course, you can use Spotlight to search them, but…
If the application remembers where they are…
More Informative Word/n-gram List
Proportion, # of Files, % of Files
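The three extra columns named above can be illustrated with a short sketch. This is not CasualConc's implementation: the tokenizer, the function names, and the sample file names are assumptions made for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a crude lowercase tokenizer; real tools offer more options.
    return re.findall(r"[a-z']+", text.lower())


def word_list(files):
    """Build a word list from {file_name: text}.

    Each row is (word, frequency, proportion of all tokens,
    number of files containing the word, percentage of files).
    """
    total = Counter()
    file_count = Counter()
    for text in files.values():
        tokens = tokenize(text)
        total.update(tokens)
        file_count.update(set(tokens))  # count each file once per word
    n_tokens = sum(total.values())
    n_files = len(files)
    rows = [
        (w, f, f / n_tokens, file_count[w], 100 * file_count[w] / n_files)
        for w, f in total.items()
    ]
    rows.sort(key=lambda r: (-r[1], r[0]))  # by frequency, then alphabetically
    return rows


if __name__ == "__main__":
    sample = {"a.txt": "the cat sat", "b.txt": "the dog"}  # hypothetical files
    for row in word_list(sample):
        print(row)
```

The number and percentage of files show how widely a word is spread across the corpus, which raw frequency alone does not reveal.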
3-grams / 4-grams comparison
p-frames
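A p-frame (phrase frame) is an n-gram with one variable slot, such as "at the * of". The sketch below is a minimal illustration under stated assumptions: it opens only internal slots (not the first or last word), and the function name is hypothetical. How CasualConc itself defines or counts p-frames is not specified on the slides.

```python
from collections import Counter, defaultdict


def p_frames(tokens, n=4):
    """Map each p-frame to a Counter of the words that fill its slot.

    Assumption: only internal positions are treated as variable slots.
    """
    frames = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for slot in range(1, n - 1):
            frame = tuple(gram[:slot] + ["*"] + gram[slot + 1:])
            frames[frame][gram[slot]] += 1
    return frames


if __name__ == "__main__":
    text = "at the end of the day at the start of the day"
    result = p_frames(text.split(), n=4)
    # Show frames with more than one distinct filler.
    for frame, fillers in result.items():
        if len(fillers) > 1:
            print(" ".join(frame), dict(fillers))
```

Frames with many distinct fillers point to productive patterns that a plain n-gram list would split into separate low-frequency entries.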
Lemma / Spelling variations / Multi-word
Lemma / Spelling Variation
Multi-word Recognition
Multi Filters
Search Word List
Individual File Word/n-gram Lists
Individual File/Corpus Word/n-gram Lists
More keyword extraction functions
Sample
ICNALE - Writing
Written learner English corpus
College students in Asian countries/regions
JPN, CHN, HKG, IDN, KOR, PAK, PHI, SIN, THA, TWN
Two topics
Ave. 220-230 words
Simple Keyword Extraction (LL)
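For reference, log-likelihood (LL) keyness can be sketched as follows. This is a minimal illustration, not CasualConc's code: it uses the common two-corpus formulation based on expected frequencies, all function names are hypothetical, and the default threshold of 6.63 approximates the chi-square critical value for p < .01 with one degree of freedom.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """LL for a word occurring a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target, reference, threshold=6.63):
    """Return (word, LL) pairs overused in the target corpus, highest LL first."""
    c, d = sum(target.values()), sum(reference.values())
    result = []
    for word, a in target.items():
        b = reference.get(word, 0)
        ll = log_likelihood(a, b, c, d)
        if ll > threshold and a / c > b / d:  # keep positive keywords only
            result.append((word, ll))
    return sorted(result, key=lambda pair: -pair[1])


if __name__ == "__main__":
    target = Counter({"i": 30, "the": 70})     # hypothetical counts
    reference = Counter({"i": 5, "the": 95})   # hypothetical counts
    print(keywords(target, reference))
```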
Multiple Keyword Statistics
Key-2-gram Extraction
Key-3-gram Extraction
Going Beyond Simple Keyness
Multiple Keyness Comparisons
TF-IDF
(term frequency-inverse document frequency)
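TF-IDF has several variants, and the slides do not say which one CasualConc uses. The sketch below assumes one common form: relative term frequency multiplied by the natural log of (number of files / number of files containing the word). The function name and sample data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF from {file_name: list_of_tokens}.

    Assumption: tf = count / file length, idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        scores[name] = {
            w: (f / len(tokens)) * math.log(n_docs / doc_freq[w])
            for w, f in counts.items()
        }
    return scores


if __name__ == "__main__":
    sample = {"d1": ["a", "b"], "d2": ["a", "c"]}  # hypothetical files
    print(tf_idf(sample))
```

A word that occurs in every file scores zero under this variant, so the measure favors words that characterize individual files rather than the corpus as a whole.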
Mann-Whitney U test (using multiple files)
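One way to read "using multiple files" is that each file's relative frequency of a word is treated as one observation, and the two groups of files are compared by rank. That reading is an assumption, as are the function name and the sample values. The sketch computes only the U statistics, with average ranks for ties, and no p-value.

```python
def mann_whitney_u(x, y):
    """Return (U1, U2) for two samples, using average ranks for ties."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = rank_sum_x - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1
    return u1, u2


if __name__ == "__main__":
    # Hypothetical per-file relative frequencies of one word in two groups.
    group_a = [0.012, 0.015, 0.020]
    group_b = [0.004, 0.006, 0.013]
    print(mann_whitney_u(group_a, group_b))
```

Because the test works on per-file values, a word that is frequent in only one or two files does not dominate the result the way it can with pooled counts.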
Random Forest (1000 words)
Visualizing Frequency Data
CasualConc integrates
Word Cloud
Let’s go back to keyword analysis
Simple Keyword Extraction (LL)
Keyword analysis is done
How can you check the validity?
Let’s see how well they separate the groups
Use the LL result above 6.63 (p < .01)
Principal Component Analysis (PCA)
Cluster Analysis
What about different ability levels?
Principal Component Analysis (PCA)
Cluster Analysis
Heat Map
By the way, do you check raw data?
Not a lot of people really look at the data…
Let’s check the use of “I think”
Box Plot
Histogram
What about a word
that is too common to be looked at?
Box Plot
Histogram
Some other misc features…
Relative Position Plot (paragraph)
therefore / however
Japanese
Non-Japanese
ENS
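The data behind a relative position plot can be sketched as below. The slides do not define how position is calculated, so this assumes a simple scale from 0.0 (first token of the paragraph) to 1.0 (last token). The function name and sample paragraphs are hypothetical.

```python
def relative_positions(units, target):
    """Return the relative position (0.0 to 1.0) of each occurrence of
    target within its unit (a paragraph or sentence given as a token list)."""
    positions = []
    for tokens in units:
        n = len(tokens)
        for i, token in enumerate(tokens):
            if token.lower() == target:
                # Assumption: a one-token unit places the word at 0.0.
                positions.append(i / (n - 1) if n > 1 else 0.0)
    return positions


if __name__ == "__main__":
    paragraphs = [
        ["However", "it", "rains"],
        ["it", "rains", "however"],
    ]
    print(relative_positions(paragraphs, "however"))
```

Collecting these values separately for each group of writers gives the distributions that such a plot compares.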
Relative Position Plot (sentence)
therefore / however
Japanese
Non-Japanese
ENS
Vocab Profiler
Tokenizer
Proper Name Extraction
The Way Forward
Utilizing Stanford CoreNLP - dependency information
Grammatical Relationship Search
CasualConc
I just released version 2.1.0 with an updated manual
The manual is over 250 pages long and full of screenshots
It is freeware
Downloadable from
https://sites.google.com/site/casualconc
Or just google ‘casualconc’
