DIVING INTO THE WORLD OF SPELL CHECKS!
Niharika Krishnan
Cloud Community Days Conference - 19 June 2020
Before we get started…
NIHARIKA KRISHNAN
➢ Machine Learning Engineer, TCS
○ Build Chatbots for a living!
➢ Founding Member of PyLadies Chennai
○ Community of 100+ women tech enthusiasts
➢ Speaker
○ PyCon Canada’19, India’19
○ Google Women Techmakers, Global Diversity CFP
➢ AI and NLP enthusiast
~20 emails in a day
269 Billion Emails in 2019
Python Packages
>>> from nltk.metrics import edit_distance
>>> edit_distance("rain", "shine")
3
>>> from textblob import TextBlob
>>> b = TextBlob("I havv goood speling!")
>>> print(b.correct())
I have good spelling!
>>> from spellchecker import SpellChecker  # installed as "pyspellchecker"
>>> spell = SpellChecker()
>>> misspelled = spell.unknown(["cmputr", "study", "watr"])  # "study" is a known word
>>> for word in misspelled:  # a set, so the order of the results may vary
...     print(spell.correction(word))
...     print(spell.candidates(word))
...
computer
{'caput', 'caputs', 'compute', 'computor', 'impute', 'computer'}
water
{'water', 'watt', 'warr', 'wart', 'war', 'wath', 'wat'}
What happens under the hood?
Spell-checks
➢ Spell checker: points to spelling errors and possibly suggests alternatives
➢ Autocorrector: automatically picks the most likely word
➢ Types:
○ PHONETICS
○ EDIT DISTANCE (Peter Norvig)
○ SYMMETRIC DELETE SPELLING CORRECTION (SymSpell)
➢ Non-word errors (the typo is not a valid word, e.g. "speling") vs real-word errors (the typo is itself a valid word, e.g. "there" for "their")
Phonetics
➢ Detects similar-sounding words even if they are spelt differently, like Smith & Schmidt
➢ Creates a specific phonetic representation of a single word
➢ Algorithms:
○ SOUNDEX
○ METAPHONE
Soundex
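To make the phonetic idea concrete, here is a minimal sketch of American Soundex written from the commonly described rules (keep the first letter, map consonants to digits, collapse repeated codes, pad to four characters). The function name `soundex` and the table `CODES` are illustrative, not taken from any particular package.

```python
# Consonant groups and their Soundex digits; vowels, y, h and w carry no code.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return a 4-character Soundex code (illustrative sketch)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (encoded + "000")[:4]


print(soundex("Smith"), soundex("Schmidt"))
```

Both names map to the same code, which is why a phonetic matcher treats them as the same-sounding word.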
Edit Distance
➢ Quantifies how dissimilar two strings are to one another
➢ Minimum number of edit operations required to transform s1 into s2
○ Insertion, Deletion
○ Substitution, Transposition
Algorithms
➢ LEVENSHTEIN
○ Insertion + Deletion + Substitution
○ RECIEVE → RECEIVE: edit distance = 2
○ RECEIVE → RECEIPT: edit distance = 2
○ Same distance, yet the second pair differs in meaning and context
➢ DAMERAU - LEVENSHTEIN
○ Insertion + Deletion + Substitution + Transposition
○ Transposition swaps two adjacent characters (RECIEVE → RECEIVE: edit distance = 1)
➢ LONGEST COMMON SUBSEQUENCE
○ Insertion + Deletion
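The first two algorithms can be sketched with one dynamic-programming table. The function below is an illustration, not the code of any library: `levenshtein` and its `transpositions` flag are names chosen here, and with the flag set it computes the restricted (optimal string alignment) form of Damerau-Levenshtein.

```python
def levenshtein(s1: str, s2: str, transpositions: bool = False) -> int:
    """Edit distance via dynamic programming (illustrative sketch)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # d[i][j] = distance between the first i chars of s1 and first j of s2
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2] and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(levenshtein("recieve", "receive"))                       # 2
print(levenshtein("recieve", "receive", transpositions=True))  # 1
print(levenshtein("receive", "receipt"))                       # 2
```

Counting the swap as a single edit is what lets the common "ie"/"ei" typo rank closer than an unrelated word at distance 2.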
Levenshtein vs Longest Common Subsequence
➢ Levenshtein (3 edits)
○ kitten → sitten (substitute "s" for "k")
○ sitten → sittin (substitute "i" for "e")
○ sittin → sitting (insert "g" at the end)
➢ Longest Common Subsequence (5 edits)
○ kitten → itten (delete "k" at 0)
○ itten → sitten (insert "s" at 0)
○ sitten → sittn (delete "e" at 4)
○ sittn → sittin (insert "i" at 4)
○ sittin → sitting (insert "g" at 6)
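The insert-and-delete-only count can be checked with a short sketch. It assumes the usual identity that this distance equals len(s1) + len(s2) minus twice the length of the longest common subsequence; `lcs_length` and `lcs_distance` are names made up for this example.

```python
def lcs_length(s1: str, s2: str) -> int:
    """Length of the longest common subsequence, one table row at a time."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)  # extend the common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one character
        prev = curr
    return prev[-1]


def lcs_distance(s1: str, s2: str) -> int:
    """Edits needed when only insertion and deletion are allowed."""
    return len(s1) + len(s2) - 2 * lcs_length(s1, s2)


print(lcs_length("kitten", "sitting"))    # 4 ("ittn")
print(lcs_distance("kitten", "sitting"))  # 5
```

Every substitution costs two edits here (one delete plus one insert), which is why the same word pair needs 5 edits instead of Levenshtein's 3.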
Algorithms
➢ HAMMING DISTANCE
○ SUBSTITUTION
○ Only applies to strings of the same length
➢ JARO
○ TRANSPOSITION + Matching Characters
○ Range [0,1] : 0 - Least Similar, 1 - Most Similar
➢ JARO-WINKLER
○ TRANSPOSITION + Matching Characters + Prefix
○ Uses a prefix scale ‘p’ that gives more favourable ratings to strings that match from the beginning, up to a set prefix length
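A minimal sketch of the three measures, written from their textbook definitions. The defaults `p=0.1` and `max_prefix=4` are the values commonly quoted for Jaro-Winkler and are assumptions here, as are all function names.

```python
def hamming(s1: str, s2: str) -> int:
    """Number of positions that differ; defined for equal lengths only."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(s1, s2))


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1 means identical."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half
    # a transposition each.
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix chars."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


print(hamming("karolin", "kathrin"))               # 3
print(round(jaro("MARTHA", "MARHTA"), 4))          # 0.9444
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Unlike the edit distances, Jaro and Jaro-Winkler are similarities: higher is better, and the prefix boost favours typos made late in a word.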
Symmetric Delete Spelling Correction
➢ Delete-only edit candidate generation
➢ 5-letter word at edit distance 3: about 3 million candidates with all edit types vs 25 with deletes only
➢ INSERTION: delete(dictionary entry, edit_distance) = input entry
○ Input "goa": delete(goal,1), delete(goat,1) = goa
➢ DELETION: dictionary entry = delete(input entry, edit_distance)
○ Input "goall": goal = delete(goall,1)
➢ SUBSTITUTION & TRANSPOSITION: delete(dictionary entry, edit_distance) = delete(input entry, edit_distance)
○ Input "goak": delete(goal,1), delete(goat,1) = delete(goak,1)
➢ 1 million times faster
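The delete-only lookup can be sketched in a few lines. This is a simplified illustration of the idea, not SymSpell's implementation: the names `deletes`, `build_index` and `candidates` are made up here, and the final step that verifies each candidate with a real edit distance is left out.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word: str, max_distance: int) -> set:
    """The word itself plus every string made by deleting up to max_distance chars."""
    results = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            results.add("".join(word[i] for i in keep))
    return results


def build_index(dictionary, max_distance: int):
    """Precompute the deletes of every dictionary term (done once, up front)."""
    index = defaultdict(set)
    for term in dictionary:
        for variant in deletes(term, max_distance):
            index[variant].add(term)
    return index


def candidates(word: str, index, max_distance: int) -> set:
    """Dictionary terms that share at least one delete variant with the input."""
    found = set()
    for variant in deletes(word, max_distance):
        found |= index.get(variant, set())
    return found


index = build_index(["goal", "goat"], max_distance=1)
print(sorted(candidates("goa", index, 1)))    # a letter is missing
print(sorted(candidates("goall", index, 1)))  # a letter is extra
print(sorted(candidates("goak", index, 1)))   # a letter is wrong
print(len(deletes("abcde", 3)) - 1)           # 25 delete variants
```

The lookup never generates inserts, substitutions or transpositions: those cases are covered because both sides have been reduced by deletes, which is where the small candidate count comes from.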
Symspellpy
➢ Verbosity parameter:
○ Top: the single suggestion with the smallest edit distance and, among those, the highest term frequency
○ Closest: all suggestions at the smallest edit distance found, ordered by term frequency
○ All: all suggestions within maxEditDistance, ordered by edit distance, then by term frequency
➢ maxEditDistance: maximum edit distance searched for suggestions
➢ Word frequency dictionary:
○ LoadDictionary
○ CreateDictionary (Customize it for your use-case!)
Let’s see how symspell works!
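As a rough illustration of the three verbosity levels, the sketch below ranks a hand-made list of suggestions. It does not call symspellpy: the `Suggestion` tuple, the `rank` function, the string values "top", "closest" and "all", and the sample words with their counts are all hypothetical and only mirror the ordering rules described for the verbosity parameter.

```python
from typing import List, NamedTuple


class Suggestion(NamedTuple):
    term: str
    distance: int  # edit distance from the input word
    count: int     # term frequency in the dictionary


def rank(suggestions: List[Suggestion], verbosity: str,
         max_edit_distance: int) -> List[Suggestion]:
    """Filter and order suggestions the way each verbosity level describes."""
    if verbosity not in ("top", "closest", "all"):
        raise ValueError(f"unknown verbosity: {verbosity}")
    within = [s for s in suggestions if s.distance <= max_edit_distance]
    # Smallest distance first, then highest frequency.
    ordered = sorted(within, key=lambda s: (s.distance, -s.count))
    if verbosity == "all" or not ordered:
        return ordered
    closest = [s for s in ordered if s.distance == ordered[0].distance]
    return closest if verbosity == "closest" else closest[:1]


# Hypothetical suggestions for the input "goak"
found = [Suggestion("gold", 2, 900), Suggestion("goat", 1, 300),
         Suggestion("goal", 1, 500), Suggestion("global", 3, 50)]

for level in ("top", "closest", "all"):
    print(level, [s.term for s in rank(found, level, max_edit_distance=2)])
```

With these made-up counts, "top" returns only "goal", "closest" returns "goal" then "goat", and "all" adds "gold" while "global" is dropped for exceeding the maximum edit distance.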
QUESTIONS
niharikakrishnan
linkedin.com/in/niharikakrishnan
@Nihaaarika
Slide Deck: https://github.com/niharikakrishnan/Talks
Want to explore further? Let’s connect!
