FOREIGN LANGUAGES
FOR HUMANS AND
COMPUTERS
Peter Zukerman
University of Illinois at
Urbana Champaign
Majoring in Computer Science
and Linguistics
Volunteer at CodeMash
Using the tools available to
us today, you could easily
become conversational in a
language in 5 months
Then why do 3-4 years of
study in high school or
college leave us barely
able to speak?
TOOLS
Anki – spaced repetition flashcards
(mobile, desktop)
iTalki or similar – live 1 on 1 lessons
and conversations with native
speakers (website)
Native Material – books, articles,
videos in the language
WHY ANKI?
1. Anki is a spaced repetition
flashcard system.
2. You judge how well you know a
flashcard, and Anki uses that to
pick the interval before it shows
the card again.
3. If you miss a word, it shows it
again. Otherwise it’s shown after
a variable delay.
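The scheduling idea above can be sketched in a few lines. This is a minimal illustration, not Anki's actual algorithm: the `Card` class, the `review` function, and the "double on success, reset to 1 day on a miss" rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Card:
    word: str
    interval_days: int = 1  # days until the next review


def review(card: Card, remembered: bool) -> Card:
    """Return a copy of the card with its next review interval.

    Hypothetical rule: double the interval when the word was remembered,
    reset it to 1 day when it was missed.
    """
    if remembered:
        return Card(card.word, card.interval_days * 2)
    return Card(card.word, 1)


card = Card("안녕하세요")
for remembered in (True, True, False, True):
    card = review(card, remembered)
    print(card.word, "-> next review in", card.interval_days, "day(s)")
```

Words you know drift further apart, while a missed word comes straight back, which is what keeps the daily review load manageable.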
5 MONTH PLAN
5000 word flashcard deck sorted by frequency (Anki)
1000 flashcard grammar deck (Anki)
Weekly lessons with a native speaker (iTalki)
15 new words a day × 150 days (5 months) = 2,250 words (Anki)
6 grammar points a day (Anki)
I tested out of Korean in college using this method!
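The arithmetic behind the plan can be checked quickly. The sketch below assumes 30-day months (the slide does not say), and `total_items` is a helper name made up for this example.

```python
DAYS_PER_MONTH = 30  # assumption: the slide does not define a month's length
MONTHS = 5


def total_items(per_day: int, months: int = MONTHS) -> int:
    """Items studied after `months` months at `per_day` new items a day."""
    return per_day * months * DAYS_PER_MONTH


words = total_items(15)   # new vocabulary cards
grammar = total_items(6)  # new grammar cards
print(f"words learned: {words} of the 5000-word deck")
print(f"grammar points covered: {grammar} of the 1000-card deck")
```

Under that assumption the plan covers the most frequent part of the vocabulary deck and most of the grammar deck.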
FOREIGN LANGUAGES AND
PROGRAMMING:
CHARACTER SETS AND
ENCODINGS
If you are a programmer […] and
you don’t know the basics of
characters, character sets,
encodings, and Unicode, and I
catch you, I’m going to punish
you by making you peel onions
for 6 months in a submarine
- Joel Spolsky, Cofounder of
StackOverflow.
ASCII
Numbers between 32 and 127
represent all the characters that
matter…
…to English speakers.
But what do you do with these:
鬱病, Путин, שלום, مرحبا, 😞
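A short way to see the problem is to try encoding the slide's examples as ASCII. `fits_in_ascii` is a helper name invented for this sketch; `str.encode` and `UnicodeEncodeError` are standard Python.

```python
samples = ["hello", "鬱病", "Путин", "שלום", "مرحبا", "😞"]


def fits_in_ascii(text: str) -> bool:
    """True if every character of `text` can be stored as an ASCII byte."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


for sample in samples:
    print(repr(sample), "fits in ASCII:", fits_in_ascii(sample))
```

Only the English word survives; every other sample needs a larger character set.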
UNICODE
•A worldwide standard
•A single unique character set supporting all
alphabets and other symbols
•Unicode has room for 1,114,112 code
points (numeric representations of
characters), from U+0000 to U+10FFFF
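In Python, `ord` and `chr` convert between a character and its code point, which makes the "numeric representation" idea concrete. `code_point` is a helper name made up for this sketch.

```python
import sys


def code_point(ch: str) -> str:
    """Format a single character's code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for ch in ["A", "П", "ש", "鬱", "😞"]:
    print(ch, code_point(ch))

# chr() goes the other way: from a number back to a character.
print(chr(0x41F))

# sys.maxunicode is the highest code point, so the code space is one larger.
print("code points available:", sys.maxunicode + 1)
```

Nothing here says how many bytes a character takes; a code point is just a number until an encoding is chosen.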
WHAT UNICODE IS NOT:
COMMON
MISCONCEPTIONS
Not a 2-byte character set
Not the same as UTF-32/16/8
Not tied to any particular byte
representation (encoding)
TYPES OF UNICODE ENCODING
UTF-32
• Each character is encoded in exactly 4 bytes
UTF-16
• Uses two bytes for most alphabets, and 4 bytes for less common ones
• Pro – less wasteful
• Con – some waste, incompatible with ASCII
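The sizes above can be measured directly. This sketch uses Python's little-endian codecs (`utf-16-le`, `utf-32-le`) so that no byte order mark is added to the counts; `encoded_size` is a helper name made up for the example.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


for ch in ["A", "П", "鬱", "😞"]:
    print(ch,
          "UTF-16:", encoded_size(ch, "utf-16-le"), "bytes,",
          "UTF-32:", encoded_size(ch, "utf-32-le"), "bytes")

# Why UTF-16 is incompatible with ASCII: even "A" gains an extra zero byte.
print("A".encode("ascii"), "A".encode("utf-16-le"))
```

The emoji is the case that breaks the "Unicode is 2 bytes" assumption: UTF-16 needs a pair of 2-byte units for it.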
UTF-8
•The most popular type of Unicode encoding
•It uses 1 byte for ASCII, 2 bytes for European
and Middle Eastern characters, and 3 or 4 bytes
for CJK and various non letters (emojis, math)
•Pro – backwards compatible with ASCII, not
wasteful
•Con – variable length
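The byte counts on this slide can be reproduced with `str.encode("utf-8")`. `utf8_size` is a helper name invented for this sketch.

```python
def utf8_size(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in ["A", "П", "ש", "م", "鬱", "😞"]:
    print(ch, utf8_size(ch), "byte(s)")

# Pro: pure ASCII text is byte-for-byte identical in UTF-8.
print("hello".encode("utf-8") == "hello".encode("ascii"))

# Con: character count and byte count diverge, so you cannot index by byte.
text = "Путин 😞"
print(len(text), "characters,", len(text.encode("utf-8")), "bytes")
```

The last line is the practical cost of a variable-length encoding: finding the n-th character means scanning from the start rather than jumping to a fixed offset.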
You can’t read text if
you don’t know how
it’s encoded
“There Ain’t No Such
Thing As Plain Text.”
-Joel Spolsky
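A small experiment shows what goes wrong when the reader guesses the encoding. The bytes below are written as UTF-8 and then decoded three ways; choosing Latin-1 and ASCII as the "wrong guesses" is an assumption made for illustration.

```python
text = "Путин"
raw = text.encode("utf-8")  # the bytes that would land in a file or on the wire

# Right encoding: the text comes back intact.
print(raw.decode("utf-8"))

# Wrong guess that still "works": Latin-1 maps every byte to some character,
# so the result is silent garbage rather than an error.
garbled = raw.decode("latin-1")
print(repr(garbled))

# Wrong guess that fails loudly: ASCII has no meaning for bytes above 127.
try:
    raw.decode("ascii")
except UnicodeDecodeError as err:
    print("ASCII decode failed:", err.reason)
```

The bytes never changed; only the assumption about their encoding did, which is why the encoding has to travel with the text.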
ABOUT ME
• Sophomore at University of Illinois at Urbana
Champaign
• Majoring in Computer Science and Linguistics
• Interested in Natural Language Processing
and Machine Learning

