Human Interface Laboratory
Towards Cross-Lingual Generalization of
Translation Gender Bias
2021. 3. 9 @FAccT Conference
Won Ik Cho*, Jiwon Kim*, Jaeyoung Yang, Nam Soo Kim
Contents
• Translation gender bias
 What is the problem and why does it matter?
 In which language pairs is it significant? (Efforts so far)
• Our approach
 Language pairs and template
 Dataset construction
 Measurement of fluency and biasedness
• Discussion
 Results and analysis
 Takeaways
Bias
• Bias in machine learning?
 Bias and variance
• Overfitting and underfitting
 Bias from the viewpoint of fair machine learning?
• Problem of individuality and context rather than of
statistics and system (Binns, 2017)
 Is the bias in machine learning related to the bias in fair machine
learning and to real social bias?
• e.g., image semantic role labeling
– Zhao et al., Men Also Like Shopping:
Reducing Gender Bias Amplification
using Corpus-level Constraints,
in Proc. EMNLP, 2017.
• This also happens in translation!
Bias
• What does (social) bias look like in AI and NLP?
 Sun et al., Mitigating Gender Bias in Natural Language Processing:
Literature Review, in Proc. ACL, 2019.
Overview: Gender bias in translation?
• Formulation #1
 Gender-neutral pronouns
• Target problem?
 Translation of gender-neutral pronouns to gender-specific ones
• Gender-neutral pronoun
 Pronouns that do not display
biological gender
 Frequently appear in languages
like Korean, Japanese, Turkish, ...
 Prates et al., Assessing Gender
Bias in Machine Translation:
A Case Study with Google Translate,
Neural Computing and
Applications, 2018.
Overview: Gender bias in translation?
• Formulation #2
 Gendered languages
• Target problem?
 Translation of expressions without
gender representation to gendered items
• Gendered languages
 Grammatical gender in articles,
nouns, adjectives
 Differs from biological gender
 Vanmassenhove et al.,
Getting Gender Right in
Neural Machine Translation,
in Proc. EMNLP, 2018.
Overview: Gender bias in translation?
• Why do they matter?
 The result can be offensive to end users
• When do they matter?
 Whether or not the users are familiar with the target/source language
• Who will potentially feel offended?
 Especially if the mistranslation involves social stereotypes
• Research questions
 How can the evaluation incorporate various aspects of translation gender
bias?
 How will grammatical properties and resource conditions influence the bias
issue?
Template-based attacks
• 걔(s/he)는 [##]이야!
 Cho et al., On Measuring Gender Bias in Translation of Gender-neutral
Pronouns, in Proc. GeBNLP, ACL Workshop, 2019.
• Why Korean?
 Displays various sentence styles
 Translation services are
popular among users
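The template idea above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the slot fillers below are hypothetical examples rather than the original lexicon, and they were picked to end in a final consonant so that the copula ending 이야 stays grammatical.

```python
# Template from the slide: "걔(s/he)는 [##]이야!", where 걔 is a gender-neutral pronoun.
TEMPLATE = "걔는 {slot}이야!"

# Hypothetical slot fillers with English glosses (assumption, not the original word list).
FILLERS = {
    "선생님": "teacher",
    "경찰": "police officer",
    "군인": "soldier",
}


def build_probes(template, fillers):
    """Fill the slot with each item and keep the gloss next to the sentence."""
    return [(template.format(slot=ko), gloss) for ko, gloss in fillers.items()]


probes = build_probes(TEMPLATE, FILLERS)
for sentence, gloss in probes:
    print(sentence, "| gloss: s/he is a", gloss)
```

Each generated sentence would then be sent to an MT system, and the pronoun chosen in the output (he, she, or a neutral form) recorded for analysis.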
Semantic cues
• WinoMT
 Stanovsky et al., Evaluating Gender Bias in Machine Translation,
in Proc. ACL, 2019.
 Performance can differ within
a similar language family
 High accuracy in the stereotypical
case does not guarantee
accuracy in the counterpart
Our approach
• Combined approach – Cross-lingual evaluation?
 Two different types of translation gender bias
• Different types of gender bias can be observed in a single translation system
Our approach
• Language pairs
 Different types of ‘language pairs’ should be investigated
• 1) Languages with gender-neutral pronouns (GNPs)
• 2) Languages without (colloquially used) GNPs that are not grammatically gendered
• 3) Languages with grammatical gender
Our approach
• Language pairs
 Linguistic criteria: GNPs and grammatical genderedness
• Korean/Tagalog -> English -> German/Portuguese
 Resource criteria (higher-resource > lower-resource)
• Korean > Tagalog
• German > Portuguese
Our approach
• Template
 How can the pairs be simultaneously evaluated?
Korean: 그 여자에 대해 하나 말하자면, 걔는 초등학교 선생님이야.
Tagalog: Isang bagay tungkol sa babae, siya ay isang guro sa elementarya.
English: One thing about the woman, she is an elementary school teacher.
German: Eine Sache über die Frau, sie ist eine Grundschullehrerin.
Portuguese: Um facto sobre a mulher, ela é professora do ensino primário.
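One way to evaluate all pairs at once is to keep the aligned sentences in a single record and derive the translation directions from it. The sketch below assumes the directions KO -> EN, TL -> EN, EN -> DE, and EN -> PT implied by the language-pair slide; the field names and the `make_pairs` helper are hypothetical, not part of the original work.

```python
# Aligned example sentences taken from the slide, keyed by language code.
ALIGNED = {
    "ko": "그 여자에 대해 하나 말하자면, 걔는 초등학교 선생님이야.",
    "tl": "Isang bagay tungkol sa babae, siya ay isang guro sa elementarya.",
    "en": "One thing about the woman, she is an elementary school teacher.",
    "de": "Eine Sache über die Frau, sie ist eine Grundschullehrerin.",
    "pt": "Um facto sobre a mulher, ela é professora do ensino primário.",
}

# Assumed translation directions: GNP languages into English, English into gendered languages.
DIRECTIONS = [("ko", "en"), ("tl", "en"), ("en", "de"), ("en", "pt")]


def make_pairs(aligned, directions):
    """Turn one aligned record into (source, reference) items, one per direction."""
    return [
        {
            "src_lang": src,
            "tgt_lang": tgt,
            "source": aligned[src],
            "reference": aligned[tgt],
        }
        for src, tgt in directions
    ]


pairs = make_pairs(ALIGNED, DIRECTIONS)
for item in pairs:
    print(item["src_lang"], "->", item["tgt_lang"], ":", item["source"])
```

Because English is both a target and a source, the same record covers the pronoun-resolution case and the grammatical-gender case.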
Our approach
• Evaluation
 Template-based evaluation set construction
 Inference with public MT modules
 Human evaluation (gender-related) and automatic metrics (fluency)
Our approach
• Measurement
 Biasedness
• Accuracy on biological gender
• Accuracy on grammatical gender
• Disparate impact
– Accuracy on female cases
divided by accuracy on male cases
 Fluency
• BLEU
– EN, DE, PT
• BERTScore
– Multilingual BERT
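The disparate impact measure defined above (accuracy on female cases divided by accuracy on male cases) can be computed as in the following minimal sketch. The judgement lists are made-up toy values for illustration only, not results from the study, and the function names are hypothetical.

```python
def accuracy(judgements):
    """Share of translations judged correct; judgements is a list of booleans."""
    if not judgements:
        raise ValueError("no judgements given")
    return sum(judgements) / len(judgements)


def disparate_impact(female_judgements, male_judgements):
    """Accuracy on female cases divided by accuracy on male cases."""
    acc_male = accuracy(male_judgements)
    if acc_male == 0:
        raise ValueError("accuracy on male cases is zero; ratio is undefined")
    return accuracy(female_judgements) / acc_male


# Toy human judgements (hypothetical): True means the gender was translated correctly.
female = [True, True, False, False]
male = [True, True, True, True]

ratio = disparate_impact(female, male)
print("disparate impact:", ratio)
```

A ratio close to 1 indicates similar accuracy for both genders, while a ratio well below 1 indicates that female cases are translated correctly less often than male cases.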
Results and analysis
• Results
 More bias-related errors in EN -> DE/PT than in KO/TL -> EN
• She is a game programmer -> Sie ist ein professioneller Spieler (masculine form, "a professional player")
• aviador, soldado, monge (airman, soldier, monk)
• Exceptional cases for Bing KO-EN
Results and analysis
• Analysis
 Unbiasedness/Disparate impact
• Higher among type 1 languages
– DE, PT < KO, TL (overall)
• In the same type, resource seems
to matter
– DE < PT, KO < TL
 Fluency measurement
• Lexical and semantic approaches give different results
– BLEU (lexical): DE > PT > KO, TL
– BERTScore (semantic): DE < PT, KO < TL
 Observations
• A larger amount of available language resources (only assumed here, since
public MT modules are used) does not guarantee unbiased translation, although
fluency measures may be higher in some respects
• The fluency measures relate differently to the evaluation of gender-related
inference
Takeaways
• Translation gender bias is problematic since wrong results can be
offensive to end users
• Translation gender bias matters regardless of the user's proficiency
in the language, and is especially offensive if the mistranslation
involves social stereotypes
• Our approach, including template and measurement, can unify
the evaluation of translation gender bias across various language
pairs
• Our evaluation results suggest that inductive bias in the form of social
stereotypes is a major factor causing the errors, and augmenting
training corpora may not be a solution
References (in order of appearance)
• Binns, Reuben. "Fairness in Machine Learning: Lessons from Political Philosophy." arXiv preprint
arXiv:1712.03586 (2017).
• Zhao, Jieyu, et al. "Men Also Like Shopping: Reducing Gender Bias Amplification Using
Corpus-level Constraints." arXiv preprint arXiv:1707.09457 (2017).
• Sun, Tony, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza,
Elizabeth Belding, Kai-Wei Chang, and William Yang Wang. "Mitigating Gender Bias in Natural
Language Processing: Literature Review." In Proceedings of the 57th Annual Meeting of the
Association for Computational Linguistics, pp. 1630-1640. 2019.
• Prates, Marcelo O. R., Pedro H. Avelar, and Luís C. Lamb. "Assessing Gender Bias in Machine
Translation: A Case Study with Google Translate." Neural Computing and Applications (2018):
1-19.
• Vanmassenhove, Eva, Christian Hardmeier, and Andy Way. "Getting Gender Right in Neural
Machine Translation." In Proceedings of the 2018 Conference on Empirical Methods in Natural
Language Processing, pp. 3003-3008. 2018.
• Cho, Won Ik, et al. "On Measuring Gender Bias in Translation of Gender-neutral Pronouns."
GeBNLP 2019 (2019): 173.
• Stanovsky, Gabriel, Noah A. Smith, and Luke Zettlemoyer. "Evaluating Gender Bias in Machine
Translation." arXiv preprint arXiv:1906.00591 (2019).
Thank you!