Multilinguals and Wikipedia Editing
Scott A. Hale
Oxford Internet Institute
http://www.scotthale.net/pubs/?websci2014
25 June 2014
Scott A. Hale, @computermacgyve Multilinguals and Wikipedia Editing
Background, Motivations
Wikipedia is a global platform covering hundreds of languages
despite evidence of balkanization (Taneja & Wu, in press)
Past studies generally concentrate on one edition (usually English)
Important variations across languages
Content is diverse across languages (Hecht & Gergle, 2010)
Each edition of Wikipedia shows a self-focus bias with more articles
about regions where the language is spoken (Hecht & Gergle, 2009)
Multilingual users may act as unconscious translators bridging language
divides (Herring et al., 2007; Eleta & Golbeck, 2012)
Related work
Why edit Wikipedia in a foreign language?
Increased audience size (Crystal, 2003; Zuckerman, 2013)
In a survey in Uzbekistan, Internet users reported accessing content in
foreign languages even while simultaneously reporting poor foreign
language skills (Wei & Kolko, 2005)
Editors of many editions of Wikipedia come from a wide variety of
timezones suggesting that bilingual editors are present (Yasseri et al.,
2012)
In a survey of editors, half of all editors reported editing in multiple
languages and 72% reported reading more than one language edition of
Wikipedia.†
† https://meta.wikimedia.org/w/index.php?title=Editor_Survey_2011/Location_%26_Language&oldid=8409990
Hypotheses
1 Most editors will edit only one language edition
2 Multilingual users will edit different articles than monolingual users
3 When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
Data
All edits to any of the top 46 language editions (all editions with at
least 100,000 articles)
Recorded via the IRC stream
(code at http://www.scotthale.net/pubs/?websci2014)
32 days (8 July to 9 August 2013)
Edit meta-data
datetime
edition
article title
username
size of edit
flags (minor, bot, etc.)
Data cleaning
Non-minor edits by registered, human users to articles
Only edits to main (article) namespace
Removed articles flagged as being created by ‘bots’
Removed anonymous users
Removed undeclared bots and users with only one edit session in the
month
Require at least four edits and at least 2 edits to one edition
Matching users and articles across languages
Look for common usernames across language editions
Check usernames are indeed linked global accounts
WikiData dump to match articles across languages
55,568 users with a total of 3,518,955 edits (excluding the Simple English
edition).
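The filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the author's published code: the `Edit` record and its field names are hypothetical, anonymous edits are assumed to carry an empty username, and the edit-session filter, undeclared-bot detection, and global-account check are left out.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """One edit record; field names are illustrative, mirroring the meta-data list."""
    timestamp: str      # datetime of the edit as a string
    edition: str        # language code, e.g. "en"
    title: str          # article title
    username: str       # assumed empty for anonymous edits in this sketch
    size: int           # size of the edit
    minor: bool = False
    bot: bool = False
    namespace: int = 0  # 0 = main (article) namespace


def clean_edits(edits, min_total=4, min_per_edition=2):
    """Keep non-minor, non-bot, registered-user edits to articles, then keep
    users with >= min_total edits overall and >= min_per_edition in some edition."""
    kept = [e for e in edits
            if not e.minor and not e.bot and e.username and e.namespace == 0]
    per_user = defaultdict(Counter)
    for e in kept:
        per_user[e.username][e.edition] += 1
    active = {u for u, c in per_user.items()
              if sum(c.values()) >= min_total and max(c.values()) >= min_per_edition}
    return [e for e in kept if e.username in active]
```

The two thresholds default to the values on the slide (four edits in total, two in one edition) and are exposed as parameters so other cut-offs can be tried.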
Data summary
Language    Edits      Articles  Users   NP users  NP edits
English     1,389,647  518,405   27,476  18%       3%
German        256,495  125,647    5,967  18%       2%
French        250,828  106,027    4,549  25%       3%
Spanish       191,934   66,848    4,338  24%       3%
Russian       239,267   92,326    3,961  16%       1%
Japanese      106,848   56,406    3,551  11%       2%
Italian       160,191   69,534    2,919  25%       2%
Chinese       112,888   42,937    2,309  14%       1%
Portuguese     67,505   32,753    1,730  29%       4%
Dutch          80,535   39,463    1,500  33%       3%
Polish         67,038   37,393    1,454  30%       3%
Top language editions: The Users column includes all users who edited the edition
during the data collection period. A percentage of these users (NP users) are
non-primary users who edited a different language edition more frequently.
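One way the NP users column could be derived is sketched below, assuming the cleaned data is available as `(username, edition)` pairs with one pair per edit. The function names and the alphabetical tie-break are assumptions made for illustration; the slides do not say how ties between editions were handled.

```python
from collections import Counter, defaultdict


def primary_editions(edit_pairs):
    """Map each user to the edition they edited most often.

    edit_pairs: iterable of (username, edition) tuples, one per edit.
    Ties are broken alphabetically by edition code (an arbitrary choice).
    """
    counts = defaultdict(Counter)
    for user, edition in edit_pairs:
        counts[user][edition] += 1
    return {u: min(c, key=lambda ed: (-c[ed], ed)) for u, c in counts.items()}


def non_primary_share(edit_pairs):
    """Per edition: fraction of its users whose primary edition is a different one."""
    edit_pairs = list(edit_pairs)
    primary = primary_editions(edit_pairs)
    users_by_edition = defaultdict(set)
    for user, edition in edit_pairs:
        users_by_edition[edition].add(user)
    return {ed: sum(primary[u] != ed for u in users) / len(users)
            for ed, users in users_by_edition.items()}
```

Multiplying the returned fractions by 100 gives percentages comparable to the NP users column.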
Multilinguals vs Monolinguals
15.4% of users (8,544) edited multiple language editions.
Figure: Density plot comparing the number of edits made by monolingual and
multilingual Wikipedia users.
What do multilinguals edit?
Only 2.6% of edits are from users writing in their non-primary languages.
44% of the articles edited by multilingual users in their non-primary languages were not edited by any monolingual user.
Figure: 2D density plot of the number of multilingual users editing articles in a non-primary language against the number of monolingual users editing the articles.
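A statistic like the 44% figure can be computed along the following lines. This is a sketch under assumptions: edits are given as `(username, edition, title)` tuples, a user's primary edition is the one they edited most (ties broken alphabetically), and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def share_without_monolingual_editors(edits):
    """Among articles edited by multilingual users outside their primary edition,
    return the fraction that no monolingual user edited.

    edits: iterable of (username, edition, title) tuples, one per edit.
    """
    edits = list(edits)
    per_user = defaultdict(Counter)
    for user, edition, _ in edits:
        per_user[user][edition] += 1
    # Primary edition = most-edited edition; ties broken alphabetically (arbitrary).
    primary = {u: min(c, key=lambda ed: (-c[ed], ed)) for u, c in per_user.items()}
    multilingual = {u for u, c in per_user.items() if len(c) > 1}

    np_articles = set()    # (edition, title) edited by a multilingual user in a non-primary edition
    mono_articles = set()  # (edition, title) edited by at least one monolingual user
    for user, edition, title in edits:
        if user not in multilingual:
            mono_articles.add((edition, title))
        elif edition != primary[user]:
            np_articles.add((edition, title))
    if not np_articles:
        return 0.0
    return len(np_articles - mono_articles) / len(np_articles)
```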
What do multilinguals edit?
Histogram showing the distribution with which multilingual users edited articles in
other languages that they also edited in their primary languages. The distribution is
bimodal. A large number of users did not edit any of the same articles in their
primary languages, but a large number of users always edited the same articles in
their primary languages.
What do multilinguals edit?
Histogram showing the distribution with which multilingual users edited articles in
other languages that they also edited in their primary languages after removing
edits to articles that do not exist in users’ primary languages.
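The per-user quantity behind both histograms might be computed as in the sketch below. It assumes each edit is already mapped to a language-independent article identifier (for example a Wikidata item ID); `item_id`, `exists_in`, and the function name are illustrative and not taken from the slides.

```python
from collections import Counter, defaultdict


def overlap_fractions(edits, exists_in=None):
    """Per multilingual user: fraction of articles edited in non-primary
    editions that the user also edited in their primary edition.

    edits: iterable of (username, edition, item_id); item_id is a
           language-independent article identifier.
    exists_in: optional dict item_id -> set of editions that have the article.
           When given, articles missing from the user's primary edition are skipped.
    """
    counts = defaultdict(Counter)
    items = defaultdict(lambda: defaultdict(set))  # user -> edition -> item ids
    for user, edition, item in edits:
        counts[user][edition] += 1
        items[user][edition].add(item)

    fractions = {}
    for user, c in counts.items():
        if len(c) < 2:
            continue  # monolingual user
        primary = min(c, key=lambda ed: (-c[ed], ed))  # ties: alphabetical (arbitrary)
        foreign = set().union(*(s for ed, s in items[user].items() if ed != primary))
        if exists_in is not None:
            foreign = {i for i in foreign if primary in exists_in.get(i, set())}
        if foreign:
            fractions[user] = len(foreign & items[user][primary]) / len(foreign)
    return fractions
```

Calling it without `exists_in` corresponds to the first histogram; passing `exists_in` corresponds to the variant that drops articles with no counterpart in the user's primary language.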
Variations by language
Scatter plot of language size (number of unique users) and percentage of users who
are multilingual (edit more than one language edition). The three editions with fewer
than 10 users in the sample are omitted (Uzbek, Cebuano, and Waray-Waray).
Language crossings
Figure: Co-editing network graph
Node labels: ar, bg, ca, cs, da, de, en, es, fa, fi, fr, he, hu, id, it, ja, ko, nl, no, pl, pt, ro, ru, sv, tr, uk, zh
Nodes represent language editions
Directed, weighted edges show the log of the number of users primarily editing one language edition who edited another edition
Only edges with weights over 1.96 standard deviations above the mean are shown
Colors indicate communities found by the Infomap community detection algorithm
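The edge construction and threshold described above can be sketched as follows. Several details are assumptions because the slide does not specify them: the natural logarithm, the population standard deviation, and a mean taken only over language pairs with at least one crossing user. Community detection with Infomap is not reproduced here.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pstdev


def crossing_edges(edit_pairs, z=1.96):
    """Return {(primary_edition, other_edition): log(user count)} for edges
    whose weight exceeds mean + z * standard deviation of all edge weights.

    edit_pairs: iterable of (username, edition) tuples, one per edit.
    """
    counts = defaultdict(Counter)
    for user, edition in edit_pairs:
        counts[user][edition] += 1

    users_on_edge = Counter()
    for user, c in counts.items():
        primary = min(c, key=lambda ed: (-c[ed], ed))  # ties: alphabetical (arbitrary)
        for edition in c:
            if edition != primary:
                users_on_edge[(primary, edition)] += 1

    weights = {edge: math.log(n) for edge, n in users_on_edge.items()}
    if len(weights) < 2:
        return {}
    cutoff = mean(weights.values()) + z * pstdev(weights.values())
    return {edge: w for edge, w in weights.items() if w > cutoff}
```

The surviving edges could then be loaded into a graph library for layout and community detection.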
Language crossings (English removed)
Figure: Co-editing network graph
Node labels: ca, cs, de, es, fr, it, ja, nl, pl, pt, ru, sv, uk, zh
Nodes represent language editions
Directed, weighted edges show the log of the number of users primarily editing one language edition who edited another edition
Only edges with weights over 1.96 standard deviations above the mean are shown
Colors indicate communities found by the Infomap community detection algorithm
Hypotheses
1 Most editors will edit only one language edition
2 Multilingual users will edit different articles than monolingual users
Ö When a user edits an article in another language that same user will
usually also edit the corresponding article in his native language
4 Users writing primarily in smaller-sized language editions will be more
likely to cross language boundaries than users writing primarily in
larger-sized language editions
5 Larger-sized language editions, English chief among them, will be more
likely to have contributions from editors of different languages than
smaller-sized language editions
Simple English
No big changes if Simple English edition is considered
Largest editor overlap with English edition
Dedicated group of editors:
45% of editors editing Simple most frequently do not edit any other
edition (similar to Esperanto)
Comparison with Twitter
Similar percentage of multilingual users (11% on Twitter)
Similar correlation between activity level and multilingualism
Language size not correlated with multilingualism on Twitter;
some language consistencies (Japanese, English) and some variations
Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter Network.
http://www.scotthale.net/pubs/?chi2014
Implications and future directions
Implications
Multilingual users found in all editions; correlation with activity
Design for multilingual users (universal language selector and global accounts are already progress in this direction)
Important per-language variations
Inverse correlation between multilingual users and self-focus bias as measured by Hecht & Gergle (2009)
Further work
Move from edit meta-data to edit content itself
What type of edits are users making in non-primary languages?
Variations by topic/theme?
Correlations with link/image overlap?
Viewing vs. editing behavior (survey results show a much higher percentage of users read multiple editions)
I would like to thank Eric T. Meyer, Taha Yasseri, Jonathan Bright, and Mike Thelwall as
well as the anonymous reviewers who provided helpful comments on previous versions of
this research.
Crystal, D. (2003). English as a Global Language (2nd ed.). Cambridge:
Cambridge University Press.
Eleta, I., & Golbeck, J. (2012). Bridging Languages in Social Networks:
How Multilingual Users of Twitter Connect Language Communities.
Proceedings of the American Society for Information Science and
Technology, 49(1), 1–4. Available from
http://dx.doi.org/10.1002/meet.14504901327
Hale, S. A. (2014). Global Connectivity and Multilinguals in the Twitter
Network. In Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems (pp. 833–842). New York, NY, USA: ACM.
Available from http://doi.acm.org/10.1145/2556288.2557203
Hecht, B., & Gergle, D. (2009). Measuring Self-Focus Bias in
Community-Maintained Knowledge Repositories. In Proceedings of the
Fourth International Conference on Communities and Technologies (pp.
11–20). New York, NY, USA: ACM. Available from
http://doi.acm.org/10.1145/1556460.1556463
Hecht, B., & Gergle, D. (2010). The Tower of Babel Meets Web 2.0:
User-Generated Content and Its Applications in a Multilingual Context.
In Proceedings of the 28th International Conference on Human Factors
in Computing Systems (pp. 291–300). New York, NY, USA: ACM.
Available from http://doi.acm.org/10.1145/1753326.1753370
Herring, S. C., Paolillo, J. C., Ramos-Vielba, I., Kouper, I., Wright, E.,
Stoerger, S., et al. (2007). Language Networks on LiveJournal. In
Proceedings of the 40th Annual Hawaii International Conference on
System Sciences. Washington, DC, USA: IEEE Computer Society.
Available from http://dx.doi.org/10.1109/HICSS.2007.320
Wei, C. Y., & Kolko, B. E. (2005). Resistance to Globalization: Language
and Internet Diffusion Patterns in Uzbekistan. New Review of
Hypermedia and Multimedia, 11(2), 205–220.
Yasseri, T., Sumi, R., & Kertész, J. (2012). Circadian Patterns of Wikipedia
Editorial Activity: A Demographic Analysis. PLoS ONE, 7(1), e30091.
Available from
http://dx.doi.org/10.1371%2Fjournal.pone.0030091
Zuckerman, E. (2013). Rewire: Digital Cosmopolitans in the Age of
Connection. London: W. W. Norton & Company.