Public Disclosure Authorized
© 2021 International Bank for Reconstruction and Development / The World Bank
1818 H Street NW, Washington, DC 20433
Telephone: 202-473-1000; Internet: www.worldbank.org
Some rights reserved
1 2 3 4 24 23 22 21
This work is a product of the staff of The World Bank with external contributions. The findings, interpretations,
and conclusions expressed in this work do not necessarily reflect the views of The World Bank, its Board of Executive
Directors, or the governments they represent. The World Bank does not guarantee the accuracy, completeness,
or currency of the data included in this work and does not assume responsibility for any errors, omissions, or
discrepancies in the information, or liability with respect to the use of or failure to use the information, methods,
processes, or conclusions set forth. The boundaries, colors, denominations, and other information shown on any
map in this work do not imply any judgment on the part of The World Bank concerning the legal status of any
territory or the endorsement or acceptance of such boundaries.
Nothing herein shall constitute or be construed or considered to be a limitation upon or waiver of the privileges
and immunities of The World Bank, all of which are specifically reserved.
Rights and Permissions
This work is available under the Creative Commons Attribution 3.0 IGO license (CC BY 3.0 IGO)
http://creativecommons.org/licenses/by/3.0/igo. Under the Creative Commons Attribution license, you are free to copy,
distribute, transmit, and adapt this work, including for commercial purposes, under the following conditions:
Attribution—Please cite the work as follows: World Bank. 2021. World Development Report 2021: Data for
Better Lives. Washington, DC: World Bank. doi:10.1596/978-1-4648-1600-0. License: Creative Commons
Attribution CC BY 3.0 IGO
Translations—If you create a translation of this work, please add the following disclaimer along with the attribution:
This translation was not created by The World Bank and should not be considered an official World Bank
translation. The World Bank shall not be liable for any content or error in this translation.
Adaptations—If you create an adaptation of this work, please add the following disclaimer along with the attribution:
This is an adaptation of an original work by The World Bank. Views and opinions expressed in the adaptation
are the sole responsibility of the author or authors of the adaptation and are not endorsed by the World Bank.
Third-party content—The World Bank does not necessarily own each component of the content contained within
the work. The World Bank therefore does not warrant that the use of any third-party-owned individual component
or part contained in the work will not infringe on the rights of those third parties. The risk of claims resulting from
such infringement rests solely with you. If you wish to re-use a component of the work, it is your responsibility
to determine whether permission is needed for that re-use and to obtain permission from the copyright owner.
Examples of components can include, but are not limited to, tables, figures, or images.
All queries on rights and licenses should be addressed to World Bank Publications, The World Bank Group,
1818 H Street NW, Washington, DC 20433, USA; e-mail: pubrights@worldbank.org.
ISSN, ISBN, e-ISBN, and DOI:
Softcover
ISSN: 0163-5085
ISBN: 978-1-4648-1600-0
e-ISBN: 978-1-4648-1601-7
DOI: 10.1596/978-1-4648-1600-0
Hardcover
ISSN: 0163-5085
ISBN: 978-1-4648-1607-9
DOI: 10.1596/978-1-4648-1607-9
Cover design: Dania Kibbi, Base Three Studio
Cover illustration: Mijke Coebergh
Interior design: George Kokkinidis, Design Language, Brooklyn, New York, with updates and additional figure
design by Dania Kibbi, Base Three Studio
Library of Congress Control Number: 2021935945
Contents | iii
xi Foreword
xiii Acknowledgments
xvii Abbreviations
1 Overview
3 Overview
3 Advancing development objectives through data
10 Aligning data governance with the social contract
16 Moving toward an integrated national data system
18 Notes
19 References
21 Part I: Advancing development objectives through data
23 Chapter 1: Harnessing the value of data for the poor
24 The untapped potential of data to serve development objectives
25 A brief history of data
27 A data typology
30 The economics of data and political economy issues
33 Data for development: A conceptual framework
38 A data governance framework to enforce the social contract for data
41 Putting it all together: Establishing an integrated national data system
42 Organization of this Report
43 Notes
44 References
48 Spotlight 1.1: Helping communities to gain the ability to collect and analyze their own data
50 Spotlight 1.2: The importance of good data in helping low- and middle-income countries to manage debt during and after the COVID-19 pandemic
53 Chapter 2: Data as a force for public good
54 The central role of public intent data
54 Public intent data and development: Three pathways for adding value
59 Gaps in the coverage, quality, and usability of public intent data
65 Why data gaps persist: The political economy of public intent data
71 Realizing the potential of public intent data
75 Notes
78 References
85 Spotlight 2.1: Deploying data to curtail violence against women and girls
89 Spotlight 2.2: The role of international organizations in improving public intent data
91 Chapter 3: Data as a resource for the private sector
92 Creating value and solving development challenges through data-driven business models
92 The role of data in the production process of firms
93 Data-driven businesses and the technologies that help them create value
95 Focus on platform firms in low- and middle-income countries
98 Data inputs for economic activity
100 The positive development impacts of data used in the production process
103 How use of data in the production process is transforming sectors
107 Some potential risks and adverse outcomes of data-driven businesses to be addressed by policy
111 Notes
113 References
117 Spotlight 3.1: The huge potential of open data for business applications
121 Chapter 4: Creative reuses of data for greater value
122 The power of repurposing and combining different types and sources of data
125 New insights from repurposing and combining data
134 Limitations in using private intent data for development
136 Investments in data innovations: Building a culture of data
143 Notes
144 References
150 Spotlight 4.1: Gathering, sharing, and using better data on weather, water, and climate from low- and middle-income countries
152 Spotlight 4.2: Making roads safer by repurposing private intent traffic data
155 Part II: Aligning data governance with the social contract
157 Chapter 5: Data infrastructure policy: Ensuring equitable access for poor people and poor countries
158 Data infrastructure as a source of inequity
160 Connecting poor people
168 Connecting poor countries
174 Conclusions and recommendations
178 Notes
180 References
184 Spotlight 5.1: How the COVID-19 pandemic has recalibrated expectations of reasonable data consumption and highlighted the digital divide
186 Spotlight 5.2: Data’s carbon footprint
189 Chapter 6: Data policies, laws, and regulations: Creating a trust environment
190 A trust framework of data safeguards and enablers
191 Building safeguards for trusted data use
199 Creating enablers for data sharing
206 Recommendations for crafting a holistic legal framework
208 A maturity model for strengthening the legal and regulatory framework
209 Notes
215 References
222 Spotlight 6.1: The evolving social contract on data: Balancing data sharing and data protection to facilitate contact tracing to control COVID-19
224 Spotlight 6.2: The debate over ownership of personal data
227 Chapter 7: Creating value in the data economy: The role of competition, trade, and tax policy
228 Shaping data regulation to support competition, trade, and taxation
229 Competition policy
237 Trade policy
246 Tax policy
250 Conclusion
250 Notes
254 References
259 Spotlight 7.1: Understanding the interface between data protection and competition policy
261 Spotlight 7.2: The role of regional and international cooperation in addressing data governance challenges
265 Chapter 8: Institutions for data governance: Building trust through collective action
266 How can institutions help govern data for development?
267 Data management across the data life cycle
269 Data governance functions
271 Mapping data governance functions to illustrative institutions
277 Data intermediation and collaboration
280 Making data governance institutions effective
284 Sustainable outcomes through inclusive multistakeholder governance
286 Assessing the institutional foundation through the lens of a maturity model
288 Notes
291 References
297 Spotlight 8.1: The need for a new global consensus on data: A call to action
299 Spotlight 8.2: Promoting citizen science in the Amazon basin
301 Part III: Moving toward an integrated national data system
303 Chapter 9: Creating an integrated national data system
304 Toward an integrated national data system
305 The vision of an integrated national data system
309 Realizing the vision
322 Notes
324 References
Boxes
O.1 18 Toward an integrated national data system: Country examples
1.1 24 What this Report means by data
1.2 29 Innovation in traditional surveys: A COVID-19 example in Brazil
1.3 32 The challenges of trading data through markets
1.4 36 Using private intent data to tackle COVID-19
2.1 55 Six types of public intent data
2.2 60 The World Bank’s Statistical Performance Indicators
2.3 61 Gender data and the COVID-19 pandemic
3.1 94 Technologies and methods that support data-driven decision-making and intermediation
4.1 122 Using cellphones to combat COVID-19
4.2 126 Leveraging private intent data to tackle COVID-19
4.3 131 Preventing illegal fishing in protected maritime areas
4.4 135 Using statistical methods and private intent data to improve representativeness and geospatial precision
5.1 174 The brain drain—ICT professionals
8.1 266 Uruguay’s whole-of-government approach to data governance
8.2 269 The importance and complexity of data governance institutions: The example of digital identification systems
8.3 279 Increased scrutiny of and constraints on private data intermediaries
8.4 286 Building multistakeholder data governance into smart city initiatives through “digital democracy”
9.1 304 Relationship between an integrated national data system and a national statistical system
Figures
O.1 4 How data can support development: A theory of change
O.2 6 A social contract for data founded on value, trust, and equity
O.3 11 Data governance layers at the national and international levels
O.4 13 The legal and regulatory framework for data governance remains a work in progress across all country income groupings
O.5 14 Since 1990, the global trade in data-driven services has grown exponentially and now constitutes half of trade in services
O.6 17 What happens in an integrated national data system?
1.1 26 The share of people counted in a census grew from about 1 in 10 in 1850 to 9 in 10 today
1.2 30 The data life cycle
1.3 34 Three pathways along which data can foster development
B1.4.1 36 Use of repurposed data to study COVID-19: Published articles, by type of private intent data used
1.4 39 Data governance layers at the national and international levels
1.5 41 Data flow safely across all stakeholders in an integrated national data system
S1.1.1 49 A citizen-led method to ascertain who has authority in household decision-making in rural Indian villages
S1.2.1 50 In six years, the composition of debt has shifted dramatically
2.1 55 Certain data features can maximize the value of public intent data
2.2 56 Improving access to water: Using real-time sensor data to reduce repair time for broken hand pumps in Kenya
2.3 60 Gaps in geospatial datasets are especially large in lower-income countries
B2.3.1 61 Proportion of COVID-19 cases reported with sex-disaggregated data for 190 countries
2.4 63 Lower-income countries, especially those affected by fragility and conflict, have less comparable poverty data than other country groups
2.5 64 Lower-income countries are less likely than other countries to adhere to international best-practice statistical standards and methodologies
2.6 65 A positive feedback loop can connect enablers and features of public intent data with greater development value
2.7 66 Most countries do not fully fund their national statistical plans
2.8 68 The older a country’s statistical laws, the lower is its statistical performance and the less open are its data
2.9 69 Greater NSO independence and freedom of the press are positively correlated with better statistical performance
2.10 70 Data supply and demand can generate either virtuous or vicious cycles of data production and use
2.11 71 Policies to realize the potential of public intent data
S2.1.1 85 Prevalence of female genital mutilation in women ages 15–49, by country income level, 2010–19
3.1 94 The role of data in the production process: Pathways to development
3.2 96 The role of data in economic activity
3.3 97 Platform firms are numerous in some lower-income countries but tend to be small
3.4 97 The importance of domestic versus foreign-headquartered firms differs across countries as indicated by firm share of top websites
3.5 98 Users visit and spend more time on Facebook, Google, and YouTube than other websites
3.6 99 Internet traffic in low- and middle-income countries is concentrated in several US-based firms
3.7 108 Risks to market structure and market power stemming from platform firms
S3.1.1 118 Private company use of public data is extremely valuable in the United States, suggesting the value of open government data
B4.1.1 123 Smartphone location data reveal the changes in the time users spend at home in Jakarta
B4.2.1 126 Use of repurposed data to study COVID-19: Published articles, by type of private intent data used
B4.3.1 131 Public intent and private intent data can be combined to detect illegal fishing activity
4.1 134 Gaps in network coverage differ across farm sizes, affecting agricultural extension services
4.2 137 Artificial intelligence specialists gravitate to the US market, no matter where they are educated
5.1 158 The data infrastructure supply chain
5.2 160 The developing world overwhelmingly accesses data using wireless networks
5.3 161 Gaps in 3G wireless broadband internet coverage have been shrinking, but usage gaps remain stubbornly high
5.4 162 Globally, the coverage of wireless technologies reflects their constant upgrading
5.5 164 In low- and middle-income countries, nearly 70 percent of those who do not use the internet are held back by deficiencies in digital literacy
5.6 166 Inequities in mobile data consumption across country income groups and regions are huge
5.7 167 The monthly price for 1 gigabyte of data is unaffordable in low-income countries
5.8 167 Data consumption is very sensitive to market prices and service affordability
5.9 168 The presence of domestic data infrastructure facilitates national data exchanges
5.10 170 Data infrastructure is relatively scarce in low- and middle-income countries
5.11 171 Countries develop domestic data infrastructure in stages
B5.1.1 174 Low- and middle-income countries are educating ICT professionals but not retaining them
B5.1.2 175 Major wage differentials for ICT professionals create a brain drain, especially in low- and middle-income countries
S5.2.1 186 Worldwide greenhouse gas emissions from data consumption have been flat, even though electricity consumption has been growing
6.1 191 Envisioning the multidimensional nature of the legal framework for trust
6.2 192 The evolution of data protection
6.3 194 Gaps in the regulatory framework for cybersecurity are glaring across country income groups
6.4 195 Progress on personal data protection legislation differs markedly across country income groups
6.5 200 Adoption of e-commerce and related legislation is widespread across country income groups
6.6 201 Regulations enabling access to and reuse of public intent data are unevenly developed across country income groups
6.7 204 Adoption of enablers for sharing private intent data lags those for public intent data across country income groups
7.1 228 The unusual characteristics of data-driven businesses pose complex challenges for policy makers in the areas of competition, trade, and taxation
7.2 230 In the digital economy, antitrust cases related to passenger transport are more prevalent in middle-income countries than in high-income countries
7.3 231 Among anticompetitive practices, abuse of dominance is more widespread worldwide across multiple sectors of the digital economy
7.4 238 Since 1990, the global trade in data-driven services has grown exponentially and now constitutes half of trade in services
7.5 239 Three distinct approaches to handling cross-border data flows
7.6 248 East Asian countries are losing a substantial volume of tax revenue by failing to apply current VAT rules to digital services
8.1 268 Data governance and data management, working seamlessly together in support of the social contract
8.2 270 Functions of data governance
8.3 273 No low-income and few lower-middle-income countries have a separate data governance entity; most embed them in another government institution
8.4 276 The lower the country income level, the fewer are the countries with data protection authorities
8.5 276 More than half of countries across all income groups have antitrust authorities
8.6 277 Only about one-quarter of low-income countries have cybersecurity agencies
8.7 281 Features of well-functioning institutions for effective data governance
S8.2.1 300 Using a diverse set of open science and citizen science tools and technologies in the Amazon basin
9.1 305 What happens in an integrated national data system?
9.2 309 A data maturity model for a hypothetical national data system
9.3 310 Steps to integrating the public sector into the national data system
9.4 313 Steps to integrating civil society into the national data system
9.5 315 Steps to integrating academia into the national data system
9.6 317 Steps to integrating the private sector into the national data system
9.7 320 Steps to integrating international and regional organizations into the national data system and collaborating across borders
Maps
O.1 5 Use of aggregated cellphone records to track mobility week by week during COVID-19 lockdowns in The Gambia, March–May 2020
O.2 7 Highly refined data pinpointed areas of Nigeria that needed better sanitation
O.3 9 Combining satellite imagery with household survey data increases the resolution of the poverty map of Tanzania
O.4 12 Data infrastructure is not yet widespread across all parts of the world
1.1 27 John Snow’s innovative mapping of the cholera epidemic in London in 1854 revolutionized tracing of the disease
B1.4.1 37 Use of repurposed data to study COVID-19: Published articles, by country
2.1 57 Reducing poverty: Mapping pockets of poverty in Croatia allowed better targeting of antipoverty funds
2.2 59 Improving public safety: The use of citizen-collected data in Bogotá led to greater safety around bike paths
B4.1.1 122 Mapping the home location of smartphone users in Jakarta, 2020
B4.2.1 127 Uses of repurposed data to study COVID-19: Published articles, by country
4.1 132 Private intent data can provide unique and comparable information not collected by national governments, such as the number of adults who lack a formal financial account
4.2 133 Agricultural extension services can be tailored to the slower, older broadband internet accessible to many small-scale farmers
S4.1.1 151 Large gaps remain in global reporting on basic weather data
S4.2.1 153 By combining police reports and crowdsourced data, researchers were able to identify the 5 percent of roads where half of the crashes occur in Nairobi
5.1 169 The global fiber-optic cable submarine network reaches all corners of the world, but data infrastructure is unevenly developed
7.1 241 Uptake of regulatory models to cross-border data flows
Tables
1.1 28 Examples of data types based on original intent and collection methods
2.1 63 Assessment of the openness of data, by country income group
2.2 64 Data dissemination practices and openness, by country income group
4.1 142 Selected research gaps to be addressed to advance the use of private intent data for development
5.1 177 Recommendations for data infrastructure improvements sequenced according to a maturity model
6.1 207 Certain elements of the regulatory framework are much better developed than others, but performance is generally low
6.2 209 Recommendations organized according to a maturity model based on data safeguards and enablers
7.1 241 Policy bases for regulating cross-border personal data
7.2 244 Key provisions on digital trade in recent trade agreements
8.1 268 Data management decisions along the data life cycle
8.2 272 Candidate institutions and actors to perform or inform data governance functions
8.3 278 Snapshot of common data intermediary structures
Foreword | xi
Data governance is the subject of intense debate in advanced economies and increasingly
among large emerging markets. And yet many complex policy questions remain unanswered.
In response, World Development Report 2021: Data for Better Lives surveys the emerging
landscape and provides policy makers with a framework for thinking through the issues,
opportunities, and trade-offs. One thing is clear: the perspective of lower-income countries
has so far been largely absent from these global debates and urgently needs to be heard.
Data are a double-edged sword. On the one hand, they offer tremendous potential to create
value by improving programs and policies, driving economies, and empowering citizens. On the
other hand, data accumulation can lead to a concentration of economic and political power, raising
the possibility that data may be misused in ways that harm citizens. Data are a resource that
can be used and reused repeatedly to create more and more value, but there is a problem—the
more data are reused, the higher is the risk of abuse.
It is hard to imagine a more dramatic example of these opportunities and tensions than
the COVID-19 pandemic. Countries around the world have moved swiftly to repurpose mobile
phone records to monitor the spread of the virus. But at the same time, they have struggled to
balance this benefit against privacy concerns and the risk of misuse.
Beyond pandemic times, the statistical capacity to produce and effectively use core economic
and social data is limited. Many poor countries are unable to accurately track public finances,
report on external debt, or monitor their development goals. Without such data, the ability to
hold governments accountable and track progress withers.
Data governance arrangements to facilitate greater use of data while safeguarding against
misuse remain in their infancy. The legal and regulatory frameworks for data are inadequate in
lower-income countries, which all too often have gaps in critical safeguards as well as shortages
of data-sharing measures. There, the data systems and infrastructure that enable interoperability
and allow data to flow to more users are incomplete; less than 20 percent of low- and middle-income
countries have modern data infrastructure such as colocation data centers and direct
access to cloud computing facilities. Even where nascent data systems and governance frameworks
exist, a lack of institutions with the requisite administrative capacity, decision-making
autonomy, and financial resources holds back their effective implementation and enforcement.
To address these concerns, World Development Report 2021 calls for a new social contract for
data—one that enables the use and reuse of data to create economic and social value, promotes
equitable opportunities to benefit from data, and fosters citizens’ trust that they will not be
harmed by misuse of the data they provide. However, in seeking such a social contract, lower-income
countries are too often disadvantaged because they lack the infrastructure and skills
to capture data and turn them into value; the scale and agency to participate equitably in global
data markets and their governance; and the institutional and regulatory frameworks to create
trust in data systems.
Forging a new social contract for data is a pressing domestic policy priority that will require
strengthening national data systems and engaging all stakeholders at the national level. Because
of the global scale of data, some of the most challenging aspects of the social contract also call for
closer international cooperation to harmonize regulations and coordinate policies—bilaterally,
regionally, and globally. Critical areas for international engagement include reform of international
taxation rights for data-driven businesses, World Trade Organization arrangements for
trade in data-enabled services, regional collaboration on the development of data infrastructure,
international harmonization of technical standards to support interoperability, and bilateral
collaboration on law enforcement and antitrust regulation.
The World Bank stands ready to support its client countries on this important and challenging
agenda. The findings of this World Development Report will shape support for client countries
by identifying where public and private sector investments are the most critical, defining a rich
program for policy reform and technical assistance, and highlighting areas in which global initiatives
can help to convene and facilitate cross-border cooperation.
Realizing the full value of data will depend on a substantial commitment and effort, and it
will be difficult. But the cost of failure is a world of missed opportunities and greater inequities.
David R. Malpass
President
The World Bank Group
Acknowledgments | xiii
This year’s World Development Report (WDR) was prepared by a team led by directors Robert
Cull, Vivien Foster, and Dean Jolliffe. Malarvizhi Veerappan served as report manager and as
a member of the Report’s leadership. The core team was composed of Jaffar Al-Rikabi, Adele
Moukheibir Barzelay, Miriam Bruhn, Rong Chen, Niccolò Comini, Samuel Paul Fraiberger,
Craig Hammer, Talip Kilic, Jan Loeprick, Daniel G. Mahler, Michael Minges, Martin Molinuevo,
Nancy Morrison, David Newhouse, Sara Nyman, Vincent Francis Ricciardi III, David Satola,
Dorothe Singer, Philip Wollburg, and Bilal Zia, together with research analysts Kenneth Zaul
Moreno Sermeno, Nikkie Pacheco, and Elizabeth Salazar. Selome Missael Paulos provided the
team with administrative support.
Davida Louise Connon, Hai-Anh H. Dang, Cem Dener, Lucas Kitzmüller, Aspasea Mckenna,
and David Medine were members of the extended team. Rory Macmillan and Zia Mehrabi provided
input as expert consultants. The Report was sponsored by the World Bank’s Development
Economics Vice Presidency and was prepared in close partnership with the Infrastructure Vice
Presidency. Overall guidance was provided by Carmen M. Reinhart, Chief Economist, and Aart
Kraay, Deputy Chief Economist. The team is especially grateful for the support and guidance
provided by Makhtar Diop, who was at that time Vice President, Infrastructure, and Haishan Fu,
Director, Data Group. The team is also appreciative of the comments and suggestions from Mari
E. Pangestu, Managing Director, Development Policy and Partnerships. In the early months of
the Report’s preparation, Pinelopi Goldberg, then Chief Economist, provided guidance as well.
The team received suggestions from an advisory panel of high-level government officials,
representatives of national statistical agencies and competition authorities, and private sector
leaders: Anil Arora (Chief Statistician of Canada), Ola Awad (President, Palestinian Central
Bureau of Statistics), Dr. Long Chen (Director, Luohan Academy, representing Alibaba Group),
Neil Jackson (Chief Statistician, Foreign, Commonwealth and Development Office of the UK),
Kersti Kaljulaid (President, Republic of Estonia), Francis W. Kariuki (Director-General, Competition
Authority of Kenya), Zia Khan (Senior Vice President, Innovation, The Rockefeller
Foundation), Ming Maa (President, Grab), Joyce Murray (Minister, Digital Government,
Canada), Nandan Nilekani (Cofounder, Infosys, and architect of India’s Data Empowerment
and Protection Architecture), Regina Opondo (Community Director, Kounkuey Design Initiative),
Sandra Quijada Javer (Director, National Institute of Statistics of Chile), Haleema Saeed
(Director General, International Relations, Palestinian Central Bureau of Statistics), Julio A.
Santaella (President, National Institute of Statistics and Geography of Mexico), Pravin Srivastava
(Secretary and Chief Statistician of India), Kecuk Suhariyanto (Chief Statistician, BPS Statistics
Indonesia), David Tudor (Group General Counsel, Prosus Group), Rodger Voorhies (President,
Global Growth and Opportunity Division, Bill & Melinda Gates Foundation), Sheila Warren
(Head, Blockchain and Data Policy, World Economic Forum), and Biratu Yigezu (Director,
CSA Ethiopia). The team also received guidance from a technical board of leading academics:
Emmanuelle Auriol (Research Faculty, University of Toulouse), Marshall Burke (Associate
Professor, Stanford University), Luciano Floridi (Faculty, University of Oxford), Jan Kramer
(Faculty and Chair, Internet and Telecommunications Business, University of Passau), Jacques
Kremer (President, WorldWide Advisors, Inc.), Bruno Liebhaberg (Director General, Centre on
Regulation in Europe), Fernando Perini (Senior Program Specialist, International Development
Research Centre), John Pullinger (President, International Association for Official Statistics),
Anna Scherbina (Associate Professor, Brandeis University), and Tommaso Valletti (Faculty,
Imperial College London). The team valued their advice and found it very useful. Finally, Shaida
Badiee, Oliver Chinganya, Johannes Jutting, Stephen MacFeely, Angela Me, and Claire Melamed
provided specialized reviews of chapter 9. However, the views expressed in the Report do not
necessarily reflect those of these advisers and reviewers.
The team would also like to thank the following World Bank staff, who were members of
the “BrainsTrust,” for their guidance during preparation of the Report: Luis Alberto Andres,
João Pedro Azevedo, Andrea Barone, Kathleen Beegle, Tania Begazo, Jerome Bezzina, Calogero
Carletto, Andrew L. Dabalen, Vyjayanti Desai, Marianne Fay, Michael Ferrantino, Marelize
Gorgens, Mary Hallward-Driemeier, Kimberly D. Johns, Tim Kelly, Saki Kumagai, Daniel Lederman,
William F. Maloney, Fredesvinda F. Montes Herraiz, James Neumann, Umar Serajuddin,
Sharada Srinivasan, Davide Strusani, and Paolo Verme. Background and related research, along
with dissemination, are being generously supported by the KDI School of Public Policy and
Management (KDIS) Partnership Trust Fund, World Bank’s Knowledge for Change Program
(KCP, a multidonor trust fund), and the United States Agency for International Development.
The team drew on the analysis, research, and literature reviews of researchers and specialists
from across the world. In addition, the team would like to thank the following for
feedback and helpful discussions: Susan Ariel Aaronson, Ali Abbas, Moctar Aboubacar, Karina
Acosta, Sonia Ahmand, Sabina Alkire, Noam Angrist, Belinda Archibong, Kaushik Basu, Vitoria
Beard, Theresa Beltramo, Willem Buiter, Mayra Buvinic, Tsenguunjav Byambasuren, Anupam
Chander, Anindya Chatterjee, Nancy Chau, Katie Clancy, Julie Crowley, Shanta Devarajan,
Gary Fields, Avi Goldfarb, Zihan Hu, Yingyi Jin, Ravi Kanbur, Carlos Lopez, Bentley MacLeod,
Chelsea Madryga, Annabella Ng, Jose Antonio Ocampo, Kevin O’Neil, Ariel Ortiz-Bobea, Brian
Park, Michael Pisa, Maria Poli, Cristian Pop-Eleches, Christian Reimsbach Kounatze, Elettra
Ronchi, Benjamin David Roseth, Phet Sayo, Heba Shams, Harman Sing Dhodi, Jenna Slotin,
Michael Stanger, Joseph Stiglitz, Eric Swanson, Aberash Tariku, Jim Tebrake, Jeni Tennison,
Anh Phuock Thien Nguyen, Kathryn Toure, Miguel Urquiola, Eric Verhoogen, Stefaan G.
Verhulst, Sonali Vyas, and Yue Wang.
The following World Bank colleagues provided insightful comments, feedback, collaborat-
ion, and support: Rabah Arezki, Audrey Ariss, Angela Armstrong, Aziz Atamanov, Natalia Baal,
Prospere R. Backiny-Yetna, Guadalupe Bedoya, Kathleen Beegle, Abdoullahi Beidou, Moussa
Blimpo, Hana Brixi, Niklas Buehren, Greta L. Bull, Maurizio Bussolo, Cesar Calderon, Yu Cao,
Greg Chen, Louise Cord, Paul Andres Corral Rodas, Jose Cuesta, Conrad Daly, Richard Damania,
Olivia D’Aoust, Francesca Daverio, Zelalem Yilma Debebe, Asli Demirgüç-Kunt, Francesca de
Nicola, Tami Ann Dokken, Mark Dutz, Alison Evans, Pablo Fajnzylber, Deon P. Filmer, Juliane
Fries, Isis Gaddis, Emilia Galiano, Roberta V. Gatti, Tina George, Alejandro Medina Giopp,
Chorching Goh, Lesly Goh, Markus Goldstein, Tania Gomez, Aparajita Goyal, Caren Grown,
Boutheina Guermazi, Afef Haddad, Daniel Halim, Stephane Hallegatte, Nagaraja Harshadeep,
Kristen Himelein, Johannes G. Hoogeveen, Chyi-Yun Huang, Elisabeth Huybens, Roumeen
Islam, Ergys Islamaj, Ann-Sofie Jespersen, Anna Kalashyan, Woubet Kassa, Faruk Khan, Young
Eun Kim, Soo Min Ko, Florence Kondylis, Ayhan Kose, Holly Krambeck, Megumi Kubota,
Christoph Lakner, Somik Lall, Adriana Legovini, Norman Loayza, Nancy Lozano Gracia,
Morgan Sofia Lucey, Cathrine Machingauta, Andrew D. Mason, Aaditya Mattoo, Anna Metz,
Frederic Meunier, Sveta Milusheva, Miriam Muller, Mamta Murthi, James Neumann, Rochelle
Glenene O’Hagan, Sandie Okoro, Madalina Papahagi, Utz Johann Pape, Elizaveta Perova,
Tobias Pfutze, Martin Raiser, Laura Ralston, Martin Rama, Pooja Rao, Vijayendra Rao, Sheila
Reszapi, Mark Roberts, Denis Robitaille, Carlos Rodriguez-Castelan, Dan Oliver Rogger, Clelia
Rontoyanni, Rosemary Rop, Karla Viviana Saavedra Betancourt, Arisha Salman, Tara Sarma,
Pierre Sauve, Marc Tobias Schiffbauer, Ethel Sennhauser, Gabriel Sensenbrenner, Fatima Shah,
Parmesh Shah, Nurlina Shaharuddin, Siddharth Sharma, Brian William Stacy, Clara Stinshoff,
Prachi Tadsare, Daria Taglioni, Jomo Tariku, Jeff Thindwa, Hans Timmer, Margaret Triyana,
Tea Trumbic, Vladimir Tsirkunov, Andrew Whitby, Mark Williams, Quentin Wodon, Keong
Min Yoon, and Albert G. Zeufack.
Nancy Morrison provided developmental guidance in drafting the Report, which was edited
by Sabra Ledent and proofread by Gwenda Larsen and Catherine Farley. Robert Zimmermann
verified the Report’s extensive citations. Dania Kibbi, Base Three Studio, was the principal
graphic designer. Special thanks go to Stephen Pazdan, who coordinated and oversaw pro-
duction of the Report; Mark McClure, who provided assistance; and the World Bank’s Formal
Publishing Program. The team would also like to thank Mary C. Fisk, who coordinated multiple
translations of the overview and main messages; Bouchra Belfqih and her team in Translation
and Interpretation, who worked patiently on the translations; Patricia Katayama, who over-
saw the overall publication process; and Deb Barker and Yaneisy Martinez, who managed the
printing and electronic conversions of the Report and its many ancillary products. Naureen
Aziz, Mark Felsenthal, Chisako Fukuda, David Mariano, and Mikael Reventar developed the
communications and engagement strategy. Monique Pelloux Patron provided the team with
resource management support. The team would also like to thank Maria Alyanak, Marcelo
Buitron, Gabriela Calderon Motta, Maria del Camino Hurtado, and Alejandra Ramon for their
help with coordination.
Beyond Words studio designed and developed the WDR 2021 website (http://wdr2021.worldbank.org)
and data stories. The team comprised Kate Ashton, Rebecca Conroy, Jamie Gilman,
Chris Hankins, Ceri Jones, Becca Muir, Richard Pullinger, Duncan Swain, and Lucy Swan. The
illustrations for the website were created by Elisabetta Calabritto. Maarten Lambrechts and
Adrianus Willem Tulp developed the website’s data stories. Emmanuel Cosmas Maliganya,
Vidhya Nagarajan, Balaji Natarajan, Somu Sinthamani, Anushka Thewarapperuma, and Roula
Yazigi contributed to website development and dissemination.
This Report draws on background papers prepared by Meaza Abraham, Rodrigo Barajas,
Noelia Carreras, Anupam Chander, Sandeep Chandy, Yuan Fang, Martina Francesca Ferracane,
Angelina Fisher (New York University), Dayoung Park, Sharada Srinivasan, Clara Stinshoff,
Thomas Streinz (New York University), Erik van der Marel, Dereje Wolde, Isabel Yu, and the
WDR team. Federico Cardenas Chacon, Nicolas Conserva, Lillyana Sophia Daza Jaller, Paris
Gkartzonikasm, New Doe Kaledzi, Olga Kuzmina, Aliaksandra Tyhrytskaya, and Yasmin Zand
provided support for the data collection and research required for the Global Data Regulation
Survey prepared for this Report.
Contributors to the special-feature spotlights in the Report are Diana Jimena Arango,
Anna-Maria Bogdanova, Martin Brocklehurst, Libby Hepburn, Haruna Kashiwase, Stephen
MacFeely, Angela Me, Sveta Milusheva, Vijayendra Rao, David Rogers, Evis Rucaj, Sven Schade,
Lea Shanley, Rubena Sukaj, Vladimir Tsirkunov, Mariana Varese, and Divyanshi Wadhwa. The
team would also like to thank the World Bank colleagues who helped organize and facilitate
consultations and advised on translations: Jaffar Al-Rikabi, Adele Moukheibir Barzelay, Rong
Chen, Candyce Da Cruz Rocha, Nicole El Hajj, Vivien Foster, and Jevgenijs Steinbuks.
The WDR team consulted policy makers, international organizations, civil society organiza-
tions, private sector companies, development partners, academics, research institutions, as well
as the offices of the World Bank’s Executive Directors and Bank colleagues. This year’s World
Development Report faced the additional challenge of the COVID-19 pandemic, which made it
impossible to conduct the usual face-to-face consultation meetings. Nonetheless, the team was
able to turn this into an advantage because virtual consultation events enabled much wider par-
ticipation by hundreds of stakeholders. Due to travel restrictions, the team conducted consulta-
tions using a digital format and partner networks, thereby allowing more people with diverse
backgrounds to participate. Some of the events focused on regional and targeted stakeholder
consultations in multiple languages, allowing hard-to-reach organizations to join the discussion.
Interactions with core contributing units to the World Development Report, as part of ongoing
debates in certain areas such as taxes, leveraged wider consultations taking place on these
related issues. Team members also drew heavily on their own experiences and interactions with
the many data professionals, statisticians, data scientists, civil society organizations, and policy
makers working in often difficult conditions to use data to achieve development outcomes.
The consultations were launched in October 2019 with a high-level roundtable—a side
event of the World Bank’s annual meetings—and continued until October 2020. The team also
conducted a series of bilateral consultations from April through June 2020 with the national
statistical offices of Canada, Chile, Ethiopia, India, Indonesia, Mexico, the United Kingdom,
and West Bank and Gaza. The team thanks the organizers of the many events that brought
together a wide variety of stakeholders, including government officials, civil society orga-
nizations, academic institutions, the private sector, and international organizations. The
organizers were Germany’s Federal Ministry for Economic Cooperation and Development
(BMZ) and Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ); Canada’s Inter-
national Development Research Centre (IDRC), which organized consultations across Africa,
Asia, and Latin America and the Caribbean in French, Spanish, and English; the Global Part-
nership for Sustainable Development Data (GPSDD); the GW Elliott School of International
Affairs; Columbia and Cornell Universities; the Mastercard Advocacy Center of Excellence;
A. T. Kearney’s Global Business Policy Council; the Fletcher School of Law and Diplomacy’s
Institute for Business in the Global Context at Tufts University; the Committee for the Coordination
of Statistical Activities (CCSA); the United Nations World Data Forum (UNWDF); the Organisation
for Economic Co-operation and Development’s Development Assistance Committee (DAC); the
Global Data Barometer; and the Open Gov Hub.
The team organized and held a dedicated consultation event for civil society members
and several bilateral consultations with technology firms, platform-based businesses, inter-
net exchange points, payment industries, and cybersecurity firms in the private sector. They
included Alibaba, Amazon, CADE (Brazilian competition authority), De Novo, Facebook, Google,
LinkedIn, Lori Systems (Kenya), Mastercard, Power2SME (India), and UA-IX (Ukrainian Internet
Exchange Point). The team also reached out for guidance on specific topics from experts
based in institutions that included Aelex Legal (law firm, Abuja, Nigeria), Atlantic Council
(think tank), DataPrivacy.Com.Br (São Paulo, Brazil), Georgetown University, Hamu and Com-
pany (law firm, Lagos, Nigeria), Hogan Lovells (privacy and cybersecurity law firm), Hunton
Andrews Kurth (global privacy and cybersecurity law firm), Interswitch (Lagos, Nigeria), New
York University, Organisation for Economic Co-operation and Development, Oxford Internet
Institute, Rockefeller Foundation, the United Nations Conference on Trade and Development
(UNCTAD), the University of Southern California, and WilmerHale (data and cybersecurity law
firm). Detailed information about all consultations and contributors can be found at
https://www.worldbank.org/en/publication/wdr2021/consultations.
The team is grateful as well to the many World Bank colleagues who provided written com-
ments during the formal Bank-wide review process. Those comments proved to be invaluable at
a crucial stage in the Report’s production.
The team apologizes to any individuals or organizations inadvertently omitted from this list
and expresses its gratitude to all who contributed to this Report, including those whose names
may not appear here. The team members would also like to thank their families for their support
throughout the preparation of this Report.
Abbreviations | xvii
A4AI Alliance for Affordable Internet
AI artificial intelligence
AIS automatic identification system
APEC Asia-Pacific Economic Cooperation
API application programming interface
ARPU average revenue per user
ASEAN Association of Southeast Asian Nations
ATAF African Tax Administration Forum
ATI access to information
BEPS OECD/G20 Inclusive Framework on Base Erosion and Profit Shifting
BRICS Brazil, Russian Federation, India, China, South Africa
B2B business to business
B2C business to consumer
B2G business to government
CDR call detail record
CPTPP Comprehensive and Progressive Agreement for Trans-Pacific Partnership
CSIRT Computer Security Incident Response Team
CSO civil society organization
DEPA Digital Economy Partnership Agreement
DLT distributed ledger technology
DPA data protection authority
DTA Digital Trade Agreement
EITI Extractive Industries Transparency Initiative
EU European Union
FAIR findability, accessibility, interoperability, and reuse
FAO Food and Agriculture Organization
FCS fragile and conflict-affected situations
FRAND fair, reasonable, and non-discriminatory
GATS General Agreement on Trade in Services
GDP gross domestic product
GDPR General Data Protection Regulation (EU)
GHG greenhouse gas
GIS geographic information system
GNI gross national income
GPS global positioning system
G2B government to business
G2C government to consumer
G2G government to government
HAPS high-altitude platform station
ICT information and communication technology
ILO International Labour Organization
IMF International Monetary Fund
INDS integrated national data system
IoT Internet of Things
IP Internet Protocol
IPRs intellectual property rights
IRB Institutional Review Board
ISO International Organization for Standardization
ISP internet service provider
IT information technology
ITU International Telecommunication Union (UN)
IUCN International Union for Conservation of Nature
IXP internet exchange point
KIXP Kenya Internet Exchange Point
LEO low Earth orbiting
LSMS Living Standards Measurement Study
M&E monitoring and evaluation
ML machine learning
MLAT mutual legal assistance treaty
MNE multinational enterprise
MNO mobile network operator
MPA Marine Protected Area
MSMEs micro, small, and medium enterprises
NGO nongovernmental organization
NIC news flow indices of corruption
NPDR Non-Personal Data Regulation (EU)
NSDS National Strategies for the Development of Statistics
NSO national statistical office
ODbL Open Database License
OECD Organisation for Economic Co-operation and Development
OGP Open Government Partnership
ONS Office for National Statistics (UK)
PARIS21 Partnership in Statistics for Development in the 21st Century
PIMS personal information management system
PPP public-private partnership
PTA preferential trade agreement
RADD Radar Alerts for Detecting Deforestation
SCC standard contractual clause
SDGs Sustainable Development Goals
SINTyS National Tax and Social Identification System (Argentina)
SMEs small and medium enterprises
SPI Statistical Performance Indicators
SSO standard setting organization
TFP total factor productivity
TVWS TV white space
UN United Nations
UNCITRAL United Nations Commission on International Trade Law
UNDP United Nations Development Programme
UNEP United Nations Environment Programme
UNICEF United Nations Children’s Fund
USITC United States International Trade Commission
USMCA United States–Mexico–Canada Agreement
USSD unstructured supplementary service data
VAT value added tax
WFP World Food Programme
WHO World Health Organization
OVERVIEW
“You can have data without information, but you cannot have information without data.”
—Daniel Keys Moran, computer programmer
and science fiction author
Overview | 3
Data, which are growing at an unprecedented
rate, are becoming an integral part of the daily
lives of most people everywhere. But how
does that matter for the more than 700 million people
living in extreme poverty? Is the explosion in the new
types and uses of data improving their lives? Or will
poor people and poor countries be left behind, creating
a widening gap between those who reap the benefits
of this new data-driven world and those who do not?
The innovations resulting from the creative
new uses of data could prove to be one of the most
life-changing events of this era for everyone. Like
many general-purpose technologies such as the steam
engine and electricity, the transformations emerging
from the data revolution could touch all aspects of
societies and economies. But such sweeping changes
are not automatic. The productivity value of the
steam engine and electricity was realized decades
after they were first introduced. The delay occurred
not because people did not recognize the importance
of these innovations—sooner or later everyone did—
but because the new manufacturing systems needed
for these innovations to realize their economic poten-
tial could not take shape overnight. Just as electricity
itself did not result in economic development, data
alone will not improve well-being. Data can improve
social and economic outcomes, but only if they are
used systematically in ways that create information
that generates insights that improve lives.
This Report aims to answer two fundamental
questions. First, how can data better advance devel-
opment objectives? Second, what kind of data gover-
nance arrangements are needed to support the gener-
ation and use of data in a safe, ethical, and secure way
while also delivering value equitably?
One important message of this Report is that
simply gathering more data is not the answer. Sig-
nificant data shortfalls, particularly in poor coun-
tries, do exist, but the aim of this Report is to shift
the focus toward using data more effectively to improve
development outcomes, particularly for poor people in
poor countries.
Advancing development
objectives through data
Part I of this Report develops a conceptual frame-
work that links data to development through three
institutional pathways (figure O.1). The middle path-
way is the use of data by governments and interna-
tional organizations to support evidence-based pol-
icy making and improved service delivery. The top
pathway is the use of data by civil society to monitor
the effects of government policies and by individu-
als to enable them to monitor and access public and
commercial services. The bottom pathway is the use
of data by private firms in the production process—
use that fuels their own growth as well as wider
economic growth. One implication of the conceptual
framework is that data alone cannot solve develop-
ment problems: people (in society, governments,
and firms) are the central actors transforming data
into useful information that can improve livelihoods
and lives.1
Alongside capital, land, and labor, data
are also an input to the development objectives that
emerge along all three pathways. But, unlike capital,
land, and labor, data are not depleted by use: using
them once does not diminish their value. Data that were initially collected with one
intention can be reused for a completely different
purpose (chapter 1).
4 | World Development Report 2021
Disseminating, exchanging, and
sharing data to enhance data reuse and
repurposing
Because the potential of data to serve a productive
use is essentially limitless, enabling the reuse and
repurposing of data is critical if data are to lead to
better lives. It is thus a central aspect of the concep-
tual framework. Figure O.1 uses two-way arrows to
depict these flows. The two-way arrow between the
private sector and government/international organi-
zations indicates the reuse and repurposing of data
originally collected for commercial purposes for pub-
lic policy, and vice versa. Similarly, the two-way arrow
between individuals/civil society/academia and gov-
ernment/international organizations indicates data
being exchanged and reused by those parties. The
final two-way arrows reflect the use of private sector
data and data-driven applications by individuals/civil
society/academia and the use of data and analysis
generated by individuals/civil society/academia by
firms. In practice, however, those holding data may
be unwilling to exchange data. They may have con-
cerns about data protection and security or the need
to capture returns on investments in collecting data.
Or they may hope to gain market power from accumulating
data to capture economies of scale or to obtain
some other political or competitive advantage
from hoarding them.
The phrase “sharing and reuse” is shorthand used
in this Report for all the types of transactions and
exchanges of data that permit reuse, from government
open data initiatives for sharing data to market-based
transactions for data involving private firms. In theory,
defining clear economic property rights over data
should enable data to be traded widely on markets.
But in practice, the extent of the data trade (beyond the
market for advertising) has been limited by competing
claims on ownership, tensions between the wide dis-
semination of data and incentives to accumulate more
data for private commercial gain, and difficulties in
assessing the quality and accuracy of data.
Each of the three pathways illustrated in figure O.1
shows how data can improve lives, but those same
pathways create openings for data to be used in ways
that harm people. Through the government pathway,
data can be abused for political ends, such as politi-
cally motivated surveillance or discrimination along
lines of ethnicity, religion, race, gender, disability
status, or sexual orientation. In the pathway running
through individuals, there is the potential for cybercriminals
to inflict considerable harm by stealing
and manipulating sensitive information. The “dark
net” is a vast parallel network of hidden websites that
provides an underground digital platform for a wide
array of criminal activities, facilitating illegal trade in
drugs, counterfeit currency, stolen goods, credit card
numbers, forged papers, firearms, and human organs.
Similarly, through the private sector pathway, examples
of harmful use include, among other things, the
exploitation of information about consumer preferences
and behavior to engage in aggressive or manipulative
marketing techniques based on microtargeting
of persuasive messages or to apply algorithms
that facilitate collusion among market players.2

Figure O.1 How data can support development: A theory of change
[Diagram: data production and collection and data analytics and processing feed three pathways (individuals, civil society, and academia; government and international organizations; and the private sector) leading to development, with reuse flowing between the pathways. Labels along the pathways: greater accountability, greater transparency, better policy making and service delivery, increased business opportunities, production process in firms, more data on individuals, political surveillance, criminal activity and the dark net, market concentration, widening inequality, and discrimination.]
Source: WDR 2021 team.
Note: Positive impacts are shown in green; negative impacts are shown in red.
Unlocking data for the public good
and safeguarding against misuses:
Some COVID-19 examples
Many countries have used data to control the
COVID-19 pandemic. This use includes tracking
people’s locations to better understand mobility patterns
during lockdowns or to aid in disease contact
tracing. Using call detail records (CDRs) from March
through May 2020 aggregated to mask individual-
level data, policy makers in The Gambia were able
to review maps showing the movement of people
across administrative boundaries (map O.1). These
maps helped them understand the extent to which
lockdowns were succeeding in reducing movement
and allowed them to identify the factors linked to
lockdown compliance and noncompliance and plan
accordingly. Meanwhile, the government of Israel
approved emergency regulations in March 2020 to
allow the individual-level data collected from cell-
phones to be used to track people and then, through
contact tracing, to curtail the spread of COVID-19.
CDRs were not created to aid public policy making
or to allow the government to track the movements
of individuals, but they are an example of data being
reused and repurposed (flowing in the vertical channels
in figure O.1). In Israel, these data were being
collected before the pandemic, but they could be
accessed only for national security purposes.3
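The aggregation step described for The Gambia, where individual call records are turned into district-level flows before policy makers see them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used by Knippenberg and Meyer 2020: the `Move` record, the functions `aggregate_outflows` and `pct_change_vs_baseline`, and the small-cell threshold `MIN_CELL` are all hypothetical, and a real pipeline would first have to infer each move from cell-tower locations in the raw CDRs. Measuring change against a pre-lockdown baseline week is also an illustrative assumption.

```python
from collections import Counter
from typing import Iterable, NamedTuple


# Hypothetical record: one inferred move of one subscriber in a given week.
class Move(NamedTuple):
    subscriber_id: str
    week: int
    origin: str       # district of the tower handling the earlier call
    destination: str  # district of the tower handling the later call


MIN_CELL = 15  # assumed suppression threshold; not taken from the Report


def aggregate_outflows(moves: Iterable[Move]) -> dict[tuple[int, str], int]:
    """Count distinct subscribers leaving each district per week, dropping identities."""
    seen = set()
    counts: Counter = Counter()
    for m in moves:
        if m.origin == m.destination:
            continue  # stayed inside the district: not a cross-boundary flow
        key = (m.subscriber_id, m.week, m.origin)
        if key in seen:
            continue  # count each subscriber at most once per district-week
        seen.add(key)
        counts[(m.week, m.origin)] += 1
    # Suppress small cells so no published figure describes only a few people.
    return {k: v for k, v in counts.items() if v >= MIN_CELL}


def pct_change_vs_baseline(
    outflows: dict[tuple[int, str], int], baseline_week: int
) -> dict[tuple[int, str], float]:
    """Percentage change in each district's outflow relative to the baseline week."""
    baseline = {d: n for (w, d), n in outflows.items() if w == baseline_week}
    result = {}
    for (w, d), n in outflows.items():
        if d in baseline:  # skip districts whose baseline cell was suppressed
            result[(w, d)] = 100.0 * (n - baseline[d]) / baseline[d]
    return result


if __name__ == "__main__":
    moves = [Move(f"s{i}", 3, "Banjul", "Kanifing") for i in range(40)]
    moves += [Move(f"s{i}", 5, "Banjul", "Kanifing") for i in range(20)]
    moves.append(Move("x", 5, "Basse", "Kerewan"))  # lone traveler: suppressed
    outflows = aggregate_outflows(moves)
    print(outflows)
    print(pct_change_vs_baseline(outflows, baseline_week=3))
```

With 40 subscribers leaving Banjul in week 3 and 20 in week 5, the sketch reports a -50 percent change for week 5, while the district-week containing a single traveler never appears in the output. The threshold trades detail for protection: a higher `MIN_CELL` hides more small districts but makes it harder to single out any individual.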
These early efforts at repurposing CDRs to track mobility
and infected individuals seemed to have a positive effect.
In The Gambia, the maps helped reveal that the
lockdown disproportionately affected poorer districts,
indicating a need for relief and recovery efforts to tar-
get these areas. In Israel, analysis of the cellular data
suggested their use led to identification of more than
one-third of all of the country’s coronavirus cases in
the early weeks of the pandemic (more than 5,500 of
the 16,200 people who had contracted the disease),
possibly contributing to Israel’s exceptionally low ini-
tial rates of coronavirus infections and deaths.
This new use of CDR data to track large parts of
the population of Israel sparked debate and pushback
over concerns about the potential misuse of the data
by government. In Israel, many lawmakers raised
privacy concerns, and the Supreme Court eventually
halted the program. The Court ruled in late April 2020
that the government must legislate the use of cellphone
tracking and that “a suitable alternative, compatible
with the principles of privacy, must be found.”4

Map O.1 Use of aggregated cellphone records to track mobility week by week during COVID-19 lockdowns in The Gambia, March–May 2020
[Map: eleven weekly panels, from week 3 (March 15–21) to week 13 (May 24–30), shading each district by the percentage change in population flows (scale from -60 to 60+), with the national capital, Banjul, and the district capitals Brikama, Kanifing, Mansa Konko, Kerewan, Georgetown, Basse, and Santa Su.]
Source: Knippenberg and Meyer 2020. Data at http://bit.do/WDR2021-Map-O_1.
Note: Blue shades indicate outflow of people; green shades indicate inflow of people. A nationwide lockdown was imposed on March 22, 2020. Data were gathered using call detail records.
Many of the themes of this Report are illustrated in
this example. The sharing and reuse of private sector
CDR data with public authorities created social value
by supporting the control of COVID-19 infections,
thereby saving lives. At the same time, this transfer
of data raised fundamental concerns about trust, with
citizens concerned that their CDR data could then be
repurposed by government officials for other unintended
and potentially harmful purposes beyond public
health. Issues of equity were also at stake. Whereas
in a high-income country like Israel smartphone penetration
was 93 percent, in a low-income country like
The Gambia it was only 75 percent. In each case, the
people lacking a smartphone (7 percent of the population
in Israel and 25 percent in The Gambia) were unable to generate CDR data
and would not necessarily benefit directly from the
public health protection afforded by contact tracing.
These examples also illustrate a key conundrum.
The potential benefits that people realize in the
form of improved policies and service delivery may
increase rapidly as more data, especially personal
data, are shared and reused—but the risks of data
being misused increase as well. These potential bene-
fits depend on data being disseminated or exchanged
between parties. But parties will engage in such
exchanges willingly only if they trust the systems,
regulations, and institutions that underpin their security.
How can people trust that their data will be pro-
tected and that they will share in the value that data
can produce? The growing scale of such concerns
suggests the need for a new social contract around
data—that is, an agreement among all participants
in the process of creating, reusing, and sharing data
that fosters trust that they will not be harmed by
exchanging data and that part of the value created by
data will accrue equitably (figure O.2). The idea that
societies engage in these sorts of agreements, or social
contracts, has existed for centuries, often linked to
the writings of philosophers such as Thomas Hobbes,
John Locke, and Jean-Jacques Rousseau.
Legal systems, and governance more generally, can
be viewed as instruments for establishing, facilitating,
and enforcing social contracts. Persuading parties to
abide by the rules of a social contract is not an easy
task and will hinge on ensuring that the benefits
from using data are shared in an equitable way—that
is, everyone has something to gain. In this process,
lower-income countries are too often disadvantaged,
lacking, as they often do, the infrastructure and skills
to capture data and turn them into value; the institu-
tional and regulatory frameworks to create trust in
data systems; and the scale and agency to participate
equitably in global data markets and their governance.
With data reshaping our lives, our societies, and
the world more generally, social contracts for data are
needed both nationally and internationally, especially
because of the cross-border nature of data transac-
tions and flows. Spotlight 8.1 extends this idea of a
social contract to the international realm, calling for a
global consensus to ensure that data are safeguarded
as a global public good and as a resource to achieve
equitable and sustainable development.
The untapped potential of data; the evolving legal,
regulatory, and governance frameworks for data
generation, use, and reuse; the importance of country
context (history, culture, governance, and political
economy) in shaping appropriate frameworks; the
role of technical capabilities for making the most of
data safely; and the need for trust and more equitable
sharing of the value of data—all these are the themes
at the core of this World Development Report.
Part I of the Report begins by describing in more
detail the potential development impact of data col-
lected for public purposes—public intent data (chapter
2); data collected by the private sector as part of rou-
tine business processes—private intent data (chapter 3);
and the synergies that arise from the joint use of
different types of data (chapter 4). This distinction
between public intent and private intent data is used
regardless of who collected the data or the methods
used to gather the data (such as customer surveys,
accounting records, or digital transactions).

Figure O.2 A social contract for data founded on value, trust, and equity
Value: The full value of data materializes when systems enable the use and reuse of data for different purposes.
Trust: A trust environment is created when the rights and interests that all stakeholders have in data are safeguarded.
Equity: All share equitably in the benefits of data when investments and regulations create a level playing field.
Source: WDR 2021 team.
Public intent data can improve service
delivery, targeting, accountability, and
empowerment
Public intent data hold great potential for designing,
executing, and evaluating public programs and policy
(chapter 2). Because public intent data are a prereq-
uisite for many government functions, government
agencies are the primary producers of these data by
means of censuses, administrative data collection,
and more. Citizens, civil society organizations, non-
governmental organizations, academic institutions,
and international organizations contribute critically
to the production of public intent data using surveys,
crowdsourcing platforms, and other means.
These kinds of data can lead to better lives through
three main pathways: first, by improving policy mak-
ing and service delivery; second, by prioritizing scarce
resources and targeting them to reach marginalized
populations and areas; and third, by holding govern-
ment accountable and empowering individuals to
make better choices through more information and
knowledge.
An example from Nigeria illustrates the power of public intent data to improve and target service delivery. The 2015 National Water Supply and Sanitation Survey, commissioned by Nigeria’s government, gathered data from households, water points, water schemes, and public facilities, including schools and health facilities. These data revealed that 130 million Nigerians (more than two-thirds of the population at that time) lacked sanitation meeting the standard set out by the Millennium Development Goals and that inadequate access to clean water was an especially acute problem for poor households and in certain geographical areas (map O.2).5 In response to the report based on these data, President Muhammadu Buhari declared a state of emergency in the sector and launched the National Action Plan for the Revitalization of Nigeria’s Water, Sanitation and Hygiene (WASH) Sector.6
The higher the quality of the data (in terms of features such as timeliness, accuracy, and resolution), the greater their potential to generate value for development. Yet a variety of factors prevent countries, particularly low-income ones, from realizing greater value from data for the public good. These impediments include a lack of resources, technical capacity, data governance, and demand for data-informed decision-making. The World Bank’s Statistical Performance Indicators, released as part of this Report, identify gaps in the availability, quality, and usability of public intent data across 174 countries, focusing on features related to the timeliness, granularity, interoperability, and accessibility of those data.7
Unleashing the full potential of public intent data requires high-level prioritization of data in the policy process. With that prioritization in place, governments would invest in producing high-quality data and in using data openly and transparently for decision-making. Transparency and reliability of official statistics can help build trust in government actions. A lack of transparency, such as not revealing a country’s debt burden, can have harmful economic consequences and damage the public’s trust in government (see spotlight 1.2). Fulfilling the potential of data requires long-term, stable financing of data; investments in statistical and technical capacity; and laws conducive to safe data production and reuse. Other areas that must be addressed include low levels of data literacy, which dampen the demand for data; policy makers’ lack of incentives for and interest in using data; low trust in the quality of public intent data; and lack of infrastructure for accessing and using the data. These investments and initiatives are interdependent, so failure in one area jeopardizes the overall value that data can bring to development. Conversely, effective use of data can generate more demand for data, thereby justifying investments to produce more, and higher-quality, data.

Map O.2 Highly refined data pinpointed areas of Nigeria that needed better sanitation
[Map: percentage of the population with inadequate access to improved sanitation, by area (0–19.9, 20–39.9, 40–59.9, 60–79.9, 80–100), with the national capital (Abuja) and international and state boundaries marked.]
Source: World Bank 2017. Data at http://bit.do/WDR2021-Map-O_2.
Note: Geographic hotspots of inadequate access to improved sanitation are shown from the least severe to the most severe in terms of the percentage of the population in that area that meets an international benchmark for sanitation.
Private intent data can fuel growth and boost development
Data collected and curated by the private sector for commercial purposes also hold great potential to spur development (chapter 3). Innovations in the use and application of data by businesses are creating tremendous economic value by enhancing data-driven decision-making and reducing transaction costs. A 2011 study of 179 large firms in the United States indicated that firms adopting data-driven decision-making increased their productivity by 5–6 percent relative to what would be expected in view of their other investments and use of information technology.8
Although data are in many ways an input to the production process of firms, much of the recent explosion of new data has come about as a by-product of economic activity, such as the digitization of firm operations, mobile phone usage by individuals, digital transactions, and social media interactions. These data are collected at high frequency and can provide detailed information on individuals, businesses, economic outcomes, and other phenomena. They not only enhance the economic efficiency of the firms themselves, but also can be repurposed for public policy needs such as COVID-19 tracking. For example, financial services providers are increasingly adopting alternative credit scoring techniques to address a long-standing problem in banking: the lack of data on potential borrowers (more specifically, asymmetric information). These techniques take advantage of users’ digital footprints to assess the creditworthiness of those who otherwise lack documentation. Two prominent examples of this approach are Lenddo, which operates in the Philippines, and Cignifi, which operates in Africa, Asia, and Latin America.
But these trends also come with new risks that must be addressed to ensure that the data-driven economy raises social welfare. Concerns are growing about excessive data collection, insufficient governance of data held by private firms, and inadequate protection of personal data. Many of these concerns revolve around the misuse of personal data. Such misuses include the failure of firms to properly protect the financial information of clients, exposing them to theft of funds or identity, or firms’ engagement in unauthorized use of, or failure to protect, individuals’ confidential health or location data.
Many of the processes through which firms create value with their data are driven by algorithms and machine learning. In these models, algorithms determine, among other things, what information, products, or services individuals are exposed to and at what price; what insurance packages they are offered; whether their loan applications are approved; what jobs they qualify for; and what medical advice they receive.
All these types of activities have the potential to significantly improve economic efficiency. For example, by consuming more data types and extracting relevant information from seemingly unrelated patterns, machine learning could generate credit scores for more individuals with greater precision. However, if the data fed into a machine learning model embed discriminatory assumptions, the model will not only reproduce that discrimination in its results but also magnify it.9 This point brings to mind the decades-old data science adage “garbage in, garbage out,” meaning that a data processing system such as machine learning is no better than the data it is given to process.10 But there is a deeper concern: the output from machine learning is typically opaque and changes frequently as new data enter the system. Almost by design, it creates a rule that is not transparent, so identifying discriminatory elements of the algorithm can be technically very challenging.
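To make the “garbage in, garbage out” point concrete, the following is a minimal sketch using entirely hypothetical lending records. A deliberately simple frequency-based scorer stands in for a machine learning model: it learns the historical approval rate for each combination of income band and group. Because the hypothetical past decisions approved group “B” applicants less often at the same income band, the learned scores treat two otherwise identical applicants differently. The functions `fit_approval_rates` and `score` are illustrative names, not drawn from any real credit scoring system.

```python
from collections import defaultdict

# Entirely hypothetical historical lending decisions: (income_band, group, approved).
# At each income band, group "B" applicants were approved less often than group "A".
history = [
    ("low", "A", True), ("low", "A", False), ("low", "A", True), ("low", "A", False),
    ("low", "B", False), ("low", "B", False), ("low", "B", False), ("low", "B", True),
    ("high", "A", True), ("high", "A", True), ("high", "A", True), ("high", "A", True),
    ("high", "B", True), ("high", "B", False), ("high", "B", True), ("high", "B", False),
]


def fit_approval_rates(records):
    """Learn P(approved | income_band, group) by counting past decisions."""
    counts = defaultdict(lambda: [0, 0])  # (band, group) -> [approvals, total]
    for band, group, approved in records:
        counts[(band, group)][0] += int(approved)
        counts[(band, group)][1] += 1
    return {key: approvals / total for key, (approvals, total) in counts.items()}


def score(model, band, group):
    """Score a new applicant with the approval rate learned from history."""
    return model[(band, group)]


model = fit_approval_rates(history)
# Two new applicants in the same income band, differing only in group:
print(score(model, "high", "A"), score(model, "high", "B"))  # prints: 1.0 0.5
```

In this toy case the bias in the historical labels is simply reproduced; the amplification and opacity described above arise in far more complex models. Dropping the group column would not be a complete remedy in practice, because other features can act as proxies for group membership.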
Often, data-driven markets exhibit positive network externalities, leading to increasing returns to scale and a propensity for a few large firms to dominate. The result can be the exclusion of smaller or more traditional firms, to the detriment of local entrepreneurship and possibly of consumer welfare. These effects may be exacerbated in developing markets, where entrants find it harder to raise start-up capital and where human capital in data science is limited. To counteract this, policy makers can address the underlying constraints to achieving scale, such as geoblocking (restricting access to internet content based on the user’s geographical location) or lack of harmonization of data policies across countries. They can also ensure that sector regulations and government support schemes provide a level playing field for all firms.
Combining and repurposing data can deepen their development impact
Combining and repurposing different types of data can enhance the impacts of data on development (chapter 4). Development problems are complex, spanning economic, cultural, environmental, demographic, and many other factors. Policy design based on data covering only one factor will be incomplete, and sometimes ill-advised. Combining different types of data can fill data gaps and offer new perspectives on development problems.
As one example, public intent household surveys, which gather extensive data on living standards, consumption, income, and expenditures, are the basis for estimating national poverty rates in most countries. Because the survey instrument is so extensive and time-consuming to administer, the samples tend to be relatively small. Estimates of poverty are usually statistically valid for the nation as a whole and at a slightly finer level of geographic stratification, but such household surveys are rarely designed to provide poverty profiles refined enough for antipoverty policies to target the village level or lower. Meanwhile, for decades high-resolution poverty maps have been produced by estimating a model of poverty from survey data and then applying this model to census data, yielding an estimate of poverty for every household in the census. A problem with this approach is that census data are available only once a decade (and in many poorer countries even less frequently).
Modifications of this approach have replaced population census data with CDR data or various types of remote sensing data (typically from satellites, but also from drones). This repurposing of CDR or satellite data can yield poverty maps with greater resolution and timeliness. For example, using only household survey data, the government of Tanzania was able to profile the level of poverty across only 20 regions of the country’s mainland. Once the household survey data were combined with satellite imagery data, it became possible to estimate poverty for each of the country’s 169 districts (map O.3). Combining the two data sources increased the resolution of the poverty picture eightfold with essentially no loss of precision. Other examples of this innovative analysis are occurring in some of the world’s most data-deficient environments, such as Afghanistan and Rwanda, offering solutions to pressing data gaps.11
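The survey-to-census (or survey-to-satellite) approach described above can be sketched in a few lines. This is a minimal, deterministic simplification under stated assumptions: a single hypothetical covariate observed both in the survey and in full-coverage data (census records or satellite-derived indicators), an ordinary least squares model of per capita consumption fitted on the survey sample, and an area-level poverty rate computed from the predictions. All data, district names, and the poverty line are hypothetical. Applied methods, such as those in the Elbers-Lanjouw-Lanjouw tradition, use many covariates, model the prediction error, and report standard errors for each area estimate.

```python
from statistics import mean

# Hypothetical survey sample: (covariate, per capita consumption).
# The covariate stands in for any indicator also observed in the census or in
# satellite imagery (for example, a hypothetical built-up-area index).
survey = [(0.2, 0.9), (0.4, 1.3), (0.5, 1.4), (0.7, 1.9), (0.9, 2.3), (0.3, 1.0)]

# Hypothetical full-coverage data: the covariate for every unit, by district.
units = {
    "district_1": [0.1, 0.2, 0.3, 0.8],
    "district_2": [0.6, 0.7, 0.8, 0.9],
}


def fit_ols(pairs):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a, b


def poverty_map(model, units, poverty_line):
    """Predict consumption for every unit in the full-coverage data and return,
    per area, the share of units predicted to fall below the poverty line."""
    a, b = model
    shares = {}
    for area, covariates in units.items():
        predicted = [a + b * x for x in covariates]
        shares[area] = sum(p < poverty_line for p in predicted) / len(predicted)
    return shares


model = fit_ols(survey)  # estimated only where consumption is observed
print(poverty_map(model, units, poverty_line=1.2))
# prints: {'district_1': 0.75, 'district_2': 0.0}
```

The key design choice is that the model is estimated where consumption is observed (the small survey) and applied where only the covariates are observed (the full-coverage data), which is what produces an estimate for every district rather than only for the survey’s sampling strata.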
Examples of other ways of repurposing data include using online media and user-generated content to map water and flood events in real time for water management and food security, and combining satellite imagery data from public and private sources to monitor crop yields and forecast malnutrition.12

Map O.3 Combining satellite imagery with household survey data increases the resolution of the poverty map of Tanzania
[Panel a: Poverty map using the Household Budget Survey (20 regions). Poverty incidence, 2018 (% of population): 8–21, 22–25, 26–31, 32–35, 36–45, no data; national capital (Dodoma), international and province boundaries marked.]
[Panel b: Poverty map combining the data in panel a with satellite imagery (169 districts). Poverty incidence, 2018 (% of population): 4–16, 17–23, 24–30, 31–37, 38–67, no data; national capital (Dodoma), international and district boundaries marked.]
Source: World Bank 2019. Data at http://bit.do/WDR2021-Map-O_3.
Similarly, many examples in this Report highlight the potential for repurposing data to improve programs, policies, and outcomes in areas such as monitoring public health (including the spread of disease), managing crisis response and resource allocation, ensuring road safety in transport and transit, and monitoring illegal fishing and deforestation.
Novel ways to create and use data enable civil society to hold governments accountable for policies and to better monitor corruption.13 For example, social media discussion boards that draw on crowdsourced data and web scraping (extracting data from websites) are emerging as tools that local leaders can use to act against corrupt officials and to receive real-time feedback on the impact of anticorruption policies. The “I paid a bribe” online initiative, launched in 2011 by the Janaagraha Centre for Citizenship and Democracy in India, has developed into one of the largest crowdsourced anticorruption platforms in the world. This tool collects citizens’ reports of corrupt behavior and merges them with geospatial data to highlight problem areas. In doing so, it empowers individuals, civil society, and governments to fight corruption.
To encourage more efforts to repurpose and combine data sources, this Report describes ways in which donors, governments, and companies could invest in the people, partnerships, and research needed to leverage these new data sources for public benefit. Low-income countries should emphasize policy initiatives and investments in building the data skills of analysts and decision-makers; expanding tertiary education to encompass data science and analytics; promoting partnerships with universities and private companies in higher-income countries; strengthening the data literacy of senior government leadership; creating institutional environments that encourage the use of sophisticated data and evidence in policy making; and revamping national statistical offices to perform nontraditional roles with private intent data.
Aligning data governance with the social contract
A well-designed data governance framework allows countries to capture the full economic and social value of both public intent and private intent data and to leverage synergies between them. This involves creating trust in the integrity of the data system while ensuring that the benefits of data are equitably shared. Such a framework is the tangible expression of a country’s social contract around data.
Part II of this Report describes the building blocks of data governance that can deliver the potential benefits of data while safeguarding against harmful outcomes (figure O.3). These building blocks include data infrastructure policies (chapter 5); policies, laws, and regulations around data (chapter 6); related economic policies (chapter 7); and data governance institutions (chapter 8).
Although much of data governance is domestic in focus, an efficient and equitable resolution of many data governance challenges is possible only with international collaboration. Bilateral efforts are needed to manage cross-border spillovers of antitrust decisions and to join forces to combat cybercrime. Multilateral cooperation is essential to address global free-rider problems (such as data protectionism or tax evasion in data-enabled services) and to reduce transaction costs through harmonization of legal and technical standards for data protection and interoperability. At the same time, regional collaboration can help amplify the voice of low- and middle-income countries in global data governance negotiations and help realize scale economies in the development of data infrastructure.
Improving data infrastructure helps ensure equitable access for poor people in poor countries
The digital character of modern data calls for digital infrastructure, a prerequisite for collecting, exchanging, storing, processing, and distributing data (chapter 5). Yet the availability of such infrastructure is marked by inequity both within and between countries. Because the social and economic value of data infrastructure rises steeply as more and more citizens are connected, universal service policies have long existed to promote service rollout. In recognition of the transformative opportunities that broadband connectivity presents for both individuals and nations, the United Nations Broadband Commission has committed the international community to reaching 75 percent broadband-internet user penetration by 2025.14
That said, efforts to move toward universal access face fundamental challenges. First, because mobile technology is continually evolving, coverage is a moving target. In 2018, 92 percent of the world’s population lived within range of a 3G signal (offering speeds of 40 megabits per second), but only 80 percent lived within range of 4G technology (providing faster speeds of 400 megabits per second, which are needed for the more sophisticated smartphone applications that can promote development). The recent commercial launch of 5G technology (reaching speeds of 1,000 megabits per second) in a handful of leading-edge markets risks leaving low-income countries even further behind. Policy makers can hasten technological upgrades by creating a supportive environment for private sector investment in the underpinning fiber-optic networks, while introducing more effective management of critical spectrum resources. Sharing infrastructure can also greatly reduce the cost of upgrades. Yet a careful balance must be struck between promoting competition in broadband provision wherever possible and encouraging cooperation among service providers in market segments where demand is too limited to support more than one infrastructure network.

The second challenge is that coverage alone does not guarantee use: a substantial majority of the 40 percent of the world’s population who do not use data services live within range of a broadband signal. Of people living in low- and middle-income countries who do not access the internet, more than two-thirds stated in a survey that they do not know what the internet is or how to use it, indicating that digital literacy is a major issue.15 Affordability is also a factor in low- and middle-income countries, where the cost of an entry-level smartphone represents about 80 percent of the monthly income of the bottom 20 percent of households.16 Relatively high taxes and duties further contribute to this expense.17 As costs come down in response to innovation, competitive pressures, and sound government policy, internet uptake will likely increase. Yet even among those who do use the internet, consumption of data services stands at just 0.2 gigabytes per capita per month, a fraction of what this Report estimates may be needed to perform basic social and economic functions online.

A third challenge in expanding connectivity is its potential impact on global warming. The climate impacts of increased connectivity present a set of complicated trade-offs. In 2018 the electricity needed to support data infrastructure was equal to approximately 1 percent of global consumption, a significant draw with environmental consequences. But because of reliance on renewable energy–supported data infrastructure and increasing energy efficiencies, greenhouse gas emissions linked to data infrastructure are disproportionately low compared with those of other sectors. Furthermore, access to data infrastructure can have significant positive climatic effects, as illustrated by the massive reduction in travel and increase in videoconferencing during COVID-19 (spotlight 5.2).

Figure O.3 Data governance layers at the national and international levels
Infrastructure policies. National: universal coverage of broadband networks; domestic infrastructure to exchange, store, and process data. International: global technical standards for compatibility of hardware and software; regional collaboration on data infrastructure to achieve scale.
Laws and regulations. National: safeguards to secure and protect data from the threat of misuse; enablers to facilitate data sharing among different stakeholders. International: cybersecurity conventions for collaboration on tackling cybercrime; interoperability standards to facilitate data exchanges across borders.
Economic policies. National: antitrust for data platform businesses; trade in data-enabled services; taxation of data platform businesses. International: cooperation on cross-border regulatory spillovers and enforcement issues; global trade agreements on cross-border trade in data-enabled services; international tax treaties to allocate taxation rights across countries.
Institutions. National: government entities to oversee, regulate, and secure data; other stakeholders to set standards and increase data access and reuse. International: international organizations to support collaboration on data governance and promote standardization.
Source: WDR 2021 team.
Full participation in the data-driven economy entails not only connecting individual citizens but also developing adequate data infrastructure at the national level. For the most part, low- and middle-income countries lack domestic facilities to allow their own locally generated data to be exchanged (via internet exchange points, IXPs), stored (at colocation data centers), and processed (on cloud platforms), as map O.4 shows. Instead, many continue to depend on overseas facilities, which requires them to transfer large volumes of data in and out of the country, and they pay a substantial penalty for this in slower speeds and higher prices.
Policy makers can do much to improve access to data infrastructure step by step. The process begins with encouraging the creation of domestic IXPs and then fostering a suitable investment climate for colocation data centers. In these centers, popular internet content can be stored locally, and access to overseas cloud infrastructure can be facilitated through the provision of on-ramps. Such facilities can be shared at the regional level, where suitable fiber-optic connectivity exists between countries and there is adequate regulatory harmonization. Because of the extremely high standards of reliability required for data infrastructure, as well as concerns about the carbon footprint of data, the ideal private sector investment climate should provide for reliable, clean, low-cost electricity, natural cooling, and negligible disaster risk, conditions that are not always readily met in low- and middle-income countries.
Data laws and regulations can help create an environment of trust
Trust in data transactions can be supported through a robust legal and regulatory framework encompassing both safeguards and enablers (chapter 6). The establishment of such a framework remains a work in progress across all country income groups (figure O.4).
Safeguards promote trust in data transactions by avoiding or limiting harm arising from the misuse of data. A fundamental prerequisite for trust in data systems is cybersecurity. Achieving adequate cybersecurity calls for a legal framework that obliges data controllers and processors to adopt technical systems to secure data.18 To date, only a small minority of low- and middle-income countries have adopted adequate legal frameworks for cybersecurity. Kenya’s new Data Protection Act stands out as a good example of comprehensive cybersecurity provisions.

Creation of an adequate legal framework for data protection is also critical. Such a framework should clearly differentiate between personal data (data that identify the individual) and nonpersonal data (data that do not contain any personally identifiable information). Among middle-income countries, Mauritius is notable for having relatively well-developed safeguards for personal data. Indeed, it has distinguished itself as one of the first Sub-Saharan African countries to ratify the Council of Europe’s Convention 108+ for the Protection of Individuals with Regard to the Processing of Personal Data.19

Map O.4 Data infrastructure is not yet widespread across all parts of the world
[World map showing colocation data centers, internet exchange points (IXPs), submarine cable landing stations, and submarine cables.]
Sources: PeeringDB, Interconnection Database, https://www.peeringdb.com/; PCH Packet Clearing House, Packet Clearing House Report on Internet Exchange Point Locations (database), accessed December 14, 2020, https://www.pch.net/ixp/summary; TeleGeography, Submarine Cables (database), https://www.submarinecablemap.com/. Data at http://bit.do/WDR2021-Map-O_4.
The protection of personal data is grounded in international human rights law, which requires that the interests of the data subject be adequately safeguarded before any kind of data transaction is enabled.20 This protection is usually achieved by requiring data subjects to provide some form of explicit consent for the use of their data. But is such consent meaningful? Evidence indicates that it would take the average person 76 days a year to thoroughly read the numerous disclosure documents soliciting his or her consent for each website and application visited.21 This finding suggests the need to strengthen the legal obligations of data service providers to act in the best interest of the customers whose data are being used.
Because nonpersonal data are less sensitive, they can for the most part be adequately protected through intellectual property rights, which allow some balancing of interests between data protection and data reuse. However, this Report finds that most low-income countries surveyed do not have intellectual property rights in place for private intent data.
Complicating matters further, the distinction between personal data and nonpersonal data is becoming increasingly blurred. This blurring arises from the widespread mixing and processing of different data sources using sophisticated algorithms that may render nonpersonal data (such as data from mobile phones) personally identifiable, or at least make it possible to identify specific social groups.
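The blurring of personal and nonpersonal data can be illustrated with a minimal sketch of a linkage attack, using entirely hypothetical records. A mobility dataset contains no names, only the cell towers nearest each subscriber’s home and workplace. Wherever that combination of quasi-identifiers is unique (in k-anonymity terms, k = 1), joining the dataset with a separate, named source that records the same attributes attaches a name to the record. Where the combination is shared, individuals are not singled out, but whatever is learned applies to the whole group. The datasets and the `reidentify` function are hypothetical illustrations.

```python
from collections import Counter

# Hypothetical "nonpersonal" mobility records: no names, only the cell towers
# nearest each subscriber's home and workplace, plus a monthly trip count.
mobility = [
    {"home": "T1", "work": "T7", "trips": 42},
    {"home": "T1", "work": "T7", "trips": 38},
    {"home": "T2", "work": "T9", "trips": 55},
    {"home": "T3", "work": "T7", "trips": 17},
]

# Hypothetical auxiliary dataset that does carry names (for example, a directory
# of where people live and work, mapped to the same towers).
directory = [
    {"name": "Person X", "home": "T2", "work": "T9"},
    {"name": "Person Y", "home": "T3", "work": "T7"},
    {"name": "Person Z", "home": "T1", "work": "T7"},
]

QUASI_IDENTIFIERS = ("home", "work")


def key(record):
    """The combination of quasi-identifiers shared by both datasets."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def reidentify(records, auxiliary):
    """Link each record whose quasi-identifier combination is unique (k = 1)
    to the named auxiliary record with the same combination."""
    group_sizes = Counter(key(r) for r in records)
    names = {key(a): a["name"] for a in auxiliary}
    linked = {}
    for r in records:
        k = key(r)
        if group_sizes[k] == 1 and k in names:
            linked[names[k]] = r["trips"]
    return linked


print(reidentify(mobility, directory))
# prints: {'Person X': 55, 'Person Y': 17}
```

Here Person X and Person Y are re-identified, while the two subscribers sharing the T1–T7 combination are not singled out, so Person Z cannot be matched to a single record. Even so, anything inferred about that pair applies to Person Z, which is the group-level exposure mentioned above.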
Enablers facilitate access to and reuse of data within and among stakeholder groups so that the full social and economic value of data can be captured. The nature and extent of provisions to support data sharing differ markedly between public intent and private intent data. Significant efforts have been made around the world to safely disclose public intent data through open data policies (encouraging proactive publication of government data), together with access to information legislation (giving citizens a legally enforceable right to compel disclosure). For real impact, however, open data policies must be supported by a consistent protocol for classifying sensitive data, combined with interoperable technical standards, machine-readable formats, and open licensing to facilitate subsequent reuse.
Governments have much less influence over the disclosure of private intent data. Sharing of such data may serve as a remedy for the concentration of market power, as in the Arab Republic of Egypt, where a merger between two major ride-hailing applications was made conditional on their sharing driver and rider information with smaller competitors. In other contexts, private intent data may also be critical for addressing important public policy challenges, such as the use of mobile phone records for contact tracing to control the spread of COVID-19. And yet relatively little attention has been paid so far to the possibility of incentivizing the exchange of private intent data through measures such as open licensing, data portability, and various types of data partnerships. Some countries, notably France, have nonetheless enacted legislation mandating the sharing of private sector data deemed to be in the public interest.22
Aligning data regulation with economic policy objectives can support the creation of value
Data play a central role in rapidly expanding platform-based business models. For example, search engines collect data on users’ site visits, which they can sell to marketing companies seeking to target advertisements more precisely. These platform-based business models are becoming increasingly important in low- and middle-income countries. The design of legal and regulatory frameworks for data has a real impact on the buoyancy of data-driven businesses and requires difficult policy balances. Providing access to essential sources of market data, for example, may be critical for promoting competition among platform businesses, but it may also affect incentives for investment and innovation in data-driven businesses. Likewise, regulations designed to protect personal data may restrict cross-border data flows and materially affect a country’s competitive edge in the burgeoning trade in data-enabled services (chapter 7).

Figure O.4 The legal and regulatory framework for data governance remains a work in progress across all country income groupings
[Bar chart: percentage of good practice laws and regulatory measures in place (axis 0–90), for high-income, upper-middle-income, lower-middle-income, and low-income countries, in seven areas: e-commerce framework; enabling access to public intent data; enabling access to private intent data; cybersecurity; personal data protection; nonpersonal data protection; and cross-border data flows.]
Source: WDR 2021 team, based on Global Data Regulation Survey conducted exclusively for this Report (https://microdata.worldbank.org/index.php/catalog/3866). Data at http://bit.do/WDR2021-Fig-O_4.
Note: The figure depicts the percentage of good practice laws and regulatory measures in place for countries covered by the survey in each country income group.
Competition and antitrust policy. Competition policy plays a critical role in ensuring that the value created by platform-based business models is equitably shared by producers and consumers. Economies of scale in data collection, together with network externalities that increase the value of a platform as more participants join, may lead to rapid accumulation of market power. Addressing such market dominance calls for two complementary strategies.
First, in countries that have sufficient capacity to enforce antitrust regulation, ex post antitrust enforcement should be applied, albeit with any adaptations needed to address the challenges posed by data-driven businesses. For example, the standard test of market dominance (overpricing by a market leader) may not be meaningful in sectors where platforms routinely provide consumer services for free. Yet even though several landmark antitrust cases involving platform businesses have emerged in middle-income countries such as Egypt, India, and Mexico, this Report finds that not a single low-income country has completed such a case, despite the presence of the same globally dominant firms in these markets.
Second, in parallel with antitrust efforts, ex ante regulatory measures to make essential data accessible to rival firms and new entrants also merit serious consideration, as does empowering consumers to switch among competing providers by mandating full portability of their personal data. Care should always be taken to verify that access to data is critical for competition and does not unduly affect incentives for innovation in data-driven businesses.
Trade policy. Platform-based businesses also open up new avenues for international trade, entailing substantial cross-border flows of data (figure O.5). A country’s regulatory framework for personal data protection has a material impact on its participation in such trade, creating some tension between trust and value creation. Countries have adopted a variety of approaches to this tension. Some, notably the federal jurisdiction in the United States, permit open data flows based on private sector standards, with limited government involvement. Others, such as China, Nigeria, the Russian Federation, and Vietnam, apply more stringent regulatory requirements, such as requiring copies of certain personal data to be stored domestically (data localization) and requiring state authorization for many international exchanges. In between are countries (including members of the European Union and others such as Argentina and South Africa) that make cross-border transfers of personal data conditional on whether the partner trading country offers an adequate data protection regime. This Report finds that a combination of well-defined domestic personal data protection measures with relative ease of cross-border data movements appears to offer the most favorable environment for international trade in data-enabled services.23

Figure O.5 Since 1990, the global trade in data-driven services has grown exponentially and now constitutes half of trade in services
[Chart, 1990–2018: trade in services (US$, billions, current; left axis, 0–6,000) by category (computer, communications, and other services; travel services; transport services; other (unclassified)), grouped as traditional services and data-driven services, with global digital data flows shown as IP traffic (PB per month; right axis, 0–100,000).]
Source: WDR 2021 team calculations, based on World Bank, WITS (World Integrated Trade Solution) database, http://wits.worldbank.org/WITS/. Data at http://bit.do/WDR2021-Fig-O_5.
Note: IP = Internet Protocol; PB = petabytes.
Tax policy. Even though data-driven transactions
are creating more economic activity, the governments
of low- and middle-income countries are struggling
to share equitably in this value by mobilizing the
associated tax revenues. For indirect taxes (such as
value added taxes), the revenue rights are clearly
allocated to the country in which the final sales
are made. However, the administrative capacity to
capture this revenue is typically lacking. Estimates
for East Asian countries suggest that losses to fiscal
revenues could amount to as much as 1 percent of
the gross domestic product (GDP) by 2030.24
As for
direct taxes (such as corporate taxes), agreed-on inter-
national rules are lacking for allocating rights to tax
businesses that operate in markets without any phys-
ical presence. In the absence of such a consensus, an
increasing number of countries have been resorting
to the application of ad hoc digital service taxes as a
compensatory measure.
Sound institutions and governance can improve the development impact of data
If institutions do not function well, policies and
laws and regulations are unlikely to be implemented
or enforced effectively, and infrastructure will not
deliver on its potential. An effective institutional
framework for data governance must fulfill several
critical functions, such as setting policy objectives,
developing supporting rules and standards, enforcing
compliance with such regulations, and continually
improving governance through learning and eval-
uation (chapter 8). For example, Uruguay’s creation
of a lead agency close to the Office of the President
and acting with a whole-of-government perspective
has been critical in driving the country’s successful
e-government reforms since 2007.
Although there is no one-size-fits-all approach
for governments seeking to create robust institu-
tional arrangements for data governance, certain
institutional design characteristics are of universal
importance. Institutions should be formally man-
dated, sufficiently resourced, and have the technical
capacity needed to effectively undertake their func-
tions in a coordinated manner across the whole of
government. The main institutional actors within
this framework often include data governance enti-
ties, data protection authorities, and cybersecurity
agencies, as well as new types of institutions such
as data trusts—accountability-oriented data interme-
diaries allowing individuals to pool their legal rights
over data and assign them to trustees with explicit
fiduciary duties. Institutional independence and
functional autonomy may be critical in some cases
to shield data governance institutions from undue
political or commercial influence. Behavioral and
cultural norms and political economy constraints
often stymie reform efforts, creating implementation
gaps, especially in low- and middle-income countries.
Change management, collaborative leadership, and a
culture of performance and incentives can help insti-
tutions overcome barriers to implementation and
coordination and effectively perform their roles and
responsibilities.
To maximize buy-in from all participants in the
data governance ecosystem, including society more
broadly, data management must be socially inclusive
and perceived as legitimate. Legitimacy is enhanced
when governments manage and use data in a
transparent manner and are subject to meaningful
systems of accountability. Nongovernmental actors
and emerging mechanisms such as data interme-
diaries can play an important role in the ecosystem
by helping governments and end users responsibly
share and use data to better harness their develop-
ment value, while safeguarding against the risks of
misuse or abuse. Engaging with stakeholders, across
society and internationally, in a collaborative and
transparent manner will foster trust and legitimacy
and strengthen the social contract around data use.
For example, the Association of Southeast Asian
Nations (ASEAN) has adopted a regional Framework
on Digital Data Governance, which helps coordinate
members’ data governance arrangements with a view
toward interoperability.
16 | World Development Report 2021
Moving toward an integrated national data system
A well-functioning data governance framework
ensures that infrastructure, laws, economic policies,
and institutions work together to support the use of
data in a way that aligns with each society’s values,
while protecting individuals’ rights over use of their
data. This framework defines the rules, and associ-
ated compliance mechanisms, for how data can be
safely shared, used, and reused by all stakeholders.
Part III of this Report concludes with an aspira-
tional vision of an integrated national data system
(INDS) that can deliver on the promise of producing
high-quality data and then making those data open in a
way that keeps them protected yet accessible for
sharing and reuse by all stakeholders (chapter 9). The
aspirational INDS works seamlessly with the gover-
nance structure. If the governance framework can
be viewed as creating and enforcing the “rules of the
road,” the INDS can be seen as the “network of high-
ways” that connect all users, ensuring safe passage of
data to and from destinations.
The INDS is built on an intentional, whole-of-
government, multistakeholder approach to data
governance. It explicitly builds data production, pro-
tection, exchange, and use into planning and deci-
sion-making across government entities and actively
integrates the various stakeholders from civil society,
the public sector, and the private sector into the data
life cycle and into the governance structures of the
system.25
A well-functioning system requires people to
produce, process, and manage high-quality data;
people to populate the institutions that safeguard and
protect the data against misuse; and people to draft,
oversee, and implement data strategies, policies, and
regulations. The system also needs people to hold the
public and private sectors accountable and people
capable of using data from the production process of
private firms to improve policies in the public sector.
All this requires robust data literacy so that a wide
cross section of people benefit from an INDS.
For a sound INDS, institutions and actors must
also have the right incentives to produce, protect, and
share data, and funding must be sufficient to imple-
ment the infrastructure and institutions needed for
the system to function well. Finally, a culture of data
use helps foster a high-quality supply of data and
stimulate the demand for data-informed decision-
making without which the national data system is
not sustainable.
When government agencies, civil society, aca-
demia, and the private sector securely take part in
a national data system, the potential uses of data
expand and so does the potential impact on devel-
opment. In fact, the more integrated the system and
the more participants involved, the higher is the
potential return. If two participants safely exchange
data, data can flow in two directions. If three partici-
pants exchange data, data can flow in six directions,
and with four participants, in 12 directions. As data
are reused and repurposed, these connections will
increase rapidly. Higher degrees of integration
require close coordination and shared governance
between participants, but such integration is otherwise
compatible with a decentralized data architecture.
The system is designed to ensure that data flow
freely and safely—not remain in one place.
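The direction counts in the paragraph above (2, 6, and 12) come from counting ordered pairs of distinct participants. Each of n participants can send data to each of the other n − 1, which gives n(n − 1) one-way flows, so the count grows roughly with the square of n. The minimal Python sketch below checks that formula against a brute-force enumeration. It illustrates the arithmetic only and is not a method from this Report. The participant labels are hypothetical, and the sketch counts only direct exchanges, not the further connections created as data are reused and repurposed.

```python
from itertools import permutations


def directed_flows(n: int) -> int:
    """One-way data flows among n participants: each sends to every other one."""
    return n * (n - 1)


def directed_flows_by_enumeration(participants: list[str]) -> int:
    # Count every ordered (sender, receiver) pair with sender != receiver.
    return sum(1 for _ in permutations(participants, 2))


if __name__ == "__main__":
    # Hypothetical participant labels, used only for the enumeration check.
    names = ["government", "academia", "civil society", "private sector"]
    for n in range(2, 5):
        print(n, directed_flows(n), directed_flows_by_enumeration(names[:n]))
```

By the same formula, 10 participants would produce 90 one-way flows and 20 would produce 380.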
Even though most countries are far away from the
aspirational goal of a well-functioning data system,
setting sights on this target can provide countries
with guidance on the next steps in developing such a
system. How countries move toward this vision of an
INDS will depend on their current capacity and the
parameters of the social contract for data. There is no
singular blueprint for how to build an INDS. Instead,
this Report proposes a maturity model to help assess
progress. Countries in the initial stages are likely to
benefit the most from establishing the fundamentals
for an integrated national data system. This includes
developing policies and strategies aimed at better data
governance, strengthening the technical capacity of
government agencies and the national statistical office
to produce and use data, and promoting data literacy
through education and training. With the fundamentals
in place, governments can work on initiating
and systemizing data flows across and between
the participants in the national data system. This
requires policies and standards that ensure the con-
sistency and interoperability of data and institutions
and infrastructure to enable the secure exchange of
data that mitigates privacy risks. At advanced levels
of data maturity, the goal is to optimize the system
through shared data governance and collaboration
between the various stakeholders from government,
international organizations, civil society, and the pri-
vate sector.
The structure of this system will differ from
country to country, reflecting local norms for the
safe reuse and sharing of data. Nonetheless, certain
common attributes are needed to realize the devel-
opment gains from reusing and sharing data. A
well-functioning data system defines and establishes
the authority and responsibility for data production,
flow, and use in a nation. This system would build
on the infrastructure, policies, laws and regulations,
and institutions discussed here; integrate the many
sources of data; and connect all the stakeholders
(figure O.6).
For many countries, a system in which high-
quality data flow and are used safely among various
participants remains a distant vision. A low-income
country suffering from high levels of poverty, fra-
gility, and poor governance may struggle to produce
even the most fundamental data, let alone set up
a whole-of-government, multistakeholder approach
to data governance. Yet keeping this vision in sight
matters for all countries, even those struggling the
most with data, because it can serve as a guide in
making decisions on how to develop their data sys-
tems (box O.1).
Coalescing around a common understanding
of a new social contract for data (one built on trust
to produce value from data that are equitably
distributed) and finding the right blueprint for building
an integrated national data system (one that
unleashes the value of data to improve lives through
creative, innovative applications by a widening array
of users) are highly aspirational goals. Achieving
these goals will require significant changes in how
data are produced, managed, protected, shared, and
used. Making these changes will be difficult and
will depend on substantial commitment and effort,
but the cost of failing to change is a world faced with
greater inequities and many missed opportunities.
Figure O.6 What happens in an integrated national data system?
[Figure: data are produced, protected, open, quality controlled, used, and reused by academic institutions, civil society and individuals, government entities, international organizations, and the private sector; built on the pillars of infrastructure policies, laws and regulations, economic policies, and institutions; sustained by the foundation of funding, human capital, data demand, trust, and incentives.]
Source: WDR 2021 team.
Box O.1 Toward an integrated national data system: Country examples
Important steps in the right direction. Many countries
have adopted important initiatives that embody aspects
of what is envisioned in an integrated national data sys-
tem. South Africa’s Department of Planning, Monitoring
and Evaluation has developed a system that includes
the data produced by citizens who monitor the perfor-
mance of government programs. In Chile, civil society
participation is mandated by the 2011 Law on Associa-
tions and Citizen Participation in Public Management,
and the national statistical office has put in place a civil
society council. The inclusion of multiple stakeholders
in a national data system encourages sustainability and
helps ensure that all participants have an opportunity to
access and benefit from it. The Nepal Data Literacy Pro-
gram, established in 2019, comprises a 100-hour mod-
ular, customizable pedagogy to support both technical
skills building and efforts to enhance a culture of data
use among Nepalis. The program is now partnering with
Kathmandu University School of Management (KUSOM)
to incorporate data literacy toolkits into the university
programs and develop a data-driven course that will be
free to other institutions and thousands of students.
A fully realized vision. In Estonia, the government
has set up a national data system to safely manage
citizens’ personal data for use by government agencies
and participating businesses. X-Road is an open-source
data exchange layer solution that allows linked public and
private databases to automatically share information,
ensuring confidentiality, integrity, and interoperability
between data exchange parties. It combines a technical
solution (enabling technical architecture and a series of
protocols) with a governance solution (the once-only
principle enshrined in national law that obliges public
sector agencies to refrain from duplicating data requests).
Under this system, citizens have to supply government
agencies and participating businesses with their infor-
mation only once. It is then automatically transmitted to
other participating entities. X-Road’s cryptography proto-
cols also enhance transparency because they log entries
into the system and give individuals detailed insights into
who is sharing their data and for what purposes.
The X-Road arrangement both builds on and
enhances Estonia’s social contract on data by providing
trust, equity, and value. Its transparency engenders trust.
Its national scope, available to all, promotes equity. Its
ease and comprehensiveness provide value. To work
well, this digital data system depends on some “ana-
logue” components. Cooperation is fostered between
government and the private sector and between com-
ponents of infrastructure. Change management is built
into the entire system, from its foundations in national
law (and the social contract) to its design, uptake, and
upkeep. A culture of trust and sharing (data sharing)
is encouraged.
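The Python sketch below models a toy exchange layer to make two ideas from box O.1 concrete: the once-only principle and the logging of data access. It is a hypothetical illustration and does not reproduce X-Road's actual architecture or API. The class and method names, participants, and sample values are invented, and the cryptographic protocols that real systems such as X-Road rely on are omitted. In the sketch, each participant keeps its own database, and a citizen supplies an item only once. Other participants fetch that item through the layer, and each fetch is logged with the requester and purpose so the citizen can review who used the data and why.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessLogEntry:
    requester: str
    holder: str
    citizen_id: str
    attribute: str
    purpose: str
    timestamp: datetime


@dataclass
class ExchangeLayer:
    """Toy data exchange layer (hypothetical; not X-Road's real design or API).

    Each participant keeps its own database; the layer only routes requests
    between participants and records every access.
    """
    databases: dict[str, dict[tuple[str, str], str]] = field(default_factory=dict)
    log: list[AccessLogEntry] = field(default_factory=list)

    def register(self, holder: str) -> None:
        # Add a participant with an empty local database.
        self.databases.setdefault(holder, {})

    def submit(self, holder: str, citizen_id: str, attribute: str, value: str) -> None:
        # Once-only: the citizen gives this item to a single holder, one time.
        self.databases[holder][(citizen_id, attribute)] = value

    def request(self, requester: str, holder: str, citizen_id: str,
                attribute: str, purpose: str) -> str:
        # Another participant reuses the stored item instead of asking the citizen again.
        value = self.databases[holder][(citizen_id, attribute)]
        # Record who asked, which holder answered, and why.
        self.log.append(AccessLogEntry(requester, holder, citizen_id, attribute,
                                       purpose, datetime.now(timezone.utc)))
        return value

    def accesses_for(self, citizen_id: str) -> list[AccessLogEntry]:
        # Transparency view: every logged use of one citizen's data.
        return [e for e in self.log if e.citizen_id == citizen_id]


if __name__ == "__main__":
    layer = ExchangeLayer()
    for name in ("population-register", "tax-agency", "health-insurer"):
        layer.register(name)
    layer.submit("population-register", "citizen-1", "address", "Tallinn")
    layer.request("tax-agency", "population-register", "citizen-1", "address", "tax assessment")
    layer.request("health-insurer", "population-register", "citizen-1", "address", "benefit eligibility")
    for entry in layer.accesses_for("citizen-1"):
        print(entry.requester, "->", entry.purpose)
```

The sketch leaves each database with its holder instead of copying everything into one store. This reflects the point in note 25 that an integrated national data system does not mean a single national database.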
Notes
1. Chapter 1 reviews definitions of data and describes how the term data is used in this Report.
2. Amnesty International (2019); Zuboff (2019).
3. Scheer and Cohen (2020).
4. Scheer and Cohen (2020).
5. World Bank (2017).
6. FMWR (2018).
7. World Bank, Statistical Performance Indicators (database), http://documents.worldbank.org/curated/en/815721616086786412/Measuring-the-Statistical-Performance-of-Countries-An-Overview-of-Updates-to-the-World-Bank-Statistical-Capacity-Index.
8. Brynjolfsson, Hitt, and Kim (2011).
9. For an extensive discussion of this problem and many other concerns about machine learning, see O’Neil (2017).
10. For an early reference to “garbage in–garbage out” in the statistical literature, see Parzen (1964).
11. Aiken et al. (2020).
12. Burke and Lobell (2017); Osgood-Zimmerman et al. (2018).
13. An early illustration of how data can be used to improve accountability for public expenditure can be found in a study of the use of education budgets in Uganda by Reinikka and Svensson (2001).
14. See Broadband Commission for Sustainable Development, International Telecommunication Union, “Target 3: Connectivity” (accessed October 31, 2020), https://broadbandcommission.org/Pages/targets/Target-3.aspx.
15. Chen (2021). Analysis is based on Access Survey 2017–18 data collected by Research ICT Africa in 22 low- and middle-income countries across Africa, Asia, and Latin America.
16. GSMA (2019).
17. World Bank analysis of World Trade Organization ad valorem duties for “Telephones for cellular networks ‘mobile telephones’ or for other wireless networks” (Harmonized System code 851712).
18. ITU et al. (2018).
19. COE (2018).
20. Safeguards for personal data are grounded in a human rights framework based on international law. These safeguards have their origin in the establishment of the “rule of law” with the expression of individual rights in the Enlightenment and were codified in international law after World War II. They were further refined in the context of analog data in the 1970s and 1980s with the Fair Information Practices, the Council of Europe’s 1981 Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (Convention 108), and the first guidelines issued by the Organisation for Economic Co-operation and Development (OECD). The OECD guidelines and Convention 108 were updated in the digital context after the launch of the World Wide Web in 1995 and continue to evolve.
21. Madrigal (2012).
22. OECD (2019).
23. Ferracane and van der Marel (2021).
24. Al-Rikabi and Loeprick (forthcoming).
25. An integrated national data system does not imply that all data are integrated in a national database. Instead, various participants are integrated in a system in which data are safely flowing and used. This is akin to a national statistical system in the sense that an ensemble of participants jointly collects, protects, processes, and disseminates official statistics. But unlike in the national statistical system, the scope of an integrated national data system goes well beyond official statistics; it requires an intentional approach to governing the participants and their roles.
References
Aiken, Emily L., Guadalupe Bedoya, Aidan Coville, and Joshua E. Blumenstock. 2020. “Targeting Development Aid with Machine Learning and Mobile Phone Data: Evidence from an Anti-Poverty Intervention in Afghanistan.” In COMPASS ’20: Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies, 310–11. New York: Association for Computing Machinery.
Al-Rikabi, Jaffar, and Jan Loeprick. Forthcoming. “Simulating Potential Tax Revenues from Data-Driven Platform Businesses in East Asia.” WDR 2021 background paper, World Bank, Washington, DC.
Amnesty International. 2019. “Surveillance Giants: How the Business Model of Google and Facebook Threatens Human Rights.” Report POL 30/1404/2019, Amnesty International, London. https://www.amnesty.org/en/documents/document/?indexNumber=pol30%2f1404%2f2019language=en.
Brynjolfsson, Erik, Lorin M. Hitt, and Heekyung Hellen Kim. 2011. “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” SSRN Scholarly Paper ID 1819486, Social Science Research Network, Rochester, NY. doi.org/10.2139/ssrn.1819486.
Burke, Marshall, and David Lobell. 2017. “Satellite-Based Assessment of Yield Variation and Its Determinants in Smallholder African Systems.” PNAS, Proceedings of the National Academy of Sciences 114 (9): 2189–94. doi.org/10.1073/pnas.1616919114.
Chen, Rong. 2021. “A Demand-Side View of Mobile Internet Adoption in the Global South.” Policy Research Working Paper 9590, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/492871616350929155/A-Demand-Side-View-of-Mobile-Internet-Adoption-in-the-Global-South.
COE (Council of Europe). 2018. “Convention 108+: Convention for the Protection of Individuals with Regard to the Processing of Personal Data.” COE, Strasbourg. https://rm.coe.int/convention-108-convention-for-the-protection-of-individuals-with-regar/16808b36f1.
Ferracane, Martina Francesca, and Erik Leendert van der Marel. 2021. “Regulating Personal Data: Data Models and Digital Services Trade.” Policy Research Working Paper 9596, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/890741616533448170/Regulating-Personal-Data-Data-Models-and-Digital-Services-Trade.
FMWR (Federal Ministry of Water Resources, Nigeria). 2018. “National Action Plan for Revitalization of the WASH Sector.” FMWR, Abuja.
GSMA (GSM Association). 2019. “Connected Society: State of Mobile Internet Connectivity 2019.” GSMA, London. https://www.gsma.com/mobilefordevelopment/resources/the-state-of-mobile-internet-connectivity-report-2019/.
ITU (International Telecommunication Union), World Bank, ComSec (Commonwealth Secretariat), CTO (Commonwealth Telecommunications Organisation), and NATO (Cooperative Cyber Defence Centre of Excellence, North Atlantic Treaty Organization). 2018. “Guide to Developing a National Cybersecurity Strategy: Strategic Engagement in Cybersecurity.” ITU, Geneva. https://www.itu.int/pub/D-STR-CYB_GUIDE.01-2018.
Knippenberg, Erwin, and Moritz Meyer. 2020. “The Hidden Potential of Mobile Phone Data: Insights on COVID-19 in The Gambia.” Data Blog (blog). September 10, 2020. https://blogs.worldbank.org/opendata/hidden-potential-mobile-phone-data-insights-covid-19-gambia.
Madrigal, Alexis C. 2012. “Reading the Privacy Policies You Encounter in a Year Would Take 76 Work Days.” Atlantic, March 1, 2012. https://www.theatlantic.com/technology/archive/2012/03/reading-the-privacy-policies-you-encounter-in-a-year-would-take-76-work-days/253851/.
OECD (Organisation for Economic Co-operation and Development). 2019. Enhancing Access to and Sharing of Data: Reconciling Risks and Benefits for Data Re-Use across Societies. Paris: OECD. https://www.oecd-ilibrary.org/science-and-technology/enhancing-access-to-and-sharing-of-data_276aaca8-en.
O’Neil, Cathy. 2017. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York: Broadway Books.
Osgood-Zimmerman, Aaron, Anoushka I. Millear, Rebecca W. Stubbs, Chloe Shields, Brandon V. Pickering, Lucas Earl, Nicholas Graetz, et al. 2018. “Mapping Child Growth Failure in Africa between 2000 and 2015.” Nature 555 (7694): 41–47. doi.org/10.1038/nature25760.
Parzen, Emanuel. 1964. “Review of Smoothing, Forecasting and Prediction of Discrete Time Series, by Robert Goodell Brown.” Journal of the American Statistical Association 59 (307): 973–74. doi.org/10.2307/2283122.
Reinikka, Ritva, and Jakob Svensson. 2001. “Explaining Leakage of Public Funds.” Policy Research Working Paper 2709, World Bank, Washington, DC.
Scheer, Steven, and Tova Cohen. 2020. “Israel Extends Coronavirus Cell Phone Surveillance by Three Weeks.” Emerging Markets (blog), May 5, 2020. https://www.reuters.com/article/us-health-coronavirus-israel-surveillanc/israel-extends-coronavirus-cell-phone-surveillance-by-three-weeks-idUSKBN22H11I.
World Bank. 2017. A Wake Up Call: Nigeria Water Supply, Sanitation, and Hygiene Poverty Diagnostic. WASH Poverty Diagnostic Series. Washington, DC: World Bank. https://openknowledge.worldbank.org/handle/10986/27703.
World Bank. 2019. “Tanzania, Mainland Poverty Assessment 2019: Executive Summary.” World Bank, Washington, DC. https://www.worldbank.org/en/country/tanzania/publication/tanzanias-path-to-poverty-reduction-and-pro-poor-growth.
Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: PublicAffairs. https://www.hbs.edu/faculty/Pages/item.aspx?num=56791.
PART I
Advancing development objectives through data
1. Harnessing the value of data for the poor
2. Data as a force for public good
3. Data as a resource for the private sector
4. Creative reuses of data for greater value
Harnessing the value of data for the poor | 23
Main messages
1. Data can improve people’s lives in many ways. However, economic and political factors typically prevent benefits from being shared equitably.
2. The value of data for development is largely untapped. Realizing data’s full value entails repeatedly reusing and repurposing data in creative ways to promote economic and social development.
3. The challenge is to develop a trust environment that safeguards against harmful misuse of data as they are exchanged between parties and enables data to be created, reused, and repurposed.
4. A strong data governance framework, composed of appropriate policies, laws, regulations, and institutions, is needed to ensure that the full value of data is realized and shared safely and equitably.
CHAPTER 1
Harnessing the value of data for the poor
The untapped potential of data to serve development objectives
At the end of the nineteenth century, English
sociologist Seebohm Rowntree interviewed
a sample of families with the aim of better
understanding the poverty experienced not only by
those he interviewed, but also by everyone in the town
of York.1
The findings from this work changed pre-
conceptions by revealing that poverty was pervasive
outside of London and by demonstrating that people
cycled in and out of poverty over the course of their
lives.
How to turn data into information and infor-
mation into insights that can help the poor is at the
heart of this Report (see box 1.1 on the use here of the
term data). In the twenty-first century, data possess
the power to be truly life-changing. Most of the new
and fascinating ways in which data affect the lives of
many of us worldwide are linked to people being able
to extract greater value from data. Indeed, the data
produced by people can be used in innovative ways to
help them, but one does not have to be the producer
or user of data to benefit from the data revolution. In
fact, the data often collected from a small sample of
people can help shape policy to improve the lives of a
vastly larger population, whether they were part of the
sample or not, just as Seebohm Rowntree revealed
in his pioneering efforts. But for such approaches to
work, the samples must be truly representative of the
population, including the poor and other marginal-
ized groups. And yet both traditional censuses and
sample surveys, as well as new data sources captured
by the private sector, may fail to fully cover the most
disadvantaged groups.
Box 1.1 What this Report means by data
The term data is difficult to define. It has meant different
things at different times, and in different disciplines.
Originally simply defined as facts, the term slowly
came to mean facts as they related to mathematical
representations. Despite the changing nature of data,
most people would not have thought of things such as
pictures, sounds, or words as data even as recently as
a few decades ago. But times have changed, and major
advances in computing power, together with innova-
tive thinking, have resulted in, for example, radiomics,
the science of converting medical images into data
that, once structured and analyzed, can help improve a
patient’s diagnosis and prognosis.a Similarly, sound can
now be digitized and analyzed to, for example, explore
and better understand the galaxies.b And the growing
field of text analytics converts words (such as keywords
from Google searches) into structured data that help us
better understand many social phenomena.c Because the
evolving definition of data stems simply from technolog-
ical advances in computing and creative thinking, it is
challenging to provide a specific description of data that
would not soon seem archaic or anachronistic.
In very general terms, Carrière-Swallow and Haksar
point out that “data can be quantitative or qualitative
in nature, and may be stored on analog (that is, paper,
stone tablets) or digital media.”d This view conforms
with how this Report uses the term. Indeed, some data
are still collected on paper in many countries. Process-
ing these data (digitizing them and entering them in a
spreadsheet or database) allows them to be more easily
analyzed, but a digital format is not necessarily an attri-
bute of data.
The Organisation for Economic Co-operation and
Development (OECD) states broadly that data are
“characteristics or information, usually numerical, that
are collected through observation.” More specifically,
data are “the physical representation of information in
a manner suitable for communication, interpretation, or
processing by human beings or by automatic means.”e
Although this description aligns fairly well with how the
term is used in this Report, a few distinctions are worth
noting. Here, data are sometimes collected through
observation, though they need not be. Data can be the
result of digital transactions or simply by-products of
our daily digital lives. Also, in this Report, data are not
synonymous with information. Rather, data must be
processed, structured, and analyzed to be converted
into information. This semantic distinction between data
and information emphasizes the critical role of improved
data management, literacy, and analysis for extracting
information, and creating value, from data.
An expansive description of data that resonates well
with how the term is used in this Report is provided by
the UK National Data Strategy:
When we refer to data, we mean information about
people, things and systems. . . . Data about people
can include personal data, such as basic contact
details, records generated through interaction
with services or the web, or information about
their physical characteristics (biometrics)—and it
can also extend to population-level data, such as
demographics. Data can also be about systems and
infrastructure, such as administrative records about
businesses and public services. Data is increasingly
used to describe location, such as geospatial refer-
ence details, and the environment we live in, such as
data about biodiversity or the weather. It can also
refer to the information generated by the burgeoning
web of sensors that make up the Internet of Things.f
a. Gillies, Kinahan, and Hricak (2015); Yala et al. (2021).
b. See, for example, Leighton and Petculescu (2016).
c. See, for example, Stephens-Davidowitz (2017).
d. Carrière-Swallow and Haksar (2019, 17).
e. Organisation for Economic Co-operation and Development (OECD), “Glossary of Statistical Terms: Data,” OECD Statistics Portal, https://stats.oecd.org/glossary/detail.asp?ID=532.
f. See “What We Mean by Data” (DCMS 2020).
An important attribute of data is that using them
does not diminish their value to be reused for some
other purpose—data are inexhaustible. But reusing or
repurposing data typically requires well-functioning
data systems that facilitate the safe flow of data in
formats that make the data valuable to many users.
These systems, however, typically do not function
well in many low- and middle-income countries.
Moreover, data have a dark side. Making data
accessible to more users and creating systems that
facilitate their reuse also opens the door for data to be
misused in ways that can harm individuals or devel-
opment objectives. With lives becoming increasingly
intertwined in the digital world, each day brings new
concerns about protecting personal data; misinfor-
mation; and attacks on software, networks, and data
systems.
Well-functioning data systems thus balance the
need to safeguard against outcomes that harm people,
while simultaneously enabling the potential for data
to improve lives. This Report returns often to the
need to strike this balance between safeguarding and
enabling.
The findings and recommendations in this Report
are drawn from an extensive array of material, includ-
ing academic research, international development
agency reports, commercial experiences, and a series
of consultations with innovators and stakeholders in
the data world. Although this Report reinforces and
builds on findings from World Development Report 2016:
Digital Dividends,2 the World Bank report Information
and Communications for Development 2018: Data-Driven
Development,3 and many reports on digital technology,
this Report differs by focusing on how data themselves,
rather than the adoption of digital technology, can
improve the lives of poor people.4
World Development Reports often synthesize established
findings from analytical work and research, but
the issues and content surrounding data are evolving
rapidly. Many of the topics covered continue to be
widely debated in rich and poor countries alike. Con-
sensus has yet to emerge, and research is at an early
stage, particularly on how these issues affect low- and
middle-income countries. The goal, therefore, is not
to be overly prescriptive, but to develop frameworks
to help policy makers and countries think through
the trade-offs and adopt a balanced approach to devel-
oping both safeguards and enablers. Countries should
make the most of data, but safely, and as appropriate
for their social, political, and economic context.
The growing literature on data over the last few
years is largely written from a high-income country
perspective.5 This Report therefore sets out to fill the
large gap in the literature on the effects of data on
poor people and poor countries.
A brief history of data
Many of the themes of this Report were emerging
even in the earliest days of data collection and use.
For millennia, people have been collecting data. The
oldest censuses date back at least to 2000–1000 BCE
in ancient Egypt, Greece, and China, where people,
livestock, and food items were enumerated.6 The
Romans fielded a census of men and their possessions
every five years, a practice referenced in the
Christian Bible.7
Over the long history of data collection, the type
of data collected and the ways data have been used
have changed as societies’ priorities, values, power
structures, and government objectives have changed.8
Record keepers in the Incan Empire between 1400
and 1500 CE counted people, dwellings, llamas,
marriages, and potential army recruits.9 Rulers and
administrators gave priority to counting sources of
wealth and power considered of strategic importance
(the data were kept secret from the public). They
collected information first and foremost on property
for taxation and men for military recruitment and
labor force purposes, as well as enumerating newly
conquered peoples and territories. With little reason
to believe that the data being collected were meant to
improve lives, distrust was widespread: it was not
uncommon for citizens to resist being counted or
having their possessions counted.10
The ascent of Enlightenment ideals in eighteenth-century
Europe, with their emphasis on objective scientific
inquiry, brought a shift in attitudes toward the role
of data in society, from simply counting and registering
phenomena to describing and understanding living
conditions for society as a whole.11 During this era,
and under the influence of the leading intellectuals
of the day, notions of the rule of law (and
accountability of states) evolved, a social contract
between the individual and the state coalesced, and
the Declaration of the Rights of Man and of the Citizen
emerged.12 These became the foundational elements
of the current discussions of a social contract
for data based on human rights (see chapter 6).
Starting in the late eighteenth century, govern-
ments of the emerging nation-states in Europe and
North America established statistical agencies to pub-
lish official statistics on the state of the nation and
to inform public discourse. European nations began
systematically conducting full-fledged population
censuses, and a decennial national census became a
provision of the US Constitution. By the end of the
nineteenth century, half of the world’s population had
been enumerated in censuses (figure 1.1).13
These advances also led to some of the innovations
in statistics and social science research methods
that enabled the rise of the sample survey. The earliest
examples of sampling date back to the late seventeenth
and early eighteenth centuries, but they lacked
the theoretical foundations to justify the method.14 Sampling
remained highly controversial throughout the nineteenth
century, but methodological advances, especially the concept
of random sample selection, led to its gradual acceptance in
the early twentieth century. A series of influential articles
in the 1930s, 1940s, and 1950s filled the holes in the
theoretical foundations of survey sampling around the same
time that sampling frames with universal coverage became
available.15 Sample surveys grew enormously popular,
especially in the United States, quickly covering a wide
range of topics.
Modern geospatial data systems developed along
a similar timeline. Building on the much older science
of cartography, this type of data is rooted in
the thematic maps of the eighteenth and nineteenth
centuries. Its goal was to relate geography to other
types of information.16 A prominent early application
was the spatial mapping of disease outbreaks, for
example, of yellow fever in New York City at the end
of the eighteenth century and especially of cholera in
British and other European municipalities during the
pandemics of the nineteenth century.17 Most prominent
among those is the map of London by physician
John Snow. During the 1854 cholera outbreak, Snow
plotted cholera-related deaths in London together
with the city’s water pumps, identifying a high
concentration of cases close to a pump on Broad Street
and deducing that water from this pump was causing
infections (map 1.1). New cases in the area stopped
almost entirely once the pump had been removed.18
Since the advent of Snow’s map, innovations in printing
and computer technology as well as the rise of
remote sensing have made geospatial data and their
applications versatile and ubiquitous.19
Figure 1.1 The share of people counted in a census grew from about 1 in 10 in 1850 to
9 in 10 today
[Chart: world population (billions) by decade, 1850–2010, shown as a shaded area; bars give the percentage of the population enumerated in a census each decade, rising from 11% in 1850 to about 9 in 10 in recent decades.]
Source: Whitby 2020. Data at http://bit.do/WDR2021-Fig-1_1.
Note: The shaded area represents the world’s population; the bars indicate the percentage of the population that was enumerated each decade.
With the digital revolution, the types and scope
of data have changed dramatically, and the volume
of data collected has grown exponentially. In this
new landscape, private sector actors are playing an
increasingly large role in data collection through
platform-based business models in which data are
collected passively as a by-product of business processes.
Digital platforms have also expanded the opportunities
for citizens to collect data themselves, often filling
gaps where governments fail to collect data (see spotlight
1.1). Examples include Utunzi, a platform that allows
individuals and organizations to report and document
violence against LGBTQI individuals,20 and various
platforms that allow users to report air pollution
levels, deforestation, and other location-specific
environmental data to raise awareness and spur action.
The foundational origins of data protection laws
can be linked to the Enlightenment era. Although
there is a clear arc from these historical concepts of
rights governing interactions between the state and
the individual to principles guiding data protection,
most policies guiding data regulation are very modern
(see chapter 6). The principles of data protection
can trace their immediate roots to the US Fair
Information Practice Principles developed in the 1970s,
which formed the basis for the 1980 OECD (Organisation
for Economic Co-operation and Development)
Guidelines (revised in 2013).21 Similarly, the basic
substantive rights and obligations in the European
Union’s General Data Protection Regulation, first
reflected in the EU’s 1995 Directive on the Protection
of Personal Data, trace their roots to the OECD
Guidelines.22
A data typology
Although data can be used to improve development
outcomes, the challenges differ across data types.
To help readers conceptualize these data types and
better understand those challenges, this Report sorts
data types using a two-dimensional framework
(table 1.1). In the first dimension, data are classified
based on whether the original intent was for public or
commercial purposes. Both new and traditional types
of data collected for commercial purposes are called
private intent data. Data originally collected for public
purposes are called public intent data, regardless of
the collection instrument or the entity that manages
the data. Public intent data tend to be collected pur-
posefully with a view toward representativeness. By
contrast, private intent data are self-selecting in that
they cover only users of cellphones and the internet,
for example, and collection of these data may be
more incidental.
The second dimension distinguishes between
“traditional” and “new” data collection methods. Pub-
lic intent data are typically associated with traditional
data types such as censuses and surveys, although
newer sources of data (such as from satellite imaging
or e-government platforms) have become more prev-
alent. By design, traditional data collection efforts by
governments are for public purposes and are used to
inform policy making. But because the collection of
public data via traditional methods tends to be
relatively costly,23 surveys are conducted infrequently,24
and they often lack the granularity necessary to make
meaningful inferences about subpopulations of inter-
est. Meanwhile, traditional public intent data offer
important advantages over new private intent data
Map 1.1 John Snow’s innovative mapping of the cholera epidemic in London in 1854 revolutionized tracing of the disease
Source: Ball 2009. Map segment reproduced from John Snow, On the Mode of Communication of
Cholera, 2nd ed. (London: John Churchill, 1855).
Note: The solid black rectangles of various sizes represent deaths from cholera.
Data collection methods and tools, by original intent:
Traditional
  Public intent data: census, national accounts, household surveys, enterprise surveys, labor force surveys, surveys of personal finance, administrative records
  Private intent data: any survey conducted by private entities, including public opinion surveys deployed by private entities; administrative data from company financial accounts
New
  Public intent data: location data from satellite imaging, digital identification, facial recognition from public cameras, public procurement data from e-government platforms
  Private intent data: just-in-time digital data on individual behavior/choices from digital platforms in the private sector
in terms of their coverage of the population—and
thus their potential to benefit more people—and their
format, which makes them amenable to inferential
analytics by researchers and government officials.
Private intent data are often associated with new
sources of data produced using digital tools and
applications that are growing rapidly. Compared
with traditional public intent data, new private data
sources offer greatly improved timeliness, frequency,
and granularity of data, but they may not be repre-
sentative in coverage. New private intent data can
contribute significantly to addressing public sector
development challenges. Private intent data collected
through cellphones, internet usage, satellites, remote
sensors, and other sources provide information about
individuals and geographic locations that traditional
surveys simply cannot.
Any simple framework used to classify data types
carries limitations. Although much public intent data
have long been collected using traditional methods,
those methods are being updated and adapted. The
new methods will increasingly supplement or replace
traditional methods, and so the traditional–new dif-
ferentiation in table 1.1 is likely to evolve. The distinc-
tion between public and private stewardship of data
also may not be a salient one in some cases. For exam-
ple, citizen‑generated data—data that people or their
organizations produce to directly monitor, demand,
or drive change on issues that affect them—can be
produced through crowdsourcing mechanisms or
citizen reporting initiatives, and such data are often
organized and managed by civil society groups. The
data may reside with a private entity, but they are
clearly collected for public purposes.
Although data gathered through new methods
for private purposes offer tremendous potential to
improve timeliness and detail through massive sam-
ple size, they are not a panacea for the shortcomings
of public intent data collected using more traditional
methods. For one thing, private firms have little
incentive to curate their data for sharing, and thus
these data are not readily amenable to public use.
A potentially more difficult challenge is coverage.
Data collected for public policy purposes are almost
always designed to represent the relevant current
population (such as individuals, firms, health facili-
ties, students, or schools). However, survey designers
face challenges in meeting the representativeness
objective in terms of both coverage (such as under-
representation of slum inhabitants, top earners, or
informal enterprises) and timeliness (due to delays
in data processing). By contrast, collectors of private
intent data rarely need or have an interest in full pop-
ulation coverage; they focus much more on specific
subgroups (such as consumers and suppliers). Thus,
even though such samples can be massive and the data very
timely, they can provide only partial reflections of the
population. A study from the United Kingdom exam-
ined data from a variety of social media platforms and
found that none was representative of the population,
particularly underrepresenting the elderly, the less
well educated, and lower-income people.25
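The coverage problem can be made concrete with a small sketch. The Python below uses an entirely hypothetical population of ten people (it does not reproduce the UK study cited above) and compares a characteristic measured across everyone with the same characteristic measured only among the people a platform happens to observe. The class, field, and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age_group: str       # hypothetical age band
    low_income: bool     # characteristic the analyst wants to measure
    uses_platform: bool  # whether the person shows up in the platform's data


# Hypothetical population of ten people (illustrative numbers only).
population = [
    Person("18-34", False, True),
    Person("18-34", False, True),
    Person("18-34", True, True),
    Person("35-64", False, True),
    Person("35-64", True, True),
    Person("35-64", True, False),
    Person("65+", True, False),
    Person("65+", True, False),
    Person("65+", False, False),
    Person("65+", True, False),
]


def share_low_income(people):
    """Fraction of the given people who are flagged as low income."""
    return sum(p.low_income for p in people) / len(people)


# The platform only "sees" its own users: a self-selected subgroup.
platform_users = [p for p in population if p.uses_platform]

true_share = share_low_income(population)          # full-coverage benchmark
platform_share = share_low_income(platform_users)  # estimate from platform data only
elderly_in_platform = sum(p.age_group == "65+" for p in platform_users)

print(f"low-income share, whole population: {true_share:.0%}")
print(f"low-income share, platform users:   {platform_share:.0%}")
print(f"people aged 65+ in platform data:   {elderly_in_platform}")
```

With these made-up numbers the platform data put the low-income share at 40 percent against a true 60 percent and contain no one aged 65 or older. Collecting more records from the same users would not close that gap, because the problem is who is covered, not how many.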
Public policies and programs need to be informed
by data that represent the relevant population. For this
reason, private intent data should not be viewed as a
substitute for public intent data in understanding the
scope of many development problems (box 1.2). That
said, the joint use of public intent data collected using
traditional methods and newer sources of private
intent data offers interesting opportunities to reap
significantly more value added than the isolated use
of one kind of data or the other.
Table 1.1 Examples of data types based on original intent and collection methods
Source: WDR 2021 team.
A key theme of this Report is that governments should take
advantage of complementarities between new and traditional
data to confront development challenges. For example,
because the majority of the world’s poor live in rural
areas and derive their livelihoods from the land,
measuring agricultural productivity is central to
policies and programs to eliminate extreme poverty.
Yet recent research has
shown that agricultural productivity, specifically
crop yield, is poorly measured with traditional survey
approaches that rely on farmer-reported information
on crop production and land areas.26
When sample
surveys rely instead on objective measurement meth-
ods, the resulting data not only accurately capture
crop yields at surveyed locations, but also can be used
Box 1.2 Innovation in traditional surveys: A COVID-19 example in Brazil
A prime example of the importance of traditional surveys
and their potential for innovation comes from Brazil. In
May 2020, it was one of the first countries to complete
nationally representative surveys to produce data on the
prevalence of COVID-19.a Fieldworkers clad in personal
protective equipment conducted a serology test on ran-
domly selected household members. This test detects
the presence of antibodies in the blood as a response to
a specific infection, such as COVID-19—that is, it detects
the body’s immune response to the infection caused by
the virus rather than the virus itself. While waiting for the
results of the test, the fieldworkers administered a brief
questionnaire to collect sociodemographic data and
asked the tested household member whether she or he
was experiencing symptoms associated with COVID-19.b
Asking questions about symptoms enabled the research
team to estimate rates of asymptomatic infection. Socio-
demographic questions, especially those about work and
travel outside the home, enabled the team to measure
how much a household member adhered to social dis-
tancing guidelines.
The test results were conveyed to the household
member before the fieldworkers left the dwelling, and
information on positive tests was sent to health author-
ities to help them track the spread of the virus. In May,
25,025 interviews in 133 “sentinel cities” were completed
in the baseline survey. Cities were chosen because of
their primacy in the local region as hubs of commerce
and services for surrounding urban and rural areas.
The survey was conducted three more times, the most
recent round in late August 2020. Multiple survey rounds
enabled researchers and public health officials to track
the spread of the virus over time by region.
At least two findings based on these serology tests
and the interviews are striking. First, COVID-19 infections
were far more prevalent than had been recorded. Overall
seroprevalence (the share of the population that tested
positive for antibodies to the virus) for the 90 cities
with a sample size of 200 or greater was 1.4 percent in
the baseline survey. Extrapolating this figure to the full
population of these cities, which together account for
25 percent of the country’s population, produced an
estimate of 760,000 cases, compared with the 104,782
cases reported for those cities in official statistics as of
May 13, 2020. In the fourth round of the survey in
August, the seroprevalence rate had climbed to 3.8 percent.c
Second, there was a remarkably wide regional vari-
ation in seroprevalence around the 1.4 percent national
average, ranging from less than 1 percent in most cities in
the South and Center-West regions to 25 percent in the
city of Breves in the Amazon (North region). Eleven of
the 15 cities with the highest seroprevalence were in the
North. The six cities with the highest seroprevalence were
located along a 2,000-kilometer stretch of the Amazon
River. Beyond geography, seroprevalence varied across
ethnic groups and was highest among indigenous popu-
lations (3.7 percent in the baseline survey). Understand-
ing the scope of the overall problem and identifying
regions and populations with the most pressing needs
would not have been possible without population-based
surveys. These data also provided information on the
effectiveness (or lack thereof) of approaches adopted to
combat the spread of the disease.
Broad support for investigating something as important
and urgent as the prevalence of COVID-19 might have been
expected, and yet opposition sprang up in some quarters.
For example, in some areas the achieved sample size was
reduced by the rapid spread of disinformation through
social media that characterized the interviewers as
“swindlers,” or even as part of a plot to spread the virus.
In 27 cities, interviewers were arrested, and in eight
cities the tests were destroyed by the local police force.d
Overall, however, the example
illustrates the importance of population-based surveys
(and public intent data in general) for understanding the
scope and nature of disease spread.
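The extrapolation described in this box can be checked with simple arithmetic. The sketch below uses only the figures reported above and backs out the population that the 760,000 estimate implies, along with the ratio of estimated to officially reported cases. It assumes the estimate is simply prevalence times population. The published figure may reflect weighting or rounding not shown here, so the results are approximate.

```python
# Figures reported in this box for the baseline round (May 2020).
seroprevalence = 0.014     # 1.4 percent, 90 cities with sample size >= 200
estimated_cases = 760_000  # infections extrapolated to those cities
reported_cases = 104_782   # official count for the same cities, May 13, 2020

# Extrapolation logic: cases = prevalence x population, so the population
# implied by the published estimate is cases / prevalence.
implied_population = estimated_cases / seroprevalence

# Estimated infections for every officially recorded case.
infections_per_reported_case = estimated_cases / reported_cases

print(f"implied population of the 90 cities: {implied_population / 1e6:.1f} million")
print(f"estimated infections per reported case: {infections_per_reported_case:.1f}")
```

The implied population comes to roughly 54 million, in line with the statement that these cities hold about a quarter of Brazil's population. The survey-based estimate is a little over seven times the official count.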
a. Hallal, Hartwig, et al. (2020). Brazil is the only country in Latin America to
complete a national survey.
b. Hallal, Horta, et al. (2020).
c. UFPEL (2020).
d. Hallal, Hartwig, et al. (2020).
to inform and develop remote sensing models that
combine data from surveys and satellites to provide
highly localized crop yield estimates across entire
regions and countries beyond the locations in which
sample surveys are conducted.27
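The survey-plus-satellite approach described above can be sketched as a calibration step followed by a prediction step. For illustration only, the Python below assumes a straight-line relationship between a satellite vegetation index and objectively measured yield. The cited studies may use different models and many more inputs, and every number and name here (survey_index, fit_line, plot_A and so on) is hypothetical.

```python
from statistics import fmean

# Hypothetical calibration sample: surveyed plots where yield was measured
# objectively, paired with a satellite vegetation index for the same plot.
# Values are illustrative, not taken from the cited studies.
survey_index = [0.30, 0.45, 0.50, 0.60, 0.70]  # satellite index at surveyed plots
survey_yield = [1.1, 1.8, 1.9, 2.3, 2.7]       # measured yield, tonnes per hectare


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


intercept, slope = fit_line(survey_index, survey_yield)

# Apply the calibrated model to locations the survey never visited,
# where only the satellite index is available.
unsurveyed_index = {"plot_A": 0.55, "plot_B": 0.40, "plot_C": 0.65}
predicted = {plot: intercept + slope * v for plot, v in unsurveyed_index.items()}

for plot, y_hat in predicted.items():
    print(f"{plot}: {y_hat:.2f} t/ha")
```

The design point is that the survey supplies ground truth only where fieldworkers went, while the satellite covers every location. The fitted relationship is what carries information from the first set of places to the second.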
Both public intent and private intent data have
advantages and disadvantages and pose distinct
challenges in terms of reuse and exchange to achieve
development objectives. But because public intent
and private intent data have inherent complementar-
ities, they can be used jointly to bolster development.
A ministry of health would be able to design better
public policy if it could connect its health data with
that of other ministries such as education, labor, and
planning, as well as with that of health providers,
whether public or private, around the country. A pri-
vate firm would be able to operate more effectively if
it could link its data with other sources of informa-
tion, such as satellite data on population density and
socioeconomic data on wealth and well-being.
Combining the two types of data could advance
evidence-based policy through more precise and
timely official statistics that are produced more
cheaply, while preserving the representativeness
characteristic of public intent data. For example,
building on the well-established infrastructure for
socioeconomic surveys conducted by governments,
satellite data and call detail records from mobile
phones offer new opportunities for updating poverty
estimates for small areas more frequently. More gen-
erally, the high frequency of data collected for com-
mercial purposes holds promise for producing better
estimates of current socioeconomic conditions when
large-scale, costly surveys such as censuses or inte-
grated household surveys such as those of the World
Bank’s Living Standards Measurement Study are
infrequent. Real-time data on prices, nighttime lights,
or trade flows could be used to help “nowcast” (that
is, generate an estimate for the current time based on
data collected with a lag in time) macroeconomic data
to avoid lags in availability.
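A common, simple form of nowcasting is a bridge equation. It estimates the historical relationship between a timely indicator and a lagged official series, then applies that relationship to the latest value of the timely indicator. The sketch below assumes a single linear relationship between growth in nighttime light intensity and official GDP growth. All figures and names are invented for illustration, and operational nowcasting models usually combine many indicators.

```python
from statistics import fmean

# Hypothetical history: quarters for which both series are available.
# lights   = growth in satellite-measured nighttime light intensity (percent)
# official = officially published GDP growth (percent), released with a lag
lights = [1.0, 2.0, 1.5, 3.0, 2.5]
official = [0.9, 1.6, 1.3, 2.4, 2.0]


def fit_bridge(x, y):
    """Fit official = a + b * proxy by ordinary least squares."""
    mx, my = fmean(x), fmean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


a, b = fit_bridge(lights, official)

# Current quarter: the proxy is already observed, the official figure is not.
lights_now = 2.2
nowcast = a + b * lights_now

print(f"bridge equation: official = {a:.2f} + {b:.2f} * lights")
print(f"nowcast for the current quarter: {nowcast:.2f} percent")
```

With these hypothetical inputs the fit is official = 0.16 + 0.74 × lights, giving a nowcast of about 1.79 percent for the current quarter. When the official figure is eventually published, it can be added to the history and the equation re-estimated.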
The economics of data and political economy issues
The potential to extract further value from the
proliferation of data is significant because data are
inexhaustible or “nonrival”: a person’s call detail
records, location history, internet usage, and
medical records, among other things, can be used
repeatedly by firms and governments for different
purposes without depleting them.28 This property is
illustrated by the data life cycle (figure 1.2), which
depicts the potential circularity of data use, reuse, and
repurposing, as long as data can be made safely accessible
across a wide array of users and unless explicit
steps are taken to destroy the data.
Figure 1.2 The data life cycle
[Diagram: the labeled stages include Create/receive, Store, Process, Analyze and use, Transfer/share, Archive and preserve, Reuse, and Destroy.]
Source: WDR 2021 team.
Because of the nonrival character of data and the
implications for their limitless reuse, it is inherently
difficult to place an economic value on data, although
many attempts have been made to do so. The diverse
approaches taken range from a cost-based methodology
that adds up different components of the information
value chain,29 to approaches that directly quantify the
economic benefits that data yield by improving
efficiency, reducing transaction costs, or expanding
markets,30 to estimates based on the stock market
value of data-intensive companies and related
acquisition transactions.31 Although all of these
approaches agree on the high value of data, the
partial nature of these estimates, together with the
heterogeneity of the approaches, prevents any
definitive conclusion.
In view of today’s increasingly sophisticated appli-
cation of machine learning and artificial intelligence
to drive data-based innovations, it is quite conceivable
that the economic value of unanticipated secondary
uses of data may far exceed the value of the primary
use of data—that is, the use for which they were orig-
inally collected. These characteristics raise the pros-
pect of serious underinvestment in data collection
from a social perspective because the entities bearing
the cost of data collection will not necessarily be the
ones capturing its full economic value.
At the same time, data are not a pure public good
because they are excludable, allowing the entity that
originally collects and holds the data to prevent oth-
ers from accessing them. Examples abound across
the public sector of the unwillingness of data holders
to share data with other government entities and the
public. In the private sector, firms may not want to sell
or exchange their data with others, perhaps because
governments and firms lack the capacity to share or
exchange their data in a safe manner, or more simply
because of a lack of incentives (or legal requirements)
to make the data available. In some cases, reuse and
sharing of data may cede informational advantages
to competing firms in the private sector or rival enti-
ties in the public sector. Although the excludability
of data suggests that they could readily be traded on
markets, other economic characteristics complicate
this classical approach to addressing allocation issues
(see box 1.3).
A fundamental reason for the lack of incentives to
share, sell, or exchange data lies in the considerable
economies of scale that accrue to holding data and the
associated economic or political power that they bring to
the data holder. Although the returns to the first few
bits of data are essentially zero, there is a point past
which the returns from additional data, and from
improvements in the systems supporting these data,
are substantial and increasing until they ultimately
level off.32 For example, in the field of artificial
intelligence the size of datasets is a critical determinant
of the accuracy of predictive algorithms.33 Modern
deep-learning techniques, with their complex models,
have an even more voracious appetite for large datasets
than traditional machine learning, and they may not
begin to experience diminishing returns until they
reach much larger scales, with datasets containing
tens or hundreds of millions of data points.
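The shape described above (near-zero returns at first, then rising returns that eventually level off) can be illustrated with an S-shaped curve. For illustration only, the sketch below assumes that a model's accuracy is a logistic function of the logarithm of dataset size. The functional form, the midpoint of one million data points, the steepness, and the name stylized_accuracy are hypothetical choices, not estimates from the research cited here.

```python
import math


def stylized_accuracy(n, midpoint_log10=6.0, steepness=3.0):
    """Hypothetical S-shaped accuracy of a model trained on n data points.

    Accuracy is a logistic function of log10(n): near zero for small
    datasets, rising steeply around 10**midpoint_log10, then leveling off.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (math.log10(n) - midpoint_log10)))


# Dataset sizes from 10 to 1,000,000,000, each ten times the previous one.
sizes = [10 ** k for k in range(1, 10)]
pairs = list(zip(sizes, sizes[1:]))

# Gain in accuracy from each tenfold increase in dataset size.
gains = [stylized_accuracy(large) - stylized_accuracy(small) for small, large in pairs]

for (small, large), gain in zip(pairs, gains):
    print(f"{small:>13,} -> {large:>13,}: gain {gain:.3f}")
```

Under these assumptions each tenfold increase adds almost nothing below about 10,000 data points, adds the most between 100,000 and 10 million, and adds little again beyond 100 million.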
In addition to economies of scale, data are characterized
by economies of scope because combining different
types of related datasets can yield insights that
otherwise would have been unavailable from one type
of data alone. Similarly, weak and seemingly very
tangential relations can be identified through machine
learning techniques with larger and larger volumes of
data. For example, Google’s search engine data may be
used to evaluate the effectiveness of advertising on
YouTube, which is also owned by Google.34 Social media
platforms can also track users’ behavior to build very
detailed advertising profiles.
Imbalances in information sharing, concentrations of power, and equity concerns: A dark side of data
Such strong economies of scale and scope in data,
and the resulting assemblage of valuable information
by some actors at the potential expense of those
who are excluded from the transaction, may lead to
a concentration of power, economic or political, in
the hands of those with privileged access to large
volumes of data.35 In the private sector, market
forces are likely to lead to data agglomeration and
market concentration in data-driven businesses,
which may preclude entry by small firms and
eventually create conditions for the abuse of market
power. Today, the firms that control the greatest
agglomerations of data are among the world’s largest.
The concentration of personal information in a
handful of companies raises concerns about market
power and discrimination. A key theme throughout
this Report is balancing the gains in efficiency that
new data bring with such equity concerns. On the
government side, data agglomeration may lead to
a concentration of personal information, which
can be used to amass and maintain political power,
discourage dissent, and even discriminate against
some population segments. Measures that limit and
neutralize this kind of dominance founded on the
control of data need to be central to any data
governance framework.
Because reliable statistics can expose poor pol-
icy decisions and performance, dilute power, and
increase public scrutiny and pressure on govern-
ments, vested interests can be expected to intervene
to distort decisions about the collection, reuse, and
sharing of data. And indeed this Report finds strong
associations among country statistical performance,
independence of national statistical offices, and
freedom of the press, controlling for country size
and income level (chapter 2). The patterns indicate
that a free and empowered press is a critical check
on government power and an important facilitator of
statistical independence and data transparency.
Alternative data sources can provide a check on
political influences when the accuracy or impartiality
of official statistics is in question. For example, online
prices obtained through web scraping have been used
to construct daily price indexes in multiple coun-
tries, providing a comparison with official inflation
figures. Researchers found that from 2007 to 2011,
when Argentina reported an average annual inflation
rate of 8 percent, online data indicated that the rate
exceeded 20 percent.36 The higher figure was consistent
with inflation expectations from household surveys
conducted at the time and similar to estimates of
Box 1.3 The challenges of trading data through markets
From an economic perspective, it seems plausible that
access to data is best solved by first defining clear
economic property rights over data and then allowing
parties to trade in data. However, the limited, nascent
economic literature on this subject suggests that, for two
reasons, these propositions are not as straightforward as
they may initially seem.
First, legal and economic challenges confound the
definition of property rights over data. A central issue
is the ambiguity involved in allocating property rights
between the data subject and the data collector, each of
which has some legitimate claim to be the “data owner.”
Present legal frameworks such as the European Union’s
General Data Protection Regulation allocate certain spe-
cific rights to the data subject, implicitly leaving residual
rights to the data collector as a purely de facto property
right.a Typically, a greater degree of data protection will
benefit the data owner to the detriment of other potential
data users, and vice versa. This trade-off suggests that there
is an economically optimal level of data protection. However,
without an efficient allocation of property rights, this
social welfare–maximizing outcome will not be attained.b
The large synergies and complementarities that arise
across different types of data (economies of scope)
raise the concern that fragmented ownership patterns
will prevent them from being realized, whether through
strategic behavior or through technical barriers such as
lack of interoperability. However, the classic trade-off
between the static objective of making data widely
available to maximize economic value and the dynamic
objective of preserving incentives for further data to be
collectedc
has weakened considerably with the advent
of digital data that are often collected without cost as a
by-product of other economic activities.
Second, although private bilateral market exchanges
of data are well established in certain niches (specifically,
trading personal data to target advertising), there are
as of today no open multilateral markets for data, and
many attempts to create such data markets have failed.d
Because data are one of many experience goods that are
difficult to evaluate in advance in areas such as price and
quality, an important challenge is how data providers can
convey information about the quality of their data before
providing access.e
In practice, data provenance has become the main
means of signaling the quality and accuracy of data,
relying on the reputation of the original source. How-
ever, the metadata needed to establish provenance may
themselves be subject to legal restrictions in areas such
as privacy, and data sellers may have strategic incen-
tives to conceal or manipulate such information. The
theoretical literature demonstrates that the institutional
mechanisms currently available for trade in data have led
to a sharp trade-off between the feasible scale of a data
market and the ability to verify the quality of the data
traded.f
Data may be traded via markets on a much larger
scale in the future, but legal and institutional adaptations
will be crucial to address challenges regarding data
property rights and quality.
a. Duch-Brown, Martens, and Mueller-Langer (2017).
b. Duch-Brown, Martens, and Mueller-Langer (2017).
c. Duch-Brown, Martens, and Mueller-Langer (2017).
d. Koutroumpis, Leiponen, and Thomas (2020).
e. This is known as the Arrow Information Paradox (Arrow 1962).
f. Koutroumpis, Leiponen, and Thomas (2020).
Harnessing the value of data for the poor | 33
some provincial governments and local economists.
Because online price data were available outside the
country, efforts by Argentina’s government to discour-
age local economists from collecting these data inde-
pendently were largely ineffective. These practices
were halted in 2015 as Argentina took steps to reaffirm
its commitment to the transparency and reliability of
official data through its National Institute of Statistics
and Censuses (INDEC). Similar disparities between
official inflation statistics and those obtained from
online prices have recently emerged in Turkey.37
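The comparison between official and online inflation rests on two calculations, sketched briefly below. The Python is a minimal illustration that assumes daily scraped prices keyed by hypothetical product IDs. It chains a daily Jevons index (the geometric mean of price relatives for products observed on consecutive days), which is one common matched-model choice and not necessarily the method of the study cited. The helper `cumulative_increase` shows how far apart the two reported rates drift once compounded. Treating 2007 to 2011 as four years of compounding is an assumption.

```python
from math import exp, log


def daily_jevons_index(prices: list[dict[str, float]], base: float = 100.0) -> list[float]:
    """Chain a daily price index from scraped prices.

    prices[t] maps a (hypothetical) product ID to its price on day t. Each daily
    link is the geometric mean of price relatives for products seen on both days,
    so items that appear on or disappear from the website do not break the series.
    """
    index = [base]
    for prev, curr in zip(prices, prices[1:]):
        matched = prev.keys() & curr.keys()
        if not matched:
            # No overlapping products: carry the previous value forward.
            index.append(index[-1])
            continue
        mean_log_relative = sum(log(curr[p] / prev[p]) for p in matched) / len(matched)
        index.append(index[-1] * exp(mean_log_relative))
    return index


def cumulative_increase(annual_rate: float, years: int) -> float:
    """Percent rise in the price level after compounding annual_rate for `years` years."""
    return ((1 + annual_rate) ** years - 1) * 100


if __name__ == "__main__":
    scraped = [
        {"rice_1kg": 10.0, "milk_1l": 20.0},
        {"rice_1kg": 11.0, "milk_1l": 22.0},  # both items up 10 percent
        {"rice_1kg": 11.0},                   # milk missing from the site today
    ]
    print(daily_jevons_index(scraped))
    print(cumulative_increase(0.08, 4), cumulative_increase(0.20, 4))
```

Under these assumptions, 8 percent a year compounds to a price level about 36 percent higher after four years. At 20 percent a year, the price level roughly doubles (about 107 percent higher) over the same period.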
Governments can pose broader challenges to the use of nonofficial data sources. For example, Tanzania’s 2018 amendment to its 2015 Statistics Act made members of civil society groups that published independent statistical information liable to imprisonment. Approval of the National Bureau of Statistics was required to publish such information, and publishing statistics that “invalidate, distort or discredit” official statistics was deemed a criminal offense. These provisions were subsequently amended amid international pressure.³⁸
Finally, the transparency and reliability of official statistics can have important macroeconomic implications. At a time when public debt levels are exploding from pandemic-related spending (see spotlight 1.2), governments may be less than forthcoming with data on the public debt, potentially enabling them to overborrow and hide debts from both citizens and creditors, at least for a while. Eventually, however, that strategy can have negative repercussions. For example, in 2013 and 2014 three state-backed companies in Mozambique took on more than US$2 billion in government-guaranteed debt, equivalent to about 13 percent of gross domestic product (GDP).³⁹ Roughly US$1.2 billion of it was borrowed without being disclosed to parliament and the public. The country’s access to international credit markets was severely curtailed after the hidden loans were revealed in 2016. To rehabilitate its reputation, the government has undertaken a complex reform package to foster greater transparency and improve governance and anticorruption frameworks.⁴⁰
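The Mozambique figures also indicate the scale of the undisclosed borrowing relative to the economy. This rough worked check treats “more than US$2 billion” and “about 13 percent” as exact, which is an approximation. On that basis, implied GDP is about US$15 billion, and the undisclosed US$1.2 billion alone comes to nearly 8 percent of GDP:

```latex
\[
\text{GDP} \approx \frac{\$2.0\ \text{bn}}{0.13} \approx \$15.4\ \text{bn},
\qquad
\frac{\$1.2\ \text{bn}}{\$15.4\ \text{bn}} \approx 0.078 \approx 7.8\%
\]
```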
Data for development: A conceptual framework
This Report poses two fundamental questions. How can data better advance development objectives? And what kind of data governance arrangements are needed to support the generation and use of data in a safe, ethical, and secure way while also delivering value equitably? The first part of this Report identifies the multiple pathways through which data can support or inhibit the development process. It relies on the conceptual framework presented in this chapter, together with concrete illustrations and examples from recent experience in less developed and emerging countries.
Three pathways by which data can support development
Data can contribute to development by improving the lives of the poor through multiple pathways. The conceptual framework that guides this Report focuses on three such horizontal pathways (figure 1.3). The middle pathway covers data generated or received by governments and international organizations to support program administration, service delivery, and evidence-based policy making (see chapter 2). The top pathway covers data created and used by civil society and academia to monitor and analyze the effects of government programs and policies, and data used by individuals to empower themselves and gain access to public and commercial services tailored to their needs. The bottom pathway covers data generated by private firms. These data can be a factor of production that fuels firm and economic growth. But data can also be part of production processes in other ways (as an intermediate input, an output, or a by-product) and can be mobilized and repurposed to support development objectives (see chapters 3 and 4).
In figure 1.3, two-way arrows link data production and collection with the three groups of actors in the center of the figure. These arrows indicate that data do not merely flow to the actors. Data must also be collected with purpose, and the processing and analytics carried out by those actors provide important feedback about what data should be produced and collected going forward. The rectangle that encloses the actors indicates their centrality in processing and analyzing data to provide insights that lead to better lives and better development outcomes. Data can also be shared and exchanged among the three pathways, flowing vertically across public, private, and civil society channels for further impact on development. Data collected for use in one of these pathways can be accessed and repurposed for a different use through other pathways or by other data users.
Government and international organizations. At a basic level, data enable governments to understand the impact of policies and improve program administration and service delivery. For traditional data types such as household and firm surveys, national accounts, and administrative data, governments (or agents authorized by governments) have been central to collection efforts. They have typically collected data for specific purposes, often intended to improve policies and encourage development. However, without strong data systems in place to support data analysis in relevant applications, much of the potential for data to improve outcomes remains unrealized.
Important factors in supporting successful national data systems include trained staff, budgetary autonomy for agencies that collect data, adequate data infrastructure, connected databases, and international partnerships (see chapter 9). However, these resources are often scarce in low-income countries. As a result, these countries are the least equipped to collect and effectively use the data they need to assess and understand the scope and nature of the development problems they face and to make inroads in solving them. Enhancing the capacity of client countries to collect, analyze, and use data therefore has been, and will continue to be, a priority of the World Bank Group, and it is a major focus of this Report. International organizations can help countries overcome barriers such as a lack of funding, technical capacity, governance, and demand for public intent data. Sovereign-supported multilateral and bilateral development institutions are also important collectors and disseminators of data in their own right, and they support country governments in their efforts to improve and deploy data better.
A better ability to exchange public intent data across many platforms (interoperability) could increase their impact on development. Despite their advantages in coverage, suitability for some types of analysis, and potential for informing and improving policy, public intent data are often stored in different government agencies and formatted in different ways. Fragmentation and incompatibilities thus limit a government’s scope to use its data to the fullest extent to improve policies, service delivery, and targeting. Interoperability across public intent data sources is therefore an important goal.
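The Python sketch below makes the interoperability problem concrete. It maps two agencies’ records onto a shared schema and then links them on a common identifier. The agencies, field names, date formats, and household ID are all hypothetical. Real systems would also need agreed standards, reliable unique identifiers, and legal authority to share data, none of which this toy example handles.

```python
from datetime import datetime


def harmonize(record: dict, field_map: dict[str, str], date_fields: list[str], date_fmt: str) -> dict:
    """Rename source fields to a shared schema and normalize dates to ISO 8601."""
    out = {target: record[source] for source, target in field_map.items()}
    for field in date_fields:
        out[field] = datetime.strptime(out[field], date_fmt).date().isoformat()
    return out


def link(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Inner-join two harmonized record lists on a shared identifier."""
    right_by_key = {r[key]: r for r in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


# Hypothetical extracts: the same household, stored with different field names and date formats.
health_agency = [{"HH_ID": "001", "visit": "03/15/2020", "district": "North"}]
social_agency = [{"household": "001", "enrolled_on": "2020-01-02", "benefit": 40}]

health = [
    harmonize(r, {"HH_ID": "household_id", "visit": "last_clinic_visit", "district": "district"},
              ["last_clinic_visit"], "%m/%d/%Y")
    for r in health_agency
]
social = [
    harmonize(r, {"household": "household_id", "enrolled_on": "program_enrollment",
                  "benefit": "monthly_benefit"},
              ["program_enrollment"], "%Y-%m-%d")
    for r in social_agency
]
linked = link(health, social, key="household_id")
print(linked)
```

The shared schema does the real work here. Once both agencies agree on field names and formats, joining their records becomes a simple lookup.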
The central role of government and international organizations in fostering development through data use and reuse is captured in figure 1.3 by the placement of this pathway in the center of the figure (see chapter 2).
Individuals, civil society, and academia. In the top pathway, making data widely available enables individuals and civil society to hold governments accountable for policy choices. Inputs from civil society provide a feedback mechanism through which policies can be adapted and improved, leading to more responsive governance. Civil society organizations themselves create data by conducting surveys and crowdsourcing information directly from citizens. Such data can foster discussion, government accountability, and transparency. Simply providing individuals with better access to their own data collected by government, international, or private sector actors is another way to enable citizens to advocate for themselves and improve their lives.
Figure 1.3 Three pathways along which data can foster development
[Figure: data production and collection link to three groups of actors (government and international organizations; individuals, civil society, and academia; the private sector), whose data analytics and processing lead to development, with reuse across pathways. Positive impacts shown include greater accountability, greater transparency, better policy making and service delivery, and increased business opportunities. Negative impacts include political surveillance, criminal activity and the dark net, market concentration, widening inequality, and discrimination.]
Source: WDR 2021 team.
Note: Positive impacts are shown in green; negative impacts are shown in red.
This pathway includes the use of administrative datasets by academic researchers to improve the quantity and quality of available evidence on social programs and policies.⁴¹ For example, administrative linked employer-employee datasets have been used to document earnings inequality and study the sources of its decline in Brazil.⁴² In Mexico, they have been used to study underreporting of wages by formal firms⁴³ and the effects of business start-up programs.⁴⁴ Often carried out in partnership with firms or governments, this type of research is increasingly being published in top academic journals.⁴⁵ However, broadening researchers’ access to administrative datasets remains a challenge, even in countries with well-developed statistical systems.⁴⁶
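One basic calculation behind studies of earnings inequality with linked employer-employee data is to split the variance of log earnings into two parts: a between-firm part (differences in average pay across firms) and a within-firm part. The Python below is a simplified sketch of that decomposition on hypothetical (worker, firm, earnings) rows. The cited studies use richer models, so this example illustrates the idea rather than their method.

```python
from collections import defaultdict
from math import log
from statistics import fmean, pvariance


def decompose_log_earnings(rows):
    """Split the variance of log earnings into between-firm and within-firm parts.

    rows: iterable of (worker_id, firm_id, earnings) tuples from a linked
    employer-employee table. Population variances are used, so the two
    parts sum to the total.
    """
    by_firm = defaultdict(list)
    for _worker_id, firm_id, earnings in rows:
        by_firm[firm_id].append(log(earnings))

    all_logs = [y for ys in by_firm.values() for y in ys]
    n = len(all_logs)
    grand_mean = fmean(all_logs)
    firm_means = {f: fmean(ys) for f, ys in by_firm.items()}

    # Between-firm: dispersion of firm average log earnings, weighted by firm size.
    between = sum(len(ys) * (firm_means[f] - grand_mean) ** 2 for f, ys in by_firm.items()) / n
    # Within-firm: dispersion of workers around their own firm's average.
    within = sum((y - firm_means[f]) ** 2 for f, ys in by_firm.items() for y in ys) / n
    return {"total": pvariance(all_logs), "between_firm": between, "within_firm": within}


# Hypothetical linked rows: (worker_id, firm_id, monthly earnings).
rows = [
    ("w1", "firmA", 1000), ("w2", "firmA", 1200),
    ("w3", "firmB", 3000), ("w4", "firmB", 4000), ("w5", "firmB", 3500),
]
result = decompose_log_earnings(rows)
print(result)
```

Tracking how the between-firm share changes over time is one simple way to ask whether falling inequality reflects pay converging across firms or within them.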
The private sector. Through the bottom pathway, data generated by the private sector also hold promise for improving the lives of the poor (see chapter 3). For one thing, data have become critically important in the production process of many firms. Indeed, the business models of some of the world’s largest firms (such as Amazon, Google, and Facebook) are predicated on data. Some important platform business models emerging in middle-income countries (such as Grab in Indonesia and Mercado Libre in Latin America) could greatly expand market access opportunities for small and medium enterprises. Other data-based private solutions can directly improve the lives of poor people. Digital credit, for example, which is often applied for via cellphone, facilitates financial inclusion. Private financial services providers are also using alternative credit scoring techniques that take advantage of users’ digital footprints to train machine learning algorithms. These algorithms identify, score, and underwrite credit for individuals who would otherwise lack documentation of their creditworthiness.
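Alternative credit scoring of this kind can be sketched as a supervised classifier trained on past borrowers’ repayment outcomes. The Python below fits a plain logistic regression by gradient descent on a few hypothetical digital-footprint features: airtime top-up regularity, mobile money activity, and on-time bill payment, each scaled from 0 to 1. It is a minimal illustration under those assumptions. Real lenders use far larger datasets and other model families, and they need fairness and privacy checks that are not shown here.

```python
from math import exp


def predict(weights: list[float], bias: float, features: list[float]) -> float:
    """Estimated probability of repayment under a logistic model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + exp(-z))


def train_logistic(X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 2000):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in zip(X, y):
            error = predict(weights, bias, features) - label  # d(log loss)/dz
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical past borrowers: [top-up regularity, mobile money activity, on-time bills]
X = [
    [0.9, 0.8, 0.95], [0.8, 0.9, 0.90], [0.85, 0.7, 1.0],  # repaid
    [0.2, 0.1, 0.30], [0.1, 0.3, 0.20], [0.3, 0.2, 0.10],  # defaulted
]
y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(X, y)

# Score a new applicant with no formal credit history.
applicant = [0.75, 0.6, 0.85]
print(f"estimated repayment probability: {predict(weights, bias, applicant):.2f}")
```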
Data reuse, sharing, and repurposing for all pathways. Enabling data reuse and repurposing is central to realizing the value of data (see chapter 4). Such reuse can take place between actors within each of the three pathways, but also across pathways. The two-way arrow in figure 1.3 between private firms and government indicates two flows: the reuse and repurposing for public policy of data originally collected for commercial purposes, and the reuse and repurposing of public intent data by firms. Similarly, the two-way arrow between individuals/civil society/academia and governments indicates the reuse, sharing, and repurposing of data between those parties. The final two-way arrows reflect the use of private sector data and data-driven applications by individuals/civil society/academia and the use by firms of data and analysis generated by individuals/civil society/academia.
The many examples of repurposing data to improve development outcomes include using geospatial location data from mobile phones, mobile call detail records, or social media (Facebook) and online search (Google) data to predict and trace the outbreak of disease, especially COVID-19 (box 1.4).⁴⁷ Online media and user-generated content can be used to map water and flood events in real time for water management and food security. Satellite imagery data from private and public sources can be combined to monitor crop yields and forecast malnutrition.⁴⁸
The COVID-19 experience has also shown how public statistics constructed from private sector data (on credit card spending, employment, and business revenues) can serve as a new tool for empirical research and policy analysis. In the United States, indicators were disaggregated by ZIP code, industry, income group, and business size. They showed that small businesses and low-income workers providing in-person services within wealthier ZIP codes were hardest hit by the reduction in consumer spending during the crisis.⁴⁹ The patterns suggest that widespread tax cuts or relief checks are not effective when people are afraid to go out and spend. Unemployment insurance benefits and grants or low-cost loans targeting struggling businesses are likely a better approach.⁵⁰
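Indicators of this kind are typically expressed as the percent change in spending relative to a precrisis baseline, computed separately for each group. The Python below is a minimal sketch of that aggregation on hypothetical card-spending records labeled by ZIP code income group. The group labels, periods, and amounts are invented for illustration and do not reproduce the cited estimates.

```python
from collections import defaultdict


def spending_change_by_group(records, baseline: str, crisis: str) -> dict[str, float]:
    """Percent change in total spending from the baseline to the crisis period, by group.

    records: iterable of (group, period, amount). Groups with no baseline
    spending are skipped, since a percent change is undefined for them.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for group, period, amount in records:
        totals[group][period] += amount
    return {
        group: 100 * (by_period[crisis] - by_period[baseline]) / by_period[baseline]
        for group, by_period in totals.items()
        if by_period[baseline] > 0
    }


# Hypothetical aggregated card spending (in dollars) for two ZIP code groups.
records = [
    ("high-income ZIP codes", "Jan 2020", 600.0), ("high-income ZIP codes", "Jan 2020", 400.0),
    ("high-income ZIP codes", "Apr 2020", 600.0),
    ("low-income ZIP codes", "Jan 2020", 800.0),
    ("low-income ZIP codes", "Apr 2020", 720.0),
]
print(spending_change_by_group(records, baseline="Jan 2020", crisis="Apr 2020"))
```

Comparing each group’s change against its own baseline, rather than comparing raw totals, is what lets such indicators show where spending fell most.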
Ways in which the same three pathways can harm development
Although use, reuse, and repurposing of data offer great prospects for fostering development, they simultaneously pose significant risks that must be managed to avoid negative development impacts. Mounting concerns about these risks have prompted calls for a new social contract around data. The risks can manifest themselves through public, private, and civil society pathways. Thus figure 1.3 also presents some concrete (though by no means exhaustive) illustrations (in red) of such negative impacts through each of the three pathways.
Box 1.4 Using private intent data to tackle COVID-19
At the onset of the COVID-19 outbreak, governments began implementing policy measures to reduce social contact and curb the spread of the pandemic. Data collected through mobile phones, such as call detail records and global positioning system (GPS) location data, proved extremely valuable in quantifying the effectiveness of policies ranging from partial curfews to strict lockdowns.ᵃ These data enabled measurement of population density, travel patterns, and population mixing in real time and at high resolution, making it possible to better target policy interventions and inform epidemiological modeling. Analysis of GPS locations showed that by March 23, 2020, social distancing policies had helped reduce mobility in major US cities by half.ᵇ In Colombia, Indonesia, and Mexico, travel restrictions and lockdowns had different effects on mobility across socioeconomic groups: those in the top decile of wealth reduced their mobility up to twice as much as those in the bottom decile.ᶜ
Despite the potential of deploying mobility data in the fight against COVID-19, their impact on policy has been limited in lower-income countries. Bottlenecks include a lack of technical expertise among government organizations; restrictions on data access, especially from mobile network operators; and a lack of the investment and political will required to scale up one-time projects.ᵈ
A review of the academic literature offers a broader look at the impact of repurposed data on the study of COVID-19 (figure B1.4.1 and map B1.4.1). Between February and September 2020, more than 950 articles were published in scientific, medical, and technical journals that repurposed cellphone, social media, Google search, and other types of private intent data to track the disease and to offer policy and operational solutions (figure B1.4.1). Despite the relatively large number of articles in a short time frame, the coverage of lower-income countries was quite limited, especially in Africa (map B1.4.1). This pattern holds after adjusting the number of articles for death rates associated with COVID-19 in each country, and it likely reflects the difficulties in accessing data and the limited research funding and capacity.
Figure B1.4.1 Use of repurposed data to study COVID-19: Published articles, by type of private intent data used
[Figure: number of articles containing “COVID” and one of the following terms, February to September 2020: call detail records, cellphone, Facebook, Google Trends, big data, Twitter, social media.]
Source: WDR 2021 team, based on data from CORD-19 (COVID-19 Open Research Dataset) Semantic Scholar team, Ai2 (Allen Institute for AI), http://www.semanticscholar.org/cord19. Data at http://bit.do/WDR2021-Fig-B1_4_1.
Note: Figure shows the number of articles published in scientific, medical, and technical journals across time from February to September 2020. The cumulative sum across all categories is higher because some articles appear in more than one category.
Map B1.4.1 Use of repurposed data to study COVID-19: Published articles, by country
[Map: number of articles by country, adjusted as described in the note; some countries have no data.]
Source: WDR 2021 team, based on data from CORD-19 (COVID-19 Open Research Dataset) Semantic Scholar team, Ai2 (Allen Institute for AI), http://www.semanticscholar.org/cord19. Data at http://bit.do/WDR2021-Map-B1_4_1.
Note: Map shows the number of articles published in scientific, medical, and technical journals across countries from February to September 2020. Article counts are divided by the COVID-19 death incidence rate.
a. Oliver et al. (2020).
b. Klein et al. (2020).
c. Fraiberger et al. (2020).
d. Oliver et al. (2020).
In the middle pathway, governments can abuse citizens’ data for political ends. As public sector data systems improve and become increasingly interoperable, governments may accumulate a wide array of information about specific individuals. As long as public accountability is strong and state actors can be presumed to act in the broader public interest, this need not be a major concern. However, if those presumptions do not hold, significant perils arise. One clear risk is the potential to misuse such data for politically motivated surveillance or discrimination along the lines of ethnicity, religion, race, gender, disability status, or sexual orientation. Another concern is the possible use of data by political incumbents, domestic political players, or even foreign actors to unduly influence electoral processes by privately targeting misinformation to marginal voters during campaigns. Civil society actors can also misuse data for surveillance (to recruit members for violent extremism, for example) or to unduly affect electoral processes, as can private firms, but governments are more likely to do so. Again, these examples of misuse of data are meant to be indicative rather than exhaustive.
In the top pathway, individuals and organized groups, such as cybercriminals who steal and manipulate sensitive information, can inflict considerable harm. The so-called dark net is a vast parallel network of hidden websites that can be accessed only by using specific software tools and private authorizations. It acts as an underground digital platform for a wide array of criminal activities, facilitating illegal trade in drugs, counterfeit currency, stolen goods, credit card numbers, forged papers, firearms, and human organs. In addition to facilitating criminal activity in the real world, the internet can be the locus of cybercrime in the digital world, with burgeoning security breaches leading to the theft of critical data and raising the risk of major disruptions to critical services. One recent study estimated the annual cost of such crime at between US$57 billion and US$110 billion in the United States alone.⁵¹ Data service providers tend to underinvest in cybersecurity because the economic consequences of any data security breach are largely borne by the clients whose data are compromised.
In the bottom pathway, private firms can potentially abuse consumers’ data through anticompetitive practices. Data-driven platform businesses experience steeply increasing returns to scale as user communities expand, leading to positive network externalities that make them more and more attractive to additional users. This dynamic has led to strong market concentration in platform businesses, including e-commerce, search engines, and social media, raising concerns about abuse of market power. For services that are provided free of charge, abuse of dominance may manifest itself in declining quality of service, particularly in terms of the level of privacy offered to consumers. In other cases, the use of algorithms can facilitate price collusion (tacit or otherwise). More broadly, data-driven businesses may exploit their vast information about consumer preferences and behavior to engage in aggressive or manipulative marketing techniques based on microtargeting of persuasive messages. This practice, known as nudging, may unduly influence consumers’ choices or simply be a nuisance.⁵²
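The increasing returns described above can be illustrated with a stylized model. The Python below assumes, purely for illustration, that each user gains a fixed value from every other user on the platform and that serving each user costs a fixed amount. This is a Metcalfe-style approximation, not an empirical estimate. Under these assumptions, total value grows roughly with the square of the user base while cost grows linearly, so value per user keeps rising as the platform grows.

```python
def platform_value(users: int, value_per_link: float = 1.0) -> float:
    """Total value if every user gains value_per_link from each other user (stylized)."""
    return value_per_link * users * (users - 1)


def platform_cost(users: int, cost_per_user: float = 50.0) -> float:
    """Total cost if serving each user costs the same fixed amount (stylized)."""
    return cost_per_user * users


for n in (10, 100, 1000):
    value, cost = platform_value(n), platform_cost(n)
    print(f"{n:>5} users: value per user {value / n:8.1f}, net value {value - cost:12.1f}")
```

In this toy setting, a small platform can run at a net loss while a large one gains steeply. A larger rival offers each new user more value, which is one way to see why such markets tend to tip toward a few dominant firms.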
Just as data can be reused for positive purposes, collecting and sharing sensitive data for ill-intentioned purposes can pose significant risks. For example, researchers at Cambridge Analytica developed a technique to map personality traits based on what people had “liked” on Facebook. The researchers paid users small sums to take a personality quiz and download an app that would scrape some private information from their profiles and those of their friends, an activity permitted at the time. Cambridge Analytica eventually obtained files for roughly 30 million users that contained enough information for the company to match users to other records and build psychographic profiles. However, only about 270,000 users (those who participated in the quiz) had consented to having their data harvested.⁵³ As a result, political campaigns were able to microtarget their political ads to individuals based on these profiles.
Although social media data can be reused to affect election outcomes, it is challenging to do so, and there is little solid evidence that the approach has had such effects thus far.⁵⁴ However, the Cambridge Analytica example demonstrates how private sector data can be leveraged by third parties (in this case, a political party) to attempt to influence voting behavior in ways the originators of the data (Facebook users and their friends) never intended.
The Cambridge Analytica example also highlights the importance of transparency as data are increasingly created, used, reused, and repurposed by a wider range of people, organizations, businesses, and other parties. At the most basic level, documentation of sources and of collection and aggregation methods is crucial for data quality and for inspiring trust among users of data. But transparent documentation is not a priority in all countries, and some governments may consciously opt for data opacity, thereby significantly undermining public trust. In short, data policy options are fraught with complex political economy constraints.
A data governance framework to enforce the social contract for data
Data governance entails creating an environment in which norms, infrastructure policies and technical mechanisms, laws and regulations for data, related economic policies, and institutions can effectively enable the safe, trustworthy use of public intent and private intent data to achieve development outcomes. A robust and effectively implemented data governance framework provides predictability and confidence that rights over data are protected and that protections are enforceable. In doing so, it can strengthen trust in the data system, thereby incentivizing the use of data-driven products and services, increasing their value, and ensuring a more equitable distribution of benefits. In effect, data governance enforces the social contract around data by applying the principles of trust, value, and equity.
A data governance framework can be visualized as four distinct layers that build on and support one another (figure 1.4). The foundational layer is the policy framework for data infrastructure. It includes both the policies that promote universal access to internet data services and the policies that ensure that countries have adequate infrastructure to exchange, store, and process data efficiently over the internet. The next layer consists of the legal and regulatory environment for data itself, which creates rules to enable the reuse and sharing of data while safeguarding against their potential abuse and misuse. This normative framework for data interacts in significant ways with wider economic policy issues, represented in the third layer, that affect a country’s ability to harness the economic value of data through competition, trade, and taxation. The fourth layer is the institutional ecosystem that ensures that data can deliver on their potential and that laws, regulations, and policies are effectively enforced.
Figure 1.4 Data governance layers at the national and international levels
Infrastructure policies
National:
• Universal coverage of broadband networks
• Domestic infrastructure to exchange, store, and process data
International:
• Global technical standards for compatibility of hardware and software
• Regional collaboration on data infrastructure to achieve scale
Laws and regulations
National:
• Safeguards to secure and protect data from the threat of misuse
• Enablers to facilitate data sharing among different stakeholders
International:
• Cybersecurity conventions for collaboration on tackling cybercrime
• Interoperability standards to facilitate data exchanges across borders
Economic policies
National:
• Antitrust for data platform businesses
• Trade in data-enabled services
• Taxation of data platform businesses
International:
• International tax treaties to allocate taxation rights across countries
• Global trade agreements on cross-border trade in data-enabled services
Institutions
National:
• Government entities to oversee, regulate, and secure data
• Other stakeholders to set standards and increase data access and reuse
International:
• International organizations to support collaboration on data governance and promote standardization
• Cooperation on cross-border regulatory spillovers and enforcement issues
Source: WDR 2021 team.
Infrastructure policies. The digital character of modern data makes infrastructure indispensable for collecting, exchanging, storing, processing, and distributing data (see chapter 5). Individual access to data infrastructure is a prerequisite both for contributing one’s own data and for accessing the data of others. The powerful network benefits that arise as more and more people are connected to data infrastructure are the economic underpinning for universal service policies that have also been widely endorsed politically. Significant policy efforts are needed to ensure adequate coverage of last-mile infrastructure that keeps pace with constantly evolving technological standards of performance. Policy makers also need to consider how to address demand-side factors, such as the affordability of handheld devices and data charges, as well as people’s limited data literacy skills.
At the country level, affordable processing of data transactions and adequate speed call for increasingly sophisticated data infrastructure. A starting point is to facilitate the creation of internet exchange points. These allow internet service providers to exchange domestic internet traffic across their respective networks without incurring the expense and slower speeds of routing that traffic overseas. A further step is to create a policy environment suitable for investment in colocation data centers. Such centers allow storage and retrieval of vast volumes of digital data, including local replicas of popular global internet content. They can also be used to provide dedicated access to overseas cloud computing capacity that facilitates increasingly sophisticated data processing and machine learning techniques. Small scale, regulatory deficiencies, and inadequate competition all conspire to hold back the development of all forms of data infrastructure in many low- and middle-income countries, posing particular challenges for policy makers.
Laws and regulations. Legal and regulatory frameworks for data need to be adequately developed, with a balanced emphasis on both safeguards and enablers (see chapter 6). The legal and regulatory provisions to safeguard personal and nonpersonal data differ greatly because these two types of data are typically generated, used, and treated in very different ways.
Personally identifiable data convey information that is specific to a known individual, although identifiers (such as names, addresses, and social security numbers) that directly or indirectly point to a person (or entity) could be deleted.⁵⁵ Some types of personal data, such as health histories or banking transactions, may be more sensitive than others, such as shopping records. Nonpersonal data are generated about nonhuman subjects, including institutions or machines. They may include data on prices, traffic patterns, weather, and agricultural practices. In practice, the boundary between personal and nonpersonal data is becoming increasingly blurred as it becomes possible to infer personal characteristics from nonpersonal data, such as mobile phone records. Advances in artificial intelligence are also making the deidentification of personal data more challenging and making it possible to draw personal inferences by combining multiple sources of nonpersonal data.
Safeguards for personal data differ markedly from those for nonpersonal data. For personal data, a rights-based approach to data protection is appropriate, with the rights of data subjects and the obligations of data users as the primary considerations. For nonpersonal data, intellectual property rights provide the relevant frame of reference, and there is greater scope to weigh the balance of economic interests between safeguards and enablers. Another important and underdeveloped aspect of data protection, affecting both personal and nonpersonal data, is cybersecurity.
Complementing such safeguards, greater access to data for reuse can be enabled by open data regulations and by provisions that ensure technical interoperability between different types of data, allowing them to be readily combined and repurposed. Data portability provisions, which allow individuals to move their own data from one service provider to another, also help enhance the agency of data subjects.
Economic policies. Because of the proliferation of data-driven platform business models, the design of legal frameworks for data governance carries significant implications for the real economy that are often overlooked (see chapter 7). Competition agencies grapple with the market power of globally dominant technology firms operating data-driven business models. Tax authorities struggle to collect revenues from platform businesses that often operate at scale in a market without any physical presence there and can readily shift tax liabilities across international borders. Trade policy introduces tensions between the need to protect data domestically and the desire to benefit from a growing cross-border trade in data-based goods and services. In each case, decisions about the design of the domestic regulatory framework for data will materially affect economic performance.
Institutions. For effective enforcement of the normative framework, a suitable institutional ecosystem that encompasses both state and nonstate actors must be in place (see chapter 8). The proliferation of arrangements around the world suggests that there is no single institutional blueprint for implementing data governance frameworks. What matters is identifying the critical functions needed to deliver on the safeguards and enablers embedded in legal statutes. Depending on the country context, it may make sense to assign some of these roles to existing institutions (such as the national statistical office or relevant sector regulators) or to create new institutions (such as data protection agencies or data intermediaries). Whatever the institutional architecture, common challenges facing the effective implementation of data governance policies include capacity and resource constraints, lack of institutional autonomy, difficulties adopting a data-driven culture, and problems of coordination across stakeholder groups.
International dimension. Although they are rooted in the domestic environment, data governance frameworks also have important international dimensions (as shown in figure 1.4 and further detailed in spotlights 7.2 and 8.1). In many instances, international treaties provide the overarching legal framework for the development of domestic legislation and regulations. International agreements are also critical for resolving long-standing data policy challenges such as how to treat cross-border data flows in international trade or how to allocate taxation rights for data transactions. At the institutional level, decisions made by policy makers and regulators, particularly in the larger global markets, will have important spillover effects in smaller countries, especially those with strong economic ties to these markets. These effects underscore the importance of cross-border cooperation in addressing common data governance challenges such as the regulation of market power in data-driven businesses. At the same time, data infrastructure is to a considerable extent cross-border in nature, with large volumes of data flowing to overseas storage and processing facilities and opportunities for regional collaboration around infrastructure development. Facilitating such cross-border data movements also requires global harmonization of technical standards.
In addition, there is an important role for international cooperation in creating common standards and guidelines for statistical activities (spotlight 2.2). International measurement standards and protocols improve the comparability of measures across countries, allowing national policy makers to understand their country’s performance relative to that of their neighbors. Cross-country measurement of progress toward policy goals and, more generally, of statistical performance ensures that countries can benchmark and monitor their data achievements and identify and address their weaknesses. Good data governance, at both the national and international levels, ensures that the various components work together to enable the effective and safe use of data in order to extract value in a trustworthy, equitable way.
Putting it all together: Establishing an integrated national data system
A well-functioning data governance framework ensures that infrastructure, laws and regulations, policies, and institutions work together to support the use of data in a way that aligns with the social contract for data. This framework defines the rules, and the associated compliance mechanisms, for how data can be safely used, reused, and shared by all stakeholders, including government entities, international organizations, civil society and individuals, academic institutions, and the private sector. To realize data’s potential, this framework must be built around a data system that not only ensures that data transactions are safe but also actively promotes access to data by all stakeholders (figure 1.5).
This Report concludes with an aspirational vision of an integrated national data system (INDS) that can deliver on the promise of producing high-quality data and then making those data open in a way that keeps them both protected and accessible for sharing and reuse by all stakeholders (chapter 9). Such an aspirational INDS works seamlessly with the governance structure. If the governance system is viewed as establishing the rules of the road (and the institutions governing those rules), the INDS can be envisioned as a network of highways that connects all users and ensures the safe passage of data to and from destinations.
A well-functioning INDS is powered by people: people to produce, process, and manage high-quality data; people to populate the institutions that safeguard and protect the data against misuse; and people to draft, oversee, and implement data strategies, policies, and regulations. The system also needs everyone to have sufficient skills and knowledge to use data in ways that allow them to hold the public and private sectors accountable. All this requires robust data literacy within data institutions, government ministries, the private sector, and the general population.
Figure 1.5 Data flow safely across all stakeholders in an integrated national data system
[Diagram: data flow among government entities, academic institutions, international organizations, civil society and individuals, and the private sector. Data attributes: produced, protected, open, quality controlled, used and reused.]
Source: WDR 2021 team.
There is no single blueprint for how to build an INDS. It must, however, be funded sufficiently to implement the infrastructure and institutions necessary for the system to function well. Incentives need to be in place to produce, protect, and share data, and to create a data culture in which people demand transparency and accountability. But how countries move toward this vision of an INDS will depend on their current capacity and the parameters of the social contract for data. Although the path toward an INDS will differ for each country, this Report proposes a sequenced maturity model to help assess progress and identify areas that need more attention to further the development of a well-functioning INDS.
The maturity model is based on a progression of three stages: first, establishing fundamentals; second, initiating data flows; and third, optimizing the system (chapter 8). Although progress within these stages will differ by country (and, for a given country, progress in dealing with certain types of data may also differ), these three stages nonetheless serve as a useful reference to help assess weak spots and gaps in the construction of an INDS.
Establishing fundamentals first requires taking stock to identify the existing data types and the data processing activities carried out by different actors. This analysis should focus on activities already taking place, both inside and outside of government, that present potential development opportunities for data use, reuse, and repurposing, along with risks. Uncovering gaps in the stock of data or bottlenecks in gaining access to these data can help prioritize efforts to address gaps and remove barriers. Governments should also engage with the private sector and civil society stakeholders to develop legislation, rules, and standards to safeguard data, while encouraging data collection, processing, and use. Other steps in establishing fundamentals include efforts to facilitate public-private data sharing and cross-border data transfers by establishing contracts with information management services (such as identification systems) or licenses for regulated entities (such as banks and telecom operators) that create provisions for secure, protected data transactions between public and private actors. Ensuring that the fundamentals are in place also includes developing a data governance strategy with policies and laws that promote the objectives of the INDS and enforce compliance with rules.
The next phase is to ensure that data begin to flow across all the stakeholders. One path to this goal is to establish a government agency with sufficient power to secure compliance across ministries and public sector agencies in how they manage and exchange data. In addition, the rules and standards that enable greater interoperability among datasets must be established. Interoperability allows for innovative new uses of multiple data files as these data become accessible to a more diverse set of users. It also allows for the development of measurement standards to ensure data quality.56 Public-private and cross-border data flows can be encouraged through multistakeholder engagements with domestic and international actors to promote the harmonization of principles, standards, and practices. Such engagements are particularly important for data protection and cybersecurity, which require coordination to be effective.
To reach the optimized stage, the tools and methods that helped create data flows should be incorporated into a unified whole-of-government approach. Ongoing, recurrent investments in training increase the effective use of data for decision-making and accountability. Similarly, recurrent investments in infrastructure keep systems sufficiently modern and expand access. Data quality, data integration, and data synchronization should be integral parts of all processes at this stage. Meanwhile, the safe flow of data through the data system should be continually assessed and stress-tested for weaknesses.
Organization of this Report
This Report is divided into three parts. Part I identifies the multiple channels through which data can support or impede the development process, making sense of the data landscape and pointing out the associated development opportunities and risks. This part provides a conceptual framework (figure 1.3), together with illustrations and examples from recent experience in low- and middle-income countries.
Part II, which describes the data governance layers presented in figure 1.4, focuses on data governance broadly defined to include data infrastructure policy (chapter 5), the legal and regulatory framework for data (chapter 6), the related economic policy implications (chapter 7), and institutions (chapter 8). These diverse elements are effectively the building blocks of a social contract that seeks to deliver the potential value of data equitably while safeguarding against harmful outcomes. Examples and case studies illustrate both the importance of establishing safeguards to prevent the misuse of data that could harm development objectives and how data can be better enabled to further development objectives.
Part III brings together the building blocks of the
Report to present the vision of an integrated national
data system (chapter 9).
Throughout the Report, spotlights at the end of
chapters highlight relevant cases in low- and middle-
income countries and internationally and explore
various policy issues in more depth.
This Report was prepared against the backdrop of the COVID-19 pandemic. The pandemic itself is a vivid illustration of the usefulness of data in dealing with obstacles to development and of the complexity of the associated governance challenges. Examples of how countries have used data as part of their response to COVID-19 are featured throughout the chapters, in boxes and narratives that illustrate many of the issues addressed in the Report. Those issues include the deficiencies of public sector data systems and the complementarities between public intent and private intent data, as well as the legal and regulatory issues posed by accessing private intent data for public purposes. More broadly, through a discussion of the many ways in which data can help economic development, this Report aims to describe the challenges to realizing these gains, offer guidance on how to attain them, and propose safeguards for protecting citizens.
Notes
1. Rowntree (2000 [1901]).
2. World Bank (2016).
3. World Bank (2019).
4. The Report also builds on other themes featured in past World Bank reports, including the importance of building the data capacity of countries (see World Bank 2018). More generally, World Bank reports have long emphasized the importance of data, information, and knowledge for economic, social, and political development (see, for example, World Bank 2002). What has changed is the nature and amount of data available, the ways in which they are produced, and the ease with which they can be exchanged, reused, and shared to address development objectives. Thus the focus of this World Development Report is on data for better lives, particularly for the poor.
5. See, for example, OECD (2013, 2016, 2018a, 2018b, 2019).
6. Whitby (2020).
7. Grajalez et al. (2013).
8. Thorvaldsen (2017).
9. Bethlehem (2009).
10. Thorvaldsen (2017).
11. de Heer, de Leeuw, and van der Zouwen (1999).
12. Conseil constitutionnel, “Déclaration des Droits de l’Homme et du Citoyen de 1789” [Declaration of Human and Civic Rights of 26 August 1789], Paris, https://www.conseil-constitutionnel.fr/le-bloc-de-constitutionnalite/declaration-des-droits-de-l-homme-et-du-citoyen-de-1789.
13. Whitby (2020).
14. Bethlehem (2009).
15. Bethlehem (2009).
16. Musa et al. (2013).
17. Wallis and Robinson (1987).
18. Musa et al. (2013).
19. Dempsey (2012).
20. LGBTQI stands for lesbian, gay, bisexual, transgender, queer (or questioning), intersex.
21. OECD (2013).
22. Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data (1995 Directive on Personal Data Protection, https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A31995L0046) was repealed and replaced in 2016 by Regulation (EU) 2016/679 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data (EU GDPR, https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX%3A32016R0679).
23. Kilic et al. (2017).
24. Serajuddin et al. (2015).
25. Blank and Lutz (2017).
26. Abay et al. (2019); Carletto, Jolliffe, and Banerjee (2015); Desiere and Jolliffe (2018); Gourlay, Kilic, and Lobell (2019).
27. Lobell, Azzari, et al. (2020); Lobell, Di Tommaso, et al. (2020).
28. Jones and Tonetti (2020). Treating data as a nonrival input in a production function draws on the earlier literature that modeled information and ideas as nonrival inputs to production. See Romer (1990) and Radner and Stiglitz (1984).
29. For example, Statistics Canada (2019) estimated the value of data in Canada considering the direct labor cost of data production, associated indirect labor costs, and other related expenses such as human resources management and financial control. It quantified the total own-account investment in databases in 2018 as between Can$8 billion and Can$12 billion.
30. Two examples illustrate this approach. First, the US Department of Commerce (2014) found that government data helped US businesses generate at least US$24 billion a year. Second, Deloitte (2017) conducted a review of studies of the economic value of open data (public data available with no restrictions to users) to a wide range of users in the United Kingdom and found that satellite data from Landsat were worth US$2 billion a year in commercial applications, while public transport routing and scheduling data from Transport for London generated economic benefits to passengers valued at £80 million a year.
31. Several illustrations of this approach suggest the magnitude of monetary benefits. PwC (2019) found that market capitalizations of data-intensive companies are twice as likely to be in the top industry quartile as those of companies operating in the same sector that are not data-intensive. Li, Nirei, and Yamana (2019) looked at the sums paid for acquisitions of data-intensive firms and their impact on the market capitalization of the acquiring firm. Frier (2018) examined the revenue streams that companies are able to generate from sales of data or associated advertising revenues, finding that Apple charges application developers a commission of 30 percent of their sales for accessing its consumer data, which has earned the company US$42.8 billion in sales over the past 10 years.
32. This is similar to suggesting that there is a nonconcavity in the value of data and information. It is also linked to the point that because ideas are nonrivalrous, they exhibit increasing marginal returns over a range. See Radner and Stiglitz (1984) and Romer (1990).
33. Juba and Le (2019).
34. Goldfarb and Tucker (2019).
35. Zingales (2017) notes that as the economic scale of firms becomes large in relation to governments, economic and political power may converge.
36. Cavallo (2013); Cavallo and Rigobon (2016).
37. Erkoyun (2020).
38. Nyeko (2019).
39. Economist (2019).
40. IMF (2019).
41. Cole et al. (2020).
42. Alvarez et al. (2018); Menezes-Filho et al. (2008).
43. Kumler, Verhoogen, and Frías (2020).
44. Kaplan, Piedra, and Seira (2011).
45. Chetty (2012); Cole et al. (2020).
46. Card et al. (2010).
47. Wesolowski et al. (2015).
48. Burke and Lobell (2017); Osgood-Zimmerman et al. (2018).
49. Chetty et al. (2020).
50. Rosalsky (2020).
51. CEA (2018).
52. Amnesty International (2019); Zuboff (2019).
53. Rosenberg, Confessore, and Cadwalladr (2018).
54. Hern (2018).
55. Kayaalp (2017).
56. Anyone wondering about the importance of establishing comparable definitions and developing precise instruments for these measures need only look at the US National Institute of Standards and Technology, established in 1901. It has been home to five Nobel laureates.
References
Abay, Kibrom A., Gashaw T. Abate, Christopher B. Barrett, and Tanguy Bernard. 2019. “Correlated Non-Classical Measurement Errors, ‘Second Best’ Policy Inference, and the Inverse Size–Productivity Relationship in Agriculture.” Journal of Development Economics 139 (June): 171–84. https://doi.org/10.1016/j.jdeveco.2019.03.008.
Alvarez, Jorge, Felipe Benguria, Niklas Engbom, and Christian Moser. 2018. “Firms and the Decline in Earnings Inequality in Brazil.” American Economic Journal: Macroeconomics 10 (1): 149–89. https://doi.org/10.1257/mac.20150355.
Amnesty International. 2019. “Surveillance Giants: How the Business Model of Google and Facebook Threatens Human Rights.” Report POL 30/1404/2019, Amnesty International, London. https://www.amnesty.org/en/documents/document/?indexNumber=pol30%2f1404%2f2019&language=en.
Arrow, Kenneth J. 1962. “Economic Welfare and the Allocation of Resources for Invention.” In The Rate and Direction of Inventive Activity: Economic and Social Factors, edited by National Bureau of Economic Research, 609–26. Princeton Legacy Library Series. Princeton, NJ: Princeton University Press.
Ball, Laura. 2009. “Cholera and the Pump on Broad Street: The Life and Legacy of John Snow.” History Teacher 43 (1): 105–19.
Bethlehem, Jelke. 2009. “The Rise of Survey Sampling.” Discussion Paper 09015, Statistics Netherlands, The Hague.
Blank, Grant, and Christoph Lutz. 2017. “Representativeness of Social Media in Great Britain: Investigating Facebook, LinkedIn, Twitter, Pinterest, Google+, and Instagram.” American Behavioral Scientist 61 (7): 741–56. https://doi.org/10.1177/0002764217717559.
Burke, Marshall, and David B. Lobell. 2017. “Satellite-Based Assessment of Yield Variation and Its Determinants in Smallholder African Systems.” PNAS Proceedings of the National Academy of Sciences 114 (9): 2189–94. https://doi.org/10.1073/pnas.1616919114.
Card, David E., Raj Chetty, Martin S. Feldstein, and Emmanuel Saez. 2010. “Expanding Access to Administrative Data for Research in the United States.” White Paper, National Science Foundation, Alexandria, VA. http://www.rajchetty.com/chettyfiles/NSFdataaccess.pdf.
Carletto, Calogero, Dean Jolliffe, and Raka Banerjee. 2015. “From Tragedy to Renaissance: Improving Agricultural Data for Better Policies.” Journal of Development Studies 51 (2): 133–48. https://doi.org/10.1080/00220388.2014.968140.
Carrière-Swallow, Yan, and Vikram Haksar. 2019. “The Economics and Implications of Data: An Integrated Perspective.” Departmental Paper 19/16, Strategy, Policy, and Review Department, International Monetary Fund, Washington, DC.
Cavallo, Alberto. 2013. “Online and Official Price Indexes: Measuring Argentina’s Inflation.” Journal of Monetary Economics 60 (2): 152–65.
Cavallo, Alberto, and Roberto Rigobon. 2016. “The Billion Prices Project: Using Online Prices for Inflation Measurement and Research.” Journal of Economic Perspectives 30 (2): 151–78.
CEA (Council of Economic Advisers). 2018. “The Cost of Malicious Cyber Activity to the U.S. Economy.” CEA, White House, Washington, DC. https://www.whitehouse.gov/wp-content/uploads/2018/02/The-Cost-of-Malicious-Cyber-Activity-to-the-U.S.-Economy.pdf.
Chetty, Raj. 2012. “Time Trends in the Use of Administrative Data for Empirical Research.” Paper presented at NBER Summer Institute 2012, National Bureau of Economic Research, Cambridge, MA, July 2–27. http://www.rajchetty.com/chettyfiles/admin_data_trends.pdf.
Chetty, Raj, John N. Friedman, Nathaniel Hendren, Michael Stepner, and Opportunity Insights Team. 2020. “How Did COVID-19 and Stabilization Policies Affect Spending and Employment? A New Real-Time Economic Tracker Based on Private Sector Data.” NBER Working Paper 27431, National Bureau of Economic Research, Cambridge, MA. https://www.nber.org/system/files/working_papers/w27431/w27431.pdf.
Cole, Shawn, Iqbal Dhaliwal, Anja Sautmann, and Lars Vilhuber. 2020. Handbook on Using Administrative Data for Research and Evidence-Based Policy. Cambridge, MA: Abdul Latif Jameel Poverty Action Lab and Massachusetts Institute of Technology. https://admindatahandbook.mit.edu/book/v1.0-rc6/index.html.
DCMS (Department for Digital, Culture, Media, and Sport, United Kingdom). 2020. “UK National Data Strategy.” Policy paper, DCMS, London. https://www.gov.uk/government/publications/uk-national-data-strategy/national-data-strategy.
de Heer, Wim, Edith Desirée de Leeuw, and Johannes van der Zouwen. 1999. “Methodological Issues in Survey Research: A Historical Review.” Bulletin of Sociological Methodology 64 (1): 25–48.
Deloitte. 2017. “Assessing the Value of TfL’s Open Data and Digital Partnerships.” Deloitte LLP, London. http://content.tfl.gov.uk/deloitte-report-tfl-open-data.pdf.
Dempsey, Caitlin. 2012. “History of GIS.” GIS Lounge (blog), May 14, 2012. https://www.gislounge.com/history-of-gis/.
Desiere, Sam, and Dean Jolliffe. 2018. “Land Productivity and Plot Size: Is Measurement Error Driving the Inverse Relationship?” Journal of Development Economics 130 (January): 84–98. https://doi.org/10.1016/j.jdeveco.2017.10.002.
Duch-Brown, Nestor, Bertin Martens, and Frank Mueller-Langer. 2017. “The Economics of Ownership, Access, and Trade in Digital Data.” JRC Digital Economy Working Paper 2017-01, Joint Research Center, European Commission, Seville, Spain. https://ec.europa.eu/jrc/sites/jrcsh/files/jrc104756.pdf.
Economist. 2019. “The Net Tightens: A $2bn Loan Scandal Sank Mozambique’s Economy.” August 22, 2019. https://www.economist.com/middle-east-and-africa/2019/08/22/a-2bn-loan-scandal-sank-mozambiques-economy.
Erkoyun, Ezgi. 2020. “Researchers Say New Model Shows Turkish Inflation Well Above Official Tally.” Reuters, October 22, 2020. https://www.reuters.com/article/turkey-economy-inflation-int-idUSKBN2771EY.
Fraiberger, Samuel P., Pablo Astudillo, Lorenzo Candeago, Alex Chunet, Nicholas K. W. Jones, Maham Faisal Khan, Bruno Lepri, et al. 2020. “Uncovering Socioeconomic Gaps in Mobility Reduction during the COVID-19 Pandemic Using Location Data.” arXiv:2006.15195 [physics.soc-ph], July 27, Cornell University, Ithaca, NY.
Frier, Sarah. 2018. “Is Apple Really Your Privacy Hero?” Bloomberg Businessweek, June 8, 2018. https://www.bloomberg.com/news/articles/2018-08-08/is-apple-really-your-privacy-hero.
Gillies, Robert J., Paul E. Kinahan, and Hedvig Hricak. 2015. “Radiomics: Images Are More Than Pictures, They Are Data.” Radiology 278 (2): 563–77. https://doi.org/10.1148/radiol.2015151169.
Goldfarb, Avi, and Catherine Tucker. 2019. “Digital Economics.” Journal of Economic Literature 57 (1): 3–43. https://doi.org/10.1257/jel.20171452.
Gourlay, Sydney, Talip Kilic, and David B. Lobell. 2019. “A New Spin on an Old Debate: Errors in Farmer-Reported Production and Their Implications for Inverse Scale–Productivity Relationship in Uganda.” Journal of Development Economics 141 (November): 102376. https://www.sciencedirect.com/science/article/pii/S0304387818306588.
Grajalez, Carlos Gómez, Eileen Magnello, Robert Woods, and Julian Champkin. 2013. “Great Moments in Statistics.” Significance 10 (6): 21–28.
Hallal, Pedro Curi, Fernando P. Hartwig, Bernardo L. Horta, Gabriel D. Victora, Mariângela F. Silveira, Cláudio José Struchiner, Luís Paulo Vidaleti, et al. 2020. “Remarkable Variability in SARS-CoV-2 Antibodies across Brazilian Regions: Nationwide Serological Household Survey in 27 States.” medRxiv (May 30). https://www.medrxiv.org/content/10.1101/2020.05.30.20117531v1.
Hallal, Pedro Curi, Bernardo L. Horta, Aluísio J. D. Barros, Odir A. Dellagostin, Fernando P. Hartwig, Lúcia C. Pellanda, Cláudio José Struchiner, et al. 2020. “Trends in the Prevalence of COVID-19 Infection in Rio Grande do Sul, Brazil: Repeated Serological Surveys.” Ciência & Saúde Coletiva 25 (supplement 1): 2395–401. https://doi.org/10.1590/1413-81232020256.1.09632020.
Hern, Alex. 2018. “Cambridge Analytica: How Did It Turn Clicks into Votes?” Guardian, May 6, 2018. https://www.theguardian.com/news/2018/may/06/cambridge-analytica-how-turn-clicks-into-votes-christopher-wylie.
IMF (International Monetary Fund). 2019. “Republic of Mozambique: Diagnostic Report on Transparency, Governance, and Corruption.” IMF Country Report 19/276, IMF, Washington, DC. https://www.imf.org/en/Publications/CR/Issues/2019/08/23/Republic-of-Mozambique-Diagnostic-Report-on-Transparency-Governance-and-Corruption-48613.
Jones, Charles I., and Christopher Tonetti. 2020. “Nonrivalry and the Economics of Data.” American Economic Review 110 (9): 2819–58. https://doi.org/10.1257/aer.20191330.
Juba, Brendan, and Hai S. Le. 2019. “Precision-Recall Versus Accuracy and the Role of Large Data Sets.” Proceedings of the AAAI Conference on Artificial Intelligence 33 (01): 4039–48. https://doi.org/10.1609/aaai.v33i01.33014039.
Kaplan, David S., Eduardo Piedra, and Enrique Seira. 2011. “Entry Regulation and Business Start-Ups: Evidence from Mexico.” Journal of Public Economics 95 (11–12): 1501–15. https://doi.org/10.1016/j.jpubeco.2011.03.007.
Kayaalp, Mehmet. 2017. “Modes of De-Identification.” Paper presented at American Medical Informatics Association 2017 Annual Symposium, Washington, DC, November 6–8. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977668.
Kilic, Talip, Umar Serajuddin, Hiroki Uematsu, and Nobuo Yoshida. 2017. “Costing Household Surveys for Monitoring Progress toward Ending Extreme Poverty and Boosting Shared Prosperity.” Policy Research Working Paper 7951, World Bank, Washington, DC.
Klein, Brennan, Timothy LaRock, Stefan McCabe, Leo Torres, Filippo Privitera, Lake Brennan, Moritz U. G. Kraemer, et al. 2020. “Assessing Changes in Commuting and Individual Mobility in Major Metropolitan Areas in the United States during the COVID-19 Outbreak.” Network Science Institute, Northeastern University, Boston. https://www.networkscienceinstitute.org/publications/assessing-changes-in-commuting-and-individual-mobility-in-major-metropolitan-areas-in-the-united-states-during-the-covid-19-outbreak.
Koutroumpis, Pantelis, Aija Leiponen, and Llewellyn D. W. Thomas. 2020. “Markets for Data.” Industrial and Corporate Change 29 (3): 645–60. https://doi.org/10.1093/icc/dtaa002.
Kumler, Todd, Eric Verhoogen, and Judith Frías. 2020. “Enlisting Employees in Improving Payroll Tax Compliance: Evidence from Mexico.” Review of Economics and Statistics 102 (5): 881–96. https://doi.org/10.1162/rest_a_00907.
Leighton, Timothy G., and Andi Petculescu. 2016. “Guest Editorial: Acoustic and Related Waves in Extraterrestrial Environments.” Journal of the Acoustical Society of America 140 (2): 1397–99. https://doi.org/10.1121/1.4961539.
Li, Wendy C. Y., Makoto Nirei, and Kazufumi Yamana. 2019. “Value of Data: There’s No Such Thing as a Free Lunch in the Digital Economy.” RIETI Discussion Paper 19-E-022, Research Institute of Economy, Trade, and Industry, Tokyo. https://www.rieti.go.jp/jp/publications/dp/19e022.pdf.
Lobell, David B., George Azzari, Marshall Burke, Sydney
Gourlay, Zhenong Jin, Talip Kilic, and Siobhan Murray.
2020. “Eyes in the Sky, Boots on the Ground: Assessing
Satellite- and Ground-Based Approaches to Crop Yield
Measurement and Analysis.” American Journal of Agri-
cultural Economics 102 (1): 202–19. https://doi.org/10.1093
/ajae/aaz051.
Lobell, David B., Stefania Di Tommaso, Calum You, Ismael
Yacoubou Djima, Marshall Burke, and Talip Kilic. 2020.
“Sight for Sorghums: Comparisons of Satellite- and
Ground-Based Sorghum Yield Estimates in Mali.” Remote
Sensing 12 (1): 100. https://doi.org/10.3390/rs12010100.
Menezes-Filho, Naércio Aquino, Marc-Andreas Muendler,
and Garey Ramey. 2008. “The Structure of Worker Com-
pensation in Brazil, with a Comparison to France and
the United States.” Review of Economics and Statistics 90 (2):
324–46.
Musa, George J., Po-Huang Chiang, Tyler Sylk, Rachel Bavley,
William Keating, Bereketab Lakew, Hui-Chen Tsou, and
Christina W. Hoven. 2013. “Use of GIS Mapping as a Public
Health Tool: From Cholera to Cancer.” Health Services
Insights 6 (November): 111–16. https://doi.org/10.4137/HSI.S10471.
Nyeko, Oryem. 2019. “Tanzania Drops Threat of Prison
over Publishing Independent Statistics.” Human
Rights Watch, Dispatches, July 3. https://www.hrw.org
/news/2019/07/03/tanzania-drops-threat-prison-over
-publishing-independent-statistics.
OECD (Organisation for Economic Co-operation and
Development). 2013. The OECD Privacy Framework. Paris:
OECD. http://www.oecd.org/sti/ieconomy/oecd_privacy
_framework.pdf.
OECD (Organisation for Economic Co-operation and Devel-
opment). 2016. “Big Data: Bringing Competition Policy
to the Digital Era.” Report DAF/COMP(2016)14 (rev.
November 29–30), Competition Committee, Directorate
for Financial and Enterprise Affairs, OECD, Paris. https://
one.oecd.org/document/DAF/COMP(2016)14/en/pdf.
OECD (Organisation for Economic Co-operation and Devel-
opment). 2018a. Rethinking Antitrust Tools for Multi-Sided
Platforms 2018. Paris: OECD. https://www.oecd.org/daf
/competition/Rethinking-antitrust-tools-for-multi-sided
-platforms-2018.pdf.
OECD (Organisation for Economic Co-operation and
Development). 2018b. Tax Challenges Arising from Digitalisation:
Interim Report 2018. Paris: OECD. http://dx.doi.org/10.1787/9789264293083-en.
OECD (Organisation for Economic Co-operation and Devel-
opment). 2019. Enhancing Access to and Sharing of Data:
Reconciling Risks and Benefits for Data Re-use across Societies.
Paris: OECD. https://www.oecd-ilibrary.org/content
/publication/276aaca8-en.
Oliver, Nuria, Bruno Lepri, Harald Sterly, Renaud Lambi-
otte, Sébastien Deletaille, Marco De Nadai, Emmanuel
Letouzé, et al. 2020. “Mobile Phone Data for Informing
Public Health Actions across the COVID-19 Pandemic
Life Cycle.” Science Advances 6 (23): eabc0764. https://doi
.org/10.1126/sciadv.abc0764.
Osgood-Zimmerman, Aaron, Anoushka I. Millear, Rebecca
W. Stubbs, Chloe Shields, Brandon V. Pickering, Lucas
Earl, Nicholas Graetz, et al. 2018. “Mapping Child
Growth Failure in Africa between 2000 and 2015.” Nature
555 (7694): 41–47. https://doi.org/10.1038/nature25760.
PwC. 2019. “Putting a Value on Data.” PwC, London. https://
www.pwc.co.uk/issues/data-analytics/insights/putting
-value-on-data.html.
Radner, Roy, and Joseph E. Stiglitz. 1984. “A Nonconcavity in
the Value of Information.” In Bayesian Models in Economic
Theory, edited by Marcel Boyer and Richard E. Kihl-
strom, 33–52. Studies in Bayesian Econometrics Series 5.
Amsterdam: Elsevier.
Romer, Paul M. 1990. “Endogenous Technological Change.”
Journal of Political Economy 98 (5): S71–S102.
Rosalsky, Greg. 2020. “The Dark Side of the Recovery
Revealed in Big Data.” Planet Money Newsletter, October
27, 2020. https://www.npr.org/sections/money/2020/10
/27/927842540/the-dark-side-of-the-recovery-revealed-in
-big-data.
Rosenberg, Matthew, Nicholas Confessore, and Carole Cad-
walladr. 2018. “How Trump Consultants Exploited the
Facebook Data of Millions.” New York Times, March 17,
2018. https://www.nytimes.com/2018/03/17/us/politics
/cambridge-analytica-trump-campaign.html.
Rowntree, Benjamin Seebohm. 2000. Poverty: A Study of
Town Life, 2d ed. Bristol, UK: Policy Press.
Serajuddin, Umar, Hiroki Uematsu, Christina Wieser, Nobuo
Yoshida, and Andrew L. Dabalen. 2015. “Data Deprivation:
Another Deprivation to End.” Policy Research Working
Paper 7252, World Bank, Washington, DC.
Statistics Canada. 2019. “The Value of Data in Canada: Exper-
imental Estimates.” Daily, July 10, 2019, Statistics Canada,
Ottawa. https://www150.statcan.gc.ca/n1/pub/13-605-x
/2019001/article/00009-eng.htm.
Stephens-Davidowitz, Seth. 2017. Everybody Lies: Big Data,
New Data, and What the Internet Can Tell Us about Who We
Really Are. Illus. ed. New York: Dey Street Books.
Thorvaldsen, Gunnar. 2017. Censuses and Census Takers: A
Global History. Routledge Studies in Modern History
Series. London: Routledge. https://doi.org/10.4324/9781315148502.
UFPEL (Federal University of Pelotas). 2020. “FAPESP e
Todos pela Saúde garantirão a continuidade do estudo
EPICOVID-19 BR.” Coordenação de Comunicação Social,
Pró-Reitoria de Gestão da Informação e Comunicação,
UFPEL, Pelotas, Rio Grande do Sul, Brazil. http://ccs2
.ufpel.edu.br/wp/2020/08/31/fapesp-e-todos-pela-saude
-garantirao-a-continuidade-do-estudo-epicovid-19-br/.
US Department of Commerce. 2014. “Fostering Innovation,
Creating Jobs, Driving Better Decisions: The Value of
Government Data.” Office of the Chief Economist, Eco-
nomics and Statistics Administration, US Department
of Commerce, Washington, DC. https://www.commerce
.gov/files/fostering-innovation-creating-jobs-driving
-better-decisions-value-government-data.
Wallis, Helen M., and Arthur Howard Robinson, eds. 1987.
Cartographical Innovations: An International Handbook of
Mapping Terms to 1900. Tring, UK: Map Collector
Publications.
Wesolowski, Amy, Taimur Qureshi, Maciej F. Boni, Pål Roe
Sundsøy, Michael A Johansson, Syed Basit Rasheed,
Kenth Engø-Monsen, et al. 2015. “Impact of Human
Mobility on the Emergence of Dengue Epidemics in
Pakistan.” Proceedings of the National Academy of Sciences 112
(38): 11887–92.
Whitby, Andrew. 2020. The Sum of the People: How the Census
Has Shaped Nations, from the Ancient World to the Modern
Age. New York: Basic Books.
World Bank. 2002. World Development Report 2002: Building
Institutions for Markets. Washington, DC: World Bank;
New York: Oxford University Press.
World Bank. 2016. World Development Report 2016: Digital
Dividends. Washington, DC: World Bank.
World Bank. 2018. Data for Development: An Evaluation of
World Bank Support for Data and Statistical Capacity. Wash-
ington, DC: Independent Evaluation Group, World Bank.
World Bank. 2019. IC4D, Information and Communications for
Development 2018: Data-Driven Development. Washington,
DC: World Bank. http://documents1.worldbank.org
/curated/en/987471542742554246/pdf/128301-97814648
13252.pdf.
Yala, Adam, Peter G. Mikhael, Fredrik Strand, Gigin Lin,
Kevin Smith, Yung-Liang Wan, Leslie Lamb, et al. 2021.
“Toward Robust Mammography-Based Models for
Breast Cancer Risk.” Science Translational Medicine 13 (578):
1–11. https://doi.org/10.1126/scitranslmed.aba4373.
Zingales, Luigi. 2017. “Towards a Political Theory of the
Firm.” Journal of Economic Perspectives 31 (3): 113–30.
Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism: The
Fight for a Human Future at the New Frontier of Power. New
York: Public Affairs. https://www.hbs.edu/faculty/Pages
/item.aspx?num=56791.
A novel experiment in India empowered villagers—particularly women,
many of them illiterate—to design their own process for collecting and
deploying data to track changes in the quality of public services and in
their living standards and to make better decisions in village meetings.
Spotlight 1.1
Helping communities to gain the
ability to collect and analyze their
own data
In 2014 the World Bank’s Social Observatory, working
closely with the Pudhu Vaazhvu Project of the South
Asia Livelihoods team in the south Indian state of
Tamil Nadu, developed a method called participatory
tracking.1
This effort built on the institutional context in India. There, democratically elected village councils hold regular, open village meetings in which budget priorities are planned and monitored. A network of women’s self-help groups also has a strong presence in every village in the state. Participatory tracking proceeded in three steps.
First, representatives of women’s groups from 200
villages engaged in several weeks of deliberations to
think about what constitutes the good life for them,
turn those ideas into indicators measured using sur-
vey questions, and then test those questions in their
villages through a simple questionnaire that took no
more than 30 minutes to answer.
Second, the community-designed questionnaire
was incorporated into tablet-based software. A mem-
ber of each women’s group was trained via video
on how to administer the questionnaire in her own
village. In the pilot, women were able to conduct a
census of 40,000 households in about six weeks. Once
the survey was conducted, the data were dispatched
directly to a cloud server to prevent anyone from tam-
pering with them.
About one-third of villagers could not read or
write. Thus, the team developed ways of showing
the data that would be understandable to people who
were not literate.
Third, the Social Observatory team “coproduced”
data visualizations with the community. Figure S1.1.1
provides an example, developed to demonstrate vari-
ations in decision-making within the household. If
a member of the family has more authority over a
decision, his or her face becomes bigger. If women
have more authority, the colored background shades
from dark gray toward dark green. The village
median is displayed alongside that of another vil-
lage for comparison. These data were presented
at village planning meetings. They substantially
improved the quality of deliberation by allowing
citizens and officials to focus on the issues of
concern rather than debate the facts about where
decision-making power lay.
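The visual encoding described above can be sketched as two simple mappings. The first maps a family member's share of decision-making authority to a face size. The second maps women's share of authority, summarized at the village median, to a background color that moves from dark gray toward dark green. This is a minimal Python sketch only. The linear scaling, the RGB endpoints, and the sample shares are hypothetical, not the Social Observatory's actual implementation:

```python
import statistics
from typing import Tuple

# Assumed RGB endpoints; the palette actually used in the pilot is not documented here
DARK_GRAY = (64, 64, 64)
DARK_GREEN = (0, 100, 0)


def _clamp(x: float) -> float:
    """Keep a share within [0, 1]."""
    return min(max(x, 0.0), 1.0)


def background_color(women_share: float) -> Tuple[int, int, int]:
    """Linearly blend from dark gray (share 0) to dark green (share 1)."""
    t = _clamp(women_share)
    return tuple(round(g + (e - g) * t) for g, e in zip(DARK_GRAY, DARK_GREEN))


def face_scale(authority_share: float, smallest: float = 0.5, largest: float = 2.0) -> float:
    """Size multiplier for a family member's face: more authority, bigger face."""
    return smallest + (largest - smallest) * _clamp(authority_share)


# Hypothetical household-level shares of decisions in which women have authority
villages = {"Village A": [0.2, 0.4, 0.5, 0.7, 0.3], "Village B": [0.6, 0.8, 0.5, 0.9]}
for name, shares in villages.items():
    median = statistics.median(shares)  # village median, shown side by side for comparison
    print(name, round(median, 2), background_color(median))
```

Using the median rather than the mean keeps a village's summary from being pulled by a few unusual households. That is one plausible reason to display it, though the sketch does not claim it was the pilot's rationale.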
Versions of participatory tracking that focus more
on the management of public goods and common
property are being designed and will be scaled up in
the Indian states of Tamil Nadu (where the pilot was
developed) and Karnataka, which will cover more
than 75 million people, and nationwide in Indonesia,
which will cover another 145 million.
Note
1. Palaniswamy, Sakhamuri, and Xia (2017).
Reference
Palaniswamy, Nethra, Smriti Sakhamuri, and Cassandra Xia.
2017. “Participatory Tracking: Customizing Visualiza-
tions.” Social Observatory (blog), September 2017. http://
socialobservatory.worldbank.org/articles/participatory
-tracking-customizing-visualizations.
Figure S1.1.1 A citizen-led method to ascertain who has authority in household
decision-making in rural Indian villages
Source: Palaniswamy, Sakhamuri, and Xia 2017.
Note: The figure shows a screenshot of a data visualization developed to demonstrate variations in household decision-making. If a member of the family has
more authority over a decision, his or her face becomes bigger. If women have more authority, the colored background shades from dark gray toward dark
green.
Improvements in the collection, reporting, and monitoring of data about
debt will be critical to borrowers and creditors alike.
Spotlight 1.2
The importance of good data in
helping low- and middle-income
countries to manage debt during and
after the COVID-19 pandemic
Data on public debt remain opaque in some countries, potentially enabling governments to overborrow and hide debts from their citizens and creditors, at least for some period (see the example of Mozambique from chapter 1). This vulnerability is compounded by the high (reported) debt levels of lower-income countries at the outset of the COVID-19 crisis and by the changing composition of private creditors and debt instruments. In 2019 almost half of all low-income countries were either in debt distress or at high risk of it. As the pandemic pushes as many as 150 million people into extreme poverty,1 countries may need to take on substantial additional debt, which could result in large debt overhangs that could take years to manage.
The proliferation of complex debt instruments may make it easier for governments to obscure their debt position. Moreover, the composition of creditors is changing: China, the largest creditor, increased its share of the combined debt owed by Debt Service Suspension Initiative (DSSI)-eligible countries to G-20 countries from 38 percent to 57 percent between 2013 and end-2019 (figure S1.2.1). These changes could create new exposures, especially regarding access to future debt. While most lower-income countries owe a relatively small share of their external public debt to private creditors, some countries, including Chad, Côte d’Ivoire, Ghana, Saint Lucia, and Zambia, owe as much as 50–60 percent to private creditors. Private sector participation in achieving a sustainable debt trajectory will become increasingly critical for many countries.
Figure S1.2.1 In six years, the composition of debt has shifted dramatically
DSSI-eligible countries’ bilateral debt: Composition of creditors, 2013 and 2019
(Chart: % of bilateral debt owed to China, Japan, India, other G-20 bilateral creditors, and other bilateral creditors, 2013 and 2019.)
Source: World Bank, Debt Data (dashboard), https://datatopics.worldbank.org/debt/. Data at http://bit.do/WDR2021-Fig-S1_2_1.
Note: For more information, see “What Is the External Debt Reporting System (DRS)?” World Bank, Washington, DC, https://datahelpdesk.worldbank.org/knowledgebase/articles/381934-what-is-the-external-debt-reporting-system-drs. DSSI = Debt Service Suspension Initiative.
Recognizing the pressing need to manage the debt burden of low-income countries, the World Bank and the International Monetary Fund (IMF) have proposed the DSSI.2 In managing current and future debt, governments and partners will need to improve the coverage, quality, timeliness, granularity, and transparency of debt data.3 This effort requires investing both in the data themselves and in the systems for collecting, managing, analyzing, and reporting data, especially in countries where the risks are greatest. The World Bank and IMF have offered the following recommendations to the G-20:4
• Establish clear and internationally harmonized concepts and definitions of debt. Governments should adopt international reporting and statistical standards that clearly define debt concepts. New standards should be developed where gaps exist to improve the granularity of debt data.
• Strengthen legal frameworks within countries. Stronger legal frameworks have clearly defined organizational structures, roles, and responsibilities; sufficient monitoring, auditing, and compliance mechanisms in place to avoid conflicts of interest; internal controls to make sure that laws, procedures, and policies are followed; and well-coordinated debt management and other fiscal policies and financial frameworks.
• Build a functional debt recording, management, and dissemination system. A robust system is needed to facilitate the production and use of timely, accurate, high-quality, reliable, and complete data. The system should be interoperable with other key applications and be protected by appropriate security and data protection controls. Disseminating debt data openly in user-friendly formats and building the capability of systems to produce reports for borrowers and inform debt service transactions can facilitate the use of data.
• Improve the organizational structure. High-level government commitment and clear mandates are essential.5 Effective coordination and accountability mechanisms can help to disentangle fragmented debt management functions across institutions and entities and ensure the timely sharing of data and analysis. Debt management offices should have the authority to collect data from state-owned enterprises or other indebted public entities.
• Strengthen staff capacity. Highly skilled staff with the technical capacity to extract, analyze, and publish debt data are needed and should be retained and adequately compensated.
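A debt recording system of the kind recommended above ultimately has to answer questions such as how much debt is owed to each type of creditor. This is the kind of composition shown in figure S1.2.1. Below is a minimal Python sketch of such a register. It assumes a hypothetical record layout: `DebtRecord`, its fields, and `composition_by` are illustrative and do not follow any particular debt management system or reporting standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DebtRecord:
    """One loan or bond in a hypothetical public debt register."""
    instrument_id: str
    creditor: str
    creditor_type: str      # e.g. "bilateral", "multilateral", "private"
    outstanding_usd: float  # principal outstanding, in US dollars


def composition_by(records, key):
    """Share of total outstanding debt for each value of the given field."""
    totals = defaultdict(float)
    for r in records:
        totals[getattr(r, key)] += r.outstanding_usd
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {k: v / grand_total for k, v in totals.items()}


# Hypothetical register entries (amounts are illustrative, not real country data)
records = [
    DebtRecord("L-001", "Bilateral lender X", "bilateral", 400.0),
    DebtRecord("B-001", "Eurobond holders", "private", 500.0),
    DebtRecord("L-002", "Development bank Y", "multilateral", 100.0),
]
print(composition_by(records, "creditor_type"))
```

Each record carries its creditor and creditor type. The same function can therefore report composition by individual creditor or by creditor type, which is one way a register could support open, user-friendly reporting of debt data.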
Multilateral institutions such as the World Bank
and IMF play a critical role in improving debt trans-
parency by improving coverage in the databases they
manage, providing technical assistance, performing
assessments on a country’s current debt manage-
ment framework, designing international standards
on debt, and building awareness. Lenders, along with
borrowers, also have a responsibility to report debt
information fully, accurately, and transparently.
All of these steps will help creditors to assess
accurately the debt sustainability of their poten-
tial borrowers, citizens to hold their governments
accountable for the debt they assume, and borrowers
to design strategies based on a clear understanding of
the level, cost, and risk profile of their debt portfolio.
Increased debt transparency will also help many low-
and middle-income countries to assess and manage
their external debt during and after the COVID-19 cri-
sis and to work with policy makers toward achieving
sustainable debt levels and terms.
Notes
1. World Bank (2020b).
2. World Bank (2020a).
3. IMF and World Bank (2018).
4. World Bank (2020a).
5. Teeling (2018).
References
IMF (International Monetary Fund) and World Bank. 2018.
“G20 Notes on Strengthening Public Debt Transpar-
ency.” IMF, Washington, DC, June 14, 2018. https://www
.imf.org/external/np/g20/pdf/2018/072718.pdf.
Teeling, Gerry. 2018. “Debt Data Transparency.” Back-
ground paper for Intergovernmental Group of Experts
on Financing for Development, 2nd Session, United
Nations Conference on Trade and Development, Geneva,
November 7–9, 2018. https://unctad.org/system/files
/non-official-document/tdb_efd2c03_Teeling_en.pdf.
World Bank. 2020a. “Debt Service Suspension and
COVID-19.” Factsheet, December 21, 2020, World Bank,
Washington, DC. https://www.worldbank.org/en
/news/factsheet/2020/05/11/debt-relief-and-covid-19
-coronavirus.
World Bank. 2020b. Poverty and Shared Prosperity 2020: Rever-
sals of Fortune. Washington, DC: World Bank. https://
openknowledge.worldbank.org/bitstream/handle/10986
/34496/9781464816024.pdf.
CHAPTER 2
Data as a force for public good
Main messages
1 Public intent data, a foundation of public policies, can play a transformative role in the public sector. However, gaps in the availability, quality, and usability of these data are pervasive, particularly in low-income countries—the countries that stand to benefit most from improving public intent data.
2 Lack of resources, technical capacity, and data governance hampers the production of useful data for public policy. Lack of data literacy and demand for data limits their use for public policy.
3 These problems can be addressed through the high-level prioritization of data, including long-term financing, investments in human capital, and laws conducive to the safe production, exchange, and use of data. Some investments in better data have paid for themselves.
4 Ensuring a political commitment to and predictable government financing for the production of public intent data remains a central struggle in lower-income countries. The political will to prioritize funding for data systems can be stimulated by boosting the demand for data.
The central role of public intent data
Suppose a woman walks into a doctor’s office
and is given a diagnosis without examination
by the doctor: no measurement of her heart
rate, no recording of her symptoms, and no review of
her medical history. The doctor just prescribes a med-
ication. Such an approach, and such a world in which
crucial data are not gathered, analyzed, and acted on,
would not be welcome, to say the least.1
Yet all too often governments make decisions
affecting people’s well-being without understanding
or even taking into account essential data. Designing
policies without data is akin to a shot in the dark.2
This problem is particularly acute in the poorest
countries, where gaps in both the availability and the
use of data are severest.3
Just as data gathered by a doctor can help improve
a patient’s diagnosis and ultimate well-being, data
gathered by governments, international organiza-
tions, research institutions, and civil society can
improve societal well-being by enhancing service
delivery, prioritizing scarce resources, holding gov-
ernments accountable, and empowering individuals.
These data serve as the foundation for core functions
of governments and their endeavors to reduce pov-
erty. The data a doctor gathers often take the form of a
conversation or some other means of communicating
information between patient and doctor. In the same
way, data gathered with the intent of informing public
policy should enrich the policy dialogue and allow for
systematic flows of information and communication
among governments, their citizens, and commerce.
Such flows of information and communication
require long-term investments in statistical capacity,
infrastructure, data governance, data literacy, and
data safeguards. These investments depend on one
another. Failure in one area jeopardizes the value that
data bring to development. Too often these invest-
ments are not made in the poorest parts of the world,
contributing to data deprivations and poverty.
How should such deprivations be addressed? This
chapter discusses the pathways through which data for public policy generate value for development, the obstacles to realizing that value safely, and how those obstacles can be overcome.
Public intent data and
development: Three pathways
for adding value
Public intent data—data collected with the intent
of serving the public good by informing the design,
execution, monitoring, and evaluation of public pol-
icy, or through other activities—are a prerequisite
for many government functions. For that reason,
government agencies are the primary producers of
public intent data through censuses, surveys, and
administrative data, among other things. Citizens,
civil society organizations (CSOs), nongovernmental
organizations (NGOs), academic institutions, and
international organizations also contribute critically
to the production of public intent data through sur-
veys, crowdsourcing platforms, and other means.
Data from firms can also be used for public policy—
a topic that will be covered in chapter 4.4
This chapter
distinguishes between six types of public intent data
that all serve the public good (box 2.1).
The discussion that follows uses country examples
to describe three important pathways through which
public intent data can bring value to development by
(1) improving service delivery, (2) prioritizing scarce
resources, and (3) holding governments accountable
and empowering individuals. But these are not the
only pathways. Others include regulating the econ-
omy and markets, fostering public safety and secu-
rity, and improving dispute or conflict resolution.
The country examples reveal several conditions
that should be in place to maximize the value of pub-
lic intent data. The data need to be (1) produced with
adequate spatial and temporal coverage (complete,
timely, and frequent); (2) high in quality (granular,
accurate, and comparable); (3) easy to use (accessible,
understandable, and interoperable); and (4) safe to
use (impartial, confidential, and appropriate)—see
figure 2.1.5
With these features, development-related
data have the potential to transform development out-
comes. For this potential to be realized, the data must
be used explicitly to generate public good, including
through the three pathways summarized in the fol-
lowing sections.
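Some of these features can be tracked with simple indicators. Below is a minimal Python sketch built on two assumed operational definitions, chosen here for illustration rather than taken from the report. Completeness is the share of required field values that are actually filled in. Timeliness is the publication lag in days between the end of the reference period and the release date. The function names and the sample records are hypothetical.

```python
from datetime import date


def completeness(records, fields):
    """Share of required field values that are present (not None and not empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    present = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return present / total


def timeliness_days(reference_period_end, release_date):
    """Publication lag, in days, from the end of the reference period to release."""
    return (release_date - reference_period_end).days


# Hypothetical district-level records with one missing name and one missing count
records = [
    {"district": "A", "population": 1200},
    {"district": "B", "population": None},
    {"district": "", "population": 950},
]
print(round(completeness(records, ["district", "population"]), 3))
print(timeliness_days(date(2020, 12, 31), date(2021, 3, 31)))
```

Features such as impartiality or appropriateness do not reduce to a single number in this way. The sketch covers only features that are straightforward to quantify.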
Pathway 1: Improving service delivery
Increasing access to government services. One of the fundamental ways in which public intent data can improve livelihoods is by increasing access to government services. Expanding access often requires data that represent all residents. Administrative data can help ensure that all persons are covered and that access is equitable. This applies particularly to foundational identification (ID) systems such as national IDs and civil registries, as well as to digital identification. In Thailand at the turn of the century, only 71 percent of the population was covered by a public health insurance scheme that was intended to be universal. Yet the country had a near-universal foundational ID and population registration system. Under it, citizens and residents were issued a personal ID number when they were born or when their households were registered for the first time. By linking this register with the personal ID information from the existing public insurance scheme, the government was able to identify the population not covered and to increase health insurance coverage from 71 percent to 95 percent.6
Box 2.1 Six types of public intent data
Administrative data—such as birth, marriage, and death records and data from identification systems; population, health, education, and tax records; and trade flow data—are generated by a process of registration or record keeping, usually by national authorities. Administrative data also include data used by governments to run projects, programs, and services. The digital revolution has created new types of administrative data—for example, when education and health inspectors’ use of smartphone apps channels data to a central register.
Censuses aim to systematically enumerate and record information about an entire population of interest, whether individuals, businesses, farms, or others. Most prominently, population and housing censuses record every person present or residing in a country and provide essential information on the entire population and their key socioeconomic conditions.
Sample surveys draw on a smaller, representative sample of the entire population, typically selected using census data as the sampling frame, to collect detailed information more frequently. These surveys span many domains and include household surveys, farm surveys, enterprise surveys, labor force surveys, and demographic and health surveys. Key official statistics, such as unemployment and national accounts, rely on survey data, often in combination with administrative data and census data.a
Citizen-generated data are produced by individuals, often to fill gaps in public and private sector data or when the accuracy of existing data is in question. These data, which can have an important monitoring and accountability function, contribute to solving problems that citizens face.b Examples include HarassMap, an Egyptian tool that maps cases of sexual harassment based on citizen reports, and ForestWatchers, a platform through which citizens monitor the deforestation of the Amazon.
By contrast, machine-generated data are automatically generated by a sensor, application, or computer process without human interaction. Sensors that monitor air pollution are one example. These data emerge when devices are embedded with sensors and other technologies that allow them to exchange data with one another, a system known as the Internet of Things.
Geospatial data relate multiple layers of information based on their geographic locale. Public intent geospatial data include satellite imagery of the Earth, such as that provided by the US National Aeronautics and Space Administration’s Landsat program and the European Space Agency’s Copernicus program; weather data; and cadastral (property and land record) data.c
These data types are neither exhaustive nor mutually exclusive. For example, all data sources can be georeferenced and thus can be used in geospatial applications, and some administrative data and geospatial data can be machine-generated. Data sources are interoperable when they can be linked across and within these types through common numeric identifiers for persons, facilities, or firms; geospatial coordinates; time stamps; and common classification standards.
a. Sample surveys also include the surveys that are implemented by social media companies and target a sample of users who are active on their platforms. Examples include the Future of Business and Gender Equality at Home surveys conducted on the Facebook platform.
b. Meijer and Potjer (2018).
c. Such data sources are discussed in greater detail in chapter 4.
Figure 2.1 Certain data features can maximize the value of public intent data
• Ensuring the data have adequate coverage: completeness, timeliness, frequency
• Ensuring the data are of high quality: granularity, accuracy, comparability
• Ensuring the data are easy to use: accessibility, understandability, interoperability
• Ensuring the data are safe to use: impartiality, confidentiality, appropriateness
Source: WDR 2021 team, drawing on Jolliffe et al. (forthcoming).
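The Thailand example rests on a simple record-linkage step. The population register is compared against the insurance scheme's records using the shared personal ID number. That ID is exactly the kind of common identifier that makes data sources interoperable. Below is a minimal Python sketch of this anti-join, assuming hypothetical records keyed by a personal ID string. The `Person` type, the function names, and the sample IDs are illustrative, not a description of the Thai government's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: str  # hypothetical personal ID number issued at birth or registration
    name: str


def find_uncovered(register, insured_ids):
    """Return registered people whose ID is absent from the insurance records."""
    insured = set(insured_ids)  # set lookup keeps each membership check fast
    return [p for p in register if p.person_id not in insured]


def coverage_rate(register, insured_ids):
    """Share of registered people who appear in the insurance records."""
    if not register:
        return 0.0
    return 1 - len(find_uncovered(register, insured_ids)) / len(register)


# Hypothetical records; ID "9999" is insured but missing from the register
register = [Person("1001", "A"), Person("1002", "B"), Person("1003", "C"), Person("1004", "D")]
insured_ids = ["1001", "1003", "9999"]

print([p.person_id for p in find_uncovered(register, insured_ids)])
print(coverage_rate(register, insured_ids))
```

The sketch assumes each person appears once in the register and that IDs are already clean. Linking real registers usually also requires deduplication and validation of ID numbers.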
Machine-generated data also have the potential to
markedly improve access to services such as water.
Across rural Africa, about one-third of water hand pumps are inoperable. In Kenya, sensors on hand pumps provide real-time data on their functionality. This system helped reduce the average time to repair a broken pump from 27 days to about three days and the median time from six days to one day (figure 2.2).7
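The gains in figure 2.2 are easier to compare as percentage reductions. The minimal Python sketch below uses the repair times from figure 2.2: a mean of 27 days at baseline and 2.6 days with mobile-enabled data, and a median of 6 days and 1 day. The helper function is an illustrative assumption, not part of the source study.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage fall from `before` to `after`."""
    return (before - after) / before * 100


# Repair times in days from figure 2.2: (baseline, with mobile-enabled data)
repair_days = {
    "mean": (27.0, 2.6),
    "median": (6.0, 1.0),
}

for stat, (baseline, with_data) in repair_days.items():
    print(f"{stat}: {baseline:g} -> {with_data:g} days, "
          f"{percent_reduction(baseline, with_data):.0f}% shorter")
```

This works out to roughly a 90 percent reduction in the mean and an 83 percent reduction in the median. The baseline mean (27 days) sits far above the baseline median (6 days). That gap suggests a long tail of very slow repairs, which is where real-time data appear to have helped most.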
Better preparing for and responding to emergencies.
Public intent data can also lead to a better emergency
response when disasters hit, whether environmental,
financial, health, or conflict related. For example,
weather data, especially weather forecasts, can help
people anticipate and prepare for extreme events.
The value of such data was revealed by two intense
cyclones in the Bay of Bengal 14 years apart. The
1999 cyclone caught the Indian state of Odisha by
surprise, causing massive devastation, killing more
than 10,000 people, and destroying housing and
public infrastructure. Since then, the Odisha State
Disaster Management Authority and the government
of Odisha have invested in weather forecast data and
disaster response measures. When another cyclone
hit in 2013, nearly 1 million people were evacuated to
cyclone shelters, safe houses, and inland locations,
and only 38 people died during and immediately after
the storm.8
These impressive results would not have been possible without the weather data that gave sufficient advance warning of the cyclone.
Mobile technologies have the potential to speed
up emergency responses. In Uganda, a health report-
ing program that provides beneficiaries, health pro-
fessionals, and the Ministry of Health with real-time
health data by using text messaging was able to cut
the response time to outbreaks of disease by half. The
technology was used after the 2012 Ebola outbreak
to help implement quarantines and other protective
measures.9
As these examples demonstrate, timely
data can contribute to quick reactions to a crisis.
Generating useful knowledge. Data generated and
used by academic institutions, think tanks, and inter-
national organizations play a vital role in ensuring
that policies are evidence-based. Impact evaluations of
reforms and development projects are frequently used
to assess whether past policies have had the intended
consequences and to improve program design. In the
last few decades, numerous field experiments have
tested policies in a real-life setting under strict statis-
tical conditions that allow cause and effect to be ascer-
tained. Findings from such experiments have been
used to implement new policies and scale up existing
programs. One estimate suggests that the new poli-
cies and programs built on the research findings have
reached more than 400 million people worldwide.10
In Brazil, evidence from 2,150 municipalities found
that many mayors are willing to pay to learn the
results of impact evaluations, and that informing
mayors about research on a simple and effective policy increases the probability that their municipality implements the policy by 10 percentage points.11
Research also plays an important role in ensuring
the accuracy of the data collected by governments,
which is critical to preventing policy recommenda-
tions based on inaccurate or misleading data.12
The
World Bank’s Living Standards Measurement Study
(LSMS) program, while supporting the production of
household survey data in 106 countries between 2011
and 2020,13
has also drawn attention to the impor-
tance of research on survey methodologies and the
role of better measurement in eliminating systematic
measurement errors in self-reported survey data that
otherwise bias empirical analyses and policy conclu-
sions.14
Much of the methodological research led by the
LSMS is carried out in partnership with national sta-
tistical offices (NSOs), in turn facilitating the adoption
of improved methods in downstream national surveys.
Pathway 2: Prioritizing scarce resources
Targeting resources and reaching marginalized populations
and areas. When public intent data are granular—that
is, they are tied to an individual or a specific location—
they can help target resources and foster inclusion.
In Croatia, data from the population census were
Figure 2.2 Improving access to water: Using
real-time sensor data to reduce repair time for broken
hand pumps in Kenya
Source: SSEE 2014. Data at http://bit.do/WDR2021-Fig-2_2.
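Chart: mean and median repair time (number of days), baseline versus with mobile-enabled data. Mean repair time: 27 days at baseline, 2.6 days with mobile-enabled data. Median repair time: 6 days at baseline, 1 day with mobile-enabled data.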
Data as a force for public good | 57
combined with household survey data and admin-
istrative data to create detailed maps of poverty and
deprivations (map 2.1).15
The maps revealed large dif-
ferences in living standards across municipalities and
within the territorial boundaries used for allocating
funds from the European Union (EU).16
More than
one-third of the EU’s annual budget—equivalent to
more than €50 billion—is dedicated to investments in
infrastructure, such as hospitals and schools, in less
economically developed areas. Because the allocation
of funds depends on an area’s gross domestic product
(GDP) per capita, poor municipalities situated in non-
poor regions may not receive funding. Armed with
the poverty map, Croatia responded with proposals
for new geographical subdivisions that concentrate
EU funds in the poorest areas.17
This reordering,
thanks to better data and analysis, has the potential to
reduce inequality and pockets of poverty in Croatia.
A long-running and rich example of the value of granular data is the Demographic and Health Surveys, which cover topics such as HIV/AIDS and gender-based violence (see spotlight 2.1). Over the last few decades, data from 82 of these surveys, disaggregated by sex, have been used as inputs for drafting laws banning domestic violence, designing HIV education programs, and more.18
In Vietnam, a survey
on gender-based violence revealed that more than
half of women have experienced physical, sexual, or
emotional abuse; that nearly half of these had physi-
cal injuries as a result; and that seven in eight did not
seek any help. These data spurred a public discussion about the topic, informed the National Strategy on Gender Equality, and led to the introduction of counseling, health, legal, and shelter services for women subject to violence at home.19
Saving money and resources. Interoperability
between geospatial data and government records
can help governments save resources. Incomplete
and out-of-date property and taxpayer records are an
important reason that taxes remain uncollected in
many low- and middle-income countries. In Tanzania,
the government introduced a Geographic Informa-
tion System for tax reporting and revenue collection.
The system identified buildings via satellite imagery,
collected and digitized data on their characteristics,
and provided a comprehensive, up-to-date record of
taxable properties. Using this new method, the city
government of Arusha identified 102,904 buildings—
nearly five times more than with earlier databases.
One year after the system was introduced, the eight
participating cities increased their revenue collection
by 30 percent on average.20
Interoperable administrative data have also been
used to increase efficiencies and save costs in public
welfare systems. For example, in Argentina the gov-
ernment identified ineligible beneficiaries across
various social programs using the country’s system
of unique taxpayer ID numbers. The exercise gener-
ated estimated savings of US$143 million over eight
years.21
More generally, investments in better data
systems have been shown to pay for themselves.22
Monitoring progress and determining priorities. Public
intent data can also help prioritize resources by mon-
itoring progress on key indicators and deliverables
over longer periods of time. Such monitoring is vital
for creating and tracking national and international
development goals. The Sustainable Development
Goals (SDGs), for example, rely heavily on public
intent data.23
If the data needed to measure one of the
targets were collected only every 10 years, tracking
progress would become challenging.
Cross-country comparable composite indexes—
often created by think tanks, research institutions,
and international organizations—allow countries to
benchmark their performance over time and against
peers and to decide on priorities. These data can
induce countries to respond with reforms in areas
where they are lagging. Multidimensional poverty
indexes, which measure poverty at the household and
individual levels, track certain indicators in countries
Map 2.1 Reducing poverty: Mapping pockets of
poverty in Croatia allowed better targeting of
antipoverty funds
Source: World Bank 2016a. Data at http://bit.do/WDR2021-Map-2_1.
58 | World Development Report 2021
over time, helping countries decide on areas of focus.
Costa Rica issued a presidential directive calling for
use of such an index for budgetary planning and as
an official measure for allocating resources and mon-
itoring and evaluating social programs. The country
has used the index to modify its budget allocation,
which helped accelerate poverty reduction during a
period of austerity without an increase in budget.24
Pathway 3: Holding government
accountable and empowering individuals
Fostering transparency and increasing government
accountability. CSOs and individuals are frequent pro-
ducers and users of public intent data. Their demand
for data can encourage transparency through data
analysis and data feedback systems. In China, media
and watchdog organizations in Beijing noted incon-
sistencies between official government data on
air quality and data from independent air quality
monitoring systems. Heightened concerns about air
quality have fueled a dramatic expansion in publicly
available, real-time data from thousands of air qual-
ity monitoring locations.25
The central government
launched a US$275 billion plan to improve air quality
throughout the country, and the Beijing municipal
government promised an additional US$160 billion
toward that goal.26
Good data can also encourage transparency in and
improve public procurement. Too often, public projects are not implemented adequately because of poor procurement practices, such as inflated costs, corruption, or ghost contracts. Because 12 percent of global GDP is spent on public procurement, these failures matter tremendously for development outcomes.27
In Uganda, in
an attempt to improve procurement outcomes, local
government entities made administrative procure-
ment data from the bidding process down to the level
of execution of contracts available to certain CSOs.
These CSOs trained community members to under-
stand the information in the contracts and conduct
site checks to verify it. The findings revealed mismanagement of resources by contractors and government
officials and a high dependence on noncompetitive
contracts. Not only did Uganda undertake reforms to
ensure that contracts were complying with national
procurement standards, but the national public pro-
curement agency also upgraded its procurement por-
tal in line with international open contracting data
standards, making Uganda the first African country
to do so.28
Government accountability can also be enhanced
through e-governance.29
In Pakistan, a smartphone
app that equips government health inspectors with
real-time data on rural public health clinics led to a
74 percent increase in clinic inspections. In turn, doc-
tor attendance rose by 18 percentage points, thereby
improving health care services.30
Empowering individuals. Disadvantaged groups are
sometimes left out of government efforts to collect
data because governments fail to acknowledge inclu-
sion of those groups as a policy objective. Citizens
must then often collect the data needed to empower
themselves. These data, such as those on harassment and
early warning systems, can help fill a gap that neither
the public sector nor the private sector can fill. The
map-based mobile app Safetipin allows users to report
mobility and safety issues in cities related to lighting,
walk paths, visibility, public transport, and security.
Beyond informing citizens where it is safe to be in
their city, these data can be used to conduct citywide
audits. In Bogotá, Colombia, the city government
wanted to use this tool to map safety around bike
paths. The biking community helped collect images
along 230 kilometers of bike paths in the city, which
were then analyzed by Safetipin (map 2.2). This
analysis supported the authorities in understanding
where to improve lighting and add closed-circuit TV
cameras.31
Public intent data can also empower individuals
to make better choices through more information
and knowledge. The digital revolution has greatly
increased the accessibility of data, as well as how
easily information can be spread. One example is
providing smallholder farmers with agricultural
information digitally, often through text messages,
to increase their productivity. Such digital delivery can improve on extension services, which rely on in-person agricultural advice, are more costly to sustain, and are harder to keep at a consistent quality. A meta-analysis suggests that providing agricultural information increases yields by 4 percent and raises the probability that farmers adopt productivity-enhancing agrochemical inputs.32
With more than
2 billion people living on smallholder farms, these
numbers can have major effects on global poverty
and shared prosperity.
Many of the features of public intent data that
increase their value for development can also increase
their potential for harm. Data may be misused for
political surveillance and control or discrimination
and exclusion, or they may inadvertently expose sen-
sitive information about individuals.33
For example,
in República Bolivariana de Venezuela, a digital bio-
metric fingerprint system was introduced initially for
voter registration and identification, but it has since
been integrated with other registers. Identification
Map 2.2 Improving public safety: The use of
citizen-collected data in Bogotá led to greater
safety around bike paths
Source: Safetipin 2016.
Note: Safety ratings of poor to excellent for Bogotá bike paths are based on safety scores.
with the digital fingerprint has become mandatory
to purchase basic goods such as food and medicine,
which has led to numerous cases of stores refusing
to sell young people, foreigners, and LGBTQI individ-
uals such goods.34
To avoid data being harmful in this
and other ways, certain prerequisites must be put in
place, notably robust data protection laws, indepen-
dent oversight, and legal and technological solutions
to safeguard the confidentiality of individuals and
prevent misuse of data.
Gaps in the coverage, quality,
and usability of public intent data
Despite the demonstrated value of public intent
data, gaps in their availability, quality, and usability
persist, particularly in poor countries. This section
documents these gaps, drawing on the World Bank’s
Statistical Performance Indicators (SPI), described in
box 2.2, as well as two other prominent indexes rating
public intent data availability and quality.35
When the coverage of data is inadequate:
Lack of timeliness, frequency, and
completeness
Lack of timely and frequent data remains an issue in
many thematic areas and across all types of public
intent data. Timeliness is particularly an issue with
survey and census data because long lags commonly
occur between their collection and their release. For
example, according to the Statistical Performance
Indicators, half of low-income countries have not
undertaken a population and housing census in the
last 10 years, and 18 percent have not done so in the last
20 years.36
The census has a foundational function in
any statistical system and is critical for political repre-
sentation and resource allocation. The costs of allow-
ing the census to become outdated are demonstrable.37
Monthly or quarterly industrial production indexes,
which are important to track current economic
activity, are available in only 9 percent of low-income
countries, compared with 40 percent of lower-middle-
income countries, 48 percent of upper-middle-income
countries, and 64 percent of high-income countries.38
Ground-based sensors, deployed in Internet of
Things systems, can measure some outcomes, such
as air pollution, climatic conditions, and water qual-
ity, on a continual basis and at a low cost. However,
adoption of these technologies is still too limited
to provide timely data at scale, particularly in low-
income countries.39
Lack of completeness is often less of a problem in
census and survey data because they are designed to
cover the entire population of interest. For adminis-
trative data, the story is different. Civil registration
and vital statistics systems (births and deaths) are
not complete in any low-income country, compared
with completeness in 22 percent of lower-middle-
income countries, 51 percent of upper-middle-income
countries, and 95 percent of high-income countries.40
These gaps leave about 1 billion people worldwide
without official proof of identity.41
More than one-quarter of children under the age of five worldwide, and more than half of those in Sub-Saharan Africa, are not registered at birth.42
Although population and housing censuses are
designed to represent all individuals at the time of
the census, they can leave out some of the poorest and
most vulnerable. Many vulnerable groups are hard
to count in the first place, especially when census
enumeration focuses on residence and the concept
of the household. These groups include the displaced,
the homeless, slum inhabitants, nomads, migrants,
young children, and the disabled.43
The extent of
undercounting is difficult to measure systematically,
but in 2013 it was estimated that globally between 170
million and 320 million people were missing from
population census frames, with the poorest more
likely to be missed.44
As noted, in many countries the
census determines the allocation of resources and
political representation. Thus these omissions have
real consequences and can disenfranchise vulnerable
populations.45
They also affect the representativeness
of household surveys that use census-based sampling
frames.46
Lower-income countries also are susceptible to
coverage gaps in geospatial data, especially in some
of the geospatial reference datasets such as admin-
istrative boundaries, postal codes, and maps. The
Global Open Data Index of the Open Knowledge
Foundation assesses the availability and openness
of three such geospatial datasets in 94 countries:
administrative boundaries, addresses and locations,
and national maps. The assessment reveals that all
three datasets are often incomplete in lower-income
countries (figure 2.3).
Similarly, the road network coverage of the open
mapping platform OpenStreetMap is complete in
many high-income countries, but less so in lower-
income countries. OpenStreetMap is a citizen-
generated geospatial application that relies on its
users to digitize the location of roads and other infra-
structure. Its coverage disparities reflect the barriers
to making this type of data work for the poorest
countries. In India, by 2015 only 21 percent of the road
network had been digitized.47
Figure 2.3 Gaps in geospatial datasets are especially
large in lower-income countries
Source: WDR 2021 team calculations, based on data of Open Knowledge Foundation, GODI (Global Open Data Index) (database), https://index.okfn.org/. Data at http://bit.do/WDR2021-Fig-2_3.
Chart: share of countries with dataset gaps (%) for three datasets (administrative boundaries, addresses and locations, and national maps), by country income group (low-income, lower-middle-income, upper-middle-income, high-income).
Box 2.2 The World Bank’s Statistical Performance Indicators
The World Bank’s Statistical Performance Indicators
(SPI) measure statistical performance across 174 coun-
tries.a
The indicators are grouped into five pillars: (1) data
use, which captures the demand side of the statistical
system; (2) data services, which looks at the interaction
between data supply and demand such as the openness
of data and quality of data releases; (3) data products,
which reviews whether countries report on important
indicators; (4) data sources, which assesses whether cen-
suses, surveys, and other data sources are created; and
(5) data infrastructure, which captures whether founda-
tions such as financing, skills, and governance needed
for a strong statistical system are in place. Within each
pillar is a set of dimensions, and under each dimension
is a set of indicators to measure performance. The indi-
cators provide a time series extending at least from 2016
to 2019 in all cases, with some indicators going back to
2004. The data for the indicators are from a variety of
sources, including databases produced by the World
Bank, International Monetary Fund (IMF), United Nations
(UN), Partnership in Statistics for Development in the
21st Century (PARIS21), and Open Data Watch—and
in some cases, directly from national statistical office
websites. The indicators are also summarized as an index,
with scores ranging from a low of 0 to a high of 100.
a. World Bank, Statistical Performance Indicators (database), http://www.worldbank.org/spi; Dang et al. (2021a, 2021b).
When data quality is poor: Lack of
granularity, accuracy, and comparability
Lack of granularity can occur when data are not
available at the desired level of disaggregation.
The gaps in data on women and girls are partic-
ularly severe. Only 10 of the 54 gender-specific
indicators (19 percent) in the SDGs are widely
available, based on international standards for
measurement, and only 24 percent of the avail-
able gender-specific indicators are from 2010 or
later.48
Gaps in sex-disaggregated data related to the COVID-19 pandemic are also pervasive, leaving knowledge of the pandemic's gender impacts incomplete (box 2.3).
Box 2.3 Gender data and the COVID-19 pandemic
The COVID-19 pandemic was not gender-blind; it affected
men and women differently and may have exacerbated
gender inequalities.a
Yet knowledge of the gender
impacts of COVID-19 is incomplete because of data
gaps across all dimensions of well-being. At the most
basic level, data are lacking on COVID-19 infections and
deaths among men and women. In March 2020, only 61
percent of reported COVID-19 cases were disaggregated
by sex, and these data were provided by 26 countries. By
November 2020, reporting had grown to 80 countries,
but the proportion still stood at 60 percent. The reporting
was irregular throughout 2020, as shown in figure B2.3.1.
Understanding the gender dimensions of the COVID-19
impacts extends well beyond case and mortality data.
The data systems in place prior to the pandemic had
notable gender data gaps that hampered the ability to
track impacts and inform policy. For example, monitor-
ing impacts on jobs requires regular and timely data
on informal employment where women predominate.
However, only 41 percent of low-income countries (LICs)
and lower-middle-income countries (LMICs) report data
on informal jobs disaggregated by sex. And in seven
of the 10 countries where the recent economic contraction is most severe, less than 38 percent of Sustainable
Development Goal economic opportunity indicators are
available by sex.b
Furthermore, preexisting biases in
face-to-face household survey design and implemen-
tation bled into phone surveys implemented during the
pandemic, limiting measurement of the gender-related
impacts of the crisis. These biases include aiming phone surveys at household heads and omitting survey content on time use.
There are also notable gaps in the gender data needed
to inform policy design and effectiveness. Although the
expansion of social protection programs is arguably the
largest policy response to offset the economic impacts
of the crisis, comparable sex-disaggregated measures of
social protection coverage are largely unavailable. Data
on personal identification cards and mobile phone own-
ership should inform program design decisions, espe-
cially as countries scale up digital platforms. Yet data
on gender differences in ownership of personal identity
cards are missing for more than a third of countries. Less
than a quarter of LICs and LMICs report data on mobile
phone ownership by women.c
Even though the pandemic created new demands for
statistics, it also interrupted the supply. More than half
of LICs and LMICs reported that the COVID-19 pandemic
affected national statistical offices’ ability to produce
socioeconomic statistics.d
This problem requires imme-
diate attention, but building effective, gender-aware
data systems will require sustained financial and human
capital investments.
Sources: Mayra Buvinic (Center for Global Development), Lorenz Noe
(Data2x), and Eric Swanson (Open Data Watch), with inputs from the
WDR 2021 team.
a. UN Women (2020).
b. Buvinic, Noe, and Swanson (2020).
c. Buvinic, Noe, and Swanson (2020).
d. UNSTATS and World Bank (2020).
Figure B2.3.1 Proportion of COVID-19
cases reported with sex-disaggregated
data for 190 countries
Sources: Global Health 50/50, University College London, COVID-19 Sex-Disaggregated Data Tracker (database), November 30, 2020, data release, https://globalhealth5050.org/the-sex-gender-and-covid-19-project/; Global Change Data Lab, University of Oxford, Our World in Data, Coronavirus Pandemic (COVID-19) (database), https://ourworldindata.org/coronavirus; calculations of Open Data Watch, Washington, DC. Data at http://bit.do/WDR2021-Fig-B2_3_1.
Chart: percent of reported COVID-19 cases that were and were not sex-disaggregated (Sex-disaggregated? Yes / No), monthly from March to November 2020, together with the proportion of countries reporting sex-disaggregated data.
Although data disaggregated at the individual level
are central to understanding and addressing condi-
tions that uniquely affect the lives of women, men,
children, adults, the elderly, and persons with dis-
abilities, the required data are not being sufficiently
produced. For example, survey data on ownership of
physical and financial assets have traditionally been
collected at the household rather than the individual
level, limiting their usefulness in understanding
women’s relative wealth, rights, and decision-making
power in their families.49
Monetary poverty esti-
mates are also based on household-level measures of
resources, and “poor individuals” are identified based
on the poverty status of their entire households,
regardless of differences within households among
women, men, and children in access to and use of
resources.50
Meanwhile, gaps remain in the adoption
and proper implementation of the survey questions
developed by the Washington Group on Disability
Statistics—questions that are critical for obtaining
internationally comparable estimates on disabilities
and for disaggregating relevant SDG indicators by
disability status.51
Finally, despite the enormous potential of geo-
graphically granular data for targeting policies effec-
tively, such disaggregated data are rarely available
comprehensively. According to the 2020 Open Data
Inventory, about 90 percent of official statistics, even
when they are available, are not consistently reported
at the regional level (first administrative division),
and almost none are consistently reported at the dis-
trict level (second administrative division).
Poor accuracy of data can limit their usefulness.
For those collecting individual-level data through
household surveys, a concern is the choice of survey
respondents. Relying on proxy respondents to elicit
individual-level information—a common cost-saving
mechanism in large-scale household surveys—has
been shown to produce wrong estimates of gender
differences in asset ownership, labor market out-
comes, decision-making, and control of income.52
Reported levels of income, wages, and firm profits vary with the length of the period over which survey respondents are asked to recall them.53
The
length of recall also matters for the accuracy of survey
data on agricultural production, health, and labor.54
Accuracy is also a concern for administrative data.
One reason for the proliferation of survey data is the
perception that administrative records are unreli-
able and incomplete.55
A study of multiple African
countries found overreporting of vaccination rates in
health information systems by 5 percent of countries
and of primary enrollment rates in education manage-
ment systems by a third. This data inflation appears
to be connected to making aid flows conditional on
results, creating an incentive to misreport.56
Data quality concerns and methodological
challenges also characterize data produced by the
Internet of Things. For example, the quality of data
generated by low-cost commercial sensors used for
air pollution monitoring has been found to vary
widely when benchmarked against reference mea-
surements.57
Sensors must be calibrated to the
specific conditions in which they are used to yield
accurate results, but the calibration process remains
expensive and time-consuming.58
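As an illustration of what calibration involves, the sketch below fits the simplest possible correction: a straight line, estimated by ordinary least squares, that maps a low-cost sensor's raw readings onto time-matched readings from a co-located reference monitor. The linear model, the function names, and the sample PM2.5 values are assumptions made for illustration only; practical calibration models often add terms for humidity and temperature.

```python
from statistics import fmean


def fit_linear_calibration(raw, reference):
    """Fit reference ≈ slope * raw + intercept by ordinary least squares.

    `raw` and `reference` are time-matched readings from a low-cost sensor
    co-located with a reference monitor (illustrative inputs).
    """
    if len(raw) != len(reference) or len(raw) < 2:
        raise ValueError("need at least two paired readings of equal length")
    mean_x, mean_y = fmean(raw), fmean(reference)
    sxx = sum((x - mean_x) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings must vary to fit a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_calibration(readings, slope, intercept):
    """Map new raw sensor readings onto the reference scale."""
    return [slope * x + intercept for x in readings]


# Hypothetical co-location data: PM2.5 in micrograms per cubic meter.
raw = [10.0, 20.0, 30.0, 40.0]
reference = [9.0, 17.0, 25.0, 33.0]

slope, intercept = fit_linear_calibration(raw, reference)
corrected = apply_calibration([50.0], slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} corrected={corrected[0]:.1f}")
```

Because a fit like this holds only for the conditions under which the co-location data were collected, it would typically have to be repeated at each site and over time, which helps explain why calibration is costly in practice.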
Lack of comparability is particularly a concern
among low-income countries. Only 40 percent of
low-income countries, 20 percent of countries in
fragile and conflict-affected situations (FCS), and
40 percent of countries in Sub-Saharan Africa (fig-
ure 2.4) have at least three comparable estimates of
extreme poverty.59
It is therefore difficult to under-
stand changes in living standards over time and
design policies to eradicate poverty. Recent innova-
tions in data collection in these countries suggest a
slightly more optimistic picture for the future.60
It is
also important to note that some lack of comparabil-
ity over time is necessary, particularly when adopting
new global standards.
When data are not easy to use:
Lack of accessibility, understandability,
and interoperability
Lack of data accessibility prevents actors from using
data. According to an assessment of the Open Data
Inventory, lower-income countries lag far behind in
overall data openness (table 2.1), although even high-
income countries have mediocre openness scores.
Only 11 percent of low-income countries consistently
make data available with a license classifiable as open,
compared with 19 percent of lower-middle-income
countries, 22 percent of upper-middle-income coun-
tries, and 44 percent of high-income countries.
The Open Data Inventory assessment also reveals
some limitations to machine readability. To the
extent that governments publish official statistics,
only 37 percent of low-income countries make at least
some of these available in machine readable formats,
compared with 51 percent of lower-middle-income
countries, 61 percent of upper-middle-income coun-
tries, and 81 percent of high-income countries.
One reason for lack of data accessibility is that data
systems in the public sector can be very fragmented.
The health sector, for example, often has many dif-
ferent health information systems because of its ten-
dency to have many different service providers. These
include many private providers whose data are often
unavailable to the Ministry of Health. In Ethiopia, a
study of the health sector found 228 different digital
health information applications, of which only 39 per-
cent sent data to the Ministry of Health.61
Administrative data, in particular, are too often siloed in different systems, preventing their effective use for monitoring
and policy design. Although data coordination within
agencies is often limited, the challenge of siloed sys-
tems is even greater across government agencies.62
Lack of understandability prevents even those data
that are accessible from generating value. To be
understandable, data must be well disseminated,
backed up with sufficient metadata, responsive to
user needs, and, for certain purposes, summarized
and visualized for the user. A majority of countries
have data portals and provide metadata for their pub-
lished data—practices that facilitate wider data use.63
Low-income countries perform comparatively well in
the data portal and metadata categories, but even here
they lag. A larger gap remains in terms of advance
release calendars, which commit government units
to release data on a predetermined timetable. Only
Figure 2.4 Lower-income countries, especially those affected by fragility and
conflict, have less comparable poverty data than other country groups
Source: WDR 2021 team calculations, based on World Bank, PovcalNet: Data (database), http://iresearch.worldbank.org/PovcalNet/data.aspx. Data at http://bit.do/WDR2021-Fig-2_4.
Note: Only those economies with at least one international poverty estimate are included. FCS status refers to the World Bank’s “Classification of Fragile and
Conflict-Affected Situations” (World Bank 2020a).
Chart: share of countries (%) in three categories (less than three poverty data points; last three poverty data points not comparable; last three poverty data points comparable), shown by income group (low-income, lower-middle-income, upper-middle-income, high-income), by region (Europe and Central Asia, Latin America and the Caribbean, East Asia and Pacific, South Asia, Middle East and North Africa, Sub-Saharan Africa), and by FCS status (non-FCS, FCS).
Table 2.1 Assessment of the openness of data, by country income group
Indicator | Low-income | Lower-middle-income | Upper-middle-income | High-income
Openness score (0–100) | 38 | 47 | 50 | 66
Available in machine readable format (%) | 37 | 51 | 61 | 81
Available in nonproprietary format (%) | 75 | 85 | 81 | 84
Download options available (%) | 56 | 68 | 68 | 78
Open terms of use/license (%) | 11 | 19 | 22 | 44
Source: WDR 2021 team calculations, based on 2020/21 Open Data Inventory indicators (Open Data Watch, ODIN [Open Data Inventory] [database], https://odin.opendatawatch.com/), also used as part of the World Bank’s Statistical Performance Indicators database, http://www.worldbank.org/spi.
Note: The openness score is the average by country income group on a scale of 0–100. All other indicators are the percentage of published data averaged by country income group.
30 percent of NSOs in low-income countries pub-
lish such calendars, compared with almost all high-
income countries. Across the board, only a few NSOs
utilize user satisfaction surveys, which could play an
important role in gauging and understanding data
demand (table 2.2).
Limitations to interoperability. The use of common
standards, methodologies, and classifications across
public intent data sources ensures interoperability
and enables data integration. Common and unified
identification is needed across producers of pub-
lic intent data for geographic divisions below the
national level, such as regions, states, and districts.
There is significant scope for expanding the use of
georeferencing in censuses, surveys, and collection
of administrative data, particularly in low-income
settings. The use of common and unified personal
identifiers to match data across multiple data sources
is more contentious because of privacy and equity
concerns, and robust data protection legislation is a
prerequisite for their use.64
Personal identification
also requires trust and comprehensive civil regis-
tration and vital statistics systems, which have so
far been elusive in the poorest countries. The use of
tokenized identifiers in line with privacy by design
principles is a potential solution.65
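As a rough illustration of what a tokenized identifier can look like, the sketch below derives a pseudonymous token from a personal identifier with a keyed hash (HMAC-SHA-256) from Python's standard library. This is one common pseudonymization approach, not a method the report prescribes; the key, the agency tables, and the function name `tokenize_id` are hypothetical. Because the same key always yields the same token, agencies that tokenize with a shared key held by a trusted custodian could link records without exchanging raw identifiers, while anyone without the key cannot recompute tokens from guessed IDs.

```python
import hashlib
import hmac

# Hypothetical secret key held by a trusted data custodian (for example, the NSO).
# In practice it would be generated securely and never published.
SECRET_KEY = b"example-key-held-by-data-custodian"


def tokenize_id(national_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a pseudonymous token for a personal identifier.

    HMAC-SHA-256 makes the token deterministic (same input, same token),
    but it cannot be recomputed by anyone who lacks the key.
    """
    normalized = national_id.strip().upper()  # avoid mismatches caused by formatting
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Two hypothetical agencies tokenize their own records before sharing them.
health_records = {tokenize_id("ab-123"): {"clinic_visits": 3}}
tax_records = {tokenize_id(" AB-123 "): {"declared_income": 4200}}

# Records can be linked on the token without exchanging the raw identifier.
linked = {
    token: {**health_records[token], **tax_records[token]}
    for token in health_records.keys() & tax_records.keys()
}
print(len(linked))  # prints 1
```

Deterministic tokens still let key holders link records, which is why the key itself needs the same legal and institutional safeguards as the underlying data.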
Adhering to set methodologies and standards
in line with international best practices greatly
increases the interoperability and usability of pub-
lic intent data. The World Bank’s Statistical Per-
formance Indicators capture this aspect of public
intent data systematically. Under the indicator on
data infrastructure, standards related to systems of
national accounts, employment status, consumption,
consumer price indexes, and government finance
statistics, among others, are assessed. The indicator
shows a strong income gradient in the adherence to
international best-practice standards and methodol-
ogies.66
For example, the International Classification
of Status in Employment is being used in two-thirds
of high-income countries but in only 7 percent of
low-income countries (figure 2.5). By contrast, a large
share of all countries globally is using at least the 1993
international standards for the System of National
Accounts (SNA 1993).
When data are not safe to use:
Lack of impartiality, confidentiality,
and appropriateness for development
Gaps also remain in the safety of data. These can occur when data are not immune to influence from stakeholders, when they are not stored securely, or when they are not properly deidentified. For example, Greece’s debt statistics appear to have deliberately misrepresented the country’s financial situation in the lead-up to the 2009 euro crisis, and data breaches are all too common in government and private sector databases.67
Table 2.2 Data dissemination practices and openness, by country income group
Indicator | Low-income | Lower-middle-income | Upper-middle-income | High-income
NSO uses advance release calendar | 30 | 75 | 92 | 98
NSO has data portal | 84 | 91 | 95 | 92
NSO has conducted user satisfaction survey | 10 | 20 | 19 | 33
NSO makes metadata available | 63 | 91 | 97 | 100
Source: Cameron et al. 2019.
Note: Data are for 2019. The percentages reflect the proportion of the population in each income group whose national statistical office (NSO) has the listed attribute.
Figure 2.5 Lower-income countries are less likely than other countries to adhere to international best-practice statistical standards and methodologies
[Chart: share of countries (%) with at least the 1993 System of National Accounts in use and with the International Classification of Status in Employment in use, by income group (low-income, lower-middle-income, upper-middle-income, high-income)]
Source: WDR 2021 team calculations, based on World Bank, Statistical Performance Indicators (database), http://www.worldbank.org/spi. Data at http://bit.do/WDR2021-Fig-2_5.
Data as a force for public good | 65
Similarly, deidentifying individuals has not always
proved to be enough to maintain confidentiality. In the
1990s, the governor of Massachusetts in the United
States approved making deidentified medical records
of state employees available for researchers. Although
key identifiers such as name and address were removed
from the data, by triangulating the information
available with other public information, a researcher was
able to identify the medical records of the governor and
other individuals (see chapter 6 for more details).68
One
way to minimize these concerns is to ensure that only
appropriate data are produced—data that measure con-
cepts of interest, have a clear policy purpose, and are
not produced from attempts to collect excessive infor-
mation or surveil individuals. Such data, of course, can
still be misused and mishandled.
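To make the triangulation risk concrete, the sketch below links a hypothetical "deidentified" medical table to a hypothetical public roster on attributes the two share. The records are invented, and the choice of linking attributes (ZIP code, birth date, sex) is an assumption for illustration. The sketch also counts how many records share each combination of those attributes; the smallest such count is the "k" in the k-anonymity criterion, and a combination held by a single record (k = 1) can be reidentified.

```python
from collections import Counter

# Hypothetical "deidentified" records: names and addresses removed,
# but quasi-identifiers (ZIP code, birth date, sex) retained.
medical = [
    {"zip": "02138", "birth": "1950-01-01", "sex": "M", "diagnosis": "A"},
    {"zip": "02138", "birth": "1962-05-17", "sex": "F", "diagnosis": "B"},
    {"zip": "02139", "birth": "1962-05-17", "sex": "F", "diagnosis": "C"},
    {"zip": "02139", "birth": "1962-05-17", "sex": "F", "diagnosis": "D"},
]

# Hypothetical public roster (for example, a voter list) that does include names.
roster = [
    {"name": "Person X", "zip": "02138", "birth": "1950-01-01", "sex": "M"},
    {"name": "Person Y", "zip": "02139", "birth": "1962-05-17", "sex": "F"},
]

QUASI = ("zip", "birth", "sex")


def key(record):
    """Tuple of quasi-identifier values used to link the two tables."""
    return tuple(record[field] for field in QUASI)


# Count how many medical records share each quasi-identifier combination.
group_sizes = Counter(key(r) for r in medical)

# A named person is reidentified when exactly one medical record matches them.
reidentified = {
    person["name"]: r["diagnosis"]
    for person in roster
    for r in medical
    if key(r) == key(person) and group_sizes[key(person)] == 1
}

k = min(group_sizes.values())  # k-anonymity level of the released table
print(reidentified)  # {'Person X': 'A'}
print(k)  # 1
```

Person Y is not uniquely identified because two records share that combination, though the match still narrows the diagnosis to two options; this is one reason removing names and addresses alone is not enough.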
Why data gaps persist:
The political economy of
public intent data
The previous two sections describe how public
intent data can yield great value for development, yet
gaps in public intent data are severe, particularly in
low-income countries—the countries that stand to
benefit most from the data. Why do these data gaps
persist? This section answers that question, com-
plementing existing data sources with structured
interviews with NSOs across all income groups and
geographical regions.69
This approach requires dig-
ging one level deeper and understanding the main
roadblocks on the pathways to data for public policy,
or conversely, the enablers of public intent data. The
main roadblocks identified are lack of financing, tech-
nical capacity, data governance, and demand for data
(figure 2.6).
A common reason for these roadblocks is lack of
understanding of and commitment to the use of data
for policy making. In a positive feedback loop, realiz-
ing the value of public intent data increases under-
standing of the potential of the data, leading to a com-
mitment to the further production and use of public
intent data. To spearhead such commitments, SDG
Target 17.18 calls for increasing the availability of high-
quality, timely, and disaggregated reliable data, and
SDG Target 17.19 calls for developing measurements of
progress related to statistical capacity building.
Deficiencies in financing
Underinvestment and misaligned investment priori-
ties are perpetuating data gaps.
Figure 2.6 A positive feedback loop can connect enablers and features of public
intent data with greater development value
Source: WDR 2021 team.
[Diagram: enablers of public intent data (financing, technical capacity, governance, data demand); desirable features of public intent data (adequate coverage: completeness, timeliness, frequency; high quality: granularity, accuracy, comparability; easy to use: accessibility, understandability, interoperability; safe to use: impartiality, confidentiality, appropriateness); value of public intent data (improved service delivery; prioritization of scarce resources; accountability and empowerment)]
Underinvestment by governments. Underinvestment
in public intent data systems is widespread. Only half
of countries had a national statistical plan that was
fully funded in 2019 (figure 2.7).70
Lack of national
funding for statistics is especially a struggle for frag-
ile and conflict-affected countries, countries in Sub-
Saharan Africa, and low-income countries. Whereas
93 percent of high-income countries have a fully
funded national statistical plan, not a single low-income
country has one. A recent review of public financing
of statistics found that seven of 10 low- and middle-
income countries analyzed funded less than half of
their respective national statistical plans, with country
contributions ranging from 9 percent to 77 percent.71
This problem is more pressing in low-income
countries with less government revenue to spend
on multiple priorities. However, the cost of public
data systems is modest relative to that of other
government functions. Decision-makers in budget
offices may not fully understand how much funding
is needed to produce high-quality data or lack the
incentives to prioritize data. How well public data
systems are funded is thus also a matter of high-level
government officials recognizing the value of public
intent data and offering leadership to encourage col-
lection of them.72
A key factor in such an effort is the
perceived relevance and credibility of public intent
data and its producers.73
Another reason for lack of funding for data is the
absence of a benchmark guiding how much govern-
ments should spend, unlike for other areas of gov-
ernment spending. For example, the Education 2030
Framework for Action urges countries to allocate at
least 4–6 percent of GDP or at least 15–20 percent
of their total public expenditure to education. The
Abuja Declaration urges countries to spend at least
15 percent of their annual budget to improve the health
sector.74
No similar guidelines are found on data.
Underinvestment by donors. Donors also invest rel-
atively little in public intent data. The share of total
official development assistance devoted to statistics
has ranged between 0.35 percent and 0.4 percent in
recent years, or US$693 million in 2018.75
The combi-
nation of national and donor contributions leaves a
funding gap of between US$100 million and US$700
million a year globally to upgrade public intent data
systems, depending on the scope of improvements.76
Misalignment of investment priorities. Beyond the size of investments in public intent data, how donors invest matters as well. With insufficient government funding of data and with donors stepping in to fill needs, the risk is that donor priorities will be funded at the expense of national priorities and that donors, instead of national stakeholders, will become the main clients of NSOs.77
Figure 2.7 Most countries do not fully fund their national statistical plans
[Chart: share of countries with a fully funded plan (%), by income group, region, and FCS status]
Source: WDR 2021 team calculations, based on indicators collected by the Partnership in Statistics for Development in the 21st Century (PARIS21) that are also used as Statistical Performance Indicators (World Bank, http://www.worldbank.org/spi). Data at http://bit.do/WDR2021-Fig-2_7.
Note: Having a fully funded national statistical plan under implementation is Sustainable Development Goal Indicator 17.18.3. FCS = fragile and conflict-affected situations.
Because investments in data tend to be small,
donors have limited incentives to make longer-term
commitments that strengthen data systems such
as technical capacity, research and development,
infrastructure, or recording of administrative data.
Instead, many investments prioritize the production
of new data or specific survey efforts such as a one-
off survey on a specific topic.78
In particular, donor
priorities skew toward monitoring and international
reporting.79
Although most national governments
subscribe to international reporting, there is argu-
ably a more immediate need for frequent and highly
geographically disaggregated data and strong admin-
istrative data systems for the effective day-to-day
functioning of government.80
Within the development community, lack of
donor coordination can undermine public intent
data systems, leading to duplication of and parallel
systems for data collection. Each project uses its own
set of indicators to report results instead of relying on
and strengthening country data systems.81
Such situ-
ations can arise if donors need to fulfill their internal
reporting requirements or are suspicious of the accu-
racy of government-reported data.
Lack of funding is also an issue for citizen-
generated data. Interviews with representatives from
NGOs in Argentina, Kenya, and Nepal revealed that
lack of funding can constrain the collection of citizen-
generated data.82
Similarly, although the cost of sen-
sors has steadily fallen over the last few years, the
costs of equipment, deployment, and transmission, as
well as the lack of off-the-shelf tools for environments
facing resource constraints, are still major barriers to
the generation and use of machine-generated data,
especially in smallholder agriculture.83
Deficiencies in technical capacity
Data gaps are also persisting because of underquali-
fied, understaffed, and underpaid data producers and
lack of technology and infrastructure.
Lack of qualified staff, proper staff remuneration, and
career incentives. The gaps in public intent data also
stem from limited technical capacity, especially in
lower-income countries—a result in part of the lim-
ited and misaligned resources previously discussed.
A shortage of skilled data scientists, statisticians, and
economists across public data systems is a critical constraint
on the performance of the data producers and
the production of data, especially at a time when data
from digital sources are becoming more important.
The absence of key personnel in strategic positions
who have a commitment to data is especially costly
because of the importance of relationships between
ministries and NSOs and with civil society as a cata-
lyst for the flow of data and information.84
According to a global survey of NSOs conducted
by PARIS21, after a shortage of funds the biggest
obstacle to countries’ successful development of
capacity is lack of skilled staff to implement pro-
grams.85
In a list of 15 goals for capacity development,
86 percent of African NSOs selected strengthening
human resources as one of their five most important
goals, higher than any other category. It is particularly
difficult for NSOs to recruit new staff with the skills
needed to achieve their objectives. When reporting
the most frequent methods of human resource devel-
opment, only 7 percent of NSOs reported recruitment
of staff with new skill sets, and most of these NSOs
were in high-income countries.86
Recruitment and retention of skilled staff are
difficult without competitive pay scales and career
tracks.87
Consultations with NSOs revealed that
differences in pay scales across government entities
especially make it difficult for NSOs to recruit skilled
staff. In Ethiopia, the Central Statistical Agency fol-
lows civil service rules and regulations for remunera-
tion of staff, whereas research institutes and universi-
ties have their own rules and regulations.
A common challenge for other government agen-
cies that produce data is that they lack designated
data scientists or statisticians. This is particularly
problematic when other agency staff may lack the
time and capacity to make better use of the data col-
lected within their institution.88
Lack of technology, software, and infrastructure.
Even when producers of public intent data have
staff with the skills needed to collect, process, and
disseminate those data, they often lack the techno-
logical infrastructure to be effective in their work.
Constraints in technology and information tech-
nology (IT) infrastructure compound constraints
in technical capacity. For example, as part of the
Global COVID-19 Survey of NSOs, many NSOs in
low- and middle-income countries noted their need
for software to collect data remotely to meet new
data demands.89
In the PARIS21 survey, the option
most selected to achieve priorities for a national
statistical system in the medium term is acquiring
up-to-date technology and infrastructure.90
Tech-
nological shortcomings also constrain the ability
of individuals to produce data themselves because
many types of citizen-generated data rely on phone
or web technologies.91
Deficiencies in governance
In addition to shortages of skills and funding, various
failures and problems with data governance impede
the potential of public intent data from being realized.
At the national level, clear institutional mandates and
good coordination among the data-producing agen-
cies are critical for the exchange, interoperability, and
timely publication of data.92
In practice, exchanges of
data across ministries and between ministries and
NSOs and beyond are rare, even in well-resourced and
high-capacity environments.93
The absence of clear
mandates, responsibilities, and incentives to effec-
tively coordinate data production and data exchanges
can obstruct collaboration and lead to duplication of
data-gathering efforts.94
Deficiencies in the legal framework. The legal frame-
work governing data production and data exchanges
is a common barrier. Outdated statistical laws can
make it difficult for NSOs and data-producing agen-
cies to operate and collaborate effectively in light of
recent changes in the data landscape, such as the pro-
liferation of new data types, sources, and producers.
In Chile, the National Institute of Statistics (INE) has
had difficulties accessing key data from other public
institutions in a timely fashion, primarily because
the national statistical law is not sufficiently clear in
authorizing INE’s access to statistical information.
When the law was passed in 1970, data exchanges were
not a concern. Although a process to modernize the
law has been at the forefront of political discussions
for a decade, a revised version has yet to be formally
implemented. This issue is a concern more generally
because the older the national statistical law, the lower
is statistical performance in general and data openness
in particular at any country income level (figure 2.8).
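The note to figure 2.8 reports a regression coefficient on the age of the statistical law, controlling for GDP per capita. The sketch below shows what such a specification can look like: an ordinary least squares fit of a performance score on law age and GDP per capita, so that the age coefficient is estimated holding income fixed. The data are synthetic and generated from an assumed relationship, and entering GDP per capita in logs is also an assumption; neither reproduces the report's data or its exact model.

```python
import math


def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y.

    Each row of X starts with 1.0 for the intercept. Uses Gaussian
    elimination with partial pivoting, so no external libraries are needed.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


# Synthetic countries: (age of statistical law in years, GDP per capita in US$).
countries = [(5, 800), (12, 15000), (20, 3000), (33, 45000),
             (41, 1200), (50, 9000), (64, 2500), (70, 30000)]

# Assumed relationship, used only to generate the synthetic scores:
# score = 40 - 0.5 * age + 4 * ln(GDP per capita)
scores = [40 - 0.5 * age + 4 * math.log(gdp) for age, gdp in countries]

X = [[1.0, age, math.log(gdp)] for age, gdp in countries]
intercept, age_coef, gdp_coef = ols(X, scores)
print(round(age_coef, 3), round(gdp_coef, 3))  # -0.5 4.0
```

Because the synthetic scores contain no noise, the fit recovers the assumed coefficients exactly; with real country data the estimate also carries sampling error, which is what the p-value in the figure note summarizes.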
Other important elements of the legal framework
are regulations governing data protection and the
right to information. When these safeguards are
lacking or weak, data exchanges can entail serious
risks to data protection.95
Lack of comprehensive data
protection regulations is a problem in many parts of
the world.96
A review of African countries found that
only 28 percent had procedures in place to ensure
deidentification of data before publication.97
Without
a requirement to share data and guidance on how
to treat confidential information, any risk-averse
government employee would face few incentives to
share data, especially confidential data, considering
the possibly high costs should confidentiality be
breached. The absence of comprehensive data protec-
tion legislation can also facilitate misuse of data such
as for political control or discrimination.98
Independence of the NSO. The legal, financial, and institutional independence of the NSO is an important element of a successful public intent data system, especially its data quality and openness.99 The independence of producers of public intent data also reinforces the credibility of and trust in the data and its producers, which encourages data use in both government and civil society.100
Figure 2.8 The older a country’s statistical laws, the lower is its statistical performance and the less open are its data
[Chart, panel a. Statistical performance: Statistical Performance Index versus age of statistical laws (years); panel b. Openness of data: ODIN overall score versus age of statistical laws (years); countries marked by income group (high-income, upper-middle-income, lower-middle-income, low-income)]
Sources: WDR 2021 team, based on UNSTATS (Statistics Division, Department of Economic and Social Affairs, United Nations), UNSTATS (database), https://unstats.un.org/unsd/dnss/cp/searchcp.aspx; Partnership in Statistics for Development in the 21st Century (PARIS21), https://paris21.org/knowledge-database?keyword=type%5B%5D=Statistical-Legislation-Country-Documentsdate-from=date-to=page=; World Bank, World Development Indicators (database), https://databank.worldbank.org/source/world-development-indicators. Data at http://bit.do/WDR2021-Fig-2_8.
Note: In panel a, the regression coefficient on age, controlling for GDP per capita, is –0.48, p < .01; in panel b, –0.39, p < .01. For the Statistical Performance Indicators, see World Bank, Statistical Performance Indicators (database), http://www.worldbank.org/spi. For the Open Data Inventory (ODIN), see Open Data Watch, https://odin.opendatawatch.com/.
An indicator capturing the independence of NSOs
in all African nations is included in the Ibrahim Index
of African Governance.101
The indicator measures the
institutional autonomy and financial independence
of an NSO. A perfect score indicates that an NSO is
able to publish data without clearance from another
government branch and has sufficient funding to do
so. A higher score on the NSO independence indicator
is highly correlated with statistical performance as
captured by the World Bank’s SPI (figure 2.9, panel
a). In 2019 the average score on NSO independence
was 34 out of 100, with low-income African countries
scoring below average. These findings illustrate
that NSO independence is precarious, particularly
in lower-income countries. Anecdotes of attacks on
NSO independence around the world suggest that
fragile NSO independence is not limited to the Afri-
can context.102
For example, in 2007 the Argentine
government began interfering with the independence
of Argentina’s NSO, the National Institute of Statistics
and Censuses (INDEC). The effort initially focused
on the consumer price index and later expanded to
other official statistics, casting doubt especially on
reported inflation statistics. Recognizing the harmful
effects of these measures, by 2015 a new government
had undertaken efforts to rebuild the institute, and
INDEC resumed the delivery of trustworthy statistics
with transparency and complete adherence to inter-
national principles.103
A government’s interest in having an independent
national statistical system can be affected by several
competing factors. On the one hand, a government
may have a vested interest in curtailing statistical
independence and the production and dissemination
of reliable data, fearing these could expose poor policy
decisions and performance, dilute power, and increase
public scrutiny and pressure.104
In this case, lack of
independence and of reliable data
would make it harder to hold governments accountable.105
On the other hand, an independent statistical
system producing reliable data in a transparent
fashion best informs government decision-making
and increases citizens’ trust in government data and
public institutions in general.106
Such transparency
can also facilitate favorable capital market and invest-
ment conditions and foster GDP growth.107
Finally,
international cooperation can boost statistical inde-
pendence and data transparency when adherence to
standards of data quality and the independence of
their producers is required for accession to interna-
tional organizations or agreements. An example is
Colombia’s successful bid to join the Organisation for
Economic Co-operation and Development (OECD).108
Civil society performs a vital function in demanding transparency and holding government accountable. Citizen-generated data can be used to challenge official statistics when their accuracy or impartiality is in question. A free and empowered press is a critical check on government power in general and on government interference with statistical independence and data transparency in particular. Greater press freedom, as measured in the World Press Freedom Index compiled by Reporters Without Borders,109 is highly correlated with statistical performance as well as with statistical independence, regardless of a country’s size or income level (figure 2.9, panel b).
Figure 2.9 Greater NSO independence and freedom of the press are positively correlated with better statistical performance
[Chart, panel a. NSO independence and statistical performance: Statistical Performance Index versus NSO independence score, R² = 0.22; panel b. Freedom of the press and statistical performance: Statistical Performance Index versus RSF World Press Freedom Index, R² = 0.23]
Sources: NSO independence score: Mo Ibrahim Foundation, Ibrahim Index of African Governance (database), http://mo.ibrahim.foundation/iiag/; World Press Freedom Index: Reporters Without Borders, 2020 World Press Freedom Index (database), https://rsf.org/en/ranking_table. Data at http://bit.do/WDR2021-Fig-2_9.
Note: The x’s represent countries. Panel a shows only African countries, and panel b shows all countries with data available. The NSO independence score ranges from 0 to 100. The World Press Freedom Index ranges from 100 to 0; lower values imply greater press freedom. For the Statistical Performance Index, see World Bank, Statistical Performance Indicators (database), http://www.worldbank.org/spi. NSO = national statistical office; RSF = Reporters Without Borders.
Deficiencies in data demand
Even when high-quality data are available and acces-
sible, they must be put to an appropriate use to have
an impact on development. As such, lack of data use
is blocking the path to development.
Low levels of data literacy. Several barriers to data use
remain. Low levels of data literacy among both policy
makers and civil society are one barrier.110
Potential
data users need to have both a conceptual understand-
ing of how data can inform policy questions and the
technical skills to extract the relevant information
from data. An analysis of the use of statistics in news
articles in 32 countries in four languages revealed con-
siderable scope for journalists to improve their critical
engagement with statistics—and that finding is likely
to apply to civil society at large.111
For policy makers as
well, data literacy is frequently identified as a barrier
to data use.112
Among the general population, compara-
bly low literacy and numeracy rates in lower-income
countries fundamentally diminish the pool of poten-
tial data users.113
Lack of incentives for and interest in data use. Even
when policy makers have the skills to use data, they
may not be interested in exercising those skills
because they do not attach value to data. Accordingly,
another major factor affecting demand for public
intent data is lack of incentives to use the data.114
When political leaders exhibit a commitment to data
use, they can generate expectations for civil servants
to rely on data more frequently and create incentives
for accountability. “Political champions,” as well as
changes in administration or individual government
officials, often create opportunities for data-driven
policy making.115
A data-literate society plays a major
role in creating these political commitments to data
use by demanding—and rewarding—the justification
of policy decisions with data.
Low trust in the quality of public intent data. Another
reason for lack of data use is the often low trust in the
quality of public intent data. Although data users can
check for signs of internal coherence, the accuracy of
data cannot be inferred from the data alone, and incorrect
statistics can take years to be detected, if they are
detected at all.116
A survey of data producers and users
in 140 countries found that NSO officials have much
greater confidence in the quality of national statistics
than ministry officials have.117
Lack of infrastructure to access and use the data. A
final reason for lack of data use is related to the infra-
structure needed to access and use data. For example,
internet access is key to obtaining data, but penetra-
tion rates are lower in poorer countries. The exclusive
sharing of data via online channels may exclude large
shares of potential data users who are hampered by
limited internet connectivity.118
And certain users may
be unaware that data are available for use.119
Lack of
internet connectivity, reliable power, and data centers
are also major challenges in the use of Internet of
Things systems and sensor data.120
Use of public intent data by a diverse group of actors often translates into greater demand for high-quality data. The rise in demand can drive investment in data and capacity, setting off a virtuous cycle of increasing data demand and supply (figure 2.10). For example, government ministries’ reliance on and demand for high-quality data have been associated with NSOs in Latin America exhibiting higher capacity.121 In the same region, demand for and interest in accurate and high-quality statistics in civil society, academia, the media, and the private sector have led to better funding, autonomy, and capacity in national statistical systems.122 Conversely, countries with a low supply of data are likely to use data less, creating a vicious cycle of data production and use. In general, countries can benefit from assessing whether their constraints are primarily on the supply side or the demand side for data. They can then use such an assessment to prioritize data-related policies and maximize their return on development.123
Figure 2.10 Data supply and demand can generate either virtuous or vicious cycles of data production and use
[Diagram: two-by-two grid of data demand (high or low) against data quality and supply (high or low). High demand with low quality and supply: data supply–constrained countries; high demand with high quality and supply: virtuous cycle countries; low demand with low quality and supply: vicious cycle countries; low demand with high quality and supply: data demand–constrained countries]
Source: Adapted from Sanga (2013).
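A minimal sketch of the supply-versus-demand assessment, following the four quadrants of figure 2.10. The 0–100 scores, the threshold of 50, the example countries, and the mapping from each quadrant to a policy emphasis are all hypothetical; the report does not prescribe a scoring method.

```python
from dataclasses import dataclass


@dataclass
class CountryDataProfile:
    name: str
    supply_score: float  # hypothetical 0-100 measure of data quality and supply
    demand_score: float  # hypothetical 0-100 measure of data demand and use


def classify(profile: CountryDataProfile, threshold: float = 50.0) -> str:
    """Place a country in one of the four quadrants of figure 2.10."""
    high_supply = profile.supply_score >= threshold
    high_demand = profile.demand_score >= threshold
    if high_supply and high_demand:
        return "virtuous cycle"
    if high_demand:  # demand is high but supply is low
        return "supply-constrained"
    if high_supply:  # supply is high but demand is low
        return "demand-constrained"
    return "vicious cycle"


def priority(quadrant: str) -> str:
    """Hypothetical mapping from quadrant to the side policy could target first."""
    return {
        "supply-constrained": "invest in data production (supply side)",
        "demand-constrained": "build data literacy and incentives for use (demand side)",
        "vicious cycle": "address both supply and demand",
        "virtuous cycle": "sustain current investments",
    }[quadrant]


for country in [
    CountryDataProfile("Country A", supply_score=30, demand_score=70),
    CountryDataProfile("Country B", supply_score=80, demand_score=20),
]:
    quadrant = classify(country)
    print(country.name, "->", quadrant, "->", priority(quadrant))
```

A single cutoff keeps the sketch simple; a real assessment would more likely compare several indicators on each side and treat borderline cases with judgment.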
Realizing the potential of public
intent data
To maximize the impact of public intent data on
development, governments need to address the
financing, technical capacity, governance, and data
demand roadblocks. This section describes policies to
overcome these foundational challenges. Figure 2.11
summarizes some of the main policies governments
can enact, categorized by the actors and barriers they
primarily address. International organizations also
have a role to play, and spotlight 2.2 discusses how
they can contribute to addressing the key roadblocks.
Chapter 9 builds on the analysis in this section,
specifically in the domain of data governance, lay-
ing out a bold vision for an integrated national data
system. Such a system can transform the role the
public sector plays in the data modernization agenda
by incorporating public intent data alongside private
intent data, integrating the users and producers of
both, and enabling safe data exchanges.
Figure 2.11 Policies to realize the potential of public intent data
[Matrix of policies by actor and by the barrier primarily addressed: financing, technical capacity, governance, data demand]
NSOs:
• Create a target fraction of government spending or a line item in the national budget dedicated to the NSO.
• Engage recurrently with the Ministry of Finance to understand and support its data needs.
• Ensure more competitive pay scales.
• Devote more time and resources to building capacity among staff.
• Ensure that NSO independence is anchored in laws and institutional setup.
• Prevent statistical laws from becoming outdated.
• Build trust in integrity of official statistics via public release calendars and best practices in dissemination.
• Engage proactively with nongovernmental entities.
Other government agencies:
• Designate a budget line for data in each ministry and agency.
• Ensure use of produced data across the public sector for monitoring, evaluation, and more.
• Create technical units in charge of data production and use.
• Create a governmentwide approach to the salaries of statisticians and data scientists.
• Assign clear roles, mandates, and responsibilities along the data life cycle for government agencies (see chapters 8 and 9).
• Designate knowledge brokers in government agencies to champion the flow and use of data.
• Institutionalize data-intensive management practices.
Civil society and academia:
• Allocate resources to citizen-generated data collection.
• Promote data literacy in primary and secondary education.
• Enhance tertiary education in data science and statistics.
• Ensure that laws and regulations facilitate the safe dissemination of data.
• Enable citizens to engage more easily with data through open data platforms, machine readability, and data visualizations.
Political commitment: Create a broad-based political and societal agreement on the value of high-quality public intent data.
Source: WDR 2021 team.
Note: The figure summarizes policies governments can enact, categorized by the actors and barriers the policies are primarily addressing. Many policies span several actors and barriers but are placed into one box here for simplification. The role of the private sector in realizing the potential of public intent data is discussed in chapter 4. The role of international organizations is examined in spotlight 2.2. NSO = national statistical office.
A common cause of the four roadblocks just described
on the pathways to data for public policy is a lack of
political understanding and appreciation of the
value of data for policy making. Achieving
high-quality production and use of public intent data
requires an unequivocal high-level political commit-
ment to data for development, even when data do not
yield politically convenient insights. A broad-based
political and societal agreement on the value of public
intent data is the most effective way to ensure a robust
political commitment to data. Such a social contract
for data can build the trust of all participants that
they will not be harmed in the production, exchange,
and use of data. Actors from across the public sector,
private sector, civil society, and academia can play an
important role in demanding and encouraging agree-
ment. One mechanism for formulating such broad
agreement and formalizing a commitment to data
is confirming the importance of data in countries’
national development plans. Another mechanism
is formulating a national data strategy—a topic dis-
cussed in greater detail in chapters 8 and 9.
Financing needs: Strengthening and
sustaining financial resources for data
producers
Most low-income and lower-middle-income countries
severely underspend on data. Securing sustainable
financing is an enduring struggle for data producers
and users. To reap the full value of data for develop-
ment, governments must raise current spending
levels drastically. At the same time, it is painfully hard
to obtain and benchmark how much governments are
spending on data. Thus one priority is to improve the
statistics on government spending on data.
One way to increase the priority given to financ-
ing of data is to establish a target (percentage) for
the government expenditure on the national sta-
tistical system. Such a target can be derived with a
view toward the resources needed to fully fund the
national statistical plan or be based on the spending of
peer countries that have achieved sufficient funding.
If a government commits to such a target through a
national development plan or through other means, it
arms data producers during later budget negotiations.
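To make the peer-based option concrete, the sketch below derives a target share from the spending of peer countries and converts it into an annual funding gap. It is an illustration under stated assumptions, not a method prescribed by the report: using the median peer share as the target, and all figures in the usage lines, are hypothetical.

```python
from statistics import median


def peer_based_target_share(peer_shares: list[float]) -> float:
    """Target share of government expenditure for the national statistical
    system, assumed here (for illustration) to be the median share spent by
    peer countries judged to be sufficiently funded."""
    if not peer_shares:
        raise ValueError("at least one peer country is required")
    return median(peer_shares)


def funding_gap(total_expenditure: float, current_spending: float,
                target_share: float) -> float:
    """Additional annual funding needed to reach the target share.
    Returns 0.0 if current spending already meets or exceeds the target."""
    target_amount = target_share * total_expenditure
    return max(0.0, target_amount - current_spending)


# Hypothetical figures (in millions), not drawn from the report.
peers = [0.0020, 0.0030, 0.0025]  # peer shares of total government expenditure
share = peer_based_target_share(peers)
gap = funding_gap(total_expenditure=4_000.0, current_spending=6.0,
                  target_share=share)
print(f"target share: {share:.2%}, funding gap: {gap:.1f} million")
```

The median is used rather than the mean so that a single unusually well-funded peer does not inflate the target; a plan-based target (the cost of fully funding the national statistical plan) could be computed alongside it and compared.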
Another way to implement stable and transparent
government financing is to insert a line item in the
national budget dedicated to the NSO. The absence of
such a budget line has been a problem even for high-income
countries. For example, the European Union’s
statistical agency, Eurostat, recently saw its budget
line merged into an overarching digitization and
modernization budget, raising fears that funding for
statistical needs could be at risk. Conversely, one of
the biggest steps in ensuring the independence of
the United Kingdom’s Office for National Statistics in
2007 was giving the office authority over how it uses
its budget. Similarly, data-producing ministries and
other government agencies could each receive a des-
ignated budget line for the production, processing,
management, and safe sharing of the administrative
data they produce. Other investment priorities should
be closing existing coverage gaps in vital statistics
and other registers and including populations that
are hard to reach.
The Ministry of Finance has a special role to play
as the most influential actor in budget negotiations
for government-financed producers of public intent
data. Recurrent engagement with, and consequently
systematic use of, public intent data and official
statistics by the Ministry of Finance is also likely to
improve the funding for data producers and the NSO
in particular.124
As documented in the examples ear-
lier in this chapter, it is important that the Ministry
of Finance understand that investing in data may
improve budgets through increased revenue collec-
tion and elimination of duplication in beneficiaries,
among other things.
Stable government financing can also be secured
by ensuring that data play a role in government
programs and projects. When government projects
have numerical targets, data management and data
analysis are a must. Where relevant, the legislature
could require that government program budgets be
supported or justified by evidence, necessitating the
use of data and therefore funding for data. Although
linking funding for data to monitoring government
targets may also create disincentives in producing
accurate data, resisting such disincentives must be at
the core of an NSO’s mission to ensure credibility of
and trust in official statistics.
Sometimes the financing for data is sufficient, but
the resources need to be better spent. Government
funding of citizen-generated data, for example, can
complement that of other public intent data and be a
less costly alternative. But doing so requires that civil
society data platforms have sufficient capabilities
and resources for community outreach, coordination,
monitoring of data collection, and quality assess-
ments of the data.125
Technical capacity needs: Investing in
human capital for production of public
intent data
Once more and better funding is provided, invest-
ment in technical capacity is a top priority. Such an
effort should start with human capital: investing in
statisticians, data scientists, and applied economists
across the public sector and in data literacy in the
population at large. These investments would pro-
mote demand for public intent data and bolster the
credibility of and trust in public intent data producers.
These goals could be achieved through a combination
of education and training initiatives.
Meanwhile, the public sector at large and the
NSO in particular should seek qualified statisticians,
applied economists, and data scientists. One way of
doing so is to create a governmentwide approach to
the formulation of salary scales and remuneration
of the positions across the public sector, including in
the NSO, the central bank, and line ministries. Such
an approach could minimize differentials in salary
scales within and across government agencies and
subsequently create an opportunity to adopt more
competitive salary scales to attract and retain talent.
A strategic area in which NSO capabilities in low-
and middle-income countries could be strengthened
is research on the development of improved methods
and standards for data production. The capacity to
conduct such methodological research is critical to
improving the availability, quality, and usability of
public intent data. NSOs could establish a business
line on experimental statistics, which may serve as
an avenue for participating in cutting-edge, multi-
disciplinary research efforts centered on integrating
public intent and private intent data. Low-capacity
NSOs, however, will have to strengthen, and in cer-
tain cases create, capabilities in data science and
geographic information systems. Twinning arrange-
ments between NSOs with established programs on
experimental statistics and those beginning to build
these capabilities may be one way to accelerate prog-
ress. These activities are also aligned with the call for
international organizations to sustain investments in
the search for improved methods of data collection,
curation, and analysis (see spotlight 2.2).
Beyond NSOs, data-related capabilities in min-
istries and other government agencies are often
insufficient. They could remedy the situation by first
creating technical units in charge of data produc-
tion, processing, management, and dissemination to
improve data quality. These units could also develop
ministry-specific action plans for capacity building
and should be empowered by receiving the financial,
technological, and human resources they need to ful-
fill their mandated roles in the national data system.
Their goal would be delivery of high-quality knowl-
edge disseminated in accordance with a ministry-
specific public release calendar.
Capacity building should also be pursued in a
country’s education system.126
In line with the aspi-
rations of SDG Target 4.6, primary and secondary
educational institutions should elevate foundational
numeracy and statistical literacy skills so that, like
general literacy, they are part of the fundamental
curricula. These skills would empower an informed
public of data users and create a pool of potential can-
didates for specialized data professions. In tertiary
education and data-driven academic fields, advanced
education on statistics should be enhanced in ways
that equip future technocrats with data skills that
meet policy makers’ demands.
An example at the country level is Politeknik
Statistika, a highly selective university established
by Statistics Indonesia in 1958. Politeknik Statistika
awards bachelor’s degrees, with an emphasis on
applied training in official statistics, in preparation
for statistical careers at Statistics Indonesia and the
public sector at large. Examples at the regional level
include the Ecole Nationale Supérieure de Statistique
et d’Economie Appliquée (ENSEA) in Côte d’Ivoire and
the Eastern Africa Statistical Training Center (EASTC)
in Tanzania.
Degree and certificate programs with a data
science theme, including those offered online, can
facilitate development of statistical capacity in
techniques that cut across statistics and computer
science, such as artificial intelligence and machine
learning. A noteworthy example is the Think Data
Science Program that was launched in 2019 by the
Palestinian Central Bureau of Statistics (PCBS), in
partnership with the Arab American University
in West Bank and Gaza. As part of this program,
students have to complete a graduation project at
the PCBS, which gets accredited by the Ministry of
Higher Education.
Finally, investments in human capital should be
accompanied by investments in physical infrastruc-
ture, IT platforms, and software capabilities (see
chapter 5).
Governance needs: Making laws and
regulations conducive to production and
use of quality data
Effective use of public intent data depends on having
in place a governmentwide national data strategy or
another high-level document that outlines the roles,
responsibilities, and mandates of various govern-
ment agencies. Such arrangements are discussed in
detail in chapters 8 and 9.
The NSO must be truly independent, impar-
tial, and nonpolitical. Its independence should be
anchored in laws and an institutional setup that
curtails political interference in official statistics
and other public data products.127
Whether placing the NSO under the executive branch
of government leaves it open to attacks on its
independence is debatable. On the one hand, it is important
that the NSO be positioned to inform public debate
and policy. But this may be difficult to achieve if the
NSO is administratively separated from other parts
of the government and does not maintain a close
relationship with influential ministries such as the
Ministry of Finance or Treasury and the Ministry of
Commerce or Industry. On the other hand, reporting
to a specific ministry or an individual as part of the
executive branch leaves the NSO vulnerable to being
questioned, pressured, or otherwise influenced in
its involvement with politically sensitive statistical
activities.
Another way to safeguard against the politiciza-
tion of data is by making deidentified public intent
datasets publicly available and accessible. Ensuring
the creation and dissemination of deidentified public
intent datasets is partly a political task and partly a
technical one.
On the political front, the NSO and other govern-
ment agencies must promote open data for develop-
ment. These agencies should ensure that statistical
laws and regulations permit the public dissemination
of deidentified public intent data—both aggregated
data and microdata. They should also actively engage
with data users to cultivate a shared understanding
of the value of reusing open data for research and for
design and evaluation of public policy. Administrative
data in particular are often not accessible beyond the
ministry collecting the data.
On the technical front, safeguarding the confi-
dentiality of subjects of public intent data production
is an unconditional requirement. Confidential data
include both personally identifiable information and
the geographic coordinates of data subjects, includ-
ing communities, households, facilities, and estab-
lishments. Although best practices, standards, and
tools for microdata deidentification are available,128
the risk of disclosure is increasing with enhance-
ments in the interoperability of public intent data.
These trends call for continued improvement of
deidentification techniques. Building capabilities
within technical units of ministries and NSOs in the
use of analytical tools to remove sensitive informa-
tion, spatially deidentify microdata, and deal respon-
sibly with the risk of disclosure will also foster a
better culture of open data.
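As one illustration of the analytical tools mentioned above, the sketch below applies k-anonymity, a widely used check in statistical disclosure control; the report does not prescribe this particular technique. A dataset is k-anonymous if every combination of quasi-identifiers (attributes an outsider could link to other sources) is shared by at least k records. The records, field names, choice of quasi-identifiers, and 10-year age bands are all hypothetical, and spatial deidentification of coordinates is not covered.

```python
from collections import Counter

# Hypothetical survey microdata; field names and values are illustrative only.
records = [
    {"region": "North", "age": 34, "sex": "F", "income": 120},
    {"region": "North", "age": 37, "sex": "F", "income": 95},
    {"region": "North", "age": 52, "sex": "M", "income": 80},
    {"region": "North", "age": 58, "sex": "M", "income": 150},
    {"region": "South", "age": 23, "sex": "F", "income": 60},
    {"region": "South", "age": 29, "sex": "F", "income": 70},
]

# Assumed quasi-identifiers: attributes that could be linked to outside data.
QUASI_IDENTIFIERS = ("region", "age", "sex")


def _group_keys(rows, quasi_identifiers):
    """Combination of quasi-identifier values for each row."""
    return [tuple(row[q] for q in quasi_identifiers) for row in rows]


def k_anonymity_level(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing identical quasi-identifiers."""
    if not rows:
        raise ValueError("no records supplied")
    counts = Counter(_group_keys(rows, quasi_identifiers))
    return min(counts.values())


def records_below_k(rows, quasi_identifiers, k):
    """Rows whose quasi-identifier combination is shared by fewer than k rows."""
    keys = _group_keys(rows, quasi_identifiers)
    counts = Counter(keys)
    return [row for row, key in zip(rows, keys) if counts[key] < k]


def coarsen_age(rows, width=10):
    """Return copies with exact age replaced by the lower bound of its band."""
    return [{**row, "age": (row["age"] // width) * width} for row in rows]


print(k_anonymity_level(records, QUASI_IDENTIFIERS))  # every record is unique
coarse = coarsen_age(records)
print(k_anonymity_level(coarse, QUASI_IDENTIFIERS))   # each group now has 2 rows
```

Coarsening exact age into bands removes the unique combinations without deleting records, which is the kind of trade-off between disclosure risk and analytical detail that deidentification involves. k-anonymity alone does not protect a sensitive value such as income if every member of a group shares it, so in practice it is combined with other checks.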
Data demand needs: Expanding the use of
public intent data
The precondition for the widespread use and reuse
of data is greater data literacy among the citizenry
at large and government decision-makers. The
integrity of and public trust in official statistics are
also critical to the demand for data. The integrity of
official statistics is closely tied to the perceived inde-
pendence and trustworthiness of the NSO. Existing
best practices can ensure integrity of and trust in
the computation of official statistics and the timing
of their release, even in the face of political pres-
sures. A first set of practices centers on effective
outreach and communication about NSO products.
These practices include publishing a release calen-
dar and providing a public explanation of potential
deviations from release dates, as well as publicly
disseminating meticulous documentation and metadata
allowing findings to be replicated. Other best
practices include refraining from participating in
national politics and carrying out periodic outreach
efforts to cultivate public understanding and accep-
tance of the importance of an independent statisti-
cal agency.
NSOs could also increase use of and demand for
their data by engaging proactively with and listening
to stakeholders in government, academia, the private
sector, CSOs, and the media.129
These engagements
may have multiple objectives such as disseminating
statistical outputs, understanding and responding
to user needs, exploring links between NSO data
products and other data, and strengthening statis-
tical literacy. Statistics Canada, Statistics Indonesia,
and Mexico’s National Institute of Statistics and
Geography (INEGI) have engaged in recurrent
training of journalists in print, radio, television, and
digital media outlets on official statistics. INEGI
has expanded its work program on data and statis-
tics related to crime and victimization in Mexico in
response to the growing demands from policy mak-
ers and data users. Elsewhere, the Palestinian Central
Bureau of Statistics regularly disseminates official
statistics on important international observances,
such as International Workers’ Day and Interna-
tional Women’s Day.
Closely involving civil society in the use and
production of data is critical. This involvement can
be achieved by establishing advisory boards com-
posed of independent technical experts who can
help prepare national statistical strategies in view of
the needs of all users—not only the needs of various
government agencies.
Data visualization is another way in which NSOs
could increase the reach of official statistics and the
public’s understanding of them. It does little good
to achieve greater mastery of advanced analytics
without also ensuring that the policy makers design-
ing and enacting interventions that improve lives
understand and appreciate the value added by data.130
Distilling complex phenomena into compelling visu-
als and narratives for broad audiences is a timeless
idea that can effectively influence public debate and
policy making (for a pioneering example, see box 1.2
in chapter 1).
From the perspective of government ministries
and agencies, one way to jump-start data use in
planning and policy making is through the institu-
tionalized adoption of data-intensive management
practices. In Rwanda, as part of the nationwide
Imihigo performance contracts launched in 2006,
mayors commit to setting development targets. Each
target is subsequently evaluated and ranked by the
national government with respect to its achievement
and whether it was monitored appropriately.131
Man-
agement of these contracts not only requires large
amounts of data to evaluate performance, but, more
important, puts data on development outcomes at the
center of the policy discourse.132
When low data literacy or limited appreciation of data
is a barrier to data use, knowledge brokers can
facilitate data use in the public sector. A knowledge
broker points policy makers to the relevant data and
creates value through collaboration.133
The important
role of knowledge brokers is highlighted by evidence
from a survey conducted by AidData: policy makers
reported that they learn about sources of data primar-
ily through personal interactions.134
The role of knowledge broker can be fulfilled by
government officials and by outsiders. Central ana-
lytical units and technical staff in line ministries can
serve as intermediaries for NSOs seeking to reach
senior officials and increase data use, presenting data
in both technical and nontechnical ways tailored to
the needs of decision-makers.135
Another useful tech-
nique is joint analytical exercises by the government
and researchers. Collaboration between external
researchers and policy makers is a major facilitator of
the use of evidence and data.136
If governments address these financing, human
capital, governance, and data demand roadblocks,
the value of public intent data can be maximized.
Chapter 9 discusses sequencing of the required
government interventions, placing such activities
within an integrated national data system. Another
way in which data can lead to better lives is via
the private sector. That is the topic of the next
chapter.
Notes
1. Unfortunately, in some contexts this scenario is not too
far from reality. For example, Das and Hammer (2007)
found that doctors in New Delhi often perform only a
fraction of the recommended examinations and tests
when patients present with common yet dangerous
health conditions.
2. As just one example, in Ethiopia a 2016 study by Rog-
ger and Somani (2018) surveying 1,831 officials of 382
organizations spanning three tiers of government
revealed officials’ significant lack of knowledge about
their area of work. Half thought that their district’s
population was at least 50 percent larger or smaller
than it was. Government staff in the educational sector
were on average 38 percent off when estimating pri-
mary enrollment figures.
3. Cameron et al. (2019).
4. See chapter 1 for more information on the distinction
between public intent data and private intent data,
chapter 3 for a discussion of private intent data, and
chapter 4 for a discussion of how both kinds of data can
be repurposed.
5. See Jolliffe et al. (forthcoming) for a lengthier discus-
sion of these 12 features of public intent data and exam-
ples of how they can generate value for development.
6. World Bank (2018d).
7. SDSN TReNDS (2018b); SSEE (2014).
8. Hallegatte et al. (2017).
9. SDSN TReNDS (2018a).
10. J-PAL (2018).
11. Hjort et al. (2019).
12. Arezki et al. (2020), for example, show that imprecise
definitions of employment in the Middle East and
North Africa blur the lines between unemployment
and informality and distort the role of women and
rural areas in national labor markets.
13. The World Bank LSMS team provided the number of
countries in which LSMS-supported survey data pro-
duction took place from 2011 to 2020.
14. Abay et al. (2019); Arthi et al. (2018); Carletto, Gourlay,
and Winters (2015); Carletto, Savastano, and Zezza
(2013); Carletto et al. (2017); De Weerdt, Gibson, and
Beegle (2019); Desiere and Jolliffe (2018); Dillon et al.
(2019); Gaddis et al. (2019); Gourlay, Kilic, and Lobell
(2019); Kilic et al. (2017, 2018).
15. World Bank (2016a, 2017a).
16. The EU uses a Nomenclature of Territorial Units for
Statistics, known as NUTS, for the purpose of allocat-
ing funds. Many EU countries have a hierarchy of three
NUTS levels. The second level, NUTS 2, is used for allo-
cation of funds. In the accompanying text, NUTS 2 is
referred to as areas.
17. Government of Croatia (2019).
18. Open Data Watch (2015b).
19. Data2x (2019).
20. McCluskey and Huang (2019) and unpublished notes
shared with the WDR 2021 team. The 30 percent refers
to own-source revenue collection—that is, the part of
the revenue collection that the cities themselves over-
see in contrast to revenue they receive from national
authorities and more.
21. World Bank (2018c). See also World Bank (2020b).
22. Roseth, Reyes, and Amézaga (2019) and references cited
therein provide evidence of an up-to-date census gen-
erating savings to the government many times its cost.
The value of public intent data to the private sector is
discussed in spotlight 3.1 and elsewhere.
23. The SDG on clean water and sanitation relies on a
mix of household surveys, population and housing
censuses, and administrative data (SDSN 2015). Earth
observation data are used for the SDGs on sustainable
cities and communities, life below water, life on land,
and more (Anderson et al. 2017). Citizen-generated data
are often used when government data are missing and
to verify government data (Lämmerhirt et al. 2018). In
the Philippines, for example, the NSO identified more
than 80 relevant SDG indicators where data are missing
and CSOs could provide inputs through community-
based monitoring systems (PARIS21 and PSA 2020).
24. MPPN (2017).
25. Yin et al. (2020). See World Air Quality Index Project,
World’s Air Pollution: Real-Time Air Quality Index
(database), https://waqi.info/, and OpenAQ, OpenAQ
(database), https://openaq.org/, for publicly available,
real-time data from air quality monitoring stations
around the globe, including those in China.
26. Open Data Watch (2015a).
27. Bosio and Djankov (2020).
28. AFIC (2018); GPSA (2020).
29. World Bank (2017c).
30. Callen et al. (2019). Petrov, Gurin, and Manley (2016)
and Verhulst and Young (2017) contain many other
examples and channels through which open data may
lead to better development outcomes.
31. Safetipin (2016).
32. Fabregas, Kremer, and Schilbach (2019).
33. For example, in several high-profile cases researchers
have been able to reidentify individuals from publicly
available microdata, even though the data had been
published in a deidentified fashion (Heffetz and Ligett
2014).
34. Díaz (2018); Fundación Reflejos de Venezuela (2016);
Privacy International (2019). LGBTQI stands for les-
bian, gay, bisexual, transgender, queer (or questioning),
intersex.
35. Open Data Watch, ODIN (Open Data Inventory) (data-
base), https://odin.opendatawatch.com/; Open Knowl-
edge Foundation, GODI (Global Open Data Index)
(database), https://index.okfn.org/.
36. WDR 2021 team calculations based on 2019 Statistical
Performance Indicators (World Bank, Statistical Performance
Indicators [database], http://www.worldbank.org/spi).
As of December 2014, 21 countries had not completed
a census during the 2010 round of the population
and housing census (Statistics Division, Department of
Economic and Social Affairs, United Nations, World
Population and Housing Census Programme [database],
http://mdgs.un.org/unsd/demographic-social/census/index.cshtml).
37. See estimates in, for example, Roseth, Reyes, and
Amézaga (2019) and references cited therein.
38. WDR 2021 team calculations based on 2019 Statistical
Capacity Indicators (World Bank, Statistical Capacity
Indicators [database], https://datatopics.worldbank
.org/statisticalcapacity/SCIdashboard.aspx).
39. López-Vargas, Fuentes, and Vivar (2020).
40. WDR 2021 team calculations based on 2019 Statistical
Performance Indicators (World Bank, Statistical Per-
formance Indicators [database], http://www.worldbank
.org/spi).
41. Desai, Diofasi, and Lu (2018); World Bank, Global
ID4D Dataset (Identification for Development Global
Dataset) (database), https://datacatalog.worldbank.org
/dataset/identification-development-global-dataset.
42. United Nations (2019b).
43. Carr-Hill (2013); Randall (2015); Seltzer and Walker
(2020); Toulemon (2017).
44. Carr-Hill (2013).
45. Jerven (2019).
46. Where vital registration systems function well, admin-
istrative records can be used to update census pop-
ulation counts. But these systems are weak in lower-
income countries. Gaps in registration will likely leave
out more vulnerable people than the census, especially
seasonal migrants and the displaced (Dunning, Gelb,
and Raghavan 2014). The use of administrative records
from nongovernment actors can supplement official
records.
47. Maron (2015).
48. UN Women (2018). Gender-specific SDG indicators
are those that explicitly call for disaggregation by
sex or that refer to gender equality as the underlying
objective.
49. Doss, Kieran, and Kilic (2020). Administrative recording
of land titles can serve the function of documenting
asset ownership at the individual level. However, land
and property ownership registries are among the less
developed administrative recording systems globally.
According to the Global Open Data Index, these data
are available in less than one-quarter of countries, even
in high-income nations, and are rarely openly available.
50. World Bank (2017b). Advances have been made in
intrahousehold poverty estimation based on structural
models and existing household survey data—that is,
clothing expenditures for women, men, and children
(Lechene, Pendakur, and Wolf 2019). The predictions provided
by these models, however, have yet to be validated
in the context of randomized survey experiments that
collect detailed, individual-disaggregated consumption
data that can, in turn, be used to compute observed esti-
mates of intrahousehold poverty among women, men,
and children. These observed estimates can, in turn, be
compared with predictions stemming from structural
models, based on the data elicited through prevailing
approaches to household survey data collection.
51. Tiberti and Costa (2020); UN Women (2018). Similarly,
individual-disaggregated data on time use are required
to monitor SDG Target 5.4. Yet of the 84 countries
known to have conducted time use surveys in the
past, only 24 percent of them have collected data since
2010.
52. Ambler et al. (2020); Bardasi et al. (2011); Chen and Col-
lins (2014); Deere, Alvarado, and Twyman (2012); Fisher,
Reimer, and Carr (2010); Jacobs and Kes (2015); Kilic and
Moylan (2016); Kilic, Moylan, and Koolwal (2020); Kilic
et al. (2020).
53. See de Mel, McKenzie, and Woodruff (2009); de Nicola
and Giné (2014); Gibson and Kim (2010).
54. Arthi et al. (2018); Das, Hammer, and Sánchez-Paramo
(2012); Deininger et al. (2012); Gaddis et al. (2019); Kilic
et al. (2018); Wollburg, Tiberti, and Zezza (2020).
55. Sandefur and Glassman (2015).
56. Sandefur and Glassman (2015).
57. Karagulian et al. (2019).
58. Antony et al. (2020); Morawska et al. (2018).
59. Similar findings were reported in Beegle et al. (2016).
60. See Hoogeveen and Pape (2020) for more information
on such innovations. The last two poverty data points
are comparable in 60 percent of countries in FCS and
in 75 percent of low-income and Sub-Saharan African
countries.
61. FMOH (2018).
62. CTO (2018).
63. Custer and Sethi (2017); Kiregyera (2017).
64. However, under secure circumstances authorized
third-party researchers can be allowed to match individual-level
records across multiple data sources to generate
insights that rely on individual-level matching.
65. Privacy by design refers to proactively embedding pri-
vacy considerations in the design of information tech-
nology and data systems. See examples from Austria,
Estonia, and India covered in ID4D Practitioner’s Guide:
Version 1.0 (World Bank 2019b).
66. WDR 2021 team calculations based on the 2019 Statis-
tical Performance Indicators (World Bank, Statistical
Performance Indicators [database], http://www.world
bank.org/spi).
67. Katsimi and Moutos (2010).
68. Heffetz and Ligett (2014).
69. In particular, the team had discussions with the NSOs
of Canada, Chile, Ethiopia, India, Indonesia, Mexico,
the United Kingdom, and West Bank and Gaza.
70. See United Nations (2019b) for similar findings.
71. Calleja and Rogerson (2019). McQueston (2013) found
similar results.
72. Dargent et al. (2020); OECD (2017).
73. United Nations (2019b).
74. UNESCO (2016); WHO (2011).
75. PARIS21 (2020).
76. Calleja and Rogerson (2019).
77. Sethi and Prakash (2018).
78. Calleja and Rogerson (2019). National governments
also tend to prioritize covering ongoing expenses for
collecting data over onetime investments in systems.
79. Lange (2020).
80. Calleja and Rogerson (2019); Sandefur and Glassman
(2015); World Bank (2018a).
81. Sanna and McDonnell (2017).
82. Piovesan (2015).
83. Antony et al. (2020); Hosman (2014); López-Vargas,
Fuentes, and Vivar (2020); Pham, Rahim, and Cousin
(2016).
84. Allard et al. (2018).
85. PARIS21 (2018).
86. PARIS21 (2018).
87. Dargent et al. (2020).
88. Allard et al. (2018); Johnson, Massey, and O’Hara (2015).
89. Fu and Schweinfest (2020); UNSTATS and World Bank
(2020).
90. PARIS21 (2018).
91. Lämmerhirt et al. (2018).
92. OECD (2019).
93. Allard et al. (2018).
94. Calleja and Rogerson (2019); Khan, Wales, and Stuart
(2015).
95. OECD (2019).
96. Amnesty International (2019); Privacy International
(2013); United Nations (2019a).
97. Van Belle et al. (2018).
98. Amnesty International (2019); Privacy International
(2013); United Nations (2019a).
99. Independence of the national statistical system
underpins the UN Statistical Commission’s 1994
Fundamental Principles of Official Statistics. The
commission highlighted its concern for independence
in its 2015 United Nations Fundamental Principles of
Official Statistics: Implementation Guidelines (UNSTATS
2015).
100. Childs et al. (2019); Taylor (2016).
101. Mo Ibrahim Foundation, Ibrahim Index of African
Governance (IIAG) (database), http://mo.ibrahim
.foundation/iiag/.
102. Bodin (2011); Todesca (2017); Trewin (2018); von
Oppeln-Bronikowski et al. (2015).
103. Todesca (2017).
104. Hoogeveen and Nguyen (2019); Taylor (2016); World
Bank (2016b, 2017c).
105. Desiere, Staelens, and D’Haese (2016); Jerven (2014).
106. Brackfield (2012); World Bank (2018a).
107. Arezki et al. (2020); Cady (2005); Cady and Pellechio
(2006); Kubota and Zeufack (2020).
108. Dargent et al. (2020).
109. See Reporters Without Borders, 2020 World Press Freedom
Index (database), https://rsf.org/en/ranking_table.
110. World Bank (2016b, 2018a).
111. Klein, Galdin, and Mohamedou (2016).
112. Custer and Sethi (2017); Kiregyera (2017).
113. WDR 2021 team based on information in World Bank,
“Literacy Rate, Adult Total (% of People Ages 15 and
Above),” https://data.worldbank.org/indicator/se.adt
.litr.zs.
114. World Bank (2017c).
115. Manning, Goldman, and Hernández Licona (2020).
116. Hoogeveen and Nguyen (2019).
117. Sethi and Prakash (2018).
78 | World Development Report 2021
118. Custer and Sethi (2017); World Bank (2018a).
119. Custer and Sethi (2017); Kiregyera (2017).
120. ITU (2016); López-Vargas, Fuentes, and Vivar (2020);
Pham, Rahim, and Cousin (2016).
121. Dargent et al. (2020).
122. Dargent et al. (2020).
123. Scott (2005).
124. World Bank (2019a).
125. Lämmerhirt et al. (2018).
126. OECD (2017).
127. Bodin (2011); Todesca (2017); Trewin (2018); von
Oppeln-Bronikowski et al. (2015).
128. For more information on resources and tools related
to the anonymization of microdata, see World Bank
and PARIS21 Consortium, Microdata Anonymization
(database), International Household Survey Network,
PARIS21 Consortium, https://ihsn.org/anonymization.
129. Snorrason (2018).
130. Ashby (2019).
131. World Bank (2018b).
132. Krätke and Byiers (2014).
133. Head (2016); Manning, Goldman, and Hernández
Licona (2020).
134. Masaki et al. (2017).
135. Sethi and Prakash (2018).
136. Oliver et al. (2014).
References
Abay, Kibrom A., Gashaw T. Abate, Christopher B. Barrett,
and Tanguy Bernard. 2019. “Correlated Non-Classical
Measurement Errors, ‘Second Best’ Policy Inference,
and the Inverse Size-Productivity Relationship in
Agriculture.” Journal of Development Economics 139 (June):
171–84. https://doi.org/10.1016/j.jdeveco.2019.03.008.
AFIC (Africa Freedom of Information Center). 2018. “Eyes on
the Contract: Citizens’ Voice in Improving the Perfor-
mance of Public Contracts in Uganda.” 2nd Monitoring
Report, AFIC, Kampala, Uganda. https://africafoicentre
.org/download/eyes-on-the-contract-citizens-voice
-in-improving-the-performance-of-public-contracts-in
-uganda/.
Allard, Scott W., Emily R. Wiegand, Collen Schlecht, A. Rupa
Datta, Robert M. Goerge, and Elizabeth Weigensberg.
2018. “State Agencies’ Use of Administrative Data for
Improved Practice: Needs, Challenges, and Opportuni-
ties.” Public Administration Review 78 (2): 240–50.
Ambler, Kate, Cheryl Doss, Caitlin Kieran, and Simone Pas-
sarelli. 2020. “He Says, She Says: Exploring Patterns
of Spousal Agreement in Bangladesh.” Economic Devel-
opment and Cultural Change. Published ahead of print,
November 16. https://doi.org/10.1086/703082.
Amnesty International. 2019. “New Technologies and Their
Impact on the Promotion and Protection of Human
Rights in the Context of Assemblies: Submission to
the Office of the United Nations High Commissioner
for Human Rights,” Amnesty International, London.
https://www.amnesty.org/download/Documents
/IOR4012842019ENGLISH.pdf.
Anderson, Katherine, Barbara Ryan, William Sonntag,
Argyro Kavvada, and Lawrence Friedl. 2017. “Earth
Observation in Service of the 2030 Agenda for Sustain-
able Development.” Geo-Spatial Information Science 20 (2):
77–96.
Antony, Anish Paul, Kendra Leith, Craig Jolley, Jennifer Lu,
and Daniel J. Sweeney. 2020. “A Review of Practice and
Implementation of the Internet of Things (IoT) for
Smallholder Agriculture.” Sustainability 12 (9): 3750.
Arezki, Rabah, Daniel Lederman, Amani Abou Harb, Nelly
Youssef, Louis William El-Mallakh, Rachel Yuting Fan,
Asif Mohammed Islam, et al. 2020. “Middle East and
North Africa Economic Update, April 2020: How Trans-
parency Can Help the Middle East and North Africa.”
World Bank Other Operational Studies 33475, World
Bank, Washington, DC.
Arthi, Vellore, Kathleen Beegle, Joachim De Weerdt, and
Amparo Palacios-López. 2018. “Not Your Average Job:
Measuring Farm Labor in Tanzania.” Journal of Develop-
ment Economics 130 (January): 160–72.
Ashby, Deborah. 2019. “Pigeonholes and Mustard Seeds:
Growing Capacity to Use Data for Society.” Journal of the
Royal Statistical Society: Series A (Statistics in Society) 182 (4):
1121–37.
Bardasi, Elana, Kathleen Beegle, Andrew Dillon, and Pieter
Serneels. 2011. “Do Labor Statistics Depend on How
and to Whom the Questions Are Asked? Results from
a Survey Experiment in Tanzania.” World Bank Economic
Review 25 (3): 418–47.
Beegle, Kathleen, Luc Christiaensen, Andrew L. Dabalen,
and Isis Gaddis. 2016. Poverty in a Rising Africa. Africa
Poverty Report. Washington, DC: World Bank.
Bodin, Jean-Louis. 2011. “How to React When the Indepen-
dence of Statisticians and the Integrity of Statistics
Are Endangered?” Statistical Journal of the IAOS 27 (1–2):
59–69.
Bosio, Erica, and Simeon Djankov. 2020. “How Large Is Public
Procurement?” Let’s Talk Development (blog), February 5,
2020. https://blogs.worldbank.org/developmenttalk
/how-large-public-procurement.
Brackfield, David. 2012. “OECD Work on Measuring Trust
in Official Statistics.” Bulletin of the ISI 58th World Statis-
tics Congress of the International Statistical Institute, 2011
(December 2012): 3721–26. The Hague, Netherlands:
International Statistical Institute.
Buvinic, Mayra, Lorenz Noe, and Eric Swanson. 2020.
“Understanding Women’s and Girls’ Vulnerabilities to
the COVID-19 Pandemic: A Gender Analysis and Data
Dashboard of Low- and Lower-Middle Income Coun-
tries.” Data2x, United Nations Foundation, Washing-
ton, DC.
Cady, John. 2005. “Does SDDS Subscription Reduce Borrow-
ing Costs for Emerging Market Economies?” IMF Staff
Papers 52 (3): 503–17.
Cady, John, and Anthony J. Pellechio. 2006. “Sovereign Bor-
rowing Cost and the IMF’s Data Standards Initiatives.”
IMF Working Paper WP/06/78, International Monetary
Fund, Washington, DC.
Calleja, Rachel, and Andrew Rogerson. 2019. “Financing
Challenges for Developing Statistical Systems: A
Review of Financing Options.” PARIS21 Discussion
Paper 14, Partnership in Statistics for Development in
the 21st Century, Paris.
Data as a force for public good | 79
Callen, Michael, Saad Gulzar, Ali Hasanain, Muhammad
Yasir Khan, and Arman Rezaee. 2019. “Data and Policy
Decisions: Experimental Evidence from Pakistan.” King
Center on Global Development Working Paper 1055,
Stanford University, Stanford, CA.
Cameron, Grant James, Hai-Anh H. Dang, Mustafa Dinc,
James Stephen Foster, and Michael M. Lokshin. 2019.
“Measuring the Statistical Capacity of Nations.” Policy
Research Working Paper 8693, World Bank, Washing-
ton, DC.
Carletto, Calogero, Sydney Gourlay, Siobhan Murray, and
Alberto Zezza. 2017. “Cheaper, Faster, and More Than
Good Enough: Is GPS the New Gold Standard in Land
Area Measurement?” Survey Research Methods 11 (3):
235–65.
Carletto, Calogero, Sydney Gourlay, and Paul Winters. 2015.
“From Guesstimates to GPStimates: Land Area Mea-
surement and Implications for Agricultural Analysis.”
Journal of African Economies 24 (5): 593–628.
Carletto, Calogero, Sara Savastano, and Alberto Zezza.
2013. “Fact or Artifact: The Impact of Measurement
Errors on the Farm Size–Productivity Relationship.”
Journal of Development Economics 103 (July): 254–61.
Carr-Hill, Roy. 2013. “Missing Millions and Measuring
Development Progress.” World Development 46 (June):
30–44.
Chen, J. Joyce, and LaPorchia A. Collins. 2014. “Let’s Talk
about the Money: Spousal Communication, Expendi-
tures, and Farm Production.” American Journal of Agricul-
tural Economics 96 (5): 1272–90.
Childs, Jennifer Hunter, Aleia Clark Fobia, Ryan King, and
Gerson Morales. 2019. “Trust and Credibility in the
US Federal Statistical System.” Survey Methods: Insights
from the Field, February 22. https://surveyinsights.org
/?p=10663.
CTO (Office of the Chief Technology Officer, US Department
of Health and Human Services). 2018. “The State of Data
Sharing at the US Department of Health and Human
Services.” US Department of Health and Human Ser-
vices, Washington, DC. https://www.hhs.gov/sites
/default/files/HHS_StateofDataSharing_0915.pdf.
Custer, Samantha, and Tanya Sethi, eds. 2017. “Avoiding Data
Graveyards: Insights from Data Producers and Users in
Three Countries.” AidData, Global Research Institute,
College of William and Mary, Williamsburg, VA.
Dang, Hai-Anh, Mustafa Dinc, Juderica Diaz, Hiroko Maeda,
John Pullinger, Umar Serajuddin, Brian Stacy, et al.
2021a. “Measuring the Statistical Performance of Coun-
tries: An Overview of Updates to the World Bank
Statistical Capacity Index.” World Bank, Washington, DC.
Dang, Hai-Anh, John Pullinger, Umar Serajuddin, and Brian
Stacy. 2021b. “Statistical Performance Index: A New
Tool to Measure Country Statistical Capacity.” Policy
Research Working Paper, World Bank, Washington, DC.
Dargent, Eduardo, Gabriela Lotta, José Antonio Mejía-
Guerra, and Gilberto Moncada. 2020. “Who Wants to
Know? The Political Economy of Statistical Capacity
in Latin America.” Inter-American Development Bank,
Washington, DC.
Das, Jishnu, and Jeffrey Hammer. 2007. “Money for Nothing:
The Dire Straits of Medical Practice in Delhi, India.” Jour-
nal of Development Economics 83 (1): 1–36.
Das, Jishnu, Jeffrey Hammer, and Carolina Sánchez-Paramo.
2012. “The Impact of Recall Periods on Reported Morbid-
ity and Health Seeking Behavior.” Journal of Development
Economics 98 (May): 76–88.
Data2x. 2019. “Data Breaks the Silence on Violence against
Women: A Case Study of Vietnam.” Gender Data Impact
Case Study, United Nations Foundation, Washington,
DC. https://data2x.org/wp-content/uploads/2020/02
/Impact-Case-Studies-Vietnam-4P.pdf.
Deere, Carmen Diana, Gina E. Alvarado, and Jennifer Twy-
man. 2012. “Gender Inequality in Asset Ownership in
Latin America: Female Owners vs Household Heads.”
Development and Change 43 (2): 505–30.
Deininger, Klaus, Calogero Carletto, Sara Savastano, and
James Muwonge. 2012. “Can Diaries Help in Improv-
ing Agricultural Production Statistics? Evidence from
Uganda.” Journal of Development Economics 98 (May):
42–50.
de Mel, Suresh, David J. McKenzie, and Christopher M.
Woodruff. 2009. “Measuring Microenterprise Profits:
Must We Ask How the Sausage Is Made?” Journal of Devel-
opment Economics 88 (1): 19–31.
de Nicola, Francesca, and Xavier Giné. 2014. “How Accurate
Are Recall Data? Evidence from Coastal India.” Journal of
Development Economics 106 (January): 52–65.
Desai, Vyjayanti T., Anna Diofasi, and Jing Lu. 2018. “The
Global Identification Challenge: Who Are the 1 Billion
People without Proof of Identity?” Voices (blog), April
25, 2018. https://blogs.worldbank.org/voices/global
-identification-challenge-who-are-1-billion-people
-without-proof-identity.
Desiere, Sam, and Dean Mitchell Jolliffe. 2018. “Land Produc-
tivity and Plot Size: Is Measurement Error Driving the
Inverse Relationship?” Journal of Development Economics
130 (January): 84–98. https://doi.org/10.1016/j.jdeveco.2017
.10.002.
Desiere, Sam, Lotte Staelens, and Marijke D’Haese. 2016.
“When the Data Source Writes the Conclusion: Evaluat-
ing Agricultural Policies.” Journal of Development Studies 52
(9): 1372–87.
De Weerdt, Joachim, John Gibson, and Kathleen Beegle. 2019.
“What Can We Learn from Experimenting with Survey
Methods?” LICOS Discussion Paper 418, LICOS Center
for Institutions and Economic Performance, Faculty
of Economics and Business, Katholieke Universiteit
Leuven, Leuven, Belgium.
Díaz, Marianne. 2018. “El Cuerpo Como Dato.” @Derechos-
Digitales América Latina, Santiago, Chile. https://www
.derechosdigitales.org/wp-content/uploads/cuerpo
_DATO.pdf.
Dillon, Andrew, Sydney Gourlay, Kevin McGee, and Gbe-
misola Oseni. 2019. “Land Measurement Bias and Its
Empirical Implications: Evidence from a Validation
Exercise.” Economic Development and Cultural Change 67 (3):
595–624.
Doss, Cheryl Renee, Caitlin Kieran, and Talip Kilic. 2020.
“Measuring Ownership, Control, and Use of Assets.”
Feminist Economics 26 (3): 144–68.
Dunning, Casey, Alan Gelb, and Sneha Raghavan. 2014. “Birth
Registration, Legal Identity, and the Post-2015 Agenda.”
CGD Policy Paper 046, Center for Global Development,
Washington, DC.
Fabregas, Raissa, Michael M. Kremer, and Frank Schilbach.
2019. “Realizing the Potential of Digital Development:
The Case of Agricultural Advice.” Science 366 (6471):
eaay3038.
Fisher, Monica, Jeffrey J. Reimer, and Edward R. Carr. 2010.
“Who Should Be Interviewed in Surveys of Household
Income?” World Development 38 (7): 966–73.
FMOH (Federal Ministry of Health, Ethiopia). 2018. eHealth
Apps Inventory. Addis Ababa: FMOH.
Fu, Haishan, and Stefan Schweinfest. 2020. “COVID-19 Wid-
ens Gulf of Global Data Inequality, While National Sta-
tistical Offices Step Up to Meet New Data Demands.”
Data Blog, June 5, 2020. https://blogs.worldbank.org
/opendata/covid-19-widens-gulf-global-data-inequality
-while-national-statistical-offices-step-up.
Fundación Reflejos de Venezuela. 2016. “El drama de ser
transgénero e intentar comprar en un supermercado.”
Fundación Reflejos de Venezuela, Caracas.
Gaddis, Isis, Gbemisola Oseni, Amparo Palacios-López, and
Janneke Pieters. 2019. “Measuring Farm Labor: Survey
Experimental Evidence from Ghana.” Policy Research
Working Paper 8717, World Bank, Washington, DC.
Gibson, John, and Bonggeun Kim. 2010. “Non-Classical
Measurement Error in Long-Term Retrospective Recall
Surveys.” Oxford Bulletin of Economics and Statistics 72 (5):
687–95.
Gourlay, Sydney, Talip Kilic, and David B. Lobell. 2019. “A
New Spin on an Old Debate: Errors in Farmer-Re-
ported Production and Their Implications for the
Inverse Scale–Productivity Relationship in Uganda.”
Journal of Development Economics 141 (November): 1–35.
https://www.sciencedirect.com/science/article/pii
/S0304387818306588.
Government of Croatia. 2019. “Gov’t Launches Changes to
Country’s Statistical Subdivision.” News release, January
23. https://vlada.gov.hr/news/gov-t-launches-changes-to
-country-s-statistical-subdivision/25178.
GPSA (Global Partnership for Social Accountability). 2020.
“Making Public Contracts Work for People: Experi-
ences from Uganda.” GPSA, World Bank, Washington,
DC. https://www.thegpsa.org/stories/making-public
-contracts-work-people-experiences-uganda.
Hallegatte, Stéphane, Adrien Vogt-Schilb, Mook Bangalore,
and Julie Rozenberg. 2017. Unbreakable: Building the Resil-
ience of the Poor in the Face of Natural Disasters. Climate
Change and Development Series. Washington, DC:
World Bank.
Head, Brian W. 2016. “Toward More ‘Evidence-Informed’
Policy Making?” Public Administration Review 76 (3):
472–84.
Heffetz, Ori, and Katrina Ligett. 2014. “Privacy and Data-
Based Research.” Journal of Economic Perspectives 28 (2):
75–98.
Hjort, Jonas, Diana Moreira, Gautam Rao, and Juan Fran-
cisco Santini. 2019. “How Research Affects Policy:
Experimental Evidence from 2,150 Brazilian Municipal-
ities.” NBER Working Paper 25941, National Bureau of
Economic Research, Cambridge, MA.
Hoogeveen, Johannes, and Nga Thi Viet Nguyen. 2019.
“Statistics Reform in Africa: Aligning Incentives with
Results.” Journal of Development Studies 55 (4): 702–19.
Hoogeveen, Johannes, and Utz Pape, eds. 2020. Data Collec-
tion in Fragile States: Innovations from Africa and Beyond.
Cham, Switzerland: Palgrave Macmillan.
Hosman, Laura. 2014. “Emerging Markets: Top ICT Hard-
ware Challenges.” Inveneo, San Francisco.
ITU (International Telecommunication Union). 2016. “Har-
nessing the Internet of Things for Global Development.”
ITU, Geneva. https://www.itu-ilibrary.org/science-and
-technology/harnessing-the-internet-of-things-for
-global-development_pub/80d1ac90-en.
Jacobs, Krista, and Aslihan Kes. 2015. “The Ambiguity of Joint
Asset Ownership: Cautionary Tales from Uganda and
South Africa.” Feminist Economics 21 (3): 23–55.
Jerven, Morten. 2014. “The Political Economy of Agricultural
Statistics and Input Subsidies: Evidence from India,
Nigeria, and Malawi.” Journal of Agrarian Change 14 (1):
129–45.
Jerven, Morten. 2019. “The Problems of Economic Data in
Africa.” In Oxford Research Encyclopedia of Politics. Oxford,
UK: Oxford University Press. https://doi.org/10.1093
/acrefore/9780190228637.013.748.
Johnson, David S., Catherine Massey, and Amy O’Hara. 2015.
“The Opportunities and Challenges of Using Admin-
istrative Data Linkages to Evaluate Mobility.” Annals of
the American Academy of Political and Social Science 657 (1):
247–64.
Jolliffe, Dean, Talip Kilic, Daniel Gerszon Mahler, and Philip
Randolph Wollburg. Forthcoming. “Under What Condi-
tions Are Data Valuable for Development?” WDR 2021
background paper, World Bank, Washington, DC.
J-PAL (Abdul Latif Jameel Poverty Action Lab). 2018. “Annual
Report 2018 and a Look Ahead to 2019.” J-PAL, Massachu-
setts Institute of Technology, Cambridge, MA. https://
www.povertyactionlab.org/sites/default/files/2018
-annual-report-web-ready.pdf.
Karagulian, Federico, Maurizio Barbiere, Alexander Kotsev,
Laurent Spinelle, Michel Gerboles, Friedrich Lagler,
Nathalie Redon, et al. 2019. “Review of the Performance
of Low-Cost Sensors for Air Quality Monitoring.” Atmo-
sphere 10 (9): 506.
Katsimi, Margarita, and Thomas Moutos. 2010. “EMU and
the Greek Crisis: The Political-Economy Perspective.”
European Journal of Political Economy 26 (4): 568–76.
https://doi.org/10.1016/j.ejpoleco.2010.08.002.
Khan, Amina, Joseph Wales, and Elizabeth Stuart. 2015.
“Country Priorities for Data Development: What Does
History Tell Us?” Report, Overseas Development Insti-
tute, London. https://www.odi.org/sites/odi.org.uk/files
/odi-assets/publications-opinion-files/9695.pdf.
Kilic, Talip, and Heather G. Moylan. 2016. “Methodological
Experiment on Measuring Asset Ownership from a
Gender Perspective (MEXA).” Technical Report, World
Bank, Washington, DC.
Kilic, Talip, Heather G. Moylan, John Ilukor, Clement Mten-
gula, and Innocent Pangapanga-Phiri. 2018. “Root for
the Tubers: Extended-Harvest Crop Production and
Productivity Measurement in Surveys.” Policy Research
Working Paper 8618, World Bank, Washington, DC.
Kilic, Talip, Heather G. Moylan, and Gayatri B. Koolwal. 2020.
“Getting the (Gender-Disaggregated) Lay of the Land:
Impact of Survey Respondent Selection on Measuring
Land Ownership.” Policy Research Working Paper 9151,
World Bank, Washington, DC.
Kilic, Talip, Goedele Van den Broeck, Gayatri B. Koolwal,
and Heather G. Moylan. 2020. “Are You Being Asked?
Impacts of Respondent Selection on Measuring
Employment.” Policy Research Working Paper 9152,
World Bank, Washington, DC.
Kilic, Talip, Alberto Zezza, Calogero Carletto, and Sara Savas-
tano. 2017. “Missing(ness) in Action: Selectivity Bias in
GPS-Based Land Area Measurements.” World Develop-
ment 92 (April): 143–57.
Kiregyera, Ben. 2017. “Supporting Implementation of Fun-
damental Principles of Official Statistics in the African
Region.” Statistical Journal of the IAOS 33 (4): 863–67.
Klein, Thilo, Anaïs Galdin, and El Iza Mohamedou. 2016.
“An Indicator for Statistical Literacy Based on National
Newspaper Archives.” Paper presented at International
Association of Statistics Education’s 2016 Roundtable
Conference, “Promoting Understanding of Statistics
about Society,” Berlin, July 19–22, 2016. http://iase-web
.org/Conference_Proceedings.php?p=Promoting
_Understanding_of_Statistics_about_Society_2016.
Krätke, Florian, and Bruce Byiers. 2014. “The Political Econ-
omy of Official Statistics: Implications for the Data Revo-
lution in Sub-Saharan Africa.” PARIS21 Discussion Paper
5, Partnership in Statistics for Development in the 21st
Century, Paris. http://ecdpm.org/wp-content/uploads
/DP-170-Political-Economy-Official-Statistics-Africa
-December-2014.pdf.
Kubota, Megumi, and Albert Zeufack. 2020. “Assessing the
Returns on Investment in Data Openness and Transpar-
ency.” Policy Research Working Paper 9139, World Bank,
Washington, DC.
Lämmerhirt, Danny, Jonathan Gray, Tommaso Venturini, and
Axel Meunier. 2018. “Advancing Sustainability Together?
Citizen-Generated Data and the Sustainable Develop-
ment Goals.” Global Partnership for Sustainable Devel-
opment Data, United Nations, New York. http://www
.data4sdgs.org/resources/advancing-sustainability
-together-citizen-generated-data-and-sustainable
-development.
Lange, Simon. 2020. “Key Trends in Development Co-
operation for National Data and Statistical Systems.”
OECD Development Policy Paper 31, Organisation for
Economic Co-operation and Development, Paris.
Lechene, Valérie, Krishna Pendakur, and Alex Wolf. 2019.
“OLS Estimation of the Intra-Household Distribution of
Consumption.” IFS Working Paper W19/19, Institute for
Fiscal Studies, London.
López-Vargas, Ascensión, Manuel Fuentes, and Marta Vivar.
2020. “Challenges and Opportunities of the Internet of
Things for Global Development to Achieve the United
Nations Sustainable Development Goals.” IEEE Access 8:
37202–13.
Manning, Richard, Ian Goldman, and Gonzalo Hernán-
dez Licona. 2020. “The Impact of Impact Evaluation.”
UNU-WIDER Working Paper 2020/20, United Nations
University–World Institute for Development Econom-
ics Research, Helsinki.
Maron, Mikel. 2015. “How Complete Is OpenStreetMap?”
Mapbox (blog), November 19, 2015. https://blog.mapbox
.com/how-complete-is-openstreetmap-7c369787af6e.
Masaki, Takaaki, Samantha Custer, Agustina Eskenazi, Alena
Stern, and Rebecca Latourell. 2017. “Decoding Data Use:
How Do Leaders Source Data and Use It to Accelerate
Development?” AidData, Global Research Institute, Col-
lege of William and Mary, Williamsburg, VA.
McCluskey, William, and Chyi-Yun Huang. 2019. “The Role
of ICT in Property Tax Administration: Lessons from
Tanzania.” CMI Brief 6, Chr. Michelsen Institute, Ber-
gen, Norway.
McQueston, Kate. 2013. “Autonomy, Independence, and
Capacity of National Statistics Offices.” Background
paper, Center for Global Development, Washington,
DC; African Population and Health Research Council,
Nairobi.
Meijer, Albert, and Suzanne Potjer. 2018. “Citizen-Generated
Open Data: An Explorative Analysis of 25 Cases.” Govern-
ment Information Quarterly 35 (4): 613–21.
Morawska, Lidia, Phong K. Thai, Xiaoting Liu, Akwasi
Asumadu-Sakyi, Godwin Ayoko, Alena Bartonova,
Andrea Bedini, et al. 2018. “Applications of Low-Cost
Sensing Technologies for Air Quality Monitoring and
Exposure Assessment: How Far Have They Gone?” Envi-
ronment International 116 (July): 286–99.
MPPN (Multidimensional Poverty Peer Network). 2017.
“Using the MPI to Determine National Budgets in
Costa Rica.” Dimensions 4 (August): 14–18, Oxford
Poverty and Human Development Initiative, Oxford
Department of International Development, University
of Oxford, Oxford, UK. https://www.mppn.org/wp
-content/uploads/2017/08/Dim_4_ENGLISH_online
.pdf.
OECD (Organisation for Economic Co-operation and Devel-
opment). 2017. Development Co-operation Report 2017: Data
for Development. Paris: OECD.
OECD (Organisation for Economic Co-operation and Devel-
opment). 2019. The Path to Becoming a Data-Driven Public
Sector. OECD Digital Government Studies Series. Paris:
OECD. https://doi.org/10.1787/059814a7-en.
Oliver, Kathryn, Simon Innvar, Theo Lorenc, Jenny Wood-
man, and James Thomas. 2014. “A Systematic Review
of Barriers to and Facilitators of the Use of Evidence by
Policymakers.” BMC Health Services Research 14 (January
3), article 2. https://doi.org/10.1186/1472-6963-14-2.
Open Data Watch. 2015a. “Breathe Deep: Air Quality Reform
in China.” Data Impacts Case Studies, Open Data Watch,
Washington, DC. https://dataimpacts.org/project/data
-help-china-breath-better/.
Open Data Watch. 2015b. “Disaggregated Data: Impacts of
Demographic and Health Surveys.” Data Impacts Case
Studies, Open Data Watch, Washington, DC. https://
dataimpacts.org/project/health-surveys/.
PARIS21 (Partnership in Statistics for Development in the
21st Century). 2018. “Survey Results: New Approaches
to Capacity Development and Future Priorities, CD4.0
Survey.” PARIS21, Paris. https://paris21.org/capacity
-development-40/cd40-survey.
PARIS21 (Partnership in Statistics for Development in the
21st Century). 2020. “Partner Report on Support to Sta-
tistics: PRESS 2019.” PARIS21, Paris. https://paris21.org
/sites/default/files/inline-files/PARIS21_Press%202019
_WEB.pdf.
PARIS21 (Partnership in Statistics for Development in the
21st Century) and PSA (Philippine Statistics Authority).
2020. Use of Citizen-Generated Data for SDG Reporting
in the Philippines: A Case Study. Paris: PARIS21. https://
paris21.org/sites/default/files/inline-files/PSA-report
-FINAL.pdf.
Petrov, Oleg, Joel Gurin, and Laura Manley. 2016. “Open Data
for Sustainable Development.” Connections: Transport
and ICT, Note 2016-5, World Bank, Washington, DC.
Pham, Congduc, Abdur Rahim, and Philippe Cousin. 2016.
“Low-Cost, Long-Range Open IoT for Smarter Rural
African Villages.” In 2016 IEEE International Smart
Cities Conference (ISC2), edited by Institute of Elec-
trical and Electronics Engineers, 512–17. Red Hook, NY:
Curran Associates.
Piovesan, Federico. 2015. “Statistical Perspectives on Citizen-
Generated Data.” DataShift, Civicus, Johannesburg.
http://civicus.org/thedatashift/wp-content/uploads/2015
/07/statistical-perspectives-on-cgd_web_single-page.pdf.
Privacy International. 2013. “Biometrics: Friend or Foe of
Privacy?” Privacy International, London. https://privacy
international.org/news-analysis/1409/biometrics-friend
-or-foe-privacy.
Privacy International. 2019. “Submission to the Special
Rapporteurship on Economic, Social, Cultural, and
Environmental Rights of the Inter-American Com-
mission on Human Rights Regarding the Situation of
Economic, Social, Cultural, and Environmental Rights
in the Region.” TEDIC, InternetLab, Derechos Digitales,
Fundación Karisma, Dejusticia, Asociación por los
Derechos Civiles, and Privacy International, Privacy
International, London.
Randall, Sara. 2015. “Where Have All the Nomads Gone?
Fifty Years of Statistical and Demographic Invisi-
bilities of African Mobile Pastoralists.” Pastoralism 5,
article 22.
Rogger, Daniel Oliver, and Ravi Somani. 2018. “Hierarchy
and Information.” Policy Research Working Paper 8644,
World Bank, Washington, DC.
Roseth, Benjamin, Angela Reyes, and Karla Yee Amézaga.
2019. “The Value of Official Statistics: Lessons from
Intergovernmental Transfers.” IDB Technical Note
1682, Inter-American Development Bank, Washington,
DC. https://publications.iadb.org/publications/english
/document/The_Value_of_Official_Statistics_Lessons
_from_Intergovernmental_Transfers_en.pdf.
Safetipin. 2016. “Bogota: A Safety Analysis Report.”
Safetipin, Gurgaon, India. https://safetipin.com/report
/bogota-report-2016/.
Sandefur, Justin, and Amanda Glassman. 2015. “The Political
Economy of Bad Data: Evidence from African Survey
and Administrative Statistics.” Journal of Development
Studies 51 (2): 116–32.
Sanga, Dimitri. 2013. “The Challenges of the Narrative of
African Countries’ Development: Data Demand and
Supply Mismatches.” Paper presented at conference
“African Economic Development: Measuring Success
and Failure,” School for International Studies, Simon
Fraser University, Vancouver, Canada, April 18–20, 2013.
Sanna, Valentina, and Ida McDonnell. 2017. “Data for Devel-
opment: DAC Member Priorities and Challenges.”
OECD Development Co-Operation Working Paper 35,
OECD, Paris.
Scott, Christopher. 2005. “Measuring Up to the Measure-
ment Problem: The Role of Statistics in Evidence-Based
Policy-Making.” Partnership in Statistics for Develop-
ment in the 21st Century, Paris. https://paris21.org/sites
/default/files/MUMPS-full.pdf.
SDSN (United Nations Sustainable Development Solutions
Network). 2015. “Data for Development: A Needs Assess-
ment for SDG Monitoring and Statistical Capacity
Development.” SDSN, New York. https://sustainable
development.un.org/content/documents/2017Data-for
-Development-Full-Report.pdf.
SDSN TReNDS (United Nations Sustainable Development
Solutions Network–Thematic Research Network on
Data and Statistics). 2018a. “Data Sharing via SMS
Strengthens Uganda’s Health System: A Case Study
of mTRAC, Uganda.” SDSN TReNDS, New York.
http://www.data4sdgs.org/sites/default/files/2018-09
/mTRAC%20CaseStudy_FINAL.pdf.
SDSN TReNDS (United Nations Sustainable Development
Solutions Network–Thematic Research Network on
Data and Statistics). 2018b. “Handpump Data Improves
Water Access.” SDSN TReNDS, New York. http://www
.data4sdgs.org/sites/default/files/2018-12/Smart%20
Handpump%20Case%20Study.pdf.
Seltzer, Judith, and Deborah Klein Walker. 2020. “Counting
Children in the US 2020 Census: Assure Our Future Is
Represented.” NAM Perspectives Commentary, National
Academy of Medicine, Washington, DC. https://doi.org
/10.31478/202003d.
Sethi, Tanya, and Mihir Prakash. 2018. “Counting on Statis-
tics: How Can National Statistical Offices and Donors
Increase Use?” AidData, Global Research Institute,
College of William and Mary, Williamsburg, VA. https://
www.aiddata.org/publications/counting-on-statistics.
Snorrason, Hallgrímur. 2018. “Securing the Independence
of Official Statistics: Introductory Remarks.” Statistical
Journal of the IAOS 34 (2): 145–47.
SSEE (Smith School of Enterprise and the Environment).
2014. “From Rights to Results in Rural Water Services:
Evidence from Kyuso, Kenya.” Water Programme Work-
ing Paper 1, SSEE, University of Oxford, Oxford, UK.
https://www.smithschool.ox.ac.uk/publications/reports
/SSEE-rights-to-results_final_March2014.pdf.
Taylor, Matthew. 2016. “The Political Economy of Statistical
Capacity: A Theoretical Approach.” IDB Discussion
Paper IDP-DP-471, Inter-American Development Bank,
Washington, DC.
Tiberti, Marco, and Valentina Costa. 2020. “Disability Mea-
surement in Household Surveys: A Guidebook for
Designing Household Survey Questionnaires.” LSMS
Guidebook, Living Standards Measurement Study,
World Bank, Washington, DC. http://documents.world
bank.org/curated/en/456131578985058020/Disability
-Measurement-in-Household-Surveys-A-Guidebook-for
-Designing-Household-Survey-Questionnaires.
Todesca, Jorge A. 2017. “Political Power and the Argentine
Statistical System: The Case of INDEC.” Statistical Jour-
nal of the IAOS 33 (4): 875–83.
Toulemon, Laurent. 2017. “Undercount of Young Children
and Young Adults in the New French Census.” Statistical
Journal of the IAOS 33 (2): 311–16.
Trewin, Dennis. 2018. “What Does an Independent Official
Statistical Agency Mean in Practice?” Statistical Journal of
the IAOS 34 (2): 165–69.
UNESCO (United Nations Educational, Scientific, and Cul-
tural Organization). 2016. “Education 2030: Incheon Dec-
laration and Framework for Action.” Document ED-2016/
WS/28, Institute for Information Technologies in Educa-
tion, UNESCO, Paris. http://uis.unesco.org/sites/default
/files/documents/education-2030-incheon-framework
-for-action-implementation-of-sdg4-2016-en_2.pdf.
United Nations. 2019a. “Extreme Poverty and Human Rights:
Note by the Secretary-General.” Document A/74/493,
United Nations, New York. https://undocs.org/A/74/493.
United Nations. 2019b. The Sustainable Development Goals
Report 2019. New York: United Nations.
UNSTATS (Statistics Division, Department of Economic and
Social Affairs, United Nations). 2015. United Nations
Fundamental Principles of Official Statistics: Implementa-
tion Guidelines. New York: United Nations. https://
.unstats.org/unsd/dnss/gp/Implementation_Guidelines
_FINAL_without_edit.pdf.
UNSTATS (Statistics Division, Department of Economic and
Social Affairs, United Nations) and World Bank. 2020.
“Monitoring the State of Statistical Operations under
the COVID-19 Pandemic: Highlights from the Second
Round of a Global COVID-19 Survey of National Statis-
tical Offices.” World Bank, Washington, DC.
UN Women. 2018. Turning Promises into Action: Gender Equality
in the 2030 Agenda for Sustainable Development. New York:
UN Women.
UN Women. 2020. “COVID-19 and Its Economic Toll on
Women: The Story behind the Numbers.” News and
Events, September 16, 2020. https://www.unwomen.org
/en/news/stories/2020/9/feature-covid-19-economic
-impacts-on-women.
Van Belle, Jean-Paul. 2018. Africa Data Revolution Report
2018: Status and Emerging Impact to Open Data in Africa.
With contributions by Danny Lämmerhirt, Carlos
Iglesias, Paul Mungai, Hubeidatu Nuhu, Mbongeni
Hlabano, Tarik Nesh-Nash, and Sarang Chaudhary.
Washington, DC: World Wide Web Foundation. https://
webfoundation.org/docs/2019/03/Africa-data-revolution
-report.pdf.
Verhulst, Stefaan G., and Andrew Young. 2017. “Open Data in
Developing Economies: Toward Building an Evidence
Base on What Works and How.” Governance Lab, New
York University, New York. https://odimpact.org/files
/odimpact-developing-economies.pdf.
von Oppeln-Bronikowski, Sibylle, Christine Kronz, Irina
Meinke, and Hannah Wirtzfeld. 2015. “How Can Profes-
sional and Ethical Frameworks Strengthen Statisticians
in Their Practical Work?” Statistical Journal of the IAOS 31
(4): 513–22.
WHO (World Health Organization). 2011. “The Abuja Decla-
ration: Ten Years On.” WHO, Geneva. https://www.who
.int/healthsystems/publications/abuja_report_aug_2011
.pdf?ua=1.
Wollburg, Philip, Marco Tiberti, and Alberto Zezza. 2020. “Recall Length and Measurement Error in Agricultural Surveys.” Food Policy. Published ahead of print, December 1, 2020. https://doi.org/10.1016/j.foodpol.2020.102003.
World Bank. 2016a. Croatia: Small-Area Estimation of Consumption-Based Poverty (Poverty Maps). Washington, DC: World Bank. https://razvoj.gov.hr/UserDocsImages//Istaknute%20teme/Kartom%20siroma%C5%A1tva//Croatia%20Small-Area%20Estimation%20of%20Consumption-Based%20Poverty%20(Poverty%20Maps).pdf.
World Bank. 2016b. World Development Report 2016: Digital
Dividends. Washington, DC: World Bank.
World Bank. 2017a. “Index of Multiple Deprivation: Conceptual Framework for Identifying Lagging Municipalities and Towns in Croatia.” World Bank, Washington, DC. https://razvoj.gov.hr/UserDocsImages//Istaknute%20teme/Kartom%20siroma%C5%A1tva//Index%20of%20Multiple%20Deprivation%20-%20Conceptual%20framework_18.06.2019.pdf.
World Bank. 2017b. “Monitoring Global Poverty: Report of the Commission on Global Poverty.” World Bank, Washington, DC. https://openknowledge.worldbank.org/bitstream/handle/10986/25141/9781464809613.pdf.
World Bank. 2017c. World Development Report 2017: Gover-
nance and the Law. Washington, DC: World Bank.
World Bank. 2018a. Data for Development: An Evaluation
of World Bank Support for Data and Statistical Capacity.
Washington, DC: Independent Evaluation Group,
World Bank.
World Bank. 2018b. Improving Public Sector Performance:
Through Innovation and Inter-Agency Coordination. Global
Report: Public Sector Performance. Washington, DC:
World Bank.
World Bank. 2018c. “Public Sector Savings and Revenue
from Identification Systems: Opportunities and Con-
straints.” World Bank, Washington, DC.
World Bank. 2018d. “The Role of Digital Identification for
Healthcare: The Emerging Use Cases.” Identification for
Development (ID4D), World Bank, Washington, DC.
World Bank. 2019a. IC4D, Information and Communication for Development 2018: Data-Driven Development. Washington, DC: World Bank. http://documents1.worldbank.org/curated/en/987471542742554246/pdf/128301-9781464813252.pdf.
World Bank. 2019b. ID4D Practitioner’s Guide: Version 1.0.
Washington, DC: World Bank. http://documents1
.worldbank.org/curated/en/248371559325561562/pdf
/ID4D-Practitioner-s-Guide.pdf.
World Bank. 2020a. “Classification of Fragile and Conflict-Affected Situations.” Brief, July 9, World Bank, Washington, DC. https://www.worldbank.org/en/topic/fragilityconflictviolence/brief/harmonized-list-of-fragile-situations.
World Bank. 2020b. “West Africa Unique Identification for
Regional Integration and Inclusion Program, Phase 2.”
Project Appraisal Document, Report PAD3556, World
Bank, Washington, DC.
Yin, Peng, Michael Brauer, Aaron J. Cohen, Haidong Wang,
Jie Li, Richard T. Burnett, Jeffrey D. Stanaway, et al. 2020.
“The Effect of Air Pollution on Deaths, Disease Burden,
and Life Expectancy across China and Its Provinces,
1990–2017: An Analysis for the Global Burden of Disease
Study 2017.” Lancet Planetary Health 4 (9): 386–98.
For too long, violence against women and girls has been a deep, dark
secret. Now, data collection efforts around the world are shedding light on
this tragic problem and leading to solutions.
Spotlight 2.1
Deploying data to curtail violence against women and girls
Violence against women and girls (VAWG) is a global pandemic. One out of three women and girls (35 percent) worldwide between the ages of 15 and 49 has experienced physical violence, sexual violence, or both. At least 200 million girls and women have undergone female genital mutilation (FGM), and in at least 11 countries, more than half of women ages 15–49 have undergone FGM (figure S2.1.1).1 We know these facts because representative population-based studies have been undertaken to understand the prevalence of VAWG. These studies have used a standardized methodology in more than 90 countries across all regions and all income groups. For example, data for 55 low- and middle-income countries are available through a standardized module measuring VAWG, and this module has been incorporated in the Demographic and Health Surveys (DHSs).2
Figure S2.1.1 Prevalence of female genital mutilation in women ages 15–49, by country income level, 2010–19
[Bar chart: share of women subjected to FGM (%) in 28 countries, with bars distinguishing low-income, lower-middle-income, and upper-middle-income countries; values range from 94.5 percent to 0.3 percent. Countries shown: Guinea; Egypt, Arab Rep.; Djibouti; Mali; Sudan; Sierra Leone; Eritrea; Burkina Faso; Gambia, The; Mauritania; Ethiopia; Guinea-Bissau; Central African Republic; Liberia; Côte d’Ivoire; Chad; Senegal; Kenya; Nigeria; Yemen, Rep.; Maldives; Tanzania; Benin; Iraq; Ghana; Togo; Niger; Uganda.]
Source: Adapted from Kashiwase and Pirlea 2019. Data are drawn from the World Bank World Development Indicators, https://databank.worldbank.org/FGM-Prevalence/id/a4f22755 (SH.STA.FGMS.ZS), using data from Demographic and Health Surveys, Multiple Indicator Cluster Surveys, and UNICEF. Data at http://bit.do/WDR2021-Fig-S2_1_1.
Note: FGM = female genital mutilation; UNICEF = United Nations Children’s Fund.
The availability and accessibility of reliable, comparable, and nationally representative VAWG data are leading to solutions, including laws banning domestic violence.3 Moreover, the data are informing diagnostic work, prevention and response efforts, and policies in low- and middle-income settings in key areas such as health, education, social protection, and governance. For instance, kNOwVAWdata,4 an initiative led by the United Nations Population Fund (UNFPA), collects VAWG data on 27 countries in Asia and the Pacific, shedding light on why survivors are not accessing services. The World Bank has used these data as a basis for its analytical and operational work. In the Great Lakes region of Africa, an in-depth analysis of DHS data helped to identify and target emergency and women’s health activities and to prevent and respond to VAWG in Uganda. In 2019 Peru’s president welcomed an in-depth analysis of VAWG data and expenditures, setting the stage for a national results-oriented budgetary plan to reduce VAWG that was supported by the Ministry of Economy and Finance and the Ministry of Women and Vulnerable Populations.
Investing in data to understand the barriers preventing survivors of violence from using essential support services
In addition to amassing data on the prevalence of
VAWG, many countries and agencies that provide
essential services to survivors of violence keep track
of service-based data. Data on reported cases of
VAWG allow countries to understand who is seek-
ing help, when, for what types of violence, and how
often. Various barriers, including fear and lack of
knowledge, may prevent women from seeking ser-
vices; data help countries to understand and address
these barriers. For instance, service-based data can be
used to monitor important life-saving measures, such as
providing victims with post-exposure prophylaxis
(PEP) within 72 hours of a sexual assault. Tracking
how many survivors receive PEP can unveil barriers
related to the supply chain of essential medicines or
gaps in the training of health service staff.
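Service-based data of this kind can be turned into a simple monitoring indicator. The sketch below is a minimal illustration, assuming a hypothetical record layout (facility, incident type, incident time, time PEP was given) that is not the GBVIMS format or any real system's schema. It computes the share of sexual-assault cases that received PEP within 72 hours, overall and per facility, so that a service point that consistently lags, for example because of stock-outs or untrained staff, stands out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional

# Hypothetical record layout, for illustration only; it is not the
# GBVIMS schema or any national system's data format.
@dataclass
class ServiceRecord:
    facility: str                      # service point that handled the case
    incident_type: str                 # e.g. "sexual_assault", "physical"
    incident_time: Optional[datetime]  # time of the assault, if recorded
    pep_time: Optional[datetime]       # time PEP was given, None if not given

PEP_WINDOW = timedelta(hours=72)

def _eligible(r: ServiceRecord) -> bool:
    # Only sexual-assault cases with a recorded incident time enter the denominator.
    return r.incident_type == "sexual_assault" and r.incident_time is not None

def _received_in_window(r: ServiceRecord) -> bool:
    # PEP counts only if given at or after the incident and within 72 hours.
    if r.pep_time is None or r.incident_time is None:
        return False
    delay = r.pep_time - r.incident_time
    return timedelta(0) <= delay <= PEP_WINDOW

def pep_share(records: Iterable[ServiceRecord]) -> Optional[float]:
    """Share of eligible cases that received PEP within 72 hours (None if no eligible cases)."""
    eligible = [r for r in records if _eligible(r)]
    if not eligible:
        return None
    return sum(_received_in_window(r) for r in eligible) / len(eligible)

def pep_share_by_facility(records: Iterable[ServiceRecord]) -> Dict[str, float]:
    """Same indicator computed separately for each facility with eligible cases."""
    groups: Dict[str, List[ServiceRecord]] = defaultdict(list)
    for r in records:
        if _eligible(r):
            groups[r.facility].append(r)
    return {f: sum(_received_in_window(r) for r in rs) / len(rs) for f, rs in groups.items()}

if __name__ == "__main__":
    t0 = datetime(2020, 5, 1, 8, 0)
    records = [
        ServiceRecord("clinic_a", "sexual_assault", t0, t0 + timedelta(hours=10)),
        ServiceRecord("clinic_a", "sexual_assault", t0, t0 + timedelta(hours=20)),
        ServiceRecord("clinic_b", "sexual_assault", t0, t0 + timedelta(hours=80)),
        ServiceRecord("clinic_b", "sexual_assault", t0, None),
        ServiceRecord("clinic_b", "physical", t0, None),
    ]
    print(pep_share(records))              # → 0.5
    print(pep_share_by_facility(records))  # → {'clinic_a': 1.0, 'clinic_b': 0.0}
```

In this sketch, cases without a recorded incident time are simply left out of the denominator; a real system would need an explicit, documented rule for them, and any such records must be handled under the confidentiality standards that apply to VAWG data.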
Integrating service-based data with data repre-
sentative of a given population can yield important
insights. The Gender-Based Violence Information
Management System (GBVIMS) provides a global
example.5
This multiagency initiative facilitates the
safe, ethical, effective, and efficient standardization
and coordination of service-based data. While such
efforts are critical, it is also important to ensure that
investments in gender-based violence data systems
do not divert limited funds and staffing away from
the provision of services to the survivors of violence.
Separate streams of investment—and greater invest-
ment—in service provision and data systems are
necessary.
The first and foremost purpose of the GBVIMS
and service-based data is to improve the quality and
accessibility of services for survivors of VAWG. But
for these systems to be effective, several foundational
issues must be addressed. First, efforts to integrate
data should be driven by the needs of women and
girls seeking services, not by the ease of access to
centrally located data. Second, to overcome silos and
promote national monitoring, coordination is needed
across multiple institutions with different mandates
and data systems.
Addressing the surge in VAWG during the COVID-19 pandemic
VAWG has surged during the COVID-19 pandemic.6
Lockdown measures designed to contain the spread of
the virus as well as the economic and health stresses
associated with this crisis have contributed to an
increase in violence—especially intimate partner vio-
lence.7
In the early days of the pandemic, the UNFPA warned that 31 million additional cases of gender-based violence could occur as a result of six months of lockdowns.8
Complicating matters, providing services has
become more difficult because some resources have
been diverted to the COVID-19 response and some
services have been suspended altogether.9
Reliable data are crucial to understanding and
addressing this situation. However, collecting reliable
data on VAWG has been especially challenging during
the COVID-19 pandemic. Face-to-face data collection,
the predominant mode in low- and middle-income
countries, has been widely suspended. The alterna-
tives—remote data collection through telephone,
text messaging, or the Web—increase the risk of vio-
lence: confidentiality is nearly impossible to ensure,
and even electronic communications leave traces.10
Instead, sources of indirect information should be
used: for instance, service-based data or key infor-
mant interviews with frontline workers.
Collecting VAWG data ethically
Special care must be taken when handling data on
VAWG. Soberingly, collecting VAWG data can cause, and has caused,
women to experience more violence.
Ethical and safety guidelines must be followed when
considering both the collection and sharing of such
data. These guidelines identify minimum standards
for the collection of VAWG data, such as the ability
to offer referrals for support to all who say they have
experienced violence; the ability to guarantee confi-
dentiality and privacy for survivors when collecting
and reporting on data; and the commitment to use
the data collected for increased and improved action.
The World Health Organization (WHO) has issued the
following guidelines:
• “Putting Women First: Ethical and Safety Recommendations for Research on Domestic Violence against Women” (2001)11
• “Ethical and Safety Recommendations for Interviewing Trafficked Women” (2003)12
• “Sample Design, Ethical and Safety Considerations, and Response Rates” (2005)13
• “Ethical and Safety Recommendations for Researching, Documenting, and Monitoring Sexual Violence in Emergencies” (2007)14
• “Ethical and Safety Recommendations for Intervention Research on Violence against Women” (2016),15 building on lessons from the publication “Putting Women First: Ethical and Safety Recommendations for Research on Domestic Violence against Women” (2001)
The Sexual Violence Research Initiative of the Medical Research Council in Pretoria, South Africa, has also issued important guidelines:
• “Ethical and Safety Recommendations for Research on Perpetration of Sexual Violence” (2012)16
Notes
1. Female genital mutilation “does not provide any health
benefits, but rather causes serious risks for women’s
health, including chronic infections and pain, menstrual
problems, and complications in childbirth” (Kashiwase
and Pirlea 2019). See also United Nations Children’s
Fund, Female Genital Mutilation (dashboard), updated
February 2020, https://data.unicef.org/topic/child
-protection/female-genital-mutilation/.
2. The country count is as of August 2020. For DHS data on
violence against women and girls, see ICF International,
STATcompiler (DHS Program STATcompiler) (database),
http://www.statcompiler.com/. Select “Choose Indicator”
and, from the dropdown menu, “Physical or sexual
violence committed by husband/partner.” Then click
“Next,” “Filter by World Region,” “Select All,” and “Next.”
The data will appear and can be augmented and refined
by choosing more categories from the “Indicators” and
“Countries” menus on the right.
3. For case studies of the impact of VAWG data on policy,
see “Disaggregated Data: Impacts of Demographic
and Health Surveys,” Data Impacts Case Studies, Open
Data Watch, https://dataimpacts.org/project/health
-surveys/.
4. See Measuring Prevalence of Violence against Women
in Asia-Pacific (dashboard), Regional Office for Asia and
the Pacific, United Nations Population Fund, https://asia
pacific.unfpa.org/knowvawdata.
5. See GBVIMS (Gender-Based Violence Information
Management System) (dashboard), Inter-Agency
GBVIMS Steering Committee, https://www.gbvims
.com/.
6. Bettinger-Lopez and Bro (2020); Johnson et al. (2020).
7. United Nations (2020); UNDP (2020).
8. UNFPA (2020).
9. Johnson et al. (2020).
10. UN Women (2020).
11. WHO (2001).
12. WHO (2003).
13. García-Moreno et al. (2005).
14. WHO (2007).
15. WHO (2016).
16. Jewkes, Dartnall, and Sikweyiya (2012).
References
Bettinger-Lopez, Caroline, and Alexandra Bro. 2020. “A Double Pandemic: Domestic Violence in the Age of COVID-19.” In Brief (blog), May 13, 2020. https://www.cfr.org/in-brief/double-pandemic-domestic-violence-age-covid-19.
García-Moreno, Claudia, Henrica A. F. M. Jansen, Mary
Ellsberg, Lori Heise, and Charlotte Watts. 2005. “Sam-
ple Design, Ethical and Safety Considerations, and
Response Rates.” In WHO Multi-Country Study on Women’s
Health and Domestic Violence against Women: Initial Results
on Prevalence, Health Outcomes, and Women’s Responses,
ch. 3, 19–24. Geneva: World Health Organization. https://
www.who.int/reproductivehealth/publications/violence
/24159358X/en/.
Jewkes, Rachel, Elizabeth Dartnall, and Yandisa Sikweyiya.
2012. “Ethical and Safety Recommendations for Research
on Perpetration of Sexual Violence.” Sexual Violence
Research Initiative, Gender and Health Research Unit,
Medical Research Council, Pretoria, South Africa.
Johnson, Katy, Lindsey Green, Muriel Volpellier, Suzanne
Kidenda, Thomas McHale, Karen Naimer, and Ranit
Mishori. 2020. “The Impact of COVID-19 on Services for
People Affected by Sexual and Gender-Based Violence.”
International Journal of Gynecology and Obstetrics 150 (3):
285–87. https://doi.org/10.1002/ijgo.13285.
Kashiwase, Haruna, and Florina Pirlea. 2019. “200 Million Women and Girls in the World Today Have Undergone Female Genital Mutilation.” Data Blog (blog), September 27, 2019. https://blogs.worldbank.org/opendata/200-million-women-and-girls-world-today-have-undergone-female-genital-mutilation.
United Nations. 2020. “Policy Brief: The Impact of COVID-19
on Women.” United Nations, New York.
UNDP (United Nations Development Programme). 2020.
“Gender-Based Violence and COVID-19.” UNDP Brief,
UNDP, New York. https://www.undp.org/content/undp
/en/home/librarypage/womens-empowerment/gender
-based-violence-and-covid-19.html.
UNFPA (United Nations Population Fund). 2020. “Mil-
lions More Cases of Violence, Child Marriage, Female
Genital Mutilation, Unintended Pregnancy Expected
due to the COVID-19 Pandemic.” News (blog), April 28,
2020. https://www.unfpa.org/news/millions-more-cases
-violence-child-marriage-female-genital-mutilation
-unintended-pregnancies.
UN Women (United Nations Entity for Gender Equality and
the Empowerment of Women). 2020. “Violence against
Women and Girls: Data Collection during COVID-19.”
UN Women, New York. https://www.unwomen.org
/en/digital-library/publications/2020/04/issue-brief
-violence-against-women-and-girls-data-collection
-during-covid-19.
WHO (World Health Organization). 2001. “Putting Women First: Ethical and Safety Recommendations for Research on Domestic Violence against Women.” Document WHO/FCH/GWH/01.1, Department of Gender and Women’s Health, Family and Community Health, WHO, Geneva. https://www.who.int/gender/violence/womenfirtseng.pdf.
WHO (World Health Organization). 2003. “WHO Ethical and
Safety Recommendations for Interviewing Trafficked
Women.” Health Policy Unit, London School of Hygiene
and Tropical Medicine, London; WHO, Geneva.
WHO (World Health Organization). 2007. “WHO Ethical and
Safety Recommendations for Researching, Document-
ing, and Monitoring Sexual Violence in Emergencies.”
WHO, Geneva. https://www.who.int/gender/documents
/OMS_EthicsSafety10Aug07.pdf.
WHO (World Health Organization). 2016. Ethical and Safety
Recommendations for Intervention Research on Violence
against Women: Building on Lessons from the WHO Publi-
cation Putting Women First: Ethical and Safety Recom-
mendations for Research on Domestic Violence against
Women. Geneva: WHO.
International organizations can aid countries bilaterally to address
challenges regarding funding, technical capacity, governance, and data
demand and create global public goods to overcome these barriers.
Spotlight 2.2
The role of international organizations in improving public intent data
More and better financing for data production
Coalitions of international organizations and devel-
opment partners can provide coordinated global
solutions for activities that fulfill specific data needs.
Governments can finance such activities either under
national budgets or through loans or grants from
multilateral development banks. For example, the
World Bank’s Data for Policy Package identifies a core
set of social, economic, and sustainability statistics
crucial for monitoring and evaluating development
outcomes and provides governments with loans or
grants to address these data needs. For relatively lower-income,
data-deprived countries, this aid can help
governments to prioritize which gaps to fill and sup-
plement scarce national funding. Another example is
the 50x2030 Initiative to Close the Agricultural Data
Gap, a multipartner initiative that seeks to transform
agricultural data systems across 50 low- and middle-
income countries by 2030. It uses innovative funding
mechanisms, leveraging donor funding to mobilize
national funding and create national ownership.
Advancing research and development in methods and tools
In addition to investing in improvements in the
technical capacity of data producers and users,
international organizations can also foster technical
capacity more broadly by providing global public
goods through research and development in methods
of data collection, curation, and analysis. For exam-
ple, they can support innovations in data capture,
including through portable sensors and mobile appli-
cations. These innovations must be validated rigor-
ously through methodological research activities that
compare the relative accuracy, cost-effectiveness, and
feasibility of new and traditional methods of data
collection. Based on such research, guidelines can be
formulated for integrating validated innovations into
surveys, censuses, and administrative records. Inter-
national organizations can play an important role
in carrying out such research and promoting these
innovations and associated guidelines. Partnerships
between international organizations and national sta-
tistical offices (NSOs) in methodological research and
development increase the likelihood that innovations
in data capture will be adopted and implemented.
Developing, disseminating, and implementing global standards for statistical activities
International organizations can also support efforts
to develop, disseminate, and implement international
standards and guidelines for statistical activities.
International statistical standards and guidelines
need to be disseminated and adopted at the country
level for data to be comparable across countries and
hence for policy makers to be able to compare their
performance with that of their peers. Among many
examples are the System of National Accounts
developed by the United Nations in collaboration
with several other partners; the International Labour
Organization’s International Standard Classification
of Occupations; and the monitoring and harmoni-
zation of data related to drinking water, sanitation,
and hygiene by the World Health Organization and
United Nations Children’s Fund (UNICEF). Technical
collaboration—including joint research among like-
minded international organizations, NSOs, technical
partners, and academia—is critical to the successful
production of international standards and guide-
lines, as are international forums for peer review,
discussion, endorsement, and promotion of these
public goods.
Coordinating actions to ensure the effective diffusion of public goods and funding activities
In the absence of coordination, organizations might
finance overlapping activities or fragment invest-
ments, overwhelm national data systems, or produce
conflicting standards and guidelines. The Inter-
Agency and Expert Working Groups as well as the
Intersecretariat Working Groups, under the aegis of
the United Nations Statistical Commission, provide
a platform for catalyzing collaborative work on the
development of standards and should continue to
be supported with periodic reviews of their terms of
reference and desired outputs. Awareness of these
working groups needs to be expanded, particularly
within international organizations, to ensure coordinated
actions within an organization.
Making data accessible and compatible with national priorities and spurring local demand
To satisfy increasing demand for data, international
organizations should make their own data, syn-
tax files, and metadata widely available and easily
accessible beyond their own institutions. The data
that international organizations require, such as
data on the Sustainable Development Goals, affect
the data produced by countries and can even crowd
out the domestic production of data. It is thus
imperative for such standards and goals to be made
compatible with the interests, priorities, and goals
of countries. When this is the case, the data main-
tained by international organizations can spur local
demand for cross-country data, foster their continued
production, and create a virtuous cycle of data pro-
duction and use.
Chapter 3
Data as a resource for the private sector
Main messages
1. Businesses are reaping tremendous value from both data created through businesses’ economic activities and data shared by governments. Used as an input in data-driven decision-making, those data can spur innovation in products and services and reduce transaction costs, ultimately boosting productivity, export competitiveness, and growth.
2. Use of data in the production process of firms may help tilt the playing field toward poor people and underserved populations (who can trade across platforms and access free services) by reducing fragmentation in markets. However, it can also exacerbate domestic inequalities where foundational skills, infrastructure, and finance are not widely available in countries.
3. Use of data by businesses can also tilt the playing field away from poor countries, whose local enterprises may struggle to compete with large global players in part because of economies of scale and scope from data.
4. Although the use of data in the production process presents many opportunities to solve development challenges, policy makers should heed the risks this use presents for the concentration of economic power, patterns of inequality, and protection of the rights of individuals.
Creating value and solving development challenges through data-driven business models
For millennia, farming and food supply have depended on access to accurate information. When will the rains come? How large will the yields be? What crops will earn the most money at market? Where are the most likely buyers located? Today,
that information is being collected and leveraged at an
unprecedented rate through data-driven agricultural
business models. In India, farmers can access a data-
driven platform that uses satellite imagery, artificial
intelligence (AI), and machine learning (ML) to detect
crop health remotely and estimate yield ahead of the
harvest. Farmers can then share such information
with financial institutions to demonstrate their poten-
tial profitability, thereby increasing their chance of
obtaining a loan. Other data-driven platforms provide
real-time crop prices and match sellers with buyers.
For remote populations around the world, receiv-
ing specialized medical care has been nearly impos-
sible without having to travel miles to urban areas.
Today, telehealth clinics and their specialists can
monitor and diagnose patients remotely using sen-
sors that collect patient health data and AI that helps
analyze such data.
Innovations like these herald the promise of busi-
ness models that apply data to create new and better
goods and services, helping to address development
challenges in the process. Both private intent and
public intent data are increasingly being used by
firms to create value in the production process. At the
same time, data are continually being produced as a
by-product of economic activity, creating digital footprints that drive the data economy. With their growing capacity to collect, store, and process these data, businesses have seen their ability to extract value from them rise rapidly in recent years.
The COVID-19 crisis has created urgent demands
for the private sector to adopt data-driven solutions
to deal with the pandemic and increase resilience
and productivity for recovery. Big Tech companies
have been among the few winners during the crisis as
consumers have purchased more goods and services online.
As businesses shift toward recovery, the new reality
will likely accelerate trends toward data-driven tech-
nologies that allow for automation and traceability in
value chains.
For all their promise, however, the accelerating
pace of these trends also comes with risks related
to the concentration of economic power, greater
inequality, and protection of the rights of individuals.
The degree to which individuals can benefit from
the data-driven economy—including consumers,
entrepreneurs, and job seekers—will differ according
to their access to finance, education levels, skills,
and technology. In charting a way forward, policy
makers—across all stages of development of their
country’s data-driven economy—should remain alert
to these risks so that the use of data by firms contrib-
utes to broadly shared benefits.
The role of data in the production process of firms
The role of data in the production process can be
conceptualized in different ways, depending on the
specificities of the firms, industries, technologies,
and types of data being considered. There is as yet
no overarching theory or consensus on the role of
data in the production process. The categories that
follow summarize various ways of understanding
the role of data in the creation of value by firms—as
a factor of production, as a productivity enhancer, as
a by-product, or as an output.
Data as a factor of production. For some firms, data
are considered an input central to their business,
essential to fulfillment of their core objectives. In this
context, data have been referred to as a factor of pro-
duction—on a par with labor, capital, and land—that
is a primary determinant of output and productivity.1
For example, many social media platforms are built
around monetizing their users’ data for advertising.
Data as a productivity enhancer. Data may also be conceptualized as a driver of total factor productivity (TFP). Increases in TFP reflect more efficient use of the factors of production and are often thought to be driven by technological change. Businesses use data along with various technologies to become more productive by improving their business processes, learning more about their clients and customers, developing new products, or making better data-driven decisions.2 In this context, the addition of data to the production process makes the main factors of production more efficient, leading to better performance. According to one study, in the US health care sector the use of big data has been associated with a 0.7 percent increase in productivity growth per year.3 Other studies have found that among 179 large publicly traded US firms the adoption of data-driven decision-making has led to an increase in productivity of 5–6 percent.4
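One common way to make the TFP idea concrete, offered here as a textbook growth-accounting illustration rather than the method used in the studies cited, assumes a Cobb-Douglas production function. Output Y is produced from capital K and labor L, A denotes TFP, and α (an assumed parameter) is capital's share of output. TFP growth is then the part of output growth not explained by growth in capital and labor:

```latex
Y = A\,K^{\alpha}L^{1-\alpha}
\quad\Longrightarrow\quad
\frac{\Delta A}{A} \approx \frac{\Delta Y}{Y} - \alpha\,\frac{\Delta K}{K} - (1-\alpha)\,\frac{\Delta L}{L}
```

In this framing, data that let a firm produce more output from the same capital and labor would show up as faster growth in the residual A.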
Data as a by-product of the production process. Data are often passively created as a by-product of economic activities. For example, call detail records (CDRs) are a by-product of telephone usage. Observed data on consumers’ browsing and buying patterns are a by-product of e-commerce. Data created in this
way can be used in the production of new products
or services, either by the firm that produced the
original data or by other firms with which the data
are shared, such as under commercial arrangements.
For example, e-commerce platforms use data created
as a by-product of transactions on their platform to
improve their product offerings; credit card com-
panies sometimes sell their transaction data for a
specific location to firms involved in tourism in that
location; and new firms use CDRs for commercial
purposes, including analytics and advertising.5
Data as an output. For some firms, data are the pri-
mary output of the production process. Examples are
data intermediaries, including rating services such
as Nielsen; pollsters such as Gallup; and data aggre-
gators such as dataPublica.6
These data are then used
either by other firms in their production processes or
by government in policy making.
In all cases, data have a role in creating value for
the economy, but the way in which data play into the
production process differs by context.
Pathways to development
Whether the use of data in the production process is
conceptualized as a factor of production or a driver of
productivity, its transformative effects on develop-
ment can be summarized by four channels:
1. 
Quality improvements in existing products and ser-
vices. This channel includes the use of data-driven
decision-making to provide consumers with bet-
ter health diagnostics, better credit scoring, better
search results, and more personalized product
recommendations.
2. 
Cost reduction in delivering products and services. Data
and analytics can reduce the costs of delivery,
which can then lower prices (subject to markets
being sufficiently competitive). For example, bet-
ter credit scoring can reduce the cost of delivering
loans and lead to lower interest rates on loans.
Sensor-based agricultural devices and platforms
that take and analyze soil readings can inform
farmers how much fertilizer they should apply,
which should reduce wastage and costs.
3. Greater innovation in development of new products and
services. Examples include the development of new
financial products, smart contracts and supply
chain tracking services, new products that rely on
applications such as online maps or translation,
and new consumer goods based on analysis of pur-
chasing trends.
4. More effective intermediation and lower transaction
costs. Platform firms can help solve market failures
and lower the entry and transaction costs for firms
that connect to those platforms. This happens
in part by reducing information asymmetries,
thereby increasing trust in those firms. Distributed
ledger technologies (DLTs) can not only reduce
transaction costs but also enhance trust through
secure transactions. Better intermediation can
disrupt traditional market structures and reduce
the market power of intermediaries, particularly
in sectors such as agriculture where they have
traditionally played a central role in the value chain.
These four channels through which data affect
development are driven by two key effects.
First, analytics applied to data can reveal patterns that
allow better data-driven decision-making. Second, data
can help to facilitate transactions, including by matching
the suppliers of goods and services with those who demand
them. In this way, the use of data can help overcome
market failures, with positive effects on productivity,
growth, jobs, and welfare (figure 3.1).
Data-driven businesses and the technologies that help them create value
Data-intensive analytics can be used to discover new
insights, enhance decision-making, and optimize
processes. When data are characterized by the “3 V’s”
(volume, velocity, and variety), they can serve as inputs
to big data analytics. Such analytics typically require
new methodologies and technologies to enable
enhanced decision-making (box 3.1). This chapter
focuses on the development impact of business mod-
els that use data-intensive technology or analytics as
their key value drivers, whether they are technology
firms (the providers of data-intensive technological
solutions) or traditional firms and entrepreneurs (the
adopters of data-intensive technologies).
Firms may use various data-driven technologies by
themselves or in combination. A key business model
that has emerged using data-intensive technologies
is the data-driven platform business, which uses data,
along with AI/ML and other analytics, to intermediate
between distinct user groups to match supply with
demand. By overcoming informational asymmetries
and reducing search costs, these businesses facilitate
market exchanges and generate more data on users
and their behavior. Some may also use a combination
of other technologies. For example, the platform
GrainChain uses DLT to broker secure transactions
94 | World Development Report 2021
Figure 3.1 The role of data in the production process: Pathways to development
[Figure: Data in the production process enable better data-driven decision-making and data-driven matching of demand and supply and transaction facilitation. These feed four channels (quality improvements; cost reduction and process optimization; product innovation; more effective intermediation and lower transaction costs) that boost opportunities for lagging regions and populations, create new export opportunities, promote productivity and growth, create new markets and disrupt the market power of traditional incumbents, and support local entrepreneurship ecosystems. Related opportunities: access to “free” services, adaptation to COVID-19 challenges, and support for the green agenda.]
Source: WDR 2021 team.
Box 3.1 Technologies and methods that support data-driven
decision-making and intermediation
Technology that supports data-intensive analytics:
artificial intelligence, including machine learning
Artificial intelligence (AI) and machine learning (ML) can
help firms analyze their data with less manual effort. AI is
the development and use of any device that perceives its
environment and takes actions that maximize its chance
of successfully reaching a defined goal (including by learning
and adapting to its environment). It is not a single
technology but a family of technologies. Machine learning is
one branch of AI. The algorithms that underlie AI
rely on large amounts of input data to learn and
produce accurate and valuable insights. Based on adoption
patterns, studies predict that firms responsible for about
70 percent of economic output will have adopted at least
one type of AI technology by 2030.a
Data-intensive analytic applications and big data
analytics sometimes require that data be processed in
different formats and distributed across different
locations. Supporting technologies may include cloud
computing, bio-inspired computing, or quantum computing.
Such applications also require the capacity to store big
datasets and to clean them to correct inaccuracies.
Technology that collects data and actions insights
from analytics: smart devices and devices connected
through the Internet of Things (IoT)
Devices include sensors and monitors that generate
data. Smart devices rely on these “machine-generated”
data to improve their operations, often using AI. Devices
are increasingly being connected to the IoT, which allows
them to send and receive data to and from other IoT
devices, including data on ground moisture, climate and
air quality, individuals’ health metrics, firm asset
performance, and the movement of goods through supply chains. IoT and
machine-generated data from devices are poised to mul-
tiply exponentially the data generated by businesses,
with potential for development in agriculture, health,
manufacturing, and transportation (such as driverless
vehicles). IoT devices already exceed the number of
internet users and are forecast to reach 25 billion by
(Box continues next page)
between buyers and sellers of agricultural commodities,
while employing Internet of Things (IoT) devices
to accurately measure variables such as commodity
weight. In those platform businesses that earn a sig-
nificant proportion of their revenue from advertising,
data collected through the platform are used to inform
that advertising. Platform models are a key focus
of this chapter and of many of the economic policy
issues raised in chapter 7 because of their importance
to low- and middle-income countries.
The extent to which data-intensive technologies
can be deployed depends on the presence of key
infrastructure, most fundamentally network coverage.
There is also the challenge of bringing more people
online, especially in countries with a gap between the
number of people who have access to networks and
those who are online. This gap is a function of afford-
ability, the existence of local content, and digital skills
(see chapter 5). As more people and devices come
online and data usage matures, the network capacity
needed will grow, requiring sufficient spectrum to be
made available for mobile use—especially in lower-
income countries where mobile is the predominant
technology. Although 4G technology is sufficient for
many IoT uses, 5G will be needed for those uses that
require ultra-reliability and low latency such as smart
energy grids and autonomous vehicles. Connection
reliability also matters for DLT applications, which
must keep a consistent record of data.
Storage and analysis of the data generated through
IoT devices and platform business models depend
on cloud computing (remote storage and process-
ing infrastructure) and the ability to transmit data
over the internet to data centers either locally or
abroad. Beyond data infrastructure, most technology
applications require a suite of other foundational
systems to create value, including reliable payment
systems and logistics networks, transport infrastruc-
ture, and address systems. Data infrastructure policy
is discussed further in chapter 5.
Figure 3.2 summarizes how data are used as an
input to and produced as a by-product of economic
activity. It illustrates how data created through eco-
nomic activity can be used as an input to either the
same economic activity or new activities.
Focus on platform firms in low- and middle-income countries
Platform businesses, one of the most ubiquitous and
transformative data-driven models today, reduce
transaction costs and alleviate market failures.7
Ranging from start-ups to businesses operating at
scale, they are a mix of both locally grown and foreign
firms, and they are expanding across low- and middle-
income countries. More than 300 digital platforms
headquartered in Africa were active across major
Sub-Saharan African economies as of 2020.8
In Asia, a
study looking at local platforms that had reached scale
identified 62 major local platforms with an individual
market capitalization of at least US$800 million as of
2016, half of which were located in China.9
The diversity of new platforms is evident in recent
research examining both start-ups and scaled plat-
forms. At least 959 platform firms have established a
physical presence in a sample of 17 low- and middle-
income countries10
from all regions across four sectors
that are important for jobs or economic productivity:
e-commerce, transport and logistics (including both
freight and passenger transport), agriculture, and
Box 3.1 Technologies and methods that support data-driven
decision-making and intermediation (continued)
2025, with the introduction of fifth-generation (5G)
wireless technology.
Technology that creates transparency and trust in
data records: distributed ledger technology, including
blockchain
Distributed ledger technology (DLT) is a distributed
database in which data are recorded, shared, and syn-
chronized across the nodes (or devices) of a network.
Blockchain is a type of DLT in which information is
consolidated into “blocks” that are linked to one another:
new blocks can be added to the ledger, but existing ones
cannot be changed (an “append-only” structure).
Blockchain records transactions, tracks assets, or transfers
value between two parties in a verifiable and permanent
way without the need for a central coordinating entity.
Because everyone participating in the blockchain can see
all transactions, the technology engenders peer-to-peer
trust and has several applications, including enabling
payments, smart contracts, supply chain tracking, and
resolving data protection and security issues in the IoT.
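A minimal Python sketch can make the “append-only” idea concrete. It assumes a single node holding the whole ledger and links each block to its predecessor with a SHA-256 hash (from the standard `hashlib` module). The `Block` and `Ledger` classes and the sample transactions are hypothetical, and the sketch omits the networking and consensus mechanisms through which real distributed ledgers keep copies synchronized across nodes.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    data: str       # e.g., a record of one transaction (hypothetical content)
    prev_hash: str  # hash of the preceding block; this is what links the chain
    hash: str = ""

    def __post_init__(self) -> None:
        # Compute the block's own hash when it is created.
        if not self.hash:
            self.hash = self.compute_hash()

    def compute_hash(self) -> str:
        # SHA-256 over the block's contents, including the previous block's hash.
        payload = json.dumps(
            {"index": self.index, "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """A single-node, append-only chain of blocks (no networking or consensus)."""

    def __init__(self) -> None:
        # The first ("genesis") block has no predecessor.
        self.blocks: list[Block] = [Block(0, "genesis", "0" * 64)]

    def append(self, data: str) -> Block:
        # New information can only be added at the end of the chain.
        block = Block(len(self.blocks), data, self.blocks[-1].hash)
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        # Every block after the genesis block must still match its stored hash
        # and point to the hash of the block before it.
        for prev, cur in zip(self.blocks, self.blocks[1:]):
            if cur.hash != cur.compute_hash() or cur.prev_hash != prev.hash:
                return False
        return True


ledger = Ledger()
ledger.append("Seller A -> Buyer B: 1,000 kg maize")
ledger.append("Buyer B -> Seller A: payment confirmed")
print(ledger.is_valid())  # True

# Editing an earlier record breaks the chain, so the change is detectable.
ledger.blocks[1].data = "Seller A -> Buyer B: 900 kg maize"
print(ledger.is_valid())  # False
```

In this sketch, “cannot be changed” means that a change is detectable: an altered block no longer matches its stored hash, and recomputing that hash would in turn break the link held by the next block.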
a. MGI (2018).
tourism.11
In the sample, Bangladesh, Brazil, Indone-
sia, Kenya, and Nigeria have relatively high numbers
of platform firms when controlling for gross domestic
product (GDP) per capita (figure 3.3, panel a).
Across the countries in the sample, most platform
firms are recent entrants—55 percent were estab-
lished in the past five years.12
Only 11 percent of firms
were established more than 10 years ago. Firms also
tend to be small—over 80 percent have 50 or fewer
employees, and almost half (47 percent) have 10 or
fewer (figure 3.3, panel b). Most firms have remained
active (defined as having an active and up-to-date
online presence) since they were established; the
average share of firms currently active across regions
is more than 80 percent. Sub-Saharan Africa is an
outlier: nearly half of its firms appear to be inactive.13
E-commerce has the highest share of platform
firms in 82 percent of countries in the sample, with
the highest shares in South Asia and the Middle East
and North Africa and the lowest in Europe and Central
Asia. The agriculture sector tends to have the smallest
share of firms across regions, with the exception of
Sub-Saharan Africa. The importance of e-commerce
in the data economy is also reflected in web traffic.
Although local data-driven firms are on the rise in
low- and middle-income countries, foreign-headquar-
tered firms have a significant presence, underscoring
the global nature of the data-driven economy. Their
presence is also a reminder that the platform econ-
omy is still nascent in lower-income countries rela-
tive to high-income economies (partly due to issues
around trust, lack of digital skills, and lack of access
to finance). Of the top 25 websites in terms of traffic
in the 17 low- and middle-income countries sampled,
59 percent belong to firms with foreign headquarters
on average14
—however, the figure varies across coun-
tries (figure 3.4). Although the presence of firms from
high-income countries in lower-income countries is
Figure 3.2 The role of data in economic activity
[Figure: Data inputs for economic activity (data volunteered by users; open public data; data observed on users, objects, and processes through economic activity, including machine-generated data; and data inferred through analytics) feed platforms and other data-driven businesses using technologies that include big data analytics for insights and innovation (including using AI), IoT devices and smart devices, and transactions facilitated by DLT (including blockchain). These support economic activity by firms, entrepreneurs, and farmers. Data produced as a by-product of economic activity pass through big data cleaning and storage and become new data inputs; data are also used to inform advertising, a source of revenue in advertising-based business models. All of this is supported by network connectivity, storage capacity, and processing capacity.]
Source: WDR 2021 team.
Note: AI = artificial intelligence; DLT = distributed ledger technology; IoT = Internet of Things.
Figure 3.3 Platform firms are numerous in some lower-income countries but tend to be small
Sources: Nyman and Stinshoff (forthcoming), based on information from Crunchbase, Crunchbase (database), https://www.crunchbase.com/; World Bank, Digital Development (database), https://www.worldbank.org/en/topic/digitaldevelopment; Dow Jones and Company, Factiva (database), https://professional.dowjones.com/factiva/; Thomson Reuters Foundation, “Inclusive Economies,” http://www.trust.org/inclusive-economies/; Alexa Internet, “The Top 500 Sites on the Web, by Country” (accessed April 2020), https://www.alexa.com/topsites/countries; World Bank, World Development Indicators (database), https://datatopics.worldbank.org/world-development-indicators/.
Note: Panel a shows the number of platform firms and platform firms weighted by gross domestic product (GDP) per capita in selected low- and middle-income countries. The total sample of platform firms is 959. Per capita GDP is in constant 2010 US dollars for 2019. Panel b shows the share of firm sizes in terms of number of employees by region in a sample of 595 active platform firms.
[Panel a, “Number of platform firms and platform firms weighted by GDP per capita”: number of firms (left axis) and number of firms per US$1,000 GDP per capita (right axis) for Armenia, Morocco, Tunisia, Sri Lanka, Peru, the Philippines, the Arab Republic of Egypt, Colombia, Ukraine, Bangladesh, Kenya, South Africa, Malaysia, Nigeria, Indonesia, the Russian Federation, and Brazil. Panel b, “Distribution of platform firms, by region and number of employees”: share of firms (%) with 1–10, 11–50, 51–100, 101–250, and more than 250 employees in South Asia, East Asia and Pacific, Europe and Central Asia, the Middle East and North Africa, Latin America and the Caribbean, and Sub-Saharan Africa.]
Figure 3.4 The importance of domestic versus foreign-headquartered firms differs across countries as indicated by firm share of top websites
Source: WDR 2021 team, based on Alexa Internet, “The Top 500 Sites on the Web, by Country” (accessed April 2020), https://www.alexa.com/topsites/countries.
Note: The figure shows the percentage of websites for firms with foreign headquarters versus domestic headquarters among the top 25 websites per country based on traffic. Headquarters is understood to be the global headquarters, not the domestic or regional office. Total sample size is 425 websites.
[Figure: share of top websites (%) held by foreign firms and domestic firms in Indonesia, the Russian Federation, the Arab Republic of Egypt, Bangladesh, Nigeria, Morocco, Kenya, Armenia, South Africa, Malaysia, Ukraine, Tunisia, Brazil, Sri Lanka, Colombia, Peru, and the Philippines.]
widespread, the opposite is not true. Only 15 percent
of digital firms headquartered in Sub-Saharan Africa
operate outside the region, and the majority of those
have expanded to the Middle East and North Africa.15
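The 59 percent figure is a cross-country average. A short Python check, using the 17 foreign-firm shares plotted in figure 3.4 (listed here without their country labels), shows that it can be reproduced from the figure and that it is consistent with the 425-website sample described in the figure note. The variable names are illustrative.

```python
# Foreign-headquartered firms' share (%) of the top 25 websites in each of the
# 17 sampled countries, as plotted in figure 3.4 (country labels omitted).
foreign_share = [20, 24, 48, 52, 52, 56, 56, 60, 64, 64, 64, 64, 68, 68, 76, 80, 84]

SITES_PER_COUNTRY = 25  # top 25 websites per country

# One website is 100 / 25 = 4 percent of a country's top 25,
# so each share should be a whole multiple of 4.
assert all(share % 4 == 0 for share in foreign_share)

# Convert each share back into a count of foreign-owned websites.
foreign_sites = [share * SITES_PER_COUNTRY // 100 for share in foreign_share]
total_sites = SITES_PER_COUNTRY * len(foreign_share)

# Simple average across countries versus the share of all websites pooled.
mean_share = sum(foreign_share) / len(foreign_share)
pooled_share = 100 * sum(foreign_sites) / total_sites

print(total_sites, sum(foreign_sites), round(mean_share, 1), round(pooled_share, 1))
# 425 250 58.8 58.8
```

Because every country contributes exactly 25 websites, the simple average of country shares and the pooled share of all 425 websites coincide (about 58.8 percent, which rounds to the 59 percent cited in the text). With unequal numbers of websites per country, the two measures would generally differ.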
The leading global platforms are highly relevant to
the digital ecosystems of lower-income countries and
their citizens, particularly for online search and social
media. Google, YouTube (which is owned by Google),
and Facebook are among the top 10 most visited
websites in 62 of 77 low- and middle-income countries
(figure 3.5, panel a). These platforms also have the highest
average daily time spent on the site per user globally
(figure 3.5, panel b). In online markets where firms
compete for the attention of viewers, such popularity
can significantly intensify these platforms’ market
power in advertising (which is, in turn, important for
suppliers of other products) and increase the amount
of data being collected about users. Google’s Next Bil-
lion Users initiative is specifically aimed at develop-
ing products and services for lower-income countries.
Facebook has launched an app aimed at providing free
data in lower-income countries. WhatsApp (owned
by Facebook) is by far the most used mobile applica-
tion globally in terms of time.16
Because of the global
nature of these firms, dynamics in overseas markets
that affect the strategies and policies of these large
platforms will have repercussions for users and firms in
low- and middle-income countries.
Data traffic over the internet is also highly con-
centrated in a few companies. Six US companies
generate more than 40 percent of the world’s internet
data flows (figure 3.6, panel a). Across the top 25
websites (by traffic) in the 17 sampled countries
featured in figure 3.4, some 60 percent are owned by five
firms headquartered in the United States (Google,
Microsoft, Facebook, Verizon, and Amazon). Signif-
icant non-US parent companies include Naspers
(headquartered in South Africa), Alibaba (China), and
Jumia (which has its operations largely in Nigeria)—
see figure 3.6, panel b.
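The “more than 40 percent” figure can be checked against the values shown in figure 3.6, panel a. The Python snippet below simply sums the traffic shares of the six named firms, all headquartered in the United States; the dictionary layout is illustrative.

```python
# Share of global internet traffic by firm, 2018 (% of total), as shown in
# figure 3.6, panel a (Sandvine 2019).
traffic_share = {
    "Google": 12,
    "Netflix": 11,
    "Facebook": 8,
    "Microsoft": 5,
    "Apple": 4,
    "Amazon": 3,
    "Others": 57,
}

# The panel's shares should account for all traffic.
assert sum(traffic_share.values()) == 100

# All six named firms are headquartered in the United States.
us_firms = [firm for firm in traffic_share if firm != "Others"]
us_share = sum(traffic_share[firm] for firm in us_firms)

print(len(us_firms), us_share)  # 6 43
```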
Data inputs for economic activity
The “digital footprint” and data collection by firms
Everything a digital user does leaves a trail, whether
it is making a phone call, sending a text, conducting
an online search, posting on social media, or making
a digital transaction. The digital footprint of an indi-
vidual or business is their collection of traceable dig-
ital activities and communications on the internet or
other digital media. Data collected through devices—
particularly IoT devices—can also capture insights
on individuals and firms. For firms, such insights
are gained from data on throughput and efficiency,
spare capacity, and asset quality, among other things.
For individuals, they typically involve health and
biometric data.
Digital footprints can be actively created when
a user makes a choice to share information, such as
by posting on social media or volunteering informa-
tion to register for services. Or they can be passively
Figure 3.5 Users visit and spend more time on Facebook, Google, and YouTube than other websites
Source: WDR 2021 team, based on data from Alexa Internet, “The Top 500 Sites on the Web, by Country” (accessed April 2020), https://www.alexa.com/topsites/countries.
Note: Sample of 1,270 websites (top 10 websites in 127 countries).
[Panel a, “Share of countries where Facebook, Google, and YouTube are in the top 10 visited websites, by country income group” (percent; low-, middle-, and high-income countries). Panel b, “Daily time spent on website (global average): Facebook, Google, and YouTube versus other top 10 visited websites” (average daily time spent per user, minutes; Facebook.com, Google.com, YouTube.com, and others (average)).]
created, when data are left behind as a by-product
of other activities such as an Internet Protocol (IP)
address, search history, or internet clicks. Firms
typically collect both active and passive data. Often,
this information is collected at very high frequency
and at a very fine level of granularity.
Big data and ML algorithms enable firms to draw
inferences about the characteristics of individuals
(such as attitudes and socioeconomic status) and
other businesses (such as performance, capacity, and
reputation). In low- and middle-income countries,
digital footprints are best known for their ability to
predict repayment behavior. However, applications
also extend to the development of new products and
improvements in service delivery across the economy,
to the development of analytical tools for businesses,
and, importantly, to the creation of consumer pro-
files that can be used to sell advertising services—a
practice that ultimately subsidizes many of the “free”
products that consumers use today. A number of risks
have been identified and concerns raised about these
methods and applications. These are discussed later
in the chapter.
Data collected through mobile phone use have been
one of the fastest-growing sources of information on
users and their behavior. CDRs held by mobile network
operators (MNOs) contain certain data on every call
and text made, including the telephone number of the
caller and receiver, the date and time of the interac-
tion, and the associated cellphone tower. This infor-
mation is primarily collected for billing purposes but
can be used as well to identify the behavior, mobility
patterns, and social networks of users. MNOs can
also track data on use of value-added services, inter-
net services, and mobile money transactions.17
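As a rough illustration of how billing records can also reveal mobility patterns and social networks, the Python sketch below derives three simple indicators per user from a few hypothetical CDRs. The field names mirror the items listed above but are assumptions rather than any operator's actual schema. The sketch also assumes that the recorded tower is the caller's, and its indicators (distinct contacts, distinct towers, interactions initiated) are deliberately crude proxies.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    caller: str          # caller's telephone number (pseudonymous ID in this example)
    receiver: str        # receiver's telephone number
    timestamp: datetime  # date and time of the call or text
    tower_id: str        # cell tower associated with the interaction (assumed: caller's)


def summarize(records: list[CallDetailRecord]) -> dict[str, dict[str, int]]:
    """Derive simple per-user indicators from billing records."""
    contacts: dict[str, set[str]] = defaultdict(set)
    towers: dict[str, set[str]] = defaultdict(set)
    calls_made: dict[str, int] = defaultdict(int)
    for r in records:
        # Social network: treat each interaction as a link between two users.
        contacts[r.caller].add(r.receiver)
        contacts[r.receiver].add(r.caller)
        # Mobility: distinct towers from which a user initiated interactions.
        towers[r.caller].add(r.tower_id)
        # Behavior: how many interactions a user initiated.
        calls_made[r.caller] += 1
    return {
        user: {
            "contacts": len(contacts[user]),
            "towers": len(towers[user]),
            "calls_made": calls_made[user],
        }
        for user in sorted(contacts)
    }


records = [
    CallDetailRecord("A", "B", datetime(2020, 4, 1, 9, 0), "T1"),
    CallDetailRecord("A", "C", datetime(2020, 4, 1, 18, 30), "T2"),
    CallDetailRecord("B", "C", datetime(2020, 4, 2, 8, 15), "T3"),
]
print(summarize(records)["A"])  # {'contacts': 2, 'towers': 2, 'calls_made': 2}
```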
Data-driven firms—including e-commerce, online
search, and social media firms—produce, in addition,
data on behavioral patterns that can be detected by
noting the time, frequency, and extent of transactions
or communications made. Several variables that can
help predict economic status are available to platform
firms, including the device type of their users (desk-
top, tablet, mobile), the operating system (Windows,
iOS, Android), and the channel through which a user
arrived at the firm’s home page. For example, having
an iOS device consistently correlates with being in the
Figure 3.6 Internet traffic in low- and middle-income countries is concentrated in several US-based firms
a. Share of global internet traffic by firm, 2018 (% of total)
[Panel a: Google, 12; Netflix, 11; Facebook, 8; Microsoft, 5; Apple, 4; Amazon, 3; Others, 57.]
Source: Sandvine 2019. Data at http://bit.do/WDR2021-Fig-3_6_a.
Note: “Others” refers to file sharing, marketplace, security, virtual private network (VPN), and messaging, as well as cloud and audio streaming.
b. Ten most visited parent firms among the top 25 websites
[Panel b: share of top websites owned by firm (%, left axis) and average rank (right axis) for Google, Microsoft, Facebook, Verizon, Amazon, Zoom, Alibaba, Naspers, and Jumia.]
Source: Nyman and Stinshoff (forthcoming), based on information in Alexa Internet, “The Top 500 Sites on the Web, by Country” (accessed April 2020), https://www.alexa.com/topsites/countries.
Note: The panel shows the percentage of the top 25 websites owned by the listed parent organizations in the sample. Traffic rank is from a low of 1 (most traffic) to a high of 20 (least traffic). Total sample includes 425 websites from 17 low- and middle-income countries. Ownership is defined as majority shareholding.
top quartile of the income distribution in the United
States.18
A study in Germany found that the time of
day that purchases were made on an e-commerce
site was predictive of a consumer’s self-control and
repayment behavior. People who made purchases
between noon and 6 p.m. were half as likely to default
on their payment for the products bought as those
who made purchases between midnight and 6 a.m.
Consumers coming from a price comparison website
were roughly half as likely to default as customers
directed to the site via search engine ads, consistent
with research on impulse shopping.19
Finally, firms collect data that are generated as
a result of phenomena that are not attributable to a
specific individual or business. These data are often
generated by machines or devices and can include
data on traffic, weather and climatic conditions, and
network usage in the case of computing or network
industries. This Report highlights many examples of
using this type of data for development purposes.
The use of open public intent data by businesses
Public intent data are also used by businesses for
commercial purposes, particularly where those data
are nonpersonal and where there are positive spill-
overs from the private use of such data. The use
of open public intent data by private firms is
prevalent in advanced economies with well-developed national
data systems, although there are also examples from
low-income and emerging economies. Spotlight 3.1
discusses these uses in greater detail.
The positive development impacts of data used in the production process
This section presents examples of potential positive
development impacts that data used in firms’ pro-
duction processes can facilitate (following the third
development pathway of data generated by private
firms described in chapter 1). It then turns to the
risks stemming from the use of data by firms that
policy makers need to consider. These examples are
included to help illustrate what the production
possibility frontier might look like going forward.
Although many of the examples may be outliers, they
can indicate what may be possible and what countries
may want to aspire to.
Boosting opportunities for lagging regions and popu-
lations by reducing market fragmentation. Data-driven
business models can lower entry costs to markets
and provide new opportunities for small firms and
low-income households.20
Firms in less populated
areas can expand their access to markets through
data-driven platforms, which match sellers and buy-
ers, and through the logistics services of e-commerce
platforms, which provide support in bringing prod-
ucts to market from more remote areas. Costs associ-
ated with distance are as much as 65 percent lower for
online vendors active on global e-commerce platforms,
compared with those for offline vendors.21
Lower-
income countries could reap substantial benefits
from such e-commerce platforms: the cost-reduction
effects of platforms tend to be greater for exporting
countries that are unknown or less trustworthy to
consumers (as measured by corruption indexes).22
In China’s rural (and traditionally lower-income)
Taobao Villages, where annual e-commerce transac-
tion volumes exceed 10 million yuan and at least 10
percent of households engage in e-commerce, rural
households trading goods on the Taobao platform
have significantly higher incomes and higher income
growth than those that do not.23
These data are sug-
gestive of the large benefits of data-driven business
models in lagging regions, but the evidence base
is just beginning to develop, and not all signs are
encouraging. For example, a recent randomized trial
in China that connected rural villages to e-commerce
found little evidence of income gains for rural
producers and workers. Understanding why Taobao Villages
succeeded while other Chinese villages saw meager gains
is a crucial part of the future research agenda.
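The two thresholds that define a Taobao Village in the text can be written as a simple classification rule. The Python sketch below encodes them as stated (annual e-commerce transactions above 10 million yuan and at least 10 percent of households engaged in e-commerce). The `Village` record and the sample villages are hypothetical, and the sketch does not claim to reproduce any official designation procedure.

```python
from dataclasses import dataclass

# Thresholds as described in the text (hypothetical encoding; any official
# designation may apply additional or different criteria).
MIN_ANNUAL_VOLUME_YUAN = 10_000_000  # annual e-commerce volume must exceed this
MIN_HOUSEHOLD_SHARE_PCT = 10         # at least this share of households must engage in e-commerce


@dataclass
class Village:
    name: str
    annual_ecommerce_volume_yuan: int
    households: int
    ecommerce_households: int


def is_taobao_village(v: Village) -> bool:
    if v.households <= 0:
        return False
    # Integer comparison avoids floating-point rounding at exactly 10 percent.
    meets_share = 100 * v.ecommerce_households >= MIN_HOUSEHOLD_SHARE_PCT * v.households
    meets_volume = v.annual_ecommerce_volume_yuan > MIN_ANNUAL_VOLUME_YUAN
    return meets_volume and meets_share


villages = [
    Village("Village 1", 12_000_000, 500, 60),   # 12 percent of households
    Village("Village 2", 12_000_000, 500, 40),   # 8 percent: share too low
    Village("Village 3", 10_000_000, 500, 100),  # volume does not exceed threshold
]
print([is_taobao_village(v) for v in villages])  # [True, False, False]
```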
Creating new export opportunities. Not only can
entrepreneurs market their goods remotely, but
intangible data-enabled services now flow across
borders. The boom in data-enabled services creates
opportunities for new entrants in global trade and
may foster economic growth for countries tradition-
ally lagging in access to global markets. For example,
the rise in Indian services exports has been associated
with gains in per capita income and a decline in urban
and rural poverty head count ratios.24
The Indian dig-
ital services industry has also created employment,
especially for women and in smaller cities with
populations of about 1 million or less, which can help
bridge economic and social inequalities.25
The Philippines is another important beneficiary,
exporting an estimated US$23 billion in services
enabled by information technology (IT)—equivalent
to almost half of the country’s merchandise trade
exports and more than double its total agricultural
exports.26
In Africa, Senegal, a low-income country,
boasts a dynamic digital services sector focused on
business services as well as apps and software
solutions geared toward regional markets. Mauritius
has developed a fast-growing export-oriented digital
business services industry, diversifying its services-
centered economy away from the country’s tourism
sector.27
Such expansion into digital services has
proven important in light of the tourism collapse pro-
voked by the COVID-19 pandemic. Studies show that
trade in services, in particular IT-enabled business
services, is not as prone to sudden collapses as other
forms of global trade.28
Promoting productivity and growth. Although evi-
dence from lower-income countries is scant, across
four industries (hotels, restaurants, taxis, and retail
trade) in 10 member countries of the Organisation for
Economic Co-operation and Development (OECD),29
the average service provider enjoyed bigger increases
in the overall productivity of labor and capital in
countries with relatively high online platform devel-
opment between 2011 and 2017.30
Increased e-sales
activities accounted for 18 percent of the labor pro-
ductivity growth in 14 European countries from 2003
to 2010.31
There is also some evidence from Europe of
a smaller productivity gap between large and small
firms in the sectors that use online sales most.32
Machine learning could fundamentally revolu-
tionize innovation, bringing data to the core of the
growth process.33
Empirical evidence on 18,000 US
manufacturing plants between 2005 and 2010 finds
that firms with more intensive data usage are significantly
more productive, not only because of technology
adoption but also because of data-driven decision-making.34
Globally, AI could deliver additional economic output
of about US$13 trillion between 2018 and 2030 (net-
ting out competition effects and transition costs),
boosting global GDP by about 1.2 percent a year,
according to a 2018 modeling exercise.35
Of course, not all countries will be affected sim-
ilarly, and the evidence base for low- and middle-
income countries needs to be developed. The focus
in this chapter on data-driven business models and
productivity enhancement through data reveals an
imbalance toward case studies from higher-income
countries. That imbalance is itself a sign of the lop-
sided distribution of benefits deriving from data.
Creating new markets and disrupting the market
position of traditional incumbents. E-money platforms,
among others, have challenged traditional banks
and other service providers in transferring funds
to and from accounts. For example, M-Pesa reached
9.5 million customers within its first three years in
operation in Kenya, a country with only 8.4 million
bank accounts.36
Electronic freight exchanges such
as uShip and Mober that match carriers with cargo
holders have prompted global logistics providers such
as Schenker and DHL to develop their own digital
exchanges. Incumbent taxi providers who were pro-
tected by fixed license caps are now exposed to com-
petition from ride-hailing apps, which can improve
inclusion of some parts of the population. For exam-
ple, in Mexico City the proportion of female drivers
with Uber (5 percent) is higher than in Mexico’s taxi
industry (0.5–2.5 percent).37
A 2017 study of 2,000 firms
in 60 countries found that digital entrants boost the
size of an industry by both realizing latent demand
and taking market share from incumbents.38
Indeed,
the study showed that the profits of incumbents fall
significantly in response to competition from digital
entrants, and the slowest-growing incumbents are
the most affected. These effects should translate into
welfare benefits for users through more affordable
products and greater innovation, although the ben-
efits are less likely felt by the lowest income groups
because they do not participate in these markets as
consumers or suppliers.
Supporting local entrepreneurship ecosystems. An eco-
system is an interconnected set of services accessed
through a single integrated experience. Facebook,
for example, enables users to shop, make hotel
bookings, message contacts, read the news, and chat
with a doctor—all with different firms but through a
single interface. Successful data-driven firms often
spark new business models through complementary
products and aftermarkets. Such local ecosystems,
consisting of symbiotic and interdependent firms,
frequently emerge around leading multinational
platforms. The leading global platforms are therefore
highly relevant for the digital ecosystems of countries
and their citizens.
Ecosystems built around larger firms can enable
lower-income economies to build digital capabilities,
especially because they integrate data across a series
of services to increase the scale and scope of datasets.
For example, Apple’s HealthKit platform offers Apple
device users the option to share their health and activ-
ity data across applications on their smartphones.
This integration allows researchers, hospitals, and
developers of health care and fitness apps to access
valuable data to inform patient care, marketing, and
product development. The development of a healthy
ecosystem relies on provision of access to data and
systems.
Related opportunities arising from
data-driven business models
Three related opportunities can arise from data-
driven business models that are not linked directly to
the four channels discussed earlier. Evidence is just
102 | World Development Report 2021
emerging, but it indicates that these models could be
important for lower-income countries.
Providing nominally “free” services to consumers. Nom-
inally “free” or “zero price” services have become an
integral part of our lives. Free messaging services,
video communication, social connection tools, search
engines, map services, storage, and translation and
payment apps are now commonplace and increas-
ingly being developed with low-income consumers in
mind. Google is rolling out scaled-down search and
e-mail apps for low-end smartphones, as well as voice
search in various dialects to overcome literacy chal-
lenges. Free services also offer important inputs for
other data-driven businesses. Digital start-ups often
rely on integration with digital payment systems,
cloud storage, and online analytical tools. For trans-
formative business models that match and connect
users in different geographic locations, online map
services are a crucial input.
The welfare gains of nominally “free” digital goods
are substantial. However, they are likely to be under-
estimated because they are not captured in GDP—
they have a nominal price of zero. A recent study
found that including the welfare gains from Facebook
would add between 0.05 and 0.11 percentage points to
GDP growth per year in the United States.39
“Free” services reflect the very low marginal cost
of replicating and distributing data and certain dig-
ital services. But they are ultimately made possible
on a large scale because firms can monetize data
through advertising and data sales, thereby giving
rise to the idea that users in fact “pay with their data.”
This approach is reflected in the revenue structure of
some of the tech giants. Mobile advertising made up
84 percent of Google’s total revenue in 2019,40
while
Alibaba earns more than half its global revenues from
advertising.41
Both Google and Facebook have offered free
internet access in lower-income countries. Facebook’s
schemes have been criticized for mining the data of
low-income users while initially defying net neutral-
ity rules and offering access to only a limited set of
sites.42
Meanwhile, Google attempted to monetize its
free Wi-Fi service (rolled out in nine middle-income
countries) by showing ads to users, but it recently
discontinued these services because they had proven
unprofitable.43
More limited opportunities in low- and middle-
income countries for monetizing data may there-
fore limit the ability of firms operating locally to
offer free services. Facebook’s average revenue per
user in the United States and Canada was US$41.41
in 2019, whereas it was US$2.48 in all countries
except the United States, Canada, and European
and Asia-Pacific countries. Firms focused on lower-
income countries may find it difficult to replicate the
free services offered by firms that operate globally
and can cross-subsidize their operations with global
advertising.
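To put the two quoted figures in proportion (assuming both refer to the same reporting period), a simple ratio shows the size of the gap:

```latex
\frac{\text{ARPU}_{\text{US and Canada}}}{\text{ARPU}_{\text{rest of world}}}
= \frac{41.41}{2.48} \approx 16.7
```

Put differently, at these rates a firm would need roughly 17 users outside these regions to generate the advertising revenue of a single user in the United States or Canada, which is the gap that cross-subsidization from global advertising can bridge.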
Adapting to new ways of doing business because of the
COVID-19 pandemic. Urgent demands have surfaced
for the private sector to adopt data-driven solutions
to deal with the pandemic and increase resilience
and productivity for recovery. Firms will also need to
increasingly invest in the transparency and traceability
of value chains, increase their reliance on automation
in the production process, and make more precise
predictions about their demand and input supply that
anticipate disruptions. Data-driven technologies will
play a critical role in helping firms adapt. Smart con-
nected devices and robots that automate previously
manual processes while collecting and analyzing data
will serve as a key input in this reengineering of busi-
ness processes during the recovery. AI that can pre-
dict consumption and production trends, combined
with platforms that provide matching through data
analysis, may also create on-demand labor forces.
Although the intensifying adoption of data-driven
business models can be an opportunity, these trends
also hold risks for the international competitiveness
of those countries not at the technological frontier,
with implications for jobs and inequality.
Impacts on the green and sustainability agenda. The
increased use of data-intensive technologies con-
tributes to global carbon emissions. And yet these
technologies can also help firms better manage their
environmental footprint and become sustainable,
while allowing sectors such as agriculture to adapt
to climate change. By improving the efficiency and
traceability of supply chains and production pro-
cesses, these technologies can reduce waste, enable
circular solutions, promote sustainable sourcing
of inputs, and empower consumers to make more
environmentally responsible decisions. By making
energy systems more efficient (including through
automated tracking of energy use), they can facilitate
the adoption of renewable energy through better
management of performance. Data-driven farming
can help farmers adapt to climate change while
rationalizing use of harmful inputs. However, the
net impact of such technologies on the environment
will depend on several factors, including responsible
actions by consumers and the decarbonization of the
energy sector.
How use of data in the production
process is transforming sectors
New business models that use data to drive value are
springing up in low- and middle-income economies.
The data and technology that can be most transfor-
mative depend on the types of market failures that
need to be solved and the development channels that
are possible. This differs across sectors.
Finance
Some 1.7 billion adults worldwide did not have a bank
account as of 2017.44
At least 200 million small firms in
low- and middle-income countries have unmet credit
needs estimated at US$2.2 trillion.45
Several market
failures are to blame. First, the high cost of traditional
banking relative to the low-value transactions and
balances of low-income individuals makes it less via-
ble or attractive for traditional banking to serve this
market segment. Second, information asymmetries
between financial institutions and low-income bor-
rowers make it difficult to assess credit risk, thereby
limiting the supply and raising the price of credit.
Finally, formal financial services lack relevant prod-
ucts and services for low-income users. Digitization
and data analytics can help overcome these chal-
lenges to make services more accessible, affordable,
and secure.
Alternative credit scoring algorithms. Financial ser-
vice providers are increasingly adopting alternative
credit scoring techniques that take advantage of
users’ digital footprints to train ML algorithms to
identify, score, and underwrite credit for individ-
uals who otherwise lack documentation of their
creditworthiness.
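The mechanics can be illustrated with a deliberately simplified sketch: a logistic regression, trained by plain gradient descent, that turns a few digital-footprint features into an estimated probability of repayment. The three features, the toy training data, and all function names are hypothetical. The providers discussed here do not publish their models, and production systems rely on far richer data, out-of-sample validation, and fairness checks.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent on log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one borrower: predicted probability minus outcome.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def repayment_probability(w, b, x) -> float:
    """Score one applicant with the fitted weights."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features, each scaled to [0, 1]:
# [regularity of airtime top-ups, mobile-money activity, tenure of phone number]
X = [[0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.7, 0.9, 0.6],
     [0.2, 0.1, 0.3], [0.3, 0.2, 0.1], [0.1, 0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]  # made-up outcomes: 1 = repaid, 0 = defaulted

w, b = train_logistic(X, y)
print(f"{repayment_probability(w, b, [0.85, 0.7, 0.8]):.2f}")  # rich digital footprint
print(f"{repayment_probability(w, b, [0.15, 0.2, 0.2]):.2f}")  # thin digital footprint
```

Because such a model learns whatever patterns its training data contain, features that act as proxies for gender, ethnicity, or social ties can reproduce the discrimination concerns raised later in this section.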
Two early movers that have achieved scale—
Lenddo (Philippines) and Cignifi (operating in
Africa, Asia, and Latin America)—use data that
consumers volunteer about their cellphone use
patterns, digital transactions, and social media and
web browsing activity to build algorithms that map
behavioral patterns and score the creditworthiness
of borrowers.
Payment and transaction histories have also
enabled e-commerce firms to move into lending.
Ant Financial’s MYbank app links directly to users’
Alibaba transaction data to score and extend
credit.46
Amazon’s small business loan operation (which
operates in China, India, and other countries) relies
on a seller’s sales performance on Amazon to decide
whether to extend credit. Destácame, a Chile-based
alternative credit scoring start-up and the first of its
kind in Latin America, uses data on utility payments
for its credit scoring.
Psychometric tests, which assess the abilities, atti-
tudes, and personality traits of individuals, are also
being used to screen borrowers. LenddoEFL provides
financial institutions with psychometric tools that
analyze applicants’ answers on an online quiz, includ-
ing factors such as how long it takes applicants to
answer and how they interact with the web interface.
LenddoEFL claims to have made more than 12 mil-
lion credit assessments through more than 50 client
financial institutions around the world.47
Notwithstanding the opportunities these
approaches offer, observers have raised concerns
that using algorithms in this way can discriminate
against individuals and reinforce existing racial, gen-
der, and economic inequalities. Lenddo, for example,
has been known to rate consumers as less credit-
worthy if they are friends on Facebook with someone
who was late paying back a loan.48
Algorithmic bias is
discussed later in this chapter and in chapters 4 and 6.
Payment systems. Digital payments are by defi-
nition flows of electronic data. They are central to
powering e-commerce and other online transactions,
while simultaneously generating data on purchasing
patterns that can provide insights into a plethora of
consumer characteristics.
Mobile payments in particular have reduced the
transaction costs of transferring resources, enabling
new ways for households and firms to make
payments, save, and send remittances. The well-
documented benefits of mobile money in lower-
income countries include lowering transaction and
transport costs; encouraging saving through the
relatively safe storage of value in a digital format;
empowering female users through greater privacy,
thereby increasing their bargaining power within
families; and allowing more effective risk sharing
between households.49
In India, mobile money has
improved the ability of households to share risk,
providing welfare benefits of 3–4 percent of income
on average.50
Research also has found a significant
link between the use of the mobile financial service
M-Pesa and a reduction in poverty among Kenyans,
with greater impacts on female-headed households
through changes in financial behavior and movement
of labor from subsistence farming to secondary jobs
and entrepreneurship.51
Use of transaction data for product development. Digi-
tal payments generate large amounts of data on how
people make purchases and transfers, which can be
especially important in economies that run largely
on cash and among demographic groups that have a
small digital footprint. These data can enable firms to
see which regions and market segments are expand-
ing, understand user preferences and behavior to
target services such as microcredit, and predict fraud
and increase security within and between platforms.
For example, Mastercard’s Tourism Insights service
allows the tourism industry to make better invest-
ments by leveraging big data to provide information
on travelers’ preferences.52
In South Africa, TymeBank
offers customers incentives to link their debit cards
to their retail loyalty programs, providing access to
data on customer spending that are used for product
design.
Distributed ledger technology (DLT), including blockchain.
Blockchain can eliminate the need for financial
intermediaries, drastically reducing settlement time and
making transfers almost instantaneous. DLT can also
embed rules into smart contracts, including the automated
execution of contract terms. Because the terms and
payments recorded on a ledger are explicit, DLT can simplify
complex negotiation and verification processes.53
DLTs’ use of smart contracts in the provision of loans
and credit can also improve trust. This is especially
important for new and smaller firms that lack the
requisite credit histories and collateral.
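The idea of automated contract execution can be sketched as a small state machine. This is a toy model in plain Python under stated assumptions, not code for any real DLT platform: the class name, the `"delivered"` trigger, and the parties are hypothetical, and an actual smart contract would run on a shared ledger where no single party could rewrite the event log.

```python
from dataclasses import dataclass, field

@dataclass
class ConditionalPaymentContract:
    """Toy smart contract: funds are committed at creation and released
    automatically once an agreed condition is recorded."""
    buyer: str
    seller: str
    amount: float
    condition: str                      # hypothetical trigger, e.g. "delivered"
    released: bool = False
    log: list = field(default_factory=list)

    def record_event(self, event: str) -> None:
        # Events are only ever appended, never edited in place.
        self.log.append(event)
        # Automated execution: no intermediary decides whether to pay,
        # and the payment can fire at most once.
        if event == self.condition and not self.released:
            self.released = True
            self.log.append(f"paid {self.amount:.2f} from {self.buyer} to {self.seller}")

contract = ConditionalPaymentContract("retailer", "farmer_coop", 1200.0, "delivered")
contract.record_event("shipped")
print(contract.released)   # False: the condition has not been met yet
contract.record_event("delivered")
print(contract.released)   # True: payment executed automatically
print(contract.log)
```

Recording `"delivered"` a second time leaves the contract unchanged apart from the log entry, which mirrors the property that a smart contract executes its terms exactly once.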
Despite the promise of blockchain, there are seri-
ous challenges to its widespread adoption, including
unclear or unfavorable regulatory approaches and
lack of user understanding. Adopting blockchain
where the technology does not address the underly-
ing issue or consumer needs is also problematic.
Agriculture
Managing production and marketing risks is a key
challenge for smallholder farmers and agribusinesses.
Remote sensing and geographic information sys-
tems, together with data analytics, provide insights
into farming operations and propel the development
of smart farming, which can help manage production
and financial risks. For example, NubeSol, an Indian
agtech firm, provides sugarcane growers with a
monthly yield map of their plots, with forecasts of
yields and recommendations on inputs such as fertil-
izer based on remote sensing and data analytics.
JD Digits (JDD), a technology firm in China, is
adopting AI techniques and big data to provide credit
to farmers who raise pigs. Farms install AI-enabled
cameras that can recognize pigs’ faces, as well as IoT
technology to transmit data about the farms’ physical
conditions. If a pig with feeding abnormalities has
been identified, the algorithm can quickly extract
information about its growth history and immune
status to provide customized feeding care. The IoT
system adjusts farm conditions such as humidity,
temperature, and lighting based on real-time data on
the farm. Using information about farm operations,
JDD also carries out credit assessments to provide
farmers with loans, an approach that has reduced the
nonperforming loan ratio of this lending to nearly zero.
Platforms are using data as well to provide a
range of services and products along the value chain,
including by reducing idle capacity in machinery.
Hello Tractor, which emerged in Nigeria, operates a
platform connecting tractor owners and farmers who
lack their own equipment. Data about tractor loca-
tions and availability are monitored using an installed
device and then transmitted to Hello Tractor’s mobile
app platform, which farmers can use to submit a
booking request. In this way, farmers are able to find
the most cost-effective available tractor, and tractor
owners are able to monitor the use of equipment.
Another agriculture platform, DigiCow, pioneered
in Kenya, keeps digital health records on cows and
matches farmers with qualified veterinary services.
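The matching step behind a platform such as Hello Tractor can be sketched as choosing the available tractor with the lowest total cost to the farmer. The cost model (a per-hectare rate plus a per-kilometer travel charge), the field names, and the coordinates are assumptions for illustration; the source does not describe Hello Tractor's actual pricing or matching logic.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def cheapest_tractor(tractors, farm_lat, farm_lon, hectares, travel_cost_per_km=1.5):
    """Return (tractor_id, total_cost) for the cheapest available tractor, or None.
    Hypothetical cost model: per-hectare rate plus a per-kilometer travel charge."""
    best = None
    for t in tractors:
        if not t["available"]:  # availability as reported by the tractor's tracking device
            continue
        km = haversine_km(t["lat"], t["lon"], farm_lat, farm_lon)
        cost = t["rate_per_ha"] * hectares + travel_cost_per_km * km
        if best is None or cost < best[1]:
            best = (t["id"], cost)
    if best is None:
        return None
    return best[0], round(best[1], 2)

tractors = [
    {"id": "T1", "lat": 9.10, "lon": 7.42, "rate_per_ha": 30.0, "available": True},
    {"id": "T2", "lat": 9.50, "lon": 7.90, "rate_per_ha": 25.0, "available": True},
    {"id": "T3", "lat": 9.08, "lon": 7.40, "rate_per_ha": 20.0, "available": False},
]
print(cheapest_tractor(tractors, 9.08, 7.40, hectares=5))
```

In this made-up example T2 charges less per hectare, but its much longer trip makes T1 cheaper overall, and T3 is skipped because it is already booked.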
Integrated, data-focused solutions are emerging
along the whole agriculture value chain. Digifarm,
a mobile platform offered by Safaricom in Kenya,
provides farmers with one-stop access to a suite of
products, including financial and credit services,
quality farm products, and customized information
on best farming practices. Mobile money data from
M-Pesa and data on the way people behave on the app
are taken into consideration to provide farmers with
tailored products and services.
As agriculture supply chains become more com-
plex, margins imposed by different intermediaries
mount, which raises the prices paid by consumers
and depresses the income earned by farmers. Food
traceability concerns also increase.54
Data-based solu-
tions can improve food traceability, while disrupting
traditional market structures by reducing the need
for intermediaries. In Haiti, blockchain solutions
have allowed mango farmers to maintain ownership
of their produce until the final sale to US retailers
by facilitating traceability and direct payments.
When these solutions are employed in conjunction with
other value chain components such as third-party
logistics services, they circumvent intermediaries that
previously held substantial market power. Customers can scan a QR
code on the final product to access information about
where the mango comes from, how it was packaged
and transported, and the costs involved at each step.55
Similarly, Walmart has collaborated with IBM to
trace mangoes from South and Central America to the
United States. Participants in this process cannot edit
information because of the decentralized nature
of blockchain technology, which supports trust and
transparency.56
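Why participants cannot quietly alter earlier entries can be illustrated with a minimal hash chain, in which each record's SHA-256 hash also covers the previous record's hash. This sketch shows only tamper evidence. A real blockchain, such as those used in the mango pilots described above, also replicates the ledger across parties and uses a consensus protocol, both of which are omitted here. The record fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # arbitrary starting value for the first link

def record_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous record's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(records):
    chain, prev = [], GENESIS
    for rec in records:
        h = record_hash(rec, prev)
        chain.append({"record": rec, "prev": prev, "hash": h})
        prev = h
    return chain

def verify(chain) -> bool:
    """Recompute every link; any edited record breaks the chain from that point on."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or record_hash(block["record"], prev) != block["hash"]:
            return False
        prev = block["hash"]
    return True

steps = [
    {"step": "harvest", "origin": "farm_A", "cost_usd": 0.10},
    {"step": "packing", "site": "packhouse_B", "cost_usd": 0.05},
    {"step": "shipping", "carrier": "carrier_C", "cost_usd": 0.20},
]
chain = build_chain(steps)
print(verify(chain))                    # True
chain[1]["record"]["cost_usd"] = 0.01   # someone edits a past step...
print(verify(chain))                    # False: the edit is detected
```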
Personal data protection has specific complexities
for farmers. Data on their farms are identifiable and
could be used to reveal personal details such as their
wealth and income. However, farmers could also
benefit from using and pooling their data to develop
commercial insights. The governance regime for agri-
culture thus requires special considerations (see the
further discussion of governance issues in chapter 8).
Health
To deliver individual health care in lower-income
countries, data-driven applications require comple-
mentary improvements in infrastructure and basic
health services before they can become truly trans-
formational. Some business models show promise
in helping overcome such challenges, which include
high logistical costs, counterfeiting of pharmaceuti-
cals, difficulties in coordinating health care resources,
and low supplies of specialist expertise, especially in
rural areas. With strong mobile phone penetration,
rising investment in digitizing health informa-
tion, and developments in cloud computing, more
health-focused businesses in low- and middle-income
countries are likely to adopt data-intensive advances
in coming years. Although such advances hold promise,
the sensitive nature of health data implies an acute
need for policy makers to be aware of the risks posed
by the improper collection and use of these data.
Telehealth (mHealth and eHealth). Telehealth makes
use of data and connected devices to deliver care
remotely. In rural areas where the ratio of doctors to
patients is low, telehealth is a useful way to access
consultations and disease diagnosis. The model has also
played an important role during the COVID-19 pandemic,
when remote diagnosis has been necessary.
Mobile apps combined with AI technology and
wearable devices can provide in-the-field diagnoses
and recommendations. For example, Colorimetrix, an
app that allows a smartphone camera to read results
from color-based tests for diabetes, kidney disease,
and urinary tract infections, was designed specifically
with lower-income economies in mind. Algorithms
are used to compare the result with stored calibra-
tion values. Results are delivered to the smartphone,
allowing for further analysis of results for trends. The
hope is that such apps will eventually also be able to
detect HIV, malaria, and tuberculosis.57
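Comparing a reading with stored calibration values can be as simple as linear interpolation between the two nearest calibration points. The calibration table below is invented for illustration, and the source does not specify Colorimetrix's actual algorithm; real apps must also correct for lighting and differences between phone cameras.

```python
from bisect import bisect_left

# Hypothetical calibration table: (measured color intensity, known concentration).
# Values are illustrative only, not Colorimetrix data.
CALIBRATION = [(40.0, 0.0), (80.0, 2.5), (120.0, 5.0), (160.0, 10.0), (200.0, 20.0)]

def estimate_concentration(intensity: float) -> float:
    """Linearly interpolate between the two nearest calibration points,
    clamping readings that fall outside the calibrated range."""
    xs = [x for x, _ in CALIBRATION]
    if intensity <= xs[0]:
        return CALIBRATION[0][1]
    if intensity >= xs[-1]:
        return CALIBRATION[-1][1]
    i = bisect_left(xs, intensity)          # first calibration point >= intensity
    (x0, y0), (x1, y1) = CALIBRATION[i - 1], CALIBRATION[i]
    return y0 + (y1 - y0) * (intensity - x0) / (x1 - x0)

print(estimate_concentration(100.0))  # halfway between the 80 and 120 calibration points
```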
Accuhealth Chile monitors patients in remote
areas by using a range of connected medical sensors.
Both quantitative data on patients’ progress and
qualitative data collected through custom-created
questionnaires are sent to a virtual clinical service
that conducts patient triage based on algorithmic
analysis. Accuhealth is also using predictive algo-
rithms to make service delivery more cost-efficient.
In Cameroon, CardioPad was locally designed to
improve the access of patients living in rural areas
to cardiovascular health care. The CardioPad tablet is
paired with sensors that collect data on the patient’s
health statistics and transmit them over a mobile
network to hospitals where cardiologists can make a
diagnosis.
Drug verification. Substandard or falsified medical
products will be an urgent health care challenge in
the next decade, according to the World Health Orga-
nization (WHO).58
An estimated one-tenth of medical
products in low- and middle-income countries, partic-
ularly antimalarials and antibiotics, are substandard
or falsified.59
Mobile authentication services such as mPedigree
offer people in countries such as Ghana, India, Kenya,
and Nigeria an easy way to check the authenticity
of medicine. Launched in Ghana in 2007, mPedigree
allows pharmaceutical manufacturers to add a code
to their packaging that consumers can then ver-
ify using their mobile phones. mPedigree has also
begun using its consumer authentication data to
monitor for anomalies in real time so that it can then
generate warnings to brand owners, regulators, and
consumers.60
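A minimal sketch of code-based verification with basic anomaly monitoring follows. It assumes one-time codes and treats a code that is checked more often than a threshold as a sign of possible cloning onto counterfeit packs. The class, threshold, and messages are hypothetical and do not describe mPedigree's actual system.

```python
from collections import Counter

class CodeRegistry:
    """Toy model: manufacturers register codes printed on packs, consumers check them."""

    def __init__(self, max_checks: int = 3):
        self.valid_codes = {}       # code -> product description
        self.checks = Counter()     # code -> number of consumer checks so far
        self.max_checks = max_checks

    def register(self, code: str, product: str) -> None:
        self.valid_codes[code] = product

    def verify(self, code: str) -> str:
        self.checks[code] += 1
        if code not in self.valid_codes:
            return "WARNING: unknown code, product may be falsified"
        if self.checks[code] > self.max_checks:
            # Repeated checks of one code can mean it was copied onto fake packs.
            return "ALERT: code checked unusually often, possible cloning"
        return f"OK: genuine {self.valid_codes[code]}"

    def anomalies(self):
        """Codes checked more often than the threshold, for brand owners and regulators."""
        return sorted(c for c, n in self.checks.items() if n > self.max_checks)

registry = CodeRegistry(max_checks=2)
registry.register("A1B2C3", "antimalarial, batch 042")
print(registry.verify("A1B2C3"))   # OK
print(registry.verify("ZZZ999"))   # WARNING: code was never registered
registry.verify("A1B2C3")
print(registry.verify("A1B2C3"))   # ALERT: third check exceeds the threshold of 2
print(registry.anomalies())
```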
Supplies management. Digital platforms can also
help manage supplies in countries where centralized
provision is deficient or lacking. LifeBank is a Nige-
rian platform firm that matches hospitals requesting
blood with potential donors based on current demand
and location maps of all institutions involved in blood
distribution. Information about the donation, collection,
screening, storage, and delivery procedures
is recorded on a blockchain, thereby increasing
confidence in blood quality. LifeBank claims to have
reduced the average delivery time from about 24
hours to 45 minutes.61
During the COVID-19 crisis, the
platform has also extended its services to matching
medical equipment.
Education
Despite significant improvements in school enroll-
ment over the last decades, an average student
in low-income countries performs worse than 95
percent of the students in high-income countries.62
Lack of teaching resources and learning tools and
the traditional one-size-fits-all approach in education
have made it difficult to tailor instruction to students’
individual abilities and needs, particularly where
classrooms are overcrowded.63
Recent advances in big
data and AI offer opportunities to provide individu-
alized learning experiences for students. Machine
learning and data analytics techniques can help iden-
tify students’ behavioral patterns (such as mistakes
made frequently in tests), usually more efficiently than
teachers can. For example, by tracking and
understanding students’ learning paths, Byju, a digi-
tal supplemental learning platform in India, suggests
tailored learning materials such as videos, quizzes,
and flashcards that match the needs of individual stu-
dents. Besides analyzing individual learning behav-
iors, the platform also analyzes aggregate data on how
all students learn on its platform. If many students
are having trouble with similar types of problem sets,
the system flags the need to add more explanatory
videos or materials to the entire platform.
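The aggregate analysis described for Byju can be approximated by computing, for each problem type, the share of wrong answers across all students and flagging types above a threshold. The data format, the minimum sample size, and the 50 percent threshold are assumptions for illustration, not the platform's actual rules.

```python
from collections import defaultdict

def flag_problem_types(attempts, min_attempts=20, error_threshold=0.5):
    """attempts: iterable of (student_id, problem_type, is_correct).
    Returns {problem_type: error_rate} for types answered at least min_attempts
    times whose error rate exceeds error_threshold, worst first."""
    totals, errors = defaultdict(int), defaultdict(int)
    for _student, ptype, correct in attempts:
        totals[ptype] += 1
        if not correct:
            errors[ptype] += 1
    # Ignore types with too few attempts to judge reliably.
    rates = {p: errors[p] / totals[p] for p in totals if totals[p] >= min_attempts}
    flagged = {p: r for p, r in rates.items() if r > error_threshold}
    return dict(sorted(flagged.items(), key=lambda kv: kv[1], reverse=True))

# Made-up attempt log: 30 students each try one fraction and one linear-equation problem.
attempts = [(s, "fractions", s % 10 >= 7) for s in range(30)]        # 21 of 30 wrong
attempts += [(s, "linear_equations", s % 5 != 0) for s in range(30)]  # 6 of 30 wrong
print(flag_problem_types(attempts))  # only "fractions" needs more explanatory material
```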
In China, Squirrel Ai Learning is another firm spe-
cializing in intelligent adaptive education. Students
start with a short diagnostic test to leave a digital
footprint reflecting their knowledge level so that the
teaching system can provide a tailored curriculum,
which is updated as the student proceeds through
learning modules. Based on its comparison trials
among middle school students, Squirrel Ai Learning
claims that its system does a better job of improving
math test scores than traditional classroom teaching.64
During the COVID-19 pandemic, digital plat-
forms that support live video communication have
been playing an indispensable role in transitioning
to online learning. As of May 2020, more than 140
countries had closed schools, affecting more than 60
percent of enrolled students.65
Lark, for example, is
providing educational institutions in India with free
cloud storage and video conference services.
DingTalk,
a communication platform that supports video con-
ferencing and attendance tracking, has connected
more than 50 million students with teachers in China.
Transport and logistics
Data-driven firms in transport and logistics provide
matching services to facilitate the use of assets by
other market participants. By automating decision-
making and navigation, these models increase the
efficiency of service delivery and the management of
supply chains. Prominent applications are for digital
freight matching, digital courier logistics, and IoT-
enabled cold storage.
Digital freight matching. These platforms (often
dubbed “Uber for trucks”) match cargo and shippers
with trucks for last-mile transport. In lower-income
countries, where the supply of truck drivers is highly
fragmented and often informal, sourcing cargo is a
challenge, and returning with an empty load contrib-
utes to high shipping costs. In China, the empty load
rate is 27 percent versus 13 percent in Germany and
10 percent in the United States.66
Digital freight matching overcomes these chal-
lenges by matching cargo to drivers and trucks that
are underutilized. The model also uses data insights
to optimize routing and provide truckers with integrated
services and working capital. Because a significant share
of logistics services in lower-income countries relies on
informal suppliers, these technologies
also represent an opportunity to formalize services.
Examples include Blackbuck (India), Cargo X (Brazil),
Full Truck Alliance (China), Kobo360 (Ghana, Kenya,
Nigeria, Togo, Uganda), and Lori (Kenya, Nigeria,
Rwanda, South Sudan, Tanzania, Uganda). In addition
to using data for matching, Blackbuck uses various
data to set reliable arrival times, drawing on global
positioning system (GPS) data and predictions on the
length of driver stops. Lori tracks data on costs and
revenues per lane, along with data on asset utiliza-
tion, to help optimize services. Cargo X charts routes
to avoid traffic and reduce the risk of cargo robbery.
Kobo360 chooses routes to avoid armed bandits based
on real-time information shared by drivers. Many of
the firms also allow shippers to track their cargo in
real time. Data on driver characteristics and behavior
have allowed platforms to offer auxiliary services to
address the challenges that truck drivers face. For
example, some platforms offer financial products to
help drivers pay upfront costs, such as tolls, fuel, and
tires, as well as targeted insurance products.
Kobo360 claims that its drivers increase their
monthly earnings by 40 percent and that users save
an average of about 7 percent in logistics costs.67
Lori
claims that more than 40 percent of grain moving
through Kenya to Uganda now moves through its
platform, and that the direct costs of moving bulk
grain have been reduced by 17 percent in Uganda.68
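The core matching idea, filling trucks that would otherwise return empty, can be sketched as a greedy assignment of loads to trucks traveling the same lane with enough spare capacity, together with the resulting share of empty return kilometers. The lanes, capacities, distances, and greedy rule are hypothetical; the platforms named above use their own, more elaborate optimization, and this toy metric counts only return legs rather than the full empty load rate cited earlier.

```python
def match_backhauls(trucks, loads):
    """Greedy sketch: assign each load (largest first) to an unassigned truck that is
    returning on the same lane and has enough capacity. Returns {load_id: truck_id}."""
    assignments = {}
    for load in sorted(loads, key=lambda ld: ld["tonnes"], reverse=True):
        for truck in trucks:
            if (truck["id"] not in assignments.values()
                    and truck["lane"] == load["lane"]
                    and truck["capacity_t"] >= load["tonnes"]):
                assignments[load["id"]] = truck["id"]
                break
    return assignments

def empty_return_share(trucks, assignments):
    """Share of return-leg kilometers driven with no cargo."""
    loaded = set(assignments.values())
    total_km = sum(t["return_km"] for t in trucks)
    empty_km = sum(t["return_km"] for t in trucks if t["id"] not in loaded)
    return empty_km / total_km

# Hypothetical trucks about to return empty, and loads posted by shippers.
trucks = [
    {"id": "T1", "lane": ("A", "B"), "capacity_t": 28, "return_km": 480},
    {"id": "T2", "lane": ("A", "B"), "capacity_t": 10, "return_km": 480},
    {"id": "T3", "lane": ("C", "D"), "capacity_t": 28, "return_km": 660},
]
loads = [
    {"id": "L1", "lane": ("A", "B"), "tonnes": 8},
    {"id": "L2", "lane": ("A", "B"), "tonnes": 20},
]
matches = match_backhauls(trucks, loads)
print(matches)
print(f"{empty_return_share(trucks, {}):.0%} -> {empty_return_share(trucks, matches):.0%}")
```

Placing the largest loads first matters here: if the 8-tonne load were matched first it would take the 28-tonne truck, leaving the 20-tonne load with no truck that can carry it.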
Digital courier logistics. The growth of on-demand
couriers enables small merchants and the growing
e-commerce industry to reach customers rapidly
and reliably in expanding urban areas. Data-driven
matching and route optimization overcome high
search costs and traffic congestion and provide verifi-
cation of safety standards through customer reviews.
The prime example of this business model is
Gojek, which is reportedly utilized by more than 1 mil-
lion motorcycle drivers serving 500,000 micro, small,
and medium enterprises (MSMEs)—including more
than 120,000 MSMEs since the onset of the COVID-19
pandemic.69
Established in Indonesia in 2010 as a
call center to connect consumers to courier delivery
services, the company leveraged its data on consumer
behavior to expand into digital courier services in
2015. Its app now offers various logistics services,
including delivery of food and groceries and medi-
cines and pharmaceuticals. Gojek uses AI and ML for
matching, forecasting (to inform drivers where to go
ahead of a surge in demand), and dynamic pricing.
Through its 8 billion pings with drivers per day, Gojek
claims it generates 4–5 terabytes of data every day.
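As a rough check on scale, assuming decimal terabytes (10^12 bytes) and attributing all of the daily data to the pings, which gives an upper bound on the average size of each one:

```latex
\frac{4 \times 10^{12}\ \text{bytes}}{8 \times 10^{9}\ \text{pings}} = 500\ \text{bytes per ping},
\qquad
\frac{5 \times 10^{12}\ \text{bytes}}{8 \times 10^{9}\ \text{pings}} = 625\ \text{bytes per ping}
```

Each ping would therefore carry on average no more than roughly half a kilobyte, so the large daily volume comes from the sheer number of pings rather than from the size of any single one.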
IoT-enabled cold storage. According to WHO, 19.4
million people across the globe lacked access to
routine life-saving vaccines in 2018, partly because
of lack of efficient cold chain systems.70
IoT-enabled
cold storage solutions allow the transport and storage
of temperature-sensitive food and medication, with
greater control and tracking by the freight owner.
For vaccines, the ability to track temperature can help
confirm their integrity before they are dispensed,
even before further testing.
Gricd, a Nigerian start-up founded in 2018, uti-
lizes solar-powered, IoT-enabled mobile refrigeration
boxes whose internal probes collect temperature data
and transmit it to a server. Real-time information on
location and temperature can be accessed by freight
owners online or via a mobile app, ensuring that the
cold chain is effectively monitored and maintained.
Social media as a tool for connecting to
markets
High marketing and advertising costs hinder smaller
businesses trying to reach new markets and cus-
tomers. Meanwhile, high search costs and frictions
related to contract enforcement raise prices for buy-
ers and inhibit trade.71
Social media provide a low-
cost sales platform for firms domestically and abroad
and reduce search costs for consumers. They enable
products to be better targeted to consumers and can
reduce marketing costs by as much as 90 percent,
compared with traditional television marketing.72
Social media platforms also allow sellers to incor-
porate market intelligence into their product devel-
opment through real-time feedback and gathering
of online data.73
Given these advantages, it is not
surprising that nearly half of all enterprises in the
European Union had used social media for advertis-
ing purposes as of 2017.74
Reaching markets through social media could
disproportionately advantage smaller firms over
larger ones, as suggested by the high proportion
of small entrepreneurs who use Facebook. In 2018,
nearly four in 10 Facebook business users across
42 countries (including low- and middle-income countries)
were single-person firms, although single-person firms
make up only one-tenth of all firms.75
Businesses run by women are more
likely to leverage online tools to facilitate business
success than businesses run by men.76
On the buyer side, social connections can increase
trade by building trust, including by reducing infor-
mation asymmetries and providing a substitute for
the formal mechanisms of contract enforcement.77
According to a study of 180 countries and 332 Euro-
pean regions, social connectedness tends to increase
exports—particularly to those countries with a weak
rule of law—and to lower prices, especially for goods
whose prices are not transparent and that are not
traded on exchanges.78
Some potential risks and adverse outcomes of data-driven businesses to be addressed by policy
Despite the potential transformative effects of data-
driven firms, policy makers need to take into account
several (often interrelated) risks and adverse out-
comes to ensure that the use of data in the productive
processes of firms safely fulfills its potential. The
relevance and immediacy of these concerns depend
on the data intensity of a country’s economy. However,
because of the global nature of many large data-driven
firms, market dynamics in one country can often have
spillover effects internationally. This concern should
not discourage policy makers from fostering a data-
driven economic ecosystem in their country, but they
should put the appropriate safeguards and enablers
in place to ensure that data-driven markets remain
competitive and vibrant—and that gains are shared
broadly across society—as the data intensity of the
economy increases. These topics are covered in part II
of this Report.
Potential to increase the propensity for dominant firms to emerge
Proprietary data can provide a firm with a compet-
itive advantage over rivals. Because data are often
created as a by-product of a firm’s economic activities,
once a firm has invested in the fixed cost of building
capacity to collect data, the marginal cost of creating
additional data is low. Moreover, better targeting of a
firm’s offering can attract more users, thereby leveraging
network effects between platform users that can
lead to a “winner-takes-most” dynamic or, at the least,
a scale advantage that new entrants find difficult to
overcome.79
For example, an e-commerce platform
108 | World Development Report 2021
that incorporates more consumer data creates a more
customized shopping experience, with more accurate
product recommendations, more preordered shop-
ping baskets, and more consumer reviews. A platform
with a greater number of consumers will also attract
more suppliers through indirect network effects, rais-
ing users’ costs of switching to competing platforms.
The distribution of web traffic, a proxy for concentra-
tion in the e-commerce sector, is skewed toward a few
larger platforms. Among 631 business-to-consumer
online marketplaces in Africa, 56 percent of web vis-
itors went to 1 percent of sites in 2019.80
Jumia alone
had 24 percent of users.
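The Report summarizes the skew in web traffic with a single top-share statistic: the share of visitors going to the top 1 percent of sites. The sketch below computes that kind of statistic and, alongside it, the Herfindahl-Hirschman Index (HHI), a standard concentration measure that the Report does not use in this passage. The visit counts are made up for illustration.

```python
from typing import Sequence


def top_share(visits: Sequence[float], top_fraction: float) -> float:
    """Share of all visits captured by the top `top_fraction` of sites (at least one site)."""
    ranked = sorted(visits, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


def hhi(visits: Sequence[float]) -> float:
    """Herfindahl-Hirschman Index on traffic shares expressed in percent (0 to 10,000)."""
    total = sum(visits)
    return sum((100 * v / total) ** 2 for v in visits)


if __name__ == "__main__":
    # Hypothetical monthly visits for five marketplaces (not data from the Report).
    visits = [50, 30, 10, 5, 5]
    print(f"Top 20% of sites: {top_share(visits, 0.2):.0%} of visits")
    print(f"HHI: {hhi(visits):.0f}")
```

With 631 marketplaces, the top 1 percent corresponds to about six sites, so `top_share(visits, 0.01)` on the real traffic data would be the analogue of the 56 percent figure.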
Data can also ease a platform’s entry into adjacent
markets. Well-known examples are M-Pesa’s move
from money transfers into savings and loan prod-
ucts; Uber’s entry into food and freight delivery; and
Google’s evolution from search to shopping, maps,
and other markets. By combining multiple types of
data, platforms can benefit from the broader scope of
their data, which has spurred a growing number of
mergers aimed at accumulating data (a prime exam-
ple is Facebook’s acquisition of WhatsApp).81
The potential of a platform business to acquire
market power depends on its business model, includ-
ing the types of users that interact on the platform
and its revenue model, which affect the type of data
that gives firms a competitive advantage (figure 3.7).
For example, firms that rely on advertising revenue
require frequently updated consumer data to create
holistic consumer profiles. Platforms that earn a fee
on the transactions conducted through them may rely more
on historical data on product demand and consumer
profiles for a smaller range of products. The type of
data required affects, in turn, the ease with which
firms can access or replicate the data they need.
Where platforms rely on volunteered or observed
consumer data, firms with greater market power may
be able to collect data more easily because consumers
have fewer options—meaning those firms can further
entrench their market positions.
The greater propensity for dominance in data-
driven markets raises the risk that smaller or more
traditional firms will be excluded, hindering local
entrepreneurship and posing risks for consumer
welfare. These effects can be exacerbated in devel-
oping markets, where entrants find it harder to raise
start-up capital and hire from the limited supply of
skilled programmers and data scientists. For exam-
ple, of the total private market funding received by
the 10 highest-funded disruptive tech firms in Africa,
77 percent went to firms owned by the three largest
African internet companies (two by Naspers, two
by Jumia, and one by Ringier One Africa Media).82
Where few large data-driven players currently oper-
ate or where a few large firms provide much-needed
goods and services, the risks may be less immediately
apparent. However, because of the dynamism of such
markets and their tendency to tip toward concen-
trated structures, it is important that policy makers
safeguard against dominance that forestalls entry
and innovation.
Figure 3.7 Risks to market structure and market power stemming from platform firms
[Flowchart: Data as an input into economic activity, drawn from the potential sides of a platform (consumers; suppliers or service providers/“workforce,” where a product or service is provided by a third party; advertisers, where the revenue model is advertising-based) → Data give firms a competitive advantage, especially where data are proprietary or access is restricted, where the required data are less replicable,a and where network effects are stronger → Economies of scale associated with data: tendency toward increasing concentration within markets and entrenchment of market power; economies of scope associated with data: tendency to conglomeration and ability to leverage market power between markets → Depending on the platform business model, effects can occur in consumer markets, markets for “labor,” and advertising markets. Feedback: greater market power may allow more data collection by firms because users have fewer alternatives.]
Source: WDR 2021 team.
a. Such data include observed and inferred data and data requiring frequent updating.
On DLT platforms, data access is not controlled
by one institution, thus reducing the extent to which
the benefits of network effects can entrench market
power. However, private blockchains still require a
central authority or institution that decides who can
participate in the system, thereby placing power in the
hands of the institution that acts as the gatekeeper.
By contrast, public blockchains such as Bitcoin do not
require a central authority but rather “proof-of-work”
(a system to deter frivolous or malicious uses of
computing power) to participate. This system, in turn,
generates very high energy costs because participants
must expend large amounts of computing power.83
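To see why proof-of-work consumes so much computing power, consider a simplified version of the puzzle. Participants must find a number (a nonce) that, when hashed together with the block's data, produces a digest beginning with a given number of zeros. The sketch below uses SHA-256 from Python's standard `hashlib` module. It is a toy version of the technique, not Bitcoin's actual scheme, which applies SHA-256 twice to a structured block header and compares the result with a numeric target.

```python
import hashlib
from typing import Tuple


def block_hash(block_data: str, nonce: int) -> str:
    """SHA-256 digest (hex) of the block contents concatenated with a nonce."""
    return hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()


def mine(block_data: str, difficulty: int) -> Tuple[int, int]:
    """Search nonces until the hash starts with `difficulty` hex zeros.

    Returns (nonce, attempts). Each extra zero multiplies the expected work by 16.
    """
    target = "0" * difficulty
    nonce = 0
    while not block_hash(block_data, nonce).startswith(target):
        nonce += 1
    return nonce, nonce + 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution takes a single hash, however long the search took."""
    return block_hash(block_data, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    data = "alice pays bob 5"
    for difficulty in range(1, 5):
        nonce, attempts = mine(data, difficulty)
        print(f"difficulty {difficulty}: {attempts} hashes, valid={verify(data, nonce, difficulty)}")
```

The asymmetry is the point of the design: finding a valid nonce takes about 16^d attempts on average for d leading hex zeros, while anyone can check it with one hash. In Bitcoin, difficulty is periodically adjusted so that blocks keep arriving at a roughly constant rate as more computing power joins, which is why total energy use rises with participation.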
Phenomena linked to the issue of dominance and
market structure are described in the sections that
follow.
Tension between cooperation and competition in data-
driven ecosystems. Complementary products are built
around larger platforms. These innovations can
become central to the business models of the larger
platforms. For example, Amazon, Uber, and Airbnb
would not be able to operate without the payment
systems embedded in their services. However, this
network structure also means that firms are depen-
dent on accessing the systems and data of other firms
that could become rivals in the future.
Typically, firms access the systems and data
of other firms through application programming
interfaces (APIs) provided by the core platform in the
ecosystem. APIs link platforms to other platforms
and to developers of digital services. Through APIs,
a platform or digital service provider will typically
either draw data from or provide data to other firms
to support its own functioning or support the func-
tioning of other players. In markets where data are a
key input, the owners of valuable data are gatekeepers
in the development of smaller entrepreneurs.
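The gatekeeper role can be made concrete with a toy model. In the sketch below, a hypothetical `CorePlatformAPI` serves user data only to developers that hold an active key, and the platform alone decides whether to issue or revoke keys. The class, its method names, and the key format are illustrative assumptions, not any real platform's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


class AccessDenied(Exception):
    """Raised when a caller's API key is missing or has been revoked."""


@dataclass
class CorePlatformAPI:
    """Toy core platform that shares user data with complementors through API keys."""
    user_posts: Dict[str, List[str]] = field(default_factory=dict)
    active_keys: Set[str] = field(default_factory=set)

    def issue_key(self, developer: str) -> str:
        key = f"key-{developer}"
        self.active_keys.add(key)
        return key

    def revoke_key(self, key: str) -> None:
        # Unilateral decision by the platform: this is the gatekeeping power.
        self.active_keys.discard(key)

    def get_posts(self, key: str, user: str) -> List[str]:
        if key not in self.active_keys:
            raise AccessDenied("API key is not active")
        return list(self.user_posts.get(user, []))


if __name__ == "__main__":
    api = CorePlatformAPI(user_posts={"amina": ["Launching my shop today!"]})
    key = api.issue_key("complementor_app")
    print(api.get_posts(key, "amina"))
    api.revoke_key(key)  # e.g. once the complementor starts to look like a competitor
    try:
        api.get_posts(key, "amina")
    except AccessDenied as err:
        print("Blocked:", err)
```

Nothing on the complementor's side changes between the two calls. Its access, and the data its product depends on, disappear purely on the platform's decision.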
The emergence of potential competition from
complementors may provoke the lead platform to
restrict its API. For example, when Twitter perceived
a competitive threat from LinkedIn, Twitter restricted
the use of its API in 2012 to prevent users’ Tweets
from appearing on LinkedIn’s platform. In Kenya,
developers have complained about M-Pesa’s refusal to
share its API.84
In the United States, developers testi-
fying before Congress in 2020 accused Apple of mim-
icking their products and of citing privacy concerns
to restrict how third-party developers collect location
data.85
Chapter 7 covers competition issues.
Data-driven mergers and acquisitions. In recent
years, waves of acquisitions by large local players
have occurred in e-commerce in China and India
and in transportation in Southeast Asia, where Uber
exited eight Southeast Asian markets after selling its
businesses to Grab, the region’s leading platform.86
Of the mergers involving digital platforms that have
undergone review by antitrust authorities around the
world, 82 percent involved an acquisition by a very
large firm.87
Mergers of two very large firms were the
most common type of transaction. Chapter 7 covers
issues related to mergers in more detail.
Suboptimal exchange of data. Although the broad
use, reuse, and repurposing of data by firms can
generate larger gains, market mechanisms may produce
less data exchange and reuse than the level that
yields the greatest social welfare.88 Regulators
must take several steps to balance the costs
and benefits of mandating data sharing to address
these concerns. First, they should protect individuals’
rights related to personal data. Second, they should
recognize that mandated data sharing would dampen
firms’ incentives to invest in data collection if firms
must share data with competitors or potential com-
petitors. And, third, they should take into account that
data sharing could jeopardize the provision of free or
subsidized services if a firm relies on monetization
of its data to cross-subsidize these services. Optimal
data sharing between firms could be lower in coun-
tries where data are less important to the economy.89
The right balance may differ in high-income and
lower-income economies, although policy makers in
some lower-income countries may wish to adopt a
forward-looking viewpoint in this area to set the stage
for future advances in their data economies. These
issues are further discussed in chapters 6 and 7.
Linked to data sharing is the debate over property
rights or access rights to data generated as a by-
product of economic activity—for example, in terms
of individuals versus firms in the case of personal
data or in terms of owners of devices or applications
versus the party using the device or application. This
issue is taken up in chapter 6.
Potential for exploitation of individuals
Excessive data collection. Data collected by firms track-
ing users across third-party websites, applications,
and devices can raise concerns. This practice is
dominated by a small number of large firms—for
example, a major part of Google’s data collection
occurs when a user is not directly engaged with any
of its products.90
Recent literature suggests that the
vast amounts of data collected in this manner may be
deemed excessive under existing European competi-
tion laws, where the focus is on the anticompetitive
harm that may occur whether or not data protec-
tion rules are infringed.91
More generally, both data
protection authorities and competition authorities
have noted that firms often understate and obscure
their actual data practices, preventing consumers
from making informed choices.92
Excessive collection
of data on children and other vulnerable groups, such as
in the education, health, and financial sectors, is of
particular concern, especially in countries and
locales where individuals lack adequate knowledge
to protect against these risks. These issues are further
discussed in chapters 6 and 7.
Insufficient governance of data held by private firms.
Firms choose how much to invest in cybersecurity
and data protection, but that investment may fall
short of the level that yields the greatest social wel-
fare because firms do not fully internalize the value
of privacy and security to the individual user or the
need to engender trust in the data economy. Infra-
structure service providers may tend to underinvest
in cybersecurity because the economic consequences
of any breach in data security are largely borne by the
clients whose data are compromised. A 2018 report by
the US Council of Economic Advisers estimated that
malicious cyberactivity cost the US economy between
US$57 billion and US$109 billion in 2016, representing
between 0.31 and 0.58 percent of that year’s GDP.93
Governments may need to provide incentives or
regulations to offset the tendency of firms to under-
invest in cybersecurity, including imposing adequate
penalties for data breaches.94
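The underinvestment argument can be illustrated with the Gordon-Loeb model of information security investment from the security-economics literature. This is our choice of illustration, not the Report's own analysis. The sketch assumes the model's "class I" breach function, S(z, v) = v / (alpha*z + 1)^beta, and a firm that picks the investment z maximizing [v - S(z, v)]·L - z, where L is the breach loss the firm itself bears. All parameter values are hypothetical.

```python
def breach_probability(z: float, v: float, alpha: float, beta: float) -> float:
    """Gordon-Loeb 'class I' breach function: S(z, v) = v / (alpha * z + 1) ** beta.

    v is the breach probability with no security investment; alpha, beta > 0
    describe how productive security spending z is.
    """
    return v / (alpha * z + 1) ** beta


def optimal_investment(v: float, loss: float, alpha: float, beta: float) -> float:
    """Investment z* maximizing (v - S(z, v)) * loss - z, from the first-order condition."""
    z = ((v * beta * alpha * loss) ** (1 / (beta + 1)) - 1) / alpha
    return max(0.0, z)  # never optimal to invest a negative amount


if __name__ == "__main__":
    v, alpha, beta = 0.5, 0.001, 1.0   # hypothetical parameters
    total_loss = 1_000_000              # loss to the firm and its clients together
    private_loss = 0.3 * total_loss     # share the firm itself bears (assumed)

    social = optimal_investment(v, total_loss, alpha, beta)
    private = optimal_investment(v, private_loss, alpha, beta)
    print(f"socially optimal investment: {social:,.0f}")
    print(f"privately optimal investment: {private:,.0f}")
    print(f"breach probability: {breach_probability(private, v, alpha, beta):.3f} (private) "
          f"vs {breach_probability(social, v, alpha, beta):.3f} (social)")
```

With these made-up numbers, a firm that bears 30 percent of the breach loss invests roughly half of the socially optimal amount, and its residual breach probability is correspondingly higher. Within this framework, penalties for breaches work by raising the loss the firm itself bears.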
Pricing, discrimination, and algorithmic risks. Because
firms have so much information on customers’ per-
sonal characteristics and purchase histories, they can
adjust their offerings to charge higher or lower prices
based on an individual’s price sensitivity. This practice
can allow low-income sectors of society to be served
that otherwise would not be. It can also help firms
clear their stock, thereby reducing wastage. However,
such price and quality discrimination can also harm
some consumers. This kind of price discrimination
is not inherently bad—it is a transfer of surplus from
consumers to producers, and governments can use
taxes and transfers to distribute it back again—but
data and data-driven business models do make it
easier to discriminate by price.
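A small numeric example makes the trade-offs in this paragraph concrete. The sketch below compares a single profit-maximizing price with perfect (first-degree) price discrimination for five hypothetical buyers. The valuations, the marginal cost, and the assumption that the firm knows each buyer's valuation exactly are all illustrative.

```python
from typing import NamedTuple, Sequence, Tuple


class Outcome(NamedTuple):
    served: int
    profit: float
    consumer_surplus: float


def uniform_pricing(wtp: Sequence[float], cost: float) -> Tuple[float, Outcome]:
    """Single profit-maximizing price, searched over the buyers' willingness to pay."""
    def profit_at(p: float) -> float:
        return (p - cost) * sum(1 for w in wtp if w >= p)

    price = max(wtp, key=profit_at)
    buyers = [w for w in wtp if w >= price]
    return price, Outcome(len(buyers), profit_at(price), sum(w - price for w in buyers))


def personalized_pricing(wtp: Sequence[float], cost: float) -> Outcome:
    """Perfect price discrimination: everyone valuing the good at or above cost pays their full valuation."""
    buyers = [w for w in wtp if w >= cost]
    return Outcome(len(buyers), sum(w - cost for w in buyers), 0.0)


if __name__ == "__main__":
    wtp = [10, 8, 6, 4, 2]   # hypothetical willingness to pay; low values = low-income buyers
    cost = 1
    price, uniform = uniform_pricing(wtp, cost)
    personal = personalized_pricing(wtp, cost)
    print(f"uniform price {price}: {uniform}")
    print(f"personalized:    {personal}")
```

With a single price of 6, the two lowest-valuation buyers are priced out, and buyers keep a surplus of 6. With personalized prices all five buyers are served, and total surplus rises from 21 to 25. However, the firm captures all of it and consumer surplus falls to zero, which is the transfer that taxes and transfers would have to redistribute.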
Algorithms can facilitate anticompetitive behavior
by firms, ultimately harming individuals through
higher prices. Algorithms can be trained to collude
independently by surreptitiously following the behav-
ior of a price leader, or they may be unintentionally
biased because of inherent bias in their training data.
This bias can magnify marginalization because the
most vulnerable populations are often those that are
least represented in digital data. Although algorithms
hold the promise of impartiality, this promise is
not always realized because ultimately they reflect
the same biases in human judgment and behavior
(due to logic bias and flawed assumptions) reflected
in their training datasets. But because algorithmic
decision-making is opaque, the potential biases and
anticompetitive effects may be difficult to detect.
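Tacit coordination between pricing algorithms can be illustrated with a deliberately simple simulation. In the sketch below, a leader tests a higher price whenever its rival did not undercut it in that round. The follower's pricing rule either matches the leader or undercuts it by one step. The rules, prices, and step size are hypothetical, and real pricing algorithms are far more complex.

```python
from typing import Callable, Tuple

FollowerRule = Callable[[float, float, float], float]


def match_leader(leader_price: float, cost: float, step: float) -> float:
    """Follower copies the leader's observed price and never undercuts."""
    return leader_price


def undercut(leader_price: float, cost: float, step: float) -> float:
    """Follower prices one step below the leader, but not below cost."""
    return max(cost, leader_price - step)


def simulate(rule: FollowerRule, periods: int = 20, cost: float = 5.0,
             cap: float = 10.0, step: float = 1.0) -> Tuple[float, float]:
    """Return the (leader, follower) prices after `periods` rounds."""
    leader = follower = cost
    for _ in range(periods):
        follower = rule(leader, cost, step)
        if follower >= leader:
            leader = min(cap, leader + step)   # no undercutting: try a higher price
        else:
            leader = max(cost, follower)       # undercut: match the lower price
    return leader, follower


if __name__ == "__main__":
    print("follower matches:  ", simulate(match_leader))
    print("follower undercuts:", simulate(undercut))
```

When the follower merely matches, prices ratchet up to the cap without any communication between the firms. When it undercuts, prices stay pinned near cost. Each rule looks unilateral on its own, which is part of why such coordination is hard to detect.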
In algorithmic decision-making, including credit
scoring, these risks could lead to discrimination
along socioeconomic lines that entrenches existing
inequalities.95
Groups with more limited access to
mobile phones, the internet, and bank accounts,
such as women, may become less visible in data and
decision-making if algorithmic bias is perpetuated
through use of biased datasets. If decisions are based
on data about those with whom a person interacts,
such as friends and neighbors, this, too, may amplify
discriminatory effects. For example, a poor credit
score for an individual may contribute to lower scores
for those in their neighborhood or social network.
Furthermore, alternative scoring tools may be used to
identify vulnerable individuals susceptible to preda-
tory loans and other product offerings.96
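The contagion effect described here can be shown with a hypothetical scoring rule that blends a person's own score with the average score of their contacts. The weight of 0.3, the names, and the scores are illustrative assumptions, not a description of any actual credit-scoring model.

```python
from typing import Dict, List


def blended_scores(own: Dict[str, float], network: Dict[str, List[str]],
                   weight: float = 0.3) -> Dict[str, float]:
    """Score = (1 - weight) * own score + weight * mean own score of contacts.

    `weight` is a hypothetical parameter; people with no contacts keep their own score.
    """
    out = {}
    for person, score in own.items():
        contacts = network.get(person, [])
        if contacts:
            peer_mean = sum(own[c] for c in contacts) / len(contacts)
            out[person] = (1 - weight) * score + weight * peer_mean
        else:
            out[person] = score
    return out


if __name__ == "__main__":
    own = {"ana": 700, "ben": 700, "caro": 400}
    network = {"ana": ["caro"], "ben": ["ana"], "caro": ["ana"]}
    for person, score in blended_scores(own, network).items():
        print(f"{person}: own {own[person]}, blended {score:.0f}")
```

Ana and Ben have identical own scores, but Ana's blended score falls from 700 to 610 solely because one of her contacts has a low score.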
These concerns suggest the need to establish a sys-
tem of oversight, inspection, and auditing of firms’
algorithms. However, adequate standardized legal
and regulatory frameworks to deal with risks from
AI and regulatory capacity to determine harm and the
appropriate safeguards are lacking.
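One building block of algorithmic auditing is an outcome test that compares how often a model approves applicants from different groups. The sketch below computes approval rates by group. It flags any group whose rate is below 80 percent of a reference group's rate, following the "four-fifths" rule of thumb used in US employment-discrimination guidance. The threshold, group labels, and decision data are illustrative. A screen like this can reveal disparities but not their causes, and it is only one of many possible fairness metrics.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def approval_rates(decisions: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of approved applications per group."""
    approved: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, ok in decisions:
        approved[group] += int(ok)
        total[group] += 1
    return {g: approved[g] / total[g] for g in total}


def flag_disparities(decisions: Iterable[Tuple[str, bool]], reference: str,
                     threshold: float = 0.8) -> List[str]:
    """Groups whose approval rate is below `threshold` times the reference group's rate."""
    rates = approval_rates(decisions)
    ref_rate = rates[reference]
    return sorted(g for g, r in rates.items() if r < threshold * ref_rate)


if __name__ == "__main__":
    # Hypothetical audit sample: (group, approved?) for 20 loan applications.
    sample = ([("men", True)] * 8 + [("men", False)] * 2
              + [("women", True)] * 5 + [("women", False)] * 5)
    print(approval_rates(sample))
    print(flag_disparities(sample, reference="men"))
```

Here women are approved at a rate of 0.5 against 0.8 for men, a ratio of about 0.63, so the screen flags the model for closer review.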
Indirect management of the workforce through algo-
rithms. Remotely collecting data on workers and
service providers to drive automated or semiauto-
mated decision-making on parameters such as task
allocations, performance evaluations, and incentives
for certain types of behavior has become particularly
prevalent in the gig economy.97
In addition to the risks
algorithmic management raises for bias and discrim-
ination, the practice makes it easier for firms to avoid
classifying individuals as employees and thus avoid
providing workers with benefits. Better understand-
ing of the organizational and welfare impacts of algo-
rithmic management and data collection on workers
would help identify appropriate protections.
Potential to increase inequality within and among countries
Adoption of data-driven business models could
widen gaps within countries, between countries, and
between different types of firms, different types of
workers, and individuals in different income groups.
Within a country, the impact of the data-driven
economy on individuals—as consumers, entrepre-
neurs, or job seekers—will depend on their access
to finance, education levels, skills, and technology.
Although selling through platforms can close the
productivity gaps between large and small firms for
those small firms that go down this sales channel,
overall smaller firms and entrepreneurs around the
world lag their larger counterparts in adopting basic
technologies such as fast broadband, having an inter-
net presence, selling online, and utilizing cloud com-
puting. Although the gig economy provides oppor-
tunities for job creation, only those who have the
assets and skills to participate (such as cars, mobile
devices, and literacy) will be able to benefit. And even
though automated decision-making may mean more
efficient and cost-effective service delivery for some
individuals, it may lead to greater bias and discrimi-
nation against others.
Likewise, the degree to which a country can
benefit from the data-driven economy depends on
its underlying infrastructure, capabilities, and scale.
The amount of data that can be derived locally
depends on a country’s level of digital economic activ-
ity. Firms from larger, more connected economies—
or firms that already operate across countries—with
access to larger datasets will have an advantage
that only grows with time. Firms from low- and
middle-income countries are more likely to lack
both access to finance to cover the initial costs of
collecting and managing their data and the analytical
capabilities to derive value from them. When com-
bined with fewer (or more uncertain) opportunities
for monetizing data, either now or in the future (such
as through advertising or development of new prod-
ucts), firms from lower-income economies also have
less incentive to invest in collecting and analyzing
data, which can worsen inequality between countries
on a macro level.
Discouraging international data-driven firms
from operating or locating in lower-income countries
(such as through restrictive data policies) is not a
viable solution because it deprives the local economy
of the pro-growth and development benefits that
data-driven firms can provide. Moreover, it prevents
the development of a local ecosystem of data-driven
entrepreneurs built around these larger firms—a sce-
nario that could slow the advancement of infrastruc-
ture and capabilities needed for lower-income coun-
tries to bridge the gap in the longer term. Instead,
governments can seek to harness the positive welfare
effects of the data-driven economy while mitigating
the risks to inequality through a combination of digi-
tal inclusion policies, public investments, and robust
legal and regulatory tools. These are the topics of
part II of this Report.
Notes
1. EIU (2012); Manyika et al. (2011).
2. Fernando (2021).
3. Manyika et al. (2011).
4. Brynjolfsson, Hitt, and Kim (2012); Brynjolfsson and
McElheran (2016b).
5. von Mörner (2017).
6. Magalhaes and Roseira (2017); Stott (2014).
7. Evans and Schmalensee (2016); Gawer (2014).
8. This figure is according to the database constructed by
the i2i Facility. The database covers eight Sub-Saharan
African countries: Ghana, Kenya, Nigeria, Rwanda,
South Africa, Tanzania, Uganda, and Zambia. See Africa’s
Digital Platforms Database, Insight2Impact (i2i), http://
access.i2ifacility.org/Digital_platforms/.
9. Evans (2016); Evans and Gawer (2016).
10. The countries in the sample are Armenia, Bangladesh,
Brazil, Colombia, the Arab Republic of Egypt, Indonesia,
Kenya, Malaysia, Morocco, Nigeria, Peru, the Philip-
pines, the Russian Federation, South Africa, Sri Lanka,
Tunisia, and Ukraine. These countries were selected
based on data availability and to provide a combination
of countries of different sizes and levels of economic
development across regions.
11. See Nyman and Stinshoff (forthcoming), who base their
data on information provided by Crunchbase, Crunchbase
(database), https://www.crunchbase.com/; Dow
Jones and Company, Factiva (database), https://
professional.dowjones.com/factiva/; Thomson Reuters
Foundation, “Inclusive Economies,” http://www.trust.org
/inclusive-economies/. E-commerce includes both busi-
ness-to-business (B2B) and business-to-consumer (B2C)
business models (excluding agriculture wholesale).
Transport includes passenger transport and freight
transport/logistics, with the latter accounting for about
67 percent of firms in this category on average. Agriculture
includes platforms where the main business
focus is agriculture, including both marketplace and
financial services. Tourism includes booking platforms
and accommodation sharing.
12. Data on the founding year were available for 75 percent
of firms in the database.
13. A firm is considered closed if the website cannot be
found or accessed via internet research, or if the domain
is for sale. A firm is confirmed closed if an article noting
the firm’s exit was found.
14. Sites are ordered by their Alexa traffic rank, calculated
using a combination of average daily visitors and
pageviews over the past month. The site with the high-
est combination of visitors and pageviews is ranked 1.
See Alexa Internet, “The Top 500 Sites on the Web, by
Country” (accessed April 2020), https://www.alexa.com
/topsites/countries.
15. Analysis of data from World Bank, Digital Business
Indicators (database), https://www.worldbank.org/en
/research/brief/digital-business-indicators.
16. Total session time of over 85 billion hours was recorded
from May to June 2018, according to data from 2020 on
data software development kits, mobile app downloads,
and revenue and usage data from Apptopia, Data (database),
https://apptopia.com/.
17. Kumar and Muhota (2012). See UN Global Working
Group on Big Data for Official Statistics for an over-
view of mobile phone data types and their potential use
(UNGWG 2019).
18. Bertrand and Kamenica (2018).
19. Berg et al. (2018).
20. Platforms, IoT devices, and blockchain reduce asymmet-
ric information about the quality and trustworthiness
of smaller suppliers. For platforms, user reviews partly
serve this purpose. For example, better-rated sellers
on eBay have higher prices and higher revenues—see
Houser and Wooders (2005); Livingston (2002); Lucking-
Reiley et al. (2007); Melnik and Alm (2002)—and sellers
with low ratings exit from eBay’s platform (Cabral and
Hortaçsu 2010).
21. Lendle et al. (2012).
22. Lendle et al. (2012).
23. Luo and Niu (2019).
24. De and Raychaudhuri (2008).
25. Balchin et al. (2016).
26. Data are as of 2018 and come from World Trade Organization,
Trade Profiles (database), https://www.wto.org
/english/res_e/statis_e/trade_profiles_list_e.htm.
27. Balchin et al. (2016).
28. Borchert and Mattoo (2019).
29. The 10 countries are Belgium, France, Germany, Hun-
gary, Italy, Poland, Spain, Sweden, the United Kingdom,
and the United States.
30. Bailin et al. (2019).
31. Falk and Hagsten (2015).
32. Calculations of the European Investment Bank (EIB) and
World Bank based on EIB (2019).
33. Aghion, Jones, and Jones (2017); Cockburn, Henderson,
and Stern (2019).
34. Brynjolfsson and McElheran (2016a).
35. MGI (2018).
36. IFC (2009).
37. Eisenmeier (2018).
38. Bughin and van Zeebroeck (2017). The authors estimate
that tapping latent demand could increase industry size
by 0.5 percent a year.
39. Brynjolfsson et al. (2019).
40. Alphabet Inc. (2019); Clement (2020).
41. Alibaba Group (2019); McNair (2018).
42. West and Biddle (2017).
43. Singh (2020).
44. Demirgüç-Kunt et al. (2018).
45. Manyika et al. (2016).
46. Ant Financial is an affiliate company of the Alibaba Group.
47. LenddoEFL (2020).
48. Lobosco (2013).
49. Aron and Muellbauer (2019).
50. Patnam and Yao (2020).
51. Suri and Jack (2016).
52. Mastercard (2017). In May 2020, Mastercard and the
Caribbean Hotel and Tourism Association (CHTA)
launched a Tourism Insights platform that looks at travel
trends from search patterns to in-market spending
for the Dominican Republic, Jamaica, and Puerto Rico
(Jamaica Observer 2020).
53. Baruri (2016).
54. Creydt and Fischer (2019).
55. Open Access Government (2019).
56. Kamath (2018).
57. Levy (2014).
58. The World Health Organization (WHO 2018) defines as
substandard “authorized medical products that fail to
meet either their quality standards or specifications, or
both” and falsified “medical products that deliberately/
fraudulently misrepresent their identity, composition or
source.”
59. WHO (2018).
60. Taylor (2016).
61. Google (2021).
62. World Bank (2017).
63. Rouhiainen (2019).
64. Hao (2019).
65. UNESCO (2020).
66. Future Hub (2020).
67. Gerretsen (2020).
68. Okello (2018).
69. Estimates are provided by Gojek. See Universitas Indo-
nesia (2020).
70. WHO (2019).
71. See, for example, Aker (2010); Allen (2014); Eaton and
Kortum (2002); Jensen (2007); Simonovska and Waugh
(2014); Startz (2017).
72. See LYFE Marketing “Traditional Media vs. Social
Media Advertising: Cost Comparison,” https://www.lyfe
marketing.com/traditional-media-versus-social-media/.
73. Rumo Arongo Ndiege (2019).
74. EU Open Data Portal, Eurostat, “Social Media Use by
Type, Internet Advertising” (dataset), https://data.europa
.eu/euodp/en/data/dataset/MTxwCIIEx8RhOhZMmg
Wvg.
75. Facebook, OECD, and World Bank (2017).
76. Facebook, OECD, and World Bank (2017).
77. Bailey et al. (2020).
78. Bailey et al. (2018).
79. A model developed by Farboodi et al. (2019) shows that
data accumulation increases the skewness of firm
size distribution as large firms generate more data
and invest more in active experimentation. Although
there has been less research on DLT applications,
Benos, Garratt, and Gurrola-Perez (2019) suggested that
similar dynamics would also push these markets to
concentration.
80. See ecomConnect, International Trade Centre, African
Marketplace Explorer (dashboard), https://ecomconnect
.org/page/african-marketplace-explorer.
81. Argentesi et al. (2019).
82. CB Insights (2020). Private market funding refers to the
total amount of money a firm has received, including
from financial institutions and venture funding. Money
raised in public markets is excluded.
83. IRGC (2017).
84. Riley and Kulathunga (2017).
85. Romm (2020).
86. Evans (2016).
87. Very large firms have more than 10,000 employees and
more than US$1 billion in revenues. See Nyman and
Barajas (forthcoming).
88. In the presence of privacy concerns and negative exter-
nalities in data sharing, the market may instead gener-
ate too much data sharing. See Acemoglu et al. (2019).
89. Jones and Tonetti (2019).
90. Schmidt (2018).
91. Robertson (2020).
92. Kemp (2019).
93. CEA (2018).
94. Gordon et al. (2015); Kashyap and Wetherilt (2019).
95. McGregor, Murray, and Ng (2019).
96. Hurley and Adebayo (2017).
97. Mateescu and Nguyen (2019).
References
Acemoglu, Daron, Ali Makhdoumi, Azarakhsh Malekian,
and Asuman Ozdaglar. 2019. “Too Much Data: Prices
and Inefficiencies in Data Markets.” NBER Working
Paper 26296, National Bureau of Economic Research,
Cambridge, MA. https://doi.org/10.3386/w26296.
Aghion, Philippe, Benjamin F. Jones, and Charles I. Jones.
2017. “Artificial Intelligence and Economic Growth.”
NBER Working Paper 23928, National Bureau of Eco-
nomic Research, Cambridge, MA.
Aker, Jenny C. 2010. “Information from Markets Near and
Far: Mobile Phones and Agricultural Markets in Niger.”
American Economic Journal: Applied Economics 2 (3): 46–59.
https://doi.org/10.1257/app.2.3.46.
Alibaba Group. 2019. “Alibaba Group Announces March
Quarter and Full Fiscal Year 2019 Results.” Press release,
May 15, 2019, Hangzhou, China. https://www.alibaba
group.com/en/news/press_pdf/p190515.pdf.
Allen, Treb. 2014. “Information Frictions in Trade.” Econometrica
82 (6): 2041–83.
Alphabet Inc. 2019. “Form 10-K: Annual Report Pursuant
to Section 13 or 15(D) of the Securities Exchange Act of
1934.” US Securities and Exchange Commission, Wash-
ington, DC. https://abc.xyz/investor/static/pdf/20200204
_alphabet_10K.pdf?cache=cdd6dbf.
Argentesi, Elena, Paolo Buccirossi, Emilio Calvano, Tomaso
Duso, Alessia Marrazzo, and Salvatore Nava. 2019.
“Ex-post Assessment of Merger Control Decisions in
Digital Markets.” With contributions by Elena Salo-
mone and Anna Violini, Lear, Rome. https://www.learlab
.com/wp-content/uploads/2019/06/CMA_past_digital
_mergers_GOV.UK_version-1.pdf.
Aron, Janine, and John Muellbauer. 2019. “The Economics of
Mobile Money: Harnessing the Transformative Power of
Technology to Benefit the Global Poor.” VoxEU.Org, May 7,
2019. https://voxeu.org/article/economics-mobile-money.
Bailey, Michael, Rachel Cao, Theresa Kuchler, Johannes
Stroebel, and Arlene Wong. 2018. “Social Connectedness:
Measurement, Determinants, and Effects.” Journal of
Economic Perspectives 32 (3): 259–80.
Bailey, Michael, Abhinav Gupta, Sebastian Hillenbrand,
Theresa Kuchler, Robert J. Richmond, and Johannes
Stroebel. 2020. “International Trade and Social Connect-
edness.” NBER Working Paper 26960, National Bureau
of Economic Research, Cambridge, MA. https://doi.org
/10.3386/w26960.
Bailin, Alberto, Peter Gal, Valentine Millot, and Stéphane
Sorbe. 2019. “Like It or Not? The Impact of Online Plat-
forms on the Productivity of Incumbent Service Provid-
ers.” OECD Economics Department Working Paper 1548,
Organisation for Economic Co-operation and Development,
Paris. https://doi.org/10.1787/080a17ce-en.
Balchin, Neil, Bernard Hoekman, Hope Martin, Maximiliano
Mendez-Parra, Phyllis Papadavid, David Primack, and
Dirk Willem te Velde. 2016. “Trade in Services and Economic
Transformation.” SET Report, Supporting Economic
Transformation, Overseas Development Institute,
London. https://set.odi.org/wp-content/uploads/2016/11
/SET-Trade-in-Services-and-Economic-Transformation
_Final-Nov2016.pdf.
Baruri, Pani. 2016. “Blockchain Powered Financial Inclu-
sion.” PowerPoint presentation, Cognizant, Teaneck, NJ.
http://pubdocs.worldbank.org/en/710961476811913780
/Session-5C-Pani-Baruri-Blockchain-Financial-Inclusion
-Pani.pdf.
Benos, Evangelos, Rodney Garratt, and Pedro Gurrola-Perez.
2019. “The Economics of Distributed Ledger Technology
for Securities Settlement.” Ledger 4 (November): 121–56.
https://doi.org/10.5195/ledger.2019.144.
Berg, Tobias, Valentin Burg, Ana Gombović, and Manju
Puri. 2018. “On the Rise of the FinTechs: Credit
Scoring Using Digital Footprints.” FDIC CFR Working
Paper 2018-04, Center for Financial Research, Federal
Deposit Insurance Corporation, Arlington, VA. https://
www.fdic.gov/analysis/cfr/2018/wp2018/cfr-wp2018-04
.pdf.
Bertrand, Marianne, and Emir Kamenica. 2018. “Coming
Apart? Cultural Distances in the United States over
Time.” NBER Working Paper 24771, National Bureau
of Economic Research, Cambridge, MA. https://doi.org
/10.3386/w24771.
Borchert, Ingo, and Aaditya Mattoo. 2009. “The Crisis-Resilience of Services Trade.” Service Industries Journal 30 (13): 2115–36.
Brynjolfsson, Erik, Avinash Collis, Walter Erwin Diewert, Kevin J. Fox, and Felix Eggers. 2019. “GDP-B: Accounting for the Value of New and Free Goods in the Digital Economy.” NBER Working Paper 25695, National Bureau of Economic Research, Cambridge, MA.
Brynjolfsson, Erik, Lorin M. Hitt, and Heekyung Hellen Kim. 2012. “Strength in Numbers: How Does Data-Driven Decision-Making Affect Firm Performance?” In International Conference on Information Systems 2011 (ICIS 2011), vol. 1, edited by Association for Information Systems, 541–58. Red Hook, NY: Curran Associates.
Brynjolfsson, Erik, and Kristina McElheran. 2016a. “Data in Action: Data-Driven Decision Making in U.S. Manufacturing.” CES Working Paper 16-06, Center for Economic Studies, US Census Bureau, Washington, DC. https://www2.census.gov/ces/wp/2016/CES-WP-16-06.pdf.
Brynjolfsson, Erik, and Kristina McElheran. 2016b. “The Rapid Adoption of Data-Driven Decision-Making.” American Economic Review 106 (5): 133–39. https://doi.org/10.1257/aer.p20161016.
Bughin, Jacques, and Nicolas van Zeebroeck. 2017. “The Best Response to Digital Disruption.” MIT Sloan Management Review 58 (4): 80–86. https://sloanreview.mit.edu/article/the-right-response-to-digital-disruption/.
Cabral, Luís, and Ali Hortaçsu. 2010. “The Dynamics of Seller Reputation: Evidence from eBay.” Journal of Industrial Economics 58 (1): 54–78.
CB Insights. 2020. “The Fintech 250: The Top Fintech Companies of 2020.” Research Report, CB Insights, New York. https://www.cbinsights.com/research/report/fintech-250-startups-most-promising/.
CEA (Council of Economic Advisers). 2018. “The Cost of Malicious Cyber Activity to the U.S. Economy.” CEA, White House, Washington, DC. https://www.whitehouse.gov/wp-content/uploads/2018/02/The-Cost-of-Malicious-Cyber-Activity-to-the-U.S.-Economy.pdf.
Clement, Jessica. 2020. “Google: Annual Advertising Revenue 2001–2019.” Statista, New York. https://www.statista.com/statistics/266249/advertising-revenue-of-google/.
Cockburn, Iain M., Rebecca Henderson, and Scott Stern. 2019. “The Impact of Artificial Intelligence on Innovation: An Exploratory Analysis.” In The Economics of Artificial Intelligence: An Agenda, edited by Ajay Agrawal, Joshua Gans, and Avi Goldfarb, 115–46. Cambridge, MA: National Bureau of Economic Research; Chicago: University of Chicago Press. https://www.nber.org/books-and-chapters/economics-artificial-intelligence-agenda/impact-artificial-intelligence-innovation-exploratory-analysis.
Creydt, Marina, and Markus Fischer. 2019. “Blockchain and More: Algorithm Driven Food Traceability.” Food Control 105 (November): 45–51. https://doi.org/10.1016/j.foodcont.2019.05.019.
De, Prabir, and Ajitava Raychaudhuri. 2008. “Is India’s Services Trade Pro-Poor? A Simultaneous Approach.” Markhub Working Paper 16, Macao Regional Knowledge Hub, United Nations Economic and Social Commission for Asia and the Pacific, Macao SAR, China.
Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar, and Jake Hess. 2018. The Global Findex Database 2017: Measuring Financial Inclusion and the Fintech Revolution. Washington, DC: World Bank. https://doi.org/10.1596/978-1-4648-1259-0.
Eaton, Jonathan, and Samuel Kortum. 2002. “Technology, Geography, and Trade.” Econometrica 70 (5): 1741–79. https://doi.org/10.1111/1468-0262.00352.
EIB (European Investment Bank). 2019. “EIB Group Survey on Investment and Investment Finance 2019: EU Overview.” Economics Department, European Investment Bank, Luxembourg. https://www.eib.org/en/publications/econ-eibis-2019-eu.
Eisenmeier, Siegfried R. J. 2018. “Case Study: Ride-Sharing Platforms in Developing Countries: Effects and Implications in Mexico City.” P4P Commission Background Paper 3, Pathways for Prosperity Commission on Technology and Inclusive Development, Blavatnik School of Government, University of Oxford, Oxford, UK. https://pathwayscommission.bsg.ox.ac.uk/sites/default/files/2019-09/ride-sharing_platforms_in_developing_countries.pdf.
EIU (Economist Intelligence Unit). 2012. “The Deciding Factor: Big Data and Decision-Making.” Business Analytics, Capgemini, Paris.
Evans, David S., and Richard Schmalensee. 2016. Matchmakers: The New Economics of Multisided Platforms. Boston: Harvard Business Review Press.
Evans, Peter C. 2016. “The Rise of Asian Platforms.” Emerging Platform Economy Series 3, Center for Global Enterprise, New York. https://www.thecge.net/web/viewer.html?file=/app/uploads/2016/11/FINALAsianPlatformPaper.pdf.
Evans, Peter C., and Annabelle Gawer. 2016. “The Rise of the Platform Enterprise: A Global Survey.” Emerging Platform Economy Series 1, Center for Global Enterprise, New York. https://www.thecge.net/app/uploads/2016/01/PDF-WEB-Platform-Survey_01_12.pdf.
Facebook, OECD (Organisation for Economic Co-operation and Development), and World Bank. 2017. “Future of Business Survey: Gender Management in Business.” Factworks, San Mateo, CA. https://fbnewsroomes.files.wordpress.com/2017/02/future-of-business-survey-gender-management-in-business-january-20171.pdf.
Falk, Martin, and Eva Hagsten. 2015. “E-Commerce Trends and Impacts across Europe.” International Journal of Production Economics 170 (December): 357–69. https://doi.org/10.1016/j.ijpe.2015.10.003.
Farboodi, Maryam, Roxana Mihet, Thomas Philippon, and Laura Veldkamp. 2019. “Big Data and Firm Dynamics.” NBER Working Paper 25515, National Bureau of Economic Research, Cambridge, MA. https://doi.org/10.3386/w25515.
Fernando, Jason. 2021. “Factors of Production.” Investopedia, January 3, 2021. https://www.investopedia.com/terms/f/factors-production.asp.
Future Hub. 2020. “E-Logistics Transforming Mid-Mile Freight Logistics in Emerging Markets: Stories Told and Untold.” Future Insight (blog), June 30, 2020. http://www.fhub.io/blog/e-logistics-transforming-mid-mile-freight-logistics-in-emerging-markets.
Gawer, Annabelle. 2014. “Bridging Differing Perspectives on Technological Platforms: Toward an Integrative Framework.” Research Policy 43 (7): 1239–49.
Gerretsen, Isabelle. 2020. “Trucking App Kobo360 Wants to Speed Up Deliveries across Africa.” CNN Business: Innovate Africa, April 16, 2020. https://www.cnn.com/2020/04/16/tech/kobo-360-trucks-spc-intl/index.html.
Google. 2021. “How Doctors, Dispatch Drivers, and Blood Donors Are Coming Together to Save Lives across Africa.” Search On, Lifebank, Google, Mountain View, CA. https://about.google/stories/lifebank/#:~:text=By%20designing%20a%20system%20to,take%20your%20time%20to%20solve.
Gordon, Lawrence A., Martin P. Loeb, William Lucyshyn, and Lei Zhou. 2015. “Increasing Cybersecurity Investments in Private Sector Firms.” Journal of Cybersecurity 1 (1): 3–17. https://doi.org/10.1093/cybsec/tyv011.
Hao, Karen. 2019. “China Has Started a Grand Experiment in AI Education: It Could Reshape How the World Learns.” MIT Technology Review. https://www.technologyreview.com/2019/08/02/131198/china-squirrel-has-started-a-grand-experiment-in-ai-education-it-could-reshape-how-the/.
Houser, Daniel, and John Wooders. 2005. “Hard and Soft Closes: A Field Experiment on Auction Closing Rules.” In Experimental Business Research: Economic and Managerial Perspectives, vol. 2, edited by Amnon Rapoport and Rami Zwick, 123–31. Dordrecht, the Netherlands: Springer.
Hurley, Mikella, and Julius Adebayo. 2017. “Credit Scoring in the Era of Big Data.” Yale Journal of Law and Technology 18 (1): 148–216.
IFC (International Finance Corporation). 2009. “M-Money Channel Distribution Case, Kenya: Safaricom M-PESA.” IFC, Washington, DC. https://www.ifc.org/wps/wcm/connect/e0d2a9bd-16b9-4a36-8498-0b2650b9af8b/Tool%2B6.7.%2BCase%2BStudy%2B-%2BM-PESA%2BKenya%2B.pdf?MOD=AJPERES&CVID=jkCVy-n.
IRGC (International Risk Governance Center). 2017. “Governing Risks and Benefits of Distributed Ledger Technologies.” IRGC, Geneva. https://irgc.org/wp-content/uploads/2018/09/IRGC.-2017.-Governing-risks-and-benefits-of-DLTs.-Highlights.pdf.
Jamaica Observer. 2020. “Mastercard, CHTA Launch Tourism Insights Platform.” May 8, 2020. http://www.jamaicaobserver.com/business-report/mastercard-chta-launch-tourism-insights-platform_193696?profile=1442.
Jensen, Robert. 2007. “The Digital Provide: Information (Technology), Market Performance, and Welfare in the South Indian Fisheries Sector.” Quarterly Journal of Economics 122 (3): 879–924. http://www.jstor.org/stable/25098864.
Jones, Charles I., and Christopher Tonetti. 2019. “Nonrivalry and the Economics of Data.” NBER Working Paper 26260, National Bureau of Economic Research, Cambridge, MA. https://ideas.repec.org/p/nbr/nberwo/26260.html.
Kamath, Reshma. 2018. “Food Traceability on Blockchain: Walmart’s Pork and Mango Pilots with IBM.” JBBA, Journal of the British Blockchain Association 1 (1): 3712. https://doi.org/10.31585/jbba-1-1-(10)2018.
Kashyap, Anil K., and Anne Wetherilt. 2019. “Some Principles for Regulating Cyber Risk.” AEA Papers and Proceedings 109 (May): 482–87. https://doi.org/10.1257/pandp.20191058.
Kemp, Katharine. 2019. “Concealed Data Practices and Competition Law: Why Privacy Matters.” Report UNSWLRS 53, University of New South Wales Law Research Series, UNSW Law, University of New South Wales, Sydney.
Kumar, Kabir, and Kim Muhota. 2012. “Can Digital Footprints Lead to Greater Financial Inclusion?” CGAP Brief, Consultative Group to Assist the Poor, Washington, DC. https://www.cgap.org/sites/default/files/researches/documents/CGAP-Brief-Can-Digital-Footprints-Lead-to-Greater-Financial-Inclusion-Jul-2012.pdf.
LenddoEFL. 2020. “Product Briefs.” LenddoEFL, Singapore. https://lenddoefl.com/resources.
Lendle, Andreas, Marcelo Olarreaga, Simon Schropp, and Pierre-Louis Vezina. 2012. “There Goes Gravity: How eBay Reduces Trade Costs.” Policy Research Working Paper 6253, World Bank, Washington, DC.
Levy, Stephen. 2014. “Colorimetrix App Turns Smartphones into Lab Test Readers.” MD+DI, March 19, 2014. https://www.mddionline.com/digital-health/colorimetrix-app-turns-smartphones-lab-test-readers.
Livingston, Jeffrey A. 2002. “How Valuable Is a Good Reputation? A Sample Selection Model of Internet Auctions.” Review of Economics and Statistics 87 (3): 453–65.
Lobosco, Katie. 2013. “Facebook Friends Could Change Your Credit Score.” CNN Business, August 27, 2013. https://money.cnn.com/2013/08/26/technology/social/facebook-credit-score/index.html.
Lucking-Reiley, David, Doug Bryan, Naghi Prasad, and Daniel Reeves. 2007. “Pennies from eBay: The Determinants of Price in Online Auctions.” Journal of Industrial Economics 55 (2): 223–33.
Luo, Xubei, and Chiyu Niu. 2019. “E-Commerce Participation and Household Income Growth in Taobao Villages.” Poverty and Equity Global Practice Working Paper 198, World Bank, Washington, DC.
Magalhaes, Gustavo, and Catarina Roseira. 2017. “Open Government Data and the Private Sector: An Empirical View on Business Models and Value Creation.” Government Information Quarterly 37 (3): 101248. https://doi.org/10.1016/j.giq.2017.08.004.
Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. 2011. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute, New York.
Manyika, James, Susan Lund, Marc Singer, Olivia White, and Chris Berry. 2016. “Digital Finance for All: Powering Inclusive Growth in Emerging Economies.” McKinsey Global Institute, New York. https://www.mckinsey.com/~/media/McKinsey/Featured%20Insights/Employment%20and%20Growth/How%20digital%20finance%20could%20boost%20growth%20in%20emerging%20economies/MG-Digital-Finance-For-All-Full-report-September-2016.ashx.
Mastercard. 2017. “Leveraging Big Data to Drive Tourism Revenue.” Mastercard Tourism Insights, Mastercard, Purchase, NY. https://www.mastercard.us/content/dam/public/mastercardcom/na/us/en/documents/tourism-insights-summary.pdf.
Mateescu, Alexandra, and Aiha Nguyen. 2019. “Explainer: Algorithmic Management in the Workplace.” February, Data & Society Research Institute, New York. https://datasociety.net/wp-content/uploads/2019/02/DS_Algorithmic_Management_Explainer.pdf.
McGregor, Lorna, Daragh Murray, and Vivian Ng. 2019. “International Human Rights Law as a Framework for Algorithmic Accountability.” International and Comparative Law Quarterly 68 (2): 309–43.
McNair, Corey. 2018. “Global Ad Spending Update: Alibaba, Facebook, and Google to Capture Over 60% of Digital Ad Dollars in 2019.” eMarketer, November 20, 2018, Insider Intelligence, New York. https://www.emarketer.com/content/global-ad-spending-update.
Melnik, Mikhail I., and James Alm. 2002. “Does a Seller’s Ecommerce Reputation Matter? Evidence from eBay Auctions.” Journal of Industrial Economics 50 (3): 337–49.
MGI (McKinsey Global Institute). 2018. “Notes from the AI Frontier: Modeling the Impact of AI on the World Economy.” Discussion Paper, MGI, New York. https://www.mckinsey.com/~/media/McKinsey/Featured%20Insights/Artificial%20Intelligence/Notes%20from%20the%20frontier%20Modeling%20the%20impact%20of%20AI%20on%20the%20world%20economy/MGI-Notes-from-the-AI-frontier-Modeling-the-impact-of-AI-on-the-world-economy-September-2018.ashx.
Nyman, Sara, and Rodrigo Barajas. Forthcoming. “Antitrust in the Digital Economy: A Global Perspective.” World Bank, Washington, DC.
Nyman, Sara, and Clara Stinshoff. Forthcoming. “A Mapping of Digital Platform Firms in Developing Economies.” WDR 2021 background paper, World Bank, Washington, DC.
Okello, Ron. 2018. “TechCrunch Battlefield Africa Winner Lori Systems Expands into Three New Countries; Wins Multiple Bids Associated with $4B Railroad Project.” Business Wire, August 18, 2018. https://www.businesswire.com/news/home/20180918005205/en/TechCrunch-Battlefield-Africa-Winner-Lori-Systems-Expands.
Open Access Government. 2019. “World Bank Blockchain Pilot Shows Fresh Narrative for Haiti’s Farmers.” Blockchain News, March 25, 2019. https://www.openaccessgovernment.org/world-bank-blockchain-haitis-farmers/61205/.
Patnam, Manasa, and Weijia Yao. 2020. “The Real Effects of Mobile Money: Evidence from a Large-Scale Fintech Expansion.” IMF Working Paper WP/20/138, International Monetary Fund, Washington, DC. https://www.imf.org/en/Publications/WP/Issues/2020/07/24/The-Real-Effects-of-Mobile-Money-Evidence-from-a-Large-Scale-Fintech-Expansion-49549.
Riley, Thyra A., and Anoma Kulathunga. 2017. Bringing E-money to the Poor: Successes and Failures. Directions in Development: Finance Series. Washington, DC: World Bank.
Robertson, Viktoria H. S. E. 2020. “Excessive Data Collection: Privacy Considerations and Abuse of Dominance in the Era of Big Data.” Common Market Law Review 57 (1): 161–90.
Romm, Tony. 2020. “Companies Burned by Big Tech Plead for Congress to Regulate Apple, Amazon, Facebook, and Google.” Washington Post, January 18, 2020.
Rouhiainen, Lasse. 2019. “How AI and Data Could Personalize Higher Education.” Harvard Business Review, October 14, 2019. https://hbr.org/2019/10/how-ai-and-data-could-personalize-higher-education.
Rumo Arongo Ndiege, Joshua. 2019. “Social Media Technology for the Strategic Positioning of Small and Medium-Sized Enterprises: Empirical Evidence from Kenya.” EJISDC, Electronic Journal of Information Systems in Developing Countries 85 (2): e12069. https://doi.org/10.1002/isd2.12069.
Sandvine. 2019. “The Mobile Internet Phenomena Report, February 2019.” Sandvine, Fremont, CA. https://www.sandvine.com/hubfs/downloads/phenomena/2019-mobile-phenomena-report.pdf.
Schmidt, Douglas C. 2018. “Google Data Collection.” Digital Content Next, New York. https://digitalcontentnext.org/wp-content/uploads/2018/08/DCN-Google-Data-Collection-Paper.pdf.
Simonovska, Ina, and Michael Waugh. 2014. “The Elasticity of Trade: Estimates and Evidence.” Journal of International Economics 92 (1): 34–50.
Singh, Manish. 2020. “Google Ends Its Free Wi-Fi Program Station.” TechCrunch, February 17, 2020. https://social.techcrunch.com/2020/02/17/google-ends-its-free-wi-fi-program-station/.
Startz, Meredith. 2017. “The Value of Face-to-Face: Search and Contracting Problems in Nigerian Trade.” VoxDev, July 31, 2017. https://voxdev.org/topic/firms-trade/value-face-face-search-and-contracting-problems-nigerian-trade#:~:text=My%20estimates%20suggest%20that%20search,one%20sixth%20of%20consumer%20spending.
Stott, Andrew. 2014. “Open Data for Economic Growth.” Working Paper 89606, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/131621468154792082/Open-data-for-economic-growth.
Suri, Tavneet, and William Jack. 2016. “The Long-Run Poverty and Gender Impacts of Mobile Money.” Science 354 (6317): 1288–92. https://science.sciencemag.org/content/354/6317/1288.
Taylor, Phil. 2016. “Marie Stopes Turns to MPedigree for Tracking Technology.” Securing Industry, Bromsgrove, UK. https://www.securingindustry.com/pharmaceuticals/marie-stopes-turns-to-mpedigree-for-tracking-technology/s40/a2723/.
UNESCO (United Nations Educational, Scientific, and Cultural Organization). 2020. “Education: From Disruption to Recovery.” UNESCO, Paris. https://en.unesco.org/covid19/educationresponse.
UNGWG (United Nations Global Working Group on Big Data for Official Statistics). 2019. “Handbook on the Use of Mobile Phone Data for Official Statistics.” Draft, Statistics Division, Department of Economic and Social Affairs, United Nations, New York. https://unstats.un.org/bigdata/task-teams/mobile-phone/MPD%20Handbook%2020191004.pdf.
Universitas Indonesia. 2020. “Dampak Ekonomi Gojek 2019: dan Peran Ekosistem Gojek di Indonesia Saat Pandemi COVID-19” [The Economic Impact of Gojek 2019 and the Role of the Gojek Ecosystem in Indonesia during the COVID-19 Pandemic]. Demographics Institute, Faculty of Economics and Business, University of Indonesia, Depok, Indonesia. https://drive.google.com/file/d/1RgVRNJaszs36-3z__smhsqBegrufhNWw/view.
von Mörner, Moritz. 2017. “Application of Call Detail Records: Chances and Obstacles.” Transportation Research Procedia 25: 2233–41. https://doi.org/10.1016/j.trpro.2017.05.429.
West, Sarah, and Ellery Roberts Biddle. 2017. “Facebook’s Free Basics Doesn’t Connect You to the Global Internet, But It Does Collect Your Data.” Global Voices Advox, July 27, 2017. https://advox.globalvoices.org/2017/07/27/facebooks-free-basics-doesnt-connect-you-to-the-global-internet-but-it-does-collect-your-data/.
WHO (World Health Organization). 2018. “Substandard and Falsified Medical Products.” Fact Sheet, WHO, Geneva. https://www.who.int/news-room/fact-sheets/detail/substandard-and-falsified-medical-products.
WHO (World Health Organization). 2019. “The Global Vaccine Action Plan (2011–2020): Review and Lessons Learned.” Report WHO/IVB/19.07, Strategic Advisory Group of Experts on Immunization, WHO, Geneva. http://www.who.int/immunization/global_vaccine_action_plan/en/.
World Bank. 2017. World Development Report 2017: Governance and the Law. Washington, DC: World Bank.
Spotlight 3.1
The huge potential of open data for business applications
Public intent data play a foundational role as a system of reference for the entire economy.
Public intent data add tremendous value to the economy as a whole and to various sectors. The gross value added from public data is estimated to range from 0.4 percent to 1.4 percent of GDP, according to a 2016 meta-study that focused mainly on high-income countries.1 Specific public intent data products also yield great value for particular sectors. For example, Denmark’s open access dataset of addresses generated direct economic benefits of €62 million (over DKr 450 million) between 2005 and 2009, returning the €2 million (roughly DKr 15 million) cost of investments in data many times over.2 This example is relevant for low- and middle-income countries, where the lack of addresses and address datasets has been a barrier to the development of data-driven transport and logistics services.
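As a rough reading of “many times over,” dividing the reported benefits by the reported investment cost gives a benefit-cost ratio of about 30 in either currency. This is a simple undiscounted ratio computed here for illustration from the figures above; it is not a number reported in the underlying case study, and it ignores when the costs and benefits occurred.

```latex
\[
\frac{\text{benefits (2005--2009)}}{\text{cost}}
  = \frac{\text{EUR } 62 \text{ million}}{\text{EUR } 2 \text{ million}} = 31,
\qquad
\frac{\text{DKr } 450 \text{ million}}{\text{DKr } 15 \text{ million}} \approx 30 .
\]
```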
Creating value using public intent data
Entrepreneurs create value using public intent data in ways similar to how they use other data. First, companies use public intent data to improve their operations. US retailers, for example, combine data from the American Community Survey with their own sales data to customize inventory regionally.3 On the operations side, businesses rely on price-level data to set wages and allowances, among many other uses.4 Second, firms use public intent data to develop new products or services, including research and analytics services.5 For example, the global energy analytics sector depends on data from the Energy Information Administration to monitor worldwide patterns of energy use.6 Other businesses use data to provide new forms of advice to their customers. Farmerline, a company in Ghana, combines government meteorological and administrative data with proprietary data to provide advice to farmers via text message. Firms also rely on demographic statistics and business registers to inform their decisions about whether and how to enter new markets. Finally, firms may act as data intermediaries (see chapter 8), aggregating and repackaging government data in more accessible, user-friendly formats.
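The first use described above, combining a public survey with a retailer’s own sales records to plan inventory by region, can be sketched in a few lines. This is a minimal, hypothetical illustration, not the method of any retailer mentioned in the report: the region names, household counts, and sales figures are invented, and the rule of extrapolating demand for regions without sales history from a pooled sales rate per household is an assumption chosen for simplicity. The point it shows is that public data can fill the gap where proprietary data are missing.

```python
"""Hypothetical sketch: combine public survey counts with a retailer's own
sales records to split new inventory across regions.

All names and numbers are invented for illustration; the extrapolation rule
(pooled units sold per household) is an assumption, not the report's method.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    name: str
    households: int            # public intent data, e.g. a survey household count
    units_sold: Optional[int]  # private intent data; None if no sales history


def estimate_demand(regions: list[Region]) -> dict[str, float]:
    """Use own sales where available; elsewhere extrapolate from public data."""
    known = [r for r in regions if r.units_sold is not None]
    # Pooled units sold per household across regions that have sales history.
    rate = sum(r.units_sold for r in known) / sum(r.households for r in known)
    return {
        r.name: float(r.units_sold) if r.units_sold is not None else rate * r.households
        for r in regions
    }


def allocate(regions: list[Region], total_units: int) -> dict[str, int]:
    """Split total_units in proportion to estimated demand (largest remainder)."""
    demand = estimate_demand(regions)
    total = sum(demand.values())
    shares = {name: total_units * d / total for name, d in demand.items()}
    alloc = {name: int(s) for name, s in shares.items()}
    leftover = total_units - sum(alloc.values())
    # Give any units lost to rounding down to the largest fractional parts.
    by_remainder = sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


if __name__ == "__main__":
    regions = [
        Region("North", households=40_000, units_sold=2_000),
        Region("South", households=60_000, units_sold=2_000),
        Region("East", households=25_000, units_sold=None),  # new market, no sales yet
    ]
    print(allocate(regions, total_units=1_000))
    # → {'North': 400, 'South': 400, 'East': 200}
```

With these made-up numbers, the pooled rate is 4,000 units over 100,000 households, so the East region, which has no sales history, receives an estimated demand of 1,000 units, and 1,000 units of new stock split 400/400/200. A real application would also have to handle regional differences that a single pooled rate ignores.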
The changing landscape of business sectors driven by public data
While businesses driven by public data have been studied in high-income economies in some detail,7 there is little systematic information on private sector use of public intent data and their value to the economy in lower-income countries. Nevertheless, a handful of sources shed light on the business use of public intent data, including in emerging economies.8 These sources indicate that companies using public intent data span a wide range of sectors in both high-income and low- and middle-income economies. Around the world, the technology sector clearly dominates. In low- and middle-income economies, the research and consulting sector is the second most frequent user of public intent data. Companies using such data tend to be young and small in terms of the number of employees, with a large majority of the global companies in GovLab’s OpenData500 Global Network database having 200 or fewer employees. In terms of the data used, half of the US-based OpenData500 companies use data from multiple government agencies. The US Census Bureau is one of the most used sources (16 percent). Similarly, in Mexico, the national statistics office, the National Institute of Statistics and Geography (INEGI), is the most important source of public intent data for businesses, with 88 percent of companies reporting that they use INEGI data. Among the 200 firms in low- and middle-income countries included in the Open Data Impact Map compiled by the Open Data for Development Network, geospatial data are the most commonly used type of public intent data (41 percent), followed by demographic data (36 percent), economic data (30 percent), and health data (27 percent).
These assessments indicate that the business sector that uses public data is much smaller in most low- and middle-income countries than in high-income economies. This pattern is related closely to challenges with public intent data in general.9 In many cases, national data systems are limited with respect to the amount of data being produced as well as their quality and usability, timeliness, openness, and accessibility (see chapter 2).10 Indeed, companies driven by open data surveyed by the World Bank reported poor quality and lack of openness of and accessibility to public intent data as major concerns.11
Figure S3.1.1 Private company use of public data is extremely valuable in the United States, suggesting the value of open government data
[Chord diagram. Federal agencies (right half of the circle): Department of Agriculture; Department of Commerce; Department of Defense; Department of Education; Department of Energy; Department of Health and Human Services; Department of Homeland Security; Department of Justice; Department of Labor; Department of Transportation; Department of the Interior; Department of the Treasury; Federal Election Commission; Federal Reserve Board; Multiple federal data sources; National Aeronautics and Space Administration; Securities and Exchange Commission; Environmental Protection Agency. Company categories (left half): Transportation; Insurance; Energy; Housing and real estate; Geospatial and mapping; Governance; Lifestyle and consumer; Health care; Food and agriculture; Business and legal services; Finance and investment; Scientific research; Data and technology; Education; Environment and weather; Research and consulting.]
Source: Verhulst and Caplan 2015. Data at http://bit.do/WDR2021-Fig-S3_1_1. © The GovLab. Used with permission of The GovLab; further permission required for reuse.
Note: The figure centers on 500 US firms. Starting with the right-hand, purple-shaded half of the circle, the gray lines emanating from the federal agencies show which type of private sector firms have used data from which government departments. The portion of the semicircle for each department reflects the number of firms using its data. Moving to the left-hand, green-shaded half of the circle, the gray lines emanating from the private sector reveal which categories of company have used data from which government agency. For example, firms in the finance and investment sector have used open data from 19 US departments and agencies.
The more active public data–driven business sectors in Mexico and other Latin American countries illustrate the potential for low- and middle-income countries. Realizing this potential requires better financing mechanisms and high-capacity, integrated, and open national data systems (see chapter 9).
Figure S3.1.1 visualizes the huge potential of open data. The figure centers on 500 firms based in the United States. It reveals widespread private sector use of publicly available federal government data, though data from some agencies are used more than data from others. For example, fewer firms use open data from the Department of Agriculture than from the Department of Commerce (home of the US Census Bureau and many other important datasets), as evidenced by a smaller portion of the circle allocated to the Department of Agriculture. But even in the case of agriculture, firms from seven distinct sectors have used the Department of Agriculture’s open data. Figure S3.1.1 also reveals that companies from some sectors rely on many types of government data. For example, firms in the finance and investment sector have used open data from 19 US departments and agencies. The intricacies of the connections reveal the great potential for data use, reuse, and repurposing. Such uses have only begun to be exploited in both higher-income and lower-income countries.
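The two quantities the figure encodes, how many firms draw on each agency’s data (the size of each agency’s segment) and how many distinct agencies each company category uses (for example, 19 for finance and investment), are simple counts over firm-to-agency links. The report does not describe how the figure was built, so the following is only a hypothetical sketch with invented records. It counts each firm at most once per agency and each agency at most once per category.

```python
"""Hypothetical sketch of the tallies behind a firm-by-agency chord diagram.
The records are invented; they are not the OpenData500 data."""
from collections import defaultdict

# (firm, company category, agency whose open data the firm uses): hypothetical
RECORDS = [
    ("FirmA", "Finance and investment", "Department of Commerce"),
    ("FirmA", "Finance and investment", "Department of the Treasury"),
    ("FirmB", "Finance and investment", "Securities and Exchange Commission"),
    ("FirmC", "Food and agriculture", "Department of Agriculture"),
    ("FirmC", "Food and agriculture", "Department of Commerce"),
    ("FirmD", "Health care", "Department of Health and Human Services"),
    ("FirmD", "Health care", "Department of Health and Human Services"),  # duplicate link
]


def firms_per_agency(records):
    """Number of distinct firms using each agency's data (segment size)."""
    firms = defaultdict(set)
    for firm, _category, agency in records:
        firms[agency].add(firm)
    return {agency: len(names) for agency, names in firms.items()}


def agencies_per_category(records):
    """Number of distinct agencies each company category draws on."""
    agencies = defaultdict(set)
    for _firm, category, agency in records:
        agencies[category].add(agency)
    return {category: len(names) for category, names in agencies.items()}


if __name__ == "__main__":
    print(firms_per_agency(RECORDS))
    print(agencies_per_category(RECORDS))
```

Storing the links in sets means a duplicated record, such as the repeated FirmD line, does not inflate either count.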
Notes
1. Lateral Economics (2014).
2. McMurren, Verhulst, and Young (2016).
3. Hughes-Cromwick and Coronado (2019).
4. Hughes-Cromwick and Coronado (2019).
5. Gurin, Bonina, and Verhulst (2019); Magalhaes and Roseira (2017).
6. Hughes-Cromwick and Coronado (2019).
7. See, for example, Hughes-Cromwick and Coronado (2019); Lateral Economics (2014); Manyika et al. (2013); and Stott (2014).
8. See Morrison and Lal Das (2014); Center for Open Data Enterprise, Open Data Impact Map (database), https://opendataimpactmap.org/.
9. Gurin, Bonina, and Verhulst (2019).
10. Gurin, Bonina, and Verhulst (2019).
11. Morrison and Lal Das (2014).
References
Gurin, Joel, Carla Bonina, and Stefaan Verhulst. 2019. “Open Data Stakeholders: Private Sector.” In The State of Open Data: Histories and Horizons, edited by Tim Davies, Stephen B. Walker, Mor Rubinstein, and Fernando Perini, 418–29. Cape Town, South Africa: African Minds; Ottawa: International Development Research Centre. https://www.idrc.ca/en/book/state-open-data-histories-and-horizons.
Hughes-Cromwick, Ellen, and Julia Coronado. 2019. “The Value of US Government Data to US Business Decisions.” Journal of Economic Perspectives 33 (1): 131–46. https://doi.org/10.1257/jep.33.1.131.
Lateral Economics. 2014. “Open for Business: How Open Data Can Help Achieve the G20 Growth Target.” Omidyar Network, Redwood City, CA. https://lateraleconomics.com.au/wp-content/uploads/omidyar_open_business.pdf.
Magalhaes, Gustavo, and Catarina Roseira. 2017. “Open Government Data and the Private Sector: An Empirical View on Business Models and Value Creation.” Government Information Quarterly 37 (3): 101248. https://doi.org/10.1016/j.giq.2017.08.004.
Manyika, James, Michael Chui, Peter Groves, Diana Farrell, Steve Van Kuiken, and Elizabeth Almasi Doshi. 2013. Open Data: Unlocking Innovation and Performance with Liquid Information. New York: McKinsey Global Institute. https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/open-data-unlocking-innovation-and-performance-with-liquid-information.
McMurren, Juliet, Stefaan Verhulst, and Andrew Young. 2016. “Denmark’s Open Address Data Set: Consolidating and Freeing-Up Address Data.” The GovLab, New York University, New York; Omidyar Network, Redwood City, CA. https://odimpact.org/case-denmarks-open-address-data-set.html.
Morrison, Alla, and Prasanna Lal Das. 2014. “New Surveys Reveal Dynamism, Challenges of Open Data-Driven Businesses in Developing Countries.” Data Blog (blog), December 15, 2014. https://blogs.worldbank.org/opendata/new-surveys-reveal-dynamism-challenges-open-data-driven-businesses-developing-countries.
Stott, Andrew. 2014. “Open Data for Economic Growth.” Working Paper 89606, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/131621468154792082/Open-data-for-economic-growth.
Verhulst, Stefaan, and Robyn Caplan. 2015. “Open Data: A Twenty-First-Century Asset for Small and Medium-Sized Enterprises.” The GovLab, New York University, New York. https://www.thegovlab.org/static/files/publications/OpenData-and-SME-Final-Aug2015.pdf.
World-Development-Report-2021-Data-for-Better-Lives.pdf
Creative reuses of data for greater value | 121
Main messages
1. Innovations in repurposing and combining public intent and private intent data are opening doors to development impacts previously unimaginable. These innovations can inform and advance policy goals, help governments improve and target service delivery, and empower individuals and civil society.
2. When private intent data are repurposed for public purposes, they can help fill data gaps and provide real-time and finer-scale insights. When public intent and private intent data are combined, some or many of the limitations of each data type can be overcome.
3. Private intent data can be difficult to understand, monitor, and regulate. They may also miss the poorest or other marginalized populations and perpetuate discrimination and biases. Data protection is a key issue. Responsive regulation and consumer protection measures are needed, along with recognition of which populations are omitted from an analysis.
4. Using private intent data for effective policy making requires short- and long-term coordinated investments in training, data partnerships, and research. Best practices and guidelines need to be developed.
CHAPTER 4
Creative reuses of data for greater value
122 | World Development Report 2021
The power of repurposing and combining different types and sources of data
The lack of data and information is never more apparent than during a crisis such as the COVID-19 pandemic or an earthquake. Urgent questions such as “What is happening?” and “How can we help?” should receive good answers, and right away.
Consider the earthquake that devastated Haiti in
2010. Large donations of supplies and money poured
into the country within days of the disaster, but
delivering relief was difficult because vast numbers
of people scattered. Censuses were no longer useful
in helping responders direct relief to the people who
needed it most. Using data from mobile phones,
researchers were later able to demonstrate that they
could have pinpointed population movements in
almost real time. They found that one-third of the
estimated 630,000 residents of the capital, Port-au-
Prince, had fled the city.1
Even though this study
was retrospective, it demonstrated how real-time,
spatially pinpointed information like this could have
expedited relief efforts and saved countless lives had
it been accessed contemporaneously. This example
highlights an emerging question in development
research: When a pressing crisis such as the Haiti
earthquake or the COVID-19 pandemic emerges, what
data can complement traditional public intent data to
solve complex development challenges?
Recent technological shifts in lower-income coun-
tries—such as the adoption of mobile phones, social
media, digital transactions, and mobile money—have
generated a wealth of granular private intent data
(see chapter 3 and box 4.1) suited to a wide range of
secondary uses.2
These data are being leveraged to
Box 4.1 Using cellphones to combat COVID-19
After the onset of the COVID-19 outbreak, governments
began implementing policy measures to reduce social
contact and curb the spread of the pandemic. Data col-
lected through mobile phones, such as call detail records
and global positioning system (GPS) location data, have
been extremely valuable in quantifying the effectiveness
of policies, ranging from partial curfews to strict lock-
downs. These data enable measurement of population
density, travel patterns, and population mixing in real
time and at high resolution, making it possible to better
target policy interventions and improve epidemiological
modeling.a
Analysis of GPS locations showed that by
March 23, 2020, social distancing policies had helped
reduce mobility in major US cities by half.b
In Colombia,
Indonesia, and Mexico, the impact of nonpharmaceutical
interventions (such as travel restrictions and lockdowns)
on mobility differed by socioeconomic group. Smart-
phone users living in the top 20 percent wealthiest neigh-
borhoods in Jakarta, Indonesia, reduced their mobility up
to twice as much as those living in the bottom 40 percent
(map B4.1.1 and figure B4.1.1).c
Map B4.1.1 Mapping the home location of smartphone users in Jakarta, 2020
[Map: smartphone users (% of total population) by kelurahan boundary in the Jakarta metropolitan area; areas without data are marked.]
Source: Fraiberger et al. 2020. Data at http://bit.do/WDR2021-Map-B4_1_1.
Note: This map of Jakarta’s metropolitan area shows the spatial distribution of smartphone users’ home location as a percentage of Jakarta’s total population.
Using an epidemiological
model and estimates of population movements derived
from mobile phone data, research in China found that
nonpharmaceutical interventions implemented in late
January 2020 led to a 98.5 percent reduction in the num-
ber of COVID-19 cases one month later.
Meanwhile, mobile phones have proved to be a valu-
able tool for contact tracers seeking to alert individuals
who may have been in contact with an infected person.d
Although both private companies and government
actors have produced mobile phone applications for contact tracing (such as the Corona 100m app in the Republic
of Korea, TraceTogether in Singapore, and COVIDSafe in
Australia), their efficacy relative to more traditional forms
of contact tracing has not yet been established. Digital
contact tracing also raises important concerns about
data protection,e
prompting researchers worldwide to
develop contact tracing technologies that preserve pri-
vacy. Examples are the Private Kit: Safe Paths developed
by the Massachusetts Institute of Technology (MIT) and
the Decentralized Privacy-Preserving Proximity Tracing
(DP3T) protocol developed by a consortium of European
research institutions.
Despite the potential of deploying mobility data in
the fight against COVID-19, their impact on policy thus far
has been limited, especially in lower-income countries.
Bottlenecks include a lack of technical expertise among
government organizations; restrictions on data access,
especially by mobile network operators; and lack of the
investments and political will required to scale up one-
time projects.f
To ensure that mobility data can be made
accessible and useful for policy purposes, it is import-
ant for all stakeholders—governments, mobile phone
operators, technology companies, and researchers—
to collaborate and form interdisciplinary teams to facili-
tate readiness and responsiveness to future crises.
a. Buckee et al. (2020).
b. Klein et al. (2020).
c. Fraiberger et al. (2020).
d. Servick (2020b).
e. Servick (2020a).
f. Oliver et al. (2020).
Figure B4.1.1 Smartphone location data reveal the changes in the time users
spend at home in Jakarta
Source: Adapted from Fraiberger et al. 2020. Data at http://bit.do/WDR2021-Fig-B4_1_1.
Note: Figure shows the changes in the time users spent at home from February 1 to November 15, 2020, relative to the baseline period. PSBB =
Pembatasan Sosial Berskala Besar (large-scale social restrictions).
[Figure: time spent at home (% change), February to December 2020, for low-wealth (0%–40% of wealth), medium-wealth (40%–80% of wealth), and high-wealth (80%–100% of wealth) smartphone users, grouped by wealth of administrative unit. Annotated events: presidential address, state of emergency, strict PSBB, transport restrictions lifted, PSBB transition, and a second strict PSBB.]
monitor the effectiveness of policy measures and
predict outcomes of long-standing concern to devel-
opment practitioners and policy makers.
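Monitoring a policy measure with mobility data often comes down to comparing an indicator with its level before the measure took effect, as figure B4.1.1 does for time spent at home. The following is a minimal sketch, assuming a hypothetical daily series and a baseline defined as the median of the first few days; published mobility indices define their baselines in their own ways, and the function name here is illustrative only.

```python
from statistics import median


def pct_change_from_baseline(series, baseline_days):
    """Percentage change of each value relative to a baseline.

    The baseline is the median of the first `baseline_days` observations,
    a hypothetical choice standing in for a pre-policy reference period.
    """
    if not 1 <= baseline_days <= len(series):
        raise ValueError("baseline_days must be between 1 and len(series)")
    baseline = median(series[:baseline_days])
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return [100.0 * (value - baseline) / baseline for value in series]


# Hypothetical average hours spent at home per day:
# four days before a restriction, then three days after it.
hours_at_home = [12.0, 12.0, 12.0, 12.0, 15.0, 16.2, 15.6]
changes = pct_change_from_baseline(hours_at_home, baseline_days=4)
print([round(c, 1) for c in changes])
```

Using the median rather than the mean keeps a single unusual baseline day from shifting every later value.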
Technological advances in the private sector have
turned data into an integral component of the pro-
duction process, leading to gains in productivity and
generating even more data that can be repurposed
for development. Specifically, the same approaches
that are transforming efficiency and innovation
in the private realm are being repurposed to tackle
development bottlenecks in poor countries, making
the development process more efficient, innovative,
agile, and flexible. Because of the nonrivalrous nature
of data, private companies also are able to reuse and
repurpose publicly collected data, which can generate
welfare-enhancing economies of scope.3
That said, the reuse of private intent data is not a
panacea and may pose unique challenges for policy
making. For example, data created by businesses to
track mobile phone users may miss the poorest popu-
lations who do not have these technologies. Similarly,
the data required to target customer experiences and
to achieve business gains are different from the socio-
demographic information on which policy makers
rely to design inclusive policy. Furthermore, many of
the algorithms used to process private intent data are
considered trade secrets and thus lack the transpar-
ency required for effective policy making. Transpar-
ency and oversight are also important considerations
when giving private companies access to sensitive
data such as those related to facial recognition and
surveillance (see chapter 6).
Despite these challenges, combining public intent
and private intent data can offer real-time insights
that not only are inclusive of the entire population
(or nearly so) but also are more precisely estimated
for specific population segments and localities. This
is especially important for the poorest people in
the poorest countries, which have the largest data
gaps. Too often, individuals on the lowest end of the
income distribution remain on the margins when
government, civil society, and the private sector lack
the data to effectively allocate and target resources
based on need. Leveraging all available data may
reveal insights for the poor and marginalized that
were previously unattainable.
This chapter begins by showcasing innovative
uses of public intent and private intent data for aiding
development policy. Examples include data repurpos-
ing and synergies to improve predictions of disease
spread, streamline service delivery, and allocate aid
in disaster recovery. The chapter then turns to an
exploration of the challenges that arise when private
intent data are repurposed or when public intent and
private intent data are combined. It concludes with a
framework within which policy makers and funders
could invest in the human capital, data partnerships,
and research needed to gain useful insights from
these new types and combinations of data.
Features of private intent data that can
overcome gaps in public intent data
Private intent data are an alluring candidate to over-
come public intent data gaps and offer new perspec-
tives on development problems. These types of data
are increasingly large in scale, “always on,” zoomed in,
and, at times, less biased.
Big data. Private intent data are typically labeled
“big data,”4
recognizing their wide reach and scope.
The growing rates of mobile phone and social media
usage enable information to be gathered from all
users on these platforms. Although this process may
underrepresent certain parts of the population in
countries with lower usage rates, ever-larger portions
of a population are being brought into the fold as the
rates of mobile phone ownership and internet con-
nectivity continue to increase, even in lower-income
countries. When private intent data are repurposed
toward a public goal, their volume and reach can not
only inform first-order policy goals of poverty reduc-
tion and service delivery, but also facilitate efforts to
detect and study rare events, such as fraud, corrup-
tion, or criminal activity, through techniques such as
anomaly detection.
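Anomaly detection covers many techniques. As one minimal sketch, assuming a hypothetical series of daily transaction counts for a single account, a modified z-score based on the median absolute deviation (with the commonly used 0.6745 scaling and 3.5 cutoff) flags values far from typical behavior. A flag only marks a case for review; it is not evidence of fraud.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation
    (MAD), which the outliers being searched for distort less than they
    distort the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to compare against; real systems need a fallback
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical daily counts of mobile money transfers from one account.
daily_transfers = [4, 5, 3, 6, 4, 5, 48, 4, 5]
print(flag_anomalies(daily_transfers))
```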
“Always on” data. Private intent data are always
on5
because the daily use of new technologies
entails constant data collection. Call detail records
(CDRs) and apps that log locations pinpointed by
satellite-based global positioning systems (GPS) offer
traces of where cellphone users travel throughout
the day. When a sudden and unexpected shock hits,
such as a natural disaster or a disease outbreak, such
data can provide precious real-time information on
human mobility and call density. The timeliness of
private intent data therefore contrasts with public
intent data, which are generally collected at inter-
vals of 1, 5, or 10 years and thus are not always very
timely. In Africa, for example, 14 of 59 countries did
not conduct any surveys from 2000 to 2010, impeding
the construction of nationally representative poverty
measures.6
This critical situation sparked the call for a
“data revolution” by the United Nations in 2014, push-
ing for an increase in data collection efforts in Africa
and elsewhere.7
Although the situation is improving,
with the average number of surveys per country per
year increasing from 0.5 in 1990 to 1.5 in 2010,8
the
lack of timeliness of public intent data has resulted in
huge knowledge gaps, which are particularly glaring
following major economic shocks such as COVID-19.
Meanwhile, private intent data are increasingly being
used to help fill these gaps.
“Zoomed in” data. Private intent data can zoom in
on individuals and locations. Private companies want
to know who is using their products or services and
in what ways they can optimize their offerings and
operations. Private intent data zoom into individuals
to collect key metrics such as transaction histories
to predict consumer behavior and bolster successful
products. Internet Protocol (IP) addresses, browsing
histories, and smartphone app logs add to a rich data-
set that companies collect on a single person over
time. Tracking whether app users enter a store or
whether IP addresses in a neighborhood are searching for products on their site enables companies to
better plan their store locations and stock their sup-
plies. These data are now being applied to the public
sphere, ranging from improving population maps9
to
helping decision-makers target and optimize critical
development resources. A key challenge to using indi-
vidual data patterns to allocate resources or establish
eligibility for products and services is data manipula-
tion: individuals may strategically change browsing
or other data usage behavior to appear more favorable
in ranking criteria used by data algorithms to make
allocation decisions. More research and policy delib-
erations are needed to design algorithms and decision
rules that account for such user manipulation.
Potentially less biased data. Private intent data potentially reveal less “biased” information about people
than surveys or polls because researchers observe
actual behavior instead of relying on responses.
Although it is possible that respondents misreport
answers during surveys, they have little incentive to
do so when searching the internet. For that reason,
the Google internet search engine has been dubbed a
“digital truth serum.”10
This finding may apply especially to opinions on sensitive topics such as racism. Few people will admit such opinions in surveys, but the opinions are revealed through internet searches, and they can influence political and other outcomes.11
However, the
algorithms used by search engines are considered
private trade secrets and are usually optimized for
private benefit, not public benefit. Without knowledge of how these algorithms work, those who rely on search engine results as their sole source of data may find that these results lead to biased and discriminatory policy predictions.
Overall, combining public intent and private
intent data is a powerful way to gain aggregate
population insights in real time, if enough attention
is given to addressing representativeness, discrimi-
nation, and transparency. Calibrating private intent
data with census and survey data is one way to esti-
mate population-level needs.
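Calibration can be done in several ways. One common approach is post-stratification: each observed user is reweighted so that the sample's group shares match census shares. The sketch below assumes hypothetical groups (urban and rural), hypothetical census shares, and a hypothetical indicator observed in phone data. It also shows a limitation raised earlier in this chapter: a group with no observed users cannot be reweighted at all.

```python
from collections import Counter


def poststratification_weights(sample_groups, census_shares):
    """Weight per group so that weighted group shares match census shares.

    sample_groups: one group label per observed user (hypothetical).
    census_shares: dict mapping group label -> population share (sums to 1).
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = {}
    for group, share in census_shares.items():
        if counts[group] == 0:
            # An omitted population cannot be recovered by reweighting.
            raise ValueError(f"no observed users in group {group!r}")
        weights[group] = share / (counts[group] / n)
    return weights


def weighted_mean(values, groups, weights):
    """Mean of `values` with each observation weighted by its group weight."""
    total = sum(weights[g] * v for v, g in zip(values, groups))
    return total / sum(weights[g] for g in groups)


# Phone users skew urban (80 percent) while the census says 40 percent urban.
groups = ["urban"] * 8 + ["rural"] * 2
has_access = [1] * 8 + [0] * 2  # hypothetical indicator from phone data
weights = poststratification_weights(groups, {"urban": 0.4, "rural": 0.6})
naive = sum(has_access) / len(has_access)
calibrated = weighted_mean(has_access, groups, weights)
print(naive, round(calibrated, 3))
```

Here the naive share reflects who owns phones; the calibrated share reflects the census population mix.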
The next section offers a broad range of innovative
examples of applications of private intent data to pub-
lic policy and instances in which public intent and
private intent data have been combined to promote
inclusive and timely development solutions.
New insights from repurposing
and combining data
The last decade has seen a surge in innovative research
that repurposes private intent data and combines it
with public intent data to tackle development issues.
In the spring and summer of 2020, when the COVID-19
outbreak reached global dimensions, more than 950
scientific and medical articles were published that
used private intent data to tackle the pandemic (box
4.2). Researchers’ ability to respond quickly to the
pandemic builds on a growing trend of research that
combines diverse data to tackle emerging issues.
Monitoring public health
Monitoring public health is a key area that could
benefit from repurposing and combining public
intent and private intent data. In many lower-income
countries, infectious diseases routinely pose large
health threats. Five of the top 10 causes of death in
low-income countries are communicable diseases,
including lower respiratory infections, diarrheal dis-
eases, HIV/AIDS, malaria, and tuberculosis.12
Viruses
have been responsible for more deaths than all armed
conflicts around the world over the last century.13
Especially in countries where data are limited, new
big private intent data sources can help inform public
policy interventions to reduce the mortality and mor-
bidity rates from infectious diseases. Identification
of hotspots can help disease control programs target
activities more effectively to those areas, reducing
infection rates both directly and indirectly in destina-
tion areas that are receiving infected travelers.14
As early as 2008, researchers began exploring
how mobile phone data could be used to measure
population mobility and then be applied to the study
of epidemics.15
A seminal study applied this research
at scale for all of Kenya using mobile phone data on
nearly 15 million individuals to identify sources of
imported malaria infections stemming from human
mobility.16
During the 2014 Ebola outbreak in West
Africa, researchers highlighted the potential benefits
of using mobile phone data in the design of public
policy.17
However, use of these analytics at the time of
the crisis remained limited.18
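Studies of this kind derive mobility measures from CDRs in different ways, and the Kenya study used its own, more elaborate method. As a simplified, hypothetical sketch, assuming each deidentified user's records have already been reduced to a time-ordered list of regions (for example, the region of the tower that handled each call), moves between regions can be tallied into origin-destination counts.

```python
from collections import Counter


def origin_destination_counts(user_tracks):
    """Tally moves between regions from time-ordered region sequences.

    user_tracks: dict mapping a pseudonymous user ID to the list of regions
    in which that user's records were observed, in time order (hypothetical).
    A 'move' here is simply a change of region between consecutive records.
    """
    moves = Counter()
    for regions in user_tracks.values():
        for origin, destination in zip(regions, regions[1:]):
            if origin != destination:
                moves[(origin, destination)] += 1
    return moves


# Hypothetical tracks for three users across regions A, B, and C.
tracks = {
    "u1": ["A", "A", "B", "B"],
    "u2": ["A", "B", "A"],
    "u3": ["C", "C"],
}
od = origin_destination_counts(tracks)
print(od.most_common())
```

Counts like these, combined with disease prevalence by region, are the kind of input from which sources of imported infections can be estimated; analysts typically also need to account for differences in phone ownership across regions.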
After onset of the COVID-19 pandemic, countries
began to deploy this type of research and to pair
mobile phone data with public intent data. Belgium
formed a Data Against COVID-19 task force to ana-
lyze deidentified mobile phone data. These data are
being used to monitor changes in human mobility
trends due to lockdown measures and to inform
decisions related to appropriate lockdown measures.
In the Republic of Korea, mobile phone data are being
used to aid contact tracing efforts to contain disease
spread. By combining mobile phone data with med-
ical facility records, credit card transaction logs, and
closed-circuit television recordings, the government
is identifying people at risk of exposure.19
Lower-
income countries such as Ghana and Mozambique
are beginning to use deidentified mobile phone data
to combat the pandemic, typically with the support
of international organizations that provide analytical
skills for processing the data.20
Other types of big data are also being enlisted to
create measures of mobility that can improve the
effectiveness of the pandemic response. Facebook
disease prevention maps are being used to study
COVID-19 and have been expanded to include colocation maps that measure commingling among people
living in different areas and trends in whether indi-
viduals are staying near their homes or continuing to
go to other locations.21
Google has produced a new set
of measures to track the response to policies aimed
at flattening the curve of the COVID-19 pandemic.22
Other sources of data for GPS locations have been
Box 4.2 Leveraging private intent data to tackle COVID-19
Between February and September 2020, more than
950 articles were published in scientific, medical, and
technical journals that repurposed cellphone, social
media, Google search, and other types of big private
intent data to better understand the spread of COVID-
19 and to offer policy and operational solutions (figure
B4.2.1). Despite the relatively large number of articles in
a short time span, coverage of lower-income countries
was low, especially those in Africa (map B4.2.1). Lack
of expertise, poor training, difficult access to data, and
limited research support are key areas that funders could
address to ensure innovative uses of data in and about
lower-income countries.
Figure B4.2.1 Use of repurposed data to study COVID-19: Published articles,
by type of private intent data used
Source: WDR 2021 team, based on data from CORD-19 (COVID-19 Open Research Dataset) Semantic Scholar team, Ai2 (Allen Institute for AI),
http://www.semanticscholar.org/cord19. Data at http://bit.do/WDR2021-Fig-B4_2_1.
Note: Figure shows the number of articles published in scientific, medical, and technical journals across time from February to September 2020.
The sum across all categories exceeds the total number of articles because some articles appear in more than one category.
[Figure: number of articles, February to September 2020, among articles containing “COVID” and each of the following terms: call detail records, cellphone, Facebook, Google Trends, big data, Twitter, social media.]
used by data analytics firms such as Baidu, Cuebiq,
and Unacast to assess the impacts of social distancing
measures for COVID-19.23
GPS data approximate locations and mobility more closely, and at a finer spatial resolution, than CDRs do, but their availability is limited by
smartphone penetration and usage. In many lower-
income countries, smartphone penetration is still low,
and even those individuals with smartphones may
only selectively turn on data or GPS because of high
costs and drain on battery life.
The potential of new data sources for support-
ing public health and epidemiology efforts goes far
beyond measures of mobility.24
Efforts are under
way to use data tools as early warning systems for
outbreaks and for understanding disease dynamics
and routes of transmission. For example, the com-
pany BlueDot provides infectious disease surveil-
lance services using advanced data analytics. It was
able to warn of the outbreak of COVID-19 before the
official announcement in early January 2020 by ana-
lyzing news reports, disease networks, and official
proclamations.25
A similar prediction was made for
the 2015–16 Zika outbreak that affected an estimated
1 million people, mainly in Latin America.26
A retrospective study estimates that combining online news sources, Google search queries, Twitter posts, and government disease reports could have detected local outbreaks two to three weeks earlier.27
Com-
bining public intent and private intent data sources
has also improved forecasts for Ebola in West Africa28
and dengue in Southeast Asia.29
Improved forecasting
of disease outbreaks and associated population move-
ments is essential for efficient response measures to
curb incidence rates.30
Another open and fertile source of synergy is data
collected by wearables and other biotech devices. For
example, the Kinsa HealthWeather app tracks fevers
around the United States via smart thermometers
and uses the aggregate data to create prediction mod-
els for the spread of disease. This type of application
is particularly relevant in crises such as COVID-19,
where timely reporting of case growth can help
Map B4.2.1 Uses of repurposed data to study COVID-19: Published articles,
by country
Source: WDR 2021 team, based on data from CORD-19 (COVID-19 Open Research Dataset) Semantic Scholar team, Ai2 (Allen Institute for AI),
http://www.semanticscholar.org/cord19. Data at http://bit.do/WDR2021-Map-B4_2_1.
Note: Map shows the number of articles published in scientific, medical, and technical journals across countries from February to September 2020.
Article counts are divided by the COVID-19 death incidence rate.
[Map legend: number of articles, shaded from 7.5 to 15.0; countries without data are marked separately.]
accurately map disease spread and enable timely and
appropriate public policy responses.
Targeting resource allocations and
responses during crises
Approximately 20–30 million people worldwide are
displaced every year because of natural disasters
such as storms, floods, droughts, and geological
events.31
Over the last decade, about 600,000 people
lost their lives to natural disasters, most of them in
low- and middle-income countries.32
Effective disas-
ter prevention, mitigation, response, and recovery
require timely, cost-effective data at fine spatial
scales. However, many countries lack adequate
early warning systems and advanced geological tools
to aid in this process—at times with devastating con-
sequences. During the 2018 earthquake and tsunami
in Central Sulawesi, Indonesia, the government could
have minimized the human cost had the country’s
warning system of buoys and seismographic sensors
not been defective.33
As climate change continues
to increase the frequency and damage of natural
disasters, lower-income countries will likely bear the
brunt of the economic and human impacts. Spotlight
4.1 highlights the importance of improved meteoro-
logical data for lower-income countries to confront
enhanced climate risks.
Recent data innovations have revealed that non-
traditional sources of private intent data such as
mobile phone usage, social media activity, online que-
ries, crowdsourcing platforms, and remote sensing
technologies can facilitate disaster management.34
These devices and activities are not a replacement for
advanced geological and meteorological equipment,
which can predict disasters and offer early warnings.
They can, however, help in government efforts to pre-
vent loss and provide relief when such events occur.
Various studies in both lower- and higher-income
countries have found that scraping social media plat-
forms for posts related to seismic activity produces an
in situ impact profile of seismic damage similar to the
ones produced by advanced geological instruments,
the traditional source of such data.35
Similarly, Tweets
have been analyzed for disaster-related keywords to
detect earthquakes in Australia and New Zealand.36
Deidentified CDR data are a good predictor of population movement for weather-related disasters such as floods. Social media content can also reveal needs: the textual content of Tweets
was used to understand how people were reacting to
the 2011 floods in Thailand. Messages were classified
by their content to help highlight precise needs in
affected communities.37
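Classifying messages by content, as in the Thailand flood example, is usually done with trained text classifiers. The simplest possible stand-in is keyword matching. The sketch below uses hypothetical need categories and keyword lists purely for illustration and does not reproduce the method of the cited study.

```python
import re

# Hypothetical keyword lists; real studies typically train classifiers on
# labeled posts rather than relying on fixed lists like these.
NEED_KEYWORDS = {
    "water": {"water", "drinking", "thirsty"},
    "shelter": {"shelter", "roof", "homeless", "evacuate"},
    "medical": {"injured", "doctor", "medicine", "hospital"},
}


def classify_post(text):
    """Return the sorted need categories whose keywords appear in `text`."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, kws in NEED_KEYWORDS.items() if words & kws)


# Hypothetical posts from an affected area.
posts = [
    "Need drinking water in our district, roads flooded",
    "Our roof collapsed, we must evacuate tonight",
    "Water is rising and my neighbor is injured",
    "Stay safe everyone",
]
for post in posts:
    print(classify_post(post), "|", post)
```

A post can fall into several categories or none, which is why the function returns a list rather than a single label.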
The geospatial nature of social media posts can
further help prioritize resource allocation in times of
dire need. Moreover, combining geographic and social
media analytics can enhance aid recovery efforts after
a disaster. In the aftermath of the 2014 earthquake
in Napa, California, researchers trained a machine
learning algorithm to extract disaster-related seman-
tics from Tweets and paired this information with
geolocations to identify spatial hotspots.38
From these
data, they were able to infer a disaster footprint and
assess damage. They also learned that this method
was transferrable to other social media platforms
and locations, with tweaks for cultural differences
in social media use. Similarly, researchers studying
Hurricane Irma, which hit Florida in 2017, found that
sentiment analysis39
on geolocated Tweets could be
used to guide resource allocation.40
Social media and
mobile records have also proven useful in tracking
recovery efforts. After Hurricane Sandy slammed into
the New York City area in 2012, researchers analyzed
Tweet topics and sentiment to see how those who
experienced the disaster were coping, compared with
those who did not experience it.
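Identifying spatial hotspots from geolocated posts can be illustrated with a deliberately simple sketch, not the machine learning method used in the Napa study. It assumes the posts have already been judged disaster related, counts them per square grid cell of a fixed size in degrees, and keeps cells above a hypothetical threshold.

```python
import math
from collections import Counter


def grid_hotspots(points, cell_size_deg, min_count):
    """Return grid cells holding at least `min_count` points, busiest first.

    points: (latitude, longitude) pairs, assumed to be posts already judged
    disaster related. Cells are squares of `cell_size_deg` degrees, which are
    not equal-area, so this is only a rough local approximation.
    """
    cells = Counter(
        (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        for lat, lon in points
    )
    return [(cell, n) for cell, n in cells.most_common() if n >= min_count]


# Hypothetical geolocated posts: three close together, one farther away.
points = [
    (38.297, -122.286),
    (38.299, -122.284),
    (38.296, -122.289),
    (38.355, -122.255),
]
print(grid_hotspots(points, cell_size_deg=0.01, min_count=2))
```

The cell size and threshold trade off detail against noise: smaller cells locate damage more precisely but hold fewer posts each.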
Finally, governments have long used satellite
imagery to assess damage in the aftermath of natural
disasters. However, this imagery usually lacks the spa-
tial resolution needed for a granular assessment. It is
typically considered public intent data, but a growing
number of private companies are launching their own
remote sensing technologies and data collection. The
start-up Cloud to Street uses private satellite data to
provide near real-time flood assessments to assist
disaster recovery and adaptive planning. In three days
in 2018, it was able to build a flood monitoring system
to help the Democratic Republic of Congo deploy
resources to 16,000 asylum seekers who had sought
refuge along the flood-prone banks of the Congo River.
Cloud to Street leveraged high-resolution private
intent satellite data with data about cropland, popula-
tion, and public assets (such as roads and infrastruc-
ture) to generate real-time impact estimates served
on an interactive web platform and with automated
alerts. As decision-makers transitioned from disaster
response to recovery, Cloud to Street transitioned to
using freely available satellite images—an effort that
enabled longer-term support with fewer resources.41
Mapping poverty and targeting service
delivery more precisely
Timely, reliable data on population characteristics
are vital for responsive social and economic policy
making. Mobile CDR and remote sensing data have
recently been used to predict poverty patterns on
a granular level and in a timely fashion, thereby
helping to better target government services. Using these data sources costs a fraction of what it costs to field censuses or household surveys. Similar data from
social media, online engagement, and satellite imag-
ery are reducing the constraints to collecting data on
the most vulnerable and hard-to-reach populations.
Moreover, the same algorithms that Google and
Facebook use for online consumer marketing can be
tweaked to direct resources to people living in pov-
erty. In the same way that these tech firms predict
the advertising that may interest consumers based
on their digital behavior, development actors can use
digital behavior to predict whether people are eco-
nomically vulnerable.42
Research relying on data from Rwanda reveals
that past histories of mobile phone use extracted from
CDRs are a reliable predictor of socioeconomic sta-
tus as validated against survey data.43
Moreover, the researchers find that, when the predicted characteristics of millions of mobile phone users are aggregated, they reproduce the distribution of wealth indicated by traditional data sources, both across the entire country and at the cluster level (approximately equivalent to a village in rural areas or a ward in urban areas). Such highly localized poverty maps can be used to effectively target policies, programs, and resources to the poorest. These
methods can also improve demographic targeting of
services by gender, age, and income level. For exam-
ple, CDR data have been used to identify the gender
of phone users,44
as well as to identify the ultrapoor.45
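The aggregation step can be sketched as follows, assuming hypothetical predicted wealth scores for individual subscribers, each assigned to a home cluster, and hypothetical survey-based wealth averages for the same clusters. The cited research used its own model and validation procedure; this only illustrates the logic of averaging individual predictions by cluster and checking them against survey data with a Pearson correlation.

```python
from collections import defaultdict
from statistics import mean


def cluster_means(predictions):
    """Average individual predicted wealth scores within each cluster.

    predictions: iterable of (cluster_id, predicted_score) pairs, one per
    subscriber (hypothetical inputs standing in for model output).
    """
    by_cluster = defaultdict(list)
    for cluster, score in predictions:
        by_cluster[cluster].append(score)
    return {cluster: mean(scores) for cluster, scores in by_cluster.items()}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical predicted scores for six subscribers in three clusters.
predictions = [("v1", -1.0), ("v1", -0.6), ("v2", 0.2),
               ("v2", 0.4), ("v3", 1.1), ("v3", 0.9)]
# Hypothetical survey-based average wealth index for the same clusters.
survey = {"v1": -0.9, "v2": 0.1, "v3": 1.2}

means = cluster_means(predictions)
clusters = sorted(survey)
r = pearson([means[c] for c in clusters], [survey[c] for c in clusters])
print(means, round(r, 2))
```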
Beyond the realm of CDRs, research in higher-
income countries has shown that online browsing
history and social media activity can also reliably pre-
dict household income. Social media footprints were
used in Spain to infer city-level behavioral measures
and predict socioeconomic output, specifically unem-
ployment.46
Similarly, data from Yelp reviews of retail
shops were used to measure changes in gentrification
and predict local housing prices.47
Equipped with real-time and localized insights and trends, policy makers
can better design policies targeting areas that have
been affected by short-term economic shocks or long-
term economic shifts.
Remote sensing technology is yet another novel
way to collect population characteristics, predict pov-
erty patterns, and improve public service delivery.48
Researchers have relied on publicly available data
from Africa to both calibrate and validate machine
learning models. The Demographic and Health Sur-
vey (DHS) sponsored by the United States Agency for
International Development (USAID) and the World
Bank’s Living Standards Measurement Study (LSMS)
surveys provide high-resolution data on household
wealth and consumption expenditures. When calibrated with these surveys, satellite imagery can predict poverty. At the survey cluster level, when used with survey data from Malawi, Nigeria, Rwanda, Tanzania, and Uganda, satellite imagery can explain 55–75 percent of the variation in wealth and consumption per capita.
Estimates of economic well-being using this approach
outperformed both similar estimates using satellite
readings of nighttime light in the same countries and
estimates using mobile phone data in Rwanda. Critically, this approach has been shown to work reasonably
well for predicting wealth and poverty in countries
when they are excluded from the sample used to train
the model, suggesting the approach is scalable across
other countries, at least in Africa.
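The out-of-country test mentioned above is a form of leave-one-group-out validation: train on all countries but one, predict the held-out country, and repeat. The sketch below shows only that evaluation loop. The image features and data are synthetic placeholders. Plain least squares stands in for the deep-learning image features used in the research.

```python
import numpy as np

rng = np.random.default_rng(7)
countries = ["Malawi", "Nigeria", "Rwanda", "Tanzania", "Uganda"]

# Synthetic survey clusters: 5 hypothetical image features per cluster and a
# consumption measure that depends on them the same way in every country.
true_coef = np.array([0.9, -0.4, 0.6, 0.0, 0.3])
data = {}
for country in countries:
    x = rng.normal(size=(150, true_coef.size))
    y = x @ true_coef + rng.normal(scale=0.5, size=150)
    data[country] = (x, y)

def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def leave_one_country_out(data):
    """Train on all countries except one, then score on the held-out country."""
    scores = {}
    for held_out in data:
        x_train = np.vstack([x for c, (x, _) in data.items() if c != held_out])
        y_train = np.concatenate([y for c, (_, y) in data.items() if c != held_out])
        a_train = np.column_stack([np.ones(len(y_train)), x_train])
        coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
        x_test, y_test = data[held_out]
        y_hat = np.column_stack([np.ones(len(y_test)), x_test]) @ coef
        scores[held_out] = r_squared(y_test, y_hat)
    return scores

scores = leave_one_country_out(data)
for country, score in scores.items():
    print(f"{country}: out-of-country R^2 = {score:.2f}")
```

If the relationship between imagery and well-being differed sharply across countries, these held-out scores would drop. Detecting that kind of drop is the purpose of this check.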
Ensuring road safety in transport and transit
Road transport is an important element of economic
development. Access to transport and mobility are
highly correlated with income and quality of life.
Even though lower-income countries have only half
of the world’s vehicles, they account for 90 percent
of road traffic fatalities. In 2011 the World Health
Organization (WHO) and the World Bank launched
a Decade of Action for Road Safety, and they have
provided funding and technical assistance to build
systems aimed at reducing injuries and deaths on the
road. Despite these efforts, little progress has been
reported in low- and middle-income countries, and
the number of fatalities remains high.49
A new and growing body of literature studies how
alternative sources of data can be used to make prog-
ress toward achieving national road safety outcomes.
In the public sector, for example, a study in Nigeria
provided road safety agents with a monitoring sys-
tem to investigate and record road safety events via
mobile phone.50
Access to this mobile phone–based
database helped disseminate information better and
enabled agents to respond faster to road accidents.
Such transit monitoring practices are becoming more
widespread, especially in the private sector. Commer-
cial banks in Kenya now require a tracking device
in minibuses before approving loans to bus service
owners. As a result, today most long-range buses in
the country are equipped with GPS.51
This technology
advancement serves the dual purpose of tracking
assets under lien for the bank’s private benefit and
promoting safer driving for public benefit.
Social media analytics have also been applied in
the private sector to understand the traffic safety
culture. A recent study in Washington State in the
United States mined Twitter data to understand the
patterns, behaviors, and attitudes related to road
safety.52
The study conducted sentiment analysis based
on traffic-related keywords to extract latent views on
topics such as safe driving measures, accidents, law
enforcement and patrolling, and accident-causing
behavior. It found that sentiment analysis using
social media posts can be used in developing policies
to improve traffic safety relevant to specific contexts.
This type of sentiment analysis could be applied in
lower-income countries as well, with substantial
benefits. Techniques are also being developed to fill
in gaps in data on the number and location of acci-
dents in lower-income countries. Recently, research-
ers developed an algorithm to identify and geolocate
crashes from Twitter feeds to substantially increase
the digital data available to prioritize road safety
policies. Spotlight 4.2 describes how car crash danger
zones were pinpointed in Nairobi, Kenya, by combin-
ing police reports and crowdsourced data.53
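The keyword filtering and sentiment scoring used in the Washington State study can be illustrated with a simple lexicon-based approach. Everything below is hypothetical: the keyword set, the topic groupings, and the positive and negative word lists are toy examples. Published studies typically use larger validated lexicons or trained classifiers.

```python
import re
from collections import defaultdict

# Hypothetical word lists, for illustration only.
TRAFFIC_KEYWORDS = {"accident", "crash", "speeding", "seatbelt", "patrol", "traffic"}
TOPICS = {
    "accidents": {"crash", "accident"},
    "enforcement": {"patrol", "police", "ticket"},
    "safe_driving": {"seatbelt", "speeding"},
}
POSITIVE = {"safe", "thanks", "good", "helpful"}
NEGATIVE = {"dangerous", "terrible", "reckless", "bad"}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def topic_sentiment(posts):
    """Average lexicon score per topic, over posts that mention traffic keywords."""
    scores = defaultdict(list)
    for post in posts:
        tokens = set(tokenize(post))
        if not tokens & TRAFFIC_KEYWORDS:
            continue  # not about road safety
        score = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
        for topic, words in TOPICS.items():
            if tokens & words:
                scores[topic].append(score)
    return {topic: sum(s) / len(s) for topic, s in scores.items()}

POSTS = [
    "Reckless speeding on I-5 again, dangerous!",
    "Thanks to the highway patrol for the safe detour after the crash",
    "Great weather for a picnic today",
]
print(topic_sentiment(POSTS))
```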
Research in this area has also focused on the
transit industry to answer broader questions in the
realm of private sector development.
For example, a study in Kenya found that providing
bus owners with data on their employees’ driving
behavior can improve firm operations.54
Specifically, the researchers placed GPS devices in Kenya’s inner-city public transport vehicles and tracked a variety of data that captured driving behavior, including acceleration, jerk, location, and timestamp, to measure the number of daily safety violations. The main contribution of this data innovation was to correct informational asymmetry: once minibus owners could track driving performance, drivers could receive more generous contracts
for better performance. In turn, drivers operated in a
manner less damaging to the vehicle, more frequently
met targets, and reduced underreporting of revenues.
Thus incentives between the company (principal) and
the drivers (agents) were better aligned. These types
of data can also provide governments with feedback
to use in redesigning their road infrastructure and
guide interventions to reduce accidents.
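The Kenya study measured daily safety violations from acceleration, jerk, location, and timestamps. Turning such telemetry into violation counts can be sketched as follows. This is a minimal sketch under stated assumptions. It starts from timestamped speed readings, estimates acceleration and jerk by finite differences, and uses hypothetical thresholds that are not taken from the study.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical thresholds, for illustration only.
MAX_SPEED = 22.2  # m/s, roughly 80 km/h
MAX_ACCEL = 3.0   # m/s^2, harsh acceleration or braking
MAX_JERK = 5.0    # m/s^3, abrupt change in acceleration

def daily_violations(samples):
    """Count threshold violations per calendar day.

    samples: list of (datetime, speed in m/s), sorted by time.
    Acceleration is estimated between consecutive samples; jerk is the change
    in acceleration between consecutive intervals (measured midpoint to midpoint).
    """
    violations = Counter()
    prev_accel, prev_mid = None, None
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = (t1 - t0).total_seconds()
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        accel = (v1 - v0) / dt
        mid = t0 + (t1 - t0) / 2
        day = t1.date()
        if v1 > MAX_SPEED:
            violations[day] += 1
        if abs(accel) > MAX_ACCEL:
            violations[day] += 1
        if prev_accel is not None:
            jerk = (accel - prev_accel) / (mid - prev_mid).total_seconds()
            if abs(jerk) > MAX_JERK:
                violations[day] += 1
        prev_accel, prev_mid = accel, mid
    return violations

start = datetime(2020, 1, 1, 8, 0, 0)
speeds = [10.0, 10.0, 14.0, 8.0, 23.0]  # one reading per second
samples = [(start + timedelta(seconds=i), v) for i, v in enumerate(speeds)]
print(daily_violations(samples))
```

Counts like these, aggregated per driver and per day, are the kind of performance signal that owners could tie to contract terms.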
Monitoring illegal fishing and deforestation
Recent advances in combining public intent and
private intent data are also improving the monitor-
ing of natural resource extraction. Box 4.3 features
one example: identifying illegal fishing in protected
ocean waters.
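Box 4.3 describes overlaying MPA boundaries on AIS vessel tracks and inferring fishing behavior from the time spent in specific areas. A minimal sketch of that overlay follows, under several assumptions:

- Longitude and latitude are treated as planar coordinates, which is acceptable only for small areas.
- The square MPA and the dwell-time threshold are hypothetical.
- Time inside a boundary is only a crude proxy for fishing. Real analyses typically rely on more sophisticated classification of vessel movements.

```python
from collections import defaultdict

def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a ray cast to the right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def hours_inside(pings, polygon):
    """pings: iterable of (vessel_id, t_seconds, lon, lat)."""
    tracks = defaultdict(list)
    for vessel, t, lon, lat in pings:
        tracks[vessel].append((t, lon, lat))
    hours = {}
    for vessel, track in tracks.items():
        track.sort()
        total = 0.0
        for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
            # Count an interval only if both consecutive fixes are inside.
            if point_in_polygon(x0, y0, polygon) and point_in_polygon(x1, y1, polygon):
                total += (t1 - t0) / 3600.0
        hours[vessel] = total
    return hours

MIN_HOURS = 1.0  # hypothetical threshold for "sustained presence"
mpa = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
pings = [
    ("A", 0, 5.0, 5.0), ("A", 3600, 6.0, 5.0), ("A", 7200, 7.0, 5.0),
    ("B", 0, 15.0, 5.0), ("B", 3600, 5.0, 5.0), ("B", 7200, 20.0, 5.0),
]
hours = hours_inside(pings, mpa)
flagged = sorted(v for v, h in hours.items() if h >= MIN_HOURS)
print(hours, flagged)
```

Vessel B crosses the protected area without lingering, so it is not flagged. Vessel A spends two consecutive hours inside.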
Efforts to monitor deforestation have also begun to
leverage public and private datasets. Combining data
in this way has enabled indigenous groups to patrol
their forest reserves and defend against encroach-
ment. With the aid of open-access or cheaper private
satellite imagery, cloud computing, community
observations, and publicly available property maps,
community-based forest monitoring has become
increasingly effective in identifying encroachment.55
In addition, through social media and platforms
such as Global Forest Watch, the international community can better help local groups hold governments accountable in achieving national sustainable
development commitments.56
Similar data are being
used by companies to ensure that their suppliers are
meeting sustainability standards for forest products.
A recent initiative, Radar Alerts for Detecting Defor-
estation (RADD), was launched by the world’s 10 larg-
est palm oil producers and buyers to monitor illegal
deforestation in palm oil plantations.57
By funding
development of a system to detect illegal deforesta-
tion using public radar imagery, property maps, and
private procurement data, this initiative may signal a
shift from civil society monitoring the private sector
to the private sector monitoring itself to ensure that
company commitments are met.
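A system like RADD combines three pieces: a change-detection step on radar imagery, a join from alerts to property maps, and a join from properties to procurement records. The sketch below illustrates that chain under loudly stated assumptions:

- Forest clearing is assumed to show up as a drop in radar backscatter relative to a baseline.
- The 3 dB threshold, the tiny raster, the parcel IDs, and the supplier links are hypothetical.
- A single fixed threshold is a simplification. Operational systems likely use more robust statistical change detection.

```python
import numpy as np

DROP_DB = 3.0  # hypothetical alert threshold in decibels

def deforestation_alerts(baseline_db, current_db, parcel_ids, parcel_supplier, drop_db=DROP_DB):
    """Flag pixels whose backscatter fell by more than drop_db, count alerts
    per property parcel, and list the suppliers linked to those parcels."""
    alert = (baseline_db - current_db) > drop_db
    ids, counts = np.unique(parcel_ids[alert], return_counts=True)
    per_parcel = dict(zip(ids.tolist(), counts.tolist()))
    suppliers = sorted({parcel_supplier[p] for p in per_parcel if p in parcel_supplier})
    return per_parcel, suppliers

# Hypothetical 2 x 3 raster of backscatter values (dB) and parcel IDs.
baseline = np.full((2, 3), -7.0)
current = np.array([[-7.0, -12.0, -7.0], [-11.0, -7.0, -8.0]])
parcels = np.array([[1, 1, 2], [2, 2, 2]])
supplier_of = {1: "Mill A", 2: "Mill B"}  # from procurement records
print(deforestation_alerts(baseline, current, parcels, supplier_of))
```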
Keeping governments accountable
Emerging data types are enabling civil society to
better monitor corruption. Social media discussion boards that draw on crowdsourced data and web scraping are emerging as ways in which local leaders can act against corrupt officials and receive real-time feedback on the impact of anticorruption policies.58
Data reported in newspapers have been used to target
corruption, thereby allowing civil society organiza-
tions to press for stricter governance measures. A
systematic, real-time view of corruption trends can
be gained from the news flow indices of corruption
(NIC) constructed by the International Monetary
Fund (IMF), drawing on country-specific searches of
more than 665 million news articles.59
Regressing real per capita gross domestic product (GDP) growth on the NIC revealed that increases in corruption levels as measured by the NIC indicators were associated with 3 percent lower economic growth over the next two years. Combined with election data, NIC data have helped identify countries that had peaks in corruption
before or after elections. These findings can prove
helpful to international responses to corruption.
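A news-flow index of this kind can be sketched as the share of a country's news articles in a given month that mention corruption-related terms. Normalizing by total article volume keeps the index from rising simply because more news is published. The keyword list, the article records, and this exact normalization are assumptions for illustration, not the IMF's specification.

```python
import re
from collections import Counter

# Hypothetical search terms; the IMF's actual term list is not given here.
CORRUPTION_TERMS = re.compile(
    r"\b(corruption|bribery|bribes?|embezzlement|kickbacks?)\b", re.IGNORECASE
)

def news_flow_index(articles):
    """articles: iterable of (country, month, text).

    Returns {(country, month): share of articles mentioning corruption terms}.
    """
    total, hits = Counter(), Counter()
    for country, month, text in articles:
        key = (country, month)
        total[key] += 1
        if CORRUPTION_TERMS.search(text):
            hits[key] += 1
    return {key: hits[key] / total[key] for key in total}

articles = [
    ("X", "2019-01", "Minister accused of bribery"),
    ("X", "2019-01", "Harvest up this year"),
    ("X", "2019-02", "Rain expected across the region"),
    ("Y", "2019-01", "Kickbacks probe widens"),
]
print(news_flow_index(articles))
```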
Private sector data are making it possible for international organizations and civil society actors to monitor policy and report on important events such as elections in real time. The Inter-American Development Bank, in partnership with governments in Latin America, has launched a website that uses crowdsourced civic feedback to monitor public works projects.60 Similarly, Civic Cops, a start-up in India, provides a suite of digital platforms to connect governments with civil society, notably offering a service that allows civic complaints and citizen service requests to be filed by mobile phone and directed to the corresponding public authorities. Civic engagement data have also been used to monitor elections in lower-income countries. For example, in Sierra Leone’s 2012 elections a group of citizen journalists traveled throughout the country and reported election activity through SMS text messages, which were then posted on a Tumblr website, pegged to a Google map, and disseminated on Twitter.61
Box 4.3 Preventing illegal fishing in protected maritime areas
Monitoring illegal fishing in Marine Protected Areas (MPAs) is difficult because of their size and distance from land. The boundaries of MPAs are curated and made open access by the United Nations Environment Programme (UNEP) and the International Union for Conservation of Nature (IUCN). Yet identifying boats in vast expanses of the ocean requires innovative uses of data that are not publicly available. Global Fishing Watch has data partnerships with the firm ORBCOMM to access raw data from commercial trawlers’ automatic identification systems (AIS), which provide the real-time geographic coordinates of each trawler to help avoid collisions and provide other traffic services. AIS data can be combined with optical and radar imagery from satellites to detect illegal fishing activity (figure B4.3.1). By overlaying MPA boundaries on AIS data used to identify boats and determining fishing behaviors from the time spent in specific areas, researchers found that 59 percent of MPAs in the European Union were commercially trawled. In areas that were heavily fished, the presence of sensitive species (such as sharks, rays, and skates) was 69 percent lower.a
Figure B4.3.1 Public intent and private intent data can be combined to detect illegal fishing activity
[Figure panels: AIS, a collision avoidance system that constantly transmits a vessel’s location at sea, can be used to identify and track fishing vessels; radar images (SAR) can identify large metal vessels and penetrate clouds; nighttime optical imagery (VIIRS) picks up fishing vessels that use lights to attract catch or conduct operations at night; daytime high-resolution optical imagery can be used to visually identify vessels.]
Source: Infographic taken on July 8, 2020, globalfishingwatch.org. © Global Fishing Watch. Used with permission of Global Fishing Watch; further permission required for reuse.
Note: Public intent data include satellite data. Private intent data include data from trawlers’ collision avoidance systems.
a. Dureuil et al. (2018).
Benchmarking policy priorities
Private intent data repurposed by international organi-
zations, civil society actors, and private companies are
being used to track policy goals and benchmark policy
priorities. These initiatives are invaluable because they
provide unique and comparable data across countries
that are not collected by national governments.
The data being harvested and disseminated to
promote financial inclusion have been widely recog-
nized by policy makers as critical to reducing poverty
and achieving inclusive economic growth. Partner-
ing with the polling firm Gallup Inc., the World Bank
launched the Global Findex database in 2011, the
world’s most comprehensive database on how adults
save, borrow, make payments, and manage risks
(map 4.1). This dataset was created by adding a mod-
ule to the Gallup World Poll, which offers a standing
global survey that produces comparable data across
countries and across time. Researchers, private com-
panies, and international organizations use these
data to understand the lives of people everywhere.62
The Global Findex database has become a mainstay
of global efforts to promote financial inclusion.
In addition to being widely cited by scholars and
development practitioners, Global Findex data have
been used to track progress toward the World Bank’s
goal of universal financial access by 2020 and the
United Nations’ Sustainable Development Goals
(SDG Target 8.10).
Map 4.1 Private intent data can provide unique and comparable information not collected by national governments, such as the number of adults who lack a formal financial account
Globally, 1.7 billion adults lacked a formal financial account in 2017
[Map legend: number of adults without an account, with markers at 1 million, 10 million, 100 million, and 200 million.]
Source: World Bank, Global Findex (Global Financial Inclusion Database), https://globalfindex.worldbank.org/. Data at http://bit.do/WDR2021-Map-4_1.
Note: Data are not displayed for economies in which the share of adults without an account is 5 percent or less.
Data synergies can also help in critical policy areas such as food security, both in normal times and during crises such as the COVID-19 pandemic. Combining geospatial data with farmer output and market pricing data can improve the logistics and management of critical food systems. Meanwhile, international organizations have partnered with companies to create public intent surveys to track progress
toward the SDGs and inclusive development. For
example, in 2014 the United Nations Food and Agri-
culture Organization (FAO) began to add questions
to the Gallup World Poll to collect data for its Food
Insecurity Experience Scale (addressing SDG 2). In
2015 the International Labour Organization (ILO) and
Walk Free Foundation added questions that measure
the incidence of modern slavery (addressing SDG Tar-
get 8.7). Through a partnership with Facebook, in 2018
the World Bank and the Organisation for Economic
Co-Operation and Development (OECD) launched
the Future of Business biannual survey.63
The survey
targets active micro, small, and medium enterprises
(MSMEs) that host a Facebook business page. Using
these data, researchers have been able to study the
gender pay gap across 97 countries.64
Apart from surveys, companies are beginning to
repurpose their own data for the public good. During
the COVID-19 pandemic, Google began releasing
updated community mobility reports for 135 coun-
tries.65
These reports rely on users’ location data to
show daily changes in mobility patterns at the country
or state/provincial level, such as fewer trips to transit
stations, retail stores, parks, grocery stores, pharma-
cies, workplaces, or residential addresses. These data
give public health officials and the general public a
way to benchmark a region’s response to COVID-19
relative to other regions and over time. Because the
data are collected systematically across countries,
they can also be used to compare behavioral responses
across the world. Another example of a private com-
pany repurposing its own data for public benefit is the
internet speed test company Ookla, which provides a
global index for internet speeds that ranks countries
by their mobile and fixed broadband speeds.66
These data
can be used by governments and funders to prioritize
investments in broadband coverage.
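Benchmarking a region's response over time, as the community mobility reports allow, depends on expressing each day's activity relative to a baseline. One way to do this is sketched below. It is similar in spirit to the day-of-week median baseline that Google describes for its reports. The baseline window, the visit counts, and the function names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import median

def weekday_baseline(visits, start, end):
    """Median visits for each day of the week within [start, end]."""
    by_weekday = defaultdict(list)
    for day, count in visits.items():
        if start <= day <= end:
            by_weekday[day.weekday()].append(count)
    return {wd: median(counts) for wd, counts in by_weekday.items()}

def percent_change(visits, baseline):
    """Percent change of each day's visits from the same weekday's baseline."""
    return {
        day: 100.0 * (count - baseline[day.weekday()]) / baseline[day.weekday()]
        for day, count in visits.items()
        if day.weekday() in baseline
    }

# Hypothetical daily visit counts to retail locations in one region:
# busier weekends in the baseline period, then a sharp drop in April.
visits = {}
base_start = date(2020, 1, 6)  # a Monday
for i in range(14):
    day = base_start + timedelta(days=i)
    visits[day] = 150 if day.weekday() >= 5 else 100
visits[date(2020, 4, 6)] = 40   # a Monday
visits[date(2020, 4, 11)] = 75  # a Saturday

baseline = weekday_baseline(visits, base_start, base_start + timedelta(days=13))
changes = percent_change(visits, baseline)
print(changes[date(2020, 4, 6)], changes[date(2020, 4, 11)])
```

A separate baseline for each weekday keeps normal weekend peaks from being mistaken for a change in behavior.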
Researchers are also combining global public
intent and private intent datasets to prioritize fund-
ing streams for donors. One example is in the digital
agricultural space, where farmers can access exten-
sion services on their cellphones. Digital agricultural
interventions offer a solution to the dearth of agricul-
tural extension agents in many lower-income coun-
tries, where the ratio of farmers to extension agents
often exceeds 1,000 to 1.67
Digital services can provide
farmers with expert scientific advice based on their
local field, market, and climatic conditions. Yet most
small-scale farmers live in areas with lower 3G and 4G coverage than areas with relatively high shares of large-scale farms (map 4.2 and figure 4.1).68
Map 4.2 Agricultural extension services can be tailored to the slower, older broadband internet accessible to many small-scale farmers
[Map legend: internet service among small-scale farmers; mobile network coverage on farms (4G, 3G, 2G, none).]
Source: Mehrabi et al. 2020. Data at http://bit.do/WDR2021-Map-4_2.
This finding suggests that the wave of digital agricultural
services should focus on 2G solutions (such as voice
and text messaging) to ensure that small-scale farm-
ers are reached. Combining private intent broadband
coverage data from the data aggregation company
Mosaik (now part of Ookla) with public intent farm
size data yields localized estimates of broadband
usage at 10 square kilometer resolution. This type of
analysis can be used in making decisions about the
deployment of infrastructure to support the faster
broadband required for digital services that depend
on smartphones.
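Combining the two kinds of data amounts to a spatial join followed by a grouped summary, as in figure 4.1. The sketch below assumes each farm has already been assigned to a coverage grid cell. The cell identifiers, the coverage labels, and the choice of 3G as the cutoff are hypothetical. Only the farm-size classes follow the figure.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower edges of the farm-size classes used in figure 4.1, in hectares.
SIZE_EDGES = [0, 1, 2, 5, 10, 50, 100, 200]
TECH_RANK = {"none": 0, "2G": 1, "3G": 2, "4G": 3}

def size_class(hectares):
    """Label a farm size with its class, e.g. 0.5 -> '0-1', 300 -> '200+'."""
    i = bisect_right(SIZE_EDGES, hectares) - 1
    if i == len(SIZE_EDGES) - 1:
        return f"{SIZE_EDGES[-1]}+"
    return f"{SIZE_EDGES[i]}-{SIZE_EDGES[i + 1]}"

def coverage_by_farm_size(farms, cell_coverage, min_tech="3G"):
    """farms: iterable of (hectares, cell_id); cell_coverage: cell_id -> best
    available technology. Returns the percent of farms per size class
    covered at min_tech or better."""
    tallies = defaultdict(lambda: [0, 0])  # class -> [covered, total]
    for hectares, cell in farms:
        tally = tallies[size_class(hectares)]
        tally[1] += 1
        if TECH_RANK[cell_coverage.get(cell, "none")] >= TECH_RANK[min_tech]:
            tally[0] += 1
    return {cls: 100.0 * covered / total for cls, (covered, total) in tallies.items()}

farms = [(0.5, "c1"), (0.8, "c2"), (1.5, "c1"), (300.0, "c3")]
coverage = {"c1": "2G", "c2": "4G", "c3": "4G"}
print(coverage_by_farm_size(farms, coverage))
```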
Limitations in using private intent data for development
Despite the enormous potential offered by private
intent data through repurposing and synergies, sev-
eral important limitations and challenges affect their
use for development projects. These issues should be
taken into account in the design of future research
and public policy.
Data coverage and representativeness
A key limitation of most private intent data is their
lack of representativeness. Private intent data are
often a by-product of the use of digital technologies
such as mobile phones or the internet. Having access
to these technologies typically requires infrastruc-
ture resources such as electricity or broadband that
are distributed unequally in lower-income coun-
tries. In addition, because smartphone ownership is
skewed toward those who can afford the phones, the
data collected through these technologies primarily
highlight the characteristics of a relatively wealth-
ier share of the population. A 2012 study combining
CDRs and surveys found that mobile phone owners
in Rwanda were wealthier, better educated, and
predominantly male.69
Similar conclusions emerged
from an analysis of the population of mobile phone
owners in Kenya.70
The lack of representativeness is
even more pronounced in social media data, which
typically require that users be literate in addition
to having internet access. Moreover, because of the access charges associated with internet use, mainly wealthier people can afford to use the internet on their mobile devices. Estimates from Ghana, Kenya, Nigeria, and Senegal suggest that less than one-third of the population uses the internet on a mobile phone, and less than 15 percent in Mozambique, Rwanda, Tanzania, and Uganda.71
To overcome the lack of representativeness
of private intent data, development practitioners
often rely on statistical methods to combine them
with public intent data.
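One common way to combine the two kinds of data is to aggregate both to a shared geographic unit, such as a village, before relating them. Box 4.4 describes this approach further. The sketch below shows only the aggregation and join steps. The enumeration-area codes, asset-index values, and phone-derived feature are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def average_by_unit(records):
    """records: iterable of (unit_id, value). Returns unit_id -> mean value."""
    groups = defaultdict(list)
    for unit, value in records:
        groups[unit].append(value)
    return {unit: mean(values) for unit, values in groups.items()}

def join_on_unit(survey, phone):
    """Keep only units observed in both sources: (unit, survey_mean, phone_mean)."""
    survey_means, phone_means = average_by_unit(survey), average_by_unit(phone)
    shared = sorted(survey_means.keys() & phone_means.keys())
    return [(u, survey_means[u], phone_means[u]) for u in shared]

# Household asset index from a survey, and a per-subscriber phone feature
# (e.g., monthly airtime spending), each tagged with an enumeration area.
survey = [("EA1", 0.25), ("EA1", 0.75), ("EA2", -0.5), ("EA3", 1.0)]
phone = [("EA1", 12.0), ("EA1", 18.0), ("EA1", 15.0), ("EA2", 6.0), ("EA4", 30.0)]
for row in join_on_unit(survey, phone):
    print(row)
```

An area-level model would then be fitted on the joined rows. Units present in only one source (EA3 and EA4 here) drop out, which is one way gaps in coverage become visible.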
One important source of alternative data is satel-
lite imagery, which can be either public intent or pri-
vate intent, depending on the application. Images col-
lected by satellites have the advantage of being fully
representative of the population, and they are well
suited to picking up measures of building density
that are highly correlated with population density
and, by extension, economic well-being. Satellite data,
however, come with an important limitation—they
are typically available only for aggregated geographic
units such as grids or villages. So-called “bottom-up”
statistical techniques combine survey data with
remote sensing indicators to permit greater geospa-
tial precision (box 4.4).
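The bottom-up approach in box 4.4 can be sketched as a regression fitted on the areas a survey listed, then applied to every area. The indicators follow the box: number of buildings, built-up area, and nighttime lights. Everything else is an assumption for illustration, including the values, the log-linear functional form, and all names.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic remote-sensing indicators for 1,000 areas (villages).
n_areas = 1000
buildings = rng.integers(10, 500, n_areas).astype(float)
built_up_km2 = rng.uniform(0.05, 2.0, n_areas)
night_lights = rng.uniform(0.0, 5.0, n_areas)

# "True" population, used here only to simulate survey listings.
true_pop = np.exp(1.2 + 0.9 * np.log(buildings + 1) + 0.2 * np.log(built_up_km2)
                  + 0.05 * night_lights + rng.normal(0, 0.15, n_areas))

def design(b, area, lights):
    return np.column_stack([np.ones(len(b)), np.log(b + 1), np.log(area), lights])

# Step 1: a survey lists households in a sample of 80 areas only.
sampled = rng.choice(n_areas, size=80, replace=False)
x_s = design(buildings[sampled], built_up_km2[sampled], night_lights[sampled])
coef, *_ = np.linalg.lstsq(x_s, np.log(true_pop[sampled]), rcond=None)

# Step 2: predict population for every area, sampled or not. Exponentiating a
# log-scale prediction gives a median-type estimate; an adjustment (e.g., a
# smearing correction) would be needed for unbiased totals.
pred_pop = np.exp(design(buildings, built_up_km2, night_lights) @ coef)

national_estimate = pred_pop.sum()
relative_error = abs(national_estimate - true_pop.sum()) / true_pop.sum()
print(f"estimated total {national_estimate:,.0f}, relative error {relative_error:.1%}")
```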
Data profiling and discrimination
Because of the complexity and unstructured nature
of private intent data, data scientists are increas-
ingly relying on modern machine learning methods
and algorithms to analyze them.72
These algorithms
can contain millions of parameters, which can be
extremely costly and time-consuming to calibrate.73
Machine learning experts thus typically rely on
algorithms that are “pre-trained” using very large
quantities of private intent data to make them easy
to use for a variety of tasks. Although these algo-
rithms are extremely useful for extracting insights
from complex datasets, researchers in recent years
have realized that biases in the data used to calibrate these
algorithms could contribute to discrimination,74 with
adverse consequences for people’s welfare.
Figure 4.1 Gaps in network coverage differ across farm sizes, affecting agricultural extension services
[Chart: network coverage (%, 0–100) by farm size (hectares: 0–1, 1–2, 2–5, 5–10, 10–50, 50–100, 100–200, over 200), shown separately for 2G, 3G, and 4G internet service.]
Source: Mehrabi et al. 2020. Data at http://bit.do/WDR2021-Fig-4_1.
Other research found that a machine learning tool created
to predict the future criminal behavior of defendants
in the United States embedded racial discrimination:
black defendants were twice as likely as their white
counterparts to be falsely classified as future crimi-
nals.75
Similarly, image-sharing platforms such as Flickr,
which have been the source of training data for various
computer vision algorithms, have been shown to
overrepresent light-skinned men between the ages
of 18 and 40, leading to poorer performance by these
algorithms when making predictions of underrepre-
sented categories such as women or minorities.76
Similar issues arise when machine learning algo-
rithms are pre-trained using text containing racist
and sexist stereotypes. Text generation algorithms trained on massive online text databases scraped from the web, such as the GPT-2 model created by OpenAI, have been found to generate racist and anti-Semitic text in response to specific inputs.77
When trained on Google News, word-embedding
algorithms aimed at measuring the similarity between
words tend to propagate the sexist biases reflected in
the text, highlighting similarities between “man” and
“computer programmer,” whereas “woman” appears
to be associated with “homemaker.”78
Arguably, such
discrimination can have larger consequences in
lower-income countries, which typically lack safety
nets and social protection mechanisms.
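The mechanism behind such findings can be shown with toy vectors. Analogies are typically solved by vector arithmetic and cosine similarity, so any gendered component absorbed from training text carries through to the answer. The three-dimensional vectors below are invented for illustration. Real embeddings have hundreds of dimensions learned from data; the word2vec vectors trained on Google News, for example, have 300.

```python
import numpy as np

# Toy 3-dimensional "embeddings" (hypothetical): the dimensions loosely stand
# for gender, technical, and domestic associations picked up from biased text.
VECTORS = {
    "man": np.array([1.0, 0.0, 0.0]),
    "woman": np.array([-1.0, 0.0, 0.0]),
    "programmer": np.array([0.6, 1.0, 0.0]),
    "homemaker": np.array([-0.6, 0.0, 1.0]),
    "scientist": np.array([0.5, 0.9, 0.1]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' by the vector arithmetic b - a + c."""
    target = VECTORS[b] - VECTORS[a] + VECTORS[c]
    candidates = [w for w in VECTORS if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))

def gender_score(word):
    """Cosine with the man-woman direction: > 0 leans 'male', < 0 'female'."""
    return cosine(VECTORS[word], VECTORS["man"] - VECTORS["woman"])

print(analogy("man", "programmer", "woman"))  # homemaker
print({w: round(gender_score(w), 2) for w in ["programmer", "homemaker", "scientist"]})
```

Because "programmer" carries a male-leaning component in these toy vectors, subtracting "man" and adding "woman" lands closest to "homemaker".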
Box 4.4 Using statistical methods and private intent data to improve representativeness and geospatial precision
Combined data sources, by improving the representativeness and precision of survey data, enable indicators to be reported at finer spatial scales. One statistical approach to improving representativeness, typically used when combining survey data with mobile phone data or satellite imagery, is to average the data from different sources using a common geographic unit of analysis. For example, a welfare measure such as an asset index could be averaged across all households in a village (enumeration area). The results are then related to satellite imagery or mobile phone data. This procedure works well when extrapolating from imagery to predict average consumption for countries or large areas not covered by a survey. This method can also be used to generate local estimates of welfare within a country, provided that an appropriate statistical method is used to directly incorporate information from the sample into the estimation procedure to obtain more precise estimates.
Facebook engineers have used deep-learning algorithms to detect buildings in satellite imagery, allowing them to downscale population estimates from the census to a much finer spatial resolution. However, these methods have significant drawbacks. Predictions based on models specified at aggregate levels will generally not deliver precise estimates unless they are combined in an appropriate way with survey-based estimates. Precision is an important consideration because most national statistical offices will not publish imprecise estimates due to quality concerns. Furthermore, geographic downscaling relies on a few key assumptions. Facebook assumes the population is distributed in proportion to the “built-up area,” which leads to inconsistencies between the estimates and the census. For example, because a smaller portion of buildings in urban areas is residential, relying on built-up area to distribute population could exaggerate population counts in urban areas compared with rural ones.
An alternative method for estimating the population of small areas is to use “bottom-up” methods that draw on data from survey listing exercises rather than “top-down” disaggregation of census data. “Bottom-up” techniques offer the important advantage of being able to produce updated population estimates without a census at a fraction of the cost. They use survey data to calibrate a model that relates population in the areas sampled by the survey to remote sensing indicators. Geospatial indicators that predict population density include the geographic size of the village, the number of buildings, the extent of built-up area, and the presence of nighttime lights. The model can then be used to generate population estimates nationwide. Similar methods can be used to generate more precise estimates of nonmonetary poverty.a They likely could be applied to a variety of socioeconomic indicators, including monetary poverty, labor market outcomes, health outcomes, and educational attainment.
a. Masaki et al. (2020).
Data transparency and manipulation
Both the data-generating process and the algorithms used to process private intent data suffer from a lack of transparency. The algorithms used by search engines are not public, and they are constantly optimized to improve users’ experience. This process can lead to
inaccurate predictions of policy outcomes, such as the
notorious Google Flu Trends index. In 2009 a team
of scientists at Google published a paper describing
an innovative method to predict the number of flu
cases in the United States using the volume of search
terms related to the flu on Google.79
Their Google
Flu Trends index was initially able to predict official
numbers ahead of the US Centers for Disease Control
and Prevention (CDC), until it made headlines in 2013
for incorrectly predicting twice the number of actual
flu cases. Scientists investigating what went wrong
realized that many search terms used as predictors
were associated with the onset of winter rather than
with the onset of the flu.80
This “overfitting” is a major concern when private intent datasets containing high-dimensional data (that is, data with a high number of features or independent variables) are used to nowcast policy outcomes that are infrequently observed. New
generations of forecasting models based on private
intent data should aim to rely on information coming
from multiple private data sources to avoid being too
dependent on the idiosyncrasies of a single source.
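The overfitting problem can be reproduced with synthetic data. The setup has many candidate predictors (think of search terms) and few observed periods of the outcome. This is a sketch, not a reconstruction of Google Flu Trends. The 30 training observations, the 200 candidate terms, and the single genuinely related term are all assumptions, and observations are treated as independent.

```python
import numpy as np

def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def fit_and_predict(x_train, y_train, x_test):
    """OLS with intercept; lstsq returns the minimum-norm solution when there
    are more columns than rows, so the training data are fitted exactly."""
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    a_test = np.column_stack([np.ones(len(x_test)), x_test])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return a_train @ coef, a_test @ coef

rng = np.random.default_rng(0)
n_train, n_test, n_terms = 30, 1000, 200  # 30 observed periods, 200 search terms
x = rng.normal(size=(n_train + n_test, n_terms))
# Only term 0 is truly related to the outcome; the other 199 are noise.
y = 2.0 * x[:, 0] + rng.normal(size=n_train + n_test)
x_tr, x_te, y_tr, y_te = x[:n_train], x[n_train:], y[:n_train], y[n_train:]

fit_all, pred_all = fit_and_predict(x_tr, y_tr, x_te)                # every term
fit_one, pred_one = fit_and_predict(x_tr[:, :1], y_tr, x_te[:, :1])  # term 0 only

print(f"all terms: in-sample R^2 {r_squared(y_tr, fit_all):.2f}, "
      f"out-of-sample R^2 {r_squared(y_te, pred_all):.2f}")
print(f"one term:  in-sample R^2 {r_squared(y_tr, fit_one):.2f}, "
      f"out-of-sample R^2 {r_squared(y_te, pred_one):.2f}")
```

The model using every term fits the training periods perfectly but predicts new periods poorly. The one-term model fits less tightly in sample yet generalizes far better.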
Even when accurate, predictive models are often
so opaque that their predictions cannot be easily
communicated to policy makers. Because machine
learning is increasingly used to shape development
policies, more research is needed to make complex
algorithms transparent and interpretable, thereby
increasing their legitimacy and ensuring they do not
contribute to unequal outcomes. More research is
also needed to understand trade-offs between inter-
pretability and predictive performance. For example,
researchers have estimated models that predict
poverty from satellite imagery in both Sri
Lanka and Uganda by focusing on objects in images
that correlate with standards of living such as roads,
buildings, and cars.81
In each country, the interpre-
table model performed as well as commonly used
black-box computer vision algorithms, indicating
that model interpretability does not necessarily come
at the cost of performance. The performance of image
recognition algorithms may be constrained, however,
because they are initially trained to detect a wide vari-
ety of objects using millions of images, which may
not isolate the most important portions of the images
for the specific purpose of predicting poverty.
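An interpretable model of the kind described above can be as simple as a linear regression on counts of detected objects, whose coefficients can be read directly. In this sketch the object counts, the log-consumption outcome, and the coefficient values are synthetic. The object-detection step that would produce the counts from imagery is assumed to have already run.

```python
import numpy as np

rng = np.random.default_rng(11)
n_villages = 300
objects = ["roads_km", "buildings", "cars"]

# Hypothetical per-village counts produced by an object detector.
counts = np.column_stack([
    rng.uniform(0, 20, n_villages),  # km of road
    rng.poisson(150, n_villages),    # buildings
    rng.poisson(10, n_villages),     # cars
]).astype(float)

# Simulated log consumption with known effects (used only to check the fit).
true_effects = np.array([0.03, 0.002, 0.05])
log_consumption = 6.0 + counts @ true_effects + rng.normal(0, 0.1, n_villages)

x = np.column_stack([np.ones(n_villages), counts])
coef, *_ = np.linalg.lstsq(x, log_consumption, rcond=None)

# Each coefficient is the change in log consumption per additional unit of
# that object, holding the others fixed, which a policy maker can inspect.
for name, beta in zip(objects, coef[1:]):
    print(f"{name:>10}: {beta:+.4f}")
```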
An additional challenge of relying on algorithms
to design policy is that they can be manipulated.
People can change their behavior in response to
algorithmic decision-making to trick the system and
maximize their interests. For example, the nonprofit
GiveDirectly facilitates direct cash transfers to poor
households. As a proxy for poor living conditions, sat-
ellite imagery was initially used to target households
with thatched roofs. When GiveDirectly’s methods
became common knowledge, some families pre-
tended to live in a thatched structure near their home
to qualify for the aid.82
This concern about manipu-
lation increasingly motivates the design of machine
learning algorithms that assign more weight to
personal characteristics less likely to be subject to
manipulation.83
Investments in data innovations: Building a culture of data
Effectively leveraging new types of data requires
investing in human capital, data sharing, and research
in lower-income countries. This section describes
areas in which governments, donors, and advocates
of corporate social responsibility can help promote
innovative uses of data for public benefit in lower-
income countries, thereby helping to build a culture
for the use of data and evidence.
Investing in people
Building the skills of analysts and decision-makers. Leveraging the comparative advantages of public intent and
private intent data requires a long-term approach to
enhancing domestic human capital in lower-income
countries. Investments in human capital should focus
on decision-makers and analysts. Strengthening the
data and statistical literacy of decision-makers can
help them understand the potential utility and limita-
tions of these new data sources. This understanding
is key for them to champion a data innovation agenda
and advocate for the required human, technological,
and financial resources. Analysts, on the other hand,
need unique skill sets to leverage private intent data,
drawing on many disciplines, including statistics,
economics, computer science, geographic informa-
tion systems (GIS), and the multidisciplinary field of
data science. Although many of these skills are akin to
those needed to bolster the capacities of national sta-
tistical offices (NSOs), teams with exposure to private
sector data and data systems will be able to work more
efficiently across data types and foster collaboration.
At the country level, it is critical to build analysts’
skills to integrate public intent and private intent data
for public policy design and evaluation. These skills
include data engineering to manage, process, and link
public intent and private intent data; analyzing inte-
grated datasets using traditional statistical and econo-
metric methods and the latest advances in machine
learning; and visualizing the emerging insights. These
skills must be augmented with acute awareness of the
ethics and data protection dimensions of public intent
and private intent data sources. Several competency
frameworks developed for big data analytics are use-
ful for a more granular understanding of the skill sets
required for data acquisition, processing, analysis,
visualization, and reporting.84
These broad directions
for capacity building focus on catalyzing the use of
new data sources, in contrast to the recommendations
presented in chapter 2, which focus on strengthening
data production within the public sector.
Enhancing tertiary education. The long-term process
of acquiring these skills begins by enhancing tertiary
education. Because of the wide array of competencies
that data scientists are expected to possess, univer-
sity and graduate degree programs may have to be
altered, particularly in lower-income countries. Stu-
dents need the foundational statistical skills central
to understanding and using public intent data, as well
as the frontier skills in artificial intelligence (AI) and
machine learning at the heart of leveraging the value
from the integration of public intent and private
intent data. The curricula of degree programs—in
the fields of statistics, economics, computer science,
and GIS—could be revised to align formal education
with the practical demands of jobs in data analytics.
In addition, new degree, graduate, and certificate pro-
grams with a data science theme could be established.
Promoting partnerships with universities and private
companies in higher-income countries. Such partnerships
can be instrumental in achieving these education
goals and enhancing training in contemporary data
topics such as machine learning and AI. These types
of initiatives can help ensure that research in lower-income countries that leverages private intent data is tailored to local contexts and involves more local researchers. This would
be a welcome trend because this research field has
been predominantly led by principal investigators
who are not nationals of these countries.
Proficiency with AI is one of the most coveted
data skill sets. Many AI systems are built by feeding computers large
amounts of data to train them to identify patterns
and make predictions. For example, seismic activity
data are crunched by computers to learn how to pre-
dict earthquakes,85
and satellite images of agricultural
areas are processed to estimate crop yields.86
Accord-
ing to an analysis of self-reported job skills on the
professional network platform LinkedIn, the United
States leads in AI, followed by China (see figure 4.2).87
Low- and middle-income countries need to catch up
to these emerging trends in skills. In South Africa, the
minister of communications and digital technologies
argues there is no shortage of talent in the Africa
region, but rather a lack of visionary policy makers to
drive digitization and enable key infrastructure such
as data centers and cloud computing.88
Technical training can sometimes be obtained
cheaply or at no cost. Some digital companies provide
free online training, and their certifications often
attract job seekers.89
Cisco’s Networking Academy has trained more than 10 million people in low- and middle-income countries, often in partnership with local academic institutions offering free or low-cost tuition. It also offers free online courses.90 Although basic tech knowledge is needed to participate in these options, these offerings suggest that relevant training can be obtained in many developing countries at low cost, provided good broadband internet connectivity is available. Popular cloud data management and analytical applications also feature graphical user interfaces, making it easier for those without advanced coding skills to use them.

Figure 4.2 Artificial intelligence specialists gravitate to the US market, no matter where they are educated
a. Location of employment (%): United States, 59; China, 11; Europe, 10; Canada, 6; United Kingdom, 4; Others, 10
b. Location of study (%): China, 29; United States, 20; Europe, 18; India, 8; Canada, 5; United Kingdom, 4; Iran, Islamic Rep., 3; Israel, 3; Others, 10
Source: MacroPolo, “The Global AI Talent Tracker,” https://macropolo.org/digital-projects/the-global-ai-talent-tracker/. Adapted with permission of MacroPolo/Paulson Institute; further permission required for reuse. Data at http://bit.do/WDR2021-Fig-4_2.
Note: Country affiliations are based in panel a on the headquarters of institutions in which researchers currently work and in panel b on the country in which researchers received their undergraduate degree.
Increasing training, mentorship, and on-the-job train-
ing. Improvements along the formal education sup-
ply chain can be augmented by on-the-job training
efforts that target a broad coalition of data producers
and users across the public sector, academia, and civil
society. Increasing access to online training platforms
(such as DataCamp and Coursera) and online degree
and certificate programs, as well as free courses
offered by prestigious universities in higher-income
countries, can help build capacity across an impres-
sive array of topics related to both foundational and
frontier data analytics. These activities could be
supplemented by continued support of emerging
data science initiatives that provide scope for col-
laboration, mentorship, and learning, including the
Deep Learning Indaba Institute,91
Data Science Africa
conferences,92
and the competition platform Zindi.93
Skills training companies and platforms have
recently emerged to support the development of
digital data skills in developing countries and to link
trainees to employers. Upskilling platforms such as
Andela and Gebeya in Africa and Revelo in Brazil train
students in data analytics and software development.
Andela, founded in 2014, is training young people to
meet the demand for information technology (IT)
talent globally and within Africa. Gebeya, founded
in 2016, matches trainees with companies in Africa.94
And data labeling companies such as CloudFactory in
Kenya and Nepal and Samasource in Kenya are creat-
ing jobs for cleaning, categorizing, and labeling data
used for AI applications.95
In continuing to support short-term training and mentorship programs in lower-income countries, the international community should evaluate the conditions under which such programs achieve sustained improvements in local capacity, in order to identify short-term capacity-building models that hold promise.
Strengthening data literacy among senior leadership
and creating institutional environments that encourage
the use of sophisticated data and evidence. The big push
to build an army of data scientists for jobs in the
public sector, private sector, and civil society must be
complemented with efforts to create enabling insti-
tutional and leadership environments (see chapter
8) that place a high premium on the use of data and
evidence—both internally for management of these
institutions and externally for understanding and
producing policies that enhance welfare.
To help strengthen data literacy, especially in
low-capacity settings, regional and international
development partners can leverage their expertise or
technical partnerships to provide governments with
technical assistance. They can also organize objective
peer reviews for gauging the relevance and accuracy
of complex research that hinges on the integration of
public intent and private intent data sources, includ-
ing efforts sponsored by international agencies them-
selves (see spotlight 2.2).
On the whole, strengthening the data literacy of the senior leadership of public sector institutions will not guarantee that they will seek data and evidence when designing policies, especially if the resulting insights do not appear to contribute to the political objectives of their government (see chapter 8). As discussed in chapter 2, mutually reinforcing constraints in financing, human capital, data governance, and data demand must be overcome as part of a long-term, holistic plan backed by domestic support from politicians of the major political parties, academia, and civil society.
In the short term, strengthening human capital in NSOs and line ministries in lower-income countries in the production and use of public intent data will indirectly contribute to the pool of skill sets required for
public intent and private intent data to be integrated
into official statistics and knowledge products gener-
ated within the public sector (see chapters 2 and 9 for
further discussion). International organizations can
provide these institutions with technical assistance
to cultivate open data practices and to build skills in
the creation and dissemination of public use census,
survey, and administrative datasets that are subject to
international best practices in deidentification. This
effort can catalyze downstream research that brings
together public intent and private intent data sources.
Statistical capacity-building projects financed by
international organizations and traditionally focused
on the production and use of public intent data should
be expanded systematically to allow for investments
in skills critical to the integration of public intent and
private intent data sources. NSOs could establish a
business line on experimental statistics (that is, statis-
tics that leverage new data sources and methods to
better respond to users’ needs and can be viewed as
official statistics “in the making”). This business line
would provide a more direct route to investing in staff
who can conduct cutting-edge research grounded in
synergies among public intent and private intent data
sources.96
Revamping NSOs to perform nontraditional roles with
private intent data. In general, for NSOs to maintain
relevance in a landscape in which they no longer gen-
erate the majority of the data, they should be empow-
ered data stewards endowed with qualified staff who
can perform nontraditional roles. NSOs must be able
to field requests for accessing confidential data that
can be used to calibrate and validate models that
fuse public intent and private intent data sources. By
pursuing a work program on experimental statistics,
NSOs should aspire to be proactive contributors to
research that would assess the public intent data
requirements of synergistic applications. The Data
Science Campus in the United Kingdom’s Office for
National Statistics (ONS) is an example of a unit in
an NSO that is tasked with leveraging the latest
advances in data science and the synergies between
public intent and private intent data sources to serve
the public good. The Campus works on data science
projects not only for the ONS, but also for the UK
government as well as international organizations in
collaboration with partners from academia and the
private sector.97
Twinning arrangements between the
NSOs in high-income countries with similar initia-
tives and NSOs in low- and middle-income countries
can be one way to strengthen NSO capabilities in
low-capacity environments to create units akin to the
ONS Data Science Campus.
NSOs will also need to grapple with data pro-
tection issues. They must, for example, determine
whether spatially deidentified data are sufficient for
calibration purposes and what minimum volume and
scope of confidential data will have to be accessed for
specific applications. NSOs also can carefully identify
applications in which access to confidential data is not required. However, accommodating requests for applications with well-defined and well-articulated confidential data needs or responding to time-sensitive requests tied to immediate policy needs (such as a humanitarian or disaster response) ultimately requires that NSOs have personnel who are
trained in data protection and law and who can enter
into and enforce data sharing agreements to mitigate
data protection risks. To fulfill these roles, NSOs must
receive a significant infusion of financial and human
capital and should consider actively engaging—at
least in the short term—international organizations
or academic institutions and research organizations,
at both the local and international levels, to bridge the
gaps in internal institutional and technical capacity.
Investing in data accessibility. Accessing private
intent data remains challenging, especially in lower-
income countries. Large barriers, such as protecting
customers and maintaining competitive advantages,
prevent companies from sharing their data. In addi-
tion, pulling data from a company’s database requires
computing and human resources that are typically
outside of a business’s key performance indicators. If
a public organization submits poorly formulated requests
for a company’s data, compiling and exporting data
can become a time-intensive burden on companies.
Even if a company is willing and able to share its data,
because of the diversity of private intent data types it
is difficult to create standards to share data. Shared
data must have clear documentation, be in a usable
format that is interoperable with other private and
public datasets for integration, and have been deiden-
tified. Creating these types of standards may require
third parties to coordinate efforts and will place more
resource burdens on companies.
Utilizing data collaboratives and research partnerships.
These cooperative arrangements are essential ways
for different sectors, research institutions, and gov-
ernments to share data. The Open Data Lab describes
data collaboratives as moving beyond public-private
partnerships to pool data resources that researchers
use for public benefit.98
A successful example can
be found in Nairobi, Kenya, where researchers have
partnered with local government agencies to develop
spatially integrated road safety datasets with inputs
from administrative, social media, private, and tradi-
tional sources (see spotlight 4.2 for details).
Data collaboratives can be coordinated by civil
society or universities, or through corporate social
responsibility programs. Facebook’s Data for Good
initiative is an example of how technology compa-
nies can be incentivized to share their data through
corporate social responsibility programs. By leverag-
ing customer data and engaging with civil society
and university partners, Facebook is offering a suite
of innovative datasets intended to aid public policy
decisions. Offerings range from mobility data and
downscaled population maps to data on electric grid
coverage.
Data collaboratives and research partnerships could give the public sector access to companies’ application programming interfaces (APIs) and cloud services through dedicated access tiers. Social media platforms such as Twitter, for example, provide APIs that allow users to download text data through free tiers.
Private companies could be encouraged to share
their data at reduced cost for public initiatives, with
special grants for researchers or tax breaks for the
data provider. Cloud computing services, such as
Google Cloud and Amazon Web Services, are offering
small education grants to researchers to access the
computing infrastructure needed to leverage these
datasets, which are often large. Flowminder, a Swed-
ish nongovernmental organization, provides code,
instructions, and support for mobile network oper-
ators to aggregate, deidentify, and share their CDR
data. Its open-source tool, FlowKit, provides APIs,
code, and databases to aid companies sharing these
sensitive records with researchers.99
Trusted intermediaries are building platforms
that provide researchers with private intent data or
facilitate sending programming code to private com-
panies, which can, in turn, run the code with their pri-
vate intent data on behalf of the researcher and share
aggregated research insights. Opportunity Insights, a
nonpartisan, nonprofit research organization based
at Harvard University, offers a Track the Recovery
platform that gives researchers access to near real-
time economic data to understand the COVID-19
policy response in the United States. As the broker of
the data sharing agreements, Opportunity Insights
deidentifies data to facilitate sharing by protecting
customers’ and companies’ data. For example, it protects companies’ data through aggregation and by creating relative indicators that mask actual revenue and profit. OPAL (“Open Algorithms”) takes a different approach. OPAL is a nonprofit partnership created by groups at MIT Media Lab, Imperial College London, the telecommunications company Orange, the World Economic
Forum, and the Data-Pop Alliance. Its platform allows
researchers to send companies certified open-source
algorithms that are then run behind the companies’
firewalls.
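The two pathways just described, brokered aggregation and sending code to the data, share a pattern: the data holder runs an agreed computation on its own records and releases only aggregates. The sketch below illustrates that pattern under stated assumptions. It is not the actual Opportunity Insights or OPAL method: the function name `indexed_spending`, the record layout, the choice to index each sector's revenue to a baseline period (so absolute revenue is never released), and the minimum of three contributing firms per released group are all hypothetical choices for illustration.

```python
from collections import defaultdict

MIN_FIRMS = 3  # hypothetical small-cell threshold; smaller groups are withheld

def indexed_spending(records, baseline_days, report_days):
    """Hypothetical "approved query" run behind the data holder's firewall.

    records: iterable of (firm_id, sector, day, revenue) tuples.
    Returns {sector: {day: percent change vs. the baseline average}},
    so researchers see relative movements, never revenue levels.
    """
    totals = defaultdict(lambda: defaultdict(float))  # sector -> day -> revenue
    firms = defaultdict(set)                          # sector -> contributing firms
    for firm, sector, day, revenue in records:
        totals[sector][day] += revenue
        firms[sector].add(firm)

    released = {}
    for sector, by_day in totals.items():
        if len(firms[sector]) < MIN_FIRMS:
            continue  # too few firms: an index could expose a single firm
        base = sum(by_day.get(d, 0.0) for d in baseline_days) / len(baseline_days)
        if base == 0:
            continue  # cannot index against an empty baseline
        released[sector] = {
            d: round(100.0 * (by_day.get(d, 0.0) - base) / base, 1)
            for d in report_days
        }
    return released

# Toy records (hypothetical): three retail firms, one mining firm.
records = [
    ("f1", "retail", 1, 100.0), ("f2", "retail", 1, 50.0), ("f3", "retail", 1, 50.0),
    ("f1", "retail", 2, 80.0), ("f2", "retail", 2, 40.0), ("f3", "retail", 2, 40.0),
    ("f9", "mining", 1, 900.0), ("f9", "mining", 2, 450.0),
]
print(indexed_spending(records, baseline_days=[1], report_days=[2]))
# prints {'retail': {2: -20.0}}
```

Only the retail index leaves the company; the single-firm mining sector is suppressed, because its index would reveal that one firm's revenue trajectory.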
Despite the promise of these innovative data-
sharing pathways, many are not available in lower-
income countries. For example, these countries
rarely participate in data collaboratives, according
to data compiled by the Open Data Lab.100
A similar
trend can be seen in the limited number of studies on
lower-income countries that leveraged private intent
data in the early stages of the COVID-19 pandemic
(box 4.2). More investments are needed in accessing
private intent data for public benefit in lower-income
countries.
Investing in research
Investments in research are needed to develop
methods and enable lower-income countries to
grow research programs that leverage private intent
data for public policy. The research community can
achieve quick wins by focusing on foundational
areas such as testing whether validated methods
in one region translate to contexts where data are
sparse. Over the longer term, research strategies
would benefit from building validation and training
datasets in lower-income countries to avoid issues
similar to data profiling and discrimination when
using pre-trained models from higher-income countries. The sections that follow describe some of the
high-priority research needed to advance the use of
private intent data for public benefit in the short and
long term.
Shorter-term research needs. Because much of the
current innovation in using private intent data is led
by researchers and technology companies in higher-
income countries, many of the available methods
are not tailored to the development context.101
Even
when a solution is developed for and validated in
a particular lower-income country, understanding
whether and when the solution can be extended to
other lower-income countries can enable research
in data-sparse contexts. For example, even though
international phone call usage correlates with wealth
more strongly in Rwanda than in Balkh province
in Afghanistan, such a finding can still be useful in
contexts such as Balkh province.102
Similarly, granular
poverty maps that use digital trace data from mobile
phones hold great potential for better targeting social
services, but the patterns that algorithms use to make
poverty predictions may differ from context to con-
text.103
Research is needed to determine when granu-
lar poverty estimates created for one country can be
transferred to another country and when they will
lead to misleading maps.
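One simple way to probe this transferability question is to fit a model in a country with ground data and then score it, unchanged, in another country, comparing that out-of-country fit with the in-country fit. The sketch below does this with ordinary least squares in NumPy. It is an illustration under assumptions, not a method from the Report: the function names, the single predictor, and the toy values (a phone-usage feature that tracks wealth strongly in one setting and weakly in another) are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on X with an intercept; returns coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def r_squared(beta, X, y):
    """1 - SS_res / SS_tot. Can be negative when a transferred model
    predicts worse than simply using the target country's mean."""
    X1 = np.column_stack([np.ones(len(X)), X])
    resid = y - X1 @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

def transfer_check(X_src, y_src, X_tgt, y_tgt):
    """Fit in the source country, then score in both countries."""
    beta = fit_ols(X_src, y_src)
    return r_squared(beta, X_src, y_src), r_squared(beta, X_tgt, y_tgt)

# Toy illustration (hypothetical values, not real survey data).
usage = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
wealth_src = 2.0 * usage + 1.0   # strong relationship in the source
wealth_tgt = 0.5 * usage + 1.0   # weaker relationship in the target
in_r2, out_r2 = transfer_check(usage, wealth_src, usage, wealth_tgt)
print(f"in-country R^2 = {in_r2:.2f}, transferred R^2 = {out_r2:.2f}")
```

A large gap between the two scores is a warning that a map built from the source model could mislead in the target country; in practice this check needs at least some ground data from the target.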
In the short term, researchers also need to produce
methods that preserve privacy while combining pub-
lic intent and private intent data. As more datasets are
made available to researchers and decision-makers,
more opportunities arise to reverse-engineer tradi-
tional deidentification methods. If these risks are
not eliminated, individuals and companies may be
reluctant to share their data. One example of how
public intent data are being designed to prevent
de-anonymization is the GPS data collected from
household surveys for the Demographic and Health Survey (DHS) and the Living Standards Measurement Study (LSMS). Even if surveys collect GPS-based locations
for communities and households, the resulting data
are not included in public use datasets to ensure
the confidentiality of respondents. Any third-party
user that obtains DHS- or LSMS-type survey data
has access only to spatially offset locations of survey
enumeration areas. For example, a household’s loca-
tion is represented using the 10 square kilometer area
within which the house is located. These surveys are
also good examples of providing documentation that
makes the precision and accuracy of the deidentified
data explicit.
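The spatial offsetting described above can be sketched as a random displacement of each enumeration area's coordinates before release. This is a minimal illustration, not the DHS or LSMS procedure: the helper names, the caps of 2 km for urban and 5 km for rural areas, and the small-distance conversion from kilometers to degrees are assumptions chosen for the example.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0
KM_PER_DEGREE = EARTH_RADIUS_KM * math.pi / 180  # about 111.2 km per degree

def displace_point(lat, lon, max_km, rng):
    """Hypothetical helper: move (lat, lon) in a random direction by a
    random distance of up to max_km, using a small-distance approximation."""
    bearing = rng.uniform(0, 2 * math.pi)
    dist_km = rng.uniform(0, max_km)
    dlat = dist_km * math.cos(bearing) / KM_PER_DEGREE
    # A degree of longitude shrinks with the cosine of latitude.
    dlon = dist_km * math.sin(bearing) / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Assumed caps, for illustration only.
MAX_OFFSET_KM = {"urban": 2.0, "rural": 5.0}

rng = random.Random(42)
true_lat, true_lon = 10.0, 20.0  # an arbitrary example location
pub_lat, pub_lon = displace_point(true_lat, true_lon, MAX_OFFSET_KM["rural"], rng)
print(f"published point is {haversine_km(true_lat, true_lon, pub_lat, pub_lon):.2f} km away")
```

Documenting the cap alongside the released data, as these surveys do, lets users account for the known positional error when linking survey points to imagery or other spatial layers.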
Concerns about data protection have limited
the eagerness to share data, even in critical times
such as during the Ebola crisis. Historically, data
deidentification techniques have maintained equi-
librium between the producers and consumers of
data, preserving individuals’ privacy while limiting
information loss. However, deidentification tech-
niques have proven to be increasingly imperfect with
high-dimensional private intent data. Despite the use
of standard deidentification techniques, one study
found that four data points were enough to reidentify
95 percent of individuals in a mobile phone dataset of
1.5 million people.104
In this context, new data sharing
frameworks have been proposed to mitigate privacy
risks while maximizing the informative potential of
private intent data. Researchers have proposed four
models for use of mobile phone data, depending on
the level of risk tolerance and the number of potential
third-party users.105
The Social Science One initiative,
which allows researchers from academic institutions
to access Facebook data at scale,106
is an example of
how new data sharing frameworks could be applied
to access private intent data, paving the way for
future public-private collaborations. As noted, more
research will be needed to design methods that allow
the privacy of private intent data to be protected,
while minimizing the loss of precision associated
with using these data in applications aimed to inform
public policy.107
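The reidentification finding cited above rests on a measure often called unicity: the share of individuals whose trace is the only one in the dataset containing a handful of points drawn from it. The sketch below estimates unicity under assumptions: each person's trace is a set of (location, hour) pairs, and the function name and toy traces are hypothetical. The cited study's exact procedure may differ.

```python
import random

def unicity(traces, p, rng):
    """Estimate the share of individuals uniquely pinned down by p points.

    traces: dict mapping person id -> set of (location, time) pairs.
    For each person, draw p points from their own trace and check which
    traces contain all of them. If only that person's trace matches,
    an adversary holding those p points could single them out.
    """
    unique = 0
    eligible = 0
    for person, trace in traces.items():
        if len(trace) < p:
            continue  # not enough points to draw from this trace
        eligible += 1
        known = set(rng.sample(sorted(trace), p))
        matches = [other for other, t in traces.items() if known <= t]
        if matches == [person]:
            unique += 1
    return unique / eligible if eligible else 0.0

# Toy traces (hypothetical): a and b share two of their three points.
traces = {
    "a": {("cell1", 8), ("cell2", 12), ("cell3", 18)},
    "b": {("cell1", 8), ("cell2", 12), ("cell4", 19)},
    "c": {("cell5", 9), ("cell6", 13), ("cell7", 20)},
}
print(unicity(traces, p=1, rng=random.Random(0)))
print(unicity(traces, p=3, rng=random.Random(0)))
```

With more points the share of unique individuals can only stay the same or rise; real mobility traces are long and sparse, which is why a few points can single out most people even after names and numbers are removed.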
In a context of low data and coding literacy, off-the-shelf programming tools can lead to more effective
and responsible use of private intent data. Flowmind-
er’s FlowKit is an example of an open-source solution
that helps companies to deidentify, clean, and export
their data effectively for policy applications.108
Using
FlowKit, Flowminder and its partners have been able
to rapidly integrate CDRs into the COVID-19 response.
Aequitas is another open-source toolkit that provides
an intuitive way to audit machine learning models for
discrimination and bias.109
These types of tools enable
researchers to access data and companies to share
data without the need for specialized skills to collate
and deidentify the datasets. Ideally, these research
tools should be designed to promote access to data
and to share technical knowledge between lower-
income countries, from higher-income countries to
lower-income countries, or from lower-income coun-
tries to higher-income countries. Research funding
should not only operate on short-term project cycles
but also support the institutional setup of research
labs and institutes in lower-income countries.
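Bias-audit toolkits such as Aequitas report group-level error metrics for a model's predictions. The sketch below does not use the Aequitas API; it computes one such metric by hand, the false positive rate for each group and its ratio to a reference group, assuming binary predictions (for example, "predicted eligible for a program") and known true labels. The function name, group labels, and example values are hypothetical.

```python
from collections import defaultdict

def fpr_disparity(groups, y_true, y_pred, reference):
    """False positive rate per group and its ratio to a reference group.

    A false positive is a case predicted 1 whose true label is 0, so
    FPR = false positives / true negatives-plus-false-positives (all
    cases with true label 0). The reference group must contain such cases.
    """
    fp = defaultdict(int)
    negatives = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        if t == 0:
            negatives[g] += 1
            if p == 1:
                fp[g] += 1
    fpr = {g: fp[g] / negatives[g] for g in negatives}
    ref = fpr[reference]
    disparity = {g: (rate / ref if ref > 0 else float("nan")) for g, rate in fpr.items()}
    return fpr, disparity

# Hypothetical example: the model wrongly flags rural cases more often.
groups = ["urban"] * 4 + ["rural"] * 4
y_true = [0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
fpr, disparity = fpr_disparity(groups, y_true, y_pred, reference="urban")
print(fpr, disparity)
```

A ratio far from 1 flags a group that bears a disproportionate share of one kind of error; which metrics to audit, and what gap is acceptable, remain policy choices rather than outputs of the code.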
Longer-term research needs. These needs include
devising best practices and quality standards. Most
decision-makers will not be well versed in the latest
data methods. Best practices and quality standards
can facilitate trust in leveraging new data types
for policy. These types of standards and governing
institutions are available for public intent data. Con-
ceivably, then, they could be translated for private
intent data. For example, traditional data collection
using sample surveys has many imperfections, but
by studying them extensively, the research com-
munity has come up with ways to address them or
quantify the errors they introduce.110
In the same
way, researchers need to study the limitations of pri-
vate intent data and develop the appropriate quality
standards for their use in public policy. For example,
there is currently no consensus on the criteria needed
to determine whether a poverty map is fit for use in
resource allocations.
International organizations can play a major role
in this process by providing platforms for discuss-
ing, formulating, and promoting these practices and
standards (see spotlight 8.1). The working groups
established under the auspices of the United Nations
Statistical Commission on household surveys, open
data, and big data may provide insight into the types
of commissions that could be established.111
Coordinating investment
In the longer term, coordinated investment in
high-quality training data from lower-income coun-
tries will also be needed. Currently, private intent data
are mostly repurposed for machine learning applications, which require high-quality data collected on the ground to train algorithms, including those that rely on remote sensing. For
example, over the last five years pioneering research
on small-scale farming systems has successfully com-
bined high-quality georeferenced survey data with
high-resolution, multispectral satellite imagery from
public sources (Sentinel-2) and private sources (Planet
and Maxar, formerly Digital Globe) to obtain crop
yield estimates on individual plots.112
These efforts
have shown the importance of using high-quality
ground data—including georeferenced plot outlines
and objective measures of crop yields—to calibrate
and validate remote sensing models that can, in turn,
churn out high-resolution grids for crop types and
crop yields for entire regions and countries.
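The calibrate-and-validate step described above can be illustrated with a deliberately simple model: compute a vegetation index from two spectral bands for each surveyed plot, fit it against ground-measured yields, and check the fit on held-out plots before applying it to plots that have imagery but no survey. The sketch assumes NDVI (the normalized difference vegetation index, (NIR - Red) / (NIR + Red)) as the only predictor and a linear fit; the research cited uses richer features and models, and the band values and yields below are toy numbers, not real measurements.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index from near-infrared and red
    reflectance; ranges from -1 to 1, higher for denser green vegetation."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)

def calibrate(index, ground_yield):
    """Least-squares fit of yield = a + b * index on surveyed plots."""
    A = np.column_stack([np.ones(len(index)), index])
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(ground_yield, dtype=float), rcond=None)
    return a, b

def rmse(a, b, index, observed):
    """Root-mean-square error of the calibrated model on held-out plots."""
    pred = a + b * np.asarray(index, dtype=float)
    return float(np.sqrt(np.mean((pred - np.asarray(observed, dtype=float)) ** 2)))

# Toy plots (hypothetical reflectances and yields in t/ha).
train_idx = ndvi([0.50, 0.60, 0.70, 0.40], [0.10, 0.10, 0.10, 0.20])
train_yield = [2.1, 2.4, 2.6, 1.2]
a, b = calibrate(train_idx, train_yield)

test_idx = ndvi([0.55, 0.45], [0.12, 0.15])
print(f"yield ~ {a:.2f} + {b:.2f} * NDVI, held-out RMSE = {rmse(a, b, test_idx, [2.0, 1.7]):.2f}")
```

The held-out error is what tells analysts whether the model is good enough to produce yield grids for whole regions, which is why the volume and quality of ground data matter so much.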
One of the challenges preventing the rapid scale-up of these efforts is the lack of knowledge on the required volume and content of georeferenced microdata that should be collected through surveys
to inform downstream remote sensing applications
capable of meeting needs for spatially disaggregated
estimation and reporting. These challenges could be
addressed by research. Similarly, to analyze natural
language data to, for example, measure attitudes on
certain topics in the population, researchers typically
rely on pre-trained language models (that is, models
already trained on a large corpus of text). The lack of
these pre-trained language models in languages other
than the major ones has been a barrier to the analysis
of text data in low- and middle-income countries.113
If the people in these countries are themselves the
designers, curators, and owners of location-specific,
high-quality training data to test private intent data,
the center of research gravity would naturally shift
toward lower-income countries.
Table 4.1 summarizes selected short- and long-term research needs. Answers to the questions listed in the table will vary in accordance with the development outcome/process that researchers are aiming to better measure and understand through the use and augmentation of private intent data. For example, the requirements for high-resolution estimation of population density will differ from requirements for estimating crop yields.

Table 4.1 Selected research gaps to be addressed to advance the use of private intent data for development
Research area: examples of research gaps

Societal impacts
• How do we ensure that algorithmic-based policy making can lead to fair outcomes?
• How can we increase the transparency and interpretability of policy predictions using private intent data?
• How can we design algorithms that can be safeguarded against manipulation?
• What are the trade-offs between granularity and precision, and what is the optimal mix for targeting of development programs?

Quality standards
• How can standards be created, agreed on, updated, and communicated to the general development community? Who needs to be part of these conversations?
• To ensure that policy makers can trust and use results, what should the standards be for accuracy and precision for frontier applications that use private intent data or that combine public intent and private intent data?

External validity (a)
• How promising is the approach of building models in countries that have data and applying them to countries with limited data?
• How can issues akin to data profiling and discrimination be avoided when using pre-trained models from higher-income countries in cases of novel development use?
• To what extent can applications that combine public intent survey data with private intent data predict values calculated from census data within a country?

Machine learning
• How does the approach to machine learning and spatial feature selection need to change from common machine learning tasks to more specialized tasks that will aid development policy?
• Which features best predict spatial variation in development outcomes in different contexts? What are the trade-offs between predictive accuracy and cost?

Training and validation data
• What should be the required volume of and approach to public intent data collection for calibrating and validating machine learning algorithms that combine public intent and private intent data?

Deidentification (b)
• How do deidentification methods need to change to protect individuals and companies when private intent data are used for public benefit?
• How does (spatial) deidentification of public intent data affect the accuracy and precision of applications that use public intent data to calibrate and validate machine learning algorithms that combine public intent and private intent data?

Capturing longitudinal change
• How do accuracy and precision differ in applications that aim to estimate longitudinal change versus obtaining cross-sectional predictions for the same development outcome?
• What features best predict longitudinal change in different contexts?
• How can we ensure the stability over time of algorithms aimed at predicting changes in policy outcomes?
• When public intent survey data are combined with imagery—specifically, spatial features (predictors) extracted via deep-learning techniques—in order to derive high-resolution estimates of a development outcome, how do the spatial, spectral (c), and temporal resolution of satellite imagery affect the accuracy and precision of the predictions for the outcome of interest?
• Do these effects vary based on the decisions on the size of satellite imagery grids that are processed for extracting spatial features?

a. External validity relates to the research findings of one location holding true in another location.
b. The term deidentification is used instead of anonymization because, although data are processed to deidentify any individual, these data may become identifiable in the future as computing and machine learning advance. Thus data may never be truly anonymized.
c. “Spectral” refers to different wavelengths on the electromagnetic spectrum. Satellite images typically have multiple “bands” that capture different spectral ranges.
The growing availability and use of private intent
data for development purposes have potentially large
benefits, especially when paired with public intent
data. However, the way forward requires a conducive and enabling environment that trains both analysts and higher-level decision-makers to think critically about issues of data protection, discrimination,
manipulation, representativeness, and transparency.
Repurposing and combining public intent and private
intent data are central to getting more value from
data, but the benefits must be shared equitably while
safeguarding against harmful outcomes. Part II of
this Report describes the building blocks of a social
contract that enables such data flows, including infra-
structure policies, legal and regulatory frameworks
for data, related economic policies, and the institu-
tions of data governance.
Notes
1. Bengtsson et al. (2011).
2. Chetty et al. (2020); Oliver et al. (2020).
3. Beraja, Yang, and Yuchtman (2020).
4. Salganik (2017).
5. Salganik (2017).
6. Serajuddin et al. (2015).
7. IEAG (2014).
8. Demombynes and Sandefur (2015).
9. Tiecke and Gros (2016).
10. Stephens-Davidowitz (2017).
11. For the 2008 and 2012 US presidential elections,
Stephens-Davidowitz (2017) found that an area’s search
rate for terms with racial overtones was a robust nega-
tive predictor of presidential candidate Barack Obama’s
vote share.
12. WHO (2008).
13. Adda (2016).
14. Ihantamalala et al. (2018); Milusheva (2020); Wesolowski et al. (2012).
15. González, Hidalgo, and Barabási (2008); Le Menach
et al. (2011); Tatem et al. (2009).
16. Wesolowski et al. (2012).
17. Wesolowski et al. (2012).
18. Peak et al. (2018). After the outbreak, they studied how
mobile phone data for Sierra Leone could have been
used to evaluate the impacts of interventions meant to
decrease travel during the epidemic.
19. COVID-19 National Emergency Response Center
(2020).
20. Burns (2020).
21. Chang et al. (2020); Maas et al. (2019).
22. Aktay et al. (2020).
23. Lai et al. (2020); Pepe et al. (2020).
24. Salathé et al. (2012).
25. McCall (2020).
26. PAHO and WHO (2016).
27. McGough et al. (2017).
28. Kraemer et al. (2019).
29. Yang et al. (2017).
30. Milinovich et al. (2014).
31. Internal Displacement Monitoring Center (IDMC),
Data of GIDD (Global Internal Displacement Database),
https://www.internal-displacement.org/database
/displacement-data.
32. Ritchie and Roser (2019).
33. BBC News (2018); CNN Indonesia (2018).
34. Bengtsson et al. (2011); Lu, Bengtsson, and Holme (2012);
Wilson et al. (2016).
35. Robinson, Power, and Cameron (2013).
36. Robinson, Power, and Cameron (2013).
37. Kongthon et al. (2012).
38. Resch, Usländer, and Havas (2018).
39. Sentiment analysis is the process of computationally
identifying and categorizing opinions expressed in
a piece of text, especially to determine whether the
writer’s attitude toward a topic or product is positive,
negative, or neutral. See “sentiment analysis,” Lexico,
Oxford University Press, https://www.lexico.com/en
/definition/sentiment_analysis.
40. Reynard and Shirgaokar (2019).
41. See “Case Study 5: Delivering Remote Flood Analytics
as a Scalable Service,” pages 61–68 in Sylvester (2019).
42. Blumenstock, Cadamuro, and On (2015); Jean et al.
(2016); Yeh et al. (2020).
43. Blumenstock, Cadamuro, and On (2015).
44. Frias-Martinez, Frias-Martinez, and Oliver (2010).
45. Aiken et al. (2020).
46. Llorente et al. (2015).
47. Glaeser, Kim, and Luca (2018).
48. Blumenstock (2016).
49. Bonnet, Lechat, and Ridde (2018).
50. Williams, Idowu, and Olonade (2015).
51. Kelley, Lane, and Schönholzer (2020).
52. Dai and Sujon (2019).
53. Milusheva et al. (2020).
54. Kelley, Lane, and Schönholzer (2020).
55. Pratihast et al. (2014).
56. See World Resources Institute, Global Forest Watch
(dashboard), https://www.globalforestwatch.org/.
57. WRI (2019).
58. See, for example, Janaagraha Centre for Citizenship
and Democracy, I Paid a Bribe (dashboard), https://www
.ipaidabribe.com/about-us#gsc.tab=0.
59. Hlatshwayo et al. (2018).
60. Inter-American Development Bank, “Countries That
Have Already Implemented the Investment Map
Initiative,” https://www.iadb.org/en/reform-modernization
-state/countries.
61. Marshall (2012).
62. Deaton (2008); Falk et al. (2018). For a list of projects that
have used Gallup World Poll data, see Gallup, “Working
Together to Change the World,” https://www.gallup
.com/analytics/318176/public-sector-success-stories
.aspx.
63. Goldstein, Gonzalez Martinez, and Papineni (2019).
64. Goldstein, Gonzalez Martinez, and Papineni (2019).
65. See Google, Community Mobility Reports (database),
https://www.google.com/covid19/mobility/?hl=en.
66. See Ookla, Speedtest Global Index (database), https://
www.speedtest.net/global-index.
67. Davis et al. (2010).
68. Mehrabi et al. (2020).
69. Blumenstock and Eagle (2012).
70. Wesolowski et al. (2012).
71. Frankfurter et al. (2020).
72. Jean et al. (2016).
73. Strubell, Ganesh, and McCallum (2019).
74. Zou and Schiebinger (2018).
75. Angwin et al. (2016).
76. Buolamwini and Gebru (2018).
77. Wallace et al. (2019).
78. Bolukbasi et al. (2016).
79. Ginsberg et al. (2009).
80. Lazer et al. (2014).
81. Ayush et al. (2020); Engstrom, Hersh, and Newhouse
(2017).
82. Blumenstock (2018).
83. Björkegren, Blumenstock, and Knight (2020).
84. Carretero, Vuorikari, and Punie (2017); GSS (2016); Vale
and Gjaltema (2020).
85. Perol, Gharbi, and Denolle (2018).
86. RTI International, “Impact: Using Satellite Images
and Artificial Intelligence to Improve Agricultural
Resilience,” https://www.rti.org/impact/using-satellite
-images-and-artificial-intelligence-improve-agricultural
-resilience.
87. Perisic (2018).
88. ITU (2020).
89. Flowers (2019).
90. Cisco Systems, “Cisco Networking Academy,” https://
www.cisco.com/c/en/us/about/csr/impact/education
/networking-academy.html.
91. Deep Learning Indaba Institute, https://deeplearning
indaba.com/2020/.
92. Data Science Africa, http://www.datascienceafrica
.org/.
93. Zindi (2020).
94. Buckholtz (2019).
95. Kaye (2019).
96. See, for example, the related efforts under the
European Statistical System by Eurostat, Statistics
Denmark, Destatis (Germany), National Statistics Institute
(Spain), ISTAT (Italy), Central Statistical Bureau of
Latvia, Statistics Netherlands, Statistics Poland, Statistics
Portugal, National Institute of Statistics (Romania),
Statistics Finland, Statistics Iceland, and the Federal
Statistical Office (Switzerland); see European
Statistical System, Eurostat, “Experimental Statistics,”
Luxembourg, https://ec.europa.eu/eurostat/web/ess
/experimental-statistics.
97. For more information on the ONS Data Science
Campus and its projects, see Data Science Campus,
Office for National Statistics, “Data Science for Public
Good: Projects,” https://datasciencecampus.ons.gov.uk
/projects/.
98. GovLab, Tandon School of Engineering, New York
University, “Data Collaboratives,” https://datacollaboratives
.org/.
99. Flowminder Foundation, “FlowKit CDR Analytics
Toolkit,” https://flowkit.xyz/.
100. GovLab, Tandon School of Engineering, New York
University, “Data Collaboratives,” https://datacollaboratives
.org/.
101. Blumenstock (2018).
102. Aiken et al. (2020).
103. Blumenstock (2018).
104. de Montjoye et al. (2013).
105. de Montjoye et al. (2013).
106. Social Science One, Institute for Quantitative Social
Science, Harvard University, “Building Industry-
Academic Partnerships,” https://socialscience.one/home.
107. Dwork and Roth (2014).
108. Flowminder Foundation, “FlowKit CDR Analytics
Toolkit,” https://flowkit.xyz/.
109. Saleiro et al. (2019).
110. Bethlehem (2009).
111. United Nations Statistical Commission, Statistics
Division, Department of Economic and Social Affairs,
United Nations, “Active Groups under the Statistical
Commission by Pillar and Type of Group,” https://
unstats.un.org/unsd/statcom/groups/.
112. Burke and Lobell (2017); Gourlay, Kilic, and Lobell
(2019); Jain et al. (2016); Lambert et al. (2018); Lobell et al.
(2020).
113. Zindi (2020).
References
Adda, Jérôme. 2016. “Economic Activity and the Spread of
Viral Diseases: Evidence from High Frequency Data.”
Quarterly Journal of Economics 131 (2): 891–941.
Aiken, Emily L., Guadalupe Bedoya, Aidan Coville, and
Joshua Evan Blumenstock. 2020. “Targeting Develop-
ment Aid with Machine Learning and Mobile Phone
Data: Evidence from an Anti-Poverty Intervention in
Afghanistan.” In COMPASS ’20: Proceedings of the 3rd ACM
SIGCAS Conference on Computing and Sustainable Societies,
310–11. New York: Association for Computing Machinery.
Aktay, Ahmet, Shailesh Bavadekar, Gwen Cossoul, John
Davis, Damien Desfontaines, Alex Fabrikant, Evgeniy
Gabrilovich, et al. 2020. “Google COVID-19 Community
Mobility Reports: Anonymization Process Description
(Version 1.0).” April 8, 2020. https://arxiv.org/abs/2004
.04145v1.
Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren
Kirchner. 2016. “Machine Bias: There’s Software Used
across the Country to Predict Future Criminals, and
It’s Biased against Blacks.” ProPublica, May 23, 2016.
https://www.propublica.org/article/machine-bias-risk
-assessments-in-criminal-sentencing.
Ayush, Kumar, Burak Uzkent, Marshall Burke, David
B. Lobell, and Stefano Ermon. 2020. “Generating
Interpretable Poverty Maps Using Object Detection in
Satellite Images.” Cornell University, Ithaca, NY. http://
arxiv.org/abs/2002.01612.
BBC News. 2018. “Indonesia Earthquake and Tsunami: How
Warning System Failed the Victims.” BBC News, October 1,
2018. https://www.bbc.com/news/world-asia-45663054.
Bengtsson, Linus, Xin Lu, Anna Thorson, Richard Garfield,
and Johan von Schreeb. 2011. “Improved Response to
Disasters and Outbreaks by Tracking Population Move-
ments with Mobile Phone Network Data: A Post-Earth-
quake Geospatial Study in Haiti.” PLoS Medicine 8 (8):
e1001083. https://doi.org/10.1371/journal.pmed.1001083.
Beraja, Martin, David Y. Yang, and Noam Yuchtman. 2020.
“Data-Intensive Innovation and the State: Evidence from
AI Firms in China.” NBER Working Paper 27723, National
Bureau of Economic Research, Cambridge, MA. https://
www.nber.org/papers/w27723.
Bethlehem, Jelke. 2009. “The Rise of Survey Sampling.” Dis-
cussion Paper 09015, Statistics Netherlands, The Hague.
Björkegren, Daniel, Joshua Evan Blumenstock, and Samsun
Knight. 2020. “Manipulation-Proof Machine Learn-
ing.” Cornell University, Ithaca, NY. http://arxiv.org/abs
/2004.03865.
Blumenstock, Joshua Evan. 2016. “Fighting Poverty with
Data.” Science 353 (6301): 753–54. https://doi.org/10.1126
/science.aah5217.
Blumenstock, Joshua Evan. 2018. “Don’t Forget People in
the Use of Big Data for Development.” Nature 561 (7722):
170–72. https://doi.org/10.1038/d41586-018-06215-5.
Blumenstock, Joshua Evan, Gabriel Cadamuro, and Robert
On. 2015. “Predicting Poverty and Wealth from Mobile
Phone Metadata.” Science 350 (6264): 1073–76. https://doi
.org/10.1126/science.aac4420.
Blumenstock, Joshua Evan, and Nathan Eagle. 2012. “Divided
We Call: Disparities in Access and Use of Mobile Phones
in Rwanda.” Information Technologies and International
Development 8 (2): 1–16.
Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh
Saligrama, and Adam Kalai. 2016. “Man Is to Computer
Programmer as Woman Is to Homemaker? Debiasing
Word Embeddings.” July 21, Cornell University, Ithaca,
NY. https://arxiv.org/abs/1607.06520.
Bonnet, Emmanuel, Lucie Lechat, and Valéry Ridde. 2018.
“What Interventions Are Required to Reduce Road
Traffic Injuries in Africa? A Scoping Review of the Lit-
erature.” PLoS ONE 13 (11): e0208195. https://doi.org/10.1371
/journal.pone.0208195.
Buckee, Caroline O., Satchit Balsari, Jennifer Chan, Mercè
Crosas, Francesca Dominici, Urs Gasser, Yonatan H.
Grad, et al. 2020. “Aggregated Mobility Data Could Help
Fight COVID-19.” Science 368 (6487): 145–46. https://doi
.org/10.1126/science.abb8021.
Buckholtz, Alison. 2019. “Africa’s IT Talent Pool.” IFC Insights
(blog), December 2019. https://www.ifc.org/wps/wcm
/connect/news_ext_content/ifc_external_corporate_site
/news+and+events/news/insights/africa-it-talent.
Buolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades:
Intersectional Accuracy Disparities in Commercial
Gender Classification.” In PMLR, Proceedings of Machine
Learning Research, vol. 81, FAT 2018, Conference on Fairness,
Accountability, and Transparency, 23–24 February 2018, New
York, NY, USA, edited by Sorelle A. Friedler and Christo
Wilson, 77–91. Cambridge, MA: MIT Press. https://
dam-prod.media.mit.edu/x/2018/02/06/Gender%20
Shades%20Intersectional%20Accuracy%20Disparities
.pdf.
Burke, Marshall, and David B. Lobell. 2017. “Satellite-Based
Assessment of Yield Variation and Its Determinants in
Smallholder African Systems.” PNAS, Proceedings of the
National Academy of Sciences 114 (9): 2189–94. https://doi
.org/10.1073/pnas.1616919114.
Burns, Sarah. 2020. “How Anonymized Mobile Data Are
Helping Ghana Fight COVID-19.” Global Partnership
for Sustainable Development Data, United Nations,
New York. https://www.data4sdgs.org/news/how
-anonymized-mobile-data-are-helping-ghana-fight
-covid-19.
Carretero, Stephanie, Riina Vuorikari, and Yves Punie. 2017.
“DigComp 2.1: The Digital Competence Framework for
Citizens, with Eight Proficiency Levels and Examples
of Use.” JRC Working Paper JRC106281, Joint Research
Center, EU Science Hub, Seville, Spain.
Chang, Meng-Chun, Rebecca Kahn, Yu-An Li, Cheng-Sheng
Lee, Caroline O. Buckee, and Hsiao-Han Chang. 2020.
“Modeling the Impact of Human Mobility and Travel
Restrictions on the Potential Spread of SARS-CoV-2 in
Taiwan.” medRxiv, April 11, 2020. https://doi.org/10.1101
/2020.04.07.20053439.
Chetty, Raj, John N. Friedman, Nathaniel Hendren, Michael
Stepner, and the Opportunity Insights Team. 2020. “How
Did COVID-19 and Stabilization Policies Affect Spending
and Employment? A New Real-Time Economic Tracker
Based on Private Sector Data.” NBER Working Paper
27431, National Bureau of Economic Research, Cambridge,
MA. https://doi.org/10.3386/w27431.
CNN Indonesia. 2018. “BNPB: Seluruh Buoy Deteksi Tsunami
di Indonesia Rusak” [BNPB: All Tsunami Detection Buoys in
Indonesia Are Damaged]. CNN Indonesia, September 30,
2018. https://www.cnnindonesia.com/nasional/20180930
160115-20-334439/bnpb-seluruh-buoy-deteksi-tsunami
-di-indonesia-rusak.
COVID-19 National Emergency Response Center. 2020.
“Contact Transmission of COVID-19 in South Korea:
Novel Investigation Techniques for Tracing Contacts.”
Osong Public Health and Research Perspectives 11 (1): 60–63.
COVID-19 National Emergency Response Center,
Epidemiology and Case Management Team, Korea Centers
for Disease Control and Prevention, Cheongju, Republic
of Korea. https://doi.org/10.24171/j.phrp.2020.11.1.09.
Dai, Fei, and Mohhammad Sujon. 2019. “Measuring Current
Traffic Safety Culture via Social Media Mining.” WTSC
Report 2019-AG-2856, Washington Traffic Safety Commission,
Olympia, WA. http://wtsc.wa.gov/wp-content
/uploads/dlm_uploads/2019/10/Measuring-Traffic-Safety
-Culture-via-Social-Media-Mining_Oct2019-1.pdf.
Davis, Kristin E., Burton Swanson, David Amudavi, Daniel
Ayalew Mekonnen, Aaron Flohrs, Jens Riese, Chloe
Lamb, and Elias Zerfu. 2010. “In-Depth Assessment of
the Public Agricultural Extension System of Ethiopia and
Recommendations for Improvement.” IFPRI Discussion
Paper 01041, International Food Policy Research Institute,
Washington, DC. https://www.ifpri.org/publication
/depth-assessment-public-agricultural-extension-system
-ethiopia-and-recommendations.
Deaton, Angus S. 2008. “Income, Health, and Well-Being
around the World: Evidence from the Gallup World Poll.”
Journal of Economic Perspectives 22 (2): 53–72. https://doi
.org/10.1257/jep.22.2.53.
Demombynes, Gabriel, and Justin Sandefur. 2015. “Costing a
Data Revolution.” World Economics 16 (3): 99–112.
de Montjoye, Yves-Alexandre, César A. Hidalgo, Michel
Verleysen, and Vincent D. Blondel. 2013. “Unique in the
Crowd: The Privacy Bounds of Human Mobility.” Scientific
Reports 3 (1): 1376. https://doi.org/10.1038/srep01376.
Dureuil, Manuel, Kristina Boerder, Kirsti A. Burnett, Rainer
Froese, and Boris Worm. 2018. “Elevated Trawling inside
Protected Areas Undermines Conservation Outcomes in
a Global Fishing Hot Spot.” Science 362 (6421): 1403–07.
https://doi.org/10.1126/science.aau0561.
Dwork, Cynthia, and Aaron Roth. 2014. “The Algorithmic
Foundations of Differential Privacy.” Foundations and
Trends in Theoretical Computer Science 9 (3–4): 211–407.
http://dx.doi.org/10.1561/0400000042.
Engstrom, Ryan, Jonathan Samuel Hersh, and David Locke
Newhouse. 2017. “Poverty from Space: Using High-
Resolution Satellite Imagery for Estimating Economic
Well-Being.” Policy Research Working Paper 8284, World
Bank, Washington, DC.
Falk, Armin, Anke Becker, Thomas Dohmen, Benjamin Enke,
David Huffman, and Uwe Sunde. 2018. “Global Evidence
on Economic Preferences.” Quarterly Journal of Economics
133 (4): 1645–92. https://doi.org/10.1093/qje/qjy013.
Flowers, Andrew. 2019. “Indeed Tech Skills Explorer:
Fastest-Rising Tech Skills.” Occupation Spotlight (blog),
November 26, 2019. https://www.hiringlab.org/2019/11/26
/fastest-rising-tech-skills/.
Fraiberger, Samuel P., Pablo Astudillo, Lorenzo Candeago,
Alex Chunet, Nicholas K. W. Jones, Maham Faisal Khan,
Bruno Lepri, et al. 2020. “Uncovering Socioeconomic
Gaps in Mobility Reduction during the COVID-19 Pan-
demic Using Location Data.” Cornell University, Ithaca,
NY. http://arxiv.org/abs/2006.15195.
Frankfurter, Zoe, Klaudia Kokoszka, David Locke Newhouse,
Ani Rudra Silwal, and Siwei Tian. 2020. “Measuring
Internet Access in Sub-Saharan Africa (SSA).” Poverty
and Equity Notes 31 (August), World Bank, Washington,
DC. https://openknowledge.worldbank.org/bitstream
/handle/10986/34302/Measuring-Internet-in-Access-in
-Sub-Saharan-Africa-SSA.pdf?sequence=1.
Frias-Martinez, Vanessa, Enrique Frias-Martinez, and Nuria
Oliver. 2010. “A Gender-Centric Analysis of Calling
Behavior in a Developing Economy Using Call Detail
Records.” In Artificial Intelligence for Development: Papers
from the AAAI Spring Symposium, edited by Association
for the Advancement of Artificial Intelligence, 37–42.
Technical Report SS-10-01. Menlo Park, CA: AAAI Press.
Ginsberg, Jeremy, Matthew H. Mohebbi, Rajan S. Patel, Lyn-
nette Brammer, Mark S. Smolinski, and Larry Brilliant.
2009. “Detecting Influenza Epidemics Using Search
Engine Query Data.” Nature 457 (February): 1012–14.
https://www.nature.com/articles/nature07634.
Glaeser, Edward L., Hyunjin Kim, and Michael Luca. 2018.
“Nowcasting Gentrification: Using Yelp Data to Quan-
tify Neighborhood Change.” AEA Papers and Proceedings
108 (May): 77–82.
Goldstein, Markus P., Paula Gonzalez Martinez, and
Sreelakshmi Papineni. 2019. “Tackling the Global Prof-
itarchy: Gender and the Choice of Business Sector.”
Policy Research Working Paper 8865, World Bank,
Washington, DC. https://openknowledge.worldbank
.org/handle/10986/31747.
González, Marta C., César A. Hidalgo, and Albert-László
Barabási. 2008. “Understanding Individual Human
Mobility Patterns.” Nature 453 (7196): 779–82.
Gourlay, Sydney, Talip Kilic, and David B. Lobell. 2019. “A
New Spin on an Old Debate: Errors in Farmer-Reported
Production and Their Implications for Inverse Scale–
Productivity Relationship in Uganda.” Journal of Development
Economics 141 (November): 102376. https://doi.org
/10.1016/j.jdeveco.2019.102376.
GSS (Government Statistical Service, UK). 2016. “Com-
petency Framework for the Government Statistician
Group (GSG).” GSS, Office for National Statistics,
London.
Hlatshwayo, Sandile, Anne Oeking, Manuk Ghazanchyan,
David Corvino, Ananya Shukla, and Lamin Leigh. 2018.
“The Measurement and Macro-Relevance of Corruption:
A Big Data Approach.” IMF Working Paper WP/18/195,
International Monetary Fund, Washington, DC. http://
dx.doi.org/10.5089/9781484373095.001.
IEAG (Independent Expert Advisory Group on a Data Revo-
lution for Sustainable Development). 2014. “A World That
Counts: Mobilising the Data Revolution for Sustainable
Development.” Data Revolution Group, United Nations,
New York.
Ihantamalala, Felana Angella, Vincent Herbreteau, Feno
M. J. Rakotoarimanana, Jean Marius Rakotondramanga,
Simon Cauchemez, Bienvenue Rahoilijaona, Gwenaëlle
Pennober, et al. 2018. “Estimating Sources and Sinks of
Malaria Parasites in Madagascar.” Nature Communica-
tions 9 (1): 3897.
ITU (International Telecommunication Union). 2020. “Africa
Is at the AI Innovation Table and ‘Ready for the Next
Wave.’” ITU News, June 23, 2020. https://www.itu.int/en
/myitu/News/2020/06/23/07/55/AI-for-Good-2020-Africa
-innovation.
Jain, Meha, Amit Srivastava, Balwinder Singh, Rajiv Joon,
Andrew Mcdonald, Keitasha Royal, Madeline Lisaius,
et al. 2016. “Mapping Smallholder Wheat Yields and
Sowing Dates Using Micro-Satellite Data.” Remote
Sensing 8 (November): 860. https://doi.org/10.3390/rs8100860.
Jean, Neal, Marshall Burke, Michael Xie, W. Matthew Davis,
David B. Lobell, and Stefano Ermon. 2016. “Combining
Satellite Imagery and Machine Learning to Predict
Poverty.” Science 353 (6301): 790–94. https://doi.org/10.1126
/science.aaf7894.
Kaye, Kate. 2019. “These Companies Claim to Provide ‘Fair-
Trade’ Data Work: Do They?” MIT Technology Review,
August 7. https://www.technologyreview.com/2019/08/07
/133845/cloudfactory-ddd-samasource-imerit-impact
-sourcing-companies-for-data-annotation/.
Kelley, Erin, Gregory Lane, and David Schönholzer. 2020.
“Monitoring in Target Contracts: Theory and Experiment
in Kenyan Public Transit.” Paper presented at Virtual
BREAD/CEPR/STICERD/TCD Conference on Development
Economics, October 1–3, 2020. https://youtu.be
/TU-_xDR3x7I.
Klein, Brennan, Timothy LaRock, Stefan McCabe, Leo
Torres, Filippo Privitera, Lake Brennan, Moritz U. G.
Kraemer, et al. 2020. “Assessing Changes in Commuting
and Individual Mobility in Major Metropolitan Areas
in the United States during the COVID-19 Outbreak.”
Network Science Institute, Northeastern University,
Boston. https://www.networkscienceinstitute.org
/publications/assessing-changes-in-commuting-and
-individual-mobility-in-major-metropolitan-areas-in-the
-united-states-during-the-covid-19-outbreak.
Kongthon, Alisa, Choochart Haruechaiyasak, Jaruwat Pailai,
and Sarawoot Kongyoung. 2012. “The Role of Twitter
during a Natural Disaster: Case Study of 2011 Thai Flood.”
In 2012 Proceedings of PICMET ’12: Technology Management
for Emerging Technologies, edited by Institute of Electrical
and Electronics Engineers, 2227–32. Red Hook, NY: Cur-
ran Associates.
Kraemer, Moritz U. G., Nick Golding, Dionisio Bisanzio,
Samir Bhatt, David M. Pigott, S. E. Ray, O. J. Brady, et al.
2019. “Utilizing General Human Movement Models to
Predict the Spread of Emerging Infectious Diseases in
Resource Poor Settings.” Scientific Reports 9 (March): 5151.
https://doi.org/10.1038/s41598-019-41192-3.
Lai, Shengjie, Nick W. Ruktanonchai, Liangcai Zhou, Olivia
Prosper, Wei Luo, Jessica R. Floyd, Amy Wesolowski,
et al. 2020. “Effect of Non-Pharmaceutical Interventions
to Contain COVID-19 in China.” Nature 585 (7825): 410–13.
https://doi.org/10.1038/s41586-020-2293-x.
Lambert, Marie-Julie, Pierre C. Sibiry Traoré, Xavier Blaes,
Philippe Baret, and Pierre Defourny. 2018. “Estimating
Smallholder Crops Production at Village Level from
Sentinel-2 Time Series in Mali’s Cotton Belt.” Remote
Sensing of Environment 216 (October): 647–57. https://doi
.org/10.1016/j.rse.2018.06.036.
Lazer, David, Ryan Kennedy, Gary King, and Alessandro
Vespignani. 2014. “The Parable of Google Flu: Traps in
Big Data Analysis.” Science 343 (6176): 1203–05. https://doi
.org/10.1126/science.1248506.
Le Menach, Arnaud, Andrew J. Tatem, Justin M. Cohen,
Simon I. Hay, Heather Randell, Anand P. Patil, and David
L. Smith. 2011. “Travel Risk, Malaria Importation, and
Malaria Transmission in Zanzibar.” Scientific Reports 1: 93.
https://www.nature.com/articles/srep00093.
Llorente, Alejandro, Manuel Garcia-Herranz, Manuel
Cebrian, and Esteban Moro. 2015. “Social Media Finger-
prints of Unemployment.” PLoS ONE 10 (5): e0128692.
https://doi.org/10.1371/journal.pone.0128692.
Lobell, David B., George Azzari, Marshall Burke, Sydney
Gourlay, Zhenong Jin, Talip Kilic, and Siobhan Murray.
2020. “Eyes in the Sky, Boots on the Ground: Assessing
Satellite- and Ground-Based Approaches to Crop Yield
Measurement and Analysis.” American Journal of
Agricultural Economics 102 (1): 202–19. https://doi.org/10.1093/ajae
/aaz051.
Lu, Xin, Linus Bengtsson, and Petter Holme. 2012. “Predict-
ability of Population Displacement after the 2010 Haiti
Earthquake.” PNAS, Proceedings of the National Academy of
Sciences of the United States of America 109 (29): 11576–81.
https://doi.org/10.1073/pnas.1203882109.
Maas, Paige, Shankar Iyer, Andreas Gros, Wonhee Park,
Laura McGorman, Chaya Nayak, and P. Alex Dow. 2019.
“Facebook Disaster Maps: Aggregate Insights for Crisis
Response and Recovery.” In Conference Proceedings: 16th
International Conference on Information Systems for Crisis
Response and Management, edited by Zeno Franco, José
J. González, and José H. Canós, 836–47. Valencia, Spain:
Polytechnic University of Valencia.
Marshall, Sarah. 2012. “Citizen Journalists Report Sierra
Leone Elections by SMS.” Journalism, November 20, 2012.
https://www.journalism.co.uk/news/citizen-journalists
-report-sierra-leone-elections-by-sms-/s2/a551240/.
Masaki, Takaaki, David Locke Newhouse, Ani Rudra Silwal,
Adane Bedada, and Ryan Engstrom. 2020. “Small Area
Estimation of Non-Monetary Poverty with Geospatial
Data.” Policy Research Working Paper 9383, World Bank,
Washington, DC.
McCall, Becky. 2020. “COVID-19 and Artificial Intelligence:
Protecting Health-Care Workers and Curbing the
Spread.” Lancet Digital Health 2 (4): e166–e167. https://doi
.org/10.1016/S2589-7500(20)30054-6.
McGough, Sarah F., John S. Brownstein, Jared B. Hawkins,
and Mauricio Santillana. 2017. “Forecasting Zika Inci-
dence in the 2016 Latin America Outbreak Combining
Traditional Disease Surveillance with Search, Social
Media, and News Report Data.” PLoS Neglected Tropical
Diseases 11 (1): e0005295.
Mehrabi, Zia, Mollie J. McDowell, Vincent Ricciardi, Chris-
tian Levers, Juan Diego Martinez, Natascha Mehrabi,
Hannah Wittman, et al. 2020. “The Global Divide in
Data-Driven Farming.” Nature Sustainability 4 (February
2021): 154–60. https://doi.org/10.1038/s41893-020-00631-0.
Milinovich, Gabriel J., Gail M. Williams, Archie C. A.
Clements, and Wenbiao Hu. 2014. “Internet-Based Sur-
veillance Systems for Monitoring Emerging Infectious
Diseases.” Lancet Infectious Diseases 14 (2): 160–68. https://
doi.org/10.1016/S1473-3099(13)70244-5.
Milusheva, Sveta. 2020. “Managing the Spread of Disease
with Mobile Phone Data.” Journal of Development
Economics 147 (November): 102559. https://doi.org/10.1016/j
.jdeveco.2020.102559.
Milusheva, Sveta, Robert Marty, Guadalupe Bedoya, Eliza-
beth Resor, Sarah Williams, and Arianna Legovini. 2020.
“Can Crowdsourcing Create the Missing Crash Data?”
In COMPASS ’20: Proceedings of the 3rd ACM SIGCAS
Conference on Computing and Sustainable Societies, 305–06.
New York: Association for Computing Machinery.
https://doi.org/10.1145/3378393.3402264.
Oliver, Nuria, Bruno Lepri, Harald Sterly, Renaud Lambi-
otte, Sébastien Deletaille, Marco De Nadai, Emmanuel
Letouzé, et al. 2020. “Mobile Phone Data for Informing
Public Health Actions across the COVID-19 Pandemic
Life Cycle.” Science Advances 6 (23): eabc0764. https://doi
.org/10.1126/sciadv.abc0764.
PAHO (Pan American Health Organization) and WHO
(World Health Organization). 2016. “Zika Cases and
Congenital Syndrome Associated with Zika Virus
Reported by Countries and Territories in the Americas:
Cumulative Cases, 2015–2016.” PAHO, Washington,
DC. https://www.paho.org/hq/dmdocuments/2016/2016
-dec-29-phe-ZIKV-cases.pdf.
Peak, Corey M., Amy Wesolowski, Elisabeth zu Erbach-
Schoenberg, Andrew J. Tatem, Erik Wetter, Xin Lu,
Daniel Power, et al. 2018. “Population Mobility Reduc-
tions Associated with Travel Restrictions during the
Ebola Epidemic in Sierra Leone: Use of Mobile Phone
Data.” International Journal of Epidemiology 47 (5): 1562–70.
Pepe, Emanuele, Paolo Bajardi, Laetitia Gauvin, Filippo
Privitera, Brennan Lake, Ciro Cattuto, and Michele
Tizzoni. 2020. “COVID-19 Outbreak Response: A Dataset
to Assess Mobility Changes in Italy Following National
Lockdown.” Scientific Data 7: 230. https://doi.org/10.1038
/s41597-020-00575-2.
Perisic, Igor. 2018. “How Artificial Intelligence Is Already
Impacting Today’s Jobs.” Economic Graph (blog),
September 17, 2018. https://economicgraph.linkedin.com/blog
/how-artificial-intelligence-is-already-impacting-todays
-jobs.
Perol, Thibaut, Michaël Gharbi, and Marine Denolle. 2018.
“Convolutional Neural Network for Earthquake Detec-
tion and Location.” Science Advances 4 (2): e1700578.
https://doi.org/10.1126/sciadv.1700578.
Pratihast, Arun Kumar, Ben DeVries, Valerio Avitabile, Sytze
De Bruin, Lammert Kooistra, Mesfin Tekle, and Martin
Herold. 2014. “Combining Satellite Data and Community-
Based Observations for Forest Monitoring.” Forests 5 (10):
2464–89. https://doi.org/10.3390/f5102464.
Resch, Bernd, Florian Usländer, and Clemens Havas. 2018.
“Combining Machine-Learning Topic Models and Spa-
tiotemporal Analysis of Social Media Data for Disaster
Footprint and Damage Assessment.” Cartography and
Geographic Information Science 45 (4): 362–76.
Reynard, Darcy, and Manish Shirgaokar. 2019. “Harnessing
the Power of Machine Learning: Can Twitter Data Be
Useful in Guiding Resource Allocation Decisions during
a Natural Disaster?” Transportation Research Part D: Trans-
port and Environment 77 (December): 449–63.
Ritchie, Hannah, and Max Roser. 2019. “Natural Disasters.”
Our World in Data. Global Change Data Lab and Oxford
Martin Program on Global Development, University of
Oxford, Oxford, UK. https://ourworldindata.org/natural
-disasters.
Robinson, Bella Fay, Robert Power, and Mark Cameron.
2013. “A Sensitive Twitter Earthquake Detector.” In
WWW ’13: Proceedings of the 22nd International Conference
on World Wide Web, 999–1002. New York: Association for
Computing Machinery. https://dl.acm.org/doi/10.1145
/2487788.2488101.
Salathé, Marcel, Linus Bengtsson, Todd J. Bodnar, Devon D.
Brewer, John S. Brownstein, Caroline Buckee, Ellsworth
M. Campbell, et al. 2012. “Digital Epidemiology.” PLoS
Computational Biology 8 (7): e1002616.
Saleiro, Pedro, Benedict Kuester, Loren Hinkson, Jesse
London, Abby Stevens, Ari Anisfeld, Kit T. Rodolfa, et al.
2019. “Aequitas: A Bias and Fairness Audit Toolkit.”
Cornell University, Ithaca, NY. https://arxiv.org/abs
/1811.05577.
Salganik, Matthew J. 2017. Bit by Bit: Social Research in the
Digital Age. Princeton, NJ: Princeton University Press.
Serajuddin, Umar, Hiroki Uematsu, Christina Wieser, Nobuo
Yoshida, and Andrew L. Dabalen. 2015. “Data Deprivation:
Another Deprivation to End.” Policy Research Working
Paper 7252, World Bank, Washington, DC.
Servick, Kelly. 2020a. “Cellphone Tracking Could Help Stem
the Spread of Coronavirus: Is Privacy the Price?” Science,
March 22. https://www.sciencemag.org/news/2020/03
/cellphone-tracking-could-help-stem-spread-coronavirus
-privacy-price.
Servick, Kelly. 2020b. “COVID-19 Contact Tracing Apps
Are Coming to a Phone Near You: How Will We Know
Whether They Work?” Science, May 21. https://www
.sciencemag.org/news/2020/05/countries-around-world
-are-rolling-out-contact-tracing-apps-contain-corona
virus-how.
Stephens-Davidowitz, Seth. 2017. Everybody Lies: Big Data,
New Data, and What the Internet Can Tell Us about Who We
Really Are. New York: HarperCollins.
Strubell, Emma, Ananya Ganesh, and Andrew McCallum.
2019. “Energy and Policy Considerations for Deep Learning
in NLP.” Proceedings of the 57th Annual Meeting of the
Association for Computational Linguistics, Florence,
Italy, July 2019.
Sylvester, Gerard, ed. 2019. “E-Agriculture in Action: Big Data
for Agriculture.” Food and Agriculture Organization of
the United Nations and International Telecommunication
Union, Bangkok. http://www.fao.org/3/ca5427en
/ca5427en.pdf.
Tatem, Andrew J., Youliang Qiu, David L. Smith, Oliver Sabot,
Abdullah S. Ali, and Bruno Moonen. 2009. “The Use of
Mobile Phone Data for the Estimation of the Travel
Patterns and Imported Plasmodium Falciparum Rates
among Zanzibar Residents.” Malaria Journal 8 (December):
287. https://doi.org/10.1186/1475-2875-8-287.
Tiecke, Tobias G., and Andreas Gros. 2016. “Connecting the
World with Better Maps.” Facebook Engineering (blog),
February 22, 2016. https://engineering.fb.com/core-data
/connecting-the-world-with-better-maps/.
Vale, Steven, and Taeke Gjaltema. 2020. “High-Level Group
for the Modernisation of Official Statistics.” United
Nations Economic Commission for Europe, Geneva.
https://statswiki.unece.org/display/hlgbas/High-Level+
Group+for+the+Modernisation+of+Official+Statistics.
Wallace, Eric, Shi Feng, Nikhil Kandpal, Matt Gardner, and
Sameer Singh. 2019. “Universal Adversarial Triggers
for Attacking and Analyzing NLP.” Cornell University,
Ithaca, NY. http://arxiv.org/abs/1908.07125.
Wesolowski, Amy, Nathan Eagle, Abdisalan M. Noor, Robert
W. Snow, and Caroline O. Buckee. 2012. “Heterogeneous
Mobile Phone Ownership and Usage Patterns in Kenya.”
PLoSONE7(4):e35319.https://doi.org/10.1371/journal.pone
.0035319.
WHO (World Health Organization). 2008. “The Top 10 Causes of Death.” Fact Sheets (blog), May 24, 2008. https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death.
Williams, Kehinde, Adebayo Peter Idowu, and Emmanuel Olonade. 2015. “Online Road Traffic Accident Monitoring System for Nigeria.” Transactions on Networks and Communications 3 (1): 10–30. https://doi.org/10.14738/tnc.31.589.
Wilson, Robin, Elisabeth zu Erbach-Schoenberg, Maximilian Albert, Daniel Power, Simon Tudge, Miguel Gonzalez, Sam Guthrie, et al. 2016. “Rapid and Near Real-Time Assessments of Population Displacement Using Mobile Phone Data Following Disasters: The 2015 Nepal Earthquake.” PLoS Currents 8 (February 24). https://doi.org/10.1371/currents.dis.d073fbece328e4c39087bc086d694b5c.
WRI (World Resources Institute). 2019. “Palm Oil Industry to Jointly Develop Radar Monitoring Technology to Detect Deforestation.” Press release, October 31, 2019. https://www.wri.org/news/2019/10/release-palm-oil-industry-jointly-develop-radar-monitoring-technology-detect.
Yang, Shihao, Samuel C. Kou, Fred Lu, John S. Brownstein, Nicholas Brooke, and Mauricio Santillana. 2017. “Advances in Using Internet Searches to Track Dengue.” PLoS Computational Biology 13 (7): e1005607.
Yeh, Christopher, Anthony Perez, Anne Driscoll, George Azzari, Zhongyi Tang, David B. Lobell, Stefano Ermon, et al. 2020. “Using Publicly Available Satellite Imagery and Deep Learning to Understand Economic Well-Being in Africa.” Nature Communications 11 (1): 2583. https://doi.org/10.1038/s41467-020-16185-w.
Zindi. 2020. “GIZ AI4D Africa Language Challenge, Round 2: $6,000 USD.” Competitions, June 1, 2020. https://zindi.africa/competitions/ai4d-african-language-dataset-challenge.
Zou, James, and Londa Schiebinger. 2018. “AI Can Be Sexist and Racist: It’s Time to Make It Fair.” Nature 559 (7714): 324–26. https://doi.org/10.1038/d41586-018-05707-8.
150 | World Development Report 2021
Two-way flows of data between local, regional, and international
meteorological centers have high value for social and economic
development.
Spotlight 4.1
Gathering, sharing, and using better
data on weather, water, and climate
from low- and middle-income
countries
The need for weather and climate information is
growing rapidly as people are becoming more vulner-
able to natural hazards, including those exacerbated
by climate change. To support economic and social
development, all countries need to have access to
sufficiently accurate, reliable, and understandable
weather, water, and climate data (as stated in the Sus-
tainable Development Goals).1
This is the case every-
where, but particularly in low- and middle-income
countries, which are bearing the brunt of losses from
natural hazards.2
Scientific and technological advances have brought
weather prediction systems to a level where they can
provide weather intelligence to inform the decisions
of individuals and groups of individuals.3
But such forecasts are useful for decision-making only if more
high-quality observational data are shared, assimilated into
models, or used to adjust model outputs. While satellites provide
most of the data for models, local data play several
critical roles. Scientists from national meteorological
services work with regional and global centers to cali-
brate global models at national and local scales.
The more local data are shared internationally,
the better the weather predictions produced by global
centers can be applied locally. Exchanging more and
better data internationally and doing so more fre-
quently have many direct benefits, yielding better-
performing models, more accurate local forecasts,
and improved verification of forecasts, helping to
monitor, improve, and compare the quality of fore-
casts and forecasting systems.
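Forecast verification, in its simplest form, compares the forecasts issued for a station with what that station later observed. The sketch below is a minimal illustration under stated assumptions, not ECMWF's or any national service's procedure: it computes the mean error (bias) and root-mean-square error (RMSE) at one station, then applies the crudest possible post-processing, subtracting the station's past mean bias from new forecasts. The function names and temperature values are hypothetical.

```python
import math
from statistics import mean


def verify(forecasts, observations):
    """Return (bias, rmse) for paired forecasts and station observations.

    bias > 0 means the model ran warm (forecast above observed) on average.
    """
    if not forecasts or len(forecasts) != len(observations):
        raise ValueError("need non-empty sequences of equal length")
    errors = [f - o for f, o in zip(forecasts, observations)]
    bias = mean(errors)
    rmse = math.sqrt(mean(e * e for e in errors))
    return bias, rmse


def bias_correct(new_forecasts, past_forecasts, past_observations):
    """Shift new forecasts by the mean error seen at this station in the past."""
    bias, _ = verify(past_forecasts, past_observations)
    return [f - bias for f in new_forecasts]


# Hypothetical near-surface temperatures (deg C) at one station over four days.
past_fc = [21.0, 23.5, 19.0, 25.0]
past_obs = [20.0, 22.0, 18.5, 23.5]
bias, rmse = verify(past_fc, past_obs)
print(f"bias={bias:.3f} C, rmse={rmse:.3f} C")  # bias=1.125 C, rmse=1.199 C
print(bias_correct([24.0], past_fc, past_obs))  # [22.875]
```

With only a few days of data such a correction is noisy; the sketch simply shows why a longer, shared record of local observations is what allows model output to be checked and adjusted at national and local scales.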
Recognizing the benefit of sharing national
data with regional and global forecasting centers,
Ukraine recently increased the number of weather
stations reporting data to the European Centre for
Medium-Range Weather Forecasts (ECMWF) from
30 to 130.4
The extra data produced by these stations
will help to improve global forecasts and thus regional
and national forecasts. In particular, local observa-
tions of near-surface temperature and humidity will
improve estimates of soil moisture, which influence
regional and global forecasts of near-surface tem-
perature and rainfall. Additional data on snow depth
from Denmark, Hungary, the Netherlands, Romania,
Sweden, and Switzerland have improved forecasts of
air temperature in the northern hemisphere.
More data are now being shared in real time
between ECMWF and all 37 countries participating
in the Regional Integrated Multi-Hazard Early Warn-
ing System for Africa and Asia (RIMES).5
The Bangla-
desh Meteorological Department, for example, has
increased from 10 to 32 the number of stations shar-
ing observational data taken every three hours and
provided nearly 40 years of historical data. The total
number of stations added by all RIMES members is
now 500 and is expected to increase to 1,500 soon,
leading to a significant improvement in the accuracy
and lead time of weather forecasts.
However, these data are not categorized as essen-
tial data, as defined by the World Meteorological
Organization,6
and are not considered open data from
the perspective of their use and reuse. For this reason,
RIMES needs to ensure that these data are protected
by nondisclosure agreements. In return, ECMWF
shares high-resolution digital forecast products with
each participating country, with the aim of improving
national forecasts and deepening the technical collab-
oration between RIMES countries and ECMWF. With
access to these high-resolution forecast products
from ECMWF, countries can focus more efforts on
applying forecast information to the needs of their
population and on building skills in data analytics.7
As map S4.1.1 shows, significant gaps in reporting
basic weather data still exist. Important steps to take
are getting countries to recognize the value of sharing
their data and to participate in improving the mod-
els. In time, it is anticipated that open data policies
similar to the European Union’s Open Data Directive will apply to
meteorological data everywhere, characterizing these
data as having high value for social and economic
development. The German Meteorological Service,
for example, has started openly sharing all of the data
it uses for its public tasks.8
Now more than 500 terabytes of data are downloaded monthly and used by a
wide range of industries in Germany to improve their
economic performance.
Map S4.1.1 Large gaps remain in global reporting on basic weather data
Source: World Bank map, based on data from WDQMS (WIGOS Data Quality Monitoring System) (webtool), World Meteorological Organization, Geneva,
https://wdqms.wmo.int. Data at http://bit.do/WDR2021-Map-S4_1_1.
Note: Snapshot of World Meteorological Organization Integrated Global Observing System interactive map showing observations of surface temperature
measured on a typical day (December 14) in 2020. NWP = numerical weather prediction; OSCAR = Observing Systems Capability Analysis and Review Tool.
Notes
1. See United Nations, Sustainable Development Goals: Decade of Action (dashboard), https://www.un.org/sustainabledevelopment/decade-of-action/.
2. Hallegatte, Bangalore, and Vogt-Schilb (2016).
3. Charts Catalogue (weather map repository), European Centre for Medium-Range Weather Forecasts, https://apps.ecmwf.int/webapps/opencharts/?.
4. ECMWF (2018).
5. RIMES (2020).
6. Zillman (2019).
7. ECMWF (2020).
8. See Deutscher Wetterdienst, Open Data Server (dashboard), https://www.dwd.de/EN/ourservices/opendata/opendata.html.
References
ECMWF (European Centre for Medium-Range Weather Forecasts). 2018. “Extra Weather Station Data Improve ECMWF’s Forecasts.” News, June 18, 2018. https://www.ecmwf.int/en/about/media-centre/news/2018/extra-weather-station-data-improve-ecmwfs-forecasts.
ECMWF (European Centre for Medium-Range Weather Forecasts). 2020. “ECMWF Moves towards a Policy of Open Data.” News, October 7, 2020. https://www.ecmwf.int/en/about/media-centre/news/2020/ecmwf-moves-towards-policy-open-data.
Hallegatte, Stéphane, Mook Bangalore, and Adrien Vogt-Schilb. 2016. “Socioeconomic Resilience: Multi-Hazard Estimates in 117 Countries.” Policy Research Working Paper 7886, World Bank, Washington, DC.
RIMES (Regional Integrated Multi-Hazard Early Warning System for Africa and Asia). 2020. “37 RIMES Countries Agree to Share Data and Focus on Impact Forecasting and Forecast-Based Action for 2021–2025.” RIMES, Asian Institute of Technology, Pathumthani, Thailand, November 26, 2020. https://www.rimes.int/?q=node/630.
Zillman, John W. 2019. “Origin, Impact, and Aftermath of WMO Resolution 40.” WMO Bulletin 68 (2): 69–71. https://library.wmo.int/doc_num.php?explnum_id=10077.
[Map S4.1.1 legend. Availability of surface land observations (global NWP) as of December 14, 2020: more than 100%; normal (≥ 80%); availability issues (≥ 30%); availability issues (< 30%); not received in period; OSCAR schedule issues; no match in OSCAR/surface land observation. IBRDWLD45650 | MARCH 2021]
Developing spatially integrated datasets that leverage administrative,
social media, private, and more traditional sources can help to pave the
way for smart and socially beneficial investments and policies.
Spotlight 4.2
Making roads safer by repurposing
private intent traffic data
Reducing road mortality by half (Sustainable Develop-
ment Goal 3.6) could save 675,000 lives a year. Yet the
countrywide investments in regulation, enforcement,
education, and infrastructure needed to attain this
ambitious target are often out of reach. Resource con-
straints require countries to make smart choices on
where and how to invest. Knowing where and when
road traffic crashes happen can help to prioritize
investments where they matter most. Unfortunately,
many countries facing these difficult choices have
little or no data on road traffic crashes and inadequate
capacity to analyze the data they do have. Official data
on road traffic crashes capture only 56 percent of fatal-
ities in low- and middle-income countries, on average.1
Crash reports exist, yet they are buried in piles
of paper or collected by private operators instead of
being converted into useful data or disseminated to
the people who need the information to make policy
decisions. In Kenya, where official figures underreport
the number of fatalities by a factor of 4.5,2
the
rapid expansion of mobile phones and social media
provides an opportunity to leverage commuter
reports on traffic conditions as a potential source of
data on road traffic crashes.
Big data mining, combined with digitization of
official paper records, has demonstrated how dispa-
rate data can be leveraged to inform urban spatial
analysis, planning, and management.3
Researchers
worked in close collaboration with the National
Police Service to digitize more than 10,000 situation
reports spanning from 2013 to 2020 from the 14
police stations in Nairobi to create the first digital
and geolocated administrative dataset of individual
crashes in the city. They combined administrative
data with data crowdsourced through Ma3Route, a mobile
application and short message service (SMS) traffic
platform that has more than 1.1 million subscribers in
Kenya. They analyzed
870,000 transport-related tweets submitted between
2012 and 2020 to identify and geolocate 36,428 crash
reports by developing and improving natural lan-
guage processing and geoparsing algorithms.4
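The study's classifiers and geoparser, which build on the named entity recognition and microtext geoparsing work cited in its notes, are not reproduced here. As a much simpler, hypothetical illustration of the two steps (flag tweets that describe a crash, then attach coordinates by matching place names), the sketch below uses a keyword set and a tiny gazetteer. The keywords, road list, and rough coordinates are assumptions chosen for illustration, not data from the study.

```python
import re

# Hypothetical crash vocabulary; a real classifier would be trained, not hand-listed.
CRASH_TERMS = {"accident", "crash", "collision", "overturned", "knocked"}

# Hypothetical gazetteer: place name -> rough (latitude, longitude), illustrative only.
GAZETTEER = {
    "thika road": (-1.2195, 36.8880),
    "mombasa road": (-1.3300, 36.8700),
    "waiyaki way": (-1.2600, 36.7800),
}


def is_crash_report(text):
    """Flag a tweet if any lowercase word token is a crash term."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return bool(tokens & CRASH_TERMS)


def geoparse(text):
    """Return (place, coords) for the longest gazetteer name found, else None."""
    lowered = text.lower()
    matches = [name for name in GAZETTEER if name in lowered]
    if not matches:
        return None
    best = max(matches, key=len)  # prefer the most specific match
    return best, GAZETTEER[best]


def extract_crashes(tweets):
    """Keep tweets that look like crash reports and can be geolocated."""
    results = []
    for tweet in tweets:
        if is_crash_report(tweet):
            located = geoparse(tweet)
            if located is not None:
                results.append((tweet, *located))
    return results


sample = [
    "Accident on Thika Road near the flyover, avoid",
    "Heavy traffic on Mombasa Road this morning",
    "Matatu overturned at Waiyaki Way",
]
for tweet, place, (lat, lon) in extract_crashes(sample):
    print(place, lat, lon)
```

A working system would also have to cope with misspellings, mixed-language posts, ambiguous place names, and many users reporting the same event; the study's map shows crowdsourced reports geolocated and then clustered into individual crashes, a step this sketch omits.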
To verify the accuracy of the crowdsourced reports
and the performance of the algorithms for a subset of reports,
the team had a motorcycle delivery company send a rider
to the reported crash site within minutes of each new crash
report. In 92 percent of cases, a crash was verified to have occurred
in the stated location or nearby. By combining these
sources of data, researchers were able to identify the 5
percent of roads (crash black spots) where 50 percent
of the road traffic deaths occur in the city (map S4.2.1).
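One way to express the black-spot finding is as a concentration measure: rank road segments by how dangerous they are and ask how small a share of the network accounts for half of the deaths. The sketch below ranks hypothetical segments by deaths per kilometer and adds them greedily until the target share is reached. Greedy ranking is a simple heuristic (it is not guaranteed to find the shortest possible set), and the study's own segmentation and clustering method is not reproduced.

```python
def black_spots(segments, target_share=0.5):
    """Greedy concentration analysis over road segments.

    segments: list of (segment_id, length_km, deaths), with length_km > 0.
    Returns (chosen_ids, share_of_network_length) for the highest
    deaths-per-km segments that together reach target_share of all deaths.
    """
    total_deaths = sum(deaths for _, _, deaths in segments)
    total_length = sum(length for _, length, _ in segments)
    if total_deaths == 0:
        return [], 0.0
    ranked = sorted(segments, key=lambda s: s[2] / s[1], reverse=True)
    chosen, covered_deaths, covered_length = [], 0, 0.0
    for seg_id, length, deaths in ranked:
        if covered_deaths >= target_share * total_deaths:
            break
        chosen.append(seg_id)
        covered_deaths += deaths
        covered_length += length
    return chosen, covered_length / total_length


# Hypothetical network: (id, length in km, deaths over the study period).
network = [("A", 1.0, 10), ("B", 2.0, 2), ("C", 5.0, 3), ("D", 2.0, 5)]
print(black_spots(network))        # (['A'], 0.1)
print(black_spots(network, 0.75))  # (['A', 'D'], 0.3)
```

In this toy network, 10 percent of the road length carries half of the deaths; with real data the same calculation is what turns a network-wide problem into a short list of candidate sites.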
This exercise demonstrates that addressing data
scarcity can transform an intractable problem into a more
manageable one. In this case, investing in the safety of
a 6,200-kilometer road network is intractable. Digitiz-
ing and analyzing administrative data and variables
on injuries and deaths can help to narrow down the
locations and times of the day and week that are
associated with the most severe crashes. The analysis
offers an invaluable road map for future regulation,
infrastructure, and enforcement efforts.
More insights can be gained by integrating exist-
ing data and collecting further information, such as
Uber and Waze data on average speeds on road seg-
ments and on road obstacles; Google Maps data on
land use; and weather data on driving conditions. The
researchers also invested in a massive data collection
effort that is surveying the infrastructure and video-
taping and coding the behavior of road users in 200
crash black spots in the city. The analysis of these new
data will generate hypotheses to optimize the policy
response to the road safety problem.
Overall, developing spatially integrated datasets
that leverage administrative, social media, private,
and more traditional sources can help to fill data gaps
and pave the way for smart and socially beneficial
investments and policies.
Source: Milusheva et al. 2020.
Note: Data shown are for July 2017–July 2018.
Map S4.2.1 By combining police reports and crowdsourced data, researchers were
able to identify the 5 percent of roads where half of the crashes occur in Nairobi
Notes
1. WDR 2021 and World Health Organization (WHO) team
calculations based on a comparison of reported deaths
from the WHO Global Status Report on Road Safety (WHO
2018).
2. WHO (2018).
3. Milusheva et al. (2020).
4. The new algorithms build on work by Finkel, Grenager,
and Manning (2005); Gelernter and Balaji (2013); and
Ritter et al. (2011).
References
Finkel, Jenny Rose, Trond Grenager, and Christopher Manning. 2005. “Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling.” In 43rd Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference, edited by Kevin Knight, Hwee Tou Ng, and Kemal Oflazer, 363–70. New Brunswick, NJ: Association for Computational Linguistics. https://www.aclweb.org/anthology/P05-1045.
Gelernter, Judith, and Shilpa Balaji. 2013. “An Algorithm for Local Geoparsing of Microtext.” GeoInformatica 17 (4): 635–67. https://doi.org/10.1007/s10707-012-0173-8.
Milusheva, Sveta, Robert Marty, Guadalupe Bedoya, Eliza-
beth Resor, Sarah Williams, and Arianna Legovini. 2020.
“Can Crowdsourcing Create the Missing Crash Data?”
In COMPASS ’20: Proceedings of the 3rd ACM SIGCAS Confer-
ence on Computing and Sustainable Societies, 305–06. New
York: Association for Computing Machinery. https://doi
.org/10.1145/3378393.3402264.
Ritter, Alan, Sam Clark, Mausam, and Oren Willi Etzioni.
2011. “Named Entity Recognition in Tweets: An Exper-
imental Study.” In Conference on Empirical Methods in
Natural Language Processing: Proceedings of the Conference,
1524–34. Stroudsburg, PA: Association for Computa-
tional Linguistics.
WHO (World Health Organization). 2018. Global Status
Report on Road Safety 2018. Geneva: WHO. https://www
.who.int/publications/i/item/9789241565684.
[Map S4.2.1 legend: crashes geolocated from police situation reports; crashes identified by crowdsourced reports that were geolocated and clustered into individual crashes. IBRDKEN45648 | MARCH 2021]
PART II
Aligning data governance with the social contract
5. Data infrastructure policy: Ensuring equitable access for poor people and poor countries
6. Data policies, laws, and regulations: Creating a trust environment
7. Creating value in the data economy: The role of competition, trade, and tax policy
8. Institutions for data governance: Building trust through collective action
Data infrastructure policy: Ensuring equitable access for poor people and poor countries | 157
CHAPTER 5
Data infrastructure policy: Ensuring equitable access for poor people and poor countries
Main messages
1. As new mobile technologies emerge, policy makers should proactively facilitate their rollout by promoting service competition, where possible, and infrastructure sharing, where necessary.
2. Universal service policies should incorporate measures designed to ease the demand-side barriers often faced by those who do not seek data services even when they are locally available. These measures include programs to improve the affordability of handsets and data services, while enhancing the digital literacy of excluded groups.
3. To ensure high-speed, cost-effective data services, policy makers should facilitate development of domestic data infrastructure that allows local storage, processing, and exchange of data so that data need not travel through distant overseas facilities.
4. A competitive market and open governance arrangements are two policies that support the creation of internet exchange points. Establishment of colocation data centers will depend on a stable investment climate for private sector investors, combined with the availability of low-cost reliable sources of clean energy.
Data infrastructure as a source of inequity
Infrastructure is a prerequisite for collecting,
exchanging, storing, processing, and distributing
modern data because of its digital character. Harnessing
the full economic and social value of modern
data services calls for digital infrastructure that is
universally accessible, while also offering adequate
internet speed at affordable cost. Yet the developing
world is lagging behind, with major gaps between
rich and poor people on broadband connectivity, and
a substantial divide emerging between rich and poor
countries in the availability of data infrastructure.
Well-designed infrastructure policies are needed to
redress these adverse trends.
Concerns about inequities in access to data infra-
structure stem from growing evidence of a link with
economic activity. Numerous studies have found that
broadband infrastructure boosts economic growth,1
increasing productivity2
and employment3
while
enabling digital enterprises. For example, the arrival
of fiber-optic submarine cables in Africa has had pos-
itive effects on employment from the entry of new
firms, greater productivity, and higher exports.4
More
broadly, a 10 percent increase in data centers results
in an expansion of exports in data-related services of
about 1.6 percent.5
As a growing share of economic
activity becomes data-enabled, it is important to
ensure that poor people and poor countries are not
excluded from such opportunities by the absence of
suitable data infrastructure.
Data infrastructure forms a supply chain that
originates in global data storage centers and data
processing facilities known as cloud computing
platforms (figure 5.1). From there, data pass through
internet exchange points (IXPs) for transfer to users.
Data then flow in and out of countries through
an intercontinental network of submarine cables.
Once on domestic soil, data are distributed through
national fiber-optic and microwave backbone networks
until they reach a proximate location for distribution
to local communities, whether through wired
connections or wireless signals provided by mobile
(or cellular) networks. Finally, data are exchanged
with individuals, businesses, and public institutions
through fixed lines or wireless signals from cellular
towers, and increasingly with inanimate machines,
cameras, and sensors connected to what is known as
the Internet of Things (IoT). Data travel thousands of
kilometers along this seamless infrastructure supply
chain at breathtaking speeds of 200,000 kilometers
per second, meaning that digital data can, in principle,
circle the globe five times within a second.6

Figure 5.1 The data infrastructure supply chain
[Diagram: cross-border network (international links, over 1,000 km) connecting service centers to the worldwide internet; middle-mile network (distribution, 10–1,000 km) connecting bigger cities and routing data to service centers; last mile (access, 1–5 km) distributing internet from the middle mile to individual homes and businesses. Technologies shown include subsea cable, terrestrial fiber, satellite links and earth stations, fiber or microwave backbones, IXPs, colocation data centers, DSL, coaxial cable, Wi-Fi, LTE, TV white space, direct-to-home satellite links, cellular towers, and balloons and drones.]
Source: Adapted from World Bank (2019c).
Note: DSL = digital subscriber line; DTH = direct-to-home; IXP = internet exchange point; km = kilometers; LTE = Long-Term Evolution; WWW = World Wide Web (internet).
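The "five times within a second" figure follows from dividing the quoted signal speed by the Earth's equatorial circumference (about 40,075 km). The same arithmetic gives a feel for why route length matters: the sketch below also compares one-way propagation delay over a short domestic path with a hypothetical detour through an overseas exchange and back. It counts propagation only (real latency adds routing, queuing, and processing delays), and the route distances are illustrative assumptions.

```python
EARTH_CIRCUMFERENCE_KM = 40_075   # equatorial circumference
SIGNAL_SPEED_KM_PER_S = 200_000   # speed quoted in the text (roughly light in fiber)


def laps_per_second(speed_km_s=SIGNAL_SPEED_KM_PER_S):
    """How many times a signal could circle the equator in one second."""
    return speed_km_s / EARTH_CIRCUMFERENCE_KM


def one_way_delay_ms(path_km, speed_km_s=SIGNAL_SPEED_KM_PER_S):
    """Propagation delay in milliseconds over a path of path_km kilometers."""
    return path_km / speed_km_s * 1_000


print(f"{laps_per_second():.2f} laps per second")  # 4.99 laps per second

# Hypothetical routes between two users in the same city:
local_km = 50            # exchanged at a domestic IXP
detour_km = 2 * 8_000    # out to an overseas exchange and back
print(f"local: {one_way_delay_ms(local_km):.2f} ms, "
      f"overseas detour: {one_way_delay_ms(detour_km):.1f} ms")
```

Even at these speeds, a domestic exchange that is routed overseas adds tens of milliseconds per direction, before any congestion on international links is counted.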
Data traffic is growing rapidly around the world.
Internet data usage rose from 4.6 to 13 gigabytes per
person per month between 2012 and 2017.7
Four trends
are driving the explosion in data traffic. First, the
number of internet users is growing. More than half
of the world’s population is now online, up from less
than one-third in 2010, and that share is forecast to
reach two-thirds by 2023. Second, the number of con-
nected devices on the IoT already exceeds the number
of human users and is forecast to reach 25 billion
by 2025 with the diffusion of 5G technology.8 Third,
internet speeds are continually increasing, which
supports growing data volumes. By 2023 the speed
of broadband service provided over fixed networks is
expected to double from 2018 levels,9 even as the speed
of broadband service provided over mobile networks
triples. Fourth, video accounts for three-fifths of internet
traffic, and associated quality improvements are
increasing video data traffic.10
A two-hour movie in
standard definition uses 1.4 gigabytes of data, whereas
ultra-high definition uses 18 gigabytes.11
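The per-person usage figures above imply a compound annual growth rate. A minimal sketch, assuming smooth compound growth between the 2012 and 2017 endpoints (actual year-to-year growth need not be smooth):

```python
import math


def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


def doubling_time_years(rate):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)


# Internet data usage per person per month (GB), 2012 -> 2017, from the text.
rate = cagr(4.6, 13, 5)
print(f"growth: {rate:.1%} a year; "
      f"doubling every {doubling_time_years(rate):.1f} years")
```

That works out to roughly 23 percent a year, with usage per person doubling about every three and a third years.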
Although most data traffic is still carried over
fixed networks, data traffic carried over wireless
networks is forecast to rise to more than 20 percent
of the global total by 2022, up from only 3 percent in
2012. This shift is driven by the greater prevalence of
mobile traffic in emerging nations, with China and
India alone accounting for more than 40 percent of
the world’s mobile data traffic as of 2018.
Both poor people and poor countries face fundamen-
tal inequities in their ability to access data infrastruc-
ture. To participate in the data-driven economy, people
require internet connectivity, which entails both access
to last-mile internet infrastructure—increasingly
provided through a wireless signal—and ownership of
a data-enabled mobile handset (also known as a smart-
phone)—or alternatively a full-blown fixed line con-
nection. Such connectivity makes it possible for people
to both have access to data about other people (and
increasingly other things) and provide their own data
to others. Large swathes of the population remain
excluded from the internet, particularly the poor, the
uneducated, the elderly, those living in rural areas,
and—in some parts of the world—women. This com-
plex situation reflects both the supply-side challenges
entailed in rolling out coverage of the latest mobile
technologies and the demand-side barriers preventing
potential users from taking up the service even when it
becomes available. Moreover, because of the growing
volumes of data underpinning economic and social
activity, connectivity is meaningful only if it can be
provided at affordable cost and adequate speed.
Unless countries have access to modern data
infrastructure, connectivity (even when available)
will remain prohibitively expensive and slow. Such
infrastructure begins with adequate international
bandwidth to permit fluid and unconstrained access
to the global internet commons. As traffic grows,
local IXPs are needed to prevent domestic data
transfers from being diverted across vast distances
overseas. The addition of domestic colocation data
centers—wholesale storage facilities that host other
companies’ data—allows substantial volumes of
popular overseas content to be stored locally, further
improving internet performance. It may also permit
direct access to cloud computing platforms, greatly
enhancing data processing capabilities. Although
almost all countries now enjoy access to global inter-
net submarine cables through either direct coastal
access points or cross-border land connections,
domestic data infrastructure (such as IXPs, colocation
data centers, and cloud computing platforms)
remains nascent across low- and middle-income
nations, leaving them to contend with low internet
speeds and high data charges.
This chapter unpacks the underlying issues that
explain the data inequities faced by poor people and
poor countries, with an emphasis on identifying
appropriate policy responses. The chapter updates,
complements, and extends the earlier treatment
of related issues in World Development Report 2016:
Digital Dividends. For this reason, supply-side issues
are treated at a relatively high level, whereas the
demand-side barriers, as well as the emerging chal-
lenges posed by development of domestic data infra-
structure, receive more attention.
Connecting poor people
Many individuals in low- and middle-income nations
use basic cellphones for applications such as text
messaging and mobile money. These applications have
had tremendous development impacts, even without
using much data or requiring broadband internet
access.12
Beyond such basic telephony applications,
access to broadband internet, in combination with
ownership of a feature phone or smartphone, greatly
enriches an individual’s ability to use data for a better
life. Social media connect family and friends; online
government services and shopping websites save
individuals time and money; online learning and
telemedicine provide new, accessible, and inexpensive
ways of delivering education and health. The COVID-
19 pandemic is reinforcing the importance of access
to broadband internet for remote learning and home
working, as well as improving the overall resilience
of economies to shocks of various kinds (see spot-
light 5.1).
In the context of low- and middle-income coun-
tries, wireless broadband networks have emerged as
the most relevant technology for accessing data ser-
vices. The impacts of wireless broadband are greater
than those of wired broadband in these nations,13
par-
ticularly because the expansion of fixed broadband is
relatively limited and has yet to reach the minimum
threshold to have a statistically significant effect
on economic growth.14
Even in upper-middle- and
high-income nations, where fixed broadband is more
prevalent, users spend most of their time online on
mobile phones. Among the poorest in these countries,
many only use wireless networks to access the inter-
net (figure 5.2).15
The world’s political commitment to universal
access to the internet was most recently articulated in
a 2019 report of the United Nations Broadband Com-
mission for Sustainable Development, which calls for
75 percent access to broadband worldwide by 2025—
65 percent in developing economies and 35 percent in
least developed countries.16
The United Nations also
encourages all countries to adopt by 2025 a national
plan for universal access to broadband. These targets
reflect a reappraisal by the international community
following the failure to reach Sustainable Develop-
ment Goal (SDG) 9, Target 9.c, which called for “uni-
versal and affordable access to the internet in least
developed countries by 2020.”17
From an economic standpoint, public policy sup-
port for universal coverage of telecommunications
and data services has hinged on positive network
externalities. In other words, the economic value
of communications infrastructure rises as more
members of a society are connected because such
growth increases the number of pairs of people who
can communicate with each other more than
proportionally (roughly with the square of the number connected).18
Such positive externalities have underpinned the
case for providing public subsidies to ensure that
universal access can be achieved. Furthermore, as the
internet becomes the central platform for much of
social and economic life, providing all citizens with
an opportunity to access this platform is increasingly
a matter of social inclusion.
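The network-externality argument rests on simple counting: among n connected people there are n(n − 1)/2 distinct pairs who could communicate, so the count grows roughly with the square of n. A short sketch of that count (an illustration of the arithmetic only, not a model of economic value):

```python
def communicating_pairs(n):
    """Number of distinct pairs among n connected people (n choose 2)."""
    return n * (n - 1) // 2


for users in (10, 20, 40, 80):
    print(users, communicating_pairs(users))
```

Each doubling of users roughly quadruples the number of possible pairs (45, 190, 780, 3,160 for 10, 20, 40, and 80 users), which is why every additional connection adds more potential value than the one before.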
The shortfall of digital connectivity in the devel-
oping world can be understood in terms of three dif-
ferent types of gaps. The coverage gap refers to the fact
that last-mile digital infrastructure has yet to reach
all inhabited locales. The usage gap refers to the fact
that, even when coverage becomes available, uptake
of the service by the affected population will typically
not be universal. The consumption gap refers to the fact
that, even when people do take up the service, data
consumption is typically too low to support basic
economic and social functions. The discussion that
follows focuses primarily on people, but small firms
face many of the same barriers.
Figure 5.2 The developing world overwhelmingly accesses data using wireless networks
[Bar chart: number of broadband subscriptions per 100 persons (vertical axis, 0–140), wireless and fixed, for high-income, upper-middle-income, lower-middle-income, and low-income countries and the world.]
Source: WDR 2021 team, based on data from International Telecommunication Union, Statistics (database), https://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx. Data at http://bit.do/WDR2021-Fig-5_2.
Note: Data are for 2019.

Although all but 8 percent of the world’s population
is covered by a wireless broadband network (figure
5.3, panel a), this overall figure hides significant
regional differences (figure 5.3, panel b). The coverage
gap is less than 1 percent in North America, but as
high as 29 percent in Sub-Saharan Africa. The usage
gap encompassed 42 percent of the world’s population
in 2018, but as much as 64 percent of the population of
South Asia, where more than 1 billion people are covered
by a broadband signal without making use of the
internet. Although the 3G coverage gap has shrunk
by more than half over the last five years thanks to
successful rollout of last-mile infrastructure on the
supply side, the usage gap has remained remarkably
stable, indicating the persistence of barriers on the
demand side. Indeed, as of 2018 more than four in
five of the unserved persons worldwide lived in areas
where 3G signal coverage was already available.
Ultimately, the three gaps are interrelated because
improving service uptake and data consumption also
contributes to commercial viability, increasing the
revenues generated by any particular investment in
network coverage. Thus progress on closing the usage
gap and narrowing the consumption gap will further
help eliminate the coverage gap.
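The three gaps rest on a simple accounting identity: every person falls into exactly one of three groups (connected; covered but not using the internet; not covered), so the shares sum to 100 percent. The sketch below is a minimal illustration using the 2018 global shares from figure 5.3, panel a. It checks the identity and recovers the "four in five" statistic as the usage gap divided by everyone who is offline. The class and method names are illustrative assumptions, not taken from the Report.

```python
from dataclasses import dataclass


@dataclass
class ConnectivityShares:
    """Shares of a population (percent) under the coverage/usage split."""
    connected: float     # covered by a signal and using the internet
    usage_gap: float     # covered by a signal but not using the internet
    coverage_gap: float  # not covered by a broadband signal

    def total(self) -> float:
        # The three groups are mutually exclusive, so this should be 100.
        return self.connected + self.usage_gap + self.coverage_gap

    def offline_share_with_coverage(self) -> float:
        """Fraction of people not using the internet who already have coverage."""
        offline = self.usage_gap + self.coverage_gap
        return self.usage_gap / offline


# 2018 global shares from figure 5.3, panel a (percent of world population)
world_2018 = ConnectivityShares(connected=50, usage_gap=42, coverage_gap=8)

print(world_2018.total())                                  # 100
print(round(world_2018.offline_share_with_coverage(), 2))  # 0.84

# Cross-check with the absolute counts cited in the usage-gap discussion:
# 3.1 billion of the 3.8 billion people offline lived within signal range.
print(round(3.1 / 3.8, 2))  # 0.82
```

Both the percentage shares and the absolute counts point to the same conclusion: roughly four in five offline people are held back by something other than missing coverage.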
Closing the coverage gap
In 2018 more than 600 million people lived without
access to the internet, a far cry from the United
Nations’ SDG target of universal and affordable
access to the internet by 2020.19
Most of those who are
unconnected live in lower-income nations. Estimates
suggest that achieving universal broadband internet
access by 2030 will require an investment of approxi-
mately US$100 billion in Africa alone.20
The coverage gap is typically reported relative to
3G technology, which delivers speeds of up to 42 megabits
per second, making it the first generation able
to support data-rich smartphone applications. How-
ever, rapid innovation in the mobile communications
sector leads to a new generation of technology just
about every decade, ushering in substantial improve-
ments in speed and bandwidth and making universal
coverage something of a moving target. In fact, 4G
technology, offering speeds of up to 400 megabits per
second, is already widely available in the developing
world (figure 5.4). If 4G were used as the relevant
technological benchmark, the coverage gap would
rise from 8 percent to 20 percent in 2018, and the
problem of access would no longer be confined solely
to Sub-Saharan Africa.
In 2019 5G technology became commercially available in 23 high-income economies and China, with a global coverage gap of 95 percent by the end of the first year. The new 5G technology is revolutionary because of both its exceptionally high speed of 1,000 megabits per second and its greatly enhanced capacity to transmit a large amount of sensor-based data from the IoT in near real time, offering numerous applications across different economic sectors. Forecasts suggest that one-third of mobile subscriptions could be 5G by 2025.21
162 | World Development Report 2021
Figure 5.3 Gaps in 3G wireless broadband internet coverage have been shrinking, but usage gaps remain stubbornly high
[Figure: stacked bars of the share of population (%) that is connected, in the usage gap, or in the coverage gap. Panel a: globally over time, 2015–18. Panel b: by region, 2018 (East Asia and Pacific; Europe and Central Asia; Latin America and the Caribbean; Middle East and North Africa; North America; South Asia; Sub-Saharan Africa).]
Panel a. Share of world population (%), connected / usage gap / coverage gap: 2015: 41 / 37 / 21; 2016: 45 / 40 / 15; 2017: 48 / 41 / 11; 2018: 50 / 42 / 8.
Sources: WDR 2021 team, based on 2015 and 2018 data in ITU (2018a). Data at http://bit.do/WDR2021-Fig-5_3.
This Report finds that under current conditions,
5G stand-alone technology (that is, technology not
dependent on 4G for signaling) does not seem to be
broadly viable across low- and middle-income coun-
tries, outside of major urban areas. However, devel-
oping 5G non–stand-alone technology as an incre-
mental evolution of 4G greatly improves its viability.
It could then become a cost-effective technology for
meeting population coverage in densely populated
middle-income countries once data traffic grows to
the point that large numbers of users are demanding
many gigabytes of data per month.22
Viability could
also be greatly improved by adopting regulations that
promote sharing of infrastructure and policies that
limit the burden of taxes and spectrum license fees
on investors.23
Past investments in fiber-optic backbone networks and cellular towers (under 3G) have had a significant impact on the viability of the newer wireless technologies (such as 4G and 5G). Countries at an earlier stage of infrastructure development will find it challenging to leapfrog ahead, but for others investments in the fiber-optic backbone will continue to provide payoffs as countries upgrade to more advanced technologies.
The coverage gap reflects the lack of commercial
viability associated with serving remote populations
in the absence of any government intervention. Clos-
ing this gap calls for concerted efforts to drive down
the cost of service provision, as well as better design
of government policies on universal service access.
Among the main policy measures to cut costs are
those aimed at strengthening competition in the sec-
tor, enabling the sharing of infrastructure, improving
the availability and affordability of the wireless spec-
trum, and exploiting new technologies.
Reducing retail costs. Individuals’ access to reliable
high-speed data services depends on both extensive
last-mile coverage and proximity to the national
fiber-optic backbone infrastructure. Limited retail
competition can lead to high profit margins inflating
charges to customers for last-mile access.24
In addi-
tion, costs may be relatively high due to limited elec-
tricity coverage in outlying areas, forcing operators to
rely on their own higher-cost diesel power generation
for base stations instead of being able to draw energy
from the public grid.25
Meanwhile, recent innovations
in wireless cellular technologies may reduce the cost
of last-mile rollout. For example, in Japan Rakuten
recently achieved 40 percent reductions in the cost of
traditional cellular networks through migration to a
cloud-based, software-driven environment.26
Introducing fiber-optic backbone competition. The
greater the proximity of users to backbone infrastruc-
ture, the stronger are the mobile signals and the faster
the data download speeds. In Africa, 45 percent of the
population lives more than 10 kilometers away from
fiber-optic network infrastructure—more than in any
other region.27
Relative to microwave links, fiber-optic
backbones offer greater carrying capacity at higher
speeds. Yet many countries in Africa still need to
upgrade from microwave to fiber-optic technology—
an estimated 250,000 kilometers of fiber-optic cable
are needed across the continent.28
Deployment of
fiber-optic cable can cost as much as US$70,000 per
kilometer29
—a high entry barrier for building national
fiber-optic backbones.
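The two African figures above can be combined into a rough ceiling on backbone investment. The sketch below assumes, purely for illustration, that all of the estimated 250,000 kilometers of cable were deployed at the quoted maximum of US$70,000 per kilometer. Actual unit costs vary with terrain and with whether existing ducts or rights-of-way are reused, so the result is an upper bound rather than an estimate. The constant and function names are hypothetical.

```python
# Illustrative upper bound on African fiber-optic backbone deployment cost.
# Assumption: every kilometer is built at the quoted maximum unit cost.
KM_NEEDED = 250_000           # estimated kilometers of fiber needed in Africa
MAX_COST_PER_KM_USD = 70_000  # "as much as" US$70,000 per kilometer


def upper_bound_cost_usd(km: int, cost_per_km: int) -> int:
    """Total cost if every kilometer is built at the given unit cost."""
    return km * cost_per_km


total = upper_bound_cost_usd(KM_NEEDED, MAX_COST_PER_KM_USD)
print(f"Upper bound: US${total / 1e9:.1f} billion")  # Upper bound: US$17.5 billion
```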
As a result, competition is often limited, and, in the absence of regulation, high wholesale prices and limited network development may result. Lack of competition is further exacerbated when backbone infrastructure operators are vertically integrated, providing both wholesale and retail services.30 In the absence of robust competition, some countries have opted for state-owned backbone development, based on vendor financing, but this approach crowds out private investment and unnecessarily adds to the public debt. A competitive backbone market may be a preferable alternative policy, with government taking on a coordinating role, inviting multiple operators to participate, enforcing open access and cost-based pricing, and offering incentives to existing or new operators to invest in less lucrative areas to complete the infrastructure backbone.
Figure 5.4 Globally, the coverage of wireless technologies reflects their constant upgrading
[Figure: coverage (billions of people) of 2G, 3G, and 4G internet service, 2013–18, compared with world population and internet users.]
Source: WDR 2021 team, based on data in ITU (2018a). Data at http://bit.do/WDR2021-Fig-5_4.
Enabling sharing of infrastructure. Another way to
increase coverage by keeping costs down is to create a
regulatory environment that facilitates the sharing of
infrastructure both across sectors and within digital
infrastructure markets.31
The cost of broadband trans-
mission and core network deployment can be reduced
by using existing railway lines, power transmission
grids, and pipelines, or by coordinating with road
construction to lay ducts along highways. In emerg-
ing markets, and particularly in the poorest countries
where demand may be thin and infrastructure costs
and the associated risks relatively high, operators
could be allowed to share backhaul infrastructure
(such as fiber-optic cable) or local facilities (such as
communication towers). Sharing of infrastructure
has great potential to accelerate digital connectivity.
Recent estimates suggest that the cost of deploying
5G mobile network technology could be reduced
by more than 40 percent by sharing antenna sites.32
However, the tensions between promoting competi-
tion and enabling cooperation in the market for dig-
ital infrastructure must be carefully balanced, with
cooperation encouraged only in market segments that
cannot efficiently support more than one operator.
Improving the availability and affordability of the
spectrum. Making adequate spectrum available at
relatively low cost is important for reducing coverage
gaps. Low-frequency spectrum is attractive for rural areas because it provides wider coverage, requiring a lower density of cellular towers to cover a given area and reducing investment costs. Governments have often delayed the migration from analog to digital television, which releases coveted low-frequency spectrum for wireless broadband use. Some governments auction frequencies with elevated reserve prices that raise investment costs, which are then passed on to users through higher prices. For example, in
Senegal operators boycotted the 4G spectrum auc-
tion because of the high reserve price—CFAF 30 bil-
lion (US$49.86 million).33
Other governments charge
recurring fees for the use of spectrum, raising the
cost of deploying infrastructure in rural areas.
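The link between lower frequencies and fewer towers can be made concrete with the free-space path loss (FSPL) formula: FSPL(dB) = 20 log10(d) + 20 log10(f) + 32.44, with distance d in kilometers and frequency f in megahertz. If the tolerable path loss is held fixed, reachable distance scales as 1/f and the area served by each tower as 1/f², so the number of towers needed for a given area scales roughly with f². This is an idealized sketch. Free-space propagation ignores terrain, buildings, and antenna design, and the 120 dB link budget and the two frequencies are hypothetical values chosen for illustration, not figures from the Report.

```python
import math


def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB (distance in km, frequency in MHz)."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44


def max_range_km(max_loss_db: float, freq_mhz: float) -> float:
    """Invert the FSPL formula: distance at which loss reaches max_loss_db."""
    return 10 ** ((max_loss_db - 20 * math.log10(freq_mhz) - 32.44) / 20)


def relative_tower_count(freq_mhz: float, reference_mhz: float) -> float:
    """Towers needed at freq_mhz relative to reference_mhz for the same area.

    Cell area scales with range squared and range scales as 1/f,
    so the tower count scales as (f / f_ref) ** 2.
    """
    return (freq_mhz / reference_mhz) ** 2


budget_db = 120.0  # hypothetical maximum tolerable path loss
for f in (700.0, 2100.0):
    print(f"{f:.0f} MHz: free-space range ~{max_range_km(budget_db, f):.1f} km")

print(relative_tower_count(2100.0, 700.0))  # 9.0
```

Under these idealized assumptions, tripling the carrier frequency cuts the range of each tower to one-third and so requires about nine times as many towers to cover the same rural area.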
Exploiting new technologies. Emerging niche technologies, such as TV white space (TVWS), high-altitude balloons, and low-orbit satellites, promise to significantly reduce last-mile deployment costs in remote areas, although many have yet to scale up commercially. TVWS uses the buffer frequencies between TV channels to provide broadband internet access. It is already being used successfully in Colombia to connect rural schools and coffee plantations in geographically challenging locations such as mountainous rainforests.34 Two innovative solutions that have been proposed to reach remote rural areas are high-altitude platform station (HAPS) systems, which use a network of high-altitude balloons to provide unserved locations with connectivity,35 and low-Earth orbiting (LEO) satellites. Iridium, which in 1998 became the first LEO satellite system to launch, today has slightly more than 1 million subscribers, mainly in niche markets such as the maritime and aviation sectors and emergency services, as well as oil and gas.36
Yet
neither HAPS nor LEO satellites have proved they can
provide direct consumer broadband access in rural
areas on a sustainable basis at an affordable price.
Reforming universal service funds. Adopting these
approaches to driving down costs can substantially
expand the coverage attainable on a commercially
viable basis. Nonetheless, some remote pockets will
not reach universal access without some form of
state support. Many countries have created universal
service funds to harness public resources to subsidize
infrastructure rollout in unserved areas. These funds
are typically financed by obligatory levies charged on
operators. However, for a variety of reasons many of
these funds have proven to be unsuccessful (Kenya is
one of the few exceptions in Africa).37
Funds often suf-
fer from poor design, lack of spatial planning to guide
fund allocations, a mismatch between funds collected
and disbursed, political interference, and failure to
incorporate sustainability factors such as training
and education, maintenance, and energy supply.38
For
example, in Africa more than US$400 million worth
of universal service funds have not been disbursed.39
A study of countries with universal service funds in
the Asia-Pacific region found that they did not expe-
rience higher internet growth than countries without
funds—except Malaysia and Pakistan, where the
funds were transparent, efficient, and targeted exten-
sion of the national fiber-optic backbones.40
Tackling the usage gap
Of the 3.8 billion people not using the internet in 2018,
3.1 billion lived within range of a wireless broadband
signal.41
Government efforts to provide universal ser-
vice access have traditionally focused on eliminating
the coverage gap through rolling out the supply of
infrastructure, but such policies should increasingly
be oriented toward addressing the demand-side barri-
ers that limit service uptake, thereby creating such a
sizable usage gap.
Targeting the most critical underserved segments.
Although traditional universal service policies have
largely focused on directing public support to under-
served rural communities, there is considerable scope
to target demand-side policy measures to particular
categories of under-served individuals irrespective of
their location. A suite of large-sample household
surveys conducted in 22 developing countries in 2017/18
reveals that people who do not connect to broadband
service even when it is available are significantly
more likely to be poor (in the bottom 40 percent of the
national income distribution), less educated (having
only a primary education), elderly (over 50 years old),
and female.42
Of these, the largest effect is associated
with education: completion of primary education
adds 35 percentage points to the likelihood of internet
uptake.
Also noteworthy is the significant gender digital
divide. Globally, some 250 million fewer women than
men use the internet. In low-income countries, only
one in seven women is online, compared with one in
five men.43
Women are somewhat more likely than
men to be challenged by digital literacy issues and to
face additional obstacles to being online. For example,
in many countries lack of family approval for women
owning a cellphone is a major barrier.44
Broadly, three reasons have been put forward to
explain the usage gap in low- and middle-income
countries. First, people find it difficult to afford a
mobile device or data services. Second, they lack the
digital literacy needed to use the internet.45 Third,
they do not see internet services offering any content
or application of relevance to their lives. The house-
hold surveys conducted in 22 developing countries in
2017/18 found that the reasons most frequently cited
by people for not taking up data services are related
to digital literacy (69 percent), followed by afford-
ability concerns (15 percent) and relevance issues (12
percent)—see figure 5.5.46
Digital literacy limitations
appear to dominate in South Asia, whereas afford-
ability concerns are more prominent in some African
countries. These different constraints are not, how-
ever, mutually exclusive. A person who becomes dig-
itally literate and therefore more motivated to access
the internet may then face affordability challenges
that were not relevant before.
Individuals are also attracted to using the internet
when family members or friends do so, particularly
when it comes to social media. Analysis conducted for
this Report found that social network effects have a
significant positive impact on the usage of wireless
internet in low- and middle-income countries. Indi-
viduals whose five closest friends are using an online
social network are 63 percent more likely to use the
internet than those whose closest friends are not
already active on social media.47
Addressing the widespread problem of digital literacy. In the 2017/18 household surveys, digital literacy was the most fundamental reason given for not using the internet. More than 84 percent of those surveyed who were either uneducated or had completed only a primary school education stated they “do not know what internet is” or “do not know how to use internet.”48
Figure 5.5 In low- and middle-income countries, nearly 70 percent of those who do not use the internet are held back by deficiencies in digital literacy
[Figure: share of respondents (%) citing digital literacy, affordability, relevance, or other reasons for not using the internet, by country (Nepal, Colombia, India, Bangladesh, Pakistan, Lesotho, Senegal, Peru, Guatemala, Cambodia, Paraguay, Nigeria, Ghana, Ecuador, Sri Lanka, Kenya, Argentina, Uganda, Mozambique, Tanzania, Rwanda, South Africa) and in total.]
Source: Chen 2021. Data at http://bit.do/WDR2021-Fig-5_5.
Note: Respondents to the survey conducted for this Report had access to internet service. Responses to the digital literacy category included “Do not know what internet is” and “Do not know how to use internet.” Responses to the affordability category included “No access device” and “Too expensive.” Responses to the relevance category included “No interest/not useful” and “No relevant content in local language.”
Various initiatives are under way to teach basic
digital skills. Mobile industry association GSMA has
developed a Mobile Internet Skills Training Toolkit
based on a “train the trainers” approach.49
Results from
a pilot project in Bangladesh found that mobile inter-
net usage among the beneficiary group more than
tripled, with 19 percent of group members becoming
regular mobile data users.50
In early 2017, the Rwan-
dan government launched the Digital Ambassadors
Program, which trained 5,000 youth posted to all 30
districts in the country to provide digital skills train-
ing to 5 million Rwandans over a four-year period.51
Field studies conducted in Burkina Faso, Mali, Sen-
egal, and Tanzania found that audio and icon-based
interfaces and a stripped-down version of the internet
(“internet lite”) helped students overcome their digital
literacy limitations.52
Despite these examples, there is
little evidence that digital literacy programs are oper-
ating at the scale needed to significantly improve the
uptake of data services, or that they are being suitably
integrated with efforts to address the more funda-
mental underlying problem of basic literacy.
Once people become digitally literate, a key deter-
minant of using the internet is availability of local
language content.53
Social media usage grows rapidly
because the relevant apps tend to become available in local
languages much sooner than other internet content.54
Making digital devices more affordable. Poor people
wishing to avail themselves of internet access must
first be able to afford a mobile device. However,
according to one study, the cost of even an entry-level
device exceeds 20 percent of the monthly income in
more than half of low- and middle-income nations.55
Another study found that the cost of a low-end US$42
smartphone is more than 80 percent of the monthly
income in low-income countries.56
Efforts are under way to make entry-level inter-
net devices more affordable. Mobile operators are
creating partnerships to obtain inexpensive handsets
or are bundling mobile phones with subscriptions.
Pan-African operator MTN collaborated with China
Mobile to launch a US$20 smartphone targeted at
10 million first-time users.57
In India, Jio offers an
internet-enabled phone for Rs 699 (US$9.21), provided
the customer spends at least Rs 1,500 (US$19.77) a year
on service charges.58
Although most branded phones
are manufactured in East Asia, several countries have
created assembly plants to manufacture inexpensive
mobile phones locally. In Ethiopia, a Chinese
company is assembling about 1 million phones a year
for export throughout the region.59
In Costa Rica and
Malaysia, universal service funds have been used to
subsidize internet devices for low-income users.60
Taxes, import duties, and other fees also affect
device affordability. Despite the low purchasing
power of their populations, low-income countries
impose the highest customs duties on
mobile phones, adding 7 percent to prices on average.
One study found that several mainly low- and middle-
income countries applied handset excise taxes
(beyond the regular sales tax) and activation fees.61
Ownership is also affected by substantial gender
gaps in low- and middle-income countries; the share
of men owning mobile devices is 20 percentage points
higher than the share of women.62
Narrowing the consumption gap
Even among people who connect to the internet and
subscribe to data services, a wide consumption gap
remains in wireless data usage across country income
levels and regions, with the data usage per capita in
high-income countries more than 30 times higher
than that in low-income countries (figure 5.6). The
consumption gap is even wider if fixed broadband is
considered. The number of fixed broadband subscrip-
tions is much higher in high-income economies, and
because of more favorable data plans, these subscrip-
tions support much higher levels of consumption
than mobile subscriptions—potentially as much as
100 times more.63
The consumption gap raises questions about how
much data are “enough” to meet basic social and eco-
nomic needs. In 2019 the Alliance for Affordable Inter-
net (A4AI) stated that 1 gigabyte of data per month
was sufficient to benefit from the internet in a mean-
ingful way,64
but later it revised its estimate of “mean-
ingful connectivity” to unlimited access as a result of
the burgeoning use of data during the COVID-19
pandemic.65
Based on a detailed empirical examination
of data consumption patterns, this Report estimates
that 660 megabytes per month is adequate to meet
basic needs for e-government services, online shop-
ping, browsing news, medical and educational infor-
mation, and the like, rising steeply toward 6 gigabytes
per month if a certain amount of social media and
video-related usage is also included.66
Two fundamental drivers of low data consump-
tion are the struggle to afford data usage charges
and the technical constraints on network perfor-
mance. These two drivers can be related. Problems
with affordability translate into lower usage, which,
in turn, means lower revenue streams and weaker
incentives to invest in better network performance.
Tackling affordability constraints. Many internet
users in low- and middle-income nations limit their
mobile data usage because of affordability con-
straints (figure 5.7). A survey carried out in 11 emerg-
ing countries found that a median of 48 percent of
respondents had difficulty paying for their mobile
data usage, and 42 percent frequently or occasion-
ally restrict the amount of data they use.67
Instead
of purchasing large amounts of data on a monthly
basis, users buy it in small amounts when they have
the money. Many mobile operators offer a variety of
data bundles to cater to this pattern. MTN Zambia, for
example, has 17 prepaid data plans, ranging from one-
hour plans, including 5 megabytes of data, to weekly
bundles offering unlimited access to popular social
media applications.68
What is an affordable level of expenditure on data
services has been the subject of some debate. The
Alliance for Affordable Internet established a norma-
tive affordability threshold of 2 percent of monthly
income linked to a normative consumption threshold
of 1 gigabyte per month.69
This threshold was subse-
quently adopted by the UN Broadband Commission.70
According to these norms, data services could be con-
sidered generally affordable to the average consumer,
except in low-income countries. The reality is that the
actual expenditure on data services (known as average
revenue per user, or ARPU) and the associated data
consumption levels both fall well below these norms. In
fact, it is only when the cost per gigabyte of data drops
below 0.5 percent of gross national income (GNI) per
capita that data consumption reaches and eventually
exceeds the 1 gigabyte threshold (figure 5.8).
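The A4AI norm described above can be expressed as a simple test: the price of 1 gigabyte should not exceed 2 percent of average monthly income. The sketch below approximates monthly income as annual GNI per capita divided by 12, a common simplification. The two economies and their prices are hypothetical, chosen only to show one case on each side of the threshold.

```python
AFFORDABILITY_THRESHOLD_PCT = 2.0  # A4AI norm: 1 GB for at most 2% of monthly income


def cost_share_of_monthly_income(price_per_gb_usd: float, gni_per_capita_usd: float) -> float:
    """Price of 1 GB as a percentage of monthly GNI per capita."""
    monthly_income = gni_per_capita_usd / 12
    return 100 * price_per_gb_usd / monthly_income


def is_affordable(price_per_gb_usd: float, gni_per_capita_usd: float) -> bool:
    """True if 1 GB costs no more than the threshold share of monthly income."""
    share = cost_share_of_monthly_income(price_per_gb_usd, gni_per_capita_usd)
    return share <= AFFORDABILITY_THRESHOLD_PCT


# Hypothetical economies (not data from the Report): (US$ per GB, annual GNI per capita)
examples = {
    "Economy A": (2.00, 6_000),  # 2.00 of 500 per month
    "Economy B": (3.00, 900),    # 3.00 of 75 per month
}
for name, (price, gni) in examples.items():
    share = cost_share_of_monthly_income(price, gni)
    print(f"{name}: {share:.1f}% of monthly income, affordable={is_affordable(price, gni)}")
# Economy A: 0.4% of monthly income, affordable=True
# Economy B: 4.0% of monthly income, affordable=False
```

The same price can pass the test in one economy and fail it in another, which is why the norm is expressed relative to income rather than as a fixed dollar amount.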
Figure 5.6 Inequities in mobile data consumption across country income groups and regions are huge
[Figure: data consumption (GB per capita per month).]
Panel a. By country income group: high-income 7.1; upper-middle-income 2.8; lower-middle-income 1.3; low-income 0.2; world 3.7.
Panel b. By region: Middle East and North Africa 7.4; Europe and Central Asia 5.0; North America 4.5; East Asia and Pacific 4.2; Latin America and the Caribbean 2.4; South Asia 1.1; Sub-Saharan Africa 0.3.
Source: WDR 2021 team. Data at http://bit.do/WDR2021-Fig-5_6.
Note: Data are for 2018. Figures include averages of 119 economies with data. GB = gigabytes.
The amount of data that people can afford to consume is itself a function of the prices that operators offer across different markets, as well as over time. In India, rapid entry of mobile operators offering 4G service in 2016 boosted coverage from 4 percent in 2015 to 94 percent in 2018.71 Intensifying competition led to a dramatic drop in the price of data from US$4.41 to US$0.17 per gigabyte between 2014 and 2018 and a surge in consumption per subscriber from 0.3 to 7.7 gigabytes per month over the same period.72 Similarly, in Cambodia intense competition has brought down the cost per gigabyte of data from US$4.56 in 2013 to one of the world’s lowest at US$0.13 in 2019, driving up data consumption to 6.9 gigabytes per capita per month, the highest mobile data usage per capita of any low- or lower-middle-income nation. These cost reductions were achieved through a combination of measures such as migrating spectrum and users to 4G to achieve lower operating costs, outsourcing construction work, and moving software to the cloud.73 As a result of the low data charges, Cambodian consumers devote 96 percent of their information and communication technology (ICT) spending to data services, having
largely moved away from voice and text.
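The India figures quoted above imply a steep compound annual decline in price and an almost mirror-image rise in consumption: the price fell by a factor of about 26 and consumption rose by a factor of about 26 over the same four years. The sketch below computes compound annual growth rates (CAGR) and overall change factors from the 2014 and 2018 endpoints in the text. It is simple arithmetic on the Report's numbers, it does not by itself establish causation, and the helper name is illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values (negative means decline)."""
    return (end / start) ** (1 / years) - 1


# India, 2014 -> 2018 (figures quoted in the text)
price_start, price_end = 4.41, 0.17  # US$ per gigabyte
use_start, use_end = 0.3, 7.7        # gigabytes per subscriber per month
years = 2018 - 2014

print(f"Price change factor: {price_end / price_start:.3f}")     # 0.039
print(f"Price CAGR: {cagr(price_start, price_end, years):.1%}")
print(f"Consumption change factor: {use_end / use_start:.1f}")  # 25.7
print(f"Consumption CAGR: {cagr(use_start, use_end, years):.1%}")
```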
Interestingly, users in low-income countries typ-
ically spend much more on voice and text services—
between 2 and 4 percent of monthly income—than
they do on data services. Because traditional voice
and text can alternatively be provided on over-the-top
(OTT) data-based services, which bypass traditional
distribution and use only a small amount of data,
users could substantially reduce their overall ICT
expenditure by substituting data for voice and text
services.
Addressing technical constraints. Slow speeds also
discourage consumers from using more data. Downloading 250 megabytes takes 17 seconds at a speed of 100 megabits per second, but as long as three minutes at a speed of 10 megabits per second.74
The
speed dividends arising from migration to the next
generation of mobile technology clearly drive higher
levels of data consumption. For example, in India
during 2018 a 2G subscriber consumed just 0.5 giga-
bytes per month, rising to 5.3 gigabytes per month for
a 3G subscriber and 9.7 gigabytes per month for a 4G
subscriber.75
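The download-time comparison at the start of this paragraph follows from a unit conversion. File size in megabytes times 8 gives megabits, and dividing by link speed in megabits per second gives seconds. The sketch below is an idealized calculation that assumes decimal units, constant throughput, and no protocol overhead. For the two speeds in the text (100 and 10 megabits per second), it gives 20 seconds and 200 seconds, a little over three minutes. These results are the same order of magnitude as the quoted figures. The function name is illustrative.

```python
BITS_PER_BYTE = 8


def download_seconds(size_megabytes: float, speed_mbps: float) -> float:
    """Idealized transfer time with constant throughput and no overhead.

    size_megabytes is in decimal megabytes; speed_mbps is in megabits per second.
    """
    return size_megabytes * BITS_PER_BYTE / speed_mbps


for speed in (100, 10):
    t = download_seconds(250, speed)
    print(f"250 MB at {speed} Mbps: {t:.0f} s ({t / 60:.1f} min)")
# 250 MB at 100 Mbps: 20 s (0.3 min)
# 250 MB at 10 Mbps: 200 s (3.3 min)
```

Transfer time scales inversely with speed, so a tenfold speed gap becomes a tenfold wait.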
Regulatory policies and retail competition drive
migration to higher-generation mobile technologies.
Transitions to next generations can be encouraged
by early release of competitively priced spectrum.
The auction of 3G spectrum in Thailand in 2012 was
designed to support rapid upgrade from 2G by incor-
porating license conditions for coverage, pricing, and
quality. Within two years, all license conditions were
exceeded, with nearly universal 3G coverage, upgrad-
ing of three-quarters of subscriptions, lower prices,
and service speeds exceeding license requirements
by tenfold.76
Allowing operators to “refarm” their spectrum
holdings can also accelerate migration to next-
generation mobile. In contrast to regulatory policies
that tie new mobile technologies to specific frequen-
cies that are then auctioned, refarming allows oper-
ators to launch new mobile technology using their
own spectrum. This encourages them not only to be
more efficient in their use of spectrum, but also to
rapidly upgrade their existing subscribers to reclaim
the spectrum used for the older technology. About
three-quarters of the spectrum used for 4G around the
world has been refarmed: 25 percent from switching
from analog to digital television broadcasting and 50
percent from operators using their own spectrum.77
Nonetheless, many operators try to recapture their
original capital expenditure before upgrading to a
new generation of wireless technology.78
To overcome
this issue, Rwanda created a public-private part-
nership to roll out a 4G/LTE (Long-Term Evolution)
network that covered almost 99 percent of the popu-
lation by the end of 2019—the highest level in the
world. However, because of the high cost of obtaining a
4G/LTE–compatible device, as well as technical chal-
lenges with the migration of voice services, service
uptake has been modest, with most Rwandans con-
tinuing to use slow 2G technology for mobile data.79
Figure 5.7 The monthly price for
1 gigabyte of data is unaffordable in
low-income countries
Source: Cable.co.uk, Worldwide Mobile Data Pricing 2020: The Cost of 1GB
of Mobile Data in 228 Countries (dashboard), https://www.cable.co.uk
/mobiles/worldwide-data-pricing/. Data at http://bit.do/WDR2021
-Fig-5_7.
Note: Data are as of 2018. The affordability threshold is 2 percent of monthly
income. Prices are the median prices of the economies in the group. GNI =
gross national income.
[Figure: median price of 1 GB of mobile data as a share of GNI per capita (%), with the affordability threshold marked, for high-income, upper-middle-income, lower-middle-income, and low-income countries and the world.]
Figure 5.8 Data consumption is very
sensitive to market prices and service
affordability
Source: WDR 2021 team. Data at http://bit.do/WDR2021-Fig-5_8.
Note: Each circle represents a country. GB = gigabyte; GNI = gross national
income.
[Figure: scatter plot of GB usage per month (vertical axis) against average cost per user per GB as a share of GNI per capita (%, horizontal axis); R² = 0.7637.]
Connecting poor countries
The high cost and low speed of internet services
have emerged as key drivers of low data consumption
in the developing world. One reason is that many
low-income countries lack their own domestic data
infrastructure, relying instead on overseas facilities
to exchange data (via internet exchange points, or IXPs), store
data (at colocation data centers), and process data
(on cloud platforms). This reliance requires them to
transfer large volumes of data in and out of the coun-
try (see “tromboning” in figure 5.9), for which they
pay a substantial penalty: prices that are several times
higher than those in countries with their own infra-
structure. They also experience slower speeds that
can be an order of magnitude lower. This situation
can be avoided by creating IXP infrastructure at the
national level, eventually complemented by coloca-
tion data centers.
Consider a user who wants to view an educational
video online. The request is uploaded as a small
packet of data with address information and goes
from the user’s device to the national backbone and
onward to the internet service provider (ISP). Often
in lower-income countries, the video is not available
domestically, obliging the ISP to route the request
overseas, where it finds its destination—say, in Cal-
ifornia. The video is then downloaded back to the
user. Such a circuitous process for accessing content
incurs significant charges from international carriers
while degrading service quality. This same operation would be much faster and cheaper if a replica
of this content were stored at a local colocation data
center that could be accessed via a local IXP.
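To make the routing choice in this example concrete, the Python sketch below models the two paths in figure 5.9: a request served from a local replica through the IXP, and a request "tromboned" overseas through an international carrier. The segment names and latency values are hypothetical assumptions chosen for illustration, not measurements from this Report, and the sketch leaves out the international carriage charges that also make the overseas route costlier.

```python
from dataclasses import dataclass

# Hypothetical one-way latencies (milliseconds) for each network segment.
# Illustrative assumptions only; real values vary by country and operator.
SEGMENT_LATENCY_MS = {
    ("user", "isp"): 2.0,
    ("isp", "ixp"): 0.5,
    ("ixp", "local_data_center"): 0.5,
    ("isp", "international_carrier"): 5.0,
    ("international_carrier", "overseas_origin"): 90.0,
}


@dataclass
class Route:
    hops: list[str]
    crosses_border: bool


def choose_route(replica_stored_locally: bool) -> Route:
    """Return the path a content request takes, depending on where the content is stored."""
    if replica_stored_locally:
        # A replica sits in a domestic colocation data center reachable through the IXP.
        return Route(["user", "isp", "ixp", "local_data_center"], crosses_border=False)
    # No local copy: the ISP hands the request to an international carrier ("tromboning").
    return Route(["user", "isp", "international_carrier", "overseas_origin"], crosses_border=True)


def round_trip_ms(route: Route) -> float:
    """Sum segment latencies along the path and double them for request plus response."""
    one_way = sum(SEGMENT_LATENCY_MS[(a, b)] for a, b in zip(route.hops, route.hops[1:]))
    return 2 * one_way


for local in (True, False):
    route = choose_route(local)
    print(" -> ".join(route.hops), f"{round_trip_ms(route):.1f} ms")
```

With these assumed values the local path takes 6 milliseconds round trip and the overseas path 194 milliseconds; the point is the structure of the comparison, not the specific numbers.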
This example illustrates how international band-
width is a critical part of the data infrastructure,
enabling data to be sent to and retrieved from any-
where in the world. The global internet bandwidth
stood at 463 terabits per second in 2019, almost
tripling from 2015. Sub-Saharan Africa had the fastest
growth in bandwidth of any region over the 2015–19
period. It grew by 53 percent a year, reflecting a large
increase in capacity because of the deployment of
new submarine cables. However, Sub-Saharan Africa
continues to lag other regions in total capacity.
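The growth figures above can be checked with compound-growth arithmetic. The sketch below treats "almost tripling" as exactly threefold and counts 2015 to 2019 as four annual compounding periods; both are simplifying assumptions.

```python
def implied_annual_growth(multiple: float, years: int) -> float:
    """Compound annual growth rate implied by a `multiple`-fold increase over `years` years."""
    return multiple ** (1 / years) - 1


def cumulative_multiple(annual_rate: float, years: int) -> float:
    """Total growth multiple after compounding `annual_rate` for `years` years."""
    return (1 + annual_rate) ** years


YEARS = 4  # 2015 to 2019

# Global bandwidth: "almost tripling", approximated here as exactly 3x.
global_rate = implied_annual_growth(3.0, YEARS)

# Sub-Saharan Africa: 53 percent a year over the same period.
ssa_multiple = cumulative_multiple(0.53, YEARS)

print(f"Global bandwidth: about {global_rate:.0%} growth a year")
print(f"Sub-Saharan Africa: about {ssa_multiple:.1f} times its 2015 capacity by 2019")
```

Under these assumptions, global bandwidth grew by roughly 32 percent a year, while 53 percent annual growth compounds to about 5.5 times the 2015 level. A region can therefore grow fastest and still lag in total capacity if it started from a much smaller base.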
Most international internet traffic is carried over
the world’s dense web of some 400 undersea fiber-
optic cables, spanning more than 1 million kilo-
meters.80
Almost all coastal economies are now
connected to undersea cables (map 5.1). Submarine
cable ownership has diversified from consortiums of
telecommunication operators to include wholesale
operators and increasingly big content providers
such as Amazon, Google, and Microsoft.81
Notably,
Facebook recently announced plans to lay the 2Africa
submarine cable around Africa. It will have nearly
three times the capacity of all the undersea cables
currently serving the continent.82
The growing
convergence of content provision and carriage of
content will require greater regulatory oversight to
ensure that carriage is provided in an open, nondis-
criminatory manner.
Before establishing a submarine cable connection,
countries used costly, low-capacity satellite links.
Connection to submarine cables has dramatically lowered wholesale international bandwidth prices. In Africa over the last decade, the price of 1 megabit per second dropped from US$3,500 to US$29 in Mauritania and from US$1,174 to US$73 in Togo.83 In Tonga, the submarine cable increased capacity by more than 100 times, while prices dropped from US$495 to US$155 per megabit per second.84 Nonetheless, restrictive policies for access to submarine cable landing stations may prevent the full benefits of this reduction in wholesale prices from feeding through into retail tariffs.
Figure 5.9 The presence of domestic data infrastructure facilitates national data exchanges
Source: WDR 2021 team.
Note: ISP = internet service provider; IXP = internet exchange point.
[Diagram: two end users, each served by a domestic ISP (Domestic ISP 1, Domestic ISP 2). Domestic route: local content data flow between the ISPs through the IXP. International route: tromboning of local content through International carrier 1 and International carrier 2.]
Despite such progress, huge price differences
persist for the exchange of data traffic. For example,
the cost of exchanging data is around US$0.45 per
megabit per second in North America and US$0.62
per megabit per second in London, compared with
US$2.38 in São Paulo and US$5.00 in Johannesburg.85
These stark differences in costs may in part reflect the
limited development of domestic data infrastructure
in low- and middle-income countries.
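The scale of the price changes cited in this subsection is easier to compare as percentage declines and ratios. The sketch below uses only the figures quoted in the text; the dictionary and function names are illustrative.

```python
# Wholesale price of 1 megabit per second of international bandwidth, US$ (before, after).
WHOLESALE_PRICES = {
    "Mauritania": (3500.0, 29.0),
    "Togo": (1174.0, 73.0),
    "Tonga": (495.0, 155.0),
}

# Cost of exchanging data traffic by location, in US$ per unit as quoted in the text.
EXCHANGE_COST = {
    "North America": 0.45,
    "London": 0.62,
    "São Paulo": 2.38,
    "Johannesburg": 5.00,
}


def percent_decline(before: float, after: float) -> float:
    """Percentage fall from `before` to `after`."""
    return 100 * (before - after) / before


for country, (before, after) in WHOLESALE_PRICES.items():
    print(f"{country}: {percent_decline(before, after):.1f}% lower ({before / after:.1f} times cheaper)")

baseline = EXCHANGE_COST["North America"]
for location, cost in EXCHANGE_COST.items():
    print(f"{location}: {cost / baseline:.1f} times the North American cost")
```

Mauritania's price fell by about 99 percent, Togo's by about 94 percent, and Tonga's by about 69 percent, while exchanging traffic in Johannesburg costs roughly 11 times as much as in North America.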
Strengthening data infrastructure
Without a domestic capability to exchange data,
countries are totally reliant on international band-
width. As noted, such bandwidth is expensive and
slower than exchanging traffic locally. Such reliance
also affects service resilience, since a country is
completely shut off from the internet if there is any
disruption to international bandwidth. For exam-
ple, after a trawler snapped a submarine cable in
2018, Mauritania was offline for two days, and nine
other West African countries experienced internet
outages.86
Although there will always be a need for
international bandwidth, an appropriate balance is
needed between relying on overseas infrastructure
and developing domestic facilities.87
Map 5.1 The global fiber-optic cable submarine network reaches all corners of the world, but data infrastructure is unevenly developed
Sources: PeeringDB, Interconnection Database, https://www.peeringdb.com/; PCH Packet Clearing House, Packet Clearing House Report on Internet Exchange Point Locations (database), accessed December 14, 2020, https://www.pch.net/ixp/summary; TeleGeography, Submarine Cables (database), https://www.submarinecablemap.com/. Data at http://bit.do/WDR2021-Map-5_1.
[Map legend: colocation data centers; internet exchange points (IXPs); submarine cable landing stations; submarine cables]
IBRDWLD45643 | MARCH 2021
The economic case for domestic data infrastructure hinges on whether the present value of the resulting cost savings and speed improvements for data transactions over the life of such infrastructure exceeds the immediate up-front investment in facilities. The cost of developing IXPs is relatively modest and can likely be supported even in nascent markets, so long as the sector is not monopolistic. Colocation data centers require more sizable investments. There are also significant scale economies in developing the associated power infrastructure, which may account for as much as 40 percent of investment costs. Operating expenses are also largely fixed; about half of them are related to energy for cooling the facilities. Because colocation data centers need exceptionally high levels of reliability and security, market dynamics favor hyperscale service providers with established reputations. This requirement further reinforces the case for larger-scale facilities in countries that have a relatively stable investment climate, including low levels of disaster risk, and the availability of clean, reliable, and cost-effective sources of energy or natural sources of cooling such as water bodies.
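The investment test described above is a net present value (NPV) comparison: a facility is worth building if NPV = Σ_{t=1}^{T} B_t / (1 + r)^t − I_0 > 0, where I_0 is the up-front investment, B_t the cost savings plus monetized speed improvements in year t, r the discount rate, and T the life of the facility. The Python sketch below assumes a constant annual benefit received at the end of each year; the two facilities and all of their costs, benefits, discount rates, and lifetimes are hypothetical.

```python
def net_present_value(upfront_cost: float, annual_benefit: float,
                      discount_rate: float, years: int) -> float:
    """Discounted value of a constant end-of-year benefit stream, minus the up-front cost."""
    present_value = sum(
        annual_benefit / (1 + discount_rate) ** t for t in range(1, years + 1)
    )
    return present_value - upfront_cost


def worth_building(upfront_cost: float, annual_benefit: float,
                   discount_rate: float, years: int) -> bool:
    """True when the present value of benefits exceeds the up-front investment."""
    return net_present_value(upfront_cost, annual_benefit, discount_rate, years) > 0


# Hypothetical inputs (US$), for illustration only.
cases = {
    "hypothetical facility A": dict(upfront_cost=200_000, annual_benefit=60_000,
                                    discount_rate=0.08, years=10),
    "hypothetical facility B": dict(upfront_cost=50_000_000, annual_benefit=4_000_000,
                                    discount_rate=0.08, years=15),
}
for name, inputs in cases.items():
    npv = net_present_value(**inputs)
    print(f"{name}: NPV = US${npv:,.0f}, build: {worth_building(**inputs)}")
```

At these assumed values facility B fails the test despite large absolute benefits; higher utilization (larger annual benefits) or a longer life could reverse the result, which is why demand and scale matter so much for the viability of larger facilities.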
Creating internet exchange points. By keeping data
traffic in the country, IXPs can reduce reliance on
international bandwidth, lowering costs and improv-
ing performance. One study covering Latin America
noted that “local bits” are cheaper than “exported bits,”
finding that the region spent around US$2 billion a
year for international bandwidth—a sum that could
be reduced by one-third through greater use of IXPs.88
IXPs reduce the time it takes to retrieve data, enhanc-
ing user engagement. In Rwanda, it is 40 times faster
to access a locally hosted website (5 milliseconds)
than one hosted in the United States or Europe (200
milliseconds).89
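Both headline comparisons in the preceding paragraph follow from simple arithmetic on the quoted figures: one-third of roughly US$2 billion a year, and the ratio of the two access times in Rwanda. The constant names below are illustrative.

```python
# Figures quoted in the text.
LATAM_BANDWIDTH_SPEND_USD = 2_000_000_000  # approximate annual spend on international bandwidth
SHARE_SAVED_WITH_IXPS = 1 / 3
RWANDA_LOCAL_MS = 5        # locally hosted website
RWANDA_OVERSEAS_MS = 200   # website hosted in the United States or Europe

annual_savings_usd = LATAM_BANDWIDTH_SPEND_USD * SHARE_SAVED_WITH_IXPS
speedup = RWANDA_OVERSEAS_MS / RWANDA_LOCAL_MS

print(f"Potential Latin American savings: about US${annual_savings_usd / 1e9:.2f} billion a year")
print(f"Local hosting in Rwanda: {speedup:.0f} times faster")
```

The potential saving is thus on the order of US$670 million a year, and 200 milliseconds divided by 5 milliseconds gives the factor of 40 cited for Rwanda.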
As of June 2020, there were 556 IXPs across the
globe.90
Europe, with the largest number, accounts
for 37 percent of the world total, while Africa has
just 9 percent and accounts for less than 2 percent
of global IXP traffic, although that traffic is growing
rapidly.91
Stark differences in the availability of IXPs
are evident across country income groups, partic-
ularly when population differences are taken into
account.
IXPs are often established initially by universities
or as nonprofit associations of ISPs, located in small
server rooms with technical tasks carried out by vol-
unteers. As greater volumes of traffic are exchanged
and new participants join, a more sustainable tech-
nical and operational environment is needed. Gover-
nance arrangements are then formalized, staff hired,
and equipment upgraded. Eventually, the IXP grows
to the point where many participants want to join
without having to deploy a physical connection to the
exchange. This leads to the creation of multiple IXPs in different locations, with the central IXP relocated to a colocation data center. For example, DE-CIX, an IXP
in Frankfurt, Germany, began operations 25 years ago in an old post office when three ISPs interconnected their networks.92 Today, it is the world’s leading IXP, spread over more than three dozen data centers and linking almost 1,000 participants, with average traffic of more than 6 terabits per second.
Developing colocation data centers. Data centers
have emerged as a vital component of the digital
infrastructure ecosystem. In a data center, networked
computers provide remote storage, processing, and
distribution of data. The centers are mainly operated
by global information technology (IT) companies,
governments, and enterprises that host other companies’ data (colocation data centers). Data centers
range from small rooms in organizations where data
are kept on storage devices connected to computer
servers to giant warehouse-like facilities where thou-
sands of servers are arranged on racks. Colocation
data centers offer companies multiple advantages,
including the physical space to store a growing
amount of data, the security associated with high
industry reliability standards (as they almost never
shut down), and easy internet access due to their
growing association with IXPs.93
Globally, some 3,700 data centers are connected
to the internet.94
The disparities in data center pen-
etration among country income levels are wide,
particularly when taking population differences into
account (figure 5.10). Although there are more than
three data centers per million inhabitants in North
America, the ratio is only 0.8 per million in South Asia
and Sub-Saharan Africa. In fact, there are more data
centers in the state of California than in all of Sub-
Saharan Africa. These disparities may be related to
lower income and thus demand, but they also reflect
shortcomings in the investment climate.
Major colocation data center companies have
largely shunned investing in low- and middle-
income economies. This lack of investment is often
attributed to a lack of demand, as well as an aversion
to a country’s perceived high risk of natural disas-
ters, unpredictable political environment, barriers to
doing business, and unreliable energy and internet
infrastructure.95
However, certain large businesses in low-income countries, such as those in the financial sector and other service industries, already have in-house data storage systems that, if aggregated, could create the scale necessary for colocation data centers.
Figure 5.10 Data infrastructure is relatively scarce in low- and middle-income countries
Source: WDR 2021 team, adapted from PeeringDB, https://www.peeringdb.com/. Data at http://bit.do/WDR2021-Fig-5_10.
Note: The figure depicts data centers connected to the internet. Data were extracted in June 2020.
[Figure: percentage of countries with data infrastructure (internet exchange points, colocation data centers, cloud on-ramps), by country income group (high-income, upper-middle-income, lower-middle-income, low-income)]
Regional players are filling the void left by the
large global data center providers. For example, Africa
Data Centres (part of the Liquid Telecom Group) has
colocation facilities in Kenya, Rwanda, South Africa,
and Zimbabwe. Although some of the scale issues
associated with developing colocation data centers
could potentially be overcome through regional
collaboration around shared facilities, the case for
such an approach hinges on the existence of strong
regional fiber-optic network connectivity to ensure
that data can be transferred rapidly and reliably to
any shared regional data facility; competitive pricing
of such data transfers; and regional harmonization of
the regulatory framework to support agile cross-bor-
der data transfers (as discussed in chapter 7).
Despite mounting concerns about the environ-
mental impact of data centers, there is evidence that
the industry is taking aggressive action to curtail
emissions and that availability of renewable energy
is a factor in attracting investment (see spotlight 5.2).
Climbing the data infrastructure ladder. A country’s
development of data infrastructure can be envi-
sioned as a series of stages that over time lower costs
and improve performance (figure 5.11).96
The stages
progress from having no domestic IXP (stage 0), to
establishing an IXP (stage 1), to attracting content
providers and deploying data centers that host a
diverse group of participants (stage 2), to locating
the IXP alongside a colocation data center (stage 3).
As countries move up the ladder, more data are
exchanged nationally, and reliance on costly interna-
tional bandwidth is consequently reduced, lowering
retail prices, raising speed, and allowing higher data
consumption.
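The ladder can be read as a simple classification rule. The sketch below encodes one interpretation of the stage definitions given in the text and in figure 5.11; the field names and the order of the checks are assumptions, and the classification behind the Report's counts (Comini, Srinivasan, and Minges, forthcoming) may use additional criteria.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """Rungs of the data infrastructure ladder described in the text."""
    DATA_EXCHANGED_OVERSEAS = 0  # no domestic IXP
    ISPS_EXCHANGE_AT_IXP = 1     # IXP connects local ISPs only
    DIVERSE_IXP = 2              # IXP also has non-ISP participants
    IXP_WITH_COLOCATION = 3      # IXP in colocation data centers with international content participants


@dataclass
class CountryInfrastructure:
    # Hypothetical attributes; a real survey would derive them from industry registries.
    has_ixp: bool
    has_non_isp_participants: bool
    ixp_in_colocation_data_center: bool
    has_international_content_participants: bool


def ladder_stage(c: CountryInfrastructure) -> Stage:
    """Assign the highest stage whose conditions the country meets."""
    if not c.has_ixp:
        return Stage.DATA_EXCHANGED_OVERSEAS
    if c.ixp_in_colocation_data_center and c.has_international_content_participants:
        return Stage.IXP_WITH_COLOCATION
    if c.has_non_isp_participants:
        return Stage.DIVERSE_IXP
    return Stage.ISPS_EXCHANGE_AT_IXP


# Example: an IXP hosted at a university with only ISPs connected.
example = CountryInfrastructure(has_ixp=True, has_non_isp_participants=False,
                                ixp_in_colocation_data_center=False,
                                has_international_content_participants=False)
print(ladder_stage(example).name)
```

Checking the top rung first means a country is placed at the highest stage it qualifies for, mirroring the cumulative way the stages are described.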
Drawing on industry registries capturing the
availability of data infrastructure globally in 2020,
it is possible to build a comprehensive picture of
domestic data infrastructure in the developing
world. At stage 0 are 28 of the countries surveyed
for this Report, none of which has an IXP, and these
include almost half of the low-income country group.
Underlying barriers are often responsible for the lack
of an IXP: 10 of these countries are small island states
where the scale of traffic is insufficient; four are in
fragile and conflict-affected situations that impede
the development of the data ecosystem; and five have monopoly markets in which the sole national telecom operator is also in effect the IXP.
At stage 1 are 29 countries in which IXPs connect local ISPs. These include more than a third of low- and lower-middle-income countries. These IXPs are often located on the premises of government agencies or at academic institutions, typically in a small server room and in some cases using equipment provided through development assistance. For example, the African Internet Exchange System (AXIS) project, financed by the European Union, established IXPs in 14 African nations.97 The IXPs in this group of nations often generate limited amounts of traffic, sometimes because not all ISPs participate in the IXP. In some countries, regulatory restrictions forbid participants that are not ISPs from joining the IXP.
Figure 5.11 Countries develop domestic data infrastructure in stages
Source: Comini, Srinivasan, and Minges, forthcoming.
Note: Data provide close to global coverage for the year 2020 and are compiled from a variety of industry sources, including Packet Clearing House, CAIDA, PeeringDB, EURO-IX, and AF-IX. Amounts are in US dollars. GB = gigabyte; ISP = internet service provider; IXP = internet exchange point; MB = megabit.
Stage | Description | Countries | Share of low-income countries | Share of lower-middle-income countries | Average price of fixed broadband per MB per month | Average price of wireless broadband per GB per month
Stage 0 | Data exchanged overseas | 28 | 48% | 30% | $140.00 | $5.60
Stage 1 | Domestic data between ISPs exchanged at the IXP | 29 | 38% | 36% | $18.30 | $4.50
Stage 2 | Diversity of participants at the IXP and presence of major international content providers | 13 | 14% | 18% | $17.10 | $2.20
Stage 3 | IXP located alongside a colocation data center | 7 | 0% | 16% | $4.30 | $0.80
The 13 countries at stage 2 have IXPs with non-
ISP participants. Less than one-fifth of low- and
lower-middle-income countries fall into this group.
This group generally has numerous and diverse
participants in the IXP, including all ISPs, as well
as government agencies, local cloud providers, and
national content companies, among others. Often,
they have a Google Global Cache, a subset of Google’s
most popular content,98
as well as content delivery
networks (CDNs). However, large international con-
tent and cloud providers do not always use the IXP
either because it is not located in a neutral data center
or because its governance is not organized according
to an open multistakeholder model.
At stage 3 are seven countries with IXPs colocated
in data centers with international content partici-
pants. There are no low-income countries in this cat-
egory, and only a minority of lower-middle-income
countries. These arrangements often include multi-
ple IXPs located in several data centers to facilitate
participation. The Kenya Internet Exchange Point
(KIXP) illustrates how this results in a dense network,
enabling many participants to exchange data. KIXP
is operated by a nonprofit organization representing
technology companies, and its board follows inter-
national IXP best practices. KIXP has no restrictions
on the types of organization that can connect to the
exchange. It is located in colocation data centers in
Kenya’s two largest cities, Nairobi and Mombasa. Par-
ticipants include national, regional, and international
ISPs; government agencies; financial companies; and
international content and cloud providers such as
Amazon, Facebook, Google, and Microsoft.
Empirical evidence suggests that the benefits to
countries of moving up the data infrastructure ladder
are substantial. The average cost per gigabyte of wire-
less data per month drops from US$5.60 in countries
at stage 0 to US$0.80 in countries at stage 3, while the
corresponding cost per megabit for fixed data drops
from US$140.00 to US$4.30 per month.
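The average prices by stage in figure 5.11 can be expressed as percentage reductions relative to stage 0. The values below are the figure's averages (as the figure note states, MB denotes a megabit); the names in the sketch are illustrative.

```python
# Average monthly prices by ladder stage, US$ (figure 5.11).
# "fixed": fixed broadband per megabit; "wireless": wireless broadband per gigabyte.
PRICES = {
    0: {"fixed": 140.00, "wireless": 5.60},
    1: {"fixed": 18.30, "wireless": 4.50},
    2: {"fixed": 17.10, "wireless": 2.20},
    3: {"fixed": 4.30, "wireless": 0.80},
}


def reduction_from_stage0(stage: int, service: str) -> float:
    """Percentage price reduction at `stage` relative to stage 0 for `service`."""
    base = PRICES[0][service]
    return 100 * (base - PRICES[stage][service]) / base


for stage in (1, 2, 3):
    fixed = reduction_from_stage0(stage, "fixed")
    wireless = reduction_from_stage0(stage, "wireless")
    print(f"Stage {stage}: fixed {fixed:.1f}% lower, wireless {wireless:.1f}% lower")
```

Most of the fixed-broadband gap (about 87 percent) already closes at stage 1, while wireless prices fall most between stages 1 and 3. Because these are averages across different countries at each stage, they describe an association rather than the measured effect of any single country moving up the ladder.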
Nonetheless, the full benefits of developing
domestic data infrastructure become apparent only
when the local market is sufficiently competitive. For
example, because of its strategic geographic location
on the Horn of Africa, Djibouti’s DjIX is a regional
hub handling the exchange of considerable volumes
of internet traffic. However, even though Djibouti’s
data infrastructure provides valuable services to
neighboring countries, the monopolistic market
structure of the national telecommunications sec-
tor does not allow these advantages to be passed on
to domestic consumers, who face some of the most
unaffordable internet charges in the world.99
Policy makers have an important role to play in
shepherding IXPs through these various stages of
development. In the early stages, demand remains
incipient, and it is not possible to benefit from scale
economies in infrastructure development. Gov-
ernments and academic institutions can help ini-
tially with nascent IXPs by providing facilities and
resources for training. Encouraging ISPs to exchange
traffic locally helps boost demand for data services
by reducing the cost of exchange. If needed, govern-
ments may have to mandate participation, particu-
larly where dominant operators have been resistant.
As their IXPs grow, governments can reduce their
role, encouraging IXPs to become self-sustaining.
Open IXP policies and multistakeholder governance
are important for attracting non-ISPs to the member-
ship, including large content providers. Government
insistence on control over IXP practices discourages
private sector investment in the data ecosystem.100
A
supportive regulatory environment for IXPs, as well
as attention to sound governance practices, should
ensure that multiple ISPs as well as universities, large
enterprises, and other significant users make full
use of the available IXP infrastructure. For example,
in 2011 Bolivia legally mandated the creation of a
national IXP requiring the participation of local ISPs.
However, traffic growth was limited until 2018, when
improved governance arrangements incentivized
greater reliance on the IXP by local market players.101
Accessing cloud platforms
Just as there is growing reliance on colocation data
centers to store data, the processing of data is being
handled increasingly by cloud platforms. Cloud
platforms essentially enable users to access scal-
able data storage and computing resources across
the internet or other digital networks as and when
required. Continual enhancements in cutting-edge
computing capabilities, combined with significant
improvements in the capacity and speed of process-
ing, transmitting, and storing data, are making cloud
computing increasingly important in the delivery of
public and private services.102
Cloud platforms offer significant benefits in terms
of security, resilience, scale, and flexibility. Security is
arguably better on large cloud computing platforms
than what many businesses or governments could
achieve in-house.103
Strong security features include
ongoing data backups, redundant sites, and industry
certifications, as well as adherence to national data
protection regulations. However, moving data to the
cloud environment also presents new vulnerabilities
such as reduced visibility of assets and operations, or
the possibility that applications used to access cloud
services could be compromised. IT infrastructure
becomes more resilient as digital data and computing
power become geographically distributed. This resiliency is enhanced by organizing services into regions and availability zones and interconnecting data centers within the same geographic area. Cloud computing is attractive because it is often cheaper to share resources
on a common platform than to replicate hardware,
software, and storage requirements on individual
company sites. Small enterprises can then outsource
IT activities that they otherwise would not be able to
provide internally, while benefiting from the flexibil-
ity of immediate upgrades to the most recent analyt-
ics and storage technology.
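The resilience argument can be quantified under a strong simplifying assumption: if each availability zone is up with probability a and zones fail independently, a service replicated across n zones is unavailable only when all n fail, so its availability is 1 − (1 − a)^n. The 99 percent per-zone figure below is hypothetical, and real zones share some failure causes, so actual gains are smaller than this idealized calculation suggests.

```python
HOURS_PER_YEAR = 8760


def combined_availability(zone_availability: float, zones: int) -> float:
    """Probability that at least one of `zones` replicas is up.

    Assumes zones fail independently, which real deployments only approximate.
    """
    if not 0.0 <= zone_availability <= 1.0:
        raise ValueError("zone_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    return 1 - (1 - zone_availability) ** zones


# Hypothetical per-zone availability of 99 percent.
for n in (1, 2, 3):
    availability = combined_availability(0.99, n)
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    print(f"{n} zone(s): availability {availability:.6f}, about {downtime_hours:.2f} hours down a year")
```

Under this idealization, expected downtime drops from roughly 88 hours a year with one zone to under one hour with two.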
As broadband connectivity has become more
widely available around the world, cloud computing
has been growing rapidly, with industry revenues
exceeding US$180 billion in 2018, up 27 percent over
the previous year.104
A few large companies, almost all of them software and IT services firms based in the United States, dominate the cloud space. These hyperscale providers operate cloud data centers mainly in
high-income countries, with just a handful in large
middle-income nations such as Brazil and South
Africa, though not elsewhere in the developing world.
Free cloud services funded by advertising, such
as webmail and online social networks, are already
widely used in low- and middle-income nations.
Google Docs provides word processing, spreadsheet,
and presentation software used by millions around
the world.105
IBM offers several free services on its
cloud.106
However, sophisticated cloud services such
as storage and analysis of vast amounts of data can
be costly for developing economies because of the
cost of moving data internationally and the resulting
sacrifice in terms of speed.
One potential solution is to develop cloud platforms
at the regional level by aggregating demand to achieve
economies of scale. Regional harmonization of regu-
lations for data security, data protection,107
and data
sovereignty could further reduce compliance costs
and help induce major cloud providers to locate closer
to low- and middle-income countries. For example, in
March 2019 Microsoft launched the first data centers
from a large cloud provider in Africa, with locations
in Cape Town and Johannesburg, South Africa, and
potential wider relevance to southern Africa.108
Another approach is for countries with colocation
data centers to encourage the creation of “on-ramps”
to cloud computing services. These are present in
some 80 percent of high-income countries but only
in about 10 percent of middle-income countries such
as India and Indonesia, and not at all in low-income
countries.
Cloud on-ramps are private connections between
data centers and cloud providers. They allow clients
to interact directly with overseas cloud providers
through domestic IXPs located in colocation data
centers without needing to use the internet to access
cloud services.109
This process provides greater secu-
rity and reliability because data are not transmitted to
the cloud over public infrastructure but rather directly
via the on-ramp. Performance in terms of speed is also
greatly improved and costs are significantly lowered
because the cloud provider is responsible for man-
aging and routing the data traffic from the domestic
colocation data center to its cloud data center overseas
using the on-ramp. At the same time, cloud services
create demand for data centers because some appli-
cations require very high speed, which can only be
achieved when computational power is located close
to the user at the network’s “edge.”110
This description
underscores the complementarities between different
types of data infrastructure, such as IXPs, colocation
data centers, and cloud computing.
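The edge argument reduces to a latency budget: an application can run only at sites whose round-trip time fits within the delay it can tolerate. The sketch below is a hypothetical placement check; it reuses the Rwanda access times cited earlier in this chapter (5 and 200 milliseconds) as illustrative round-trip times, and the latency budgets and site names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    round_trip_ms: float


def feasible_sites(latency_budget_ms: float, sites: list[Site]) -> list[str]:
    """Names of sites whose round-trip time fits the latency budget, nearest first."""
    within_budget = [s for s in sites if s.round_trip_ms <= latency_budget_ms]
    return [s.name for s in sorted(within_budget, key=lambda s: s.round_trip_ms)]


# Illustrative round-trip times borrowed from the Rwanda example in the text.
SITES = [
    Site("overseas cloud data center", 200.0),
    Site("domestic colocation data center", 5.0),
]

# Hypothetical latency budgets for two kinds of application.
print(feasible_sites(20.0, SITES))   # latency-sensitive, interactive service
print(feasible_sites(500.0, SITES))  # batch analytics that tolerates delay
```

A delay-tolerant workload can run overseas, but a tight budget leaves only the domestic site, which is the sense in which cloud services create demand for nearby data centers.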
Big data analysis is increasingly taking place over
distributed cloud networks because the considerable
processing power needed is available only on the
cloud. Data are stored in one or more places and processed in others. The cloud has also enabled a new collaborative environment for software development in
which developers from around the world participate
in modifying code. The world’s largest open-source
platform, GitHub, hosts more than 100 million reposi-
tories used by 50 million developers worldwide.111
The
growth in new software projects is mainly coming
from low- and middle-income nations, with Africa
expanding more rapidly than any other region. Open-
source repositories in Africa created by software
developers grew 40 percent in 2019.112
Without skilled human resources, countries will
be limited in their ability to apply modern data infra-
structure to achieving economic and social impacts.
Workers are needed to create and maintain data
infrastructure, as well as to collect, store, manage,
and analyze large amounts of data. Although the skills
needed to collect, store, and manage data are available
in many parts of the world, those for analyzing big
datasets are limited. Data scientists—specialists in
math, computer, and analytical skills—who derive
value from large datasets are in short supply, and low-
and middle-income countries are at a disadvantage in
the global market for technical skills (see box 5.1).
Conclusions and recommendations
Low- and lower-middle-income countries continue to
face major challenges in connecting themselves to the
vital infrastructure that underpins the data-driven
economy. Many have yet to develop their own IXPs
and remain reliant on slow, expensive international
data transfers to access the World Wide Web. Colo-
cation data centers that allow further local storage
and processing of data, as well as caching of internet
content, are still not prevalent in low- and lower-
middle-income countries, while global cloud com-
puting resources are almost entirely concentrated in
high-income countries, with a limited availability of
on-ramps to facilitate access by the developing world.
Box 5.1 The brain drain—ICT professionals
Available data on the supply of and demand for infor-
mation and communication technology (ICT) skills paint
two distinct pictures.a
Among country income groups,
there does not seem to be wide divergence in the pro-
portion of students graduating with ICT degrees, which
typically falls in the 7–8 percent range overall, yet with
marked discrepancies between men and women (figure
B5.1.1, panel a). However, employment of ICT profession-
als is strongly correlated with country income groups,
since these workers account for 2.1 percent of total
employment in high-income nations, dropping to 0.1 per-
cent in low-income countries (figure B5.1.1, panel b).
The mismatch between supply and demand in low-
and middle-income economies prompts outward migra-
tion toward better employment opportunities in countries
where the digital sector is more developed. Evidence of
this brain drain already exists: all countries except high-
income are experiencing large outflows of skilled tech
workers (figure B5.1.2, panel a). Even where skilled data
workers have opportunities in low- and middle-income
countries, wage differentials could be a driver of migration.
The average wages for ICT employees are significantly
higher in high-income nations (figure B5.1.2, panel b), and
significant wage differentials exist across regions.
Figure B5.1.1 Low- and middle-income countries are educating ICT professionals but not retaining them
Sources: Institute for Statistics, United Nations Educational, Scientific, and Cultural Organization, Data of UIS.Stat (database), http://data.uis.unesco.org/; International Labour Organization, ILOSTAT Database, http://www.ilo.org/ilostat/. Data at http://bit.do/WDR2021-Fig-B5_1_1.
Note: For ICT graduates, panel a presents the latest available data between 2015 and 2018 for 120 economies. For ICT professionals, panel b plots the latest available data between 2016 and 2019 for 73 economies. Country income group percentages are collective country averages. ICT = information and communication technology.
[Figure: panel a, ICT graduates, by gender (male, female), as a percentage of total graduates, by country income group; panel b, ICT professionals, by country income group (world, high-income, upper-middle-income, lower-middle-income, low-income), as a percentage of total professionals]
At the same time, in poor countries large swathes
of the poorest and most disadvantaged segments
of society continue to be excluded from access to
broadband data services. For low-income countries,
particularly in Africa, the coverage gap for broadband
signal remains significant, affecting 30 percent of the
population. A major concern is the usage gap—the
vast majority of those who do not have data access
today live within range of a mobile signal, but they
face either affordability or literacy challenges that
prevent them from making use of the service. The
COVID-19 pandemic has drawn growing attention to
the consumption gap, which highlights the limited
volumes of data usage in the developing world and
the implications for the population’s ability to access
data-based services such as tele-education.
Box 5.1 The brain drain—ICT professionals (continued)
The relative penetration rate of data science skills relevant
to artificial intelligence across comparable occupations is
four times higher in high-income countries than in
low-income countries.b Given salary differentials of 5–10
times between data scientists in low- and high-income
countries, it is estimated that workers with these skills in
low-income countries are 33 percent more likely to migrate
than workers from high-income countries. There was a net
outflow of at least 70,000 workers from low- and middle-
income countries every year from 2015 to 2019.c
a. Data on data skills from national statistical offices are lacking in both availability and specificity, particularly for low- and middle-income countries. New
data sources, such as the professional networking platform LinkedIn, are emerging as sources of timely and granular labor market information,
albeit with limited representativeness. The LinkedIn data used in this Report have been validated against international metrics, where available, to assess
coverage and representativeness (Zhu, Fritzler, and Orlowski 2018).
b. Estimation based on the Skills Genome Benchmarking Methodology, using LinkedIn’s Skills Genome Country-Occupation data. This method allows a fair
comparison of the penetration of data science skills in every country in the dataset with that of a chosen benchmark (low-income countries, in this case) by
controlling for the occupations each country has in common with the benchmark.
c. WDR 2021 team calculations, based on Zhu, Fritzler, and Orlowski (2018).
Figure B5.1.2 Major wage differentials for ICT professionals create a brain
drain, especially in low- and middle-income countries
[Figure: panel a, net flow of workers with disruptive technology skills (per 10,000 workers), by country income group (high-income, upper-middle-income, lower-middle-income, low-income); panel b, mean nominal monthly earnings of ICT employees (US$), by the same income groups.]
Sources: International Labour Organization, ILOSTAT Database, http://www.ilo.org/ilostat/; LinkedIn data (2015–19). Data at http://bit.do/WDR2021-Fig-B5_1_2.
Note: According to the LinkedIn dataset classification, “disruptive technology skills” include knowledge and skills in the areas of materials science,
aerospace engineering, development tools, data science, robotics, artificial intelligence, human-computer interaction, genetic engineering,
nanotechnology, and fintech. ICT = information and communication technology.
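Panel a of Figure B5.1.2 reports migration of skilled workers as a net flow per 10,000 workers rather than as an absolute count, so that countries with very different workforce sizes can be compared. The sketch below shows one way such a normalized rate can be computed, assuming the common convention of arrivals minus departures divided by the relevant workforce and scaled to 10,000. The function name `net_flow_per_10k` and the numbers in the usage example are hypothetical; they are not taken from the LinkedIn data behind the figure, and the Report's exact denominator may differ.

```python
def net_flow_per_10k(arrivals: int, departures: int, workforce: int) -> float:
    """Net migration of skilled workers per 10,000 workers (hypothetical helper).

    Assumes the rate is (arrivals - departures) / workforce * 10,000.
    Positive values indicate a net inflow; negative values indicate a
    net outflow, that is, a brain drain.
    """
    if workforce <= 0:
        raise ValueError("workforce must be positive")
    return (arrivals - departures) / workforce * 10_000


if __name__ == "__main__":
    # Illustrative inputs only, not Report data: 1,200 skilled workers arrive,
    # 3,000 leave, out of a relevant workforce of 150,000.
    rate = net_flow_per_10k(arrivals=1_200, departures=3_000, workforce=150_000)
    print(f"{rate:.1f} per 10,000 workers")  # prints "-120.0 per 10,000 workers"
```

Normalizing by workforce size is what allows a small low-income country with a modest absolute outflow to register a larger brain drain than a large country with a bigger absolute outflow.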
Connecting poor people
When it comes to connecting people to data infra-
structure, the following steps are recommended.
Keep costs down through competition. Governments
should prioritize all measures to drive down the cost
of service provision. Lower costs improve the com-
mercial viability of services, thereby reducing cover-
age gaps and accelerating technology upgrades. They
also help to improve service affordability, thereby
reducing usage and consumption gaps. Governments
have two possible levers for reducing costs. The most
fundamental is creating competitive pressures along
the supply chain, including both its wholesale and
retail tiers, while addressing structural impediments
such as vertical integration. In addition, governments
could create a regulatory environment that supports
sharing bottleneck infrastructures in areas with low
data traffic that could not otherwise support compet-
itive provision of data infrastructure.
Harness private sector investment. To develop digital
infrastructure, governments should rely on the private
sector wherever possible. This calls for privatization
of state-owned incumbents and a policy of avoiding
state investment (such as through vendor-financed
models) in segments such as the national fiber-optic
backbone where the private sector is willing to invest.
The entry of new market players is an important
market trend, including the growing role of content
providers in building backbone infrastructure, which
will require careful consideration of competition and
regulatory issues such as net neutrality.113 With the
advent of 5G, industrial players are also expected
to become more active in the development of ICT
infrastructure.
Rethink universal service policies. The underperfor-
mance of traditional universal service funds points
to the need to rethink and modernize government
policy measures to support universal service. Mea-
sures such as license coverage obligations should be
considered. The role of innovative technologies (such
as TV white space) and new entrants (such as content
providers) in reaching remote populations is also
relevant. Supply-side subsidies should be competi-
tively awarded and carefully targeted to those pockets
that cannot be reached effectively after considering
all other available measures. Furthermore, universal
service funds could be redirected to addressing the
usage gap by funding digital literacy programs or
supporting access to lower-cost mobile devices. This
would entail targeting such resources more toward
disadvantaged population segments than to under-
served geographic areas.
Calibrate fiscal regimes carefully. The governments of
poor countries have typically regarded digital infra-
structure and associated data services as a potential
fiscal cash cow (through taxes, fees, and other charges)
in the context of low mobilization of public sector
revenues. This view has led to relatively high indirect
taxes on mobile devices and data services, significant
import duties on equipment, and in some cases high
reservation fees for spectrum access. However, there
are important trade-offs between the fiscal revenues
generated by the sector in the short term and the pace
of digital infrastructure rollout and service uptake in
the longer term, which also has implications for eco-
nomic growth and associated tax revenues over time.
The design of the fiscal regime for digital infrastruc-
ture and data services must therefore be carefully
thought through to balance these competing policy
objectives. There may be a case for giving lighter
tax treatment to low-end mobile devices to support
uptake by disadvantaged groups.
Support upgrades to new technologies. Governments
need to create an environment that enables acceler-
ated upgrades to higher-generation technologies.
The fiber-optic backbone is a critical prerequisite for
further upgrade of networks to 4G and 5G technol-
ogies. Thus measures to expand this network at any
stage, in partnership with the private sector, would be
a no-regrets strategy. In addition, reforming spectrum
policy so that operators can repurpose existing
spectrum allocations would be helpful.
License conditions could also be used to package such
regulatory allowances with obligations for data ser-
vice providers to provide wide-ranging service cover-
age. Finally, the complexity of these new technologies
will require adequate investments in cybersecurity
protection.114
Connecting poor countries
When connecting countries to data infrastructure,
the following steps are recommended.
Progressively develop domestic data infrastructure.
Governments need to pay much more attention to
the specific infrastructure required to support the
sharing, storage, and processing of large volumes of
data. To participate in the data-driven economy on a
competitive basis, countries must be able to perform
high-volume data operations at the greatest possible
speed and lowest possible cost. The infrastructures
that can meet these performance goals are internet
exchange points, colocation data centers, and cloud
computing.
Promote creation of internet exchange points. IXPs
remain scarce across the developing world, and even
where they do exist they often fail to achieve their
potential. Governments have a role in creating the
enabling conditions for such institutions to emerge
and be widely utilized. Better governance models are
needed so that IXPs can develop and become sus-
tainable. Collaboration between IXPs and key stake-
holders such as ISPs, government entities, research
networks, and content providers can help to raise
awareness, expand skills, and create the trust essen-
tial for IXPs to be successful.
Create a favorable environment for colocation data
centers. Neutral, privately owned colocation data cen-
ters are an essential part of data infrastructure and
critical for promoting the digital economy. They provide
a secure venue in which local and international companies
can store their data, and they help support the local
hosting industry. The willingness of private investors
to install such facilities in low- and middle-income
countries is affected by concerns about the enabling
environment—in particular, the availability of clean
and secure energy sources, as well as relative political
stability and ease of doing business. Governments can
catalyze the market by moving their online services to
data centers and by encouraging businesses to host
locally to create economies of scale, while establish-
ing a solid data protection framework to build trust.
Secure on-ramps to the cloud. Cloud computing
creates tremendous opportunities for low- and
middle-income nations to gain remote access to
advanced computing facilities for data management
and analytics. The cloud also enables collaborative
creation of software, thereby giving software devel-
opers around the world opportunities to participate.
However, cloud data centers and on-ramps are mainly
located in upper-middle- and high-income nations.
Low-income countries could induce the major cloud
providers to locate closer through a regional approach
that aggregates demand and harmonizes compliance
requirements for security, data protection, and sover-
eignty. Governments also need to foster an enabling
environment that encourages cloud providers to
locate in local data centers in order to provide an
on-ramp to their services.
Invest in and retain human resources. Realization of
the potential for data infrastructure to contribute to
economic development depends on adequate human
resources, particularly in frontier areas such as data
science and artificial intelligence. Although there
is an acute global scarcity of these skills, evidence
suggests that low- and middle-income nations are
producing some graduates in these fields. However,
wage differentials in a highly competitive global
market are leading to a powerful brain drain effect,
preventing those countries from harnessing these
skills. The brain drain is often exacerbated by the lack
of opportunities arising from undeveloped local data
infrastructure. Governments need to stimulate their
digital economies by encouraging private investment
in fiber-optic backbones and data centers that gener-
ate direct and indirect employment.
The recommendations presented here are orga-
nized within the maturity model framework in table
5.1, recognizing that different countries may be at
different stages of developing data infrastructure.
Table 5.1 Recommendations for data infrastructure improvements sequenced according
to a maturity model

Stage of country: Establishing fundamentals
Connecting people: Eliminate coverage gaps by reducing costs through wholesale and retail competition, as well as infrastructure sharing arrangements, and, where still required, providing well-crafted state support.
Connecting countries: Ensure adequate international bandwidth. Create a competitive market environment for international gateways and internet service providers.

Stage of country: Initiating data flows
Connecting people: Narrow usage gaps through digital literacy campaigns, investment in basic education, lower taxation and import duties on low-end handsets, and support of local ventures for manufacturing handsets.
Connecting countries: Encourage creation of the first domestic IXP and facilitate participation by all relevant domestic players. Allow additional IXPs to emerge and players to formalize and mature. Encourage arrangements to cache popular international internet content on local servers.

Stage of country: Optimizing the system
Connecting people: Upgrade digital networks to the latest generation to improve speed and efficiency and facilitate higher consumption.
Connecting countries: Create a supportive environment in which colocation data centers can emerge, integrate with IXPs, and provide on-ramp access to cloud services.

Source: WDR 2021 team.
Note: IXPs = internet exchange points.
Notes
1. Briglauer and Gugler (2019); Czernich et al. (2011); Katz
and Callorda (2018); Koutroumpis (2018); Minges (2015).
2. Bertschek and Niebel (2016).
3. Shapiro and Hassett (2012).
4. Hjort and Poulsen (2019).
5. van der Marel (2020).
6. See “How Does Data Travel on the Internet?” Networking
Guides, https://networkingguides.com/how-does-data-travel-over-the-internet/.
7. Cisco (2018).
8. Ericsson (2020).
9. Cisco (2020).
10. Monash University (2020).
11. Sandvine (2019).
12. Text messages allow traders to check on agricultural
prices, remind the sick when to take their medicine,
and help nurses register births. Mobile money has
unleashed add-on services in microinsurance, agri-
culture, and transportation, and it is facilitating the
deployment of off-grid energy by allowing users to
repay the cost of solar panels with micropayments.
Development agencies can make conditional cash
transfers to mobile money accounts, reducing costs
and increasing security.
13. Katz and Callorda (2018).
14. Czernich et al. (2009).
15. Anderson and Kumar (2019).
16. Broadband Commission (2019).
17. For further details on Sustainable Development Goal
9, Target 9.c, see Department of Economic and Social
Affairs, United Nations, “Goals: 9, Build Resilient
Infrastructure, Promote Inclusive and Sustainable
Industrialization and Foster Innovation,” https://sdgs.un.org/goals/goal9.
18. See, for example, Michie (1997) or Madden (2010).
19. SDG Target 9.c states: “Significantly increase access
to information and communications technology and
strive to provide universal and affordable access to the
Internet in least developed countries by 2020” (Depart-
ment of Economic and Social Affairs, United Nations,
“Goals: 9, Build Resilient Infrastructure, Promote
Inclusive and Sustainable Industrialization and Foster
Innovation,” https://sdgs.un.org/goals/goal9).
20. Broadband Commission (2019).
21. Ericsson (2020).
22. Oughton et al. (2018).
23. Oughton et al. (2018).
24. World Bank (2016).
25. GSMA (2019c).
26. Kapko (2020).
27. Broadband Commission (2019).
28. Broadband Commission (2019).
29. GSMA (2019b).
30. See World Bank (2019c). One example is Liquid
Telecom, which has rolled out 70,000 kilometers of
fiber-optic cable through several African nations. See
Liquid Telecom, “Our Network,” https://www.liquidtelecom.com/about-us/our_network.
31. For example, landlocked Mongolia’s north-south fiber-
optic backbone connecting it to China and the Russian
Federation runs along the railway (Tsolmondelger 2019).
32. Strusani and Houngbonon (2020).
33. UN-OHRLLS (2017).
34. See the information on the Adopting TV White Spaces
Project in Colombia (ITU 2018c, 33).
35. Loon (2020).
36. Iridium Communications (2020).
37. Intelecom Research and Consultancy (2016).
38. GSMA (2013).
39. World Wide Web Foundation and A4AI (2018).
40. ESCAP (2017).
41. ITU (2018a).
42. Chen (2021).
43. ITU (2018b).
44. GSMA (2020).
45. The United Nations Educational, Scientific, and Cul-
tural Organization (UNESCO) defines digital literacy
as “the ability to access, manage, understand, integrate,
communicate, evaluate and create information safely
and appropriately through digital technologies for
employment, decent jobs and entrepreneurship. It
includes competences that are variously referred to as
computer literacy, ICT literacy, information literacy
and media literacy” (Law et al. 2018).
46. Chen (2021).
47. Chen (2021).
48. Chen (2021).
49. GSMA (2017).
50. GSMA (2019a).
51. See Ministry of ICT and Innovation, “Digital Ambassa-
dors Programme,” Kigali, Rwanda, https://www.minict
.gov.rw/projects/digital-ambassadors-programme.
52. Radovanović et al. (2020).
53. Internet Society (2015).
54. Silver and Smith (2019).
55. GSMA (2019a).
56. A4AI (2020).
57. MTN (2020).
58. JioPhone, “Jio Digital Life,” Reliance Jio Infocomm Ltd,
Mumbai, India, https://www.jio.com/en-in/jiophone.
59. New China (2018).
60. A4AI (2020).
61. GSMA (2019d).
62. GSMA (2020).
63. Telefónica S.A., “Quarterly Results: 2020 January–
September,” https://www.telefonica.com/en/web/shareholders-investors/financial_reports/quarterly-reports/.
64. A4AI (2019).
65. Sonia (2020).
66. Chen and Minges (2021).
67. Silver et al. (2019).
68. ITU (2018c).
69. See Alliance for Affordable Internet, “Affordable Internet
Is ‘1 for 2’,” https://a4ai.org/affordable-internet-is-1-for-2.
70. See “Advocacy Target 2” (Broadband Commission
2020).
71. BBC News (2019a).
72. BBC News (2019b).
73. Based on information provided by one of Cambodia’s
seven mobile operators, Smart (Smart Axiata 2019).
74. See the speed graph provided by West Central Tele-
phone Association, Sebeka, MN, https://www.wcta
.net/speed-demo/.
75. TRAI (2019).
76. Malisuwan, Tiamnara, and Suriyakrai (2015).
77. Sanni (2016).
78. Capitel (2016).
79. RURA (2019).
80. TeleGeography, Submarine Cables (database), https://
www.submarinecablemap.com/.
81. Miller (2019).
82. Ahmad and Salvadori (2020).
83. See World Bank, “West Africa Regional Communica-
tions Infrastructure Project, APL 2,” https://projects
.worldbank.org/en/projects-operations/project-detail
/P123093.
84. World Bank (2019a).
85. See TeleGeography, Submarine Cable Frequently
Asked Questions, https://www2.telegeography.com/submarine-cable-faqs-frequently-asked-questions.
86. Baynes (2018).
87. The development of domestic data infrastructure
should not be confused with the question of data
localization. Data localization, a regulatory issue discussed
at some length under the trade section of chapter 7,
concerns the adoption of government restrictions
requiring that a country’s data be stored and sometimes
processed on national territory, often with associated
government controls on cross-border data transfers.
Although domestic data infrastructure is a prerequisite
for data localization, the development of domestic data
infrastructure serves many other critical functions. In
particular, it supports the cost-effective exchange of
data among domestic parties and facilitates the access
of country nationals to data from other jurisdictions by
allowing copies of such data to be stored locally.
88. Agudelo et al. (2014).
89. Internet Society (2017).
90. The count of the number of IXPs in the world differs
depending on the source. For example, PeeringDB
(https://www.peeringdb.com/) reported 786 in June
2020. The variations are often due to differences in
definitions (such as whether private peering facilities
are included).
91. See Packet Clearing House, “Packet Clearing House
Report on Internet Exchange Point Locations,” https://
www.pch.net/ixp/summary.
92. DE-CIX (2015).
93. Dobran (2018). Colocation data centers are vulnerable
to physical and cybersecurity threats. They may be an
attractive target for cybercriminals because they host
large amounts of data and private information, all in
the same location. However, because they have more
resources, colocation sites can invest in better security
protections than what could be achieved in-house for a
typical small or medium-size business.
94. PeeringDB, https://www.peeringdb.com/.
95. CW (2016).
96. Comini, Srinivasan, and Minges (forthcoming).
97. EU-AITF (2018).
98. See Interconnect Help, Google, “Introduction to GGC,”
https://support.google.com/interconnect/answer/9058809?hl=en.
99. Comini, Srinivasan, and Minges (forthcoming).
100. Balancing Act (2019).
101. Comini, Srinivasan, and Minges (forthcoming).
102. UNCTAD (2013).
103. All large cloud providers have International Organiza-
tion for Standardization/International Electrotechnical
Commission (ISO/IEC) 27000 certification compliant
with regulatory and legal requirements that relate to
the security of information (ISO/IEC, “Popular Standards:
ISO/IEC 27001, Information Security Management,”
https://www.iso.org/isoiec-27001-information-security.html).
104. IDC (2019).
105. See Google, “Google Docs,” https://www.google.com/docs/about/.
106. International Business Machines, “IBM Cloud: Free
Tier,” https://www.ibm.com/cloud/free.
107. See chapter 7 for a discussion about data localization
regulations.
108. Keane (2019).
109. See DP Facilities, “The Critical Role Data Centers Play
in Today’s Enterprise Networks: Part 3, Why Cloud
On-Ramps Are Key for an Enterprise Migrating to
the Cloud,” https://www.dpfacilities.com/blog/cloud-onramps-are-key-to-migration/.
110. World Bank (2019b).
111. GitHub, “Where the World Builds Software,” https://
github.com/.
112. GitHub, “The 2020 State of the OCTO–VERSE,” https://
octoverse.github.com/.
113. Because of the complexity of the topic, this chapter
does not address the issue of net neutrality and its
impact on market regulation and competition.
114. With their distributed routing approach and soft-
ware-driven design, 5G networks present an array of
new cybersecurity challenges that must be addressed
before these networks are widely deployed. More-
over, IoT devices are often manufactured without
adequate cybersecurity protections, and they have
security vulnerabilities. These vulnerabilities can
be exploited by bad actors who can gain access to
the network or harness the computational power of
an IoT device for other malicious purposes, such as
distributed denial of service attacks. A forthcoming
World Bank 5G flagship report will address in detail
the cybersecurity issues raised by the uptake of the
5G technology.
References
A4AI (Alliance for Affordable Internet). 2019. “The 2019 Affordability
Report.” October, World Wide Web Foundation,
Washington, DC. https://a4ai.org/affordability-report.
A4AI (Alliance for Affordable Internet). 2020. “From Luxury
to Lifeline: Reducing the Cost of Mobile Devices to Reach
Universal Internet Access.” August 6, World Wide Web
Foundation, Washington, DC. https://docs.google.com/document/d/1YFXbUr-W0LTOAXs9QEtlf8oEBsrakQ_lLeHUyb-6oEY/edit.
Agudelo, Mauricio, Raúl Katz, Ernesto Flores-Roux, María
Cristina Duarte Botero, Fernando Callorda, and Taylor
Berry. 2014. Expansión de infraestructura regional para
la interconexión de tráfico de internet en América Latina.
Caracas, República Bolivariana de Venezuela: Development
Bank of Latin America. http://scioteca.caf.com/handle/123456789/522.
Ahmad, Najam, and Kevin Salvadori. 2020. “Building
2Africa, a Transformative Subsea Cable to Better Connect
Africa.” Connectivity, Networking & Traffic (blog),
May 13, 2020. https://engineering.fb.com/2020/05/13/connectivity/2africa/.
Anderson, Monica, and Madhumitha Kumar. 2019. “Digital
Divide Persists Even as Lower-Income Americans Make
Gains in Tech Adoption.” Fact Tank, News in the Numbers
(blog), May 7, 2019. https://www.pewresearch.org/fact-tank/2019/05/07/digital-divide-persists-even-as-lower-income-americans-make-gains-in-tech-adoption/.
Balancing Act. 2019. “Ugandan Government and Regulator
Want to Take Control of Local IXP: The Latest in a Dis-
turbing Pattern of Regulatory Interventions.” Balancing
Act News, July 12. https://www.balancingact-africa.com/news/telecoms-en/45586/ugandan-government-and-regulator-want-to-take-control-of-local-ixp-the-latest-in-a-disturbing-pattern-of-regulatory-interventions.
Baynes, Chris. 2018. “Entire Country Taken Offline for Two
Days after Undersea Internet Cable Cut.” Independent,
April 11. https://www.independent.co.uk/news/world/africa/mauritiana-internet-cut-underwater-cable-offline-days-west-africa-a8298551.html.
BBC News. 2019a. “Mobile Data: Why India Has the World’s
Cheapest.” BBC News, March 18. https://www.bbc.com/news/world-asia-india-47537201.
BBC News. 2019b. “Reliance Jio: India’s Cheapest Data Pro-
vider to Raise Prices.” BBC News, November 20. https://
www.bbc.com/news/world-asia-india-50484594.
Bertschek, Irene, and Thomas Niebel. 2016. “Mobile and
More Productive? Firm-Level Evidence on the Productivity
Effects of Mobile Internet Use.” Telecommunications
Policy 40 (9): 888–98. https://doi.org/10.1016/j.telpol.2016.05.007.
Briglauer, Wolfgang, and Klaus Gugler. 2019. “Go for Giga-
bit? First Evidence on Economic Benefits of High-Speed
Broadband Technologies in Europe.” JCMS, Journal of
Common Market Studies 57 (5): 1071–90. https://doi.org/10.1111/jcms.12872.
Broadband Commission (United Nations Broadband
Commission for Sustainable Development). 2019. Con-
necting Africa through Broadband: A Strategy for Doubling
Connectivity by 2021 and Reaching Universal Access by 2030.
October. Geneva: Broadband Commission Working
Group on Broadband for All, International Telecommunication
Union. https://www.broadbandcommission.org/Documents/working-groups/DigitalMoonshotforAfrica_Report.pdf.
Broadband Commission (United Nations Broadband Com-
mission for Sustainable Development). 2020. The State
of Broadband 2020: Tackling Digital Inequalities; A Decade for
Action. September. Geneva: International Telecommunication
Union. https://broadbandcommission.org/Pages/targets/Target-3.aspx.
Buckholtz, Alison. 2019. “Africa’s IT Talent Pool.” IFC Insights
(blog), December 2019. https://www.ifc.org/wps/wcm/connect/news_ext_content/ifc_external_corporate_site/news+and+events/news/insights/africa-it-talent.
CW (Cushman and Wakefield). 2016. “Data Centre Risk
Index.” CW, London. https://verne-global-lackey.s3
.amazonaws.com/uploads%2F2017%2F1%2Fb5e0a0da
-5ad2-01b3-1eb8-8f782f22a534%2FC%26W_Data_Centre
+Risk_Index_Report_2016.pdf.
Capitel. 2016. “Economics of Mobile Data in Frontier and
Emerging Markets.” Techno-Commercial Planning
and Transaction Advisory in Telecoms, Media, and
Technology, Capitel, Gurgaon, India; Singapore. https://
capitelpartners.com/wp-content/uploads/2018/04
/Capitel-Economics-of-mobile-data.pdf.
Chen, Rong. 2021. “A Demand-Side View of Mobile Internet
Adoption in the Global South.” Policy Research Work-
ing Paper 9590, World Bank, Washington, DC. http://
documents.worldbank.org/curated/en/49287161635092
9155/A-Demand-Side-View-of-Mobile-Internet-Adoption
-in-the-Global-South.
Chen, Rong, and Michael Minges. 2021. “Minimum Data
Consumption: How Much Is Needed to Support On-Line
Activities, and Is It Affordable?” Digital Development:
Analytical Insights, Note 3, World Bank, Washington, DC.
http://pubdocs.worldbank.org/en/742001611762098567/Analytical-Insights-Series-Jan-2021.pdf.
Cisco. 2018. “Cisco Visual Networking Index: Forecast and
Trends, 2017–2022.” White Paper, November, Cisco
Systems, San Jose, CA. https://cloud.report/Resources/Whitepapers/eea79d9b-9fe3-4018-86c6-3d1df813d3b8_white-paper-c11-741490.pdf.
Cisco. 2020. “Cisco Annual Internet Report (2018–2023).”
White Paper, March, Cisco Systems, San Jose, CA. https://
www.cisco.com/c/en/us/solutions/collateral/executive
-perspectives/annual-internet-report/white-paper-c11
-741490.html.
Comini, Niccolo, Sharada Srinivasan, and Michael Minges.
Forthcoming. “The Role of a National Data Infrastruc-
ture: Internet Exchange Points, Content Delivery Net-
works, and Data Centers.” WDR 2021 background paper,
World Bank, Washington, DC.
Czernich, Nina, Oliver Falck, Tobias Kretschmer, and Ludger
Woessmann. 2009. “Broadband Infrastructure and Eco-
nomic Growth.” CESifo Working Paper 2861 (December),
Munich Society for the Promotion of Economic Research,
Center for Economic Studies, Ludwig Maximilian University
and Ifo Institute for Economic Research, Munich.
https://www.cesifo.org/DocDL/cesifo1_wp2861.pdf.
Czernich, Nina, Oliver Falck, Tobias Kretschmer, and
Ludger Woessmann. 2011. “Broadband Infrastructure
and Economic Growth.” Economic Journal 121 (552): 505–
32. doi:10.1111/j.1468-0297.2011.02420.x.
DE-CIX (Deutscher Commercial Internet Exchange). 2015.
“DE-CIX: From the Interconnection of Three Internet
Service Providers back in 1995 to the World’s Leading
Internet Exchange.” DE-CIX Exchange, Frankfurt. https://www.de-cix.net/Files/d4167da7aafe0da34f384181606db6b8572cf2c1/DE-CIX_From-the-interconnection-of-three-ISP-to-the-worlds-leading-IX.pdf.
Dobran, Bojana. 2018. “Data Center Tier Classification
Levels Explained (Tier 1, 2, 3, 4).” PhoenixNAP (blog),
May 2, 2018. https://phoenixnap.com/blog/data-center-tiers-classification.
Ericsson. 2020. “Mobility Report.” Ericsson, Stockholm.
https://www.ericsson.com/4adc87/assets/local/mobility-report/documents/2020/november-2020-ericsson-mobility-report.pdf.
ESCAP (United Nations Economic and Social Commission
for Asia and the Pacific). 2017. “The Impact of Universal
Service Funds on Fixed-Broadband Deployment and
Internet Adoption in Asia and the Pacific.” Asia-Pacific
Information Superhighway (AP-IS) Working Paper, ICT
and Development Section, Information and Communi-
cations Technology and Disaster Risk Reduction Divi-
sion, ESCAP, Bangkok, Thailand. https://www.unescap
.org/sites/default/files/Universal%20Access%20and%20
Service%20Funds.pdf.
EU-AITF (European Union–Africa Infrastructure Trust Fund).
2018. “EU–Africa Infrastructure Trust Fund: Annual
Report 2017.” European Investment Bank, Luxembourg.
https://www.eib.org/attachments/country/eu_africa_infrastructure_trust_fund_annual_report_2017_en.pdf.
GSMA (GSM Association). 2013. Universal Service Fund Study.
London: GSMA. https://www.gsma.com/publicpolicy/wp-content/uploads/2016/09/GSMA2013_Report_SurveyOfUniversalServiceFunds.pdf.
GSMA (GSM Association). 2017. Connected Society: Mobile
Internet Skills Training Toolkit; A Guide for Training People in
Basic Mobile Internet Skills. London: GSMA. https://www
.gsma.com/mobilefordevelopment/connected-society
/mistt/.
GSMA (GSM Association). 2019a. “Connected Society:
Mobile Internet Skills Training Toolkit; Banglalink
Pilot Evaluation.” GSMA, London. https://www.gsma.com/mobilefordevelopment/resources/mobile-internet-skills-training-toolkit-banglalink-pilot-evaluation/.
GSMA (GSM Association). 2019b. “Connected Society: The
State of Mobile Internet Connectivity 2019.” GSMA,
London. https://www.gsma.com/mobilefordevelopment/resources/the-state-of-mobile-internet-connectivity-report-2019/.
GSMA (GSM Association). 2019c. “GSMA Connected Soci-
ety: Closing the Coverage Gap; How Innovation Can
Drive Rural Connectivity.” GSMA, London. https://www
.gsma.com/mobilefordevelopment/wp-content/uploads
/2019/07/GSMA-Closing-The-Coverage-Gap-How
-Innovation-Can-Drive-Rural-Connectivity-Report-2019
.pdf.
GSMA (GSM Association). 2019d. “Rethinking Mobile Tax-
ation to Improve Connectivity.” GSMA, London. https://
www.gsma.com/publicpolicy/wp-content/uploads
/2019/02/Rethinking-mobile-taxation-to-improve
-connectivity_Feb19.pdf.
GSMA (GSM Association). 2020. “Connected Women: The
Mobile Gender Gap Report 2020.” GSMA, London.
https://www.gsma.com/mobilefordevelopment/wp-content/uploads/2020/05/GSMA-The-Mobile-Gender-Gap-Report-2020.pdf.
Hjort, Jonas, and Jonas Poulsen. 2019. “The Arrival of Fast
Internet and Employment in Africa.” American Economic
Review 109 (3): 1032–79.
IDC (International Data Corporation). 2019. “Worldwide
Public Cloud Services Revenue Grows to Nearly $183
Billion in 2018, Led by the Top 5 Service Providers and
Accelerating Public Cloud Services Spending in China.”
IDC, Framingham, MA. https://www.idc.com/getdoc.jsp?containerId=prUS45411519.
Intelecom Research and Consultancy. 2016. “ICT Access
Gaps Study: Final Report.” Communications Authority
of Kenya, Nairobi. https://ca.go.ke/wp-content/uploads/2018/02/ICT-Access-Gaps-Report-April-2016-.pdf.
Internet Society. 2015. “Local Content: An Internet Society
Public Policy Briefing.” Public Policy, Internet Society,
Reston, VA. https://www.internetsociety.org/policybriefs/localcontent/.
Internet Society. 2017. “The Benefits of Local Content Host-
ing: A Case Study.” Internet Society, Reston, VA. https://
www.internetsociety.org/wp-content/uploads/2017/08
/ISOC_LocalContentRwanda_report_20170505.pdf.
Iridium Communications. 2020. 2019 Annual Report: Reliability
Above All. McLean, VA: Iridium Communications. https://
investor.iridium.com/annual-reports.
ITU (International Telecommunication Union). 2018a. ICTs,
LDCs, and the SDGs: Achieving Universal and Affordable
Internet in the Least Developed Countries. Thematic Report:
ITU Development, LDCs and Small Island Developing
States Series. Geneva: ITU. https://www.itu.int/en/ITU-D/LDCs/Pages/Publications/LDCs/D-LDC-ICTLDC-2018-PDF-E.pdf.
ITU (International Telecommunication Union). 2018b.
“ITU’s Approach to Bridging the Digital Gender Divide.”
ITU News, October 22, 2018. https://www.itu.int/es/myitu
/News/2020/05/22/11/16/ITU-s-approach-to-bridging-the
-digital-gender-divide.
ITU (International Telecommunication Union). 2018c. Report
on the WSIS Stocktaking 2018. Geneva: ITU.
Kapko, Matt. 2020. “Rakuten Mobile Delivers Its Virtualized
Reality.” SDxCentral News, April 8, 2020. https://www
.sdxcentral.com/articles/news/rakuten-mobile-delivers
-its-virtualized-reality/2020/04/.
Katz, Raul, and Fernando Callorda. 2018. “The Economic
Contribution of Broadband, Digitization, and ICT
Regulation.” Thematic Report: Regulatory and Market
Environment Series, International Telecommunication
Union, Geneva. https://www.itu.int/pub/D-PREF-EF.BDR
-2018.
Keane, Tom. 2019. “Microsoft Opens First Datacenters in
Africa with General Availability of Microsoft Azure.”
Microsoft Azure Announcements (blog), March 6, 2019.
https://azure.microsoft.com/en-us/blog/microsoft-opens
182 | World Development Report 2021
-first-datacenters-in-africa-with-general-availability-of
-microsoft-azure/.
Koutroumpis, Pantelis. 2018. “The Economic Impact of Broadband: Evidence from OECD Countries.” Ofcom, London. https://www.ofcom.org.uk/__data/assets/pdf_file/0025/113299/economic-broadband-oecd-countries.pdf.
Law, Nancy, David Woo, Jimmy de la Torre, and Gary Wong. 2018. A Global Framework of Reference on Digital Literacy Skills for Indicator 4.4.2. Information Paper 51, UIS/2018/ICT/IP/51. Montreal: Institute for Statistics, United Nations Educational, Scientific, and Cultural Organization. http://uis.unesco.org/sites/default/files/documents/ip51-global-framework-reference-digital-literacy-skills-2018-en.pdf.
Loon. 2020. “The Stratosphere.” Loon, Mountain View, CA. https://loon.com/static/pdfs/Stratosphere_Whitepaper_May1.pdf.
Madden, Gary. 2010. “Economic Welfare and Universal Service.” Telecommunications Policy 34 (1–2): 110–16. https://www.sciencedirect.com/science/article/abs/pii/S0308596109001141.
Malisuwan, Settapong, Noppadol Tiamnara, and Nattakit Suriyakrai. 2015. “A Post-Auction Review of 2.1 GHz Spectrum Licensing Obligations in Thailand.” International Journal of Innovation, Management, and Technology 6 (4): 285–89. https://doi.org/10.7763/IJIMT.2015.V6.616.
Manulis, Mark, Chris P. Bridges, Richard Harrison, Venkkatesh Sekar, and Andy Davis. 2020. “Cyber Security in New Space: Analysis of Threats, Key Enabling Technologies, and Challenges.” International Journal of Information Security. Published ahead of print, May 12, 2020. https://link.springer.com/article/10.1007/s10207-020-00503-w.
Michie, Jonathan. 1997. “Network Externalities: The Economics of Universal Access.” Utilities Policy 6 (4): 317–24. http://www.sciencedirect.com/science/article/pii/S095717879700026X.
Miller, Jayne. 2019. “This Is What Our 2019 Submarine Cable Map Shows Us about Content Provider Cables.” TeleGeography (blog), March 19, 2019. https://blog.telegeography.com/this-is-what-our-2019-submarine-cable-map-shows-us-about-content-provider-cables.
Minges, Michael. 2015. “Exploring the Relationship between Broadband and Economic Growth.” Background paper, World Development Report 2016: Digital Dividends, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/178701467988875888/Exploring-the-relationship-between-broadband-and-economic-growth.
Monash University. 2020. “World’s Fastest Internet Speed from a Single Optical Chip.” ScienceDaily, May 22, 2020. https://www.sciencedaily.com/releases/2020/05/200522095504.htm.
MTN. 2020. “#Good Together: Sustainability Report for the Year Ended 31 December 2019.” March 31, 2020, MTN Group, Johannesburg, South Africa. https://www.mtn.com/wp-content/uploads/2020/03/MTN-Sustainability-report.pdf.
NewChina. 2018. “Chinese Phone Maker Celebrates 100-Mln-USD Export Milestone in Ethiopia.” New China, November 21, 2018. http://www.xinhuanet.com/english/2018-11/21/c_137622555.htm.
Oughton, Edward, Zoraida Frias, Tom Russell, Douglas Sicker, and David D. Cleevely. 2018. “Towards 5G: Scenario-Based Assessment of the Future Supply and Demand for Mobile Telecommunications Infrastructure.” Technological Forecasting and Social Change 133 (August): 141–55. https://doi.org/10.1016/j.techfore.2018.03.016.
Radovanović, Danica, Christine Holst, Sarbani Banerjee Belur, Ritu Srivastava, Georges Vivien Houngbonon, Erwan Le Quentrec, Josephine Miliza, Andrea S. Winkler, and Josef Noll. 2020. “Digital Literacy Key Performance Indicators for Sustainable Development.” Social Inclusion 8 (2): 151–67. https://doi.org/10.17645/si.v8i2.2587.
RURA (Rwanda Utilities Regulatory Authority). 2019. “Statistics Report for Telecom, Media, and Broadcasting Sector as of the Fourth Quarter of the Year 2019.” ICT Quarterly Statistics Report, RURA, Kigali, Rwanda. https://rura.rw/fileadmin/Documents/ICT/statistics/ICT_Quarterly_Statistics_report_as_of_December_2019.pdf.
Sandvine. 2019. “The Global Internet Phenomena Report.” Sandvine, Fremont, CA. https://www.sandvine.com/hubfs/Sandvine_Redesign_2019/Downloads/Internet%20Phenomena/Internet%20Phenomena%20Report%20Q32019%2020190910.pdf.
Sanni, Shola. 2016. “How to Implement Spectrum Re-Farming.” GSM Association, London. https://www.gsma.com/spectrum/wp-content/uploads/2017/11/10-Day-2-Session-3-How-to-Implement-Spectrum-Refarming-Shola-Sanni.pdf.
Shapiro, Robert J., and Kevin A. Hassett. 2012. “The Employment Effects of Advances in Internet and Wireless Technology: Evaluating the Transitions from 2G to 3G and from 3G to 4G.” NDN and New Policy Institute, Washington, DC. http://www.sonecon.com/docs/studies/Wireless_Technology_and_Jobs-Shapiro_Hassett-January_2012.pdf.
Silver, Laura, and Aaron Smith. 2019. “In Some Countries, Many Use the Internet without Realizing It.” Fact Tank, News in the Numbers (blog), May 2, 2019. https://www.pewresearch.org/fact-tank/2019/05/02/in-some-countries-many-use-the-internet-without-realizing-it/.
Silver, Laura, Emily A. Vogels, Mara Mordecai, Jeremiah Cha, Raea Rasmussen, and Lee Rainie. 2019. “Mobile Divides in Emerging Economies.” Internet & Technology (blog), November 20, 2019. https://www.pewresearch.org/internet/2019/11/20/mobile-divides-in-emerging-economies/.
Smart Axiata. 2019. “Sweating Assets to Drive Down Cost/GB.” Slide presentation at “Axiata Analyst & Investor Day,” Smart Axiata, Phnom Penh, Cambodia. https://axiata.listedcompany.com/misc/6c_Operational_Excellence_in_Action-Smart.pdf.
Sonia, Jorge. 2020. “Covid-19 Shows We Need More Than Basic Internet Access: We Need Meaningful Connectivity.” Alliance for Affordable Internet, Washington, DC. https://a4ai.org/covid-19-shows-we-need-more-than-basic-internet-access-we-need-meaningful-connectivity/.
Strusani, Davide, and Georges Vivien Houngbonon. 2020. “Accelerating Digital Connectivity through Infrastructure Sharing.” EMCompass Note 79, International Finance Corporation, Washington, DC. https://www.ifc.org/wps/wcm/connect/Publications_EXT_Content/IFC_External_Publication_Site/Publications_Listing_Page/EMCompass-Note-79-Digital-Infrastructure-Sharing.
TRAI (Telecom Regulatory Authority of India). 2019. “Wireless Data Services in India: An Analytical Report.” TRAI, New Delhi. https://trai.gov.in/sites/default/files/Wireless_Data_Service_Report_21082019.pdf.
Tsolmondelger, Odkhuu. 2019. “ICT Infrastructure along Transport Network.” Presentation, November 20, 2019, Information Communications Network LLC, Ulaanbaatar, Mongolia. https://www.unescap.org/sites/default/files/ICT%20Infrastructure%20Along%20Transport%20Network%2C%20Mongolia%20NetCom.pdf.
UNCTAD (United Nations Conference on Trade and Development). 2013. Information Economy Report 2013: The Cloud Economy and Developing Countries. Geneva: United Nations. https://unctad.org/system/files/official-document/ier2013_en.pdf.
UN-OHRLLS (Office of the High Representative for the Least Developed Countries, Landlocked Developing Countries, and Small Island Developing States). 2017. “Leveraging Investments in Broadband for National Development: The Case of Rwanda and Senegal.” UN-OHRLLS, United Nations, New York. https://unohrlls.org/custom-content/uploads/2017/07/Leveraging-Investments-in-Broadband-for-National-Development-2017.pdf.
van der Marel, Erik. 2020. “Sources of Comparative Advantage in Data-Related Services.” Working Paper EUI RSCAS 2020/30, Robert Schuman Center for Advanced Studies, Global Governance Programme-393, European University Institute, San Domenico di Fiesole (FI), Italy. http://hdl.handle.net/1814/66987.
World Bank. 2016. World Development Report 2016: Digital Dividends. Washington, DC: World Bank. https://www.worldbank.org/en/publication/wdr2016.
World Bank. 2019a. “Implementation Completion and Results Report, Tonga: Pacific Regional Connectivity Program.” Report No. ICR00004623, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/844361556573010769/Tonga-Pacific-Regional-Connectivity-Program-Project.
World Bank. 2019b. Information and Communications for Development 2018: Data-Driven Development. Washington, DC: World Bank. https://openknowledge.worldbank.org/handle/10986/30437.
World Bank. 2019c. Innovative Business Models for Expanding Fiber-Optic Networks and Closing the Access Gaps. Washington, DC: Digital Development Partnership, World Bank. https://documents.worldbank.org/en/publication/documents-reports/documentdetail/674601544534500678/main-report.
World Wide Web Foundation and A4AI (Alliance for Affordable Internet). 2018. “Universal Service and Access Funds: An Untapped Resource to Close the Gender Digital Divide.” World Wide Web Foundation, Washington, DC. https://webfoundation.org/research/closing-gender-digital-divide-in-africa.
Zhu, Tingting Juni, Alan Fritzler, and Jan Alexander Kazimierz Orlowski. 2018. “World Bank Group–LinkedIn Data Insights: Jobs, Skills, and Migration Trends; Methodology and Validation Results.” World Bank, Washington, DC. https://datacatalog.worldbank.org/dataset/world-bank-group-linkedin-digital-data-development/resource/b16c3403-2d59-45ca-bc31.
Spotlight 5.1
How the COVID-19 pandemic has recalibrated expectations of reasonable data consumption and highlighted the digital divide
Massive increases in data traffic herald greater use of online platforms and underscore the importance of including currently excluded users.
Around the world, the COVID-19 pandemic has resulted in millions of people using videoconferencing for work, for learning from home, and for other activities. The three main platforms (Zoom, Cisco Webex, and Microsoft Teams) reported around 700 million daily users in March–April 2020. Adding in other platforms, the number of users was equal to roughly one-tenth of the world’s population. Zoom’s average number of users jumped from 10 million in December 2019 to 300 million in April 2020.¹ Cisco’s Webex recorded 324 million users in March 2020, doubling from January 2020.² Microsoft Teams had 75 million daily users in April 2020.³ Being homebound also resulted in more use of social media, video streaming, and online gaming.⁴
These online activities have driven massive increases in data traffic. In Spain, internet traffic increased 40 percent in the week following the shutdown, while mobile data traffic rose 25 percent.⁵ The German internet exchange DE-CIX, one of the world’s largest, recorded a 10 percent increase in traffic during the first two weeks in March, when shelter-at-home measures were implemented in the country, breaking the world record for data throughput. Videoconferencing traffic on DE-CIX rose 50 percent, and gaming and social media traffic grew 25 percent.⁶
For the most part, telecommunications networks have stood up well to this massive increase in traffic. Networks are engineered to handle peak traffic, so for large parts of the day their capacity is more than sufficient. Similar to flattening the curve for COVID-19, telecommunications operators worked to smooth fluctuations in traffic flows during the height of the pandemic in spring 2020. Widespread deployment of high-capacity fiber-optic backbone and access networks has proved vital for dealing with the surge in traffic. During COVID-19, regulators have increased mobile data capacity by releasing spectrum,⁷ and streaming video services have reduced traffic 25 percent by using compression technology.⁸ Some telecommunications operators have increased their data allowances, and some have provided free data and smartphones to health workers.
Telecommunication networks have thus far proved up to the task in the new world of social distancing for those who already have access, but access limitations have prevented vital public services, such as education, from moving online, with major repercussions for schooling. Many predict that videoconferencing will continue to be used more after the pandemic than before it, though not at the same high level as during it.
Differences in the ability to use Web conferencing tools have shone a spotlight on the digital divide. Many students around the world have been excluded from online learning because they lack broadband access and computers. Concerns have been raised about the security of videoconferencing⁹ and about data privacy in the big data analytics used during the pandemic.¹⁰ Telecommunication companies have also been criticized for waiting for a crisis before offering pro-consumer data allowances.
Notes
1. Zoom (2020).
2. Mukherjee (2020).
3. Spencer, Nadella, and Hood (2020).
4. Sandvine (2020).
5. See Telefónica (2020).
6. DE-CIX (2020).
7. GSMA (2020).
8. Florance (2020).
9. Paul (2020).
10. OECD (2020).
References
DE-CIX (Deutscher Commercial Internet Exchange). 2020. “Internet Exchange Operator DE-CIX Sees a Strong Change in Internet User Behavior.” Press Release, March 18, 2020. https://www.de-cix.net/en/about-de-cix/media-center/press-releases/internet-exchange-operator-de-cix-sees-a-strong-change-in-internet-user-behavior.
Florance, Ken. 2020. “Reducing Netflix Traffic Where It’s Needed While Maintaining the Member Experience.” Innovation (blog), March 21, 2020. https://about.netflix.com/en/news/reducing-netflix-traffic-where-its-needed.
GSMA (GSM Association). 2020. “Keeping Everyone and Everything Connected: How Temporary Access to Spectrum Can Ease Congestion during the COVID-19 Crisis.” Newsroom (blog), March 31, 2020. https://www.gsma.com/newsroom/blog/keeping-everyone-and-everything-connected-how-temporary-access-to-spectrum-can-ease-congestion-during-the-covid-19-crisis/.
Mukherjee, Supantha. 2020. “Cisco’s Webex Draws Record 324 Million Users in March.” Technology News (blog), April 3, 2020. https://www.reuters.com/article/us-cisco-systems-webex-idUSKBN21L2SY.
OECD (Organisation for Economic Co-operation and Development). 2020. “Tracking and Tracing COVID: Protecting Privacy and Data while Using Apps and Biometrics.” OECD Policy Responses to Coronavirus (COVID-19), OECD, Paris. https://read.oecd-ilibrary.org/view/?ref=129_129655-7db0lu7dto&title=Tracking-and-Tracing-COVID-Protecting-privacy-and-data-while-using.
Paul, Kari. 2020. “Zoom Releases Security Updates in Response to ‘Zoom-Bombings.’” Guardian, April 23, 2020. http://www.theguardian.com/technology/2020/apr/23/zoom-update-security-encryption-bombing.
Sandvine. 2020. “The Global Internet Phenomena Report: COVID-19 Spotlight.” Sandvine, Fremont, CA. https://www.sandvine.com/phenomena.
Spencer, Michael, Satya Nadella, and Amy Hood. 2020. “Microsoft Fiscal Year 2020 Third Quarter Earnings Conference Call.” Microsoft, Redmond, WA, April 29, 2020. https://www.microsoft.com/en-us/Investor/events/FY-2020/earnings-fy-2020-q3.aspx.
Telefónica. 2020. “Operators Advise a Rational and Responsible Use of Telecommunication Networks to Cope with Traffic Increases.” News Release, March 15, 2020. https://www.telefonica.com/documents/737979/145808680/pr-usoresponsable-redes.pdf/874e69ed-a201-92b3-4d54-60de558d0084?version=1.0.
Zoom. 2020. “Zoom: Annual Report, Fiscal 2020.” Zoom Video Communications, San Jose, CA. https://investors.zoom.us/static-files/28614884-1d63-477a-9148-a7039796f19c.
Spotlight 5.2
Data’s carbon footprint
Data infrastructure is becoming increasingly energy efficient and turning to renewable sources of energy. Increased use of information and communication technology solutions such as videoconferencing could help to reduce global carbon dioxide emissions by one-fifth by 2030.
Data infrastructure consumes significant amounts of energy, with environmental consequences, including global warming. Electricity consumption for data infrastructure amounted to 1 percent of the global total in 2018 (231 terawatt-hours) (figure S5.2.1, panel a). Although data traffic grew 100 percent from 2015 to 2018, associated electricity consumption rose just 16 percent, and its share of total global consumption remains constant. Huge gains in energy efficiency have made this possible. One reason is a shift from smaller data centers to more efficient larger ones, particularly among some of the bigger players in China, Japan, and the United States. Modernization of telecommunication networks is also contributing. Fiber-optic cable is 85 percent more energy efficient than vintage copper wires, while each successive generation of wireless technology conserves more energy than the previous one. For example, 4G can be more than 50 times more energy efficient than 2G.¹

Figure S5.2.1 Worldwide greenhouse gas emissions from data consumption have been flat, even though electricity consumption has been growing
[Chart, 2015–2018, with series for data infrastructure, software and IT services, and telecommunication services. Panel a, Electricity consumption: terawatt-hours (left axis) and percent of world total (right axis). Panel b, Greenhouse gas emissions from data infrastructure: tCO₂e, millions (left axis) and percent of world total (right axis).]
Source: World Bank, original research based on annual reports from 49 companies in 26 countries. Data at http://bit.do/WDR2021-Fig-S5_2_1.
Note: Data were derived from environmental reports of companies accounting for 90 percent of data traffic. IT = information technology; tCO₂e = tonnes of carbon dioxide equivalent.
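As a rough check on the efficiency gains described in the first paragraph of this spotlight, the two growth figures given for 2015–2018 (data traffic up 100 percent, associated electricity consumption up 16 percent) imply the following change in electricity use per unit of traffic. The symbols E (electricity consumption) and T (data traffic) are introduced only for this illustration, and both percentages are assumed to be cumulative changes over the period, as the text states:

```latex
\[
\frac{E_{2018}/T_{2018}}{E_{2015}/T_{2015}}
  = \frac{E_{2018}/E_{2015}}{T_{2018}/T_{2015}}
  = \frac{1 + 0.16}{1 + 1.00}
  = \frac{1.16}{2}
  = 0.58,
\qquad
0.58^{1/3} \approx 0.83
\]
```

On these assumptions, electricity use per unit of data traffic fell by roughly 42 percent over the three years, or by about 17 percent a year on a compound basis.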
Despite rising electricity consumption, greenhouse gas (GHG) emissions from data infrastructure have been flat since 2015, equivalent to 0.2 percent of the global total (see figure S5.2.1, panel b). This is due to the rising share of renewables in the electricity mix used by data centers.² For example, Equinix, one of the world’s leading data center operators, increased its share of renewable energy from less than one-third in 2014 to 92 percent in 2018, leading to a two-thirds reduction in GHG emissions.³ Moreover, the tech giants Apple, Google, and Microsoft have switched to 100 percent renewable energy, while completely offsetting their GHG emissions. Apple generates more than 600 megawatt-hours of its own energy through one of the largest investments in renewables outside of utility companies,⁴ while Google is the world’s largest nonutility buyer of renewable energy. Digital companies already account for one-quarter of the world’s renewable energy purchases,⁵ and they are encouraging their suppliers to follow suit.⁶
Access to renewable energy is becoming an important factor in deciding where to expand data infrastructure for dozens of major investors. For example, Equinix seeks markets with favorable renewable energy policies when deciding on the location of new data centers.⁷ Foreign operators are also developing renewable energy projects in low- and middle-income nations; Vodafone Group, for instance, is constructing a utility-scale solar park in the Arab Republic of Egypt.⁸
Data infrastructure is also playing a significant role in reducing emissions, as demonstrated by the COVID-19 outbreak. The massive use of videoconferencing, fueled by the pandemic, has greatly reduced fuel consumption associated with travel, potentially on a permanent basis.⁹ Increased use of information and communication technology (ICT) solutions could lead to a 20 percent reduction in global carbon dioxide (CO₂) emissions by 2030.¹⁰ This reduction is almost 10 times greater than the ICT sector’s own footprint during the same period and amounts to as much potential emissions reduction as the mobility, manufacturing, agriculture, building, and energy sectors combined.
Notes
1. IEA (2020).
2. There are, nonetheless, some subtleties of environmental accounting. Emissions from purchased electricity are reported on either a market basis or a location basis (Equinix 2019). Market basis reflects the emissions from the electricity that a company is purchasing, which may differ from the electricity that the company is using at its location. Location-based emissions are higher, yet most companies report market-based emissions.
3. Equinix (2019).
4. Apple (2019).
5. Ambrose (2020).
6. In 2018 Apple announced that 23 of its suppliers had
committed to 100 percent clean energy (Apple 2018).
7. Nareit (2019).
8. Vodafone (2019).
9. Darrow (2020).
10. GeSI (2015).
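Note 2 distinguishes market-based from location-based reporting. The minimal Python sketch below illustrates why the two can diverge for a company that buys renewable power. It is a simplification, not the full GHG Protocol Scope 2 method. The function names, all figures, and both emission factors are hypothetical: electricity covered by renewable contracts is counted at zero, and the uncovered remainder is counted at an assumed residual-mix factor.

```python
"""Hypothetical comparison of location-based and market-based emissions
from purchased electricity. All numbers are illustrative assumptions."""


def location_based(consumption_mwh: float, grid_factor: float) -> float:
    """Emissions (tCO2e) using the average emission factor of the local grid."""
    return consumption_mwh * grid_factor


def market_based(consumption_mwh: float, renewable_mwh: float,
                 residual_factor: float) -> float:
    """Emissions (tCO2e) based on what the company buys: electricity covered
    by renewable contracts counts as zero, the rest at a residual-mix factor."""
    uncovered_mwh = max(consumption_mwh - renewable_mwh, 0.0)
    return uncovered_mwh * residual_factor


consumption = 1_000.0  # MWh used at the site (hypothetical)
renewables = 920.0     # MWh covered by renewable contracts, i.e. 92 percent (hypothetical)
grid = 0.4             # tCO2e per MWh, local grid average (hypothetical)
residual = 0.5         # tCO2e per MWh, residual mix (hypothetical)

loc = location_based(consumption, grid)
mkt = market_based(consumption, renewables, residual)
print(f"location-based: {loc:.0f} tCO2e; market-based: {mkt:.0f} tCO2e")
```

With these made-up inputs the location-based figure (400 tCO2e) is ten times the market-based one (40 tCO2e). This matches the pattern note 2 describes: a company that buys renewable power reports lower emissions on a market basis than on a location basis.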
References
Ambrose, Jillian. 2020. “Tech Giants Power Record Surge in Renewable Energy Sales.” Guardian, January 28, 2020. https://www.theguardian.com/environment/2020/jan/28/google-tech-giants-spark-record-rise-in-sales-of-renewable-energy.
Apple. 2018. “Apple Now Globally Powered by 100 Percent Renewable Energy.” Press Release, April 9, 2018. https://www.apple.com/newsroom/2018/04/apple-now-globally-powered-by-100-percent-renewable-energy/.
Apple. 2019. “2019 Environmental Responsibility Report.” Apple Inc., Cupertino, CA, April 2019. https://www.apple.com/environment/pdf/Apple_Environmental_Responsibility_Report_2019.pdf.
Darrow, Barb. 2020. “Zoom Taps Oracle to Keep Schools Teaching, Businesses Running, Friends and Family Connected.” Oracle News Connect (blog), April 28, 2020. https://www.oracle.com/corporate/blog/zoom-selects-oracle-042820.html.
Equinix. 2019. “2018 Corporate Sustainability Report: Connecting with Purpose.” Equinix, Redwood City, CA. https://sustainability.equinix.com/wp-content/uploads/2019/12/Sustainability-Report-2018.pdf.
GeSI (Global e-Sustainability Initiative). 2015. #SMARTer2030: ICT Solutions for 21st Century Challenges. Brussels: Global e-Sustainability Initiative. http://smarter2030.gesi.org/downloads/Full_report.pdf.
IEA (International Energy Agency). 2020. “Data Centres and Data Transmission Networks.” Tracking Report, IEA, Paris, June 2020. https://www.iea.org/reports/data-centres-and-data-transmission-networks.
Nareit. 2019. “REIT Industry ESG Report.” Nareit, Washington, DC, June 2019. https://www.reit.com/sites/default/files/media/PDFs/Research/Nareit2019_SustainabilityReport_webv3.pdf.
Vodafone. 2019. “Sustainable Business Report 2019.” Vodafone Group Plc, Newbury, UK. https://www.vodafone.com/content/dam/vodcom/sustainability/pdfs/sustainablebusiness2019.pdf.
Chapter 6
Data policies, laws, and regulations: Creating a trust environment
Main messages
1. Trust in data transactions is sustained by a robust legal and regulatory framework encompassing both safeguards, which prevent the misuse of data, and enablers, which facilitate access to and reuse of data.
2. Safeguards must differentiate between personal data, requiring a rights-based approach with individual protection, and nonpersonal data, allowing a balancing of interests in data reuse.
3. Enablers for data sharing are typically more developed for public intent data, where public policy and law mandating data access and sharing are more readily established, than for private intent data, where governments have more limited influence.
4. Creation of a trust environment remains a work in progress worldwide, especially in low-income countries. There is no one-size-fits-all legal and regulatory framework. In countries with weak regulatory environments, the design of suitable safeguards and enablers may have to be carefully adapted to local priorities and capacities.
A trust framework of data safeguards and enablers
With the growing recognition of the use, misuse, and underuse of data, responsible governance of data has gained importance, resulting in new global legal and regulatory standards. This movement was propelled by US whistleblower Edward Snowden’s 2013 revelations of global surveillance by Western democracies,¹ followed by the Cambridge Analytica scandal in 2018.² In response, countries enacted major policies to protect data. A series of epochal rulings by India’s Supreme Court identified a constitutional right to privacy, and the country is now considering new data protection legislation. In the European Union (EU), the General Data Protection Regulation (GDPR) took effect in 2018 with application beyond the EU’s borders,³ and it inspired similar legislation in other jurisdictions, such as the US state of California.⁴ China implemented its Personal Information Standard in 2018, promulgated its Civil Code in 2020, and introduced a draft Personal Information Protection Law for public consultation in 2020.⁵ Despite these important advances regarding personal data, legal frameworks for data governance across much of the developing world remain a patchwork, raising concerns about the ability of lower-income countries to benefit from the development opportunities emerging from the burgeoning global data economy.
This greater attention to the use and reuse of personal data is part of an evolving social contract around data, which remains under negotiation across the globe (see spotlight 6.1 for an example of how COVID-19 is creating new challenges for using data while protecting rights). With a view toward informing this process, this chapter lays out the legal mechanisms that enable trusted and trustworthy domestic and cross-border data transactions for the use and reuse of both personal and nonpersonal data. Whether the focus is on the collection, use, transfer, or processing of data between businesses, or among citizens, businesses, and governments, each of these interactions is a data transaction with the potential to create value, as long as both parties trust the overall process sufficiently. However, a variety of factors can undermine trust. These may include the absence, weakness, or uneven application of the legal framework; weak institutions and law enforcement or lack of effective ways for parties to enforce their rights; practices that unfairly benefit certain actors; skewed or lopsided incentives (see chapter 8); and poor or insecure infrastructure (see chapter 5).
From a normative perspective, trust is a function of both “hard law” and “soft law.” Hard law includes domestic, regional, and international law, as well as case law and statutory law that originate from tort, contract, and competition law. Some of the issues embedded in domestic law have their origins in well-hewn and commonly agreed standards derived from international law, conventions, and treaties. Emerging applications of trust law and competition law may also play a valuable role in strengthening the normative framework for data.
Whereas hard law is shaped by state actors, soft law includes standards, terms and conditions of use, norms, and codes of conduct and other voluntary frameworks used by nonstate actors, including industry participants and civil society (see chapter 8). These soft law elements can play an equally valuable role in governing data use according to needs and cultural specificity.⁶
A central claim of this Report is that use of data for development purposes requires a legal framework for data governance that includes both safeguards and enablers. “Safeguards” generally refers to those norms and legal frameworks that ensure and promote trust in the data governance and data management ecosystem by avoiding and limiting harm arising from the misuse of data or breaches affecting their security and integrity. “Enablers” generally refers to those policies, laws, regulations, and standards that facilitate the use, reuse, and sharing of data within and between stakeholder groups through openness, interoperability, and portability. Whereas the approach to safeguards differs markedly for personal and nonpersonal data, a common set of enablers is relevant to both categories.
For the collection and processing of personal data, this Report proposes a rights-based approach, whereby access to personal data must first be adequately safeguarded before their use and reuse are enabled. This two-step process helps to rebalance power asymmetries between data holders/subjects and data controllers/users that can undermine trust. For the purposes of this chapter, personal data include not only data directly provided by an individual, but also personally identifiable information and machine-generated information that can readily be linked to an individual (such as mobile phone data).⁷
For nonpersonal data, this Report advocates a balance of interests approach to safeguards and enablers, recognizing that trade-offs typically arise between increasing data access and safeguarding intellectual property rights (IPRs) over nonpersonal data. The focus is thus on a legal framework that enables the (re)use and sharing of data through regulatory predictability, data openness, and portability (the ability to readily transfer data from one service to another based on clear legal and technical standards). Of growing importance are data that blend both personal and nonpersonal sources, so-called mixed data.
The creation, collection, and use or processing of personal and nonpersonal data by public or private sector entities in both domestic and cross-border contexts interact in a dynamic way in a three-dimensional legal/regulatory space in which different elements of the legal framework apply (see figure 6.1). The underlying type of data does not necessarily determine how the data might be treated legally across the data value chain; that depends on how such data are used or processed. For example, data that may start off as public sector and personal data (such as household survey, health, or geolocation data) may end up as private sector and nonpersonal data (when integrated as part of a proprietary algorithm and perfectly deidentified). Similarly, data that start out as private data may end up in the public domain if published as open data or shared with government under a data sharing agreement. These dynamic shifts in data uses may change the legal treatment of those data accordingly.
The trust framework encompassing safeguards and enablers is underpinned by rule of law and good governance principles. These include certainty, transparency, accountability, nondiscrimination, fairness, inclusiveness, and openness. They are subject to due process limitations such as necessity and proportionality. Transparency, accountability, and certainty in rulemaking can be reinforced by ensuring that laws and regulations are developed according to good regulatory practices. These include supporting consultative rulemaking⁸ and ensuring that regulations are based on evidence, with stakeholder impacts and spillover effects fully considered through regulatory impact analysis.⁹ In addition, recent developments in regulatory design have included efforts to adapt regulations to the digital age. Mechanisms such as regulatory sandboxes and laboratories help make regulations more agile and readily adaptable to evolving circumstances. By drafting laws and regulations that are principle-based and technologically neutral, policy makers can help keep them relevant as technologies evolve while reducing compliance burdens.
To capture the current robustness and complete-
ness of normative frameworks for data governance
around the world, the chapter draws on a new Global
Data Regulation Survey conducted exclusively for
this Report.10
It collected information on attributes of
the regulatory framework in 80 countries (covering
80 percent of the world’s population) selected from
global regions and country income groups across the
development spectrum. The survey entails a detailed
assessment of domestic laws, regulations, and admin-
istrative requirements, reflecting the regulatory sta-
tus of each country as of June 1, 2020. Survey results
are summarized in a variety of subindexes that cap-
ture different aspects of the regulatory environment
for safeguards and enablers.
This chapter focuses squarely on the legal dimen-
sion of data governance. Chapter 7 then examines the
resulting economic trade-offs, and chapters 8 and 9
discuss the design of institutional ecosystems to sup-
port implementation and enforcement.
Building safeguards for trusted data use
The term safeguards refers to the trust environment around the collection and use of data. It includes supporting individuals’ agency (that is, their ability to exercise control) over how their personal data are used, through mechanisms such as consent, rights of use of data, and regimes that allow reuse of data for “legitimate purposes” without express consent. Safeguards also encompass how data are secured and accessed, covering the obligations of those who collect, process, or use data to take precautions to ensure the integrity of the data and protect data rights, including intellectual property rights and other limitations on the use of nonpersonal data (see figure 6.1).
Figure 6.1 Envisioning the multidimensional nature of the legal framework for trust
[Diagram: a three-dimensional space whose axes are types of data (personal, nonpersonal), types of actors (public, private), and types of transactions (domestic, cross-border).]
Source: WDR 2021 team.
192 | World Development Report 2021
Safeguards are analyzed primarily according to
whether they are related to personal data, nonper-
sonal data, or mixed data. The degree of sensitivity of
these types of data differs markedly, leading to vari-
ous legal approaches.
Safeguards for personal data, nonpersonal data, and mixed data
Safeguards for personal data are grounded in a rights-
based framework that has evolved over time (see
figure 6.2). These safeguards have their origin in the
establishment of the “rule of law” in conjunction with
the expression of individual rights in the Enlighten-
ment and were codified in international law after
World War II. They were further refined in the con-
text of analog data in the 1970s and 1980s with the
Fair Information Practices, the Council of Europe’s
Convention 108 for the Protection of Individuals with
regard to Automatic Processing of Personal Data,11
and the first guidelines issued by the Organisation for
Economic Co-operation and Development (OECD).
Safeguards must necessarily adapt to technological
change and will continue to evolve accordingly. For
example, the OECD guidelines were updated after the
launch in 1995 of the World Wide Web, and Conven-
tion 108 was updated to Convention 108+ in response
to the entry into force of the GDPR.
Safeguards for nonpersonal data entail a more
straightforward balancing of economic incentives
and interests, grounded in IPRs as typically enshrined
in domestic law.
For datasets containing mixed data, it is the responsibility of the data processing entity to ensure that personal data are protected. This compliance challenge has become more acute in recent years because source data and collection methods have evolved and blurred the distinction between the traditional legal definitions of personal and nonpersonal data.
Figure 6.2 The evolution of data protection
[Timeline beginning in 1648 and ending with “Next?”: Treaties of Westphalia (introduces the rule of law); Déclaration des droits de l’homme et du citoyen (expression of individual rights); Article 12, UDHR (rights codified in international law); Fair Information Practices (modern expression of data privacy); OECD guidelines; Council of Europe Convention 108; WWW launched; EU directive (first expression of data protection rights of the digital age); privacy by design; GDPR; UN Principles; Convention 108+ (codifies data protection).]
Source: WDR 2021 team.
Note: This figure shows a clear arc from historical concepts of rights governing interactions between the state and the individual (and between states) to principles guiding data protection. EU = European Union; GDPR = General Data Protection Regulation; OECD = Organisation for Economic Co-operation and Development; UDHR = Universal Declaration of Human Rights; UN = United Nations; WWW = World Wide Web.
The Global Data Regulation Survey conducted
for this Report provides an overview of the extent
to which good-practice data safeguards have been
adopted. Across the 80 countries surveyed, about
40 percent of the elements of good-practice regula-
tory safeguards are in place. Although scores range
considerably, from less than 35 percent in low-income
countries to more than 50 percent in high-income
countries, the results highlight that even among the
latter the regulatory framework is far from complete.
Of the high-income countries included in the survey,
Australia, the United Kingdom, and Uruguay stand out
as those with the most advanced safeguards. Among
the strongest middle-income countries are Colombia,
Moldova, and Nigeria. Other low- and middle-income
nations that have endeavored to develop safeguard
regulatory frameworks are Benin, Mexico, and Tur-
key. Mauritius, a standout among its middle-income
peers, performs well on most safeguard measures. It
has deliberately designed and implemented policies
based on best practices and has distinguished itself
as one of the first Sub-Saharan African countries to
ratify Convention 108+. In Latin America, Uruguay
is one of two countries to have received an adequacy
determination from the European Commission.
Overarching safeguards for cybersecurity and cybercrime
A key element in establishing trust in the data eco-
system for both personal and nonpersonal data is
ensuring the security of the network infrastructure
and elements over which data flow.
Cybercrime laws effectively give teeth to cyber-
security policies. Although there is no universally
accepted definition of cybercrime, the concept encompasses both a narrow view (criminal activities targeting information and communication technologies [ICT] and software) and a broader view (traditional crimes committed in cyberspace).12
In practice,
the scope of cybercrime is typically understood to
include unauthorized access to a computer system
(sometimes called hacking), unauthorized monitor-
ing, data alteration or deletion, system interference,
theft of computer content, misuse of devices, and
offenses related to computer content and function.13
Cybercrime knows no borders. The crime can be
committed from any internet-connected computer, wherever it is located, or from within any public or private entity that relies on ICT systems. Similarly, the impact
of the crime can be felt anywhere, even outside the
jurisdiction where the cybercriminal is physically
located. Thus to be truly effective, a cybercrime law
needs to extend beyond dealing with criminal activ-
ity within a subnational or national jurisdiction and
become a tool to maximize cross-border coopera-
tion.14
This requirement entails the legal notion of dual criminality: the conduct in question must be a criminal offense in both the requesting and the requested jurisdictions.15
It also demands practical collaboration,
usually achieved through mutual legal assistance
treaties (MLATs).
Countries enter into MLATs either through bilat-
eral treaties with other countries or by adhering to
an instrument that features a built-in MLAT process,
such as the Council of Europe’s Budapest Convention
of 2001. The main legal instrument on cybercrime in Europe and beyond, this convention provides for balancing security interests with respect for human rights.16 Sixty-five countries have acceded to the convention, with an additional 12 states participating as observers.17
Of the members and observers, 26
countries are lower-middle-income. Recently, some
governments have been sidestepping the MLAT pro-
cess by making requests for evidence directly to for-
eign law enforcement agencies and allowing them to
do likewise. In this vein, the United States adopted
the Clarifying Lawful Overseas Use of Data (CLOUD)
Act of 2018, which authorizes the US government to
enter into bilateral agreements with foreign gov-
ernments, allowing the parties to remove any legal
barriers that would prevent the other party from
seeking and obtaining data directly from the service
providers in the other country under certain circum-
stances.18
This has attracted comment for potentially
sidestepping legal protections for personal data.19
The European Union is considering a draft regula-
tion with similar provisions.20
Cybersecurity encompasses the data protection
requirements for the technical systems used by
data processors and controllers, as well as the estab-
lishment of a national Computer Security Incident
Response Team (CSIRT), an expert group that handles
computer security incidents (see chapter 8). In addi-
tion to dealing with the criminal behaviors discussed,
cybersecurity also builds trust by addressing unin-
tentional data breaches and disclosures (such as those
resulting from badly configured servers) and holding
firms accountable.
Overall, the Global Data Regulation Survey reveals
a low level of uptake of cybersecurity measures (fig-
ure 6.3). None of the low-income countries included
in the survey has legally imposed a full range of
security measures on data processors and controllers. Even among high-income countries, barely 40 percent of those surveyed require data processors and controllers to comply with these security requirements,
such as by adopting an internal policy establishing
procedures for preventing and detecting violations;
establishing the confidentiality of data and systems
that use or generate personal data; appointing a per-
sonal data processing or information security officer
or manager; performing internal controls; assessing
the harm that might arise from a data breach; or intro-
ducing an awareness program among employees.
CSIRTs are far more prevalent. They can be found in
all high-income countries and in about one-third of
low-income countries.
Among lower-middle-income countries, the comprehensive cybersecurity requirements in Kenya’s new Data Protection Act are a good reflection of best practice. The Act requires data controllers to consider measures
such as pseudonymization and encryption of data;
an ability to restore the availability of and access to
personal data in the event of a physical or technical
incident; and mechanisms to identify internal and
external risks to personal data that are reasonably
foreseeable. It also requires steps to ensure that
safeguards are established, effectively implemented,
and continually updated in response to new risks or
deficiencies.
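Kenya’s Act lists pseudonymization among the measures controllers should consider. As a rough illustration only, the sketch below pseudonymizes a direct identifier with a keyed hash (HMAC-SHA256 from Python’s standard library). The technique choice, record layout, field names, and key handling are all hypothetical assumptions; the Act itself does not prescribe any particular method.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records can
    still be linked for analysis; without the key, the pseudonym cannot be
    recomputed from a guessed identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_records(records: list[dict], id_field: str, key: bytes) -> list[dict]:
    """Return copies of the records with the identifier field pseudonymized."""
    result = []
    for record in records:
        copy = dict(record)  # leave the caller's records unchanged
        copy[id_field] = pseudonymize(copy[id_field], key)
        result.append(copy)
    return result


if __name__ == "__main__":
    # Hypothetical key; in practice it would be kept apart from the data
    # under its own access controls.
    key = secrets.token_bytes(32)
    customers = [  # hypothetical records
        {"phone": "+254700000001", "county": "Nairobi", "balance": 1200},
        {"phone": "+254700000002", "county": "Mombasa", "balance": 300},
    ]
    for row in pseudonymize_records(customers, "phone", key):
        print(row)
```

A keyed hash is used here rather than a plain hash because phone numbers come from a small, enumerable space, so an unkeyed hash could be reversed by hashing every possible number. Because whoever holds the key can re-link pseudonyms to people, data treated this way generally remain personal data.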
Safeguarding personal data
To better address underlying concerns about the
power asymmetries between (individual) data sub-
jects and data processors and collectors, this Report
advocates an approach based on control over personal
data rather than one grounded in data ownership
(see spotlight 6.2). Under the rights-based approach
to protection of personal data, individuals have fun-
damental rights regarding their personal data. These
rights are both substantive and procedural.
Substantive rights include measures preventing the
unauthorized disclosure of personal data and the use
of personal data for unwarranted surveillance, unfair
targeting, exclusion, discrimination, unjust treat-
ment, or persecution. Such substantive rights also
require purpose specification, data minimization, and
storage limitations.
Procedural rights are built around the concepts of
necessity, transparency,21
accountability, proportion-
ality, and due process. They include rights to receive
notice about and to object to how data are used and
rights to access, correct, and erase data (including
the right to be forgotten),22
as well as rights to redress
and remedy. These rights are grounded mainly in
domestic law. The absence of a harmonized global
legal framework for protection of personal data
affects cross-border data transactions involving
personal data, which are especially limited in lower-
middle-income countries (see chapter 7).
Adoption of data protection laws is comparatively
widespread.23
Nearly 60 percent of countries sur-
veyed for this Report have adopted such laws, ranging
from 40 percent of low-income countries to almost
80 percent of high-income countries (figure 6.4).
Yet the quality of such legislation is uneven, with
important good-practice elements often lacking.
Legal frameworks for the protection of personal data
should typically include individual rights to chal-
lenge the accuracy and object to the use of personal
data and parallel requirements for data processors to
limit the purpose of data use, minimize the volume
of data collected, and limit the time frame for data
storage. These legal provisions are much less prev-
alent in low- and middle-income countries than
in high-income countries. Although many lower-
middle-income countries have laws on the books,
their enforcement is uneven: only some 40 percent
of low-income and lower-middle-income countries
have created a data protection authority, compared
with some 60 percent of high-income countries.
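To make the processor-side obligations concrete, the sketch below shows one way purpose limitation, data minimization, and storage limitation could be applied to records before use. It is a minimal illustration under assumed conditions: the purposes, permitted fields, and one-year retention period are hypothetical, not drawn from any particular law.

```python
from datetime import date, timedelta

# Hypothetical policy: the fields each declared purpose may use.
ALLOWED_FIELDS = {
    "billing": {"customer_id", "amount", "date"},
    "service_quality": {"region", "date"},
}
RETENTION = timedelta(days=365)  # hypothetical storage limit


def minimize(records: list[dict], purpose: str, today: date) -> list[dict]:
    """Apply purpose limitation, storage limitation, and data minimization."""
    if purpose not in ALLOWED_FIELDS:
        # Purpose limitation: refuse processing for a purpose never declared.
        raise ValueError(f"undeclared purpose: {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    kept = []
    for record in records:
        if today - record["date"] > RETENTION:
            # Storage limitation: exclude records past the retention period
            # (a real system would also delete them from storage).
            continue
        # Data minimization: keep only the fields this purpose needs.
        kept.append({k: v for k, v in record.items() if k in allowed})
    return kept


if __name__ == "__main__":
    records = [  # hypothetical records
        {"customer_id": "c1", "name": "A. Client", "region": "North", "amount": 20, "date": date(2020, 5, 1)},
        {"customer_id": "c2", "name": "B. Client", "region": "South", "amount": 35, "date": date(2018, 1, 15)},
    ]
    print(minimize(records, "service_quality", today=date(2020, 6, 1)))
    # [{'region': 'North', 'date': datetime.date(2020, 5, 1)}]
```

The name field never leaves the function for either purpose, and the 2018 record is excluded as too old; the rules themselves would in practice come from the legal basis and notice under which the data were collected.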
Figure 6.3 Gaps in the regulatory framework for cybersecurity are glaring across country income groups
[Bar chart: percentage of countries (0 to 100) in each income group (high-income, upper-middle-income, lower-middle-income, low-income) with each measure in place: security requirements for automated processing of personal data; CSIRT; cybersecurity requirements for data processors and controllers; regulation of cybercrime activities.]
Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866. Data at http://bit.do/WDR2021-Fig-6_3.
Note: The figure shows the percentage of countries in each country income group that had adopted good-practice legal and regulatory frameworks for cybersecurity and cybercrime as of 2020. CSIRT = Computer Security Incident Response Team.
In practice, the uneven quality of data protection legislation affects the effectiveness of safeguards for personal data.
After many years in the making, both Kenya and
Nigeria recently updated their legal regimes for data
protection. In 2019 Kenya’s new Data Protection Act
entered into force, while Nigeria’s National Informa-
tion Technology Development Agency issued a new
Data Protection Regulation. Both instruments reflect
many of the elements contained in the GDPR. India
is also debating landmark data protection legislation.
Requiring consent or other lawful bases for data collec-
tion and processing. Most data protection laws rely on
individual consent as one lawful means of limiting
how data about individuals can be collected and
used.24
The appropriate lawful basis for data process-
ing depends on factors that include how the data will
be used and the relationship between the parties. For
example, consent may not be the appropriate basis for
data processing by public authorities.25
The consent model has normative and practical
limitations. Current commercial practices often adopt
a “tick the box” approach to obtaining consent, and
they are more often based on incentives to limit
corporate liability than on a desire to ensure that
consent is “informed” (that is, that individuals fully
understand what will happen to the information they
have authorized for collection and are effectively in
control of how their data will be further used and
shared). Privacy notices are often long, complex documents written by companies’ legal teams. It is therefore difficult for people to read all the disclosure documents for the websites they visit or the apps on their smartphones.
This difficulty is particularly acute in the developing world, where literacy rates remain low and individuals face language and technical barriers to understanding privacy notices. In such cases, data processors should take extra care to obtain informed consent through adapted means. Firms can use consent to justify collecting and processing excessive amounts of data, especially in countries where data protection authorities may not have enough resources to monitor and enforce compliance with other obligations, such as data minimization. Addressing these concerns would require taking a more user-centric approach to obtaining informed consent for the collection of volunteered data, including using simplified terms of service and embedding responsible data collection practices in operations to avoid collecting excessive amounts of data.
Figure 6.4 Progress on personal data protection legislation differs markedly across country income groups
[Bar chart: percentage of countries (0 to 100) in each income group (high-income, upper-middle-income, lower-middle-income, low-income) with each element in place: data protection law enacted; no exception to limitations on data collecting and processing by governments; necessity and proportionality test for government exceptions; purpose limitation requirement; data minimization requirement; data storage limitations; requirements to incorporate privacy by design; regulation of data sharing with third parties; individual right to challenge accuracy and rectify personal data; regulatory limitation on algorithmic decision-making; individual right of redress to object to use of personal data; data protection authority created.]
Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866. Data at http://bit.do/WDR2021-Fig-6_4.
Note: The figure shows the percentage of countries in each country income group that had adopted good-practice legal and regulatory frameworks to safeguard personal data as of 2020.
Although consent may still be an appropriate
lawful basis in some instances (such as when data
are volunteered by individuals), newer technologies
involving passive data collection (such as by the
Internet of Things) and merging or linking datasets
to make inferences pose further challenges to the
consent model.
Alternatives to consent include relying on other
lawful bases for processing personal data, including
resorting to a “legitimate purpose” test or fiduciary
duty requirement. A legitimate purpose test would
limit the use of personal data to what is compatible,
consistent, and beneficial to data subjects based on
the original purpose for which the data were col-
lected. Under this approach, data could still be used
for more wide-ranging purposes if they are anonymized or aggregated to, for example, develop new
products and services, or to make risk assessments
without impinging on the data subject’s rights. Rely-
ing on a fiduciary duty approach would require data
collection and processing firms to always act in the
best interests of data subjects and in ways that are not
detrimental to them. Legally obligating providers to
act in the best interests of their customers can help
establish trust and confidence among customers
that their data are being used responsibly. Examples
of fiduciary duty breaches include using customer
data to unfairly manipulate purchasing decisions.
Another alternative to these approaches that might
require less oversight is to ban use of certain types of
data outright based on identified possible misuses of
personal data.26
In principle, the limitations on the use of personal
data enshrined in data protection legislation apply
to all parties that process or control personal data.
Nevertheless, governments may choose to create
exceptions to these compliance and liability limita-
tions for data processing by public sector entities. The
Global Data Regulation Survey indicates that these
exceptions are widespread in all surveyed countries
that have data protection legislation (figure 6.4). Most
of these exceptions are limited and pertain to specific
data uses, such as in relation to national security
as in Brazil and India27
or in transactions involving
health data as in Gabon. Other countries have passed
laws that provide for more wide-ranging exceptions,
including exemption from the requirement to obtain
consent from data subjects when performing lawful
government functions such as service delivery.28
Where such government exceptions exist, good
practice calls for them to be transparent and objec-
tive. They should also be limited in scope and dura-
tion (such as through sunset provisions) to respect
due process limitations. These exceptions must be
“necessary and proportionate” to the intended objec-
tives—limitations designed to ensure that any estab-
lished exceptions are lawful and balanced against the
objective being sought.29
Furthermore, exceptions
should be consistent with international human rights
law. More than one-third of high-income countries
require justification for the exceptions, while less
than 10 percent of surveyed low-income countries
place such process limitations on government action.
This lack of limitations creates additional opportuni-
ties for unchecked state surveillance or mission creep,
thereby undermining trust in data use.30
Meeting technological challenges. Rapid technological progress in data processing, machine learning, and artificial intelligence (AI) poses challenges to current
data protection frameworks. In particular, traditional
data protection is based on the notion that infor-
mation is volunteered by the data subject, whereas
data analysis is increasingly based on observed data
(obtained from passive scraping of information from
devices and social media accounts) or inferred data31
(generated from a vast array of correlates using statis-
tical techniques). In addition, AI and machine learn-
ing rely on large-scale datasets to function, creating
tensions with established data protection principles
such as data minimization. Although linking these data sources provides a fuller picture of the individual, the linked data could also harm the data subject if used in decisions on, for example, credit or employment, while the protections that apply to volunteered data, such as the rights to access and correct erroneous information, may be hard to enforce.
The increasingly widespread practice of linking
datasets to feed algorithms also stretches the limits
of technical mechanisms to protect personal data,
such as anonymization. Unlike pseudonymized data, data that have been thoroughly deidentified are no longer legally considered to be personal data. Thus they
can be published or used outside the scope of data
protection law, even if the original source contains
personal data.32
Although anonymization techniques
can protect individual datasets, research has shown
that linking datasets enables the reidentification of
individuals in deidentified data and risks blurring the
boundary between personal and nonpersonal data.33
At the same time, anonymization techniques can
reduce the size and accuracy of datasets, affecting
their value to third parties once published.34
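The reidentification risk from linking datasets can be shown with a toy linkage attack. In the sketch below, a "deidentified" health table with names removed is joined to a hypothetical public list that contains names, using attributes such as postcode, birth year, and sex. All records, field names, and the choice of quasi-identifiers are invented for illustration; real attacks use larger and messier data.

```python
from collections import defaultdict

# Attributes that are not names but, in combination, can single people out.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link(deidentified: list[dict], public: list[dict], keys=QUASI_IDENTIFIERS) -> list[tuple[str, dict]]:
    """Match 'deidentified' records to named people on shared quasi-identifiers.

    A record counts as reidentified only when exactly one named person shares
    its quasi-identifier values.
    """
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = []
    for record in deidentified:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:
            matches.append((names[0], record))
    return matches


if __name__ == "__main__":
    health = [  # hypothetical: names removed, quasi-identifiers kept
        {"zip": "20433", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
        {"zip": "20433", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    ]
    register = [  # hypothetical public list with names
        {"name": "Person A", "zip": "20433", "birth_year": 1980, "sex": "F"},
        {"name": "Person B", "zip": "20433", "birth_year": 1975, "sex": "M"},
        {"name": "Person C", "zip": "20433", "birth_year": 1975, "sex": "M"},
    ]
    for name, record in link(health, register):
        print(name, "->", record["diagnosis"])  # prints: Person A -> asthma
```

The first health record is reidentified because only one named person matches it; the second survives only because two people share its attributes, which is the intuition behind generalization-based protections.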
Even when anonymization techniques can deidentify individuals, concerns are growing about the
use of such data to identify groups of people who
could be targeted for surveillance or discrimination
(including groups defined by ethnicity, race, religion,
or sexual orientation).35
Data protection laws need
to keep pace with technological efforts aimed at
deanonymization.36
Laws could require data users to
adopt a holistic approach37
to data protection that can
be adapted to different risks from data uses,38
includ-
ing protecting data by design and default.
Adopting “data protection by design.” Data protection
by design embeds data protection practices into the
initial design phase of data-driven products and ser-
vices39
through a combination of hardware and soft-
ware features, legal and administrative provisions,
and privacy-enhancing technologies (PETs) using
encryption40
and statistical techniques.41
Such mea-
sures complement and enhance existing legal data
protection in ways that reduce the risk of identifiabil-
ity of data.42
Data protection by design has evolved from “pri-
vacy by design,” which was first adopted as an inter-
national standard in 2010. It was later recognized
by its inclusion in the Mauritius Declaration on the
Internet of Things in 2014,43
with a new International
Organization for Standardization (ISO) standard
under development.44
The concept—originally devel-
oped in Canada45
—has been integrated into data
protection regulation and practice in the European
Union,46
as well as Australia (State of Victoria);47
Hong Kong SAR, China;48
and the United Kingdom.49
Nevertheless, the Global Data Regulation Survey
indicates limited uptake of data protection or privacy
by design approaches. Less than 20 percent of the
countries surveyed have adopted such requirements,
ranging from 36 percent uptake in the high-income
countries surveyed to negligible adoption in middle-
income countries (figure 6.4). An interesting excep-
tion is Benin, which mandates “data protection by
design” in its Digital Code Act.
PETs are often used to deidentify data at the
source (for example, by relying on anonymization
and aggregation) to reduce their identifiability. The
result may be a trade-off between the level of data
protection afforded and the resulting usefulness of
the data (for data uses requiring granular or identifi-
able characteristics such as gender or age). Research
showing the ease of reidentifying previously deiden-
tified data (using only four data points50
or when
linking datasets) has highlighted the limitations of
current anonymization methods and has prompted
the development of new techniques.51
Separately, the
value of encryption-based PETs may be limited if law
enforcement authorities argue that back doors should
be included in these systems.
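The trade-off between protection and usefulness can be illustrated with k-anonymity, one well-known deidentification criterion that this section does not itself name: a table is k-anonymous if every combination of quasi-identifier values is shared by at least k records. The sketch below, built on hypothetical records and a hypothetical generalization rule (postcode prefix and birth decade), measures k before and after coarsening the data.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def k_anonymity(records: list[dict], keys=QUASI_IDENTIFIERS) -> int:
    """Size of the smallest group of records sharing identical quasi-identifier values."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values()) if counts else 0


def generalize(record: dict) -> dict:
    """Coarsen quasi-identifiers: keep a 3-digit postcode prefix and the birth decade."""
    coarse = dict(record)
    coarse["zip"] = coarse["zip"][:3] + "**"
    coarse["birth_year"] = coarse["birth_year"] // 10 * 10
    return coarse


if __name__ == "__main__":
    rows = [  # hypothetical records
        {"zip": "20433", "birth_year": 1981, "sex": "F"},
        {"zip": "20435", "birth_year": 1984, "sex": "F"},
        {"zip": "20431", "birth_year": 1972, "sex": "M"},
        {"zip": "20437", "birth_year": 1978, "sex": "M"},
    ]
    print(k_anonymity(rows))  # 1: every record is unique
    print(k_anonymity([generalize(r) for r in rows]))  # 2
```

Raising k from 1 to 2 costs granularity: an analyst can no longer see exact postcodes or birth years. k-anonymity alone also has known gaps (for instance, if everyone in a group shares the same sensitive value), which is one reason newer techniques have been developed.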
These limitations have also prompted the emer-
gence of other mechanisms to protect personal
data, including personal information management
systems (PIMS) such as Safe Sharing sites52
and per-
sonal data stores.53
These tools can help users store,
use, and manage how their personal information is
shared with third parties. To address certain cyber-
vulnerabilities and technical features of data protec-
tion by design and act as effective safeguards, PETs
should be accompanied by supporting organizational
and behavioral measures.54
Dealing with automated processing. The growing use
of algorithms for automated processing of personal
data can add significant value through the applica-
tion of predictive analytics, but it poses additional
regulatory and societal challenges. These include
algorithmic bias, risks to personal data protection,
and lack of transparency, accountability, and other
procedural safeguards (such as redress) to ensure that
decisions made on the basis of automated processing
are conducted in compliance with due process.55
Only
about 30 percent of countries included in the Global
Data Regulation Survey have put in place measures
to restrict decision-making based on automatically
processed personal data (figure 6.4). Among the rela-
tively small number of countries whose laws address
this, Côte d’Ivoire has included provisions in its data
protection act that prohibit the use of automated pro-
cessing of personal data in judicial decision-making
to prevent bias.56
Automated processing of personal data in the
criminal justice sector is an example of controversial
public sector use of these technologies—especially
those using facial recognition—that can perpetuate
biases.57
A 2016 study conducted in Oakland, Cali-
fornia, found that, despite survey data showing an
even distribution of drug use across racial groups,
algorithmic predictions of police arrests were con-
centrated in predominantly African-American com-
munities, creating feedback loops that reinforced
patterns of structural or systemic bias in the history
of police arrests.58
Algorithms can also introduce
racial biases when facial recognition algorithms
are trained predominantly on data from Caucasian
faces, significantly reducing their accuracy in rec-
ognizing other ethnicities.59
Evidence suggests that racial60 and gender61 bias in private sector uses of AI for decision-making is also prevalent.
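The Oakland example describes predictions concentrated in some communities despite evenly distributed underlying behavior. One simple check that auditors sometimes apply is to compare how often a system flags members of each group. The sketch below computes per-group selection rates and the ratio of the lowest to the highest rate; the groups and predictions are hypothetical, and this is only one of several fairness measures.

```python
from collections import defaultdict


def selection_rates(groups, predictions) -> dict:
    """Share of records in each group that the model flags (prediction truthy)."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, prediction in zip(groups, predictions):
        totals[group] += 1
        flagged[group] += int(bool(prediction))
    return {g: flagged[g] / totals[g] for g in totals}


def disparity_ratio(rates: dict) -> float:
    """Lowest selection rate divided by highest; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


if __name__ == "__main__":
    groups = ["A"] * 4 + ["B"] * 4  # hypothetical group labels
    predictions = [1, 1, 1, 0, 1, 0, 0, 0]  # hypothetical model outputs
    rates = selection_rates(groups, predictions)
    print(rates)  # {'A': 0.75, 'B': 0.25}
    print(round(disparity_ratio(rates), 3))  # 0.333
```

A low ratio signals a disparity worth investigating but does not by itself establish its cause; depending on the use case, comparing error rates across groups may be more informative, and an audit of this kind presupposes access to the system’s outputs.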
Additional challenges within the public sector
include a lack of transparency and accountability
in the use of automated decision-making systems.
Many of the technologies procured by public sector
entities are developed by private sector corporations.
Thus, the underlying algorithms may be subject to
copyright or other IPRs that restrict the ability to
undertake independent third-party audits. The use of
such technologies by the public sector, without imple-
mentation of the appropriate audits and grievance
redress mechanisms, may impair public trust in data
processing by institutions and lead to discrimination
or otherwise unfair decisions.
Because of these challenges, as the uptake in
AI technologies and automated decision-making
systems increases in both the public and private
sectors, some principles for algorithmic regulation
are emerging at both the national and international
levels. Internationally, the focus has frequently been
on developing guiding principles based on data
ethics. For example, OECD and the Group of Twenty
(G-20) published two closely related sets of princi-
ples on ethical AI in 2019 that highlight the need to
ensure transparency, explainability, and inclusion of
unrepresented or vulnerable groups in the design
and implementation of AI systems.62
Fulfilling this need will require significant capacity-building efforts
to promote responsible use of AI in lower-income
countries.
Principles grounded in data ethics can be applied
to other types of data uses that may have important
societal impacts. Human rights-based frameworks,
for example, can provide useful guiding principles
for responsible data use.63
Some countries have made
efforts to support transparency and accountability
in the use of AI and automated decision-making
systems in the public sector by publishing the source
code of algorithms in public registers,64 revising procurement rules, and developing charters,65 regulations, or certifications.66
In February 2020, a Dutch court ruled that an automated surveillance system developed to detect welfare fraud in the Netherlands (SyRI) violated human rights because it did not strike a “fair balance” between its objectives and its risk to privacy. The ruling halted the system.67
Relying on competition and consumer protection laws.
In countries where data protection legislation is not
yet in place, other statutory instruments—notably,
consumer protection and competition legislation—
have been leveraged to protect the data rights of individuals, despite these laws’ distinct legal focus. Under a rights-based approach, data protection
law is generally aimed at achieving individual agency,
whereas consumer protection law aims to promote
economic fairness for consumers, and competition
law strives for fairness among businesses. These
approaches are complementary, but they are not
an adequate substitute for the scope and protection
of a rights-based data protection legal framework.
Nonetheless, consumer protection agencies may have
wider-ranging powers than data protection authori-
ties,68
equipping them to address some of the issues
underlying misuse of personal data, such as unfair
consumer practices or competition concerns (see
chapter 7 for further discussion of data and compe-
tition issues).69
Safeguarding nonpersonal data
Safeguards for the domestic use and reuse of non-
personal data revolve around the protection of
intellectual property rights fit for the digital age, as
well as cybersecurity measures. Various contractual
elements affecting how entities use and reuse non-
personal data (and even mixed data) are also relevant,
including contracts themselves (terms and condi-
tions, assignment of liability and remedies), as well
as industry standards, codes of conduct, and audit
requirements. Soft law tools include the use of stan-
dards to broker trust among entities exchanging data.
Nonpersonal data produced by the private sector
can be protected under copyright, although copyright
is limited to protecting creative expression, such as
compilations, as opposed to raw data. Some governments have introduced innovations to overcome these limitations.70 Observing that although rights to use data may be controlled by contract, contracts do not always specify them, Japan’s Ministry of Economy, Trade and Industry updated its application of the Unfair Competition Prevention Act to protect industrial data, publishing guidelines along with model contract clauses for data transactions.71
India’s Ministry of Electronics and Information Tech-
nology published a draft governance framework for
nonpersonal data, recommending clarifications on
the scope, classification, rights of use of nonpersonal
data, and creation of a nonpersonal data authority.72
Governments may also wish to establish rules to
support the reuse of public sector data by preventing
the private sector from setting excessively high prices
for the use of licensed data-driven products and
services developed using public sector, or otherwise
“high value,” data. One mechanism is to require firms to license such products on fair, reasonable, and non-discriminatory (FRAND) terms by treating them as “essential data infrastructure.” Governments
may, however, find that IPR protection of nonper-
sonal data conflicts with other policies that encour-
age the interoperability of data systems and the free
reuse of datasets.
Protection of nonpersonal data under an IPR
regime is currently more prevalent in upper-middle-
income countries than in most of the low-income
countries surveyed. Fifty percent of upper-middle-
income countries protect nonpersonal data under
their respective IPR frameworks. For example, Brazil’s
copyright law covers the use of databases containing
“economic rights.”73
Similarly, in Bangladesh pro-
gramming codes, data, and charts are deemed to be
the property of the owner, as indicated in the 2000
Copyright Act.
Creating enablers for data sharing
This section examines a variety of enablers, including
those related to electronic transactions (e-transactions),
data sharing policies (including open data, access to
information regimes, open licensing), and exceptions
to the liability of data intermediaries.
Enablers are primarily analyzed according to the
domain of the data—that is, whether data are gener-
ated or controlled, or both, by the public or private
sector. This approach highlights the varying margin
of control that governments have over these two
types of data. For public sector data, governments
can employ several policy and legal tools to directly
mandate access to and sharing of data—indeed, some
already do so for certain health, patent, and even
airline passenger data. By contrast, most data transac-
tions involving the private sector are based on volun-
tary contractual agreements. The government’s role
is largely limited to creating incentives to promote
private sector data sharing. Although the discussion
here deals mainly with domestic data transactions,
many of the enablers can be adapted to cross-border
data transactions (see chapter 7).
Across the 80 countries surveyed for this Report,
just under half (47 percent) of the elements of a
good-practice regulatory framework for enabling
data use and reuse are in place. The scores range
considerably, from 30 percent among low-income
countries to 62 percent among high-income coun-
tries. Although Estonia and the United Kingdom
stand out among the high-income countries surveyed
for the most advanced enablers, their performance
is matched in the middle-income group by Mexico.
Several other low- and middle-income nations, such as China, Colombia, Indonesia, and Nigeria, are also making progress in establishing regulatory frameworks to enable data reuse.
Overarching enablers for electronic transactions
Many data uses or transfers are executed via elec-
tronic transactions. Individuals using their data to
transact online need assurance that their data are
being used in a safe and secure manner. Laws gov-
erning e-commerce and e-transactions provide an
overarching legal framework that helps create trust
in both public and private sector online data transac-
tions, which, in turn, encourages use of data online.
Introducing e-commerce laws. A good-practice regu-
latory environment for electronic transactions begins
with foundational e-commerce legislation, which is
a prerequisite to the widespread use of more sophis-
ticated online credentials. Such laws are relatively
widespread; more than 70 of the countries surveyed,
including about 70 percent of low-income countries
surveyed, have such laws. And there is little variation
across country income groups (figure 6.5). Legal rec-
ognition of electronic signatures is one of the few
areas in which high-income countries remain far
ahead of low- and middle-income countries.
Establishing legal equivalence of paper-based and elec-
tronic communications. In a legal framework, the cen-
tral issue is to establish that a data transfer will not
be denied legal value merely because it is done elec-
tronically—that is, the online transaction, contract,
or communication has legal equivalence to physical
transactions, and electronic evidence has probative
value.74
For example, electronic contracts and signa-
tures are given the same legal value as a wet ink sig-
nature on a paper contract, and digital evidence has
the same value as physical evidence.75
The majority of
surveyed countries’ e-commerce legislation includes
such provisions (figure 6.5), an unsurprising finding
given that model laws on e-commerce were promul-
gated in the late 1990s.76
For example, provisions
enabling e-transactions are found in Morocco’s Law
No. 53-05 (2007), and good-practice provisions are
embedded in Thailand’s Electronic Transactions Act
(2019 amendments).
Authenticating parties to an online transaction. Spe-
cial legal treatment surrounds the manner in which
parties to an online transaction are authenticated.
Most laws governing e-transactions take a layered
approach to the digital authentication of parties to
a transaction, with built-in recognition that certain
types of online transactions require greater degrees
of reliability about the identity of parties, while others
require lower levels of assurance. Some—such as land
transactions and certain family law matters, includ-
ing marriage and divorce—are generally outside the
scope of these laws because of the sensitive nature
of the transaction. For transactions requiring a high
level of assurance, public or private key infrastructure
is often recognized in e-transaction laws as providing
robust authentication, and it is backed up by a digital
certification process.77
Other trust services may also
be specified as a basis for verifying and validating
electronic signatures, seals, or time stamps; verify-
ing and validating certificates to be used for website
authentication; and a range of activities related to
data transfers.78
Introducing digital identification. An important tool
for authentication of parties to a digital transaction
is a trusted digital identification system with wide-
spread coverage, allowing individuals to securely
prove their identity in online settings. Currently, an
estimated 1 billion people worldwide do not have
government-recognized proof of their identity (and
many more do not have the means to securely and
reliably prove who they are in the digital world).79
Although the use of digital identity verification and
authentication tools is on the rise, driven in part by
advances in connectivity as well as growth in digital
payments and services,80
fewer than half of surveyed
countries have government-recognized digital identi-
fication systems that would enable people to remotely
authenticate themselves to access e-government
services. Those that do are mainly higher-income
nations (figure 6.5).
Ensuring technical neutrality of online systems.
E-transaction laws should be principle-based and
technology-neutral so that they accommodate a wide
range of technical solutions and avoid requiring
specific authentication technologies to the exclusion
of others. Technology neutrality prevents any single technology or vendor from capturing the e-transaction or authentication market and helps laws adapt as technologies evolve.81
Technology neutrality
is also a feature of digital identity programs and of
some digital identity laws.82
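The layered, technology-neutral approach described in the preceding paragraphs can be made concrete with a minimal Python sketch. It assumes three levels of assurance, loosely modeled on the “low,” “substantial,” and “high” levels of the EU eIDAS Regulation; the transaction types, required levels, excluded categories, and credential names are hypothetical and not taken from any specific law. The rule compares only the level a credential is certified to reach, never its underlying technology, which is what keeps it technology-neutral.

```python
from dataclasses import dataclass
from enum import IntEnum


class Assurance(IntEnum):
    """Levels of assurance; a higher value means stronger identity proof."""
    LOW = 1
    SUBSTANTIAL = 2
    HIGH = 3


# Hypothetical minimum level a law might attach to each transaction type.
REQUIRED_LEVEL = {
    "newsletter_signup": Assurance.LOW,
    "tax_filing": Assurance.SUBSTANTIAL,
    "loan_contract": Assurance.HIGH,
}

# Hypothetical carve-outs: transactions kept outside the e-transaction law.
EXCLUDED = {"land_transfer", "marriage", "divorce"}


@dataclass(frozen=True)
class Credential:
    technology: str    # e.g. "pki_certificate" or "mobile_id"; never inspected
    level: Assurance   # level the credential has been certified to reach


def may_transact_online(transaction: str, credential: Credential) -> bool:
    """Return True if the credential is strong enough for the transaction.

    Only the certified level is compared, not the technology, so any
    solution certified at the required level qualifies.
    """
    if transaction in EXCLUDED:
        return False  # outside the scope of the e-transaction law
    return credential.level >= REQUIRED_LEVEL[transaction]


if __name__ == "__main__":
    pki = Credential("pki_certificate", Assurance.HIGH)
    mobile = Credential("mobile_id", Assurance.SUBSTANTIAL)
    print(may_transact_online("tax_filing", mobile))     # True
    print(may_transact_online("loan_contract", mobile))  # False
    print(may_transact_online("marriage", pki))          # False
```

Because the check never looks at the technology field, a new authentication method could be accepted simply by certifying it at a given level, without rewriting the rule.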
Enabling reuse of public intent data
The challenges with sharing and reusing public sector data abound. They include barriers to the real-time
provision of data; data not being shared or published
in reusable formats (standardized and machine read-
able with metadata); and data not being provided at
reasonable cost. Usage is also affected by the quality
or relevance of the data being shared. Political econ-
omy factors, including the absence of a data sharing
culture in public administration and lack of coordina-
tion among government entities, can further impede
the exchange of public sector data (see chapter 8).
Overcoming these challenges can yield consider-
able returns. An impact assessment of the 2003 Direc-
tive on the Reuse of Public Sector Information found
that in the European Union the direct economic value
of public sector information was €52 billion in 2017,
potentially rising to €194 billion by 2030.83
In recog-
nition of such potential value, national governments
have ramped up efforts to use policy, legal, and regula-
tory tools to mandate data sharing within and beyond
the public sector.
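To put the projection in perspective, the annual growth rate it implies can be computed directly. The sketch below assumes smooth compound growth over the 13 years from 2017 to 2030, an assumption made here for illustration rather than one stated in the impact assessment.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# EUR 52 billion in 2017 rising to EUR 194 billion by 2030 (13 years).
rate = implied_cagr(52, 194, 2030 - 2017)
print(f"Implied growth: {rate:.1%} per year")  # Implied growth: 10.7% per year
```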
A good-practice regulatory environment for
enabling reuse of public sector data would include
foundational legislation on open data and access to
information, as well as digital identity verification
and authentication; a data classification policy; adop-
tion of syntactic and semantic interoperability; and
user-friendly licensing arrangements. The surveyed
countries have adopted about half of such good prac-
tices, ranging, on average, from less than 30 percent
by low-income countries to two-thirds by high-
income countries (figure 6.6).
Figure 6.5 Adoption of e-commerce and related legislation is widespread across country income groups
[Bar chart: percentage of countries in each income group (high-income, upper-middle-income, lower-middle-income, low-income), on a 0 to 100 scale, that have adopted each of the following: e-commerce law; legal equivalence of paper-based and electronic communications; legal recognition of electronic signatures; technological neutrality of electronic communications; digital ID system to access e-government services.]
Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866. Data at http://bit.do/WDR2021-Fig-6_5.
Note: The figure shows the percentage of countries in each country income group that had adopted good-practice legal and regulatory frameworks for e-commerce as of 2020.
Legislation to promote and regulate the publication and use of public sector data (open government data) can be passed as stand-alone open data acts, such as in the Republic of Korea and Mexico; embedded in other related legislation, such as the laws mandating data sharing in Australia,84 India, and the United Kingdom;85 or through broader e-government omnibus legislation, such as France’s Law for a Digital Republic.86 The matter can also be tackled at the supranational level, such as through the European Union’s Open Data Directive of 2019 (replacing the Public Sector Reuse Directive of 2003), which includes a list of “high value datasets”87 to be published at no charge as key inputs to the development of AI.
Open data policies or laws and access to informa-
tion (ATI) legislation (also known as right to informa-
tion or freedom of information) play complementary
roles as enablers for the use and sharing of public
sector data. Open data policies or laws require public
disclosure of data as the general rule (ex ante disclo-
sure) rather than waiting for an individual request for
access to information (ex post disclosure).88
In coun-
tries that have passed open data policies without any
legal foundation, the publication of open government
data relies on the cooperation of holders of public
sector data to publish their data. By contrast, ATI
legislation provides citizens and firms with a legally
enforceable right to compel disclosure.89
The Open Data Barometer, which compiles a global measure of how governments are publishing and using open data for accountability, innovation, and social impact, recommends aligning access to
information and open data. This alignment would
entail amending ATI laws to provide for proactive dis-
closure of data and mandating that nonpersonal data
will be open by default, available in machine readable
formats, and published under an open license to
enable reuse outside government.
About one-third of surveyed countries have open
data legislation, and more than 70 percent have ATI
legislation (figure 6.6). Whereas ATI legislation is
widespread in countries across all stages of devel-
opment, adoption of open licensing regimes is more
common in high-income countries.
Establishing open data policies. A country’s public
sector data being prepared for publication can be clas-
sified on a spectrum from closed to open. According
to the Open Knowledge Foundation, for data to be
considered open it must be “freely used, re-used and
redistributed by anyone—subject only, at most, to
the requirement to attribute and sharealike.”90
Open
data are thought to be the most decisive approach
governments can use to enhance access to public
sector data and enable their reuse by third parties to
create value.91
According to the Open Data Institute,
key elements of a robust data policy include a clear
definition of open data and a general declaration
of principles that should guide the publication and
reuse of open data.92
Geospatial and transportation data are often
prioritized for publication by governments under
open data initiatives.93
However, certain categories
of data may not be suitable for publication as open
data, including personal data and data with national
security implications. Care must be taken to ensure
that personal data are not published on open data
portals without adequate protective measures and a
conscious assessment of the associated risks. A data
protection impact assessment can be used to evaluate
the risks of data processing and ensure that data are
adequately safeguarded before being shared.94
Figure 6.6 Regulations enabling access to and reuse of public intent data are unevenly developed across country income groups
[Bar chart: percentage of countries in each income group (high-income, upper-middle-income, lower-middle-income, low-income), on a 0 to 100 scale, that have adopted: semantic interoperability mandated for government entities; open data law; data classification policy; mandatory use of data classification for government data; access to information (ATI) legislation; exceptions to the access to information (ATI) right; adoption of open licensing regime by government.]
Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866. Data at http://bit.do/WDR2021-Fig-6_6.
Note: The figure shows the percentage of countries in each country income group that had adopted good-practice legal and regulatory frameworks to enable access, use, and reuse of public intent data as of 2020.
As open data systems mature, governments should move from merely promoting access to data to facilitating use of data. A key enabling reform is ensuring that data and metadata are “open by default,” available in a machine readable format and by bulk download or via application programming interfaces (APIs), which are computing interfaces and code that allow data to be readily transmitted between one software product or application and another. A survey conducted by the Open Data Barometer found that less than 10 percent of governments that have established open data portals include a provision for data to be open by default. Moreover, only half of the datasets published are in a machine readable format, and only one-quarter of datasets have an open license.95
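A minimal sketch of what “open by default” publication can involve: the same records are written once in a machine readable format (CSV) alongside a small metadata record that states the license. The transit dataset, field names, metadata keys, and license identifier are all hypothetical; real portals typically follow a catalog vocabulary such as W3C DCAT, which this sketch does not implement. The CSV text is what a bulk download would serve; an API would return the same records on request.

```python
import csv
import io
import json


def publish_open_dataset(rows, title, publisher, license_id):
    """Serialize rows as CSV plus a JSON metadata record.

    rows: non-empty list of dicts that all share the same keys.
    Returns (csv_text, metadata_json).
    """
    columns = list(rows[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)

    metadata = {
        "title": title,
        "publisher": publisher,
        "license": license_id,      # open license stated up front
        "media_type": "text/csv",   # machine readable, unlike a scanned PDF
        "columns": columns,
        "row_count": len(rows),
    }
    return buffer.getvalue(), json.dumps(metadata, indent=2)


if __name__ == "__main__":
    stops = [
        {"stop_id": "S1", "name": "Central Station", "lat": "4.711", "lon": "-74.072"},
        {"stop_id": "S2", "name": "Market Square", "lat": "4.598", "lon": "-74.076"},
    ]
    data, meta = publish_open_dataset(
        stops, "Bus stops", "Hypothetical City Transit Agency", "CC-BY-4.0")
    print(data)
    print(meta)
```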
Ensuring unified data classification standards. A key
enabler of data reuse is a data classification policy
that categorizes types of data according to objective
and easily implementable criteria across the differ-
ent stages of the data life cycle.96
Data classification
policies typically entail categorizing data according
to their sensitivity (such as classified, confidential,
or business use only). Although data classification
policies are found in more than half of the countries
surveyed (figure 6.6), their practical effects are limited because fewer than one-third of countries make the application of data classification policies mandatory for government database applications or document management systems.
Restricted data (data that cannot be published
as open data) could possibly be shared bilaterally by
agreement (such as with memoranda of understand-
ing). Alternatively, innovative mechanisms, includ-
ing data pools and data sandboxes, allow data to be
accessed and processed in a controlled environment,
subject to specific restrictions on data use. For exam-
ple, data could be analyzed at a secure data repository
(whether virtual or physical) but not taken off-site.97
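The classification tiers named above and the sharing options for restricted data can be combined into a simple decision rule, sketched below in Python. The “public” tier and the specific mapping from tier to sharing route are assumptions for illustration, not a rule drawn from any country’s policy; routing any dataset containing personal data to a controlled environment reflects the caution about open data portals noted earlier.

```python
from enum import Enum


class Sensitivity(Enum):
    """Sensitivity tiers; PUBLIC is an assumed extra tier for open data."""
    PUBLIC = 1
    BUSINESS_USE_ONLY = 2
    CONFIDENTIAL = 3
    CLASSIFIED = 4


def sharing_channel(level: Sensitivity, has_personal_data: bool) -> str:
    """Map a dataset's classification to a hypothetical sharing route."""
    if level is Sensitivity.CLASSIFIED:
        return "not_shared"
    if level is Sensitivity.CONFIDENTIAL or has_personal_data:
        # Analysts work inside a secure repository; data are not taken off-site.
        return "secure_environment"
    if level is Sensitivity.BUSINESS_USE_ONLY:
        # Restricted but lower risk: shared with named partners by agreement,
        # for example under a memorandum of understanding.
        return "bilateral_agreement"
    return "open_data_portal"


if __name__ == "__main__":
    for tier in Sensitivity:
        print(tier.name, sharing_channel(tier, has_personal_data=False))
```

A mandatory policy of this kind only has practical effect if every government system applies the same rule, which is the gap the survey figures above point to.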
Allowing access to information. ATI legislation is a
key complementary enabler for increasing access to
public sector data that have not been published on an
open data platform. Such legislation provides the legal
means for enforcement of public sector disclosure.98
As with open data legislation, ATI legislation can be
more or less effectively implemented, depending on
how broadly the exemption categories for disclosure
are drafted or interpreted and how restrictively data
classification policies are applied at the working level.
If government entities claim that much of their data
are “sensitive” and therefore fall under one of the
exceptions for disclosure under ATI statutes, then
the usefulness of such legislation for enabling public
data access may be limited. This concern is warranted
because nearly half the countries included in the
Global Data Regulation Survey—across the income
spectrum—have placed significant exceptions on an
individual’s rights to access public information under
such laws (figure 6.6).
Another limit to the impact of ATI legislation is
its scope of application, which is necessarily limited
to public sector data. Open data policies, although
originating in the public sector, can be voluntarily
adopted by the private sector. However, there is no
general legal equivalent to ATI requests to compel
the disclosure of private sector data. Currently, the
majority of private sector data sharing is undertaken
on a contractual basis. Certain experts have argued
that expanding the scope of laws mandating access to
private sector data, consistent with competition law,
could be the “next frontier in data governance.”99
Promoting interoperability of data and systems. For
the value of data—including open data—to be fully
harnessed, legislation must go beyond promoting
access to data and ensure that data can be used more
effectively by combining or linking datasets. Doing
so requires provisions governing the interoperability
of data (and metadata) and their quality, as well as the
modalities under which data should be published.
These good-practice characteristics include publish-
ing data in a machine readable format (under FAIR
principles that govern the findability, accessibility,
interoperability, and reuse of data)100
and ideally
via APIs.101
Interoperability of data and systems can
be supported by adopting harmonized standards—
ideally, open standards. Open standards are often
determined by sectoral or international standard
setting organizations (SSOs) in order to support the
interoperability of data and systems within a partic-
ular market or sector. They are therefore designed
collaboratively based on user needs.102
Public intent data should also be published under
an open license and at no charge or at a marginal
price to cover the costs of dissemination or reproduc-
tion.103
Nearly 48 percent of the surveyed countries
have adopted some form of open licensing regime
for public intent data. All the high-income countries
covered in the survey have done so, compared with
about 40 percent of middle-income countries. Other
countries, such as Jordan and Mauritius,104 have adopted Creative Commons Attribution 4.0 International Licenses for government datasets released
as open data. In Jordan, datasets published by the
government are open to all and licensed under a
Jordanian Open Government Data License, which
allows the use, reuse, and sharing of data, in compat-
ibility with the Creative Commons (CC-BY) license.105
To ensure that data prioritized for publication meet
the needs of nongovernmental actors in the private
sector and civil society, these decisions should be
guided by consultations with multiple stakeholders
(see chapter 8).
Enabling access to and the seamless transfer of
public sector data between different entities within
the public sector and with end users (including
individuals and businesses) requires ensuring the
interoperability of information technology (IT)
systems (including platforms) and data (syntactic
and semantic interoperability). As defined by ISO,
syntactic interoperability enables “the formats of the
exchanged information [to] be understood by the
participating systems,” while semantic interopera-
bility enables the “meaning of the data model within
the context of a subject area to be understood by the
participating systems.”106
Effective data and systems
interoperability requires the implementation of sev-
eral technical protocols and a government interoper-
ability platform.
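The ISO distinction between syntactic and semantic interoperability can be illustrated with two hypothetical agencies that both publish JSON. Parsing succeeds for both payloads (syntactic interoperability), but their field names and codes differ until each is mapped onto a shared vocabulary (semantic interoperability). The agencies, field names, codes, and mapping tables below are invented for the example.

```python
import json

# Two hypothetical agencies describe the same person in JSON. Both payloads
# parse, but field names and codes differ.
HEALTH_RECORD = '{"pid": "A-17", "sex": "F", "dob": "1990-04-02"}'
TAX_RECORD = '{"person_id": "A-17", "gender_code": 2, "birth_date": "1990-04-02"}'

# Illustrative mappings from each agency's local schema to a shared vocabulary.
FIELD_MAP = {
    "health": {"pid": "person_id", "sex": "sex", "dob": "birth_date"},
    "tax": {"person_id": "person_id", "gender_code": "sex", "birth_date": "birth_date"},
}
CODE_MAP = {
    "health": {"sex": {"F": "female", "M": "male"}},
    "tax": {"sex": {1: "male", 2: "female"}},
}


def to_shared_model(source: str, payload: str) -> dict:
    """Translate one agency's record into the shared vocabulary."""
    record = json.loads(payload)  # syntactic step: the format is understood
    shared = {}
    for local_name, value in record.items():
        field = FIELD_MAP[source][local_name]  # semantic step: field names
        codes = CODE_MAP[source].get(field)    # semantic step: coded values
        shared[field] = codes[value] if codes else value
    return shared


if __name__ == "__main__":
    print(to_shared_model("health", HEALTH_RECORD) ==
          to_shared_model("tax", TAX_RECORD))  # True
```

In practice the shared vocabulary would come from a government semantic catalogue rather than hand-written tables, but the two-step structure is the same.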
In addition to technical enablers for interoperabil-
ity across the whole of government, an enabling legal
and regulatory framework is often required. This
framework mandates the use of the government’s
interoperability platform and data exchange proto-
cols, ensuring that all government entities connect
to and use the platform as a vehicle for exchanging
data. Very few countries surveyed have adopted a
full range of common technical standards (such as
the FAIR principles) that enable the interoperability
of systems, registries, and databases (figure 6.6).
Estonia is among the few countries surveyed that
have established standards for open APIs for government to government (G2G), government to business
(G2B), and government to consumer (G2C) services;
standardized communications protocols for access-
ing metadata; and developed semantic catalogues for
data and metadata.
A distinct advantage of implementing interop-
erability is the possibility of applying the once-only
principle to access to data, which reduces the admin-
istrative burden. Citizens and businesses are asked to
provide their data only once, thereby requiring pub-
lic sector entities to internally share and reuse data—
with the appropriate safeguards—in the provision
of administrative services. Because the risk of data
breaches and misuse increases when data are stored
in centralized or decentralized but linked repositories, the once-only principle should be complemented
with robust legal and technical data protection as
well as cybersecurity and cybercrime safeguards,
implemented in a citizen-centered and trustworthy
manner, with sufficient capacity for implementation
(see chapter 8).107
This once-only principle was
integrated into the European eGovernment Action
Plan (2016–20) for implementation across the
European Union,108
with the intention of enabling both domestic and cross-border interoperability. It
is also one of the pillars of the 2015 Digital Single
Market strategy109
and The Once-Only Principle
Project (TOOP),110
which has been piloted under the
European Union’s Horizon 2020 framework.111
At the
national level, Austria, Belgium, Denmark, Estonia,
the Netherlands, Portugal, and Spain have integrated
the once-only principle into domestic law for applica-
tion across government or on a sector basis.112
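A minimal sketch of the once-only principle, assuming a single shared registry: the individual submits data once, and authorized agencies retrieve it instead of asking again. The class, agency names, and permission model are hypothetical. Storing everything in one place is precisely the centralized design the text flags as raising breach risk, so the sketch pairs it with two toy safeguards: a per-agency authorization table and an audit log recording the purpose of every read.

```python
from dataclasses import dataclass, field


@dataclass
class OnceOnlyRegistry:
    """Hypothetical shared registry: data are supplied once, then reused.

    Agencies may read only fields they are authorized for, and every read
    is logged with its stated purpose as a minimal safeguard.
    """
    authorized: dict = field(default_factory=dict)  # agency -> set of fields
    records: dict = field(default_factory=dict)     # person_id -> {field: value}
    audit_log: list = field(default_factory=list)

    def submit(self, person_id: str, data: dict) -> None:
        """The individual provides the data a single time."""
        self.records.setdefault(person_id, {}).update(data)

    def lookup(self, agency: str, person_id: str, field_name: str, purpose: str):
        """An agency reuses stored data instead of asking the person again."""
        if field_name not in self.authorized.get(agency, set()):
            raise PermissionError(f"{agency} may not read {field_name}")
        self.audit_log.append((agency, person_id, field_name, purpose))
        return self.records[person_id][field_name]


if __name__ == "__main__":
    registry = OnceOnlyRegistry(authorized={"housing_office": {"address"}})
    registry.submit("A-17", {"address": "12 Harbour Road", "income": 31000})
    print(registry.lookup("housing_office", "A-17", "address", "rent subsidy"))
    print(registry.audit_log)
```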
Enabling reuse of private intent data
The majority of business-to-business (B2B) and
business-to-government (B2G) data transactions
are governed by bilateral data sharing agreements
sourced in contract law.113
Consequently, policy and
legal interventions to encourage access to private
sector data focus on mitigating the legal and technical
challenges that discourage the use and sharing of data
by private sector entities. Governments also maintain
a greater margin of control over private sector data
transactions involving personal data, which are sub-
ject to personal data protection and privacy legislation
(or competition and consumer protection laws).
As appreciation has grown of the strategic value of
private sector data for enabling evidence-based policy
making and promoting innovation and competition
in key sectors (see chapter 4), some governments have
enacted legislation mandating the sharing of private
sector data deemed to be in the public interest and
whose voluntary sharing by the private sector would,
otherwise, have been too costly to incentivize.114
Many
of the sectors prioritized by such legislation (includ-
ing utilities and transportation) are considered to be
particularly relevant for the development of AI.
At the European level, the 2019 EU Open Data
Directive115
requires the European Commission to
adopt a list of high-value datasets to be provided free
of charge, in machine readable formats, via APIs, and
where relevant, via bulk download. These datasets,
considered to have “high commercial or societal
potential,” include geospatial data, Earth observation
data, meteorological data, data about corporate own-
ership, mobility data, and data from publicly funded
research projects.116
At the national level, France’s Law for a Digital Republic (2016) includes provisions requiring that private sector data be made available according to open standards for the creation of “public interest datasets.”117 Another relevant example is the UK Digital Economy Act (2017), which enables researchers
to gain access to deidentified data for research pur-
poses.118
At the subnational level, cities such as New
York, San Francisco, and São Paulo have also made
legal attempts to promote public-private data sharing
by requiring certain private sector platforms to share
their commercial data for regulatory purposes and to
spur the development of smart cities.119
A good-practice regulatory environment for
enabling reuse of private sector data encompasses
data portability and voluntary licensing of access to
essential data (figure 6.7). On average, surveyed coun-
tries have adopted less than 20 percent of such good
practices for enabling private sector reuse of data,
which is less than half the level of uptake found for
enablers related to public sector data.
Promoting open licensing. Licensing regimes, which
provide permission to use an otherwise proprietary
dataset, can be effective enablers of innovation
and competition. They can encourage holders of
data-related intellectual property rights to invest in
products and markets, knowing that they can control
access to licensed products and receive returns on
their investments.120
Licensing of intellectual prop-
erty rights is often voluntary, but in some cases it is
implemented on a compulsory basis by regulators or
industry participants to avoid market distortions.121
Voluntary licensing on FRAND terms can be a useful
mechanism for enabling the development of open
standards because the terms allow companies to
share technology and data.122
The adoption of such
licensing regimes, however, remains rare, especially
in low- and middle-income nations (figure 6.7). Korea
and the United Kingdom are among the few surveyed
countries that have done so.
A range of open licenses are available for use with
data. Open data licenses (Open Database Licenses, or
ODbLs) provide users with the legal rights to freely
share, modify, and use a database without regard to
copyright or other intellectual property rights or lim-
itations around data ownership. These license agree-
ments are published by the Open Data Commons,
which makes available a set of legal tools and licenses
to help users publish, provide, and use open data.123
The ODbL sets out user rights, establishes the
correct procedure for attributing credit, and specifies
how to modify data to facilitate their sharing and
comparability. Another form of open license for data
is issued by Creative Commons, an international net-
work devoted to educational access and expanding
the range of creative works available for others to
build on legally and to share.124
Under the license, any
person can use, copy, publish, distribute, transmit, or
process the data and make them available to third par-
ties. They can also develop new derivatives of the data
by combining them with other data or using them in
a product or service, as long as they are attributed to
the publisher(s) using a specified statement.125
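The attribution condition can be met with a short, consistently formatted statement. Creative Commons recommends that attributions identify the Title, Author, Source, and License (often abbreviated TASL), and CC BY 4.0 also requires users to indicate whether they modified the material. The function below is a hypothetical helper following that pattern; where a publisher specifies its own statement, as the text notes, that wording would take precedence.

```python
def attribution(title: str, author: str, source_url: str,
                license_name: str, license_url: str,
                modified: bool = False) -> str:
    """Build a Title-Author-Source-License (TASL) attribution line."""
    changes = " (adapted)" if modified else ""  # flag derivative works
    return (f'"{title}"{changes} by {author}, {source_url}. '
            f"Licensed under {license_name}: {license_url}")


if __name__ == "__main__":
    print(attribution(
        "Bus stops", "Hypothetical City Transit Agency",
        "https://data.example.org/bus-stops",
        "CC BY 4.0", "https://creativecommons.org/licenses/by/4.0/"))
```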
Requiring data portability. Voluntary data transac-
tions between parties are greatly facilitated by data
portability. The right to personal data portability is
designed to facilitate data transfers with the aim of
increasing an individual’s choice and control over
data about them. More fundamentally, the right to
personal data portability is aimed at “rebalancing the
relationship” between data generators/providers and
data controllers (including data users and platforms)
by mitigating the risk of locking in consumer data.
On a more systemic level, this right is intended to
foster competition between companies.126
Portability can be broken down into three distinct rights: first, to receive a copy of the data provided by the data generator to the data collector or user (including data consumers and platforms); second, to transmit data to another data collector/user; and third, to request a direct transfer from one data collector/user to another.127
Figure 6.7 Adoption of enablers for sharing private intent data lags those for public intent data across country income groups
[Bar chart: percentage of countries in each income group (high-income, upper-middle-income, lower-middle-income, low-income), on a 0 to 100 scale, that have adopted: data portability right for individuals; data portability formatting requirement; private sector able to digitally verify or authenticate ID; mandate for voluntary licensing of access to essential data on FRAND basis.]
Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866. Data at http://bit.do/WDR2021-Fig-6_7.
Note: The figure shows the percentage of countries in each country income group that had adopted good-practice legal and regulatory frameworks to enable access to, sharing, and reuse of private intent data as of 2020. FRAND = fair, reasonable, and non-discriminatory; ID = identification.
Although data portability rights extend to the raw data provided by the data subject (interpreted to include observed data), they do not appear to extend to inferred data (data derived by drawing inferences from the raw data provided), which are increasingly being used to develop AI.128 Enabling the direct transfer of (personal or nonpersonal) data to another provider requires that the source and host data formats and systems be fully interoperable through the implementation of APIs.129 At present, interoperability is encouraged, not mandated, by the GDPR,130 as well as by EU regulations on the free flow of nonpersonal data.131 Alternatives to direct portability include storing personal data in personal information management systems,132 although their uptake is limited.133

In the European Union, the right to personal data portability is mandated by Article 20 of the GDPR and considered one of its most significant innovations.134 Little more than 10 percent of the countries surveyed have enacted data portability rights for individuals. By contrast, the portability of nonpersonal data is not mandated, but only encouraged as a means of promoting competition and enabling the development of competitive sectors using AI and big data.135
Individuals’ ability to enforce their right to data portability can in practice be supported by requiring data to be transferred in a standard machine readable format. The surveyed countries that grant individuals data portability rights all include formatting requirements to support data portability (figure 6.7). For example, in the Philippines the Data Privacy Act of 2012 grants data portability rights to data subjects and empowers the National Privacy Commission to specify the format and technical requirements to enable data portability.136 Using a somewhat different approach, Australia created a specific “consumer data right”137 in August 2019 to enable data portability through its Competition and Consumer Act 2010.138 The act follows a phased sectoral approach to implementation,139 which enables common standards to be adapted to sector-specific needs while being made publicly available on GitHub.140
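The three portability rights and the formatting requirements described above can be made concrete with a small sketch. The Python below assumes two hypothetical providers that have agreed on a common JSON schema; the field names and the `schema_version` value are illustrative only and are not drawn from the Philippine, Australian, or any other legal standard. It shows why a shared, machine readable format is the precondition for portability: the receiving provider can validate and load the exported record only because it knows the schema.

```python
import json
from dataclasses import dataclass, field

# Hypothetical common schema shared by both providers (an illustration, not a real standard).
SCHEMA_VERSION = "1.0"
REQUIRED_FIELDS = {"subject_id", "name", "email", "transactions"}


@dataclass
class Provider:
    """A hypothetical data collector holding one record per data subject."""
    name: str
    records: dict = field(default_factory=dict)

    def export_copy(self, subject_id: str) -> str:
        """Right 1: hand the data subject a copy in a structured, machine-readable format (JSON)."""
        payload = {
            "schema_version": SCHEMA_VERSION,
            "source": self.name,
            "data": self.records[subject_id],
        }
        return json.dumps(payload, sort_keys=True)

    def import_copy(self, document: str) -> str:
        """Right 2: accept a copy the data subject transmits, validating it against the schema."""
        payload = json.loads(document)
        if payload.get("schema_version") != SCHEMA_VERSION:
            raise ValueError("unsupported schema version")
        data = payload.get("data", {})
        missing = REQUIRED_FIELDS - set(data)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        self.records[data["subject_id"]] = data
        return data["subject_id"]


def direct_transfer(source: Provider, target: Provider, subject_id: str) -> str:
    """Right 3: provider-to-provider transfer; it works only because both use the same schema."""
    return target.import_copy(source.export_copy(subject_id))


# Usage with made-up sample data.
old = Provider("OldBank", {"u1": {"subject_id": "u1", "name": "Ana",
                                  "email": "ana@example.org", "transactions": [12.5, 40.0]}})
new = Provider("NewBank")
copy_for_subject = old.export_copy("u1")  # what the data subject downloads
direct_transfer(old, new, "u1")           # what a direct transfer does
```

If the two providers used different schemas, `import_copy` would reject the document, which is the practical sense in which portability "cannot be exercised" without common formats.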
Despite these advances, technical limitations and legal uncertainties challenge effective enforcement of data portability rights. At the same time, cybersecurity risks, such as improper access or identity theft, could increase if portability is not accompanied by robust security measures.
In the absence of specific laws or regulations mandating interoperability and portability standards, some private sector actors have developed their own mechanisms. One example is the collaborative Data Transfer Project,141 launched by several large technology companies. Data format standardization is a key component of enabling data portability in practice: the right to data portability cannot be exercised if data are not downloaded in a format that other service providers can use. In practice, even though the project’s source code and APIs are open source, the fact that it is led by Big Tech firms raises broader questions about the first mover advantage these companies have in setting de facto standards and markets for B2B data sharing between platforms.
Using APIs to enable effective interoperability and portability. Depending on how they are designed, APIs can be used either to enable data sharing (such as through open APIs) and portability or to limit access to data.142 Although APIs are technical in nature, technolegal frameworks can be developed to determine access and control rules for third parties. These rules can include controls to ensure the syntactic and semantic portability of data; checks on the identity of API users; limits on the type and amount of data transacted; and controls on the identifiability of data.143 APIs designed with access and control mechanisms that enable the selection of a limited amount of data can provide users with more flexibility than downloading bulk data.144 That said, because APIs can expose data to unauthorized access during data transfers, they may prove challenging to use effectively in lower-middle-income countries that do not have sufficient technical capacity to respond to cybersecurity risks.
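A minimal sketch of the technolegal controls listed above, written in Python under stated assumptions: the client tokens, field names, per-client record caps, and sample records are all hypothetical. It enforces three of the rules mentioned in the text (checking the identity of the API user, restricting the type of data, and capping the amount transacted) and keeps direct identifiers out of every client's allowance as a simple identifiability control.

```python
from dataclasses import dataclass


# Hypothetical access rules for registered API clients; tokens, fields, and limits are illustrative.
@dataclass(frozen=True)
class ClientPolicy:
    allowed_fields: frozenset  # which attributes this client may request
    max_records: int           # cap on the amount of data per request


POLICIES = {
    "token-research": ClientPolicy(frozenset({"region", "age_band", "trips"}), max_records=2),
    "token-city": ClientPolicy(frozenset({"region", "trips"}), max_records=100),
}

# Made-up records; direct identifiers (user_id, name) are never granted to any client.
DATASET = [
    {"user_id": "a1", "name": "Ana", "region": "North", "age_band": "30-39", "trips": 14},
    {"user_id": "b2", "name": "Ben", "region": "South", "age_band": "40-49", "trips": 3},
    {"user_id": "c3", "name": "Caro", "region": "North", "age_band": "20-29", "trips": 9},
]


def query(token: str, fields: list, limit: int) -> list:
    """Return only the requested, permitted fields for at most the permitted number of rows."""
    policy = POLICIES.get(token)
    if policy is None:  # control on the identity of the API user
        raise PermissionError("unknown API client")
    denied = set(fields) - policy.allowed_fields  # control on the type of data transacted
    if denied:
        raise PermissionError(f"fields not permitted: {sorted(denied)}")
    n = min(limit, policy.max_records)  # control on the amount of data transacted
    return [{f: row[f] for f in fields} for row in DATASET[:n]]


print(query("token-research", ["region", "trips"], limit=10))
# [{'region': 'North', 'trips': 14}, {'region': 'South', 'trips': 3}]
```

Requesting `"name"` with either token raises `PermissionError`, so a client receives a selected slice rather than the bulk dataset.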
Fully leveraging APIs to enable effective interoperability and portability requires ensuring that they are developed according to common standards. These standards can be developed through regulation or by industry, based on a multistakeholder approach. One successful initiative is the Berlin Group, which has developed common API standards for the European banking industry.145 Cases from the financial services sector (such as the UK Open Banking Initiative and implementation of the European Union’s Second Payment Services Directive) may provide helpful lessons for the effective implementation of these mechanisms as enablers for data reuse and sharing.
Forging data partnerships. An alternative modality for private sector data sharing is through data partnerships entered into on mutually agreed contractual terms, either between private sector entities or between government and businesses (data public-private partnerships, or PPPs). For example, the traffic application Waze has partnered with more than 1,000 cities and other local public sector entities through its Connected Citizens Program,146 to exchange traffic data and derive insights to inform mobility projects, manage traffic and congestion, support emergency response, and share data with citizens through a cloud-based platform.147

206 | World Development Report 2021
Data partnerships pose several challenges. Partnerships between large companies and small and medium enterprises may raise concerns about fairness because of asymmetries in information or market power. Partnerships between public and private entities may lead to conflicts of interest because of the government’s dual role as data user and data regulator.148 In either case, partnerships may create uncertainties around the allocation of the legal liability associated with the use of the data, as well as potential compliance costs due to lack of harmonization of legal frameworks applicable to both parties.149
Some of these risks can be mitigated by developing
contract guidelines or standard contractual terms
to harmonize provisions and rectify information
asymmetries. Some public sector initiatives have
attempted to develop such standard terms to promote
data sharing.150
Not all data sharing partnerships are designed for profit. Some businesses provide their data and digital tools at no charge to governments, academia, and nongovernmental organizations for “social good.” Data philanthropy,151 particularly in the area of big data, has enabled the World Bank,152 together with UN agencies (the World Health Organization [WHO], United Nations Development Programme [UNDP], World Food Programme [WFP], and United Nations Children’s Fund [UNICEF]) and others, to leverage companies’ data stocks and digital capabilities to fully exploit the value of data for development, while benefiting the private sector through positive externalities.
Limiting intermediary liability. Among the great enablers of the flow of data across the internet are rules limiting the liability of intermediaries for content that flows over their platforms. The intermediary liability concept has roots in US telecommunications law dating back to the 1930s,153 and it has been informed by subsequent US case law.154 Crucially, this exemption from liability was extended to “interactive computer services” (such as internet service providers) in Section 230 of the Communications Act of 1934 (as amended in 1996)155 and in the Digital Millennium Copyright Act.156 The advent of data platform business models has led to growing requests from users for the “take-down” of their personal information and has triggered an ongoing debate between privacy advocates and Big Tech about responsibility for fundamental issues of freedom of expression and transparency of knowledge. Liability exemptions have been criticized for protecting defamatory conduct, encouraging harassment online, and undermining attempts by law enforcement to attribute conduct to specific individuals.157 Nevertheless, freedom of expression advocates continue to support shielding intermediaries from liability.158 The rapidly changing landscape is creating significant regulatory uncertainty for Big Tech firms (see the overview and chapter 1 for a discussion of the broader policy considerations relating to content moderation and mis/disinformation).
Recommendations for crafting a holistic legal framework
Any new social contract on data must rest on the foundation of a comprehensive legal and regulatory framework that helps build trust between stakeholders, integrating both safeguards and enablers. As the results of the Global Data Regulation Survey suggest, the development and robustness of different aspects of the legal and regulatory framework are quite uneven, with relatively consistent patterns across country income groups (table 6.1). These divergences may be exacerbated by differences in implementation. E-commerce legislation is the only area in which all country income groups are doing comparatively well. Development is at an intermediate level in areas such as enabling reuse of public intent data, safeguarding both personal and nonpersonal data, protecting cybersecurity, and combating cybercrime. By far the weakest area of performance of the surveyed countries is enablers for private intent data. Overall, the average scores of high-income countries are not very high in absolute terms, warranting an advanced classification (scores of 75 or above) in table 6.1 in only one case, e-commerce and e-transactions. And the score differential between high- and low-income countries is relatively small, exceeding 30 points only for cybersecurity and cybercrime (34 points) and public intent data (41 points). Both findings point to the novel challenges of developing a sound data governance legal framework and the significant progress all countries need to make.
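Both findings can be checked directly against the averages reported in table 6.1. The Python sketch below bands each score using the ranges given in the table note and computes the gap between high- and low-income countries for each theme. The only assumption is how to treat a score that falls exactly on a boundary the note lists in two ranges (50 or 75); here it goes to the higher band.

```python
# Average scores from table 6.1 (Global Data Regulation Survey, as of 2020).
THEMES = [
    "Cybersecurity and cybercrime", "Personal data", "Nonpersonal data",
    "E-commerce and e-transactions", "Public intent data", "Private intent data",
]
SCORES = {
    "High-income": [73, 59, 43, 86, 69, 30],
    "Upper-middle-income": [57, 46, 29, 74, 62, 20],
    "Lower-middle-income": [55, 43, 38, 72, 44, 15],
    "Low-income": [39, 31, 47, 59, 28, 3],
}


def level(score: int) -> str:
    """Band a score using the table note's ranges.

    The published ranges share their endpoints (50-75, 25-50); assigning a boundary
    score to the higher band is an assumption made here.
    """
    if score >= 75:
        return "advanced"
    if score >= 50:
        return "moderate"
    if score >= 25:
        return "evolving"
    return "basic"


high, low = SCORES["High-income"], SCORES["Low-income"]
advanced_high = [t for t, s in zip(THEMES, high) if level(s) == "advanced"]
gaps = {t: h - l for t, h, l in zip(THEMES, high, low)}
large_gaps = {t: g for t, g in gaps.items() if g > 30}

print(advanced_high)  # ['E-commerce and e-transactions']
print(large_gaps)     # {'Cybersecurity and cybercrime': 34, 'Public intent data': 41}
```

The gap for nonpersonal data is negative (43 versus 47), the one theme where low-income countries score higher on average.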
To fill the many remaining gaps in the legal framework and further strengthen existing provisions, this Report offers several recommendations. Overall, the underlying legal framework needs to be approached holistically. Although different elements of the legal framework can be viewed in a modular fashion, the elaboration of particular laws needs to touch on all critical aspects. The crafting of such a coherent legal framework should take into account both evolving best practices and local conditions based on robust stakeholder consultation. There is no one-size-fits-all solution.
Recommendations for strengthening safeguards
Adopt and implement personal data protection legislation. One of the biggest contributors to the trust framework is the adoption of personal data protection legislation following a rights-based approach. For countries that lack data protection legislation or enforcement agencies, existing consumer protection legislation and competition law can be leveraged to remedy certain manifestations of the misuse of personal data. Although such laws may be helpful, their scope of application is limited, making them complements to, not substitutes for, personal data protection legislation.
Introduce more meaningful models of consent. Traditional approaches to consent, developed in an analog age, are an increasingly uncomfortable fit in the modern digital age. Furthermore, in lower-income countries, where literacy challenges continue to affect a significant share of the population, reliance on “consent,” as traditionally applied, will continue to be problematic as more people access the internet and permit their data to be used and reused. To ensure that consent remains a meaningful legal basis for using data, new models should be seriously considered, including those that shift responsibility for data protection from individuals to the collectors and users of the data.
Expand protection to mixed data and group privacy. New data uses, fueled by innovative analytical techniques and the growth of algorithm-based technologies such as big data and the Internet of Things, are blurring the distinction between personal and nonpersonal data. At present, only personal data fall within the scope of most data protection laws, while anonymized personal data are considered nonpersonal data. In view of the ease of reidentifying and linking datasets, which opens the door to deriving sensitive or discriminatory insights from the processing of nonpersonal data, policy makers should consider expanding the scope of data protection legislation to protect such mixed data. A related issue is that current provisions for personal data protection, which focus on the individual, do not preclude the identification and potential misuse of data attributes pertaining to homogeneous groups (including those defined by ethnicity, race, religion, or sexual orientation). These protections are particularly important in complex or fragile sociopolitical environments or emergency contexts because of the increased risk of misuse of such data for targeting or surveillance.
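To illustrate why "anonymized" data can be easy to reidentify, the Python sketch below performs a simple linkage: it joins a dataset stripped of names with a second, named dataset on three quasi-identifiers (ZIP code, birth date, and sex, the combination studied in Sweeney 2000). All records are made up for illustration. The sketch also computes k-anonymity, one common measure of how many records share the same quasi-identifier values; k = 1 means at least one person is unique on those attributes and therefore exposed to linkage.

```python
from collections import Counter

# Attributes that are not names but can single people out when combined.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# Made-up "anonymized" records: names removed, quasi-identifiers kept.
health = [
    {"zip": "11111", "birth_date": "1980-02-01", "sex": "F", "diagnosis": "asthma"},
    {"zip": "11111", "birth_date": "1975-07-19", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "22222", "birth_date": "1990-11-30", "sex": "F", "diagnosis": "flu"},
]
# Made-up public list (for example, a membership roll) that does include names.
public = [
    {"name": "A. Jones", "zip": "11111", "birth_date": "1980-02-01", "sex": "F"},
    {"name": "B. Smith", "zip": "22222", "birth_date": "1990-11-30", "sex": "F"},
]


def qi_key(row: dict) -> tuple:
    """The tuple of quasi-identifier values for one record."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def link(anonymized: list, identified: list) -> dict:
    """Join the datasets on quasi-identifiers; a unique match reidentifies a person."""
    names_by_key = {}
    for person in identified:
        names_by_key.setdefault(qi_key(person), []).append(person["name"])
    reidentified = {}
    for row in anonymized:
        names = names_by_key.get(qi_key(row), [])
        if len(names) == 1:
            reidentified[names[0]] = row["diagnosis"]
    return reidentified


def k_anonymity(rows: list) -> int:
    """Size of the smallest group of rows sharing identical quasi-identifier values."""
    return min(Counter(qi_key(r) for r in rows).values())


print(link(health, public))  # {'A. Jones': 'asthma', 'B. Smith': 'flu'}
print(k_anonymity(health))   # 1: every record is unique on these three attributes
```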
Table 6.1 Certain elements of the regulatory framework are much better developed than others, but performance is generally low
Average score, by country group | Safeguards: Cybersecurity and cybercrime | Safeguards: Personal data | Safeguards: Nonpersonal data | Enablers: E-commerce and e-transactions | Enablers: Public intent data | Enablers: Private intent data
High-income | 73 | 59 | 43 | 86 | 69 | 30
Upper-middle-income | 57 | 46 | 29 | 74 | 62 | 20
Lower-middle-income | 55 | 43 | 38 | 72 | 44 | 15
Low-income | 39 | 31 | 47 | 59 | 28 | 3
Global | 56 | 44 | 38 | 73 | 50 | 17
Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866.
Note: The table shows the average score for good-practice data governance by theme across country income groups as of 2020. Levels of the regulatory framework: advanced level (scores of 75–100); moderate level (scores of 50–75); evolving level (scores of 25–50); and basic level (scores below 25).

Adopt data protection by design and default. Privacy-enhancing technologies are important complements to data protection legislation, allowing privacy to be embedded in data-driven products and services right from the design phase. Such privacy-by-design standards can play a valuable role in safeguarding fundamental data rights in contexts in which weak institutional capacity diminishes the legal enforceability of those rights. However, for technical mechanisms to have teeth, they must be underpinned by a robust legal framework that creates the rights and limits on use that privacy-enhancing technologies reinforce. Because of the utility of data protection and privacy by design, policy makers should consider building more of these requirements into their regulatory frameworks, while maintaining technological neutrality.
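One widely used privacy-by-design technique combines data minimization with pseudonymization. The Python sketch below shows one possible approach, not a prescribed standard: it drops fields not needed for a hypothetical purpose and replaces a national ID with an HMAC-SHA256 keyed hash. The field names and sample record are illustrative assumptions. As note 32 observes, pseudonymized data still count as personally identifiable information, because whoever holds the key can re-link the records.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    Without the key, nobody can recompute the pseudonym from a guessed identifier;
    whoever holds the key can re-link records, so the output is still personal data.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record: dict, needed: set, key: bytes) -> dict:
    """Privacy by default: keep only the fields needed for the stated purpose."""
    reduced = {k: v for k, v in record.items() if k in needed}
    reduced["pid"] = pseudonymize(record["national_id"], key)
    return reduced


key = secrets.token_bytes(32)  # stored separately from the dataset it protects
record = {"national_id": "AB123456", "name": "Ana", "age_band": "30-39", "district": "North"}
safe = minimize(record, {"age_band", "district"}, key)
print(sorted(safe))  # ['age_band', 'district', 'pid']
```

A keyed hash is used rather than a plain hash because an unkeyed SHA-256 of a national ID can be reversed by hashing every plausible ID and comparing.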
Prioritize cybersecurity measures. Protecting individuals’ and groups’ rights in data is one thing; protecting the infrastructure and systems over which those data flow (cybersecurity) is another. From a legal perspective, these protections are gained by adopting cybercrime legislation that balances security concerns with other fundamental rights. Too few countries have adopted serious legal provisions to ensure cybersecurity, leading to mounting social and economic risks. This gap should be addressed as a matter of urgency.
Recommendations for strengthening enablers
Build a robust yet flexible foundation for electronic transactions. Digital transactions should be granted legal equivalence to the analog variety, with limited exceptions. Robust authentication should be technology neutral to ensure a level playing field for a wide variety of approaches to authenticating transactions and related trust services.
Make data open by default and easy to access. Countries should strengthen open data policies by calling for open-by-default approaches to public sector data through legislation across the whole of government. Datasets to be published should be prioritized using input from end users. End users should not be charged (or should pay a limited price) for public intent data.
Consistently apply reasonable norms for data classification. Implementation of open data policies or laws requires the consistent application of clear, reasonable data classification policies.
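Consistency in classification is easiest to achieve when the rules are written down explicitly and applied the same way to every dataset. The Python sketch below encodes a hypothetical open-by-default rule set; the three levels, the attribute names, and the sample catalog are illustrative assumptions, not categories defined in this Report.

```python
from enum import Enum


class Level(Enum):
    OPEN = "open"
    RESTRICTED = "restricted"
    CONFIDENTIAL = "confidential"


def classify(dataset: dict) -> Level:
    """Hypothetical open-by-default rule set; exceptions are explicit and checked in a fixed order."""
    if dataset.get("contains_personal_data"):
        # Anonymized personal data can still be reidentified, so it is not opened by default.
        return Level.RESTRICTED if dataset.get("anonymized") else Level.CONFIDENTIAL
    if dataset.get("national_security") or dataset.get("commercially_sensitive"):
        return Level.RESTRICTED
    return Level.OPEN  # default: publish


# Made-up catalog entries.
catalog = [
    {"name": "budget_execution_2020"},
    {"name": "census_microdata", "contains_personal_data": True, "anonymized": True},
    {"name": "tax_returns", "contains_personal_data": True},
    {"name": "defense_procurement", "national_security": True},
]
for ds in catalog:
    print(ds["name"], classify(ds).value)
```

Because the exceptions are checked in a fixed order, two agencies applying the same rules to the same dataset reach the same result, which is the property the recommendation asks for.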
Adopt open standards and sharing-friendly licenses. Policy makers should strengthen open access to public intent data, including adoption of open standards and sharing-friendly licenses.
Strengthen access to information provisions. Access to information (ATI) legislation should be expanded to cover the proactive and transparent disclosure of nonsensitive data. Exceptions to disclosure will be necessary and should be proportionate. ATI laws should provide for regular public disclosure of ATI requests received and rejected, and justification for any rejection, ideally on an open platform.
Promote the interoperability of data and systems. Improving the use and sharing of data will rely on developing and applying unified technical standards to support the interoperability of data and systems. Interoperability of systems entails adoption of common technical protocols and a government interoperability platform. Data can be made interoperable by ensuring that they are classified and processed according to common standards and published in a machine readable format.
Support data portability. The right to data portability should be strengthened by requiring data to be in a structured, commonly used, and machine readable format. Interoperable data and systems can help achieve continuous data portability, where proportionate and technically feasible. As an alternative or complement to direct portability, personal information management systems can help users receive and manage their data, but their uptake is currently limited. The enforcement of data portability rights depends on adequate market competition, enabling users to switch providers. For data portability to be meaningful, there is also a need to address data subjects’ lack of clear understanding of these rights, as well as the implementation challenges faced by micro, small, and medium enterprises.
Promote sharing of private intent data. Governments can incentivize the sharing of private sector data by promoting data sharing agreements and enhancing intellectual property rights. Together, these measures can help reduce incentives for data hoarding and increase the reusability of data. In the case of public interest data, and particularly in emergency situations, governments should increasingly consider mandating private sector data sharing, subject to suitable conditions and safeguards.
A maturity model for strengthening the legal and regulatory framework
The urgency of applying these measures will depend on how far a country’s legal and regulatory framework for data has evolved. Countries should develop sound, comprehensive policies based on best practices adapted to their circumstances. Building on this foundation, countries should then enact robust legislation buttressed by multistakeholder consultation, followed by clear time-bound implementation procedures to ensure accountability. The identified measures can tentatively be mapped onto the maturity model framework summarized in table 6.2. Although certain safeguarding and enabling elements are considered foundational, the ability to build an effective legal regime for trusted data use is dependent on ensuring that the overall framework is both internally coherent and aligned with the country’s policy orientation, data culture, and social contract on data.
Table 6.2 Recommendations organized according to a maturity model based on data safeguards and enablers

Stage of country’s data system: Establishing fundamentals
Safeguards: Conduct a baseline needs assessment. Develop a comprehensive policy framework based on best practices that does the following:
• Safeguards personal, nonpersonal, and evolving categories of data and promotes greater equity around data
• Enhances the security of systems and infrastructure that protect against misuse of data
• Expands individuals’ agency and control over their personal data
• Promotes certainty and predictability, integrating the fundamental safeguards discussed in this chapter such as data protection and cybersecurity.
Enablers: Conduct a baseline needs assessment. Develop a comprehensive policy framework based on best practices that enables the use and sharing of data for development purposes, ensuring access, openness, interoperability, portability, predictability, and transparency, while integrating the fundamental enablers discussed in this chapter, such as electronic transactions.

Stage of country’s data system: Initiating data flows
Safeguards: Elaborate a legal framework that embodies policy prerogatives that include:
• Personal data protection
• Promotion of cybersecurity and combating of cybercrime
• Regulation of competition
• Provisions in the legal framework to provide for establishment of the relevant enforcement institutions.
Enablers: Elaborate a legal framework that embodies policy prerogatives that include:
• Legal recognition of e-transactions
• Access to information
• Intellectual property rights for nonpersonal data
• Openness of public intent data, including the use of licenses that encourage data sharing
• Data classification principles.

Stage of country’s data system: Optimizing the system
Safeguards: Promote awareness of safeguards:
• Domestically, through adoption of data protection by design and default, together with associated cybersecurity measures
• Internationally, through cross-border interoperability of data protection standards
• Address more complex issues such as mixed data and group rights
• Ensure that the capacity of the institutions responsible for overseeing these activities is sufficient
• Establish metrics to monitor and evaluate the implementation and enforcement of these policies and laws.
Enablers: Consider issues such as data portability and increasing incentives around sharing of private intent data. Ensure that the capacity of the institutions responsible for overseeing these activities is sufficient. Establish metrics to monitor and evaluate the implementation of these policies, laws, and institutions.

Source: WDR 2021 team.
Notes
1. Gellman (2013).
2. Confessore (2018).
3. A framework for data protection existed in the EU prior to the GDPR: the 1995 Data Protection Directive. Because a directive requires incorporation into domestic law, several European countries adopted their own data protection regimes, in some cases with even more stringent protections (such as Germany). However, adoption of the GDPR is a significant evolution in three key dimensions. First, as a regulation that applies directly to all EU members, it has harmonized data protection law across the EU. Second, it has supported enforcement through the introduction of significant fines. And, third, it has applied extraterritorially to cross-border data transactions involving data subjects in the EU.
4. Attorney General’s Office, California Department of Justice, California Consumer Privacy Act (CCPA), https://oag.ca.gov/privacy/ccpa.
5. China’s revision of its civil code will enter into force in 2021. Articles 1032–1039 grant certain rights to individuals. These provisions may be superseded by the expected introduction of a new law on protection of personal information. See, for example, Dong (2020) and Fang, Bigg, and Zhang (2020). China also published for consultation a draft law on personal data protection that in many respects mirrors provisions of the GDPR (Zhang and Yin 2020).
6. See Fisher and Streinz (2021) and Lessig (1999).
7. Personally identifiable information refers to information that can be used to distinguish or trace the identity of a data subject. Examples of such information are the subject’s name, national identity number, or biometric records, alone or when combined with other personal or identifying information that is linked or linkable to a specific data subject, such as date and place of birth or mother’s maiden name.
8. PMC (2019).
9. A regulatory impact analysis (RIA), also known as a regulatory impact assessment, is defined by the Organisation for Economic Co-operation and Development (OECD) as “a systemic approach to critically assessing the positive and negative effects of proposed and existing regulations and nonregulatory alternatives. As employed in OECD countries it encompasses a range of methods. It is an important element of an evidence-based approach to policy making” (OECD, “Regulatory Impact Analysis,” https://www.oecd.org/regreform/regulatory-policy/ria.htm). According to data from the World Bank’s Global Indicators of Regulatory Governance (GIRG), 86 out of 186 countries surveyed carry out RIAs on either a regular or a semiregular basis (World Bank, Global Indicators of Regulatory Governance [dashboard], https://rulemaking.worldbank.org/). However, although most high-income countries carry out RIAs (45 out of 59, or 76 percent), only 12 percent of low- and middle-income countries do so. Moreover, even though all OECD high-income countries except for Italy and Chile have developed specific RIA guidelines, only three countries in Sub-Saharan Africa (Kenya, South Africa, and Uganda) have set requirements. For more details, see Deighton-Smith, Erbacci, and Kauffmann (2016); ITU (2014); World Bank (2018); World Bank, “Key Findings,” https://rulemaking.worldbank.org/en/key-findings.
10. Chen (2021). To access the World Bank’s Global Data Regulation Survey and its results, see https://microdata.worldbank.org/index.php/catalog/3866.
11. COE (2018).
12. See, generally, page 70 of World Bank and United
Nations (2017).
13. World Bank and United Nations (2017).
14. World Bank and United Nations (2017).
15. In the absence of dual criminality, if an activity is criminal in jurisdiction X but is not in jurisdiction Y, then the authorities in X could not obtain the extradition of an offender located in Y.
16. Treaty Office, Directorate of Legal Advice and Public International Law, Council of Europe, “Details of Treaty No. 185: Convention on Cybercrime,” https://www.coe.int/en/web/conventions/full-list/-/conventions/treaty/185.
17. Cybercrime, Council of Europe, “Parties/Observers to the Budapest Convention and Observer Organisations to the T-CY,” https://www.coe.int/en/web/cybercrime/parties-observers.
18. U.S. Clarifying Lawful Overseas Use of Data Act (“CLOUD” Act), 18 U.S. Code §2523, https://www.justice.gov/dag/page/file/1152896/download.
19. LOC (2018).
20. Council of the European Union (2019).
21. EC (2018a).
22. EC (2014).
23. Sources differ on the number of data protection laws enacted around the world: 128 countries, according to the United Nations Conference on Trade and Development, Data Protection and Privacy Legislation Worldwide (dashboard), https://unctad.org/page/data-protection-and-privacy-legislation-worldwide; 116 countries, according to DLA Piper (2020); and 142 countries (as of 2019), according to Greenleaf and Cottier (2020), as referenced by Anderson and Renieris (2020).
24. Consent is not the only basis for data processing, but it remains a centerpiece because of its historical legacy. Even in the GDPR, consent is one among many grounds for legitimate data processing. See, for example, GDPR Article 6.1 (EU 2018a).
25. See Recital 43 of the GDPR (EU 2018c).
26. For example, the US Fair Credit Reporting Act bans
certain types of data from being used to determine an
individual’s creditworthiness (FTC 2018).
27. Section 35 of the Personal Data Protection Bill currently under discussion in India states that, in the event of an imminent threat to the sovereignty or integrity of the country or security of the state, the government has the power to exempt public sector entities from application of the bill entirely (Parliament of India 2019).
28. Sections 13(1) and (2) of India’s Personal Data Protection Bill (2018) state that, until and unless such a threat occurs, personal data may be processed without procuring consent from the user in the following cases: “(1) Personal data may be processed if such processing is necessary for any function of Parliament or any State Legislature. (2) Personal data may be processed if such processing is necessary for the exercise of any function of the State authorised by law for: (a) the provision of any service or benefit to the data principal from the State; or (b) the issuance of any certification, license or permit for any action or activity of the data principal by the State” (Personal Data Protection Bill, 2018, https://www.meity.gov.in/writereaddata/files/Personal_Data_Protection_Bill,2018.pdf).
29. See Article 8 of the European Convention on Human Rights on the right to respect for private and family life (ECHR 2010). The European Court of Human Rights (ECHR) has interpreted limitations to the right as subject to a “legitimate aim” necessary to fulfill a “pressing social need” and “proportionate to the legitimate aim pursued” (ECHR 2020, 12). These due process restrictions apply even in emergency situations; see Article 15 (ECHR 2010). Such situations could include war or pandemic. The European Data Protection Board (EDPB 2018), civil society organizations such as the Electronic Frontier Foundation (see, for example, Gelman 1998), and Article 19 of the European Convention have enshrined these principles into data protection rules and guidelines. See Electronic Frontier Foundation, “13 International Principles on the Application of Human Rights to Communication Surveillance,” https://www.eff.org/files/2014/01/05/13p-onepagerfinal.pdf.
30. Ben-Avie and Tiwari (2019).
31. According to the World Economic Forum, “volunteered data” are data that are “created and explicitly shared by individuals, e.g., social network profiles”; “observed data” are “captured by recording the actions of individuals, e.g., location data when using cell phones”; and “inferred data” are “data about individuals based on an analysis of volunteered or observed information, e.g., credit scores” (WEF 2011).
32. Austin and Lie (2019). See also Recital 26 of the GDPR: “The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable” (EU 2018b). Pseudonymized data, however, still count as personally identifiable information.
33. Austin and Lie (2019); de Montjoye et al. (2013). Addi-
tional research argues that only three data points are
needed for reidentification in most cases (Sweeney
2000). There is also much recent research on the lim-
its and durability of these deidentifying technologies
(Lubarsky 2017).
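The reidentification risk in note 33 can be made concrete by counting how many records in a “deidentified” table are unique on a handful of quasi-identifiers such as ZIP code, birth date, and sex. The following is a minimal Python sketch; the records and field names are hypothetical, and uniqueness within one small table is only a rough proxy for real-world reidentification risk.

```python
from collections import Counter


def unique_fraction(records: list[dict], quasi_identifiers: list[str]) -> float:
    """Share of records whose combination of quasi-identifier values
    occurs exactly once in the table (a crude reidentification-risk proxy)."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


# Hypothetical "deidentified" records: names removed, demographics kept.
records = [
    {"zip": "20433", "birth_date": "1980-05-01", "sex": "F", "diagnosis": "A"},
    {"zip": "20433", "birth_date": "1980-05-01", "sex": "F", "diagnosis": "B"},
    {"zip": "20433", "birth_date": "1975-11-23", "sex": "M", "diagnosis": "C"},
    {"zip": "20005", "birth_date": "1990-02-14", "sex": "F", "diagnosis": "A"},
]

print(unique_fraction(records, ["zip"]))                       # 0.25
print(unique_fraction(records, ["zip", "birth_date", "sex"]))  # 0.5
```

Each added attribute makes more rows unique; anyone holding another dataset with the same attributes and names attached could link those unique rows back to individuals.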
34. Austin and Lie (2019). For example, scientific research
may require certain personally identifiable infor-
mation characteristics (such as age and gender) for
accuracy.
35. Current international guidelines for data collection
and processing, such as the 2013 OECD guidelines and
the United Nations Data Privacy, Ethics and Protection
Principles (UNSDG 2017), and leading legal frame-
works such as the European Union’s General Data
Protection Regulation (EU 2018f), focus on protecting
personal data and personally identifiable information.
For a broader discussion, see Taylor, Floridi, and
van der Sloot (2017).
36. Krämer, Senellart, and de Streel (2020).
37. A purpose-driven approach to data protection should
involve determining as threshold questions what data
should be collected and what data should be shared.
Anonos, “Schrems II Webinar Summary: Lawful Data
Transfers,” https://www.schremsii.com/faqs-and-summary-edps-noyb-webinar.
38. In other words, focus on the ways in which the data
will and may be used and what its potential impacts
may be. For this reason, tools such as data protection
impact assessments (and, when appropriate, human
rights impact assessments, such as when high-risk,
data-driven technologies are being used) can help iden-
tify risks that must be mitigated through the appropri-
ate legal, technical, and organizational means.
39. Cavoukian (2011).
40. For example, homomorphic encryption allows analysis
of encrypted data. Similarly, federated learning
techniques allow data to be processed and analyzed
without having to send raw data to a central server
(Homomorphic Encryption Standardization, “Homomorphic
Encryption,” https://homomorphicencryption.org/; Potey,
Dhote, and Sharma 2016). That said, encryption is not a
silver bullet for compliance. Encryption may be an
effective safeguard while data are in storage or in
transit, but it may not provide sufficient protection
for processing if data must be decrypted before
computation.
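Note 40’s point that homomorphic encryption allows analysis of encrypted data can be illustrated with the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. Paillier is our choice of example, not a scheme named in the note, and the Python sketch below uses deliberately tiny, insecure primes; a real deployment would rely on a vetted library and far larger keys.

```python
import math
import secrets

# Toy Paillier key built from tiny primes. INSECURE: illustration only.
P, Q = 293, 433
N = P * Q                      # public modulus
N_SQ = N * N
G = N + 1                      # common simple choice of generator
LAM = math.lcm(P - 1, Q - 1)   # private key component (Carmichael's lambda of N)
MU = pow(LAM, -1, N)           # modular inverse; valid shortcut because G = N + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness r coprime to N."""
    if not 0 <= m < N:
        raise ValueError("plaintext out of range")
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover m as L(c^LAM mod N^2) * MU mod N, where L(x) = (x - 1) // N."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts m1 + m2 (mod N)."""
    return (c1 * c2) % N_SQ


# A party holding only ciphertexts can compute an encrypted total...
total = add_encrypted(encrypt(1200), encrypt(3400))
# ...which only the private-key holder can decrypt.
print(decrypt(total))  # 4600
```

The party that computes `total` never sees 1200 or 3400, which is the property that lets analysis run on data that stay encrypted.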
41. Newer techniques that have emerged in response
to challenges around deidentification include
K-anonymity (works by aggregating data attributes)
and differential privacy (works by introducing random
noise into datasets)—see Austin and Lie (2019); Dwork
(2006); Sweeney (2000).
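The K-anonymity idea in note 41 can be checked mechanically: after quasi-identifiers are generalized, every combination of their values should be shared by at least k records, so no record stands out from k − 1 others. The Python sketch below is illustrative only; the generalization rules (10-year age bands, three-digit ZIP prefixes) and the records are hypothetical, and practical anonymization also involves suppression and choosing generalizations that preserve usefulness.

```python
from collections import Counter
from collections.abc import Callable, Hashable


def is_k_anonymous(records: list[dict], k: int,
                   key: Callable[[dict], Hashable]) -> bool:
    """True if every equivalence class (records that share the same
    quasi-identifier values under `key`) contains at least k records."""
    counts = Counter(key(r) for r in records)
    return all(n >= k for n in counts.values())


def raw_key(r: dict) -> tuple:
    """Quasi-identifiers exactly as collected."""
    return (r["age"], r["zip"])


def generalized_key(r: dict) -> tuple:
    """Hypothetical generalization: 10-year age band, 3-digit ZIP prefix."""
    low = (r["age"] // 10) * 10
    return (f"{low}-{low + 9}", r["zip"][:3] + "**")


records = [
    {"age": 34, "zip": "20433"},
    {"age": 37, "zip": "20435"},
    {"age": 52, "zip": "20005"},
    {"age": 58, "zip": "20009"},
]

print(is_k_anonymous(records, 2, raw_key))          # False
print(is_k_anonymous(records, 2, generalized_key))  # True
```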
42. The European Commission’s guidance on privacy by
design is clear that these techniques should not be
a substitute for robust legal protections: “The term
‘Privacy by Design’ means nothing more than ‘data
protection through technology design.’ Behind this
is the thought that data protection in data processing
procedures is best adhered to when it is already inte-
grated in the technology when created. . . . The text of
the law leads one to conclude that often several protec-
tive measures must be used with one another to satisfy
statutory requirements. In practice, this consideration
is already performed in an early development phase
when setting technology decisions. Recognized certifi-
cation can serve as an indicator to authorities that the
persons responsible have complied with the statutory
requirements of ‘Privacy by Design’” (Intersoft Consulting,
“GDPR: Privacy by Design,” https://gdpr-info.eu/issues/privacy-by-design/).
Also see DSGVO-Portal, “Recital 78 GDPR | General Data Protection
Regulation,” https://www.dsgvo-portal.de/gdpr_recital_78.php.
43. The Mauritius Declaration on the Internet of Things
states: “Data processing starts from the moment the
data are collected. All protective measures should be in
place from the outset. We encourage the development
of technologies that facilitate new ways to incorporate
data protection and consumer privacy from the outset.
Privacy by design and default should no longer be
regarded as something peculiar. They should become
a key selling point of innovative technologies” (EDPS
2014, 2).
44. The International Organization for Standardization
has created a technical committee for a new ISO stan-
dard on Consumer Protection: Privacy by Design for
Consumer Goods and Services (ISO 2018).
45. Cavoukian (2010).
46. According to the European Commission: “Companies/
organisations are encouraged to implement technical
and organisational measures, at the earliest stages of
the design of the processing operations, in such a way
that safeguards privacy and data protection principles
right from the start (‘data protection by design’). By
default, companies/organisations should ensure that
personal data is processed with the highest privacy pro-
tection (for example only the data necessary should be
processed, short storage period, limited accessibility)
so that by default personal data isn’t made accessible
to an indefinite number of persons (‘data protection
212 | World Development Report 2021
by default’).” European Commission, “What Does Data
Protection ‘by Design’ and ‘by Default’ Mean?”
https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/obligations/what-does-data-protection-design-and-default-mean_en.
47. OVIC (2020).
48. See PCPD (2012) for materials on the data protection
framework in Hong Kong SAR, China.
49. ICO (2018).
50. de Montjoye et al. (2013). Additional research argues
that only three data points are needed for reidentifica-
tion in most cases (Sweeney 2000).
51. These new techniques include statistical approaches
such as K-anonymity, which aggregates data attributes
(Sweeney 2002); differential privacy, which intro-
duces random noise (Dwork 2006); and encryption
techniques such as homomorphic encryption, which
conduct analysis on encrypted data (Potey, Dhote, and
Sharma 2016).
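Note 51 describes differential privacy as introducing random noise. One standard way to do this for a counting query is the Laplace mechanism: because adding or removing one person changes a count by at most 1, adding noise drawn from a Laplace distribution with scale 1/ε gives ε-differential privacy for that query. The Python sketch below is illustrative only; the data and the value of ε are hypothetical, and production systems must also handle floating-point subtleties and track the total privacy budget spent across many queries.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d.
    exponential draws with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(values, predicate, epsilon: float,
                  rng: Optional[random.Random] = None) -> float:
    """Noisy count of items satisfying `predicate`.

    A counting query has sensitivity 1, so Laplace noise with scale
    1 / epsilon satisfies epsilon-differential privacy for this query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


ages = [23, 35, 41, 67, 70, 72, 15, 88]  # hypothetical records
# The true count of people aged 65 or over is 4; the released value is noisy.
print(private_count(ages, lambda a: a >= 65, epsilon=0.5))
```

Smaller values of ε mean more noise and stronger privacy; the released figure is useful in aggregate while masking any single person’s presence.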
52. Austin and Lie (2019).
53. See Hasselbalch and Tranberg (2016). An explanation
of a personal data store is offered in Mydex, “What Is
a Personal Data Store?” https://pds.mydex.org/what-personal-data-store-0.
54. ENISA (2014). Also see the recommendation by the
European Union Agency for Cybersecurity (ENISA)
that it may be necessary to overlay several privacy by
design or pseudonymization techniques in order to
meet the GDPR’s threshold (ENISA 2019).
55. The OECD Recommendation on Artificial Intelligence
“identifies five complementary values-based principles
for the responsible stewardship of trustworthy AI”
(OECD 2019c). In particular, according to principle 2 on
human-centered values and fairness, “AI actors should
respect the rule of law, human rights, and democratic
values throughout the AI system life cycle. These
include freedom, dignity and autonomy, privacy and
data protection, nondiscrimination and equality, diver-
sity, fairness, social justice, and internationally recog-
nized labor rights.” These actors should also “imple-
ment mechanisms and safeguards, such as capacity
for human determination, that are appropriate to the
context.” According to principle 3 on transparency and
explainability, “AI actors should commit to transpar-
ency and responsible disclosure regarding AI systems.”
One of the aims should be “to enable those adversely
affected by an AI system to challenge [the] outcome
based on plain and easy-to-understand information.”
As of May 2019, 44 countries had adhered to the OECD
Recommendation and the five principles (OECD 2019c).
56. See Loi N° 2013-450 relative à la protection des données
à caractère personnel [Law 2013-450 on the protection
of personal data], Journal Officiel de la République de Côte
d’Ivoire, August 8, 2013, 474–82.
57. Controversies around the use of facial recognition
and other AI-based technologies for law enforcement
have been in the public eye in the United Kingdom
since 2019, when the UK Information Commissioner’s
Office launched an investigation into the use of facial
recognition technology in King’s Cross in London, on
the grounds that it might raise data protection concerns.
Subsequently, the UK High Court ruled in favor of the use
of facial recognition by the South Wales Police in the
world’s first legal challenge to police use of facial
recognition; the claimant had argued that its use
violated privacy. See ICO (2019); Nilsson (2019); Smith (2016).
58. The 2016 study conducted by the Human Rights Data
Analysis Group using 2010 and 2011 data from the
Oakland police department and other sources com-
pared a mapping of drug use based on survey data
from the victims of crime with another based on algo-
rithmic analysis of police arrests. The study showed
that biased source data could reinforce and poten-
tially amplify racial bias in law enforcement practices
(Lum 2016). Data on arrests showed that African-
American neighborhoods have on average 200 times
more drug arrests than other areas in Oakland (NIST
2020; Smith 2016).
59. Hill (2020).
60. Noble (2018).
61. Dastin (2018).
62. Organisation for Economic Co-operation and Development,
http://www.oecd.org/going-digital/ai/principles/;
G-20 (Japan-led),
https://www.meti.go.jp/press/2019/06/20190610010/20190610010-1.pdf.
63. HLCM (2018).
64. Cision (2020); City of Amsterdam (2020); City of Hel-
sinki (2020).
65. DCMS (2019); Stats NZ (2019). For a subnational exam-
ple, see Nantes City’s Metropolitan Charter on Data
(Ville de Nantes 2019). At a national level, France’s
Etalab has developed a map of algorithmic systems
in use across public sector entities in France and is
providing ministries, departments, and agencies with
guidance on their reporting and other accountability
requirements (Etalab 2020a, 2020b).
66. See Canada’s responsible use of AI in government pro-
grams, including Guiding Principles, lists of certified
providers of AI services, and its Algorithmic Impact
Assessment (TBS 2020).
67. Henley and Booth (2020).
68. The mandate of the US Federal Trade Commission
(FTC) includes hearing and adjudicating cases involv-
ing unfair competition or unfair or deceptive acts
under Section 5 of the FTC Act (see Federal Trade
Commission, Federal Trade Commission Act,
https://www.ftc.gov/enforcement/statutes/federal-trade-commission-act).
According to the FTC, “when companies tell consumers they will
safeguard their personal information, the FTC can and does take law
enforcement action to make sure that companies live
up [to] these promises. The FTC has brought legal
actions against organizations that have violated con-
sumers’ privacy rights, or misled them by failing to
maintain security for sensitive consumer information,
or caused substantial consumer injury. In many of
these cases, the FTC has charged the defendants with
violating Section 5 of the FTC Act, which bars unfair
and deceptive acts and practices in or affecting com-
merce. In addition to the FTC Act, the agency also
enforces other federal laws relating to consumers’
privacy and security” (see Federal Trade Commission,
“Privacy and Security Enforcement,”
https://www.ftc.gov/news-events/media-resources/protecting-consumer-privacy/privacy-security-enforcement).
69. Hoofnagle, Hartzog, and Solove (2019).
70. In the context of its 2020 European Data Strategy, the
EU may adopt a new Data Act in 2021, which would
update the IPR framework currently in force (includ-
ing a possible revision of the 1996 Database Directive)
to support the use and reuse of nonpersonal data (EC
2020b).
71. See, for example, Contract Guidelines on Data Utilization
Rights, updating the Unfair Competition Prevention
Act of 2018 (METI 2020).
72. MeitY (2020).
73. See World Intellectual Property Organization, “Brazil:
Law No. 9.610 of February 19, 1998 (Law on Copyright
and Neighboring Rights, as amended by Law No. 12.853
of August 14, 2013),” WIPO Lex (database),
https://wipolex.wipo.int/en/legislation/details/17474.
74. See, generally, the two model laws promulgated by the
United Nations Commission on International Trade
Law (UNCITRAL 1998, 2001).
75. For purposes of this discussion, no distinction is
drawn between “electronic” signatures and “digital”
signatures, although commonly “digital” signatures
are associated with the use of public key infrastruc-
ture (PKI). For a more detailed explanation of PKI
and the differences between e-signatures and digital
signatures, see UNCITRAL (2001, 26–27;
https://www.uncitral.org/pdf/english/texts/electcom/ml-elecsig-e.pdf).
76. UNCITRAL (1998).
77. Public key infrastructure (PKI) has been defined as
follows: “The framework and services that provide
for the generation, production, distribution, control,
accounting, and destruction of public key certificates.
Components include the personnel, policies, processes,
server platforms, software, and workstations used for
the purpose of administering certificates and public-private
key pairs, including the ability to issue, maintain, recover,
and revoke public key certificates”
(https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r4.pdf).
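Notes 75 and 77 associate digital signatures with public key infrastructure: a signer uses a private key, anyone can verify with the matching public key, and PKI manages the certificates that bind public keys to identities. The Python sketch below shows only the sign-and-verify core, using textbook RSA with a tiny, insecure key chosen for illustration; it omits padding schemes, certificates, revocation, and everything else a real PKI provides.

```python
import hashlib

# Textbook RSA key from tiny primes. INSECURE: illustration only.
P, Q = 61, 53
N = P * Q                          # public modulus (3233)
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (2753)


def digest(message: bytes) -> int:
    """Hash the message with SHA-256 and reduce it into the range [0, N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can compute this value."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)


msg = b"Contract v1: deliver 100 units"
sig = sign(msg)
print(verify(msg, sig))            # True
print(verify(msg, (sig + 1) % N))  # False
```

In a real PKI, the verifier would first check a certificate, issued by a trusted authority, attesting that (N, E) belongs to the claimed signer.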
78. EU (2014, article 3[15]).
79. See, generally, World Bank, ID4D Data: Global Identification
Challenge by the Numbers (dashboard), https://id4d.worldbank.org/global-dataset;
Sustainable Development, Department of Economic and Social Affairs,
United Nations, “The 17 Goals,” https://sdgs.un.org/goals.
Sustainable Development Goal (SDG) 16.9 states:
“By 2030, provide legal identity for all, including birth
registration.”
80. World Bank (2019).
81. UNCITRAL (1998).
82. National Assembly, Togo (2020).
83. EC (2018c).
84. The Australian government’s Data Sharing and
Release Act of 2018 was drafted based on the results
of a report of the Productivity Commission (PC
2017). The purpose of the act is to (1) promote bet-
ter sharing of public sector data, (2) build trust in
use of public data, (3) dial up or down appropriate
safeguards, (4) maintain the integrity of the data
system, and (5) establish institutional arrangements
(see Department of the Prime Minister and Cabinet,
“Data Sharing and Release Reforms,”
https://www.pmc.gov.au/public-data/data-sharing-and-release-reforms).
This is expected to lead to (1) more efficient
and effective government services for citizens; (2) more
well-informed government programs and policies;
(3) greater transparency around government activities
and spending; (4) economic growth from innovative
data use; and (5) research solutions to current and
emerging social, environmental, and economic issues.
The purpose of the act is thus to move the paradigm
from one that restricts access to identifiable data to one
that authorizes release if appropriate data safeguards
are in place. To complement the Data Sharing and
Release Act, the government published a best-practice
guide outlining good-practice principles based on the
Five Safes Framework to manage the risks of disclosure
and designed to assess whether and how to share data
(PMC 2019). By enabling a privacy by design approach
to data sharing by focusing on controls and benefits
instead of merely reducing the level of detail in the data
to be shared, the principles help maximize the useful-
ness of the data.
85. National Archives (2019).
86. Section 1 of France’s Law for a Digital Republic lays out
provisions on open government data (Légifrance 2016).
Also see Dodds (2016).
87. They include geospatial, meteorological, and mobility
data, as well as statistics and data on corporate owner-
ship and Earth observation and the environment.
88. Noveck (2017).
89. Noveck (2017).
90. See Open Knowledge Foundation (2020).
91. OECD (2013, 2019a); Ubaldi (2013); Vickery (2012).
92. Dodds (2016).
93. OECD (2019a).
94. Austin and Lie (2019); Dodds (2016).
95. World Wide Web Foundation (2017).
96. For general principles, see ISO and IEC (2016, sec. 8.2).
For a practical example, see Data.NSW (2020).
97. OECD (2019a).
98. OECD (2019a).
99. Austin and Lie (2019).
100. Wilkinson et al. (2016).
101. See Article 3 of France’s Law for a Digital Republic:
“Art. L. 300-4.- Any provision made electronically under
this book is done in an open standard, easily reusable and
exploitable by an automated processing system” (Légifrance
2016).
102. Because the development of open standards is
often undertaken with input from leading industry
participants, who frequently integrate their firms’
proprietary technical standards into the design, SSOs
may require the application of patent rights on FRAND
terms. The adoption of FRAND licensing terms can
therefore become a condition for participation in SSOs.
The obligation to offer FRAND licenses to new market
entrants usually extends to third-party technology
providers whether or not they are SSO members. For
further details, see Ragavan, Murphy, and Davé (2016).
103. The Open Knowledge Foundation’s definition of open
data (“Open Definition”) sets out conditions for the
availability and access of data, its reuse and redistribu-
tion, and universal participation. On the latter, “every-
one must be able to use, re-use and redistribute—there
should be no discrimination against fields of endeavor
or against persons or groups. For example, ‘non-
commercial’ restrictions that would prevent ‘commer-
cial’ use, or restrictions of use for certain purposes (e.g.
only in education), are not allowed.” See Open Knowledge
Foundation (2020; https://okfn.org/opendata/).
104. MITCI (2017).
105. Council of Ministers, Jordan (2019).
106. ISO and IEC (2017).
107. EDRi (2015). Ensuring sufficient resources and techni-
cal capacity to effectively discharge these functions is
critical. For example, Estonia’s X-Tee data exchange and
interoperability platform is continuously monitored
to mitigate cyberthreats (RIA 2020). See chapter 8 for
further details on implementation.
108. EC (2016).
109. As the European Commission notes: “Online public
services are crucial to increasing the cost-efficiency
and quality of the services provided to citizens and
companies. One example of increased efficiency is
the ‘Once Only’ principle—only in 48% of cases do
public administrations reuse information about the
citizen or companies that is already in their possession
without asking again. The extension of this principle,
in compliance with data protection legislation, would
generate an annual net saving at the EU level of around
EUR 5 billion per year by 2017. The Commission will
launch a pilot project for the ‘Once-Only’ principle for
businesses and citizens and explore the possibility of
an EU wide e-safe solution (a secure online repository
for documents). Extending ‘Once-Only’ across borders
would further contribute to the efficiency of the Digital
Single Market” (EC 2015, 16).
110. TOOP (2021).
111. See European Commission, “Horizon 2020,” https://
ec.europa.eu/programmes/horizon2020/en.
112. SCOOP4C, “Stakeholder Community: Once-Only Principle
for Citizens,” https://www.scoop4c.eu/.
113. OECD (2019a).
114. OECD (2019a).
115. EU (2019b).
116. EC (2020a).
117. Légifrance (2016). This covers, for example, data from
delegated public services or data that are relevant for
targeting welfare payments or constructing national
statistics (OECD 2019a).
118. The UK Digital Economy Act enables accredited
researchers to gain access to deidentified data for
research purposes (National Archives, United Kingdom
2017, c. 30, Chap. 5). The act regulates data sharing prac-
tices for the purposes of research using public data,
but it does not govern data sharing in other contexts
(Austin and Lie 2019).
119. Finch and Tene (2018).
120. OECD (2019b).
121. Ragavan, Murphy, and Davé (2016).
122. FRAND licensing regimes have been designed to be
an effective competition law remedy (see the Apple
vs. Samsung cases), but infringements of FRAND
terms involve contractual remedies between the
patent holder and the SSO (or third party). However,
experts have argued that the pro-innovation and
competitive effects of licensing regimes depend on
how they are implemented. Indeed, some have argued
that an “excessive reliance” on FRAND terms may be
counterproductive.
123. Open Knowledge Foundation, “Open Data Commons
Open Database License (ODbL) v1.0,”
https://opendatacommons.org/licenses/odbl/1-0/.
124. Creative Commons, “Open Data,”
https://creativecommons.org/about/program-areas/open-data/.
125. MoICT (2017).
126. See the European Union’s Free Flow of Nonpersonal
Data Regulation (EU 2018e), the Payment Services
Directive (EU 2015), the Digital Content Directive (EU
2019a), and certain sectoral regulations, in addition to
the right to data portability for personal data enshrined
in Article 20 of the GDPR. See also Borgogno and
Colangelo (2019).
127. Article 20 of the GDPR (EU 2016).
128. The European Commission notes: “In general, given
the policy objectives of the right to data portability, the
term ‘provided by the data subject’ must be interpreted
broadly, and should exclude ‘inferred data’ and ‘derived
data,’ which include personal data that are created by
a service provider (for example, algorithmic results).
A data controller can exclude those inferred data but
should include all other personal data provided by the
data subject through technical means provided by the
controller” (EC 2017). This approach contrasts with that
of other legal frameworks, such as the California Consumer
Privacy Act (CCPA), that are broader in scope and cover
inferred data (see OneTrust DataGuidance and FPF 2019).
129. Krämer, Senellart, and de Streel (2020).
130. See Recital 68 of the GDPR: “Data controllers should
be encouraged to develop interoperable formats that
enable data portability. . . . The data subject’s right to
transmit or receive personal data concerning him or
her should not create an obligation for the controllers
to adopt or maintain processing systems which are
technically compatible. . . . Where technically feasible,
the data subject should have the right to have the per-
sonal data transmitted directly from one controller to
another” (EU 2018d).
131. See Article 6, “Porting of Data,” of the EU Regulation on
the Free Flow of Non-personal Data: “The Commission
shall encourage and facilitate [emphasis added] the development
of self-regulatory codes of conduct at Union
level (‘codes of conduct’), in order to contribute to a
competitive data economy” (EU 2018e, 67).
132. PIMS can help individuals control their ported data
through mechanisms that simplify the process. They
can include mechanisms that support individual con-
trol over ported data such as schema mappings (which
convert data from the sender’s format to the receiver’s)
or functionalities that centralize and help visualize
consent and rights management for portability or
broader data protection. However, these mechanisms
have not been standardized across the industry to date,
which affects the broader sustainability of the business
model and their adoption as an alternative to other
enforcement mechanisms.
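Note 132 mentions schema mappings, which convert ported data from the sender’s format to the receiver’s. A minimal Python sketch of the idea follows; both formats, all field names, and the conversion rules are hypothetical, and a real portability tool would likely also need validation, versioning, and consent checks.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical mapping: receiver field -> (sender field, conversion function).
SchemaMapping = dict[str, tuple[str, Callable[[Any], Any]]]

SENDER_TO_RECEIVER: SchemaMapping = {
    "full_name": ("name", str.strip),
    "email": ("contact_email", str.lower),
    "playlist_titles": ("playlists", lambda items: [p["title"] for p in items]),
}


def apply_mapping(record: dict[str, Any], mapping: SchemaMapping) -> dict[str, Any]:
    """Translate one exported record into the receiver's schema.

    Sender fields with no entry in the mapping are dropped; mapped fields
    missing from the record are skipped rather than invented.
    """
    out: dict[str, Any] = {}
    for target, (source, convert) in mapping.items():
        if source in record:
            out[target] = convert(record[source])
    return out


exported = {
    "name": "  Ana Silva ",
    "contact_email": "Ana.Silva@Example.org",
    "playlists": [{"id": 7, "title": "Focus"}, {"id": 9, "title": "Run"}],
    "internal_id": "u-4821",
}
print(apply_mapping(exported, SENDER_TO_RECEIVER))
```

Keeping the mapping as data rather than code makes it easier to publish and share, which is the kind of standardization the note says is still missing across the industry.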
133. Measures such as shifting to authentication mecha-
nisms (like privacy seals) and open-source solutions
that are more user friendly may support the adoption
of PIMS as alternatives for consumers, especially if the
reliability of these solutions is certified to promote
trust (Krämer, Senellart, and de Streel 2020).
134. The first right to portability mandated by EU law was
the portability of phone numbers, following the Uni-
versal Services Directive, based on a legislative effort to
create competition in the telecommunications sector
(Zanfir-Fortuna and Hondagneu-Messner 2019).
135. Borgogno and Colangelo (2019).
136. Congress of the Philippines (2012).
137. The “consumer data right” aims to “give Australians
greater control over their data, empowering their
consumers to choose to share their data with trusted
recipients only for the purposes they have authorized”
(Treasury, Australia 2020).
138. See Part IVD in Federal Register of Legislation,
Australia (2019).
139. The act begins with the telecommunications, bank-
ing, and energy sectors before rolling out across the
economy.
140. The data standards body has released version 1.6.0 of
the consumer data standards, which represent high-
level standards and are in accordance with the rules
and phasing timetable of the Australian Competition
and Consumer Commission. See Data61, Commonwealth
Scientific and Industrial Research Organisation,
“Consumer Data Standards,”
https://consumerdatastandards.gov.au/consumer-data-standards/.
141. At present, the Data Transfer Project is at the pilot
stage, making it difficult to measure the impact of the
project on enabling continuous portability of data. It
remains an interesting model of private sector–led
cooperation to develop standard and interoperable
data formats that could be scaled up. See Google, “Data
Transfer Project,” https://datatransferproject.dev/. The
founding members of the Data Transfer Project were
Google and Facebook. They were later joined by Apple,
Microsoft, and Twitter.
142. Borgogno and Colangelo (2019).
143. OECD (2019a).
144. This was a point of discussion at the international
policy workshop “Data for Better Lives: Enablers and
Safeguards” hosted by the World Bank and the German
Federal Ministry of Economic Cooperation and Devel-
opment in Washington, DC, June 9–10, 2020.
145. See Berlin Group, “PSD2 Access to Bank Accounts,”
https://www.berlin-group.org/psd2-access-to-bank-accounts.
146. Waze (2018).
147. Google, “Waze for Cities: Working Together for a
Smarter, More Sustainable Future,” Waze, https://www.waze.com/ccp.
Waze and other companies have been
sharing data with local governments in Brazil since
the 2016 Rio Olympics under their Connected Citizens
Program. Their platform is designed to support public
entities with urban planning, traffic optimization, law
enforcement, and emergency service provision (Huyer
and Cecconi 2020).
148. OECD (2019a).
149. Huyer and Cecconi (2020).
150. These include Japan’s “Contract Guidance on Utili-
zation of AI and Data” (METI 2018); the Netherlands’
Dare-2-Share Cooperation Agreement (Dare 2 Share
Ministries, “Terms and Conditions,”
https://www.dare2share.org/about/terms-and-conditions/); and
the European Union’s proposed “Guidance on Private
Sector Data Sharing” (EC 2018b). Japan’s Ministry of
Economy, Trade and Industry (METI) developed the
“Contract Guidance on Utilization of AI and Data” as a
resource for businesses entering a data sharing agree-
ment. It highlights factors and terms to be considered
for inclusion when drafting a contract using data or AI,
including sample clauses.
151. Kirkpatrick (2014).
152. See “Development Data Partnership,” https://datapartnership.org/.
153. GPO (2018).
154. Kosseff (2019, 27).
155. GPO (2018, at sec. 230).
156. LOC (1998).
157. Kosseff (2019, 5).
158. See Electronic Frontier Foundation, “Manila Principles
on Intermediary Liability,” https://www.manilaprinciples.org/.
References
Anderson, Thea, and Elizabeth M. Renieris. 2020. “Data
Protection and Digital Infrastructure before, during, and
after a Pandemic.” Omidyar Network, Redwood City, CA.
https://omidyar.com/data-protection-and-digital-infrastructure-before-during-and-after-a-pandemic/.
Austin, Lisa M., and David Lie. 2019. “Safe Sharing Sites.” NYU
Law Review 94 (4): 591–623.
https://www.nyulawreview.org/issues/volume-94-number-4/safe-sharing-sites/.
Ben-Avie, Jochai, and Udbhav Tiwari. 2019. “India’s New Data
Protection Bill: Strong on Companies, Step Backward
on Government Surveillance.” Open Policy and Advocacy
(blog), December 10, 2019.
https://blog.mozilla.org/netpolicy/2019/12/10/indias-new-data-protection-bill-strong-on-companies-weak-on-gov.
Borgogno, Oscar, and Giuseppe Colangelo. 2019. “Data Shar-
ing and Interoperability: Fostering Innovation and Com-
petition through APIs.” Computer Law and Security Review
35 (5): 105314. https://doi.org/10.1016/j.clsr.2019.03.008.
Cavoukian, Ann. 2010. “Privacy by Design: The Definitive
Workshop; A Foreword by Ann Cavoukian, Ph.D.” Identity
in the Information Society 3 (2): 247–51.
https://doi.org/10.1007/s12394-010-0062-y.
Cavoukian, Ann. 2011. “PbD, Privacy by Design, the 7 Foun-
dational Principles: Implementation and Mapping of
Fair Information Practices.” Information and Privacy
Commissioner of Ontario, Toronto.
Chen, Rong. 2021. “Mapping Data Governance Legal Frame-
works around the World: Findings from the Global
Data Regulation Diagnostic.” Policy Research Working
Paper 9615, World Bank, Washington, DC.
http://documents.worldbank.org/curated/en/581331617817680243/Mapping-Data-Governance-Legal-Frameworks-Around-the-World-Findings-from-the-Global-Data-Regulation-Diagnostic.
Cision. 2020. “Helsinki and Amsterdam First Cities in the
World to Launch Open AI Register.” Cision News, September
28, 2020. Cision, Chicago.
https://news.cision.com/fi/city-of-helsinki/r/helsinki-and-amsterdam-first-cities-in-the-world-to-launch-open-ai-register,c3204076.
City of Amsterdam. 2020. “What Is the Algorithm Register?”
City of Amsterdam Algorithm Register Beta.
https://algoritmeregister.amsterdam.nl/en/ai-register/.
City of Helsinki. 2020. “What Is an Artificial Intelligence
Register?” City of Helsinki Artificial Intelligence Register.
https://ai.hel.fi/.
COE (Council of Europe). 2018. “Convention 108+: Convention
for the Protection of Individuals with Regard to the
Processing of Personal Data.” COE, Strasbourg.
https://rm.coe.int/convention-108-convention-for-the-protection-of-individuals-with-regar/16808b36f1.
Confessore, Nicholas. 2018. “Cambridge Analytica and Facebook:
The Scandal and the Fallout So Far.” New York Times,
April 4, 2018.
https://www.nytimes.com/2018/04/04/us/politics/cambridge-analytica-scandal-fallout.html.
Congress of the Philippines. 2012. “Republic Act No. 10173: An
Act Protecting Individual Personal Information in Infor-
mation and Communications Systems in the Govern-
ment and the Private Sector, Creating for This Purpose a
National Privacy Commission, and for Other Purposes.”
August 12, 2012, Lawphil Project, Arellano Law Foundation,
Manila. https://lawphil.net/statutes/repacts/ra2012/ra_10173_2012.html.
Council of Ministers, Jordan. 2019. “Jordan Open Government
Data License.” Issue version 1.0, Open Government Data
Platform. https://portal.jordan.gov.jo/OGD-License_en.pdf.
Council of the European Union. 2019. “Regulation of the
European Parliament and of the Council on European
Production and Preservation Orders for Electronic
Evidence in Criminal Matters.” Interinstitutional File
2018/0108(COD), Council of the European Union, Brussels.
https://data.consilium.europa.eu/doc/document/ST-10206-2019-INIT/en/pdf.
Dastin, Jeffrey. 2018. “Amazon Scraps Secret AI Recruit-
ing Tool That Showed Bias against Women.” Reuters,
October 10, 2018.
https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G.
Data.NSW. 2020. “NSW Government Information Classifi-
cation, Labelling, and Handling Guidelines.” Data.NSW,
Data Analytics Center, Customer, Delivery, and Transformation,
Department of Customer Service, Government of New South
Wales, Sydney.
https://www.digital.nsw.gov.au/sites/default/files/NSW%20Info%20Classification%20Labelling%20and%20Handling%20Guidelines%202020%20V2.1_1.pdf.
DCMS (Department for Digital, Culture, Media, and Sport,
United Kingdom). 2019. “Digital Charter.” Policy Paper,
DCMS, London. https://www.gov.uk/government
/publications/digital-charter/digital-charter.
Deighton-Smith, Rex, Angelo Erbacci, and Céline Kauff-
mann. 2016. “Promoting Inclusive Growth through
Better Regulation: The Role of Regulatory Impact
Assessment.” OECD Regulatory Policy Working Paper 3,
Organisation for Economic Co-operation and Development,
Paris. https://doi.org/10.1787/5jm3tqwqp1vj-en.
de Montjoye, Yves-Alexandre, César A. Hidalgo, Michel
Verleysen, and Vincent D. Blondel. 2013. “Unique in
the Crowd: The Privacy Bounds of Human Mobility.”
Scientific Reports 3 (1): article 1376. https://doi.org/10.1038
/srep01376.
DLA Piper. 2020. Data Protection Laws of the World. London:
DLA Piper. https://www.dlapiperdataprotection.com
/index.html?t=about&c=AO.
Dodds, Leigh. 2016. “How to Write a Good Open Data Policy.”
Guides. Open Data Institute, London.
Dong, Marissa Xiao. 2020. “China: The Civil Code Strength-
ens Civil Law Protection around Privacy and Personal
Information.” Conventus Law, June 12, 2020. http://
www.conventuslaw.com/report/china-the-civil-code
-strengthens-civil-law/.
Dwork, Cynthia. 2006. “Differential Privacy.” In Automata,
Languages and Programming: 33rd International Colloquium,
ICALP 2006, Venice, Italy, July 10–14, 2006, Proceedings, Part
II, edited by Michele Bugliesi, Bart Preneel, Vladimiro
Sassone, and Ingo Wegener, 1–12. Lecture Notes in Com-
puter Science Series, Vol. 4052. Berlin: Springer. https://
link.springer.com/chapter/10.1007%2F11787006_1.
EC (European Commission). 2014. “Guidelines on the Imple-
mentation of the Court of Justice of the European Union
Judgment on ‘Google Spain and INC v. Agencia Española
de Protección de Datos (AEPD) and Mario Costeja
González’ C-131/12.” Document WP225, Directorate C
(Fundamental Rights and Union Citizenship), Directorate
General Justice, EC, Brussels. https://ec.europa.eu
/newsroom/article29/item-detail.cfm?item_id=667236.
EC (European Commission). 2015. “A Digital Single Market
Strategy for Europe.” Document COM(2015) 192 final,
EC, Brussels. https://eur-lex.europa.eu/legal-content/EN
/TXT/PDF/?uri=CELEX:52015DC0192&from=EN.
EC (European Commission). 2016. “EU eGovernment Action
Plan 2016–2020: Accelerating the Digital Transformation
of Government.” Document COM(2016) 179 final, EC,
Brussels. https://eur-lex.europa.eu/legal-content/EN
/TXT/?uri=CELEX:52016DC0179.
EC (European Commission). 2017. “Article 29 Data Protec-
tion Working Party: Guidelines on the Right to Data
Portability.” Document WP242 rev.01, Directorate C
(Fundamental Rights and Rule of Law), Director-
ate General Justice and Consumers, EC, Brussels.
https://ec.europa.eu/newsroom/article29/item-detail
.cfm?item_id=611233.
EC (European Commission). 2018a. “Article 29 Working Party:
Guidelines on Consent under Regulation 2016/679.” Document
WP259 rev.01, Directorate C (Fundamental Rights
and Union Citizenship), Directorate General Justice,
EC, Brussels. https://ec.europa.eu/newsroom/article29
/document.cfm?action=display&doc_id=51030.
EC (European Commission). 2018b. “Guidance on Private
Sector Data Sharing.” Text. Shaping Europe’s Digital
Future—European Commission. https://ec.europa.eu
/digital-single-market/en/guidance-private-sector-data
-sharing.
EC (European Commission). 2018c. Study to Support the
Review of Directive 2003/98/EC on the Re-Use of Public Sec-
tor Information: Final Report. Luxembourg: Publications
Office of the European Union. https://data.europa.eu
/doi/10.2759/373622.
EC (European Commission). 2020a. “European Legislation
on Open Data and the Re-Use of Public Sector Information.”
Shaping Europe’s Digital Future: Policy. Data Policy and
Innovation (Unit G.1), EC, Brussels. https://ec.europa.eu
/digital-single-market/en/european-legislation-reuse
-public-sector-information.
EC (European Commission). 2020b. “A European Strategy
for Data.” Communication COM(2020) 66 final,
EC, Brussels. https://ec.europa.eu/info/sites/info/files
/communication-european-strategy-data-19feb2020_en
.pdf.
ECHR (European Court of Human Rights). 2010. “European
Convention on Human Rights.” ECHR and Council of
Europe, Strasbourg. https://www.echr.coe.int/documents
/convention_eng.pdf.
ECHR (European Court of Human Rights). 2020. Guide on
Article 8 of the European Convention on Human Rights: Right
to Respect for Private and Family Life, Home and Correspondence,
rev. ed. Strasbourg: ECHR. https://www.echr.coe
.int/documents/guide_art_8_eng.pdf.
EDPB (European Data Protection Board). 2018. “Guidelines
2/2018 on Derogations of Article 49 under Regulation
2016/679.” Guidelines. EDPB, Brussels. https://edpb.europa
.eu/sites/edpb/files/files/file1/edpb_guidelines_2_2018
_derogations_en.pdf.
EDPS (European Data Protection Supervisor). 2014.
“Mauritius Declaration on the Internet of Things.”
36th International Conference of Data Protection
and Privacy Commissioners, Balaclava, Mauritius,
October 14, 2014. https://edps.europa.eu/sites/edp/files
/publication/14-10-14_mauritius_declaration_en.pdf.
EDRi (European Digital Rights). 2015. “A Truly Digital Single
Market?” June, EDRi, Brussels. https://edri.org/files
/DSM_Analysis_EDRi_20150617.pdf.
ENISA (European Union Agency for Cybersecurity). 2014.
“Privacy and Data Protection by Design: From Policy
to Engineering.” ENISA, Heraklion, Greece. https://data
.europa.eu/doi/10.2824/38623.
ENISA (European Union Agency for Cybersecurity). 2019.
“Pseudonymisation Techniques and Best Practices:
Recommendations on Shaping Technology According to
Data Protection and Privacy Provisions.” ENISA, Heraklion,
Greece. https://www.enisa.europa.eu/publications
/pseudonymisation-techniques-and-best-practices.
Etalab. 2020a. “Algorithmes de Nantes Métropole” [Nantes Métropole algorithms].
data.gouv.fr, October 7, 2020, Etalab, Paris. https://www.data.gouv
.fr/en/datasets/algorithmes-de-nantes-metropole/.
Etalab. 2020b. “Les algorithmes publics: enjeux et obli-
gations” [Public sector algorithms: challenges and
obligations]. guides.etalab.gouv.fr, Etalab, Paris. https://
guides.etalab.gouv.fr/algorithmes/guide/#_1-a-quoi
-servent-les-algorithmes-publics.
EU (European Union). 2014. “Regulation (EU) No 910/2014 of
the European Parliament and of the Council of 23 July
2014 on Electronic Identification and Trust Services
for Electronic Transactions in the Internal Market and
Repealing Directive 1999/93/EC.” Official Journal of the
European Union L 257/73 (August 8). https://ec.europa.eu
/futurium/en/system/files/ged/eidas_regulation.pdf.
EU (European Union). 2015. “Directive (EU) 2015/2366 of
the European Parliament and of the Council of 25
November 2015 on Payment Services in the Internal
Market, Amending Directives 2002/65/EC, 2009/110/
EC and 2013/36/EU and Regulation (EU) No 1093/2010,
and Repealing Directive 2007/64/EC.” Official Journal of
the European Union L 337/35 (December 23). https://eur
-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX
:32015L2366&from=EN.
EU (European Union). 2016. “Regulation (EU) 2016/679 of the
European Parliament and of the Council of 27 April 2016
on the Protection of Natural Persons with Regard to the
Processing of Personal Data and on the Free Movement
of Such Data, and Repealing Directive 95/46/EC (General
Data Protection Regulation).” Official Journal of the European
Union L 119/1 (May 4). https://eur-lex.europa.eu/eli
/reg/2016/679/oj.
EU (European Union). 2018a. “Art. 6 GDPR: Lawfulness
of Processing.” GDPR.Eu, November 14, 2018. Proton
Technologies, Calgary, Canada. https://gdpr.eu/article-6
-how-to-process-personal-data-legally/.
EU (European Union). 2018b. “Recital 26: Not Applicable
to Anonymous Data.” GDPR.Eu, November 14, 2018.
Proton Technologies, Calgary, Canada. https://gdpr.eu
/recital-26-not-applicable-to-anonymous-data/.
EU (European Union). 2018c. “Recital 43: Freely Given
Consent.” GDPR.Eu, November 14, 2018. Proton Technologies,
Calgary, Canada. https://gdpr.eu/recital-43-freely
-given-consent/.
EU (European Union). 2018d. “Recital 68: Right of Data
Portability.” GDPR.Eu, November 14, 2018. Proton Technologies,
Calgary, Canada. https://gdpr.eu/recital-68
-right-of-data-portability/.
EU (European Union). 2018e. “Regulation (EU) 2018/1807
of the European Parliament and of the Council of 14
November 2018 on a Framework for the Free Flow of
Non-personal Data in the European Union.” Official
Journal of the European Union L 303, 61 (November 10):
78–68. https://eur-lex.europa.eu/legal-content/EN/TXT
/PDF/?uri=OJ:L:2018:303:FULL&from=EN.
EU (European Union). 2018f. “What Is GDPR, the EU’s
New Data Protection Law?” GDPR.Eu, May 25, 2018.
Proton Technologies, Calgary, Canada. https://gdpr.eu
/what-is-gdpr/.
EU (European Union). 2019a. “Directive (EU) 2019/770
of the European Parliament and of the Council of 20
May 2019 on Certain Aspects Concerning Contracts
for the Supply of Digital Content and Digital Ser-
vices.” Official Journal of the European Union L 136/1 (May
22). https://eur-lex.europa.eu/legal-content/EN/TXT/PDF
/?uri=CELEX:32019L0770&from=EN.
EU (European Union). 2019b. “Directive (EU) 2019/1024 of the
European Parliament and of the Council of 20 June 2019
on Open Data and the Re-Use of Public Sector Informa-
tion.” Official Journal of the European Union L 172/56 (June
26). https://eur-lex.europa.eu/legal-content/EN/TXT
/PDF/?uri=CELEX:32019L1024&from=EN.
Fang, Sammy, Carolyn Bigg, and John Zhang. 2020. “New
Chinese Civil Code Introduces Greater Protection of
Privacy Rights and Personal Information.” Insights,
June 9, 2020, DLA Piper, London. https://www.dlapiper
.com/en/uk/insights/publications/2020/06/new-chinese
-civil-code-introduces-greater-protection-of-privacy
-rights-and-personal-information/.
Federal Register of Legislation, Australia. 2019. “Competition
and Consumer Act 2010, No. 51, 1974.” Compilation 121.
Sydney: Office of Parliamentary Counsel. http://www
.legislation.gov.au/Details/C2019C00317/Html/Volume_1.
Finch, Kelsey, and Omer Tene. 2018. “Smart Cities: Privacy,
Transparency, and Community.” In The Cambridge Hand-
book of Consumer Privacy, edited by Evan Selinger, Jules
Polonetsky, and Omer Tene, 125–48. Cambridge Law
Handbooks Series. Cambridge, UK: Cambridge University
Press. https://doi.org/10.1017/9781316831960.007.
Fisher, Angelina, and Thomas Streinz. 2021. “Confronting
Data Inequality.” WDR 2021 background paper, World
Bank, Washington, DC. https://papers.ssrn.com/sol3
/papers.cfm?abstract_id=3825724.
FTC (Federal Trade Commission, United States). 2018. Fair
Credit Reporting Act, 15 U.S.C. § 1681, rev. ed. Washington,
DC: FTC. https://www.ftc.gov/system/files/documents
/statutes/fair-credit-reporting-act/545a_fair-credit
-reporting-act-0918.pdf.
Gellman, Barton. 2013. “Edward Snowden, after Months
of NSA Revelations, Says His Mission’s Accomplished.”
Washington Post, December 23, 2013. https://www
.washingtonpost.com/world/national-security/edward
-snowden-after-months-of-nsa-revelations-says-his
-missions-accomplished/2013/12/23/49fc36de-6c1c-11e3
-a523-fe73f0ff6b8d_story.html.
Gelman, Robert B. 1998. Protecting Yourself Online: The Defin-
itive Resource on Safety, Freedom, and Privacy in Cyberspace.
With Stanton McCandlish and Members of the Elec-
tronic Frontier Foundation. New York: HarperCollins.
GPO (Government Publishing Office, United States). 2018.
Communications Act of 1934, as Amended. United States
Code, 2018 ed. Title 47: Telecommunications. Washington,
DC: GPO. https://www.govinfo.gov/app/details
/USCODE-2018-title47/USCODE-2018-title47-chap5
-subchapI-sec151.
Greenleaf, Graham, and Bertil Cottier. 2020. “2020 Ends a
Decade of 62 New Data Privacy Laws.” Privacy Laws and
Business International Report 163: 24–26. https://papers
.ssrn.com/sol3/papers.cfm?abstract_id=3572611.
Hasselbalch, Gry, and Pernille Tranberg. 2016. “Personal
Data Stores Want to Give Individuals Power over Their
Data.” Dataethics (blog), September 27, 2016. https://
dataethics.eu/personal-data-stores-will-give-individual
-power-their-data/.
Henley, Jon, and Robert Booth. 2020. “Welfare Surveillance
System Violates Human Rights, Dutch Court Rules.”
Guardian, February 5, 2020. https://www.theguardian
.com/technology/2020/feb/05/welfare-surveillance
-system-violates-human-rights-dutch-court-rules.
Hill, Kashmir. 2020. “Wrongfully Accused by an Algorithm.”
New York Times, August 3, 2020. https://www.nytimes
.com/2020/06/24/technology/facial-recognition-arrest
.html.
HLCM (High-Level Committee on Management, United
Nations). 2018. “Personal Data Protection and Privacy
Principles.” HLCM, Chief Executives Board for Coordination,
United Nations, Geneva. https://unsceb.org
/personal-data-protection-and-privacy-principles.
Hoofnagle, Chris Jay, Woodrow Hartzog, and Daniel J.
Solove. 2019. “The FTC Can Rise to the Privacy Challenge,
but Not without Help from Congress.” Brookings TechTank
(blog), August 8, 2019. https://www.brookings.edu/blog
/techtank/2019/08/08/the-ftc-can-rise-to-the-privacy
-challenge-but-not-without-help-from-congress/.
Huyer, Esther, and Gianfranco Cecconi. 2020. “Business-
to-Government Data Sharing.” Analytical Report 12,
European Data Portal, European Commission, Luxembourg.
https://www.europeandataportal.eu/sites/default
/files/analytical_report_12_business_government_data
_sharing.pdf.
ICO (Information Commissioner’s Office). 2018. Guide to the
General Data Protection Regulation (GDPR). Wilmslow, UK:
ICO. https://ico.org.uk/media/for-organisations/guide-to
-data-protection/guide-to-the-general-data-protection
-regulation-gdpr-1-1.pdf.
ICO (Information Commissioner’s Office). 2019. “The Use of
Live Facial Recognition Technology by Law Enforcement
in Public Places.” Information Commissioner’s Opinion,
2019/01, October 31, 2019. https://ico.org.uk/media/about
-the-ico/documents/2616184/live-frt-law-enforcement
-opinion-20191031.pdf.
ISO (International Organization for Standardization). 2018.
“ISO/PC 317: Consumer Protection: Privacy by Design
for Consumer Goods and Services.” Taking Part: Technical
Committee, ISO, Geneva. https://www.iso.org/committee
/6935430.html.
ISO (International Organization for Standardization) and
IEC (International Electrotechnical Commission). 2016.
“ISO/IEC 27011:2016(en): Information Technology,
Security Techniques, Code of Practice for Information
Security Controls Based on ISO/IEC 27002 for Telecom-
munications Organizations.” Online Browsing Platform,
ISO, Geneva. https://www.iso.org/obp/ui/#iso:std:iso
-iec:27011:ed-2:v1:en.
ISO (International Organization for Standardization) and
IEC (International Electrotechnical Commission). 2017.
“ISO/IEC 19941:2017, Information Technology, Cloud
Computing, Interoperability, and Portability.” Online
Browsing Platform, ISO, Geneva. https://www.iso.org
/obp/ui/#iso:std:iso-iec:19941:ed-1:v1:en.
ITU (International Telecommunication Union). 2014. “Using
Regulatory Impact Analysis to Improve Decision Mak-
ing in the ICT Sector.” ITU, Geneva.
Kirkpatrick, Robert. 2014. “A Big Data Revolution for Sustain-
able Development.” In The Global Compact International
Yearbook 2014, edited by United Nations Global Compact
Office, 33–35. New York: United Nations; Münster,
Germany: macondo publishing.
Kosseff, Jeff. 2019. The Twenty-Six Words That Created the
Internet. Ithaca, NY: Cornell University Press.
Krämer, Jan, Pierre Senellart, and Alexandre de Streel. 2020.
“Making Data Portability More Effective for the Digi-
tal Economy: Economic Implications and Regulatory
Challenges.” Center on Regulation in Europe, Brussels.
https://cerre.eu/publications/report-making-data
-portability-more-effective-digital-economy/.
Légifrance. 2016. “Loi no 2016-1321 du 7 octobre 2016 pour une
République numérique” [Law no. 2016-1321 of October 7, 2016, for a Digital Republic].
Journal officiel de la République
française, October 8, 2016, Légifrance, Direction de l’in-
formation légale et administrative, Paris. https://www
.legifrance.gouv.fr/download/file/SJ9w29KN2wvvWjcmi
PwHr3B0La5rYk6ys5dm_FwTPZs=/JOE_TEXTE.
Lessig, Lawrence. 1999. Code and Other Laws of Cyberspace.
New York: Basic Books.
LOC (Library of Congress, United States). 1998. “H. R. 2281
Digital Millennium Copyright Act: 105th Congress
(1997–1998).” Congress.gov, October 28, 1998, LOC,
Washington, DC. https://www.congress.gov/bill/105th
-congress/house-bill/2281.
LOC (Library of Congress, United States). 2018. “H. R. 4943,
CLOUD Act: 115th Congress (2017–2018).” Congress.gov,
February 6, 2018, LOC, Washington, DC. https://www
.congress.gov/bill/115th-congress/house-bill/4943.
Lubarsky, Boris. 2017. “Re-Identification of ‘Anonymized’
Data.” Georgetown Law Technology Review (April): 202–13.
https://georgetownlawtechreview.org/re-identification
-of-anonymized-data/GLTR-04-2017/.
Lum, Kristian. 2016. “Predictive Policing Reinforces Police
Bias.” HRDAG: Human Rights Data Analysis Group, October
10. http://hrdag.org/2016/10/10/predictive-policing
-reinforces-police-bias/.
MeitY (Ministry of Electronics and Information Technol-
ogy). 2020. “Report by the Committee of Experts on Non-
Personal Data Governance Framework.” 111972/2020/
CL&ES. MeitY, New Delhi. https://www.hunton
privacyblog.com/wp-content/uploads/sites/28/2020/08
/mygov_159453381955063671.pdf.
METI (Ministry of Economy, Trade, and Industry, Japan).
2018. “METI Formulates ‘Contract Guidance on Utiliza-
tion of AI and Data.’” News release, June 15, 2018. https://
www.meti.go.jp/english/press/2018/0615_002.html.
METI (Ministry of Economy, Trade, and Industry, Japan).
2020. “Unfair Competition Prevention Act.” Policy Index.
Intellectual Property Policy Office, METI, Tokyo. https://
www.meti.go.jp/english/policy/economy/chizai/chiteki
/index.html.
MITCI (Ministry of Technology, Communication, and Inno-
vation, Mauritius). 2017. “National Open Data Policy.”
MITCI, Quatre Bornes, Mauritius. https://mitci.govmu
.org/Documents/Strategies/Mauritius%20Open%20Data
%20Policy%20May%202017.pdf.
MoICT (Ministry of Information and Communication Tech-
nology, Jordan). 2017. “Open Government Data Policy.”
MoICT, Amman, Jordan. https://modee.gov.jo/ebv4.0
/root_storage/en/eb_list_page/open_government_data
_policy_2017.pdf.
National Archives, United Kingdom. 2017. “Digital Economy
Act 2017.” legislation.gov.uk, National Archives, London.
https://www.legislation.gov.uk/ukpga/2017/30/contents
/enacted.
National Archives, United Kingdom. 2019. “Guidance on the
Implementation of the Re-use of Public Sector Informa-
tion Regulations 2015: For Public Sector Bodies.” Version
1.1, National Archives, London. https://www.national
archives.gov.uk/documents/information-management
/psi-implementation-guidance-public-sector-bodies.pdf.
National Assembly, Togo. 2020. “Loi Relative à l’Identification
Biométrique des Personnes Physiques au Togo”
[Law on the biometric identification of natural persons
in Togo]. National Assembly, Open Session, September
3. http://www.assemblee-nationale.tg/images/bio
metrie%20loi%20AN.pdf.
Nilsson, Patricia. 2019. “Police Fear Bias in Use of Arti-
ficial Intelligence to Fight Crime.” Financial Times,
September 15, 2019. https://www.ft.com/content
/5753689c-d63e-11e9-a0bd-ab8ec6435630.
NIST (National Institute of Standards and Technology).
2020. “NIST Study Evaluates Effects of Race, Age, Sex
on Face Recognition Software.” News, December 19,
2019, updated May 18, 2020, NIST, US Department of
Commerce, Gaithersburg, MD. https://www.nist.gov
/news-events/news/2019/12/nist-study-evaluates-effects
-race-age-sex-face-recognition-software.
Noble, Safiya Umoja. 2018. Algorithms of Oppression: How Search
Engines Reinforce Racism. New York: NYU Press. https://
nyupress.org/9781479837243/algorithms-of-oppression.
Noveck, Beth Simone. 2017. “Rights-Based and Tech-Driven:
Open Data, Freedom of Information, and the Future
of Government Transparency.” Yale Human Rights and
Development Law Journal 19 (1): article 1. https://digital
commons.law.yale.edu/yhrdlj/vol19/iss1/1.
OECD (Organisation for Economic Co-operation and Devel-
opment). 2013. The OECD Privacy Framework. Paris: OECD.
http://www.oecd.org/sti/ieconomy/oecd_privacy_frame
work.pdf.
OECD (Organisation for Economic Co-operation and Devel-
opment). 2019a. Enhancing Access to and Sharing of Data:
Reconciling Risks and Benefits for Data Re-Use across Societies.
Paris: OECD. https://doi.org/10.1787/276aaca8-en.
OECD (Organisation for Economic Co-operation and Devel-
opment). 2019b. “Licensing of IP Rights and Competition
Law.” Background Note DAF/COMP(2019)3, Competition
Committee, Directorate for Financial and Enterprise
Affairs, OECD, Paris. https://one.oecd.org/document/DAF
/COMP(2019)3/en/pdf.
OECD (Organisation for Economic Co-operation and
Development). 2019c. “Recommendation of the Coun-
cil on Artificial Intelligence.” OECD Legal Instruments,
OECD/LEGAL/0449, adopted on May 22, 2019. https://
legalinstruments.oecd.org/en/instruments/OECD
-LEGAL-0449.
OneTrust DataGuidance and FPF (Future of Privacy Forum).
2019. “Comparing Privacy Laws: GDPR v. CCPA.”
OneTrust, London; FPF, Washington, DC. https://fpf
.org/wp-content/uploads/2019/12/ComparingPrivacy
Laws_GDPR_CCPA.pdf.
Open Knowledge Foundation. 2020. “What Is Open Data?”
Open Data Handbook: Guide. London: Open Knowledge
Foundation. https://opendatahandbook.org/guide/en
/what-is-open-data/.
OVIC (Office of the Victorian Information Commissioner).
2020. “Victorian Protective Data Security Framework,
Version 2.0.” OVIC, Melbourne. https://ovic.vic.gov.au
/wp-content/uploads/2020/02/Victorian-Protective-Data
-Security-Framework-V2.0.pdf.
Parliament of India. 2019. “The Personal Data Protection
Bill, 2019.” Bill No. 373 of 2019, Parliament of India, New
Delhi. https://dataprotectionindia.in/act/.
PC (Productivity Commission, Australia). 2017. Data Avail-
ability and Use. Productivity Commission Inquiry Report
82. Canberra: PC. https://www.pc.gov.au/inquiries
/completed/data-access/report/data-access.pdf.
PCPD (Office of the Privacy Commissioner for Personal
Data, Hong Kong SAR, China). 2012. “Privacy by Design
Conference.” PCPD, Hong Kong SAR, China. https://
www.pcpd.org.hk/pbdconference/index.html.
PMC (Department of the Prime Minister and Cabinet,
Australia). 2019. “Best Practice Guide to Applying Data
Sharing Principles.” PMC, Canberra. https://www.pmc
.gov.au/resource-centre/public-data/data-sharing
-principles.
Potey, Manish M., C. A. Dhote, and Deepak H. Sharma. 2016.
“Homomorphic Encryption for Security of Cloud Data.”
Procedia Computer Science 79 (January): 175–81. https://doi
.org/10.1016/j.procs.2016.03.023.
Ragavan, Srividhya, Brendan Murphy, and Raj Davé. 2016.
“FRAND v. Compulsory Licensing: The Lesser of the Two
Evils.” Duke Law and Technology Review 14 (1): 83–120.
RIA (Information System Authority, Estonia). 2020. “Data
Exchange Layer X-Tee.” RIA, Tallinn, Estonia. https://
www.ria.ee/en/state-information-system/x-tee.html#:
~:text=X%2Dtee%2C%20the%20data%20exchange,data
%20based%20on%20an%20agreement.
Smith, Jack, IV. 2016. “Crime-Prediction Tool May Be Reinforcing
Discriminatory Policing—Business Insider.” Business
Insider, October 10, 2016. https://www.businessinsider
.com/predictive-policing-discriminatory-police-crime
-2016-10?r=UK.
Stats NZ (Statistics New Zealand). 2019. “Algorithm Char-
ter.” Stats NZ, Wellington, New Zealand. https://data
.govt.nz/assets/Uploads/Draft-Algorithm-Charter-for
-consultation.pdf.
Sweeney, Latanya. 2000. “Simple Demographics Often Iden-
tify People Uniquely.” Data Privacy Working Paper 3,
Carnegie Mellon University, Pittsburgh.
Sweeney, Latanya. 2002. “k-Anonymity: A Model for Protect-
ing Privacy.” International Journal of Uncertainty, Fuzziness
and Knowledge-Based Systems 10 (05): 557–70. https://doi
.org/10.1142/S0218488502001648.
Taylor, Linnet, Luciano Floridi, and Bart van der Sloot, eds.
2017. Group Privacy: New Challenges of Data Technologies.
Philosophical Studies Series, vol. 126. Cham, Switzerland:
Springer.
TBS (Treasury Board of Canada Secretariat). 2020. “Respon-
sible Use of Artificial Intelligence (AI).” Canada.ca,
July 28, 2020, TBS, Ottawa. https://www.canada.ca/en
/government/system/digital-government/digital
-government-innovations/responsible-use-ai.html.
TOOP (The Once-Only Principle Project). 2021. “The Once-
Only Principle Project.” Tallinn University of Technology,
Tallinn, Estonia. https://www.toop.eu/about.
Treasury, Australia. 2020. Inquiry into Future Directions for the
Consumer Data Right. Canberra: Treasury. https://treasury
.gov.au/sites/default/files/2020-12/cdrinquiry-accessible
final.pdf.
Ubaldi, Barbara. 2013. “Open Government Data: Towards
Empirical Analysis of Open Government Data Initia-
tives.” OECD Working Paper on Public Governance 22,
Organisation for Economic Co-operation and Development,
Paris. https://doi.org/10.1787/5k46bj4f03s7-en.
UNCITRAL (United Nations Commission on International
Trade Law). 1998. “UNCITRAL Model Law on Electronic
Commerce (1996) with Additional Article 5 bis as Adopted
in 1998.” UNCITRAL, Vienna. https://uncitral.un.org/en
/texts/ecommerce/modellaw/electronic_commerce.
UNCITRAL (United Nations Commission on International
Trade Law). 2001. “UNCITRAL Model Law on Electronic
Signatures (2001).” UNCITRAL, Vienna. https://uncitral
.un.org/en/texts/ecommerce/modellaw/electronic
_signatures.
UNCITRAL (United Nations Commission on International
Trade Law). 2009. Promoting Confidence in Electronic Com-
merce: Legal Issues on International Use of Electronic Authen-
tication and Signature Methods. Vienna: United Nations.
https://www.uncitral.org/pdf/english/texts/electcom/08
-55698_Ebook.pdf.
UNSDG (United Nations Sustainable Development Group).
2017. “Data Privacy, Ethics, and Protection: Guidance
Note on Big Data for Achievement of the 2030 Agenda.”
UNSDG, New York. https://unsdg.un.org/sites/default
/files/UNDG_BigData_final_web.pdf.
Vickery, Graham. 2012. “Review of Recent Studies on PSI
Re-use and Related Market Developments.” Information
Economics, Paris.
Ville de Nantes (City of Nantes, France). 2019. “Charte mét-
ropolitaine de la donnée” [Metropolitan data charter].
Nantes Métropole, May 2019, Ville de Nantes, France.
https://metropole.nantes.fr/charte-donnee.
Waze. 2018. “Waze Celebrates 600 Connected Citizens
Program Partners.” Waze, Google, Mountain View, CA.
https://medium.com/waze/waze-celebrates-600
-connected-citizens-program-partners-36945fbceb66.
WEF (World Economic Forum). 2011. “Personal Data: The
Emergence of a New Asset Class.” In collaboration with
Bain & Company, Inc., WEF, Geneva. http://www3.we
forum.org/docs/WEF_ITTC_PersonalDataNewAsset
_Report_2011.pdf.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aal-
bersberg, Gabrielle Appleton, Myles Axton, Arie Baak,
Niklas Blomberg, et al. 2016. “The FAIR Guiding Princi-
ples for Scientific Data Management and Stewardship.”
Scientific Data 3 (March 15): 160018. https://doi.org/10.1038
/sdata.2016.18.
World Bank. 2018. “Global Indicators of Regulatory Gov-
ernance: Worldwide Practices of Regulatory Impact
Assessments.” World Bank, Washington, DC. http://docu
ments1.worldbank.org/curated/en/905611520284525814
/Global-Indicators-of-Regulatory-Governance-World
wide-Practices-of-Regulatory-Impact-Assessments.pdf.
World Bank. 2019. ID4D Practitioner’s Guide: Version 1.0.
October 2019. Washington, DC: World Bank. https://
documents.worldbank.org/en/publication/documents
-reports/documentdetail/248371559325561562/id4d
-practitioner-s-guide.
World Bank and United Nations. 2017. Combatting Cyber-
crime: Tools and Capacity Building for Emerging Economies.
Washington, DC: World Bank. http://documents.world
bank.org/curated/en/355401535144740611/Combatting
-Cybercrime-Tools-and-Capacity-Building-for-Emerging
-Economies.
World Wide Web Foundation. 2017. Open Data Barometer:
Global Report, 4th ed. Washington, DC: World Wide Web
Foundation. https://opendatabarometer.org/doc/4th
Edition/ODB-4thEdition-GlobalReport.pdf.
Zanfir-Fortuna, Gabriela, and Sasha Hondagneu-Messner.
2019. “CPDP 2019 Panel: Understanding the Limits
and Benefits of Data Portability.” Future of Privacy
Forum, 2019 Computers, Privacy, and Data Protection
Conference, Brussels, February 26, 2019. https://fpf.org
/2019/02/26/cpdp-2019-panel-understanding-the-limits
-and-benefits-of-data-portability/.
Zhang, Gil, and Kate Yin. 2020. “A Look at China’s Draft
of Personal Data Protection Law.” PrivacyTracker, Interna-
tional Association of Privacy Professionals, Portsmouth,
NH. https://iapp.org/news/a/a-look-at-chinas-draft-of
-personal-data-protection-law/.
The need for immediate and reliable information about COVID-19 has
tested the systems in place for protecting data.
Spotlight 6.1
The evolving social contract on data: Balancing data sharing and data protection to facilitate contact tracing to control COVID-19
International and national laws recognize that, in extraordinary circumstances, certain fundamental rights, including the right to data protection, may be restricted, provided that basic democratic principles and safeguards are ensured and that the restriction is legitimate, time limited, and not arbitrary.¹
Following the World Health Organization’s declaration of a global pandemic in mid-March 2020,² governments around the world adopted contact tracing strategies to track down anyone who might have come into contact with an infected person, so that those contacts could be quarantined to prevent further spread of the disease.³ Such contact tracing has historically been carried out manually by public health authorities.⁴ Digital technologies such as mobile applications, however, can carry it out far more efficiently and at a massive scale, while simultaneously delivering public health advice.
Despite these benefits, digital contact tracing raises several concerns. First, tools that rely on location tracking may be construed as unwarranted surveillance and a threat to privacy, especially in jurisdictions with inadequate data protection frameworks⁵ and because location data are hard to anonymize fully. Second, personal data collected for contact tracing currently flow beyond trusted parties and organizations, reaching more third parties than current governance models account for. Third, there is evidence that using geographic location for contact tracing may be inaccurate and inefficient because location alone does not capture all of the relevant facts. An empirical study of the Ebola outbreak found that such data are meaningful only when reidentified, which raises questions under the “purpose limitation” principle found in good-practice data protection laws.
While countries around the world have been
developing contact tracing apps, two approaches
have emerged: centralized and decentralized. Both
approaches use Bluetooth signals to log when smart-
phone owners are in proximity to one another, send-
ing alerts to users who may have been infected when
someone develops COVID-19 symptoms.
Under the centralized model originally pursued by
the UK government, anonymized data are gathered
and uploaded to a remote server, where matches are
made with other contacts when a person starts to
experience COVID-19 symptoms.6
The United King-
dom’s proposed approach contains a persistent iden-
tifier that is shared with the National Health Service,
allowing public authorities to receive infection data
automatically. The central server then alerts other
app users who have had significant contact with the
infected person. Despite the public health merits of
the centralized approach, the application was abandoned
in mid-June 2020 in favor of a decentralized
approach because, during its testing phase on the Isle
of Wight, it recognized too few nearby phones.
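The centralized model described above can be illustrated with a deliberately simplified Python sketch. The class and method names (`CentralServer`, `register_phone`, `report_symptoms`) are hypothetical, and the sketch leaves out the encryption and "significant contact" risk scoring a real deployment would need; it is not a reconstruction of the UK app. It shows only the structural point: each phone has a persistent identifier known to the server, and the server, not the phone, performs the matching.

```python
import secrets


class CentralServer:
    """Hypothetical health-authority server in a centralized design."""

    def __init__(self) -> None:
        self.registered: set[str] = set()
        self.contact_logs: dict[str, set[str]] = {}  # visible to the authority

    def register_phone(self) -> str:
        # A persistent identifier, issued once and known to the server.
        phone_id = secrets.token_hex(8)
        self.registered.add(phone_id)
        return phone_id

    def report_symptoms(self, reporter: str, contact_log: set[str]) -> set[str]:
        # The symptomatic user's contact log is uploaded, and the server
        # (not the phone) decides which registered users to alert.
        self.contact_logs[reporter] = set(contact_log)
        return {pid for pid in contact_log if pid in self.registered}


server = CentralServer()
a, b, c = (server.register_phone() for _ in range(3))
alerted = server.report_symptoms(a, {b})  # a met b, but not c
print(b in alerted, c in alerted)  # True False
print(a in server.contact_logs)    # True: the server now holds a's contact history
```

Once `report_symptoms` runs, the authority holds the reporter's contact history, which is the data flow the decentralized model is designed to avoid.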
In contrast, the decentralized model, promoted
jointly by Apple and Google, aims to support contact
tracing by health agencies, while integrating privacy
and security into the design.7
Users have more control
over their information because it is stored in a decen-
tralized manner on their phones, preventing the
siphoning of data into central government servers. In
this model, “The protocol excludes processing of any
location data—unless the user opts in—applies ‘Roll-
ing Proximity Identifiers’ that prevent identification
of the user, processes proximity identifiers obtained
from other devices exclusively on the device, [and]
permits only users to decide whether to contribute
to contact tracing by sharing Diagnosis Keys with
the ‘Diagnosis Server’ if diagnosed with COVID-19,
resulting in the alert to other users.”8
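The decentralized design quoted above can be illustrated with a minimal Python sketch, written under loudly stated assumptions. The names `Phone` and `rolling_id` are hypothetical. HMAC-SHA256 stands in for the key-derivation and encryption steps of the actual Apple-Google specification. A single daily key with 144 ten-minute intervals is also assumed. The sketch illustrates the idea rather than reproducing the protocol: each phone broadcasts rotating identifiers, keeps what it hears on the device, and checks published Diagnosis Keys locally.

```python
import hashlib
import hmac
import secrets

INTERVALS_PER_DAY = 144  # assumption: a new identifier every 10 minutes


def rolling_id(daily_key: bytes, interval: int) -> bytes:
    """Derive a short-lived identifier from a secret daily key.

    Simplification: HMAC-SHA256 replaces the derivation used in the real spec.
    """
    message = interval.to_bytes(4, "little")
    return hmac.new(daily_key, message, hashlib.sha256).digest()[:16]


class Phone:
    """Hypothetical device: contact data stay on the phone by default."""

    def __init__(self) -> None:
        self.daily_key = secrets.token_bytes(16)  # kept on the device
        self.heard: set[bytes] = set()  # identifiers received over Bluetooth

    def broadcast(self, interval: int) -> bytes:
        return rolling_id(self.daily_key, interval)

    def observe(self, identifier: bytes) -> None:
        self.heard.add(identifier)

    def diagnosis_key(self) -> bytes:
        # Shared only if the user is diagnosed AND chooses to contribute.
        return self.daily_key

    def was_exposed(self, published_keys: list[bytes]) -> bool:
        # Matching runs on the device: regenerate the identifiers each
        # published key produced and compare them with those heard locally.
        return any(
            rolling_id(key, interval) in self.heard
            for key in published_keys
            for interval in range(INTERVALS_PER_DAY)
        )


# Usage: Alice and Bob are near each other in interval 42; Carol is not.
alice, bob, carol = Phone(), Phone(), Phone()
bob.observe(alice.broadcast(42))

diagnosis_server: list[bytes] = []  # stores only voluntarily shared keys
diagnosis_server.append(alice.diagnosis_key())  # Alice tests positive, opts in

print(bob.was_exposed(diagnosis_server))    # True
print(carol.was_exposed(diagnosis_server))  # False
```

Because only the keys of users who choose to report a diagnosis ever reach the server, the server in this sketch learns neither who met whom nor who was alerted.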
A multistakeholder consortium, the Pan-European
Privacy-Preserving Proximity Tracing (PEPP-PT)
community, is developing contact tracing technolo-
gies that adhere to strong European privacy and data
protection laws and principles.9
The PEPP-PT technical
mechanisms and standards are designed to protect privacy fully,
while taking advantage of the possibilities of digital
technology to maximize the speed and real-time
capability of national pandemic responses. Initiatives
under the umbrella of PEPP-PT aim to develop an
open protocol for COVID-19 proximity tracing using
Bluetooth Low Energy on mobile devices and an
architecture to ensure that personal data stay entirely
on an individual’s phone.
Notes
1. Access Now (2020).
2. WHO (2020).
3. Yan (2020).
4. eHealth Network (2020).
5. FPF (2020).
6. Economist (2020).
7. Google (2020); Sabbagh and Hern (2020).
8. eHealth Network (2020).
9. See Pan-European Privacy-Preserving Proximity Tracing
(dashboard), n.d.
References
Access Now. 2020. "Recommendations on Privacy and Data Protection in the Fight against COVID-19." Access Now, Brooklyn, NY, March 2020. https://www.accessnow.org/cms/assets/uploads/2020/03/Access-Now-recommendations-on-Covid-and-data-protection-and-privacy.pdf.
Economist. 2020. "Privacy Be Damned: Some Countries Want Central Databases for Contact-Tracing Apps." April 30, 2020. https://www.economist.com/europe/2020/04/30/some-countries-want-central-databases-for-contact-tracing-apps.
eHealth Network. 2020. "Mobile Applications to Support Contact Tracing in the EU's Fight against COVID-19: Common EU Toolbox for Member States." Version 1.0, eHealth Network, Brussels, April 15, 2020. https://ec.europa.eu/health/sites/health/files/ehealth/docs/covid-19_apps_en.pdf.
FPF (Future of Privacy Forum). 2020. "Privacy & Pandemics: The Role of Mobile Apps (Chart)." FPF, Washington, DC, April 2020. https://fpf.org/wp-content/uploads/2020/04/editPrivacy-Pandemics_-The-Role-of-Mobile-Apps-Chart-11.pdf.
Google. 2020. "Apple and Google Partner on COVID-19 Contact Tracing Technology." Company Announcements (blog), April 10, 2020. https://blog.google/inside-google/company-announcements/apple-and-google-partner-covid-19-contact-tracing-technology/.
Pan-European Privacy-Preserving Proximity Tracing (dashboard). n.d. "PEPP-PT." GitHub. Accessed December 15, 2020. https://github.com/pepp-pt.
Sabbagh, Dan, and Alex Hern. 2020. "UK Abandons Contact-Tracing App for Apple and Google Model." Guardian, June 18, 2020. https://www.theguardian.com/world/2020/jun/18/uk-poised-to-abandon-coronavirus-app-in-favour-of-apple-and-google-models.
WHO (World Health Organization). 2020. "WHO Announces COVID-19 Outbreak a Pandemic." Media Release, March 12, 2020. WHO Regional Office for Europe, Copenhagen. https://www.euro.who.int/en/health-topics/health-emergencies/coronavirus-covid-19/news/news/2020/3/who-announces-covid-19-outbreak-a-pandemic.
Yan, Holly. 2020. "Contact Tracing 101: How It Works, Who Could Get Hired, and Why It's So Critical in Fighting Coronavirus Now." CNN Health (blog), May 15, 2020. https://www.cnn.com/2020/04/27/health/contact-tracing-explainer-coronavirus/index.html.
Personal data “ownership” is incompatible with a rights-based approach
to personal data protection.
Spotlight 6.2
The debate over ownership of
personal data
The commercial value of personal data has prompted
arguments that individuals should be allowed to
commercialize their own data.1
Granting individuals
ownership rights over personal data is sometimes
proposed to address data inequalities and to deter-
mine how such data can be used and by whom.2
Yet
“ownership” neither addresses these inequalities nor
empowers individuals to control the use of data. Per-
sonal data “ownership” makes sense only if personal
data are considered an “asset” with associated prop-
erty rights.3
If personal data are property, they can be
used as collateral and for commercial exchange, with
the potential implication that individuals could even
trade away the data that contain their digital identity.
Some scholars suggest that concepts of property
rights should apply to personal data.4
Others suggest
that market-based solutions should be used to protect
data,5 an approach sometimes called the "personal data economy."6
The eco-
nomic literature is mixed on whether data ownership
rights could solve market failures or improve social
outcomes.7
Some suggest that the optimal distribu-
tion of ownership rights would depend on factors
including the investment required to create the data8
and the ability to monetize data.9
A regime based on
property rights would likely increase the transaction
costs involved in data sharing, by requiring negotia-
tion of the terms of sale and use.
Ascribing data ownership rights to personal data
also poses legal challenges. First, personal data often
involve overlapping interests of different parties.10
These interests are present in the collection, creation,
and use of the data.11
If ownership were allocated to
the “party with the clearest interest or who could
make the most value out of it,”12
it would be practi-
cally difficult to identify the party or parties meeting
this definition.13
It is also unclear how to compensate
interested third parties if their rights are breached
through downstream data uses.14
Creating a data
ownership right would require elaborating “neces-
sary user” rights and rules to accommodate the public
interest needs of such data,15
such as those raised in
the COVID-19 pandemic response.
Second, “owning” personal data might incentivize
poor and more vulnerable people to sell their personal
data, exacerbating existing inequities. Under a rights-based
approach to personal data protection, individuals
have fundamental rights regarding their data. Perhaps
ironically, these rights—more than “ownership”—give
individuals control over their data, enabling them to
negotiate the use of these data.16
These immutable
rights, like due process under law, cannot be bargained
away like chattel. Nor does current case law
support ownership rights over personal data.
Notes
1. Start-ups providing personal data management services
to internet users have appeared, ranging from compa-
nies that compensate users for their personal informa-
tion to those that require users to pay fees to avoid the
use of their personal information (Elvy 2017).
2. This spotlight deals only with issues concerning "ownership." Other theories include treating personal data as labor (see Posner and Weyl 2018, who posit that the individual's role in creating the data should be recognized and compensated as labor) or allowing personal data to be shared through licensing arrangements (see Savona 2019, who suggests that data could be recognized as a licensable asset owned by the individual who generates it); see also Fisher and Streinz 2021. A related aspect of this debate revolves around expanding the types of data available to creditors and other decision makers beyond traditional data, such as payments on loans. These nontraditional types of data could include utility payments, cash flow, and social media data. The reliability of such data and the ability to access and dispute the information are important issues. The analytics applied to such data may also fall in the nonpersonal category. Ownership of nonpersonal data, by contrast, is a more straightforward issue of intellectual property rights, which is addressed in chapter 6.
3. Castells (2010); Zuboff (2019).
4. Laudon (1996); Samuelson (2000).
5. See, for example, Carrascal et al. (2013) and Kerber (2016).
6. See, for example, Haupt (2016).
7. Duch-Brown, Martens, and Mueller-Langer (2017).
8. Tirole (2017); Zech (2016).
9. Dosis and Sand-Zantman (2019).
10. Scassa (2017).
11. Scassa (2017).
12. Wiebe (2016, 880).
13. See, for example, Farkas (2017).
14. Viljoen (2020).
15. Scassa (2017).
16. Scassa (2017).
References
Carrascal, Juan Pablo, Christopher Riederer, Vijay Erramilli, Mauro Cherubini, and Rodrigo de Oliveira. 2013. "Your Browsing Behavior for a Big Mac: Economics of Personal Information Online." In WWW '13: Proceedings of the 22nd International Conference on World Wide Web, 189–200. New York: Association for Computing Machinery. http://jpcarrascal.com/docs/publications/WWW2013-Browsing_behavior_big_mac.pdf.
Castells, Manuel. 2010. The Information Age: Economy, Society, and Culture. Vol 1: The Rise of the Network Society, 2d ed. Malden, MA: Wiley-Blackwell.
Dosis, Anastasios, and Wilfried Sand-Zantman. 2019. "The Ownership of Data." TSE Working Paper 19-1025, Toulouse School of Economics, University of Toulouse, Toulouse, France, July 2019.
Duch-Brown, Nestor, Bertin Martens, and Frank Mueller-Langer. 2017. "The Economics of Ownership, Access, and Trade in Digital Data." JRC Digital Economy Working Paper 2017–01, Joint Research Center, European Commission, Seville, Spain.
Elvy, Stacy-Ann. 2017. "Paying for Privacy and the Personal Data Economy." Columbia Law Review 117 (6): 1369–459.
Farkas, Thomas J. 2017. "Data Created by the Internet of Things: The New Gold without Ownership?" Revista La Propiedad Inmaterial 23 (June): 5–17. https://revistas.uexternado.edu.co/index.php/propin/article/view/4975.
Fisher, Angelina, and Thomas Streinz. 2021. "Confronting Data Inequality." WDR 2021 background paper, World Bank, Washington, DC. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3825724.
Haupt, Michael. 2016. "Introducing Personal Data Exchanges and the Personal Data Economy." #ExitTheSystem (blog), December 7, 2016. https://medium.com/project-2030/what-is-a-personal-data-exchange-256bcd5bf447.
Kerber, Wolfgang. 2016. "Digital Markets, Data, and Privacy: Competition Law, Consumer Law, and Data Protection." MACIE Paper 2016/3, Marburg Centre for Institutional Economics, School of Business and Economics, Philipps-University Marburg, Marburg, Germany.
Laudon, Kenneth C. 1996. "Markets and Privacy." Communications of the ACM 39 (9): 92–104. https://doi.org/10.1145/234215.234476.
Posner, Eric A., and E. Glen Weyl. 2018. Radical Markets: Uprooting Capitalism and Democracy for a Just Society. Princeton, NJ: Princeton University Press.
Samuelson, Pamela. 2000. "Privacy as Intellectual Property?" Stanford Law Review 52 (5): 1125–73. https://doi.org/10.2307/1229511.
Savona, Maria. 2019. "The Value of Data: Towards a Framework to Redistribute It." SPRU Working Paper SWPS 2019-21, Science Policy Research Unit, Business School, University of Sussex, Brighton, UK, October 2019.
Scassa, Teresa. 2017. "Sharing Data in the Platform Economy: A Public Interest Argument for Access to Platform Data." UBC Law Review 54 (4): 1017–71.
Tirole, Jean. 2017. Economics for the Common Good. Princeton, NJ: Princeton University Press. https://press.princeton.edu/books/hardcover/9780691175164/economics-for-the-common-good.
Viljoen, Salomé. 2020. "Data as Property." Phenomenal World, October 16, 2020. https://phenomenalworld.org/analysis/data-as-property.
Wiebe, Andreas. 2016. "Protection of Industrial Data: A New Property Right for the Digital Economy?" GRUR Int 10/2016 (October): 877–83. http://www.grur.org/uploads/media/GRURInt_2016_10_Inhalt_fertig.pdf.
Zech, Herbert. 2016. "Data as a Tradeable Commodity." In European Contract Law and the Digital Single Market: The Implications of the Digital Revolution, edited by Alberto De Franceschi, 51–80. Cambridge, UK: Intersentia. https://doi.org/10.1017/9781780685212.004.
Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: PublicAffairs.
Creating value in the data economy: The role of competition, trade, and tax policy | 227
Chapter 7
Creating value in the data economy: The role of competition, trade, and tax policy
Main messages
1. The expanding role of data in ubiquitous platform business models is
reshaping competition, trade, and taxation in the real economy, posing
important risks for low- and middle-income countries.
2. The way countries design safeguards and enablers for data will have
knock-on effects for the real economy. For example, enabling data
sharing among market players can play a valuable role in promoting
competition. At the same time, the stringency of data safeguards will
shape cross-border trading patterns for data-enabled services. Meanwhile,
the intangible nature of digital value chains is posing major challenges for
tax revenue mobilization.
3. Low- and middle-income countries too often lack the institutional
capacity to manage the economic policy challenges posed by the data-driven
economy. These challenges call for agile competition policies and
modern trade and tax administrations. Complicating matters, policies on
competition, trade, and taxation are significantly intertwined.
4. Internationally coordinated action (on antitrust enforcement, regulation
of platform firms, data standards, trade agreements, and tax policy)
is critical to ensuring efficient, equitable policies for the data economy
that respond to countries' needs and interests.
Shaping data regulation to
support competition, trade, and
taxation
Rapid technological innovation and the associated
explosive production of data are reshaping
the business landscape (see chapter 3).
New data-driven businesses—with their intangible
assets such as data and algorithms—are rapidly
gaining ground in markets worldwide. These include
the global players that often make headlines, such
as Alibaba, Amazon, Facebook, and Alphabet (Google),
as well as more local platform businesses in lower-
income countries, such as the Nigerian agricultural
platform Hello Tractor, which matches smallholder
farmers with underutilized tractors, and the Egyptian
educational platform Tutorama, which matches stu-
dents to high-quality tutors.
Data-driven businesses exhibit idiosyncrasies that
distinguish them from more traditional firms—nota-
bly, their greater propensity for market dominance
in some cases, their ability to achieve scale without
mass, and the intangibility of their transactions (fig-
ure 7.1). Each of these characteristics poses important
challenges for economic outcomes in competition,
trade, and taxation, potentially offsetting some of
their positive effects.
Using data as an input to the production process
can give businesses a competitive advantage, which,
because of economies of scale and scope associated
with data and the strong network effects arising from
platforms, enables them to entrench their market
position and potentially exercise market power. Data-
driven markets tend to exhibit economies of scale
because of the large upfront fixed cost of technical
infrastructure and the advantage large datasets offer
for learning. By accumulating tremendous amounts
of data through transactions and applying algo-
rithms, businesses are able to provide their clients
with customized services and products. For example,
e-commerce platforms tailor product suggestions to
their clients' shopping history, thereby reinforcing the
tendency of customers to stick with such platforms
because the platforms "understand" them better.
Figure 7.1 The unusual characteristics of data-driven businesses pose complex
challenges for policy makers in the areas of competition, trade, and taxation
Data as a factor of production. Characteristics of data-driven businesses: propensity for market dominance; scale without mass; intangibility of transactions.
Competition policy
• Adapt antitrust rules to challenges posed by data-driven businesses
• Adopt pro-competition data remedies subject to safeguards for innovation
• Ensure that traditional regulations promote competition and consumer choice
Tax policy
• Create a level playing field between digital and analog businesses
• Adapt indirect tax administration to platform businesses
• Adapt direct tax rules to account for intangible value drivers
Trade policy
• Establish regulatory framework for cross-border data flows
• Balance individual rights, implementation costs, and business opportunities
• Adopt international (trade) agreements for data
Linkages between competition between firms and trade between countries: entry of new firms and competition dynamics drive ability to export; trade in digital services can drive competition in importing countries.
Source: WDR 2021 team.
Positive network effects further strengthen market
positions because third-party suppliers prefer to
display their products on more highly frequented
platforms where the odds are higher they will make a
sale. Customers, in turn, are more likely to visit such
platforms because of the wider choice of third-party
suppliers.
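The fixed-cost argument for economies of scale made earlier in this section can be made concrete with a stylized average-cost formula. This is an illustrative sketch only: F (a one-time infrastructure cost) and c (a constant cost per additional user served) are assumed symbols, not figures from this report, and the formula does not capture the learning advantage of large datasets.

```latex
\[
\mathrm{AC}(q) = \frac{F + c\,q}{q} = \frac{F}{q} + c,
\qquad
\frac{d\,\mathrm{AC}}{dq} = -\frac{F}{q^{2}} < 0
\]
```

With hypothetical values F = 100 million and c = 1 (in any currency unit), average cost is 101 per user at q = 1 million users but only 2 per user at q = 100 million users. A larger platform therefore faces lower average costs than a smaller rival with the same cost structure.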
Because data-driven businesses rely on value
drivers such as data and algorithms that can be
infinitely replicated and instantly distributed around
the world, they are able to achieve scale without mass.
A single firm can thus be a major player in a market without
having any brick-and-mortar facilities, local employees,
or even domestic business registration. For
example, Facebook has more than 2.6 billion monthly
active users spread over nearly every country
in the world, but it maintains office locations in just
35 countries.1 Similarly, AliExpress, a China-based
e-commerce platform launched in 2010, now has
users in more than 230 countries and regions.2
It, too,
relies on data and algorithms rather than offline retail
stores. Meanwhile, the cross-border trade in digital
services is climbing. These services, largely intan-
gible, include social media, big data analytics, cloud
computing, and online services.
The changing nature of data-driven businesses
challenges traditional economic policies, calling for
adaptation and paradigm shifts (figure 7.1). In compe-
tition policy, it is important to address the tendency
of data-driven businesses to tip toward concentrated
market structures and entrenched market power.
However, the complex and novel nature of data-driven
business models can pose difficulties for traditional
antitrust regulation in all countries. At the same time,
the rapid growth of international trade in data-driven
services is raising tensions between the need for data
to flow across borders and the need to ensure that any
personal data involved in such trade are adequately
protected as they move into other jurisdictions. Tax-
ation authorities are struggling to value, map, and
track digital value chains in the data-driven sector.
Administration of the value added tax (VAT) is com-
plicated by third-party vendors operating through
platform businesses. Meanwhile, the prevalence of
scale without mass is challenging the long-standing
taxation principle of permanent establishment and
physical presence as a basis for establishing corporate
tax liability.
Domestic policies relating to competition, trade,
and taxation are significantly intertwined, requiring
a coordinated approach to policy reform. Effective
antitrust policies strengthen competition in the
domestic market, thereby enhancing competitiveness
in international trade. Governance frameworks that
support trade in data-enabled goods and services may
at the same time intensify competitive pressures in
the local economy. Nonetheless, for platform busi-
nesses, trade openness could lead instead to greater
market concentration as the dominance of global
firms is further reinforced by access to new markets
and associated data. At the same time, the inability to
tax platform businesses effectively may undermine
competition between local and foreign firms, as well
as between digital and analog businesses operating in
the same sectors, affecting both competition in domestic
markets and competitiveness in global markets.
All three of these policy areas call for internation-
ally coordinated action. Antitrust measures in one
country can affect the fortunes of globally active
firms, with spillover effects for other jurisdictions.
Trade agreements, as well as global harmonization
of technical standards for data sharing, play an essen-
tial role in promoting international commerce for
data-enabled goods and services. International tax
policy regimes determine the allocation of taxation
rights across countries, while cross-border coopera-
tion on tax administration can help ensure that rev-
enues are fully captured.
This chapter discusses the policy challenges and
responses arising from competition, trade, and tax-
ation. A central theme is how these policies interact
with the broader legal and regulatory framework on
data safeguards and enablers described in chapter 6.
Policy makers should heed the two-way linkages
between economic outcomes and data regulation and
carefully weigh the trade-offs that may arise between
safeguarding and enabling data sharing versus
advancing wider economic goals such as productivity,
competitiveness, and growth.
Competition policy
The rise of data-driven businesses can drive pro-
development market opportunities, but data can also
give firms a competitive advantage that may push
markets into entrenched concentration and market
power. This advantage increases the risk of excluding
smaller firms and entrepreneurs and exploiting indi-
vidual users of data (chapter 3). The key challenge for
policy makers is to preserve the positive externalities
that create value in data-driven markets, while ensur-
ing that these externalities can be harnessed by all
players in a competitive, vibrant ecosystem without
violating the rights of individuals.
This chapter focuses on data-driven platform
firms because of their pertinence across economies
at different levels of development. The complex inter-
actions between data protection concerns (outlined
in chapter 6) and competition concerns (described in
this chapter) are further explored in spotlight 7.1.
Governments have two complementary compe-
tition policy tools to safeguard against the risks of
excluding smaller firms and exploiting individual
users of data. The first tool is enforcement of antitrust
laws, with adaptations to the context of data-driven
businesses. Enforcement involves detecting and
punishing anticompetitive practices (in which a firm
abuses its dominant position or a group of firms
enters into an anticompetitive agreement) or pre-
venting anticompetitive mergers. The second tool is
the design of regulations to allow data-driven firms to
enter markets and compete on a level playing field,
while also protecting users. These ex ante policies lay
down ground rules for the market to promote compe-
tition and could include data governance regulations,
regulations directed at large data-driven platforms,
and traditional sector regulations. Because these two
competition policy tools complement one another,
they can be applied in parallel, depending on the insti-
tutional setting and the issues to be tackled.
Enforcing antitrust laws
Antitrust investigations in the data economy are
not just a developed-country phenomenon. Worldwide,
as of January 2020, some 102 antitrust cases
across 16 different sectors on abuse of dominance,
anticompetitive agreements, and mergers had been
finalized.3
European antitrust authorities have final-
ized the most cases (33 percent), followed by authori-
ties in East Asia and the Pacific (18 percent) and Latin
America (15 percent). The most active lower-middle-
income countries included the Arab Republic of
Egypt, India, and Kenya, with more cases still under
investigation in Indonesia, Nigeria, and Zimbabwe.
Low-income jurisdictions had not yet finalized any
antitrust cases involving digital platforms, likely
reflecting the absence of functional or well-resourced
antitrust regimes and lower policy prioritization.4
Among landmark cases from the middle-income
country group is the 2018 investigation by the Com-
petition Commission of India finding that Google
abused its dominant position in web search and
advertising. Google favored its own services and
partners through manual manipulation of its search
algorithm, thereby putting smaller businesses at
a disadvantage.5
In another salient example, Mex-
ico’s Federal Economic Competition Commission
(COFECE) blocked Walmart’s proposed acquisition
of the Cornershop app in 2019, because the new company
would have access to data on the sales of competing
retailers through the Cornershop platform,
which COFECE considered likely to harm smaller rivals.6
Antitrust cases related to the digital economy in
e-commerce, passenger transport, and operating
system application development account for more
than half of cases globally (figure 7.2). E-commerce
cases account for about 20 percent of total cases sampled
and are equally prevalent in middle- and high-income
jurisdictions. Cases related to the passenger
transport sector are the most prevalent in middle-income
countries, where they make up one-third
of all cases, reflecting in part a wave of mergers
between ride-hailing firms in middle-income countries
in recent years.
Figure 7.2 In the digital economy, antitrust cases related to passenger transport are more
prevalent in middle-income countries than in high-income countries
a. High-income countries (% of total cases)
Retail e-commerce: 21
Online search and advertising: 15
Software and operating systems: 15
Others: 15
Tourism (accommodation): 13
Social media and communications: 10
Passenger transport: 8
Online real estate platforms: 5
b. Middle-income countries (% of total cases)
Passenger transport: 35
Retail e-commerce: 23
Software and operating systems: 18
Others: 10
Mobile financial services: 8
Online search and advertising: 8
Source: Nyman and Barajas, forthcoming. Data at http://bit.do/WDR2021-Fig-7_2.
Note: Percentages are based on information for all antitrust cases involving digital platforms globally for which information was publicly available as of January 2020. Cases total 62 in high-income countries and 40 in middle-income countries. Panel a: "Others" includes digital music, e-books, educational materials, food delivery, online comparison platforms, dating platforms, and ticketing. Panel b: "Others" includes online delivery services, ticketing, and tourism. No cases were finalized in low-income countries, according to publicly available information.
Abuse of dominance cases are more prevalent in
middle-income countries, while cases on anticompet-
itive agreements are more common in high-income
countries. The type of anticompetitive behavior
by data-driven platforms in different sectors may
depend on the nature of their business models. In
the e-commerce and tourism sectors, cases of vertical
restraints (agreements between firms at different lev-
els of the value chain that constrain competition) pre-
dominate, potentially reflecting the reliance on small
businesses to provide products and capacity in these
sectors (figure 7.3). In online search and advertising
and software and operating systems, cases of abuse
of dominance are more common, likely because of
their reliance on self-preferencing algorithms. In
passenger transport, collusion cases (agreements to
fix market parameters between firms at the same
level of the value chain) have been the most frequent,
which could stem from the scope for applying pricing
algorithms in this sector.7
Adapting antitrust tools. New market dynamics
arising from data-driven markets have spurred
policy makers to rethink their approaches to antitrust
enforcement, with jurisdictions around the world
devising new strategies and articulating new guid-
ance.8
Debate and refinement continue, particularly
when it comes to issues such as defining “relevant
markets” and determining “dominance.” For example,
Kenya's competition authority recently published new
Market Definition Guidelines to capture trends in
data-driven markets. Such guidelines can be an effec-
tive way for authorities to begin to tackle these issues,
while providing clarity to firms on the approach that
will be taken to regulate their conduct and on the fac-
tors that will be assessed in antitrust cases.
The complexity of the data-driven economy for
conventional antitrust reflects several departures
from traditional markets. Salient challenges include
how to assess consumer harm in markets in which
goods and services are nominally provided for “free,”
how to address collusive algorithms (see chapter 3),
and how to account for the nonprice dimensions of
competition such as privacy.
The multisided nature of data-driven platforms
also means that interactions among groups of users
(including advertisers, in some cases) complicate the
definition of markets and raise the possibility of
cross-subsidization across different sides of the platform.
This cross-subsidization includes advertising
revenues effectively covering the cost of nominally
"free" services provided to platform users. It is precisely
the user data collected and processed by these
platforms that make advertising on them so valuable,
because the data allow better targeting (see chapter 3).
Figure 7.3 Among anticompetitive practices, abuse of dominance is more widespread
worldwide across multiple sectors of the digital economy
[Chart: number of antitrust cases by sector, broken down by type of anticompetitive practice (abuse of dominance, collusion, vertical restraints). Sectors shown: e-commerce; passenger transport; software and operating systems; online search and advertising; tourism (accommodation); online real estate platforms; e-books; social media and communications services; mobile financial services; ticketing; online delivery service.]
Source: Nyman and Barajas, forthcoming. Data at http://bit.do/WDR2021-Fig-7_3.
Note: Based on information for all antitrust cases involving digital platforms globally for which information was publicly available as of January 2020.
The multisided nature of platforms also means
that, in addition to selling its own services, a plat-
form acts as a buyer of services or even as a de facto
employer of individuals in the gig economy such as
drivers, household workers, or professionals who
provide services. The potential imbalance in bargain-
ing power between data-driven platform firms and
individuals or small suppliers warrants the attention
of policy makers. For example, in 2020 South Africa
published regulations on protecting against abuse of
buyer power that explicitly acknowledged the poten-
tial for such situations to arise in e-commerce mar-
kets and gig economy services.9
Some competition
authorities have also begun to consider adopting rules
against abuse of a superior bargaining position.10
Platform firms typically exist in a digital eco-
system, where providers of complementary digital
products interconnect and regularly exchange data
to provide consumer products (such as the use
of Google Maps by digital transport apps). To the
extent that these complementor firms may also act
as nascent competitors with larger platform firms,
the effect of competition restrictions on these com-
plementary products is an important consideration
beyond the direct effect on the users of a platform.
The scope for platforms to acquire potential
competitors in complementary markets before
they can become a competitive threat—and either
shut them down or prevent further development of
their products—has also become a topic of debate.11
The potential for firms to engage in such “killer” or
“zombie” acquisitions may merit consideration in
merger reviews. Likewise, authorities should be
increasingly alert to the possible harm to competition
and innovation from mergers driven by the desire to
acquire new data or data-relevant intellectual prop-
erty such as algorithms.12
Under traditional antitrust regimes, mergers
involving data-driven firms may be less likely to
trigger a review by the antitrust authority because
such firms typically do not have sufficient tangible
assets or revenues to meet the traditional thresholds
for merger notification.13
Although the urgency of
these concerns for developing countries will depend
on the start-up environment in a country, thresholds
for merger notification could be revamped to allow
antitrust authorities to review potentially anticom-
petitive mergers involving data-driven firms that
may appear small but could rapidly become market
challengers through exponential growth. Austria,
Germany, and Japan have already taken this step by
adopting complementary thresholds for
digital markets based on transaction values. Another
option for triggering reviews would be to require
dominant firms to notify the authority of any planned
acquisition, or to shift presumptions for future
mergers so that an acquisition by a dominant plat-
form would be presumed anticompetitive unless the
merging parties are able to show otherwise.14
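The difference between a revenue-based notification test and a complementary transaction-value test can be illustrated with a minimal sketch. The thresholds, field names, and the "target active domestically" condition below are hypothetical assumptions chosen for illustration; they are not the rules of Austria, Germany, Japan, or any other jurisdiction.

```python
from dataclasses import dataclass

# Hypothetical thresholds (millions of a generic currency unit), for illustration only;
# they are not the values used in Austria, Germany, Japan, or any other jurisdiction.
TURNOVER_THRESHOLD = 50.0
TRANSACTION_VALUE_THRESHOLD = 400.0


@dataclass
class Acquisition:
    target_domestic_revenue: float    # revenue of the firm being acquired
    transaction_value: float          # price paid for the target
    target_active_domestically: bool  # hypothetical local-nexus condition


def meets_turnover_test(deal: Acquisition) -> bool:
    """Traditional test: notification depends only on the target's revenue."""
    return deal.target_domestic_revenue >= TURNOVER_THRESHOLD


def meets_transaction_value_test(deal: Acquisition) -> bool:
    """Complementary test: a high purchase price plus local activity triggers review."""
    return (deal.transaction_value >= TRANSACTION_VALUE_THRESHOLD
            and deal.target_active_domestically)


def requires_notification(deal: Acquisition) -> bool:
    """A deal must be notified if it meets either test."""
    return meets_turnover_test(deal) or meets_transaction_value_test(deal)


# A data-rich start-up with little revenue but a high purchase price.
startup_deal = Acquisition(target_domestic_revenue=2.0,
                           transaction_value=1000.0,
                           target_active_domestically=True)
print(meets_turnover_test(startup_deal))    # False: escapes the traditional test
print(requires_notification(startup_deal))  # True: caught by the value-based test
```

The point of the design is that a high purchase price for a firm with little revenue signals that the buyer values something, such as data, users, or algorithms, that a revenue threshold does not capture.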
Creating institutional capacity to assess cases in the data
economy. Views vary as to whether specialized digital
market authorities should be established to oversee
competition in the digital economy (as formally proposed in the United Kingdom15 and informally discussed in the United States16). An increasing number
of antitrust authorities in high-income economies
have established teams specializing in the digital
economy, but authorities in low- and middle-income
countries lack the same capacity. A recent World
Bank survey of eight competition authorities across
developing regions found that none had specialized
staff dedicated to cases in the digital economy, and
almost all cited limited staffing as a key constraint
in their ability to pursue competition issues in digi-
tal markets. Moreover, half the authorities surveyed
acknowledged their lack of understanding of plat-
form business models.17
The shortage of capacity
and resources among recently formed competition
authorities contrasts with the abundant resources
of large global digital firms. This imbalance must be
taken into account by policy makers when consider-
ing the resourcing and institutional setup required to
combat competition issues in the data-driven econ-
omy. It further strengthens the case for international
cooperation as well as for ex ante measures to prevent
harm from occurring in the first place (covered later
in this section).
As for the substance of investigations, antitrust
investigations of platform businesses are increas-
ingly considering features specific to data-driven
markets. But the most frequently assessed factors
in antitrust cases involving platforms tend to be
more traditional: assessment of competition from
other platforms and the prices of goods and services.
Although these factors often remain relevant, other,
more novel issues posed by these markets appear to
be less systematically assessed by agencies. Network
effects and multihoming behavior by consumers (use
of multiple platforms for the same service) are explic-
itly mentioned in about 40 percent of cases across
all jurisdictions. The issue of whether lack of access
to data constitutes a barrier to entry or an essential
input is present in 27 percent of all cases, but issues
with algorithms are covered in only 13 percent. Data
protection and privacy factors were raised as issues
in only 6 percent of cases overall. Building capacity
within regulatory authorities would help further
address these challenges.
Creating value in the data economy: The role of competition, trade, and tax policy | 233
Some antitrust authorities may use data analytics
to enhance their capability to detect competition issues
in online markets. For example, the Russian Federa-
tion’s antitrust authority has created an automated
system to screen for bid rigging by analyzing data
from public tenders to identify signs of collusion.
Similarly, Brazil’s competition authority has created a cartel
screening unit to collect and analyze data, including
through web scraping. In the context of COVID-19,
competition authorities have encouraged e-commerce
platform firms to monitor for excessive pricing prac-
tices by their sellers. For example, Amazon suspended
thousands of sellers in the United States alone,18
and
Jumia delisted hundreds of products in Nigeria whose
prices had been raised in response to the pandemic.19
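The text does not describe how the Russian or Brazilian screening systems work. As a minimal sketch of what automated screening of tender data can look like, the code below implements two screens from the empirical bid-rigging literature: the coefficient of variation of all bids (unusually tight clustering) and the relative distance between the two lowest bids, scaled by the spread of the losing bids (a possible sign of "cover" bids). The thresholds are hypothetical assumptions; a flag only marks a tender for closer investigation and is not evidence of collusion.

```python
import statistics

# Hypothetical cut-offs chosen for illustration; real screens calibrate these on data.
CV_THRESHOLD = 0.03
RELATIVE_DISTANCE_THRESHOLD = 1.5


def coefficient_of_variation(bids: list[float]) -> float:
    """Standard deviation of all bids divided by their mean (low values = tight clustering)."""
    return statistics.stdev(bids) / statistics.mean(bids)


def relative_distance(bids: list[float]) -> float:
    """Gap between the two lowest bids divided by the standard deviation of the losing bids.

    A large value can indicate cover bids placed well above a pre-arranged winner.
    """
    ordered = sorted(bids)
    losing = ordered[1:]
    spread = statistics.stdev(losing)
    gap = ordered[1] - ordered[0]
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def screen_tender(bids: list[float]) -> list[str]:
    """Return the screens a tender trips; an empty list means no red flag."""
    if len(bids) < 3:
        return []  # the statistics need at least two losing bids
    flags = []
    if coefficient_of_variation(bids) < CV_THRESHOLD:
        flags.append("low dispersion")
    if relative_distance(bids) > RELATIVE_DISTANCE_THRESHOLD:
        flags.append("large gap to cover bids")
    return flags


print(screen_tender([80.0, 95.0, 110.0, 130.0]))    # []
print(screen_tender([100.0, 100.5, 101.0, 100.8]))  # ['low dispersion', 'large gap to cover bids']
```

Running such screens over every public tender is cheap, which is why they suit authorities with limited staff: investigators only review the tenders that are flagged.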
Remedying harm caused by anticompetitive firm
behavior. A consensus is growing that simply ending
such behavior and imposing fines are insufficient.
Antitrust authorities are moving toward designing
remedies that can help restore competition in the
market. Ordering the end of practices such as anti-
competitive exclusivity clauses and the anticompet-
itive tying or bundling of products can, by encourag-
ing multihoming by consumers, facilitate access to
data by a broader range of competitors. Elsewhere,
further measures specifically targeting data and algo-
rithms may be necessary. But these remedies can be
difficult to design and monitor, especially when the
algorithms being used by the firm are not discernible.
In the Uber-Careem merger in Egypt, for example,
Uber was obliged to grant future competitors access
to Careem’s “points of interest map data” on a onetime
basis; to grant current competitors access to trip data
(including rider and driver information), subject to
data protection laws; and to give riders access to their
own data. Uber also committed to removing exclusiv-
ity requirements in contracts with drivers to prevent
them from being locked into the merged platform.
A different approach was taken in the Uber-Grab
merger in Singapore. There, the competition author-
ity considered mandating transferability of driver
data between apps before abandoning the idea after
a survey of potential entrants suggested that a lack of
such transferability was not an impediment to their entry.20
This case reinforces
the need to design remedies case by case and to link
them clearly to a theory of harm in order to safeguard
against remedies that are detrimental to innovation
without significantly improving competition.
Cooperating across borders. Because of the global
nature of many platform businesses, antitrust deci-
sions taken in one jurisdiction often have spillover
effects in other countries. For example, in Germany, as
part of a remedy to respond to competition concerns
from third-party sellers, Amazon agreed to amend
its terms of business for sellers on Amazon’s online
marketplaces across Europe, North America, and Asia.
Among the changes was a reduction in Amazon’s (pre-
viously extensive) rights to use data on the products
of third parties.21
Such cross-country benefits could be
further leveraged by promoting international cooper-
ation among antitrust authorities, thereby creating a
more predictable regulatory environment for firms.
To date, the European Union (EU) is alone in
having a substantial track record in competition
enforcement in digital markets at the regional level.
However, several regional competition authorities
are becoming more active—such as the Competition
Commission of the Common Market for Eastern and
Southern Africa (COMESA) free trade area—although
they have yet to take on a digital markets case.
More informal collaboration is also occurring among
competition authorities. For example, the BRICS
countries (Brazil, Russia, India, China, and South
Africa) released a joint report on the digital econ-
omy that calls for increased cooperation among the
authorities.22
Antitrust agencies in the Group of Seven
(G-7) also released a “common understanding of the
challenges posed by the digital economy” in 2019,
addressing the need for international cooperation.23
Emerging free trade agreements—such as the African
Continental Free Trade Agreement, which is likely to
have a digital focus—could also foster harmonization
on competition policy for the data-driven economy
through their competition policy protocols.
Promoting competition through regulation ex ante
Ex ante market regulations that promote competition
by enabling data use may be just as important as
antitrust remedies. Indeed, given the durable nature
of market power in data-driven markets and the chal-
lenges of designing antitrust remedies, ensuring that
markets do not slip into entrenched market power
in the first place may be even more important than
enforcing antitrust rules ex post. In countries with-
out a developed antitrust regime, including many
low-income countries, ex ante regulation is the only available line of policy
response for fostering competitive data-driven markets.
Adopting mandatory and voluntary schemes to improve
access to data. Governments are considering various
regulatory options to ease access to data in digital
ecosystems. However, such options remain relatively
untested, and evidence on their efficacy is still scarce.
Options include facilitating multihoming; extending
the right to portability of personal data (in essence,
the right to move personal data between different
controllers); facilitating data interoperability (the
ability for different systems to share and use data in
a coordinated, timely manner); and encouraging data
sharing or pooling schemes (when two or more firms
agree to merge their data for access by themselves
and possibly third parties).
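The distinction between portability (moving a user's data from one controller to another) and interoperability (both systems understanding the same data) can be shown with a minimal sketch. The record types, field names, and the use of JSON as the exchange format are hypothetical assumptions; no particular legal regime prescribes this schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Trip:
    date: str
    origin: str
    destination: str
    fare: float


@dataclass
class UserRecord:
    user_id: str
    name: str
    trips: list[Trip] = field(default_factory=list)


def export_user_data(record: UserRecord) -> str:
    """Sending controller: serialize the user's data as JSON, a structured, machine-readable format."""
    return json.dumps(asdict(record), sort_keys=True)


def import_user_data(payload: str) -> UserRecord:
    """Receiving controller: rebuild the record, relying on the shared schema to interpret each field."""
    raw = json.loads(payload)
    return UserRecord(
        user_id=raw["user_id"],
        name=raw["name"],
        trips=[Trip(**trip) for trip in raw["trips"]],
    )


original = UserRecord("u-1", "Amina", [Trip("2020-03-01", "Airport", "Downtown", 12.5)])
moved = import_user_data(export_user_data(original))
print(moved == original)  # True
```

The export function alone delivers portability; the import only succeeds because both sides agree on the schema, which is the interoperability part of the problem.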
Relatively few jurisdictions have put mandatory
portability and interoperability schemes into prac-
tice, although a few instructive models are emerging.
The European Union has been at the forefront of this
push, although it is not yet clear how effectively these
schemes can be enforced. The right to portability of personal data
is contained in the European Union’s General Data
Protection Regulation (GDPR), whereas its 2019 reg-
ulation on the free flow of nonpersonal data should
be important for firms that rely on machine data.24
Kenya, Mexico, Nigeria, the Philippines, and Thai-
land are some other jurisdictions that have also put
in place rights to portability, although—as with the
GDPR—these regimes have yet to be tested.
Measures to mandate interoperability go beyond
portability rights; they also aim to ease the sharing of
data from a technical perspective, which matters most
where continual access to data is required.25
An exam-
ple is the implementation of “open banking” regimes
whereby financial service providers are mandated to
share data on user accounts with third parties through
open application programming interfaces (APIs).
Banking data are well suited to data sharing initia-
tives because they are relatively homogeneous and
standardized, and the concept of open banking is now
well established in Europe.
The United Kingdom’s open banking initiative
seems to have been particularly successful in spurring
market entry and innovation, with 134 third-party pro-
viders currently registered and supplying services.26
At least nine other jurisdictions also have emerging
open banking regimes in place.27
In Brazil, the central
bank published a draft regulation on open banking in
2019. In India, although there is no mandatory open
banking regime, policy measures to encourage data
sharing have been introduced. These include safe-
guards such as the creation of “account aggregators”
to ensure that individuals consent to use of their
financial data and that data are not used beyond the
agreed terms.28
Beyond the realm of open banking, in
Mexico several revisions of the Fintech Law have been
adopted to enable data sharing, including facilitation
of access to user data and regulation of the fees banks
can charge for sharing user data.29
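The safeguard attributed to India's account aggregators, that financial data are not used beyond the agreed terms, amounts to checking every data request against a recorded consent. A minimal sketch, assuming hypothetical consent fields (recipient, data types, purpose, expiry date); it does not reproduce the actual technical specification of the Indian framework.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Consent:
    """Terms a user has agreed to (hypothetical fields)."""
    user_id: str
    recipient: str
    data_types: frozenset[str]
    purpose: str
    valid_until: date


@dataclass(frozen=True)
class DataRequest:
    user_id: str
    recipient: str
    data_type: str
    purpose: str
    request_date: date


def within_agreed_terms(request: DataRequest, consent: Consent) -> bool:
    """Release data only if every element of the request matches the consent on record."""
    return (
        request.user_id == consent.user_id
        and request.recipient == consent.recipient
        and request.data_type in consent.data_types
        and request.purpose == consent.purpose
        and request.request_date <= consent.valid_until
    )


consent = Consent("u-1", "lender-A", frozenset({"account_balance", "transactions"}),
                  "credit_assessment", date(2021, 12, 31))
ok = DataRequest("u-1", "lender-A", "transactions", "credit_assessment", date(2021, 6, 1))
marketing = DataRequest("u-1", "lender-A", "transactions", "marketing", date(2021, 6, 1))
print(within_agreed_terms(ok, consent))         # True
print(within_agreed_terms(marketing, consent))  # False: purpose not agreed
```

Placing this check in an intermediary, rather than leaving it to the bank or the data recipient, is what lets the user's consent travel with the data.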
Voluntary industry data access schemes are also
operating around the world. The Open Ag Data
Alliance, an industrywide project, aims to standard-
ize the way in which farmers’ data are shared with
larger firms (such as those developing precision agri-
culture models).30
In Kenya, the One Million Farmer
Platform connects agtech players across 14 value
chains to take advantage of large-scale shared data
collection and digitized farmer profiles.31
In Nigeria,
financial industry players have voluntarily formed
an alliance to develop open banking APIs.32
The Solid
project, launched in 2018, aims to provide open-
source software that allows users to fully own their
data and allows developers to create decentralized
apps that run on that data. Meanwhile, Apple, Facebook,
Google, Microsoft, and Twitter are developing the Data
Transfer Project, an open-source initiative to enable
portability of some types of user data among partici-
pating platforms, although its impact on competition
has yet to be seen.
When jurisdictions seek to impose mandatory
data sharing regulations, they must design such
schemes carefully to avoid distortive effects such as
stifling incentives to innovate, facilitating collusion
through excessive transparency of firms’ strategic
variables such as prices, and unduly raising the cost of
doing business. The immediate benefits of mandating
data sharing need to be balanced with the possibility
that it would reduce incentives for those sharing data
to invest in data collection and for competitors receiv-
ing data to build their own collection capacity.
To safeguard incentives for innovation, it may
be useful to examine whether the data to be shared
have the features of an “essential facility” (akin to the
“essential facilities doctrine” framework used to reg-
ulate sharing of infrastructure). Such an examination
would require careful economic analysis of whether
the data in question are an essential input that com-
petitors cannot replicate or substitute. Because data,
unlike physical infrastructure, are nonrivalrous, there
should in theory be a stronger argument in favor of
imposing access requirements. Nonetheless, the design
of such requirements would still have to carefully
consider future market dynamics, including incen-
tives for innovation; whether the remedy should be
timebound and limited to markets where a competi-
tion issue has been identified; and whether different
firms should have different obligations according to
their market position. Regulators should also ensure
that shared data are kept secure, and that, when per-
sonal data are involved, the exchange is carried out in
compliance with data protection laws.
Regulating the structure and behavior of data-driven
platform firms. Calls to regulate large data-driven firms
ex ante are gaining ground and merit consideration.
Moves in this direction must be targeted at remedy-
ing specific competition bottlenecks and should be
grounded in sound economic analysis (for example,
remedies around data sharing should be based on a
finding that a lack of data is indeed preventing firms
from competing). The possibility of regulating large
platforms as essential facilities (akin to telecom regu-
lation) may be promising. However, it should be con-
ditional on a platform or its data constituting a bot-
tleneck or an unavoidable trading partner for other
firms, and on the data in question being an essential,
nonreplicable input to potential competitors.
Another area of discussion has been the pos-
sibility of breaking up large data-driven platform
firms.33
Because of the highly interventionist nature
of this solution, it should be considered only when
the firm in question holds market power believed
to be entrenched and durable. It is also important to
consider that the network effects that initially led to
the firm’s dominance may persist in each of the indi-
vidual market segments after the firm is broken up.
There is no guarantee that simply separating a firm’s
segments will overcome these network effects and
allow expansion by others.
Nonetheless, structural solutions may be relevant
to solve some issues typically seen with platform
firms. Preventing firms from operating in multiple
markets would mitigate the risk of firms leveraging
economies of scope from data insights across multiple
markets. It would also address the risk of vertically
integrated firms favoring their own products
(self-preferencing). All this needs to be weighed against the
benefits consumers may experience from economies
of scale and scope that arise when service providers
participate in multiple markets.
Other ex ante regulations could target the “qual-
ity” dimensions of services provided by data-driven
firms—particularly the protections afforded to indi-
viduals on the collection and use of their data (if not
already adequately covered by the data protection
regime). They could also look at the terms applied to
the (often small) suppliers that participate in these
platforms. In this vein, regulators could consider
requiring platforms to give smaller firms access or
prominence in their rankings on a fair, consistent,
and transparent basis. For example, the European
Union enacted a platform-to-business regulation in
2019 that requires a platform to make its terms and
conditions easily available to businesses that trade
on the platform, including disclosure of conditions
under which either party may access data generated
by or provided to the platform, as well as explanations
of the ranking algorithms employed.34
Leveraging offline regulation. For some data-driven
businesses, the key to being able to enter and com-
pete does not lie so much in data remedies as in
other aspects of regulation, including “offline” regu-
lation. In some countries, new regulations are being
imposed to protect traditional or incumbent players.
In Morocco and Tunisia, state-owned enterprises
are not subject to the same data protection obliga-
tions that are binding for the private sector.35
New
e-commerce rules in India that prohibit foreign firms
from selling their own products on their platforms
are intended to protect domestic retailers against
risks of exclusion (reflecting the line of regulatory
reasoning on preventing self-preferencing noted ear-
lier). However, the fact that these regulations target
only foreign firms and are not predicated on the firm
holding a dominant position suggests that they may be
creating an unlevel playing field
beyond what is needed to prevent adverse outcomes.
Regulations that ban entry of data-driven business
models are an obvious example of offline regulatory
restrictions. Spurred by protests from incumbent
players, Uber was blocked from entering a range of
countries, including Bulgaria (for a time), Italy, and
Spain.36
In response, the competition authorities of at
least 24 countries have advocated against dispropor-
tionate restrictions for transport platforms, including
those in Brazil, Colombia, Indonesia, Kenya, Mexico,
and Peru. In China, the government helped to resolve
uncertainty by legalizing ridesharing apps and estab-
lishing procedures to formally license drivers.37
In
Mexico, the competition authority recommended
that local governments recognize transportation
services provided by platforms, leading to new regu-
lations allowing them to operate fully in Mexico City
and other localities.38
Still other regulations can raise data-driven firms’
costs of competing. For example, ridesharing
regulations in Jordan specify that fares charged by
ridesharing companies must be 15 percent higher
than those of taxis, and discounts may not go below
the tariff stipulated for taxis.39
In Egypt, drivers and
vehicles working with ridesharing platforms are
required to pay 25 percent higher registration fees
and taxes than traditional taxis.40
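The arithmetic of these rules shows how directly they blunt price competition. A minimal sketch, assuming one reading of the Jordanian rule: the regulated ridesharing fare is 15 percent above the taxi fare, and any discount must stop at the taxi tariff. Under that assumption the largest effective discount is 1 − 1/1.15, about 13 percent, so no ridesharing trip can ever be cheaper than the same taxi trip. The function names are hypothetical.

```python
JORDAN_MARKUP = 0.15     # ridesharing fares must be 15 percent above taxi fares
EGYPT_FEE_MARKUP = 0.25  # ridesharing registration fees and taxes 25 percent above taxis


def jordan_regulated_fare(taxi_fare: float) -> float:
    """Regulated ridesharing fare for a trip with the given taxi fare."""
    return taxi_fare * (1 + JORDAN_MARKUP)


def jordan_discounted_fare(taxi_fare: float, discount: float) -> float:
    """Apply a promotional discount, but never below the taxi tariff (assumed reading of the rule)."""
    return max(jordan_regulated_fare(taxi_fare) * (1 - discount), taxi_fare)


def jordan_max_effective_discount() -> float:
    """Largest discount that keeps the fare at or above the taxi tariff."""
    return 1 - 1 / (1 + JORDAN_MARKUP)


def egypt_rideshare_fee(taxi_fee: float) -> float:
    """Registration fee or tax owed by a ridesharing vehicle, given the taxi equivalent."""
    return taxi_fee * (1 + EGYPT_FEE_MARKUP)


print(round(jordan_max_effective_discount(), 3))  # 0.13
print(jordan_discounted_fare(10.0, 0.20))          # 10.0: the floor binds
print(egypt_rideshare_fee(100.0))                  # 125.0
```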
Fintech is another sector in which the conditions
favoring incumbents are gradually being dismantled.
Some regulations have focused on unstructured
supplementary service data (USSD) channels, which
establish a real-time session between a mobile handset
and an application to generate a financial service and
are considered an “essential facility” for many fintech
providers. Regulators in Bangladesh, Colombia, Kenya,
and Peru have promulgated regulations to open up
third-party access to mobile network operators’ USSD
channels after some providers strategically restricted
access to those channels to dampen competition from
potential rivals.41
Banks can also unfairly exclude non-
bank rivals from payment settlement infrastructure,
which is often owned or controlled by incumbent
bank consortiums. In response to this issue, the Peo-
ple’s Bank of China created a separate clearinghouse
for nonbank payment providers in 2018, in part to
create a more level playing field.
Recommended reforms of competition policy
The recommendations for competition policy that
follow are grouped according to a maturity model
designed to reflect a country’s stage in the develop-
ment process.
Establishing fundamentals
In low-income countries with limited institutional
capacity, a pragmatic approach to competition policy
focusing on the essentials is warranted. It should
build on related instruments that may already be in
place, while developing institutional capacity for the
future.
Create a level playing field for data-driven businesses.
The first priority is to ensure that traditional regula-
tions and policies do not hinder entry by data-driven
firms or create an unlevel playing field between
firms. This could include harmonizing standards
and requirements for entry and operation of data-
driven firms with standards and requirements for
traditional firms in competing markets (for example,
ride-hailing platforms and regular taxis).
Build on existing sectoral regulations. In regulated
sectors with institutional capacity—such as telecom,
energy, and banking—more targeted regulatory
approaches to encouraging data access could be con-
sidered, building on international experience such as
with open banking.
Develop capacity for dealing with data-driven busi-
nesses. Governments should also invest in building an
understanding of data-driven business models and
data ecosystems in selected agencies, such as sector
regulators, and among industry-related policy makers. As
part of sectoral initiatives or industrial policies, policy
makers could consult with the private sector—partic-
ularly start-ups—to understand data needs and data
bottlenecks for firms and broker industry-led solu-
tions for data sharing.
Accelerating data flows
In addition to the preceding recommendations, coun-
tries with more capable institutions could encourage
or mandate data sharing for markets or circum-
stances where it would be pro-competitive on balance
and in compliance with data protection policies for
personal data.
Develop guidelines for portability and interoperability.
Where portability is possible, consumers should be
trained to exercise these rights. Where data sharing
may be mandated under existing legal tools (such as
the competition law, market regulations, or license
terms), policy makers could define an essential facil-
ities–style framework and build the in
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf
World-Development-Report-2021-Data-for-Better-Lives.pdf

© 2021 International Bank for Reconstruction and Development / The World Bank
1818 H Street NW, Washington, DC 20433
Telephone: 202-473-1000; Internet: www.worldbank.org
Some rights reserved
1 2 3 4 24 23 22 21
This work is a product of the staff of The World Bank with external contributions. The findings, interpretations, and conclusions expressed in this work do not necessarily reflect the views of The World Bank, its Board of Executive Directors, or the governments they represent. The World Bank does not guarantee the accuracy, completeness, or currency of the data included in this work and does not assume responsibility for any errors, omissions, or discrepancies in the information, or liability with respect to the use of or failure to use the information, methods, processes, or conclusions set forth. The boundaries, colors, denominations, and other information shown on any map in this work do not imply any judgment on the part of The World Bank concerning the legal status of any territory or the endorsement or acceptance of such boundaries.
Nothing herein shall constitute or be construed or considered to be a limitation upon or waiver of the privileges and immunities of The World Bank, all of which are specifically reserved.
Rights and Permissions
This work is available under the Creative Commons Attribution 3.0 IGO license (CC BY 3.0 IGO) http://creativecommons.org/licenses/by/3.0/igo. Under the Creative Commons Attribution license, you are free to copy, distribute, transmit, and adapt this work, including for commercial purposes, under the following conditions:
Attribution: Please cite the work as follows: World Bank. 2021. World Development Report 2021: Data for Better Lives. Washington, DC: World Bank. doi:10.1596/978-1-4648-1600-0.
License: Creative Commons Attribution CC BY 3.0 IGO
Translations: If you create a translation of this work, please add the following disclaimer along with the attribution: This translation was not created by The World Bank and should not be considered an official World Bank translation. The World Bank shall not be liable for any content or error in this translation.
Adaptations: If you create an adaptation of this work, please add the following disclaimer along with the attribution: This is an adaptation of an original work by The World Bank. Views and opinions expressed in the adaptation are the sole responsibility of the author or authors of the adaptation and are not endorsed by The World Bank.
Third-party content: The World Bank does not necessarily own each component of the content contained within the work. The World Bank therefore does not warrant that the use of any third-party-owned individual component or part contained in the work will not infringe on the rights of those third parties. The risk of claims resulting from such infringement rests solely with you. If you wish to re-use a component of the work, it is your responsibility to determine whether permission is needed for that re-use and to obtain permission from the copyright owner. Examples of components can include, but are not limited to, tables, figures, or images.
All queries on rights and licenses should be addressed to World Bank Publications, The World Bank Group, 1818 H Street NW, Washington, DC 20433, USA; e-mail: pubrights@worldbank.org.
ISSN, ISBN, e-ISBN, and DOI
Softcover
ISSN: 0163-5085
ISBN: 978-1-4648-1600-0
e-ISBN: 978-1-4648-1601-7
DOI: 10.1596/978-1-4648-1600-0
Hardcover
ISSN: 0163-5085
ISBN: 978-1-4648-1607-9
DOI: 10.1596/978-1-4648-1607-9
Cover design: Dania Kibbi, Base Three Studio
Cover illustration: Mijke Coebergh
Interior design: George Kokkinidis, Design Language, Brooklyn, New York, with updates and additional figure design by Dania Kibbi, Base Three Studio
Library of Congress Control Number: 2021935945
Contents
xi Foreword
xiii Acknowledgments
xvii Abbreviations
1 Overview
3 Overview
3 Advancing development objectives through data
10 Aligning data governance with the social contract
16 Moving toward an integrated national data system
18 Notes
19 References
21 Part I: Advancing development objectives through data
23 Chapter 1: Harnessing the value of data for the poor
24 The untapped potential of data to serve development objectives
25 A brief history of data
27 A data typology
30 The economics of data and political economy issues
33 Data for development: A conceptual framework
38 A data governance framework to enforce the social contract for data
41 Putting it all together: Establishing an integrated national data system
42 Organization of this Report
43 Notes
44 References
48 Spotlight 1.1: Helping communities to gain the ability to collect and analyze their own data
50 Spotlight 1.2: The importance of good data in helping low- and middle-income countries to manage debt during and after the COVID-19 pandemic
53 Chapter 2: Data as a force for public good
54 The central role of public intent data
54 Public intent data and development: Three pathways for adding value
59 Gaps in the coverage, quality, and usability of public intent data
65 Why data gaps persist: The political economy of public intent data
71 Realizing the potential of public intent data
75 Notes
78 References
85 Spotlight 2.1: Deploying data to curtail violence against women and girls
89 Spotlight 2.2: The role of international organizations in improving public intent data
91 Chapter 3: Data as a resource for the private sector
92 Creating value and solving development challenges through data-driven business models
92 The role of data in the production process of firms
93 Data-driven businesses and the technologies that help them create value
95 Focus on platform firms in low- and middle-income countries
98 Data inputs for economic activity
100 The positive development impacts of data used in the production process
103 How use of data in the production process is transforming sectors
107 Some potential risks and adverse outcomes of data-driven businesses to be addressed by policy
111 Notes
113 References
117 Spotlight 3.1: The huge potential of open data for business applications
121 Chapter 4: Creative reuses of data for greater value
122 The power of repurposing and combining different types and sources of data
125 New insights from repurposing and combining data
134 Limitations in using private intent data for development
136 Investments in data innovations: Building a culture of data
143 Notes
144 References
150 Spotlight 4.1: Gathering, sharing, and using better data on weather, water, and climate from low- and middle-income countries
152 Spotlight 4.2: Making roads safer by repurposing private intent traffic data
155 Part II: Aligning data governance with the social contract
157 Chapter 5: Data infrastructure policy: Ensuring equitable access for poor people and poor countries
158 Data infrastructure as a source of inequity
160 Connecting poor people
168 Connecting poor countries
174 Conclusions and recommendations
178 Notes
180 References
184 Spotlight 5.1: How the COVID-19 pandemic has recalibrated expectations of reasonable data consumption and highlighted the digital divide
186 Spotlight 5.2: Data’s carbon footprint
189 Chapter 6: Data policies, laws, and regulations: Creating a trust environment
190 A trust framework of data safeguards and enablers
191 Building safeguards for trusted data use
199 Creating enablers for data sharing
206 Recommendations for crafting a holistic legal framework
208 A maturity model for strengthening the legal and regulatory framework
209 Notes
215 References
222 Spotlight 6.1: The evolving social contract on data: Balancing data sharing and data protection to facilitate contact tracing to control COVID-19
224 Spotlight 6.2: The debate over ownership of personal data
227 Chapter 7: Creating value in the data economy: The role of competition, trade, and tax policy
228 Shaping data regulation to support competition, trade, and taxation
229 Competition policy
237 Trade policy
246 Tax policy
250 Conclusion
250 Notes
254 References
259 Spotlight 7.1: Understanding the interface between data protection and competition policy
261 Spotlight 7.2: The role of regional and international cooperation in addressing data governance challenges
265 Chapter 8: Institutions for data governance: Building trust through collective action
266 How can institutions help govern data for development?
267 Data management across the data life cycle
269 Data governance functions
271 Mapping data governance functions to illustrative institutions
277 Data intermediation and collaboration
280 Making data governance institutions effective
284 Sustainable outcomes through inclusive multistakeholder governance
286 Assessing the institutional foundation through the lens of a maturity model
288 Notes
291 References
297 Spotlight 8.1: The need for a new global consensus on data: A call to action
299 Spotlight 8.2: Promoting citizen science in the Amazon basin
301 Part III: Moving toward an integrated national data system
303 Chapter 9: Creating an integrated national data system
304 Toward an integrated national data system
305 The vision of an integrated national data system
309 Realizing the vision
322 Notes
324 References
Boxes
O.1 18 Toward an integrated national data system: Country examples
1.1 24 What this Report means by data
1.2 29 Innovation in traditional surveys: A COVID-19 example in Brazil
1.3 32 The challenges of trading data through markets
1.4 36 Using private intent data to tackle COVID-19
2.1 55 Six types of public intent data
2.2 60 The World Bank’s Statistical Performance Indicators
2.3 61 Gender data and the COVID-19 pandemic
3.1 94 Technologies and methods that support data-driven decision-making and intermediation
4.1 122 Using cellphones to combat COVID-19
4.2 126 Leveraging private intent data to tackle COVID-19
4.3 131 Preventing illegal fishing in protected maritime areas
4.4 135 Using statistical methods and private intent data to improve representativeness and geospatial precision
5.1 174 The brain drain: ICT professionals
8.1 266 Uruguay’s whole-of-government approach to data governance
8.2 269 The importance and complexity of data governance institutions: The example of digital identification systems
8.3 279 Increased scrutiny of and constraints on private data intermediaries
8.4 286 Building multistakeholder data governance into smart city initiatives through “digital democracy”
9.1 304 Relationship between an integrated national data system and a national statistical system
Figures
O.1 4 How data can support development: A theory of change
O.2 6 A social contract for data founded on value, trust, and equity
O.3 11 Data governance layers at the national and international levels
O.4 13 The legal and regulatory framework for data governance remains a work in progress across all country income groupings
O.5 14 Since 1990, the global trade in data-driven services has grown exponentially and now constitutes half of trade in services
O.6 17 What happens in an integrated national data system?
1.1 26 The share of people counted in a census grew from about 1 in 10 in 1850 to 9 in 10 today
1.2 30 The data life cycle
1.3 34 Three pathways along which data can foster development
B1.4.1 36 Use of repurposed data to study COVID-19: Published articles, by type of private intent data used
1.4 39 Data governance layers at the national and international levels
1.5 41 Data flow safely across all stakeholders in an integrated national data system
S1.1.1 49 A citizen-led method to ascertain who has authority in household decision-making in rural Indian villages
S1.2.1 50 In six years, the composition of debt has shifted dramatically
2.1 55 Certain data features can maximize the value of public intent data
2.2 56 Improving access to water: Using real-time sensor data to reduce repair time for broken hand pumps in Kenya
2.3 60 Gaps in geospatial datasets are especially large in lower-income countries
B2.3.1 61 Proportion of COVID-19 cases reported with sex-disaggregated data for 190 countries
2.4 63 Lower-income countries, especially those affected by fragility and conflict, have less comparable poverty data than other country groups
2.5 64 Lower-income countries are less likely than other countries to adhere to international best-practice statistical standards and methodologies
2.6 65 A positive feedback loop can connect enablers and features of public intent data with greater development value
2.7 66 Most countries do not fully fund their national statistical plans
2.8 68 The older a country’s statistical laws, the lower is its statistical performance and the less open are its data
2.9 69 Greater NSO independence and freedom of the press are positively correlated with better statistical performance
2.10 70 Data supply and demand can generate either virtuous or vicious cycles of data production and use
2.11 71 Policies to realize the potential of public intent data
S2.1.1 85 Prevalence of female genital mutilation in women ages 15–49, by country income level, 2010–19
3.1 94 The role of data in the production process: Pathways to development
3.2 96 The role of data in economic activity
3.3 97 Platform firms are numerous in some lower-income countries but tend to be small
3.4 97 The importance of domestic versus foreign-headquartered firms differs across countries as indicated by firm share of top websites
3.5 98 Users visit and spend more time on Facebook, Google, and YouTube than other websites
3.6 99 Internet traffic in low- and middle-income countries is concentrated in several US-based firms
3.7 108 Risks to market structure and market power stemming from platform firms
S3.1.1 118 Private company use of public data is extremely valuable in the United States, suggesting the value of open government data
B4.1.1 123 Smartphone location data reveal the changes in the time users spend at home in Jakarta
B4.2.1 126 Use of repurposed data to study COVID-19: Published articles, by type of private intent data used
B4.3.1 131 Public intent and private intent data can be combined to detect illegal fishing activity
4.1 134 Gaps in network coverage differ across farm sizes, affecting agricultural extension services
4.2 137 Artificial intelligence specialists gravitate to the US market, no matter where they are educated
5.1 158 The data infrastructure supply chain
5.2 160 The developing world overwhelmingly accesses data using wireless networks
5.3 161 Gaps in 3G wireless broadband internet coverage have been shrinking, but usage gaps remain stubbornly high
5.4 162 Globally, the coverage of wireless technologies reflects their constant upgrading
5.5 164 In low- and middle-income countries, nearly 70 percent of those who do not use the internet are held back by deficiencies in digital literacy
5.6 166 Inequities in mobile data consumption across country income groups and regions are huge
5.7 167 The monthly price for 1 gigabyte of data is unaffordable in low-income countries
5.8 167 Data consumption is very sensitive to market prices and service affordability
5.9 168 The presence of domestic data infrastructure facilitates national data exchanges
5.10 170 Data infrastructure is relatively scarce in low- and middle-income countries
5.11 171 Countries develop domestic data infrastructure in stages
B5.1.1 174 Low- and middle-income countries are educating ICT professionals but not retaining them
B5.1.2 175 Major wage differentials for ICT professionals create a brain drain, especially in low- and middle-income countries
S5.2.1 186 Worldwide greenhouse gas emissions from data consumption have been flat, even though electricity consumption has been growing
6.1 191 Envisioning the multidimensional nature of the legal framework for trust
6.2 192 The evolution of data protection
6.3 194 Gaps in the regulatory framework for cybersecurity are glaring across country income groups
6.4 195 Progress on personal data protection legislation differs markedly across country income groups
6.5 200 Adoption of e-commerce and related legislation is widespread across country income groups
6.6 201 Regulations enabling access to and reuse of public intent data are unevenly developed across country income groups
6.7 204 Adoption of enablers for sharing private intent data lags those for public intent data across country income groups
7.1 228 The unusual characteristics of data-driven businesses pose complex challenges for policy makers in the areas of competition, trade, and taxation
7.2 230 In the digital economy, antitrust cases related to passenger transport are more prevalent in middle-income countries than in high-income countries
7.3 231 Among anticompetitive practices, abuse of dominance is more widespread worldwide across multiple sectors of the digital economy
7.4 238 Since 1990, the global trade in data-driven services has grown exponentially and now constitutes half of trade in services
7.5 239 Three distinct approaches to handling cross-border data flows
7.6 248 East Asian countries are losing a substantial volume of tax revenue by failing to apply current VAT rules to digital services
8.1 268 Data governance and data management, working seamlessly together in support of the social contract
8.2 270 Functions of data governance
8.3 273 No low-income and few lower-middle-income countries have a separate data governance entity; most embed them in another government institution
8.4 276 The lower the country income level, the fewer are the countries with data protection authorities
8.5 276 More than half of countries across all income groups have antitrust authorities
8.6 277 Only about one-quarter of low-income countries have cybersecurity agencies
8.7 281 Features of well-functioning institutions for effective data governance
S8.2.1 300 Using a diverse set of open science and citizen science tools and technologies in the Amazon basin
9.1 305 What happens in an integrated national data system?
9.2 309 A data maturity model for a hypothetical national data system
9.3 310 Steps to integrating the public sector into the national data system
9.4 313 Steps to integrating civil society into the national data system
9.5 315 Steps to integrating academia into the national data system
9.6 317 Steps to integrating the private sector into the national data system
9.7 320 Steps to integrating international and regional organizations into the national data system and collaborating across borders
Maps
O.1 5 Use of aggregated cellphone records to track mobility week by week during COVID-19 lockdowns in The Gambia, March–May 2020
O.2 7 Highly refined data pinpointed areas of Nigeria that needed better sanitation
O.3 9 Combining satellite imagery with household survey data increases the resolution of the poverty map of Tanzania
O.4 12 Data infrastructure is not yet widespread across all parts of the world
1.1 27 John Snow’s innovative mapping of the cholera epidemic in London in 1854 revolutionized tracing of the disease
B1.4.1 37 Use of repurposed data to study COVID-19: Published articles, by country
2.1 57 Reducing poverty: Mapping pockets of poverty in Croatia allowed better targeting of antipoverty funds
2.2 59 Improving public safety: The use of citizen-collected data in Bogotá led to greater safety around bike paths
B4.1.1 122 Mapping the home location of smartphone users in Jakarta, 2020
B4.2.1 127 Uses of repurposed data to study COVID-19: Published articles, by country
4.1 132 Private intent data can provide unique and comparable information not collected by national governments, such as the number of adults who lack a formal financial account
4.2 133 Agricultural extension services can be tailored to the slower, older broadband internet accessible to many small-scale farmers
S4.1.1 151 Large gaps remain in global reporting on basic weather data
S4.2.1 153 By combining police reports and crowdsourced data, researchers were able to identify the 5 percent of roads where half of the crashes occur in Nairobi
5.1 169 The global fiber-optic cable submarine network reaches all corners of the world, but data infrastructure is unevenly developed
7.1 241 Uptake of regulatory models to cross-border data flows
Tables
1.1 28 Examples of data types based on original intent and collection methods
2.1 63 Assessment of the openness of data, by country income group
2.2 64 Data dissemination practices and openness, by country income group
4.1 142 Selected research gaps to be addressed to advance the use of private intent data for development
5.1 177 Recommendations for data infrastructure improvements sequenced according to a maturity model
6.1 207 Certain elements of the regulatory framework are much better developed than others, but performance is generally low
6.2 209 Recommendations organized according to a maturity model based on data safeguards and enablers
7.1 241 Policy bases for regulating cross-border personal data
7.2 244 Key provisions on digital trade in recent trade agreements
8.1 268 Data management decisions along the data life cycle
8.2 272 Candidate institutions and actors to perform or inform data governance functions
8.3 278 Snapshot of common data intermediary structures
Foreword
Data governance is the subject of intense debate in advanced economies and increasingly among large emerging markets. And yet many complex policy questions remain unanswered. In response, World Development Report 2021: Data for Better Lives surveys the emerging landscape and provides policy makers with a framework for thinking through the issues, opportunities, and trade-offs. One thing is clear: the perspective of lower-income countries has so far been largely absent from these global debates and urgently needs to be heard.
Data are a double-edged sword. On the one hand, they offer tremendous potential to create value by improving programs and policies, driving economies, and empowering citizens. On the other hand, data accumulation can lead to a concentration of economic and political power, raising the possibility that data may be misused in ways that harm citizens. Data are a resource that can be used and reused repeatedly to create more and more value, but there is a problem: the more data are reused, the higher the risk of abuse.
It is hard to imagine a more dramatic example of these opportunities and tensions than the COVID-19 pandemic. Countries around the world have moved swiftly to repurpose mobile phone records to monitor the spread of the virus. But at the same time, they have struggled to balance this benefit against privacy concerns and the risk of misuse.
Beyond pandemic times, the statistical capacity to produce and effectively use core economic and social data is limited. Many poor countries are unable to accurately track public finances, report on external debt, or monitor their development goals. Without such data, the ability to hold governments accountable and track progress withers.
Data governance arrangements to facilitate greater use of data while safeguarding against misuse remain in their infancy.
The legal and regulatory frameworks for data are inadequate in lower-income countries, which all too often have gaps in critical safeguards as well as shortages of data-sharing measures. There, the data systems and infrastructure that enable interoperability and allow data to flow to more users are incomplete; fewer than 20 percent of low- and middle-income countries have modern data infrastructure such as colocation data centers and direct access to cloud computing facilities. Even where nascent data systems and governance frameworks exist, a lack of institutions with the requisite administrative capacity, decision-making autonomy, and financial resources holds back their effective implementation and enforcement.
To address these concerns, World Development Report 2021 calls for a new social contract for data: one that enables the use and reuse of data to create economic and social value, promotes equitable opportunities to benefit from data, and fosters citizens’ trust that they will not be harmed by misuse of the data they provide. However, in seeking such a social contract, lower-income countries are too often disadvantaged because they lack the infrastructure and skills to capture data and turn them into value; the scale and agency to participate equitably in global data markets and their governance; and the institutional and regulatory frameworks to create trust in data systems.
Forging a new social contract for data is a pressing domestic policy priority that will require strengthening national data systems and engaging all stakeholders at the national level. Because of the global scale of data, some of the most challenging aspects of the social contract also call for closer international cooperation to harmonize regulations and coordinate policies bilaterally, regionally, and globally. Critical areas for international engagement include reform of international taxation rights for data-driven businesses, World Trade Organization arrangements for trade in data-enabled services, regional collaboration on the development of data infrastructure, international harmonization of technical standards to support interoperability, and bilateral collaboration on law enforcement and antitrust regulation.
The World Bank stands ready to support its client countries on this important and challenging agenda. The findings of this World Development Report will shape support for client countries by identifying where public and private sector investments are the most critical, defining a rich program for policy reform and technical assistance, and highlighting areas in which global initiatives can help to convene and facilitate cross-border cooperation.
Realizing the full value of data will depend on a substantial commitment and effort, and it will be difficult. But the cost of failure is a world of missed opportunities and greater inequities.
David R. Malpass
President
The World Bank Group
Acknowledgments
This year’s World Development Report (WDR) was prepared by a team led by directors Robert Cull, Vivien Foster, and Dean Jolliffe. Malarvizhi Veerappan served as report manager and as a member of the Report’s leadership. The core team was composed of Jaffar Al-Rikabi, Adele Moukheibir Barzelay, Miriam Bruhn, Rong Chen, Niccolò Comini, Samuel Paul Fraiberger, Craig Hammer, Talip Kilic, Jan Loeprick, Daniel G. Mahler, Michael Minges, Martin Molinuevo, Nancy Morrison, David Newhouse, Sara Nyman, Vincent Francis Ricciardi III, David Satola, Dorothe Singer, Philip Wollburg, and Bilal Zia, together with research analysts Kenneth Zaul Moreno Sermeno, Nikkie Pacheco, and Elizabeth Salazar. Selome Missael Paulos provided the team with administrative support. Davida Louise Connon, Hai-Anh H. Dang, Cem Dener, Lucas Kitzmüller, Aspasea Mckenna, and David Medine were members of the extended team. Rory Macmillan and Zia Mehrabi provided input as expert consultants.
The Report was sponsored by the World Bank’s Development Economics Vice Presidency and was prepared in close partnership with the Infrastructure Vice Presidency. Overall guidance was provided by Carmen M. Reinhart, Chief Economist, and Aart Kraay, Deputy Chief Economist. The team is especially grateful for the support and guidance provided by Makhtar Diop, who was at that time Vice President, Infrastructure, and Haishan Fu, Director, Data Group. The team is also appreciative of the comments and suggestions from Mari E. Pangestu, Managing Director, Development Policy and Partnerships. In the early months of the Report’s preparation, Pinelopi Goldberg, then Chief Economist, provided guidance as well.
The team received suggestions from an advisory panel of high-level government officials, representatives of national statistical agencies and competition authorities, and private sector leaders: Anil Arora (Chief Statistician of Canada), Ola Awad (President, Palestinian Central Bureau of Statistics), Dr. Long Chen (Director, Luohan Academy, representing Alibaba Group), Neil Jackson (Chief Statistician, Foreign, Commonwealth and Development Office of the UK), Kersti Kaljulaid (President, Republic of Estonia), Francis W. Kariuki (Director-General, Competition Authority of Kenya), Zia Khan (Senior Vice President, Innovation, The Rockefeller Foundation), Ming Maa (President, Grab), Joyce Murray (Minister, Digital Government, Canada), Nandan Nilekani (Cofounder, Infosys, and architect of India’s Data Empowerment and Protection Architecture), Regina Opondo (Community Director, Kounkuey Design Initiative), Sandra Quijada Javer (Director, National Institute of Statistics of Chile), Haleema Saeed (Director General, International Relations, Palestinian Central Bureau of Statistics), Julio A. Santaella (President, National Institute of Statistics and Geography of Mexico), Pravin Srivastava (Secretary and Chief Statistician of India), Kecuk Suhariyanto (Chief Statistician, BPS Statistics Indonesia), David Tudor (Group General Counsel, Prosus Group), Rodger Voorhies (President, Global Growth and Opportunity Division, Bill & Melinda Gates Foundation), Sheila Warren (Head, Blockchain and Data Policy, World Economic Forum), and Biratu Yigezu (Director, CSA Ethiopia).
The team also received guidance from a technical board of leading academics: Emmanuelle Auriol (Research Faculty, University of Toulouse), Marshall Burke (Associate Professor, Stanford University), Luciano Floridi (Faculty, University of Oxford), Jan Kramer (Faculty and Chair, Internet and Telecommunications Business, University of Passau), Jacques Kremer (President, WorldWide Advisors, Inc.), Bruno Liebhaberg (Director General, Centre on Regulation in Europe), Fernando Perini (Senior Program Specialist, International Development Research Centre), John Pullinger (President, International Association for Official Statistics), Anna Scherbina (Associate Professor, Brandeis University), and Tommaso Valletti (Faculty, Imperial College London). The team valued their advice and found it very useful. Finally, Shaida Badiee, Oliver Chinganya, Johannes Jutting, Stephen MacFeely, Angela Me, and Claire Melamed provided specialized reviews of chapter 9. However, the views expressed in the Report do not necessarily reflect those of these advisers and reviewers.

The team would also like to thank the following World Bank staff, who were members of the “BrainsTrust,” for their guidance during preparation of the Report: Luis Alberto Andres, João Pedro Azevedo, Andrea Barone, Kathleen Beegle, Tania Begazo, Jerome Bezzina, Calogero Carletto, Andrew L. Dabalen, Vyjayanti Desai, Marianne Fay, Michael Ferrantino, Marelize Gorgens, Mary Hallward-Driemeier, Kimberly D. Johns, Tim Kelly, Saki Kumagai, Daniel Lederman, William F. Maloney, Fredesvinda F. Montes Herraiz, James Neumann, Umar Serajuddin, Sharada Srinivasan, Davide Strusani, and Paolo Verme.

Background and related research, along with dissemination, are being generously supported by the KDI School of Public Policy and Management (KDIS) Partnership Trust Fund, the World Bank’s Knowledge for Change Program (KCP, a multidonor trust fund), and the United States Agency for International Development. The team drew on the analysis, research, and literature reviews of researchers and specialists from across the world.
In addition, the team would like to thank the following for feedback and helpful discussions: Susan Ariel Aaronson, Ali Abbas, Moctar Aboubacar, Karina Acosta, Sonia Ahmand, Sabina Alkire, Noam Angrist, Belinda Archibong, Kaushik Basu, Vitoria Beard, Theresa Beltramo, Willem Buiter, Mayra Buvinic, Tsenguunjav Byambasuren, Anupam Chander, Anindya Chatterjee, Nancy Chau, Katie Clancy, Julie Crowley, Shanta Devarajan, Gary Fields, Avi Goldfarb, Zihan Hu, Yingyi Jin, Ravi Kanbur, Carlos Lopez, Bentley MacLeod, Chelsea Madryga, Annabella Ng, Jose Antonio Ocampo, Kevin O’Neil, Ariel Ortiz-Bobea, Brian Park, Michael Pisa, Maria Poli, Cristian Pop-Eleches, Christian Reimsbach Kounatze, Elettra Ronchi, Benjamin David Roseth, Phet Sayo, Heba Shams, Harman Sing Dhodi, Jenna Slotin, Michael Stanger, Joseph Stiglitz, Eric Swanson, Aberash Tariku, Jim Tebrake, Jeni Tennison, Anh Phuock Thien Nguyen, Kathryn Toure, Miguel Urquiola, Eric Verhoogen, Stefaan G. Verhulst, Sonali Vyas, and Yue Wang.

The following World Bank colleagues provided insightful comments, feedback, collaboration, and support: Rabah Arezki, Audrey Ariss, Angela Armstrong, Aziz Atamanov, Natalia Baal, Prospere R. Backiny-Yetna, Guadalupe Bedoya, Kathleen Beegle, Abdoullahi Beidou, Moussa Blimpo, Hana Brixi, Niklas Buehren, Greta L. Bull, Maurizio Bussolo, Cesar Calderon, Yu Cao, Greg Chen, Louise Cord, Paul Andres Corral Rodas, Jose Cuesta, Conrad Daly, Richard Damania, Olivia D’Aoust, Francesca Daverio, Zelalem Yilma Debebe, Asli Demirgüç-Kunt, Francesca de Nicola, Tami Ann Dokken, Mark Dutz, Alison Evans, Pablo Fajnzylber, Deon P. Filmer, Juliane Fries, Isis Gaddis, Emilia Galiano, Roberta V. Gatti, Tina George, Alejandro Medina Giopp, Chorching Goh, Lesly Goh, Markus Goldstein, Tania Gomez, Aparajita Goyal, Caren Grown, Boutheina Guermazi, Afef Haddad, Daniel Halim, Stephane Hallegatte, Nagaraja Harshadeep, Kristen Himelein, Johannes G.
Hoogeveen, Chyi-Yun Huang, Elisabeth Huybens, Roumeen Islam, Ergys Islamaj, Ann-Sofie Jespersen, Anna Kalashyan, Woubet Kassa, Faruk Khan, Young Eun Kim, Soo Min Ko, Florence Kondylis, Ayhan Kose, Holly Krambeck, Megumi Kubota, Christoph Lakner, Somik Lall, Adriana Legovini, Norman Loayza, Nancy Lozano Gracia, Morgan Sofia Lucey, Cathrine Machingauta, Andrew D. Mason, Aaditya Mattoo, Anna Metz, Frederic Meunier, Sveta Milusheva, Miriam Muller, Mamta Murthi, James Neumann, Rochelle Glenene O’Hagan, Sandie Okoro, Madalina Papahagi, Utz Johann Pape, Elizaveta Perova, Tobias Pfutze, Martin Raiser, Laura Ralston, Martin Rama, Pooja Rao, Vijayendra Rao, Sheila Reszapi, Mark Roberts, Denis Robitaille, Carlos Rodriguez-Castelan, Dan Oliver Rogger, Clelia Rontoyanni, Rosemary Rop, Karla Viviana Saavedra Betancourt, Arisha Salman, Tara Sarma, Pierre Sauve, Marc Tobias Schiffbauer, Ethel Sennhauser, Gabriel Sensenbrenner, Fatima Shah, Parmesh Shah, Nurlina Shaharuddin, Siddharth Sharma, Brian William Stacy, Clara Stinshoff,
Prachi Tadsare, Daria Taglioni, Jomo Tariku, Jeff Thindwa, Hans Timmer, Margaret Triyana, Tea Trumbic, Vladimir Tsirkunov, Andrew Whitby, Mark Williams, Quentin Wodon, Keong Min Yoon, and Albert G. Zeufack.

Nancy Morrison provided developmental guidance in drafting the Report, which was edited by Sabra Ledent and proofread by Gwenda Larsen and Catherine Farley. Robert Zimmermann verified the Report’s extensive citations. Dania Kibbi, Base Three Studio, was the principal graphic designer. Special thanks go to Stephen Pazdan, who coordinated and oversaw production of the Report; Mark McClure, who provided assistance; and the World Bank’s Formal Publishing Program. The team would also like to thank Mary C. Fisk, who coordinated multiple translations of the overview and main messages; Bouchra Belfqih and her team in Translation and Interpretation, who worked patiently on the translations; Patricia Katayama, who oversaw the overall publication process; and Deb Barker and Yaneisy Martinez, who managed the printing and electronic conversions of the Report and its many ancillary products.

Naureen Aziz, Mark Felsenthal, Chisako Fukuda, David Mariano, and Mikael Reventar developed the communications and engagement strategy. Monique Pelloux Patron provided the team with resource management support. The team would also like to thank Maria Alyanak, Marcelo Buitron, Gabriela Calderon Motta, Maria del Camino Hurtado, and Alejandra Ramon for their help with coordination.

Beyond Words studio designed and developed the WDR 2021 website (http://wdr2021.worldbank.org) and data stories. The team comprised Kate Ashton, Rebecca Conroy, Jamie Gilman, Chris Hankins, Ceri Jones, Becca Muir, Richard Pullinger, Duncan Swain, and Lucy Swan. The illustrations for the website were created by Elisabetta Calabritto. Maarten Lambrechts and Adrianus Willem Tulp developed the website’s data stories.
Emmanuel Cosmas Maliganya, Vidhya Nagarajan, Balaji Natarajan, Somu Sinthamani, Anushka Thewarapperuma, and Roula Yazigi contributed to website development and dissemination.

This Report draws on background papers prepared by Meaza Abraham, Rodrigo Barajas, Noelia Carreras, Anupam Chander, Sandeep Chandy, Yuan Fang, Martina Francesca Ferracane, Angelina Fisher (New York University), Dayoung Park, Sharada Srinivasan, Clara Stinshoff, Thomas Streinz (New York University), Erik van der Marel, Dereje Wolde, Isabel Yu, and the WDR team. Federico Cardenas Chacon, Nicolas Conserva, Lillyana Sophia Daza Jaller, Paris Gkartzonikasm, New Doe Kaledzi, Olga Kuzmina, Aliaksandra Tyhrytskaya, and Yasmin Zand provided support for the data collection and research required for the Global Data Regulation Survey prepared for this Report. Contributors to the special-feature spotlights in the Report are Diana Jimena Arango, Anna-Maria Bogdanova, Martin Brocklehurst, Libby Hepburn, Haruna Kashiwase, Stephen MacFeely, Angela Me, Sveta Milusheva, Vijayendra Rao, David Rogers, Evis Rucaj, Sven Schade, Lea Shanley, Rubena Sukaj, Vladimir Tsirkunov, Mariana Varese, and Divyanshi Wadhwa.

The team would also like to thank the World Bank colleagues who helped organize and facilitate consultations and advised on translations: Jaffar Al-Rikabi, Adele Moukheibir Barzelay, Rong Chen, Candyce Da Cruz Rocha, Nicole El Hajj, Vivien Foster, and Jevgenijs Steinbuks.

The WDR team consulted policy makers, international organizations, civil society organizations, private sector companies, development partners, academics, and research institutions, as well as the offices of the World Bank’s Executive Directors and Bank colleagues. This year’s World Development Report faced the additional challenge of the COVID-19 pandemic, which made it impossible to conduct the usual face-to-face consultation meetings.
Nonetheless, the team was able to turn this into an advantage because virtual consultation events enabled much wider participation by hundreds of stakeholders. Because of travel restrictions, the team conducted consultations using a digital format and partner networks, thereby allowing more people with diverse backgrounds to participate. Some of the events focused on regional and targeted stakeholder consultations in multiple languages, allowing hard-to-reach organizations to join the discussion. Interactions with core contributing units to the World Development Report, as part of ongoing debates in certain areas such as taxes, leveraged wider consultations taking place on these related issues. Team members also drew heavily on their own experiences and interactions with the many data professionals, statisticians, data scientists, civil society organizations, and policy makers working in often difficult conditions to use data to achieve development outcomes.

The consultations were launched in October 2019 with a high-level roundtable, a side event of the World Bank’s annual meetings, and continued until October 2020. The team also conducted a series of bilateral consultations from April through June 2020 with the national statistical offices of Canada, Chile, Ethiopia, India, Indonesia, Mexico, the United Kingdom, and West Bank and Gaza. The team thanks the organizers of the many events that brought together a wide variety of stakeholders, including government officials, civil society organizations, academic institutions, the private sector, and international organizations. The organizers were Germany’s Federal Ministry for Economic Cooperation and Development (BMZ) and Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ); Canada’s International Development Research Centre (IDRC), which organized consultations across Africa, Asia, and Latin America and the Caribbean in French, Spanish, and English; the Global Partnership for Sustainable Development Data (GPSDD); the GW Elliott School of International Affairs; Columbia and Cornell Universities; the Mastercard Advocacy Center of Excellence; A. T. Kearney’s Global Business Policy Council; the Fletcher School of Law and Diplomacy’s Institute for Business in the Global Context at Tufts University; the Committee for the Coordination of Statistical Activities (CCSA); the United Nations World Data Forum (UNWDF); the Organisation for Economic Co-operation and Development’s Development Assistance Committee (DAC); the Global Data Barometer; and the Open Gov Hub.
The team organized and held a dedicated consultation event for civil society members and several bilateral consultations with technology firms, platform-based businesses, internet exchange points, payment industries, and cybersecurity firms in the private sector. They included Alibaba, Amazon, CADE (Brazilian competition authority), De Novo, Facebook, Google, LinkedIn, Lori Systems (Kenya), Mastercard, Power2SME (India), and UA-IX (Ukrainian Internet Exchange Point). The team also sought guidance on specific topics from experts based in institutions that included Aelex Legal (law firm, Abuja, Nigeria), Atlantic Council (think tank), DataPrivacy.Com.Br (São Paulo, Brazil), Georgetown University, Hamu and Company (law firm, Lagos, Nigeria), Hogan Lovells (privacy and cybersecurity law firm), Hunton Andrews Kurth (global privacy and cybersecurity law firm), Interswitch (Lagos, Nigeria), New York University, Organisation for Economic Co-operation and Development, Oxford Internet Institute, Rockefeller Foundation, the United Nations Conference on Trade and Development (UNCTAD), the University of Southern California, and WilmerHale (data and cybersecurity law firm). Detailed information about all consultations and contributors can be found at https://www.worldbank.org/en/publication/wdr2021/consultations.

The team is grateful as well to the many World Bank colleagues who provided written comments during the formal Bank-wide review process. Those comments proved to be invaluable at a crucial stage in the Report’s production. The team apologizes to any individuals or organizations inadvertently omitted from this list and expresses its gratitude to all who contributed to this Report, including those whose names may not appear here. The team members would also like to thank their families for their support throughout the preparation of this Report.
Abbreviations

A4AI Alliance for Affordable Internet
AI artificial intelligence
AIS automatic identification system
APEC Asia-Pacific Economic Cooperation
API application programming interface
ARPU average revenue per user
ASEAN Association of Southeast Asian Nations
ATAF African Tax Administration Forum
ATI access to information
BEPS OECD/G20 Inclusive Framework on Base Erosion and Profit Shifting
BRICS Brazil, Russian Federation, India, China, South Africa
B2B business to business
B2C business to consumer
B2G business to government
CDR call detail record
CPTPP Comprehensive and Progressive Agreement for Trans-Pacific Partnership
CSIRT Computer Security Incident Response Team
CSO civil society organization
DEPA Digital Economy Partnership Agreement
DLT distributed ledger technology
DPA data protection authority
DTA Digital Trade Agreement
EITI Extractive Industries Transparency Initiative
EU European Union
FAIR findability, accessibility, interoperability, and reuse
FAO Food and Agriculture Organization
FCS fragile and conflict-affected situations
FRAND fair, reasonable, and non-discriminatory
GATS General Agreement on Trade in Services
GDP gross domestic product
GDPR General Data Protection Regulation (EU)
GHG greenhouse gas
GIS geographic information system
GNI gross national income
GPS global positioning system
G2B government to business
G2C government to consumer
G2G government to government
HAPS high-altitude platform station
ICT information and communication technology
ILO International Labour Organization
IMF International Monetary Fund
INDS integrated national data system
IoT Internet of Things
IP Internet Protocol
IPRs intellectual property rights
IRB Institutional Review Board
ISO International Organization for Standardization
ISP internet service provider
IT information technology
ITU International Telecommunication Union (UN)
IUCN International Union for Conservation of Nature
IXP internet exchange point
KIXP Kenya Internet Exchange Point
LEO low Earth orbiting
LSMS Living Standards Measurement Study
M&E monitoring and evaluation
ML machine learning
MLAT mutual legal assistance treaty
MNE multinational enterprise
MNO mobile network operator
MPA Marine Protected Area
MSMEs micro, small, and medium enterprises
NGO nongovernmental organization
NIC news flow indices of corruption
NPDR Non-Personal Data Regulation (EU)
NSDS National Strategies for the Development of Statistics
NSO national statistical office
ODbL Open Database License
OECD Organisation for Economic Co-operation and Development
OGP Open Government Partnership
ONS Office for National Statistics (UK)
PARIS21 Partnership in Statistics for Development in the 21st Century
PIMS personal information management system
PPP public-private partnership
PTA preferential trade agreement
RADD Radar Alerts for Detecting Deforestation
SCC standard contractual clause
SDGs Sustainable Development Goals
SINTyS National Tax and Social Identification System (Argentina)
SMEs small and medium enterprises
SPI Statistical Performance Indicators
SSO standard setting organization
TFP total factor productivity
TVWS TV white space
UN United Nations
UNCITRAL United Nations Commission on International Trade Law
UNDP United Nations Development Programme
UNEP United Nations Environment Programme
UNICEF United Nations Children’s Fund
USITC United States International Trade Commission
USMCA United States–Mexico–Canada Agreement
USSD unstructured supplementary service data
VAT value added tax
WFP World Food Programme
WHO World Health Organization
OVERVIEW

“You can have data without information, but you cannot have information without data.”
Daniel Keys Moran, computer programmer and science fiction author
Data, which are growing at an unprecedented rate, are becoming an integral part of the daily lives of most people everywhere. But how does that matter for the more than 700 million people living in extreme poverty? Is the explosion in the new types and uses of data improving their lives? Or will poor people and poor countries be left behind, creating a widening gap between those who reap the benefits of this new data-driven world and those who do not?

The innovations resulting from the creative new uses of data could prove to be one of the most life-changing events of this era for everyone. As with earlier general-purpose technologies such as the steam engine and electricity, the transformations emerging from the data revolution could touch all aspects of societies and economies. But such sweeping changes are not automatic. The productivity value of the steam engine and electricity was realized decades after they were first introduced. The delay occurred not because people did not recognize the importance of these innovations (sooner or later everyone did), but because the new manufacturing systems needed for these innovations to realize their economic potential could not take shape overnight. Just as electricity itself did not result in economic development, data alone will not improve well-being. Data can improve social and economic outcomes, but only if they are used systematically in ways that create information that generates insights that improve lives.

This Report aims to answer two fundamental questions. First, how can data better advance development objectives? Second, what kind of data governance arrangements are needed to support the generation and use of data in a safe, ethical, and secure way while also delivering value equitably?

One important message of this Report is that simply gathering more data is not the answer. Significant data shortfalls, particularly in poor countries, do exist, but the aim of this Report is to shift the focus toward using data more effectively to improve development outcomes, particularly for poor people in poor countries.

Advancing development objectives through data

Part I of this Report develops a conceptual framework that links data to development through three institutional pathways (figure O.1). The middle pathway is the use of data by governments and international organizations to support evidence-based policy making and improved service delivery. The top pathway is the use of data by civil society to monitor the effects of government policies and by individuals to monitor and access public and commercial services. The bottom pathway is the use of data by private firms in the production process, a use that fuels their own growth as well as wider economic growth. One implication of the conceptual framework is that data alone cannot solve development problems: people (in society, governments, and firms) are the central actors transforming data into useful information that can improve livelihoods and lives.1

Alongside capital, land, and labor, data are also an input to the development objectives that emerge along all three pathways. But, unlike capital, land, and labor, data do not lose their value when they are used once. Data that were initially collected with one intention can be reused for a completely different purpose (chapter 1).
Disseminating, exchanging, and sharing data to enhance data reuse and repurposing

Because the potential of data to serve a productive use is essentially limitless, enabling the reuse and repurposing of data is critical if data are to lead to better lives. It is thus a central aspect of the conceptual framework. Figure O.1 uses two-way arrows to depict these flows. The two-way arrow between the private sector and government/international organizations indicates the reuse and repurposing of data originally collected for commercial purposes for public policy, and vice versa. Similarly, the two-way arrow between individuals/civil society/academia and government/international organizations indicates data being exchanged and reused by those parties. The final two-way arrows reflect the use of private sector data and data-driven applications by individuals/civil society/academia and the use of data and analysis generated by individuals/civil society/academia by firms.

In practice, however, those holding data may be unwilling to exchange data. They may have concerns about data protection and security or the need to capture returns on investments in collecting data. Or they may hope to gain market power from accumulating data to capture economies of scale or obtain some other political or competitive advantage from hoarding them.

The phrase “sharing and reuse” is shorthand used in this Report for all the types of transactions and exchanges of data that permit reuse, from government open data initiatives for sharing data to market-based transactions for data involving private firms. In theory, defining clear economic property rights over data should enable data to be traded widely on markets.
But in practice, the extent of the data trade (beyond the market for advertising) has been limited by competing claims on ownership, tensions between the wide dissemination of data and incentives to accumulate more data for private commercial gain, and difficulties in assessing the quality and accuracy of data.

Each of the three pathways illustrated in figure O.1 shows how data can improve lives, but those same pathways create openings for data to be used in ways that harm people. Through the government pathway, data can be abused for political ends, such as politically motivated surveillance or discrimination along lines of ethnicity, religion, race, gender, disability status, or sexual orientation. In the pathway running through individuals, there is the potential for cybercriminals to inflict considerable harm by stealing and manipulating sensitive information. The “dark net” is a vast parallel network of hidden websites that provides an underground digital platform for a wide array of criminal activities, facilitating illegal trade in drugs, counterfeit currency, stolen goods, credit card numbers, forged papers, firearms, and human organs. Similarly, through the private sector pathway, examples of harmful use include, among other things, the exploitation of information about consumer preferences and behavior to engage in aggressive or manipulative marketing techniques based on microtargeting of persuasive messages or to apply algorithms that facilitate collusion among market players.2

Figure O.1 How data can support development: A theory of change
[Figure: a diagram linking individuals, civil society, and academia; government and international organizations; and the private sector through data production and collection, data analytics and processing, the production process in firms, and two-way reuse flows. Labeled outcomes include development, greater accountability, greater transparency, better policy making and service delivery, increased business opportunities, criminal activity and the dark net, political surveillance, more data on individuals, discrimination, market concentration, and widening inequality.]
Source: WDR 2021 team.
Note: Positive impacts are shown in green; negative impacts are shown in red.

Unlocking data for the public good and safeguarding against misuses: Some COVID-19 examples

Many countries have used data to control the COVID-19 pandemic. This use includes tracking people’s locations to better understand mobility patterns during lockdowns or to aid in disease contact tracing. Using call detail records (CDRs) from March through May 2020, aggregated to mask individual-level data, policy makers in The Gambia were able to review maps showing the movement of people across administrative boundaries (map O.1). These maps helped them understand the extent to which lockdowns were succeeding in reducing movement and allowed them to identify the factors linked to lockdown compliance and noncompliance and plan accordingly. Meanwhile, the government of Israel approved emergency regulations in March 2020 to allow the individual-level data collected from cellphones to be used to track people and then, through contact tracing, to curtail the spread of COVID-19.
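The Report does not say how the CDRs behind the maps for The Gambia were aggregated. As a minimal sketch only, assuming hypothetical weekly records of each subscriber's origin and destination district (the `Move` type), an arbitrary suppression threshold (`min_count`), and an analyst-chosen baseline week, the Python below shows one common pattern for masking individual-level data: count distinct subscribers per flow, drop the identifiers, suppress small cells, and express each district's outflow as a percentage change, as in the map legend. It is not presented as the method used by Knippenberg and Meyer (2020).

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Move(NamedTuple):
    """Hypothetical record: a subscriber's inferred district changed during a week."""
    subscriber_id: str  # pseudonymous ID; discarded during aggregation
    week: int
    origin: str         # district of origin
    destination: str    # district of destination


FlowKey = tuple[int, str, str]  # (week, origin, destination)


def weekly_flows(moves: Iterable[Move], min_count: int = 15) -> dict[FlowKey, int]:
    """Count distinct subscribers per (week, origin, destination) and suppress
    cells below min_count so that no small group can be singled out."""
    # Deduplicate so each subscriber counts at most once per flow cell;
    # moves that stay within one district are ignored.
    unique = {(m.week, m.origin, m.destination, m.subscriber_id)
              for m in moves if m.origin != m.destination}
    counts = Counter((week, origin, dest) for week, origin, dest, _ in unique)
    # Only aggregate counts leave this function; subscriber IDs are dropped.
    return {key: n for key, n in counts.items() if n >= min_count}


def outflow_pct_change(flows: dict[FlowKey, int], baseline_week: int,
                       week: int) -> dict[str, float]:
    """Percentage change in total outflow per origin district versus a baseline week."""
    def outflow(target_week: int) -> Counter:
        totals: Counter = Counter()
        for (w, origin, _dest), n in flows.items():
            if w == target_week:
                totals[origin] += n
        return totals

    base, current = outflow(baseline_week), outflow(week)
    # Districts missing (or suppressed) in the later week count as zero outflow.
    return {d: 100.0 * (current[d] - base[d]) / base[d] for d in base}


if __name__ == "__main__":
    # Synthetic example: 20 subscribers leave Brikama in week 3, 10 in week 5.
    demo = [Move(f"s{i}", 3, "Brikama", "Banjul") for i in range(20)]
    demo += [Move(f"s{i}", 5, "Brikama", "Banjul") for i in range(10)]
    flows = weekly_flows(demo, min_count=5)
    print(outflow_pct_change(flows, baseline_week=3, week=5))  # {'Brikama': -50.0}
```

Suppressing small cells protects small groups, but it can overstate declines: a cell that falls below `min_count` in a later week is treated as zero outflow, so a real pipeline would need to handle suppressed cells explicitly.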
CDRs were not created to aid public policy making or to allow the government to track the movements of individuals, but they are an example of data being reused and repurposed (flowing in the vertical channels in figure O.1). In Israel, these data were being collected before the pandemic, but they could be accessed only for national security purposes.3

These early efforts at repurposing CDRs to track population movements and infected individuals seemed to have a positive effect. In The Gambia, the maps helped reveal that the lockdown disproportionately affected poorer districts, indicating a need for relief and recovery efforts to target these areas. In Israel, analysis of the cellular data suggested their use led to identification of more than one-third of all of the country’s coronavirus cases in the early weeks of the pandemic (more than 5,500 of the 16,200 people who had contracted the disease), possibly contributing to Israel’s exceptionally low initial rates of coronavirus infections and deaths.

This new use of CDR data to track large parts of the population of Israel sparked debate and pushback over concerns about the potential misuse of the data by government. In Israel, many lawmakers raised privacy concerns, and the Supreme Court eventually halted the program. The Court ruled in late April 2020 that the government must legislate the use of cellphone tracking and that “a suitable alternative, compatible with the principles of privacy, must be found.”4

Map O.1 Use of aggregated cellphone records to track mobility week by week during COVID-19 lockdowns in The Gambia, March–May 2020
[Map: weekly panels from week 3 (March 15–21) through week 13 (May 24–30) showing the percentage change in population flows by district, on a scale from −60 to 60+ percent, with international and district boundaries, district capitals (Brikama, Kanifing, Mansa Konko, Kerewan, Georgetown, Basse Santa Su), and the national capital, Banjul.]
Source: Knippenberg and Meyer 2020. Data at http://bit.do/WDR2021-Map-O_1.
Note: Blue shades indicate outflow of people; green shades indicate inflow of people. A nationwide lockdown was imposed on March 22, 2020. Data were gathered using call detail records.

Many of the themes of this Report are illustrated in this example. The sharing and reuse of private sector CDR data with public authorities created social value by supporting the control of COVID-19 infections, thereby saving lives. At the same time, this transfer of data raised fundamental concerns about trust, with citizens concerned that their CDR data could then be repurposed by government officials for other unintended and potentially harmful purposes beyond public health. Issues of equity were also at stake. Whereas in a high-income country like Israel smartphone penetration was 93 percent, in a low-income country like The Gambia smartphone penetration was only 75 percent. In each case, the minority of the population lacking a smartphone was unable to generate CDR data and would not necessarily benefit directly from the public health protection afforded by contact tracing.

These examples also illustrate a key conundrum. The potential benefits that people realize in the form of improved policies and service delivery may increase rapidly as more data, especially personal data, are shared and reused, but the risks of data being misused increase as well. These potential benefits depend on data being disseminated or exchanged between parties. But parties will willingly engage in such exchanges only if they trust the systems, regulations, and institutions that underlie their security. How can people trust that their data will be protected and that they will share in the value that data can produce?
These mounting concerns suggest the need for a new social contract around data: an agreement among all participants in the process of creating, reusing, and sharing data that fosters trust that they will not be harmed by exchanging data and that part of the value created by data will accrue equitably (figure O.2). The idea that societies engage in these sorts of agreements, or social contracts, has existed for centuries, often linked to the writing of philosophers such as Thomas Hobbes, John Locke, and Jean-Jacques Rousseau. Legal systems, and governance more generally, can be viewed as instruments for establishing, facilitating, and enforcing social contracts. Persuading parties to abide by the rules of a social contract is not an easy task and will hinge on ensuring that the benefits from using data are shared in an equitable way, so that everyone has something to gain. In this process, lower-income countries are too often disadvantaged, lacking, as they frequently do, the infrastructure and skills to capture data and turn them into value; the institutional and regulatory frameworks to create trust in data systems; and the scale and agency to participate equitably in global data markets and their governance. With data reshaping our lives, our societies, and the world more generally, social contracts for data are needed both nationally and internationally, especially because of the cross-border nature of data transactions and flows. Spotlight 8.1 extends this idea of a social contract to the international realm, calling for a global consensus to ensure that data are safeguarded as a global public good and as a resource to achieve equitable and sustainable development.
The untapped potential of data; the evolving legal, regulatory, and governance frameworks for data generation, use, and reuse; the importance of country context (history, culture, governance, and political economy) in shaping appropriate frameworks; the role of technical capabilities for making the most of data safely; and the need for trust and more equitable sharing of the value of data—all these are the themes at the core of this World Development Report. Part I of the Report begins by describing in more detail the potential development impact of data collected for public purposes—public intent data (chapter 2); data collected by the private sector as part of routine business processes—private intent data (chapter 3); and the synergies that arise from the joint use of different types of data (chapter 4). This distinction between public intent and private intent data is used regardless of who collected the data or the methods used to gather the data (such as customer surveys, accounting records, or digital transactions).
Figure O.2 A social contract for data founded on value, trust, and equity. Value: the full value of data materializes when systems enable the use and reuse of data for different purposes. Trust: a trust environment is created when the rights and interests that all stakeholders have in data are safeguarded. Equity: all share equitably in the benefits of data when investments and regulations create a level playing field. Source: WDR 2021 team.
Public intent data can improve service delivery, targeting, accountability, and empowerment
Public intent data hold great potential for designing, executing, and evaluating public programs and policy (chapter 2). Because public intent data are a prerequisite for many government functions, government agencies are the primary producers of these data by means of censuses, administrative data collection, and more. Citizens, civil society organizations, nongovernmental organizations, academic institutions, and international organizations contribute critically to the production of public intent data using surveys, crowdsourcing platforms, and other means. These kinds of data can lead to better lives through three main pathways: first, by improving policy making and service delivery; second, by prioritizing scarce resources and targeting them to reach marginalized populations and areas; and third, by holding government accountable and empowering individuals to make better choices through more information and knowledge. An example from Nigeria illustrates the power of public intent data to improve and target service delivery. The 2015 National Water Supply and Sanitation Survey commissioned by Nigeria’s government gathered data from households, water points, water schemes, and public facilities, including schools and health facilities.
These data revealed that 130 million Nigerians (or more than two-thirds of the population at that time) did not meet the standard for sanitation set out by the Millennium Development Goals and that inadequate access to clean water was especially an issue for poor households and in certain geographical areas (map O.2).5 In response to the findings from the report based on these data, President Muhammadu Buhari declared a state of emergency in the sector and launched the National Action Plan for the Revitalization of Nigeria’s Water, Sanitation and Hygiene (WASH) Sector.6 The higher the quality of the data (in terms of features such as timeliness, accuracy, and resolution), the greater their potential to generate value for development. Yet a variety of factors prevent countries—particularly low-income ones—from realizing greater value from data for the public good. These impediments include a lack of resources, technical capacity, data governance, and demand for data-informed decision-making. The World Bank’s Statistical Performance Indicators, released as part of this Report, identify gaps in the availability, quality, and usability of public intent data across 174 countries, focusing on features related to the timeliness, granularity, interoperability, and accessibility of those data.7 Unleashing the full potential of public intent data requires high-level prioritization of data in the policy process. Governments would then prioritize the production of high-quality data and the open and transparent use of data for decision-making. Transparency and reliability of official statistics can help build trust in government actions. A lack of transparency, such as not revealing a country’s debt burden, can have harmful economic consequences and damage the public’s trust in government (see spotlight 1.2).
Fulfilling the potential of data requires long-term, stable financing of data; investments in statistical and technical capacity; and laws conducive to safe data production and reuse. Other areas that must be addressed include low levels of data literacy affecting the demand for data, policy makers’ lack of incentives for and interest in using data, low trust in the quality of public intent data, and lack of infrastructure for accessing and using the data. These investments and initiatives rely on one another, and so failure to succeed in one area jeopardizes the overall value that data can bring to development. Effective use of data can generate more demand for data, thereby justifying investments to produce more, and higher-quality, data.
Map O.2 Highly refined data pinpointed areas of Nigeria that needed better sanitation. Legend: % of population with inadequate access to improved sanitation (0–19.9, 20–39.9, 40–59.9, 60–79.9, 80–100). Source: World Bank 2017. Data at http://bit.do/WDR2021-Map-O_2.
Private intent data can fuel growth and boost development
Data collected and curated by the private sector for commercial purposes also hold great potential to spur development (chapter 3). Innovations in the use and application of data by businesses are creating tremendous economic value by enhancing data-driven decision-making and reducing transaction costs. A 2011 study of 179 large firms in the United States indicated that firms adopting data-driven decision-making increased their productivity by 5–6 percent relative to what would be expected in view of their other investments and use of information technology.8 Although data are in many ways an input to the production process of firms, much of the recent explosion of new data has come about as a by-product of economic activity, such as digitization of firm operations, mobile phone usage by individuals, digital transactions, and social media interactions. These data are collected at high frequency and can provide detailed information on individuals, businesses, economic outcomes, and phenomena. They not only enhance the economic efficiency of the firms themselves, but also offer potential to be repurposed for public policy needs such as COVID-19 tracking. For example, financial services providers are increasingly adopting alternative credit scoring techniques to solve the long-standing issue of lack of data on potential borrowers (or more specifically, asymmetric information) in banking. These techniques take advantage of users’ digital footprints to assess creditworthiness for those who otherwise lack documentation. Two prominent examples of this approach are Lenddo, which operates in the Philippines, and Cignifi, which operates in Africa, Asia, and Latin America.
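The Report does not describe how Lenddo or Cignifi build their scores, and their models are proprietary. As a rough illustration only, the sketch below fits a logistic regression, one common baseline for credit scoring, to a handful of hypothetical past borrowers described by two made-up digital-footprint features, then scores applicants who have no credit file. The features, data, learning rate, and epoch count are all assumptions.

```python
import math

# Hypothetical training data (not any real lender's): two made-up digital-footprint
# features scaled to 0..1, namely regularity of monthly airtime top-ups and the share
# of calls made to a stable set of contacts. Label 1 = repaid, 0 = defaulted.
TRAINING = [
    ((0.9, 0.8), 1), ((0.8, 0.7), 1), ((0.7, 0.9), 1), ((0.6, 0.6), 1),
    ((0.3, 0.2), 0), ((0.2, 0.4), 0), ((0.4, 0.1), 0), ((0.1, 0.3), 0),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(data, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_features = len(data[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def score(w, b, x) -> float:
    """Estimated repayment probability for an applicant without a credit file."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


w, b = fit_logistic(TRAINING)
print(f"regular, stable usage:   {score(w, b, (0.85, 0.75)):.2f}")
print(f"irregular, sparse usage: {score(w, b, (0.15, 0.25)):.2f}")
```

Because such a model learns only from past repayment outcomes, any bias in those labels or features flows directly into the scores it produces.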
But these trends also come with new risks that must be addressed to ensure that the data-driven economy raises social welfare. Concerns are growing about excessive data collection, insufficient governance of data held by private firms, and inadequate protection of personal data. Many of these concerns revolve around the misuse of personal data. Such misuses include the failure of firms to properly protect the financial information of clients—exposing them to theft of funds or identity—or firms’ unauthorized use of, or failure to protect, individuals’ confidential health or location data. Many of the processes through which firms create value with their data are driven by algorithms and machine learning. In these models, algorithms determine, among other things, what information, products, or services individuals are exposed to and at what price; what insurance packages they are offered; whether their loan applications are approved; what jobs they qualify for; and what medical advice they receive. All these types of activities have the potential to significantly improve economic efficiency. For example, by consuming more data types and extracting relevant information from seemingly unrelated patterns, machine learning could generate credit scores for more individuals with greater precision. However, if the data fed into machine learning models embed discriminatory assumptions, the models will not only reproduce that discrimination but also amplify it.9 This point brings to mind the decades-old data science adage “garbage in–garbage out,” meaning that a data processing system such as machine learning is no better than the data it is given to process.10 But there is a deeper concern: the output from machine learning is typically opaque and changes frequently as new data enter the system.
Almost by design, machine learning creates a rule that is not transparent, and so identifying discriminatory elements of the algorithm can be technically very challenging. Often, data-driven markets exhibit positive network externalities, leading to increasing returns to scale and a propensity for a few large firms to dominate. The result can be the exclusion of smaller or more traditional firms to the detriment of local entrepreneurship, with possible risks for consumer welfare. These effects may be exacerbated in developing markets, where entrants find it harder to raise start-up capital and where there is limited human capital in data sciences. To counteract this, policy makers can address the underlying constraints to achieving scale, such as geoblocking (restricting access to internet content based on the user’s geographical location) or lack of harmonization of data policies across countries. They can ensure that sector regulations and government support schemes provide a level playing field for all firms.
Combining and repurposing data can deepen their development impact
Combining and repurposing different types of data can enhance the impacts of data on development (chapter 4). Development problems are complex, spanning economic, cultural, environmental, demographic, and many other factors. Policy design based on data covering only one factor will be incomplete, and sometimes ill-advised. Combining different types of data can fill data gaps and offer new perspectives on development problems. As one example, public intent household surveys, which gather extensive data on living standards, consumption, income, and expenditures, are the basis for estimating national poverty rates in most countries. Because the survey instrument is so extensive and time-consuming to administer, the samples tend to be relatively small. Estimates of poverty are usually statistically valid for a nation and at some slightly finer level of geographic stratification, but such household surveys are rarely designed to provide the refined poverty profiles that would allow poverty-reduction policies to be targeted at the village level or lower. Meanwhile, for decades high-resolution poverty maps have been produced by estimating a model of poverty from survey data and then mapping this model onto census data, allowing an estimate of poverty for every household in the census data. A problem with this approach is that census data are available only once a decade (and in many poorer countries even less frequently). Modifications of this approach have replaced population census data with CDR data or various types of remote sensing data (typically from satellites, but also from drones). This repurposing of CDR or satellite data can provide greater resolution and timelier maps of poverty. For example, using only household survey data the government of Tanzania was able to profile the level of poverty across only 20 regions of the country’s mainland. Once the household survey data were combined with satellite imagery data, it became possible to estimate poverty for each of the country’s 169 districts (map O.3). Combining the two data sources increased the resolution of the poverty picture roughly eightfold with essentially no loss of precision.
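The survey-to-census (or survey-to-satellite) mapping described above can be sketched in a few lines. This is a deliberately simplified, hypothetical version: it regresses the poverty rate observed in a few survey areas on a single satellite-derived covariate (an assumed night-light index) by ordinary least squares, then predicts poverty for every district where that covariate is available. Small-area estimation methods used in practice rely on household-level models, many covariates, and explicit treatment of prediction error; none of that is shown here, and all numbers are invented.

```python
from statistics import mean

# Hypothetical survey areas: (night-light index derived from satellite imagery,
# poverty rate in percent measured by the household survey).
SURVEY = [(0.9, 12.0), (0.7, 20.0), (0.5, 28.0), (0.3, 36.0), (0.1, 44.0)]

# Hypothetical satellite covariate for every district, surveyed or not.
DISTRICT_LIGHTS = {"district_a": 0.8, "district_b": 0.4, "district_c": 0.2}


def fit_ols(points):
    """Ordinary least squares for poverty = a + b * light."""
    xs, ys = zip(*points)
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in points)
    den = sum((x - x_bar) ** 2 for x in xs)
    b = num / den
    return y_bar - b * x_bar, b


a, b = fit_ols(SURVEY)
# Map the survey-based model onto the wall-to-wall satellite covariate.
predicted = {d: a + b * light for d, light in DISTRICT_LIGHTS.items()}
for district, rate in predicted.items():
    print(f"{district}: {rate:.1f}% estimated poverty")

# Resolution gain in the Tanzania example: 169 districts versus 20 regions.
print(f"resolution gain: {169 / 20:.2f}x")
```

The last line reproduces the arithmetic behind the Tanzania example: 169 districts instead of 20 regions is about 8.45 times as many spatial units, which is where the eightfold figure comes from.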
Other examples of this innovative analysis are occurring in some of the world’s most data-deficient environments such as Afghanistan and Rwanda, offering solutions to pressing data gaps.11 Examples of other ways of repurposing data include using online media and user-generated content to map water and flood events in real time for water management and food security, and combining satellite imagery data from public and private sources to monitor crop yields and forecast malnutrition.12
Map O.3 Combining satellite imagery with household survey data increases the resolution of the poverty map of Tanzania. Panel a: poverty map using the Household Budget Survey (20 regions). Panel b: poverty map combining the data in panel a with satellite imagery (169 districts). Both panels show poverty incidence, 2018 (% of population). Source: World Bank 2019. Data at http://bit.do/WDR2021-Map-O_3.
Similarly, many examples in this Report highlight the potential for repurposing data to improve programs, policies, and outcomes in areas such as monitoring public health (including the spread of disease), managing crisis response and resource allocation, ensuring road safety in transport and transit, and monitoring illegal fishing and deforestation. Novel ways to create and use data enable civil society to hold governments accountable for policies and to better monitor corruption.13 For example, social media discussion boards that draw on crowdsourced data and web scraping (extracting data from websites) are emerging as ways in which local leaders can act against corrupt officials and receive real-time feedback on the impact of anticorruption policies. The “I paid a bribe” online initiative launched in 2011 by the Janaagraha Centre for Citizenship and Democracy in India has developed into one of the largest crowdsourced anticorruption platforms in the world. This tool collects citizens’ reports of corrupt behavior and merges them with geospatial data to highlight problem areas. In doing so, it empowers individuals, civil society, and governments to fight corrupt behavior. To encourage more efforts to repurpose and combine data sources, this Report describes ways in which donors, governments, and companies could invest in the people, partnerships, and research needed to leverage these new data sources for public benefit.
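The Report does not describe the internals of the “I paid a bribe” platform. A minimal, hypothetical sketch of the general idea, merging geotagged citizen reports with area boundaries and flagging areas that accumulate many reports, might look like this. The district names, rectangular boundaries, coordinates, and the two-report threshold are all assumptions; a real system would use proper polygon boundaries rather than rectangles.

```python
from collections import Counter

# Hypothetical district boundaries as rectangles: name -> (min_lat, max_lat, min_lon, max_lon).
DISTRICTS = {
    "north": (13.0, 14.0, 77.0, 78.0),
    "south": (12.0, 13.0, 77.0, 78.0),
}

# Hypothetical geotagged citizen reports of bribe requests: (latitude, longitude).
REPORTS = [(13.5, 77.5), (13.2, 77.9), (13.8, 77.1), (12.4, 77.6), (20.0, 80.0)]


def district_of(lat, lon):
    """Return the district containing the point, or None if it lies outside all of them."""
    for name, (lat0, lat1, lon0, lon1) in DISTRICTS.items():
        if lat0 <= lat < lat1 and lon0 <= lon < lon1:
            return name
    return None


def hotspots(reports, min_reports=2):
    """Count reports per district and keep districts at or above the threshold."""
    counts = Counter()
    for lat, lon in reports:
        district = district_of(lat, lon)
        if district is not None:
            counts[district] += 1
    return {d: n for d, n in counts.items() if n >= min_reports}


print(hotspots(REPORTS))  # prints {'north': 3}
```

Reports that fall outside every known boundary are simply dropped here; a production system would more likely queue them for review.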
Low-income countries should emphasize policy initiatives and investments in building the data skills of analysts and decision-makers; expanding tertiary education to encompass data science and analytics; promoting partnerships with universities and private companies in higher-income countries; strengthening the data literacy of senior government leadership; creating institutional environments that encourage the use of sophisticated data and evidence in policy making; and revamping national statistical offices to perform nontraditional roles with private intent data.
Aligning data governance with the social contract
A well-designed data governance framework allows countries to capture the full economic and social value of both public intent and private intent data and to leverage synergies between them. This involves creating trust in the integrity of the data system, while ensuring that the benefits of data are equitably shared. Such a framework is the tangible expression of a country’s social contract around data. Part II of this Report describes the building blocks of data governance, which can deliver the potential benefits of data while safeguarding against harmful outcomes (figure O.3). These building blocks include data infrastructure policies (chapter 5); policies, laws, and regulations around data (chapter 6); related economic policies (chapter 7); and data governance institutions (chapter 8). Although much of data governance is domestic in focus, an efficient and equitable resolution of many data governance challenges is possible only with international collaboration. Bilateral efforts are needed to manage cross-border spillovers of antitrust decisions and to join forces to combat cybercrime.
Multilateral cooperation is essential to address global free-rider problems (such as data protectionism or tax evasion in data-enabled services) and to reduce transaction costs through harmonization of legal and technical standards for data protection and interoperability. At the same time, regional collaboration can help amplify the voice of low- and middle-income countries in global data governance negotiations and help realize scale economies in the development of data infrastructure.
Figure O.3 Data governance layers at the national and international levels, covering infrastructure policies, laws and regulations, economic policies, and institutions. Source: WDR 2021 team.
Improving data infrastructure helps ensure equitable access for poor people in poor countries
The digital character of modern data calls for digital infrastructure—a prerequisite for collecting, exchanging, storing, processing, and distributing data (chapter 5). Yet the availability of such infrastructure is marked by inequity both within and between countries. Because the social and economic value of data infrastructure rises steeply as more and more citizens are connected, universal service policies have long existed to promote service rollout. In recognition of the transformative opportunities that broadband connectivity presents for both individuals and nations, the United Nations Broadband Commission has committed the international community to reaching 75 percent broadband-internet user penetration by 2025.14 That said, efforts to move toward universal access face fundamental challenges. First, because of the continual technological innovation in mobile technology service, coverage is a moving target. Whereas in 2018, 92 percent of the world’s population lived within range of a 3G signal (offering speeds of 40 megabits per second), that share dropped to 80 percent for 4G technology (providing faster speeds of 400 megabits per second, which are needed for more sophisticated smartphone applications that can promote development). The recent commercial launch of 5G technology (reaching speeds of 1,000 megabits per second) in a handful of leading-edge markets risks leaving the low-income countries even further behind. Policy makers can hasten technological upgrades by creating a supportive environment for private sector investment in the underpinning fiber-optic networks, while introducing more effective management of critical spectrum resources. Sharing infrastructure can also greatly reduce the cost of upgrades. Yet a careful balance must be struck between promoting competition in broadband provision wherever possible and encouraging cooperation between service providers in market segments where demand is too limited to support more than one infrastructure network.
The second challenge is that a substantial majority of the 40 percent of the world’s population who do not use data services live within range of a broadband signal, so for them the barrier is usage rather than coverage. Of people living in low- and middle-income countries who do not access the internet, more than two-thirds stated in a survey that they do not know what the internet is or how to use it, indicating that digital literacy is a major issue.15 Affordability is also a factor in low- and middle-income countries, where the cost of an entry-level smartphone represents about 80 percent of the monthly income of the bottom 20 percent of households.16 Relatively high taxes and duties further contribute to this expense.17 As costs come down in response to innovation, competitive pressures, and sound government policy, internet uptake will likely increase. Yet even among those who do use the internet, consumption of data services stands at just 0.2 gigabytes per capita per month, a fraction of what this Report estimates may be needed to perform basic social and economic functions online. A third challenge in expanding connectivity is its potential impact on global warming. The climate impacts of increased connectivity present a set of complicated trade-offs. In 2018 the electricity needed to support data infrastructure was equal to approximately 1 percent of global consumption—a significant draw with environmental consequences. But because of reliance on renewable energy–supported data infrastructure and increasing energy efficiencies, greenhouse gas emissions linked to data infrastructure are disproportionately lower than those of other sectors. Furthermore, access to data infrastructure can have significant positive climatic effects, as illustrated by the massive reduction in travel and increase in videoconferencing during COVID-19 (spotlight 5.2).
Full participation in the data-driven economy entails not only connecting individual citizens but also developing adequate data infrastructure at the national level. For the most part, low- and middle-income countries lack domestic facilities to allow their own locally generated data to be exchanged (via internet exchange points, IXPs), stored (at colocation data centers), and processed (on cloud platforms)—see map O.4. Instead, many continue to depend on overseas facilities, requiring them to transfer large volumes of data in and out of the country—for which they pay a substantial penalty in terms of slower speed and higher prices. Policy makers can do much to improve access to data infrastructure progressively. This process begins by encouraging the creation of domestic IXPs and then fostering a suitable investment climate for colocation data centers. In these centers, popular internet content can be stored locally, and access to overseas cloud infrastructure can be facilitated through the provision of on-ramps. Such facilities can be shared at the regional level, where suitable fiber-optic connectivity exists between countries and there is adequate regulatory harmonization. Because of the extremely high standards of reliability required for data infrastructure, as well as concerns about the carbon footprint of data, the ideal private sector investment climate should provide for reliable, clean, low-cost electricity, natural cooling, and negligible disaster risk—conditions that are not always readily met in low- and middle-income countries.
Data laws and regulations can help create an environment of trust
Trust in data transactions can be supported through a robust legal and regulatory framework encompassing both safeguards and enablers (chapter 6). The establishment of such a framework remains a work in progress across all country income groups (figure O.4).
Safeguards promote trust in data transactions by avoiding or limiting harm arising from the misuse of data. A fundamental prerequisite for trust in data systems is cybersecurity. Achieving adequate cybersecurity calls for creating a legal framework that obliges data controllers and processors to adopt technical systems to secure data.18 To date, only a small minority of low- and middle-income countries have adopted adequate legal frameworks for cybersecurity. Kenya’s new Data Protection Act stands out as a good example of comprehensive cybersecurity provisions. Creation of an adequate legal framework for data protection is also critical. Such a framework should clearly differentiate between personal data (data that identify the individual) and nonpersonal data (data that do not contain any personally identifiable information). Among middle-income countries, Mauritius is notable as having relatively well-developed safeguards for personal data. Indeed, it has distinguished itself as one of the first Sub-Saharan African countries to ratify the Council of Europe’s Convention 108+ for the Protection of Individuals with Regard to the Processing of Personal Data.19
Map O.4 Data infrastructure is not yet widespread across all parts of the world (colocation data centers, internet exchange points (IXPs), submarine cable landing stations, and submarine cables). Sources: PeeringDB, Interconnection Database, https://www.peeringdb.com/; PCH Packet Clearing House, Packet Clearing House Report on Internet Exchange Point Locations (database), accessed December 14, 2020, https://www.pch.net/ixp/summary; TeleGeography, Submarine Cables (database), https://www.submarinecablemap.com/. Data at http://bit.do/WDR2021-Map-O_4.
The protection of personal data is grounded in international human rights law, which requires that the interests of the data subject be adequately safeguarded before enabling any kind of data transaction.20 This protection is usually achieved by requiring data subjects to provide some form of explicit consent for use of the data. But is such consent meaningful? Evidence indicates that it would take the average person 76 days a year to thoroughly read the disclosure documents soliciting their consent on every website and application they visit!21 This finding suggests the need to strengthen the legal obligations for data service providers to act in the best interest of the customers whose data are being used. Because of the less sensitive nature of nonpersonal data, they can for the most part be adequately protected through intellectual property rights, allowing some balancing of interests between data protection and data reuse. However, this Report finds that most low-income countries surveyed do not have intellectual property rights in place for private intent data. Complicating matters further, the distinction between personal data and nonpersonal data is becoming increasingly blurred. This blurring arises from the widespread mixing and processing of different data sources using sophisticated algorithms that may render nonpersonal data (such as from mobile phones) personally identifiable, or at least make it possible to identify specific social groups.
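To make this blurring concrete, the hypothetical sketch below checks a dataset that contains no names for k-anonymity (the size of the smallest group of records sharing the same combination of quasi-identifiers) and then links the records that are unique to an auxiliary dataset that does carry names. The tower identifiers, age bands, and both datasets are invented for illustration; the only point is that a k of 1 means some supposedly nonpersonal records can be singled out and re-identified.

```python
from collections import Counter

# Hypothetical "nonpersonal" mobility records: no names, only the cell towers most
# used at night and during the day, plus an age band.
MOBILITY = [
    {"night_tower": "T1", "day_tower": "T7", "age_band": "30-39", "trips_per_week": 12},
    {"night_tower": "T1", "day_tower": "T7", "age_band": "30-39", "trips_per_week": 9},
    {"night_tower": "T2", "day_tower": "T5", "age_band": "40-49", "trips_per_week": 4},
    {"night_tower": "T3", "day_tower": "T7", "age_band": "20-29", "trips_per_week": 15},
]

# Hypothetical auxiliary dataset that does carry identities, e.g. a directory that
# reveals roughly where people live and work.
DIRECTORY = [
    {"name": "Person A", "night_tower": "T2", "day_tower": "T5", "age_band": "40-49"},
    {"name": "Person B", "night_tower": "T3", "day_tower": "T7", "age_band": "20-29"},
]

QUASI_IDENTIFIERS = ("night_tower", "day_tower", "age_band")


def qi_key(record):
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def k_anonymity(records):
    """Size of the smallest group of records sharing one quasi-identifier combination."""
    return min(Counter(qi_key(r) for r in records).values())


def link_unique_records(records, directory):
    """Attach names to records whose quasi-identifier combination is unique."""
    group_sizes = Counter(qi_key(r) for r in records)
    names = {qi_key(p): p["name"] for p in directory}
    return {
        names[qi_key(r)]: r
        for r in records
        if group_sizes[qi_key(r)] == 1 and qi_key(r) in names
    }


print(k_anonymity(MOBILITY))  # prints 1
for name, record in link_unique_records(MOBILITY, DIRECTORY).items():
    print(name, "->", record["trips_per_week"], "trips per week")
```

Coarsening the quasi-identifiers (for example, grouping towers into larger areas) raises k, at the cost of less detailed data, which mirrors the balance between data protection and data reuse described above.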
Enablers facilitate access to and reuse of data within and among stakeholder groups to ensure that the full social and economic value of data can be captured. The nature and extent of provisions to support data sharing differ markedly across public intent and private intent data. Significant efforts have been made around the world to safely disclose public intent data through open data policies (encouraging proactive publication of government data), together with access to information legislation (giving citizens a legally enforceable right to compel disclosure). For real impact, however, open data policies must be supported by a consistent protocol for classifying sensitive data, combined with interoperable technical standards, machine-readable formats, and open licensing to facilitate subsequent reuse. Governments have much less influence when it comes to disclosure of private intent data. Sharing of such data may serve as a remedy for the concentration of market power, such as in the Arab Republic of Egypt, where a merger between two major ride-hailing applications was made conditional on their sharing driver and rider information with smaller competitors. In other contexts, private intent data may also be critical for addressing important public policy challenges, such as the use of mobile phone records for contact tracing to control the spread of COVID-19. And yet relatively little attention has been paid so far to the possibility of incentivizing the exchange of private intent data through measures such as open licensing, data portability, and various types of data partnerships. Some countries—notably France—have nonetheless enacted legislation mandating the sharing of private sector data deemed to be in the public interest.22
Aligning data regulation with economic policy objectives can support the creation of value
Data play a central role in rapidly expanding platform-based business models.
For example, search engines collect data on users’ site visits, which they can sell to marketing companies so that those companies can target advertisements more precisely. These platform-based business models are becoming increasingly important in low- and middle-income countries. The design of legal and regulatory frameworks for data has a real impact on the buoyancy of data-driven businesses and requires difficult policy balances. Providing access to essential sources of market data, for example, may be critical for promoting competition among platform businesses, but it also may affect incentives for investment and innovation in data-driven businesses. Similarly, regulations designed to protect personal data may restrict cross-border data flows and materially affect a country’s competitive edge in the burgeoning trade of data-enabled services (chapter 7).
Figure O.4 The legal and regulatory framework for data governance remains a work in progress across all country income groupings. The figure shows the percentage of good practice laws and regulatory measures in place for countries covered by the survey, by income group (high-income, upper-middle-income, lower-middle-income, low-income), in seven areas: e-commerce framework, enabling access to public intent data, enabling access to private intent data, cybersecurity, personal data protection, nonpersonal data protection, and cross-border data flows. Source: WDR 2021 team, based on Global Data Regulation Survey conducted exclusively for this Report (https://microdata.worldbank.org/index.php/catalog/3866). Data at http://bit.do/WDR2021-Fig-O_4.
Competition and antitrust policy. Competition policy plays a critical role in ensuring that the value created by platform-based business models is equitably shared by producers and consumers. Economies of scale in data collection, together with network externalities that increase the value of platforms as more participants join them, may lead to rapid accumulation of market power. Addressing such market dominance calls for two complementary strategies. First, in countries that have sufficient capacity to enforce antitrust regulation, ex post antitrust enforcement should be applied—albeit with any adaptations that may be needed to address the challenges posed by data-driven businesses. For example, the standard test of market dominance—overpricing by a market leader—may not be meaningful in sectors where platforms routinely provide consumer services for free. However, even though several landmark antitrust cases involving platform businesses have emerged in middle-income countries, such as Egypt, India, and Mexico, this Report finds that not a single low-income country has completed such a case, despite the presence of the same globally dominant firms in these markets.
Second, in parallel with antitrust efforts, ex ante regulatory measures to make essential data accessible to rival firms and new entrants also merit serious consideration, as does empowering consumers to switch among competing providers by mandating full portability of their personal data. Care should always be taken to verify that access to data is critical for competition and does not unduly affect incentives for innovation in data-driven businesses.

Trade policy. Platform-based businesses also open up new avenues for international trade, entailing substantial cross-border flows of data (figure O.5). A country’s regulatory framework for personal data protection has a material impact on participation in such trade, creating some tension between trust and value creation. Countries have adopted a variety of approaches for dealing with this. Some, notably the federal jurisdiction in the United States, permit open data flows based on private sector standards, with limited government involvement. Others, such as China, Nigeria, the Russian Federation, and Vietnam, apply more stringent regulatory requirements: they require copies of certain personal data to be stored domestically (data localization) and state authorization for many international exchanges. In between are countries (including members of the European Union and others such as Argentina and South Africa) that make cross-border transfers of personal data conditional on whether the partner trading country offers an adequate data protection regime. This Report finds that a combination of well-defined domestic personal data protection measures with relative ease of cross-border movements appears to offer the most favorable environment for international trade in data-enabled services.23

Figure O.5 Since 1990, the global trade in data-driven services has grown exponentially and now constitutes half of trade in services
[Chart, 1990–2018: global trade in services (US$, billions, current; left axis), broken down into computer, communications, and other services; travel services; transport services; and other (unclassified), with services classified as data-driven or traditional, shown against global digital data flows measured as IP traffic (PB per month; right axis).]
Source: WDR 2021 team calculations, based on World Bank, WITS (World Integrated Trade Solution) database, http://wits.worldbank.org/WITS/. Data at http://bit.do/WDR2021-Fig-O_5.
Note: IP = Internet Protocol; PB = petabytes.

Tax policy. Even though data-driven transactions are creating more economic activity, the governments of low- and middle-income countries are struggling to share equitably in this value by mobilizing the associated tax revenues. For indirect taxes (such as value added taxes), the revenue rights are clearly allocated to the country in which the final sales are made. However, the administrative capacity to capture this revenue is typically lacking. Estimates for East Asian countries suggest that losses to fiscal revenues could amount to as much as 1 percent of gross domestic product (GDP) by 2030.24 As for direct taxes (such as corporate taxes), agreed-on international rules are lacking for allocating rights to tax businesses that operate in markets without any physical presence.
In the absence of such a consensus, an increasing number of countries have been resorting to the application of ad hoc digital service taxes as a compensatory measure.

Sound institutions and governance can improve the development impact of data

If institutions do not function well, policies and laws and regulations are unlikely to be implemented or enforced effectively, and infrastructure will not deliver on its potential. An effective institutional framework for data governance must fulfill several critical functions, such as setting policy objectives, developing supporting rules and standards, enforcing compliance with such regulations, and continually improving governance through learning and evaluation (chapter 8). For example, Uruguay’s creation of a lead agency close to the Office of the President and acting with a whole-of-government perspective has been critical in driving the country’s successful e-government reforms since 2007.

Although there is no one-size-fits-all approach for governments seeking to create robust institutional arrangements for data governance, certain institutional design characteristics are of universal importance. Institutions should be formally mandated, sufficiently resourced, and have the technical capacity needed to effectively undertake their functions in a coordinated manner across the whole of government. The main institutional actors within this framework often include data governance entities, data protection authorities, and cybersecurity agencies, as well as new types of institutions such as data trusts (accountability-oriented data intermediaries allowing individuals to pool their legal rights over data and assign them to trustees with explicit fiduciary duties). Institutional independence and functional autonomy may be critical in some cases to shield data governance institutions from undue political or commercial influence.
Behavioral and cultural norms and political economy constraints often stymie reform efforts, creating implementation gaps, especially in low- and middle-income countries. Change management, collaborative leadership, and a culture of performance and incentives can help institutions overcome barriers to implementation and coordination and effectively perform their roles and responsibilities.

To maximize buy-in from all participants in the data governance ecosystem, including society more broadly, data management must be socially inclusive and perceived as legitimate. Legitimacy is enhanced when governments manage and use data in a transparent manner and are subject to meaningful systems of accountability. Nongovernmental actors and emerging mechanisms such as data intermediaries can play an important role in the ecosystem by helping governments and end users responsibly share and use data to better harness their development value, while safeguarding against the risks of misuse or abuse. Engaging with stakeholders, across society and internationally, in a collaborative and transparent manner will foster trust and legitimacy and strengthen the social contract around data use. For example, the Association of Southeast Asian Nations (ASEAN) has adopted a regional Framework on Digital Data Governance, which helps coordinate members’ data governance arrangements with a view toward interoperability.
Moving toward an integrated national data system

A well-functioning data governance framework ensures that infrastructure, laws, economic policies, and institutions work together to support the use of data in a way that aligns with each society’s values, while protecting individuals’ rights over use of their data. This framework defines the rules, and associated compliance mechanisms, for how data can be safely shared, used, and reused by all stakeholders.

Part III of this Report concludes with an aspirational vision of an integrated national data system (INDS) that can deliver on the promise of producing high-quality data and then making those data open in a way that keeps them both protected and accessible for sharing and reuse by all stakeholders (chapter 9). The aspirational INDS works seamlessly with the governance structure. If the governance framework can be viewed as creating and enforcing the “rules of the road,” the INDS can be seen as the “network of highways” that connect all users, ensuring safe passage of data to and from destinations.

The INDS is built on an intentional, whole-of-government, multistakeholder approach to data governance. It explicitly builds data production, protection, exchange, and use into planning and decision-making across government entities and actively integrates the various stakeholders from civil society, the public sector, and the private sector into the data life cycle and into the governance structures of the system.25 A well-functioning system requires people to produce, process, and manage high-quality data; people to populate the institutions that safeguard and protect the data against misuse; and people to draft, oversee, and implement data strategies, policies, and regulations.
The system also needs people to hold the public and private sectors accountable and people capable of using data from the production process of private firms to improve policies in the public sector. All this requires robust data literacy so that a wide cross section of people benefit from an INDS. For a sound INDS, institutions and actors must also have the right incentives to produce, protect, and share data, and funding must be sufficient to implement the infrastructure and institutions needed for the system to function well. Finally, a culture of data use helps foster a high-quality supply of data and stimulate the demand for data-informed decision-making, without which the national data system is not sustainable.

When government agencies, civil society, academia, and the private sector securely take part in a national data system, the potential uses of data expand and so does the potential impact on development. In fact, the more integrated the system and the more participants involved, the higher is the potential return. If two participants safely exchange data, data can flow in two directions. If three participants exchange data, data can flow in six directions, and with four participants, in 12 directions. As data are reused and repurposed, these connections will increase rapidly. Higher degrees of integration require close coordination and shared governance between participants, but such integration is otherwise compatible with a decentralized data architecture. The system is designed to ensure that data flow freely and safely rather than remain in one place.

Even though most countries are far from the aspirational goal of a well-functioning data system, setting sights on this target can provide countries with guidance on the next steps in developing such a system. How countries move toward this vision of an INDS will depend on their current capacity and the parameters of the social contract for data.
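The direction counts in the text follow a simple rule: with n participants, each one can send data to each of the other n − 1, giving n(n − 1) ordered flows (2, 6, and 12 for two, three, and four participants). The minimal Python sketch below enumerates those flows. It assumes, for illustration only, that every participant can exchange data directly with every other and that each direction counts as a separate flow; the participant labels P1, P2, and so on are hypothetical.

```python
from itertools import permutations


def directed_flows(participants):
    """List every ordered (sender, receiver) pair of distinct participants.

    Illustrative assumption: any participant can send data directly to any
    other, and A->B and B->A are counted as separate flows.
    """
    return list(permutations(participants, 2))


if __name__ == "__main__":
    for n in (2, 3, 4):
        # Hypothetical participant labels: P1, P2, ..., Pn
        names = [f"P{i}" for i in range(1, n + 1)]
        flows = directed_flows(names)
        # The enumerated count matches the closed form n * (n - 1)
        print(f"{n} participants -> {len(flows)} directed flows (n(n-1) = {n * (n - 1)})")
```

Running it reports 2, 6, and 12 flows, matching the figures in the text. Because the count grows roughly with the square of the number of participants, adding one participant to a system of n adds 2n possible flows, before any reuse or repurposing of the data is taken into account.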
There is no singular blueprint for how to build an INDS. Instead, this Report proposes a maturity model to help assess progress. Countries in the initial stages are likely to benefit the most from establishing the fundamentals for an integrated national data system. This includes developing policies and strategies aimed at better data governance, strengthening the technical capacity of government agencies and the national statistical office to produce and use data, and promoting data literacy through education and training. With the fundamentals in place, governments can work on initiating and systemizing data flows across and between the participants in the national data system. This requires policies and standards that ensure the consistency and interoperability of data, as well as institutions and infrastructure that enable the secure exchange of data while mitigating privacy risks. At advanced levels of data maturity, the goal is to optimize the system through shared data governance and collaboration between the various stakeholders from government, international organizations, civil society, and the private sector. The structure of this system will differ from country to country, reflecting local norms for the safe reuse and sharing of data. Nonetheless, certain common attributes are needed to realize the development gains from reusing and sharing data. A well-functioning data system defines and establishes
the authority and responsibility for data production, flow, and use in a nation. This system would build on the infrastructure, policies, laws and regulations, and institutions discussed here; integrate the many sources of data; and connect all the stakeholders (figure O.6). For many countries, a system in which high-quality data flow and are used safely among various participants remains a distant vision. A low-income country suffering from high levels of poverty, fragility, and poor governance may struggle to produce even the most fundamental data, let alone set up a whole-of-government, multistakeholder approach to data governance. Yet keeping this vision in sight matters for all countries, even those struggling the most with data, because it can serve as a guide in making decisions on how to develop their data systems (box O.1).

Coalescing around a common understanding of a new social contract for data (one built on trust to produce value from data that are equitably distributed) and finding the right blueprint for building an integrated national data system (one that unleashes the value of data to improve lives through creative, innovative applications by a widening array of users) are highly aspirational goals. Achieving these goals will require significant changes in how data are produced, managed, protected, shared, and used. Making these changes will be difficult and will depend on substantial commitment and effort, but the cost of failing to change is a world faced with greater inequities and many missed opportunities.

Figure O.6 What happens in an integrated national data system?
[Diagram: Data are produced, protected, opened, quality controlled, and used and reused by academic institutions, civil society and individuals, government entities, international organizations, and the private sector. Built on the pillars of institutions, laws and regulations, infrastructure policies, and economic policies. Sustained by the foundation of funding, human capital, data demand, trust, and incentives.]
Source: WDR 2021 team.

Box O.1 Toward an integrated national data system: Country examples

Important steps in the right direction. Many countries have adopted important initiatives that embody aspects of what is envisioned in an integrated national data system. South Africa’s Department of Planning, Monitoring and Evaluation has developed a system that includes the data produced by citizens who monitor the performance of government programs. In Chile, civil society participation is mandated by the 2011 Law on Associations and Citizen Participation in Public Management, and the national statistical office has put in place a civil society council. The inclusion of multiple stakeholders in a national data system encourages sustainability and helps ensure that all participants have an opportunity to access and benefit from it. The Nepal Data Literacy Program, established in 2019, comprises a 100-hour modular, customizable pedagogy to support both technical skills building and efforts to enhance a culture of data use among Nepalis. The program is now partnering with Kathmandu University School of Management (KUSOM) to incorporate data literacy toolkits into the university programs and develop a data-driven course that will be free to other institutions and thousands of students.

A fully realized vision. In Estonia, the government has set up a national data system to safely manage citizens’ personal data for use by government agencies and participating businesses.
X-Road is an open-source data exchange layer solution that allows linked public and private databases to automatically share information, ensuring confidentiality, integrity, and interoperability between data exchange parties. It combines a technical solution (enabling technical architecture and a series of protocols) with a governance solution (the once-only principle enshrined in national law that obliges public sector agencies to refrain from duplicating data requests). Under this system, citizens have to supply government agencies and participating businesses with their information only once. It is then automatically transmitted to other participating entities. X-Road’s cryptography protocols also enhance transparency because they log entries into the system and give individuals detailed insights into who is sharing their data and for what purposes.

The X-Road arrangement both builds on and enhances Estonia’s social contract on data by providing trust, equity, and value. Its transparency engenders trust. Its national scope, available to all, promotes equity. Its ease and comprehensiveness provide value. To work well, this digital data system depends on some “analogue” components. Cooperation is fostered between government and the private sector and between components of infrastructure. Change management is built into the entire system, from its foundations in national law (and the social contract) to its design, uptake, and upkeep. A culture of trust and sharing (data sharing) is encouraged.

Notes
1. Chapter 1 reviews definitions of data and describes how the term data is used in this Report.
2. Amnesty International (2019); Zuboff (2019).
3. Scheer and Cohen (2020).
4. Scheer and Cohen (2020).
5. World Bank (2017).
6. FMWR (2018).
7. World Bank, Statistical Performance Indicators (database), http://documents.worldbank.org/curated/en/815721616086786412/Measuring-the-Statistical-Performance-of-Countries-An-Overview-of-Updates-to-the-World-Bank-Statistical-Capacity-Index.
8. Brynjolfsson, Hitt, and Kim (2011).
9. For an extensive discussion of this problem and many other concerns about machine learning, see O’Neil (2017).
10. For an early reference to “garbage in–garbage out” in the statistical literature, see Parzen (1964).
11. Aiken et al. (2020).
12. Burke and Lobell (2017); Osgood-Zimmerman et al. (2018).
13. An early illustration of how data can be used to improve accountability for public expenditure can be found in a study of the use of education budgets in Uganda by Reinikka and Svensson (2001).
14. See Broadband Commission for Sustainable Development, International Telecommunication Union, “Target 3: Connectivity” (accessed October 31, 2020), https://broadbandcommission.org/Pages/targets/Target-3.aspx.
15. Chen (2021). Analysis is based on Access Survey 2017–18 data collected by Research ICT Africa in 22
low- and middle-income countries across Africa, Asia, and Latin America.
16. GSMA (2019).
17. World Bank analysis of World Trade Organization ad valorem duties for “Telephones for cellular networks ‘mobile telephones’ or for other wireless networks” (Harmonized System code 851712).
18. ITU et al. (2018).
19. COE (2018).
20. Safeguards for personal data are grounded in a human rights framework based on international law. These safeguards have their origin in the establishment of the “rule of law” with the expression of individual rights in the Enlightenment and were codified in international law after World War II. They were further refined in the context of analog data in the 1970s and 1980s with the Fair Information Practices, the Council of Europe’s 1981 Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (Convention 108), and the first guidelines issued by the Organisation for Economic Co-operation and Development (OECD). The OECD guidelines and Convention 108 were updated in the digital context after launch of the World Wide Web in 1995 and continue to evolve.
21. Madrigal (2012).
22. OECD (2019).
23. Ferracane and van der Marel (2021).
24. Al-Rikabi and Loeprick (forthcoming).
25. An integrated national data system does not imply that all data are integrated in a national database. Instead, various participants are integrated in a system in which data are safely flowing and used. This is akin to a national statistical system in the sense that an ensemble of participants jointly collects, protects, processes, and disseminates official statistics. But unlike in the national statistical system, the scope of an integrated national data system goes well beyond official statistics; it requires an intentional approach to governing the participants and their roles.

References
Aiken, Emily L., Guadalupe Bedoya, Aidan Coville, and Joshua E. Blumenstock. 2020.
“Targeting Development Aid with Machine Learning and Mobile Phone Data: Evidence from an Anti-Poverty Intervention in Afghanistan.” In COMPASS ’20: Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies, 310–11. New York: Association for Computing Machinery.
Al-Rikabi, Jaffar, and Jan Loeprick. Forthcoming. “Simulating Potential Tax Revenues from Data-Driven Platform Businesses in East Asia.” WDR 2021 background paper, World Bank, Washington, DC.
Amnesty International. 2019. “Surveillance Giants: How the Business Model of Google and Facebook Threatens Human Rights.” Report POL 30/1404/2019, Amnesty International, London. https://www.amnesty.org/en/documents/document/?indexNumber=pol30%2f1404%2f2019&language=en.
Brynjolfsson, Erik, Lorin M. Hitt, and Heekyung Hellen Kim. 2011. “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” SSRN Scholarly Paper ID 1819486, Social Science Research Network, Rochester, NY. doi.org/10.2139/ssrn.1819486.
Burke, Marshall, and David Lobell. 2017. “Satellite-Based Assessment of Yield Variation and Its Determinants in Smallholder African Systems.” PNAS, Proceedings of the National Academy of Sciences 114 (9): 2189–94. doi.org/10.1073/pnas.1616919114.
Chen, Rong. 2021. “A Demand-Side View of Mobile Internet Adoption in the Global South.” Policy Research Working Paper 9590, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/492871616350929155/A-Demand-Side-View-of-Mobile-Internet-Adoption-in-the-Global-South.
COE (Council of Europe). 2018. “Convention 108+: Convention for the Protection of Individuals with Regard to the Processing of Personal Data.” COE, Strasbourg. https://rm.coe.int/convention-108-convention-for-the-protection-of-individuals-with-regar/16808b36f1.
Ferracane, Martina Francesca, and Erik Leendert van der Marel. 2021.
“Regulating Personal Data: Data Models and Digital Services Trade.” Policy Research Working Paper 9596, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/890741616533448170/Regulating-Personal-Data-Data-Models-and-Digital-Services-Trade.
FMWR (Federal Ministry of Water Resources, Nigeria). 2018. “National Action Plan for Revitalization of the WASH Sector.” FMWR, Abuja.
GSMA (GSM Association). 2019. “Connected Society: State of Mobile Internet Connectivity 2019.” GSMA, London. https://www.gsma.com/mobilefordevelopment/resources/the-state-of-mobile-internet-connectivity-report-2019/.
ITU (International Telecommunication Union), World Bank, ComSec (Commonwealth Secretariat), CTO (Commonwealth Telecommunications Organisation), and NATO (Cooperative Cyber Defence Centre of Excellence, North Atlantic Treaty Organization). 2018. “Guide to Developing a National Cybersecurity Strategy: Strategic Engagement in Cybersecurity.” ITU, Geneva. https://www.itu.int/pub/D-STR-CYB_GUIDE.01-2018.
Knippenberg, Erwin, and Moritz Meyer. 2020. “The Hidden Potential of Mobile Phone Data: Insights on COVID-19 in The Gambia.” Data Blog (blog), September 10, 2020. https://blogs.worldbank.org/opendata/hidden-potential-mobile-phone-data-insights-covid-19-gambia.
Madrigal, Alexis C. 2012. “Reading the Privacy Policies You Encounter in a Year Would Take 76 Work Days.” Atlantic, March 1, 2012. https://www.theatlantic.com/technology/archive/2012/03/reading-the-privacy-policies-you-encounter-in-a-year-would-take-76-work-days/253851/.
OECD (Organisation for Economic Co-operation and Development). 2019. Enhancing Access to and Sharing of Data: Reconciling Risks and Benefits for Data Re-Use across Societies. Paris: OECD. https://www.oecd-ilibrary.org/science-and-technology/enhancing-access-to-and-sharing-of-data_276aaca8-en.
O’Neil, Cathy. 2017.
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York: Broadway Books.
Osgood-Zimmerman, Aaron, Anoushka I. Millear, Rebecca W. Stubbs, Chloe Shields, Brandon V. Pickering, Lucas Earl, Nicholas Graetz, et al. 2018. “Mapping Child Growth Failure in Africa between 2000 and 2015.” Nature 555 (7694): 41–47. doi.org/10.1038/nature25760.
Parzen, Emanuel. 1964. “Review of Smoothing, Forecasting and Prediction of Discrete Time Series, by Robert Goodell Brown.” Journal of the American Statistical Association 59 (307): 973–74. doi.org/10.2307/2283122.
Reinikka, Ritva, and Jakob Svensson. 2001. “Explaining Leakage of Public Funds.” Policy Research Working Paper 2709, World Bank, Washington, DC.
Scheer, Steven, and Tova Cohen. 2020. “Israel Extends Coronavirus Cell Phone Surveillance by Three Weeks.” Emerging Markets (blog), May 5, 2020. https://www.reuters.com/article/us-health-coronavirus-israel-surveillanc/israel-extends-coronavirus-cell-phone-surveillance-by-three-weeks-idUSKBN22H11I.
World Bank. 2017. A Wake Up Call: Nigeria Water Supply, Sanitation, and Hygiene Poverty Diagnostic. WASH Poverty Diagnostic Series. Washington, DC: World Bank. https://openknowledge.worldbank.org/handle/10986/27703.
World Bank. 2019. “Tanzania, Mainland Poverty Assessment 2019: Executive Summary.” World Bank, Washington, DC. https://www.worldbank.org/en/country/tanzania/publication/tanzanias-path-to-poverty-reduction-and-pro-poor-growth.
Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: PublicAffairs. https://www.hbs.edu/faculty/Pages/item.aspx?num=56791.
PART I
Advancing development objectives through data
1. Harnessing the value of data for the poor
2. Data as a force for public good
3. Data as a resource for the private sector
4. Creative reuses of data for greater value
CHAPTER 1
Harnessing the value of data for the poor

Main messages
1. Data can improve people’s lives in many ways. However, economic and political factors typically prevent benefits from being shared equitably.
2. The value of data for development is largely untapped. Realizing data’s full value entails repeatedly reusing and repurposing data in creative ways to promote economic and social development.
3. The challenge is to develop a trust environment that safeguards against harmful misuse of data as they are exchanged between parties and enables data to be created, reused, and repurposed.
4. A strong data governance framework, composed of appropriate policies, laws, regulations, and institutions, is needed to ensure that the full value of data is realized and shared safely and equitably.
  • 44. 24 | World Development Report 2021 The untapped potential of data to serve development objectives A t the turn of the nineteenth century, English sociologist Seebohm Rowntree interviewed a sample of families with the aim of better understanding the poverty experienced not only by thoseheinterviewed,butalsobyeveryoneinthetown of York.1 The findings from this work changed pre- conceptions by revealing that poverty was pervasive outside of London and by demonstrating that people cycled in and out of poverty over the course of their lives. How to turn data into information and infor- mation into insights that can help the poor is at the heart of this Report (see box 1.1 on the use here of the term data). In the twenty-first century, data possess the power to be truly life-changing. Most of the new and fascinating ways in which data affect the lives of many of us worldwide are linked to people being able Box 1.1 What this Report means by data The term data is difficult to define. It has meant different things at different times, and in different disciplines. Originally simply defined as facts, the term slowly came to mean facts as they related to mathematical representations. Despite the changing nature of data, most people would not have thought of things such as pictures, sounds, or words as data even as recently as a few decades ago. 
But times have changed, and major advances in computing power, together with innova- tive thinking, have resulted in, for example, radiomics, the science of converting medical images into data that, once structured and analyzed, can help improve a patient’s diagnosis and prognosis.a Similarly, sound can now be digitized and analyzed to, for example, explore and better understand the galaxies.b And the growing field of text analytics converts words (such as keywords from Google searches) into structured data that help us better understand many social phenomena.c Because the evolving definition of data stems simply from technolog- ical advances in computing and creative thinking, it is challenging to provide a specific description of data that would not soon seem archaic or anachronistic. In very general terms, Carrière-Swallow and Haksar point out that “data can be quantitative or qualitative in nature, and may be stored on analog (that is, paper, stone tablets) or digital media.”d This view conforms with how this Report uses the term. Indeed, some data are still collected on paper in many countries. Process- ing these data—digitizing them and entering them in a spreadsheet or database—allows them to be more easily analyzed, but a digital format is not necessarily an attri- bute of data. The Organisation for Economic Co-operation and Development (OECD) states broadly that data are “characteristics or information, usually numerical, that are collected through observation.” More specifically, data are “the physical representation of information in a manner suitable for communication, interpretation, or processing by human beings or by automatic means.”e Although this description aligns fairly well with how the term is used in this Report, a few distinctions are worth noting. Here, data are sometimes collected through observation, though they need not be. Data can be the result of digital transactions or simply by-products of our daily digital lives. 
Also, in this Report, data are not synonymous with information. Rather, data must be processed, structured, and analyzed to be converted into information. This semantic distinction between data and information emphasizes the critical role of improved data management, literacy, and analysis for extracting information, and creating value, from data. An expansive description of data that resonates well with how the term is used in this Report is provided by the UK National Data Strategy: When we refer to data, we mean information about people, things and systems. . . . Data about people can include personal data, such as basic contact details, records generated through interaction with services or the web, or information about their physical characteristics (biometrics)—and it can also extend to population-level data, such as demographics. Data can also be about systems and infrastructure, such as administrative records about businesses and public services. Data is increasingly used to describe location, such as geospatial refer- ence details, and the environment we live in, such as data about biodiversity or the weather. It can also refer to the information generated by the burgeoning web of sensors that make up the Internet of Things.f a. Gillies, Kinahan, and Hricak (2015); Yala et al. (2021). b. See, for example, Leighton and Petculescu (2016). c. See, for example, Stephens-Davidowitz (2017). d. Carrière-Swallow and Haksar (2019, 17). e. Organisation for Economic Co-operation and Development (OECD), “Glossary of Statistical Terms: Data,” OECD Statistics Portal, https://stats .oecd.org/glossary/detail.asp?ID=532. f. See “What We Mean by Data” (DCMS 2020).
to extract greater value from data. Indeed, the data produced by people can be used in innovative ways to help them, but one does not have to be the producer or user of data to benefit from the data revolution. In fact, data collected from a small sample of people can often help shape policy to improve the lives of a vastly larger population, whether they were part of the sample or not—just as Seebohm Rowntree revealed in his pioneering efforts. But for such approaches to work, the samples must be truly representative of the population, including the poor and other marginalized groups. And yet both traditional censuses and sample surveys, as well as new data sources captured by the private sector, may fail to fully cover the most disadvantaged groups.
An important attribute of data is that using them does not diminish their potential to be reused for some other purpose—data are inexhaustible. But reusing or repurposing data typically requires well-functioning data systems that facilitate the safe flow of data in formats that make the data valuable to many users. These systems, however, typically do not function well in many low- and middle-income countries.
Moreover, data have a dark side. Making data accessible to more users and creating systems that facilitate their reuse also opens the door for data to be misused in ways that can harm individuals or development objectives. With lives becoming increasingly intertwined in the digital world, each day brings new concerns about protecting personal data; misinformation; and attacks on software, networks, and data systems. Well-functioning data systems thus balance the need to safeguard against outcomes that harm people with the need to enable data to improve lives. This Report returns often to the need to strike this balance between safeguarding and enabling.
The findings and recommendations in this Report are drawn from an extensive array of material, including academic research, international development agency reports, commercial experiences, and a series of consultations with innovators and stakeholders in the data world. Although this Report reinforces and builds on findings from World Development Report 2016: Digital Dividends,2 the World Bank report Information and Communications for Development 2018: Data-Driven Development,3 and many reports on digital technology, this Report differs by focusing on how data themselves, rather than the adoption of digital technology, can improve the lives of poor people.4
World Development Reports often synthesize established findings from analytical work and research, but the issues and content surrounding data are evolving rapidly. Many of the topics covered continue to be widely debated in rich and poor countries alike. Consensus has yet to emerge, and research is at an early stage, particularly on how these issues affect low- and middle-income countries. The goal, therefore, is not to be overly prescriptive, but to develop frameworks to help policy makers and countries think through the trade-offs and adopt a balanced approach to developing both safeguards and enablers. Countries should make the most of data, but safely, and as appropriate for their social, political, and economic context.
The growing literature on data over the last few years is largely written from a high-income country perspective.5 This Report therefore sets out to fill the large gap in the literature on the effects of data on poor people and poor countries.
A brief history of data
Many of the themes of this Report were emerging even in the earliest days of data collection and use. For millennia, people have been collecting data.
The oldest censuses date back to at least 2000–1000 BCE in ancient Egypt, Greece, and China, where people, livestock, and food items were enumerated.6 The Romans fielded a census of men and their possessions every five years—a practice referenced in the Christian Bible.7 Over the long history of data collection, the type of data collected and the ways data have been used have changed as societies’ priorities, values, power structures, and government objectives have changed.8
Record keepers in the Incan Empire between 1400 and 1500 CE counted people, dwellings, llamas, marriages, and potential army recruits.9 Rulers and administrators gave priority to counting sources of wealth and power considered of strategic importance (the data were kept secret from the public). They collected information first and foremost on property, for taxation, and on men, for military recruitment and labor force purposes, and they also enumerated newly conquered peoples and territories. With little reason to believe that the data being collected were meant to improve lives, distrust was widespread—it was not uncommon for citizens to resist being counted or having their possessions counted.10
The ascent of Enlightenment ideals in eighteenth-century Europe, with their emphasis on objective scientific inquiry, brought a shift in attitudes toward the role of data in society—from simply counting and registering phenomena to describing and understanding living conditions for society as a whole.11 During this era, and under the influence of the leading intellectuals of the day, notions of the rule of law (and accountability of states) evolved, a social contract between the individual and the state coalesced, and the Declaration of the Rights of Man and of the Citizen emerged.12 These became the foundational elements of current discussions of a social contract for data based on human rights (see chapter 6).
Starting in the late eighteenth century, governments of the emerging nation-states in Europe and North America established statistical agencies to publish official statistics on the state of the nation and to inform public discourse. European nations began systematically conducting full-fledged population censuses, and a decennial national census became a provision of the US Constitution. By the end of the nineteenth century, half of the world’s population had been enumerated in censuses (figure 1.1).13
These advances also led to some of the innovations in statistics and social science research methods that enabled the rise of the sample survey. The earliest examples of sampling date back to the late seventeenth and early eighteenth centuries, but they lacked the theoretical foundations to justify the method.14 Sampling remained highly controversial throughout the nineteenth century, but methodological advances, especially the concept of random sample selection, led to its gradual acceptance in the early twentieth century. A series of influential articles in the 1930s, 1940s, and 1950s filled the gaps in the theoretical foundations of survey sampling around the same time that sampling frames with universal coverage became available.15 Sample surveys grew enormously popular, especially in the United States, quickly covering a wide range of topics.
Modern geospatial data systems developed along a similar timeline.
Building on the much older science of cartography, this type of data is rooted in the thematic maps of the eighteenth and nineteenth centuries. Its goal was to relate geography to other types of information.16 A prominent early application was the spatial mapping of disease outbreaks—for example, of yellow fever in New York City at the end of the eighteenth century and especially of cholera in British and other European municipalities during the pandemics of the nineteenth century.17 Most prominent among these is the map of London by physician John Snow. During the 1854 cholera outbreak, Snow plotted cholera-related deaths in London together with the city’s water pumps, identifying a high concentration of cases close to a pump on Broad Street and deducing that water from this pump was causing infections (map 1.1). New cases in the area stopped almost entirely once the pump had been removed.18 Since the advent of Snow’s map, innovations in printing and computer technology as well as the rise of remote sensing have made geospatial data and their applications versatile and ubiquitous.19
Figure 1.1 The share of people counted in a census grew from about 1 in 10 in 1850 to 9 in 10 today. [Chart: world population (billions) by decade, 1850 to 2010, with bars showing the percentage of the population enumerated.] Source: Whitby 2020. Data at http://bit.do/WDR2021-Fig-1_1. Note: The shaded area represents the world’s population; the bars indicate the percentage of the population that was enumerated each decade.
With the digital revolution, the types and scope of data have changed dramatically, and the volume of data collected has grown exponentially. In this new landscape, private sector actors are playing an increasingly large role in data collection through platform-based business models in which data are collected passively as a by-product of business processes. Digital platforms have also expanded the opportunities for citizens to collect data themselves, often where governments fail to do so (see spotlight 1.1). Examples include Utunzi, a platform that allows individuals and organizations to report and document violence against LGBTQI individuals,20 and various platforms that allow users to report air pollution levels, deforestation, and other location-specific environmental data to raise awareness and spur action.
The foundational origins of data protection laws can be linked to the Enlightenment era. Although there is a clear arc from these historical concepts of rights governing interactions between the state and the individual to principles guiding data protection, most policies guiding data regulation are very modern (see chapter 6).
The principles of data protection can trace their immediate roots to the US Fair Information Practice Principles developed in the 1970s, which formed the basis for the 1980 OECD (Organisation for Economic Co-operation and Development) Guidelines (revised in 2013).21 Similarly, the basic substantive rights and obligations in the European Union’s General Data Protection Regulation, reflected first in its 1995 Directive on the Protection of Personal Data, trace their roots to the OECD Guidelines.22
A data typology
Although data can be used to improve development outcomes, the challenges differ across data types. To help readers conceptualize these data types and better understand those challenges, this Report sorts data types using a two-dimensional framework (table 1.1).
In the first dimension, data are classified based on whether the original intent was for public or commercial purposes. Both new and traditional types of data collected for commercial purposes are called private intent data. Data originally collected for public purposes are called public intent data, regardless of the collection instrument or the entity that manages the data. Public intent data tend to be collected purposefully with a view toward representativeness. By contrast, private intent data are self-selecting in that they cover only users of cellphones and the internet, for example, and collection of these data may be more incidental.
The second dimension distinguishes between “traditional” and “new” data collection methods. Public intent data are typically associated with traditional data types such as censuses and surveys, although newer sources of data (such as from satellite imaging or e-government platforms) have become more prevalent. By design, traditional data collection efforts by governments are for public purposes and are used to inform policy making.
But because the collection of public data via traditional methods tends to be relatively costly,23 surveys are conducted infrequently,24 and they often lack the granularity necessary to make meaningful inferences about subpopulations of interest. Meanwhile, traditional public intent data offer important advantages over new private intent data in terms of their coverage of the population—and thus their potential to benefit more people—and their format, which makes them amenable to inferential analytics by researchers and government officials.
Private intent data are often associated with new sources of data produced using digital tools and applications that are growing rapidly. Compared with traditional public intent data, new private data sources offer greatly improved timeliness, frequency, and granularity of data, but they may not be representative in coverage. New private intent data can contribute significantly to addressing public sector development challenges. Private intent data collected through cellphones, internet usage, satellites, remote sensors, and other sources provide information about individuals and geographic locations that traditional surveys simply cannot.
Map 1.1 John Snow’s innovative mapping of the cholera epidemic in London in 1854 revolutionized tracing of the disease. Source: Ball 2009. Map segment reproduced from John Snow, On the Mode of Communication of Cholera, 2nd ed. (London: John Churchill, 1855). Note: The solid black rectangles of various sizes represent deaths from cholera.
Data collection methods and tools | Public intent data | Private intent data
Traditional | Census, national accounts, household surveys, enterprise surveys, labor force surveys, surveys of personal finance, administrative records | Any survey conducted by private entities, including public opinion surveys deployed by private entities; administrative data from company financial accounts
New | Location data from satellite imaging, digital identification, facial recognition from public cameras, public procurement data from e-government platforms | Just-in-time digital data on individual behavior/choices from digital platforms
Any simple framework used to classify data types carries limitations. Although public intent data have long been collected largely using traditional methods, those methods are being updated and adapted. The new methods will increasingly supplement or replace traditional methods, and so the traditional–new differentiation in table 1.1 is likely to evolve.
The distinction between public and private stewardship of data also may not be a salient one in some cases. For example, citizen-generated data—data that people or their organizations produce to directly monitor, demand, or drive change on issues that affect them—can be produced through crowdsourcing mechanisms or citizen reporting initiatives, and such data are often organized and managed by civil society groups. The data may reside with a private entity, but they are clearly collected for public purposes.
Although data gathered through new methods for private purposes offer tremendous potential to improve timeliness and detail through massive sample size, they are not a panacea for the shortcomings of public intent data collected using more traditional methods. For one thing, private firms have little incentive to curate their data for sharing, and thus these data are not readily amenable to public use. A potentially more difficult challenge is coverage. Data collected for public policy purposes are almost always designed to represent the relevant current population (such as individuals, firms, health facilities, students, or schools). However, survey designers face challenges in meeting the representativeness objective in terms of both coverage (such as underrepresentation of slum inhabitants, top earners, or informal enterprises) and timeliness (due to delays in data processing). By contrast, collectors of private intent data rarely need or have an interest in full population coverage; they focus much more on specific subgroups (such as consumers and suppliers). Thus, even though these data can be massive in sample size and very timely, they can provide only a partial reflection of the population.
A study from the United Kingdom examined data from a variety of social media platforms and found that none was representative of the population, particularly underrepresenting the elderly, the less well educated, and lower-income people.25 Public policies and programs need to be informed by data that represent the relevant population. For this reason, private intent data should not be viewed as a substitute for public intent data in understanding the scope of many development problems (box 1.2).
Table 1.1 Examples of data types based on original intent and collection methods. Source: WDR 2021 team.
That said, the joint use of public intent data collected using traditional methods and newer sources of private intent data offers interesting opportunities to reap significantly more value added than the isolated use of one kind of data or the other. A key theme of this Report is that governments should take advantage of complementarities between new and traditional data to confront development challenges. For example, because the majority of the world’s poor live in rural areas and derive their livelihoods from the land, measuring agricultural productivity is central to policies and programs to eliminate extreme poverty. Yet recent research has shown that agricultural productivity, specifically crop yield, is poorly measured with traditional survey approaches that rely on farmer-reported information on crop production and land areas.26 When sample surveys rely instead on objective measurement methods, the resulting data not only accurately capture crop yields at surveyed locations, but also can be used
Box 1.2 Innovation in traditional surveys: A COVID-19 example in Brazil
A prime example of the importance of traditional surveys and their potential for innovation comes from Brazil. In May 2020, it was one of the first countries to complete nationally representative surveys to produce data on the prevalence of COVID-19.a Fieldworkers clad in personal protective equipment conducted a serology test on randomly selected household members. This test detects the presence of antibodies in the blood as a response to a specific infection, such as COVID-19—that is, it detects the body’s immune response to the infection caused by the virus rather than the virus itself. While waiting for the results of the test, the fieldworkers administered a brief questionnaire to collect sociodemographic data and asked the tested household member whether she or he was experiencing symptoms associated with COVID-19.b
Asking questions about symptoms enabled the research team to estimate rates of asymptomatic infection. Sociodemographic questions, especially those about work and travel outside the home, enabled the team to measure how much a household member adhered to social distancing guidelines. The test results were conveyed to the household member before the fieldworkers left the dwelling, and information on positive tests was sent to health authorities to help them track the spread of the virus.
In May, 25,025 interviews in 133 “sentinel cities” were completed in the baseline survey. Cities were chosen because of their primacy in the local region as hubs of commerce and services for surrounding urban and rural areas. The survey was conducted three more times, the most recent round in late August 2020. Multiple survey rounds enabled researchers and public health officials to track the spread of the virus over time by region.
At least two findings based on these serology tests and the interviews are striking. First, COVID-19 infections were far more prevalent than had been recorded. Overall seroprevalence—the share of the population that tested positive for the pathogen—for the 90 cities with a sample size of 200 or greater was 1.4 percent in the baseline survey. Extrapolating this figure to the full population of these cities, which together account for 25 percent of the country’s population, produced an estimate of 760,000 cases, compared with the 104,782 cases reported for those cities in official statistics as of May 13, 2020. In the fourth round of the survey in August, the seroprevalence rate had climbed to 3.8 percent.c
Second, there was a remarkably wide regional variation in seroprevalence around the 1.4 percent national average, ranging from less than 1 percent in most cities in the South and Center-West regions to 25 percent in the city of Breves in the Amazon (North region). Eleven of the 15 cities with the highest seroprevalence were in the North. The six cities with the highest seroprevalence were located along a 2,000-kilometer stretch of the Amazon River. Beyond geography, seroprevalence varied across ethnic groups and was highest among indigenous populations (3.7 percent in the baseline survey). Understanding the scope of the overall problem and identifying regions and populations with the most pressing needs would not have been possible without population-based surveys.
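The extrapolation described in the box can be checked with simple arithmetic. The sketch below uses only the rounded figures reported above; the implied city population it derives is a back-calculation, not a figure published by the survey team, and the published estimate may include adjustments not described here.

```python
# Rounded figures reported in box 1.2 for the baseline round (May 2020).
seroprevalence = 0.014       # 1.4 percent tested positive in the 90 cities
estimated_cases = 760_000    # extrapolated infections for those cities
reported_cases = 104_782     # official count for the same cities, May 13, 2020

# An extrapolated count is prevalence x population, so dividing the estimate
# by the prevalence back-calculates the population it implies.
implied_population = estimated_cases / seroprevalence

# Number of infections the survey implies for each officially reported case.
undercount_ratio = estimated_cases / reported_cases

print(f"Implied population of the 90 cities: {implied_population / 1e6:.1f} million")
print(f"Estimated infections per reported case: {undercount_ratio:.1f}")
```

Run as written, this prints an implied population of about 54.3 million and roughly 7.3 estimated infections for every officially reported case, which is the gap between survey-based and official figures that the box highlights.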
These survey data also provided information on the effectiveness (or lack thereof) of approaches adopted to combat the spread of the disease.
Broad support for investigating something as important and urgent as the prevalence of COVID-19 might have been expected, and yet opposition sprang up in some quarters. For example, in some areas sample size was suppressed by the rapid spread of disinformation through social media that characterized the interviewers as “swindlers,” or even as part of a plot to spread the virus. In 27 cities, interviewers were arrested, and in eight cities the tests were destroyed by the local police force.d Overall, however, the example illustrates the importance of population-based surveys (and public intent data in general) for understanding the scope and nature of disease spread.
a. Hallal, Hartwig, et al. (2020). Brazil is the only country in Latin America to complete a national survey.
b. Hallal, Horta, et al. (2020).
c. UFPEL (2020).
d. Hallal, Hartwig, et al. (2020).
to inform and develop remote sensing models that combine data from surveys and satellites to provide highly localized crop yield estimates across entire regions and countries beyond the locations in which sample surveys are conducted.27
Both public intent and private intent data have advantages and disadvantages and pose distinct challenges in terms of reuse and exchange to achieve development objectives. But because public intent and private intent data have inherent complementarities, they can be used jointly to bolster development. A ministry of health would be able to issue better public policy if it could connect its health data with those of other ministries such as education, labor, and planning, as well as with those of health providers, whether public or private, around the country. A private firm would be able to operate more effectively if it could link its data with other sources of information, such as satellite data on population density and socioeconomic data on wealth and well-being.
Combining the two types of data could advance evidence-based policy through more precise and timely official statistics that are produced more cheaply, while preserving the representativeness characteristic of public intent data. For example, building on the well-established infrastructure for socioeconomic surveys conducted by governments, satellite data and call detail records from mobile phones offer new opportunities for updating poverty estimates for small areas more frequently. More generally, the high frequency of data collected for commercial purposes holds promise for producing better estimates of current socioeconomic conditions when large-scale, costly surveys such as censuses or integrated household surveys such as those of the World Bank’s Living Standards Measurement Study are infrequent.
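The idea of using frequently collected data to update an estimate between infrequent surveys can be illustrated with a deliberately simple sketch. It assumes, purely for illustration, a straight-line relationship between a hypothetical high-frequency indicator (labeled here as a nighttime-lights index) and the poverty rate measured in past survey rounds. All numbers are invented, and real small-area estimation or nowcasting models are considerably more elaborate and must quantify their prediction error.

```python
# Hypothetical survey rounds: a high-frequency indicator observed in the same
# years as the survey (e.g., a nighttime-lights index) and the survey-based
# poverty rate in percent. All values are invented for illustration.
survey_indicator = [10.0, 20.0, 30.0]
survey_poverty_rate = [41.0, 29.0, 20.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(survey_indicator, survey_poverty_rate)

# In a year with no survey, only the indicator is observed; the fitted line
# turns it into a provisional poverty estimate.
current_indicator = 25.0
nowcast = intercept + slope * current_indicator
print(f"Provisional poverty estimate: {nowcast:.2f} percent")
```

The straight-line fit is chosen for transparency rather than realism: it keeps the logic visible, but with only a few survey rounds the provisional estimate is fragile, which is why the survey infrastructure remains the anchor against which such estimates are calibrated.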
Real-time data on prices, nighttime lights, or trade flows could be used to help “nowcast” macroeconomic data (that is, estimate the current value of an indicator that is published only with a lag) to avoid lags in availability.
The economics of data and political economy issues
The potential to extract further value from the proliferation of data is significant because data are inexhaustible or “nonrival”—that is, a person’s call detail records, location history, internet usage, and medical records, among other things, can be used repeatedly by firms and governments for different purposes without depleting them.28 This property is illustrated by the data life cycle (figure 1.2), which depicts the potential circularity of data use, reuse, and repurposing, as long as data can be made safely accessible across a wide array of users and unless explicit steps are taken to destroy the data.
Figure 1.2 The data life cycle. [Diagram stages: create/receive, store, process, analyze and use, transfer/share, archive and preserve, reuse, destroy.] Source: WDR 2021 team.
Because of the nonrival character of data and the implications for their limitless reuse, it is inherently difficult to place an economic value on data, although many attempts have been made to do so. The diverse approaches taken range from a cost-based methodology that adds up different components of the information value chain;29 to approaches that directly quantify the economic benefits that data yield by improving efficiency, reducing transaction costs, or expanding markets;30 to estimates based on the stock market value of data-intensive companies and related acquisition transactions.31 Although all of these approaches agree on the high value of data, the partial nature of these estimates, together with the heterogeneity of the approaches, prevents any definitive conclusion.
In view of today’s increasingly sophisticated application of machine learning and artificial intelligence to drive data-based innovations, it is quite conceivable that the economic value of unanticipated secondary uses of data may far exceed the value of the primary use of data—that is, the use for which they were originally collected. These characteristics raise the prospect of serious underinvestment in data collection from a social perspective because the entities bearing the cost of data collection will not necessarily be the ones capturing its full economic value.
At the same time, data are not a pure public good because they are excludable, allowing the entity that originally collects and holds the data to prevent others from accessing them.
Examples abound across the public sector of the unwillingness of data holders to share data with other government entities and the public. In the private sector, firms may not want to sell or exchange their data with others, perhaps because governments and firms lack the capacity to share or exchange their data in a safe manner, or more simply because of a lack of incentives (or legal requirements) to make the data available. In some cases, reuse and sharing of data may cede informational advantages to competing firms in the private sector or rival entities in the public sector. Although the excludability of data suggests that they could readily be traded on markets, other economic characteristics complicate this classical approach to addressing allocation issues (see box 1.3).
A fundamental reason for the lack of incentives to share, sell, or exchange data lies in the considerable economies of scale that accrue to holding data and the associated economic or political power that they bring to the data holder. Although the returns to the first few bits of data are essentially zero, there is a point past which the returns from additional data, and from improvements in the systems supporting these data, are substantial and increasing until they ultimately level off.32 For example, in the field of artificial intelligence the size of datasets is a critical determinant of the accuracy of predictive algorithms.33 Modern deep-learning techniques, with their complex models, have an even more voracious appetite for large datasets than traditional machine learning, and they may not begin to experience diminishing returns until datasets reach tens or hundreds of millions of data points.
In addition to economies of scale, data are characterized by economies of scope because combining different types of related datasets can yield insights that otherwise would have been unavailable from one type of data alone.
Similarly, weak and seemingly very tangential relations can be identified through machine learning techniques with larger and larger volumes of data. For example, Google’s search engine data may be used to evaluate the effectiveness of advertising on YouTube, which is also owned by Google.34 Social media platforms can also track users’ behavior to build very detailed advertising profiles.
Imbalances in information sharing, concentrations of power, and equity concerns: A dark side of data
Such strong economies of scale and scope in data, and the resulting assemblage of valuable information by some actors at the potential expense of those who are excluded from the transaction, may lead to a concentration of power—economic or political—in the hands of those with privileged access to large volumes of data.35 In the private sector, market forces are likely to lead to data agglomeration and market concentration in data-driven businesses, which may preclude entry by small firms and eventually create conditions for the abuse of market power. Today, the firms that control the greatest agglomerations of data are among the world’s largest. The concentration of personal information in a handful of companies raises concerns about market power and discrimination. A key theme throughout this Report is balancing the gains in efficiency that new data bring with such equity concerns.
On the government side, data agglomeration may lead to a concentration of personal information, which can be used to amass and maintain political power, discourage dissent, and even discriminate against some population segments. Measures that limit and neutralize this kind of dominance founded on the control of data need to be central to any data governance framework.
Because reliable statistics can expose poor policy decisions and performance, dilute power, and increase public scrutiny and pressure on governments, vested interests can be expected to intervene to distort decisions about the collection, reuse, and sharing of data. And indeed this Report finds strong associations among country statistical performance, independence of national statistical offices, and freedom of the press, controlling for country size and income level (chapter 2). The patterns indicate that a free and empowered press is a critical check on government power and an important facilitator of statistical independence and data transparency.
Alternative data sources can provide a check on political influences when the accuracy or impartiality of official statistics is in question. For example, online prices obtained through web scraping have been used to construct daily price indexes in multiple countries, providing a comparison with official inflation figures. Researchers found that from 2007 to 2011, when Argentina reported an average annual inflation rate of 8 percent, online data indicated that the rate exceeded 20 percent.36 The higher figure was consistent with inflation expectations from household surveys conducted at the time and similar to estimates of
Box 1.3 The challenges of trading data through markets
From an economic perspective, it seems plausible that the problem of access to data is best solved by first defining clear economic property rights over data and then allowing parties to trade in data. However, the limited nascent economic literature on this subject suggests that for two reasons these propositions are not as straightforward as they may initially seem.
First, legal and economic challenges confound the definition of property rights over data.
A central issue is the ambiguity involved in allocating property rights between the data subject and the data collector, each of which has some legitimate claim to be the “data owner.” Present legal frameworks such as the European Union’s General Data Protection Regulation allocate certain spe- cific rights to the data subject, implicitly leaving residual rights to the data collector as a purely de facto property right.a Typically, a greater degree of data protection will benefit the data owner to the detriment of other potential data users and vice versa. This finding suggests that there is an economically optimal level of data protection. How- ever, without efficient allocation of property rights, this social welfare–maximizing outcome will not be attained.b The large synergies and complementarities that arise across different types of data (economies of scope) raise the concern that fragmented ownership patterns will prevent them from being realized, whether through strategic behavior or through technical barriers such as lack of interoperability. However, the classic trade-off between the static objective of making data widely available to maximize economic value and the dynamic objective of preserving incentives for further data to be collectedc has weakened considerably with the advent of digital data that are often collected without cost as a by-product of other economic activities. 
Second, although private bilateral market exchanges of data are well established in certain niches (specifically, trading personal data to target advertising), there are as of today no open multilateral markets for data, and many attempts to create such data markets have failed.d Because data are one of many experience goods that are difficult to evaluate in advance in areas such as price and quality, an important challenge is how data providers can convey information about the quality of their data before providing access.e In practice, data provenance has become the main means of signaling the quality and accuracy of data, relying on the reputation of the original source. How- ever, the metadata needed to establish provenance may themselves be subject to legal restrictions in areas such as privacy, and data sellers may have strategic incen- tives to conceal or manipulate such information. The theoretical literature demonstrates that the institutional mechanisms currently available for trade in data have led to a sharp trade-off between the feasible scale of a data market and the ability to verify the quality of the data traded.f Data may be traded via markets on a much larger scale in the future, but legal and institutional adaptations will be crucial to address challenges regarding data property rights and quality. a. Duch-Brown, Martens, and Mueller-Langer (2017). b. Duch-Brown, Martens, and Mueller-Langer (2017). c. Duch-Brown, Martens, and Mueller-Langer (2017). d. Koutroumpis, Leiponen, and Thomas (2020). e. This is known as the Arrow Information Paradox (Arrow 1962). f. Koutroumpis, Leiponen, and Thomas (2020).
  • 53. Harnessing the value of data for the poor | 33 some provincial governments and local economists. Because online price data were available outside the country, efforts by Argentina’s government to discour- age local economists from collecting these data inde- pendently were largely ineffective. These practices were halted in 2015 as Argentina took steps to reaffirm its commitment to the transparency and reliability of official data through its National Institute of Statistics and Censuses (INDEC). Similar disparities between official inflation statistics and those obtained from online prices have recently emerged in Turkey.37 Governments can pose broader challenges to the use of nonofficial data sources. For example, Tanza- nia’s 2018 amendment to its 2015 Statistics Act threat- ened members of civil society groups that published independent statistical information with imprison- ment. Approval of the National Bureau of Statistics was required to publish such information, and pub- lishing statistics that “invalidate, distort or discredit” official statistics was deemed a criminal offense. These provisions were subsequently amended amid international pressure.38 Finally, the transparency and reliability of official statistics can have important macroeconomic impli- cations. At a time when public debt levels are explod- ing from pandemic-related spending (see spotlight 1.2), governments may be less than forthcoming with data on the public debt, potentially enabling them to overborrow and hide debts from both citizens and creditors, at least for a while. Eventually, however, that strategy can have negative repercussions. For example, in Mozambique three state-backed compa- nies took on in 2013 and 2014 more than US$2 billion in government-guaranteed debt, equivalent to about 13 percent of the gross domestic product (GDP).39 Roughly US$1.2 billion of it was borrowed without being disclosed to parliament and the public. 
The country’s access to international credit markets was severely curtailed after the hidden loans were revealed in 2016. To rehabilitate its reputation, the government has undertaken a complex reform package to foster greater transparency and improve governance and anticorruption frameworks.40
Data for development: A conceptual framework
This Report poses two fundamental questions. How can data better advance development objectives? And what kind of data governance arrangements are needed to support the generation and use of data in a safe, ethical, and secure way while also delivering value equitably? The first part of this Report identifies the multiple pathways through which data can support or inhibit the development process, relying on the conceptual framework presented in this chapter, together with concrete illustrations and examples from recent experience in less developed and emerging countries.
Three pathways by which data can support development
Data can contribute to development by improving the lives of the poor through multiple pathways. The conceptual framework that guides this Report focuses on three such horizontal pathways (figure 1.3). The middle pathway consists of data generated or received by governments and international organizations to support program administration, service delivery, and evidence-based policy making (see chapter 2). The top pathway consists of data created and used by civil society and academia to monitor and analyze the effects of government programs and policies, as well as data used by individuals, empowering and enabling them to access public and commercial services tailored to their needs. The bottom pathway consists of data generated by private firms. These data can be a factor of production that fuels firm and economic growth. But data can also be part of production processes in other ways (as an intermediate input, an output, or a by-product) and can be mobilized and repurposed to support development objectives (see chapters 3 and 4).
In figure 1.3, two-way arrows link data production and collection with the three groups of actors in the center of the figure. These arrows indicate that data do not merely flow to the actors. Data must also be collected with purpose, and the processing and analytics carried out by those actors provide important feedback about what data should be produced and collected going forward. The rectangle that encapsulates the actors indicates their centrality in processing and analyzing data to provide insights that lead to better lives and better development outcomes. Across these three pathways, data can be shared and exchanged, flowing vertically across public, private, and civil society channels to increase their impact on development. Data collected for use in one of these pathways can be accessed and repurposed for a different use through other pathways or by other data users.
Government and international organizations. At a basic level, data enable governments to understand the impact of policies and improve program administration and service delivery. For traditional data types such as household and firm surveys, national accounts, and administrative data, governments (or agents authorized by governments) have been central to collection efforts. They have collected data typically
for specific purposes, often intended to improve policies and encourage development. However, without strong data systems in place to support data analysis in relevant applications, much of the potential for data to improve outcomes goes unrealized. Important factors in supporting successful national data systems include trained staff, budgetary autonomy for agencies that collect data, adequate data infrastructure, connected databases, and international partnerships (see chapter 9). However, these resources are often scarce in low-income countries. As a result, these countries are the least equipped to collect and effectively use the data they need to assess and understand the scope and nature of their development problems and to make inroads in solving them. Enhancing the capacity of client countries to collect, analyze, and use data therefore has been, and will continue to be, a priority of the World Bank Group, and it is a major focus of this Report. International organizations can help countries address shortfalls in funding, technical capacity, governance, and demand for public intent data, and overcome these barriers. Sovereign-supported multilateral and bilateral development institutions are also important collectors and disseminators of data in their own right, and they support country governments in their efforts to improve their data and deploy them better.
A better ability to exchange public intent data across many platforms (interoperability) could increase the impact of these data on development. Despite their advantages in coverage, suitability for some types of analysis, and potential for informing and improving policy, public intent data are often stored in different government agencies and formatted in different ways. Fragmentation and incompatibilities thus limit a government’s scope to use its data to the fullest extent to improve policies, service delivery, and targeting.
Interoperability across public intent data sources is therefore an important goal. The central role of government and international organizations in fostering development through data use and reuse is captured in figure 1.3 by the placement of this pathway in the center of the figure (see chapter 2).
Individuals, civil society, and academia. In the top pathway, making data widely available enables individuals and civil society to hold governments accountable for policy choices. Inputs from civil society provide a feedback mechanism through which policies can be adapted and improved, leading to more responsive governance. Civil society organizations themselves create data by conducting surveys and crowdsourcing information directly from citizens. Such data can foster discussion, government accountability, and transparency. Simply providing individuals with better access to their own data collected by government, international, or private sector actors is another way to enable citizens to advocate for themselves and improve their lives.
This pathway includes the use of administrative datasets by academic researchers to improve the quantity and quality of available evidence on social programs and policies.41 For example, linked employer-employee administrative datasets have been used to document earnings inequality and to study the sources of its decline in Brazil,42 as well as to study underreporting of wages by formal firms43 and the effects of business start-up programs in Mexico.44 Often carried out in partnership with firms or governments, this type of research is increasingly being published in top academic journals.45 However, broadening researchers’ access to administrative datasets remains a challenge, even in countries with well-developed statistical systems.46
Figure 1.3 Three pathways along which data can foster development
Source: WDR 2021 team.
Note: Positive impacts are shown in green; negative impacts are shown in red.
The private sector. Through the bottom pathway, data generated by the private sector also hold promise for improving the lives of the poor (see chapter 3). For one thing, data have become critically important in the production process of many firms. Indeed, the business models of some of the world’s largest firms (such as Amazon, Google, and Facebook) are predicated on data. Some important platform business models emerging in middle-income countries (such as Grab in Indonesia and Mercado Libre in Latin America) could greatly expand market access opportunities for small and medium enterprises. Other data-based private solutions can directly improve the lives of poor people. One example is digital credit, often applied for via cellphone, which facilitates financial inclusion.
Private financial services providers are also using alternative credit scoring techniques. These techniques use users’ digital footprints to train machine learning algorithms that identify, score, and underwrite credit for individuals who would otherwise lack documentation of their creditworthiness.
Data reuse, sharing, and repurposing for all pathways. Enabling data reuse and repurposing is central to realizing the value of data (see chapter 4). Such reuse can take place between actors within each of the three pathways, but also across pathways. The two-way arrow in figure 1.3 between private firms and government indicates the reuse and repurposing for public policy of data originally collected for commercial purposes, as well as the reuse and repurposing of public intent data by firms. Similarly, the two-way arrow between individuals/civil society/academia and governments indicates the reuse, sharing, and repurposing of data between those parties. The final two-way arrows reflect the use of private sector data and data-driven applications by individuals/civil society/academia and the use of data and analysis generated by individuals/civil society/academia by firms.
The many examples of repurposing data to improve development outcomes include using geospatial location data from mobile phones, mobile call detail records, or social media (Facebook) and online search (Google) data to predict and trace the outbreak of disease, especially COVID-19 (box 1.4).47 Online media and user-generated content can be used to map water and flood events in real time for water management and food security. Satellite imagery data from private and public sources can be combined to monitor crop yields and forecast malnutrition.48 The COVID-19 experience has also shown how public statistics constructed from private sector data (on credit card spending, employment, and business revenues) can serve as a new tool for empirical research and policy analysis.
In the United States, indicators disaggregated by ZIP code, industry, income group, and business size showed that small businesses and low-income workers providing in-person services within wealthier ZIP codes were hardest hit by the reduction in consumer spending during the crisis.49 The patterns suggest that widespread tax cuts or relief checks are not effective when people are afraid to go out and spend. Unemployment insurance benefits and grants or low-cost loans targeting struggling businesses are likely a better approach.50
Ways in which the same three pathways can harm development
Although use, reuse, and repurposing of data offer great prospects for fostering development, they simultaneously pose significant risks that must be managed to avoid negative development impacts. Mounting concerns of this kind have prompted calls for a new social contract around data. These risks can manifest themselves through public, private, and civil society pathways. Thus figure 1.3 also presents some concrete (though by no means exhaustive) illustrations (in red) of such negative impacts through each of the three pathways.
In the middle pathway, governments can abuse citizens’ data for political ends. As public sector data systems improve and become increasingly interoperable, governments may accumulate a wide array of information about specific individuals. As long as public accountability is strong and state actors can be presumed to act in the broader public interest, this need not be a major concern. However, if those
presumptions do not hold, significant perils arise. One clear risk is the potential to misuse such data for politically motivated surveillance or discrimination along the lines of ethnicity, religion, race, gender, disability status, or sexual orientation. Another concern is the possible use of data by political incumbents, domestic political players, or even foreign actors to unduly influence electoral processes by privately targeting misinformation to marginal voters during campaigns. Civil society actors can also misuse data for surveillance (to recruit members for violent extremism, for example) or to unduly affect electoral processes, as can private firms, but governments are more likely to do so. Again, these examples of data misuse are meant to be indicative rather than exhaustive.
Box 1.4 Using private intent data to tackle COVID-19
At the onset of the COVID-19 outbreak, governments began implementing policy measures to reduce social contact and curb the spread of the pandemic. Data collected through mobile phones, such as call detail records and global positioning system (GPS) location data, proved extremely valuable in quantifying the effectiveness of policies ranging from partial curfews to strict lockdowns.a These data enabled measurement of population density, travel patterns, and population mixing in real time and at high resolution, making it possible to better target policy interventions and inform epidemiological modeling. Analysis of GPS locations showed that by March 23, 2020, social distancing policies had helped reduce mobility in major US cities by half.b In Colombia, Indonesia, and Mexico, travel restrictions and lockdowns had different effects on mobility across socioeconomic groups: those in the top decile of wealth reduced their mobility up to twice as much as those in the bottom decile.c
Despite the potential of mobility data in the fight against COVID-19, their impact on policy has been limited in lower-income countries. Bottlenecks include a lack of technical expertise among government organizations; restrictions on data access, especially from mobile network operators; and a lack of the investment and political will required to scale up one-time projects.d
A review of the academic literature offers a broader look at the impact of repurposed data on the study of COVID-19 (figure B1.4.1 and map B1.4.1). Between February and September 2020, more than 950 articles were published in scientific, medical, and technical journals that repurposed cellphone, social media, Google search, and other types of private intent data to track the disease and to offer policy and operational solutions (figure B1.4.1). Despite the relatively large number of articles in a short time frame, the coverage of lower-income countries was quite limited, especially in Africa (map B1.4.1). This pattern holds after adjusting the number of articles for death rates associated with COVID-19 in each country, and it likely reflects difficulties in accessing data as well as limited research funding and capacity.
Figure B1.4.1 Use of repurposed data to study COVID-19: Published articles, by type of private intent data used
Series: articles containing “COVID” and call detail records, cellphone, Facebook, Google Trends, big data, Twitter, or social media. Vertical axis: number of articles (0 to 1,000). Horizontal axis: February to September 2020.
Source: WDR 2021 team, based on data from CORD-19 (COVID-19 Open Research Dataset) Semantic Scholar team, Ai2 (Allen Institute for AI), http://www.semanticscholar.org/cord19. Data at http://bit.do/WDR2021-Fig-B1_4_1.
Note: Figure shows the number of articles published in scientific, medical, and technical journals across time from February to September 2020. The cumulative sum across all categories is higher because some articles appear in more than one category.
Map B1.4.1 Use of repurposed data to study COVID-19: Published articles, by country
Legend: number of articles (7.5 to 15.0); no data.
Source: WDR 2021 team, based on data from CORD-19 (COVID-19 Open Research Dataset) Semantic Scholar team, Ai2 (Allen Institute for AI), http://www.semanticscholar.org/cord19. Data at http://bit.do/WDR2021-Map-B1_4_1.
Note: Map shows the number of articles published in scientific, medical, and technical journals across countries from February to September 2020. Article counts are divided by the COVID-19 death incidence rate.
a. Oliver et al. (2020).
b. Klein et al. (2020).
c. Fraiberger et al. (2020).
d. Oliver et al. (2020).
In the top pathway, individuals and organized groups can inflict considerable harm through cybercrime, stealing and manipulating sensitive information. The so-called dark net is a vast parallel network of hidden websites that can only be accessed using specific software tools and private authorizations. It acts as an underground digital platform for a wide array of criminal activities, facilitating illegal trade in drugs, counterfeit currency, stolen goods, credit card numbers, forged papers, firearms, and human organs. In addition to facilitating criminal activity in the real world, the internet can be the locus for cybercrime in the digital world. Burgeoning security breaches are leading to the theft of critical data and raising the risk of major disruptions to critical services. One recent study estimated the annual cost of such crime at between US$57 billion and US$110 billion in the United States alone.51 Data service providers have a tendency to underinvest in cybersecurity because the economic consequences of any data security breach are largely borne by the clients whose data are compromised.
In the bottom pathway, private firms can potentially abuse consumers’ data through anticompetitive practices. Data-driven platform businesses experience steeply increasing returns to scale as user communities expand, because positive network externalities make them more and more attractive to additional users. This dynamic has led to strong market concentration in platform businesses, including e-commerce, search engines, and social media, raising concerns about abuse of market power. For services that are provided free of charge, abuse of dominance may manifest itself in declining quality of service, particularly in terms of the level of privacy offered to consumers. In other cases, use of algorithms can facilitate price collusion (tacit or otherwise). More broadly, data-driven businesses may exploit their vast information about consumer preferences and behavior to engage in aggressive or manipulative marketing techniques based on microtargeting of persuasive messages (a practice known as nudging), which may unduly influence consumers’ choices or simply be a nuisance.52
Just as data can be reused for positive purposes, collecting and sharing sensitive data for ill-intentioned purposes can pose significant risks. For example, researchers at Cambridge Analytica developed a technique to map personality traits based on what people had “liked” on Facebook. The researchers paid users small sums to take a personality quiz and download an app that would scrape some private information from their profiles and those of their friends, an activity permitted at the time. Cambridge Analytica eventually obtained files for roughly 30 million users that contained enough information for the company to match users to other records and build psychographic profiles. However, only about 270,000 users (those who participated in the quiz) had consented to having their data harvested.53 As a result, political campaigns were able to microtarget their political ads to individuals based on these profiles.
Although social media data can be reused to affect election outcomes, doing so is challenging, and there is little solid evidence that the approach has had such effects thus far.54 However, the Cambridge Analytica example demonstrates how private sector data can be leveraged by third parties (in this case, a political party) to attempt to influence voting behavior in ways the originators of the data (Facebook users and their friends) never intended.
The Cambridge Analytica example also highlights the importance of transparency as data are increasingly created, used, reused, and repurposed by a wider range of people, organizations, businesses, and other parties. At the most basic level, documentation of sources and of collection and aggregation methods is crucial for data quality and for inspiring trust among users of data. But transparent documentation is not a priority in all countries, and some governments may consciously opt for data opacity, thereby significantly undermining public trust. In short, data policy options are fraught with complex political economy constraints.
A data governance framework to enforce the social contract for data
Data governance entails implementing norms, infrastructure policies and technical mechanisms, laws and regulations for data, related economic policies, and institutions. Together, these create an environment that can effectively enable the safe, trustworthy use of public intent and private intent data to achieve development outcomes. A robust and effectively implemented data governance framework provides predictability and confidence that data rights are being protected and that protections are enforceable. It can thereby strengthen trust in the data system, incentivizing the use of data-driven products and services, increasing their value, and ensuring a more equitable distribution of benefits. In effect, data governance enforces the social contract around data by applying the principles of trust, value, and equity.
A data governance framework can be visualized as four distinct layers that build on and support one another (figure 1.4). The foundational layer is the policy framework for data infrastructure. It includes both the policies that promote universal access to internet data services and the policies that ensure that countries have adequate infrastructure to exchange, store, and process data efficiently over the internet. The next layer consists of the legal and regulatory environment for data itself, which creates rules to enable the reuse and sharing of data while safeguarding against their potential abuse and misuse. This normative framework for data interacts in significant ways with the wider economic policy issues represented in the third layer, which affect a country’s ability to harness the economic value of data through competition, trade, and taxation. The fourth layer is the institutional ecosystem that ensures that data can deliver on their potential and that laws, regulations, and policies are effectively enforced.
Infrastructure policies. The digital character of modern data makes infrastructure indispensable for collecting, exchanging, storing, processing, and distributing data (see chapter 5). Individual access to data infrastructure is a prerequisite for both contributing one’s own data and accessing the data of others. The powerful network benefits, which arise as more and more people are connected to data infrastructure, are the economic underpinning for universal service policies that have also been widely endorsed
  • 59. Harnessing the value of data for the poor | 39 • Antitrust for data platform businesses • Trade in data-enabled services • Taxation of data platform businesses • International tax treaties to allocate taxation rights across countries • Global trade agreements on cross-border trade in data-enabled services • Government entities to oversee, regulate, and secure data • Other stakeholders to set standards and increase data access and reuse • International organizations to support collaboration on data governance and promote standardization • Cooperation on cross-border regulatory spillovers and enforcement issues National International • • Universal coverage of broadband networks Domestic infrastructure to exchange, store, and process data • • Safeguards to secure and protect data from the threat of misuse Enablers to facilitate data sharing among different stakeholders • Cybersecurity conventions for collaboration on tackling cybercrime • Interoperability standards to facilitate data exchanges across borders • • Global technical standards for compatibility of hardware and software Regional collaboration on data infrastructure to achieve scale Infrastructure policies Institutions Economic policies Laws and regulations politically. Significant policy efforts are needed to ensure adequate coverage of last-mile infrastructure that keeps pace with constantly evolving technolog- ical standards of performance. Policy makers also need to consider how to address demand-side factors, such as the affordability of handheld devices and data charges, as well as people’s limited data literacy skills. 
At the country level, affordable processing of data transactions and adequate speed call for increasingly sophisticateddatainfrastructure.Astartingpointisto facilitate the creation of internet exchange points that allow internet service providers to exchange domestic internet traffic across their respective networks with- out incurring expense and slowing speed by routing traffic overseas. A further step is to create a policy environment suitable for investment in colocation data centers. Such centers allow storage and retrieval of vast volumes of digital data, including local rep- licas of popular global internet content, and they can be used to provide dedicated access to overseas cloud computing capacity that facilitates increasingly sophisticated data processing and machine learning techniques. Small-scale, regulatory deficiencies and inadequate competition all conspire to hold back the development of all forms of data infrastructure in many low- and middle-income countries, posing particular challenges for policy makers. Laws and regulations. Legal and regulatory frame- works for data need to be adequately developed, with a balanced emphasis on both safeguards and enablers (see chapter 6). The legal and regulatory provisions to safeguard personal and nonpersonal data differ greatly because these two types of data are typically generated, used, and treated in very different ways. Personally identifiable data convey information that is specific to a known individual, although iden- tifiers (such as names, addresses, and social security numbers) that directly or indirectly point to a person (or entity) could be deleted.55 Some types of personal data, such as health histories or banking transactions, may be more sensitive than others, such as shopping records. Nonpersonal data are generated about non- human subjects, including institutions or machines. 
They may include data on prices, traffic patterns, weather, and agricultural practices. In practice, the boundary between personal and nonpersonal data is becoming increasingly blurred as it becomes possible to infer personal characteristics from nonpersonal data, such as mobile phone records. Advances in artificial intelligence are also making personal data harder to deidentify and making it possible to draw personal inferences by combining multiple sources of nonpersonal data.

Figure 1.4 Data governance layers at the national and international levels (layers: infrastructure policies, laws and regulations, economic policies, and institutions, each with national and international components). Source: WDR 2021 team.

Safeguards for personal data differ in nature from those for nonpersonal data. For personal data, a rights-based approach to data protection is appropriate, with the rights of data subjects and the obligations of data users as the primary considerations. For nonpersonal data, intellectual property rights provide the relevant frame of reference, and there is greater scope to weigh economic interests when balancing safeguards against enablers. Another important and underdeveloped aspect of data protection, affecting both personal and nonpersonal data, is cybersecurity. Complementing such safeguards, greater access to data for reuse can be enabled by open data regulations and by provisions that ensure technical interoperability between different types of data, allowing them to be readily combined and repurposed. Data portability provisions, which allow individuals to move their own data from one service provider to another, also enhance the agency of data subjects.

Economic policies. Because of the proliferation of data-driven platform business models, the design of legal frameworks for data governance carries significant, and often overlooked, implications for the real economy (see chapter 7). Competition agencies grapple with the market power of globally dominant technology firms operating data-driven business models.
Tax authorities struggle to collect revenues from platform businesses that often achieve scale in a market without any physical presence there and can readily shift tax liabilities across international borders. Trade policy faces a tension between the need to protect data domestically and the desire to benefit from growing cross-border trade in data-based goods and services. In each case, decisions about the design of the domestic regulatory framework for data will materially affect economic performance.

Institutions. For effective enforcement of the normative framework, a suitable institutional ecosystem that encompasses both state and nonstate actors must be in place (see chapter 8). The proliferation of arrangements around the world suggests that there is no single institutional blueprint for the implementation of data governance frameworks. The important thing is to identify the critical functions needed to deliver on the safeguards and enablers embedded in legal statutes. Depending on the country context, it may make sense to assign some of these roles to existing institutions (such as the national statistical office or relevant sector regulators) or to create new institutions (such as data protection agencies or data intermediaries). Whatever the institutional architecture, common challenges facing the effective implementation of data governance policies include capacity and resource constraints, lack of institutional autonomy, difficulties adopting a data-driven culture, and problems of coordination across stakeholder groups.

International dimension. Although they are rooted in the domestic environment, data governance frameworks also have important international dimensions (as shown in figure 1.4 and further detailed in spotlights 7.2 and 8.1). In many instances, international treaties provide the overarching legal framework for the development of domestic legislation and regulations.
International agreements are also critical in resolving long-standing data policy challenges, such as how to treat cross-border data flows in international trade or how to allocate taxation rights for data transactions. At the institutional level, decisions made by policy makers and regulators, particularly in the larger global markets, will have important spillover effects in smaller countries, especially those with strong economic ties to these markets. These effects underscore the importance of cross-border cooperation in addressing common data governance challenges such as the regulation of market power in data-driven businesses. At the same time, data infrastructure is to a considerable extent cross-border in nature, with large volumes of data flowing to overseas storage and processing facilities and opportunities for regional collaboration around infrastructure development. Facilitating such cross-border data movements also requires global harmonization of technical standards.

In addition, there is an important role for international cooperation in creating common standards and guidelines for statistical activities (spotlight 2.2). International measurement standards and protocols improve the comparability of measures across countries, allowing national policy makers to understand their country’s performance relative to that of their neighbors. Cross-country measurement of progress toward policy goals and, more generally, of statistical performance ensures that countries can benchmark and monitor their
data achievements and identify and address their weaknesses. Good data governance, at both the national and international levels, ensures that the various components work together to enable the effective and safe use of data in order to extract value in a trustworthy, equitable way.

Putting it all together: Establishing an integrated national data system

A well-functioning data governance framework ensures that infrastructure, laws and regulations, policies, and institutions work together to support the use of data in a way that aligns with the social contract for data. This framework defines the rules, and the associated compliance mechanisms, for how data can be safely used, reused, and shared by all stakeholders, including government entities, international organizations, civil society and individuals, academic institutions, and the private sector. To realize data’s potential, this framework must be built around a data system that not only ensures that data transactions are safe, but also actively promotes access to data by all stakeholders (figure 1.5).

This Report concludes with an aspirational vision of an integrated national data system (INDS) that can deliver on the promise of producing high-quality data and then making those data open in a way that keeps them protected yet accessible for sharing and reuse by all stakeholders (chapter 9). Such an aspirational INDS works seamlessly with the governance structure. If the governance system is viewed as establishing the rules of the road (and the institutions governing those rules), the INDS can be envisioned as a network of highways that connects all users and ensures the safe passage of data to and from destinations.
A well-functioning INDS is powered by people: people to produce, process, and manage high-quality data; people to populate the institutions that safeguard and protect the data against misuse; and people to draft, oversee, and implement data strategies, policies, and regulations. The system also needs all people to have sufficient skills and knowledge to use data in ways that allow them to hold the public and private sectors accountable. All this requires robust data literacy within data institutions, government ministries, the private sector, and the general population.

Figure 1.5 Data flow safely across all stakeholders in an integrated national data system (stakeholders: government entities, academic institutions, international organizations, civil society and individuals, and the private sector; data attributes: produced, protected, open, quality controlled, used and reused). Source: WDR 2021 team.

There is no single blueprint for how to build an INDS. It certainly must be funded sufficiently to implement the infrastructure and institutions necessary for the system to function well. Incentives need to be in place to produce, protect, and share data, and to create a data culture in which people demand transparency and accountability. But how countries move toward this vision of an INDS will depend on their current capacity and the parameters of the social contract for data.

Although the path toward an INDS will differ for each country, this Report proposes a sequenced maturity model to help assess progress and identify areas that need more attention to develop a well-functioning INDS. The maturity model is based on a progression of three stages: first, establishing fundamentals; second, initiating data flows; and third, optimizing the system (chapter 8). Although progress within these stages will differ by country (and, within a given country, may also differ across types of data), these three stages nonetheless serve as a useful reference to help assess weak spots and gaps in the construction of an INDS.

Establishing fundamentals first requires taking stock to identify the existing data types and the data processing activities carried out by different actors.
This analysis should focus on activities already taking place, both inside and outside of government, that present development opportunities, as well as risks, for data use, reuse, and repurposing. Uncovering gaps in the stock of data or bottlenecks in gaining access to these data can help prioritize efforts to fill gaps and remove barriers. Governments should also engage with the private sector and civil society stakeholders to develop legislation, rules, and standards to safeguard data, while encouraging data collection, processing, and use. Other steps in establishing fundamentals include efforts to facilitate public-private data sharing and cross-border data transfers by establishing contracts with information management services (such as identification systems) or licenses for regulated entities (such as banks and telecom operators) that create provisions for secure, protected data transactions between public and private actors. Ensuring that the fundamentals are in place also includes developing a data governance strategy with policies and laws that promote the objectives of the INDS and enforce compliance with rules.

The next phase is to ensure that data begin to flow across all stakeholders. One path to this goal is to establish a government agency with sufficient power to secure compliance across ministries and public sector agencies in how they manage and exchange data. In addition, the rules and standards that enable greater interoperability among datasets must be established. Creating interoperability allows for innovative new uses of multiple data files as these data become accessible to a more diverse set of users. It also allows for the development of measurement standards to ensure data quality.56 Public-private and cross-border data flows can be encouraged through multistakeholder engagements with domestic and international actors to promote the harmonization of principles, standards, and practices.
Such engagements are particularly important for data protection and cybersecurity, which require coordination to be effective.

To reach the optimized stage, the tools and methods that helped create data flows should be incorporated into a unified whole-of-government approach. Ongoing, recurrent investments in training increase the effective use of data for decision-making and accountability. Similarly, recurrent investments in infrastructure keep systems sufficiently modern and expand access. Data quality, data integration, and data synchronization should be integral parts of all processes at this stage. Meanwhile, the safe flow of data through the data system should be continually assessed and stress-tested for weaknesses.

Organization of this Report

This Report is divided into three parts. Part I identifies the multiple channels through which data can support or impede the development process, making sense of the data landscape and pointing out the associated development opportunities and risks. This part provides a conceptual framework (figure 1.3), together with illustrations and examples from recent experience in low- and middle-income countries. Part II, which describes the data governance layers presented in figure 1.4, focuses on data governance broadly defined to include data infrastructure policy (chapter 5), the legal and regulatory framework for data (chapter 6), the related economic policy implications (chapter 7), and institutions (chapter 8). These diverse elements are effectively the building blocks
of a social contract that seeks to deliver the potential value of data equitably while safeguarding against harmful outcomes. Examples and case studies illustrate both the importance of establishing safeguards to prevent the misuse of data that could harm development objectives and how data can be better enabled to further development objectives. Part III brings together the building blocks of the Report to present the vision of an integrated national data system (chapter 9). Throughout the Report, spotlights at the end of chapters highlight relevant cases in low- and middle-income countries and internationally and explore various policy issues in more depth.

This Report was prepared against the backdrop of the COVID-19 pandemic. The pandemic itself is a vivid illustration of the usefulness of data in dealing with obstacles to development and the complexity of the associated governance challenges. Examples of how countries have used data as part of their response to COVID-19 are featured throughout the chapters, using boxes and narratives to illustrate many of the issues addressed in the Report. Those issues include the deficiencies of public sector data systems and the complementarities between public intent and private intent data, as well as the legal and regulatory issues posed by accessing private intent data for public purposes. More broadly, through a discussion of the many ways in which data can help economic development, this Report aims to describe the challenges to realizing these gains, offer guidance on how to attain them, and propose safeguards for protecting citizens.

Notes
1. Rowntree (2000 [1901]).
2. World Bank (2016).
3. World Bank (2019).
4. The Report also builds on other themes featured in past World Bank reports, including the importance of building the data capacity of countries (see World Bank 2018). More generally, World Bank reports have long emphasized the importance of data, information, and knowledge for economic, social, and political development (see, for example, World Bank 2002). What has changed is the nature and amount of data available, the ways in which they are produced, and the ease with which they can be exchanged, reused, and shared to address development objectives. Thus the focus of this World Development Report is on data for better lives, particularly for the poor.
5. See, for example, OECD (2013, 2016, 2018a, 2018b, 2019).
6. Whitby (2020).
7. Grajalez et al. (2013).
8. Thorvaldsen (2017).
9. Bethlehem (2009).
10. Thorvaldsen (2017).
11. de Heer, de Leeuw, and van der Zouwen (1999).
12. Conseil constitutionnel, “Déclaration des Droits de l’Homme et du Citoyen de 1789” [Declaration of Human and Civic Rights of 26 August 1789], Paris, https://www.conseil-constitutionnel.fr/le-bloc-de-constitutionnalite/declaration-des-droits-de-l-homme-et-du-citoyen-de-1789.
13. Whitby (2020).
14. Bethlehem (2009).
15. Bethlehem (2009).
16. Musa et al. (2013).
17. Wallis and Robinson (1987).
18. Musa et al. (2013).
19. Dempsey (2012).
20. LGBTQI stands for lesbian, gay, bisexual, transgender, queer (or questioning), and intersex.
21. OECD (2013).
22. Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data (1995 Directive on Personal Data Protection, https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A31995L0046) was repealed and replaced in 2016 by Regulation (EU) 2016/679 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data (EU GDPR, https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX%3A32016R0679).
23. Kilic et al. (2017).
24. Serajuddin et al. (2015).
25. Blank and Lutz (2017).
26. Abay et al. (2019); Carletto, Jolliffe, and Banerjee (2015); Desiere and Jolliffe (2018); Gourlay, Kilic, and Lobell (2019).
27. Lobell, Azzari, et al. (2020); Lobell, Di Tommaso, et al. (2020).
28. Jones and Tonetti (2020). Treating data as a nonrival input in a production function draws on the earlier literature that modeled information and ideas as nonrival inputs to production. See Romer (1990) and Radner and Stiglitz (1984).
29. For example, Statistics Canada (2019) estimated the value of data in Canada considering the direct labor cost of data production, associated indirect labor costs, and other related expenses such as human resources management and financial control. It quantified the total own-account investment in databases in 2018 as between Can$8 billion and Can$12 billion.
30. Two examples illustrate this approach. First, the US Department of Commerce (2014) found that government data helped US businesses generate at least US$24 billion a year. Second, Deloitte (2017) conducted a review of studies of the economic value of open data (public data available with no restrictions to users) to a wide range of users in the United Kingdom and found that satellite data from Landsat were worth US$2 billion a year in commercial applications, while public transport routing and scheduling data from Transport for London generated economic benefits to passengers valued at £80 million a year.
31. Several illustrations of this approach suggest the magnitude of monetary benefits. PwC (2019) found that market capitalizations of data-intensive companies are twice as likely to be in the top industry quartile as those of companies operating in the same sector that are not data-intensive. Li, Nirei, and Yamana (2019) looked at the sums paid for acquisitions of data-intensive firms and their impact on the market capitalization of the acquiring firm. Frier (2018) examined the revenue streams that companies are able to generate from sales of data or associated advertising revenues, finding that Apple charges application developers a commission of 30 percent of their sales for accessing its consumer data, which has earned the company US$42.8 billion in sales over the past 10 years.
32. This is similar to suggesting that there is a nonconcavity in the value of data and information. It is also linked to the point that because ideas are nonrivalrous, they exhibit increasing marginal returns over a range. See Radner and Stiglitz (1984) and Romer (1990).
33. Juba and Le (2019).
34. Goldfarb and Tucker (2019).
35. Zingales (2017) notes that as the economic scale of firms becomes large in relation to governments, economic and political power may converge.
36. Cavallo (2013); Cavallo and Rigobon (2016).
37. Erkoyun (2020).
38. Nyeko (2019).
39. Economist (2019).
40. IMF (2019).
41. Cole et al. (2020).
42. Alvarez et al. (2018); Menezes-Filho et al. (2008).
43. Kumler, Verhoogen, and Frías (2020).
44. Kaplan, Piedra, and Seira (2011).
45. Chetty (2012); Cole et al. (2020).
46. Card et al. (2010).
47. Wesolowski et al. (2015).
48. Burke and Lobell (2017); Osgood-Zimmerman et al. (2018).
49. Chetty et al. (2020).
50. Rosalsky (2020).
51. CEA (2018).
52. Amnesty International (2019); Zuboff (2019).
53. Rosenberg, Confessore, and Cadwalladr (2018).
54. Hern (2018).
55. Kayaalp (2017).
56. Anyone wondering about the importance of establishing comparable definitions and developing precise instruments for these measures need only look at the US National Institute of Standards and Technology, established in 1901. It has been home to five Nobel laureates.

References
Abay, Kibrom A., Gashaw T. Abate, Christopher B. Barrett, and Tanguy Bernard. 2019. “Correlated Non-Classical Measurement Errors, ‘Second Best’ Policy Inference, and the Inverse Size–Productivity Relationship in Agriculture.” Journal of Development Economics 139 (June): 171–84. https://doi.org/10.1016/j.jdeveco.2019.03.008.
Alvarez, Jorge, Felipe Benguria, Niklas Engbom, and Christian Moser. 2018. “Firms and the Decline in Earnings Inequality in Brazil.” American Economic Journal: Macroeconomics 10 (1): 149–89. https://doi.org/10.1257/mac.20150355.
Amnesty International. 2019. “Surveillance Giants: How the Business Model of Google and Facebook Threatens Human Rights.” Report POL 30/1404/2019, Amnesty International, London. https://www.amnesty.org/en/documents/document/?indexNumber=pol30%2f1404%2f2019language=en.
Arrow, Kenneth J. 1962. “Economic Welfare and the Allocation of Resources for Invention.” In The Rate and Direction of Inventive Activity: Economic and Social Factors, edited by National Bureau of Economic Research, 609–26. Princeton Legacy Library Series. Princeton, NJ: Princeton University Press.
Ball, Laura. 2009. “Cholera and the Pump on Broad Street: The Life and Legacy of John Snow.” History Teacher 43 (1): 105–19.
Bethlehem, Jelke. 2009. “The Rise of Survey Sampling.” Discussion Paper 09015, Statistics Netherlands, The Hague.
Blank, Grant, and Christoph Lutz. 2017. “Representativeness of Social Media in Great Britain: Investigating Facebook, LinkedIn, Twitter, Pinterest, Google+, and Instagram.” American Behavioral Scientist 61 (7): 741–56. https://doi.org/10.1177/0002764217717559.
Burke, Marshall, and David B. Lobell. 2017. “Satellite-Based Assessment of Yield Variation and Its Determinants in Smallholder African Systems.” PNAS Proceedings of the National Academy of Sciences 114 (9): 2189–94. https://doi.org/10.1073/pnas.1616919114.
Card, David E., Raj Chetty, Martin S. Feldstein, and Emmanuel Saez. 2010. “Expanding Access to Administrative Data for Research in the United States.” White Paper, National Science Foundation, Alexandria, VA. http://www.rajchetty.com/chettyfiles/NSFdataaccess.pdf.
Carletto, Calogero, Dean Jolliffe, and Raka Banerjee. 2015. “From Tragedy to Renaissance: Improving Agricultural Data for Better Policies.” Journal of Development Studies 51 (2): 133–48. https://doi.org/10.1080/00220388.2014.968140.
Carrière-Swallow, Yan, and Vikram Haksar. 2019. “The Economics and Implications of Data: An Integrated Perspective.” Departmental Paper 19/16, Strategy, Policy, and Review Department, International Monetary Fund, Washington, DC.
Cavallo, Alberto. 2013. “Online and Official Price Indexes: Measuring Argentina’s Inflation.” Journal of Monetary Economics 60 (2): 152–65.
Cavallo, Alberto, and Roberto Rigobon. 2016. “The Billion Prices Project: Using Online Prices for Inflation Measurement and Research.” Journal of Economic Perspectives 30 (2): 151–78.
CEA (Council of Economic Advisers). 2018. “The Cost of Malicious Cyber Activity to the U.S. Economy.” CEA, White House, Washington, DC. https://www.whitehouse.gov/wp-content/uploads/2018/02/The-Cost-of-Malicious-Cyber-Activity-to-the-U.S.-Economy.pdf.
Chetty, Raj. 2012. “Time Trends in the Use of Administrative Data for Empirical Research.” Paper presented at NBER Summer Institute 2012, National Bureau of Economic Research, Cambridge, MA, July 2–27. http://www.rajchetty.com/chettyfiles/admin_data_trends.pdf.
Chetty, Raj, John N. Friedman, Nathaniel Hendren, Michael Stepner, and Opportunity Insights Team. 2020. “How Did COVID-19 and Stabilization Policies Affect Spending and Employment? A New Real-Time Economic Tracker Based on Private Sector Data.” NBER Working Paper 27431, National Bureau of Economic Research, Cambridge, MA. https://www.nber.org/system/files/working_papers/w27431/w27431.pdf.
Cole, Shawn, Iqbal Dhaliwal, Anja Sautmann, and Lars Vilhuber. 2020. Handbook on Using Administrative Data for Research and Evidence-Based Policy. Cambridge, MA: Abdul Latif Jameel Poverty Action Lab and Massachusetts Institute of Technology. https://admindatahandbook.mit.edu/book/v1.0-rc6/index.html.
DCMS (Department for Digital, Culture, Media, and Sport, United Kingdom). 2020. “UK National Data Strategy.” Policy paper, DCMS, London. https://www.gov.uk/government/publications/uk-national-data-strategy/national-data-strategy.
de Heer, Wim, Edith Desirée de Leeuw, and Johannes van der Zouwen. 1999. “Methodological Issues in Survey Research: A Historical Review.” Bulletin of Sociological Methodology 64 (1): 25–48.
Deloitte. 2017. “Assessing the Value of TfL’s Open Data and Digital Partnerships.” Deloitte LLP, London. http://content.tfl.gov.uk/deloitte-report-tfl-open-data.pdf.
Dempsey, Caitlin. 2012. “History of GIS.” GIS Lounge (blog), May 14, 2012. https://www.gislounge.com/history-of-gis/.
Desiere, Sam, and Dean Jolliffe. 2018. “Land Productivity and Plot Size: Is Measurement Error Driving the Inverse Relationship?” Journal of Development Economics 130 (January): 84–98. https://doi.org/10.1016/j.jdeveco.2017.10.002.
Duch-Brown, Nestor, Bertin Martens, and Frank Mueller-Langer. 2017. “The Economics of Ownership, Access, and Trade in Digital Data.” JRC Digital Economy Working Paper 2017-01, Joint Research Center, European Commission, Seville, Spain. https://ec.europa.eu/jrc/sites/jrcsh/files/jrc104756.pdf.
Economist. 2019. “The Net Tightens: A $2bn Loan Scandal Sank Mozambique’s Economy.” August 22, 2019. https://www.economist.com/middle-east-and-africa/2019/08/22/a-2bn-loan-scandal-sank-mozambiques-economy.
Erkoyun, Ezgi. 2020. “Researchers Say New Model Shows Turkish Inflation Well Above Official Tally.” Reuters, October 22, 2020. https://www.reuters.com/article/turkey-economy-inflation-int-idUSKBN2771EY.
Fraiberger, Samuel P., Pablo Astudillo, Lorenzo Candeago, Alex Chunet, Nicholas K. W. Jones, Maham Faisal Khan, Bruno Lepri, et al. 2020. “Uncovering Socioeconomic Gaps in Mobility Reduction during the COVID-19 Pandemic Using Location Data.” arXiv:2006.15195 [physics.soc-ph], July 27, Cornell University, Ithaca, NY.
Frier, Sarah. 2018. “Is Apple Really Your Privacy Hero?” Bloomberg Businessweek, June 8, 2018. https://www.bloomberg.com/news/articles/2018-08-08/is-apple-really-your-privacy-hero.
Gillies, Robert J., Paul E. Kinahan, and Hedvig Hricak. 2015. “Radiomics: Images Are More Than Pictures, They Are Data.” Radiology 278 (2): 563–77. https://doi.org/10.1148/radiol.2015151169.
Goldfarb, Avi, and Catherine Tucker. 2019. “Digital Economics.” Journal of Economic Literature 57 (1): 3–43. https://doi.org/10.1257/jel.20171452.
Gourlay, Sydney, Talip Kilic, and David B. Lobell. 2019. “A New Spin on an Old Debate: Errors in Farmer-Reported Production and Their Implications for Inverse Scale–Productivity Relationship in Uganda.” Journal of Development Economics 141 (November): 102376. https://www.sciencedirect.com/science/article/pii/S0304387818306588.
Grajalez, Carlos Gómez, Eileen Magnello, Robert Woods, and Julian Champkin. 2013. “Great Moments in Statistics.” Significance 10 (6): 21–28.
Hallal, Pedro Curi, Fernando P. Hartwig, Bernardo L. Horta, Gabriel D. Victora, Mariângela F. Silveira, Cláudio José Struchiner, Luís Paulo Vidaleti, et al. 2020. “Remarkable Variability in SARS-CoV-2 Antibodies across Brazilian Regions: Nationwide Serological Household Survey in 27 States.” medRxiv (May 30). https://www.medrxiv.org/content/10.1101/2020.05.30.20117531v1.
Hallal, Pedro Curi, Bernardo L. Horta, Aluísio J. D. Barros, Odir A. Dellagostin, Fernando P. Hartwig, Lúcia C. Pellanda, Cláudio José Struchiner, et al. 2020. “Trends in the Prevalence of COVID-19 Infection in Rio Grande do Sul, Brazil: Repeated Serological Surveys.” Ciência Saúde Coletiva 25 (supplement 1): 2395–401. https://doi.org/10.1590/1413-81232020256.1.09632020.
Hern, Alex. 2018. “Cambridge Analytica: How Did It Turn Clicks into Votes?” Guardian, May 6, 2018. https://www.theguardian.com/news/2018/may/06/cambridge-analytica-how-turn-clicks-into-votes-christopher-wylie.
IMF (International Monetary Fund). 2019. “Republic of Mozambique: Diagnostic Report on Transparency, Governance, and Corruption.” IMF Country Report 19/276, IMF, Washington, DC. https://www.imf.org/en/Publications/CR/Issues/2019/08/23/Republic-of-Mozambique-Diagnostic-Report-on-Transparency-Governance-and-Corruption-48613.
Jones, Charles I., and Christopher Tonetti. 2020. “Nonrivalry and the Economics of Data.” American Economic Review 110 (9): 2819–58. https://doi.org/10.1257/aer.20191330.
Juba, Brendan, and Hai S. Le. 2019. “Precision-Recall Versus Accuracy and the Role of Large Data Sets.” Proceedings of the AAAI Conference on Artificial Intelligence 33 (01): 4039–48. https://doi.org/10.1609/aaai.v33i01.33014039.
Kaplan, David S., Eduardo Piedra, and Enrique Seira. 2011. “Entry Regulation and Business Start-Ups: Evidence from Mexico.” Journal of Public Economics 95 (11–12): 1501–15. https://doi.org/10.1016/j.jpubeco.2011.03.007.
Kayaalp, Mehmet. 2017. “Modes of De-Identification.” Paper presented at American Medical Informatics Association 2017 Annual Symposium, Washington, DC, November 6–8. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977668.
Kilic, Talip, Umar Serajuddin, Hiroki Uematsu, and Nobuo Yoshida. 2017. “Costing Household Surveys for Monitoring Progress toward Ending Extreme Poverty and Boosting Shared Prosperity.” Policy Research Working Paper 7951, World Bank, Washington, DC.
Klein, Brennan, Timothy LaRock, Stefan McCabe, Leo Torres, Filippo Privitera, Lake Brennan, Moritz U. G. Kraemer, et al. 2020. “Assessing Changes in Commuting and Individual Mobility in Major Metropolitan Areas in the United States during the COVID-19 Outbreak.” Network Science Institute, Northeastern University, Boston. https://www.networkscienceinstitute.org/publications/assessing-changes-in-commuting-and-individual-mobility-in-major-metropolitan-areas-in-the-united-states-during-the-covid-19-outbreak.
Koutroumpis, Pantelis, Aija Leiponen, and Llewellyn D. W. Thomas. 2020. “Markets for Data.” Industrial and Corporate Change 29 (3): 645–60. https://doi.org/10.1093/icc/dtaa002.
Kumler, Todd, Eric Verhoogen, and Judith Frías. 2020. “Enlisting Employees in Improving Payroll Tax Compliance: Evidence from Mexico.” Review of Economics and Statistics 102 (5): 881–96. https://doi.org/10.1162/rest_a_00907.
Leighton, Timothy G., and Andi Petculescu. 2016. “Guest Editorial: Acoustic and Related Waves in Extraterrestrial Environments.” Journal of the Acoustical Society of America 140 (2): 1397–99. https://doi.org/10.1121/1.4961539.
Li, Wendy C. Y., Makoto Nirei, and Kazufumi Yamana. 2019. “Value of Data: There’s No Such Thing as a Free Lunch in the Digital Economy.” RIETI Discussion Paper 19-E-022, Research Institute of Economy, Trade, and Industry, Tokyo. https://www.rieti.go.jp/jp/publications/dp/19e022.pdf.
Lobell, David B., George Azzari, Marshall Burke, Sydney Gourlay, Zhenong Jin, Talip Kilic, and Siobhan Murray. 2020. “Eyes in the Sky, Boots on the Ground: Assessing Satellite- and Ground-Based Approaches to Crop Yield Measurement and Analysis.” American Journal of Agricultural Economics 102 (1): 202–19. https://doi.org/10.1093/ajae/aaz051.
Lobell, David B., Stefania Di Tommaso, Calum You, Ismael Yacoubou Djima, Marshall Burke, and Talip Kilic. 2020. “Sight for Sorghums: Comparisons of Satellite- and Ground-Based Sorghum Yield Estimates in Mali.” Remote Sensing 12 (1): 100. https://doi.org/10.3390/rs12010100.
Menezes-Filho, Naércio Aquino, Marc-Andreas Muendler, and Garey Ramey. 2008. “The Structure of Worker Compensation in Brazil, with a Comparison to France and the United States.” Review of Economics and Statistics 90 (2): 324–46.
Musa, George J., Po-Huang Chiang, Tyler Sylk, Rachel Bavley, William Keating, Bereketab Lakew, Hui-Chen Tsou, and Christina W. Hoven. 2013. “Use of GIS Mapping as a Public Health Tool: From Cholera to Cancer.” Health Services Insights 6 (November): 111–16. https://doi.org/10.4137/HSI.S10471.
Nyeko, Oryem. 2019. “Tanzania Drops Threat of Prison over Publishing Independent Statistics.” Human Rights Watch, Dispatches, July 3. https://www.hrw.org/news/2019/07/03/tanzania-drops-threat-prison-over-publishing-independent-statistics.
OECD (Organisation for Economic Co-operation and Development). 2013. The OECD Privacy Framework. Paris: OECD. http://www.oecd.org/sti/ieconomy/oecd_privacy_framework.pdf.
OECD (Organisation for Economic Co-operation and Development). 2016. “Big Data: Bringing Competition Policy to the Digital Era.” Report DAF/COMP(2016)14 (rev. November 29–30), Competition Committee, Directorate for Financial and Enterprise Affairs, OECD, Paris. https://one.oecd.org/document/DAF/COMP(2016)14/en/pdf.
OECD (Organisation for Economic Co-operation and Development). 2018a. Rethinking Antitrust Tools for Multi-Sided Platforms 2018. Paris: OECD. https://www.oecd.org/daf/competition/Rethinking-antitrust-tools-for-multi-sided-platforms-2018.pdf.
OECD (Organisation for Economic Co-operation and Development). 2018b. Tax Challenges Arising from Digitalisation: Interim Report 2018. Paris: OECD. http://dx.doi.org/10.1787/9789264293083-en.
OECD (Organisation for Economic Co-operation and Development). 2019. Enhancing Access to and Sharing of Data: Reconciling Risks and Benefits for Data Re-use across Societies. Paris: OECD. https://www.oecd-ilibrary.org/content/publication/276aaca8-en.
Oliver, Nuria, Bruno Lepri, Harald Sterly, Renaud Lambiotte, Sébastien Deletaille, Marco De Nadai, Emmanuel Letouzé, et al. 2020. “Mobile Phone Data for Informing Public Health Actions across the COVID-19 Pandemic Life Cycle.” Science Advances 6 (23): eabc0764. https://doi.org/10.1126/sciadv.abc0764.
Osgood-Zimmerman, Aaron, Anoushka I. Millear, Rebecca W. Stubbs, Chloe Shields, Brandon V. Pickering, Lucas Earl, Nicholas Graetz, et al. 2018. “Mapping Child Growth Failure in Africa between 2000 and 2015.” Nature 555 (7694): 41–47. https://doi.org/10.1038/nature25760.
PwC. 2019. “Putting a Value on Data.” PwC, London. https://www.pwc.co.uk/issues/data-analytics/insights/putting-value-on-data.html.
Radner, Roy, and Joseph E. Stiglitz. 1984. “A Nonconcavity in the Value of Information.” In Bayesian Models in Economic Theory, edited by Marcel Boyer and Richard E. Kihlstrom, 33–52. Studies in Bayesian Econometrics Series 5. Amsterdam: Elsevier.
Romer, Paul M. 1990. “Endogenous Technological Change.” Journal of Political Economy 98 (5): S71–S102.
Rosalsky, Greg. 2020. “The Dark Side of the Recovery Revealed in Big Data.” Planet Money Newsletter, October 27, 2020. https://www.npr.org/sections/money/2020/10/27/927842540/the-dark-side-of-the-recovery-revealed-in-big-data.
Rosenberg, Matthew, Nicholas Confessore, and Carole Cadwalladr. 2018. “How Trump Consultants Exploited the Facebook Data of Millions.” New York Times, March 17, 2018.
https://www.nytimes.com/2018/03/17/us/politics /cambridge-analytica-trump-campaign.html. Rowntree, Benjamin Seebohm. 2000. Poverty: A Study of Town Life, 2d ed. Bristol, UK: Policy Press. Serajuddin, Umar, Hiroki Uematsu, ChristinaWieser, Nobuo Yoshida,andAndrewL.Dabalen.2015.“DataDeprivation: Another Deprivation to End.” Policy Research Working Paper 7252, World Bank, Washington, DC. Statistics Canada. 2019. “The Value of Data in Canada: Exper- imental Estimates.” Daily, July 10, 2019, Statistics Canada, Ottawa. https://www150.statcan.gc.ca/n1/pub/13-605-x /2019001/article/00009-eng.htm.
  • 67. Harnessing the value of data for the poor | 47 Stephens-Davidowitz, Seth. 2017. Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are. Illus. ed. New York: Dey Street Books. Thorvaldsen, Gunnar. 2017. Censuses and Census Takers: A Global History. Routledge Studies in Modern History Series. London: Routledge. https://doi.org/10.4324/97813 15148502. UFPEL (Federal University of Pelotas). 2020. “FAPESP e Todos pela Saúde garantirão a continuidade do estudo EPICOVID-19 BR.” Coordenação de Comunicação Social, Pró-Reitoria de Gestão da Informação e Comunicação, UFPEL, Pelotas, Rio Grande do Sul, Brazil. http://ccs2 .ufpel.edu.br/wp/2020/08/31/fapesp-e-todos-pela-saude -garantirao-a-continuidade-do-estudo-epicovid-19-br/. US Department of Commerce. 2014. “Fostering Innovation, Creating Jobs, Driving Better Decisions: The Value of Government Data.” Office of the Chief Economist, Eco- nomics and Statistics Administration, US Department of Commerce, Washington, DC. https://www.commerce .gov/files/fostering-innovation-creating-jobs-driving -better-decisions-value-government-data. Wallis, Helen M., and Arthur Howard Robinson, eds. 1987. Cartographical Innovations: An International Handbook of Mapping Terms to 1900. Tring, UK: Map Collector Publications. Wesolowski, Amy, Taimur Qureshi, Maciej F. Boni, Pål Roe Sundsøy, Michael A Johansson, Syed Basit Rasheed, Kenth Engø-Monsen, et al. 2015. “Impact of Human Mobility on the Emergence of Dengue Epidemics in Pakistan.” ProceedingsoftheNationalAcademyofSciences 112 (38): 11887–92. Whitby, Andrew. 2020. The Sum of the People: How the Census Has Shaped Nations, from the Ancient World to the Modern Age. New York: Basic Books. World Bank. 2002. World Development Report 2002: Building Institutions for Markets. Washington, DC: World Bank; New York: Oxford University Press. World Bank. 2016. World Development Report 2016: Digital Dividends. 
Washington, DC: World Bank. World Bank. 2018. Data for Development: An Evaluation of World Bank Support for Data and Statistical Capacity. Wash- ington, DC: Independent Evaluation Group, World Bank. World Bank. 2019. IC4D, Information and Communications for Development 2018: Data-Driven Development. Washington, DC: World Bank. http://documents1.worldbank.org /curated/en/987471542742554246/pdf/128301-97814648 13252.pdf. Yala, Adam, Peter G. Mikhael, Fredrik Strand, Gigin Lin, Kevin Smith, Yung-Liang Wan, Leslie Lamb, et al. 2021. “Toward Robust Mammography-Based Models for Breast Cancer Risk.” ScienceTranslationalMedicine 13 (578): 1–11. https://doi.org/10.1126/scitranslmed.aba4373. Zingales, Luigi. 2017. “Towards a Political Theory of the Firm.” Journal of Economic Perspectives 31 (3): 113–30. Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: Public Affairs. https://www.hbs.edu/faculty/Pages /item.aspx?num=56791.
Spotlight 1.1
Helping communities to gain the ability to collect and analyze their own data
A novel experiment in India empowered villagers (particularly women, many of them illiterate) to design their own process for collecting and deploying data to track changes in the quality of public services and in their living standards and to make better decisions in village meetings.
In 2014 the World Bank’s Social Observatory, working closely with the Pudhu Vaazhvu Project of the South Asia Livelihoods team in the south Indian state of Tamil Nadu, developed a method called participatory tracking.1 This effort built on two features of the institutional context in India: democratically elected village councils hold regular, open village meetings in which budget priorities are planned and monitored, and a network of women’s self-help groups has a strong presence in every village in the state.
Participatory tracking proceeded in three steps. First, representatives of women’s groups from 200 villages engaged in several weeks of deliberations to think about what constitutes the good life for them, turn those ideas into indicators measured using survey questions, and then test those questions in their villages through a simple questionnaire that took no more than 30 minutes to answer.
Second, the community-designed questionnaire was incorporated into tablet-based software. A member of each women’s group was trained via video on how to administer the questionnaire in her own village. In the pilot, women were able to conduct a census of 40,000 households in about six weeks. Once the survey was conducted, the data were dispatched directly to a cloud server to prevent anyone from tampering with them. About one-third of villagers could not read or write. Thus, the team developed ways of showing the data that would be understandable to people who were not literate.
Third, the Social Observatory team “coproduced” data visualizations with the community. Figure S1.1.1 provides an example, developed to demonstrate variations in decision-making within the household. If a member of the family has more authority over a decision, his or her face becomes bigger. If women have more authority, the colored background shades from dark gray toward dark green. The village median is displayed alongside that of another village for comparison. These data were presented at village planning meetings. They substantially improved the quality of deliberation by allowing citizens and officials to focus on the issues of concern rather than debate the facts about where decision-making power lay.
Versions of participatory tracking that focus more on the management of public goods and common property are being designed and will be scaled up in the Indian states of Tamil Nadu (where the pilot was developed) and Karnataka, covering more than 75 million people, and nationwide in Indonesia, covering another 145 million.
Note
1. Palaniswamy, Sakhamuri, and Xia (2017).
Reference
Palaniswamy, Nethra, Smriti Sakhamuri, and Cassandra Xia. 2017. “Participatory Tracking: Customizing Visualizations.” Social Observatory (blog), September 2017. http://socialobservatory.worldbank.org/articles/participatory-tracking-customizing-visualizations.
Figure S1.1.1 A citizen-led method to ascertain who has authority in household decision-making in rural Indian villages
Source: Palaniswamy, Sakhamuri, and Xia 2017.
Note: The figure shows a screenshot of a data visualization developed to demonstrate variations in household decision-making. If a member of the family has more authority over a decision, his or her face becomes bigger. If women have more authority, the colored background shades from dark gray toward dark green.
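The spotlight describes the visualization rules only qualitatively: faces grow with authority, and the background shifts from dark gray toward dark green as women’s authority rises. The sketch below shows one way such an encoding could be computed; it is not the Social Observatory’s implementation. It assumes, hypothetically, that each household member has a numeric authority score for a decision, that face size grows linearly with a member’s share of the total, and that “dark gray” and “dark green” correspond to the RGB values chosen here. The names `face_scales`, `background_color`, and `Glyph` are illustrative only.

```python
from dataclasses import dataclass

# Assumed RGB endpoints; the spotlight names the colors but not exact values.
DARK_GRAY = (64, 64, 64)
DARK_GREEN = (0, 100, 0)


@dataclass
class Glyph:
    member: str
    face_scale: float  # 1.0 = baseline face size; larger = more authority


def face_scales(authority: dict[str, float],
                base: float = 1.0,
                max_extra: float = 1.0) -> list[Glyph]:
    """Enlarge each member's face in proportion to their share of authority.

    `authority` maps a household member to a non-negative score for one
    decision (a hypothetical input, e.g. derived from survey answers).
    """
    total = sum(authority.values())
    if total <= 0:
        raise ValueError("authority scores must sum to a positive number")
    return [Glyph(member, base + max_extra * score / total)
            for member, score in authority.items()]


def background_color(women_share: float) -> tuple[int, int, int]:
    """Shade linearly from dark gray (share 0) toward dark green (share 1)."""
    t = min(max(women_share, 0.0), 1.0)  # clamp to [0, 1]
    return tuple(round(g + t * (d - g)) for g, d in zip(DARK_GRAY, DARK_GREEN))


if __name__ == "__main__":
    print(face_scales({"woman": 2, "man": 3}))
    print(background_color(0.5))
```

Run as a script, the example gives the woman a face scale of 1.4 and the man 1.6, and returns the midpoint color (32, 82, 32). A linear scale is the simplest assumption; scaling face area rather than width would be a reasonable alternative if viewers compare faces by area.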
Spotlight 1.2
The importance of good data in helping low- and middle-income countries to manage debt during and after the COVID-19 pandemic
Improvements in the collection, reporting, and monitoring of data about debt will be critical to borrowers and creditors alike.
Data on public debt remain opaque in some countries, potentially enabling governments to overborrow and hide debts from their citizens and creditors, at least for some period (see the example of Mozambique from chapter 1). This vulnerability is compounded by the high (reported) debt levels of lower-income countries at the outset of the COVID-19 crisis and the changing composition of private creditors and debt instruments. In 2019 almost half of all low-income countries were either in debt distress or at high risk of it. As the pandemic pushes as many as 150 million people into extreme poverty,1 countries may need to take on substantial additional debt, which could result in large debt overhangs that could take years to manage.
The proliferation of complex debt instruments may make it easier for governments to obscure their debt position. Moreover, the composition of creditors is changing: China, the largest creditor, increased its share of the combined debt owed by Debt Service Suspension Initiative (DSSI)-eligible countries to G-20 countries from 38 percent to 57 percent between 2013 and end-2019 (figure S1.2.1). These changes could create new exposures, especially regarding access to future debt. While most lower-income countries owe a relatively small share of their external public debt to private creditors, some countries, including Chad, Côte d’Ivoire, Ghana, Saint Lucia, and Zambia, owe as much as 50–60 percent to private creditors. Private sector participation in achieving a sustainable debt trajectory will become increasingly critical for many countries.
Figure S1.2.1 In six years, the composition of debt has shifted dramatically
DSSI-eligible countries’ bilateral debt: Composition of creditors, 2013 and 2019
[Chart: percent of bilateral debt owed to China, Japan, India, other G-20 bilateral creditors, and other bilateral creditors, 2013 and 2019]
Source: World Bank, Debt Data (dashboard), https://datatopics.worldbank.org/debt/. Data at http://bit.do/WDR2021-Fig-S1_2_1.
Note: For more information, see “What Is the External Debt Reporting System (DRS)?” World Bank, Washington, DC, https://datahelpdesk.worldbank.org/knowledgebase/articles/381934-what-is-the-external-debt-reporting-system-drs. DSSI = Debt Service Suspension Initiative.
Recognizing the pressing need to manage the debt burden of low-income countries, the World Bank and International Monetary Fund (IMF) have proposed the DSSI.2 In managing current and future debt, governments and partners will need to improve the coverage, quality, timeliness, granularity, and transparency of debt data.3 This effort requires investing in both the data themselves and in the systems for collecting, managing, analyzing, and reporting data, especially for countries where the risks are greatest. The World Bank and IMF have offered the following recommendations to the G-20:4
• Establish clear and internationally harmonized concepts and definitions of debt. Governments should adopt international reporting and statistical standards that clearly define debt concepts. New standards should be developed where gaps exist to improve the granularity of debt data.
• Strengthen legal frameworks within countries. Stronger legal frameworks have clearly defined organizational structures, roles, and responsibilities; sufficient monitoring, auditing, and compliance mechanisms in place to avoid conflicts of interest; internal controls to make sure that laws, procedures, and policies are followed; and well-coordinated debt management and other fiscal policies and financial frameworks.
• Build a functional debt recording, management, and dissemination system. A robust system is needed to facilitate the production and use of timely, accurate, high-quality, reliable, and complete data. The system should be interoperable with other key applications and be protected by appropriate security and data protection controls. Disseminating debt data openly in user-friendly formats and building the capability of systems to produce reports for borrowers and inform debt service transactions can facilitate the use of data.
• Improve the organizational structure. High-level government commitment and clear mandates are essential.5 Effective coordination and accountability mechanisms can help to disentangle fragmented debt management functions across institutions and entities and ensure the timely sharing of data and analysis. Debt management offices should have the authority to collect data from state-owned enterprises or other indebted public entities.
• Strengthen staff capacity. Highly skilled staff with technical capacity to extract, analyze, and publish debt data are needed and should be retained and adequately compensated.
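The recommendations on granularity and on a functional recording system imply that aggregate indicators, such as the creditor composition in figure S1.2.1, should be derivable directly from loan-level records. The sketch below illustrates that aggregation under stated assumptions: the `LoanRecord` fields, creditor names, and amounts are hypothetical and do not reflect the actual format of the World Bank’s Debt Reporting System. The last helper reproduces the arithmetic behind the change in China’s share cited in this spotlight (38 percent to 57 percent).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanRecord:
    """One hypothetical loan-level record in a debt recording system."""
    loan_id: str
    creditor: str
    outstanding_usd: float
    year: int


def creditor_shares(records: list[LoanRecord], year: int) -> dict[str, float]:
    """Percent of total outstanding debt owed to each creditor in a given year."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        if r.year == year:
            totals[r.creditor] += r.outstanding_usd
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {c: 100 * v / grand_total for c, v in totals.items()}


def share_change(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, relative change) between two shares."""
    diff = after_pct - before_pct
    return diff, diff / before_pct
```

`share_change(38, 57)` returns `(19, 0.5)`: a rise of 19 percentage points, which is an increase of one-half relative to the 2013 share. Reporting both measures avoids confusing percentage-point changes with percent changes.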
Multilateral institutions such as the World Bank and IMF play a critical role in improving debt transparency by improving coverage in the databases they manage, providing technical assistance, performing assessments on a country’s current debt management framework, designing international standards on debt, and building awareness. Lenders, along with borrowers, also have a responsibility to report debt information fully, accurately, and transparently.
All of these steps will help creditors to assess accurately the debt sustainability of their potential borrowers, citizens to hold their governments accountable for the debt they assume, and borrowers to design strategies based on a clear understanding of the level, cost, and risk profile of their debt portfolio. Increased debt transparency will also help many low- and middle-income countries to assess and manage their external debt during and after the COVID-19 crisis and to work with policy makers toward achieving sustainable debt levels and terms.
Notes
1. World Bank (2020b).
2. World Bank (2020a).
3. IMF and World Bank (2018).
4. World Bank (2020a).
5. Teeling (2018).
References
IMF (International Monetary Fund) and World Bank. 2018. “G20 Notes on Strengthening Public Debt Transparency.” IMF, Washington, DC, June 14, 2018. https://www.imf.org/external/np/g20/pdf/2018/072718.pdf.
Teeling, Gerry. 2018. “Debt Data Transparency.” Background paper for Intergovernmental Group of Experts on Financing for Development, 2nd Session, United Nations Conference on Trade and Development, Geneva, November 7–9, 2018. https://unctad.org/system/files/non-official-document/tdb_efd2c03_Teeling_en.pdf.
World Bank. 2020a. “Debt Service Suspension and COVID-19.” Factsheet, December 21, 2020, World Bank, Washington, DC. https://www.worldbank.org/en/news/factsheet/2020/05/11/debt-relief-and-covid-19-coronavirus.
World Bank. 2020b. Poverty and Shared Prosperity 2020: Reversals of Fortune. Washington, DC: World Bank. https://openknowledge.worldbank.org/bitstream/handle/10986/34496/9781464816024.pdf.
CHAPTER 2
Data as a force for public good
Main messages
1. Public intent data, a foundation of public policies, can play a transformative role in the public sector. However, gaps in the availability, quality, and usability of these data are pervasive, particularly in low-income countries, the countries that stand to benefit most from improving public intent data.
2. Lack of resources, technical capacity, and data governance hampers the production of useful data for public policy. Lack of data literacy and demand for data limits their use for public policy.
3. These problems can be addressed through the high-level prioritization of data, including long-term financing, investments in human capital, and laws conducive to the safe production, exchange, and use of data.
4. Some investments in better data have paid for themselves. Ensuring a political commitment to and predictable government financing for the production of public intent data remains a central struggle in lower-income countries. The political will to prioritize funding for data systems can be stimulated by boosting the demand for data.
The central role of public intent data
Suppose a woman walks into a doctor’s office and is given a diagnosis without examination by the doctor: no measurement of her heart rate, no recording of her symptoms, and no review of her medical history. The doctor just prescribes a medication. Such an approach, and such a world in which crucial data are not gathered, analyzed, and acted on, would not be welcome, to say the least.1 Yet all too often governments make decisions affecting people’s well-being without understanding or even taking into account essential data. Designing policies without data is akin to a shot in the dark.2 This problem is particularly acute in the poorest countries, where gaps in both the availability and the use of data are severest.3
Just as data gathered by a doctor can help improve a patient’s diagnosis and ultimate well-being, data gathered by governments, international organizations, research institutions, and civil society can improve societal well-being by enhancing service delivery, prioritizing scarce resources, holding governments accountable, and empowering individuals. These data serve as the foundation for core functions of governments and their endeavors to reduce poverty.
The data a doctor gathers often take the form of a conversation or some other means of communicating information between patient and doctor. In the same way, data gathered with the intent of informing public policy should enrich the policy dialogue and allow for systematic flows of information and communication among governments, their citizens, and commerce. Such flows of information and communication require long-term investments in statistical capacity, infrastructure, data governance, data literacy, and data safeguards. These investments depend on one another. Failure in one area jeopardizes the value that data bring to development.
Too often these investments are not made in the poorest parts of the world, contributing to data deprivations and poverty. How should such deprivations be addressed? This chapter discusses the pathways through which data for public policy generate value for development, the obstacles to safe realization of value, and how those obstacles can be overcome.
Public intent data and development: Three pathways for adding value
Public intent data (data collected with the intent of serving the public good by informing the design, execution, monitoring, and evaluation of public policy, or through other activities) are a prerequisite for many government functions. For that reason, government agencies are the primary producers of public intent data through censuses, surveys, and administrative data, among other things. Citizens, civil society organizations (CSOs), nongovernmental organizations (NGOs), academic institutions, and international organizations also contribute critically to the production of public intent data through surveys, crowdsourcing platforms, and other means. Data from firms can also be used for public policy, a topic covered in chapter 4.4 This chapter distinguishes between six types of public intent data that all serve the public good (box 2.1).
The discussion that follows uses country examples to describe three important pathways through which public intent data can bring value to development by (1) improving service delivery, (2) prioritizing scarce resources, and (3) holding governments accountable and empowering individuals. But these are not the only pathways. Others include regulating the economy and markets, fostering public safety and security, and improving dispute or conflict resolution.
The country examples reveal several conditions that should be in place to maximize the value of public intent data.
The data need to be (1) produced with adequate spatial and temporal coverage (complete, timely, and frequent); (2) high in quality (granular, accurate, and comparable); (3) easy to use (accessible, understandable, and interoperable); and (4) safe to use (impartial, confidential, and appropriate), as summarized in figure 2.1.5 With these features, development-related data have the potential to transform development outcomes. For this potential to be realized, the data must be used explicitly to generate public good, including through the three pathways summarized in the following sections.
Pathway 1: Improving service delivery
Increasing access to government services. One of the fundamental ways in which public intent data can improve livelihoods is by increasing access to government services. More access often requires data representative of all residents. Use of administrative data, particularly from foundational identification (ID) systems such as national IDs and civil registries as well as digital identification, can help ensure that all persons are covered and that access is equitable. In Thailand at the turn of the century, only 71 percent of the population was covered by a public health insurance scheme that was intended to be universal. Yet the country had a near-universal foundational ID and population
Box 2.1 Six types of public intent data
Administrative data (such as birth, marriage, and death records and data from identification systems; population, health, education, and tax records; and trade flow data) are generated by a process of registration or record keeping, usually by national authorities. Administrative data also include data used by governments to run projects, programs, and services. The digital revolution has created new types of administrative data, for example, when education and health inspectors use smartphone apps that channel data to a central register.
Censuses aim to systematically enumerate and record information about an entire population of interest, whether individuals, businesses, farms, or others. Most prominently, population and housing censuses record every person present or residing in a country and provide essential information on the entire population and their key socioeconomic conditions.
Sample surveys draw on a smaller, representative sample of the entire population, typically selected from a census-based sampling frame, to collect detailed information more frequently. They include household surveys, farm surveys, enterprise surveys, labor force surveys, and demographic and health surveys. Key official statistics, such as unemployment rates and national accounts, rely on survey data, often in combination with administrative data and census data.a
Citizen-generated data are produced by individuals, often to fill gaps in public and private sector data or when the accuracy of existing data is in question. These data, which can have an important monitoring and accountability function, contribute to solving problems that citizens face.b Examples include HarassMap, an Egyptian tool that maps cases of sexual harassment based on citizen reports, and ForestWatchers, a platform through which citizens monitor the deforestation of the Amazon.
By contrast, machine-generated data are generated automatically by a sensor, application, or computer process without human interaction. An example is data from sensors that monitor air pollution. These data emerge when devices are embedded with sensors and other technologies that allow them to exchange data with one another, a system known as the Internet of Things.
Geospatial data relate multiple layers of information based on their geographic locale. Public intent geospatial data include satellite imagery of the Earth, such as that provided by the US National Aeronautics and Space Administration’s Landsat program and the European Space Agency’s Copernicus program; weather data; and cadastral (property and land record) data.c
These data types are neither exhaustive nor mutually exclusive. For example, all data sources can be georeferenced and thus can be used in geospatial applications, and some administrative data and geospatial data can be machine-generated. Data sources are interoperable when they can be linked across and within these types through common numeric identifiers for persons, facilities, or firms; geospatial coordinates; time stamps; and common classification standards.
a. Sample surveys also include the surveys that are implemented by social media companies and target a sample of users who are active on their platforms. Examples include the Future of Business and Gender Equality at Home surveys conducted on the Facebook platform.
b. Meijer and Potjer (2018).
c. Such data sources are discussed in greater detail in chapter 4.
Figure 2.1 Certain data features can maximize the value of public intent data
Ensuring the data have adequate coverage
• Completeness
• Timeliness
• Frequency
Ensuring the data are of high quality
• Granularity
• Accuracy
• Comparability
Ensuring the data are easy to use
• Accessibility
• Understandability
• Interoperability
Ensuring the data are safe to use
• Impartiality
• Confidentiality
• Appropriateness
Source: WDR 2021 team, drawing on Jolliffe et al. (forthcoming).
registration system in which citizens and residents were issued a personal ID number when they were born or when their households were registered for the first time. By linking this register with the personal ID information from the existing public insurance scheme, the government identified the population not covered and increased health insurance coverage from 71 percent to 95 percent.6
Machine-generated data also have the potential to markedly improve access to services such as water. In Kenya, sensors on water hand pumps (an estimated one-third of which are inoperable across rural Africa) provide real-time data on their functionality. This system helped reduce the average time to repair a broken pump from 27 days to about three days and the median time from six days to one day (figure 2.2).7
Better preparing for and responding to emergencies. Public intent data can also lead to a better emergency response when disasters hit, whether environmental, financial, health, or conflict related. For example, weather data, especially weather forecasts, can help people anticipate and prepare for extreme events. The value of such data was revealed by two intense cyclones in the Bay of Bengal 14 years apart. The 1999 cyclone caught the Indian state of Odisha by surprise, causing massive devastation, killing more than 10,000 people, and destroying housing and public infrastructure. Since then, the Odisha State Disaster Management Authority and the government of Odisha have invested in weather forecast data and disaster response measures. When another cyclone hit in 2013, nearly 1 million people were evacuated to cyclone shelters, safe houses, and inland locations, and only 38 people died during and immediately after the storm.8 These impressive results would not have been possible without the weather data that gave sufficient advance warning of the cyclone.
Mobile technologies have the potential to speed up emergency responses.
In Uganda, a health reporting program that uses text messaging to provide beneficiaries, health professionals, and the Ministry of Health with real-time health data cut the response time to disease outbreaks by half. The technology was used after the 2012 Ebola outbreak to help implement quarantines and other protective measures.9 As these examples demonstrate, timely data can contribute to quick reactions to a crisis.
Generating useful knowledge. Data generated and used by academic institutions, think tanks, and international organizations play a vital role in ensuring that policies are evidence-based. Impact evaluations of reforms and development projects are frequently used to assess whether past policies have had the intended consequences and to improve program design. In the last few decades, numerous field experiments have tested policies in a real-life setting under strict statistical conditions that allow cause and effect to be ascertained. Findings from such experiments have been used to implement new policies and scale up existing programs.
One estimate suggests that the new policies and programs built on the research findings have reached more than 400 million people worldwide.10 In Brazil, evidence from 2,150 municipalities found that many mayors are willing to pay to learn the results of impact evaluations, and that informing mayors about research on a simple and effective policy increases the probability that their municipality implements the policy by 10 percentage points.11
Research also plays an important role in ensuring the accuracy of the data collected by governments, which is critical to preventing policy recommendations based on inaccurate or misleading data.12 The World Bank’s Living Standards Measurement Study (LSMS) program, while supporting the production of household survey data in 106 countries between 2011 and 2020,13 has also drawn attention to the importance of research on survey methodologies and the role of better measurement in eliminating systematic measurement errors in self-reported survey data that otherwise bias empirical analyses and policy conclusions.14 Much of the methodological research led by the LSMS is carried out in partnership with national statistical offices (NSOs), in turn facilitating the adoption of improved methods in downstream national surveys.
Figure 2.2 Improving access to water: Using real-time sensor data to reduce repair time for broken hand pumps in Kenya
Mean repair time: 27 days at baseline; 2.6 days with mobile-enabled data
Median repair time: 6 days at baseline; 1 day with mobile-enabled data
Source: SSEE 2014. Data at http://bit.do/WDR2021-Fig-2_2.
Pathway 2: Prioritizing scarce resources
Targeting resources and reaching marginalized populations and areas. When public intent data are granular (that is, tied to an individual or a specific location), they can help target resources and foster inclusion. In Croatia, data from the population census were
  • 77. Data as a force for public good | 57 combined with household survey data and admin- istrative data to create detailed maps of poverty and deprivations (map 2.1).15 The maps revealed large dif- ferences in living standards across municipalities and within the territorial boundaries used for allocating funds from the European Union (EU).16 More than one-third of the EU’s annual budget—equivalent to more than €50 billion—is dedicated to investments in infrastructure, such as hospitals and schools, in less economically developed areas. Because the allocation of funds depends on an area’s gross domestic product (GDP) per capita, poor municipalities situated in non- poor regions may not receive funding. Armed with the poverty map, Croatia responded with proposals for new geographical subdivisions that concentrate EU funds in the poorest areas.17 This reordering, thanks to better data and analysis, has the potential to reduce inequality and pockets of poverty in Croatia. A long-running and rich example of the value of granular data are the Demographic and Health Surveys, which cover topics such as HIV/AIDS and gender-based violence (see spotlight 2.1). Over the last few decades, data from 82 of these surveys, disaggre- gated by sex, have been used as inputs for developing laws banning domestic violence, developing HIV education programs, and more.18 In Vietnam, a survey on gender-based violence revealed that more than half of women have experienced physical, sexual, or emotional abuse; that nearly half of these had physi- cal injuries as a result; and that seven in eight did not seek any help. These data spurred a public discussion about the topic, informed the National Strategy on Gender Equality, and introduced counseling, health, legal, and shelter services for women subject to vio- lence at home.19 Saving money and resources. Interoperability between geospatial data and government records can help governments save resources. 
Incomplete and out-of-date property and taxpayer records are an important reason that taxes remain uncollected in many low- and middle-income countries. In Tanzania, the government introduced a Geographic Information System for tax reporting and revenue collection. The system identified buildings via satellite imagery, collected and digitized data on their characteristics, and provided a comprehensive, up-to-date record of taxable properties. Using this new method, the city government of Arusha identified 102,904 buildings, nearly five times as many as earlier databases recorded. One year after the system was introduced, the eight participating cities had increased their revenue collection by 30 percent on average.20
Interoperable administrative data have also been used to increase efficiency and save costs in public welfare systems. For example, in Argentina the government identified ineligible beneficiaries across various social programs using the country's system of unique taxpayer ID numbers. The exercise generated estimated savings of US$143 million over eight years.21 More generally, investments in better data systems have been shown to pay for themselves.22
Monitoring progress and determining priorities. Public intent data can also help prioritize resources by monitoring progress on key indicators and deliverables over longer periods of time. Such monitoring is vital for creating and tracking national and international development goals. The Sustainable Development Goals (SDGs), for example, rely heavily on public intent data.23 If the data needed to measure one of the targets were collected only every 10 years, tracking progress would become challenging.
Cross-country comparable composite indexes are often created by think tanks, research institutions, and international organizations. They allow countries to benchmark their performance over time and against peers and to decide on priorities.
Such indexes can induce countries to respond with reforms in areas where they are lagging. Multidimensional poverty indexes measure poverty at the household and individual levels. They track certain indicators in countries over time, helping countries decide on areas of focus. Costa Rica issued a presidential directive calling for use of such an index in three roles: for budgetary planning, as an official measure for allocating resources, and for monitoring and evaluating social programs. The country has used the index to modify its budget allocation, which helped accelerate poverty reduction during a period of austerity without an increase in budget.24
Map 2.1 Reducing poverty: Mapping pockets of poverty in Croatia allowed better targeting of antipoverty funds. Source: World Bank 2016a. Data at http://bit.do/WDR2021-Map-2_1.
Pathway 3: Holding government accountable and empowering individuals
Fostering transparency and increasing government accountability. CSOs and individuals are frequent producers and users of public intent data. Their demand for data can encourage transparency through data analysis and data feedback systems. In China, media and watchdog organizations in Beijing noted inconsistencies between official government data on air quality and data from independent air quality monitoring systems. Heightened concerns about air quality have fueled a dramatic expansion in publicly available, real-time data from thousands of air quality monitoring locations.25 The central government launched a US$275 billion plan to improve air quality throughout the country, and the Beijing municipal government promised an additional US$160 billion toward that goal.26
Good data can also encourage transparency in, and improve, public procurement. Too often, public projects are not implemented adequately because of poor procurement, such as inflated costs, corruption, or ghost contracts. Because 12 percent of global GDP is spent on public procurement, these failures matter tremendously for development outcomes.27 In Uganda, local government entities sought to improve procurement outcomes. They made administrative procurement data, from the bidding process down to the execution of contracts, available to certain CSOs.
These CSOs trained community members to understand the information in the contracts and to conduct site checks to verify it. The findings revealed mismanagement of resources by contractors and government officials and a high dependence on noncompetitive contracts. Uganda undertook reforms to ensure that contracts complied with national procurement standards. In addition, the national public procurement agency upgraded its procurement portal in line with international open contracting data standards, making Uganda the first African country to do so.28
Government accountability can also be enhanced through e-governance.29 In Pakistan, a smartphone app that equips government health inspectors with real-time data on rural public health clinics led to a 74 percent increase in clinic inspections. In turn, doctor attendance rose by 18 percentage points, improving health care services.30
Empowering individuals. Disadvantaged groups are sometimes left out of government efforts to collect data because governments fail to acknowledge the inclusion of those groups as a policy objective. Citizens must then often collect the data needed to empower themselves. Such data, for example on harassment or for early warning systems, can fill a gap that neither the public sector nor the private sector fills. The map-based mobile app Safetipin allows users to report mobility and safety issues in cities related to lighting, walk paths, visibility, public transport, and security. Beyond informing citizens where it is safe to be in their city, these data can be used to conduct citywide audits. In Bogotá, Colombia, the city government wanted to use this tool to map safety around bike paths. The biking community helped collect images along 230 kilometers of bike paths in the city, which were then analyzed by Safetipin (map 2.2).
This analysis helped the authorities understand where to improve lighting and add closed-circuit TV cameras.31
Public intent data can also empower individuals to make better choices by giving them more information and knowledge. The digital revolution has greatly increased the accessibility of data and the ease with which information can be spread. One example is providing smallholder farmers with agricultural information digitally, often through text messages, to increase their productivity. Such data transmission can improve on extension services, which rely on in-person agricultural advice. Extension services are more costly to sustain, and their quality is more difficult to ensure. A meta-analysis suggests that providing agricultural information increases yields by 4 percent. It also raises the probability that farmers adopt productivity-enhancing agrochemical inputs.32 With more than 2 billion people living on smallholder farms, these effects can have major consequences for global poverty and shared prosperity.
Many of the features of public intent data that increase their value for development can also increase their potential for harm. Data may be misused for political surveillance and control or for discrimination and exclusion, or they may inadvertently expose sensitive information about individuals.33 For example, in República Bolivariana de Venezuela, a digital biometric fingerprint system was introduced initially for voter registration and identification, but it has since been integrated with other registers. Identification with the digital fingerprint has become mandatory to purchase basic goods such as food and medicine. This has led to numerous cases of stores refusing to sell such goods to young people, foreigners, and LGBTQI individuals.34 To prevent data from being harmful in this and other ways, certain prerequisites must be put in place. Notable among them are robust data protection laws, independent oversight, and legal and technological solutions to safeguard the confidentiality of individuals and prevent misuse of data.
Map 2.2 Improving public safety: The use of citizen-collected data in Bogotá led to greater safety around bike paths. Source: Safetipin 2016. Note: Safety ratings of poor to excellent for Bogotá bike paths are based on safety scores.
Gaps in the coverage, quality, and usability of public intent data
Despite the demonstrated value of public intent data, gaps in their availability, quality, and usability persist, particularly in poor countries. This section documents these gaps. It draws on the World Bank's Statistical Performance Indicators (SPI), described in box 2.2, as well as two other prominent indexes rating public intent data availability and quality.35
When the coverage of data is inadequate: Lack of timeliness, frequency, and completeness
Lack of timely and frequent data remains an issue in many thematic areas and across all types of public intent data. Timeliness is particularly an issue with survey and census data because long lags commonly occur between their collection and their release. For example, according to the Statistical Performance Indicators, half of low-income countries have not undertaken a population and housing census in the last 10 years, and 18 percent have not done so in the last 20 years.36 The census has a foundational function in any statistical system and is critical for political representation and resource allocation.
The costs of allowing the census to become outdated are demonstrable.37 Monthly or quarterly industrial production indexes are important for tracking current economic activity. They are available in only 9 percent of low-income countries, compared with 40 percent of lower-middle-income countries, 48 percent of upper-middle-income countries, and 64 percent of high-income countries.38 Ground-based sensors, deployed in Internet of Things systems, can measure some outcomes, such as air pollution, climatic conditions, and water quality, on a continual basis and at a low cost. However, adoption of these technologies is still too limited to provide timely data at scale, particularly in low-income countries.39
Lack of completeness is often less of a problem in census and survey data because they are designed to cover the entire population of interest. For administrative data, the story is different. Civil registration and vital statistics systems (births and deaths) are not complete in any low-income country. By comparison, they are complete in 22 percent of lower-middle-income countries, 51 percent of upper-middle-income countries, and 95 percent of high-income countries.40 These gaps leave about 1 billion people worldwide without official proof of identity.41 More than one-quarter of children under the age of five worldwide, and more than half of those in Sub-Saharan Africa, have not had their births registered.42
Although population and housing censuses are designed to represent all individuals at the time of the census, they can leave out some of the poorest and most vulnerable. Many vulnerable groups are hard to count in the first place, especially when census enumeration focuses on residence and the concept of the household. These groups include the displaced, the homeless, slum inhabitants, nomads, migrants, young children, and the disabled.43 The extent of undercounting is difficult to measure systematically. However, in 2013 it was estimated that globally between 170 million and 320 million people were missing from population census frames, with the poorest more likely to be missed.44 As noted, in many countries the census determines the allocation of resources and political representation. These omissions therefore have real consequences and can disenfranchise vulnerable populations.45 They also affect the representativeness of household surveys that use census-based sampling frames.46
Lower-income countries are also susceptible to coverage gaps in geospatial data, especially in geospatial reference datasets such as administrative boundaries, postal codes, and maps. The Global Open Data Index of the Open Knowledge Foundation assesses the availability and openness of three such geospatial datasets in 94 countries: administrative boundaries, addresses and locations, and national maps. The assessment reveals that all three datasets are often incomplete in lower-income countries (figure 2.3). Similarly, the road network coverage of the open mapping platform OpenStreetMap is complete in many high-income countries, but less so in lower-income countries. OpenStreetMap is a citizen-generated geospatial application that relies on its users to digitize the location of roads and other infrastructure. Its coverage disparities reflect the barriers to making this type of data work for the poorest countries.
In India, by 2015 only 21 percent of the road network had been digitized.47
Figure 2.3 Gaps in geospatial datasets are especially large in lower-income countries [chart: share of countries (%) with gaps in administrative boundaries, addresses and locations, and national maps datasets, by country income group]. Source: WDR 2021 team calculations, based on data of Open Knowledge Foundation, GODI (Global Open Data Index) (database), https://index.okfn.org/. Data at http://bit.do/WDR2021-Fig-2_3.
Box 2.2 The World Bank's Statistical Performance Indicators
The World Bank's Statistical Performance Indicators (SPI) measure statistical performance across 174 countries.a The indicators are grouped into five pillars:
(1) data use, which captures the demand side of the statistical system;
(2) data services, which looks at the interaction between data supply and demand, such as the openness of data and the quality of data releases;
(3) data products, which reviews whether countries report on important indicators;
(4) data sources, which assesses whether censuses, surveys, and other data sources are created; and
(5) data infrastructure, which captures whether foundations such as financing, skills, and governance needed for a strong statistical system are in place.
Within each pillar is a set of dimensions, and under each dimension is a set of indicators to measure performance. The indicators provide a time series extending at least from 2016 to 2019 in all cases, with some indicators going back to 2004. The data for the indicators are from a variety of sources. These include databases produced by the World Bank, International Monetary Fund (IMF), United Nations (UN), Partnership in Statistics for Development in the 21st Century (PARIS21), and Open Data Watch, and in some cases national statistical office websites directly.
The indicators are also summarized as an index, with scores ranging from a low of 0 to a high of 100.
a. World Bank, Statistical Performance Indicators (database), http://www.worldbank.org/spi; Dang et al. (2021a, 2021b).
When data quality is poor: Lack of granularity, accuracy, and comparability
Lack of granularity can occur when data are not available at the desired level of disaggregation. The gaps in data on women and girls are particularly severe. Only 10 of the 54 gender-specific indicators (19 percent) in the SDGs are widely available, based on international standards for measurement. Only 24 percent of the available gender-specific indicators are from 2010 or later.48 Gaps in sex-disaggregated data related to the COVID-19 pandemic are also pervasive, leaving knowledge of the gender impacts of the pandemic incomplete (box 2.3).
Box 2.3 Gender data and the COVID-19 pandemic
The COVID-19 pandemic was not gender-blind; it affected men and women differently and may have exacerbated gender inequalities.a Yet knowledge of the gender impacts of COVID-19 is incomplete because of data gaps across all dimensions of well-being. At the most basic level, data are lacking on COVID-19 infections and deaths among men and women. In March 2020, only 61 percent of reported COVID-19 cases were disaggregated by sex, and these data were provided by 26 countries. By November 2020, reporting had grown to 80 countries, but the proportion still stood at 60 percent. Reporting was irregular throughout 2020, as shown in figure B2.3.1.
Understanding the gender dimensions of the COVID-19 impacts extends well beyond case and mortality data. The data systems in place before the pandemic had notable gender data gaps that hampered the ability to track impacts and inform policy. For example, monitoring impacts on jobs requires regular and timely data on informal employment, where women predominate. However, only 41 percent of low-income countries (LICs) and lower-middle-income countries (LMICs) report data on informal jobs disaggregated by sex.
In seven of the 10 countries where the recent economic contraction is most severe, less than 38 percent of Sustainable Development Goal economic opportunity indicators are available by sex.b Furthermore, preexisting biases in face-to-face household survey design and implementation carried over into phone surveys implemented during the pandemic, limiting measurement of the gender-related impacts of the crisis. These biases include designing phone surveys aimed at household heads and a lack of survey content on time use.
There are also notable gaps in the gender data needed to inform policy design and effectiveness. The expansion of social protection programs is arguably the largest policy response to offset the economic impacts of the crisis. Even so, comparable sex-disaggregated measures of social protection coverage are largely unavailable. Data on personal identification cards and mobile phone ownership should inform program design decisions, especially as countries scale up digital platforms. Yet data on gender differences in ownership of personal identity cards are missing for more than a third of countries. Less than a quarter of LICs and LMICs report data on mobile phone ownership by women.c
Even though the pandemic created new demands for statistics, it also interrupted their supply. More than half of LICs and LMICs reported that the COVID-19 pandemic affected national statistical offices' ability to produce socioeconomic statistics.d This problem requires immediate attention. Building effective, gender-aware data systems, however, will require sustained financial and human capital investments.
Sources: Mayra Buvinic (Center for Global Development), Lorenz Noe (Data2x), and Eric Swanson (Open Data Watch), with inputs from the WDR 2021 team.
a. UN Women (2020). b. Buvinic, Noe, and Swanson (2020). c. Buvinic, Noe, and Swanson (2020). d. UNSTATS and World Bank (2020).
Figure B2.3.1 Proportion of COVID-19 cases reported with sex-disaggregated data for 190 countries [chart: percent, March to November 2020, by whether reporting was sex-disaggregated (yes/no); axis label: proportion of countries reporting sex-disaggregated data]. Sources: Global Health 50/50, University College London, COVID-19 Sex-Disaggregated Data Tracker (database), November 30, 2020, data release, https://globalhealth5050.org/the-sex-gender-and-covid-19-project/; Global Change Data Lab, University of Oxford, Our World in Data, Coronavirus Pandemic (COVID-19) (database), https://ourworldindata.org/coronavirus; calculations of Open Data Watch, Washington, DC. Data at http://bit.do/WDR2021-Fig-B2_3_1.
Data disaggregated at the individual level are central to understanding and addressing conditions that uniquely affect the lives of women, men, children, adults, the elderly, and persons with disabilities. Yet these data are not produced in sufficient quantity. For example, survey data on ownership of physical and financial assets have traditionally been collected at the household rather than the individual level. This limits their usefulness in understanding women's relative wealth, rights, and decision-making power in their families.49 Monetary poverty estimates are also based on household-level measures of resources. "Poor individuals" are identified based on the poverty status of their entire households, regardless of differences within households among women, men, and children in access to and use of resources.50 Meanwhile, gaps remain in the adoption and proper implementation of the survey questions developed by the Washington Group on Disability Statistics. These questions are critical for obtaining internationally comparable estimates on disabilities and for disaggregating relevant SDG indicators by disability status.51
Finally, despite the enormous potential of geographically granular data for targeting policies effectively, such disaggregated data are rarely available comprehensively. According to the 2020 Open Data Inventory, about 90 percent of official statistics, even when they are available, are not consistently reported at the regional level (first administrative division). Almost none are consistently reported at the district level (second administrative division).
Poor accuracy of data can limit their usefulness. For those collecting individual-level data through household surveys, a concern is the choice of survey respondents.
Household surveys often rely on proxy respondents to elicit individual-level information, a common cost-saving mechanism in large-scale surveys. This practice has been shown to produce wrong estimates of gender differences in asset ownership, labor market outcomes, decision-making, and control of income.52 Reported levels of income, wages, and firm profits vary, depending on the length of the period over which survey respondents are asked to recall them.53 The length of recall also matters for the accuracy of survey data on agricultural production, health, and labor.54
Accuracy is also a concern for administrative data. One reason for the proliferation of survey data is the perception that administrative records are unreliable and incomplete.55 A study of multiple African countries found overreporting of vaccination rates in health information systems by 5 percent of countries and of primary enrollment rates in education management systems by a third. This data inflation appears to be connected to making aid flows conditional on results, which creates an incentive to misreport.56
Data quality concerns and methodological challenges also characterize data produced by the Internet of Things. For example, the quality of data generated by low-cost commercial sensors used for air pollution monitoring has been found to vary widely when benchmarked against reference measurements.57 Sensors must be calibrated to the specific conditions in which they are used to yield accurate results, but the calibration process remains expensive and time-consuming.58
Lack of comparability is particularly a concern among low-income countries. Only 40 percent of low-income countries, 20 percent of countries in fragile and conflict-affected situations (FCS), and 40 percent of countries in Sub-Saharan Africa (figure 2.4) have at least three comparable estimates of extreme poverty.59 It is therefore difficult to understand changes in living standards over time and to design policies to eradicate poverty.
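To make the sensor calibration step described above more concrete, here is a minimal Python sketch. It is not a method taken from this report. It assumes that a low-cost sensor has been placed next to a reference monitor and fits a single linear correction (reference ≈ slope × raw + intercept) by ordinary least squares. The function names and sample readings are hypothetical. Real calibrations may also need to account for conditions such as humidity and temperature, which this sketch ignores.

```python
def fit_linear_calibration(raw, reference):
    """Fit reference ~= slope * raw + intercept by ordinary least squares.

    raw: readings from the low-cost sensor
    reference: readings from a co-located reference monitor at the same times
    """
    n = len(raw)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_raw = sum(raw) / n
    mean_ref = sum(reference) / n
    # Sum of squared deviations of raw readings, and sum of cross-deviations
    sxx = sum((x - mean_raw) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings must vary to fit a slope")
    sxy = sum((x - mean_raw) * (y - mean_ref) for x, y in zip(raw, reference))
    slope = sxy / sxx
    intercept = mean_ref - slope * mean_raw
    return slope, intercept


def apply_calibration(raw, slope, intercept):
    """Convert raw sensor readings into calibrated estimates."""
    return [slope * x + intercept for x in raw]


if __name__ == "__main__":
    # Hypothetical paired readings (e.g., PM2.5 in micrograms per cubic meter)
    raw = [10.0, 20.0, 30.0, 40.0]
    reference = [7.0, 15.0, 23.0, 31.0]
    slope, intercept = fit_linear_calibration(raw, reference)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=0.80, intercept=-1.00
    print(apply_calibration([25.0], slope, intercept))
```

The fitted correction reflects only the conditions under which the paired readings were collected. In this simple form, it would likely need to be refit whenever a sensor is moved or conditions change.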
Recent innovations in data collection in low-income countries, FCS, and Sub-Saharan Africa suggest a slightly more optimistic picture for the future.60 It is also important to note that some lack of comparability over time is necessary, particularly when adopting new global standards.
When data are not easy to use: Lack of accessibility, understandability, and interoperability
Lack of data accessibility prevents actors from using data. According to an assessment of the Open Data Inventory, lower-income countries lag far behind in overall data openness (table 2.1), although even high-income countries have mediocre openness scores. Only 11 percent of low-income countries consistently make data available with a license classifiable as open. The corresponding shares are 19 percent of lower-middle-income countries, 22 percent of upper-middle-income countries, and 44 percent of high-income countries. The Open Data Inventory assessment also reveals some limitations to machine readability. Among governments that publish official statistics, only 37 percent of low-income countries make at least some of these available in machine-readable formats. The figure is 51 percent for lower-middle-income countries, 61 percent for upper-middle-income countries, and 81 percent for high-income countries.
One reason for the lack of data accessibility is that data systems in the public sector can be very fragmented. The health sector, for example, often has many different health information systems because it tends to have many different service providers. These include many private providers whose data are often
unavailable to the Ministry of Health. In Ethiopia, a study of the health sector found 228 different digital health information applications, of which only 39 percent sent data to the Ministry of Health.61 Administrative data, in particular, are too often siloed in different systems, preventing their effective use for monitoring and policy design. Although data coordination within agencies is often limited, the challenge of siloed systems is even greater across government agencies.62
Lack of understandability prevents even accessible data from generating value. To be understandable, data must be well disseminated, backed up with sufficient metadata, and responsive to user needs. For certain purposes, they must also be summarized and visualized for the user. A majority of countries have data portals and provide metadata for their published data, practices that facilitate wider data use.63 Low-income countries perform comparatively well in the data portal and metadata categories, but even here they lag. A larger gap remains in terms of advance release calendars, which commit government units to release data on a predetermined timetable. Only 30 percent of NSOs in low-income countries publish such calendars, compared with almost all high-income countries. Across the board, only a few NSOs use user satisfaction surveys, which could play an important role in gauging and understanding data demand (table 2.2).
Figure 2.4 Lower-income countries, especially those affected by fragility and conflict, have less comparable poverty data than other country groups [chart: share of countries (%) by income group, region, and FCS status, split into "less than three poverty data points," "last three poverty data points not comparable," and "last three poverty data points comparable"]. Source: WDR 2021 team calculations, based on World Bank, PovcalNet: Data (database), http://iresearch.worldbank.org/PovcalNet/data.aspx. Data at http://bit.do/WDR2021-Fig-2_4. Note: Only those economies with at least one international poverty estimate are included. FCS status refers to the World Bank's "Classification of Fragile and Conflict-Affected Situations" (World Bank 2020a).
Table 2.1 Assessment of the openness of data, by country income group
Indicator | Low-income | Lower-middle-income | Upper-middle-income | High-income
Openness score (0–100) | 38 | 47 | 50 | 66
Available in machine-readable format (%) | 37 | 51 | 61 | 81
Available in nonproprietary format (%) | 75 | 85 | 81 | 84
Download options available (%) | 56 | 68 | 68 | 78
Open terms of use/license (%) | 11 | 19 | 22 | 44
Source: WDR 2021 team calculations, based on 2020/21 Open Data Inventory indicators (Open Data Watch, ODIN [Open Data Inventory] [database], https://odin.opendatawatch.com/), also used as part of the World Bank's Statistical Performance Indicators database, http://www.worldbank.org/spi.
Note: The openness score is the average by country income group on a scale of 0–100. All other indicators are the percentage of published data averaged by country income group.
Limitations to interoperability. The use of common standards, methodologies, and classifications across public intent data sources ensures interoperability and enables data integration. Common and unified identification is needed across producers of public intent data for geographic divisions below the national level, such as regions, states, and districts. There is significant scope for expanding the use of georeferencing in censuses, surveys, and collection of administrative data, particularly in low-income settings. The use of common and unified personal identifiers to match data across multiple data sources is more contentious because of privacy and equity concerns. Robust data protection legislation is a prerequisite for their use.64 Personal identification also requires trust and comprehensive civil registration and vital statistics systems, which have so far been elusive in the poorest countries. The use of tokenized identifiers in line with privacy-by-design principles is a potential solution.65
Adhering to set methodologies and standards in line with international best practices greatly increases the interoperability and usability of public intent data. The World Bank's Statistical Performance Indicators capture this aspect of public intent data systematically. Under the indicator on data infrastructure, standards related to systems of national accounts, employment status, consumption, consumer price indexes, and government finance statistics, among others, are assessed.
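One common way to implement the tokenized identifiers mentioned above is a keyed hash. The minimal Python sketch below shows how such a scheme could work; it is an assumption, not a design prescribed by this report. In this sketch, a trusted custodian holds a secret key, and each agency replaces raw personal IDs with HMAC-SHA256 tokens before sharing. Records are then linked on the tokens rather than on the IDs themselves. The key, the normalization rule, and the record layout are all hypothetical.

```python
import hashlib
import hmac


def tokenize(personal_id: str, secret_key: bytes) -> str:
    """Return a pseudonymous token for a personal ID using HMAC-SHA256.

    The same ID and key always yield the same token, so datasets can be
    linked; without the key, tokens cannot be recomputed from guessed IDs.
    """
    normalized = personal_id.strip().upper()  # hypothetical normalization rule
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize(records, secret_key):
    """Replace raw IDs with tokens in (personal_id, attributes) records.

    Assumes at most one record per ID; later duplicates overwrite earlier ones.
    """
    return {tokenize(pid, secret_key): attrs for pid, attrs in records}


def link(dataset_a, dataset_b):
    """Join two pseudonymized datasets on the tokens they share."""
    return {token: (dataset_a[token], dataset_b[token])
            for token in dataset_a.keys() & dataset_b.keys()}


if __name__ == "__main__":
    KEY = b"held-only-by-a-trusted-custodian"  # hypothetical secret key
    health = pseudonymize([("ab123", {"vaccinated": True}),
                           ("cd456", {"vaccinated": False})], KEY)
    schools = pseudonymize([(" AB123", {"enrolled": True})], KEY)
    for token, (h, s) in link(health, schools).items():
        print(token[:12], h, s)
```

Tokens hide direct identifiers only. The remaining attributes in a linked record can still allow re-identification, a risk discussed below in connection with deidentified medical records.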
The indicator shows a strong income gradient in adherence to international best-practice standards and methodologies.66 For example, the International Classification of Status in Employment is being used in two-thirds of high-income countries but in only 7 percent of low-income countries (figure 2.5). By contrast, a large share of all countries globally is using at least the 1993 international standards for the System of National Accounts (SNA 1993).

Table 2.2 Data dissemination practices and openness, by country income group

| Indicator | Low-income | Lower-middle-income | Upper-middle-income | High-income |
| NSO uses advance release calendar | 30 | 75 | 92 | 98 |
| NSO has data portal | 84 | 91 | 95 | 92 |
| NSO has conducted user satisfaction survey | 10 | 20 | 19 | 33 |
| NSO makes metadata available | 63 | 91 | 97 | 100 |

Source: Cameron et al. 2019.
Note: Data are for 2019. The percentages reflect the proportion of the population in each income group whose national statistical office (NSO) has the listed attribute.

Figure 2.5 Lower-income countries are less likely than other countries to adhere to international best-practice statistical standards and methodologies
[Figure: share of countries (%) with at least the 1993 System of National Accounts in use and with the International Classification of Status in Employment in use, by income group (low-income, lower-middle-income, upper-middle-income, high-income).]
Source: WDR 2021 team calculations, based on World Bank, Statistical Performance Indicators (database), http://www.worldbank.org/spi. Data at http://bit.do/WDR2021-Fig-2_5.

When data are not safe to use: Lack of impartiality, confidentiality, and appropriateness for development

Gaps also remain in the safety of data. These can occur when data are not immune to influence from stakeholders, when they are not stored securely, or when they are not properly deidentified. For example, Greece's debt statistics appear to have deliberately misrepresented the country's financial situation in the lead-up to the 2009 euro crisis, and data breaches are all too common in government and private sector databases.67

Similarly, deidentifying individuals has not always proved to be enough to maintain confidentiality. In the 1990s, the governor of Massachusetts in the United States approved making deidentified medical records of state employees available for researchers. Although key identifiers such as name and address were removed from the data, by triangulating the information available with other public information a researcher was able to identify the medical records of the governor and other individuals (see chapter 6 for more details).68 One way to minimize these concerns is to ensure that only appropriate data are produced: data that measure concepts of interest, have a clear policy purpose, and are not produced from attempts to collect excessive information or surveil individuals. Such data, of course, can still be misused and mishandled.

Why data gaps persist: The political economy of public intent data

The previous two sections describe how public intent data can yield great value for development, yet gaps in public intent data are severe, particularly in low-income countries, the countries that stand to benefit most from the data. Why do these data gaps persist? This section answers that question, complementing existing data sources with structured interviews with NSOs across all income groups and geographical regions.69 This approach requires digging one level deeper and understanding the main roadblocks on the pathways to data for public policy or, conversely, the enablers of public intent data. The main roadblocks identified are lack of financing, technical capacity, data governance, and demand for data (figure 2.6). A common reason for these roadblocks is lack of understanding of and commitment to the use of data for policy making.
In a positive feedback loop, realizing the value of public intent data increases understanding of the potential of the data, leading to a commitment to the further production and use of public intent data. To spearhead such commitments, SDG Target 17.18 calls for increasing the availability of high-quality, timely, and disaggregated reliable data, and SDG Target 17.19 calls for developing measurements of progress related to statistical capacity building.

Deficiencies in financing

Underinvestment and misaligned investment priorities are perpetuating data gaps.

Figure 2.6 A positive feedback loop can connect enablers and features of public intent data with greater development value
[Figure: enablers of public intent data (financing, technical capacity, governance, data demand) lead to desirable features of public intent data: adequate coverage (completeness, timeliness, frequency); high quality (granularity, accuracy, comparability); easy to use (accessibility, understandability, interoperability); and safe to use (impartiality, confidentiality, appropriateness). These features generate the value of public intent data: improved service delivery, prioritization of scarce resources, and accountability and empowerment.]
Source: WDR 2021 team.
Underinvestment by governments. Underinvestment in public intent data systems is widespread. Only half of countries had a national statistical plan that was fully funded in 2019 (figure 2.7).70 Lack of national funding for statistics is especially a struggle for fragile and conflict-affected countries, countries in Sub-Saharan Africa, and low-income countries. Whereas 93 percent of high-income countries have a fully funded national statistical plan, not a single low-income country has one. A recent review of public financing of statistics found that seven of 10 low- and middle-income countries analyzed funded less than half of their respective national statistical plans, with country contributions ranging from 9 percent to 77 percent.71 This problem is more pressing in low-income countries, which have less government revenue to spend on multiple priorities. However, the cost of public data systems is modest relative to that of other government functions. Decision-makers in budget offices may not fully understand how much funding is needed to produce high-quality data or may lack the incentives to prioritize data. How well public data systems are funded is thus also a matter of high-level government officials recognizing the value of public intent data and offering leadership to encourage their collection.72 A key factor in such an effort is the perceived relevance and credibility of public intent data and their producers.73

Another reason for lack of funding for data is the absence of a benchmark guiding how much governments should spend, unlike for other areas of government spending. For example, the Education 2030 Framework for Action urges countries to allocate at least 4–6 percent of GDP or at least 15–20 percent of their total public expenditure to education. The Abuja Declaration urges countries to spend at least 15 percent of their annual budget to improve the health sector.74 No similar guidelines exist for data.
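Unlike education or health, data spending has no benchmark. A government deriving one could follow the two routes proposed later in this chapter: the resources needed to fully fund the national statistical plan, or the spending of adequately funded peer countries. The minimal sketch below shows the arithmetic under stated assumptions. All amounts are hypothetical, the function names are illustrative, and taking the median of peer shares is one reasonable choice, not a method the report prescribes.

```python
from statistics import median

def plan_coverage(domestic_funding: float, plan_cost: float) -> float:
    """Percent of the national statistical plan covered by domestic funding."""
    if plan_cost <= 0:
        raise ValueError("plan_cost must be positive")
    return 100.0 * domestic_funding / plan_cost

def share_to_fund_plan(plan_cost: float, total_expenditure: float) -> float:
    """Route 1: percent of total public expenditure that would fully fund the plan."""
    if total_expenditure <= 0:
        raise ValueError("total_expenditure must be positive")
    return 100.0 * plan_cost / total_expenditure

def peer_benchmark(peer_shares: list[float]) -> float:
    """Route 2: median percent of expenditure spent by adequately funded peers."""
    if not peer_shares:
        raise ValueError("need at least one peer share")
    return median(peer_shares)

# Hypothetical country: amounts in US$ millions, shares in percent.
plan_cost, domestic, total_exp = 30.0, 12.0, 6000.0
print(plan_coverage(domestic, plan_cost))                  # 40.0
print(round(share_to_fund_plan(plan_cost, total_exp), 2))  # 0.5
print(peer_benchmark([0.3, 0.4, 0.6]))                     # 0.4
```

In this hypothetical case, domestic funding covers 40 percent of the plan, which would place the country among those the cited review found funding less than half of their plans. Fully funding the plan would take 0.5 percent of total expenditure, slightly above the hypothetical peer median of 0.4 percent.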
Figure 2.7 Most countries do not fully fund their national statistical plans
[Figure: share of countries with a fully funded plan (%), by income group (low-income, lower-middle-income, upper-middle-income, high-income), region (Sub-Saharan Africa, Latin America and the Caribbean, East Asia and Pacific, South Asia, Middle East and North Africa, Europe and Central Asia), and FCS status (FCS, non-FCS).]
Source: WDR 2021 team calculations, based on indicators collected by the Partnership in Statistics for Development in the 21st Century (PARIS21) that are also used as Statistical Performance Indicators (World Bank, http://www.worldbank.org/spi). Data at http://bit.do/WDR2021-Fig-2_7.
Note: Having a fully funded national statistical plan under implementation is Sustainable Development Goal Indicator 17.18.3. FCS = fragile and conflict-affected situations.

Underinvestment by donors. Donors also invest relatively little in public intent data. The share of total official development assistance devoted to statistics has ranged between 0.35 percent and 0.4 percent in recent years, amounting to US$693 million in 2018.75 The combination of national and donor contributions leaves a funding gap of between US$100 million and US$700 million a year globally to upgrade public intent data systems, depending on the scope of improvements.76

Misalignment of investment priorities. Beyond the size of investments in public intent data, how donors invest matters as well. With insufficient government funding of data and with donors stepping in to fill needs, the risk is that donor priorities will be funded at the expense of national priorities and that donors, instead of national stakeholders, will become the main clients of NSOs.77 Because investments in data tend to be small, donors have limited incentives to make longer-term commitments that strengthen data systems, such as technical capacity, research and development, infrastructure, or recording of administrative data. Instead, many investments prioritize the production of new data or specific survey efforts such as a one-off survey on a specific topic.78 In particular, donor priorities skew toward monitoring and international reporting.79 Although most national governments subscribe to international reporting, there is arguably a more immediate need for frequent and highly geographically disaggregated data and strong administrative data systems for the effective day-to-day functioning of government.80

Within the development community, lack of donor coordination can undermine public intent data systems, leading to duplication of and parallel systems for data collection. Each project uses its own set of indicators to report results instead of relying on and strengthening country data systems.81 Such situations can arise if donors need to fulfill their internal reporting requirements or are suspicious of the accuracy of government-reported data.

Lack of funding is also an issue for citizen-generated data.
Interviews with representatives from NGOs in Argentina, Kenya, and Nepal revealed that lack of funding can constrain the collection of citizen-generated data.82 Similarly, although the cost of sensors has steadily fallen over the last few years, the costs of equipment, deployment, and transmission, as well as the lack of off-the-shelf tools for resource-constrained environments, are still major barriers to the generation and use of machine-generated data, especially in smallholder agriculture.83

Deficiencies in technical capacity

Data gaps are also persisting because of underqualified, understaffed, and underpaid data producers and a lack of technology and infrastructure.

Lack of qualified staff, proper staff remuneration, and career incentives. The gaps in public intent data also stem from limited technical capacity, especially in lower-income countries, a result in part of the limited and misaligned resources previously discussed. A shortage of skilled data scientists, statisticians, and economists across public data systems is a critical constraint on the performance of data producers and the production of data, especially at a time when data from digital sources are becoming more important. The absence of key personnel in strategic positions who have a commitment to data is especially costly because relationships between ministries and NSOs, and with civil society, are an important catalyst for the flow of data and information.84

According to a global survey of NSOs conducted by PARIS21, after a shortage of funds the biggest obstacle to countries' successful development of capacity is lack of skilled staff to implement programs.85 In a list of 15 goals for capacity development, 86 percent of African NSOs selected strengthening human resources as one of their five most important goals, higher than any other category. It is particularly difficult for NSOs to recruit new staff with the skills needed to achieve their objectives.
When reporting the most frequent methods of human resource development, only 7 percent of NSOs reported recruitment of staff with new skill sets, and most of these NSOs were in high-income countries.86

Recruitment and retention of skilled staff are difficult without competitive pay scales and career tracks.87 Consultations with NSOs revealed that differences in pay scales across government entities make it especially difficult for NSOs to recruit skilled staff. In Ethiopia, the Central Statistical Agency follows civil service rules and regulations for remuneration of staff, whereas research institutes and universities have their own rules and regulations.

A common challenge for other government agencies that produce data is that they lack designated data scientists or statisticians. This is particularly problematic when other agency staff lack the time and capacity to make better use of the data collected within their institution.88

Lack of technology, software, and infrastructure. Even when producers of public intent data have staff with the skills needed to collect, process, and disseminate those data, they often lack the technological infrastructure to be effective in their work. Constraints in technology and information technology (IT) infrastructure compound constraints in technical capacity. For example, as part of the Global COVID-19 Survey of NSOs, many NSOs in low- and middle-income countries noted their need for software to collect data remotely to meet new data demands.89 In the PARIS21 survey, the option most selected to achieve priorities for a national statistical system in the medium term is acquiring up-to-date technology and infrastructure.90 Technological shortcomings also constrain the ability of individuals to produce data themselves because many types of citizen-generated data rely on phone or web technologies.91
Deficiencies in governance

In addition to shortages of skills and funding, various failures and problems with data governance prevent the potential of public intent data from being realized. At the national level, clear institutional mandates and good coordination among the data-producing agencies are critical for the exchange, interoperability, and timely publication of data.92 In practice, exchanges of data across ministries and between ministries and NSOs and beyond are rare, even in well-resourced and high-capacity environments.93 The absence of clear mandates, responsibilities, and incentives to effectively coordinate data production and data exchanges can obstruct collaboration and lead to duplication of data-gathering efforts.94

Deficiencies in the legal framework. The legal framework governing data production and data exchanges is a common barrier. Outdated statistical laws can make it difficult for NSOs and data-producing agencies to operate and collaborate effectively in light of recent changes in the data landscape, such as the proliferation of new data types, sources, and producers. In Chile, the National Institute of Statistics (INE) has had difficulties accessing key data from other public institutions in a timely fashion, primarily because the national statistical law is not sufficiently clear in authorizing INE's access to statistical information. When the law was passed in 1970, data exchanges were not a concern. Although a process to modernize the law has been at the forefront of political discussions for a decade, a revised version has yet to be formally implemented. This issue is a concern more generally because the older a country's national statistical law, the lower its statistical performance in general and its data openness in particular, at any country income level (figure 2.8).

Other important elements of the legal framework are regulations governing data protection and the right to information.
When these safeguards are lacking or weak, data exchanges can entail serious risks to data protection.95 Lack of comprehensive data protection regulations is a problem in many parts of the world.96 A review of African countries found that only 28 percent had procedures in place to ensure deidentification of data before publication.97 Without a requirement to share data and guidance on how to treat confidential information, a risk-averse government employee would have few incentives to share data, especially confidential data, given the potentially high costs of a confidentiality breach. The absence of comprehensive data protection legislation can also facilitate misuse of data, such as for political control or discrimination.98

Figure 2.8 The older a country's statistical laws, the lower is its statistical performance and the less open are its data
[Figure: panel a, Statistical performance: Statistical Performance Index against age of statistical laws (years); panel b, Openness of data: ODIN overall score against age of statistical laws (years). Countries are shown by income group (low-income, lower-middle-income, upper-middle-income, high-income).]
Sources: WDR 2021 team, based on UNSTATS (Statistics Division, Department of Economic and Social Affairs, United Nations), UNSTATS (database), https://unstats.un.org/unsd/dnss/cp/searchcp.aspx; Partnership in Statistics for Development in the 21st Century (PARIS21), https://paris21.org/knowledge-database?keyword=type%5B%5D=Statistical-Legislation-Country-Documentsdate-from=date-to=page=; World Bank, World Development Indicators (database), https://databank.worldbank.org/source/world-development-indicators. Data at http://bit.do/WDR2021-Fig-2_8.
Note: In panel a, the regression coefficient on age, controlling for GDP per capita, is –0.48, p < .01; in panel b, –0.39, p < .01. For the Statistical Performance Indicators, see World Bank, Statistical Performance Indicators (database), http://www.worldbank.org/spi. For the Open Data Inventory (ODIN), see Open Data Watch, https://odin.opendatawatch.com/.

Independence of the NSO. The legal, financial, and institutional independence of the NSO is an important element of a successful public intent data system, especially its data quality and openness.99 The independence of producers of public intent data also reinforces the credibility of and trust in the data and their producers, which encourages data use in both government and civil society.100

An indicator capturing the independence of NSOs in all African nations is included in the Ibrahim Index of African Governance.101 The indicator measures the institutional autonomy and financial independence of an NSO. A perfect score indicates that an NSO is able to publish data without clearance from another government branch and has sufficient funding to do so. A higher score on the NSO independence indicator is highly correlated with statistical performance as captured by the World Bank's SPI (figure 2.9, panel a). In 2019 the average score on NSO independence was 34 out of 100, with low-income African countries scoring below average. These findings illustrate that NSO independence is precarious, particularly in lower-income countries.

Anecdotes of attacks on NSO independence around the world suggest that fragile NSO independence is not limited to the African context.102 For example, in 2007 the Argentine government began interfering with the independence of Argentina's NSO, the National Institute of Statistics and Censuses (INDEC). The effort initially focused on the consumer price index and later expanded to other official statistics, casting doubt especially on reported inflation statistics. Recognizing the harmful effects of these measures, by 2015 a new government had undertaken efforts to rebuild the institute, and INDEC resumed the delivery of trustworthy statistics with transparency and complete adherence to international principles.103

A government's interest in having an independent national statistical system can be affected by several competing factors.
On the one hand, a government may have a vested interest in curtailing statistical independence and the production and dissemination of reliable data, fearing these could expose poor policy decisions and performance, dilute power, and increase public scrutiny and pressure.104 In this case, the lack of independence and of reliable data would make it harder to hold governments accountable.105 On the other hand, an independent statistical system producing reliable data in a transparent fashion best informs government decision-making and increases citizens' trust in government data and public institutions in general.106 Such transparency can also facilitate favorable capital market and investment conditions and foster GDP growth.107 Finally, international cooperation can boost statistical independence and data transparency when adherence to standards of data quality and the independence of their producers is required for accession to international organizations or agreements. An example is Colombia's successful bid to join the Organisation for Economic Co-operation and Development (OECD).108

Civil society performs a vital function in demanding transparency and holding government accountable. Citizen-generated data can be used to challenge official statistics when their accuracy or impartiality is in question. A free and empowered press is a critical check on government power in general and on government interference with statistical independence and data transparency in particular. Greater press freedom, as measured in the World Press Freedom Index compiled by Reporters Without Borders,109 is highly correlated with statistical performance as well as with statistical independence, regardless of a country's size or income level (figure 2.9, panel b).

Figure 2.9 Greater NSO independence and freedom of the press are positively correlated with better statistical performance
[Figure: panel a, NSO independence and statistical performance: Statistical Performance Index against NSO independence score (R² = 0.22); panel b, Freedom of the press and statistical performance: Statistical Performance Index against RSF World Press Freedom Index (R² = 0.23).]
Sources: NSO independence score: Mo Ibrahim Foundation, Ibrahim Index of African Governance (database), http://mo.ibrahim.foundation/iiag/; World Press Freedom Index: Reporters Without Borders, 2020 World Press Freedom Index (database), https://rsf.org/en/ranking_table. Data at http://bit.do/WDR2021-Fig-2_9.
Note: The x's represent countries. Panel a shows only African countries, and panel b shows all countries with data available. The NSO independence score ranges from 0 to 100. The World Press Freedom Index ranges from 100 to 0; lower values imply greater press freedom. For the Statistical Performance Index, see World Bank, Statistical Performance Indicators (database), http://www.worldbank.org/spi. NSO = national statistical office; RSF = Reporters Without Borders.

Deficiencies in data demand

Even when high-quality data are available and accessible, they must be put to an appropriate use to have an impact on development. Lack of data use therefore blocks the path to development.

Low levels of data literacy. Several barriers to data use remain. Low levels of data literacy among both policy makers and civil society are one barrier.110 Potential data users need to have both a conceptual understanding of how data can inform policy questions and the technical skills to extract the relevant information from data. An analysis of the use of statistics in news articles in 32 countries in four languages revealed considerable scope for journalists to improve their critical engagement with statistics, and that finding is likely to apply to civil society at large.111 For policy makers as well, data literacy is frequently identified as a barrier to data use.112 Among the general population, comparatively low literacy and numeracy rates in lower-income countries fundamentally diminish the pool of potential data users.113

Lack of incentives for and interest in data use. Even when policy makers have the skills to use data, they may not be interested in exercising those skills because they do not attach value to data.
Accordingly, another major factor affecting demand for public intent data is lack of incentives to use the data.114 When political leaders exhibit a commitment to data use, they can generate expectations for civil servants to rely on data more frequently and create incentives for accountability. “Political champions,” as well as changes in administration or individual government officials, often create opportunities for data-driven policy making.115 A data-literate society plays a major role in creating these political commitments to data use by demanding, and rewarding, the justification of policy decisions with data.

Low trust in the quality of public intent data. Another reason for lack of data use is the often low trust in the quality of public intent data. Although data users can check for signs of internal coherence, the accuracy of data cannot be inferred from the data alone, and incorrect statistics can take years to be detected, if they are detected at all.116 A survey of data producers and users in 140 countries found that NSO officials have much greater confidence in the quality of national statistics than ministry officials have.117

Lack of infrastructure to access and use the data. A final reason for lack of data use is related to the infrastructure needed to access and use data. For example, internet access is key to obtaining data, but penetration rates are lower in poorer countries. The exclusive sharing of data via online channels may exclude large shares of potential data users who are hampered by limited internet connectivity.118 And certain users may be unaware that data are available for use.119 Lack of internet connectivity, reliable power, and data centers are also major challenges in the use of Internet of Things systems and sensor data.120

Use of public intent data by a diverse group of actors often translates into greater demand for high-quality data.
The rise in demand can drive investment in data and capacity, setting off a virtuous cycle of increasing data demand and supply (figure 2.10). For example, government ministries' reliance on and demand for high-quality data have been associated with higher capacity in NSOs in Latin America.121 In the same region, demand for and interest in accurate and high-quality statistics in civil society, academia, the media, and the private sector have led to better funding, autonomy, and capacity in national statistical systems.122 Conversely, countries with a low supply of data are likely to use data less, creating a vicious cycle of data production and use. In general, countries can benefit from assessing whether their constraints are primarily on the supply side or the demand side for data. They can then use such an assessment to prioritize data-related policies and maximize their return on development.123

Figure 2.10 Data supply and demand can generate either virtuous or vicious cycles of data production and use
[Figure: a 2 × 2 diagram of data demand (high or low) against data quality and supply (low or high). High demand with low quality and supply: data supply–constrained countries. High demand with high quality and supply: virtuous cycle countries. Low demand with low quality and supply: vicious cycle countries. Low demand with high quality and supply: data demand–constrained countries.]
Source: Adapted from Sanga (2013).

Realizing the potential of public intent data

To maximize the impact of public intent data on development, governments need to address the financing, technical capacity, governance, and data demand roadblocks. This section describes policies to overcome these foundational challenges. Figure 2.11 summarizes some of the main policies governments can enact, categorized by the actors and barriers they primarily address. International organizations also have a role to play, and spotlight 2.2 discusses how they can contribute to addressing the key roadblocks.

Chapter 9 builds on the analysis in this section, specifically in the domain of data governance, laying out a bold vision for an integrated national data system. Such a system can transform the role the public sector plays in the data modernization agenda by incorporating public intent data alongside private intent data, integrating the users and producers of both, and enabling safe data exchanges.

Figure 2.11 Policies to realize the potential of public intent data
[Figure: a matrix of policies by barrier (financing, technical capacity, governance, data demand) and actor (NSOs; other government agencies; civil society and academia), resting on a foundation of political commitment. The policies shown are:]
• Create a target fraction of government spending or a line item in the national budget dedicated to the NSO.
• Engage recurrently with the Ministry of Finance to understand and support its data needs.
• Ensure more competitive pay scales.
• Devote more time and resources to building capacity among staff.
• Ensure that NSO independence is anchored in laws and institutional setup.
• Prevent statistical laws from becoming outdated.
• Build trust in integrity of official statistics via public release calendars and best practices in dissemination.
• Engage proactively with nongovernmental entities.
• Designate a budget line for data in each ministry and agency.
• Ensure use of produced data across the public sector for monitoring, evaluation, and more.
• Create technical units in charge of data production and use.
• Create a governmentwide approach to the salaries of statisticians and data scientists.
• Assign clear roles, mandates, and responsibilities along the data life cycle for government agencies (see chapters 8 and 9).
• Designate knowledge brokers in government agencies to champion the flow and use of data.
• Institutionalize data-intensive management practices.
• Allocate resources to citizen-generated data collection.
• Promote data literacy in primary and secondary education.
• Enhance tertiary education in data science and statistics.
• Ensure that laws and regulations facilitate the safe dissemination of data.
• Enable citizens to engage more easily with data through open data platforms, machine readability, and data visualizations.
Political commitment: Create a broad-based political and societal agreement on the value of high-quality public intent data.
Source: WDR 2021 team.
Note: The figure summarizes policies governments can enact, categorized by the actors and barriers the policies are primarily addressing. Many policies span several actors and barriers but are placed into one box here for simplification. The role of the private sector in realizing the potential of public intent data is discussed in chapter 4. The role of international organizations is examined in spotlight 2.2. NSO = national statistical office.
A common reason for the four roadblocks on the pathways to data for public policy just described is the lack of a political understanding and appreciation of the value of data for policy making. Achieving high-quality production and use of public intent data requires an unequivocal high-level political commitment to data for development, even when data do not yield politically convenient insights. A broad-based political and societal agreement on the value of public intent data is the most effective way to ensure a robust political commitment to data. Such a social contract for data can build the trust of all participants that they will not be harmed in the production, exchange, and use of data. Actors from across the public sector, private sector, civil society, and academia can play an important role in demanding and encouraging agreement. One mechanism for formulating such broad agreement and formalizing a commitment to data is confirming the importance of data in countries' national development plans. Another mechanism is formulating a national data strategy, a topic discussed in greater detail in chapters 8 and 9.

Financing needs: Strengthening and sustaining financial resources for data producers

Most low-income and lower-middle-income countries severely underspend on data. Securing sustainable financing is an enduring struggle for data producers and users. To reap the full value of data for development, governments must raise current spending levels drastically. At the same time, it is painfully hard to measure and benchmark how much governments are spending on data. Thus one priority is to improve the statistics on government spending on data.

One way to increase the priority given to financing of data is to establish a target percentage of government expenditure for the national statistical system.
Such a target can be derived with a view toward the resources needed to fully fund the national statistical plan or be based on the spending of peer countries that have achieved sufficient funding. If a government commits to such a target through a national development plan or through other means, it arms data producers during later budget negotiations.

Another way to implement stable and transparent government financing is to insert a line item in the national budget dedicated to the NSO. The absence of such a budget line has been a problem for even high-income countries. For example, the European Union’s statistical agency, Eurostat, recently saw its budget line merged into an overarching digitization and modernization budget, raising fears that funding for statistical needs could be at risk. Conversely, one of the biggest steps in ensuring the independence of the United Kingdom’s Office for National Statistics in 2007 was giving the office authority over how it uses its budget. Similarly, data-producing ministries and other government agencies could each receive a designated budget line for the production, processing, management, and safe sharing of the administrative data they produce. Other investment priorities should be closing existing coverage gaps in vital statistics and other registers and including populations that are hard to reach.

The Ministry of Finance has a special role to play as the most influential actor in budget negotiations for government-financed producers of public intent data.
Recurrent engagement with, and consequently systematic use of, public intent data and official statistics by the Ministry of Finance is also likely to improve the funding for data producers and the NSO in particular.124 As documented in the examples earlier in this chapter, it is important that the Ministry of Finance understand that investing in data may improve budgets through increased revenue collection and elimination of duplication in beneficiaries, among other things.

Stable government financing can also be secured by ensuring that data play a role in government programs and projects. When government projects have numerical targets, data management and data analysis are a must. Where relevant, the legislature could require that government program budgets be supported or justified by evidence, necessitating the use of data and therefore funding for data. Although linking funding for data to monitoring government targets may also create disincentives in producing accurate data, resisting such disincentives must be at the core of an NSO’s mission to ensure credibility of and trust in official statistics.

Sometimes the financing for data is sufficient, but the resources need to be better spent. Government funding of citizen-generated data, for example, can complement that of other public intent data and be a less costly alternative. But doing so requires that civil society data platforms have sufficient capabilities and resources for community outreach, coordination, monitoring of data collection, and quality assessments of the data.125

Technical capacity needs: Investing in human capital for production of public intent data

Once more and better funding is provided, investment in technical capacity is a top priority. Such an
effort should start with human capital: investing in statisticians, data scientists, and applied economists across the public sector and in data literacy in the population at large. These investments would promote demand for public intent data and bolster the credibility of and trust in public intent data producers. These goals could be achieved through a combination of education and training initiatives. Meanwhile, the public sector at large and the NSO in particular should seek qualified statisticians, applied economists, and data scientists. One way of doing so is to create a governmentwide approach to the formulation of salary scales and remuneration of the positions across the public sector, including in the NSO, the central bank, and line ministries. Such an approach could minimize differentials in salary scales within and across government agencies and subsequently create an opportunity to adopt more competitive salary scales to attract and retain talent.

A strategic area in which NSO capabilities in low- and middle-income countries could be strengthened is research on the development of improved methods and standards for data production. The capacity to conduct such methodological research is critical to improving the availability, quality, and usability of public intent data. NSOs could establish a business line on experimental statistics, which may serve as an avenue for participating in cutting-edge, multidisciplinary research efforts centered on integrating public intent and private intent data. Low-capacity NSOs, however, will have to strengthen, and in certain cases create, capabilities in data science and geographic information systems. Twinning arrangements between NSOs with established programs on experimental statistics and those beginning to build these capabilities may be one way to accelerate progress.
These activities are also aligned with the call for international organizations to sustain investments in the search for improved methods of data collection, curation, and analysis (see spotlight 2.2).

Beyond NSOs, data-related capabilities in ministries and other government agencies are often insufficient. They could remedy the situation by first creating technical units in charge of data production, processing, management, and dissemination to improve data quality. These units could also develop ministry-specific action plans for capacity building, and should be empowered by receiving the financial, technological, and human resources they need to fulfill their mandated roles in the national data system. Their goal would be delivery of high-quality knowledge disseminated in accordance with a ministry-specific public release calendar.

Capacity building should also be pursued in a country’s education system.126 In line with the aspirations of SDG Target 4.6, primary and secondary educational institutions should elevate foundational numeracy and statistical literacy skills so that, like general literacy, they are part of the fundamental curricula. These skills would empower an informed public of data users and create a pool of potential candidates for specialized data professions. In tertiary education and data-driven academic fields, advanced education on statistics should be enhanced in ways that equip future technocrats with data skills that meet policy makers’ demands. An example at the country level is Politeknik Statistika, a highly selective university established by Statistics Indonesia in 1958. Politeknik Statistika awards bachelor’s degrees, with an emphasis on applied training in official statistics, in preparation for statistical careers at Statistics Indonesia and the public sector at large.
Examples at the regional level include the Ecole Nationale Supérieure de Statistique et d’Economie Appliquée (ENSEA) in Côte d’Ivoire and the Eastern Africa Statistical Training Center (EASTC) in Tanzania. Degree and certificate programs with a data science theme, including those offered online, can facilitate development of statistical capacity in techniques that cut across statistics and computer science, such as artificial intelligence and machine learning. A noteworthy example is the Think Data Science Program that was launched in 2019 by the Palestinian Central Bureau of Statistics (PCBS), in partnership with the Arab American University in West Bank and Gaza. As part of this program, students have to complete a graduation project at the PCBS, which is accredited by the Ministry of Higher Education.

Finally, investments in human capital should be accompanied by investments in physical infrastructure, IT platforms, and software capabilities (see chapter 5).

Governance needs: Making laws and regulations conducive to production and use of quality data

Effective use of public intent data depends on having in place a governmentwide national data strategy or another high-level document that outlines the roles, responsibilities, and mandates of various government agencies. Such arrangements are discussed in detail in chapters 8 and 9.

The NSO must be truly independent, impartial, and nonpolitical. Its independence should be
anchored in laws and an institutional setup that curtails political interference in official statistics and other public data products.127 It is debatable whether placing the NSO under the executive branch of government leaves it open to attacks on its independence. On the one hand, it is important that the NSO be positioned to inform public debate and policy. But this may be difficult to achieve if the NSO is administratively separated from other parts of the government and does not maintain a close relationship with influential ministries such as the Ministry of Finance or Treasury and the Ministry of Commerce or Industry. On the other hand, reporting to a specific ministry or an individual as part of the executive branch leaves the NSO vulnerable to being questioned, pressured, or otherwise influenced in its involvement with politically sensitive statistical activities.

Another way to safeguard against the politicization of data is by making deidentified public intent datasets publicly available and accessible. Ensuring the creation and dissemination of deidentified public intent datasets is partly a political task and partly a technical one.

On the political front, the NSO and other government agencies must promote open data for development. These agencies should ensure that statistical laws and regulations permit the public dissemination of deidentified public intent data—both aggregated data and microdata. They should also actively engage with data users to cultivate a shared understanding of the value of reusing open data for research and for design and evaluation of public policy. Administrative data in particular are often not accessible beyond the ministry collecting the data.

On the technical front, safeguarding the confidentiality of subjects of public intent data production is an unconditional requirement.
Confidential data include both personally identifiable information and the geographic coordinates of data subjects, including communities, households, facilities, and establishments. Although best practices, standards, and tools for microdata deidentification are available,128 the risk of disclosure is increasing with enhancements in the interoperability of public intent data. This trend calls for continued improvement of deidentification techniques. Building capabilities within technical units of ministries and NSOs in the use of analytical tools to remove sensitive information, spatially deidentify microdata, and deal responsibly with the risk of disclosure will also foster a better culture of open data.

Data demand needs: Expanding the use of public intent data

The precondition for the widespread use and reuse of data is greater data literacy among the citizenry at large and government decision-makers. The integrity of and public trust in official statistics are also critical to the demand for data. The integrity of official statistics is closely tied to the perceived independence and trustworthiness of the NSO. Existing best practices can ensure integrity of and trust in the computation of official statistics and the timing of their release, even in the face of political pressures. A first set of practices centers on effective outreach and communication about NSO products. These practices include publishing a release calendar and providing a public explanation of potential deviations from release dates, as well as publicly disseminating meticulous documentation and metadata allowing findings to be replicated. Other best practices include refraining from participating in national politics and carrying out periodic outreach efforts to cultivate public understanding and acceptance of the importance of an independent statistical agency.
NSOs could also increase use of and demand for their data by engaging proactively with and listening to stakeholders in government, academia, the private sector, CSOs, and the media.129 These engagements may have multiple objectives such as disseminating statistical outputs, understanding and responding to user needs, exploring links between NSO data products and other data, and strengthening statistical literacy. Statistics Canada, Statistics Indonesia, and Mexico’s National Institute of Statistics and Geography (INEGI) have engaged in recurrent training of journalists in print, radio, television, and digital media outlets on official statistics. INEGI has expanded its work program on data and statistics related to crime and victimization in Mexico in response to the growing demands from policy makers and data users. Elsewhere, the Palestinian Central Bureau of Statistics regularly disseminates official statistics on important international observances, such as International Workers’ Day and International Women’s Day.

Closely involving civil society in the use and production of data is critical. This involvement can be achieved by establishing advisory boards composed of independent technical experts who can help prepare national statistical strategies in view of the needs of all users—not only the needs of various government agencies.
Data visualization is another way in which NSOs could increase the reach of official statistics and the public’s understanding of them. It does little good to achieve greater mastery of advanced analytics without also ensuring that the policy makers designing and enacting interventions that improve lives understand and appreciate the value added by data.130 Distilling complex phenomena into compelling visuals and narratives for broad audiences is a timeless idea that can effectively influence public debate and policy making (for a pioneering example, see box 1.2 in chapter 1).

From the perspective of government ministries and agencies, one way to jump-start data use in planning and policy making is through the institutionalized adoption of data-intensive management practices. In Rwanda, as part of the nationwide Imihigo performance contracts launched in 2006, mayors commit to setting development targets. Each target is subsequently evaluated and ranked by the national government with respect to its achievement and whether it was monitored appropriately.131 Management of these contracts not only requires large amounts of data to evaluate performance, but, more important, puts data on development outcomes at the center of the policy discourse.132

When low data literacy or limited appreciation of data is a barrier to their use, knowledge brokers can facilitate data use in the public sector. A knowledge broker points policy makers to the relevant data and creates value through collaboration.133 The important role of knowledge brokers is highlighted by evidence from a survey conducted by AidData: policy makers reported that they learn about sources of data primarily through personal interactions.134

The role of knowledge broker can be fulfilled by government officials and by outsiders. Central analytical units and technical staff in line ministries can serve as intermediaries for NSOs seeking to reach senior officials and increase data use, presenting data in both technical and nontechnical ways tailored to the needs of decision-makers.135 Another useful technique is joint analytical exercises by the government and researchers. Collaboration between external researchers and policy makers is a major facilitator of the use of evidence and data.136

If governments address these financing, human capital, governance, and data demand roadblocks, the value of public intent data can be maximized. Chapter 9 discusses sequencing of the required government interventions, placing such activities within an integrated national data system. Another way in which data can lead to better lives is via the private sector. That is the topic of the next chapter.

Notes
1. Unfortunately, in some contexts this scenario is not too far from reality. For example, Das and Hammer (2007) found that doctors in New Delhi often perform only a fraction of the recommended examinations and tests when patients present with common yet dangerous health conditions.
2. As just one example, in Ethiopia a 2016 study by Rogger and Somani (2018) surveying 1,831 officials of 382 organizations spanning three tiers of government revealed officials’ significant lack of knowledge about their area of work. Half thought that their district’s population was at least 50 percent larger or smaller than it was. Government staff in the educational sector were on average 38 percent off when estimating primary enrollment figures.
3. Cameron et al. (2019).
4. See chapter 1 for more information on the distinction between public intent data and private intent data, chapter 3 for a discussion of private intent data, and chapter 4 for a discussion of how both kinds of data can be repurposed.
5. See Jolliffe et al. (forthcoming) for a lengthier discussion of these 12 features of public intent data and examples of how they can generate value for development.
6. World Bank (2018d).
7. SDSN TReNDS (2018b); SSEE (2014).
8. Hallegatte et al. (2017).
9. SDSN TReNDS (2018a).
10. J-PAL (2018).
11. Hjort et al. (2019).
12. Arezki et al. (2020), for example, show that imprecise definitions of employment in the Middle East and North Africa blur the lines between unemployment and informality and distort the role of women and rural areas in national labor markets.
13. The World Bank LSMS team provided the number of countries in which LSMS-supported survey data production took place from 2011 to 2020.
14. Abay et al. (2019); Arthi et al. (2018); Carletto, Gourlay, and Winters (2015); Carletto, Savastano, and Zezza (2013); Carletto et al. (2017); De Weerdt, Gibson, and Beegle (2019); Desiere and Jolliffe (2018); Dillon et al. (2019); Gaddis et al. (2019); Gourlay, Kilic, and Lobell (2019); Kilic et al. (2017, 2018).
15. World Bank (2016a, 2017a).
16. The EU uses a Nomenclature of Territorial Units for Statistics, known as NUTS, for the purpose of allocating funds. Many EU countries have a hierarchy of three NUTS levels. The second level, NUTS 2, is used for allocation of funds. In the accompanying text, NUTS 2 is referred to as areas.
17. Government of Croatia (2019).
18. Open Data Watch (2015b).
19. Data2x (2019).
20. McCluskey and Huang (2019) and unpublished notes shared with the WDR 2021 team. The 30 percent refers to own-source revenue collection—that is, the part of the revenue collection that the cities themselves oversee in contrast to revenue they receive from national authorities and more.
21. World Bank (2018c). See also World Bank (2020b).
22. Roseth, Reyes, and Amézaga (2019) and references cited therein provide evidence of an up-to-date census generating savings to the government many times its cost. The value of public intent data to the private sector is discussed in spotlight 3.1 and elsewhere.
23. The SDG on clean water and sanitation relies on a mix of household surveys, population and housing censuses, and administrative data (SDSN 2015). Earth observation data are used for the SDGs on sustainable cities and communities, life below water, life on land, and more (Anderson et al. 2017). Citizen-generated data are often used when government data are missing and to verify government data (Lämmerhirt et al. 2018). In the Philippines, for example, the NSO identified more than 80 relevant SDG indicators where data are missing and CSOs could provide inputs through community-based monitoring systems (PARIS21 and PSA 2020).
24. MPPN (2017).
25. Yin et al. (2020). See World Air Quality Index Project, World’s Air Pollution: Real-Time Air Quality Index (database), https://waqi.info/, and OpenAQ, OpenAQ (database), https://openaq.org/, for publicly available, real-time data from air quality monitoring stations around the globe, including those in China.
26. Open Data Watch (2015a).
27. Bosio and Djankov (2020).
28. AFIC (2018); GPSA (2020).
29. World Bank (2017c).
30. Callen et al. (2019). Petrov, Gurin, and Manley (2016) and Verhulst and Young (2017) contain many other examples and channels through which open data may lead to better development outcomes.
31. Safetipin (2016).
32. Fabregas, Kremer, and Schilbach (2019).
33. For example, in several high-profile cases researchers have been able to reidentify individuals from publicly available microdata, even though the data had been published in a deidentified fashion (Heffetz and Ligett 2014).
34. Díaz (2018); Fundación Reflejos de Venezuela (2016); Privacy International (2019). LGBTQI stands for lesbian, gay, bisexual, transgender, queer (or questioning), intersex.
35. Open Data Watch, ODIN (Open Data Inventory) (database), https://odin.opendatawatch.com/; Open Knowledge Foundation, GODI (Global Open Data Index) (database), https://index.okfn.org/.
36. WDR 2021 team calculations based on 2019 Statistical Performance Indicators (World Bank, Statistical Performance Indicators [database], http://www.worldbank.org/spi). As of December 2014, 21 countries had not completed a census during the 2010 round of the population and housing census (Statistics Division, Department of Economic and Social Affairs, United Nations, World Population and Housing Census Programme [database], http://mdgs.un.org/unsd/demographic-social/census/index.cshtml).
37. See estimates in, for example, Roseth, Reyes, and Amézaga (2019) and references cited therein.
38. WDR 2021 team calculations based on 2019 Statistical Capacity Indicators (World Bank, Statistical Capacity Indicators [database], https://datatopics.worldbank.org/statisticalcapacity/SCIdashboard.aspx).
39. López-Vargas, Fuentes, and Vivar (2020).
40. WDR 2021 team calculations based on 2019 Statistical Performance Indicators (World Bank, Statistical Performance Indicators [database], http://www.worldbank.org/spi).
41. Desai, Diofasi, and Lu (2018); World Bank, Global ID4D Dataset (Identification for Development Global Dataset) (database), https://datacatalog.worldbank.org/dataset/identification-development-global-dataset.
42. United Nations (2019b).
43. Carr-Hill (2013); Randall (2015); Seltzer and Walker (2020); Toulemon (2017).
44. Carr-Hill (2013).
45. Jerven (2019).
46. Where vital registration systems function well, administrative records can be used to update census population counts. But these systems are weak in lower-income countries. Gaps in registration will likely leave out more vulnerable people than the census, especially seasonal migrants and the displaced (Dunning, Gelb, and Raghavan 2014). The use of administrative records from nongovernment actors can supplement official records.
47. Maron (2015).
48. UN Women (2018). Gender-specific SDG indicators are those that explicitly call for disaggregation by sex or that refer to gender equality as the underlying objective.
49. Doss, Kieran, and Kilic (2020). Administrative recording of land titles can serve the function of documenting asset ownership at the individual level. However, land and property ownership registries are among the less developed administrative recording systems globally. According to the Global Open Data Index, these data are available in less than one-quarter of countries, even in high-income nations, and are rarely openly available.
50. World Bank (2017b). Advances have been made in intrahousehold poverty estimation based on structural models and existing household survey data—that is, clothing expenditures for women, men, and children (Lechene, Pendakur, and Wolf 2019). The predictions provided by these models, however, have yet to be validated in the context of randomized survey experiments that collect detailed, individual-disaggregated consumption data that can, in turn, be used to compute observed estimates of intrahousehold poverty among women, men, and children. These observed estimates can, in turn, be compared with predictions stemming from structural models, based on the data elicited through prevailing approaches to household survey data collection.
51. Tiberti and Costa (2020); UN Women (2018). Similarly, individual-disaggregated data on time use are required to monitor SDG Target 5.4. Yet of the 84 countries known to have conducted time use surveys in the past, only 24 percent of them have collected data since 2010.
52. Ambler et al. (2020); Bardasi et al. (2011); Chen and Collins (2014); Deere, Alvarado, and Twyman (2012); Fisher, Reimer, and Carr (2010); Jacobs and Kes (2015); Kilic and Moylan (2016); Kilic, Moylan, and Koolwal (2020); Kilic et al. (2020).
53. See de Mel, McKenzie, and Woodruff (2009); de Nicola and Giné (2014); Gibson and Kim (2010).
54. Arthi et al. (2018); Das, Hammer, and Sánchez-Paramo (2012); Deininger et al. (2012); Gaddis et al. (2019); Kilic et al. (2018); Wollburg, Tiberti, and Zezza (2020).
55. Sandefur and Glassman (2015).
56. Sandefur and Glassman (2015).
57. Karagulian et al. (2019).
58. Antony et al. (2020); Morawska et al. (2018).
59. Similar findings were reported in Beegle et al. (2016).
60. See Hoogeveen and Pape (2020) for more information on such innovations. The last two poverty data points are comparable in 60 percent of countries in FCS and in 75 percent of low-income and Sub-Saharan African countries.
61. FMOH (2018).
62. CTO (2018).
63. Custer and Sethi (2017); Kiregyera (2017).
64. However, under secure circumstances authorized third-party researchers can be allowed to match individual-level records across multiple data sources to generate insights that rely on individual-level matching.
65. Privacy by design refers to proactively embedding privacy considerations in the design of information technology and data systems. See examples from Austria, Estonia, and India covered in ID4D Practitioner’s Guide: Version 1.0 (World Bank 2019b).
66. WDR 2021 team calculations based on the 2019 Statistical Performance Indicators (World Bank, Statistical Performance Indicators [database], http://www.worldbank.org/spi).
67. Katsimi and Moutos (2010).
68. Heffetz and Ligett (2014).
69. In particular, the team had discussions with the NSOs of Canada, Chile, Ethiopia, India, Indonesia, Mexico, the United Kingdom, and West Bank and Gaza.
70. See United Nations (2019b) for similar findings.
71. Calleja and Rogerson (2019). McQueston (2013) found similar results.
72. Dargent et al. (2020); OECD (2017).
73. United Nations (2019b).
74. UNESCO (2016); WHO (2011).
75. PARIS21 (2020).
76. Calleja and Rogerson (2019).
77. Sethi and Prakash (2018).
78. Calleja and Rogerson (2019). National governments also tend to prioritize covering ongoing expenses for collecting data over onetime investments in systems.
79. Lange (2020).
80. Calleja and Rogerson (2019); Sandefur and Glassman (2015); World Bank (2018a).
81. Sanna and McDonnell (2017).
82. Piovesan (2015).
83. Antony et al. (2020); Hosman (2014); López-Vargas, Fuentes, and Vivar (2020); Pham, Rahim, and Cousin (2016).
84. Allard et al. (2018).
85. PARIS21 (2018).
86. PARIS21 (2018).
87. Dargent et al. (2020).
88. Allard et al. (2018); Johnson, Massey, and O’Hara (2015).
89. Fu and Schweinfest (2020); UNSTATS and World Bank (2020).
90. PARIS21 (2018).
91. Lämmerhirt et al. (2018).
92. OECD (2019).
93. Allard et al. (2018).
94. Calleja and Rogerson (2019); Khan, Wales, and Stuart (2015).
95. OECD (2019).
96. Amnesty International (2019); Privacy International (2013); United Nations (2019a).
97. Van Belle et al. (2018).
98. Amnesty International (2019); Privacy International (2013); United Nations (2019a).
99. Independence of the national statistical system underpins the UN Statistical Commission’s 1994 Fundamental Principles of Official Statistics. The commission highlighted its concern for independence in its 2015 United Nations Fundamental Principles of Official Statistics: Implementation Guidelines (UNSTATS 2015).
100. Childs et al. (2019); Taylor (2016).
101. Mo Ibrahim Foundation, Ibrahim Index of African Governance (IIAG) (database), http://mo.ibrahim.foundation/iiag/.
102. Bodin (2011); Todesca (2017); Trewin (2018); von Oppeln-Bronikowski et al. (2015).
103. Todesca (2017).
104. Hoogeveen and Nguyen (2019); Taylor (2016); World Bank (2016b, 2017c).
105. Desiere, Staelens, and D’Haese (2016); Jerven (2014).
106. Brackfield (2012); World Bank (2018a).
107. Arezki et al. (2020); Cady (2005); Cady and Pellechio (2006); Kubota and Zeufack (2020).
108. Dargent et al. (2020).
109. See Reporters Without Borders, 2020 World Press Freedom Index (database), https://rsf.org/en/ranking_table.
110. World Bank (2016b, 2018a).
111. Klein, Galdin, and Mohamedou (2016).
112. Custer and Sethi (2017); Kiregyera (2017).
113. WDR 2021 team based on information in World Bank, “Literacy Rate, Adult Total (% of People Ages 15 and Above),” https://data.worldbank.org/indicator/se.adt.litr.zs.
114. World Bank (2017c).
115. Manning, Goldman, and Hernández Licona (2020).
116. Hoogeveen and Nguyen (2019).
117. Sethi and Prakash (2018).
118. Custer and Sethi (2017); World Bank (2018a).
119. Custer and Sethi (2017); Kiregyera (2017).
120. ITU (2016); López-Vargas, Fuentes, and Vivar (2020); Pham, Rahim, and Cousin (2016).
121. Dargent et al. (2020).
122. Dargent et al. (2020).
123. Scott (2005).
124. World Bank (2019a).
125. Lämmerhirt et al. (2018).
126. OECD (2017).
127. Bodin (2011); Todesca (2017); Trewin (2018); von Oppeln-Bronikowski et al. (2015).
128. For more information on resources and tools related to the anonymization of microdata, see World Bank and PARIS21 Consortium, Microdata Anonymization (database), International Household Survey Network, PARIS21 Consortium, https://ihsn.org/anonymization.
129. Snorrason (2018).
130. Ashby (2019).
131. World Bank (2018b).
132. Krätke and Byiers (2014).
133. Head (2016); Manning, Goldman, and Hernández Licona (2020).
134. Masaki et al. (2017).
135. Sethi and Prakash (2018).
136. Oliver et al. (2014).

References
Abay, Kibrom A., Gashaw T. Abate, Christopher B. Barrett, and Tanguy Bernard. 2019. “Correlated Non-Classical Measurement Errors, ‘Second Best’ Policy Inference, and the Inverse Size-Productivity Relationship in Agriculture.” Journal of Development Economics 139 (June): 171–84. https://doi.org/10.1016/j.jdeveco.2019.03.008.
AFIC (Africa Freedom of Information Center). 2018. “Eyes on the Contract: Citizens’ Voice in Improving the Performance of Public Contracts in Uganda.” 2nd Monitoring Report, AFIC, Kampala, Uganda. https://africafoicentre.org/download/eyes-on-the-contract-citizens-voice-in-improving-the-performance-of-public-contracts-in-uganda/.
Allard, Scott W., Emily R. Wiegand, Collen Schlecht, A. Rupa Datta, Robert M. Goerge, and Elizabeth Weigensberg. 2018. “State Agencies’ Use of Administrative Data for Improved Practice: Needs, Challenges, and Opportunities.” Public Administration Review 78 (2): 240–50.
Ambler, Kate, Cheryl Doss, Caitlin Kieran, and Simone Passarelli. 2020. “He Says, She Says: Exploring Patterns of Spousal Agreement in Bangladesh.” Economic Development and Cultural Change. Published ahead of print, November 16. https://doi.org/10.1086/703082.
Amnesty International. 2019. “New Technologies and Their Impact on the Promotion and Protection of Human Rights in the Context of Assemblies: Submission to the Office of the United Nations High Commissioner for Human Rights.” Amnesty International, London. https://www.amnesty.org/download/Documents/IOR4012842019ENGLISH.pdf.
Anderson, Katherine, Barbara Ryan, William Sonntag, Argyro Kavvada, and Lawrence Friedl. 2017. “Earth Observation in Service of the 2030 Agenda for Sustainable Development.” Geo-Spatial Information Science 20 (2): 77–96.
Antony, Anish Paul, Kendra Leith, Craig Jolley, Jennifer Lu, and Daniel J. Sweeney. 2020. “A Review of Practice and Implementation of the Internet of Things (IoT) for Smallholder Agriculture.” Sustainability 12 (9): 3750.
Arezki, Rabah, Daniel Lederman, Amani Abou Harb, Nelly Youssef, Louis William El-Mallakh, Rachel Yuting Fan, Asif Mohammed Islam, et al. 2020. “Middle East and North Africa Economic Update, April 2020: How Transparency Can Help the Middle East and North Africa.” World Bank Other Operational Studies 33475, World Bank, Washington, DC.
Arthi, Vellore, Kathleen Beegle, Joachim De Weerdt, and Amparo Palacios-López. 2018. “Not Your Average Job: Measuring Farm Labor in Tanzania.” Journal of Development Economics 130 (January): 160–72.
Ashby, Deborah. 2019. “Pigeonholes and Mustard Seeds: Growing Capacity to Use Data for Society.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 182 (4): 1121–37.
Bardasi, Elena, Kathleen Beegle, Andrew Dillon, and Pieter Serneels. 2011. “Do Labor Statistics Depend on How and to Whom the Questions Are Asked? Results from a Survey Experiment in Tanzania.” World Bank Economic Review 25 (3): 418–47.
Beegle, Kathleen, Luc Christiaensen, Andrew L. Dabalen, and Isis Gaddis. 2016. Poverty in a Rising Africa. Africa Poverty Report. Washington, DC: World Bank.
Bodin, Jean-Louis. 2011. “How to React When the Independence of Statisticians and the Integrity of Statistics Are Endangered?” Statistical Journal of the IAOS 27 (1–2): 59–69.
Bosio, Erica, and Simeon Djankov. 2020. “How Large Is Public Procurement?” Let’s Talk Development (blog), February 5, 2020. https://blogs.worldbank.org/developmenttalk/how-large-public-procurement.
Brackfield, David. 2012. “OECD Work on Measuring Trust in Official Statistics.” Bulletin of the ISI 58th World Statistics Congress of the International Statistical Institute, 2011 (December 2012): 3721–26. The Hague, Netherlands: International Statistical Institute.
Buvinic, Mayra, Lorenz Noe, and Eric Swanson. 2020. “Understanding Women’s and Girls’ Vulnerabilities to the COVID-19 Pandemic: A Gender Analysis and Data Dashboard of Low- and Lower-Middle Income Countries.” Data2x, United Nations Foundation, Washington, DC.
Cady, John. 2005. “Does SDDS Subscription Reduce Borrowing Costs for Emerging Market Economies?” IMF Staff Papers 52 (3): 503–17.
Cady, John, and Anthony J. Pellechio. 2006. “Sovereign Borrowing Cost and the IMF’s Data Standards Initiatives.” IMF Working Paper WP/06/78, International Monetary Fund, Washington, DC.
Calleja, Rachel, and Andrew Rogerson. 2019. “Financing Challenges for Developing Statistical Systems: A Review of Financing Options.” PARIS21 Discussion Paper 14, Partnership in Statistics for Development in the 21st Century, Paris.
Callen, Michael, Saad Gulzar, Ali Hasanain, Muhammad Yasir Khan, and Arman Rezaee. 2019. “Data and Policy Decisions: Experimental Evidence from Pakistan.” King Center on Global Development Working Paper 1055, Stanford University, Stanford, CA.
Cameron, Grant James, Hai-Anh H. Dang, Mustafa Dinc, James Stephen Foster, and Michael M. Lokshin. 2019. “Measuring the Statistical Capacity of Nations.” Policy Research Working Paper 8693, World Bank, Washington, DC.
Carletto, Calogero, Sydney Gourlay, Siobhan Murray, and Alberto Zezza. 2017. “Cheaper, Faster, and More Than Good Enough: Is GPS the New Gold Standard in Land Area Measurement?” Survey Research Methods 11 (3): 235–65.
Carletto, Calogero, Sydney Gourlay, and Paul Winters. 2015. “From Guesstimates to GPStimates: Land Area Measurement and Implications for Agricultural Analysis.” Journal of African Economies 24 (5): 593–628.
Carletto, Calogero, Sara Savastano, and Alberto Zezza. 2013. “Fact or Artifact: The Impact of Measurement Errors on the Farm Size–Productivity Relationship.” Journal of Development Economics 103 (July): 254–61.
Carr-Hill, Roy. 2013. “Missing Millions and Measuring Development Progress.” World Development 46 (June): 30–44.
Chen, J. Joyce, and LaPorchia A. Collins. 2014. “Let’s Talk about the Money: Spousal Communication, Expenditures, and Farm Production.” American Journal of Agricultural Economics 96 (5): 1272–90.
Childs, Jennifer Hunter, Aleia Clark Fobia, Ryan King, and Gerson Morales. 2019. “Trust and Credibility in the US Federal Statistical System.” Survey Methods: Insights from the Field, February 22. https://surveyinsights.org/?p=10663.
CTO (Office of the Chief Technology Officer, US Department of Health and Human Services). 2018. “The State of Data Sharing at the US Department of Health and Human Services.” US Department of Health and Human Services, Washington, DC. https://www.hhs.gov/sites/default/files/HHS_StateofDataSharing_0915.pdf.
Custer, Samantha, and Tanya Sethi, eds. 2017. “Avoiding Data Graveyards: Insights from Data Producers and Users in Three Countries.” AidData, Global Research Institute, College of William and Mary, Williamsburg, VA.
Dang, Hai-Anh, Mustafa Dinc, Juderica Diaz, Hiroko Maeda, John Pullinger, Umar Serajuddin, Brian Stacy, et al. 2021a. “Measuring the Statistical Performance of Countries: An Overview of Updates to the World Bank Statistical Capacity Index.” World Bank, Washington, DC.
Dang, Hai-Anh, John Pullinger, Umar Serajuddin, and Brian Stacy. 2021b. “Statistical Performance Index: A New Tool to Measure Country Statistical Capacity.” Policy Research Working Paper, World Bank, Washington, DC.
Dargent, Eduardo, Gabriela Lotta, José Antonio Mejía-Guerra, and Gilberto Moncada. 2020. “Who Wants to Know? The Political Economy of Statistical Capacity in Latin America.” Inter-American Development Bank, Washington, DC.
Das, Jishnu, and Jeffrey Hammer. 2007. “Money for Nothing: The Dire Straits of Medical Practice in Delhi, India.” Journal of Development Economics 83 (1): 1–36.
Das, Jishnu, Jeffrey Hammer, and Carolina Sánchez-Paramo. 2012. “The Impact of Recall Periods on Reported Morbidity and Health Seeking Behavior.” Journal of Development Economics 98 (May): 76–88.
Data2x. 2019. “Data Breaks the Silence on Violence against Women: A Case Study of Vietnam.” Gender Data Impact Case Study, United Nations Foundation, Washington, DC. https://data2x.org/wp-content/uploads/2020/02/Impact-Case-Studies-Vietnam-4P.pdf.
Deere, Carmen Diana, Gina E. Alvarado, and Jennifer Twyman. 2012. “Gender Inequality in Asset Ownership in Latin America: Female Owners vs Household Heads.” Development and Change 43 (2): 505–30.
Deininger, Klaus, Calogero Carletto, Sara Savastano, and James Muwonge. 2012. “Can Diaries Help in Improving Agricultural Production Statistics? Evidence from Uganda.” Journal of Development Economics 98 (May): 42–50.
de Mel, Suresh, David J. McKenzie, and Christopher M. Woodruff. 2009. “Measuring Microenterprise Profits: Must We Ask How the Sausage Is Made?” Journal of Development Economics 88 (1): 19–31.
de Nicola, Francesca, and Xavier Giné. 2014. “How Accurate Are Recall Data? Evidence from Coastal India.” Journal of Development Economics 106 (January): 52–65.
Desai, Vyjayanti T., Anna Diofasi, and Jing Lu. 2018. “The Global Identification Challenge: Who Are the 1 Billion People without Proof of Identity?” Voices (blog), April 25, 2018. https://blogs.worldbank.org/voices/global-identification-challenge-who-are-1-billion-people-without-proof-identity.
Desiere, Sam, and Dean Mitchell Jolliffe. 2018. “Land Productivity and Plot Size: Is Measurement Error Driving the Inverse Relationship?” Journal of Development Economics 130 (January): 84–98. https://doi.org/10.1016/j.jdeveco.2017.10.002.
Desiere, Sam, Lotte Staelens, and Marijke D’Haese. 2016. “When the Data Source Writes the Conclusion: Evaluating Agricultural Policies.” Journal of Development Studies 52 (9): 1372–87.
De Weerdt, Joachim, John Gibson, and Kathleen Beegle. 2019. “What Can We Learn from Experimenting with Survey Methods?” LICOS Discussion Paper 418, LICOS Center for Institutions and Economic Performance, Faculty of Economics and Business, Katholieke Universiteit Leuven, Leuven, Belgium.
Díaz, Marianne. 2018. “El Cuerpo Como Dato” [The Body as Data]. @DerechosDigitales América Latina, Santiago, Chile. https://www.derechosdigitales.org/wp-content/uploads/cuerpo_DATO.pdf.
Dillon, Andrew, Sydney Gourlay, Kevin McGee, and Gbemisola Oseni. 2019. “Land Measurement Bias and Its Empirical Implications: Evidence from a Validation Exercise.” Economic Development and Cultural Change 67 (3): 595–624.
Doss, Cheryl Renee, Caitlin Kieran, and Talip Kilic. 2020. “Measuring Ownership, Control, and Use of Assets.” Feminist Economics 26 (3): 144–68.
Dunning, Casey, Alan Gelb, and Sneha Raghavan. 2014. “Birth Registration, Legal Identity, and the Post-2015 Agenda.” CGD Policy Paper 046, Center for Global Development, Washington, DC.
Fabregas, Raissa, Michael M. Kremer, and Frank Schilbach. 2019. “Realizing the Potential of Digital Development: The Case of Agricultural Advice.” Science 366 (6471): eaay3038.
Fisher, Monica, Jeffrey J. Reimer, and Edward R. Carr. 2010. “Who Should Be Interviewed in Surveys of Household Income?” World Development 38 (7): 966–73.
FMOH (Federal Ministry of Health, Ethiopia). 2018. eHealth Apps Inventory. Addis Ababa: FMOH.
Fu, Haishan, and Stefan Schweinfest. 2020. “COVID-19 Widens Gulf of Global Data Inequality, While National Statistical Offices Step Up to Meet New Data Demands.” Data Blog, June 5, 2020. https://blogs.worldbank.org/opendata/covid-19-widens-gulf-global-data-inequality-while-national-statistical-offices-step-up.
Fundación Reflejos de Venezuela. 2016. “El drama de ser transgénero e intentar comprar en un supermercado” [The Ordeal of Being Transgender and Trying to Shop in a Supermarket]. Fundación Reflejos de Venezuela, Caracas.
Gaddis, Isis, Gbemisola Oseni, Amparo Palacios-López, and Janneke Pieters. 2019. “Measuring Farm Labor: Survey Experimental Evidence from Ghana.” Policy Research Working Paper 8717, World Bank, Washington, DC.
Gibson, John, and Bonggeun Kim. 2010. “Non-Classical Measurement Error in Long-Term Retrospective Recall Surveys.” Oxford Bulletin of Economics and Statistics 72 (5): 687–95.
Gourlay, Sydney, Talip Kilic, and David B. Lobell. 2019. “A New Spin on an Old Debate: Errors in Farmer-Reported Production and Their Implications for the Inverse Scale–Productivity Relationship in Uganda.” Journal of Development Economics 141 (November): 1–35. https://www.sciencedirect.com/science/article/pii/S0304387818306588.
Government of Croatia. 2019. “Gov’t Launches Changes to Country’s Statistical Subdivision.” News release, January 23. https://vlada.gov.hr/news/gov-t-launches-changes-to-country-s-statistical-subdivision/25178.
GPSA (Global Partnership for Social Accountability). 2020. “Making Public Contracts Work for People: Experiences from Uganda.” GPSA, World Bank, Washington, DC. https://www.thegpsa.org/stories/making-public-contracts-work-people-experiences-uganda.
Hallegatte, Stéphane, Adrien Vogt-Schilb, Mook Bangalore, and Julie Rozenberg. 2017. Unbreakable: Building the Resilience of the Poor in the Face of Natural Disasters. Climate Change and Development Series. Washington, DC: World Bank.
Head, Brian W. 2016. “Toward More ‘Evidence-Informed’ Policy Making?” Public Administration Review 76 (3): 472–84.
Heffetz, Ori, and Katrina Ligett. 2014. “Privacy and Data-Based Research.” Journal of Economic Perspectives 28 (2): 75–98.
Hjort, Jonas, Diana Moreira, Gautam Rao, and Juan Francisco Santini. 2019. “How Research Affects Policy: Experimental Evidence from 2,150 Brazilian Municipalities.” NBER Working Paper 25941, National Bureau of Economic Research, Cambridge, MA.
Hoogeveen, Johannes, and Nga Thi Viet Nguyen. 2019. “Statistics Reform in Africa: Aligning Incentives with Results.” Journal of Development Studies 55 (4): 702–19.
Hoogeveen, Johannes, and Utz Pape, eds. 2020. Data Collection in Fragile States: Innovations from Africa and Beyond. Cham, Switzerland: Palgrave Macmillan.
Hosman, Laura. 2014. “Emerging Markets: Top ICT Hardware Challenges.” Inveneo, San Francisco.
ITU (International Telecommunication Union). 2016. “Harnessing the Internet of Things for Global Development.” ITU, Geneva. https://www.itu-ilibrary.org/science-and-technology/harnessing-the-internet-of-things-for-global-development_pub/80d1ac90-en.
Jacobs, Krista, and Aslihan Kes. 2015. “The Ambiguity of Joint Asset Ownership: Cautionary Tales from Uganda and South Africa.” Feminist Economics 21 (3): 23–55.
Jerven, Morten. 2014. “The Political Economy of Agricultural Statistics and Input Subsidies: Evidence from India, Nigeria, and Malawi.” Journal of Agrarian Change 14 (1): 129–45.
Jerven, Morten. 2019. “The Problems of Economic Data in Africa.” In Oxford Research Encyclopedia of Politics. Oxford, UK: Oxford University Press. https://doi.org/10.1093/acrefore/9780190228637.013.748.
Johnson, David S., Catherine Massey, and Amy O’Hara. 2015. “The Opportunities and Challenges of Using Administrative Data Linkages to Evaluate Mobility.” Annals of the American Academy of Political and Social Science 657 (1): 247–64.
Jolliffe, Dean, Talip Kilic, Daniel Gerszon Mahler, and Philip Randolph Wollburg. Forthcoming. “Under What Conditions Are Data Valuable for Development?” WDR 2021 background paper, World Bank, Washington, DC.
J-PAL (Abdul Latif Jameel Poverty Action Lab). 2018. “Annual Report 2018 and a Look Ahead to 2019.” J-PAL, Massachusetts Institute of Technology, Cambridge, MA. https://www.povertyactionlab.org/sites/default/files/2018-annual-report-web-ready.pdf.
Karagulian, Federico, Maurizio Barbiere, Alexander Kotsev, Laurent Spinelle, Michel Gerboles, Friedrich Lagler, Nathalie Redon, et al. 2019. “Review of the Performance of Low-Cost Sensors for Air Quality Monitoring.” Atmosphere 10 (9): 506.
Katsimi, Margarita, and Thomas Moutos. 2010. “EMU and the Greek Crisis: The Political-Economy Perspective.” European Journal of Political Economy 26 (4): 568–76. https://doi.org/10.1016/j.ejpoleco.2010.08.002.
Khan, Amina, Joseph Wales, and Elizabeth Stuart. 2015. “Country Priorities for Data Development: What Does History Tell Us?” Report, Overseas Development Institute, London. https://www.odi.org/sites/odi.org.uk/files/odi-assets/publications-opinion-files/9695.pdf.
Kilic, Talip, and Heather G. Moylan. 2016. “Methodological Experiment on Measuring Asset Ownership from a Gender Perspective (MEXA).” Technical Report, World Bank, Washington, DC.
Kilic, Talip, Heather G. Moylan, John Ilukor, Clement Mtengula, and Innocent Pangapanga-Phiri. 2018. “Root for the Tubers: Extended-Harvest Crop Production and Productivity Measurement in Surveys.” Policy Research Working Paper 8618, World Bank, Washington, DC.
Kilic, Talip, Heather G. Moylan, and Gayatri B. Koolwal. 2020. “Getting the (Gender-Disaggregated) Lay of the Land: Impact of Survey Respondent Selection on Measuring Land Ownership.” Policy Research Working Paper 9151, World Bank, Washington, DC.
Kilic, Talip, Goedele Van den Broeck, Gayatri B. Koolwal, and Heather G. Moylan. 2020. “Are You Being Asked? Impacts of Respondent Selection on Measuring Employment.” Policy Research Working Paper 9152, World Bank, Washington, DC.
Kilic, Talip, Alberto Zezza, Calogero Carletto, and Sara Savastano. 2017. “Missing(ness) in Action: Selectivity Bias in GPS-Based Land Area Measurements.” World Development 92 (April): 143–57.
Kiregyera, Ben. 2017. “Supporting Implementation of Fundamental Principles of Official Statistics in the African Region.” Statistical Journal of the IAOS 33 (4): 863–67.
Klein, Thilo, Anaïs Galdin, and El Iza Mohamedou. 2016. “An Indicator for Statistical Literacy Based on National Newspaper Archives.” Paper presented at the International Association for Statistical Education’s 2016 Roundtable Conference, “Promoting Understanding of Statistics about Society,” Berlin, July 19–22, 2016. http://iase-web.org/Conference_Proceedings.php?p=Promoting_Understanding_of_Statistics_about_Society_2016.
Krätke, Florian, and Bruce Byiers. 2014. “The Political Economy of Official Statistics: Implications for the Data Revolution in Sub-Saharan Africa.” PARIS21 Discussion Paper 5, Partnership in Statistics for Development in the 21st Century, Paris. http://ecdpm.org/wp-content/uploads/DP-170-Political-Economy-Official-Statistics-Africa-December-2014.pdf.
Kubota, Megumi, and Albert Zeufack. 2020. “Assessing the Returns on Investment in Data Openness and Transparency.” Policy Research Working Paper 9139, World Bank, Washington, DC.
Lämmerhirt, Danny, Jonathan Gray, Tommaso Venturini, and Axel Meunier. 2018. “Advancing Sustainability Together? Citizen-Generated Data and the Sustainable Development Goals.” Global Partnership for Sustainable Development Data, United Nations, New York. http://www.data4sdgs.org/resources/advancing-sustainability-together-citizen-generated-data-and-sustainable-development.
Lange, Simon. 2020. “Key Trends in Development Co-operation for National Data and Statistical Systems.” OECD Development Policy Paper 31, Organisation for Economic Co-operation and Development, Paris.
Lechene, Valérie, Krishna Pendakur, and Alex Wolf. 2019. “OLS Estimation of the Intra-Household Distribution of Consumption.” IFS Working Paper W19/19, Institute for Fiscal Studies, London.
López-Vargas, Ascensión, Manuel Fuentes, and Marta Vivar. 2020. “Challenges and Opportunities of the Internet of Things for Global Development to Achieve the United Nations Sustainable Development Goals.” IEEE Access 8: 37202–13.
Manning, Richard, Ian Goldman, and Gonzalo Hernández Licona. 2020. “The Impact of Impact Evaluation.” UNU-WIDER Working Paper 2020/20, United Nations University–World Institute for Development Economics Research, Helsinki.
Maron, Mikel. 2015. “How Complete Is OpenStreetMap?” Mapbox (blog), November 19, 2015. https://blog.mapbox.com/how-complete-is-openstreetmap-7c369787af6e.
Masaki, Takaaki, Samantha Custer, Agustina Eskenazi, Alena Stern, and Rebecca Latourell. 2017. “Decoding Data Use: How Do Leaders Source Data and Use It to Accelerate Development?” AidData, Global Research Institute, College of William and Mary, Williamsburg, VA.
McCluskey, William, and Chyi-Yun Huang. 2019. “The Role of ICT in Property Tax Administration: Lessons from Tanzania.” CMI Brief 6, Chr. Michelsen Institute, Bergen, Norway.
McQueston, Kate. 2013. “Autonomy, Independence, and Capacity of National Statistics Offices.” Background paper, Center for Global Development, Washington, DC; African Population and Health Research Center, Nairobi.
Meijer, Albert, and Suzanne Potjer. 2018. “Citizen-Generated Open Data: An Explorative Analysis of 25 Cases.” Government Information Quarterly 35 (4): 613–21.
Morawska, Lidia, Phong K. Thai, Xiaoting Liu, Akwasi Asumadu-Sakyi, Godwin Ayoko, Alena Bartonova, Andrea Bedini, et al. 2018. “Applications of Low-Cost Sensing Technologies for Air Quality Monitoring and Exposure Assessment: How Far Have They Gone?” Environment International 116 (July): 286–99.
MPPN (Multidimensional Poverty Peer Network). 2017. “Using the MPI to Determine National Budgets in Costa Rica.” Dimensions 4 (August): 14–18, Oxford Poverty and Human Development Initiative, Oxford Department of International Development, University of Oxford, Oxford, UK. https://www.mppn.org/wp-content/uploads/2017/08/Dim_4_ENGLISH_online.pdf.
OECD (Organisation for Economic Co-operation and Development). 2017. Development Co-operation Report 2017: Data for Development. Paris: OECD.
OECD (Organisation for Economic Co-operation and Development). 2019. The Path to Becoming a Data-Driven Public Sector. OECD Digital Government Studies Series. Paris: OECD. https://doi.org/10.1787/059814a7-en.
Oliver, Kathryn, Simon Innvar, Theo Lorenc, Jenny Woodman, and James Thomas. 2014. “A Systematic Review of Barriers to and Facilitators of the Use of Evidence by Policymakers.” BMC Health Services Research 14 (January 3), article 2. https://doi.org/10.1186/1472-6963-14-2.
Open Data Watch. 2015a. “Breathe Deep: Air Quality Reform in China.” Data Impacts Case Studies, Open Data Watch, Washington, DC. https://dataimpacts.org/project/data-help-china-breath-better/.
Open Data Watch. 2015b. “Disaggregated Data: Impacts of Demographic and Health Surveys.” Data Impacts Case Studies, Open Data Watch, Washington, DC. https://dataimpacts.org/project/health-surveys/.
PARIS21 (Partnership in Statistics for Development in the 21st Century). 2018. “Survey Results: New Approaches to Capacity Development and Future Priorities, CD4.0 Survey.” PARIS21, Paris. https://paris21.org/capacity-development-40/cd40-survey.
PARIS21 (Partnership in Statistics for Development in the 21st Century). 2020. “Partner Report on Support to Statistics: PRESS 2019.” PARIS21, Paris. https://paris21.org/sites/default/files/inline-files/PARIS21_Press%202019_WEB.pdf.
PARIS21 (Partnership in Statistics for Development in the 21st Century) and PSA (Philippine Statistics Authority). 2020. Use of Citizen-Generated Data for SDG Reporting in the Philippines: A Case Study. Paris: PARIS21. https://paris21.org/sites/default/files/inline-files/PSA-report-FINAL.pdf.
Petrov, Oleg, Joel Gurin, and Laura Manley. 2016. “Open Data for Sustainable Development.” Connections: Transport and ICT, Note 2016-5, World Bank, Washington, DC.
Pham, Congduc, Abdur Rahim, and Philippe Cousin. 2016. “Low-Cost, Long-Range Open IoT for Smarter Rural African Villages.” In 2016 IEEE International Smart Cities Conference (ISC2), edited by Institute of Electrical and Electronics Engineers, 512–17. Red Hook, NY: Curran Associates.
Piovesan, Federico. 2015. “Statistical Perspectives on Citizen-Generated Data.” DataShift, Civicus, Johannesburg. http://civicus.org/thedatashift/wp-content/uploads/2015/07/statistical-perspectives-on-cgd_web_single-page.pdf.
Privacy International. 2013. “Biometrics: Friend or Foe of Privacy?” Privacy International, London. https://privacyinternational.org/news-analysis/1409/biometrics-friend-or-foe-privacy.
Privacy International. 2019. “Submission to the Special Rapporteurship on Economic, Social, Cultural, and Environmental Rights of the Inter-American Commission on Human Rights Regarding the Situation of Economic, Social, Cultural, and Environmental Rights in the Region.” TEDIC, InternetLab, Derechos Digitales, Fundación Karisma, Dejusticia, Asociación por los Derechos Civiles, and Privacy International, Privacy International, London.
Randall, Sara. 2015. “Where Have All the Nomads Gone? Fifty Years of Statistical and Demographic Invisibilities of African Mobile Pastoralists.” Pastoralism 5, article 22.
Rogger, Daniel Oliver, and Ravi Somani. 2018. “Hierarchy and Information.” Policy Research Working Paper 8644, World Bank, Washington, DC.
Roseth, Benjamin, Angela Reyes, and Karla Yee Amézaga. 2019. “The Value of Official Statistics: Lessons from Intergovernmental Transfers.” IDB Technical Note 1682, Inter-American Development Bank, Washington, DC. https://publications.iadb.org/publications/english/document/The_Value_of_Official_Statistics_Lessons_from_Intergovernmental_Transfers_en.pdf.
Safetipin. 2016. “Bogota: A Safety Analysis Report.” Safetipin, Gurgaon, India. https://safetipin.com/report/bogota-report-2016/.
Sandefur, Justin, and Amanda Glassman. 2015. “The Political Economy of Bad Data: Evidence from African Survey and Administrative Statistics.” Journal of Development Studies 51 (2): 116–32.
Sanga, Dimitri. 2013. “The Challenges of the Narrative of African Countries’ Development: Data Demand and Supply Mismatches.” Paper presented at conference “African Economic Development: Measuring Success and Failure,” School for International Studies, Simon Fraser University, Vancouver, Canada, April 18–20, 2013.
Sanna, Valentina, and Ida McDonnell. 2017. “Data for Development: DAC Member Priorities and Challenges.” OECD Development Co-operation Working Paper 35, OECD, Paris.
Scott, Christopher. 2005. “Measuring Up to the Measurement Problem: The Role of Statistics in Evidence-Based Policy-Making.” Partnership in Statistics for Development in the 21st Century, Paris. https://paris21.org/sites/default/files/MUMPS-full.pdf.
SDSN (United Nations Sustainable Development Solutions Network). 2015. “Data for Development: A Needs Assessment for SDG Monitoring and Statistical Capacity Development.” SDSN, New York. https://sustainabledevelopment.un.org/content/documents/2017Data-for-Development-Full-Report.pdf.
SDSN TReNDS (United Nations Sustainable Development Solutions Network–Thematic Research Network on Data and Statistics). 2018a. “Data Sharing via SMS Strengthens Uganda’s Health System: A Case Study of mTRAC, Uganda.” SDSN TReNDS, New York. http://www.data4sdgs.org/sites/default/files/2018-09/mTRAC%20CaseStudy_FINAL.pdf.
SDSN TReNDS (United Nations Sustainable Development Solutions Network–Thematic Research Network on Data and Statistics). 2018b. “Handpump Data Improves Water Access.” SDSN TReNDS, New York. http://www.data4sdgs.org/sites/default/files/2018-12/Smart%20Handpump%20Case%20Study.pdf.
Seltzer, Judith, and Deborah Klein Walker. 2020. “Counting Children in the US 2020 Census: Assure Our Future Is Represented.” NAM Perspectives Commentary, National Academy of Medicine, Washington, DC. https://doi.org/10.31478/202003d.
Sethi, Tanya, and Mihir Prakash. 2018. “Counting on Statistics: How Can National Statistical Offices and Donors Increase Use?” AidData, Global Research Institute, College of William and Mary, Williamsburg, VA. https://www.aiddata.org/publications/counting-on-statistics.
Snorrason, Hallgrímur. 2018. “Securing the Independence of Official Statistics: Introductory Remarks.” Statistical Journal of the IAOS 34 (2): 145–47.
SSEE (Smith School of Enterprise and the Environment). 2014. “From Rights to Results in Rural Water Services: Evidence from Kyuso, Kenya.” Water Programme Working Paper 1, SSEE, University of Oxford, Oxford, UK. https://www.smithschool.ox.ac.uk/publications/reports/SSEE-rights-to-results_final_March2014.pdf.
Taylor, Matthew. 2016. “The Political Economy of Statistical Capacity: A Theoretical Approach.” IDB Discussion Paper IDP-DP-471, Inter-American Development Bank, Washington, DC.
Tiberti, Marco, and Valentina Costa. 2020. “Disability Measurement in Household Surveys: A Guidebook for Designing Household Survey Questionnaires.” LSMS Guidebook, Living Standards Measurement Study, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/456131578985058020/Disability-Measurement-in-Household-Surveys-A-Guidebook-for-Designing-Household-Survey-Questionnaires.
Todesca, Jorge A. 2017. “Political Power and the Argentine Statistical System: The Case of INDEC.” Statistical Journal of the IAOS 33 (4): 875–83.
Toulemon, Laurent. 2017. “Undercount of Young Children and Young Adults in the New French Census.” Statistical Journal of the IAOS 33 (2): 311–16.
Trewin, Dennis. 2018. “What Does an Independent Official Statistical Agency Mean in Practice?” Statistical Journal of the IAOS 34 (2): 165–69.
UNESCO (United Nations Educational, Scientific, and Cultural Organization). 2016. “Education 2030: Incheon Declaration and Framework for Action.” Document ED-2016/WS/28, Institute for Information Technologies in Education, UNESCO, Paris. http://uis.unesco.org/sites/default/files/documents/education-2030-incheon-framework-for-action-implementation-of-sdg4-2016-en_2.pdf.
United Nations. 2019a. “Extreme Poverty and Human Rights: Note by the Secretary-General.” Document A/74/493, United Nations, New York. https://undocs.org/A/74/493.
United Nations. 2019b. The Sustainable Development Goals Report 2019. New York: United Nations.
UNSTATS (Statistics Division, Department of Economic and Social Affairs, United Nations). 2015. United Nations Fundamental Principles of Official Statistics: Implementation Guidelines. New York: United Nations. https:// .unstats.org/unsd/dnss/gp/Implementation_Guidelines_FINAL_without_edit.pdf.
UNSTATS (Statistics Division, Department of Economic and Social Affairs, United Nations) and World Bank. 2020. “Monitoring the State of Statistical Operations under the COVID-19 Pandemic: Highlights from the Second Round of a Global COVID-19 Survey of National Statistical Offices.” World Bank, Washington, DC.
UN Women. 2018. Turning Promises into Action: Gender Equality in the 2030 Agenda for Sustainable Development. New York: UN Women.
UN Women. 2020. “COVID-19 and Its Economic Toll on Women: The Story behind the Numbers.” News and Events, September 16, 2020. https://www.unwomen.org/en/news/stories/2020/9/feature-covid-19-economic-impacts-on-women.
Van Belle, Jean-Paul. 2018. Africa Data Revolution Report 2018: Status and Emerging Impact of Open Data in Africa. With contributions by Danny Lämmerhirt, Carlos Iglesias, Paul Mungai, Hubeidatu Nuhu, Mbongeni Hlabano, Tarik Nesh-Nash, and Sarang Chaudhary. Washington, DC: World Wide Web Foundation. https://webfoundation.org/docs/2019/03/Africa-data-revolution-report.pdf.
Verhulst, Stefaan G., and Andrew Young. 2017. “Open Data in Developing Economies: Toward Building an Evidence Base on What Works and How.” Governance Lab, New York University, New York. https://odimpact.org/files/odimpact-developing-economies.pdf.
von Oppeln-Bronikowski, Sibylle, Christine Kronz, Irina Meinke, and Hannah Wirtzfeld. 2015. “How Can Professional and Ethical Frameworks Strengthen Statisticians in Their Practical Work?” Statistical Journal of the IAOS 31 (4): 513–22.
WHO (World Health Organization). 2011. “The Abuja Declaration: Ten Years On.” WHO, Geneva. https://www.who.int/healthsystems/publications/abuja_report_aug_2011.pdf?ua=1.
Wollburg, Philip, Marco Tiberti, and Alberto Zezza. 2020. “Recall Length and Measurement Error in Agricultural Surveys.” Food Policy. Published ahead of print, December 1, 2020. https://doi.org/10.1016/j.foodpol.2020.102003.
World Bank. 2016a. Croatia: Small-Area Estimation of Consumption-Based Poverty (Poverty Maps). Washington, DC: World Bank. https://razvoj.gov.hr/UserDocsImages//Istaknute%20teme/Kartom%20siroma%C5%A1tva//Croatia%20Small-Area%20Estimation%20of%20Consumption-Based%20Poverty%20(Poverty%20Maps).pdf.
World Bank. 2016b. World Development Report 2016: Digital Dividends. Washington, DC: World Bank.
World Bank. 2017a. “Index of Multiple Deprivation: Conceptual Framework for Identifying Lagging Municipalities and Towns in Croatia.” World Bank, Washington, DC. https://razvoj.gov.hr/UserDocsImages//Istaknute%20teme/Kartom%20siroma%C5%A1tva//Index%20of%20Multiple%20Deprivation%20-%20Conceptual%20framework_18.06.2019.pdf.
World Bank. 2017b. “Monitoring Global Poverty: Report of the Commission on Global Poverty.” World Bank, Washington, DC. https://openknowledge.worldbank.org/bitstream/handle/10986/25141/9781464809613.pdf.
World Bank. 2017c. World Development Report 2017: Governance and the Law. Washington, DC: World Bank.
World Bank. 2018a. Data for Development: An Evaluation of World Bank Support for Data and Statistical Capacity. Washington, DC: Independent Evaluation Group, World Bank.
World Bank. 2018b. Improving Public Sector Performance: Through Innovation and Inter-Agency Coordination. Global Report: Public Sector Performance. Washington, DC: World Bank.
World Bank. 2018c. “Public Sector Savings and Revenue from Identification Systems: Opportunities and Constraints.” World Bank, Washington, DC.
World Bank. 2018d. “The Role of Digital Identification for Healthcare: The Emerging Use Cases.” Identification for Development (ID4D), World Bank, Washington, DC.
World Bank. 2019a. IC4D, Information and Communication for Development 2018: Data-Driven Development. Washington, DC: World Bank. http://documents1.worldbank.org/curated/en/987471542742554246/pdf/128301-9781464813252.pdf.
World Bank. 2019b. ID4D Practitioner’s Guide: Version 1.0. Washington, DC: World Bank. http://documents1.worldbank.org/curated/en/248371559325561562/pdf/ID4D-Practitioner-s-Guide.pdf.
World Bank. 2020a. “Classification of Fragile and Conflict-Affected Situations.” Brief, July 9, World Bank, Washington, DC. https://www.worldbank.org/en/topic/fragilityconflictviolence/brief/harmonized-list-of-fragile-situations.
World Bank. 2020b. “West Africa Unique Identification for Regional Integration and Inclusion Program, Phase 2.” Project Appraisal Document, Report PAD3556, World Bank, Washington, DC.
Yin, Peng, Michael Brauer, Aaron J. Cohen, Haidong Wang, Jie Li, Richard T. Burnett, Jeffrey D. Stanaway, et al. 2020. “The Effect of Air Pollution on Deaths, Disease Burden, and Life Expectancy across China and Its Provinces, 1990–2017: An Analysis for the Global Burden of Disease Study 2017.” Lancet Planetary Health 4 (9): 386–98.
Spotlight 2.1 Deploying data to curtail violence against women and girls

For too long, violence against women and girls has been a deep, dark secret. Now, data collection efforts around the world are shedding light on this tragic problem and leading to solutions.

Violence against women and girls (VAWG) is a global pandemic. One out of three women and girls (35 percent) worldwide between the ages of 15 and 49 has experienced physical violence, sexual violence, or both. At least 200 million girls and women have undergone female genital mutilation (FGM), and in at least 11 countries, more than half of women ages 15–49 have undergone FGM (figure S2.1.1).1 We know these facts because representative population-based studies have been undertaken to understand the prevalence of VAWG. These studies have used a standardized methodology in more than 90 countries across all regions and all income groups. For example, data for 55 low- and middle-income countries are available through a standardized module measuring VAWG, and this module has been incorporated in the Demographic and Health Surveys (DHSs).2

Figure S2.1.1 Prevalence of female genital mutilation in women ages 15–49, by country income level, 2010–19
[Bar chart: share of women subjected to FGM (%), by country, grouped into low-income, lower-middle-income, and upper-middle-income economies. Countries shown: Guinea; Egypt, Arab Rep.; Djibouti; Mali; Sudan; Sierra Leone; Eritrea; Burkina Faso; Gambia, The; Mauritania; Ethiopia; Guinea-Bissau; Central African Republic; Liberia; Côte d’Ivoire; Chad; Senegal; Kenya; Nigeria; Yemen, Rep.; Maldives; Tanzania; Benin; Iraq; Ghana; Togo; Niger; Uganda. Country values range from 94.5 percent down to 0.3 percent.]
Source: Adapted from Kashiwase and Pirlea 2019. Data are drawn from the World Bank World Development Indicators, https://databank.worldbank.org/FGM-Prevalence/id/a4f22755 (SH.STA.FGMS.ZS), using data from Demographic and Health Surveys, Multiple Indicator Cluster Surveys, and UNICEF. Data at http://bit.do/WDR2021-Fig-S2_1_1.
Note: FGM = female genital mutilation; UNICEF = United Nations Children’s Fund.

The availability and accessibility of reliable, comparable, and nationally representative VAWG data are leading to solutions, including laws banning domestic violence.3 Moreover, the data are informing diagnostic work, prevention and response efforts, and policies in low- and middle-income settings in key areas such as health, education, social protection, and governance. For instance, kNOwVAWdata,4 an initiative led by the United Nations Population Fund (UNFPA), collects VAWG data on 27 countries in Asia and the Pacific, shedding light on why survivors are not accessing services. The World Bank has used these data as a basis for its analytical and operational work. In the Great Lakes region of Africa, an in-depth analysis of DHS data helped to identify and target emergency and women’s health activities and to prevent and respond to VAWG in Uganda. In 2019 Peru’s president welcomed an in-depth analysis of VAWG data and expenditures, setting the stage for a national results-oriented budgetary plan to reduce VAWG that was supported by the Ministry of Economy and Finance and the Ministry of Women and Vulnerable Populations.

Investing in data to understand the barriers preventing survivors of violence from using essential support services

In addition to amassing data on the prevalence of VAWG, many countries and agencies that provide essential services to survivors of violence keep track of service-based data.
Data on reported cases of VAWG allow countries to understand who is seeking help, when, for what types of violence, and how often. Various barriers, including fear and lack of knowledge, may prevent women from seeking services; data help countries to understand and address these barriers. For instance, service-based data can be used to monitor important life-saving measures, like providing victims with post-exposure prophylaxis (PEP) within 72 hours of a sexual assault. Tracking how many survivors receive PEP can unveil barriers related to the supply chain of essential medicines or gaps in the training of health service staff.

Integrating service-based data with data representative of a given population can yield important insights. The Gender-Based Violence Information Management System (GBVIMS) provides a global example.5 This multiagency initiative facilitates the safe, ethical, effective, and efficient standardization and coordination of service-based data. While such efforts are critical, it is also important to ensure that investments in gender-based violence data systems do not divert limited funds and staffing away from the provision of services to the survivors of violence. Separate streams of investment—and greater investment—in service provision and data systems are necessary.

The first and foremost purpose of the GBVIMS and service-based data is to improve the quality and accessibility of services for survivors of VAWG. But for these systems to be effective, several foundational issues must be addressed. First, efforts to integrate data should be driven by the needs of women and girls seeking services, not by the ease of access to centrally located data. Second, to overcome silos and promote national monitoring, coordination is needed across multiple institutions with different mandates and data systems.
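To make the PEP indicator described above concrete, the following is a minimal Python sketch of how a facility-level coverage rate might be computed from service-based records. The record schema (`ServiceRecord`, its fields, and the facility labels) is hypothetical and is not the GBVIMS data model; real records would be de-identified and access-controlled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Window within which PEP should be provided after a sexual assault.
PEP_WINDOW = timedelta(hours=72)


@dataclass
class ServiceRecord:
    """One de-identified service contact (hypothetical schema for illustration)."""
    facility: str
    incident_time: datetime
    pep_time: Optional[datetime]  # None if PEP was not provided


def pep_within_window(record: ServiceRecord) -> bool:
    """True if PEP was given no later than 72 hours after the incident."""
    if record.pep_time is None:
        return False
    elapsed = record.pep_time - record.incident_time
    return timedelta(0) <= elapsed <= PEP_WINDOW


def pep_coverage_by_facility(records: list[ServiceRecord]) -> dict[str, float]:
    """Share of reported cases at each facility that received PEP within 72 hours."""
    totals: dict[str, int] = {}
    on_time: dict[str, int] = {}
    for rec in records:
        totals[rec.facility] = totals.get(rec.facility, 0) + 1
        if pep_within_window(rec):
            on_time[rec.facility] = on_time.get(rec.facility, 0) + 1
    return {fac: on_time.get(fac, 0) / n for fac, n in totals.items()}


if __name__ == "__main__":
    t0 = datetime(2020, 5, 1, 8, 0)  # illustrative incident time
    records = [
        ServiceRecord("A", t0, t0 + timedelta(hours=24)),
        ServiceRecord("A", t0, t0 + timedelta(hours=48)),
        ServiceRecord("B", t0, None),
        ServiceRecord("B", t0, t0 + timedelta(hours=80)),
        ServiceRecord("C", t0, t0 + timedelta(hours=10)),
        ServiceRecord("C", t0, None),
    ]
    print(pep_coverage_by_facility(records))  # {'A': 1.0, 'B': 0.0, 'C': 0.5}
```

The denominator is a design choice. This sketch divides by all reported cases. An indicator restricted to survivors who reached a facility within 72 hours would instead separate late presentation (an access barrier) from failures to provide PEP in time (for example, stock-outs or gaps in staff training).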
Addressing the surge in VAWG during the COVID-19 pandemic

VAWG has surged during the COVID-19 pandemic.6 Lockdown measures designed to contain the spread of the virus as well as the economic and health stresses associated with this crisis have contributed to an increase in violence—especially intimate partner violence.7 In the early days of the pandemic, the UNFPA warned that 31 million additional cases of gender-based violence could occur as a result of six months of lockdowns.8 Complicating matters, providing services has become more difficult because some resources have been diverted to the COVID-19 response and some services have been suspended altogether.9
Reliable data are crucial to understanding and addressing this situation. However, collecting reliable data on VAWG has been especially challenging during the COVID-19 pandemic. Face-to-face data collection, the predominant mode in low- and middle-income countries, has been widely suspended. The alternatives—remote data collection through telephone, text messaging, or the Web—increase the risk of violence: confidentiality is nearly impossible to ensure, and even electronic communications leave traces.10 Instead, sources of indirect information should be used: for instance, service-based data or key informant interviews with frontline workers.

Collecting VAWG data ethically

Special care must be taken when handling data on VAWG. Soberingly, collecting VAWG data can cause, and has caused, women to experience more violence. Ethical and safety guidelines must be followed when considering both the collection and sharing of such data. These guidelines identify minimum standards for the collection of VAWG data, such as the ability to offer referrals for support to all who say they have experienced violence; the ability to guarantee confidentiality and privacy for survivors when collecting and reporting on data; and the commitment to use the data collected for increased and improved action.
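One common way to support the confidentiality standard above when publishing service-based counts is small-cell suppression, a general statistical disclosure-control technique. It is not a method prescribed by the guidelines listed in this spotlight. The minimal Python sketch below assumes a one-dimensional table published together with its total. The threshold of 5 and the district labels are illustrative assumptions.

```python
# Primary threshold: nonzero counts below this value are withheld (illustrative value).
THRESHOLD = 5
SUPPRESSED = "suppressed"


def suppress_small_cells(counts: dict[str, int], threshold: int = THRESHOLD) -> dict[str, object]:
    """Return a copy of `counts` that is safer to publish alongside its total.

    Primary suppression hides every nonzero count below `threshold`.
    Secondary suppression: if exactly one cell was hidden, the published total
    would reveal it by subtraction, so the smallest remaining nonzero cell is
    hidden as well.
    """
    hidden = [key for key, value in counts.items() if 0 < value < threshold]
    if len(hidden) == 1:
        candidates = [key for key, value in counts.items()
                      if key not in hidden and value > 0]
        if candidates:
            hidden.append(min(candidates, key=lambda key: counts[key]))
    return {key: (SUPPRESSED if key in hidden else value)
            for key, value in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts of reported cases by district.
    reported = {"District A": 42, "District B": 3, "District C": 17, "District D": 0}
    print(suppress_small_cells(reported))
    # {'District A': 42, 'District B': 'suppressed', 'District C': 'suppressed', 'District D': 0}
    print("Total:", sum(reported.values()))  # Total: 62
```

Whether zero cells should also be withheld, and which threshold to use, depends on the applicable disclosure policy. Multi-way tables also need more careful complementary suppression than this one-dimensional sketch provides.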
The World Health Organization (WHO) has issued the following guidelines:
• “Putting Women First: Ethical and Safety Recommendations for Research on Domestic Violence against Women” (2001)11
• “Ethical and Safety Recommendations for Interviewing Trafficked Women” (2003)12
• “Sample Design, Ethical and Safety Considerations, and Response Rates” (2005)13
• “Ethical and Safety Recommendations for Researching, Documenting, and Monitoring Sexual Violence in Emergencies” (2007)14
• “Ethical and Safety Recommendations for Intervention Research on Violence against Women” (2016),15 building on lessons from the publication “Putting Women First: Ethical and Safety Recommendations for Research on Domestic Violence against Women” (2001)

The Sexual Violence Research Initiative of the Medical Research Council in Pretoria, South Africa, has also issued important guidelines:
• “Ethical and Safety Recommendations for Research on Perpetration of Sexual Violence” (2012)16

Notes
1. Female genital mutilation “does not provide any health benefits, but rather causes serious risks for women’s health, including chronic infections and pain, menstrual problems, and complications in childbirth” (Kashiwase and Pirlea 2019). See also United Nations Children’s Fund, Female Genital Mutilation (dashboard), updated February 2020, https://data.unicef.org/topic/child-protection/female-genital-mutilation/.
2. The country count is as of August 2020. For DHS data on violence against women and girls, see ICF International, STATcompiler (DHS Program STATcompiler) (database), http://www.statcompiler.com/. Select “Choose Indicator” and, from the dropdown menu, “Physical or sexual violence committed by husband/partner.” Then click “Next,” “Filter by World Region,” “Select All,” and “Next.” The data will appear and can be augmented and refined by choosing more categories from the “Indicators” and “Countries” menus on the right.
3. For case studies of the impact of VAWG data on policy, see “Disaggregated Data: Impacts of Demographic and Health Surveys,” Data Impacts Case Studies, Open Data Watch, https://dataimpacts.org/project/health-surveys/.
4. See Measuring Prevalence of Violence against Women in Asia-Pacific (dashboard), Regional Office for Asia and the Pacific, United Nations Population Fund, https://asiapacific.unfpa.org/knowvawdata.
5. See GBVIMS (Gender-Based Violence Information Management System) (dashboard), Inter-Agency GBVIMS Steering Committee, https://www.gbvims.com/.
6. Bettinger-Lopez and Bro (2020); Johnson et al. (2020).
7. United Nations (2020); UNDP (2020).
8. UNFPA (2020).
9. Johnson et al. (2020).
10. UN Women (2020).
11. WHO (2001).
12. WHO (2003).
13. García-Moreno et al. (2005).
14. WHO (2007).
15. WHO (2016).
16. Jewkes, Dartnall, and Sikweyiya (2012).
References
Bettinger-Lopez, Caroline, and Alexandra Bro. 2020. “A Double Pandemic: Domestic Violence in the Age of COVID-19.” In Brief (blog), May 13, 2020. https://www.cfr.org/in-brief/double-pandemic-domestic-violence-age-covid-19.
García-Moreno, Claudia, Henrica A. F. M. Jansen, Mary Ellsberg, Lori Heise, and Charlotte Watts. 2005. “Sample Design, Ethical and Safety Considerations, and Response Rates.” In WHO Multi-Country Study on Women’s Health and Domestic Violence against Women: Initial Results on Prevalence, Health Outcomes, and Women’s Responses, ch. 3, 19–24. Geneva: World Health Organization. https://www.who.int/reproductivehealth/publications/violence/24159358X/en/.
Jewkes, Rachel, Elizabeth Dartnall, and Yandisa Sikweyiya. 2012. “Ethical and Safety Recommendations for Research on Perpetration of Sexual Violence.” Sexual Violence Research Initiative, Gender and Health Research Unit, Medical Research Council, Pretoria, South Africa.
Johnson, Katy, Lindsey Green, Muriel Volpellier, Suzanne Kidenda, Thomas McHale, Karen Naimer, and Ranit Mishori. 2020. “The Impact of COVID-19 on Services for People Affected by Sexual and Gender-Based Violence.” International Journal of Gynecology and Obstetrics 150 (3): 285–87. https://doi.org/10.1002/ijgo.13285.
Kashiwase, Haruna, and Florina Pirlea. 2019. “200 Million Women and Girls in the World Today Have Undergone Female Genital Mutilation.” Data Blog (blog), September 27, 2019. https://blogs.worldbank.org/opendata/200-million-women-and-girls-world-today-have-undergone-female-genital-mutilation.
United Nations. 2020. “Policy Brief: The Impact of COVID-19 on Women.” United Nations, New York.
UNDP (United Nations Development Programme). 2020. “Gender-Based Violence and COVID-19.” UNDP Brief, UNDP, New York. https://www.undp.org/content/undp/en/home/librarypage/womens-empowerment/gender-based-violence-and-covid-19.html.
UNFPA (United Nations Population Fund). 2020. “Millions More Cases of Violence, Child Marriage, Female Genital Mutilation, Unintended Pregnancy Expected due to the COVID-19 Pandemic.” News (blog), April 28, 2020. https://www.unfpa.org/news/millions-more-cases-violence-child-marriage-female-genital-mutilation-unintended-pregnancies.
UN Women (United Nations Entity for Gender Equality and the Empowerment of Women). 2020. “Violence against Women and Girls: Data Collection during COVID-19.” UN Women, New York. https://www.unwomen.org/en/digital-library/publications/2020/04/issue-brief-violence-against-women-and-girls-data-collection-during-covid-19.
WHO (World Health Organization). 2001. “Putting Women First: Ethical and Safety Recommendations for Research on Domestic Violence against Women.” Document WHO/FCH/GWH/01.1, Department of Gender and Women’s Health, Family and Community Health, WHO, Geneva. https://www.who.int/gender/violence/womenfirtseng.pdf.
WHO (World Health Organization). 2003. “WHO Ethical and Safety Recommendations for Interviewing Trafficked Women.” Health Policy Unit, London School of Hygiene and Tropical Medicine, London; WHO, Geneva.
WHO (World Health Organization). 2007. “WHO Ethical and Safety Recommendations for Researching, Documenting, and Monitoring Sexual Violence in Emergencies.” WHO, Geneva. https://www.who.int/gender/documents/OMS_EthicsSafety10Aug07.pdf.
WHO (World Health Organization). 2016. Ethical and Safety Recommendations for Intervention Research on Violence against Women: Building on Lessons from the WHO Publication Putting Women First: Ethical and Safety Recommendations for Research on Domestic Violence against Women. Geneva: WHO.
Spotlight 2.2 The role of international organizations in improving public intent data

International organizations can aid countries bilaterally in addressing challenges regarding funding, technical capacity, governance, and data demand, and they can create global public goods to overcome these barriers.

More and better financing for data production

Coalitions of international organizations and development partners can provide coordinated global solutions for activities that fulfill specific data needs. Governments can finance such activities either under national budgets or through loans or grants from multilateral development banks. For example, the World Bank’s Data for Policy Package identifies a core set of social, economic, and sustainability statistics crucial for monitoring and evaluating development outcomes and provides governments with loans or grants to address these data needs. For relatively lower-income, data-deprived countries, this aid can help governments to prioritize which gaps to fill and supplement scarce national funding. Another example is the 50x2030 Initiative to Close the Agricultural Data Gap, a multipartner initiative that seeks to transform agricultural data systems across 50 low- and middle-income countries by 2030. It uses innovative funding mechanisms, leveraging donor funding to mobilize national funding and create national ownership.

Advancing research and development in methods and tools

In addition to investing in improvements in the technical capacity of data producers and users, international organizations can also foster technical capacity more broadly by providing global public goods through research and development in methods of data collection, curation, and analysis. For example, they can support innovations in data capture, including through portable sensors and mobile applications.
These innovations must be validated rigorously through methodological research activities that compare the relative accuracy, cost-effectiveness, and feasibility of new and traditional methods of data collection. Based on such research, guidelines can be formulated for integrating validated innovations into surveys, censuses, and administrative records. International organizations can play an important role in carrying out such research and promoting these innovations and associated guidelines. Partnerships between international organizations and national statistical offices (NSOs) in methodological research and development increase the likelihood that innovations in data capture will be adopted and implemented.

Developing, disseminating, and implementing global standards for statistical activities

International organizations can also support efforts to develop, disseminate, and implement international standards and guidelines for statistical activities. International statistical standards and guidelines need to be disseminated and adopted at the country level for data to be comparable across countries and hence for policy makers to be able to compare their performance with that of their peers. Among many examples are the System of National Accounts, developed by the United Nations in collaboration with several other partners; the International Labour Organization’s International Standard Classification of Occupations; and the monitoring and harmonization of data related to drinking water, sanitation, and hygiene by the World Health Organization and United Nations Children’s Fund (UNICEF). Technical collaboration—including joint research among like-minded international organizations, NSOs, technical partners, and academia—is critical to the successful production of international standards and guidelines, as are international forums for peer review, discussion, endorsement, and promotion of these public goods.

Coordinating actions to ensure the effective diffusion of public goods and funding activities

In the absence of coordination, organizations might finance overlapping activities or fragment investments, overwhelm national data systems, or produce conflicting standards and guidelines. The Inter-Agency and Expert Working Groups as well as the Intersecretariat Working Groups, under the aegis of the United Nations Statistical Commission, provide a platform for catalyzing collaborative work on the development of standards and should continue to be supported with periodic reviews of their terms of reference and desired outputs. Awareness of these working groups needs to be expanded, particularly within international organizations, to ensure coordinated actions within an organization.

Making data accessible and compatible with national priorities and spurring local demand

To satisfy increasing demand for data, international organizations should make their own data, syntax files, and metadata widely available and easily accessible beyond their own institutions. The data that international organizations require, such as data on the Sustainable Development Goals, affect the data produced by countries and can even crowd out the domestic production of data.
It is thus imperative for such standards and goals to be made compatible with the interests, priorities, and goals of countries. When this is the case, the data maintained by international organizations can spur local demand for cross-country data, foster their continued production, and create a virtuous cycle of data production and use.
CHAPTER 3
Data as a resource for the private sector

Main messages
1. Businesses are reaping tremendous value from both data created through businesses’ economic activities and data shared by governments. Used as an input in data-driven decision-making, those data can spur innovation in products and services and reduce transaction costs, ultimately boosting productivity, export competitiveness, and growth.
2. Use of data in the production process of firms may help tilt the playing field toward poor people and underserved populations (who can trade across platforms and access free services) by reducing fragmentation in markets. However, it can also exacerbate domestic inequalities where foundational skills, infrastructure, and finance are not widely available in countries.
3. Use of data by businesses can also tilt the playing field away from poor countries, whose local enterprises may struggle to compete with large global players in part because of economies of scale and scope from data.
4. Although the use of data in the production process presents many opportunities to solve development challenges, policy makers should heed the risks this use presents for the concentration of economic power, patterns of inequality, and protection of the rights of individuals.
Creating value and solving development challenges through data-driven business models

For millennia, farming and food supply have depended on access to accurate information. When will the rains come? How large will the yields be? What crops will earn the most money at market? Where are the most likely buyers located? Today, that information is being collected and leveraged at an unprecedented rate through data-driven agricultural business models. In India, farmers can access a data-driven platform that uses satellite imagery, artificial intelligence (AI), and machine learning (ML) to detect crop health remotely and estimate yield ahead of the harvest. Farmers can then share such information with financial institutions to demonstrate their potential profitability, thereby increasing their chance of obtaining a loan. Other data-driven platforms provide real-time crop prices and match sellers with buyers.

For remote populations around the world, receiving specialized medical care has been nearly impossible without having to travel miles to urban areas. Today, telehealth clinics and their specialists can monitor and diagnose patients remotely using sensors that collect patient health data and AI that helps analyze such data.

Innovations like these herald the promise of business models that apply data to create new and better goods and services, helping to address development challenges in the process. Both private intent and public intent data are increasingly being used by firms to create value in the production process. At the same time, data are continually being produced as a by-product of economic activity, creating digital footprints that drive the data economy. With their growing capacity to collect, store, and process these data, businesses are finding that their ability to extract value from the data has been rising exponentially in recent years.
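The India example at the start of this section mentions detecting crop health from satellite imagery. A common building block for that kind of analysis is the normalized difference vegetation index, NDVI = (NIR − Red) / (NIR + Red), which is higher over dense, healthy vegetation. The report does not say which method the platform uses, so treat the following as a minimal sketch built on that assumption. The band values are invented, and the 0.3 "possible stress" threshold is illustrative, not agronomic guidance.

```python
# Minimal sketch: NDVI-based screening of a satellite image tile.
# Assumptions (not from the report): NDVI as the crop-health signal,
# reflectance values in [0, 1], and a 0.3 "possible stress" threshold.

STRESS_THRESHOLD = 0.3  # illustrative only; real thresholds vary by crop and season


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # no reflected signal: treated as non-vegetated (and so flagged)
    return (nir - red) / total


def flag_stressed_pixels(nir_band, red_band, threshold=STRESS_THRESHOLD):
    """Return (row, col) positions whose NDVI falls below `threshold`."""
    flagged = []
    for r, (nir_row, red_row) in enumerate(zip(nir_band, red_band)):
        for c, (nir, red) in enumerate(zip(nir_row, red_row)):
            if ndvi(nir, red) < threshold:
                flagged.append((r, c))
    return flagged


if __name__ == "__main__":
    # Hypothetical 2 x 3 field tile (near-infrared and red reflectance bands).
    nir = [[0.50, 0.45, 0.20],
           [0.48, 0.15, 0.47]]
    red = [[0.08, 0.10, 0.15],
           [0.09, 0.12, 0.07]]
    print(flag_stressed_pixels(nir, red))  # [(0, 2), (1, 1)]
```

A production platform would probably feed indices like this, tracked over time, into a trained ML model together with weather and field data rather than apply a fixed cutoff. The threshold here only keeps the example self-contained.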
The COVID-19 crisis has created urgent demands for the private sector to adopt data-driven solutions to deal with the pandemic and increase resilience and productivity for recovery. Big Tech companies have been among the few winners during the crisis as consumers purchase more goods and services online. As businesses shift toward recovery, the new reality will likely accelerate trends toward data-driven technologies that allow for automation and traceability in value chains.

For all their promise, however, the accelerating pace of these trends also comes with risks related to the concentration of economic power, greater inequality, and protection of the rights of individuals. The degree to which individuals can benefit from the data-driven economy—including consumers, entrepreneurs, and job seekers—will differ according to their access to finance, education levels, skills, and technology. In charting a way forward, policy makers—across all stages of development of their country’s data-driven economy—should remain alert to these risks so that the use of data by firms contributes to broadly shared benefits.

The role of data in the production process of firms

The role of data in the production process can be conceptualized in different ways, depending on the specificities of the firms, industries, technologies, and types of data being considered. There is as yet no overarching theory or consensus on the role of data in the production process. The categories that follow summarize various ways of understanding the role of data in the creation of value by firms—as a factor of production, as a productivity enhancer, as a by-product, or as an output.

Data as a factor of production. For some firms, data are considered an input central to their business, essential to fulfillment of their core objectives.
In this context, data have been referred to as a factor of production—on a par with labor, capital, and land—that is a primary determinant of output and productivity.1 For example, many social media platforms are built around monetizing their users’ data for advertising.

Data as a productivity enhancer. Data may also be conceptualized as a driver of total factor productivity (TFP). Increases in TFP reflect a more efficient use of factors of production, often thought to be driven by technological change. Businesses use data along with various technologies to become more productive by improving their business processes, learning more about their clients and customers, developing new products, or making better data-driven decisions.2 In this context, the addition of data to the production process makes the main factors of production more efficient, leading to better performance. According to one study, in the US health care sector the use of big data has been associated with a 0.7 percent increase in productivity growth per year.3 Other studies have found that among 179 large publicly traded US firms, the adoption of data-driven decision-making has led to an increase in productivity of 5–6 percent.4

Data as a by-product of the production process. Data are often passively created as a by-product of economic activities. For example, call detail records (CDRs) are a by-product of telephone usage. Observed data
on consumers’ browsing and buying patterns are a by-product of online e-commerce. Data created in this way can be used in the production of new products or services, either by the firm that produced the original data or by other firms with which the data are shared, such as under commercial arrangements. For example, e-commerce platforms use data created as a by-product of transactions on their platform to improve their product offerings; credit card companies sometimes sell their transaction data for a specific location to firms involved in tourism in that location; and new firms use CDRs for commercial purposes, including analytics and advertising.5

Data as an output. For some firms, data are the primary output of the production process. Examples are data intermediaries, including rating services such as Nielsen; pollsters such as Gallup; and data aggregators such as dataPublica.6 These data are then used either by other firms in their production processes or by government in policy making. In all cases, data have a role in creating value for the economy, but the way in which data play into the production process differs by context.

Pathways to development

Whether the use of data in the production process is conceptualized as a factor of production or a driver of productivity, its transformative effects on development can be summarized by four channels:

1. Quality improvements in existing products and services. This channel includes the use of data-driven decision-making to provide consumers with better health diagnostics, better credit scoring, better search results, and more personalized product recommendations.
2. Cost reduction in delivering products and services. Data and analytics can reduce the costs of delivery, which can then lower prices (subject to markets being sufficiently competitive). For example, better credit scoring can reduce the cost of delivering loans and lead to lower interest rates on loans. Sensor-based agricultural devices and platforms that take and analyze soil readings can inform farmers how much fertilizer they should apply, which should reduce wastage and costs.
3. Greater innovation in development of new products and services. Examples include the development of new financial products, smart contracts and supply chain tracking services, new products that rely on applications such as online maps or translation, and new consumer goods based on analysis of purchasing trends.
4. More effective intermediation and lower transaction costs. Platform firms can help solve market failures and lower the entry and transaction costs for firms that connect to those platforms. This happens in part by reducing information asymmetries, thereby increasing trust in those firms. Distributed ledger technologies (DLTs) can not only reduce transaction costs but also enhance trust through secure transactions. Better intermediation can disrupt traditional market structure and reduce the market power of intermediaries, particularly in sectors such as agriculture where they have traditionally played a central role in the value chain.

These four channels to increasing the impact of data on development are driven by two key effects. First, analytics applied to data can reveal patterns that allow better data-driven decision-making. Second, data can help to facilitate transactions, including by matching the suppliers of goods and services with those who demand them. In this way, the use of data can help overcome market failures, with positive effects on productivity, growth, jobs, and welfare (figure 3.1).

Data-driven businesses and the technologies that help them create value

Data-intensive analytics can be used to discover new insights, enhance decision-making, and optimize processes.
When data are characterized by the “3 V’s”—volume, velocity, and variety—they can serve as inputs to big data analytics. Such analytics typically require new methodologies and technologies to enable enhanced decision-making (box 3.1). This chapter focuses on the development impact of business models that use data-intensive technology or analytics as their key value drivers, whether they are technology firms (the providers of data-intensive technological solutions) or traditional firms and entrepreneurs (the adopters of data-intensive technologies).

Firms may use various data-driven technologies by themselves or in combination. A key business model that has emerged using data-intensive technologies is the data-driven platform business, which uses data, along with AI/ML and other analytics, to intermediate between distinct user groups and match supply with demand. By overcoming informational asymmetries and reducing search costs, these businesses facilitate market exchanges and generate more data on users and their behavior. Some may also use a combination of other technologies. For example, the platform GrainChain uses DLT to broker secure transactions
Figure 3.1 The role of data in the production process: Pathways to development
[Diagram: data in the production process work through two effects, better data-driven decision-making and data-driven matching of demand and supply and transaction facilitation. These drive four channels: quality improvements; cost reduction and process optimization; product innovation; and more effective intermediation and lower transaction costs. The channels lead to development outcomes: boost opportunities for lagging regions and populations; create new export opportunities; promote productivity and growth; create new markets and disrupt the market power of traditional incumbents; and support local entrepreneurship ecosystems. Related opportunities: access to “free” services, adaptation to COVID-19 challenges, and support for the green agenda.]
Source: WDR 2021 team.

Box 3.1 Technologies and methods that support data-driven decision-making and intermediation

Technology that supports data-intensive analytics: artificial intelligence, including machine learning

Artificial intelligence (AI) and machine learning (ML) can help firms analyze their data with less manual effort. AI is the development and use of any device that perceives its environment and takes actions that maximize its chance of successfully reaching a defined goal (including learning and adapting to its environment). It is not a single technology but a family of technologies. Machine learning is one application of AI. The algorithms that underlie AI rely on inputs of large amounts of data to learn and produce accurate and valuable insights. Based on adoption patterns, studies predict that firms responsible for about 70 percent of economic output will have adopted at least one type of AI technology by 2030.a

Data-intensive analytic applications and big data analytics sometimes require that data be processed in different formats and distributed across different locations. The technologies that enable such processing may include cloud computing, bio-inspired computing, or quantum computing.
They also require the capacity to store big datasets and to clean them to correct inaccuracies. Technology that collects data and actions insights from analytics: smart devices and devices connected through the Internet of Things (IoT) Devices include sensors and monitors that generate data. Smart devices rely on these “machine-generated” data to improve their operations, often using AI. Devices are increasingly being connected to the IoT, which allows them to receive and send data from and to other IoT devices on ground moisture, climate and air quality, individuals’ health metrics, firm asset performance, and the movement of goods through supply chains. IoT and machine-generated data from devices are poised to mul- tiply exponentially the data generated by businesses, with potential for development in agriculture, health, manufacturing, and transportation (such as driverless vehicles). IoT devices already exceed the number of internet users and are forecast to reach 25 billion by (Box continues next page)
  • 115. Data as a resource for the private sector | 95 between buyers and sellers of agricultural commodi- ties, while employing Internet of Things (IoT) devices to accurately measure variables such as commodity weight. In those platform businesses that earn a sig- nificant proportion of their revenue from advertising, datacollectedthroughtheplatformareusedtoinform that advertising. Platform models are a key focus of this chapter and of many of the economic policy issues raised in chapter 7 because of their importance to low- and middle-income countries. The extent to which data-intensive technologies can be deployed relies on the presence of key infra- structure, most fundamentally network coverage. There is also the challenge of bringing more people online, especially in countries with a gap between the number of people who have access to networks and those who are online. This gap is a function of afford- ability, the existence of local content, and digital skills (see chapter 5). As more people and devices come online and data usage matures, the network capacity needed will grow, requiring sufficient spectrum to be made available for mobile use—especially in lower- income countries where mobile is the predominant technology. Although 4G technology is sufficient for many IoT uses, 5G will be needed for those uses that require ultra-reliability and low latency such as smart energy grids and autonomous vehicles. Reliability of connection is important for DLT applications that must keep a reliable and consistent record of data. Storage and analysis of the data generated through IoT devices and platform business models depend on cloud computing (remote storage and process- ing infrastructure) and the ability to transmit data over the internet to data centers either locally or abroad. 
Beyond data infrastructure, most technology applications require a suite of other foundational systems to create value, including reliable payment systems and logistics networks, transport infrastruc- ture, and address systems. Data infrastructure policy is discussed further in chapter 5. Figure 3.2 summarizes how data are used as an input to and produced as a by-product of economic activity. It illustrates how data created through eco- nomic activity can be used as an input to either the same economic activity or new activities. Focus on platform firms in low- and middle-income countries Platform businesses, one of the most ubiquitous and transformative data-driven models today, reduce transaction costs and alleviate market failures.7 Ranging from start-ups to businesses operating at scale, they are a mix of both locally grown and foreign firms, and they are expanding across low- and middle- income countries. More than 300 digital platforms headquartered in Africa were active across major Sub-Saharan African economies as of 2020.8 In Asia, a study looking at local platforms that had reached scale identified 62 major local platforms with an individual market capitalization of at least US$800 million as of 2016, half of which were located in China.9 The diversity of new platforms is evident in recent research examining both start-ups and scaled plat- forms. At least 959 platform firms have established a physical presence in a sample of 17 low- and middle- income countries10 from all regions across four sectors that are important for jobs or economic productivity: e-commerce, transport and logistics (including both freight and passenger transport), agriculture, and Box 3.1 Technologies and methods that support data-driven decision-making and intermediation (continued) 2025, with the introduction of fifth-generation (5G) wireless technology. 
Technology that creates transparency and trust in data records: distributed ledger technology, including blockchain Distributed ledger technology (DLT) is a distributed database in which data are recorded, shared, and syn- chronized across the nodes (or devices) of a network. Blockchain is a type of DLT whereby information is con- solidated into “blocks” that are linked in a way in which they can add information layers to the ledger, which cannot be changed (in an “append-only” fashion). Block- chain records transactions, tracks assets, or transfers value between two parties in a verifiable and permanent way without the need for a central coordinating entity. Because everyone participating in the blockchain can see all transactions, the technology engenders peer-to-peer trust and has several applications, including enabling payments, smart contracts, supply chain tracking, and resolving data protection and security issues in the IoT. a. MGI (2018).
  • 116. 96 | World Development Report 2021 tourism.11 In the sample, Bangladesh, Brazil, Indone- sia, Kenya, and Nigeria have relatively high numbers of platform firms when controlling for gross domestic product (GDP) per capita (figure 3.3, panel a). Across the countries in the sample, most platform firms are recent entrants—55 percent were estab- lished in the past five years.12 Only 11 percent of firms were established more than 10 years ago. Firms also tend to be small—over 80 percent have 50 or fewer employees, and almost half (47 percent) have 10 or fewer (figure 3.3, panel b). Most firms have remained active (defined as having an active and up-to-date online presence) since they were established; the average share of firms currently active across regions is more than 80 percent. Sub-Saharan Africa is an outlier: nearly half of its firms appear to be inactive.13 E-commerce has the highest share of platform firms in 82 percent of countries in the sample, with the highest shares in South Asia and the Middle East andNorthAfricaandthelowestinEuropeandCentral Asia. The agriculture sector tends to have the smallest share of firms across regions, with the exception of Sub-Saharan Africa. The importance of e-commerce in the data economy is also reflected in web traffic. Although local data-driven firms are on the rise in low- and middle-income countries, foreign-headquar- tered firms have a significant presence, underscoring the global nature of the data-driven economy. Their presence is also a reminder that the platform econ- omy is still nascent in lower-income countries rela- tive to high-income economies (partly due to issues around trust, lack of digital skills, and lack of access to finance). Of the top 25 websites in terms of traffic in the 17 low- and middle-income countries sampled, 59 percent belong to firms with foreign headquarters on average14 —however, the figure varies across coun- tries (figure 3.4). 
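Box 3.1 describes a blockchain as a chain of linked blocks that cannot be changed after they are added. A common way to build such links is for each block to store a cryptographic hash of the block before it. Altering any earlier record then breaks the link held by the next block.

The Python sketch below illustrates only that linking idea, under simplifying assumptions:
- a single in-memory list stands in for the ledger;
- the `Block` and `Ledger` names and the record fields are hypothetical;
- the network of nodes, consensus rules, and signatures that real distributed ledgers rely on are left out.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


@dataclass(frozen=True)
class Block:
    index: int
    data: dict       # the recorded transaction (hypothetical fields)
    prev_hash: str   # digest of the previous block; this is the link

    def digest(self) -> str:
        # Deterministic serialization: the same content always gives the same SHA-256 digest.
        payload = json.dumps(
            {"index": self.index, "data": self.data, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only chain of blocks held by a single node (no network, no consensus)."""

    def __init__(self) -> None:
        self.blocks: list[Block] = []

    def append(self, data: dict) -> Block:
        prev = self.blocks[-1].digest() if self.blocks else GENESIS_HASH
        block = Block(index=len(self.blocks), data=dict(data), prev_hash=prev)
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        # Each block must carry the digest of the block before it.
        expected = GENESIS_HASH
        for block in self.blocks:
            if block.prev_hash != expected:
                return False
            expected = block.digest()
        return True


ledger = Ledger()
ledger.append({"seller": "farm-A", "buyer": "mill-B", "tonnes": 12.5})
ledger.append({"seller": "farm-C", "buyer": "mill-B", "tonnes": 8.0})
print(ledger.is_valid())  # True

# Rewrite the first record, as a dishonest party might try to do.
first = ledger.blocks[0]
ledger.blocks[0] = Block(first.index, {**first.data, "tonnes": 20.0}, first.prev_hash)
print(ledger.is_valid())  # False: block 1 still points at the old digest of block 0
```

This check alone would not catch a change to the most recent block, because no later block points to it. Real systems address that gap by replicating the chain across many participants.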
Although firms from high-income countries are widely present in lower-income countries, the opposite is not true. Only 15 percent of digital firms headquartered in Sub-Saharan Africa operate outside the region, and most of those have expanded to the Middle East and North Africa.15

The leading global platforms are highly relevant to the digital ecosystems of lower-income countries and their citizens, particularly for online search and social media. Google, YouTube (which is owned by Google), and Facebook are among the top 10 most visited websites in 62 of 77 low- and middle-income countries (figure 3.5, panel a). These platforms also have the highest average daily time spent on the site per user globally (figure 3.5, panel b). In online markets, firms compete for the attention of viewers. Such popularity can therefore significantly intensify these platforms’ market power in advertising, which is in turn important for suppliers of other products. It can also increase the amount of data collected about users. Google’s Next Billion Users initiative is specifically aimed at developing products and services for lower-income countries. Facebook has launched an app aimed at providing free data in lower-income countries. WhatsApp (owned by Facebook) is by far the most used mobile application globally in terms of time.16 Because these firms are global, dynamics in overseas markets that affect their strategies and policies will have repercussions for those in low- and middle-income countries.

Data traffic over the internet is also highly concentrated in a few companies. Six US companies generate more than 40 percent of the world’s internet data flows (figure 3.6, panel a). Across the top 25 websites (by traffic) in the 17 sampled countries featured in figure 3.4, some 60 percent are owned by five firms headquartered in the United States (Google, Microsoft, Facebook, Verizon, and Amazon). Significant non-US parent companies include Naspers (headquartered in South Africa), Alibaba (China), and Jumia (which has its operations largely in Nigeria); see figure 3.6, panel b.

Figure 3.2 The role of data in economic activity
[Diagram elements: data inputs for economic activity (data volunteered by users; data observed on users, objects, and processes through economic activity, including machine-generated data; data inferred through analytics; open public data); platforms and other data-driven businesses using technologies, including big data analytics for insights and innovation (including using AI), IoT devices and smart devices, and transactions facilitated by DLT (including blockchain); big data cleaning and storage, supported by network connectivity, storage capacity, and processing capacity; economic activity (firms, entrepreneurs, farmers); data produced as a by-product of economic activity; data used to inform advertising, a source of revenue for economic activities in advertising-based business models.]
Source: WDR 2021 team.
Note: AI = artificial intelligence; DLT = distributed ledger technology; IoT = Internet of Things.

Figure 3.3 Platform firms are numerous in some lower-income countries but tend to be small
[Chart. Panel a: Number of platform firms and platform firms weighted by GDP per capita (bars: number of firms; right axis: number of firms per US$1,000 GDP per capita), for Armenia, Morocco, Tunisia, Sri Lanka, Peru, the Philippines, the Arab Republic of Egypt, Colombia, Ukraine, Bangladesh, Kenya, South Africa, Malaysia, Nigeria, Indonesia, the Russian Federation, and Brazil. Panel b: Distribution of platform firms, by region and number of employees (share of firms, %; size bands 1–10, 11–50, 51–100, 101–250, and more than 250 employees), for South Asia, East Asia and Pacific, Europe and Central Asia, Middle East and North Africa, Latin America and the Caribbean, and Sub-Saharan Africa.]
Sources: Nyman and Stinshoff (forthcoming), based on information from Crunchbase, Crunchbase (database), https://www.crunchbase.com/; World Bank, Digital Development (database), https://www.worldbank.org/en/topic/digitaldevelopment; Dow Jones and Company, Factiva (database), https://professional.dowjones.com/factiva/; Thomson Reuters Foundation, “Inclusive Economies,” http://www.trust.org/inclusive-economies/; Alexa Internet, “The Top 500 Sites on the Web, by Country” (accessed April 2020), https://www.alexa.com/topsites/countries; World Bank, World Development Indicators (database), https://datatopics.worldbank.org/world-development-indicators/.
Note: Panel a shows the number of platform firms and platform firms weighted by gross domestic product (GDP) per capita in selected low- and middle-income countries. The total sample of platform firms is 959. Per capita GDP is in constant 2010 US dollars for 2019. Panel b shows the share of firm sizes in terms of number of employees by region in a sample of 595 active platform firms.

Figure 3.4 The importance of domestic versus foreign-headquartered firms differs across countries as indicated by firm share of top websites
[Chart: share of top websites (%) belonging to foreign-headquartered firms, with domestic firms accounting for the remainder: Indonesia 20; Russian Federation 24; Egypt, Arab Rep. 48; Bangladesh 52; Nigeria 52; Morocco 56; Kenya 56; Armenia 60; South Africa 64; Malaysia 64; Ukraine 64; Tunisia 64; Brazil 68; Sri Lanka 68; Colombia 76; Peru 80; Philippines 84.]
Source: WDR 2021 team, based on Alexa Internet, “The Top 500 Sites on the Web, by Country” (accessed April 2020), https://www.alexa.com/topsites/countries.
Note: The figure shows the percentage of websites for firms with foreign headquarters versus domestic headquarters among the top 25 websites per country based on traffic. Headquarters is understood to be the global headquarters, not the domestic or regional office. Total sample size is 425 websites.

Figure 3.5 Users visit and spend more time on Facebook, Google, and YouTube than other websites
[Chart. Panel a: Share of countries where Facebook, Google, and YouTube are in the top 10 visited websites, by country income group (low-, middle-, and high-income; percent). Panel b: Daily time spent on website (global average): Facebook, Google, and YouTube versus other top 10 visited websites, in average daily minutes per user: Facebook.com 17.42; Google.com 13.12; Youtube.com 13.04; others (average) 7.01.]
Source: WDR 2021 team, based on data from Alexa Internet, “The Top 500 Sites on the Web, by Country” (accessed April 2020), https://www.alexa.com/topsites/countries.
Note: Sample of 1,270 websites (top 10 websites in 127 countries).

Data inputs for economic activity

The “digital footprint” and data collection by firms

Everything a digital user does leaves a trail, whether it is making a phone call, sending a text, conducting an online search, posting on social media, or making a digital transaction. The digital footprint of an individual or business is their collection of traceable digital activities and communications on the internet or other digital media. Data collected through devices, particularly IoT devices, can also capture insights on individuals and firms. For firms, such insights come from data on throughput and efficiency, spare capacity, and asset quality, among other things. For individuals, they typically involve health and biometric data.

Digital footprints can be created actively, when a user chooses to share information, such as by posting on social media or volunteering information to register for services. They can also be created passively, when data are left behind as a by-product of other activities, such as an Internet Protocol (IP) address, search history, or internet clicks. Firms typically collect both active and passive data. Often, this information is collected at exceedingly high frequency and microgranularity.

Big data and ML algorithms enable firms to draw inferences about the characteristics of individuals, such as attitudes and socioeconomic status. They also support inferences about other businesses, such as performance, capacity, and reputation. In low- and middle-income countries, digital footprints are best known for their ability to predict repayment behavior. However, applications also extend to:
• new products and improvements in service delivery across the economy;
• analytical tools for businesses;
• consumer profiles used to sell advertising services, a practice that ultimately subsidizes many of the “free” products consumers use today.
A number of risks have been identified and concerns raised about these methods and applications. These are discussed later in the chapter.

Data collected through mobile phone use have been one of the fastest-growing sources of information on users and their behavior. Call detail records (CDRs) held by mobile network operators (MNOs) contain certain data on every call and text made. These include the telephone numbers of the caller and receiver, the date and time of the interaction, and the associated cellphone tower. This information is primarily collected for billing purposes. It can also be used to identify the behavior, mobility patterns, and social networks of users.
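The CDR fields described above (caller, receiver, date and time, and cell tower) are enough to derive simple behavioral indicators. The Python sketch below is illustrative only. The `CDR` record layout and the `user_features` function are hypothetical, not a standard operator schema, and the sketch looks only at the caller side of each interaction.

For each subscriber it counts:
- interactions;
- distinct contacts, a crude proxy for social network size;
- distinct towers, a crude proxy for mobility;
- active days.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CDR:
    """One call or text event (hypothetical, simplified layout)."""
    caller: str         # pseudonymized ID of the initiating subscriber
    receiver: str       # pseudonymized ID of the other party
    timestamp: datetime
    tower_id: str       # cell tower that handled the caller's side


def user_features(records: list[CDR]) -> dict[str, dict[str, int]]:
    """Aggregate outgoing events into simple per-caller indicators."""
    interactions = defaultdict(int)
    contacts = defaultdict(set)
    towers = defaultdict(set)
    days = defaultdict(set)
    for r in records:
        interactions[r.caller] += 1
        contacts[r.caller].add(r.receiver)     # who the caller reaches
        towers[r.caller].add(r.tower_id)       # where the caller has been
        days[r.caller].add(r.timestamp.date()) # on which days the caller was active
    return {
        user: {
            "interactions": count,
            "distinct_contacts": len(contacts[user]),
            "distinct_towers": len(towers[user]),
            "active_days": len(days[user]),
        }
        for user, count in interactions.items()
    }


records = [
    CDR("A", "B", datetime(2020, 3, 1, 9, 15), "T1"),
    CDR("A", "C", datetime(2020, 3, 1, 18, 40), "T2"),
    CDR("A", "B", datetime(2020, 3, 2, 8, 5), "T1"),
    CDR("B", "A", datetime(2020, 3, 2, 12, 0), "T3"),
]
print(user_features(records)["A"])
# {'interactions': 3, 'distinct_contacts': 2, 'distinct_towers': 2, 'active_days': 2}
```

Real analyses of CDRs work with pseudonymized identifiers at far larger scale. They also raise the privacy concerns discussed later in the chapter.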
MNOs can also track data on the use of value-added services, internet services, and mobile money transactions.17 Data-driven firms, including e-commerce, online search, and social media firms, also produce data on behavioral patterns. These patterns can be detected from the time, frequency, and extent of transactions or communications. Platform firms have access to several variables that can help predict economic status:
• the device type of their users (desktop, tablet, mobile);
• the operating system (Windows, iOS, Android);
• the channel through which a user arrived at the firm’s home page.
For example, having an iOS device consistently correlates with being in the top quartile of the income distribution in the United States.18

A study in Germany found that the time of day when purchases were made on an e-commerce site predicted a consumer’s self-control and repayment behavior. People who made purchases between noon and 6 p.m. were half as likely to default on their payment as those who made purchases between midnight and 6 a.m. Consumers coming from a price comparison website were roughly half as likely to default as customers directed to the site via search engine ads. This finding is consistent with research on impulse shopping.19

Figure 3.6 Internet traffic in low- and middle-income countries is concentrated in several US-based firms
[Chart. Panel a: Share of global internet traffic by firm, 2018 (% of total): Google 12; Netflix 11; Facebook 8; Microsoft 5; Apple 4; Amazon 3; others 57. Panel b: Ten most visited parent firms among the top 25 websites, share of top websites owned by firm (%), with average traffic rank on the right axis: Google 28.3; Microsoft 11.4; Facebook 8.9; Verizon 6.3; Amazon 5.9; Zoom 5.5; Alibaba 3.8; Naspers 3.8; Jumia 1.7.]
Source (panel a): Sandvine 2019. Data at http://bit.do/WDR2021-Fig-3_6_a.
Note (panel a): “Others” refers to file sharing, marketplace, security, and virtual private network (VPN) messaging, as well as cloud and audio streaming.
Source (panel b): Nyman and Stinshoff (forthcoming), based on information in Alexa Internet, “The Top 500 Sites on the Web, by Country” (accessed April 2020), https://www.alexa.com/topsites/countries.
Note (panel b): The panel shows the percentage of the top 25 websites owned by the listed parent organizations in the sample. Traffic rank is from a low of 1 (more traffic) to a high of 20 (least traffic). Total sample includes 425 websites from 17 low- and middle-income countries. Ownership is defined as majority shareholding.

Finally, firms collect data generated by phenomena that are not attributable to a specific individual or business. These data are often generated by machines or devices. They can include data on traffic, weather and climatic conditions, and, in computing or network industries, network usage. This Report highlights many examples of using this type of data for development purposes.

The use of open public intent data by businesses

Businesses also use public intent data for commercial purposes, particularly where those data are nonpersonal and where their private use creates positive spillovers. The use of open public intent data by private firms is prevalent in advanced economies with advanced national data systems. There are also examples from low-income and emerging economies. Spotlight 3.1 discusses these uses in greater detail.

The positive development impacts of data used in the production process

This section presents examples of potential positive development impacts that data used in firms’ production processes can facilitate. These follow the third development pathway of data generated by private firms described in chapter 1. The section then turns to the risks from firms’ use of data that policy makers need to consider.
These examples are included to help set out what the current production possibility frontier might look like going forward. Although many of the examples may be outliers, they can indicate what may be possible and what countries may want to aspire to.

Boosting opportunities for lagging regions and populations by reducing market fragmentation. Data-driven business models can lower entry costs to markets and provide new opportunities for small firms and low-income households.20 Firms in less populated areas can expand their access to markets through data-driven platforms, which match sellers and buyers. They can also use the logistics services of e-commerce platforms, which provide support in bringing products to market from more remote areas. Costs associated with distance are as much as 65 percent lower for online vendors active on global e-commerce platforms, compared with those for offline vendors.21 Lower-income countries could reap substantial benefits from such e-commerce platforms. The cost-reduction effects of platforms tend to be greater for exporting countries that are unknown or less trustworthy to consumers (as measured by corruption indexes).22

China’s Taobao Villages are rural and traditionally lower-income. In these villages, annual e-commerce transaction volumes exceed 10 million yuan and at least 10 percent of households engage in e-commerce. There, rural households trading goods on the Taobao platform have significantly higher incomes and higher income growth than those that do not.23 These data suggest large benefits of data-driven business models in lagging regions. However, the evidence base is just beginning to develop, and not all signs are encouraging. For example, a recent randomized trial in China that connected rural villages to e-commerce found little evidence of income gains for rural producers and workers.
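The Taobao Village criteria quoted above combine two thresholds: one on total transaction volume and one on household participation. The minimal Python predicate below makes both conditions and their boundaries explicit. Note that the volume must exceed 10 million yuan, while the participation share need only reach 10 percent.

It assumes that village-level totals are already known. The function and parameter names are hypothetical, and it is not an official classification procedure.

```python
def is_taobao_village(annual_ecommerce_yuan: float,
                      ecommerce_households: int,
                      total_households: int) -> bool:
    """Check the two thresholds quoted in the text (illustrative, not an official classifier)."""
    if total_households <= 0:
        raise ValueError("total_households must be positive")
    # Volume criterion: annual e-commerce transactions must exceed 10 million yuan.
    volume_ok = annual_ecommerce_yuan > 10_000_000
    # Participation criterion: at least 10 percent of households engage in e-commerce.
    # Comparing integers avoids floating-point rounding at the 10 percent boundary.
    participation_ok = 10 * ecommerce_households >= total_households
    return volume_ok and participation_ok


print(is_taobao_village(12_000_000, 60, 500))  # True
print(is_taobao_village(10_000_000, 60, 500))  # False: volume must exceed 10 million
print(is_taobao_village(12_000_000, 40, 500))  # False: only 8 percent of households
```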
Understanding why Taobao Villages succeeded while other Chinese villages gained little is a crucial part of the future research agenda.

Creating new export opportunities. Entrepreneurs can now market their goods remotely, and intangible data-enabled services also flow across borders. The boom in data-enabled services creates opportunities for new entrants in global trade. It may also foster economic growth in countries that have traditionally lagged in access to global markets. For example, the rise in Indian services exports has been associated with gains in per capita income and a decline in urban and rural poverty headcount ratios.24 The Indian digital services industry has also created employment, especially for women and in smaller cities with populations of about 1 million or less. This can help bridge economic and social inequalities.25 The Philippines is another important beneficiary. It exports an estimated US$23 billion in services enabled by information technology (IT), equivalent to almost half of the country’s merchandise trade exports and more than double its total agricultural exports.26 In Africa, Senegal, a low-income country, boasts a dynamic digital services sector focused on business services as well as apps and software solutions geared toward regional markets. Mauritius has developed a fast-growing, export-oriented digital business services industry, diversifying its services-centered economy away from tourism.27 This expansion into digital services proved important when the COVID-19 pandemic caused tourism to collapse. Studies show that trade in services, in particular IT-enabled business services, is less prone to sudden collapses than other forms of global trade.28

Promoting productivity and growth. Evidence from lower-income countries is scant, but some exists for high-income economies. It covers four industries (hotels, restaurants, taxis, and retail trade) in 10 member countries of the Organisation for Economic Co-operation and Development (OECD).29 In these industries, the average service provider enjoyed bigger increases in the overall productivity of labor and capital between 2011 and 2017 in countries with relatively high online platform development.30 Increased e-sales activities accounted for 18 percent of labor productivity growth in 14 European countries from 2003 to 2010.31 There is also some evidence from Europe of a smaller productivity gap between large and small firms in the sectors that use online sales most.32

Machine learning could fundamentally revolutionize innovation, bringing data to the core of the growth process.33 Empirical evidence on 18,000 US manufacturing plants between 2005 and 2010 finds that firms with more intensive data usage are significantly more productive. This is due not only to technology adoption but also to data-driven decision-making.34 A 2018 modeling exercise estimates that AI could deliver additional global economic output of about US$13 trillion between 2018 and 2030, net of competition effects and transition costs. That would boost global GDP by about 1.2 percent a year.35 Of course, not all countries will be affected similarly, and the evidence base for low- and middle-income countries needs to be developed.
This chapter focuses on data-driven business models and on productivity gains through data, and its case studies lean toward higher-income countries. That imbalance is itself a sign of the lopsided distribution of benefits deriving from data.

Creating new markets and disrupting the market position of traditional incumbents. E-money platforms, among others, have challenged traditional banks and other service providers in transferring funds to and from accounts. For example, M-Pesa reached 9.5 million customers within its first three years of operation in Kenya, a country with only 8.4 million bank accounts.36 Electronic freight exchanges that match carriers with cargo holders, such as uShip and Mober, have prompted global logistics providers such as Schenker and DHL to develop their own digital exchanges. Incumbent taxi providers were protected by fixed license caps but are now exposed to competition from ride-hailing apps. These apps can improve the inclusion of some parts of the population. For example, in Mexico City the proportion of female drivers with Uber (5 percent) is higher than in Mexico’s taxi industry (0.5–2.5 percent).37 A 2017 study of 2,000 firms in 60 countries found that digital entrants boost the size of an industry both by realizing latent demand and by taking market share from incumbents.38 The study also showed that the profits of incumbents fall significantly in response to competition from digital entrants, and the slowest-growing incumbents are the most affected. These effects should translate into welfare benefits for users through more affordable products and greater innovation. However, the lowest income groups are less likely to feel these benefits, because they do not participate in these markets as consumers or suppliers.

Supporting local entrepreneurship ecosystems. An ecosystem is an interconnected set of services accessed through a single integrated experience.
Facebook, for example, enables users to shop, make hotel bookings, message contacts, read the news, and chat with a doctor, all with different firms but through a single interface. Successful data-driven firms often spark new business models through complementary products and aftermarkets. Such local ecosystems of symbiotic and interdependent firms frequently rise up around leading multinational platforms. The leading global platforms are therefore highly relevant for the digital ecosystems of countries and their citizens. Ecosystems built around larger firms can help lower-income economies build digital capabilities, especially because they integrate data across a series of services to increase the scale and scope of datasets. For example, Apple’s HealthKit platform gives Apple device users the option to share their health and activity data across applications on their smartphones. Through this integration, researchers, hospitals, and developers of health care and fitness apps can access valuable data to inform patient care, marketing, and product development. The development of a healthy ecosystem relies on access to data and systems.

Related opportunities arising from data-driven business models

Data-driven business models can also create three related opportunities that are not linked directly to the four channels discussed earlier. Evidence is just emerging, but it indicates that these models could be important for lower-income countries.

Providing nominally “free” services to consumers. Nominally “free” or “zero price” services have become an integral part of our lives. Free messaging services, video communication, social connection tools, search engines, map services, storage, and translation and payment apps are now commonplace. They are increasingly being developed with low-income consumers in mind. Google is rolling out scaled-down search and e-mail apps for low-end smartphones, as well as voice search in various dialects to overcome literacy challenges. Free services also serve as important inputs for other data-driven businesses. Digital start-ups often rely on integration with digital payment systems, cloud storage, and online analytical tools. Online map services are a crucial input for transformative business models that match and connect users in different geographic locations.

The welfare gains of nominally “free” digital goods are substantial. However, they are likely to be underestimated because GDP does not capture them, since their nominal price is zero. A recent study found that including the welfare gains from Facebook would add between 0.05 and 0.11 percentage points to GDP growth per year in the United States.39

“Free” services reflect the very low marginal cost of replicating and distributing data and certain digital services. But they are ultimately possible on a large scale because firms can monetize data through advertising and data sales. This gives rise to the idea that users in fact “pay with their data.” The revenue structure of some of the tech giants reflects this approach. Mobile advertising made up 84 percent of Google’s total revenue in 2019,40 while Alibaba earns more than half its global revenues from advertising.41 Both Google and Facebook have offered free internet access in lower-income countries.
Facebook’s schemes have been criticized for mining the data of low-income users while initially defying net neutrality rules and offering access to only a limited set of sites.42 Meanwhile, Google tried to monetize its free Wi-Fi service (rolled out in nine middle-income countries) by showing ads to users. It recently discontinued these services because they proved unprofitable.43

Opportunities to monetize data are more limited in low- and middle-income countries. This may limit the ability of firms operating locally to offer free services. Facebook’s average revenue per user in the United States and Canada was US$41.41 in 2019. In all countries except the United States, Canada, and European and Asia-Pacific countries, it was US$2.48, a ratio of nearly 17 to 1. Firms that operate globally can cross-subsidize their operations with global advertising. Firms focused on lower-income countries may find it difficult to replicate the free services they offer.

Adapting to new ways of doing business because of the COVID-19 pandemic. The pandemic has created urgent demand for the private sector to adopt data-driven solutions, both to cope with the crisis and to increase resilience and productivity during the recovery. Firms will also increasingly need to:
• invest in the transparency and traceability of value chains;
• increase their reliance on automation in the production process;
• make more precise predictions about demand and input supply that anticipate disruptions.
Data-driven technologies will play a critical role in helping firms adapt. Smart connected devices and robots that automate previously manual processes while collecting and analyzing data will be a key input in reengineering business processes during the recovery. AI that can predict consumption and production trends, combined with platforms that provide matching through data analysis, may also create on-demand labor forces.
Although the intensifying adoption of data-driven business models can be an opportunity, these trends also hold risks for the international competitiveness of those countries not at the technological frontier, with implications for jobs and inequality.
Impacts on the green and sustainability agenda. The increased use of data-intensive technologies contributes to global carbon emissions. And yet these technologies can also help firms better manage their environmental footprint and become sustainable, while allowing sectors such as agriculture to adapt to climate change. By improving the efficiency and traceability of supply chains and production processes, these technologies can reduce waste, enable circular solutions, promote sustainable sourcing of inputs, and empower consumers to make more environmentally responsible decisions. By making energy systems more efficient (including through automated tracking of energy use), they can facilitate the adoption of renewable energy through better management of performance. Data-driven farming can help farmers adapt to climate change while rationalizing use of harmful inputs. However, the net impact of such technologies on the environment will depend on several factors, including responsible actions by consumers and the decarbonization of the energy sector.
How use of data in the production process is transforming sectors
New business models that use data to drive value are springing up in low- and middle-income economies. The data and technology that can be most transformative depend on the types of market failures that need to be solved and the development channels that are possible. This differs across sectors.
Finance
Some 1.7 billion adults worldwide did not have a bank account as of 2017.44 At least 200 million small firms in low- and middle-income countries have unmet credit needs estimated at US$2.2 trillion.45 Several market failures are to blame. First, the high cost of traditional banking relative to the low-value transactions and balances of low-income individuals makes it less viable or attractive for traditional banking to serve this market segment. Second, information asymmetries between financial institutions and low-income borrowers make it difficult to assess credit risk, thereby limiting the supply and raising the price of credit. Finally, formal financial services lack relevant products and services for low-income users. Digitization and data analytics can help overcome these challenges to make services more accessible, affordable, and secure.
Alternative credit scoring algorithms. Financial service providers are increasingly adopting alternative credit scoring techniques that take advantage of users’ digital footprints to train ML algorithms to identify, score, and underwrite credit for individuals who otherwise lack documentation of their creditworthiness. Two early movers that have achieved scale, Lenddo (Philippines) and Cignifi (operating in Africa, Asia, and Latin America), use data that consumers volunteer about their cellphone use patterns, digital transactions, and social media and web browsing activity to build algorithms that map behavioral patterns and score the creditworthiness of borrowers.
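The alternative credit scoring described above can be illustrated with a deliberately simple sketch. It trains a plain logistic regression by batch gradient descent on three hypothetical digital-footprint features, each pre-scaled to the range 0 to 1. The feature names, training data, and learning settings are all invented assumptions. The sources cited here do not disclose the models that Lenddo, Cignifi, or others actually use, and those models are likely far richer.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical features, each pre-scaled to [0, 1] (illustrative assumptions):
FEATURES = ["airtime_topups_on_time", "mobile_money_volume", "utility_bills_on_time"]

# Invented training data: 1 = loan repaid, 0 = defaulted.
TRAIN_X = [[0.90, 0.80, 0.95], [0.80, 0.60, 0.90], [0.70, 0.70, 0.80],
           [0.95, 0.90, 1.00], [0.20, 0.10, 0.30], [0.30, 0.20, 0.20],
           [0.10, 0.30, 0.10], [0.25, 0.15, 0.40]]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # For log-loss, the gradient w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def repay_probability(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability that an applicant with features x repays."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


weights, bias = train_logistic(TRAIN_X, TRAIN_Y)
print(dict(zip(FEATURES, (round(v, 2) for v in weights))), round(bias, 2))
print(round(repay_probability(weights, bias, [0.85, 0.75, 0.90]), 3))
```

The same mechanics carry the bias risk raised in the text that follows. If an input correlates with gender, ethnicity, income, or social ties (such as the repayment behavior of a user's online friends), the fitted weights pass that correlation directly into the score. For that reason, outcomes need to be audited across groups.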
Payment and transaction histories have also enabled e-commerce firms to move into lending. Ant Financial’s MYbank app links directly to users’ Alibaba transaction data to score and extend credit.46 Amazon’s small business loan operation (which operates in China, India, and other countries) relies on a seller’s sales performance on Amazon to decide whether to extend credit. Destácame, a Chile-based alternative credit scoring start-up and the first of its kind in Latin America, uses data on utilities payments for its credit scoring.
Psychometric tests, which assess the abilities, attitudes, and personality traits of individuals, are also being used to screen borrowers. LenddoEFL provides financial institutions with psychometric tools that analyze applicants’ answers on an online quiz, including factors such as how long it takes applicants to answer and how they interact with the web interface. LenddoEFL claims to have made more than 12 million credit assessments through more than 50 client financial institutions around the world.47
Notwithstanding the opportunities these approaches offer, observers have raised concerns that using algorithms in this way can discriminate against individuals and reinforce existing racial, gender, and economic inequalities. Lenddo, for example, has been known to rate consumers as less creditworthy if they are friends on Facebook with someone who was late paying back a loan.48 Algorithmic bias is discussed later in this chapter and in chapters 4 and 6.
Payment systems. Digital payments are by definition flows of electronic data. They are central to powering e-commerce and other online transactions, while simultaneously generating data on purchasing patterns that can provide insights into a plethora of consumer characteristics. Mobile payments in particular have reduced the transaction costs of transferring resources, enabling new ways for households and firms to make payments, save, and send remittances.
The well-documented benefits of mobile money in lower-income countries include lowering transaction and transport costs; encouraging saving through the relatively safe storage of value in a digital format; empowering female users through greater privacy, thereby increasing their bargaining power within families; and allowing more effective risk sharing between households.49 In India, mobile money has improved the ability of households to share risk, providing welfare benefits of 3–4 percent of income on average.50 Research also has found a significant link between the use of the mobile financial service M-Pesa and a reduction in poverty among Kenyans, with greater impacts on female-headed households through changes in financial behavior and movement of labor from subsistence farming to secondary jobs and entrepreneurship.51
Use of transaction data for product development. Digital payments generate large amounts of data on how people make purchases and transfers, which can be especially important in economies that run largely
on cash and among demographic groups that have a small digital footprint. These data can enable firms to see which regions and market segments are expanding, understand user preferences and behavior to target services such as microcredit, and predict fraud and increase security within and between platforms. For example, Mastercard’s Tourism Insights service allows the tourism industry to make better investments by leveraging big data to provide information on travelers’ preferences.52 In South Africa, TymeBank offers customers incentives to link their debit cards to their retail loyalty programs, providing access to data on customer spending that are used for product design.
Distributed ledger technology (DLT), including blockchain. Blockchain eliminates the need for financial intermediaries, drastically reducing settlement time and making transfers almost instantaneous. The use of digital technology can embed rules into smart contracts, including automated execution of contracts. The explicit terms and payments of DLT can simplify complex negotiation and verification processes.53 DLTs’ use of smart contracts in the provision of loans and credit can also improve trust. This is especially important for new and smaller firms that lack the requisite credit histories and collateral. Despite the promise of blockchain, there are serious challenges to its widespread adoption, including unclear or unfavorable regulatory approaches and lack of user understanding. Adopting blockchain where the technology does not address the underlying issue or consumer needs is also problematic.
Agriculture
Managing production and marketing risks is a key challenge for smallholder farmers and agribusinesses. Remote sensing and geographic information systems, together with data analytics, provide insights into farming operations and propel the development of smart farming, which can help manage production and financial risks.
For example, NubeSol, an Indian agtech firm, provides sugarcane growers with a monthly yield map of their plots, with forecasts of yields and recommendations on inputs such as fertilizer based on remote sensing and data analytics. JD Digits (JDD), a technology firm in China, is adopting AI techniques and big data to provide credit to farmers who raise pigs. Farms install AI-enabled cameras that can recognize pigs’ faces, as well as IoT technology to transmit data about the farms’ physical conditions. If a pig with feeding abnormalities is identified, the algorithm can quickly extract information about its growth history and immune status to provide customized feeding care. The IoT system adjusts farm conditions such as humidity, temperature, and lighting based on real-time data on the farm. Using information about farm operations, JDD also carries out credit assessments to provide farmers with loans, which has reduced their nonperforming loan ratio to nearly zero.
Platforms are using data as well to provide a range of services and products along the value chain, including by reducing idle capacity in machinery. Hello Tractor, which emerged in Nigeria, operates a platform connecting tractor owners and farmers who lack their own equipment. Data about tractor locations and availability are monitored using an installed device and then transmitted to Hello Tractor’s mobile app platform, which farmers can use to submit a booking request. In this way, farmers are able to find the most cost-effective available tractor, and tractor owners are able to monitor the use of equipment. Another agriculture platform, DigiCow, pioneered in Kenya, keeps digital health records on cows and matches farmers with qualified veterinary services.
Integrated, data-focused solutions are emerging along the whole agriculture value chain.
Digifarm, a mobile platform offered by Safaricom in Kenya, provides farmers with one-stop access to a suite of products, including financial and credit services, quality farm products, and customized information on best farming practices. Mobile money data from M-Pesa and data on the way people behave on the app are taken into consideration to provide farmers with tailored products and services.
As agriculture supply chains become more complex, margins imposed by different intermediaries mount, which raises the prices paid by consumers and depresses the income earned by farmers. Food traceability concerns also increase.54 Data-based solutions can improve food traceability, while disrupting traditional market structures by reducing the need for intermediaries. In Haiti, blockchain solutions have allowed mango farmers to maintain ownership of their produce until the final sale to US retailers by facilitating traceability and direct payments. When these solutions are employed in conjunction with other value chain components such as third-party logistics services, intermediaries that previously held substantial market power are circumvented. Customers can scan a QR code on the final product to access information about where the mango comes from, how it was packaged and transported, and the costs involved at each step.55 Similarly, Walmart has collaborated with IBM to trace mangoes from South and Central America to the United States. Participants in this process cannot edit
information because of the decentralization feature of blockchain technology, which ensures trust and transparency.56
Personal data protection has specific complexities for farmers. Data on their farms are identifiable and could be used to reveal personal details such as their wealth and income. However, farmers could also benefit from using and pooling their data to develop commercial insights. The governance regime for agriculture thus requires special considerations (see the further discussion of governance issues in chapter 8).
Health
To deliver individual health care in lower-income countries, data-driven applications require complementary improvements in infrastructure and basic health services before they can become truly transformational. Some business models show promise in helping overcome such challenges, which include high logistical costs, counterfeiting of pharmaceuticals, difficulties in coordinating health care resources, and low supplies of specialist expertise, especially in rural areas. With strong mobile phone penetration, rising investment in digitizing health information, and developments in cloud computing, more health-focused businesses in low- and middle-income countries are likely to adopt data-intensive advances in coming years. Although such advances hold promise, the sensitive nature of health data implies an acute need for policy makers to be aware of the risks posed by the improper collection and use of these data.
Telehealth (mHealth and eHealth). Telehealth makes use of data and connected devices to deliver care remotely. In rural areas where the ratio of doctors to patients is low, telehealth is a useful way to access consultations and disease diagnosis. The model has also played an important role during the COVID-19 pandemic, when remote diagnosis has been necessary.
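The claim made in the agriculture discussion above, that participants in blockchain-based mango traceability systems cannot edit records, rests on hash chaining. The following minimal, single-machine sketch illustrates that structure. The event fields and values are hypothetical. The sketch also leaves out what makes a real distributed ledger hard to alter in practice: many independent parties holding copies, plus a consensus protocol. On its own, a hash chain makes edits detectable, not impossible.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(record: Dict[str, Any], prev_hash: str) -> str:
    """SHA-256 over the record (as canonical JSON) plus the previous entry's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: List[Dict[str, Any]], record: Dict[str, Any]) -> None:
    """Append a record, linking it to the hash of the current last entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(record, prev)})


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; an edited record no longer matches its stored hash."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(entry["record"], prev):
            return False
        prev = entry["hash"]
    return True


# Hypothetical events for one crate of mangoes (invented values).
ledger: List[Dict[str, Any]] = []
append(ledger, {"step": "harvested", "owner": "farmer", "cost_usd": 0.00})
append(ledger, {"step": "packed", "owner": "farmer", "cost_usd": 1.20})
append(ledger, {"step": "shipped", "owner": "farmer", "cost_usd": 3.50})
append(ledger, {"step": "sold", "owner": "retailer", "cost_usd": 9.00})
print(verify(ledger))  # True

ledger[1]["record"]["cost_usd"] = 0.20  # a participant quietly edits a past record
print(verify(ledger))  # False
```

Each hash covers the previous one. To hide an edit, the editor would therefore have to recompute every later entry. In a shared ledger, the other participants' copies would then no longer agree.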
Mobile apps combined with AI technology and wearable devices can provide in-the-field diagnoses and recommendations. For example, Colorimetrix, an app that allows a smartphone camera to read results from color-based tests for diabetes, kidney disease, and urinary tract infections, was designed specifically with lower-income economies in mind. Algorithms are used to compare the result with stored calibration values. Results are delivered to the smartphone, where they can be analyzed further for trends. The hope is that such apps will eventually also be able to detect HIV, malaria, and tuberculosis.57 Accuhealth Chile monitors patients in remote areas by using a range of connected medical sensors. Both quantitative data on patients’ progress and qualitative data collected through custom-created questionnaires are sent to a virtual clinical service that conducts patient triage based on algorithmic analysis. Accuhealth is also using predictive algorithms to make service delivery more cost-efficient. In Cameroon, CardioPad was locally designed to improve the access of patients living in rural areas to cardiovascular health care. The CardioPad tablet is paired with sensors that collect data on the patient’s health statistics and transmit them over a mobile network to hospitals where cardiologists can make a diagnosis.
Drug verification. Substandard or falsified medical products will be an urgent health care challenge in the next decade, according to the World Health Organization (WHO).58 An estimated one-tenth of medical products in low- and middle-income countries, particularly antimalarials and antibiotics, are substandard or falsified.59 Mobile authentication services such as mPedigree offer people in countries such as Ghana, India, Kenya, and Nigeria an easy way to check the authenticity of medicine.
Launched in Ghana in 2007, mPedigree allows pharmaceutical manufacturers to add a code to their packaging that consumers can then verify using their mobile phones. mPedigree has also begun using its consumer authentication data to monitor for anomalies in real time so that it can then generate warnings to brand owners, regulators, and consumers.60
Supplies management. Digital platforms can also help manage supplies in countries where centralized provision is deficient or lacking. LifeBank is a Nigerian platform firm that matches hospitals requesting blood with potential donors based on current demand and location maps of all institutions involved in blood distribution. Information about the donation, collection, screening, storage, and delivery procedures is recorded on a blockchain, thereby increasing confidence in blood quality. LifeBank claims to have reduced the average delivery time from about 24 hours to 45 minutes.61 During the COVID-19 crisis, the platform has also extended its services to matching medical equipment.
Education
Despite significant improvements in school enrollment over the last decades, an average student in low-income countries performs worse than 95 percent of the students in high-income countries.62 Lack of teaching resources and learning tools and the traditional one-size-fits-all approach in education have made it difficult to tailor instruction to students’
individual abilities and needs, particularly where classrooms are overcrowded.63 Recent advances in big data and AI offer opportunities to provide individualized learning experiences for students. Machine learning and data analytics techniques can help identify students’ behavioral patterns (such as mistakes made frequently in tests), usually more efficiently than teachers can. For example, by recording and analyzing students’ learning paths, Byju, a digital supplemental learning platform in India, suggests tailored learning materials such as videos, quizzes, and flashcards that match the needs of individual students. Besides analyzing individual learning behaviors, the platform also analyzes aggregate data on how all students learn on its platform. If many students are having trouble with similar types of problem sets, the system flags the need to add more explanatory videos or materials to the entire platform.
In China, Squirrel Ai Learning is another firm specializing in intelligent adaptive education. Students start with a short diagnostic test to leave a digital footprint reflecting their knowledge level so that the teaching system can provide a tailored curriculum, which is updated as the student proceeds through learning modules. Based on its comparison trials among middle school students, Squirrel Ai Learning claims that its system does a better job of improving math test scores than traditional classroom teaching.64
During the COVID-19 pandemic, digital platforms that support live video communication have been playing an indispensable role in transitioning to online learning. As of May 2020, more than 140 countries had closed schools, affecting more than 60 percent of enrolled students.65 Lark, for example, is providing educational institutions in India with free cloud storage and video conference services.
Dingtalk, a communication platform that supports video conferencing and attendance tracking, has connected more than 50 million students with teachers in China.
Transport and logistics
Data-driven firms in transport and logistics provide matching services to facilitate the use of assets by other market participants. By automating decision-making and navigation, these models increase the efficiency of service delivery and the management of supply chains. Prominent applications are for digital freight matching, digital courier logistics, and IoT-enabled cold storage.
Digital freight matching. These platforms (often dubbed “Uber for trucks”) match cargo and shippers with trucks for last-mile transport. In lower-income countries, where the supply of truck drivers is highly fragmented and often informal, sourcing cargo is a challenge, and returning with an empty load contributes to high shipping costs. In China, the empty load rate is 27 percent versus 13 percent in Germany and 10 percent in the United States.66
Digital freight matching overcomes these challenges by matching cargo to drivers and trucks that are underutilized. The model also uses data insights to optimize routing and provide truckers with integrated services and working capital. Because a significant share of logistics services in lower-income countries relies on informal suppliers, these technologies also represent an opportunity to formalize services. Examples include Blackbuck (India), Cargo X (Brazil), Full Truck Alliance (China), Kobo360 (Ghana, Kenya, Nigeria, Togo, Uganda), and Lori (Kenya, Nigeria, Rwanda, South Sudan, Tanzania, Uganda). In addition to using data for matching, Blackbuck uses various data to set reliable arrival times, drawing on global positioning system (GPS) data and predictions on the length of driver stops. Lori tracks data on costs and revenues per lane, along with data on asset utilization, to help optimize services.
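The matching step described above can be sketched as a simple greedy heuristic. Loads are taken heaviest first, and each goes to the nearest idle truck with enough capacity. The sketch then totals the distance trucks drive empty to reach their pickups, which is the kind of waste the empty load rates above measure. This is an illustrative assumption, not the method of any platform named here. The coordinates, capacities, and identifiers are hypothetical, and distances are straight-line on a flat grid. A production system would more likely solve an optimal assignment (for example, with the Hungarian algorithm) over road-network travel times.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Truck:
    truck_id: str
    location: Tuple[float, float]  # hypothetical (x, y) position in km
    capacity_t: float              # payload capacity in tonnes


@dataclass
class Load:
    load_id: str
    pickup: Tuple[float, float]    # hypothetical (x, y) pickup point in km
    weight_t: float                # cargo weight in tonnes


def greedy_match(trucks: List[Truck], loads: List[Load]) -> Dict[str, str]:
    """Assign each load to the nearest idle truck that can carry it.

    Loads are handled heaviest first so that large loads are not left
    without a big enough truck. Loads with no feasible truck are omitted.
    """
    idle = list(trucks)
    assignment: Dict[str, str] = {}
    for load in sorted(loads, key=lambda ld: ld.weight_t, reverse=True):
        candidates = [t for t in idle if t.capacity_t >= load.weight_t]
        if not candidates:
            continue  # a real platform might queue or re-price this load
        best = min(candidates, key=lambda t: math.dist(t.location, load.pickup))
        assignment[load.load_id] = best.truck_id
        idle.remove(best)
    return assignment


def empty_km(trucks: List[Truck], loads: List[Load], assignment: Dict[str, str]) -> float:
    """Total distance trucks drive empty to reach their assigned pickups."""
    by_truck = {t.truck_id: t for t in trucks}
    by_load = {ld.load_id: ld for ld in loads}
    return sum(math.dist(by_truck[tid].location, by_load[lid].pickup)
               for lid, tid in assignment.items())


trucks = [Truck("T1", (0.0, 0.0), 10.0), Truck("T2", (5.0, 5.0), 30.0),
          Truck("T3", (9.0, 1.0), 10.0)]
loads = [Load("L1", (8.0, 0.0), 8.0), Load("L2", (4.0, 6.0), 25.0),
         Load("L3", (1.0, 1.0), 5.0)]
plan = greedy_match(trucks, loads)
print(plan)                                    # {'L2': 'T2', 'L1': 'T3', 'L3': 'T1'}
print(round(empty_km(trucks, loads, plan), 2))  # 4.24
```

Greedy matching is fast and easy to explain, but it can be suboptimal: an early assignment may take a truck that a later load needed more. That trade-off is why optimal assignment solvers are the usual alternative.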
Cargo X charts routes to avoid traffic and reduce the risk of cargo robbery. Kobo360 chooses routes to avoid armed bandits based on real-time information shared by drivers. Many of the firms also allow shippers to track their cargo in real time.
Data on driver characteristics and behavior have allowed platforms to offer auxiliary services to address the challenges that truck drivers face. For example, some platforms offer financial products to help drivers pay upfront costs, such as tolls, fuel, and tires, as well as targeted insurance products. Kobo360 claims that its drivers increase their monthly earnings by 40 percent and that users save an average of about 7 percent in logistics costs.67 Lori claims that more than 40 percent of grain moving through Kenya to Uganda now moves through its platform, and that the direct costs of moving bulk grain have been reduced by 17 percent in Uganda.68
Digital courier logistics. The growth of on-demand couriers enables small merchants and the growing e-commerce industry to reach customers rapidly and reliably in expanding urban areas. Data-driven matching and route optimization overcome high search costs and traffic congestion and provide verification of safety standards through customer reviews. The prime example of this business model is Gojek, which is reportedly utilized by more than 1 million motorcycle drivers serving 500,000 micro, small, and medium enterprises (MSMEs), including more than 120,000 MSMEs since the onset of the COVID-19 pandemic.69 Established in Indonesia in 2010 as a
call center to connect consumers to courier delivery services, the company leveraged its data on consumer behavior to expand into digital courier services in 2015. Its app now offers various logistics services, including delivery of food, groceries, medicines, and pharmaceuticals. Gojek uses AI and ML for matching, forecasting (to inform drivers where to go ahead of a surge in demand), and dynamic pricing. Through its 8 billion pings with drivers per day, Gojek claims it generates 4–5 terabytes of data every day.
IoT-enabled cold storage. According to WHO, 19.4 million people across the globe lacked access to routine life-saving vaccines in 2018, partly because of the lack of efficient cold chain systems.70 IoT-enabled cold storage solutions allow the transport and storage of temperature-sensitive food and medication, with greater control and tracking by the freight owner. For vaccines, the ability to track temperature can help ensure confidence in their integrity before dispensation, even before further testing. Gricd, a Nigerian start-up founded in 2018, utilizes solar-powered, IoT-enabled mobile refrigeration boxes whose internal probes collect temperature data and transmit it to a server. Real-time information on location and temperature can be accessed by freight owners online or via a mobile app, ensuring that the cold chain is effectively monitored and maintained.
Social media as a tool for connecting to markets
High marketing and advertising costs hinder smaller businesses trying to reach new markets and customers. Meanwhile, high search costs and frictions related to contract enforcement raise prices for buyers and inhibit trade.71 Social media provide a low-cost sales platform for firms domestically and abroad and reduce search costs for consumers.
They enable products to be better targeted to consumers and can reduce marketing costs by as much as 90 percent, compared with traditional television marketing.72 Social media platforms also allow sellers to incorporate market intelligence into their product development through real-time feedback and gathering of online data.73 Given these advantages, it is not surprising that nearly half of all enterprises in the European Union had used social media for advertising purposes as of 2017.74
Reaching markets through social media could disproportionately advantage smaller firms over larger ones, as suggested by the high proportion of small entrepreneurs who use Facebook. In 2018, nearly four in 10 Facebook business users were single-person firms connecting across 42 countries (including low- and middle-income countries), although single-person firms are only one-tenth of the general population of firms.75 Businesses run by women are more likely to leverage online tools to facilitate business success than businesses run by men.76
On the buyer side, social connections can increase trade by building trust, including by reducing information asymmetries and providing a substitute for the formal mechanisms of contract enforcement.77 According to a study of 180 countries and 332 European regions, social connectedness tends to increase exports (particularly to those countries with a weak rule of law) and to lower prices, especially for goods whose prices are not transparent and that are not traded on exchanges.78
Some potential risks and adverse outcomes of data-driven businesses to be addressed by policy
Despite the potential transformative effects of data-driven firms, policy makers need to take into account several (often interrelated) risks and adverse outcomes to ensure that the use of data in firms’ productive processes safely fulfills its potential.
The relevance and immediacy of these concerns depend on the data intensity of a country’s economy. However, because of the global nature of many large data-driven firms, market dynamics in one country can often have spillover effects internationally. This concern should not discourage policy makers from fostering a data-driven economic ecosystem in their country. Rather, as the data intensity of the economy increases, they should put the appropriate safeguards and enablers in place to ensure that data-driven markets remain competitive and vibrant and that gains are shared broadly across society. These topics are covered in part II of this Report.
Potential to increase the propensity for dominant firms to emerge
Proprietary data can provide a firm with a competitive advantage over rivals. Because data are often created as a by-product of a firm’s economic activities, once a firm has invested in the fixed cost of building capacity to collect data, the marginal cost of creating additional data is low. Moreover, better targeting of a firm’s offering can attract more users, thereby leveraging network effects between platform users that can lead to a “winner-takes-most” dynamic or, at the least, a scale advantage that new entrants find difficult to overcome.79 For example, an e-commerce platform
that incorporates more consumer data creates a more customized shopping experience, with more accurate product recommendations, more preordered shopping baskets, and more consumer reviews. A platform with a greater number of consumers will also attract more suppliers through indirect network effects, raising users’ costs of switching to competing platforms. The distribution of web traffic, a proxy for concentration in the e-commerce sector, is skewed toward a few larger platforms. Among 631 business-to-consumer online marketplaces in Africa, 56 percent of web visitors went to 1 percent of sites (roughly six marketplaces) in 2019.80 Jumia alone had 24 percent of users.
Data can also ease a platform’s entry into adjacent markets. Well-known examples are M-Pesa’s move from money transfers into savings and loan products; Uber’s entry into food and freight delivery; and Google’s evolution from search to shopping, maps, and other markets. By combining multiple types of data, platforms can benefit from the broader scope of their data, which has spurred a growing number of mergers aimed at accumulating data (a prime example is Facebook’s acquisition of WhatsApp).81
The potential of a platform business to acquire market power depends on its business model, including the types of users that interact on the platform and its revenue model, which affect the type of data that gives firms a competitive advantage (figure 3.7). For example, firms that rely on advertising revenue require frequently updated consumer data to create holistic consumer profiles. Platforms that earn a fee based on the transactions conducted through them may rely more on historical data on product demand and consumer profiles for a smaller range of products. The type of data required affects, in turn, the ease with which firms can access or replicate the data they need.
Where platforms rely on volunteered or observed consumer data, firms with greater market power may be able to collect data more easily because consumers have fewer options, meaning those firms can further entrench their market positions.
The greater propensity for dominance in data-driven markets raises the risk that smaller or more traditional firms will be excluded, hindering local entrepreneurship and posing risks for consumer welfare. These effects can be exacerbated in developing markets, where entrants find it harder to raise start-up capital and hire from the limited supply of skilled programmers and data scientists. For example, of the total private market funding received by the 10 highest-funded disruptive tech firms in Africa, 77 percent went to firms owned by the three largest African internet companies (two owned by Naspers, two by Jumia, and one by Ringier One Africa Media).82 Where few large data-driven players currently operate or where a few large firms provide much-needed goods and services, the risks may be less immediately apparent. However, because of the dynamism of such markets and their tendency to tip toward concentrated structures, it is important that policy makers safeguard against dominance that forestalls entry and innovation.
Figure 3.7 Risks to market structure and market power stemming from platform firms
Source: WDR 2021 team.
a. Such data include observed and inferred data and data requiring frequent updating.
[Figure 3.7 diagram, not reproduced. Data as an input into economic activity, through economies of scale and scope associated with data, give firms a competitive advantage, especially where data are proprietary or access is restricted, the required data are less replicable (note a), and network effects are stronger. Potential sides of a platform: consumers; suppliers or service providers/“workforce” (where a product or service is provided by a third party); and advertisers (where the revenue model is advertising-based). Resulting risks: a tendency toward increasing concentration within markets and entrenchment of market power; a tendency to conglomeration and the ability to leverage market power between markets; and greater data collection by firms with market power because users have fewer alternatives. Depending on the platform business model, effects can occur in consumer markets, markets for “labor,” and advertising markets.]
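The web traffic statistic cited above for African online marketplaces (56 percent of visitors going to 1 percent of 631 sites) is a top-share concentration measure. The sketch below shows how such a figure could be computed from per-site visit counts. It also computes the Herfindahl-Hirschman Index (HHI), a summary concentration measure commonly used by competition authorities. The visit counts in the example are invented, and the source does not say which method the underlying study used.

```python
from typing import Sequence


def top_share(visits: Sequence[float], top_fraction: float = 0.01) -> float:
    """Share of total visits captured by the largest `top_fraction` of sites.

    At least one site is always counted. With 631 sites and top_fraction=0.01,
    the 6 largest sites are used (631 * 0.01 rounds to 6).
    """
    if not visits or sum(visits) <= 0:
        raise ValueError("need at least one site with positive visits")
    ranked = sorted(visits, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


def hhi(visits: Sequence[float]) -> float:
    """Herfindahl-Hirschman Index on percentage shares (ranges up to 10,000)."""
    total = sum(visits)
    return sum((100.0 * v / total) ** 2 for v in visits)


# Invented example: one dominant site, two mid-sized ones, and a long tail of 30.
example = [500.0, 120.0, 80.0] + [10.0] * 30
print(round(top_share(example, 0.10), 3))  # 0.7
print(round(hhi(example)))                 # 2738
```

The top-share figure is easy to communicate, but it ignores how visits are spread among the remaining sites. The HHI uses every site's share, which is why the two measures are often reported together.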
On DLT platforms, data access is not controlled by one institution, thus reducing the extent to which the benefits of network effects can entrench market power. However, private blockchains still require a central authority or institution that decides who can participate in the system, thereby placing power in the hands of the institution that acts as the gatekeeper. By contrast, public blockchains such as Bitcoin do not require a central authority but rather “proof-of-work” (a system to deter frivolous or malicious uses of computing power) to participate. This system, in turn, generates very high energy costs because it requires large amounts of computing power and other resources.83
Phenomena linked to the issue of dominance and market structure are described in the sections that follow.
Tension between cooperation and competition in data-driven ecosystems. Complementary products are built around larger platforms. These innovations can become central to the business models of the larger platforms. For example, Amazon, Uber, and Airbnb would not be able to operate without the payment systems embedded in their services. However, this network structure also means that firms are dependent on accessing the systems and data of other firms that could become rivals in the future.
Typically, firms access the systems and data of other firms through application programming interfaces (APIs) provided by the core platform in the ecosystem. APIs link platforms to other platforms and to developers of digital services. Through APIs, a platform or digital service provider will typically either draw data from or provide data to other firms to support its own functioning or support the functioning of other players. In markets where data are a key input, the owners of valuable data act as gatekeepers to the growth of smaller entrepreneurs.
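The proof-of-work mechanism mentioned above for public blockchains can be sketched in a few lines. A participant must find a number (a nonce) that makes the hash of the block data start with a required number of zeros. This is a simplified illustration. Bitcoin's actual rule applies double SHA-256 to a structured block header and compares the result with a numeric target, but the cost structure is the same. Each additional required hex zero multiplies the expected number of hash attempts by 16. That is why the energy cost of securing such a blockchain grows with its difficulty. The block contents below are hypothetical.

```python
import hashlib
from typing import Tuple


def pow_hash(block_data: str, nonce: int) -> str:
    """SHA-256 hex digest of the block data combined with a candidate nonce."""
    return hashlib.sha256(f"{block_data}|{nonce}".encode("utf-8")).hexdigest()


def mine(block_data: str, difficulty: int) -> Tuple[int, str]:
    """Brute-force the first nonce whose hash has `difficulty` leading hex zeros.

    Expected attempts are about 16 ** difficulty, so the work (and energy)
    required grows exponentially with difficulty.
    """
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = pow_hash(block_data, nonce)
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def is_valid(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a claimed solution takes one hash, however long mining took."""
    return pow_hash(block_data, nonce).startswith("0" * difficulty)


for d in range(1, 5):
    nonce, digest = mine("hypothetical block: A pays B 5 units", d)
    print(f"difficulty {d}: {nonce + 1} attempts, hash {digest[:12]}...")
```

The asymmetry is the point of the design. Producing a valid block is expensive, while checking one is nearly free, so no central authority is needed to vet participants.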
The emergence of potential competition from complementors may provoke the lead platform to restrict its API. For example, when Twitter perceived a competitive threat from LinkedIn, Twitter restricted the use of its API in 2012 to prevent users’ Tweets from appearing on LinkedIn’s platform. In Kenya, developers have complained about M-Pesa’s refusal to share its API.84 In the United States, developers testifying before Congress in 2020 accused Apple of mimicking their products and of citing privacy concerns to restrict how third-party developers collect location data.85 Chapter 7 covers competition issues.
Data-driven mergers and acquisitions. In recent years, waves of acquisitions by large local players have occurred in e-commerce in China and India and in transportation in Southeast Asia, where Uber exited eight Southeast Asian markets after selling its businesses to Grab, the region’s leading platform.86 Of the mergers involving digital platforms that have undergone review by antitrust authorities around the world, 82 percent involved an acquisition by a very large firm.87 Mergers of two very large firms were the most common type of transaction. Chapter 7 covers issues related to mergers in more detail.
Suboptimal exchange of data. Although the broad use, reuse, and repurposing of data by firms can generate substantial gains, market mechanisms may produce data exchange and reuse below the level that yields the greatest social welfare.88 Regulators must weigh several considerations when balancing the costs and benefits of mandating data sharing to address these concerns. First, they should protect individuals’ rights related to personal data. Second, they should recognize that mandated data sharing would dampen firms’ incentives to invest in data collection if firms must share data with competitors or potential competitors.
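The gap between privately and socially optimal data sharing can be illustrated with a stylized calculation. In the minimal Python sketch below, each hypothetical sharing opportunity has a private net gain to the data holder (negative when, for instance, sharing helps a rival) and an external effect on everyone else (positive spillovers from reusing nonrival data, or negative ones such as privacy harms, the case raised in note 88). A profit-maximizing firm shares only when its own gain is positive, whereas a welfare-maximizing planner shares when the total is positive. The deal names and all numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SharingDeal:
    """A hypothetical opportunity for a firm to share a dataset."""
    name: str
    private_gain: float   # net benefit to the data holder (can be negative)
    external_gain: float  # net effect on other firms and users (can be negative)


def firm_shares(deal: SharingDeal) -> bool:
    # The firm ignores effects it does not capture itself.
    return deal.private_gain > 0


def planner_shares(deal: SharingDeal) -> bool:
    # A welfare-maximizing planner counts private and external effects.
    return deal.private_gain + deal.external_gain > 0


def welfare(deals: list[SharingDeal], rule: Callable[[SharingDeal], bool]) -> float:
    """Total welfare from the deals that a given decision rule approves."""
    return sum(d.private_gain + d.external_gain for d in deals if rule(d))


deals = [
    SharingDeal("share with a rival", private_gain=-2.0, external_gain=5.0),
    SharingDeal("sell to an ad network", private_gain=3.0, external_gain=-4.0),
    SharingDeal("license to a supplier", private_gain=1.0, external_gain=1.0),
]

for d in deals:
    print(f"{d.name}: firm={firm_shares(d)} planner={planner_shares(d)}")
print("welfare under firm choices:", welfare(deals, firm_shares))
print("welfare under planner choices:", welfare(deals, planner_shares))
```

With these numbers the firm declines the first deal (too little sharing) and accepts the second (too much sharing), so total welfare is 1.0 under the firm's choices versus 5.0 under the planner's. Which distortion dominates in practice depends on the sizes of the external effects, which this toy simply assumes.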
And, third, they should take into account that data sharing could jeopardize the provision of free or subsidized services if a firm relies on monetization of its data to cross-subsidize these services. Optimal data sharing between firms could be lower in countries where data are less important to the economy.89 The right balance may differ in high-income and lower-income economies, although policy makers in some lower-income countries may wish to adopt a forward-looking viewpoint in this area to set the stage for future advances in their data economies. These issues are further discussed in chapters 6 and 7.
Linked to data sharing is the debate over property rights or access rights to data generated as a by-product of economic activity (for example, individuals versus firms in the case of personal data, or owners of devices or applications versus the parties using them). This issue is taken up in chapter 6.
Potential for exploitation of individuals
Excessive data collection. Data collected by firms tracking users across third-party websites, applications, and devices can raise concerns. This practice is dominated by a small number of large firms; for example, a major part of Google’s data collection occurs when a user is not directly engaged with any of its products.90 Recent literature suggests that the vast amounts of data collected in this manner may be deemed excessive under existing European competition laws, where the focus is on the anticompetitive harm that may occur whether or not data protection rules are infringed.91 More generally, both data
protection authorities and competition authorities have noted that firms often understate and obscure their actual data practices, preventing consumers from making informed choices.92 Excessive collection of data on children and other vulnerable groups (such as in the education, health, and financial sectors) is of particular concern, especially in countries and locales where individuals lack adequate knowledge to protect against these risks. These issues are further discussed in chapters 6 and 7.
Insufficient governance of data held by private firms. Firms choose how much to invest in cybersecurity and data protection, but that investment may fall short of the level that yields the greatest social welfare because firms do not fully internalize the value of privacy and security to the individual user or the need to engender trust in the data economy. Infrastructure service providers may tend to underinvest in cybersecurity because the economic consequences of any breach in data security are largely borne by the clients whose data are compromised. A 2018 report by the US Council of Economic Advisers estimated that malicious cyberactivity cost the US economy between US$57 billion and US$109 billion in 2016, representing between 0.31 and 0.58 percent of that year’s GDP.93 Governments may need to provide incentives or regulations to offset the tendency of firms to underinvest in cybersecurity, including imposing adequate penalties for data breaches.94
Pricing, discrimination, and algorithmic risks. Because firms have so much information on customers’ personal characteristics and purchase histories, they can adjust their offerings to charge higher or lower prices based on an individual’s price sensitivity. This practice can allow low-income segments of society to be served that otherwise would not be. It can also help firms clear their stock, thereby reducing wastage.
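The trade-off described above can be illustrated with the textbook comparison between a single uniform price and perfectly personalized (first-degree) pricing. The minimal Python sketch below assumes a seller with a constant unit cost and five hypothetical buyers whose willingness to pay the firm observes exactly; real data-driven pricing is far noisier. Under uniform pricing the seller picks the profit-maximizing single price, so low-valuation buyers are priced out. Under personalized pricing every buyer who values the good at or above cost is served, but the seller captures all of the surplus.

```python
def uniform_outcome(valuations: list[float], cost: float) -> dict:
    """Profit-maximizing single price, searched over the buyers' valuations.

    Assumes a nonempty list of valuations.
    """
    best_price, best_profit = None, float("-inf")
    for price in sorted(set(valuations)):
        buyers = [v for v in valuations if v >= price]
        profit = (price - cost) * len(buyers)
        if profit > best_profit:
            best_price, best_profit = price, profit
    served = [v for v in valuations if v >= best_price]
    return {
        "price": best_price,
        "served": len(served),
        "producer_surplus": best_profit,
        "consumer_surplus": sum(v - best_price for v in served),
    }


def personalized_outcome(valuations: list[float], cost: float) -> dict:
    """Each buyer is charged exactly their willingness to pay (if at or above cost)."""
    served = [v for v in valuations if v >= cost]
    return {
        "served": len(served),
        "producer_surplus": sum(v - cost for v in served),
        "consumer_surplus": 0.0,
    }


valuations = [10, 8, 6, 4, 2]  # hypothetical willingness to pay
unit_cost = 3                  # hypothetical constant unit cost
print("uniform:", uniform_outcome(valuations, unit_cost))
print("personalized:", personalized_outcome(valuations, unit_cost))
```

With these numbers the uniform price is 8 and serves two buyers (total surplus 12, of which consumers keep 2), while personalized pricing serves four buyers and raises total surplus to 16, all of it captured by the seller. More people are served, but surplus shifts from consumers to producers.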
However, such price and quality discrimination can also harm some consumers. Price discrimination is not inherently bad: it transfers surplus from consumers to producers, and governments can use taxes and transfers to redistribute it. But data and data-driven business models do make it easier to discriminate by price. Algorithms can also facilitate anticompetitive behavior by firms, ultimately harming individuals through higher prices. Algorithms can learn to collude without explicit coordination, for example by tacitly following the behavior of a price leader. Separately, algorithms may be unintentionally biased because of bias inherent in their training data. This bias can magnify marginalization because the most vulnerable populations are often those that are least represented in digital data. Although algorithms hold the promise of impartiality, this promise is not always realized because they ultimately reproduce the biases in human judgment and behavior (such as logic bias and flawed assumptions) that are reflected in their training datasets. And because algorithmic decision-making is opaque, the potential biases and anticompetitive effects may be difficult to detect. In algorithmic decision-making, including credit scoring, these risks could lead to discrimination along socioeconomic lines that entrenches existing inequalities.95 Groups with more limited access to mobile phones, the internet, and bank accounts, such as women, may become less visible in data and decision-making if algorithmic bias is perpetuated through the use of biased datasets. If decisions are based on data about those with whom a person interacts, such as friends and neighbors, this practice, too, may amplify discriminatory effects. For example, a poor credit score for an individual may contribute to lower scores for those in their neighborhood or social network.
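The network spillover in the last example can be sketched as follows. The minimal Python code assumes a purely hypothetical scoring rule in which a person's final score blends their own score with the average score of their contacts, using an illustrative weight `peer_weight`; it does not describe any real lender's model. Lowering one person's own score then lowers the final scores of everyone connected to them, even though their own behavior is unchanged.

```python
def network_scores(
    own_scores: dict[str, float],
    contacts: dict[str, list[str]],
    peer_weight: float = 0.3,
) -> dict[str, float]:
    """Blend each person's own score with the mean own score of their contacts.

    Hypothetical rule: final = (1 - w) * own + w * mean(contacts' own scores).
    People with no recorded contacts keep their own score.
    """
    final = {}
    for person, own in own_scores.items():
        peers = contacts.get(person, [])
        if peers:
            peer_mean = sum(own_scores[p] for p in peers) / len(peers)
            final[person] = (1 - peer_weight) * own + peer_weight * peer_mean
        else:
            final[person] = own
    return final


contacts = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": []}
before = {"A": 700, "B": 700, "C": 700, "D": 700}
after = dict(before, A=500)  # only A's own record worsens

scores_before = network_scores(before, contacts)
scores_after = network_scores(after, contacts)
for person in before:
    print(f"{person}: {scores_before[person]:.0f} -> {scores_after[person]:.0f}")
```

In this toy network, B and C fall from 700 to 640 solely because their contact A's own score fell to 500 (A's blended score becomes 560), while D, with no recorded contacts, is unaffected.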
Furthermore, alternative scoring tools may be used to identify vulnerable individuals susceptible to predatory loans and other product offerings.96
These concerns suggest the need to establish a system of oversight, inspection, and auditing of firms’ algorithms. However, adequate standardized legal and regulatory frameworks for dealing with risks from AI are lacking, as is the regulatory capacity to determine harm and the appropriate safeguards.
Indirect management of the workforce through algorithms. Remotely collecting data on workers and service providers to drive automated or semiautomated decision-making on parameters such as task allocations, performance evaluations, and incentives for certain types of behavior has become particularly prevalent in the gig economy.97 In addition to the risks of bias and discrimination that algorithmic management raises, the practice makes it easier for firms to avoid classifying individuals as employees and thus avoid providing workers with benefits. Better understanding of the organizational and welfare impacts of algorithmic management and data collection on workers would help identify appropriate protections.
Potential to increase inequality within and among countries
Adoption of data-driven business models could widen gaps within countries, between countries, and between different types of firms, different types of workers, and individuals in different income groups. Within a country, the impact of the data-driven economy on individuals (as consumers, entrepreneurs, or job seekers) will depend on their access
to finance, education levels, skills, and technology. Although selling through platforms can close the productivity gaps between large and small firms for those small firms that use this sales channel, smaller firms and entrepreneurs around the world still lag behind their larger counterparts in adopting basic technologies such as fast broadband, having an internet presence, selling online, and utilizing cloud computing. Although the gig economy provides opportunities for job creation, only those who have the assets and skills to participate (such as cars, mobile devices, and literacy) will be able to benefit. And even though automated decision-making may mean more efficient and cost-effective service delivery for some individuals, it may lead to greater bias and discrimination against others.
Likewise, the degree to which a country can benefit from the data-driven economy depends on its underlying infrastructure, capabilities, and scale. The amount of data that can be derived locally depends on a country’s level of digital economic activity. Firms with access to larger datasets, whether because they come from larger, more connected economies or because they already operate across countries, will have an advantage that only grows with time. Firms from low- and middle-income countries are more likely to lack both access to finance to cover the initial costs of collecting and managing their data and the analytical capabilities to derive value from them. Because these firms also face fewer (or more uncertain) opportunities for monetizing data, either now or in the future (such as through advertising or development of new products), they have less incentive to invest in collecting and analyzing data, which can worsen inequality between countries on a macro level.
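The claim that a data advantage grows with time reflects a feedback loop: more users generate more data, more data improve the product, and a better product attracts more users. The minimal Python simulation below is a deliberately stylized illustration of that loop, loosely in the spirit of the data-accumulation models cited in note 79 but not a reproduction of them. It assumes that each period's new users (and the data they generate) split between two firms in proportion to accumulated data raised to a hypothetical parameter `returns_to_data`; the starting stocks and user numbers are also invented. As long as the leader starts ahead, it wins more than half of each period's new users, so the absolute data gap widens in every case; whether the leader's share of all data also rises depends on `returns_to_data` being above 1.

```python
def simulate(
    data_a: float,
    data_b: float,
    new_users_per_period: float,
    returns_to_data: float,
    periods: int,
) -> list[tuple[float, float]]:
    """Return (firm A's share of all data, A-minus-B data gap) after each period.

    Stylized assumption: new users split in proportion to
    accumulated_data ** returns_to_data; each new user adds one unit of data.
    """
    history = []
    for _ in range(periods):
        weight_a = data_a ** returns_to_data
        weight_b = data_b ** returns_to_data
        share_of_new_users = weight_a / (weight_a + weight_b)
        data_a += new_users_per_period * share_of_new_users
        data_b += new_users_per_period * (1 - share_of_new_users)
        history.append((data_a / (data_a + data_b), data_a - data_b))
    return history


if __name__ == "__main__":
    # Firm A starts with 600 units of data, firm B with 400 (share 0.6, gap 200).
    for gamma in (0.8, 1.0, 1.5):
        share, gap = simulate(600, 400, 100, gamma, periods=20)[-1]
        print(f"returns_to_data={gamma}: A's data share {share:.3f}, gap {gap:.0f}")
```

With `returns_to_data=1.0` the leader's share stays at 0.6 while the gap still widens; above 1 the share climbs too, and below 1 the follower catches up in relative terms even as it falls further behind in absolute terms.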
Discouraging international data-driven firms from operating or locating in lower-income countries (such as through restrictive data policies) is not a viable solution because it deprives the local economy of the pro-growth and development benefits that data-driven firms can provide. Moreover, it prevents the development of a local ecosystem of data-driven entrepreneurs built around these larger firms, which could slow the advancement of the infrastructure and capabilities needed for lower-income countries to bridge the gap in the longer term. Instead, governments can seek to harness the positive welfare effects of the data-driven economy while mitigating the risks to inequality through a combination of digital inclusion policies, public investments, and robust legal and regulatory tools. These are the topics of part II of this Report.
Notes
1. EIU (2012); Manyika et al. (2011).
2. Fernando (2021).
3. Manyika et al. (2011).
4. Brynjolfsson, Hitt, and Kim (2012); Brynjolfsson and McElheran (2016b).
5. von Mörner (2017).
6. Magalhaes and Roseira (2017); Stott (2014).
7. Evans and Schmalensee (2016); Gawer (2014).
8. This figure is according to the database constructed by the i2i Facility. The database covers eight Sub-Saharan African countries: Ghana, Kenya, Nigeria, Rwanda, South Africa, Tanzania, Uganda, and Zambia. See Africa’s Digital Platforms Database, Insight2Impact (i2i), http://access.i2ifacility.org/Digital_platforms/.
9. Evans (2016); Evans and Gawer (2016).
10. The countries in the sample are Armenia, Bangladesh, Brazil, Colombia, the Arab Republic of Egypt, Indonesia, Kenya, Malaysia, Morocco, Nigeria, Peru, the Philippines, the Russian Federation, South Africa, Sri Lanka, Tunisia, and Ukraine. These countries were selected based on data availability and to provide a combination of countries of different sizes and levels of economic development across regions.
11. See Nyman and Stinshoff (forthcoming), who base their data on information provided by Crunchbase, Crunchbase (database), https://www.crunchbase.com/; Dow Jones and Company, Factiva (database), https://professional.dowjones.com/factiva/; Thomson Reuters Foundation, “Inclusive Economies,” http://www.trust.org/inclusive-economies/. E-commerce includes both business-to-business (B2B) and business-to-consumer (B2C) business models (excluding agriculture wholesale). Transport includes passenger transport and freight transport/logistics, with the latter accounting for about 67 percent of firms in this category on average. Agriculture includes platforms where the main business focus is agriculture, including both marketplace and financial services. Tourism includes booking platforms and accommodation sharing.
12. Data on the founding year were available for 75 percent of firms in the database.
13. A firm is considered closed if the website cannot be found or accessed via internet research, or if the domain is for sale. A firm is confirmed closed if an article noting the firm’s exit was found.
14. Sites are ordered by their Alexa traffic rank, calculated using a combination of average daily visitors and pageviews over the past month. The site with the highest combination of visitors and pageviews is ranked 1. See Alexa Internet, “The Top 500 Sites on the Web, by Country” (accessed April 2020), https://www.alexa.com/topsites/countries.
15. Analysis of data from World Bank, Digital Business Indicators (database), https://www.worldbank.org/en/research/brief/digital-business-indicators.
16. Total session time of over 85 billion hours was recorded from May to June 2018, according to data from 2020 on data software development kits, mobile app downloads, and revenue and usage data from Apptopia, Data (database), https://apptopia.com/.
17. Kumar and Muhota (2012). See UN Global Working Group on Big Data for Official Statistics for an overview of mobile phone data types and their potential use (UNGWG 2019).
18. Bertrand and Kamenica (2018).
19. Berg et al. (2018).
20. Platforms, IoT devices, and blockchain reduce asymmetric information about the quality and trustworthiness of smaller suppliers. For platforms, user reviews partly serve this purpose. For example, better-rated sellers on eBay have higher prices and higher revenues (see Houser and Wooders 2005; Livingston 2002; Lucking-Reiley et al. 2007; Melnik and Alm 2002), and sellers with low ratings exit from eBay’s platform (Cabral and Hortaçsu 2010).
21. Lendle et al. (2012).
22. Lendle et al. (2012).
23. Luo and Niu (2019).
24. De and Raychaudhuri (2008).
25. Balchin et al. (2016).
26. Data are as of 2018 and come from World Trade Organization, Trade Profiles (database), https://www.wto.org/english/res_e/statis_e/trade_profiles_list_e.htm.
27. Balchin et al. (2016).
28. Borchert and Mattoo (2019).
29. The 10 countries are Belgium, France, Germany, Hungary, Italy, Poland, Spain, Sweden, the United Kingdom, and the United States.
30. Bailin et al. (2019).
31. Falk and Hagsten (2015).
32. Calculations of the European Investment Bank (EIB) and World Bank based on EIB (2019).
33. Aghion, Jones, and Jones (2017); Cockburn, Henderson, and Stern (2019).
34. Brynjolfsson and McElheran (2016a).
35. MGI (2018).
36. IFC (2009).
37. Eisenmeier (2018).
38. Bughin and van Zeebroeck (2017). The authors estimate that tapping latent demand could increase industry size by 0.5 percent a year.
39. Brynjolfsson et al. (2019).
40. Alphabet Inc. (2019); Clement (2020).
41. Alibaba Group (2019); McNair (2018).
42. West and Biddle (2017).
43. Singh (2020).
44. Demirgüç-Kunt et al. (2018).
45. Manyika et al. (2016).
46. Ant Financial is an affiliate company of the Alibaba Group.
47. LenddoEFL (2020).
48. Lobosco (2013).
49. Aron and Muellbauer (2019).
50. Patnam and Yao (2020).
51. Suri and Jack (2016).
52. Mastercard (2017). In May 2020, Mastercard and the Caribbean Hotel and Tourism Association (CHTA) launched a Tourism Insights platform that looks at travel trends from search patterns to in-market spending for the Dominican Republic, Jamaica, and Puerto Rico (Jamaica Observer 2020).
53. Baruri (2016).
54. Creydt and Fischer (2019).
55. Open Access Government (2019).
56. Kamath (2018).
57. Levy (2014).
58. The World Health Organization (WHO 2018) defines as substandard “authorized medical products that fail to meet either their quality standards or specifications, or both” and as falsified “medical products that deliberately/fraudulently misrepresent their identity, composition or source.”
59. WHO (2018).
60. Taylor (2016).
61. Google (2021).
62. World Bank (2017).
63. Rouhiainen (2019).
64. Hao (2019).
65. UNESCO (2020).
66. Future Hub (2020).
67. Gerretsen (2020).
68. Okello (2018).
69. Estimates are provided by Gojek. See Universitas Indonesia (2020).
70. WHO (2019).
71. See, for example, Aker (2010); Allen (2014); Eaton and Kortum (2002); Jensen (2007); Simonovska and Waugh (2014); Startz (2017).
72. See LYFE Marketing, “Traditional Media vs. Social Media Advertising: Cost Comparison,” https://www.lyfemarketing.com/traditional-media-versus-social-media/.
73. Rumo Arongo Ndiege (2019).
74. EU Open Data Portal, Eurostat, “Social Media Use by Type, Internet Advertising” (dataset), https://data.europa.eu/euodp/en/data/dataset/MTxwCIIEx8RhOhZMmgWvg.
75. Facebook, OECD, and World Bank (2017).
76. Facebook, OECD, and World Bank (2017).
77. Bailey et al. (2020).
78. Bailey et al. (2018).
79. A model developed by Farboodi et al. (2019) shows that data accumulation increases the skewness of the firm size distribution as large firms generate more data and invest more in active experimentation. Although there has been less research on DLT applications, Benos, Garratt, and Gurrola-Perez (2019) suggested that similar dynamics would also push these markets toward concentration.
80. See ecomConnect, International Trade Centre, African Marketplace Explorer (dashboard), https://ecomconnect.org/page/african-marketplace-explorer.
81. Argentesi et al. (2019).
82. CB Insights (2020). Private market funding refers to the total amount of money a firm has received, including from financial institutions and venture funding. Money raised in public markets is excluded.
83. IRGC (2017).
84. Riley and Kulathunga (2017).
85. Romm (2020).
86. Evans (2016).
87. Very large firms have more than 10,000 employees and more than US$1 billion in revenues. See Nyman and Barajas (forthcoming).
88. In the presence of privacy concerns and negative externalities in data sharing, the market may instead generate too much data sharing. See Acemoglu et al. (2019).
89. Jones and Tonetti (2019).
90. Schmidt (2018).
91. Robertson (2020).
92. Kemp (2019).
93. CEA (2018).
94. Gordon et al. (2015); Kashyap and Wetherilt (2019).
95. McGregor, Murray, and Ng (2019).
96. Hurley and Adebayo (2017).
97. Mateescu and Nguyen (2019).
References
Acemoglu, Daron, Ali Makhdoumi, Azarakhsh Malekian, and Asuman Ozdaglar. 2019. “Too Much Data: Prices and Inefficiencies in Data Markets.” NBER Working Paper 26296, National Bureau of Economic Research, Cambridge, MA. https://doi.org/10.3386/w26296.
Aghion, Philippe, Benjamin F. Jones, and Charles I. Jones. 2017. “Artificial Intelligence and Economic Growth.” NBER Working Paper 23928, National Bureau of Economic Research, Cambridge, MA.
Aker, Jenny C. 2010. “Information from Markets Near and Far: Mobile Phones and Agricultural Markets in Niger.” American Economic Journal: Applied Economics 2 (3): 46–59. https://doi.org/10.1257/app.2.3.46.
Alibaba Group. 2019. “Alibaba Group Announces March Quarter and Full Fiscal Year 2019 Results.” Press release, May 15, 2019, Hangzhou, China. https://www.alibabagroup.com/en/news/press_pdf/p190515.pdf.
Allen, Treb. 2014. “Information Frictions in Trade.” Econometrica 82 (6): 2041–83.
Alphabet Inc. 2019. “Form 10-K: Annual Report Pursuant to Section 13 or 15(D) of the Securities Exchange Act of 1934.” US Securities and Exchange Commission, Washington, DC. https://abc.xyz/investor/static/pdf/20200204_alphabet_10K.pdf?cache=cdd6dbf.
Argentesi, Elena, Paolo Buccirossi, Emilio Calvano, Tomaso Duso, Alessia Marrazzo, and Salvatore Nava. 2019. “Ex-post Assessment of Merger Control Decisions in Digital Markets.” With contributions by Elena Salomone and Anna Violini, Lear, Rome. https://www.learlab.com/wp-content/uploads/2019/06/CMA_past_digital_mergers_GOV.UK_version-1.pdf.
Aron, Janine, and John Muellbauer. 2019. “The Economics of Mobile Money: Harnessing the Transformative Power of Technology to Benefit the Global Poor.” VoxEU.org, May 7, 2019. https://voxeu.org/article/economics-mobile-money.
Bailey, Michael, Rachel Cao, Theresa Kuchler, Johannes Stroebel, and Arlene Wong. 2018. “Social Connectedness: Measurement, Determinants, and Effects.” Journal of Economic Perspectives 32 (3): 259–80.
Bailey, Michael, Abhinav Gupta, Sebastian Hillenbrand, Theresa Kuchler, Robert J. Richmond, and Johannes Stroebel. 2020. “International Trade and Social Connectedness.” NBER Working Paper 26960, National Bureau of Economic Research, Cambridge, MA. https://doi.org/10.3386/w26960.
Bailin, Alberto, Peter Gal, Valentine Millot, and Stéphane Sorbe. 2019. “Like It or Not? The Impact of Online Platforms on the Productivity of Incumbent Service Providers.” OECD Economics Department Working Paper 1548, Organisation for Economic Co-operation and Development, Paris. https://doi.org/10.1787/080a17ce-en.
Balchin, Neil, Bernard Hoekman, Hope Martin, Maximiliano Mendez-Parra, Phyllis Papadavid, David Primack, and Dirk Willem te Velde. 2016. “Trade in Services and Economic Transformation.” SET Report, Supporting Economic Transformation, Overseas Development Institute, London. https://set.odi.org/wp-content/uploads/2016/11/SET-Trade-in-Services-and-Economic-Transformation_Final-Nov2016.pdf.
Baruri, Pani. 2016. “Blockchain Powered Financial Inclusion.” PowerPoint presentation, Cognizant, Teaneck, NJ. http://pubdocs.worldbank.org/en/710961476811913780/Session-5C-Pani-Baruri-Blockchain-Financial-Inclusion-Pani.pdf.
Benos, Evangelos, Rodney Garratt, and Pedro Gurrola-Perez. 2019. “The Economics of Distributed Ledger Technology for Securities Settlement.” Ledger 4 (November): 121–56. https://doi.org/10.5195/ledger.2019.144.
Berg, Tobias, Valentin Burg, Ana Gombović, and Manju Puri. 2018. “On the Rise of the FinTechs: Credit Scoring Using Digital Footprints.” FDIC CFR Working Paper 2018-04, Center for Financial Research, Federal Deposit Insurance Corporation, Arlington, VA. https://www.fdic.gov/analysis/cfr/2018/wp2018/cfr-wp2018-04.pdf.
Bertrand, Marianne, and Emir Kamenica. 2018. “Coming Apart? Cultural Distances in the United States over Time.” NBER Working Paper 24771, National Bureau of Economic Research, Cambridge, MA. https://doi.org/10.3386/w24771.
Borchert, Ingo, and Aaditya Mattoo. 2009. “The Crisis-Resilience of Services Trade.” Service Industries Journal 30 (13): 2115–36.
Brynjolfsson, Erik, Avinash Collis, Walter Erwin Diewert, Kevin J. Fox, and Felix Eggers. 2019. “GDP-B: Accounting for the Value of New and Free Goods in the Digital Economy.” NBER Working Paper 25695, National Bureau of Economic Research, Cambridge, MA.
Brynjolfsson, Erik, Lorin M. Hitt, and Heekyung Hellen Kim. 2012. “Strength in Numbers: How Does Data-Driven Decision-Making Affect Firm Performance?” In International Conference on Information Systems 2011 (ICIS 2011), vol. 1, edited by Association for Information Systems, 541–58. Red Hook, NY: Curran Associates.
Brynjolfsson, Erik, and Kristina McElheran. 2016a. “Data in Action: Data-Driven Decision Making in U.S. Manufacturing.” CES Working Paper 16-06, Center for Economic Studies, US Census Bureau, Washington, DC. https://www2.census.gov/ces/wp/2016/CES-WP-16-06.pdf.
Brynjolfsson, Erik, and Kristina McElheran. 2016b. “The Rapid Adoption of Data-Driven Decision-Making.” American Economic Review 106 (5): 133–39. https://doi.org/10.1257/aer.p20161016.
Bughin, Jacques, and Nicolas van Zeebroeck. 2017. “The Best Response to Digital Disruption.” MIT Sloan Management Review 58 (4): 80–86. https://sloanreview.mit.edu/article/the-right-response-to-digital-disruption/.
Cabral, Luís, and Ali Hortaçsu. 2010. “The Dynamics of Seller Reputation: Evidence from eBay.” Journal of Industrial Economics 58 (1): 54–78.
CB Insights. 2020. “The Fintech 250: The Top Fintech Companies of 2020.” Research Report, CB Insights, New York. https://www.cbinsights.com/research/report/fintech-250-startups-most-promising/.
CEA (Council of Economic Advisers). 2018. “The Cost of Malicious Cyber Activity to the U.S. Economy.” CEA, White House, Washington, DC. https://www.whitehouse.gov/wp-content/uploads/2018/02/The-Cost-of-Malicious-Cyber-Activity-to-the-U.S.-Economy.pdf.
Clement, Jessica. 2020. “Google: Annual Advertising Revenue 2001–2019.” Statista, New York. https://www.statista.com/statistics/266249/advertising-revenue-of-google/.
Cockburn, Iain M., Rebecca Henderson, and Scott Stern. 2019. “The Impact of Artificial Intelligence on Innovation: An Exploratory Analysis.” In The Economics of Artificial Intelligence: An Agenda, edited by Ajay Agrawal, Joshua Gans, and Avi Goldfarb, 115–46. Cambridge, MA: National Bureau of Economic Research; Chicago: University of Chicago Press. https://www.nber.org/books-and-chapters/economics-artificial-intelligence-agenda/impact-artificial-intelligence-innovation-exploratory-analysis.
Creydt, Marina, and Markus Fischer. 2019. “Blockchain and More: Algorithm Driven Food Traceability.” Food Control 105 (November): 45–51. https://doi.org/10.1016/j.foodcont.2019.05.019.
De, Prabir, and Ajitava Raychaudhuri. 2008. “Is India’s Services Trade Pro-Poor? A Simultaneous Approach.” Markhub Working Paper 16, Macao Regional Knowledge Hub, United Nations Economic and Social Commission for Asia and the Pacific, Macao SAR, China.
Demirgüç-Kunt, Asli, Leora Klapper, Dorothe Singer, Saniya Ansar, and Jake Hess. 2018. The Global Findex Database 2017: Measuring Financial Inclusion and the Fintech Revolution. Washington, DC: World Bank. https://doi.org/10.1596/978-1-4648-1259-0.
Eaton, Jonathan, and Samuel Kortum. 2002. “Technology, Geography, and Trade.” Econometrica 70 (5): 1741–79. https://doi.org/10.1111/1468-0262.00352.
EIB (European Investment Bank). 2019. “EIB Group Survey on Investment and Investment Finance 2019: EU Overview.” Economics Department, European Investment Bank, Luxembourg. https://www.eib.org/en/publications/econ-eibis-2019-eu.
Eisenmeier, Siegfried R. J. 2018. “Case Study: Ride-Sharing Platforms in Developing Countries: Effects and Implications in Mexico City.” P4P Commission Background Paper 3, Pathways for Prosperity Commission on Technology and Inclusive Development, Blavatnik School of Government, University of Oxford, Oxford, UK. https://pathwayscommission.bsg.ox.ac.uk/sites/default/files/2019-09/ride-sharing_platforms_in_developing_countries.pdf.
EIU (Economist Intelligence Unit). 2012. “The Deciding Factor: Big Data and Decision-Making.” Business Analytics, Capgemini, Paris.
Evans, David S., and Richard Schmalensee. 2016. Matchmakers: The New Economics of Multisided Platforms. Boston: Harvard Business Review Press.
Evans, Peter C. 2016. “The Rise of Asian Platforms.” Emerging Platform Economy Series 3, Center for Global Enterprise, New York. https://www.thecge.net/web/viewer.html?file=/app/uploads/2016/11/FINALAsianPlatformPaper.pdf.
Evans, Peter C., and Annabell Gawer. 2016. “The Rise of the Platform Enterprise: A Global Survey.” Emerging Platform Economy Series 1, Center for Global Enterprise, New York. https://www.thecge.net/app/uploads/2016/01/PDF-WEB-Platform-Survey_01_12.pdf.
Facebook, OECD (Organisation for Economic Co-operation and Development), and World Bank. 2017. “Future of Business Survey: Gender Management in Business.” Factworks, San Mateo, CA. https://fbnewsroomes.files.wordpress.com/2017/02/future-of-business-survey-gender-management-in-business-january-20171.pdf.
Falk, Martin, and Eva Hagsten. 2015. “E-Commerce Trends and Impacts across Europe.” International Journal of Production Economics 170 (December): 357–69. https://doi.org/10.1016/j.ijpe.2015.10.003.
Farboodi, Maryam, Roxana Mihet, Thomas Philippon, and Laura Veldkamp. 2019. “Big Data and Firm Dynamics.” NBER Working Paper 25515, National Bureau of Economic Research, Cambridge, MA. https://doi.org/10.3386/w25515.
Fernando, Jason. 2021. “Factors of Production.” Investopedia, January 3, 2021. https://www.investopedia.com/terms/f/factors-production.asp.
Future Hub. 2020. “E-Logistics Transforming Mid-Mile Freight Logistics in Emerging Markets: Stories Told and Untold.” Future Insight (blog), June 30, 2020. http://www.fhub.io/blog/e-logistics-transforming-mid-mile-freight-logistics-in-emerging-markets.
Gawer, Annabell. 2014. “Bridging Differing Perspectives on Technological Platforms: Toward an Integrative Framework.” Research Policy 43 (7): 1239–49.
Gerretsen, Isabelle. 2020. “Trucking App Kobo360 Wants to Speed Up Deliveries across Africa.” CNN Business: Innovate Africa, April 16, 2020. https://www.cnn.com/2020/04/16/tech/kobo-360-trucks-spc-intl/index.html.
Google. 2021. “How Doctors, Dispatch Drivers, and Blood Donors Are Coming Together to Save Lives across Africa.” Search On, Lifebank, Google, Mountain View, CA. https://about.google/stories/lifebank/#:~:text=By%20designing%20a%20system%20to,take%20your%20time%20to%20solve.
Gordon, Lawrence A., Martin P. Loeb, William Lucyshyn, and Lei Zhou. 2015. “Increasing Cybersecurity Investments in Private Sector Firms.” Journal of Cybersecurity 1 (1): 3–17. https://doi.org/10.1093/cybsec/tyv011.
Hao, Karen. 2019. “China Has Started a Grand Experiment in AI Education: It Could Reshape How the World Learns.” MIT Technology Review. https://www.technologyreview.com/2019/08/02/131198/china-squirrel-has-started-a-grand-experiment-in-ai-education-it-could-reshape-how-the/.
Houser, Daniel, and John Wooders. 2005. “Hard and Soft Closes: A Field Experiment on Auction Closing Rules.” In Experimental Business Research: Economic and Managerial Perspectives, vol. 2, edited by Amnon Rapoport and Rami Zwick, 123–31. Dordrecht, the Netherlands: Springer.
Hurley, Mikella, and Julius Adebayo. 2017. “Credit Scoring in the Era of Big Data.” Yale Journal of Law and Technology 18 (1): 148–216.
IFC (International Finance Corporation). 2009. “M-Money Channel Distribution Case, Kenya: Safaricom M-PESA.” IFC, Washington, DC. https://www.ifc.org/wps/wcm/connect/e0d2a9bd-16b9-4a36-8498-0b2650b9af8b/Tool%2B6.7.%2BCase%2BStudy%2B-%2BM-PESA%2BKenya%2B.pdf?MOD=AJPERES&CVID=jkCVy-n.
IRGC (International Risk Governance Center). 2017. “Governing Risks and Benefits of Distributed Ledger Technologies.” IRGC, Geneva. https://irgc.org/wp-content/uploads/2018/09/IRGC.-2017.-Governing-risks-and-benefits-of-DLTs.-Highlights.pdf.
Jamaica Observer. 2020. “Mastercard, CHTA Launch Tourism Insights Platform.” May 8, 2020. http://www.jamaicaobserver.com/business-report/mastercard-chta-launch-tourism-insights-platform_193696?profile=1442.
Jensen, Robert. 2007. “The Digital Provide: Information (Technology), Market Performance, and Welfare in the South Indian Fisheries Sector.” Quarterly Journal of Economics 122 (3): 879–924. http://www.jstor.org/stable/25098864.
Jones, Charles I., and Christopher Tonetti. 2019. “Nonrivalry and the Economics of Data.” NBER Working Paper 26260, National Bureau of Economic Research, Cambridge, MA. https://ideas.repec.org/p/nbr/nberwo/26260.html.
Kamath, Reshma. 2018. “Food Traceability on Blockchain: Walmart’s Pork and Mango Pilots with IBM.” JBBA, Journal of the British Blockchain Association 1 (1): 3712. https://doi.org/10.31585/jbba-1-1-(10)2018.
Kashyap, Anil K., and Anne Wetherilt. 2019. “Some Principles for Regulating Cyber Risk.” AEA Papers and Proceedings 109 (May): 482–87. https://doi.org/10.1257/pandp.20191058.
Kemp, Katharine. 2019. “Concealed Data Practices and Competition Law: Why Privacy Matters.” Report UNSWLRS 53, University of New South Wales Law Research Series, UNSW Law, University of New South Wales, Sydney.
Kumar, Kabir, and Kim Muhota. 2012. “Can Digital Footprints Lead to Greater Financial Inclusion?” CGAP Brief, Consultative Group to Assist the Poor, Washington, DC. https://www.cgap.org/sites/default/files/researches/documents/CGAP-Brief-Can-Digital-Footprints-Lead-to-Greater-Financial-Inclusion-Jul-2012.pdf.
LenddoEFL. 2020. “Product Briefs.” LenddoEFL, Singapore. https://lenddoefl.com/resources.
Lendle, Andreas, Marcelo Olarreaga, Simon Schropp, and Pierre-Louis Vezina. 2012. “There Goes Gravity: How eBay Reduces Trade Costs.” Policy Research Working Paper 6253, World Bank, Washington, DC.
Levy, Stephen. 2014. “Colorimetrix App Turns Smartphones into Lab Test Readers.” MD+DI, March 19, 2014. https://www.mddionline.com/digital-health/colorimetrix-app-turns-smartphones-lab-test-readers.
Livingston, Jeffrey A. 2002. “How Valuable Is a Good Reputation? A Sample Selection Model of Internet Auctions.” Review of Economics and Statistics 87 (3): 453–65.
Lobosco, Katie. 2013. “Facebook Friends Could Change Your Credit Score.” CNN Business, August 27, 2013. https://money.cnn.com/2013/08/26/technology/social/facebook-credit-score/index.html.
Lucking-Reiley, David, Doug Bryan, Naghi Prasad, and Daniel Reeves. 2007. “Pennies from eBay: The Determinants of Price in Online Auctions.” Journal of Industrial Economics 55 (2): 223–33.
Luo, Xubei, and Chiyu Niu. 2019. “E-Commerce Participation and Household Income Growth in Taobao Villages.” Poverty and Equity Global Practice Working Paper 198, World Bank, Washington, DC.
Magalhaes, Gustavo, and Catarina Roseira. 2017. “Open Government Data and the Private Sector: An Empirical View on Business Models and Value Creation.” Government Information Quarterly 37 (3): 101248. https://doi.org/10.1016/j.giq.2017.08.004.
Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. 2011. “Big Data: The Next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute, New York.
Manyika, James, Susan Lund, Marc Singer, Olivia White, and Chris Berry. 2016. “Digital Finance for All: Powering Inclusive Growth in Emerging Economies.” McKinsey Global Institute, New York. https://www.mckinsey.com/~/media/McKinsey/Featured%20Insights/Employment%20and%20Growth/How%20digital%20finance%20could%20boost%20growth%20in%20emerging%20economies/MG-Digital-Finance-For-All-Full-report-September-2016.ashx.
Mastercard. 2017. “Leveraging Big Data to Drive Tourism Revenue.” Mastercard Tourism Insights, Mastercard, Purchase, NY. https://www.mastercard.us/content/dam/public/mastercardcom/na/us/en/documents/tourism-insights-summary.pdf.
Mateescu, Alexandra, and Aiha Nguyen. 2019. “Explainer: Algorithmic Management in the Workplace.” February, Data & Society Research Institute, New York. https://datasociety.net/wp-content/uploads/2019/02/DS_Algorithmic_Management_Explainer.pdf.
McGregor, Lorna, Daragh Murray, and Vivian Ng. 2019. “International Human Rights Law as a Framework for Algorithmic Accountability.” International and Comparative Law Quarterly 68 (2): 309–43.
McNair, Corey. 2018. “Global Ad Spending Update: Alibaba, Facebook, and Google to Capture Over 60% of Digital Ad Dollars in 2019.” eMarketer, November 20, 2018, Insider Intelligence, New York. https://www.emarketer.com/content/global-ad-spending-update.
Melnik, Mikhail I., and James Alm. 2002. “Does a Seller’s Ecommerce Reputation Matter?
Evidence from eBay Auctions.” Journal of Industrial Economics 50 (3): 337–49. MGI (McKinsey Global Institute). 2018. “Notes from the AI Frontier: Modeling the Impact of AI on the World Econ- omy.” Discussion Paper, MGI, New York. https://www .mckinsey.com/~/media/McKinsey/Featured%20 Insights/Artificial%20Intelligence/Notes%20from%20 the%20frontier%20Modeling%20the%20impact%20 of%20AI%20on%20the%20world%20economy/MGI
  • 136. 116 | World Development Report 2021 -Notes-from-the-AI-frontier-Modeling-the-impact-of-AI -on-the-world-economy-September-2018.ashx. Nyman, Sara, and Rodrigo Barajas. Forthcoming. “Antitrust in the Digital Economy: A Global Perspective.” World Bank, Washington, DC. Nyman, Sara, and Clara Stinshoff. Forthcoming. “A Mapping of Digital Platform Firms in Developing Economies.” WDR 2021 background paper, World Bank, Washing- ton, DC. Okello, Ron. 2018. “TechCrunch Battlefield Africa Winner Lori Systems Expands into Three New Countries; Wins Multiple Bids Associated with $4B Railroad Project.” Business Wire, August 18, 2018. https://www.businesswire .com/news/home/20180918005205/en/TechCrunch -Battlefield-Africa-Winner-Lori-Systems-Expands. Open Access Government. 2019. “World Bank Blockchain Pilot Shows Fresh Narrative for Haiti’s Farmers.” Block- chain News, March 25, 2019. https://www.openaccess government.org/world-bank-blockchain-haitis-farmers /61205/. Patnam, Manasa, and Weijia Yao. 2020. “The Real Effects of Mobile Money: Evidence from a Large-Scale Fintech Expansion.” IMF Working Paper WP/20/138, Interna- tionalMonetaryFund,Washington,DC.https://www.imf .org/en/Publications/WP/Issues/2020/07/24/The-Real -Effects-of-Mobile-Money-Evidence-from-a-Large-Scale -Fintech-Expansion-49549. Riley, Thyra A., and Anoma Kulathunga. 2017. Bringing E-money to the Poor: Successes and Failures.” Directions in Development: Finance Series. Washington, DC: World Bank. Robertson, Viktoria H. S. E. 2020. “Excessive Data Collection: Privacy Considerations and Abuse of Dominance in the Era of Big Data.” CommonMarketLawReview 57 (1): 161–90. Romm, Tony. 2020. “Companies Burned by Big Tech Plead for Congress to Regulate Apple, Amazon, Facebook, and Google.” Washington Post, January 18, 2020. Rouhiainen, Lasse. 2019. “How AI and Data Could Personal- ize Higher Education.” Harvard Business Review, Octo- ber 14. 
https://hbr.org/2019/10/how-ai-and-data-could -personalize-higher-education. Rumo Arongo Ndiege, Joshua. 2019. “Social Media Tech- nology for the Strategic Positioning of Small and Medium-Sized Enterprises: Empirical Evidence from Kenya.” EJISDC, Electronic Journal of Information Systems in Developing Countries 85 (2): e12069. https://doi.org/10.1002 /isd2.12069. Sandvine. 2019. “The Mobile Internet Phenomena Report, February 2019.” Sandvine, Fremont, CA. https://www .sandvine.com/hubfs/downloads/phenomena/2019 -mobile-phenomena-report.pdf. Schmidt, Douglas C. 2018. “Google Data Collection.” Digital Content Next, New York. https://digitalcontentnext.org /wp-content/uploads/2018/08/DCN-Google-Data -Collection-Paper.pdf. Simonovska, Ina, and Michael Waugh. 2014. “The Elasticity of Trade: Estimates and Evidence.” Journal of International Economics 92 (1): 34–50. Singh, Manish. 2020. “Google Ends Its Free Wi-Fi Program Station.” TechCrunch, February 17, 2020. https://social .techcrunch.com/2020/02/17/google-ends-its-free-wi-fi -program-station/. Startz, Meredith. 2017. “The Value of Face-to-Face: Search and Contracting Problems in Nigerian Trade.” VoxDev, July 31, 2017. https://voxdev.org/topic/firms-trade/value -face-face-search-and-contracting-problems-nigerian -trade#:~:text=My%20estimates%20suggest%20that%20 search,one%20sixth%20of%20consumer%20spending. Stott,Andrew.2014.“OpenDataforEconomicGrowth.”Work- ing Paper 89606, World Bank, Washington, DC. http:// documents.worldbank.org/curated/en/13162146815479 2082/Open-data-for-economic-growth. Suri, Tavneet, and William Jack. 2016. “The Long-Run Pov- erty and Gender Impacts of Mobile Money.” Science 354 (6317): 1288–92. https://science.sciencemag.org/content /354/6317/1288. Taylor, Phil. 2016. “Marie Stopes Turns to MPedigree for Tracking Technology.” Securing Industry, Bromsgrove, UK. 
https://www.securingindustry.com/pharmaceuticals /marie-stopes-turns-to-mpedigree-for-tracking-technology /s40/a2723/. UNESCO (United Nations Educational, Scientific, and Cul- tural Organization). 2020. “Education: From Disruption to Recovery.” UNESCO, Paris. https://en.unesco.org /covid19/educationresponse. UNGWG (United Nations Global Working Group on Big Data for Official Statistics). 2019. “Handbook on the Use of Mobile Phone Data for Official Statistics.” Draft, Statistics Division, Department of Economic and Social Affairs, United Nations, New York. https://unstats.un.org /bigdata/task-teams/mobile-phone/MPD%20Handbook %2020191004.pdf. Universitas Indonesia. 2020. “Dampak Ekonomi Gojek 2019: dan Peran Ekosistem Gojek di Indonesia Saat Pandemi COVID-19.” Demographics Institute, Faculty of Econom- ics and Business, University of Indonesia, Depok, Indo- nesia. https://drive.google.com/file/d/1RgVRNJaszs36-3z __smhsqBegrufhNWw/view. von Mörner, Moritz. 2017. “Application of Call Detail Records: Chances and Obstacles.” Transportation Research Procedia (25): 2233–41. https://doi.org/10.1016/j.trpro.2017.05.429. West, Sarah, and Ellery Roberts Biddle. 2017. “Facebook’s Free Basics Doesn’t Connect You to the Global Internet, But It Does Collect Your Data.” Global Voices Advox, July 27, 2017. https://advox.globalvoices.org/2017/07/27/facebooks-free -basics-doesnt-connect-you-to-the-global-internet-but-it -does-collect-your-data/. WHO (World Health Organization). 2018. “Substandard and Falsified Medical Products.” Fact Sheet, WHO, Geneva. https://www.who.int/news-room/fact-sheets/detail /substandard-and-falsified-medical-products. WHO(WorldHealthOrganization).2019.“TheGlobalVaccine Action Plan (2011–2020): Review and Lessons Learned.” Report WHO/IVB/19.07, Strategic Advisory Group of Experts on Immunization, WHO, Geneva. 
http://www .who.int/immunization/global_vaccine_action_plan/en/. World Bank. 2017. World Development Report 2017: Governance and the Law. Washington, DC: World Bank.
Spotlight 3.1
The huge potential of open data for business applications

Public intent data play a foundational role as a system of reference for the entire economy.

Public intent data add tremendous value to the economy as a whole and to various sectors. The gross value added from public data is estimated to range from 0.4 percent to 1.4 percent of GDP, according to a 2016 meta-study that focused mainly on high-income countries.1 Specific public intent data products also yield great value for particular sectors. For example, Denmark’s open access dataset of addresses generated direct economic benefits of €62 million (over DKr 450 million) between 2005 and 2009, returning the €2 million (roughly DKr 15 million) cost of investments in data about 30 times over.2 This example is relevant for low- and middle-income countries, where the lack of addresses and address datasets has been a barrier to the development of data-driven transport and logistics services.

Creating value using public intent data

Entrepreneurs create value using public intent data in ways similar to how they use other data. First, companies use public intent data to improve their operations. US retailers, for example, combine data from the American Community Survey with their own sales data to customize inventory regionally.3 Businesses also rely on price-level data to set wages and allowances, among many other uses.4 Second, firms use public intent data to develop new products or services, including research and analytics services.5 For example, the global energy analytics sector depends on data from the US Energy Information Administration to monitor worldwide patterns of energy use.6 Other businesses use data to provide new forms of advice to their customers. Farmerline, a company in Ghana, combines government meteorological and administrative data with proprietary data to provide advice to farmers via text message.
Firms also rely on demographic statistics and business registers to inform their decisions about whether and how to enter new markets. Finally, firms may act as data intermediaries (see chapter 8), aggregating and repackaging government data in more accessible, user-friendly formats.

The changing landscape of business sectors driven by public data

While businesses driven by public data have been studied in some detail in high-income economies,7 there is little systematic information on private sector use of public intent data, or on the value of those data to the economy, in lower-income countries. Nevertheless, a handful of sources shed light on the business use of public intent data, including in emerging economies.8

These sources indicate that companies using public intent data span a wide range of sectors in both high-income and low- and middle-income economies. Around the world, the technology sector clearly dominates. In low- and middle-income economies, the research and consulting sector is the second most frequent user of public intent data. Companies using such data tend to be young and small: a large majority of the companies in GovLab’s OpenData500 Global Network database have 200 or fewer employees. As for the data used, half of the US-based OpenData500 companies draw on data from multiple government agencies, and the US Census Bureau is one of the most used sources (16 percent). Similarly, in Mexico the national statistics office, the National Institute of Statistics and Geography (INEGI), is the most important source of public intent data for businesses, with 88 percent of companies reporting that they use INEGI data. Among the 200 firms in low- and middle-income countries included in the Open Data Impact Map compiled by the Open Data for Development Network, geospatial data are the most commonly used type of public intent data (41 percent), followed by demographic data (36 percent), economic data (30 percent), and health data (27 percent).

These assessments indicate that the business sector that uses public data is much smaller in most low- and middle-income countries than in high-income economies. This pattern is closely related to challenges with public intent data in general.9 In many cases, national data systems are limited in the amount of data they produce and in the quality, usability, timeliness, openness, and accessibility of those data (see chapter 2).10 Indeed, open data–driven companies surveyed by the World Bank reported the poor quality, lack of openness, and limited accessibility of public intent data as major concerns.11

Figure S3.1.1 Private company use of public data is extremely valuable in the United States, suggesting the value of open government data
[Circular diagram linking federal agencies (among them the Departments of Agriculture, Commerce, Energy, Health and Human Services, Labor, Transportation, and the Treasury; the Federal Reserve Board; NASA; the Securities and Exchange Commission; the Environmental Protection Agency; and "multiple federal data sources") to company categories (among them data and technology, finance and investment, geospatial and mapping, health care, housing and real estate, insurance, scientific research, and research and consulting).]
Source: Verhulst and Caplan 2015. Data at http://bit.do/WDR2021-Fig-S3_1_1. © The GovLab. Used with permission of The GovLab; further permission required for reuse.
Note: The figure centers on 500 US firms. Starting with the right-hand, purple-shaded half of the circle, the gray lines emanating from the federal agencies show which types of private sector firms have used data from which government departments. The portion of the semicircle for each department reflects the number of firms using its data. Moving to the left-hand, green-shaded half of the circle, the gray lines emanating from the private sector reveal which categories of company have used data from which government agency. For example, firms in the finance and investment sector have used open data from 19 US departments and agencies.

The more active public data–driven business sectors in Mexico and other Latin American countries illustrate the potential for low- and middle-income countries. Realizing this potential requires better financing mechanisms and high-capacity, integrated, and open national data systems (see chapter 9).

Figure S3.1.1 visualizes the huge potential of open data. The figure centers on 500 firms based in the United States. It reveals widespread private sector use of publicly available federal government data, though data from some agencies are used more than data from others. For example, fewer firms use open data from the Department of Agriculture than from the Department of Commerce (home of the US Census Bureau and many other important datasets), as shown by the smaller portion of the circle allocated to the Department of Agriculture. Even so, firms from seven distinct sectors have used the Department of Agriculture’s open data. Figure S3.1.1 also reveals that companies in some sectors rely on many types of government data. For example, firms in the finance and investment sector have used open data from 19 US departments and agencies. The density of these connections reveals the great potential for data use, reuse, and repurposing, which has only begun to be exploited in both higher-income and lower-income countries.

Notes
1. Lateral Economics (2014).
2. McMurren, Verhulst, and Young (2016).
3. Hughes-Cromwick and Coronado (2019).
4. Hughes-Cromwick and Coronado (2019).
5. Gurin, Bonina, and Verhulst (2019); Magalhaes and Roseira (2017).
6. Hughes-Cromwick and Coronado (2019).
7. See, for example, Hughes-Cromwick and Coronado (2019); Lateral Economics (2014); Manyika et al. (2013); and Stott (2014).
8. See Morrison and Lal Das (2014); Center for Open Data Enterprise, Open Data Impact Map (database), https://opendataimpactmap.org/.
9. Gurin, Bonina, and Verhulst (2019).
10. Gurin, Bonina, and Verhulst (2019).
11. Morrison and Lal Das (2014).

References
Gurin, Joel, Carla Bonina, and Stefaan Verhulst. 2019. “Open Data Stakeholders: Private Sector.” In The State of Open Data: Histories and Horizons, edited by Tim Davies, Stephen B. Walker, Mor Rubinstein, and Fernando Perini, 418–29. Cape Town, South Africa: African Minds; Ottawa: International Development Research Centre. https://www.idrc.ca/en/book/state-open-data-histories-and-horizons.
Hughes-Cromwick, Ellen, and Julia Coronado. 2019. “The Value of US Government Data to US Business Decisions.” Journal of Economic Perspectives 33 (1): 131–46. https://doi.org/10.1257/jep.33.1.131.
Lateral Economics. 2014. “Open for Business: How Open Data Can Help Achieve the G20 Growth Target.” Omidyar Network, Redwood City, CA. https://lateraleconomics.com.au/wp-content/uploads/omidyar_open_business.pdf.
Magalhaes, Gustavo, and Catarina Roseira. 2017. “Open Government Data and the Private Sector: An Empirical View on Business Models and Value Creation.” Government Information Quarterly 37 (3): 101248. https://doi.org/10.1016/j.giq.2017.08.004.
Manyika, James, Michael Chui, Peter Groves, Diana Farrell, Steve Van Kuiken, and Elizabeth Almasi Doshi. 2013. Open Data: Unlocking Innovation and Performance with Liquid Information. New York: McKinsey Global Institute. https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/open-data-unlocking-innovation-and-performance-with-liquid-information.
McMurren, Juliet, Stefaan Verhulst, and Andrew Young. 2016. “Denmark’s Open Address Data Set: Consolidating and Freeing-Up Address Data.” The GovLab, New York University, New York; Omidyar Network, Redwood City, CA. https://odimpact.org/case-denmarks-open-address-data-set.html.
Morrison, Alla, and Prasanna Lal Das. 2014. “New Surveys Reveal Dynamism, Challenges of Open Data-Driven Businesses in Developing Countries.” Data Blog (blog), December 15, 2014. https://blogs.worldbank.org/opendata/new-surveys-reveal-dynamism-challenges-open-data-driven-businesses-developing-countries.
Stott, Andrew. 2014. “Open Data for Economic Growth.” Working Paper 89606, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/131621468154792082/Open-data-for-economic-growth.
Verhulst, Stefaan, and Robyn Caplan. 2015. “Open Data: A Twenty-First-Century Asset for Small and Medium-Sized Enterprises.” The GovLab, New York University, New York. https://www.thegovlab.org/static/files/publications/OpenData-and-SME-Final-Aug2015.pdf.
CHAPTER 4
Creative reuses of data for greater value

Main messages
1. Innovations in repurposing and combining public intent and private intent data are opening doors to development impacts previously unimaginable. These innovations can inform and advance policy goals, help governments improve and target service delivery, and empower individuals and civil society.
2. When private intent data are repurposed for public purposes, they can help fill data gaps and provide real-time and finer-scale insights. When public intent and private intent data are combined, some or many of the limitations of each data type can be overcome.
3. Private intent data can be difficult to understand, monitor, and regulate. They may also miss the poorest or other marginalized populations and perpetuate discrimination and biases. Data protection is a key issue. Responsive regulation and consumer protection measures are needed, along with recognition of which populations are omitted from an analysis.
4. Using private intent data for effective policy making requires short- and long-term coordinated investments in training, data partnerships, and research. Best practices and guidelines need to be developed.
The power of repurposing and combining different types and sources of data

Lack of data and information is nowhere more apparent than during a crisis such as the COVID-19 pandemic or an earthquake. Urgent questions (What is happening? How can we help?) need good answers, and right away.

Consider the earthquake that devastated Haiti in 2010. Large donations of supplies and money poured into the country within days of the disaster, but delivering relief was difficult because vast numbers of people had been displaced. Census data no longer reflected where people were living, so they could not help responders direct relief to the people who needed it most. Using data from mobile phones, researchers were later able to demonstrate that they could have tracked population movements almost in real time. They found that one-third of the estimated 630,000 residents of the capital, Port-au-Prince, had fled the city.1 Although the study was retrospective, it demonstrated how real-time, spatially precise information of this kind could have sped up relief efforts and saved many lives had it been available at the time.

This example highlights an emerging question in development research: when a pressing crisis such as the Haiti earthquake or the COVID-19 pandemic emerges, what data can complement traditional public intent data to solve complex development challenges? Recent technological shifts in lower-income countries, such as the adoption of mobile phones, social media, digital transactions, and mobile money, have generated a wealth of granular private intent data (see chapter 3 and box 4.1) suited to a wide range of secondary uses.2 These data are being leveraged to monitor the effectiveness of policy measures and predict outcomes of long-standing concern to development practitioners and policy makers.

Technological advances in the private sector have turned data into an integral component of the production process, leading to gains in productivity and generating even more data that can be repurposed for development. In particular, the same approaches that are transforming efficiency and innovation in the private realm are being repurposed to tackle development bottlenecks in poor countries, making the development process more efficient, innovative, agile, and flexible. Because data are nonrivalrous, private companies are also able to reuse and repurpose publicly collected data, which can generate welfare-enhancing economies of scope.3

That said, the reuse of private intent data is not a panacea and may pose unique challenges for policy making. For example, data created by businesses to track mobile phone users may miss the poorest people, who do not have these technologies. Similarly, the data required to target customer experiences and achieve business gains differ from the sociodemographic information on which policy makers rely to design inclusive policy. Furthermore, many of the algorithms used to process private intent data are considered trade secrets and thus lack the transparency required for effective policy making. Transparency and oversight are also important considerations when giving private companies access to sensitive data, such as data related to facial recognition and surveillance (see chapter 6). Despite these challenges, combining public intent and private intent data can offer real-time insights that cover the entire population (or nearly so) and are more precisely estimated for specific population segments and localities.

Box 4.1 Using cellphones to combat COVID-19

After the onset of the COVID-19 outbreak, governments began implementing policy measures to reduce social contact and curb the spread of the pandemic. Data collected through mobile phones, such as call detail records and global positioning system (GPS) location data, have been extremely valuable in quantifying the effectiveness of policies ranging from partial curfews to strict lockdowns. These data enable measurement of population density, travel patterns, and population mixing in real time and at high resolution, making it possible to better target policy interventions and improve epidemiological modeling.a Analysis of GPS locations showed that by March 23, 2020, social distancing policies had helped reduce mobility in major US cities by half.b In Colombia, Indonesia, and Mexico, the impact of nonpharmaceutical interventions (such as travel restrictions and lockdowns) on mobility differed by socioeconomic group. Smartphone users living in the wealthiest 20 percent of neighborhoods in Jakarta, Indonesia, reduced their mobility up to twice as much as those living in the poorest 40 percent (map B4.1.1 and figure B4.1.1).c Using an epidemiological model and estimates of population movements derived from mobile phone data, research in China found that nonpharmaceutical interventions implemented in late January 2020 led to a 98.5 percent reduction in the number of COVID-19 cases one month later.

Meanwhile, mobile phones have proved to be a valuable tool for contact tracers seeking to alert individuals who may have been in contact with an infected person.d Although both private companies and government actors have produced mobile phone applications for contact tracing (such as the Corona 100m app in the Republic of Korea, TraceTogether in Singapore, and COVIDSafe in Australia), their efficacy relative to more traditional forms of contact tracing has not yet been established. Digital contact tracing also raises important concerns about data protection,e prompting researchers worldwide to develop privacy-preserving contact tracing technologies. Examples are Private Kit: Safe Paths, developed by the Massachusetts Institute of Technology (MIT), and the Decentralized Privacy-Preserving Proximity Tracing (DP3T) protocol, developed by a consortium of European research institutions.

Despite the potential of mobility data in the fight against COVID-19, their impact on policy thus far has been limited, especially in lower-income countries. Bottlenecks include a lack of technical expertise among government organizations; restrictions on access to data, especially data held by mobile network operators; and a lack of the investments and political will required to scale up one-time projects.f To make mobility data accessible and useful for policy purposes, all stakeholders (governments, mobile phone operators, technology companies, and researchers) need to collaborate and form interdisciplinary teams that can prepare for and respond to future crises.

a. Buckee et al. (2020).
b. Klein et al. (2020).
c. Fraiberger et al. (2020).
d. Servick (2020b).
e. Servick (2020a).
f. Oliver et al. (2020).

Map B4.1.1 Mapping the home location of smartphone users in Jakarta, 2020
[Map of the Jakarta metropolitan area by kelurahan. Legend: smartphone users as % of total population (< 1, 1–4.99, 5–7.49, 7.5–12.49, ≥ 12.5, no data); national capital; kelurahan boundaries. IBRD 45401, March 2021.]
Source: Fraiberger et al. 2020. Data at http://bit.do/WDR2021-Map-B4_1_1.
Note: This map of Jakarta’s metropolitan area shows the spatial distribution of smartphone users’ home locations as a percentage of Jakarta’s total population.

Figure B4.1.1 Smartphone location data reveal the changes in the time users spend at home in Jakarta
[Line chart, February–December 2020. Y-axis: time spent at home (% change). Annotated events: presidential address; state of emergency; strict PSBB; transport restrictions lifted; PSBB transition; strict PSBB. Series by wealth of administrative unit: low-wealth (0%–40% of wealth), medium-wealth (40%–80%), and high-wealth (80%–100%) smartphone users.]
Source: Adapted from Fraiberger et al. 2020. Data at http://bit.do/WDR2021-Fig-B4_1_1.
Note: Figure shows the changes in the time users spent at home from February 1 to November 15, 2020, relative to the baseline period. PSBB = Pembatasan Sosial Berskala Besar (large-scale social restrictions).
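Figure B4.1.1 reports, for each wealth group, the percent change in time spent at home relative to a baseline period. The sketch below shows that calculation under a few stated assumptions. Daily hours-at-home values are assumed to be already aggregated by group, and the baseline is taken to be the median over a pre-restriction window. The function names, day labels, and numbers are hypothetical and purely illustrative. Fraiberger et al. (2020) may define the baseline and the aggregation differently.

```python
from statistics import median
from typing import Dict, Iterable

# Hypothetical input shape: day label -> average hours users spent at their
# inferred home location on that day (illustrative numbers only).
Series = Dict[str, float]


def pct_change_from_baseline(series: Series, baseline_days: Iterable[str]) -> Series:
    """Return each day's percent change relative to the median of the baseline days."""
    baseline = median(series[day] for day in baseline_days)
    if baseline == 0:
        raise ValueError("baseline value must be nonzero")
    return {day: 100.0 * (value - baseline) / baseline for day, value in series.items()}


def pct_change_by_group(data: Dict[str, Series], baseline_days: Iterable[str]) -> Dict[str, Series]:
    """Apply the same baseline comparison separately to each wealth group."""
    baseline_days = list(baseline_days)  # materialize so it can be reused for every group
    return {group: pct_change_from_baseline(s, baseline_days) for group, s in data.items()}


if __name__ == "__main__":
    # Hypothetical values, not data from the figure.
    hours_at_home = {
        "low-wealth (0-40%)": {"2020-02-03": 16.0, "2020-02-10": 16.0, "2020-04-13": 18.0},
        "high-wealth (80-100%)": {"2020-02-03": 16.0, "2020-02-10": 16.0, "2020-04-13": 20.0},
    }
    result = pct_change_by_group(hours_at_home, ["2020-02-03", "2020-02-10"])
    for group, changes in result.items():
        print(group, changes["2020-04-13"])
    # prints:
    # low-wealth (0-40%) 12.5
    # high-wealth (80-100%) 25.0
```

The median is used here instead of the mean so that a single unusual baseline day, such as a holiday, does not shift every later comparison. The trade-off is that the median is less sensitive to genuine gradual shifts within the baseline window.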
Such combined insights are especially important for the poorest people in the poorest countries, which have the largest data gaps. Too often, individuals at the lowest end of the income distribution remain on the margins when government, civil society, and the private sector lack the data to allocate and target resources effectively based on need. Leveraging all available data may reveal insights for the poor and marginalized that were previously unattainable.

This chapter begins by showcasing innovative uses of public intent and private intent data for aiding development policy. Examples include repurposing and combining data to improve predictions of disease spread, streamline service delivery, and allocate aid in disaster recovery. The chapter then explores the challenges that arise when private intent data are repurposed or when public intent and private intent data are combined. It concludes with a framework within which policy makers and funders could invest in the human capital, data partnerships, and research needed to gain useful insights from these new types and combinations of data.

Features of private intent data that can overcome gaps in public intent data

Private intent data are an alluring candidate to overcome public intent data gaps and offer new perspectives on development problems. These types of data are increasingly large in scale, “always on,” zoomed in, and, at times, less biased.

Big data. Private intent data are typically labeled “big data,”4 recognizing their wide reach and scope. Growing mobile phone and social media use allows information to be gathered from all users on these platforms. Although this process may underrepresent certain parts of the population in countries with lower usage rates, ever-larger portions of the population are being brought into the fold as rates of mobile phone ownership and internet connectivity continue to rise, even in lower-income countries.
When private intent data are repurposed toward a public goal, their volume and reach can not only inform first-order policy goals of poverty reduction and service delivery, but also facilitate efforts to detect and study rare events, such as fraud, corruption, or criminal activity, through techniques such as anomaly detection.

“Always on” data. Private intent data are always on5 because the daily use of new technologies entails constant data collection. Call detail records (CDRs) and apps that log locations pinpointed by satellite-based global positioning systems (GPS) offer traces of where cellphone users travel throughout the day. When a sudden and unexpected shock hits, such as a natural disaster or a disease outbreak, such data can provide precious real-time information on human mobility and call density. The timeliness of private intent data therefore contrasts with that of public intent data, which are generally collected at intervals of 1, 5, or 10 years. In Africa, for example, 14 of 59 countries did not conduct any surveys from 2000 to 2010, impeding the construction of nationally representative poverty measures.6 This critical situation sparked the call for a “data revolution” by the United Nations in 2014, pushing for an increase in data collection efforts in Africa and elsewhere.7 Although the situation is improving, with the average number of surveys per country per year increasing from 0.5 in 1990 to 1.5 in 2010,8 the lack of timeliness of public intent data has resulted in huge knowledge gaps, which are particularly glaring following major economic shocks such as COVID-19. Meanwhile, private intent data are increasingly being used to help fill these gaps.

“Zoomed in” data. Private intent data can zoom in on individuals and locations. Private companies want to know who is using their products or services and in what ways they can optimize their offerings and operations. Private intent data zoom into individuals to collect key metrics such as transaction histories to predict consumer behavior and bolster successful products. Internet Protocol (IP) addresses, browsing histories, and smartphone app logs add to a rich dataset that companies collect on a single person over time. Tracking whether app users enter a store or whether IP addresses in a neighborhood are searching for products on their site enables companies to better plan their store locations and stock their supplies. These data are now being applied in the public sphere, ranging from improving population maps9 to helping decision-makers target and optimize critical development resources. A key challenge to using individual data patterns to allocate resources or establish eligibility for products and services is data manipulation: individuals may strategically change browsing or other data usage behavior to appear more favorable in ranking criteria used by data algorithms to make allocation decisions. More research and policy deliberations are needed to design algorithms and decision rules that account for such user manipulation.

Potentially less biased data. Private intent data potentially reveal less “biased” information about people than surveys or polls because researchers observe actual behavior instead of relying on responses.
Although it is possible that respondents misreport answers during surveys, they have little incentive to do so when searching the internet. For that reason, the Google internet search engine has been dubbed a “digital truth serum.”10 This finding may apply especially to opinions on sensitive topics such as racism. Few people will admit such opinions in surveys, but these opinions are revealed through internet searches and can influence political and other outcomes.11 However, the algorithms used by search engines are considered private trade secrets and are usually optimized for private benefit, not public benefit. Without knowledge of how these algorithms work, those who rely on search engine results as their only source of data may find that these results lead to biased and discriminatory policy predictions.

Overall, combining public intent and private intent data is a powerful way to gain aggregate population insights in real time, if enough attention is given to addressing representativeness, discrimination, and transparency. Calibrating private intent data with census and survey data is one way to estimate population-level needs.

The next section offers a broad range of innovative examples of applications of private intent data to public policy and instances in which public intent and private intent data have been combined to promote inclusive and timely development solutions.

New insights from repurposing and combining data

The last decade has seen a surge in innovative research that repurposes private intent data and combines them with public intent data to tackle development issues. In the spring and summer of 2020, when the COVID-19 outbreak reached global dimensions, more than 950 scientific and medical articles were published that used private intent data to tackle the pandemic (box 4.2). Researchers’ ability to respond quickly to the pandemic builds on a growing trend of research that combines diverse data to tackle emerging issues.
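Calibrating private intent data with census or survey data, as suggested earlier in this section, can take many forms. One of the simplest is post-stratification: estimate a quantity separately within groups (here, urban and rural phone users), then weight each group's estimate by that group's share of the census population, so that a sample that overrepresents connected urban users no longer dominates the result. The sketch below is illustrative only. The function name, the group labels, the skipped-meals indicator, and the census shares are all hypothetical, and it is not the method used by any study cited in this chapter.

```python
from collections import defaultdict

def poststratified_mean(records, census_shares):
    """Population estimate from a skewed sample via post-stratification.

    records: iterable of (group, value) pairs from the private intent sample.
    census_shares: dict group -> population share (assumed to sum to 1)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, value in records:
        sums[group] += value
        counts[group] += 1
    missing = [g for g in census_shares if counts[g] == 0]
    if missing:
        raise ValueError(f"no sample observations for groups: {missing}")
    # Weight each group's sample mean by its census share, not its sample share
    return sum(share * sums[g] / counts[g] for g, share in census_shares.items())

# Hypothetical phone-based sample: 1 = reports skipping meals, 0 = does not.
# Urban users are 8 of 10 in the sample but 40 percent of the census population.
sample = [("urban", 1)] * 2 + [("urban", 0)] * 6 + [("rural", 1)] + [("rural", 0)]
census = {"urban": 0.4, "rural": 0.6}

naive = sum(v for _, v in sample) / len(sample)
adjusted = poststratified_mean(sample, census)
print(round(naive, 2), round(adjusted, 2))  # 0.3 0.4
```

The unweighted sample suggests that 30 percent of people skip meals. Reweighting to census shares raises the estimate to 40 percent, because the underrepresented rural group reports the problem more often. Post-stratification only corrects for the grouping variables it uses. People with no phone at all remain invisible unless survey data cover them.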
Monitoring public health

Monitoring public health is a key area that could benefit from repurposing and combining public intent and private intent data. In many lower-income countries, infectious diseases routinely pose large health threats. Five of the top 10 causes of death in low-income countries are communicable diseases, including lower respiratory infections, diarrheal diseases, HIV/AIDS, malaria, and tuberculosis.12 Viruses have been responsible for more deaths than all armed conflicts around the world over the last century.13 Especially in countries where data are limited, new big private intent data sources can help inform public policy interventions to reduce the mortality and morbidity rates from infectious diseases. Identification of hotspots can help disease control programs target activities more effectively to those areas, reducing infection rates both directly and indirectly in destination areas that are receiving infected travelers.14

As early as 2008, researchers began exploring how mobile phone data could be used to measure population mobility and then be applied to the study of epidemics.15 A seminal study applied this research at scale for all of Kenya using mobile phone data on nearly 15 million individuals to identify sources of imported malaria infections stemming from human mobility.16 During the 2014 Ebola outbreak in West Africa, researchers highlighted the potential benefits of using mobile phone data in the design of public policy.17 However, use of these analytics at the time of the crisis remained limited.18

After the onset of the COVID-19 pandemic, countries began to deploy this type of research and to pair mobile phone data with public intent data. Belgium formed a Data Against COVID-19 task force to analyze deidentified mobile phone data. These data are being used to monitor changes in human mobility due to lockdowns and to inform decisions on appropriate lockdown measures. In the Republic of Korea, mobile phone data are being used to aid contact tracing efforts to contain disease spread. By combining mobile phone data with medical facility records, credit card transaction logs, and closed-circuit television recordings, the government is identifying people at risk of exposure.19 Lower-income countries such as Ghana and Mozambique are beginning to use deidentified mobile phone data to combat the pandemic, typically with the support of international organizations that provide analytical skills for processing the data.20

Other types of big data are also being enlisted to create measures of mobility that can improve the effectiveness of the pandemic response.
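The Kenya malaria study and many of the COVID-19 mobility measures rest on a common computation. Time-stamped location records are turned into counts of trips between areas, and those counts are then combined with public intent data on where infection is prevalent. The sketch below is a deliberately simplified version of that idea, not the published methodology. It assumes hypothetical deidentified records of the form (user, timestamp, region). It counts a trip whenever a user's consecutive records fall in different regions, and it weights each trip by a hypothetical infection prevalence at the trip's origin to estimate how many infected travelers each destination receives.

```python
from collections import Counter

def count_trips(records):
    """Count region-to-region trips from (user_id, timestamp, region) records.

    A trip is counted whenever a user's consecutive records (in time order)
    fall in different regions."""
    by_user = {}
    for user, timestamp, region in records:
        by_user.setdefault(user, []).append((timestamp, region))
    trips = Counter()
    for events in by_user.values():
        events.sort()  # order each user's records by timestamp
        for (_, origin), (_, dest) in zip(events, events[1:]):
            if origin != dest:
                trips[(origin, dest)] += 1
    return trips

def expected_imported_infections(trips, prevalence):
    """Weight each trip by the infection prevalence at its origin region."""
    imports = Counter()
    for (origin, dest), n_trips in trips.items():
        imports[dest] += n_trips * prevalence.get(origin, 0.0)
    return imports

# Hypothetical deidentified records and survey-based prevalence estimates
records = [
    ("u1", 1, "A"), ("u1", 2, "B"), ("u1", 3, "A"),
    ("u2", 1, "B"), ("u2", 5, "B"),
    ("u3", 2, "C"), ("u3", 4, "A"),
]
prevalence = {"A": 0.01, "B": 0.20, "C": 0.05}

trips = count_trips(records)
imports = expected_imported_infections(trips, prevalence)
print(imports.most_common())  # destinations ranked by expected infected arrivals
```

Ranking destinations with `most_common()` gives a crude list of places likely to receive imported infections. A fuller analysis would also consider users' home locations, how long travelers stay, and the uncertainty in the prevalence estimates.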
Facebook disease prevention maps are being used to study COVID-19 and have been expanded to include colocation maps that measure comingling among people living in different areas and trends in whether individuals are staying near their homes or continuing to go to other locations.21 Google has produced a new set of measures to track the response to policies aimed at flattening the curve of the COVID-19 pandemic.22 Other sources of data for GPS locations have been used by data analytics firms such as Baidu, Cuebiq, and Unacast to assess the impacts of social distancing measures for COVID-19.23 GPS data provide a better approximation of locations and mobility at a finer spatial resolution, but their availability is limited by smartphone penetration and usage. In many lower-income countries, smartphone penetration is still low, and even those individuals with smartphones may only selectively turn on data or GPS because of high costs and drain on battery life.

The potential of new data sources for supporting public health and epidemiology efforts goes far beyond measures of mobility.24 Efforts are under way to use data tools as early warning systems for outbreaks and for understanding disease dynamics and routes of transmission. For example, the company BlueDot provides infectious disease surveillance services using advanced data analytics. It was able to warn of the outbreak of COVID-19 before the official announcement in early January 2020 by analyzing news reports, disease networks, and official proclamations.25 A similar prediction was made for the 2015–16 Zika outbreak that affected an estimated 1 million people, mainly in Latin America.26 By combining online news sources, Google search queries, Twitter posts, and government disease reports, local outbreaks could have been detected two to three weeks earlier, a retrospective study estimates.27 Combining public intent and private intent data sources has also improved forecasts for Ebola in West Africa28 and dengue in Southeast Asia.29 Improved forecasting of disease outbreaks and associated population movements is essential for efficient response measures to curb incidence rates.30

Another open and fertile source of synergy is data collected by wearables and other biotech devices. For example, the Kinsa HealthWeather app tracks fevers around the United States via smart thermometers and uses the aggregate data to create prediction models for the spread of disease. This type of application is particularly relevant in crises such as COVID-19, where timely reporting of case growth can help accurately map disease spread and enable timely and appropriate public policy responses.

Box 4.2 Leveraging private intent data to tackle COVID-19

Between February and September 2020, more than 950 articles were published in scientific, medical, and technical journals that repurposed cellphone, social media, Google search, and other types of big private intent data to better understand the spread of COVID-19 and to offer policy and operational solutions (figure B4.2.1). Despite the relatively large number of articles in a short time span, coverage of lower-income countries was low, especially those in Africa (map B4.2.1). Lack of expertise, poor training, difficult access to data, and limited research support are key areas that funders could address to ensure innovative uses of data in and about lower-income countries.

Figure B4.2.1 Use of repurposed data to study COVID-19: Published articles, by type of private intent data used
[Figure: number of articles containing “COVID” and one of the terms call detail records, cellphone, Facebook, Google Trends, big data, Twitter, or social media, February to September 2020; vertical axis 0 to 1,000 articles.]
Source: WDR 2021 team, based on data from CORD-19 (COVID-19 Open Research Dataset) Semantic Scholar team, Ai2 (Allen Institute for AI), http://www.semanticscholar.org/cord19. Data at http://bit.do/WDR2021-Fig-B4_2_1.
Note: Figure shows the number of articles published in scientific, medical, and technical journals across time from February to September 2020. The cumulative sum across all categories is higher because some articles appear in more than one category.

Map B4.2.1 Uses of repurposed data to study COVID-19: Published articles, by country
[Map: number of articles by country, divided by the COVID-19 death incidence rate; legend values 7.5, 10.0, 12.5, 15.0, and no data.]
Source: WDR 2021 team, based on data from CORD-19 (COVID-19 Open Research Dataset) Semantic Scholar team, Ai2 (Allen Institute for AI), http://www.semanticscholar.org/cord19. Data at http://bit.do/WDR2021-Map-B4_2_1.
Note: Map shows the number of articles published in scientific, medical, and technical journals across countries from February to September 2020. Article counts are divided by the COVID-19 death incidence rate.

Targeting resource allocations and responses during crises

Approximately 20–30 million people worldwide are displaced every year because of natural disasters such as storms, floods, droughts, and geological events.31 Over the last decade, about 600,000 people lost their lives to natural disasters, most of them in low- and middle-income countries.32 Effective disaster prevention, mitigation, response, and recovery require timely, cost-effective data at fine spatial scales. However, many countries lack adequate early warning systems and advanced geological tools to aid in this process, at times with devastating consequences. During the 2018 earthquake and tsunami in Central Sulawesi, Indonesia, the government could have minimized the human cost had the country’s warning system of buoys and seismographic sensors not been defective.33 As climate change continues to increase the frequency of natural disasters and the damage they cause, lower-income countries will likely bear the brunt of the economic and human impacts. Spotlight 4.1 highlights the importance of improved meteorological data for lower-income countries to confront enhanced climate risks.

Recent data innovations have revealed that nontraditional sources of private intent data such as mobile phone usage, social media activity, online queries, crowdsourcing platforms, and remote sensing technologies can facilitate disaster management.34 These devices and activities are not a replacement for advanced geological and meteorological equipment, which can predict disasters and offer early warnings. They can, however, help in government efforts to prevent loss and provide relief when such events occur.
Various studies in both lower- and higher-income countries have found that scraping social media platforms for posts related to seismic activity produces an in situ impact profile of seismic damage similar to the ones produced by advanced geological instruments, the traditional source of such data.35 Similarly, Tweets have been analyzed for disaster-related keywords to detect earthquakes in Australia and New Zealand.36 Deidentified CDR data are a good predictor of population movement during weather-related disasters such as floods. Social media content helps as well: the textual content of Tweets was used to understand how people were reacting to the 2011 floods in Thailand. Messages were classified by their content to help highlight precise needs in affected communities.37 The geospatial nature of social media posts can further help prioritize resource allocation in times of dire need. Moreover, combining geographic and social media analytics can enhance aid recovery efforts after a disaster. In the aftermath of the 2014 earthquake in Napa, California, researchers trained a machine learning algorithm to extract disaster-related semantics from Tweets and paired this information with geolocations to identify spatial hotspots.38 From these data, they were able to infer a disaster footprint and assess damage. They also learned that this method was transferable to other social media platforms and locations, with tweaks for cultural differences in social media use. Similarly, researchers studying Hurricane Irma, which hit Florida in 2017, found that sentiment analysis39 on geolocated Tweets could be used to guide resource allocation.40

Social media and mobile records have also proven useful in tracking recovery efforts. After Hurricane Sandy slammed into the New York City area in 2012, researchers analyzed Tweet topics and sentiment to see how those who experienced the disaster were coping, compared with those who did not experience it.
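The Napa study trained a machine learning classifier. A much cruder stand-in, shown here only to make the two-step logic concrete, flags posts that contain disaster keywords and then counts flagged, geolocated posts per grid cell, so that cells with unusually many reports stand out as candidate hotspots. The keyword list, the 0.1-degree cell size, and the (text, latitude, longitude) input format are all assumptions for illustration. This is keyword matching, not the classifier used in the studies cited above.

```python
import math
from collections import Counter

# Hypothetical keyword list; a real system would use a trained classifier
# and language-specific vocabularies.
DISASTER_TERMS = {"earthquake", "quake", "flood", "flooding", "damage", "collapsed"}

def is_disaster_post(text):
    """True if any word of the post, stripped of punctuation and case, is a disaster term."""
    words = {word.strip(".,!?#@:;\"'").lower() for word in text.split()}
    return not DISASTER_TERMS.isdisjoint(words)

def hotspot_counts(posts, cell_deg=0.1):
    """Count disaster-related posts per grid cell of cell_deg x cell_deg degrees.

    posts: iterable of (text, latitude, longitude)."""
    counts = Counter()
    for text, lat, lon in posts:
        if is_disaster_post(text):
            cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
            counts[cell] += 1
    return counts

# Hypothetical geolocated posts
posts = [
    ("Huge earthquake just now!", 38.25, -122.25),
    ("#quake damage downtown", 38.26, -122.27),
    ("Nice weather today", 38.25, -122.25),
    ("Flood on my street", 37.75, -122.45),
]
for cell, n in hotspot_counts(posts).most_common():
    print(cell, n)
```

In practice, raw counts would be compared against each cell's normal posting volume, since densely populated cells produce more posts of every kind.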
Finally, governments have long used satellite imagery to assess damage in the aftermath of natural disasters. However, this imagery usually lacks the spatial resolution needed for a granular assessment. It is typically considered public intent data, but a growing number of private companies are launching their own remote sensing technologies and data collection. The start-up Cloud to Street uses private satellite data to provide near real-time flood assessments to assist disaster recovery and adaptive planning. In three days in 2018, it was able to build a flood monitoring system to help the Democratic Republic of Congo deploy resources to 16,000 asylum seekers who had sought refuge along the flood-prone banks of the Congo River. Cloud to Street combined high-resolution private intent satellite data with data about cropland, population, and public assets (such as roads and infrastructure) to generate real-time impact estimates served on an interactive web platform and with automated alerts. As decision-makers transitioned from disaster response to recovery, Cloud to Street transitioned to using freely available satellite images, an effort that enabled longer-term support with fewer resources.41

Mapping poverty and targeting service delivery more precisely

Timely, reliable data on population characteristics are vital for responsive social and economic policy making. Mobile CDR and remote sensing data have recently been used to predict poverty patterns on a granular level and in a timely fashion, thereby helping to better target government services. Using these data sources costs a fraction of what it costs to field censuses or household surveys. Similar data from social media, online engagement, and satellite imagery are reducing the constraints to collecting data on the most vulnerable and hard-to-reach populations.

Moreover, the same algorithms that Google and Facebook use for online consumer marketing can be tweaked to direct resources to people living in poverty. In the same way that these tech firms predict the advertising that may interest consumers based on their digital behavior, development actors can use digital behavior to predict whether people are economically vulnerable.42

Research relying on data from Rwanda reveals that past histories of mobile phone use extracted from CDRs are a reliable predictor of socioeconomic status as validated against survey data.43 Moreover, the researchers find that, when the predicted characteristics of millions of mobile phone users are aggregated, they reproduce the distribution of wealth indicated by traditional data sources, both across the entire country and at the cluster level (approximately equivalent to a village in rural areas or a ward in urban areas). Such highly localized poverty maps can be used to effectively target policies, programs, and resources to the poorest. These methods can also improve demographic targeting of services by gender, age, and income level. For example, CDR data have been used to identify the gender of phone users,44 as well as to identify the ultrapoor.45

Beyond the realm of CDRs, research in higher-income countries has shown that online browsing history and social media activity can also reliably predict household income.
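The validation logic behind these studies can be illustrated in a few lines. A model is fit to predict a survey-based wealth measure from a digital-behavior feature on one set of survey clusters. The share of the variation in wealth that it explains (R²) is then checked on clusters the model has not seen. Statements such as “explains 55–75 percent of the variation,” reported for satellite imagery later in this section, refer to this kind of measure. The sketch assumes a single hypothetical cluster-level feature (average airtime spending per subscriber), hypothetical wealth index values, and an ordinary least-squares line. The Rwanda study and the satellite-imagery work used many features and more flexible machine learning models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(observed, predicted):
    """Share of the variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical survey clusters: (mean airtime spend per subscriber, survey wealth index)
train = [(1.0, -0.9), (2.0, -0.2), (3.0, 0.1), (4.0, 0.8), (5.0, 1.3)]
holdout = [(1.5, -0.6), (3.5, 0.5), (4.5, 0.9)]

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
predicted = [slope * x + intercept for x, _ in holdout]
print(f"held-out R^2 = {r_squared([y for _, y in holdout], predicted):.2f}")
```

Evaluating on held-out clusters rather than on the training clusters is what makes the check meaningful. The same logic, applied to whole countries left out of training, underlies the cross-country scalability result reported below.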
Social media footprints were used in Spain to infer city-level behavioral measures and predict socioeconomic output, specifically unemployment.46 Similarly, data from Yelp reviews of retail shops were used to measure changes in gentrification and predict local housing prices.47 Equipped with real-time and localized insights and trends, policy makers can better design policies that target areas affected by short-term economic shocks or long-term economic shifts.

Remote sensing technology is yet another novel way to collect population characteristics, predict poverty patterns, and improve public service delivery.48 Researchers have relied on publicly available data from Africa to both calibrate and validate machine learning models. The Demographic and Health Survey (DHS) sponsored by the United States Agency for International Development (USAID) and the World Bank’s Living Standards Measurement Study (LSMS) surveys provide high-resolution data on household wealth and consumption expenditures. When calibrated with these surveys, satellite imagery can predict poverty. At the survey cluster level, when used with survey data from Malawi, Nigeria, Rwanda, Tanzania, and Uganda, satellite imagery can explain 55–75 percent of the variation in wealth and consumption per capita. Estimates of economic well-being using this approach outperformed both similar estimates using satellite readings of nighttime light in the same countries and estimates using mobile phone data in Rwanda. Critically, this approach has been shown to work reasonably well for predicting wealth and poverty in countries when they are excluded from the sample used to train the model, suggesting the approach is scalable across other countries, at least in Africa.

Ensuring road safety in transport and transit

Road transport is an important element of economic development. Access to transport and mobility are highly correlated with income and quality of life.
Even though lower-income countries have only half of the world’s vehicles, they account for 90 percent of road traffic fatalities. In 2011 the World Health Organization (WHO) and the World Bank launched a Decade of Action for Road Safety, and they have provided funding and technical assistance to build systems aimed at reducing injuries and deaths on the road. Despite these efforts, little progress has been reported in low- and middle-income countries, and the number of fatalities remains high.49

A new and growing body of literature studies how alternative sources of data can be used to make progress toward achieving national road safety outcomes. In the public sector, for example, a study in Nigeria provided road safety agents with a monitoring system to investigate and record road safety events via mobile phone.50 Access to this mobile phone–based database helped disseminate information better and enabled agents to respond faster to road accidents. Such transit monitoring practices are becoming more widespread, especially in the private sector. Commercial banks in Kenya now require a tracking device in minibuses before approving loans to bus service owners. As a result, today most long-range buses in the country are equipped with GPS.51 This technology advancement serves the dual purpose of tracking assets under lien for the bank’s private benefit and promoting safer driving for public benefit.
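GPS trackers of this kind record speeds and positions with timestamps, and simple driving-behavior metrics can be derived from them. The Kenya study discussed below measured safety violations from variables including acceleration and jerk (the rate of change of acceleration). The sketch shows one plausible way to compute such counts from a speed time series using finite differences. The function name, the input format, and the thresholds (3 m/s² for acceleration, 5 m/s³ for jerk) are hypothetical. They are not the definitions used in that study.

```python
def harsh_driving_counts(samples, accel_limit=3.0, jerk_limit=5.0):
    """Count threshold breaches in a GPS speed trace.

    samples: list of (time_s, speed_m_per_s), sorted by time.
    Acceleration is the finite difference of speed between consecutive
    samples; jerk is the finite difference of those accelerations."""
    accels = []  # (midpoint time, acceleration in m/s^2)
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        accels.append(((t0 + t1) / 2, (v1 - v0) / (t1 - t0)))
    jerks = [
        (a1 - a0) / (t1 - t0)  # m/s^3
        for (t0, a0), (t1, a1) in zip(accels, accels[1:])
    ]
    return {
        "harsh_accel_or_brake": sum(1 for _, a in accels if abs(a) > accel_limit),
        "high_jerk": sum(1 for j in jerks if abs(j) > jerk_limit),
    }

# Hypothetical one-second speed readings from a minibus tracker (m/s)
trace = [(0, 10.0), (1, 10.0), (2, 14.0), (3, 14.0), (4, 9.0)]
print(harsh_driving_counts(trace))  # {'harsh_accel_or_brake': 2, 'high_jerk': 0}
```

Summing such counts per driver per day yields a daily violation tally of the kind an owner could tie to contract terms. Real GPS speeds are noisy, so a production version would smooth the trace before differencing.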
Social media analytics have also been applied in the private sector to understand the traffic safety culture. A recent study in Washington State in the United States mined Twitter data to understand the patterns, behaviors, and attitudes related to road safety.52 The study conducted sentiment analysis based on traffic-related keywords to extract latent views on topics such as safe driving measures, accidents, law enforcement and patrolling, and accident-causing behavior. It found that sentiment analysis using social media posts can be used in developing policies to improve traffic safety relevant to specific contexts. This type of sentiment analysis could be applied in lower-income countries as well, potentially with substantial benefits.

Techniques are also being developed to fill in gaps in data on the number and location of accidents in lower-income countries. Recently, researchers developed an algorithm to identify and geolocate crashes from Twitter feeds to substantially increase the digital data available to prioritize road safety policies. Spotlight 4.2 describes how car crash danger zones were pinpointed in Nairobi, Kenya, by combining police reports and crowdsourced data.53

More broadly, research on the transit industry has also addressed questions of private sector development. For example, a study in Kenya found that providing bus owners with data on their employees’ driving behavior can improve firm operations.54 Specifically, the researchers placed GPS devices in Kenya’s inner-city public transport vehicles and tracked a variety of data that captured driving behavior, including acceleration, jerk, location, and timestamp, to measure the number of daily safety violations. The main contribution of this data innovation was to correct informational asymmetry: once minibus owners could track driving performance, drivers could receive more generous contracts for better performance.
In turn, drivers operated in a manner less damaging to the vehicle, more frequently met targets, and reduced underreporting of revenues. Thus incentives between the company (principal) and the drivers (agents) were better aligned. These types of data can also provide governments with feedback to use in redesigning their road infrastructure and to guide interventions to reduce accidents.

Monitoring illegal fishing and deforestation

Recent advances in combining public intent and private intent data are also improving the monitoring of natural resource extraction. Box 4.3 features one example: identifying illegal fishing in protected ocean waters.

Efforts to monitor deforestation have also begun to leverage public and private datasets. Combining data in this way has enabled indigenous groups to patrol their forest reserves and defend against encroachment. With the aid of open-access or cheaper private satellite imagery, cloud computing, community observations, and publicly available property maps, community-based forest monitoring has become increasingly effective in identifying encroachment.55 In addition, through social media and platforms such as Global Forest Watch, the international community can better help local groups hold governments accountable for achieving national sustainable development commitments.56

Similar data are being used by companies to ensure that their suppliers are meeting sustainability standards for forest products. A recent initiative, Radar Alerts for Detecting Deforestation (RADD), was launched by the world’s 10 largest palm oil producers and buyers to monitor illegal deforestation in palm oil plantations.57 By funding development of a system to detect illegal deforestation using public radar imagery, property maps, and private procurement data, this initiative may signal a shift from civil society monitoring the private sector to the private sector monitoring itself to ensure that company commitments are met.
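The overlay described in box 4.3 lays MPA boundaries over AIS vessel positions and infers fishing from how vessels behave inside the boundary. It can be sketched in plain Python. This is a simplified stand-in, not Global Fishing Watch's actual classification method. It treats longitude and latitude as planar coordinates and tests each AIS ping against a single MPA polygon with the standard ray-casting rule. It counts a vessel's time inside the MPA as fishing-like whenever the vessel moves at or below a hypothetical 5-knot threshold. The input format, the square MPA, and the threshold are assumptions.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is point (x, y) inside the polygon [(x1, y1), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def slow_hours_in_mpa(pings, mpa, max_speed_knots=5.0):
    """Hours each vessel spends inside the MPA at fishing-like (slow) speed.

    pings: iterable of (vessel_id, time_hours, lon, lat, speed_knots).
    An interval between consecutive pings counts only if both pings are
    inside the MPA and both speeds are at or below max_speed_knots."""
    tracks = {}
    for vessel, t, lon, lat, speed in pings:
        tracks.setdefault(vessel, []).append((t, lon, lat, speed))
    hours = {}
    for vessel, track in tracks.items():
        track.sort()  # time order
        total = 0.0
        for (t0, x0, y0, s0), (t1, x1, y1, s1) in zip(track, track[1:]):
            if (point_in_polygon(x0, y0, mpa) and point_in_polygon(x1, y1, mpa)
                    and s0 <= max_speed_knots and s1 <= max_speed_knots):
                total += t1 - t0
        hours[vessel] = total
    return hours

# Hypothetical square MPA (lon, lat) and AIS pings
mpa = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
pings = [
    ("v1", 0.0, 5.0, 5.0, 3.0), ("v1", 1.0, 6.0, 5.0, 3.0),
    ("v1", 2.0, 7.0, 5.0, 12.0), ("v1", 3.0, 8.0, 5.0, 3.0),
    ("v2", 0.0, 15.0, 5.0, 3.0), ("v2", 2.0, 16.0, 5.0, 3.0),
]
print(slow_hours_in_mpa(pings, mpa))  # {'v1': 1.0, 'v2': 0.0}
```

Real MPA boundaries, such as those published by UNEP and IUCN, are often multipart polygons with holes and span large areas. A production version would therefore use a geospatial library and projected coordinates rather than this planar test.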
Box 4.3 Preventing illegal fishing in protected maritime areas

Monitoring illegal fishing in Marine Protected Areas (MPAs) is difficult because of their size and distance from land. The boundaries of MPAs are curated and made open access by the United Nations Environment Programme (UNEP) and the International Union for Conservation of Nature (IUCN). Yet identifying boats in vast expanses of the ocean requires innovative uses of data that are not publicly available. Global Fishing Watch has data partnerships with the firm ORBCOMM to access raw data from commercial trawlers’ automatic identification systems (AIS), which provide the real-time geographic coordinates of each trawler to help avoid collisions and provide other traffic services. AIS data can be combined with optical and radar imagery from satellites to detect illegal fishing activity (figure B4.3.1). By overlaying MPA boundaries on AIS data used to identify boats and determining fishing behaviors from the time spent in specific areas, researchers found that 59 percent of MPAs in the European Union were commercially trawled. In areas that were heavily fished, the presence of sensitive species (such as sharks, rays, and skates) was 69 percent lower.a

Figure B4.3.1 Public intent and private intent data can be combined to detect illegal fishing activity
AIS: The automatic identification system (AIS) is a collision avoidance system that constantly transmits a vessel’s location at sea and can be used to identify and track fishing vessels.
Radar: Radar images (SAR) can identify large metal vessels and penetrate clouds.
Nighttime: Nighttime optical imagery (VIIRS) picks up the presence of fishing vessels using lights to attract catch or conduct operations at night.
Optical imagery: Daytime high-resolution optical imagery can be used to visually identify vessels.
Source: Infographic taken on July 8, 2020, globalfishingwatch.org. © Global Fishing Watch. Used with permission of Global Fishing Watch; further permission required for reuse.
Note: Public intent data include satellite data. Private intent data include data from trawlers’ collision avoidance systems.
a. Dureuil et al. (2018).

Keeping governments accountable

Emerging data types are enabling civil society to better monitor corruption. Social media discussion boards, drawing on crowdsourced data and web scraping, are emerging as a way for local leaders to act against corrupt officials and receive real-time feedback on the impact of anticorruption policies.58 Data reported in newspapers have been used to target corruption, thereby allowing civil society organizations to press for stricter governance measures. A systematic, real-time view of corruption trends can be gained from the news flow indices of corruption (NIC) constructed by the International Monetary Fund (IMF), drawing on country-specific searches of more than 665 million news articles.59 Regressing real per capita gross domestic product (GDP) on the NIC revealed that changes in corruption levels as measured by the NIC indicators were associated with 3 percent lower economic growth over the next two years. Combined with election data, NIC data have helped identify countries that had peaks in corruption before or after elections. These findings can prove helpful to international responses to corruption.

Private sector data are making it possible for international organizations and civil society actors to monitor policy and report on important events such as elections in real time. The Inter-American Development Bank, in partnership with governments in Latin America, has launched a website that uses crowdsourced civic feedback to monitor public works projects.60 Similarly, Civic Cops, a start-up in India, provides a suite of digital platforms to connect governments with civil society, notably offering a service that allows civic complaints and citizen service requests to be filed by mobile phone and directed to the corresponding public authorities. Civic engagement data have also been used to monitor elections in lower-income countries. For example, in Sierra Leone’s 2012 elections a collection of citizen journalists traveled throughout the country and reported election activity through SMS text messages, which were then posted on a Tumblr website, pegged to a Google map, and disseminated on Twitter.61

Benchmarking policy priorities

Private intent data repurposed by international organizations, civil society actors, and private companies are being used to track policy goals and benchmark policy priorities. These initiatives are invaluable because they provide unique and comparable data across countries that are not collected by national governments.

Policy makers widely recognize financial inclusion as critical to reducing poverty and achieving inclusive economic growth, and data are being harvested and disseminated to promote it. In 2011, in partnership with the polling firm Gallup Inc., the World Bank launched the Global Findex database, the world’s most comprehensive database on how adults save, borrow, make payments, and manage risks (map 4.1). This dataset was created by adding a module to the Gallup World Poll, which offers a standing global survey that produces comparable data across countries and across time. Researchers, private companies, and international organizations use these data to understand the lives of people everywhere.62 The Global Findex database has become a mainstay of global efforts to promote financial inclusion. In addition to being widely cited by scholars and development practitioners, Global Findex data have been used to track progress toward the World Bank’s goal of universal financial access by 2020 and the United Nations’ Sustainable Development Goals (SDG Target 8.10).

Data synergies can also help in critical policy areas such as food security, both in normal times and in crises such as the COVID-19 pandemic.
Map 4.1 Private intent data can provide unique and comparable information not collected by national governments, such as the number of adults who lack a formal financial account
Globally, 1.7 billion adults lacked a formal financial account in 2017
[Map: number of adults without an account, by economy; legend scale from 1 million to 200 million]
Source: World Bank, Global Findex (Global Financial Inclusion Database), https://globalfindex.worldbank.org/. Data at http://bit.do/WDR2021-Map-4_1.
Note: Data are not displayed for economies in which the share of adults without an account is 5 percent or less.

Combining geospatial data with farmer output and market pricing data could improve the logistics and management of critical food systems. Meanwhile, international organizations have partnered with companies to create public intent surveys to track progress toward the SDGs and inclusive development. For example, in 2014 the United Nations Food and Agriculture Organization (FAO) began to add questions to the Gallup World Poll to collect data for its Food Insecurity Experience Scale (addressing SDG 2). In 2015 the International Labour Organization (ILO) and the Walk Free Foundation added questions that measure the incidence of modern slavery (addressing SDG Target 8.7). Through a partnership with Facebook, in 2018 the World Bank and the Organisation for Economic Co-operation and Development (OECD) launched the biannual Future of Business survey.63 The survey targets active micro, small, and medium enterprises (MSMEs) that host a Facebook business page. Using these data, researchers have been able to study the gender pay gap across 97 countries.64

Apart from surveys, companies are beginning to repurpose their own data for the public good. During the COVID-19 pandemic, Google began releasing updated community mobility reports for 135 countries.65 These reports rely on users' location data to show daily changes in mobility patterns at the country or state/provincial level, such as fewer trips to transit stations, retail stores, parks, grocery stores, pharmacies, workplaces, or residential addresses. These data give public health officials and the general public a way to benchmark a region's response to COVID-19 relative to other regions and over time. Because the data are collected systematically across countries, they can also be used to compare behavioral responses across the world.
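The benchmarking idea behind the community mobility reports described above, comparing each day with a pre-pandemic baseline, can be sketched in Python. Google's published methodology defines the baseline as the median value for the same day of the week during the five-week period from January 3 to February 6, 2020; the sketch follows that rule but is not Google's code, and its input (a hypothetical mapping from dates to aggregate, anonymized visit counts for one place category) is an assumption.

```python
from datetime import date, timedelta
from statistics import median

# Baseline window used in Google's published methodology.
BASELINE_START = date(2020, 1, 3)
BASELINE_END = date(2020, 2, 6)


def weekday_baselines(visits):
    """Median visits for each weekday (0 = Monday) over the baseline window.

    `visits` is a hypothetical dict mapping a date to an aggregate visit count
    for one place category (for example, transit stations) in one region.
    """
    by_weekday = {}
    for day, count in visits.items():
        if BASELINE_START <= day <= BASELINE_END:
            by_weekday.setdefault(day.weekday(), []).append(count)
    return {wd: median(counts) for wd, counts in by_weekday.items()}


def percent_change(visits, day, baselines):
    """Percent change of `day` relative to the baseline for the same weekday."""
    base = baselines[day.weekday()]
    return 100.0 * (visits[day] - base) / base
```

Comparing against the same weekday keeps normal weekly rhythms (quieter weekends, busier weekdays) from being mistaken for a behavioral response. For instance, if a Wednesday normally sees 100 visits and sees 40 on April 1, 2020, the reported change is -60 percent.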
Another example of a private company repurposing its own data for public benefit is the internet speed test company Ookla, which provides a global index of internet speeds that ranks countries by mobile and fixed broadband performance.66 Governments and funders can use these data to prioritize investments in broadband coverage.

Map 4.2 Agricultural extension services can be tailored to the slower, older broadband internet accessible to many small-scale farmers
[Map: internet service among small-scale farmers; mobile network coverage on farms by 4G, 3G, 2G, or none]
Source: Mehrabi et al. 2020. Data at http://bit.do/WDR2021-Map-4_2.

Researchers are also combining global public intent and private intent datasets to prioritize funding streams for donors. One example is in the digital agricultural space, where farmers can access extension services on their cellphones. Digital agricultural interventions offer a solution to the dearth of agricultural extension agents in many lower-income countries, where the ratio of farmers to extension agents often exceeds 1,000 to 1.67 Digital services can provide farmers with expert scientific advice based on their local field, market, and climatic conditions. Yet most small-scale farmers live in areas with lower 3G and 4G coverage than areas with relatively high shares of large-scale farms (map 4.2 and figure 4.1).68 This finding suggests that the wave of digital agricultural services should focus on 2G solutions (such as voice and text messaging) to ensure that small-scale farmers are reached. Combining private intent broadband coverage data from the data aggregation company Mosaik (now part of Ookla) with public intent farm size data yields localized estimates of broadband coverage at a resolution of 10 square kilometers. This type of analysis can inform decisions about deploying the infrastructure needed for the faster broadband that smartphone-based digital services require.

Limitations in using private intent data for development

Despite the enormous potential offered by private intent data through repurposing and synergies, several important limitations and challenges affect their use for development projects. These issues should be taken into account in the design of future research and public policy.

Data coverage and representativeness

A key limitation of most private intent data is their lack of representativeness. Private intent data are often a by-product of the use of digital technologies such as mobile phones or the internet. Access to these technologies typically requires infrastructure such as electricity or broadband, which is distributed unequally in lower-income countries. In addition, because smartphone ownership is skewed toward those who can afford the phones, the data collected through these technologies primarily reflect the characteristics of a relatively wealthier share of the population.
A 2012 study combining call detail records (CDRs) and surveys found that mobile phone owners in Rwanda were wealthier, better educated, and predominantly male.69 Similar conclusions emerged from an analysis of the population of mobile phone owners in Kenya.70 The lack of representativeness is even more pronounced in social media data, which typically require users to be literate as well as to have internet access. Moreover, because of the access charges associated with internet use, typically only wealthier people can afford to use the internet on their mobile devices. Estimates suggest that less than one-third of the population uses the internet on a mobile phone in Ghana, Kenya, Nigeria, and Senegal, and less than 15 percent does so in Mozambique, Rwanda, Tanzania, and Uganda.71 To overcome the lack of representativeness of private intent data, development practitioners often rely on statistical methods to combine them with public intent data.

One important source of alternative data is satellite imagery, which can be either public intent or private intent, depending on the application. Images collected by satellites have the advantage of being fully representative of the population, and they are well suited to picking up measures of building density, which are highly correlated with population density and, by extension, economic well-being. Satellite data, however, come with an important limitation: they are typically available only for aggregated geographic units such as grids or villages. So-called "bottom-up" statistical techniques combine survey data with remote sensing indicators to permit greater geospatial precision (box 4.4).
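The top-down and bottom-up approaches to small-area population estimation contrasted in box 4.4 can be sketched in a few lines of Python with NumPy. This is a simplified illustration, not the method of Masaki et al. (2020) or of Facebook: ordinary least squares stands in for the richer models used in practice, and all indicator values are hypothetical.

```python
import numpy as np


def top_down(census_total, built_up_area):
    """Top-down: split a census total across cells in proportion to built-up area."""
    area = np.asarray(built_up_area, dtype=float)
    return census_total * area / area.sum()


def _design(features):
    """Prepend an intercept column to an (areas x indicators) feature array."""
    feats = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(len(feats)), feats])


def fit_bottom_up(features, population):
    """Bottom-up: least-squares fit of surveyed-area population on geospatial indicators.

    `features` holds one row per area covered by a survey listing exercise, with
    columns such as building count or nighttime lights (hypothetical here).
    Returns coefficients with the intercept first.
    """
    coef, *_ = np.linalg.lstsq(_design(features), np.asarray(population, dtype=float), rcond=None)
    return coef


def predict_bottom_up(coef, features):
    """Apply the calibrated model to areas the survey did not cover."""
    return _design(features) @ coef
```

For example, `top_down(600, [80.0, 20.0])` assigns 480 people to an urban cell with 80 hectares of built-up area and 120 to a rural cell with 20 hectares. If, hypothetically, only half of the urban built-up area were residential, allocating by residential area would give 400 and 200, which is the urban exaggeration box 4.4 describes. The bottom-up functions instead learn the relationship from surveyed areas and predict population for unsurveyed ones without needing a recent census total.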
Figure 4.1 Gaps in network coverage differ across farm sizes, affecting agricultural extension services
[Chart: network coverage (%) of 2G, 3G, and 4G internet service by farm size, in hectares: 0–1, 1–2, 2–5, 5–10, 10–50, 50–100, 100–200, >200]
Source: Mehrabi et al. 2020. Data at http://bit.do/WDR2021-Fig-4_1.

Data profiling and discrimination

Because of the complexity and unstructured nature of private intent data, data scientists are increasingly relying on modern machine learning methods and algorithms to analyze them.72 These algorithms can contain millions of parameters, which can be extremely costly and time-consuming to calibrate.73 Machine learning experts thus typically rely on algorithms that are "pre-trained" using very large quantities of private intent data, making them easy to use for a variety of tasks. Although these algorithms are extremely useful for extracting insights from complex datasets, researchers in recent years have realized that biases in the data used to calibrate these algorithms could contribute to discrimination,74 with adverse consequences for people's welfare. Other research found that a machine learning tool created to predict the future criminal behavior of defendants in the United States embedded racial discrimination: black defendants were twice as likely as their white counterparts to be falsely classified as future criminals.75 Similarly, image-sharing platforms such as Flickr, which have been the source of training data for various computer vision algorithms, have been shown to overrepresent light-skinned men between the ages of 18 and 40, leading to poorer performance by these algorithms when making predictions for underrepresented categories such as women or minorities.76

Similar issues arise when machine learning algorithms are pre-trained using text containing racist and sexist stereotypes. Text generation models trained on massive text corpora scraped from the web, such as OpenAI's GPT-2, have been found to generate racist and anti-Semitic text in response to specific inputs.77 When trained on Google News, word-embedding algorithms aimed at measuring the similarity between words tend to propagate the sexist biases reflected in the text, highlighting similarities between "man" and "computer programmer," whereas "woman" appears to be associated with "homemaker."78 Arguably, such discrimination can have larger consequences in lower-income countries, which typically lack safety nets and social protection mechanisms.

Box 4.4 Using statistical methods and private intent data to improve representativeness and geospatial precision

Combined data sources, by improving the representativeness and precision of survey data, enable indicators to be reported at finer spatial scales. One approach commonly used to improve representativeness when combining survey data with mobile phone data or satellite imagery is to average the data from different sources over a common geographic unit of analysis. For example, a welfare measure such as an asset index could be averaged across all households in a village (enumeration area). The results are then related to satellite imagery or mobile phone data. This procedure works well when extrapolating from imagery to predict average consumption for countries or large areas not covered by a survey. It can also be used to generate local estimates of welfare within a country, provided that an appropriate statistical method is used to incorporate information from the sample directly into the estimation procedure to obtain more precise estimates.

Facebook engineers have used deep-learning algorithms to detect buildings in satellite imagery, allowing them to downscale population estimates from the census to a much finer spatial resolution. However, these methods have significant drawbacks. Predictions based on models specified at aggregate levels will generally not deliver precise estimates unless they are combined in an appropriate way with survey-based estimates. Precision is an important consideration because most national statistical offices will not publish imprecise estimates due to quality concerns. Furthermore, geographic downscaling relies on a few key assumptions. Facebook assumes the population is distributed in proportion to the "built-up area," which leads to inconsistencies between the estimates and the census. For example, because a smaller portion of buildings in urban areas is residential, relying on built-up area to distribute population could exaggerate population counts in urban areas compared with rural ones.

An alternative method for estimating the population of small areas is to use "bottom-up" methods that draw on data from survey listing exercises rather than "top-down" disaggregation of census data. "Bottom-up" techniques offer the important advantage of being able to produce updated population estimates without a census, at a fraction of the cost. They use survey data to calibrate a model that relates population in the areas sampled by the survey to remote sensing indicators. Geospatial indicators that predict population density include the geographic size of the village, the number of buildings, the extent of built-up area, and the presence of nighttime lights. The model can then be used to generate population estimates nationwide. Similar methods can be used to generate more precise estimates of nonmonetary poverty.a They likely could be applied to a variety of socioeconomic indicators, including monetary poverty, labor market outcomes, health outcomes, and educational attainment.

a. Masaki et al. (2020).

Data transparency and manipulation

Both the data-generating process and the algorithms used to process private intent data suffer from a lack of transparency. The algorithms used by search engines are not public, and they are constantly optimized to improve users' experience. This process can lead to inaccurate predictions of policy outcomes, as with the notorious Google Flu Trends index. In 2009 a team of scientists at Google published a paper describing an innovative method to predict the number of flu cases in the United States using the volume of flu-related search terms on Google.79 Their Google Flu Trends index was initially able to predict official numbers ahead of the US Centers for Disease Control and Prevention (CDC), until it made headlines in 2013 for predicting twice the actual number of flu cases. Scientists investigating what went wrong realized that many search terms used as predictors were associated with the onset of winter instead of the onset of colds.80 This "overfitting" is a major concern when private intent datasets containing high-dimensional data (that is, data with a high number of features or independent variables) are used to nowcast policy outcomes that are infrequently observed. New generations of forecasting models based on private intent data should aim to draw on information from multiple private data sources to avoid being too dependent on the idiosyncrasies of a single source.

Even when accurate, predictive models are often so opaque that their predictions cannot be easily communicated to policy makers. Because machine learning is increasingly used to shape development policies, more research is needed to make complex algorithms transparent and interpretable, thereby increasing their legitimacy and ensuring they do not contribute to unequal outcomes. More research is also needed to understand trade-offs between interpretability and predictive performance.
For example, researchers have estimated models that predict poverty from satellite imagery in both Sri Lanka and Uganda by focusing on objects in images that correlate with standards of living, such as roads, buildings, and cars.81 In each country, the interpretable model performed as well as commonly used black-box computer vision algorithms, indicating that model interpretability does not necessarily come at the cost of performance. The performance of image recognition algorithms may be constrained, however, because they are initially trained to detect a wide variety of objects using millions of images, which may not isolate the most important portions of the images for the specific purpose of predicting poverty.

An additional challenge of relying on algorithms to design policy is that they can be manipulated. People can change their behavior in response to algorithmic decision-making to game the system and maximize their interests. For example, the nonprofit GiveDirectly facilitates direct cash transfers to poor households. As a proxy for poor living conditions, satellite imagery was initially used to target households with thatched roofs. When GiveDirectly's methods became common knowledge, some families pretended to live in a thatched structure near their home to qualify for the aid.82 This concern about manipulation increasingly motivates the design of machine learning algorithms that assign more weight to personal characteristics that are less likely to be subject to manipulation.83

Investments in data innovations: Building a culture of data

Effectively leveraging new types of data requires investing in human capital, data sharing, and research in lower-income countries. This section describes areas in which governments, donors, and advocates of corporate social responsibility can help promote innovative uses of data for public benefit in lower-income countries, thereby helping to build a culture of using data and evidence.
Investing in people

Building the skills of analysts and decision-makers. Leveraging the comparative advantages of public intent and private intent data requires a long-term approach to enhancing domestic human capital in lower-income countries. Investments in human capital should focus on decision-makers and analysts. Strengthening the data and statistical literacy of decision-makers can help them understand the potential utility and limitations of these new data sources. This understanding is key for them to champion a data innovation agenda and advocate for the required human, technological, and financial resources. Analysts, on the other hand, need skill sets that bridge many disciplines to leverage private intent data, including statistics, economics, computer science, geographic information systems (GIS), and the multidisciplinary field of data science. Although many of these skills are akin to those needed to bolster the capacities of national statistical offices (NSOs), teams with exposure to private sector data and data systems will be able to work more efficiently across data types and foster collaboration.

At the country level, it is critical to build analysts' skills to integrate public intent and private intent data for public policy design and evaluation. These skills include data engineering to manage, process, and link public intent and private intent data; analysis of integrated datasets using traditional statistical and econometric methods and the latest advances in machine learning; and visualization of the emerging insights. These skills must be augmented with acute awareness of the ethics and data protection dimensions of public intent and private intent data sources. Several competency frameworks developed for big data analytics are useful for a more granular understanding of the skill sets required for data acquisition, processing, analysis, visualization, and reporting.84 These broad directions for capacity building focus on catalyzing the use of new data sources, in contrast to the recommendations presented in chapter 2, which focus on strengthening data production within the public sector.

Enhancing tertiary education. The long-term process of acquiring these skills begins with enhancing tertiary education. Because of the wide array of competencies that data scientists are expected to possess, university and graduate degree programs may have to be altered, particularly in lower-income countries. Students need the foundational statistical skills central to understanding and using public intent data, as well as the frontier skills in artificial intelligence (AI) and machine learning at the heart of leveraging the value from the integration of public intent and private intent data. The curricula of degree programs in statistics, economics, computer science, and GIS could be revised to align formal education with the practical demands of jobs in data analytics. In addition, new degree, graduate, and certificate programs with a data science theme could be established.

Promoting partnerships with universities and private companies in higher-income countries. Such partnerships can be instrumental in achieving these education goals and enhancing training in contemporary data topics such as machine learning and AI.
These types of initiatives can help ensure that research in lower-income countries that leverages private intent data is tailored to local contexts and employs more local researchers. This would be a welcome trend because this research field has been predominantly led by principal investigators who are not nationals of these countries.

Proficiency with AI is one of the most coveted data skill sets. It involves feeding computers large amounts of data to train them to identify patterns and make predictions. For example, computers crunch seismic activity data to learn how to predict earthquakes,85 and process satellite images of agricultural areas to estimate crop yields.86 According to an analysis of self-reported job skills on the professional network platform LinkedIn, the United States leads in AI, followed by China (see figure 4.2).87 Low- and middle-income countries need to catch up to these emerging trends in skills. In South Africa, the minister of communications and digital technologies argues there is no shortage of talent in the Africa region, but rather a lack of visionary policy makers to drive digitization and enable key infrastructure such as data centers and cloud computing.88

Technical training can sometimes be obtained cheaply or at no cost. Some digital companies provide free online training, and their certifications often attract job seekers.89 Cisco's Networking Academy has trained more than 10 million people in low- and middle-income countries, often in partnership with local academic institutions with no or low-cost tuition. It also offers free online courses.90 Although basic tech knowledge is needed to participate in these options, these offerings suggest that relevant training can be obtained in many developing countries at low cost, provided good broadband internet connectivity is available. Popular cloud data management and analytical applications also feature graphical user interfaces, making it easier for those without advanced coding skills to use them.

Figure 4.2 Artificial intelligence specialists gravitate to the US market, no matter where they are educated
Panel a. Location of employment (%): United States, 59; China, 11; Europe, 10; Canada, 6; United Kingdom, 4; others, 10
Panel b. Location of study (%): China, 29; United States, 20; Europe, 18; India, 8; Canada, 5; United Kingdom, 4; Iran, Islamic Rep., 3; Israel, 3; others, 10
Source: MacroPolo, "The Global AI Talent Tracker," https://macropolo.org/digital-projects/the-global-ai-talent-tracker/. Adapted with permission of MacroPolo/Paulson Institute; further permission required for reuse. Data at http://bit.do/WDR2021-Fig-4_2.
Note: Country affiliations are based in panel a on the headquarters of institutions in which researchers currently work and in panel b on the country in which researchers received their undergraduate degree.

Increasing training, mentorship, and on-the-job training. Improvements along the formal education supply chain can be augmented by on-the-job training efforts that target a broad coalition of data producers and users across the public sector, academia, and civil society. Increasing access to online training platforms (such as DataCamp and Coursera) and online degree and certificate programs, as well as free courses offered by prestigious universities in higher-income countries, can help build capacity across an impressive array of topics related to both foundational and frontier data analytics. These activities could be supplemented by continued support of emerging data science initiatives that provide scope for collaboration, mentorship, and learning, including the Deep Learning Indaba Institute,91 Data Science Africa conferences,92 and the competition platform Zindi.93

Skills training companies and platforms have recently emerged to support the development of digital data skills in developing countries and to link trainees to employers. Upskilling platforms such as Andela and Gebeya in Africa and Revelo in Brazil train students in data analytics and software development. Andela, founded in 2014, is training young people to meet the demand for information technology (IT) talent globally and within Africa.
Gebeya, founded in 2016, matches trainees with companies in Africa.94 And data labeling companies such as CloudFactory in Kenya and Nepal and Samasource in Kenya are creating jobs in cleaning, categorizing, and labeling data used for AI applications.95 As it continues to support short-term training and mentorship programs in lower-income countries, the international community should evaluate the conditions for achieving sustained improvements in local capacity, in order to identify short-term capacity-building models that hold promise.

Strengthening data literacy among senior leadership and creating institutional environments that encourage the use of sophisticated data and evidence. The big push to build an army of data scientists for jobs in the public sector, private sector, and civil society must be complemented with efforts to create enabling institutional and leadership environments (see chapter 8) that place a high premium on the use of data and evidence, both internally for the management of these institutions and externally for understanding and producing policies that enhance welfare. To help strengthen data literacy, especially in low-capacity settings, regional and international development partners can leverage their expertise or technical partnerships to provide governments with technical assistance. They can also organize objective peer reviews to gauge the relevance and accuracy of complex research that hinges on the integration of public intent and private intent data sources, including efforts sponsored by international agencies themselves (see spotlight 2.2). That said, strengthening the data literacy of the senior leadership of public sector institutions will not guarantee that they will seek data and evidence when designing policies, especially if the resulting insights do not appear to serve the political objectives of their government (see chapter 8).
As discussed in chapter 2, mutually reinforcing constraints in financing, human capital, data governance, and data demand must be overcome as part of a long-term, holistic plan backed by domestic support from politicians of the major political parties, academia, and civil society. In the short term, strengthening human capital in NSOs and line ministries in lower-income countries in the production and use of public intent data will indirectly contribute to the pool of skill sets required for public intent and private intent data to be integrated into official statistics and knowledge products generated within the public sector (see chapters 2 and 9 for further discussion). International organizations can provide these institutions with technical assistance to cultivate open data practices and to build skills in the creation and dissemination of public use census, survey, and administrative datasets that are subject to international best practices in deidentification. This effort can catalyze downstream research that brings together public intent and private intent data sources. Statistical capacity-building projects financed by international organizations and traditionally focused on the production and use of public intent data should be expanded systematically to allow for investments in skills critical to the integration of public intent and private intent data sources. NSOs could establish a business line on experimental statistics (that is, statistics that leverage new data sources and methods to better respond to users' needs and can be viewed as official statistics "in the making"). This business line would provide a more direct route to investing in staff who can conduct cutting-edge research grounded in synergies among public intent and private intent data sources.96

Revamping NSOs to perform nontraditional roles with private intent data. In general, for NSOs to maintain relevance in a landscape in which they no longer generate the majority of the data, they should be empowered data stewards endowed with qualified staff who can perform nontraditional roles. NSOs must be able to field requests for access to confidential data that can be used to calibrate and validate models that fuse public intent and private intent data sources. By pursuing a work program on experimental statistics, NSOs should aspire to be proactive contributors to research that assesses the public intent data requirements of synergistic applications. The Data Science Campus in the United Kingdom's Office for National Statistics (ONS) is an example of a unit in an NSO tasked with leveraging the latest advances in data science and the synergies between public intent and private intent data sources to serve the public good. The Campus works on data science projects not only for the ONS, but also for the UK government and international organizations, in collaboration with partners from academia and the private sector.97 Twinning arrangements between NSOs in high-income countries that have similar initiatives and NSOs in low- and middle-income countries could be one way to strengthen NSO capabilities in low-capacity environments and to create units akin to the ONS Data Science Campus.

NSOs will also need to grapple with data protection issues. They must, for example, determine whether spatially deidentified data are sufficient for calibration purposes and what minimum volume and scope of confidential data will have to be accessed for specific applications. NSOs can also carefully identify applications in which access to confidential data is not required.
However, accommodating requests for applications with well-defined and well-articulated confidential data needs, or responding to time-sensitive requests tied to immediate policy needs (such as a humanitarian or disaster response), ultimately requires that NSOs have personnel who are trained in data protection and law and who can enter into and enforce data sharing agreements to mitigate data protection risks. To fulfill these roles, NSOs must receive a significant infusion of financial and human capital and should consider actively engaging, at least in the short term, international organizations or academic institutions and research organizations, at both the local and international levels, to bridge the gaps in internal institutional and technical capacity.

Investing in data accessibility. Accessing private intent data remains challenging, especially in lower-income countries. Large barriers, such as the need to protect customers and maintain competitive advantages, prevent companies from sharing their data. In addition, pulling data from a company's database requires computing and human resources that typically fall outside a business's key performance indicators. If a public organization submits a poorly formulated request for a company's data, compiling and exporting the data can become a time-intensive burden on the company. Even if a company is willing and able to share its data, the diversity of private intent data types makes it difficult to create data sharing standards. Shared data must have clear documentation, be in a usable format that is interoperable with other private and public datasets for integration, and have been deidentified. Creating these types of standards may require third parties to coordinate efforts and will place more resource burdens on companies.

Utilizing data collaboratives and research partnerships. These cooperative arrangements are essential ways for different sectors, research institutions, and governments to share data.
The Open Data Lab describes data collaboratives as moving beyond public-private partnerships to pool data resources that researchers use for public benefit.98 A successful example can be found in Nairobi, Kenya, where researchers have partnered with local government agencies to develop spatially integrated road safety datasets with inputs from administrative, social media, private, and traditional sources (see spotlight 4.2 for details). Data collaboratives can be coordinated by civil society or universities, or through corporate social responsibility programs.
Facebook’s Data for Good initiative is an example of how technology companies can be incentivized to share their data through corporate social responsibility programs. By leveraging customer data and engaging with civil society and university partners, Facebook offers a suite of innovative datasets intended to aid public policy decisions. Offerings range from mobility data and downscaled population maps to data on electric grid coverage.
Data collaboratives and research partnerships could provide access to companies’ application programming interfaces (APIs) and cloud services through tiers made available to the public sector. Social media platforms such as Twitter provide APIs that allow users to download text data through free tiers. Private companies could be encouraged to share their data at reduced cost for public initiatives, with special grants for researchers or tax breaks for the data provider. Cloud computing services, such as Google Cloud and Amazon Web Services, offer small education grants that give researchers access to the computing infrastructure needed to leverage these often large datasets. Flowminder, a Swedish nongovernmental organization, provides code, instructions, and support for mobile network operators to aggregate, deidentify, and share their CDR data. Its open-source tool, FlowKit, provides APIs, code, and databases to help companies share these sensitive records with researchers.99
Trusted intermediaries are building platforms that either provide researchers with private intent data or let researchers send programming code to private companies, which in turn run the code on their private intent data on the researcher’s behalf and share aggregated research insights. Opportunity Insights, a nonpartisan, nonprofit research organization based at Harvard University, offers a Track the Recovery platform that gives researchers access to near real-time economic data to understand the COVID-19 policy response in the United States. As the broker of the data sharing agreements, Opportunity Insights deidentifies data to facilitate sharing while protecting customers’ and companies’ data. For example, it protects companies’ data through aggregation and by creating relative indicators that mask actual revenue and profit.
OPAL (“Open Algorithms”) takes a different approach. OPAL is a nonprofit partnership created by groups at MIT Media Lab, Imperial College London, the telecommunications company Orange, the World Economic Forum, and Data-Pop Alliance. Its platform allows researchers to send companies certified open-source algorithms that are then run behind the companies’ firewalls.
Despite the promise of these innovative data sharing pathways, many are not available in lower-income countries.
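The two protections attributed to Opportunity Insights earlier in this section, aggregation and relative indicators, can be illustrated with a minimal sketch. It is not Opportunity Insights’ pipeline: the record layout, the weekly granularity, the three-firm suppression threshold (MIN_FIRMS), and the function name relative_index are all hypothetical. Revenue is summed across firms in each region and week, cells backed by too few firms are suppressed, and each surviving cell is released only as a percent change from the region’s baseline mean, so no absolute revenue figure leaves the pipeline.

```python
from collections import defaultdict

MIN_FIRMS = 3  # hypothetical threshold: cells with fewer distinct firms are suppressed


def relative_index(records, baseline_weeks):
    """Aggregate firm revenue by (region, week) and publish each cell only as a
    percent change from the region's mean over the baseline weeks.

    records: iterable of (firm_id, region, week, revenue) tuples.
    """
    totals = defaultdict(float)
    firms = defaultdict(set)
    for firm_id, region, week, revenue in records:
        totals[(region, week)] += revenue
        firms[(region, week)].add(firm_id)

    # Aggregation safeguard: drop cells backed by too few firms.
    safe = {cell: total for cell, total in totals.items()
            if len(firms[cell]) >= MIN_FIRMS}

    # Baseline: mean of the surviving baseline-week totals for each region.
    baseline = {}
    for region in {r for r, _ in safe}:
        values = [safe[(region, w)] for w in baseline_weeks if (region, w) in safe]
        if values:
            baseline[region] = sum(values) / len(values)

    # Relative indicator: percent change, so absolute revenue is never released.
    # Regions without a usable (nonzero) baseline are dropped.
    return {
        (region, week): round(100.0 * (total - baseline[region]) / baseline[region], 1)
        for (region, week), total in safe.items()
        if baseline.get(region)
    }


records = [
    ("f1", "A", 0, 100.0), ("f2", "A", 0, 50.0), ("f3", "A", 0, 50.0),
    ("f1", "A", 1, 80.0), ("f2", "A", 1, 40.0), ("f3", "A", 1, 40.0),
    ("f4", "B", 0, 500.0), ("f4", "B", 1, 400.0),  # region B has a single firm
]
print(relative_index(records, baseline_weeks=[0]))  # {('A', 0): 0.0, ('A', 1): -20.0}
```

Region B never appears in the output because a single firm’s revenue could be read straight off its aggregate; region A is reported only as a 20 percent decline, not as its underlying totals.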
Lower-income countries rarely participate in data collaboratives, according to data compiled by the Open Data Lab.100 A similar pattern appears in the small number of studies of lower-income countries that leveraged private intent data in the early stages of the COVID-19 pandemic (box 4.2). More investment is needed to make private intent data accessible for public benefit in lower-income countries.
Investing in research
Investments in research are needed to develop methods and to enable lower-income countries to grow research programs that leverage private intent data for public policy. The research community can achieve quick wins by focusing on foundational questions, such as testing whether methods validated in one region carry over to contexts where data are sparse. Over the longer term, research strategies would benefit from building validation and training datasets in lower-income countries, to avoid problems akin to data profiling and discrimination when models pre-trained in higher-income countries are applied elsewhere. The discussion that follows describes some of the high-priority research needed to advance the use of private intent data for public benefit in the short and long term.
Shorter-term research needs. Because much of the current innovation in using private intent data is led by researchers and technology companies in higher-income countries, many of the available methods are not tailored to the development context.101 Even when a solution is developed for and validated in a particular lower-income country, understanding whether and when it can be extended to other lower-income countries can enable research in data-sparse contexts.
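Testing whether a model carries over can start with a check as simple as the following minimal sketch, which is not drawn from the cited studies: it fits a one-feature least-squares model in a hypothetical source country and compares its in-country R² with its R² when applied unchanged to a hypothetical target country. The toy data, the single feature, and the helper names (fit_ols, r_squared, transfer_check) are illustrative assumptions.

```python
def fit_ols(x, y):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the line; negative if the line
    does worse than simply predicting the mean of y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def transfer_check(source, target):
    """Fit on the source country, then score the same model on both countries.
    source and target are (x, y) pairs of equal-length lists."""
    slope, intercept = fit_ols(*source)
    return {
        "in_country_r2": r_squared(*source, slope, intercept),   # in-sample fit
        "transferred_r2": r_squared(*target, slope, intercept),  # applied unchanged
    }


# Hypothetical toy data: x is a phone-usage indicator, y a survey wealth index.
source = ([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
target = ([0, 1, 2, 3, 4], [3, 3.5, 4, 4.5, 5])
print(transfer_check(source, target))  # {'in_country_r2': 1.0, 'transferred_r2': -10.0}
```

A transferred R² below zero, as in this toy case, means the imported model predicts the target country’s outcome worse than that country’s own mean would. A real check would use held-out data in both countries and richer models.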
The strength of such relationships varies across contexts: international phone call usage correlates with wealth more strongly in Rwanda than in Balkh province in Afghanistan, yet the finding can still be useful in contexts such as Balkh province.102 Similarly, granular poverty maps that use digital trace data from mobile phones hold great potential for better targeting of social services, but the patterns that algorithms use to make poverty predictions may differ from context to context.103 Research is needed to determine when granular poverty estimates created for one country can be transferred to another country and when they will lead to misleading maps.
In the short term, researchers also need to produce methods that preserve privacy while combining public intent and private intent data. As more datasets are made available to researchers and decision-makers, more opportunities arise to reverse-engineer traditional deidentification methods. If these risks are not eliminated, individuals and companies may be reluctant to share their data. One example of how public intent data are designed to prevent de-anonymization is the handling of GPS data collected in household surveys for the Demographic and Health Survey (DHS) and the Living Standards Measurement Study (LSMS). Even if these surveys collect GPS-based locations for communities and households, the resulting data are not included in public use datasets, to ensure the confidentiality of respondents. Any third-party user that obtains DHS- or LSMS-type survey data has access only to spatially offset locations of survey enumeration areas. For example, a household’s location is represented using the 10 square kilometer area within which the house is located. These surveys are also good examples of providing documentation that makes the precision and accuracy of the deidentified data explicit.
Concerns about data protection have limited the eagerness to share data, even in critical times such as the Ebola crisis. Historically, data deidentification techniques have maintained an equilibrium between the producers and consumers of data, preserving individuals’ privacy while limiting information loss. However, deidentification techniques have proven increasingly imperfect with high-dimensional private intent data. Despite the use of standard deidentification techniques, one study found that four data points were enough to reidentify 95 percent of individuals in a mobile phone dataset of 1.5 million people.104
In this context, new data sharing frameworks have been proposed to mitigate privacy risks while maximizing the informative potential of private intent data. Researchers have proposed four models for the use of mobile phone data, depending on the level of risk tolerance and the number of potential third-party users.105 The Social Science One initiative, which allows researchers from academic institutions to access Facebook data at scale,106 is an example of how new data sharing frameworks could be applied to access private intent data, paving the way for future public-private collaborations. As noted, more research will be needed to design methods that protect the privacy of private intent data while minimizing the loss of precision associated with using these data in applications aimed at informing public policy.107
In a context of low data and coding literacy, off-the-shelf programming tools can lead to more effective and responsible use of private intent data.
Flowminder’s FlowKit is an example of an open-source solution that helps companies deidentify, clean, and export their data effectively for policy applications.108 Using FlowKit, Flowminder and its partners have been able to rapidly integrate CDRs into the COVID-19 response. Aequitas is another open-source toolkit that provides an intuitive way to audit machine learning models for discrimination and bias.109 Tools of this kind enable researchers to access data, and companies to share data, without the specialized skills otherwise needed to collate and deidentify datasets. Ideally, these research tools should be designed to promote access to data and to share technical knowledge among lower-income countries, from higher-income countries to lower-income countries, and from lower-income countries to higher-income countries. Research funding should not only operate on short-term project cycles but also support the institutional setup of research labs and institutes in lower-income countries.
Longer-term research needs. These needs include devising best practices and quality standards. Most decision-makers will not be well versed in the latest data methods, and best practices and quality standards can foster trust in leveraging new data types for policy. Such standards and governing institutions already exist for public intent data; conceivably, they could be adapted for private intent data. For example, traditional data collection using sample surveys has many imperfections, but by studying them extensively, the research community has devised ways to address them or to quantify the errors they introduce.110 In the same way, researchers need to study the limitations of private intent data and develop appropriate quality standards for their use in public policy. For example, there is currently no consensus on the criteria needed to determine whether a poverty map is fit for use in resource allocation.
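No such criterion is settled. The following sketch illustrates only one quantity that fitness-for-use criteria might draw on; it is not a standard proposed in this Report. It compares a poverty map’s predicted wealth ranking with survey-measured poverty status for the same units and reports the exclusion error (truly poor units that a program reaching the lowest-ranked share would miss) and the inclusion error (targeted units that are not poor). The function name, the convention that a lower score means poorer, and the coverage share are assumptions.

```python
def targeting_errors(predicted_wealth, truly_poor, share):
    """Compare a poverty map's ranking with survey ground truth.

    predicted_wealth: model scores per unit (assumed: lower = poorer).
    truly_poor: survey-measured poverty status (bool) for the same units.
    share: fraction of units the program can reach (e.g., 0.4).
    Returns (exclusion_error, inclusion_error).
    """
    n = len(predicted_wealth)
    k = round(share * n)
    ranked = sorted(range(n), key=lambda i: predicted_wealth[i])
    targeted = set(ranked[:k])  # the poorest-ranked units
    poor = {i for i, flag in enumerate(truly_poor) if flag}
    # Exclusion error: truly poor units the program misses.
    exclusion = len(poor - targeted) / len(poor) if poor else 0.0
    # Inclusion error: targeted units that are not actually poor.
    inclusion = len(targeted - poor) / len(targeted) if targeted else 0.0
    return exclusion, inclusion


# Hypothetical toy data for ten units.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
poor_flags = [True, False, True, True, False, False, True, False, False, False]
print(targeting_errors(scores, poor_flags, share=0.4))  # (0.25, 0.25)
```

Both errors depend on the coverage share as well as on the map, so any threshold for fitness would likely need to be stated for a given program budget.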
International organizations can play a major role in this process by providing platforms for discuss- ing, formulating, and promoting these practices and standards (see spotlight 8.1). The working groups established under the auspices of the United Nations Statistical Commission on household surveys, open data, and big data may provide insight into the types of commissions that could be established.111 Coordinating investment In the longer term, coordinated investment in high-quality training data from lower-income coun- tries will also be needed. Currently, private intent data are mostly repurposed for machine learning applica- tions, which require high-quality data collected on location via remote sensing to train algorithms. For example, over the last five years pioneering research on small-scale farming systems has successfully com- bined high-quality georeferenced survey data with high-resolution, multispectral satellite imagery from public sources (Sentinel-2) and private sources (Planet and Maxar, formerly Digital Globe) to obtain crop yield estimates on individual plots.112 These efforts have shown the importance of using high-quality ground data—including georeferenced plot outlines and objective measures of crop yields—to calibrate and validate remote sensing models that can, in turn, churn out high-resolution grids for crop types and crop yields for entire regions and countries. One of the challenges preventing the rapid scale up of these efforts is the lack of knowledge on the required volume and content of georeferenced
  • 162. 142 | World Development Report 2021 microdata that should be collected through surveys to inform downstream remote sensing applications capable of meeting needs for spatially disaggregated estimation and reporting. These challenges could be addressed by research. Similarly, to analyze natural language data to, for example, measure attitudes on certain topics in the population, researchers typically rely on pre-trained language models (that is, models already trained on a large corpus of text). The lack of these pre-trained language models in languages other than the major ones has been a barrier to the analysis of text data in low- and middle-income countries.113 If the people in these countries are themselves the designers, curators, and owners of location-specific, high-quality training data to test private intent data, the center of research gravity would naturally shift toward lower-income countries. Table4.1summarizesselectedshort-andlong-term research needs. Answers to the questions listed in the table will vary in accordance with the development Research area Examples of research gaps Societal impacts •  How do we ensure that algorithmic-based policy making can lead to fair outcomes? •  How can we increase the transparency and interpretability of policy predictions using private intent data? •  How can we design algorithms that can be safeguarded against manipulation? •  What are the trade-offs between granularity and precision, and what is the optimal mix for targeting of development programs? Quality standards •  How can standards be created, agreed on, updated, and communicated to the general development community? Who needs to be part of these conversations? •  To ensure that policy makers can trust and use results, what should the standards be for accuracy and precision for frontier applications that use private intent data or that combine public intent and private intent data? 
External validitya •  How promising is the approach of building models in countries that have data and applying them to countries with limited data? •  How can issues akin to data profiling and discrimination be avoided when using pre-trained models from higher-income countries in cases of novel development use? •  To what extent can applications that combine public intent survey data with private intent data predict values calculated from census data within a country? Machine learning •  How does the approach to machine learning and spatial feature selection need to change from common machine learning tasks to more specialized tasks that will aid development policy? •  Which features best predict spatial variation in development outcomes in different contexts? What are the trade-offs between predictive accuracy and cost? Training and validation data •  What should be the required volume of and approach to public intent data collection for calibrating and validating machine learning algorithms that combine public intent and private intent data? Deidentificationb •  How do deidentification methods need to change to protect individuals and companies when private intent data are used for public benefit? •  How does (spatial) deidentification of public intent data affect the accuracy and precision of applications that use public intent data to calibrate and validate machine learning algorithms that combine public intent and private intent data? Capturing longitudinal change •  How do accuracy and precision differ in applications that aim to estimate longitudinal change versus obtaining cross-sectional predictions for the same development outcome? •  What features best predict longitudinal change in different contexts? •  How can we ensure the stability over time of algorithms aimed at predicting changes in policy outcomes? 
•  When public intent survey data are combined with imagery—specifically, spatial features (predictors) extracted via deep-learning techniques—in order to derive high-resolution estimates of a development outcome, how do the spatial, spectral,c and temporal resolution of satellite imagery affect the accuracy and precision of the predictions for the outcome of interest? •  Do these effects vary based on the decisions on the size of satellite imagery grids that are processed for extracting spatial features? Table 4.1 Selected research gaps to be addressed to advance the use of private intent data for development a. External validity relates to the research findings of one location holding true in another location. b. The term deidentification is used instead of anonymization because, although data are processed to deidentify any individual, these data may become identifiable in the future as computing and machine learning advance. Thus data may never be truly anonymized. c. “Spectral” refers to different wavelengths on the visual spectrum. Satellite images typically have multiple “bands” that capture different spectral ranges.
  • 163. Creative reuses of data for greater value | 143 outcome/process that researchers are aiming to bet- ter measure and understand through the use and augmentation of private intent data. For example, the requirements for high-resolution estimation of population density will differ from requirements for estimating crop yields. The growing availability and use of private intent data for development purposes have potentially large benefits, especially when paired with public intent data. However, the way forward requires a condu- cive and enabling environment that trains both ana- lysts and higher-level decision-makers to consider critically issues of data protection, discrimination, manipulation, representativeness, and transparency. Repurposing and combining public intent and private intent data are central to getting more value from data, but the benefits must be shared equitably while safeguarding against harmful outcomes. Part II of this Report describes the building blocks of a social contract that enables such data flows, including infra- structure policies, legal and regulatory frameworks for data, related economic policies, and the institu- tions of data governance. Notes 1. Bengtsson et al. (2011). 2. Chetty et al. (2020); Oliver et al. (2020). 3. Beraja, Yang, and Yuchtman (2020). 4. Salganik (2017). 5. Salganik (2017). 6. Serajuddin et al. (2015). 7. IEAG (2014). 8. Demombynes and Sandefur (2015). 9. Tiecke and Gros (2016). 10. Stephens-Davidowitz (2017). 11. For the 2008 and 2012 US presidential elections, Stephens-Davidowitz (2017) found that an area’s search rate for terms with racial overtones was a robust nega- tive predictor of presidential candidate Barack Obama’s vote share. 12. WHO (2008). 13. Adda (2016). 14. Ihantamalala et al. (2018); Milusheva (2020); Weso- lowski et al. (2012). 15. González, Hidalgo, and Barabási (2008); Le Menach et al. (2011); Tatem et al. (2009). 16. Wesolowski et al. (2012). 17. Wesolowski et al. (2012). 18. 
Peak et al. (2018). After the outbreak, they studied how mobile phone data for Sierra Leone could have been used to evaluate the impacts of interventions meant to decrease travel during the epidemic. 19. COVID-19 National Emergency Response Center (2020). 20. Burns (2020). 21. Chang et al. (2020); Maas et al. (2019). 22. Aktay et al. (2020). 23. Lai et al. (2020); Pepe et al. (2020). 24. Salathé et al. (2012). 25. McCall (2020). 26. PAHO and WHO (2016). 27. McGough et al. (2017). 28. Kraemer et al. (2019). 29. Yang et al. (2017). 30. Milinovich et al. (2014). 31. Internal Displacement Monitoring Center (IDMC), Data of GIDD (Global Internal Displacement Data- base), https://www.internal-displacement.org/database /displacement-data. 32. Ritchie and Roser (2019). 33. BBC News (2018); CNN Indonesia (2018). 34. Bengtsson et al. (2011); Lu, Bengtsson, and Holme (2012); Wilson et al. (2016). 35. Robinson, Power, and Cameron (2013). 36. Robinson, Power, and Cameron (2013). 37. Kongthon et al. (2012). 38. Resch, Usländer, and Havas (2018). 39. Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially to determine whether the writer’s attitude toward a topic or product is positive, negative, or neutral. See “sentiment analysis,” Lexico, Oxford University Press, https://www.lexico.com/en /definition/sentiment_analysis. 40. Reynard and Shirgaokar (2019). 41. See “Case Study 5: Delivering Remote Flood Analytics as a Scalable Service,” pages 61–68 in Sylvester (2019). 42. Blumenstock, Cadamuro, and On (2015); Jean et al. (2016); Yeh et al. (2020). 43. Blumenstock, Cadamuro, and On (2015). 44. Frias-Martinez, Frias-Martinez, and Oliver (2010). 45. Aiken et al. (2020). 46. Llorente et al. (2015). 47. Glaeser, Kim, and Luca (2018). 48. Blumenstock (2016). 49. Bonnet, Lechat, and Ridde (2018). 50. Williams, Idowu, and Olonade (2015). 51. Kelley, Lane, and Schönholzer (2020). 52. 
Dai and Sujon (2019). 53. Milusheva et al. (2020). 54. Kelley, Lane, and Schönholzer (2020). 55. Pratihast et al. (2014). 56. See World Resources Institute, Global Forest Watch (dashboard), https://www.globalforestwatch.org/. 57. WRI (2019). 58. See, for example, Janaagraha Centre for Citizenship andDemocracy,IPaidaBribe(dashboard),https://www .ipaidabribe.com/about-us#gsc.tab=0.I. 59. Hlatshwayo et al. (2018). 60. Inter-American Development Bank, “Countries That Have Already Implemented the Investment Map Initia- tive,” https://www.iadb.org/en/reform-modernization -state/countries. 61. Marshall (2012). 62. Deaton (2008); Falk et al. (2018). For a list of projects that have used Gallup World Poll data, see Gallup, “Work- ing Together to Change the World,” https://www.gallup
  • 164. 144 | World Development Report 2021 .com/analytics/318176/public-sector-success-stories .aspx. 63. Goldstein, Gonzalez Martinez, and Papineni (2019). 64. Goldstein, Gonzalez Martinez, and Papineni (2019). 65. See Google, Community Mobility Reports (database), https://www.google.com/covid19/mobility/?hl=en. 66. See Ookla, Speedtest Global Index (database), https:// www.speedtest.net/global-index. 67. Davis et al. (2010). 68. Mehrabi et al. (2020). 69. Blumenstock and Eagle (2012). 70. Wesolowski et al. (2012). 71. Frankfurter et al. (2020). 72. Jean et al. (2016). 73. Strubell, Ganesh, and McCallum (2019). 74. Zou and Schiebinger (2018). 75. Angwin et al. (2016). 76. Buolamwini and Gebru (2018). 77. Wallace et al. (2019). 78. Bolukbasi et al. (2016). 79. Ginsberg et al. (2009). 80. Lazer et al. (2014). 81. Ayush et al. (2020); Engstrom, Hersh, and Newhouse (2017). 82. Blumenstock (2018). 83. Björkegren, Blumenstock, and Knight (2020). 84. Carretero, Vuorikari, and Punie (2017); GSS (2016); Vale and Gjaltema (2020). 85. Perol, Gharbi, and Denolle (2018). 86. RTI International, “Impact: Using Satellite Images and Artificial Intelligence to Improve Agricultural Resilience,” https://www.rti.org/impact/using-satellite -images-and-artificial-intelligence-improve-agricultural -resilience. 87. Perisic (2018). 88. ITU (2020). 89. Flowers (2019). 90. Cisco Systems, “Cisco Networking Academy,” https:// www.cisco.com/c/en/us/about/csr/impact/education /networking-academy.html. 91. Deep Learning Indaba Institute, https://deeplearning indaba.com/2020/. 92. Data Science Africa, http://www.datascienceafrica .org/. 93. Zindi (2020). 94. Buckholtz (2019). 95. Kaye (2019). 96. 
See, for example, the related efforts under the Euro- pean Statistical System by Eurostat, Statistics Den- mark, Destatis (Germany), National Statistics Institute (Spain), ISTAT (Italy), Central Statistical Bureau of Lat- via, Statistics Netherlands, Statistics Poland, Statistics Portugal, National Institute of Statistics (Romania), Statistics Finland, Statistics Iceland, and the Federal Statistical Office (Switzerland)—see European Sta- tistical System, Eurostat, “Experimental Statistics,” Luxembourg, https://ec.europa.eu/eurostat/web/ess /experimental-statistics. 97. For more information on the ONS Data Science Campus and its projects, see Data Science Campus, Office for National Statistics, “Data Science for Public Good: Projects,” https://datasciencecampus.ons.gov.uk /projects/. 98. GovLab, Tandon School of Engineering, New York Uni- versity, “Data Collaboratives,” https://datacollaboratives .org/. 99. Flowminder Foundation, “FlowKit CDR Analytics Tool- kit,” https://flowkit.xyz/. 100. GovLab, Tandon School of Engineering, New York Uni­ ver­sity, “Data Collaboratives,” https://datacollaboratives .org/. 101. Blumenstock (2018). 102. Aiken et al. (2020). 103. Blumenstock (2018). 104. de Montjoye et al. (2013). 105. de Montjoye et al. (2013). 106. Social Science One, Institute for Quantitative Social Science, Harvard University, “Building Industry- Academic Partnerships,” https://socialscience.one/home. 107. Dwork and Roth (2014). 108. Flowminder Foundation, “FlowKit CDR Analytics Tool- kit,” https://flowkit.xyz/. 109. Saleiro et al. (2019). 110. Bethelehem (2009). 111. United Nations Statistical Commission, Statistics Division, Department of Economic and Social Affairs, United Nations, “Active Groups under the Statistical Commission by Pillar and Type of Group,” https:// unstats.un.org/unsd/statcom/groups/. 112. Burke and Lobell (2017); Gourlay, Kilic, and Lobell (2019); Jain et al. (2016); Lambert et al. (2018); Lobell et al. 
(2020). 113. Zindi (2020). References Adda, Jérôme. 2016. “Economic Activity and the Spread of Viral Diseases: Evidence from High Frequency Data.” Quarterly Journal of Economics 131 (2): 891–941. Aiken, Emily L., Guadalupe Bedoya, Aidan Coville, and Joshua Evan Blumenstock. 2020. “Targeting Develop- ment Aid with Machine Learning and Mobile Phone Data: Evidence from an Anti-Poverty Intervention in Afghanistan.” In COMPASS ’20: Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies, 310–11.NewYork:AssociationforComputingMachinery. Aktay, Ahmet, Shailesh Bavadekar, Gwen Cossoul, John Davis, Damien Desfontaines, Alex Fabrikant, Evgeniy Gabrilovich, et al. 2020. “Google COVID-19 Community Mobility Reports: Anonymization Process Description (Version 1.0).” April 8, 2020. https://arxiv.org/abs/2004 .04145v1. Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. “Machine Bias: There’s Software Used across the Country to Predict Future Criminals, and It’s Biased against Blacks.” ProPublica, May 23, 2016. https://www.propublica.org/article/machine-bias-risk -assessments-in-criminal-sentencing. Ayush, Kumar, Burak Uzkent, Marshall Burke, David B. Lobell, and Stefano Ermon. 2020. “Generating
  • 165. Creative reuses of data for greater value | 145 Accountability, and Transparency, 23–24 February 2018, New York, NY, USA, edited by Sorelle A. Friedler and Christo Wilson, 77–91. Cambridge, MA: MIT Press. https:// dam-prod.media.mit.edu/x/2018/02/06/Gender%20 Shades%20Intersectional%20Accuracy%20Disparities .pdf. Burke, Marshall, and David B. Lobell. 2017. “Satellite-Based Assessment of Yield Variation and Its Determinants in Smallholder African Systems.” PNAS, Proceedings of the National Academy of Sciences 114 (9): 2189–94. https://doi .org/10.1073/pnas.1616919114. Burns, Sarah. 2020. “How Anonymized Mobile Data Are Helping Ghana Fight COVID-19.” Global Partnership for Sustainable Development Data, United Nations, New York. https://www.data4sdgs.org/news/how -anonymized-mobile-data-are-helping-ghana-fight -covid-19. Carretero, Stephanie, Riina Vuorikari, and Yves Punie. 2017. “DigComp 2.1: The Digital Competence Framework for Citizens, with Eight Proficiency Levels and Examples of Use.” JRC Working Paper JRC106281, Joint Research Center, EU Science Hub, Seville, Spain. Chang, Meng-Chun, Rebecca Kahn, Yu-An Li, Cheng-Sheng Lee, Caroline O. Buckee, and Hsiao-Han Chang. 2020. “Modeling the Impact of Human Mobility and Travel Restrictions on the Potential Spread of SARS-CoV-2 in Taiwan.” medRxiv, April 11, 2020. https://doi.org/10.1101 /2020.04.07.20053439. Chetty, Raj, John N. Friedman, Nathaniel Hendren, Michael Stepner, and the Opportunity Insights Team. 2020. “How Did COVID-19 and Stabilization Policies Affect Spending and Employment? A New Real-Time Economic Tracker Based on Private Sector Data.” NBER Working Paper 27431, National Bureau of Economic Research, Cam- bridge, MA. https://doi.org/10.3386/w27431. CNN Indonesia. 2018. “BNPB: Seluruh Buoy Deteksi Tsu- nami di Indonesia Rusak.” CNN Indonesia, September 30, 2018. 
https://www.cnnindonesia.com/nasional/20180930 160115-20-334439/bnpb-seluruh-buoy-deteksi-tsunami -di-indonesia-rusak. COVID-19 National Emergency Response Center. 2020. “Contact Transmission of COVID-19 in South Korea: Novel Investigation Techniques for Tracing Contacts.” Osong Public Health and Research Perspectives 11 (1): 60–63. COVID-19 National Emergency Response Center, Epi­ demiology and Case Management Team, Korea Centers for Disease Control and Prevention, Cheongju, Republic of Korea. https://doi.org/10.24171/j.phrp.2020.11.1.09. Dai, Fei, and Mohhammad Sujon. 2019. “Measuring Current Traffic Safety Culture via Social Media Mining.” WTSC Report 2019-AG-2856, Washington Traffic Safety Com- mission, Olympia, WA. http://wtsc.wa.gov/wp-content /uploads/dlm_uploads/2019/10/Measuring-Traffic-Safety -Culture-via-Social-Media-Mining_Oct2019-1.pdf. Davis, Kristin E., Burton Swanson, David Amudavi, Daniel Ayalew Mekonnen, Aaron Flohrs, Jens Riese, Chloe Lamb, and Elias Zerfu. 2010. “In-Depth Assessment of the Public Agricultural Extension System of Ethiopia and Recommendations for Improvement.” IFPRI Discussion Paper01041,InternationalFoodPolicyResearchInstitute, Interpretable Poverty Maps Using Object Detection in Satellite Images.” Cornell University, Ithaca, NY. http:// arxiv.org/abs/2002.01612. BBC News. 2018. “Indonesia Earthquake and Tsunami: How Warning System Failed theVictims.” BBCNews, October 1, 2018. https://www.bbc.com/news/world-asia-45663054. Bengtsson, Linus, Xin Lu, Anna Thorson, Richard Garfield, and Johan von Schreeb. 2011. “Improved Response to Disasters and Outbreaks by Tracking Population Move- ments with Mobile Phone Network Data: A Post-Earth- quake Geospatial Study in Haiti.” PLoS Medicine 8 (8): e1001083. https://doi.org/10.1371/journal.pmed.1001083. Beraja, Martin, David Y. Yang, and Noam Yuchtman. 2020. 
“Data-Intensive Innovation and the State: Evidence from AI Firms in China.” NBER Working Paper 27723, National Bureau of Economic Research, Cambridge, MA. https:// www.nber.org/papers/w27723. Bethlehem, Jelke. 2009. “The Rise of Survey Sampling.” Dis- cussion Paper 09015, Statistics Netherlands, The Hague. Björkegren, Daniel, Joshua Evan Blumenstock, and Samsun Knight. 2020. “Manipulation-Proof Machine Learn- ing.” Cornell University, Ithaca, NY. http://arxiv.org/abs /2004.03865. Blumenstock, Joshua Evan. 2016. “Fighting Poverty with Data.” Science 353 (6301): 753–54. https://doi.org/10.1126 /science.aah5217. Blumenstock, Joshua Evan. 2018. “Don’t Forget People in the Use of Big Data for Development.” Nature 561 (7722): 170–72. https://doi.org/10.1038/d41586-018-06215-5. Blumenstock, Joshua Evan, Gabriel Cadamuro, and Robert On. 2015. “Predicting Poverty and Wealth from Mobile Phone Metadata.” Science 350 (6264): 1073–76. https://doi .org/10.1126/science.aac4420. Blumenstock, Joshua Evan, and Nathan Eagle. 2012. “Divided We Call: Disparities in Access and Use of Mobile Phones in Rwanda.” Information Technologies and International Development 8 (2): 1–16. Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.” July 21, Cornell University, Ithaca, NY. https://arxiv.org/abs/1607.06520. Bonnet, Emmanuel, Lucie Lechat, and Valéry Ridde. 2018. “What Interventions Are Required to Reduce Road Traffic Injuries in Africa? A Scoping Review of the Lit- erature.” PLoS ONE 13 (11): e0208195. https://doi.org/10.1371 /journal.pone.0208195. Buckee, Caroline O., Satchit Balsari, Jennifer Chan, Mercè Crosas, Francesca Dominici, Urs Gasser, Yonatan H. Grad, et al. 2020. “Aggregated Mobility Data Could Help Fight COVID-19.” Science 368 (6487): 145–46. https://doi .org/10.1126/science.abb8021. 
Buckholtz, Alison. 2019. “Africa’s IT Talent Pool.” IFC Insights (blog), December 2019. https://www.ifc.org/wps/wcm/connect/news_ext_content/ifc_external_corporate_site/news+and+events/news/insights/africa-it-talent.
Buolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” In PMLR, Proceedings of Machine Learning Research, vol. 81, FAT 2018, Conference on Fairness,
Washington, DC. https://www.ifpri.org/publication/depth-assessment-public-agricultural-extension-system-ethiopia-and-recommendations.
Deaton, Angus S. 2008. “Income, Health, and Well-Being around the World: Evidence from the Gallup World Poll.” Journal of Economic Perspectives 22 (2): 53–72. https://doi.org/10.1257/jep.22.2.53.
Demombynes, Gabriel, and Justin Sandefur. 2015. “Costing a Data Revolution.” World Economics 16 (3): 99–112.
de Montjoye, Yves-Alexandre, César A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. 2013. “Unique in the Crowd: The Privacy Bounds of Human Mobility.” Scientific Reports 3 (1): 1376. https://doi.org/10.1038/srep01376.
Dureuil, Manuel, Kristina Boerder, Kirsti A. Burnett, Rainer Froese, and Boris Worm. 2018. “Elevated Trawling inside Protected Areas Undermines Conservation Outcomes in a Global Fishing Hot Spot.” Science 362 (6421): 1403–07. https://doi.org/10.1126/science.aau0561.
Dwork, Cynthia, and Aaron Roth. 2014. “The Algorithmic Foundations of Differential Privacy.” Foundations and Trends in Theoretical Computer Science 9 (3–4): 211–407. http://dx.doi.org/10.1561/0400000042.
Engstrom, Ryan, Jonathan Samuel Hersh, and David Locke Newhouse. 2017. “Poverty from Space: Using High-Resolution Satellite Imagery for Estimating Economic Well-Being.” Policy Research Working Paper 8284, World Bank, Washington, DC.
Falk, Armin, Anke Becker, Thomas Dohmen, Benjamin Enke, David Huffman, and Uwe Sunde. 2018. “Global Evidence on Economic Preferences.” Quarterly Journal of Economics 133 (4): 1645–92. https://doi.org/10.1093/qje/qjy013.
Flowers, Andrew. 2019. “Indeed Tech Skills Explorer: Fastest-Rising Tech Skills.” Occupation Spotlight (blog), November 26, 2019. https://www.hiringlab.org/2019/11/26/fastest-rising-tech-skills/.
Fraiberger, Samuel P., Pablo Astudillo, Lorenzo Candeago, Alex Chunet, Nicholas K. W. Jones, Maham Faisal Khan, Bruno Lepri, et al. 2020. “Uncovering Socioeconomic Gaps in Mobility Reduction during the COVID-19 Pandemic Using Location Data.” Cornell University, Ithaca, NY. http://arxiv.org/abs/2006.15195.
Frankfurter, Zoe, Klaudia Kokoszka, David Locke Newhouse, Ani Rudra Silwal, and Siwei Tian. 2020. “Measuring Internet Access in Sub-Saharan Africa (SSA).” Poverty and Equity Notes 31 (August), World Bank, Washington, DC. https://openknowledge.worldbank.org/bitstream/handle/10986/34302/Measuring-Internet-in-Access-in-Sub-Saharan-Africa-SSA.pdf?sequence=1.
Frias-Martinez, Vanessa, Enrique Frias-Martinez, and Nuria Oliver. 2010. “A Gender-Centric Analysis of Calling Behavior in a Developing Economy Using Call Detail Records.” In Artificial Intelligence for Development: Papers from the AAAI Spring Symposium, edited by Association for the Advancement of Artificial Intelligence, 37–42. Technical Report SS-10-01. Menlo Park, CA: AAAI Press.
Ginsberg, Jeremy, Matthew H. Mohebbi, Rajan S. Patel, Lynnette Brammer, Mark S. Smolinski, and Larry Brilliant. 2009. “Detecting Influenza Epidemics Using Search Engine Query Data.” Nature 457 (February): 1012–14. https://www.nature.com/articles/nature07634.
Glaeser, Edward L., Hyunjin Kim, and Michael Luca. 2018. “Nowcasting Gentrification: Using Yelp Data to Quantify Neighborhood Change.” AEA Papers and Proceedings 108 (May): 77–82.
Goldstein, Markus P., Paula Gonzalez Martinez, and Sreelakshmi Papineni. 2019. “Tackling the Global Profitarchy: Gender and the Choice of Business Sector.” Policy Research Working Paper 8865, World Bank, Washington, DC. https://openknowledge.worldbank.org/handle/10986/31747.
González, Marta C., César A. Hidalgo, and Albert-László Barabási. 2008. “Understanding Individual Human Mobility Patterns.” Nature 453 (7196): 779–82.
Gourlay, Sydney, Talip Kilic, and David B. Lobell. 2019. “A New Spin on an Old Debate: Errors in Farmer-Reported Production and Their Implications for Inverse Scale–Productivity Relationship in Uganda.” Journal of Development Economics 141 (November): 102376. https://doi.org/10.1016/j.jdeveco.2019.102376.
GSS (Government Statistical Service, UK). 2016. “Competency Framework for the Government Statistician Group (GSG).” GSS, Office of National Statistics, London.
Hlatshwayo, Sandile, Anne Oeking, Manuk Ghazanchyan, David Corvino, Ananya Shukla, and Lamin Leigh. 2018. “The Measurement and Macro-Relevance of Corruption: A Big Data Approach.” IMF Working Paper WP/18/195, International Monetary Fund, Washington, DC. http://dx.doi.org/10.5089/9781484373095.001.
IEAG (Independent Expert Advisory Group on a Data Revolution for Sustainable Development). 2014. “A World That Counts: Mobilising the Data Revolution for Sustainable Development.” Data Revolution Group, United Nations, New York.
Ihantamalala, Felana Angella, Vincent Herbreteau, Feno M. J. Rakotoarimanana, Jean Marius Rakotondramanga, Simon Cauchemez, Bienvenue Rahoilijaona, Gwenaëlle Pennober, et al. 2018. “Estimating Sources and Sinks of Malaria Parasites in Madagascar.” Nature Communications 9 (1): 3897.
ITU (International Telecommunication Union). 2020. “Africa Is at the AI Innovation Table and ‘Ready for the Next Wave.’” ITU News, June 23, 2020. https://www.itu.int/en/myitu/News/2020/06/23/07/55/AI-for-Good-2020-Africa-innovation.
Jain, Meha, Amit Srivastava, Balwinder Singh, Rajiv Joon, Andrew Mcdonald, Keitasha Royal, Madeline Lisaius, et al. 2016. “Mapping Smallholder Wheat Yields and Sowing Dates Using Micro-Satellite Data.” Remote Sensing 8 (November): 860. https://doi.org/10.3390/rs8100860.
Jean, Neal, Marshall Burke, Michael Xie, W. Matthew Davis, David B. Lobell, and Stefano Ermon. 2016. “Combining Satellite Imagery and Machine Learning to Predict Poverty.” Science 353 (6301): 790–94. https://doi.org/10.1126/science.aaf7894.
Kaye, Kate. 2019. “These Companies Claim to Provide ‘Fair-Trade’ Data Work: Do They?” MIT Technology Review, August 7. https://www.technologyreview.com/2019/08/07/133845/cloudfactory-ddd-samasource-imerit-impact-sourcing-companies-for-data-annotation/.
Kelley, Erin, Gregory Lane, and David Schönholzer. 2020. “Monitoring in Target Contracts: Theory and Experiment in Kenyan Public Transit.” Paper presented at Virtual BREAD/CEPR/STICERD/TCD Conference on Development Economics, October 1–3, 2020. https://youtu.be/TU-_xDR3x7I.
Klein, Brennan, Timothy LaRock, Stefan McCabe, Leo Torres, Filippo Privitera, Lake Brennan, Moritz U. G. Kraemer, et al. 2020. “Assessing Changes in Commuting and Individual Mobility in Major Metropolitan Areas in the United States during the COVID-19 Outbreak.” Network Science Institute, Northeastern University, Boston. https://www.networkscienceinstitute.org/publications/assessing-changes-in-commuting-and-individual-mobility-in-major-metropolitan-areas-in-the-united-states-during-the-covid-19-outbreak.
Kongthon, Alisa, Choochart Haruechaiyasak, Jaruwat Pailai, and Sarawoot Kongyoung. 2012. “The Role of Twitter during a Natural Disaster: Case Study of 2011 Thai Flood.” In 2012 Proceedings of PICMET ’12: Technology Management for Emerging Technologies, edited by Institute of Electrical and Electronics Engineers, 2227–32. Red Hook, NY: Curran Associates.
Kraemer, Moritz U. G., Nick Golding, Dionisio Bisanzio, Samir Bhatt, David M. Pigott, S. E. Ray, O. J. Brady, et al. 2019. “Utilizing General Human Movement Models to Predict the Spread of Emerging Infectious Diseases in Resource Poor Settings.” Scientific Reports 9 (March): 5151. https://doi.org/10.1038/s41598-019-41192-3.
Lai, Shengjie, Nick W. Ruktanonchai, Liangcai Zhou, Olivia Prosper, Wei Luo, Jessica R. Floyd, Amy Wesolowski, et al. 2020. “Effect of Non-Pharmaceutical Interventions to Contain COVID-19 in China.” Nature 585 (7825): 410–13. https://doi.org/10.1038/s41586-020-2293-x.
Lambert, Marie-Julie, Pierre C. Sibiry Traoré, Xavier Blaes, Philippe Baret, and Pierre Defourny. 2018. “Estimating Smallholder Crops Production at Village Level from Sentinel-2 Time Series in Mali’s Cotton Belt.” Remote Sensing of Environment 216 (October): 647–57. https://doi.org/10.1016/j.rse.2018.06.036.
Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343 (6176): 1203–05. https://doi.org/10.1126/science.1248506.
Le Menach, Arnaud, Andrew J. Tatem, Justin M. Cohen, Simon I. Hay, Heather Randell, Anand P. Patil, and David L. Smith. 2011. “Travel Risk, Malaria Importation, and Malaria Transmission in Zanzibar.” Scientific Reports 1: 93. https://www.nature.com/articles/srep00093.
Llorente, Alejandro, Manuel Garcia-Herranz, Manuel Cebrian, and Esteban Moro. 2015. “Social Media Fingerprints of Unemployment.” PLoS ONE 10 (5): e0128692. https://doi.org/10.1371/journal.pone.0128692.
Lobell, David B., George Azzari, Marshall Burke, Sydney Gourlay, Zhenong Jin, Talip Kilic, and Siobhan Murray. 2020. “Eyes in the Sky, Boots on the Ground: Assessing Satellite- and Ground-Based Approaches to Crop Yield Measurement and Analysis.” American Journal of Agricultural Economics 102 (1): 202–19. https://doi.org/10.1093/ajae/aaz051.
Lu, Xin, Linus Bengtsson, and Petter Holme. 2012. “Predictability of Population Displacement after the 2010 Haiti Earthquake.” PNAS, Proceedings of the National Academy of Sciences of the United States of America 109 (29): 11576–81. https://doi.org/10.1073/pnas.1203882109.
Maas, Paige, Shankar Iyer, Andreas Gros, Wonhee Park, Laura McGorman, Chaya Nayak, and P. Alex Dow. 2019. “Facebook Disaster Maps: Aggregate Insights for Crisis Response and Recovery.” In Conference Proceedings: 16th International Conference on Information Systems for Crisis Response and Management, edited by Zeno Franco, José J. González, and José H. Canós, 836–47. Valencia, Spain: Polytechnic University of Valencia.
Marshall, Sarah. 2012. “Citizen Journalists Report Sierra Leone Elections by SMS.” Journalism, November 20, 2012. https://www.journalism.co.uk/news/citizen-journalists-report-sierra-leone-elections-by-sms-/s2/a551240/.
Masaki, Takaaki, David Locke Newhouse, Ani Rudra Silwal, Adane Bedada, and Ryan Engstrom. 2020. “Small Area Estimation of Non-Monetary Poverty with Geospatial Data.” Policy Research Working Paper 9383, World Bank, Washington, DC.
McCall, Becky. 2020. “COVID-19 and Artificial Intelligence: Protecting Health-Care Workers and Curbing the Spread.” Lancet Digital Health 2 (4): e166–e167. https://doi.org/10.1016/S2589-7500(20)30054-6.
McGough, Sarah F., John S. Brownstein, Jared B. Hawkins, and Mauricio Santillana. 2017. “Forecasting Zika Incidence in the 2016 Latin America Outbreak Combining Traditional Disease Surveillance with Search, Social Media, and News Report Data.” PLoS Neglected Tropical Diseases 11 (1): e0005295.
Mehrabi, Zia, Mollie J. McDowell, Vincent Ricciardi, Christian Levers, Juan Diego Martinez, Natascha Mehrabi, Hannah Wittman, et al. 2020. “The Global Divide in Data-Driven Farming.” Nature Sustainability 4 (February 2021): 154–60. https://doi.org/10.1038/s41893-020-00631-0.
Milinovich, Gabriel J., Gail M. Williams, Archie C. A. Clements, and Wenbiao Hu. 2014. “Internet-Based Surveillance Systems for Monitoring Emerging Infectious Diseases.” Lancet Infectious Diseases 14 (2): 160–68. https://doi.org/10.1016/S1473-3099(13)70244-5.
Milusheva, Sveta. 2020. “Managing the Spread of Disease with Mobile Phone Data.” Journal of Development Economics 147 (November): 102559. https://doi.org/10.1016/j.jdeveco.2020.102559.
Milusheva, Sveta, Robert Marty, Guadalupe Bedoya, Elizabeth Resor, Sarah Williams, and Arianna Legovini. 2020. “Can Crowdsourcing Create the Missing Crash Data?” In COMPASS ’20: Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies, 305–06. New York: Association for Computing Machinery. https://doi.org/10.1145/3378393.3402264.
Oliver, Nuria, Bruno Lepri, Harald Sterly, Renaud Lambiotte, Sébastien Deletaille, Marco De Nadai, Emmanuel Letouzé, et al. 2020. “Mobile Phone Data for Informing Public Health Actions across the COVID-19 Pandemic Life Cycle.” Science Advances 6 (23): eabc0764. https://doi.org/10.1126/sciadv.abc0764.
PAHO (Pan American Health Organization) and WHO (World Health Organization). 2016. “Zika Cases and Congenital Syndrome Associated with Zika Virus Reported by Countries and Territories in the Americas: Cumulative Cases, 2015–2016.” PAHO, Washington, DC. https://www.paho.org/hq/dmdocuments/2016/2016-dec-29-phe-ZIKV-cases.pdf.
Peak, Corey M., Amy Wesolowski, Elisabeth zu Erbach-Schoenberg, Andrew J. Tatem, Erik Wetter, Xin Lu, Daniel Power, et al. 2018. “Population Mobility Reductions Associated with Travel Restrictions during the Ebola Epidemic in Sierra Leone: Use of Mobile Phone Data.” International Journal of Epidemiology 47 (5): 1562–70.
Pepe, Emanuele, Paolo Bajardi, Laetitia Gauvin, Filippo Privitera, Brennan Lake, Ciro Cattuto, and Michele Tizzoni. 2020. “COVID-19 Outbreak Response: A Dataset to Assess Mobility Changes in Italy Following National Lockdown.” Scientific Data 7: 230. https://doi.org/10.1038/s41597-020-00575-2.
Perisic, Igor. 2018. “How Artificial Intelligence Is Already Impacting Today’s Jobs.” Economic Graph (blog), September 17, 2018. https://economicgraph.linkedin.com/blog/how-artificial-intelligence-is-already-impacting-todays-jobs.
Perol, Thibaut, Michaël Gharbi, and Marine Denolle. 2018. “Convolutional Neural Network for Earthquake Detection and Location.” Science Advances 4 (2): e1700578. https://doi.org/10.1126/sciadv.1700578.
Pratihast, Arun Kumar, Ben DeVries, Valerio Avitabile, Sytze De Bruin, Lammert Kooistra, Mesfin Tekle, and Martin Herold. 2014. “Combining Satellite Data and Community-Based Observations for Forest Monitoring.” Forests 5 (10): 2464–89. https://doi.org/10.3390/f5102464.
Resch, Bernd, Florian Usländer, and Clemens Havas. 2018. “Combining Machine-Learning Topic Models and Spatiotemporal Analysis of Social Media Data for Disaster Footprint and Damage Assessment.” Cartography and Geographic Information Science 45 (4): 362–76.
Reynard, Darcy, and Manish Shirgaokar. 2019. “Harnessing the Power of Machine Learning: Can Twitter Data Be Useful in Guiding Resource Allocation Decisions during a Natural Disaster?” Transportation Research Part D: Transport and Environment 77 (December): 449–63.
Ritchie, Hannah, and Max Roser. 2019. “Natural Disasters.” Our World in Data. Global Change Data Lab and Oxford Martin Program on Global Development, University of Oxford, Oxford, UK. https://ourworldindata.org/natural-disasters.
Robinson, Bella Fay, Robert Power, and Mark Cameron. 2013. “A Sensitive Twitter Earthquake Detector.” In WWW ’13: Proceedings of the 22nd International Conference on World Wide Web, 999–1002. New York: Association for Computing Machinery. https://dl.acm.org/doi/10.1145/2487788.2488101.
Salathé, Marcel, Linus Bengtsson, Todd J. Bodnar, Devon D. Brewer, John S. Brownstein, Caroline Buckee, Ellsworth M. Campbell, et al. 2012. “Digital Epidemiology.” PLoS Computational Biology 8 (7): e1002616.
Saleiro, Pedro, Benedict Kuester, Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T. Rodolfa, et al. 2019. “Aequitas: A Bias and Fairness Audit Toolkit.” Cornell University, Ithaca, NY. https://arxiv.org/abs/1811.05577.
Salganik, Matthew J. 2017. Bit by Bit: Social Research in the Digital Age. Princeton, NJ: Princeton University Press.
Serajuddin, Umar, Hiroki Uematsu, Christina Wieser, Nobuo Yoshida, and Andrew L. Dabalen. 2015. “Data Deprivation: Another Deprivation to End.” Policy Research Working Paper 7252, World Bank, Washington, DC.
Servick, Kelly. 2020a. “Cellphone Tracking Could Help Stem the Spread of Coronavirus: Is Privacy the Price?” Science, March 22. https://www.sciencemag.org/news/2020/03/cellphone-tracking-could-help-stem-spread-coronavirus-privacy-price.
Servick, Kelly. 2020b. “COVID-19 Contact Tracing Apps Are Coming to a Phone Near You: How Will We Know Whether They Work?” Science, May 21. https://www.sciencemag.org/news/2020/05/countries-around-world-are-rolling-out-contact-tracing-apps-contain-coronavirus-how.
Stephens-Davidowitz, Seth. 2017. Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us about Who We Really Are. New York: HarperCollins.
Strubell, Emma, Ananya Ganesh, and Andrew McCallum. 2019. “Energy and Policy Considerations for Deep Learning in NLP.” Proceedings of 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, July 2019.
Sylvester, Gerard, ed. 2019. “E-Agriculture in Action: Big Data for Agriculture.” Food and Agriculture Organization of the United Nations and International Telecommunication Union, Bangkok. http://www.fao.org/3/ca5427en/ca5427en.pdf.
Tatem, Andrew J., Youliang Qiu, David L. Smith, Oliver Sabot, Abdullah S. Ali, and Bruno Moonen. 2009. “The Use of Mobile Phone Data for the Estimation of the Travel Patterns and Imported Plasmodium Falciparum Rates among Zanzibar Residents.” Malaria Journal 8 (December): 287. https://doi.org/10.1186/1475-2875-8-287.
Tiecke, Tobias G., and Andreas Gros. 2016. “Connecting the World with Better Maps.” Facebook Engineering (blog), February 22, 2016. https://engineering.fb.com/core-data/connecting-the-world-with-better-maps/.
Vale, Steven, and Taeke Gjaltema. 2020. “High-Level Group for the Modernisation of Official Statistics.” United Nations Economic Commission for Europe, Geneva. https://statswiki.unece.org/display/hlgbas/High-Level+Group+for+the+Modernisation+of+Official+Statistics.
Wallace, Eric, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. 2019. “Universal Adversarial Triggers for Attacking and Analyzing NLP.” Cornell University, Ithaca, NY. http://arxiv.org/abs/1908.07125.
Wesolowski, Amy, Nathan Eagle, Abdisalan M. Noor, Robert W. Snow, and Caroline O. Buckee. 2012. “Heterogeneous Mobile Phone Ownership and Usage Patterns in Kenya.” PLoS ONE 7 (4): e35319. https://doi.org/10.1371/journal.pone.0035319.
WHO (World Health Organization). 2008. “The Top 10 Causes of Death.” Fact Sheets (blog), May 24, 2008. https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death.
Williams, Kehinde, Adebayo Peter Idowu, and Emmanuel Olonade. 2015. “Online Road Traffic Accident Monitoring System for Nigeria.” Transactions on Networks and Communications 3 (1): 10–30. https://doi.org/10.14738/tnc.31.589.
Wilson, Robin, Elisabeth zu Erbach-Schoenberg, Maximilian Albert, Daniel Power, Simon Tudge, Miguel Gonzalez, Sam Guthrie, et al. 2016. “Rapid and Near Real-Time Assessments of Population Displacement Using Mobile Phone Data Following Disasters: The 2015 Nepal Earthquake.” PLoS Currents 8 (February 24). https://doi.org/10.1371/currents.dis.d073fbece328e4c39087bc086d694b5c.
WRI (World Resources Institute). 2019. “Palm Oil Industry to Jointly Develop Radar Monitoring Technology to Detect Deforestation.” Press release, October 31, 2019. https://www.wri.org/news/2019/10/release-palm-oil-industry-jointly-develop-radar-monitoring-technology-detect.
Yang, Shihao, Samuel C. Kou, Fred Lu, John S. Brownstein, Nicholas Brooke, and Mauricio Santillana. 2017. “Advances in Using Internet Searches to Track Dengue.” PLoS Computational Biology 13 (7): e1005607.
Yeh, Christopher, Anthony Perez, Anne Driscoll, George Azzari, Zhongyi Tang, David B. Lobell, Stefano Ermon, et al. 2020. “Using Publicly Available Satellite Imagery and Deep Learning to Understand Economic Well-Being in Africa.” Nature Communications 11 (1): 2583. https://doi.org/10.1038/s41467-020-16185-w.
Zindi. 2020. “GIZ AI4D Africa Language Challenge, Round 2: $6,000 USD.” Competitions, June 1, 2020. https://zindi.africa/competitions/ai4d-african-language-dataset-challenge.
Zou, James, and Londa Schiebinger. 2018. “AI Can Be Sexist and Racist: It’s Time to Make It Fair.” Nature 559 (7714): 324–26. https://doi.org/10.1038/d41586-018-05707-8.
Spotlight 4.1
Gathering, sharing, and using better data on weather, water, and climate from low- and middle-income countries

Two-way flows of data between local, regional, and international meteorological centers have high value for social and economic development.

The need for weather and climate information is growing rapidly as people become more vulnerable to natural hazards, including those exacerbated by climate change. To support economic and social development, all countries need access to sufficiently accurate, reliable, and understandable weather, water, and climate data (as stated in the Sustainable Development Goals).1 This is the case everywhere, but particularly in low- and middle-income countries, which bear the brunt of losses from natural hazards.2

Scientific and technological advances have brought weather prediction systems to a level where they can provide weather intelligence to inform the decisions of individuals and groups.3 But such forecasts are useful for decision-making only if more high-quality observational data are shared, assimilated, or used to adjust model outputs. While satellites provide most of the data for models, local data play several critical roles. Scientists from national meteorological services work with regional and global centers to calibrate global models at national and local scales. The more local data are shared internationally, the better the weather predictions produced by global centers can be applied locally. Exchanging more and better data internationally, and doing so more frequently, has many direct benefits: better-performing models, more accurate local forecasts, and improved verification of forecasts, which helps to monitor, improve, and compare the quality of forecasts and forecasting systems.
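To make "used to adjust model outputs" and "verification of forecasts" more concrete, here is a minimal sketch of one simple adjustment technique, mean bias correction, assuming a single station with paired past forecasts and observations. The function names and temperatures are hypothetical illustrations, not the procedure used by ECMWF or any national meteorological service; operational centers rely on far more sophisticated data assimilation and statistical post-processing.

```python
from math import sqrt
from statistics import mean


def mean_bias(forecasts, observations):
    """Average forecast-minus-observation error over paired past records."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need equal-length, non-empty series")
    return mean(f - o for f, o in zip(forecasts, observations))


def bias_correct(forecasts, bias):
    """Subtract a previously estimated bias from new model forecasts."""
    return [f - bias for f in forecasts]


def rmse(forecasts, observations):
    """Root-mean-square error, a common forecast verification score."""
    return sqrt(mean((f - o) ** 2 for f, o in zip(forecasts, observations)))


# Hypothetical station data (degrees Celsius): a training period used to
# estimate the model's local bias, and a later period used for verification.
train_forecasts = [21.0, 23.5, 19.0, 25.0]
train_observations = [19.5, 22.0, 17.8, 23.4]
test_forecasts = [22.0, 20.0, 24.0]
test_observations = [20.6, 18.4, 22.5]

bias = mean_bias(train_forecasts, train_observations)
corrected = bias_correct(test_forecasts, bias)

raw_score = rmse(test_forecasts, test_observations)
corrected_score = rmse(corrected, test_observations)
print(f"bias={bias:.2f}  raw RMSE={raw_score:.2f}  corrected RMSE={corrected_score:.2f}")
```

The bias is estimated on one period and verified on a later one so that the improvement in RMSE is not simply the model being scored on the data used to tune it. The correction only helps to the extent that the station keeps reporting: without shared local observations there is nothing to estimate the bias from.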
Recognizing the benefit of sharing national data with regional and global forecasting centers, Ukraine recently increased the number of weather stations reporting data to the European Centre for Medium-Range Weather Forecasts (ECMWF) from 30 to 130.4 The extra data produced by these stations will help to improve global forecasts and thus regional and national forecasts. In particular, local observations of near-surface temperature and humidity will improve estimates of soil moisture, which influence regional and global forecasts of near-surface temperature and rainfall. Additional data on snow depth from Denmark, Hungary, the Netherlands, Romania, Sweden, and Switzerland have improved forecasts of air temperature in the northern hemisphere.

More data are now being shared in real time between ECMWF and all 37 countries participating in the Regional Integrated Multi-Hazard Early Warning System for Africa and Asia (RIMES).5 The Bangladesh Meteorological Department, for example, has increased the number of stations sharing observational data taken every three hours from 10 to 32 and has provided nearly 40 years of historical data. The total number of stations added by all RIMES members is now 500 and is expected to rise to 1,500 soon, which should significantly improve the accuracy and lead time of weather forecasts.

However, these data are not categorized as essential data, as defined by the World Meteorological Organization,6 and are not considered open data from the perspective of their use and reuse. For this reason, RIMES needs to ensure that these data are protected by nondisclosure agreements. In return, ECMWF shares high-resolution digital forecast products with each participating country, with the aim of improving national forecasts and deepening the technical collaboration between RIMES countries and ECMWF. With access to these high-resolution forecast products from ECMWF, countries can focus more effort on applying forecast information to the needs of their populations and on building skills in data analytics.7

As map S4.1.1 shows, significant gaps in reporting basic weather data still exist. Important next steps are getting countries to recognize the value of sharing their data and to participate in improving the models. In time, it is anticipated that open data policies similar to the European Union Directive will apply to meteorological data everywhere, characterizing these data as having high value for social and economic development. The German Meteorological Service, for example, has started openly sharing all of the data it uses for its public tasks.8 Now more than 500 petabytes of data are downloaded monthly and used by a wide range of industries in Germany to improve their economic performance.

Map S4.1.1 Large gaps remain in global reporting on basic weather data
[Map: availability of surface land observations (global NWP) as of December 14, 2020. Legend: more than 100%; normal (≥ 80%); availability issues (≥ 30%); availability issues (< 30%); not received in period; OSCAR schedule issues; no match in OSCAR/surface land observation.]
Source: World Bank map, based on data from WDQMS (WIGOS Data Quality Monitoring System) (webtool), World Meteorological Organization, Geneva, https://wdqms.wmo.int. Data at http://bit.do/WDR2021-Map-S4_1_1.
Note: Snapshot of World Meteorological Organization Integrated Global Observing System interactive map showing observations of surface temperature measured on a typical day (December 14) in 2020. NWP = numerical weather prediction; OSCAR = Observing Systems Capability Analysis and Review Tool.

Notes
1. See United Nations, Sustainable Development Goals: Decade of Action (dashboard), https://www.un.org/sustainabledevelopment/decade-of-action/.
2. Hallegatte, Bangalore, and Vogt-Schilb (2016).
3. Charts Catalogue (weather map repository), European Centre for Medium-Range Weather Forecasts, https://apps.ecmwf.int/webapps/opencharts/?.
4. ECMWF (2018).
5. RIMES (2020).
6. Zillman (2019).
7. ECMWF (2020).
8. See Deutscher Wetterdienst, Open Data Server (dashboard), https://www.dwd.de/EN/ourservices/opendata/opendata.html.

References
ECMWF (European Centre for Medium-Range Weather Forecasts). 2018. “Extra Weather Station Data Improve ECMWF’s Forecasts.” News, June 18, 2018. https://www.ecmwf.int/en/about/media-centre/news/2018/extra-weather-station-data-improve-ecmwfs-forecasts.
ECMWF (European Centre for Medium-Range Weather Forecasts). 2020. “ECMWF Moves towards a Policy of Open Data.” News, October 7, 2020. https://www.ecmwf.int/en/about/media-centre/news/2020/ecmwf-moves-towards-policy-open-data.
Hallegatte, Stéphane, Mook Bangalore, and Adrien Vogt-Schilb. 2016. “Socioeconomic Resilience: Multi-Hazard Estimates in 117 Countries.” Policy Research Working Paper 7886, World Bank, Washington, DC.
RIMES (Regional Integrated Multi-Hazard Early Warning System for Africa and Asia). 2020. “37 RIMES Countries Agree to Share Data and Focus on Impact Forecasting and Forecast-Based Action for 2021–2025.” RIMES, Asian Institute of Technology, Pathumthani, Thailand, November 26, 2020. https://www.rimes.int/?q=node/630.
Zillman, John W. 2019. “Origin, Impact, and Aftermath of WMO Resolution 40.” WMO Bulletin 68 (2): 69–71. https://library.wmo.int/doc_num.php?explnum_id=10077.
Spotlight 4.2
Making roads safer by repurposing private intent traffic data

Developing spatially integrated datasets that leverage administrative, social media, private, and more traditional sources can help to pave the way for smart and socially beneficial investments and policies.

Reducing road mortality by half (Sustainable Development Goal 3.6) could save 675,000 lives a year. Yet the countrywide investments in regulation, enforcement, education, and infrastructure needed to attain this ambitious target are often out of reach. Resource constraints require countries to make smart choices about where and how to invest. Knowing where and when road traffic crashes happen can help to prioritize investments where they matter most.

Unfortunately, many countries facing these difficult choices have little or no data on road traffic crashes and inadequate capacity to analyze the data they do have. Official data on road traffic crashes capture only 56 percent of fatalities in low- and middle-income countries, on average.1 Crash reports exist, yet they are buried in piles of paper or collected by private operators instead of being converted into useful data or disseminated to the people who need the information to make policy decisions. In Kenya, where official figures underreport the number of fatalities by a factor of 4.5,2 the rapid expansion of mobile phones and social media provides an opportunity to use commuter reports on traffic conditions as a potential source of data on road traffic crashes.
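The two underreporting figures above are expressed in different units: one as the share of deaths that official data capture, the other as a multiplier. A short sketch converts between them, assuming that "underreport by a factor of 4.5" means true deaths are about 4.5 times the official count. The function names and the 1,000-death official count are hypothetical inputs chosen only for illustration.

```python
def capture_rate_from_factor(underreport_factor):
    """Share of true deaths that appear in official data, assuming true
    deaths equal `underreport_factor` times the official count."""
    if underreport_factor < 1:
        raise ValueError("an underreporting factor should be at least 1")
    return 1.0 / underreport_factor


def estimate_true_deaths(official_count, capture_rate):
    """Scale an official count up by an assumed capture rate."""
    if not 0 < capture_rate <= 1:
        raise ValueError("capture_rate must be in (0, 1]")
    return official_count / capture_rate


# Kenya: a factor of 4.5 implies official data capture roughly 22 percent of deaths.
kenya_capture = capture_rate_from_factor(4.5)

# Low- and middle-income average: 56 percent captured, so a hypothetical
# official count of 1,000 deaths implies roughly 1,786 actual deaths.
lmic_estimate = estimate_true_deaths(1000, 0.56)

print(f"Kenya capture rate ~ {kenya_capture:.0%}; LMIC estimate ~ {lmic_estimate:,.0f}")
```

Read this way, Kenya's official statistics capture well under half of the average capture rate for low- and middle-income countries, which is why alternative sources are especially valuable there.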
Big data mining, combined with digitization of official paper records, has demonstrated how disparate data can be leveraged to inform urban spatial analysis, planning, and management.3 Researchers worked in close collaboration with the National Police Service to digitize more than 10,000 situation reports from 2013 to 2020 from the 14 police stations in Nairobi, creating the first digital and geolocated administrative dataset of individual crashes in the city. They combined these administrative data with data crowdsourced through Ma3Route, a mobile application and short message service (SMS) traffic platform with more than 1.1 million subscribers in Kenya. They analyzed 870,000 transport-related tweets submitted between 2012 and 2020, identifying and geolocating 36,428 crash reports by developing and improving natural language processing and geoparsing algorithms.4

To verify the accuracy of the crowdsourced reports and the efficiency of the algorithms, the team dispatched a motorcycle delivery company to the site of the reported crash minutes after each new crash report was received, for a subset of reports. In 92 percent of cases, a crash was verified to have occurred at the stated location or nearby. By combining these sources of data, researchers were able to identify the 5 percent of roads (crash black spots) where 50 percent of the city's road traffic deaths occur (map S4.2.1).

This exercise demonstrates that addressing data scarcity can transform an intractable problem into a more manageable one. In this case, investing in the safety of the entire 6,200-kilometer road network is intractable. Digitizing and analyzing administrative data and variables on injuries and deaths can help to narrow down the locations, times of day, and days of the week that are associated with the most severe crashes. The analysis offers an invaluable road map for future regulation, infrastructure, and enforcement efforts.
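The black-spot idea can be illustrated with a simple greedy ranking. The sketch below assumes, hypothetically, that the road network has already been split into equal-length segments, each with a geolocated death count. It ranks segments by deaths and keeps adding them until they account for at least half of all deaths. This is not the researchers' actual procedure, and the 20-segment network is invented for illustration. With unequal segment lengths, one would instead rank by deaths per kilometer, and the greedy choice would then only approximate the shortest qualifying set of roads.

```python
def find_black_spots(deaths_by_segment, target_share=0.5):
    """Return the fewest segments (taken in descending order of deaths) whose
    combined deaths reach `target_share` of the total, plus the share covered.

    Assumes equal-length segments, so counting segments is equivalent to
    measuring road length.
    """
    total = sum(deaths_by_segment.values())
    if total == 0:
        return [], 0.0
    ranked = sorted(deaths_by_segment.items(), key=lambda kv: kv[1], reverse=True)
    selected, covered = [], 0
    for segment, deaths in ranked:
        if covered >= target_share * total:
            break
        selected.append(segment)
        covered += deaths
    return selected, covered / total


# Hypothetical network: one very dangerous segment and 19 others with 2 deaths each.
deaths = {"S01": 40}
deaths.update({f"S{i:02d}": 2 for i in range(2, 21)})

spots, share = find_black_spots(deaths)
network_share = len(spots) / len(deaths)
print(spots, round(share, 3), network_share)
```

In this toy network a single segment (5 percent of the network) accounts for just over half of all deaths, mirroring the kind of concentration the Nairobi analysis found; real crash data would first need the geolocation and verification steps described above.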
More insights can be gained by integrating existing data and collecting further information, such as Uber and Waze data on average speeds on road segments and on road obstacles; Google Maps data on land use; and weather data on driving conditions. The researchers also invested in a massive data collection effort that is surveying the infrastructure and videotaping and coding the behavior of road users in 200 crash black spots in the city. The analysis of these new data will generate hypotheses to optimize the policy response to the road safety problem.
Overall, developing spatially integrated datasets that leverage administrative, social media, private, and more traditional sources can help to fill data gaps and pave the way for smart and socially beneficial investments and policies.
Map S4.2.1 By combining police reports and crowdsourced data, researchers were able to identify the 5 percent of roads where half of the crashes occur in Nairobi
[Map legend: crashes geolocated from police situation reports; crashes identified by crowdsourced reports that were geolocated and clustered into individual crashes. Map IBRDKEN45648, March 2021.]
Source: Milusheva et al. 2020.
Note: Data shown are for July 2017–July 2018.
Notes
1. WDR 2021 and World Health Organization (WHO) team calculations based on a comparison of reported deaths from the WHO Global Status Report on Road Safety (WHO 2018).
2. WHO (2018).
3. Milusheva et al. (2020).
4. The new algorithms build on work by Finkel, Grenager, and Manning (2005); Gelernter and Balaji (2013); and Ritter et al. (2011).
References
Finkel, Jenny Rose, Trond Grenager, and Christopher Manning. 2005. “Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling.” In 43rd Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference, edited by Kevin Knight, Hwee Tou Ng, and Kemal Oflazer, 363–70. New Brunswick, NJ: Association for Computational Linguistics. https://www.aclweb.org/anthology/P05-1045.
Gelernter, Judith, and Shilpa Balaji. 2013. “An Algorithm for Local Geoparsing of Microtext.” GeoInformatica 17 (4): 635–67. https://doi.org/10.1007/s10707-012-0173-8.
Milusheva, Sveta, Robert Marty, Guadalupe Bedoya, Elizabeth Resor, Sarah Williams, and Arianna Legovini. 2020. “Can Crowdsourcing Create the Missing Crash Data?” In COMPASS ’20: Proceedings of the 3rd ACM SIGCAS Conference on Computing and Sustainable Societies, 305–06. New York: Association for Computing Machinery. https://doi.org/10.1145/3378393.3402264.
Ritter, Alan, Sam Clark, Mausam, and Oren Willi Etzioni. 2011. “Named Entity Recognition in Tweets: An Experimental Study.” In Conference on Empirical Methods in Natural Language Processing: Proceedings of the Conference, 1524–34. Stroudsburg, PA: Association for Computational Linguistics.
WHO (World Health Organization). 2018. Global Status Report on Road Safety 2018. Geneva: WHO. https://www.who.int/publications/i/item/9789241565684.
PART II
Aligning data governance with the social contract
5. Data infrastructure policy: Ensuring equitable access for poor people and poor countries
6. Data policies, laws, and regulations: Creating a trust environment
7. Creating value in the data economy: The role of competition, trade, and tax policy
8. Institutions for data governance: Building trust through collective action
CHAPTER 5
Data infrastructure policy: Ensuring equitable access for poor people and poor countries
Main messages
1. As new mobile technologies emerge, policy makers should proactively facilitate their rollout by promoting service competition, where possible, and infrastructure sharing, where necessary.
2. Universal service policies should incorporate measures designed to ease the demand-side barriers often faced by those who do not seek data services even when they are locally available. These measures include programs to improve the affordability of handsets and data services, while enhancing the digital literacy of excluded groups.
3. To ensure high-speed, cost-effective data services, policy makers should facilitate development of domestic data infrastructure that allows local storage, processing, and exchange of data so that data need not travel through distant overseas facilities.
4. A competitive market and open governance arrangements are two policies that support the creation of internet exchange points. Establishment of colocation data centers will depend on a stable investment climate for private sector investors, combined with the availability of low-cost, reliable sources of clean energy.
Data infrastructure as a source of inequity
Infrastructure is a prerequisite for collecting, exchanging, storing, processing, and distributing modern data because of its digital character. Harnessing the full economic and social value of modern data services calls for digital infrastructure that is universally accessible, while also offering adequate internet speed at affordable cost. Yet the developing world is lagging behind, with major gaps between rich and poor people in broadband connectivity and a substantial divide emerging between rich and poor countries in the availability of data infrastructure. Well-designed infrastructure policies are needed to redress these adverse trends.
Concerns about inequities in access to data infrastructure stem from growing evidence of a link with economic activity. Numerous studies have found that broadband infrastructure boosts economic growth,1 increasing productivity2 and employment3 while enabling digital enterprises. For example, the arrival of fiber-optic submarine cables in Africa has had positive effects on employment from the entry of new firms, greater productivity, and higher exports.4 More broadly, a 10 percent increase in data centers results in an expansion of exports in data-related services of about 1.6 percent.5 As a growing share of economic activity becomes data-enabled, it is important to ensure that poor people and poor countries are not excluded from such opportunities by the absence of suitable data infrastructure.
Data infrastructure forms a supply chain that originates in global data storage centers and data processing facilities known as cloud computing platforms (figure 5.1). From there, data pass through internet exchange points (IXPs) for transfer to users. Data then flow in and out of countries through an intercontinental network of submarine cables. Once on domestic soil, data are distributed through national fiber-optic and microwave backbone networks until they reach a proximate location for distribution to local communities, whether through wired connections or through wireless signals provided by mobile (or cellular) networks. Finally, data are exchanged with individuals, businesses, and public institutions through fixed lines or wireless signals from cellular towers, and increasingly with inanimate machines, cameras, and sensors connected to what is known as the Internet of Things (IoT). Data travel thousands of kilometers along this seamless infrastructure supply chain at breathtaking speeds of 200,000 kilometers per second, meaning that digital data can, in principle, circle the globe five times within a second.6
Figure 5.1 The data infrastructure supply chain
[Figure: typical distances and purposes of each stage. Cross-border network (international links, 1,000 km): connecting the service centers to the worldwide internet. Middle-mile network (distribution, 10–1,000 km): connecting bigger cities together and routing data to the service centers. Last mile (access, 1–5 km): distributing internet from the middle mile in the community to individual homes and businesses. Technologies shown include subsea cable, terrestrial fiber, satellite links and earth stations, fiber or microwave backbone, IXPs, colocation data centers, fiber, DSL, coaxial cable, cellular towers, wireless (Wi-Fi, LTE, TV white space), direct-to-home satellite, and balloons and drones.]
Source: Adapted from World Bank (2019c).
Note: DSL = digital subscriber line; DTH = direct-to-home; IXP = internet exchange point; km = kilometers; LTE = Long-Term Evolution; WWW = World Wide Web (internet).
Data traffic is growing rapidly around the world. Internet data usage rose from 4.6 to 13 gigabytes per person per month between 2012 and 2017.7 Four trends are driving the explosion in data traffic. First, the number of internet users is growing. More than half of the world’s population is now online, up from less than one-third in 2010, and that share is forecast to reach two-thirds by 2023. Second, the number of connected devices on the IoT already exceeds the number of human users and is forecast to reach 25 billion by 2025 with the diffusion of 5G technology.8 Third, internet speeds are continually increasing, which supports growing data volumes. By 2023 the speed of broadband service provided over fixed networks is expected to double from 2018 levels,9 even as the speed of broadband service provided over mobile networks triples.
Fourth, video accounts for three-fifths of internet traffic, and associated quality improvements are increasing video data traffic.10 A two-hour movie in standard definition uses 1.4 gigabytes of data, whereas the same movie in ultra-high definition uses 18 gigabytes.11
Although most data traffic is still carried over fixed networks, data traffic carried over wireless networks is forecast to rise to more than 20 percent of the global total by 2022, up from only 3 percent in 2012. This shift is driven by the greater prevalence of mobile traffic in emerging nations, with China and India alone accounting for more than 40 percent of the world’s mobile data traffic as of 2018.
Both poor people and poor countries face fundamental inequities in their ability to access data infrastructure. To participate in the data-driven economy, people require internet connectivity. This entails access to last-mile internet infrastructure (increasingly provided through a wireless signal) together with ownership of a data-enabled mobile handset (also known as a smartphone) or, alternatively, a full fixed-line connection. Such connectivity makes it possible for people both to access data about other people (and, increasingly, other things) and to provide their own data to others. Large swathes of the population remain excluded from the internet, particularly the poor, the uneducated, the elderly, those living in rural areas, and, in some parts of the world, women. This complex situation reflects both the supply-side challenges entailed in rolling out coverage of the latest mobile technologies and the demand-side barriers preventing potential users from taking up the service even when it becomes available. Moreover, because of the growing volumes of data underpinning economic and social activity, connectivity is meaningful only if it can be provided at affordable cost and adequate speed.
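The movie figures above translate into average bit rates, which set the sustained connection speed needed to stream rather than download. A worked calculation, assuming decimal units (1 gigabyte = 8,000 megabits) and a constant bit rate over the two hours (7,200 seconds):

```latex
r = \frac{8{,}000 \times S_{\text{GB}}}{T_{\text{s}}}\ \text{Mbit/s}, \qquad
r_{\text{SD}} = \frac{8{,}000 \times 1.4}{7{,}200} \approx 1.6\ \text{Mbit/s}, \qquad
r_{\text{UHD}} = \frac{8{,}000 \times 18}{7{,}200} = 20\ \text{Mbit/s}
```

Under these assumptions, streaming in ultra-high definition needs a sustained connection roughly 13 times faster than standard definition.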
Unless countries have access to modern data infrastructure, connectivity (even when available) will remain prohibitively expensive and slow. Such infrastructure begins with adequate international bandwidth to permit fluid and unconstrained access to the global internet commons. As traffic grows, local IXPs are needed to prevent domestic data transfers from being diverted across vast distances overseas. The addition of domestic colocation data centers (wholesale storage facilities that host other companies’ data) allows substantial volumes of popular overseas content to be stored locally, further improving internet performance. It may also permit direct access to cloud computing platforms, greatly enhancing data processing capabilities. Although almost all countries now enjoy access to global internet submarine cables through either direct coastal access points or cross-border land connections, domestic data infrastructure, such as IXPs, colocation data centers, and cloud computing platforms, remains nascent across low- and middle-income nations, leaving them to contend with low internet speeds and high data charges.
This chapter unpacks the underlying issues that explain the data inequities faced by poor people and poor countries, with an emphasis on identifying appropriate policy responses. The chapter updates, complements, and extends the earlier treatment of related issues in World Development Report 2016: Digital Dividends. For this reason, coverage of supply-side issues is at a relatively high level, whereas the demand-side barriers, as well as the emerging challenges posed by development of domestic data infrastructure, receive more attention.
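The propagation speed quoted earlier in this chapter also shows why local IXPs matter: distance sets a hard floor on delay. The sketch below is a minimal illustration that uses the chapter's figure of 200,000 kilometers per second. The path lengths (about 50 km for a local exchange and 6,000 km to a hypothetical overseas exchange point) and the function names are illustrative assumptions. Real latency is higher once routing, queuing, and processing are added.

```python
SIGNAL_SPEED_KM_PER_S = 200_000  # data speed along fiber, as quoted in the text
EARTH_CIRCUMFERENCE_KM = 40_075  # equatorial circumference of the Earth


def propagation_delay_ms(path_km: float) -> float:
    """One-way propagation delay in milliseconds over `path_km` of fiber.

    This is a physical lower bound: it ignores routing, queuing, and
    processing delays, which all add to it.
    """
    return path_km * 1000 / SIGNAL_SPEED_KM_PER_S


def laps_per_second() -> float:
    """How many times a signal could circle the equator in one second."""
    return SIGNAL_SPEED_KM_PER_S / EARTH_CIRCUMFERENCE_KM


if __name__ == "__main__":
    # Two users in the same city: traffic exchanged at a local IXP versus
    # traffic "tromboned" out to a distant exchange point and back.
    local_path_km = 50
    overseas_path_km = 2 * 6_000
    print(f"laps per second: {laps_per_second():.2f}")
    print(f"local round trip: {2 * propagation_delay_ms(local_path_km):.1f} ms")
    print(f"overseas round trip: {2 * propagation_delay_ms(overseas_path_km):.1f} ms")
```

With these assumed distances, the overseas detour adds on the order of 120 milliseconds per round trip before any congestion, compared with well under 1 millisecond locally.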
Connecting poor people
Many individuals in low- and middle-income nations use basic cellphones for applications such as text messaging and mobile money. These applications have had tremendous development impacts, even without using much data or requiring broadband internet access.12 Beyond such basic telephony applications, access to broadband internet, in combination with ownership of a feature phone or smartphone, greatly enriches an individual’s ability to use data for a better life. Social media connect family and friends; online government services and shopping websites save individuals time and money; online learning and telemedicine provide new, accessible, and inexpensive ways of delivering education and health services. The COVID-19 pandemic is reinforcing the importance of access to broadband internet for remote learning and home working, as well as for improving the overall resilience of economies to shocks of various kinds (see spotlight 5.1).
In the context of low- and middle-income countries, wireless broadband networks have emerged as the most relevant technology for accessing data services. The impacts of wireless broadband are greater than those of wired broadband in these nations,13 particularly because the expansion of fixed broadband is relatively limited and has yet to reach the minimum threshold to have a statistically significant effect on economic growth.14 Even in upper-middle- and high-income nations, where fixed broadband is more prevalent, users spend most of their time online on mobile phones.
Among the poorest in these countries, many use only wireless networks to access the internet (figure 5.2).15
The world’s political commitment to universal internet access was most recently articulated in a 2019 report of the United Nations Broadband Commission for Sustainable Development, which calls for broadband internet user penetration to reach 75 percent worldwide by 2025 (65 percent in developing economies and 35 percent in least developed countries).16 The United Nations also encourages all countries to adopt a national plan for universal access to broadband by 2025. These targets reflect a reappraisal by the international community following the failure to reach Sustainable Development Goal (SDG) 9, Target 9.c, which called for “universal and affordable access to the internet in least developed countries by 2020.”17
From an economic standpoint, public policy support for universal coverage of telecommunications and data services has hinged on positive network externalities. In other words, the economic value of communications infrastructure rises as more members of a society are connected, because the number of pairs of people who can communicate with each other grows faster than the number of people connected: n connected people can form n(n − 1)/2 pairs.18 Such positive externalities have underpinned the case for providing public subsidies to ensure that universal access can be achieved. Furthermore, as the internet becomes the central platform for much of social and economic life, providing all citizens with an opportunity to access this platform is increasingly a matter of social inclusion.
The shortfall of digital connectivity in the developing world can be understood in terms of three different types of gaps. The coverage gap refers to the fact that last-mile digital infrastructure has yet to reach all inhabited locales. The usage gap refers to the fact that, even when coverage becomes available, uptake of the service by the affected population will typically not be universal.
The consumption gap refers to the fact that, even when people do take up the service, data consumption is typically too low to support basic economic and social functions. The discussion that follows focuses primarily on people, but small firms face many of the same barriers.
Figure 5.2 The developing world overwhelmingly accesses data using wireless networks
[Figure: number of broadband subscriptions per 100 persons, wireless and fixed, for high-income, upper-middle-income, lower-middle-income, and low-income countries and the world.]
Source: WDR 2021 team, based on data from International Telecommunication Union, Statistics (database), https://www.itu.int/en/ITU-D/Statistics/Pages/stat/default.aspx. Data at http://bit.do/WDR2021-Fig-5_2.
Note: Data are for 2019.
Although all but 8 percent of the world’s population is covered by a wireless broadband network (figure 5.3, panel a), this overall figure hides significant
regional differences (figure 5.3, panel b). The coverage gap is less than 1 percent in North America, but as high as 29 percent in Sub-Saharan Africa. The usage gap encompassed 42 percent of the world’s population in 2018, but as much as 64 percent of the population of South Asia, where more than 1 billion people are covered by a broadband signal without making use of the internet. Although the 3G coverage gap has shrunk by more than half over the last five years thanks to successful rollout of last-mile infrastructure on the supply side, the usage gap has remained remarkably stable, indicating the persistence of barriers on the demand side. Indeed, as of 2018 more than four in five people worldwide who were not using the internet lived in areas where 3G signal coverage was already available.
Ultimately, the three gaps are interrelated because improving service uptake and data consumption also contributes to commercial viability, increasing the revenues generated by any particular investment in network coverage. Thus progress on closing the usage gap and narrowing the consumption gap will further help eliminate the coverage gap.
Closing the coverage gap
In 2018 more than 600 million people lived beyond the reach of a wireless broadband signal, a far cry from the United Nations’ SDG target of universal and affordable access to the internet in least developed countries by 2020.19 Most of those who are unconnected live in lower-income nations. Estimates suggest that achieving universal broadband internet access by 2030 will require an investment of approximately US$100 billion in Africa alone.20
The coverage gap is typically reported relative to 3G technology, which delivers speeds of 42 megabits per second, making it the first generation able to support data-rich smartphone applications.
However, rapid innovation in the mobile communications sector leads to a new generation of technology just about every decade, ushering in substantial improvements in speed and bandwidth and making universal coverage something of a moving target. In fact, 4G technology, offering speeds of 400 megabits per second, is already widely available in the developing world (figure 5.4). If 4G were used as the relevant technological benchmark, the coverage gap would rise from 8 percent to 20 percent in 2018, and the problem of access would no longer be confined solely to Sub-Saharan Africa.
Figure 5.3 Gaps in 3G wireless broadband internet coverage have been shrinking, but usage gaps remain stubbornly high
[Figure: share of population (%) that is connected, in the usage gap, or in the coverage gap. Panel a: globally over time, 2015–18. Panel b: by region (East Asia and Pacific, Europe and Central Asia, Latin America and the Caribbean, Middle East and North Africa, North America, South Asia, Sub-Saharan Africa), 2018.]
Sources: WDR 2021 team, based on 2015 and 2018 data in ITU (2018a). Data at http://bit.do/WDR2021-Fig-5_3.
In 2019 5G technology became commercially available in 23 high-income economies and China, with a global coverage gap of 95 percent by the end of the first year. The new 5G technology is revolutionary because of both its exceptionally high speed of 1,000 megabits per second and its greatly enhanced
capacity to transmit a large amount of sensor-based data from the IoT in near real time, offering numerous applications across different economic sectors. Forecasts suggest that one-third of mobile subscriptions could be 5G by 2025.21
This Report finds that under current conditions, 5G stand-alone technology (that is, technology not dependent on 4G for signaling) does not seem to be broadly viable across low- and middle-income countries, outside of major urban areas. However, developing 5G non–stand-alone technology as an incremental evolution of 4G greatly improves its viability. It could then become a cost-effective way to extend population coverage in densely populated middle-income countries once data traffic grows to the point that large numbers of users are demanding many gigabytes of data per month.22 Viability could also be greatly improved by adopting regulations that promote sharing of infrastructure and policies that limit the burden of taxes and spectrum license fees on investors.23
Past investments in fiber-optic backbone networks and cellular towers (under 3G) have had a significant impact on the viability of the newer wireless technologies (such as 4G and 5G). Countries at an earlier stage of infrastructure development will find it challenging to leapfrog ahead, but for others investments in the fiber-optic backbone will continue to provide payoffs as countries upgrade to more advanced technologies.
The coverage gap reflects the lack of commercial viability associated with serving remote populations in the absence of any government intervention. Closing this gap calls for concerted efforts to drive down the cost of service provision, as well as better design of government policies on universal service access.
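The generational peak speeds cited above can be made concrete by computing best-case download times for the two-hour movie described earlier in the chapter. The sketch below is a minimal illustration under stated assumptions. It reads the quoted 3G, 4G, and 5G speeds (42, 400, and 1,000 per second) as megabits per second, the conventional unit for link speeds, and it uses decimal units. The function name is hypothetical. Real throughput is usually well below peak, so actual times would be longer.

```python
# Peak speeds quoted in the text, read as megabits per second (Mbit/s).
PEAK_SPEED_MBPS = {"3G": 42, "4G": 400, "5G": 1_000}

# Sizes quoted in the text for a two-hour movie, in gigabytes.
MOVIE_GB = {"standard definition": 1.4, "ultra-high definition": 18}

BITS_PER_BYTE = 8
MEGABITS_PER_GIGABIT = 1_000  # decimal (SI) units


def download_seconds(size_gb: float, speed_mbps: float) -> float:
    """Best-case time in seconds to download `size_gb` gigabytes at `speed_mbps`."""
    size_megabits = size_gb * BITS_PER_BYTE * MEGABITS_PER_GIGABIT
    return size_megabits / speed_mbps


if __name__ == "__main__":
    for quality, size in MOVIE_GB.items():
        for generation, speed in PEAK_SPEED_MBPS.items():
            minutes = download_seconds(size, speed) / 60
            print(f"{quality:>22} over {generation}: {minutes:6.1f} min")
```

At these peak rates, the ultra-high-definition file takes close to an hour over 3G but under three minutes over 5G.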
Among the main policy measures to cut costs are those aimed at strengthening competition in the sector, enabling the sharing of infrastructure, improving the availability and affordability of the wireless spectrum, and exploiting new technologies.
Reducing retail costs. Individuals’ access to reliable high-speed data services depends on both extensive last-mile coverage and proximity to the national fiber-optic backbone infrastructure. Limited retail competition can allow high profit margins that inflate the charges customers pay for last-mile access.24 In addition, costs may be relatively high due to limited electricity coverage in outlying areas, forcing operators to rely on their own higher-cost diesel power generation for base stations instead of drawing energy from the public grid.25 Meanwhile, recent innovations in wireless cellular technologies may reduce the cost of last-mile rollout. For example, in Japan Rakuten recently achieved 40 percent reductions in the cost of traditional cellular networks through migration to a cloud-based, software-driven environment.26
Introducing fiber-optic backbone competition. The closer users are to backbone infrastructure, the stronger the mobile signals and the faster the data download speeds. In Africa, 45 percent of the population lives more than 10 kilometers away from fiber-optic network infrastructure, a higher share than in any other region.27 Relative to microwave links, fiber-optic backbones offer greater carrying capacity at higher speeds. Yet many countries in Africa still need to upgrade from microwave to fiber-optic technology: an estimated 250,000 kilometers of fiber-optic cable are needed across the continent.28 Deployment of fiber-optic cable can cost as much as US$70,000 per kilometer,29 a high entry barrier for building national fiber-optic backbones.
As a result, competition is often limited, and, in the absence of regulation, high wholesale prices and limited network development may result. Lack of competition is further exacerbated when backbone infrastructure operators are vertically integrated, providing both wholesale and retail services.30 In the absence of robust competition, some countries have opted for state-owned backbone development, based on vendor financing, but this approach crowds out private investment and unnecessarily adds to the public debt.
Figure 5.4 Globally, the coverage of wireless technologies reflects their constant upgrading
[Figure: coverage (billions of people) of 2G, 3G, and 4G internet service, 2013–18, compared with world population and internet users.]
Source: WDR 2021 team, based on data in ITU (2018a). Data at http://bit.do/WDR2021-Fig-5_4.
A competitive backbone market may be a
preferable alternative, with government taking on a coordinating role, inviting multiple operators to participate, enforcing open access and cost-based pricing, and offering incentives to existing or new operators to invest in less lucrative areas to complete the infrastructure backbone.
Enabling sharing of infrastructure. Another way to increase coverage while keeping costs down is to create a regulatory environment that facilitates the sharing of infrastructure both across sectors and within digital infrastructure markets.31 The cost of broadband transmission and core network deployment can be reduced by using existing railway lines, power transmission grids, and pipelines, or by coordinating with road construction to lay ducts along highways. In emerging markets, and particularly in the poorest countries where demand may be thin and infrastructure costs and the associated risks relatively high, operators could be allowed to share backhaul infrastructure (such as fiber-optic cable) or local facilities (such as communication towers). Sharing of infrastructure has great potential to accelerate digital connectivity. Recent estimates suggest that the cost of deploying 5G mobile network technology could be reduced by more than 40 percent by sharing antenna sites.32 However, the tensions between promoting competition and enabling cooperation in the market for digital infrastructure must be carefully balanced, with cooperation encouraged only in market segments that cannot efficiently support more than one operator.
Improving the availability and affordability of the spectrum. Making adequate spectrum available at relatively low cost is important for reducing coverage gaps. Low-frequency spectrum is attractive for rural areas because it provides wider coverage, requiring a lower density of cellular towers to cover a given area and reducing investment costs.
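The link between frequency and tower density can be illustrated with the free-space path loss formula. This is a deliberately simplified propagation model that ignores terrain, buildings, and antenna details, and real network planning relies on empirical models instead. Under that assumption, for a fixed link budget the maximum cell radius scales inversely with frequency, so the number of towers needed to cover an area scales with the square of frequency. The sketch compares 700 MHz, a band commonly freed by the switch from analog to digital television, with 2,100 MHz. The 120 dB link budget and the function names are hypothetical.

```python
import math


def max_range_km(freq_mhz: float, max_path_loss_db: float) -> float:
    """Distance at which free-space path loss reaches `max_path_loss_db`.

    Solves FSPL(dB) = 20*log10(d_km) + 20*log10(f_MHz) + 32.44 for d_km.
    """
    exponent = (max_path_loss_db - 32.44 - 20 * math.log10(freq_mhz)) / 20
    return 10 ** exponent


def cells_needed(area_km2: float, cell_radius_km: float) -> float:
    """Hexagonal cells of the given radius needed to tile `area_km2`.

    Returned as a fraction; a real plan would round up and allow overlap.
    """
    hex_area_km2 = 3 * math.sqrt(3) / 2 * cell_radius_km ** 2
    return area_km2 / hex_area_km2


if __name__ == "__main__":
    link_budget_db = 120  # hypothetical maximum tolerable path loss
    r_low = max_range_km(700, link_budget_db)
    r_high = max_range_km(2_100, link_budget_db)
    print(f"radius ratio 700 vs 2,100 MHz: {r_low / r_high:.2f}")
    print(f"tower ratio 2,100 vs 700 MHz: "
          f"{cells_needed(1_000, r_high) / cells_needed(1_000, r_low):.2f}")
```

Under this idealized model, a 700 MHz cell reaches three times as far as a 2,100 MHz cell, so about one-ninth as many towers cover the same area. The absolute free-space ranges are far too optimistic, but the ratio conveys why low-frequency spectrum matters for rural coverage.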
Governments have often delayed the migration from analog to digital television, which releases coveted low-frequency spectrum for wireless broadband use. Some governments auction frequencies with elevated reserve prices that raise investment costs, which are then passed on to users through higher prices. For example, in Senegal operators boycotted the 4G spectrum auction because of the high reserve price of CFAF 30 billion (US$49.86 million).33 Other governments charge recurring fees for the use of spectrum, raising the cost of deploying infrastructure in rural areas.
Exploiting new technologies. Emerging niche technologies, such as TV white space (TVWS), high-altitude balloons, and low-orbit satellites, promise to significantly reduce last-mile deployment costs in remote areas, although many have yet to scale up commercially. TVWS uses the buffer frequencies between TV channels to provide broadband internet access. It is already being used successfully in Colombia to connect rural schools and coffee plantations in geographically challenging locations such as mountainous rainforests.34 Two innovative solutions that have been proposed to reach remote rural areas are high-altitude platform station (HAPS) systems, which use a network of high-altitude balloons to provide unserved locations with connectivity,35 and low-Earth orbiting (LEO) satellites. Iridium, which launched its LEO satellite service in 1998, today has slightly more than 1 million subscribers, mainly in niche markets such as the maritime and aviation sectors, emergency services, and oil and gas.36 Yet neither HAPS nor LEO satellites have proved that they can provide direct consumer broadband access in rural areas on a sustainable basis at an affordable price.
Reforming universal service funds. Adopting these approaches to driving down costs can substantially expand the coverage attainable on a commercially viable basis.
Nonetheless, some remote pockets will not reach universal access without some form of state support. Many countries have created universal service funds to harness public resources to subsidize infrastructure rollout in unserved areas. These funds are typically financed by obligatory levies charged on operators. However, for a variety of reasons many of these funds have proven unsuccessful (Kenya is one of the few exceptions in Africa).37 Funds often suffer from poor design, lack of spatial planning to guide fund allocations, a mismatch between funds collected and disbursed, political interference, and failure to incorporate sustainability factors such as training and education, maintenance, and energy supply.38 For example, in Africa more than US$400 million in universal service funds has not been disbursed.39 A study of countries with universal service funds in the Asia-Pacific region found that they did not experience higher internet growth than countries without funds, except for Malaysia and Pakistan, where the funds were transparent and efficient and targeted extension of the national fiber-optic backbones.40
Tackling the usage gap
Of the 3.8 billion people not using the internet in 2018, 3.1 billion lived within range of a wireless broadband signal.41 Government efforts to provide universal service access have traditionally focused on eliminating the coverage gap by rolling out the supply of infrastructure, but such policies should increasingly be oriented toward addressing the demand-side barriers that limit service uptake and account for such a sizable usage gap.
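Because the coverage gap, the usage gap, and the connected population partition the whole population, they can be computed from three counts. The sketch below is a minimal illustration with hypothetical function names. Its example uses shares per 100 people consistent with the 2018 figures cited in this chapter (8 percent uncovered and a 42 percent usage gap), plus the 3.1 billion of 3.8 billion offline people who lived within range of a signal. The consumption gap is not modeled here because it depends on data volumes rather than head counts.

```python
from dataclasses import dataclass


@dataclass
class ConnectivityGaps:
    connected: float     # share of the population using the internet
    usage_gap: float     # share covered by a broadband signal but not using it
    coverage_gap: float  # share not covered by any broadband signal


def decompose(population: float, covered: float, users: float) -> ConnectivityGaps:
    """Split a population into the three mutually exclusive groups.

    Assumes every internet user lives within coverage, so
    0 <= users <= covered <= population, with population > 0.
    """
    if population <= 0 or not 0 <= users <= covered <= population:
        raise ValueError("expected 0 <= users <= covered <= population, population > 0")
    return ConnectivityGaps(
        connected=users / population,
        usage_gap=(covered - users) / population,
        coverage_gap=(population - covered) / population,
    )


def offline_within_coverage(offline: float, offline_covered: float) -> float:
    """Share of people not using the internet who nonetheless have a signal."""
    return offline_covered / offline


if __name__ == "__main__":
    print(decompose(population=100, covered=92, users=50))
    print(f"{offline_within_coverage(3.8, 3.1):.2f}")  # more than four in five
```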
Targeting the most critical underserved segments. Although traditional universal service policies have largely focused on directing public support to underserved rural communities, there is considerable scope to target demand-side policy measures to particular categories of underserved individuals irrespective of their location. A suite of large-sample household surveys conducted in 22 developing countries in 2017/18 reveals that people who do not connect to broadband service even when it is available are significantly more likely to be poor (in the bottom 40 percent of the national income distribution), less educated (having only a primary education), elderly (over 50 years old), and female.42 Of these factors, the largest effect is associated with education: completion of primary education adds 35 percentage points to the likelihood of internet uptake.
Also noteworthy is the significant gender digital divide. Globally, some 250 million fewer women than men use the internet. In low-income countries, only one in seven women is online, compared with one in five men.43 Women are somewhat more likely than men to be challenged by digital literacy issues and to face additional obstacles to being online. For example, in many countries lack of family approval for women owning a cellphone is a major barrier.44
Broadly, three reasons have been put forward to explain the usage gap in low- and middle-income countries. First, people find it difficult to afford a mobile device or data services. Second, they lack the digital literacy needed to use the internet.45 Third, they do not see internet services as offering any content or applications relevant to their lives.
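Survey shares like those reported for these three reasons come from grouping individual answers into categories. The sketch below is a minimal illustration of that tabulation. It assumes each respondent gives one main reason. The response wording comes from the note to figure 5.5, and any answer not listed there is counted as "Other". The function name and the sample answers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Response wording per category, as listed in the note to figure 5.5.
CATEGORY_BY_RESPONSE = {
    "Do not know what internet is": "Digital literacy",
    "Do not know how to use internet": "Digital literacy",
    "No access device": "Affordability",
    "Too expensive": "Affordability",
    "No interest/not useful": "Relevance",
    "No relevant content in local language": "Relevance",
}
CATEGORIES = ("Digital literacy", "Affordability", "Relevance", "Other")


def reason_shares(responses: Iterable[str]) -> Dict[str, float]:
    """Share of respondents in each reason category (one reason per respondent)."""
    counts = Counter(CATEGORY_BY_RESPONSE.get(r, "Other") for r in responses)
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    # Counter returns 0 for categories nobody chose, so every key is present.
    return {category: counts[category] / total for category in CATEGORIES}


if __name__ == "__main__":
    sample = ["Too expensive", "Do not know how to use internet",
              "Do not know what internet is", "Prefer not to say"]
    print(reason_shares(sample))
```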
The household surveys conducted in 22 developing countries in 2017/18 found that the reasons most frequently cited by people for not taking up data services are related to digital literacy (69 percent), followed by affordability concerns (15 percent) and relevance issues (12 percent); see figure 5.5.46 Digital literacy limitations appear to dominate in South Asia, whereas affordability concerns are more prominent in some African countries. These constraints are not, however, mutually exclusive. A person who becomes digitally literate, and therefore more motivated to access the internet, may then face affordability challenges that were not relevant before. Individuals are also attracted to using the internet when family members or friends do so, particularly when it comes to social media. Analysis conducted for this Report found that social network effects have a significant positive impact on the usage of wireless internet in low- and middle-income countries. Individuals whose five closest friends are using an online social network are 63 percent more likely to use the internet than those whose closest friends are not already active on social media.47

Figure 5.5 In low- and middle-income countries, nearly 70 percent of those who do not use the internet are held back by deficiencies in digital literacy
[Bar chart: share of respondents (%) citing digital literacy, affordability, relevance, or other reasons for not using the internet, for Nepal, Colombia, India, Bangladesh, Pakistan, Lesotho, Senegal, Peru, Guatemala, Cambodia, Paraguay, Nigeria, Ghana, Ecuador, Sri Lanka, Kenya, Argentina, Uganda, Mozambique, Tanzania, Rwanda, South Africa, and the total.]
Source: Chen 2021. Data at http://bit.do/WDR2021-Fig-5_5.
Note: Respondents to the survey conducted for this Report had access to internet service. Responses to the digital literacy category included “Do not know what internet is” and “Do not know how to use internet.” Responses to the affordability category included “No access device” and “Too expensive.” Responses to the relevance category included “No interest/not useful” and “No relevant content in local language.”

Addressing the widespread problem of digital literacy. In the 2017/18 household surveys, digital literacy was the most fundamental reason given for not using the internet. More than 84 percent of those surveyed who were either uneducated or had completed only a primary school education stated they “do not know what internet is” or “do not know how to use internet.”48 Various initiatives are under way to teach basic digital skills. Mobile industry association GSMA has developed a Mobile Internet Skills Training Toolkit based on a “train the trainers” approach.49 Results from a pilot project in Bangladesh found that mobile internet usage among the beneficiary group more than tripled, with 19 percent of group members becoming regular mobile data users.50 In early 2017, the Rwandan government launched the Digital Ambassadors Program, which trained 5,000 youth posted to all 30 districts in the country to provide digital skills training to 5 million Rwandans over a four-year period.51 Field studies conducted in Burkina Faso, Mali, Senegal, and Tanzania found that audio and icon-based interfaces and a stripped-down version of the internet (“internet lite”) helped students overcome their digital literacy limitations.52 Despite these examples, there is little evidence that digital literacy programs are operating at the scale needed to significantly improve the uptake of data services, or that they are being suitably integrated with efforts to address the more fundamental underlying problem of basic literacy. Once people become digitally literate, a key determinant of using the internet is availability of local language content.53 Social media usage grows rapidly as the relevant apps become available in local languages, which tends to happen much sooner than for other internet content.54

Making digital devices more affordable. Poor people wishing to avail themselves of internet access must first be able to afford a mobile device.
However, according to one study, the cost of even an entry-level device exceeds 20 percent of monthly income in more than half of low- and middle-income nations.55 Another study found that the cost of a low-end US$42 smartphone is more than 80 percent of the monthly income in low-income countries.56 Efforts are under way to make entry-level internet devices more affordable. Mobile operators are creating partnerships to obtain inexpensive handsets or are bundling mobile phones with subscriptions. Pan-African operator MTN collaborated with China Mobile to launch a US$20 smartphone targeted at 10 million first-time users.57 In India, Jio offers an internet-enabled phone for Rs 699 (US$9.21), provided the customer spends at least Rs 1,500 (US$19.77) a year on service charges.58 Although most branded phones are manufactured in East Asia, several countries have created assembly plants to manufacture inexpensive mobile phones locally. In Ethiopia, a Chinese company is assembling about 1 million phones a year for export throughout the region.59 In Costa Rica and Malaysia, universal service funds have been used to subsidize internet devices for low-income users.60 Taxes, import duties, and other fees also affect device affordability. Despite the low purchasing power of their populations, low-income countries impose the highest customs duties on mobile phones, adding 7 percent to prices on average.
One study found that several mainly low- and middle-income countries applied handset excise taxes (beyond the regular sales tax) and activation fees.61 Ownership is also affected by substantial gender gaps in low- and middle-income countries; the share of men owning mobile devices is 20 percentage points higher than the share of women.62

Narrowing the consumption gap

Even among people who connect to the internet and subscribe to data services, a wide consumption gap remains in wireless data usage across country income levels and regions, with data usage per capita in high-income countries more than 30 times higher than in low-income countries (figure 5.6). The consumption gap is even wider if fixed broadband is considered. The number of fixed broadband subscriptions is much higher in high-income economies, and because of more favorable data plans, these subscriptions support much higher levels of consumption than mobile subscriptions, potentially as much as 100 times more.63 The consumption gap raises questions about how much data are “enough” to meet basic social and economic needs. In 2019 the Alliance for Affordable Internet (A4AI) stated that 1 gigabyte of data per month was sufficient to benefit from the internet in a meaningful way,64 but it later revised its estimate of “meaningful connectivity” to unlimited access as a result of the burgeoning use of data during the COVID-19 pandemic.65 Based on a detailed empirical examination of data consumption patterns, this Report estimates that 660 megabytes per month is adequate to meet basic needs for e-government services, online shopping, browsing news, medical and educational information, and the like, rising steeply toward 6 gigabytes per month if a certain amount of social media and video-related usage is also included.66 Two fundamental drivers of low data consumption are the struggle to afford data usage charges and the technical constraints on network performance.
These two drivers can be related. Problems with affordability translate into lower usage, which, in turn, means lower revenue streams and weaker incentives to invest in better network performance.
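The size of the consumption gap can be checked against the per capita averages reported in figure 5.6 (panel a). The sketch below, a minimal illustration assuming decimal units (1 GB = 1,000 MB), computes the high-income to low-income ratio and expresses each group's average as a multiple of the Report's 660-megabyte basic-needs estimate. Because per capita averages include non-users, the comparison with a per-user benchmark is indicative only.

```python
"""Compare mobile data consumption across income groups (2018 figures from figure 5.6a)."""

# GB per capita per month, by country income group (figure 5.6, panel a).
CONSUMPTION_GB = {
    "High-income": 7.1,
    "Upper-middle-income": 2.8,
    "Lower-middle-income": 1.3,
    "Low-income": 0.2,
    "World": 3.7,
}
BASIC_NEEDS_MB = 660  # Report's estimate of data adequate for basic needs (MB per month)
MB_PER_GB = 1_000     # assumption: decimal units


def consumption_ratio(data: dict[str, float], top: str, bottom: str) -> float:
    """How many times more data the `top` group consumes than the `bottom` group."""
    return data[top] / data[bottom]


def basic_needs_multiple(gb_per_month: float) -> float:
    """Average consumption expressed as a multiple of the basic-needs benchmark."""
    return gb_per_month * MB_PER_GB / BASIC_NEEDS_MB


ratio = consumption_ratio(CONSUMPTION_GB, "High-income", "Low-income")
print(f"High-income vs low-income: {ratio:.1f} times")
for group, gb in CONSUMPTION_GB.items():
    print(f"{group:<20} {gb:>4.1f} GB = {basic_needs_multiple(gb):5.2f} x basic needs")
```

On these averages, per capita consumption in the low-income group falls below the basic-needs benchmark, while the high-income group's average exceeds it more than tenfold.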
Tackling affordability constraints. Many internet users in low- and middle-income nations limit their mobile data usage because of affordability constraints (figure 5.7). A survey carried out in 11 emerging countries found that a median of 48 percent of respondents had difficulty paying for their mobile data usage, and 42 percent frequently or occasionally restrict the amount of data they use.67 Instead of purchasing large amounts of data on a monthly basis, users buy it in small amounts when they have the money. Many mobile operators offer a variety of data bundles to cater to this pattern. MTN Zambia, for example, has 17 prepaid data plans, ranging from one-hour plans that include 5 megabytes of data to weekly bundles offering unlimited access to popular social media applications.68 What constitutes an affordable level of expenditure on data services has been the subject of some debate. The Alliance for Affordable Internet established a normative affordability threshold of 2 percent of monthly income linked to a normative consumption threshold of 1 gigabyte per month.69 This threshold was subsequently adopted by the UN Broadband Commission.70 According to these norms, data services could be considered generally affordable to the average consumer, except in low-income countries. In reality, however, the actual expenditure on data services (known as average revenue per user, or ARPU) and the associated data consumption levels both fall well below these norms. In fact, only when the cost per gigabyte of data drops below 0.5 percent of gross national income (GNI) per capita does data consumption reach and eventually exceed the 1 gigabyte threshold (figure 5.8). The amount of data that people can afford to consume is itself a function of the prices that operators offer across different markets, as well as over time.
In India, rapid entry of mobile operators offering 4G service in 2016 boosted coverage from 4 percent in 2015 to 94 percent in 2018.71 Intensifying competition led to a dramatic price drop from US$4.41 to US$0.17 per gigabyte per month from 2014 to 2018 and a surge in consumption per subscriber from 0.3 to 7.7 gigabytes per month over the same period.72 Similarly, in Cambodia intense competition has brought down the cost per gigabyte of data from US$4.56 in 2013 to one of the world’s lowest, US$0.13, in 2019, driving up data consumption to 6.9 gigabytes per capita per month, the highest mobile data usage per capita of any low- or lower-middle-income nation. This increase was achieved through a combination of measures such as migrating spectrum and users to 4G to achieve lower operating costs, outsourcing construction work, and moving software to the cloud.73 As a result of the low data charges, Cambodian consumers devote 96 percent of their information and communication technology (ICT) spending to data services, having largely moved away from voice and text.

Figure 5.6 Inequities in mobile data consumption across country income groups and regions are huge
a. By country income group
Income group | Data consumption (GB per capita per month)
High-income | 7.1
Upper-middle-income | 2.8
Lower-middle-income | 1.3
Low-income | 0.2
World | 3.7
b. By region
Region | Data consumption (GB per capita per month)
Middle East and North Africa | 7.4
Europe and Central Asia | 5.0
North America | 4.5
East Asia and Pacific | 4.2
Latin America and the Caribbean | 2.4
South Asia | 1.1
Sub-Saharan Africa | 0.3
Source: WDR 2021 team. Data at http://bit.do/WDR2021-Fig-5_6.
Note: Data are for 2018. Figures include averages of 119 economies with data. GB = gigabytes.

Interestingly, users in low-income countries typically spend much more on voice and text services (between 2 and 4 percent of monthly income) than they do on data services. Because traditional voice and text can alternatively be provided by over-the-top (OTT) data-based services, which bypass traditional distribution and use only a small amount of data, users could substantially reduce their overall ICT expenditure by substituting data for voice and text services.

Addressing technical constraints. Slow speeds also discourage consumers from using more data. Downloading 250 megabytes takes 17 seconds at a speed of 100 megabits per second, but as long as three minutes at a speed of 10 megabits per second.74 The speed dividends arising from migration to the next generation of mobile technology clearly drive higher levels of data consumption. For example, in India during 2018 a 2G subscriber consumed just 0.5 gigabytes per month, rising to 5.3 gigabytes per month for a 3G subscriber and 9.7 gigabytes per month for a 4G subscriber.75

Regulatory policies and retail competition drive migration to higher-generation mobile technologies. Transitions to next generations can be encouraged by early release of competitively priced spectrum. The auction of 3G spectrum in Thailand in 2012 was designed to support rapid upgrade from 2G by incorporating license conditions for coverage, pricing, and quality. Within two years, all license conditions were exceeded, with nearly universal 3G coverage, upgrading of three-quarters of subscriptions, lower prices, and service speeds exceeding license requirements tenfold.76 Allowing operators to “refarm” their spectrum holdings can also accelerate migration to next-generation mobile.
In contrast to regulatory policies that tie new mobile technologies to specific frequencies that are then auctioned, refarming allows operators to launch new mobile technology using their own spectrum. This encourages them not only to be more efficient in their use of spectrum, but also to rapidly upgrade their existing subscribers to reclaim the spectrum used for the older technology. About three-quarters of the spectrum used for 4G around the world has been refarmed: 25 percent from the switch from analog to digital television broadcasting and 50 percent from operators using their own spectrum.77 Nonetheless, many operators try to recapture their original capital expenditure before upgrading to a new generation of wireless technology.78 To overcome this issue, Rwanda created a public-private partnership to roll out a 4G/LTE (Long-Term Evolution) network that covered almost 99 percent of the population by the end of 2019, the highest level in the world. However, because of the high cost of obtaining a 4G/LTE-compatible device, as well as technical challenges with the migration of voice services, service uptake has been modest, with most Rwandans continuing to use slow 2G technology for mobile data.79

Figure 5.7 The monthly price for 1 gigabyte of data is unaffordable in low-income countries
[Bar chart: data cost (% of GNI per capita) for high-income, upper-middle-income, lower-middle-income, and low-income countries and the world, with the affordability threshold marked.]
Source: Cable.co.uk, Worldwide Mobile Data Pricing 2020: The Cost of 1GB of Mobile Data in 228 Countries (dashboard), https://www.cable.co.uk/mobiles/worldwide-data-pricing/. Data at http://bit.do/WDR2021-Fig-5_7.
Note: Data are as of 2018. The affordability threshold is 2 percent of monthly income. Prices are the median prices of the economies in the group. GNI = gross national income.

Figure 5.8 Data consumption is very sensitive to market prices and service affordability
[Scatter plot: GB usage per month (vertical axis) against average cost per user per GB as a percentage of GNI per capita (horizontal axis); R² = 0.7637.]
Source: WDR 2021 team. Data at http://bit.do/WDR2021-Fig-5_8.
Note: Each circle represents a country. GB = gigabyte; GNI = gross national income.
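The two constraints discussed in this section, price and speed, reduce to simple arithmetic. The sketch below expresses a per-gigabyte price as a share of monthly GNI per capita and tests it against the 2 percent affordability threshold, then computes idealized download times. The annual GNI per capita of US$800 is a hypothetical value chosen for illustration; the India prices are those cited in the text; and the download calculation assumes speeds in megabits per second with no protocol overhead.

```python
"""Affordability and download-time arithmetic for mobile data (illustrative sketch)."""

AFFORDABILITY_THRESHOLD_PCT = 2.0  # A4AI / UN Broadband Commission threshold for 1 GB
BITS_PER_BYTE = 8


def cost_share_pct(price_per_gb: float, gni_per_capita: float, gb: float = 1.0) -> float:
    """Monthly cost of `gb` gigabytes as a percentage of monthly GNI per capita."""
    monthly_income = gni_per_capita / 12
    return 100 * price_per_gb * gb / monthly_income


def is_affordable(price_per_gb: float, gni_per_capita: float) -> bool:
    """True if 1 GB per month costs no more than the 2 percent threshold."""
    return cost_share_pct(price_per_gb, gni_per_capita) <= AFFORDABILITY_THRESHOLD_PCT


def download_seconds(size_megabytes: float, speed_mbps: float) -> float:
    """Idealized transfer time: megabytes converted to megabits, divided by megabits per second."""
    return size_megabytes * BITS_PER_BYTE / speed_mbps


GNI_PER_CAPITA = 800.0  # hypothetical annual GNI per capita (US$)
for year, price in [(2014, 4.41), (2018, 0.17)]:  # India, US$ per GB (from the text)
    share = cost_share_pct(price, GNI_PER_CAPITA)
    print(f"{year}: {share:.2f}% of monthly income, affordable={is_affordable(price, GNI_PER_CAPITA)}")

for speed in (100, 10):
    print(f"250 MB at {speed} Mbps: {download_seconds(250, speed):.0f} s")
```

With the hypothetical income, 1 GB at the 2014 price costs roughly 6.6 percent of monthly income, well above the threshold, whereas the 2018 price falls to about a quarter of a percent. The idealized download times (20 seconds and a little over three minutes) are of the same order as the figures cited in the text; real transfers are slower once protocol overhead and congestion are included.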
Connecting poor countries

The high cost and low speed of internet services have emerged as key constraints on data consumption in the developing world. One reason is that many low-income countries lack their own domestic data infrastructure, relying instead on overseas facilities to exchange data (via internet exchange points, or IXPs), store data (at colocation data centers), and process data (on cloud platforms). This reliance requires them to transfer large volumes of data in and out of the country (see “tromboning” in figure 5.9), for which they pay a substantial penalty: prices that are several times higher than those in countries with their own infrastructure. They also experience speeds that can be an order of magnitude slower. This situation can be avoided by creating IXP infrastructure at the national level, eventually complemented by colocation data centers.

Consider a user who wants to view an educational video online. The request is uploaded as a small packet of data with address information and goes from the user’s device to the national backbone and onward to the internet service provider (ISP). Often in lower-income countries, the video is not available domestically, obliging the ISP to route the request overseas, where it finds its destination, say, in California. The video is then downloaded back to the user. Such a circuitous process for accessing content incurs significant charges from international carriers while degrading service quality. The same operation would be much faster and cheaper if a replica of this content were stored at a local colocation data center that could be accessed via a local IXP.

This example illustrates how international bandwidth is a critical part of the data infrastructure, enabling data to be sent to and retrieved from anywhere in the world. Global internet bandwidth stood at 463 terabits per second in 2019, almost tripling from 2015.
Sub-Saharan Africa had the fastest growth in bandwidth of any region over the 2015–19 period. It grew by 53 percent a year, reflecting a large increase in capacity from the deployment of new submarine cables. However, Sub-Saharan Africa continues to lag other regions in total capacity.

Most international internet traffic is carried over the world’s dense web of some 400 undersea fiber-optic cables, spanning more than 1 million kilometers.80 Almost all coastal economies are now connected to undersea cables (map 5.1). Submarine cable ownership has diversified from consortiums of telecommunication operators to include wholesale operators and, increasingly, big content providers such as Amazon, Google, and Microsoft.81 Notably, Facebook recently announced plans to lay the 2Africa submarine cable around Africa. It will have nearly three times the capacity of all the undersea cables currently serving the continent.82 The growing convergence of content provision and carriage of content will require greater regulatory oversight to ensure that carriage is provided in an open, nondiscriminatory manner.

Figure 5.9 The presence of domestic data infrastructure facilitates national data exchanges
[Diagram: two end users served by domestic ISP 1 and domestic ISP 2. In the domestic data flow, local content is exchanged between the ISPs at the IXP; in the international data flow (“tromboning of local content”), the same traffic is routed via international carrier 1 and international carrier 2.]
Source: WDR 2021 team.
Note: ISP = internet service provider; IXP = internet exchange point.

Before establishing a submarine cable connection, countries used costly, low-capacity satellite links. Connection to submarine cables has dramatically lowered wholesale international bandwidth prices. Results in Africa over the last decade have been dramatic, with the price of 1 megabit per second dropping from US$3,500 to US$29 in Mauritania and from US$1,174 to US$73 in Togo.83 In Tonga, the submarine cable increased capacity by more than 100 times, while prices dropped from US$495 to US$155 per megabit per second.84 Nonetheless, restrictive policies for access to submarine cable landing stations may prevent the full benefits of this reduction in wholesale prices from feeding through into retail tariffs.

Despite such progress, huge price differences persist for the exchange of data traffic. For example, the cost of exchanging data is around US$0.45 per megabit per second in North America and US$0.62 per megabit per second in London, compared with US$2.38 in São Paulo and US$5.00 in Johannesburg.85 These stark differences in costs may in part reflect the limited development of domestic data infrastructure in low- and middle-income countries.

Strengthening data infrastructure

Without a domestic capability to exchange data, countries are totally reliant on international bandwidth. As noted, such bandwidth is expensive and slower than exchanging traffic locally. Such reliance also affects service resilience, since a country is completely shut off from the internet if there is any disruption to international bandwidth. For example, after a trawler snapped a submarine cable in 2018, Mauritania was offline for two days, and nine other West African countries experienced internet outages.86 Although there will always be a need for international bandwidth, an appropriate balance is needed between relying on overseas infrastructure and developing domestic facilities.87

The economic case for domestic data infrastructure hinges on whether the present value of the resulting cost savings and speed improvements for data transactions over the life of such infrastructure exceeds the immediate up-front investment in facilities. The cost of developing IXPs is relatively modest and likely can be supported even in nascent markets so long as the sector is not monopolistic. For colocation data centers, the investments are more sizable.
There are also significant scale economies in developing the associated power infrastructure, which may account for as much as 40 percent of investment costs. The operating expenses are also largely fixed; about half of them are related to energy for cooling the facilities. Because exceptionally high levels of reliability and security are needed for colocation data centers, market dynamics favor hyperscale service providers with established reputations. This requirement further reinforces the case for larger-scale facilities in countries that have a relatively stable investment climate, including low levels of disaster risk, and the availability of clean, reliable, and cost-effective sources of energy or natural sources of cooling such as water bodies.

Map 5.1 The global fiber-optic cable submarine network reaches all corners of the world, but data infrastructure is unevenly developed
[Map legend: colocation data centers; internet exchange points (IXPs); submarine cable landing stations; submarine cables.]
Sources: PeeringDB, Interconnection Database, https://www.peeringdb.com/; PCH Packet Clearing House, Packet Clearing House Report on Internet Exchange Point Locations (database), accessed December 14, 2020, https://www.pch.net/ixp/summary; TeleGeography, Submarine Cables (database), https://www.submarinecablemap.com/. Data at http://bit.do/WDR2021-Map-5_1.
IBRDWLD45643 | MARCH 2021

Creating internet exchange points. By keeping data traffic in the country, IXPs can reduce reliance on international bandwidth, lowering costs and improving performance. One study covering Latin America noted that “local bits” are cheaper than “exported bits,” finding that the region spent around US$2 billion a year on international bandwidth, a sum that could be reduced by one-third through greater use of IXPs.88 IXPs reduce the time it takes to retrieve data, enhancing user engagement. In Rwanda, it is 40 times faster to access a locally hosted website (5 milliseconds) than one hosted in the United States or Europe (200 milliseconds).89 As of June 2020, there were 556 IXPs across the globe.90 Europe, with the largest number, accounts for 37 percent of the world total, while Africa has just 9 percent and accounts for less than 2 percent of global IXP traffic, although that traffic is growing rapidly.91 Stark differences in the availability of IXPs are evident across country income groups, particularly when population differences are taken into account.

IXPs are often established initially by universities or as nonprofit associations of ISPs, located in small server rooms with technical tasks carried out by volunteers. As greater volumes of traffic are exchanged and new participants join, a more sustainable technical and operational environment is needed. Governance arrangements are then formalized, staff hired, and equipment upgraded. Eventually, the IXP grows to the point where many participants want to join without having to deploy a physical connection to the exchange. This leads to the creation of multiple IXPs in different locations, with the central IXP relocated to a colocation data center. For example, DE-CIX, an IXP in Frankfurt, Germany, began operations 25 years ago in an old post office when three ISPs interconnected their networks.92 Today, it is the world’s leading IXP, spread over more than three dozen data centers and linking almost 1,000 participants, with average traffic of more than 6 terabits per second.

Developing colocation data centers. Data centers have emerged as a vital component of the digital infrastructure ecosystem. In a data center, networked computers provide remote storage, processing, and distribution of data. The centers are mainly operated by global information technology (IT) companies, governments, and enterprises that host other companies’ data (colocation data centers). Data centers range from small rooms in organizations, where data are kept on storage devices connected to computer servers, to giant warehouse-like facilities where thousands of servers are arranged on racks. Colocation data centers offer companies multiple advantages, including the physical space to store a growing amount of data, the security associated with high industry reliability standards (they almost never shut down), and easy internet access due to their growing association with IXPs.93

Globally, some 3,700 data centers are connected to the internet.94 The disparities in data center penetration among country income levels are wide, particularly when taking population differences into account (figure 5.10). Although there are more than three data centers per million inhabitants in North America, the ratio is only 0.8 per million in South Asia and Sub-Saharan Africa. In fact, there are more data centers in the state of California than in all of Sub-Saharan Africa. These disparities may be related to lower income and thus demand, but they also reflect shortcomings in the investment climate. Major colocation data center companies have largely shunned investing in low- and middle-income economies.
This lack of investment is often attributed to a lack of demand, as well as to an aversion to a country’s perceived high risk of natural disasters, unpredictable political environment, barriers to doing business, and unreliable energy and internet infrastructure.95 However, certain large businesses in low-income countries, such as those in the financial sector and other service industries, already have in-house data storage systems that, if aggregated, could create the scale necessary for colocation data centers. Regional players are filling the void left by the large global data center providers. For example, Africa Data Centres (part of the Liquid Telecom Group) has colocation facilities in Kenya, Rwanda, South Africa, and Zimbabwe.

Figure 5.10 Data infrastructure is relatively scarce in low- and middle-income countries
[Bar chart: percentage of countries with data infrastructure (internet exchange points, colocation data centers, cloud on-ramps), by country income group: high-income, upper-middle-income, lower-middle-income, low-income.]
Source: WDR 2021 team, adapted from PeeringDB, https://www.peeringdb.com/. Data at http://bit.do/WDR2021-Fig-5_10.
Note: The figure depicts data centers connected to the internet. Data were extracted in June 2020.

Although some of the scale issues associated with developing colocation data centers could potentially be overcome through regional collaboration around shared facilities, the case for such an approach hinges on three conditions: strong regional fiber-optic network connectivity to ensure that data can be transferred rapidly and reliably to any shared regional data facility; competitive pricing of such data transfers; and regional harmonization of the regulatory framework to support agile cross-border data transfers (as discussed in chapter 7).

Despite mounting concerns about the environmental impact of data centers, there is evidence that the industry is taking aggressive action to curtail emissions and that availability of renewable energy is a factor in attracting investment (see spotlight 5.2).

Climbing the data infrastructure ladder. A country’s development of data infrastructure can be envisioned as a series of stages that over time lower costs and improve performance (figure 5.11).96 The stages progress from having no domestic IXP (stage 0), to establishing an IXP (stage 1), to attracting content providers and deploying data centers that host a diverse group of participants (stage 2), to locating the IXP alongside a colocation data center (stage 3). As countries move up the ladder, more data are exchanged nationally and reliance on costly international bandwidth is reduced, lowering retail prices, raising speeds, and allowing higher data consumption.
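The stage definitions above amount to a simple decision rule. The sketch below encodes them as a Python function and attaches the average prices reported for each stage in figure 5.11. The field names and the exact rule are a hypothetical simplification of the Report's qualitative criteria, not the classification procedure behind figure 5.11.

```python
"""Classify a country on the data infrastructure ladder (hypothetical simplification)."""

from dataclasses import dataclass


@dataclass
class DataInfrastructure:
    has_ixp: bool                    # at least one domestic internet exchange point
    has_non_isp_participants: bool   # e.g., government agencies, content companies, CDNs
    ixp_in_colocation_center: bool   # IXP located alongside a colocation data center
    has_international_content: bool  # major international content/cloud providers at the IXP


# Average prices by stage from figure 5.11 (US$):
# (fixed broadband per megabit per month, wireless broadband per GB per month)
STAGE_PRICES = {0: (140.00, 5.60), 1: (18.30, 4.50), 2: (17.10, 2.20), 3: (4.30, 0.80)}


def ladder_stage(c: DataInfrastructure) -> int:
    """Return the ladder stage (0-3) implied by a country's data infrastructure."""
    if not c.has_ixp:
        return 0  # data exchanged overseas
    if not c.has_non_isp_participants:
        return 1  # IXP exchanges domestic traffic between ISPs only
    if c.ixp_in_colocation_center and c.has_international_content:
        return 3  # IXP colocated in a data center with international content participants
    return 2      # diverse participants, but not yet stage 3


def price_reduction(from_stage: int, to_stage: int, index: int) -> float:
    """Fractional price fall between two stages; index 0 = fixed, 1 = wireless."""
    start, end = STAGE_PRICES[from_stage][index], STAGE_PRICES[to_stage][index]
    return (start - end) / start


example = DataInfrastructure(True, True, True, True)  # hypothetical country profile
stage = ladder_stage(example)
print(f"Stage {stage}; average wireless price US${STAGE_PRICES[stage][1]:.2f} per GB")
print(f"Wireless price fall, stage 0 to 3: {price_reduction(0, 3, 1):.0%}")
print(f"Fixed price fall, stage 0 to 3: {price_reduction(0, 3, 0):.0%}")
```

The stage prices are cross-country averages, so they show an association between domestic infrastructure and lower prices rather than a guaranteed effect; the text itself stresses that market competition determines whether the benefits reach consumers.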
Drawing on industry registries capturing the availability of data infrastructure globally in 2020, it is possible to build a comprehensive picture of domestic data infrastructure in the developing world. At stage 0 are 28 of the countries surveyed for this Report, none of which has an IXP; these include almost half of the low-income country group. Underlying barriers are often responsible for the lack of an IXP: 10 of these countries are small island states where the scale of traffic is insufficient; four are in fragile and conflict-affected situations that impede the development of the data ecosystem; and five are monopolies where the sole national telecom operator is also in effect the IXP.

At stage 1 are 29 countries in which IXPs connect local ISPs. These include more than a third of low- and lower-middle-income countries. These IXPs are often located on the premises of government agencies or at academic institutions, typically in a small server room and in some cases using equipment provided through development assistance. For example, the African Internet Exchange System (AXIS) project, financed by the European Union, established IXPs in 14 African nations.97 The IXPs in this group of nations often generate limited amounts of traffic, sometimes because not all ISPs participate in the IXP. In some countries, regulatory restrictions forbid participants that are not ISPs from joining the IXP.

Figure 5.11 Countries develop domestic data infrastructure in stages
Stage | Description | Countries | Share of low-income group | Share of lower-middle-income group | Average price of fixed broadband (per MB per month) | Average price of wireless broadband (per GB per month)
Stage 0 | Data exchanged overseas | 28 | 48% | 30% | $140.00 | $5.60
Stage 1 | Domestic data between ISPs exchanged at the IXP | 29 | 38% | 36% | $18.30 | $4.50
Stage 2 | Diversity of participants at the IXP and presence of major international content providers | 13 | 14% | 18% | $17.10 | $2.20
Stage 3 | IXP located alongside a colocation data center | 7 | 0% | 16% | $4.30 | $0.80
Source: Comini, Srinivasan, and Minges, forthcoming.
Note: Data provide close to global coverage for the year 2020 and are compiled from a variety of industry sources, including Packet Clearing House, CAIDA, PeeringDB, EURO-IX, and AF-IX. Amounts are in US dollars. GB = gigabyte; ISP = internet service provider; IXP = internet exchange point; MB = megabit.

The 13 countries at stage 2 have IXPs with non-ISP participants. Less than one-fifth of low- and lower-middle-income countries fall into this group. This group generally has numerous and diverse participants in the IXP, including all ISPs, as well as government agencies, local cloud providers, and national content companies, among others. Often, they have a Google Global Cache, a subset of Google’s most popular content,98 as well as content delivery networks (CDNs). However, large international content and cloud providers do not always use the IXP, either because it is not located in a neutral data center or because its governance is not organized according to an open multistakeholder model.

At stage 3 are seven countries with IXPs colocated in data centers with international content participants. There are no low-income countries in this category, and only a minority of lower-middle-income countries. These arrangements often include multiple IXPs located in several data centers to facilitate participation. The Kenya Internet Exchange Point (KIXP) illustrates how this results in a dense network, enabling many participants to exchange data. KIXP is operated by a nonprofit organization representing technology companies, and its board follows international IXP best practices. KIXP has no restrictions on the types of organization that can connect to the exchange. It is located in colocation data centers in Kenya’s two largest cities, Nairobi and Mombasa. Participants include national, regional, and international ISPs; government agencies; financial companies; and international content and cloud providers such as Amazon, Facebook, Google, and Microsoft.

Empirical evidence suggests that the benefits to countries of moving up the data infrastructure ladder are substantial.
The average cost per gigabyte of wireless data per month drops from US$5.60 in countries at stage 0 to US$0.80 in countries at stage 3, while the corresponding cost per megabit for fixed data drops from US$140.00 to US$4.30 per month. Nonetheless, the full benefits of developing domestic data infrastructure become apparent only when the local market is sufficiently competitive. For example, because of its strategic geographic location on the Horn of Africa, Djibouti’s DjIX is a regional hub handling the exchange of considerable volumes of internet traffic. However, even though Djibouti’s data infrastructure provides valuable services to neighboring countries, the monopolistic market structure of the national telecommunications sector does not allow these advantages to be passed on to domestic consumers, who face some of the most unaffordable internet charges in the world.99

Policy makers have an important role to play in shepherding IXPs through these various stages of development. In the early stages, demand remains incipient, and it is not possible to benefit from scale economies in infrastructure development. Governments and academic institutions can help initially with nascent IXPs by providing facilities and resources for training. Encouraging ISPs to exchange traffic locally helps boost demand for data services by reducing the cost of exchange. If needed, governments may have to mandate participation, particularly where dominant operators have been resistant. As their IXPs grow, governments can reduce their role, encouraging IXPs to become self-sustaining. Open IXP policies and multistakeholder governance are important for attracting non-ISPs to the membership, including large content providers.
Government insistence on control over IXP practices discourages private sector investment in the data ecosystem.100 A supportive regulatory environment for IXPs, as well as attention to sound governance practices, should ensure that multiple ISPs as well as universities, large enterprises, and other significant users make full use of the available IXP infrastructure. For example, in 2011 Bolivia legally mandated the creation of a national IXP requiring the participation of local ISPs. However, traffic growth was limited until 2018, when improved governance arrangements incentivized greater reliance on the IXP by local market players.101

Accessing cloud platforms

Just as there is growing reliance on colocation data centers to store data, the processing of data is being handled increasingly by cloud platforms. Cloud platforms essentially enable users to access scalable data storage and computing resources across the internet or other digital networks as and when required. Continual enhancements in cutting-edge computing capabilities, combined with significant improvements in the capacity and speed of processing, transmitting, and storing data, are making cloud computing increasingly important in the delivery of public and private services.102

Cloud platforms offer significant benefits in terms of security, resilience, scale, and flexibility. Security is arguably better on large cloud computing platforms than what many businesses or governments could achieve in-house.103 Strong security features include ongoing data backups, redundant sites, and industry certifications, as well as adherence to national data protection regulations. However, moving data to the
cloud environment also presents new vulnerabilities, such as reduced visibility of assets and operations or the possibility that applications used to access cloud services could be compromised. IT infrastructure becomes more resilient as digital data and computing power become geographically distributed. This resiliency is enhanced by classifying services by region and availability zones and connecting data centers in the same geographic area. Cloud computing is attractive because it is often cheaper to share resources on a common platform than to replicate hardware, software, and storage requirements on individual company sites. Small enterprises can then outsource IT activities that they otherwise would not be able to provide internally, while benefiting from the flexibility of immediate upgrades to the most recent analytics and storage technology.

As broadband connectivity has become more widely available around the world, cloud computing has been growing rapidly, with industry revenues exceeding US$180 billion in 2018, up 27 percent over the previous year.104 A few large companies, almost all of them software and IT services firms based in the United States, dominate the cloud space. These hyperscale providers operate cloud data centers mainly in high-income countries, with just a handful in large middle-income nations such as Brazil and South Africa, though not elsewhere in the developing world. Free cloud services funded by advertising, such as webmail and online social networks, are already widely used in low- and middle-income nations.
Google Docs provides word processing, spreadsheet, and presentation software used by millions around the world.105 IBM offers several free services on its cloud.106 However, sophisticated cloud services such as storage and analysis of vast amounts of data can be costly for developing economies because of the cost of moving data internationally and the resulting sacrifice in terms of speed.

One potential solution is to develop cloud platforms at the regional level by aggregating demand to achieve economies of scale. Regional harmonization of regulations for data security, data protection,107 and data sovereignty could further reduce compliance costs and help induce major cloud providers to locate closer to low- and middle-income countries. For example, in March 2019 Microsoft launched the first data centers from a large cloud provider in Africa, with locations in Cape Town and Johannesburg, South Africa, and potential wider relevance to southern Africa.108

Another approach is for countries with colocation data centers to encourage the creation of “on-ramps” to cloud computing services. These are prevalent in some 80 percent of high-income countries but only in about 10 percent of middle-income countries, such as India and Indonesia, and not at all in low-income countries. Cloud on-ramps are private connections between data centers and cloud providers. They allow clients to interact directly with overseas cloud providers through domestic IXPs located in colocation data centers without needing to use the internet to access cloud services.109 This process provides greater security and reliability because data are not transmitted to the cloud over public infrastructure but rather directly via the on-ramp. Performance in terms of speed is also greatly improved, and costs are significantly lowered, because the cloud provider is responsible for managing and routing the data traffic from the domestic colocation data center to its cloud data center overseas using the on-ramp.
At the same time, cloud services create demand for data centers because some applications require very high speed, which can only be achieved when computational power is located close to the user at the network’s “edge.”110 This description underscores the complementarities between different types of data infrastructure, such as IXPs, colocation data centers, and cloud computing.

Big data analysis is increasingly taking place over distributed cloud networks because the considerable processing power needed is available only on the cloud. Data are stored in one or more places and processed in others. The cloud has also enabled a new collaborative environment for software development in which developers from around the world participate in modifying code. The world’s largest open-source platform, GitHub, hosts more than 100 million repositories used by 50 million developers worldwide.111 The growth in new software projects is mainly coming from low- and middle-income nations, with Africa expanding more rapidly than any other region. Open-source repositories in Africa created by software developers grew 40 percent in 2019.112

Without skilled human resources, countries will be limited in their ability to apply modern data infrastructure to achieving economic and social impacts. Workers are needed to create and maintain data infrastructure, as well as to collect, store, manage, and analyze large amounts of data. Although the skills needed to collect, store, and manage data are available in many parts of the world, those for analyzing big datasets are limited. Data scientists (specialists in math, computer, and analytical skills who derive value from large datasets) are in short supply, and low- and middle-income countries are at a disadvantage in the global market for technical skills (see box 5.1).
Conclusions and recommendations

Low- and lower-middle-income countries continue to face major challenges in connecting themselves to the vital infrastructure that underpins the data-driven economy. Many have yet to develop their own IXPs and remain reliant on slow, expensive international data transfers to access the World Wide Web. Colocation data centers that allow further local storage and processing of data, as well as caching of internet content, are still not prevalent in low- and lower-middle-income countries, while global cloud computing resources are almost entirely concentrated in high-income countries, with a limited availability of on-ramps to facilitate access by the developing world.

Box 5.1 The brain drain—ICT professionals

Available data on the supply of and demand for information and communication technology (ICT) skills paint two distinct pictures.a Among country income groups, there does not seem to be wide divergence in the proportion of students graduating with ICT degrees, which typically falls in the 7–8 percent range overall, yet with marked discrepancies between men and women (figure B5.1.1, panel a). However, employment of ICT professionals is strongly correlated with country income groups, since these workers account for 2.1 percent of total employment in high-income nations, dropping to 0.1 percent in low-income countries (figure B5.1.1, panel b).

The mismatch between supply and demand in low- and middle-income economies prompts outward migration toward better employment opportunities in countries where the digital sector is more developed. Evidence of this brain drain already exists: every country income group except the high-income group is experiencing large outflows of skilled tech workers (figure B5.1.2, panel a). Even where skilled data workers have opportunities in low- and middle-income countries, wage differentials could be a driver of migration.
The average wages for ICT employees are significantly higher in high-income nations (figure B5.1.2, panel b), and significant wage differentials exist across regions.

Figure B5.1.1 Low- and middle-income countries are educating ICT professionals but not retaining them
[Panel a: ICT graduates, by gender (male, female), as a percentage of total graduates, for low-income, lower-middle-income, upper-middle-income, and high-income countries. Panel b: ICT professionals, as a percentage of total professionals, for the world and by country income group.]
Sources: Institute for Statistics, United Nations Educational, Scientific, and Cultural Organization, Data of UIS.Stat (database), http://data.uis.unesco.org/; International Labour Organization, ILOSTAT Database, http://www.ilo.org/ilostat/. Data at http://bit.do/WDR2021-Fig-B5_1_1.
Note: For ICT graduates, panel a presents the latest available data between 2015 and 2018 for 120 economies. For ICT professionals, panel b plots the latest available data between 2016 and 2019 for 73 economies. Country income group percentages are collective country averages. ICT = information and communication technology.
Box 5.1 The brain drain—ICT professionals (continued)

The relative penetration rate of data science skills, which are relevant to artificial intelligence, across comparable occupations is four times higher in high-income countries than in low-income countries.b Given salary differentials of 5–10 times between data scientists in low- and high-income countries, it is estimated that workers with these skills in low-income countries are 33 percent more likely to migrate than workers from high-income countries. There was a net outflow of at least 70,000 workers from low- and middle-income countries every year from 2015 to 2019.c

a. Data from national statistical offices on data skills are lacking in both availability and specificity, particularly for low- and middle-income countries. New sources of data, such as the professional networking platform LinkedIn, are emerging as sources of timely and granular information on the labor market, albeit with limited representativeness. The LinkedIn data used in this Report have been validated against international metrics where available to assess coverage and representativeness (Zhu, Fritzler, and Orlowski 2018).
b. Estimation based on the Skills Genome Benchmarking Methodology, using LinkedIn’s Skills Genome Country-Occupation data. This method allows a fair comparison of the penetration of data science skills of all countries in the dataset with that of a chosen benchmark (low-income countries, in this case) by controlling for common occupations among each country and the benchmark.
c. WDR 2021 team calculations, based on Zhu, Fritzler, and Orlowski (2018).

Figure B5.1.2 Major wage differentials for ICT professionals create a brain drain, especially in low- and middle-income countries
[Panel a: net flow of workers with disruptive technology skills, per 10,000 workers, by country income group. Panel b: mean nominal monthly earnings of ICT employees, in US dollars, by country income group.]
Sources: International Labour Organization, ILOSTAT Database, http://www.ilo.org/ilostat/; LinkedIn data (2015–19). Data at http://bit.do/WDR2021-Fig-B5_1_2.
Note: According to LinkedIn dataset classification, “disruptive technology skills” include knowledge and skills in areas of materials science, aerospace engineering, development tools, data science, robotics, artificial intelligence, human computer interaction, genetic engineering, nanotechnology, and fintech. ICT = information and communication technology.

At the same time, in poor countries large swathes of the poorest and most disadvantaged segments of society continue to be excluded from access to broadband data services. For low-income countries, particularly in Africa, the coverage gap for broadband signal remains significant, affecting 30 percent of the population. A major concern is the usage gap: the vast majority of those who do not have data access today live within range of a mobile signal, but they face either affordability or literacy challenges that prevent them from making use of the service. The COVID-19 pandemic has drawn growing attention to the consumption gap, which highlights the limited volumes of data usage in the developing world and
the implications for the population’s ability to access data-based services such as tele-education.

Connecting poor people

When it comes to connecting people to data infrastructure, the following steps are recommended.

Keep costs down through competition. Governments should prioritize all measures to drive down the cost of service provision. Lower costs improve the commercial viability of services, thereby reducing coverage gaps and accelerating technology upgrades. They also help to improve service affordability, thereby reducing usage and consumption gaps. Governments have two possible levers for reducing costs. The most fundamental is creating competitive pressures along the supply chain, including both its wholesale and retail tiers, while addressing structural impediments such as vertical integration. In addition, governments could create a regulatory environment that supports sharing bottleneck infrastructures in areas with low data traffic that could not otherwise support competitive provision of data infrastructure.

Harness private sector investment. To develop digital infrastructure, governments should rely on the private sector wherever possible. This calls for privatization of state-owned incumbents and a policy of avoiding state investment (such as through vendor-financed models) in segments such as the national fiber-optic backbone where the private sector is willing to invest. The entry of new market players is an important market trend, including the growing role of content providers in building backbone infrastructure, which will require careful consideration of competition and regulatory issues such as net neutrality.113 With the advent of 5G, industrial players are also expected to become more active in the development of ICT infrastructure.

Rethink universal service policies. The underperformance of traditional universal service funds points to the need to rethink and modernize government policy measures to support universal service. Measures such as license coverage obligations should be considered. The role of innovative technologies (such as TV white space) and new entrants (such as content providers) in reaching remote populations is also relevant. Supply-side subsidies should be competitively awarded and carefully targeted to those pockets that cannot be reached effectively after considering all other available measures. Furthermore, universal service funds could be redirected to addressing the usage gap by funding digital literacy programs or supporting access to lower-cost mobile devices. This would entail targeting such resources more toward disadvantaged population segments than to underserved geographic areas.

Calibrate fiscal regimes carefully. The governments of poor countries have typically regarded digital infrastructure and associated data services as a potential fiscal cash cow (through taxes, fees, and other charges) in the context of low mobilization of public sector revenues. This view has led to relatively high indirect taxes on mobile devices and data services, significant import duties on equipment, and in some cases high reservation fees for spectrum access. However, there are important trade-offs between the fiscal revenues generated by the sector in the short term and the pace of digital infrastructure rollout and service uptake in the longer term, which also has implications for economic growth and associated tax revenues over time. The design of the fiscal regime for digital infrastructure and data services must therefore be carefully thought through to balance these competing policy objectives. There may be a case for giving lighter tax treatment to low-end mobile devices to support uptake by disadvantaged groups.

Support upgrades to new technologies.
Governments need to create an environment that enables accelerated upgrades to higher-generation technologies. The fiber-optic backbone is a critical prerequisite for further upgrade of networks to 4G and 5G technologies. Thus measures to expand this network at any stage, in partnership with the private sector, would be a no-regrets strategy. In addition, reforming spectrum regulation to allow operators to repurpose existing spectrum allocations would be a helpful strategy. License conditions could also be used to package such regulatory allowances with obligations for data service providers to provide wide-ranging service coverage. Finally, the complexity of these new technologies will require adequate investments in cybersecurity protection.114

Connecting poor countries

When connecting countries to data infrastructure, the following steps are recommended.

Progressively develop domestic data infrastructure. Governments need to pay much more attention to the specific infrastructure required to support the sharing, storage, and processing of large volumes of data. To participate in the data-driven economy on a competitive basis, countries must be able to perform high-volume data operations at the greatest possible speed and lowest possible cost. The infrastructures that can meet these performance goals are internet exchange points, colocation data centers, and cloud computing.
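The four-stage ladder behind Figure 5.11 can be read as a simple decision rule: no IXP (stage 0); an IXP connecting local ISPs (stage 1); an IXP with non-ISP participants (stage 2); an IXP colocated in a data center with international content participants (stage 3). The Python sketch below encodes that rule together with the average monthly prices the figure reports for each stage, then computes the percentage price drop from stage 0 to stage 3. It is illustrative only: the indicator names in the hypothetical `DataInfrastructure` class are assumptions, the criteria paraphrase the text rather than reproduce the authors' classification procedure, and the intermediate-stage prices are as read from Figure 5.11.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataInfrastructure:
    """Hypothetical per-country indicators (not a schema used by the Report)."""
    has_ixp: bool                      # at least one internet exchange point
    non_isp_participants: bool         # e.g., government, cloud, or content firms peer at the IXP
    colocated_with_intl_content: bool  # IXP sits in a colocation data center with international content


# Average monthly prices by stage, in US dollars, as read from Figure 5.11:
# (fixed broadband per MB, wireless broadband per GB).
FIGURE_5_11_PRICES = {
    0: (140.00, 5.60),
    1: (18.30, 4.50),
    2: (17.10, 2.20),
    3: (4.30, 0.80),
}


def classify_stage(infra: DataInfrastructure) -> int:
    """Return the ladder stage (0-3) implied by the indicators."""
    if not infra.has_ixp:
        return 0  # domestic traffic is exchanged overseas
    if infra.colocated_with_intl_content:
        return 3  # IXP alongside a colocation data center with international content
    if infra.non_isp_participants:
        return 2  # diverse participants beyond ISPs
    return 1      # IXP connects local ISPs only


def percent_drop(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # A KIXP-like setup as described in the text: colocated, open to non-ISPs
    # and to international content and cloud providers.
    kixp_like = DataInfrastructure(
        has_ixp=True, non_isp_participants=True, colocated_with_intl_content=True
    )
    print(classify_stage(kixp_like))  # 3

    fixed_0, wireless_0 = FIGURE_5_11_PRICES[0]
    fixed_3, wireless_3 = FIGURE_5_11_PRICES[3]
    print(f"wireless: {percent_drop(wireless_0, wireless_3):.1f}% lower")  # wireless: 85.7% lower
    print(f"fixed: {percent_drop(fixed_0, fixed_3):.1f}% lower")           # fixed: 96.9% lower
```

On these figures, countries at stage 3 pay on average roughly 86 percent less per GB of wireless data and 97 percent less per MB of fixed broadband than countries at stage 0. These are averages across the countries in each stage; as the Djibouti example shows, the gains reach consumers only where the local market is competitive.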
Promote creation of internet exchange points. IXPs remain scarce across the developing world, and even where they do exist they often fail to achieve their potential. Governments have a role in creating the enabling conditions for such institutions to emerge and be widely utilized. Better governance models are needed so that IXPs can develop and become sustainable. Collaboration between IXPs and key stakeholders such as ISPs, government entities, research networks, and content providers can help to raise awareness, expand skills, and create the trust essential for IXPs to be successful.

Create a favorable environment for colocation data centers. Neutral, privately owned colocation data centers are an essential part of data infrastructure and critical for promoting the digital economy. They provide a secure venue in which local and international companies can store their data, and they help support the local hosting industry. The willingness of private investors to install such facilities in low- and middle-income countries is affected by concerns about the enabling environment, in particular the availability of clean and secure energy sources, as well as relative political stability and ease of doing business. Governments can catalyze the market by moving their online services to data centers and by encouraging businesses to host locally to create economies of scale, while establishing a solid data protection framework to build trust.

Secure on-ramps to the cloud. Cloud computing creates tremendous opportunities for low- and middle-income nations to gain remote access to advanced computing facilities for data management and analytics. The cloud also enables collaborative creation of software, thereby giving software developers around the world opportunities to participate.
However, cloud data centers and on-ramps are mainly located in upper-middle- and high-income nations. Low-income countries could induce the major cloud providers to locate closer through a regional approach that aggregates demand and harmonizes compliance requirements for security, data protection, and sovereignty. Governments also need to foster an enabling environment that encourages cloud providers to locate in local data centers in order to provide an on-ramp to their services.

Invest in and retain human resources. Realization of the potential for data infrastructure to contribute to economic development depends on adequate human resources, particularly in frontier areas such as data science and artificial intelligence. Although there is an acute global scarcity of these skills, evidence suggests that low- and middle-income nations are producing some graduates in these fields. However, wage differentials in a highly competitive global market are leading to a powerful brain drain effect, preventing those countries from harnessing these skills. The brain drain is often exacerbated by the lack of opportunities arising from undeveloped local data infrastructure. Governments need to stimulate their digital economies by encouraging private investment in fiber-optic backbones and data centers that generate direct and indirect employment.

The recommendations presented here are organized within the maturity model framework in table 5.1, recognizing that different countries may be at different stages of developing data infrastructure.

Table 5.1 Recommendations for data infrastructure improvements sequenced according to a maturity model

Stage of country: Establishing fundamentals
  Connecting people: Eliminate coverage gaps by reducing costs through wholesale and retail competition, as well as infrastructure sharing arrangements, and, where still required, providing well-crafted state support.
  Connecting countries: Ensure adequate international bandwidth. Create a competitive market environment for international gateways and internet service providers.

Stage of country: Initiating data flows
  Connecting people: Narrow usage gaps through digital literacy campaigns, investment in basic education, lower taxation and import duties on low-end handsets, and support of local ventures for manufacturing handsets.
  Connecting countries: Encourage creation of the first domestic IXP and facilitate participation by all relevant domestic players. Allow additional IXPs to emerge and players to formalize and mature. Encourage arrangements to cache popular international internet content on local servers.

Stage of country: Optimizing the system
  Connecting people: Upgrade digital networks to the latest generation to improve speed and efficiency and facilitate higher consumption.
  Connecting countries: Create a supportive environment in which colocation data centers can emerge, integrate with IXPs, and provide on-ramp access to cloud services.

Source: WDR 2021 team.
Note: IXPs = internet exchange points.
  • 198. 178 | World Development Report 2021 Notes 1. Briglauer and Gugler (2019); Czernich et al. (2011); Katz and Callorda (2018); Koutroumpis (2018); Minges (2015). 2. Bertschek and Niebel (2016). 3. Shapiro and Hassett (2012). 4. Hjort and Poulsen (2019). 5. van der Marel (2020). 6. See“HowDoesDataTravelontheInternet?”Networking Guides, https://networkingguides.com/how-does-data -travel-over-the-internet/. 7. Cisco (2018). 8. Ericsson (2020). 9. Cisco (2020). 10. Monash University (2020). 11. Sandvine (2019). 12. Text messages allow traders to check on agricultural prices, remind the sick when to take their medicine, and help nurses register births. Mobile money has unleashed add-on services in microinsurance, agri- culture, and transportation, and it is facilitating the deployment of off-grid energy by allowing users to repay the cost of solar panels with micropayments. Development agencies can make conditional cash transfers to mobile money accounts, reducing costs and increasing security. 13. Katz and Callorda (2018). 14. Czernich et al. (2009). 15. Anderson and Kumar (2019). 16. Broadband Commission (2019). 17. For further details on Sustainable Development Goal 9, Target 9.c, see Department of Economic and Social Affairs, United Nations, “Goals: 9, Build Resilient Infra- structure, Promote Inclusive and Sustainable Industri- alization and Foster Innovation,” https://sdgs.un.org /goals/goal9. 18. See, for example, Michie (1997) or Madden (2010). 19. SDG Target 9.c states: “Significantly increase access to information and communications technology and strive to provide universal and affordable access to the Internet in least developed countries by 2020” (Depart- ment of Economic and Social Affairs, United Nations, “Goals: 9, Build Resilient Infrastructure, Promote Inclusive and Sustainable Industrialization and Foster Innovation,” https://sdgs.un.org/goals/goal9). 20. Broadband Commission (2019). 21. 
Ericsson (2020). 22. Oughton et al. (2018). 23. Oughton et al. (2018). 24. World Bank (2016). 25. GSMA (2019c). 26. Kapko (2020). 27. Broadband Commission (2019). 28. Broadband Commission (2019). 29. GSMA (2019b). 30. See World Bank (2019c). One example is Liquid Telecom, which has rolled out 70,000 kilometers of fiber-optic cable through several African nations. See Liquid Telecom, “Our Network,” https://www.liquid telecom.com/about-us/our_network. 31. For example, landlocked Mongolia’s north-south fiber- optic backbone connecting it to China and the Russian Federation runs along the railway (Tsolmondelger 2019). 32. Strusani and Houngbonon (2020). 33. UN-OHRLLS (2017). 34. See the information on the Adopting TV White Spaces Project in Colombia (ITU 2018c, 33). 35. Loon (2020). 36. Iridium Communications (2020). 37. Intelecom Research and Consultancy (2016). 38. GSMA (2013). 39. World Wide Web Foundation and A4AI (2018). 40. ESCAP (2017). 41. ITU (2018a). 42. Chen (2021). 43. ITU (2018b). 44. GSMA (2020). 45. The United Nations Educational, Scientific, and Cul- tural Organization (UNESCO) defines digital literacy as “the ability to access, manage, understand, integrate, communicate, evaluate and create information safely and appropriately through digital technologies for employment, decent jobs and entrepreneurship. It includes competences that are variously referred to as computer literacy, ICT literacy, information literacy and media literacy” (Law et al. 2018). 46. Chen (2021). 47. Chen (2021). 48. Chen (2021). 49. GSMA (2017). 50. GSMA (2019a). 51. See Ministry of ICT and Innovation, “Digital Ambassa- dors Programme,” Kigali, Rwanda, https://www.minict .gov.rw/projects/digital-ambassadors-programme. 52. Radovanović et al. (2020). 53. Internet Society (2015). 54. Silver and Smith (2019). 55. GSMA (2019a). 56. A4AI (2020). 57. MTN (2020). 58. 
JioPhone, “Jio Digital Life,” Reliance Jio Infocomm Ltd, Mumbai, India, https://www.jio.com/en-in/jiophone. 59. New China (2018). 60. A4AI (2020). 61. GSMA (2019d). 62. GSMA (2020). 63. Telefónica S.A., “Quarterly Results: 2020 January– September,” https://www.telefonica.com/en/web/share holders-investors/financial_reports/quarterly-reports/. 64. A4A1 (2019). 65. Sonia (2020). 66. Chen and Minges (2021). 67. Silver et al. (2019). 68. ITU (2018c). 69. See Alliance for Affordable Internet, “Affordable Internet Is ‘1 for 2’,” https://a4ai.org/affordable-internet-is-1-for-2.
70. See “Advocacy Target 2” (Broadband Commission 2020).
71. BBC News (2019a).
72. BBC News (2019b).
73. Based on information provided by Smart, one of Cambodia’s seven mobile operators (Smart Axiata 2019).
74. See the speed graph provided by West Central Telephone Association, Sebeka, MN, https://www.wcta.net/speed-demo/.
75. TRAI (2019).
76. Malisuwan, Tiamnara, and Suriyakrai (2015).
77. Sanni (2016).
78. Capitel (2016).
79. RURA (2019).
80. TeleGeography, Submarine Cables (database), https://www.submarinecablemap.com/.
81. Miller (2019).
82. Ahmad and Salvadori (2020).
83. See World Bank, “West Africa Regional Communications Infrastructure Project, APL 2,” https://projects.worldbank.org/en/projects-operations/project-detail/P123093.
84. World Bank (2019a).
85. See TeleGeography, Submarine Cable Frequently Asked Questions, https://www2.telegeography.com/submarine-cable-faqs-frequently-asked-questions.
86. Baynes (2018).
87. The development of domestic data infrastructure should not be confused with the question of data localization. Data localization, a regulatory issue discussed at some length under the trade section of chapter 7, concerns the adoption of government restrictions requiring that a country’s data be stored and sometimes processed on national territory, often with associated government controls on cross-border data transfers. Although domestic data infrastructure is a prerequisite for data localization, the development of domestic data infrastructure serves many other critical functions. In particular, it supports the cost-effective exchange of data among domestic parties and facilitates the access of country nationals to data from other jurisdictions by allowing copies of such data to be stored locally.
88. Agudelo et al. (2014).
89. Internet Society (2017).
90.
The number of IXPs in the world varies by source. For example, PeeringDB (https://www.peeringdb.com/) reported 786 in June 2020. The variations are often due to differences in definitions (such as whether private peering facilities are included).
91. See Packet Clearing House, “Packet Clearing House Report on Internet Exchange Point Locations,” https://www.pch.net/ixp/summary.
92. DE-CIX (2015).
93. Dobran (2018). Colocation data centers are vulnerable to physical and cybersecurity threats. They may be an attractive target for cybercriminals because they host large amounts of data and private information, all in the same location. However, because they have more resources, colocation sites can invest in better security protections than a typical small or medium-size business could achieve in-house.
94. PeeringDB, https://www.peeringdb.com/.
95. CW (2016).
96. Comini, Srinivasan, and Minges (forthcoming).
97. EU-AITF (2018).
98. See Interconnect Help, Google, “Introduction to GGC,” https://support.google.com/interconnect/answer/9058809?hl=en.
99. Comini, Srinivasan, and Minges (forthcoming).
100. Balancing Act (2019).
101. Comini, Srinivasan, and Minges (forthcoming).
102. UNCTAD (2013).
103. All large cloud providers hold International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 27000-series certification, which covers regulatory and legal requirements that relate to the security of information (ISO/IEC, “Popular Standards: ISO/IEC 27001, Information Security Management,” https://www.iso.org/isoiec-27001-information-security.html).
104. IDC (2019).
105. See Google, “Google Docs,” https://www.google.com/docs/about/.
106. See International Business Machines, “IBM Cloud: Free Tier,” https://www.ibm.com/cloud/free.
107. See chapter 7 for a discussion about data localization regulations.
108.
Keane (2019).
109. See DP Facilities, “The Critical Role Data Centers Play in Today’s Enterprise Networks: Part 3, Why Cloud On-Ramps Are Key for an Enterprise Migrating to the Cloud,” https://www.dpfacilities.com/blog/cloud-onramps-are-key-to-migration/.
110. World Bank (2019b).
111. GitHub, “Where the World Builds Software,” https://github.com/.
112. GitHub, “The 2020 State of the OCTO–VERSE,” https://octoverse.github.com/.
113. Because of the complexity of the topic, this chapter does not address the issue of net neutrality and its impact on market regulation and competition.
114. With their distributed routing approach and software-driven design, 5G networks present an array of new cybersecurity challenges that must be addressed before these networks are widely deployed. Moreover, IoT devices are often manufactured without adequate cybersecurity protections, leaving them with security vulnerabilities. Bad actors can exploit these vulnerabilities to gain access to the network or to harness the computational power of an IoT device for other malicious purposes, such as distributed denial-of-service attacks. A forthcoming World Bank 5G flagship report will address in detail the cybersecurity issues raised by the uptake of 5G technology.
References
A4AI (Alliance for Affordable Internet). 2019. “The 2019 Affordability Report.” October, World Wide Web Foundation, Washington, DC. https://a4ai.org/affordability-report.
A4AI (Alliance for Affordable Internet). 2020. “From Luxury to Lifeline: Reducing the Cost of Mobile Devices to Reach Universal Internet Access.” August 6, World Wide Web Foundation, Washington, DC. https://docs.google.com/document/d/1YFXbUr-W0LTOAXs9QEtlf8oEBsrakQ_lLeHUyb-6oEY/edit.
Agudelo, Mauricio, Raúl Katz, Ernesto Flores-Roux, María Cristina Duarte Botero, Fernando Callorda, and Taylor Berry. 2014. Expansión de infraestructura regional para la interconexión de tráfico de internet en América Latina [Expansion of regional infrastructure for the interconnection of internet traffic in Latin America]. Caracas, República Bolivariana de Venezuela: Development Bank of Latin America. http://scioteca.caf.com/handle/123456789/522.
Ahmad, Najam, and Kevin Salvadori. 2020. “Building 2Africa, a Transformative Subsea Cable to Better Connect Africa.” Connectivity, Networking, Traffic (blog), May 13, 2020. https://engineering.fb.com/2020/05/13/connectivity/2africa/.
Anderson, Monica, and Madhumitha Kumar. 2019. “Digital Divide Persists Even as Lower-Income Americans Make Gains in Tech Adoption.” Fact Tank, News in the Numbers (blog), May 7, 2019. https://www.pewresearch.org/fact-tank/2019/05/07/digital-divide-persists-even-as-lower-income-americans-make-gains-in-tech-adoption/.
Balancing Act. 2019. “Ugandan Government and Regulator Want to Take Control of Local IXP: The Latest in a Disturbing Pattern of Regulatory Interventions.” Balancing Act News, July 12. https://www.balancingact-africa.com/news/telecoms-en/45586/ugandan-government-and-regulator-want-to-take-control-of-local-ixp-the-latest-in-a-disturbing-pattern-of-regulatory-interventions.
Baynes, Chris. 2018.
“Entire Country Taken Offline for Two Days after Undersea Internet Cable Cut.” Independent, April 11. https://www.independent.co.uk/news/world/africa/mauritiana-internet-cut-underwater-cable-offline-days-west-africa-a8298551.html.
BBC News. 2019a. “Mobile Data: Why India Has the World’s Cheapest.” BBC News, March 18. https://www.bbc.com/news/world-asia-india-47537201.
BBC News. 2019b. “Reliance Jio: India’s Cheapest Data Provider to Raise Prices.” BBC News, November 20. https://www.bbc.com/news/world-asia-india-50484594.
Bertschek, Irene, and Thomas Niebel. 2016. “Mobile and More Productive? Firm-Level Evidence on the Productivity Effects of Mobile Internet Use.” Telecommunications Policy 40 (9): 888–98. https://doi.org/10.1016/j.telpol.2016.05.007.
Briglauer, Wolfgang, and Klaus Gugler. 2019. “Go for Gigabit? First Evidence on Economic Benefits of High-Speed Broadband Technologies in Europe.” JCMS, Journal of Common Market Studies 57 (5): 1071–90. https://doi.org/10.1111/jcms.12872.
Broadband Commission (United Nations Broadband Commission for Sustainable Development). 2019. Connecting Africa through Broadband: A Strategy for Doubling Connectivity by 2021 and Reaching Universal Access by 2030. October. Geneva: Broadband Commission Working Group on Broadband for All, International Telecommunication Union. https://www.broadbandcommission.org/Documents/working-groups/DigitalMoonshotforAfrica_Report.pdf.
Broadband Commission (United Nations Broadband Commission for Sustainable Development). 2020. The State of Broadband 2020: Tackling Digital Inequalities; A Decade for Action. September. Geneva: International Telecommunication Union. https://broadbandcommission.org/Pages/targets/Target-3.aspx.
Buckholtz, Alison. 2019. “Africa’s IT Talent Pool.” IFC Insights (blog), December 2019.
https://www.ifc.org/wps/wcm/connect/news_ext_content/ifc_external_corporate_site/news+and+events/news/insights/africa-it-talent.
CW (Cushman and Wakefield). 2016. “Data Centre Risk Index.” CW, London. https://verne-global-lackey.s3.amazonaws.com/uploads%2F2017%2F1%2Fb5e0a0da-5ad2-01b3-1eb8-8f782f22a534%2FC%26W_Data_Centre+Risk_Index_Report_2016.pdf.
Capitel. 2016. “Economics of Mobile Data in Frontier and Emerging Markets.” Techno-Commercial Planning and Transaction Advisory in Telecoms, Media, and Technology, Capitel, Gurgaon, India; Singapore. https://capitelpartners.com/wp-content/uploads/2018/04/Capitel-Economics-of-mobile-data.pdf.
Chen, Rong. 2021. “A Demand-Side View of Mobile Internet Adoption in the Global South.” Policy Research Working Paper 9590, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/492871616350929155/A-Demand-Side-View-of-Mobile-Internet-Adoption-in-the-Global-South.
Chen, Rong, and Michael Minges. 2021. “Minimum Data Consumption: How Much Is Needed to Support On-Line Activities, and Is It Affordable?” Digital Development: Analytical Insights, Note 3, World Bank, Washington, DC. http://pubdocs.worldbank.org/en/742001611762098567/Analytical-Insights-Series-Jan-2021.pdf.
Cisco. 2018. “Cisco Visual Networking Index: Forecast and Trends, 2017–2022.” White Paper, November, Cisco Systems, San Jose, CA. https://cloud.report/Resources/Whitepapers/eea79d9b-9fe3-4018-86c6-3d1df813d3b8_white-paper-c11-741490.pdf.
Cisco. 2020. “Cisco Annual Internet Report (2018–2023).” White Paper, March, Cisco Systems, San Jose, CA. https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html.
Comini, Niccolo, Sharada Srinivasan, and Michael Minges. Forthcoming.
“The Role of a National Data Infrastructure: Internet Exchange Points, Content Delivery Networks, and Data Centers.” WDR 2021 background paper, World Bank, Washington, DC.
Czernich, Nina, Oliver Falck, Tobias Kretschmer, and Ludger Woessmann. 2009. “Broadband Infrastructure and Economic Growth.” CESifo Working Paper 2861 (December), Munich Society for the Promotion of Economic Research, Center for Economic Studies, Ludwig Maximilian University and Ifo Institute for Economic Research, Munich. https://www.cesifo.org/DocDL/cesifo1_wp2861.pdf.
Czernich, Nina, Oliver Falck, Tobias Kretschmer, and Ludger Woessmann. 2011. “Broadband Infrastructure and Economic Growth.” Economic Journal 121 (552): 505–32. https://doi.org/10.1111/j.1468-0297.2011.02420.x.
DE-CIX (Deutscher Commercial Internet Exchange). 2015. “DE-CIX: From the Interconnection of Three Internet Service Providers back in 1995 to the World’s Leading Internet Exchange.” DE-CIX Exchange, Frankfurt. https://www.de-cix.net/Files/d4167da7aafe0da34f384181606db6b8572cf2c1/DE-CIX_From-the-interconnection-of-three-ISP-to-the-worlds-leading-IX.pdf.
Dobran, Bojana. 2018. “Data Center Tier Classification Levels Explained (Tier 1, 2, 3, 4).” PhoenixNAP (blog), May 2, 2018. https://phoenixnap.com/blog/data-center-tiers-classification.
Ericsson. 2020. “Mobility Report.” Ericsson, Stockholm. https://www.ericsson.com/4adc87/assets/local/mobility-report/documents/2020/november-2020-ericsson-mobility-report.pdf.
ESCAP (United Nations Economic and Social Commission for Asia and the Pacific). 2017. “The Impact of Universal Service Funds on Fixed-Broadband Deployment and Internet Adoption in Asia and the Pacific.” Asia-Pacific Information Superhighway (AP-IS) Working Paper, ICT and Development Section, Information and Communications Technology and Disaster Risk Reduction Division, ESCAP, Bangkok, Thailand. https://www.unescap.org/sites/default/files/Universal%20Access%20and%20Service%20Funds.pdf.
EU-AITF (European Union–Africa Infrastructure Trust Fund). 2018. “EU–Africa Infrastructure Trust Fund: Annual Report 2017.” European Investment Bank, Luxembourg. https://www.eib.org/attachments/country/eu_africa_infrastructure_trust_fund_annual_report_2017_en.pdf.
GSMA (GSM Association). 2013. Universal Service Fund Study.
London: GSMA. https://www.gsma.com/publicpolicy/wp-content/uploads/2016/09/GSMA2013_Report_SurveyOfUniversalServiceFunds.pdf.
GSMA (GSM Association). 2017. Connected Society: Mobile Internet Skills Training Toolkit; A Guide for Training People in Basic Mobile Internet Skills. London: GSMA. https://www.gsma.com/mobilefordevelopment/connected-society/mistt/.
GSMA (GSM Association). 2019a. “Connected Society: Mobile Internet Skills Training Toolkit; Banglalink Pilot Evaluation.” GSMA, London. https://www.gsma.com/mobilefordevelopment/resources/mobile-internet-skills-training-toolkit-banglalink-pilot-evaluation/.
GSMA (GSM Association). 2019b. “Connected Society: The State of Mobile Internet Connectivity 2019.” GSMA, London. https://www.gsma.com/mobilefordevelopment/resources/the-state-of-mobile-internet-connectivity-report-2019/.
GSMA (GSM Association). 2019c. “GSMA Connected Society: Closing the Coverage Gap; How Innovation Can Drive Rural Connectivity.” GSMA, London. https://www.gsma.com/mobilefordevelopment/wp-content/uploads/2019/07/GSMA-Closing-The-Coverage-Gap-How-Innovation-Can-Drive-Rural-Connectivity-Report-2019.pdf.
GSMA (GSM Association). 2019d. “Rethinking Mobile Taxation to Improve Connectivity.” GSMA, London. https://www.gsma.com/publicpolicy/wp-content/uploads/2019/02/Rethinking-mobile-taxation-to-improve-connectivity_Feb19.pdf.
GSMA (GSM Association). 2020. “Connected Women: The Mobile Gender Gap Report 2020.” GSMA, London. https://www.gsma.com/mobilefordevelopment/wp-content/uploads/2020/05/GSMA-The-Mobile-Gender-Gap-Report-2020.pdf.
Hjort, Jonas, and Jonas Poulsen. 2019. “The Arrival of Fast Internet and Employment in Africa.” American Economic Review 109 (3): 1032–79.
IDC (International Data Corporation). 2019.
“Worldwide Public Cloud Services Revenue Grows to Nearly $183 Billion in 2018, Led by the Top 5 Service Providers and Accelerating Public Cloud Services Spending in China.” IDC, Framingham, MA. https://www.idc.com/getdoc.jsp?containerId=prUS45411519.
Intelecom Research and Consultancy. 2016. “ICT Access Gaps Study: Final Report.” Communications Authority of Kenya, Nairobi. https://ca.go.ke/wp-content/uploads/2018/02/ICT-Access-Gaps-Report-April-2016-.pdf.
Internet Society. 2015. “Local Content: An Internet Society Public Policy Briefing.” Public Policy, Internet Society, Reston, VA. https://www.internetsociety.org/policybriefs/localcontent/.
Internet Society. 2017. “The Benefits of Local Content Hosting: A Case Study.” Internet Society, Reston, VA. https://www.internetsociety.org/wp-content/uploads/2017/08/ISOC_LocalContentRwanda_report_20170505.pdf.
Iridium Communications. 2020. 2019 Annual Report: Reliability Above All. McLean, VA: Iridium Communications. https://investor.iridium.com/annual-reports.
ITU (International Telecommunication Union). 2018a. ICTs, LDCs, and the SDGs: Achieving Universal and Affordable Internet in the Least Developed Countries. Thematic Report: ITU Development, LDCs and Small Island Developing States Series. Geneva: ITU. https://www.itu.int/en/ITU-D/LDCs/Pages/Publications/LDCs/D-LDC-ICTLDC-2018-PDF-E.pdf.
ITU (International Telecommunication Union). 2018b. “ITU’s Approach to Bridging the Digital Gender Divide.” ITU News, October 22, 2018. https://www.itu.int/es/myitu/News/2020/05/22/11/16/ITU-s-approach-to-bridging-the-digital-gender-divide.
ITU (International Telecommunication Union). 2018c. Report on the WSIS Stocktaking 2018. Geneva: ITU.
Kapko, Matt. 2020. “Rakuten Mobile Delivers Its Virtualized Reality.” SDxCentral News, April 8, 2020. https://www.sdxcentral.com/articles/news/rakuten-mobile-delivers-its-virtualized-reality/2020/04/.
Katz, Raul, and Fernando Callorda. 2018. “The Economic Contribution of Broadband, Digitization, and ICT Regulation.” Thematic Report: Regulatory and Market Environment Series, International Telecommunication Union, Geneva. https://www.itu.int/pub/D-PREF-EF.BDR-2018.
Keane, Tom. 2019. “Microsoft Opens First Datacenters in Africa with General Availability of Microsoft Azure.” Microsoft Azure Announcements (blog), March 6, 2019. https://azure.microsoft.com/en-us/blog/microsoft-opens-first-datacenters-in-africa-with-general-availability-of-microsoft-azure/.
Koutroumpis, Pantelis. 2018. “The Economic Impact of Broadband: Evidence from OECD Countries.” Ofcom, London. https://www.ofcom.org.uk/__data/assets/pdf_file/0025/113299/economic-broadband-oecd-countries.pdf.
Law, Nancy, David Woo, Jimmy de la Torre, and Gary Wong. 2018. A Global Framework of Reference on Digital Literacy Skills for Indicator 4.4.2. Information Paper 51, UIS/2018/ICT/IP/51. Montreal: Institute for Statistics, United Nations Educational, Scientific, and Cultural Organization. http://uis.unesco.org/sites/default/files/documents/ip51-global-framework-reference-digital-literacy-skills-2018-en.pdf.
Loon. 2020. “The Stratosphere.” Loon, Mountain View, CA. https://loon.com/static/pdfs/Stratosphere_Whitepaper_May1.pdf.
Madden, Gary. 2010. “Economic Welfare and Universal Service.” Telecommunications Policy 34 (1–2): 110–16. https://www.sciencedirect.com/science/article/abs/pii/S0308596109001141.
Malisuwan, Settapong, Noppadol Tiamnara, and Nattakit Suriyakrai. 2015. “A Post-Auction Review of 2.1 GHz Spectrum Licensing Obligations in Thailand.” International Journal of Innovation, Management, and Technology 6 (4): 285–89. https://doi.org/10.7763/IJIMT.2015.V6.616.
Manulis, Mark, Chris P. Bridges, Richard Harrison, Venkkatesh Sekar, and Andy Davis. 2020. “Cyber Security in New Space: Analysis of Threats, Key Enabling Technologies, and Challenges.” International Journal of Information Security. Published ahead of print, May 12, 2020. https://link.springer.com/article/10.1007/s10207-020-00503-w.
Michie, Jonathan. 1997. “Network Externalities: The Economics of Universal Access.” Utilities Policy 6 (4): 317–24. http://www.sciencedirect.com/science/article/pii/S095717879700026X.
Miller, Jayne. 2019.
“This Is What Our 2019 Submarine Cable Map Shows Us about Content Provider Cables.” TeleGeography (blog), March 19, 2019. https://blog.telegeography.com/this-is-what-our-2019-submarine-cable-map-shows-us-about-content-provider-cables.
Minges, Michael. 2015. “Exploring the Relationship between Broadband and Economic Growth.” Background paper, World Development Report 2016: Digital Dividends, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/178701467988875888/Exploring-the-relationship-between-broadband-and-economic-growth.
Monash University. 2020. “World’s Fastest Internet Speed from a Single Optical Chip.” ScienceDaily, May 22, 2020. https://www.sciencedaily.com/releases/2020/05/200522095504.htm.
MTN. 2020. “#Good Together: Sustainability Report for the Year Ended 31 December 2019.” March 31, 2020, MTN Group, Johannesburg, South Africa. https://www.mtn.com/wp-content/uploads/2020/03/MTN-Sustainability-report.pdf.
New China. 2018. “Chinese Phone Maker Celebrates 100-Mln-USD Export Milestone in Ethiopia.” New China, November 21, 2018. http://www.xinhuanet.com/english/2018-11/21/c_137622555.htm.
Oughton, Edward, Zoraida Frias, Tom Russell, Douglas Sicker, and David D. Cleevely. 2018. “Towards 5G: Scenario-Based Assessment of the Future Supply and Demand for Mobile Telecommunications Infrastructure.” Technological Forecasting and Social Change 133 (August): 141–55. https://doi.org/10.1016/j.techfore.2018.03.016.
Radovanović, Danica, Christine Holst, Sarbani Banerjee Belur, Ritu Srivastava, Georges Vivien Houngbonon, Erwan Le Quentrec, Josephine Miliza, Andrea S. Winkler, and Josef Noll. 2020. “Digital Literacy Key Performance Indicators for Sustainable Development.” Social Inclusion 8 (2): 151–67. https://doi.org/10.17645/si.v8i2.2587.
RURA (Rwanda Utilities Regulatory Authority). 2019.
“Statistics Report for Telecom, Media, and Broadcasting Sector as of the Fourth Quarter of the Year 2019.” ICT Quarterly Statistics Report, RURA, Kigali, Rwanda. https://rura.rw/fileadmin/Documents/ICT/statistics/ICT_Quarterly_Statistics_report_as_of_December_2019.pdf.
Sandvine. 2019. “The Global Internet Phenomena Report.” Sandvine, Fremont, CA. https://www.sandvine.com/hubfs/Sandvine_Redesign_2019/Downloads/Internet%20Phenomena/Internet%20Phenomena%20Report%20Q32019%2020190910.pdf.
Sanni, Shola. 2016. “How to Implement Spectrum Re-Farming.” GSM Association, London. https://www.gsma.com/spectrum/wp-content/uploads/2017/11/10-Day-2-Session-3-How-to-Implement-Spectrum-Refarming-Shola-Sanni.pdf.
Shapiro, Robert J., and Kevin A. Hassett. 2012. “The Employment Effects of Advances in Internet and Wireless Technology: Evaluating the Transitions from 2G to 3G and from 3G to 4G.” NDN and New Policy Institute, Washington, DC. http://www.sonecon.com/docs/studies/Wireless_Technology_and_Jobs-Shapiro_Hassett-January_2012.pdf.
Silver, Laura, and Aaron Smith. 2019. “In Some Countries, Many Use the Internet without Realizing It.” Fact Tank, News in the Numbers (blog), May 2, 2019. https://www.pewresearch.org/fact-tank/2019/05/02/in-some-countries-many-use-the-internet-without-realizing-it/.
Silver, Laura, Emily A. Vogels, Mara Mordecai, Jeremiah Cha, Raea Rasmussen, and Lee Rainie. 2019. “Mobile Divides in Emerging Economies.” Internet Technology (blog), November 20, 2019. https://www.pewresearch.org/internet/2019/11/20/mobile-divides-in-emerging-economies/.
Smart Axiata. 2019. “Sweating Assets to Drive Down Cost/GB.” Slide presentation at “Axiata Analyst Investor Day,” Smart Axiata, Phnom Penh, Cambodia. https://axiata.listedcompany.com/misc/6c_Operational_Excellence_in_Action-Smart.pdf.
Sonia, Jorge. 2020.
“Covid-19 Shows We Need More Than Basic Internet Access: We Need Meaningful Connectivity.” Alliance for Affordable Internet, Washington, DC. https://a4ai.org/covid-19-shows-we-need-more-than-basic-internet-access-we-need-meaningful-connectivity/.
Strusani, Davide, and Georges Vivien Houngbonon. 2020. “Accelerating Digital Connectivity through Infrastructure Sharing.” EMCompass Note 79, International Finance Corporation, Washington, DC. https://www.ifc.org/wps/wcm/connect/Publications_EXT_Content/IFC_External_Publication_Site/Publications_Listing_Page/EMCompass-Note-79-Digital-Infrastructure-Sharing.
TRAI (Telecom Regulatory Authority of India). 2019. “Wireless Data Services in India: An Analytical Report.” TRAI, New Delhi. https://trai.gov.in/sites/default/files/Wireless_Data_Service_Report_21082019.pdf.
Tsolmondelger, Odkhuu. 2019. “ICT Infrastructure along Transport Network.” Presentation, November 20, 2019, Information Communications Network LLC, Ulaanbaatar, Mongolia. https://www.unescap.org/sites/default/files/ICT%20Infrastructure%20Along%20Transport%20Network%2C%20Mongolia%20NetCom.pdf.
UNCTAD (United Nations Conference on Trade and Development). 2013. Information Economy Report 2013: The Cloud Economy and Developing Countries. Geneva: United Nations. https://unctad.org/system/files/official-document/ier2013_en.pdf.
UN-OHRLLS (Office of the High Representative for the Least Developed Countries, Landlocked Developing Countries, and Small Island Developing States). 2017. “Leveraging Investments in Broadband for National Development: The Case of Rwanda and Senegal.” UN-OHRLLS, United Nations, New York. https://unohrlls.org/custom-content/uploads/2017/07/Leveraging-Investments-in-Broadband-for-National-Development-2017.pdf.
van der Marel, Erik. 2020. “Sources of Comparative Advantage in Data-Related Services.” Working Paper EUI RSCAS 2020/30, Robert Schuman Center for Advanced Studies, Global Governance Programme-393, European University Institute, San Domenico di Fiesole (FI), Italy. http://hdl.handle.net/1814/66987.
World Bank. 2016. World Development Report 2016: Digital Dividends. Washington, DC: World Bank. https://www.worldbank.org/en/publication/wdr2016.
World Bank. 2019a. “Implementation Completion and Results Report, Tonga: Pacific Regional Connectivity Program.” Report No. ICR00004623, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/844361556573010769/Tonga-Pacific-Regional-Connectivity-Program-Project.
World Bank. 2019b. Information and Communications for Development 2018: Data-Driven Development. Washington, DC: World Bank. https://openknowledge.worldbank.org/handle/10986/30437.
World Bank. 2019c. Innovative Business Models for Expanding Fiber-Optic Networks and Closing the Access Gaps. Washington, DC: Digital Development Partnership, World Bank. https://documents.worldbank.org/en/publication/documents-reports/documentdetail/674601544534500678/main-report.
World Wide Web Foundation and A4AI (Alliance for Affordable Internet). 2018. “Universal Service and Access Funds: An Untapped Resource to Close the Gender Digital Divide.” World Wide Web Foundation, Washington, DC. https://webfoundation.org/research/closing-gender-digital-divide-in-africa.
Zhu, Tingting Juni, Alan Fritzler, and Jan Alexander Kazimierz Orlowski. 2018. “World Bank Group–LinkedIn Data Insights: Jobs, Skills, and Migration Trends; Methodology and Validation Results.” World Bank, Washington, DC. https://datacatalog.worldbank.org/dataset/world-bank-group-linkedin-digital-data-development/resource/b16c3403-2d59-45ca-bc31.
Spotlight 5.1
How the COVID-19 pandemic has recalibrated expectations of reasonable data consumption and highlighted the digital divide

Massive increases in data traffic herald greater use of online platforms and underscore the importance of including currently excluded users.

Around the world, the COVID-19 pandemic has led millions of people to use videoconferencing for working and learning from home and for other activities. The three main platforms reported around 700 million daily users in March–April 2020. Adding in other platforms, the number of users was equal to roughly one-tenth of the world’s population. Zoom’s average number of users jumped from 10 million in December 2019 to 300 million in April 2020.1 Cisco’s Webex recorded 324 million users in March 2020, double its January 2020 figure.2 Microsoft Teams had 75 million daily users in April 2020.3 Being homebound also led to more use of social media, video streaming, and online gaming.4

These online activities have driven massive increases in data traffic. In Spain, internet traffic increased 40 percent in the week following the shutdown, while mobile data traffic rose 25 percent.5 The German internet exchange DE-CIX, one of the world’s largest, recorded a 10 percent increase in traffic during the first two weeks of March, when shelter-at-home orders were implemented in the country, breaking the world record for data throughput. Videoconferencing traffic on DE-CIX rose 50 percent, and gaming and social media traffic grew 25 percent.6

For the most part, telecommunications networks have stood up well to this massive increase in traffic. Networks are engineered to handle peak traffic, so for large parts of the day capacity is more than sufficient. Similar to flattening the curve for COVID-19, telecommunications operators worked to smooth fluctuations in traffic flows during the height of the pandemic in spring 2020.
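The platform figures above can be cross-checked with simple arithmetic. The sketch below is a rough check, not the report's method: it sums the three platforms' reported users and compares the total with world population. The 7.8 billion population figure is an assumption not stated in the spotlight, and because the platforms report different metrics (daily users, monthly users, meeting participants), the sum is only indicative.

```python
# Back-of-the-envelope check of the videoconferencing figures in Spotlight 5.1.
# Platform figures come from the text; the platforms report different metrics,
# so the combined total is indicative only.

platform_users_millions = {
    "Zoom (April 2020)": 300,
    "Cisco Webex (March 2020)": 324,
    "Microsoft Teams (April 2020)": 75,
}

# ASSUMPTION: world population of roughly 7.8 billion in 2020 (not stated in the text).
WORLD_POPULATION_MILLIONS = 7_800


def combined_users(users: dict[str, int]) -> int:
    """Sum reported users across platforms, in millions."""
    return sum(users.values())


def growth_factor(before: float, after: float) -> float:
    """Ratio of the later figure to the earlier one."""
    return after / before


total = combined_users(platform_users_millions)
share_of_world = total / WORLD_POPULATION_MILLIONS
zoom_growth = growth_factor(10, 300)  # December 2019 -> April 2020
webex_january = 324 / 2  # "double its January 2020 figure" implies about 162 million

print(f"Combined users: {total} million")
print(f"Share of world population: {share_of_world:.1%}")
print(f"Zoom growth: {zoom_growth:.0f}x")
print(f"Implied Webex users in January 2020: {webex_january:.0f} million")
```

The three named platforms alone come to just under 700 million, about 9 percent of the assumed world population, which is consistent with the text's "roughly one-tenth" once other platforms are added.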
Widespread deployment of high-capacity fiber-optic backbone and access networks has proved vital for dealing with the surge in traffic. During COVID-19, regulators have increased mobile data capacity by releasing spectrum,7 and streaming video services have reduced their traffic by 25 percent by using compression technology.8 Some telecommunications operators have increased their data allowances, and some have provided free data and smartphones to health workers.

For those who already have access, telecommunication networks have thus far proved up to the task in the new world of social distancing, but limitations have prevented the transition of vital public services, such as education, to the online space, with major repercussions for schooling. Many predict that videoconferencing will continue to be used more after the pandemic, though not at the same high level.

The reliance on Web conferencing tools has shone a spotlight on the digital divide. Many students around the world have been excluded from online learning because they lack broadband access and computers. Concerns have also been raised about the security of videoconferencing9 and about data privacy in the big data analytics used during the pandemic.10 Telecommunication companies have also been criticized for waiting for a crisis to offer pro-consumer data allowances.
Notes
1. Zoom (2020).
2. Mukherjee (2020).
3. Spencer, Nadella, and Hood (2020).
4. Sandvine (2020).
5. See Telefónica (2020).
6. DE-CIX (2020).
7. GSMA (2020).
8. Florance (2020).
9. Paul (2020).
10. OECD (2020).
References
DE-CIX (Deutscher Commercial Internet Exchange). 2020. “Internet Exchange Operator DE-CIX Sees a Strong Change in Internet User Behavior.” Press Release, March 18, 2020. https://www.de-cix.net/en/about-de-cix/media-center/press-releases/internet-exchange-operator-de-cix-sees-a-strong-change-in-internet-user-behavior.
Florance, Ken. 2020. “Reducing Netflix Traffic Where It’s Needed While Maintaining the Member Experience.” Innovation (blog), March 21, 2020. https://about.netflix.com/en/news/reducing-netflix-traffic-where-its-needed.
GSMA (GSM Association). 2020. “Keeping Everyone and Everything Connected: How Temporary Access to Spectrum Can Ease Congestion during the COVID-19 Crisis.” Newsroom (blog), March 31, 2020. https://www.gsma.com/newsroom/blog/keeping-everyone-and-everything-connected-how-temporary-access-to-spectrum-can-ease-congestion-during-the-covid-19-crisis/.
Mukherjee, Supantha. 2020. “Cisco’s Webex Draws Record 324 Million Users in March.” Technology News (blog), April 3, 2020. https://www.reuters.com/article/us-cisco-systems-webex-idUSKBN21L2SY.
OECD (Organisation for Economic Co-operation and Development). 2020. “Tracking and Tracing COVID: Protecting Privacy and Data while Using Apps and Biometrics.” OECD Policy Responses to Coronavirus (COVID-19), OECD, Paris. https://read.oecd-ilibrary.org/view/?ref=129_129655-7db0lu7dtotitle=Tracking-and-Tracing-COVID-Protecting-privacy-and-data-while-using.
Paul, Kari. 2020. “Zoom Releases Security Updates in Response to ‘Zoom-Bombings.’” Guardian, April 23, 2020.
http://www.theguardian.com/technology/2020/apr/23 /zoom-update-security-encryption-bombing. Sandvine. 2020. “The Global Internet Phenomena Report: COVID-19 Spotlight.” Sandvine, Fremont, CA. https:// www.sandvine.com/phenomena. Spencer, Michael, Satya Nadella, and Amy Hood. 2020. “Microsoft Fiscal Year 2020 Third Quarter Earnings Con- ference Call.” Microsoft, Redmond, WA, April 29, 2020. https://www.microsoft.com/en-us/Investor/events/FY -2020/earnings-fy-2020-q3.aspx. Telefónica. 2020. “Operators Advise a Rational and Respon- sible Use of Telecommunication Networks to Cope with Traffic Increases.” News Release, March 15, 2020. https:// www.telefonica.com/documents/737979/145808680/pr -usoresponsable-redes.pdf/874e69ed-a201-92b3-4d54 -60de558d0084?version=1.0. Zoom. 2020. “Zoom: Annual Report, Fiscal 2020.” Zoom Video Communications, San Jose, CA. https:// investors.zoom.us/static-files/28614884-1d63-477a -9148-a7039796f19c.
Spotlight 5.2
Data’s carbon footprint
Data infrastructure is becoming increasingly energy efficient and turning to renewable sources of energy. Increased use of information and communication technology solutions such as videoconferencing could help to reduce global carbon dioxide emissions by one-fifth by 2030.
Data infrastructure consumes significant amounts of energy, with environmental consequences, including global warming. Electricity consumption for data infrastructure amounted to 1 percent of the global total in 2018 (231 terawatt-hours) (figure S5.2.1, panel a). Although data traffic grew 100 percent from 2015 to 2018, the associated electricity consumption rose just 16 percent, and its share of total global consumption has remained constant. Huge gains in energy efficiency have made this possible. One reason is a shift from smaller data centers to more efficient larger ones, particularly among some of the bigger players in China, Japan, and the United States. Modernization of telecommunication networks is also contributing. Fiber-optic cable is 85 percent more energy efficient than vintage copper wires, and each successive generation of wireless technology conserves more energy than the previous one. For example, 4G can be more than 50 times more energy efficient than 2G.1
Figure S5.2.1 Worldwide greenhouse gas emissions from data consumption have been flat, even though electricity consumption has been growing
[Panel a: Electricity consumption of data infrastructure, 2015–18 (terawatt-hours; right axis: percent of world total). Panel b: Greenhouse gas emissions from data infrastructure, 2015–18 (tCO2e, millions; right axis: percent of world total). Legend: software and IT services; telecommunication services.]
Source: World Bank, original research based on annual reports from 49 companies in 26 countries. Data at http://bit.do/WDR2021-Fig-S5_2_1.
Note: Data were derived from environmental reports of companies accounting for 90 percent of data traffic. IT = information technology; tCO2e = tonnes of carbon dioxide equivalent.
Despite rising electricity consumption, greenhouse gas (GHG) emissions from data infrastructure have been flat since 2015, equivalent to 0.2 percent of the global total (see figure S5.2.1, panel b). This is due to the rising share of renewables in the electricity mix used by data centers.2 For example, Equinix, one of the world’s leading data center operators, increased its share of renewable energy from less than one-third in 2014 to 92 percent in 2018, leading to a two-thirds reduction in GHG emissions.3 Moreover, the tech giants Apple, Google, and Microsoft have switched to 100 percent renewable energy while completely offsetting their GHG emissions. Apple generates more than 600 megawatt-hours of its own energy through one of the largest investments in renewables outside of utility companies,4 while Google is the world’s largest nonutility buyer of renewable energy. Digital companies already account for one-quarter of the world’s renewable energy purchases,5 and they are encouraging their downstream suppliers to follow suit.6
Access to renewable energy is becoming an important factor for dozens of major investors in deciding where to expand data infrastructure. For example, Equinix seeks markets with favorable renewable energy policies when deciding on the location of new data centers.7 Foreign operators are developing renewable energy projects in low- and middle-income nations. For example, Vodafone Group is constructing a utility-scale solar park in the Arab Republic of Egypt.8
Data infrastructure is also playing a significant role in reducing emissions, as demonstrated by the COVID-19 outbreak.
The massive use of videoconferencing, fueled by the pandemic, has greatly reduced fuel consumption associated with travel, potentially on a permanent basis.9 Increased use of information and communication technology (ICT) solutions could lead to a 20 percent reduction in global carbon dioxide (CO2) emissions by 2030.10 This reduction is almost 10 times greater than the ICT sector’s own footprint during the same period and translates into as much potential emissions reduction as the mobility, manufacturing, agriculture, building, and energy sectors combined.
Notes
1. IEA (2020).
2. There are, nonetheless, some subtleties of environmental accounting. Emissions from purchased electricity are reported on either a market basis or a location basis (Equinix 2019). The market basis reflects the emissions from the electricity that a company purchases, which may differ from the electricity that the company uses at its location. Location-based emissions are higher, yet most companies report market-based emissions.
3. Equinix (2019).
4. Apple (2019).
5. Ambrose (2020).
6. In 2018 Apple announced that 23 of its suppliers had committed to 100 percent clean energy (Apple 2018).
7. Nareit (2019).
8. Vodafone (2019).
9. Darrow (2020).
10. GeSI (2015).
References
Ambrose, Jillian. 2020. “Tech Giants Power Record Surge in Renewable Energy Sales.” Guardian, January 28, 2020. https://www.theguardian.com/environment/2020/jan/28/google-tech-giants-spark-record-rise-in-sales-of-renewable-energy.
Apple. 2018. “Apple Now Globally Powered by 100 Percent Renewable Energy.” Press Release, April 9, 2018. https://www.apple.com/newsroom/2018/04/apple-now-globally-powered-by-100-percent-renewable-energy/.
Apple. 2019. “2019 Environmental Responsibility Report.” Apple Inc., Cupertino, CA, April 2019. https://www.apple.com/environment/pdf/Apple_Environmental_Responsibility_Report_2019.pdf.
Darrow, Barb. 2020. “Zoom Taps Oracle to Keep Schools Teaching, Businesses Running, Friends and Family Connected.” Oracle News Connect (blog), April 28, 2020. https://www.oracle.com/corporate/blog/zoom-selects-oracle-042820.html.
Equinix. 2019. “2018 Corporate Sustainability Report: Connecting with Purpose.” Equinix, Redwood City, CA. https://sustainability.equinix.com/wp-content/uploads/2019/12/Sustainability-Report-2018.pdf.
GeSI (Global e-Sustainability Initiative). 2015. #SMARTer2030: ICT Solutions for 21st Century Challenges. Brussels: Global e-Sustainability Initiative. http://smarter2030.gesi.org/downloads/Full_report.pdf.
IEA (International Energy Agency). 2020. “Data Centres and Data Transmission Networks.” Tracking Report, IEA, Paris, June 2020. https://www.iea.org/reports/data-centres-and-data-transmission-networks.
Nareit. 2019. “REIT Industry ESG Report.” Nareit, Washington, DC, June 2019. https://www.reit.com/sites/default/files/media/PDFs/Research/Nareit2019_SustainabilityReport_webv3.pdf.
Vodafone. 2019. “Sustainable Business Report 2019.” Vodafone Group Plc, Newbury, UK. https://www.vodafone.com/content/dam/vodcom/sustainability/pdfs/sustainablebusiness2019.pdf.
CHAPTER 6
Data policies, laws, and regulations: Creating a trust environment
Main messages
1. Trust in data transactions is sustained by a robust legal and regulatory framework encompassing both safeguards, which prevent the misuse of data, and enablers, which facilitate access to and reuse of data.
2. Safeguards must differentiate between personal data, requiring a rights-based approach with individual protection, and nonpersonal data, allowing a balancing of interests in data reuse.
3. Enablers for data sharing are typically more developed for public intent data, where public policy and law mandating data access and sharing are more readily established, than for private intent data, where governments have more limited influence.
4. Creation of a trust environment remains a work in progress worldwide, especially in low-income countries. There is no one-size-fits-all legal and regulatory framework. In countries with weak regulatory environments, the design of suitable safeguards and enablers may have to be carefully adapted to local priorities and capacities.
A trust framework of data safeguards and enablers
With the growing recognition of the use, misuse, and underuse of data, responsible governance of data has gained importance, resulting in new global legal and regulatory standards. This movement was propelled by US whistleblower Edward Snowden’s 2013 revelations of global surveillance by Western democracies,1 followed by the Cambridge Analytica scandal in 2018.2 In response, countries enacted major policies to protect data. A series of landmark rulings by India’s Supreme Court recognized a constitutional right to privacy, and the country is now considering new data protection legislation. In the European Union (EU), the General Data Protection Regulation (GDPR) took effect in 2018 and applies beyond the EU’s borders;3 it has inspired similar legislation in other jurisdictions, such as the US state of California.4 China implemented its Personal Information Standard in 2018, promulgated its Civil Code in 2020, and introduced a new draft Personal Data Protection Law for public consultation in 2020.5 Despite these important advances regarding personal data, legal frameworks for data governance across much of the developing world remain a patchwork, raising concerns about the ability of lower-income countries to benefit from the development opportunities emerging from the burgeoning global data economy.
This greater attention to the use and reuse of personal data is part of an evolving social contract around data, which remains under negotiation across the globe (see spotlight 6.1 for an example of how COVID-19 is creating new challenges for using data while protecting rights). To inform this process, this chapter lays out the legal mechanisms that enable trusted and trustworthy domestic and cross-border data transactions for the use and reuse of both personal and nonpersonal data.
Whether the focus is on the collection, use, transfer, or processing of data between businesses, or among citizens, businesses, and governments, each of these interactions is a data transaction with the potential to create value, as long as both parties sufficiently trust the overall process. However, a variety of factors can undermine trust. These may include the absence, weakness, or uneven application of the legal framework; weak institutions and law enforcement or the lack of effective ways for parties to enforce their rights; practices that unfairly benefit certain actors; skewed or lopsided incentives (see chapter 8); and poor or insecure infrastructure (see chapter 5).
From a normative perspective, trust is a function of both “hard law” and “soft law.” Hard law includes domestic, regional, and international law, as well as case law and statutory law that originate from tort, contract, and competition law. Some of the issues embedded in domestic law have their origins in well-established and commonly agreed standards derived from international law, conventions, and treaties. Emerging applications of trust law and competition law may also play a valuable role in strengthening the normative framework for data. Whereas hard law is shaped by state actors, soft law includes standards, terms and conditions of use, norms, codes of conduct, and other voluntary frameworks used by nonstate actors, including industry participants and civil society (see chapter 8). These soft law elements can play an equally valuable role in governing data use according to specific needs and cultural contexts.6
A central claim of this Report is that use of data for development purposes requires a legal framework for data governance that includes both safeguards and enablers.
Safeguards generally refers to those norms and legal frameworks that ensure and promote trust in the data governance and data management ecosystem by avoiding and limiting harm arising from the misuse of data or from breaches affecting their security and integrity. Enablers generally refers to those policies, laws, regulations, and standards that facilitate the use, reuse, and sharing of data within and between stakeholder groups through openness, interoperability, and portability. Whereas the approach to safeguards differs markedly for personal and nonpersonal data, a common set of enablers is relevant to both categories.
For the collection and processing of personal data, this Report proposes a rights-based approach, whereby access to personal data must first be adequately safeguarded before their use and reuse are enabled. This two-step process helps to rebalance power asymmetries between data holders/subjects and data controllers/users that can undermine trust. For the purposes of this chapter, personal data include not only data directly provided by an individual, but also personally identifiable information and machine-generated information that can readily be linked to an individual (such as mobile phone data).7
For nonpersonal data, this Report advocates a balance-of-interests approach to safeguards and enablers, recognizing that trade-offs typically arise between increasing data access and safeguarding intellectual property rights (IPRs) over nonpersonal data. The focus is thus on a legal framework that
enables the (re)use and sharing of data through regulatory predictability, data openness, and portability (the ability to readily transfer data from one service to another based on clear legal and technical standards). Of growing importance are data that blend both personal and nonpersonal sources, so-called mixed data.
The creation, collection, and use or processing of personal and nonpersonal data by public or private sector entities in both domestic and cross-border contexts interact dynamically in a three-dimensional legal and regulatory space in which different elements of the legal framework apply (see figure 6.1). The underlying type of data does not necessarily determine how the data might be treated legally across the data value chain; that depends on how such data are used or processed. For example, data that start off as public sector and personal data (such as household survey, health, or geolocation data) may end up as private sector and nonpersonal data (when integrated as part of a proprietary algorithm and perfectly deidentified). Similarly, data that start out as private data may end up in the public domain if published as open data or shared with government under a data sharing agreement. These dynamic shifts in data uses may change the legal treatment of those data accordingly.
The trust framework encompassing safeguards and enablers is underpinned by rule of law and good governance principles. These include certainty, transparency, accountability, nondiscrimination, fairness, inclusiveness, and openness. They are subject to due process limitations such as necessity and proportionality. Transparency, accountability, and certainty in rulemaking can be reinforced by ensuring that laws and regulations are developed according to good regulatory practices.
These include supporting consultative rulemaking8 and ensuring that regulations are based on evidence, with stakeholder impacts and spillover effects fully considered through regulatory impact analysis.9 In addition, recent developments in regulatory design have included efforts to adapt regulations to the digital age. Mechanisms such as regulatory sandboxes and laboratories help make regulations more agile and readily adaptable to evolving circumstances. By drafting principle-based and technologically neutral laws and regulations, policy makers can keep them relevant as technologies evolve while reducing compliance burdens.
To capture the current robustness and completeness of normative frameworks for data governance around the world, the chapter draws on a new Global Data Regulation Survey conducted exclusively for this Report.10 It collected information on attributes of the regulatory framework in 80 countries (covering 80 percent of the world’s population) selected from global regions and country income groups across the development spectrum. The survey entails a detailed assessment of domestic laws, regulations, and administrative requirements, reflecting the regulatory status of each country as of June 1, 2020. Survey results are summarized in a variety of subindexes that capture different aspects of the regulatory environment for safeguards and enablers.
This chapter focuses squarely on the legal dimension of data governance. Chapter 7 then examines the resulting economic trade-offs, and chapters 8 and 9 discuss the design of institutional ecosystems to support implementation and enforcement.
Building safeguards for trusted data use
The term safeguards refers to the trust environment around the collection and use of data.
It includes supporting individuals’ agency (that is, their ability to exercise control) over how their personal data are used, through mechanisms such as consent, rights of use of data, and regimes that allow reuse of data for “legitimate purposes” without express consent. Safeguards also encompass how data are secured and accessed, covering the obligations of those who collect, process, or use data to take precautions to ensure the integrity of the data and protect data rights, including intellectual property rights and other limitations on the use of nonpersonal data (see figure 6.1).
Figure 6.1 Envisioning the multidimensional nature of the legal framework for trust
[Three axes: types of data (personal, nonpersonal); types of actors (public, private); types of transactions (domestic, cross-border).]
Source: WDR 2021 team.
Safeguards are analyzed primarily according to whether they relate to personal data, nonpersonal data, or mixed data. The degree of sensitivity of these types of data differs markedly, leading to various legal approaches.
Safeguards for personal data, nonpersonal data, and mixed data
Safeguards for personal data are grounded in a rights-based framework that has evolved over time (see figure 6.2). These safeguards have their origin in the establishment of the “rule of law” in conjunction with the expression of individual rights in the Enlightenment and were codified in international law after World War II. They were further refined in the context of analog data in the 1970s and 1980s with the Fair Information Practices, the Council of Europe’s Convention 108 for the Protection of Individuals with regard to Automatic Processing of Personal Data,11 and the first guidelines issued by the Organisation for Economic Co-operation and Development (OECD). Safeguards must necessarily adapt to technological change and will continue to evolve accordingly. For example, the OECD guidelines were updated after the launch in 1995 of the World Wide Web, and Convention 108 was updated to Convention 108+ in response to the entry into force of the GDPR.
Safeguards for nonpersonal data entail a more straightforward balancing of economic incentives and interests, grounded in IPRs as typically enshrined in domestic law.
For datasets containing mixed data, it is the responsibility of the data processing entity to ensure that personal data are protected. This compliance challenge has become more acute in recent years because source data and collection methods have evolved and blurred the distinction between the traditional legal definitions of personal and nonpersonal data.
Figure 6.2 The evolution of data protection
[Timeline ending in “Next?”: Treaties of Westphalia (introduces the rule of law); Déclaration des droits de l’homme et du citoyen (expression of individual rights); Article 12, UDHR (rights codified in international law); Fair Information Practices (modern expression of data privacy); Council of Europe Convention 108; OECD guidelines; WWW launched; EU directive (first expression of data protection rights of the digital age); privacy by design; GDPR; UN Principles; Convention 108+ (codifies data protection).]
Source: WDR 2021 team.
Note: This figure shows a clear arc from historical concepts of rights governing interactions between the state and the individual (and between states) to principles guiding data protection. EU = European Union; GDPR = General Data Protection Regulation; OECD = Organisation for Economic Co-operation and Development; UDHR = Universal Declaration of Human Rights; UN = United Nations; WWW = World Wide Web.
The Global Data Regulation Survey conducted for this Report provides an overview of the extent to which good-practice data safeguards have been adopted. Across the 80 countries surveyed, about 40 percent of the elements of good-practice regulatory safeguards are in place. Scores range considerably, from less than 35 percent in low-income countries to more than 50 percent in high-income countries, but even among the latter the regulatory framework is far from complete.
Of the high-income countries included in the survey, Australia, the United Kingdom, and Uruguay stand out as those with the most advanced safeguards. Among the strongest middle-income countries are Colombia, Moldova, and Nigeria. Other low- and middle-income countries that have worked to develop regulatory frameworks for safeguards include Benin, Mexico, and Turkey. Mauritius, a standout among its middle-income peers, performs well on most safeguard measures. It has deliberately designed and implemented policies based on best practices and has distinguished itself as one of the first Sub-Saharan African countries to ratify Convention 108+. In Latin America, Uruguay is one of two countries to have received an adequacy determination from the European Commission.
Overarching safeguards for cybersecurity and cybercrime
A key element in establishing trust in the data ecosystem for both personal and nonpersonal data is ensuring the security of the network infrastructure and elements over which data flow.
Cybercrime laws effectively give teeth to cybersecurity policies.
Although there is no universally accepted definition of cybercrime, the concept encompasses both a narrow view, covering criminal activities that target information and communication technologies (ICT) and software, and a broader view, covering traditional crimes committed in cyberspace.12 In practice, the scope of cybercrime is typically understood to include unauthorized access to a computer system (sometimes called hacking), unauthorized monitoring, data alteration or deletion, system interference, theft of computer content, misuse of devices, and offenses related to computer content and function.13
Cybercrime knows no borders. A cybercrime can be committed from any internet-connected computer, wherever it is located, or from any public or private entity that relies on ICT systems. Similarly, its impact can be felt anywhere, even outside the jurisdiction where the cybercriminal is physically located. Thus to be truly effective, a cybercrime law needs to extend beyond dealing with criminal activity within a subnational or national jurisdiction and become a tool to maximize cross-border cooperation.14 This requirement entails the legal notion of dual criminality, under which an activity must be a criminal offense in both of the jurisdictions concerned.15 It also demands practical collaboration, usually achieved through mutual legal assistance treaties (MLATs).
Countries enter into MLATs either through bilateral treaties with other countries or by adhering to an instrument that features a built-in MLAT process, such as the Council of Europe’s Budapest Convention of 2001. The main legal instrument on cybercrime in Europe and beyond, this convention provides for balancing security interests with respect for human rights.16 Sixty-five countries have acceded to the convention, with an additional 12 states participating as observers.17 Of the members and observers, 26 countries are lower-middle-income.
Recently, some governments have been sidestepping the MLAT process by making requests for evidence directly to foreign law enforcement agencies and allowing them to do likewise. In this vein, the United States adopted the Clarifying Lawful Overseas Use of Data (CLOUD) Act of 2018, which authorizes the US government to enter into bilateral agreements with foreign governments that allow the parties, under certain circumstances, to remove any legal barriers that would prevent the other party from seeking and obtaining data directly from the service providers in the other country.18 This arrangement has attracted comment because it could sidestep legal protections for personal data.19 The European Union is considering a draft regulation with similar provisions.20
Cybersecurity encompasses the data protection requirements for the technical systems used by data processors and controllers, as well as the establishment of a national Computer Security Incident Response Team (CSIRT), an expert group that handles computer security incidents (see chapter 8). In addition to dealing with the criminal behaviors discussed, cybersecurity also builds trust by addressing unintentional data breaches and disclosures (such as those resulting from badly configured servers) and holding firms accountable.
Overall, the Global Data Regulation Survey reveals a low level of uptake of cybersecurity measures (figure 6.3). None of the low-income countries included in the survey has legally imposed a full range of security measures on data processors and controllers. Even among high-income countries, barely 40 percent
of those surveyed require data processors and controllers to comply with these security requirements, such as by adopting an internal policy establishing procedures for preventing and detecting violations; establishing the confidentiality of data and systems that use or generate personal data; appointing a personal data processing or information security officer or manager; performing internal controls; assessing the harm that might arise from a data breach; or introducing an awareness program among employees. CSIRTs are far more prevalent. They can be found in all high-income countries and in about one-third of low-income countries.
Among lower-middle-income countries, Kenya’s new Data Protection Act offers a good example of best practice through its comprehensive cybersecurity requirements. It requires data controllers to consider measures such as pseudonymization and encryption of data; an ability to restore the availability of and access to personal data in the event of a physical or technical incident; and mechanisms to identify reasonably foreseeable internal and external risks to personal data. It also requires steps to ensure that safeguards are established, effectively implemented, and continually updated in response to new risks or deficiencies.
Safeguarding personal data
To better address underlying concerns about the power asymmetries between (individual) data subjects and data processors and collectors, this Report advocates an approach based on control over personal data rather than one grounded in data ownership (see spotlight 6.2). Under the rights-based approach to protection of personal data, individuals have fundamental rights regarding their personal data. These rights are both substantive and procedural.
Substantive rights include measures preventing the unauthorized disclosure of personal data and the use of personal data for unwarranted surveillance, unfair targeting, exclusion, discrimination, unjust treatment, or persecution. Such substantive rights also require purpose specification, data minimization, and storage limitations.
Procedural rights are built around the concepts of necessity, transparency,21 accountability, proportionality, and due process. They include rights to receive notice about and to object to how data are used; rights to access, correct, and erase data (including the right to be forgotten);22 and rights to redress and remedy.
These rights are grounded mainly in domestic law. The absence of a harmonized global legal framework for protection of personal data affects cross-border data transactions involving personal data, which are especially limited in lower-middle-income countries (see chapter 7).
Adoption of data protection laws is comparatively widespread.23 Nearly 60 percent of countries surveyed for this Report have adopted such laws, ranging from 40 percent of low-income countries to almost 80 percent of high-income countries (figure 6.4). Yet the quality of such legislation is uneven, with important good-practice elements often lacking. Legal frameworks for the protection of personal data should typically include individual rights to challenge the accuracy of and object to the use of personal data, and parallel requirements for data processors to limit the purpose of data use, minimize the volume of data collected, and limit the time frame for data storage. These legal provisions are much less prevalent in low- and middle-income countries than in high-income countries.
Although many lower-middle-income countries have laws on the books, their enforcement is uneven: only some 40 percent of low-income and lower-middle-income countries have created a data protection authority, compared with some 60 percent of high-income countries.
Figure 6.3 Gaps in the regulatory framework for cybersecurity are glaring across country income groups
[Chart: percentage of countries (0–100) in each income group (high-income, upper-middle-income, lower-middle-income, low-income) with security requirements for automated processing of personal data; a CSIRT; cybersecurity requirements for data processors and controllers; and regulation of cybercrime activities.]
Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866. Data at http://bit.do/WDR2021-Fig-6_3.
Note: The figure shows the percentage of countries in each country income group that had adopted good-practice legal and regulatory frameworks for cybersecurity and cybercrime as of 2020. CSIRT = Computer Security Incident Response Team.
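The purpose limitation, data minimization, and storage limitation requirements discussed above, together with the pseudonymization measure cited from Kenya’s Data Protection Act, can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not a compliance tool, and it is not drawn from any statute or from the Global Data Regulation Survey: the purposes, field names, 365-day retention period, and helper names (`PURPOSE_FIELDS`, `RETENTION`, `pseudonymize`, `prepare_for_purpose`) are all assumptions. Pseudonymization is shown as a keyed hash (HMAC-SHA256), one common technique among several; because whoever holds the key can re-link records, the output is pseudonymous rather than anonymous.

```python
from __future__ import annotations

import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical purposes and the only fields each purpose may use
# (purpose limitation plus data minimization). Illustrative values only.
PURPOSE_FIELDS = {
    "billing": {"customer_id", "plan", "amount_due"},
    "network_planning": {"cell_id", "data_used_mb"},
}

# Hypothetical retention period (storage limitation).
RETENTION = timedelta(days=365)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always yield the same token, so records can
    still be linked; anyone holding the key can re-identify them.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_purpose(records: list[dict], purpose: str, today: date, key: bytes) -> list[dict]:
    """Return in-retention records reduced to the fields the purpose needs."""
    if purpose not in PURPOSE_FIELDS:
        # Purpose limitation: refuse processing for an undeclared purpose.
        raise ValueError(f"undeclared purpose: {purpose!r}")
    allowed = PURPOSE_FIELDS[purpose]
    prepared = []
    for record in records:
        if today - record["collected_on"] > RETENTION:
            continue  # Storage limitation: skip records past the retention period.
        # Data minimization: keep only the fields allowed for this purpose.
        kept = {field: value for field, value in record.items() if field in allowed}
        if "customer_id" in kept:
            kept["customer_id"] = pseudonymize(kept["customer_id"], key)
        prepared.append(kept)
    return prepared


if __name__ == "__main__":
    records = [
        {"customer_id": "C-0001", "name": "Person A", "plan": "prepaid",
         "amount_due": 300, "cell_id": "CELL-17", "data_used_mb": 812,
         "collected_on": date(2020, 3, 1)},
        {"customer_id": "C-0002", "name": "Person B", "plan": "postpaid",
         "amount_due": 950, "cell_id": "CELL-04", "data_used_mb": 120,
         "collected_on": date(2018, 1, 15)},
    ]
    print(prepare_for_purpose(records, "network_planning", date(2020, 6, 1), b"demo-key"))
    # prints [{'cell_id': 'CELL-17', 'data_used_mb': 812}]
```

Raising an error for an undeclared purpose, rather than passing every field through, mirrors the idea that processing must be tied to a specified purpose. A real system would also need access controls and secure key management, which this sketch omits.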
In practice, the uneven quality of data protection legislation affects the effectiveness of safeguards for personal data. After many years in the making, both Kenya and Nigeria recently updated their legal regimes for data protection. In 2019 Kenya’s new Data Protection Act entered into force, while Nigeria’s National Information Technology Development Agency issued a new Data Protection Regulation. Both instruments reflect many of the elements contained in the GDPR. India is also debating landmark data protection legislation.
Requiring consent or other lawful bases for data collection and processing. Most data protection laws rely on individual consent as one lawful means of limiting how data about individuals can be collected and used.24 The appropriate lawful basis for data processing depends on factors that include how the data will be used and the relationship between the parties. For example, consent may not be the appropriate basis for data processing by public authorities.25
The consent model has normative and practical limitations. Current commercial practices often adopt a “tick the box” approach to obtaining consent, and they are more often based on incentives to limit corporate liability than on a desire to ensure that consent is “informed” (that is, that individuals fully understand what will happen to the information they have authorized for collection and are effectively in control of how their data will be further used and shared). Privacy notices are often long, complex documents written by companies’ legal teams. It is therefore difficult for people to read all the disclosure documents for the websites they visit or the apps on their smartphones.
This difficulty is particularly acute in the developing world, where literacy rates are often low and individuals face language and technical barriers to understanding privacy notices.
In such cases, data processors should take extra care to obtain informed consent through adapted means. Firms can use consent to justify collecting and processing excessive amounts of data, especially in countries where data protection authorities may not have enough resources to monitor and enforce compliance with other obligations, such as data minimization.
Figure 6.4 Progress on personal data protection legislation differs markedly across country income groups [Chart: percentage of countries in each income group (high-income, upper-middle-income, lower-middle-income, low-income), on a scale of 0 to 100, with each element: data protection law enacted; no exception to limitations on data collecting and processing by governments; necessity and proportionality test for government exceptions; purpose limitation requirement; data minimization requirement; data storage limitations; requirements to incorporate privacy by design; regulation of data sharing with third parties; individual right to challenge accuracy and rectify personal data; regulatory limitation on algorithmic decision-making; individual right of redress to object to use of personal data; data protection authority created.] Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866. Data at http://bit.do/WDR2021-Fig-6_4. Note: The figure shows the percentage of countries in each country income group that had adopted good-practice legal and regulatory frameworks to safeguard personal data as of 2020.
Addressing these concerns would require taking a more user-centric approach to obtaining informed consent for the collection of volunteered data, including using
simplified terms of service and embedding responsible data collection practices in operations to avoid collecting excessive amounts of data. Although consent may still be an appropriate lawful basis in some instances (such as when data are volunteered by individuals), newer technologies involving passive data collection (such as by the Internet of Things) and merging or linking datasets to make inferences pose further challenges to the consent model.
Alternatives to consent include relying on other lawful bases for processing personal data, including resorting to a “legitimate purpose” test or fiduciary duty requirement. A legitimate purpose test would limit the use of personal data to what is compatible, consistent, and beneficial to data subjects based on the original purpose for which the data were collected. Under this approach, data could still be used for more wide-ranging purposes if they are anonymized or aggregated to, for example, develop new products and services, or to make risk assessments without impinging on the data subject’s rights. Relying on a fiduciary duty approach would require data collection and processing firms to always act in the best interests of data subjects and in ways that are not detrimental to them. Legally obligating providers to act in the best interests of their customers can help establish trust and confidence among customers that their data are being used responsibly. Examples of fiduciary duty breaches include using customer data to unfairly manipulate purchasing decisions. Another alternative to these approaches that might require less oversight is to ban use of certain types of data outright based on identified possible misuses of personal data.26
In principle, the limitations on the use of personal data enshrined in data protection legislation apply to all parties that process or control personal data.
Nevertheless, governments may choose to create exceptions to these compliance and liability limitations for data processing by public sector entities. The Global Data Regulation Survey indicates that these exceptions are widespread in all surveyed countries that have data protection legislation (figure 6.4). Most of these exceptions are limited and pertain to specific data uses, such as in relation to national security as in Brazil and India27 or in transactions involving health data as in Gabon. Other countries have passed laws that provide for more wide-ranging exceptions, including exemption from the requirement to obtain consent from data holders when performing lawful government functions such as service delivery.28
Where such government exceptions exist, good practice calls for them to be transparent and objective. They should also be limited in scope and duration (such as through sunset provisions) to respect due process limitations. These exceptions must be “necessary and proportionate” to the intended objectives—limitations designed to ensure that any established exceptions are lawful and balanced against the objective being sought.29 Furthermore, exceptions should be consistent with international human rights law. More than one-third of high-income countries require justification for the exceptions, while less than 10 percent of surveyed low-income countries place such process limitations on government action. This lack of limitations creates additional opportunities for unchecked state surveillance or mission creep, thereby undermining trust in data use.30
Meeting technological challenges. Rapid technological progress in data processing, machine learning, and artificial intelligence (AI) poses challenges to current data protection frameworks.
In particular, traditional data protection is based on the notion that information is volunteered by the data subject, whereas data analysis is increasingly based on observed data (obtained from passive scraping of information from devices and social media accounts) or inferred data31 (generated from a vast array of correlates using statistical techniques). In addition, AI and machine learning rely on large-scale datasets to function, creating tensions with established data protection principles such as data minimization. Although linking these data sources provides a fuller picture of the individual, the linked data could also have a negative impact on the subject if used in decisions such as on credit or employment, with limited enforceability of the protections applicable to volunteered data, including the rights to access and seek correction of erroneous information.
The increasingly widespread practice of linking datasets to feed algorithms also stretches the limits of technical mechanisms to protect personal data, such as anonymization. Unlike pseudonymized data, data that have been thoroughly deidentified are no longer legally considered to be personal data. Thus they can be published or used outside the scope of data protection law, even if the original source contains personal data.32 Although anonymization techniques can protect individual datasets, research has shown that linking datasets enables the reidentification of individuals in deidentified data and risks blurring the boundary between personal and nonpersonal data.33 At the same time, anonymization techniques can
reduce the size and accuracy of datasets, affecting their value to third parties once published.34
Even when anonymization techniques can deidentify individuals, concerns are growing about the use of such data to identify groups of people who could be targeted for surveillance or discrimination (including groups defined by ethnicity, race, religion, or sexual orientation).35 Data protection laws need to keep pace with technological efforts aimed at deanonymization.36 Laws could require data users to adopt a holistic approach37 to data protection that can be adapted to different risks from data uses,38 including protecting data by design and default.
Adopting “data protection by design.” Data protection by design embeds data protection practices into the initial design phase of data-driven products and services39 through a combination of hardware and software features, legal and administrative provisions, and privacy-enhancing technologies (PETs) using encryption40 and statistical techniques.41 Such measures complement and enhance existing legal data protection in ways that reduce the risk of identifiability of data.42
Data protection by design has evolved from “privacy by design,” which was first adopted as an international standard in 2010. It was later recognized by its inclusion in the Mauritius Declaration on the Internet of Things in 2014,43 with a new International Organization for Standardization (ISO) standard under development.44 The concept—originally developed in Canada45—has been integrated into data protection regulation and practice in the European Union,46 as well as Australia (State of Victoria);47 Hong Kong SAR, China;48 and the United Kingdom.49 Nevertheless, the Global Data Regulation Survey indicates limited uptake of data protection or privacy by design approaches.
Less than 20 percent of the countries surveyed have adopted such requirements, ranging from 36 percent uptake in the high-income countries surveyed to negligible adoption in middle-income countries (figure 6.4). An interesting exception is Benin, which mandates “data protection by design” in its Digital Code Act.
PETs are often used to deidentify data at the source (for example, by relying on anonymization and aggregation) to reduce their identifiability. The result may be a trade-off between the level of data protection afforded and the resulting usefulness of the data (for data uses requiring granular or identifiable characteristics such as gender or age). Research showing the ease of reidentifying previously deidentified data (using only four data points50 or when linking datasets) has highlighted the limitations of current anonymization methods and has prompted the development of new techniques.51 Separately, the value of encryption-based PETs may be limited if back doors are built into these systems, as some law enforcement authorities argue they should be.
These limitations have also prompted the emergence of other mechanisms to protect personal data, including personal information management systems (PIMS) such as Safe Sharing sites52 and personal data stores.53 These tools can help users store, use, and manage how their personal information is shared with third parties. To address certain cybervulnerabilities and technical features of data protection by design and act as effective safeguards, PETs should be accompanied by supporting organizational and behavioral measures.54
Dealing with automated processing. The growing use of algorithms for automated processing of personal data can add significant value through the application of predictive analytics, but it poses additional regulatory and societal challenges.
These include algorithmic bias, risks to personal data protection, and lack of transparency, accountability, and other procedural safeguards (such as redress) to ensure that decisions made on the basis of automated processing comply with due process.55 Only about 30 percent of countries included in the Global Data Regulation Survey have put in place measures to restrict decision-making based on automatically processed personal data (figure 6.4). Among the relatively small number of countries whose laws address this, Côte d’Ivoire has included provisions in its data protection act that prohibit the use of automated processing of personal data in judicial decision-making to prevent bias.56
Automated processing of personal data in the criminal justice sector is an example of controversial public sector use of these technologies—especially those using facial recognition—that can perpetuate biases.57 A 2016 study conducted in Oakland, California, found that, despite survey data showing an even distribution of drug use across racial groups, algorithmic predictions of police arrests were concentrated in predominantly African-American communities, creating feedback loops that reinforced patterns of structural or systemic bias in the history of police arrests.58 Algorithms can also introduce racial biases when facial recognition algorithms are trained predominantly on data from Caucasian faces, significantly reducing their accuracy in recognizing other ethnicities.59 Evidence suggests that
racial60 and gender61 bias in private sector uses of AI for decision-making is also prevalent.
Additional challenges within the public sector include a lack of transparency and accountability in the use of automated decision-making systems. Many of the technologies procured by public sector entities are developed by private sector corporations. Thus, the underlying algorithms may be subject to copyright or other IPRs that restrict the ability to undertake independent third-party audits. The use of such technologies by the public sector, without implementation of the appropriate audits and grievance redress mechanisms, may impair public trust in data processing by institutions and lead to discrimination or otherwise unfair decisions.
Because of these challenges, as the uptake of AI technologies and automated decision-making systems increases in both the public and private sectors, some principles for algorithmic regulation are emerging at both the national and international levels. Internationally, the focus has frequently been on developing guiding principles based on data ethics. For example, the OECD and the Group of Twenty (G-20) published two closely related sets of principles on ethical AI in 2019 that highlight the need to ensure transparency, explainability, and inclusion of unrepresented or vulnerable groups in the design and implementation of AI systems.62 Fulfilling this need will require significant capacity-building efforts to promote responsible use of AI in lower-income countries. Principles grounded in data ethics can be applied to other types of data uses that may have important societal impacts.
Human rights-based frameworks, for example, can provide useful guiding principles for responsible data use.63 Some countries have made efforts to support transparency and accountability in the use of AI and automated decision-making systems in the public sector by publishing the source code of algorithms in public registers,64 revising procurement rules, and developing charters,65 regulations, or certifications.66 In February 2020, a Dutch court ruled that an automated surveillance system developed to detect welfare fraud in the Netherlands (SyRI) violated human rights by not striking a “fair balance” between its objectives and its risk to privacy. It then halted the system.67
Relying on competition and consumer protection laws. In countries where data protection legislation is not yet in place, other statutory instruments—notably, consumer protection and competition legislation—have been leveraged to protect the data rights of individuals, notwithstanding these laws’ distinct legal focus. Under a rights-based approach, data protection law is generally aimed at achieving individual agency, whereas consumer protection law aims to promote economic fairness for consumers, and competition law strives for fairness among businesses. These approaches are complementary, but they are not an adequate substitute for the scope and protection of a rights-based data protection legal framework. Nonetheless, consumer protection agencies may have wider-ranging powers than data protection authorities,68 equipping them to address some of the issues underlying misuse of personal data, such as unfair consumer practices or competition concerns (see chapter 7 for further discussion of data and competition issues).69
Safeguarding nonpersonal data
Safeguards for the domestic use and reuse of nonpersonal data revolve around the protection of intellectual property rights fit for the digital age, as well as cybersecurity measures.
Various contractual elements affecting how entities use and reuse nonpersonal data (and even mixed data) are also relevant, including contracts themselves (terms and conditions, assignment of liability and remedies), as well as industry standards, codes of conduct, and audit requirements. Soft law tools include the use of standards to broker trust among entities exchanging data.
Nonpersonal data produced by the private sector can be protected under copyright, although copyright is limited to protecting creative expression, such as compilations, as opposed to raw data. Some governments have introduced innovations to overcome these limitations.70 Observing that the rights to data utilization may be controlled by contract but are not always specified in its terms, Japan’s Ministry of Economy, Trade and Industry updated application of the Unfair Competition Prevention Act to provide protection for industrial data by publishing guidelines along with model contract clauses for data transactions.71 India’s Ministry of Electronics and Information Technology published a draft governance framework for nonpersonal data, recommending clarifications on the scope, classification, and rights of use of nonpersonal data, and the creation of a nonpersonal data authority.72
Governments may also wish to establish rules to support the reuse of public sector data by preventing the private sector from setting excessively high prices for the use of licensed data-driven products and services developed using public sector, or otherwise “high value,” data. One mechanism is to require firms to license such products on fair, reasonable, and non-discriminatory (FRAND) terms by considering
them “essential data infrastructure.” Governments may, however, find that IPR protection of nonpersonal data conflicts with other policies that encourage the interoperability of data systems and the free reuse of datasets.
Protection of nonpersonal data under an IPR regime is currently more prevalent in upper-middle-income countries than in most of the low-income countries surveyed. Fifty percent of upper-middle-income countries protect nonpersonal data under their respective IPR frameworks. For example, Brazil’s copyright law covers the use of databases containing “economic rights.”73 Similarly, in Bangladesh, programming codes, data, and charts are deemed to be the property of the owner, as indicated in the 2000 Copyright Act.
Creating enablers for data sharing
This section examines a variety of enablers, including those related to electronic transactions (e-transactions), data sharing policies (including open data, access to information regimes, open licensing), and exceptions to the liability of data intermediaries.
Enablers are primarily analyzed according to the domain of the data—that is, whether data are generated or controlled, or both, by the public or private sector. This approach highlights the varying margin of control that governments have over these two types of data. For public sector data, governments can employ several policy and legal tools to directly mandate access to and sharing of data—indeed, some already do so for certain health, patent, and even airline passenger data. By contrast, most data transactions involving the private sector are based on voluntary contractual agreements. The government’s role is largely limited to creating incentives to promote private sector data sharing. Although the discussion here deals mainly with domestic data transactions, many of the enablers can be adapted to cross-border data transactions (see chapter 7).
Across the 80 countries surveyed for this Report, just under half (47 percent) of the elements of a good-practice regulatory framework for enabling data use and reuse are in place. The scores range considerably, from 30 percent among low-income countries to 62 percent among high-income countries. Although Estonia and the United Kingdom stand out among the high-income countries surveyed for the most advanced enablers, their performance is matched in the middle-income group by Mexico. Several other low- and middle-income nations are also making progress in establishing regulatory frameworks to enable data reuse, such as China, Colombia, Indonesia, and Nigeria.
Overarching enablers for electronic transactions
Many data uses or transfers are executed via electronic transactions. Individuals using their data to transact online need assurance that their data are being used in a safe and secure manner. Laws governing e-commerce and e-transactions provide an overarching legal framework that helps create trust in both public and private sector online data transactions, which, in turn, encourages use of data online.
Introducing e-commerce laws. A good-practice regulatory environment for electronic transactions begins with foundational e-commerce legislation, which is a prerequisite to the widespread use of more sophisticated online credentials. Such laws are relatively widespread: more than 70 of the countries surveyed, including about 70 percent of low-income countries surveyed, have such laws, and there is little variation across country income groups (figure 6.5). Legal recognition of electronic signatures is one of the few areas in which high-income countries remain far ahead of low- and middle-income countries.
Establishing legal equivalence of paper-based and electronic communications.
In a legal framework, the central issue is to establish that a data transfer will not be denied legal value merely because it is done electronically—that is, the online transaction, contract, or communication has legal equivalence to physical transactions, and electronic evidence has probative value.74 For example, electronic contracts and signatures are given the same legal value as a wet ink signature on a paper contract, and digital evidence has the same value as physical evidence.75 The majority of surveyed countries’ e-commerce legislation includes such provisions (figure 6.5), an unsurprising finding given that model laws on e-commerce were promulgated in the late 1990s.76 For example, provisions enabling e-transactions are found in Morocco’s Law No. 53-05 (2007), and good-practice provisions are embedded in Thailand’s Electronic Transactions Act (2019 amendments).
Authenticating parties to an online transaction. Special legal treatment surrounds the manner in which parties to an online transaction are authenticated. Most laws governing e-transactions take a layered approach to the digital authentication of parties to a transaction, with built-in recognition that certain types of online transactions require greater degrees of reliability about the identity of parties, while others
require lower levels of assurance. Some—such as land transactions and certain family law matters, including marriage and divorce—are generally outside the scope of these laws because of the sensitive nature of the transaction. For transactions requiring a high level of assurance, public or private key infrastructure is often recognized in e-transaction laws as providing robust authentication, and it is backed up by a digital certification process.77 Other trust services may also be specified as a basis for verifying and validating electronic signatures, seals, or time stamps; verifying and validating certificates to be used for website authentication; and a range of activities related to data transfers.78
Introducing digital identification. An important tool for authentication of parties to a digital transaction is a trusted digital identification system with widespread coverage, allowing individuals to securely prove their identity in online settings. Currently, an estimated 1 billion people worldwide do not have government-recognized proof of their identity (and many more do not have the means to securely and reliably prove who they are in the digital world).79 Although the use of digital identity verification and authentication tools is on the rise, driven in part by advances in connectivity as well as growth in digital payments and services,80 fewer than half of surveyed countries have government-recognized digital identification systems that would enable people to remotely authenticate themselves to access e-government services. Those that do are mainly higher-income nations (figure 6.5).
Ensuring technical neutrality of online systems. E-transaction laws should be principle-based and technology-neutral so that they accommodate a wide range of technical solutions and avoid requiring specific authentication technologies to the exclusion of others.
Such requirements avoid capture of the e-transaction or authentication market and help laws adapt as technologies evolve.81 Technology neutrality is also a feature of digital identity programs and of some digital identity laws.82
Enabling reuse of public intent data
The challenges with sharing and reusing public sector data abound. They include barriers to the real-time provision of data; data not being shared or published in reusable formats (standardized and machine readable with metadata); and data not being provided at reasonable cost. Usage is also affected by the quality or relevance of the data being shared. Political economy factors, including the absence of a data sharing culture in public administration and lack of coordination among government entities, can further impede the exchange of public sector data (see chapter 8).
Overcoming these challenges can yield considerable returns. An impact assessment of the 2003 Directive on the Reuse of Public Sector Information found that in the European Union the direct economic value of public sector information was €52 billion in 2017, potentially rising to €194 billion by 2030.83 In recognition of such potential value, national governments have ramped up efforts to use policy, legal, and regulatory tools to mandate data sharing within and beyond the public sector.
A good-practice regulatory environment for enabling reuse of public sector data would include foundational legislation on open data and access to information, as well as digital identity verification and authentication; a data classification policy; adoption of syntactic and semantic interoperability; and user-friendly licensing arrangements. The surveyed countries have adopted about half of such good practices, ranging, on average, from less than 30 percent by low-income countries to two-thirds by high-income countries (figure 6.6).
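To put the EU impact-assessment estimate cited above in perspective, the short Python sketch below computes the compound annual growth rate that would carry the direct economic value of public sector information from €52 billion in 2017 to €194 billion in 2030. Steady compound growth is an illustrative assumption; the source gives only the two endpoint figures, not a growth path.

```python
# Direct economic value of public sector information in the EU, as cited in
# the text (EUR billions). Steady compound growth between the two years is an
# illustrative assumption, not part of the source estimate.
VALUE_2017 = 52.0
VALUE_2030 = 194.0


def implied_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate linking start_value to end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


rate = implied_annual_growth(VALUE_2017, VALUE_2030, 2030 - 2017)
print(f"Implied growth: {rate:.1%} per year")  # prints "Implied growth: 10.7% per year"
```

Read this way, the projection is consistent with the value of public sector information growing by roughly a tenth each year for 13 consecutive years.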
Figure 6.5 Adoption of e-commerce and related legislation is widespread across country income groups [Chart: percentage of countries in each income group (high-income, upper-middle-income, lower-middle-income, low-income), on a scale of 0 to 100, with each element: e-commerce law; legal equivalence of paper-based and electronic communications; legal recognition of electronic signatures; technological neutrality of electronic communications; digital ID system to access e-government services.] Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866. Data at http://bit.do/WDR2021-Fig-6_5. Note: The figure shows the percentage of countries in each country income group that had adopted good-practice legal and regulatory frameworks for e-commerce as of 2020.
Legislation to promote and regulate the publication and use of public sector data (open government
data) can be passed as stand-alone open data acts, such as in the Republic of Korea and Mexico; embedded in other related legislation, such as the laws mandating data sharing in Australia,84 India, and the United Kingdom;85 or enacted through broader e-government omnibus legislation, such as France’s Law for a Digital Republic.86 The matter can also be tackled at the supranational level, such as through the European Union’s Open Data Directive of 2019 (replacing the 2003 Directive on the Reuse of Public Sector Information), which includes a list of “high value datasets”87 to be published at no charge as key inputs to the development of AI.
Open data policies or laws and access to information (ATI) legislation (also known as right to information or freedom of information) play complementary roles as enablers for the use and sharing of public sector data. Open data policies or laws require public disclosure of data as the general rule (ex ante disclosure) rather than waiting for an individual request for access to information (ex post disclosure).88 In countries that have passed open data policies without any legal foundation, the publication of open government data relies on the cooperation of holders of public sector data to publish their data. By contrast, ATI legislation provides citizens and firms with a legally enforceable right to compel disclosure.89
The Open Data Barometer, which compiles a global measure of how governments are publishing and using open data for accountability, innovation, and social impact, recommends aligning access to information and open data. This alignment would entail amending ATI laws to provide for proactive disclosure of data and mandating that nonpersonal data be open by default, available in machine readable formats, and published under an open license to enable reuse outside government.
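The three elements recommended above (nonpersonal data open by default, machine readable formats, and an open license) can be made concrete with a small Python sketch of a hypothetical release step. The function `prepare_open_release`, the set of field names treated as personal data, the metadata keys, and the choice of the SPDX identifier `CC-BY-4.0` are all illustrative assumptions, not any government's actual publishing pipeline. A real release would rely on the agency's data classification policy and, where relevant, a data protection impact assessment rather than a list of column names.

```python
import csv
import io
import json

# Hypothetical column names treated as personal data for this sketch only.
PERSONAL_FIELDS = {"name", "national_id", "address", "phone"}


def prepare_open_release(records, title, license_id):
    """Serialize nonpersonal records as CSV plus a JSON metadata record.

    Raises ValueError if any column looks like personal data, mirroring the
    idea that only nonpersonal data should be open by default.
    """
    if not records:
        raise ValueError("nothing to publish")
    columns = list(records[0].keys())
    flagged = PERSONAL_FIELDS.intersection(columns)
    if flagged:
        raise ValueError(f"personal fields present: {sorted(flagged)}")

    # Machine readable output: a header row followed by one row per record.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)

    # Metadata travels with the data and states the open license explicitly.
    metadata = {
        "title": title,
        "format": "text/csv",
        "columns": columns,
        "license": license_id,
        "record_count": len(records),
    }
    return buffer.getvalue(), json.dumps(metadata, indent=2)


rows = [
    {"district": "North", "year": 2020, "schools": 42},
    {"district": "South", "year": 2020, "schools": 37},
]
csv_text, metadata_json = prepare_open_release(rows, "Schools by district", "CC-BY-4.0")
print(csv_text)
print(metadata_json)
```

Refusing to publish, rather than silently dropping flagged columns, keeps the decision about restricted data with a person applying the classification policy.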
About one-third of surveyed countries have open data legislation, and more than 70 percent have ATI legislation (figure 6.6). Whereas ATI legislation is widespread in countries across all stages of development, adoption of open licensing regimes is more common in high-income countries.
Establishing open data policies. A country’s public sector data being prepared for publication can be classified on a spectrum from closed to open. According to the Open Knowledge Foundation, for data to be considered open, they must be “freely used, re-used and redistributed by anyone—subject only, at most, to the requirement to attribute and sharealike.”90 Open data are thought to be the most decisive approach governments can use to enhance access to public sector data and enable their reuse by third parties to create value.91 According to the Open Data Institute, key elements of a robust data policy include a clear definition of open data and a general declaration of principles that should guide the publication and reuse of open data.92 Geospatial and transportation data are often prioritized for publication by governments under open data initiatives.93
However, certain categories of data may not be suitable for publication as open data, including personal data and data with national security implications. Care must be taken to ensure that personal data are not published on open data portals without adequate protective measures and a conscious assessment of the associated risks. A data protection impact assessment can be used to evaluate the risks of data processing and ensure that data are adequately safeguarded before being shared.94
As open data systems mature, governments should move from merely promoting access to data to facilitating use of data.
Figure 6.6 Regulations enabling access to and reuse of public intent data are unevenly developed across country income groups [Chart: percentage of countries in each income group (high-income, upper-middle-income, lower-middle-income, low-income), on a scale of 0 to 100, with each element: semantic interoperability mandated for government entities; open data law; data classification policy; mandatory use of data classification for government data; access to information (ATI) legislation; exceptions to the access to information (ATI) right; adoption of open licensing regime by government.] Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866. Data at http://bit.do/WDR2021-Fig-6_6. Note: The figure shows the percentage of countries in each country income group that had adopted good-practice legal and regulatory frameworks to enable access, use, and reuse of public intent data as of 2020.
A key enabling reform is ensuring that data and metadata are “open by default,” available
  • 222. 202 | World Development Report 2021 in a machine readable format and by bulk download or via application programming interfaces (APIs)— computing interfaces and code that allow data to be readily transmitted between one software product or application and another. A survey conducted by the Open Data Barometer found that less than 10 percent of governments that have established open data por- tals include a provision for data to be open by default. Moreover, only half of the datasets published are in a machine readable format, and only one-quarter of datasets have an open license.95 Ensuring unified data classification standards. A key enabler of data reuse is a data classification policy that categorizes types of data according to objective and easily implementable criteria across the differ- ent stages of the data life cycle.96 Data classification policies typically entail categorizing data according to their sensitivity (such as classified, confidential, or business use only). Although data classification policies are found in more than half of the countries surveyed (figure 6.6), their practical effects are lim- ited because in less than one-third of countries is the application of data classification policies mandatory for government database applications or document management systems. Restricted data (data that cannot be published as open data) could possibly be shared bilaterally by agreement (such as with memoranda of understand- ing). Alternatively, innovative mechanisms, includ- ing data pools and data sandboxes, allow data to be accessed and processed in a controlled environment, subject to specific restrictions on data use. For exam- ple, data could be analyzed at a secure data repository (whether virtual or physical) but not taken off-site.97 Allowing access to information. 
ATI legislation is a key complementary enabler for increasing access to public sector data that have not been published on an open data platform. Such legislation provides the legal means for enforcement of public sector disclosure.98 As with open data legislation, ATI legislation can be more or less effectively implemented, depending on how broadly the exemption categories for disclosure are drafted or interpreted and how restrictively data classification policies are applied at the working level. If government entities claim that much of their data are “sensitive” and therefore fall under one of the exceptions for disclosure under ATI statutes, then the usefulness of such legislation for enabling public data access may be limited. This concern is warranted because nearly half the countries included in the Global Data Regulation Survey, across the income spectrum, have placed significant exceptions on individuals’ right to access public information under such laws (figure 6.6). Another limit to the impact of ATI legislation is its scope of application, which is necessarily limited to public sector data. Open data policies, although originating in the public sector, can be voluntarily adopted by the private sector. However, there is no general legal equivalent to ATI requests to compel the disclosure of private sector data. Currently, the majority of private sector data sharing is undertaken on a contractual basis. Certain experts have argued that expanding the scope of laws mandating access to private sector data, consistent with competition law, could be the “next frontier in data governance.”99
Promoting interoperability of data and systems.
For the value of data, including open data, to be fully harnessed, legislation must go beyond promoting access to data and ensure that data can be used more effectively by combining or linking datasets.
Doing so requires provisions governing the interoperability of data (and metadata) and their quality, as well as the modalities under which data should be published. These good-practice characteristics include publishing data in a machine readable format (under FAIR principles that govern the findability, accessibility, interoperability, and reuse of data)100 and ideally via APIs.101 Interoperability of data and systems can be supported by adopting harmonized standards, ideally open standards. Open standards are often determined by sectoral or international standard setting organizations (SSOs) in order to support the interoperability of data and systems within a particular market or sector. They are therefore designed collaboratively based on user needs.102
Public intent data should also be published under an open license and at no charge or at a marginal price to cover the costs of dissemination or reproduction.103 Nearly 48 percent of the surveyed countries have adopted some form of open licensing regime for public intent data. All the high-income countries covered in the survey have done so, compared with about 40 percent of middle-income countries. Other countries, such as Jordan and Mauritius,104 have adopted Creative Commons Attribution 4.0 International Licenses for government datasets released as open data. In Jordan, datasets published by the government are open to all and licensed under a Jordanian Open Government Data License, which allows the use, reuse, and sharing of data, in compatibility with the Creative Commons (CC-BY) license.105 To ensure that data prioritized for publication meet the needs of nongovernmental actors in the private sector and civil society, these decisions should be guided by consultations with multiple stakeholders (see chapter 8).
Enabling access to and the seamless transfer of public sector data between different entities within the public sector and with end users (including individuals and businesses) requires ensuring the interoperability of information technology (IT) systems (including platforms) and data (syntactic and semantic interoperability). As defined by ISO, syntactic interoperability enables “the formats of the exchanged information [to] be understood by the participating systems,” while semantic interoperability enables the “meaning of the data model within the context of a subject area to be understood by the participating systems.”106 Effective data and systems interoperability requires the implementation of several technical protocols and a government interoperability platform.
In addition to technical enablers for interoperability across the whole of government, an enabling legal and regulatory framework is often required. This framework mandates the use of the government’s interoperability platform and data exchange protocols, ensuring that all government entities connect to and use the platform as a vehicle for exchanging data. Very few countries surveyed have adopted a full range of common technical standards (such as the FAIR principles) that enable the interoperability of systems, registries, and databases (figure 6.6). Estonia is among the few countries surveyed that have established standards for open APIs for government to government (G2G), government to business (G2B), and government to consumer (G2C) services; standardized communications protocols for accessing metadata; and developed semantic catalogues for data and metadata.
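The distinction between syntactic and semantic interoperability can be illustrated with a minimal Python sketch. It is not drawn from the Report or from any government platform: the two agencies, their field names, and the `SEMANTIC_CATALOGUE` mapping (a hypothetical stand-in for the semantic catalogues mentioned above) are assumptions, and JSON is assumed as the shared exchange format.

```python
import json

# Hypothetical shared vocabulary (a stand-in for a government semantic catalogue):
# each agency's local field name is mapped to one agreed concept name.
SEMANTIC_CATALOGUE = {
    "tax_agency": {"tin": "person_id", "dob": "birth_date"},
    "health_agency": {"patient_ref": "person_id", "date_of_birth": "birth_date"},
}


def parse_message(raw: str) -> dict:
    """Syntactic interoperability: both systems agree on the format (JSON here),
    so any participant can parse the structure of the message."""
    return json.loads(raw)


def to_shared_concepts(source: str, record: dict) -> dict:
    """Semantic interoperability: translate local field names into the shared
    vocabulary so the receiving system understands what each value means.
    Fields without an agreed concept are dropped in this sketch."""
    mapping = SEMANTIC_CATALOGUE[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


tax_msg = '{"tin": "A123", "dob": "1990-04-01"}'
health_msg = '{"patient_ref": "A123", "date_of_birth": "1990-04-01"}'

tax = to_shared_concepts("tax_agency", parse_message(tax_msg))
health = to_shared_concepts("health_agency", parse_message(health_msg))

# Both records now use the same concept names and can be compared or linked.
print(tax == health)  # True
```

Both messages parse because the format is shared (the syntactic layer); only after each is mapped to common concepts can the two records be compared or linked (the semantic layer). A real platform would also need agreed data types, code lists, and validation, which this sketch omits.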
A distinct advantage of implementing interoperability is the possibility of applying the once-only principle to access to data, which reduces the administrative burden. Citizens and businesses are asked to provide their data only once, thereby requiring public sector entities to internally share and reuse data, with the appropriate safeguards, in the provision of administrative services. Because the risk of data breaches and misuse increases when data are stored in centralized or decentralized but linked repositories, the once-only principle should be complemented with robust legal and technical data protection as well as cybersecurity and cybercrime safeguards, implemented in a citizen-centered and trustworthy manner, with sufficient capacity for implementation (see chapter 8).107 This once-only principle was integrated into the European eGovernment Action Plan (2016–20) for implementation across the European Union,108 with the intention of enabling both domestic and cross-border interoperability. It is also one of the pillars of the 2015 Digital Single Market strategy109 and The Once-Only Principle Project (TOOP),110 which has been piloted under the European Union’s Horizon 2020 framework.111 At the national level, Austria, Belgium, Denmark, Estonia, the Netherlands, Portugal, and Spain have integrated the once-only principle into domestic law for application across government or on a sector basis.112
Enabling reuse of private intent data
The majority of business-to-business (B2B) and business-to-government (B2G) data transactions are governed by bilateral data sharing agreements sourced in contract law.113 Consequently, policy and legal interventions to encourage access to private sector data focus on mitigating the legal and technical challenges that discourage the use and sharing of data by private sector entities.
Governments also maintain a greater margin of control over private sector data transactions involving personal data, which are subject to personal data protection and privacy legislation (or competition and consumer protection laws).
As appreciation of the strategic value of private sector data for enabling evidence-based policy making and promoting innovation and competition in key sectors has grown (see chapter 4), some governments have enacted legislation mandating the sharing of private sector data deemed to be in the public interest and whose voluntary sharing by the private sector would otherwise have been too costly to incentivize.114 Many of the sectors prioritized by such legislation (including utilities and transportation) are considered to be particularly relevant for the development of AI. At the European level, the 2019 EU Open Data Directive115 requires the European Commission to adopt a list of high-value datasets to be provided free of charge, in machine readable formats, via APIs, and, where relevant, via bulk download. These datasets, considered to have “high commercial or societal potential,” include geospatial data, Earth observation data, meteorological data, data about corporate ownership, mobility data, and data from publicly funded research projects.116 At the national level, France’s Law for a Digital Republic (2016) includes provisions mandating that private sector data be made available according to open standards for the creation of “public interest datasets.”117 Another relevant example is the UK Digital Economy Act (2017), which enables researchers to gain access to deidentified data for research purposes.118 At the subnational level, cities such as New York, San Francisco, and São Paulo have also made legal attempts to promote public-private data sharing by requiring certain private sector platforms to share their commercial data for regulatory purposes and to spur the development of smart cities.119
A good-practice regulatory environment for enabling reuse of private sector data encompasses data portability and voluntary licensing of access to essential data (figure 6.7). On average, surveyed countries have adopted less than 20 percent of such good practices for enabling private sector reuse of data, which is less than half the level of uptake found for enablers related to public sector data.
Promoting open licensing.
Licensing regimes, which provide permission to use an otherwise proprietary dataset, can be effective enablers of innovation and competition. They can encourage holders of data-related intellectual property rights to invest in products and markets, knowing that they can control access to licensed products and receive returns on their investments.120 Licensing of intellectual property rights is often voluntary, but in some cases it is implemented on a compulsory basis by regulators or industry participants to avoid market distortions.121 Voluntary licensing on fair, reasonable, and non-discriminatory (FRAND) terms can be a useful mechanism in enabling the development of open standards because the terms allow companies to share technology and data.122 The adoption of such licensing regimes, however, remains rare, especially in low- and middle-income nations (figure 6.7). Korea and the United Kingdom are among the few surveyed countries that have done so. A range of open licenses is available for use with data.
Open data licenses (Open Database Licenses, or ODbLs) provide users with the legal rights to freely share, modify, and use a database without regard to copyright or other intellectual property rights or limitations around data ownership. These license agreements are published by the Open Data Commons, which makes available a set of legal tools and licenses to help users publish, provide, and use open data.123 The ODbL sets out user rights, establishes the correct procedure for attributing credit, and specifies how to modify data to facilitate their sharing and comparability. Another form of open license for data is issued by Creative Commons, an international network devoted to educational access and expanding the range of creative works available for others to build on legally and to share.124 Under the license, any person can use, copy, publish, distribute, transmit, or process the data and make them available to third parties. They can also develop new derivatives of the data by combining them with other data or using them in a product or service, as long as they are attributed to the publisher(s) using a specified statement.125
Requiring data portability.
Voluntary data transactions between parties are greatly facilitated by data portability. The right to personal data portability is designed to facilitate data transfers with the aim of increasing an individual’s choice and control over data about them. More fundamentally, the right to personal data portability is aimed at “rebalancing the relationship” between data generators/providers and data controllers (including data users and platforms) by mitigating the risk of locking in consumer data.
On a more systemic level, this right is intended to foster competition between companies.126 Portability can be broken down into three distinct rights: first, to receive a copy of the data provided by the data generator to the data collector or user (including data consumers and platforms); second, to transmit data to another data collector/user; and third, to request a direct transfer from one data collector/user to another.127 Although data portability rights extend to the raw data provided by the data subject (interpreted to include observed data), they do not appear to extend to inferred data (based on drawing inferences from the raw data provided), which are increasingly being used to develop AI.128 Enabling the direct transfer of (personal or nonpersonal) data to another provider requires that the source and host data formats and systems be fully interoperable through the implementation of APIs.129 At present, interoperability is encouraged, not mandated, by the GDPR130 and EU regulations on the free flow of nonpersonal data.131 Alternatives to direct portability include storing personal data in personal information management systems,132 although their uptake is limited.133
Figure 6.7 Adoption of enablers for sharing private intent data lags those for public intent data across country income groups
Indicators shown (percent of countries, 0–100, for high-income, upper-middle-income, lower-middle-income, and low-income groups): data portability formatting requirement; data portability right for individuals; private sector able to digitally verify or authenticate ID; mandate for voluntary licensing of access to essential data on FRAND basis.
Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866. Data at http://bit.do/WDR2021-Fig-6_7.
Note: The figure shows the percentage of countries in each country income group that had adopted good-practice legal and regulatory frameworks to enable access to, sharing, and reuse of private intent data as of 2020. FRAND = fair, reasonable, and non-discriminatory; ID = identification.
In the European Union, the right to personal data portability is mandated by Article 20 of the GDPR and considered one of its most significant innovations.134 Only slightly more than 10 percent of countries surveyed have enacted data portability rights for individuals. By contrast, the portability of nonpersonal data is not mandated, but only encouraged as a means of promoting competition and enabling the development of competitive sectors using AI and big data.135 Individuals’ ability to enforce their right to data portability can in practice be supported by requiring data to be transferred in a standard machine readable format. The surveyed countries that grant individuals data portability rights all include formatting requirements to support data portability (figure 6.7).
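What a standard machine readable format buys in practice can be sketched in Python for the first two portability rights: receiving a copy and handing it to another provider. This is a minimal illustration, not taken from the Report: JSON is assumed as the common format, and the record layout, field names, provider labels, and `schema_version` field are all hypothetical. Following the reading described above, only provided (raw or observed) data are exported, while inferred data stay behind.

```python
import json

# Hypothetical user record held by "Provider A"; all keys are illustrative.
# "provided" holds data the user supplied or generated through use (raw/observed);
# "inferred" holds profiling results the provider derived from those data.
record_at_provider_a = {
    "user_id": "u-42",
    "provided": {
        "name": "Ana",
        "email": "ana@example.org",
        "playlists": ["Morning", "Focus"],
    },
    "inferred": {"predicted_churn_risk": 0.37},
}


def export_portable_copy(record: dict) -> str:
    """Right to receive a copy: serialize the provided data in a structured,
    commonly used, machine readable format (JSON). Inferred data are left out."""
    payload = {"schema_version": "1.0", "data": record["provided"]}
    return json.dumps(payload, sort_keys=True)


def import_at_provider_b(exported: str) -> dict:
    """Receiving side: because the format is agreed in advance, Provider B can
    parse the copy without a bespoke conversion step."""
    payload = json.loads(exported)
    if payload.get("schema_version") != "1.0":
        raise ValueError("unsupported schema version")
    return payload["data"]


exported = export_portable_copy(record_at_provider_a)
imported = import_at_provider_b(exported)
print(imported["playlists"])  # ['Morning', 'Focus']
print("predicted_churn_risk" in exported)  # False
```

The version check stands in for the kind of format and technical requirements a regulator might specify. A direct provider-to-provider transfer (the third right) would additionally require interoperable APIs on both sides, which this sketch does not model.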
For example, in the Philippines the Data Privacy Act of 2012 grants data portability rights to data subjects and empowers the National Privacy Commission to specify the format and technical requirements to enable data portability.136 Using a somewhat different approach, Australia created a specific “consumer data right” in August 2019137 to enable data portability through its Competition and Consumer Act 2010.138 The act follows a phased sectoral approach to implementation,139 which enables common standards to be adapted to sector-specific needs while being made publicly available on the open-source platform GitHub.140
Despite these advances, technical limitations and legal uncertainties challenge effective enforcement of data portability rights. At the same time, cybersecurity risks, such as improper access or identity theft, could increase if portability is not accompanied by robust security measures. In the absence of specific laws or regulations mandating interoperability and portability standards, some private sector actors have developed their own mechanisms. One example is the collaborative Data Transfer Project.141 Data format standardization is a key component of enabling data portability in practice: the right to data portability cannot be exercised if data are not downloaded in a format common to other service providers. Although the project’s source code and APIs are open source, the fact that it is led by Big Tech companies raises broader questions about the first mover advantage these firms have in setting de facto standards and markets for B2B data sharing between platforms.
Using APIs to enable effective interoperability and portability.
APIs can be used either to enable data sharing (such as through open APIs) and portability or to limit access to data, depending on how they are designed.142 Although APIs are technical in nature, technolegal frameworks can be developed to determine access and control rules for third parties.
These rules can include controls to ensure the syntactic and semantic portability of data, as well as controls on the identity of API users, the type and amount of data transacted, and the identifiability of data.143 APIs designed with access and control mechanisms that enable the selection of a limited amount of data can provide users with more flexibility than bulk downloads.144 That said, because APIs can expose data to unauthorized access during data transfers, they may prove challenging to use effectively in lower-middle-income countries that do not have sufficient technical capacity to respond to cybersecurity risks.
Fully leveraging APIs to enable effective interoperability and portability requires ensuring that they are developed according to common standards. These standards can be developed through regulation or by industry, based on a multistakeholder approach. Examples of successful initiatives include the Berlin Group, which has developed common API standards for the European banking industry.145 Cases from the financial services sector (such as the UK Open Banking Initiative and implementation of the European Union’s Second Payment Services Directive) may provide helpful lessons for the effective implementation of these mechanisms as enablers for data reuse and sharing.
Forging data partnerships.
An alternative modality for private sector data sharing is through data partnerships entered into on mutually agreed contractual terms, either between private sector entities or between government and businesses (data public-private partnerships, or PPPs). For example, the traffic application Waze has partnered with more than 1,000 cities and other local public sector entities through its Connected Citizens Program146 to exchange traffic data and derive insights to inform mobility projects, manage traffic and congestion, support emergency response, and share data with citizens through a cloud-based platform.147
Data partnerships pose several challenges. Partnerships between large companies and small and medium enterprises may raise concerns about fairness because of asymmetries in information or market power. Partnerships between public and private entities may lead to conflicts of interest because of the government’s dual role as data user and data regulator.148 In either case, partnerships may create uncertainties around the allocation of the legal liability associated with the use of the data, as well as potential compliance costs due to lack of harmonization of legal frameworks applicable to both parties.149 Some of these risks can be mitigated by developing contract guidelines or standard contractual terms to harmonize provisions and rectify information asymmetries. Some public sector initiatives have attempted to develop such standard terms to promote data sharing.150
Not all data sharing partnerships are designed for profit. Some businesses provide their data and digital tools at no charge to governments, academia, and nongovernmental organizations for “social good.” Data philanthropy,151 particularly in the area of big data, has enabled the World Bank,152 together with UN agencies, namely the World Health Organization (WHO), the United Nations Development Programme (UNDP), the World Food Programme (WFP), and the United Nations Children’s Fund (UNICEF), as well as others, to leverage companies’ data stock and digital capabilities to fully exploit the value of data for development, while benefiting the private sector through positive externalities.
Limiting intermediary liability.
Among the great enablers of the flow of data across the internet are rules limiting the liability of intermediaries for content that flows over their platforms. The intermediary liability concept has roots in US telecommunications law dating back to the 1930s,153 and it has been informed by subsequent US case law.154 Crucially, this exemption from liability was extended to “interactive computer services” (internet service providers) in Section 230 of the 1996 amendments to the Communications Act of 1934155 and in the Digital Millennium Copyright Act.156 The advent of data platform business models has led to growing requests from users for the “take-down” of their personal information and has triggered an ongoing debate between privacy advocates and Big Tech about responsibility for fundamental issues of freedom of expression and transparency of knowledge. Liability exemptions have been criticized as harboring defamatory conduct, encouraging harassment online, and undermining attempts by law enforcement to attribute conduct to specific individuals.157 Nevertheless, freedom of expression advocates continue to support shielding intermediaries from liability.158 The rapidly changing landscape is creating significant regulatory uncertainty for Big Tech firms (see the overview and chapter 1 for a discussion of the broader policy considerations relating to content moderation and mis/disinformation).
Recommendations for crafting a holistic legal framework
Any new social contract on data must rest on the foundation of a comprehensive legal and regulatory framework that helps build trust between stakeholders, integrating both safeguards and enablers. As the results of the Global Data Regulation Survey suggest, the development and robustness of different aspects of the legal and regulatory framework are quite uneven, with relatively consistent patterns across country income groups (table 6.1). These divergences may be exacerbated by differences in implementation.
E-commerce legislation is the only area in which all country income groups are doing comparatively well. Development is at an intermediate level in areas such as enabling reuse of public intent data, safeguarding both personal and nonpersonal data, protecting cybersecurity, and combating cybercrime. By far the weakest area of performance of the surveyed countries is enablers for private intent data. Overall, the average scores of high-income countries are not very high in absolute terms, warranting an advanced (green) classification in table 6.1 in only one case. Moreover, the score differential between high- and low-income countries is relatively small (rarely more than 30 points). Both findings indicate the novel challenges of developing a sound data governance legal framework and the significant progress all countries need to make.
To fill the many remaining gaps in the legal framework and further strengthen existing provisions, this Report offers several recommendations. Overall, the underlying legal framework needs to be approached holistically. Although different elements of the legal framework can be viewed in a modular fashion, the elaboration of particular laws needs to touch on all critical aspects. The crafting of such a coherent legal framework should take into account both evolving best practices and local conditions based on robust stakeholder consultation. There is no one-size-fits-all solution.
Recommendations for strengthening safeguards
Adopt and implement personal data protection legislation.
One of the biggest contributors to the trust framework is the adoption of personal data protection legislation following a rights-based approach. For countries that lack data protection legislation or enforcement agencies, the existing consumer protection legislation and competition law can be leveraged to remedy certain manifestations of the misuse of personal data. Although such legislation and laws may be helpful, their scope of application is limited, making them complements to, not substitutes for, personal data protection legislation.
Introduce more meaningful models of consent.
Traditional approaches to consent, developed in an analog age, are an increasingly uncomfortable fit in the modern digital age. Furthermore, in lower-income countries, where literacy challenges continue to affect a significant share of the population, reliance on “consent,” as traditionally applied, will continue to be problematic as more people access the internet and permit their data to be used and reused. To ensure that consent remains a meaningful legal basis for using data, new models should be seriously considered, including those that shift responsibility for data protection from individuals to the collectors and users of the data.
Expand protection to mixed data and group privacy.
New data uses, fueled by innovative analytical techniques and the growth of algorithm-based technologies such as big data and the Internet of Things, are blurring the distinction between personal and nonpersonal data. At present, only personal data fall within the scope of most current data protection laws, while anonymized personal data are considered nonpersonal data.
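Why anonymized data can slip back into the personal-data category can be shown with a small linkage sketch in Python. It is illustrative only: both datasets are invented toy records, and the choice of postcode, birth year, and sex as quasi-identifiers is an assumption made for the example, not a claim about any real dataset.

```python
from collections import defaultdict

# Toy, entirely hypothetical records. Names were removed from this dataset,
# so it might be treated as "anonymized" (and hence nonpersonal) data.
anonymized_health = [
    {"postcode": "10115", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "10115", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "20095", "birth_year": 1984, "sex": "F", "diagnosis": "hypertension"},
]

# A second, separately available dataset that still carries names.
public_register = [
    {"name": "Person A", "postcode": "10115", "birth_year": 1984, "sex": "F"},
    {"name": "Person B", "postcode": "10115", "birth_year": 1990, "sex": "M"},
    {"name": "Person C", "postcode": "20095", "birth_year": 1984, "sex": "F"},
    {"name": "Person D", "postcode": "20095", "birth_year": 1984, "sex": "F"},
]

# Assumed quasi-identifiers: attributes that are not names but, combined,
# can single a person out.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(row: dict) -> tuple:
    """Combination of quasi-identifier values for one record."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link(anonymized: list, identified: list) -> dict:
    """Join the two datasets on quasi-identifiers. Where exactly one named
    person shares a combination, the 'anonymized' record is reidentified."""
    names_by_key = defaultdict(list)
    for row in identified:
        names_by_key[key(row)].append(row["name"])
    reidentified = {}
    for row in anonymized:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:
            reidentified[candidates[0]] = row["diagnosis"]
    return reidentified


print(link(anonymized_health, public_register))
# {'Person A': 'asthma', 'Person B': 'diabetes'}
```

Only the third record escapes reidentification, because two named people share its combination of attributes. Coarsening or suppressing such attributes (the idea behind k-anonymity) reduces this risk but does not eliminate it, which is why linkage of seemingly nonpersonal data remains a policy concern.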
In view of the ease of reidentifying and linking datasets, which opens the door to deriving sensitive or discriminatory insights from the processing of nonpersonal data, policy makers should consider expanding the scope of data protection legislation to protect such mixed data. A related issue is that current provisions for personal data protection, which focus on the individual, do not preclude the identification and potential misuse of data attributes pertaining to homogeneous groups (including those defined by ethnicity, race, religion, or sexual orientation). Group-level protections are particularly important in complex or fragile sociopolitical environments or emergency contexts because of the increased risk of misuse of such data for targeting or surveillance.
Adopt data protection by design and default.
Privacy-enhancing technologies are important complements to data protection legislation, allowing privacy to be embedded in data-driven products and services right from the design phase. These technologies can play a valuable role in safeguarding fundamental data rights in contexts in which weak institutional capacity diminishes the legal enforceability of those rights. However, for technical mechanisms to have teeth, they must be underpinned by a robust legal framework that creates the rights and limits on use that privacy-enhancing technologies reinforce.
Because of the utility of data protection and privacy by design, policy makers should consider building more of these requirements into their regulatory frameworks, while maintaining technological neutrality.
Table 6.1 Certain elements of the regulatory framework are much better developed than others, but performance is generally low
Average score, by country group
                       Safeguards                               Enablers
Country group          Cybersecurity   Personal   Nonpersonal   E-commerce and   Public        Private
                       and cybercrime  data       data          e-transactions   intent data   intent data
High-income            73              59         43            86               69            30
Upper-middle-income    57              46         29            74               62            20
Lower-middle-income    55              43         38            72               44            15
Low-income             39              31         47            59               28            3
Global                 56              44         38            73               50            17
Source: WDR 2021 team, based on World Bank, Global Data Regulation Survey, https://microdata.worldbank.org/index.php/catalog/3866.
Note: The table shows the average score for good-practice data governance by theme across country income groups as of 2020. Levels of the regulatory framework: advanced (scores of 75–100); moderate (scores of 50–75); evolving (scores of 25–50); basic (scores below 25).
Prioritize cybersecurity measures.
Protecting individuals’ and groups’ rights in data is one thing; protecting the infrastructure and systems over which those data flow (cybersecurity) is another. From a legal perspective, these protections are gained by adopting cybercrime legislation that balances security concerns with other fundamental rights. Too few countries have adopted serious legal provisions to ensure cybersecurity, leading to mounting social and economic risks. This gap should be addressed as a matter of urgency.
Recommendations for strengthening enablers
Build a robust yet flexible foundation for electronic transactions.
Digital transactions should be granted legal equivalence to the analog variety, with limited exceptions. Robust authentication should be technology neutral to ensure a level playing field for a wide variety of approaches to authenticating transactions and related trust services.
Make data open by default and easy to access.
Countries should strengthen open data policies by calling for open-by-default approaches to public sector data through legislation across the whole of government. Datasets to be published should be prioritized using input from end users. End users should not be charged (or should pay a limited price) for public intent data.
Consistently apply reasonable norms for data classification.
Implementation of open data policies or laws requires the consistent application of clear, reasonable data classification policies.
Adopt open standards and sharing-friendly licenses.
Policy makers should strengthen open access to public intent data, including adoption of open standards and sharing-friendly licenses.
Strengthen access to information provisions.
Access to information legislation should be expanded to cover the proactive and transparent disclosure of nonsensitive data.
Exceptions to disclosure will be necessary and should be proportionate. ATI laws should provide for regular public disclosure of ATI requests received and rejected, and justification for any rejection, ideally on an open platform.

Promote the interoperability of data and systems. Improving the use and sharing of data will rely on developing and applying unified technical standards to support the interoperability of data and systems. Interoperability of systems entails adoption of common technical protocols and a government interoperability platform. Data can be made interoperable by ensuring that they are classified and processed according to common standards and published in a machine-readable format.

Support data portability. The right to data portability should be strengthened by requiring data to be in a structured, commonly used, and machine-readable format. Interoperable data and systems can help achieve continuous data portability, where proportionate and technically feasible. As an alternative or complement to direct portability, personal information management systems can help users receive and manage their data, but their uptake is currently limited. The enforcement of data portability rights depends on adequate market competition, enabling users to switch providers. For data portability to be meaningful, there is also a need to address data subjects’ limited understanding of these rights, as well as the implementation challenges faced by micro, small, and medium enterprises.

Promote sharing of private intent data. Governments can incentivize the sharing of private sector data by promoting data sharing agreements and enhancing intellectual property rights. Together, these measures can help reduce incentives for data hoarding and enhance the reusability of data.
In the case of public interest data, and particularly in emergency situations, governments should increasingly consider mandating private sector data sharing, subject to suitable conditions and safeguards.

A maturity model for strengthening the legal and regulatory framework

The urgency of applying these measures will depend on how far a country’s legal and regulatory framework for data has evolved. Countries should develop sound, comprehensive policies based on best practices adapted to their circumstances. Building on this foundation, countries should then enact robust legislation buttressed by multistakeholder consultation, followed by clear, time-bound implementation procedures to ensure accountability. The identified measures can tentatively be mapped onto the maturity model framework summarized in table 6.2. Although certain safeguarding and enabling elements are considered foundational, the ability to build an effective legal regime for trusted data use depends on ensuring that the overall framework is both internally coherent and aligned with the country’s policy orientation, data culture, and social contract on data.
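The score bands in the note to Table 6.1 can be applied mechanically to see where each country group stands on each theme. The Python sketch below is illustrative only. It encodes the table's average scores and assumes that a score sitting exactly on a shared boundary (25, 50, or 75) belongs to the higher band, because the note's ranges overlap at those points. The helper names `level` and `levels_by_group` are hypothetical and do not belong to any World Bank tool.

```python
"""Illustrative sketch: classify Table 6.1 average scores into the note's bands.

Assumption (not stated in the source): a score on a shared boundary
(25, 50, or 75) is assigned to the higher band.
"""

# Themes in Table 6.1 column order: three safeguards, then three enablers.
THEMES = (
    "Cybersecurity and cybercrime",
    "Personal data",
    "Nonpersonal data",
    "E-commerce and e-transactions",
    "Public intent data",
    "Private intent data",
)

# Average scores by country group, copied from Table 6.1 (as of 2020).
SCORES = {
    "High-income": (73, 59, 43, 86, 69, 30),
    "Upper-middle-income": (57, 46, 29, 74, 62, 20),
    "Lower-middle-income": (55, 43, 38, 72, 44, 15),
    "Low-income": (39, 31, 47, 59, 28, 3),
    "Global": (56, 44, 38, 73, 50, 17),
}


def level(score: float) -> str:
    """Return the band label for a score on the 0-100 scale (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 75:
        return "advanced"
    if score >= 50:
        return "moderate"
    if score >= 25:
        return "evolving"
    return "basic"


def levels_by_group(scores: dict = SCORES) -> dict:
    """Map each country group to a {theme: band} dictionary."""
    return {
        group: dict(zip(THEMES, (level(s) for s in row)))
        for group, row in scores.items()
    }


if __name__ == "__main__":
    for group, bands in levels_by_group().items():
        advanced = [theme for theme, band in bands.items() if band == "advanced"]
        print(f"{group}: advanced in {advanced or 'no theme'}")
```

Under these assumptions, only one score reaches the advanced band: the high-income group's e-commerce and e-transactions score (86). Private intent data sits in the basic band for every group except high-income countries, whose score of 30 is classified as evolving.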
Table 6.2 Recommendations organized according to a maturity model based on data safeguards and enablers

Stage of country’s data system: Establishing fundamentals
Safeguards: Conduct a baseline needs assessment. Develop a comprehensive policy framework based on best practices that does the following:
• Safeguards personal, nonpersonal, and evolving categories of data and promotes greater equity around data
• Enhances the security of systems and infrastructure that protect against misuse of data
• Expands individuals’ agency and control over their personal data
• Promotes certainty and predictability, integrating the fundamental safeguards discussed in this chapter such as data protection and cybersecurity.
Enablers: Conduct a baseline needs assessment. Develop a comprehensive policy framework based on best practices that enables the use and sharing of data for development purposes, ensuring access, openness, interoperability, portability, predictability, and transparency, while integrating the fundamental enablers discussed in this chapter, such as electronic transactions.

Stage of country’s data system: Initiating data flows
Safeguards: Elaborate a legal framework that embodies policy prerogatives that include:
• Personal data protection
• Promotion of cybersecurity and combating of cybercrime
• Regulation of competition
• Provisions in the legal framework to provide for establishment of the relevant enforcement institutions.
Enablers: Elaborate a legal framework that embodies policy prerogatives that include:
• Legal recognition of e-transactions
• Access to information
• Intellectual property rights for nonpersonal data
• Openness of public intent data, including the use of licenses that encourage data sharing
• Data classification principles.
Stage of country’s data system: Optimizing the system
Safeguards: Promote awareness of safeguards:
• Domestically, through adoption of data protection by design and default, together with associated cybersecurity measures
• Internationally, through cross-border interoperability of data protection standards
• Address more complex issues such as mixed data and group rights
• Ensure that the capacity of the institutions responsible for overseeing these activities is sufficient
• Establish metrics to monitor and evaluate the implementation and enforcement of these policies and laws.
Enablers: Consider issues such as data portability and increasing incentives around sharing of private intent data. Ensure that the capacity of the institutions responsible for overseeing these activities is sufficient. Establish metrics to monitor and evaluate the implementation of these policies, laws, and institutions.

Source: WDR 2021 team.

Notes
1. Gellman (2013).
2. Confessore (2018).
3. A framework for data protection existed in the EU prior to the GDPR: the 1995 Data Protection Directive. Because a directive requires incorporation into domestic law, several European countries adopted their own data protection regimes, in some cases with even more stringent protections (such as Germany). However, adoption of the GDPR is a significant evolution in three key dimensions. First, as a regulation that applies directly to all EU members, it has harmonized data protection law across the EU. Second, it has supported enforcement through the introduction of significant fines. And, third, it has applied extraterritorially to cross-border data transactions involving data subjects in the EU.
4. Attorney General’s Office, California Department of Justice, California Consumer Privacy Act (CCPA), https://oag.ca.gov/privacy/ccpa.
5. China’s revision of its civil code will enter into force in 2021. Articles 1032–1039 grant certain rights to individuals. These provisions may be superseded by the expected introduction of a new law on protection of personal information. See, for example, Dong (2020) and Fang, Bigg, and Zhang (2020). China also published for consultation a draft law on personal data protection that in many respects mirrors provisions of the GDPR (Zhang and Yin 2020).
6. See Fisher and Streinz (2021) and Lessig (1999).
7. Personally identifiable information refers to information that can be used to distinguish or trace the identity of a data subject. Examples of such information are the subject’s name, national identity number, or biometric records, alone or when combined with other personal or identifying information that is linked or linkable to a specific data subject, such as date and place of birth or mother’s maiden name.
8. PMC (2019).
9. A regulatory impact analysis (RIA), also known as a regulatory impact assessment, is defined by the Organisation for Economic Co-operation and Development (OECD) as “a systemic approach to critically assessing the positive and negative effects of proposed and existing regulations and nonregulatory alternatives. As employed in OECD countries it encompasses a range of methods. It is an important element of an evidence-based approach to policy making” (OECD, “Regulatory Impact Analysis,” https://www.oecd.org/regreform/regulatory-policy/ria.htm). According to data from the World Bank’s Global Indicators of Regulatory Governance (GIRG), 86 out of 186 countries surveyed carry out RIAs on either a regular or a semiregular basis (World Bank, Global Indicators of Regulatory Governance [dashboard], https://rulemaking.worldbank.org/). However, although most high-income countries carry out RIAs (45 out of 59, or 76 percent), only 12 percent of low- and middle-income countries do so. Moreover, even though all OECD high-income countries except for Italy and Chile have developed specific RIA guidelines, only three countries in Sub-Saharan Africa (Kenya, South Africa, and Uganda) have set requirements. For more details, see Deighton-Smith, Erbacci, and Kauffmann (2016); ITU (2014); World Bank (2018); World Bank, “Key Findings,” https://rulemaking.worldbank.org/en/key-findings.
10. Chen (2021).
To access the World Bank’s Global Data Regulation Survey and its results, see https://microdata.worldbank.org/index.php/catalog/3866.
11. COE (2018).
12. See, generally, page 70 of World Bank and United Nations (2017).
13. World Bank and United Nations (2017).
14. World Bank and United Nations (2017).
15. In the absence of dual criminality, if an activity is criminal in jurisdiction X but is not in jurisdiction Y, then the authorities in X could not extradite a criminal in Y.
16. Treaty Office, Directorate of Legal Advice and Public International Law, Council of Europe, “Details of Treaty No. 185: Convention on Cybercrime,” https://www.coe.int/en/web/conventions/full-list/-/conventions/treaty/185.
17. Cybercrime, Council of Europe, “Parties/Observers to the Budapest Convention and Observer Organisations to the T-CY,” https://www.coe.int/en/web/cybercrime/parties-observers.
18. U.S. Clarifying Lawful Overseas Use of Data Act (“CLOUD” Act), 18 U.S. Code §2523, https://www.justice.gov/dag/page/file/1152896/download.
19. LOC (2018).
20. Council of the European Union (2019).
21. EC (2018a).
22. EC (2014).
23. Sources differ on the number of data protection laws enacted around the world: 128 countries, according to the United Nations Conference on Trade and Development, Data Protection and Privacy Legislation Worldwide (dashboard), https://unctad.org/page/data-protection-and-privacy-legislation-worldwide; 116 countries, according to DLA Piper (2020); and 142 countries (as of 2019), according to Greenleaf and Cottier (2020), as referenced by Anderson and Renieris (2020).
24. Consent is not the only basis for data processing, but it remains a centerpiece because of its historical legacy. Even in the GDPR, consent is one among many grounds for legitimate data processing. See, for example, GDPR Article 6.1 (EU 2018a).
25. See Recital 43 of the GDPR (EU 2018c).
26. For example, the US Fair Credit Reporting Act bans certain types of data from being used to determine an individual’s creditworthiness (FTC 2018).
27. Section 35 of the Personal Data Protection Bill currently under discussion in India states that, in the event of an imminent threat to the sovereignty or integrity of the country or security of the state, the government has the power to exempt public sector entities from application of the bill entirely (Parliament of India 2019).
28. Sections 13(1) and (2) of India’s Personal Data Protection Bill (2018) state that, until and unless such a threat occurs, personal data may be processed without procuring consent from the user in the following cases: “(1) Personal data may be processed if such processing is necessary for any function of Parliament or any State Legislature. (2) Personal data may be processed if such processing is necessary for the exercise of any function of the State authorised by law for: (a) the provision of any service or benefit to the data principal from the State; or (b) the issuance of any certification, license or permit for any action or activity of the data principal by the State” (Personal Data Protection Bill, 2018, https://www.meity.gov.in/writereaddata/files/Personal_Data_Protection_Bill,2018.pdf).
29. See Article 8 of the European Convention on Human Rights on the right to respect for private and family life (ECHR 2010). The European Court of Human Rights (ECHR) has interpreted limitations to the right as subject to a “legitimate aim” necessary to fulfill a “pressing social need” and “proportionate to the legitimate aim pursued” (ECHR 2020, 12). These due process restrictions apply even in emergency situations; see Article 15 (ECHR 2010). Such situations could include war or pandemic.
The European Data Protection Board (EDPB 2018), civil society organizations such as the Electronic Frontier Foundation (see, for example, Gelman 1998), and Article 19 of the European Convention have enshrined these principles in data protection rules and guidelines. See Electronic Frontier Foundation, “13 International Principles on the Application of Human Rights to Communication Surveillance,” https://www.eff.org/files/2014/01/05/13p-onepagerfinal.pdf.
30. Ben-Avie and Tiwari (2019).
31. According to the World Economic Forum, “volunteered data” are data that are “created and explicitly shared by individuals, e.g., social network profiles”; “observed data” are “captured by recording the actions of individuals, e.g., location data when using cell phones”; and “inferred data” are “data about individuals based on an analysis of volunteered or observed information, e.g., credit scores” (WEF 2011).
32. Austin and Lie (2019). See also Recital 26 of the GDPR: “The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable” (EU 2018b). Pseudonymized data, however, still count as personally identifiable information.
33. Austin and Lie (2019); de Montjoye et al. (2013). Additional research argues that only three data points are needed for reidentification in most cases (Sweeney 2000). There is also much recent research on the limits and durability of these deidentifying technologies (Lubarsky 2017).
34. Austin and Lie (2019). For example, scientific research may require certain personally identifiable information characteristics (such as age and gender) for accuracy.
35. Current international guidelines for data collection and processing, such as the 2013 OECD guidelines and the United Nations Data Privacy, Ethics and Protection Principles (UNSDG 2017), and leading legal frameworks such as the European Union’s General Data Protection Regulation (EU 2018f), focus on protecting personal data and personally identifiable information. For a broader discussion, see Taylor, Floridi, and van der Sloot (2017).
36. Krämer, Senellart, and de Streel (2020).
37. A purpose-driven approach to data protection should involve determining, as threshold questions, what data should be collected and what data should be shared. Anonos, “Schrems II Webinar Summary: Lawful Data Transfers,” https://www.schremsii.com/faqs-and-summary-edps-noyb-webinar.
38. In other words, focus on the ways in which the data will and may be used and what their potential impacts may be. For this reason, tools such as data protection impact assessments (and, when appropriate, human rights impact assessments, such as when high-risk, data-driven technologies are being used) can help identify risks that must be mitigated through the appropriate legal, technical, and organizational means.
39. Cavoukian (2011).
40. For example, homomorphic encryption allows analysis of encrypted data. Similar in purpose, federated learning techniques allow data to be processed and analyzed without having to send raw data to a central server (Homomorphic Encryption Standardization, “Homomorphic Encryption,” https://homomorphicencryption.org/; Potey, Dhote, and Sharma 2016). That said, encryption is not a silver bullet for compliance. Encryption may be an effective safeguard while data are in storage or in transit, but it may not provide sufficient protection during processing if data must be decrypted before computation.
41. Newer techniques that have emerged in response to challenges around deidentification include K-anonymity (which works by aggregating data attributes) and differential privacy (which works by introducing random noise into datasets); see Austin and Lie (2019); Dwork (2006); Sweeney (2000).
42. The European Commission’s guidance on privacy by design is clear that these techniques should not be a substitute for robust legal protections: “The term ‘Privacy by Design’ means nothing more than ‘data protection through technology design.’ Behind this is the thought that data protection in data processing procedures is best adhered to when it is already integrated in the technology when created. . . . The text of the law leads one to conclude that often several protective measures must be used with one another to satisfy statutory requirements. In practice, this consideration is already performed in an early development phase when setting technology decisions. Recognized certification can serve as an indicator to authorities that the persons responsible have complied with the statutory requirements of ‘Privacy by Design’” (Intersoft Consulting, “GDPR: Privacy by Design,” https://gdpr-info.eu/issues/privacy-by-design/). Also see DSGVO-Portal, “Recital 78 GDPR | General Data Protection Regulation,” https://www.dsgvo-portal.de/gdpr_recital_78.php.
43. The Mauritius Declaration on the Internet of Things states: “Data processing starts from the moment the data are collected. All protective measures should be in place from the outset. We encourage the development of technologies that facilitate new ways to incorporate data protection and consumer privacy from the outset. Privacy by design and default should no longer be regarded as something peculiar. They should become a key selling point of innovative technologies” (EDPS 2014, 2).
44. The International Organization for Standardization has created a technical committee for a new ISO standard on Consumer Protection: Privacy by Design for Consumer Goods and Services (ISO 2018).
45. Cavoukian (2010).
46. According to the European Commission: “Companies/organisations are encouraged to implement technical and organisational measures, at the earliest stages of the design of the processing operations, in such a way that safeguards privacy and data protection principles right from the start (‘data protection by design’). By default, companies/organisations should ensure that personal data is processed with the highest privacy protection (for example only the data necessary should be processed, short storage period, limited accessibility) so that by default personal data isn’t made accessible to an indefinite number of persons (‘data protection by default’).” European Commission, “What Does Data Protection ‘by Design’ and ‘by Default’ Mean?” https://ec.europa.eu/info/law/law-topic/data-protection/reform/rules-business-and-organisations/obligations/what-does-data-protection-design-and-default-mean_en.
47. OVIC (2020).
48. See PCPD (2012) for materials on the data protection framework in China.
49. ICO (2018).
50. de Montjoye et al. (2013). Additional research argues that only three data points are needed for reidentification in most cases (Sweeney 2000).
51. These new techniques include statistical approaches such as K-anonymity, which aggregates data attributes (Sweeney 2002); differential privacy, which introduces random noise (Dwork 2006); and encryption techniques such as homomorphic encryption, which conduct analysis on encrypted data (Potey, Dhote, and Sharma 2016).
52. Austin and Lie (2019).
53. See Hasselbalch and Tranberg (2016). An explanation of a personal data store is offered in Mydex, “What Is a Personal Data Store?” https://pds.mydex.org/what-personal-data-store-0.
54. ENISA (2014). Also see the recommendation by the European Union Agency for Cybersecurity (ENISA) that it may be necessary to overlay several privacy by design or pseudonymization techniques in order to meet the GDPR’s threshold (ENISA 2019).
55. The OECD Recommendation on Artificial Intelligence “identifies five complementary values-based principles for the responsible stewardship of trustworthy AI” (OECD 2019c). In particular, according to principle 2 on human-centered values and fairness, “AI actors should respect the rule of law, human rights, and democratic values throughout the AI system life cycle. These include freedom, dignity and autonomy, privacy and data protection, nondiscrimination and equality, diversity, fairness, social justice, and internationally recognized labor rights.” These actors should also “implement mechanisms and safeguards, such as capacity for human determination, that are appropriate to the context.” According to principle 3 on transparency and explainability, “AI actors should commit to transparency and responsible disclosure regarding AI systems.” One of the aims should be “to enable those adversely affected by an AI system to challenge [the] outcome based on plain and easy-to-understand information.” As of May 2019, 44 countries had adhered to the OECD Recommendation and the five principles (OECD 2019c).
56. See Loi N° 2013-450 relative à la protection des données à caractère personnel [Law 2013-450 on the protection of personal data], Journal Officiel de la Republique de Côte d’Ivoire, August 8, 2013, 474–82.
57. Controversies around the use of facial recognition and other AI-based technologies for law enforcement have been in the public eye in the United Kingdom since 2019. That year, the UK Information Commissioner’s Office launched an investigation into the use of facial recognition technology in King’s Cross in London, on the grounds that it might raise data protection concerns. Subsequently, the UK High Court ruled in favor of the use of facial recognition by the South Wales Police, after the claimant argued that its use would be a violation of privacy; the case was the first legal challenge in the world to the use of facial recognition by police. See ICO (2019); Nilsson (2019); Smith (2016).
58. The 2016 study conducted by the Human Rights Data Analysis Group using 2010 and 2011 data from the Oakland police department and other sources compared a mapping of drug use based on survey data from the victims of crime with another based on algorithmic analysis of police arrests.
The study showed that biased source data could reinforce and potentially amplify racial bias in law enforcement practices (Lum 2016). Data on arrests showed that African-American neighborhoods have on average 200 times more drug arrests than other areas in Oakland (NIST 2020; Smith 2016).
59. Hill (2020).
60. Noble (2018).
61. Dastin (2018).
62. Organisation for Economic Co-operation and Development, http://www.oecd.org/going-digital/ai/principles/; G-20 (Japan-led), https://www.meti.go.jp/press/2019/06/20190610010/20190610010-1.pdf.
63. HLCM (2018).
64. Cision (2020); City of Amsterdam (2020); City of Helsinki (2020).
65. DCMS (2019); Stats NZ (2019). For a subnational example, see Nantes City’s Metropolitan Charter on Data (Ville de Nantes 2019). At a national level, France’s Etalab has developed a map of algorithmic systems in use across public sector entities in France and is providing ministries, departments, and agencies with guidance on their reporting and other accountability requirements (Etalab 2020a, 2020b).
66. See Canada’s responsible use of AI in government programs, including Guiding Principles, lists of certified providers of AI services, and its Algorithmic Impact Assessment (TBS 2020).
67. Henley and Booth (2020).
68. The mandate of the US Federal Trade Commission (FTC) includes hearing and adjudicating cases involving unfair competition or unfair or deceptive acts under Section 5 of the FTC Act (see Federal Trade Commission, Federal Trade Commission Act, https://www.ftc.gov/enforcement/statutes/federal-trade-commission-act). According to the FTC, “when companies tell consumers they will safeguard their personal information, the FTC can and does take law enforcement action to make sure that companies live up [to] these promises. The FTC has brought legal actions against organizations that have violated consumers’ privacy rights, or misled them by failing to maintain security for sensitive consumer information, or caused substantial consumer injury. In many of these cases, the FTC has charged the defendants with violating Section 5 of the FTC Act, which bars unfair and deceptive acts and practices in or affecting commerce. In addition to the FTC Act, the agency also enforces other federal laws relating to consumers’ privacy and security” (see Federal Trade Commission, “Privacy and Security Enforcement,” https://www.ftc.gov/news-events/media-resources/protecting-consumer-privacy/privacy-security-enforcement).
69. Hoofnagle, Hartzog, and Solove (2019).
70. In the context of its 2020 European Data Strategy, the EU may adopt a new Data Act in 2021, which would update the IPR framework currently in force (including a possible revision of the 1996 Database Directive) to support the use and reuse of nonpersonal data (EC 2020b).
71. See, for example, Contract Guidelines on Data Utilization Rights, updating the Unfair Competition Prevention Act of 2018 (METI 2020).
72. MeitY (2020).
73. See World Intellectual Property Organization, “Brazil: Law No. 9.610 of February 19, 1998 (Law on Copyright and Neighboring Rights, as amended by Law No. 12.853 of August 14, 2013),” WIPO Lex (database), https://wipolex.wipo.int/en/legislation/details/17474.
74. See, generally, the two model laws promulgated by the United Nations Commission on International Trade Law (UNCITRAL 1998, 2001).
75. For purposes of this discussion, no distinction is drawn between “electronic” signatures and “digital” signatures, although “digital” signatures are commonly associated with the use of public key infrastructure (PKI). For a more detailed explanation of PKI and the differences between e-signatures and digital signatures, see UNCITRAL (2001, 26–27; https://www.uncitral.org/pdf/english/texts/electcom/ml-elecsig-e.pdf).
76. UNCITRAL (1998).
77. Public key infrastructure (PKI) has been defined as follows: “The framework and services that provide for the generation, production, distribution, control, accounting, and destruction of public key certificates. Components include the personnel, policies, processes, server platforms, software, and workstations used for the purpose of administering certificates and public-private key pairs, including the ability to issue, maintain, recover, and revoke public key certificates” (https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-53r4.pdf).
78. EU (2014, article 3[15]).
79. See, generally, World Bank, ID4D Data: Global Identification Challenge by the Numbers (dashboard), https://id4d.worldbank.org/global-dataset; Sustainable Development, Department of Economic and Social Affairs, United Nations, “The 17 Goals,” https://sdgs.un.org/goals. Sustainable Development Goal (SDG) 16.9 states: “By 2030, provide legal identity for all, including birth registration.”
80. World Bank (2019).
81. UNCITRAL (1998).
82. National Assembly, Togo (2020).
83. EC (2018c).
84. The Australian government’s Data Sharing and Release Act of 2018 was drafted based on the results of a report of the Productivity Commission (PC 2017). The purpose of the act is to (1) promote better sharing of public sector data, (2) build trust in use of public data, (3) dial up or down appropriate safeguards, (4) maintain the integrity of the data system, and (5) establish institutional arrangements (see Department of the Prime Minister and Cabinet, “Data Sharing and Release Reforms,” https://www.pmc.gov.au/public-data/data-sharing-and-release-reforms). This is expected to lead to (1) more efficient and effective government services for citizens; (2) better-informed government programs and policies; (3) greater transparency around government activities and spending; (4) economic growth from innovative data use; and (5) research solutions to current and emerging social, environmental, and economic issues. The purpose of the act is thus to move the paradigm from one that restricts access to identifiable data to one that authorizes release if appropriate data safeguards are in place.
To complement the Data Sharing and Release Act, the government published a best-practice guide. The guide outlines good-practice principles, based on the Five Safes Framework, for managing the risks of disclosure and assessing whether and how to share data (PMC 2019). The principles enable a privacy by design approach to data sharing that focuses on controls and benefits instead of merely reducing the level of detail in the data to be shared. In this way, they help maximize the usefulness of the data.
85. National Archives (2019).
86. Section 1 of France’s Law for a Digital Republic lays out provisions on open government data (Légifrance 2016). Also see Dodds (2016).
87. They include geospatial, meteorological, and mobility data, as well as statistics and data on corporate ownership and Earth observation and the environment.
88. Noveck (2017).
89. Noveck (2017).
90. See Open Knowledge Foundation (2020).
91. OECD (2013, 2019a); Ubaldi (2013); Vickery (2012).
92. Dodds (2016).
93. OECD (2019a).
94. Austin and Lie (2019); Dodds (2016).
95. World Wide Web Foundation (2017).
96. For general principles, see ISO and IEC (2016, sec. 8.2). For a practical example, see Data.NSW (2020).
97. OECD (2019a).
98. OECD (2019a).
99. Austin and Lie (2019).
100. Wilkinson et al. (2016).
101. See Article 3 of France’s Law for a Digital Republic (Légifrance 2016): “Art. L. 300-4.- Any provision made electronically under this book is done in an open standard, easily reusable and exploitable by an automated processing system.”
102. Leading industry participants often contribute to the development of open standards and frequently integrate their firms’ proprietary technical standards into the design. For this reason, standard-setting organizations (SSOs) may require that the relevant patent rights be licensed on fair, reasonable, and nondiscriminatory (FRAND) terms. The adoption of FRAND licensing terms can therefore become a condition for participation in SSOs. The obligation to offer FRAND licenses to new market entrants usually extends to third-party technology providers whether or not they are SSO members. For further details, see Ragavan, Murphy, and Davé (2016).
103. The Open Knowledge Foundation’s definition of open data (“Open Definition”) sets out conditions for the availability and access of data, its reuse and redistribution, and universal participation. On the latter, “everyone must be able to use, re-use and redistribute—there should be no discrimination against fields of endeavor or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.” See Open Knowledge Foundation (2020; https://okfn.org/opendata/).
104. MITCI (2017).
105. Council of Ministers, Jordan (2019).
106. ISO and IEC (2017).
107. EDRi (2015). Ensuring sufficient resources and technical capacity to effectively discharge these functions is critical. For example, Estonia’s X-Tee data exchange and interoperability platform is continuously monitored to mitigate cyberthreats (RIA 2020). See chapter 8 for further details on implementation.
108. EC (2016).
109. As the European Commission notes: “Online public services are crucial to increasing the cost-efficiency and quality of the services provided to citizens and companies. One example of increased efficiency is the ‘Once Only’ principle—only in 48% of cases do public administrations reuse information about the citizen or companies that is already in their possession without asking again. The extension of this principle, in compliance with data protection legislation, would generate an annual net saving at the EU level of around EUR 5 billion per year by 2017. The Commission will launch a pilot project for the ‘Once-Only’ principle for businesses and citizens and explore the possibility of an EU wide e-safe solution (a secure online repository for documents). Extending ‘Once-Only’ across borders would further contribute to the efficiency of the Digital Single Market” (EC 2015, 16).
110. TOOP (2021).
111. See European Commission, “Horizon 2020,” https://ec.europa.eu/programmes/horizon2020/en.
112. SCOOP4C, “Stakeholder Community: Once-Only Principle for Citizens,” https://www.scoop4c.eu/.
113. OECD (2019a).
114. OECD (2019a).
115. EU (2019b).
116. EC (2020a).
117. Légifrance (2016). This covers, for example, data from delegated public services or data that are relevant for targeting welfare payments or constructing national statistics (OECD 2019a).
118. The UK Digital Economy Act enables accredited researchers to gain access to deidentified data for research purposes (National Archives, United Kingdom 2017, c. 30, Chap. 5). The act regulates data sharing practices for the purposes of research using public data, but it does not govern data sharing in other contexts (Austin and Lie 2019).
119. Finch and Tene (2018).
120. OECD (2019b).
121. Ragavan, Murphy, and Davé (2016).
122. FRAND licensing regimes have been designed to be an effective competition law remedy (see the Apple vs. Samsung cases), but infringements of FRAND terms involve contractual remedies between the patent holder and the SSO (or third party). However, experts have argued that the pro-innovation and competitive effects of licensing regimes depend on how they are implemented. Indeed, some have argued that an “excessive reliance” on FRAND terms may be counterproductive.
123. Open Knowledge Foundation, “Open Data Commons Open Database License (ODbL) v1.0,” https://opendatacommons.org/licenses/odbl/1-0/.
124. Creative Commons, “Open Data,” https://creativecommons.org/about/program-areas/open-data/.
125. MoICT (2017).
126. See the European Union’s Free Flow of Nonpersonal Data Regulation (EU 2018e), the Payment Services Directive (EU 2015), the Digital Content Directive (EU 2019a), and certain sectoral regulations, in addition to the right to data portability for personal data enshrined in Article 20 of the GDPR. See also Borgogno and Colangelo (2019).
127. Article 20 of the GDPR (EU 2016).
128. The European Commission notes: “In general, given the policy objectives of the right to data portability, the term ‘provided by the data subject’ must be interpreted broadly, and should exclude ‘inferred data’ and ‘derived data,’ which include personal data that are created by a service provider (for example, algorithmic results). A data controller can exclude those inferred data but should include all other personal data provided by the data subject through technical means provided by the controller” (EC 2017). This approach contrasts with that of other legal frameworks, such as the California Consumer Privacy Act (CCPA), that are broader in scope, covering inferred data (see OneTrust DataGuidance and FPF 2019).
129. Krämer, Senellart, and de Streel (2020).
130. See Recital 68 of the GDPR: “Data controllers should be encouraged to develop interoperable formats that enable data portability. . . . The data subject’s right to transmit or receive personal data concerning him or her should not create an obligation for the controllers to adopt or maintain processing systems which are technically compatible. . . . Where technically feasible, the data subject should have the right to have the personal data transmitted directly from one controller to another” (EU 2018d).
131. See Article 6, “Porting of Data,” of the EU Regulation on the Free Flow of Non-personal Data: “The Commission shall encourage and facilitate [emphasis added] the development of self-regulatory codes of conduct at Union level (‘codes of conduct’), in order to contribute to a competitive data economy” (EU 2018e, 67).
132. Personal information management systems (PIMS) can help individuals control their ported data through mechanisms that simplify the process. Examples include schema mappings, which convert data from the sender’s format to the receiver’s. Another example is functionality that centralizes consent and rights management, and helps visualize it, for portability or for data protection more broadly. However, these mechanisms have not yet been standardized across the industry. This affects both the broader sustainability of the business model and the adoption of PIMS as an alternative to other enforcement mechanisms.
133. Measures such as shifting to authentication mechanisms (like privacy seals) and adopting more user-friendly open-source solutions may support the adoption of PIMS as alternatives for consumers. This is especially true if the reliability of these solutions is certified to promote trust (Krämer, Senellart, and de Streel 2020).
134. The first right to portability mandated by EU law was the portability of phone numbers under the Universal Service Directive, part of a legislative effort to create competition in the telecommunications sector (Zanfir-Fortuna and Hondagneu-Messner 2019).
135. Borgogno and Colangelo (2019).
136. Congress of the Philippines (2012).
137. The “consumer data right” aims to “give Australians greater control over their data, empowering their consumers to choose to share their data with trusted recipients only for the purposes they have authorized” (Treasury, Australia 2020).
138. See Part IVD in Federal Register of Legislation, Australia (2019).
139. The act begins with the telecommunications, banking, and energy sectors before rolling out across the economy.
140. The data standards body has released version 1.6.0 of the consumer data standards. These are high-level standards consistent with the rules and phasing timetable of the Australian Competition and Consumer Commission. See Data61, Commonwealth Scientific and Industrial Research Organisation, “Consumer Data Standards,” https://consumerdatastandards.gov.au/consumer-data-standards/.
141. At present, the Data Transfer Project is at the pilot stage, which makes it difficult to measure the project’s impact on enabling continuous portability of data. It remains an interesting model of private sector–led cooperation to develop standard and interoperable data formats that could be scaled up. See Google, “Data Transfer Project,” https://datatransferproject.dev/. The founding members of the Data Transfer Project were Google and Facebook. They were later joined by Apple, Microsoft, and Twitter.
142. Borgogno and Colangelo (2019).
143. OECD (2019a).
144. This was a point of discussion at the international policy workshop “Data for Better Lives: Enablers and Safeguards” hosted by the World Bank and the German Federal Ministry of Economic Cooperation and Development in Washington, DC, June 9–10, 2020.
145. See Berlin Group, “PSD2 Access to Bank Accounts,” https://www.berlin-group.org/psd2-access-to-bank-accounts.
146. Waze (2018).
147. Google, “Waze for Cities: Working Together for a Smarter, More Sustainable Future,” Waze, https://www.waze.com/ccp. Waze and other companies have been sharing data with local governments in Brazil since the 2016 Rio Olympics under their Connected Citizens Program. Their platform is designed to support public entities with urban planning, traffic optimization, law enforcement, and emergency service provision (Huyer and Cecconi 2020).
148. OECD (2019a).
149. Huyer and Cecconi (2020).
150. These include the following:
- Japan’s “Contract Guidance on Utilization of AI and Data” (METI 2018).
- The Netherlands’ Dare-2-Share Cooperation Agreement (Dare 2 Share Ministries, “Terms and Conditions,” https://www.dare2share.org/about/terms-and-conditions/).
- The European Union’s proposed “Guidance on Private Sector Data Sharing” (EC 2018b).
Japan’s Ministry of Economy, Trade and Industry (METI) developed the “Contract Guidance on Utilization of AI and Data” as a resource for businesses entering a data sharing agreement. It highlights factors and terms to be considered for inclusion when drafting a contract using data or AI, including sample clauses.
151. Kirkpatrick (2014).
152. See “Development Data Partnership,” https://datapartnership.org/.
153. GPO (2018).
154. Kosseff (2019, 27).
155. GPO (2018, at sec. 230).
156. LOC (1998).
157. Kosseff (2019, 5).
158. See Electronic Frontier Foundation, “Manila Principles on Intermediary Liability,” https://www.manilaprinciples.org/.
References
Anderson, Thea, and Elizabeth M. Renieris. 2020. “Data Protection and Digital Infrastructure before, during, and after a Pandemic.” Omidyar Network, Redwood City, CA. https://omidyar.com/data-protection-and-digital-infrastructure-before-during-and-after-a-pandemic/.
Austin, Lisa M., and David Lie. 2019. “Safe Sharing Sites.” NYU Law Review 94 (4): 591–623. https://www.nyulawreview.org/issues/volume-94-number-4/safe-sharing-sites/.
Ben-Avie, Jochai, and Udbhav Tiwari. 2019. “India’s New Data Protection Bill: Strong on Companies, Step Backward on Government Surveillance.” Open Policy and Advocacy (blog), December 10, 2019. https://blog.mozilla.org/netpolicy/2019/12/10/indias-new-data-protection-bill-strong-on-companies-weak-on-gov.
Borgogno, Oscar, and Giuseppe Colangelo. 2019. “Data Sharing and Interoperability: Fostering Innovation and Competition through APIs.” Computer Law and Security Review 35 (5): 105314. https://doi.org/10.1016/j.clsr.2019.03.008.
Cavoukian, Ann. 2010. “Privacy by Design: The Definitive Workshop; A Foreword by Ann Cavoukian, Ph.D.” Identity in the Information Society 3 (2): 247–51. https://doi.org/10.1007/s12394-010-0062-y.
Cavoukian, Ann. 2011. “PbD, Privacy by Design, the 7 Foundational Principles: Implementation and Mapping of Fair Information Practices.” Information and Privacy Commissioner of Ontario, Toronto.
Chen, Rong. 2021. “Mapping Data Governance Legal Frameworks around the World: Findings from the Global Data Regulation Diagnostic.” Policy Research Working Paper 9615, World Bank, Washington, DC. http://documents.worldbank.org/curated/en/581331617817680243/Mapping-Data-Governance-Legal-Frameworks-Around-the-World-Findings-from-the-Global-Data-Regulation-Diagnostic.
Cision. 2020. “Helsinki and Amsterdam First Cities in the World to Launch Open AI Register.” Cision News, September 28, 2020. Cision, Chicago. https://news.cision.com/fi/city-of-helsinki/r/helsinki-and-amsterdam-first-cities-in-the-world-to-launch-open-ai-register,c3204076.
City of Amsterdam. 2020. “What Is the Algorithm Register?” City of Amsterdam Algorithm Register Beta. https://algoritmeregister.amsterdam.nl/en/ai-register/.
City of Helsinki. 2020. “What Is an Artificial Intelligence Register?” City of Helsinki Artificial Intelligence Register. https://ai.hel.fi/.
COE (Council of Europe). 2018. “Convention 108+: Convention for the Protection of Individuals with Regard to the Processing of Personal Data.” COE, Strasbourg. https://rm.coe.int/convention-108-convention-for-the-protection-of-individuals-with-regar/16808b36f1.
Confessore, Nicholas. 2018. “Cambridge Analytica and Facebook: The Scandal and the Fallout So Far.” New York Times, April 4, 2018. https://www.nytimes.com/2018/04/04/us/politics/cambridge-analytica-scandal-fallout.html.
Congress of the Philippines. 2012. “Republic Act No. 10173: An Act Protecting Individual Personal Information in Information and Communications Systems in the Government and the Private Sector, Creating for This Purpose a National Privacy Commission, and for Other Purposes.” August 12, 2012, Lawphil Project, Arellano Law Foundation, Manila. https://lawphil.net/statutes/repacts/ra2012/ra_10173_2012.html.
Council of Ministers, Jordan. 2019. “Jordan Open Government Data License.” Issue version 1.0, Open Government Data Platform. https://portal.jordan.gov.jo/OGD-License_en.pdf.
Council of the European Union. 2019. “Regulation of the European Parliament and of the Council on European Production and Preservation Orders for Electronic Evidence in Criminal Matters.” Interinstitutional File 2018/0108(COD), Council of the European Union, Brussels. https://data.consilium.europa.eu/doc/document/ST-10206-2019-INIT/en/pdf.
Dastin, Jeffrey. 2018. “Amazon Scraps Secret AI Recruiting Tool That Showed Bias against Women.” Reuters, October 10, 2018. https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G.
Data.NSW. 2020. “NSW Government Information Classification, Labelling, and Handling Guidelines.” Data.NSW, Data Analytics Center, Customer, Delivery, and Transformation, Department of Customer Service, Government of New South Wales, Sydney. https://www.digital.nsw.gov.au/sites/default/files/NSW%20Info%20Classification%20Labelling%20and%20Handling%20Guidelines%202020%20V2.1_1.pdf.
DCMS (Department for Digital, Culture, Media, and Sport, United Kingdom). 2019. “Digital Charter.” Policy Paper, DCMS, London. https://www.gov.uk/government/publications/digital-charter/digital-charter.
Deighton-Smith, Rex, Angelo Erbacci, and Céline Kauffmann. 2016. “Promoting Inclusive Growth through Better Regulation: The Role of Regulatory Impact Assessment.” OECD Regulatory Policy Working Paper 3, Organisation for Economic Co-operation and Development, Paris. https://doi.org/10.1787/5jm3tqwqp1vj-en.
de Montjoye, Yves-Alexandre, César A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. 2013. “Unique in the Crowd: The Privacy Bounds of Human Mobility.” Scientific Reports 3 (1): article 1376. https://doi.org/10.1038/srep01376.
DLA Piper. 2020. Data Protection Laws of the World. London: DLA Piper. https://www.dlapiperdataprotection.com/index.html?t=about&c=AO.
Dodds, Leigh. 2016. “How to Write a Good Open Data Policy.” Guides. Open Data Institute, London.
Dong, Marissa Xiao. 2020. “China: The Civil Code Strengthens Civil Law Protection around Privacy and Personal Information.” Conventus Law, June 12, 2020. http://www.conventuslaw.com/report/china-the-civil-code-strengthens-civil-law/.
Dwork, Cynthia. 2006. “Differential Privacy.” In Automata, Languages and Programming: 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10–14, 2006, Proceedings, Part II, edited by Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Wegener, 1–12. Lecture Notes in Computer Science Series, Vol. 4052. Berlin: Springer. https://link.springer.com/chapter/10.1007%2F11787006_1.
EC (European Commission). 2014. “Guidelines on the Implementation of the Court of Justice of the European Union Judgment on ‘Google Spain and INC v. Agencia Española de Protección de Datos (AEPD) and Mario Costeja González’ C-131/12.” Document WP225, Directorate C (Fundamental Rights and Union Citizenship), Directorate General Justice, EC, Brussels. https://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=667236.
EC (European Commission). 2015. “A Digital Single Market Strategy for Europe.” Document COM(2015) 192 final, EC, Brussels. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52015DC0192&from=EN.
EC (European Commission). 2016. “EU eGovernment Action Plan 2016–2020: Accelerating the Digital Transformation of Government.” Document COM(2016) 179 final, EC, Brussels. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52016DC0179.
EC (European Commission). 2017. “Article 29 Data Protection Working Party: Guidelines on the Right to Data Portability.” Document WP242 rev.01, Directorate C (Fundamental Rights and Rule of Law), Directorate General Justice and Consumers, EC, Brussels. https://ec.europa.eu/newsroom/article29/item-detail.cfm?item_id=611233.
EC (European Commission). 2018a. “Article 29 Working Party: Guidelines on Consent under Regulation 2016/679.” Document WP259 rev.01, Directorate C (Fundamental Rights and Union Citizenship), Directorate General Justice, EC, Brussels. https://ec.europa.eu/newsroom/article29/document.cfm?action=display&doc_id=51030.
EC (European Commission). 2018b. “Guidance on Private Sector Data Sharing.” Text. Shaping Europe’s Digital Future, European Commission. https://ec.europa.eu/digital-single-market/en/guidance-private-sector-data-sharing.
EC (European Commission). 2018c. Study to Support the Review of Directive 2003/98/EC on the Re-Use of Public Sector Information: Final Report. Luxembourg: Publications Office of the European Union. https://data.europa.eu/doi/10.2759/373622.
EC (European Commission). 2020a. “European Legislation on Open Data and the Re-Use of Public Sector Information.” Shaping Europe’s Digital Future: Policy. Data Policy and Innovation (Unit G.1), EC, Brussels. https://ec.europa.eu/digital-single-market/en/european-legislation-reuse-public-sector-information.
EC (European Commission). 2020b. “A European Strategy for Data.” Communication COM(2020) 66 final, EC, Brussels. https://ec.europa.eu/info/sites/info/files/communication-european-strategy-data-19feb2020_en.pdf.
ECHR (European Court of Human Rights). 2010. “European Convention on Human Rights.” ECHR and Council of Europe, Strasbourg. https://www.echr.coe.int/documents/convention_eng.pdf.
ECHR (European Court of Human Rights). 2020. Guide on Article 8 of the European Convention on Human Rights: Right to Respect for Private and Family Life, Home and Correspondence, rev. ed. Strasbourg: ECHR. https://www.echr.coe.int/documents/guide_art_8_eng.pdf.
EDPB (European Data Protection Board). 2018. “Guidelines 2/2018 on Derogations of Article 49 under Regulation 2016/679.” Guidelines. EDPB, Brussels. https://edpb.europa.eu/sites/edpb/files/files/file1/edpb_guidelines_2_2018_derogations_en.pdf.
EDPS (European Data Protection Supervisor). 2014. “Mauritius Declaration on the Internet of Things.” 36th International Conference of Data Protection and Privacy Commissioners, Balaclava, Mauritius, October 14, 2014. https://edps.europa.eu/sites/edp/files/publication/14-10-14_mauritius_declaration_en.pdf.
EDRi (European Digital Rights). 2015. “A Truly Digital Single Market?” June, EDRi, Brussels. https://edri.org/files/DSM_Analysis_EDRi_20150617.pdf.
ENISA (European Union Agency for Cybersecurity). 2014. “Privacy and Data Protection by Design: From Policy to Engineering.” ENISA, Heraklion, Greece. https://data.europa.eu/doi/10.2824/38623.
ENISA (European Union Agency for Cybersecurity). 2019. “Pseudonymisation Techniques and Best Practices: Recommendations on Shaping Technology According to Data Protection and Privacy Provisions.” ENISA, Heraklion, Greece. https://www.enisa.europa.eu/publications/pseudonymisation-techniques-and-best-practices.
Etalab. 2020a. “Algorithmes de Nantes Métropole.” data.gouv.fr, October 7, 2020, Etalab, Paris. https://www.data.gouv.fr/en/datasets/algorithmes-de-nantes-metropole/.
Etalab. 2020b. “Les algorithmes publics: enjeux et obligations” [Public sector algorithms: challenges and obligations]. guides.etalab.gouv.fr, Etalab, Paris. https://guides.etalab.gouv.fr/algorithmes/guide/#_1-a-quoi-servent-les-algorithmes-publics.
EU (European Union). 2014. “Regulation (EU) No 910/2014 of the European Parliament and of the Council of 23 July 2014 on Electronic Identification and Trust Services for Electronic Transactions in the Internal Market and Repealing Directive 1999/93/EC.” Official Journal of the European Union L 257/73 (August 28). https://ec.europa.eu/futurium/en/system/files/ged/eidas_regulation.pdf.
EU (European Union). 2015. “Directive (EU) 2015/2366 of the European Parliament and of the Council of 25 November 2015 on Payment Services in the Internal Market, Amending Directives 2002/65/EC, 2009/110/EC and 2013/36/EU and Regulation (EU) No 1093/2010, and Repealing Directive 2007/64/EC.” Official Journal of the European Union L 337/35 (December 23). https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32015L2366&from=EN.
EU (European Union). 2016. “Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation).” Official Journal of the European Union L 119/1 (May 4). https://eur-lex.europa.eu/eli/reg/2016/679/oj.
EU (European Union). 2018a. “Art. 6 GDPR: Lawfulness of Processing.” GDPR.Eu, November 14, 2018. Proton Technologies, Calgary, Canada. https://gdpr.eu/article-6-how-to-process-personal-data-legally/.
EU (European Union). 2018b. “Recital 26: Not Applicable to Anonymous Data.” GDPR.Eu, November 14, 2018. Proton Technologies, Calgary, Canada. https://gdpr.eu/recital-26-not-applicable-to-anonymous-data/.
EU (European Union). 2018c. “Recital 43: Freely Given Consent.” GDPR.Eu, November 14, 2018. Proton Technologies, Calgary, Canada. https://gdpr.eu/recital-43-freely-given-consent/.
EU (European Union). 2018d. “Recital 68: Right of Data Portability.” GDPR.Eu, November 14, 2018. Proton Technologies, Calgary, Canada. https://gdpr.eu/recital-68-right-of-data-portability/.
EU (European Union). 2018e. “Regulation (EU) 2018/1807 of the European Parliament and of the Council of 14 November 2018 on a Framework for the Free Flow of Non-personal Data in the European Union.” Official Journal of the European Union L 303, 61 (November 28): 59–68. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:L:2018:303:FULL&from=EN.
EU (European Union). 2018f. “What Is GDPR, the EU’s New Data Protection Law?” GDPR.Eu, May 25, 2018. Proton Technologies, Calgary, Canada. https://gdpr.eu/what-is-gdpr/.
EU (European Union). 2019a. “Directive (EU) 2019/770 of the European Parliament and of the Council of 20 May 2019 on Certain Aspects Concerning Contracts for the Supply of Digital Content and Digital Services.” Official Journal of the European Union L 136/1 (May 22). https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32019L0770&from=EN.
EU (European Union). 2019b. “Directive (EU) 2019/1024 of the European Parliament and of the Council of 20 June 2019 on Open Data and the Re-Use of Public Sector Information.” Official Journal of the European Union L 172/56 (June 26). https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32019L1024&from=EN.
Fang, Sammy, Carolyn Bigg, and John Zhang. 2020. “New Chinese Civil Code Introduces Greater Protection of Privacy Rights and Personal Information.” Insights, June 9, 2020, DLA Piper, London. https://www.dlapiper.com/en/uk/insights/publications/2020/06/new-chinese-civil-code-introduces-greater-protection-of-privacy-rights-and-personal-information/.
Federal Register of Legislation, Australia. 2019. “Competition and Consumer Act 2010, No. 51, 1974.” Compilation 121. Sydney: Office of Parliamentary Counsel. http://www.legislation.gov.au/Details/C2019C00317/Html/Volume_1.
Finch, Kelsey, and Omer Tene. 2018. “Smart Cities: Privacy, Transparency, and Community.” In The Cambridge Handbook of Consumer Privacy, edited by Evan Selinger, Jules Polonetsky, and Omer Tene, 125–48. Cambridge Law Handbooks Series. Cambridge, UK: Cambridge University Press. https://doi.org/10.1017/9781316831960.007.
Fisher, Angelina, and Thomas Streinz. 2021. “Confronting Data Inequality.” WDR 2021 background paper, World Bank, Washington, DC. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3825724.
FTC (Federal Trade Commission, United States). 2018. Fair Credit Reporting Act, 15 U.S.C. § 1681, rev. ed. Washington, DC: FTC. https://www.ftc.gov/system/files/documents/statutes/fair-credit-reporting-act/545a_fair-credit-reporting-act-0918.pdf.
Gellman, Barton. 2013. “Edward Snowden, after Months of NSA Revelations, Says His Mission’s Accomplished.” Washington Post, December 23, 2013. https://www.washingtonpost.com/world/national-security/edward-snowden-after-months-of-nsa-revelations-says-his-missions-accomplished/2013/12/23/49fc36de-6c1c-11e3-a523-fe73f0ff6b8d_story.html.
Gelman, Robert B. 1998. Protecting Yourself Online: The Definitive Resource on Safety, Freedom, and Privacy in Cyberspace. With Stanton McCandlish and Members of the Electronic Frontier Foundation. New York: HarperCollins.
GPO (Government Publishing Office, United States). 2018. Communications Act of 1934, as Amended. United States Code, 2018 ed. Title 47: Telecommunications. Washington, DC: GPO. https://www.govinfo.gov/app/details/USCODE-2018-title47/USCODE-2018-title47-chap5-subchapI-sec151.
Greenleaf, Graham, and Bertil Cottier. 2020. “2020 Ends a Decade of 62 New Data Privacy Laws.” Privacy Laws and Business International Report 163: 24–26. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3572611.
Hasselbalch, Gry, and Pernille Tranberg. 2016. “Personal Data Stores Want to Give Individuals Power over Their Data.” Dataethics (blog), September 27, 2016. https://dataethics.eu/personal-data-stores-will-give-individual-power-their-data/.
Henley, Jon, and Robert Booth. 2020. “Welfare Surveillance System Violates Human Rights, Dutch Court Rules.” Guardian, February 5, 2020. https://www.theguardian.com/technology/2020/feb/05/welfare-surveillance-system-violates-human-rights-dutch-court-rules.
Hill, Kashmir. 2020. “Wrongfully Accused by an Algorithm.” New York Times, August 3, 2020. https://www.nytimes.com/2020/06/24/technology/facial-recognition-arrest.html.
HLCM (High-Level Committee on Management, United Nations). 2018. “Personal Data Protection and Privacy Principles.” HLCM, Chief Executives Board for Coordination, United Nations, Geneva. https://unsceb.org/personal-data-protection-and-privacy-principles.
Hoofnagle, Chris Jay, Woodrow Hartzog, and Daniel J. Solove. 2019. “The FTC Can Rise to the Privacy Challenge, but Not without Help from Congress.” Brookings TechTank (blog), August 8, 2019. https://www.brookings.edu/blog/techtank/2019/08/08/the-ftc-can-rise-to-the-privacy-challenge-but-not-without-help-from-congress/.
Huyer, Esther, and Gianfranco Cecconi. 2020. “Business-to-Government Data Sharing.” Analytical Report 12, European Data Portal, European Commission, Luxembourg. https://www.europeandataportal.eu/sites/default/files/analytical_report_12_business_government_data_sharing.pdf.
ICO (Information Commissioner’s Office). 2018. Guide to the General Data Protection Regulation (GDPR). Wilmslow, UK: ICO. https://ico.org.uk/media/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr-1-1.pdf.
ICO (Information Commissioner’s Office). 2019. “The Use of Live Facial Recognition Technology by Law Enforcement in Public Places.” Information Commissioner’s Opinion, 2019/01, October 31, 2019. https://ico.org.uk/media/about-the-ico/documents/2616184/live-frt-law-enforcement-opinion-20191031.pdf.
ISO (International Organization for Standardization). 2018. “ISO/PC 317: Consumer Protection: Privacy by Design for Consumer Goods and Services.” Taking Part: Technical Committee, ISO, Geneva. https://www.iso.org/committee/6935430.html.
ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). 2016. “ISO/IEC 27011:2016(en): Information Technology, Security Techniques, Code of Practice for Information Security Controls Based on ISO/IEC 27002 for Telecommunications Organizations.” Online Browsing Platform, ISO, Geneva. https://www.iso.org/obp/ui/#iso:std:iso-iec:27011:ed-2:v1:en.
ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). 2017. “ISO/IEC 19941:2017, Information Technology, Cloud Computing, Interoperability, and Portability.” Online Browsing Platform, ISO, Geneva. https://www.iso.org/obp/ui/#iso:std:iso-iec:19941:ed-1:v1:en.
ITU (International Telecommunication Union). 2014. “Using Regulatory Impact Analysis to Improve Decision Making in the ICT Sector.” ITU, Geneva.
Kirkpatrick, Robert. 2014. “A Big Data Revolution for Sustainable Development.” In The Global Compact International Yearbook 2014, edited by United Nations Global Compact Office, 33–35. New York: United Nations; Münster, Germany: macondo publishing.
Kosseff, Jeff. 2019. The Twenty-Six Words That Created the Internet. Ithaca, NY: Cornell University Press.
Krämer, Jan, Pierre Senellart, and Alexandre de Streel. 2020. “Making Data Portability More Effective for the Digital Economy: Economic Implications and Regulatory Challenges.” Center on Regulation in Europe, Brussels. https://cerre.eu/publications/report-making-data-portability-more-effective-digital-economy/.
Légifrance. 2016. “Loi no 2016-1321 du 7 octobre 2016 pour une République numérique” [Law no. 2016-1321 of October 7, 2016, for a Digital Republic]. Journal officiel de la République française, October 8, 2016, Légifrance, Direction de l’information légale et administrative, Paris. https://www.legifrance.gouv.fr/download/file/SJ9w29KN2wvvWjcmiPwHr3B0La5rYk6ys5dm_FwTPZs=/JOE_TEXTE.
Lessig, Lawrence. 1999. Code and Other Laws of Cyberspace. New York: Basic Books.
LOC (Library of Congress, United States). 1998. “H. R. 2281 Digital Millennium Copyright Act: 105th Congress (1997–1998).” Congress.gov, October 28, 1998, LOC, Washington, DC. https://www.congress.gov/bill/105th-congress/house-bill/2281.
LOC (Library of Congress, United States). 2018. “H. R. 4943, CLOUD Act: 115th Congress (2017–2018).” Congress.gov, February 6, 2018, LOC, Washington, DC. https://www.congress.gov/bill/115th-congress/house-bill/4943.
Lubarsky, Boris. 2017. “Re-Identification of ‘Anonymized’ Data.” Georgetown Law Technology Review (April): 202–13. https://georgetownlawtechreview.org/re-identification-of-anonymized-data/GLTR-04-2017/.
Lum, Kristian. 2016. “Predictive Policing Reinforces Police Bias.” HRDAG: Human Rights Data Analysis Group, October 10. http://hrdag.org/2016/10/10/predictive-policing-reinforces-police-bias/.
MeitY (Ministry of Electronics and Information Technology). 2020. “Report by the Committee of Experts on Non-Personal Data Governance Framework.” 111972/2020/CL&ES. MeitY, New Delhi. https://www.huntonprivacyblog.com/wp-content/uploads/sites/28/2020/08/mygov_159453381955063671.pdf.
METI (Ministry of Economy, Trade, and Industry, Japan). 2018. “METI Formulates ‘Contract Guidance on Utilization of AI and Data.’” News release, June 15, 2018. https://www.meti.go.jp/english/press/2018/0615_002.html.
METI (Ministry of Economy, Trade, and Industry, Japan). 2020. “Unfair Competition Prevention Act.” Policy Index. Intellectual Property Policy Office, METI, Tokyo. https://www.meti.go.jp/english/policy/economy/chizai/chiteki/index.html.
MITCI (Ministry of Technology, Communication, and Innovation, Mauritius). 2017. “National Open Data Policy.” MITCI, Quatre Bornes, Mauritius. https://mitci.govmu.org/Documents/Strategies/Mauritius%20Open%20Data%20Policy%20May%202017.pdf.
MoICT (Ministry of Information and Communication Technology, Jordan). 2017. “Open Government Data Policy.” MoICT, Amman, Jordan. https://modee.gov.jo/ebv4.0/root_storage/en/eb_list_page/open_government_data_policy_2017.pdf.
National Archives, United Kingdom. 2017. “Digital Economy Act 2017.” legislation.gov.uk, National Archives, London. https://www.legislation.gov.uk/ukpga/2017/30/contents/enacted.
National Archives, United Kingdom. 2019. “Guidance on the Implementation of the Re-use of Public Sector Information Regulations 2015: For Public Sector Bodies.” Version 1.1, National Archives, London. https://www.nationalarchives.gov.uk/documents/information-management/psi-implementation-guidance-public-sector-bodies.pdf.
National Assembly, Togo. 2020. “Loi Relative a l’Identification Biometrique des Personnes Physiques au Togo” [Law on the biometric identification of natural persons in Togo]. National Assembly, Open Session, September 3. http://www.assemblee-nationale.tg/images/biometrie%20loi%20AN.pdf.
Nilsson, Patricia. 2019. “Police Fear Bias in Use of Artificial Intelligence to Fight Crime.” Financial Times, September 15, 2019. https://www.ft.com/content/5753689c-d63e-11e9-a0bd-ab8ec6435630.
NIST (National Institute of Standards and Technology). 2020. “NIST Study Evaluates Effects of Race, Age, Sex on Face Recognition Software.” News, December 19, 2019, updated May 18, 2020, NIST, US Department of Commerce, Gaithersburg, MD. https://www.nist.gov/news-events/news/2019/12/nist-study-evaluates-effects-race-age-sex-face-recognition-software.
Noble, Safiya Umoja. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. New York: NYU Press. https://nyupress.org/9781479837243/algorithms-of-oppression.
Noveck, Beth Simone. 2017. “Rights-Based and Tech-Driven: Open Data, Freedom of Information, and the Future of Government Transparency.” Yale Human Rights and Development Law Journal 19 (1): article 1. https://digitalcommons.law.yale.edu/yhrdlj/vol19/iss1/1.
OECD (Organisation for Economic Co-operation and Development). 2013. The OECD Privacy Framework. Paris: OECD. http://www.oecd.org/sti/ieconomy/oecd_privacy_framework.pdf.
OECD (Organisation for Economic Co-operation and Development). 2019a. Enhancing Access to and Sharing of Data: Reconciling Risks and Benefits for Data Re-Use across Societies. Paris: OECD. https://doi.org/10.1787/276aaca8-en.
OECD (Organisation for Economic Co-operation and Development). 2019b. “Licensing of IP Rights and Competition Law.” Background Note DAF/COMP(2019)3, Competition Committee, Directorate for Financial and Enterprise Affairs, OECD, Paris. https://one.oecd.org/document/DAF/COMP(2019)3/en/pdf.
OECD (Organisation for Economic Co-operation and Development). 2019c. “Recommendation of the Council on Artificial Intelligence.” OECD Legal Instruments, OECD/LEGAL/0449, adopted on May 22, 2019. https://
  • 240. 220 | World Development Report 2021 legalinstruments.oecd.org/en/instruments/OECD -LEGAL-0449. OneTrust DataGuidance and FPF (Future of Privacy Forum). 2019. “Comparing Privacy Laws: GDPR v. CCPA.” OneTrust, London; FPF, Washington, DC. https://fpf .org/wp-content/uploads/2019/12/ComparingPrivacy Laws_GDPR_CCPA.pdf. Open Knowledge Foundation. 2020. “What Is Open Data?” Open Data Handbook: Guide. London: Open Knowledge Foundation. https://opendatahandbook.org/guide/en /what-is-open-data/. OVIC (Office of the Victorian Information Commissioner). 2020. “Victorian Protective Data Security Framework, Version 2.0.” OVIC, Melbourne. https://ovic.vic.gov.au /wp-content/uploads/2020/02/Victorian-Protective-Data -Security-Framework-V2.0.pdf. Parliament of India. 2019. “The Personal Data Protection Bill, 2019.” Bill No. 373 of 2019, Parliament of India, New Delhi. https://dataprotectionindia.in/act/. PC (Productivity Commission, Australia). 2017. Data Avail- ability and Use. Productivity Commission Inquiry Report 82. Canberra: PC. https://www.pc.gov.au/inquiries /completed/data-access/report/data-access.pdf. PCPD (Office of the Privacy Commissioner for Personal Data, Hong Kong SAR, China). 2012. “Privacy by Design Conference.” PCPD, Hong Kong SAR, China. https:// www.pcpd.org.hk/pbdconference/index.html. PMC (Department of the Prime Minister and Cabinet, ­ Australia). 2019. “Best Practice Guide to Applying Data Sharing Principles.” PMC, Canberra. https://www.pmc .gov.au/resource-centre/public-data/data-sharing -principles. Potey, Manish M., C. A. Dhote, and Deepak H. Sharma. 2016. “Homomorphic Encryption for Security of Cloud Data.” Procedia Computer Science 79 (January): 175–81. https://doi .org/10.1016/j.procs.2016.03.023. Ragavan, Srividhya, Brendan Murphy, and Raj Davé. 2016. “FRAND v. Compulsory Licensing: The Lesser of the Two Evils.” Duke Law andTechnology Review 14 (1): 83–120. 
RIA (Information System Authority, Estonia). 2020. “Data Exchange Layer X-Tee.” RIA, Tallinn, Estonia. https://www.ria.ee/en/state-information-system/x-tee.html#:~:text=X%2Dtee%2C%20the%20data%20exchange,data%20based%20on%20an%20agreement.
Smith, Jack, IV. 2016. “Crime-Prediction Tool May Be Reinforcing Discriminatory Policing—Business Insider.” Business Insider, October 10, 2016. https://www.businessinsider.com/predictive-policing-discriminatory-police-crime-2016-10?r=UK.
Stats NZ (Statistics New Zealand). 2019. “Algorithm Charter.” Stats NZ, Wellington, New Zealand. https://data.govt.nz/assets/Uploads/Draft-Algorithm-Charter-for-consultation.pdf.
Sweeney, Latanya. 2000. “Simple Demographics Often Identify People Uniquely.” Data Privacy Working Paper 3, Carnegie Mellon University, Pittsburgh.
Sweeney, Latanya. 2002. “k-Anonymity: A Model for Protecting Privacy.” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (05): 557–70. https://doi.org/10.1142/S0218488502001648.
Taylor, Linnet, Luciano Floridi, and Bart van der Sloot, eds. 2017. Group Privacy: New Challenges of Data Technologies. Philosophical Studies Series, vol. 126. Cham, Switzerland: Springer.
TBS (Treasury Board of Canada Secretariat). 2020. “Responsible Use of Artificial Intelligence (AI).” Canada.ca, July 28, 2020, TBS, Ottawa. https://www.canada.ca/en/government/system/digital-government/digital-government-innovations/responsible-use-ai.html.
TOOP (The Once-Only Principle Project). 2021. “The Once-Only Principle Project.” Tallinn University of Technology, Tallinn, Estonia. https://www.toop.eu/about.
Treasury, Australia. 2020. Inquiry into Future Directions for the Consumer Data Right. Canberra: Treasury. https://treasury.gov.au/sites/default/files/2020-12/cdrinquiry-accessiblefinal.pdf.
Ubaldi, Barbara. 2013. “Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives.” OECD Working Paper on Public Governance 22, Organisation for Economic Co-operation and Development, Paris. https://doi.org/10.1787/5k46bj4f03s7-en.
UNCITRAL (United Nations Commission on International Trade Law). 1998. “UNCITRAL Model Law on Electronic Commerce (1996) with Additional Article 5 bis as Adopted in 1998.” UNCITRAL, Vienna. https://uncitral.un.org/en/texts/ecommerce/modellaw/electronic_commerce.
UNCITRAL (United Nations Commission on International Trade Law). 2001. “UNCITRAL Model Law on Electronic Signatures (2001).” UNCITRAL, Vienna. https://uncitral.un.org/en/texts/ecommerce/modellaw/electronic_signatures.
UNCITRAL (United Nations Commission on International Trade Law). 2009. Promoting Confidence in Electronic Commerce: Legal Issues on International Use of Electronic Authentication and Signature Methods. Vienna: United Nations. https://www.uncitral.org/pdf/english/texts/electcom/08-55698_Ebook.pdf.
UNSDG (United Nations Sustainable Development Group). 2017. “Data Privacy, Ethics, and Protection: Guidance Note on Big Data for Achievement of the 2030 Agenda.” UNSDG, New York. https://unsdg.un.org/sites/default/files/UNDG_BigData_final_web.pdf.
Vickery, Graham. 2012. “Review of Recent Studies on PSI Re-use and Related Market Developments.” Information Economics, Paris.
Ville de Nantes (City of Nantes, France). 2019. “Charte métropolitaine de la donnée” [Metropolitan data charter]. Nantes Métropole, May 2019, Ville de Nantes, France. https://metropole.nantes.fr/charte-donnee.
Waze. 2018. “Waze Celebrates 600 Connected Citizens Program Partners.” Waze, Google, Mountain View, CA. https://medium.com/waze/waze-celebrates-600-connected-citizens-program-partners-36945fbceb66.
WEF (World Economic Forum). 2011. “Personal Data: The Emergence of a New Asset Class.” In collaboration with Bain & Company, Inc., WEF, Geneva. http://www3.weforum.org/docs/WEF_ITTC_PersonalDataNewAsset_Report_2011.pdf.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (March 15): 160018. https://doi.org/10.1038/sdata.2016.18.
World Bank. 2018. “Global Indicators of Regulatory Governance: Worldwide Practices of Regulatory Impact Assessments.” World Bank, Washington, DC. http://documents1.worldbank.org/curated/en/905611520284525814/Global-Indicators-of-Regulatory-Governance-Worldwide-Practices-of-Regulatory-Impact-Assessments.pdf.
World Bank. 2019. ID4D Practitioner’s Guide: Version 1.0. October 2019. Washington, DC: World Bank. https://documents.worldbank.org/en/publication/documents-reports/documentdetail/248371559325561562/id4d-practitioner-s-guide.
World Bank and United Nations. 2017. Combatting Cybercrime: Tools and Capacity Building for Emerging Economies. Washington, DC: World Bank. http://documents.worldbank.org/curated/en/355401535144740611/Combatting-Cybercrime-Tools-and-Capacity-Building-for-Emerging-Economies.
World Wide Web Foundation. 2017. Open Data Barometer: Global Report, 4th ed. Washington, DC: World Wide Web Foundation. https://opendatabarometer.org/doc/4thEdition/ODB-4thEdition-GlobalReport.pdf.
Zanfir-Fortuna, Gabriela, and Sasha Hondagneu-Messner. 2019. “CPDP 2019 Panel: Understanding the Limits and Benefits of Data Portability.” Future of Privacy Forum, 2019 Computers, Privacy, and Data Protection Conference, Brussels, February 26, 2019. https://fpf.org/2019/02/26/cpdp-2019-panel-understanding-the-limits-and-benefits-of-data-portability/.
Zhang, Gil, and Kate Yin. 2020. “A Look at China’s Draft of Personal Data Protection Law.” Privacy Tracker, International Association of Privacy Professionals, Portsmouth, NH. https://iapp.org/news/a/a-look-at-chinas-draft-of-personal-data-protection-law/.
Spotlight 6.1
The evolving social contract on data: Balancing data sharing and data protection to facilitate contact tracing to control COVID-19

The need for immediate and reliable information about COVID-19 has tested the systems in place for protecting data.

International and national laws recognize that, in extraordinary circumstances, certain fundamental rights, including the right to data protection, may be restricted. Such a restriction is permitted only if basic democratic principles and safeguards are ensured and the restriction is legitimate, time limited, and not arbitrary.1

Following the World Health Organization’s declaration of a global pandemic in mid-March 2020,2 governments around the world have adopted contact tracing strategies to track down any individual who might have come into contact with an infected person, so that they may be quarantined to prevent further spread of the disease.3 Such contact tracing has historically been carried out manually by public health authorities.4 However, digital technologies such as mobile applications can carry it out much more efficiently and on a massive scale, and they can deliver public health advice at the same time.

Despite these benefits, contact tracing raises several concerns. First, tools relying on location tracing may be construed as unwarranted surveillance and a threat to privacy, especially in jurisdictions with inadequate data protection frameworks5 and given that location data are hard to anonymize fully. Second, personal data collected in contact tracing currently flow beyond trusted parties and organizations, reaching more third parties than current governance models account for. Third, there is evidence that using geographic location in contact tracing may be inaccurate and inefficient because it does not capture all of the relevant facts. An empirical study of the Ebola outbreak found that those data are meaningful only when reidentified, which touches on the “purpose limitation” principle used in good-practice data protection laws.

As countries around the world have developed contact tracing apps, two approaches have emerged: centralized and decentralized. Both approaches use Bluetooth signals to log when smartphone owners are in proximity to one another. When someone develops COVID-19 symptoms, the app sends alerts to users who may have been infected.

Under the centralized model originally pursued by the UK government, anonymized data are gathered and uploaded to a remote server. When a person starts to experience COVID-19 symptoms, the server matches that person’s data against other contacts.6 The United Kingdom’s proposed approach included a persistent identifier shared with the National Health Service, which allowed public authorities to receive infection data automatically. The central server then alerts other app users who have had significant contact with the infected person. Despite the public health merits of the centralized approach, the United Kingdom abandoned the application in mid-June 2020 in favor of a decentralized approach. The switch followed low rates of phone recognition during the app’s testing phase on the Isle of Wight.

In contrast, the decentralized model, promoted jointly by Apple and Google, aims to support contact tracing by health agencies while building privacy and security into the design.7 Users have more control over their information because it is stored on their own phones, which prevents data from being siphoned into central government servers. In this model, “The protocol excludes processing of any location data—unless the user opts in—applies ‘Rolling Proximity Identifiers’ that prevent identification of the user, processes proximity identifiers obtained from other devices exclusively on the device, [and] permits only users to decide whether to contribute to contact tracing by sharing Diagnosis Keys with the ‘Diagnosis Server’ if diagnosed with COVID-19, resulting in the alert to other users.”8

A multistakeholder consortium, the Pan-European Privacy-Preserving Proximity Tracing (PEPP-PT) community, is developing contact tracing technologies that adhere to strong European privacy and data protection laws and principles.9 The PEPP-PT technical mechanisms and standards are designed to protect privacy while using digital technology to maximize the speed and real-time capability of national pandemic responses. Initiatives under the PEPP-PT umbrella aim to develop two things: an open protocol for COVID-19 proximity tracing that uses Bluetooth Low Energy on mobile devices, and an architecture that keeps personal data entirely on an individual’s phone.

Notes
1. Access Now (2020).
2. WHO (2020).
3. Yan (2020).
4. eHealth Network (2020).
5. FPF (2020).
6. Economist (2020).
7. Google (2020); Sabbagh and Hern (2020).
8. eHealth Network (2020).
9. See Pan-European Privacy-Preserving Proximity Tracing (dashboard), n.d.

References
Access Now. 2020. “Recommendations on Privacy and Data Protection in the Fight against COVID-19.” Access Now, Brooklyn, NY, March 2020. https://www.accessnow.org/cms/assets/uploads/2020/03/Access-Now-recommendations-on-Covid-and-data-protection-and-privacy.pdf.
Economist. 2020. “Privacy Be Damned: Some Countries Want Central Databases for Contact-Tracing Apps.” April 30, 2020. https://www.economist.com/europe/2020/04/30/some-countries-want-central-databases-for-contact-tracing-apps.
eHealth Network. 2020. “Mobile Applications to Support Contact Tracing in the EU’s Fight against COVID-19: Common EU Toolbox for Member States.” Version 1.0, eHealth Network, Brussels, April 15, 2020. https://ec.europa.eu/health/sites/health/files/ehealth/docs/covid-19_apps_en.pdf.
FPF (Future of Privacy Forum). 2020. “Privacy Pandemics: The Role of Mobile Apps (Chart).” FPF, Washington, DC, April 2020. https://fpf.org/wp-content/uploads/2020/04/editPrivacy-Pandemics_-The-Role-of-Mobile-Apps-Chart-11.pdf.
Google. 2020. “Apple and Google Partner on COVID-19 Contact Tracing Technology.” Company Announcements (blog), April 10, 2020. https://blog.google/inside-google/company-announcements/apple-and-google-partner-covid-19-contact-tracing-technology/.
Pan-European Privacy-Preserving Proximity Tracing (dashboard). n.d. “PEPP-PT.” GitHub. Accessed December 15, 2020. https://github.com/pepp-pt.
Sabbagh, Dan, and Alex Hern. 2020. “UK Abandons Contact-Tracing App for Apple and Google Model.” Guardian, June 18, 2020. https://www.theguardian.com/world/2020/jun/18/uk-poised-to-abandon-coronavirus-app-in-favour-of-apple-and-google-models.
WHO (World Health Organization). 2020. “WHO Announces COVID-19 Outbreak a Pandemic.” Media Release, March 12, 2020. WHO Regional Office for Europe, Copenhagen. https://www.euro.who.int/en/health-topics/health-emergencies/coronavirus-covid-19/news/news/2020/3/who-announces-covid-19-outbreak-a-pandemic.
Yan, Holly. 2020. “Contact Tracing 101: How It Works, Who Could Get Hired, and Why It’s So Critical in Fighting Coronavirus Now.” CNN Health (blog), May 15, 2020. https://www.cnn.com/2020/04/27/health/contact-tracing-explainer-coronavirus/index.html.
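The decentralized model described in Spotlight 6.1 can be illustrated with a short sketch. This is a toy model, not the Apple-Google protocol itself. It assumes a hypothetical 16-byte random daily key per phone and derives the short-lived broadcast identifiers with HMAC-SHA256; the published specification uses different cryptographic primitives. It also omits Bluetooth, timestamps, and signal strength. The names `Phone`, `rolling_ids`, and `INTERVALS_PER_DAY` are invented for illustration. The sketch shows the privacy-relevant design choice: each phone keeps the identifiers it hears, and matching against published Diagnosis Keys happens on the device, so no location data or contact graph reaches a server.

```python
# Toy sketch of decentralized exposure matching (NOT the real Apple-Google spec).
# Assumptions: 16-byte random daily keys, identifiers derived with HMAC-SHA256,
# and 144 ten-minute broadcast intervals per day. Bluetooth radio details,
# timestamps, and signal strength are omitted.
import hashlib
import hmac
import secrets

INTERVALS_PER_DAY = 144  # assumed: identifier changes every 10 minutes


def new_daily_key() -> bytes:
    """Random per-day secret that stays on the phone unless its owner shares it."""
    return secrets.token_bytes(16)


def rolling_ids(daily_key: bytes) -> list[bytes]:
    """Derive the day's short-lived broadcast identifiers from one daily key."""
    return [
        hmac.new(daily_key, i.to_bytes(4, "little"), hashlib.sha256).digest()[:16]
        for i in range(INTERVALS_PER_DAY)
    ]


class Phone:
    """Hypothetical device model; all contact data live in this object only."""

    def __init__(self) -> None:
        self.daily_key = new_daily_key()
        self.heard: set[bytes] = set()  # identifiers observed nearby

    def broadcast(self, interval: int) -> bytes:
        """Identifier this phone advertises during a given interval."""
        return rolling_ids(self.daily_key)[interval]

    def record(self, identifier: bytes) -> None:
        """Store an identifier heard from a nearby phone (no location attached)."""
        self.heard.add(identifier)

    def check_exposure(self, diagnosis_keys: list[bytes]) -> bool:
        """Re-derive identifiers from published keys and match them on the device."""
        return any(self.heard.intersection(rolling_ids(k)) for k in diagnosis_keys)


# Alice and Bob are near each other; Carol never meets Alice.
alice, bob, carol = Phone(), Phone(), Phone()
bob.record(alice.broadcast(interval=42))

# Alice is diagnosed and chooses to upload her daily key as a Diagnosis Key.
diagnosis_server = [alice.daily_key]

print(bob.check_exposure(diagnosis_server))    # True
print(carol.check_exposure(diagnosis_server))  # False
```

In this sketch, the server holds only the keys of users who chose to report a diagnosis. It never learns which phones were near each other. Under the centralized model, the matching step in `check_exposure` would instead run on the server.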
Spotlight 6.2
The debate over ownership of personal data

Personal data “ownership” is incompatible with a rights-based approach to personal data protection.

The commercial value of personal data has prompted arguments that individuals should be allowed to commercialize their own data.1 Granting individuals ownership rights over personal data is sometimes proposed as a way to address data inequalities and to determine how such data can be used and by whom.2 Yet “ownership” neither addresses these inequalities nor empowers individuals to control the use of data. Personal data “ownership” makes sense only if personal data are considered an “asset” with associated property rights.3 If personal data are property, they can be used as collateral and for commercial exchange. One potential implication is that individuals could even trade away the data that contain their digital identity.

Some scholars suggest that concepts of property rights should apply to personal data.4 Others suggest that market-based solutions should be used to protect data,5 an approach sometimes called the “personal data economy.”6 The economic literature is mixed on whether data ownership rights could solve market failures or improve social outcomes.7 Some suggest that the optimal distribution of ownership rights would depend on factors such as the investment required to create the data8 and the ability to monetize data.9 A regime based on property rights would likely increase the transaction costs of data sharing, because the terms of sale and use would have to be negotiated.

Ascribing ownership rights to personal data also poses legal challenges. First, personal data often involve overlapping interests of different parties.10 These interests are present in the collection, creation, and use of the data.11 Suppose ownership were allocated to the “party with the clearest interest or who could make the most value out of it.”12 It would then be practically difficult to identify the party or parties that meet this definition.13 It is also unclear how to compensate interested third parties if downstream data uses breach their rights.14 Creating a data ownership right would require elaborating “necessary user” rights and rules to accommodate the public interest needs of such data,15 such as those raised in the COVID-19 pandemic response.

Second, “owning” personal data might give poor and more vulnerable people an incentive to sell their personal data, which would exacerbate existing inequities. Under a rights-based approach to personal data protection, individuals have fundamental rights regarding their data. Perhaps ironically, these rights give individuals more control over their data than “ownership” would, and they enable individuals to negotiate the use of these data.16 Like due process under law, these immutable rights cannot be bargained away like chattel. Even current case law does not support ownership rights over personal data.

Notes
1. Start-ups providing personal data management services to internet users have appeared. They range from companies that compensate users for their personal information to those that require users to pay fees to avoid the use of their personal information (Elvy 2017).
2. This spotlight deals only with issues concerning “ownership.” Other theories include treating personal data as labor and allowing personal data to be shared through licensing arrangements. On the first, see Posner and Weyl 2018, who argue that the individual’s role in creating the data should be recognized and compensated as labor. On the second, see Savona 2019, who suggests that data could be recognized as a licensable asset owned by the individual who generates it; see also Fisher and Streinz 2021. A related aspect of this debate concerns expanding the types of data available to creditors and other decision makers beyond traditional data, such as payments on loans. These nontraditional types of data could include utility payments, cash flow, and social media data. The reliability of such data and the ability to access and dispute the information are important issues. The analytics applied to such data may also fall in the nonpersonal category. Ownership of nonpersonal data, by contrast, is a more straightforward issue of intellectual property rights, which is addressed in chapter 6.
3. Castells (2010); Zuboff (2019).
4. Laudon (1996); Samuelson (2000).
5. See, for example, Carrascal et al. (2013) and Kerber (2016).
6. See, for example, Haupt (2016).
7. Duch-Brown, Martens, and Mueller-Langer (2017).
8. Tirole (2017); Zech (2016).
9. Dosis and Sand-Zantman (2019).
10. Scassa (2017).
11. Scassa (2017).
12. Wiebe (2016, 880).
13. See, for example, Farkas (2017).
14. Viljoen (2020).
15. Scassa (2017).
16. Scassa (2017).

References
Carrascal, Juan Pablo, Christopher Riederer, Vijay Erramilli, Mauro Cherubini, and Rodrigo de Oliveira. 2013. “Your Browsing Behavior for a Big Mac: Economics of Personal Information Online.” In WWW ’13: Proceedings of the 22nd International Conference on World Wide Web, 189–200. New York: Association for Computing Machinery. http://jpcarrascal.com/docs/publications/WWW2013-Browsing_behavior_big_mac.pdf.
Castells, Manuel. 2010. The Information Age: Economy, Society, and Culture. Vol. 1: The Rise of the Network Society, 2d ed. Malden, MA: Wiley-Blackwell.
Dosis, Anastasios, and Wilfried Sand-Zantman. 2019. “The Ownership of Data.” TSE Working Paper 19-1025, Toulouse School of Economics, University of Toulouse, Toulouse, France, July 2019.
Duch-Brown, Nestor, Bertin Martens, and Frank Mueller-Langer. 2017. “The Economics of Ownership, Access, and Trade in Digital Data.” JRC Digital Economy Working Paper 2017–01, Joint Research Center, European Commission, Seville, Spain.
Elvy, Stacy-Ann. 2017. “Paying for Privacy and the Personal Data Economy.” Columbia Law Review 117 (6): 1369–459.
Farkas, Thomas J. 2017. “Data Created by the Internet of Things: The New Gold without Ownership?” Revista la Propiedad Inmaterial 23 (June): 5–17. https://revistas.uexternado.edu.co/index.php/propin/article/view/4975.
Fisher, Angelina, and Thomas Streinz. 2021. “Confronting Data Inequality.” WDR 2021 background paper, World Bank, Washington, DC. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3825724.
Haupt, Michael. 2016. “Introducing Personal Data Exchanges and the Personal Data Economy.” #ExitTheSystem (blog), December 7, 2016. https://medium.com/project-2030/what-is-a-personal-data-exchange-256bcd5bf447.
Kerber, Wolfgang. 2016. “Digital Markets, Data, and Privacy: Competition Law, Consumer Law, and Data Protection.” MACIE Paper 2016/3, Marburg Centre for Institutional Economics, School of Business and Economics, Philipps-University Marburg, Marburg, Germany.
Laudon, Kenneth C. 1996. “Markets and Privacy.” Communications of the ACM 39 (9): 92–104. https://doi.org/10.1145/234215.234476.
Posner, Eric A., and E. Glen Weyl. 2018. Radical Markets: Uprooting Capitalism and Democracy for a Just Society. Princeton, NJ: Princeton University Press.
Samuelson, Pamela. 2000. “Privacy as Intellectual Property?” Stanford Law Review 52 (5): 1125–73. https://doi.org/10.2307/1229511.
Savona, Maria. 2019. “The Value of Data: Towards a Framework to Redistribute It.” SPRU Working Paper SWPS 2019-21, Science Policy Research Unit, Business School, University of Sussex, Brighton, UK, October 2019.
Scassa, Teresa. 2017. “Sharing Data in the Platform Economy: A Public Interest Argument for Access to Platform Data.” UBC Law Review 54 (4): 1017–71.
Tirole, Jean. 2017. Economics for the Common Good. Princeton, NJ: Princeton University Press. https://press.princeton.edu/books/hardcover/9780691175164/economics-for-the-common-good.
Viljoen, Salomé. 2020. “Data as Property.” Phenomenal World, October 16, 2020. https://phenomenalworld.org/analysis/data-as-property.
Wiebe, Andreas. 2016. “Protection of Industrial Data: A New Property Right for the Digital Economy?” GRUR Int 10/2016 (October): 877–83. http://www.grur.org/uploads/media/GRURInt_2016_10_Inhalt_fertig.pdf.
Zech, Herbert. 2016. “Data as a Tradeable Commodity.” In European Contract Law and the Digital Single Market: The Implications of the Digital Revolution, edited by Alberto De Franceschi, 51–80. Cambridge, UK: Intersentia. https://doi.org/10.1017/9781780685212.004.
Zuboff, Shoshana. 2019. The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. New York: PublicAffairs.
CHAPTER 7
Creating value in the data economy: The role of competition, trade, and tax policy

Main messages
1. The expanding role of data in ubiquitous platform business models is reshaping competition, trade, and taxation in the real economy, posing important risks for low- and middle-income countries.
2. The way countries design safeguards and enablers for data will have knock-on effects for the real economy. For example, enabling data sharing among market players can play a valuable role in promoting competition. At the same time, the stringency of data safeguards will shape cross-border trading patterns for data-enabled services. Meanwhile, the intangible nature of digital value chains is posing major challenges for tax revenue mobilization.
3. Low- and middle-income countries too often lack the institutional capacity to manage the economic policy challenges posed by the data-driven economy. These challenges call for agile competition policies and modern trade and tax administrations. Complicating matters, policies on competition, trade, and taxation are significantly intertwined.
4. Internationally coordinated action on antitrust enforcement, regulation of platform firms, data standards, trade agreements, and tax policy is critical to ensuring efficient, equitable policies for the data economy that respond to countries’ needs and interests.
Shaping data regulation to support competition, trade, and taxation

Rapid technological innovation and the associated explosive production of data are reshaping the business landscape (see chapter 3). New data-driven businesses, with their intangible assets such as data and algorithms, are rapidly gaining ground in markets worldwide. These include the global players that often make headlines, such as Alibaba, Amazon, Facebook, and Alphabet (Google). They also include more local platform businesses in lower-income countries. Examples are the Nigerian agricultural platform Hello Tractor, which matches smallholder farmers with underutilized tractors, and the Egyptian educational platform Tutorama, which matches students with high-quality tutors.

Data-driven businesses exhibit idiosyncrasies that distinguish them from more traditional firms. Notable among these are their greater propensity for market dominance in some cases, their ability to achieve scale without mass, and the intangibility of their transactions (figure 7.1). Each of these characteristics poses important challenges for economic outcomes in competition, trade, and taxation, potentially offsetting some of their positive effects.

Using data as an input to the production process can give businesses a competitive advantage. Because of the economies of scale and scope associated with data and the strong network effects arising from platforms, this advantage enables businesses to entrench their market position and potentially exercise market power. Data-driven markets tend to exhibit economies of scale for two reasons: technical infrastructure carries a large upfront fixed cost, and large datasets offer an advantage for learning. By accumulating tremendous amounts of data through transactions and applying algorithms, businesses are able to provide their clients with customized services and products. For example, e-commerce platforms tailor product suggestions to their clients’ shopping history. Because such platforms “understand” their customers better, this reinforces the tendency of customers to stick with them.

Figure 7.1 The unusual characteristics of data-driven businesses pose complex challenges for policy makers in the areas of competition, trade, and taxation
[Figure: Four characteristics of data-driven businesses (data as a factor of production, propensity for market dominance, scale without mass, and intangibility of transactions) affect competition between firms and trade between countries. Entry of new firms and competition dynamics drive the ability to export; trade in digital services can drive competition in importing countries.]
Competition policy
• Adapt antitrust rules to challenges posed by data-driven businesses
• Adopt pro-competition data remedies subject to safeguards for innovation
• Ensure that traditional regulations promote competition and consumer choice
Tax policy
• Create a level playing field between digital and analog businesses
• Adapt indirect tax administration to platform businesses
• Adapt direct tax rules to account for intangible value drivers
Trade policy
• Establish regulatory framework for cross-border data flows
• Balance individual rights, implementation costs, and business opportunities
• Adopt international (trade) agreements for data
Source: WDR 2021 team.
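The fixed-cost argument for economies of scale in data-driven markets can be made concrete with a stylized cost function. This is an illustrative assumption, not a model used in this Report. Suppose a platform pays a fixed infrastructure cost $F$ and a constant marginal cost $c$ per user served, so the total cost of serving $q$ users is $C(q) = F + cq$. Average cost then falls steadily as the user base grows:

```latex
AC(q) = \frac{C(q)}{q} = \frac{F}{q} + c,
\qquad
\frac{d\,AC(q)}{dq} = -\frac{F}{q^{2}} < 0 \quad \text{for } F > 0,\ q > 0 .
```

Take the hypothetical values $F = 100$ and $c = 1$. Average cost is 11 per user at $q = 10$ but only 1.1 per user at $q = 1{,}000$. A large platform can therefore profitably charge any price between these two levels, while a small rival charging the same price would make a loss. The learning advantage of large datasets, which this simple form omits, would widen the gap further.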
  • 249. Creating value in the data economy: The role of competition, trade, and tax policy | 229 Positive network effects further strengthen market positions because third-party suppliers prefer to display their products on more highly frequented platforms where the odds are higher they will make a sale. Customers, in turn, are more likely to visit such platforms because of the wider choice of third-party suppliers. Because data-driven businesses rely on value ­ drivers such as data and algorithms that can be infinitely replicated and instantly distributed around the world, they are able to achieve scale without mass. They can, then, be a major player in a market without having any brick-and-mortar facilities, local employ- ees, or even domestic business registration. For example, Facebook has more than 2.6 billion active visitors each month spread over nearly every country in the world, but it maintains office locations in just 35 countries.1 Similarly, AliExpress, a Chinese-based e-commerce platform launched in 2010, now has users in more than 230 countries and regions.2 It, too, relies on data and algorithms rather than offline retail stores. Meanwhile, the cross-border trade in digital services is climbing. These services, largely intan- gible, include social media, big data analytics, cloud computing, and online services. The changing nature of data-driven businesses challenges traditional economic policies, calling for adaptation and paradigm shifts (figure 7.1). In compe- tition policy, it is important to address the tendency of data-driven businesses to tip toward concentrated market structures and entrenched market power. However,thecomplexandnovelnatureofdata-driven business models can pose difficulties for traditional antitrust regulation in all countries. 
At the same time, the rapid growth of international trade in data-driven services is raising tensions between the need for data to flow across borders and the need to ensure that any personal data involved in such trade are adequately protected as they move into other jurisdictions. Taxation authorities are struggling to value, map, and track digital value chains in the data-driven sector. Administration of the value added tax (VAT) is complicated by third-party vendors operating through platform businesses. Meanwhile, the prevalence of scale without mass is challenging the long-standing taxation principle of permanent establishment and physical presence as a basis for establishing corporate tax liability.
Domestic policies relating to competition, trade, and taxation are significantly intertwined, requiring a coordinated approach to policy reform. Effective antitrust policies strengthen competition in the domestic market, thereby enhancing competitiveness in international trade. Governance frameworks that support trade in data-enabled goods and services may at the same time intensify competitive pressures in the local economy. For platform businesses, however, trade openness could instead lead to greater market concentration as the dominance of global firms is further reinforced by access to new markets and associated data. At the same time, the inability to tax platform businesses effectively may undermine competition between local and foreign firms, as well as between digital and analog businesses operating in the same sectors, affecting both competition in domestic markets and competitiveness in global markets.
All three of these policy areas call for internationally coordinated action. Antitrust measures in one country can affect the fortunes of globally active firms, with spillover effects for other jurisdictions.
Trade agreements, as well as global harmonization of technical standards for data sharing, play an essential role in promoting international commerce in data-enabled goods and services. International tax policy regimes determine the allocation of taxation rights across countries, while cross-border cooperation on tax administration can help ensure that revenues are fully captured.
This chapter discusses the policy challenges and responses arising from competition, trade, and taxation. A central theme is how these policies interact with the broader legal and regulatory framework on data safeguards and enablers described in chapter 6. Policy makers should heed the two-way linkages between economic outcomes and data regulation and carefully weigh the trade-offs that may arise between safeguarding and enabling data sharing, on the one hand, and advancing wider economic goals such as productivity, competitiveness, and growth, on the other.
Competition policy
The rise of data-driven businesses can create pro-development market opportunities, but data can also give firms a competitive advantage that may push markets into entrenched concentration and market power. This advantage increases the risk of excluding smaller firms and entrepreneurs and of exploiting individual users of data (chapter 3). The key challenge for policy makers is to preserve the positive externalities that create value in data-driven markets, while ensuring that these externalities can be harnessed by all players in a competitive, vibrant ecosystem without violating the rights of individuals. This chapter focuses on data-driven platform firms because of their pertinence across economies
at different levels of development. The complex interactions between data protection concerns (outlined in chapter 6) and competition concerns (described in this chapter) are further explored in spotlight 7.1.
Governments have two complementary competition policy tools to safeguard against the risks of excluding smaller firms and exploiting individual users of data. The first tool is enforcement of antitrust laws, with adaptations to the context of data-driven businesses. Enforcement involves detecting and punishing anticompetitive practices (in which a firm abuses its dominant position or a group of firms enters into an anticompetitive agreement) or preventing anticompetitive mergers. The second tool is the design of regulations to allow data-driven firms to enter markets and compete on a level playing field, while also protecting users. These ex ante policies lay down ground rules for the market to promote competition and could include data governance regulations, regulations directed at large data-driven platforms, and traditional sector regulations. Because these two competition policy tools complement one another, they can be applied in parallel, depending on the institutional setting and the issues to be tackled.
Enforcing antitrust laws
Antitrust investigations in the data economy are not just a developed country phenomenon. Worldwide, as of January 2020 some 102 antitrust cases across 16 different sectors on abuse of dominance, anticompetitive agreements, and mergers had been finalized.3 European antitrust authorities have finalized the most cases (33 percent), followed by authorities in East Asia and the Pacific (18 percent) and Latin America (15 percent). The most active lower-middle-income countries included the Arab Republic of Egypt, India, and Kenya, with more cases still under investigation in Indonesia, Nigeria, and Zimbabwe.
Low-income jurisdictions had not yet finalized any antitrust cases involving digital platforms, likely reflecting the absence of functional or well-resourced antitrust regimes and lower policy prioritization.4
Among landmark cases from the middle-income country group is the 2018 investigation by the Competition Commission of India finding that Google abused its dominant position in web search and advertising. Google favored its own services and partners through manual manipulation of its search algorithm, thereby putting smaller businesses at a disadvantage.5 In another salient example, Mexico’s Federal Economic Competition Commission (COFECE) blocked Walmart’s proposed acquisition of the Cornershop app in 2019 because the new company would have had access to data on the sales of competing retailers through the Cornershop platform, which was believed to prejudice smaller rivals.6
Antitrust cases related to the digital economy in e-commerce, passenger transport, and operating system application development account for more than half of cases globally (figure 7.2). E-commerce cases account for about 20 percent of total cases sampled and are equally prevalent in middle- and high-income jurisdictions. Cases related to the passenger transport sector are the most prevalent in middle-income countries, where they make up one-third of all cases, reflecting in part a wave of mergers between ride-hailing firms in middle-income countries in recent years.
Figure 7.2 In the digital economy, antitrust cases related to passenger transport are more prevalent in middle-income countries than in high-income countries
a. High-income countries (% of total cases)
Retail e-commerce: 21
Others: 15
Online search and advertising: 15
Tourism (accommodation): 13
Social media and communications: 10
Software and operating systems: 15
Passenger transport: 8
Online real estate platforms: 5
b. Middle-income countries (% of total cases)
Retail e-commerce: 23
Software and operating systems: 18
Others: 10
Passenger transport: 35
Mobile financial services: 8
Online search and advertising: 8
Source: Nyman and Barajas, forthcoming. Data at http://bit.do/WDR2021-Fig-7_2.
Note: Percentages are based on information for all antitrust cases involving digital platforms globally for which information was publicly available as of January 2020. Cases total 62 in high-income countries and 40 in middle-income countries. Panel a: “Others” includes digital music, e-books, educational materials, food delivery, online comparison platforms, dating platforms, and ticketing. Panel b: “Others” includes online delivery services, ticketing, and tourism. No cases were finalized in low-income countries, according to publicly available information.
Abuse of dominance cases are more prevalent in middle-income countries, while cases on anticompetitive agreements are more common in high-income countries. The type of anticompetitive behavior by data-driven platforms in different sectors may depend on the nature of their business models. In the e-commerce and tourism sectors, cases of vertical restraints (agreements between firms at different levels of the value chain that constrain competition) predominate, potentially reflecting the reliance on small businesses to provide products and capacity in these sectors (figure 7.3). In online search and advertising and in software and operating systems, cases of abuse of dominance are more common, likely because of these firms’ reliance on self-preferencing algorithms. In passenger transport, collusion cases (agreements between firms at the same level of the value chain to fix market parameters) have been the most frequent, which could stem from the scope for applying pricing algorithms in this sector.7
Adapting antitrust tools.
New market dynamics arising from data-driven markets have spurred policy makers to rethink their approaches to antitrust enforcement, with jurisdictions around the world devising new strategies and articulating new guidance.8 Debate and refinement continue, particularly on issues such as defining “relevant markets” and determining “dominance.” For example, Kenya’s competition authority recently published new Market Definition Guidelines to capture trends in data-driven markets. Such guidelines can be an effective way for authorities to begin tackling these issues, while giving firms clarity on the approach that will be taken to regulate their conduct and on the factors that will be assessed in antitrust cases.
Conventional antitrust struggles with the data-driven economy because it departs from traditional markets in several ways. Salient challenges include how to assess consumer harm in markets in which goods and services are nominally provided for “free,” how to address collusive algorithms (see chapter 3), and how to account for nonprice dimensions of competition such as privacy. The multisided nature of data-driven platforms also means that interactions among groups of users (including advertisers, in some cases) complicate the definition of markets and raise the possibility of cross-subsidization across different sides of the platform. This cross-subsidization includes advertising revenues effectively covering the cost of nominally “free” services provided to platform users. It is precisely the user data collected and processed by these platforms that make advertising on them so valuable, because they allow better targeting (see chapter 3).
Figure 7.3 Among anticompetitive practices, abuse of dominance is more widespread worldwide across multiple sectors of the digital economy
[Bar chart: number of antitrust cases by sector (e-commerce; passenger transport; software and operating systems; online search and advertising; tourism (accommodation); online real estate platforms; e-books; social media and communications services; mobile financial services; ticketing; online delivery service), broken down into abuse of dominance, collusion, and vertical restraints.]
Source: Nyman and Barajas, forthcoming. Data at http://bit.do/WDR2021-Fig-7_3.
Note: Based on information for all antitrust cases involving digital platforms globally for which information was publicly available as of January 2020.
The multisided nature of platforms also means that, in addition to selling its own services, a platform acts as a buyer of services or even as a de facto employer of individuals in the gig economy, such as drivers, household workers, or professionals who provide services. The potential imbalance in bargaining power between data-driven platform firms and individuals or small suppliers warrants the attention of policy makers. For example, in 2020 South Africa published regulations on protecting against abuse of buyer power that explicitly acknowledged the potential for such situations to arise in e-commerce markets and gig economy services.9 Some competition authorities have also begun to consider adopting rules against abuse of a superior bargaining position.10
Platform firms typically exist in a digital ecosystem, where providers of complementary digital products interconnect and regularly exchange data to provide consumer products (such as the use of Google Maps by digital transport apps). To the extent that these complementor firms may also act as nascent competitors of larger platform firms, the effect of competition restrictions on these complementary products is an important consideration beyond the direct effect on the users of a platform. The possibility that platforms acquire potential competitors in complementary markets before they can become a competitive threat, and then either shut them down or prevent further development of their products, has also become a topic of debate.11 The potential for firms to engage in such “killer” or “zombie” acquisitions may merit consideration in merger reviews.
Likewise, authorities should be increasingly alert to the possible harm to competition and innovation from mergers driven by the desire to acquire new data or data-relevant intellectual property such as algorithms.12 Under traditional antitrust regimes, mergers involving data-driven firms may be less likely to trigger a review by the antitrust authority because such firms typically do not have sufficient tangible assets or revenues to meet the traditional thresholds for merger notification.13 Although the urgency of these concerns for developing countries will depend on the start-up environment in a country, thresholds for merger notification could be revamped to allow antitrust authorities to review potentially anticompetitive mergers involving data-driven firms that may appear small but could rapidly become market challengers through exponential growth. Austria, Germany, and Japan have already done so, adopting complementary thresholds for digital markets based on transaction values. Other options for triggering reviews would be to require dominant firms to notify any planned acquisition in advance, or to shift presumptions for future mergers so that an acquisition by a dominant platform would be presumed anticompetitive unless the merging parties could show otherwise.14
Creating institutional capacity to assess cases in the data economy. Views vary as to whether specialized digital market authorities should be established to oversee competition in the digital economy (as formally proposed in the United Kingdom15 and informally discussed in the United States16). An increasing number of antitrust authorities in high-income economies have established teams specializing in the digital economy, but authorities in low- and middle-income countries lack the same capacity.
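The idea behind the complementary, transaction-value-based notification thresholds mentioned above can be sketched as a two-pronged test: a deal must be notified if it meets either the traditional turnover test or a test based on the price paid for the target, which can catch acquisitions of data-rich firms that have little revenue. This is a minimal sketch under stated assumptions: the threshold amounts, the field names, and the "active locally" condition are hypothetical and do not reproduce the rules of Austria, Germany, Japan, or any other jurisdiction.

```python
from dataclasses import dataclass

# Hypothetical thresholds (in a generic currency unit), for illustration only.
COMBINED_TURNOVER_THRESHOLD = 500_000_000
TARGET_TURNOVER_THRESHOLD = 25_000_000
TRANSACTION_VALUE_THRESHOLD = 400_000_000


@dataclass
class ProposedMerger:
    combined_domestic_turnover: float  # domestic turnover of all merging parties
    target_domestic_turnover: float    # domestic turnover of the firm being acquired
    transaction_value: float           # price paid for the target
    target_active_locally: bool        # hypothetical proxy, e.g. a sizable domestic user base


def meets_turnover_test(m: ProposedMerger) -> bool:
    """Traditional test: both the parties and the target must have sizable revenues."""
    return (m.combined_domestic_turnover >= COMBINED_TURNOVER_THRESHOLD
            and m.target_domestic_turnover >= TARGET_TURNOVER_THRESHOLD)


def meets_transaction_value_test(m: ProposedMerger) -> bool:
    """Complementary test: a high purchase price signals value not visible in revenues."""
    return m.transaction_value >= TRANSACTION_VALUE_THRESHOLD and m.target_active_locally


def notification_required(m: ProposedMerger) -> bool:
    # Either test on its own is enough to trigger a review.
    return meets_turnover_test(m) or meets_transaction_value_test(m)


# A large platform buying a fast-growing start-up that has little revenue:
startup_deal = ProposedMerger(2_000_000_000, 1_000_000, 1_000_000_000, True)
print(meets_turnover_test(startup_deal), notification_required(startup_deal))  # False True
```

Keeping the two tests as separate functions makes it easy to see which one triggered a review; an actual regime would add conditions (for example, how domestic activity is measured) that this sketch leaves out.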
A recent World Bank survey of eight competition authorities across developing regions found that none had specialized staff dedicated to cases in the digital economy, and almost all cited limited staffing as a key constraint on their ability to pursue competition issues in digital markets. Moreover, half the authorities surveyed acknowledged their lack of understanding of platform business models.17 The shortage of capacity and resources among recently formed competition authorities contrasts with the abundant resources of large global digital firms. Policy makers must take this imbalance into account when considering the resourcing and institutional setup required to combat competition issues in the data-driven economy. It further strengthens the case for international cooperation as well as for ex ante measures to prevent harm from occurring in the first place (covered later in this section).
As for the substance of investigations, antitrust investigations of platform businesses are increasingly considering features specific to data-driven markets. But the factors most frequently assessed in antitrust cases involving platforms tend to be more traditional: competition from other platforms and the prices of goods and services. Although these factors often remain relevant, other, more novel issues posed by these markets appear to be assessed less systematically by agencies. Network effects and multihoming behavior by consumers (use of multiple platforms for the same service) are explicitly mentioned in about 40 percent of cases across all jurisdictions. Whether lack of access to data constitutes a barrier to entry or an essential input is considered in 27 percent of all cases, but issues with algorithms are covered in only 13 percent. Data protection and privacy factors were raised as issues in only 6 percent of cases overall. Building capacity
within regulatory authorities would help address these challenges.
Some antitrust authorities are using data analytics to enhance their capability to detect competition issues in online markets. For example, the Russian Federation’s antitrust authority has created an automated system to screen for bid rigging by analyzing data from public tenders to identify signs of collusion. Similarly, Brazil’s competition authority has created a cartel screening unit to collect and analyze data, including through web scraping. In the context of COVID-19, competition authorities have encouraged e-commerce platform firms to monitor for excessive pricing practices by their sellers. For example, Amazon suspended thousands of sellers in the United States alone,18 and Jumia delisted hundreds of products in Nigeria whose prices had been raised in response to the pandemic.19
Remedying harm caused by anticompetitive firm behavior. A consensus is growing that simply ending anticompetitive behavior and imposing fines are insufficient. Antitrust authorities are moving toward designing remedies that can help restore competition in the market. Ordering an end to practices such as anticompetitive exclusivity clauses and the anticompetitive tying or bundling of products can encourage multihoming by consumers and thereby facilitate access to data by a broader range of competitors. Elsewhere, further measures specifically targeting data and algorithms may be necessary. But these remedies can be difficult to design and monitor, especially when the algorithms used by the firm are not discernible.
In the Uber-Careem merger in Egypt, for example, Uber was obliged to grant future competitors onetime access to Careem’s “points of interest map data”; to grant current competitors access to trip data (including rider and driver information), subject to data protection laws; and to give riders access to their own data. Uber also committed to removing exclusivity requirements in contracts with drivers to prevent them from being locked into the merged platform. A different approach was taken in the Uber-Grab merger in Singapore. There, the competition authority considered mandating transferability of driver data between apps but abandoned the idea after a survey of potential entrants suggested that the lack of such data was not an impediment to their entry.20 This case reinforces the need to design remedies case by case and to link them clearly to a theory of harm, in order to guard against remedies that are detrimental to innovation without significantly improving competition.
Cooperating across borders. Because of the global nature of many platform businesses, antitrust decisions taken in one jurisdiction often have spillover effects in other countries. For example, in Germany, as part of a remedy to respond to competition concerns from third-party sellers, Amazon agreed to amend its terms of business for sellers on Amazon’s online marketplaces across Europe, North America, and Asia. Among the changes was a reduction in Amazon’s (previously extensive) rights to use data on the products of third parties.21 Such cross-country benefits could be further leveraged by promoting international cooperation among antitrust authorities, thereby creating a more predictable regulatory environment for firms. To date, the European Union (EU) is alone in having a substantial track record in competition enforcement in digital markets at the regional level.
However, several regional competition authorities, such as the Competition Commission of the Common Market for Eastern and Southern Africa (COMESA) free trade area, are becoming more active, although they have yet to take on a digital markets case. More informal collaboration is also occurring among competition authorities. For example, the BRICS countries (Brazil, Russia, India, China, and South Africa) released a joint report on the digital economy that calls for increased cooperation among their authorities.22 Antitrust agencies in the Group of Seven (G-7) also released a “common understanding of the challenges posed by the digital economy” in 2019, addressing the need for international cooperation.23 Emerging free trade agreements, such as the African Continental Free Trade Agreement, which is likely to have a digital focus, could also foster harmonization of competition policy for the data-driven economy through their competition policy protocols.
Promoting competition through regulation ex ante
Ex ante market regulations that promote competition by enabling data use may be just as important as antitrust remedies. Indeed, given the durable nature of market power in data-driven markets and the challenges of designing antitrust remedies, ensuring that markets do not slip into entrenched market power in the first place may be even more important than enforcing antitrust rules ex post. In countries without a developed antitrust regime, including many low-income countries, this is the only line of policy response available to foster competitive data-driven markets.
Adopting mandatory and voluntary schemes to improve access to data. Governments are considering various regulatory options to ease access to data in digital ecosystems. However, such options remain relatively untested, and evidence on their efficacy is still scarce.
Options include facilitating multihoming; extending the right to portability of personal data (in essence, the right to move personal data between different
controllers); facilitating data interoperability (the ability of different systems to share and use data in a coordinated, timely manner); and encouraging data sharing or pooling schemes (in which two or more firms agree to merge their data for access by themselves and possibly third parties).
Relatively few jurisdictions have put mandatory portability and interoperability schemes into practice, although a few instructive models are emerging. The European Union has been at the forefront of this push, although it is not yet clear how effectively these schemes can be enforced. The right to portability of personal data is contained in the European Union’s General Data Protection Regulation (GDPR), whereas its 2019 regulation on the free flow of nonpersonal data should be important for firms that rely on machine data.24 Kenya, Mexico, Nigeria, the Philippines, and Thailand are among the other jurisdictions that have put in place rights to portability, although, as with the GDPR, these regimes have yet to be tested.
Measures to mandate interoperability go beyond portability rights; they also aim to ease the sharing of data from a technical perspective. This matters where continual access to data is required.25 An example is the implementation of “open banking” regimes, whereby financial service providers are mandated to share data on user accounts with third parties through open application programming interfaces (APIs). Banking data are well suited to data sharing initiatives because they are relatively homogeneous and standardized, and the concept of open banking is now well established in Europe.
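Mandated account-data sharing of the kind described above is often paired with customer consent that limits which third party may receive which data, and for how long. The sketch below illustrates only that gatekeeping logic; the record layout, the scope names, the provider identifiers, and the function `share_account_data` are all hypothetical and do not correspond to any specific open banking standard or bank API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for a bank's account records (not a real API).
ACCOUNTS = {
    "cust-001": {
        "balance": 1250.00,
        "transactions": [("2021-01-05", -40.00), ("2021-01-07", 900.00)],
        "home_address": "12 Example Street",
    }
}


@dataclass(frozen=True)
class Consent:
    customer_id: str
    provider_id: str      # third-party provider the customer authorized
    scopes: frozenset     # data categories the customer agreed to share
    expires_at: datetime  # consent is time-limited


def share_account_data(consent, provider_id, requested, now=None):
    """Return only the requested fields, and only if the consent covers them."""
    if now is None:
        now = datetime.now(timezone.utc)
    if provider_id != consent.provider_id:
        raise PermissionError("consent was not granted to this provider")
    if now >= consent.expires_at:
        raise PermissionError("consent has expired")
    out_of_scope = set(requested) - consent.scopes
    if out_of_scope:
        raise PermissionError(f"not covered by consent: {sorted(out_of_scope)}")
    record = ACCOUNTS[consent.customer_id]
    return {field: record[field] for field in requested}


now = datetime(2021, 3, 1, tzinfo=timezone.utc)
consent = Consent("cust-001", "tpp-42", frozenset({"balance", "transactions"}),
                  expires_at=now + timedelta(days=90))
print(share_account_data(consent, "tpp-42", ["balance"], now))  # {'balance': 1250.0}
```

Requests for fields outside the agreed scopes, from a different provider, or after expiry all raise `PermissionError`, which mirrors the principle that data should not be used beyond the terms the customer agreed to.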
The United Kingdom’s open banking initiative seems to have been particularly successful in spurring market entry and innovation, with 134 third-party providers currently registered and supplying services.26 At least nine other jurisdictions also have emerging open banking regimes in place.27 In Brazil, the central bank published a draft regulation on open banking in 2019. In India, although there is no mandatory open banking regime, policy measures to encourage data sharing have been introduced. These include safeguards such as the creation of “account aggregators” to ensure that individuals consent to use of their financial data and that data are not used beyond the agreed terms.28 Beyond the realm of open banking, in Mexico several revisions of the Fintech Law have been adopted to enable data sharing, including facilitation of access to user data and regulation of the fees banks can charge for sharing user data.29
Voluntary industry data access schemes are also operating around the world. The Open Ag Data Alliance, an industrywide project, aims to standardize the way in which farmers’ data are shared with larger firms (such as those developing precision agriculture models).30 In Kenya, the One Million Farmer Platform connects agtech players across 14 value chains to take advantage of large-scale shared data collection and digitized farmer profiles.31 In Nigeria, financial industry players have voluntarily formed an alliance to develop open banking APIs.32 The Solid project, launched in 2018, aims to provide open-source software that allows users to fully own their data and allows developers to create decentralized apps that run on that data. Meanwhile, Apple, Facebook, Microsoft, and Twitter are developing the Data Transfer Project, an open-source initiative to enable portability of some types of user data among participating platforms, although its impact on competition has yet to be seen.
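Portability initiatives of the kind described above rest on a simple idea: each service translates its own data into, and out of, a shared format, so that users can move data without every pair of services building a custom converter. The sketch below shows that adapter pattern with two hypothetical platforms; the record fields, the "photos-v1" schema label, and both functions are illustrative assumptions and are not the Data Transfer Project's actual data models or APIs.

```python
import json

# Hypothetical internal format of platform A (platform B stores the same data differently).
platform_a_records = [
    {"img_id": 17, "caption": "Market day", "taken": "2020-06-01"},
    {"img_id": 18, "caption": "Harvest", "taken": "2020-09-12"},
]


def export_from_a(records):
    """Adapter: platform A's internal format -> a shared, documented JSON format."""
    common = [{"title": r["caption"], "date": r["taken"]} for r in records]
    return json.dumps({"schema": "photos-v1", "items": common})


def import_to_b(payload):
    """Adapter: shared format -> platform B's internal format."""
    data = json.loads(payload)
    if data.get("schema") != "photos-v1":
        raise ValueError("unsupported schema version")
    return [{"name": item["title"], "created_on": item["date"]} for item in data["items"]]


payload = export_from_a(platform_a_records)
print(import_to_b(payload)[0])  # {'name': 'Market day', 'created_on': '2020-06-01'}
```

With a shared format, n participating services need one exporter and one importer each (2n adapters) rather than a separate converter for every ordered pair of services (n(n − 1)), which is one reason technical standardization matters for making portability rights usable in practice.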
When jurisdictions seek to impose mandatory data sharing regulations, they must design such schemes carefully to avoid distortive effects such as stifling incentives to innovate, facilitating collusion through excessive transparency of firms’ strategic variables such as prices, and unduly raising the cost of doing business. The immediate benefits of mandating data sharing need to be balanced against the possibility that it would reduce incentives for those sharing data to invest in data collection and for competitors receiving data to build their own collection capacity.
To safeguard incentives for innovation, it may be useful to examine whether the data to be shared have the features of an “essential facility” (akin to the “essential facilities doctrine” framework used to regulate sharing of infrastructure). Such an examination would require careful economic analysis of whether the data in question are an essential input that competitors cannot replicate or substitute. Because data, unlike physical infrastructure, are nonrivalrous (one firm’s use does not diminish what remains for others), the argument for imposing access requirements should in theory be stronger. Nonetheless, the design of such requirements would still have to carefully consider future market dynamics, including incentives for innovation; whether the remedy should be time-bound and limited to markets where a competition issue has been identified; and whether different firms should have different obligations according to their market position. Regulators should also ensure that shared data are kept secure and that, when personal data are involved, the exchange is carried out in compliance with data protection laws.
Regulating the structure and behavior of data-driven platform firms. Calls to regulate large data-driven firms
ex ante are gaining ground and merit consideration. Moves in this direction must be targeted at remedying specific competition bottlenecks and should be grounded in sound economic analysis (for example, remedies around data sharing should be based on a finding that a lack of data is indeed preventing firms from competing). The possibility of regulating large platforms as essential facilities (akin to telecom regulation) may be promising. However, it should be conditional on a platform or its data constituting a bottleneck or an unavoidable trading partner for other firms, and on the data in question being an essential, nonreplicable input for potential competitors.
Another area of discussion has been the possibility of breaking up large data-driven platform firms.33 Because this solution is highly interventionist, it should be considered only when the firm in question holds market power believed to be entrenched and durable. It is also important to consider that the network effects that initially led to the firm’s dominance may persist in each of the individual market segments after the firm is broken up. There is no guarantee that simply separating a firm’s segments will overcome these network effects and allow expansion by others. Nonetheless, structural solutions may be relevant to solve some issues typically seen with platform firms. Preventing firms from operating in multiple markets would mitigate the risk of firms leveraging economies of scope from data insights across multiple markets. It would also address the risk of vertically integrated firms giving preference to their own products. All this needs to be weighed against the benefits consumers may experience from economies of scale and scope that arise when service providers participate in multiple markets.
Other ex ante regulations could target the “quality” dimensions of services provided by data-driven firms, particularly the protections afforded to individuals on the collection and use of their data (if not already adequately covered by the data protection regime). They could also address the terms applied to the (often small) suppliers that participate in these platforms. In this vein, regulators could consider providing smaller firms with access to platforms or prominence in their rankings on a fair, consistent, and transparent basis. For example, the European Union enacted a platform-to-business regulation in 2019 that requires a platform to make its terms and conditions easily available to businesses that trade on the platform, including disclosure of the conditions under which either party may access data generated by or provided to the platform, as well as explanations of the ranking algorithms employed.34
Leveraging offline regulation. For some data-driven businesses, the key to being able to enter and compete lies less in data remedies than in other aspects of regulation, including “offline” regulation. In some countries, new regulations are being imposed to protect traditional or incumbent players. In Morocco and Tunisia, state-owned enterprises are not subject to the same data protection obligations that are binding on the private sector.35 New e-commerce rules in India that prohibit foreign firms from selling their own products on their platforms are intended to protect domestic retailers against risks of exclusion (reflecting the line of regulatory reasoning on preventing self-preferencing noted earlier). However, because these regulations target only foreign firms and are not predicated on the firm holding a dominant position, they may be creating an unlevel playing field beyond what is needed to prevent adverse outcomes.
Regulations that ban the entry of data-driven business models are an obvious example of offline regulatory restrictions. Following protests from incumbent players, Uber was blocked from entering a range of countries, including Bulgaria (for a time), Italy, and Spain.36 In response, the competition authorities of at least 24 countries, including Brazil, Colombia, Indonesia, Kenya, Mexico, and Peru, have advocated against disproportionate restrictions on transport platforms. In China, the government helped to resolve uncertainty by legalizing ridesharing apps and establishing procedures to formally license drivers.37 In Mexico, the competition authority recommended that local governments recognize transportation services provided by platforms, leading to new regulations allowing them to operate fully in Mexico City and other localities.38
Still other regulations can raise the costs data-driven firms face to compete. For example, ridesharing regulations in Jordan specify that fares charged by ridesharing companies must be 15 percent higher than those of taxis, and that discounts may not bring fares below the tariff stipulated for taxis.39 In Egypt, drivers and vehicles working with ridesharing platforms are required to pay 25 percent higher registration fees and taxes than traditional taxis.40
Fintech is another sector in which the conditions favoring incumbents are gradually being dismantled. Some regulations have focused on unstructured supplementary service data (USSD) channels, which establish a real-time session between a mobile handset
and an application to generate a financial service and are considered an “essential facility” for many fintech providers. Regulators in Bangladesh, Colombia, Kenya, and Peru have issued regulations to open up third-party access to mobile network operators’ USSD channels after some operators strategically restricted access to those channels to dampen competition from potential rivals.41 Banks can also unfairly exclude nonbank rivals from payment settlement infrastructure, which is often owned or controlled by incumbent bank consortiums. In response to this issue, the People’s Bank of China created a separate clearinghouse for nonbank payment providers in 2018, in part to create a more level playing field.
Recommended reforms of competition policy
The recommendations for competition policy that follow are grouped according to a maturity model designed to reflect a country’s stage in the development process.
Establishing fundamentals
In low-income countries with limited institutional capacity, a pragmatic approach to competition policy that focuses on the essentials is warranted. It should build on related instruments that may already be in place, while developing institutional capacity for the future.
Create a level playing field for data-driven businesses. The first priority is to ensure that traditional regulations and policies do not hinder entry by data-driven firms or create an unlevel playing field between firms. This could include harmonizing the standards and requirements for the entry and operation of data-driven firms with those for traditional firms in competing markets (for example, ride-hailing platforms and regular taxis).
Build on existing sectoral regulations.
In regulated sectors with institutional capacity, such as telecom, energy, and banking, more targeted regulatory approaches to encouraging data access could be considered, building on international experience such as open banking.
Develop capacity for dealing with data-driven businesses. Governments should also invest in building an understanding of data-driven business models and data ecosystems in selected agencies, such as sector regulators, and among industry-related policy makers. As part of sectoral initiatives or industrial policies, policy makers could consult with the private sector, particularly start-ups, to understand firms’ data needs and data bottlenecks and to broker industry-led solutions for data sharing.
Accelerating data flows
In addition to the preceding recommendations, countries with more capable institutions could encourage or mandate data sharing for markets or circumstances in which it would be pro-competitive on balance and compliant with data protection policies for personal data.
Develop guidelines for portability and interoperability. Where portability is possible, consumers should be educated on how to exercise these rights. Where data sharing may be mandated under existing legal tools (such as competition law, market regulations, or license terms), policy makers could define an essential facilities–style framework and build the in