About the value of personal data
Abstract:
“Data” is becoming a key production factor, comparable in importance to capital, land, or labour in an increasingly digital economy. Due to the unique characteristics of “data” as an economic good, data markets are underperforming, concentrated and geographically imbalanced, which undermines the immense potential of machine learning and artificial intelligence.
Aiming to unlock this potential, numerous entities trading data over the Internet have entered the market, adopting at least ten different business models. They face problems in protecting data ownership, dealing with a fragmented market that lacks secure and sovereign exchange standards, devising well-founded data valuation models, and setting explainable prices.
In that direction, practitioners from different disciplines have proposed methodologies to calculate the value of data from different perspectives, responding to different contexts and needs. Still, measuring the data economy remains a titanic challenge that will require major efforts from the scientific community, the public sector and industry in the coming years.
Thanks to the development of artificial intelligence (AI) and to the massive adoption of machine learning (ML) models, “data” is becoming a key production factor, comparable in importance to capital, land, or labour. However, due to the unique characteristics of “data” as an economic good (a freely replicable, non-depletable asset holding a highly combinatorial and context-specific value [1]), companies are reluctant to share them, exchanges often take place ad hoc and through barter arrangements [2], and most valuable data assets still remain unexploited in corporate “silos”.
As a result, the so-called data economy flourishes around a restricted number of champions, horizontally integrated across the value chain [3], and shows a significant geographical imbalance [4]. Unsurprisingly, unleashing the potential of data in the economy has become a key policy objective in the European Union [5], which predicts the size of the data economy to reach €827 billion for the EU27 countries in 2025 [6]. Some analysts have calculated that the data economy could reach US$2.5 trillion globally in the same year [7], with an additional potential of more than US$13 trillion from AI by 2030 [8].
The massive collection and exploitation of personal data by digital firms in exchange for services, often with little or no consent, has raised a general concern about privacy and data protection [9]. Apart from spurring recent legislative developments in this direction [10, 11], this concern has prompted prominent voices to warn about the unsustainability of current digital economics, some of which propose paying people for their data in a sort of worldwide data labour market as a potential solution to this dilemma [12]. Some go as far as estimating that a radical new market of data as labour would transfer 9% of the data economy from data-driven companies to data owners, amounting to US$20,000 of yearly income for a family of four, while increasing the overall size of the economy by 3% [13]. This figure is far above the roughly US$1,000 per individual that results from dividing the market capitalisation of the data-driven market champions by the global population at the beginning of 2023.
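As a back-of-envelope illustration of that last figure, the short Python sketch below simply divides an assumed combined market capitalisation of the data-driven champions by an assumed global population; both inputs are rough assumptions chosen for illustration, not measured values:

# Back-of-envelope sketch of the per-individual figure mentioned above.
# Both inputs are illustrative assumptions, not measured data.

champions_market_cap_usd = 8.0e12   # assumed combined market cap of data-driven champions (~US$8 trillion, early 2023)
world_population = 8.0e9            # assumed world population (~8 billion)

per_individual_usd = champions_market_cap_usd / world_population
print(f"Market capitalisation per individual: US${per_individual_usd:,.0f}")
# -> Market capitalisation per individual: US$1,000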
A few models have been proposed to calculate the value of personal data [14], often resulting in disparate and apparently contradictory outcomes. These models follow heterogeneous methods, such as relying on the market capitalisation of data-driven firms, their turnover, or the net income of data providers; analysing unit prices per user or per data volume; or evaluating the cost of a data breach. Some models quantify the economic and social impact of personal data use cases [15]. Finally, other valuation techniques turn to user surveys to determine people's willingness to pay to protect their privacy [16].
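To illustrate how disparate those outcomes can be, the following Python sketch applies several of the methods listed above to a single hypothetical data-driven firm; every input figure is an assumption chosen for illustration, not data from any real company's accounts:

# Illustrative sketch: heterogeneous valuation methods applied to one hypothetical firm.
# All figures below are assumptions for illustration only.

ASSUMED_MARKET_CAP_USD = 500e9        # market capitalisation of a hypothetical data-driven firm
ASSUMED_ANNUAL_REVENUE_USD = 60e9     # yearly turnover
ASSUMED_NET_INCOME_USD = 15e9         # net income
ASSUMED_MONTHLY_ACTIVE_USERS = 2e9    # user base
ASSUMED_BREACH_COST_PER_RECORD = 150  # assumed cost of a data breach per exposed record (US$)

valuations = {
    "market cap per user": ASSUMED_MARKET_CAP_USD / ASSUMED_MONTHLY_ACTIVE_USERS,
    "revenue per user": ASSUMED_ANNUAL_REVENUE_USD / ASSUMED_MONTHLY_ACTIVE_USERS,
    "net income per user": ASSUMED_NET_INCOME_USD / ASSUMED_MONTHLY_ACTIVE_USERS,
    "breach cost per record": ASSUMED_BREACH_COST_PER_RECORD,
}

# The same "user" ends up valued anywhere from a few dollars to hundreds of dollars
for method, value in valuations.items():
    print(f"{method:>24}: US${value:,.2f}")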
In spite of the difficulties and challenges of trading such a peculiar economic good, there already exists a relevant B2B data market that extends well beyond personal data. A recent study of companies trading data over the Internet revealed more than 2,000 such entities and identified ten different business models. Such models include data and digital service providers, marketplaces embedded in data management platforms (Snowflake, Carto, Cognite), and data marketplaces (DMs) that aim to mediate between sellers and buyers and manage data transactions. Among the latter, general-purpose DMs (AWS, Advaneo, DataRade) intend to trade any kind of data and are being challenged by niche DMs that target specific industries, such as automotive (Caruso, Otonomo), energy and logistics (Veracity), or finance (Refinitiv, S&P). Other niche DMs focus on specific data types, such as real-time IoT sensor data (e.g., IOTA, Terbine), or cover data sourcing for specific purposes, such as feeding ML algorithms (e.g., Nokia DM, DefinedCrowd). In addition, personal information management systems (PIMS, like Digi.me, Meeco, ErnieApp, or Swash) leverage recent data protection legislation to empower end users to take control of their personal data, to help them exert the rights granted to them by law, and to manage their consent to share their personal data with third parties. Some studies have identified the key challenges that these companies face, as well as existing or emerging technology that may help overcome them [17, 18].
An interesting trend has been spotted towards the distribution or federation of data exchange platforms [19], which can also benefit from the growing processing capabilities of the cloud edge. By commodifying and specialising data trading, data markets are moving away from horizontally integrated, monolithic, siloed data providers and towards distributed “niche” exchange platforms (Ocean Protocol, Settlemint) [17], often leveraging blockchain and their own cryptocurrencies to manage and settle transactions, and relying on federated learning [20] to process data where it is stored (Nokia DM, Acuratio). Two meaningful initiatives targeting those objectives are receiving substantial support from governments and key industry players in Europe: International Data Spaces and the Gaia-X project.
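As a rough illustration of the “process data where it is stored” idea, the following Python sketch implements a minimal, textbook-style federated averaging loop; it is a generic sketch under simplifying assumptions (linear models, synthetic data), not the protocol of any marketplace named above:

# Minimal federated averaging (FedAvg) sketch: each data holder trains locally and
# only model parameters (never raw data) are shared and averaged by a coordinator.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local linear-regression training via gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Three data holders with private, synthetic datasets that never leave their premises
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

# Federated rounds: broadcast global weights, collect local updates, average them
global_w = np.zeros(2)
for _ in range(10):
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)

print("Learned weights:", np.round(global_w, 2))  # close to [2.0, -1.0]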
Data pricing remains a relevant problem that has long attracted the attention of researchers from very different disciplines [21]. Different schools resort to disparate techniques, such as running auctions, measuring the quality of data as a weighted sum of features, comparing the information provided by different queries over a single database, or quantifying the loss of privacy incurred by sharing a piece of data or the decreasing utility of noisy versions of a dataset. A recent study has gathered and analysed information about more than 200,000 data products offered by 43 data providers and marketplaces, and has identified which categories of data are most popular, which of them command the highest prices, which data features sellers use to set the price of data products, and which features very valuable products have in common [22]. Based on this metadata, some works have managed to compare prices across data marketplaces, and ML regression models have been trained to learn the relationship between prices and metadata as a first step towards predicting them and increasing the transparency of data markets [23].
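As an illustration of that last approach, the sketch below trains a regression model on a synthetic catalogue of data products whose metadata features (volume, update frequency, coverage, presence of personal data) are hypothetical stand-ins; it does not reproduce the datasets or models of [22, 23]:

# Illustrative sketch: learning the relationship between data-product prices and metadata.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
n = 1000

# Hypothetical metadata features for data products
volume_gb = rng.lognormal(mean=2, sigma=1, size=n)   # dataset size
update_freq = rng.integers(1, 365, size=n)            # updates per year
n_countries = rng.integers(1, 50, size=n)             # geographic coverage
is_personal = rng.integers(0, 2, size=n)              # contains personal data

# Synthetic prices with noise (a stand-in for observed marketplace prices)
price = (5 * volume_gb + 2 * update_freq + 30 * n_countries
         + 500 * is_personal + rng.normal(scale=100, size=n))

X = np.column_stack([volume_gb, update_freq, n_countries, is_personal])
X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)
print("MAE on held-out products:",
      round(mean_absolute_error(y_test, model.predict(X_test)), 1))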
The 'value of data' is oftentimes linked to that of personal data, and more specifically to its application in marketing and advertising. Some notable works have measured the prices observed in online advertising to different user profiles [24, 25], and there are tools to calculate the revenue that users generate for social networks like Facebook [26].
Finally, measuring the value of personal data before acquiring them would help avoid the indiscriminate replication of data, most of which eventually turns out to be useless and is filtered out during the training process. Knowing the value of data beforehand allows buyers to select and purchase only the products that are useful for their specific purposes. This valuation is neither necessarily dependent on the volume of data nor easily calculated through heuristics [27, 28, 29]. On the contrary, it often requires an ad hoc valuation for the buyer’s specific task, a functionality that data marketplaces can offer to potential buyers seeking to control the efficiency of such pre-processing [30, 31].
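One common way to perform such task-specific valuation is to score each candidate dataset by its marginal contribution to the buyer's own validation task, as in the Python sketch below; the sellers, features and add-it-in heuristic are illustrative assumptions rather than the specific methods of [30, 31]:

# Sketch of task-specific data valuation via marginal contribution: score each candidate
# dataset by how much adding it to the buyer's training data improves validation accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

def make_dataset(n, informative=True):
    """Synthetic binary-classification data; 'uninformative' sellers provide noise labels."""
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int) if informative else rng.integers(0, 2, size=n)
    return X, y

buyer_X, buyer_y = make_dataset(200)   # buyer's own (small) training data
val_X, val_y = make_dataset(500)       # buyer's validation task
sellers = {"seller_A": make_dataset(300, informative=True),
           "seller_B": make_dataset(300, informative=False)}

def task_score(train_X, train_y):
    model = LogisticRegression().fit(train_X, train_y)
    return accuracy_score(val_y, model.predict(val_X))

baseline = task_score(buyer_X, buyer_y)
for name, (sX, sy) in sellers.items():
    gain = task_score(np.vstack([buyer_X, sX]), np.concatenate([buyer_y, sy])) - baseline
    print(f"{name}: marginal gain in validation accuracy = {gain:+.3f}")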
In conclusion, despite the huge efforts of industry and the scientific community, measuring the value of data and measuring the data economy remain major challenges from both a technical and an economic perspective [32]. Furthermore, there is a growing need to reach consensus on standard methodologies to calculate the value of personal data for accounting purposes, for valuing data-intensive enterprises, for compensating people, for setting up data taxes [33, 34], or simply for selecting the best data to feed an ML model [35]. I firmly believe this is a thriving research field that will require the joint efforts of practitioners from different disciplines in the years to come.
References:
Image "Matrix Code Computer" by 0fjd125gk87 licensed by Pixabay