Signal 08.01 (2024!): Original content powers the web, but those who create it are often the least rewarded. AI could make this exponentially worse
Welcome to the first Signal of 2024. I hope everyone had a nice break and managed to switch off.
The biggest media story over the last month has revolved around the issue of AI platforms and the debate over how they compensate original content creators and media companies for using their creative works in the training and evolution of their models.
The issue is simple. AI models require information to be ingested in order to create a base level of knowledge. A good model will need this information to be ingested and refreshed frequently. Sourced, vetted and professional content is an extremely important input for this if an AI platform is going to be able to create value that is based on timely, accurate and credible sources.
The stakes are high because the financial value involved is enormous. OpenAI is already valued north of $100 billion USD, but the other main businesses in the AI fight (Google, Amazon, Microsoft, Meta, Apple) have a collective current market cap north of $8 trillion USD, and many of them saw 50%+ share price growth in 2023 alone, fuelled by the forecast future value of their AI platforms.
In my view, without high quality content these AI platforms are more 'A' than 'I', which materially limits their ability to be useful and value creating. And in the scheme of trillions of dollars of wealth creation, it's pretty normal for those who create the content inputs to feel they have a right to compensation at a level commensurate with the technological inputs.
1. Enter the NY Times
OpenAI had brokered content usage agreements with a handful of publishing businesses across the world, but the NY Times lawsuit marks the most prominent action to date.
The Times' lawsuit is based around what it believes is "widescale copying" that allows ChatGPT users to ask the platform for article excerpts from NY Times paywalled stories and receive entire paragraphs of information copied verbatim.
The Times' premise is simple: OpenAI (and other AI companies) are scraping Times content without permission and without compensation, then using this content to power products that generate significant revenue and market cap. Web scraping has been around for as long as the web, but it is especially prevalent now and, according to Bloomberg Law, accounts for over 60% of web traffic in some categories such as travel and e-commerce.
Technology companies, many of which scrape others' content, are often very litigious in protecting their own content and platforms against scraping. Microsoft's LinkedIn and Meta have both engaged in legal cases to prevent others from scraping their services for uses that may generate financial or valuation gain for the other party. Google bans automated scraping for commercial purposes across its products.
The Times' lawsuit is focused on verbatim misuse of its copyrighted works. It alleges that scraping is occurring and that the information is then being parroted verbatim on ChatGPT. It also alleges that users of ChatGPT can easily use it to avoid the paywall and obtain articles that require payment.
2. The upcoming issue is more complex
The copying of verbatim excerpts is likely an easy technical problem to fix and is unlikely to be the enduring challenge.
The greater issue will be when an AI platform can aggregate or homogenise credible sources to create what appears to be original thinking but is blended from multiple sources.
For example, someone may ask for the 5 most impactful areas influencing the future of mining stocks, and ChatGPT (or another model) may come back with an answer informed by varied sources: it could be the AFR, Bloomberg, economic textbooks, etc. All of these inputs have created value (and the answer needs all of them), but how are they compensated?
Again, this area comes down to one key principle: the right of the content creators to obtain the same value for their work as the technology creators. In this instance the product is highly valuable: a nuanced and well considered piece of information delivered rapidly, using highly credible sources. But in the current climate the only party seeing any financial reward is the technology owner.
3. This has implications for any business with valuable IP
Right now this is a content creator issue, but ultimately the same principles of scraping and re-use apply to every single industry. So does the same challenge: a highly valuable technology company freely using IP and information that another company has paid to create (and often owns the copyright and/or trademarks on), solely for the technology company's benefit. Web scraping will take all the copy you've written, images you've paid for, designs you've created, colours you use, pricing, information, SKUs etc. and use it in a way that provides your business with no value but creates immense wealth elsewhere.
Don't get me wrong, I am not anti-AI. It has some excellent uses. But its intelligence is highly predicated on the intelligence of real people, and those inputs need to be protected.