Signal 08.01 (2024!): Original content powers the web, but those who create it are often the least rewarded. AI could make this exponentially worse

Ben Shepherd

General Manager Strategy + Special Projects // National Rugby League

Published Jan 8, 2024

Welcome to the first Signal of 2024. I hope everyone had a nice break and managed to switch off.

The biggest media story over the last month has been the debate over how AI platforms compensate original content creators and media companies for using their creative works to train and evolve their models.

The core issue is straightforward. AI models need to ingest information to build a base level of knowledge, and a good model needs that information refreshed frequently. Sourced, vetted and professional content is an extremely important input if an AI platform is going to create value based on timely, accurate and credible sources.

The stakes are high because the financial value involved is enormous. OpenAI is already valued north of $100 billion USD, and the other main businesses in the AI fight (Google, Amazon, Microsoft, Meta, Apple) have a collective current market cap north of $8 trillion USD, with many of them seeing 50%+ share price growth in 2023 alone, fueled by the forecast future value of their AI platforms.

In my view, without high quality content these AI platforms are more 'A' than 'I', which materially limits their ability to be useful and create value. And when trillions of dollars of wealth are being created, it's pretty natural for those who supply the content inputs to feel they have a right to compensation at a level commensurate with the technological inputs.

1. Enter the NY Times

OpenAI had negotiated content usage arrangements with a handful of publishing businesses across the world, but the NY Times lawsuit marks the most prominent action so far.

The Times' lawsuit is based around what it believes is "widescale copying" that allows ChatGPT users to ask the platform for excerpts from paywalled NY Times stories and receive entire paragraphs copied verbatim.

Axios summary of the lawsuit

The Times' premise is simple - OpenAI (and other AI companies) are scraping Times content without permission and without compensation, then using this content to power products that are generating significant revenue and market cap. Web scraping has been around for as long as the web - but it is especially prevalent now and, according to Bloomberg Law, accounts for over 60% of web traffic in some categories such as travel and e-commerce.

Technology companies, many of which scrape others' content, are often very litigious in protecting their own content and platforms against scraping. Microsoft's LinkedIn and Meta have both engaged in legal cases to prevent others from scraping their services in ways that may generate financial or valuation gain for the other party. Google bans automated scraping for commercial purposes across its products.

The Times' lawsuit is focused on verbatim misuse of its copyrighted works. It alleges that scraping is occurring and that the information is then being parroted verbatim on ChatGPT. It also alleges that users of ChatGPT can easily use it to avoid the paywall and obtain articles that require payment.

2. The upcoming issue is more complex

Verbatim copying of excerpts is likely an easy technical problem to fix and is unlikely to be the enduring challenge.

The greater issue will be when an AI platform can aggregate or homogenise credible sources to create what appears to be original thinking but is blended from multiple sources.

For example, someone may ask for the 5 most impactful areas influencing the future of mining stocks, and ChatGPT (or another model) may come back with an answer informed by varied sources - it could be the AFR, Bloomberg, economic textbooks, etc. All of these inputs have created value (and the answer needs all of them), but how are they compensated?

Again, this area comes down to one key principle - the right of the content creators to obtain the same value for their work as the technology creators. In this instance the above product is highly valuable - a nuanced and well considered piece of information delivered rapidly, using highly credible sources. But in the current climate the only party seeing any financial reward is the technology owner.

3. This has implications for any business with valuable IP

Right now this is a content creator issue, but ultimately the same principles of scraping and re-use apply to every single industry. So does the same challenge: a highly valuable technology company freely using IP and information that another company has paid to create (and often owns the copyright and/or trademarks on), solely for the technology company's benefit. Web scraping will take all the copy you've created, images you've paid for, design you've created, colours you use, pricing, information, SKUs etc. and use it in a way that provides your business with no value but creates immense wealth elsewhere.

Don't get me wrong, I am not anti-AI. It has some excellent uses. But its intelligence is highly predicated on the intelligence of real people, and those inputs need to be protected.
