Navigating the DeepSeek Buzz: What It Really Means for Your AI Strategy
Okay, let's talk about DeepSeek. The buzz around their R1 model has been deafening, and while the technical achievements are noteworthy, it's crucial for you to separate hype from reality, especially if you're making strategic decisions in the AI space.
You've likely heard that DeepSeek's R1 matches the performance of OpenAI's o1 reasoning model. As highlighted in a recent Lawfare article by Dean W. Ball:
"Language models can simply learn to think. This staggering fact about reality—that one can replace the very difficult problem of explicitly teaching a machine to think with the much more tractable problem of scaling up a machine learning model..." — Dean W. Ball, Lawfare, January 28, 2025
What DeepSeek and others have shown is that embedding reasoning into AI models through reinforcement learning is a real architectural advancement, one that benefits the entire AI community. But beyond the architecture, there's a meaningful difference in user experience. OpenAI hides the reasoning tokens in its o1 and o1-mini models; from a product-management perspective, the company apparently decided that exposing the model's chain of thought to end users was not optimal. They may have been mistaken. DeepSeek surfaces the reasoning tokens to "bring the user along," and that visible self-dialogue strikes many users as novel. It isn't, but kudos to DeepSeek for taking a different tack on an interesting LLM reasoning feature.
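For developers, the visible reasoning is also easy to work with programmatically. As a rough sketch (assuming R1-style output, where the open-weight R1 checkpoints wrap the chain of thought in `<think>...</think>` tags before the final answer), you could separate the two like this:

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Separate the visible chain of thought from the final answer.

    Assumes R1-style output, where reasoning appears inside
    <think>...</think> tags ahead of the user-facing answer.
    """
    match = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    if match is None:
        # No reasoning block present: treat the whole string as the answer.
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

raw = "<think>2 + 2: add the units digits.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```

This lets a product team make its own call on the UX question: show the reasoning, log it, or discard it.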
Let's put this into perspective. Media coverage has highlighted DeepSeek's supposed costs as "3-5% of the cost" of the larger OpenAI or Anthropic models. That is a claim about training costs, and likely a dramatically understated one. The story gets muddier when commentators conflate training with inference: Alex Kantrowitz, for example, inaccurately suggested on a recent podcast that inference is also significantly cheaper. In reality, the cost of inference (running the model in real-world applications) is nearly identical across similar-sized models, regardless of how they were trained.
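To make the training-versus-inference distinction concrete, here is a toy cost calculation. The prices are purely illustrative assumptions (check each provider's current price list); the point is that per-request inference cost is token volume times the per-token rate, no matter what the model's one-time training bill was:

```python
def inference_cost(prompt_tokens: int, completion_tokens: int,
                   in_price: float, out_price: float) -> float:
    """Dollar cost of one request; prices are in $ per 1M tokens."""
    return (prompt_tokens * in_price + completion_tokens * out_price) / 1_000_000

# Hypothetical per-1M-token prices, for illustration only.
# Note: reasoning models bill their chain-of-thought tokens as output,
# which inflates completion_tokens relative to non-reasoning models.
cost = inference_cost(prompt_tokens=2_000, completion_tokens=8_000,
                      in_price=0.55, out_price=2.19)
```

Swap in any provider's actual rates and the formula is unchanged, which is why inference economics track model size and token volume rather than training budget.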
What did DeepSeek achieve? They demonstrated that innovation can thrive under constraints. By optimizing their training process, they showed that you don't need the most expensive hardware to make meaningful progress in AI. However, as Reed Albergotti writes in Semafor:
"DeepSeek represents an offering—and in the grand scheme of things, a somewhat small one—of some good ideas on how to make AI models more efficient." — Semafor, January 28, 2025
It's an incremental improvement that, while not revolutionary, adds to our collective understanding of AI efficiency. However, there's speculation that DeepSeek relied on data from existing models developed at great expense by companies like OpenAI, which complicates the narrative around their cost savings.
Recent investigations suggest the cost savings narrative is even more complicated. As Reuters reported on January 29, 2025:
"Some technologists believe that DeepSeek's model may have learned from U.S. models to make some of its gains. The distillation technique involves having an older, more established and powerful AI model evaluate the quality of the answers coming out of a newer model, effectively transferring the older model's learnings."
This practice of distillation—while common in AI development—violates the terms of service of companies like OpenAI. In fact, OpenAI has confirmed they're investigating whether DeepSeek inappropriately distilled their models. Additionally, DeepSeek's paper acknowledges using Meta's Llama architecture for some of their distilled models, further complicating the narrative around their claimed innovations and cost efficiency.
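For readers unfamiliar with the term: distillation, in its simplest form, trains a smaller "student" model to match a larger "teacher" model's output distribution (the variant Reuters describes, fine-tuning on the teacher's generated answers, is simpler still). A toy sketch of the core loss term, in plain Python rather than any real training framework:

```python
import math

def kl_divergence(teacher_probs: list[float], student_probs: list[float]) -> float:
    """KL(teacher || student): the penalty a distillation loss applies
    when the student's next-token distribution drifts from the teacher's.
    Terms where the teacher assigns zero probability contribute nothing."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs)
               if t > 0)

# Toy next-token distributions over a 3-token vocabulary.
teacher = [0.7, 0.2, 0.1]
student = [0.6, 0.3, 0.1]
loss = kl_divergence(teacher, student)  # > 0 until the student matches
```

In a real pipeline this term (or a cross-entropy on teacher-generated text) is minimized over millions of examples, which is exactly why access to a strong teacher's outputs is so valuable and why providers restrict it in their terms of service.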
The real takeaway here is the importance of focusing on practical applications and efficiency in AI. As business leaders, you should be considering how to leverage AI technologies that are reliable, cost-effective, and aligned with your strategic goals. The DeepSeek episode is a reminder to stay grounded and critical amid the hype.
At Dataleaders.ai, we're committed to helping you navigate the evolving AI landscape with clarity and confidence. Our Gears platform delivers generative AI solutions that are both powerful and pragmatic, optimizing inference and real-world applicability. We understand that your focus is on driving tangible business outcomes, not just chasing the latest trends.
Our inbound sales VoiceAI agent uses the best available LLM for each task, chosen for the business benefit it delivers. Who knows... maybe DeepSeek R1 will become a useful option on the menu as the GenAI ecosystem evolves. If we do use it (and we have been tinkering with it since earlier this week), we'll do so through Groq, demonstrating how rapidly the AI ecosystem can adapt and collaborate to make new innovations accessible. Just this past weekend, Groq stood up DeepSeek-R1-Distill-Llama-70b on its ultra-fast LPU-powered GroqCloud API. Groq's own framing captures the value proposition:
Reasoning models are capable of complex decision-making with explicit reasoning chains that are part of the token output and used for decision-making, which makes ultra-low latency and fast inference essential. Complex problems often require multiple chains of reasoning tokens, where each step builds on previous results. Low latency compounds its benefit across those chains, turning minutes of reasoning into a response in seconds.
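Why does latency compound? The numbers below are hypothetical, but the arithmetic holds for any sequential chain where step N+1 cannot begin until step N's tokens have been generated:

```python
def chain_latency(steps: int, tokens_per_step: int,
                  tokens_per_second: float) -> float:
    """Wall-clock seconds for a multi-step reasoning chain in which
    each step must finish generating before the next one starts."""
    return steps * tokens_per_step / tokens_per_second

# Illustrative throughputs only; a 25x speedup cuts the same
# 5-step, 1,500-token-per-step chain from ~2 minutes to 5 seconds.
slow = chain_latency(steps=5, tokens_per_step=1_500, tokens_per_second=60)    # 125.0 s
fast = chain_latency(steps=5, tokens_per_step=1_500, tokens_per_second=1_500) # 5.0 s
```

Because the steps are sequential, a per-token throughput gain multiplies across the whole chain rather than shaving a fixed overhead, which is the case for fast inference hardware in reasoning workloads.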
Let's keep innovating and pushing boundaries, but let's also stay focused on what truly matters—making AI work for you in the real world. Each development, whether incremental or breakthrough, contributes to the exciting evolution of AI technology and its practical business applications.
#GenerativeAI #LLM #DeepSeek
Side note: DeepSeek's Janus image generators aren't as dialed in as DeepSeek R1. Using the same prompt (/imagine "a neon futuristic sign with the words 'DON'T BELIEVE ALL THE HYPE'") and seed image that produced this article's title image, the newly stood-up HuggingFace Janus WebGPU demo gave me this instead: 😆