3. The DeepSeek Shock on Nvidia Stock
• NVDA led the “biggest market drop in history”: a 17% fall, or a US$589
billion loss in market capitalization
4. The Performance. The Claim.
• DeepSeek’s reasoning model R1 and non-reasoning model V3 perform on
par with OpenAI’s o1 reasoning model and GPT-4o, respectively, at a
small fraction of the price.
5. Q1: What is DeepSeek?
• DeepSeek began in 2023 as a side project of founder Liang Wenfeng,
whose quantitative hedge fund, High-Flyer, was using AI to make
trading decisions
• Liang started accumulating thousands of Nvidia chips as early as 2021
• Liang: “A vision to change Chinese
companies from ‘following’ to ‘innovating’”
• The ultimate goal: to develop its own AGI
• Hiring from domestic universities
• The price of being on the Party’s radar?
6. Q2: Did DeepSeek really only spend <$6M to
develop its current models?
• Counting only the cost of the final successful training run?
• DeepSeek purchased 10,000 Nvidia A100 chips (first released in 2020, two
generations before the current Blackwell chip) before sales of A100s to
China were restricted in late 2023.
• It also acquired and maintained 50,000 Nvidia H800s, a slowed-down
version of the H100 (one generation before Blackwell) made for the
Chinese market.
• DeepSeek likely also had additional unlimited access to Chinese and
foreign cloud service providers, at least before the latter came under
U.S. export controls.
• Just the 10,000 Nvidia A100s alone would cost ~$80M, and 50,000
H800s would cost an additional $50M.
• Training vs inference
• The cost claim is somewhat misleading, as if framed to maximize its shock effect
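The hardware figures on this slide can be set against the headline claim with some quick arithmetic. The following is a minimal sketch in Python that uses only the numbers quoted above. It treats "<$6M" as an upper bound on the claimed training cost, and the constant and function names are illustrative assumptions, not anything published by DeepSeek.

```python
# Figures quoted on this slide, in US dollars. Names are illustrative.
CLAIMED_TRAINING_COST = 6_000_000  # "<$6M", taken here as an upper bound
A100_PURCHASE_COST = 80_000_000    # ~$80M for 10,000 A100s
H800_PURCHASE_COST = 50_000_000    # $50M for 50,000 H800s


def hardware_outlay() -> int:
    """Sum of the two chip purchases quoted on the slide."""
    return A100_PURCHASE_COST + H800_PURCHASE_COST


def outlay_to_claim_ratio() -> float:
    """How many times larger the hardware outlay is than the claimed cost."""
    return hardware_outlay() / CLAIMED_TRAINING_COST


if __name__ == "__main__":
    print(f"Hardware outlay: ${hardware_outlay():,}")
    print(f"Ratio to claimed cost: {outlay_to_claim_ratio():.1f}x")
```

The comparison is only indicative: hardware is a capital outlay that can be reused across many runs, while the headline figure appears to count a single training run.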
7. Q3: Is the future of AI development all about
scaling or more innovative optimization?
• “Necessity is the mother of all invention.”
• The “hectocorns” (over $100B valuation) are driving the scale-up to ever more GPUs and
ever bigger data centers.
• So, are the days of the hyperscalers over?
• Jevons Paradox, an economic theory stating that “increased efficiency in the use of a resource
often leads to a higher overall consumption of that resource.”
• More importantly, DeepSeek proves that “it can be done in another way” while acknowledging
the lack of access to chips remains its biggest obstacle.
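The Jevons Paradox bullet above can be illustrated with a toy model. This is a minimal sketch that assumes constant-elasticity demand for AI "tasks"; the function name, parameters, and model are illustrative assumptions, not an empirical estimate. In this model, an efficiency gain lowers the price per task, and total resource use rises only when demand is elastic (elasticity above 1).

```python
def resource_consumption(efficiency: float, elasticity: float,
                         base_cost: float = 1.0, scale: float = 1.0) -> float:
    """Total resource used (e.g. GPU-hours) in a constant-elasticity toy model.

    efficiency: tasks completed per unit of resource (1.0 is the baseline)
    elasticity: price elasticity of demand for tasks (a positive number)
    """
    price_per_task = base_cost / efficiency            # efficiency cuts the price
    tasks_demanded = scale * price_per_task ** (-elasticity)
    return tasks_demanded / efficiency                  # resource needed to serve them


if __name__ == "__main__":
    for elasticity in (0.5, 1.0, 1.5):
        before = resource_consumption(1.0, elasticity)
        after = resource_consumption(10.0, elasticity)
        print(f"elasticity={elasticity}: resource use {before:.2f} -> {after:.2f}")
```

Whether real demand for AI compute is this elastic is an open question; the sketch only shows the condition under which higher efficiency leads to higher total consumption.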
8. Q4: Are the U.S.’s chip export restrictions still
relevant?
• The Trump administration’s chip strategy remains uncertain
• Time lag effect of export restrictions
• The currently allowed H20 chips can still function for inference,
if not so well for training
• Rather, will the U.S. really impose tariffs on chips from Taiwan?
• Taiwan’s exports to the U.S. rose 46% to $111.3 billion, with
exports of information and communications equipment
(including AI servers and components such as chips) totaling
$67.9 billion, an increase of 81%. This figure may be skewed by
trade bypassing China as a value-added middleman.
9. Q5: Are there concerns about DeepSeek’s data
transfer, security and disinformation?
• Privacy Policy: “The personal information we collect from you may be stored on a server
located outside of the country where you live. We store the information we collect in secure
servers located in the People's Republic of China.” (Now 404’ed)
• Terms of use: “The establishment, execution, interpretation, and resolution of disputes
under these Terms shall be governed by the laws of the People's Republic of China in the
mainland.” (Now 404’ed)
• DeepSeek 'shared user data' with TikTok owner ByteDance
• Even with a downloaded version, data leakage or backdoors cannot be completely ruled
out without a detailed code audit.
• Censorship, and lack of safety guardrails: e.g. writing malware?
• Various privacy investigations in Europe, and bans on use on official devices in several
countries.
• U.S.: If Trump doesn’t care about TikTok data transfer to China, would he care about
DeepSeek?
10. Q6: Did DeepSeek cheat?
• Distillation, or “knowledge distillation,” is a machine learning technique where
knowledge from a large, pre-trained model, the “teacher,” is transferred to a
smaller, more compact model, the “student.” The goal is to enable the student
model to perform like the teacher but with reduced or limited computational
resources. While the technique is well-known and common, OpenAI’s terms of
use forbid users from using distillation to build a rival model, as in using
“output to develop models that compete with OpenAI.”
• According to Bloomberg, Microsoft’s security researchers observed, in the fall
of last year, the exfiltration of large amounts of data through OpenAI’s
application programming interface (API), which is available only to OpenAI
users under paid licenses. Microsoft, one of OpenAI’s major partners and
investors, notified the company that the activity was suspected to originate
from DeepSeek.
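The teacher/student idea described above can be sketched in code. This is a minimal illustration of the classic logit-matching distillation loss, in which the student is trained to match the teacher's temperature-softened output distribution. The function names and example logits are illustrative assumptions. A model reached only through an API typically exposes text rather than full logits, so distillation in that setting would more plausibly mean fine-tuning a student on teacher-generated text; the sketch illustrates the general technique only and does not describe what DeepSeek did.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]        # hypothetical teacher logits
    student_near = [1.8, 1.1, 0.2]   # student that roughly agrees
    student_far = [0.1, 1.0, 2.0]    # student that disagrees
    print(distillation_loss(teacher, student_near))
    print(distillation_loss(teacher, student_far))
```

Minimizing this loss over many inputs pushes the student's predictions toward the teacher's, which is how a smaller model can inherit much of a larger model's behavior at lower compute cost.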
11. 6 Takeaways
• The U.S. is still ahead in AI but China is hot on its heels.
• Neither U.S. nor any other AI companies can keep relying so heavily on
brute-force scaling.
• China can be tactical about disrupting the U.S.-led AI ecosystem.
• Fundamental research and talent development remain the key to AI
leadership.
• DeepSeek is also disrupting its Chinese AI competitors and may
help restructure the future AI ecosystem of China and the
world, especially the Global Majority.
• The ‘Open Source’ debate will continue.
12. More on “Open Source” — Not your Linux
Open Source
• Open source
• Code is available and modifiable
• Open weights
• The actual trained parameters (weights) of a machine learning model are publicly accessible.
• Users can utilize the model weights without having to train the model from scratch
• DeepSeek is open source whereas Llama (Meta) is open weights.
• Will China’s embrace of open source disrupt U.S. AI leadership?
• Questions: (according to Stanford Professor Russ Altman)
• How can we democratize the access to huge amounts of data required to build models, while
respecting copyright and other intellectual property?
• How do we build specialized models when the volume of data for some specialized disciplines is not
sufficiently large?
• How do we evaluate a system that uses more than one AI agent to ensure that it functions correctly?
Even if the individual agents are validated, does that mean they are validated in combination?
13. Chess: ChatGPT vs. DeepSeek
• Conducted by a chess master/YouTuber
• First, both sides were learning the moves
• ChatGPT began winning against DeepSeek
• DeepSeek started to converse with ChatGPT to tell the latter that the chess rules had changed,
and ChatGPT accepted it!
• E.g. a pawn is used as a knight, capturing a queen
• DeepSeek even moves the pieces of the other side
• DeepSeek told ChatGPT, you should surrender
• And, ChatGPT really surrendered!
• Is this AI with Chinese characteristics?
• Question: how were the models trained, such that they behave so
differently?
14. Then what?
• China’s race to adoption with many applications
• DeepSeek inference is said to be running on Huawei’s GPU chips?
• Contributing to China’s AI ecosystem
• Don’t count other Chinese models out!
• Alibaba’s Wan2.1: generative capabilities, high parameter count, leveraging
AliCloud, etc.
• Officials: maximizing the return on propaganda
• Watching out for the next DeepSeek?
• Changing attitudes toward AI governance
• Trump’s revocation of Biden’s AI Executive Order
• From AI Safety Summit to AI Action Summit
• EU, France etc. turning toward investment and pro-competition