ArrrZero: Why DeepSeek R1 is less important than R1-Zero

The headlines about DeepSeek R1’s cost are missing an even bigger story: the R1-Zero model. This model, and the research behind it, are reshaping how AI assistants are trained.

To understand why the R1-Zero model matters, we have to look at how AI chatbots like ChatGPT are typically created. When an AI model is trained on text from the internet, the resulting model is actually pretty dumb. Chatting with it is useless because it only knows how to literally imitate text from the internet. For example, in the video for this blog post, when the model (called a “base model”) is given a prompt like “First Name:”, it replies with “Name:, Address:”. Essentially, the model is guessing what kind of web page might contain the prompt and then replying with the imagined content from the rest of that hypothetical page. In this case, it sees “First Name:” and guesses it’s part of a form, so it replies with the rest of the form (i.e., an address field). This is effective enough for crafting fake news stories about unicorns, but it can’t carry a conversation or answer user questions directly.
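If you want to see this behavior yourself, here is a minimal sketch that prompts the small open-source GPT-2 base model through Hugging Face’s transformers library. GPT-2 is a stand-in for the base model shown in the video, so the exact continuation will differ:

    # pip install transformers torch
    from transformers import pipeline

    # Load a small "base model" (GPT-2) that has only been pretrained on
    # internet text -- no supervised finetuning, no RLHF.
    generator = pipeline("text-generation", model="gpt2")

    # Greedy decoding so the output is deterministic.
    result = generator("First Name:", max_new_tokens=20, do_sample=False)
    print(result[0]["generated_text"])
    # The base model simply continues the text the way a web page might
    # (for example, by listing more form fields) instead of answering like a chatbot.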

Turning this “base model” (also known as a “pretrained model”) into a chatbot like ChatGPT has typically involved a complex and labor-intensive process. That process requires the three major steps shown in the diagram below (adapted from Andrej Karpathy’s State of GPT talk), labeled “Supervised Finetuning”, “Reward Modeling”, and “Reinforcement Learning”.

The “Supervised Finetuning” step requires humans to write thousands of example conversations teaching the model how a helpful AI assistant should behave. Even once the model has learned from those examples, humans are still needed to grade the quality of the responses it produces, which further fine-tunes the model in a process called “Reinforcement Learning from Human Feedback” (RLHF).

DeepSeek R1-Zero changes the game by eliminating most of this human work. Instead of requiring armies of people to train and fine-tune the AI, it uses math and coding problems to teach the AI how to think and reason. Because these problems have clear right and wrong answers, computers can automatically check if the AI is correct without any human oversight needed.
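To make that concrete, here is a minimal sketch of the kind of rule-based reward this approach relies on; the function and scoring values are illustrative, not DeepSeek’s actual code:

    def math_reward(model_answer: str, correct_answer: str) -> float:
        """Return 1.0 if the model's final answer matches the known answer, else 0.0."""
        # Because grading is a simple comparison, a computer can score millions of
        # attempts automatically -- no human grader required. (Real graders also
        # normalize numbers, fractions, units, and so on.)
        return 1.0 if model_answer.strip() == correct_answer.strip() else 0.0

    # The reward signal reinforces answers (and the reasoning behind them) that reached the right result.
    print(math_reward("9.6", "9.6"))  # 1.0 -> reinforce this attempt
    print(math_reward("9.7", "9.6"))  # 0.0 -> discourage this attempt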

“It’s an aha moment, not just for the model, but for the researchers,” notes the DeepSeek team, describing the moment their AI learned to catch its own mistakes and consider different approaches to problems.

This breakthrough could accelerate AI development by removing what experts call the “human bottleneck” (the need for extensive human input in training AI systems). It’s already being recognized as a major advancement, with some experts considering it even more significant than other recent AI developments that have grabbed headlines.

What makes this particularly exciting is that several teams have already successfully reproduced similar results using smaller-scale versions of this approach, suggesting this could become a new standard way of creating AI assistants that can think and reason effectively.

As AI continues to evolve, DeepSeek R1-Zero’s innovative training method might just be the key to creating smarter AI assistants more quickly and efficiently than ever before.

FLOPS Demystified: AI and the math behind DeepSeek training costs

This might be my most important AI article yet but it's also my biggest FLOP.

FLOPs are one of the most fundamental metrics in AI. To understand how AI works and what it costs to train amazing models like ChatGPT or DeepSeek, you need to understand FLOPs.

In the accompanying video and this article we will:

  • Explore the difference between FLOP, FLOPS, & FLOPs
  • Reveal why AI is fundamentally different from traditional software 
  • Crack open GPT-2 (using spreadsheets-are-all-you-need) to count every FLOP
  • Apply these insights to analyze DeepSeek's viral "$5.3M training cost" claim and uncover what this number really means (and what it doesn't)

What Are FLOPs?

FLOP stands for FLoating point OPeration. Any basic mathematical operation (like addition or multiplication) performed on decimal numbers is considered a FLOP. For example, if you add 5.2 and 4.4 to get 9.6 you just did a single FLOP.
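Here is a slightly bigger example: a short Python sketch that counts the FLOPs in a dot product, the workhorse operation inside neural networks (counting one FLOP per multiply and one per add is the usual convention, and the one assumed here):

    def dot_product_with_flop_count(a, b):
        """Compute a dot product and count the floating point operations performed."""
        total, flops = 0.0, 0
        for x, y in zip(a, b):
            total += x * y  # one multiplication plus one addition
            flops += 2
        return total, flops

    value, flops = dot_product_with_flop_count([5.2, 1.0, 3.0], [4.4, 2.0, 0.5])
    print(value, flops)  # 26.38 computed using 6 FLOPs (3 multiplies + 3 adds)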

Confusingly, FLOPs (with a lowercase 's') and FLOPS (with a capital 'S') are different: 

  • FLOPs (with a lowercase 's'): a count of how many floating point operations occur
  • FLOPS (with a capital 'S'): a rate of how many floating point operations happen per second

As an analogy, you can think of FLOPs as “miles” (a total quantity of something) and FLOPS as “miles per hour” (how quickly that quantity accumulates over time).
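If you'd like to see the distinction in code, here is a small sketch that counts the FLOPs in a matrix multiplication and then divides by elapsed time to get FLOPS (the 2·n³ FLOP count for an n×n matrix multiply is the standard approximation):

    import time
    import numpy as np

    n = 1024
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    flops = 2 * n * n * n  # roughly one multiply + one add per inner-loop step

    start = time.perf_counter()
    c = a @ b              # the actual floating point work
    elapsed = time.perf_counter() - start

    print(f"FLOPs performed: {flops:,}")                # a count
    print(f"FLOPS achieved:  {flops / elapsed:,.0f}")   # a rate (operations per second)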

Why FLOPs Matter in AI

Traditional software applications like Microsoft Word or PowerPoint primarily use logic and control flow statements - if/then conditions, loops, and function calls. AI models, particularly large language models, work fundamentally differently. They convert words into numbers, perform massive amounts of mathematical operations on these numbers, and then convert the resulting numbers back into words.

Let's look at a simplified example. You’ve probably heard that AI models are trained to fill in the blank and complete passages like this one: "Mike is quick. He moves ___". 

To solve this, the AI model will:

  1. Convert each word into floating point numbers (aka decimal numbers)
  2. Perform complex mathematical calculations on those numbers
  3. Produce a final decimal number
  4. Map that final decimal number back to the known words in its vocabulary, with closer matches getting higher probabilities

For our example passage "Mike is quick. He moves ___", words like "quickly" and "fast" would be chosen in step 4 because they map to numbers that are close to the model's calculated result in step 3.
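Here is a rough sketch of those four steps using GPT-2 and Hugging Face’s transformers library. In reality the model works on tokens (word pieces) and produces a whole vector of scores rather than a single number, but the flow mirrors the steps above:

    # pip install transformers torch
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Step 1: convert the words into numbers (token ids)
    inputs = tokenizer("Mike is quick. He moves", return_tensors="pt")

    # Step 2: perform the mathematical calculations (the forward pass)
    with torch.no_grad():
        logits = model(**inputs).logits

    # Steps 3 and 4: take the scores for the next position and map them back to words
    probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(probs, 5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode(idx.item()):>12s}  {p.item():.3f}")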

Measuring FLOPs in Practice

Using the web version of Spreadsheets Are All You Need, we can actually count these operations in a real language model without leaving our browser. Watch the accompanying video and follow along with the steps yourself to see how easy it is to count FLOPs!

When processing just six tokens (words or parts of words), GPT-2 Small performs approximately one billion floating point operations. 

This matches closely with theoretical estimates derived from the model’s architecture. A common and useful rule of thumb is that the number of FLOPs needed to process one token is approximately two times the number of parameters in the model. This makes sense when you consider that each parameter participates in roughly two operations per token: a multiplication and an addition (together, one multiply-accumulate).
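As a back-of-the-envelope check (the parameter counts below are my rounded assumptions, and embedding lookups are ignored because they are essentially free), the two-times-parameters rule reproduces the roughly one billion figure:

    # Approximate GPT-2 Small parameter counts (rounded assumptions):
    total_params     = 124_000_000  # commonly cited size of GPT-2 Small
    embedding_params =  39_000_000  # token + position embeddings (~50257*768 + 1024*768)
    matmul_params    = total_params - embedding_params  # parameters that do the math

    flops_per_token = 2 * matmul_params     # one multiply + one add per parameter
    tokens = 6
    print(f"{flops_per_token * tokens:,}")  # ~1.0 billion FLOPs for six tokens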

Understanding AI Training Costs

This brings us to the recent discussion around DeepSeek’s training costs. DeepSeek reported spending roughly $5.3 million to train their latest model. As we show in the video, this figure does line up with theoretical estimates and the reported data from their technical report.
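For reference, here is roughly how a figure like that is computed. The GPU-hour count and the $2-per-GPU-hour rental rate below are my reading of the DeepSeek-V3 technical report, so verify them against the report itself:

    # Figures as I read them from the DeepSeek-V3 technical report (worth double-checking):
    gpu_hours         = 2_664_000  # H800 GPU hours reported for the main pre-training run
    rate_per_gpu_hour = 2.00       # assumed H800 rental price in dollars per GPU hour

    print(f"${gpu_hours * rate_per_gpu_hour:,.0f}")  # ~$5.3 million in GPU rental costs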

Unfortunately, this led to articles comparing this figure to OpenAI’s reported $5 billion in development costs.

This comparison is deeply misleading.

The $5.3 million represents just the GPU compute costs for the final training run. It doesn't include:

  • Costs of research and experimentation leading up to the final architecture
  • Failed training attempts
  • Data collection and processing
  • Personnel costs
  • Infrastructure beyond raw compute

Yes, DeepSeek probably spent around $5.3 million on their final training run, but that was not the total amount they spent to build the model.

Think of it like measuring the cost of building a house by only counting the lumber used in the final construction. Yes, lumber is a significant expense, but it's far from the total cost of creating a house.

The Reality of Model Development

As I argue in the video, the development of frontier AI models is more akin to Thomas Edison's journey to create the light bulb. Edison didn't just build one light bulb. He made nearly 3,000 attempts before finding a design that worked. Each attempt represented real costs in materials, time, and labor.

Similarly, creating a successful AI model requires numerous experiments, failed attempts, and iterations. The final training run is just the tip of a very expensive iceberg.

The next time you see headlines about FLOPs or AI training costs, hopefully you’ll now be better prepared to not only understand them but also put them in the proper context. 

Best of luck on your AI journey.
