FLOPS Demystified: AI and the math behind DeepSeek training costs

This might be my most important AI article yet, but it's also my biggest FLOP.

FLOPs are one of the most fundamental metrics in AI. To understand how AI works and what it costs to train amazing models like ChatGPT or DeepSeek, you need to understand FLOPs.

In the accompanying video and this article we will:

  • Explore the difference between FLOP, FLOPS, & FLOPs
  • Reveal why AI is fundamentally different from traditional software 
  • Crack open GPT-2 (using spreadsheets-are-all-you-need) to count every FLOP
  • Apply these insights to analyze DeepSeek's viral "$5.3M training cost" claim and uncover what this number really means (and what it doesn't)

What Are FLOPs?

FLOP stands for FLoating point OPeration. Any basic mathematical operation (like addition or multiplication) performed on decimal numbers is considered a FLOP. For example, if you add 5.2 and 4.4 to get 9.6 you just did a single FLOP.

Confusingly, FLOPs (with a lowercase 's') and FLOPS (with a capital 'S') are different: 

  • FLOPs (with a lowercase 's'): how many floating point operations occur
  • FLOPS (with a capital 'S'): how many floating point operations occur per second.

As an analogy, you can think of FLOPs as "miles" (a total quantity) and FLOPS as "miles per hour" (the rate at which that quantity accumulates).
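To make the distinction concrete, here is a short Python sketch (the workload and its size are made up purely for illustration) that first counts operations, then divides by elapsed time to get a rate:

```python
import time

# FLOPs: a count of operations. FLOPS: that count divided by elapsed seconds.
n = 1_000_000
xs = [5.2] * n
ys = [4.4] * n

start = time.perf_counter()
total = sum(x * y for x, y in zip(xs, ys))  # ~n multiplications + ~n additions
elapsed = time.perf_counter() - start

flop_count = 2 * n                 # FLOPs (lowercase 's'): how many operations
flop_rate = flop_count / elapsed   # FLOPS (capital 'S'): operations per second
print(f"{flop_count:,} FLOPs in {elapsed:.4f}s = {flop_rate:,.0f} FLOPS")
```

Pure Python is far slower than the hardware's true capability, but the point here is the units: the first number is a distance traveled, the second is a speed.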

Why FLOPs Matter in AI

Traditional software applications like Microsoft Word or PowerPoint primarily use logic and control flow statements - if/then conditions, loops, and function calls. AI models, particularly large language models, work fundamentally differently. They convert words into numbers, perform massive amounts of mathematical operations on these numbers, and then convert the resulting numbers back into words.

Let's look at a simplified example. You’ve probably heard that AI models are trained to fill in the blank and complete passages like this one: "Mike is quick. He moves ___". 

To solve this, the AI model will:

  1. Convert each word into floating point numbers (aka decimal numbers)
  2. Perform complex mathematical calculations on those numbers
  3. Produce a final decimal number
  4. Map that final decimal number back to the known words in its vocabulary, with closer matches getting higher probabilities

For our example passage "Mike is quick. He moves ___", words like "quickly" and "fast" would be chosen in step 4 because they map to numbers that are close to the model's calculated result in step 3.
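Here is a deliberately tiny Python sketch of those four steps. Real models represent each word as a long vector of numbers and perform billions of operations; in this toy version each word gets a single made-up number and the heavy calculation is skipped, just to show how step 4 turns closeness into probabilities:

```python
import math

# Toy vocabulary: each word is represented by one made-up decimal number.
# (Real models use vectors with hundreds or thousands of dimensions.)
vocab = {
    "quickly": 0.92,
    "fast":    0.90,
    "slowly":  0.10,
    "table":  -0.50,
}

model_output = 0.88  # pretend steps 1-3 produced this final decimal number

# Step 4: words whose numbers are closer to the output get higher scores...
scores = {word: -abs(value - model_output) for word, value in vocab.items()}

# ...and a softmax turns those scores into probabilities.
exps = {word: math.exp(10 * s) for word, s in scores.items()}  # 10 sharpens the contrast
total = sum(exps.values())
probs = {word: e / total for word, e in exps.items()}

for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:8s} {p:.1%}")   # "fast" and "quickly" come out on top
```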

Measuring FLOPs in Practice

Using the web version of Spreadsheets Are All You Need, we can actually count these operations in a real language model without leaving our browser. Watch the accompanying video and follow along with the steps yourself to see how easy it is to count FLOPs!

When processing just six tokens (words or parts of words), GPT-2 Small performs approximately one billion floating point operations. 

This matches closely with theoretical estimates derived from the model's architecture. A common and useful rule of thumb is that the number of FLOPs needed to process one token is approximately two times the number of parameters in the model. This makes sense when you consider that, for each token, each parameter participates in roughly one multiplication and one addition (a multiply-accumulate), or two operations per parameter.
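We can check that rule of thumb with a few lines of arithmetic. Assuming the commonly cited figure of roughly 124 million parameters for GPT-2 Small (an assumption on my part, not something counted in the sheet):

```python
# Rule of thumb: ~2 FLOPs per parameter per token (one multiply + one add).
n_params = 124e6   # commonly cited parameter count for GPT-2 Small (assumed)
n_tokens = 6       # the short prompt used in the video

flops_per_token = 2 * n_params
total_flops = flops_per_token * n_tokens
print(f"~{total_flops:.1e} FLOPs")  # ~1.5e9
```

That lands at roughly 1.5 billion FLOPs: the same order of magnitude as the count measured in the spreadsheet, which is about as much agreement as a back-of-the-envelope rule should be expected to deliver.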

Understanding AI Training Costs

This brings us to the recent discussion around DeepSeek's training costs. DeepSeek reported spending roughly $5.3 million to train their latest model. As we show in the video, this estimate does line up with theoretical estimates and the reported data from their technical report.
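Here is a rough Python sketch of that arithmetic. The inputs are figures reported in the DeepSeek-V3 technical report as I understand them (treat the exact values as assumptions): roughly 37 billion parameters activated per token, about 14.8 trillion training tokens, and about 2.664 million H800 GPU-hours of pre-training at an assumed rental price of $2 per GPU-hour.

```python
# A rough sketch of the "$5.3M" arithmetic. The constants below are taken
# from the DeepSeek-V3 technical report as reported; treat them as assumptions.
active_params = 37e9        # parameters activated per token (V3 is a mixture-of-experts model)
training_tokens = 14.8e12   # tokens seen during pre-training
flops_per_token = 6 * active_params  # rule of thumb: ~2 FLOPs/parameter forward, ~4 backward
total_flops = flops_per_token * training_tokens

gpu_hours = 2.664e6          # reported H800 GPU-hours for pre-training
price_per_gpu_hour = 2.00    # assumed rental price per GPU-hour

print(f"Estimated training compute: {total_flops:.1e} FLOPs")           # ~3.3e24
print(f"Estimated cost: ${gpu_hours * price_per_gpu_hour / 1e6:.2f}M")  # ~$5.33M
```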

Unfortunately, this led to articles comparing this figure to OpenAI's reported $5 billion in development costs.

This comparison is deeply misleading.

The $5.3 million represents just the GPU compute costs for the final training run. It doesn't include:

  • Costs of research and experimentation leading up to the final architecture
  • Failed training attempts
  • Data collection and processing
  • Personnel costs
  • Infrastructure beyond raw compute

Yes, DeepSeek probably spent about $5.3 million on their final training run, but that was not the total amount they spent to build the model.

Think of it like measuring the cost of building a house by only counting the lumber used in the final construction. Yes, lumber is a significant expense, but it's far from the total cost of creating a house.

The Reality of Model Development

As I argue in the video, the development of frontier AI models is more akin to Thomas Edison's journey to create the light bulb. Edison didn't just build one light bulb. He made nearly 3,000 attempts before finding a design that worked. Each attempt represented real costs in materials, time, and labor.

Similarly, creating a successful AI model requires numerous experiments, failed attempts, and iterations. The final training run is just the tip of a very expensive iceberg.

The next time you see headlines about FLOPs or AI training costs, hopefully you'll now be better prepared not only to understand them but also to put them in the proper context.

Best of luck on your AI journey.
