The world continues to wonder how DeepSeek was able to train an AI model like R1 with only 2,048 Nvidia chips, but more clues are emerging. The most scandalous is the accusation of "espionage" by OpenAI, which claims to have evidence that the Chinese company used its models to accelerate training.
In statements sent to the Financial Times and Bloomberg, the American company claimed to have evidence that the Chinese group had used OpenAI's models to train its own, a technique known in the industry as distillation, which is gaining significant weight in the world of artificial intelligence.
This technique involves extracting responses and information from one language model to train another, smaller one. It is a widely used practice, to the point that OpenAI itself publishes tutorials on its website teaching clients how to do it. The advantage for clients is that, after this process in which the smaller model learns from the larger one, they end up with a model that is smaller, highly specialized in a specific task, and cheaper to run, as the sketch below illustrates.
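A minimal sketch of what this looks like in practice, assuming the OpenAI Python SDK and a hypothetical set of task prompts: a large "teacher" model is queried for answers, and the (prompt, answer) pairs are saved as a fine-tuning dataset for a smaller "student" model. The teacher model name, the prompts, and the output file are illustrative assumptions, not details from the article.

```python
# Response-based distillation sketch: collect a teacher model's answers
# and store them as chat-style JSONL for fine-tuning a smaller student model.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompts for the task the student model should specialize in.
prompts = [
    "Summarize this support ticket in one sentence.",
    "Classify the sentiment of this product review as positive or negative.",
]

with open("distillation_data.jsonl", "w", encoding="utf-8") as f:
    for prompt in prompts:
        # Ask the large teacher model for its answer.
        response = client.chat.completions.create(
            model="gpt-4o",  # assumed teacher model, any capable model works
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content
        # Save the pair in the chat format commonly used for supervised
        # fine-tuning of a smaller student model.
        record = {
            "messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": answer},
            ]
        }
        f.write(json.dumps(record) + "\n")
```

The resulting dataset can then be fed into whatever fine-tuning pipeline the student model supports; the point is simply that the student never needs the teacher's weights, only its outputs.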
Why, then, does it anger Sam Altman's company that its Asian rival has resorted to this technique? Mainly because, under the distillation arrangement described above, these new models trained from GPT generate income and benefits for OpenAI; with DeepSeek, they did not.
In this case, the technique was used to create a rival model, something that goes against ChatGPT's terms and conditions. Ironically, Altman accuses others of stealing his company's work after OpenAI used content from across the internet without permission to train its initial models. The accusations have been backed by the U.S. government: David Sacks, Donald Trump's 'czar' for artificial intelligence, stated that it had found "tangible evidence" that DeepSeek used OpenAI to train its models.
Providers of large language models have protective measures to prevent this. OpenAI, for example, acknowledges that it monitors suspicious IP addresses and blocks them when it is evident that these practices are taking place.
However, Ben Thompson, one of the leading analysts in the field, pointed out that there is growing consensus that this is not just happening with DeepSeek but is a widespread practice among competitors: as the Chinese company has just demonstrated, it significantly reduces training costs and is difficult for the affected companies to prove. It is a technique that once again favors the lagging players and leaves OpenAI or Anthropic with the burden of continuing to invest more than their rivals to develop the most advanced models, because if you have the best model, there is no one to copy from.
On the other hand, it reinforces the case for open source, on the premise that sharing the code allows others to improve it and that those improvements ultimately flow back to the model's original creator.