DeepSeek's AI breakthrough bypasses industry-standard CUDA, uses assembly-like PTX programming instead

vegeta@lemmy.world · 3 months ago

filister@lemmy.world · 3 months ago

What is amazing in this case is that they managed to spend only a fraction of what OpenAI is paying for inference.

Plus, they are a lot cheaper. But I am pretty sure the American government will ban them in no time, citing national security concerns and the like.

Nevertheless, I think we need more open source models.

Not to mention that NVIDIA also needs to be brought back down to earth.

Corngood@lemmy.ml · 3 months ago

This sounds like good engineering, but surely there’s not a big gap with their competitors. They are spending tens of millions on hardware and energy, and this is something a handful of (very good) programmers should be able to pull off.

Unless I’m missing something, it’s the sort of thing that’s done all the time in console games.

KingRandomGuy@lemmy.world · 3 months ago

Part of this was an optimization that was necessary due to their resource restrictions. Chinese firms can only purchase H800 GPUs instead of H200 or H100. These have much slower inter-GPU communication (less than half the bandwidth!) as a result of export bans by the US government, so this optimization was done to try and alleviate some of that bottleneck. It’s unclear to me whether this type of optimization would make as big a difference for a lab using H100s/H200s; my guess is that it matters less.
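A rough back-of-envelope sketch of why the bandwidth cut hurts, assuming gradients are synchronized with a standard ring all-reduce and using the headline NVLink figures (roughly 900 GB/s for H100 and 400 GB/s for H800) as assumptions rather than measurements. The function and constant names are made up for illustration:

```python
# Back-of-envelope model (hypothetical): bandwidth-only cost of a ring
# all-reduce across GPUs. Bandwidth numbers are assumed headline NVLink
# figures, not measurements; only the H100/H800 ratio is meaningful.

def ring_allreduce_seconds(num_bytes: float, num_gpus: int, bandwidth_gb_s: float) -> float:
    """Time for a ring all-reduce, ignoring latency and compute overlap.

    In a ring all-reduce each GPU sends (and receives) 2 * (n - 1) / n
    of the buffer, so time = traffic / bandwidth.
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize with a single GPU
    traffic_bytes = 2 * (num_gpus - 1) / num_gpus * num_bytes
    return traffic_bytes / (bandwidth_gb_s * 1e9)


# Assumed per-GPU NVLink bandwidth in GB/s (headline figures).
NVLINK_GB_S = {"H100": 900.0, "H800": 400.0}

if __name__ == "__main__":
    buffer_bytes = 1e9  # hypothetical 1 GB of gradients
    for gpu, bw in NVLINK_GB_S.items():
        t = ring_allreduce_seconds(buffer_bytes, num_gpus=8, bandwidth_gb_s=bw)
        print(f"{gpu}: {t * 1e3:.2f} ms per all-reduce")
    ratio = NVLINK_GB_S["H100"] / NVLINK_GB_S["H800"]
    print(f"Same all-reduce takes about {ratio:.2f}x as long on the H800")
```

With latency ignored, communication time scales inversely with bandwidth, so under these assumptions the same all-reduce takes about 2.25x as long on the H800. Real training jobs overlap communication with computation, so the end-to-end slowdown can be smaller than this.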

mesa@lemmy.world · 3 months ago

Reminds me of Bitcoin mining and how ASIC miners overtook graphics card mining practically overnight. It would not surprise me if this goes the same way.

CodexArcanum@lemmy.dbzer0.com · 3 months ago

It’s already happening. This article takes a long look at many of the rising threats to nvidia. Some highlights:

Google has been running on their own homemade TPUs (tensor processing units) for years, and they say they are on the 6th generation of those.
Some AI researchers are building an entirely AMD based stack from scratch, essentially writing their own drivers and utilities to make it happen.
Cerebras.ai is creating their own AI chips using a unique whole-wafer design. They make a single AI chip roughly the size of an entire silicon wafer (it is cut from a 30 cm wafer) with 900,000 micro-cores.

So yeah, it’s not just “China AI bad”; the entire market is catching up and innovating around nvidia’s monopoly.

sinceasdf@lemmy.world · 3 months ago

This is why Nvidia stock has been hit so hard. CUDA is their moat.