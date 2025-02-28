Chinese AI DeepSeek speeds up Nvidia H800 by 8x using FlashMLA to bypass sanctions

Chinese company DeepSeek has introduced FlashMLA, a technology that, according to its developers, significantly increases the performance of Nvidia Hopper H800 chips.

What is FlashMLA?

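FlashMLA is a software optimization that improves the performance of Nvidia Hopper GPUs without hardware changes. Specifically, it is an attention decoding kernel for Multi-head Latent Attention (MLA), the attention mechanism used in DeepSeek's models. According to the developers, it lets the H800 reach a memory throughput of up to 3000 GB/s in memory-bound workloads, close to the card's theoretical hardware peak.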
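To put the 3000 GB/s figure in context: effective bandwidth is the number of bytes a kernel reads from GPU memory divided by its run time. The sketch below works through that arithmetic for a hypothetical decoding step. The batch size, sequence length, and the 576 cached values per token (assumed here to be a 512-value latent plus a 64-value positional key) are illustrative choices, not DeepSeek's benchmark settings.

```python
# Effective bandwidth = bytes read from GPU memory / kernel run time.
# The workload below is hypothetical, not a FlashMLA benchmark configuration.


def kv_cache_bytes(batch, seq_len, elems_per_token, bytes_per_elem=2):
    """Bytes one decoding step reads from a single layer's KV cache (BF16 = 2 bytes)."""
    return batch * seq_len * elems_per_token * bytes_per_elem


def read_time_us(bytes_moved, bandwidth_gbs):
    """Microseconds needed to stream bytes_moved at bandwidth_gbs (1 GB = 1e9 bytes)."""
    return bytes_moved / (bandwidth_gbs * 1e9) * 1e6


def effective_bandwidth_gbs(bytes_moved, seconds):
    """Achieved bandwidth in GB/s for a kernel that read bytes_moved in `seconds`."""
    return bytes_moved / seconds / 1e9


# Assumed workload: 64 sequences x 4096 tokens x 576 cached values per token.
total = kv_cache_bytes(batch=64, seq_len=4096, elems_per_token=576)
print(total)                                   # prints 301989888
print(f"{read_time_us(total, 3000):.1f} us")   # prints "100.7 us"
```

Because decoding is memory-bound, halving the bytes cached per token roughly halves the read time at the same bandwidth, which is why cache size matters as much as raw throughput.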

Low-rank key-value compression: keys and values are projected into a compact, low-dimensional latent representation, so much less data has to be stored and read back for each token during processing.

Optimized memory usage: reduces memory consumption by 40–60%.

Dynamic resource allocation: a paged memory system stores the key-value cache in fixed-size blocks that are allocated on demand, which speeds up the processing of variable-length sequences.
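The low-rank key-value compression idea can be illustrated with a toy version of what Multi-head Latent Attention does: each token's hidden state is projected down to a small latent vector, only that vector is cached, and keys and values are re-expanded from it when attention needs them. This is a minimal sketch under stated assumptions: the dimensions, random weights, and names (`LatentKVCache`, `W_DOWN`, and so on) are hypothetical, and real MLA additionally keeps a small separate positional key and can fold the up-projections into other weight matrices, both of which are omitted here.

```python
import random

# Toy sizes for illustration only; these are NOT DeepSeek's real model dimensions.
D_MODEL, N_HEADS, HEAD_DIM, D_LATENT = 32, 4, 8, 8


def matvec(m, v):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    """Random weights standing in for learned projection matrices."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
W_DOWN = rand_matrix(D_LATENT, D_MODEL, rng)             # hidden state -> latent
W_UP_K = rand_matrix(N_HEADS * HEAD_DIM, D_LATENT, rng)  # latent -> keys (all heads)
W_UP_V = rand_matrix(N_HEADS * HEAD_DIM, D_LATENT, rng)  # latent -> values (all heads)


class LatentKVCache:
    """Caches one small latent vector per token instead of full keys and values."""

    def __init__(self):
        self.latents = []

    def append(self, hidden):
        # Compress the token's hidden state; only this vector is kept in memory.
        self.latents.append(matvec(W_DOWN, hidden))

    def keys_values(self, pos):
        # Re-expand keys and values from the cached latent when attention needs them.
        c = self.latents[pos]
        return matvec(W_UP_K, c), matvec(W_UP_V, c)


def elems_per_token_full():
    return 2 * N_HEADS * HEAD_DIM  # a key and a value for every head


def elems_per_token_latent():
    return D_LATENT


cache = LatentKVCache()
for _ in range(5):
    cache.append([rng.gauss(0.0, 1.0) for _ in range(D_MODEL)])
k, v = cache.keys_values(2)
print(len(k), len(v), elems_per_token_full(), elems_per_token_latent())  # prints "32 32 64 8"
```

With these toy sizes the cache holds 8 values per token instead of 64; the actual saving in a real model depends entirely on its chosen dimensions.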
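The paging idea behind dynamic resource allocation can be sketched as simple bookkeeping: instead of reserving one contiguous buffer per sequence sized for the longest possible input, memory is handed out in fixed-size blocks as each sequence grows, and a per-sequence block table maps token positions to blocks. This is a hypothetical Python illustration only; the class and method names are invented, the block size of 4 is a toy value, and FlashMLA itself is a CUDA kernel that reads such a paged cache in GPU memory.

```python
BLOCK_SIZE = 4  # tokens per block; toy value chosen for illustration


class PagedKVCache:
    """Stores per-token cache entries in fixed-size blocks drawn from a shared pool."""

    def __init__(self, num_blocks, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.pool = [[None] * block_size for _ in range(num_blocks)]
        self.free = list(range(num_blocks))  # ids of unused physical blocks
        self.block_tables = {}               # seq_id -> list of physical block ids
        self.lengths = {}                    # seq_id -> number of tokens stored

    def append(self, seq_id, entry):
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:         # no block yet, or the last one is full
            if not self.free:
                raise MemoryError("KV cache pool exhausted")
            table.append(self.free.pop())
        block = table[n // self.block_size]
        self.pool[block][n % self.block_size] = entry
        self.lengths[seq_id] = n + 1

    def get(self, seq_id, pos):
        # Translate a logical token position into (physical block, offset).
        if not 0 <= pos < self.lengths.get(seq_id, 0):
            raise IndexError(pos)
        block = self.block_tables[seq_id][pos // self.block_size]
        return self.pool[block][pos % self.block_size]

    def release(self, seq_id):
        # Return a finished sequence's blocks to the shared pool.
        self.free.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=8)
for i in range(10):
    cache.append("long", ("long", i))   # 10 tokens -> 3 blocks
for i in range(3):
    cache.append("short", ("short", i))  # 3 tokens -> 1 block
print(len(cache.free))  # prints 4
```

A short sequence only occupies the blocks it actually uses, so sequences of very different lengths can share one memory pool without padding every sequence to the maximum length.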

Bypassing US sanctions?

DeepSeek's FlashMLA demonstrates the potential of software optimizations for the Chinese AI industry. In effect, it allows the H800 to be used with efficiency close to that of the more powerful H100, whose supply to China is restricted by sanctions.

So far, FlashMLA targets only Nvidia's Hopper-generation GPUs, such as the H800, but extending it to other GPU architectures could significantly affect the AI computing market.

Beyond DeepSeek, Chinese researchers continue to develop methods for getting more performance out of the GPUs available to them. Recently, scientists from Shenzhen University and the Beijing Institute of Technology increased the performance of the Nvidia RTX 4070 on peridynamics tasks by a factor of 800. The project also has military-industrial implications, as it was developed in collaboration with Russian specialists.