On March 1, DeepSeek released the technical article "Overview of the DeepSeek-V3/R1 Inference System", disclosing for the first time the optimization details of its model inference system along with key figures such as its cost and profit margin.
What did DeepSeek open-source?
Starting February 24, DeepSeek held its "Open Source Week" event, releasing one core technology project per day for five days and covering key areas from AI model training to file-system optimization. These releases not only demonstrate DeepSeek's deep expertise in hardware optimization, algorithm design, and distributed computing, but also provide powerful tools and infrastructure for AI developers worldwide.
Day 1: FlashMLA
FlashMLA is a high-efficiency MLA (Multi-head Latent Attention) decoding kernel optimized for NVIDIA Hopper GPUs, especially well suited to variable-length sequences in high-performance AI workloads. With FlashMLA, an H800 GPU can reach up to 3000 GB/s of memory bandwidth and 580 TFLOPS of compute performance.
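The memory-bandwidth advantage comes largely from MLA's compressed KV cache. As a rough, self-contained illustration (not FlashMLA's actual API), the sketch below compares a conventional KV cache with an MLA-style latent cache. The dimensions (128 heads of size 128, a 512-dim latent plus a small RoPE component) are illustrative assumptions of the kind reported for DeepSeek-V3:

```python
def kv_cache_bytes(seq_len, n_heads, head_dim, dtype_bytes=2):
    # Conventional attention: cache full K and V vectors for every token.
    return seq_len * 2 * n_heads * head_dim * dtype_bytes

def mla_cache_bytes(seq_len, d_latent, d_rope=64, dtype_bytes=2):
    # MLA: cache one compressed latent vector (plus a small RoPE part) per token;
    # K and V are reconstructed from the latent at decode time.
    return seq_len * (d_latent + d_rope) * dtype_bytes

std = kv_cache_bytes(seq_len=4096, n_heads=128, head_dim=128)
mla = mla_cache_bytes(seq_len=4096, d_latent=512)
print(std // mla)  # → 56 (compression ratio vs. a full KV cache)
```

A smaller cache means fewer bytes moved per decoded token, which is why an MLA decoding kernel can push a memory-bound workload so close to the H800's peak bandwidth.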
Day 2: DeepEP
DeepEP is the first open-source EP (expert parallelism) communication library dedicated to MoE (Mixture-of-Experts) model training and inference, designed for large-scale models. Its features include efficient all-to-all communication, support for both NVLink and RDMA, high-throughput kernels, and low-latency kernels.
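To make the all-to-all pattern concrete, here is a minimal Python sketch of the token dispatch step in expert parallelism. The function name and the expert-to-rank placement rule are illustrative assumptions, not DeepEP's API:

```python
def all_to_all_dispatch(tokens_per_rank, n_ranks):
    """Simulate the all-to-all token exchange used in expert parallelism.

    tokens_per_rank[r] is a list of (token, expert_id) pairs routed on rank r;
    here we assume expert e lives on rank e % n_ranks. Returns, for each rank,
    the tokens it must process, tagged with their source rank so the reverse
    (combine) all-to-all can send results back.
    """
    recv = [[] for _ in range(n_ranks)]
    for src, pairs in enumerate(tokens_per_rank):
        for token, expert in pairs:
            dst = expert % n_ranks          # home rank of this expert
            recv[dst].append((src, token, expert))
    return recv

# Two ranks, four experts (0 and 2 on rank 0; 1 and 3 on rank 1).
sends = [[("a", 1), ("b", 2)], [("c", 0), ("d", 3)]]
recv = all_to_all_dispatch(sends, n_ranks=2)
print(recv[0])  # tokens bound for the experts hosted on rank 0
```

In a real MoE system this exchange happens over NVLink within a node and RDMA across nodes, which is exactly the traffic DeepEP's kernels are tuned for.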
Day 3: DeepGEMM
DeepGEMM is an efficient FP8 GEMM library supporting GEMM operations for both traditional dense models and MoE models. On NVIDIA Hopper GPUs it can achieve more than 1350 TFLOPS of FP8 performance. Although its core logic is only about 300 lines of code, it outperforms expert-tuned kernels for most matrix sizes.
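As a rough illustration of what an FP8 GEMM pipeline does (quantize inputs to a low-precision format with per-vector scales, accumulate at higher precision, then dequantize), here is a pure-Python emulation. The quantization below is a crude stand-in for real E4M3 hardware rounding and does not reflect DeepGEMM's actual implementation:

```python
def quantize_fp8_e4m3(xs):
    """Crudely emulate FP8 E4M3 quantization of a vector: scale into the
    representable range (max magnitude ~448), then round coarsely to mimic
    the format's ~3 mantissa bits."""
    amax = max(abs(x) for x in xs) or 1.0
    scale = 448.0 / amax
    q = [round(x * scale * 8) / 8 for x in xs]   # coarse mantissa rounding
    return q, scale

def gemm_fp8(a, b):
    """C = A @ B with rows of A and columns of B quantized per-vector,
    accumulating in regular Python floats (standing in for FP32)."""
    n, k, m = len(a), len(a[0]), len(b[0])
    cols = [[b[i][j] for i in range(k)] for j in range(m)]
    aq = [quantize_fp8_e4m3(row) for row in a]
    bq = [quantize_fp8_e4m3(col) for col in cols]
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        qa, sa = aq[i]
        for j in range(m):
            qb, sb = bq[j]
            dot = sum(x * y for x, y in zip(qa, qb))
            c[i][j] = dot / (sa * sb)            # dequantize the accumulator
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
c = gemm_fp8(a, identity)   # should closely match a
```

The interesting design point, which this toy mirrors, is that the narrow FP8 inputs are paired with per-vector scale factors and a wide accumulator, so the speed of 8-bit math is bought without an unacceptable loss of accuracy.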
Day 4: Three open-source releases
DualPipe: A bidirectional pipeline-parallelism algorithm used to achieve computation-communication overlap in V3/R1 training.
EPLB: Expert Parallel Load Balancer for V3/R1.
Profiling data: Data from DeepSeek's training and inference frameworks is shared publicly to help the community better understand its communication-computation overlap strategy and the underlying implementation details.
Day 5: 3FS
3FS is a high-performance parallel file system designed for AI training and inference workloads, supporting strong consistency and high throughput (an aggregate read throughput of 6.6 TiB/s in a 180-node cluster) while simplifying distributed application development.
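The headline figure implies a substantial per-node read rate; a quick sanity check:

```python
# Aggregate read throughput reported for 3FS in a 180-node cluster.
aggregate_tib_s = 6.6
nodes = 180

per_node_gib_s = aggregate_tib_s * 1024 / nodes  # TiB/s -> GiB/s, split per node
print(round(per_node_gib_s, 1))  # → 37.5 GiB/s of reads per node on average
```

Roughly 37.5 GiB/s per node is far beyond a single local NVMe drive, which is consistent with 3FS aggregating many SSDs behind RDMA networking.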
To encourage users to make full use of resources, the DeepSeek open platform launched off-peak discounts during "Open Source Week". During the nightly off-peak window from 00:30 to 08:30 Beijing time, API call prices are significantly reduced: DeepSeek-V3 drops to 50% of the original price, and DeepSeek-R1 to as low as 25%. The aim is to make the service experience more affordable and responsive.
Why did DeepSeek choose open source?
The series of core technology projects DeepSeek released during "Open Source Week" is like building a pontoon bridge across NVIDIA's sturdy AI moat. More importantly, these open-source modules demonstrate the DeepSeek team's ability to deeply dissect and rework the tight coupling between NVIDIA's CUDA stack and its parallel computing hardware, a software-and-hardware foundation long considered unshakable.
Challenging traditional barriers
By open-sourcing models and algorithms in quick succession, DeepSeek not only answers outside doubts about whether training its models still requires enormous computing power, but also signals that releasing these core libraries will greatly stimulate innovation among global AI software and hardware teams. For AI model research teams, hardware requirements can be reduced through algorithmic optimization such as low-rank attention compression; for China's AI chip R&D teams, these optimizations can inform the redesign of internal compute units and communication buses, advancing domestic hardware-software co-designed AI systems.
A new era begins
DeepSeek's open-source push has been called the "conscience of the industry": while its direct value to ordinary users is limited, it is an extremely valuable resource for practitioners working on underlying technology. Because the infrastructure optimizations described in the DeepSeek-V3 paper are now all open source, many open-source frameworks can adopt these strategies to further reduce hardware costs, which may usher in a wave of API price cuts and continue to push the industry toward open, transparent development.
Global trends
As a pioneer of open-source large models, DeepSeek's success has sparked a new trend: Baidu, Alibaba, and other leading vendors have announced that they will open-source their large models, showing a shared choice among top enterprises. Shen Xiangyang, Chairman of the Council of the Hong Kong University of Science and Technology and a foreign member of the National Academy of Engineering, pointed out at the 2025 Global Developer Conference (GDC) that although closed source still holds a larger share than open source, this pattern will change drastically in the next year or two. He believes that, through the efforts of Shanghai and other regions, Chinese teams will lead the open-source trend in the future.
Balance open source and closed source
Although open-sourcing large models seems to have become mainstream in China, no unified trend has formed globally. OpenAI, for example, still adheres to a closed-source route, and even DeepSeek holds back parts of its stack, such as its training data and training process. This battle between open source and closed source plays out not only between enterprises but may even rise to the national level. In the digital economy, the cost of copying information is nearly zero, so DeepSeek's choice of open source lets it quickly capture the market, win a large base of active users, and explore other business models for profitability on that foundation. By contrast, the traditional closed-source model must invest heavily in advertising to acquire users.
Competition in AI R&D intensifies
R&D competition in artificial intelligence is becoming increasingly fierce. On February 27, local time, AI giant OpenAI released GPT-4.5 (research preview), claiming it is the company's largest and most capable chat model to date. However, its continued high investment and high cost have sparked widespread controversy.
Challenges and controversies of GPT-4.5
According to public information, developers can call GPT-4.5 directly via the API, but its pricing is significantly higher than its predecessor's: input tokens cost 30 times more than GPT-4, and output tokens 15 times more. OpenAI CEO Altman had wanted to launch GPT-4.5 for Plus and Pro users simultaneously, but due to a GPU shortage the Plus rollout had to be postponed to the following week, after tens of thousands of GPUs were added.
Comparison of open source and closed source development routes
What should we make of OpenAI's new GPT-4.5? Industry expert Wang Wei believes it reflects the difference between the closed-source and open-source development routes: "GPT-4.5 does excel in many benchmark evaluations, but at an enormous cost in computing power and money. From our point of view, its advantages come at a huge price. In contrast, we prefer a sustainable model like DeepSeek."
DeepSeek's low-cost strategy
Meanwhile, DeepSeek continues on its low-cost, high-value route. On February 26, DeepSeek announced a price reduction: 00:30 to 08:30 Beijing time is the daily off-peak period, during which API call prices drop significantly. Specifically, DeepSeek-V3 is reduced to 50% of the original price, and DeepSeek-R1 to 25%. The initiative is designed to encourage users to take advantage of the nightly off-peak window and enjoy a more affordable service experience.