Recently, Etched, a Silicon Valley startup, has made waves in AI hardware with an ASIC designed for AI workloads, aiming to give mainstream large-model companies a more cost-effective option for Transformer computation at the lowest level of the architecture.
|20 times faster than NVIDIA's H100
Founded in 2022 by Harvard dropouts Gavin Uberti and Chris Zhu, Etched has developed an ASIC chip called Sohu designed specifically for Transformer models. Etched claims that Sohu runs Llama-3 70B inference 20 times faster than NVIDIA's H100 while consuming significantly less power.
Etched has just raised $120 million in new funding, led by Primary Venture Partners and Positive Sum Ventures, with well-known investors including Peter Thiel, GitHub CEO Thomas Dohmke, and former Coinbase CTO Balaji Srinivasan also participating in the round.
As Transformer models continue to drive breakthroughs in generative AI, the Sohu chip could break NVIDIA's GPU dominance on the inference side and reshape the AI computing landscape.
|What are the advantages of ASICs?
The battle between ASICs and general-purpose compute cards has been going on for a long time, and with the entry of cloud vendors and large OEMs, the competition has intensified.
At present, the main vendor of general-purpose compute cards is NVIDIA, which holds nearly 70% of the AI compute market; the main ASIC vendors are Broadcom and Marvell, which together hold more than 60% of the ASIC market.
For specific task scenarios, ASICs offer the advantages of high performance, low power consumption, cost-effectiveness, confidentiality and security, and reduced board size.
These advantages mainly stem from the two designs' different trade-offs:
• ASIC: an integrated circuit designed for a specific application and optimized for a specific task. ASICs typically beat GPUs on performance, power consumption, and cost, but the downside is a lack of generality.
• General-purpose compute card: provides standardized high computing performance without targeting any specific task scenario; it is suitable for a wide range of applications and offers versatility.
In other words, an ASIC sacrifices generality in exchange for high performance in specific scenarios; a general-purpose compute card is versatile, but in those specific scenarios its performance falls short of an ASIC's.
In fact, different compute-card customers have different needs.
Cloud vendors may care more about elastic computing, while enterprises may focus more on cluster compute. For such specific needs, an ASIC has the advantage over a standard compute card and fits the customer's own usage scenario better.
At present, cloud and hyperscale companies such as Google, Meta, Microsoft, and Amazon are leading this ASIC trend: for example, Google's TPU, Meta's MTIA, Microsoft's Maia, and Amazon's Trainium2.
Note, however, that an ASIC's cost may exceed that of a general-purpose compute card. According to Morgan Stanley's estimates, the TCO (total cost of ownership) of the GB200 is 44% lower than that of TPUv5 and 30% lower than that of Trainium2.
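As a quick sanity check on what those percentages imply: if one chip's TCO is 44% lower than a rival's, the rival's TCO is 1/(1 − 0.44) ≈ 1.8× as high. A minimal sketch of that arithmetic (the percentages are the Morgan Stanley estimates quoted above; the helper function itself is just illustrative):

```python
def relative_tco(reduction: float) -> float:
    """Given that one chip's TCO is `reduction` (as a fraction) lower than
    a rival's, return the rival's TCO as a multiple of this chip's TCO."""
    return 1.0 / (1.0 - reduction)

# Morgan Stanley figures quoted above: GB200 TCO is 44% below TPUv5's
# and 30% below Trainium2's.
tpu_v5_multiple = relative_tco(0.44)     # ~1.79x the GB200's TCO
trainium2_multiple = relative_tco(0.30)  # ~1.43x the GB200's TCO
print(f"TPUv5:     {tpu_v5_multiple:.2f}x GB200 TCO")
print(f"Trainium2: {trainium2_multiple:.2f}x GB200 TCO")
```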
|Optimized specifically for Transformers
GPUs integrate a variety of compute units, including FP64, FP32, integer units, and Tensor Cores designed for deep learning. However, using these resources efficiently to handle arbitrary CUDA code requires highly complex compilation techniques and a huge software investment, and even then the results may be limited.
Etched takes a more focused approach, concentrating on Transformer model operations. This not only simplifies the software stack but also fully exploits the potential of Tensor Cores, boosting AI computing performance in a targeted way. This focus is particularly apt given that most AI companies already use specialized Transformer inference libraries, such as TensorRT-LLM, vLLM, or Hugging Face's TGI, which cover a wide range of industry needs.
Transformer models are highly versatile across application scenarios such as text, image, and video processing: users can flexibly adjust a model's hyperparameters to suit a variety of tasks without significantly modifying the core model architecture.
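To make that concrete, here is a rough, illustrative sketch of how hyperparameters alone scale a Transformer's size while the architecture stays fixed. The weight-count formula (4·d_model² for the Q/K/V/output attention projections plus 2·d_model·d_ff for the feed-forward block) deliberately ignores embeddings, biases, grouped-query attention, and gated MLPs, and the two configurations are hypothetical sizings for illustration, not any vendor's actual models:

```python
def layer_params(d_model: int, d_ff: int) -> int:
    """Approximate weight count of one Transformer layer:
    4*d_model^2 for the Q/K/V/output projections, plus
    2*d_model*d_ff for the feed-forward block (biases/norms ignored)."""
    return 4 * d_model ** 2 + 2 * d_model * d_ff

def model_params(n_layers: int, d_model: int, d_ff: int) -> int:
    """Total weights across all layers (embeddings ignored)."""
    return n_layers * layer_params(d_model, d_ff)

# Two hypothetical configurations of the *same* architecture,
# differing only in hyperparameters:
small = model_params(n_layers=12, d_model=768, d_ff=3072)
large = model_params(n_layers=80, d_model=8192, d_ff=28672)
print(f"small config: {small / 1e6:.0f}M weights")
print(f"large config: {large / 1e9:.0f}B weights")
```

The same kernels serve both configurations; only the matrix shapes change, which is what makes a single Transformer-specialized chip like Sohu plausible across model sizes.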
In response to the customization needs of industry leaders, Etched has removed traditional reverse-engineering hurdles by opening up its software stack from the driver to the kernel level, letting engineers customize the Transformer layers directly for their specific needs and greatly improving the system's customizability and flexibility.
In addition, by reducing the memory footprint, the Sohu chip devotes more transistors to data processing, and its single large-core design avoids the coordination overhead between multiple cores, further optimizing computing efficiency.
|The ASIC competitive landscape is open
At present, the ASIC market's competitive landscape is opening up, most visibly in large cloud providers' active participation in the AI accelerator race through in-house or joint development. According to its latest financial report, Broadcom's networking revenue in the first quarter of fiscal 2024 rose 46% to $3.3 billion, driven mainly by strong demand for custom AI accelerators from two hyperscale customers.
Broadcom now expects AI-related business to account for 35% of its total semiconductor revenue by the end of fiscal 2024, up from its previous estimate of 25% and exceeding $10 billion, of which about 70% comes directly from sales of AI accelerator products.
Recent rumors suggest that ByteDance is working with Broadcom to develop an AI ASIC on a 5nm process, to be manufactured by TSMC, though ByteDance has denied this.
The wide application of ASIC chips, and the reduction in computing costs they bring, is a key path to pushing large AI models into broader industrial use. Unlike the rapid rise of ASICs in the Bitcoin space, the development path of AI chips is likely to be more complex and changeable, passing through multiple cycles of iteration and upgrade. The process looks like this: general-purpose GPUs first enable the exploration and initial formation of new models and algorithms; specialized ASICs then enable large-scale deployment, driving an explosion in market demand; the richer ecosystem attracts more users and innovators, giving rise to more advanced algorithms; and so on, step by step, until the vision of artificial general intelligence (AGI) is realized.