OpenAI held its spring launch event, centered on the release of the GPT-4o large model, which delivers performance at the GPT-4 Turbo level and supports real-time multimodal interaction. GPT-4o and a number of features previously reserved for paid ChatGPT subscribers will be opened to all users for free, and desktop and mobile applications will be launched; the improved ease of use is expected to keep driving user growth.
End-to-end multimodal model greatly reduces latency
As an iteration of GPT-4, GPT-4o accepts any combination of text, audio, and images as input and generates any combination of text, audio, and images as output. Latency drops sharply: in voice mode, GPT-4o responds in as little as 232 ms and in 320 ms on average, similar to human response times in conversation, whereas the earlier voice modes built on GPT-3.5 and GPT-4 averaged 2.8 seconds and 5.4 seconds, respectively. We attribute the latency reduction to full-stack optimization: OpenAI says it has spent the past two years improving efficiency at every layer of the stack.
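As a concrete example, the snippet below sketches a mixed text-and-image request to GPT-4o through OpenAI's public Chat Completions API. The prompt and image URL are placeholders; note that at launch the public API exposed text and image input only, with audio support to follow.

```python
# Minimal sketch: a mixed text + image request to GPT-4o via the
# public Chat Completions API. Prompt and image URL are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is in this photo."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```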
GPT-4o is an end-to-end model: inputs and outputs across all modalities are handled by a single neural network. In GPT-4, by contrast, voice mode was a pipeline of three separate models, one transcribing audio to text, one taking text in and producing text out, and one converting that text back into audio. This cost GPT-4 a great deal of information: it could not directly observe pitch, multiple speakers, or background noise, nor output laughter, singing, or emotional expression.
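To make the contrast concrete, here is a minimal sketch of such a three-stage voice pipeline assembled from OpenAI's public endpoints. The model names (whisper-1, gpt-4, tts-1) are illustrative stand-ins, not a confirmed reconstruction of ChatGPT's internal voice mode.

```python
# Sketch of a three-stage voice pipeline (pre-GPT-4o style) built
# from public endpoints. Only plain text crosses each stage boundary,
# which is exactly where information is lost.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def legacy_voice_turn(audio_path: str) -> bytes:
    # Stage 1: speech-to-text. Pitch, speaker identity, and
    # background sound are discarded at this boundary.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",  # illustrative stand-in
            file=f,
        )

    # Stage 2: text-in, text-out reasoning on the transcript alone.
    reply = client.chat.completions.create(
        model="gpt-4",  # illustrative stand-in
        messages=[{"role": "user", "content": transcript.text}],
    )

    # Stage 3: text-to-speech. The synthesized voice cannot laugh or
    # sing, since the text it reads back carries no such cues.
    speech = client.audio.speech.create(
        model="tts-1",  # illustrative stand-in
        voice="alloy",
        input=reply.choices[0].message.content,
    )
    return speech.content  # raw audio bytes
```

Each hand-off also adds a serial network round trip, which is consistent with the multi-second latencies quoted above; a single end-to-end network removes both the lossy boundaries and two of the three round trips.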
GPT-4o will be free and open, and the number of users is expected to surge
With a focus on advancing AI technology and ensuring that everyone can benefit from it, OpenAI will open GPT-4o directly to free users. Specifically, ChatGPT free users will be able to use GPT-4o to:
Experience GPT-4-level intelligence.
Get responses from both the model and the web.
Analyze data and create charts.
Chat about photos they take.
Upload files for help with summarizing, writing, or analyzing.
Discover and use GPTs and the GPT Store.
Plus users, for their part, get a message cap up to five times that of free users, and will gain access to the alpha of the new GPT-4o-powered voice mode in the coming weeks. Opening GPT-4o to free users is expected to accelerate the penetration of AI products and drive a surge in user numbers, gradually building out an AI ecosystem from which related products stand to benefit directly.
Tech giants' anxiety and the on-device revolution
The market has sensed the opportunity the technology brings: the AI agent has become a frontier that tech giants are racing to pursue. A number of general-purpose agent-concept products already exist; Microsoft's Copilot and Google's Gemini are among the most anticipated digital assistants in this cohort.
Now, the accelerated arrival of GPT-4o has transformed the performance of intelligent interaction. Hu Yanping, founder of the DCCI Internet Data Center, believes that GPT-4o redefines how machines interact through vision, hearing, and the camera, and that its explosive application potential is immeasurable.
Advances in the new technology have added fuel to the giants' fear of missing out. Google is reported to be planning to launch a personalized digital assistant, "Pixie," at its 2024 I/O developer conference; powered by Gemini, it is expected to integrate multimodal capabilities.
Microsoft is also gearing up to develop new scenarios for AI agents. To build a personalized chatbot, it reached an agreement with Inflection AI on March 19 to license that company's core technology; and for Copilot, its product closest to a working AI agent, Microsoft is building an auto-completion feature into Copilot for Microsoft 365 to help users write effective prompts for generative AI.
Jingtai is optimistic about the landing of large AI models
GPT-4o greatly improves the human-computer interaction experience and has broad room to land on hardware products such as mobile phones, smart wearables, smart home devices, and PCs. Recently, we have noted an acceleration in the hardware implementation of large AI models:
1) According to a Bloomberg report, Apple and OpenAI are close to reaching an agreement to bring a chatbot to iOS. Investors are advised to watch Apple's WWDC on June 11, 2024.
2) According to reports from The Information, Meta is exploring the development of AI headsets with cameras, which it hopes can recognize objects and translate foreign languages. When Meta released Llama 3 in April 2024, it announced that its Ray-Ban smart glasses would be equipped with Llama 3, supporting functions such as text translation, live video, and object recognition.