From December 10 to 15, 2024, NeurIPS (the Conference on Neural Information Processing Systems), one of the world's top AI conferences, was held in Vancouver, Canada. In his talk, Ilya Sutskever, OpenAI co-founder, its former chief scientist, and founder of SSI, said that pre-training, the first phase of AI model development, is coming to an end.
He likened data to the fuel of AI development: since we have only one Internet, data growth has peaked, and AI is about to enter a "post-oil era". Pre-trained models that rely on ever more massive data will become unsustainable, and AI development will need new breakthroughs.
The end of the pre-training era and the "agent" transformation of AI models
In artificial intelligence, pre-training, the first stage of model development, is gradually coming to an end. This phase relies on learning patterns from large amounts of unlabeled data drawn from a wide range of sources, including the Internet and books. As available data resources approach saturation, however, the development path of future AI models is shifting.
"We've reached peak data, and there's no more new data coming," Sutskever noted; we must make the most of the data resources we already have. He stressed that the amount of information on the Internet is finite, and "there is only one Internet", which means that future models will need to explore new development paths and extract more from existing data.
In a media interview this past November, Sutskever said that the gains from pre-training large models are gradually leveling off: "The 2010s were the era of scaling, driving advances in model performance by increasing data and computing power. Now we have once again entered an era of exploration and discovery, and finding the next breakthrough has become crucial. It's more important than ever to scale the right thing."
Looking ahead, Sutskever predicts that the next generation of AI models will be truly "agentic": they will not only perform tasks and make decisions autonomously, but also interact with various software environments. This shift marks a big step for AI, from a mere data-processing tool to an agent with autonomous behavior.
In addition, Sutskever revealed that his team at Safe Superintelligence Inc. (SSI) is working on an alternative to scaling traditional pre-training, though details have not yet been announced. This shows that researchers are actively exploring new technical paths to address the challenge of limited data resources and open up new possibilities for AI.
AI self-awareness may be born: from pattern matching to real reasoning
Sutskever predicts that future AI systems will no longer rely solely on pattern matching but will possess genuine reasoning abilities, and may even develop self-awareness. This shift heralds a major leap in the field of artificial intelligence.
Sutskever noted that as AI systems become more capable of reasoning, "they will become more and more unpredictable." He drew an analogy with the performance of advanced AI in chess: "These systems are able to understand things from limited data without getting confused." This means that future AI will not only process large amounts of information but also make sound inferences and decisions in new situations.
He further compared the evolution of AI systems to the brain-to-body ratio in biology, citing research on the brain-weight relationship in different species. Human ancestors showed a unique slope in this ratio that differed from other mammals, suggesting a significant increase in cognitive abilities. "AI may find a similar scaling path beyond the current way of working based on pre-training," Sutskever said.
The nature of superintelligence will be very different from what we know today, and Sutskever took the opportunity to make that difference concrete: "We now have powerful language models that are great chat partners and superhuman at certain tasks, yet they have reliability issues and can sometimes seem confused. How to reconcile this contradiction is not yet clear."
Sutskever emphasized that future AI systems will be highly unpredictable precisely because they will be able to make sense of limited data without getting confused, which is a huge limitation of current AI. "I'm not talking about a specific approach or timeline, I'm just saying that this is going to happen."
When all of these abilities are finally combined with self-awareness (given that self-awareness may be useful for efficiency), we will usher in a whole new form of intelligence. Not only will these systems have incredible capabilities, but the problems they raise will also be completely different from those we are used to.
Multi-hop inference and cross-distribution generalization in large language models (LLMs)
Whether large language models (LLMs) can perform cross-distribution generalization of multi-hop inference cannot be answered with a simple "yes" or "no". First, we need to understand a few key concepts: "cross-distribution generalization", "in-distribution", and what they mean in different contexts.
What is "cross-distribution generalization"?
In machine learning, "generalization" refers to how well a model performs on unseen data. "Cross-distribution generalization" goes a step further: it refers to a model's ability to perform well in a new environment whose data distribution differs from that of the training data. In other words, can the model adapt to new, previously unencountered data patterns and environmental changes?
What does "in-distribution" mean?
"In-distribution" typically refers to data with statistical properties similar to the training data. If the data a model encounters at test time comes from the same probability distribution as the training data, that data is considered "in-distribution". Conversely, if the test data deviates significantly from the distribution of the training data, it is considered "out-of-distribution".
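The in- versus out-of-distribution distinction can be made concrete with a minimal sketch. The code below is a deliberately crude one-dimensional check (a point counts as "in-distribution" if it lies within a few standard deviations of the training mean); real out-of-distribution detection uses far richer tools such as density models or feature-space distances, and the data here is invented for illustration.

```python
import statistics

def fit_stats(train):
    """Record the training distribution's mean and standard deviation."""
    return statistics.mean(train), statistics.stdev(train)

def is_in_distribution(x, mean, std, k=3.0):
    """Crude check: a point is 'in-distribution' if it lies within
    k standard deviations of the training mean."""
    return abs(x - mean) <= k * std

# Hypothetical training data clustered around 10
train = [9.5, 10.1, 10.4, 9.8, 10.2, 9.9, 10.0, 10.3]
mean, std = fit_stats(train)

print(is_in_distribution(10.1, mean, std))  # resembles training data -> True
print(is_in_distribution(25.0, mean, std))  # far outside it -> False
```

The point of the sketch is only that "out-of-distribution" is defined relative to the training data: the value 25.0 is not wrong in itself, it is simply unlike anything the model was fit on.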
The evolution of generalization
Before the rise of deep learning, tasks such as machine translation relied heavily on rule-based techniques such as string matching and n-grams. At that time, "generalization" often meant simply whether the model could handle phrases that did not appear in the training data at all. As the technology advanced, our standard for generalization rose dramatically. Nowadays, even when a model scores highly on a specific task such as a math competition, we must ask whether it has merely memorized what has been discussed on the Internet, or whether it truly generalizes across distributions.
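The old n-gram failure mode described above can be shown in a few lines. This is a toy, unsmoothed bigram count model over an invented three-sentence corpus: any phrase containing a word pair never seen in training gets a score of zero, which is exactly the inability to generalize that the paragraph refers to (real systems mitigated this with smoothing and back-off).

```python
from collections import Counter

def train_bigrams(corpus):
    """Count word bigrams in a training corpus."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            counts[(a, b)] += 1
    return counts

def bigram_score(counts, phrase):
    """An unsmoothed bigram model scores a phrase 0 as soon as it
    contains a single bigram it never saw in training."""
    words = phrase.split()
    score = 1
    for a, b in zip(words, words[1:]):
        score *= counts.get((a, b), 0)
    return score

corpus = ["the cat sat", "the dog ran", "a cat ran"]
counts = train_bigrams(corpus)

print(bigram_score(counts, "the cat ran") > 0)   # every bigram was seen -> True
print(bigram_score(counts, "the bird sat") > 0)  # "the bird" never seen -> False
```

Note that "the cat ran" never appears as a whole sentence in the corpus, yet it scores above zero because each of its bigrams was seen; this recombination of seen fragments was the modest sense of "generalization" of that era.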
Generalization ability of LLMs
Currently, large language models have demonstrated impressive generalization capabilities to some extent. They can excel in a wide variety of tasks and are able to process data that has never been seen before. However, this ability is still somewhat limited. For example, when faced with an entirely new problem situation, the model may rely on similar examples in memory rather than solving the problem through real reasoning. Nonetheless, LLMs have demonstrated greater generalization capabilities than earlier methods, especially when it comes to handling complex tasks.
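To pin down what "multi-hop inference" means, as opposed to retrieving a memorized answer, here is a toy sketch over an explicit, invented mini knowledge base. The target answer (a person's birth country) is never stored as a single fact, so producing it requires composing two stored facts; the entities, relation names, and the `two_hop` helper are all hypothetical.

```python
# Hypothetical mini knowledge base: (entity, relation) -> value
facts = {
    ("Marie", "born_in"): "Warsaw",
    ("Warsaw", "located_in"): "Poland",
    ("Alan", "born_in"): "London",
    ("London", "located_in"): "England",
}

def two_hop(entity, rel1, rel2):
    """Compose two facts, e.g. born_in then located_in gives the birth
    country. The answer is never stored directly, so deriving it requires
    chaining rather than lookup -- the essence of multi-hop inference."""
    mid = facts.get((entity, rel1))
    if mid is None:
        return None
    return facts.get((mid, rel2))

print(two_hop("Marie", "born_in", "located_in"))  # -> "Poland"
print(two_hop("Alan", "born_in", "located_in"))   # -> "England"
```

The cross-distribution question for LLMs is whether they can perform this kind of composition over fact pairs whose specific combination never co-occurred in training, rather than recalling the composed answer from memory.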
Humans vs. models
Compared with humans, there is still a lot of room for improvement in the generalization ability of LLMs. Humans are not only able to flexibly apply knowledge in known contexts, but also to quickly adjust strategies and find solutions in new contexts. While current models are not yet fully up to this level, they are already capable of mimicking human cognitive processes in some ways, especially in multi-hop reasoning and processing complex information.
In short, whether LLMs can generalize multi-hop inference across distributions has no simple yes-or-no answer. We need to look at the question from multiple angles, including how models actually perform, how much they rely on memorized material, and how well they adapt to new situations. As research deepens and the technology matures, there is good reason to believe that future models will achieve greater breakthroughs in generalization.