In the bustling corridors of OpenAI, a quiet revolution has been brewing, spearheaded by researchers like Hunter Lightman. In the wake of ChatGPT’s meteoric rise in 2022, Lightman and his team, MathGen, were laying the groundwork for something even more ambitious: AI agents capable of performing tasks on a computer with human-like reasoning. Their initial focus? Mastering high school math competitions.
The journey, as Lightman explained to hustlerwords.com, was about pushing the boundaries of mathematical reasoning, an area where AI models of the time faltered. Today, MathGen’s work is considered pivotal to OpenAI’s quest to build AI reasoning models, the very heart of AI agents.

While OpenAI’s AI systems aren’t flawless – still prone to hallucinations and struggling with intricate tasks – their progress in mathematical reasoning is undeniable. One of their models even clinched a gold medal at the International Math Olympiad, a testament to their advancements. OpenAI believes these reasoning skills are transferable, paving the way for the general-purpose agents they’ve long envisioned.

Unlike ChatGPT, a serendipitous discovery turned consumer sensation, OpenAI’s agents are the result of years of dedicated effort. Sam Altman, OpenAI’s CEO, envisions a future where computers handle tasks on demand, a concept he highlighted at the company’s first developer conference in 2023.
The release of OpenAI’s first AI reasoning model, o1, in the fall of 2024, sent shockwaves through the industry. The 21 researchers behind this breakthrough have become Silicon Valley’s most sought-after talent, with Mark Zuckerberg luring five of them to Meta with lucrative compensation packages, including Shengjia Zhao, now chief scientist of Meta Superintelligence Labs.
The rise of OpenAI’s reasoning models and agents is intertwined with reinforcement learning (RL), a machine learning technique that provides feedback to AI models in simulated environments. While RL has been around for decades, OpenAI has been instrumental in harnessing its potential.
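The core RL loop described above can be boiled down to a toy. The sketch below is purely illustrative and reflects nothing about OpenAI's actual systems: an agent on a five-cell line learns, from reward feedback alone, to walk right toward a goal state, using tabular Q-learning in a simulated environment.

```python
import random

# Toy illustration of reinforcement learning: an agent improves from reward
# feedback in a simulated environment. Everything here is invented for the
# example (a 5-cell line world); it is not OpenAI's method.

N_STATES = 5           # cells 0..4; cell 4 is the rewarded goal
ACTIONS = [1, -1]      # step right or left (right listed first to break ties)
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

def step(state, action):
    """Simulated environment: move, clamp to the line, reward at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def train(episodes=300, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            nxt, reward, done = step(s, a)
            best_next = max(q[(nxt, act)] for act in ACTIONS)
            q[(s, a)] += ALPHA * (reward + GAMMA * best_next - q[(s, a)])
            s = nxt
    return q

q = train()
# Greedy policy after training: walk right from every non-goal cell.
policy = [max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES)]
```

The feedback signal here is the single reward at the goal; the agent discovers the right behavior without ever being shown it, which is the property that makes RL attractive for training agents.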
Early on, Andrej Karpathy, one of OpenAI’s first employees, explored using RL to create an AI agent capable of using a computer. However, it took years to develop the necessary models and training techniques.
In 2018, OpenAI introduced its first large language model in the GPT series, pre-trained on vast amounts of internet data. While GPT models excelled at text processing, they struggled with basic math.
The breakthrough came in 2023 with "Q*" (later renamed "Strawberry"), which combined LLMs, RL, and test-time computation, giving models extra time and computing power to plan, verify their steps, and backtrack before answering. This work led to the "chain-of-thought" (CoT) approach, improving AI’s performance on unseen math questions.
El Kishky, an OpenAI researcher, described the experience as witnessing the model reason, notice mistakes, and backtrack, akin to reading a person’s thoughts. OpenAI realized that the planning and fact-checking abilities of AI reasoning models could power AI agents.
Lightman described it as a pivotal moment, solving a problem he had been grappling with for years.
OpenAI identified two key levers for improvement: scaling up compute during post-training, and test-time computation, which gives models more time and processing power while answering questions.
Following the Strawberry breakthrough, OpenAI formed an "Agents" team, led by Daniel Selsam, to advance this new paradigm. The team’s work eventually became part of a larger project to develop the o1 reasoning model, led by Ilya Sutskever, Mark Chen, and Jakub Pachocki.
Securing resources for o1 required researchers to demonstrate breakthroughs. Lightman emphasized that research at OpenAI is "bottom up," with the company supporting promising ideas.
Some former employees believe that OpenAI’s focus on developing the smartest AI models, rather than products, allowed them to prioritize o1. By late 2024, other AI labs were seeing diminishing returns on traditional pretraining scaling, making OpenAI’s approach prescient.
While the goal of AI research is to recreate human intelligence, El Kishky defines reasoning in terms of computer science: teaching the model to efficiently expend compute to get an answer. Lightman focuses on the model’s results, regardless of their relation to human brains.
OpenAI’s researchers acknowledge that their definitions of reasoning may be debated, but argue that the capabilities of their models are more important. Nathan Lambert, an AI researcher at the Allen Institute for AI (AI2), compares AI reasoning models to airplanes: both were inspired by nature but operate through different mechanisms.
A recent position paper by researchers from OpenAI, Anthropic, and Google DeepMind suggests that AI reasoning models are not well understood and require further research.
Today’s AI agents excel in well-defined domains like coding. However, general-purpose AI agents struggle with complex, subjective tasks.
Lightman believes that the limitations of agents on subjective tasks are a "data problem," and OpenAI is exploring ways to train on less verifiable tasks. Noam Brown, another OpenAI researcher, said that OpenAI has new general-purpose RL techniques that allow them to teach AI models skills that aren’t easily verified.
OpenAI’s IMO model, which spawns multiple agents to explore ideas simultaneously before settling on an answer, reflects a technique that is gaining traction. Brown believes these models will become more capable in math and other reasoning areas.
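The spawn-many-agents idea can be sketched in miniature: several workers explore disjoint parts of a search space in parallel, and the best find wins. Everything below (the toy objective, the slicing scheme) is invented for illustration and says nothing about how OpenAI's IMO system actually works.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of parallel exploration: multiple "agents" attack the
# same problem from different angles at once, then the best result is kept.
# Maximizing a toy function on a grid stands in for exploring ideas.

def bumpy(x):
    """Invented objective with a unique maximum at x = 7."""
    return -(x - 7) ** 2 + (3 if x % 5 == 0 else 0)

def agent(search_range):
    """One agent exhaustively explores its assigned slice of the space."""
    best = max(search_range, key=bumpy)
    return best, bumpy(best)

def parallel_explore(n_agents=4, space=range(0, 40)):
    # Partition the search space: agent i takes every n_agents-th point.
    slices = [space[i::n_agents] for i in range(n_agents)]
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        results = list(pool.map(agent, slices))
    return max(results, key=lambda r: r[1])   # best candidate overall

best_x, best_v = parallel_explore()
```

The payoff of the pattern is coverage: no single agent has to explore everything, and a good idea found in any slice survives the final selection step.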
These techniques may improve OpenAI’s upcoming GPT-5 model, which the company hopes will dominate the market and power agents for developers and consumers.
El Kishky says OpenAI wants to develop AI agents that intuitively understand users’ needs, without requiring specific settings. The ultimate goal is an agent that can do anything on the internet for you, a vision that guides OpenAI’s research.
While OpenAI led the AI industry a few years ago, it now faces competition. The question is whether OpenAI can deliver its agentic future before its rivals.