Reinforcement Learning LLM Reasoning

OpenAI finds DeepSeek used its data to train R1 reasoning model

OpenAI believes its data was used to train DeepSeek’s R1 large language model, multiple publications reported today. DeepSeek is a Chinese artificial intelligence provider that develops open-source ...

NextBigFuture

OpenAI o1 Model Sets New Math and Complex Reasoning Records

OpenAI o1 is a new large language model trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long internal chain of thought before responding ...

VentureBeat

Beyond math and coding: New RL framework helps train LLM agents for complex, real-world tasks

Researchers at the University of Science and Technology of China have developed a new reinforcement learning (RL) framework that helps train large language models (LLMs) for complex agentic tasks ...

Hosted on MSN

China's DeepSeek applying trial-and-error learning to its AI 'reasoning'

Chinese AI company DeepSeek has shown it can improve the reasoning of its LLM DeepSeek-R1 through trial-and-error based reinforcement learning, and even be made to ...

Geeky Gadgets

New ChatGPT o1-preview reinforcement learning process explained

OpenAI has introduced its latest AI model, ChatGPT o1, a large language model (LLM) that significantly advances the field of AI reasoning. Leveraging reinforcement learning (RL), o1 represents a leap ...

Science News

A look under the hood of DeepSeek’s AI models doesn’t provide all the answers

It’s been almost a year since DeepSeek made a major AI splash. In January, the Chinese company reported that one of its large language models rivaled an OpenAI counterpart on math and coding ...

mccormick.northwestern.edu

Training Reasoning Agents in Interactive, Complex Environments

Chatbots can make quick work of routine e-commerce customer service tasks and information retrieval. Sephora’s Smart Skin Scan, for example, provides personalized product recommendations, while Lowe’s ...
