ChatGPT Agent will be able to control a computer

OpenAI has taken another step in developing autonomous AI agents that can act almost like virtual assistants. A new tool called ChatGPT Agent can perform tasks on behalf of a user, using its own “virtual computer.”

What ChatGPT Agent can do

The agent is built on a specially trained model that can:

search for information on websites and filter results;

run code and analyze data;

create spreadsheets, presentations and reports;

manage a calendar and schedule meetings;

book restaurants, make purchases and perform routine tasks.

All of the agent’s work is done on a built-in virtual computer: it can download and process files, execute commands in a terminal, and view results in a visual browser.

Technology and Training

The ChatGPT Agent model was trained on complex tasks that require using several tools at once: text and visual browsers, a terminal, and support for importing user data. Training relied on reinforcement learning, similar to the approach used for reasoning models.

ChatGPT Agent combines the capabilities of two previous OpenAI products: Operator and Deep Research. The project is run by a team of 20 to 35 people spanning research and product.

Application examples

During the demo, the agent planned a meeting by checking availability in Google Calendar, selecting a restaurant through OpenTable, and making a reservation. The user could adjust preferences along the way. Another example was preparing a detailed report on the popularity of Labubu toys compared with Beanie Babies.

The agent is also suitable for online shopping and recurring small tasks, such as requesting a parking space on a regular schedule.

Features and Security

ChatGPT Agent has access not only to a browser but to a full-fledged virtual computer, which significantly expands its capabilities. It is not very fast, however: complex tasks can take 15 to 30 minutes, though this still saves the user time.

To prevent unwanted actions, the agent always asks for permission before sending emails or making reservations. On financial sites, it switches to an observation mode: it works only in the current tab and stops when the user switches away.
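
For readers who want to picture the permission step, here is a minimal Python sketch of a confirmation gate. It is an illustration only, not OpenAI's implementation: the names `Action`, `CONSEQUENTIAL`, and `run_action`, as well as the set of action kinds, are hypothetical and were chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds that require explicit approval.
# This list is an assumption for illustration, not OpenAI's actual policy.
CONSEQUENTIAL = {"send_email", "make_reservation"}


@dataclass
class Action:
    kind: str         # e.g. "search", "send_email"
    description: str  # human-readable summary shown to the user


def run_action(
    action: Action,
    confirm: Callable[[Action], bool],
    execute: Callable[[Action], str],
) -> str:
    """Run the action, asking for approval first if it is consequential."""
    if action.kind in CONSEQUENTIAL and not confirm(action):
        # The user declined, so nothing is executed.
        return f"skipped: {action.description}"
    return execute(action)


if __name__ == "__main__":
    # Stand-in callbacks: deny every request and pretend to execute.
    deny_all = lambda a: False
    fake_execute = lambda a: f"done: {a.description}"

    print(run_action(Action("search", "find restaurants nearby"), deny_all, fake_execute))
    print(run_action(Action("make_reservation", "book a table for two"), deny_all, fake_execute))
```

In this sketch, harmless actions such as searching run without interruption, while a reservation is carried out only if the `confirm` callback returns `True`. A real agent would replace the stand-in callbacks with a prompt to the user and an actual tool call.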

Availability