Europe Pending - The OpenAI Agent that Thinks and Acts

Why quality assurance is now more essential than ever 

by Adam Pettman, Head of Innovation and AI · 18th July 2025

At 2i, we’ve built our reputation on ensuring that digital systems are safe, scalable, and dependable. Now, with the US release of ChatGPT Agent, we’re witnessing a new era of AI - one that blends reasoning with action, and intelligence with autonomy. Fingers crossed this kind of innovation makes its way across the Atlantic as soon as possible. 

This is a huge leap toward agentic AI: systems that can browse the web, run code, access APIs, and complete real-world tasks using their own virtual computers, all under your instruction and supervision. 

But while the technology is impressive, its successful deployment depends on something foundational: robust QA. 

What is ChatGPT Agent? 

ChatGPT Agent merges three previously separate capabilities into a single, unified system: 

  • Conversational fluency (ChatGPT) 

  • Autonomous web interaction (Operator) 

  • Information synthesis and analysis (deep research) 

 

This allows ChatGPT to: 

  • Navigate websites and access content behind logins 

  • Run terminal commands and perform data transformations 

  • Create editable slide decks and spreadsheets 

  • Analyse inboxes, calendars, and dashboards 

  • Carry out complex workflows across multiple steps and tools 

 

It does all this using its own secure virtual machine, maintaining context throughout - shifting seamlessly between browsers, APIs, and code as needed. 

 

Why QA Matters in Agentic AI 

AssureAI: A framework built for this moment 

As AI shifts from providing suggestions to executing tasks, the risks and responsibilities grow in parallel. At 2i, we’ve developed AssureAI: a structured, technology-agnostic framework for evaluating the quality of AI systems. It focuses on four core principles: 

 

Accuracy 

Does the agent consistently produce correct and relevant outputs, even across multi-step tasks? 

We test for factual integrity, hallucination resistance, and contextual consistency across sessions. 

 

Performance 

Is the AI efficient, stable, and responsive under real-world conditions? 

We assess execution speed, resource utilisation, and reliability, especially when AI is integrated into live workflows. 

 

Explainability 

Are the agent’s actions and decisions understandable to humans? 

With ChatGPT Agent acting across multiple interfaces (browsers, terminals, APIs), explainability is essential - for debugging, compliance, and trust. 

 

Robustness 

How well does the AI handle adversarial inputs, unexpected scenarios, or ambiguous instructions? 

We test behaviour under stress - ensuring safe failure modes, resistance to prompt injection, and graceful recovery from edge cases. 
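To make the four pillars concrete, here is a minimal, hypothetical sketch of how an evaluation harness might score a single agent task against them. All names (`run_agent`, the result fields, the thresholds) are illustrative assumptions, not part of AssureAI or any real agent API; a real harness would call the live system where the stub sits.

```python
# Hypothetical AssureAI-style check: score one agent task against
# accuracy, performance, and explainability. (Robustness testing would
# feed adversarial inputs through the same pipeline.)

def run_agent(task: str) -> dict:
    """Stub standing in for the real agent call. Returns the output,
    the named steps taken, and the wall-clock duration."""
    return {"output": "42", "steps": ["parse", "compute"], "duration_s": 0.8}

def check_accuracy(result: dict, expected: str) -> bool:
    # Factual integrity: does the final output match ground truth?
    return result["output"] == expected

def check_performance(result: dict, budget_s: float) -> bool:
    # Responsiveness: did the task finish within its time budget?
    return result["duration_s"] <= budget_s

def check_explainability(result: dict) -> bool:
    # Traceability: every run should expose at least one named step.
    return len(result["steps"]) > 0

def evaluate(task: str, expected: str, budget_s: float = 5.0) -> dict:
    result = run_agent(task)
    return {
        "accuracy": check_accuracy(result, expected),
        "performance": check_performance(result, budget_s),
        "explainability": check_explainability(result),
    }

print(evaluate("What is 6 * 7?", expected="42"))
```

In practice each check would run across many tasks and sessions, with the pass rates feeding a report per pillar rather than a single boolean.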

 

These four pillars underpin all responsible AI adoption. Without them, organisations risk deploying powerful tools that behave unpredictably, erode trust, or compromise security. 

With them, AI becomes a reliable partner, not just a clever assistant. 

 

Transforming Productivity - Safely 

Used wisely, ChatGPT Agent opens up powerful new use cases:

For teams: Automate inbox triage, meeting prep, research, and reporting 

For delivery: Convert dashboards to presentations, summarise logs, or generate test artefacts 

For operations: Plan and book travel, draft documents, and integrate data from disparate systems

 

The best AI systems are designed for collaborative, human-in-the-loop workflows - you can intervene, adjust, or stop tasks at any point. The model also proactively asks for clarification or confirmation where needed. 

This is automation with boundaries - something we believe should be the standard, not the exception. 

 

Trust, Safety, and Control 

OpenAI has equipped ChatGPT Agent with a suite of safety features: 

Permission-based actions: AI asks before making consequential changes (e.g. purchases, emails) 

Secure browser takeover: Data entered during sessions isn’t stored or visible to the model 

Privacy tools: One-click clearing of browsing data and active sessions 

Prompt injection defences: Trained to detect and resist malicious web content designed to manipulate behaviour 
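The "ask before acting" pattern behind permission-based actions can be sketched in a few lines. This is purely illustrative - the action names, the consequential-action set, and the `approve` callback are assumptions for the sketch, not OpenAI's actual interface.

```python
# Illustrative permission gate: consequential actions require explicit
# human approval before they run; everything else proceeds normally.

CONSEQUENTIAL = {"purchase", "send_email"}

def execute(action: str, payload: str, approve) -> str:
    """Run an action, pausing for human sign-off on anything
    consequential. `approve(action, payload)` returns True/False."""
    if action in CONSEQUENTIAL and not approve(action, payload):
        return f"BLOCKED: {action} awaiting user approval"
    return f"DONE: {action}({payload})"

# Human-in-the-loop stand-in that denies everything by default.
result = execute("purchase", "flight LHR->EDI", approve=lambda a, p: False)
print(result)  # BLOCKED: purchase awaiting user approval
```

The design choice worth noting is the default: actions are blocked unless a human approves, rather than allowed unless someone objects.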

 

These are important foundations - but as we advise our clients, they must be combined with organisational QA processes. Testing agentic systems can’t be an afterthought; it must be embedded from the start. 

At 2i, we see this as more than just a new feature. It’s the next frontier of software interaction, and we’re here to make sure it’s done right. 

Want to explore how AI agents could work inside your delivery pipelines, or how to test them before they go live? 

Let’s talk. We’ll help you adopt AI that’s not just powerful - but trusted, tested, and truly fit for purpose. 

 

Start the conversation