Europe Pending - The OpenAI Agent that Thinks and Acts

Why quality assurance is now more essential than ever 

by Adam Pettman, Head of Innovation and AI · 18th July 2025

At 2i, we’ve built our reputation on ensuring that digital systems are safe, scalable, and dependable. Now, with the US release of ChatGPT Agent, we’re witnessing a new era of AI - one that blends reasoning with action, and intelligence with autonomy. Fingers crossed this kind of innovation makes its way across the Atlantic as soon as possible. 

This is a huge leap toward agentic AI: systems that can browse the web, run code, access APIs, and complete real-world tasks using their own virtual computers, all under your instruction and supervision. 

But while the technology is impressive, its successful deployment depends on something foundational: robust QA. 

What is ChatGPT Agent? 

ChatGPT Agent merges three previously separate capabilities into a single, unified system: 

  • Conversational fluency (ChatGPT) 

  • Autonomous web interaction (Operator) 

  • Information synthesis and analysis (deep research) 

 

This allows ChatGPT to: 

  • Navigate websites and access content behind logins 

  • Run terminal commands and perform data transformations 

  • Create editable slide decks and spreadsheets 

  • Analyse inboxes, calendars, and dashboards 

  • Carry out complex workflows across multiple steps and tools 

 

It does all this using its own secure virtual machine, maintaining context throughout - shifting seamlessly between browsers, APIs, and code as needed. 

 

Why QA Matters in Agentic AI 

AssureAI: A framework built for this moment 

As AI shifts from providing suggestions to executing tasks, the risks and responsibilities grow in parallel. At 2i, we’ve developed AssureAI: a structured, technology-agnostic framework for evaluating the quality of AI systems. It focuses on four core principles: 

 

Accuracy 

Does the agent consistently produce correct and relevant outputs, even across multi-step tasks? 

We test for factual integrity, hallucination resistance, and contextual consistency across sessions. 

 

Performance 

Is the AI efficient, stable, and responsive under real-world conditions? 

We assess execution speed, resource utilisation, and reliability, especially when AI is integrated into live workflows. 

 

Explainability 

Are the agent’s actions and decisions understandable to humans? 

With ChatGPT Agent acting across multiple interfaces (browsers, terminals, APIs), explainability is essential - for debugging, compliance, and trust. 

 

Robustness 

How well does the AI handle adversarial inputs, unexpected scenarios, or ambiguous instructions? 

We test behaviour under stress - ensuring safe failure modes, resistance to prompt injection, and graceful recovery from edge cases. 
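To make the four pillars concrete, here is a minimal, hypothetical sketch of how an evaluation harness might score a single agent task against them. All names (`run_agent`, the result fields, the thresholds) are illustrative assumptions, not part of AssureAI or any real agent API; a real harness would call the live system where the stub sits.

```python
# Hypothetical AssureAI-style check: score one agent task against
# accuracy, performance, and explainability. (Robustness testing would
# feed adversarial inputs through the same pipeline.)

def run_agent(task: str) -> dict:
    """Stub standing in for the real agent call. Returns the output,
    the named steps taken, and the wall-clock duration."""
    return {"output": "42", "steps": ["parse", "compute"], "duration_s": 0.8}

def check_accuracy(result: dict, expected: str) -> bool:
    # Factual integrity: does the final output match ground truth?
    return result["output"] == expected

def check_performance(result: dict, budget_s: float) -> bool:
    # Responsiveness: did the task finish within its time budget?
    return result["duration_s"] <= budget_s

def check_explainability(result: dict) -> bool:
    # Traceability: every run should expose at least one named step.
    return len(result["steps"]) > 0

def evaluate(task: str, expected: str, budget_s: float = 5.0) -> dict:
    result = run_agent(task)
    return {
        "accuracy": check_accuracy(result, expected),
        "performance": check_performance(result, budget_s),
        "explainability": check_explainability(result),
    }

print(evaluate("What is 6 * 7?", expected="42"))
```

In practice each check would run across many tasks and sessions, with the pass rates feeding a report per pillar rather than a single boolean.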

 

These four pillars underpin all responsible AI adoption. Without them, organisations risk deploying powerful tools that behave unpredictably, erode trust, or compromise security. 

With them, AI becomes a reliable partner, not just a clever assistant. 

 

Transforming Productivity - Safely 

Used wisely, ChatGPT Agent opens up powerful new use cases:

For teams: Automate inbox triage, meeting prep, research, and reporting 

For delivery: Convert dashboards to presentations, summarise logs, or generate test artefacts 

For operations: Plan and book travel, draft documents, and integrate data from disparate systems

 

The best AI systems are designed for collaborative, human-in-the-loop workflows - you can intervene, adjust, or stop tasks at any point. The model also proactively asks for clarification or confirmation where needed. 

This is automation with boundaries - something we believe should be the standard, not the exception. 

 

Trust, Safety, and Control 

OpenAI has equipped ChatGPT Agent with a suite of safety features: 

Permission-based actions: AI asks before making consequential changes (e.g. purchases, emails) 

Secure browser takeover: Data entered during sessions isn’t stored or visible to the model 

Privacy tools: One-click clearing of browsing data and active sessions 

Prompt injection defences: Trained to detect and resist malicious web content designed to manipulate behaviour 
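The "ask before acting" pattern behind permission-based actions can be sketched in a few lines. This is purely illustrative - the action names, the consequential-action set, and the `approve` callback are assumptions for the sketch, not OpenAI's actual interface.

```python
# Illustrative permission gate: consequential actions require explicit
# human approval before they run; everything else proceeds normally.

CONSEQUENTIAL = {"purchase", "send_email"}

def execute(action: str, payload: str, approve) -> str:
    """Run an action, pausing for human sign-off on anything
    consequential. `approve(action, payload)` returns True/False."""
    if action in CONSEQUENTIAL and not approve(action, payload):
        return f"BLOCKED: {action} awaiting user approval"
    return f"DONE: {action}({payload})"

# Human-in-the-loop stand-in that denies everything by default.
result = execute("purchase", "flight LHR->EDI", approve=lambda a, p: False)
print(result)  # BLOCKED: purchase awaiting user approval
```

The design choice worth noting is the default: actions are blocked unless a human approves, rather than allowed unless someone objects.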

 

These are important foundations - but as we advise our clients, they must be combined with organisational QA processes. Testing agentic systems can’t be an afterthought; it must be embedded from the start. 

At 2i, we see this as more than just a new feature. It’s the next frontier of software interaction, and we’re here to make sure it’s done right. 

Want to explore how AI agents could work inside your delivery pipelines, or how to test them before they go live? 

Let’s talk. We’ll help you adopt AI that’s not just powerful - but trusted, tested, and truly fit for purpose. 

 

Start the conversation