Hands-on with OpenAI AgentKit: Building a Research + Reporter Workflow in Agent Builder

TL;DR
- I used OpenAI AgentKit with Agent Builder to create a simple Research plus Reporter chain. It turns a topic into keywords, searches the web per keyword, then assembles a short report with sources.
- AgentKit gives you a visual canvas, tools, logging and approvals. You still need basic logic using Common Expression Language, CEL, to make it robust.
- Good fit if you want an OpenAI native starting point that you can later export to code. Less ideal if you need model agnostic orchestration today.
- Watch costs and reliability. Use strict JSON outputs, hard iteration caps and keep chat history off for single turn tool pipelines.
Why this guide, and who it is for
I wanted to see how far you can get with AgentKit before dropping down to a full code build. This post walks through what AgentKit is, the exact workflow I built, and the trade offs I hit along the way. It is practical and neutral, with enough detail that you can recreate it.
Who this is for
- Builders who like a visual canvas, and plan to export to Python or TypeScript later.
- People comfortable with light programming concepts such as loops, counters and JSON shaping.
- Teams who value safety features, user approvals and built in logging without extra plumbing.
What is OpenAI AgentKit
AgentKit is OpenAI’s framework for building and running agents that can use tools, keep state and follow control logic. Agent Builder is the visual canvas that lets you chain models, tools and logic without writing full application code.
Key parts you will touch:
- Models. Choose an OpenAI model for each step, for example an instruction tuned model for structured outputs.
- Tools. Add first party tools like Web Search when you need retrieval. Configure scope and output format.
- Logic and state. Use nodes to set variables, track counters and branch. Expressions are written in Common Expression Language, CEL.
- Safety and governance. Guardrails for unsafe or personal data, plus User Approval gates so a human can confirm sensitive steps.
- Logging and preview. Run the flow on canvas. Each node shows its status and links to logs when things break.
Demo at a glance
The demo is a three agent chain that takes a topic and returns a short, sourced report.
Keyword Agent → Report Agent → Assembler Agent
- Keyword Agent turns a topic into a compact JSON list of keywords.
- Report Agent loops through those keywords, runs Web Search, and extracts 2 to 3 useful facts with URLs.
- Assembler Agent combines the findings into a readable summary with headings and a source list.
I keep state variables for limits, for example `no_of_keywords` and `max_iterations`. I add a User Approval step before doing the searches so I can review the keyword list first.
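Concretely, the data handed between agents looks like this. The shapes are my own convention, not anything AgentKit mandates. The Keyword Agent emits a plain array:

```json
["nsw solar rebates 2025", "solar feed-in tariff changes", "home battery incentives"]
```

And the Report Agent emits one object per keyword:

```json
{
  "keyword": "nsw solar rebates 2025",
  "facts": [
    {"text": "A short, verifiable fact from the page.", "url": "https://example.gov.au/page"}
  ]
}
```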
Recreate this build in 8 steps
- Create a project in AgentKit and open Agent Builder.
- Start variables. Add `topic` as text, `no_of_keywords` as an integer, for example 3, and `max_iterations` as a hard cap, for example 5.
- Keyword Agent. Prompt for a strict JSON array of keywords, a prompt sketch follows this list. Fail if the array is empty or longer than `no_of_keywords`.
- User Approval. Show the keyword list and require a quick human check before continuing.
- Report Agent. For each keyword, call Web Search, prefer recent and credible sources, and extract a few facts plus URLs in JSON. Keep chat history off to reduce context bleed and cost.
- Set State. Track loop counters with CEL so you stop when you reach the caps. If a search fails or returns junk, skip and continue.
- Assembler Agent. Turn the collected JSON into a short report with headings, bullet points and a list of sources.
- Preview and log review. Run on canvas, inspect node logs, tighten prompts and schemas, then export to code if you want to run locally.
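Here is the kind of prompt I give the Keyword Agent. The wording is mine rather than an AgentKit template, and the exact syntax for referencing state variables depends on how Agent Builder exposes them:

```text
You generate search keywords.
Input: a research topic.
Output: a JSON array of at most {no_of_keywords} short keyword phrases.
Rules:
- Return the JSON array only. No prose, no markdown, no explanation.
- Each keyword is 2 to 5 words and directly relevant to the topic.
- If the topic is empty or unclear, return [].
```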
Tiny examples, not a full code dump
- CEL counter idea: `iterations = iterations + 1`, then check `iterations > max_iterations` to exit early, see the snippet below.
- Strict JSON: ask the model for `{"keyword": "", "facts": [{"text": "", "url": ""}]}` and reject any freeform prose.
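Spelled out, the loop guard looks like this. Standard CEL is expression only, so the increment lives in a Set State node and the comparison in a branch condition; `iterations`, `max_iterations` and `keywords` are the state variables from earlier:

```cel
// Set State node: the new value for iterations
iterations + 1

// Branch condition: take the exit path when this is true
iterations >= max_iterations || size(keywords) == 0
```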
Guardrails and User Approval
Guardrails help block jailbreaks, unsafe content and personal information before the flow proceeds. The out of the box patterns cover common numbers and identifiers, including Australian TFNs, Medicare numbers, ABNs and ACNs, plus credit cards and emails. This reduces the chance you accidentally process or output something you should not.
User Approval adds a human in the loop. I place it after the Keyword Agent so I can confirm scope before any web calls. You can add another approval before publishing the final report if needed.
Tip: pair Guardrails with strict output schemas. If the model must return a valid JSON shape, you have fewer places for risky text to sneak through.
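The same strictness carries over if you export to code later. A minimal sketch using pydantic, with `Finding` and `parse_finding` as names I made up to match the JSON shape above:

```python
from pydantic import BaseModel, HttpUrl, ValidationError


class Fact(BaseModel):
    text: str
    url: HttpUrl


class Finding(BaseModel):
    keyword: str
    facts: list[Fact]


def parse_finding(raw: str) -> Finding | None:
    """Accept only the exact JSON shape we asked for, reject everything else."""
    try:
        return Finding.model_validate_json(raw)
    except ValidationError:
        return None  # schema drift becomes a skipped step, not bad data downstream
```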
Costs and performance
You pay for tokens and for any tools you call, for example Web Search. During testing, it is easy to spend more than you expect if you let loops run wild or you keep chat history on when you do not need it.
Practical ways to keep costs down:
- Keep chat history off for single turn, tool heavy pipelines. Turn it on only when you need conversation continuity.
- Use hard iteration caps. Set both a keyword limit and a strict max iteration count.
- Ask for short, strict JSON. The more concise the output, the fewer tokens and the easier the parsing.
- Prefer few, credible sources per keyword instead of long lists.
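Rough arithmetic on the list above helps before a long run. A sketch with prices left as parameters, since rates change and you should check the current ones yourself:

```python
def run_cost(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Estimated dollars for one run; prices are per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Example: 3 keywords, each step roughly 2k in / 500 out, at illustrative prices.
per_step = run_cost(2_000, 500, in_price=2.50, out_price=10.00)
print(f"~${per_step * 3:.4f} per topic, before tool charges")
```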
Debugging and gotchas
A few things cost me time on the first run. Here is the short list, with fixes.
- Infinite or long loops. Set `no_of_keywords` and `max_iterations`. Fail fast if either is exceeded.
- Schema drift. If the model drifts from your JSON shape, add an example JSON block and a single line rule, for example, reject any text outside the JSON.
- Search noise. Ask for 2 to 3 facts only, each with a source URL, and bias to recent results. Be explicit about what to ignore.
- Chat history confusion. Leave it off for non conversational steps. It reduces cost and avoids the model referencing previous keywords when you do not want it to.
- Ambiguity in prompts. Add country or domain hints if you want Australia first, for example prioritise .gov.au, .edu.au and credible news domains.
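In code terms, the loop I want behaves like this Python sketch. `run_web_search` is a hypothetical helper standing in for the Web Search tool call, and `keywords` and `max_iterations` come from the earlier steps:

```python
import json

findings = []
for i, keyword in enumerate(keywords):
    if i >= max_iterations:
        break  # hard cap, the code twin of the CEL guard
    raw = run_web_search(keyword)  # hypothetical wrapper around the search tool
    try:
        finding = json.loads(raw)
    except json.JSONDecodeError:
        continue  # prose or junk instead of JSON: skip this keyword, keep going
    if finding.get("facts"):
        findings.append(finding)
```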
When to export to code
Agent Builder is great for layout, quick iteration and demos. I export when I need one or more of the following:
- Local runs on my laptop or tighter integration with existing code and data.
- Tests, version control and CI, and more control over error handling.
- Model choice beyond OpenAI, or more complex retrieval and ranking logic.
My plan for Part 2 is to export this workflow to TypeScript or Python using the SDK and compare the experience.
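As a preview of Part 2, the exported chain should look roughly like this with the OpenAI Agents SDK in Python. I have not run this exact snippet yet, so treat names and defaults as approximate:

```python
from agents import Agent, Runner, WebSearchTool

keyword_agent = Agent(
    name="Keyword Agent",
    instructions="Turn the topic into a JSON array of at most 3 keywords. JSON only.",
)

report_agent = Agent(
    name="Report Agent",
    instructions="For the given keyword, search the web and return 2 to 3 facts "
                 "with source URLs as strict JSON.",
    tools=[WebSearchTool()],
)

result = Runner.run_sync(keyword_agent, "home battery incentives in Australia")
print(result.final_output)  # the keyword JSON, ready to fan out to report_agent
```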
How AgentKit compares
People often compare AgentKit to visual orchestrators like LangFlow or n8n because the canvases look similar. The positioning is different.
- AgentKit is opinionated around OpenAI services, with strong safety and approvals built in.
- LangFlow or n8n are model agnostic and orchestration first. They are better if you need to mix many providers today.
Choose based on your deployment target and how much you want to customise the stack.
FAQ
Can I use non OpenAI models?
AgentKit is designed for OpenAI models. If you need other providers, you will likely export to code or choose a different orchestrator.
Do I need to learn CEL?
A little. CEL handles small expressions for state and conditionals. If you think in Python, translate your intent into tiny CEL snippets.
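A few direct translations, assuming the same state variables as the build above:

```text
Python: iterations >= max_iterations   ->  CEL: iterations >= max_iterations
Python: len(facts) > 0                 ->  CEL: size(facts) > 0
Python: "topic: " + topic              ->  CEL: "topic: " + topic
```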
How do I keep costs under control?
Short JSON outputs, small contexts, iteration caps and chat history off where possible. Review logs and trim anything noisy.
Can I restrict research to Australia?
Yes. Add an AU bias in the Web Search configuration and in prompts, then filter or post process by domain if needed.
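For the post processing route, the domain check is small. A sketch, assuming each fact carries a `url` field as in the JSON shape earlier, with `facts` already collected:

```python
from urllib.parse import urlparse

AU_SUFFIXES = (".gov.au", ".edu.au", ".com.au", ".org.au")


def is_au_source(url: str) -> bool:
    """Keep results whose hostname ends with an Australian domain suffix."""
    host = urlparse(url).hostname or ""
    return host.endswith(AU_SUFFIXES)


au_facts = [f for f in facts if is_au_source(f["url"])]
```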
Is AgentKit no code?
Not quite. The canvas helps a lot, yet you still need developer style thinking to make outputs reliable.
References and resources
- OpenAI AgentKit overview and node reference, official docs. See the Agent Builder docs and this short announcement, Introducing AgentKit.
- Common Expression Language, CEL, documentation.
- My Part 2, exporting the workflow to code, coming soon.
Note
Tested on 25 Oct 2025 using OpenAI AgentKit and Agent Builder, not sponsored, features may change.