Everyone’s selling you “AI agents.” A lot of them are basically Dynamo graphs or scripted integrations with a chatbot bolted on. Here’s how to tell the difference—and why it matters for what you’re actually buying.
The Uncomfortable Truth About “Agents”
Walk into any AEC tech conference right now and you’ll hear “AI agent” every five minutes. Then ask vendors to show what happens when their “agent” encounters something it wasn’t explicitly set up for—outside the polished demo path. In many cases, you’ll see the seams immediately.
What’s happening is simple: “AI agent” sounds a lot sexier than “workflow” or “scripted automation.” So nearly anything that touches an LLM—chatbots, templated automations, if-then-else chains with a GPT call in the middle—gets rebranded as an “agent.”
That’s not just annoying marketing inflation. If you can’t distinguish a workflow from an agent, you can’t reason clearly about risk (what can this thing actually do on its own?), you can’t reason clearly about value (what am I actually paying for?), and you can’t design the right solution for the job.
So let’s draw the line.
Workflows vs. Agents: The Real Distinction
A workflow is a fixed sequence of steps. You define the path upfront. Code (or a graph) orchestrates what happens next. Anthropic defines workflows as “systems where LLMs and tools are orchestrated through predefined code paths,” and LangGraph’s team echoes the same distinction—workflows are more deterministic, with predetermined control flow.
Think: a Dynamo graph that checks room names against a standard. A Python script that exports parameters to Excel in a fixed format. A LangGraph flow where nodes and edges are hard-coded.
These are workflows. The logic path is deterministic: run it twice on the same input and the sequence of operations is the same, even if an LLM call inside might add some textual variation.
An agent, by contrast, maintains a plan → act → observe → adjust loop. Anthropic defines agents as “systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.”
The key features: it decides what to do next based on what just happened. It can change course mid-execution. It can choose different tools or strategies than last time for the same kind of input.
Practical test: Can it productively handle a situation you didn’t explicitly enumerate in your control flow?
A workflow hits an unknown case and falls off the edge: “I don’t know what to do.” An agent can choose a new strategy, tool, or plan based on context—even if you didn’t pre-wire that specific path.
Right now, many of the tools marketed as “agents” in AEC are actually workflows with some LLM steps sprinkled in. That’s not bad—workflows are often exactly what you want. You just need to know which you’re buying.
Even OpenAI Says: Start With Workflows
Here’s the part that should make every BIM manager and digital practice lead feel vindicated: OpenAI explicitly tells you to only build agents when your use case clearly meets specific criteria.
In their Practical Guide to Building Agents, OpenAI says agents are best for workflows that:
- Require nuanced judgment or exceptions – Complex decision-making where rigid rule systems break down
- Have rules that are painful to maintain – When your automation logic has exploded into an unwieldy ruleset
- Depend heavily on unstructured data – Where the core work is parsing and interpreting messy documents
And then they add this critical line: “Before committing to building an agent, validate that your use case can meet these criteria clearly. Otherwise, a deterministic solution may suffice.”
Translation: if your BIM standards checking, parameter validation, or handover data export can be expressed as a stable ruleset—which most of it can—you don’t need an agent. You need a workflow.
OpenAI also defines what an agent actually is: model + tools + instructions. Not “fancy prompting.” Not “multi-step automation.” An actual reasoning engine that can call functions and make decisions based on explicit operating rules.
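That triad can be pictured as a tiny data structure. This is a hedged sketch only — the class and field names below are illustrative, not OpenAI’s actual SDK:

```python
from dataclasses import dataclass, field
from typing import Callable

# A minimal sketch of the "model + tools + instructions" triad.
# All names here are illustrative, not any vendor's real API.

@dataclass
class Agent:
    model: str                     # which LLM does the reasoning
    instructions: str              # explicit operating rules the model must follow
    tools: dict[str, Callable] = field(default_factory=dict)  # callable functions

    def register_tool(self, name: str, fn: Callable) -> None:
        self.tools[name] = fn

agent = Agent(
    model="some-llm",
    instructions="Flag rooms that violate the firm standard; never rename without approval.",
)
agent.register_tool("check_room_parameters", lambda room: bool(room.get("Name")))
```

The point of writing it down this starkly: if a product can’t show you all three parts — which model reasons, which functions it may call, and what its operating rules are — it probably isn’t an agent in this sense.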
The Spectrum: Predictability vs. Flexibility
LangGraph’s creators give us a useful mental model: as you move from workflows toward agents, you trade predictability for flexibility.
Their “How to think about agent frameworks” post is explicit: “As your system becomes more agentic, it will become less predictable.” They frame “Predictability vs agency” as the core trade-off, and say that for many applications, workflows are simpler, cheaper, faster, and better—and that most agentic systems are a combination of workflows and agents.
They also quote the same OpenAI line and explicitly say: “You should use workflows when you can use workflows. Most agentic systems are a combination.”
The key insight: both OpenAI and LangGraph agree on two big points:
- Simple, rule-based workflows are often the right choice. Not every use case needs an agent.
- Too much model freedom hurts reliability. If you let the model roam without constraints, your system becomes harder to trust.
For AEC—where regulation, liability, and client trust demand consistent, repeatable behavior—this is critical. You want to position your automation on the predictability side of that spectrum unless you have a genuinely ambiguous, exception-heavy problem that can’t be solved any other way.
Andrew Ng’s Four Patterns: What Actually Makes Something “Agentic”
Andrew Ng has been teaching a useful framework for understanding what makes workflows “agentic” beyond just “it uses AI.” He identifies four core patterns:
1. Reflection
The agent reviews its own outputs and asks “is this good enough?” before returning an answer. Think: self-critique loops for code, reports, or design options.
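The reflection pattern is easiest to see as a loop. In this sketch, `generate()` and `critique()` stand in for LLM calls and are stubbed with toy logic — only the control flow is the point:

```python
# Toy sketch of the "reflection" pattern: generate, self-critique, revise.
# generate() and critique() are stand-ins for LLM calls, stubbed here.

def generate(task: str, feedback: str = "") -> str:
    # Stub: a real system would prompt an LLM; this just marks revisions.
    return f"draft for {task}" + (" (revised)" if feedback else "")

def critique(draft: str) -> str:
    # Stub critic: empty string means "good enough", anything else is feedback.
    return "" if "(revised)" in draft else "needs tightening"

def reflect_loop(task: str, max_rounds: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(draft)
        if not feedback:          # critic is satisfied -> stop iterating
            break
        draft = generate(task, feedback)
    return draft

print(reflect_loop("clash report summary"))
# -> "draft for clash report summary (revised)"
```

Note that the loop is bounded (`max_rounds`) — unbounded self-critique is one of the ways agentic systems burn tokens without converging.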
2. Tool Use
Agents call external tools (search, databases, calculators, APIs) instead of hallucinating capabilities. This is the difference between “tell me the area” and “calculate the area using the model data.”
3. Planning
The agent breaks tasks into steps, sequences them, and executes iteratively instead of “one big prompt → one big answer.”
4. Multi-agent Collaboration
Multiple specialized agents work together—coder + reviewer, researcher + summarizer—on different aspects of a task.
Ng emphasizes that agentic workflows are iterative and back-and-forth: “Instead of having an LLM generate its final output directly, an agentic workflow prompts the LLM multiple times.” The LLM asks clarifying questions, revises, and refines instead of trying to be “perfect in one shot.”
If a system isn’t doing at least some of these patterns—if it’s just running a fixed sequence of prompts—it’s probably not an “agent.” It’s a script with a marketing budget.
The Technical Reality: What Actually Makes Something an Agent
Let’s strip the jargon.
Workflows: Prompt Chaining in a Fixed Skeleton
A typical “AI workflow” looks like this: LLM call to extract structured data → code checks the data and branches → LLM call to generate a report. The sequence of steps is fixed. You can add gates and branches, but all of them are enumerated in advance. Anthropic explicitly classifies prompt-chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer as workflow patterns.
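The skeleton above can be sketched in a few lines. Here `llm()` is a stub standing in for a real model call, and the extracted room data is hard-coded for illustration — what matters is that the control flow is fixed code, not model choice:

```python
# Sketch of the workflow pattern: LLM extract -> code gate -> LLM report.
# llm() and the extracted data are stubs; the fixed control flow is the point.

def llm(prompt: str) -> str:
    return f"[model output for: {prompt[:30]}]"   # stub for a real model call

def extract_rooms(raw_text: str) -> list[dict]:
    llm(f"Extract room data as JSON: {raw_text}")  # LLM step 1 (stubbed)
    return [{"name": "101 Office", "area": 12.5}, {"name": "x", "area": 0}]

def validate(rooms: list[dict]) -> list[dict]:
    # Deterministic gate: plain code decides what passes, not the model.
    return [r for r in rooms if len(r["name"]) >= 3 and r["area"] > 0]

def report(valid: list[dict]) -> str:
    return llm(f"Write a QA summary for {len(valid)} valid rooms")  # LLM step 2

def run_workflow(raw_text: str) -> str:
    # The sequence is fixed: extract -> validate -> report, every run.
    return report(validate(extract_rooms(raw_text)))
```

Run it twice on the same input and the same three steps fire in the same order — only the text inside the LLM calls can vary.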
Agents: Decision Loops With State
A minimal “agent loop” looks like:
    while not task_complete:
        observe current state
        decide next action
        execute action
        evaluate result
        update plan
The loop itself is simple; the hard part is that “decide next action” is genuinely open-ended: it might decide to read different data sources than last time, choose a new tool or strategy in response to a failure, or stop early because it decides the value isn’t worth the risk.
Tool-Calling: The Bridge
Modern LLMs can call functions: check_room_parameters(), update_element_parameter(), query_spec().
In a workflow, your code decides which tool to call and when. In an agent, the model decides which tools to call next based on what it just observed—within the set you’ve exposed.
The core difference: Agents maintain state and adapt their strategy. Workflows execute a predetermined strategy.
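That difference fits in a dozen lines. In this hedged sketch the tool names echo the ones mentioned above but are placeholders, and `model_chooses_tool()` is a stub for the model’s decision:

```python
# Sketch of the tool-calling difference: in a workflow, code picks the tool;
# in an agent, the model picks from an exposed set. Tool names are illustrative.

TOOLS = {
    "check_room_parameters": lambda ctx: f"checked {ctx}",
    "query_spec": lambda ctx: f"spec for {ctx}",
}

def workflow_step(ctx: str) -> str:
    # Workflow: the tool choice is hard-coded by the author.
    return TOOLS["check_room_parameters"](ctx)

def model_chooses_tool(observation: str) -> str:
    # Stub for the model's decision; a real agent sees tool schemas in context.
    return "query_spec" if "spec" in observation else "check_room_parameters"

def agent_step(observation: str, ctx: str) -> str:
    # Agent: the model decides what to call next, but only within
    # the tool set you exposed -- that set is your safety boundary.
    name = model_chooses_tool(observation)
    return TOOLS[name](ctx)
```

Either way, `TOOLS` is the contract: the agent’s freedom ends at the edge of the functions you chose to register.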
Why Most AEC Tasks Don’t Need Agents
Here’s the uncomfortable but important part: most of what AEC needs is better served by workflows plus LLMs, not fully autonomous agents.
Anthropic themselves say: “We recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all… Workflows offer predictability and consistency for well-defined tasks, whereas agents are the better option when flexibility and model-driven decision-making are needed at scale.”
Workflow territory (rule-based, repeatable):
Standards checking – You know the naming rules, required parameters, valid values. That’s a deterministic ruleset with a clear mapping from violation → fix. Great workflow territory. Your firm’s BIM standards aren’t ambiguous—they’re just tedious to enforce manually.
Data export / handover – IFC / COBie / owner-specific Excel templates. Schema is known; mapping is known. A workflow with maybe some LLM sugar to normalize labels. The owner’s EIR isn’t a mystery—it’s a spec you need to hit reliably, every time.
Routing RFIs or submittals – Once you define the routing logic (discipline, CSI division, contract responsibility), it’s deterministic. You don’t need an agent to “figure out” who should answer an RFI about structural steel—you need a lookup table and some basic parsing.
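To make the “workflow territory” claim concrete, here is a minimal standards check as a deterministic ruleset. The naming pattern and required parameters are invented placeholders, not any real firm standard:

```python
import re

# Minimal sketch of standards checking as a deterministic ruleset.
# The pattern and required parameters are illustrative placeholders.

ROOM_NAME = re.compile(r"^\d{3} [A-Z][a-z]+")   # e.g. "101 Office"
REQUIRED = {"Name", "Department", "Area"}

def check_room(room: dict) -> list[str]:
    violations = []
    missing = REQUIRED - room.keys()
    if missing:
        violations.append(f"missing parameters: {sorted(missing)}")
    if "Name" in room and not ROOM_NAME.match(room["Name"]):
        violations.append(f"bad name: {room['Name']!r}")
    return violations

# Same input -> same violations, every run. No model in the loop.
```

An LLM can still earn its keep at the edges — phrasing the violation report, or normalizing a messy label before the check — but the pass/fail decision stays in code you can unit-test.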
Agent territory (genuinely ambiguous path):
Ambiguous interpretation tasks – “Does this detail actually satisfy the fire-rating intent, given the spec and the adjacent assemblies?” That involves nuanced interpretation of drawings, specs, and prior decisions. There’s no single “right” answer you can encode upfront.
Multi-step problem solving with unknowns – “Figure out why the same clashes keep reappearing even after we ‘fixed’ them. Is the issue modeling standards, sequencing, or a bad federated setup?” The diagnostic path isn’t predetermined—you need something that can explore and adapt.
Adaptive monitoring / triage – “Watch this stream of model commits and DM me when patterns appear that historically led to RFIs or change orders.” You can’t enumerate every warning sign upfront; you need pattern recognition and judgment.
These are places where you can’t easily enumerate every path upfront, where context and judgment matter more than strict repeatability.
For the rest—which is the bulk of BIM QA/QC, data plumbing, and document grunt work—you want deterministic workflows with LLMs in the loop where language/semantics add value, and a human firmly in control of any high-liability changes.
The Reliability Problem: Why This Matters for Your Firm
Production AI guides emphasize what digital practice leaders already know from painful experience: non-determinism and hallucinations become catastrophic when an agent acts.
When a chatbot hallucinates, you roll your eyes and rephrase the question. When an agent hallucinates and changes your model or sends an email to the owner, you have a liability problem.
Production guidance consistently recommends:
- Offload precision tasks to deterministic tools – Don’t rely on the LLM for mathematical calculations, date comparisons, or structured data retrieval. Use calculators, database queries, and rule engines.
- Use the LLM as the reasoning brain, not the execution engine
- Implement human-in-the-loop for high-impact decisions
- Instrument every step with full observability – You need complete traces of prompts, tool calls, observations, and intermediate reasoning to debug agents
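The first recommendation — offload precision to deterministic tools — looks like this in miniature. `llm_draft()` is a stub for a real model call; the names are illustrative:

```python
from datetime import date

# Sketch of "offload precision tasks": dates and arithmetic go through plain
# code; the LLM (stubbed here) only drafts the human-facing wording.

def days_overdue(due: date, today: date) -> int:
    return max(0, (today - due).days)        # deterministic, unit-testable

def llm_draft(facts: str) -> str:
    return f"Draft notice: {facts}"          # stub for a real model call

def rfi_notice(due: date, today: date) -> str:
    overdue = days_overdue(due, today)       # the number comes from code,
    return llm_draft(f"RFI is {overdue} days overdue")  # not from the model
```

If the notice ever states the wrong number of days, you debug a five-line date function, not a prompt.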
For AEC, this maps directly to your reality: you can’t let an “agent” quietly change fire ratings, delete elements, or alter coordination models without a clear audit trail and rollback capability. The stakes are too high.
The Audit Question
Here’s the test to run on any “AI agent” pitch: “Show me the log of what it did and why, step by step.”
For a workflow, a good log will show: Step 1: called function A → got this result. Step 2: applied rule B → flagged 37 violations. Step 3: wrote report to location C.
For an agent, a good log will show: Observed state X. Decided to try action Y (and why). Y failed because Z. Updated plan; chose action W instead. Stopped because condition S was met.
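As a concrete picture of what to ask for, here is an agent log as structured decision records rather than a flat transcript. The field names and scenario are invented for illustration:

```python
# Sketch of a decision-level agent log: structured records, not a flat
# transcript. Field names and the scenario are illustrative.

def log_entry(step: int, phase: str, detail: str) -> dict:
    return {"step": step, "phase": phase, "detail": detail}

agent_log = [
    log_entry(1, "observe", "model health export missing Level 3 rooms"),
    log_entry(2, "decide",  "re-query the federated model (why: export may be stale)"),
    log_entry(3, "act",     "query_federated_model failed: timeout"),
    log_entry(4, "decide",  "fall back to last nightly export instead"),
    log_entry(5, "stop",    "coverage threshold met"),
]

# A workflow transcript would only contain "act" entries; the "decide"
# records (with their "why") are what make agent behavior auditable.
decisions = [e for e in agent_log if e["phase"] == "decide"]
```

If a vendor’s log format has no place to put a “decide … because” record, that tells you what kind of system they built.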
If a vendor can’t show you decision-level logs—only a linear transcript of tool calls—they’ve probably built a workflow, not a true agent.
Again: that might be exactly what you need. But workflow-level capability should be priced and evaluated as such.
The Real Question Isn’t “Can We?” But “Should We?”
Your firm has been through this cycle before: BIM before standards and processes were ready. VR walkthroughs that never quite found daily use. Blockchain pilots hunting for a problem. Now, “AI agents” everywhere.
The question is not: “Can we deploy an agent?” The question is: “Does this specific problem justify agent-level autonomy?”
You should be asking: Can we describe the ideal behavior as a deterministic policy or ruleset? Is the path from input → output predictable and stable across projects? Would we prefer a deterministic, auditable system here if we could get it?
If yes, you’re almost always better off with a workflow: clear rules, explicit control flow, LLMs injected only where natural language understanding or generation helps, easy to test, audit, and version.
Agents are powerful—but they’re also more expensive (longer runs, more tool calls, more context handling), harder to govern (they can “do the wrong thing, but confidently”), and much harder to evaluate exhaustively in high-liability domains like construction.
Use them when the problem genuinely requires adaptive decision-making and unknown paths—not because a vendor says “that’s the future.”
What to Ask Next Time Someone Sells You an “Agent”
Next time you’re pitched an “AI agent” for BIM or construction, ask:
- “Does it decide what to do next, or does it follow a predetermined sequence?” If the path is always the same (even with branches), you’re in workflow land.
- “Can it adapt if it encounters something outside its pre-defined cases?” What does it do if a tool fails? If data is missing? If the model is dirtier than the demo?
- “Show me a log of its decision-making process, not just its outputs.” Can you see Observe → Decide → Act → Reflect, or just a list of function calls?
- “What happens when it’s wrong? How do we see what it was ‘thinking’?” Do you have transparency into the prompts, tools, and internal reasoning steps?
- “Does this meet OpenAI’s criteria for needing an agent?” Is it genuinely exception-heavy, unstructured, and judgment-dependent—or could a deterministic workflow handle it?
If they can’t answer those clearly, you’re probably looking at a script with a marketing budget.
And that’s fine—as long as it actually solves a real problem, you’re paying workflow prices (not “autonomous agent” prices), and you’re not taking on invisible autonomy risk in a high-liability domain.
Because in AEC—where margins are thin, liability is high, and BIM is already hard enough—the last thing you need is to pay for autonomy you don’t need and can’t trust.