Opinions on Agentic Software Engineering

The missing harness: why process models matter more than ever for AI agents

 

In the previous post, I argued that getting reliable output from a software development agent is fundamentally a problem of information provisioning: making the right knowledge available, at the right granularity, at the right moment. In this post, I expand on the challenge of reviewing agent output, get more concrete about what that looks like in practice, and explain why executable process models are one of the most underexplored mechanisms for addressing it.

 

Three categories of failure that undermine reliable agent output

The challenges that get in the way of reliable agent-based development fall into three broad categories. (For a general overview of obstacles, patterns, and anti-patterns, visit https://lexler.github.io/augmented-coding-patterns/ )

  1. Context management. Agents are stateless — they cannot learn from previous sessions, which means all relevant context has to be provided every time. This is compounded by the fact that context windows are finite, and loading too much into them at once tends to produce shallow or misdirected output. Two common anti-patterns emerge here: using a single agent across too many distinct tasks, and overloading the context with instructions to the point where the agent loses focus rather than gaining it.
  2. Runtime behavior. Agent output quality degrades as task complexity increases. More subtly, training data continues to exert more influence over agent behavior than prompting alone — a phenomenon sometimes called selective hearing. And as a work session extends, earlier instructions tend to fade in influence, leading to context rot: the agent gradually drifts from the original constraints.
  3. Output quality. Agents produce a lot of output quickly, which creates its own risk — the anti-pattern of accepting output without adequate review. Related to this is hallucination: an agent may indicate that work is complete when it has not actually produced a correct or complete result.

 

Executable engineering process models

An executable software engineering process is not a rigid workflow enforced by an automation engine. It is better understood as a guidance framework — one that supports engineers in working flexibly, deviating from the intended sequence when needed, while providing a path back to a compliant state. I prefer the term flow model to emphasize fluidity over rigidity.

 

A flow model consists of three core elements: instructions describing what to do at each step, concrete input artifacts or data that the agent needs to carry out that step, and quality checks that evaluate the completeness and correctness of the output before work continues.
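As a minimal sketch, the three elements of a step could be represented as a small data structure. The names below (FlowStep, CheckResult, mentions_requirement_id) and the "REQ-" identifier convention are assumptions made for illustration; they are not taken from any particular flow engine.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CheckResult:
    passed: bool
    message: str


# Hypothetical convention: a check inspects the step's output text
# and reports pass/fail together with feedback.
Check = Callable[[str], CheckResult]


@dataclass
class FlowStep:
    name: str
    instructions: str                                      # what to do at this step
    inputs: dict[str, str] = field(default_factory=dict)   # artifact name -> content
    checks: list[Check] = field(default_factory=list)      # gates before work continues

    def evaluate(self, output: str) -> list[CheckResult]:
        """Run every quality check against the step's output."""
        return [check(output) for check in self.checks]


def mentions_requirement_id(output: str) -> CheckResult:
    # Illustrative check: the output must reference at least one requirement ID.
    ok = "REQ-" in output
    return CheckResult(ok, "ok" if ok else "no requirement ID referenced")


step = FlowStep(
    name="specify-test-cases",
    instructions="Derive test cases from the given requirements.",
    inputs={"requirements": "REQ-1: The engine shall load a flow model."},
    checks=[mentions_requirement_id],
)
results = step.evaluate("TC-1 verifies REQ-1 by loading a sample model.")
```

The point of the structure is that everything the agent needs for one step, and everything needed to judge the result, travels together.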

 

This structure directly addresses each of the challenge categories described above.

  • On context management: rather than asking an agent to discover relevant information from broad, general descriptions, a flow model defines precisely what is relevant per step. The agent receives a narrow, reliable entry point each time — reducing the need for sophisticated prompting and making team standards explicit and shared, in the same way good documentation benefits junior developers.
  • On runtime challenges: a flow model enforces a clear separation between similar but distinct tasks — feature implementation versus change request, for instance — and splits work into steps with well-defined scope. Each step specifies what may change and what must not, grounding the agent's behavior. Explicit quality checks at step boundaries give the agent a mechanism to detect and correct deviations, addressing both selective hearing and context rot. Review steps ensure that the agent's intent is visible and aligned before implementation proceeds.
  • On output quality: per-step checks provide feedback not only to the agent but also to the developer — rapid, legible signals about how far the agent has progressed, whether the output is ready for human review, or where it has stalled. This reduces the frequency and burden of human review while raising the quality threshold at which review happens. External quality checks also serve as a structural defense against hallucination, since completion cannot be asserted by the agent alone.

 

In summary, an executable process model functions as part of an outer harness. Its goal is to help the agent produce correct output as early as possible, with checks acting as sensors that allow automated retry and self-correction before human review.
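The retry idea can be sketched as a small loop. This is an illustration under stated assumptions, not a description of an actual engine: the agent is modeled as a plain function from prompt to output, a check returns None on success or a feedback message on failure, and run_step, stub_agent, and has_tests are hypothetical names.

```python
from typing import Callable, Optional

# Hypothetical signatures: an agent maps a prompt to output text;
# a check returns None on success or a feedback message on failure.
Agent = Callable[[str], str]
Check = Callable[[str], Optional[str]]


def run_step(agent: Agent, instructions: str, checks: list[Check],
             max_attempts: int = 3) -> tuple[bool, str, list[str]]:
    """Run one step, feeding check failures back to the agent until
    all checks pass or the attempt budget is spent."""
    prompt = instructions
    output = ""
    feedback: list[str] = []
    for _ in range(max_attempts):
        output = agent(prompt)
        feedback = [msg for check in checks if (msg := check(output)) is not None]
        if not feedback:
            return True, output, feedback
        # Completion is decided by the checks, not by the agent's own claim.
        prompt = instructions + "\n\nFix these issues:\n" + "\n".join(feedback)
    return False, output, feedback


def stub_agent(prompt: str) -> str:
    # Stand-in for a real agent call: only succeeds once it has seen feedback.
    return "done with tests" if "Fix these issues" in prompt else "done"


def has_tests(output: str) -> Optional[str]:
    return None if "tests" in output else "output does not mention tests"


ok, output, feedback = run_step(stub_agent, "Implement the feature.", [has_tests])
```

If the attempt budget runs out, the last output and the remaining feedback are returned together, so the human stepping in sees exactly what the agent saw.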

 

Process modeling existed — but agents are what make it necessary

Software and systems engineering process modeling is not a new idea. It was proposed and explored in the 1990s, but never gained widespread traction in industry. More recent research found that engineering process modeling in enterprises tends to remain semi-formal at best. The reasons are not hard to find: executable process models were often perceived as too constraining, and the value of modeling rigorously was not high enough to justify the effort when the knowledge lived in the heads of experienced developers who could fill in the gaps.

 

That calculus has changed. When agents are part of the development workflow, the knowledge can no longer live implicitly in anyone's head. It has to be externalized, structured, and available every time. Process modeling was a solution looking for a problem of sufficient scale. It may have found one.

 

There is also an important practical benefit to modeling the process rather than the artifacts alone: agents and humans can work on the same artifacts seamlessly. When an agent gets stuck, the human stepping in has full access to the feedback the agent received. When a human prepares early-stage artifacts, the agent can take over to suggest refinements, run checks, or complete implementation and testing. The handoff works in both directions.

 

Using a flow model in practice: from epics to traced implementation

I currently use an executable flow model for the development of the BlueContext Flow Engine itself, the flow engine I am building to address exactly this need. I maintain dedicated processes for feature implementation and change request implementation, which guide the agent from epics downstream through requirements, test case specification, and code and test implementation, while maintaining bidirectional traceability throughout. This is not a waterfall approach: I decide which requirements to prioritize and which to revise based on feedback from implementation. But it supports a rigorous engineering process that does not depend on remembering to provide the right context each time. If this sounds like something you would like to try out, get in touch.
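To make bidirectional traceability concrete, here is a minimal sketch of a link store that can be queried in both directions. It does not describe how the BlueContext Flow Engine implements tracing; TraceGraph and the identifiers (EPIC-1, REQ-1, TC-1) are hypothetical.

```python
from collections import defaultdict


class TraceGraph:
    """Stores 'derived-from' links and answers queries in both directions."""

    def __init__(self) -> None:
        self._down: dict[str, set[str]] = defaultdict(set)  # upstream -> downstream
        self._up: dict[str, set[str]] = defaultdict(set)    # downstream -> upstream

    def link(self, upstream: str, downstream: str) -> None:
        self._down[upstream].add(downstream)
        self._up[downstream].add(upstream)

    def downstream(self, item: str) -> set[str]:
        return set(self._down.get(item, set()))

    def upstream(self, item: str) -> set[str]:
        return set(self._up.get(item, set()))

    def uncovered(self, items: list[str]) -> list[str]:
        """Items with nothing derived from them, e.g. requirements without tests."""
        return [item for item in items if not self._down.get(item)]


graph = TraceGraph()
graph.link("EPIC-1", "REQ-1")
graph.link("EPIC-1", "REQ-2")
graph.link("REQ-1", "TC-1")
missing = graph.uncovered(["REQ-1", "REQ-2"])  # requirements lacking a test case
```

A coverage query like uncovered is also a natural candidate for a per-step quality check, since a gap in the trace is a concrete, machine-detectable signal.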

 

Open questions and future potential

There remain significant open questions — in tooling, but also in method. How quickly can and should a flow model evolve, and which parts must stay fixed to ensure compliance in regulated domains? How do you accommodate individual developer preferences within a shared process structure? As adoption scales toward enterprise use, how do established practices from process variant modeling apply? How tightly should tool access be restricted per step without becoming counterproductive?

 

There is also potential that extends beyond context engineering. One example is process-centric learning: analyzing which capabilities an agent actually exercised in a given step, and learning from that over time to improve future recommendations, is an open area worth exploring.

From a security perspective, a process model that tracks expected resource access patterns per step could provide a meaningful signal when an agent attempts to access something outside the norm, which may indicate prompt injection. Taking this further, defining resource permissions — such as which MCP tools are available — at the step level rather than at the project or agent level offers a more precise security boundary, reducing the attack surface without artificially constraining the agent's capabilities where they are legitimately needed.
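A step-level permission boundary could look roughly like the following sketch. The class names, step names, and tool names are all hypothetical, and the code does not use any real MCP API; it only shows the idea of an allowlist per step plus a record of out-of-policy requests.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StepPolicy:
    step: str
    allowed_tools: frozenset[str]  # tools the agent may call during this step


@dataclass
class AccessMonitor:
    policies: dict[str, StepPolicy]
    violations: list[tuple[str, str]] = field(default_factory=list)

    def authorize(self, step: str, tool: str) -> bool:
        """Allow the call only if the tool is on the step's allowlist.
        Denied requests are recorded as a signal worth investigating."""
        policy = self.policies.get(step)
        allowed = policy is not None and tool in policy.allowed_tools
        if not allowed:
            self.violations.append((step, tool))
        return allowed


monitor = AccessMonitor({
    "implement": StepPolicy("implement", frozenset({"read_file", "write_file", "run_tests"})),
    "review": StepPolicy("review", frozenset({"read_file"})),
})
can_read = monitor.authorize("review", "read_file")
can_fetch = monitor.authorize("review", "http_fetch")  # outside the norm for a review step
```

A denied request is not proof of prompt injection, but it is a cheap signal to surface, and unknown steps are denied by default.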

 

These are the questions I am working through. If any of them are relevant to challenges you are facing, I would be glad to compare notes — feel free to get in touch.