In the previous post, I argued that getting reliable output from a software development agent is fundamentally a problem of information provisioning: making the right knowledge available, at the right granularity, at the right moment. In this post, I expand on the challenge of reviewing agent output, get more concrete about what that looks like in practice, and explain why executable process models are one of the most underexplored mechanisms for addressing it.
Three categories of failure that undermine reliable agent output
The challenges that get in the way of reliable agent-based development fall into three broad categories. (For a general overview of obstacles, patterns, and anti-patterns, visit https://lexler.github.io/augmented-coding-patterns/ )
Executable engineering process models
An executable software engineering process is not a rigid workflow enforced by an automation engine. It is better understood as a guidance framework — one that supports engineers in working flexibly, deviating from the intended sequence when needed, while providing a path back to a compliant state. I prefer the term flow model to emphasize fluidity over rigidity.
A flow model consists of three core elements: instructions describing what to do at each step, concrete input artifacts or data that the agent needs to carry out that step, and quality checks that evaluate the completeness and correctness of the output before work continues.
This structure directly addresses each of the challenge categories described above.
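To make the three elements tangible, here is a minimal sketch of how a single step of a flow model could be represented. It is an illustration under stated assumptions, not the data model of any particular engine: the names FlowStep, is_not_empty, and has_requirement_id, as well as the artifact path and the "REQ-" ID convention, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# A check inspects a step's output and returns a list of problems.
# An empty list means the check passed.
Check = Callable[[str], list[str]]


@dataclass
class FlowStep:
    """One step of a flow model (hypothetical structure for illustration)."""
    name: str
    instructions: str                                          # what to do
    input_artifacts: list[str] = field(default_factory=list)   # what the agent needs
    checks: list[Check] = field(default_factory=list)          # quality gates

    def evaluate(self, output: str) -> list[str]:
        """Run every check and collect all reported problems, in order."""
        problems: list[str] = []
        for check in self.checks:
            problems.extend(check(output))
        return problems


def is_not_empty(output: str) -> list[str]:
    return [] if output.strip() else ["output is empty"]


def has_requirement_id(output: str) -> list[str]:
    # Assumed convention: requirement IDs look like "REQ-<number>".
    return [] if "REQ-" in output else ["output references no requirement ID"]


step = FlowStep(
    name="specify-test-cases",
    instructions="Derive test cases for each requirement in the input.",
    input_artifacts=["requirements/REQ-12.md"],  # hypothetical path
    checks=[is_not_empty, has_requirement_id],
)
```

Returning a list of problems rather than a boolean is a deliberate choice in this sketch: the problem descriptions can be handed back to the agent as feedback, and a human who steps in later sees the same messages.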
In summary, an executable process model functions as part of an outer harness. Its goal is to help the agent produce correct output as early as possible, with checks acting as sensors that allow automated retry and self-correction before human review.
Process modeling existed — but agents are what make it necessary
Software and systems engineering process modeling is not a new idea. It was proposed and explored in the 1990s, but never gained widespread traction in industry. More recent research found that engineering process modeling in enterprises tends to remain semi-formal at best. The reasons are not hard to find: executable process models were often perceived as too constraining, and the value of modeling rigorously was not high enough to justify the effort when the knowledge lived in the heads of experienced developers who could fill in the gaps.
That calculus has changed. When agents are part of the development workflow, the knowledge can no longer live implicitly in anyone's head. It has to be externalized, structured, and available every time. Process modeling was a solution looking for a problem of sufficient scale. It may have found one.
There is also an important practical benefit to modeling the process rather than the artifacts alone: agents and humans can work on the same artifacts seamlessly. When an agent gets stuck, the human stepping in has full access to the feedback the agent received. When a human prepares early-stage artifacts, the agent can take over to suggest refinements, run checks, or complete implementation and testing. The handoff works in both directions.
Using a flow model in practice: from epics to traced implementation
I currently use an executable flow model to develop the BlueContext Flow Engine itself, the engine I am building to address exactly this need. I maintain dedicated processes for feature implementation and change request implementation, which guide the agent from epics downstream through requirements, test case specification, and code and test implementation, while maintaining bidirectional traceability throughout. This is not a waterfall approach: I decide which requirements to prioritize and which to revise based on feedback from implementation. But it supports a rigorous engineering process that does not depend on remembering to provide the right context each time. If you feel this is something you would like to try out, get in touch.
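For readers unfamiliar with bidirectional traceability, the following sketch shows the basic idea: every link between an upstream and a downstream artifact is stored in both directions, so coverage gaps can be found from either end. This is a generic illustration and not how the BlueContext Flow Engine stores its traces; the class TraceGraph and the artifact IDs are hypothetical.

```python
from collections import defaultdict


class TraceGraph:
    """Stores trace links so they can be followed in both directions."""

    def __init__(self) -> None:
        self._down: dict[str, set[str]] = defaultdict(set)
        self._up: dict[str, set[str]] = defaultdict(set)

    def link(self, upstream: str, downstream: str) -> None:
        """Record that `downstream` was derived from `upstream`."""
        self._down[upstream].add(downstream)
        self._up[downstream].add(upstream)

    def downstream(self, artifact: str) -> list[str]:
        return sorted(self._down.get(artifact, ()))

    def upstream(self, artifact: str) -> list[str]:
        return sorted(self._up.get(artifact, ()))

    def untraced(self, artifacts: list[str]) -> list[str]:
        """Artifacts with nothing derived from them yet (coverage gaps)."""
        return [a for a in artifacts if not self._down.get(a)]


trace = TraceGraph()
trace.link("EPIC-1", "REQ-1")
trace.link("EPIC-1", "REQ-2")
trace.link("REQ-1", "TC-1")          # test case specified for REQ-1
trace.link("TC-1", "tests/test_login.py")

gaps = trace.untraced(["REQ-1", "REQ-2"])  # requirements without a test case
```

A query such as untraced is the kind of thing that can serve as a quality check in a step: a requirement without a downstream test case is reported before the agent moves on to implementation.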
Open questions and future potential
There remain significant open questions — in tooling, but also in method. How quickly can and should a flow model evolve, and which parts must stay fixed to ensure compliance in regulated domains? How do you accommodate individual developer preferences within a shared process structure? As adoption scales toward enterprise use, how do established practices from process variant modeling apply? How tightly should tool access be restricted per step without becoming counterproductive?
There is also potential that extends beyond context engineering. One example is process-centric learning: analyzing which capabilities an agent actually exercised in a given step, and learning from that over time to improve future recommendations. It is an open area worth exploring.
From a security perspective, a process model that tracks expected resource access patterns per step could provide a meaningful signal when an agent attempts to access something outside the norm, which may indicate prompt injection. Taking this further, defining resource permissions — such as which MCP tools are available — at the step level rather than at the project or agent level offers a more precise security boundary, reducing the attack surface without artificially constraining the agent's capabilities where they are legitimately needed.
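A step-level permission boundary could look roughly like the sketch below. It assumes a simple allow-list per step; the step names and tool names are hypothetical placeholders and do not refer to any real MCP server or tool.

```python
# Hypothetical allow-list: which tools each process step may use.
STEP_TOOLS: dict[str, frozenset[str]] = {
    "specify-requirements": frozenset({"read_file", "search_docs"}),
    "implement-code": frozenset({"read_file", "write_file", "run_tests"}),
}


def is_allowed(step: str, tool: str) -> bool:
    """A tool is allowed only if the current step lists it explicitly."""
    return tool in STEP_TOOLS.get(step, frozenset())


def out_of_norm(step: str, requested_tools: list[str]) -> list[str]:
    """Return the requested tools that fall outside the step's allow-list.

    A non-empty result is a signal worth investigating, for example as a
    possible symptom of prompt injection.
    """
    return [tool for tool in requested_tools if not is_allowed(step, tool)]


suspicious = out_of_norm(
    "specify-requirements",
    ["read_file", "write_file", "http_post"],
)
```

Because the allow-list is keyed by step, the same agent keeps write access during implementation while losing it during requirements work. Unknown steps get an empty allow-list in this sketch, so anything not modeled is denied by default.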
These are the questions I am working through. If any of them are relevant to challenges you are facing, I would be glad to compare notes — feel free to get in touch.
© 2026 bluecontext.at
