Early 2026 Agentic Coding Update: OpenCode and Claude Code

📖 New Ebook Available

Build Your First MCP Server: A Developer's Guide to Wrapping Existing APIs for AI Agents to Use

Learn to create powerful AI integrations step by step

Get it for $9.99 →

Early 2026 Agentic Coding Update

I keep getting the same question: how involved are you really on big AI-assisted PRs, and how long does it actually take?

Short answer. Very involved. Usually many hours.

Also, I do not use Cursor anymore.

My daily stack now is OpenCode with gpt-5.3-codex on OpenAI Pro, plus Claude Code on my company's team plan.

I am sharing this because people keep asking if these PRs are mostly autonomous now. They are not.

The Ralph Loop Is Real

I still like the Ralph loop concept.

At its core, it is a while loop with an exit condition. In practice, that usually means do this until the test passes.

The concept is solid. The part people miss is control.

I am not letting loops run wild and auto-merge whatever comes out. I stay in the loop, steer aggressively, and stop bad directions early.

Looping is not the advantage by itself. Checkpoints are the advantage.

How I Run The Loop

For complex work, this still feels like a mad scientist operation.

  1. Manager agent builds the plan and asks clarifying questions.
  2. Coding agent executes one slice at a time, often starting with failing tests.
  3. I monitor the implementation live and interrupt if it drifts.
  4. Manager agent assigns the next slice.
  5. Auditing prompts and agents run self-reflection against PRIME_DIRECTIVE style checks.
  6. Final review layers happen before merge so conventions still hold.

This is not passive prompting. It is active systems management.

What A PRIME_DIRECTIVE Is

A PRIME_DIRECTIVE is the contract for the work before code starts. It defines what must be true for a PR to be considered valid, not just what compiles.

My directive docs usually include:

  1. Audit objective
  2. Non-negotiable pass criteria
  3. Parity rules against a reference implementation
  4. Architecture and security rules
  5. Required evidence artifacts
  6. A locked decision log

An anonymized example shape:

# PRIME_DIRECTIVE

## Audit Objective

- Match intended external behavior from the reference system.
- Allow divergence only when intentional, documented, and test-covered.

## Non-Negotiable Pass Criteria

- Strict session isolation per user/channel/peer.
- Deterministic command semantics for safety-critical actions.
- Authz boundaries enforced across user and admin paths.
- Streaming events follow fixed contracts.

## Required Evidence

- Parity matrix: matched | intentional_divergence | uncovered
- Security/authz matrix with evidence paths
- Regression proof for high-risk paths
- Validation outputs for lint, tests, and build

## Decision Log (Locked)

- Explicit architecture decisions that cannot drift without review.

The Feedback Loop I Actually Run

The loop is not just do work until tests pass. It is manager-controlled iteration with reporting and resets between rounds.

  1. Manager agent creates the next slice and prompt.
  2. Coding agent implements the slice and returns a report.
  3. I review the report and diffs, then steer, interrupt, or redirect.
  4. I paste outcomes back to the manager agent for the next step.
  5. I compact or clear coding-agent context between rounds to prevent context brain rot.
  6. Audit passes check whether work still satisfies PRIME_DIRECTIVE constraints.
  7. Final review layers happen before merge.

This is slower than full autopilot, but quality is much higher.

Why CLAUDE.md Matters More In 2026

LLMs are effectively stateless between sessions. If you do not provide stable context every time, behavior drifts fast.

That is why I anchor my workflow on CLAUDE.md. It is the one place that reliably onboards each new session with project intent, constraints, and execution defaults.

This post from HumanLayer explains the principle well: Writing a good CLAUDE.md.

Less Instructions, Better Adherence

One of my biggest lessons this year is that instruction density can hurt compliance.

If your root instructions try to cover every edge case, the agent starts ignoring more of them. I keep root rules short and broadly applicable, then push details into focused skill docs.

That gives me better instruction following and cleaner output under pressure.

Progressive Disclosure Over Prompt Bloat

I use a progressive disclosure pattern:

  1. Root CLAUDE.md for universal rules.
  2. Task-specific skills loaded on demand.
  3. Local overrides only when a folder has truly unique constraints.

Typical skill locations:

  • .opencode/skills/<name>/SKILL.md
  • .claude/skills/<name>/SKILL.md
  • .agents/skills/<name>/SKILL.md

Why I Stay Hands-On

If you try to fully automate PR generation with weak controls, you can produce a lot of low-trust output very fast.

My priority is protecting senior engineers from AI slop PRs that ignore conventions. That means strict gates, explicit reviews, and constant correction.

Skills Plus Rules Changed Everything

The biggest 2026 upgrade is reusable Agent Skills plus shared rule files.

I standardize on CLAUDE.md and .claude/skills because that works across both tools.

  • Claude Code does not read AGENTS.md by default.
  • OpenCode supports AGENTS.md and also reads Claude-compatible fallbacks.
  • Shared skill folders keep behavior consistent across sessions and agents.

Docs I use:

Each SKILL.md needs frontmatter with at least:

  • name
  • description

Then I lock down permissions so agents only load the skills they should.

Skills Are Powerful, But Not Automatically Safe

I posted about this on LinkedIn yesterday, and it is worth repeating here.

Skills are just markdown files, but they still shape agent behavior in real ways. Treat every installed skill like executable influence over your workflow.

A few failure modes are common:

  • Skills can quietly bias recommendations toward products and services.
  • Skills can contain dangerous instructions like exfiltrating secrets with curl.
  • Skills can be install-heavy and then linger, burning context when they are no longer useful.
  • Skills can be overly generic and conflict with your codebase conventions.

I have had the best results by taking inspiration from public skills, then rewriting them for my own repo standards and review gates.

One place I look for ideas is skills.sh, Vercel's skills leaderboard.

Discovery is not trust. I treat every skill as untrusted until I read the raw markdown and run a security audit pass.

When skills are aligned well, they save real code review time. When they are not, they create review debt.

Safe Skill Adoption Checklist

Please do not auto-install skills without reviewing them.

  1. Read the raw SKILL.md yourself before enabling it.
  2. Ask an agent to audit the skill for security risks and exfiltration patterns.
  3. Check that the skill matches your repo's actual conventions and architecture.
  4. Restrict permissions so the skill cannot overreach.
  5. Remove or disable stale skills that no longer earn their context cost.

My 2026 Baseline

This is the minimum loop I trust for non-trivial PRs:

  1. Manager plan with clarifying questions first.
  2. Coding agent execution in small slices.
  3. Self-reflection and audit pass before final review.
  4. Deterministic verification with tests, lint, and build checks.
  5. Final human gate before merge.

Guardrails For Branches

I published my plugin pack for this here:

The point is simple. Keep agent branch and PR behavior predictable without slowing down normal human workflows.

What Changed Since 2025

Last year I wrote about structured non-vibe coding loops:

The 2026 version is stricter.

Better gates. Better reusable skills. More self-reflection passes. More deliberate reviews.

I still use automation heavily. I just treat it as a managed pipeline, not autopilot.

Practical Takeaway

If you want better AI-assisted PR quality, do not start by removing humans from the loop.

Start here:

  1. Better rules
  2. Better skills
  3. Better tests
  4. Better review gates

That is the difference between fast progress and fast mess.

Want to Chat About AI Engineering?

I hold monthly office hours to discuss your AI Product, MCP Servers, Web Dev, systematically improving your app with Evals, or whatever strikes your fancy. These times are odd because it's weekends and before/after my day job, but I offer this as a free community service. I may create anonymized content from our conversations as they often make interesting blog posts for others to learn from.

Book Office Hours