Agent Components
The Kubernetes Agent is composed of multiple specialized autonomous agents, coordinated by a central supervisor. As of v0.3.0, it includes a dedicated ArgoCD onboarding orchestrator and sub-agents.
1. Supervisor Agent
The Supervisor Agent is the central orchestrator that manages the entire lifecycle of Helm chart generation and cluster operations. It coordinates specialized swarms, manages state, and enforces human-in-the-loop (HITL) safety gates.
Key Responsibilities
- Orchestration: Manages workflow phases (Planning → Generation → Validation).
- Delegation: Routes tasks to specialized swarms via tool-based delegation.
- State Management: Maintains the global state and transforms it for specific swarms.
- Safety: Enforces mandatory HITL approval gates at critical transition points.
Architecture
The Supervisor uses a Tool-Based Delegation pattern, using LangChain's create_agent to dynamically route tasks based on the current workflow state.
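A minimal sketch of tool-based delegation, written in plain Python rather than against LangChain's API. The phase names follow the workflow described above; the delegate functions, the DELEGATES table, and the route helper are hypothetical names chosen for illustration. In the real system an LLM chooses which tool to call, whereas this sketch routes deterministically on the workflow state.

```python
from typing import Callable, Dict

# Hypothetical delegate "tools"; each would hand off to a swarm in practice.
def delegate_to_planner(state: dict) -> dict:
    return {**state, "workflow_state": "generation"}

def delegate_to_generator(state: dict) -> dict:
    return {**state, "workflow_state": "validation"}

def delegate_to_validator(state: dict) -> dict:
    return {**state, "workflow_state": "done"}

# Map each workflow phase to the tool that handles it.
DELEGATES: Dict[str, Callable[[dict], dict]] = {
    "planning": delegate_to_planner,
    "generation": delegate_to_generator,
    "validation": delegate_to_validator,
}

def route(state: dict) -> dict:
    """Pick the delegate for the current phase and run it."""
    phase = state["workflow_state"]
    if phase not in DELEGATES:
        raise ValueError(f"no delegate for phase {phase!r}")
    return DELEGATES[phase](state)

# Drive one request through all three phases.
state = {"user_query": "chart for nginx", "workflow_state": "planning"}
while state["workflow_state"] != "done":
    state = route(state)
```

Keeping the routing table separate from the delegates means a new swarm can be added by registering one more entry.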
Human-in-the-Loop Gates
The Supervisor enforces strict approval gates:
| Gate | Trigger | Purpose |
|---|---|---|
| Planning Review | After planning completion | Review architecture and requirements analysis. |
| Generation Review | After template generation | Review generated artifacts and specify workspace path. |
| Execution Approval | Before cluster changes | (Helm Mgmt) Confirm installation/upgrade plans. |
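The gate table above can be read as a lookup from a workflow event to the approval it requires. The sketch below assumes that reading; the gate names come from the table, while the event identifiers and the gate_for and may_proceed helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass(frozen=True)
class Gate:
    name: str
    trigger: str  # hypothetical event identifier that triggers the gate

GATES = [
    Gate("Planning Review", "planning_complete"),
    Gate("Generation Review", "generation_complete"),
    Gate("Execution Approval", "before_cluster_change"),
]

def gate_for(event: str) -> Optional[Gate]:
    """Return the gate guarding this event, or None if it is ungated."""
    for gate in GATES:
        if gate.trigger == event:
            return gate
    return None

def may_proceed(event: str, approvals: Set[str]) -> bool:
    """An event passes only if it has no gate or its gate was approved."""
    gate = gate_for(event)
    return gate is None or gate.name in approvals
```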
2. Planner Agent
The Planner Agent is a specialized "Deep Agent" responsible for transforming natural language requirements into a rigorous technical plan. It uses a swarm of sub-agents to analyze requirements, detect gaps, and orchestrate production-ready architectures.
Architecture (Deep Agent)
The Planner operates as its own supervisor managing two sub-agents:
Sub-Agents
| Sub-Agent | Role | Key Tools |
|---|---|---|
| Requirements Analyzer | Extract & Validate | parse_requirements, classify_complexity, validate_requirements |
| Architecture Planner | Design & Size | design_k8s_architecture, estimate_resources, check_dependencies |
Key Logic: 5-Step Workflow
- Extract: Parse 12 critical fields (App Type, Framework, Image, Exposure, etc.).
- Gap Detection: If critical info (e.g., Image, Port) is missing, pause and ask the user via request_human_input.
- Analysis: Classify complexity (Simple/Medium/Complex) and validate completeness.
- Planning: Design K8s resources (Deployment vs StatefulSet, HPA, PDB) and estimate CPU/Memory.
- Compile: Output a structured ChartPlan JSON for the generator.
Smart Clarification: The agent prioritizes questions. It won't ask for optional details if critical ones (like "What is the Docker image?") are missing.
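The gap-detection and question-prioritization behavior can be sketched roughly as follows. The source names Image and Port as examples of critical fields; the optional field names, the lowercase keys, and both helper functions are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Image and Port are the critical examples given in the text.
CRITICAL_FIELDS = ["image", "port"]
# Hypothetical optional fields.
OPTIONAL_FIELDS = ["replicas", "ingress_host"]

def missing(fields: List[str], requirements: Dict[str, Optional[str]]) -> List[str]:
    """Return the fields that are absent or empty, in declaration order."""
    return [f for f in fields if not requirements.get(f)]

def next_questions(requirements: Dict[str, Optional[str]]) -> List[str]:
    """Ask about critical gaps first; optional gaps only once none remain."""
    critical = missing(CRITICAL_FIELDS, requirements)
    if critical:
        return critical
    return missing(OPTIONAL_FIELDS, requirements)
```

With only a port supplied, the sketch asks for the image and nothing else; optional questions appear only after both critical fields are known.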
3. Template Coordinator
The Template Coordinator is a LangGraph-based agent that orchestrates the execution of 13 specialized tools to generate production-ready Helm chart templates.
Architecture (Coordinator Pattern)
Instead of a simple chain, it uses a Coordinator Node to manage dependencies and execution order dynamically.
Execution Phases & Tools
The coordinator executes tools in 4 strict phases to respect dependencies:
| Phase | Description | Key Tools |
|---|---|---|
| 1. Core Templates | Essential resources required for any chart. | generate_helpers_tpl, generate_deployment, generate_service |
| 2. Conditional | Optional features based on planner output. | generate_hpa, generate_pdb, generate_network_policy, generate_ingress |
| 3. Documentation | Needs all templates to be finished first. | generate_readme (scans all templates) |
| 4. Aggregation | Assembles final file structure. | aggregate_chart |
Key Features
- Dependency Management: Knows that Ingress requires Service, and Service requires Deployment.
- Smart Retries: If a tool fails (e.g., LLM error), the Error Handler node retries it up to 3 times.
- Values Aggregation: The generate_values_yaml tool runs last, collecting all variables used across all templates to ensure nothing is undefined.
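The phase ordering and retry behavior described above can be sketched in plain Python. This is not the LangGraph implementation: the phase contents mirror the table, but run_with_retries, run_phases, and the stand-in tools are hypothetical, and "up to 3 times" is interpreted here as one initial attempt plus at most three retries.

```python
from typing import Callable, Dict, List, Optional, Set

MAX_RETRIES = 3  # the text states up to 3 retries

# Phases and tool names as listed in the table above.
PHASES: List[List[str]] = [
    ["generate_helpers_tpl", "generate_deployment", "generate_service"],
    ["generate_hpa", "generate_pdb", "generate_network_policy", "generate_ingress"],
    ["generate_readme"],
    ["aggregate_chart"],
]

def run_with_retries(tool: Callable[[], str], max_retries: int = MAX_RETRIES) -> str:
    """Call the tool once, then retry up to max_retries times on failure."""
    last_error: Optional[Exception] = None
    for _ in range(1 + max_retries):
        try:
            return tool()
        except RuntimeError as exc:
            last_error = exc
    raise RuntimeError(f"tool failed after {max_retries} retries") from last_error

def run_phases(tools: Dict[str, Callable[[], str]], enabled: Set[str]) -> List[str]:
    """Run enabled tools phase by phase so later phases see earlier output."""
    completed: List[str] = []
    for phase in PHASES:
        for name in phase:
            if name in enabled:
                run_with_retries(tools[name])
                completed.append(name)
    return completed

# Stand-in tools: each just returns a file name.
tools = {name: (lambda n=name: n + ".yaml") for phase in PHASES for name in phase}

# A flaky tool that fails twice, then succeeds on the third call.
calls = {"n": 0}
def flaky_deployment() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("LLM error")
    return "deployment.yaml"

tools["generate_deployment"] = flaky_deployment
enabled = {
    "generate_helpers_tpl", "generate_deployment", "generate_service",
    "generate_ingress", "generate_readme", "aggregate_chart",
}
completed = run_phases(tools, enabled)
```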
4. Generator (Validator) Agent
The Generator Agent (also known as the Validator Deep Agent) focuses on Quality Assurance. It uses a ReAct pattern (Reasoning → Action → Observation) to autonomously validate and fix charts.
Tool Stack
It combines direct filesystem access with specialized Helm validators:
| Category | Tools | Purpose |
|---|---|---|
| File System | ls, read_file, write_file, edit_file | Inspect structure and apply fixes. |
| Validation | helm_lint, helm_template, helm_dry_run | Validate syntax, rendering, and cluster compatibility. |
| Escalation | ask_human | Request help for complex issues. |
Validation Pipeline
The agent runs validations sequentially, each stricter than the last: helm_lint (syntax), then helm_template (rendering), then helm_dry_run (cluster compatibility).
Self-Healing Mechanism
The agent attempts to fix errors autonomously before bothering the user.
- Analyze Error: Detects issues like bad indentation, missing fields, or deprecated APIs.
- Apply Fix: Uses edit_file to modify the YAML directly.
- Verify: Re-runs the validation tool.
- Escalate: If it fails 2 times in a row, it triggers ask_human for manual intervention.
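The analyze, fix, verify, and escalate steps above amount to a bounded repair loop. The sketch below is one possible reading, in which a "failure" is a fix attempt that still leaves the chart invalid; self_heal and the tab-replacing example validator are hypothetical and stand in for the real Helm tools.

```python
from typing import Callable, Optional, Tuple

MAX_CONSECUTIVE_FAILURES = 2  # the text escalates after 2 failures in a row

def self_heal(
    validate: Callable[[str], Optional[str]],  # error message, or None if valid
    fix: Callable[[str, str], str],            # returns the patched chart text
    chart: str,
) -> Tuple[str, str]:
    """Validate, fix, and re-validate; escalate after repeated failed fixes."""
    error = validate(chart)
    failed_fixes = 0
    while error is not None:
        if failed_fixes >= MAX_CONSECUTIVE_FAILURES:
            return "escalate", chart  # hand off to a human
        chart = fix(chart, error)
        error = validate(chart)
        if error is not None:
            failed_fixes += 1
    return "ok", chart

# Example validator: YAML must not be indented with tabs.
def no_tabs(chart: str) -> Optional[str]:
    return "tab indentation" if "\t" in chart else None

def replace_tabs(chart: str, error: str) -> str:
    return chart.replace("\t", "  ")

result = self_heal(no_tabs, replace_tabs, "spec:\n\treplicas: 1")
```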
5. Helm Management Agent
The Helm Management Deep Agent is the operational arm ensuring "Safety at Speed". It employs a Dual-Path Architecture to handle both quick queries and high-stakes cluster modifications securely.
Dual-Path Architecture
The agent classifies each request's intent (its "Risk Profile") and routes it down one of two paths: one for quick queries, the other for high-stakes cluster modifications.
The 5-Phase "Safe-Track" Pipeline
Used for install, upgrade, rollback, and uninstall operations.
| Phase | Activity | HITL Gate? |
|---|---|---|
| 1. Discovery | Detects if release exists (Upgrade vs Install). Fetches chart info. | No |
| 2. Confirmation | Values Confirmation: Shows "Proposed Changes" vs "Current". | YES |
| 3. Planning | Runs helm_validate_values, checks prerequisites, generates diffs. | No |
| 4. Approval | Plan Approval: The "Nuclear Button". Final specific sign-off. | YES |
| 5. Execution | Performs operation (helm_install) & verifies pod health. | No |
Safety Middleware
- HelmApprovalHITLMiddleware: The failsafe. Even if the LLM tries to skip approval, this code-level interceptor forces a hard stop before any write operation.
- ErrorRecoveryMiddleware: Automatically retries flaky read operations (up to 3 times) to handle network blips.
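A code-level approval interceptor of the kind described above might look like the following. This is a sketch and not the actual HelmApprovalHITLMiddleware: only helm_install is named in the source, so the other tool names, the ApprovalRequired exception, guarded_call, and fake_install are assumptions.

```python
from typing import Any, Callable, Set

# helm_install appears in the text; the other three names are assumed
# to follow the same pattern for upgrade, rollback, and uninstall.
WRITE_OPERATIONS = {"helm_install", "helm_upgrade", "helm_rollback", "helm_uninstall"}

class ApprovalRequired(Exception):
    """Raised to force a hard stop before an unapproved write."""

def guarded_call(
    tool_name: str,
    tool: Callable[..., Any],
    approved: Set[str],
    **kwargs: Any,
) -> Any:
    """Run the tool, but refuse any write operation lacking approval."""
    if tool_name in WRITE_OPERATIONS and tool_name not in approved:
        raise ApprovalRequired(tool_name)
    return tool(**kwargs)

# Hypothetical stand-in for the real install tool.
def fake_install(release: str) -> str:
    return f"installed {release}"

try:
    guarded_call("helm_install", fake_install, approved=set(), release="web")
    blocked = False
except ApprovalRequired:
    blocked = True

outcome = guarded_call("helm_install", fake_install, approved={"helm_install"}, release="web")
```

Because the check lives in ordinary code rather than in the prompt, it holds regardless of what the model decides to call.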
6. ArgoCD Onboarding Orchestrator
The ArgoCD Onboarding Orchestrator is the control plane for GitOps workflows. It interprets user intent, validates prerequisites, and coordinates the ArgoCD sub-agents with explicit human approvals.
Key Responsibilities
- Intent Classification: Determine read-only query vs. workflow (create/update/delete/sync).
- Prerequisite Checks: Ensure project/repo/app state is known via MCP before acting.
- Plan Preview: Present a human-friendly plan with what, where, and why.
- Approval Gates: Require HITL approval for risky operations.
Workflow Phases
- Understand: Parse request and required targets.
- Validate: Fetch current state (project/repo/app).
- Plan: Present a preview and request approval.
- Execute: Run MCP tool calls with tool-level approvals.
- Verify: Confirm success and summarize changes.
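As a rough illustration of the intent classification that opens this workflow, the sketch below splits operations into read-only queries and workflows. The operation verbs come from the text; the set names, both functions, and the choice to require approval for unrecognized operations are assumptions. The real orchestrator presumably classifies free-form requests with an LLM rather than by exact verb match.

```python
READ_ONLY_OPERATIONS = {"get", "list"}
WORKFLOW_OPERATIONS = {"create", "update", "delete", "sync"}

def classify_intent(operation: str) -> str:
    """Label an operation as a read-only query, a workflow, or unknown."""
    op = operation.lower()
    if op in READ_ONLY_OPERATIONS:
        return "query"
    if op in WORKFLOW_OPERATIONS:
        return "workflow"
    return "unknown"

def needs_approval(operation: str) -> bool:
    """Only recognized read-only queries skip the approval gate."""
    return classify_intent(operation) != "query"
```

Treating unknown operations as requiring approval is a fail-safe default: a misclassified read costs one extra confirmation, while a misclassified write could change the cluster.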
7. Project Agent
Handles ArgoCD project CRUD operations (create/get/list/update/delete), including checks for existing project constraints and permissions.
8. Repository Agent
Manages ArgoCD repositories (list/get/onboard/delete) and performs repository connectivity diagnostics.
9. Application Agent
Handles ArgoCD applications: create/update/delete, sync operations, diff previews, and health checks.
10. Debug Agent
Fetches ArgoCD application logs and events to assist with troubleshooting workflows.
State Management
The Kubernetes Agent uses a sophisticated state management system designed for resumability, isolation, and type safety. This system allows the Supervisor to seamlessly delegate tasks to sub-agents while maintaining a coherent global history.
Specialized State Schemas
Each agent swarm operates on its own dedicated state schema, optimized for its specific task.
| State Schema | Used By | Key Fields |
|---|---|---|
| MainSupervisorState | Supervisor | user_query, workflow_state, active_phase, helm_chart_artifacts |
| PlanningSwarmState | Planner | requirements, chart_plan (JSON), gaps_detected |
| GenerationSwarmState | Template | generated_templates, completed_tools, pending_dependencies |
| ValidationSwarmState | Validator | blocking_issues, validation_results, retry_counts |
| HelmAgentState | Helm Mgmt | chart_metadata, current_release (Live State), execution_plan |
| ArgoCDOnboardingState | ArgoCD Onboarding | project_info, repository_info, application_info, approval_checkpoints |
State Transformers
Data is not shared blindly. A StateTransformer middleware explicitly converts data when moving between the Supervisor and Sub-Agents (including the ArgoCD onboarding workflow). This ensures "Context Isolation": sub-agents see only what they need, preventing hallucination from irrelevant history.
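A minimal sketch of such a transformation between the Supervisor and the Planner. The schema and field names come from the table above, but the field types, the raw_query key, and both functions are assumptions; the real StateTransformer may be structured quite differently.

```python
from typing import TypedDict

class MainSupervisorState(TypedDict):
    user_query: str
    workflow_state: str
    active_phase: str
    helm_chart_artifacts: dict

class PlanningSwarmState(TypedDict):
    requirements: dict
    chart_plan: dict
    gaps_detected: list

def to_planning_state(main: MainSupervisorState) -> PlanningSwarmState:
    """Pass down only what the planner needs; everything else stays behind."""
    return {
        "requirements": {"raw_query": main["user_query"]},  # hypothetical key
        "chart_plan": {},
        "gaps_detected": [],
    }

def merge_planning_result(
    main: MainSupervisorState, result: PlanningSwarmState
) -> MainSupervisorState:
    """Fold the planner's output back into the global state."""
    artifacts = {**main["helm_chart_artifacts"], "chart_plan": result["chart_plan"]}
    return {**main, "workflow_state": "generation", "helm_chart_artifacts": artifacts}

main_state: MainSupervisorState = {
    "user_query": "Deploy nginx with autoscaling",
    "workflow_state": "planning",
    "active_phase": "planning",
    "helm_chart_artifacts": {},
}
planning_state = to_planning_state(main_state)
```

The planner's view contains no Supervisor-only fields, which is the isolation property the text describes.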
Persistence & Handoffs
- Checkpointer: All states are persisted to PostgreSQL. This enables long-running workflows where the user might take hours to approve a plan.
- Interrupts: When a HITL gate is triggered (e.g., in Helm Mgmt Phase 2), the state is saved, execution stops, and the system waits. Upon approval, it resumes exactly where it left off, hydrating the state from the database.
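The save, stop, and resume cycle can be illustrated with a small self-contained sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server; the table layout, the thread_id key, the status values, and all function names are assumptions rather than the project's actual checkpointer.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL store
conn.execute("CREATE TABLE checkpoints (thread_id TEXT PRIMARY KEY, state TEXT)")

def save_checkpoint(thread_id: str, state: dict) -> None:
    """Persist the state as JSON, overwriting any earlier checkpoint."""
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (thread_id, state) VALUES (?, ?)",
        (thread_id, json.dumps(state)),
    )
    conn.commit()

def load_checkpoint(thread_id: str) -> dict:
    """Hydrate the state for a thread from the database."""
    row = conn.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    if row is None:
        raise KeyError(thread_id)
    return json.loads(row[0])

def run_until_gate(thread_id: str, state: dict) -> dict:
    """At a HITL gate: save the state and stop without executing."""
    paused = {**state, "status": "awaiting_approval"}
    save_checkpoint(thread_id, paused)
    return paused

def resume(thread_id: str, approved: bool) -> dict:
    """After the human decision: reload the state and continue or abort."""
    state = load_checkpoint(thread_id)
    state["status"] = "executing" if approved else "rejected"
    save_checkpoint(thread_id, state)
    return state

run_until_gate("thread-1", {"phase": 2, "release": "web"})
resumed = resume("thread-1", approved=True)
```

Because the paused state lives in the database rather than in process memory, the approval can arrive hours later, or after a restart, and the workflow still picks up from the same point.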