
Step-by-Step OpenClaw Tutorial 2026: Zero to Hero Deployment Guide

By Sarah Jenkins

OpenClaw is an open-source, self-driven AI agent. This book combines best practices to provide a full-lifecycle guide from initial setup to application deployment, while deeply deconstructing its underlying operational mechanisms and implementation principles.
New to OpenClaw? Experience it in three steps:
This book is part of an AI technical series. The following titles provide complementary knowledge:
| Book Title | Relationship to This Book |
|---|---|
| AI for Beginners | Foundational AI knowledge for those without a technical background. |
| Prompt Engineering Guide | Theoretical basis for designing effective agent prompts. |
| Context Engineering Guide | Managing agent context and memory architecture design. |
| Claude Technical Guide | Claude's MCP protocol, tool use, and Agentic Coding. |
| Agentic AI Definitive Guide | General agent architectures and multi-agent collaboration patterns. |
| AI Security Definitive Guide | Security design and defense practices for agent systems. |
| LLM Internals & Architecture | Deep dive into the logic and structure of Large Language Models. |
Issues and Pull Requests are welcome, especially regarding: typo corrections, broken link fixes, practical case studies, and reusable templates.
Welcome to the world of OpenClaw. If you have ever used ChatGPT or Claude, you might have been amazed by the intelligence of Large Language Models (LLMs). However, when you try to get them to "actually get the job done"—such as automatically checking emails, organizing data, and sending it to a Lark group—you quickly realize there is a missing link: a system framework that connects AI with real-world software.
OpenClaw is exactly that: an Agent project designed to bridge the gap between LLMs and the real world, allowing AI to complete complex tasks for you automatically and securely.
As the opening of this book, this chapter aims to build a clear and systematic "cognitive map," outlining the core problems OpenClaw solves and defining its system boundaries. Through this chapter, you will establish a comprehensive understanding of OpenClaw.
This chapter includes the following sections:
Upon completing this chapter, you will be able to:
This section explores the inevitable trend of Large Language Models (LLMs) evolving into Agents, analyzes the complex engineering pain points encountered during implementation, and elucidates OpenClaw’s positioning as a "task operation domain," its applicable boundaries, and its core architectural logic.
Imagine you’ve hired an intern who is exceptionally brilliant—capable of researching data, drafting documents, and organizing information—and is available 24/7.
OpenClaw is that intern, except it lives inside your computer. More precisely: OpenClaw is a self-driven Agent that can be installed on local machines or servers, allowing it to access and utilize various tools (e.g., calendars, emails, chat windows).
It can assist you with:
The birth of OpenClaw is inseparable from the efforts of founder Peter Steinberger (also the founder of PSPDFKit). The project initially launched under the name Clawdbot, later changing to Moltbot following community feedback and positioning adjustments. However, in early 2026, the project faced a period of turbulence: it received a severe trademark compliance warning, followed closely by the hijacking of its official X (Twitter) account. Under extreme pressure, the community held a global vote, ultimately establishing the name OpenClaw.
This "trial by fire" served as a catalyst for community cohesion. The transparent governance and collaboration during the renaming crisis laid a foundation of trust for subsequent explosive growth—as of now, the project's GitHub Star count has surpassed many veteran open-source projects, making it one of the fastest-growing in its field.
As the capabilities of LLMs leap forward, interaction paradigms are undergoing profound changes. Moving from simple Q&A bots to assisted driving systems, and finally to Agents that autonomously plan and execute long-process tasks, system complexity is rising exponentially. When Agents truly take over business workflows, traditional architectures become inadequate, facing three core challenges:
In traditional business architectures, integrating LLM capabilities often requires weaving extensive logic into core code to handle complex context splicing, API callback parsing, and process exceptions. This disrupts business lines. OpenClaw positions itself as an independent "Runtime Domain," decoupling business logic from Agent scheduling. It actively takes over the heavy lifting of state assembly, model scheduling, and retry/fallback logic, providing a foundation for private deployment, multi-channel access, and support for multiple Agent types.
OpenClaw offers specific features to address the three major engineering pain points:
When introducing any new architectural component, clarifying its boundaries is vital. OpenClaw has typical use cases as well as specific scenarios where it should be used with caution.
This section analyzes the fundamental architecture of OpenClaw, aiming to answer four key questions: How is the system layered? What are the core components? How does a single request flow through the system? And which layer should be inspected first in case of a failure?
From the perspective of official product positioning and physical operational boundaries, OpenClaw can be divided into three core planes:
graph TD
User["User (DM/Group Chat)"] <--> |"Messages"| ChatProv["Multi-channel Inbox (Telegram/WhatsApp, etc.)"]
ChatProv <--> |"Network Requests"| Gateway["Gateway Process (WS+HTTP :18789)"]
ControlUI["Control UI / WebChat"] <--> |"WS Direct"| Gateway
subgraph GatewayProcess ["Gateway Process (Control & Operation)"]
Gateway <--> |"Event Dispatch"| AgentRuntime["Agent Runtime (Embedded Pi SDK Session)"]
AgentRuntime --> |"Intercept/Validate"| Policy["Tool Policy + Sandbox Policy"]
Policy --> CoreTools["Core Tools (fs/exec/web)"]
Policy --> BrowserTool["Browser Tool (CDP / Playwright)"]
Policy --> NodesTool["Nodes Tool (node.invoke)"]
end
subgraph OperationSurface ["External Operation Surface (Capability)"]
NodesTool <--> |"WS (role: node)"| PairedNode["Paired Node Device (macOS/iOS, etc.)"]
BrowserTool --> Chrome["Chromium / Remote CDP"]
CoreTools --> HostEnv["Docker Sandbox / Host Process"]
end

Corresponding to the four-layer architecture, all abstractions in the system manifest as these five core objects:
To understand how data actually moves through the system, we track the lifecycle of a standard request:
sequenceDiagram
autonumber
actor User as User
participant Gateway
participant Node
participant Agent as Agent Engine
participant Session
participant Model as LLM
participant Tool as Tool
User->>Gateway: Submit message after channel adaptation
Note over Gateway: Permission Point: Auth & Pairing Check
Gateway->>Node: Dispatch request based on routing
Node->>Agent: Invoke target Agent
Agent->>Session: Locate corresponding Session
Session-->>Agent: Extract historical context
Note over Agent,Model: Budget Point: Timeout & Retry Control
Agent->>Model: Submit assembled prompt
Model-->>Agent: Determine action required
Note over Model: Failure Point: Quota Fallback
Agent->>Tool: Call controlled Tool for external action
Tool-->>Agent: Return structured result
Agent->>Model: Re-submit context with tool result
Model-->>Agent: Generate final response
Agent->>Session: Write tool result & final response
Agent->>Gateway: Return processing result
Gateway->>User: Return to user terminal via original path
Because the architecture is strictly decoupled, troubleshooting follows an "outside-in" scanning strategy:
| Layer & Core Object | Core Responsibility | Typical Failure Symptom & Direction |
|---|---|---|
| Ingress (Channels) | Adapt protocols to standard events | No logs of incoming messages; client "connection dropped." Check: Webhook config, network connectivity. |
| Control (Gateway/Node) | Connection, Global Auth, Routing | "Instant Red-Bar Denial" (Unauthorized) without model errors. Check: Pairing files, routing topology, ACLs. |
| Operation (Agent/Session) | Memory, Retries, Sandboxing | Infinite thinking, blocking conditions, loss of context, OOC (Out of Character). Check: Context length, Agent JSON, Compaction params. |
| Capability (Tool/Model) | External interaction, Completion | No action response, 429 Rate Limit, unreleased calls. Check: Model quota, Tool metadata, Liveness probes. |
Why can a seemingly simple core (Event + Executor + State Machine) support complex, interruptible, long-running agent systems? OpenClaw is built directly upon the π (pi) minimalist operation skeleton:
flowchart LR
subgraph Inbound ["Ingress & Channels"]
U["User Message"] --> CH["Channels<br/>(WhatsApp/Telegram/etc.)"]
end
subgraph Gateway ["Gateway Control Plane"]
CH --> GW["Gateway<br/>WS + HTTP + Control UI"]
GW --> ROUTE["Routing/Bindings<br/>SessionKey Decision"]
ROUTE --> RUN["Agent Runtime<br/>Prompt Assembly/State Machine"]
RUN --> MODEL["Model Providers<br/>(Strategy-based Fallback)"]
RUN --> TOOL["Tools<br/>(Policy/Sandbox/Approval)"]
RUN --> OUT["Reply Stream"]
RUN --> STORE["Session Store + Transcript<br/>sessions.json + *.jsonl"]
GW --> LOGS["File Logs (JSONL)<br/>Control UI/CLI tail"]
end
OUT --> GW --> CH --> U
TOOL --> RUN

This section introduces the differences between OpenClaw and its primary alternatives, including conversational AI, assistant-style tools (such as Claude Coworker), and automated workflows.
Tools like ChatGPT, DeepSeek, and early Claude chat models provide an excellent intelligent Q&A experience for individuals. However, when building enterprise-grade autonomous intelligent systems, they typically face the following limitations:
The concept of an "AI Coworker" is currently on the rise, represented by Cursor and the Claude-based "Coworker" architecture proposed by Anthropic. Their positioning is very close to OpenClaw’s, as both strive to create "digital outsourcing" that integrates into business processes. Despite similar philosophies, key differences exist in their implementation forms:
Before the explosion of Agents, enterprises often relied on Zapier, RPA tools, or similar integration platforms to achieve cross-system workflows.
To provide a more comprehensive view of OpenClaw's position in the industry ecosystem, the following table compares typical representatives of cloud assistants, development frameworks, and automation platforms:
| Product/Platform | Deployment Form | Core Positioning | Tool Operation Location | Key Differences (vs. OpenClaw) |
|---|---|---|---|---|
| OpenAI Assistants | Cloud API (Developer Integration) | Building agents for apps; supports tool/function calling | App-side or cloud isolated environment | OpenClaw excels in "local operation surface" and multi-entry access without extra client development; OpenAI focuses on cloud API standardization. |
| LangChain/LangGraph | Dev Framework (Self-hosted) | Code library and orchestration for building agents | Code deployment side | LangChain is a "framework lego" requiring significant code to assemble; OpenClaw is an "out-of-the-box" personal runtime with a ready-made gateway. |
| n8n / Zapier | Self-hosted / Cloud | Workflow automation (including AI-enabled nodes) | Workflow node side | Traditional tools win on visual orchestration and massive SaaS integrations; OpenClaw wins on "NLP-based fuzzy reasoning" and deep integration with local resources. |
| Dify | Cloud / Self-hosted | Low-code AI app building; visual workflows and agent editors | Cloud or self-hosted nodes | Dify excels in the visual frontend experience; OpenClaw excels in local operation, private isolation, and fine-grained permissions. |
| Coze (ByteDance) | Cloud SaaS | Low-code agent building; integrated with ByteDance ecosystem | Cloud isolated environment | Coze is ready-to-use but limited to its cloud ecosystem; OpenClaw offers full privatization, local operation, and enterprise-grade Gateway capabilities, but requires self-maintenance. |
| AutoGen (Microsoft) | Dev Framework (Self-hosted) | Multi-agent collaboration; focuses on role-playing and planning | Code deployment side | AutoGen excels in multi-agent dialogue and collaborative programming; OpenClaw excels in multi-channel entry for single agents, local tool calling, and long-term memory. |

Overall, OpenClaw holds a distinct competitive advantage in the "Local Operation Surface (Files/Processes/Browsers/Nodes) + Unified Chat UI" space. That said, with respect to enterprise RBAC and hosted cloud availability, it remains positioned as an isolated-environment platform for individuals or internal networks.
The comparison table above reflects the landscape at the time of this publication. However, the AI Agent field is moving rapidly. The capability boundaries, deployment options, and pricing models of various products are constantly evolving. Readers are encouraged to visit official project documentation and community discussions for the latest feature benchmarks and user feedback.
Based on different requirement stages, the selection advice is as follows:
Before deciding to implement OpenClaw, it is essential to establish clear expectations. This section helps you make an informed selection based on four dimensions: "What it's good for," "What it's not good for," "What the risks are," and "Token costs."
OpenClaw’s core competitiveness lies in the trinity of self-hosting + multi-channel access + tool operation. The following scenarios are its optimal strengths:
No tool is a silver bullet. OpenClaw may not be the optimal choice in these scenarios:
It is worth noting the structural impact of Agents on work styles. When peers begin using AI to accept orders 24/7 or automatically plan routes, those who do not use them may be systematically eliminated due to the efficiency gap—this is known as the "Coercion Effect." Self-hosted Agent platforms like OpenClaw are the infrastructure for this trend.
Furthermore, Agents change business logic: they have no emotional preferences and only seek the optimal solution (cost-performance, speed), ignoring ads and visual marketing. This means traditional traffic funnel models may gradually fail. For teams evaluating OpenClaw, this is both an opportunity and a reminder to take post-deployment security and governance seriously.
When deploying and using OpenClaw, maintain a clear understanding of the following risks:
Security Risks
OpenClaw empowers AI to execute Shell commands, read/write files, and send messages. This is essentially a mismatch between reasoning capability and operation permissions—current LLM reliability is not yet sufficient for the operational permissions granted. Prompt Injection or misconfiguration could lead to accidental file deletion, sensitive information leakage, or unauthorized external requests. It is vital to:
This is the most common oversight for new users. Every interaction with OpenClaw consumes model API Tokens, and Tokens are real money.
In standard ChatGPT/Claude dialogues, Token consumption = User Input + Model Output. In OpenClaw, a seemingly simple request may undergo multiple reasoning loops:
graph LR
A["User Message"] --> B["Prompt Assembly<br/>System Prompt + Tools<br/>+ Context History"]
B --> C["LLM Inference #1"]
C --> D["Tool Call"]
D --> E["Tool Result Injection"]
E --> F["LLM Inference #2"]
F --> G["Final Response"]
Each reasoning round resends the complete system prompt, tool definitions, and context history. A single user message may trigger 2–5 rounds, each consuming thousands of tokens.
| Component | Consumption per Round | Description |
|---|---|---|
| System Prompt | 500-1000 tokens | Role definition and behavioral instructions |
| Tool Definitions | 200-500 tokens | Descriptions for every mounted tool |
| Context History | 500-5000 tokens | Grows with dialogue turns; the largest cost source |
| User Message | 50-1000 tokens | The actual user input |
| Model Output | 100-2000 tokens | Inference results and tool call parameters |
| Tool Return Value | 100-2000 tokens | Result of tool operation injected back into context |

Using Claude Sonnet 4.6 ($3/1M input, $15/1M output) as an example, an interaction involving 2 tool calls totals roughly 8,000–15,000 tokens, costing about $0.03–$0.10. While seemingly small, frequent interactions can lead to monthly bills of dozens or even hundreds of dollars.
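The arithmetic behind these cost figures can be sketched in a few lines of shell. The per-round numbers below are illustrative assumptions drawn from the ranges in the table, not measured values:

```shell
#!/bin/bash
# Back-of-envelope cost for one interaction with 2 tool calls (3 LLM rounds).
# Per-round figures are illustrative midpoints from the table above.
ROUNDS=3               # inference rounds: 1 initial + 2 tool-result loops
INPUT_PER_ROUND=4000   # system prompt + tool defs + history + user message
OUTPUT_PER_ROUND=600   # model output / tool-call parameters
awk -v r="$ROUNDS" -v i="$INPUT_PER_ROUND" -v o="$OUTPUT_PER_ROUND" 'BEGIN {
  in_price = 3; out_price = 15          # $ per 1M tokens (Claude Sonnet 4.6)
  cost = (r * i * in_price + r * o * out_price) / 1000000
  printf "tokens: %d in / %d out, cost: $%.4f\n", r * i, r * o, cost
}'
# → tokens: 12000 in / 1800 out, cost: $0.0630
```

With these assumptions the total (13,800 tokens, about $0.06) lands squarely inside the 8,000–15,000 token / $0.03–$0.10 range quoted above.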
Rule of Thumb: If a task can be completed via a traditional script or API call (e.g., scheduled data fetching), don't let the LLM intervene. Use Agents only for parts requiring natural language understanding, fuzzy decision-making, or multi-step reasoning. This is the fundamental principle of token cost control.
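As a concrete contrast, the traditional-script path for scheduled data fetching costs zero tokens. A minimal sketch, in which the script path, URL, and schedule are all placeholders:

```shell
#!/bin/sh
# Zero-token alternative to an agent loop: a plain fetch script driven by
# cron. The URL and file paths below are illustrative placeholders.
cat <<'EOF' > /tmp/fetch_metrics.sh
#!/bin/sh
curl -fsS -m 10 https://example.com/metrics -o /tmp/metrics.json
EOF
chmod +x /tmp/fetch_metrics.sh
echo "installed /tmp/fetch_metrics.sh"
# Schedule it every 30 minutes with a crontab entry such as:
#   */30 * * * * /tmp/fetch_metrics.sh
```

Reserve the Agent for the step that genuinely needs language understanding, such as summarizing the fetched data on demand.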
Based on OpenClaw's core features—strong isolation, self-controlled infrastructure, and a rich node toolchain—we can derive several high-potential real-world application scenarios. These are areas where pure cloud-based SaaS often struggles to provide full coverage.
For power users or digital professionals, OpenClaw acts as a personal digital nerve center.
Scenario Description: By sending commands via Telegram or WhatsApp, OpenClaw can query calendars, organize emails, summarize notes, or even perform low-frequency O&M (Operations and Maintenance) tasks in the background.
Specific Workflow:
Combining Cron jobs, Webhook triggers, and Node capabilities.
Scenario Description: Deploying the Gateway on a low-power device (like a Raspberry Pi) for constant operation. It can monitor hardware status on a laboratory LAN or pull data at specific times to send to designated groups.
Specific Workflow:
For researchers or users who need long-term context accumulation.
Scenario Description: Leveraging OpenClaw's "File as Truth" design, all interaction memories and extracted long-term knowledge are stored as standard Markdown or JSONL files. A local SQLite database builds vector indexes to enable Retrieval-Augmented Generation (RAG).
Specific Workflow:
For security defense researchers.
Scenario Description: Since OpenClaw explicitly lists LLM-specific threats (e.g., Prompt Injection leading to arbitrary command operation, SSRF, or approval bypasses) as research points, it is naturally suited as an experimental platform in isolated virtual environments.
Specific Workflow:
This chapter has outlined OpenClaw’s system positioning, architectural landscape, and core objects.
After reading this chapter, try to answer the following questions:
This chapter aims to guide you through deploying OpenClaw in a local or server environment. By following a systematic preparation process, complete installation workflow, and acceptance testing, you will ensure that OpenClaw runs reliably in your environment, laying the foundation for the practical exercises in subsequent chapters.
This chapter consists of the following sections:
After completing this chapter, you will be able to:
Scope of Application: This guide is applicable to macOS, Linux, and Windows (WSL2 recommended) environments. For production-grade deployments, it is strongly recommended to use a Linux host with Docker, supplemented by a reverse proxy, process manager, and a strict least-privilege account policy.
This section outlines the system environment and network connectivity requirements that must be verified before installation.
Key dependencies to prepare:
Regardless of the installation method, the purity and version compatibility of the underlying runtime environment is the first hurdle.
[!WARNING] Insufficient memory causes numerous issues. When performing auto-updates, running browsers, or processing long-context tasks, servers often encounter OOM (Out of Memory) errors that freeze processes or cause update failures. The cost saved on hardware is rarely worth the subsequent troubleshooting overhead.
Since OpenClaw relies heavily on remote APIs, network status directly dictates availability. This is split into two parts:
# Installation Network: npm registry
curl -sS -m 5 -o /dev/null -w "npm registry: %{http_code}\n" https://registry.npmjs.org/
# Runtime Network: LLM Provider API (OpenAI example)
curl -sS -m 10 -o /dev/null -w "llm provider: %{http_code}\n" https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
# Notes:
# - 200: Authentication passed and network is reachable.
# - 401/403: Usually means authentication failed, but the network path is open (still useful for "can I connect" checks).
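If desired, this status-code interpretation can be folded into a small reusable check. The probed endpoint and output wording here are just examples:

```shell
#!/bin/bash
# Classify a probe result the same way the notes above do:
# 200 = fully OK, 401/403 = reachable but auth issue, 000 = no connection.
probe() {
  local url=$1 code
  code=$(curl -sS -m 10 -o /dev/null -w "%{http_code}" "$url" 2>/dev/null)
  [ -z "$code" ] && code=000
  case "$code" in
    200)     echo "$url: reachable and authorized" ;;
    401|403) echo "$url: reachable, check key/permissions" ;;
    000)     echo "$url: unreachable (network/proxy/DNS)" ;;
    *)       echo "$url: unexpected status $code" ;;
  esac
}
probe https://registry.npmjs.org/
```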
Often overlooked but fatal hidden issues include:
To ensure a "plug-and-play" experience, gather these materials in a password manager or environment variables beforehand. Do not hardcode them.
[!CAUTION] Never hardcode API tokens in test scripts or copy-paste them into chat software. Use environment variables for secure management.
You can use the following diagnostic script check_env.sh to verify core dependencies:
Bash
#!/bin/bash
echo "=== OpenClaw Environment Self-Check ==="
node --version || echo "Warning: Node.js not installed or version < 22"
npm --version || echo "Tip: npm not installed. Required for non-script installations."
docker --version || echo "Tip: Docker not installed (Required for container deployment)."

echo "Testing installation network (Official Script)..."
curl -s -m 5 -o /dev/null -w "install script: %{http_code}\n" https://openclaw.ai/install.sh

echo "Testing runtime network (LLM Provider, e.g., OpenAI)..."
if [ -n "${OPENAI_API_KEY:-}" ]; then
  curl -sS -m 10 -o /dev/null -w "llm provider: %{http_code}\n" https://api.openai.com/v1/models \
    -H "Authorization: Bearer $OPENAI_API_KEY"
else
  curl -sS -m 10 -o /dev/null -w "llm provider: %{http_code}\n" https://api.openai.com/v1/models
fi
echo "Note: 200 = Success; 401/403 = Reachable but check Key/Permissions."
echo "Check Complete"
Expected Output (Healthy Environment):
Plaintext
=== OpenClaw Environment Self-Check ===
v22.12.0
10.2.3
Docker version 27.3.1
Testing installation network...
install script: 200
Testing runtime network...
llm provider: 200
Check Complete
Common Anomalies:
| Output | Meaning | Action |
|---|---|---|
| Warning: Node.js not installed | Missing Node.js or not in PATH | Run nvm install 22 or install from official site |
| install script: 000 | Cannot connect to openclaw.ai | Check network/proxy/DNS settings |
| llm provider: 401 | Invalid API Key or not set | Check $OPENAI_API_KEY environment variable |
| llm provider: 403 | API Key lacks permission | Verify account has available credit/quota |
This section describes how to install OpenClaw in your chosen environment. The official recommendation is to use the one-click installation script for the best experience, though installation via package managers like npm is also supported.
The simplest and fastest way to install is by executing the official one-click script. This script automatically handles dependencies and installs the latest version of the OpenClaw CLI.
macOS / Linux Run the following command in your terminal:
Bash
curl -fsSL https://openclaw.ai/install.sh | bash
Windows (PowerShell) Run the following command in PowerShell:
PowerShell
iwr -useb https://openclaw.ai/install.ps1 | iex
If you are familiar with the Node ecosystem or require precise version control for specific workflows, you can install OpenClaw globally using npm or pnpm.
Bash
# Global installation via npm
npm install -g openclaw@latest
# Global installation via pnpm
pnpm add -g openclaw@latest
Suggestion: Avoid relying on the latest tag indefinitely in testing or production environments. A safer practice is to lock to a specific version and include that version number in your delivery documentation and regression checklists.
Bash
npm install -g openclaw@<version>
Ideal for containerized or headless deployment scenarios.
Bash
./docker-setup.sh
You can customize behavior via environment variables, such as enabling the sandbox or pre-installing extensions:
Bash
export OPENCLAW_SANDBOX=1
export OPENCLAW_EXTENSIONS="diagnostics-otel matrix"
./docker-setup.sh
Manual Installation: If not using the automation script, execute the following commands in sequence:
Bash
docker build -t openclaw:local -f Dockerfile .
docker compose run --rm openclaw-cli onboard
docker compose up -d openclaw-gateway
Post-Installation Verification: Access http://127.0.0.1:18789/ in your browser. Retrieve the Token from your .env file and paste it into the console Settings. You can confirm the gateway status via the health check endpoint:
Bash
curl -fsS http://127.0.0.1:18789/healthz
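In scripted deployments you may want to wait for this endpoint before proceeding. A minimal polling sketch, assuming the default port shown above (the retry count is arbitrary):

```shell
#!/bin/bash
# Poll a health endpoint until it answers 200, up to N attempts.
# Usage: wait_healthy <url> [attempts]
wait_healthy() {
  local url=$1 attempts=${2:-30} i
  for i in $(seq 1 "$attempts"); do
    if curl -fsS -m 2 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep 1
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}
# Example: wait_healthy http://127.0.0.1:18789/healthz 30
```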
Best for developers or scenarios requiring custom modifications. Requires Node.js >= 22 and pnpm.
Bash
git clone https://github.com/openclaw/openclaw.git
cd openclaw
pnpm install
pnpm openclaw setup
To start the gateway:
Bash
node openclaw.mjs gateway --port 18789 --verbose
For development mode (hot reload):
Bash
pnpm gateway:watch
Official support is also provided for the following methods in specific O&M scenarios:
OpenClaw provides environment variables to override default paths, which is particularly useful for multi-instance or non-standard deployments:
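For illustration, here is a hypothetical second instance isolated via path overrides. OPENCLAW_HOME and OPENCLAW_CONFIG_PATH are the overrides referred to in this chapter; the directory layout and the alternate port are assumptions:

```shell
#!/bin/bash
# Hypothetical staging instance kept fully separate from the default one.
# The directory names and port below are illustrative, not official defaults.
export OPENCLAW_HOME="$HOME/.openclaw-staging"
export OPENCLAW_CONFIG_PATH="$OPENCLAW_HOME/config.json"
mkdir -p "$OPENCLAW_HOME"
echo "staging instance home: $OPENCLAW_HOME"
# Then start the second gateway on a non-default port:
#   openclaw gateway --port 18790
```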
After installation, perform a minimal verification to ensure the command is available and your PATH is correct:
Bash
openclaw --version
openclaw --help
To upgrade to a new version, re-run the one-click script or re-execute the global installation command with the desired <version> tag (or @latest).
Bash
npm install -g openclaw@<version>
The goal of an upgrade strategy is not just to use the newest version, but to ensure the process is verifiable and reversible:
💡 Troubleshooting Tale: The Node Version Mystery
A community user reported that openclaw installed successfully but threw constant SyntaxError: Unexpected token errors upon startup. After three hours of troubleshooting, it was discovered that the system's default Node version was v16 (a legacy setting in nvm), while OpenClaw requires Node 22+. Lesson: Always run node -v to confirm your version before installation, especially in environments using version managers like nvm or Volta.
This section guides you through generating a minimal configuration using the official onboarding wizard and completing your first interaction verification.
Execute the following command to start the wizard. It is officially recommended to use the --install-daemon flag. This not only generates your workspace configuration but also automatically installs background daemon services (such as LaunchAgent for macOS or systemd user services for Linux/WSL2).
Bash
openclaw onboard --install-daemon
Tip: The primary difference between using and omitting the --install-daemon flag lies in the automatic configuration of background services:
- With the flag (Recommended): In addition to generating workspace configurations (e.g., ~/.openclaw/workspace), it registers and installs system background services. This ensures the Gateway service continues running after a reboot, making it ideal for long-term use.
- Without the flag (running only openclaw onboard): Only generates configuration files and completes initialization without registering background services. You must start the service manually whenever you wish to use it, which is better suited to temporary local trials.
During the wizard’s series of prompts, follow this "Golden Path" to minimize troubleshooting stress:
Note: If you did not use the --install-daemon flag, you must manually execute openclaw gateway to start the service after closing your terminal or restarting your computer.
Once the Gateway is running, the core goal is to complete the initialization dialogue (Bootstrap) via the built-in Control UI (Dashboard).
Run the following command to open the local console directly, or visit http://127.0.0.1:18789/#token=<TOKEN> in your browser:
Bash
openclaw dashboard
Once opened, you will see the Gateway Dashboard interface, structured as shown in the overview below:
Figure 2-1: Dashboard Overview
In the dialogue box, set boundaries for your Agent by defining:
"Hello! I am a busy office worker who often forgets things. Please act as my daily productivity assistant. Follow these rules: 1. Give practical, down-to-earth advice. 2. Keep answers brief and structured. 3. Explain technical terms in plain language. As a smoke test, give me a Markdown list of 5 daily to-do items I can execute today, sorted by priority."
If you receive a structured, practical response, congratulations! Your base installation is successful.
After the first dialogue, check ~/.openclaw/workspace. You will find a set of Markdown files generated from templates. These form the Agent's Bootstrap Context: every time a session starts, the Gateway injects these into the system prompt so the Agent immediately knows who it is and what to do.
Plaintext
~/.openclaw/workspace/
├── AGENTS.md # Workspace Home: Startup checklist & red lines
├── SOUL.md # Persona: Values, communication style, boundaries
├── USER.md # User Profile: Name, timezone, preferences
├── IDENTITY.md # Agent Metadata: Name, avatar, emoji
├── TOOLS.md # Environment Notes: Local device names, SSH hosts
├── HEARTBEAT.md # Heartbeat Inspection List (Optional)
├── BOOTSTRAP.md # Onboarding script (Auto-deleted after completion)
└── memory/ # Memory directory (Daily conversation summaries)
| File | One-Sentence Definition | When it's Read |
|---|---|---|
| AGENTS.md | The "Home Page." Defines file reading order and rules for group chats. | Every Session |
| SOUL.md | The "Character Manual." Defines pragmatic values and communication style. | Every Session |
| USER.md | User Profile. Records your background and preferences, evolving over time. | Every Session |
| IDENTITY.md | Agent Metadata. Stores the Agent's name and representative emoji. | Every Session |
| TOOLS.md | Environment Memo. Records local device names and SSH aliases. | Every Session |
| HEARTBEAT.md | Heartbeat Task List. Executed during periodic polls. | Heartbeat Only |
| BOOTSTRAP.md | Onboarding Script. Guides the initial self-introduction. | First Run Only |
[!TIP] These are standard Markdown files. You can edit them anytime. If you modify SOUL.md or AGENTS.md, the changes take effect in the next session—no Gateway restart required.
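For example, a minimal sketch of such an edit. The appended content is purely illustrative, not an official template:

```shell
#!/bin/bash
# Append an extra style rule to SOUL.md. The rule text below is an example
# of the kind of persona guidance the file holds; it takes effect at the
# next session start, with no Gateway restart needed.
WORKSPACE="$HOME/.openclaw/workspace"
mkdir -p "$WORKSPACE"
cat >> "$WORKSPACE/SOUL.md" <<'EOF'

## Extra style rules
- Answer in short, structured bullet points.
- Flag any destructive action before executing it.
EOF
echo "updated $WORKSPACE/SOUL.md"
```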
Verify your setup using the built-in diagnostic tools:
Bash
# Check Gateway status
openclaw gateway status
# Perform a full configuration health check
openclaw doctor
Once diagnostics pass, your "Model and Control Link" is established. The next section will cover monitoring, log troubleshooting, and deep availability verification for the background Gateway.
After initializing with openclaw onboard --install-daemon, the OpenClaw Gateway is automatically configured as a system-level daemon that starts on boot (such as a LaunchAgent on macOS or a systemd user service on Linux). This section explains how to verify and manage this background service process.
If you have not installed the daemon, or if you need to perform temporary debugging, you can run the Gateway directly in the foreground:
Bash
# Start Gateway in the foreground with real-time log output to the console
openclaw gateway --port 18789
Configuration & Health Check: Exposing Errors Early
Bash
openclaw doctor
If the health check indicates that configuration files failed to load or contain syntax errors, prioritize checking your config paths and formatting. Environment variables like OPENCLAW_HOME and OPENCLAW_CONFIG_PATH can also be used to override default paths.
When deployed via the --install-daemon mode, OpenClaw can manage its background status directly through its own CLI tools, eliminating the need for third-party process managers like pm2.
Bash
# View the current background status and critical ports of the Gateway
openclaw gateway status
If the process status appears abnormal, check the corresponding background logs for your specific platform.
Tip: Automatically registered service names may vary by environment. On Linux, the service is typically inspected with sudo systemctl status openclaw or systemctl --user status openclaw. However, using the CLI wrapper commands is the most platform-independent and universal method.
Once the service is confirmed to be running, further verify the Gateway’s control plane operation:
Bash
openclaw logs --limit 200
# Optional: Follow structured logs in real-time
openclaw logs --follow --json
If logs show repeated restarts or authentication failures, stop and perform a layered troubleshooting analysis.
The fastest way to verify Gateway functionality is to send a test message directly to a supported channel via the CLI:
Bash
# Replace target with an actual reachable account ID (e.g., a bound WhatsApp number)
openclaw message send --target +15555550123 --message "Hello from OpenClaw Dashboard"
Success here proves that the path from process loading to route invocation is unobstructed. You can then proceed to visualized dialogues via the Control UI.
When a first run fails, use the following four-layer "near-to-far" path for rapid localization:
flowchart TD
start["Gateway Abnormal"] --> L1["Environment & Config Layer"]
L1 -->|"Port occupied? Config error?"| L1ok{Pass?}
L1ok -->|"No"| fix1["Fix file paths & port conflicts"]
L1ok -->|"Yes"| L2["Control Plane Layer"]
L2 -->|"Permission error? Model Key invalid?"| L2ok{Pass?}
L2ok -->|"No"| fix2["Update Auth via Dashboard"]
L2ok -->|"Yes"| L3["Operation Link Layer"]
L3 -->|"Plugin crash? Tool timeout?"| L3ok{Pass?}
L3ok -->|"No"| fix3["Audit tool and skill parameters"]
L3ok -->|"Yes"| L4["External Network Layer"]
L4 --> fix4["Troubleshoot API firewalls & rate limits"]
Upon completing this chapter, you should possess a standardized workflow for installation, initialization, and first-run acceptance, as well as the ability to quickly localize common environment and dependency issues.
Chapter 3 will dive deep into configuring initial agent instructions and group chat strategies within the Dashboard and WebChat environments. Advanced multi-channel integration will be covered in Chapter 7.
This chapter establishes a reproducible "Minimum Viable Baseline": first, we will run the main local loop using the Dashboard and WebChat to master diagnostic commands and troubleshooting sequences; next, we will solidify initial instruction goals and formatting; finally, we will set up minimum security and access boundaries for entry points. By the end of this chapter, you will have your first functional OpenClaw instance and a solid foundation for further in-depth learning.
This chapter consists of the following sections:
Upon completing this chapter, you will be able to:
This section establishes a "minimum closed loop" based on the official Web interface: first, confirm gateway health via CLI, then use the Dashboard to open WebChat for a minimal interaction test. We will also include common blockers, such as "new device approval," in our troubleshooting path. The goal is to ensure a stable, reproducible local baseline exists before connecting any external channels.
Connecting external channels introduces numerous variables: platform rate limiting, callback retries, network jitter, and group chat noise. Verifying the main loop via the local Dashboard and WebChat offers two immediate benefits:
The following operational steps are recommended.
The Dashboard (Control UI) serves as the Web management center for OpenClaw. The interface consists of a Top Bar, Sidebar Navigation, and Main Content Area.
Top Bar: The left side features a hamburger menu (≡, to collapse/expand the sidebar) and the OpenClaw logo. The right side displays the version number (e.g., Version 2026.3.8), a health indicator (Health OK), and a theme switcher (System/Light/Dark).
Navigation Bar: Contains the following menu items, divided into two groups. The first group (visible by default) includes 10 items:
| Menu Item | Route | Functional Description |
|---|---|---|
| Chat | /chat | WebChat window supporting session selection, new chats, and streaming output. Features a "Thinking" toggle, Focus Mode, and Cron Sessions viewer. |
| Overview | /overview | Gateway overview page. Shows Access info (WebSocket URL, Token, Session Key) and Snapshot cards (Status, Uptime, Cron status). |
| Channels | /channels | Channel management. Displays status (Running, Mode, Last probe), Account configs, Allowlists, and credential settings (Bot Tokens). |
| Instances | /instances | Instance list. Shows presence beacons for connected gateways and clients, including Hostname, IP, OS, Version, and Permissions. |
| Sessions | /sessions | Session management. Lists active sessions with Key, Label, Kind, Token usage, and per-session overrides for Thinking/Verbose modes. |
| Usage | /usage | Statistics and cost analysis. Supports date filtering (7d/30d), Token/Cost views, Activity timelines, and data export. |
| Cron Jobs | /cron | Scheduled task management. Features global status, a creation form (Schedule, Operation mode, Wake mode), and an existing jobs list. |
| Agents | /agents | Agent configuration center. Left: Agent list; Right: Details panel including Overview, Files (edit AGENTS.md/SOUL.md), Tools, and Skills. |
| Skills | /skills | Skill management. Lists built-in skills (bundled/blocked), supports search/filter, and allows for dependency installation (e.g., 1Password CLI). |
| Nodes | /nodes | Device and permission management. Configures Exec Approval policies (Security/Ask Mode) and manages paired Device IDs/Tokens. |
The second group (appearing in the expanded area or via direct routing) includes 4 items:
| Menu Item | Route | Functional Description |
|---|---|---|
| Config | /config | Global config editor (openclaw.json). Supports Form/Raw editing modes with search, tagging, and Save/Apply operations. |
| Debug | /debug | Debug snapshots. Displays raw JSON of internal gateway states (heartbeat, channelSummary, queued events) for deep troubleshooting. |
| Logs | /logs | Real-time log viewer. Reads JSONL logs with level filtering (trace to fatal), keyword search, auto-follow, and export functionality. |
| Docs | External | Direct link to the official OpenClaw documentation site (docs.openclaw.ai). |
In a troubleshooting workflow, it is recommended to confirm gateway health in Overview, check external connections in Channels, and finally perform interaction tests in Chat. For deeper diagnostics, refer to Logs and Debug.
1. Confirm gateway health:
Bash
openclaw health --json
2. Open the Dashboard:
Bash
openclaw dashboard
# Or open directly on the gateway machine:
# http://127.0.0.1:18789/
Common Blocker: First-time access from a new browser or device requires approval. If the Dashboard indicates a pending device, list and approve it via the CLI:
Bash
openclaw devices list
openclaw devices approve <ID>
The key value of WebChat is exposing the process: whether the model request was sent, if tools were proposed/executed, and if output is streaming back. For troubleshooting, the most important task is aligning each interaction with the traces in the logs.
Operational Suggestion: Enable structured logs and compare the streaming output in the Dashboard's Chat interface.
Figure 3-1: WebChat interface illustration. The Chat page provides a complete view: the left side for input, and the right or bottom for streaming output, including user input, model reasoning, tool call requests, and results.
Bash
openclaw logs --follow --json
It is recommended to use reproducible test cases rather than random questions.
Test Case 1: Health Link Confirmation
JSON
{
"status": "ok",
"gateway": "running",
"uptime": 12345,
"channels": { "telegram": "connected" },
"models": { "default": "gpt-5" }
}
Test Case 2: Request/Response Log Alignment
{ "level": "info", "event": "request_received", "message": "..." }
{ "level": "info", "event": "response_sent", "duration_ms": 2333 }
Test Case 3: Streaming Verification
This section shifts troubleshooting from "guessing based on errors" to "layered positioning based on evidence." The core strategy is to use health and status to determine system availability, followed by channels status --probe and models status --check to verify dependencies, and finally using doctor and diagnostic configurations to sample evidence while redacting sensitive information.
All CLI commands in this section can be found in Appendix E: Command Cheat Sheet for full syntax; for a more systematic process, see Appendix C: Troubleshooting Checklist.
We recommend a fixed troubleshooting sequence:
Run health and status probes first, then verify channel and model dependencies to narrow down issues to actionable steps.
Command 1: Health Check
Bash
openclaw health --json
✅ Normal Output (includes fallback models and timestamps):
JSON
{
"status": "ok",
"gateway": "running",
"uptime": 45678,
"channels": { "telegram": "connected", "whatsapp": "connected" },
"models": { "default": "gpt-5", "fallback": "claude-opus-4-6" },
"last_check": "2026-03-06T10:30:45.123Z"
}
❌ Common Abnormalities: status: "degraded" indicates partial failure. Check the errors array for expired tokens or quota warnings.
Command 2: Status Overview & Deep Probe
Bash
openclaw status --deep
✅ Normal Output: Shows PID, Uptime, Memory/CPU usage, and a summary of active chats per channel.
❌ Common Abnormalities: High memory usage (>90%), disconnected channels, or LIMIT REACHED on model quotas.
Command 3: Channel Status & Connectivity Probe
Bash
openclaw channels status --probe
✅ Normal Output: Provides webhook latency and message delivery confirmation.
❌ Common Abnormalities: bot_token_invalid or high webhook_latency (>5000ms), suggesting network issues or firewall blocks.
Command 4: Model Status & Auth Probe
Bash
openclaw models status --check
✅ Normal Output: Shows provider availability, token usage percentage, and latency.
❌ Common Abnormalities: authentication_failed (invalid API key) or rate_limited (100% quota used).
Logs are written to /tmp/openclaw/openclaw-YYYY-MM-DD.log by default. Use --json and jq for filtering. You can also view these in real-time via the Logs tab in the Dashboard.
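As a quick illustration of this kind of triage, the snippet below fabricates two sample lines in the JSONL shape shown in this chapter and surfaces only the error-level entries with grep; with jq installed, a filter like `select(.level == "error")` does the same more robustly:

```shell
# Fabricated sample lines matching the JSONL shape shown in this chapter.
printf '%s\n' \
  '{"level":"info","event":"request_received","message":"..."}' \
  '{"level":"error","event":"model_timeout","message":"upstream timed out"}' \
  > /tmp/openclaw-sample.jsonl

# Surface only error-level entries for quick triage.
grep '"level":"error"' /tmp/openclaw-sample.jsonl
```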
Bash
openclaw logs --follow --json
The doctor command fixes common issues, while the config controls log rotation and data masking.
Bash
openclaw doctor --fix
Configuration Example (Redaction & Logging):
JavaScript
{
"logging": {
"level": "info",
"redactSensitive": "tools",
"redactPatterns": ["sk-[A-Za-z0-9]{16,}"]
},
"diagnostics": {
"enabled": true,
"flags": ["telegram.*"]
}
}
[!WARNING] OpenClaw enforces strict Schema validation. Unknown keys will cause the Gateway to reject startup; use openclaw doctor to restore configuration.
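The redactPatterns entries in the configuration above act as regular-expression replacements over log output. A minimal sketch of that behavior (the "[REDACTED]" token and the redact helper are illustrative assumptions, not OpenClaw internals):

```javascript
// Minimal sketch of regex-based redaction, mirroring the redactPatterns
// example above. The "[REDACTED]" token is an assumption for illustration.
const redactPatterns = [/sk-[A-Za-z0-9]{16,}/g];

function redact(line) {
  return redactPatterns.reduce((s, re) => s.replace(re, "[REDACTED]"), line);
}

console.log(redact("auth with key sk-abcdefghijklmnop1234"));
```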
These commands can be sent directly in the chat interface:
| Command | Purpose | Typical Scenario |
|---|---|---|
| /status | View current running status | Use this first if the agent is stuck. |
| /stop | Stop the current task | Force-terminate a stuck tool call. |
| /compact | Compress session context | Save tokens when near the limit. |
| /new | Start a fresh session | Avoid context pollution when switching topics. |
| /model <name> | Switch current model | Swap to a cheaper or more powerful model. |
| /think <level> | Adjust reasoning depth | Use off for chat, high for complex logic. |
[!TIP] /status + /stop is your first line of defense. If the agent is unresponsive but /status shows it is running, a tool call is likely hanging—use /stop to recover.
💡 Real-World Note: Health "OK" but no messages. openclaw health --json might return ok while WhatsApp fails because the QR pairing expired. The process is alive, but the socket is inactive. Always use channels status --probe for true end-to-end verification.
Initial instructions are designed to solidify an agent's core objectives, operation boundaries, and output standards. They provide a stable semantic foundation for subsequent tool calls and routing decisions. Based on OpenClaw's instructions configuration, this section details best practices for writing instructions—including what to include and what to avoid—and explores how to build instructions, channel entry points, tool policies, and system observability into a repeatable workflow.
In engineering terms, initial instructions are not about "creating a persona" but are a runtime contract that includes at least three types of information:
OpenClaw supports global default instructions as well as specific overrides for individual agents. The configuration uses agents.defaults.instructions as the base value, which can be customized per agent.
The following example demonstrates basic instruction patterns. These settings can also be managed visually via the Agents page in the Dashboard:
Figure 3-4: Visual configuration for Agents.
The example below emphasizes "verifiable and troubleshootable" output requirements:
JavaScript
{
agents: {
defaults: {
instructions: "Produce executable steps and verification methods; state uncertainties clearly and provide troubleshooting paths; avoid fabricating commands or config keys."
},
work: {
displayName: "Work Assistant",
instructions: "Handle work-related queries only; for write-access or high-risk operations, provide confirmation points and rollback plans first."
}
}
}
If using multi-agent routing, it is recommended to place "entry governance" in the entry agent's instructions and "domain knowledge/tool usage" in the domain agent's instructions to avoid overloading a single set of instructions.
Instruction executability comes from being "checkable." In practice, use the following structure:
| Dimension | ❌ Bad (Literary/Vague) | ✅ Good (Checkable/Verifiable) |
|---|---|---|
| Goal | "You are a friendly assistant helping the user." | "Only handle K8s ops queries; reply 'Out of scope' for others." |
| Boundary | "Please use dangerous commands carefully." | "Forbidden: kubectl delete, helm uninstall. If required, output a rollback plan and wait for confirmation." |
| Output | "Please answer as detailed as possible." | "Output must include: Conclusion (1 sentence), Command (commented), Verification (expected output), Failure Handling (next step)." |
| Source | "Answer based on your knowledge." | "Only cite official docs/runbooks in the workspace; if unsure, explicitly state 'Not found in documentation'." |
A "Good Instruction" Template:
Plaintext
You are a K8s Ops Assistant. Follow these rules:
1. Only handle Kubernetes cluster operations issues.
2. Fixed format: Conclusion → Command (commented) → Verification → Failure Handling.
3. Forbidden: Destructive commands like delete/uninstall.
4. If unsure, explain why and provide a diagnostic path.
5. Citations must include the document name or URL.
Avoid relying solely on instructions for security. True boundaries should be enforced by tool policies and sandbox constraints for deterministic protection (see Section 5.2: Tool Policy).
Example 1: Simple — Personal Daily Assistant
JavaScript
{
agents: {
personal_assistant: {
displayName: "OpenClaw-Personal",
model: "gpt-5",
tools: ["calendar_query", "reminder_set", "task_log"],
instructions: `You are a personal schedule assistant.
1. Only handle schedule, task, and reminder requests.
2. Format: Confirmation → Result → Suggestions.
3. No access to private mail/finance data.
4. If unsure, do not hallucinate.`
}
}
}
Example 2: Medium — DevOps Team Assistant
JavaScript
{
agents: {
devops_assistant: {
displayName: "OpenClaw-DevOps",
tools: ["kubectl_get", "kubectl_logs", "healthcheck_run"],
instructions: `You are a Team DevOps Assistant.
- Format: 1) Diagnosis, 2) Exec Steps (Commented Shell), 3) Verification, 4) Failure Handling.
- [Read] commands are unrestricted; [Write] commands (patch/upgrade) require a YAML diff and a "Press Ctrl+C to abort" prompt.
- Forbidden: Destructive delete commands.
- Citations must include timestamps/node names.`
}
}
}
Example 3: Complex — Multilingual Support Gateway
JavaScript
{
agents: {
support_gateway: {
displayName: "Support Gateway",
tools: ["language_detect", "intent_classify", "ticket_create", "agent_escalate"],
instructions: `You are a Multilingual Support Gateway.
1. Detect language and respond in kind.
2. Classify intent (Consult/Account/Fault/Suggestion/Complaint).
3. Self-service first: use knowledge_search/faq_retrieve.
4. Escalate only if: user requests human, self-service fails 3x, or security issue.
5. For escalation: generate ticket_create and inform user of tracking ID.
- Keep technical terms (Pod, API) in English.
- Redact PII (emails/IDs) in logs.`
}
}
}
While the previous sections treat instructions as "executable specs" for teams, an alternative exists for personal use: SOUL.md. This file defines the communication style and relationship mapping, making the agent feel more like a "colleague who knows you" rather than a "support bot following SOPs."
| Dimension | Executable Spec Route | SOUL.md Personal Route |
|---|---|---|
| Use Case | Team tools, Production | Personal assistant, Private |
| Goal | Stability, Auditability | Natural, Personalized |
| Content | Prohibitions, Escalation | Style, Background, Taboos |
SOUL.md Structure:
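The exact sections are up to you; one illustrative layout (the headings below are suggestions for a personal file, not a required schema):

```markdown
# SOUL.md (illustrative layout)

## Who I Am
A concise persona: tone, language preferences, and how formal to be.

## Who You Are
Background about the user: role, current projects, recurring context.

## How We Work
Communication style: brevity, when to ask before acting, humor boundaries.

## Taboos
Topics or behaviors to avoid entirely.
```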
[!NOTE] SOUL.md is great for personal exploration. For team/production environments, stick to checkable instruction specs to prevent "persona drift." You can layer them: use instructions for boundaries and SOUL.md for style.
To verify if instructions are working, use reproducible test cases:
openclaw status --deep
openclaw logs --follow --json
If the model deviates, check if the issue is with routing bindings or tool policies before modifying the instruction itself. Version control your instructions alongside your configuration to maintain consistency across environments.
Imagine that, without any security configuration, your agent is pulled into a group of 500 people. If someone @mentions it with a sensitive question and it leaks internal company data, you have a serious problem.
The core objective is to put a "security lock" on the agent: intercepting strangers in private chats with pairing, enforcing @mention thresholds in group chats, and restricting which groups it can join.
OpenClaw supports four private chat policies (dmPolicy). It is generally recommended to use pairing or allowlist.
Configuration Example (e.g., Feishu/WhatsApp with Pairing + Allowlist):
JavaScript
{
channels: {
feishu: {
dmPolicy: 'pairing',
allowFrom: ['ou_123456789'],
},
},
}
Operational Example: Listing and Approving Pairing Requests
Bash
openclaw pairing list feishu
openclaw pairing approve feishu <CODE> --notify
Group chats carry high risks, specifically input noise and unauthorized side effects. It is strongly recommended to follow these three thresholds in your configuration:
{
channels: {
feishu: {
groupPolicy: 'allowlist',
groupAllowFrom: ['group_123456'],
groups: { '*': { requireMention: true } },
},
},
messages: {
groupChat: {
mentionPatterns: ['@openclaw', '@AI_Assistant'],
},
},
}
It is recommended to perform three sets of test cases:
openclaw channels status --probe
openclaw logs --follow --json
Beyond security and access control, group chats offer an often-overlooked engineering value: Context Isolation.
When all topics happen within a single session, the memory file becomes increasingly long and cluttered. The agent must process significant irrelevant information, leading to a drop in response quality and speed. By creating multiple groups, you can naturally isolate contexts by scenario:
Typical Group Shunting Plan
| Group | Use Case | Context Characteristics |
|---|---|---|
| Main Chat (DM) | Daily use, personal memory | Long-term memory, personalized config |
| Work Group | Specific work tasks | Only contains work-related context |
| Writing Group | Content creation | Only contains writing styles and templates |
| Test Group | Testing new features/configs | Can be cleared at any time without affecting formal memory |
[!TIP] Multi-group shunting is much more cost-effective than frequently using /new to start fresh sessions. While /new discards the current context, group isolation is persistent—each group accumulates its own memory without mutual interference.
Operation: Simply create a new group chat and add both yourself and the agent. Each group's sessionKey is naturally unique, and the context is isolated automatically.
The goal of Chapter 3 is to establish a "Local Minimum Closed-Loop Baseline": verifying that the main loop is functional, observable, and reproducible without introducing external channel variables. This provides a stable reference frame for subsequent configuration tuning and scaling.
After completing this chapter, you should have solidified the following key conclusions:
Before moving to the next chapter, please self-assess:
Chapter 4 dives into the configuration system and model integration: upgrading from "able to answer" to "controllable and replaceable," and establishing a baseline for verifiable model selection and failover.
This chapter addresses configuration and models from the perspective of "System Controllability": first, by understanding the structure and priority of configuration files; next, by completing the integration of model providers; and finally, by establishing basic strategies for model selection and failover. Through this chapter, you will master how to transform an OpenClaw system into a "predictable and tunable" foundation for agents. After reading, you should be able to independently answer three questions: Where is the current configuration taking effect? Why was the current model selected? How will the system degrade when a failure occurs?
This chapter includes the following sections:
Upon completing this chapter, you will be able to:
This section systematically reviews the operation chain of the OpenClaw configuration system, including specific configuration sources, priority determination, override rules, and auditing mechanisms. The core objective is to ensure that any runtime behavior of the system can be traced back to a definitive input configuration and to provide specific methods for verifying the final effective parameters.
In OpenClaw, configuration is not a "collection of parameters" but the input for system behavior. The significance of writing behavior into configuration is that when the same system runs on different machines, channels, or accounts, its behavior remains reproducible and explainable.
From the perspective of the system chain, configuration determines at least three types of outcomes:
Figure 4-1: Config Global Configuration View.
Many instances of "configuration not taking effect" are not priority issues, but rather issues of writing to the wrong scope. The safest approach is to split configuration into four layers of responsibility before discussing overrides:
You can understand priority using a stable model without relying on implementation details: the closer to runtime, the higher the priority. Common sources include:
The same field may appear in multiple places; the final value depends on the override source, not "which one is written later in the file." Therefore, when governing configurations, avoid redundant definitions of the same semantic fields to prevent leftover overrides when copying across environments.
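For example, under the "closer to runtime wins" model, a per-agent value overrides the shared default for that agent only (field names follow the examples used elsewhere in this chapter):

```javascript
{
  agents: {
    defaults: {
      model: { primary: "openai/gpt-5.2" },    // shared baseline
    },
    work: {
      model: { primary: "openai/gpt-5-mini" }, // closer to runtime: wins for "work" only
    },
  },
}
```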
Mapping "configuration taking effect" to an evidence chain creates a stable troubleshooting method. It is recommended to collect four types of evidence in order:
This section focuses on the models.providers configuration, specifically exploring naming conventions and association mechanisms between providers and models. We will cover best practices for safely referencing API keys using ${VAR_NAME} placeholders to avoid plain-text storage and the use of multiple keys with keyId to support smooth rotation and canary releases. Finally, we provide a standardized set of acceptance commands to ensure that integrated models are not only highly available but also flexibly replaceable.
In OpenClaw, provider configurations are centralized in models.providers, which defines "how to connect and what credentials to use." Note:
{
models: {
providers: {
openai: { apiKey: "${OPENAI_API_KEY}" },
anthropic: { apiKey: "${ANTHROPIC_API_KEY}" },
},
},
agents: {
defaults: {
model: {
primary: "openai/gpt-5.2",
},
},
},
}
Common Naming Patterns (Examples for formatting reference):
Anthropic (Claude)
Setup: export ANTHROPIC_API_KEY="sk-ant-<YOUR_API_KEY>..."
OpenAI
Setup: export OPENAI_API_KEY="sk-proj-<YOUR_API_KEY>..."
Self-hosted Ollama
OpenRouter
Configuration supports writing ${VAR_NAME} in string fields or using a SecretRef object. We recommend placing keys in environment variables or a secret management system so that configuration files remain auditable and reproducible without leaking credentials to disk.
{
models: {
providers: {
openai: {
apiKey: { source: "env", id: "OPENAI_API_KEY" },
},
anthropic: {
apiKey: { source: "file", id: "/run/secrets/anthropic_key" },
},
custom: {
apiKey: { source: "exec", id: "vault kv get -field=key secret/custom" },
},
},
},
}
[!WARNING] Disk Leakage Risk: While standard interpolation is common, using the SecretRef object strictly isolates the data pipeline. This prevents keys from being accidentally de-referenced and written back to disk in plain text if the system or an AI agent rewrites the configuration file.
A single provider can host multiple keys, with keyId selecting the default. We recommend adding a new key and verifying it with small-scale traffic before switching the keyId and revoking the old key.
Rotation Tip: Do not simply replace the value of an environment variable during a failure window, as this destroys the evidence trail. Instead, add a new key, switch the ID, observe, and then decommission.
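OpenClaw's exact multi-key schema may differ from this sketch; the shape below is an assumption illustrating the add, switch, observe, decommission flow:

```javascript
{
  models: {
    providers: {
      openai: {
        // Both keys stay defined during rotation; keyId selects the active one.
        keys: [
          { id: "key-2026q1", apiKey: "${OPENAI_API_KEY}" },
          { id: "key-2026q2", apiKey: "${OPENAI_API_KEY_NEXT}" },
        ],
        keyId: "key-2026q1", // switch to "key-2026q2" after canary traffic passes
      },
    },
  },
}
```

Because both keys remain defined, switching back is a one-field change, and the old key can be revoked only after the new one has carried traffic cleanly.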
After configuration, use this minimal command set for verification:
Bash
openclaw doctor
openclaw models status --check
openclaw status --deep
Upstream providers implement rate limits. OpenClaw handles these to prevent service interruptions.
When a 429 (Too Many Requests) is received, OpenClaw employs exponential backoff and a cooldown window:
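The general shape of that retry behavior can be sketched as follows; the 1-second base delay and 60-second cap below are assumptions for illustration, not OpenClaw's documented values:

```javascript
// Illustrative exponential backoff with a cap. The 1s base and 60s cap are
// assumptions for the sketch, not OpenClaw's documented values.
function backoffMs(attempt, baseMs = 1000, capMs = 60000) {
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}

// Delays for the first 7 retries: 1s, 2s, 4s, 8s, 16s, 32s, then capped at 60s.
console.log([1, 2, 3, 4, 5, 6, 7].map((n) => backoffMs(n)));
```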
You can proactively control request frequency in models.providers to prevent hitting upstream limits:
JavaScript
{
models: {
providers: {
openai: {
rateLimit: {
requestsPerMinute: 60,
tokensPerMinute: 100000,
},
},
},
},
}
The most robust approach is configuring multiple providers. If one provider is frequently throttled, the system automatically switches to a secondary provider to ensure continuity.
💡 Real-World Note: The "Invisible" Env Var. A common trap: exporting ANTHROPIC_API_KEY in .bashrc, but OpenClaw fails to read it when running as a systemd service. Systemd does not load shell profiles. Solution: explicitly define it in the systemd unit file using EnvironmentFile= or use openclaw secrets configure.
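A sketch of the systemd fix (the unit name, ExecStart path, and env-file location below are illustrative assumptions that depend on how the daemon was installed):

```ini
# ~/.config/systemd/user/openclaw.service (illustrative; %h expands to $HOME)
[Service]
# systemd does not load shell profiles; read provider keys from a dedicated file:
EnvironmentFile=%h/.config/openclaw/env
ExecStart=%h/.local/bin/openclaw gateway --port 18789
```

After editing the unit, reload with systemctl --user daemon-reload and restart the service so the environment file is picked up.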
This section clarifies the implementation of model selection through agents.defaults.model: how to set the default primary model, how individual agents can override these defaults, and which capability constraints to prioritize for tool calls and long-context scenarios. Finally, it provides a verification method based on models status and regression testing to move model selection from "intuition" to "comparability."
Model selection should not be based solely on performance. For an operational system, at least four dimensions must be considered simultaneously:
| Business Scenario | Quality Requirement | Latency Tolerance | Cost Sensitivity | Recommended Strategy |
|---|---|---|---|---|
| Customer Support | Medium (Standard QA) | Low (User waiting) | High (Per-message billing) | Small model primary (e.g., gpt-5-mini), large model fallback. |
| Data Analysis | High (Complex logic) | Medium (Can wait 30s) | Medium | Large model primary (e.g., gpt-5.2), focus on context window. |
| Scheduled Inspection | High (Structured output) | High (Offline/Async) | Low | Large model primary; retry upon failure rather than downgrade. |
Engineering-wise, the most stable approach is to fix a default primary model and use fallback chains for safety, rather than frequently switching models manually.
Model Selection Decision Tree
The following diagram illustrates the complete decision-making process for model selection and fallback configuration based on task complexity:
Steps to use this decision tree:
{
"agents": {
"defaults": {
"model": {
"primary": "openai/gpt-5.2",
"fallbacks": [
"anthropic/claude-sonnet-4-6"
]
}
}
}
}
agents.defaults.model.primary
The default primary model should be defined in agents.defaults.model.primary. Treat it as a "system baseline" rather than a casual toggle. Fix the default value first and use fallback chains to handle edge cases, ensuring the evidence chain remains intact during troubleshooting.
agents.list
When different agents handle distinct tasks, you can override model selection within agents.list. This allows the model to evolve alongside tool policies and workspace isolation.
JavaScript
{
"agents": {
"list": [
{
"id": "assistant",
"model": { "primary": "openai/gpt-5.2" }
},
{
"id": "fast",
"model": { "primary": "openai/gpt-5-mini" }
}
]
}
}
When selecting models, double-check these three constraints:
First, confirm model availability, then run a minimal regression test.
Bash
openclaw models status --check
When changing the primary model or adjusting the fallback chain, use a fixed set of regression cases covering:
This section introduces the configuration methods for fallback chains, their trigger timing, and their linkage with retry mechanisms. The core of the configuration lies in using agents.defaults.model.primary and agents.defaults.model.fallbacks to set the primary model and a prioritized list of fallback targets. Additionally, this section provides a verification scheme based on "fault injection and observation" to ensure continuous availability and the actual effectiveness of the fallback mechanism.
The key to fallback is not "switching upon any failure," but "applying different actions to different failures." It is recommended to classify errors into three categories based on "operability":
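One possible sketch of such a triage, keyed off common HTTP status codes (the bucket names and boundaries here are illustrative assumptions, not OpenClaw's actual classification):

```javascript
// Illustrative error triage by "operability" (status buckets are assumptions):
// - retryable:  transient upstream trouble; retry with backoff on the same model
// - credential: auth/billing problems; rotate keys or fix the account first
// - fallback:   persistent failures; switch to the next model in the chain
function classifyFailure(status) {
  if (status === 429 || status >= 500) return "retryable";
  if ([401, 402, 403].includes(status)) return "credential";
  return "fallback";
}

console.log(classifyFailure(429), classifyFailure(402), classifyFailure(400));
```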
The minimal viable syntax consists of a primary model plus a sequential fallback list. The system will attempt alternative models in order when the primary model fails.
JavaScript
{
agents: {
defaults: {
model: {
primary: "openai/gpt-5.2",
fallbacks: [
// First fallback: smaller model from the same provider (common for handling concurrent rate limits)
"openai/gpt-5-mini",
// Second fallback: cross-provider model (common for handling upstream or persistent network failures)
"anthropic/claude-sonnet-4-6",
],
},
},
},
}
It is recommended to prioritize based on "continuity first, but explainable": the earlier a fallback target appears, the more stable and available it should be, with acceptable fluctuations in cost and quality.
Fallback chains must be designed alongside retries; otherwise, "retry deadlock" or "silent switching" may occur.
Beyond cross-model fallback chains, OpenClaw maintains an auth-profile rotation mechanism within the same provider. When a key or account triggers a failure, the system automatically switches to the next available credential for that provider instead of immediately jumping across models.
Cooldown Gradients:
| Failure Count | Cooldown Duration |
|---|---|
| 1 | 1 minute |
| 2 | 5 minutes |
| 3 | 25 minutes |
| 4+ | 1 hour (Cap) |
Billing-related blocks (e.g., 402 payment failure) have an independent gradient: starting from 5 hours, doubling each time up to a 24-hour cap; the timer resets automatically after 24 error-free hours.
Cooldown states are persisted in the usageStats field of ~/.openclaw/agents/<agentId>/agent/auth-profiles.json and remain effective after a restart. For more on auth-level reliability, see Chapter 11.
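The gradient in the table above (multiply by 5 per failure, capped at one hour) can be sketched as:

```javascript
// Cooldown gradient from the table above: 1 min × 5^(n-1), capped at 60 minutes.
function cooldownMinutes(failureCount) {
  return Math.min(60, 5 ** (failureCount - 1));
}

console.log([1, 2, 3, 4, 5].map(cooldownMinutes)); // [ 1, 5, 25, 60, 60 ]
```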
Fallback is not complete just because it is written in the config; you must verify that it actually triggers and can be reconciled.
openclaw models status --check
openclaw logs --follow --json
Chapter 4 upgrades "model accessibility" to "model controllability." The core objective is not simply switching to more powerful models, but transforming configuration, authentication, selection, and failover into explainable system capabilities.
Below is an "essentials-only" minimal configuration: integrate a provider, set a default primary model, configure a fallback chain, and verify with commands.
Bash
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
2. Configuration Snippet (Merge this into your ~/.openclaw/openclaw.json):
JavaScript
{
models: {
providers: {
openai: { apiKey: "${OPENAI_API_KEY}" },
anthropic: { apiKey: "${ANTHROPIC_API_KEY}" },
},
},
agents: {
defaults: {
model: {
primary: "openai/gpt-5.2",
fallbacks: ["openai/gpt-5-mini", "anthropic/claude-sonnet-4-6"],
},
},
},
}
openclaw doctor
openclaw models status --check
openclaw status --deep
Achieved Objectives: Providers are available, the default model is explainable, and a fallback chain exists and is drill-ready.
Chapter 5 moves into the tool system, skills, and plugins: upgrading from "being able to answer" to "being able to act," while confining those actions within least-privilege and auditable boundaries.
This chapter discusses the action layer of AI Agent systems: while the model is responsible for proposing intent, it is the tools and extended capabilities that generate actual external impact. The core theme is upgrading "the ability to call tools" into "the ability to execute stably within defined boundaries, with auditability, troubleshooting, and rollback capabilities."
Learning Objectives:
This chapter consists of the following sections:
Note: Command examples in this book may occasionally omit the "main command prefix" (e.g., certain deployments require a unified CLI prefix before subcommands). If you encounter a "command not found" error during operation, please prioritize the --help output of your local CLI and the conventions used in other chapters of this book.
This section explores how to construct and manage a tool inventory from an engineering perspective, covering core concepts such as tool contracts, failure semantics, and read/write boundaries. It also introduces how to ensure the reproducibility and replayability of tool calls (supporting hands-on verification in local instances).
Building a tool inventory is not about memorizing a static list (as available tools change with versions, configuration templates, plugins, and deployment forms); rather, it is to enable you to answer three things on your own instance:
| Tool Category | Typical Tool Patterns (Examples) | Risk Level | Default Policy Suggestion | Acceptance Focus |
|---|---|---|---|---|
| Read-only Query | group:web, read, memory_search | Low | Allow by default (rate-limit as needed) | Accuracy, latency, availability |
| Side-effect Write | write, edit, group:messaging | Medium | Deny by default (open by entry/role) | Idempotency, rollback, permission boundaries |
| Exec/Command | group:runtime (exec, bash, process) | High | Deny by default (open for min. scope) | Whitelists, audit, blast radius |
| Interaction Auto | group:ui (browser, canvas) | High | Deny by default (open if necessary) | Step verifiability, failure localization |
| Extended Tools | plugins.* (provided by plugins) | Varies | Plugin whitelist first, then tool policy | Start/stop, canary, replayable evidence |
Note: The "Tool Patterns" in the table above represent governance methods and risk layering; specific tool IDs and available commands should be based on the actual output of status --deep, structured logs, and subcommand --help on your local instance.
A minimal, reproducible "Tool Inventory Generation/Verification" workflow (rely on evidence, not memory):
In OpenClaw, a "tool" should be viewed as a controlled operation: it has explicit inputs, explicit outputs, explicit failure semantics, and its side effects must be auditable.
The core significance of treating tools as first-class objects is pushing the system from "acting on feeling" to "acting by contract." A contract answers at least four questions:
{
"name": "create_ticket",
"description": "Create a support ticket",
"parameters": {
"type": "object",
"properties": {
"title": { "type": "string" },
"priority": { "enum": ["P1", "P2", "P3"] }
},
"required": ["title"]
},
"failure_semantics": {
"retryable_errors": ["NetworkTimeout", "RateLimitExceeded"],
"fatal_errors": ["Unauthorized", "InvalidFormat"]
}
}
As long as side effects exist, "failure semantics" must be designed on the tool side rather than relying on prompt-based remedies after the fact.
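One way to make failure semantics operational is a bounded-retry wrapper driven by the contract, as in the create_ticket example above. This is an illustrative sketch only; `callWithFailureSemantics` and the `err.code` convention are assumptions, not OpenClaw APIs:

```javascript
// Retry only errors the contract declares retryable, with a bounded budget
// and exponential backoff; fatal errors surface immediately to the caller.
async function callWithFailureSemantics(contract, invoke, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await invoke();
    } catch (err) {
      const retryable =
        contract.failure_semantics.retryable_errors.includes(err.code);
      if (!retryable || attempt >= maxRetries) {
        // Fatal error or retry budget exhausted: propagate (and audit-log).
        throw err;
      }
      // Bounded backoff before the next attempt.
      await new Promise((r) => setTimeout(r, 100 * Math.pow(2, attempt)));
    }
  }
}
```

The point is that the retry decision is read from the tool's own contract, not improvised by the model after the fact.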
Based on permission boundaries and acceptance flows, tool calls can be categorized into three paradigms:
| Dimension | Built-in Readability | jina.ai Reader |
|---|---|---|
| JavaScript Rendering | Not Supported | Supported |
| Paywalled Content | Restricted | Partially Bypassable |
| Social Media (e.g., X/Twitter) | Not Supported | Supported |
| Output Format | HTML Snippets | Clean Markdown |
| Deployment Dependency | None (Built-in) | None (Free API, no key required) |
[!TIP] You can configure jina.ai Reader as a custom Skill, allowing the Agent to automatically degrade to it if Readability fails. See 5.3 Skills and Plugins for specific skill configuration methods.
The flowchart below illustrates the full lifecycle of a tool call, from model proposal to result back-injection:
flowchart TD
M["Model Reasoning"] -->|"Output tool call intent"| P["Proposal: tool_name + params"]
P --> C{"Policy Validation"}
C -->|"allow hit"| E["Execute Tool"]
C -->|"deny hit"| D["Deny + Reason Injection"]
E --> R{"Operation Result"}
R -->|"Success"| S["Structured Injection"]
R -->|"Retryable Error"| RT["Bounded Retry"]
R -->|"Fatal Error"| F["Failure Injection + Alert"]
RT --> E
S --> M2["Model Continues Reasoning"]
D --> M2
F --> M2
From an engineering standpoint, it is recommended to break chained orchestration into checkable stages: every step should produce structured intermediate results written to the session or log. This way, when a failure occurs, you can locate the specific step rather than just seeing "task failed."
If tool outputs are back-injected as-is, the common consequences are context explosion and loss of evidence. A more robust back-injection method is the "three-part" structure:
{
"summary": "User account is active but password expired.",
"evidence": {
"account_id": "u_12345",
"status": "active",
"last_login": "2026-03-18T00:18:00Z"
},
"raw_output": "{\"db_record\": {...}}"
}
The goal of this structure is to allow subsequent reasoning to reference "evidence" rather than a large block of noise.
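A minimal builder for this three-part structure might look as follows (a sketch; the helper name and truncation marker are illustrative, not an OpenClaw API):

```javascript
// Assemble the back-injection payload: a short summary for reasoning,
// selected evidence fields, and raw output truncated for audit purposes.
function buildToolInjection(summary, evidence, rawOutput, maxRawChars = 2000) {
  return {
    summary,
    evidence,
    raw_output:
      rawOutput.length > maxRawChars
        ? rawOutput.slice(0, maxRawChars) + "…[truncated]"
        : rawOutput,
  };
}
```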
The reliability of a tool system comes from observability. It is recommended to solidify the minimum reproducible information of a tool call as a record:
Based on official tool governance, this section translates the question of "whether a tool can be invoked" into a configurable and auditable policy plane. Key topics include the matching semantics of tools.allow and tools.deny, default policy selection via tools.profile, and layered governance by channel/group using channels.*.groups.*.tools. The goal is to ensure the system is "secure by default, open by necessity," and capable of answering "why this was allowed but that was denied" during incident reviews.
The official configuration can be broken down into four key blocks:
Note: The tool IDs and wildcard patterns used here explain governance methods and layering logic; specific available tools and precise naming depend on your version, enabled plugins, and the actual output of status --deep.
Before diving into configuration, clarify the mapping between system concepts and actual fields:
| Concept | Config Field | Default Value | Aliases or Supplementary Notes |
|---|---|---|---|
| Default Scenario Template | tools.profile | minimal | Options: minimal, coding, messaging, full, etc. |
| Tool Grouping | group:* prefix | (Version-specific) | Used to batch-control similar tools (e.g., group:runtime for command operation). |
| Global Allowlist | tools.allow | [] | Must be explicitly declared under strict configurations. |
| Global Denylist | tools.deny | [] | Highest priority: blocks even if allow permits the tool. |
The tools/groups included in each profile are as follows:
| Profile | Included Tools/Groups |
|---|---|
| minimal | Only session_status |
| coding | group:fs, group:runtime, group:sessions, memory_search, memory_get, image |
| messaging | group:messaging, sessions_list, sessions_history, sessions_send, session_status |
| full | No restrictions (equivalent to no profile set) |
The system also supports overriding global profiles per agent via agents.list[].tools.profile. Refer to the Tools Documentation.
Key semantics provided by official documentation:
[!WARNING] Wildcard "Foot-guns" and Elevated Privileges: Overusing * in the allow list (especially alongside tools.elevated to bypass sandbox restrictions) means granting unconditional authorization to all unknown plugins, which can easily lead to silent privilege escalation. It is recommended to strictly control allowFrom, avoid *, and always include verification commands (such as security audit) in pre-deployment checks. Note: tools.elevated is a global, sender-based configuration and cannot be set per-agent in agents.list[].tools. To restrict elevation for a specific agent, disable exec in that agent's tools.deny.
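The precedence logic (deny always beats allow; group:* entries expand to member tools) can be sketched as a small evaluator. This is a hedged illustration: the group membership table below is an example, not the real, version-specific mapping, and the helper is not the actual implementation:

```javascript
// Illustrative group expansion table (check status --deep for the real one).
const GROUPS = { "group:runtime": ["exec", "bash", "process"] };

// A pattern matches a tool if it is "*", a group containing the tool,
// or the exact tool ID.
function matches(pattern, tool) {
  if (pattern === "*") return true;
  if (GROUPS[pattern]) return GROUPS[pattern].includes(tool);
  return pattern === tool;
}

function isToolAllowed(tool, { allow = [], deny = [] }) {
  if (deny.some((p) => matches(p, tool))) return false; // deny: highest priority
  return allow.some((p) => matches(p, tool));
}
```

Note how `allow: ['*']` still loses to an explicit deny entry, which is exactly why denylists are the safe place to converge shell and write operations.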
Configuration Example (converging from "default open" to "deny shell write operations"):
JavaScript
{
tools: {
allow: ['*'],
deny: ['group:runtime', 'write', 'edit', 'apply_patch'],
},
}
Specific Case: A Complete Interception Chain for Unauthorized Operation
Scenario: A DevOps assistant deployed in a Telegram group. A user says: "Help me delete that failing Pod in the staging environment."
{
"ts": "2026-02-20T10:30:15Z",
"trace_id": "t-20260220-042",
"event": "tool_denied",
"tool": "exec",
"rule": "tools.deny: group:runtime",
"agent": "dev_assistant",
"channel": "telegram",
"sender": "user_987654"
}
This is why security boundaries must be secured by tool policies rather than just instructions in the prompt—even if the model "wants" to execute, the policy intercepts it deterministically.
As systems scale to multiple channels, groups, and entry points, the platform provides the ability to restrict tools at the channel/group level via channels.*.groups.*.tools, with further overrides available via toolsBySender.
Specific Case: Internal Ops Group vs. External Support Group
Assume one OpenClaw instance serves both an external WhatsApp group and an internal R&D Telegram group:
{
tools: {
profile: 'coding',
deny: ['group:runtime'],
},
channels: {
whatsapp: {
groups: {
'*': {
tools: { deny: ['group:runtime'] },
},
},
},
telegram: {
groups: {
'*': {
tools: { deny: ['group:runtime', 'write', 'edit'] },
toolsBySender: {
'123456789': { alsoAllow: ['group:runtime'] },
},
},
},
},
},
}
It is recommended to use "more conservative group chat policies" as an operational baseline.
Official support also exists for tailoring tool policies based on the model provider or specific model via tools.byProvider. For example, models with weaker tool-calling capabilities can be limited to a minimal toolset:
JavaScript
{
tools: {
profile: 'coding',
byProvider: {
'google-antigravity': { profile: 'minimal' },
'openai/gpt-5.2': { allow: ['group:fs', 'sessions_list'] },
},
},
}
When a primary agent spawns a sub-agent via sessions_spawn, the sub-agent’s tool availability is automatically narrowed. This is a system-level, hard-coded security boundary—even if not explicitly declared in the config, the following tools are disabled.
Tools Always Disabled for All Sub-agents (SUBAGENT_TOOL_DENY_ALWAYS):
| Tool | Reason for Disabling |
|---|---|
| gateway | System management tool; sub-agents should not control the gateway. |
| agents_list | Agent listings belong to the management plane. |
| whatsapp_login | Interactive setup process; unsuitable for automated sub-tasks. |
| session_status | Status queries should be managed by the parent agent. |
| cron | Scheduling authority should be converged at the top level. |
| memory_search / memory_get | Sub-agents should receive info via spawn prompts, not global retrieval. |
| sessions_send | Sub-agents should return results via the announce protocol, not direct messaging. |
Additional Tools Disabled for Leaf Nodes (SUBAGENT_TOOL_DENY_LEAF):
When a sub-agent reaches the maxSpawnDepth (meaning it cannot spawn further levels), it additionally loses:
| Tool | Reason for Disabling |
|---|---|
| sessions_spawn | Leaf nodes cannot spawn further sub-agents. |
| sessions_list / sessions_history | Session management is reserved for the orchestrator. |
Decision formula: isLeaf = depth >= max(1, floor(maxSpawnDepth)).
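The narrowing logic can be sketched as follows; the constant names mirror the ones in the text, while the helper itself is illustrative rather than the actual implementation:

```javascript
const SUBAGENT_TOOL_DENY_ALWAYS = [
  "gateway", "agents_list", "whatsapp_login", "session_status",
  "cron", "memory_search", "memory_get", "sessions_send",
];
const SUBAGENT_TOOL_DENY_LEAF = [
  "sessions_spawn", "sessions_list", "sessions_history",
];

// Compute the system-enforced denylist for a sub-agent at a given depth,
// using the decision formula isLeaf = depth >= max(1, floor(maxSpawnDepth)).
function subagentDenied(depth, maxSpawnDepth) {
  const isLeaf = depth >= Math.max(1, Math.floor(maxSpawnDepth));
  return isLeaf
    ? [...SUBAGENT_TOOL_DENY_ALWAYS, ...SUBAGENT_TOOL_DENY_LEAF]
    : [...SUBAGENT_TOOL_DENY_ALWAYS];
}
```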
Configuration Override: System-level disabling can be explicitly bypassed using tools.subagents.tools.alsoAllow. For instance, if a sub-agent truly needs memory access:
JavaScript
{
tools: {
subagents: {
tools: {
alsoAllow: ["memory_search"]
}
}
}
}
This design embodies the "Defense in Depth" principle: even if a parent agent's prompt fails to restrict a sub-agent's behavior, the system ensures that privileges do not propagate indefinitely.
To verify if tool policies are effective, use two types of evidence:
openclaw doctor --fix
openclaw status --deep
openclaw logs --follow --json
Tools determine "what can be done," while engineering methods determine "how to do it more stably." Plugins and skills are used in parallel, offering complementary capabilities and methodologies. Plugins extend runtime capabilities and tools, while skills solidify reusable methodologies and operation steps. In new projects, the two typically work together rather than replacing one another.
At the system boundary and underlying physical form, responsibilities can be split into two parts:
The core of the plugin system is "explicit activation and explicit permission." The configuration structure for plugins.entries and the whitelist mechanism for plugins.allow and plugins.deny are defined in the official documentation.
A common way to enable a plugin is as follows:
JavaScript
{
plugins: {
entries: {
'com.example.my_plugin': {
enabled: true,
config: {
// Custom plugin configuration
},
},
},
allow: ['com.example.my_plugin'],
},
}
It is recommended to include plugin self-checks in your acceptance process before going live:
Note: Some deployments require a "main command prefix" before these subcommands; if you receive a "command not found" error, prioritize the help output of your local CLI and the conventions used in other chapters of this book.
Skills are used to distill high-frequency tasks into documented processes featuring "executable steps + constraints + acceptance criteria." In the open-source ecosystem, ClawHub acts as a public skill repository, supporting vector-based semantic search, version control (Semver), and a convenient CLI installation/update experience.
You can view the list of all current built-in capabilities and their status on the Skills page of the Dashboard, as shown below:
Figure 5-2: Skills Repository Management
Similarly, you can use the CLI to semantically search for and install capability packages created by others:
Example:
Bash
skills list --eligible "daily report"
agent --message "Invoke daily-report (passing date, data source, and other parameters)"
Below is a simplified, self-authored skill template:
Markdown
# Channel Self-Check
This section provides self-check methods.
## Applicable Scenarios
Channels not replying, group chats not triggering, pairing anomalies.
## Steps
1. Run `doctor --fix` first.
2. Then run `channels status --probe`.
3. If issues persist, follow `logs --follow --json` and filter for `routed` and `tool_denied` events.
## Output Requirements
Must provide the command used, expected output, exception branches, and next steps.
Note: A skill is a methodological guide, not a tool permission boundary; whether a high-risk tool is allowed to execute is still determined by tool policies and sandboxing.
While enjoying the thrill of one-click downloads of "hacker instruction sets" from ClawHub, one must confront the underlying supply-chain hazards. In February 2026, a Snyk security report (Leaky Skills) revealed that out of approximately 3,984 skills indexed on ClawHub, 283 (7.1%) posed credential-leakage risks. That same month, Koi Security's ClawHavoc investigation identified 341 malicious skills and a critical Remote Code Execution vulnerability, CVE-2026-25253. Because skills are just Markdown files, they are readily weaponized as carriers of inducement-based malicious payloads via crafted prompts.
For example, some malicious SKILL.md files explicitly instruct the LLM to output user environment variables, API credentials, or sensitive local files in plain text to the chat history before invoking API tools, or induce it to run malicious one-click Bash scripts. Models are intelligent, but they are also easily "brainwashed" by documentation into following such orders.
Therefore, when introducing third-party skills, you must personally review the SKILL.md content to prevent any instructions that attempt to bypass your tool interception chain.
Browser tools are used to convert interactions such as "visiting webpages, logging in, clicking, and scraping" into controlled tool calls, enabling agents to retrieve web information or perform operations as needed. This section explains the operational boundaries, common commands, and how to integrate web automation into tool policies and troubleshooting closed-loops.
The engineering challenge of browser automation lies in "controllability." Browser capabilities should be treated as high-risk tools (capable of cross-site access, page reading, and triggering external side effects), with boundaries established at two points:
Not all web interactions require launching a full browser. In practice, web-related capabilities can be divided into four progressive levels based on cost, complexity, and application scenarios. Always prioritize lower levels and only upgrade to higher levels when necessary.
Detailed Breakdown of the Four Levels
| Level | Capability | Application Scenario | Dependencies | Performance & Cost |
|---|---|---|---|---|
| L0 | Search Engine + Web Scraping | Daily info retrieval (covers 80% of cases) | Brave Search + Readability / jina.ai | Lowest |
| L1 | Headless Browser | SPA pages requiring JavaScript rendering | Headless Chrome | Low |
| L2 | Headful Browser + DOM Ops | Requires login, filling forms, clicking buttons | Chrome + Virtual Desktop (Xvfb) | Medium (Requires ≥4GB RAM) |
| L3 | Screenshot + Visual Recognition | Info exists only in images (product pics, charts) | Headful Browser + Multimodal LLM | Highest (Slowest speed) |
Decision Logic
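The "prioritize lower levels, upgrade only when necessary" rule can be expressed as a simple decision helper. This is an illustrative sketch; the predicate names are assumptions made for the example:

```javascript
// Pick the cheapest web-capability level that satisfies the task's needs.
function chooseWebLevel({ needsJsRendering, needsInteraction, infoOnlyInImages } = {}) {
  if (infoOnlyInImages) return "L3"; // screenshot + visual recognition
  if (needsInteraction) return "L2"; // login, forms, clicks need a headful browser
  if (needsJsRendering) return "L1"; // SPA pages need headless rendering
  return "L0"; // default: search + scraping covers most cases
}
```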
[!NOTE] To use L2 and L3 on a cloud server, you must first install a virtual desktop service (such as Xvfb), which simulates a display in memory. Complete installation command: sudo apt-get install -y xvfb chromium-browser fonts-noto-cjk. Start command: Xvfb :99 -screen 0 1280x1024x24 &, and set the environment variable export DISPLAY=:99.
The following commands can be used to manage the browser service and verify the environment:
Before requesting browser capabilities, it is recommended to check the status first and start as needed:
Bash
openclaw browser status
openclaw browser start
Once the browser service is available, use browser open to quickly open a target page to verify network and environmental health:
Bash
openclaw browser open "https://example.com"
To stop the browser service, use browser stop:
Bash
openclaw browser stop
Web-related capabilities generally fall into two categories:
Browser-related issues should be resolved by narrowing down through these levels:
openclaw browser status
openclaw status --deep
openclaw logs --follow --json
If the page opens but the agent cannot complete the task, first check whether tool policies have denied the browser tools. If the browser fails to start, check dependencies and the running environment, and run doctor to get structured self-check results:
Bash
openclaw doctor
The core of Chapter 5 is incorporating OpenClaw's action capabilities into engineering governance: tools and extended capabilities must be constrained by policy and possess verifiable acceptance and troubleshooting paths.
Beyond technical testing, you can try applying tool capabilities to these real-life scenarios:
Chapter 6 enters the realm of sessions, context, and memory, with the goal of turning task continuity into a controllable capability: knowing what the system remembers, why it remembers it, and how to compress or prune that data.
This chapter transforms an agent’s ability from "being able to chat" into "being able to steadily advance tasks." Sessions define state ownership, Context organizes available information within Token budgets, and Memory facilitates the long-term accumulation of facts and preferences across sessions. Together, these three elements determine a system’s reproducibility, observability, and maintainability. Through this chapter, you will learn how to maintain long-term, predictable, and reproducible conversational capabilities for agents under finite resource constraints.
This chapter consists of the following sections:
After completing this chapter, you will be able to:
This section introduces OpenClaw's session management mechanism, which consists of three core components: defining session scopes, setting reset strategies, and ensuring state persistence and troubleshooting. Through proper configuration, developers can transform risks like "cross-talk," "duplicate operation," and "state recovery failure" into configurable and observable engineering practices.
OpenClaw's session behavior is controlled by the global session configuration. The most critical "knob" is session.scope, which defines "which messages are folded into the same session." For official examples and field explanations, see: Session Configuration.
Typical Selection Logic:
| Concept | Configuration Field | Default | Aliases or Supplementary Notes |
|---|---|---|---|
| DM Merge Key | session.dmScope | main | Options: main (all DMs merged into one main session), per-peer (isolated by peer), per-channel-peer (isolated by channel + peer), per-account-channel-peer. |
| Reset Strategy | session.reset | N/A | Supports mode (e.g., daily, idle), atHour, idleMinutes; manual reset commands can be set via session.resetTriggers. |
| Identity Binding | session.identityLinks | {} | Used for cross-channel binding, e.g., merging User A on TG with User A on Discord. |
Configuration Example (Adapted from official docs to highlight key fields):
JavaScript
{
session: {
scope: "per-sender",
// Fold DM sessions into agent:<agentId>:<mainKey> to merge multiple DMs into a "main session."
dmScope: "main",
mainKey: "main",
// Link multiple channel identities as the same "person" to prevent fragmented cross-channel dialogue.
identityLinks: {
alice: ["telegram:123456789", "discord:987654321012345678"],
},
},
}
Check-off Point: You can explain which sessionKey a message will eventually land in and find the corresponding records in the logs and session storage.
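As a mental model, the dmScope options can be sketched as a key-derivation helper. This is a hedged illustration: the exact key format is version-specific, and the shapes below are extrapolated from the agent:<agentId>:... keys seen elsewhere in this chapter:

```javascript
// Derive the sessionKey a DM message folds into, per dmScope.
// (per-account-channel-peer is omitted from this sketch.)
function dmSessionKey({ agentId, channel, peer }, { dmScope = "main", mainKey = "main" } = {}) {
  switch (dmScope) {
    case "main":             return `agent:${agentId}:${mainKey}`;
    case "per-peer":         return `agent:${agentId}:dm:${peer}`;
    case "per-channel-peer": return `agent:${agentId}:${channel}:dm:${peer}`;
    default: throw new Error(`unhandled dmScope: ${dmScope}`);
  }
}
```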
After running for a long time, sessions accumulate history and context drift. OpenClaw provides configurations for resets based on time or idle duration, supporting different settings per session type (e.g., longer for DMs, shorter for group chats).
Configuration Example:
JavaScript
{
session: {
// Global reset strategy
reset: {
mode: "daily", // Options: "daily", "idle", etc.
atHour: 4, // Hour to reset daily (local timezone)
idleMinutes: 60, // Reset after being idle for this duration
},
// Manual reset commands
resetTriggers: ["/new", "/reset"],
},
}
[!NOTE] Session reset strategies are configured globally via session.reset. For differentiated resets by session type (DM, group, thread), refer to the specific sub-field descriptions under session in the official documentation.
Beyond reset strategies, how messages queue within a session affects the user experience. OpenClaw's official queue modes include collect, steer, followup, steer-backlog, and interrupt. The default value is collect.
[!WARNING] The queue mode commonly seen in earlier versions is merely a legacy alias for steer, not the recommended default mode for queuing. Avoid using queue as a mode value in new configurations.
Main Official Queue Modes:
// ~/.openclaw/openclaw.json
{
messages: {
queue: {
mode: "collect", // Official default
debounceMs: 1000,
cap: 20,
drop: "summarize",
byChannel: {
telegram: "collect",
discord: "collect",
},
},
},
}
[!TIP] If you frequently need to change directions while the agent is executing (e.g., "Stop searching that, try this keyword instead"), it is recommended to enable steer mode for that specific channel. For independent long tasks where interruptions are unwanted, keep the default collect mode.
When OpenClaw acts as a shared assistant bot (e.g., opening DMs to multiple users on Telegram or WhatsApp), understanding the security of DM isolation is vital:
Secure DM Mode Recommendation: If serving multiple untrusted users, you must prohibit the use of high-risk tools like group:runtime and restrict file I/O to strict sandboxes or independent volumes. See the Official Security Documentation.
By default, OpenClaw session data is stored per agent in the ~/.openclaw/agents/<agentId>/sessions/ directory. You can override this via session.store. The design separates state into two layers:
{
session: {
store: "~/.openclaw/agents/{agentId}/sessions/sessions.json",
},
}
Recommendation: Include session storage directories in backups and audits, but do not sync sensitive content to untrusted locations. Enable masking or minimal logging where necessary.
[!TIP] Trap: Does the /new command cause memory loss? Many beginners assume /new makes the AI "forget" everything. In reality, this command only creates a new sessionId and clears the temporary dialogue context (Transcript). Persistent memory files on disk (like MEMORY.md) remain intact and will be automatically reloaded in the new session.
The Scenario: User A asks a question on Telegram but receives User B's context from WhatsApp.
An operator receives a report: "I asked about deployment, but the bot replied with financial statement info." The troubleshooting steps:
{
session: {
identityLinks: {
// Error! alice and bob mistakenly bound to the same identity
alice: ["telegram:alice_tg_id", "whatsapp:bob_wa_id"],
},
},
}
3. **Root Cause Confirmation:** Due to the link error, User B's WhatsApp history was injected into User A's context.
4. **Fix:** Correct the `identityLinks` to ensure only IDs belonging to the same person are under one identity. Restart and verify with `status --deep`.
**Lesson:** `identityLinks` is the "identity merge switch." A misconfiguration can lead to privacy-leaking cross-talk. It should be on the change-audit checklist.
---
## 6.1.7 Troubleshooting Commands: Locating Anomalies via Status and Logs
When issues like "cross-talk," "sudden context loss," or "failure to reset" occur, prioritize system self-checks and structured logs.
```bash
# View overall status (--deep performs a more thorough probe)
openclaw status --deep
# Trace logs, add --json for easy filtering with jq
openclaw logs --follow --json
```
Operation Example: Filter the event stream for a specific session key in the logs to see if a session is being written to by multiple identities. (Field names depend on actual logs).
Bash
cat runtime.log | jq -r 'select(.type=="log") | .log | select(.sessionKey=="agent:main:whatsapp:dm:+15555550123") | [.ts,.trace_id,.event,.from] | @tsv' | tail
This section discusses context budgets based on OpenClaw's actual mechanisms: context is composed of workspaces, skills, session history, and tool receipts. When tool outputs accumulate, they must be pruned according to specific rules before being sent to the model, rather than modifying the history on disk. The focus lies on the behavior and tuning methods of agents.defaults.contextPruning, and how to use replays and metrics to verify that pruning hasn't compromised critical decision-making.
In OpenClaw, the workspace is the primary source of context. The official memory mechanism defines the purposes of several key files: system prompts, skills, workspace instructions, long-term memory, and daily logs.
The engineering challenge of context is not "whether information exists," but "whether information is injected in a usable form." A typical anti-pattern is leaving massive amounts of raw tool output in the session indefinitely, leading to spiraling costs, latency, and attention dilution.
The official agents.defaults.contextPruning feature is used to prune old tool results before a request is sent to the model. A key point: it only changes the "context sent to the model" and does not modify the session history on disk, facilitating easy replay and auditing.
[!IMPORTANT] Session Pruning is currently only effective when mode: "cache-ttl" is used with the Anthropic API (including Anthropic models via OpenRouter). Its core purpose is to prune old tool results to reduce re-caching costs after a session has been idle longer than the prompt cache TTL. If using OpenAI or other providers, this feature is currently not applicable.
Two Pruning Methods:
{
agents: {
defaults: {
contextPruning: {
mode: "cache-ttl",
keepLastAssistants: 3,
softTrimRatio: 0.3,
hardClearRatio: 0.5,
minPrunableToolChars: 50000,
softTrim: { maxChars: 4000, headChars: 1500, tailChars: 1500 },
hardClear: { enabled: true, placeholder: "[Old tool result content cleared]" },
tools: { deny: ["browser", "canvas"] },
},
},
},
}
Check-off Point: In long sessions, the model input volume is controlled and stable; when replaying the same trace, key decisions do not drift inexplicably due to pruning.
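The soft-trim behavior configured above (maxChars, headChars, tailChars) can be sketched as follows; the helper is an illustration of the idea, not the actual OpenClaw implementation:

```javascript
// When a tool result exceeds maxChars, keep the head and tail and elide
// the middle, so recent context and the result's opening both survive.
function softTrim(text, { maxChars = 4000, headChars = 1500, tailChars = 1500 } = {}) {
  if (text.length <= maxChars) return text;
  return text.slice(0, headChars) + "\n…[trimmed]…\n" + text.slice(-tailChars);
}
```

Keeping both ends matters in practice: tool outputs often carry the status line at the top and the actionable error at the bottom.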
When tuning, it is recommended to follow this sequence:
Operation Example: Observe system status and model-side error distributions to confirm that pruning has not introduced abnormal retries or formatting errors.
Bash
# Check overall status
openclaw status --all
Operation Example: Count the frequency of pruning-related events or placeholders to determine if over-pruning is occurring. (Field names depend on actual implementation).
Bash
cat runtime.log | rg "Old tool result content cleared" | wc -l
This section explains the official memory system: "where memory resides, how it is retrieved, and how to prevent pollution." OpenClaw’s long-term memory centers on workspace files, supported by vector indices and built-in memory tools (e.g., memory_search, memory_get). Mastering these mechanisms is essential to transforming memory from "accumulated clutter" into a "maintainable asset."
Based on official design, OpenClaw’s memory follows the "files are the source of truth" philosophy. Stored within the workspace, it consists of two primary layers:
The official best practices for "when to write to memory" are as follows:
## Deployment Regions
- Conclusion: Production environment deployed in us-east-1
- Source: Change Order CHG-12345
- Updated: YYYY-MM-DD
To extract data from the fragments (both MEMORY.md and memory/**/*.md), official memory tools provide two main methods:
{
agents: {
defaults: {
memorySearch: {
query: {
hybrid: {
enabled: true,
vectorWeight: 0.7,
textWeight: 0.3
}
}
}
}
}
}
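As a sanity check, the hybrid weighting configured above can be sketched as a simple re-ranker. This is a simplified illustration: it assumes both scores are already normalized to [0, 1], which real BM25 and cosine-similarity outputs require extra work to guarantee:

```javascript
// Blend vector and keyword scores per the configured weights and sort
// candidates best-first.
function rankHybrid(candidates, { vectorWeight = 0.7, textWeight = 0.3 } = {}) {
  return [...candidates]
    .map((c) => ({ ...c, score: vectorWeight * c.vectorScore + textWeight * c.textScore }))
    .sort((a, b) => b.score - a.score);
}
```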
Key Point: Retrieval results should always follow the logic of "quality over quantity." Flooding the context with massive candidates only causes the model to "blindly prioritize noise."
Hybrid retrieval requires a ready index. OpenClaw’s indexing pipeline runs automatically after files hit the disk, but understanding the internal flow helps troubleshoot "file edited but not searchable" issues.
Listening and Debouncing
The system uses Chokidar to monitor memory files (MEMORY.md and memory/*.md) in the workspace in real-time. Whether written by the agent or edited by the user, saving the file triggers the indexing process. To avoid redundant builds from high-frequency writes, the listener uses a 1.5-second debounce delay—indexing only starts after 1.5 seconds of silence following the last save.
Chunking Strategy
Before indexing, the system chunks file content into units of approximately 400 tokens, with an 80-token overlap between adjacent blocks. The goal of overlap is to prevent critical semantics from being severed—for instance, a decision description spanning a boundary can be fully matched in either adjacent block.
Chunk 1: Lines 1-15 ──┐
├─ 80-token overlap
Chunk 2: Lines 12-28 ──┘──┐
├─ 80-token overlap
Chunk 3: Lines 25-40 ─────┘
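Under the stated 400-token / 80-token parameters, the sliding window can be sketched as follows (using characters as a stand-in for tokens; `chunkText` is a hypothetical helper, not the actual pipeline code):

```javascript
// Sliding-window chunking sketch: `size`-unit chunks sharing `overlap`
// units with their neighbour, so a fact spanning a boundary appears
// whole in at least one chunk. Units here are characters; the real
// pipeline works in tokens (~400 with an 80-token overlap).
function chunkText(text, size = 400, overlap = 80) {
  const chunks = [];
  const step = size - overlap; // each window advances by size - overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```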
Vector Generation and Storage
Each text block is sent to the embedding model (default: text-embedding-3-small, generating 1536-dimensional vectors), and the results are stored in a local SQLite database containing four core tables:
| Table | Responsibility |
|---|---|
| chunks | Stores raw chunk text, file paths, and line ranges. |
| embeddings | Stores the 1536-dimensional vector for each chunk. |
| fts (Full-Text Search) | Stores inverted indices for BM25 keyword retrieval. |
| vector_cache | Maps text hashes to vectors to skip duplicate embedding calls. |
The vector_cache table is noteworthy: when file content is unchanged, the system uses hash comparisons to skip duplicate API calls, saving costs and accelerating index builds.
Check-off Point: After modifying a memory file, wait ~2 seconds and use memory_search for a newly written keyword. If it hits, the pipeline is working. If it consistently fails, check the embedding API key (see 6.3.3).
In long-term systems, facts will inevitably expire. It is recommended to add "Source/Updated/Expiry" fields to every memory entry and perform periodic reviews:
💡 Real-world Trap: The Mystery of "Silent Failure" in Memory Search
You set up your Anthropic API Key and think everything is fine, yet memory_search never returns results. After hours of debugging, you realize: vector retrieval for memory search requires an independent embedding API key (OpenAI, Gemini, or Voyage), which is distinct from the chat model key. The most frustrating part is that it doesn't error out; it just silently returns empty results. Use openclaw doctor after setup to verify the embedding provider status.
This section breaks down "context explosion" into two configurable mechanisms: Tool Result Pruning and Session Compaction. The former is controlled by agents.defaults.contextPruning, aiming to discard old tool results to reduce model input volume. The latter is controlled by agents.defaults.compaction, aiming to generate summaries when a session nears its threshold and flush long-term memory if necessary. Together, they ensure both usability and reproducibility in long-running sessions.
When dealing with context explosion, beginners often confuse these two OpenClaw schemes. Their fundamental differences lie in "persistence" and "lifecycle triggers":
[!IMPORTANT] Session Pruning is currently only effective when mode: "cache-ttl" is used with specific drivers like the Anthropic API. Compaction, however, is a long-term, independent safety valve that benefits every model. While they complement each other, they do not depend on one another.
Understanding the full parameters of pruning helps with fine-tuning in production—such as adjusting how many recent assistant replies to protect or at what volume threshold to trigger pruning.
| Parameter | Default Value | Description |
|---|---|---|
| mode | "cache-ttl" | Pruning mode. |
| ttlMs | 300,000 (5 mins) | Cache TTL. |
| keepLastAssistants | 3 | Protects the last N assistant replies from pruning. |
| softTrimRatio | 0.3 | Triggers soft-trim when context reaches 30%. |
| hardClearRatio | 0.5 | Triggers hard-clear when context reaches 50%. |
| softTrim.maxChars | 4,000 | Only prune tool results exceeding this length. |
| softTrim.headChars | 1,500 | Characters to keep at the head during soft-trim. |
| softTrim.tailChars | 1,500 | Characters to keep at the tail during soft-trim. |
| minPrunableToolChars | 50,000 | Minimum tool result size for hard-clear. |
Pruning is executed in two stages: the first stage (soft-trim) truncates long tool results into "Head 1500 + ... + Tail 1500"; the second stage (hard-clear) replaces the entire result with the placeholder [Old tool result content cleared].
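A minimal sketch of the soft-trim stage, using the default values from the table above (`softTrim` is an illustrative helper, not the actual source function):

```javascript
// Soft-trim sketch: tool results longer than `maxChars` are reduced to
// "head + marker + tail", matching the defaults above (maxChars 4000,
// headChars/tailChars 1500). Shorter results pass through untouched.
function softTrim(result, { maxChars = 4000, headChars = 1500, tailChars = 1500 } = {}) {
  if (result.length <= maxChars) return result;
  return result.slice(0, headChars) +
         "\n...[trimmed]...\n" +
         result.slice(result.length - tailChars);
}
```

Hard-clear would then be the degenerate case: replacing the whole result with a single placeholder string instead of head and tail slices.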
Three Key Safety Constraints (Hard-coded rules in source):
Compaction essentially asks the LLM to summarize history. However, LLMs have a dangerous tendency to silently alter opaque identifiers—shortening UUIDs, omitting parts of API Keys, or dropping URL parameters. This leads to "ghost errors" in subsequent tool calls.
The source code (compaction.ts) includes built-in Identifier Preservation Instructions, injected into every summary request:
"Preserve all opaque identifiers exactly as written (no shortening or reconstruction), including UUIDs, hashes, IDs, tokens, API keys, hostnames, IPs, ports, URLs, and file names."
Batch Summary Strategy: If a dialogue is too long for a single summary, the system splits messages by token share (default into 2 segments), summarizes each independently, and then merges them. The merge phase specifically requires preserving active task states, batch progress (e.g., "5/17 completed"), the last user request, decision rationale, and to-do items.
Safety Filtering: Before compaction, stripToolResultDetails() strips the details field from all tool results to prevent untrusted or verbose payloads (like stderr or HTTP headers) from entering the summary prompt, saving tokens and avoiding prompt injection.
Before executing compaction, OpenClaw attempts to let the agent proactively save critical info. When the system detects context approaching the soft threshold (softThresholdTokens), it inserts a silent agent turn to allow the model to archive important content to persistent storage.
The 6-Step Pre-Compaction Process:
{
agents: {
defaults: {
compaction: {
reserveTokensFloor: 20000,
memoryFlush: {
enabled: true,
softThresholdTokens: 4000,
prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store."
}
}
}
}
}
Operation Example: Use the status command to confirm system availability, then observe logs to see if pruning and compaction are triggered too frequently during peak periods.
Bash
openclaw status --deep
openclaw logs --follow --json
Operation Example: Count how often tool results are being trimmed (Log fields depend on actual implementation).
Bash
cat runtime.log | rg "Tool result trimmed" | wc -l
Chapter 6 brings the issue of "conversations becoming unstable as they grow" back from the realm of model capability into the realm of engineering control: Session Keys determine state ownership, Context Pruning regulates input volume, Long-term Memory files carry stable facts, and Session Compaction ensures long conversations continue to progress while remaining easy to replay and audit.
Having completed this chapter, your agent now possesses "photographic memory." You might try the following practices:
When access points expand from a single channel to multi-channel, multi-group, and multi-endpoint environments, the most common issue is not a lack of model capability, but blurred boundaries: Who is responsible for processing? Which entry points are allowed to trigger? Which capabilities are high-risk? And how do we replay events and locate issues when they occur?
The core objective of this chapter is to transform multi-channel and multi-agent operations into a manageable system:
This chapter includes the following sections:
This section explains the integration models, configuration structures, and security boundaries for Telegram and WhatsApp based on official access methods. It provides a landing path from "functional" to "controllable": first using Gating Strategies to converge the trigger surface, then using Probes and Logs to verify connectivity, and finally using Binding and Tool Policies to pin high-risk capabilities to controlled entry points.
Telegram channels are typically integrated via Bot Tokens. The official documentation defines the channels.telegram configuration fields, as well as entry points for Private Message (DM) and Group Chat strategies. The dmPolicy defaults to pairing (consistent with WhatsApp) and supports four options: pairing, allowlist, open, and disabled. It is recommended to start securely using allowlists and mention gating.
Example Configuration:
JavaScript
{
channels: {
telegram: {
botToken: '${TELEGRAM_BOT_TOKEN}',
dmPolicy: 'allowlist',
allowFrom: ['tg:987654321'],
groupPolicy: 'allowlist',
groupAllowFrom: ['tg:987654321'],
groups: { '*': { requireMention: true } },
},
},
messages: {
groupChat: {
mentionPatterns: ['@openclaw'],
},
},
}
After configuration, it is recommended to verify connectivity with a channel probe before moving to Routing and Tool layer governance:
Bash
openclaw channels status --probe
The WhatsApp channel runs on an actual account, meaning the primary risk is a naturally larger trigger surface. The official documentation provides dmPolicy, groupPolicy, and pairing workflows. It is recommended to converge entry points through pairing, allowlists, and mention gating.
Official recommendations include two deployment modes:
openclaw channels login --channel whatsapp --account work
Example Configuration:
JavaScript
{
channels: {
whatsapp: {
dmPolicy: 'pairing',
allowFrom: ['+15555550123'],
groupPolicy: 'allowlist',
groupAllowFrom: ['+15555550123'],
groups: { '*': { requireMention: true } },
},
},
messages: {
groupChat: {
mentionPatterns: ['@openclaw'],
},
},
}
The pairing process should be included in the O&M (Operations) acceptance:
Bash
openclaw pairing list whatsapp
openclaw pairing approve whatsapp <CODE> --notify
Channel strategies handle entry gating and default takeovers. However, for administrator accounts, critical groups, or fixed business numbers, it is recommended to use Binding to pin the handling agent. This reduces uncertainty in model-based routing.
Verify whether a binding is effective using commands rather than just observing the conversation:
Bash
openclaw agents list --bindings
The recommended troubleshooting sequence for channel issues is:
openclaw doctor
openclaw status --deep
openclaw channels status --probe
openclaw logs --follow --json
Pro Tip: When "Group chat does not trigger," first check groups.*.requireMention and messages.groupChat.mentionPatterns. When "Unauthorized operation" occurs, first check if the Tool Policy is defaulting to "deny" for high-risk tool groups.
As a system expands from a single entry point to multi-channel, multi-group, and multi-endpoint environments, the greatest risk often stems from boundary drift: low-trust entry points triggering high-privilege capabilities, or different entry points sharing the same policy, making accountability difficult to trace. Based on OpenClaw's official channel policies, multi-account configurations, and binding mechanisms, this section explains how to shift entry point governance to the configuration layer and provides a set of commands for acceptance and troubleshooting.
The first principle of entry point governance is to separate Private Messages (DMs) from Group Chats. Official channel documentation generally provides dmPolicy and groupPolicy entry points to define allowlists and mention gating respectively:
The value of multi-account support lies not in "running multiple instances," but in isolation: decoupling external support entry points from internal O&M (Operations) entry points, ensuring different entry points possess distinct gating policies. For example, channels.whatsapp.accounts can be used to provide multi-account isolation.
The following example demonstrates a configuration skeleton for two accounts:
JavaScript
{
channels: {
whatsapp: {
accounts: {
support: {
dmPolicy: 'pairing',
allowFrom: ['+15555550123'],
groupPolicy: 'allowlist',
groupAllowFrom: ['+15555550123'],
groups: { '*': { requireMention: true } },
},
ops: {
dmPolicy: 'allowlist',
allowFrom: ['+15555550999'],
groupPolicy: 'allowlist',
groupAllowFrom: ['+15555550999'],
groups: { '*': { requireMention: true } },
},
},
},
},
messages: {
groupChat: {
mentionPatterns: ['@openclaw'],
},
},
}
Once multi-account deployment is live, it is recommended to treat the account identifier as a first-class dimension in logs and alerts to facilitate auditing and attribution.
While multi-account setups handle entry point isolation, Binding handles precise routing. For administrator endpoints, critical groups, or fixed business numbers, it is recommended to prioritize bindings to pin the handling agent, thereby reducing uncertainty in model-based routing.
You can directly verify whether a binding is effective using the following command:
Bash
openclaw agents list --bindings
Issues related to entry point governance should be resolved layer by layer:
openclaw doctor
openclaw channels status --probe
openclaw status --deep
openclaw logs --follow --json
Quick Fix: If "Group chat triggers incorrectly" occurs, first check mention gating and allowlists; if "Unauthorized operation" occurs, first check if the tool policy is defaulting to "deny" for high-risk tool groups.
Connecting Lark requires not only configuring OpenClaw but also completing a series of authorizations on the Lark Open Platform. Many beginners fail at the first step: "Long-Connection Subscription." This section outlines an error-proof, end-to-end integration flow for Lark.
Note: Lark integration introduces external platform variables. If you are still setting up your local baseline environment, it is recommended to complete the basic configuration in [Chapter 3] and verify its stability before starting this section.
Common Failure Points Quick Check (Review before starting):
End-to-End Interaction Flow for Lark Integration (Sequence Summary):
Lessons Learned: If you go straight to "Events and Callbacks" to enable the long connection (Step 4) without clicking "Create Version and Publish" (Step 1), the system will endlessly report "Long-connection subscription failed."
Lark is not a built-in channel; therefore, you must install and enable the corresponding plugin before configuring the channel:
Bash
# 1. Download and install the plugin (Required, or 'enable' will fail)
openclaw plugins install @openclaw/feishu
# 2. Enable the plugin for the runtime
openclaw plugins enable feishu
# 3. Add and configure the channel
openclaw channels add
[!NOTE] Difference between install and enable: install downloads the code package from the registry to the local environment; enable registers the plugin into the current openclaw.json configuration to activate it. A common mistake is trying to enable directly. If it fails, run the install step first to bring the plugin into the local extensions directory.
openclaw channels list
Routing is not about "whether a model can do it," but rather "which agent should take over this message, and what is it authorized to do." This section focuses on OpenClaw's multi-agent routing to explain the decision chain, the priority of binding rules, and how to use observability to turn routing into an engineering capability that is replayable, auditable, and troubleshootable.
In a multi-agent system, three things must be determined as soon as a message enters: Who processes it (Ownership), what can be done (Tools & Permissions), and where the state is written (Sessions & Memory). Failure to clarify these results in two common types of faults:
The typical OpenClaw decision chain can be summarized as: "Match bindings first, then enter the router." Bindings are used to stably hand over messages from specific sources to designated agents, reducing the uncertainty of model-based classification. If a binding hits, the system bypasses the router and delivers the message directly to the bound agent.
The "Binding-First" Branch in Multi-Agent Routing
(Flow Summary):
In engineering practice, it is recommended to prioritize high-risk or high-certainty entry points in bindings and leave intent-based entry points to the router.
The advantage of binding is that it turns "routing correctness" from a probability problem into a rule-based one. In the configuration structure, bindings is a top-level array (at the same level as agents and channels). Each binding points to a target agent via agentId and describes matching conditions via a match object.
JavaScript
{
// bindings is a top-level array, NOT nested inside agents
bindings: [
{
agentId: "work",
match: {
channel: "whatsapp",
peer: { kind: "direct", id: "+15551234567" },
},
},
{
agentId: "work",
match: {
channel: "telegram",
peer: { kind: "direct", id: "987654321" },
},
},
],
agents: {
list: [
{
id: "work",
name: "Work Assistant",
workspace: "~/.openclaw/workspace-work",
agentDir: "~/.openclaw/agents/work/agent",
},
],
},
}
[!WARNING] bindings must be placed at the top level of the configuration file. Nested definitions inside an agent object in agents.list will not be recognized, causing the binding to fail silently.
As the system grows to "multi-account, multi-group, multi-entry," we recommend a two-layer strategy:
Here is a real-world scenario: A team uses Telegram and WhatsApp and needs to route tasks to different agents based on the source and risk level.
Scenario Description
{
agents: {
list: [
{
id: "assistant",
default: true,
name: "Default Assistant",
workspace: "~/.openclaw/workspace-assistant",
agentDir: "~/.openclaw/agents/assistant/agent",
model: "anthropic/claude-sonnet-4-6",
tools: { allow: ["group:fs", "group:web"], deny: ["group:runtime"] },
},
{
id: "devops",
name: "DevOps Agent",
workspace: "~/.openclaw/workspace-devops",
agentDir: "~/.openclaw/agents/devops/agent",
model: "anthropic/claude-sonnet-4-6",
tools: { allow: ["group:runtime", "group:fs", "group:web"] },
sandbox: { mode: "all", scope: "agent" },
},
{
id: "writer",
name: "Writing Agent",
workspace: "~/.openclaw/workspace-writer",
agentDir: "~/.openclaw/agents/writer/agent",
model: "anthropic/claude-sonnet-4-6",
tools: { allow: ["group:fs", "group:web"], deny: ["group:runtime"] },
},
],
},
bindings: [
{
agentId: "devops",
match: {
channel: "telegram",
peer: { kind: "group", id: "-1001234567890" },
},
},
{
agentId: "writer",
match: {
channel: "whatsapp",
peer: { kind: "direct", id: "+8615600000000" },
},
},
],
channels: {
telegram: { enabled: true },
whatsapp: { enabled: true },
},
}
Binding Priority and Matching Order When a message arrives, the system attempts to match in the following order (most specific first):
[!NOTE] When a binding contains multiple match fields, all fields must be satisfied (AND semantics). The first hitting binding takes effect; subsequent checks are bypassed.
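The matching order and AND semantics can be sketched as a first-hit scan (an illustration only; `resolveBinding` is a hypothetical helper, and the real matcher supports more fields):

```javascript
// Binding-match sketch: every field present in `match` must agree with
// the message (AND semantics); the first binding that hits wins and
// short-circuits all later checks.
function resolveBinding(bindings, msg) {
  for (const b of bindings) {
    const m = b.match;
    const hit =
      (m.channel === undefined || m.channel === msg.channel) &&
      (m.peer === undefined ||
        (m.peer.kind === msg.peer.kind && m.peer.id === msg.peer.id));
    if (hit) return b.agentId; // first hit takes effect
  }
  return null; // no binding → fall through to the router
}
```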
Verify Loading and Priority
Bash
openclaw agents list --bindings
openclaw logs --follow --json --filter "routing"
The difficulty in troubleshooting multi-agent systems is that "it looks like a model issue, but it's actually a routing or policy issue." We recommend logging:
While routing ensures messages go to the right agent, Memory Isolation ensures knowledge doesn't cross-contaminate. OpenClaw implements dual-layer isolation: Source Files are separated by workspace, and Vector Indices are separated by Agent ID.
Plaintext
~/.clawdbot/memory/ # Index storage (Status dir)
├── assistant.sqlite # Vector index for Assistant
└── devops.sqlite # Vector index for DevOps
~/.openclaw/workspace-assistant/ # Assistant workspace (Source files)
├── MEMORY.md
└── memory/
~/.openclaw/workspace-devops/ # DevOps workspace (Source files)
├── MEMORY.md
└── memory/
Each agent in agents.list declares an independent workspace. Their MEMORY.md and memory/ directories are physically separate. Index files (SQLite) are stored centrally but distinguished by agentId—memory_search will only query the index file corresponding to the current agent.
Engineering Significance:
⚠️ WARNING: Without strict sandbox mode (sandbox.mode: "all"), an agent might theoretically use filesystem tools to read another agent's workspace. In production, explicitly configure the sandbox parameter for each agent to ensure filesystem access is constrained to its own path.
Multi-agent systems are used not only for entry routing but also for decomposing complex tasks into parallelizable sub-tasks and delivering results to multiple target groups or endpoints. This section details the configuration frameworks and validation commands for OpenClaw's Sub-Agent and Broadcast Group capabilities. It also introduces an engineering-grade Handover Protocol to prevent collaboration from losing fidelity due to vague "verbal descriptions."
Sub-agents are task-operation units dynamically derived by a master agent via built-in tools or slash commands. They allow a complex task to be split into multiple parallel branches, with the master agent serving as the coordinator and summarizer.
The master agent can derive sub-agents manually via slash commands or automatically through the spawn_subagent tool. Basic slash commands include:
Bash
/subagents spawn <agentId> <task> # Derive a sub-agent to execute a specific task
/subagents list # View all active sub-agents
/subagents log <id> # View sub-agent runtime logs
/subagents terminate <id|all> # Terminate sub-agents
Broadcast groups distribute a single inbound message to multiple agents for parallel processing simultaneously—ideal for scenarios like "multi-role review," "multi-language support," or "batch alerting."
Broadcasts are configured via the top-level broadcast field, mapping a Group ID or phone number to an array of Agent IDs:
JavaScript
{
broadcast: {
strategy: 'parallel', // Optional, defaults to parallel
'120363000000000000@g.us': ['alfred', 'baerbel'], // WhatsApp Group → Two agents respond
'+15555550123': ['support', 'logger'], // DM Number → Two agents respond
},
}
Because broadcasting is a high-impact action, it should be paired with Tool Policies: only allow specific agents to trigger broadcasts, and log the group name, target summary, and trigger reason for auditing.
When a sub-agent completes a task, it must "notify" the parent agent or target session. OpenClaw manages this through the Announce Queue, which handles concurrency conflicts, cross-channel routing, message loss, and retries.
Queue Lifecycle: Enqueue → Debounce → Drain → Deliver
Core Configuration Parameters:
| Parameter | Default | Description |
|---|---|---|
| debounceMs | 1000 (1s) | Window to aggregate more incoming messages. |
| cap | 20 | Maximum queue depth. |
| dropPolicy | "summarize" | Overflow policy: summarize, old (drop oldest), or new (reject new). |
| mode | — | Queue behavior: steer, followup, collect, queue, etc. |
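The cap and dropPolicy interplay can be sketched as follows (a simplification: the real `summarize` policy produces an LLM-generated summary, here replaced by a placeholder string):

```javascript
// Bounded announce-queue sketch applying the three dropPolicy options
// from the table above. Below the cap, items enqueue normally; at the
// cap, the policy decides what gives way.
function enqueue(queue, item, { cap = 20, dropPolicy = 'summarize' } = {}) {
  if (queue.length < cap) {
    queue.push(item);
  } else if (dropPolicy === 'old') {
    queue.shift();        // drop the oldest entry
    queue.push(item);
  } else if (dropPolicy === 'new') {
    // reject the new item; queue unchanged
  } else {
    // 'summarize': collapse the backlog into one placeholder entry
    queue.splice(0, queue.length, `[summary of ${queue.length} items]`, item);
  }
  return queue;
}
```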
Key Features:
The most common failure in collaboration is incomplete handover information. Use a structured template for handovers:
Scenario: A coordinator triggers daily at 09:00, spawning sub-agents to check Git commits and Jira tasks, then broadcasts a summary.
JavaScript
{
agents: {
list: [
{
id: "coordinator",
displayName: "Report Coordinator",
tools: ["spawn_subagent", "summarize_reports", "broadcast_message"],
systemPrompt: "You are the daily summarizer. At 9 AM, spawn sub-agents to check code and tasks, then broadcast the result."
},
{
id: "code-reviewer",
toolGroups: ["git_readonly"],
systemPrompt: "Analyze Git commits from the last 24h. Return JSON."
},
{
id: "task-tracker",
toolGroups: ["jira_readonly"],
systemPrompt: "Search for tasks marked 'Done' today. Return JSON."
}
]
},
cron: [
{
id: "daily_standup",
agentId: "coordinator",
expression: "0 9 * * 1-5", // Mon-Fri 09:00task: "Generate today's report",
},
],
broadcast: {
strategy: "sequential",
"C1234567890": ["coordinator"], // Slack Channel"120363000000000000@g.us": ["coordinator"] // WhatsApp Group
}
}
Verification & Monitoring Commands:
Bash
openclaw cron list # Check scheduled tasks
openclaw cron trigger <id> # Manually trigger for testing
openclaw logs --follow --json --filter "subagent"
openclaw logs --follow --json --filter "broadcast"
Sub-agents parallelize complex tasks while maintaining clear boundaries via the dynamic spawn mechanism. Broadcast groups ensure stable delivery to multiple agents via top-level configuration. Both should be paired with Tool Policies and structured logging to ensure the collaboration is auditable and replayable. By standardizing handover protocols, you can significantly reduce information loss in multi-agent workflows.
Chapter 7 establishes deterministic boundaries for multi-channel access and multi-agent collaboration: Channel Strategies converge the trigger surface, Binding and Routing converge ownership, and Tool Policies and Sandbox Constraints converge operation capabilities. All of these are supported by probes and structured logs to provide a replayable troubleshooting path.
Once a proper routing network and sandbox policies are configured, multi-agent collaboration can significantly extend business boundaries:
[Chapter 8] moves into Automation and O&M (Operations): covering self-checks, scheduled jobs (Cron), remote access, and security baselines. The goal is to advance the system from "functional" to "capable of long-term stable operation."
This chapter focuses on the continuous operation scenarios of OpenClaw: exploring how to evolve the system from an initial "test-ready" state into an industrial-grade architecture capable of "long-term reliable operation" in production environments. Through this chapter, you will master the core capabilities required to keep OpenClaw running securely, predictably, and auditably under unattended conditions.
Long-term operability is far more than simply adding a few cron scripts; it requires embedding the following four dimensions of governance capabilities into the system at the architectural level:
This chapter includes the following sections:
After completing this chapter, you will be able to:
This section discusses the engineering implementation of Hooks: decoupling governance logic from the main operation chain and ensuring that the Hook itself does not become a new source of failure.
[!NOTE] The Hook patterns discussed here represent general engineering best practices. Specific registration methods and event lists on the implementation side may evolve with versions: use the actual output of doctor, status --deep, and structured logs as your source of truth. This section focuses on the responsibility boundaries and stability constraints of Hooks.
Hooks are suited for carrying cross-cutting concerns rather than the primary business flow. Common scenarios include:
It is recommended to fix Hooks within three specific stages to avoid role confusion. The following flowchart illustrates how Hooks intercept the three stages of the main chain:
{% @mermaid/diagram content="flowchart LR
subgraph input["Input Stage"]
I1["Rate Limit/Whitelisting"] --> I2["Format Validation"]
end
subgraph exec["Operation Stage"]
E1["Tool Call Audit"] --> E2["Risk Tagging"]
end
subgraph output["Output Stage"]
O1["Data Masking"] --> O2["Format Convergence"]
end
req["Inbound Request"] --> input
input -->|"Admission Passed"| main["Main Chain Reasoning"]
main --> exec
exec --> result["Generate Result"]
result --> output
output --> resp["Return Response"]
input -->|"Admission Rejected"| reject["Early Rejection"]" %}
Hook Entry Points in the Three-Stage Lifecycle of the Main Chain
| Stage | Objective | Prohibited Actions |
|---|---|---|
| Input Stage | Early rejection, noise reduction, admission validation | Long-duration external calls, irreversible writes |
| Operation Stage | Recording key decisions, risk interception | Modifying core business state |
| Output Stage | Masking and format governance | Temporary privilege escalation, bypassing policy enforcement |
Hooks must be constrained; otherwise, they risk dragging down the main chain.
It is recommended that all Hook events use a unified structure, containing at minimum: Timestamp, Trace ID, Stage, Action, and Result.
JSON
{
"ts": "YYYY-MM-DDTHH:MM:SSZ",
"trace_id": "t-YYYYMMDD-001",
"stage": "input",
"hook": "rate_limit_guard",
"event": "rejected",
"reason": "rate_limit_exceeded"
}
After deployment, two types of checks are recommended:
openclaw status --deep
openclaw logs --follow --json
cat runtime.log | jq -r 'select(.type=="log") | .log | select(.component=="hook") | .event' | sort | uniq -c | sort -nr | head
This section focuses on the stability design of unattended operations. The goal is not merely to "trigger once at a set time," but to ensure that repeated execution does not lead to a loss of control.
[!NOTE] The concepts of re-entrancy protection, idempotency keys, and failure shunting discussed here are general scheduling engineering practices applicable to external scheduled jobs orchestrated on a host. Specific switches and event names for built-in scheduling mechanisms may evolve with versions: use the actual output of --help, status --deep, and structured logs as your source of truth.
Concrete Example: Automated Daily Standup Summaries
Suppose a team wants an agent to automatically pull yesterday's progress from Slack and Jira every morning at 9:00 AM, generate a formatted standup summary, and post it to a Feishu/Lark group. The engineering constraints for this job are as follows:
# crontab entry
0 9 * * 1-5 /opt/openclaw/jobs/daily_standup.sh >> /var/log/oc_jobs/standup.log 2>&1
# daily_standup.sh core logic
WINDOW_START=$(date -d "today 09:00" +%s)
IDEM_KEY="daily_standup:v1:${WINDOW_START}"# Idempotency check: Execute only once per windowif redis-cli SET "oc_idem:${IDEM_KEY}" 1 NX EX 86400; then
openclaw agent --message "Generate today's standup summary and post to Feishu group ops_daily"elseecho "Already executed, skipping"fi
Scheduled tasks in a production environment must satisfy at least the following:
The most common issue when a task's execution time exceeds its trigger interval is a "re-entrancy storm." It is recommended to use a distributed lock with a TTL (Time-To-Live) and record instance ownership.
Bash
# Execute task only if lock acquisition is successful
redis-cli SET oc_job_lock:daily_report "<instance_id>" NX EX 600
When releasing the lock, verify the lock holder to avoid accidentally deleting a lock belonging to another instance.
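A sketch of holder-verified release, using an in-memory `Map` in place of Redis (note that the real check-and-delete must be atomic, e.g. via a Lua script; this illustration is not):

```javascript
// Safe-release sketch: delete the lock only if this instance still
// holds it, mirroring the `redis-cli SET ... NX EX` pattern above.
// `store` stands in for Redis.
function releaseLock(store, key, instanceId) {
  if (store.get(key) === instanceId) { // verify the holder first
    store.delete(key);
    return true;
  }
  return false; // someone else's lock — leave it alone
}
```

Without the holder check, an instance whose lock expired mid-run could delete the lock a second instance has since acquired, reopening the re-entrancy window.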
Before retrying a job, define idempotency key rules. A recommended format is "Job Name + Window Start Time."
idempotency_key = "daily_report:v2:" + window_start_ts
As long as the idempotency key remains consistent, downstream write operations must behave such that "repeated requests have no additional side effects."
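The idempotency-key rule can be sketched as a guard around the side effect (assuming an in-memory `Set` as the seen-keys store; production would use Redis or similar):

```javascript
// Idempotency sketch: the key is "job name + version + window start",
// so a re-run inside the same window is a no-op with no extra side
// effects, while a new window executes normally.
function runOnce(seen, jobName, windowStartTs, effectFn) {
  const key = `${jobName}:v2:${windowStartTs}`;
  if (seen.has(key)) return false; // already executed this window
  seen.add(key);
  effectFn();                      // the downstream write happens once
  return true;
}
```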
Focus on three items during acceptance:
💡 Troubleshooting Note: The "Ghost Operation" of Scheduled Tasks
A standup summary task was set for 9:00 AM daily, but occasionally executed twice—at 9:00 and 9:03. Investigation revealed that the cron scheduler reloaded during a Gateway restart; if the restart happened exactly within the task trigger window, the job ran twice. Solution: Add an idempotency check within the task logic (e.g., checking if the summary was already sent today) or use openclaw cron list to confirm task status before restarting.
The previous section covered Cron jobs designed for precise points in time; this section introduces another built-in scheduling primitive in OpenClaw—Heartbeat. If Cron answers "what should be done at a specific time," Heartbeat answers "is there anything that needs attention?"
[!TIP] Not sure whether to choose Heartbeat or Cron? A quick rule of thumb: If you need precise timing → use Cron; if you need periodic awareness and "as-needed" notifications → use Heartbeat. See 8.3.8 Selection Decision Tree for a detailed comparison.
A Heartbeat is a scheduled agent turn at the gateway level: at fixed intervals (default 30 minutes), the gateway injects a heartbeat prompt into the agent's main session, triggering a full agent reasoning cycle. The agent reads the HEARTBEAT.md checklist in the workspace, inspects statuses like inboxes, calendars, and todos, and then provides one of two responses:
The following diagram illustrates the complete heartbeat path from timer trigger to message delivery. When troubleshooting, locating where it "stuck" using this map is the fastest approach.
{% @mermaid/diagram content="flowchart TD
A["① Timer Expires"] --> B["② Wake-up Consolidation
Merge multiple triggers within a 250ms window"]
B --> C{"③ Pre-check Gating"}
C -->|"Disabled / Quiet Hours / Queue Busy / Checklist Empty"| SKIP["Skip This Turn"]
C -->|"All Passed"| D["④ Assemble Prompt
Select template based on source, append current time"]
D --> E["⑤ Call LLM"]
E --> F{"⑥ Parse Response"}
F -->|"HEARTBEAT_OK or Empty"| G["Silent Handling
Roll back session timestamp"]
F -->|"Same as Last Time"| H["Deduplication Skip"]
F -->|"Substantive Content"| I["⑦ Parse Delivery Target & Visibility"]
I --> J["⑧ Channel Readiness Check & Send"]
J --> K["⑨ Emit Event, Trim Transcript, Advance Schedule"]" %}
Several key design decisions are worth noting:
Timers are per-agent. Each agent independently maintains its own heartbeat interval and last operation time. In multi-agent scenarios, they heartbeat at their own pace without blocking each other. When configurations are hot-reloaded, the system recalculates intervals for all agents.
Wake-up consolidation prevents storms. Multiple trigger sources (timer expiry, Cron events, exec completion) might request a heartbeat simultaneously. The system uses a 250ms window to merge them into a single operation, preserving the highest priority wake-up reason (ACTION > DEFAULT > INTERVAL > RETRY).
Heartbeats do not extend session life. If the response is HEARTBEAT_OK (no substantive content), the gateway rolls back the session's updatedAt timestamp to its pre-heartbeat value. This ensures that idle expiry works correctly—pure heartbeats should not indefinitely prolong a session's lifespan.
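Under the stated assumptions (a 250 ms window and the priority order ACTION > DEFAULT > INTERVAL > RETRY), the consolidation can be sketched as follows. This is an illustration of the merging rule, not OpenClaw's actual implementation.

```python
# Priority order per the text: ACTION > DEFAULT > INTERVAL > RETRY
PRIORITY = {"ACTION": 3, "DEFAULT": 2, "INTERVAL": 1, "RETRY": 0}
WINDOW_MS = 250

def consolidate(requests):
    """requests: (timestamp_ms, reason) pairs sorted by time.
    Returns one wake-up reason per consolidated window."""
    wakeups = []
    window_start = None
    best = None
    for ts, reason in requests:
        if window_start is None or ts - window_start > WINDOW_MS:
            if best is not None:
                wakeups.append(best)       # close the previous window
            window_start, best = ts, reason
        elif PRIORITY[reason] > PRIORITY[best]:
            best = reason                  # same window: keep the higher priority
    if best is not None:
        wakeups.append(best)
    return wakeups

# Three triggers within 250 ms collapse into one ACTION wake-up;
# a later INTERVAL trigger forms its own window.
assert consolidate([(0, "INTERVAL"), (100, "ACTION"), (200, "RETRY"),
                    (1000, "INTERVAL")]) == ["ACTION", "INTERVAL"]
```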
The key to understanding heartbeat behavior is knowing exactly what is sent to the LLM during each cycle and how the system interprets the reply.
When promptMode is not "minimal", the gateway injects a heartbeat protocol instruction into the system prompt:
## Heartbeats
Heartbeat prompt: [Configured heartbeat prompt or default]
If you receive a heartbeat poll (a user message matching the heartbeat
prompt above), and there is nothing that needs attention, reply exactly:
HEARTBEAT_OK
OpenClaw treats a leading/trailing "HEARTBEAT_OK" as a heartbeat ack
(and may discard it).
If something needs attention, do NOT include "HEARTBEAT_OK"; reply with
the alert text instead.
If using Lightweight Context mode (lightContext: true), this paragraph is not injected—because in lightweight mode, only HEARTBEAT.md is loaded, not the full system prompt paragraphs.
The # Project Context section of the system prompt injects workspace bootstrap files. Which files are injected depends on the lightContext configuration:
| Mode | Injected Files | Use Case |
|---|---|---|
| Full Mode (Default) | All bootstrap files (SOUL.md, README.md, TOOLS.md, MEMORY.md, HEARTBEAT.md, etc.) | Requires full context for decision-making |
| Lightweight Mode (lightContext: true) | HEARTBEAT.md only | Inspection checklist is clear; no other context needed; saves tokens |
The user message is the "trigger command" for the heartbeat, taking three forms depending on the source:
① Regular Heartbeat (Timer Expiry)
Read HEARTBEAT.md if it exists (workspace context). Follow it strictly.
Do not infer or repeat old tasks from prior chats.
If nothing needs attention, reply HEARTBEAT_OK.
Current time: 2026-03-09 14:30 (Asia/Shanghai) / 06:30 UTC
The default prompt can be completely replaced via heartbeat.prompt. The time line at the end is automatically appended in a fixed format: Current time: [Local Time] ([Timezone]) / [UTC Time] UTC.
② Cron Event Trigger
When a Cron job generates a system event, the next heartbeat will automatically pick it up:
A scheduled reminder has been triggered. The reminder content is:
[Reminder text from Cron event]
Handle this reminder internally. Do not relay it to the user unless
explicitly requested.
Current time: 2026-03-09 14:30 (Asia/Shanghai) / 06:30 UTC
③ Async Command (exec) Completion Trigger
An async command you ran earlier has completed. The result is shown in
the system messages above. Handle the result internally. Do not relay
it to the user unless explicitly requested.
Current time: 2026-03-09 14:30 (Asia/Shanghai) / 06:30 UTC
The heartbeat cycle passes an isHeartbeat: true flag through the entire call chain, producing the following effects:
| Segment | Effect |
|---|---|
| Model Selection | Can use heartbeat.model to override the default model |
| Bootstrap Context | Switches to lightweight filtering (HEARTBEAT.md only) when lightContext: true |
| Tool Error Alerts | Tool error alerts can be suppressed via config to avoid interfering with heartbeat judgment |
| Transcript Trimming | Trims the transcript after a HEARTBEAT_OK reply to avoid context pollution |
HEARTBEAT_OK is more than a string; it is a bidirectionally agreed-upon protocol token:
| Position | Behavior |
|---|---|
| Start or end of reply | Identified as an ACK token; if remaining text is ≤ ackMaxChars (default 300), the entire reply is discarded |
| Middle of reply | No special handling; treated as regular text |
| In non-heartbeat turn | Leading/trailing HEARTBEAT_OK is silently stripped and logged; messages containing only this token are discarded |
The system's standardization process also handles occasional formatting marks added by LLMs—<b>HEARTBEAT_OK</b> and HEARTBEAT_OK are both correctly recognized and stripped.
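Putting the positional rules and the ackMaxChars threshold together, the ack handling can be sketched as below. Function and return-value names are illustrative assumptions, not OpenClaw's API.

```python
import re

ACK = "HEARTBEAT_OK"
ACK_MAX_CHARS = 300  # default per the configuration reference

def handle_heartbeat_reply(reply: str, ack_max_chars: int = ACK_MAX_CHARS):
    # Normalize occasional formatting marks such as <b>HEARTBEAT_OK</b>
    # or **HEARTBEAT_OK** before matching the token
    text = re.sub(r"(\*\*|__|<b>|</b>)", "", reply).strip()
    if text.startswith(ACK) or text.endswith(ACK):
        remainder = text.replace(ACK, "", 1).strip()
        if len(remainder) <= ack_max_chars:
            return ("ok-token", None)   # treated as an ACK: whole reply discarded
        return ("sent", remainder)      # too much extra text: deliver it
    if not text:
        return ("ok-empty", None)
    return ("sent", text)               # substantive content: deliver as alert

assert handle_heartbeat_reply("HEARTBEAT_OK") == ("ok-token", None)
assert handle_heartbeat_reply("<b>HEARTBEAT_OK</b>") == ("ok-token", None)
assert handle_heartbeat_reply("2 urgent emails in inbox") == ("sent", "2 urgent emails in inbox")
```

Note that a token in the middle of the reply is deliberately left alone, matching the table above: only leading or trailing occurrences count as an ACK.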
Minimal configuration works with default values. The full configuration is as follows:
JavaScript
{
agents: {
defaults: {
heartbeat: {
every: "30m", // Interval duration string; "0m" to disablemodel: "anthropic/claude-sonnet-4-6", // Optional: use a cheaper model for heartbeatsprompt: "Read HEARTBEAT.md if it exists...", // Custom prompt (replaces, doesn't merge)target: "none", // "none" | "last" | specific channel nameto: "+15551234567", // Specific recipient within the channelaccountId: "ops-bot", // Account ID for multi-account channelsdirectPolicy: "allow", // "allow" | "block" (block DM deliveries)lightContext: false, // If true, only injects HEARTBEAT.md to save tokensincludeReasoning: false, // If true, sends the reasoning process as wellackMaxChars: 300, // Max allowed characters alongside HEARTBEAT_OKactiveHours: { // Active period limitsstart: "09:00", // HH:MM, inclusiveend: "22:00", // HH:MM, exclusive; "24:00" means midnighttimezone: "Asia/Shanghai" // "user" | "local" | IANA timezone
}
}
}
}
}
Heartbeat configuration cascades across two dimensions:
Agent Dimension: agents.defaults.heartbeat sets the global default; agents.list[].heartbeat overrides for a specific agent. Once any agent declares a heartbeat block, only those agents that declared it will run heartbeats.
Channel Visibility Dimension: channels.defaults.heartbeat → channels.<channel>.heartbeat → channels.<channel>.accounts.<id>.heartbeat. Overrides happen from broad to specific, controlling the showOk, showAlerts, and useIndicator toggles.
JavaScript
{
agents: {
defaults: {
heartbeat: { every: "30m", target: "last" }
},
list: [
{ id: "main", default: true }, // No heartbeat block -> Heartbeat not running
{
id: "ops",
heartbeat: { // Only the ops agent runs heartbeats
every: "1h",
target: "telegram",
to: "12345678:topic:42",
accountId: "ops-bot"
}
}
]
}
}
HEARTBEAT.md is an optional file in the agent's workspace root directory that acts as a heartbeat inspection checklist. The default prompt instructs the agent to read and strictly execute it.
Markdown
# Heartbeat Inspection Checklist
- Scan inbox; notify with summary if there are urgent emails
- Check calendar events for the next 2 hours
- Report results if any background tasks have completed
- Send a brief greeting if idle for more than 8 hours
Design Points:
activeHours performs timezone-aware filtering before a heartbeat triggers, supporting windows that cross midnight. Heartbeats outside the window are skipped, logged with the reason quiet-hours.
Common Patterns:
| Goal | Configuration |
|---|---|
| Heartbeat only during work hours | activeHours: { start: "09:00", end: "18:00" } |
| Run 24/7 | Omit activeHours (default behavior) |
| Avoid late-night disturbance | activeHours: { start: "08:00", end: "24:00" } |
[!WARNING] start and end cannot be equal (e.g., 08:00 to 08:00); this is treated as a zero-width window, and heartbeats will always be skipped.
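A minimal sketch of this gating, including windows that cross midnight and the zero-width case from the warning above. The behavior is inferred from the rules stated in this section; it is not OpenClaw's implementation.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def _minutes(hhmm: str) -> int:
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def in_active_hours(now: datetime, start: str, end: str, tz: str) -> bool:
    local = now.astimezone(ZoneInfo(tz))
    cur = local.hour * 60 + local.minute
    s, e = _minutes(start), _minutes(end)  # start inclusive, end exclusive; "24:00" -> 1440
    if s == e:
        return False                       # zero-width window: always skipped
    if s < e:
        return s <= cur < e                # normal window, e.g. 09:00-22:00
    return cur >= s or cur < e             # window crossing midnight, e.g. 22:00-06:00

utc = ZoneInfo("UTC")
# 06:30 UTC is 14:30 in Asia/Shanghai, inside a 09:00-22:00 window
assert in_active_hours(datetime(2026, 3, 9, 6, 30, tzinfo=utc),
                       "09:00", "22:00", "Asia/Shanghai")
```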
Visibility controls determine whether heartbeat messages are actually sent to the channel:
YAML
channels:
  defaults:
    heartbeat:
      showOk: false        # Don't send HEARTBEAT_OK ACKs by default
      showAlerts: true     # Send alert content by default
      useIndicator: true   # Emit UI indicator events by default
  telegram:
    heartbeat:
      showOk: true         # Show OK confirmations on Telegram specifically
  whatsapp:
    accounts:
      work:
        heartbeat:
          showAlerts: false  # The "work" account does not receive alerts
When all three toggles are false, the gateway skips the heartbeat cycle entirely (without calling the LLM), which is the most economical "Total Silence" mode.
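The pre-check gating (step ③ in the flow diagram) can be sketched as a chain of cheap checks that run before any LLM call; the field names used here are hypothetical, chosen only to illustrate the ordering.

```python
def should_run_heartbeat(cfg: dict, state: dict):
    """Return (run, reason). Cheapest checks come first; any failure
    skips the turn before the LLM is ever called."""
    if cfg.get("every", "30m") == "0m":
        return False, "disabled"
    vis = cfg.get("visibility", {})
    if not (vis.get("showOk") or vis.get("showAlerts") or vis.get("useIndicator")):
        return False, "all-visibility-off"  # "Total Silence": skip the cycle entirely
    if state.get("in_quiet_hours"):
        return False, "quiet-hours"
    if state.get("queue_busy"):
        return False, "queue-busy"
    return True, "ok"

assert should_run_heartbeat({"every": "0m"}, {}) == (False, "disabled")
assert should_run_heartbeat({"visibility": {"showAlerts": True}}, {}) == (True, "ok")
```

The ordering matters for cost: visibility and schedule checks are pure config/state lookups, so a fully silenced channel never spends a single token.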
Every heartbeat operation emits an event for UI and monitoring consumption:
| Status | Meaning | Trigger Scenario |
|---|---|---|
| sent | Message Delivered | Alert content successfully sent to the channel |
| ok-empty | Empty Reply | LLM had no output; optionally sends HEARTBEAT_OK |
| ok-token | Token ACK | Reply contained only HEARTBEAT_OK; stripped |
| skipped | Skipped | alerts-disabled / duplicate / quiet-hours / no-target, etc. |
| failed | Failed | Error during LLM call or delivery process |
Indicator Type Mapping: ok-empty / ok-token → "ok" (Green); sent → "alert" (Yellow); failed → "error" (Red). In the WebChat Debug page, you can see JSON snapshots of the latest heartbeat events.
You don't have to wait for the next heartbeat cycle; you can trigger one immediately:
Bash
# Wake up heartbeat immediately
openclaw system event --text "Check for urgent follow-ups" --mode now
# Wait until the next heartbeat cycle to process
openclaw system event --text "Check project status" --mode next-heartbeat
If multiple agents have heartbeats configured, --mode now will trigger heartbeats for all of them immediately.
System events are not limited to manual input—Cron jobs and exec completions also generate events, which the heartbeat automatically incorporates during its next trigger. The system checks the pending event queue and generates specific prompts based on the event type (e.g., Cron reminder text is embedded directly into the prompt body; see 8.3.3 User Message Body).
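How the three message forms and the fixed time line fit together can be sketched as follows. The template text is copied from the examples above; the function signature and event-tuple shape are illustrative assumptions, not OpenClaw's internal API.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

TEMPLATES = {
    "timer": ("Read HEARTBEAT.md if it exists (workspace context). Follow it strictly.\n"
              "Do not infer or repeat old tasks from prior chats.\n"
              "If nothing needs attention, reply HEARTBEAT_OK."),
    "cron": ("A scheduled reminder has been triggered. The reminder content is:\n"
             "{text}\n"
             "Handle this reminder internally. Do not relay it to the user unless\n"
             "explicitly requested."),
    "exec": ("An async command you ran earlier has completed. The result is shown in\n"
             "the system messages above. Handle the result internally. Do not relay\n"
             "it to the user unless explicitly requested."),
}

def build_heartbeat_prompt(pending_events, now, tz_name="Asia/Shanghai"):
    """pending_events: list of (kind, text) tuples; empty means a plain timer tick."""
    if pending_events:
        kind, text = pending_events[0]
        body = TEMPLATES[kind].format(text=text)
    else:
        body = TEMPLATES["timer"]
    local = now.astimezone(ZoneInfo(tz_name))
    utc = now.astimezone(timezone.utc)
    # The time line is always appended in the fixed format described above
    return f"{body}\nCurrent time: {local:%Y-%m-%d %H:%M} ({tz_name}) / {utc:%H:%M} UTC"
```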
Does it need to run at a precise time?
Yes -> Use Cron
No -> Continue...
Does it need to be isolated from the main session?
Yes -> Use Cron (isolated session)
No -> Continue...
Can it be merged with other periodic checks?
Yes -> Use Heartbeat (add to HEARTBEAT.md)
No -> Use Cron
Is it a one-time reminder?
Yes -> Use Cron + --at
No -> Continue...
Does it require a different model or reasoning depth?
Yes -> Use Cron (isolated) + --model/--thinking
No -> Use Heartbeat
Best Practice: Use both in tandem. Use Heartbeat for routine periodic inspections (inbox, calendar, notifications), completing multiple checks in one batch. Use Cron for independent jobs requiring precise timing (daily reports, weekly reviews, fixed reminders). This reduces API calls while maintaining time precision for critical tasks.
| Dimension | Heartbeat | Cron (main session) | Cron (isolated session) |
|---|---|---|---|
| Session | Main Session | Main Session (via System Event) | cron:<jobId> Independent Session |
| Context | Full History | Full History | Starts Blank |
| Model | Overridable | Main Session Model | Overridable |
| Output | Deliver only if non-OK | Heartbeat Prompt + Event | Summary (announce) by default |
| Time Precision | Approximate (Queue load) | Precise (Sec-level cron) | Precise |
| Token Cost | Multi-check per turn | Joins next heartbeat (No extra turn) | One full turn per job |
Heartbeats run full agent cycles; without control, token consumption can be significant. Ways to reduce costs include: lengthening the interval (every), pointing heartbeat.model at a cheaper model, enabling lightContext so only HEARTBEAT.md is injected, restricting activeHours, and turning off all visibility toggles so silent cycles are skipped without calling the LLM.
This section discusses the goals of OpenClaw remote operation and maintenance (O&M): ensuring reachability while minimizing the attack surface and providing rapid-revocation access mechanisms.
[!NOTE] The tunneling, Zero Trust, and credential management discussed here are general infrastructure practices that must be implemented on the host or network level. OpenClaw itself does not include built-in remote access components.
Remote O&M starts with controlling the exposure of entry points.
Commonly viable solutions include:
For remote access, it is recommended to:
When OpenClaw is deployed on a cloud server, the assets, reports, and code organized by the agent are stored on the remote disk. In engineering practice, it is recommended to establish a controlled two-way file synchronization mechanism to seamlessly map the "cloud workspace" to your local machine:
When an agent runs an L2 (headed browser) on the cloud for automation, it often encounters complex CAPTCHAs (sliders, behavioral verification) or requires manual confirmation. Relying solely on screenshots (L3) is often insufficient to bypass these security policies.
In practice, you can install a web-based remote desktop service (like KasmVNC) on the server.
In the event of credential leakage or suspicious access, prioritize the following:
The following commands can be used to quickly inspect the remote access baseline of a host:
Bash
# Check SSH authentication methods
sshd -T | grep -E 'passwordauthentication|pubkeyauthentication'

# Check for non-local listening ports
lsof -nP -iTCP -sTCP:LISTEN | grep -v '127.0.0.1' || echo "No public listeners"
It is recommended to perform these inspections consistently after deployments or changes, archiving them alongside the output of doctor/status.
The goal of a security baseline is not to write a static "configuration checklist," but to confine high-risk capabilities within deterministic boundaries and ensure every critical decision is traceable and reviewable. Based on the official OpenClaw security and configuration documentation, this section provides a practical end-to-end flow for security and auditing: how to define layered boundaries, how to record audit events, how to inject secrets, and how to use self-check commands to transform the baseline into a verifiable process.
OpenClaw's official security model is built on the fundamental assumption of a "trusted operator boundary" (personal assistant model). It does not natively provide hard isolation against malicious multi-tenancy. If the gateway is exposed to the outside world without hardened protections, the system's chain of trust becomes extremely fragile.
In an agent system, risk does not stem from a single entry point but from the combined pipeline of "Inbound Channel + Routing + Tools + Secrets + Memory." A Minimum Viable Product (MVP) for defense-in-depth can be split into four layers, each with verifiable control points:
openclaw doctor
openclaw status --deep
openclaw channels status --probe
openclaw models status --check
openclaw secrets audit
openclaw security audit
Note: The availability of certain CLI commands (e.g., secrets audit and security audit) depends on your version. If a command is missing, consult the official documentation for your current version or use openclaw --help.
The core of auditing is answering four questions: Who, when, via which entry point, did what—and why did the system allow or deny it? To prevent logs from becoming unsearchable text heaps, it is recommended to abstract key actions into structured audit events with four fixed dimensions:
openclaw logs --follow --json | jq -c 'select(.type=="log") | .log | select(.event=="routed" or .event=="tool_call" or .event=="tool_denied") | {ts, trace_id, event, agentId, channelId, peerId, tool, reason}'
Note: Actual field names should be verified against the raw logs --json output. It is recommended to print a raw log entry without filters first to confirm whether the format uses trace_id or traceId before applying jq filters.
If you find that "the same entry point triggers different capabilities at different times," prioritize checking for multi-account configurations, group chat policy discrepancies, or boundary drift caused by overlapping tool policies.
The goal of secret governance is to limit the usable scope and duration of a credential after a potential leak. OpenClaw configurations support retrieving secrets via ${VAR} or SecretRef objects, avoiding the storage of plaintext keys in configuration files or code repositories.
[!NOTE] OpenClaw reads variable references from process environment variables, the current directory's .env, and ~/.openclaw/.env, following the priority rule that ".env does not override existing environment variables."
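The precedence rule can be illustrated with a small resolver sketch (a hypothetical helper, not OpenClaw code): the key point is that a .env file only fills in variables the process environment does not already define.

```python
import os

def parse_env_file(path: str) -> dict:
    """Minimal KEY=VALUE parser; comments and blank lines are ignored."""
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, val = line.partition("=")
                    values[key.strip()] = val.strip()
    except FileNotFoundError:
        pass
    return values

def resolve_secret(name: str, env_paths: list):
    """Assumed search order: process env, then each .env path in turn."""
    if name in os.environ:       # values already in the process environment win
        return os.environ[name]
    for path in env_paths:       # .env files only fill in missing variables
        values = parse_env_file(path)
        if name in values:
            return values[name]
    return None
```

This ordering means an operator can temporarily override any file-backed secret by exporting the variable before starting the gateway, without editing the .env files.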
The following example shows common secret injection patterns: simple interpolation or using SecretRef to integrate with unified credential pipelines. It is recommended to use different keyIds for different environments during deployment.
JavaScript
{
models: {
providers: {
openai: {
keys: {
// Method 1: String interpolation
prod: { apiKey: "${OPENAI_API_KEY}" },
// Method 2: Secure SecretRef specification
staging: { apiKey: { source: "env", id: "OPENAI_API_KEY_STAGING" } }
}
}
}
}
}
Controlling the blast radius typically requires three "hard rules":
Even with the isolation above, serious engineering disasters can occur if core configuration files (e.g., openclaw.json) are not protected against read/write access by the agent.
[!CAUTION] Beware of "AI Repairing Itself to Failure": In real-world cases, users have granted agents global filesystem read/write permissions. When a minor glitch occurred, the agent triggered a "self-healing" hallucination and began modifying its own openclaw.json.Because the agent did not fully understand the parameter schema, it corrupted the configuration. OpenClaw detected the change and auto-restarted, only to crash immediately due to the config error. If a process supervisor (like systemd) repeatedly restarts it, the system enters an infinite error loop, potentially consuming massive log space or API quotas.
Protection Recommendations:
| Concept | Config Field | Default | Description |
|---|---|---|---|
| Hot Reload Mode | gateway.reload.mode | hybrid | Options: hybrid (auto-detect reload vs restart), hot, restart, off. debounceMs defaults to 300ms. |
In openclaw.json, you can set gateway.reload.mode to 'hot' (apply config changes without restarting) or 'off' (disable file watching entirely). Either prevents a modification from triggering frequent abnormal restarts.
A security baseline that isn't drilled will eventually fail during changes. It is recommended to split verification into two categories:
openclaw doctor
openclaw health --json
openclaw status --deep
openclaw security audit
openclaw secrets reload
Version Tip: The name and parameters for openclaw secrets reload may vary by version. If the command fails, use openclaw --help to get the correct syntax.
When an anomaly is found, use doctor and status to determine whether it is a config, dependency, or runtime-state issue, then use the traceId from the logs to replay and locate the problem. For logs involving sensitive information, ensure redaction is enabled per official security advice.
This chapter has focused on automated operations and security baselines, providing key practices to advance OpenClaw from "functional" to a state that is "controlled, auditable, and replayable."
Only by adding automated operations can the system truly run "unattended." For example:
With the completion of Chapter 8, you have successfully transitioned from an observer to a proficient practitioner of the OpenClaw framework. Over these eight chapters, we have moved from the initial setup of the Five Core Objects to the sophisticated orchestration of Multi-Agent Collaboration and Automated O&M practices. You now possess the foundational knowledge and technical skills to build autonomous "digital employees" that can handle real-world workflows. This concludes the primary guide on the core application and advanced usage of OpenClaw. You are now fully equipped to deploy and optimize your own intelligent agents.
About the author
Sarah Jenkins is a seasoned OpenClaw developer with a strong focus on optimizing high-performance computing solutions. Her work primarily involves crafting efficient parallel algorithms and enhancing GPU acceleration for complex scientific simulations. Jenkins is renowned for her meticulous attention to detail and her ability to translate intricate theoretical concepts into practical, robust OpenClaw implementations.