Self-Host

Neutree Agent Platform — Self-HostInstall the platform on your own Kubernetes cluster, pulling images from public registries

What one install gives you

This is the connected / online installer: the target cluster must be able to reach the public internet. Images are pulled directly from public registries (ghcr.io / docker.io / registry.k8s.io) and prerequisite charts/manifests are fetched from their public sources. There is no offline image bundle, no in-cluster registry, and no host image-loading step. For fully air-gapped sites, a separate offline installer ships an image tarball, an in-cluster registry, and a host-prep step.

Core platform (always installed)

Control plane — agent management, scheduling, user and workspace management
Channel gateway — the entry point for external events (webhooks, Slack, etc.) to reach agents
Data layer — PostgreSQL (CloudNativePG) + shared NFS
Agent workspace runtime — one pod per workspace runs the agent; agents can @ each other, share files, and share a memory store

Optional modules (off by default)

Code Sandbox — lets agents run code and serve temporary web previews. Powered by the third-party OpenSandbox, which you install yourself; the platform points at it via OPENSANDBOX_URL
Remote Browser — lets agents drive a real browser while users watch live over WebRTC. Ships a bundled TURN relay (coturn) and a published headful Chromium image
LDAP — let users sign in with their LDAP account

Prerequisites

Infrastructure

Resource	Requirement	Notes
Kubernetes	v1.28+ (multi-node), or a single k3s node (single-node profile)	3+ workers recommended
Worker nodes	4 vCPU / 8GB RAM minimum	Agent pods are created per workspace dynamically
Public registry access	Nodes can pull from `ghcr.io`, `docker.io`, `registry.k8s.io`	Override `REGISTRY` only to use a mirror
RWX shared storage	A CSI that supports ReadWriteMany (NFS is the most common)	Backs the AFS shared directory, 500Gi by default
RWO volume storage	Any CSI that can run PostgreSQL (Ceph RBD, vSAN, etc.; the same NFS also works)	PostgreSQL data volumes + agent workspace container disks

Network

Item	Requirement
Node IP	At least one worker IP reachable by users (NodePort uses it)
NodePort	3 free ports in 30000–32767: `TOS_NODE_PORT` / `BROWSER_NODE_PORT` / `SANDBOX_NODE_PORT`
TURN ports	When the Remote Browser's TURN relay is enabled: open `3478/tcp+udp` and `49152-49252/udp` on the coturn node
Storage reachability	All nodes can mount the two storage classes above (NFS / block-storage CSI, etc.)
Registry reachability	All nodes can pull images from the public registries

LLM API

The platform does not bundle any model. Depending on the agent types you enable, you must provide protocol-compatible API endpoints:

Agent type	API protocol required
Codex	OpenAI Responses API (note: not Chat Completions)
Claude Code	Anthropic API

If your existing model service only supports the OpenAI Chat Completions API, one option is to put a translating proxy in front of it that converts the OpenAI Chat protocol to the Anthropic protocol, then point Claude Code-style agents at the proxy.

kubeconfig permissions

Installation requires cluster-admin — install.sh touches resources that a namespace-scoped admin cannot (CRDs, webhooks, ClusterRoles, StorageClasses, etc.). You can revoke it immediately after install; at steady state the control plane authenticates via its own in-cluster ServiceAccount with tightly scoped permissions (normal read/write within the namespace + cluster-scoped get/list on nodes only).

The operator's kubeconfig is never mounted into any platform pod. If a temporary cluster-admin is not acceptable, here is an equivalent minimal ClusterRole.

Equivalent minimal ClusterRole

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nap-installer
rules:
  - apiGroups: [apiextensions.k8s.io]
    resources: [customresourcedefinitions]
    verbs: [get, list, watch, create, update, patch, delete]
  - apiGroups: [admissionregistration.k8s.io]
    resources: [validatingwebhookconfigurations, mutatingwebhookconfigurations]
    verbs: [get, list, watch, create, update, patch, delete]
  - apiGroups: [""]
    resources: [namespaces]
    verbs: [get, list, create, update, patch]
  - apiGroups: [rbac.authorization.k8s.io]
    resources: [clusterroles, clusterrolebindings, roles, rolebindings]
    verbs: [get, list, watch, create, update, patch, delete]
  - apiGroups: [storage.k8s.io]
    resources: [storageclasses]
    verbs: [get, list, create, update, patch]
  - apiGroups: [postgresql.cnpg.io]
    resources: ["*"]
    verbs: ["*"]
  - apiGroups: [opensandbox.alibaba.com]
    resources: ["*"]
    verbs: ["*"]
  - apiGroups: ["", apps, batch, networking.k8s.io, policy]
    resources: ["*"]
    verbs: ["*"]
  - apiGroups: [""]
    resources: [nodes]
    verbs: [get, list, watch]

This is still close to cluster-admin in practice (*/* on the core/apps/batch groups), but spelling out the resources makes a security review easier.

Once the prerequisites are in place:

Overview

What one install gives you

Core platform (always installed)

Control plane — agent management, scheduling, user and workspace management
Channel gateway — the entry point for external events (webhooks, Slack, etc.) to reach agents
Data layer — PostgreSQL (CloudNativePG) + shared NFS
Agent workspace runtime — one pod per workspace runs the agent; agents can @ each other, share files, and share a memory store

Optional modules (off by default)

Code Sandbox — lets agents run code and serve temporary web previews. Powered by the third-party OpenSandbox, which you install yourself; the platform points at it via OPENSANDBOX_URL
Remote Browser — lets agents drive a real browser while users watch live over WebRTC. Ships a bundled TURN relay (coturn) and a published headful Chromium image
LDAP — let users sign in with their LDAP account

Prerequisites

Infrastructure

Resource	Requirement	Notes
Kubernetes	v1.28+ (multi-node), or a single k3s node (single-node profile)	3+ workers recommended
Worker nodes	4 vCPU / 8GB RAM minimum	Agent pods are created per workspace dynamically
Public registry access	Nodes can pull from `ghcr.io`, `docker.io`, `registry.k8s.io`	Override `REGISTRY` only to use a mirror
RWX shared storage	A CSI that supports ReadWriteMany (NFS is the most common)	Backs the AFS shared directory, 500Gi by default
RWO volume storage	Any CSI that can run PostgreSQL (Ceph RBD, vSAN, etc.; the same NFS also works)	PostgreSQL data volumes + agent workspace container disks

Network

Item	Requirement
Node IP	At least one worker IP reachable by users (NodePort uses it)
NodePort	3 free ports in 30000–32767: `TOS_NODE_PORT` / `BROWSER_NODE_PORT` / `SANDBOX_NODE_PORT`
TURN ports	When the Remote Browser's TURN relay is enabled: open `3478/tcp+udp` and `49152-49252/udp` on the coturn node
Storage reachability	All nodes can mount the two storage classes above (NFS / block-storage CSI, etc.)
Registry reachability	All nodes can pull images from the public registries

LLM API

The platform does not bundle any model. Depending on the agent types you enable, you must provide protocol-compatible API endpoints:

Agent type	API protocol required
Codex	OpenAI Responses API (note: not Chat Completions)
Claude Code	Anthropic API

kubeconfig permissions

The operator's kubeconfig is never mounted into any platform pod. If a temporary cluster-admin is not acceptable, here is an equivalent minimal ClusterRole.

Equivalent minimal ClusterRole

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nap-installer
rules:
  - apiGroups: [apiextensions.k8s.io]
    resources: [customresourcedefinitions]
    verbs: [get, list, watch, create, update, patch, delete]
  - apiGroups: [admissionregistration.k8s.io]
    resources: [validatingwebhookconfigurations, mutatingwebhookconfigurations]
    verbs: [get, list, watch, create, update, patch, delete]
  - apiGroups: [""]
    resources: [namespaces]
    verbs: [get, list, create, update, patch]
  - apiGroups: [rbac.authorization.k8s.io]
    resources: [clusterroles, clusterrolebindings, roles, rolebindings]
    verbs: [get, list, watch, create, update, patch, delete]
  - apiGroups: [storage.k8s.io]
    resources: [storageclasses]
    verbs: [get, list, create, update, patch]
  - apiGroups: [postgresql.cnpg.io]
    resources: ["*"]
    verbs: ["*"]
  - apiGroups: [opensandbox.alibaba.com]
    resources: ["*"]
    verbs: ["*"]
  - apiGroups: ["", apps, batch, networking.k8s.io, policy]
    resources: ["*"]
    verbs: ["*"]
  - apiGroups: [""]
    resources: [nodes]
    verbs: [get, list, watch]

This is still close to cluster-admin in practice (*/* on the core/apps/batch groups), but spelling out the resources makes a security review easier.

Once the prerequisites are in place:

Configure

Fill in the form for your environment; values.env is previewed live on the right. Everything is processed locally — nothing is uploaded. Secrets are generated with crypto.getRandomValues (equivalent to openssl rand -hex 32). Once a machine-internal secret is set, do not change it on upgrade — otherwise issued session tokens and the existing database become unusable.

Container Registry

Public registry path that holds all first-party images

REGISTRY*

Registry path holding the images, no trailing slash. Default is the official public registry; override only to use a mirror

IMAGE_TAG

Tag for all first-party images. Pin to a release tag for a reproducible install

Cluster & Access

K8s namespace, kubeconfig path, and the IP / NodePort users reach the platform at

NAMESPACE

KUBECONFIG

TOS_HOST*

Required

TOS_NODE_PORT*

Web UI + API. 30000–32767. Still rendered but not exposed when INGRESS_MODE=external

INGRESS_MODE

nodeport = default NodePort exposure; external = Services become ClusterIP and your own ingress fronts the HTTP services

Admin Account

Created by a seed Job on first install. JWT_SECRET signs session tokens

ADMIN_USERNAME

ADMIN_PASSWORD*

Required

ADMIN_DISPLAY_NAME

JWT_SECRET*

Required

CREDENTIAL_ENCRYPTION_KEY*

Required

PostgreSQL

PG_USERNAME

PG_PASSWORD*

Required

PG_INSTANCES

CNPG cluster replica count (including the primary). At least 3 in production

PG_STORAGE_SIZE

Volume size per PostgreSQL instance, e.g. 10Gi / 100Gi

PG_STORAGE_CLASS

StorageClass for PostgreSQL volumes. Leave empty to use the cluster default StorageClass

Shared Storage

AFS backend + cross-node NFS (ReadWriteMany)

NFS_SERVER*

Required

NFS_PATH*

NFS_STORAGE_CLASS

Name of the StorageClass the NFS provisioner creates

AFS_STORAGE_SIZE

Size of the AFS RWX PVC

Agent Runtime

Each workspace pod the control plane spawns dynamically

AGENT_IMAGE_PREFIX

References REGISTRY by default; supply a full prefix to customize

AGENT_IMAGE_TAG

AGENT_STORAGE_CLASS

StorageClass for agent workspace PVCs (ReadWriteOnce is fine). The volume root must be 0777 — the installer's nfs-subdir provisioner (nfs-nap) satisfies this. After install, verify the backend: kubectl get sc <class> -o jsonpath={.provisioner}; if it is nfs.csi.k8s.io / SFS the volume root is 0755 and agents hit mkdir EACCES, requiring mountPermissions:"0777"

AGENT_NODE_SELECTOR

Optional. Format: key1=val1,key2=val2

Optional modules

Code Sandbox

Lets agents run code and serve temporary web previews

Remote Browser

Lets agents drive a real browser while users watch live. The TURN relay is bundled with the browser and enabled together

LDAP

Let users sign in with their LDAP account

values.env preview

# ============================================================================
# Neutree Agent Platform Self-Hosted Deployment Configuration
# Generated by nap.docs.neutree.ai configuration generator
# ============================================================================

# --- Container Registry ---
REGISTRY=ghcr.io/neutree-ai/agent-platform
IMAGE_TAG=latest

# --- Cluster & Access ---
NAMESPACE=nap
KUBECONFIG=./kubeconfig.yaml
TOS_HOST=
TOS_NODE_PORT=30080
INGRESS_MODE=nodeport

# --- Admin Account ---
ADMIN_USERNAME=admin
ADMIN_PASSWORD=
ADMIN_DISPLAY_NAME=Admin
JWT_SECRET=
CREDENTIAL_ENCRYPTION_KEY=

# --- PostgreSQL ---
PG_USERNAME=nap
PG_PASSWORD=
PG_INSTANCES=3
PG_STORAGE_SIZE=10Gi
PG_STORAGE_CLASS=

# --- Shared Storage ---
NFS_SERVER=
NFS_PATH=/data/nap
NFS_STORAGE_CLASS=nfs-nap
AFS_STORAGE_SIZE=500Gi

# --- Agent Runtime ---
AGENT_IMAGE_PREFIX=${REGISTRY}/nap-agent
AGENT_IMAGE_TAG=latest
AGENT_STORAGE_CLASS=nfs-csi
AGENT_NODE_SELECTOR=

# --- Code Sandbox (disabled) ---
SANDBOX_ENABLED=false

# --- Remote Browser + TURN Relay (disabled) ---
BROWSER_ENABLED=false
COTURN_ENABLED=false

# --- LDAP (disabled) ---
LDAP_ENABLED=false

The interactive configuration generator is online at nap.docs.neutree.ai/self-host/#configure. For full field documentation see self-host/values.env.example.

Install

Tools on the operator machine

The host running the installer (distinct from the cluster nodes) needs:

kubectl — a version compatible with the target cluster
envsubst — usually shipped with the gettext package
openssl — used by gen-secrets.sh to generate random secrets
helm 3.x — only needed when the cluster doesn't already have an NFS provisioner; invoked by install.sh's prerequisites stage

The cluster nodes (not the operator machine) must be able to pull from ghcr.io, docker.io, and registry.k8s.io.

Quick start

git clone <this-repo> && cd self-host
cp values.env.example values.env
./gen-secrets.sh                # fills random machine secrets
vi values.env                   # set TOS_HOST, ADMIN_PASSWORD, storage, etc.
./install.sh

When it finishes, open http://<TOS_HOST>:<TOS_NODE_PORT> and log in with the admin username / password from values.env.

Step by step

Get the installer
```
git clone <this-repo> && cd self-host
```
All first-party images are pulled from the public registry (${REGISTRY}, default ghcr.io/neutree-ai/agent-platform); there is no image tarball to load. Override REGISTRY only if you mirror the images elsewhere.
Prepare values.env
We recommend the — fill it in online, download the result, and place it in the self-host/ directory.
You can also edit it on the command line: cp values.env.example values.env, run ./gen-secrets.sh to fill all machine-internal secrets, then vi values.env to set TOS_HOST, the admin password, and storage settings.
Run the installer
```
./install.sh
```
The same command serves first-time install and upgrade; it is idempotent and safe to re-run. It installs prerequisites (the CloudNativePG operator and the NFS subdir provisioner), renders the manifests with your values.env and applies them, then seeds the admin user, OAuth clients, and the MCP catalog via one-shot Jobs. nap-cp runs SQL migrations on startup.
Log in
Open http://<TOS_HOST>:<TOS_NODE_PORT> in a browser and log in with ADMIN_USERNAME and the ADMIN_PASSWORD from values.env.

install.sh subcommands

For running stages separately; a single ./install.sh is enough for the normal case.

./install.sh                  # full: prereqs + manifests + seed
./install.sh --prereqs-only   # only CNPG operator + NFS provisioner
./install.sh --manifests-only # only render + apply k8s manifests
./install.sh --seed-only      # only seed admin / OAuth clients / MCP (K8s Jobs)
./install.sh --render-only    # render manifests to rendered/ without applying

Single-node profile

A single k3s node that pulls every image straight from the public registry — same as the full profile, just with PG_INSTANCES=1 and an in-cluster NFS server for RWX storage (a single node has no external NFS). It does not bring up an in-cluster registry and does not load any tarball.

cp values.env.single-node.example values.env
./gen-secrets.sh
vi values.env                 # set TOS_HOST + ADMIN_PASSWORD
./install.sh --profile=single-node

Run this on a host that has a working k3s with its kubeconfig at /etc/rancher/k3s/k3s.yaml (the default in the single-node example).

Air-gapped sites

This page documents the connected installer. For fully air-gapped / offline sites there is a separate offline installer that ships an image tarball, an in-cluster registry, and a host image-loading step.

Upgrade

Upgrading is the same command as a first install. Pin IMAGE_TAG to the new release tag (or keep latest) in your existing values.env, then re-run:

./install.sh

install.sh is idempotent, so the upgrade path matches the first install. It re-renders and re-applies the manifests and refreshes the first-party deployments to pick up new image digests. SQL migrations run automatically when nap-cp starts.

Do not change secrets · Reuse the values.env from your first install. If a machine-internal secret (e.g. JWT_SECRET) changes, all issued session tokens are invalidated and the existing database can no longer be reached.

Upgrading from a pre-2026-05 release

Optional-module defaults changed from "enabled unless configured" to "disabled unless configured". If the following _ENABLED fields aren't set explicitly in values.env, the corresponding capabilities are off after the upgrade:

Capability	Old default	New default	Keep it on with
Remote Browser (incl. TURN)	On	Off	`BROWSER_ENABLED=true`
Code Sandbox	On	Off	`SANDBOX_ENABLED=true`
LDAP login	Whether `LDAP_URL` is non-empty	Off	`LDAP_ENABLED=true`

COTURN_ENABLED is now part of the browser module and tracks BROWSER_ENABLED automatically — no separate configuration.

Troubleshoot

install.sh fails

First find the deployment that isn't ready (replace $NAMESPACE with NAMESPACE from values.env, default nap):

kubectl -n $NAMESPACE get pods
kubectl -n $NAMESPACE describe pod <not-ready-pod>
kubectl -n $NAMESPACE logs deploy/<deployment>

Common causes:

Images won't pull → confirm the nodes can reach ghcr.io / docker.io / registry.k8s.io. If you mirror images, check REGISTRY and the IMAGE_PULL_SECRET you configured.
PVCs stuck Pending → run kubectl -n $NAMESPACE get pvc and check the StorageClass exists and its provisioner is healthy
PostgreSQL won't start → kubectl -n $NAMESPACE describe cluster.postgresql.cnpg.io nap-pg; the most common cause is the CSI behind PG_STORAGE_CLASS not being writable
NodePort already in use → change TOS_NODE_PORT / BROWSER_NODE_PORT / SANDBOX_NODE_PORT and re-run install.sh

Usually because JWT_SECRET changed during an upgrade — all issued tokens are invalidated. Roll JWT_SECRET in values.env back to its first-install value and re-run ./install.sh.

Cannot reach the platform

The browser gets no response at http://<TOS_HOST>:<TOS_NODE_PORT>. Two common causes:

TOS_HOST is unreachable — the configured IP is not a worker node reachable from the browser. Set the correct node IP and re-run install.sh
NodePort not open — the node firewall blocks the port; ask your SRE to open it

Browser / Sandbox missing after upgrade

Optional-module defaults changed to "disabled unless configured" as of 2026-05. If you previously enabled the browser or sandbox, set BROWSER_ENABLED=true / SANDBOX_ENABLED=true explicitly in values.env. See the compatibility section on the Upgrade tab.

Agent fails to start: `mkdir /workspace/.home/.claude: EACCES`

The agent container runs as a non-root user (node, uid 1000), and /workspace is a mounted PVC. If that PVC is backed by the community nfs.csi.k8s.io driver, that driver does not chmod the provisioned subdirectory by default(per its docs, mountPermissions defaults to 0; chmod only runs when non-zero), so subdirectory permissions come from the NFS server's default mkdir umask — typically root:root 0755, which uid 1000 cannot write to.

Verify on the NFS server:

ls -ld <nfs-share>/pvc-<uuid>
# drwxr-xr-x 1 root root ...   <- 0755, not 0777

Fix: add mountPermissions: "0777" (as a string) to the StorageClass parameters, then delete the failed PVC and let the control plane recreate it. This only affects newly provisioned PVs; existing subdirectories need a manual chmod 0777 on the NFS server.

parameters:
  server: <nfs-server>
  share: <export-path>
  mountPermissions: "0777"

Confirm the StorageClass backend first, then decide how to fix:

kubectl get sc <AGENT_STORAGE_CLASS> -o jsonpath='{.provisioner}{"\n"}'

Returns cluster.local/nfs-subdir-external-provisioner — this is the installer's own provisioner, which mkdir 0777s subdirectories, so this normally doesn't happen; if it still errors, check the actual NFS server permissions.
Returns nfs.csi.k8s.io (or another CSI driver such as SFS) — apply the mountPermissions: "0777" fix above.

Common pitfall: the installer's NFS provisioner step has a "skip if a StorageClass of the same name already exists" check (see install_nfs_provisioner). If a StorageClass named NFS_STORAGE_CLASS (default nfs-nap) already exists before install and is backed by nfs.csi.k8s.io / SFS, the installer silently skips and does not deploy the bundled nfs-subdir provisioner, so agent workspaces land on a 0755 backend and hit this error. In that case kubectl get deploy -n nap nfs-subdir-external-provisioner returns NotFound. Fix either way: add mountPermissions: "0777" to that SC (as above), or delete the pre-existing SC / use a different NFS_STORAGE_CLASS name and re-run the installer so nfs-subdir actually installs.

Browser live view doesn't render

Enable Remote Browser in the configuration generator, set TURN_HOST (a LAN or public IP browsers can reach) and TURN_AUTH_SECRET, and re-run install.sh. The TURN relay is bundled with the browser and starts/stops together with it.