External Agents¶
Overview¶
External agents are long-running processes that connect to JustIAM via a persistent gRPC stream and execute scheduled tasks on behalf of a tenant. Unlike the built-in inline worker (which runs scripts inside the backend process), agents:
- Run outside the JustIAM cluster — on-premises, in a private cloud, or in any Kubernetes namespace you control.
- Require no inbound connectivity — the agent dials out to the backend; no port forwarding or firewall rules are needed.
- Hold no credentials — task secrets and config are pre-resolved by the backend and injected into each `TaskAssignment` message. The agent never touches your database.
This makes agents ideal for tasks that need access to internal infrastructure (private APIs, on-premises directories, VPN-only services) without exposing those resources to the JustIAM cluster.
Architecture¶
┌─────────────────────────────────────────────────┐
│ JustIAM cluster │
│ │
│ ┌─────────┐ gRPC stream ┌──────────────┐ │
│ │ Backend │◄───────────────┤ agent binary │ │
│ │ :9090 │ TaskAssignment│ │ │
│ │ │───────────────►│ (anywhere) │ │
│ └─────────┘ TaskResult └──────────────┘ │
└─────────────────────────────────────────────────┘
- The agent connects to `BACKEND_GRPC_ADDR` (gRPC, TLS or plaintext).
- On connect it sends a `RegisterAgent` message containing its slug, worker type, and max concurrency.
- The backend dispatches `TaskAssignment` messages to the agent. The assignment includes the compiled script plus injected secrets and config — everything needed to run without database access.
- The agent executes the script and sends back a `TaskResult` (output + status).
- A heartbeat ping is exchanged every 20 s to keep the stream alive through load balancers and firewalls.
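The flow above can be sketched as a protobuf service definition. This is an illustrative sketch only: the message and field names below are assumptions, not the agent's shipped `.proto` schema.

```protobuf
syntax = "proto3";

// Sketch of the agent <-> backend stream. Names are illustrative assumptions.
service AgentService {
  // One bidirectional stream carries everything: the agent sends
  // RegisterAgent and TaskResult, the backend sends TaskAssignment.
  rpc Connect(stream AgentMessage) returns (stream ServerMessage);
}

message RegisterAgent {
  string slug = 1;           // tenant slug, or "*" for shared agents
  string worker_type = 2;    // e.g. "task"
  int32 max_concurrent = 3;  // concurrency advertised to the dispatcher
}

message TaskAssignment {
  string task_run_id = 1;
  string script = 2;         // compiled script, secrets/config pre-injected
}

message TaskResult {
  string task_run_id = 1;
  string status = 2;         // "success" or "error"
  string output = 3;
}

message AgentMessage {
  oneof msg {
    RegisterAgent register = 1;
    TaskResult result = 2;
  }
}

message ServerMessage {
  oneof msg {
    TaskAssignment assignment = 1;
  }
}
```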
Agent tokens¶
Each agent authenticates with a per-tenant agent token. Tokens are managed from Infrastructure → Agent Tokens in the admin UI, or via the API / Terraform.
Create via admin UI:
- Go to Infrastructure → Agent Tokens.
- Click New Token, enter a descriptive name (e.g. `on-prem-agent`).
- Copy the token value — it is shown once and not stored in plain text.
List via API:
Returns a paginated list of active tokens. Response shape:
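The request example was lost here; the sketch below assumes the list endpoint mirrors the create endpoint, and the response field names are assumptions inferred from the create response (raw token values are never returned):

```http
GET /api/v1/agent-tokens
Authorization: Bearer <admin-token>
```

```json
{
  "items": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "name": "on-prem-agent",
      "created_at": "2026-04-21T00:00:00Z"
    }
  ],
  "next_page_token": ""
}
```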
Create via API:
POST /api/v1/agent-tokens
Authorization: Bearer <admin-token>
Content-Type: application/json
{"name": "on-prem-agent"}
Response:
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "on-prem-agent",
"token": "agent_fe04c34fe8070a1d8face1d4787197d03b03cfee25ed9be4dbe9509d50f91024",
"created_at": "2026-04-21T00:00:00Z"
}
Create via Terraform:
resource "justiam_agent_token" "on_prem" {
name = "on-prem-agent"
}
# Store the raw token in a secret manager or output it once:
output "agent_token" {
value = justiam_agent_token.on_prem.token
sensitive = true
}
Enabling agent execution for a tenant¶
Set worker_mode = agent in the tenant's configuration:
Kubernetes Secret (justiam-tenant-<slug>):
apiVersion: v1
kind: Secret
metadata:
name: justiam-tenant-acme
namespace: justiam-mt
type: Opaque
stringData:
db_url: "postgres://acme:secret@postgres:5432/acme?sslmode=require"
jwt_secret: "long-random-string"
worker_mode: "agent"
worker_agent_pool: "dedicated" # "shared" or "dedicated"
Agent pool types¶
| Pool | Description |
|---|---|
| `shared` | Tasks are dispatched to the platform-wide shared agent pool. Individual agent instances are not exposed per-tenant. |
| `dedicated` | Tasks are dispatched only to agents whose `TENANT_SLUG` matches this tenant's slug. |
When worker_mode = agent:
- Scheduled tasks are dispatched according to `worker_agent_pool` (see table above).
- If no agent with capacity is available the task run is marked error immediately (no silent fallback to inline execution).
Running an agent¶
Binary environment variables¶
| Variable | Default | Description |
|---|---|---|
| `AGENT_TOKEN` | (required for dedicated agents) | Token created in Infrastructure → Agent Tokens. Not needed when `SHARED_AGENT_PLATFORM_KEY` is set. |
| `SHARED_AGENT_PLATFORM_KEY` | (unset) | Hex-encoded 32-byte HMAC key for shared agents. When set, the agent self-registers and `AGENT_TOKEN` is not required. |
| `AGENT_NAME` | `POD_NAME` or hostname | Stable, human-readable agent identity shown in the Workers UI and stored in `agent_connections`. Set to `metadata.name` via `fieldRef` in Kubernetes; dedicated agents provisioned by the controlplane use the node name (e.g. `default`). |
| `BACKEND_GRPC_ADDR` | `localhost:9090` | `host:port` of the backend gRPC server. |
| `BACKEND_URL` | `http://localhost:8080` | HTTP base URL for IDP API calls made by task scripts (`idp.*` helpers). |
| `GRPC_TLS` | `auto` | `true` / `false` / `auto` — `auto` enables TLS when the address ends in `:443`. |
| `TENANT_SLUG` | `*` | Tenant this agent serves. `*` means cross-tenant (shared). |
| `TENANT_HOST_SUFFIX` | `justiam.com` | Domain suffix used to build the HTTP Host header, e.g. `acme.justiam.com`. |
| `WORKER_TYPE` | `task` | Worker category — must match the tenant's `worker_type` (currently `task`). |
| `MAX_CONCURRENT` | `4` | Maximum number of tasks running simultaneously on this agent. |
| `AGENT_ID_FILE` | `/data/agent-id` | Path to persist the stable agent UUID across restarts. |
| `VERSION` | (injected by ldflags) | Version string reported to the backend and shown in the Workers UI. Set automatically when building with `make build-agent VERSION=x.y.z`. |
| `OTEL_ENABLED` | `false` | Set to `true` to export execution spans to Tempo via the backend relay. |
| `OTEL_SERVICE_NAME` | `justiam-agent` | Service name shown in Grafana traces (e.g. `agent-acme-default`). |
| `OTEL_RESOURCE_ATTRIBUTES` | (unset) | Extra resource attributes, e.g. `k8s.namespace.name=justiam-mt` — required for Loki related-logs links in Tempo. |
| `OTEL_SAMPLE_RATIO` | `1.0` | Fraction of traces to sample (0.0–1.0). |
Kubernetes deployment — tenant-specific agent¶
apiVersion: apps/v1
kind: Deployment
metadata:
name: agent-acme
namespace: acme-infra # your own namespace, outside the JustIAM cluster
spec:
replicas: 1
selector:
matchLabels:
app: justiam-agent
template:
metadata:
labels:
app: justiam-agent
spec:
volumes:
- name: agent-id
emptyDir: {}
containers:
- name: agent
image: marcportabellaclotet/justiam-mt-agent:latest
imagePullPolicy: Always
volumeMounts:
- name: agent-id
mountPath: /data
env:
- name: AGENT_TOKEN
valueFrom:
secretKeyRef:
name: justiam-agent-acme
key: agent_token
        - name: BACKEND_GRPC_ADDR
          value: "grpc.justiam.com:443"
- name: BACKEND_URL
value: "https://acme.justiam.com"
- name: TENANT_SLUG
value: "acme"
- name: MAX_CONCURRENT
value: "4"
- name: AGENT_ID_FILE
value: "/data/agent-id"
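The Deployment above reads the token from a `justiam-agent-acme` Secret in the same namespace, which is not shown. A minimal sketch, assuming the Secret name and key match the `secretKeyRef` used above:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: justiam-agent-acme
  namespace: acme-infra
type: Opaque
stringData:
  # Raw token copied from Infrastructure → Agent Tokens (shown once at creation).
  agent_token: "agent_..."
```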
Kubernetes deployment — shared (cross-tenant) agent¶
A shared agent serves tasks for any tenant that has worker_mode = agent but no
dedicated agent available. Shared agents use HMAC self-registration — they
obtain a short-lived token automatically at startup rather than requiring a static
AGENT_TOKEN secret.
Initial setup (one-time)¶
In the controlplane UI go to Infrastructure → Shared Agent Pool → Initialise. This will:
- Generate a 32-byte platform key and store it in a `justiam-agent-platform` Kubernetes Secret.
- Create the `agent-shared` Deployment (2 replicas, auto-scaling configured).
- Create a HorizontalPodAutoscaler.
Alternatively use the API:
POST /api/shared-agent-pool
Authorization: Bearer <controlplane-session>
Content-Type: application/json
{
"min_replicas": 2,
"max_replicas": 10,
"cpu_target_pct": 70
}
Manual Kubernetes deployment¶
If you manage the Deployment yourself, use SHARED_AGENT_PLATFORM_KEY instead of
AGENT_TOKEN. The AGENT_NAME is used as the stable pod identity in the platform DB:
env:
- name: SHARED_AGENT_PLATFORM_KEY
valueFrom:
secretKeyRef:
name: justiam-agent-platform
key: platform_key
- name: AGENT_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: BACKEND_GRPC_ADDR
value: "backend.justiam-mt.svc:9090"
- name: BACKEND_URL
value: "http://backend.justiam-mt.svc:8080"
- name: TENANT_HOST_SUFFIX
value: "justiam.com"
# TENANT_SLUG omitted → defaults to "*"
- name: MAX_CONCURRENT
value: "4"
How self-registration works¶
- On startup the agent calls `POST /api/agent/v1/register` signed with `HMAC-SHA256(platform_key, "agent_register:<name>:<unix_timestamp>")`.
- The backend verifies the signature and timestamp window (±5 min) and inserts a short-lived token (4 h TTL) into `shared_agent_tokens`.
- The agent uses the token for gRPC auth. At ~75% of TTL (3 h) it renews automatically in a background goroutine — no restarts needed.
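The registration signature can be reproduced in a few lines of Go. This is a sketch under the message layout stated above (`agent_register:<name>:<unix_timestamp>`, hex-encoded HMAC-SHA256); the helper name is ours, not the agent's:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// signRegistration builds the self-registration signature:
// hex(HMAC-SHA256(platform_key, "agent_register:<name>:<unix_timestamp>")).
func signRegistration(platformKey []byte, agentName string, ts int64) string {
	mac := hmac.New(sha256.New, platformKey)
	fmt.Fprintf(mac, "agent_register:%s:%d", agentName, ts)
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	// The platform key is the hex-decoded SHARED_AGENT_PLATFORM_KEY value.
	key, _ := hex.DecodeString("00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff")
	sig := signRegistration(key, "agent-shared-0", time.Now().Unix())
	fmt.Println(sig) // 64 hex characters
}
```

The backend recomputes the same HMAC and rejects requests whose timestamp falls outside the ±5 min window, so agent clocks must be roughly in sync.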
Key rotation¶
Rotate the platform key from the controlplane UI (Infrastructure → Shared Agent Pool → Rotate Key) or via the API.
This generates a new key, updates the justiam-agent-platform Secret, and triggers
a graceful rolling restart of the agent-shared Deployment (respects
maxUnavailable: 1). In-flight tasks on surviving pods complete normally; each
restarted pod immediately registers with the new key.
Note: When `BACKEND_GRPC_ADDR` points to an internal cluster hostname (no TLS), set `GRPC_TLS=false` explicitly. When using the public gRPC ingress (`:443`), TLS is enabled automatically.
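The `auto` behaviour amounts to a suffix check on the address. A minimal sketch of the decision in Go (the function name is ours, not the agent's):

```go
package main

import (
	"fmt"
	"strings"
)

// useTLS mirrors the documented GRPC_TLS semantics: "true" and "false" are
// explicit overrides; anything else ("auto") enables TLS only when the
// address ends in ":443", i.e. the public gRPC ingress.
func useTLS(grpcTLS, addr string) bool {
	switch grpcTLS {
	case "true":
		return true
	case "false":
		return false
	default: // "auto"
		return strings.HasSuffix(addr, ":443")
	}
}

func main() {
	fmt.Println(useTLS("auto", "grpc.justiam.com:443"))        // true
	fmt.Println(useTLS("auto", "backend.justiam-mt.svc:9090")) // false
}
```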
Viewing connected agents¶
Open Workers in the admin UI to see:
- The tenant's current worker mode and live connected agents
- All connected agents with agent ID, slug, worker type, max concurrency, and active task count
Or query the API directly.
Selecting agents for tasks (dispatch algorithm)¶
When a task is due the backend selects the least-loaded eligible agent within the configured pool:
Dedicated pool (worker_agent_pool: dedicated)¶
- Only tenant-specific agents (`TENANT_SLUG = <slug>`) are considered.
- The agent with the lowest active-task count is chosen (reservoir sampling on ties).
- If no dedicated agent has capacity the task run is marked error (no silent fallback).
Shared pool (worker_agent_pool: shared)¶
- Only shared agents (`TENANT_SLUG = *`) are considered.
- The least-loaded eligible agent is chosen.
Global concurrency cap¶
task_max_concurrent sets an optional cross-replica cap on the total number of concurrently running tasks for the tenant across all pools (0 = unlimited). Configure it from the controlplane UI or via PUT /tenants/{slug}/worker-limit.
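A sketch of the API call, assuming the JSON body names the field after the setting (verify the field name against the API reference):

```http
PUT /tenants/acme/worker-limit
Authorization: Bearer <controlplane-session>
Content-Type: application/json

{"task_max_concurrent": 8}
```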
Multi-replica accuracy: Agent connections are persisted to each tenant's database (`agent_connections` table). The Workers page always shows all connected agents across every backend replica, not just those on the replica serving the current HTTP request.
Security considerations¶
- Agent tokens are stored as SHA-256 hashes — the raw token cannot be recovered from the database.
- Revoke a token immediately from Infrastructure → Agent Tokens → Revoke if it is compromised. Connected agents using that token are rejected on their next registration attempt.
- Task scripts run inside the Yaegi interpreter with the same package allowlist as the inline worker (`fmt`, `strings`, `time`, standard library — no `os/exec`, `syscall`, or `net` raw sockets).
- The gRPC stream is protected by TLS when connecting through a public ingress. Inside the cluster (agent ↔ backend) plaintext is acceptable since the traffic stays on the private network.
Backend gRPC server configuration¶
| Variable | Default | Description |
|---|---|---|
| `GRPC_PORT` | `9090` | Port the backend listens on for agent connections. Set to empty to disable the gRPC server entirely. |
The gRPC server sends keepalive PINGs every 30 s and enforces a minimum client ping interval of 15 s. Agents send PINGs every 20 s. These defaults survive most load balancer idle timeouts.