External Agents

Overview

External agents are long-running processes that connect to JustIAM via a persistent gRPC stream and execute scheduled tasks on behalf of a tenant. Unlike the built-in inline worker (which runs scripts inside the backend process), agents:

  • Run outside the JustIAM cluster — on-premises, in a private cloud, or in any Kubernetes namespace you control.
  • Require no inbound connectivity — the agent dials out to the backend; no port forwarding or firewall rules are needed.
  • Hold no credentials — task secrets and config are pre-resolved by the backend and injected into each TaskAssignment message. The agent never touches your database.

This makes agents ideal for tasks that need access to internal infrastructure (private APIs, on-premises directories, VPN-only services) without exposing those resources to the JustIAM cluster.


Architecture

  ┌───────────────────────┐
  │  JustIAM cluster      │
  │                       │      gRPC stream       ┌──────────────┐
  │  ┌─────────┐          │◄───────────────────────┤ agent binary │
  │  │ Backend │          │     TaskAssignment     │              │
  │  │ :9090   │          │───────────────────────►│  (anywhere)  │
  │  └─────────┘          │      TaskResult        └──────────────┘
  └───────────────────────┘
  1. The agent connects to BACKEND_GRPC_ADDR (gRPC, TLS or plaintext).
  2. On connect it sends a RegisterAgent message containing its slug, worker type, and max concurrency.
  3. The backend dispatches TaskAssignment messages to the agent. The assignment includes the compiled script, injected secrets and config — everything needed to run without database access.
  4. The agent executes the script and sends back a TaskResult (output + status).
  5. A heartbeat ping is exchanged every 20 s to keep the stream alive through load balancers and firewalls.
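The register/assign/result exchange above can be sketched with simplified stand-in types. These structs and field names are illustrative, not the real protobuf definitions:

```go
package main

import "fmt"

// Simplified stand-ins for the real protobuf messages (illustrative only).
type RegisterAgent struct {
	Slug          string // tenant slug, or "*" for shared agents
	WorkerType    string // e.g. "task"
	MaxConcurrent int
}

type TaskAssignment struct {
	TaskID  string
	Script  string            // compiled script, ready to run
	Secrets map[string]string // pre-resolved by the backend
}

type TaskResult struct {
	TaskID string
	Status string // "success" or "error"
	Output string
}

// execute runs one assignment and produces the result sent back on the stream.
func execute(a TaskAssignment) TaskResult {
	// A real agent would run a.Script in the Yaegi interpreter here.
	return TaskResult{TaskID: a.TaskID, Status: "success", Output: "done"}
}

func main() {
	reg := RegisterAgent{Slug: "acme", WorkerType: "task", MaxConcurrent: 4}
	fmt.Printf("register: %+v\n", reg)

	res := execute(TaskAssignment{TaskID: "t-1", Script: "..."})
	fmt.Printf("result: %+v\n", res)
}
```

The key property to notice: TaskAssignment carries everything the script needs, so the agent holds no credentials of its own.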

Agent tokens

Each agent authenticates with a per-tenant agent token. Tokens are managed from Infrastructure → Agent Tokens in the admin UI, or via the API / Terraform.

Create via admin UI:

  1. Go to Infrastructure → Agent Tokens.
  2. Click New Token, enter a descriptive name (e.g. on-prem-agent).
  3. Copy the token value — it is shown once and not stored in plain text.

List via API:

GET /api/v1/agent-tokens?page=1&limit=25&search=on-prem
Authorization: Bearer <admin-token>

Returns a paginated list of active tokens. Response shape:

{
  "data": [ /* AgentTokenRecord objects */ ],
  "total": 3,
  "page": 1,
  "limit": 25
}

Create via API:

POST /api/v1/agent-tokens
Authorization: Bearer <admin-token>
Content-Type: application/json

{"name": "on-prem-agent"}

Response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "on-prem-agent",
  "token": "agent_fe04c34fe8070a1d8face1d4787197d03b03cfee25ed9be4dbe9509d50f91024",
  "created_at": "2026-04-21T00:00:00Z"
}

Create via Terraform:

resource "justiam_agent_token" "on_prem" {
  name = "on-prem-agent"
}

# Store the raw token in a secret manager or output it once:
output "agent_token" {
  value     = justiam_agent_token.on_prem.token
  sensitive = true
}

Enabling agent execution for a tenant

Set worker_mode = agent in the tenant's configuration:

Kubernetes Secret (justiam-tenant-<slug>):

apiVersion: v1
kind: Secret
metadata:
  name: justiam-tenant-acme
  namespace: justiam-mt
type: Opaque
stringData:
  db_url: "postgres://acme:secret@postgres:5432/acme?sslmode=require"
  jwt_secret: "long-random-string"
  worker_mode: "agent"
  worker_agent_pool: "dedicated"   # "shared" or "dedicated"

Agent pool types

  Pool        Description
  shared      Tasks are dispatched to the platform-wide shared agent pool. Individual agent instances are not exposed per-tenant.
  dedicated   Tasks are dispatched only to agents whose TENANT_SLUG matches this tenant's slug.

When worker_mode = agent:

  • Scheduled tasks are dispatched according to worker_agent_pool (see table above).
  • If no agent with capacity is available the task run is marked error immediately (no silent fallback to inline execution).

Running an agent

Binary environment variables

Variable Default Description
AGENT_TOKEN (required for dedicated agents) Token created in Infrastructure → Agent Tokens. Not needed when SHARED_AGENT_PLATFORM_KEY is set.
SHARED_AGENT_PLATFORM_KEY (unset) Hex-encoded 32-byte HMAC key for shared agents. When set, the agent self-registers and AGENT_TOKEN is not required.
AGENT_NAME POD_NAME or hostname Stable, human-readable identity shown in the Workers UI and stored in agent_connections. Set to metadata.name via fieldRef in k8s; for dedicated agents provisioned by the controlplane it is set to the node name (e.g. default).
BACKEND_GRPC_ADDR localhost:9090 host:port of the backend gRPC server
BACKEND_URL http://localhost:8080 HTTP base URL for IDP API calls made by task scripts (idp.* helpers)
GRPC_TLS auto true / false / auto; auto enables TLS when the address ends in :443
TENANT_SLUG * Tenant this agent serves. * means cross-tenant (shared).
TENANT_HOST_SUFFIX justiam.com Domain suffix used to build the HTTP Host header, e.g. acme.justiam.com
WORKER_TYPE task Worker category; must match the tenant's worker_type (currently task)
MAX_CONCURRENT 4 Maximum number of tasks running simultaneously on this agent
AGENT_ID_FILE /data/agent-id Path to persist the stable agent UUID across restarts
VERSION injected by ldflags Version string reported to the backend and shown in the Workers UI. Automatically set when building with make build-agent VERSION=x.y.z.
OTEL_ENABLED false Set to true to export execution spans to Tempo via the backend relay
OTEL_SERVICE_NAME justiam-agent Service name shown in Grafana traces (e.g. agent-acme-default)
OTEL_RESOURCE_ATTRIBUTES (unset) Extra resource attributes, e.g. k8s.namespace.name=justiam-mt; required for Loki related-logs links in Tempo
OTEL_SAMPLE_RATIO 1.0 Fraction of traces to sample (0.0–1.0)

Kubernetes deployment — tenant-specific agent

apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-acme
  namespace: acme-infra   # your own namespace, outside the JustIAM cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: justiam-agent
  template:
    metadata:
      labels:
        app: justiam-agent
    spec:
      volumes:
        - name: agent-id
          emptyDir: {}
      containers:
        - name: agent
          image: marcportabellaclotet/justiam-mt-agent:latest
          imagePullPolicy: Always
          volumeMounts:
            - name: agent-id
              mountPath: /data
          env:
            - name: AGENT_TOKEN
              valueFrom:
                secretKeyRef:
                  name: justiam-agent-acme
                  key: agent_token
            - name: BACKEND_GRPC_ADDR
              value: "grpc.justiam.com:443"
            - name: BACKEND_URL
              value: "https://acme.justiam.com"
            - name: TENANT_SLUG
              value: "acme"
            - name: MAX_CONCURRENT
              value: "4"
            - name: AGENT_ID_FILE
              value: "/data/agent-id"

Kubernetes deployment — shared (cross-tenant) agent

A shared agent serves tasks for any tenant that has worker_mode = agent but no dedicated agent available. Shared agents use HMAC self-registration — they obtain a short-lived token automatically at startup rather than requiring a static AGENT_TOKEN secret.

Initial setup (one-time)

In the controlplane UI go to Infrastructure → Shared Agent Pool → Initialise. This will:

  1. Generate a 32-byte platform key and store it in a justiam-agent-platform Kubernetes Secret.
  2. Create the agent-shared Deployment (2 replicas, auto-scaling configured).
  3. Create a HorizontalPodAutoscaler.

Alternatively use the API:

POST /api/shared-agent-pool
Authorization: Bearer <controlplane-session>
Content-Type: application/json

{
  "min_replicas": 2,
  "max_replicas": 10,
  "cpu_target_pct": 70
}

Manual Kubernetes deployment

If you manage the Deployment yourself, use SHARED_AGENT_PLATFORM_KEY instead of AGENT_TOKEN. The AGENT_NAME is used as the stable pod identity in the platform DB:

env:
  - name: SHARED_AGENT_PLATFORM_KEY
    valueFrom:
      secretKeyRef:
        name: justiam-agent-platform
        key: platform_key
  - name: AGENT_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: BACKEND_GRPC_ADDR
    value: "backend.justiam-mt.svc:9090"
  - name: BACKEND_URL
    value: "http://backend.justiam-mt.svc:8080"
  - name: TENANT_HOST_SUFFIX
    value: "justiam.com"
  # TENANT_SLUG omitted → defaults to "*"
  - name: MAX_CONCURRENT
    value: "4"

How self-registration works

  1. On startup the agent calls POST /api/agent/v1/register signed with HMAC-SHA256(platform_key, "agent_register:<name>:<unix_timestamp>").
  2. The backend verifies the signature and timestamp window (±5 min) and inserts a short-lived token (4h TTL) into shared_agent_tokens.
  3. The agent uses the token for gRPC auth. At ~75% of TTL (3 h) it renews automatically in a background goroutine — no restarts needed.
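The signature in step 1 can be computed with the standard library. The payload format comes from the description above; the hex key decoding and parameter names here are assumptions:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// signRegistration computes HMAC-SHA256(platform_key,
// "agent_register:<name>:<unix_timestamp>") and returns it hex-encoded.
func signRegistration(platformKeyHex, agentName string, ts int64) (string, error) {
	key, err := hex.DecodeString(platformKeyHex)
	if err != nil {
		return "", err
	}
	mac := hmac.New(sha256.New, key)
	fmt.Fprintf(mac, "agent_register:%s:%d", agentName, ts)
	return hex.EncodeToString(mac.Sum(nil)), nil
}

func main() {
	sig, _ := signRegistration(
		"00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff",
		"agent-shared-abc12", time.Now().Unix())
	fmt.Println("signature:", sig)
}
```

Because the timestamp is part of the signed payload, a captured request is only replayable inside the ±5 min verification window.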

Key rotation

Rotate the platform key from the controlplane UI (Infrastructure → Shared Agent Pool → Rotate Key) or via the API:

POST /api/shared-agent-pool/rotate-key
Authorization: Bearer <controlplane-session>

This generates a new key, updates the justiam-agent-platform Secret, and triggers a graceful rolling restart of the agent-shared Deployment (respects maxUnavailable: 1). In-flight tasks on surviving pods complete normally; each restarted pod immediately registers with the new key.

Note: When BACKEND_GRPC_ADDR points to an internal cluster hostname (no TLS), set GRPC_TLS=false explicitly. When using the public gRPC ingress (:443), TLS is enabled automatically.
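The GRPC_TLS resolution rule described in the note can be sketched as a small decision function (illustrative; the real binary may parse the setting differently):

```go
package main

import (
	"fmt"
	"strings"
)

// useTLS resolves the GRPC_TLS setting: explicit true/false wins; "auto"
// (the default) enables TLS only when the address ends in ":443".
func useTLS(grpcTLS, addr string) bool {
	switch grpcTLS {
	case "true":
		return true
	case "false":
		return false
	default: // "auto" or unset
		return strings.HasSuffix(addr, ":443")
	}
}

func main() {
	fmt.Println(useTLS("auto", "grpc.justiam.com:443"))       // public ingress: TLS on
	fmt.Println(useTLS("auto", "backend.justiam-mt.svc:9090")) // in-cluster: plaintext
}
```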


Viewing connected agents

Open Workers in the admin UI to see:

  • Current worker mode and live connected agents
  • All connected agents with agent ID, slug, worker type, max concurrency, and active task count

Or query the API directly:

GET /api/v1/workers
Authorization: Bearer <admin-token>

Selecting agents for tasks (dispatch algorithm)

When a task is due the backend selects the least-loaded eligible agent within the configured pool:

Dedicated pool (worker_agent_pool: dedicated)

  1. Only tenant-specific agents (TENANT_SLUG = <slug>) are considered.
  2. The agent with the lowest active-task count is chosen (reservoir sampling on ties).
  3. If no dedicated agent has capacity the task run is marked error (no silent fallback).

Shared pool (worker_agent_pool: shared)

  1. Only shared agents (TENANT_SLUG = *) are considered.
  2. The least-loaded eligible agent is chosen.

Global concurrency cap

task_max_concurrent sets an optional cross-replica cap on the total number of concurrently running tasks for the tenant across all pools (0 = unlimited). Configure it from the controlplane UI or via PUT /tenants/{slug}/worker-limit.
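The cap semantics reduce to a one-line check (illustrative; the real check is enforced across all backend replicas, not in a single process):

```go
package main

import "fmt"

// allowDispatch applies the tenant-wide cap: task_max_concurrent = 0 means
// unlimited; otherwise dispatch only while running < cap.
func allowDispatch(running, maxConcurrent int) bool {
	return maxConcurrent == 0 || running < maxConcurrent
}

func main() {
	fmt.Println(allowDispatch(50, 0))  // true: 0 = unlimited
	fmt.Println(allowDispatch(9, 10))  // true: under the cap
	fmt.Println(allowDispatch(10, 10)) // false: cap reached
}
```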

Multi-replica accuracy: Agent connections are persisted to each tenant's database (agent_connections table). The Workers page always shows all connected agents across every backend replica, not just those on the replica serving the current HTTP request.


Security considerations

  • Agent tokens are stored as SHA-256 hashes — the raw token cannot be recovered from the database.
  • Revoke a token immediately from Infrastructure → Agent Tokens → Revoke if it is compromised. Connected agents using that token are rejected on their next registration attempt.
  • Task scripts run inside the Yaegi interpreter with the same package allowlist as the inline worker (fmt, strings, time, and other standard-library packages; no os/exec, syscall, or raw network sockets).
  • The gRPC stream is protected by TLS when connecting through a public ingress. Inside the cluster (agent ↔ backend) plaintext is acceptable since the traffic is within the private network.
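The hash-only storage in the first bullet means token verification works by re-hashing the presented value and comparing digests. A minimal sketch of that path (the storage and comparison details are assumptions):

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// hashToken returns the hex SHA-256 digest that would be stored at creation
// time; the raw token itself is never persisted.
func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// verifyToken hashes the presented token and compares digests in constant time.
func verifyToken(presented, storedHash string) bool {
	h := hashToken(presented)
	return subtle.ConstantTimeCompare([]byte(h), []byte(storedHash)) == 1
}

func main() {
	stored := hashToken("agent_fe04...") // what the database holds
	fmt.Println(verifyToken("agent_fe04...", stored)) // true
	fmt.Println(verifyToken("wrong-token", stored))   // false
}
```

Because only the digest is stored, the "shown once" rule in the admin UI is a hard guarantee: a lost raw token can only be replaced, never recovered.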

Backend gRPC server configuration

Variable Default Description
GRPC_PORT 9090 Port the backend listens on for agent connections. Set to empty to disable the gRPC server entirely.

The gRPC server sends keepalive PINGs every 30 s and enforces a minimum client ping interval of 15 s. Agents send PINGs every 20 s. These defaults survive most load balancer idle timeouts.