CLI Reference

The Graphora CLI is a single binary that covers the full extract → inspect → compare → regression-test loop. It works in two modes: embedded (runs the extraction pipeline locally — only a Gemini API key required) and remote (talks to a Graphora API server, hosted or self-deployed). This page is the authoritative command reference. For end-to-end walkthroughs see the Platform Quickstart and Local Dev Guide.

Installation

pip install "graphora[cli]"

This installs both the graphora CLI and the Python client library. Requires Python 3.9 or higher. To install just the client library without the CLI, omit the [cli] extra. Verify:

graphora --version

Quick start

# 1. One-time setup (get a free Gemini key from Google AI Studio first)
graphora config init --api-key "$GEMINI_API_KEY"

# 2. Extract a graph (schema auto-inferred)
graphora extract report.pdf -o graph.json

# 3. (Remote mode only) inspect the result
graphora explain tx_abc123 --node-id n_alice

Commands at a glance

Command	Purpose
`extract`	Run extraction over one or more documents
`schema infer`	Generate an ontology from documents without extracting
`schema validate`	Lint an ontology YAML file
`explain`	Show the evidence and decisions behind a node or edge
`diff`	Compare two transforms — added / removed / changed
`scenario`	Create and inspect named graph snapshots
`test`	Golden-corpus regression runner with F1 gating
`install`	Wire Graphora’s MCP server into an agent client (Cursor, Claude Desktop, etc.)
`config`	Manage `~/.graphora/config.yaml`
`status`	Show current configuration and mode availability
`update`	Update the embedded-mode `graphora-api` binary
`init`	Shortcut for `graphora config init`
`version`	Print the CLI version

Configuration

Config is stored at ~/.graphora/config.yaml. The CLI also reads several environment variables — see Environment variables below.

`graphora config init`

Interactive setup. For embedded mode, graphora-api is auto-downloaded and cached on first use; you only need a Gemini API key.

graphora config init --api-key "$GEMINI_API_KEY"

# Remote mode against a hosted server
graphora config init --mode remote \
  --api-url "https://api.graphora.io" \
  --auth-token "$CLERK_JWT"

# Local development against your own graphora-api clone
graphora config init --api-path /path/to/graphora-api

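Option	Description
`--mode`, `-m`	`embedded` (default) or `remote`
`--api-key`, `-k`	LLM API key (embedded mode)
`--api-path`, `-p`	Custom path to a local `graphora-api` clone
`--api-url`	Base URL of the Graphora API server (remote mode)
`--auth-token`	Auth token for the Graphora API server (remote mode)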

`graphora config set`

graphora config set llm.api_key "your-key"
graphora config set llm.model "gemini-1.5-pro"
graphora config set defaults.mode "remote"
graphora config set api.url "https://api.graphora.io"

`graphora config get` / `show` / `path`

graphora config get llm.model          # one value
graphora config show                   # all values, sensitive fields masked
graphora config path                   # print the config file path
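
The dotted keys used by config set and config get suggest nested mappings in config.yaml. The sketch below illustrates that idea on a plain dictionary. It is not Graphora's implementation: the helper names, the nested layout, and the set of field names treated as sensitive are all assumptions made for illustration.

```python
# Assumed names of sensitive leaf keys; the real list is not documented here.
SENSITIVE_LEAVES = {"api_key", "auth_token"}


def set_value(config, dotted_key, value):
    """Set a dotted key such as "llm.model", creating parent mappings as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def get_value(config, dotted_key, default=None):
    """Look up a dotted key, returning `default` when any segment is missing."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def masked(config):
    """Return a copy of the config with sensitive leaf values hidden."""
    out = {}
    for key, value in config.items():
        if isinstance(value, dict):
            out[key] = masked(value)
        elif key in SENSITIVE_LEAVES and value:
            out[key] = "****"
        else:
            out[key] = value
    return out
```

With this layout, setting llm.api_key and llm.model places both values under a single llm mapping, and masked() hides only the key.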

`graphora init`

Shortcut for graphora config init with the two flags people set most often:

graphora init --api-key "$GEMINI_API_KEY"
graphora init --mode remote

`graphora status`

Reports the active mode, whether the embedded graphora-api binary is cached, and whether your API key / auth token are configured.

graphora status

Extracting graphs

`graphora extract`

Run the extraction pipeline over one or more documents.

graphora extract <file...> [options]

Argument	Description
`files`	One or more document files or directories (recurses)

Option	Description
`--output`, `-o`	Output file path (default: `graph.json`)
`--schema`, `-s`	Ontology YAML file (auto-inferred if omitted)
`--format`, `-f`	Output format: `json` (default) or `cypher`
`--api-key`, `-k`	LLM API key (embedded mode only; env: `GRAPHORA_API_KEY`)
`--mode`, `-m`	Override the configured execution mode
`--verbose`, `-v`	Show detailed progress

# Single document, auto-inferred schema
graphora extract report.pdf -o graph.json

# Use an explicit ontology
graphora extract report.pdf --schema ontology.yaml -o graph.json

# Multiple documents combined into one graph
graphora extract doc1.pdf doc2.pdf docs/ -o combined.json

# Cypher output for Neo4j import
graphora extract report.pdf -f cypher -o import.cypher

# Excel workbook (sheets + cells become structured input via MarkItDown)
graphora extract audit-workpaper.xlsx -o graph.json

# Force remote mode for a one-off
graphora extract report.pdf --mode remote

Supported file formats (10 total):

Category	Extensions	Notes
PDF	`.pdf`	Multi-backend with OCR fallback for scanned pages
Office	`.docx`, `.xlsx`, `.pptx`	Converted to markdown via MarkItDown — tables / cells / slide text preserved
Text	`.txt`, `.md`, `.csv`, `.json`, `.xml`, `.html`	Ingested directly without conversion
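
A script that prepares inputs for graphora extract can apply the same extension list up front. The following is a minimal sketch, not the CLI's own code: collect_inputs and SUPPORTED_EXTENSIONS are hypothetical names, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Extensions copied from the table above (10 in total).
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".docx", ".xlsx", ".pptx",
    ".txt", ".md", ".csv", ".json", ".xml", ".html",
}


def collect_inputs(paths):
    """Expand files and directories into a sorted list of supported files."""
    found = []
    for raw in paths:
        path = Path(raw)
        # Directories are walked recursively; plain files are checked as-is.
        candidates = path.rglob("*") if path.is_dir() else [path]
        for candidate in candidates:
            # Assumption: extension matching ignores case.
            if candidate.is_file() and candidate.suffix.lower() in SUPPORTED_EXTENSIONS:
                found.append(candidate)
    return sorted(found)
```

Passing the result to the CLI explicitly makes it visible which files will be processed before any API calls are made.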

The parser picks a format handler by file extension. Directories passed to graphora extract are recursed; files with unsupported extensions are skipped silently.

Behind the scenes: schema modes

graphora extract is a thin wrapper over the API's upload paths. The CLI chooses between the first two depending on whether --schema is passed; the third is reachable only programmatically:

CLI invocation	API path	Schema posture
`graphora extract doc.pdf` (no `--schema`)	`POST /transform/upload`	Auto-infer ontology from document content before extracting
`graphora extract doc.pdf --schema ontology.yaml`	`POST /transform/{ontology_id}/upload`	Use the registered ontology to bias extraction
(programmatic)	`POST /transform/schemaless/upload`	Extract with generic schema, refine ontology post-hoc

See API Reference — Extraction modes for the full posture comparison and when to pick each.
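
The mapping in the table above can be written down as a small function. This is an illustrative sketch only: upload_path is a hypothetical helper, and the paths are copied from the table without any base URL or version prefix.

```python
from typing import Optional


def upload_path(ontology_id: Optional[str] = None, schemaless: bool = False) -> str:
    """Return the upload path for a schema posture, as listed in the table."""
    if schemaless and ontology_id is not None:
        raise ValueError("schemaless extraction does not take an ontology id")
    if schemaless:
        # Programmatic only: the table lists no CLI invocation for this path.
        return "/transform/schemaless/upload"
    if ontology_id is not None:
        # Corresponds to passing --schema with a registered ontology.
        return f"/transform/{ontology_id}/upload"
    # No --schema: the ontology is inferred from the document content.
    return "/transform/upload"
```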

`graphora schema infer`

Generate an ontology YAML from documents without performing a full extraction. Useful as a starting point before extraction, or to diff against an existing schema.

graphora schema infer document.pdf -o ontology.yaml
graphora schema infer doc1.pdf doc2.pdf -o combined-schema.yaml

Option	Description
`--output`, `-o`	Output file path (default: `ontology.yaml`)
`--api-key`, `-k`	LLM API key (env: `GRAPHORA_API_KEY`)
`--verbose`, `-v`	Show detailed progress

`graphora schema validate`

Lint an ontology YAML for structural correctness and common issues. Exits non-zero on validation failure — safe to wire into pre-commit hooks.

graphora schema validate ontology.yaml
graphora schema validate ontology.yaml -v   # verbose
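
Because the command exits non-zero on failure, a hook script only needs to check the return code. Below is a minimal sketch that assumes graphora is on PATH; failing_ontologies and its command parameter are illustrative names, not part of Graphora.

```python
import subprocess


def failing_ontologies(paths, command=("graphora", "schema", "validate")):
    """Run the validator once per file and return the paths that failed."""
    failures = []
    for path in paths:
        # The validator's exit status is the only signal used here.
        result = subprocess.run([*command, str(path)], capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(str(path))
    return failures
```

A pre-commit hook could call this with the staged ontology files and exit with status 1 when the returned list is non-empty.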

Inspecting graphs

The explain, diff, and scenario commands described next require remote mode: they talk to a Graphora API server (hosted or your own deployment) and need an auth token.

`graphora explain`

Show the source-span evidence and decision-log trail behind a single node or edge. Renders the originating text + document, the schema-inference / merge / accept-reject decisions the pipeline emitted, and any alternatives considered.

graphora explain <transform-id> --node-id <node-id>
graphora explain <transform-id> --edge-id <edge-id>

Argument	Description
`transform_id`	Transform ID the node or edge belongs to

Option	Description
`--node-id`, `-n`	Node ID to explain (mutually exclusive with `--edge-id`)
`--edge-id`, `-e`	Edge ID to explain
`--json`	Emit raw JSON evidence payload (default: formatted view)
`--api-url`	Override base URL (env: `GRAPHORA_API_URL`)
`--auth-token`	Override auth token (env: `GRAPHORA_AUTH_TOKEN`)

graphora explain tx_abc123 --node-id n_alice
graphora explain tx_abc123 --edge-id e_works_at --json > evidence.json

`graphora diff`

Compare two transforms and show added / removed / changed nodes and edges, with a summary table plus per-section detail tables.

graphora diff <base-transform-id> <compare-transform-id>

Option	Description
`--json`	Emit the raw diff payload (same shape as the `/diff` endpoint)
`--show-unchanged`	Include unchanged counts in the summary (off by default)
`--limit`, `-n`	Per-section row cap on the detail tables (default: 20)

graphora diff tx-old tx-new
graphora diff tx-old tx-new --show-unchanged
graphora diff tx-old tx-new --json | jq '.summary'
graphora diff tx-old tx-new -n 5            # tighter detail tables

Each entry in the changed section expands to one row per property change, so the rendered row count can exceed --limit when entries have multiple changes. --json always emits the full set.
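
The interaction between --limit and per-property rows can be illustrated with a short sketch. The field names used here (id, changes, property, old, new) are hypothetical stand-ins for the diff payload, whose actual shape is defined by the /diff endpoint.

```python
def changed_rows(changed_entries, limit=20):
    """Build detail-table rows for the changed section."""
    rows = []
    # The cap applies to the number of entries...
    for entry in changed_entries[:limit]:
        # ...but every property change becomes its own row.
        for change in entry["changes"]:
            rows.append((entry["id"], change["property"], change["old"], change["new"]))
    return rows
```

With a limit of 1 and a first entry that carries three property changes, the function returns three rows, which is why the rendered table can be longer than the cap suggests.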

Scenarios

A scenario is a named point-in-time snapshot of a transform’s extracted graph. Scenarios are useful for branching what-if comparisons before publishing, or for pinning a known-good state before running a destructive merge.

`graphora scenario create`

graphora scenario create -t <transform-id> -n <name> [-d <description>]

Option	Description
`--transform-id`, `-t`	Required. Transform whose graph will be snapshotted
`--name`, `-n`	Required. Scenario name. Must be unique per (user, transform_id) pair; duplicates return 409
`--description`, `-d`	Optional free-form note
`--json`	Emit the full scenario record (including embedded snapshot) as JSON

graphora scenario create -t tx-abc -n baseline
graphora scenario create -t tx-abc -n "merge-alice-alicia" -d "what-if ER"

`graphora scenario list`

graphora scenario list
graphora scenario list --json

Lists scenarios in a lightweight summary view (counts only, no embedded graph). Use scenario show to fetch the full record.

`graphora scenario show`

graphora scenario show <scenario-id>
graphora scenario show sc-abc123 --json | jq '.graph.nodes | length'

`graphora scenario delete`

graphora scenario delete <scenario-id>          # interactive confirm
graphora scenario delete <scenario-id> --yes    # non-interactive (CI)

The server’s delete is intentionally non-idempotent: it returns 404 when the scenario doesn’t exist or belongs to another user. This matches how explain and diff respond to cross-tenant access.

Regression testing

`graphora test`

Run a golden-corpus regression test against a Graphora API. For each subdirectory under <corpus-path>, the runner registers ontology.yaml, uploads document.txt, waits for extraction to complete, then POSTs the live transform_id plus expected.json to /api/v1/golden/score and records the report.

graphora test <corpus-path> [options]

Option	Description
`--min-f1`	Minimum F1 score required for exit 0. Applied to both `nodes.f1` and `edges.f1` of each entry. Default `0.0` makes the runner informational
`--output`, `-o`	Write full per-doc report set + summary to a JSON file
`--json`	Emit aggregated report set to stdout as JSON (instead of the Rich table)

graphora test golden/                        # informational
graphora test golden/ --min-f1 0.6           # gate at F1 >= 0.6
graphora test golden/ --json > report.json
graphora test golden/ -o report.json --min-f1 0.5

Exit codes:

Code	Meaning
`0`	Every doc’s `nodes.f1` and `edges.f1` met `--min-f1`
`1`	At least one doc fell below the threshold
`2`	Configuration / corpus problem (no docs, missing credentials, etc.)
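
The gating rule behind these exit codes can be sketched in a few lines. The report shape below mirrors the nodes.f1 and edges.f1 fields named above; the function itself is a hypothetical illustration and not the runner's code.

```python
def exit_code(reports, min_f1=0.0):
    """Map per-document reports to the exit codes in the table above."""
    if not reports:
        # No corpus entries counts as a configuration / corpus problem.
        return 2
    for report in reports:
        # Both scores must meet the threshold for a document to pass.
        if report["nodes"]["f1"] < min_f1 or report["edges"]["f1"] < min_f1:
            return 1
    return 0
```

With the default threshold of 0.0 no score can fall below the gate, which matches the informational behavior described for --min-f1.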

Each subdirectory under <corpus-path> must contain document.txt, ontology.yaml, and expected.json to count as a corpus entry. Other subdirectories are skipped silently.
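
To check a corpus layout before running the test command, the rule above can be reproduced locally. This is a minimal sketch; corpus_entries and REQUIRED_FILES are hypothetical names.

```python
from pathlib import Path

# The three files a subdirectory needs in order to count as a corpus entry.
REQUIRED_FILES = ("document.txt", "ontology.yaml", "expected.json")


def corpus_entries(corpus_path):
    """Return the subdirectories that contain every required file."""
    root = Path(corpus_path)
    return sorted(
        child
        for child in root.iterdir()
        if child.is_dir() and all((child / name).is_file() for name in REQUIRED_FILES)
    )
```

Comparing the result against the full list of subdirectories shows which ones would be skipped.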

MCP integration

`graphora install`

Wire Graphora’s MCP server into an agent client’s config so the agent can call graphora-mcp tools directly.

graphora install <client> [options]

Argument	Description
`client`	One of: `claude-code`, `claude-desktop`, `codex`, `cursor`, `vscode`

Option	Description
`--api-url`	Graphora API URL the MCP server will talk to. Falls back to `graphora config init` / `GRAPHORA_API_URL`
`--auth-token`	Auth token for the API. Falls back to config / `GRAPHORA_AUTH_TOKEN`. Pass an empty string (or skip) when the server has auth bypass enabled
`--command`	Path to the `graphora-mcp` executable (default: unqualified `graphora-mcp`, relying on `PATH`). Use an absolute path when your agent client launches from a shell that doesn’t include the install directory
`--workspace`	Workspace root for workspace-relative configs (`cursor` / `vscode` / `codex` / `claude-code`). Defaults to the current directory. Ignored for `claude-desktop`, which is user-global
`--force`	Overwrite the existing config file instead of merging the `graphora` entry into `mcpServers`. Destructive — wipes other MCP server entries
`--dry-run`	Print the resulting config to stdout without writing

# Workspace-relative install for Cursor
graphora install cursor

# User-global install for Claude Desktop
graphora install claude-desktop

# Preview without writing
graphora install cursor --dry-run

# One-off against a local API
graphora install cursor \
  --api-url http://localhost:8000 \
  --auth-token "$JWT"

The cursor, vscode, codex, and claude-code clients write to the workspace; claude-desktop writes to the user-global config. Existing mcpServers entries are preserved unless you pass --force.
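
The difference between the default merge and --force can be shown on plain dictionaries. This sketch is an assumption-laden illustration: install_entry is a hypothetical helper, and the contents of the graphora entry are simplified to a single command field.

```python
import copy


def install_entry(existing, entry, force=False):
    """Return the config that would be written for the graphora MCP entry."""
    if force:
        # Overwrite: any other MCP server entries are dropped.
        return {"mcpServers": {"graphora": entry}}
    # Merge: keep everything already present and add or replace "graphora".
    merged = copy.deepcopy(existing)
    merged.setdefault("mcpServers", {})["graphora"] = entry
    return merged
```

Running the same comparison with --dry-run on a real config is the safer way to confirm what will be written.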

Maintenance

`graphora update`

Update the embedded-mode graphora-api binary by re-downloading from GitHub.

graphora update
graphora update --channel nightly
graphora update --version v1.2.0
graphora update --force                # re-download even if up to date

Option	Description
`--channel`, `-c`	`stable` (default), `nightly`, or `main`
`--version`, `-v`	Specific version to install (e.g., `v1.2.0`)
`--force`, `-f`	Force re-download even when up to date

`graphora version`

graphora version

Prints the installed graphora CLI version. Equivalent to graphora --version.

Embedded vs remote mode

Mode	What runs locally	Requirements
embedded (default)	The full extraction pipeline — `graphora-api` runs in-process	Gemini API key only
remote	Just the client — extraction runs on the server	API URL + auth token

Embedded mode is recommended for one-off extractions, CI pipelines, and evaluation. No database, Redis, or external services needed; graphora-api is auto-downloaded on first use and cached at ~/.graphora/api/.

Remote mode is required for the explain, diff, scenario, and test commands, since those depend on server-side state (transform IDs, scenario storage, the golden-scoring endpoint). Configure it once with graphora config init --mode remote.
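
A wrapper script can check the mode requirement before invoking a command. The sketch below encodes only the rule stated above; mode_problem and REMOTE_ONLY_COMMANDS are hypothetical names.

```python
# Commands that depend on server-side state, per the paragraph above.
REMOTE_ONLY_COMMANDS = {"explain", "diff", "scenario", "test"}


def mode_problem(command, mode):
    """Return an error message if `command` cannot run in `mode`, else None."""
    if mode not in ("embedded", "remote"):
        return f"unknown mode: {mode!r}"
    if command in REMOTE_ONLY_COMMANDS and mode != "remote":
        return f"'{command}' requires remote mode; run: graphora config init --mode remote"
    return None
```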

Environment variables

Variable	Purpose
`GEMINI_API_KEY`	LLM API key for embedded mode
`GRAPHORA_API_KEY`	Alternative to `GEMINI_API_KEY` (used by `extract` / `schema infer`)
`GRAPHORA_API_URL`	Base URL for remote-mode commands
`GRAPHORA_AUTH_TOKEN`	Clerk-issued bearer token for remote mode

Precedence for --api-url / --auth-token: explicit flag > environment variable > ~/.graphora/config.yaml.
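
The precedence order can be sketched as a single lookup function. resolve_setting is a hypothetical helper; how the CLI treats empty strings is not specified here, so this sketch treats only a missing value as unset.

```python
import os


def resolve_setting(flag_value, env_name, config_value, environ=None):
    """Resolve a setting: explicit flag, then environment, then config file."""
    environ = os.environ if environ is None else environ
    if flag_value is not None:
        return flag_value
    if env_name in environ:
        return environ[env_name]
    return config_value
```

For example, with GRAPHORA_API_URL set in the environment and no --api-url flag, the environment value is returned even when config.yaml also defines a URL.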

Python client library

The CLI is built on top of the Python client. For programmatic access:

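from graphora import GraphoraClient

# Create a client with default settings.
client = GraphoraClient()

# Register the ontology; the returned object exposes its id.
with open("ontology.yaml") as f:
    ontology = client.register_ontology(f.read())

# Start a transform for the document using that ontology.
transform = client.transform(ontology_id=ontology.id, files=["document.pdf"])

# Wait for the transform, then fetch the resulting graph.
status = client.wait_for_transform(transform.id)
graph = client.get_transformed_graph(transform_id=transform.id)
print(f"Extracted {len(graph.nodes)} nodes and {len(graph.edges)} edges")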

See the Client Library documentation for the full API surface.
