CLI Reference
The Graphora CLI is a single binary that covers the full extract → inspect → compare → regression-test loop. It works in two modes: embedded (runs the extraction pipeline locally; only a Gemini API key is required) and remote (talks to a Graphora API server, hosted or self-deployed). This page is the authoritative command reference. For end-to-end walkthroughs, see the Platform Quickstart and Local Dev Guide.
Installation
Installs the graphora CLI and the Python client library. Requires Python 3.9 or higher. To install just the client library without the CLI, omit the [cli] extra.
Verify:
Quick start
Commands at a glance
| Command | Purpose |
|---|---|
| extract | Run extraction over one or more documents |
| schema infer | Generate an ontology from documents without extracting |
| schema validate | Lint an ontology YAML file |
| explain | Show the evidence and decisions behind a node or edge |
| diff | Compare two transforms: added / removed / changed |
| scenario | Create and inspect named graph snapshots |
| test | Golden-corpus regression runner with F1 gating |
| install | Wire Graphora’s MCP server into an agent client (Cursor, Claude Desktop, etc.) |
| config | Manage ~/.graphora/config.yaml |
| status | Show current configuration and mode availability |
| update | Update the embedded-mode graphora-api binary |
| init | Shortcut for graphora config init |
| version | Print the CLI version |
Configuration
Config is stored at ~/.graphora/config.yaml. The CLI also reads several environment variables; see Environment variables below.
graphora config init
Interactive setup. For embedded mode, graphora-api is auto-downloaded and cached on first use; you only need a Gemini API key.
| Option | Description |
|---|---|
| --mode, -m | embedded (default) or remote |
| --api-key, -k | LLM API key (embedded mode) |
| --api-path, -p | Custom path to a local graphora-api clone |
graphora config set
graphora config get / show / path
graphora init
Shortcut for graphora config init with the two flags people set most often:
graphora status
Reports the active mode, whether the embedded graphora-api binary is cached, and whether your API key / auth token are configured.
Extracting graphs
graphora extract
Run the extraction pipeline over one or more documents.
| Argument | Description |
|---|---|
| files | One or more document files or directories (directories are recursed) |
| Option | Description |
|---|---|
| --output, -o | Output file path (default: graph.json) |
| --schema, -s | Ontology YAML file (auto-inferred if omitted) |
| --format, -f | Output format: json (default) or cypher |
| --api-key, -k | LLM API key (embedded mode only; env: GRAPHORA_API_KEY) |
| --mode, -m | Override the configured execution mode |
| --verbose, -v | Show detailed progress |
| Category | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Multi-backend with OCR fallback for scanned pages |
| Office | .docx, .xlsx, .pptx | Converted to markdown via MarkItDown — tables / cells / slide text preserved |
| Text | .txt, .md, .csv, .json, .xml, .html | Ingested directly without conversion |
Directories passed to graphora extract are recursed; files with unsupported extensions are skipped silently.
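To preview which files a recursive walk would pick up before running an extraction, the table above can be turned into a small filter. This is a minimal sketch, not the CLI's own code: `collect_documents` is a hypothetical helper, and the case-insensitive extension match is an assumption that the CLI may not share.

```python
from pathlib import Path

# Extensions copied from the supported-formats table above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".pptx",
    ".txt", ".md", ".csv", ".json", ".xml", ".html",
}


def collect_documents(paths):
    """Return supported files under the given files or directories.

    Directories are walked recursively; files whose extension is not in
    SUPPORTED_EXTENSIONS are dropped without an error. Matching is
    case-insensitive here, which is an assumption of this sketch.
    """
    found = []
    for raw in paths:
        path = Path(raw)
        candidates = sorted(path.rglob("*")) if path.is_dir() else [path]
        for candidate in candidates:
            if candidate.is_file() and candidate.suffix.lower() in SUPPORTED_EXTENSIONS:
                found.append(candidate)
    return found
```

Calling `collect_documents(["docs/"])` on a local folder gives a list of `Path` objects that can be compared against what the extraction actually processed.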
Behind the scenes — schema modes. graphora extract is a thin wrapper over three API paths, picked by the --schema / --mode flags:
| CLI invocation | API path | Schema posture |
|---|---|---|
| graphora extract doc.pdf (no --schema) | POST /transform/upload | Auto-infer ontology from document content before extracting |
| graphora extract doc.pdf --schema ontology.yaml | POST /transform/{ontology_id}/upload | Use the registered ontology to bias extraction |
| (programmatic) | POST /transform/schemaless/upload | Extract with generic schema, refine ontology post-hoc |
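For programmatic callers, the mapping in the table can be sketched as a small path selector. The function name and parameters are hypothetical, and the sketch assumes `ontology_id` is the ID the server returned when the ontology was registered; how the CLI derives it from `--schema` is not described here.

```python
from typing import Optional


def upload_path(ontology_id: Optional[str] = None, schemaless: bool = False) -> str:
    """Pick the upload endpoint path for one of the three schema postures."""
    if schemaless:
        # Generic schema now, ontology refined after extraction.
        return "/transform/schemaless/upload"
    if ontology_id:
        # A registered ontology biases the extraction.
        return f"/transform/{ontology_id}/upload"
    # No schema supplied: the ontology is inferred from the document first.
    return "/transform/upload"
```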
graphora schema infer
Generate an ontology YAML from documents without performing a full extraction. Useful as a starting point before extraction, or to diff against an existing schema.
| Option | Description |
|---|---|
| --output, -o | Output file path (default: ontology.yaml) |
| --api-key, -k | LLM API key (env: GRAPHORA_API_KEY) |
| --verbose, -v | Show detailed progress |
graphora schema validate
Lint an ontology YAML for structural correctness and common issues. Exits non-zero on validation failure — safe to wire into pre-commit hooks.
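Because the command signals failure through its exit code, a hook script only needs to check return codes. The sketch below is one way to do that from Python; it assumes the ontology file is passed as a positional argument to `graphora schema validate`, which this page does not show explicitly, and `failing_schemas` is a hypothetical helper name.

```python
import subprocess


def failing_schemas(paths, command=("graphora", "schema", "validate")):
    """Run the validator once per file and return the paths that exited non-zero.

    `command` is the argv prefix; each path is appended as a positional
    argument (an assumption of this sketch).
    """
    failed = []
    for path in paths:
        result = subprocess.run([*command, str(path)], capture_output=True)
        if result.returncode != 0:
            failed.append(str(path))
    return failed
```

A hook could call `failing_schemas(changed_files)` and exit with status 1 when the returned list is non-empty.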
Inspecting graphs
The next three commands require remote mode: they talk to a Graphora API server (hosted or your own deployment) and need an auth token.
graphora explain
Show the source-span evidence and decision-log trail behind a single node or edge. Renders the originating text + document, the schema-inference / merge / accept-reject decisions the pipeline emitted, and any alternatives considered.
| Argument | Description |
|---|---|
| transform_id | Transform ID the node or edge belongs to |
| Option | Description |
|---|---|
| --node-id, -n | Node ID to explain (mutually exclusive with --edge-id) |
| --edge-id, -e | Edge ID to explain |
| --json | Emit raw JSON evidence payload (default: formatted view) |
| --api-url | Override base URL (env: GRAPHORA_API_URL) |
| --auth-token | Override auth token (env: GRAPHORA_AUTH_TOKEN) |
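When scripting explain calls, it helps to enforce the node/edge exclusivity before shelling out. The following is a rough sketch: `build_explain_args` is a hypothetical helper, and treating "exactly one of the two IDs" as required is an assumption drawn from the command explaining a single node or edge.

```python
def build_explain_args(transform_id, node_id=None, edge_id=None, as_json=False):
    """Build an argv list for `graphora explain` from the options in the table."""
    # --node-id and --edge-id are mutually exclusive; this sketch also
    # rejects the case where neither is given (an assumption).
    if (node_id is None) == (edge_id is None):
        raise ValueError("pass exactly one of node_id or edge_id")
    args = ["graphora", "explain", transform_id]
    if node_id is not None:
        args += ["--node-id", node_id]
    else:
        args += ["--edge-id", edge_id]
    if as_json:
        args.append("--json")
    return args
```

The resulting list can be handed to `subprocess.run` unchanged, which avoids shell-quoting issues with IDs.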
graphora diff
Compare two transforms and show added / removed / changed nodes and edges, with a summary table plus per-section detail tables.
| Option | Description |
|---|---|
| --json | Emit the raw diff payload (same shape as the /diff endpoint) |
| --show-unchanged | Include unchanged counts in the summary (off by default) |
| --limit, -n | Per-section row cap on the detail tables (default: 20) |
A detail table's row count may not match --limit exactly when entries have multiple changes. --json always emits the full set.
Scenarios
A scenario is a named point-in-time snapshot of a transform’s extracted graph. Scenarios are useful for branching what-if comparisons before publishing, or for pinning a known-good state before running a destructive merge.
graphora scenario create
| Option | Description |
|---|---|
| --transform-id, -t | Required. Transform whose graph will be snapshotted |
| --name, -n | Required. Scenario name. Must be unique per (you, transform_id); duplicates return 409 |
| --description, -d | Optional free-form note |
| --json | Emit the full scenario record (including embedded snapshot) as JSON |
graphora scenario list
Use scenario show to fetch the full record.
graphora scenario show
graphora scenario delete
explain and diff on cross-tenant access.
Regression testing
graphora test
Run a golden-corpus regression test against a Graphora API. For each subdirectory under <corpus-path>, the runner registers ontology.yaml, uploads document.txt, waits for extraction to complete, then POSTs the live transform_id plus expected.json to /api/v1/golden/score and records the report.
| Option | Description |
|---|---|
| --min-f1 | Minimum F1 score required for exit 0. Applied to both nodes.f1 and edges.f1 of each entry. Default 0.0 makes the runner informational |
| --output, -o | Write full per-doc report set + summary to a JSON file |
| --json | Emit aggregated report set to stdout as JSON (instead of the Rich table) |
| Code | Meaning |
|---|---|
| 0 | Every doc’s nodes.f1 and edges.f1 met --min-f1 |
| 1 | At least one doc fell below the threshold |
| 2 | Configuration / corpus problem (no docs, missing credentials, etc.) |
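The gating rule behind these exit codes can be reproduced when post-processing a saved report. This is a sketch under assumptions: the per-doc report shape (`{"nodes": {"f1": ...}, "edges": {"f1": ...}}`) is inferred from the `nodes.f1` / `edges.f1` names above rather than from a documented schema, and only the "no docs" case of exit code 2 is modelled.

```python
def exit_code(reports, min_f1=0.0):
    """Map per-doc F1 reports to the runner's exit codes (0, 1, or 2)."""
    if not reports:
        # No corpus entries is treated as a configuration / corpus problem.
        return 2
    for report in reports:
        # Both node and edge F1 must meet the threshold for every doc.
        if report["nodes"]["f1"] < min_f1 or report["edges"]["f1"] < min_f1:
            return 1
    return 0
```

With the default threshold of 0.0 every non-empty report set yields 0, which matches the "informational" behaviour described for `--min-f1`.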
Each subdirectory under <corpus-path> must contain document.txt, ontology.yaml, and expected.json to count as a corpus entry. Other subdirectories are skipped silently.
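Since incomplete entries are skipped without a warning, it can be worth checking a corpus layout before a run. A minimal sketch of that check follows; `corpus_entries` is a hypothetical helper, not part of the CLI or client library.

```python
from pathlib import Path

# File names required for a subdirectory to count as a corpus entry.
REQUIRED_FILES = ("document.txt", "ontology.yaml", "expected.json")


def corpus_entries(corpus_path):
    """Return the subdirectories of corpus_path that contain all required files."""
    root = Path(corpus_path)
    return sorted(
        child
        for child in root.iterdir()
        if child.is_dir() and all((child / name).is_file() for name in REQUIRED_FILES)
    )
```

Comparing the returned list with the full set of subdirectories shows which entries the runner would silently ignore.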
MCP integration
graphora install
Wire Graphora’s MCP server into an agent client’s config so the agent can call graphora-mcp tools directly.
| Argument | Description |
|---|---|
| client | One of: claude-code, claude-desktop, codex, cursor, vscode |
| Option | Description |
|---|---|
| --api-url | Graphora API URL the MCP server will talk to. Falls back to graphora config init / GRAPHORA_API_URL |
| --auth-token | Auth token for the API. Falls back to config / GRAPHORA_AUTH_TOKEN. Pass an empty string (or skip) when the server has auth bypass enabled |
| --command | Path to the graphora-mcp executable (default: unqualified graphora-mcp, relying on PATH). Use an absolute path when your agent client launches from a shell that doesn’t include the install directory |
| --workspace | Workspace root for workspace-relative configs (cursor / vscode / codex / claude-code). Defaults to the current directory. Ignored for claude-desktop, which is user-global |
| --force | Overwrite the existing config file instead of merging the graphora entry into mcpServers. Destructive: wipes other MCP server entries |
| --dry-run | Print the resulting config to stdout without writing |
cursor, vscode, codex, and claude-code clients write to the workspace; claude-desktop writes to the user-global config. Existing mcpServers entries are preserved unless you pass --force.
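The difference between the default merge and `--force` can be illustrated on an in-memory config. This is only a sketch of the described behaviour: the entry shape (`{"command": ...}`) is an assumption based on common MCP client configs, and the real on-disk format varies by client.

```python
import copy


def merge_mcp_entry(existing, entry, force=False):
    """Return a new config dict with the graphora entry added under mcpServers.

    With force=False other mcpServers entries are preserved; with force=True
    the existing config is discarded and only the graphora entry remains.
    """
    config = {} if force else copy.deepcopy(existing)
    config.setdefault("mcpServers", {})["graphora"] = entry
    return config
```

For example, merging into a config that already lists another server keeps that server, while `force=True` leaves only the `graphora` key, which is why the option is flagged as destructive.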
Maintenance
graphora update
Update the embedded-mode graphora-api binary by re-downloading from GitHub.
| Option | Description |
|---|---|
| --channel, -c | stable (default), nightly, or main |
| --version, -v | Specific version to install (e.g., v1.2.0) |
| --force, -f | Force re-download even when up to date |
graphora version
Prints the graphora CLI version. Equivalent to graphora --version.
Embedded vs remote mode
| Mode | What runs locally | Requirements |
|---|---|---|
| embedded (default) | The full extraction pipeline — graphora-api runs in-process | Gemini API key only |
| remote | Just the client — extraction runs on the server | API URL + auth token |
graphora-api is auto-downloaded on first use and cached at ~/.graphora/api/.
Remote mode is required for the explain, diff, scenario, and test commands, since those depend on server-side state (transform IDs, scenario storage, the golden-scoring endpoint). Configure it once with graphora config init --mode remote.
Environment variables
| Variable | Purpose |
|---|---|
| GEMINI_API_KEY | LLM API key for embedded mode |
| GRAPHORA_API_KEY | Alternative to GEMINI_API_KEY (used by extract / schema infer) |
| GRAPHORA_API_URL | Base URL for remote-mode commands |
| GRAPHORA_AUTH_TOKEN | Clerk-issued bearer token for remote mode |
Precedence for --api-url / --auth-token: explicit flag > environment variable > ~/.graphora/config.yaml.
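The flag > environment > config order can be sketched as a three-step lookup. `resolve_setting` and the config key names are hypothetical; the sketch treats a flag value of `None` as "not passed" and assumes the config file has already been loaded into a dict.

```python
import os


def resolve_setting(flag_value, env_name, config, config_key, environ=None):
    """Resolve one setting using flag > environment variable > config file."""
    environ = os.environ if environ is None else environ
    if flag_value is not None:
        # An explicit flag always wins, even if it is an empty string.
        return flag_value
    if env_name in environ:
        return environ[env_name]
    # Fall back to the value loaded from ~/.graphora/config.yaml, if any.
    return config.get(config_key)
```

For instance, `resolve_setting(None, "GRAPHORA_API_URL", {"api_url": "http://localhost:8000"}, "api_url", environ={})` falls through to the config value, where `api_url` is an illustrative key name only.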
