CLI Reference

The Graphora CLI is a single binary that covers the full extract → inspect → compare → regression-test loop. It works in two modes: embedded (runs the extraction pipeline locally — only a Gemini API key required) and remote (talks to a Graphora API server, hosted or self-deployed). This page is the authoritative command reference. For end-to-end walkthroughs see the Platform Quickstart and Local Dev Guide.

Installation

pip install "graphora[cli]"
This installs both the graphora CLI and the Python client library. Requires Python 3.9 or higher. To install just the client library without the CLI, omit the [cli] extra. Verify:
graphora --version

Quick start

# 1. One-time setup with a Gemini API key (available free from Google AI Studio)
graphora config init --api-key "$GEMINI_API_KEY"

# 2. Extract a graph (schema auto-inferred)
graphora extract report.pdf -o graph.json

# 3. (Remote mode only) inspect the result
graphora explain tx_abc123 --node-id n_alice

Commands at a glance

| Command | Purpose |
| --- | --- |
| extract | Run extraction over one or more documents |
| schema infer | Generate an ontology from documents without extracting |
| schema validate | Lint an ontology YAML file |
| explain | Show the evidence and decisions behind a node or edge |
| diff | Compare two transforms: added / removed / changed |
| scenario | Create and inspect named graph snapshots |
| test | Golden-corpus regression runner with F1 gating |
| install | Wire Graphora’s MCP server into an agent client (Cursor, Claude Desktop, etc.) |
| config | Manage ~/.graphora/config.yaml |
| status | Show current configuration and mode availability |
| update | Update the embedded-mode graphora-api binary |
| init | Shortcut for graphora config init |
| version | Print the CLI version |

Configuration

Config is stored at ~/.graphora/config.yaml. The CLI also reads several environment variables — see Environment variables below.

graphora config init

Interactive setup. For embedded mode, graphora-api is auto-downloaded and cached on first use; you only need a Gemini API key.
graphora config init --api-key "$GEMINI_API_KEY"

# Remote mode against a hosted server
graphora config init --mode remote \
  --api-url "https://api.graphora.io" \
  --auth-token "$CLERK_JWT"

# Local development against your own graphora-api clone
graphora config init --api-path /path/to/graphora-api

| Option | Description |
| --- | --- |
| --mode, -m | embedded (default) or remote |
| --api-key, -k | LLM API key (embedded mode) |
| --api-path, -p | Custom path to a local graphora-api clone |

graphora config set

graphora config set llm.api_key "your-key"
graphora config set llm.model "gemini-1.5-pro"
graphora config set defaults.mode "remote"
graphora config set api.url "https://api.graphora.io"

graphora config get / show / path

graphora config get llm.model          # one value
graphora config show                   # all values, sensitive fields masked
graphora config path                   # print the config file path
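
The dotted keys accepted by config set and config get (llm.model, api.url, and so on) suggest a nested layout inside config.yaml. The sketch below is an illustration under that assumption, not the CLI's own code: it shows how a dotted key could be resolved against a nested mapping and how a sensitive value might be masked for display, as config show does. The helper names (set_dotted, get_dotted, mask), the SENSITIVE set, and the key api.auth_token are hypothetical.

```python
from typing import Any

# Assumption: which keys count as sensitive is not documented; these are examples.
SENSITIVE = {"llm.api_key", "api.auth_token"}


def set_dotted(cfg: dict, key: str, value: Any) -> None:
    """Store value under a dotted key, creating nested dicts as needed."""
    *parents, leaf = key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def get_dotted(cfg: dict, key: str, default: Any = None) -> Any:
    """Look up a dotted key; return default if any segment is missing."""
    node: Any = cfg
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def mask(key: str, value: Any) -> Any:
    """Hide all but the last four characters of a sensitive string value."""
    if key in SENSITIVE and isinstance(value, str) and value:
        return "*" * max(len(value) - 4, 0) + value[-4:]
    return value
```

With this layout, setting llm.model and then reading it back returns the same string, while a missing key such as api.url falls through to the default.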

graphora init

Shortcut for graphora config init with the two flags people set most often:
graphora init --api-key "$GEMINI_API_KEY"
graphora init --mode remote

graphora status

Reports the active mode, whether the embedded graphora-api binary is cached, and whether your API key / auth token are configured.
graphora status

Extracting graphs

graphora extract

Run the extraction pipeline over one or more documents.
graphora extract <file...> [options]

| Argument | Description |
| --- | --- |
| files | One or more document files or directories (recurses) |

| Option | Description |
| --- | --- |
| --output, -o | Output file path (default: graph.json) |
| --schema, -s | Ontology YAML file (auto-inferred if omitted) |
| --format, -f | Output format: json (default) or cypher |
| --api-key, -k | LLM API key (embedded mode only; env: GRAPHORA_API_KEY) |
| --mode, -m | Override the configured execution mode |
| --verbose, -v | Show detailed progress |

# Single document, auto-inferred schema
graphora extract report.pdf -o graph.json

# Use an explicit ontology
graphora extract report.pdf --schema ontology.yaml -o graph.json

# Multiple documents combined into one graph
graphora extract doc1.pdf doc2.pdf docs/ -o combined.json

# Cypher output for Neo4j import
graphora extract report.pdf -f cypher -o import.cypher

# Excel workbook (sheets + cells become structured input via MarkItDown)
graphora extract audit-workpaper.xlsx -o graph.json

# Force remote mode for a one-off
graphora extract report.pdf --mode remote

Supported file formats (10 total):

| Category | Extensions | Notes |
| --- | --- | --- |
| PDF | .pdf | Multi-backend with OCR fallback for scanned pages |
| Office | .docx, .xlsx, .pptx | Converted to markdown via MarkItDown; tables / cells / slide text preserved |
| Text | .txt, .md, .csv, .json, .xml, .html | Ingested directly without conversion |

The parser sniffs by file extension. Directories passed to graphora extract are recursed; files with unsupported extensions are skipped silently.

Behind the scenes: schema modes. graphora extract is a thin wrapper over three API paths, picked by the --schema / --mode flags:

| CLI invocation | API path | Schema posture |
| --- | --- | --- |
| graphora extract doc.pdf (no --schema) | POST /transform/upload | Auto-infer ontology from document content before extracting |
| graphora extract doc.pdf --schema ontology.yaml | POST /transform/{ontology_id}/upload | Use the registered ontology to bias extraction |
| (programmatic) | POST /transform/schemaless/upload | Extract with generic schema, refine ontology post-hoc |

See API Reference — Extraction modes for the full posture comparison and when to pick each.
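
Because unsupported files are skipped silently when a directory is recursed, it can help to preview which files an extract call would pick up. The following is a minimal sketch, not the CLI's own logic: it walks the given paths and keeps files whose extension appears in the supported-formats table. The function name collect_inputs is hypothetical, and the case-insensitive extension match is an assumption.

```python
from pathlib import Path
from typing import Iterable, List

# The ten extensions listed in the supported-formats table.
SUPPORTED = {
    ".pdf", ".docx", ".xlsx", ".pptx",
    ".txt", ".md", ".csv", ".json", ".xml", ".html",
}


def collect_inputs(paths: Iterable[str]) -> List[Path]:
    """Expand files and directories into a sorted list of supported files."""
    found = set()
    for raw in paths:
        p = Path(raw)
        # Directories are walked recursively; plain files are checked as-is.
        candidates = p.rglob("*") if p.is_dir() else [p]
        for c in candidates:
            # Assumption: extensions are matched case-insensitively.
            if c.is_file() and c.suffix.lower() in SUPPORTED:
                found.add(c)
    return sorted(found)
```

Comparing the returned list against the directory listing shows which files would be dropped before any API key is spent on extraction.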

graphora schema infer

Generate an ontology YAML from documents without performing a full extraction. Useful as a starting point before extraction, or to diff against an existing schema.
graphora schema infer document.pdf -o ontology.yaml
graphora schema infer doc1.pdf doc2.pdf -o combined-schema.yaml

| Option | Description |
| --- | --- |
| --output, -o | Output file path (default: ontology.yaml) |
| --api-key, -k | LLM API key (env: GRAPHORA_API_KEY) |
| --verbose, -v | Show detailed progress |

graphora schema validate

Lint an ontology YAML for structural correctness and common issues. Exits non-zero on validation failure — safe to wire into pre-commit hooks.
graphora schema validate ontology.yaml
graphora schema validate ontology.yaml -v   # verbose
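
Since the command exits non-zero on failure, a hook only needs to run it per file and propagate the result. The sketch below is one possible wrapper, assuming graphora is on PATH; validate_all and main are hypothetical names, and the run parameter exists only so the logic can be exercised without the CLI installed.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def validate_all(
    paths: Sequence[str],
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> List[str]:
    """Run `graphora schema validate` on each path; return the paths that failed."""
    failed = []
    for path in paths:
        result = run(["graphora", "schema", "validate", path])
        if result.returncode != 0:
            failed.append(path)
    return failed


def main(argv: Sequence[str], run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run) -> int:
    """Return 1 if any ontology failed validation, else 0."""
    bad = validate_all(argv, run=run)
    for path in bad:
        print(f"ontology failed validation: {path}", file=sys.stderr)
    return 1 if bad else 0
```

A hook script would end with sys.exit(main(sys.argv[1:])) so that the commit is blocked whenever any file fails.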

Inspecting graphs

The explain, diff, and scenario commands described next require remote mode: they talk to a Graphora API server (hosted or your own deployment) and need an auth token.

graphora explain

Show the source-span evidence and decision-log trail behind a single node or edge. Renders the originating text + document, the schema-inference / merge / accept-reject decisions the pipeline emitted, and any alternatives considered.
graphora explain <transform-id> --node-id <node-id>
graphora explain <transform-id> --edge-id <edge-id>

| Argument | Description |
| --- | --- |
| transform_id | Transform ID the node or edge belongs to |

| Option | Description |
| --- | --- |
| --node-id, -n | Node ID to explain (mutually exclusive with --edge-id) |
| --edge-id, -e | Edge ID to explain |
| --json | Emit raw JSON evidence payload (default: formatted view) |
| --api-url | Override base URL (env: GRAPHORA_API_URL) |
| --auth-token | Override auth token (env: GRAPHORA_AUTH_TOKEN) |

graphora explain tx_abc123 --node-id n_alice
graphora explain tx_abc123 --edge-id e_works_at --json > evidence.json
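
When explain is driven from a script, the mutual exclusivity of --node-id and --edge-id can be enforced before the process is spawned. This is a small sketch using only the flags documented above; explain_argv is a hypothetical helper, and treating "neither flag given" as an error is an assumption.

```python
from typing import List, Optional


def explain_argv(
    transform_id: str,
    node_id: Optional[str] = None,
    edge_id: Optional[str] = None,
    as_json: bool = False,
) -> List[str]:
    """Build a `graphora explain` command line for exactly one node or edge."""
    # Reject both-set and neither-set in a single comparison.
    if (node_id is None) == (edge_id is None):
        raise ValueError("pass exactly one of node_id or edge_id")
    argv = ["graphora", "explain", transform_id]
    if node_id is not None:
        argv += ["--node-id", node_id]
    else:
        argv += ["--edge-id", str(edge_id)]
    if as_json:
        argv.append("--json")
    return argv
```

The resulting list can be passed straight to subprocess.run without going through a shell.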

graphora diff

Compare two transforms and show added / removed / changed nodes and edges, with a summary table plus per-section detail tables.
graphora diff <base-transform-id> <compare-transform-id>

| Option | Description |
| --- | --- |
| --json | Emit the raw diff payload (same shape as the /diff endpoint) |
| --show-unchanged | Include unchanged counts in the summary (off by default) |
| --limit, -n | Per-section row cap on the detail tables (default: 20) |

graphora diff tx-old tx-new
graphora diff tx-old tx-new --show-unchanged
graphora diff tx-old tx-new --json | jq '.summary'
graphora diff tx-old tx-new -n 5            # tighter detail tables
Each entry in the changed section expands to one row per property change, so the rendered row count can exceed --limit when entries have multiple changes. --json always emits the full set.
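
The row-count behavior is easier to see in a small sketch. The payload shape used here (an id plus a changes list of property / old / new records) is an assumption for illustration only, and render_changed_rows is a hypothetical name: the cap is applied to entries, and each surviving entry then expands to one row per property change.

```python
from typing import Any, Dict, List, Tuple

# Assumed shape: {"id": ..., "changes": [{"property": ..., "old": ..., "new": ...}]}
ChangedEntry = Dict[str, Any]
Row = Tuple[str, str, Any, Any]


def render_changed_rows(entries: List[ChangedEntry], limit: int) -> List[Row]:
    """Cap the number of entries at `limit`, then emit one row per property change."""
    rows: List[Row] = []
    for entry in entries[:limit]:
        for change in entry["changes"]:
            rows.append((entry["id"], change["property"], change["old"], change["new"]))
    return rows
```

With a limit of 1 and a first entry that carries three property changes, the function returns three rows, which is why the rendered table can be longer than --limit.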

Scenarios

A scenario is a named point-in-time snapshot of a transform’s extracted graph. Scenarios are useful for branching what-if comparisons before publishing, or for pinning a known-good state before running a destructive merge.

graphora scenario create

graphora scenario create -t <transform-id> -n <name> [-d <description>]

| Option | Description |
| --- | --- |
| --transform-id, -t | Required. Transform whose graph will be snapshotted |
| --name, -n | Required. Scenario name. Must be unique per (user, transform_id) pair; duplicates return HTTP 409 |
| --description, -d | Optional free-form note |
| --json | Emit the full scenario record (including embedded snapshot) as JSON |

graphora scenario create -t tx-abc -n baseline
graphora scenario create -t tx-abc -n "merge-alice-alicia" -d "what-if ER"

graphora scenario list

graphora scenario list
graphora scenario list --json
Lightweight summary view (counts only, no embedded graph). Use scenario show to fetch the full record.

graphora scenario show

graphora scenario show <scenario-id>
graphora scenario show sc-abc123 --json | jq '.graph.nodes | length'

graphora scenario delete

graphora scenario delete <scenario-id>          # interactive confirm
graphora scenario delete <scenario-id> --yes    # non-interactive (CI)
Deletion on the server is intentionally non-idempotent: it returns 404 when the scenario does not exist or belongs to another user. explain and diff behave the same way on cross-tenant access.

Regression testing

graphora test

Run a golden-corpus regression test against a Graphora API. For each subdirectory under <corpus-path>, the runner registers ontology.yaml, uploads document.txt, waits for extraction to complete, then POSTs the live transform_id plus expected.json to /api/v1/golden/score and records the report.
graphora test <corpus-path> [options]

| Option | Description |
| --- | --- |
| --min-f1 | Minimum F1 score required for exit 0. Applied to both nodes.f1 and edges.f1 of each entry. Default 0.0 makes the runner informational |
| --output, -o | Write full per-doc report set + summary to a JSON file |
| --json | Emit aggregated report set to stdout as JSON (instead of the Rich table) |

graphora test golden/                        # informational
graphora test golden/ --min-f1 0.6           # gate at F1 >= 0.6
graphora test golden/ --json > report.json
graphora test golden/ -o report.json --min-f1 0.5
Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | Every doc’s nodes.f1 and edges.f1 met --min-f1 |
| 1 | At least one doc fell below the threshold |
| 2 | Configuration / corpus problem (no docs, missing credentials, etc.) |

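
The gating rule behind these exit codes can be sketched in a few lines. This is an illustration of the documented behavior, not the runner's source: the report shape (nested nodes.f1 and edges.f1 values) is inferred from the option description, and exit_code is a hypothetical name. Only the "no docs" case of exit code 2 is modeled.

```python
from typing import Dict, List

# Assumed shape: {"nodes": {"f1": 0.8}, "edges": {"f1": 0.7}}
Report = Dict[str, Dict[str, float]]


def exit_code(reports: List[Report], min_f1: float = 0.0) -> int:
    """Map per-doc reports to the documented exit codes 0, 1, and 2."""
    if not reports:
        return 2  # an empty corpus counts as a configuration / corpus problem
    for report in reports:
        # Both node and edge F1 must meet the threshold for every doc.
        if report["nodes"]["f1"] < min_f1 or report["edges"]["f1"] < min_f1:
            return 1
    return 0
```

With the default threshold of 0.0 no report can fall below the gate, which matches the runner being informational unless --min-f1 is set.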
Each subdirectory under <corpus-path> must contain document.txt, ontology.yaml, and expected.json to count as a corpus entry. Other subdirectories are skipped silently.
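
Because incomplete subdirectories are skipped silently, a quick check of which entries would actually count can prevent a misleadingly small test run. The sketch below assumes that only immediate subdirectories of the corpus path are considered; corpus_entries is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# The three files every corpus entry must contain.
REQUIRED = ("document.txt", "ontology.yaml", "expected.json")


def corpus_entries(corpus_path: str) -> List[Path]:
    """Return the subdirectories that contain every required corpus file."""
    root = Path(corpus_path)
    return sorted(
        d for d in root.iterdir()
        if d.is_dir() and all((d / name).is_file() for name in REQUIRED)
    )
```

An empty result corresponds to the "no docs" case that the runner reports with exit code 2.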

MCP integration

graphora install

Wire Graphora’s MCP server into an agent client’s config so the agent can call graphora-mcp tools directly.
graphora install <client> [options]

| Argument | Description |
| --- | --- |
| client | One of: claude-code, claude-desktop, codex, cursor, vscode |

| Option | Description |
| --- | --- |
| --api-url | Graphora API URL the MCP server will talk to. Falls back to graphora config init / GRAPHORA_API_URL |
| --auth-token | Auth token for the API. Falls back to config / GRAPHORA_AUTH_TOKEN. Pass an empty string (or skip) when the server has auth bypass enabled |
| --command | Path to the graphora-mcp executable (default: unqualified graphora-mcp, relying on PATH). Use an absolute path when your agent client launches from a shell that doesn’t include the install directory |
| --workspace | Workspace root for workspace-relative configs (cursor / vscode / codex / claude-code). Defaults to the current directory. Ignored for claude-desktop, which is user-global |
| --force | Overwrite the existing config file instead of merging the graphora entry into mcpServers. Destructive: wipes other MCP server entries |
| --dry-run | Print the resulting config to stdout without writing |

# Workspace-relative install for Cursor
graphora install cursor

# User-global install for Claude Desktop
graphora install claude-desktop

# Preview without writing
graphora install cursor --dry-run

# One-off against a local API
graphora install cursor \
  --api-url http://localhost:8000 \
  --auth-token "$JWT"
The cursor, vscode, codex, and claude-code clients write to the workspace; claude-desktop writes to the user-global config. Existing mcpServers entries are preserved unless you pass --force.
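
The difference between the default merge and --force can be sketched on plain dictionaries. This is an illustration only: the fields inside the graphora entry (such as command) are assumptions, apply_install is a hypothetical name, and the real command may preserve or rewrite other top-level keys differently.

```python
import copy
from typing import Any, Dict


def apply_install(existing: Dict[str, Any], entry: Dict[str, Any], force: bool = False) -> Dict[str, Any]:
    """Return the config that would be written, leaving `existing` untouched."""
    if force:
        # Overwrite: only the graphora entry survives.
        return {"mcpServers": {"graphora": entry}}
    # Default: merge the graphora entry alongside whatever is already there.
    merged = copy.deepcopy(existing)
    merged.setdefault("mcpServers", {})["graphora"] = entry
    return merged
```

Returning a new dictionary rather than mutating the input mirrors what --dry-run needs: the result can be printed without touching the file on disk.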

Maintenance

graphora update

Update the embedded-mode graphora-api binary by re-downloading from GitHub.
graphora update
graphora update --channel nightly
graphora update --version v1.2.0
graphora update --force                # re-download even if up to date

| Option | Description |
| --- | --- |
| --channel, -c | stable (default), nightly, or main |
| --version, -v | Specific version to install (e.g., v1.2.0) |
| --force, -f | Force re-download even when up to date |

graphora version

graphora version
Prints the installed graphora CLI version. Equivalent to graphora --version.

Embedded vs remote mode

| Mode | What runs locally | Requirements |
| --- | --- | --- |
| embedded (default) | The full extraction pipeline; graphora-api runs in-process | Gemini API key only |
| remote | Just the client; extraction runs on the server | API URL + auth token |

Embedded mode is recommended for one-off extractions, CI pipelines, and evaluation. No database, Redis, or external services needed; graphora-api is auto-downloaded on first use and cached at ~/.graphora/api/.

Remote mode is required for the explain, diff, scenario, and test commands, since those depend on server-side state (transform IDs, scenario storage, the golden-scoring endpoint). Configure it once with graphora config init --mode remote.

Environment variables

| Variable | Purpose |
| --- | --- |
| GEMINI_API_KEY | LLM API key for embedded mode |
| GRAPHORA_API_KEY | Alternative to GEMINI_API_KEY (used by extract / schema infer) |
| GRAPHORA_API_URL | Base URL for remote-mode commands |
| GRAPHORA_AUTH_TOKEN | Clerk-issued bearer token for remote mode |

Precedence for --api-url / --auth-token: explicit flag > environment variable > ~/.graphora/config.yaml.
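
The precedence rule can be expressed as a short lookup chain. The sketch below is an illustration, not the CLI's code; resolve is a hypothetical name. It treats an explicitly passed empty string as a real value, which is an assumption based on the note that an empty auth token may be passed when auth bypass is enabled.

```python
import os
from typing import Mapping, Optional


def resolve(
    flag: Optional[str],
    env_name: str,
    config_value: Optional[str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first value that is set, in order: flag, environment, config file."""
    if flag is not None:
        return flag
    if env_name in environ:
        return environ[env_name]
    return config_value
```

Passing environ explicitly keeps the function easy to test; in normal use the default reads the process environment.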

Python client library

The CLI is built on top of the Python client. For programmatic access:
from graphora import GraphoraClient

client = GraphoraClient()

# Register the ontology from its YAML source
with open("ontology.yaml") as f:
    ontology = client.register_ontology(f.read())

# Start a transform that uses the registered ontology
transform = client.transform(ontology_id=ontology.id, files=["document.pdf"])

# Wait for extraction to finish, then fetch the resulting graph
status = client.wait_for_transform(transform.id)
graph = client.get_transformed_graph(transform_id=transform.id)
print(f"Extracted {len(graph.nodes)} nodes and {len(graph.edges)} edges")
See the Client Library documentation for the full API surface.