Configuration

Graphora is configured through environment variables. Copy .env.example to .env and set the values for your environment.
cp .env.example .env

Minimal Configuration

For local development with zero dependencies, you only need three variables:
STORAGE_TYPE=memory
AUTH_BYPASS_ENABLED=true
GOOGLE_GEMINI_API_KEY=your-key
Everything else has sensible defaults.

Storage

| Variable | Default | Description |
| --- | --- | --- |
| STORAGE_TYPE | neo4j | Graph storage backend. memory for local dev, neo4j for production. |
| STORAGE_BATCH_SIZE | 1000 | Batch size for storage operations. |
| STORAGE_RETRIES | 3 | Number of retries for storage operations. |
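
To illustrate how these three settings and their defaults fit together, here is a minimal Python sketch. `StorageSettings` and `load_storage_settings` are hypothetical names used only for this example, not Graphora's actual code; the sketch also assumes that memory and neo4j are the only accepted backends.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class StorageSettings:
    """Hypothetical container for the storage variables listed above."""
    storage_type: str
    batch_size: int
    retries: int


def load_storage_settings(env: Mapping[str, str] = os.environ) -> StorageSettings:
    """Read the storage variables, falling back to the documented defaults."""
    storage_type = env.get("STORAGE_TYPE", "neo4j")
    # Assumption: only the two backends named in the docs are valid.
    if storage_type not in {"memory", "neo4j"}:
        raise ValueError(f"Unsupported STORAGE_TYPE: {storage_type!r}")
    return StorageSettings(
        storage_type=storage_type,
        batch_size=int(env.get("STORAGE_BATCH_SIZE", "1000")),
        retries=int(env.get("STORAGE_RETRIES", "3")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.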

Neo4j (Production)

When STORAGE_TYPE=neo4j, configure the Neo4j connection through the Graphora UI Config page or via Supabase database_configs entries. The API supports staging and production database environments.

In-Memory (Development)

When STORAGE_TYPE=memory, all graph data is stored in the running process. No database setup is required. Data is lost when the server stops.

LLM Providers

Configure at least one LLM provider for AI-powered extraction.
| Variable | Description |
| --- | --- |
| GOOGLE_GEMINI_API_KEY | Google AI Studio API key for Gemini models |
| OPENAI_API_KEY | OpenAI API key |
| ANTHROPIC_API_KEY | Anthropic API key |
| DEEPSEEK_API_KEY | DeepSeek API key |
| DEEPSEEK_BASE_URL | DeepSeek API base URL (default: https://api.deepseek.com) |
| VERTEXAI_PROJECT_ID | Google Cloud project ID for Vertex AI |
| VERTEXAI_LOCATION | Vertex AI region (default: us-central1) |
| VERTEXAI_DEFAULT_MODEL | Default Vertex AI model (default: gemini-2.5-flash-lite-001) |
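
The "at least one provider" requirement can be checked before starting the server. The sketch below is one way to do that in Python; the function names and the short provider labels are hypothetical, and treating VERTEXAI_PROJECT_ID as the signal that Vertex AI is configured is an assumption.

```python
import os
from typing import List, Mapping

# Variable that marks each provider as configured.
# Assumption: Vertex AI counts as configured once its project ID is set.
PROVIDER_VARS = {
    "gemini": "GOOGLE_GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "vertexai": "VERTEXAI_PROJECT_ID",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose variable is set to a non-empty value."""
    return [name for name, var in PROVIDER_VARS.items() if env.get(var, "").strip()]


def require_provider(env: Mapping[str, str] = os.environ) -> List[str]:
    """Raise if no LLM provider is configured; otherwise return the list."""
    providers = configured_providers(env)
    if not providers:
        raise RuntimeError(
            "No LLM provider configured; set one of: "
            + ", ".join(PROVIDER_VARS.values())
        )
    return providers
```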

Authentication

| Variable | Default | Description |
| --- | --- | --- |
| AUTH_BYPASS_ENABLED | false | Skip Clerk authentication for local dev. Never enable in production. |
| AUTH_BYPASS_USER_ID | local-dev-user | User ID used when auth bypass is enabled. |
| AUTH_BYPASS_EMAIL | dev@localhost | Email used when auth bypass is enabled. |
| CLERK_JWKS_URL | | URL to Clerk JWKS for token verification. Required in production. |
| CLERK_ISSUER | | Expected issuer for Clerk tokens. |
| CLERK_AUDIENCE | | Expected audience for Clerk tokens. |
| CLERK_API_KEY | | Clerk backend API key for management operations. |
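
Because enabling the bypass in production would disable authentication entirely, a pre-deploy check can be worthwhile. The following Python sketch is an illustration only: `auth_problems` and its `production` flag are hypothetical, and the set of strings accepted as "true" is an assumption rather than Graphora's documented parsing.

```python
import os
from typing import List, Mapping


def parse_bool(value: str) -> bool:
    """Assumption: these spellings count as true; everything else is false."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def auth_problems(env: Mapping[str, str] = os.environ, production: bool = False) -> List[str]:
    """Return human-readable problems with the auth settings for a deployment."""
    problems = []
    bypass = parse_bool(env.get("AUTH_BYPASS_ENABLED", "false"))
    if production and bypass:
        problems.append("AUTH_BYPASS_ENABLED must not be true in production")
    if production and not env.get("CLERK_JWKS_URL", "").strip():
        problems.append("CLERK_JWKS_URL is required in production")
    return problems
```

An empty list means the settings pass both checks; a deployment script could refuse to start when the list is non-empty.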

API Server

| Variable | Default | Description |
| --- | --- | --- |
| API_PORT | 8000 | Port for the API server. |
| PUBLIC_API_URL | http://localhost:8000 | Base URL exposed to clients and other apps. |
| CORS_ORIGINS | http://localhost:3000,http://127.0.0.1:3000 | Comma-separated list of allowed CORS origins. |
| LOG_LEVEL | INFO | Application log level (DEBUG, INFO, WARNING, ERROR). |
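
CORS_ORIGINS is a single string, so it has to be split before it can be used as a list. A small Python sketch of that parsing follows; `parse_cors_origins` is a hypothetical helper, and trimming whitespace around entries is an assumption about how lenient the real parser is.

```python
from typing import List

DEFAULT_CORS_ORIGINS = "http://localhost:3000,http://127.0.0.1:3000"


def parse_cors_origins(raw: str = DEFAULT_CORS_ORIGINS) -> List[str]:
    """Split a comma-separated origin list, dropping blanks and stray spaces."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```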

Redis

Redis is used for progress tracking during document transformations. It is optional — when unavailable, the API falls back to an in-memory store automatically.
| Variable | Default | Description |
| --- | --- | --- |
| REDIS_HOST | localhost | Redis host. |
| REDIS_PORT | 6379 | Redis port. |
| REDIS_DB | 0 | Redis database number. |
| REDIS_PASSWORD | | Redis password (optional). |
| CACHE_TTL_HOURS | 24 | Cache TTL in hours. |
| REDIS_RATE_LIMIT_DB | 1 | Separate Redis database for rate limiting. |
When Redis is unavailable, the progress tracker logs a warning and uses in-memory storage. This is expected behavior for local development without Docker.
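
The fallback described above follows a common pattern: try the remote store, and on a connection failure log a warning and use a local one. The Python sketch below shows that pattern with the standard library only. All names are hypothetical, the `connect` callable stands in for whatever creates the real Redis-backed store, and which exception a real Redis client raises depends on the client library.

```python
import logging
from typing import Callable, Dict, Optional, Protocol

logger = logging.getLogger("progress")


class ProgressStore(Protocol):
    """Minimal interface a progress store is assumed to offer."""
    def set(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class InMemoryProgressStore:
    """Process-local store; contents are lost when the process exits."""
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def make_progress_store(connect: Callable[[], ProgressStore]) -> ProgressStore:
    """Use the store returned by `connect`, or fall back to memory on failure."""
    try:
        return connect()
    except OSError as exc:  # ConnectionError is a subclass of OSError
        logger.warning("Redis unavailable (%s); using in-memory progress store", exc)
        return InMemoryProgressStore()
```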

Prefect (Workflow Orchestration)

| Variable | Default | Description |
| --- | --- | --- |
| PREFECT_API_URL | http://127.0.0.1:4200/api | Prefect API URL. |
| PREFECT_API_KEY | | Prefect API key for authentication. |

Database (Postgres)

Postgres stores application configuration, audit logs, and AI provider settings.
| Variable | Default | Description |
| --- | --- | --- |
| DATABASE_URL | | Full PostgreSQL connection string. Takes precedence over individual vars. |
| POSTGRES_HOST | localhost | Postgres host. |
| POSTGRES_PORT | 5432 | Postgres port. |
| POSTGRES_DB | graphora | Postgres database name. |
| POSTGRES_USER | graphora | Postgres username. |
| POSTGRES_PASSWORD | graphora | Postgres password. |
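
The precedence rule can be pictured as follows: if DATABASE_URL is set it is used as-is, otherwise a URL is assembled from the individual variables and their defaults. This Python sketch uses a hypothetical `postgres_url` helper; percent-encoding the username and password is an assumption added so that special characters do not break the URL.

```python
import os
from typing import Mapping
from urllib.parse import quote


def postgres_url(env: Mapping[str, str] = os.environ) -> str:
    """Return DATABASE_URL if set, else build one from the POSTGRES_* variables."""
    url = env.get("DATABASE_URL", "").strip()
    if url:
        return url  # The full connection string takes precedence.
    host = env.get("POSTGRES_HOST", "localhost")
    port = env.get("POSTGRES_PORT", "5432")
    db = env.get("POSTGRES_DB", "graphora")
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote(env.get("POSTGRES_USER", "graphora"), safe="")
    password = quote(env.get("POSTGRES_PASSWORD", "graphora"), safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"
```

With no variables set, this yields postgresql://graphora:graphora@localhost:5432/graphora, the same connection string that the documented defaults describe.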

Document Processing

| Variable | Default | Description |
| --- | --- | --- |
| MAX_CHUNK_SIZE | 32000 | Maximum size of a text chunk (characters). |
| MIN_CHUNK_SIZE | 1000 | Minimum size of a text chunk. |
| MAX_CHUNKS_PER_DOC | 100 | Maximum number of chunks per document. |
| SEMANTIC_THRESHOLD | 0.7 | Threshold for semantic similarity in chunking. |
| EMBEDDING_MODEL | sentence-transformers/all-mpnet-base-v2 | Model for text embeddings. |
| TRANSFORM_MAX_CONCURRENCY | 4 | Max concurrent LLM extractions per transform. |
| EXTRACTION_CONCURRENCY | 5 | Concurrency for extraction tasks. |
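
A concurrency cap such as TRANSFORM_MAX_CONCURRENCY is commonly enforced with a semaphore. The Python sketch below shows that general technique; it is not taken from Graphora's code, and `run_bounded` and `worker` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = 4,
) -> List[R]:
    """Run `worker` over `items` with at most `limit` calls in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(item) for item in items))
```

A caller would pass an async function that performs one extraction, for example `asyncio.run(run_bounded(chunks, extract, limit=4))`, where `chunks` and `extract` are placeholders.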

Quality Validation

| Variable | Default | Description |
| --- | --- | --- |
| QUALITY_MIN_SCORE | 85.0 | Minimum score required for auto-approval. |
| QUALITY_FAIL_SCORE | 70.0 | Minimum score to proceed; below this the transform fails. |
| QUALITY_FAIL_ON_VIOLATION | true | Fail the transform when violations are present and auto-approval is off. |
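
The three settings interact, so it can help to see one possible reading of them as a decision function. The Python sketch below is an interpretation of the descriptions above, not Graphora's actual logic: the outcome labels, the `auto_approve` flag, and the order of the checks are all assumptions.

```python
def quality_decision(
    score: float,
    has_violations: bool,
    auto_approve: bool,
    min_score: float = 85.0,        # QUALITY_MIN_SCORE
    fail_score: float = 70.0,       # QUALITY_FAIL_SCORE
    fail_on_violation: bool = True  # QUALITY_FAIL_ON_VIOLATION
) -> str:
    """Return 'fail', 'approve', or 'review' under one plausible reading."""
    if score < fail_score:
        return "fail"  # Below the floor, the transform fails outright.
    if has_violations and not auto_approve and fail_on_violation:
        return "fail"  # Violations fail the transform when auto-approval is off.
    if auto_approve and score >= min_score:
        return "approve"  # High enough for auto-approval.
    return "review"  # Assumed outcome for everything in between.
```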

Entity Resolution

| Variable | Default | Description |
| --- | --- | --- |
| ENTITY_RESOLUTION_EMBEDDING_ENABLED | true | Enable embedding-based semantic similarity matching. |
| ENTITY_RESOLUTION_EMBEDDING_MODEL | all-MiniLM-L6-v2 | Model for entity resolution embeddings. |
| ENTITY_RESOLUTION_SIMILARITY_THRESHOLD | 0.85 | Minimum similarity threshold for entity matching. |
| ENTITY_RESOLUTION_CROSS_DOCUMENT_ENABLED | true | Enable cross-document entity linking. |
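
To give a feel for what a 0.85 threshold means, the Python sketch below compares two embedding vectors and applies the threshold. It assumes the similarity measure is cosine similarity, which the settings above do not state, and uses hypothetical function names and toy vectors rather than real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_same_entity(a: Sequence[float], b: Sequence[float], threshold: float = 0.85) -> bool:
    """Treat two embeddings as the same entity when similarity meets the threshold."""
    return cosine_similarity(a, b) >= threshold
```

Raising the threshold makes matching stricter, which leads to fewer merged entities; lowering it merges more aggressively.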

Security

| Variable | Default | Description |
| --- | --- | --- |
| ENCRYPTION_MASTER_KEY | | Master encryption key for password encryption (base64 encoded). |
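
A base64-encoded key can be produced with Python's standard library, as sketched below. The 32-byte length is an assumption, since the required key size is not stated here; check it against your deployment before relying on a generated value. Both function names are hypothetical.

```python
import base64
import binascii
import secrets


def generate_master_key(num_bytes: int = 32) -> str:
    """Return random bytes encoded as base64 text (key size is an assumption)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def is_valid_base64(value: str) -> bool:
    """Check that a non-empty value decodes as strict base64."""
    if not value:
        return False
    try:
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```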

File Paths

| Variable | Default | Description |
| --- | --- | --- |
| UPLOAD_DIR | /tmp/graphora/uploads | Directory for uploaded files. |
| ONTOLOGY_DIR | ~/.graphora/ontologies | Directory for stored ontologies. |
| LOG_DIR | /tmp/graphora/logs | Directory for log files. |
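
Note that the ONTOLOGY_DIR default starts with ~, which has to be expanded to the user's home directory before use. The Python sketch below shows one way to resolve these paths; `resolve_dir` is a hypothetical helper, and whether Graphora expands ~ itself is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping

# Defaults from the table above.
DEFAULT_DIRS = {
    "UPLOAD_DIR": "/tmp/graphora/uploads",
    "ONTOLOGY_DIR": "~/.graphora/ontologies",
    "LOG_DIR": "/tmp/graphora/logs",
}


def resolve_dir(name: str, env: Mapping[str, str] = os.environ) -> Path:
    """Return the configured directory for `name`, with a leading ~ expanded."""
    return Path(env.get(name, DEFAULT_DIRS[name])).expanduser()
```

A startup script could then call `resolve_dir("UPLOAD_DIR").mkdir(parents=True, exist_ok=True)` to make sure the directory exists.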

Example Configurations

Local Development (Minimal)

STORAGE_TYPE=memory
AUTH_BYPASS_ENABLED=true
GOOGLE_GEMINI_API_KEY=your-key

Local Development (Full Stack)

STORAGE_TYPE=neo4j
AUTH_BYPASS_ENABLED=true
GOOGLE_GEMINI_API_KEY=your-key
REDIS_HOST=127.0.0.1
REDIS_PORT=6379
PREFECT_API_URL=http://127.0.0.1:4200/api
DATABASE_URL=postgresql://graphora:graphora@localhost:5432/graphora

Production

STORAGE_TYPE=neo4j
AUTH_BYPASS_ENABLED=false
GOOGLE_GEMINI_API_KEY=your-key
CLERK_JWKS_URL=https://your-clerk-instance/.well-known/jwks.json
CLERK_ISSUER=https://your-clerk-instance
CLERK_AUDIENCE=your-audience
REDIS_HOST=your-redis-host
REDIS_PASSWORD=your-redis-password
DATABASE_URL=postgresql://user:pass@host:5432/graphora
ENCRYPTION_MASTER_KEY=your-base64-key
LOG_LEVEL=WARNING