Document Transformation
Document transformation is the process of converting unstructured documents into structured graph data based on your ontology. This is a core capability of Graphora that allows you to extract meaningful entities and relationships from your documents.
The Transformation Process
When you submit documents to Graphora for transformation, the following steps occur:
Entity Extraction
Entities defined in your ontology are identified and extracted from the documents.
Relationship Extraction
Relationships between entities are identified based on the ontology definitions.
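
To make the two steps above concrete, here is a minimal sketch of what extracted entities and relationships could look like as data, and how they relate to an ontology. The `Entity` and `Relationship` classes, the `ONTOLOGY` layout, and the `conforms` helper are all hypothetical illustrations, not Graphora's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    type: str   # must be an entity type named in the ontology
    name: str


@dataclass(frozen=True)
class Relationship:
    type: str    # must be a relationship type named in the ontology
    source: str  # id of the source entity
    target: str  # id of the target entity


# Hypothetical ontology: the allowed entity types, and for each relationship
# type the (source type, target type) pair it may connect.
ONTOLOGY = {
    "entities": {"Person", "Company"},
    "relationships": {"WORKS_AT": ("Person", "Company")},
}


def conforms(entities, relationships, ontology=ONTOLOGY):
    """Return True if every entity and relationship fits the ontology."""
    by_id = {e.id: e for e in entities}
    if any(e.type not in ontology["entities"] for e in entities):
        return False
    for r in relationships:
        expected = ontology["relationships"].get(r.type)
        src, dst = by_id.get(r.source), by_id.get(r.target)
        if expected is None or src is None or dst is None:
            return False
        if (src.type, dst.type) != expected:
            return False
    return True


# The sentence "Ada Lovelace works at Analytical Engines Ltd." might yield:
entities = [
    Entity("e1", "Person", "Ada Lovelace"),
    Entity("e2", "Company", "Analytical Engines Ltd."),
]
relationships = [Relationship("WORKS_AT", "e1", "e2")]
```

In this sketch, entity extraction produces the `entities` list and relationship extraction produces the `relationships` list; the ontology constrains which types may appear in each.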
Supported Document Types
Graphora supports a variety of document types for transformation:
- Text files (.txt)
- PDF documents (.pdf)
- PPT documents (.ppt)
- PPTX documents (.pptx)
- Word documents (.doc)
- Word documents (.docx)
- CSV files (.csv)
- JSON files (.json)
- YAML files (.yaml)
- XML files (.xml)
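
Before submitting a batch, it can help to separate files with a listed extension from the rest. The sketch below does this locally using only the Python standard library. The `split_by_support` helper is hypothetical and not part of the Graphora client, and it assumes matching on the file extension alone is sufficient.

```python
from pathlib import Path

# Extensions taken from the list above. Variants that are not listed
# (for example .yml) are treated as unsupported in this sketch.
SUPPORTED_EXTENSIONS = {
    ".txt", ".pdf", ".ppt", ".pptx", ".doc",
    ".docx", ".csv", ".json", ".yaml", ".xml",
}


def split_by_support(paths):
    """Partition paths into (supported, unsupported) by file extension."""
    supported, unsupported = [], []
    for path in map(Path, paths):
        # Compare case-insensitively so "report.PDF" counts as a PDF.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


ok, skipped = split_by_support(
    ["report.PDF", "notes.txt", "photo.png", "data.yml"]
)
```

Here `ok` holds the paths worth submitting, and `skipped` can be logged or converted to a supported format first.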
Transformation in the Client Library
The Graphora client library provides methods for transforming documents and monitoring the transformation process:
Transforming Documents
Checking Transformation Status
Retrieving the Transformed Graph
Transformation Stages
The transformation process goes through several stages, which you can monitor through the stage_progress field in the transformation status:
| Stage | Description |
|---|---|
| UPLOAD | Documents are being uploaded to the platform |
| PARSING | Documents are being parsed into a processable format |
| EXTRACTION | Entities and relationships are being extracted |
| VALIDATION | Extracted data is being validated against the ontology |
| INDEXING | Data is being indexed for efficient querying |
| COMPLETED | Transformation is complete |
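
The stages in the table can be modeled as an enumeration and polled until `COMPLETED` is reached. This is a sketch under stated assumptions: `fetch_stage` is a hypothetical callable that returns the current stage name as a string, and deriving that name from the `stage_progress` field is left to the caller, since the exact shape of that field is not shown here.

```python
import time
from enum import Enum


class Stage(Enum):
    # Names and order follow the table above.
    UPLOAD = "UPLOAD"
    PARSING = "PARSING"
    EXTRACTION = "EXTRACTION"
    VALIDATION = "VALIDATION"
    INDEXING = "INDEXING"
    COMPLETED = "COMPLETED"


def wait_until_completed(fetch_stage, poll_interval=2.0, timeout=600.0,
                         sleep=time.sleep):
    """Poll fetch_stage() until COMPLETED; return the stages observed.

    fetch_stage is a hypothetical callable returning a stage name.
    An unknown stage name raises ValueError via the Stage constructor.
    """
    deadline = time.monotonic() + timeout
    seen = []
    while True:
        stage = Stage(fetch_stage())
        # Record each stage once, even if it is reported on several polls.
        if not seen or seen[-1] is not stage:
            seen.append(stage)
        if stage is Stage.COMPLETED:
            return seen
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in stage {stage.name} after {timeout}s")
        sleep(poll_interval)


# Simulated status source standing in for real status calls.
fake_stages = iter(["UPLOAD", "PARSING", "PARSING", "EXTRACTION",
                    "VALIDATION", "INDEXING", "COMPLETED"])
history = wait_until_completed(lambda: next(fake_stages),
                               sleep=lambda _seconds: None)
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in practice the default `time.sleep` and a poll interval of a few seconds would be a reasonable starting point.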
Handling Transformation Errors
If a transformation fails, you can check the error field in the transformation status:
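
One way to act on that field is to turn a reported error into an exception, so failures cannot be silently ignored. This is a sketch only: `TransformationFailed`, `get_error`, and `raise_if_failed` are hypothetical helpers, and the status is assumed to be either a dict with an `error` key or an object with an `error` attribute.

```python
class TransformationFailed(RuntimeError):
    """Raised when a transformation status reports an error."""


def get_error(status):
    """Read the error field from a dict-like or attribute-style status."""
    if isinstance(status, dict):
        return status.get("error")
    return getattr(status, "error", None)


def raise_if_failed(status):
    """Return the status unchanged, or raise if it carries an error."""
    error = get_error(status)
    if error:
        raise TransformationFailed(str(error))
    return status


# Simulated status values standing in for a real status response.
failed_status = {"error": "simulated: document could not be parsed"}
try:
    raise_if_failed(failed_status)
except TransformationFailed as exc:
    message = str(exc)
```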
Best Practices for Document Transformation
To get the best results from document transformation:
- Ensure your ontology is well-defined: The quality of extraction depends on your ontology
- Use appropriate document types: Different document types may yield different results
- Monitor transformation progress: Long documents may take time to process
- Handle errors gracefully: Check for and handle transformation errors
- Clean up after completion: Use cleanup_transform to remove temporary files
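
The last two practices can be combined so that cleanup runs even when result handling raises. The sketch below wraps the work in a context manager. It assumes `cleanup_transform` accepts the transformation identifier as its only argument, which is not confirmed here; `FakeClient` and the `t-123` identifier are stand-ins used only to keep the example runnable.

```python
from contextlib import contextmanager


@contextmanager
def cleaned_up(client, transform_id):
    """Call client.cleanup_transform(transform_id) on exit, even on error."""
    try:
        yield transform_id
    finally:
        # Assumed signature: cleanup_transform(transform_id).
        client.cleanup_transform(transform_id)


class FakeClient:
    """Stand-in for a real client; records which transforms were cleaned."""

    def __init__(self):
        self.cleaned = []

    def cleanup_transform(self, transform_id):
        self.cleaned.append(transform_id)


client = FakeClient()
try:
    with cleaned_up(client, "t-123"):
        raise RuntimeError("simulated failure while handling results")
except RuntimeError:
    pass  # The error still propagates; cleanup has already run.
```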
Example: Complete Transformation Workflow
Here’s a complete example of transforming documents and handling the results:
Next Steps
- Learn about Merging extracted data
- Explore the Graph data model
- Check out the API Reference for detailed information
