Quality Validation
Quality validation is a critical component of Graphora’s knowledge graph generation pipeline. It ensures that extracted data meets your business requirements and maintains high standards before being merged into production databases.
Overview
Graphora’s quality validation system automatically evaluates extracted knowledge graphs against predefined rules and standards, providing comprehensive feedback about data quality, completeness, and correctness.
Key Benefits
- Automated Quality Assessment: Systematic evaluation of extracted data
- Comprehensive Scoring: 0-100 scale with letter grades (A-F)
- Detailed Violation Reports: Entity and property-level feedback
- Business Rule Validation: Custom domain-specific checks
- Quality-Driven Workflows: Approval/rejection based on quality thresholds
Quality Metrics
Overall Quality Score
The system provides an overall quality score from 0-100 with corresponding letter grades:
- A (90-100): Excellent quality, ready for production
- B (80-89): Good quality, minor issues acceptable
- C (70-79): Average quality, review recommended
- D (60-69): Below average, significant issues present
- F (0-59): Poor quality, requires attention
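The grade bands above amount to a simple threshold lookup. The following is a minimal Python sketch, not Graphora code: the function name `letter_grade` is hypothetical, and it assumes that a fractional score (such as 89.5) falls into the band whose lower bound it reaches.

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 quality score to the A-F bands (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    # Check bands from highest to lowest; the first lower bound reached wins.
    for lower_bound, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= lower_bound:
            return grade
    return "F"


for s in (95, 89.5, 72, 60, 12):
    print(s, letter_grade(s))
```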
Detailed Metrics
Quality Rule Types
Format Rules
Validate data types, formats, and structural requirements:
- Data Type Validation: Ensure properties match expected types (string, integer, date)
- Format Compliance: Validate patterns like email addresses, phone numbers, URLs
- Range Validation: Check numeric values fall within acceptable ranges
- Length Constraints: Verify string lengths meet requirements
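The four kinds of format rule can be checked on a single property value roughly as follows. This is an illustrative Python sketch under assumptions: `check_format` and its keyword parameters are hypothetical and do not reflect how Graphora encodes format rules, and the email pattern is deliberately loose.

```python
import re


def check_format(value, expected_type, *, pattern=None, min_value=None,
                 max_value=None, min_length=None, max_length=None):
    """Return a list of problems for one property value; empty means it passed."""
    # Data type validation: stop early, because the other checks assume the type.
    if not isinstance(value, expected_type):
        return [f"expected {expected_type.__name__}, got {type(value).__name__}"]
    problems = []
    is_text = isinstance(value, str)
    is_number = isinstance(value, (int, float))
    # Format compliance: regular-expression patterns apply to strings only.
    if pattern is not None and is_text and re.fullmatch(pattern, value) is None:
        problems.append(f"{value!r} does not match the required pattern")
    # Range validation for numeric values.
    if is_number and min_value is not None and value < min_value:
        problems.append(f"{value} is below the minimum {min_value}")
    if is_number and max_value is not None and value > max_value:
        problems.append(f"{value} is above the maximum {max_value}")
    # Length constraints for strings.
    if is_text and min_length is not None and len(value) < min_length:
        problems.append(f"length {len(value)} is below the minimum {min_length}")
    if is_text and max_length is not None and len(value) > max_length:
        problems.append(f"length {len(value)} is above the maximum {max_length}")
    return problems


# Deliberately loose email pattern, for illustration only.
EMAIL = r"[^@\s]+@[^@\s]+\.[^@\s]+"

print(check_format("ada@example.com", str, pattern=EMAIL, max_length=254))
print(check_format("not-an-email", str, pattern=EMAIL))
print(check_format(150, int, min_value=0, max_value=130))
print(check_format("42", int))
```

Returning early on a type mismatch keeps the report focused on the root cause instead of listing follow-on failures.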
Business Rules
Domain-specific validation based on your business logic:
- Uniqueness Constraints: Ensure unique identifiers are truly unique
- Relationship Validation: Verify relationships make business sense
- Cross-Entity Validation: Check consistency across related entities
- Custom Logic: Apply domain-specific business rules
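Two of these business rules, uniqueness and relationship validation, can be sketched over plain dictionaries. The entity and relationship shapes below, the `ticker` property, and the `WORKS_AT` relation are hypothetical examples, not Graphora's internal data model.

```python
from collections import Counter


def duplicate_values(entities, entity_type, prop):
    """Uniqueness constraint: values of `prop` shared by more than one entity of `entity_type`."""
    counts = Counter(
        e["properties"][prop]
        for e in entities
        if e["type"] == entity_type and prop in e["properties"]
    )
    return sorted(value for value, n in counts.items() if n > 1)


def invalid_relationships(relationships, allowed):
    """Relationship validation: relationships whose (source type, relation, target type) is not allowed."""
    return [
        r for r in relationships
        if (r["source_type"], r["type"], r["target_type"]) not in allowed
    ]


entities = [
    {"id": "c1", "type": "Company", "properties": {"ticker": "ACME"}},
    {"id": "c2", "type": "Company", "properties": {"ticker": "ACME"}},
    {"id": "c3", "type": "Company", "properties": {"ticker": "INIT"}},
]
allowed = {("Person", "WORKS_AT", "Company")}
relationships = [
    {"source_type": "Person", "type": "WORKS_AT", "target_type": "Company"},
    {"source_type": "Company", "type": "WORKS_AT", "target_type": "Person"},
]

print(duplicate_values(entities, "Company", "ticker"))
print(invalid_relationships(relationships, allowed))
```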
Completeness Rules
Ensure required information is present and complete:
- Required Properties: Validate all mandatory fields are populated
- Relationship Completeness: Check required relationships exist
- Coverage Analysis: Assess what percentage of expected data is present
- Missing Data Detection: Identify gaps in the knowledge graph
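The required-property check and coverage analysis can be sketched as follows. This is an assumption-laden illustration: it treats `None`, an empty string, and an empty list as "missing", and reports coverage as the share of expected (entity, property) slots that are filled. Graphora's own definitions may differ.

```python
# Values treated as "not populated" in this sketch (an assumption).
EMPTY = (None, "", [])


def missing_required(entity, required):
    """Required properties: names of mandatory properties that are absent or empty."""
    props = entity.get("properties", {})
    return [p for p in required if props.get(p) in EMPTY]


def coverage(entities, expected):
    """Coverage analysis: percentage of expected (entity, property) slots that are populated."""
    slots = len(entities) * len(expected)
    if slots == 0:
        return 100.0  # nothing was expected, so nothing is missing
    filled = sum(
        1
        for e in entities
        for p in expected
        if e.get("properties", {}).get(p) not in EMPTY
    )
    return 100.0 * filled / slots


people = [
    {"id": "p1", "properties": {"name": "Ada", "email": "ada@example.com"}},
    {"id": "p2", "properties": {"name": "Grace", "email": ""}},
]
print(missing_required(people[1], ["name", "email"]))
print(coverage(people, ["name", "email"]))
```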
Consistency Rules
Maintain data consistency across the knowledge graph:
- Cross-Reference Validation: Ensure referenced entities exist
- Temporal Consistency: Validate date and time relationships
- Hierarchical Consistency: Check parent-child relationships
- Constraint Validation: Enforce complex multi-entity constraints
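Cross-reference and temporal consistency checks can be sketched like this. The `source`/`target` keys, the `start_date`/`end_date` property names, and the relation types in the sample data are hypothetical.

```python
from datetime import date


def dangling_references(entities, relationships):
    """Cross-reference validation: relationships pointing at entity IDs that do not exist."""
    known = {e["id"] for e in entities}
    return [r for r in relationships if r["source"] not in known or r["target"] not in known]


def temporal_violations(entities, start="start_date", end="end_date"):
    """Temporal consistency: IDs of entities whose start date is after their end date."""
    bad = []
    for e in entities:
        s = e["properties"].get(start)
        t = e["properties"].get(end)
        if s is not None and t is not None and s > t:
            bad.append(e["id"])
    return bad


entities = [
    {"id": "e1", "properties": {"start_date": date(2020, 1, 1), "end_date": date(2019, 6, 30)}},
    {"id": "e2", "properties": {"start_date": date(2018, 3, 1), "end_date": date(2021, 3, 1)}},
]
relationships = [
    {"source": "e1", "target": "e2", "type": "SUCCEEDED_BY"},
    {"source": "e2", "target": "e3", "type": "PART_OF"},  # e3 does not exist
]
print(dangling_references(entities, relationships))
print(temporal_violations(entities))
```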
Quality Violations
Violation Severity Levels
Quality violations are categorized by severity:
Error (High Severity)
- Impact: Critical issues that prevent data usage
- Examples: Missing required properties, invalid data types, broken relationships
- Recommendation: Must be fixed before proceeding
Warning (Medium Severity)
- Impact: Issues that may affect data quality but don’t prevent usage
- Examples: Missing optional properties, format inconsistencies
- Recommendation: Should be reviewed and potentially fixed
Info (Low Severity)
- Impact: Minor issues or suggestions for improvement
- Examples: Optimization suggestions, style inconsistencies
- Recommendation: Optional improvements
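The three severity levels suggest a natural triage order. The Python sketch below orders violations errors-first and flags whether any error blocks further processing. The `Violation` fields are hypothetical and are not Graphora's violation schema.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Higher value means more severe, so a reverse sort puts errors first.
    INFO = 1
    WARNING = 2
    ERROR = 3


@dataclass
class Violation:
    rule_id: str
    entity_id: str
    message: str
    severity: Severity


def triage(violations):
    """Order violations errors-first and report whether any of them blocks usage."""
    # sorted() is stable, so violations of equal severity keep their original order.
    ordered = sorted(violations, key=lambda v: v.severity, reverse=True)
    blocking = any(v.severity is Severity.ERROR for v in violations)
    return ordered, blocking


found = [
    Violation("style", "p1", "inconsistent casing", Severity.INFO),
    Violation("required-name", "p2", "missing name", Severity.ERROR),
    Violation("email-format", "p3", "malformed email", Severity.WARNING),
]
ordered, blocking = triage(found)
print([v.rule_id for v in ordered], blocking)
```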
Violation Details
Each violation provides detailed, entity- and property-level information about the issue it reports.
Quality-Driven Workflows
Approval Thresholds
Configure quality thresholds for different approval workflows.
Feedback Loop
Quality validation supports continuous improvement through feedback:
- Approval with Comments: Explain acceptance decisions
- Rejection with Reasons: Detail what needs improvement
- Learning Comments: Provide context for future processing
- Rule Refinement: Adjust quality rules based on feedback
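Threshold-driven approval, combined with a reason that can feed the feedback loop, might look like the Python sketch below. The environment names, threshold values, and the `decide` function are hypothetical and are not Graphora configuration keys or defaults. The only rule taken from this page is that error-level violations must be fixed before proceeding.

```python
# Hypothetical per-environment thresholds, not Graphora configuration keys or defaults.
THRESHOLDS = {
    "staging": {"auto_approve": 80, "manual_review": 60},
    "production": {"auto_approve": 90, "manual_review": 75},
}


def decide(score, error_count, environment):
    """Return (decision, reason) for a validated extraction."""
    limits = THRESHOLDS[environment]
    if error_count > 0:
        # Error-level violations must be fixed before proceeding, whatever the score.
        return "reject", f"{error_count} error-level violation(s) must be fixed"
    if score >= limits["auto_approve"]:
        return "approve", f"score {score} meets the {environment} approval threshold"
    if score >= limits["manual_review"]:
        return "manual_review", f"score {score} needs human review in {environment}"
    return "reject", f"score {score} is below the {environment} review threshold"


print(decide(85, 0, "staging"))
print(decide(85, 0, "production"))
print(decide(95, 1, "production"))
```

Keeping the reason next to the decision makes it easy to record approval comments and rejection reasons for later rule refinement.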
Quality Configuration
Rule Definition in Ontology
Quality rules are defined directly in your ontology YAML.
Dynamic Rule Application
Quality rules are applied dynamically based on your ontology:
- Entity-Specific Rules: Applied only to specific entity types
- Property-Specific Rules: Target individual properties
- Global Rules: Applied across all entities
- Conditional Rules: Applied based on specific conditions
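Selecting which rules apply to a given entity can be sketched as a filter over rule scopes. The `Rule` fields and sample rules below are hypothetical and are not the structure of Graphora's ontology YAML. One design choice to note: a property-specific rule is skipped when the property is absent, on the assumption that missing properties are reported by completeness rules instead.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    rule_id: str
    check: Callable[[dict], bool]                        # True when the entity passes
    entity_type: Optional[str] = None                    # None: global rule
    prop: Optional[str] = None                           # set for property-specific rules
    condition: Optional[Callable[[dict], bool]] = None   # set for conditional rules


def applicable_rules(rules, entity):
    """Return the rules whose scope matches this entity."""
    selected = []
    for rule in rules:
        if rule.entity_type is not None and rule.entity_type != entity["type"]:
            continue
        # Absent properties are left to completeness rules in this sketch.
        if rule.prop is not None and rule.prop not in entity["properties"]:
            continue
        if rule.condition is not None and not rule.condition(entity):
            continue
        selected.append(rule)
    return selected


rules = [
    Rule("has-id", lambda e: bool(e.get("id"))),
    Rule("person-name", lambda e: bool(e["properties"]["name"].strip()),
         entity_type="Person", prop="name"),
    Rule("public-company-ticker", lambda e: "ticker" in e["properties"],
         entity_type="Company", condition=lambda e: e["properties"].get("is_public", False)),
]
person = {"id": "p1", "type": "Person", "properties": {"name": "Ada"}}
public_co = {"id": "c1", "type": "Company", "properties": {"is_public": True}}
failed = [r.rule_id for r in applicable_rules(rules, public_co) if not r.check(public_co)]
print([r.rule_id for r in applicable_rules(rules, person)], failed)
```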
Quality API Usage
Basic Quality Workflow
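The basic loop runs validation, scoring, grading, and then an approval or rejection with reasons. This self-contained Python sketch is illustrative only and does not call the Graphora Quality API. `run_quality_workflow` is hypothetical, and its scoring formula (share of applicable checks passed, scaled to 0-100) is an assumption, not Graphora's scoring method.

```python
def run_quality_workflow(entities, rules, approve_at=90):
    """Hypothetical pass: validate -> score -> grade -> approve or reject with reasons."""
    failures = []
    checks = 0
    for entity in entities:
        for rule_id, applies_to, check in rules:
            if applies_to not in (None, entity["type"]):  # None means a global rule
                continue
            checks += 1
            if not check(entity):
                failures.append(f"{entity['id']}: failed {rule_id}")
    # Assumed scoring: share of applicable checks that passed, scaled to 0-100.
    score = 100.0 if checks == 0 else 100.0 * (checks - len(failures)) / checks
    bands = ((90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F"))
    grade = next(g for bound, g in bands if score >= bound)
    decision = "approve" if score >= approve_at else "reject"
    return {"score": round(score, 1), "grade": grade, "decision": decision, "reasons": failures}


rules = [
    ("has-name", "Person", lambda e: bool(e["properties"].get("name"))),
    ("has-id", None, lambda e: bool(e["id"])),
]
entities = [
    {"id": "p1", "type": "Person", "properties": {"name": "Ada"}},
    {"id": "p2", "type": "Person", "properties": {}},
]
print(run_quality_workflow(entities, rules))
```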
Advanced Quality Analysis
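Deeper analysis often starts by finding recurring violation patterns, which point to rules or extraction steps worth refining. A minimal Python sketch, assuming violations are available as dictionaries with hypothetical `rule_id` and `entity_type` keys:

```python
from collections import Counter, defaultdict


def violation_patterns(violations, top_n=3):
    """Most frequent rule violations, with the entity types each one affects."""
    by_rule = Counter(v["rule_id"] for v in violations)
    types_by_rule = defaultdict(set)
    for v in violations:
        types_by_rule[v["rule_id"]].add(v["entity_type"])
    return [
        (rule_id, count, sorted(types_by_rule[rule_id]))
        for rule_id, count in by_rule.most_common(top_n)
    ]


violations = [
    {"rule_id": "email-format", "entity_type": "Person"},
    {"rule_id": "missing-name", "entity_type": "Person"},
    {"rule_id": "email-format", "entity_type": "Company"},
    {"rule_id": "email-format", "entity_type": "Person"},
]
print(violation_patterns(violations))
```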
Best Practices
Rule Design
- Start Simple: Begin with basic format and required field rules
- Iterate Gradually: Add business rules based on real data patterns
- Balance Strictness: Avoid overly restrictive rules that reject good data
- Document Rules: Provide clear descriptions for all quality rules
Quality Thresholds
- Environment-Specific: Use different thresholds for staging vs. production
- Data-Type Aware: Adjust thresholds based on data complexity
- Business Aligned: Set thresholds that match business requirements
- Monitor Trends: Track quality improvements over time
Violation Management
- Prioritize Errors: Always address error-level violations first
- Review Warnings: Assess warning-level violations for business impact
- Track Patterns: Identify recurring violation types for rule refinement
- Provide Feedback: Use detailed feedback to improve future processing
Continuous Improvement
- Regular Review: Periodically review quality rules and thresholds
- Feedback Analysis: Analyze approval/rejection patterns
- Rule Optimization: Refine rules based on real-world usage
- Performance Monitoring: Track quality validation performance
Integration with Other Concepts
Quality validation integrates seamlessly with other Graphora concepts:
With Ontology
- Quality rules are defined in your ontology YAML
- Rule enforcement depends on entity and property definitions
- Schema changes can trigger quality rule updates
With Transformation
- Quality validation runs automatically after transformation
- Quality results influence transformation retry decisions
- Poor quality can trigger automatic reprocessing
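The interaction between transformation and quality results can be pictured as a bounded retry loop. This Python sketch is an assumption about the general pattern, not Graphora's actual reprocessing logic: `transform` and `validate` are caller-supplied stand-ins, and `min_score` and `max_attempts` are hypothetical parameters.

```python
def transform_with_quality_retries(transform, validate, min_score=80, max_attempts=3):
    """Re-run a transformation until its quality score passes or attempts run out."""
    history = []
    for attempt in range(1, max_attempts + 1):
        graph = transform(attempt)
        score = validate(graph)
        history.append(score)
        if score >= min_score:
            return graph, history
    return None, history  # caller decides how to handle persistent poor quality


# Stand-ins: each attempt yields a graph whose score improves.
scores = [55, 72, 88]
graph, history = transform_with_quality_retries(
    transform=lambda attempt: {"attempt": attempt},
    validate=lambda g: scores[g["attempt"] - 1],
)
print(graph, history)
```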
With Merging
- Quality approval is required before merging
- Quality feedback informs conflict resolution strategies
- High-quality data reduces merge conflicts
With Graph Management
- Quality scores help prioritize graph maintenance
- Quality trends inform graph evolution decisions
- Quality violations guide data cleanup efforts
Next Steps
- Learn how to define quality rules in your ontology
- Explore the complete Quality API reference
- Review quality validation examples
- Understand how quality integrates with the merging process
