Ontology Examples

This section provides complete, real-world ontology examples that demonstrate different features and patterns. Each example includes entities, relationships, quality rules, and explanations of design decisions.

Financial Reporting Ontology

A comprehensive ontology for SEC Form 10-K financial documents with quality validation:

version: 1

qualityConfig:
  overallScoreThreshold: 80
  maxViolationsPerEntity: 5
  confidenceThreshold: 0.7
  distributionRules:
    maxDuplicateRatio: 0.3
    minUniqueRatio: 0.1

entities:
  Metadata:
    properties:
      name:
        type: string
        description: "Document identifier (e.g., Form13_2023)"
        required: true
        quality:
          format:
            pattern: "^Form[0-9]+_[0-9]{4}$"
            caseFormat: "titleCase"
      type:
        type: string
        description: "Document type classification"
        required: true
        quality:
          business:
            allowedValues: ["financial_report", "annual_report", "quarterly_report"]
      filingDate:
        type: string
        description: "Filing date in YYYY-MM-DD format"
        required: true
        quality:
          format:
            pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
      context:
        type: string
        description: "High-level document summary"
        index: true
        quality:
          format:
            minLength: 10
            maxLength: 500
    relationships:
      ABOUT_COMPANY:
        target: Company
        cardinality: manyToOne
        description: "Document subject company"

  Company:
    properties:
      name:
        type: string
        description: "Official company legal name"
        required: true
        unique: true
        index: true
        quality:
          format:
            pattern: "^[A-Z][a-zA-Z0-9\\s&.,-]+$"
            minLength: 2
            maxLength: 100
            caseFormat: "titleCase"
          business:
            forbiddenValues: ["Unknown Company", "N/A", "TBD", ""]
      cik:
        type: string
        description: "SEC Central Index Key (10-digit identifier)"
        unique: true
        quality:
          format:
            pattern: "^[0-9]{10}$"
      ticker:
        type: string
        description: "Stock exchange ticker symbol"
        unique: true
        index: true
        quality:
          format:
            pattern: "^[A-Z]{1,5}$"
            caseFormat: "upperCase"
      industry:
        type: string
        description: "Primary industry classification"
        required: true
        index: true
        quality:
          business:
            allowedValues:
              - "Technology"
              - "Healthcare"
              - "Financial Services"
              - "Manufacturing"
              - "Retail"
              - "Energy"
              - "Real Estate"
              - "Utilities"
              - "Telecommunications"
    relationships:
      HAS_BUSINESS:
        target: Business
        cardinality: oneToMany
      HAS_RISK_FACTOR:
        target: RiskFactor
        cardinality: oneToMany
      HAS_LEGAL_PROCEEDING:
        target: LegalProceeding
        cardinality: oneToMany

  Business:
    properties:
      description:
        type: string
        description: "General business description"
        required: true
        quality:
          format:
            minLength: 20
            maxLength: 1000
      employees:
        type: integer
        description: "Total number of employees"
        quality:
          business:
            minValue: 1
            maxValue: 1000000
      revenue:
        type: float
        description: "Annual revenue in millions USD"
        quality:
          business:
            minValue: 0.0
            maxValue: 1000000.0
      seasonality:
        type: string
        description: "Business seasonality patterns"
        quality:
          format:
            maxLength: 200
          business:
            forbiddenValues: ["None", "N/A", "Unknown"]
    relationships:
      HAS_SEGMENT:
        target: BusinessSegment
        cardinality: oneToMany
      HAS_PRODUCT:
        target: Product
        cardinality: oneToMany
      COMPETES_WITH:
        target: Company
        cardinality: manyToMany
        properties:
          marketOverlap:
            type: string
            description: "Description of competitive overlap"

  RiskFactor:
    properties:
      name:
        type: string
        description: "Risk factor identifier"
        required: true
        unique: true
        quality:
          format:
            minLength: 5
            maxLength: 100
            caseFormat: "titleCase"
          business:
            forbiddenValues: ["Unknown Risk", "General Risk", "Other"]
      description:
        type: string
        description: "Detailed risk description"
        required: true
        quality:
          format:
            minLength: 50
            maxLength: 2000
      potentialImpact:
        type: string
        description: "Potential business impact"
        quality:
          business:
            allowedValues: ["Low", "Medium", "High", "Critical"]
      mitigationStrategy:
        type: string
        description: "Risk mitigation approach"
        quality:
          format:
            minLength: 10
            maxLength: 500

  BusinessSegment:
    properties:
      name:
        type: string
        description: "Business segment name"
        required: true
        quality:
          format:
            minLength: 2
            maxLength: 50
            caseFormat: "titleCase"
      revenuePercentage:
        type: float
        description: "Percentage of total company revenue"
        quality:
          business:
            minValue: 0.0
            maxValue: 100.0
            inclusiveMin: true
            inclusiveMax: true
      description:
        type: string
        description: "Segment business description"
        quality:
          format:
            minLength: 20
            maxLength: 500

  Product:
    properties:
      name:
        type: string
        description: "Product or service name"
        required: true
        quality:
          format:
            minLength: 2
            maxLength: 100
            caseFormat: "titleCase"
      category:
        type: string
        description: "Product category"
        quality:
          business:
            allowedValues: ["Software", "Hardware", "Service", "Platform", "Solution"]
      description:
        type: string
        description: "Product description"
        quality:
          format:
            minLength: 10
            maxLength: 300

  LegalProceeding:
    properties:
      caseName:
        type: string
        description: "Legal case identifier"
        required: true
        unique: true
        quality:
          format:
            minLength: 5
            maxLength: 200
      status:
        type: string
        description: "Current case status"
        required: true
        quality:
          business:
            allowedValues: ["Active", "Settled", "Dismissed", "Pending"]
      description:
        type: string
        description: "Case description and potential impact"
        required: true
        quality:
          format:
            minLength: 20
            maxLength: 1000
      potentialLiability:
        type: float
        description: "Estimated liability in millions USD"
        quality:
          business:
            minValue: 0.0
            maxValue: 10000.0

Key Features of This Ontology

  • Metadata entity captures document-level information
  • Filing date validation enforces the YYYY-MM-DD format
  • Document type constraints limit values to the allowed report types
  • CIK validation enforces the 10-digit SEC identifier format
  • Ticker validation restricts symbols to one to five uppercase letters
  • Industry classification uses controlled vocabulary
  • Pattern matching for structured data (dates, CIK, ticker)
  • Business rules prevent common data quality issues
  • Range validation for numeric fields like revenue and employees
  • Hierarchical structure from company to business segments
  • Many-to-many competitive relationships
  • One-to-many risk factors and legal proceedings
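
To see what the format patterns above accept and reject, here is a minimal Python sketch. It assumes the patterns must match the whole value (checked here with re.fullmatch); how the platform actually evaluates them is not stated on this page. The PATTERNS dictionary and the matches helper are hypothetical names used only for illustration.

```python
import re

# Format patterns copied from the Financial Reporting Ontology above.
PATTERNS = {
    "name": r"^Form[0-9]+_[0-9]{4}$",
    "filingDate": r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$",
    "cik": r"^[0-9]{10}$",
    "ticker": r"^[A-Z]{1,5}$",
}


def matches(field: str, value: str) -> bool:
    """Return True if the whole value satisfies the pattern for the field.

    Assumption: patterns are applied to the full string.
    """
    return re.fullmatch(PATTERNS[field], value) is not None


samples = [
    ("cik", "0000320193"),         # ten digits
    ("cik", "320193"),             # leading zeros dropped
    ("ticker", "AAPL"),
    ("ticker", "BRK.B"),           # contains a dot
    ("filingDate", "2023-13-45"),  # right shape, impossible date
    ("name", "Form13_2023"),
    ("name", "Form10K_2023"),      # letter between the digits and "_"
]
for field, value in samples:
    print(f"{field:<11} {value:<13} {matches(field, value)}")
```

Two things stand out under these assumptions: the date pattern checks shape only, so an impossible date such as 2023-13-45 still passes, and the ticker and name patterns reject values containing a dot or a letter suffix. If your source documents contain such values, the patterns would need to be widened.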

Technology Company Ontology

A simplified ontology focused on technology companies and their products:

version: 1

qualityConfig:
  overallScoreThreshold: 75
  confidenceThreshold: 0.65

entities:
  Company:
    properties:
      name:
        type: string
        description: "Company name"
        required: true
        unique: true
        quality:
          format:
            minLength: 2
            maxLength: 80
            caseFormat: "titleCase"
          business:
            forbiddenValues: ["Unknown", "TBD", "N/A"]
      website:
        type: string
        description: "Company website URL"
        quality:
          format:
            pattern: "^https?://[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}.*$"
      foundedYear:
        type: integer
        description: "Year company was founded"
        quality:
          business:
            minValue: 1950
            maxValue: 2024
      employeeCount:
        type: integer
        description: "Number of employees"
        quality:
          business:
            minValue: 1
            maxValue: 500000
      headquarters:
        type: string
        description: "Company headquarters location"
        quality:
          format:
            minLength: 5
            maxLength: 100
    relationships:
      DEVELOPS:
        target: Product
        cardinality: oneToMany
      ACQUIRED:
        target: Company
        cardinality: oneToMany
        properties:
          acquisitionDate:
            type: string
            description: "Date of acquisition (YYYY-MM-DD)"
            quality:
              format:
                pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
          acquisitionPrice:
            type: float
            description: "Acquisition price in millions USD"

  Product:
    properties:
      name:
        type: string
        description: "Product name"
        required: true
        quality:
          format:
            minLength: 2
            maxLength: 60
            caseFormat: "titleCase"
      category:
        type: string
        description: "Product category"
        required: true
        quality:
          business:
            allowedValues:
              - "Mobile App"
              - "Web Application"
              - "Software Platform"
              - "API Service"
              - "Hardware Device"
              - "Cloud Service"
              - "Development Tool"
      launchDate:
        type: string
        description: "Product launch date"
        quality:
          format:
            pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
      description:
        type: string
        description: "Product description"
        quality:
          format:
            minLength: 10
            maxLength: 300
      pricingModel:
        type: string
        description: "How the product is priced"
        quality:
          business:
            allowedValues:
              - "Free"
              - "Freemium"
              - "Subscription"
              - "One-time Purchase"
              - "Usage-based"
              - "Enterprise License"
    relationships:
      INTEGRATES_WITH:
        target: Product
        cardinality: manyToMany
        description: "Product integration relationships"
      TARGETS:
        target: Market
        cardinality: manyToMany

  Market:
    properties:
      name:
        type: string
        description: "Market segment name"
        required: true
        quality:
          format:
            minLength: 3
            maxLength: 50
            caseFormat: "titleCase"
      size:
        type: string
        description: "Market size description"
        quality:
          business:
            allowedValues: ["Small", "Medium", "Large", "Enterprise"]
      region:
        type: string
        description: "Geographic region"
        quality:
          business:
            allowedValues:
              - "North America"
              - "Europe"
              - "Asia Pacific"
              - "Latin America"
              - "Middle East"
              - "Africa"
              - "Global"

  Person:
    properties:
      name:
        type: string
        description: "Person's full name"
        required: true
        quality:
          format:
            minLength: 2
            maxLength: 60
            caseFormat: "titleCase"
      role:
        type: string
        description: "Role or title"
        quality:
          business:
            allowedValues:
              - "CEO"
              - "CTO"
              - "CFO"
              - "VP Engineering"
              - "VP Product"
              - "VP Marketing"
              - "Director"
              - "Senior Manager"
              - "Manager"
              - "Engineer"
              - "Designer"
              - "Founder"
      linkedin:
        type: string
        description: "LinkedIn profile URL"
        quality:
          format:
            pattern: "^https://www\\.linkedin\\.com/in/[a-zA-Z0-9-]+/?$"
    relationships:
      WORKS_FOR:
        target: Company
        cardinality: manyToOne
        properties:
          startDate:
            type: string
            description: "Employment start date"
          isCurrent:
            type: boolean
            description: "Whether currently employed"
            required: true

Technology Ontology Highlights

  • Web-focused validation: Website and LinkedIn URL patterns
  • Product categorization: Technology-specific product types
  • Pricing models: Common SaaS and technology pricing approaches
  • Market segmentation: Geographic and size-based market classification
  • Person-company relationships: Employment tracking with temporal data
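
The URL patterns can be tried out in isolation. The sketch below is illustrative only: it assumes whole-string matching via re.fullmatch, and is_match is a hypothetical helper. Because the YAML values are double-quoted, "\\." in the ontology denotes a backslash followed by a dot, which is what the raw strings here contain.

```python
import re

# Regexes as they read after YAML double-quote unescaping ("\\." -> "\.").
WEBSITE = r"^https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}.*$"
LINKEDIN = r"^https://www\.linkedin\.com/in/[a-zA-Z0-9-]+/?$"


def is_match(pattern: str, value: str) -> bool:
    """Return True if the whole value matches the pattern (assumed semantics)."""
    return re.fullmatch(pattern, value) is not None


for url in ["https://example.com", "http://example.com/path?q=1",
            "example.com", "https://localhost"]:
    print("website ", url, is_match(WEBSITE, url))

for url in ["https://www.linkedin.com/in/jane-doe",
            "https://www.linkedin.com/in/jane-doe/",
            "https://linkedin.com/in/jane-doe",
            "https://www.linkedin.com/in/jane_doe"]:
    print("linkedin", url, is_match(LINKEDIN, url))
```

Under these assumptions the website pattern requires a scheme and a dotted host, and the LinkedIn pattern requires https, the www prefix, and a profile slug made of letters, digits, and hyphens.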

Healthcare Research Ontology

An ontology designed for medical research documents and clinical data:

version: 1

qualityConfig:
  overallScoreThreshold: 85  # Higher threshold for medical data
  confidenceThreshold: 0.8     # Higher confidence required

entities:
  Study:
    properties:
      title:
        type: string
        description: "Research study title"
        required: true
        quality:
          format:
            minLength: 10
            maxLength: 200
      studyId:
        type: string
        description: "Unique study identifier"
        unique: true
        required: true
        quality:
          format:
            pattern: "^[A-Z]{2,4}-[0-9]{4,6}$"
      phase:
        type: string
        description: "Clinical trial phase"
        quality:
          business:
            allowedValues: ["Preclinical", "Phase I", "Phase II", "Phase III", "Phase IV"]
      status:
        type: string
        description: "Current study status"
        required: true
        quality:
          business:
            allowedValues: ["Planning", "Active", "Completed", "Terminated", "Suspended"]
      startDate:
        type: string
        description: "Study start date"
        quality:
          format:
            pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
      participantCount:
        type: integer
        description: "Number of study participants"
        quality:
          business:
            minValue: 1
            maxValue: 100000
    relationships:
      INVESTIGATES:
        target: Condition
        cardinality: manyToMany
      TESTS:
        target: Treatment
        cardinality: manyToMany
      CONDUCTED_BY:
        target: Institution
        cardinality: manyToOne

  Condition:
    properties:
      name:
        type: string
        description: "Medical condition name"
        required: true
        unique: true
        quality:
          format:
            minLength: 3
            maxLength: 100
            caseFormat: "titleCase"
      icdCode:
        type: string
        description: "ICD-10 diagnostic code"
        quality:
          format:
            pattern: "^[A-Z][0-9]{2}(\\.[0-9]{1,2})?$"
      severity:
        type: string
        description: "Condition severity level"
        quality:
          business:
            allowedValues: ["Mild", "Moderate", "Severe", "Critical"]
      prevalence:
        type: string
        description: "Condition prevalence"
        quality:
          business:
            allowedValues: ["Rare", "Uncommon", "Common", "Very Common"]

  Treatment:
    properties:
      name:
        type: string
        description: "Treatment name"
        required: true
        quality:
          format:
            minLength: 2
            maxLength: 80
            caseFormat: "titleCase"
      type:
        type: string
        description: "Treatment type"
        required: true
        quality:
          business:
            allowedValues: ["Drug", "Device", "Procedure", "Therapy", "Surgery"]
      approvalStatus:
        type: string
        description: "Regulatory approval status"
        quality:
          business:
            allowedValues: ["Investigational", "FDA Approved", "EMA Approved", "Withdrawn"]
      mechanism:
        type: string
        description: "Treatment mechanism of action"
        quality:
          format:
            minLength: 10
            maxLength: 300

  Institution:
    properties:
      name:
        type: string
        description: "Institution name"
        required: true
        unique: true
        quality:
          format:
            minLength: 5
            maxLength: 100
            caseFormat: "titleCase"
      type:
        type: string
        description: "Institution type"
        required: true
        quality:
          business:
            allowedValues: ["Hospital", "University", "Research Center", "Pharmaceutical Company"]
      country:
        type: string
        description: "Institution country"
        required: true
        quality:
          format:
            minLength: 2
            maxLength: 50
      city:
        type: string
        description: "Institution city"
        quality:
          format:
            minLength: 2
            maxLength: 50

  Researcher:
    properties:
      name:
        type: string
        description: "Researcher full name"
        required: true
        quality:
          format:
            minLength: 5
            maxLength: 60
            caseFormat: "titleCase"
      degree:
        type: string
        description: "Highest degree"
        quality:
          business:
            allowedValues: ["MD", "PhD", "MD PhD", "MS", "MA", "BS", "Other"]
      specialization:
        type: string
        description: "Medical specialization"
        quality:
          format:
            minLength: 3
            maxLength: 50
      email:
        type: string
        description: "Email address"
        quality:
          format:
            pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
            caseFormat: "lowerCase"
    relationships:
      AFFILIATED_WITH:
        target: Institution
        cardinality: manyToMany
        properties:
          role:
            type: string
            description: "Role at institution"
          startDate:
            type: string
            description: "Affiliation start date"
      LEADS:
        target: Study
        cardinality: oneToMany

Healthcare Ontology Features

  • Medical coding: ICD-10 code validation for conditions
  • Regulatory compliance: FDA/EMA approval status tracking
  • Clinical phases: Standard clinical trial phase terminology
  • Institutional relationships: Complex affiliations between researchers and institutions
  • Higher quality thresholds: Stricter validation for medical data accuracy
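
The identifier patterns in this ontology can be exercised the same way. This is a rough sketch that assumes whole-string matching; the sample values are made up for illustration, and fits is a hypothetical helper.

```python
import re

# Patterns copied from the Healthcare Research Ontology above.
ICD_CODE = r"^[A-Z][0-9]{2}(\.[0-9]{1,2})?$"
STUDY_ID = r"^[A-Z]{2,4}-[0-9]{4,6}$"


def fits(pattern: str, value: str) -> bool:
    """Return True if the whole value matches the pattern (assumed semantics)."""
    return re.fullmatch(pattern, value) is not None


for code in ["E11.9", "J45", "E11.65", "e11.9", "S72.001A"]:
    print("icdCode", code, fits(ICD_CODE, code))

for study_id in ["NCT-12345", "AB-123", "NCT01234567"]:
    print("studyId", study_id, fits(STUDY_ID, study_id))
```

The icdCode pattern accepts a letter, two digits, and an optional one- or two-digit decimal part, so longer extended codes such as S72.001A would be rejected. The studyId pattern expects a hyphen between the letter prefix and the digits. Adjust either pattern if your data uses a different shape.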

Simple Blog Content Ontology

A minimal ontology for blog content management:

version: 1

entities:
  Post:
    properties:
      title:
        type: string
        description: "Blog post title"
        required: true
        quality:
          format:
            minLength: 5
            maxLength: 100
      slug:
        type: string
        description: "URL slug"
        unique: true
        quality:
          format:
            pattern: "^[a-z0-9-]+$"
      content:
        type: string
        description: "Post content"
        required: true
        quality:
          format:
            minLength: 100
      publishedDate:
        type: string
        description: "Publication date"
        quality:
          format:
            pattern: "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
      status:
        type: string
        description: "Post status"
        required: true
        quality:
          business:
            allowedValues: ["Draft", "Published", "Archived"]
    relationships:
      WRITTEN_BY:
        target: Author
        cardinality: manyToOne
      HAS_TAG:
        target: Tag
        cardinality: manyToMany

  Author:
    properties:
      name:
        type: string
        description: "Author name"
        required: true
        quality:
          format:
            minLength: 2
            maxLength: 50
      email:
        type: string
        description: "Author email"
        unique: true
        quality:
          format:
            pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
      bio:
        type: string
        description: "Author biography"
        quality:
          format:
            maxLength: 300

  Tag:
    properties:
      name:
        type: string
        description: "Tag name"
        required: true
        unique: true
        quality:
          format:
            minLength: 2
            maxLength: 30
            caseFormat: "lowerCase"
      description:
        type: string
        description: "Tag description"
        quality:
          format:
            maxLength: 150

This simple ontology demonstrates:
  • Content management basics with posts, authors, and tags
  • URL slug validation for SEO-friendly URLs
  • Status workflow for content publishing
  • Many-to-many tagging system
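
As an illustration of the slug rule, the sketch below derives a slug from a post title and checks it against the pattern from the Post entity. The slugify function is a hypothetical helper, not part of the platform, and whole-string matching is assumed.

```python
import re

# Slug pattern copied from the Post entity above.
SLUG_PATTERN = r"^[a-z0-9-]+$"


def slugify(title: str) -> str:
    """Lowercase the title and collapse every run of other characters to '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def is_valid_slug(slug: str) -> bool:
    """Return True if the whole slug matches the ontology pattern."""
    return re.fullmatch(SLUG_PATTERN, slug) is not None


slug = slugify("Ontology Design: 5 Tips")
print(slug, is_valid_slug(slug))
```

Note that the pattern itself is permissive: a value such as --draft-- also matches, so trimming and collapsing hyphens is left to whatever produces the slug.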

Best Practices Demonstrated

Across these examples, notice these patterns:

Naming Conventions

  • Entities: PascalCase (Company, BusinessSegment)
  • Properties: camelCase (employeeCount, filingDate)
  • Relationships: UPPER_SNAKE_CASE (WORKS_FOR, HAS_RISK_FACTOR)
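
These conventions are easy to lint before an ontology is committed. The following sketch uses assumed regular expressions for each convention; nothing on this page says the platform enforces naming itself, and follows_convention is a hypothetical helper.

```python
import re

# Assumed regexes for the three conventions listed above.
CONVENTIONS = {
    "entity": r"[A-Z][a-zA-Z0-9]*",                  # PascalCase
    "property": r"[a-z][a-zA-Z0-9]*",                # camelCase
    "relationship": r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*",  # upper case with underscores
}


def follows_convention(kind: str, name: str) -> bool:
    """Return True if the name fits the convention for the given kind."""
    return re.fullmatch(CONVENTIONS[kind], name) is not None


checks = [
    ("entity", "BusinessSegment"),
    ("entity", "businessSegment"),
    ("property", "employeeCount"),
    ("property", "market_overlap"),   # snake_case, flagged
    ("relationship", "WORKS_FOR"),
    ("relationship", "WorksFor"),
]
for kind, name in checks:
    print(f"{kind:<13} {name:<16} {follows_convention(kind, name)}")
```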

Quality Rules

  • Required fields for critical data
  • Pattern validation for structured data (emails, dates, codes)
  • Business rules to prevent common data quality issues
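
To make the business rules concrete, here is a minimal sketch of how allowedValues, forbiddenValues, minValue, and maxValue could be evaluated. The semantics are inferred from the key names and are an assumption, as is treating bounds as inclusive unless inclusiveMin or inclusiveMax is set to false. The business_violations function is hypothetical.

```python
from typing import Any, Dict, List


def business_violations(value: Any, rules: Dict[str, Any]) -> List[str]:
    """Return the names of the business rules the value breaks."""
    violations = []
    if "allowedValues" in rules and value not in rules["allowedValues"]:
        violations.append("allowedValues")
    if "forbiddenValues" in rules and value in rules["forbiddenValues"]:
        violations.append("forbiddenValues")
    if "minValue" in rules:
        # Assumption: bounds are inclusive unless the flag says otherwise.
        inclusive = rules.get("inclusiveMin", True)
        low = rules["minValue"]
        if value < low or (value == low and not inclusive):
            violations.append("minValue")
    if "maxValue" in rules:
        inclusive = rules.get("inclusiveMax", True)
        high = rules["maxValue"]
        if value > high or (value == high and not inclusive):
            violations.append("maxValue")
    return violations


# Rules taken from the examples above.
impact_rules = {"allowedValues": ["Low", "Medium", "High", "Critical"]}
employee_rules = {"minValue": 1, "maxValue": 1000000}

print(business_violations("Critical", impact_rules))
print(business_violations("Extreme", impact_rules))
print(business_violations(0, employee_rules))
```

Returning rule names rather than a single boolean keeps the result easy to count per entity, which is one plausible way a setting like maxViolationsPerEntity could be applied.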

Relationship Design

  • Clear cardinality specifications
  • Meaningful relationship names that read naturally
  • Relationship properties for temporal and contextual data
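
A quick structural check can catch relationship mistakes early. The sketch below takes an ontology already loaded into a Python dictionary and reports relationships whose target is not a defined entity or whose cardinality is not one of the three values used in the examples on this page. The set of valid cardinalities is an assumption (the platform may support others), and relationship_problems is a hypothetical helper.

```python
from typing import Any, Dict, List

# Cardinalities that appear in the examples above (assumed, not exhaustive).
SEEN_CARDINALITIES = {"oneToMany", "manyToOne", "manyToMany"}


def relationship_problems(entities: Dict[str, Any]) -> List[str]:
    """List relationships with an undefined target or unexpected cardinality."""
    problems = []
    for entity_name, entity in entities.items():
        for rel_name, rel in entity.get("relationships", {}).items():
            target = rel.get("target")
            cardinality = rel.get("cardinality")
            if target not in entities:
                problems.append(
                    f"{entity_name}.{rel_name}: unknown target {target!r}")
            if cardinality not in SEEN_CARDINALITIES:
                problems.append(
                    f"{entity_name}.{rel_name}: unexpected cardinality {cardinality!r}")
    return problems


# The blog ontology's relationships, written out as a dictionary.
BLOG = {
    "Post": {
        "relationships": {
            "WRITTEN_BY": {"target": "Author", "cardinality": "manyToOne"},
            "HAS_TAG": {"target": "Tag", "cardinality": "manyToMany"},
        }
    },
    "Author": {},
    "Tag": {},
}

print(relationship_problems(BLOG))
```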

Domain-Specific Rules

  • Higher thresholds for sensitive domains (healthcare uses overallScoreThreshold: 85 and confidenceThreshold: 0.8)
  • Industry-specific patterns (CIK codes, ICD codes, ticker symbols)
  • Controlled vocabularies for categorical data

Next Steps

Test Your Ontology

Learn how to validate and test your ontology with real data