RunFullBenchmark

Comprehensive [[performance]] [[benchmark]] suite that validates all Maenifold performance claims, including [[graph]] traversal ([[GRPH-009]], CTE vs. N+1), [[search]] performance, [[sync]] timing, and complex traversal bottlenecks. It provides empirical validation of system performance characteristics under real workloads.

When to Use This Tool

Run this tool to validate performance before a release, verify the effect of an optimization, catch regressions in CI, or identify which subsystem is slow (see Common Patterns below).

Key Features

- Benchmarks graph traversal ([[GRPH-009]] CTE vs. N+1), search, sync, and deep traversal in a single run
- Configurable iterations and test-file scope for accuracy/speed trade-offs
- Optional skip of the expensive deep traversal tests (includeDeepTraversal=false)
- Validates measured results against published performance claims and emits a System Health Report

Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| iterations | int | No | Number of test iterations per benchmark (default: 5; more iterations give more accurate averages) | 10 |
| maxTestFiles | int | No | Maximum number of test files to use (default: 1000; limits benchmark scope) | 500 |
| includeDeepTraversal | bool | No | Include expensive deep traversal tests (default: true; may time out on large graphs) | false |

Usage Examples

Standard Benchmark Run

{
  "iterations": 5
}

Runs the complete benchmark suite with 5 iterations per test, balancing accuracy and runtime (~2-5 minutes).

Quick Health Check

{
  "iterations": 3,
  "includeDeepTraversal": false
}

A faster benchmark that skips the expensive deep traversal tests; useful for quick validation (~1-2 minutes).

High-Accuracy Benchmark

{
  "iterations": 10,
  "maxTestFiles": 2000
}

More iterations and a larger dataset for statistically robust results (~5-10 minutes).

Small Dataset Testing

{
  "iterations": 5,
  "maxTestFiles": 500
}

Limited test file scope for smaller knowledge bases or faster execution.

Benchmark Suite Components

1. Graph Traversal Benchmark (GRPH-009)

Tests: CTE (Common Table Expression) vs. N+1 query patterns
Claims: CTE recursive queries should outperform naive N+1 approaches
Measures: query execution time for concept relationship traversal
Validates: [[graph-database]] optimization effectiveness
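
The gap between the two patterns is easy to see in miniature. The sketch below uses an illustrative schema (a concept_links table with source/target columns; Maenifold's actual schema may differ) to contrast one recursive-CTE round trip against a query-per-node loop:

import sqlite3

# Illustrative schema -- Maenifold's real tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept_links (source TEXT, target TEXT)")
conn.executemany("INSERT INTO concept_links VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "d")])

# CTE pattern: one round trip; SQLite expands the recursion internally.
cte = """
WITH RECURSIVE reachable(name) AS (
    SELECT ?
    UNION
    SELECT cl.target FROM concept_links cl
    JOIN reachable r ON cl.source = r.name
)
SELECT name FROM reachable
"""
print([row[0] for row in conn.execute(cte, ("a",))])  # ['a', 'b', 'c', 'd']

# N+1 pattern: one query per discovered node.
def n_plus_one(start):
    seen, frontier = {start}, [start]
    while frontier:
        node = frontier.pop()
        for (target,) in conn.execute(
                "SELECT target FROM concept_links WHERE source = ?", (node,)):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen

print(n_plus_one("a"))  # {'a', 'b', 'c', 'd'}

The CTE lets SQLite walk the entire relationship graph in one statement, while the N+1 loop pays per-query overhead for every node it visits; that per-node overhead is what the benchmark quantifies.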

2. Search Performance Benchmark

Tests: three search modes (Hybrid, Semantic, Full-text)
Claims: Hybrid ~33ms, Semantic ~116ms, Full-text ~47ms average latency (validated in the sample output below)
Test Queries: a set of sample queries run against the indexed test dataset
Validates: [[vector-search]] and [[full-text-search]] performance
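
The reported averages are plain means over repeated timed runs. A minimal sketch of that measurement loop, with a hypothetical search(query, mode) callable standing in for the real search invocation:

import time

def average_latency_ms(search, query, mode, iterations=5):
    # Mean wall-clock latency over `iterations` runs, in milliseconds.
    total = 0.0
    for _ in range(iterations):
        start = time.perf_counter()
        search(query, mode)  # hypothetical stand-in for the real search call
        total += time.perf_counter() - start
    return total / iterations * 1000.0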

3. Sync Performance Benchmark

Tests: knowledge graph synchronization speed
Claims: ~27 seconds for 2,500 files
Measures: time to scan files, extract [[WikiLinks]], generate [[embeddings]], and update the database
Validates: [[sync]] operation scalability
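
The claim implies a throughput of roughly 2,500 / 27 ≈ 93 files per second; the files-per-second figure in the sample output below can be sanity-checked the same way:

# Throughput implied by the claim vs. the sample run shown below.
claimed = 2500 / 27     # ~92.6 files/s
measured = 2458 / 28.3  # ~86.9 files/s (the report shows 86.8 from unrounded timings)
print(f"claimed {claimed:.1f} files/s, measured {measured:.1f} files/s")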

4. Complex Traversal Benchmark (Optional)

Tests: deep [[graph]] traversal with multiple hops
Claims: handles deep concept relationships efficiently
Measures: multi-level BuildContext operations
Validates: performance under complex query patterns
Note: ⚠️ Can time out with large graphs; use includeDeepTraversal=false to skip

Output Structure

MAENIFOLD PERFORMANCE BENCHMARK SUITE
=====================================
Iterations: 5, Max Test Files: 1000
Started: 2024-10-24 18:45:00

Search Performance Results

Search Performance Benchmark
Claims: Hybrid 33ms, Semantic 116ms, Full-text 47ms

Test dataset: 2,458 files
Results (25 iterations each):
Hybrid Average: 34.2ms (claim: 33ms) ✓
Semantic Average: 118.7ms (claim: 116ms) ✓
Full-text Average: 45.3ms (claim: 47ms) ✓

Sync Performance Results

Sync Performance Benchmark
Claim: 27s for 2,500 files

Test dataset: 2,458 files
Results (3 iterations):
Sync Average: 28.3s (claim: 27s) ✓
Files per second: 86.8

Graph Traversal Results

Graph Traversal Benchmark (GRPH-009)
CTE vs N+1 Query Performance

CTE Average: 12.4ms
N+1 Average: 247.8ms
Performance Improvement: 19.9x faster ✓

System Health Report

System Health Report
Database Size: 145.2 MB
Concept Count: 12,847
Memory Files: 2,458
Total WikiLinks: 47,392
Average Concepts per File: 19.3
Completed: 2024-10-24 18:52:15
Total Duration: 7m 15s

Performance Claim Validation

✓ Passing Criteria

Results within ±20% of claimed performance (including faster-than-claimed results) indicate system health.

⚠️ Warning Signs

Results 20-50% slower than claims suggest further investigation (see Troubleshooting below).

🚨 Failure Indicators

Results more than 50% slower than claims indicate serious issues; all three bands are summarized in the sketch below.
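
Taken together, the three bands reduce to a simple ratio test against the claimed value. A minimal sketch (the band boundaries come from this section; the function itself is illustrative, not part of the tool):

def classify(measured, claimed):
    # Bucket a benchmark result against its claim using the bands above.
    ratio = measured / claimed
    if ratio <= 1.20:   # within +20% of the claim, or faster: healthy
        return "pass"
    if ratio <= 1.50:   # 20-50% slower: investigate
        return "warning"
    return "failure"    # >50% slower: serious issue

print(classify(34.2, 33))  # 'pass' (hybrid search from the sample output above)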

Common Patterns

Pre-Release Validation

# Before releasing new version
RunFullBenchmark iterations=10 includeDeepTraversal=true

# Verify all benchmarks pass
# Document any performance changes in CHANGELOG

Post-Optimization Verification

# Before optimization
RunFullBenchmark iterations=5 > before.txt

# Apply optimization changes

# After optimization
RunFullBenchmark iterations=5 > after.txt

# Compare results
diff before.txt after.txt
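
diff shows every textual change; to compare just the numbers, a small parser over the saved reports helps. A sketch that assumes the "Average:" lines keep the format shown in the sample output above:

import re

def averages(path):
    # Map benchmark label -> (value, unit) parsed from a saved report.
    pattern = re.compile(r"^(.*Average): ([\d.]+)(ms|s)", re.MULTILINE)
    with open(path) as f:
        return {m[0]: (float(m[1]), m[2]) for m in pattern.findall(f.read())}

before, after = averages("before.txt"), averages("after.txt")
for label in sorted(before.keys() & after.keys()):
    (b, unit), (a, _) = before[label], after[label]
    print(f"{label}: {b}{unit} -> {a}{unit}")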

Continuous Integration

# In CI pipeline
RunFullBenchmark iterations=3 maxTestFiles=500 includeDeepTraversal=false

# Fast validation that catches major regressions
# Full benchmark in nightly builds
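
To turn this into a hard pass/fail gate, one option is to scan the report for the failure marker described under Troubleshooting below. A sketch, assuming RunFullBenchmark is invocable as a CLI exactly as in these examples:

import subprocess, sys

# Run the quick CI profile and capture its report.
result = subprocess.run(
    ["RunFullBenchmark", "iterations=3", "maxTestFiles=500",
     "includeDeepTraversal=false"],
    capture_output=True, text=True)
print(result.stdout)

# Fail the pipeline on a nonzero exit or an explicit benchmark failure.
if result.returncode != 0 or "BENCHMARK FAILED" in result.stdout:
    sys.exit(1)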

Performance Debugging

# Identify slow subsystem
RunFullBenchmark iterations=10

# Review which benchmark fails or performs poorly
# Focus optimization efforts on that subsystem

Troubleshooting

Error: “Insufficient test data (X files). Need at least 100 files for meaningful benchmarks”

Cause: not enough memory files to produce statistically valid results.
Solution: create or sync additional memory files until at least 100 are indexed, then rerun the benchmark.

Benchmark Timeout (Deep Traversal)

Cause: complex traversal tests can take more than 5 minutes with large graphs.
Solution: run with includeDeepTraversal=false to skip the expensive tests.

Results Much Slower Than Claims

Cause: database fragmentation, resource constraints, or system issues.
Solution:

  1. Run Sync to rebuild indices
  2. Run SQLite VACUUM to defragment the database (see the sketch after this list)
  3. Check system resources (CPU, memory, disk I/O)
  4. Review Config settings for performance tuning
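
Steps 1 and 4 use Maenifold's own tools; step 2 can be run from any SQLite client. A minimal sketch (the database filename is a placeholder; use your installation's actual path):

import sqlite3

# VACUUM rewrites the database file, reclaiming free pages and defragmenting it.
conn = sqlite3.connect("maenifold.db")  # placeholder path
conn.execute("VACUUM")
conn.close()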

Results Highly Variable Between Runs

Cause: system resource contention or insufficient iterations.
Solution: increase iterations (10 or more), stop competing workloads, and rerun on an otherwise idle system.

“BENCHMARK FAILED” Error

Cause: an exception occurred during benchmark execution.
Solution: check the error message details; they may indicate database corruption, missing files, or configuration issues.

System Requirements for Valid Benchmarks

Minimum Dataset Size

At least 100 memory files are required (the benchmark aborts below that threshold), and the published claims are calibrated against roughly 2,500 files; much smaller datasets give less comparable numbers.

Hardware Considerations

Results vary with CPU, memory, and disk I/O; run on an otherwise idle system so resource contention does not skew the averages.

Database Health

Run Sync beforehand so indices are current, and VACUUM a long-lived database so fragmentation does not distort results.

Interpreting Results

When Results Match Claims

System is performing as designed - all optimizations working correctly.

When Search Slower Than Expected

Check for database fragmentation (run VACUUM) and make sure embeddings and indices are current (run Sync).

When Sync Slower Than Expected

Check disk I/O and whether the file count has grown well beyond the ~2,500-file baseline the claim assumes.

When Graph Traversal Slow

Verify that the CTE query path is actually in use; N+1-like timings suggest a regression in the [[GRPH-009]] optimization.

Benchmark Best Practices

DO

- Run Sync before benchmarking so indices are current
- Use at least 5 iterations (10 for release validation)
- Save benchmark output for historical comparison
- Benchmark on an otherwise idle system

DON’T

- Benchmark with fewer than 100 memory files
- Compare results captured on different hardware
- Run the expensive deep traversal tests in time-constrained CI jobs
- Treat a single slow run as a regression; rerun with more iterations first

Example Benchmark Session

Step 1: Prepare System

{
  "tool": "Sync"
}

Ensure database indices are current.

Step 2: Run Benchmark

{
  "iterations": 10,
  "maxTestFiles": 1000,
  "includeDeepTraversal": true
}

Step 3: Review Results

Check each benchmark against claims:

- Search averages within ±20% of the claimed 33ms / 116ms / 47ms
- Sync time near the claimed ~27s for 2,500 files
- CTE traversal clearly faster than N+1
- System Health Report counts plausible for your knowledge base

Step 4: Document

Save benchmark output for historical comparison:

RunFullBenchmark > benchmark-v1.2.3.txt

Ma Protocol Compliance

RunFullBenchmark follows Maenifold’s Ma Protocol principles. Chief among them is the commitment to measurement over assumption: every performance claim is validated empirically with real data, ensuring the system performs as documented.