Inference benchmarking · configuration recommendation · auditable exports

Make inference configuration decisions with evidence, not guesswork.

Sigilant Labs runs controlled benchmarks across candidate configurations (quantization, context, batch, and runtime parameters) and produces a recommendation with the supporting artifacts you need for review and reproducibility.

Outputs include per-variant metrics, quality-gate results, and exportable JSON/CSV artifacts suitable for internal sign-off and iteration tracking.
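
For illustration, one per-variant record in a JSON export might look like the sketch below. The field names and values are hypothetical placeholders, not Sigilant's actual export schema.

    import json

    # Hypothetical shape of one per-variant record in a JSON export.
    # Field names are illustrative, not Sigilant's actual schema.
    variant_record = {
        "variant_id": "q4_k_m-ctx4096-b8",
        "quantization": "Q4_K_M",
        "context_length": 4096,
        "batch_size": 8,
        "metrics": {
            "latency_p50_ms": 210.4,
            "latency_p95_ms": 318.9,
            "throughput_tok_s": 42.7,
            "peak_memory_mb": 5830,
        },
        "gates": {"quality_perplexity": "pass", "latency_budget": "pass"},
    }

    print(json.dumps(variant_record, indent=2))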

What you receive

  • Recommendation with the top configuration for the declared target profile.
  • Variant breakdown (latency/throughput/memory + quality gates) for each candidate.
  • Exports: JSON + CSV artifacts designed for audits and reproducible comparisons.

Current focus

  • GGUF / llama.cpp deployments
  • CPU and cloud target profiles
  • Multiple quantizations and context/batch ladders
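
To make "context/batch ladders" concrete, the sketch below expands a few placeholder quantizations, context lengths, and batch sizes into the full set of candidate variants; the specific values are illustrative, not defaults.

    from itertools import product

    # Illustrative ladder: every combination of quantization, context length,
    # and batch size becomes one candidate variant to benchmark.
    quantizations = ["Q4_K_M", "Q5_K_M", "Q8_0"]
    context_lengths = [2048, 4096, 8192]
    batch_sizes = [1, 4, 8]

    candidates = [
        {"quant": q, "context": c, "batch": b}
        for q, c, b in product(quantizations, context_lengths, batch_sizes)
    ]
    print(len(candidates), "candidate variants")  # 3 x 3 x 3 = 27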

How it works

1) Define the run

Select a model artifact and specify a target hardware profile (e.g., a CPU class or cloud instance type), then choose candidate quantizations and any constraints.
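
As a minimal sketch of what such a run definition could capture, the snippet below uses hypothetical field names (model_artifact, target_profile, candidate_quants, max_latency_p95_ms); the console's actual inputs may differ.

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Hypothetical run definition; field names are illustrative assumptions.
    @dataclass
    class RunSpec:
        model_artifact: str                 # e.g. a path or reference to a GGUF file
        target_profile: str                 # e.g. a CPU class or cloud instance type
        candidate_quants: List[str] = field(default_factory=list)
        max_latency_p95_ms: Optional[float] = None  # optional latency constraint

    spec = RunSpec(
        model_artifact="models/example-7b.gguf",
        target_profile="cpu-8core-16gb",
        candidate_quants=["Q4_K_M", "Q5_K_M", "Q8_0"],
        max_latency_p95_ms=400.0,
    )
    print(spec)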

2) Execute controlled benchmarks

Sigilant evaluates variants under consistent conditions to reduce run-to-run variance and surface tradeoffs.
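
As a generic illustration of the principle (not Sigilant's actual harness), repeating the same workload under fixed settings and reporting the spread is what lets variants be compared on typical behavior rather than a single lucky run:

    import statistics
    import time

    def run_once(prompt: str) -> float:
        """Placeholder for a single inference call; returns elapsed seconds."""
        start = time.perf_counter()
        # ... invoke the model here (e.g., via a llama.cpp binding) ...
        time.sleep(0.05)  # stand-in for real work
        return time.perf_counter() - start

    # Repeat the identical workload and report both the mean and the spread.
    timings = [run_once("benchmark prompt") for _ in range(10)]
    print(f"mean={statistics.mean(timings)*1000:.1f} ms "
          f"stdev={statistics.stdev(timings)*1000:.1f} ms")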

3) Review and export

Inspect metrics and gates, then export artifacts (JSON/CSV) for documentation, sharing, and future comparisons.
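
One way to use the exports for future comparisons, sketched under assumed file and column names (variant_id, throughput_tok_s) that are placeholders rather than the real export schema:

    import csv

    # Load two hypothetical CSV exports and compare per-variant throughput
    # between a baseline run and a later re-test.
    def load_export(path):
        with open(path, newline="") as f:
            return {row["variant_id"]: float(row["throughput_tok_s"])
                    for row in csv.DictReader(f)}

    baseline = load_export("run_baseline.csv")  # placeholder file names
    retest = load_export("run_retest.csv")

    for variant_id, tok_s in retest.items():
        if variant_id in baseline:
            delta = tok_s - baseline[variant_id]
            print(f"{variant_id}: {tok_s:.1f} tok/s ({delta:+.1f} vs baseline)")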

FAQ

Is this a subscription? Not initially. We currently offer prepaid credit packs. Subscription plans may be introduced later.

What consumes credits? Credits are consumed when a run is executed. Estimated consumption is shown before confirmation.

Do results vary? Yes. Performance depends on hardware, model, and workload. We provide controlled settings and report variance where applicable.

How do I get access? Use the contact page to request access; we onboard accounts and provide console credentials.