Falsiflow Launchpad

CI gates for claims before they ship. See a real AI eval PR fail on placeholder evidence, then pass after the evidence is source-backed.

Status: try_ready

Live PR Story

1. Risky claim

AI eval improved

A PR tries to ship an eval claim before the dataset, raw outputs, baseline, and metadata are reviewable.

Open PR #17

2. CI blocks it

claim_check_blocked

Strict CI refuses placeholder evidence and keeps the claim out of release notes.

Blocked run

3. Evidence passes

claim_check_ready

The same PR passes after source-backed eval rows, pinned versions, raw artifacts, and a review bundle are added.

Ready run

Example: biointerface_coatings

Decision check: claim_check_ready

Source files: sources_ready

Review package: bundle_verified

Ready Or Blocked

Blocked placeholder: claim_check_blocked

Missing source files, placeholder values, or unpinned benchmark metadata stop the claim before it reaches release notes.

Source-backed pass: claim_check_ready

Measured rows, required metadata, source manifests, audit reports, and bundle verification agree.
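The blocked and ready outcomes above can be sketched as a small decision function. This is only an illustration of the rule as described on this page, not Falsiflow's implementation: the `Evidence` type, the `check_claim` function, the placeholder markers, and the required metadata keys are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# ASSUMPTION: these placeholder markers are illustrative, not Falsiflow's real list.
PLACEHOLDER_VALUES = {"", "tbd", "todo", "placeholder", "n/a"}

# ASSUMPTION: hypothetical metadata keys that must be pinned to a version.
REQUIRED_METADATA = ("dataset_version", "model_version", "baseline_version")


@dataclass
class Evidence:
    """Hypothetical container for what a claim brings to the gate."""
    source_files: list[str] = field(default_factory=list)
    values: dict[str, str] = field(default_factory=dict)
    pinned_metadata: dict[str, str] = field(default_factory=dict)


def check_claim(evidence: Evidence) -> tuple[str, list[str]]:
    """Return a status token and the reasons the claim was blocked, if any."""
    reasons = []
    # Rule 1: at least one source file must back the claim.
    if not evidence.source_files:
        reasons.append("missing source files")
    # Rule 2: no measured value may be a placeholder.
    for name, value in evidence.values.items():
        if value.strip().lower() in PLACEHOLDER_VALUES:
            reasons.append(f"placeholder value: {name}")
    # Rule 3: every required metadata key must be pinned to a non-empty value.
    for key in REQUIRED_METADATA:
        if not evidence.pinned_metadata.get(key):
            reasons.append(f"unpinned metadata: {key}")
    status = "claim_check_blocked" if reasons else "claim_check_ready"
    return status, reasons
```

Returning the reasons alongside the status keeps the blocked case actionable: a reviewer can see which of the three conditions failed rather than only that the claim was stopped.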

Claims This Demo Targets

AI eval claim

Model beats baseline

Blocks public comparison until dataset, model, raw-output, and reproducibility evidence are pinned.

Product metric

Activation improved

Blocks launch until metric definition, exposure, guardrails, rollback owner, and dashboard evidence are present.

R&D result

Experiment is ready

Blocks advancement until measured evidence, raw source files, thresholds, and review artifacts line up.

Vendor handoff

Supplier can run it

Keeps contact claims, scope confirmation, and measured-data return requirements separate.
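One way to picture the per-claim requirements listed above is a lookup from claim type to the evidence it needs. The sketch below is an assumption-laden illustration: the dictionary keys, the evidence identifiers, and `missing_evidence` are hypothetical names paraphrasing the cards, not Falsiflow's schema. The vendor handoff card is left out because it describes keeping items separate rather than a blocking list.

```python
# ASSUMPTION: identifiers paraphrase the claim cards; they are not a real schema.
REQUIRED_EVIDENCE = {
    "ai_eval": ["dataset", "model", "raw_outputs", "reproducibility"],
    "product_metric": [
        "metric_definition",
        "exposure",
        "guardrails",
        "rollback_owner",
        "dashboard",
    ],
    "rnd_result": [
        "measured_evidence",
        "raw_source_files",
        "thresholds",
        "review_artifacts",
    ],
}


def missing_evidence(claim_type: str, provided: set[str]) -> list[str]:
    """List the required evidence items that have not been provided yet."""
    required = REQUIRED_EVIDENCE[claim_type]
    # Preserve the order of the requirement list so reports read consistently.
    return [item for item in required if item not in provided]
```

For example, `missing_evidence("ai_eval", {"dataset", "model"})` returns `["raw_outputs", "reproducibility"]`, which would keep a "model beats baseline" claim blocked under this sketch.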

1. Run your own check

Open workbench

Upload project, evidence, and source files, then run the local evidence gate from this browser.

2. See the example result

Open report

A readable pass/fail report that shows the decision, the evidence, and what to check next.

3. Make your own checklist

Open wizard

Fill in a few form fields and download starter files without writing JSON by hand.

4. Inspect the evidence

Open dashboard

See which measured values and source files support the example decision.

Advanced CLI Handoff

Use this only when you are ready to inspect or rerun the generated project from the terminal.

falsiflow doctor --project-dir public_demo/project --strict
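
If you want to call the command above from a script, a thin wrapper might look like the following. Only the command line itself comes from this page; `build_doctor_command` and `run_doctor` are hypothetical helper names, and treating exit status 0 as a passing check is an assumption that should be confirmed against the tool's own documentation.

```python
import subprocess


def build_doctor_command(project_dir: str, strict: bool = True) -> list[str]:
    """Build the argument list for the doctor command shown on this page."""
    command = ["falsiflow", "doctor", "--project-dir", project_dir]
    if strict:
        command.append("--strict")
    return command


def run_doctor(project_dir: str) -> bool:
    """Run the doctor command and report whether it exited with status 0.

    ASSUMPTION: a zero exit status means the check passed. This page does
    not document the CLI's exit codes, so verify before relying on it.
    """
    result = subprocess.run(build_doctor_command(project_dir), check=False)
    return result.returncode == 0
```

Calling `run_doctor("public_demo/project")` requires the `falsiflow` executable to be on your PATH; otherwise `subprocess.run` raises `FileNotFoundError`.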