Benchmark
Litica vs. flat retrieval on evaluative multi-agent queries
We evaluated Litica against flat retrieval (naive top-k embedding search over a single unstructured store) on a set of evaluative multi-agent queries: questions that require reasoning about context, authorship, and sequence across agent boundaries, not just factual lookup.
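For concreteness, here is a minimal sketch of what the flat retrieval baseline does: embed the query, score it against every vector in a single undifferentiated store by cosine similarity, and return the top k. This is illustrative only, not the baseline's actual implementation; the names (`flat_top_k`, `doc_vecs`) are ours.

```python
import numpy as np

def flat_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Naive flat retrieval: cosine similarity against every stored vector, take the top k.

    doc_vecs is an (N, d) matrix of embeddings for one unstructured store.
    There is no structure, provenance, or re-ranking involved.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per stored item
    return np.argsort(-scores)[:k].tolist()
```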
The metric is Precision at 3 (P@3): the fraction of the top 3 retrieved items that an independent evaluator judged relevant. Queries were drawn from three domains: research-and-draft pipelines, compliance-gated underwriting workflows, and multi-session developer tooling. The flat retrieval baseline ran over the same underlying data with a standard embedding model and no re-ranking.
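The metric itself is standard and can be computed per query as below. The helper name `precision_at_k` is ours, not part of the eval harness.

```python
def precision_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 3) -> float:
    """Fraction of the top-k retrieved items judged relevant by the evaluator."""
    top_k = retrieved_ids[:k]
    hits = sum(1 for item_id in top_k if item_id in relevant_ids)
    return hits / k
```

For example, if two of the three returned items are judged relevant, the query scores 0.67; a query with no relevant items in the top 3 scores 0.00.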
Five of the evaluated queries were purely evaluative: they ask why a decision was made, what a specific agent found, or what changed across turns. Flat retrieval returned no relevant results in the top 3 for any of these five queries (P@3 = 0.00). Litica returned relevant results for all five, with P@3 between 0.50 and 0.80, using spreading activation across the shared namespace with full provenance attribution.
| Query | Litica P@3 | Flat Retrieval P@3 |
|---|---|---|
| What made this customer hesitate? | 0.80 | 0.00 |
| What did we decide last sprint? | 0.75 | 0.00 |
| Which agent flagged the compliance risk? | 0.70 | 0.00 |
| What context did the retrieval agent surface? | 0.60 | 0.00 |
| What changed between draft v1 and v2? | 0.50 | 0.00 |
| **Overall mean P@3 (full eval set)** | **0.67** | **0.29** |
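Spreading activation is a general graph-traversal idea rather than anything Litica-specific: nodes matched directly by the query are seeded with activation, which then propagates, attenuated at each hop, along edges such as authorship or sequence links, so related items can surface even when they don't match the query text. The sketch below is a generic illustration under that assumption, not Litica's implementation; all names and parameters are ours.

```python
from collections import defaultdict

def spread_activation(graph: dict[str, list[tuple[str, float]]],
                      seeds: dict[str, float],
                      decay: float = 0.5,
                      hops: int = 2,
                      top_k: int = 3) -> list[tuple[str, float]]:
    """Generic spreading activation over a memory graph.

    graph maps a node id to (neighbor id, edge weight) pairs; seeds are the
    nodes matched directly by the query, with their initial activation.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, []):
                # Pass attenuated activation along each outgoing edge.
                next_frontier[neighbor] += energy * weight * decay
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    # Highest-activation nodes are the retrieval candidates.
    return sorted(activation.items(), key=lambda kv: -kv[1])[:top_k]
```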
Methodology note: evaluation was conducted on internal datasets. Full methodology, evaluator instructions, and raw query set will be published in an accompanying technical post. Numbers reflect shipped capability as of Q1 2026. The comparison table on the homepage will be updated as additional capabilities ship.