Technology / Retrieval & RAG

Retrieval is the easy part. Reasoning over what you retrieve is the hard part.

Vector search plus an LLM is a great demo. It is a poor system. The interesting problems start at the next layer up.

From retrieval-augmented generation to retrieval-augmented reasoning.

Lewis et al. wrote down the original idea in 2020: combine the parametric memory inside a language model with non-parametric memory pulled from somewhere else. It solved a real problem. Models could finally stop making things up about what happened last Tuesday.

Most enterprise RAG since then has stopped at that paper. Vector search plus an LLM. Predictable failure modes when knowledge has to be composed across sources, when claims contradict, when half the documents are eighteen months old and no one bothered to mark them as such.

Our research line — Retrieve Is Not Enough — is about what a retrieval system needs once you accept that similarity search is the floor, not the ceiling.

Four questions vector search cannot answer.

A retrieval system that ships into production has to answer these. None of them are about cosine distance.

What kind of knowledge did I just find?

A claim, a source, a hypothesis, an old belief that quietly stopped being true. Each one needs a different downstream treatment.

How much should I trust it?

Confidence, decay, provenance — first-class properties of every unit, not annotations bolted on after the fact.

Does it contradict what I just said two paragraphs ago?

And if it does, which side does the system surface — to the agent, to the analyst, to the person about to make a call?

Is the context I assembled actually useful for the task?

Decision-ready, not topically relevant. A flat top-k of similar chunks is not a context. It is a guess that the LLM has been kind enough to dress up.

Seven pieces, in order.

Each stage is a design decision. Skip one and the failure shows up two queries later — usually as a confident sentence no one can trace back to a source.

01

Hybrid retrieval

Dense embeddings, sparse retrieval, structural retrieval over a graph — weighted dynamically by the kind of question being asked. Pure dense loses precision on entity-heavy queries; pure sparse loses recall on conceptual ones. We use all three, and we do not pretend one is enough.
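A minimal sketch of what "weighted dynamically by the kind of question" might mean in practice. The classification heuristics, field names, and weight values here are illustrative assumptions, not the production router.

```python
import re

# Illustrative query router: classify the query, return fusion weights
# for the three retrievers. Entity-heavy queries lean sparse (exact
# matching); conceptual queries lean dense (semantic similarity).
def weight_for(query: str) -> dict[str, float]:
    entity_like = len(re.findall(r"\b[A-Z][a-zA-Z0-9]+\b", query))
    conceptual = any(w in query.lower() for w in ("why", "how", "compare", "trend"))
    if entity_like >= 2 and not conceptual:
        return {"dense": 0.2, "sparse": 0.6, "graph": 0.2}
    if conceptual:
        return {"dense": 0.6, "sparse": 0.2, "graph": 0.2}
    return {"dense": 0.4, "sparse": 0.4, "graph": 0.2}
```

The weights always sum to one, so downstream fusion stays comparable across query classes.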

02

Knowledge Object layer

What comes back is not a chunk. It is a typed unit — claim, hypothesis, decision, observation — carrying its source, its confidence, its decay state, and its relationships to other units. The agent sitting above the retrieval layer knows what kind of thing it is reading.

03

Provenance graph

Every retrieved claim is traceable to its origin and to other claims that support, depend on, or argue with it. Without this, you cannot audit the output. With it, you can defend the output.
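What "traceable to its origin" looks like as an operation, sketched against a hypothetical in-memory graph. The field names (`supported_by`, `source`) are assumptions for illustration; the real graph is richer.

```python
# Each node records the claims it rests on; leaves carry a document
# source. Walking the support edges yields the audit trail for any
# retrieved claim -- the thing you show when asked "says who?".
def trace(claim_id: str, graph: dict[str, dict]) -> list[str]:
    node = graph[claim_id]
    if not node["supported_by"]:  # leaf: a directly sourced claim
        return [f"{claim_id} <- {node['source']}"]
    trail: list[str] = []
    for parent in node["supported_by"]:
        trail.extend(trace(parent, graph))
    return trail
```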

04

Deterministic salience ranking

Importance is computed by formula — ODE-based, courtesy of Project OIDA — not asked of an LLM in passing. Same question, same data, same ranking. Two months later, still the same. That stability is the point.
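To make the determinism claim concrete, here is the simplest possible ODE-flavoured salience: the closed-form solution of ds/dt = −λs. This is a stand-in, not the OIDA formulation, which is not reproduced here; the point is that the same inputs always yield the same ranking.

```python
import math

# Salience as the solution of ds/dt = -lam * s: s(t) = s0 * exp(-lam * t).
# Pure function of the data -- no LLM in the loop, so re-running the
# ranking two months later on the same inputs gives the same order.
def salience(s0: float, lam: float, age_days: float) -> float:
    return s0 * math.exp(-lam * age_days)

def rank(items: list[tuple[str, float, float, float]]) -> list[str]:
    # items: (id, initial_salience, decay_rate, age_days)
    return [i for i, *_ in sorted(items, key=lambda x: -salience(x[1], x[2], x[3]))]
```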

05

Contradiction detection

When retrieved evidence conflicts, the system flags it. It does not let the generative model quietly pick a side and write a confident sentence about it.
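A sketch of the flagging step, assuming each candidate already carries a `contradicts` list of KO ids (as in the type excerpt below); the field names are assumptions. Note that neither side is dropped — both are marked and surfaced.

```python
# Surface conflicts inside a single result set: a candidate is flagged
# when something it contradicts was also retrieved. The decision of
# which side to trust is left to the layer above, not the generator.
def detect_contradictions(candidates: list[dict]) -> list[dict]:
    ids = {c["id"] for c in candidates}
    for c in candidates:
        conflicts = [x for x in c.get("contradicts", []) if x in ids]
        c["conflict_with"] = conflicts
        c["conflicted"] = bool(conflicts)
    return candidates
```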

06

Decay-aware filtering

Knowledge has a half-life that depends on what it is. A market observation is not a ratified decision. Our filtering knows the difference and downweights accordingly.
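A half-life is easy to write down. A sketch with assumed per-type half-lives — the numbers are illustrative, not calibrated values:

```python
# Assumed half-lives in days, by knowledge type. A market observation
# goes stale in weeks; a ratified decision holds for years.
HALF_LIFE = {"observation": 30.0, "hypothesis": 90.0, "decision": 3650.0}

def decay_weight(ko_type: str, age_days: float) -> float:
    # Standard exponential decay: weight halves every half-life.
    hl = HALF_LIFE.get(ko_type, 180.0)
    return 0.5 ** (age_days / hl)
```

At 30 days old, an observation has lost half its weight while a decision has barely moved — that asymmetry is what type-aware filtering buys.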

07

Task-shaped context assembly

The context that reaches the model is shaped for the job — analysis, decision support, memo, due diligence — not delivered as a uniform stack of similar paragraphs and a hope.
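One way to read "shaped for the job": the same retrieved units, sectioned and ordered differently per task. The task labels and section ordering below are assumptions for illustration.

```python
# Task-shaped assembly sketch: a due-diligence context leads with
# ratified decisions; a memo context leads with established facts.
def assemble(kos: list[dict], task: str) -> str:
    if task == "due_diligence":
        order = ["decision", "factual", "hypothesis"]
    else:  # e.g. "memo"
        order = ["factual", "hypothesis", "decision"]
    lines: list[str] = []
    for t in order:
        block = [k["claim"] for k in kos if k["type"] == t]
        if block:
            lines.append(f"## {t}")
            lines.extend(block)
    return "\n".join(lines)
```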

What it looks like, written down.

Two reference excerpts. The orchestrator on one side, the shape of what it hands back on the other. Real implementations are calibrated to domain — these are the bones, not the body.

retrieval.py

# Hybrid retrieval: dense + sparse + structural,
# weighted dynamically by query class.
def retrieve(query: str, k: int = 12) -> list[KnowledgeObject]:
    weights = router.weight_for(query)

    dense   = vector_index.search(query, k=k * 3)
    sparse  = bm25_index.search(query, k=k * 3)
    graph   = provenance_graph.expand(query, k=k * 2)

    candidates = fuse(dense, sparse, graph, weights=weights)
    candidates = apply_decay(candidates, now=clock.now())
    candidates = detect_contradictions(candidates)

    # ODE-based salience, deterministic across queries.
    return rank_by_salience(candidates)[:k]

knowledge_object.ts

// What the agent actually receives — not a flat chunk.
type RetrievedKO = {
  id: string
  claim: string
  type: 'factual' | 'opinion' | 'hypothesis'
      | 'decision' | 'commitment'

  // epistemic state
  confidence: number      // [0, 1], deterministic
  decayState: number      // [0, 1], 1 = fresh

  // provenance
  source: {
    docId: string
    span: [number, number]
    actor: string
    role: string
  }

  // graph relationships
  supports: string[]      // KO ids supported
  contradicts: string[]   // KO ids in conflict

  salience: number        // ODE-derived, ranked
}

Where it is already running.

Each one exercises the same architecture under a different kind of pressure.

Investment intelligence

Madara / AskMadarAI

PE and VC analysts comparing companies, weighing contradictory signals, writing memos where every claim is traceable. We use Madara for deal sourcing, comparables and risk mapping. It is now being unbundled into specialised AgentStreet agents — Startup Evaluator, PE Intelligence, Portfolio Monitor.

Media intelligence

Newjee

Newjee does not retrieve articles. It retrieves claims, narratives, media actors and framing patterns — and maps how the same event gets constructed across outlets. Media monitoring stops being a clipping service and starts being an instrument.

Flagship venture · Epistemic Knowledge for the AI Era

OIDA

OIDA is the tech company accelerated by KVA, working on epistemic infrastructure for organisations. Retrieval there is one piece of a Knowledge Gravity Engine that models how knowledge ages, conflicts, and is trusted. It works at the epistemological layer — what is known, with what confidence, what is decaying — distinct from ontological platforms that just model entities and relations.

Client implementations

Enterprise retrieval

Legal and regulatory workflows, R&D knowledge bases, due diligence systems, compliance retrieval. Same architecture, calibrated to domain, owned by the client.

Retrieve Is Not Enough.

The position paper. Why similarity-based retrieval breaks under epistemic load, and what a retrieval architecture has to do instead to support reasoning, decision support and outputs anyone can defend. In preparation for 2026.

All KVA Research →

Build retrieval that holds up.

If you are shipping a RAG system into production — or replacing one that quietly stopped earning trust — we run targeted retrieval audits and architecture reviews.