Szymon Smagowski · AI Lab · est. 2024


Scroll into the example flow — see what's possible

Small team exploring AI?

Talk to the lab

Node · 01 · USER

The question.

Any input, any language. Previous user context and chat history travel with the turn — and the graph can enter from any node, not only here.

multilingual · user context · any entry

Node · 02 · VOICE

Steer with your voice.

Everything text can do — done by voice instead. Wake-word, in-browser, no audio leaving the device until the user wants it to. Hands free, eyes on the work, "in the room with the assistant".

wake-word · on-device · hands-free

Node · 03 · ROUTER

The router.

A fast LLM classifies the turn — small-talk, retrieval, or tool call. Everything downstream branches here.

classify · branching
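A minimal sketch of that routing step in Python. A keyword heuristic stands in for the fast LLM classifier, and the branch names and `route` function are illustrative, not taken from this system:

```python
# Illustrative router: a cheap classifier decides which branch handles
# the turn. In production this would be a small, fast LLM; here a
# keyword heuristic stands in so the sketch runs anywhere.
BRANCHES = ("small_talk", "retrieval", "tool_call")

def classify(turn: str) -> str:
    """Stand-in for the fast LLM classifier."""
    text = turn.lower()
    if any(w in text for w in ("run", "create", "schedule", "send")):
        return "tool_call"
    if text.rstrip("?!.") in ("hi", "hello", "thanks", "how are you"):
        return "small_talk"
    return "retrieval"

def route(turn: str) -> str:
    branch = classify(turn)
    assert branch in BRANCHES
    return branch
```

Everything downstream dispatches on that single label, which is what keeps the branch point cheap.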

Node · 04 · EMBEDDING

Text becomes vectors.

Text becomes a high-dimensional vector. Same meaning lands close together — across languages, across typos.

embedding model · cross-lingual
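The "same meaning lands close" idea in miniature. The vectors below are hand-picked stand-ins for a real embedding model's output; only the cosine-similarity math is real:

```python
import math

# Toy illustration: embeddings map text to vectors, and cosine
# similarity measures closeness of meaning. These vectors are
# hand-picked stand-ins for a real embedding model's output.
FAKE_EMBEDDINGS = {
    "How do I reset my password?": [0.9, 0.1, 0.2],
    "Passwort zurücksetzen":       [0.8, 0.2, 0.3],  # same intent, German
    "Quarterly revenue report":    [0.1, 0.9, 0.4],  # unrelated
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

q = FAKE_EMBEDDINGS["How do I reset my password?"]
same = cosine(q, FAKE_EMBEDDINGS["Passwort zurücksetzen"])
other = cosine(q, FAKE_EMBEDDINGS["Quarterly revenue report"])
```

With a real model, the German paraphrase genuinely does score closer than the unrelated English sentence — that is the cross-lingual property the node describes.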

Node · 05 · VECTOR DB

The vector database.

Indexed chunks as dense vectors. Payload ACLs filter at query time — no separate auth pass. Sparse index in parallel for keyword recall.

vector DB · payload ACL · sparse + dense
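An in-memory sketch of query-time ACL filtering: each point carries a payload naming the groups allowed to see it, and the filter runs inside the search rather than as a separate auth pass. The data, field names, and `search` function are illustrative, not a real vector-DB API:

```python
import math

# Each indexed point = dense vector + payload. The ACL check happens
# in the same pass as scoring, so restricted docs never enter the
# candidate set for a user outside their groups.
POINTS = [
    {"vec": [0.9, 0.1], "payload": {"doc": "hr-handbook", "acl": {"hr", "all"}}},
    {"vec": [0.8, 0.2], "payload": {"doc": "salary-bands", "acl": {"hr"}}},
    {"vec": [0.1, 0.9], "payload": {"doc": "brand-assets", "acl": {"all"}}},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, user_groups, top_k=2):
    visible = [p for p in POINTS if p["payload"]["acl"] & user_groups]
    ranked = sorted(visible, key=lambda p: cosine(query_vec, p["vec"]),
                    reverse=True)
    return [p["payload"]["doc"] for p in ranked[:top_k]]
```

A user in only the `all` group can never surface `salary-bands`, however well the vector matches.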

Node · 06 · RERANK

An LLM judge.

Vector similarity gives a coarse ranking. A smaller LLM rescores the candidates and trims to the most relevant — the noise cosine distance can't see.

LLM-as-judge · relevance score
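The rerank step, sketched. A crude token-overlap score stands in for the smaller LLM's relevance judgement; the shape of the step — rescore the pre-ranked candidates, keep the best — is what matters:

```python
# Stand-in judge: in production a smaller LLM scores each
# (question, chunk) pair; here a token-overlap heuristic keeps the
# sketch self-contained.
def judge(question: str, chunk: str) -> float:
    q = set(question.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / max(len(q), 1)

def rerank(question, candidates, keep=2):
    scored = sorted(candidates, key=lambda c: judge(question, c),
                    reverse=True)
    return scored[:keep]
```

The point of the second pass: the judge reads the actual text, so it can demote a chunk that sits close in vector space but says nothing useful.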

Node · 07 · TOOLS

When the agent acts.

Some questions need actions, not retrieval. The agent picks a tool, runs it, brings the output back. Scoped args block prompt injection structurally.

scoped tools · tool calls
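What "scoped args block prompt injection structurally" can look like in practice. Each tool declares exactly which arguments it accepts, and anything else is rejected before the tool runs. Tool names and schemas here are illustrative:

```python
# Each tool declares its allowed arguments up front. An injected or
# hallucinated argument is rejected structurally, before any code runs.
TOOLS = {
    "create_ticket": {
        "allowed_args": {"title", "priority"},
        "fn": lambda title, priority="normal": f"ticket:{title}:{priority}",
    },
}

def call_tool(name: str, args: dict) -> str:
    spec = TOOLS[name]
    extra = set(args) - spec["allowed_args"]
    if extra:
        raise ValueError(f"unexpected args: {sorted(extra)}")
    return spec["fn"](**args)
```

A prompt-injected `{"shell": "rm -rf /"}` never reaches the tool body — the schema check fails first, no matter what the model was talked into emitting.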

Node · 08 · ANSWER LLM

The generator.

Retrieved chunks plus tool output become the prompt. A large LLM streams the response back to the user.

large LLM · streaming
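The generation step as a sketch: retrieved chunks and tool output are assembled into one prompt, and the answer streams back token by token. The generator fakes the LLM and the prompt layout is illustrative:

```python
# Assemble the prompt from what the earlier nodes produced, then
# stream the answer. stream_answer() is a stand-in for a real
# streaming LLM API.
def build_prompt(question, chunks, tool_output):
    context = "\n".join(f"- {c}" for c in chunks)
    return (f"Context:\n{context}\n"
            f"Tool output: {tool_output}\n"
            f"Question: {question}")

def stream_answer(prompt):
    """Stand-in for a large LLM's streaming response."""
    for token in ["Based", "on", "the", "context,", "yes."]:
        yield token

prompt = build_prompt("Is VPN required?",
                      ["VPN is mandatory off-site"], "status: ok")
answer = " ".join(stream_answer(prompt))
```

Streaming matters for feel: the first tokens reach the user while the rest are still being generated.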

Node · 09 · OBSERVABILITY

Every step traced.

Every step lands as a trace span — scores, args, latency, cost. The same trace ID flows from the user turn to the eval harness.

tracing · RAG eval
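A minimal tracing sketch, assuming nothing beyond the standard library: every step records a span carrying the same trace ID plus latency and whatever metadata it wants. Field names are illustrative, not any specific tracing library:

```python
import time
import uuid

# One span per pipeline step, all sharing a trace ID so the whole
# turn can be reassembled later — including in the eval harness.
SPANS = []

class span:
    def __init__(self, trace_id, name, **meta):
        self.record = {"trace_id": trace_id, "name": name, **meta}

    def __enter__(self):
        self.start = time.perf_counter()
        return self.record

    def __exit__(self, *exc):
        self.record["latency_s"] = time.perf_counter() - self.start
        SPANS.append(self.record)
        return False

trace_id = str(uuid.uuid4())
with span(trace_id, "router", branch="retrieval"):
    pass  # ... routing work ...
with span(trace_id, "rerank", kept=2, cost_usd=0.0001):
    pass  # ... rerank work ...
```

Because every span carries the same `trace_id`, scores, args, latency, and cost for one user turn can be queried as a single unit.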

Node · 10 · FEEDBACK

The human signal.

Thumbs up / down on the answer or the retrieval, free-text follow-ups, escalations. Real users telling the system what's good and what's wrong — captured per turn, joined to the trace.

👍 / 👎 · per-turn · joined to trace

Node · 11 · DATASET + EVAL

A labelled ground truth.

Business subject-matter experts plus the user feedback stream — distilled into a labelled eval set. RAG and LLM-answer quality scored against it on every build. The dataset grows, the bar moves with it.

expert review · RLHF · eval set
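What "scored against it on every build" can mean concretely: each labelled item pairs a question with the doc the experts say should come back, and the build is scored on how often retrieval agrees. The data and the `retrieve()` stub are illustrative:

```python
# Labelled eval set distilled from expert review + user feedback.
# recall_at_1 is one simple retrieval-quality metric a build can be
# gated on; retrieve() stands in for the real pipeline.
EVAL_SET = [
    {"question": "reset password", "expected_doc": "hr-handbook"},
    {"question": "brand colors", "expected_doc": "brand-assets"},
]

def retrieve(question):
    """Stand-in for the real retrieval pipeline."""
    return ["hr-handbook"] if "password" in question else ["brand-assets"]

def recall_at_1(eval_set):
    hits = sum(1 for item in eval_set
               if item["expected_doc"] in retrieve(item["question"])[:1])
    return hits / len(eval_set)

score = recall_at_1(EVAL_SET)
```

As the dataset grows, the same score keeps being recomputed against the bigger set — which is how the bar moves with it.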

The loop closes

The system learns.

Findings from feedback and eval flow back to retrieval and the model — re-ranked priors, re-weighted retrieval, tuned prompts, fine-tunes. Today's bug becomes tomorrow's regression test.

Talk to the lab

Got something to build?

A short email is fine. Tell me what you're trying to do and where you're stuck — I'll write back.