The lab · news

Agent-curated, daily.

Built and run on GCP. Scrapes GitHub Trending, Hacker News, Medium RSS feeds, and vendor blogs. Daily ingest, daily publish, daily summary. The page itself is the demo.

How is this page made?

Articles

16 May 2026

How this page is made

A small daily pipeline, seven steps.

Step · 01 · SOURCES

Four sources.

GitHub Trending (HTML scrape, AI/LLM/agent keyword filter), Hacker News (Algolia API, last 24h), Medium (six RSS feeds: TDS, Gradient Ascent, AI Mind, The Generator, plus the #llm and #large-language-models tags), and vendor blogs from Anthropic, OpenAI, DeepMind, and Hugging Face. Each adapter is async and isolated, so one failing source doesn't fail the run.

github trending | hacker news | 6 medium feeds | 4 vendor blogs
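A minimal sketch of the isolation rule, assuming asyncio-based adapters. The adapter names (`fetch_github`, `fetch_hn`), the `Candidate` shape, and the simulated timeout are hypothetical stand-ins. The piece doing the work is `asyncio.gather(..., return_exceptions=True)`, which hands back a failure as a value instead of aborting the whole run.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Candidate:
    source: str
    title: str
    url: str
    excerpt: str = ""


# Hypothetical adapters: real ones would scrape GitHub Trending,
# query the Algolia HN API, parse Medium RSS, and read vendor blogs.
async def fetch_github() -> list[Candidate]:
    return [Candidate("github", "repo-a", "https://github.com/example/repo-a")]


async def fetch_hn() -> list[Candidate]:
    raise TimeoutError("algolia timed out")  # simulate one failing source


async def run_sources(adapters) -> tuple[list[Candidate], dict[str, str]]:
    """Run every adapter concurrently; a failure is recorded, not raised."""
    results = await asyncio.gather(*(a() for a in adapters), return_exceptions=True)
    candidates: list[Candidate] = []
    errors: dict[str, str] = {}
    for adapter, result in zip(adapters, results):
        if isinstance(result, BaseException):
            errors[adapter.__name__] = repr(result)  # keep the error for the audit trail
        else:
            candidates.extend(result)
    return candidates, errors
```

Here `asyncio.run(run_sources([fetch_github, fetch_hn]))` returns the GitHub candidate plus an error entry for the failed HN adapter, which could then be stored on the run's audit row.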

Step · 02 · SCRAPE

Fan out, merge in.

All four adapters fan out in parallel. Each respects a per-source cap so one chatty feed can't drown out the others. The merged pool typically lands around 70 candidates per run; on a slow news day it can be as few as 20.

async parallel | per-source cap | ~70 candidates
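The per-source cap can be sketched as a slice per source before merging. The cap values below are invented for illustration (the page does not state them), and the sketch assumes each adapter returns its items best-first.

```python
# Hypothetical caps per source; the page only says each source has one.
PER_SOURCE_CAP = {"github": 25, "hn": 25, "medium": 15, "vendor": 10}
DEFAULT_CAP = 10  # assumed fallback for a source missing from the table


def cap_and_merge(batches: dict[str, list[dict]], caps: dict[str, int] = PER_SOURCE_CAP) -> list[dict]:
    """Keep at most caps[source] items per source, then merge into one pool."""
    pool: list[dict] = []
    for source, items in batches.items():
        limit = caps.get(source, DEFAULT_CAP)
        pool.extend(items[:limit])  # assumes adapters return their best items first
    return pool
```

Capping before merging, rather than trimming the merged pool, is what stops a single busy feed from crowding out a quiet one.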

Step · 03 · DEDUP

Drop duplicates.

Two layers. First, within-batch clustering: candidates whose text-embedding-3-small embeddings (over title + excerpt) have cosine similarity ≥ 0.85 are merged, so the same Anthropic announcement posted by four sources collapses to one candidate (the longest excerpt wins). Second, backward dedup against the last 14 days of articles, by exact source_url and case-insensitive title.

cosine ≥ 0.85 | embeddings · openai | 14-day backward
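A sketch of both dedup layers, assuming the embeddings have already been fetched (for example from text-embedding-3-small) and stored on each item under a hypothetical `embedding` key, and that `recent` holds the last 14 days of articles. The page does not say which clustering method is used; this uses a simple greedy pass that compares each item with the first member of every existing cluster.

```python
import math

THRESHOLD = 0.85  # cosine similarity at or above which two items count as the same story


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_batch(items: list[dict]) -> list[dict]:
    """Greedy clustering: join the first cluster whose seed is >= THRESHOLD similar."""
    clusters: list[list[dict]] = []
    for item in items:
        for cluster in clusters:
            if cosine(item["embedding"], cluster[0]["embedding"]) >= THRESHOLD:
                cluster.append(item)
                break
        else:
            clusters.append([item])  # no similar cluster found: start a new one
    # Longest excerpt wins within each cluster.
    return [max(c, key=lambda it: len(it["excerpt"])) for c in clusters]


def dedup_backward(items: list[dict], recent: list[dict]) -> list[dict]:
    """Drop items whose exact source_url or case-insensitive title was already published."""
    seen_urls = {r["source_url"] for r in recent}
    seen_titles = {r["title"].casefold() for r in recent}
    return [
        it for it in items
        if it["source_url"] not in seen_urls and it["title"].casefold() not in seen_titles
    ]
```

Greedy seeding is order-dependent; a connected-components or agglomerative approach would be a reasonable alternative if that matters.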

Step · 04 · CURATE

Pick one winner.

An LLM ranks the survivors against the portfolio context (who reads this feed, what stack they use, what they care about) and picks one winner plus up to two alternates. The alternates are insurance: if the writer or critic later rejects the winner, the pipeline swaps in alternate #1 instead of dying.

llm · gpt-5.4-nano | 1 winner + 2 alternates
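The winner-plus-alternates fallback reduces to a short loop. `try_candidate` is a hypothetical callback standing in for the write and critic steps; it returns a draft, or `None` when the candidate is rejected.

```python
from typing import Callable, Optional


def publish_with_fallback(
    ranked: list[str],
    try_candidate: Callable[[str], Optional[str]],
) -> tuple[Optional[str], Optional[str]]:
    """Try the winner, then up to two alternates; return (candidate_id, draft)."""
    for candidate_id in ranked[:3]:  # 1 winner + up to 2 alternates
        draft = try_candidate(candidate_id)  # hypothetical: write + critic, None if rejected
        if draft is not None:
            return candidate_id, draft
    return None, None  # every option rejected: record the failure instead of looping
```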

Step · 05 · WRITE

Draft the post.

Same LLM, different prompt. The writer drafts a 200-450 word first-person post covering three beats: WHAT shipped, WHERE/WHEN it's useful (one concrete situation), and WHAT TO LOOK AT FIRST (entry point, file, page, or flag). One short bullet list is fine. No headings, no images, no marketing voice.

llm · gpt-5.4-nano | 200-450 words | first person
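One way to encode the writer brief is a prompt template. The wording, the `build_writer_prompt` name, and the `title`/`url`/`excerpt` fields are hypothetical; only the constraints themselves (word range, three beats, formatting rules) come from the description above.

```python
# Hypothetical prompt wording; only the constraints are taken from the page.
WRITER_BRIEF = """\
Write a {min_words}-{max_words} word first-person post about the item below.
Cover three beats, in order:
1. WHAT shipped.
2. WHERE/WHEN it is useful: one concrete situation.
3. WHAT TO LOOK AT FIRST: an entry point, file, page, or flag.
At most one short bullet list. No headings, no images, no marketing voice.

Title: {title}
URL: {url}
Excerpt: {excerpt}
"""


def build_writer_prompt(item: dict, min_words: int = 200, max_words: int = 450) -> str:
    # str.format ignores extra keys in item, so the full candidate dict can be passed.
    return WRITER_BRIEF.format(min_words=min_words, max_words=max_words, **item)
```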

Step · 06 · CRITIC

Cheap quality gate.

A third LLM call, deliberately lenient: it rejects only empty or garbage drafts, and revises only drafts over 600 words, under 80 words, or with stray headings or images. One revision attempt is allowed; if the revised draft still fails, the pipeline swaps to the next alternate from the curate step instead of looping forever.

llm · gpt-5.4-nano | ≤1 revise | swap to alternate
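The critic is an LLM call, but its thresholds are concrete enough to sketch as a deterministic pre-check. This is a stand-in, not the actual critic: it treats only empty drafts as garbage, uses simple regexes for Markdown headings and images, and takes a hypothetical `revise` callback in place of the LLM revision call.

```python
import re
from typing import Callable, Optional

HEADING = re.compile(r"^ {0,3}#{1,6}\s", re.MULTILINE)  # Markdown ATX heading at line start
IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)|<img\b", re.IGNORECASE)  # Markdown or HTML image


def verdict(draft: str) -> str:
    """Lenient gate: 'reject', 'revise', or 'accept'."""
    words = len(draft.split())
    if words == 0:
        return "reject"
    if words > 600 or words < 80 or HEADING.search(draft) or IMAGE.search(draft):
        return "revise"
    return "accept"


def critic(draft: str, revise: Callable[[str], str]) -> Optional[str]:
    """At most one revision; None tells the caller to swap to the next alternate."""
    v = verdict(draft)
    if v == "accept":
        return draft
    if v == "reject":
        return None
    revised = revise(draft)  # hypothetical LLM revise call
    return revised if verdict(revised) == "accept" else None
```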

Step · 07 · PUBLISH

Land on this page.

The surviving draft is persisted to two Postgres tables: articles (the story you just read up top) and pipeline_runs (the audit trail: scraped count, unique count, winner id, and error, if any). The Next.js page is statically generated and revalidated via ISR on a one-hour window, so visitors get it straight from the static cache and a new article shows up there within about an hour of publishing.

postgres | articles + pipeline_runs | ssg + isr · 1h
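A sketch of the write path, using Python's built-in `sqlite3` as a stand-in for Postgres so it runs anywhere. Beyond the `articles` and `pipeline_runs` tables and the audit fields named above, the column names and types are assumptions, and `winner_id` is assumed here to point at the `articles` row.

```python
import sqlite3
from typing import Optional

# Assumed schema; only the table names and audit fields come from the page.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    source_url TEXT NOT NULL,
    body TEXT NOT NULL,
    published_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS pipeline_runs (
    id INTEGER PRIMARY KEY,
    scraped_count INTEGER NOT NULL,
    unique_count INTEGER NOT NULL,
    winner_id INTEGER REFERENCES articles(id),
    error TEXT,
    ran_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def persist_run(conn: sqlite3.Connection, scraped: int, unique: int,
                article: Optional[dict], error: Optional[str] = None) -> int:
    """Write the article (if any) and the audit row in one transaction."""
    with conn:  # commits on success, rolls back on exception
        winner_id = None
        if article is not None:
            cur = conn.execute(
                "INSERT INTO articles (title, source_url, body) VALUES (?, ?, ?)",
                (article["title"], article["source_url"], article["body"]),
            )
            winner_id = cur.lastrowid
        cur = conn.execute(
            "INSERT INTO pipeline_runs (scraped_count, unique_count, winner_id, error) "
            "VALUES (?, ?, ?, ?)",
            (scraped, unique, winner_id, error),
        )
        return cur.lastrowid
```

Wrapping both inserts in one transaction means a run never leaves an article without its audit row. With psycopg against Postgres the shape would stay the same, with `%s` placeholders in place of `?`.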

The loop · once a day

Read what survived.

Seven steps, one cron job, one article per day at 06:30 Europe/Warsaw via APScheduler. Scroll back up to today's pick: it came out of the end of this pipeline this morning. Tomorrow's run is already scheduled.
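A sketch of the daily trigger. `build_scheduler` assumes APScheduler 3.x, where `add_job(func, "cron", hour=..., minute=...)` and a scheduler-level `timezone` argument are part of the API; `run_pipeline` is a hypothetical entry point. `next_run_after` uses only the standard library to show what 06:30 Europe/Warsaw means in UTC across daylight-saving changes.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

WARSAW = ZoneInfo("Europe/Warsaw")
RUN_AT = time(6, 30)  # 06:30 local wall-clock time


def next_run_after(now: datetime) -> datetime:
    """Next 06:30 Europe/Warsaw strictly after `now` (an aware datetime)."""
    local = now.astimezone(WARSAW)
    candidate = datetime.combine(local.date(), RUN_AT, tzinfo=WARSAW)
    if candidate <= local:
        candidate = datetime.combine(local.date() + timedelta(days=1), RUN_AT, tzinfo=WARSAW)
    return candidate


def run_pipeline() -> None:
    """Hypothetical entry point: scrape, dedup, curate, write, critic, publish."""
    print("pipeline run")


def build_scheduler():
    # Requires APScheduler 3.x; imported lazily so next_run_after stays stdlib-only.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler(timezone="Europe/Warsaw")
    scheduler.add_job(run_pipeline, "cron", hour=6, minute=30)
    return scheduler


if __name__ == "__main__":
    print("next run:", next_run_after(datetime.now(timezone.utc)))
    # build_scheduler().start()  # blocks; fires run_pipeline daily at 06:30 Warsaw time
```

In May (CEST, UTC+2) the run lands at 04:30 UTC; in January (CET, UTC+1) it lands at 05:30 UTC.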

Want this built for you?

This page is what an agent-curated internal-news system looks like at the smallest possible scale. Bigger ones share the same shape, with more inputs and a tighter rubric. If you want one for your team, talk to the lab.
