The lab · news
Agent-curated, daily.
Built and run on GCP. Scrapes RSS feeds, GitHub trending, arXiv categories, and Hacker News. Daily ingest, daily publish, daily summary. The page itself is the demo.
16 May 2026
How this page is made
A small daily pipeline, seven steps.
Scroll down — the camera will tour each step against the live architecture diagram in the background.
Step · 01 · SOURCES
Four feeds.
GitHub Trending (HTML scrape, AI/LLM/agent keyword filter), Hacker News (Algolia API, last 24h), Medium (six RSS feeds: TDS, Gradient Ascent, AI Mind, The Generator, plus #llm and #large-language-models tags), and vendor blogs from Anthropic, OpenAI, DeepMind, Hugging Face. Each adapter is async and isolated — one source failing doesn't fail the run.
Step · 02 · SCRAPE
Fan out, merge in.
All four adapters fan out in parallel. Each respects a per-source cap so one chatty feed can't drown out the others. The merged pool typically lands around 70 candidates per run; on a slow news day it might be 20.
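A minimal sketch of the fan-out, assuming each adapter is a zero-argument coroutine that returns a list of dicts. The names `scrape_all`, `fake_hn`, `fake_github` and the default cap of 25 are hypothetical, not taken from the real pipeline. It shows one way to keep a failing source from failing the run while still enforcing per-source caps.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical adapter shape: no arguments, returns a list of candidate dicts.
Adapter = Callable[[], Awaitable[list[dict]]]


async def scrape_all(
    adapters: dict[str, Adapter],
    caps: dict[str, int],
    default_cap: int = 25,  # assumed value, not from the source
) -> tuple[list[dict], dict[str, str]]:
    names = list(adapters)
    # return_exceptions=True turns a raised exception into a result value,
    # so one broken source cannot cancel the others.
    results = await asyncio.gather(
        *(adapters[name]() for name in names), return_exceptions=True
    )
    pool: list[dict] = []
    errors: dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            errors[name] = repr(result)
            continue
        # Per-source cap: keep only the first N items from each feed.
        pool.extend(result[: caps.get(name, default_cap)])
    return pool, errors


# Stand-in adapters for illustration only.
async def fake_hn() -> list[dict]:
    return [{"source": "hn", "title": f"HN story {i}"} for i in range(5)]


async def fake_github() -> list[dict]:
    raise RuntimeError("trending page layout changed")
```

Calling `asyncio.run(scrape_all({"hn": fake_hn, "github": fake_github}, {"hn": 3}))` yields three Hacker News items plus an error entry for `github`, rather than an exception.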
Step · 03 · DEDUP
Drop duplicates.
Two layers. First: within-batch clustering by cosine similarity ≥ 0.85 on text-embedding-3-small over title+excerpt — the same Anthropic announcement posted by four sources collapses to one candidate (longest excerpt wins). Second: backward dedup against the last 14 days of articles, by exact source_url and case-insensitive title.
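The two layers can be sketched as follows. This assumes the embeddings are already computed and attached to each candidate, and it uses a greedy pass that compares every candidate against the first member of each cluster. The actual clustering method is not described on this page, so treat `Candidate`, `dedup_batch` and `drop_seen` as hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    source_url: str
    title: str
    excerpt: str
    embedding: list[float]  # assumed to be precomputed over title + excerpt


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_batch(cands: list[Candidate], threshold: float = 0.85) -> list[Candidate]:
    """Layer 1: greedy within-batch clustering; the longest excerpt wins."""
    clusters: list[list[Candidate]] = []
    for cand in cands:
        for cluster in clusters:
            if cosine(cand.embedding, cluster[0].embedding) >= threshold:
                cluster.append(cand)
                break
        else:
            clusters.append([cand])
    return [max(cluster, key=lambda c: len(c.excerpt)) for cluster in clusters]


def drop_seen(
    cands: list[Candidate], seen_urls: set[str], seen_titles: set[str]
) -> list[Candidate]:
    """Layer 2: exact URL match or case-insensitive title match against history."""
    lowered = {title.casefold() for title in seen_titles}
    return [
        c
        for c in cands
        if c.source_url not in seen_urls and c.title.casefold() not in lowered
    ]
```

In this sketch `seen_urls` and `seen_titles` would be loaded from the last 14 days of published articles before the call.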
Step · 04 · CURATE
Pick one winner.
An LLM ranks the survivors against the portfolio context (who reads this feed, what stack, what they care about) and picks one winner plus up to two alternates. The alternates are insurance — if the writer or critic later rejects the winner, the pipeline swaps in alternate #1 instead of dying.
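One way to express the "alternates as insurance" idea, assuming the curate step returns candidate ids in ranked order and that the downstream write-and-review stage returns `None` when it rejects a candidate. `first_accepted` and `write_and_review` are hypothetical names used only for this sketch.

```python
from typing import Callable, Optional, Sequence


def first_accepted(
    ranked: Sequence[str],
    write_and_review: Callable[[str], Optional[str]],
    max_alternates: int = 2,
) -> Optional[tuple[str, str]]:
    """Try the winner, then each alternate in order; stop at the first success."""
    # Winner plus up to two alternates; anything ranked lower is never tried.
    shortlist = list(ranked[: 1 + max_alternates])
    for candidate_id in shortlist:
        draft = write_and_review(candidate_id)
        if draft is not None:
            return candidate_id, draft
    # Every shortlisted candidate was rejected: the run publishes nothing.
    return None
```

The loop is bounded by the shortlist length, so a bad day ends with no article rather than an unbounded retry.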
Step · 05 · WRITE
Draft the post.
Same LLM, different prompt. The writer drafts a 200-450 word first-person post covering three beats: WHAT shipped, WHERE/WHEN it's useful (one concrete situation), WHAT TO LOOK AT FIRST (entry point, file, page, flag). One short bullet list is fine. No headings, no images, no marketing voice.
Step · 06 · CRITIC
Cheap quality gate.
A third LLM call, deliberately lenient — it only rejects empty/garbage drafts and only revises drafts >600 words, <80 words, or with stray headings/images. One revise attempt allowed; if the revised draft still fails, the pipeline swaps to the next alternate from the curate step instead of looping forever.
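The critic on this page is an LLM call. The sketch below swaps in plain deterministic rules purely to make the thresholds concrete. It assumes drafts are Markdown, treats only an empty draft as "garbage", and the names `verdict` and `gate` are hypothetical.

```python
import re
from typing import Callable, Optional

# Assumption: drafts are Markdown, so headings start with '#' and images use
# ![alt](url) or an <img> tag.
HEADING = re.compile(r"^\s{0,3}#{1,6}\s", re.MULTILINE)
IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)|<img\b", re.IGNORECASE)


def verdict(draft: str) -> str:
    """Return 'reject', 'revise' or 'accept' using the thresholds from the text."""
    words = len(draft.split())
    if words == 0:
        return "reject"
    if words > 600 or words < 80 or HEADING.search(draft) or IMAGE.search(draft):
        return "revise"
    return "accept"


def gate(draft: str, revise: Callable[[str], str]) -> Optional[str]:
    """Allow one revise attempt; None tells the caller to swap to an alternate."""
    result = verdict(draft)
    if result == "revise":
        draft = revise(draft)
        result = verdict(draft)
    return draft if result == "accept" else None
```

Because `gate` calls `revise` at most once, a draft that still fails after revision falls through to `None` instead of looping.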
Step · 07 · PUBLISH
Land on this page.
Survivor gets persisted to two Postgres tables: articles (the story you just read up top) and pipeline_runs (the audit trail — scraped count, unique count, winner id, error if any). The Next.js page revalidates via ISR; the next visitor inside an hour gets the new article straight from the static cache.
The loop — once a day
Read what survived.
Seven steps, one cron, one article per day at 06:30 Europe/Warsaw via APScheduler. Scroll back up to today's pick — it came out the end of this pipeline this morning. Tomorrow's run is already scheduled.
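To make "06:30 Europe/Warsaw" concrete, here is a standard-library sketch that computes the next run time from any instant. It is not APScheduler code and `next_run` is a hypothetical helper. It only illustrates that the schedule is pinned to Warsaw wall-clock time, so the UTC hour shifts with daylight saving.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

WARSAW = ZoneInfo("Europe/Warsaw")
RUN_AT = time(6, 30)  # wall-clock time in Warsaw


def next_run(now: datetime) -> datetime:
    """Return the next 06:30 Warsaw time strictly after `now`, expressed in UTC."""
    local_now = now.astimezone(WARSAW)
    candidate = datetime.combine(local_now.date(), RUN_AT, tzinfo=WARSAW)
    if candidate <= local_now:
        # Today's slot has already passed; move to tomorrow's.
        tomorrow = local_now.date() + timedelta(days=1)
        candidate = datetime.combine(tomorrow, RUN_AT, tzinfo=WARSAW)
    return candidate.astimezone(timezone.utc)
```

In summer the run lands at 04:30 UTC and in winter at 05:30 UTC, which is why the timezone is part of the schedule rather than a fixed UTC offset.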
Want this built for you?
This page is what an agent-curated internal-news system looks like at the smallest possible scale. Bigger ones are similar — same shape, more inputs, tighter rubric. If you want one for your team, talk to the lab.