Naive Gazeta

I initially built the news aggregator on Azure. Getting anything to actually work felt like eating a bucket of lard: nothing was lean, nothing talked to anything else, and everything was buried under ten layers of toggles and silos. So I scrapped it and rebuilt the whole thing on Cloudflare in Python.

Every morning at 6 AM UTC, a pipeline kicks off automatically:

Fetches articles from The Guardian API and a set of RSS feeds
Extracts full article text using a TypeScript Worker built on Cloudflare's HTMLRewriter
Summarizes each article with Llama 4 Scout, a mixture-of-experts model with 17B active parameters, running on Cloudflare's inference infrastructure (Workers AI)
Ranks articles by recency, source weight, and content quality
Publishes the digest to /digests
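
A minimal sketch of how the ranking step could combine recency, source weight, and content quality into one score. The weights, the 12-hour half-life, and word count as a quality proxy are all assumptions made up for illustration; the real pipeline's formula may look nothing like this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical tuning values, not taken from the real pipeline.
SOURCE_WEIGHTS = {"guardian": 1.0, "rss": 0.7}
DEFAULT_WEIGHT = 0.5
HALF_LIFE_HOURS = 12.0
FULL_QUALITY_WORDS = 800


@dataclass
class Article:
    title: str
    source: str
    published: datetime
    word_count: int  # crude stand-in for "content quality"


def score(article: Article, now: datetime) -> float:
    """Multiply the three signals; each one lies in [0, 1]."""
    age_hours = max((now - article.published).total_seconds() / 3600, 0.0)
    recency = 0.5 ** (age_hours / HALF_LIFE_HOURS)  # halves every 12 hours
    weight = SOURCE_WEIGHTS.get(article.source, DEFAULT_WEIGHT)
    quality = min(article.word_count / FULL_QUALITY_WORDS, 1.0)
    return recency * weight * quality


def rank(articles: list[Article], now: datetime) -> list[Article]:
    """Return articles ordered from highest to lowest score."""
    return sorted(articles, key=lambda a: score(a, now), reverse=True)


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 6, tzinfo=timezone.utc)
    sample = [
        Article("thin", "rss", now - timedelta(hours=1), 100),
        Article("old", "guardian", now - timedelta(hours=24), 900),
        Article("fresh", "guardian", now - timedelta(hours=1), 900),
    ]
    for a in rank(sample, now):
        print(f"{score(a, now):.3f}  {a.title}")
```

Multiplying the signals means a weak value on any one of them (a stale article, a thin one) drags the whole score down, which is one simple way to keep filler out of the digest.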

Apart from the TypeScript extraction Worker, the pipeline is written in Python, running on Cloudflare Workers via Pyodide (CPython compiled to WebAssembly). Each step is a durable checkpoint, so if something fails mid-run, it resumes from the last completed step rather than starting over.
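
To show what "resumes from the last completed step" means in practice, here is a toy sketch of step-level checkpointing in plain Python. It is not the Cloudflare Workflows API: `CheckpointedRun`, `run_pipeline`, and the in-memory `store` dict are hypothetical stand-ins for the platform's durable storage.

```python
import json


class CheckpointedRun:
    """Toy stand-in for a durable workflow: caches each step's result by name."""

    def __init__(self, store: dict):
        # In a real system this would be durable storage, not a dict.
        self.store = store

    def do(self, name, fn):
        # A step that already completed is replayed from the store, not re-run.
        if name in self.store:
            return json.loads(self.store[name])
        result = fn()
        self.store[name] = json.dumps(result)
        return result


def run_pipeline(run: CheckpointedRun, calls: list, fail_at=None):
    """Three fake steps; `calls` records which ones actually executed."""

    def tracked(name, fn):
        def inner():
            calls.append(name)
            if name == fail_at:
                raise RuntimeError(f"{name} failed")
            return fn()
        return run.do(name, inner)

    articles = tracked("fetch", lambda: ["a1", "a2"])
    texts = tracked("extract", lambda: {a: f"text of {a}" for a in articles})
    return tracked("summarize", lambda: {a: t.upper() for a, t in texts.items()})


if __name__ == "__main__":
    store, first, second = {}, [], []
    try:
        run_pipeline(CheckpointedRun(store), first, fail_at="summarize")
    except RuntimeError:
        pass
    run_pipeline(CheckpointedRun(store), second)
    print(first)   # all three steps were attempted
    print(second)  # only the failed step runs again
```

The second run skips `fetch` and `extract` because their results are already in the store, so only the step that failed executes again.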

Data lives in D1 (SQLite at the edge) and R2 (object storage). No traditional backend, no VPS, no bill surprises.
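
Since D1 speaks SQLite, the storage side can be sketched locally with Python's built-in `sqlite3` module. The table layout below is an assumption for illustration (article metadata and summaries in D1, with an `r2_key` column pointing at the full text in R2); the actual schema and the D1 binding calls are not shown here.

```python
import sqlite3

# Hypothetical schema: metadata in the database, full text referenced by R2 key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url          TEXT PRIMARY KEY,
    source       TEXT NOT NULL,
    title        TEXT NOT NULL,
    summary      TEXT,
    r2_key       TEXT,
    published_at TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a local SQLite database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_article(conn: sqlite3.Connection, row: dict) -> None:
    """Insert an article, or refresh its summary and R2 key if the URL exists."""
    conn.execute(
        """
        INSERT INTO articles (url, source, title, summary, r2_key, published_at)
        VALUES (:url, :source, :title, :summary, :r2_key, :published_at)
        ON CONFLICT(url) DO UPDATE SET
            summary = excluded.summary,
            r2_key  = excluded.r2_key
        """,
        row,
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_db()
    row = {
        "url": "https://example.com/story",
        "source": "rss",
        "title": "Example story",
        "summary": None,
        "r2_key": "articles/example-story.txt",
        "published_at": "2025-01-01T05:00:00Z",
    }
    upsert_article(conn, row)
    upsert_article(conn, {**row, "summary": "A short summary."})
    print(conn.execute("SELECT url, summary FROM articles").fetchall())
```

Keying on the URL makes re-runs idempotent: a second pass over the same article updates its row instead of duplicating it.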

Stack: Python · Cloudflare Workers · D1 · R2 · Workers AI · Cloudflare Workflows · Pages Functions

Previous version (Azure): github.com/batoorsayed/news-aggregator

Live: batoorsayed.com/digests