filmsearch


A semantic search engine for films — built so you can ask things like "movies about love and space" and get back what you actually mean, not just title matches. Corpus of 54,000+ films scraped from Wikipedia.

This project has two articles because it has two versions. The first attempt failed: I was throwing LLMs at the problem without understanding what retrieval actually is. The second attempt worked because I stopped outsourcing the thinking and built BM25 with multi-zone ranking myself. A startup founder used it to track down a Japanese animated film he hadn't been able to find anywhere else. Still my favorite piece of feedback.

try it → · github →

the journey
v1 · oct 2024 · failed
learned what retrieval actually is
v2 · nov 2024 · worked

Four weeks between versions. The difference wasn't a bigger model — it was building BM25 and multi-zone ranking from scratch.

articles
Multi-zone, BM25-based film search — the technical breakdown

The rebuild. Dropped the LLM-heavy approach and built a proper retrieval system: BM25 scoring, per-zone TF-IDF tables, cosine similarity for zone ranking, composite scores as a stop-word filter. Math and architecture, step by step.
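For a rough feel of what BM25 scoring across zones looks like, here is a minimal sketch. It is not the project's code: the toy corpus, the `title`/`plot` zones, the fixed `ZONE_WEIGHTS`, and the `K1`/`B` values are all assumptions for illustration, and it combines zones with a plain weighted sum rather than the cosine-similarity zone ranking and composite-score filtering the article describes.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each film is split into text "zones".
FILMS = {
    "Interstellar": {"title": "interstellar",
                     "plot": "a father travels through space for love of his daughter"},
    "Titanic": {"title": "titanic",
                "plot": "a love story aboard a doomed ocean liner"},
    "Alien": {"title": "alien",
              "plot": "a crew in deep space meets a hostile creature"},
}
ZONE_WEIGHTS = {"title": 2.0, "plot": 1.0}  # assumed weights, not from the project
K1, B = 1.2, 0.75                           # common BM25 defaults


def tokenize(text):
    return text.lower().split()


def bm25_zone_scores(query, zone):
    """Score every film against the query using BM25 within a single zone."""
    docs = {name: tokenize(film[zone]) for name, film in FILMS.items()}
    n = len(docs)
    avgdl = sum(len(tokens) for tokens in docs.values()) / n
    # Document frequency: in how many films does each term appear (in this zone)?
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = K1 * (1 - B + B * len(tokens) / avgdl)
            score += idf * tf[term] * (K1 + 1) / (tf[term] + length_norm)
        scores[name] = score
    return scores


def search(query):
    """Combine per-zone BM25 scores with a weighted sum and rank the films."""
    totals = Counter()
    for zone, weight in ZONE_WEIGHTS.items():
        for name, score in bm25_zone_scores(query, zone).items():
            totals[name] += weight * score
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


results = search("love space")
```

On this toy data, the film whose plot mentions both query terms ranks first, ahead of films that match only one. Scoring each zone separately is what lets a title hit count differently from a plot hit.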

Film Search Platform — first attempt (the one that failed)

The first swing. Scrapy pipeline, structured data via Llama 3.1, 384-dim embeddings in Supabase, cosine similarity. It ran. It didn't really work. Keeping the write-up online because the failure is the more useful artifact.