A guide to the Journalist Tool content monitoring pipeline
Last updated March 2026
Journalist Tool is an AI-powered content monitoring and analysis pipeline for beat reporters. It watches RSS feeds, YouTube channels, podcasts, websites, and forums for new content; acquires and cleans transcripts; runs structured AI analysis; maintains a living, self-improving knowledge base; and synthesizes daily intelligence briefings with cross-source themes, signal alerts, and story angles.
Watches RSS feeds, YouTube channels, podcasts, websites, and forums. 7-tier transcript acquisition with a free-first strategy.
Claude Haiku extracts entities, signals, claims, quotes, sentiment, and topics from every piece of content.
Self-improving Cortex with entity profiles, glossary, ASR corrections, and contradiction detection.
Claude Sonnet synthesizes cross-source Signal Radar briefs with trend alerts and story angles.
The pipeline runs as 8 independent Railway cron services sharing the same Docker image. Each service handles one concern and runs on its own schedule. Failures in one service don't affect the others.
Check Sources
Poll RSS, YouTube, web, and forum sources for new content. Circuit breaker skips unhealthy sources.
Transcribe
7-tier acquisition: raw text, web scrape, YouTube subtitles, RSS tags, mirrors, Groq Whisper, YouTube audio.
Enrich
Haiku + RAG corrects ASR errors, normalizes jargon, extracts unknown terms. Webhooks dispatched.
Analyze
Haiku produces structured JSON: summary, entities, signals, claims, quotes, sentiment, topics.
Cortex Learn
Promote terms, refine definitions, detect duplicates, merge aliases, generate training questions.
Synthesize
Sonnet clusters stories across sources, generates the daily Signal Radar brief, renders it as HTML, and sends it via Resend. Pipeline complete.
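The 7-tier acquisition in the Transcribe stage can be sketched as an ordered, free-first fallback chain. The function shape and tier callables below are illustrative stand-ins, not the pipeline's actual API:

```python
from typing import Callable, Optional

def acquire_transcript(
    item: dict,
    tiers: list[Callable[[dict], Optional[str]]],
) -> Optional[str]:
    """Try each tier in order (cheapest first); first non-empty transcript wins."""
    for tier in tiers:
        try:
            text = tier(item)
        except Exception:
            continue  # a failing tier falls through to the next one
        if text and text.strip():
            return text
    return None  # all 7 tiers exhausted; item stays pending
```

Cheap tiers (raw text, web scrape, subtitles) sit at the front of the list, so paid ASR only runs when everything free has failed.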
Three deployment targets work together. Railway runs the Python pipeline on a cron schedule. Supabase hosts the PostgreSQL database and Edge Functions. Vercel serves this dashboard and the Train Cortex game.
Railway runs 8 independent cron services from the same Docker image (python:3.12-slim + ffmpeg + uv). Each service is dispatched by the RUN_MODE env var: source-checker (*/15), transcribe-worker (*/30), analyze-worker (*/30), cortex-applier (*/30), re-enricher (every 2h), cortex-learner (every 4h), embedder (daily 3am), and content-monitor (daily synthesis + email). No long-running server.
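The one-image, eight-services pattern boils down to a dispatch on RUN_MODE at container start. A minimal sketch of such an entrypoint, with stub workers in place of the real pipeline functions:

```python
import os
import sys

def _stub(name: str):
    """Placeholder worker; the real entrypoint would call the pipeline's own functions."""
    def run() -> None:
        print(f"running {name}")
    return run

# One handler per Railway cron service, keyed by the RUN_MODE env var.
HANDLERS = {
    "source-checker": _stub("source-checker"),
    "transcribe-worker": _stub("transcribe-worker"),
    "analyze-worker": _stub("analyze-worker"),
    "cortex-applier": _stub("cortex-applier"),
    "re-enricher": _stub("re-enricher"),
    "cortex-learner": _stub("cortex-learner"),
    "embedder": _stub("embedder"),
    "content-monitor": _stub("content-monitor"),
}

def main() -> int:
    mode = os.environ.get("RUN_MODE", "")
    handler = HANDLERS.get(mode)
    if handler is None:
        print(f"unknown RUN_MODE: {mode!r}", file=sys.stderr)
        return 1
    handler()
    return 0
```

Because each service is a short-lived cron job rather than a server, a crash in one mode never takes down the others.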
Supabase provides PostgreSQL with pgvector for embeddings, plus Edge Functions for the Train Cortex API. 22 tables, 7 RPCs, HNSW indexes for vector similarity search.
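Conceptually, each vector search embeds the query and ranks stored chunks by cosine distance, which pgvector's HNSW index approximates at scale. A pure-Python exact-ranking sketch of what the index computes (not the actual SQL path):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # Same semantics as pgvector's <=> operator: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[str]:
    # Exact nearest-neighbor ranking; an HNSW index returns approximately
    # this ordering in sublinear time instead of scanning every chunk.
    return sorted(chunks, key=lambda cid: cosine_distance(query, chunks[cid]))[:k]
```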
Vercel hosts this Next.js dashboard. Server components read directly from Supabase — no API layer in between. Also hosts the Train Cortex game as static HTML.
At-a-glance stats: total items, active sources, unhealthy sources, latest brief. Preview of the most recent brief and the last 10 discovered items.
When to use: Check system health and see what's new.
Filterable, sortable table of all discovered content. Filter by status (pending, transcript_acquired, analyzed, failed) and source. Each item links to its full analysis.
When to use: Browse content, find specific items, check processing status.
All monitored sources with health indicators. Inline controls to pause, resume, edit, reset health, or delete. Add new sources via the form.
When to use: Manage what the tool monitors. Diagnose source failures.
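These health controls pair with the circuit breaker mentioned in the pipeline. One plausible policy, sketched as an assumption rather than the tool's actual implementation: skip a source after a run of consecutive failures until a success or a manual reset:

```python
class SourceHealth:
    """Hypothetical circuit breaker: trip after `threshold` consecutive failures."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.consecutive_failures = 0

    @property
    def healthy(self) -> bool:
        # The source checker polls only healthy sources.
        return self.consecutive_failures < self.threshold

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1

    def reset(self) -> None:
        # Mirrors the dashboard's "reset health" control.
        self.consecutive_failures = 0
```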
Beat configuration hub. Each beat has a glossary editor, entity manager, signal taxonomy editor, KB browsers, auto-mined suggestion review, and a seed-from-text tool.
When to use: Configure domain knowledge. Review and train the KB.
Daily Signal Radar briefs. Each brief contains an executive summary, convergence threads with cross-source quotes and claims, signal alerts, trend alerts, and story angles.
When to use: Read your morning intelligence briefing.
Machine intelligence tracking: rumors, leaks, announcements, and predictions extracted from content. Filter by entity, type, or confidence level.
When to use: Track accountability: who said what, and were they right?
Two search modes: item search (title + analysis text) and transcript search (full-text segment search with highlighted results and timestamps).
When to use: Find what anyone said about any topic across all sources.
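Highlighted segment search could look like this minimal in-memory sketch; the segment shape is an assumption, and the tool's real search runs in the database rather than in Python:

```python
import re

def search_segments(
    segments: list[tuple[float, str]],  # assumed shape: (timestamp_seconds, text)
    query: str,
) -> list[tuple[float, str]]:
    """Return matching segments with the query wrapped in markdown bold."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = []
    for timestamp, text in segments:
        if pattern.search(text):
            highlighted = pattern.sub(lambda m: f"**{m.group(0)}**", text)
            hits.append((timestamp, highlighted))
    return hits
```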
Cortex is a living, self-improving domain knowledge system. It learns from every transcript the pipeline processes, building a richer understanding of the beat over time.
Canonical registry of people, companies, and organizations. Auto-synthesized profiles from accumulated mentions. Alias detection and entity merging.
Living dictionary of domain-specific terms. Definitions refined from accumulated evidence. Version history tracks how definitions evolve.
Categorized signal types (acquisitions, launches, rumors, etc.) with descriptions and examples. Used by analysis to tag content with detected signals.
Key claims extracted from content with confidence scores. Predictions, rumors, and factual assertions tracked for accountability.
Known speech-to-text errors mapped to correct forms. Fed back into enrichment to improve transcript quality over time.
Detects when new information conflicts with existing KB entries. Flags for human review with side-by-side comparison.
Gamified human-in-the-loop training. Tinder-style swipe cards for disambiguating entities, verifying ASR corrections, and reviewing definitions.
Dashboard metrics: glossary breakdown, linked terms, refinement status, ASR correction counts, contradiction counts, pending questions.
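ASR corrections feed back into enrichment as simple wrong-form to correct-form mappings. A sketch of how such a table might be applied during transcript cleanup; the entries and function are hypothetical, not the Cortex's actual data:

```python
import re

# Hypothetical corrections; the real entries live in the Cortex database.
ASR_CORRECTIONS = {
    "pin ball": "pinball",
    "stern pin ball": "Stern Pinball",
}

def apply_asr_corrections(text: str, corrections: dict[str, str]) -> str:
    # Apply longer patterns first so multi-word fixes win over their substrings.
    for wrong in sorted(corrections, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        text = pattern.sub(corrections[wrong], text)
    return text
```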
Here's how to get the most out of the Journalist Tool day-to-day.
Point the tool at RSS feeds, YouTube channels, podcasts, websites, or forums you want to monitor. Set check intervals per source.
Eight independent Railway cron services run automatically on staggered schedules. They check sources (every 15min), transcribe and enrich (every 30min), analyze (every 30min), apply KB answers, re-enrich stale transcripts, run Cortex learning, embed chunks, and synthesize daily briefs.
Check /briefs each morning for the Signal Radar — cross-source themes, signal alerts, trend alerts, and key quotes.
Use the Claims Explorer to browse predictions, rumors, and factual claims. Filter by entity, type, or confidence to track accountability.
Use /search for full-text and semantic search across all transcripts and analyses. Find what anyone said about any topic.
Visit Train Cortex to answer KB questions (merge terms, verify ASR, disambiguate entities). This improves enrichment and analysis quality over time.
Check /intel for machine intelligence tracking — rumors, leaks, and announcements with source corroboration.
Flywheel effect: The more sources you add and the more you train the Cortex, the smarter the system gets. Each pipeline run builds on previous knowledge. ASR corrections improve transcripts. Glossary terms improve analysis. Entity profiles improve briefs.
| Concept | Definition |
|---|---|
| Source | A content feed to monitor: RSS, YouTube channel, podcast, website, or forum. Each source has a type, URL, check interval, and health tracking. |
| Item | A single piece of content discovered from a source. Progresses through statuses: pending, transcript_acquired, analyzed, failed. |
| Transcript | The text content of an item. Acquired via 7 tiers (free first, paid fallback). Raw transcript is cleaned during enrichment. |
| Analysis | Structured AI output for an item: summary, entities, topics, signals, claims, quotes, sentiment. Produced by Claude Haiku. |
| Signal | A categorized indicator detected in content: acquisition rumor, product launch, executive move, financial result, etc. |
| Claim | A specific assertion extracted from content with a confidence score. Predictions, rumors, and facts tracked for accountability. |
| Brief | A daily synthesis report generated by Claude Sonnet. Contains cross-source themes, signal alerts, trend alerts, and story angles. |
| Beat | A subject area / domain (e.g., pinball industry). Contains the system prompt, glossary, entities, signal taxonomy, and webhook config. |
| Cortex | The living knowledge base. Includes entity profiles, glossary, ASR corrections, contradictions, and training questions. Self-improves with each pipeline run. |
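The item lifecycle from the table can be modeled as a small state machine; the transition map below is inferred from the statuses listed and is an assumption about which moves are legal:

```python
from enum import Enum

class ItemStatus(str, Enum):
    PENDING = "pending"
    TRANSCRIPT_ACQUIRED = "transcript_acquired"
    ANALYZED = "analyzed"
    FAILED = "failed"

# Inferred transitions: any stage may fail; otherwise items only move forward.
TRANSITIONS = {
    ItemStatus.PENDING: {ItemStatus.TRANSCRIPT_ACQUIRED, ItemStatus.FAILED},
    ItemStatus.TRANSCRIPT_ACQUIRED: {ItemStatus.ANALYZED, ItemStatus.FAILED},
    ItemStatus.ANALYZED: set(),   # terminal
    ItemStatus.FAILED: set(),     # terminal
}

def advance(current: ItemStatus, new: ItemStatus) -> ItemStatus:
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```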
Covers pipeline, architecture, dashboard pages, Cortex, and workflow.