RSS Summarizer
Automated news pipeline that ingests articles from Portuguese/Brazilian sources, processes them with AI for summarization and categorization, and publishes optimized posts to Telegram.
Client:
Portugalist, Brazil
Project Overview
The client engaged Axis to design a cost-effective and resilient news automation system. The solution parses raw RSS feeds (Globo, Sapo, Sapo AF), filters irrelevant entries, uses LLMs for extraction and summarization, and delivers formatted posts with titles, summaries, hashtags, and source links directly to Telegram. The system was engineered to reduce API usage costs while keeping output quality consistent.
Challenge
Parsing and formatting diverse RSS feeds with inconsistent structures.
Minimizing costly AI API calls without reducing post quality.
Handling content in both European and Brazilian Portuguese and filtering out noise (ads, irrelevant characters).
Structuring reliable, JSON-like outputs from LLMs despite their inherent variability.
Meeting Telegram-specific constraints (post length limits, caption formatting).
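To make the last constraint concrete: the Telegram Bot API caps message text at 4096 characters and media captions at 1024 characters. The following is a minimal sketch of a length guard, not the project's actual code. The function name fit_to_limit, the choice to shorten only the summary, and the plain-text assumption (no markup) are all illustrative. Python's len() counts code points, so treat the result as an approximation for emoji-heavy text.

```python
TELEGRAM_TEXT_LIMIT = 4096     # max characters in a text message
TELEGRAM_CAPTION_LIMIT = 1024  # max characters in a media caption


def fit_to_limit(title: str, summary: str, footer: str, limit: int) -> str:
    """Join the parts, shortening only the summary if the post is too long.

    Hypothetical helper: the three-part layout is an assumption.
    """
    sep = "\n\n"
    fixed = len(title) + len(footer) + 2 * len(sep)
    room = limit - fixed  # characters left for the summary
    if room <= 0:
        raise ValueError("title and footer alone exceed the limit")
    if len(summary) > room:
        # Cut at the last word boundary that fits, then mark the cut.
        cut = summary[: room - 1].rsplit(" ", 1)[0]
        summary = cut + "…"
    return sep.join([title, summary, footer])
```

Shortening the summary rather than the whole post keeps the title, hashtags, and source link intact, which is usually what a reader needs most.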
Tech Stack
Artificial Intelligence: Llama family (via Together.ai API) for summarization, keyword/hashtag generation, and content optimization.
Backend Logic: Python-based parsers and processing pipeline.
Data Management: Lightweight database for deduplication, freshness tracking, and automated cleanup.
Integration & Delivery: Telegram Bot API for final post publishing.
Optimization Tools: Prompt engineering, configurable system parameters (config file) for fast iteration and tuning.
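The deduplication, freshness tracking, and cleanup role of the lightweight database can be sketched with Python's built-in sqlite3 module. This is an assumption-laden illustration: the source does not name the database, and the table name seen, its columns, and the helper functions below are hypothetical.

```python
import sqlite3
import time


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; the schema here is an assumed one."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        "url TEXT PRIMARY KEY, first_seen REAL NOT NULL)"
    )
    return conn


def mark_if_new(conn: sqlite3.Connection, url: str, now: float = None) -> bool:
    """Record the URL; return True only the first time it is seen."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen (url, first_seen) VALUES (?, ?)", (url, now)
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means the URL was already stored


def purge_older_than(
    conn: sqlite3.Connection, max_age_seconds: float, now: float = None
) -> int:
    """Delete entries older than the given age; return how many were removed."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM seen WHERE first_seen < ?", (now - max_age_seconds,)
    )
    conn.commit()
    return cur.rowcount
```

Using the article URL as the primary key lets the database enforce uniqueness, so the duplicate check and the insert happen in a single statement.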
Solution
Axis delivered an optimized multi-stage system:
Smart Ingestion: Custom RSS parsers for Globo/Sapo with filters to skip short, ad-like content and prioritize longer, higher-value articles.
Cost Control: Changed logic from “AI on every article” to “AI only on scheduled-to-publish articles,” drastically lowering API expenses.
Robust AI Prompts: Structured prompts enforced clean separation of output fields (Title, Description, Hashtags) and reduced errors like malformed JSON.
Language & Format Filtering: Logic added to remove unwanted symbols, normalize hashtags, and enforce consistency across dialects.
Post Formatting: Telegram-ready output with bold titles, proper paragraphing, hashtags, and link attribution.
Error Handling: Fail-safes for missing fields, oversized posts, and empty AI responses.
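The field separation and fail-safes described above can be illustrated with a small parser for a labelled LLM response. This is a sketch under stated assumptions, not the delivered implementation: the "Title: / Description: / Hashtags:" line format, the function names parse_response and normalize_hashtag, and the rule of returning None on any missing field are all hypothetical.

```python
import re
from typing import Optional

FIELDS = ("Title", "Description", "Hashtags")
_FIELD_RE = re.compile(r"^(Title|Description|Hashtags)\s*:\s*(.*)$", re.IGNORECASE)


def normalize_hashtag(raw: str) -> str:
    """Drop symbols, keep letters/digits/underscore, and prefix '#'."""
    cleaned = re.sub(r"[^\w]", "", raw)
    return "#" + cleaned if cleaned else ""


def parse_response(text: str) -> Optional[dict]:
    """Return the three fields, or None if the response is empty or incomplete."""
    if not text or not text.strip():
        return None  # fail-safe: empty AI response
    found = {}
    current = None
    for line in text.splitlines():
        match = _FIELD_RE.match(line.strip())
        if match:
            current = match.group(1).capitalize()
            found[current] = match.group(2).strip()
        elif current and line.strip():
            # Continuation line belongs to the most recent field.
            found[current] = (found[current] + " " + line.strip()).strip()
    if any(not found.get(name) for name in FIELDS):
        return None  # fail-safe: missing or blank field
    tags = [normalize_hashtag(t) for t in re.split(r"[,\s]+", found["Hashtags"])]
    found["Hashtags"] = [t for t in tags if t]
    return found


# Illustrative input only; not taken from the project.
sample = (
    "Title: Novo túnel em Lisboa\n"
    "Description: A obra começa\nem breve.\n"
    "Hashtags: #Lisboa, obras! #Política"
)
result = parse_response(sample)
```

A caller that receives None can skip the article or retry the request, so a malformed response never reaches the Telegram channel.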