Concurrent search, deduplication, URL content analysis, and structured AI report synthesis.
Track 01 · Pipeline. The research engine that fetches, filters, and synthesizes web content for a 6-agent content pipeline. It runs concurrent lookups across SerpAPI, Tavily, Brave, and DuckDuckGo, strips boilerplate navigation from raw pages, ranks sources by quality, and writes structured ResearchReports via Anthropic, OpenAI, or Groq. Extracted from the production Agentic OS.
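The boilerplate-stripping and source-ranking steps can be sketched with the standard library alone. This is a minimal illustration, not the project's analyzer. The project's `src/content/analyzer.py` uses Beautiful Soup, but this sketch swaps in `html.parser.HTMLParser` so it runs without dependencies. `BOILERPLATE_TAGS`, `extract_main_text`, and `rank_pages` are hypothetical names, and `rank_pages` uses word count only as a crude quality proxy.

```python
from html.parser import HTMLParser

# Hypothetical set of layout tags treated as boilerplate; the real selectors may differ.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style", "form"}


class MainTextExtractor(HTMLParser):
    """Collects visible text while skipping anything nested inside boilerplate tags."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the page text with navigation, footers, scripts, etc. removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def rank_pages(pages: dict[str, str]) -> list[tuple[str, int]]:
    """Rank URLs by the word count of their cleaned text, highest first."""
    scores = {url: len(extract_main_text(html).split()) for url, html in pages.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    page = "<nav>Home | About</nav><main><p>Agents need memory.</p></main><footer>(c)</footer>"
    print(extract_main_text(page))  # Agents need memory.
```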
Standard LLMs cannot browse the live web without a search tool, but simply connecting an LLM to a search API is slow and noisy. If an agent researching a topic runs a single search query, reads one result, and writes a response, its output inherits the bias and limits of that single source.
To generate high-quality research, you need a system that queries multiple search indexes (SerpAPI, Brave, Tavily) in parallel; filters out advertising, navigation headers, and duplicate links; reads and scores the content of 5+ web pages simultaneously; and feeds those structured insights into a final synthesis model.
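A minimal sketch of the concurrent fan-out and duplicate-link filtering, assuming each search provider is exposed as an async function that takes a query string. `asyncio.gather(..., return_exceptions=True)` lets a failing provider drop out without cancelling the others. `SearchResult`, `normalize_url`, `search_all`, and the stub providers are hypothetical names. The URL normalization rules (lowercase host; strip `www.`, trailing slash, and fragment) are assumptions rather than the project's documented behavior.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class SearchResult:
    url: str
    title: str
    provider: str


# A provider is any coroutine function that maps a query to a list of results.
Provider = Callable[[str], Awaitable[list[SearchResult]]]


def normalize_url(url: str) -> str:
    """Dedup key: lowercase scheme/host, drop 'www.', the trailing slash, and the #fragment."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


async def search_all(query: str, providers: Sequence[Provider]) -> list[SearchResult]:
    """Query every provider concurrently, skip the ones that fail, and drop duplicate URLs."""
    batches = await asyncio.gather(*(p(query) for p in providers), return_exceptions=True)
    seen: set[str] = set()
    merged: list[SearchResult] = []
    for batch in batches:
        if isinstance(batch, BaseException):
            continue  # one rate-limited provider should not sink the whole search
        for result in batch:
            key = normalize_url(result.url)
            if key not in seen:
                seen.add(key)
                merged.append(result)
    return merged


# Hypothetical stand-ins for real SerpAPI / Brave / Tavily / DuckDuckGo clients.
def make_stub(name: str, urls: list[str], delay: float = 0.01) -> Provider:
    async def provider(query: str) -> list[SearchResult]:
        await asyncio.sleep(delay)  # simulate network latency
        return [SearchResult(url=u, title=f"{query} ({name})", provider=name) for u in urls]

    return provider


async def rate_limited(query: str) -> list[SearchResult]:
    raise RuntimeError("429 Too Many Requests")


if __name__ == "__main__":
    providers = [
        make_stub("serpapi", ["https://example.com/a", "https://www.example.com/a/"]),
        make_stub("brave", ["https://example.com/a#intro", "https://example.org/b"]),
        rate_limited,
    ]
    results = asyncio.run(search_all("agent memory", providers))
    print([r.url for r in results])  # ['https://example.com/a', 'https://example.org/b']
```

The query string is kept in the dedup key on purpose: many sites serve different articles under parameters such as `?id=`, so stripping it could merge distinct pages.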
The main modules are:

- `src/llm/router.py`: Handlers for multi-provider API calls. Manages key loading, default parameters, and the cascading fallback between Anthropic, OpenAI, and Groq when a provider raises an exception.
- `src/search/providers.py`: Implementations for SerpAPI, Tavily, Brave Search, and DuckDuckGo. Uses Python's `asyncio` to fire queries concurrently and merges the results into unified data objects.
- `src/content/analyzer.py`: The scraper and parser module. Extracts text with Beautiful Soup selectors, strips layout noise, and runs source-analysis prompts.
- `src/agents/research_agent.py`: Orchestrates the search, crawl, evaluation, and synthesis pipeline and turns the input parameters into a final report.

The system evaluates services dynamically, shifting load when APIs rate-limit or fail:
| LLM Providers (Priority order) | Search Providers (Concurrent execution) |
|---|---|
| 1. Anthropic Claude (Best synthesis output, paid) | SerpAPI (Google index search, requires key) |
| 2. OpenAI GPT (High consistency, paid) | Tavily (AI search specialist, requires key) |
| 3. Groq (Fast response, free tier) | Brave Search (Independent index, requires key) |
| 4. Google Gemini (Fallback, free tier) | DuckDuckGo (Free backup, runs out-of-the-box) |
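Read as a fallback chain, the priority column suggests trying each LLM provider in order and moving to the next when a call raises (for example on a rate limit or timeout). The sketch below shows that pattern with plain callables. `complete_with_fallback`, `AllProvidersFailed`, and the stub functions are hypothetical stand-ins, not the API of `src/llm/router.py` or of any provider SDK.

```python
from collections.abc import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(
    prompt: str, chain: Sequence[tuple[str, Callable[[str], str]]]
) -> tuple[str, str]:
    """Try each (name, call) pair in priority order and return the first success."""
    errors: list[str] = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except Exception as exc:  # rate limit, timeout, missing key, etc.
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stubs standing in for the Anthropic, OpenAI, Groq, and Gemini clients.
def anthropic_stub(prompt: str) -> str:
    raise TimeoutError("upstream timeout")


def openai_stub(prompt: str) -> str:
    raise RuntimeError("429 rate limited")


def groq_stub(prompt: str) -> str:
    return f"summary of: {prompt}"


def gemini_stub(prompt: str) -> str:
    return f"gemini summary of: {prompt}"


CHAIN = [
    ("anthropic", anthropic_stub),
    ("openai", openai_stub),
    ("groq", groq_stub),
    ("gemini", gemini_stub),
]

if __name__ == "__main__":
    print(complete_with_fallback("agent memory", CHAIN))  # ('groq', 'summary of: agent memory')
```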
```bash
git clone https://github.com/shubham0086/research-agent
cd research-agent
pip install -r requirements.txt
cp .env.example .env

# Run zero-key mode (uses DuckDuckGo search + mock LLM output)
python demo/run.py

# Run full research pipeline (requires keys)
python demo/run.py --topic "autonomous agent memory" --depth deep
```
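Zero-key mode implies that the agent enables providers based on which API keys are present in `.env`. The following is a minimal sketch of that selection under assumed environment-variable names (check `.env.example` for the real ones). `select_providers` is hypothetical, and the actual logic in `demo/run.py` may differ.

```python
import os
from collections.abc import Mapping

# Hypothetical variable names; the real ones live in .env.example.
SEARCH_KEYS = {"serpapi": "SERPAPI_API_KEY", "tavily": "TAVILY_API_KEY", "brave": "BRAVE_API_KEY"}
LLM_KEYS = [
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
]


def select_providers(env: Mapping[str, str] = os.environ) -> tuple[list[str], list[str]]:
    """Return (search providers, LLM providers in priority order) for the keys that are set."""
    search = [name for name, var in SEARCH_KEYS.items() if env.get(var)]
    search.append("duckduckgo")  # needs no key, so it is always enabled
    llms = [name for name, var in LLM_KEYS if env.get(var)] or ["mock"]  # zero-key fallback
    return search, llms


if __name__ == "__main__":
    print(select_providers({}))  # (['duckduckgo'], ['mock'])
```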
Research Agent represents the **concurrent data harvesting** pipeline. It is Stage 2 of the Agentic OS content production line, running immediately after client onboarding:
Brief Intake → [Research Agent] → Content Strategist → Creator → QA → Formatter
The structured data emitted by the Research Agent is fed directly into the Content Strategist to define the outlines of new projects.
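The hand-off can be pictured as a typed report serialized for the next stage. The fields of `ResearchReport` are not documented here, so the dataclass below uses hypothetical fields (`topic`, `summary`, `key_findings`, `sources`). It also assumes JSON as the wire format and uses a toy `draft_outline` function in place of the real Content Strategist.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Source:
    url: str
    title: str
    quality_score: float  # hypothetical 0-1 score from the ranking step


@dataclass
class ResearchReport:
    topic: str
    summary: str
    key_findings: list[str] = field(default_factory=list)
    sources: list[Source] = field(default_factory=list)


def to_strategist_payload(report: ResearchReport) -> str:
    """Serialize the report for the next stage; asdict() also converts nested Source objects."""
    return json.dumps(asdict(report))


def draft_outline(payload: str) -> list[str]:
    """Toy stand-in for the Content Strategist: one outline section per key finding."""
    data = json.loads(payload)
    return [f"Intro: {data['topic']}"] + [f"Section: {item}" for item in data["key_findings"]]


if __name__ == "__main__":
    report = ResearchReport(
        topic="autonomous agent memory",
        summary="(placeholder summary)",
        key_findings=["Vector stores", "Episodic logs"],
        sources=[Source("https://example.com/memory", "Example article", 0.8)],
    )
    print(draft_outline(to_strategist_payload(report)))
```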
The research report is only as good as the search queries you provide and the domains the agent ranks. A vague query retrieves generic articles, and the resulting LLM synthesis reads like a generic Wikipedia summary. To avoid this, we recommend running a "Keyword Expansion" step beforehand so the Search Manager receives highly specific keywords rather than a single sentence. Also note that the scraper fetches HTML pages; it does not parse complex media files or locked PDFs.
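One simple way to approximate a Keyword Expansion step is to combine the topic with narrower subtopics and fixed research facets, then de-duplicate the resulting queries. This is a sketch under assumptions: `expand_keywords` and the `FACETS` list are hypothetical, and a production step might generate the expansions with an LLM instead.

```python
from itertools import product

# Hypothetical research facets used to sharpen a broad topic.
FACETS = ["benchmarks", "architecture", "limitations", "case study"]


def expand_keywords(topic: str, subtopics: list[str], max_queries: int = 8) -> list[str]:
    """Expand one broad topic into specific, case-insensitively de-duplicated queries."""
    queries: list[str] = []
    seen: set[str] = set()
    for sub, facet in product(subtopics or [""], FACETS):
        if len(queries) >= max_queries:
            break
        query = " ".join(part for part in (topic, sub, facet) if part)
        key = query.lower()
        if key in seen:
            continue  # skip repeats such as "Vector" vs "vector"
        seen.add(key)
        queries.append(query)
    return queries


if __name__ == "__main__":
    for q in expand_keywords("autonomous agent memory", ["vector store", "episodic"], max_queries=3):
        print(q)
```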