Ankush Srivastava
live·2026

Scout.AI

OSINT-powered talent intelligence: ask in natural language, get a structured report back.

scout-tau-bay.vercel.app
Researchers indexed: 4,748
Universities: 8 (Boston-area)
Data sources: 6
Pipeline: 3-stage AI search
Output: Structured intelligence report
Status: Live in production

Most people looking for a research lab know roughly what they want to work on but don't know which professor's group is the right fit. Searching faculty directories by department gives a list of names. Searching by keyword gives noise.

Scout aggregates open-source intelligence on 4,748 researchers across 8 Boston-area universities (MIT, Harvard, Northeastern, BU, Tufts, BC, UMass, Brandeis), pulling from six classes of source: academic publication databases, open-source code, university directories, arXiv preprints, citation indices, and professional networks. Each profile carries publications, h-index, GitHub activity, advisor, lab, notable papers, and an AI-computed Scout Score.
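A profile like that might be shaped as below; this is a sketch, and every field name is an assumption rather than Scout's actual schema:

```typescript
// Hypothetical shape of one indexed profile; field names are illustrative,
// not Scout's real schema.
interface ResearcherProfile {
  name: string;
  university: string;
  lab?: string;
  advisor?: string;
  publicationCount: number;
  hIndex: number;
  githubActivity?: number; // e.g. commits or public repos in the last year
  notablePapers: string[];
  scoutScore: number; // AI-computed composite score
}

// Example profile with made-up values, just to show the shape.
const example: ResearcherProfile = {
  name: "Jane Doe",
  university: "MIT",
  lab: "Example Vision Lab",
  publicationCount: 12,
  hIndex: 7,
  notablePapers: ["A Made-Up Paper on Diffusion Models"],
  scoutScore: 86,
};
```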

A query like "PhD students publishing on diffusion models at MIT" goes through three stages. First, an LLM expands the query into a richer search string. Second, the expanded query is embedded with Gemini and cosine-matched against every researcher embedding (stored as binary blobs in Turso/libsql), and the top 25 are returned. Third, those 25 profiles get serialised into context and a second LLM call produces a structured JSON report: an executive summary, key researchers with relevance explanations, candidate collaborations with project ideas, and a strategic recommendation. The output is a report, not a list.
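Stage two can be sketched in a few lines, assuming the blobs are little-endian float32 vectors (the helper names are hypothetical, not Scout's code):

```typescript
// Decode a binary blob column (e.g. from a Turso/libsql row) into floats.
// Assumes little-endian float32 packing.
function blobToVector(blob: Buffer): Float32Array {
  return new Float32Array(blob.buffer, blob.byteOffset, blob.byteLength / 4);
}

// Plain cosine similarity; both vectors come from the same embedding model,
// so they share a dimension.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored researcher embedding against the query, keep the top k.
function topK(
  query: Float32Array,
  rows: { id: string; embedding: Buffer }[],
  k = 25,
): { id: string; score: number }[] {
  return rows
    .map((r) => ({ id: r.id, score: cosine(query, blobToVector(r.embedding)) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

At 4,748 rows a brute-force scan like this is entirely reasonable; a vector index only starts paying for itself at much larger corpus sizes.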

Built solo and end to end: ingestion pipelines, ranking, the activity-tracking layer, PDF export via html2canvas-pro and jsPDF, the auth middleware. Two model tiers (Mercury-2 for fast queries, Step-3.5-Flash for in-depth reports) sit behind OpenRouter, so the same query can be routed to whichever is cheaper or faster for that step. Transient failures get silent retries; markdown-wrapped JSON gets a fallback parser. The interesting engineering is in the ranking, not the retrieval: naive cosine similarity surfaces popular profiles, not relevant ones, so candidate retrieval and reranking are kept as separate stages and tuned independently.
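The fallback parser for markdown-wrapped JSON is a small but load-bearing piece; a minimal sketch, assuming the model either returns bare JSON or wraps it in a code fence (the function name is mine, not Scout's):

```typescript
// Models sometimes wrap their JSON answer in a ```json … ``` fence.
// Try strict parsing first, then fall back to extracting the fenced body.
function parseModelJson<T = unknown>(raw: string): T {
  try {
    return JSON.parse(raw) as T;
  } catch {
    // Fallback: pull the body out of a ```json (or bare ```) fence.
    const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)\s*```/);
    if (fenced) return JSON.parse(fenced[1]) as T;
    throw new Error("Model output was not valid JSON");
  }
}
```

Keeping the happy path as a plain `JSON.parse` means well-behaved responses never pay the regex cost, and a response that fails both paths surfaces as a real error rather than a silently empty report.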

Stack
Next.js 14 · TypeScript · Turso (libsql) · OpenRouter LLMs · Gemini embeddings · Tailwind · Framer Motion · Vercel