Dec 15, 2025
RAG that feels magical: personalization without confusion
Personalization in AI products often feels like a gimmick—a few variables swapped here and there. But with Retrieval-Augmented Generation (RAG), we have the opportunity to make a product truly understand the user's specific context.
### The Challenge of Over-Retrieval

The biggest mistake in RAG implementations is pulling in too much noise. When you dump ten irrelevant documents into an LLM prompt just because they matched a vector search, you dilute the signal. The "magic" happens in the filtering.
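The filtering idea above can be sketched in a few lines. This is a hypothetical illustration, not CosmiQ's actual code: the `0.75` threshold, the `hits` shape, and `max_docs` are all assumed values you would tune per embedding model.

```python
# Minimal sketch (hypothetical data): drop weak vector-search matches
# instead of stuffing every hit into the prompt.
SCORE_THRESHOLD = 0.75  # assumed cutoff; tune per embedding model


def filter_hits(hits, threshold=SCORE_THRESHOLD, max_docs=5):
    """Keep only hits whose similarity score clears the threshold."""
    strong = [h for h in hits if h["score"] >= threshold]
    # Sort best-first and cap the count so the prompt stays focused.
    strong.sort(key=lambda h: h["score"], reverse=True)
    return strong[:max_docs]


hits = [
    {"doc": "meeting notes", "score": 0.91},
    {"doc": "old draft", "score": 0.62},
    {"doc": "project spec", "score": 0.78},
]
print([h["doc"] for h in filter_hits(hits)])  # → ['meeting notes', 'project spec']
```

The point is that the cap and the threshold work together: a threshold alone can still let ten mediocre matches through on a vague query.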
### My Approach at CosmiQ

At CosmiQ, we implemented a multi-stage retrieval process:

1. **User Identity Filtering**: Only search within the user's explicit knowledge base.
2. **Temporal Weighting**: Prioritize documents and interactions from the last 7 days.
3. **Semantic Reranking**: Use a smaller, faster model to verify the relevance of the top 20 matches before passing the top 5 to the main LLM.
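The three stages can be stitched together roughly as follows. This is a sketch under stated assumptions, not the production pipeline: the hit schema, the `1.5` recency boost, and the injected `rerank_fn` (which stands in for the smaller reranking model) are all hypothetical.

```python
from datetime import datetime, timedelta

RECENCY_WINDOW = timedelta(days=7)
RECENCY_BOOST = 1.5  # assumed multiplier for recently touched documents


def retrieve(query_hits, user_id, now, rerank_fn, top_k=5, rerank_pool=20):
    """Three-stage retrieval: identity filter -> temporal weighting -> rerank."""
    # 1. User identity filtering: only the user's own knowledge base.
    mine = [h for h in query_hits if h["owner"] == user_id]
    # 2. Temporal weighting: boost documents from the last 7 days.
    for h in mine:
        recent = now - h["updated_at"] <= RECENCY_WINDOW
        h["weighted"] = h["score"] * (RECENCY_BOOST if recent else 1.0)
    mine.sort(key=lambda h: h["weighted"], reverse=True)
    # 3. Semantic reranking: a cheaper model re-scores the top pool,
    #    and only the best top_k reach the main LLM's prompt.
    pool = mine[:rerank_pool]
    pool.sort(key=lambda h: rerank_fn(h["doc"]), reverse=True)
    return pool[:top_k]
```

One design note: the rerank stage only sees the 20 survivors of the first two stages, so the expensive relevance check stays cheap regardless of how large the user's knowledge base grows.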
The result? Users mentioned that the AI "remembered" things they'd forgotten they even wrote. That’s the outcome we’re aiming for.
Enjoyed this note?
I regularly share thoughts on building AI products and scaling engineering loops. Let's connect if you're shipping something interesting.