The Roman Nepali Embedding Problem

I spelled the same Nepali word four different ways and asked four open-source embedding models whether the spellings meant the same thing. The model with the prettiest-looking cosine gap wasn’t the one that actually worked, and a twenty-line preprocessing script beat all four of them without touching a single weight. This post is a small experiment with a strong conclusion: if you are shipping NLP for Nepali users today, the best thing you can do is not a bigger model; it’s a regex. ...

April 21, 2026 · 11 min · Anil Paudel

Understanding RAG from scratch (Part I)

1. Introduction to RAG Fundamentals

Hey everyone, and welcome to our deep dive into Retrieval-Augmented Generation, or RAG! In the rapidly evolving world of Large Language Models (LLMs), we’ve seen incredible feats of text generation, translation, and conversation. However, even the most powerful LLMs have limitations. They can sometimes “hallucinate” (produce plausible but incorrect information) or lack knowledge about events that occurred after their training data was collected. ...

May 1, 2025 · 20 min · Anil Paudel