Posts

Showing posts from March, 2026

Hardening the Pipeline: Defensive Engineering for a Public Beta

Moving from a local prototype to a shared environment requires a shift in mindset from "does it work?" to "can it be broken?"

The Cost of Curiosity

As I prepared to share Engrain with a small group of friends, I had to face a reality of LLM-based applications: tokens are expensive and resources are finite. Without safeguards, a single user (or a malicious bot) could exhaust my Google Cloud budget or trigger a DoS by spamming the Gemini API with massive blocks of text. I needed to move beyond functional code and implement defensive layers to protect the system's availability and my wallet.

Layer 1: Intelligent Rate Limiting

I chose slowapi to handle request throttling. A key decision here was how to identify users. Relying solely on IP addresses is unreliable in the age of VPNs and shared networks, so I implemented a custom key function that prioritizes the authenticated user_id from the request state, falling back to the remote address only f...
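The key function described above might look something like this minimal sketch. The `user_id` attribute on `request.state` is an assumption about how the auth middleware stores the identity; in the real app the function would be handed to slowapi as `Limiter(key_func=user_or_ip)`.

```python
def user_or_ip(request) -> str:
    """Rate-limit key: prefer the authenticated user, fall back to the IP.

    Sketch only -- attribute names (request.state.user_id) are assumptions
    about the auth middleware, not Engrain's actual code.
    """
    # Authenticated traffic is throttled per user, not per network.
    user_id = getattr(getattr(request, "state", None), "user_id", None)
    if user_id:
        return f"user:{user_id}"
    # Anonymous traffic falls back to the remote address, mirroring
    # slowapi's get_remote_address behavior.
    client = getattr(request, "client", None)
    host = getattr(client, "host", None) if client else None
    return host or "127.0.0.1"
```

This keeps one abusive anonymous IP from consuming a logged-in user's quota, while still bounding unauthenticated traffic.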

Engineering Hierarchical Memory and Hybrid RAG for Deeper AI Conversations

While the bug fixes were underway, I tackled two core challenges in building an intelligent reading companion: how to give an AI a "long-term memory" without blowing the token budget, and how to make RAG truly smart by combining semantic understanding with precise keyword matching.

The Challenge of Context: Engineering Hierarchical Memory

One of the biggest hurdles when building LLM-powered applications is managing context. Users expect conversations to flow naturally, with the AI remembering past interactions. However, passing the entire chat history to the LLM quickly hits token limits, leading to truncated responses or huge costs. On the flip side, providing only the last few messages results in a frustratingly forgetful AI. To solve this, I implemented a 3-tier hierarchical memory compression system, inspired by how humans process and recall information. My goal was to mimic our ability to hold recent details, recall broader themes, and summarize long-term knowledge. Here...
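The three tiers can be sketched as a small buffer class: recent messages kept verbatim, older ones demoted to a mid tier, and overflow folded into a rolling long-term summary. This is an illustrative sketch, not Engrain's implementation, and `summarize` stands in for what would be an LLM summarization call.

```python
from collections import deque

class TieredMemory:
    """Illustrative 3-tier memory buffer (sketch, not Engrain's code).

    Tier 1: recent messages, verbatim.
    Tier 2: older messages awaiting compression (broader themes).
    Tier 3: a rolling long-term summary of everything older.
    """

    def __init__(self, summarize, recent_size=4, mid_size=8):
        self.summarize = summarize      # stand-in for an LLM call
        self.recent = deque()           # tier 1
        self.mid = deque()              # tier 2
        self.long_term = ""             # tier 3
        self.recent_size = recent_size
        self.mid_size = mid_size

    def add(self, message: str) -> None:
        self.recent.append(message)
        # Demote the oldest verbatim message once tier 1 is full.
        if len(self.recent) > self.recent_size:
            self.mid.append(self.recent.popleft())
        # Fold an overflowing tier 2 into the long-term summary.
        if len(self.mid) > self.mid_size:
            self.long_term = self.summarize(self.long_term, list(self.mid))
            self.mid.clear()

    def context(self) -> str:
        # Assemble prompt context: cheap summaries first, then full detail.
        parts = []
        if self.long_term:
            parts.append(f"[long-term] {self.long_term}")
        if self.mid:
            parts.append(f"[themes] {' | '.join(self.mid)}")
        parts.extend(self.recent)
        return "\n".join(parts)
```

The payoff is a roughly constant prompt size: no matter how long the conversation runs, the LLM sees a bounded number of verbatim messages plus compact summaries of everything older.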

Automating Engrain: Smart Database Logic, Seamless Deployments, and Robust Observability

To build a truly robust and scalable AI application, I knew I needed to move beyond basic CRUD operations and manual deployments towards embedding intelligence and automation directly into the database and streamlining my development workflow.

DATABASE INTELLIGENCE: SUPABASE FUNCTIONS & TRIGGERS

As Engrain grows, the complexity of managing data and user interactions increases. Relying solely on the application layer for every piece of logic can lead to performance bottlenecks, data inconsistencies, and a heavier load on the API. My solution was to push critical, data-centric logic directly into the database using Supabase Functions (PostgreSQL stored procedures) and Triggers. I chose this approach for several key reasons:

Performance: Executing logic directly on the database server minimizes network latency and allows for highly optimized operations, especially for complex queries.

Atomicity & Data Integrity: Functions and triggers ensure that certa...
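The trigger pattern described here can be shown in a standalone sketch. Note the deliberate swap: Engrain's logic lives in Supabase (PostgreSQL), but this runnable example uses SQLite so it needs no server, and the books/highlights schema with a denormalized `highlight_count` is purely hypothetical.

```python
import sqlite3

# SQLite stand-in for a Postgres trigger: the database itself keeps a
# denormalized counter correct, so the API layer never updates it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT,
    highlight_count INTEGER DEFAULT 0
);
CREATE TABLE highlights (
    id INTEGER PRIMARY KEY,
    book_id INTEGER,
    text TEXT
);
-- The trigger runs inside the same transaction as the INSERT,
-- which is the atomicity benefit described above.
CREATE TRIGGER bump_highlight_count AFTER INSERT ON highlights
BEGIN
    UPDATE books SET highlight_count = highlight_count + 1
    WHERE id = NEW.book_id;
END;
""")
conn.execute("INSERT INTO books (id, title) VALUES (1, 'Dune')")
conn.execute("INSERT INTO highlights (book_id, text) VALUES (1, 'a note')")
count = conn.execute(
    "SELECT highlight_count FROM books WHERE id = 1"
).fetchone()[0]  # the trigger incremented the counter to 1
```

In Postgres the same idea would be a `CREATE FUNCTION ... RETURNS trigger` plus `CREATE TRIGGER`, but the design point is identical: one round trip, and the counter can never drift out of sync with the rows it counts.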

Engrain - Containerizing

After a working model of my project was ready, it was time to test it for production. The best way to do that before deploying the web application is to test it locally on Docker. Before setting up Docker, I was looking for ways to reduce the size of my application, since I was running an open-source model (BAAI/bge-m3). After a little research, I learned that I could run this model via API through Hugging Face. All that is needed is a Hugging Face API token. After testing this out, I installed Docker. Docker uses Linux under the hood. It containerizes your whole application (kind of like packaging your entire application and moving it to an alien world) and runs it in an isolated container. If your application works error-free here, it will most likely run error-free on the cloud. Since we needed two containers (one for the frontend and one for the backend), we needed to define two Dockerfiles. Here comes Docker Compose. It is a single file used for defining, initializing an...
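The two-container setup might look like the following docker-compose sketch. Service names, build paths, ports, and the `HF_API_TOKEN` variable name are assumptions for illustration, not Engrain's actual file.

```yaml
# Hypothetical docker-compose.yml: one service per Dockerfile.
services:
  backend:
    build: ./backend            # backend Dockerfile lives here (assumed path)
    ports:
      - "8000:8000"
    environment:
      # Token that lets the backend call BAAI/bge-m3 via the Hugging Face
      # API instead of bundling the model into the image.
      - HF_API_TOKEN=${HF_API_TOKEN}
  frontend:
    build: ./frontend           # frontend Dockerfile (assumed path)
    ports:
      - "3000:3000"
    depends_on:
      - backend                 # start the API before the UI
```

With this in place, `docker compose up --build` builds both images and starts the two containers together on a shared network.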