
Posts

Showing posts with the label Engrain

Hardening the Pipeline: Defensive Engineering for a Public Beta

Moving from a local prototype to a shared environment requires a shift in mindset from "does it work?" to "can it be broken?"

The Cost of Curiosity

As I prepared to share Engrain with a small group of friends, I had to face a reality of LLM-based applications: tokens are expensive and resources are finite. Without safeguards, a single user (or a malicious bot) could exhaust my Google Cloud budget or trigger a DoS by spamming the Gemini API with massive blocks of text. I needed to move beyond functional code and implement defensive layers to protect the system's availability and my wallet.

Layer 1: Intelligent Rate Limiting

I chose slowapi to handle request throttling. A key decision here was how to identify users. Relying solely on IP addresses is unreliable in the age of VPNs and shared networks, so I implemented a custom key function that prioritizes the authenticated user_id from the request state, falling back to the remote address only f...

Engineering Hierarchical Memory and Hybrid RAG for Deeper AI Conversations

While bug fixes were ongoing, I tackled two core challenges in building an intelligent reading companion: how to give an AI a "long-term memory" without blowing the token budget, and how to make RAG truly smart by combining semantic understanding with precise keyword matching.

The Challenge of Context: Engineering Hierarchical Memory

One of the biggest hurdles when building LLM-powered applications is managing context. Users expect conversations to flow naturally, remembering past interactions. However, passing an entire chat history to the LLM quickly hits token limits, leading to truncated responses or huge costs. On the flip side, providing only the last few messages results in a frustratingly forgetful AI. To solve this, I implemented a 3-tier hierarchical memory compression system, inspired by how humans process and recall information. My goal was to mimic our ability to hold recent details, recall broader themes, and summarize long-term knowledge. Here...
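A minimal sketch of the 3-tier idea, assuming tiers along the lines described: recent messages kept verbatim, a mid-term window reduced to a summary, and everything older folded into one long-term summary. The tier sizes and the `summarize` helper are illustrative stand-ins, not Engrain's actual implementation (a real system would call the LLM to summarize).

```python
def summarize(messages):
    # Placeholder for an LLM summarization call.
    return f"summary of {len(messages)} messages"

def build_context(history, recent_n=6, mid_n=20):
    """Compress chat history into a bounded 3-tier context window."""
    recent = history[-recent_n:]                   # tier 1: verbatim recent turns
    mid = history[-(recent_n + mid_n):-recent_n]   # tier 2: summarized mid-term window
    old = history[:-(recent_n + mid_n)]            # tier 3: long-term knowledge
    context = []
    if old:
        context.append({"role": "system",
                        "content": "Long-term memory: " + summarize(old)})
    if mid:
        context.append({"role": "system",
                        "content": "Earlier in this chat: " + summarize(mid)})
    context.extend(recent)
    return context
```

The token cost is now bounded by the two summaries plus `recent_n` verbatim messages, regardless of how long the conversation runs.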

Automating Engrain: Smart Database Logic, Seamless Deployments, and Robust Observability

To build a truly robust and scalable AI application, I knew I needed to move beyond basic CRUD operations and manual deployments towards embedding intelligence and automation directly into the database and streamlining my development workflow.

Database Intelligence: Supabase Functions & Triggers

As Engrain grows, the complexity of managing data and user interactions increases. Relying solely on the application layer for every piece of logic can lead to performance bottlenecks, data inconsistencies, and a heavier load on the API. My solution was to push critical, data-centric logic directly into the database using Supabase Functions (PostgreSQL stored procedures) and Triggers. I chose this approach for several key reasons:

Performance: Executing logic directly on the database server minimizes network latency and allows for highly optimized operations, especially for complex queries.

Atomicity & Data Integrity: Functions and triggers ensure that certa...
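As an illustration of the pattern (a hypothetical trigger, not one of Engrain's actual functions): a PostgreSQL function plus trigger that keeps an `updated_at` column accurate at the database layer, so no API code path can forget to set it.

```sql
-- Hypothetical example: the "highlights" table name is assumed.
create or replace function touch_updated_at()
returns trigger as $$
begin
  new.updated_at := now();  -- runs atomically inside the same transaction
  return new;
end;
$$ language plpgsql;

create trigger highlights_touch_updated_at
before update on highlights
for each row
execute function touch_updated_at();
```

Because the trigger fires inside the database, the guarantee holds for every write path: the API, the Supabase dashboard, or a one-off SQL script.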

Engrain - Containerizing

After a working model of my project was ready, it was time to test it for production. The best way to vet a web application before deploying it is to run it locally in Docker. Before setting up Docker, I looked for ways to reduce the size of my application, since I was running an open-source model (BAAI/bge-m3). After a little research, I learned that I could run this model via API through Hugging Face; all that is needed is a Hugging Face API token. After testing this out, I installed Docker. Docker uses Linux under the hood. It containerizes your whole application (kind of like packaging your whole application and moving it to an alien world) and runs it in an isolated container. If your application works error-free here, it will most probably run error-free in the cloud. Since we needed 2 containers (1 for the frontend and 1 for the backend), we needed to define 2 Dockerfiles. Here comes Docker Compose: a single file used to define, initialize an...
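The two-container setup described above might look something like this in Docker Compose (service names, paths, and ports are assumptions for illustration, not Engrain's actual config):

```yaml
services:
  backend:
    build: ./backend          # FastAPI Dockerfile
    ports:
      - "8000:8000"
    env_file: .env            # e.g. the Hugging Face API token
  frontend:
    build: ./frontend         # frontend Dockerfile
    ports:
      - "3000:3000"
    depends_on:
      - backend
```

A single `docker compose up` then builds both images and starts both containers together on a shared network.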

Engrain - Beefing Up

It was now time to scale this project a bit. Static websites were working, but I needed something that could handle a large frontend codebase and its complexities (different UI modes and so on). I went ahead with Next.js. Next.js allows you to split your code into different component folders, making it a very efficient and effective way to manage your code. You can choose either page routing or app routing. App routing has better performance: its components render on the server by default, giving fast loading times, whereas in page routing they render in the client browser. App routing also supports streaming, which page routing does not. I then moved from local storage to Supabase for storing all my books, chapters, highlights and logs. It has a free tier, is easy to set up, and provides various authentication methods. It has PostgreSQL under the hood, and it also supports pgvector to store vectors in the database (I will demonstrate the need for this in a bit).  Now was ...
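As a sketch of what the pgvector setup mentioned above could look like (the table and column names are hypothetical; the 1024 dimension matches BAAI/bge-m3's dense embeddings):

```sql
-- Enable the extension and store one embedding per highlight.
create extension if not exists vector;

create table if not exists highlight_embeddings (
  highlight_id bigint primary key,
  embedding vector(1024)
);

-- Nearest-neighbour search by cosine distance;
-- :query_embedding is a bound parameter supplied by the application.
select highlight_id
from highlight_embeddings
order by embedding <=> :query_embedding
limit 5;
```

Keeping the embeddings next to the relational data means one query can join semantic matches back to their books and chapters.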

Engrain - The start of my personal library

Engrain is the amalgamation of two of my passions - reading books and building something useful. I personally face the problem of retention: hundreds of highlights stored in a drawer that I rarely open. That was the starting point of this project. The first version of the application was a simple frontend (HTML, CSS) providing the UI to interact with my FastAPI backend: a button to create books and chapters, a button to upload images of the highlights, and a button to talk to the Gemini-based LLM wrapper. On the backend I had the gemini-3-flash-preview model, a multimodal LLM that reads the image and, based on the prompt, generates the highlighted text. All of the highlights, logs and chat history were stored locally. Initially, I had 3 chat modes - Summarize, Brainstorm and Socratic. All the highlights for the specific chapter in scope were provided to the LLM as context, and carefully orchestrated system prompts were made for each mode. The summarizer summarizes all of t...
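The per-mode prompt orchestration described above could be sketched like this; the prompt wording and function names are illustrative, not Engrain's actual prompts.

```python
# Hypothetical mode-to-system-prompt mapping.
SYSTEM_PROMPTS = {
    "summarize": "Summarize the reader's highlights for this chapter.",
    "brainstorm": "Generate ideas and connections from these highlights.",
    "socratic": "Ask probing questions grounded in these highlights.",
}

def build_messages(mode, highlights, user_message):
    """Assemble the chapter's highlights and the mode prompt into one request."""
    if mode not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown mode: {mode}")
    context = "\n".join(f"- {h}" for h in highlights)
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[mode]},
        {"role": "user", "content": f"Highlights:\n{context}\n\n{user_message}"},
    ]
```

Each chat request then differs only in which system prompt is selected, while the highlight context is assembled the same way for every mode.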