Moving from a local prototype to a shared environment requires a shift in mindset from "does it work?" to "can it be broken?"

## The Cost of Curiosity

As I prepared to share Engrain with a small group of friends, I had to face a reality of LLM-based applications: tokens are expensive and resources are finite. Without safeguards, a single user (or a malicious bot) could exhaust my Google Cloud budget or trigger a denial of service by spamming the Gemini API with massive blocks of text. I needed to move beyond functional code and implement defensive layers to protect the system's availability and my wallet.

## Layer 1: Intelligent Rate Limiting

I chose slowapi to handle request throttling. A key decision here was how to identify users. Relying solely on IP addresses is unreliable in the age of VPNs and shared networks, so I implemented a custom key function that prioritizes the authenticated user_id from the request state, falling back to the remote address only for unauthenticated requests.
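A key function of this shape might look like the sketch below. The attribute name `user_id` on `request.state` and the `user:` key prefix are assumptions for illustration; in a real app the field would be set by your auth middleware, and the function would be passed to slowapi via `Limiter(key_func=...)` (slowapi's `get_remote_address` helper normally supplies the IP fallback).

```python
from types import SimpleNamespace


def user_or_ip(request) -> str:
    """Rate-limit key: prefer the authenticated user_id that auth
    middleware stored on request.state; otherwise fall back to the
    client's IP address. (Illustrative sketch; names are assumed.)"""
    user_id = getattr(request.state, "user_id", None)
    if user_id:
        # Authenticated traffic is throttled per account, so one user
        # behind a shared IP can't exhaust everyone else's quota.
        return f"user:{user_id}"
    # Anonymous traffic falls back to the IP; in slowapi this would be
    # get_remote_address(request).
    return request.client.host


# Wiring into slowapi (assumed setup, shown for context only):
# from slowapi import Limiter
# limiter = Limiter(key_func=user_or_ip)


if __name__ == "__main__":
    # Minimal stand-ins for FastAPI's Request object, just to exercise
    # both branches of the key function.
    authed = SimpleNamespace(
        state=SimpleNamespace(user_id="abc123"),
        client=SimpleNamespace(host="203.0.113.7"),
    )
    anon = SimpleNamespace(
        state=SimpleNamespace(),
        client=SimpleNamespace(host="203.0.113.7"),
    )
    print(user_or_ip(authed))  # → user:abc123
    print(user_or_ip(anon))    # → 203.0.113.7
```

The benefit of keying on the account rather than the network is that two friends on the same Wi-Fi get independent quotas, while an anonymous scraper is still capped by IP.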