It was now time to scale this project a bit. Static websites were working, but I needed something that could handle a large frontend codebase and its complexities (different UI modes and so on). I went ahead with Next.js. Next.js lets you split your code into component folders, which makes managing a growing codebase much more efficient.
You can choose between the Pages Router and the App Router. The App Router generally performs better: its components are React Server Components that render on the server by default, giving faster initial load times, whereas Pages Router components render in the client browser by default. The App Router also supports streaming; the Pages Router does not.
I then moved from local storage to Supabase for storing all my books, chapters, highlights, and logs. It has a free tier, is easy to set up, and provides various authentication methods. Under the hood it is PostgreSQL, and it supports the pgvector extension for storing vectors in the database (I will demonstrate the need for this in a bit).
Now it was time for the most basic feature: searching the highlights. Since at some point there would be hundreds of highlights per user in the database, plain keyword search over that set would struggle. This is where RAG (Retrieval-Augmented Generation) comes in. An embedding model vectorizes each highlight based on its keywords and semantics, making it easy to retrieve the highlights related to a query: the query is vectorized too, and the top-k nearest vectors (points in the embedding space) are retrieved as the matching highlights. There are many embedding models, both paid and open source. For the initial version I needed an open-source model, so I went with BAAI/bge-m3. It has strong performance, is multilingual, and supports three retrieval modes: dense, sparse, and multi-vector. There are several algorithms for measuring vector similarity; I chose cosine similarity because it is simple, effective, and independent of vector magnitude.
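To make the retrieval step concrete, here is a tiny sketch of cosine-similarity top-k search. It assumes the highlights are already embedded (bge-m3 actually produces 1024-dimensional dense vectors; the 3-dimensional vectors and highlight IDs below are just illustrative stand-ins):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, highlight_vecs, k=2):
    """Return the IDs of the k highlights most similar to the query."""
    scored = sorted(
        highlight_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [highlight_id for highlight_id, _ in scored[:k]]

# Toy embeddings: h1 and h3 point roughly the same way as the query.
highlights = {
    "h1": [0.9, 0.1, 0.0],
    "h2": [0.1, 0.9, 0.1],
    "h3": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
print(top_k(query, highlights))  # ['h1', 'h3']
```

Because cosine similarity normalizes by vector length, a long highlight and a short one with the same meaning score alike, which is exactly why it works well here. In production, pgvector can do this same ranking in SQL instead of Python.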
When all of this came together, I needed to test that the backend, with its multiple endpoints, actually worked. For this I created a Python test file that hits each endpoint with test inputs and prints the responses, acting as a quick sanity check confirming that the entire backend (Summarize, Brainstorm, Socratic, and Query, i.e. ingestion, embedding, and retrieval) works end to end.
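A sanity-check script along those lines might look like the following. The base URL, endpoint paths, and payload shapes are assumptions for illustration; adjust them to the actual backend routes:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local dev server

# One representative payload per endpoint (hypothetical names and fields).
TEST_CASES = {
    "/summarize": {"highlight": "A sample highlight about habits."},
    "/brainstorm": {"highlight": "A sample highlight about memory."},
    "/socratic": {"highlight": "A sample highlight about focus."},
    "/query": {"question": "What did I save about habits?", "top_k": 5},
}

def build_request(path, payload):
    """Build a JSON POST request for one endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def run_all():
    """Hit every endpoint and print status plus body as a quick sanity check."""
    for path, payload in TEST_CASES.items():
        with urllib.request.urlopen(build_request(path, payload), timeout=30) as resp:
            print(path, resp.status, resp.read().decode("utf-8"))

# run_all()  # uncomment with the backend running locally
```

It is not a real test suite (no assertions, no cleanup), but for a solo project it answers the only question that matters at this stage: does every endpoint respond sensibly end to end?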