
Engrain - Containerizing

After a working model of my project was ready, it was time to prepare it for production. And the best way to test a web application before deploying it is to run it locally in Docker.


Before setting up Docker, I looked for ways to reduce the size of my application, since I was running an open-source embedding model (BAAI/bge-m3) locally. After a little research, I learned that I could call this model via an API through Hugging Face instead of bundling it with my app. All that is needed is a Hugging Face API token.


After testing this out, I installed Docker. Docker uses Linux under the hood. It containerizes the whole of your application (kind of like packaging your whole application and moving it to an alien world) and runs it in an isolated container. If your application works error-free here, it will most probably run error-free in the cloud. Since I needed 2 containers (1 for the frontend and 1 for the backend), I had to define 2 Dockerfiles. Here comes Docker Compose: a single file for defining, initializing, and running all your containers in one go. Now, to build both of my containers, all I needed to do was run this command: docker compose build, and done.
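A minimal docker-compose.yml for a two-container setup like this might look as follows. The service names, ports, and build paths here are illustrative placeholders, not my project's actual configuration:

```yaml
services:
  backend:
    build: ./backend        # directory containing the backend Dockerfile
    ports:
      - "8000:8000"
    env_file:
      - .env                # holds secrets like the Hugging Face API token
  frontend:
    build: ./frontend       # directory containing the frontend Dockerfile
    ports:
      - "3000:3000"
    depends_on:
      - backend             # start the backend before the frontend
```

With this file in place, docker compose build builds both images, and docker compose up starts both containers together.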


Once this was done, I deployed these 2 Docker containers to Google Cloud, which gives you $300 in free credits. I used gcloud via the CLI (Command Line Interface), so it was easy to make changes in my application and re-deploy using the same commands. You have to be cautious about your secrets, region, and other sensitive parameters that go in.
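Deploying a container with the gcloud CLI looks roughly like the sketch below. The project, repository, service, region, and secret names are placeholders, not my actual values; the last flag shows one way to keep the Hugging Face token out of plain environment variables by pulling it from Secret Manager:

```
# Build the backend image and push it to Artifact Registry
gcloud builds submit ./backend \
  --tag us-central1-docker.pkg.dev/MY_PROJECT/my-repo/backend

# Deploy the image to Cloud Run, injecting the token as a secret
gcloud run deploy backend \
  --image us-central1-docker.pkg.dev/MY_PROJECT/my-repo/backend \
  --region us-central1 \
  --set-secrets HF_API_TOKEN=hf-api-token:latest
```

Re-deploying after a change is just re-running the same two commands, which is what made the CLI workflow convenient.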


Using BAAI/bge-m3 via Hugging Face: 

api_url = "https://router.huggingface.co/hf-inference/models/BAAI/bge-m3/pipeline/feature-extraction"

async with httpx.AsyncClient() as client:
    for attempt in range(max_retries):
        try:
            response = await client.post(
                api_url,
                headers=self.hf_headers,  # {"Authorization": "Bearer <HF token>"}
                json={"inputs": text},
                timeout=10.0,
            )

            if response.status_code == 200:
                embedding = response.json()
                logger.info(f"✅ Embedding generated: {len(embedding)} dimensions")
                return embedding

            logger.warning(f"HF API returned {response.status_code}; retrying")
        except httpx.RequestError as exc:
            logger.warning(f"Request failed ({exc}), attempt {attempt + 1}/{max_retries}")

        await asyncio.sleep(2 ** attempt)  # exponential backoff before the next attempt

raise RuntimeError(f"Embedding failed after {max_retries} attempts")
