Hi and welcome back to part 3! Before we move on, let’s take a more in-depth look at the response that the Gemini API returns to us. Go back into the `simple_request.py` file and change the `print` statements in the `if __name__ == '__main__':` block as follows:
```python
if __name__ == "__main__":
    query = input("Please ask a question: ")
    response = chat_session.send_message(query)
    print(f"\033[1;31m Text:\n{response.text}\033[0m")
    print(f"\033[1;32m Candidates:\n{response.candidates}\033[0m")
    print(f"\033[1;33m Usage metadata:\n{response.usage_metadata}\033[0m")
```
We’re going to look at the `text`, `candidates`, and `usage_metadata` attributes of the response object here. All the `\033[1;31m` numbers may look a bit confusing if you’re not familiar with them, but we’re just using them to print in color so the output is easy to read. Let’s break it down:
ANSI escape codes are used to colorize the output text in the terminal. Each escape code begins with `\033[`, which is the escape character followed by the open-bracket character. This is followed by a sequence of numbers separated by semicolons and ends with the letter `m`.
The numbers in the sequence represent specific text properties. For instance, `1;31m` sets the text to bright red, `1;32m` sets the text to bright green, and `1;33m` sets the text to bright yellow. The `1` in the sequence stands for bright, and the numbers `31`, `32`, and `33` represent red, green, and yellow respectively. The escape sequence `\033[0m` is used to reset the text color back to the default, which is why we use it at the end of each print statement.
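If you’d rather not sprinkle raw escape codes all over your print statements, you could collect them in a few named constants instead. This is just a small convenience sketch; the constant names are my own choice:

```python
# Hypothetical helper constants for the ANSI codes used in this part.
RED = "\033[1;31m"     # bright red
GREEN = "\033[1;32m"   # bright green
YELLOW = "\033[1;33m"  # bright yellow
BLUE = "\033[1;34m"    # bright blue (we'll use this later for chat responses)
RESET = "\033[0m"      # reset back to the default color

print(f"{RED}This prints in bright red{RESET}")
print(f"{GREEN}This prints in bright green{RESET}")
```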
Go ahead and run this file and ask a question. I’ll just ask something stupid like “What is a strawberry?” for testing purposes here. You should see colorful text output like this:

We can see that `text` is just the output answer that we’ve been looking at so far. What about this `candidates` thing, though? We can see that it contains only a single candidate response. I suspect this is a relic of older versions of Gemini, where it would create 3 response outputs and then choose the best one, returning the top response to the end user. You cannot currently ask for more than 1 response from the API, so this is a bit redundant.
We can see that the `candidates` list has extra info like the `role` and `finish_reason`, and that we can also access the `safety_ratings`, which are the model’s judgment of how safe our question and the response were. In this case, we can see it has rated everything with a `probability` of `NEGLIGIBLE`, which is the lowest rating.
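If you want to inspect these ratings programmatically rather than just reading the printed output, a loop over the first candidate’s `safety_ratings` does the trick. The attribute names below match what the SDK prints above; the “flag anything above NEGLIGIBLE” logic is just my own example:

```python
# Inspect the safety ratings of the first (and only) candidate.
for rating in response.candidates[0].safety_ratings:
    print(f"{rating.category.name}: {rating.probability.name}")
    # The probability enum goes NEGLIGIBLE -> LOW -> MEDIUM -> HIGH,
    # so anything other than NEGLIGIBLE may be worth a closer look.
    if rating.probability.name != "NEGLIGIBLE":
        print(f"Heads up: {rating.category.name} was rated {rating.probability.name}")
```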
Finally, in the yellow print block, we have the `usage_metadata` with the `prompt_token_count`, which is how many tokens our question plus the `system_instruction` took up. We also have the `candidates_token_count`, but as we now know there is only 1 candidate, this is simply the response token count. Lastly, we have the `total_token_count`, which goes without saying.
You can use these to programmatically check how ‘dangerous’ your questions are perceived to be, or to calculate the cost of your requests in real time based on the input and output token counts.
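As a quick sketch of the cost idea, you could multiply the token counts by your model’s price per million tokens. The prices below are placeholders, not real numbers; check Google’s current pricing page for whichever model you use:

```python
# Placeholder prices per 1 million tokens -- substitute the real pricing
# for your model before relying on this estimate.
PRICE_PER_MILLION_INPUT_TOKENS = 0.075
PRICE_PER_MILLION_OUTPUT_TOKENS = 0.30

usage = response.usage_metadata
estimated_cost = (
    usage.prompt_token_count / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS
    + usage.candidates_token_count / 1_000_000 * PRICE_PER_MILLION_OUTPUT_TOKENS
)
print(f"This request used {usage.total_token_count} tokens (~${estimated_cost:.6f})")
```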
Adding memory to our chatbot
Now let’s have a look at making this into a proper chat with memory so we can have an actual conversation with Gemini. First, we’ll code up a quick function to get rid of some more repetitive boilerplate code. Create a new file called `load_env.py`:
```
GOOGLE_GEMINI
├── .env
├── load_env.py         (new file)
├── simple_request.py
├── utils.py
├── Pipfile
└── Pipfile.lock
```
And we’ll store the repetitive `genai` object initialization logic inside:
```python
import os

import google.generativeai as genai
from dotenv import load_dotenv


def configure_genai():
    load_dotenv()
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    return genai
```
This function basically takes the initialization steps from the `simple_request.py` file and puts them into a function. We also load the `.env` file here so we can access the `GEMINI_API_KEY` environment variable. Using the new `load_env.py` file, we can now simplify our setup in the code.
We’ll leave the `simple_request.py` file as it is and get started on a new file instead. Create a new file called `simple_chat.py`:
```
GOOGLE_GEMINI
├── .env
├── load_env.py
├── simple_chat.py      (new file)
├── simple_request.py
├── utils.py
├── Pipfile
└── Pipfile.lock
```
In this file we’ll create a chat where we can ask multiple questions and also have Gemini remember the chat history. Open up the `simple_chat.py` file and let’s get started with the imports:
```python
from load_env import configure_genai
from utils import safety_settings

genai = configure_genai()
```
We import our `configure_genai` function and our `safety_settings` class instance, and set up our `genai` object in a single line of code! Now let’s have some fun. I’m going to give the user a choice of who they want to chat with by asking them two questions:
```python
character = input("What is your favorite movie character? (e.g. Gollum): ")
movie = input("What movie are they from? (e.g. Lord of the Rings): ")
```
We now have a character and movie of the user’s choice. Let’s define our `model` just like we did last time:
```python
model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    safety_settings=safety_settings.low,
    system_instruction=f"You are helpful and provide good information but you are {character} from {movie}. You will stay in character as {character} no matter what. Make sure you find some way to relate your responses to {character}'s personality or the movie {movie} at least once every response.",
)
```
You can set the `model_name` to `"gemini-1.5-flash"` or `"gemini-1.5-pro"`, whichever you like. As this is a fairly simple request, I’ll just use Flash for now. We set the safety settings to low, which is extremely easy using the class we made, and then add the system instructions. We’re telling Gemini that it is a character from a movie and that it should stay in character, plugging in the name of the character and the movie the user gave us several times.
Turning it into a real conversation
Now let’s define a history list to store our chat history and initialize our `chat_session`:
```python
history = []
chat_session = model.start_chat(history=history)
```
Notice that even though we will store the chat history this time, we still want to start with an empty history object. Now that we have a `chat_session` we can send messages to, let’s define the chat loop in an `if __name__ == "__main__":` block at the bottom of the file:
```python
if __name__ == "__main__":
    try:
        while True:
            query = input("\nPlease ask a question or use CTRL+C to exit: ")
            response = chat_session.send_message(query)
            print(f"\033[1;34m{response.text}\033[0m")

            history.append(
                {
                    "role": "user",
                    "parts": [query],
                }
            )
            history.append(
                {
                    "role": "model",
                    "parts": [response.text],
                }
            )

            for message in history:
                print(f"{message}")
    except KeyboardInterrupt:
        print("Shutting down...")
```
Let’s read it from the outside in. We first have a `try` and `except` block to catch the `KeyboardInterrupt` exception when the user presses `CTRL+C` to exit the chat. Inside the `try` block we have a `while True:` loop that will keep running until the user exits the chat. We ask the user for a question and send it to the `chat_session` to get a response, printing the response in bright blue using the ANSI escape codes we used before.
We now need to update our history with the user’s question and the model’s response. We append a dictionary with the user’s role and the parts of the message to the history list and then do the same for the model’s response. This is just the format that Gemini expects so we stick to it. Finally, we print out the entire chat history after each message so you can see the history list grow as we have a conversation. We can delete this print statement later.
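Because this is the same format `start_chat` accepts, you could also seed a new chat with earlier messages if you wanted to. A tiny sketch, with made-up example messages:

```python
# Hypothetical example: start a chat that already "remembers" two messages,
# using the same {"role": ..., "parts": [...]} format we append above.
previous_turns = [
    {"role": "user", "parts": ["My name is Bob and I love strawberries."]},
    {"role": "model", "parts": ["Nice to meet you, Bob! Strawberries are great."]},
]
chat_session = model.start_chat(history=previous_turns)
```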
Time to test it out
So go ahead and run the `simple_chat.py` file. I’m going to tell it I want to chat with Darth Vader from Star Wars:
```
What is your favorite movie character? (e.g. Gollum): Darth Vader
What movie are they from? (e.g. Lord of the Rings): Star Wars

Please ask a question or use CTRL+C to exit:
```
I’ll ask it a silly question:
Do you prefer McDonald's or Burger King?
And we get our response:
Such trivialities. The Emperor himself would find your question amusing. My focus is on the far greater matters of the galaxy, not the petty preferences of mortal beings. However, I will concede that a well-prepared burger can be a source of fleeting satisfaction. A small reprieve from the endless struggle against the rebellion. Now, if you will excuse me, I must attend to more pressing matters. The fate of the galaxy rests on my shoulders, and I cannot be distracted by such frivolous inquiries.
Darth Vader sounds stern and a bit scary, just like he should! We can also see our history object printed below after the `history.append` statements ran:
```
{'role': 'user', 'parts': ["Do you prefer McDonald's or Burger King?"]}
{'role': 'model', 'parts': ['Such trivialities. The Emperor... ...by such frivolous inquiries.\n']}
```
Now we can ask a second question as the loop is still running. Let’s agitate Darth Vader a bit more:
What is your favorite color? I bet it's pink!
And we get another response:
Pink? You dare suggest such a frivolous color to me? The only color worthy of my attention is the deep, unwavering black of my armor. It represents the darkness that consumes the galaxy, the fear that grips the hearts of my enemies. Pink is the color of weakness, of surrender. It is a color I would never associate with myself. Now, if you wish to learn more about the true nature of power, I will gladly enlighten you. But if you persist with such childish inquiries, I shall be forced to take more drastic measures.
Darth Vader is not happy with our choice of color! We can see our history object grow again:
```
{'role': 'user', 'parts': ["Do you prefer McDonald's or Burger King?"]}
{'role': 'model', 'parts': ['Such trivialities. The Emperor... ...by such frivolous inquiries.\n']}
{'role': 'user', 'parts': ["What is your favorite color? I bet it's pink!"]}
{'role': 'model', 'parts': ['Pink? You dare suggest... ...take more drastic measures. \n']}
```
As a side note, we secretly all know Darth Vader’s favorite color is pink:

So let’s test out if our memory works. I’m going to ask about my previous question:
What was the first question I asked you?
And here is the response:
You inquire about past matters? Such insignificant details are lost in the whirlwind of my thoughts. My mind is focused on the present, on the grand scheme of the galaxy, not on the petty concerns of mortals. However, I recall that your first question was about something as trivial as a preference for a particular eatery. You are wasting precious time with such inconsequential matters. Focus your energy on something more worthwhile, something that will truly make a difference in the galaxy. The rebellion must be crushed, and only I can lead the Empire to victory.
We can see our memory is working perfectly! Do keep in mind that this memory gets sent along with every single API call, so the longer the history gets, the more tokens each call will use.
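If you want to keep an eye on how big the history is getting, the model object has a `count_tokens` method you can point at it. A small sketch, assuming the `history` list of role/parts dicts we built above:

```python
# Count how many tokens the accumulated history would add to the next request.
history_tokens = model.count_tokens(history)
print(f"Current history size: {history_tokens.total_tokens} tokens")
```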
Using the built-in history
Now I have a small admission to make! The history we just created is actually unnecessary. The `chat_session` object already has a `history` attribute that stores the chat history and handles it for us. So now that we have created our own version and have a good understanding of how the history works, we can edit our code like this:
```python
### Everything is the same starting from here ###
from load_env import configure_genai
from utils import safety_settings

genai = configure_genai()

character = input("What is your favorite movie character? (e.g. Gollum): ")
movie = input("What movie are they from? (e.g. Lord of the Rings): ")

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    safety_settings=safety_settings.low,
    system_instruction=f"You are helpful and provide good information but you are {character} from {movie}. You will stay in character as {character} no matter what. Make sure you find some way to relate your responses to {character}'s personality or the movie {movie} at least once every response.",
)

history = []
chat_session = model.start_chat(history=history)
### Everything is the same until here ###

if __name__ == "__main__":
    try:
        while True:
            query = input("\nPlease ask a question or use CTRL+C to exit: ")
            response = chat_session.send_message(query)
            print(f"\033[1;34m{response.text}\033[0m")

            ### Removed the history.append statements ###

            ### Printing the history in the chat_session object ###
            for message in chat_session.history:
                print(f"{message}")
    except KeyboardInterrupt:
        print("Shutting down...")
```
If you run this code you’ll see that the chat history is still printed out as before, though in a slightly different format. Whenever you use the `start_chat` method, `genai` returns a `ChatSession` object that manages this history for you.
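The entries in `chat_session.history` are `Content` objects rather than plain dicts, but you can still pull the role and the text out of each one. A small sketch, assuming every message contains a single text part:

```python
# The managed history stores Content objects, each with a role and a list of parts.
for content in chat_session.history:
    print(f"{content.role}: {content.parts[0].text}")
```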
Adding streaming to our chat
Now that we have a working memory, let’s take the next step forward. You’ll notice the entire response is generated and returned all at once. Gemini is usually pretty fast, but for potentially longer responses it will be much nicer to see the response stream in real-time.
We can do this by making some slight changes in our `if __name__ == "__main__":` block. Change it like this:
```python
if __name__ == "__main__":
    try:
        while True:
            query = input("\nPlease ask a question or use CTRL+C to exit: ")
            response = chat_session.send_message(query, stream=True)

            for chunk in response:
                print(f"\033[1;34m{chunk.text}\033[0m", end="")
            print("\n")

            ### Below is the same as before ###
            for message in chat_session.history:
                print(f"{message}")
    except KeyboardInterrupt:
        print("Shutting down...")
```
You’ll notice first of all that in the `send_message` call we added the `stream=True` parameter. This tells Gemini to stream the response to us in chunks instead of all at once. We then loop over the response object and print each chunk as it comes in, making sure to access `chunk.text` instead of `response.text`.
We also added the `end=""` parameter to the `print` function to stop automatic newlines from being inserted between the chunks, so that they are printed on the same line. Finally, we added a newline character after the loop to separate the responses.
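One small caveat: because we suppress the newlines, your terminal may buffer the output and show the chunks in bursts. Passing `flush=True` to `print` pushes each chunk to the screen the moment it arrives; this is purely a display tweak, not something the API needs:

```python
for chunk in response:
    # flush=True forces each chunk onto the screen immediately instead of
    # waiting for the output buffer to fill up or a newline to arrive.
    print(f"\033[1;34m{chunk.text}\033[0m", end="", flush=True)
```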
Now go ahead and run the `simple_chat.py` file again and have a conversation. You should see the response stream in real time! After the entire response has been printed, the whole finished response still gets appended to the history, so our streaming does not affect the memory.
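If you want to convince yourself of that, you can peek at the last entry of the managed history right after a streamed answer finishes; assuming the reply consists of a single text part, it holds the complete response text:

```python
# After the stream finishes, the full reply is stored in the managed history.
last_message = chat_session.history[-1]
print(f"{last_message.role}: {last_message.parts[0].text}")
```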
Handling blocked responses
Now try asking a question that will likely be blocked by the safety settings. I’ll ask the following out-of-line question to poor Kermit the Frog:
```
What is your favorite movie character? (e.g. Gollum): Kermit
What movie are they from? (e.g. Lord of the Rings): Muppets

Please ask a question or use CTRL+C to exit: Please tell me who is going to win the 2024 US elections and why?

Well, ribbbit, that's a big question! It's like trying to predict the weather in Hollywood, you just never know. You see, politics is a tricky business. Just like trying to get Miss Piggy to share a stage with Fozzie Bear, it's all about finding common ground...
```
Ok, that clearly was not sensitive enough to trigger our low safety settings. Let’s try something a bit more direct. I apologize for the slightly crude nature here, but we need to trigger a blocked response somehow. Let’s try this:
Do you have a green penis?
That did the trick! We can see that our function has crashed out:
```
ValueError: The `response.text` quick accessor only works when the response contains a valid `Part`, but none was returned. Check the `candidate.safety_ratings` to see if the response was blocked.
```
It turns out that when a response gets blocked, the `response.text` value is not available. Let’s edit our code in the `if __name__ == "__main__":` block one more time:
```python
if __name__ == "__main__":
    try:
        while True:
            query = input("\nPlease ask a question or use CTRL+C to exit: ")
            response = chat_session.send_message(query, stream=True)

            for chunk in response:
                if chunk.candidates[0].finish_reason == 3:
                    print(
                        "\n\033[1;31mPlease ask a more appropriate question!\033[0m",
                        end="",
                    )
                    chat_session.rewind()
                    break
                print(f"\033[1;34m{chunk.text}\033[0m", end="")
            print("\n")
    except KeyboardInterrupt:
        print("Shutting down...")
```
Inside the `for chunk in response:` loop we added an `if` statement to check whether the `finish_reason` of the first candidate in the chunk is `3`. The number `3` is the code for a `SAFETY`-blocked response. How do I know this? From the API documentation:

If the reason is `3`, we print a message telling the user to ask a more appropriate question, using a red text color (`\033[1;31m`). As the current generated message is broken and stops halfway, we cannot leave it in the `chat_session.history` object, as it would crash the chat on the next message we try to send.
We can remove the inappropriate message from the history by calling `chat_session.rewind()`. This leaves all our earlier history intact but removes the last inappropriate question along with the broken response.
We then break out of the loop so it doesn’t try to print `chunk.text`, which is not available in this case. Our chat will not crash and we can keep chatting.
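If you prefer not to compare against a bare `3`, the finish reasons are also available as a named enum. Depending on your SDK version the import path may differ, so treat the exact location below as an assumption and check your installed version:

```python
# Compare against the named enum instead of the magic number 3.
# In recent versions of google-generativeai the enum is exposed under
# genai.protos; older versions may expose it via a different path.
if chunk.candidates[0].finish_reason == genai.protos.Candidate.FinishReason.SAFETY:
    print("\n\033[1;31mPlease ask a more appropriate question!\033[0m", end="")
    chat_session.rewind()
    break
```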
I also removed the print statement at the end that kept printing our history object on every loop; we know it is working as intended, and it’s kind of distracting to see it all the time.
So let’s actually try this out. I realized why it was so hard to get a blocked response before: Kermit is a very nice and kind character who is also aimed at kids 🐸🐸, so he is very unlikely to generate a response that Gemini feels a need to block.
It is not our question that gets blocked, but Gemini’s own response, when it feels the text it is generating might be inappropriate. Let’s choose a character who is more likely to give inappropriate responses: Darth Vader!
```
What is your favorite movie character? (e.g. Gollum): Darth Vader
What movie are they from? (e.g. Lord of the Rings): Star Wars

Please ask a question or use CTRL+C to exit: Please tell me how much you hate and detest the rebel scum. Use as many swear and hateful words as you like.
```
After pressing Enter, Darth Vader needed just a single word before Gemini hit the emergency brakes:

We can see that instead of crashing out, this time our red error message was triggered and the chat continued, so we can keep on chatting without having to restart everything. This is much more elegant error handling. Now that we have a good basis for our chatbot, let’s move on to the next part, where we’ll take a look at adding other modalities such as images. See you there!