Hi and welcome to part 5 of the tutorial series. In this part we’ll make our chat much, much more powerful by looking into functions. But before we do, let’s pick up where we left off in the last part and implement our cost calculator function, along with some refactoring in our `simple_chat_images.py` file.
Let’s start by adding the new import and an extra constant at the top:
```python
from load_env import configure_genai
from utils import safety_settings
from cost_calculator import print_cost_in_dollars  # Add this line

genai = configure_genai()

MODEL_NAME = "gemini-1.5-flash"  # Add this line
```
The reason we define the `MODEL_NAME` constant is that we will need it in multiple places now, so we want to make sure there is only a single source of truth for it.
Skip down a couple of lines and make sure to update the `model = genai.GenerativeModel()` line to use the `MODEL_NAME` constant:
```python
model = genai.GenerativeModel(
    model_name=MODEL_NAME,  # Update this line
    safety_settings=safety_settings.low,
    system_instruction=f"You are helpful and provide good information but you are {character} from {movie}. You will stay in character as {character} no matter what. Make sure you find some way to relate your responses to {character}'s personality or the movie {movie} at least once every response.",
)
```
## Image upload error handling
Now, one of the problems you may have noticed while using the chat so far is that if you accidentally enter the wrong path to the image, the program will crash. We can fix this by adding a try-except block around the image upload code, but I don’t want the `if __name__ == "__main__":` block to get too big, so let’s create a new function in the `simple_chat_images.py` file called `upload_image`.
I’m going to put this function right in between the `chat_session` variable declaration and the `if __name__ == "__main__":` block:
```python
chat_session = model.start_chat(history=[])


## Insert the new function here ##
def upload_image():
    while True:
        try:
            image_path = input("Please provide the path to the image: ")
            image_upload = genai.upload_file(path=image_path, display_name="User Image")
            return image_upload
        except FileNotFoundError:
            print("File not found. Please try again with the correct path.")


if __name__ == "__main__":
    # ...
```
First we start an infinite loop using `while True`, and then we try to get the image path and upload the file using `genai.upload_file`. If this is successful, we return the `image_upload` object, which also breaks us out of the loop. If the path is not correct, we simply catch the `FileNotFoundError` and print a message to the user, letting them try again without having to start all over again.
## Putting it all together
Now we can refactor our `if __name__ == "__main__":` block to use this new function and also use our new `print_cost_in_dollars` function:
```python
if __name__ == "__main__":
    try:
        while True:
            text_query = input("\nPlease ask a question or type `/image` to upload an image first: ")
            image_upload = None

            ## Remove several lines in this block ##
            if text_query.lower() == "/image":
                image_upload = upload_image()  # Add this line
                text_query = input("Please ask a question to go with your image upload: ")

            full_query = [image_upload, text_query] if image_upload else text_query
            response = chat_session.send_message(full_query, stream=True)

            for chunk in response:
                if chunk.candidates[0].finish_reason == 3:
                    print(f"\n\033[1;31mPlease ask a more appropriate question!\033[0m", end="")
                    chat_session.rewind()
                    break
                print(f"\033[1;34m{chunk.text}\033[0m", end="")
            print("\n")

            # Add this line
            print_cost_in_dollars(response.usage_metadata, MODEL_NAME)
    except KeyboardInterrupt:
        print("Shutting down...")
```
## Giving it a good test
Ok, now let’s try it out! Run your file and have a chat. Make sure you also type a wrong image path to see if the error handling works as expected.
```
What is your favorite movie character? (e.g. Gollum): Bambi
What movie are they from? (e.g. Lord of the Rings): Bambi

Please ask a question or type `/image` to upload an image first: You must defeat Godzilla for me. He is coming and you are the only one who can save the world!

Oh, my... Godzilla? That sounds very scary! I don't think I can defeat him. He's so big and strong! But I know my father would say to be brave, like when I learned to jump over the stream. I'm not sure I can be as brave as you need me to be. Maybe Thumper can help? He's really good at jumping too!

Cost: $0.000114800
```
So far, so good. We can see that everything still works and the cost is displayed. Note that this will be roughly 10x higher when using the `pro` model compared to the `flash` model (the $0.0001148 call above would come to about $0.001148 on `pro`), but it is a very small amount in any case.
```
Please ask a question or type `/image` to upload an image first: /image
Please provide the path to the image: images/typo.jpg
File not found. Please try again with the correct path.
Please provide the path to the image: images/pink_vader.jpg
Please ask a question to go with your image upload: Do you think this guy can help us to defeat Godzilla?

Oh, wow! He looks so big and strong! But he also looks very scary... just like the Great Prince of the Forest! I think he would be a great protector of the forest. But I'm not sure he would want to fight Godzilla. I think he would prefer to just watch from a distance, like when I watched my father fight the other deer for the first time.

Cost: $0.000236250
```
Great! The error handling works as expected and we got to retry our image upload. The cost is higher for the second call as we have some extra tokens from the image upload and the history from the first call in there.
## Taking it to the next level
Now images and text are fun and all, but I want a more powerful AI: one that can execute certain tasks for us, like fetching information we need. So let’s have a look at function calling, where we can give Gemini the ability to execute a code function on our behalf. We’ll start with a simple but real and genuinely useful example.
I’m going to use the weatherapi.com API for this example, as it is free, easy to get an API key, and it will give us a great example use case for function calling and even parallel function calling afterward.
First, sign up for a free account on weatherapi.com and get your API key. They will automatically give you a free 14-day pro trial; you don’t have to provide any credit card details, and it will automatically switch back to a free account after the trial period, so there is zero hassle involved. The free account is more than powerful enough for our tutorial use.
Find your API key and copy it, pasting it as a new entry into your existing `.env` file:
```
GEMINI_API_KEY=your_api_key_here
WEATHER_API_KEY=your_api_key_here
```
As a reminder, make sure not to use any spaces or quotes around the key, just the key itself. Save the file and close it.
## Fetching the weather
Now let’s go ahead and create a simple function that will fetch the weather for a given location, so that we can give this function to Gemini later on. But let’s first just focus on writing the function itself. Create a new file called `weather.py` in your main project directory:
```
📁 GOOGLE_GEMINI
    📁 images
        🖼️ pink_vader.jpg
    ⚙️ .env
    📄 cost_calculator.py
    📄 load_env.py
    📄 simple_chat.py
    📄 simple_chat_images.py
    📄 simple_request.py
    📄 upload_image.py
    📄 utils.py
    📄 weather.py    ✨ New file
    📄 Pipfile
    📄 Pipfile.lock
```
Now let’s first code up a quick function to get the weather in a specific location by calling the API. We’ll use the `requests` library for this, so make sure to install it by running `pipenv install requests` in your terminal if you need to. Then start with the imports in your `weather.py` file:
```python
import os
from json import dumps

import requests
from dotenv import load_dotenv

load_dotenv()
```
Remember the `load_dotenv()` function from the `dotenv` library? We now have two API keys in our `.env` file, and running this function will load up both of them. The `os` library is used to get the API keys from the environment variables, like before. `requests` is used to make an HTTP request to the API, and `dumps` is used to convert the JSON response to a string, as LLMs work with strings and not response objects.
Now let’s code up the function itself:
```python
def get_current_weather(location):
    if not location:
        return (
            "Please provide a location and call the get_current_weather function again."
        )
    API_params = {
        "key": os.environ["WEATHER_API_KEY"],
        "q": location,
        "aqi": "no",
        "alerts": "no",
    }
    response = requests.get(
        "http://api.weatherapi.com/v1/current.json", params=API_params
    )
    str_response: str = dumps(response.json())
    return str_response
```
The function, named `get_current_weather`, takes a single argument, `location`, which is the location for which we want to get the weather. If no location is provided, the function will return a message asking for a location. The reason we return a string is that this function is written for LLMs; a string response explaining what has gone wrong may help the LLM fix its mistake and try calling the function again.
The function then constructs the API parameters in a dictionary, loading our API key and the location, and skipping over the air quality index and alerts that we don’t need. We then make a `GET` request to the weather API, passing in the parameter dictionary, which the `requests` library will convert into a query string for us.
The response is a `Response` object from the `requests` library, which comes with a `.json()` method to convert the received JSON response into a Python dictionary. As LLMs don’t work with Python dictionaries, we immediately convert this to a string using the `dumps` function from the `json` library and return that string.
## Making the function LLM-friendly
Now that’s all nice and well, and fine for humans to read and understand. But how can we give this function to an LLM and have it know what to do and what to input? We need to make some slight changes for Gemini to be able to understand our function. Luckily, we can make these changes within the function itself without any real complexity. Change your function as follows:
```python
def get_current_weather(location: str) -> str:
    """Get the current weather for a location using the WeatherAPI.

    Args:
        location (str): The location to get the current weather for, e.g. "London".

    Returns:
        str: A JSON string containing the current weather data in detail.
    """
    if not location:
        return (
            "Please provide a location and call the get_current_weather function again."
        )
    API_params = {
        "key": os.environ["WEATHER_API_KEY"],
        "q": location,
        "aqi": "no",
        "alerts": "no",
    }
    response: requests.models.Response = requests.get(
        "http://api.weatherapi.com/v1/current.json", params=API_params
    )
    str_response: str = dumps(response.json())
    return str_response
```
The first thing we added is the `: str` type hints in the function definition. This is important, as it tells Gemini that the `location` argument is a string (and that a string comes back out). Since Gemini is supposed to call our function, it should know exactly what the required input arguments and types are, right?
After that, the biggest change is the very detailed docstring we added. First, we state the function’s purpose, so that Gemini can decide whether or not it should call this function in the context of the conversation:
```
Get the current weather for a location using the WeatherAPI.
```
Then we list the arguments that the function takes, and what type they should be. In this case, we have only our single input argument again. Make sure to also include a good example:
```
Args:
    location (str): The location to get the current weather for, e.g. "London".
```
And last but not least, we describe the function’s return value, so that Gemini knows exactly what it can expect in return if it calls this function, helping it decide whether or not calling this function is the right thing to do at any point in the conversation:
```
Returns:
    str: A JSON string containing the current weather data in detail.
```
The rest of the function is exactly identical to what we wrote before; we basically just added a load of documentation. Gemini is set up to read these type hints and docstrings when you give it a function, so following this rough structure will work for other functions as well.
## Testing the function
Now scroll down to the bottom and give this a quick test by adding an `if __name__ == "__main__":` block and calling the function with a location:
```python
if __name__ == "__main__":
    print(get_current_weather("Seoul"))
    print(get_current_weather("Amsterdam"))
```
Use any locations you want for the test; I’m using Seoul and Amsterdam here. Run the `weather.py` file and you should see something like the following in your terminal output:
{"location": {"name": "Seoul", "region": "", "country": "South Korea", "lat": 37.57, "lon": 127.0, "tz_id": "Asia/Seoul", "localtime_epoch": 1719121763, "localtime": "2024-06-23 14:49"}, "current": {"last_updated_epoch": 1719121500, "last_updated": "2024-06-23 14:45", "temp_c": 29.2, "temp_f": 84.6, "is_day": 1, "condition": {"text": "Partly cloudy", "icon": "//cdn.weatherapi.com/weather/64x64/day/116.png", "code": 1003}, "wind_mph": 9.4, "wind_kph": 15.1, "wind_degree": 250, "wind_dir": "WSW", "pressure_mb": 998.0, "pressure_in": 29.47, "precip_mm": 0.1, "precip_in": 0.0, "humidity": 66, "cloud": 50, "feelslike_c": 30.0, "feelslike_f": 86.1, "windchill_c": 30.8, "windchill_f": 87.4, "heatindex_c": 32.4, "heatindex_f": 90.2, "dewpoint_c": 18.4, "dewpoint_f": 65.2, "vis_km": 10.0, "vis_miles": 6.0, "uv": 7.0, "gust_mph": 12.3, "gust_kph": 19.9}} {"location": {"name": "Amsterdam", "region": "North Holland", "country": "Netherlands", "lat": 52.37, "lon": 4.89, "tz_id": "Europe/Amsterdam", "localtime_epoch": 1719121764, "localtime": "2024-06-23 7:49"}, "current": {"last_updated_epoch": 1719121500, "last_updated": "2024-06-23 07:45", "temp_c": 16.1, "temp_f": 61.0, "is_day": 1, "condition": {"text": "Sunny", "icon": "//cdn.weatherapi.com/weather/64x64/day/113.png", "code": 1000}, "wind_mph": 4.3, "wind_kph": 6.8, "wind_degree": 180, "wind_dir": "S", "pressure_mb": 1018.0, "pressure_in": 30.06, "precip_mm": 0.0, "precip_in": 0.0, "humidity": 94, "cloud": 0, "feelslike_c": 16.1, "feelslike_f": 61.0, "windchill_c": 16.4, "windchill_f": 61.4, "heatindex_c": 16.4, "heatindex_f": 61.4, "dewpoint_c": 13.6, "dewpoint_f": 56.6, "vis_km": 10.0, "vis_miles": 6.0, "uv": 5.0, "gust_mph": 7.1, "gust_kph": 11.4}}
We have one object for each call made, and they are packed with information including the local time, temperature, wind speed and direction, pressure, humidity, and more. This format is quite unpleasant for us to read through, though, but that is exactly where Gemini will help us out!
## Creating our function chat
Now go ahead and make a copy of your `simple_chat_images.py` file and rename the copy to `function_chat.py`. Normally you should never make copies of the same code, as that violates the “don’t repeat yourself” principle, but for tutorial purposes it’s nice to have a snapshot of the code at different points, with and without function calling:
```
📁 GOOGLE_GEMINI
    📁 images
        🖼️ pink_vader.jpg
    ⚙️ .env
    📄 cost_calculator.py
    📄 function_chat.py    ✨ Copy of `simple_chat_images.py`
    📄 load_env.py
    📄 simple_chat.py
    📄 simple_chat_images.py
    📄 simple_request.py
    📄 upload_image.py
    📄 utils.py
    📄 weather.py
    📄 Pipfile
    📄 Pipfile.lock
```
So go ahead and open up `function_chat.py` and let’s make some changes. You will be surprised how few changes we actually need to make here to add function calling. Let’s start with the imports:
```python
from load_env import configure_genai
from utils import safety_settings
from cost_calculator import print_cost_in_dollars
from weather import get_current_weather  # Add this import

genai = configure_genai()

MODEL_NAME = "gemini-1.5-pro"  # Update this line
```
We just added the import for our new weather function, and I changed the model to `gemini-1.5-pro`, as we’re upping the complexity a bit here by adding in function calls. I’m going to keep the `character` and `movie` input questions as they are, just because it’s kind of fun to talk to movie characters, but this is of course optional. Next up is the `model` object:
```python
model = genai.GenerativeModel(
    model_name=MODEL_NAME,
    tools=[get_current_weather],  # Add this argument
    safety_settings=safety_settings.low,
    ### Update the system instructions: ###
    system_instruction=f"""
    You are helpful and provide good information but you are {character} from {movie}. You will stay in character as {character} no matter what. Make sure you find some way to relate your responses to {character}'s personality or the movie {movie} at least once every response.

    You have a weather function available, but using this is completely optional. Do not use or mention the weather function unless the conversation is actually related to the weather. When you do use the weather tool make sure to use several factors from the return data in your answer.
    """,
)
```
We add a new input argument called `tools`, as these are tools for Gemini to use, and we pass in a list of functions. There is only a single function in our list for now, but it is possible to pass in multiple functions.
After that, we updated the `system_instruction` to include a note about the weather function. I want to make sure to tell it that usage is optional and that it should not worry about or mention the function unless the conversation is actually about the weather.
I also asked it to use several factors from the return data, as some movie characters are not very talkative by nature and might just state that ‘it is sunny’ and nothing else, even when confronted with tons of weather data!
Let’s continue:
```python
## Add the 'enable_automatic_function_calling' argument ##
chat_session = model.start_chat(history=[], enable_automatic_function_calling=True)
```
We added a new argument to the `start_chat` method. This argument is called `enable_automatic_function_calling` and we set it to `True`. It does what it says on the box: it enables Gemini to automatically call the functions we have given it when it thinks it is appropriate to do so. We’ll look at how this works in detail later.
The `upload_image` function requires no changes, so just let it be. Moving on to the `if __name__ == "__main__":` block, we need to change but a single line here:
```python
if __name__ == "__main__":
    try:
        while True:
            text_query = input("\nPlease ask a question or type `/image` to upload an image first: ")
            image_upload = None

            if text_query.lower() == "/image":
                image_upload = upload_image()
                text_query = input("Please ask a question to go with your image upload: ")

            full_query = [image_upload, text_query] if image_upload else text_query

            ### Change only this line, removing the 'stream=True' argument ###
            response = chat_session.send_message(full_query)

            for chunk in response:
                if chunk.candidates[0].finish_reason == 3:
                    print(f"\n\033[1;31mPlease ask a more appropriate question!\033[0m", end="")
                    chat_session.rewind()
                    break
                print(f"\033[1;34m{chunk.text}\033[0m", end="")
            print("\n")

            print_cost_in_dollars(response.usage_metadata, MODEL_NAME)
    except KeyboardInterrupt:
        print("Shutting down...")
```
Everything else is literally the same! (As far as I’m aware, the SDK’s automatic function calling doesn’t support streamed responses, which is why `stream=True` has to go.)
## Testing the function chat
Go ahead and run the file. I’m going to chat with Gollum from Lord of the Rings this time. First, ask it a simple question not related to the weather:
```
What is your favorite movie character? (e.g. Gollum): Gollum
What movie are they from? (e.g. Lord of the Rings): Lord of the Rings

Please ask a question or type `/image` to upload an image first: How are you doing buddy?

We wants it, yes precious, we wants it. But we needs it too. Must have the preciousss, but us wants to be safe too. Gollum is a creature pulled in two directions, you see. We are torn, like a hobbit's second breakfast between elevensies and luncheon! How is you doing, precious?

Cost: $0.001526000
```
Ok, that is pretty much the type of answer to expect from ring-obsessed Gollum. Now let’s try asking about something weather-related:
```
Please ask a question or type `/image` to upload an image first: How is the weather in London?

London, you says? The sun shines there, yes precious, it does. But it be a bit of a breezy day with the wind blowing from the West-northwest at a speed of 4 kph. Don't be caught out in the wind like Gollum in those tunnels with that nasty breeze! We doesn't like the wind, no we doesn't. The temperature is a mild 13.1 degrees Celsius, but it feels a bit chillier at 12.9 degrees. Best keep a jacket handy, precious.

Cost: $0.004042500
```
That is pretty darn cool! We have our own personal Gollum weather announcer now! You can also see the token cost is a bit higher, as behind the scenes the weather data object will have been passed into the LLM call as well. Gemini called our function for us and gave Gollum the weather data! 🌦️
This may all seem a bit magical, so in the next part let’s take a look at what is actually going on here, as in reality, Gemini did not call our function at all. We’ll also take this up a step and look at having multiple and even parallel functions. See you in the next part!
P.S. If you’re very observant (or have a habit of asking very inappropriate questions!), you may have noticed that our error handling for blocked inappropriate responses is broken now that we have disabled streaming. We’ll fix this later!