Python LangChain Course πŸπŸ¦œπŸ”— Understanding Agents and building our own (5/6)

Hi and welcome back! In this part, we’re going to be building our own custom agent from scratch. So far the whole agent may have seemed a bit magical as it just runs off reasoning back and forth on its own. In this part we’re going to really understand how an agent works and how it’s built by making our own. This part will be a bit longer than usual so buckle up and get comfortable!

In this part we’re going to be reinventing the wheel a little bit but, without going too deep down the rabbit hole, this will greatly increase your understanding of how an LLM tool that can basically only generate text can become a powerful AI agent that makes decisions and takes actions.

Make a new folder named '5_Understanding_agents' to begin with, and inside make another folder named 'tools'. We’re going to be using 2 different tools in this part and one of them is going to be our internet tool from part 4. So copy the 'internet_tool.py' file from '4_Custom_tools/tools' into '5_Understanding_agents/tools'. Your folder now looks like this:

πŸ“Finx_LangChain
    πŸ“1_Summarizing_long_texts
    πŸ“2_Chat_with_large_documents
    πŸ“3_Agents_and_tools
    πŸ“4_Custom_tools
    πŸ“5_Understanding_agents
        πŸ“tools
            πŸ“„internet_tool.py      (copy of the internet tool from part 4)
    πŸ“„.env

Just in case you need the contents of the internet_tool.py file, here it is again:

######################################################################################################################
##### This file is just a copy of the one from part 4, do not make changes here, make them in the original file ######
######################################################################################################################


import requests
from bs4 import BeautifulSoup
from langchain.tools import BaseTool


class InternetTool(BaseTool):
    name: str = "internet_tool"
    description: str = (
        "useful when you want to read the text on any url on the internet."
    )

    def _get_text_content(self, url: str) -> str:
        """Get the text content of a webpage with HTML tags removed"""
        response = requests.get(url)
        html_content = response.text
        soup = BeautifulSoup(html_content, "html.parser")
        for tag in ["nav", "footer", "aside", "script", "style", "img"]:
            for match in soup.find_all(tag):
                match.decompose()
        text_content = soup.get_text()
        text_content = " ".join(text_content.split())
        return text_content

    def _limit_chars(self, text: str) -> str:
        """limit number of output characters"""
        return text[:10_000]

    def _run(self, url: str) -> str:
        try:
            text_content = self._get_text_content(url)
            return self._limit_chars(text_content)
        except Exception as e:
            return f"The following error occurred while trying to fetch the {url}: {e}"

    def _arun(self, url: str):
        raise NotImplementedError("This tool does not support asynchronous execution")


if __name__ == "__main__":
    tool = InternetTool()
    print(
        tool.run("https://en.wikipedia.org/wiki/List_of_Italian_desserts_and_pastries")
    )

Obviously, in a real coding project never ever ever copy code like this! Having code in multiple places is bad. But for the sake of this tutorial, we’re going to keep everything segregated per folder so that you can easily reference anything you want to later on.

Moby Duck, Moby Dick??

Now inside the '5_Understanding_agents/tools' folder make another file named 'moby_duck_search.py'. (Not Moby Dick from the novel about the whale, I’ll explain the name in a second):

πŸ“Finx_LangChain
    πŸ“1_Summarizing_long_texts
    πŸ“2_Chat_with_large_documents
    πŸ“3_Agents_and_tools
    πŸ“4_Custom_tools
    πŸ“5_Understanding_agents
        πŸ“tools
            πŸ“„internet_tool.py      (copy of the internet tool from part 4)
            πŸ“„moby_duck_search.py
    πŸ“„.env

Inside this file, we’ll write the second tool our custom agent will use. Open up 'moby_duck_search.py' and start with our imports:

from json import dumps
from langchain.tools import BaseTool
from langchain.utilities import DuckDuckGoSearchAPIWrapper

We will use the Python built-in JSON library’s dumps or dump-to-string method to convert objects to string format. We import the BaseTool because we need to inherit from it as always. We also import the DuckDuckGoSearchAPIWrapper because we’re going to use DuckDuckGo to search for articles while limiting the search to the MobyGames gaming website. Hence this tool being named Moby-Duck-Search.

I chose this example website randomly and you can also use a different similar website about a specific topic if you so desire. This one will be pretty easy, so let’s get to implementing our tool:

class MobyDuckSearch(BaseTool):
    name: str = "moby_duck_search"  # Pun intended.
    description: str = (
        "A tool that uses DuckDuckGo Search to search the MobyGames game website. "
        "Useful for when you need to answer questions about games. "
        "Input should be a search query. "
    )
    api_wrapper = DuckDuckGoSearchAPIWrapper()

We start our class declaration by inheriting from the BaseTool and setting a default value for the name and description. We also create an instance of the DuckDuckGoSearchAPIWrapper that comes with LangChain and store it in a variable named ‘api_wrapper‘. We can simply reuse this but alter the input slightly. While still inside the MobyDuckSearch class block add:

    def _run(self, query: str) -> str:
        """Just call the DuckDuckGoSearchAPIWrapper.run method, but with the edited query."""
        targeted_query = f"site:mobygames.com {query}"
        results_with_metadata: list = self.api_wrapper.results(
            targeted_query, num_results=3
        )
        return dumps(results_with_metadata)

    def _arun(self, query: str):
        raise NotImplementedError("This tool does not support asynchronous execution")

We implement the ._run() method that takes a query as a string and returns a string. We first create a targeted_query, which simply prefixes the query with a site filter: "Mario-Kart" in becomes "site:mobygames.com Mario-Kart" out. You’re probably familiar with this syntax from Google, as it limits the search to results from that specific website only.

We then call the .results() method on our api_wrapper instance (which is just the DuckDuckGoSearchAPIWrapper we imported from LangChain and has the .results method already built-in for us) and pass in the targeted_query and the number of results we want. Note that we save this in a variable of type list named ‘results_with_metadata‘ as the result will also include metadata which contains the link or URL to the search results page. It’s good to be explicit with variable naming instead of just saying ‘results‘ so it’s very clear and readable exactly what kind of data this variable holds.

We then return the results_with_metadata variable, but first convert it to a string using the json.dumps() method. This gives us a stringified JSON object; LLMs only work with text, so our method needs to return a string, otherwise we won’t be able to feed the result back into ChatGPT or another LLM. Finally, we declare the unused ._arun() method just to be complete.
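To make that conversion concrete, here is a tiny standalone sketch (with made-up data, not part of the tool code) showing what dumps gives us back:

from json import dumps

results = [{"snippet": "Snippet here", "title": "Title here", "link": "https://link.com/"}]
as_string = dumps(results)  # the list of dicts becomes one JSON-formatted string
print(type(as_string))      # <class 'str'> - plain text we can feed to the LLM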

Our MobyDuckSearch class now looks like this:

class MobyDuckSearch(BaseTool):
    name: str = "moby_duck_search"  # Pun intended.
    description: str = (
        "A tool that uses DuckDuckGo Search to search the MobyGames game website. "
        "Useful for when you need to answer questions about games. "
        "Input should be a search query. "
    )
    api_wrapper = DuckDuckGoSearchAPIWrapper()

    def _run(self, query: str) -> str:
        """Just call the DuckDuckGoSearchAPIWrapper.run method, but with the edited query."""
        targeted_query = f"site:mobygames.com {query}"
        results_with_metadata: list = self.api_wrapper.results(
            targeted_query, num_results=3
        )
        return dumps(results_with_metadata)

    def _arun(self, query: str):
        raise NotImplementedError("This tool does not support asynchronous execution")

A quick test run

Let’s add a quick test below and outside our class to see if it works:

if __name__ == "__main__":
    moby_duck_tool = MobyDuckSearch()
    print(moby_duck_tool.run("lego star wars"))

Remember, this will only run if we run the file directly, not if it’s imported inside another file. We just create a new instance of our moby duck tool and then run it with a query of 'lego star wars'. Go ahead and run the file to see if your tool is working; the structure of your output should look like this:

[
    {"snippet": "Snippet here", "title": "Title here", "link": "https://link.com/"},
    {"snippet": "Snippet here", "title": "Title here", "link": "https://link.com/"},
    {"snippet": "Snippet here", "title": "Title here", "link": "https://link.com/"}
]

Note that even though it kind of looks like a list of dictionaries it’s just a string type variable we can feed to ChatGPT.

Go ahead and save this file and now create an __init__.py file in the '5_Understanding_agents/tools' folder.

πŸ“Finx_LangChain
    πŸ“1_Summarizing_long_texts
    πŸ“2_Chat_with_large_documents
    πŸ“3_Agents_and_tools
    πŸ“4_Custom_tools
    πŸ“5_Understanding_agents
        πŸ“tools
            πŸ“„__init__.py
            πŸ“„internet_tool.py      (copy of the internet tool from part 4)
            πŸ“„moby_duck_search.py
    πŸ“„.env

Import both the moby duck and internet tool inside this file:

from .moby_duck_search import MobyDuckSearch
from .internet_tool import InternetTool

Now go ahead and save and close your __init__.py file.

Setting up our Agent’s prompt

We’re done with our tools for now. The first thing we’re going to build for our agent is the prompt: our instructions to the agent on what we want it to do and how. To keep our file structure and our final agent code cleaner and more readable, we’re going to create one more folder. Create a folder named ‘prompts‘ inside the '5_Understanding_agents' folder, and inside create an empty __init__.py file and a file named 'base_agent_template.py'. Your folder structure should now look like this:

πŸ“Finx_LangChain
    πŸ“1_Summarizing_long_texts
    πŸ“2_Chat_with_large_documents
    πŸ“3_Agents_and_tools
    πŸ“4_Custom_tools
    πŸ“5_Understanding_agents
        πŸ“prompts
            πŸ“„__init__.py   (empty file)
            πŸ“„base_agent_template.py    (empty file)
        πŸ“tools
            πŸ“„__init__.py
            πŸ“„internet_tool.py
            πŸ“„moby_duck_search.py
    πŸ“„.env

Open your 'base_agent_template.py' file and declare the following variable containing a prompt inside:

base_agent_template = """
Answer the following questions as best you can, but speaking as fanatic gaming enthusiast. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Remember to speak as a fervent gaming enthusiast when giving your final answer.

Question: {input}
{agent_scratchpad}
"""

This is just a prompt template like the ones we’ve seen before; notice the {tools} and {tool_names} variables. We’ll be replacing these with the actual tools we want to use and their names. We also have an {agent_scratchpad} variable which we’ll use to store the agent’s thoughts and actions as it goes through the process of answering the question. We’ll see how this works in a bit.

Note how most of this template is a standard structure that defines how we want the LLM to structure its thought process and answer our question. Now that we can see the basic template behind this ReAct reasoning style agent, it makes a lot of sense and is actually quite simple. We’re merely asking the LLM to output text, as that is all it knows how to do and all it can do in the end. We’re just asking it to output that text in a specific format and order so that we can work with it, that’s all! This should make the whole reasoning agent from the previous parts look a lot less magical and a lot more understandable.

Go ahead and save and close this file, then open your __init__.py file, making sure to open the one in this folder which is still empty, and import the base_agent_template:

from .base_agent_template import base_agent_template

Ok go ahead and save and close that as well.

Let’s build our Agent

Let’s move on to our agent. Create a file called '1_building_an_agent.py' inside your '5_Understanding_agents' folder:

πŸ“Finx_LangChain
    πŸ“1_Summarizing_long_texts
    πŸ“2_Chat_with_large_documents
    πŸ“3_Agents_and_tools
    πŸ“4_Custom_tools
    πŸ“5_Understanding_agents
        πŸ“prompts
            πŸ“„__init__.py
            πŸ“„base_agent_template.py
        πŸ“tools
            πŸ“„__init__.py
            πŸ“„internet_tool.py
            πŸ“„moby_duck_search.py
        πŸ“„1_building_an_agent.py     <------New file
    πŸ“„.env

Inside the '1_building_an_agent.py' file we’ll start with our imports. There are going to be quite a lot of them as in this part we’ll really dig into building an agent from its parts, so let’s get started:

import re

from decouple import config
from langchain import LLMChain
from langchain.agents import (
    AgentExecutor,
    AgentOutputParser,
    LLMSingleActionAgent,
    Tool,
)
from langchain.chat_models import ChatOpenAI
from langchain.prompts import StringPromptTemplate
from langchain.schema import AgentAction, AgentFinish
from prompts import base_agent_template
from tools import MobyDuckSearch

I’ll go over these imports in a broad sense, and we’ll explain them in more detail when we use each import. That makes more sense as we’ll actually get to see what the import does instead of just talking about it theoretically. We import the “re” module for regular expressions, as we want to test if the model output matches certain patterns later on. Decouple is of course for our API key, and all the other parts are bits and pieces we will combine together to build our agent.

Finally, we import our own MobyDuckSearch tool and prompt template. (We will get the internet tool involved later on in the tutorial, I haven’t forgotten about it!). Again, we’ll see how each of these imports works when we use them.

Setup

Let’s set up our ChatGPT API and our tools:

chat_gpt_api = ChatOpenAI(
    temperature=0, model="gpt-3.5-turbo-0613", openai_api_key=config("OPENAI_API_KEY")
)

moby_duck_tool = MobyDuckSearch()

tools = [
    Tool(
        name=moby_duck_tool.name,
        func=moby_duck_tool.run,
        description=moby_duck_tool.description,
    )
]

We set up our ChatGPT API as always. We then create a new instance of the MobyDuckSearch class and then create a list of tools. Inside we only create a single Tool object for now, using the name and description we have defined inside the class and passing the .run method as the func argument.

The prompt template (formatter)

We’re going to be writing custom versions of most things so you can really see and understand what is going on. Next up is the prompt-template formatter. We have the prompt we defined in the prompts folder, but it has variables that need to be plugged into it before it can be used. Let’s start on our new class:

class MobyDuckPromptTemplate(StringPromptTemplate):
    template: str
    tools: list[Tool]

We define a new class that inherits from StringPromptTemplate. This is a class that is used to format the prompt which is sent to the LLM. The format method must return a string. New instances of this class will take a template and a list of Tools as input. The template is a template string like we have in our prompts folder with {variables} in brackets in the string. We now need to add a .format method:

class MobyDuckPromptTemplate(StringPromptTemplate):
    template: str
    tools: list[Tool]

    def format(self, **kwargs) -> str:
        ...  # Method implementation will go here.

The .format method is defined as an abstract method in the BasePromptTemplate, which is a parent of the StringPromptTemplate we inherited from. An abstract method in the parent class means the child classes should implement it. This method should return the formatted prompt as a string and will contain our prompt formatting logic.

Our .format method takes self, which is this particular instance of the class itself, and **kwargs, which is a dictionary of whatever other arguments were passed in when the .format method was called, basically just catching all the other arguments that were passed in.
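If **kwargs is new to you, here is a minimal illustration (not part of our agent code) of how it catches keyword arguments into a dictionary:

def show_kwargs(**kwargs):
    # whatever keyword arguments were passed in end up in one dictionary
    print(kwargs)

show_kwargs(input="zombie games", intermediate_steps=[])
# -> {'input': 'zombie games', 'intermediate_steps': []}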

So, before we start building our format method, what exactly is going to be inside this kwargs dictionary? At this point in the format function, our kwargs dictionary will look like this:

# Do not put this in your code #
{'input': 'The user input query', 'intermediate_steps': [list of steps taken so far]}

Where did these two key-value pairs come from? Who or what passed them into our format method? The AgentExecutor class, which we imported from LangChain and will take a look at later on, takes care of passing in the input and the intermediate_steps, if any have taken place. On the first call this will be an empty list, but after the agent calls a tool or does something else, the AgentExecutor will add the action and observation to the intermediate_steps list and pass it back into the format method.

As we now know our format method will receive an intermediate_steps variable, let’s pop it off into a variable called ‘intermediate_steps‘.

    def format(self, **kwargs) -> str:
        intermediate_steps = kwargs.pop("intermediate_steps")

So in order to format our prompt, which is the whole purpose of this .format method, we need to pass in the variables we left open in our prompt template, namely {tools}, {tool_names}, {input}, and {agent_scratchpad}. As we saw a moment before, we already have the input key in our kwargs dictionary, so that one’s taken care of.

Next, we need to provide the {agent_scratchpad}. This scratchpad is the LangChain name for what are basically the agent’s notes. As stated above, with each step the AgentExecutor will pass the intermediate_steps back to our format method, so we can prep the prompt for the next step by adding the agent’s actions and observations so far to the prompt of the next call. So before each ChatGPT call a fresh prompt will be generated using our format method, inputting the actions and observations so far into the prompt before asking ChatGPT for its next step.

This acts as a sort of memory. Remember that a ChatGPT call is just a text completion based on whatever text you put in. So we feed back whatever the agent has thought and done so far, otherwise, it has no idea or memory of what has happened. We’ll start with an empty string for our scratchpad.

    def format(self, **kwargs) -> str:
        intermediate_steps = kwargs.pop("intermediate_steps")
        scratchpad = ""

Now what exactly is in our intermediate_steps variable? The intermediate_steps variable is a list of tuples (value1, value2) with two values each. The first value is an AgentAction object, which is a dataclass with three attributes: ‘tool‘, ‘tool_input‘, and ‘log‘. For now, we just want the ‘log‘, which is a string of the agent’s thoughts and actions. An example of a ‘log‘ is the following, which is just a single string with two linebreaks in it:

log='Thought: Oh, I love zombie games! There are so many great ones out there. Let me think about the best recommendation for a zombie game from 2022.
Action: moby_duck_search
Action Input: "zombie game 2022"'

You can see these are the strings you have been seeing in your terminal all along when running an agent!

The second item in each tuple in the intermediate_steps list is the observation, which is basically just the output of whatever tool was called, so in this case, it will be the string we return at the end of our ._run() method in the MobyDuckSearch tool containing the search results. We can loop over the list of tuples and give each of the two entries a name, ‘action‘ and ‘tool_output‘, as there will always be two entries in the tuple.
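To make this concrete, here is a rough sketch of what intermediate_steps could look like after a single tool call (illustrative values only, the real strings will differ):

# illustrative only - the AgentExecutor builds this for us
# (AgentAction is already imported at the top of our file)
intermediate_steps = [
    (
        AgentAction(
            tool="moby_duck_search",
            tool_input="zombie game 2022",
            log='Thought: Let me look for zombie games from 2022.\nAction: moby_duck_search\nAction Input: "zombie game 2022"',
        ),
        '[{"snippet": "Snippet here", "title": "Title here", "link": "https://link.com/"}]',
    )
]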

    def format(self, **kwargs) -> str:
        intermediate_steps = kwargs.pop("intermediate_steps")
        scratchpad = ""

        for action, tool_output in intermediate_steps:
            scratchpad += action.log
            scratchpad += f"\nObservation: {tool_output}\nThought: "

For each action and tool_output in each intermediate step in the list of intermediate step tuples, we concatenate the action.log string to our scratchpad variable. We then add the called tool’s output to the scratchpad, adding in a \n newline before it to make it readable and ending with a \n newline and Thought: to finish out the prompt and prompt the ChatGPT model to continue and give us its next thought in the sequence.

At its core, all we’re doing is prompting text completion, so we deliberately end the text with “Thought: ” to prompt the model to insert its next thought. Let’s continue:

    def format(self, **kwargs) -> str:
        intermediate_steps = kwargs.pop("intermediate_steps")
        scratchpad = ""

        for action, tool_output in intermediate_steps:
            scratchpad += action.log
            scratchpad += f"\nObservation: {tool_output}\nThought: "

        kwargs["agent_scratchpad"] = scratchpad

Remember we have the kwargs dictionary which already contains an input key. Now that we have the scratchpad, we can add it to the kwargs dictionary as well.

        kwargs["tools"] = "\n".join(
            [f"{tool.name}: {tool.description}" for tool in self.tools]
        )

We still need to add the tools to the kwargs dictionary. Reading from the inside out we first loop over each tool in self.tools, and then create a string that contains the tool.name and then the tool.description. Now we have a list holding a string with "name: description" for each tool, and we simply join them together with a \n newline in between each tool, giving us a string with each tool on a new line. We then add this string to the kwargs dictionary under the ‘tools‘ key.

        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])

The last {variable} in our prompt template was the tool names. This one is pretty easy. We loop over each tool in self.tools and get the tool.name in a list. We then join all these tools together in a single string with a comma and space in between each entry. We then add this string to the kwargs dictionary as the ‘tool_names‘ key.
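With only our single moby duck tool in the list so far, the two strings would look roughly like this (illustrative):

# roughly what ends up in kwargs at this point (illustrative)
# kwargs["tools"] == "moby_duck_search: A tool that uses DuckDuckGo Search to search the MobyGames game website. Useful for when you need to answer questions about games. Input should be a search query. "
# kwargs["tool_names"] == "moby_duck_search"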

We now have all the variables we need to format our prompt and fill in the holes in our template. This format method will run before each new ChatGPT call, updating the actions taken and observations so far. We can now return our formatted prompt:

        return self.template.format(**kwargs)

Here we call the .format string method built into Python, not to be confused with the .format class method we are defining right now by the same name. We give the format method the dictionary with all the arguments it needs to fill in the {variable} slots in our template and then return the resulting completed prompt string. Our whole class now looks like this:

class MobyDuckPromptTemplate(StringPromptTemplate):
    template: str
    tools: list[Tool]

    def format(self, **kwargs) -> str:
        intermediate_steps = kwargs.pop("intermediate_steps")
        scratchpad = ""

        for action, tool_output in intermediate_steps:
            scratchpad += action.log
            scratchpad += f"\nObservation: {tool_output}\nThought: "

        kwargs["agent_scratchpad"] = scratchpad
        kwargs["tools"] = "\n".join(
            [f"{tool.name}: {tool.description}" for tool in self.tools]
        )
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
        return self.template.format(**kwargs)

So we have a class we can use as a prompt formatter. Now let’s actually instantiate a new instance of this class:

prompt_formatter = MobyDuckPromptTemplate(
    template=base_agent_template,
    tools=tools,
    input_variables=["input", "intermediate_steps"],
)

We pass in our base_agent_template we wrote in the prompts folder and the list of tools we declared above. The input_variables argument is a list of the variable names that our class’s .format method expects as keyword arguments, so we list the two variables we know it will receive: ‘input‘ and ‘intermediate_steps‘.

Parsing the output

That takes care of the prompt generation part of our agent. Now ChatGPT, or any other LLM for that matter, does not actually have any ability to call a function or use our tools; its only ability is to output text completions. So if our LLM wants to use one of our tools, it will tell us so in textual format. We need to parse the text output the LLM sends back to us. This is where output parsers come in:

class MobyDuckOutputParser(AgentOutputParser):

We define a new class and inherit from the AgentOutputParser we imported from LangChain. Now we have to define our parse method:

class MobyDuckOutputParser(AgentOutputParser):
    def parse(self, llm_output: str) -> AgentAction | AgentFinish:

The input of this method will be self and the llm_output, which is ChatGPT’s output in our case, in string format. We type hint the output of this function to either be an AgentAction or an AgentFinish object. These are just two basic datatypes by LangChain with the AgentFinish basically containing the final answer output and an AgentAction being the ‘intermediate_steps’ we received in our prompt formatter’s format method earlier, containing the action to take and the log.

Why these two objects? We discussed the AgentExecutor class in our format method. It is the thing that passes the ‘input‘ and ‘intermediate_steps‘ into our prompt formatter. An AgentExecutor takes either an AgentFinish object, which terminates the call and returns the final result, or an AgentAction object, which tells the AgentExecutor another action must be taken. It will then call the prompt formatter, passing in the ‘input‘ and ‘intermediate_steps‘ variables, which is why they appeared in our format method. I hope this is all slowly starting to make sense!

Inside our parse method, we first check if the LLM has finished. If it has, it will have "Final Answer:" in its output, because that’s the structure we told it to use in our prompt template.

    def parse(self, llm_output: str) -> AgentAction | AgentFinish:
        if "Final Answer:" in llm_output:
            answer = llm_output.split("Final Answer:")[-1].strip()
            return AgentFinish(
                return_values={"output": answer},
                log=llm_output,
            )

So if we find the string "Final Answer:" in the llm_output, we return an AgentFinish object. If we hover over AgentFinish in our code editor we can see it takes two arguments, a dictionary of return_values and a log. The return_values dictionary is just a dictionary of whatever values we want to return, in this case, we just want to return the final answer.

We create a variable called ‘answer‘ and for its value, we split the llm_output string on the "Final Answer:" string which gives us a list of two strings, the part before the LLM gives its final answer and the part after. We select the last string using [-1] which will select the last index in a list and then use .strip() to get rid of any extra whitespace.

Now we simply return an AgentFinish object with the return_values dictionary containing the ‘output‘ key and the answer variable as the value. We also pass in the llm_output, which was the full LLM’s output, as the log.
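As a quick illustration of that split, here is a made-up output (standalone, not part of our parser code):

# illustrative example of the "Final Answer:" split
llm_output = "Thought: I now know the final answer\nFinal Answer: Zombie Cure Lab is a great pick!"
answer = llm_output.split("Final Answer:")[-1].strip()
print(answer)  # Zombie Cure Lab is a great pick!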

That takes care of the case where our model has finished our problem. Now let’s handle the case where it is still in action. We’ll have to find out which action it wants to take, which should be one of the names of our tools in the tool_names variable. We’ll also have to find out what input arguments it wants to give that tool. As we will receive all this data in string format we’ll have to extract the needed information from the string using regular expressions:

        regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"

Without going too deep into regex, which is a whole course on its own (this regex expression is taken from LangChain’s documentation), let’s take a very brief look. Here we look for the word "Action" optionally followed by \s* white space characters (like space and tab), optionally followed by digits (\d*), optionally followed by white space characters again (\s*), then followed by a colon (:). So this could match "Action:" but also "Action : " or even "Action 1 : ".

The (.*?) is a capturing group that will capture any characters until we get to the \n newline character that follows the group in the regex. So any characters in between "Action:" and the next \n newline character will be stored in a group we can access later to extract the value. We then do basically the same thing again for "Action Input:", allowing for possible spaces and numbers in between, and ending with a second capture group (.*) that will capture whatever comes after the "Action Input:".

Again, regex is a programming language in itself and too much to fully get into here, but we basically capture the action and action input in two groups that we can access later on, using this regular expression pattern. Note this pattern matches the expectation we set with our prompt template that we feed into ChatGPT, where we ask for output using this exact structure.

Now we need to run the regex pattern against our ChatGPT / LLM output:

        match = re.search(regex, llm_output, re.DOTALL)

We call the search method on the ‘re‘ library, passing in our regex pattern, the llm_output we want to look for matches in, and finally the re.DOTALL flag. This flag allows the '.' dot character, which normally matches every possible character except for the newline character, to also match the newline character. If you’re not too sure about regex, don’t worry too much about this detail; regex is a subject for another tutorial course. Continue as follows:

        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2).strip(" ").strip('"')

If we don’t find a match (our match variable is empty), we raise a ValueError for now. Otherwise, we take group 1 of the match, which is the first capture group we talked about in our pattern, so whatever text came after "Action:", and store it in a variable called 'action', calling strip() to get rid of any extra whitespace. We then do the same for the second capture group, storing it in a variable called 'action_input', but this time we strip off any spaces and also any double quotes (") that might surround the string.
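Here is a quick standalone sanity check of that extraction with a made-up llm_output (illustrative only):

# illustrative example of the regex extraction
import re

regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
llm_output = 'Thought: I should search MobyGames.\nAction: moby_duck_search\nAction Input: "zombie game 2022"'
match = re.search(regex, llm_output, re.DOTALL)
print(match.group(1).strip())                # moby_duck_search
print(match.group(2).strip(" ").strip('"'))  # zombie game 2022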

Our parse method had to either return an AgentFinish or an AgentAction. In this case, we’re still in action, so let’s return an AgentAction object:

        return AgentAction(tool=action, tool_input=action_input, log=llm_output)

We pass in the tool and tool_input we just extracted and the original complete ChatGPT output as the log. For clarity, here is the entire finished MobyDuckOutputParser class:

class MobyDuckOutputParser(AgentOutputParser):
    def parse(self, llm_output: str) -> AgentAction | AgentFinish:
        if "Final Answer:" in llm_output:
            answer = llm_output.split("Final Answer:")[-1].strip()
            return AgentFinish(
                return_values={"output": answer},
                log=llm_output,
            )

        regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)

        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2).strip(" ").strip('"')

        return AgentAction(tool=action, tool_input=action_input, log=llm_output)

Now we declare a simple llm_chain, combining our ChatGPT with the prompt_formatter we built above:

llm_chain = LLMChain(llm=chat_gpt_api, prompt=prompt_formatter)

Remember the prompt_formatter is an instance of our MobyDuckPromptTemplate class with our base_agent_template prompt from the prompts folder and everything else it needs to generate the prompt passed in.

Adding it all together: the Agent

Now it’s time to define our actual agent. We’ll be using the LLMSingleActionAgent type we already imported up top. The single action part simply means the agent will take a single action each time, but as we have seen it may run multiple times, which is where the AgentExecutor comes in, but more on that in a moment. First our agent:

moby_duck_agent = LLMSingleActionAgent(
    llm_chain=llm_chain,
    output_parser=MobyDuckOutputParser(),
    stop=["\nObservation:"],
)

We declare a new LLMSingleActionAgent and give it the LLM chain containing our ChatGPT API and the prompt formatter with the prompt and manner to format it. We also give it the output parser we just wrote above, passing in a new instance of the class.

Now what is the stop argument? This is a standard feature of the ChatGPT API that tells ChatGPT: if you run into this series of characters while generating output, stop then and there, that is the end of your output. If we look back at the template we wrote for our agent, each time it runs, ChatGPT is asked to generate the following (straight from our prompt template):

Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action

So ChatGPT chooses which tool to call, e.g. Action: moby_duck_search. Then it chooses the Action Input, e.g. "zombie game 2022". Following our template, it would then generate the next line starting with "Observation:", but after "Observation:" WE, and not ChatGPT, need to insert the result of calling the tool. This is why we tell ChatGPT to stop generating there.
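In other words, with stop=["\nObservation:"] a single generation halts right before the observation line, so the raw completion we get back looks roughly like this (illustrative):

Thought: I should search MobyGames for zombie games from 2022.
Action: moby_duck_search
Action Input: "zombie game 2022"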

This is why we wrote the following line of code in our prompt formatter:

# snippet from our MobyDuckPromptTemplate class's .format() method #
scratchpad += f"\nObservation: {tool_output}\nThought: "

Because here we add the tool output after Observation: and then we can send the whole thing back to ChatGPT to continue generating, either calling another tool or giving us the final answer.

The final step: an AgentExecutor

Okay so now we have an agent with our ChatGPT API, prompt creation functionality, output parser, and stop argument. We’re almost done! We just need to create an AgentExecutor to actually run our agent. Before we do, let’s discuss what the AgentExecutor actually is as we’ve mentioned it several times already and promised a proper explanation.

The AgentExecutor is basically a loop that manages executing the Agent. For every loop, it will pass the user input query and the previous steps that have happened so far to the agent. If the agent returns an AgentFinish object, the AgentExecutor will return the end result directly to the user, and if the Agent returns an AgentAction, the AgentExecutor will call that tool and get the Observation. The loop will now repeat, passing the new Observation back into the agent along with the previous steps that have happened so far, until an AgentFinish object is returned.

So the AgentExecutor is basically just the execution loop that takes all the steps we’ve built so far and will run them together in a loop in the correct order.
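If you were to sketch that loop in plain Python, it would look roughly like this (a simplified conceptual sketch, not LangChain’s actual implementation):

# conceptual sketch of what an AgentExecutor does - not the real LangChain code
def run_agent_loop(agent, tools, user_input):
    tool_map = {tool.name: tool for tool in tools}
    intermediate_steps = []
    while True:
        # the agent formats the prompt, calls the LLM, and parses the output
        decision = agent.plan(intermediate_steps, input=user_input)
        if isinstance(decision, AgentFinish):
            # final answer reached, return it to the user
            return decision.return_values["output"]
        # otherwise it's an AgentAction: call the requested tool
        observation = tool_map[decision.tool].run(decision.tool_input)
        intermediate_steps.append((decision, observation))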

agent_executor = AgentExecutor.from_agent_and_tools(
    agent=moby_duck_agent, tools=tools, verbose=True
)

We just pass in our agent and tools and set verbose to True as we want to see what it’s doing. Now let’s finally run our own agent!

agent_executor.run("Can you recommend me a zombie game from the year 2022?")

And my output is:

> Entering new AgentExecutor chain...
Thought: Oh, I love zombie games! There are so many great ones out there. Let me think about the best zombie game from 2022.
Action: moby_duck_search
Action Input: "best zombie game 2022"

Observation:[{list of search result objects for query "best zombie game 2022"}]
There are three great zombie games from 2022 that I found: Zombie Cure Lab, Zombie Survivors, and SurrounDead. Let me think about which one to recommend.
Action: moby_duck_search
Action Input: "Zombie Cure Lab"

Observation:[{list of search result objects for query "Zombie Cure Lab"}]
Zombie Cure Lab is a game where you manage a lab and try to cure the zombie virus. It has day and night shifts, and you need to keep your workforce happy to prevent outbreaks. You also build defenses to keep zombies out at night. It sounds like a unique and challenging game. I recommend Zombie Cure Lab as the best zombie game from 2022.

Final Answer: The best zombie game from 2022 is Zombie Cure Lab.

> Finished chain.

Our agent speaks in a gaming enthusiast voice, as that’s what we instructed it to do. It calls our moby_duck_search tool and passes in a search query. Notice it follows our structure exactly as we instructed in the prompt template we wrote.

We then go into the second ChatGPT call after our AgentExecutor has called our tool and added the search results after "Observation:". Our agent is apparently most impressed with a particular search result about a game called "Zombie Cure Lab" and wants to do another DuckDuckGo search on this game. The AgentExecutor obliges, calls the tool, and feeds the response back into the next call to ChatGPT, which now concludes that Zombie Cure Lab is the best zombie game from 2022.

We can argue if it is objectively the best game or not, but ChatGPT gave us a pretty good zombie game recommendation from 2022 based on autonomously carried out research, that’s pretty darn cool!

Now let’s take this one last step further before we end this tutorial part, it’s already gotten really long anyway. I’ll cut you some slack in part 6, I promise!

So say I want to ask the agent for more information about this game, and I’m a lazy user so I just word my next question like this:

"What is the game about?"

Now if we send this query to our agent executor, it will fire up a new agent and load it up. The new agent will have no idea what game we are talking about, and we’re in trouble. For this, our agent will need the final step towards intelligence: memory!

Adding memory to our Agent

First, go back into your prompts folder and open the base_agent_template.py file. We’ll need to add a second prompt version that is slightly different and allows for agent memory. Below your existing 'base_agent_template' variable, add a second variable to this file (you can just copy it as it’s almost the same):

base_agent_template_w_memory = """
Answer the following questions as best you can, but speaking as fanatic gaming enthusiast. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Remember to speak as a fervent gaming enthusiast when giving your final answer.

Previous conversation history:
{history}

New question: {input}
{agent_scratchpad}
"""

As you can see, we just added the Previous conversation history: {history} section to our template-with-memory version and reworded the question line to New question: {input}, simple enough. Save and close this file, then open the __init__.py file in the prompts folder and add this new variable to the existing import statement:

from .base_agent_template import base_agent_template, base_agent_template_w_memory

Save and close this file as well and now let’s get back to our '1_building_an_agent.py' file with all our stuff in it. At the top add 2 extra imports to the already existing list of imports:

from prompts import base_agent_template_w_memory
from tools import InternetTool

So we import the prompt-template-with-memory version we just wrote (you can of course combine this import statement with the other prompt import statement if you want), and we import the InternetTool we wrote in the previous part and copied over to our tools folder at the beginning of this tutorial. Remember the InternetTool we wrote allows the agent to get the page text for a certain URL anywhere on the internet.

Let’s add the internet tool to our tools:

internet_tool = InternetTool()
tools.append(
    Tool(
        name="visit_specific_url",
        func=internet_tool.run,
        description=(
            "Useful when you want more information about a page by opening it's url on the internet."
            "Input should be a valid and full internet url with nothing else attached."
        ),
    )
)

We created a new instance of the InternetTool and then just appended a new Tool object to the already existing tools list. Note how we did not use the internet_tool.name and internet_tool.description defaults in our class but wrote a custom name and description this time.

Now we just retrace our final steps to build a second agent. I’m just going to keep coding in this same file below all the already existing stuff as this tutorial is already very long and I want to focus on the learning concepts here and not software project structuring best practices.

prompt_formatter_w_memory = MobyDuckPromptTemplate(
    template=base_agent_template_w_memory,
    tools=tools,
    input_variables=["input", "intermediate_steps", "history"],
)

So we declare a second prompt_formatter using our MobyDuckPromptTemplate class again. This time the list of input_variables also includes history and we use our base_agent_template_w_memory as the template.

Now we combine our ChatGPT API and the prompt formatter into a simple chain like we did before:

llm_chain_w_memory = LLMChain(llm=chat_gpt_api, prompt=prompt_formatter_w_memory)

And we declare our new agent with memory just like before:

moby_duck_agent_w_memory = LLMSingleActionAgent(
    llm_chain=llm_chain_w_memory,
    output_parser=MobyDuckOutputParser(),
    stop=["\nObservation:"],
)

Now we’ll actually need to add the memory. The AgentExecutor class that runs the loop will take our memory object and integrate it into the Agent Execution loop for us, but first, we need some memory. Add the following import to the top of the file:

from langchain.memory import ConversationBufferWindowMemory

ConversationBufferWindowMemory will basically just hold a list of strings with whatever the “Human” asked and what the “AI” answered: a conversation history similar to the one we kept in the “function calls and embeddings” Finxter Academy course. It will store the already completed exchanges and feed them back into the loop when we ask another question after the first one.
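If you’re curious what that looks like, here is a quick standalone sketch (separate from our agent file, assuming the standard memory methods):

# standalone illustration of what the memory stores
from langchain.memory import ConversationBufferWindowMemory

demo_memory = ConversationBufferWindowMemory(k=10)
demo_memory.save_context(
    {"input": "Can you recommend me a zombie game?"},
    {"output": "I recommend Zombie Cure Lab!"},
)
print(demo_memory.load_memory_variables({}))
# -> {'history': 'Human: Can you recommend me a zombie game?\nAI: I recommend Zombie Cure Lab!'}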

Go ahead and instantiate a memory object:

memory = ConversationBufferWindowMemory(k=10)

We pass in the k argument, which is the number of previous interactions to keep in memory. Now we can finally create our AgentExecutor:

agent_executor_w_memory = AgentExecutor.from_agent_and_tools(
    agent=moby_duck_agent_w_memory, tools=tools, verbose=True, memory=memory
)

This time we simply pass in an extra argument called memory. Now to test this I’m just going to ask two questions ahead of time. If you want to do this properly, you should write a loop that asks the user for input and then calls .run() on the agent executor, allowing the user to ask a new question after each AgentExecutor chain finishes running (a minimal sketch of such a loop follows right after the hardcoded questions below). But as this tutorial is already so long, let’s just cheat a little bit and hardcode two questions:

agent_executor_w_memory.run("Can you recommend me a zombie game from the year 2022?")
agent_executor_w_memory.run("Can you give me more information on that first game?")

I’m going to assume the first question will net at least one game recommendation, in which case the second question asking for more information on that first game will make sense. Make sure you comment out any .run() statements you still have on the old agent executor without memory up above, or they will also run again. Now let’s run our file with these two questions:

> Entering new AgentExecutor chain...
Thought: Oh, I love zombie games! Let me think of a good recommendation from 2022.
Action: moby_duck_search
Action Input: "zombie game 2022"

Observation:[{...list of search result objects for query "zombie game 2022"}]
There are a few great zombie games from 2022 that I found. One recommendation is "Zombie Apocalypse: The Last Defense." It's a tower defense game where you place explosive mines to stop the zombies. You can also buy allies to help you in the war against the undead. It has a wide range of enemies, 15 power-ups, and a beautiful visual effects. You can even drive a car and a war tank! You should definitely check it out!

Final Answer: I recommend "Zombie Apocalypse: The Last Defense" as a great zombie game from 2022.

> Finished chain.

We got another good recommendation. Now the second question will run asking for more information which triggers the AgentExecutor chain again and the agent will know from memory what game we’re talking about.

> Entering new AgentExecutor chain...
Thought: I need to find more information about "Zombie Apocalypse: The Last Defense" to provide a detailed response.
Action: visit_specific_url
Action Input: https://www.mobygames.com/game/zombie-apocalypse-the-last-defense

This time it uses our internet tool from part 4 to visit the specific url.

Observation: ...Loads of text from the MobyGames page for "Zombie Apocalypse: The Last Defense"

Final Answer: "Zombie Apocalypse: The Last Defense" is an action strategy/tactics game released in 2022 on Windows. It features real-time strategy.... (etc, a great summary of the features and details)

> Finished chain.

That’s awesome! We can now ask follow-up questions. We did it, we built our own agent step by step! Some of this was reinventing the wheel a little bit, but this tutorial was purposefully so to give you a deeper understanding of the inner workings of an agent. I hope it all seems a lot more understandable and logical and less magical to you now and gives you more insight into how this all works together.

As a small caveat, you might have noticed your model was not 100% reliable when running and could occasionally have a parsing error or trouble calling the tools/functions. This is why, again, in practice, I recommend you use the OpenAI agent as much as possible, because it’s based on OpenAI’s function calls and uses the 'gpt-3.5-turbo-0613' model.

This way you get a high-quality model which is specifically trained to handle calling functions, making it more robust and reliable than a lot of the open-source models out there. (Of course, you can also use the function-calling GPT-4 version.)

Either way, the underlying principles are the same. That’s the end for part 5 of the tutorial series, this one was pretty long, but I’ll see you soon in part 6 where we’ll have a look at LangChain Expression Language and have some fun building an LLM chain that corrects its own mistakes!


This tutorial is part of our original course on Python LangChain. You can find the course URL here: πŸ‘‡

πŸ§‘β€πŸ’» Original Course Link: Becoming a Langchain Prompt Engineer with Python – and Build Cool Stuff πŸ¦œπŸ”—