Welcome back to part 3, where we’ll take a look at LangChain agents. This is where LLMs like ChatGPT start to get really cool, and also a bit scary at the same time. We’re going to give decision-making power to ChatGPT and let it decide which action to take. An action can be using a specific ‘tool’ and checking its output, or returning a response to the user.
‘Tools’ is the name LangChain uses for what are essentially function calls: each one runs a specific action for us and returns the result. We will use pre-built tools in this part and make our own in the next part, so you will gradually get a good understanding of how they work.
So far we’ve been using a predetermined chain of calls that we defined ahead of time. But what if we need to create a chain dynamically based on the user’s input, and we don’t know ahead of time exactly which ‘path’ of LLM calls should be made? This is where LangChain agents come in. An agent has access to a suite of tools, and depending on the user’s input it may call none of them, one of them, or even several in a row, until it can reason its way to the answer.
Our coverage of agents will run through parts 3, 4, and 5, where we’ll create our own agent and take a deeper look at its inner workings. So if anything seems confusing or vague, hang in there, as we’ll go into more and more detail as we go along. I feel this is preferable to dumping a huge load of theory on you all at once before we ever get started. So let’s get started!
Creating a simple Python agent
Let’s start by creating a new folder called ‘3_Agents_and_tools’ in our base project directory, and inside that folder create a new file called ‘1_python_agent.py’.
Finx_LangChain
├── 1_Summarizing_long_texts
├── 2_Chat_with_large_documents
├── 3_Agents_and_tools
│   └── 1_python_agent.py
└── .env
Open up this file and as always we’ll start with our imports:
from decouple import config
from langchain.agents.agent_toolkits import create_python_agent
from langchain.agents.agent_types import AgentType
from langchain.chat_models import ChatOpenAI
from langchain.tools.python.tool import PythonREPLTool
We’ll be using create_python_agent to… you guessed it, create a Python agent. AgentType will let us choose an agent type. Decouple/config and ChatOpenAI are familiar by now. PythonREPLTool is one of the predefined tools that LangChain comes with. It’s a tool that will run a Python REPL session and return the output. We’ll see how this works in a moment.
Now we need to create an agent and give it our Python REPL Tool, so it can use this to execute code. At first, this may all seem a bit magical, but we’ll go into more details about Tools and Agents step by step, so hang in there for now and just enjoy the magic. Let’s set up our ChatGPT API first:
chat_gpt_api = ChatOpenAI(
    openai_api_key=config("OPENAI_API_KEY"),
    temperature=0,
    model="gpt-3.5-turbo-0613",
)
We initialize the ChatGPT API exactly as before. The one thing we’ve done differently is to specify a very specific version of ChatGPT, namely 'gpt-3.5-turbo-0613'. This is a version of ChatGPT that has been specifically trained to handle function calls. For more details on how function calls work, I refer back to my OpenAI function calls and embeddings tutorial series, also available on the Finxter Academy, as I don’t want to repeat the same material in multiple courses and waste your time!
LangChain has several types of agents, and one of them is the ‘OpenAI functions’ agent. This one is made specifically for OpenAI’s function-call models like the one mentioned above and is not for use with other LLMs. The advantage is that this agent uses OpenAI function calls under the hood, giving us a very strong and reliable system in the implementation, but with the simplicity of LangChain on the surface. By combining this agent with the function-call-trained GPT model we take advantage of the model’s specific training to reliably handle function calls.
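To make “function calls under the hood” a bit more concrete, here is a rough, hand-written sketch of the kind of function definition an OpenAI function-calling model expects: a name, a description, and a JSON Schema for the arguments. LangChain builds something along these lines from each tool automatically; the exact schema it generates may differ, so treat this purely as an illustration.

# Illustration only: what an OpenAI function definition roughly looks like.
# The actual schema LangChain derives from PythonREPLTool may differ.
python_repl_function = {
    "name": "Python_REPL",
    "description": "A Python shell. Use this to execute python commands.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "A valid python command to execute.",
            }
        },
        "required": ["query"],
    },
}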
Now let’s set up our agent:
agent = create_python_agent(
    llm=chat_gpt_api,
    tool=PythonREPLTool(),
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    agent_executor_kwargs={"handle_parsing_errors": True},
)
We use the create_python_agent function to create our agent, passing in our initialized chat_gpt_api and the PythonREPLTool we imported. Notice that we pass a new instance of this PythonREPLTool by calling it with brackets (). We set verbose to True, which will show us more output in the terminal, including the agent’s reasoning process. For the agent_type we use AgentType.OPENAI_FUNCTIONS, for the reasons explained above.
The last argument, agent_executor_kwargs, allows us to pass extra options to the agent as a dictionary. In this case, we set the "handle_parsing_errors" key to True. If the model requests a function call but the function name or arguments are incorrect or unparseable, an error occurs. This option passes a generic error message, along with the parsing error text, back to ChatGPT so the model has a chance to recover. I’ve rarely experienced this with ChatGPT’s function calls as they are quite robust, but just in case it does output something that cannot be parsed into a valid function call, this gives it an opportunity to correct itself.
Running our first agent
So let’s take our agent for a spin. Add the following to run your agent:
agent.run("Please print hello A.I. world to the console.")
Now go ahead and run your file and watch the console!
> Entering new AgentExecutor chain...

Invoking: `Python_REPL` with `print('hello A.I. world')`

Python REPL can execute arbitrary code. Use with caution.
hello A.I. world
hello A.I. world

> Finished chain.
First, you see the agent chain has started. Its first and only reasoning step is that it should invoke the Python_REPL tool to solve this problem, passing in the argument print('hello A.I. world'). The Python_REPL tool then executes this Python code and returns the output, which is then returned to the user. The agent then finishes its chain and returns. The reason you see the final output twice is that the REPL executed a print statement, after which the agent also returned the final answer to us.
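If you’re curious where that output comes from, you can run the PythonREPLTool directly, outside of any agent. This is a quick sketch showing that the tool simply executes the code and hands back whatever was printed:

# Run the REPL tool on its own to see the raw tool output the agent receives.
repl_tool = PythonREPLTool()
output = repl_tool.run("print('hello A.I. world')")
print(repr(output))  # the captured stdout, e.g. 'hello A.I. world\n'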
Again, we’ll get into more detail on how these tools and agents work and how the agent knows which tools to pick in the coming parts. For now, let’s try one more call to our agent.
agent.run( "Please write a function to calculate which weekday (monday, tuesday, etc.) a given date is. The date should be in format 'YYYY-MM-DD' and the function should return a string with the weekday name. Return the weekday for the date '2025-06-13'." )
Now let’s give this a spin!
Invoking: `Python_REPL` with `import datetime

def get_weekday(date):
    year, month, day = map(int, date.split('-'))
    weekday = datetime.date(year, month, day).strftime('%A')
    return weekday

get_weekday('2025-06-13')`

The weekday for the date '2025-06-13' is "Friday".

> Finished chain.
We can see again that the agent has decided to call its Python_REPL tool to solve this question. It passed in an import, wrote a function, and then called that function with our date. Now that it has its answer, it uses ChatGPT to return a natural language response to us: The weekday for the date ‘2025-06-13’ is “Friday”.
This is the basic idea of an agent. It can reason about what to do next, and it has one or more tools, which are basically just functions, that it can decide to call or not call depending on the situation. So the first LLM call above decided to call the Python_REPL tool; this decision came from ChatGPT. Then the output of the Python_REPL tool was returned to ChatGPT. Since it now had its answer, there was no need for another function call and the only logical decision was to return the correct answer to the end user, which it did.
While we could write chains of OpenAI function calls to achieve this same result, as we did in the ‘function calls and embeddings’ tutorial series, this takes a lot of work out of our hands and provides easy integration with predefined outside tools. If we really need more control at a lower level, we can always go back to defining our own chains of function calls with the openai module.
Creating a more complex agent
So let’s create a slightly more complex agent, and check out some more of the built-in tools in the process. Go ahead and close this file and create a new file named '2_internet_search_agent.py' in the same folder.
Finx_LangChain
├── 1_Summarizing_long_texts
├── 2_Chat_with_large_documents
├── 3_Agents_and_tools
│   ├── 1_python_agent.py
│   └── 2_internet_search_agent.py
└── .env
First of all, open a terminal and run the following command to install two Python packages we’ll use:
pip install duckduckgo-search wikipedia
We’ll be using duckduckgo-search to search the internet, and wikipedia for specific Wikipedia lookups. I chose the DuckDuckGo search engine over a fully-fledged Google API for this tutorial series because I don’t want to bother you with signing up for a Google search API like ‘Serp API’, which requires email and phone number verification and is a bit of a nuisance. If you need it later, the process is the same. We’ll just use DuckDuckGo for now, which does not require any kind of sign-up.
Inside your '2_internet_search_agent.py' file let’s add our imports:
from decouple import config
from langchain.agents import Tool, initialize_agent
from langchain.agents.agent_types import AgentType
from langchain.chat_models import ChatOpenAI
from langchain.utilities import DuckDuckGoSearchAPIWrapper, WikipediaAPIWrapper
We import our familiar decouple/config, AgentType, and ChatOpenAI. The Tool class will allow us to define a tool, and the initialize_agent function will let us initialize an agent by passing in the needed parameters. DuckDuckGoSearchAPIWrapper and WikipediaAPIWrapper are two classes that contain the functionality our tools will use to search the internet and Wikipedia.
Some basic setup:
openai_api_key = config("OPENAI_API_KEY")
wiki = WikipediaAPIWrapper()
duck = DuckDuckGoSearchAPIWrapper(region="en-us", max_results=10)
We load our OpenAI API key into a variable and create instances of our imported Wikipedia and DuckDuckGo classes, passing a locale and a maximum number of results to the DuckDuckGo class. Both of these are already fully functional tools, so let’s test them out. Add the following to your file and run it:
print(wiki.run("Donald Trump"))
So if we ask Wikipedia about Donald Trump we get a bunch of info from his Wikipedia page, pretty cool! Now get rid of this print statement and change it to the following:
print(duck.run("eiffel tower"))
Now when we run it we get a list of search results! Go ahead and comment out or remove this print statement as well. Notice that in both cases we called a .run method to use the tool; this is part of the LangChain tool base structure. We now want to arm our reasoning LLM agent with these tools so it can use them to answer questions. Notice the similarity to OpenAI’s function calls here.
Building our list of tools
Let’s define our tools to build up our toolkit:
tools = [
    Tool(
        name="wikipedia",
        func=wiki.run,
        description="Useful for when you need detailed information on a topic from wikipedia.",
    ),
    Tool(
        name="duckduckgo",
        func=duck.run,
        description="Useful for when you need to search the internet for something another tool cannot find.",
    ),
]
We created a list with two Tool instances. Note how each tool takes three things: a name, a function to run, and a description. The name is just the label we give the tool, the function is what runs when the tool is called, and the description tells the LLM what it can achieve by using this tool / calling this function. The LLM decides whether or not to call a certain function based on the description you provide, so if you have trouble with a function call not triggering, or triggering when it shouldn’t, tweaking your description can help.
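Because the name and description are all the LLM gets to see about each tool, it’s worth printing exactly what you’re handing over. A quick sketch:

# Inspect what the agent will be told about each tool.
for tool in tools:
    print(f"{tool.name}: {tool.description}")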
Now we need to initialize an agent and give it these tools. Add the following to your file:
chat_gpt_api = ChatOpenAI(
    openai_api_key=openai_api_key,
    temperature=0,
    model="gpt-3.5-turbo-0613",
)

agent = initialize_agent(
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    llm=chat_gpt_api,
    tools=tools,
    verbose=True,
    max_iterations=10,
)
First, we set up the ChatGPT API as before, and then we initialize an agent in a similar fashion to the previous part. We give it our ChatGPT API as the LLM, pass in the list of tools we just defined, and set verbose to True because we want to see all its reasoning steps. We set max_iterations to 10, which means it will try to solve the problem in at most 10 steps.
ReAct agents
The agent type we chose this time is Zero Shot React Description. This is a type of agent that can be used with any LLM, as it is not tied to OpenAI’s function calls. We’re deliberately using a different agent here to explore the available options. The ‘Zero Shot’ part means that this agent has no conversational memory (we’ll make one with memory later), so if we ask it a second question afterward, it will have no memory of the first one.
The ‘ReAct’ part does not refer to the popular frontend JavaScript library React. It is a prompting framework developed by researchers that has proven effective for these kinds of tasks. Without getting into the details of the research paper, the basic idea is that the model does Reasoning (the ‘Re’ part) and, based on that reasoning, takes Action (the ‘Act’ part). Then, based on the result of the action, its Observation, it goes and reasons again.
One interesting point here is that if we simply ask an LLM a question, it tends to answer first and come up with the reasoning for why this is the correct answer afterward, producing a plausible-sounding explanation even when the answer is wrong, because it has already committed to that direction. Researchers found that by asking the model to reason first and give its answer only AFTER the reasoning, accuracy went up significantly.
This is a really interesting finding and something to keep in mind for all your future prompt engineering adventures. You’ll see it in the agent’s output later, where it constantly starts by reasoning about what it should do next. You can read more on the ReAct paradigm, with a link to the full paper, here if you’re interested: https://react-lm.github.io/
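To give you a rough idea ahead of time, the prompt behind a ReAct agent looks something along these lines. This is a simplified, hand-written sketch, not LangChain’s actual template (we’ll look at the real prompts in a later part):

# Sketch of the ReAct prompting pattern: reason first, then act, then observe.
react_prompt_sketch = """Answer the following question as best you can. You have access to these tools:

{tool_descriptions}

Use the following format:

Question: the input question
Thought: reason about what to do next
Action: one of the tool names
Action Input: the input to the tool
Observation: the result of the action
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the answer to the original question

Question: {question}"""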
Testing our agent
Now let’s test our agent by asking about the current CEO of Twitter. This fact changed after Elon Musk bought the company and renamed it to X, which is something ChatGPT would not know from its base knowledge since it has no awareness of recent events, so a correct answer cannot be coming from ChatGPT alone. Note that we’re deliberately asking for the CEO of Twitter and not the new name X, to make the question even harder for our agent. Add the following line to your file and run it:
agent.run("Who is the current CEO of Twitter?")
Here is what I got:
I need to find out the current CEO of Twitter.
Action: duckduckgo
Action Input: "current CEO of Twitter"
Observation: ** Several search results with multiple names coming up in different search results. Cut out for brevity **
Thought: There are conflicting results from the search. I should try using Wikipedia for more reliable information.
Action: wikipedia
Action Input: "Twitter"
Observation: Page: Twitter
Summary: X, formerly Twitter, ... ** article cut out for brevity **

Page: Twitter, Inc.
Summary: Twitter, Inc. was an American social media company ... ** article cut out for brevity **

Page: List of most-followed Twitter accounts
Summary: This list contains the top 50 accounts ... ** article cut out for brevity **
Thought: The information from Wikipedia is more reliable and consistent. I now know the final answer.
Final Answer: The current CEO of Twitter is Linda Yaccarino.
This is the ReAct-style agent in action. We can see that it first reasoned it should find the current CEO of Twitter and that an internet search would be a good idea, so it used DuckDuckGo. We deliberately formulated a question that would return conflicting results here, to show how the agent can reason about this and decide to use Wikipedia instead. It reads several Wikipedia pages, and after this observation it is asked to reason again. Remember, the point is that it always reasons first and answers later. This time its reasoning leads it to the conclusion that it now has the correct answer, so it stops the ReAct cycle and gives us the final answer.
Now if that’s not cool I don’t know what is! If you’re curious about the underlying prompts that make this reasoning style work, we’ll take a look at the basics of this in a later part.
Where things might go wrong
For now, let’s try one more question. This time we’ll ask a two-part question that requires two research steps. Make sure you comment out the question about the CEO of Twitter above, or it will run again as well. Add the following line to your file and run it:
agent.run("Who was the singer of the song 'Hotel California' and when was he born?")
Here is what I got:
I need to find out the name of the singer of the song 'Hotel California' and their birthdate.
Action: duckduckgo
Action Input: "singer of Hotel California"
Observation: "......"
Thought: I still need to find out the birthdate of the singer of the song 'Hotel California'.
Action: wikipedia
Action Input: "Hotel California"
Observation: Page: Hotel California
Summary: "......"

Page: Hotel California (Eagles album)
Summary: "......"

Page: Disneyland Hotel (California)
Summary: "......"
Thought: I now know the final answer.
Final Answer: The singer of the song 'Hotel California' is Don Henley.
See how it forgot all about the birthdate? It reasoned that it still needed the birthdate, but then never followed up on it. Also, when running this agent multiple times it will sometimes fail to call the functions correctly, saying it wants to call a nameless function.
The solution
So what’s going on here? Well, I don’t actually recommend you use the Zero Shot React Description agent that much, but its very verbose output is great for understanding the ReAct paradigm and how it works. I recommend you use the OpenAI agents as much as possible: they use the underlying function calls mechanism, which is a bit less verbose about the agent’s reasoning process but more reliable, as OpenAI has probably spent a fortune fine-tuning it. Go up to where we initialized our agent in the code and find the following block:
agent = initialize_agent(
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    llm=chat_gpt_api,
    tools=tools,
    verbose=True,
    max_iterations=10,
)
Change ONLY the agent type as follows:
agent = initialize_agent(
    agent=AgentType.OPENAI_MULTI_FUNCTIONS,  # Change only this line
    llm=chat_gpt_api,
    tools=tools,
    verbose=True,
    max_iterations=10,
)
Now go ahead and run your agent again with the Hotel California singer’s date of birth question:
> Entering new AgentExecutor chain...

Invoking: `wikipedia` with `{'tool_input': 'Hotel California'}`

Page: Hotel California
Summary: "......"

Page: Hotel California (Eagles album)
Summary: "......"

Page: Disneyland Hotel (California)
Summary: "......"

Invoking: `wikipedia` with `{'tool_input': 'Don Henley'}`

Page: Don Henley
Summary: "......"

Page: Don Henley discography
Summary: "......"

Page: Don Felder
Summary: "......"

The singer of the song "Hotel California" is Don Henley. He was born on July 22, 1947.

> Finished chain.
That’s more like it! Notice how this agent doesn’t output the reasoning process quite as verbosely, which is why I wanted to show you the Zero Shot React Description agent first, so you have a good feel for what’s going on. But as you can see, for this use case the OpenAI functions agent is very reliable and effective. Just be aware that there are different agents and that you can play around to see which works best for your use case!
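If you want to see which agent types you can play around with, AgentType is just an enum, so you can print its members:

from langchain.agents.agent_types import AgentType

# List all built-in agent types by their string value.
for agent_type in AgentType:
    print(agent_type.value)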
More tools, more power!
So now we have an agent with multiple tools in its toolbelt, and it can call several tools in succession before answering, just like we did with the function calls in the ‘function calls and embeddings’ tutorial series. Now let’s make our agent even more powerful by giving it another tool to work with!
At the top of your '2_internet_search_agent.py' file add the following import:
from langchain.utilities.dalle_image_generator import DallEAPIWrapper
Then scroll back down to the bottom and initialize DALL·E:
dalle = DallEAPIWrapper(openai_api_key=openai_api_key, n=1, size="512x512")
print(dalle.run("a cat dancing in the desert, style: cartoony"))
As DALL·E is OpenAI’s image-generating AI, we can use our OpenAI API key. The n argument refers to the number of images we want it to output, and the size speaks for itself. Now let’s test-run the tool: go ahead and run your Python file (but make sure to comment out the agent.run statements above to avoid running all of them again as well).
The output in your terminal should be a link. Click it to see your DALL·E image. Mine is pretty fun!
Now let’s define a new Tool:
dalle_tool = Tool(
    name="dalle",
    func=dalle.run,
    description="Useful for when you need to generate images of something.",
)
tools.append(dalle_tool)
We create another Tool instance just like we did before, and then use .append to add it to the existing tool list. As a quick side note, do be careful about generating very large numbers of images, as the image models tend to be a bit more expensive ($0.016 – $0.020 per image), and if you need high-quality AI images there are frankly better options than the DALL·E API out there right now. Now let’s initialize an agent V2:
agent_v2 = initialize_agent(
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    llm=chat_gpt_api,
    tools=tools,
    verbose=True,
    max_iterations=10,
)
I’m going to try the ReAct agent again for this one so we can see the reasoning process. We pass in our ChatGPT API and our tools, set verbose to True, and max_iterations to 10. Now let’s test our new agent:
agent_v2.run("I would like an image of a flying spaghetti monster.")
There you go! Now you see we can use multiple tools and combine them all to make a powerful AI agent. There are many tools available for working with all sorts of things, such as the computer file system, searching specialized knowledge websites, connecting to Google Drive, and much more.
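Many of those pre-packaged tools can be loaded by name with LangChain’s load_tools helper; the exact set of available tool names depends on your LangChain version, so treat this as a small sketch rather than an exhaustive list:

from langchain.agents import load_tools

# 'llm-math' wraps the LLM in a calculator-style tool; some tools need an llm
# argument to be loaded, others don't.
extra_tools = load_tools(["llm-math"], llm=chat_gpt_api)
tools.extend(extra_tools)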
That’s it for part 3. Sometimes we will need more control than a predefined out-of-the-box tool made by someone else will offer though. This is why in the next tutorial part we’ll be upping our game and taking a look at building our own tools from scratch!
This tutorial is part of our original course on Python LangChain. You can find the course URL here:

Original Course Link: Becoming a Langchain Prompt Engineer with Python – and Build Cool Stuff