Hi and welcome back to part 4 of this tutorial series where we’ll once again be taking it up a step. We’ll basically compress the Agent and the Executor into a single node and then have multiple of these ‘agent and executor’ nodes inside of a team working together. First, we’ll cover the basic idea and do some short work to prepare the extra functions we will need, and then we’ll continue into the next part where we’ll put it all together into a multi-agent team that does the work for us while we sit back and relax!
Advantages of multi-agent teams
So why is this multi-agent thing useful in the first place? We can simply give one agent multiple tools, right? Well, up to a point. If you give a single agent a prompt to first do thing A by calling `function_a`, then do thing B by calling `function_b`, followed by either `function_c` or `function_d` depending on the output of `function_b`, then the prompt of this agent is going to become a mess and it will also be fairly unreliable. The main advantages of multi-agent teams for more complex setups are:
- Grouping responsibilities gives better results as agents will tend to perform better when they have a more focused task rather than a dozen tools and responsibilities to choose from.
- Separate prompts will give better results as each prompt can have its own examples of exactly what we want it to do and how. We can even have a specific agent run on a fine-tuned version of ChatGPT that is specifically trained and optimized for that node’s task.
- Easier development, as you can work on, test, and evaluate each agent in isolation without it being connected to and breaking stuff elsewhere in the chain when you make improvements. It’s also easier to conceptually wrap your brain around the system as a whole.
There are many possible slight variations for how this could be implemented. You could have a shared scratchpad, for example, so that all of the agents can see what thought processes and work the other agents have done. The downside is that this is very verbose, and the amount of information exchanged may be pointlessly large.
Alternatively, you could have them be isolated as single LLM calls without a strong interconnection that basically operate independently but they are merely strung together in a chain. This may be a bit too isolated though.
The example we’ll be looking at here lies somewhere in the middle where we will have independent fully-fledged agents that have their own scratchpad and ability to call tools if needed but the result of each agent doing its independent work gets stored in a shared state object like we had in the previous part.
This will be supervised by a sort of ‘team supervisor’ node we’ll call an ‘agent supervisor’ that will use this overall state object with the work done so far to decide what happens next and who to call. The basic idea looks like this:
The user sends a query to the Team Supervisor. The Team Supervisor has a team of agents and decides who it should call on next to complete some work; it can choose any of the agents at any point. Every agent points back to the Team Supervisor, so after each step the Team Supervisor again decides which agent is next, or whether the work has been completed, in which case it returns to the end user.
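To make the routing idea concrete, here is a minimal, framework-free sketch of that supervisor loop in plain Python. The agent names, the shared-state keys, and the decision rules are all made up for illustration; in the real setup LangGraph manages the graph and an LLM makes the supervisor’s choice:

```python
# Two dummy "agents" that each do some work and write it into a shared state.
def research_agent(state: dict) -> dict:
    state["research"] = "notes on the destination"
    return state


def writer_agent(state: dict) -> dict:
    state["draft"] = f"itinerary based on {state['research']}"
    return state


AGENTS = {"researcher": research_agent, "writer": writer_agent}


def supervisor(state: dict) -> str:
    # Decide who works next based on what is already in the shared state;
    # an LLM would make this choice in the real implementation.
    if "research" not in state:
        return "researcher"
    if "draft" not in state:
        return "writer"
    return "FINISH"


def run_team(query: str) -> dict:
    state = {"query": query}
    # Each agent returns control to the supervisor, which picks the next step.
    while (choice := supervisor(state)) != "FINISH":
        state = AGENTS[choice](state)
    return state
```

Calling `run_team("plan a trip to Paris")` bounces between the supervisor and the agents until the supervisor decides the work is done, which is exactly the control flow in the diagram above.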
Ours will look slightly different but we’ll build a diagram for it as we go along.
Tavily API
Before we jump in we’ll need to add another API key to our `.env` and `setup_environment.py` files. We will be using the Tavily API lightly during this part and again in the next part of the series. Go to https://app.tavily.com/ and sign up for a free API key.
Tavily is a search engine optimized for AI agents, and we can use it to have an agent search the internet. One of the reasons I chose Tavily here is that LangChain comes with pre-built tools for Tavily that we can just import and use as is, allowing us to focus more on learning about LangGraph as we have one less tool to write. You can just use your Google account for quick and easy sign up, and it will cost you nothing for the first 1000 or so queries, which is way more than we’ll use. Get your API key and copy it to the clipboard. Then open your `.env` file and add it like so:
```
OPENAI_API_KEY=your_api_key_here
LANGCHAIN_API_KEY=your_api_key_here
WEATHER_API_KEY=your_api_key_here
TAVILY_API_KEY=your_api_key_here
```
Make sure not to use any spaces or quotation marks, as usual. Then go ahead and save and close the `.env` file. Now open the `setup_environment.py` file and add a single line to load the `TAVILY_API_KEY` to an environment variable like so:
```python
import os
from datetime import date

from decouple import config


def set_environment_variables(project_name: str = "") -> None:
    if not project_name:
        project_name = f"Test_{date.today()}"

    os.environ["OPENAI_API_KEY"] = str(config("OPENAI_API_KEY"))
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = str(config("LANGCHAIN_API_KEY"))
    os.environ["LANGCHAIN_PROJECT"] = project_name

    ##### Add only this line #####
    os.environ["TAVILY_API_KEY"] = str(config("TAVILY_API_KEY"))
    ##############################

    print("API Keys loaded and tracing set with project name: ", project_name)
```
Now save and close the `setup_environment.py` file.
Prep for our multi-agent team
For this example over the next two parts, we will be creating a multi-agent team that will generate travel itineraries for us in PDF format, with us simply inputting a query and getting a fully formed PDF travel itinerary out the other end including an image. We will have three different tools that we will need for the overall setup:
- An image generator: We already made one in the last part, so we can just import and reuse it, which is one of the nice things about LangChain tools.
- An internet search tool: In case the agent wants to search for more information. LangChain comes with some pre-built tools, one of which is for Tavily Search, which is why we got the API key. We can just use this prebuilt tool here to save some time.
- A PDF generator: We will need a tool for our agents to be able to write PDF files and save them to disk. We will have to write this one ourselves before we can get started on our travel itinerary multi-agent team setup.
PDF writing tool
So let’s write up a quick PDF writing tool for our agents before we move on. Inside your `tools` folder make a new file named `pdf.py`:
```
📂 FINX_LANGGRAPH
    📂 images
    📂 tools
        📄 __init__.py
        📄 image.py
        📄 pdf.py        ✨ New file
        📄 weather.py
    📄 .env
    📄 langchain_basics.py
    📄 Pipfile
    📄 Pipfile.lock
    📄 setup_environment.py
    📄 simple_langgraph.py
```
Inside this new `pdf.py` file, get started with our imports:
```python
import os
import uuid
from pathlib import Path

import pdfkit
from langchain.tools import tool
from markdown import markdown
from pydantic import BaseModel, Field
```
We import `os` to work with the operating system, `uuid` to generate unique filenames again, and `Path` to create a path towards an output folder to save our PDF files. The `tool` decorator from LangChain is the same one that we used last time, and the `BaseModel` and `Field` imports from `pydantic` are for defining the input arguments interface for our function, just like we did before.
The `pdfkit` library is going to let us save HTML to real output PDF files, but the downside is that it needs HTML as input to do the conversion. HTML is more complex for our LLM agents to write, which introduces more variables, and I want to keep this example simple, so we will be using the `markdown` library to convert markdown to HTML for us. That way we can just tell our agents to write in markdown formatting (which is very simple) and our function will do `markdown` -> `HTML` -> `PDF`.
Both `pdfkit` and `markdown` are not installed by default, so we will have to install them in our virtual environment. Open your terminal and run:
```
pipenv install markdown==3.6 pdfkit==1.0.0
```
That will take care of the basic Python library installs, but `pdfkit` needs an additional step, as it actually uses something called `wkhtmltopdf` under the hood to achieve the conversion. Head over to https://wkhtmltopdf.org/downloads.html and click the appropriate download for your platform. I am on Windows, so I’ll select the Windows 64-bit download option:
Run the installer and select an install location. I’ll simply use the default `C:\Program Files\wkhtmltopdf` myself. Whichever install location you choose, take note of it and copy it somewhere, as you will need it in a moment:
Let the installer run, and when it’s done we can get back to the code! Below our imports in `pdf.py` we’ll add some quick setup:
```python
PATH_WKHTMLTOPDF = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"
PDFKIT_CONFIG = pdfkit.configuration(wkhtmltopdf=PATH_WKHTMLTOPDF)

OUTPUT_DIRECTORY = Path(__file__).parent.parent / "output"
```
First of all, we do some setup for `pdfkit` by pointing it to the location of the `wkhtmltopdf` executable. This is the path I used on my Windows machine; you have to adjust this path to wherever you installed `wkhtmltopdf` on your machine, so be sure that you use the correct path for you! After defining the path we can simply call `pdfkit.configuration` with the `wkhtmltopdf` argument set to the path we just defined. Later in the code, when we actually write the PDF files, we can pass in this `PDFKIT_CONFIG` as an argument to use this configuration.
We then use the same trick as last time to get a path to a folder named `output` in our project root. This is where we will save our PDF files, but the folder doesn’t exist yet. Make sure you create it right now, or the code will fail when it tries to save the PDF files later and you’ll be stuck debugging why it doesn’t work:
```
📂 FINX_LANGGRAPH
    📂 images
    📂 output        ✨ New empty folder
    📂 tools
        📄 __init__.py
        📄 image.py
        📄 pdf.py
        📄 weather.py
    📄 .env
    📄 langchain_basics.py
    📄 Pipfile
    📄 Pipfile.lock
    📄 setup_environment.py
    📄 simple_langgraph.py
```
Good! Now back to our `pdf.py` file. Below the setup we’ll define our input arguments interface, just like we did with our other tools so far:
```python
class MarkdownToPDFInput(BaseModel):
    markdown_text: str = Field(
        description="Markdown text to convert to PDF, provided in valid markdown format."
    )
```
We simply define the input arguments as a single string that has to be in valid markdown format. Once again, make sure your description is a good one, as the LLM will use it; it is not just for our own reference.
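If you want to see what this schema actually does for us, here is a quick standalone check (assuming `pydantic` is installed, which it is as a LangChain dependency). The input values are made up for the demo:

```python
from pydantic import BaseModel, Field, ValidationError


# Standalone copy of the schema above, just to demonstrate the validation
# behavior the agent's tool calls rely on.
class MarkdownToPDFInput(BaseModel):
    markdown_text: str = Field(
        description="Markdown text to convert to PDF, provided in valid markdown format."
    )


# A well-formed tool call passes validation and exposes the field.
ok = MarkdownToPDFInput(markdown_text="# Hello")
print(ok.markdown_text)  # # Hello

# A call missing the required field is rejected with a ValidationError.
try:
    MarkdownToPDFInput()
except ValidationError:
    print("rejected: markdown_text is required")
```

This is the same checking LangChain performs behind the scenes when the LLM fills in the tool’s arguments.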
HTML generation
Let’s make the problem we need to solve smaller by first writing a separate function to generate the HTML from the markdown text, so we can just feed HTML into `pdfkit`:
```python
def generate_html_text(markdown_text: str) -> str:
    """Convert markdown text to HTML text."""
    markdown_text = markdown_text.replace("file:///", "").replace("file://", "")
    html_text = markdown(markdown_text)
    html_text = f"""
    <html>
      <head>
        <style>
          @import url('https://fonts.googleapis.com/css2?family=Roboto&display=swap');
          body {{
            font-family: 'Roboto', sans-serif;
            line-height: 150%;
          }}
        </style>
      </head>
      <body>
        {html_text}
      </body>
    </html>
    """
    return html_text
```
This function takes `markdown_text` as string input. First, we search the markdown text for any `file:///` or `file://` protocol declarations sometimes used when the model inserts our image in markdown. These are not needed, so we simply replace them with an empty string `""`, as they would cause our image to not show up in the final generated PDF file. This kind of thing is something you just discover during your development work.
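As a tiny standalone illustration of that cleanup step (the image line here is a hypothetical example of what an agent might emit):

```python
def strip_file_protocol(markdown_text: str) -> str:
    # Order matters: strip the three-slash form first, so that the plain
    # "file://" replacement does not leave a stray leading "/" behind.
    return markdown_text.replace("file:///", "").replace("file://", "")


line = "![Eiffel Tower](file:///images/eiffel.png)"
print(strip_file_protocol(line))  # ![Eiffel Tower](images/eiffel.png)
```

After the cleanup, the image reference is a plain relative path that `wkhtmltopdf` can resolve on disk.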
Now we can simply call the `markdown` function we imported on our markdown text to get valid HTML. As I felt like doing some light styling, I then wrapped the `html_text` in some basic HTML tags: `html`, `head`, and `body`. In the `head` we include a `style` tag, which lets us load the `Roboto` font from Google using the CSS `@import url` rule, set it as the font, and give some extra line height to our document to make the text more readable. This is the final `html_text` that will be returned, with the converted HTML from the `markdown` call in the `body` portion. If you happen to be less familiar with HTML, just copy what I have; it’s not really important for the course.
Finishing up the tool
Now it’s time to define the actual tool itself. Continue below:
```python
@tool("markdown_to_pdf_file", args_schema=MarkdownToPDFInput)
def markdown_to_pdf_file(markdown_text: str) -> str:
    """Convert markdown text to a PDF file. Takes valid markdown as a string
    as input and will return a string file-path to the generated PDF."""
    html_text = generate_html_text(markdown_text)
    unique_id: uuid.UUID = uuid.uuid4()
    pdf_path = OUTPUT_DIRECTORY / f"{unique_id}.pdf"

    options = {
        "no-stop-slow-scripts": True,
        "print-media-type": True,
        "encoding": "UTF-8",
        "enable-local-file-access": "",
    }

    pdfkit.from_string(
        html_text, str(pdf_path), configuration=PDFKIT_CONFIG, options=options
    )

    if os.path.exists(pdf_path):
        return str(pdf_path)
    else:
        return "Could not generate PDF, please check your input and try again."
```
We start with the `@tool` decorator, once again providing a string name for our function and the input argument interface we defined. The function itself takes `markdown_text` as input and returns a string file path to the generated PDF file. We have a docstring that explains what the function does and what it expects as input, as the LLM is going to use this.
We then call our `generate_html_text` function on the `markdown_text` to get the `html_text` we need, and generate a unique ID for the PDF file name, creating a path to the PDF file in our `OUTPUT_DIRECTORY` folder. We then define some options for `pdfkit` to use when generating the PDF. These are just some basic options that I found to work ok for our example; we don’t want to get sidetracked by spending too much time on this, as it is not the focus of this tutorial.
Finally, we call `pdfkit.from_string` with the `html_text`, the path to the PDF file in `str` format instead of a `Path` object, the `configuration` we set up atop this file, and the `options` we just defined. If the PDF file was successfully generated, which we can check with the `os.path.exists` function, we return the path to the PDF file. If it does not exist, we return a message saying that the PDF could not be generated. We purposely do not raise an error but send a string response, as the agent can receive this, try to find the error, fix it, and try again.
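This "return an error string instead of raising" pattern is worth internalizing for agent tools. Here is a small sketch of just that last check, with a plain file write standing in for `pdfkit`; the function name and paths are made up for the demo:

```python
import os
import tempfile


def save_or_report(path: str, content: bytes) -> str:
    """Write content to path; return the path on success, or an error
    string the agent can read and react to instead of an exception."""
    try:
        with open(path, "wb") as f:
            f.write(content)
    except OSError:
        pass  # swallow the failure; the existence check below reports it
    if os.path.exists(path):
        return path
    return "Could not generate PDF, please check your input and try again."


out = os.path.join(tempfile.mkdtemp(), "demo.pdf")
print(save_or_report(out, b"%PDF-1.4 dummy"))        # prints the file path
print(save_or_report("/no/such/dir/demo.pdf", b""))  # prints the error string
```

Because the failure comes back as an ordinary tool result, the LLM can see it in its scratchpad, adjust its markdown, and retry, rather than crashing the whole graph with an exception.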
PDF tool test run
Now let’s add a quick test at the bottom of our file:
```python
markdown_dummy_text = """
# Title
This is a test of the markdown to PDF function.

## Subtitle
This is a test of the markdown to PDF function.

### Sub-subtitle
This is a test of the markdown to PDF function.

This is a paragraph with random text in it nunc nunc tincidunt nunc, nec. S'il vous plaît.
"""

if __name__ == "__main__":
    print(markdown_to_pdf_file(markdown_dummy_text))
```
There are a couple of headings here and some French with non-standard characters, like in “plaît”, to make sure it also works with special characters. Now go ahead and run your file (Reminder: make sure you created the `output` folder!). Close the printer message popup if you get one; we’ll just ignore it for now. You should see a new PDF file in your `output` folder. Go ahead and open it:
It’s not perfect by any means, but it works well enough for our LangGraph example purposes. As LangGraph is the focus here we will not spend any more time perfecting the details of this particular tool.
One last step though, to fix the imports. Open up the `tools/__init__.py` file and change the code to:
```python
from .image import generate_image
from .weather import get_weather
from .pdf import markdown_to_pdf_file
```
Save and close that so we can have the nicer imports in our main code. That’s it for the preparation, this part is slightly shorter by design as the next one will be extra long. It is finally time to set up and run our multi-agent team! So let’s get to the fun stuff, I’ll see you there! ๐