Welcome back to part 3, where we’ll explore both the group chat features and how to level up our agents to make them more powerful. We’ll start with the group chat, which involves more than two agents working together as a team.
👉 Go Back to the Full Course: Next Level Prompt Engineering with AutoGen Studio
Let’s take a quick look at how this works using the example that AutoGen has already set up for us to get started.
Group chats and the group chat manager
First head over to the Build tab and then the Workflows section. You will see that a Travel Agent Group Chat Workflow has already been created for us. Click on this workflow and let’s see how it works first.
So we have the name, description, a ‘summary method’ of `last`, and the sender is a `userproxy` type agent. This is the same `userproxy` that we have seen before, and if you open it up you’ll see that most fields are empty, as its job is just to pass on our message and act as a code executor if needed.
On the right side we have the Receiver just like before, but this time instead of having an assistant here we have the `group_chat_manager`. Click it and let’s check it out:
So we can see that this special agent type has a Group chat agents field, which is where we can add multiple agents to our group chat; in this case, there are three different agents in the group chat.
If we scroll down we can see that the `group_chat_manager` has a name but no description, no meaningful system message, and no models or skills. It’s more of a special agent type: an empty shell that contains other agents to form the group but doesn’t necessarily contribute to the group or its conversations itself. Like the `[brackets]` in a Python list, it serves as a container for the contents. There might be a model declared under Model, by the way, but you can delete it from the list just like I did before taking the screenshot.
The agents in the group chat
So as the `group_chat_manager` is basically just a container for the agents in the group chat, let’s take a look at the agents themselves. First, let’s open the `primary_assistant`:
As we can see from the prompt message, this is the agent that will actually lead and coordinate the group chat. This can seem surprising at first, as you’d probably have expected the `group_chat_manager` to be the one coordinating the group.
We can see it has instructions to coordinate with the two other agents in the group chat, and instructions for the final response to be the complete plan plus the word TERMINATE again.
If we close this and open up the `local_assistant`, we can see all settings are basically the same except for the prompt, which tells the agent that it is a travel agent:
Notice two things here:
- It also has instructions that if a plan is already satisfactory, that is OK, so it isn’t forced to rebuild an already perfectly fine travel plan.
- It doesn’t have the instruction to end with TERMINATE here, as we deliberately want that decision responsibility to lie with the `primary_assistant` alone.
Finally, we have the `language_assistant`:
This one has the same basic structure and idea as the other agent, contributing its part on the language and communication challenges of the plan while leaving the main responsibility with the primary assistant.
The group chat in action
Ok so close up this workflow and let’s give it a spin! Head over to the Playground tab, click on +New, and choose the Travel Agent Group Chat Workflow from the dropdown menu:
Now I’m going to ask it a simple question and let’s see:
I won’t show the entire output here, as it is a bit long and not really the topic we’re dealing with, plus you can obviously try this for yourself. But here is what happened for me:
1. `userproxy` (to `group_chat_manager`): Just passes our query along about wanting to visit Paris for 3 days.
2. `primary_assistant` (to `group_chat_manager`): Some intro talk, then asks the `local_assistant` for a list of must-see attractions and the `language_assistant` for basic phrases or language tips.
3. `local_assistant` (to `group_chat_manager`): Provides a full three-day itinerary for Paris.
4. `language_assistant` (to `group_chat_manager`): Provides some basic phrases and language tips.
5. `primary_assistant` (to `group_chat_manager`): Based on the valuable input from the local_assistant and language_assistant, here is your complete travel itinerary for a 3-day visit to Paris, including local attractions and language tips: [gives the whole travel plan with tips here] and wishes you a great trip before using the keyword TERMINATE.
Now what you should see pop up in the user interface is message number 5 from above, which contains the full resulting plan. But you may actually see message number 4 pop up in the user interface instead, and it will claim there are only 4 messages in the total chat and not 5, just like it did for me here, where the non-final answer with the language tips shows as the final output in the user interface:
So what gives?
Bugs – AutoGen Studio is a work in progress
AutoGen actually did a good job under the hood. If you go to the terminal window (the one you used to start up the Docker container) you will be able to read through all of AutoGen’s output. AutoGen Studio is just a visual interface to work with the underlying AutoGen library.
If we look at the terminal output we can see message number 5 is there, no problem:
```
primary_assistant (to group_chat_manager):

Based on the valuable input from the local_assistant and language_assistant, here is your complete travel itinerary for a 3-day visit to Paris, including local attractions and language tips:

**Day 1: Classic Landmarks and Evening Cruise**
- Shares plans here...

**Day 2: Art, History, and Shopping**
- Shares plans here...

**Day 3: Parks, Neighborhoods, and Local Life**
- Shares plans here...

**Language Tips:**
- Shares basic phrases and tips here...

With this itinerary and language tips, you're all set for a memorable and smooth trip to Paris. Enjoy the culture, history, and cuisine of this beautiful city. Bon voyage and have a fantastic trip!

TERMINATE
```
So what we can see here is that the underlying AutoGen library worked fine, but AutoGen Studio had some trouble there. Be aware that AutoGen Studio is very much a work in progress and there are still some bugs and issues to be worked out. Just know that you can check the real output in your terminal window in these cases!
Some other small bugs I’ve noticed, to keep in mind:
- When filling out settings for an agent, filling in certain fields or adding things can sometimes cause the other fields to reset to older values again. Just make sure you check all fields before closing the window to make sure nothing has jumped back to older settings.
- We have the `group_chat_manager` here that holds the `primary_assistant`, `local_assistant`, and `language_assistant` agents inside. If you update any of these three inner agents, you have to go to the `group_chat_manager`, delete the agent from the list, and then add it again; otherwise it will still use the old version of the agent.
- Group chats, when defined through the user interface of AutoGen Studio, often behave weirdly, and all sorts of assistants will try to execute code that doesn’t exist. The group chat feature is not yet well implemented in the user interface version of AutoGen at this moment. (The travel group chat worked quite well as it came predefined behind the scenes, which is why we covered it. Defining a new group chat doesn’t work as reliably yet.)
Much of the time we don’t need a bigger group chat; we only need the user proxy agent coupled with one single powerful assistant that can keep iterating over its own work and use the userproxy to execute and test the code: the two-agent setup.
This is what the rest of the course will focus on, but honestly, for many use cases it is a lot more direct, simple, and powerful to use AutoGen with this two-agent setup anyway, so don’t despair. This has not actually stopped me from building what I wanted to build using AutoGen. It has merely stopped me from making it unnecessarily complex and convoluted! Simple is beautiful.
So yeah, admittedly some things are still a bit clunky here and there, but I’m sure they will be improved in the future. If you’re watching this a couple of months from now, just update the libraries after you finish the course to the latest version of AutoGen Studio, with hopefully some of these issues fixed. (We’ll cover specific instructions for this at the very end.)
Practical considerations – token usage
While this is really cool, it can also be pretty expensive to have agents casually iterating over solutions and planning how to best solve your question. You make multiple ChatGPT calls for each agent, and all of these calls have a fairly long message-history context sent along with them, racking up the tokens faster.
Where a normal GPT-4 call with a simple message will not use as many tokens, a more complex iterative chat like this can easily fire off 10 messages, and all of these will have a fairly long context. I’m not saying you’ll suddenly be using up $10 an hour worth of credits or anything, but your GPT-4 bill will go up considerably faster than you are used to.
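To make that a bit more concrete, here is a rough back-of-the-envelope estimate. Every number in it (the per-token prices, message count, and context sizes) is an illustrative assumption, so check OpenAI’s current pricing page before relying on the result:

```python
# Rough, illustrative cost estimate for one iterative agent run.
# All numbers below are assumptions for illustration only; real prices and
# token counts vary, so check OpenAI's pricing page for current figures.
INPUT_PRICE_PER_1K = 0.01   # assumed USD per 1,000 input tokens (GPT-4 Turbo class)
OUTPUT_PRICE_PER_1K = 0.03  # assumed USD per 1,000 output tokens

messages_per_run = 10       # a back-and-forth like the ones in this course
avg_context_tokens = 2_000  # the growing message history sent with each call
avg_output_tokens = 500     # average reply length per call

cost = messages_per_run * (
    avg_context_tokens / 1000 * INPUT_PRICE_PER_1K
    + avg_output_tokens / 1000 * OUTPUT_PRICE_PER_1K
)
print(f"Estimated cost per run: ${cost:.2f}")  # about $0.35 with these assumptions
```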
In the last part, we’ll look at setting up other LLMs to work with in AutoGen Studio, so we can use GPT-3.5 Turbo, which is massively cheaper, though as you will see the quality is considerably lower and the chance of problems considerably higher. Remember that some expense is fine, as long as the work it does has a high value to you. It’s a lot cheaper than human labor! But this is definitely something to keep in mind.
Creating our function to add to the agent’s skillset
Ok so let’s have some fun here and see how far we can get without writing any code ourselves. You can just watch and read along for this next part if you are very concerned about your token usage (you can easily burn a dollar or two if you play around with this for an hour or two), or follow along and try the same for yourself.
This next part is where things get really exciting 😎, particularly for you no-code or beginning coders out there. We’re basically going to use the AI to make itself more powerful. We’re sort of going to have the AI write a more powerful version of itself in a way and have it evolve with new skills.
Keep in mind that AI is not 100% deterministic though so you may not have the exact same results as me, but you should be able to get something similar.
First, go to the Playground tab, click on +New, and choose the General Agent Workflow from the dropdown menu. Then ask it to do the following: “Please write a generic function that takes markdown as input and returns a PDF file”:
As you can see I have already run this. The basic breakdown of what happened for me is like this:
1. `userproxy` (to `primary_assistant`): Passes along our query as always.
2. `primary_assistant` (to `userproxy`): Generates the requested code but also gives instructions on all the dependencies that need to be installed to be able to run the code.
3. `userproxy` (to `primary_assistant`): Executes the code and fails because of uninstalled dependencies.
4. `primary_assistant` (to `userproxy`): Gives a shell script that the userproxy can run to install the dependencies.
5. `userproxy` (to `primary_assistant`): Executes the shell script to install the dependencies and succeeds this time.
6. `primary_assistant` (to `userproxy`): Gives the markdown-to-PDF code to the userproxy again.
7. `userproxy` (to `primary_assistant`): Executes the code and fails to run because this time a Python package named `markdown2` is missing.
8. `primary_assistant` (to `userproxy`): Gives a shell script that the userproxy can run to install the `markdown2` package.
9. `userproxy` (to `primary_assistant`): Executes the shell script to install the `markdown2` package and succeeds.
10. `primary_assistant` (to `userproxy`): Gives the markdown-to-PDF code to the userproxy again.
11. `userproxy` (to `primary_assistant`): Executes the code and succeeds this time.
12. `primary_assistant` (to `userproxy`): The setup is now complete, and you can use the `markdown_to_pdf` function to convert markdown content to a PDF file. TERMINATE
PHEW, that was quite a back-and-forth! But it’s actually pretty cool to see how the `primary_assistant` was able to coordinate with the `userproxy` to get all the dependencies installed and the code running.
Levelling up our agents
Now click on the `markdown_to_pdf.py` file that was returned to us:
Here we can see the function that has been generated for us. But remember, it did not just generate this function; it also meticulously installed all the dependencies for us and made sure our environment is set up to run the code. That is pretty cool!
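Your generated code will likely differ from mine, but to give you an idea, a minimal sketch of such a function might look something like this, assuming the `markdown2` package for the markdown-to-HTML step and `pdfkit` (which relies on the `wkhtmltopdf` system tool) for the HTML-to-PDF step:

```python
# A minimal, hypothetical sketch only; your AI-generated markdown_to_pdf code may differ.
# Assumes: pip install markdown2 pdfkit, plus the wkhtmltopdf system package.
import markdown2
import pdfkit

def markdown_to_pdf(markdown_text: str, output_path: str = "output.pdf") -> str:
    """Convert a markdown string to a PDF file and return the output path."""
    # Convert the markdown to HTML first, since wkhtmltopdf renders HTML.
    html = markdown2.markdown(markdown_text)
    # Render the HTML to a PDF file on disk.
    pdfkit.from_string(html, output_path)
    return output_path
```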
So go ahead and copy the code by clicking on the clipboard icon you can see at the top right of the code block (indicated in green here).
Now go to the Build tab and then the Skills section. Click on the green +New Skill button:
So now give this new skill the name `markdown_to_pdf_file_converter`, paste the code you copied into the box, and click Ok:
We now have a new skill, and basically, the AI is writing itself for us! 😎 Let’s see what the AI can do with this new skill.
Go to the Build tab and then the Workflows section. Click on the green +New Workflow button, then make sure you use the Two Agents option:
The setup is fairly easy so I’ll walk you through it. First, fill in the name and description, and set the Summary Method to last:
Leave the userproxy window on all default settings; it is fine as is:
For the primary_assistant, we’ll pass in a description and a system message, and then add a skill. Don’t worry about the model; the gpt-4-1106-preview is just the empty setting, but it will default to gpt-4-1106-preview anyway:
So add in the description of “Explains topics in meticulous detail using PDF file output” and make sure in the Skills section below that you add the `markdown_to_pdf_file_converter` skill we created.
The system message is a bit long so I’ll provide it here. Much of it is simply a copy from the default primary assistant with some modifications:
You are a helpful assistant that can explain topics in meticulous detail. The user will provide you with a topic, and your job is to explain the topic fully. Make sure you explain all facets of the topic and how they work so that even a beginner student can use your explanation to learn the topic. MAKE SURE TO STRUCTURE YOUR EXPLANATION IN MARKDOWN FORMAT. You will then use the markdown_to_pdf_file_converter that you have been given as a skill. Pass in the explanation in Markdown format, and then generate the PDF file. This is the final answer you will present to the user. The libraries and dependencies to run the markdown_to_pdf_file_converter should already be installed, so try without installing dependencies first. At each point, do your best to determine if the user's request has been addressed. IF THE REQUEST HAS NOT BEEN ADDRESSED, RESPOND WITH CODE TO ADDRESS IT. IF A FAILURE OCCURRED (e.g., due to a missing library) AND SOME ADDITIONAL CODE WAS WRITTEN (e.g. code to install the library), ENSURE THAT THE ORIGINAL CODE TO ADDRESS THE TASK STILL GETS EXECUTED. If the request HAS been addressed, respond with a summary of the result. The summary must be written as a coherent helpful response to the user request e.g. 'Sure, here is your PDF file' and the PDF file should be done. The summary MUST end with the word TERMINATE. If the user request is pleasantry or greeting, you should respond with a pleasantry or greeting and TERMINATE.
Nothing too surprising there. When you have everything set up, press the green Ok button in all windows, and let’s give it a spin!
Testing our self-evolved agent
Go to the Playground tab, click on +New, and choose the Topic Explanation Workflow from the dropdown menu:
I’m going to go ahead and ask it to:
Please tell me about penguins
While the primary assistant did get confused providing code to execute to the user proxy in the wrong manner, it quickly recovered and gave me a PDF file:
And if I open the PDF file I can see it has a nice explanation of penguins, and also has all the Markdown formatting the assistant implemented, to make it look nice and readable:
We asked the AI to write a new function and make itself more powerful, installing all the needed libraries and dependencies to run it along the way, and already we can have nicely structured PDF files generated for us as part of any task we ask it to do from now on.
Levelling up our agents even further
What features do you want your super powerful AI assistant to have? Just ask it to write them for you and add them as skills! 😎
Say, for example, we want our AI agents to be able to fetch information from any page on the whole internet. Let’s ask it for a function.
I’m not going to tell you what buttons to press here as I don’t want to waste your time doing the same thing over and over; you can basically just repeat the same steps as above. You already know how to do this. I just want to get you thinking about what you can do with this and give you some ideas:
userproxy (to primary_assistant): Please write a generic function that takes a URL as input and returns the main text content of the page, filtering out menu items, excessive whitespace, newline, and tab characters, and other such things as much as possible. Then test it on the following URL to make sure it works: https://en.wikipedia.org/wiki/Horace_Walpole
Ok, I asked it very specifically to filter out as much as possible and test the function on a random Wikipedia page I selected.
primary_assistant (to userproxy): To create a generic function that fetches the main text content of a webpage, we can use the `requests` library to retrieve the page content and the `BeautifulSoup` library from `bs4` to parse the HTML and extract the main text. We'll filter out menus, navigation, and other non-main content elements as much as possible. First, I'll provide you with the Python code for the function. After that, we'll test it with the specified URL.

```python
# filename: fetch_main_content.py
import requests
from bs4 import BeautifulSoup

def get_main_text_content(url):
    try:
        # Send a GET request to the URL
        response = requests.get(url)
        # Check if the request was successful
        if response.status_code == 200:
            # Parse the content with BeautifulSoup
            soup = BeautifulSoup(response.content, 'html.parser')
            # Find the main content of the page
            # For Wikipedia, the main content is within the <div id="bodyContent"> tag
            main_content = soup.find('div', id='bodyContent')
            if main_content:
                # Remove script and style elements
                for script_or_style in main_content(['script', 'style']):
                    script_or_style.decompose()
                # Get text and remove excess whitespace
                text = ' '.join(main_content.stripped_strings)
                return text
            else:
                return "Main content not found."
        else:
            return f"Error: Response status code is {response.status_code}"
    except Exception as e:
        return f"Error: {str(e)}"

# Test the function with the provided URL
url = 'https://en.wikipedia.org/wiki/Horace_Walpole'
main_text_content = get_main_text_content(url)
print(main_text_content[:1000])  # Print the first 1000 characters of the main text content
```

Please save the above code in a file named `fetch_main_content.py` and execute it. This will test the function with the Wikipedia page for Horace Walpole and print the first 1000 characters of the main text content.
We can see it coded up a function for us and included a testing section at the bottom. If you decide to copy this function as a new skill, make sure you remove the part after the `# Test the function with the provided URL` comment, as that is just for testing purposes.
(If you’re not familiar with coding and don’t feel comfortable doing this, best to not ask for a test specifically and only ask for a generic function instead, like we did in the PDF generator function example).
```
EXECUTING CODE BLOCK 0 (inferred language is python)...

userproxy (to primary_assistant):

exitcode: 0 (execution succeeded)
Code output:
From Wikipedia, the free encyclopedia English writer and politician (1717–1797) ...first 1000 characters of the main text content here, abbreviated...
```
The code was executed successfully and we can see the first 1000 characters of the main text content of the Wikipedia page for Horace Walpole (I removed them for copyright and whatnot just to be safe).
primary_assistant (to userproxy): The function has successfully retrieved and printed the main text content from the Wikipedia page for Horace Walpole, filtering out menu items, excessive whitespace, newline, and tab characters. The output provided is the beginning of the article, which includes biographical information and political career details. The task of fetching and filtering the main text content of the given URL has been addressed successfully. TERMINATE
And now we just remove the test portion of the code. (Alternatively, simply don’t ask for a test in the first place and only ask for a generic function, as it will tend to test its code anyway.) We add this as a new skill (the trimmed version is shown below for reference) and use the two-agent setup as we did before to create whatever workflow we want. You can reuse the basic prompt setup and just alter it a bit with your specific desires. You have all the tools to do this on your own now!
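For reference, the trimmed version of the generated function, with everything after the test comment removed, is what you would paste in as the new skill:

```python
# filename: fetch_main_content.py (trimmed: test lines removed so it can be reused as a skill)
import requests
from bs4 import BeautifulSoup

def get_main_text_content(url):
    try:
        # Send a GET request to the URL
        response = requests.get(url)
        # Check if the request was successful
        if response.status_code == 200:
            # Parse the content with BeautifulSoup
            soup = BeautifulSoup(response.content, 'html.parser')
            # For Wikipedia, the main content is within the <div id="bodyContent"> tag
            main_content = soup.find('div', id='bodyContent')
            if main_content:
                # Remove script and style elements, then collapse whitespace
                for script_or_style in main_content(['script', 'style']):
                    script_or_style.decompose()
                return ' '.join(main_content.stripped_strings)
            else:
                return "Main content not found."
        else:
            return f"Error: Response status code is {response.status_code}"
    except Exception as e:
        return f"Error: {str(e)}"
```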
Where to go from here
Maybe you want a PDF summary of a particular webpage, with a matching AI-generated image in it as well? You already have the three skills your AI agent will need to accomplish all of this: an image-generating skill, a markdown-to-PDF skill, and a webpage-content-fetching skill.
Think of all the things you can do with this with a bit of imagination! As another idea, there is a predefined example skill in the Skills section that came pre-installed, named `find_papers_arxiv`. Arxiv is a website where you can find scientific papers on all sorts of topics. You can add this function to an agent and ask it for the latest developments in the professional field of X… and it will go and find the latest papers for you:
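If you’re curious what such a skill looks like under the hood, here is a minimal sketch of a paper-search function built on the `arxiv` Python package. This is just an illustration; the pre-installed `find_papers_arxiv` skill may be implemented differently:

```python
# Hypothetical sketch of an arXiv search skill; the bundled find_papers_arxiv
# skill in AutoGen Studio may differ. Assumes: pip install arxiv
import arxiv

def find_papers(query: str, max_results: int = 5) -> list:
    """Return the most recently submitted arXiv papers matching a query."""
    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,  # newest papers first
    )
    papers = []
    for result in arxiv.Client().results(search):
        papers.append({
            "title": result.title,
            "url": result.entry_id,
            "summary": result.summary,
        })
    return papers

if __name__ == "__main__":
    for paper in find_papers("prompt engineering"):
        print(paper["title"], "-", paper["url"])
```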
This is kind of like a ‘Train your AI’ game, where you can keep leveling up and teaching it new stuff. Pretty mind-blowing 🤯. This would all have been unthinkable just a few years ago.
What about using other LLMs or GPT-3.5 turbo?
Now to finish this off, let’s take a brief look at the possibility of using other LLMs than the GPT-4 we’ve been using so far. We’ll start by trying out GPT-3.5 Turbo, which is a lot cheaper than GPT-4. Sounds like a great deal, right? Well, let’s see what happens.
Go to the Models section in the Build tab and click on the +New Model button:
It took me a while to figure this out because no matter what I input into the boxes and how well I filled them out, it would always just ignore everything and use GPT-4. After much research, I found out that for OpenAI models you don’t really need to fill out anything, as they are sort of the default!
We already specified our API key as an environment variable, so just set the top input box to either gpt-3.5-turbo or to the specific gpt-3.5-turbo-0613 model name and leave the rest blank. Then click Ok.
Great, that is our 3.5-turbo setup done. Now go to the Workflows tab and define a new workflow:
Set the name and description to “GPT-3.5 Turbo General Workflow”, and then open up the primary_assistant by clicking on it:
Leave everything on default settings, but make sure under Model to delete all the models there and then add only the gpt-3.5-turbo you just created. Then click Ok and then Ok again to create the workflow.
Testing GPT-3.5 turbo
Now create a new session for your new workflow in the Playground:

And then press the green Stock Price button at the bottom to run the NVIDIA / TESLA stock price example we ran before. You will see that GPT-3.5 Turbo really struggles here. Here is basically what I got:
1. `userproxy` (to `primary_assistant`): Passes along our query as always.
2. `primary_assistant` (to `userproxy`): Sure, let’s plot a chart (but doesn’t actually do anything or generate code).
3. `userproxy` (to `primary_assistant`): Waiting for something it can work with… silence…
4. `primary_assistant` (to `userproxy`): Sorry, I don’t have access to stock price data, can you provide it to me?
5. `userproxy` (to `primary_assistant`): Waiting for something it can work with… silence…
6. `primary_assistant` (to `userproxy`): Similar to message 4 again.
7. `userproxy` (to `primary_assistant`): Waiting for something it can work with… silence…
8. `primary_assistant` (to `userproxy`): Gives some code for plotting data from a CSV file, but doesn’t actually provide the CSV file.
9. `userproxy` (to `primary_assistant`): Tries to execute this code but fails as the CSV files don’t actually exist.
10. `primary_assistant` (to `userproxy`): Please make sure that you provide the CSV data, it seems the files do not exist.
11. `userproxy` (to `primary_assistant`): Waiting for something it can work with… silence…
12. `primary_assistant` (to `userproxy`): Apologizes again for not having access to stock data (not realizing it could easily write code to retrieve this) and tells us where to get it.
—Message limit reached—
So yeah, it’s clear that for anything that involves more advanced reasoning and code writing than just simple chat, GPT-3.5 turbo is not going to cut it. It’s not because we used the General Agent Workflow either (instead of the Visualization Workflow), as the General Workflow has no problem with this task when using GPT-4:
The difference really is night and day.
3.5 turbo may still be helpful if you want to do simpler non-code things, but if that’s the case there is not really much reason to use AutoGen Studio anyway, as its strength really lies in its autonomous nature where it generates code and just executes it to achieve whatever you want it to do.
Running LLMs locally (for free)
Now you can also run local LLMs, as in actually running one on your own computer and then connecting to it from AutoGen Studio over localhost. As the utility of this is more limited for most users, considering the quality drops we saw with even 3.5 Turbo, I won’t go into detail here, but I’ll give you a rough outline to get you started.
Note that you are likely to run into the same issues as we did with GPT-3.5 turbo, as the local LLMs are generally not quite as powerful as GPT-4. But they offer a free alternative for chat-based tasks and things that do not involve writing and spontaneously executing code to solve problems on the fly.
If this is something you’d like to do, you’ll have to get `ollama`, which will quite easily allow you to run large language models on your local computer. A native Windows version is not yet available, but you can use WSL2 to run it over Linux, which we already installed in the first part of this course, so you’re halfway there already! If you’re on macOS or Linux you can just run it on your main OS.
See here for more details on the `ollama` project. Then you can use `litellm` (see here) to host your local LLM as a server and connect to it from AutoGen Studio. As `litellm` mostly uses the OpenAI API style of structure for the endpoint it hosts, you can set up a new Model in the Models section of the Build tab and set only the name and the Base URL fields to connect to your local model, using http://0.0.0.0:8000/ as the base URL.
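If you want to sanity-check that the local `litellm` endpoint is actually reachable before wiring it into AutoGen Studio, a quick test along these lines should do it. The port and the `ollama/llama2` model name are just assumptions here; use whatever your `litellm` server is actually hosting:

```python
# Quick sanity check against a local litellm proxy.
# Assumes the proxy runs on port 8000 and hosts an Ollama model; adjust both to your setup.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="ollama/llama2",  # example model name; match whatever litellm is serving
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```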
Again, I’ll leave this to you as a hobby project if you’re interested in exploring this, as for most AutoGen Studio use, GPT-4 turbo is probably going to be your best bet for now. (Though we might see better results with larger group chats on free LLMs when this feature is more fully implemented in the user interface, as the agents could try to correct one another).
Updating to the latest version of AutoGen Studio
Now I promised to show you how to update to the latest version of AutoGen and AutoGen Studio. This is really easy to do and you will not lose any of the work you have done so far! First, find your Docker Desktop window that is still running, or open it from the whale icon in your taskbar:
Now click on the name autogenstudio in the Name column (so not the one under the Image column). You will now see the logs for your Docker container:
Go to the Exec tab, which will give us the terminal window for our Docker container, and then run the following two commands in there:
pip install pyautogen --upgrade
To upgrade the AutoGen library, and then run:
pip install autogenstudio --upgrade
To upgrade the AutoGen Studio user interface, and wait for the command to finish installing. Now first make sure to click the refresh button in the top right corner of the Docker window in the image above, to restart the Docker container. If you now refresh your browser you will see the newest version. That’s it!
Conclusion
In conclusion, AutoGen Studio is a really amazing piece of software, though it still has some bugs to work out with the larger group chats and some other things. They may be fixed in your newly updated version already; development is going fast, and your version number will be considerably higher after the upgrade command.
The ability to just have an agent generate any code on the fly and execute it to achieve things that were normally impossible or had to be provided as pre-coded and pre-planned functions ahead of time is really amazing.
This truly is yet another of those next-level steps in the field of prompt engineering. I really hope you enjoyed this mini tutorial series and that it has given you some ideas on how to use AutoGen Studio to achieve your own goals.
As always, it was my honor and pleasure to accompany you on this journey, and I will see you in the next one!
👉 Go Back to the Full Course: Next Level Prompt Engineering with AutoGen Studio