AutoGen Studio (3/4) – Next Level Prompt Engineering

👉 Go Back to the Full Course: Next Level Prompt Engineering with AutoGen Studio

Hi and welcome back to part 2, where we’re going to be diving head first into AutoGen Studio, now that we have it running inside a secure environment where it can freely execute any code it generates without restrictions or danger.

You will be dropped into the Playground tab, which lists your sessions. A session is simply an ongoing conversation between you and one of the workflows you’ve created. A quick pro-tip: you can click the sun icon next to the ‘Guest User’ name on the top right to switch to ‘night mode’.

That’s basically it for the Playground tab, so let’s take a quick look at the Gallery:

This is of course empty right now, but when you have a session you want to save from the Playground tab, you can save it to this Gallery tab, and it will remain here for your future reference, even if you delete the session in the Playground tab.

The Build tab

Now let’s take a rough look at the Build tab before we dive in deeper.

We have 4 menu items here. The top one is for Skills. These skills are basically just Python functions. If you’ve worked with OpenAI function calling before, these are just like the functions or tools you would pass in.

The major difference is that traditionally ChatGPT gave us the arguments and the function it wanted to call and we then had to call the functions ourselves. In AutoGen, the AI agents can call these functions or skills themselves, executing code autonomously. They can even generate functions that don’t exist in this skill list and execute them right away. This is why we went through all the trouble of setting up a separate Docker environment to run in!

If you click on one of these skills, you’ll see that it is just a Python function:
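To make this concrete, here is a minimal hypothetical skill in the same spirit as the built-in image-generation skill: a plain Python function that saves its output to a uniquely named file and returns the filename so the agent can report where the result lives. (The function name and behavior here are illustrative, not one of AutoGen Studio’s actual built-in skills.)

```python
import uuid

def save_text_to_file(content: str) -> str:
    """Hypothetical skill: save text to a uniquely named file on disk
    and return the filename, so the agent can report where the result is."""
    filename = f"{uuid.uuid4()}.txt"
    with open(filename, "w") as f:
        f.write(content)
    return filename
```

Any function like this can be registered as a skill, and the agents can then call it by name during a session.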

Moving on to the Models tab, here we can define the LLMs that AutoGen Studio has access to. We can use these to define our connection to ChatGPT if we want to use different versions, like GPT-3.5 Turbo or GPT-4, but we can also use other LLMs besides OpenAI’s ChatGPT and even run an LLM locally on our own computer and add the model to the list in here. We’ll get back to this topic later.

If you click on one of the models that is predefined, you’ll see that the details are mostly empty:

This doesn’t really matter for now, as AutoGen Studio will default to GPT-4 automatically and read the API key that we set as an environment variable in our Dockerfile. So we don’t really have to set up anything in the Models tab for now.

Moving on to the Agents tab, here is where you define the AI agents or ‘virtual members of your team’ or ‘robot slaves’🤖🤖 if you will 🙃.

The two agents that come predefined are the primary_assistant and the userproxy. The primary assistant is a standard assistant that can write code, has a Prompt with instructions it follows, can use Skills, and can be linked to a certain LLM Model or Models.

The userproxy is a special agent in your group that is used to represent the user. It takes your input and passes it to the agent or groups of agents, and is also used to execute code.

We’ll go into more detail on specific prompts and settings for the agents later.


The final tab is the ‘Workflows tab’:

This is where we combine it all together to create a Workflow. It can be as simple as just two agents, which is the userproxy agent taking the user input and feeding it to an assistant agent, like this:

    🤖Userproxy --> 🤖Assistant

But it can also be more complex, involving multiple agents, which looks something like this:

🤖Userproxy --> 🤖Group chat manager --> 🤖🤖🤖🤖 (team)

Where the userproxy agent feeds into a group chat manager agent, which then feeds into a whole group of agents that all collaborate to achieve your goal.

So let’s check this stuff out, starting with something simple. Click the Visualization Agent Workflow on the right-hand side:

As you can see here, the overall workflow has several properties we can set:

  • The Workflow Name and Workflow Description are just a short name and description, mostly for our own convenience and to keep things organized.
  • The Summary Method indicates the final return type. As the agents work on our assignment, eventually they will be done. While we will be able to see the whole conversation no matter what setting we use, this Summary Method setting will determine what the agents will return as a final answer to the conversation. We can choose between:
      • last – The last message of the conversation is shown as the outcome of the task.
      • none – No summary is shown, but you can of course check out the whole conversation.
      • llm – This will generate a summary of the entire conversation using the linked LLM model. So it’s like having one final LLM call to ask for a summary.
  • The Sender, which is usually the userproxy agent which receives your input question and sends it to the other agent(s).
  • The Receiver, which is the agent or team that will receive the assignment and work on it. This can be either a single agent or a group of agents.
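The three Summary Method options above boil down to a simple choice that can be sketched in a few lines of Python. This is a rough illustration of the behavior described, not AutoGen Studio’s actual implementation, and the `llm` argument is a placeholder standing in for a real model call:

```python
def summarize(conversation, method, llm=None):
    """Rough sketch of the Summary Method setting: decide what to return
    as the final answer once the agents have finished their conversation."""
    if method == "last":
        return conversation[-1]            # the last message wins
    if method == "llm":                    # one extra LLM call for a summary
        return llm("Summarize this conversation:\n" + "\n".join(conversation))
    return None                            # "none": no summary is shown
```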

So go ahead and click on the Sender which is the userproxy agent:

Here in the userproxy agent properties we again have a name and description, which are just for our own convenience. The Max Consecutive Auto Reply setting caps the number of messages, so you can control how long you want the ‘conversation’ to go on and avoid getting stuck in some kind of infinite loop making endless LLM calls.

The Human Input Mode has three settings in the underlying AutoGen library: ALWAYS, TERMINATE, or NEVER. It determines when the userproxy will ask the human for feedback during generation (human-in-the-loop). This feature is not yet available in the AutoGen Studio user interface though, so we’ll just leave it at NEVER for now, which means the models do their work independently without human feedback and return the final result.

If you’re watching this tutorial in the future, and this feature is now in the Studio version of AutoGen, know that the ALWAYS version will always ask for human feedback whenever it receives a message (as the name ‘always’ suggests). TERMINATE will ask for human feedback when it encounters a termination message (more on those later), so it will check with you if you’re satisfied with the final result before making it into the final answer and ending.
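The three modes come down to a simple decision rule. Here is a rough sketch of that logic, written as a standalone function for illustration (not the library’s actual code):

```python
def should_ask_human(mode: str, message: str) -> bool:
    """Sketch of when a userproxy-style agent pauses for human feedback,
    based on the three Human Input Mode settings described above."""
    if mode == "ALWAYS":
        return True                          # ask on every incoming message
    if mode == "TERMINATE":
        # only check in with the human when a termination message arrives
        return message.rstrip().endswith("TERMINATE")
    return False                             # NEVER: run fully autonomously
```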

If we scroll down below we have the following settings:

The System Message is like the instructions for the agent. For a userproxy agent we just leave these empty. The Model and Skills sections allow us to give the agent access to specific LLM models and skills that we can define in the other two tabs we looked at. Again, for the userproxy agent just leave everything blank as it’s sort of a special agent.

Whenever defining an agent, be careful with that last dropdown at the bottom “Or replace with an existing agent” as it is a quick way to overwrite all the settings you just made with whatever agent you select.

So go ahead and close up the userproxy without changing anything, and then let’s check out the visualization_assistant that is set as the receiver:

Again we have the name and description, Max Consecutive Auto Reply, and Human Input Mode settings, just leave them as is. The important stuff is below so let’s scroll on down:

AutoGen style prompts

The System Message here is very important and holds some useful clues about how AutoGen agents work. If we take some time to understand this right now, you will save a lot of time when you write the prompts for your own AutoGen agents later on. I’ll copy it here so we can take a moment to analyze what’s going on:

Your task is to ensure you generate a high-quality visualization for the user. Your visualizations must follow best practices and you must articulate your reasoning for your choices. The visualization must not have grid or outline box. The visualization should have an APPROPRIATE ASPECT RATIO e..g rectangular for time series data. The title must be bold. Importantly, if THE CHART IS A LINE CHART, you MUST ADD A LINE OF BEST FIT and ADD TEXT ON THE SLOPE OF EACH LINE. Note that today's date is 12/10/2023. At each point, do your best to determine if the user's request has been addressed and if so, respond with a summary. The summary must be written as a coherent helpful response to the user request e.g. 'Sure, here is result to your request '. The summary MUST end with the word TERMINATE. If the user request is  pleasantry or greeting, you should respond with a pleasantry or greeting and TERMINATE.

The initial instructions are much like you would expect for a typical prompt setup, telling the model what its role is and what it’s supposed to do. It gets detailed instructions on how to draw graphs and what features are desired.

The important part is the end. First of all the model is instructed to “determine if the user’s request has been addressed”, and if so, “respond with a summary”. So the model is encouraged to check if it is done with the task, which is important.

After that, it says the summary which will become the last message, “must be written as a coherent helpful response to the user request”. So we can see that this type of assistant will do well with the Summary Method of last that we discussed earlier.

Finally, it says that the summary must end with the word TERMINATE. This one is really important. Regard this as a special keyword in AutoGen agents. TERMINATE means done, so if you are writing the prompt for an agent of your own, make sure to instruct it to write TERMINATE when the task is done.
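Under the hood, this keyword convention is typically paired with a termination check that inspects each reply. A minimal version of such a check might look like this (a sketch, assuming messages are dicts with a `content` field, which is the shape the transcripts in this article suggest):

```python
def is_termination_msg(message: dict) -> bool:
    """Treat a reply as final when it ends with the TERMINATE keyword,
    the convention the agent prompt above instructs the model to follow."""
    content = (message.get("content") or "").rstrip()
    return content.endswith("TERMINATE")
```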

If you do not do this, the agents may just keep generating responses, and if you have multiple agents in a group they may even start thanking each other, not realizing they are done and forgetting all about their assignment and returning the response to you. This looks something like the following:

COEX_teacher (to COEX_groupchat_assistant):

Thank you for the feedback. I'm glad to hear that the updated comments are comprehensive and helpful for understanding the `get_random_date` function. If you have any more code that needs explaining or any other questions, feel free to reach out. Happy coding!

COEX_critic (to COEX_groupchat_assistant):

You're welcome! I'm glad I could help. If you have any more code that needs review or any questions in the future, don't hesitate to ask. Happy coding to you too!

COEX_teacher (to COEX_groupchat_assistant):

As an AI, I don't code myself, but I'm here to assist whenever you need help understanding code, learning programming concepts, or tackling coding challenges. Remember, I'm here to support you on your coding journey. Have a great day!

COEX_critic (to COEX_groupchat_assistant):

Thank you for the clarification. If you ever need assistance with code explanations, programming concepts, or guidance on coding challenges, feel free to reach out. Have a wonderful day and happy teaching!

I think you will agree that while this looks hilarious and silly, it is not very helpful and also wastes a lot of our tokens until the maximum auto-reply limit we set kicks in. This problem is called a gratitude loop and is one of the reasons for the specific “if you’re done, please end and write TERMINATE” instructions in the prompt.

Ok now let’s return to the visualization_assistant settings we still have open:

The Model setting has three models passed in here. As we saw when we checked out the Models tab earlier, these are actually empty settings that are not set up yet, but if their settings were filled in, they would be used in the order you put them in the list. So in this example, it would use GPT-4-1106-preview first; if that was unavailable for some reason it would use the second one as a fallback option, and if that was unavailable the third, etc.

The model will default to GPT-4-1106-preview, reading our API key from the Docker environment variable we set up earlier. Just leave the Model settings as is for now.
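The fallback ordering works roughly like this sketch, where the `models` are placeholder callables standing in for configured LLM connections (an illustration of the behavior, not AutoGen’s actual code):

```python
def call_with_fallback(models, prompt):
    """Sketch of model fallback: try each configured model in list order,
    moving on to the next one only if the current one is unavailable."""
    last_error = None
    for model in models:
        try:
            return model(prompt)
        except Exception as err:    # model unavailable, try the next one
            last_error = err
    raise RuntimeError("all configured models failed") from last_error
```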

Finally, we have the Skills section, which should have the generate_images skill passed in, but for some reason on my AutoGen, it is empty. So let’s add it by pressing add+ and then selecting the generate_images skill from the dropdown and using Add Skill:

Now that we have the generate_images skill in there, press OK to save the settings:

Then press OK again in the Workflow Specification window to save the Visualization Agent Workflow.

Time to have some fun!

Now let’s go to the Playground tab and test it out. Click + New and then select the Visualization Agent Workflow from the dropdown and press Create to start a new session:

Now let’s ask it to do something for us. I’m going to ask it to create me a cool image.

Create me an image of a giant and fearsome half dinosaur half robot T-Rex with a laser cannon on its back, in the middle of a major metropolitan city, destroying skyscrapers and shooting down fighter jets with its laser cannon.

So we give it a moment to work and then:

Now of course the image is pretty awesome:

But let’s take a look using this very simple example, and see the beauty and simplicity of what is going on here exactly:

First, the userproxy notifies the visualization_assistant of the task to be executed; it simply passes on our message to the assistant, as we explained.

userproxy ( to visualization_assistant )
Create me an image of a giant and fearsome half dinosaur half robot T-Rex with a laser cannon on its back, in the middle of a major metropolitan city, destroying skyscrapers and shooting down fighter jets with its laser cannon.

Now the visualization_assistant responds back to the userproxy. Notice how we have sort of a conversation going on here? The visualization_assistant reasons that it can use the generate_and_save_images skill that we gave it.

visualization_assistant ( to userproxy )
To create the image you requested, I will use the generate_and_save_images function from the file. This function will generate an image based on your description and save it to disk. Please execute the following Python code to generate the image:

    # filename:
    from skills import generate_and_save_images

    # Define the query for the image generation
    query = "A giant and fearsome half dinosaur half robot T-Rex with a laser cannon on its back, in the middle of a major metropolitan city, destroying skyscrapers and shooting down fighter jets with its laser cannon."

    # Call the function to generate and save the image
    generate_and_save_images(query)

Please run this script, and it will generate the image and print the filename where the image is saved.

Notice how it generates new code but does not execute the code it generates. It starts by importing the skill, adding our query in a variable, and then adding a call to the function. Once this code has been generated, it passes the code back to the userproxy with a request to “please execute the script”.

The userproxy now receives this code from the visualization_assistant, with the request to execute it. Remember, the userproxy only passes on our messages and executes code; those are its only functions. This is why it doesn’t need an LLM to run and why we left all those boxes empty earlier on.

userproxy ( to visualization_assistant )
exitcode: 0 (execution succeeded)
Code output:
Image saved to cb7bb0f9-d133-450d-93b6-ffa49476d9cd.png

The userproxy simply executes the code as requested by the assistant, and returns the exitcode to the assistant to indicate if there were any errors or not, plus whatever output the code execution returned.
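You can picture the userproxy’s execution step as something like the following sketch: write the generated code to a temporary script, run it, and report the exit code plus any output back to the assistant. This is a simplified stand-in for illustration, not what AutoGen actually does internally:

```python
import os
import subprocess
import sys
import tempfile

def execute_code(code: str) -> str:
    """Sketch of a userproxy-style execution step: run the generated Python
    code and report the exit code plus any output back to the assistant."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=60
        )
    finally:
        os.unlink(path)
    status = "execution succeeded" if result.returncode == 0 else "execution failed"
    return (
        f"exitcode: {result.returncode} ({status})\n"
        f"Code output:\n{result.stdout}{result.stderr}"
    )
```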

Now the visualization assistant correctly reasons the task has been accomplished, as the code ran successfully and returned the saved image. So it follows its prompt instructions to give a helpful response summary and then write the word TERMINATE to end the generating process.

visualization_assistant ( to userproxy )
Great! The image has been successfully generated and saved with the filename cb7bb0f9-d133-450d-93b6-ffa49476d9cd.png. You can now view the image to see the depiction of a giant and fearsome half dinosaur half robot T-Rex with a laser cannon on its back, in the middle of a major metropolitan city, destroying skyscrapers and shooting down fighter jets with its laser cannon.

If you need any further assistance or another image, feel free to ask. TERMINATE

If the userproxy code execution returned an error instead, the assistant would iterate on the code, figure out what went wrong, and request execution of the fixed code instead, until the task has been completed successfully, at which point the response summary and TERMINATE would be triggered. It’s like self-correcting code.
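The whole self-correcting conversation can be sketched as one short loop. Here the `assistant` and `executor` arguments are hypothetical callables standing in for the LLM agent and the userproxy; the real library wires this up for you:

```python
def run_until_done(task: str, assistant, executor, max_turns: int = 10) -> str:
    """Sketch of the loop described above: the assistant writes code, the
    userproxy executes it, and errors are fed back until TERMINATE appears."""
    message = task
    for _ in range(max_turns):            # the Max Consecutive Auto Reply cap
        reply = assistant(message)        # LLM call: new code or a summary
        if reply.rstrip().endswith("TERMINATE"):
            return reply                  # task done: return the summary
        message = executor(reply)         # run the code, feed back the result
    return message                        # cap reached without TERMINATE
```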

The whole thing has a very simple elegance to it, and now all we really do is scale this idea up to more complex tasks and teams. All bigger conversations basically follow the same patterns as above.

More examples

So let’s look at an example that AutoGen Studio has prepared and built in for us. You will see the four green buttons with suggestions at the bottom, go ahead and click the Stock Price button to run the stock price demo example:

This will ask AutoGen for a “chart of NVDA and TESLA stock price for 2023”. Pro tip: You can wait for the final response to appear in the AutoGen UI, or you can read along as the conversation runs in real-time in the terminal window where you started Docker.

When it’s done you’ll see something like this appear:

If you look at the conversation, you’ll see the exact same simple pattern as above, except there are a couple more messages. In my case the following happened:

  • userproxy passes the assignment to the assistant
  • assistant reasons it needs to install some libraries, so it gives the code to do this to the userproxy and asks it to execute it
  • userproxy executes the library install command successfully
  • assistant generates code to retrieve the stock data and plot it in a chart, and asks the userproxy to execute it
  • userproxy executes the code but runs into errors, so it returns the errors to the assistant
  • assistant reasons where it made mistakes and improves the code, then asks the userproxy to execute the improved code
  • userproxy executes the improved code successfully
  • assistant realizes the task has been completed successfully, so it gives a summary response and writes TERMINATE to end the conversation

Now your version may be slightly different in the details, but the overall pattern will be the same. It will make mistakes while it is going and then reason its way out of them and keep going until it’s done.

Note that if you just want to ask a very simple question, like having a normal and old-fashioned conversation with a chatbot, there is nothing stopping you from doing this. Try pressing the Markdown button example:

This will ask AutoGen to “List out the top 5 rivers in Africa and their length and return that as a markdown table. Do not try to write any code, just write the table”.

Very quickly you will have your response as this takes only a single ‘vanilla’ style ChatGPT call to answer:

So while AutoGen is mostly built to do more complex stuff, you can also just make a single call if that is what you want.

That’s it for part 2! Now that we really understand how AutoGen works and what style of prompts and interaction goes on between the agents, we’re ready to take things up a notch in part 3 and look at adding new functions, group chats, and even having the AI basically creating more powerful versions of itself! See you there!

👉 Go Back to the Full Course: Next Level Prompt Engineering with AutoGen Studio