Intro to AI Engineering with Python (4/6) – Rapid Prototyping and Demos Using Gradio

Hi and welcome back to part 4 of this tutorial series. In this part, we’ll look at using Gradio to quickly build an interface, giving us a working prototype of whatever we’re building with LLMs, APIs, or even machine learning models.

This part will be slightly different from the others in that we’ll take a bit of time to build a small project using everything we’ve learned so far, with Gradio as the new component. The project we’ll build is a simple silly story generator, and it should look like this by the end of this tutorial part:

We’ll have some fun with this, so the project is admittedly a bit silly in nature, but the knowledge applies equally to serious projects. It will show you the basics of quickly prototyping an idea or setup so you can share it with others, who can then interact with it through a simple interface.

Getting started

The first step is to install Gradio, as it is simply a Python library. So run the following command in your terminal:

pip install gradio
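If you want to quickly confirm the installation worked, you can print the installed version from the terminal (any reasonably recent version should be fine for this project):

python -c "import gradio; print(gradio.__version__)"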

Now that we have gradio installed, let’s get to it! Start by creating a new file in your project folder named gradio_project.py:

๐Ÿ“ Intro_to_AI_engineering
    ๐Ÿ“ output
    ๐Ÿ“„ .env
    ๐Ÿ“„ chat_gpt_request.py
    ๐Ÿ“„ generate_image.py
    ๐Ÿ“„ generate_speech.py
    ๐Ÿ“„ gradio_project.py     โœจ New file
    ๐Ÿ“„ langchain_basics.py
    ๐Ÿ“„ test_audio.mp3
    ๐Ÿ“„ text_to_summarize.txt
    ๐Ÿ“„ transcribe_audio.py

And inside gradio_project.py, we’ll get started with our imports:

import gradio as gr
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

These are all familiar imports from previous parts, except the gradio import. It’s a bit of a convention to use gr as the alias for gradio when importing it, so we’ll stick with that. Now that is nice and all, but as you can see from the project image I shared above, our prototype will also have an image.

Complete your imports by adding the following:

from generate_image import generate_image, download_and_save_image

load_dotenv()

We could also use LangChain to call DALL·E and generate our images that way, but in part 2 we already wrote reusable functions to generate an image and download it from the internet. We can simply import and reuse what we built before!

Finally, we call load_dotenv() as always, for the API key.
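If you skipped part 2 or don’t have those functions handy, here is a minimal sketch of what a compatible generate_image.py could look like. This is an assumption-laden stand-in, not the exact part 2 code: it uses DALL·E 3 through the official openai client, saves to a fixed file name in the output folder, and needs the requests library installed.

# generate_image.py -- minimal stand-in, assuming DALL-E 3 via the openai client.
# Your part 2 version may differ (e.g. unique file names per image).
from pathlib import Path

import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_image(description: str) -> str:
    """Generate an image from a text description and return its URL."""
    response = client.images.generate(
        model="dall-e-3",
        prompt=description,
        size="1024x1024",
        n=1,
    )
    return response.data[0].url


def download_and_save_image(url: str, folder: str = "output") -> str:
    """Download the image at the given URL and return the local file path."""
    Path(folder).mkdir(exist_ok=True)
    file_path = Path(folder) / "generated_image.png"
    file_path.write_bytes(requests.get(url, timeout=60).content)
    return str(file_path)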

Writing the prompts

As we can see from the project image featuring Muscle Kiwi Man above, we are going to need a prompt for ChatGPT to write us a silly story on a particular topic. We’ll also need an image, though, and if you remember our generate_image function, it takes a description of the desired image as input.

Writing this description ourselves would be very inconvenient, yet handing the whole short story to DALL·E would give it too much conflicting information about what needs to be in the image. This is why we’ll need a second prompt for ChatGPT, in which we’ll simply ask it to generate an appropriate description of the image for us, which can then go into our generate_image function.

Let’s start with the prompt for the story generation:

short_story_prompt = ChatPromptTemplate.from_template(
    """
    Please give me a short story where the main character is a(n) {word} and make it very bizarre. The story should have two to three paragraphs and proper formatting.
    Story:
    """
)

We use the same LangChain ChatPromptTemplate class we did before, making sure to put in a {word} placeholder for the main character of the story, which the user will provide for us. We also specify the desired length of the story and for it to have proper formatting (using paragraphs).

The ending just says Story: to invite the LLM to start writing the story right away, without any awkward introductions. Sometimes an LLM will answer with something like “Sure, I can help you with that” before starting the story, but we don’t want that in our case. After Story:, it wouldn’t make sense for ChatGPT to continue with anything other than the story itself.

Next, we’ll write the prompt to ask for a good image description to fit the story:

image_description_prompt = ChatPromptTemplate.from_template(
    """
    Please take the following story and describe the image that would go with it (Return a prompt for an image generator). Keep in mind the main character of this story is a(n) {word}.
    Story: {story}
    Your prompt for the image generator:
    """
)

So here we just ask ChatGPT to describe an image that would go with the story we provide, stating for clarity that we need a prompt for an image generator. We also remind it to keep the main character in mind, as it should be prominent in the image. There are two placeholders, {word} and {story}, and we end with the Your prompt for the image generator: trick again to avoid any unnecessary introductions in the response.

Setting up our model and the Chains

Now we can set up our ChatGPT model and the Chains we’ll use to generate the story and the image description. Let’s define the llm and output_parser first:

llm = ChatOpenAI(model="gpt-4o-mini")
output_parser = StrOutputParser()

No surprises there; these are the same as we’ve used before. Again, feel free to use the larger gpt-4o model if you prefer. Now we’ll set up the individual chains for the story and the image description:

story_chain = short_story_prompt | llm | output_parser
image_description_chain = image_description_prompt | llm | output_parser

We simply use the | pipe syntax to chain the prompt into the LLM and then into the output parser for both chains. Again, if you want to learn more intricate and powerful ways to combine chains and create systems, refer to the LangChain/LangGraph course we have at the Finxter Academy.
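If the pipe syntax still feels a bit magical, note that invoking a chain is roughly equivalent to calling .invoke() on each step yourself and passing the result along (a conceptual sketch only, not how you’d normally write it):

# Roughly what story_chain.invoke({"word": word}) does under the hood:
prompt_value = short_story_prompt.invoke({"word": "talking banana"})  # fill in the template
model_output = llm.invoke(prompt_value)  # AIMessage returned by ChatGPT
story = output_parser.invoke(model_output)  # extract the plain string content

As this system is pretty simple, we’ll just use a function to call the two chains in order: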

def run_chain(word: str):
    story = story_chain.invoke({"word": word})
    print("story:", story, end="\n\n")

    image_description = image_description_chain.invoke(
        {
            "word": word,
            "story": story,
        }
    )
    print("image description:", image_description)

    image_url = generate_image(image_description)
    image_file_path = download_and_save_image(image_url)

    return story, image_file_path

So the entire function takes a single input, word, which is a string. This is the only thing we need from the user to run this. Technically, they can enter more than a single word (e.g. “Crazy Pineapple”), but we’ll just name it word for now. We then get the story by calling .invoke on the story_chain, passing in a dictionary with the key and value for word.

There is a print statement in there, just so we can also see what is going on in the terminal, and we then invoke the second chain as well, passing in both the word parameter and the story parameter which resulted from the last step. Another print statement purely for the terminal follows.

Now we call our generate_image function from part 2. But as Gradio will need an actual image file to display, and not just an internet URL, we also use our download_and_save_image function, which saves the image for us. That function also returns the file path of the saved image, which we store in a variable.

Finally, we have a double return of both the story and the image_file_path, which is all that Gradio will need to display the story and the image in our demo prototype.
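Before building any interface, you can optionally sanity-check the pipeline from the terminal. This is a throwaway snippet (it spends API credits on one story and one image), and we’ll remove it again because the Gradio code below will take over the __main__ guard:

# Temporary test -- remove after verifying that the pipeline works.
if __name__ == "__main__":
    story, image_path = run_chain("pirate kiwi")  # any fun test input works
    print("Image saved to:", image_path)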

Building the Gradio interface

Now that we have everything we want to run in our demo, it’s time to see how we can use Gradio to build a quick and simple user interface so that we can share our product prototype with less technical people.

Gradio has several ways to build an interface, one of which is using Blocks. This is a pretty simple and intuitive layout system, and the basic idea is as follows (a minimal standalone example follows the list):

  • We start by creating a Blocks context using gr.Blocks(). This context will contain all the components of the interface.
  • Inside the Blocks context, you can add various components like text inputs, buttons, text outputs, and images, using classes like gr.Textbox(), gr.Button(), gr.Image(), etc.
  • We can organize the layout of these components using the containers gr.Row() and gr.Column() to arrange items horizontally (in rows) or vertically (in columns).
  • Each component can have various properties set, such as label, placeholder, autofocus, max_length, etc, (depending on what type of component it is) to customize its behavior and appearance.
  • We can define interactions between components using methods like submit_button.click(). This method specifies what should happen when a button is clicked.
  • Finally, we’ll have to launch the interface using .launch(), which will start a local web server so we can see the interface in our browser.
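To make this concrete before we build the real thing, here is a minimal, self-contained example (separate from our project) that wires a single input, button, and output together using exactly this pattern:

import gradio as gr


def greet(name: str) -> str:
    return f"Hello, {name}!"


with gr.Blocks() as hello_demo:
    name_input = gr.Textbox(label="Your name")
    greet_button = gr.Button("Greet me")
    greeting_output = gr.Textbox(label="Greeting")
    # When clicked, pass the textbox value into greet() and show the result
    greet_button.click(greet, inputs=name_input, outputs=greeting_output)

hello_demo.launch()

Run it and you get a working web page with those three components stacked vertically, which is the default layout when you don’t add any Rows.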

So let’s start by opening a Blocks context and setting up the basic layout:

if __name__ == "__main__":
    with gr.Blocks() as demo:
        with gr.Column():
            with gr.Row():
                ...  # Row 1 stuff here
            with gr.Row():
                ...  # Row 2 stuff here
            with gr.Row():
                ...  # Row 3 stuff here

I used an if __name__ == "__main__": guard to make sure the code only runs when we run the file directly, just like in previous parts. We open up the context by stating with gr.Blocks() as demo:, and then everything inside the indented block will be part of the interface. The ... ellipses are valid-Python placeholders that we’ll replace as we go (a comment alone wouldn’t be enough to keep the skeleton runnable).

If we look at the silly story generator one more time:

We can see that the items are laid out in a column from top to bottom, and inside this column, there are several rows. First, we have a title header (which we’ll add in a moment), then a row for the input, another row for the Generate button, and finally a row for the output, which holds the story and the image side by side.

First I’ll add the title header:

if __name__ == "__main__":
    with gr.Blocks() as demo:
        gr.Markdown("# Silly-o-Matic Story Generator Thing...")
        with gr.Column():
            with gr.Row():
                ...  # Row 1 stuff here
            with gr.Row():
                ...  # Row 2 stuff here
            with gr.Row():
                ...  # Row 3 stuff here

We use the gr.Markdown() component to add some markdown text. The # symbol denotes a header in markdown, so the text will be displayed as a big bold title in the interface.

Next, we’ll implement the first row:

if __name__ == "__main__":
    with gr.Blocks() as demo:
        gr.Markdown("# Silly-o-Matic Story Generator Thing...")
        with gr.Column():
            with gr.Row():
                word_input = gr.Textbox(
                    label="Who or what is the main character of your story?",
                    placeholder="e.g. 'a talking banana'",
                    autofocus=True,
                    max_length=40,
                )
            with gr.Row():
                ...  # Row 2 stuff here
            with gr.Row():
                ...  # Row 3 stuff here

The word input is simply a gr.Textbox() component; the label and placeholder are self-explanatory, and we set a max_length of 40 characters. The autofocus property is set to True so that the input field is automatically selected when the page loads, for user convenience.

Now replace the Row 2 placeholder with the following gr.Button:

submit_button = gr.Button(
    value="🤪 Generate a silly story for me! ٩(^ᗜ^ )و ( ≧ᗜ≦)´-( °ヮ° ) ?",
    variant="primary",
    size="lg",
)

The value property is the text that will be displayed on the button (you don’t have to copy all the weird emojis!). I set the variant to primary to give it a little bit of a different style from the rest of the page so it attracts attention, and size="lg" gives us the large size.

For the next row, we’ll have to add two components, as we want them side by side. Note that Gradio will handle the sizing automatically, so we don’t have to specify that each component should take half the width of the page or anything like that. Replace the Row 3 placeholder with the following:

story_output = gr.Textbox(
    label="Generated Story",
    lines=20,
    max_lines=40,
    show_copy_button=True,
)
image_output = gr.Image(
    label="Generated Image",
    show_download_button=True,
    show_share_button=True,
    show_fullscreen_button=True,
)

The gr.Textbox() component is used to display the generated story, and we set the lines and max_lines properties to 20 and 40 respectively to make it a bit larger. The show_copy_button=True option does exactly what it says on the tin and is enabled for convenience.

Next we add a gr.Image() named image_output. We set the label to “Generated Image” and enable the show_download_button, show_share_button, and show_fullscreen_button options that Gradio provides for us.
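As an aside, if you ever do want explicit proportions inside a Row, components accept a relative scale argument in recent Gradio versions; a hypothetical tweak (not part of our project) could look like this:

with gr.Row():
    # scale is relative: the textbox gets twice the width of the image
    story_output = gr.Textbox(label="Generated Story", scale=2)
    image_output = gr.Image(label="Generated Image", scale=1)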

We have two more things to finish this up. First, we need to define what happens when the button is clicked, and then we need to launch the interface. For clarity, I’ll provide the completed interface code here, with these two additions at the end:

if __name__ == "__main__":
    with gr.Blocks() as demo:
        gr.Markdown("# Silly-o-Matic Story Generator Thing...")
        with gr.Column():
            with gr.Row():
                word_input = gr.Textbox(
                    label="Who or what is the main character of your story?",
                    placeholder="e.g. 'a talking banana'",
                    autofocus=True,
                    max_length=40,
                )
            with gr.Row():
                submit_button = gr.Button(
                    value="๐Ÿคช Generate a silly story for me! ูฉ(^แ—œ^ )ูˆ ( โ‰งแ—œโ‰ฆ)ยด-( ยฐใƒฎยฐ ) ?",
                    variant="primary",
                    size="lg",
                )
            with gr.Row():
                story_output = gr.Textbox(
                    label="Generated Story",
                    lines=20,
                    max_lines=40,
                    show_copy_button=True,
                )
                image_output = gr.Image(
                    label="Generated Image",
                    show_download_button=True,
                    show_share_button=True,
                    show_fullscreen_button=True,
                )

        # Define button click action
        submit_button.click(
            run_chain, inputs=word_input, outputs=[story_output, image_output]
        )

    # Launch the interface
    demo.launch()

So we call the .click() method on the submit_button to define what will happen when the button is clicked. We first pass in the function that needs to be called; this is the run_chain function we defined earlier.

In the inputs parameter, we pass in the word_input component, which is the input field where the user will enter the main character of the story. We need Gradio to pass this text into the run_chain function for us when calling it.

The run_chain function returns two values, the story and the image_file_path. The outputs parameter is a list that maps where each of these outputs needs to be displayed respectively, so the story goes into the story_output component, and the image_file_path goes into the image_output component.

Finally, we call .launch() on the demo context to start the local web server and display the interface in our browser.

Now it is time for the big moment! Run the file in your terminal and you should see something like this:

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
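That share=True hint in the output is worth knowing about: Gradio can tunnel your locally running demo to a temporary public URL so you can show the prototype to someone who isn’t at your machine. If you want that, change the last line as shown below; note the link expires after a while, and your computer must keep running to serve it. For now, the local URL is all we need.

# Creates a temporary public link in addition to the local URL
demo.launch(share=True)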

You can Ctrl + Click on the URL to open it in your browser and tadaa:

Try something fun as input; it will generate really weird stories and very trippy images, such as Frodo and the singing cabbages:

And let’s not forget about Colonel T-rex and the time-traveling food critics from the future who come to taste his amazing Fried chicken before he turns into pasta-saurus:

I’ve played around with it and gotten many bizarre stories and images; it’s a lot of fun! More importantly, you can now apply this same knowledge to prototype and share more serious ideas and projects as well.

We did not do a full course on Gradio as it is a pretty basic concept, but I still wanted to include it in this introductory course. If you’d like to see more Gradio examples, you can check out the Voice-First Development: Building Cutting-Edge Python Apps Powered By OpenAI Whisper course, where we use Gradio to build three different transcription-related apps, including the podcast transcriber shown below:

You can also check out part 3 of the Hugging Face: Running Free and Open-Source LLMs Locally to Generate Text, Images, Speech, and Music on Your Machine course where we use Gradio to implement a chatbot interface running on a local LLM model completely on our own computer:

That’s it for this part. I’ll see you in the next one, where we look at running LLMs and AI models locally and also have a look at Hugging Face. See you there!
