(4/6) OpenAI API Mastery: Innovating with GPT-4 Turbo and DALL·E 3 – AI Image Edits and Variations

Welcome to part 4, where we’ll be looking at DALL·E 3 (hereafter simply DALL·E) and AI image edits and variations.

DALL·E 3 is OpenAI’s newest image generation AI, and the quality is stunning. Before we get started on the basics, let’s take a moment to discuss the pricing.

DALL·E 3 costs $0.04 per image at 1024×1024 and $0.08 per image at 1024×1792 or 1792×1024 in standard quality mode, or $0.08 and $0.12 per image respectively in HD quality mode.

While this is still affordable, it is a lot more expensive than the text generation models. It is a lot of fun to play around with, and I have to actively stop myself from wasting too much time generating spaghetti monsters and zombie versions of Spongebob, but be careful before you generate hundreds of images, as the costs do add up. (More info on size and quality later.)
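To put some quick numbers on that: 100 standard-quality 1024×1024 images come to 100 × $0.04 = $4.00, while the same 100 images in HD at 1792×1024 come to 100 × $0.12 = $12.00, three times the price.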

Ok so let’s just get started, and we’ll discuss the details as we go along!

Create a new folder named '4_DALLE' and create a new file called 'dalle_3.py' AND another file named 'utils.py' inside, like this:

    📁FINX_OPENAI_UPDATES (root project folder)
        📁1_Parallel_function_calling
        📁2_JSON_mode_and_seeds
        📁3_GPT4_turbo
        📁4_DALLE
            📄dalle_3.py
            📄utils.py
        📄.env

Image Download Utility

Let’s get started on the 'utils.py' file first.

As we’ll be receiving hyperlinks to the generated images as a response, and not the images themselves, we’ll make a quick download-image-from-link utility to save us from writing the same code several times.

We’ll take the link, then download and save the image inside the current '4_DALLE' folder as a convenient time-saver.

Inside utils.py start with the imports:

import requests
import datetime
from pathlib import Path

current_directory = Path(__file__).parent

We use requests to make an HTTP request to the image link, and we’ll use datetime to generate a unique image filename based on the current time so we don’t overwrite or have conflicts between images.

We import Path from pathlib and use the trick we used before to get the directory of the file we’re currently running, so we can easily save the images in the same folder.

Now our simple utility:

def image_downloader(image_url: str | None):
    if image_url is None:
        raise ValueError("No image URL returned from API.")
    response = requests.get(image_url)
    if response.status_code != 200:
        raise ValueError("Could not download image from URL.")
    # Get current time in format YYYY-MM-DD-HH-MM-SS-microseconds
    current_time: str = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f")
    with open(f"{current_directory}/{current_time}.png", "wb") as file:
        file.write(response.content)

The function will take a string with the URL or None in case the API didn’t return an image somehow.

If there is no link we raise a ValueError.

We then make a request to the URL and check if the response status code is 200, which means the request was successful. If it wasn’t we raise another ValueError.

If the request is successful we generate a string with the current time, using strftime, in the format YYYY-MM-DD-HH-MM-SS-microseconds (%f is microseconds), and save the image to the current directory with that string as the filename.

We use the 'wb' flag to write the image in binary mode as it’s not a text file but an image format.

Go ahead and save and close that, now every time we want to download an image we can just call image_downloader(image_url) and it will download and save the image for us in the current folder.

DALL·E 3

Now go to your 'dalle_3.py' file and add the imports and client variable:

from decouple import config
from openai import OpenAI
from utils import image_downloader

client = OpenAI(api_key=config("OPENAI_API_KEY"))

Now let’s create a simple function to make a call to the API:

def dalle_3(query: str):
    response = client.images.generate(
        model="dall-e-3",
        prompt=query,
        size="1024x1024",
        quality="standard",  # standard or hd
        n=1,
    )
    image_url = response.data[0].url
    print(response)
    image_downloader(image_url)
    return image_url

We take a string as the query and make a call to the API, setting the model to 'dall-e-3' and the prompt to our query.

We set the size to 1024×1024 and the quality to standard. The n parameter is the number of images to generate, but you cannot generate more than 1 image at a time with DALL·E 3 for now, which is why n is set to 1.

We then get the image_url, which is located at response.data[0].url, and print the full response object.

After that, we call our image_downloader function with the image_url, so the image will appear in our directory automatically, and return the image_url.

If you need multiple images you can make multiple calls in parallel, as long as each is a separate call requesting 1 image. As we have no need for this, and to keep your costs to a minimum, we’ll stick with single images for these tutorials.
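If you ever do want to fire off several single-image requests at once, a minimal sketch using Python’s standard-library ThreadPoolExecutor might look like this (the dalle_3_batch helper is my own illustration, not part of the project files):

from concurrent.futures import ThreadPoolExecutor

def dalle_3_batch(queries: list[str]):
    # Each worker thread makes a separate API call requesting exactly 1 image,
    # staying within DALL-E 3's one-image-per-call limit
    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(dalle_3, queries))

Each query becomes its own call and the image URLs come back in the same order. Keep the pricing from earlier in mind before you pass in a long list!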

So let’s generate an image:

dalle_3("a dragon flying in the night sky breathing fire, mystical and magical.")

Go ahead and run your file and here is what I got:

Pretty epic, right? (The above uses the quality="standard" setting.)

Note that you can set the quality to "hd", but you may not really need to. Both look very good. For reference, I ran the above function again with nothing changed except the setting quality="hd" and here is what I got:

Pretty epic, but not a huge difference between the first and second images in terms of quality; they are both very good. The first image will cost you $0.04 and the second $0.08.

Automatic Prompt Revision

One interesting fact is that behind the scenes your prompt is actually rewritten because very detailed prompts tend to give much better output in these models.

They probably have a special fine-tuned version of ChatGPT behind the scenes that turns our basic prompt query into something much more detailed, which is why the output is so nice. It also does some safety checks of course, like making sure you’re not trying to create images of Trump on the moon riding a dinosaur, not that you were thinking of doing that, hey! 😉

This revised version of your prompt query is actually returned in the response in case you’re curious to see it. It’s located in:

response.data[0].revised_prompt

And I’ve gone ahead and looked in there for our dragon image query to give you an idea of how your queries are changed. The query we sent was:

"a dragon flying in the night sky breathing fire, mystical and magical."

But the actual query that OpenAI fed into their model was:

Picture this: a wondrous, majestic dragon with scales reflecting the moonlight, soaring effortlessly through the inky eldritch blackness of the night sky. The creature, bathed in silhouettes from the luminescent stars, unveils its fearsome jaws as it exhales a wild, fiery blaze. The fire thrown from its mouth dances fiercely against the night, creating a magical spectacle of light and shadows. This scene captures the epitome of mystical realm, illuminating the dark expanse, marking its territory with its mesmerizing pyro display.

You can see this is very long and very descriptive, though it does align well with our original query and request.

These kinds of very long and descriptive prompts, and user prompts being automatically rewritten in this fashion, are quite common for high-end image generation models.

Using a Basic Non-Rewritten Prompt

In case you don’t want your query to be rewritten, OpenAI actually gives you a way to avoid this, by stating specifically in the request that you want to use the tool with simple prompts, asking it not to add any detail.

So let’s create version 2 of our function below that allows the user to choose if they want a basic prompt or one that is auto-revised:

def dalle_3_w_prompt_choice(query: str, basic_prompt=False):
    basic_request = """
    I NEED to test how the tool works with extremely simple prompts.
    DO NOT add any detail, just use it AS-IS:
    """
    final_query = basic_request + query if basic_prompt else query
    response = client.images.generate(
        model="dall-e-3",
        prompt=final_query,
        size="1024x1024",
        quality="standard",  # standard or hd
        n=1,
    )
    image_url = response.data[0].url
    print(response)
    print(f"Revised prompt: {response.data[0].revised_prompt}")
    image_downloader(image_url)
    return image_url

We have a variable named 'basic_request' which contains the exact text OpenAI recommends using if we want the basic, non-rewritten prompt behavior.

We then have a final_query variable, which is the basic_request prepended to our query if the user wants a basic prompt, or just the query by itself if they don’t.

All the rest is the same. So if we set basic_prompt to False our image will get generated as before, but if we set it to True we’ll get a basic prompt.

We also added a print statement that prints the response.data[0].revised_prompt key, so you can see the revised prompt, or confirm that your input was not revised if you set basic_prompt to True.

I’m going to do another call to the API with the basic prompt setting set to True, so we can see the difference in image quality:

dalle_3_w_prompt_choice(
    "a dragon flying in the night sky breathing fire, mystical and magical.",
    basic_prompt=True,
)

Dragon with the same query, quality standard, and basic_prompt = True:

Still looks pretty good to me! And in our terminal, we can see the printed revised prompt is exactly our input, meaning the model did not change it at all:

Revised prompt: a dragon flying in the night sky breathing fire, mystical and magical.

So all of those images are pretty insane and epic, but what about the rights? Can I use these images as I please?

The official answer from OpenAI’s help center is as follows:

Subject to the Content Policy and Terms, you own the images you create with DALL·E, including the right to reprint, sell, and merchandise regardless of whether an image was generated through a free or paid credit.

So yes, you can use these for your blog or whatever you wish to use them for.

Image Edits and Variations

Next, I want to take a brief look at some features that are not yet available in DALL·E 3 but are available in DALL·E 2.

These will undoubtedly come to version 3 at some point and are extremely powerful, which is why I want to take a moment to prepare us for what is coming in the future!

So let’s go ahead and create a new file named 'edits_and_variations.py' in our '4_DALLE' folder:

    📁FINX_OPENAI_UPDATES (root project folder)
        📁1_Parallel_function_calling
        📁2_JSON_mode_and_seeds
        📁3_GPT4_turbo
        📁4_DALLE
            📄dalle_3.py
            📄edits_and_variations.py
            📄utils.py
        📄.env

And inside 'edits_and_variations.py' add the imports:

from decouple import config
from pathlib import Path
from openai import OpenAI
from utils import image_downloader

Just our basic imports plus the image_downloader utility we made. Now for our basic setup:

client = OpenAI(api_key=config("OPENAI_API_KEY"))
current_directory = Path(__file__).parent

AI Image Edits

We’ll start with the AI image edits first.

The idea here is that we have an original image, and we want to edit some features of that image, change it in some way.

To illustrate this idea we will be using an image from Unsplash, a free stock photo site with some really nice images. I’ve chosen this image of a woman standing on top of a sand dune:

Thanks to the user 'NEOM' on Unsplash for this cool image. It’s under the Unsplash license, so we don’t have to worry about copyright. I’ve quickly made the image square for simplicity; you can download the version I will use by saving the image below:

In order to have the AI edit a certain part of this image, we must have a masked version of the image, though I’m sure there will be easier ways to do this for end-users without ‘Photoshop’ skills in the future.
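If you want to make a mask yourself, here is a minimal sketch using the Pillow library (pip install Pillow). Note that this make_mask helper and its box coordinates are my own illustration, not part of the OpenAI API or the project files:

from pathlib import Path
from PIL import Image

current_directory = Path(__file__).parent

def make_mask(source_name: str, box: tuple[int, int, int, int]):
    # Open the original and make sure it has an alpha (transparency) channel
    image = Image.open(f"{current_directory}/{source_name}").convert("RGBA")
    # Paste a fully transparent rectangle over the area DALL-E should redraw
    left, top, right, bottom = box
    hole = Image.new("RGBA", (right - left, bottom - top), (0, 0, 0, 0))
    image.paste(hole, (left, top))
    image.save(f"{current_directory}/edit-masked.png")

make_mask("edit-original.png", box=(400, 100, 650, 400))

The transparent pixels mark the area the model is allowed to repaint; everything else stays untouched.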

For your convenience, I’ve already made a masked version of the image, which you can download below:

This is nothing but a PNG file with a transparent hole where we want the image to be edited. Make sure you save both the 'edit-original.png' and 'edit-masked.png' in your '4_DALLE' folder like so:

    📁FINX_OPENAI_UPDATES (root project folder)
        📁1_Parallel_function_calling
        📁2_JSON_mode_and_seeds
        📁3_GPT4_turbo
        📁4_DALLE
            📄dalle_3.py
            📄edits_and_variations.py
            📄utils.py
            📄edit-original.png
            📄edit-masked.png
        📄.env

Now let’s create a function to make a call to the API and request an edit based on the original image and the mask:

def dalle_editor(image_path: str, masked_image_path: str, prompt: str):
    # Open both files in binary mode and pass them to the edits endpoint
    with open(image_path, "rb") as image, open(masked_image_path, "rb") as mask:
        response = client.images.edit(
            model="dall-e-2",
            image=image,
            mask=mask,
            prompt=prompt,
            n=1,
            size="1024x1024",
        )
    image_url = response.data[0].url
    image_downloader(image_url)
    return image_url

We create a function that takes a path to both the image and the masked image, plus a string for the prompt. This time we call client.images.edit and pass in the DALL·E 2 model. We open the image and masked image in binary mode ('rb') and pass them in as the image and mask parameters, then pass in the prompt, set n (the number of images) to 1, and set size="1024x1024".

We then get the image_url from the response and call our image_downloader function with the image_url to download it, and finally we return the image_url.

One important point to keep in mind here is that we must describe the end result picture we want in our prompt text, and that includes features already present in the original image.

Notice in the prompt below we describe the sand dune in the middle of the desert with a clear blue sky, keeping congruent with the original base image:

dalle_editor(
    image_path=f"{current_directory}/edit-original.png",
    masked_image_path=f"{current_directory}/edit-masked.png",
    prompt="Picture of a sanddune in the middle of the desert with a scary alien monster standing at the top of the sanddune. Desert, with a clear blue sky. Fierce strong muscular monster alien.",
)

I’m not going to wrap the call in a print statement to print the image link (but you may if you like), as our function already auto-saves the image to the current directory for us.

Go ahead and run this and see what image pops up in your folder, here is what I got:

Now as you can see, this result is really not that good. That’s because the publicly available DALL·E 2 API is frankly not that good. But imagine this with the quality of the DALL·E 3 API and you will realize that this type of image editing is going to be a big deal in the future, which is why we’re taking a look at it now.

The potential is enormous and I think it’s clear that in the future the authenticity of any and all images will be up for discussion, as image manipulation will no longer be limited to those with elite ‘Photoshop’ skills.

In the future users will just describe which feature or area to replace or perhaps draw a circle using their mouse cursor in some simple web interface to create the masked area, and the AI will do the rest.

Make sure you comment out the dalle_editor() function call so it won’t keep getting called as we continue with the next example below.

Image Variations

Another use for AI image manipulation is to generate variations of an existing image, making similar images in the same style.

The syntax for this is fairly familiar, and for this example I will use the same 'edit-original.png' image you already have in your folder from the previous example.

We’ll keep coding in the 'edits_and_variations.py' file, adding a variations function below:

def dalle_variation(img_path: str, number_of_variations: int):
    # Open the source image in binary mode and request variations of it
    with open(img_path, "rb") as image:
        response = client.images.create_variation(
            model="dall-e-2",
            image=image,
            n=number_of_variations,
            size="1024x1024",
        )
    image_url = response.data[0].url
    image_downloader(image_url)
    return image_url

This is basically the same setup as above except we take a number_of_variations argument and then call client.images.create_variation.

We only need the original image and the number of variations we want to generate. Note that there is no prompt, as this endpoint’s only function is to generate variations. Also note that the function above only downloads the first image in the response; see the sketch below for handling more than one.
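If you pass a number_of_variations greater than 1, response.data will contain one entry per variation, so a small tweak (shown here as a sketch, not part of the project files) would save all of them:

def dalle_variations_all(img_path: str, number_of_variations: int):
    with open(img_path, "rb") as image:
        response = client.images.create_variation(
            model="dall-e-2",
            image=image,
            n=number_of_variations,
            size="1024x1024",
        )
    # response.data holds one object per generated variation
    urls = [item.url for item in response.data]
    for url in urls:
        image_downloader(url)  # save each variation to the current folder
    return urls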

So go ahead and give this a spin:

dalle_variation(
    img_path=f"{current_directory}/edit-original.png",
    number_of_variations=1,
)

And here is what I got:

Now the result of this is surprisingly good, considering we’re only using the DALL·E 2 API, which is nowhere even close to the quality of version 3.

You can see the image is an extremely similar variation to the original, and it tells the same story of a person walking off into the distance to the top of a sand dune.

As these editing and variation functions come to version 3 in the future they will be very powerful, and with high quality, they will definitely be worth using. I can already see future jailbroken or open-source versions of this technology providing us with endless pictures of Trump in space and Elon Musk dancing on stage like Elvis.

Now that we have taken a brief look at the current and future states of AI image generation and editing, let’s move on to the next part where we’ll be looking at the current state of AI speech generation, which is also taking shocking leaps from where it used to be.

👉 I’ll see you soon in part 5!