AI Meme Engineer (3/4) – Meme Image Generation

Welcome to part 3, where we will be getting started on the image generation logic. The code for this is not too bad but it does require some explanation. We’ll make sure to go over it in detail.

The first thing we’ll need is Pillow!

No, not that kind of pillow. Pillow is the actively maintained fork of PIL (the Python Imaging Library), and it adds image processing capabilities to Python. Let’s install it first by running this command in your terminal:

pip install pillow

Starting on the meme image editor

With that out of the way, let’s get coding! First create a new file called meme_image_editor.py in your project root folder:

πŸ“ Meme_Gen
    πŸ“ fonts
        πŸ“„ ARIAL.TTF
        πŸ“„ ComicSansMS3.ttf
        πŸ“„ impact.ttf
    πŸ“ output
    πŸ“ templates
        πŸ–Ό (all the images in here...)
    πŸ“„ .env
    πŸ“„ get_meme.py
    πŸ“„ load_meme_data.py
    πŸ“„ meme_data.json
    πŸ“„ meme_image_editor.py  --> create this file
    πŸ“„ system_instructions.py

Now inside this new file, let’s start by importing the necessary modules:

import textwrap
from pathlib import Path
from uuid import uuid4

from PIL import Image, ImageDraw, ImageFont

from load_meme_data import MemeData

First up we have textwrap. This module is part of the Python standard library and provides utilities for formatting and wrapping text. It is particularly useful for making sure that text fits within a certain width, which will be perfect for us when overlaying text on our images.

Next, we have Path and uuid4. Path is a class from the pathlib module that represents the path to a file or directory, our output image folder for example. uuid4 is a function from the uuid module that generates a random UUID (Universally Unique Identifier) which we will use to name our output images so they don’t overwrite each other.

Finally, we have Image, ImageDraw, and ImageFont from the Pillow library. These classes will allow us to work with the image data, and you will see them used in action later.

Next up, let’s define some constants:

ROOT_DIRECTORY = Path(__file__).resolve().parent
IMAGE_FOLDER = ROOT_DIRECTORY / "templates"
FONT_FOLDER = ROOT_DIRECTORY / "fonts"
OUTPUT_FOLDER = ROOT_DIRECTORY / "output"
LINE_HEIGHT_MULTIPLIER = 1.4

First we get the ROOT_DIRECTORY by giving Path the __file__ attribute of the current module (meme_image_editor.py) and calling .resolve() to turn it into the absolute path to this file. We then use .parent to get the directory that contains this file, which is the root directory of our project.

Now we can easily create paths to our templates, fonts, and output folders by using the / operator to concatenate the ROOT_DIRECTORY with the folder name.
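If the / operator on paths is new to you, here is a small sketch of what it does. The root folder below is made up for this example only, and I’m using PurePosixPath instead of Path purely so the printed output looks the same on every operating system. In our real file we stick with Path.

```python
from pathlib import PurePosixPath

# Hypothetical project root, used only for this illustration.
root_directory = PurePosixPath("/home/user/Meme_Gen")

# The / operator joins path segments and returns a new path object.
font_folder = root_directory / "fonts"
font_file = font_folder / "impact.ttf"

print(font_file)         # /home/user/Meme_Gen/fonts/impact.ttf
print(font_file.name)    # impact.ttf
print(font_file.parent)  # /home/user/Meme_Gen/fonts
```

The nice part is that we never have to think about forward slashes versus backslashes ourselves.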

The last constant LINE_HEIGHT_MULTIPLIER will be used later to make sure the text has the correct height for the box coordinates it’s supposed to fit in. It’s declared here so we can easily adjust the number later if needed, but I’ve found 1.4 to work very well through trial and error.

Utility functions

We’ll create a couple of utility functions before we get into the main logic. First of all, most memes are typed IN ALL CAPITAL LETTERS, but not all of them. For the UNO Draw 25 meme, for example, I prefer the comic sans font in lowercase. We haven’t asked ChatGPT to account for this and we don’t have to as this is very easy to fix:

def handle_text_caps(font_name: str, text: str) -> str:
    if font_name == "impact.ttf":
        return text.upper()
    elif font_name == "ComicSansMS3.ttf":
        return text.lower()
    return text

Our utility function handle_text_caps takes in the font_name and the text as strings. If the font is Impact I want the return to be in ALL CAPITAL LETTERS. If the font is Comic Sans, I want the return to be in all lowercase. For Arial I want the text to be as is, so there is no need to name it as it will just pass through this function unchanged.

Note: If you did use extra fonts or different names, make sure to use the correct names for your setup here.
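If you end up with more than a handful of fonts, one possible alternative is to keep the rules in a dictionary instead of a growing if/elif chain. This is just a sketch of that idea, not something the rest of the tutorial depends on. The name apply_case_rule and the CASE_RULES dictionary are made up for this example, and the font file names are assumed to match the ones in the fonts folder shown earlier.

```python
# Map each font file name to the case transformation it should get.
# The file names are assumptions; use the names from your own fonts folder.
CASE_RULES = {
    "impact.ttf": str.upper,
    "ComicSansMS3.ttf": str.lower,
}


def apply_case_rule(font_name: str, text: str) -> str:
    # Fonts without a rule (Arial, for example) keep their text unchanged.
    transform = CASE_RULES.get(font_name, lambda unchanged: unchanged)
    return transform(text)


print(apply_case_rule("impact.ttf", "One does not simply"))  # ONE DOES NOT SIMPLY
print(apply_case_rule("ARIAL.TTF", "One does not simply"))   # One does not simply
```

Adding a new font then only means adding one line to the dictionary.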

The next thing we’ll need is a utility function which gives us a rough estimate of how wide a typical character is for a particular font at a particular size.

Why do we need to know this? Well, the textwrap function that we will use later requires a width for each line of text before it should wrap to the next line. This is not measured in pixels but in the number of characters before the line should wrap. So we need to know the width of a character in a particular font and size to calculate how many characters can fit inside of our pixel coordinates for the textbox.
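To make that concrete, here is a tiny worked example with made-up numbers. The box width and character width are assumptions chosen to keep the arithmetic easy; in the real code they come from the meme data and from Pillow.

```python
import textwrap

box_width_px = 300   # hypothetical text box width in pixels
char_width_px = 15   # hypothetical width of one character at the current font size

# Integer division gives the number of whole characters that fit on one line.
max_chars_per_line = box_width_px // char_width_px  # 20

lines = textwrap.wrap(
    "One does not simply fit long text in a box", width=max_chars_per_line
)
for line in lines:
    print(line)
```

No line in the result is longer than 20 characters, which at 15 pixels per character should roughly stay inside our 300 pixel box.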

So add the following function:

def get_char_width_in_px(font: ImageFont.FreeTypeFont, font_name: str) -> int:
    representative_character = handle_text_caps(font_name, "A")
    char_left_top_right_bottom = font.getbbox(representative_character)
    return char_left_top_right_bottom[2] - char_left_top_right_bottom[0]

We named the function get_char_width_in_px and it takes in a font object of type ImageFont.FreeTypeFont and a font_name as a string. ImageFont.FreeTypeFont is just the name of the class that ImageFont uses to represent a font. We will get this type of object later in our main logic when we load the font and then pass it into this function.

First we get a representative character for the font. In this case, we use the letter “A”, either as a capital letter or not, using our handle_text_caps function to decide. Different letters have different widths, so using A is just a rough estimate, but it works well enough for our purposes.

The ImageFont.FreeTypeFont has a method called getbbox (get bounding box) which returns a tuple of 4 numbers representing the left, top, right, and bottom coordinates of the bounding box of the character. So basically four numbers that mark the edges of the box around the character, from which we can work out its width and height.

We return the right coordinate (index 2) minus the left coordinate (index 0). This gives us the width in pixels of a single letter “A” (or “a”) for this font at this size.

The next utility function is going to be very easy! We need a function to generate unique filenames for our output images. Add this function to your code:

def get_unique_filename() -> Path:
    return OUTPUT_FOLDER / f"{uuid4()}.png"

This function is called get_unique_filename and it returns a Path object. We use the uuid4 function to generate a random UUID and then concatenate it with the .png extension and the OUTPUT_FOLDER path to create a unique filename for our output images. This path will look something like ‘output\0d2e83d3-f0d9-4a53-a270-dd8bde38d207.png’.
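Here is a quick sketch you can run to see the idea in isolation. The relative "output" folder and the use of PurePosixPath are assumptions for this example only, so the output looks the same on any operating system.

```python
from pathlib import PurePosixPath
from uuid import uuid4

output_folder = PurePosixPath("output")  # hypothetical folder for this sketch

first = output_folder / f"{uuid4()}.png"
second = output_folder / f"{uuid4()}.png"

print(first)            # a random name, different on every run
print(first != second)  # two calls give two different names
```

Because every call to uuid4 produces a new random value, we can generate as many memes as we like without worrying about overwriting an older one.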

Next up is a utility function to calculate the total height of a wrapped text. So say we have some text that is wrapped over 4 lines total, we need to know the exact pixel height for those lines of text combined for a given font and font size.

Why do we need to know the exact height of a block of several lines of wrapped text? Well, we have the x, y, width, and height coordinates of where the text is supposed to go in our meme data. We need a way to test if the height of this block of text at the current font size will fit within the height we defined for the textbox in our meme_data.json file.

Let’s take it one step at a time though, this function is only going to be concerned with calculating the total height of the block of text and returning the value:

def calculate_text_height(
    drawing, lines: list[str], font: ImageFont.FreeTypeFont
) -> int:
    total_text_height = 0
    for line in lines:
        bbox_for_line = drawing.textbbox((0, 0), line, font=font)
        bbox_top = bbox_for_line[1]
        bbox_bottom = bbox_for_line[3]
        total_text_height += bbox_bottom - bbox_top
    return int(total_text_height * LINE_HEIGHT_MULTIPLIER)

Let’s go over this as it may seem a bit confusing to read, starting with the input arguments. We have drawing which is an ImageDraw object we will pass in from the main function later. This ImageDraw object allows us to draw stuff, in this case test text just to see what the height will be. The second argument is lines which is a list of strings, each string being a line of text that has been wrapped by the textwrap module. So imagine something like:

lines = [
    "This is the meme",
    "text that we are",
    "going to overlay",
    "on the image"
]

The final argument is the font object of type ImageFont.FreeTypeFont that we saw in one of our previous functions as well which holds settings for the current font name and size. Again, we will pass all these arguments in from the main function later when we call this utility function.

We’re going to start by initializing the variable total_text_height to 0. We then loop over each line in the lines list and get the bounding box for that line of text using the drawing.textbbox method. This method returns a tuple of 4 integers representing the left, top, right, and bottom coordinates of the bounding box of the text. We then get the top and bottom coordinates from this tuple, storing them in the variables named bbox_top and bbox_bottom to keep the code readable.

We then calculate the height of the line of text by subtracting the top coordinate from the bottom coordinate and adding this value to the total_text_height variable, doing this for each line in the text. Finally, we multiply the total height by the LINE_HEIGHT_MULTIPLIER constant we defined earlier to account for the empty space between the lines and not just the height of the letters themselves. This gives us a good enough estimate of the height of the text block if it were to really be drawn on the image, so it acts as a ‘height test’ if you will.
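To see the arithmetic without needing Pillow or a font file, here is the same calculation with made-up line heights. The four numbers are assumptions standing in for the bottom-minus-top values Pillow would measure for each line.

```python
LINE_HEIGHT_MULTIPLIER = 1.4

# Made-up pixel heights for four wrapped lines
# (bottom minus top of each line's bounding box).
line_heights = [30, 28, 31, 30]

raw_height = sum(line_heights)  # 119
estimated_block_height = int(raw_height * LINE_HEIGHT_MULTIPLIER)

print(raw_height, estimated_block_height)  # 119 166
```

So the letters themselves take up 119 pixels, and the multiplier adds roughly another 47 pixels for the gaps between the lines.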

Main function

Next up is the main function that will bind together the logic and all these utility functions to successfully overlay text on an image. It is a reasonably long function which makes it a bit harder to explain in written format.

I’ll show the function in bits and pieces and then after explaining everything we’ll look at the whole thing all put together one more time. It might be good to just read over the piece-by-piece explanation first and then look at and start copying the whole function at the end when we repeat it.

Let’s get started:

def overlay_text_on_image(meme: MemeData, texts: list[str]) -> Path:
    font_name: str = meme["font_path"]
    texts = [handle_text_caps(font_name, text) for text in texts]
    image_path: Path = IMAGE_FOLDER / meme["file_path"]
    font_file: str = str(FONT_FOLDER / font_name)
    bounding_boxes: list[list[int]] = meme["text_coordinates_xy_wh"]

We define a function overlay_text_on_image which takes in a MemeData object and a list of strings. The MemeData object is one of the 11 that we have in our list of memes in the meme_data.json file, and the list of strings is the text that ChatGPT generated for us to overlay on the image.

We have all the data inside of our meme object so we can just access the ["font_path"] to get the name of the font. We then use a list comprehension to loop over each text in the texts list and pass it through our handle_text_caps function to make sure it is in the correct case for the font, saving the result as texts.

We get the image_path to the template image by concatenating the IMAGE_FOLDER path with the file_path key from the meme object, and then also get the path to the font file by concatenating the FONT_FOLDER path with the font_name key from the meme object.

Note that the image_path is a Path object but in the case of font_file we have converted the path into a string using str(). This is because the function from Pillow which will load the font for us requires a string version of the path and not a Path object.

Finally, we get the bounding_boxes list from the meme object which contains the x, y, width, and height coordinates for each text box that we are going to overlay text in. Again, this is a list of lists, each inner list containing 4 integers like we set up in our meme_data.json file.

First we’re going to open up our image:

    with Image.open(image_path) as img:
        draw = ImageDraw.Draw(img)

        for bounding_box, text in zip(bounding_boxes, texts):
            # We run a bunch of code here

We open the image using the Image.open method from the Pillow library and use a context manager to ensure that the image is properly closed after we are done with it. This means that all indented code after this line will be executed with the image open as img.

We then create an ImageDraw object called draw which we will use to draw text on the image. The ImageDraw.Draw function is just an interface provided by Pillow for 2D drawings, so draw is now an object that we can use to draw on the image using Pillow.

Next, we use the zip function. As not everybody will be familiar with it, here is the basic idea:

The built-in zip function in Python takes two or more lists and combines them into pairs, producing an iterable of tuples. Think of it like zipping up a jacket, where each side of the zipper is a different list. For example, if you have [🍎, 🍌, 🍍, 🍓] in one list, and [🚗, 🚲, 🚢, ✈️] in another list, using zip will pair them up like this: [(🍎, 🚗), (🍌, 🚲), (🍍, 🚢), (🍓, ✈️)]. So we first get the first item of both lists, then the second, and so on.

So in our code above, we zip the bounding boxes and the texts together. This means that we will get pairs of (bounding_box, text) where bounding_box is a list of 4 integers and text is the string supposed to go inside of that particular bounding box’s coordinates. We then have a for loop that will loop over each pair of bounding-box-and-text, so each entry in our list will be processed one by one. Coming back to the example above, we’d have one loop for (🍎, 🚗), then one loop for (🍌, 🚲), etc, etc…
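Here is a small runnable sketch of that pairing, using made-up bounding boxes and texts in the same shape as our meme data. I’ve added one text too many on purpose to show something that is easy to miss.

```python
# Hypothetical data in the same shape as our meme data: [x, y, width, height].
bounding_boxes = [[10, 10, 300, 100], [10, 400, 300, 100]]
texts = ["TOP TEXT", "BOTTOM TEXT", "EXTRA TEXT WITHOUT A BOX"]

pairs = list(zip(bounding_boxes, texts))
for bounding_box, text in pairs:
    print(bounding_box, text)

# zip stops at the shortest input, so the third text is silently dropped.
print(len(pairs))  # 2
```

In other words, if ChatGPT ever returns more texts than the meme has boxes, the extra ones are simply ignored rather than causing an error.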

Now let’s continue with the code, I’ll repeat the for loop line we ended the last block with for clarity:

        for bounding_box, text in zip(bounding_boxes, texts):
            x, y, box_width, box_height = bounding_box

            font_size = 8
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )
            total_text_height = calculate_text_height(draw, lines, font)

First we unpack the bounding_box list into 4 variables: x, y, box_width, and box_height. These are the coordinates for the text box that we are going to overlay the text in, and having them properly named like this makes the code more readable.

Now we may have a very long meme text that needs to fit inside these coordinates or it may be a super short text like "Me". So our first goal here is to find the maximum font size that will fit inside of the bounding box coordinates we have. We’re going to start with a font_size of 8 and then later we’ll start increasing it until the text no longer fits inside the box to find the maximum size.

We first create a font object using the ImageFont.truetype method from Pillow. This method takes in the path to the font file as a string and the font size as an integer. Next, we’re going to use the textwrap.wrap method to wrap the text into lines that will fit inside the box width.

The break_long_words=False argument means that the textwrap module will not break long words in the text, but instead will wrap the text at the nearest space character. The width argument is the number of characters that can fit inside the box width, which we calculate by dividing the total box width by the width of a single character in the font using our own get_char_width_in_px function.

Then we calculate the total_text_height of the wrapped text using our own calculate_text_height function. This is why we wrote these utility functions first. Now we have the total height of the text if we wrap it around at this font size. When it gets too large to fit in the coordinates we’ll know we’ve hit the limit for the font.
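If you want to see what break_long_words=False actually changes, here is a quick standalone comparison. The example text and the width of 10 are made up for this sketch.

```python
import textwrap

text = "supercalifragilistic is long"

# Default behaviour: words longer than the width are split across lines.
print(textwrap.wrap(text, width=10))

# With break_long_words=False the long word stays whole on its own line,
# even though that line is now wider than the requested width.
kept_whole = textwrap.wrap(text, width=10, break_long_words=False)
print(kept_whole)  # ['supercalifragilistic', 'is long']
```

Notice that keeping words whole means a line can end up longer than the width we asked for, so the width argument alone is no guarantee that everything fits.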

Let’s add some more:

        for bounding_box, text in zip(bounding_boxes, texts):
            x, y, box_width, box_height = bounding_box

            font_size = 8
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )
            total_text_height = calculate_text_height(draw, lines, font)

            ## New code from here ##
            def text_height_is_ok():
                return total_text_height < box_height

            def text_width_is_ok():
                return all(
                    draw.textbbox((0, 0), line, font=font)[2] < box_width
                    for line in lines
                )

            while text_height_is_ok() and text_width_is_ok():
                font_size += 1
                font = ImageFont.truetype(font_file, font_size)
                lines = textwrap.wrap(
                    text,
                    break_long_words=False,
                    width=box_width // get_char_width_in_px(font, font_name),
                )
                total_text_height = calculate_text_height(draw, lines, font)

First we define two functions inside of this function. The first one is text_height_is_ok which returns a boolean value of True if the total_text_height is less than the box_height and False if it is not.

The next function is text_width_is_ok which returns True if the width of all the lines of text is less than the box_width and False if it is not. For each line in the lines list it will use the draw.textbbox method to get the bounding box of the line of text and then check if the right coordinate, located in index [2] of the bounding box is less than the box_width. If all the lines are less than the box_width then the function will return True as all lines fit into the box width. Otherwise, it will return False.
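If the all() function with a generator expression looks unfamiliar, here is the same pattern on its own. The right-edge values are made-up numbers standing in for what draw.textbbox would report for each line.

```python
box_width = 300

# Made-up right-edge coordinates for three wrapped lines, in pixels.
line_right_edges = [250, 290, 180]
all_fit = all(edge < box_width for edge in line_right_edges)
print(all_fit)  # True

# A single line that is too wide makes the whole check fail.
one_too_wide = all(edge < box_width for edge in line_right_edges + [310])
print(one_too_wide)  # False

# With no lines at all the check passes, because there is nothing to fail.
no_lines = all(edge < box_width for edge in [])
print(no_lines)  # True
```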

Then we open a while loop. This loop will run as long as the text_height_is_ok function returns True and the text_width_is_ok function returns True. This means that the loop will run as long as the text fits inside the box height and width.

Each time the loop runs, we’re going to increase the font size by 1 and then reinitialize the font object with this new size. After that, we recalculate the lines list by wrapping the text again at the new font size and then recalculate the total_text_height of the wrapped text at this new font size. This is a repeat of the logic we had before but now each time this while loop runs the text will be one pixel larger in font size than before.

Eventually the text will be too large to fit inside the box height or width and either text_height_is_ok or text_width_is_ok will return False and the loop will stop. This is how we find the maximum font size that will fit inside the box coordinates we have.

After we break out of this while loop we need to go back one font size to get the font size that actually fits inside the box:

            while text_height_is_ok() and text_width_is_ok():
                font_size += 1
                font = ImageFont.truetype(font_file, font_size)
                lines = textwrap.wrap(
                    text,
                    break_long_words=False,
                    width=box_width // get_char_width_in_px(font, font_name),
                )
                total_text_height = calculate_text_height(draw, lines, font)

            ## New code from here ##
            font_size -= 1
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )

We just reduce the font size by 1 and reinitialize the font object with this new size, recalculating the lines one last time at the correct size.
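The grow-until-it-overflows-then-step-back pattern can be tried out without Pillow. In this sketch the fits function is a made-up stand-in for our two checks: it simply pretends every character is 0.6 times the font size wide and every line is one font size tall. Those numbers, and the names fits and largest_fitting_font_size, are assumptions for illustration only.

```python
import textwrap

LINE_HEIGHT_MULTIPLIER = 1.4


def fits(text: str, font_size: int, box_width: int, box_height: int) -> bool:
    # Stand-in for the Pillow measurements: pretend every character is
    # 0.6 * font_size pixels wide and every line is font_size pixels tall.
    # The text must not be empty, otherwise there are no lines to measure.
    char_width = max(1, int(font_size * 0.6))
    lines = textwrap.wrap(
        text, width=max(1, box_width // char_width), break_long_words=False
    )
    widest_line = max(len(line) for line in lines) * char_width
    block_height = int(len(lines) * font_size * LINE_HEIGHT_MULTIPLIER)
    return widest_line < box_width and block_height < box_height


def largest_fitting_font_size(text: str, box_width: int, box_height: int) -> int:
    font_size = 8
    # Grow one step at a time until the text overflows the box...
    while fits(text, font_size, box_width, box_height):
        font_size += 1
    # ...then step back to the last size that still fit.
    return font_size - 1


text = "one does not simply wrap text"
size = largest_fitting_font_size(text, 300, 120)
print(size)
```

One thing this makes easy to spot: if the text does not even fit at the starting size of 8, the loop never runs and the step back leaves us at size 7.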

Now we have a bunch of text that fills either the complete width or the complete height of our bounding box. What this means is the bounding box could be 500 pixels wide and 100 pixels high, but the text could be 500 pixels wide and only 50 pixels high. Ideally, we would want the text to be centered inside of this bounding box, so we’d get a strip of 25px of empty space, then 50px of the text, and then another 25px of empty space.

Let’s add some code to take care of this:

            font_size -= 1
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )

            ## New code from here ##
            total_text_height = calculate_text_height(draw, lines, font)
            text_y = y + (box_height - total_text_height) / 2

First of all we get the total_text_height of the wrapped text at the correct font size using our utility function. Then we calculate the desired text_y coordinate which is the y coordinate where the text should start drawing. This is calculated by taking the y coordinate of the bounding box and then adding half of the empty space above and below the text. This will center the text vertically inside of the bounding box.
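Plugging in the numbers from the example above shows how this works out. The y value of 100 is made up for this sketch; the box height and text height are the ones we just talked about.

```python
y = 100                  # hypothetical top edge of the bounding box
box_height = 100         # height of the bounding box
total_text_height = 50   # measured height of the wrapped text

empty_space = box_height - total_text_height  # 50 pixels left over
text_y = y + empty_space / 2                  # half of it goes above the text

print(text_y)  # 125.0
```

So the text starts 25 pixels below the top of the box, which leaves the other 25 pixels below it.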

Now that we have the perfect coordinates, font size, and wrapped text, we can finally actually draw the text on the image:

            total_text_height = calculate_text_height(draw, lines, font)
            text_y = y + (box_height - total_text_height) / 2

            ## New code from here ##
            for line in lines:
                text_width, text_height = draw.textbbox((0, 0), line, font=font)[2:]
                text_x = x + (box_width - text_width) / 2

                text_stroke = meme.get("text_stroke", False)
                text_draw_settings = {
                    "xy": (text_x, text_y),
                    "text": line,
                    "font": font,
                    "fill": meme["text_color"],
                }

                if text_stroke:
                    stroke_width = get_char_width_in_px(font, font_name) // 6
                    text_draw_settings["stroke_width"] = stroke_width
                    text_draw_settings["stroke_fill"] = (
                        "black" if meme["text_color"] == "white" else "white"
                    )

                draw.text(**text_draw_settings)
                text_y += text_height

Here we open yet another inner loop as we have several lines of text to write. For each line in lines we calculate the width and height of the line of text using the draw.textbbox (text-bounding-box) method we’ve used before. We know this method returns the left, top, right, and bottom coordinates of the bounding box of the text, so using the slice from index 2 to the end, [2:], we get the right and bottom coordinates. Because we measured from (0, 0), we can treat these as the width and height of the text, which we save in text_width and text_height.

Now we calculate the ideal text_x coordinate for the text in much the same way we did with the text_y coordinate before, making sure to place the text in the middle of the bounding box horizontally.

Next up we get the "text_stroke" key from the meme object to see if we had set a stroke for the text in the meme_data.json file, saving the True or False in the text_stroke variable. We then set up a dictionary with settings for the real text drawing, having the xy coordinates in the first key, followed by the text and font, and finally the fill color for the text.

We can pass this dictionary to the draw.text method to make it draw with our settings but before we do we need to make some changes if text_stroke is enabled. We calculate the stroke_width by dividing the width of a character by 6, as I found this to be a reasonable stroke size and this way the stroke scales bigger or smaller with the font size which I personally like.

We then add two new keys to the text_draw_settings dictionary, stroke_width and stroke_fill, for the width of the stroke we just calculated and the stroke color. The stroke color is black if the meme’s regular text color is white, and white for any other text color (black, in our case), as we want the stroke to contrast with the text to make it stand out.

Finally, we can now call the draw.text method and unpack ** the text_draw_settings dictionary into the method. This will pass all the key-value pairs in the dictionary as keyword arguments to the method, so we don’t have to write them all out manually. This will draw the text on the image with the settings we have defined.
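If the ** unpacking is new to you, here is the same trick on a plain function. draw_text_stand_in is a made-up function for this sketch that only reports what it received; it is not part of Pillow.

```python
def draw_text_stand_in(xy, text, fill="black", stroke_width=0, stroke_fill=None):
    # Hypothetical stand-in for draw.text that just reports what it received.
    return f"{text!r} at {xy} fill={fill} stroke={stroke_width} ({stroke_fill})"


settings = {"xy": (10, 20), "text": "HELLO", "fill": "white"}

# Optional keys can be added before the call, just like our stroke settings.
settings["stroke_width"] = 2
settings["stroke_fill"] = "black"

# These two calls are equivalent.
unpacked = draw_text_stand_in(**settings)
written_out = draw_text_stand_in(
    xy=(10, 20), text="HELLO", fill="white", stroke_width=2, stroke_fill="black"
)
print(unpacked == written_out)  # True
```

Building the dictionary first is handy here because the stroke keys only get added when the meme actually wants a stroke.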

After drawing the text we increase the text_y coordinate by the text_height of the line of text. Remember this is a loop that runs for each line so it will continue down to the next line now. This way we move the text_y coordinate down by the height of the line of text so the next line of text will be drawn below the current line.

Now all that is left is to save the image and return the file path to where the image is stored:

                draw.text(**text_draw_settings)
                text_y += text_height

        ## New code from here ##
        image_path = get_unique_filename()
        img.save(image_path)

    print(f"Image saved to {image_path}")
    return image_path

We go back out two levels of indentation, spelling the end of the for bounding_box, text in zip(bounding_boxes, texts): loop we started so long ago. We then generate a unique filename for the image using our get_unique_filename function and save the image to this path using the img.save method from Pillow.

Finally, we print a message to the console telling the user where the image was saved and return the image_path so the user can access the image if they want to. Phew! That was a lot of code but we’re finally done with the main function.

Putting it all together

Here is the whole thing one more time without being cut up into pieces, so you can better see the indentation and inner loops fitting together:

def overlay_text_on_image(meme: MemeData, texts: list[str]) -> Path:
    font_name: str = meme["font_path"]
    texts = [handle_text_caps(font_name, text) for text in texts]
    image_path: Path = IMAGE_FOLDER / meme["file_path"]
    font_file: str = str(FONT_FOLDER / font_name)
    bounding_boxes: list[list[int]] = meme["text_coordinates_xy_wh"]

    with Image.open(image_path) as img:
        draw = ImageDraw.Draw(img)

        for bounding_box, text in zip(bounding_boxes, texts):
            x, y, box_width, box_height = bounding_box

            font_size = 8
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )
            total_text_height = calculate_text_height(draw, lines, font)

            def text_height_is_ok():
                return total_text_height < box_height

            def text_width_is_ok():
                return all(
                    draw.textbbox((0, 0), line, font=font)[2] < box_width
                    for line in lines
                )

            while text_height_is_ok() and text_width_is_ok():
                font_size += 1
                font = ImageFont.truetype(font_file, font_size)
                lines = textwrap.wrap(
                    text,
                    break_long_words=False,
                    width=box_width // get_char_width_in_px(font, font_name),
                )
                total_text_height = calculate_text_height(draw, lines, font)

            font_size -= 1
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )

            total_text_height = calculate_text_height(draw, lines, font)
            text_y = y + (box_height - total_text_height) / 2

            for line in lines:
                text_width, text_height = draw.textbbox((0, 0), line, font=font)[2:]
                text_x = x + (box_width - text_width) / 2

                text_stroke = meme.get("text_stroke", False)
                text_draw_settings = {
                    "xy": (text_x, text_y),
                    "text": line,
                    "font": font,
                    "fill": meme["text_color"],
                }

                if text_stroke:
                    stroke_width = get_char_width_in_px(font, font_name) // 6
                    text_draw_settings["stroke_width"] = stroke_width
                    text_draw_settings["stroke_fill"] = (
                        "black" if meme["text_color"] == "white" else "white"
                    )

                draw.text(**text_draw_settings)
                text_y += text_height

        image_path = get_unique_filename()
        img.save(image_path)

    print(f"Image saved to {image_path}")
    return image_path

Testing the code

Before we move on from here let’s have a quick test inside this file to make sure our code works as expected. It’s time for another if __name__ == "__main__": block:

if __name__ == "__main__":
    from load_meme_data import MemeData, load_meme_data

    meme_data: list[MemeData] = load_meme_data()
    chosen_meme = meme_data[10]
    overlay_text_on_image(
        chosen_meme,
        chosen_meme["example_output"]
    )

For our quick test, we’ll import the MemeData type (used for the type hint) and the load_meme_data function to load the data. We then load the meme data using the load_meme_data function so that we have the list of MemeData objects to work with. For this test I will arbitrarily choose the 11th meme in the list (index 10), but choose any you like.

Now we can make a test call to our new overlay_text_on_image function. We pass in the chosen_meme object for the data and the second argument is a list of the texts we want to draw. Let’s just use the "example_output" key from the chosen_meme object as this is just a test run. Now go ahead and run your file, and you’ll see the following in your terminal:

Image saved to C:\Coding_Vault\Meme_Gen\output\e25413fd-dc04-4041-a6b6-e06bd2c1dd68.png

Go ahead and CTRL+Click on the path or open it manually and tada:

Here is our first meme image with the text successfully overlaid on top of it! If you see this image then congratulations, you did everything correctly!

Feel free to change the index in the test from [10] to any other number if you want to test some more. Here is index [4] for example:

These are just the example texts of course, so let’s head over to part 4 where we will string this all together with the ChatGPT logic and generate some real memes! We will also build a nice frontend so you don’t have to keep opening the meme images manually. See you there!
