Welcome to part 3, where we will be getting started on the image generation logic. The code for this is not too bad but it does require some explanation. We’ll make sure to go over it in detail.
The first thing we’ll need is Pillow! No, not that kind of pillow. Pillow is the actively maintained fork of the original PIL (Python Imaging Library), and it adds image processing capabilities to Python. So let’s install it first by running this command in your terminal:
pip install pillow
Starting on the meme image editor
With that out of the way, let’s get coding! First create a new file called meme_image_editor.py in your project root folder:
Meme_Gen/
    fonts/
        ARIAL.TTF
        ComicSansMS3.ttf
        impact.ttf
    output/
    templates/
        (all the images in here...)
    .env
    get_meme.py
    load_meme_data.py
    meme_data.json
    meme_image_editor.py    --> create this file
    system_instructions.py
Now inside this new file, let’s start by importing the necessary modules:
import textwrap
from pathlib import Path
from uuid import uuid4

from PIL import Image, ImageDraw, ImageFont

from load_meme_data import MemeData
First up we have textwrap. This module is part of the Python standard library and provides utilities for formatting and wrapping text. It is particularly useful for making sure that text fits within a certain width, which is perfect for us when overlaying text on our images.
Next, we have Path and uuid4. Path is a class from the pathlib module that represents the path to a file or directory, our output image folder for example. uuid4 is a function from the uuid module that generates a random UUID (Universally Unique Identifier), which we will use to name our output images so they don’t overwrite each other.
Finally, we have Image, ImageDraw, and ImageFont from the Pillow library. These modules will allow us to work with the image data, and you will see them used in action later.
Next up, let’s define some constants:
ROOT_DIRECTORY = Path(__file__).resolve().parent
IMAGE_FOLDER = ROOT_DIRECTORY / "templates"
FONT_FOLDER = ROOT_DIRECTORY / "fonts"
OUTPUT_FOLDER = ROOT_DIRECTORY / "output"
LINE_HEIGHT_MULTIPLIER = 1.4
First we get the ROOT_DIRECTORY by giving Path the __file__ attribute of the current module (meme_image_editor.py) and using .resolve() to turn it into an absolute path to this file. We then use .parent to get the directory that contains this file, which is the root directory of our project.
Now we can easily create paths to our templates, fonts, and output folders by using the / operator to join the ROOT_DIRECTORY with the folder name.
The last constant, LINE_HEIGHT_MULTIPLIER, will be used later to make sure the text has the correct height for the box coordinates it’s supposed to fit in. It’s declared here so we can easily adjust the number later if needed, but I’ve found 1.4 to work very well through trial and error.
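To make the / operator a bit more concrete, here is a small sketch of how such paths are built. The location /home/user/Meme_Gen is a made-up example standing in for wherever Path(__file__).resolve().parent points on your machine, and PurePosixPath is used only so the printed result looks the same on every operating system.

```python
from pathlib import PurePosixPath

# Hypothetical project root, standing in for Path(__file__).resolve().parent
root_directory = PurePosixPath("/home/user/Meme_Gen")

# The / operator joins path segments and returns a new path object
image_folder = root_directory / "templates"
font_file = root_directory / "fonts" / "impact.ttf"

print(image_folder)    # /home/user/Meme_Gen/templates
print(font_file.name)  # impact.ttf
```

Because the result is a path object rather than a plain string, you can keep chaining / or ask it for parts such as .name and .parent.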
Utility functions
We’ll create a couple of utility functions before we get into the main logic. First of all, most memes are typed IN ALL CAPITAL LETTERS, but not all of them. For the UNO Draw 25 meme, for example, I prefer the comic sans font in lowercase. We haven’t asked ChatGPT to account for this and we don’t have to as this is very easy to fix:
def handle_text_caps(font_name: str, text: str) -> str:
    if font_name == "impact.ttf":
        return text.upper()
    elif font_name == "ComicSansMS.ttf":
        return text.lower()
    return text
Our utility function handle_text_caps takes in the font_name and the text as strings. If the font is Impact I want the return to be in ALL CAPITAL LETTERS. If the font is Comic Sans, I want the return to be in all lowercase. For Arial I want the text to be as is, so there is no need to name it as it will just pass through this function unchanged.
Note: The comparison is an exact match on the font file name, so if you used extra fonts or different file names (the folder listing above shows ComicSansMS3.ttf, for example), make sure the names in this function match your setup.
The next thing we’ll need is a utility function which gives us a rough estimate of how wide a typical character is for a particular font at a particular size.
Why do we need to know this? Well, the textwrap function that we will use later requires a width for each line of text before it should wrap to the next line. This is not measured in pixels but in the number of characters before the line should wrap. So we need to know the width of a character in a particular font and size to calculate how many characters can fit inside of our pixel coordinates for the textbox.
So add the following function:
def get_char_width_in_px(font: ImageFont.FreeTypeFont, font_name: str) -> int:
    representative_character = handle_text_caps(font_name, "A")
    char_left_top_right_bottom = font.getbbox(representative_character)
    return char_left_top_right_bottom[2] - char_left_top_right_bottom[0]
We named the function get_char_width_in_px and it takes in a font object of type ImageFont.FreeTypeFont and a font_name as a string. ImageFont.FreeTypeFont is just the name of the class that ImageFont uses to represent a TrueType font. We will get this type of object later in our main logic when we load the font and then pass it into this function.
First we get a representative character for the font. In this case, we use the letter “A”, either as a capital letter or not, using our handle_text_caps function to decide. Different letters have different widths, so using A is just a rough estimate, but it works well enough for our purposes.
The ImageFont.FreeTypeFont class has a method called getbbox (get bounding box) which returns a tuple of 4 numbers representing the left, top, right, and bottom coordinates of the bounding box of the character. So basically 4 numbers from which we can work out the width and height of the character.
We’re going to return the value of the right coordinate (located in index 2) minus the left coordinate (located in index 0) to get the width of the character in pixels. This is the return value of the function: the width of a single letter A (or a) for this font at this size.
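To see how a character’s bounding box turns into a wrap width, here is a minimal sketch that needs no font file. The bounding box (1, 3, 13, 18) and the 300-pixel box width are made-up numbers, and chars_per_line is a hypothetical helper name used only for this illustration, not part of the project code.

```python
import textwrap


def chars_per_line(box_width_px: int, char_bbox: tuple[int, int, int, int]) -> int:
    """Estimate how many characters fit on one line of a text box."""
    left, _top, right, _bottom = char_bbox
    char_width_px = right - left  # right minus left gives the width in pixels
    return box_width_px // char_width_px


# Made-up bounding box for one character: (left, top, right, bottom)
example_bbox = (1, 3, 13, 18)
width_in_chars = chars_per_line(300, example_bbox)  # 300 // 12 = 25

lines = textwrap.wrap(
    "This is the meme text that we are going to overlay",
    width=width_in_chars,
)
print(lines)  # ['This is the meme text', 'that we are going to', 'overlay']
```

Every wrapped line stays within 25 characters, which is only an approximation of 300 pixels because real letters differ in width.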
The next utility function is going to be very easy! We need a function to generate unique filenames for our output images. Add this function to your code:
def get_unique_filename() -> Path:
    return OUTPUT_FOLDER / f"{uuid4()}.png"
This function is called get_unique_filename and it returns a Path object. We use the uuid4 function to generate a random UUID, add the .png extension to it, and join the result with the OUTPUT_FOLDER path to create a unique filename for our output images. This path will look something like ‘output\0d2e83d3-f0d9-4a53-a270-dd8bde38d207.png’.
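As a quick illustration of why these names don’t collide, the sketch below builds two filenames the same way. The relative folder "output" is a stand-in for OUTPUT_FOLDER, and PurePosixPath is used only to keep the example independent of your operating system.

```python
from pathlib import PurePosixPath
from uuid import uuid4

# Hypothetical output folder, standing in for OUTPUT_FOLDER
output_folder = PurePosixPath("output")

first = output_folder / f"{uuid4()}.png"
second = output_folder / f"{uuid4()}.png"

print(first.suffix)     # .png
print(len(first.stem))  # 36 (32 hex digits plus 4 hyphens)
print(first != second)  # True, barring an astronomically unlikely collision
```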
Next up is a utility function to calculate the total height of a wrapped text. So say we have some text that is wrapped over 4 lines total, we need to know the exact pixel height for those lines of text combined for a given font and font size.
Why do we need to know the exact height of a block of several lines of wrapped text? Well, we have the x, y, width, and height coordinates of where the text is supposed to go in our meme data. We need a way to test if the height of this block of text at the current font size will fit within the height we defined for the textbox in our meme_data.json file.
Let’s take it one step at a time though, this function is only going to be concerned with calculating the total height of the block of text and returning the value:
def calculate_text_height(
    drawing, lines: list[str], font: ImageFont.FreeTypeFont
) -> int:
    total_text_height = 0
    for line in lines:
        bbox_for_line = drawing.textbbox((0, 0), line, font=font)
        bbox_top = bbox_for_line[1]
        bbox_bottom = bbox_for_line[3]
        total_text_height += bbox_bottom - bbox_top
    return int(total_text_height * LINE_HEIGHT_MULTIPLIER)
Let’s go over this as it may seem a bit confusing to read, starting with the input arguments. We have drawing, which is an ImageDraw object we will pass in from the main function later. This ImageDraw object allows us to draw stuff, but here we only use it to measure test text and see what the height will be. The second argument is lines, which is a list of strings, each string being a line of text that has been wrapped by the textwrap module. So imagine something like:
lines = [
    "This is the meme",
    "text that we are",
    "going to overlay",
    "on the image"
]
The final argument is the font object of type ImageFont.FreeTypeFont that we saw in one of our previous functions as well, which holds the settings for the current font and size. Again, we will pass all these arguments in from the main function later when we call this utility function.
We’re going to start by initializing the variable total_text_height to 0. We then loop over each line in the lines list and get the bounding box for that line of text using the drawing.textbbox method. This method returns a tuple of 4 numbers representing the left, top, right, and bottom coordinates of the bounding box of the text. We then get the top and bottom coordinates from this tuple, storing them in the variables named bbox_top and bbox_bottom to keep the code readable.
We then calculate the height of the line of text by subtracting the top coordinate from the bottom coordinate and adding this value to the total_text_height variable, doing this for each line in the text. Finally, we multiply the total height by the LINE_HEIGHT_MULTIPLIER constant we defined earlier to account for the empty space between the lines and not just the height of the letters themselves. This gives us a good enough estimate of the height of the text block if it were to really be drawn on the image, so it acts as a ‘height test’ if you will.
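The arithmetic is easier to follow with concrete numbers. The sketch below repeats the same sum-then-scale calculation on made-up bounding boxes instead of real Pillow measurements; estimate_block_height is a hypothetical name chosen for this illustration.

```python
LINE_HEIGHT_MULTIPLIER = 1.4


def estimate_block_height(line_bboxes: list[tuple[int, int, int, int]]) -> int:
    """Sum (bottom - top) for each line, then scale the total for line spacing."""
    total = 0
    for _left, top, _right, bottom in line_bboxes:
        total += bottom - top
    return int(total * LINE_HEIGHT_MULTIPLIER)


# Made-up bounding boxes (left, top, right, bottom) for four wrapped lines,
# each 25 pixels high
example_bboxes = [(0, 5, 180, 30), (0, 5, 175, 30), (0, 5, 182, 30), (0, 5, 140, 30)]

block_height = estimate_block_height(example_bboxes)
print(block_height)  # 140, i.e. 4 lines * 25px * 1.4
```

So four lines of 25 pixels each are treated as a 140-pixel block, and that is the number that would be compared against the height of the text box.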
Main function
Next up is the main function that will bind together the logic and all these utility functions to successfully overlay text on an image. It is a reasonably long function which makes it a bit harder to explain in written format.
I’ll show the function in bits and pieces and then after explaining everything we’ll look at the whole thing all put together one more time. It might be good to just read over the piece-by-piece explanation first and then look at and start copying the whole function at the end when we repeat it.
Let’s get started:
def overlay_text_on_image(meme: MemeData, texts: list[str]) -> Path:
    font_name: str = meme["font_path"]
    texts = [handle_text_caps(font_name, text) for text in texts]
    image_path: Path = IMAGE_FOLDER / meme["file_path"]
    font_file: str = str(FONT_FOLDER / font_name)
    bounding_boxes: list[list[int]] = meme["text_coordinates_xy_wh"]
We define a function overlay_text_on_image which takes in a MemeData object and a list of strings. The MemeData object is one of the 11 that we have in our list of memes in the meme_data.json file, and the list of strings is the text that ChatGPT generated for us to overlay on the image.
We have all the data inside of our meme object so we can just access the ["font_path"] key to get the name of the font. We then use a list comprehension to loop over each text in the texts list and pass it through our handle_text_caps function to make sure it is in the correct case for the font, saving the result as texts.
We get the image_path to the template image by joining the IMAGE_FOLDER path with the file_path key from the meme object, and then also get the path to the font file by joining the FONT_FOLDER path with the font_name we just read from the meme object.
Note that the image_path is a Path object but in the case of font_file we have converted the path into a string using str(). This is because we will hand this path to the Pillow function that loads the font, and a plain string is the safe choice there rather than a Path object.
Finally, we get the bounding_boxes list from the meme object which contains the x, y, width, and height coordinates for each text box that we are going to overlay text in. Again, this is a list of lists, each inner list containing 4 integers like we set up in our meme_data.json file.
First we’re going to open up our image:
    with Image.open(image_path) as img:
        draw = ImageDraw.Draw(img)

        for bounding_box, text in zip(bounding_boxes, texts):
            # We run a bunch of code here
We open the image using the Image.open function from the Pillow library and use a context manager to ensure that the image is properly closed after we are done with it. This means that all indented code after this line will be executed with the image open as img.
We then create an ImageDraw object called draw which we will use to draw text on the image. The ImageDraw.Draw function is just an interface provided by Pillow for 2D drawings, so draw is now an object that we can use to draw on the image using Pillow.
Next, we use the zip function. As not everybody will be familiar with this function, here is the basic idea:
The zip function in Python takes two or more lists (or other iterables) and combines them into pairs, producing an iterator of tuples. Think of it like zipping up a jacket, where each side of the zipper is a different list. For example, if you have ["a", "b", "c", "d"] in one list and [1, 2, 3, 4] in another, using zip will pair them up like this: ("a", 1), ("b", 2), ("c", 3), ("d", 4). So we first get the first item of both lists, then the second, and so on.
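Here is the same idea as a short runnable sketch, using two made-up bounding boxes and texts in the shape this project works with. The coordinates are invented for illustration only.

```python
bounding_boxes = [[10, 20, 300, 100], [10, 400, 300, 100]]  # made-up coordinates
texts = ["TOP TEXT", "BOTTOM TEXT"]

# zip returns an iterator, so we wrap it in list() to look at the pairs
pairs = list(zip(bounding_boxes, texts))
print(pairs)
# [([10, 20, 300, 100], 'TOP TEXT'), ([10, 400, 300, 100], 'BOTTOM TEXT')]

# In a for loop each pair can be unpacked straight into two variables
for bounding_box, text in zip(bounding_boxes, texts):
    print(bounding_box, text)
```

One thing to keep in mind is that zip stops at the shortest input, so if there are more bounding boxes than texts (or the other way around), the extras are silently ignored.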
So in our code above, we zip the bounding boxes and the texts together. This means that we will get pairs of (bounding_box, text) where bounding_box is a list of 4 integers and text is the string supposed to go inside of that particular bounding box’s coordinates. We then have a for loop that will loop over each pair of bounding-box-and-text, so each entry in our list will be processed one by one. Coming back to the example above, we’d have one loop iteration for the first pair, then one for the second pair, and so on.
Now let’s continue with the code. I’ll repeat the for loop line we ended the last block with for clarity:
        for bounding_box, text in zip(bounding_boxes, texts):
            x, y, box_width, box_height = bounding_box
            font_size = 8
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )
            total_text_height = calculate_text_height(draw, lines, font)
First we unpack the bounding_box list into 4 variables: x, y, box_width, and box_height. These are the coordinates for the text box that we are going to overlay the text in, and having them properly named like this makes the code more readable.
Now we may have a very long meme text that needs to fit inside these coordinates or it may be a super short text like "Me". So our first goal here is to find the maximum font size that will fit inside of the bounding box coordinates we have. We’re going to start with a font_size of 8 and then later we’ll start increasing it until the text no longer fits inside the box to find the maximum size.
We first create a font object using the ImageFont.truetype function from Pillow. This function takes in the path to the font file as a string and the font size as an integer. Next, we’re going to use the textwrap.wrap function to wrap the text into lines that will fit inside the box width.
The break_long_words=False argument means that the textwrap module will not break long words in the text, but will wrap the text between words instead. The width argument is the number of characters that can fit inside the box width, which we calculate by dividing the total box width by the width of a single character in the font using our own get_char_width_in_px function.
Then we calculate the total_text_height of the wrapped text using our own calculate_text_height function. This is why we wrote these utility functions first. Now we have the total height of the text if we wrap it around at this font size. When it gets too large to fit in the coordinates we’ll know we’ve hit the limit for the font.
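To see what break_long_words actually changes, here is a small standalone comparison. The sample text and the width of 10 characters are made up for the demonstration.

```python
import textwrap

text = "supercalifragilistic word"

# Long words are kept whole, even if that makes a line longer than `width`
kept_whole = textwrap.wrap(text, width=10, break_long_words=False)

# Default behavior: long words are chopped to fit the width
split_up = textwrap.wrap(text, width=10, break_long_words=True)

print(kept_whole)  # ['supercalifragilistic', 'word']
print(split_up)    # ['supercalif', 'ragilistic', 'word']
```

Note that with break_long_words=False the first line is 20 characters long despite the width of 10. A long word can therefore still overflow the box, which is one reason a separate check on the real pixel width is useful.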
Let’s add some more:
        for bounding_box, text in zip(bounding_boxes, texts):
            x, y, box_width, box_height = bounding_box
            font_size = 8
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )
            total_text_height = calculate_text_height(draw, lines, font)

            ## New code from here ##
            def text_height_is_ok():
                return total_text_height < box_height

            def text_width_is_ok():
                return all(
                    draw.textbbox((0, 0), line, font=font)[2] < box_width
                    for line in lines
                )

            while text_height_is_ok() and text_width_is_ok():
                font_size += 1
                font = ImageFont.truetype(font_file, font_size)
                lines = textwrap.wrap(
                    text,
                    break_long_words=False,
                    width=box_width // get_char_width_in_px(font, font_name),
                )
                total_text_height = calculate_text_height(draw, lines, font)
First we define two functions inside of this function. The first one is text_height_is_ok which returns a boolean value of True if the total_text_height is less than the box_height and False if it is not.
The next function is text_width_is_ok which returns True if every line of text is narrower than the box_width and False if not. For each line in the lines list it will use the draw.textbbox method to get the bounding box of the line of text and then check if the right coordinate, located in index [2] of the bounding box, is less than the box_width. If all the lines are less than the box_width then the function will return True as all lines fit into the box width. Otherwise, it will return False.
Then we open a while loop. This loop will run as long as the text_height_is_ok function returns True and the text_width_is_ok function returns True. This means that the loop will run as long as the text fits inside the box height and width.
Each time the loop runs, we’re going to increase the font size by 1 and then reinitialize the font object with this new size. After that, we recalculate the lines list by wrapping the text again at the new font size and then recalculate the total_text_height of the wrapped text at this new font size. This is a repeat of the logic we had before but now each time this while loop runs the text will be one pixel larger in font size than before.
Eventually the text will be too large to fit inside the box height or width and either text_height_is_ok or text_width_is_ok will return False and the loop will stop. This is how we find the maximum font size that will fit inside the box coordinates we have.
After we break out of this while loop we need to go back one font size to get the font size that actually fits inside the box:
            while text_height_is_ok() and text_width_is_ok():
                font_size += 1
                font = ImageFont.truetype(font_file, font_size)
                lines = textwrap.wrap(
                    text,
                    break_long_words=False,
                    width=box_width // get_char_width_in_px(font, font_name),
                )
                total_text_height = calculate_text_height(draw, lines, font)

            ## New code from here ##
            font_size -= 1
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )
We just reduce the font size by 1 and reinitialize the font object with this new size, recalculating the lines one last time at the correct size.
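The grow-until-it-breaks-then-step-back pattern can be tried without Pillow. The sketch below swaps the real measurements for a toy model that assumes every character is 0.6 times the font size wide and every line is 1.4 times the font size tall. Those metrics, and the names fits and largest_fitting_font_size, are assumptions made up for this illustration, so the resulting size will not match what a real font gives you.

```python
import textwrap


def fits(text: str, font_size: int, box_width: int, box_height: int) -> bool:
    """Toy fit test using made-up metrics instead of real font measurements."""
    char_width = max(1, int(font_size * 0.6))
    lines = textwrap.wrap(
        text,
        width=max(1, box_width // char_width),
        break_long_words=False,
    )
    widest_line = max(len(line) for line in lines) * char_width
    total_height = int(len(lines) * font_size * 1.4)
    return widest_line < box_width and total_height < box_height


def largest_fitting_font_size(
    text: str, box_width: int, box_height: int, start: int = 8
) -> int:
    font_size = start
    # Grow until the text stops fitting...
    while fits(text, font_size, box_width, box_height):
        font_size += 1
    # ...then step back to the last size that did fit
    return font_size - 1


size = largest_fitting_font_size("ME WHEN THE CODE WORKS", box_width=300, box_height=100)
print(size)
```

The loop always overshoots by exactly one step, which is why the final subtraction is needed in both this sketch and the real function.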
Now we have a block of text that fills either the complete width or the complete height of our bounding box. What this means is the bounding box could be 500 pixels wide and 100 pixels high, but the text could be 500 pixels wide and only 50 pixels high. Ideally, we would want the text to be centered inside of this bounding box, so we’d get a strip of 25px of empty space, then 50px of the text, and then another 25px of empty space.
Let’s add some code to take care of this:
            font_size -= 1
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )

            ## New code from here ##
            total_text_height = calculate_text_height(draw, lines, font)
            text_y = y + (box_height - total_text_height) / 2
First of all we get the total_text_height of the wrapped text at the correct font size using our utility function. Then we calculate the desired text_y coordinate, which is the y coordinate where the text should start drawing. This is calculated by taking the y coordinate of the bounding box and then adding half of the empty space that is left over once the text height is subtracted from the box height. This will center the text vertically inside of the bounding box.
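With the numbers from the example above (a 100px high box holding 50px of text), the centering formula works out as follows. The box position y=200 is made up, and centered_start is a hypothetical helper name used only here.

```python
def centered_start(box_start: int, box_size: int, content_size: int) -> float:
    """Return the coordinate where content must start to sit centered in the box."""
    return box_start + (box_size - content_size) / 2


# A box starting at y=200 that is 100px high, holding 50px of text
text_y = centered_start(200, 100, 50)
print(text_y)  # 225.0, so 25px of empty space above and 25px below
```

The same formula centers things horizontally if you feed it the x coordinate and the widths instead.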
Now that we have the perfect coordinates, font size, and wrapped text, we can finally actually draw the text on the image:
            total_text_height = calculate_text_height(draw, lines, font)
            text_y = y + (box_height - total_text_height) / 2

            ## New code from here ##
            for line in lines:
                text_width, text_height = draw.textbbox((0, 0), line, font=font)[2:]
                text_x = x + (box_width - text_width) / 2
                text_stroke = meme.get("text_stroke", False)

                text_draw_settings = {
                    "xy": (text_x, text_y),
                    "text": line,
                    "font": font,
                    "fill": meme["text_color"],
                }

                if text_stroke:
                    stroke_width = get_char_width_in_px(font, font_name) // 6
                    text_draw_settings["stroke_width"] = stroke_width
                    text_draw_settings["stroke_fill"] = (
                        "black" if meme["text_color"] == "white" else "white"
                    )

                draw.text(**text_draw_settings)
                text_y += text_height
Here we open yet another inner loop as we have several lines of text to write. For each line in lines we calculate the width and height of the line of text using the draw.textbbox (text-bounding-box) method we’ve used before. We know this method returns the left, top, right, and bottom coordinates of the bounding box of the text, so using the slice from index 2 to the end, [2:], we get the right and bottom coordinates. Because we measure from the origin (0, 0), these serve as the width and height of the text, which we save in text_width and text_height.
Now we calculate the ideal text_x coordinate for the text in much the same way we did with the text_y coordinate before, making sure to place the text in the middle of the bounding box horizontally.
Next up we get the "text_stroke" key from the meme object to see if we had set a stroke for the text in the meme_data.json file, saving the True or False in the text_stroke variable (defaulting to False if the key is missing). We then set up a dictionary with settings for the real text drawing, having the xy coordinates in the first key, followed by the text and font, and finally the fill color for the text.
We can pass this dictionary to the draw.text method to make it draw with our settings, but before we do we need to make some changes if text_stroke is enabled. We calculate the stroke_width by dividing the width of a character by 6, as I found this to be a reasonable stroke size and this way the stroke scales bigger or smaller with the font size, which I personally like.
We then add two new keys to the text_draw_settings dictionary, stroke_width and stroke_fill, for the width of the stroke we just calculated and the fill color. The fill color is black if the meme’s regular text color is white and white otherwise, as we want the stroke to contrast with the text to make it stand out.
Finally, we can now call the draw.text method and unpack the text_draw_settings dictionary into it using **. This will pass all the key-value pairs in the dictionary as keyword arguments to the method, so we don’t have to write them all out manually. This will draw the text on the image with the settings we have defined.
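If the ** unpacking is new to you, this small sketch shows that it is equivalent to writing the keyword arguments out by hand. The draw_text function below is a made-up stand-in that only reports what it received; it is not Pillow’s real draw.text.

```python
def draw_text(xy, text, fill="white", stroke_width=0, stroke_fill=None):
    """Stand-in for a drawing call: just describes the arguments it got."""
    return f"{text!r} at {xy} fill={fill} stroke={stroke_width}/{stroke_fill}"


settings = {"xy": (10, 20), "text": "HELLO", "fill": "white"}

# Optional keys can be added to the dictionary before the call
use_stroke = True
if use_stroke:
    settings["stroke_width"] = 2
    settings["stroke_fill"] = "black"

# ** turns every key-value pair into a keyword argument
result = draw_text(**settings)
same_result = draw_text(
    xy=(10, 20), text="HELLO", fill="white", stroke_width=2, stroke_fill="black"
)
print(result == same_result)  # True
```

Building the dictionary first is handy precisely because of the optional stroke keys: the call itself stays the same whether or not a stroke is used.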
After drawing the text we increase the text_y coordinate by the text_height of the line of text. Remember this is a loop that runs for each line, so the next line of text will be drawn below the current one.
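To watch the x and y coordinates evolve over a few lines, here is a sketch that computes the drawing position for each line without drawing anything. The line sizes and box values are made up, and line_positions is a hypothetical helper, not part of the project code.

```python
def line_positions(x, y_start, box_width, line_sizes):
    """Return the (x, y) position for each line, given (width, height) per line."""
    positions = []
    text_y = y_start
    for text_width, text_height in line_sizes:
        text_x = x + (box_width - text_width) / 2  # center this line horizontally
        positions.append((text_x, text_y))
        text_y += text_height  # the next line starts below this one
    return positions


# Made-up line sizes (width, height) inside a 300px wide box starting at x=10
positions = line_positions(x=10, y_start=50, box_width=300, line_sizes=[(200, 30), (100, 30)])
print(positions)  # [(60.0, 50), (110.0, 80)]
```

The shorter second line gets a larger x so that it stays centered, while y simply moves down by the height of the line before it.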
Now all that is left is to save the image and return the file path to where the image is stored:
                draw.text(**text_draw_settings)
                text_y += text_height

        ## New code from here ##
        image_path = get_unique_filename()
        img.save(image_path)
        print(f"Image saved to {image_path}")
        return image_path
We go back out two levels of indentation, spelling the end of the for bounding_box, text in zip(bounding_boxes, texts): loop we started so long ago. We then generate a unique filename for the image using our get_unique_filename function and save the image to this path using the img.save method from Pillow.
Finally, we print a message to the console telling the user where the image was saved and return the image_path so the caller can access the image if needed. Phew! That was a lot of code but we’re finally done with the main function.
Putting it all together
Here is the whole thing one more time without being cut up into pieces, so you can better see the indentation and inner loops fitting together:
def overlay_text_on_image(meme: MemeData, texts: list[str]) -> Path:
    font_name: str = meme["font_path"]
    texts = [handle_text_caps(font_name, text) for text in texts]
    image_path: Path = IMAGE_FOLDER / meme["file_path"]
    font_file: str = str(FONT_FOLDER / font_name)
    bounding_boxes: list[list[int]] = meme["text_coordinates_xy_wh"]

    with Image.open(image_path) as img:
        draw = ImageDraw.Draw(img)

        for bounding_box, text in zip(bounding_boxes, texts):
            x, y, box_width, box_height = bounding_box
            font_size = 8
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )
            total_text_height = calculate_text_height(draw, lines, font)

            def text_height_is_ok():
                return total_text_height < box_height

            def text_width_is_ok():
                return all(
                    draw.textbbox((0, 0), line, font=font)[2] < box_width
                    for line in lines
                )

            while text_height_is_ok() and text_width_is_ok():
                font_size += 1
                font = ImageFont.truetype(font_file, font_size)
                lines = textwrap.wrap(
                    text,
                    break_long_words=False,
                    width=box_width // get_char_width_in_px(font, font_name),
                )
                total_text_height = calculate_text_height(draw, lines, font)

            font_size -= 1
            font = ImageFont.truetype(font_file, font_size)
            lines = textwrap.wrap(
                text,
                break_long_words=False,
                width=box_width // get_char_width_in_px(font, font_name),
            )

            total_text_height = calculate_text_height(draw, lines, font)
            text_y = y + (box_height - total_text_height) / 2

            for line in lines:
                text_width, text_height = draw.textbbox((0, 0), line, font=font)[2:]
                text_x = x + (box_width - text_width) / 2
                text_stroke = meme.get("text_stroke", False)

                text_draw_settings = {
                    "xy": (text_x, text_y),
                    "text": line,
                    "font": font,
                    "fill": meme["text_color"],
                }

                if text_stroke:
                    stroke_width = get_char_width_in_px(font, font_name) // 6
                    text_draw_settings["stroke_width"] = stroke_width
                    text_draw_settings["stroke_fill"] = (
                        "black" if meme["text_color"] == "white" else "white"
                    )

                draw.text(**text_draw_settings)
                text_y += text_height

        image_path = get_unique_filename()
        img.save(image_path)
        print(f"Image saved to {image_path}")
        return image_path
Testing the code
Before we move on from here let’s have a quick test inside this file to make sure our code works as expected. It’s time for another if __name__ == "__main__": block:
if __name__ == "__main__":
    from load_meme_data import MemeData, load_meme_data

    meme_data: list[MemeData] = load_meme_data()
    chosen_meme = meme_data[10]
    overlay_text_on_image(
        chosen_meme,
        chosen_meme["example_output"]
    )
For our quick test, we’ll import the MemeData type (for type hints) and the load_meme_data function to load the data. We then load the meme data using the load_meme_data function so that we have the list of MemeData objects to work with. For this test I will arbitrarily choose the 11th meme in the list (index 10) but choose any you like.
Now we can make a test call to our new overlay_text_on_image function. We pass in the chosen_meme object for the data and the second argument is a list of the texts we want to draw. Let’s just use the "example_output" key from the chosen_meme object as this is just a test run. Now go ahead and run your file, and you’ll see something like the following in your terminal:
Image saved to C:\Coding_Vault\Meme_Gen\output\e25413fd-dc04-4041-a6b6-e06bd2c1dd68.png
Go ahead and CTRL+Click on the path or open it manually and tada:
Here is our first meme image with the text successfully overlaid on top of it! If you see this image then congratulations, you did everything correctly!
Feel free to change the index in the test from [10] to any other number if you want to test some more. Here is index [4] for example:
These are just the example texts of course, so let’s head over to part 4 where we will string this all together with the ChatGPT logic and generate some real memes! We will also build a nice frontend so you don’t have to keep opening the meme images manually. See you there!