AI Meme Engineer (2/4) – Calling ChatGPT and Generating Memes

Welcome to part two of this tutorial series. Now that we have our meme data all set up and ready to go, it’s time to look at integrating ChatGPT into our project and generating the text for our memes.

Getting a ChatGPT API Key

The first thing we’ll need is an OpenAI account with an API key, since the ChatGPT API is accessed through the OpenAI platform. If you already have an account and API key, skip ahead past this next portion. If this is your first time using it, don’t worry: new accounts have typically come with some free trial credits, so you may be able to follow along without having to pay any money.

Go to https://platform.openai.com/ and log in. If you already have an OpenAI account set up with a free or paid API key, you can use that. If you don’t have an account, you can sign up with your Google account. It will ask you for something simple, like your birthday, and ta-da, you have an account!

When you log in on a brand new account you will see something like this (navigate to the dashboard if you land on a different page):

Find API keys in the left sidebar and click on it. If this is a new account it will ask you to verify your phone in order to create a new API key:

The reason they do this is to prevent bots from creating loads of free accounts and abusing their system. Just give them a phone number and they will send you a verification code to enter. Depending on OpenAI’s current offer, a newly verified account may also receive some free credits to follow along with this tutorial, so it’s a win-win!

Find the green button to + Create new secret key in the top right corner and click on it:

In the next window, you can leave everything as is. You don’t need to give it a name or select a project. You can do these things if you want to, but I’ll just create a nameless general key for now by accepting everything as is and clicking the green Create secret key button:

You will now see your new API key:

Make sure you press the Copy button and save the key somewhere safe, such as a password manager. You won’t be able to see this key again, though you can always generate a new one if you lose it. Don’t share your key, as anyone who has it can use your credits!

Adding ChatGPT to our project

We’ll need to save our API key in our project. We don’t want to hardcode it into our scripts, as that would be a security risk. Instead, we’ll save it in a separate file and read it from there. This way, we can also easily share our code without sharing our API key.

Create a new file in the root folder and name it .env:

📁 Meme_Gen
    📁 fonts
        📄 ARIAL.TTF
        📄 ComicSansMS3.ttf
        📄 impact.ttf
    📁 output
    📁 templates
        🖼 (all the images in here...)
    📄 .env  --> create this file
    📄 load_meme_data.py
    📄 meme_data.json

The file has no name but only the extension .env and will store our API key. Open the file and paste your API key in there:

OPENAI_API_KEY=sk-loadsoflettersandnumbers

Make sure to replace sk-loadsoflettersandnumbers with your actual API key. Save and close this file. If you ever share your code, make sure to exclude this file from the shared code.

The next thing we’ll need to do is install two libraries for Python. Open a terminal in VSCode by clicking Terminal in the top menu and selecting New Terminal, or using the shortcut [Ctrl + `]. In the terminal, type the following command and press Enter:

pip install openai --upgrade

This will either install or update (if you already have it) the OpenAI Python library. This library will allow us to easily interact with the ChatGPT API.

The next library we need is called python-dotenv. This library will allow us to read the API key from the .env file we created. Install it by typing the following command in the terminal and pressing Enter:

pip install python-dotenv

Setting up the ChatGPT logic

Time to rock ‘n roll! Let’s create a new Python script called get_meme.py in our project folder:

📁 Meme_Gen
    📁 fonts
        📄 ARIAL.TTF
        📄 ComicSansMS3.ttf
        📄 impact.ttf
    📁 output
    📁 templates
        🖼 (all the images in here...)
    📄 .env
    📄 get_meme.py  --> create this file
    📄 load_meme_data.py
    📄 meme_data.json

The meme generation logic will not be too hard but will involve several steps, so let’s build a simplified version first that just calls ChatGPT, and then we can keep adding stuff on top of it until we have what we need.

Inside of get_meme.py, let’s start by importing the necessary libraries and loading our API key:

import os

from dotenv import load_dotenv
from openai import OpenAI


load_dotenv()

The openai library will allow us to interact with the ChatGPT API easily, and dotenv will allow us to read our API key from the .env file. Calling the load_dotenv() function reads the .env file and stores the key-value pairs it contains as environment variables. The os module is imported because it lets us read these environment variables.
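Because os.getenv() returns None instead of raising an error when a variable is missing, a typo in the .env file would only show up later as an authentication failure. A minimal sketch of a fail-fast check follows. Note that require_env is a hypothetical helper name that is not part of this tutorial’s script, and the variable is set by hand here as a stand-in for load_dotenv() so the snippet runs on its own:

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, or raise if it is missing or empty."""
    value = os.getenv(name)  # returns None when the variable is not set
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value


# Stand-in for load_dotenv(): set a placeholder value by hand for this demo.
os.environ["OPENAI_API_KEY"] = "sk-placeholder"
print(require_env("OPENAI_API_KEY"))
```

In the real script you could call such a helper right after load_dotenv() if you prefer an early, readable error message.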

Let’s continue:

CLIENT = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
SYSTEM_MESSAGE = {
    "role": "system",
    "content": [
        {
            "type": "text",
            "text": "You are a cat. Reply to whatever the user says in whatever manner you see fit, but remember to be a cat.",
        }
    ],
}

Here, we create a CLIENT object that will allow us to interact with the ChatGPT API. We pass in the api_key parameter by using the os library’s .getenv method to read the environment variable (which the load_dotenv() function loaded from the .env file).

We also create a SYSTEM_MESSAGE object that will be used to set the initial context for the conversation with ChatGPT. The object structure is a bit odd, but looking past that it basically just says that this message is coming from the system role and contains a text message that tells ChatGPT to act like a cat.

Calling ChatGPT

We’ll make these instructions more complex later. For now, let’s move on and create a function that will call ChatGPT:

def call_chatgpt(user_message):
    response = CLIENT.chat.completions.create(
        model="gpt-4o",
        messages=[SYSTEM_MESSAGE, user_message], # type: ignore
        temperature=1,
        max_tokens=2048,
    )
    return response.choices[0].message.content

This function will take a user_message as input and return a response from ChatGPT. We call the CLIENT.chat.completions.create method to send a message to ChatGPT, passing in the following parameters:

model: The model we want to use. We’re using gpt-4o here, which is currently one of the latest multi-purpose models.
messages: A list of messages to send to ChatGPT. We send the SYSTEM_MESSAGE first and then the user_message which this function receives as input.
temperature: A parameter between 0 and 2 that controls the randomness of the response. A higher value makes the response more random. Our setting of 1 is the API default and leaves room for creative answers (you can always try lowering it later).
max_tokens: The maximum number of tokens (length) in the response. We set this to 2048 to allow for a longer response. If you make this too short, the response might be cut off halfway through.

Note: The # type: ignore comment is there to suppress a type error that the linter might throw. This is because SYSTEM_MESSAGE and user_message are not of the exact expected types. We used the correct structure for these variables ourselves so you can safely ignore this error.

Now that we have the response from OpenAI, we return the content of the message of the first choice in the response.choices list. Accessing it through .choices[0].message.content is a bit awkward, but all we’re doing is returning the response text that ChatGPT generated.
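The .choices[0].message.content chain can be wrapped in a small helper that also notices when a reply was cut short by max_tokens. This is only a sketch: extract_text is a hypothetical name, and a SimpleNamespace object stands in for the real response so the example runs without an API call. It assumes each choice carries a finish_reason field, which the Chat Completions API sets to "length" when the token limit ended the reply:

```python
from types import SimpleNamespace
from typing import Optional


def extract_text(response) -> Optional[str]:
    """Return the text of the first choice, warning if it was truncated."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        print("Warning: reply hit max_tokens and may be cut off.")
    return choice.message.content


# Fake response mimicking only the attributes used above (not a real API object).
fake = SimpleNamespace(
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(content="Purr."),
        )
    ]
)
print(extract_text(fake))
```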

Now let’s create one more function that is actually going to call the call_chatgpt function. This may seem a bit redundant now, but we’ll add more logic to this function later:

def generate_memes(user_input: str):
    user_message = {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": user_input,
            }
        ],
    }
    response = call_chatgpt(user_message)
    print(response)

This function named generate_memes takes a user_input which is a string as input. It creates a user_message object that is structured similarly to the SYSTEM_MESSAGE object, but this time it’s from the user role and contains the user_input text. We then call the call_chatgpt function with this user_message object and print the response.
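Since the system and user messages share the same shape, the structure can be illustrated with one small builder function. This is just a sketch for clarity; make_message is a hypothetical helper and the tutorial’s script keeps writing the dictionaries out by hand:

```python
def make_message(role: str, text: str) -> dict:
    """Build a chat message dict in the same shape as SYSTEM_MESSAGE."""
    return {
        "role": role,
        "content": [{"type": "text", "text": text}],
    }


system_message = make_message("system", "You are a cat.")
user_message = make_message("user", "I am writing a tutorial series")
print(user_message)
```

Only the role and the text differ between the two messages; everything else is fixed scaffolding.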

Giving it a test run

For now, we’re not going to return anything; we’ll worry about those details later. Let’s run a test first to see if what we have so far works. We’ll reuse the if __name__ == "__main__": trick we learned last time to do this. Add the following code at the end of the script:

if __name__ == "__main__":
    print("Welcome to the meme generator!")
    print("You can provide a situation or a topic to generate a meme.")
    generate_memes(user_input=input("Please provide a topic or situation: "))

So this code will only run if we run this file directly. It will print a welcome message and ask the user for a topic or situation. The input() function will wait for the user to type something and press Enter. Whatever the user types will be passed to the generate_memes function.

Now of course it’s not going to generate a meme if we use it right now, because we instructed ChatGPT to act like a cat. We’ll fix those details later, but for now, let’s just see if the ChatGPT part works by running the file:

Welcome to the meme generator!
You can provide a situation or a topic to generate a meme.
Please provide a topic or situation: I am writing a tutorial series
Purr, that sounds interesting! Is it about how to properly chase string or the art of the perfect nap? I could give you some pointers on those topics! *Wiggles whiskers*

Awesome, our ChatGPT call and the code work so far.

Creating proper system instructions

The first thing we need to do is craft some proper system instructions for ChatGPT to get it to generate meme texts for us. After that, we can improve on our overall get_meme.py script to generate the meme texts we need.

Create a new file named system_instructions.py in the project folder:

📁 Meme_Gen
    📁 fonts
        📄 ARIAL.TTF
        📄 ComicSansMS3.ttf
        📄 impact.ttf
    📁 output
    📁 templates
        🖼 (all the images in here...)
    📄 .env
    📄 get_meme.py
    📄 load_meme_data.py
    📄 meme_data.json
    📄 system_instructions.py  --> create this file

We’ll come back to get_meme.py in a moment, but we’ll store the system instructions in this separate file as they will be a bit longer than a single string. Open system_instructions.py and add the following:

SYSTEM_INSTRUCTIONS_TEMPLATE = """
You are a meme-generating robot. You will receive a situation or simply some text from the user. If the user describes a situation or even a whole story, use the main situation or topic as much as possible for the meme. If the user simply provides very simple text or even a single word, use the topic to generate a meme.

You can use the following meme templates: {meme_data_text}. It is your job to choose one of these templates and then generate the meme based on the user's input. Your output will be in line with the meme template you choose, so if it has 2 example sentences, you should generate 2 sentences, just like the example. Make sure your example also follows the meme template example sentence structure, so do not suddenly use very long sentences or a different structure.

Provide your output in the form of a valid JSON object with the following keys and values:
meme_id: The ID of the meme template you chose.
meme_name: The name of the meme template you chose.
meme_text: The text you generated, matching the structure of the example, as a list of texts. Stick to the same number of texts as instructed in the meme template data.

I want to have 3 options in the output object, each using a different meme template, so you will provide the above output 3 times wrapped in a JSON list.

Example user input:
I ate all the chocolate.

Example output:
{example_output}
"""

This is just a single very long string variable. It contains the instructions we want to give to ChatGPT. We tell ChatGPT that it is a meme-generating robot and that it should use the user’s input to generate a meme. I won’t go over all the details as you can pretty much read the instructions yourself, but we’re just being very clear about exactly what we want ChatGPT to do, which is the main point of importance here.

Note that there are two placeholders in this string: {meme_data_text} and {example_output}. We’ll replace these with actual data later. This is a template string.
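To make the placeholder mechanism concrete, here is a shortened sketch using Python’s built-in str.format(). The template text and the values passed in are cut-down stand-ins chosen for illustration, not the real instructions or meme data:

```python
TEMPLATE = """
You can use the following meme templates: {meme_data_text}.

Example output:
{example_output}
"""

# Curly braces inside the inserted values are left alone; only braces
# written in the template itself are treated as placeholders.
filled = TEMPLATE.format(
    meme_data_text="0: Drake Hotline Bling Meme",
    example_output='{"output": []}',
)
print(filled)
```

One thing to keep in mind with this approach: if the template itself ever needs a literal curly brace, it has to be doubled ({{ or }}) so that format() does not mistake it for a placeholder.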

First I’m going to define the example output in a separate variable below the SYSTEM_INSTRUCTIONS_TEMPLATE. We told ChatGPT to generate 3 different memes and provide them to us in a JSON list. We’ll provide an exact example of what this list should look like to be absolutely sure.

EXAMPLE_OUTPUT = """
{
    "output": [
        {
            "meme_id": 6,
            "meme_name": "Hide the Pain Harold",
            "meme_text": [
                "Ate all the chocolate.",
                "Realized now I have nothing for dessert."
            ]
        },
        {
            "meme_id": 7,
            "meme_name": "Success Kid",
            "meme_text": [
                "Found the last chocolate bar in the pantry.",
                "Ate it all by myself!"
            ]
        },
        {
            "meme_id": 0,
            "meme_name": "Drake Hotline Bling Meme",
            "meme_text": [
                "Sharing the chocolate.",
                "Eating all the chocolate myself."
            ]
        }
    ]
}
"""

Ok, so this is just a well-formed JSON object, with the list of 3 options stored under the output key, to make it completely clear to ChatGPT what we want to receive from it. Just a JSON object in this format, with 3 different options, and nothing else.

All we really need is the meme_id to identify the meme template we need, but I included the meme_name as well as it is easier to read in our own print logs in the console. Then the most important thing is the meme_text itself of course, which is where ChatGPT gives us the actual meme in a JSON list format.

Using this JSON syntax is going to make it very easy for us to parse the response we get back from ChatGPT into a valid Python object so we can send it straight into our image editing part of the code later on.
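As a quick illustration of that parsing step, the sketch below feeds a shortened copy of the example output to json.loads() from the standard library. The variable names are placeholders for this demo only:

```python
import json

raw = """
{
    "output": [
        {
            "meme_id": 6,
            "meme_name": "Hide the Pain Harold",
            "meme_text": [
                "Ate all the chocolate.",
                "Realized now I have nothing for dessert."
            ]
        }
    ]
}
"""

data = json.loads(raw)     # JSON object becomes a Python dict
first = data["output"][0]  # JSON list becomes a Python list
print(first["meme_id"], first["meme_name"])
for line in first["meme_text"]:
    print("-", line)
```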

Now let’s create a function to put this whole thing together to create the final system instructions:

def get_system_instructions(meme_data_text):
    return SYSTEM_INSTRUCTIONS_TEMPLATE.format(
        meme_data_text=meme_data_text, example_output=EXAMPLE_OUTPUT
    )

So we have a function named get_system_instructions that takes a meme_data_text as input. This function will return the SYSTEM_INSTRUCTIONS_TEMPLATE string with the placeholders replaced by the meme_data_text and EXAMPLE_OUTPUT variables.

What this means is that in a different Python file, I simply import this function and give it the meme data in text format to get the full system instructions I can then feed straight to ChatGPT. Having it in this separate file keeps everything nice and organized. Go ahead and save and close the system_instructions.py file.

Upgrading the `get_meme.py` script

Now it’s time to get back to our get_meme.py file and make use of these system instructions. Open get_meme.py and let’s first update our imports:

import json  # new import
import os
from typing import TypedDict  # new import

from dotenv import load_dotenv
from openai import OpenAI

from load_meme_data import load_meme_data_flat_string  # new import
from system_instructions import get_system_instructions  # new import

We added an import for the json library because we will be asking ChatGPT to return JSON to us, so we’ll need something to parse that data into a Python object. We’ll also use TypedDict to create a quick type declaration, which we’ll explain in a moment.

The load_meme_data_flat_string function is the one we created in part 1 that loads the data as a flat string (we cannot insert dictionary objects into our system instructions) and finally, we have get_system_instructions which we just finished creating.

Now straight after the imports, before we get into the load_dotenv() function, let’s add a type declaration for the output ChatGPT will be sending back to us:

class MemeGPTOutput(TypedDict):
    meme_id: int
    meme_name: str
    meme_text: list[str]

This is a TypedDict that we’re calling MemeGPTOutput. It has three keys: meme_id which is an integer, meme_name which is a string, and meme_text which is a list of strings. This is the exact structure of the JSON object we’re expecting from ChatGPT, as these are the instructions we’re giving in the system_instructions.py file.

Just like the similar TypedDict class we created in our load meme data Python file, this is just a type hint which states that any object of type MemeGPTOutput is a dictionary with these three keys and their respective types. This is a way to make our code a bit more readable and to help the linter understand what we’re doing.
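A short sketch may help show what “just a type hint” means in practice. The sample values below are made up for the demo; the point is that the object is an ordinary dict at runtime and Python itself does not check the declared types:

```python
from typing import TypedDict


class MemeGPTOutput(TypedDict):
    meme_id: int
    meme_name: str
    meme_text: list[str]


meme: MemeGPTOutput = {
    "meme_id": 7,
    "meme_name": "Success Kid",
    "meme_text": ["Found the last chocolate bar.", "Ate it all by myself!"],
}

# At runtime this is a plain dict; the class only informs type checkers.
print(type(meme))

# A wrong value type is not rejected when the code runs. Only a linter
# or type checker would flag this line.
wrong: MemeGPTOutput = {"meme_id": "seven", "meme_name": "x", "meme_text": []}  # type: ignore
print(wrong["meme_id"])
```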

For the next block of code, we’ll load the meme data text and then plug in our real system instructions into the SYSTEM_MESSAGE object. Update your code like this:

load_dotenv()

CLIENT = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
MEME_DATA_TEXT = load_meme_data_flat_string() # add this line
SYSTEM_MESSAGE = {
    "role": "system",
    "content": [
        {
            "type": "text",
            "text": get_system_instructions(MEME_DATA_TEXT),  # pass in the real instructions here
        }
    ],
}

Just like we planned, we merely have to give the MEME_DATA_TEXT to the get_system_instructions function to get the full instructions we need to send to ChatGPT.

For the call_chatgpt function we’ll only need to make one slight change. ChatGPT generally provides answers to questions in a text format, but what many people don’t know is that there is actually a JSON mode in which ChatGPT can return structured data. As we’re asking it to return JSON to us, turning on this mode is perfect for our use case. Update the call_chatgpt function like this:

def call_chatgpt(user_message):
    response = CLIENT.chat.completions.create(
        model="gpt-4o",
        messages=[SYSTEM_MESSAGE, user_message], # type: ignore
        temperature=1,
        max_tokens=2048,
        response_format={"type": "json_object"}, # add this line
    )
    return response.choices[0].message.content

The # type: ignore comment was already there. You only have to add the response_format={"type": "json_object"}, line to turn on JSON mode. Everything else can stay the same.

Next up is the generate_memes function. We’ll need to make quite a few updates here so I will just provide the entire function here and then go over the new version in detail. Replace the generate_memes function with this:

def generate_memes(user_input: str) -> list[str] | None:
    user_message = {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": user_input,
            }
        ],
    }
    response = call_chatgpt(user_message)

    if not response:
        print("No response from the model, something went wrong.")
        return

    try:
        meme_output: list[MemeGPTOutput] = json.loads(response)['output']
    except json.JSONDecodeError:
        print("Invalid response from the model.")
        return

    print(meme_output)

Our generate_memes function still takes a user input string, but its signature now declares a return type of list[str] | None, meaning it returns either a list of strings or None. For now, the function only contains bare return statements and otherwise falls off the end, so it always returns None, but we’ll add the real return value later.

The part which defines the user_message object is the same as before and we’re still calling the call_chatgpt function with this user_message. After this, we have a simple if not response test just in case, to check if something went wrong. If there is no response we print a message and return None as there is no point running the rest of the function.

Now we have a try: block where we try to parse the response we get back from ChatGPT. We use the json.loads function to parse the JSON string into a Python object and then take the list stored under its 'output' key, matching the example we gave in the system instructions. As long as ChatGPT gives us the correct structure, and it is very good at doing this, we will now have a list of MemeGPTOutput dictionaries in our meme_output variable.

If something does go wrong while trying to parse the JSON ChatGPT sends us it will jump over to the except block where we catch the json.JSONDecodeError exception. If this happens we print a message and return. Of course, you could add more advanced retry logic where you give feedback to ChatGPT if it fails, but for now we’ll keep it simple and this is good enough.
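One gap worth knowing about: json.JSONDecodeError only covers replies that are not valid JSON. A reply that parses fine but lacks the output key would raise a KeyError instead. Below is a sketch of a slightly more defensive parser. The name parse_meme_response is hypothetical and this is one possible approach, not the version used in the rest of this series:

```python
import json
from typing import Optional


def parse_meme_response(raw: str) -> Optional[list]:
    """Parse the model's JSON reply and return the list under "output", or None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        print("Invalid JSON from the model.")
        return None

    # Guard against valid JSON that does not have the expected shape.
    memes = data.get("output") if isinstance(data, dict) else None
    if not isinstance(memes, list):
        print('Valid JSON, but no "output" list was found.')
        return None
    return memes


print(parse_meme_response('{"output": [{"meme_id": 0}]}'))
print(parse_meme_response('{"memes": []}'))
print(parse_meme_response("not json"))
```

Returning None in every failure case keeps the calling code simple, since it only has to check for one “something went wrong” value.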

Finally, we print the meme_output variable which should contain the 3 meme outputs we asked ChatGPT to generate for us. Of course, we don’t have the whole image-editing logic that will embed these memes into images yet, but let’s just see how it works so far.

The if __name__ == "__main__": block at the end of the file will remain the same with no changes:

if __name__ == "__main__":
    print("Welcome to the meme generator!")
    print("You can provide a situation or a topic to generate a meme.")
    generate_memes(user_input=input("Please provide a topic or situation: "))

Now we’re ready to test our code. Run your updated get_meme.py file and provide a topic or situation when prompted:

Welcome to the meme generator!
You can provide a situation or a topic to generate a meme.
Please provide a topic or situation: I just got home from work and I'm very hungry
[{'meme_id': 6, 'meme_name': 'Hide the Pain Harold', 'meme_text': ['Finally home from work.', 'Too hungry to think.']}, {'meme_id': 9, 'meme_name': 'Disaster Girl', 'meme_text': ['When you rush home from work to eat.', 'But the fridge is empty.']}, {'meme_id': 0, 'meme_name': 'Drake Hotline Bling Meme', 'meme_text': ['Getting food delivered.', 'Cooking something quickly.']}]

There we go! That’s perfect and you can see that we have a valid object structure. It looks more like something we just fetched from a database than something ChatGPT generated, isn’t that cool?

That’s it for the base logic of the text meme generation. I’ll see you back in part 3 where we dive into the image editing process to start turning this into real memey goodness!