Hi and welcome to the first part of this ‘Introduction to AI Engineering’ tutorial series! In this part we’ll first take a look at ChatGPT, the different options and models available, and how to use it via the API with Python so we can automate our ChatGPT calls.
What you’ll need
To follow along with this tutorial series, you’ll need a code editor. I recommend using Visual Studio Code, but you can use any code editor you like. If you don’t have a code editor installed, you can download Visual Studio Code from here.
You’ll also need to have Python installed on your computer. If you don’t have Python installed, you can download it from here. If you’re asked whether you want to add Python to your PATH, make sure to check yes.
The last thing we’ll need is an OpenAI account, which is the same account you use for ChatGPT. If this is your first time using it, don’t worry, you will get a bunch of free credits after signing up for your first account so you should be able to follow along without having to pay any money.
Go to https://platform.openai.com/ and log in. If you already use OpenAI and already have an account set up with a free or paid API key, you can use that. If you don’t have an account just log in with your Google account. It will ask you something simple, like to fill in your birthday, and ta-da, you have an account!
When you log in on a brand new account you will see something like this (navigate to the dashboard if you land on a different page):
Find API keys in the left sidebar and click on it. If this is a new account it will ask you to verify your phone in order to create a new API key:
The reason they do this is to prevent bots from creating loads of free accounts and abusing their system. Just give them a phone number and they will send you a verification code to enter. You will also get a bunch of free credits from them to follow along with this tutorial, so it’s a win-win!
Find the green + Create new secret key button in the top right corner and click on it:
In the next window, you can leave everything as is. You don’t need to give it a name or select a project. You can do these things if you want to, but I’ll just create a nameless general key for now by accepting everything as is and clicking the green Create secret key button:
You will now see your new API key:
So make sure you press the Copy button and save it somewhere safe, maybe a password manager. You won’t be able to see this key again, though you can always generate a new one if you lose it. Make sure not to share your key, as anyone with your key can use your credits!
Now that we have our API key, we can get started!
Models available and pricing
If we go to the pricing page on the OpenAI website, we can see that several versions of the newest GPT-4o model are available. At the time of writing this tutorial, the models and their costs look like this:
As you can see, there are currently three entries for GPT-4o, so what is going on here? gpt-4o-2024-05-13 and gpt-4o-2024-08-06 are dated snapshots of the model, the first being the original release and the second the latest update. The generic gpt-4o name is an alias rather than a fixed version: it is moved to newer snapshots over time, so unless you need to pin a specific version you don’t have to worry about the exact one.
Compared to the $5 per 1M input tokens and $15 per 1M output tokens listed for the other two entries, the latest gpt-4o-2024-08-06 version is considerably cheaper at $2.50 per 1M input tokens and $10 per 1M output tokens. This is nice as it encourages us to use the latest and most up-to-date model. Now this may sound very expensive, but 1 million tokens is a very large amount. If we look at the same prices but per 1000 tokens it looks a lot different:
So what is a token exactly? A token in the context of a Large Language Model (LLM) like GPT-4o is a piece of text that the model processes. The model uses these tokens to understand and generate text, with each token representing a chunk of the input or output data.
As an LLM does not think in words like we humans do, a token in the English language is typically about 3-4 text characters long. So as a rough generalization, you can think of 1000 input tokens as about 3000-4000 characters of text you send to the model, and 1000 output tokens as about 3000-4000 characters of text the model sends back to you. (Note that this can vary depending on the specific text and language used.)
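To make the arithmetic concrete, here is a rough sketch that turns a character count into an estimated token count and cost. It assumes the 4-characters-per-token rule of thumb and the gpt-4o-2024-08-06 prices quoted above; the function names are my own, and real token counts will differ from this estimate.

```python
# Rough cost estimate. The prices and the characters-per-token ratio are
# assumptions taken from the figures quoted above; check the pricing page
# for current values.
INPUT_PRICE_PER_1M = 2.50    # USD per 1M input tokens (gpt-4o-2024-08-06)
OUTPUT_PRICE_PER_1M = 10.00  # USD per 1M output tokens
CHARS_PER_TOKEN = 4          # rule of thumb for English text


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on the character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for a single request."""
    total = (input_tokens * INPUT_PRICE_PER_1M
             + output_tokens * OUTPUT_PRICE_PER_1M)
    return total / 1_000_000


print(estimate_tokens("a" * 4000))            # 1000
print(f"${estimate_cost(1000, 1000):.4f}")    # $0.0125
```

So sending about 4000 characters and getting about 4000 characters back would cost a little over one cent with these assumed prices.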
The GPT-4o model shown above is currently the most powerful and capable flagship model. But as LLMs are quite powerful in general and we often ask them to do simple tasks or answer simple questions, we often don’t need quite this much firepower.
It would be kind of like sending a tank to pick up groceries at the store. It can do it, but it’s a bit overkill. Imagine you’re coming up with your tank and brraaawhh and all the roads get destroyed, traffic signs falling over and the store is in ruins and you’re like “I just wanted some milk and bread…”.
Luckily OpenAI has more suitable budget models for this purpose. Where this used to be gpt-3.5-turbo in the past, the new GPT-4o model has its own budget version called gpt-4o-mini, which replaces the aforementioned 3.5-turbo. If we look at the pricing we can see it is more than 15 times cheaper than the flagship model for both input and output tokens:
And the prices per 1000 tokens are so small it’s hard to even read them:
So depending on the complexity of the task at hand, we’ll use either gpt-4o or gpt-4o-mini. For simple tasks like summarizing text, answering questions, or generating simple text, the gpt-4o-mini model is more than enough. For more complex tasks like generating code, writing essays, or creating complex text, the gpt-4o model is more suitable.
Project setup
So let’s take a look at how this works and how to use ChatGPT programmatically by calling the API. First of all, create a folder for your project that we can use for all parts of this tutorial series. I’ll name mine Intro_to_AI_engineering:
📁 Intro_to_AI_engineering
Now open the folder in VSCode. You can do this by right-clicking the folder and selecting Open with Code. If you don’t see this option, you can open VSCode and then open the folder from there.
The first thing we’ll need is to save our API key in this project. We don’t want to hardcode it into our scripts, as that would be a security risk. Instead, we’ll save it in a separate file and read it from there. This way, we can also easily share our code without sharing our API key.
Create a new file in the folder by right-clicking the explorer/browser sidebar and selecting New File. Name it .env:
📁 Intro_to_AI_engineering
    📄 .env
The file has no name but only the extension .env and will store our API key. Open the file and paste your API key in there:
OPENAI_API_KEY=sk-loadsoflettersandnumbers
Make sure to replace sk-loadsoflettersandnumbers with your actual API key. Save and close this file. If you ever share your code, make sure to exclude this file from the shared code.
The next thing we’ll need to do is install two libraries for Python. Open a terminal in VSCode by clicking Terminal in the top menu and selecting New Terminal, or by using the shortcut [Ctrl + `] (the backtick key). In the terminal, type the following command and press Enter:
pip install openai --upgrade
This will either install or update (if you already have it) the OpenAI Python library. This library will allow us to easily interact with the ChatGPT API.
The next library we need is called python-dotenv. This library will allow us to read the API key from the .env file we created. Install it by typing the following command in the terminal and pressing Enter:
pip install python-dotenv
Making our first ChatGPT request
Now that we have everything set up and ready to go, let’s create a new Python file in the folder by right-clicking the folder in the sidebar and selecting New File. Name the file chat_gpt_request.py:
📁 Intro_to_AI_engineering
    📄 .env
    📄 chat_gpt_request.py
Open up this file and let’s make a request. The first thing we’ll put in this file is our imports:
import os
from openai import OpenAI
from dotenv import load_dotenv
The os (operating-system) library lets us read environment variables, and load_dotenv fills those variables from our .env file, so together they give us access to the API key. The OpenAI import is the class from the library we installed earlier that will allow us to interact with the ChatGPT API.
Now we’ll load the API key and set up the OpenAI object. Add this below the imports:
load_dotenv()
CLIENT = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))
Calling the load_dotenv() function will read the .env file with our API key and make it so the os library now has access to this key. We then create a CLIENT object using the OpenAI class we imported and pass in an argument api_key with the value of our API key.
The os.getenv('OPENAI_API_KEY') call reads the environment variable OPENAI_API_KEY, which load_dotenv() has just loaded from the .env file. This is why the name of the key in the .env file must be the same as the name we pass to os.getenv. The OpenAI object which we named CLIENT now has access to our api_key.
Don’t worry too much about this if it seems confusing, this is just sort of boilerplate setup code.
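If the key name in the .env file is misspelled, os.getenv quietly returns None and the error only shows up later. As an optional safeguard, here is a minimal sketch of a check that fails early with a readable message. The require_env helper and the DEMO_API_KEY variable are hypothetical names used only for this illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the environment variable `name`, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set. Check your .env file.")
    return value


# Demo with a made-up variable so no real key is needed to try this out.
os.environ["DEMO_API_KEY"] = "sk-example"
print(require_env("DEMO_API_KEY"))  # sk-example
```

In our script you would call require_env("OPENAI_API_KEY") after load_dotenv() and pass the result as the api_key argument.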
So now that we have a CLIENT to interact with ChatGPT, let’s write some instructions that we want to give to ChatGPT. Add the following code below the previous code:
summarizing_instructions = """
You are a helpful assistant for summarizing information. You will be provided with input and your job is to summarize it in a concise and professional manner. Make sure you keep the important key points intact and provide a clear and easy-to-understand summary of the information that is engaging and easy to read.
"""
We create a variable summarizing_instructions and assign it a multi-line string. A multi-line string just means that it can span multiple lines and is enclosed by triple quotes ("""). This string contains instructions for ChatGPT to follow.
We’re asking ChatGPT to become a summarizer and just give us back a summary of whatever we send it.
Instructions to summarize are nice, but we’ll also need something to actually summarize. For now, let’s create a text_to_summarize.txt file that will hold the text to summarize, in the same folder as your chat_gpt_request.py file:
📁 Intro_to_AI_engineering
    📄 .env
    📄 chat_gpt_request.py
    📄 text_to_summarize.txt
Now simply go somewhere on the internet and just copy and paste any text you would like to try this with. I’ll be using the Wikipedia page for the Portuguese city of Porto as an example, but feel free to use anything you like. Paste the text into the text_to_summarize.txt file and save the file.
Now our text to summarize is inside a separate .txt file where we can easily change the text later without having to change our Python code. We do need to load this text inside our code to use it though, but Python makes this quite easy for us. Back in our chat_gpt_request.py file, let’s load this text into a variable:
with open("text_to_summarize.txt", "r", encoding='utf-8') as file:
    text_to_summarize = file.read()
This code will open the text_to_summarize.txt file in read mode ("r") and read the contents of the file into a variable named text_to_summarize. The encoding='utf-8' argument is used to specify the encoding of the file, which is necessary for reading files with special characters. It’s often safe to assume that text files are encoded in UTF-8.
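If you want a friendlier error when the file is missing or still empty, the same loading step can be wrapped in a small function. This is just a sketch using pathlib from the standard library, and load_text is a hypothetical helper name rather than part of the tutorial’s script.

```python
from pathlib import Path


def load_text(path: str) -> str:
    """Read a UTF-8 text file and fail early if it is missing or empty."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"Could not find {file_path}")
    # strip() removes leading and trailing whitespace such as blank lines.
    text = file_path.read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"{file_path} is empty, nothing to summarize")
    return text
```

You could then write text_to_summarize = load_text("text_to_summarize.txt") instead of the with block.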
Now that we have both summarizing instructions and something to be summarized, let’s continue below our previous code and make a call to ChatGPT using our CLIENT object:
print("Asking SummaryGPT...")
summary = CLIENT.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": summarizing_instructions},
        {"role": "user", "content": text_to_summarize}
    ]
)
print(summary)
The CLIENT object has a chat attribute which has a completions attribute which has a create method. This method will create a new chat completion, which is essentially a request to ChatGPT to generate a response based on the input we provide.
For the model, I’ve selected the gpt-4o-mini model for now, as this way the request is really cheap and you don’t have to worry about the costs even if you summarize very long texts. Feel free to use gpt-4o-2024-08-06 as discussed earlier though if you want to, it will still be very affordable.
The messages argument is a list of messages that ChatGPT will use to generate a response. The first message is a system message, as indicated by the role of system. This is where we typically provide the instructions that ChatGPT should follow on every call. The second message, with the role of user, represents the user input. In this case we provide the text we want to summarize, but this could also be where a third-party user asks ChatGPT a specific question.
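Since the messages list always has the same shape, it can help to see it built on its own, without any API call. The build_messages function below is a hypothetical helper for illustration only, and the two strings are made-up examples.

```python
def build_messages(instructions: str, user_text: str) -> list[dict[str, str]]:
    """Build the two-message list: system instructions, then user input."""
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": user_text},
    ]


messages = build_messages("You summarize text.", "Porto is a city in Portugal.")
for message in messages:
    print(message["role"], "->", message["content"])
# system -> You summarize text.
# user -> Porto is a city in Portugal.
```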
At the end we’re going to print the summary object which will contain the response from ChatGPT so we can see what is inside. Go ahead and save the file and run it by typing the following command in the terminal and pressing Enter:
python chat_gpt_request.py
You can also press the play button in the top right corner as a neat shortcut instead of the terminal (if you use VSCode):
You will see the output summary we want but also a lot of other stuff around it so it’s kind of hard to read:
Let’s take a moment and ‘unfurl’ this output so we can see what is going on here:
ChatCompletion(
    id="chatcmpl-A9936mMiNvFs3Xx0VYHBwgULIF3Tu",
    choices=[
        Choice(
            finish_reason="stop",
            index=0,
            logprobs=None,
            message=ChatCompletionMessage(
                content="This is where the actual answer is located with our summary of the text.",
                refusal=None,
                role="assistant",
                function_call=None,
                tool_calls=None,
            ),
        )
    ],
    created=1726743100,
    model="gpt-4o-mini-2024-07-18",
    object="chat.completion",
    service_tier=None,
    system_fingerprint="fp_e9627b5346",
    usage=CompletionUsage(
        completion_tokens=268,
        prompt_tokens=11732,
        total_tokens=12000,
        completion_tokens_details={"reasoning_tokens": 0},
    ),
)
You can see we have a whole bunch of information available to us. If you want to learn more about everything in detail, see the resources at the end of this lesson, but notice we also have details on the amount of tokens used at the end.
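The token counts are handy for checking what a request actually cost. Here is a minimal sketch that does the math for the numbers in the output above. It uses a stand-in object instead of a real response so it runs without an API call, and the gpt-4o-mini prices ($0.15 and $0.60 per 1M tokens) are assumed values that you should check against the pricing page.

```python
from types import SimpleNamespace

# Assumed gpt-4o-mini prices in USD per 1M tokens; check the pricing page.
MINI_INPUT_PRICE_PER_1M = 0.15
MINI_OUTPUT_PRICE_PER_1M = 0.60


def request_cost(usage) -> float:
    """Cost in USD, given an object with prompt_tokens and completion_tokens."""
    total = (usage.prompt_tokens * MINI_INPUT_PRICE_PER_1M
             + usage.completion_tokens * MINI_OUTPUT_PRICE_PER_1M)
    return total / 1_000_000


# Stand-in for summary.usage, using the token counts from the output above.
usage = SimpleNamespace(prompt_tokens=11732, completion_tokens=268, total_tokens=12000)
print(f"Tokens used: {usage.total_tokens}")
print(f"Estimated cost: ${request_cost(usage):.4f}")  # about $0.0019
```

In the real script you would pass summary.usage to request_cost instead of the stand-in. Summarizing an entire Wikipedia page cost a fraction of a cent here.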
Let’s focus on extracting our actual summary for now as this is the text we’re interested in. We can see from the object above that the actual answer is in the object’s choices attribute, which is a list in which we need to access the first element.
The actual summary is in the message attribute of this element, which holds the summary text in its content attribute. Why is the obviously most important part of the object so hidden away and hard to get to? I honestly have no idea.
It’s like going to a huge library where the librarian insists on showing you every encyclopedia and academic journal in existence before reluctantly leading you to the one shelf with fiction books which is hidden all the way in the back of the library but first you must cross a bridge over a pit of lava and not die from the fire-breathing dragons guarding the entrance.
Anyways… let’s extract the summary and print out only the summary text instead of the whole object. Edit the code like this:
print("Asking SummaryGPT...")
summary = CLIENT.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": summarizing_instructions},
        {"role": "user", "content": text_to_summarize}
    ]
)
extracted_summary = summary.choices[0].message.content
print(extracted_summary)
Notice that this follows the structure we just described in this object to get to the good stuff.
Now we get just our summary in a nice and readable format:
Porto, also known as Oporto, is the second-largest city in Portugal, situated in the northern part along the Douro River estuary. The city has a rich historical background, with roots tracing back to around 275 BCE, and it was designated as a UNESCO World Heritage Site in 1996 for its historic center. Porto is notable for its global city ranking, vibrant urban atmosphere, and significant contributions to the production of Port wine, which originally derives its name from the city. With a population of approximately 248,769 within its smaller municipality and about 1.8 million in the metropolitan area, Porto is an important cultural and economic hub, recognized as "A Cidade Invicta" or "The Unconquered City."
The city boasts a Mediterranean climate, characterized by warm, dry summers and mild, rainy winters, making it a desirable destination for tourism, which has significantly increased in recent years. Porto is famous for its architectural landmarks, including the Dom Luís I Bridge, Porto Cathedral, and vibrant neighborhoods like Ribeira. The city has a diverse cultural scene, marked by events such as the São João Festival and Queima das Fitas, and is home to renowned institutions like the University of Porto and several international schools. As a key player in sports, especially football, Porto hosts top-tier teams like FC Porto, and has made its mark in various other athletic disciplines.
That’s much better! We just have our summary. Since it is in our code inside a variable, we can now do anything we want with it. We could save it to a file, send it to a database, or do whatever we want with it.
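As one example of what you could do with the variable, here is a small sketch that writes the summary to a text file. The save_summary function and the summary.txt filename are hypothetical choices for this illustration.

```python
from pathlib import Path


def save_summary(summary_text: str, path: str = "summary.txt") -> Path:
    """Write the summary to a UTF-8 text file and return the file's path."""
    output_path = Path(path)
    output_path.write_text(summary_text, encoding="utf-8")
    return output_path
```

In our script this would be called as save_summary(extracted_summary) right after the print statement. Note that it overwrites the file if it already exists.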
You now have the ability to ask ChatGPT for help with anything programmatically, by simply changing the system message and user message in the messages list. This is a very powerful tool that can be used for a wide variety of tasks.
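To reuse the same call for different tasks, one option is to wrap it in a function that takes the instructions and the input as arguments. The sketch below is one possible way to do that, not the only one. The ask function is a hypothetical helper, and the fake client is only a stand-in so the example runs without an API key or network access.

```python
from types import SimpleNamespace


def ask(client, instructions: str, user_text: str, model: str = "gpt-4o-mini") -> str:
    """Send one system + user message pair and return only the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": user_text},
        ],
    )
    return response.choices[0].message.content


# Stand-in client that mimics the response shape; in the real script, pass CLIENT.
def _fake_create(model, messages):
    reply = f"[{model}] got {len(messages)} messages"
    message = SimpleNamespace(content=reply)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

print(ask(fake_client, "Translate the input to French.", "Good morning!"))
# [gpt-4o-mini] got 2 messages
```

With the real CLIENT, a call like ask(CLIENT, summarizing_instructions, text_to_summarize) would return the summary text directly.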
If you enjoyed this topic and would like to learn more, check out the Python OpenAI API Mastery: Function Calls and Embeddings course on the Finxter Academy!
In the course, you’ll begin with a basic function call to make ChatGPT perform tasks like joking on random topics, then delve into utility functions to understand its underlying workings. You’ll integrate real-world data like live weather, learning function calling and parallel function calling in the process and interact with databases using SQL. You will also learn to leverage embeddings for both finding similar content and performing precise sentiment analysis, and you will learn to build a fully-fledged assistant using the OpenAI assistants API.
That’s it for part 1 of this broad introduction to AI tutorial series. I’ll see you in part 2 where we’ll cover the other APIs available, covering image generation, text-to-speech, and speech-to-text transcription. See you there!