OpenAI Fine-Tuning (1/4): Fine-tuning Our Own ChatGPT Model

Welcome to this course on ChatGPT fine-tuning. My name is Dirk van Meerveld, and together we will be taking a look at fine-tuning ChatGPT to make our own custom versions of the popular LLM. Before we dive in, we’ll take a look at what fine-tuning entails and when we should use it.

Why fine-tuning?

First of all, let’s take a moment to consider how we usually get ChatGPT to do what we want. We tell it, using a prompt message, right? Basically, we tell it what we want it to do, and we probably give it some examples as well if the task has any complexity to it. This is called “few-shot learning”, as we give a couple of demonstrations of how to perform the task.

So usually prompt engineering will get ChatGPT to do whatever we want, and there’s not really any problem with that, right? But what if the problem we want ChatGPT to solve is a bit more complex and would require hundreds of reasonably sized examples? There are several use cases for this, but we’ll start with an example on brand identity.

Say that your company brand has a certain style and tone of communication, which is different from the default ChatGPT way of speaking. You are probably not a famous person, so you can’t just query GPT to write “In the style of Elon Musk” or “In the style of Barack Obama”. ChatGPT doesn’t know who you are!

So what do you do? Use the very largest GPT-4-turbo model with the largest context limit and just send 100 pages full of examples of your brand’s style of communication in the prompt setup message every single time? This will not work very well for several reasons:

  • Cost -> Sending that much information with every GPT call, especially when using the most expensive GPT-4 model, will be very expensive if you scale it up.
  • Latency -> Your calls will not only be expensive but also comparatively slow, as the amount of data sent and processed is very large.
  • Reliability -> The normal model will have trouble learning an entire way of speaking, including the tone and nuances, from just a single system prompt setup message, even if it is very long. The input text is just a prompt, and this style of speaking will not get ‘embedded into the neurons’ of the model, so to speak.

This is where fine-tuning comes to the rescue. Basically, OpenAI will give us a vanilla GPT model in a separate container. We then get to provide extra training data of our own, and OpenAI will further train the GPT model on the data we provide, creating our own custom fine-tuned version of ChatGPT.

We feed it a large number of examples of our brand’s style of communication. This way we won’t have to send a million tokens in the context window every time and can just query our custom-trained model, which has our brand’s style of communication embedded into its very neurons!

I think you can see how this would be extremely helpful in many areas. A content creator may want some help writing initial drafts or ideas for new work but needs them to adhere to their own writing style. A large brand may want to employ customer service bots, as they all do these days, but needs them to adhere to the brand’s style and rules for communication, just like the human employees. Anyone with any kind of writing or speaking style may want some assistance from ChatGPT, but in their own style and form of speech.

Let’s clone Chris!

To explore this idea and show you how to implement this for yourself or your clients using example data of their writing, we will be using an example most of you will be familiar with: Chris! You’ll probably recognize Chris’ writing from the Finxter emails, as you’ve likely received a fair amount of them if you’re taking this Finxter course. Today we are going to make ChrisGPT, a model fine-tuned on Chris’ writing style.

I’ve chosen Chris as an example for several reasons:

  1. Most of you are probably familiar with him from the emails.
  2. He’s not so famous that we could just query GPT to write “In the style of Chris” and get a good result. This makes it into a realistic example of doing this for yourself or a client.
  3. He has a distinct style of writing that we will be able to differentiate from the default ChatGPT style.
  4. I have loads of data for him in the form of the Finxter emails on my email account.
  5. He has agreed to let us do this (thanks Chris! 😉).

Of course, Finxter emails from my Gmail inbox aren’t perfectly clean ideal-world data examples, but they will be good enough to give you an idea of how fine-tuning works and how you can use it to implement a brand identity or writing style into ChatGPT with the proper data. They will also add some fun to our exploration and learning experience!

Now I could just give you the perfectly formatted data so that you can send it straight to the OpenAI API for fine-tuning and we could have the whole thing over and done with in a single video. But if I did that I’d be doing you a huge disservice. The fine-tuning API itself is very simple and doesn’t take much time to learn at all, but it’s only a small part of the picture.

The real challenge is getting your input data, and getting it right. There is a real process to gathering the right training data, making sure you formulate it well and avoid the pitfalls, and then turning it all into properly formatted examples. The fine-tuning part itself is the easy part. We will look at the nitty-gritty details of getting the data right, as our GPT will only ever be as good as our training data.

Garbage in = Garbage out.

So what kind of data do we need?

Well, basically we need to show ChatGPT what we want by giving it examples. These examples need to be in the same format as normal ChatGPT chat completions, which makes total sense, as we’re trying to teach it what kind of chat completions we want it to create, right? So the basic idea is something like this:

[
  {
    "messages": [
      {
        "role": "system",
        "content": "You are a Harry Potter wizard spell creator who comes up with new magical spells! Alohamora!"
      },
      { "role": "user", "content": "How can I bake a cookie?" },
      {
        "role": "assistant",
        "content": "Confectio Deliciosum! This spell will help you bake cookies in no time!"
      }
    ]
  },
  {
    "messages": [
      {
        "role": "system",
        "content": "You are a Harry Potter wizard spell creator who comes up with new magical spells! Alohamora!"
      },
      { "role": "user", "content": "I need to fend off a shark!" },
      {
        "role": "assistant",
        "content": "Squalus Repellum! This spell will help you fend off a shark!"
      }
    ]
  },
  {
    "messages": [
      {
        "role": "system",
        "content": "You are a Harry Potter wizard spell creator who comes up with new magical spells! Alohamora!"
      },
      { "role": "user", "content": "I need to lose some weight fast." },
      {
        "role": "assistant",
        "content": "Adiposum Reductum! Your belly fat will just fly out the window!"
      }
    ]
  }
]

Notice that we have a list of objects, which look like the messages list we would typically work with. We are basically simulating conversations with ChatGPT that never occurred, so we can use these as training data. For each example we have a system message, then we have the user query in the second slot, and the assistant’s response in the third slot.

Now notice that the entire system message is included and is exactly the same for all three examples. You might feel that this is wasteful, as we’re repeating the same thing over and over again, but remember that we’re trying to sort of hard-wire these instructions into ChatGPT, so this is a good thing.

If you try to leave out the system messages to save tokens, the model will have to learn entirely through demonstration of the outcome, and your training may actually take longer. We recommend just leaving that system message in there for every example, and making sure it’s a good one, because it is going to get baked into the model!

The second entry, the user query, is obviously going to be different each time. Make sure you include examples that match the way you intend to use your final fine-tuned model. Especially make sure you include any edge cases and harder-than-usual examples; the training phase is the time to show the model what it will be up against.

The third entry, the assistant’s response, is the exact perfect answer that we want ChatGPT to give for this query. ChatGPT will be trained to learn: given this system message and this query, this is the response I should give.

Note that the example above is of course useless, as we could easily achieve this output without any fine-tuning from basically any LLM in existence; it is just an example of the training data structure. In reality, we need at least 10 examples for fine-tuning, but you should probably aim for at the very least 50 well-crafted examples, if not more.
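Because formatting mistakes in training data are easy to make and expensive to discover mid-training-run, a small sanity check up front can save you a failed run. Here’s a minimal sketch (validate_examples is my own helper name, and it checks only the simple system/user/assistant triple shape we use in this tutorial; the API itself also accepts longer conversations):

```python
def validate_examples(examples: list) -> None:
    """Sanity-check a list of training examples in the shape shown above."""
    if len(examples) < 10:
        raise ValueError("OpenAI requires at least 10 training examples")
    for index, example in enumerate(examples):
        roles = [message["role"] for message in example["messages"]]
        # This tutorial uses one system/user/assistant triple per example
        if roles != ["system", "user", "assistant"]:
            raise ValueError(f"Example {index} has unexpected roles: {roles}")
```

Run it on your example list before uploading anything; a ValueError here is much cheaper than a rejected or wasted fine-tuning job.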

Also, the final format needs to be in JSONL format, with every object flattened down onto a single very long line, which looks kind of like this:

{"messages": [{system...}, {user...}, {assistant...}]}
{"messages": [{system...}, {user...}, {assistant...}]}
{"messages": [{system...}, {user...}, {assistant...}]}

But this is only a minor and easy conversion, so we’ll get back to that later.
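If you’re curious what that conversion looks like in code, here’s a minimal sketch (write_jsonl is a name of my own choosing; we’ll handle the real conversion later in the course). Because json.dumps never emits raw newlines, each example naturally lands on its own single line:

```python
import json


def write_jsonl(examples: list, path: str) -> None:
    """Write each training example as one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as file:
        for example in examples:
            # json.dumps produces a single-line string, so one example = one line
            file.write(json.dumps(example) + "\n")
```
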

As for length, each training example is limited to the context length of the model. So every single line of the JSONL data can be up to the context limit, which for gpt-3.5-turbo-1106 is 16,385 tokens. As this is a very generous limit, we’re not going to worry about it too much for our use cases here.
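If you ever want to sanity-check an example against that limit, even a rough character count gets you in the ballpark. A minimal sketch (the 4-characters-per-token ratio is a common rule of thumb for English text, not an exact figure; OpenAI’s tiktoken library gives exact counts if you need them):

```python
def rough_token_estimate(example: dict) -> int:
    """Very rough token estimate: roughly 4 characters per token for English."""
    text = "".join(message["content"] for message in example["messages"])
    return len(text) // 4


example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi there, how can I help?"},
    ]
}

print(rough_token_estimate(example))  # tiny example, nowhere near 16,385
```
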

Now we’ll be using gpt-3.5-turbo-1106 here, as it is the newest version of the model with fine-tuning support so far. This is probably a good thing, though, as fine-tuning GPT-4 would be a lot more expensive, and since we’ll be showing the model exactly what we want it to do anyway, we won’t really need GPT-4’s extra capabilities.

The data

So, I’ve gone through my email account and extracted a whole bunch of emails I have received from Chris at Finxter, the last 200 to be precise. I have done this very first step for you, as I obviously cannot give you all access to my personal email inbox! But I will still cover roughly the steps taken:

  1. I’ve applied a label to all the emails I wanted to extract from my inbox, so I could easily find them.
  2. I went to Google Takeout and requested a download of all my emails with that label.
  3. I received a file with all my emails in MBOX format.
  4. I wrote a Python script, mbox_to_json_decode.py, which takes the emails, decodes them, takes all my personal unsubscribe links and other personal data out, and then writes them to a JSON file.

As this MBOX to JSON conversion is hyper-specific, and the MBOX file contains some of my personal data, this is the only step along the way we will skip. The chances that you will also have to convert MBOX files to JSON are very slim, and I want to keep this course relevant. If you do need information on MBOX to JSON conversion, I will add the mbox_to_json_decode script to the GitHub repository so you can check it out.

So now we are left with Finx_dataset.json, which will be our entry point for this tutorial. Normally I would include this file in the GitHub repository, but as it has a large amount of the Finxter email data, we have elected to not include it in the repository. Instead, the file will be available for download from the course page in the Finxter Academy. If you haven’t downloaded it yet, please do so now.

Then go ahead and create a base project folder to use for this course, I’ve named mine Finx_Fine_Tuning, and then create a folder named data inside of it. Then move the Finx_dataset.json file into the data folder to create the following structure:

📁Finx_Fine_Tuning
    📁data
        📄Finx_dataset.json

Create a venv in the root project folder

Ok, just a small detour before we continue with our project!

We’ll be running this project inside a virtual environment. A virtual environment is a self-contained directory that will allow us to install specific versions of packages inside the virtual environment without affecting the global Python installation.

We will use this as I will be using specific versions for the libraries we install as we go along, and I want to make sure that you have the exact same experience as I do. The virtual environment will make it easy for you to install my exact versions without worrying about affecting any of your other projects.

To create a new virtual environment we’ll use a tool called pipenv. If you don’t have pipenv installed, you can install it using pip, which is Python’s package manager. Run the following command in your terminal:

pip install pipenv

Make sure the terminal is inside your root project folder, e.g. /c/Coding_Vault/Finx_Fine_Tuning, and then run the following command to create a new virtual environment:

pipenv shell

This will create a new virtual environment and also a Pipfile in your project directory. Any packages you install using pipenv install will be added to the Pipfile.

To generate a Pipfile.lock, which is used to produce deterministic builds, run:

pipenv lock

This will create a Pipfile.lock in your project directory, which contains the exact version of each dependency to ensure that future installs are able to replicate the same environment.

We don’t need to install a library first to create a Pipfile.lock. From now on, when we install a library in this virtual environment with pipenv install library_name, it will be added to the Pipfile and Pipfile.lock.

Back to our data

Back to where we were. Our root project folder should now look like this:

📁Finx_Fine_Tuning
    📁data
        📄Finx_dataset.json
    📄Pipfile
    📄Pipfile.lock

Let’s go ahead and take a look at the Finx_dataset.json file we downloaded earlier to see what kind of raw data we are working with here:

[
  {
    "subject": "5 Proxies to Investing in OpenAI",
    "body": "<html>\n<head>\n\t<title></title>\n</head>\n<body data-gr-ext-installed=\"\" data-new-gr-c-s-check-loaded=\"8.909.0\" data-new-gr-c-s-loaded=\"8.909.0\" style=\"font-family:Arial;font-size:16px;\">\n<p style=\"text-align: center;\"><a href=\"{Link}\" target=\"_blank\"><img alt=\"\" height=\"39\" src=\"{Link}\" width=\"153\" /></a></p>\n\n<p>\u00a0</p>\n\n<p>Hey {User},</p>\n\n<p>To profit from change, we need to increase ownership of disruptive trends. Today's article covers a question that many Finxters frequently ask:</p>\n\n<p>\ud83e\udeb4 [<strong>Blog</strong>] <a href=\"{Link}\">How to Invest in OpenAI?</a> \ud83c\udf33</p>\n\n<p>While it's not possible to invest in OpenAI directly, the blog discusses five alternatives:</p>\n\n<ul>\n\t<li><strong>MSFT </strong>(49% stake in OpenAI),</li>\n\t<li><strong>NVIDIA </strong>(makes more revenue from OpenAI than any other company),</li>\n\t<li><strong>ARKVX </strong>(<em>Anthropic!</em>),</li>\n\t<li><strong>META </strong>(<em>Llama 2!</em>), and</li>\n\t<li><strong>TSLA </strong>(Optimus!).</li>\n</ul>\n\n<p>Check it out if you're interested in any of those! No financial advice. \ud83d\ude0a</p>\n\n<p>Be on the right side of change. \ud83d\ude80<br />\nChris</p>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<p><strong>\u2665\ufe0f Community Corner: Featured Resources</strong></p>\n\n<ul>\n\t<li><a href=\"{Link}\">TradeUnafraid</a> is a trading platform owned and operated by Finxter community member Lee.</li>\n</ul>\n\n<p>Do you want to feature your own startup, YouTube channel, blog, or website as a <a href=\"{Link}\">Finxter premium member</a>? 
Hit reply and let me know!</p>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<div style=\"background:#eeeeee;border:1px solid #fcfcfc;padding:20px 20px;\">\n<p><span><strong><a href=\"{Link}\">How are we doing?</a></strong><br />\n<a href=\"{Link}\">\u2b50</a><br />\n<a href=\"{Link}\">\u2b50\u2b50</a><br />\n<a href=\"{Link}\">\u2b50\u2b50\u2b50</a><br />\n<br />\nTo make sure you keep getting these emails, please add <em>chris@finxter.com</em> to your address book.<br />\n<br />\nI'd love to hear your feedback so that I can improve this free email course over time. Please reply to this email and share everything on your mind!<br />\n<br />\n<strong>If you find the Finxter Email Academy useful, please invite a friend or colleague! \u2764</strong></span></p>\n\n<p><br />\n<span>Here's the subscription link you can share:<br />\n<a href=\"{Link}\" target=\"_blank\">https://blog.finxter.com/subscribe/</a><br />\n<br />\nDownload the Ultimate Python Cheat Sheet here (direct PDF download): \ud83d\udc0d</span></p>\n\n<p><span><strong><a href=\"{Link}\" target=\"_blank\">The Ultimate Python Cheat Sheet</a></strong><br />\n<br />\nNot very motivated to learn today? Consider this:<br />\n<strong><em>\"Knowledge compounds!\"</em></strong> -- Warren Buffett<br />\n<br />\nConsequently, if you improve your skills by 1% every day, you'll 36x your programming skills within a year!</span></p>\n</div>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<p><br />\n<em><strong><span>Finxter, Dr. Christian Mayer</span></strong><br />\n<span>{Address}., {City}, {Country}</span></em></p>\n\n<p><span>Want out of the loop? I'm so sad to see you go. \ud83d\ude22 How could we have done better? </span><br />\n<span>To help future Finxters, please hit reply and tell us! \ud83e\udd17</span></p>\n<a href=\"{Link}\" >Unsubscribe here</a>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n</body>\n</html>\n<img src=\"{Link}\" alt=\"\" style=\"width:1px;height:1px;\"/>\n"
  },
  {
    "subject": "Tech Deflation vs Inflation",
    "body": "Email no2..."
  }

As you can see, we have a list of objects, each with a subject and body key. The body key contains the raw HTML of the email, which we will need to clean up a bit before using it for our purposes. The only preprocessing I’ve done in the MBOX to JSON conversion is replacing links and personal data with generic {Link} and {User} placeholders.

If you’re wondering what the \uxxxx characters are, like the sequence \ud83d\udc0d, they are Unicode escape sequences that represent characters in the Unicode standard. Specifically, this sequence represents the “snake” emoji (🐍). You will see these quite a lot as Chris is of course famous for his creative emoji usage!

The full list has about 200 of these email objects, in non-chronological order. If you scroll through the data, you will see some noise in there, which will be reflected in our final product. For the purposes of this tutorial, it will be good enough. For professional use, you’d want to spend more time here and clean up the data more thoroughly.
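By the way, loading this file in Python is straightforward. Here is a quick sketch you could use to peek at the data (load_email_dataset is just a name I chose, and the existence check only keeps the snippet from crashing if you run it from a different folder):

```python
import json
from pathlib import Path


def load_email_dataset(path) -> list:
    """Load the list of {"subject": ..., "body": ...} email objects."""
    with open(path, encoding="utf-8") as file:
        return json.load(file)


if __name__ == "__main__":
    dataset_path = Path("data/Finx_dataset.json")
    if dataset_path.exists():
        emails = load_email_dataset(dataset_path)
        print(f"Loaded {len(emails)} emails; first subject: {emails[0]['subject']}")
```
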

Preparing our data

We now have our basic data, and we know what format we need for the training data, like the Harry Potter magical spells example we showed. Now let’s start wrangling the data into the format we need. As with all complex coding tasks, let’s take it one step at a time and build our solution in small, reusable parts.

We’ll start with a utility to convert the email above into a simpler, more readable format. Instead of working with raw HTML full of unreadable escape sequences and tags all over the place, let’s write a utility function that takes an HTML email as input and returns a simple, readable markdown version for us to work with instead.

So go ahead and create a new folder named utils in the root project folder, and then create a new file named html_email.py inside the utils folder:

📁Finx_Fine_Tuning
    📁data
        📄Finx_dataset.json
    📁utils
        📄html_email.py
    📄Pipfile
    📄Pipfile.lock

Now before we get started on the html_email.py file, we’ll need to install a library called html2text which will help us convert the HTML emails to markdown. Someone has already written a library to do this for us, so we don’t have to write it ourselves. Always use existing solutions when they exist to speed up your development cycle!

To install a specific version of a package in our Pipenv environment, you can use the pipenv install command followed by the package name and the version number. Run the following command:

pipenv install html2text==2020.1.16

This command will add html2text to our Pipfile under the [packages] section with the specified version. It will also update your Pipfile.lock to include the exact version of html2text and its dependencies.
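For reference, the [packages] section of your Pipfile should now contain an entry along these lines (the rest of the file may look slightly different on your machine):

```toml
[packages]
html2text = "==2020.1.16"
```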

Now let’s go ahead and open the html_email.py file and add the following code:

import html2text

def html_to_markdown(html: str) -> str:
    # Round-trip through UTF-16 to turn surrogate escape pairs back into real characters
    html = html.encode("utf-16", "surrogatepass").decode("utf-16")

    html_to_text_converter = html2text.HTML2Text()
    html_to_text_converter.ignore_links = False  # keep links in the markdown output
    return html_to_text_converter.handle(html)

We first import the library we have just installed. Then we define a function html_to_markdown which takes an HTML string as input and returns a markdown string.

We then take the html variable, which is a string, and convert any Unicode escape sequences in it back into their corresponding characters. The "surrogatepass" error handler instructs Python to properly handle any surrogate characters in the string, so that the \ud83d\ude80 patterns we talked about earlier are turned into their corresponding emoji characters after this line runs (in this case, the rocket emoji 🚀).

This works because the .encode method converts the string to bytes using UTF-16 encoding, which includes converting Unicode escape sequences to their actual Unicode characters. Then, the .decode method converts those bytes back into a string, preserving the Unicode characters. So we basically did a round-trip conversion from Unicode escape sequences to actual Unicode characters.
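To make this concrete, here is a tiny standalone demo of the round trip (not part of our project code):

```python
# Two lone surrogates, exactly as the \ud83d\ude80 escape sequence appears in the raw data
raw = "\ud83d\ude80"

# UTF-16 round trip: "surrogatepass" lets the surrogates through on encode,
# and the decode reads the resulting bytes back as one real character
fixed = raw.encode("utf-16", "surrogatepass").decode("utf-16")

print(fixed)  # 🚀
```
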

We then create an instance of the HTML2Text class and set the ignore_links attribute to False to include links in the output. We then call the handle method of the HTML2Text instance and pass the HTML string as an argument to convert it to markdown, and simply return the result.

Let’s test it out

Let’s go ahead and give it a test run. Above the html_to_markdown function, add the following variable holding a test email string:

test_email = '<html>\n<head>\n\t<title></title>\n</head>\n<body data-gr-ext-installed="" data-new-gr-c-s-check-loaded="8.909.0" data-new-gr-c-s-loaded="8.909.0" style="font-family:Arial;font-size:16px;">\n<p style="text-align: center;"><a href="{Link}" target="_blank"><img alt="" height="39" src="{Link}" width="153" /></a></p>\n\n<p>\u00a0</p>\n\n<p>Hey {User},</p>\n\n<p>To profit from change, we need to increase ownership of disruptive trends. Today\'s article covers a question that many Finxters frequently ask:</p>\n\n<p>\ud83e\udeb4 [<strong>Blog</strong>] <a href="{Link}">How to Invest in OpenAI?</a> \ud83c\udf33</p>\n\n<p>While it\'s not possible to invest in OpenAI directly, the blog discusses five alternatives:</p>\n\n<ul>\n\t<li><strong>MSFT </strong>(49% stake in OpenAI),</li>\n\t<li><strong>NVIDIA </strong>(makes more revenue from OpenAI than any other company),</li>\n\t<li><strong>ARKVX </strong>(<em>Anthropic!</em>),</li>\n\t<li><strong>META </strong>(<em>Llama 2!</em>), and</li>\n\t<li><strong>TSLA </strong>(Optimus!).</li>\n</ul>\n\n<p>Check it out if you\'re interested in any of those! No financial advice. \ud83d\ude0a</p>\n\n<p>Be on the right side of change. \ud83d\ude80<br />\nChris</p>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<p><strong>\u2665\ufe0f Community Corner: Featured Resources</strong></p>\n\n<ul>\n\t<li><a href="{Link}">TradeUnafraid</a> is a trading platform owned and operated by Finxter community member Lee.</li>\n</ul>\n\n<p>Do you want to feature your own startup, YouTube channel, blog, or website as a <a href="{Link}">Finxter premium member</a>? 
Hit reply and let me know!</p>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<div style="background:#eeeeee;border:1px solid #fcfcfc;padding:20px 20px;">\n<p><span><strong><a href="{Link}">How are we doing?</a></strong><br />\n<a href="{Link}">\u2b50</a><br />\n<a href="{Link}">\u2b50\u2b50</a><br />\n<a href="{Link}">\u2b50\u2b50\u2b50</a><br />\n<br />\nTo make sure you keep getting these emails, please add <em>chris@finxter.com</em> to your address book.<br />\n<br />\nI\'d love to hear your feedback so that I can improve this free email course over time. Please reply to this email and share everything on your mind!<br />\n<br />\n<strong>If you find the Finxter Email Academy useful, please invite a friend or colleague! \u2764</strong></span></p>\n\n<p><br />\n<span>Here\'s the subscription link you can share:<br />\n<a href="{Link}" target="_blank">https://blog.finxter.com/subscribe/</a><br />\n<br />\nDownload the Ultimate Python Cheat Sheet here (direct PDF download): \ud83d\udc0d</span></p>\n\n<p><span><strong><a href="{Link}" target="_blank">The Ultimate Python Cheat Sheet</a></strong><br />\n<br />\nNot very motivated to learn today? Consider this:<br />\n<strong><em>"Knowledge compounds!"</em></strong> -- Warren Buffett<br />\n<br />\nConsequently, if you improve your skills by 1% every day, you\'ll 36x your programming skills within a year!</span></p>\n</div>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<p><br />\n<em><strong><span>Finxter, Dr. Christian Mayer</span></strong><br />\n<span>{Address}., {City}, {Country}</span></em></p>\n\n<p><span>Want out of the loop? I\'m so sad to see you go. \ud83d\ude22 How could we have done better? </span><br />\n<span>To help future Finxters, please hit reply and tell us! \ud83e\udd17</span></p>\n<a href="{Link}" >Unsubscribe here</a>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n\n<p>\u00a0</p>\n</body>\n</html>\n<img src="{Link}" alt="" style="width:1px;height:1px;"/>\n'

Just copy it from the written version of the tutorial, and make sure you insert it above the function we wrote:

import html2text

test_email = ...

def html_to_markdown(html: str) -> str:
    ...

Now, below the html_to_markdown function, add the following code to test the function:

if __name__ == "__main__":
    markdown_content = html_to_markdown(test_email)

    with open("test.md", "w", encoding="utf-8") as file:
        file.write(markdown_content)

This code will run the html_to_markdown function with the test_email string as input, and then write the result to a file named test.md. The if __name__ == "__main__": line ensures that the code inside the block only runs when the script is executed directly, and not when we import the html_to_markdown function into another script later on.

💡 Python Top-tip 💡
In Python, when a script is run, a special built-in variable called __name__ is set to "__main__". However, if a module is imported, __name__ is set to the module's name instead. By checking if __name__ == "__main__":, the script can determine whether it's being run directly or being imported as a module.

This allows for a flexible way to organize your code. You can put code that tests the functionality of the module or demonstrates how to use the module under this if statement. When the module is imported, this code won't run, but when the script is run directly, the code will execute. This is particularly useful for unit testing or for scripts that can be used both as utility modules and as standalone programs.

Now go ahead and run the script and a new file named test.md will be created. If you check it out it will have the markdown version of the email we provided as input.

[![]({Link})]({Link})

Hey {User},

To profit from change, we need to increase ownership of disruptive trends.
Today's article covers a question that many Finxters frequently ask:

🪴 [ **Blog** ] [How to Invest in OpenAI?]({Link}) 🌳

While it's not possible to invest in OpenAI directly, the blog discusses five
alternatives:

  * **MSFT** (49% stake in OpenAI),
  * **NVIDIA** (makes more revenue from OpenAI than any other company),
  * **ARKVX** ( _Anthropic!_ ),
  * **META** ( _Llama 2!_ ), and
  * **TSLA** (Optimus!).

Check it out if you're interested in any of those! No financial advice. 😊

Be on the right side of change. 🚀
Chris

**♥️ Community Corner: Featured Resources**

  * [TradeUnafraid]({Link}) is a trading platform owned and operated by Finxter community member Lee.

Do you want to feature your own startup, YouTube channel, blog, or website as
a [Finxter premium member]({Link})? Hit reply and let me know!

**[How are we doing?]({Link})**
[⭐]({Link})
[⭐⭐]({Link})
[⭐⭐⭐]({Link})

If we render this properly as markdown it will result in the following look:

###########################START##########################

Hey {User},

To profit from change, we need to increase ownership of disruptive trends.
Today’s article covers a question that many Finxters frequently ask:

🪴 [ Blog ] How to Invest in OpenAI? 🌳

While it’s not possible to invest in OpenAI directly, the blog discusses five
alternatives:

  • MSFT (49% stake in OpenAI),
  • NVIDIA (makes more revenue from OpenAI than any other company),
  • ARKVX ( Anthropic! ),
  • META ( Llama 2! ), and
  • TSLA (Optimus!).

Check it out if you’re interested in any of those! No financial advice. 😊

Be on the right side of change. 🚀
Chris

♥️ Community Corner: Featured Resources

  • TradeUnafraid is a trading platform owned and operated by Finxter community member Lee.

Do you want to feature your own startup, YouTube channel, blog, or website as
a Finxter premium member? Hit reply and let me know!

How are we doing?

⭐
⭐⭐
⭐⭐⭐

###########################END##########################

This is good enough for our purposes. We will be using this markdown version of the emails as our training data for the fine-tuning process. We could clean things up even further, but for this tutorial, it will do just fine.

Now that we have our HTML to Markdown function prepared, we’ll continue in part 2, where we will generate the actual training data for our fine-tuning of ChrisGPT. I’ll see you in part 2!