Hugging Face Course (4/6) – Local Image Generation

👉 Back to the Full Course on local models and Hugging Face (+Videos)

Welcome back to part 4! This time we will be looking at non-LLM models that we can run locally on our own machine. We’ll be stepping away from Ollama and switching to the Hugging Face machine-learning community from now on.

I’ve deliberately chosen a variety of different high-quality models that require different installation methods and have different quirks and pitfalls to them. My hope is that after seeing the various ways to install and run these models you will be able to tackle installing any other model you want to run in the future on your own.

Hugging Face is a leading platform and community for open-source natural language processing (NLP) and machine learning models. It has a vast library of pre-trained models for NLP tasks like text generation, translation, summarization, question answering, etc.

Researchers and developers can share, discover, and deploy machine learning models and the website provides model cards with details on performance, training data, intended use cases, and licenses.

Hugging Face has a vibrant open-source community contributing code, Hugging Face Spaces (model hosting), Datasets, Tokenizers, and more. With its large model library and active community, Hugging Face has become a central hub for the open-source AI and machine learning community.

So without further ado, let’s head over to the Hugging Face website and see what we can do with it:

Before we head over to the Models section, which is where our focus will be for the rest of these tutorials, I just want to point out that Hugging Face also has an excellent section for Datasets:

You can find all sorts of useful datasets in many categories to use for training, fine-tuning, or testing models. Finally, if we move over one more tab to the right we have Hugging Face Spaces, where you can see all sorts of cool live demos for AI models.

Finding a model

At any point, you may need to sign up for a free account to access some of the features on the website. I recommend just creating a free account before continuing with this tutorial. When you’re done, click on the Models tab, and let’s get started!

We have so many models available, from Computer Vision to LLMs to audio-related models! First click on the Text-to-Image option on the left side. Sort the results by Most downloads and you should see a model named stabilityai/sdxl-turbo somewhere near the top.

SDXL Turbo is made by the stability.ai team and is based on Stable Diffusion XL. It is a pretty fast model that we can deploy locally to generate images at will! Click on the model to open the model card with more information:

Here you can see all the details for this model. It has a whopping 2.3 million downloads in the last month alone! If you scroll down you will see instructions on how to install this model and usually some example code:

These are the base instructions I will be using to figure out how we can run the model locally, but in some cases, we’ll also have to search the GitHub repository for a certain model to find more detailed instructions.

Finally, if we scroll down even further we can see some of the limitations of this model, and also that we’re allowed to use this for commercial purposes!

Jupyter Notebooks

For this part of the tutorial, we’ll be using Jupyter Notebooks, as it’s more practical to code interactively and keep our Python kernel running while we add more code. It will also display our generated images very nicely which is the main benefit here, as we cannot display image output in the console.

Install the Jupyter extension for VS Code if you don’t have it already, by going to the Extensions tab and searching for Jupyter Notebooks. It’s the extension with an insanely high number of downloads (Jupyter by Microsoft).

I won’t be going into a detailed explanation of Jupyter Notebooks here, but if you’re not quite familiar with them, don’t worry. Just follow along step-by-step with the video version of this tutorial and you’ll get a feel for how it works.

First, go ahead and create a new file named image_gen.ipynb in the root folder of your project:

📁Local_Models
    📁test_files
    📄chat_app.py
    📄image_gen.ipynb    ✨New file
    📄local_chat.py
    📄local_chat_memory.py
    📄memory.py
    📄model_preloader.py
    📄Pipfile
    📄Pipfile.lock

Installing PyTorch

Now we’ll need to install PyTorch. PyTorch is an open-source machine learning library based on the Torch library, used for building and training neural networks. PyTorch supports hardware acceleration like CUDA GPUs and has a large community around it. For many models, you will need to have PyTorch installed to be able to run them.

Here is where it gets a bit tricky though, as some of you will have a CUDA GPU and some won’t. Head over to the PyTorch website and make sure you’re on the Start Locally tab:

Now scroll down until you get to the following choice menu section:

Choose the Stable build version, then pick the platform you’re on. I’ll be going for Windows and will use the Pip package manager inside VS Code as we have done so far. Obviously our language will be Python. Next up is your choice for either CUDA 11.8, CUDA 12.1, or CPU.

If you don’t have a CUDA GPU, CPU is fine. Everything we will do will still run; it will just be slower, so go ahead and choose CPU. You will get the command to run in your terminal. Make sure you change it to install inside of our virtual environment by replacing pip3 with pipenv:

#pip3 install torch torchvision torchaudio
pipenv install torch torchvision torchaudio

After running this command, skip ahead a little bit to where we get started with the Jupyter Notebook.

If you do have a CUDA GPU, we have 2 options left. Personally I would just go for the CUDA 12.1 version. There are a couple of potentially confusing points here so let’s go over them:

It may seem like you need to install CUDA yourself, but the CUDA version of the command below will actually install the correct runtime dependencies for you. This is different than actually installing the full CUDA toolkit. As we don’t need the full CUDA toolkit here, we won’t bother with all of that, but if you want more information you can find it here.
If you have a graphics card but you’re not quite sure whether it’s CUDA compatible, you can check here. If it’s on the list you should be good to go.

Ok, so now we have the install command pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121. If we just change this to pipenv instead, it’s going to get confused by the --index-url part of the install command. There is not really any clear documentation on how to use this particular combination with pipenv, so we’re just going to cheat a little bit here. (There is a solution here if you’re really interested but I don’t want to bother you too much with the details of pipenv Pipfiles as it’s quite off-topic for this tutorial.)

Even though we are inside the virtual environment, we can still use the pip3 install ... command. You may have to change this to pip instead of pip3 though, depending on your system. If we do this, pip will still install these dependencies inside of our virtual environment, so it will work fine. It’s just not the most elegant solution, as the packages will be missing from our Pipfile and Pipfile.lock files. (You can add it to the Pipfile as a comment: # run command: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121)

So go ahead and run the command in your terminal with the virtual environment still activated:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Again, use pip or pip3 depending on your system.

Testing PyTorch

When that is done running, finally open the image_gen.ipynb Jupyter notebook file we created earlier. Create a first code cell and put the following code in it:

import torch

torch.cuda.is_available()

Again, if you’re not familiar with Jupyter Notebooks, it may be helpful to follow along with the video version of the tutorial here. This code will check if we have a CUDA GPU available. If you do, it will return True, otherwise it will return False.

Go ahead and run this cell by pressing the play button to the left of it. You will probably be asked to select the Python kernel you want to use. Make sure you pick the virtual environment that has the name of your project folder in its name, just like we did in the previous parts.

You will probably get a popup message that you need to install ipykernel:

Ipykernel is the Jupyter version of the Python kernel which is what allows the session to stay alive between code cells etc. It is not in our virtual environment hence the error message. Just click install and it will take care of it for you, and then run the cell. Alternatively, you can also run pipenv install ipykernel in your terminal which will do the same thing.

Now when the first cell runs you should hopefully see the following:

Yay!! We did it! Torch and CUDA are all set. If you chose to run on CPU above without the CUDA installation this will obviously return False for you, which is absolutely fine, all the code we will use will still run for you as well. As long as you get either a True or False return that means PyTorch is successfully running.
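If you would like a little more detail than a bare True or False, a small helper along these lines can report the installed PyTorch version and, when one is available, the name of the GPU. This is only a sketch: the function name describe_torch is made up for this example and is not part of PyTorch.

```python
def describe_torch() -> str:
    """Return a one-line summary of the PyTorch install (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."

    if torch.cuda.is_available():
        # Index 0 is the first CUDA device, matching "cuda:0".
        gpu_name = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}, GPU: {gpu_name}"
    return f"PyTorch {torch.__version__}, running on CPU only."


print(describe_torch())
```

If the message says PyTorch is not installed, the notebook is most likely using a different Python environment than the one you installed into.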

💡 If you still have trouble, make sure you’re executing the Jupyter notebook in the correct virtual environment where you installed the libraries using the terminal! Click the long name ending in (Python x.xx.x) in the top right to switch your Python to the correct version.
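To double-check which Python the notebook is actually using, you could run something like the cell below. It only uses the standard library. The helper name kernel_info is hypothetical, and the in_virtualenv check assumes a standard virtual environment such as the one pipenv creates.

```python
import sys


def kernel_info() -> dict:
    """Collect basic facts about the interpreter running this cell (hypothetical helper)."""
    return {
        # Full path to the Python binary; it should point inside your virtual environment.
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        # In a virtual environment, sys.prefix differs from the base installation.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


print(kernel_info())
```

If the executable path does not contain the name of your project folder, switch the kernel and run the cell again.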

If all else fails Google is your friend for your specific error message. I try to cover as much ground as possible but it is simply impossible to cover every single combination of system hardware and software setup possible. Spending an hour or even two Googling and being stuck is sometimes an inevitable part of the software development process.

Finishing the model setup

Now that we have PyTorch🔥 installed, we have a couple more installs to make our chosen SDXL-Turbo model work. Do not despair though, these are really simple one-liner installs! Run the following in your virtual environment terminal window:

pipenv install diffusers transformers accelerate

The diffusers library deals with diffusion models for generating images, audio, and more. It also has pre-built diffusion pipelines, which are basically like pre-built chains that we can use to generate images with just a few lines of code. The transformers and accelerate libraries have related inter-dependent functionality that we will need to run the model but we will not have to call these ourselves directly.

Running SDXL-Turbo

With that out of the way let’s get back to our Jupyter notebook. In image_gen.ipynb replace the code in the first code cell like this:

# Generic code that will work on any system. See alternative version below.
from diffusers.pipelines.auto_pipeline import AutoPipelineForText2Image
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo")
pipe.to(device)

We import torch and we use the AutoPipelineForText2Image class from the diffusers library, which is a pre-built pipeline for running these types of image generation models with minimal code.

Now we need to set the device that the torch library will use to perform its calculations. We check if a CUDA GPU is available and if it is we set the device to cuda:0, otherwise we set it to cpu. This same line will work for you no matter if you chose the CUDA or CPU version of PyTorch earlier.

Finally, we create a new pipeline by using the from_pretrained method of the AutoPipelineForText2Image class. We pass in the name of the model we want to use, which is stabilityai/sdxl-turbo, the model we chose earlier on the Hugging Face website. Now that we have this pipeline we need to move it onto the same device we set for the torch library.

A quick note, I made the code above a bit more generic so it will run on any system. If you are running a good graphics card, you can make the edit shown below here, but make sure you do not use this version if you are running on CPU as it will not work:

## Do not use this code block with CPU. Use the code block above instead!
from diffusers.pipelines.auto_pipeline import AutoPipelineForText2Image
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to(device)

We set the data type to floating point 16, but this is not possible for CPU inference, hence the two different versions. Use the one appropriate for your system, or if you’re not sure just use the first version as it will work fine regardless of your system setup.
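If you would rather not maintain two versions, the choice between them could be made in code. The sketch below is one possible way to do that, not something from the model card: the names choose_load_options and load_text2image are hypothetical, and it assumes that using float16 whenever a CUDA GPU is present is what you want.

```python
def choose_load_options(cuda_available: bool) -> dict:
    """Pick a device string and a precision flag (hypothetical helper)."""
    if cuda_available:
        return {"device": "cuda:0", "half_precision": True}
    return {"device": "cpu", "half_precision": False}


def load_text2image(model_id: str = "stabilityai/sdxl-turbo"):
    """Load the pipeline with options that match the current machine."""
    # Imported here so the helper above stays usable without the heavy libraries.
    import torch
    from diffusers import AutoPipelineForText2Image

    options = choose_load_options(torch.cuda.is_available())
    # Only request the fp16 weights when a CUDA GPU is present.
    extra = {"torch_dtype": torch.float16, "variant": "fp16"} if options["half_precision"] else {}
    pipe = AutoPipelineForText2Image.from_pretrained(model_id, **extra)
    return pipe.to(options["device"])
```

With this in place, pipe = load_text2image() would stand in for either of the two cells above. Both of those cells work just as well, so treat this purely as an option.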

Go ahead and run this new cell. It may take some time to load everything and download the model the first time around, so we can just leave it running. As a side note, you may have an issue where the diffusers import has a red squiggly line underneath it. This is just a visual thing and the code will still run fine. It will probably disappear if you restart VS Code.

Now go to the next code cell and let’s code up a quick reusable function using the code example from the Hugging Face model card for the stabilityai/sdxl-turbo model as a reference.

In the new cell, write the following code:

def generate_image(prompt: str):
    image = pipe(prompt=prompt, num_inference_steps=1, guidance_scale=0.0).images[0]
    return image

The code is just taken from the example on the model card, calling the pipe that we set up. The model card from Hugging Face also explains why we set the guidance_scale to 0.0 and why a num_inference_steps of 1 is enough. (Of course we can play around with these values later!) It shows us that the image output is located at index [0] of the output object’s .images attribute, so we just follow its lead.

Make sure you run this cell so the function gets loaded into memory.

Generating images

Now we can finally generate an image! In the next cell, write the following call:

generate_image("A beautiful sunset over the ocean.")

Now give it some time to run. It may be pretty quick or take a minute or two based on how powerful your system is. When it’s done Jupyter notebooks will display the function output image automatically without needing a print statement. This is one of the nice things about Jupyter notebooks:

That is pretty impressive! 🤯 This looks pretty much like a real photograph but was generated in a couple of seconds. It didn’t even take a massively powerful server using proprietary software but was done by our humble local computer running an open-source model!

I’ve deliberately tested this on a less powerful system to try and identify possible issues you may run into:

💡 If your model crashes check in your task manager if your memory is completely full. You may still have previous models or stuff running so shutting down extraneous stuff and cleaning up your memory can help. This hogs up a reasonable amount of processing power and memory.

💡 If you tried freeing up as much memory as possible and the problem still persists, try just switching to “cpu” by setting device = "cpu" in the first cell instead, and make sure you re-run the first cell to apply the changes. This will be a bit slower but will stop crashing if your GPU is not powerful enough and the model will still run fine on just CPU power.
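When an earlier model is still held in the same notebook, releasing it before loading another one can also help. The sketch below shows one common pattern, under the assumption that you first remove your own references (for example with del pipe in a cell). The helper name free_gpu_memory is hypothetical.

```python
import gc


def free_gpu_memory() -> bool:
    """Run garbage collection and release cached CUDA memory (hypothetical helper).

    Returns True if a CUDA cache was cleared, False otherwise.
    """
    # Collect Python objects that are no longer referenced, such as a deleted pipeline.
    gc.collect()
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        # Hand cached, unused GPU memory back to the driver.
        torch.cuda.empty_cache()
        return True
    return False
```

This does not make the model itself any smaller. It only clears leftovers, so switching to CPU as described above is still the fallback if the GPU simply does not have enough memory.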

Remember our conversation with AI Master Yoda from the last part?

I still feel bad for Master Yoda telling us off for eating pizza!🍕 Let’s take some virtual revenge on him for telling us off. Create a new code cell under the sunset image and write the following code, running the cell afterwards:

generate_image("Jedi Master Yoda eating a slice of pizza in the middle of a forest.")

Many pizzas, eat, you shall, master Yoda!:

Awesome! Our virtual revenge was successful! 🍕🍕🍕 Not every generated image will be as good as the one above, sometimes you’ll have a bit of a worse one. Just give it another spin and try again. We now have a convenient and free text-to-image model on our local computer that we can run any time we want!
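The notebook only displays the images, so you may want to save the ones you like to disk. Here is a minimal sketch that builds a file name from the prompt and saves the result. It relies on the pipeline returning a standard PIL image, which has a save method. The helper names prompt_to_filename and save_image, and the choice of the test_files folder, are assumptions for this example.

```python
import re
from pathlib import Path


def prompt_to_filename(prompt: str, max_length: int = 60) -> str:
    """Turn a prompt into a safe file name (hypothetical helper)."""
    # Replace every run of characters other than a-z and 0-9 with one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return (slug[:max_length].rstrip("_") or "image") + ".png"


def save_image(image, prompt: str, folder: str = "test_files") -> Path:
    """Save a PIL image, as returned by generate_image, under a prompt-based name."""
    target = Path(folder) / prompt_to_filename(prompt)
    target.parent.mkdir(parents=True, exist_ok=True)
    image.save(target)  # PIL picks the PNG format from the file extension
    return target


print(prompt_to_filename("A beautiful sunset over the ocean."))
# prints: a_beautiful_sunset_over_the_ocean.png
```

In the notebook, save_image(generate_image(prompt), prompt) would then generate and store an image in one go. Note that running the same prompt twice overwrites the earlier file.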

Image-to-image generation

Before we move on to the next part, there is one more thing I’d like to look at though. This particular model is not just a text-to-image model but it can also do image-to-image generations, where we give it an image as input reference.

Create a new file named image_to_image.ipynb in your project root folder:

📁Local_Models
    📁test_files
    📄chat_app.py
    📄image_gen.ipynb
    📄image_to_image.ipynb    ✨New file
    📄local_chat.py
    📄local_chat_memory.py
    📄memory.py
    📄model_preloader.py
    📄Pipfile
    📄Pipfile.lock

If we look at the model card on Hugging Face again we can see specific instructions and a code sample here for image-to-image generation:

Note the specific instructions for the num_inference_steps multiplied by the strength parameter needing to be 1 or higher. Best to just stick with the default values used in the example code first, and then do your experimenting afterwards.
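To make that rule concrete, here is a small sketch of the arithmetic. It assumes, as the model card describes, that the pipeline runs int(num_inference_steps * strength) steps. The helper name effective_steps is hypothetical and is not part of diffusers.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps that would actually run (hypothetical helper)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength should be between 0.0 and 1.0")
    steps = int(num_inference_steps * strength)
    if steps < 1:
        raise ValueError(
            f"num_inference_steps * strength = {num_inference_steps * strength:.2f}, "
            "which rounds down to 0 steps; raise one of the two values"
        )
    return steps


print(effective_steps(2, 0.5))  # the values from the model card example: 1 step
print(effective_steps(4, 0.5))  # doubling the steps gives 2
```

With the example values of 2 steps and a strength of 0.5 you get exactly 1 step. Lowering either value on its own would drop below the minimum.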

Inside the image_to_image.ipynb file, create a new code cell and start with the following code, choosing either the generic version here or the CUDA version directly below:

# Generic version. It looks the same but notice the pipeline is different than the previous one, now using the AutoPipelineForImage2Image class instead of Text2Image.

from diffusers.pipelines.auto_pipeline import AutoPipelineForImage2Image
from diffusers.utils.loading_utils import load_image
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sdxl-turbo")
pipe.to(device)

## GPU version, do not use with CPU setup.

from diffusers.pipelines.auto_pipeline import AutoPipelineForImage2Image
from diffusers.utils.loading_utils import load_image
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pipe = AutoPipelineForImage2Image.from_pretrained("stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16")
pipe.to(device)

This code is extremely similar to last time except we are importing and using a different pipeline this time. We also use a utility function named load_image from the diffusers library to load an image from a file in the format the model expects. This is the only real difference in the code.

Make sure you run the cell to load the model and pipeline into memory.

Loading the input image

Now we’ll need a seed image to give to the model. Download the image below and save it in the test_files folder in your project root under the name chicken.jpg:

📁Local_Models
    📁test_files
        🖼️chicken.jpg    ✨New file
        🖼️output1.jpg
        🖼️output2.jpg
    📄chat_app.py
    📄image_gen.ipynb
    📄image_to_image.ipynb
    📄local_chat.py
    📄local_chat_memory.py
    📄memory.py
    📄model_preloader.py
    📄Pipfile
    📄Pipfile.lock

You can also use any other image you like of course! But I’ll be using this AI-generated image of a chicken as our input seed image. Back to the image_to_image.ipynb file, create a new code cell, and write the following code to load the image in our notebook:

init_image = load_image("test_files/chicken.jpg").resize((512, 512))
init_image

We use the provided load_image function to load the image from the file path, resizing it to 512x512 pixels as this is the size our model likes to work with. When you run this cell you should see the image displayed in the notebook:

Notice that our input picture was not perfectly square but had a wide aspect ratio. As a result, our picture resized to 512x512 pixels is a bit squished. This is not a problem for the model which will tend to get rid of this squashed look for us.
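If you would rather avoid the squish altogether, one option is to crop the picture to a centered square before resizing. The sketch below computes the crop box with plain arithmetic and then uses the standard PIL crop and resize methods. The helper names center_square_box and load_square are hypothetical. Keep in mind that cropping cuts off the left and right edges of a wide picture.

```python
def center_square_box(width: int, height: int) -> tuple:
    """Return the (left, upper, right, lower) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return (left, upper, left + side, upper + side)


def load_square(path: str, size: int = 512):
    """Load an image, crop it to a centered square, then resize it."""
    from diffusers.utils import load_image  # same helper as in the first cell

    image = load_image(path)
    # image.size is (width, height), which matches the helper's arguments.
    return image.crop(center_square_box(*image.size)).resize((size, size))


print(center_square_box(1024, 512))  # prints: (256, 0, 768, 512)
```

Calling load_square with the path to your image would then take the place of the load_image(...).resize(...) line.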

Generating an image from an image

Now in the next cell let’s define a prompt and generate and show the resulting image:

prompt = "cartoon chicken 3d animation disney animation movie 3d cartoon chicken"

image = pipe(prompt, image=init_image, num_inference_steps=2, strength=0.5, guidance_scale=0.0).images[0]
image

We’re using the same code as the Hugging Face model card instructions again. We feed it our input image and stick with the number of inference steps and the strength setting suggested in the example. The guidance scale is set to 0.0 again, as the model card states that SDXL-Turbo does not use this parameter. We index into .images[0] to get the image. Finally, we just state image on the last line, as Jupyter notebooks display the last expression in a cell without needing a print statement.

When you run this cell you should see the generated image displayed in the notebook. Here is what I got:

That’s a pretty cool-looking cartoon chicken! 🐔 It looks cartoony and 3d, just like we asked, but at the same time, it clearly is not just a random AI-generated chicken image. The background and floor are very similar and the orientation, positioning, and posture of our chicken are all exactly the same despite the different drawing style we requested.
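If you plan to try several prompts, the call could be wrapped in a reusable function, much like generate_image earlier. The sketch below is one possible shape for it. The name restyle_image and the optional seed argument are additions for this example. The seed relies on the generator argument that diffusers pipelines accept, so that a given seed repeats the same result.

```python
def restyle_image(pipe, init_image, prompt: str, strength: float = 0.5,
                  num_inference_steps: int = 2, seed=None):
    """Run an image-to-image pipeline and return the first image (hypothetical helper)."""
    generator = None
    if seed is not None:
        import torch  # only needed when a fixed seed is requested

        # A seeded generator makes the random noise repeatable between runs.
        generator = torch.Generator(device="cpu").manual_seed(seed)
    result = pipe(
        prompt,
        image=init_image,
        num_inference_steps=num_inference_steps,
        strength=strength,
        guidance_scale=0.0,  # SDXL-Turbo does not use guidance
        generator=generator,
    )
    return result.images[0]
```

In the notebook, restyle_image(pipe, init_image, prompt) gives a fresh variation each time, while adding seed=42 (or any other number) should reproduce the same image on every run.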

Some more examples

I’ll give it a try with the strawberry image we used in part 1 of the tutorial, which is still in your test_files folder under the name output2.jpg:

I’m going to give it the prompt of Banana and see what it comes up with. With this reference image, it will probably look somewhere halfway in between a strawberry and a banana. Ladies and gentlemen, I give you the StrawNana! 🍓🍌

That’s pretty cool! I’m having way too much fun here! As always with these image models, the output is a bit different each time. If you are not satisfied with your first result just try running it again. A strawberry hand grenade to finish up:

That’s it for part 4. Go ahead and play around as much as you want! You can turn the code into a function again like last time if you prefer. When you’re done generating new fruits and cross-breeding animals into new species, I’ll see you in part 5 where we’ll be using AI to generate speech and even music! 🎶

P.S. Here’s a couple more pictures I generated for fun:

"A dying star in space, grand and beautiful, with a black hole in the center."

"A gigantic pizza with a city on top of it, with people living in the buildings."
