Hi and welcome to this tutorial series on running Large Language and Machine Learning Models on your local machine. In this first part, we’ll be looking at running LLMs locally on our own computer for free. To achieve this we will use Ollama, a tool designed to help us do just that. As usual, let’s jump right in and explain as we go along.
Installing Ollama
Go to the Ollama website and download the appropriate version for your operating system. I am on Windows, so I’ll be using the Windows (Preview) version here. Once you have downloaded the file, click the installer to get started.
Just let it run and it will do its own thing…
When it’s done, you will see an alert box in the corner:
Running an LLM
We now have the Ollama server running in the background. Let’s tell it what LLM we want it to run. We’ll get started with Llama 3.
Llama 3 is Meta (Facebook)’s open-source large language model. It is a general-purpose chat model just like ChatGPT, meaning it is good at general NLP tasks like question answering, sentiment analysis, text classification, and even code generation.
Unlike OpenAI GPT-4 and Google Gemini, Llama 3 is freely available for almost all uses and is open-source.
There are two basic sizes of Llama 3: the smaller one has 8 billion parameters, which is quite practical to run on most computers, and the larger one has 70 billion parameters.
To get started all you need is a terminal window. I’ll use my bash terminal integrated into VSCode here as we’ll be using VSCode for our coding in a moment anyway.
ollama run llama3
When you run this command, Ollama will automatically start pulling the llama 3 model for you. This will take a while as the model is around 4.7 GB in size.
Note that we are downloading the smaller version of Llama 3 here. If you want to run the larger version, replace the above command with ollama run llama3:70b instead, but make sure you have a powerful enough computer as the model alone is 40 GB in size.
As all steps are exactly the same I’ll stick with the smaller version for now.
You should see something like the following:
pulling manifest
pulling 00e1317cbf74... 100% ▕██████████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕██████████████████▏  12 KB
pulling 8ab4849b038c... 100% ▕██████████████████▏  254 B
pulling 577073ffcc6c... 100% ▕██████████████████▏  110 B
pulling ad1518640c43... 100% ▕██████████████████▏  483 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> Send a message (/? for help)
So let’s go and send it a message, shall we?
>>> What is a main sequence star?
Give it a second and it should start streaming a response to the terminal. This may be pretty fast or very slow depending on how powerful your system is. Ollama will use your GPU if available or run in CPU-only mode if required.
A Main Sequence Star! In the vast universe, stars come in different forms and stages of life. A Main Sequence Star (MSS) is a specific type of star that’s incredibly important for our existence.

**What is a Main Sequence Star?**

A Main Sequence Star is a star that’s fusing hydrogen atoms into helium in its core. This process releases an enormous amount of energy, making the star shine brightly. Here are some key characteristics of Main Sequence Stars:

1. **Stable burning**: MSS stars have a stable, consistent energy output, which means their brightness remains relatively constant over millions of years.
2. **Hydrogen fusion**: The core of the star is where hydrogen atoms (protons) combine to form helium, releasing energy in the form of light and heat.
3. **Medium mass**: Main Sequence Stars typically have a mass between 0.1 and 10 times that of our Sun (G-type star). More massive stars are red supergiants or blue giants.
4. **Surface temperature**: MSS temperatures range from about 3,000 to 50,000 Kelvin (4,500°F to 90,000°F), which determines the star’s color: white, yellow, orange, or red.

**Why Main Sequence Stars matter**

Our Sun is a G-type Main Sequence Star, and it’s the perfect distance from us for life as we know it. The energy released by our Sun (and other MSS stars) powers:

1. **Photosynthesis**: Plants convert sunlight into chemical energy through photosynthesis.
2. **Earth's climate**: The stable energy output of our Sun regulates Earth’s temperature and weather patterns.
3. **Life on Earth**: The warmth and light provided by the Sun support life as we know it.

In summary, Main Sequence Stars are stars that fuse hydrogen into helium in their cores, releasing energy and supporting life across the universe!

Do you have any more questions about Main Sequence Stars or anything else?
Pretty good, you are now running llama 3 locally and chatting with it inside your terminal! Play around with it as you like and then type the following to exit:
/bye
Running other models
So let’s take a look at running some other models. We’ll start with Phi-3. Run the following command to pull the model to your computer (the run command did this automatically for us before):
ollama pull phi3
Phi-3 is a family of open small language models (SLMs) developed by Microsoft. Microsoft claims that Phi-3 models are the most capable SLMs available, outperforming models of the same size and the next size up in language, coding, and math capabilities.
Phi-3 models are designed to be more efficient and cost-effective than large language models (LLMs), while still delivering strong capabilities. This makes them more accessible and deployable on resource-constrained devices like smartphones. It comes at the cost of less factual knowledge and weaker handling of languages other than English.
The download will finish reasonably quickly, as the model is only 2.3 GB in size. Once it’s done, run the following command to check our locally saved models:
ollama list
And you will see we now have two models available:
NAME             ID              SIZE      MODIFIED
llama3:latest    a6990ed6be41    4.7 GB    25 minutes ago
phi3:latest      a2c89ceaed85    2.3 GB    3 seconds ago
Now let’s run phi3:
ollama run phi3
Then ask it a question to check it out (if you need multiple lines wrap them in """):
>>> """What ... is a sea turtle? ... """
A sea turtle, also known as a marine turtle, is a reptile of the order Testudines that has adapted to living in aquatic environments...... (truncated)
Easy enough! Type /bye to exit the conversation. If you want to remove this smaller model from your computer, run the following command using rm for remove:
ollama rm phi3
And now if you run ollama list again, you will see that the phi3 model is no longer available:
NAME             ID              SIZE      MODIFIED
llama3:latest    a6990ed6be41    4.7 GB    35 minutes ago
Available models
There are many more models available to run with ollama, including general and more specialized models. It can be hard to keep track of all the different options out there, so before we move on to adding prompts and a REST API and such, let’s explore what is out there in the free open-source LLM world.
- Mistral (7B) – ollama run mistral. Made by the French company Mistral AI.
- Neural Chat (7B) – ollama run neural-chat. Fine-tuned model based on Mistral. Released by Intel.
- Starling (7B) – ollama run starling-lm. New open-source large language model (LLM) developed by researchers at UC Berkeley. It used feedback from AI models like GPT-4 as part of its training.
- Solar (10.7B) – ollama run solar. Developed by Upstage, a South Korean AI company. They focus on providing purpose-trained versions of Solar for various domains like healthcare, customer support, finance, etc.
- Llama 2 Uncensored (7B) – ollama run llama2-uncensored. Based on Llama 2 by Meta but has been retrained into an uncensored version that will not refuse certain questions.
These are fairly similar to Llama 3, so we won’t be installing them here one by one. Feel free to pull any of these you like and play around with them.
In addition, we also have Gemma, which is developed by Google. You can sort of think of it as a ‘lite’ version of its powerful Gemini models that you can download and run locally. There are two sizes:
- Gemma 2B – ollama run gemma:2b
- Gemma 7B – ollama run gemma:7b
Specific use-case models.
To top it all off, we have two more specific models available. These are:
- Code Llama (7B) – ollama run codellama. Code Llama is a code-specialized version of Meta’s Llama 2 model, further trained on a massive 500 billion token code and code-related dataset.
- LLaVA – ollama run llava. LLaVA (Large Language and Vision Assistant) is a large multimodal model developed by Microsoft Research. It combines a vision encoder and a language model to achieve multimodal chat capabilities mimicking the popular GPT-4.
We’re going to be taking a look at LLaVA, but before we dive in any further, let’s create a base directory to hold our project files. I’ll name my base directory Local_Models and then also create a new folder inside named test_files:
📁Local_Models
    📁test_files
I’ll be opening the Local_Models directory in VSCode and running all the commands from there as we will soon get into more coding anyway.
Now save the next two images to the test_files folder, making sure to give them nondescriptive names (so the LLM cannot cheat). I’ll name them output1.jpg and output2.jpg:
output1.jpg -> A T-Rex on a bicycle.
output2.jpg -> A single strawberry.
So now you have the following:
📁Local_Models
    📁test_files
        🖼️output1.jpg
        🖼️output2.jpg
Start up a new terminal and make sure you are in the root project folder or use the cd command to get there. In my case, this is:
admin@DirkMasterPC MINGW64 /c/Coding_Vault/Local_Models
Now let’s both pull and run the LLaVA model:
ollama run llava
When it is done downloading the model, you can ask your question, including the image, in the following manner:
>>> Please describe this image ./test_files/output1.jpg
Make sure you do not forget the . in front of the path and that you use the correct file name and extension; otherwise, it will hallucinate something random for you.
This image features a digital illustration of an anthropomorphic Tyrannosaurus Rex riding a bicycle. The dinosaur appears to be a juvenile with smaller size, characterized by a large head with two sharp teeth, and its body is proportioned to human standards. It has two arms, each holding the handlebars of the bike, and one leg bent over the pedal for propulsion. The T-Rex is wearing a black helmet with a visor and is looking ahead.

The bicycle itself is equipped with training wheels on both sides, suggesting it's a children's or beginner's model. The tires are black, and the bike frame seems to be made of metal, possibly steel or aluminum.

The background depicts a forested area with trees and foliage, creating a natural setting. There are also small details such as a bird flying in the upper left corner and what appears to be a small pond or stream visible through the branches.

The overall style of the image is cartoonish and whimsical, likely intended for entertainment or educational purposes.
That is really impressive for an open-source and free model. It even realizes the T-Rex is smaller than it would normally be. It does get some small details wrong, such as the helmet and the training wheels on the side, but overall it is very good. To be fair, this is also a really difficult and detailed image. Let’s try the simpler strawberry image.
First type /bye to exit the conversation and then run the LLaVA model again:
ollama run llava
You may wonder why we pointlessly restarted LLaVA. I’ve noticed that the terminal version of Ollama sometimes gets stuck describing features of the previous image. We’ll leave the terminal and start coding with our local LLMs soon, but for now, we simply restart LLaVA to get a fresh start and clear the previous context.
>>> Please describe this image ./test_files/output2.jpg
The image is a digitally altered photograph that combines two different subjects. On the left side, there's a section of a yellow stool with an abstract design. Overlaid on this is a visual effect where several magnets with the word "Downtown" are arranged... (truncated)
Somehow it has trouble recognizing a strawberry! LLaVA’s image recognition is really not perfect yet; it is a bit hit-and-miss and seems to struggle with fruit in particular. It is still very impressive that it can recognize the T-Rex on a bike, though.
I’m sure that if an open-source model of this size can already get hit-and-miss results running on a normal local computer, then in a couple of years your phone will run models that recognize anything you throw at them in real time.
As this is not quite reliable enough for us to use in our projects yet, we’ll be sticking to the text-based models for now as we move on to using Ollama with a REST API so we can approach it from our code.
Setting up a virtual environment
We’ll be running this project inside a virtual environment. A virtual environment is a self-contained directory that allows us to install specific versions of packages without affecting the global Python installation.
We will use one mainly because there are many different versions of the packages we will be installing, and we don’t want conflicts with other projects and installations already on your computer.
The virtual environment will make it easy for you to install my exact versions without worrying about affecting any of your other projects and is a good practice to follow in general.
To create a new virtual environment we’ll use a tool called pipenv. If you don’t have pipenv installed, you can install it using pip, which is Python’s package manager. Run the following command in your terminal:
pip install pipenv
Make sure the terminal is inside your root project folder, e.g. /c/Coding_Vault/Local_Models, and then run the following command to create a new virtual environment:
pipenv shell
This will create a new virtual environment and also a Pipfile in your project directory. Any packages you install using pipenv install will be added to the Pipfile.
To generate a Pipfile.lock, which is used to produce deterministic builds, run:
pipenv lock
This will create a Pipfile.lock in your project directory, which contains the exact version of each dependency to ensure that future installs are able to replicate the same environment.
We don’t need to install a library first to create a Pipfile.lock. From now on when we install a library in this virtual environment with pipenv install library_name, it will be added to the Pipfile and Pipfile.lock automatically, which are basically just text files keeping track of our exact project dependencies.
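For reference, a freshly generated Pipfile looks roughly like this (a sketch only; the exact source block and Python version will depend on your own setup):

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]

[dev-packages]

[requires]
python_version = "3.10"

As we install packages with pipenv install, they will show up under the [packages] section.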
For reference, I’m using Python 3.10 for this project, but you should be fine with any recent version. Consider upgrading if you’re using an older version.
Now press Ctrl + Shift + P in VSCode, type Python: Select Interpreter, and select the virtual environment you just created. This makes sure that you are using the correct Python interpreter for this project, as sometimes this does not happen automatically. You can find the correct one by looking at the name inside the (parentheses), as it should contain the name of your project folder.
Setting up a REST API for our LLMs
To start interacting with our LLMs programmatically, we need an API for the models running on our local computer. We can’t just keep chatting one message at a time in the terminal.
First, let’s start a model. I’ll be using llama3 for this example:
ollama run llama3
When it is up and running go ahead and open a second terminal window by clicking the + button:
You now have a second terminal window and can use the menu on the right to switch between them. In this new terminal window, let’s do a simple API test call using the curl command, a command-line tool that lets us make HTTP requests.
curl http://localhost:11434/api/generate -d '{ "model": "llama3", "prompt":"What is the hypothalamus?" }'
This uses the command-line tool curl to make a POST request to the Ollama API running on your computer. In the unlikely case that your system doesn’t have curl installed, don’t worry about it and just read along, or use a program like Postman to make the request. (We won’t be using curl after this test.)
The -d flag is used to send HTTP POST data along. The data is a JSON object with two keys: model and prompt. The model key specifies the model we want to use, and the prompt key specifies the prompt, as simple as that.
You may wonder where the port number 11434 and this /api/generate path come from. The Ollama server hosts a REST API on this port by default, and /api/generate is its endpoint for generating a completion from a prompt.
Go ahead and send the request through your terminal and it will start streaming the response to you like this:
$ curl http://localhost:11434/api/generate -d '{
>   "model": "llama3",
>   "prompt":"What is the hypothalamus?"
> }'
{"model":"llama3","created_at":"...","response":"The","done":false}
{"model":"llama3","created_at":"...","response":" hypoth","done":false}
{"model":"llama3","created_at":"...","response":"alam","done":false}
{"model":"llama3","created_at":"...","response":"us","done":false}
etc, etc...
You can press Ctrl + C to stop the streaming and get back to your terminal when you’re satisfied that it works. We now know how to use Ollama to run LLMs locally on our own system, and we have an API to make requests to our own local models.
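As a small preview of where we’re heading, here is a minimal sketch of making the same request from Python using the requests library (my own illustration, not the tutorial’s code yet; it assumes requests is installed, e.g. with pipenv install requests). It posts the same JSON body and prints the streamed chunks as they arrive:

# Minimal sketch: stream a completion from the local Ollama API.
# Assumes Ollama is serving on the default port 11434 and that the
# `requests` library is installed (e.g. `pipenv install requests`).
import json
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "What is the hypothalamus?"},
    stream=True,  # the API streams newline-delimited JSON objects
)

for line in response.iter_lines():
    if not line:
        continue
    chunk = json.loads(line)              # one JSON object per line
    print(chunk["response"], end="", flush=True)
    if chunk.get("done"):                 # the final object has "done": true
        print()
        break

If you prefer one single response instead of a stream, you can also add "stream": false to the request body and the full answer will come back as a single JSON object.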
Go back to the first terminal window and type /bye to exit the conversation and shut down the model. You can also close the terminal window if you like.
In the next part, we’ll learn how to make this into a proper chatbot with memory, so it has an awareness of the context of the conversation and the ability to remember previous messages. We’ll also create a simple web interface for our chatbot using Gradio, so we can interact with it in a more convenient way.
I’ll see you soon in the next part!
Comments

I just downloaded the llava model on 5/27/24, and the description of the strawberry is greatly improved from what you got:
“The image shows a single, ripe strawberry placed on what appears to be a surface with a neutral color. The strawberry has a rich red color and is characterized by its white core and the green leaves still attached. It’s positioned in such a way that it takes up a central role in the frame, with the top of the strawberry visible at the top of the image, giving depth to the composition. The background is blurred, focusing attention on the strawberry. There are no visible texts or brands within the image, and the style suggests a high-resolution, detailed photograph with an artistic composition emphasizing the texture and vibrant color of the fruit.”
🤯🤯🤯