Hi and welcome to this course on LangGraph, LangChain, and LangSmith. My name is Dirk van Meerveld and I will be your host and guide as we go on this exploration together.
So what is up with all these Lang-words? Well, in short:
- LangChain is a basic framework that will allow us to work with LLMs.
- LangGraph will allow us to make more complex combinations using LangChain by introducing graph structures, where we can have multiple nodes or even teams of LLM agents working together.
- LangSmith is a tool that helps us see exactly what is going on while we work with the above two, to help us debug and improve our code in a more convenient way.
LangChain
Let’s get started with LangChain🔗 first. LangChain is a framework designed to make it easier to build applications that use large language models (LLMs). Think of it as a set of tools that helps bridge the gap between LLMs and the applications you might want to build with them.
LangChain helps us:
- Provide a unified interface: Any code you write can be used with different LLMs with little modification, and you can use the same code to write prompts or tools for different LLMs.
- Prebuilt tools for common tasks: LangChain includes tools for common tasks you might want to do with LLMs, such as building chatbots, summarizing documents, or analyzing code. Besides building our own tools and functions, we can also import community pre-built tools.
- Memory and Context: LangChain makes it easy to incorporate memory and context into our LLM applications. This means our application can remember past interactions and use that information to inform future responses.
So let’s get started! First go ahead and create a new project folder and name it whatever you like; I’ll call mine FINX_LANGGRAPH:

📂 FINX_LANGGRAPH
Create a venv in the root project folder
We’ll be running this project inside a virtual environment. A virtual environment is a self-contained directory that lets us install specific versions of packages without affecting the global Python installation. We’ll use one because I’ll be pinning specific versions for the libraries we install as we go along, and I want to make sure you have the exact same experience as I do.
For example, when we use pydantic, we’ll be using the older V1 for this project, as it plays nicely with LangChain. You’ll probably have V2 installed on your system-wide Python installation, and then your imports will be different from mine, causing confusion. We also don’t want to mess with your system-wide Python installation.
The virtual environment will make it easy for you to install my exact versions without worrying about affecting any of your other projects and is a good practice to follow in general.
To create a new virtual environment we’ll use a tool called pipenv. If you don’t have pipenv installed, you can install it using pip, which is Python’s package manager. Run the following command in your terminal:
pip install pipenv
Make sure the terminal is inside your root project folder, e.g. /c/Coding_Vault/FINX_LANGGRAPH, and then run the following command to create a new virtual environment:
pipenv shell
This will create a new virtual environment and also a Pipfile in your project directory. Any packages you install using pipenv install will be added to the Pipfile.
- To generate a Pipfile.lock, which is used to produce deterministic builds, run:
pipenv lock
This will create a Pipfile.lock in your project directory, which contains the exact version of each dependency to ensure that future installs can replicate the same environment.
We don’t need to install a library first to create a Pipfile.lock. From now on, whenever we install a library in this virtual environment with pipenv install library_name, it will be added to the Pipfile and Pipfile.lock, which are basically just text files keeping track of our exact project dependencies.
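To give you a rough idea of what these files look like, here is a sketch of roughly what the Pipfile could contain once we have installed the packages used later in this part (the exact contents and versions generated on your machine may differ):

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
python-decouple = "==3.7"
openai = "==1.14.2"
langchain = "==0.1.13"
langchain-openai = "==0.1.0"

[dev-packages]

[requires]
python_version = "3.10"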
For reference, I’m using Python 3.10 for this project, but you should be fine with any recent version. Consider upgrading if you’re using an older version.
Basic project setup
Before we get started, we need to make sure we have our OpenAI API key ready to load in a convenient way, as we cannot hardcode it in our source code. Go to https://platform.openai.com/api-keys and copy your API key, or make a new one. You’ll only pay for what you use, which will be cents if you just play around with it casually. Then create a new file called .env in the root folder of your project:
📂 FINX_LANGGRAPH
    📄 .env    ✨New file
    📄 Pipfile
    📄 Pipfile.lock
And paste your API key in the .env file like this, making sure not to use any spaces or quotes:
OPENAI_API_KEY=your_api_key_here
Then go ahead and save and close this file. If you are using Git, make sure to add this file to your .gitignore file so you don’t accidentally commit your API key to your repository. If you’re not using Git, just make sure you exclude the .env file if you share your code with anyone.
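If you don’t have a .gitignore file yet, creating one in the root project folder with just this single entry is enough for now (standard Git behavior, nothing project-specific):

.env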
We’ll be using several API keys and settings across our project, adding more as we go, so let’s create a simple and reusable way to load them to stop us from writing the same code over and over again.
Run the following command in your terminal to add the python-decouple package inside your pipenv environment:
pipenv install python-decouple==3.7
We will use this package to read the .env file and get the API key from it. Now create a new file named setup_environment.py in the root folder of your project:
📂 FINX_LANGGRAPH
    📄 .env
    📄 Pipfile
    📄 Pipfile.lock
    📄 setup_environment.py    ✨New file
Then inside this new setup_environment.py file, write the following code:
import os

from decouple import config


def set_environment_variables() -> None:
    os.environ["OPENAI_API_KEY"] = str(config("OPENAI_API_KEY"))
We import the built-in os module and the config function from the decouple package we installed a minute ago. We then create a function we can import from our other code files.
The config("OPENAI_API_KEY") call reads the .env file and gets the value of the OPENAI_API_KEY variable we set in there, so make sure you have used the exact same name. The str() cast just makes sure it’s a string value. We then set this value to the OPENAI_API_KEY environment variable using os.environ.
This way we can just use LangChain freely without having to worry about our API key as both LangChain and OpenAI are set up to read our API keys from the environment variables automatically.
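If you want to quickly sanity-check that the key loads correctly, you could run a small throwaway script like this (purely optional, not part of the project files):

import os

from setup_environment import set_environment_variables

set_environment_variables()

# Print only the first few characters so we don't leak the whole key
print("Key loaded:", os.environ["OPENAI_API_KEY"][:8] + "...")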
LangChain basics
Ok, time to get started with LangChain! Let’s cover the basics first so we understand the building blocks. We’ll start with some installs. Make sure you run all of these even if you have some of these libraries installed already as we’re not using the global Python installation but our virtual environment. Run the following command in your terminal:
pipenv install openai==1.14.2 langchain==0.1.13 langchain-openai==0.1.0
The openai library handles communication with the OpenAI API behind the scenes while we use langchain, and the langchain-openai library contains the functionality that connects the two.
Now create a new file named langchain_basics.py in the root folder of your project:
📂 FINX_LANGGRAPH
    📄 .env
    📄 langchain_basics.py    ✨New file
    📄 Pipfile
    📄 Pipfile.lock
    📄 setup_environment.py
Inside this new langchain_basics.py file, let’s get started with the following imports:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from setup_environment import set_environment_variables
Before we explain the imports, I want to cover a potential problem you may run into here: the imports may not be recognized and may show red squiggly lines under them, even though you just installed these libraries:
So what is going on here? Well, the virtual environment we created comes with its own Python interpreter, and the Python interpreter in your code editor is probably set to the system-wide Python interpreter. This means that the code editor doesn’t know where to find the libraries we just installed in the virtual environment.
To fix this, press Ctrl+Shift+P in VS Code to open the command palette, then type Python: Select Interpreter and select the Python interpreter from the virtual environment you created. You can find the correct one easily by comparing your root project name with the interpreter name. My root folder is FINX_LANGGRAPH, so I can find mine in the list under this name:
When you click it, the red squiggly lines should go away and you’re now using the correct Python interpreter.
With that out of the way, let’s look at the imports here:
- StrOutputParser is a class that will help us parse the output from the LLMs into a string format. Normally when you get the return from ChatGPT, you have to index into response.choices[0].message.content to get the response. Just think of this as a convenience class that handles that for us.
- ChatPromptTemplate is a class that will help us create a template for our chat prompts. This will make it easier to create prompts for the LLMs.
- ChatOpenAI is a class that basically just allows us to create an instance of the OpenAI chat model and use it with LangChain.
The value of these output parsers and prompt templates is that they give us a unified interface we can keep using unchanged, even if we swap out the LLM halfway through our project or at some point in the future.
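To make the output parser part concrete, here is a small standalone illustration (not part of our project file) of what it saves us; without a parser, calling a chat model returns an AIMessage object whose text lives in its content attribute:

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

from setup_environment import set_environment_variables

set_environment_variables()
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

# Without a parser: invoke() returns an AIMessage object
message = llm.invoke("Say hello in French.")
print(type(message).__name__)  # AIMessage
print(message.content)         # the actual text

# With StrOutputParser chained on, we get the plain string directly
chain = llm | StrOutputParser()
print(chain.invoke("Say hello in French."))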
Prompt templates
We then import the set_environment_variables function from the setup_environment file we created earlier. Now let’s continue our code by creating a prompt template:
set_environment_variables()

french_german_prompt = ChatPromptTemplate.from_template(
    "Please tell me the french and german words for {word} with an example sentence for each."
)
First, we make sure to call our set_environment_variables function to set our API key. As a simple example, I’ll create a prompt that asks for the French and German words for a given word, along with an example sentence for each. This is just a simple example to show the parts of LangChain before we get into more complex examples.
The {word} part is the template variable that we can replace with any word we want to ask about. We then create a ChatPromptTemplate instance using the from_template method and pass in our prompt string. The ChatPromptTemplate class will help us create prompts for the LLMs in a more convenient way and basically deals with formatting message history like this:
# Example of a ChatPromptTemplate with a full message history
template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI bot. Your name is {name}."),
    ("human", "Hello, how are you doing?"),
    ("ai", "I'm doing well, thanks!"),
    ("human", "{user_input}"),
])
We only need a single message here though, which is why we use the from_template method. In this case, LangChain will assume it to be a human message, so this will result in:
template = ChatPromptTemplate.from_messages([
    ("human", "Please tell me the french and german words for {word} with an example sentence for each.")
])
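If you want to see this for yourself, you can format the template with a word filled in and inspect the result; it is a list containing a single human message (a quick optional check, assuming you run it below the code we have so far):

# Fill in the template variable and look at the messages it produces
messages = french_german_prompt.format_messages(word="cat")
print(messages)
# Roughly: [HumanMessage(content='Please tell me the french and german words for cat with an example sentence for each.')]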
Creating a chain
Now that we have a prompt template to create our prompts, let’s continue:
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
output_parser = StrOutputParser()

french_german_chain = french_german_prompt | llm | output_parser
First, we define our LLM instance using the ChatOpenAI class and pass in the model we want to use. I’ll be using gpt-3.5-turbo-0125 as it is more than enough for the simple test we’re doing here. If at any point in the course you want to use GPT-4-turbo instead, feel free to do so.
We’ve already set the API key as an environment variable, so we don’t need to worry about it. We then create an instance of the StrOutputParser class to parse the output from the LLM into a string response, as discussed earlier.
Now that we have three building blocks, it is time for one of LangChain’s important concepts: “chains”. We can simply use the | operator to chain these building blocks together. This operator is taken from the pipe operator in Unix, which is used to chain commands together.

In this case, we take the french_german_prompt as the entry point of our chain, pipe the resulting prompt into our llm, making an LLM call, and then pipe the output into our output_parser to get the string response. Notice how easy and readable the chain is. We use chains to build things with large language models, hence the name LangChain. This piping style of syntax is often referred to as LCEL, or LangChain Expression Language.
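If it helps to demystify the | operator: invoking the chain is roughly equivalent to calling each building block’s own invoke method in sequence yourself, something like this sketch (just for illustration, you don’t need to add it to your file):

# Roughly what french_german_chain.invoke({"word": "polar bear"}) does for us:
prompt_value = french_german_prompt.invoke({"word": "polar bear"})  # format the prompt
ai_message = llm.invoke(prompt_value)                               # call the model
result = output_parser.invoke(ai_message)                           # extract the string
print(result)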
Running the chain
Now let’s actually try and run this chain. To do this we can simply use the invoke method on our chain:
result = french_german_chain.invoke({"word": "polar bear"})
print(result)
We could technically also just pass in the string "polar bear" as we only have a single variable, but it’s better practice to use a dictionary like this, as you may have multiple variables in your prompt. So go ahead and run this Python file and you should get something like the following:
French: ours polaire
German: Eisbär

Example sentence in French: L'ours polaire est un animal emblématique de l'Arctique.
Example sentence in German: Der Eisbär ist das größte an Land lebende Raubtier der Welt.
The order or structure may be slightly different as we didn’t specify any particular output format, but that’s not the point here: it works! You’ll notice LangChain is very easy to read and understand, and this exact same code can be used with other LLMs with little modification.
We can also very easily stream the response instead. Edit your code like this, commenting out the previous invoke call and calling stream instead:
# result = french_german_chain.invoke({"word": "polar bear"})
# print(result)

for chunk in french_german_chain.stream({"word": "polar bear"}):
    print(chunk, end="", flush=True)
So for every chunk in the stream that results from calling french_german_chain.stream with the word “polar bear”, we print the chunk to the console. The end="" and flush=True arguments just make sure there are no line breaks in between print calls and that the output is written to the console immediately.
Now if you run it again, you’ll see the tokens being streamed and written to your console in real time.
Another useful method provided for us is batch, so let’s give that a spin as well:
# for chunk in french_german_chain.stream({"word": "polar bear"}):
#     print(chunk, end="", flush=True)

print(
    french_german_chain.batch(
        [{"word": "computer"}, {"word": "elephant"}, {"word": "carrot"}]
    )
)
This time we pass in a list of dictionaries with one entry for each run in the batch. Running this will give the responses in a list, one for each entry in the batch:
["French: \nComputer - Ordinateur \nExample sentence: J'utilise mon ordinateur pour travailler et regarder des films.\n\nGerman:\nComputer - Computer \nExample sentence: Mein Computer ist schon ein paar Jahre alt, aber er funktioniert immer noch einwandfrei.", "French: éléphant\nExample sentence: J'ai vu un éléphant au zoo.\n\nGerman: Elefant\nExample sentence: Der Elefant im Zoo war sehr groß.", "French: carotte\nExample sentence: J'ai acheté des carottes pour faire une soupe.\n\nGerman: Karotte\nExample sentence: Ich esse gerne Karotten als Snack."]
Now go ahead and comment that one out as well and let’s check the properties of our chain:
# print(
#     french_german_chain.batch(
#         [{"word": "computer"}, {"word": "elephant"}, {"word": "carrot"}]
#     )
# )

print("input_schema", french_german_chain.input_schema.schema())
print("output_schema", french_german_chain.output_schema.schema())
And if we run that, we get a JSON schema that shows the inputs and outputs of our chain:
input_schema {'title': 'PromptInput', 'type': 'object', 'properties': {'word': {'title': 'Word', 'type': 'string'}}}
output_schema {'title': 'StrOutputParserOutput', 'type': 'string'}
We can see that the input takes a single object that needs to have a key word with a string value. If we add more variables to our prompt, we’ll see them in the schema as well. The output schema is a simple string because we used the StrOutputParser to parse the output into a string at the end.
Adding complexity
That covers the basics of an extremely simple chain in LangChain, so let’s make it a bit more complex. In this same file, let’s declare a second chain and say, for the sake of a simple demonstration, that this second chain is supposed to check whether the output of the first chain is correct or not. (We’re just using simple examples here to save time and get to the good stuff faster.)
So down below the other stuff in the langchain_basics.py file, let’s define the prompt template for our second chain:
# print("input_schema", french_german_chain.input_schema.schema()) # print("output_schema", french_german_chain.output_schema.schema()) check_if_correct_prompt = ChatPromptTemplate.from_template( """ You are a helpful assistant that looks at a question and its given answer. You will find out what is wrong with the answer and improve it. You will return the improved version of the answer. Question:\n{question}\nAnswer Given:\n{initial_answer}\nReview the answer and give me an improved version instead. Improved answer: """ )
This time we have two variables in our prompt, question and initial_answer. We ask it to give an improved version of the first answer. The first answer is likely to be perfect already, but again, this is just for the sake of a quick demonstration.
We can reuse the llm and output_parser instances we created earlier, so let’s just create a new chain with the new prompt:
check_answer_chain = check_if_correct_prompt | llm | output_parser
Now we will need to run the input through the first chain, and then we need to keep both the original prompt from the first chain and the answer we get back from the first chain to pass them into the second one. So let’s do that:
def run_chain(word: str) -> str:
    initial_answer = french_german_chain.invoke({"word": word})
    print("initial answer:", initial_answer, end="\n\n")
    answer = check_answer_chain.invoke(
        {
            "question": f"Please tell me the french and german words for {word} with an example sentence for each.",
            "initial_answer": initial_answer,
        }
    )
    print("improved answer:", answer)
    return answer
So we define a function run_chain that takes a word as string input and returns a string. The initial answer is the return value we get after invoking the french_german_chain with the word.
We then print this answer and pass it into the check_answer_chain along with the original prompt, passing both through a dictionary with keys matching our prompt template variables. We print the improved answer and return it.
Now let’s run this function with a word:
run_chain("strawberries")
I apologize if I suddenly gave you a craving for strawberries! 🍓🍓🍓 Run it and your output will be something like this:
initial answer: French: fraises
Example sentence: J'adore manger des fraises en été.

German: Erdbeeren
Example sentence: Im Sommer esse ich gerne Erdbeeren mit Sahne.

improved answer: French: fraises
Example sentence: J'adore manger des fraises en été.

German: Erdbeeren
Example sentence: Im Sommer esse ich gerne Erdbeeren.
Now of course both answers are fine and there wasn’t really anything to improve as the question is very simple, but we successfully ran the output of one chain through another chain.
So that works fine, but you can see that passing the values around to the second chain is a bit cumbersome. Now imagine we want to add a third step to the chains above, or even a fourth one. A conditional split path, perhaps? If x, then call chain a, else call chain b.
Using the above method would quickly become a mess, so we’d want to create some kind of state object instead: a single object that holds all the data, which we can pass around between chains, with each chain adding to or modifying the state as needed.
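To give a rough idea of what that could look like, here is a purely illustrative sketch (the TranslationState name and the two step functions are made up for this example, reusing the chains we defined above) of run_chain rewritten around a single state dictionary:

from typing import TypedDict


class TranslationState(TypedDict, total=False):
    word: str
    question: str
    initial_answer: str
    improved_answer: str


def translate_step(state: TranslationState) -> TranslationState:
    # First chain: record the question we asked and its initial answer in the state
    state["question"] = (
        f"Please tell me the french and german words for {state['word']} "
        "with an example sentence for each."
    )
    state["initial_answer"] = french_german_chain.invoke({"word": state["word"]})
    return state


def check_step(state: TranslationState) -> TranslationState:
    # Second chain: read what it needs from the state and add its own result
    state["improved_answer"] = check_answer_chain.invoke(
        {"question": state["question"], "initial_answer": state["initial_answer"]}
    )
    return state


final_state = check_step(translate_step({"word": "strawberries"}))
print(final_state["improved_answer"])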
This is actually a pretty good solution to the problem, and as it happens, it is pretty much what LangGraph will do for us. Before we get there though, we need to take a short detour to LangSmith and also learn how to write our own tools in LangChain, so we can use function calling and agents to fully leverage the power of LangGraph and create some really cool stuff. That’s it for part 1 of this course, I hope you enjoyed it and I’ll see you in the next one!