Python Beam Search Algorithm

You can check out the slide deck here to get a first intuition on how the algorithm works:

Python-blog-Beam-Search-Algorithm Download

Before we’ll dive into the algorithm and the Python implementation, let’s first skim over some related graph tutorials you may enjoy and that may help your understanding!

What is the Beam Search Algorithm?

The beam search algorithm is an informed search algorithm it is a more flexible variant of the previously explained best-first search algorithm. The beam search algorithm can take multiple paths into each iteration, ordered and selected by their path length.

Reminder: informed search algorithms use some kind of auxiliary information to guide their search strategy. Not being statically determined upfront makes them an interesting choice for a wide range of applications. However, their performance is greatly determined by the quality of auxiliary information, commonly known in computer science as heuristic function, h(vertex).

The same as its base algorithm, the best-first search algorithm, the beam search algorithm uses a greedy, hence best-first approach, where the next β path choices are determined by their current length, rather than the overall quality of the path.

Symbol β (reads as “beta”) stands for the width of the beam, i.e. the number of the shortest (cheapest) paths to be taken into the next iteration of the algorithm, while all the other paths are being pruned.

What is the Purpose of Beam Search?

As a more flexible variant of the best-first search algorithm, the beam search algorithm inherits some of its predecessor’s fundamental properties. However, depending on the β, the algorithm can perform as both pure best-first search algorithm (β=1), pure breadth-first search algorithm (β=∞), and of course, anything in between.

Applications: The beam search algorithm is commonly used in applications such as machine translation, where there is possibly more than one good enough solution.

Except for its robustness, the most notable property of the beam search algorithm is its ability to maintain the manageability and usability of systems with limited resources in dealing with large and dense graphs.

How Does Beam Search Work?

The beam search algorithm starts the graph traversal by marking the start vertex as visited, i.e. putting it in the dictionary and placing it into the priority queue of candidate vertices.

We will use the term explored, which is synonymous with the terms expanded or extended in other literature.

Vertex priority determines the best β vertices to be kept for the next iteration. However, this selection will be done first by expanding all the neighboring vertices for each vertex in the current tier.

Then, the best β paths will be kept and taken to the next iteration.

The cycle of choosing, exploring, and populating the priority queue continues, until the priority queue becomes exhausted. At that point, the beam search algorithm stops its execution.

Since the heuristic function greatly influences the algorithm performance, the function’s accuracy is crucial.

What are the Properties or Beam Search?

The main property of the beam search algorithm lies in its versatility, i.e. the fact that it can switch between the best-first search and breadth-first search approach of traversing the graph.

Its performance depends on the quality of the heuristic function, which in most cases represents the distance estimation from the goal vertex. The choice of heuristic function can influence the algorithm to find the shortest possible path to the goal vertex, to never complete the search — and everything in between these two extremes.

How is Beam Search Implemented in Python?

The implementation of our beam search algorithm is achieved by the function beam_search().

For a more self-contained educational showcase, we will omit the commonly used graph data structure and introduce a few simplifications.

First, we will assume densely connected vertices (with many-to-many connections).
Second, we will define a fixed matrix representing distances, or weights between individual vertices in each tier.
Third, each element of the distance matrix is composed of two parts: the first one is a list of distances from any previous vertex to its neighboring vertices, where vertices are determined by the indices of each distance, e.g. in a list [12, 13, 14], distance to vertex 0 is 12, and distances to vertices 1 and 2 are 13 and 14.

The function beam_search() takes only two parameters:

The distances parameter takes an initialized numpy.array object.
The beta parameter takes a number representing the beam width, which we choose between integer values of 1 and ∞ (a large enough number for practical purposes).

For a better understanding of the algorithm and its implementation, each step is precisely described in the code below:

from numpy import array


# Uses a beam to search through the tree
def beam_search(distances, beta):
    # Defines an initial, dummy record structure represented by
    # visited vertices and the path length (so far),
    # traversed along each path.
    paths_so_far = [[list(), 0]]
    
    # Propagates through the neighbouring vertices tier by tier
    # (row by row). A vertex position is indicated by its
    # index in each row (dists).
    for idx, tier in enumerate(distances):
        if idx > 0:
            print(f'Paths kept after tier {idx-1}:')
            print(*paths_so_far, sep='\n')
        paths_at_tier = list()
        
        # Follows each path.
        for i in range(len(paths_so_far)):
            # Reads the current path and its length (sum of distances).
            path, distance = paths_so_far[i]
            
            # Extends the path for every possible neighboring vertex
            # at the current tier.
            for j in range(len(tier)):
                path_extended = [path + [j], distance + tier[j]]
                paths_at_tier.append(path_extended)
                
        # Sorts the newly generated paths by their length.
        paths_ordered = sorted(paths_at_tier, key=lambda element: element[1])
        
        # The best 'beta' paths are preserved.
        paths_so_far = paths_ordered[:beta]
        print(f'\nPaths pruned after tier {idx}: ')
        print(*paths_ordered[beta:], sep='\n')
        
    return paths_so_far


# Define a distance matrix of 10 tiers
dists = [[1, 3, 2, 5, 8],
         [4, 7, 9, 6, 7]]
dists = array(dists)

# Calculates the best paths by applying a beam of width 'beta'.
best_beta_paths = beam_search(dists, 3)

# Prints the best 'beta' paths.
print('\nThe best \'beta\' paths:')
for beta_path in best_beta_paths:
    print(beta_path)

The test run gave us the output (for β = 3):

Paths pruned after tier 0: 
[[3], 5]
[[4], 8]
Paths kept after tier 0:
[[0], 1]
[[2], 2]
[[1], 3]

Paths pruned after tier 1: 
[[1, 0], 7]
[[0, 1], 8]
[[0, 4], 8]
[[2, 3], 8]
[[2, 1], 9]
[[2, 4], 9]
[[1, 3], 9]
[[0, 2], 10]
[[1, 1], 10]
[[1, 4], 10]
[[2, 2], 11]
[[1, 2], 12]

The best 'beta' paths:
[[0, 0], 5]
[[2, 0], 6]
[[0, 3], 7]

The resulting output shows the intermediate and final paths in a list of vertex indices and the total path length.

Based on the output, we can see that the search started from the root vertex and that in the first iteration (tier 0) the beam_search() has pruned two and kept three (shortest) paths, marked by the vertex indices and the appropriate total path length so far.

In the second iteration, the beam_search() has pruned twelve and kept three (shortest) paths, marked by the vertex indices and the appropriate total path length so far. Since our dists matrix has only two tiers (I’ve kept it short for educational purposes), the algorithm ends here.

The final result is also displayed as The best ‘beta’ paths.

After setting the β to 1 and a re-run, we got a result matching a best-first search algorithm:

Paths pruned after tier 0: 
[[2], 2]
[[1], 3]
[[3], 5]
[[4], 8]
Paths kept after tier 0:
[[0], 1]

Paths pruned after tier 1: 
[[0, 3], 7]
[[0, 1], 8]
[[0, 4], 8]
[[0, 2], 10]

The best 'beta' paths:
[[0, 0], 5]

On the contrary, a large β (larger than number of distances in each tier multiplied, e.g. 5 x 5) would yield a breadth-first search algorithm behavior, where no pruning occurs:

Paths pruned after tier 0: 

Paths kept after tier 0:
[[0], 1]
[[2], 2]
[[1], 3]
[[3], 5]
[[4], 8]

Paths pruned after tier 1: 


The best 'beta' paths:
[[0, 0], 5]
[[2, 0], 6]
[[0, 3], 7]
[[1, 0], 7]
[[0, 1], 8]
[[0, 4], 8]
[[2, 3], 8]
[[2, 1], 9]
[[2, 4], 9]
[[1, 3], 9]
[[3, 0], 9]
[[0, 2], 10]
[[1, 1], 10]
[[1, 4], 10]
[[2, 2], 11]
[[3, 3], 11]
[[1, 2], 12]
[[3, 1], 12]
[[3, 4], 12]
[[4, 0], 12]
[[3, 2], 14]
[[4, 3], 14]
[[4, 1], 15]
[[4, 4], 15]
[[4, 2], 17]

Efficiency analysis

The algorithm’s worst-case time complexity depends on β and lies between O(bd) (behaves like a pure best-first search algorithm) and O(|V| + |E|) (behaves like a pure breadth-first search algorithm). It is determined by the heuristic function and the factor β.

The algorithm’s worst-case space complexity also depends on β and lies between O(bd) (behaves like a pure best-first search algorithm) and O(|V|) (behaves like a pure breadth-first search algorithm).

Conclusion

In this article, we learned about the beam search algorithm.

First, we explained what a beam search algorithm is.
Second, we took a look at what are its common purposes and applications.
Third, we went through an explanation of how the algorithm works.
Fourth, we examined the algorithm’s main properties.
Fifth, we went through the implementation of the algorithm. We also tested the algorithm by calling its main function, beam_search(), and analyzed its steps of execution for the smallest, middle, and largest (“infinite”) β scenarios.
Sixth, we analyzed the algorithm efficiency.

In the end, we concluded that the algorithm’s efficiency may be optimal if it behaves like a breadth-first search algorithm, although such mode of operation would defeat its initial purpose — to reduce the space complexity and data storage requirements.

In other modes of operation, the algorithm is not guaranteed to perform optimally and might also take a virtually infinite time in reaching the solution, especially for β = 1.

However, this behavior can be prevented by constructing the appropriate heuristic function using the relevant knowledge about the graph and vertice relationships.