Python Quickselect Algorithm

What is the Quickselect algorithm?

The Quickselect algorithm is a computer algorithm designed to find the k^th (e.g. smallest or largest) element from an unordered list. It is based on the idea behind the Quicksort algorithm, invented by the same author, Sir Charles Anthony Richard (Tony) Hoare. Here, k stands for the index of an element that would otherwise be found in an ordered list by following a common statistical operation, such as minimum, maximum, or median.

Finding the k^th Element – A Brute-Force Approach

Let us take for an example an unordered list li (assume zero-based indexing) of natural numbers: [22, 35, 1, 4, 8, 12, 7, 55]. We’re interested to find e.g. the k^th smallest number in the list li.

Using a naive approach, we would reorder, i.e. sort our list li by placing the smallest number on the 0^th place, the next one in the 1^st place, and so on, all the way to the largest element.

Finally, we would read the element on the k^th position.

Following this procedure, we’ve sorted our numbers and got them ordered as a list li_ordered = [1, 4, 7, 8, 12, 22, 35, 55]. Assuming the k we’re interested in is equal to 4, we would now easily retrieve li_ordered[4] = 12:

k = 4
li = [22, 35, 1, 4, 8, 12, 7, 55]
li.sort()

print(li)				
# Result: [1, 4, 7, 8, 12, 22, 35, 55]

print(li[4])			
# Result: 12

The given example is a brute-force approach, and though it brought us to our result, it isn’t the best way to do so. It is quite the opposite. Our approach required that we sort the entire list li, take into account that k = 4, and only then read our desired element by accessing it via the index k. How can we do better?

Towards the Quickselect Algorithm

A step in the right direction is taking a look at which elements of the list li_ordered are outside of the k^th position.

We see that elements before the position k are smaller than (or generally speaking, smaller than or equal to) the element li_ordered[k].
Likewise, elements after the position k are larger than (or generally speaking, larger than or equal to) the element li_ordered[k].

However, the absolute position of the k^th element does not change even if the left list slice li_ordered[:k] and the right list slice li_ordered[k+1:] are not internally ordered. In other words, the internal ordering of elements before and after position k is irrelevant, as long as all the smaller elements are placed somewhere before the position k, and all the larger elements are placed somewhere after the position k.

With this in mind, we only need a high-level arrangement that separates the smaller and larger numbers, for the k^th element, in two groups. This insight enables us to improve our naive approach by recognizing a central role that the k^th element has.

Every list element revolves or pivots around the k^th element during the rearrangement process, making the k^th element a pivotal element, or pivot.

It also leads us to the conclusion that we do not need the entire list li in a completely ordered form, potentially saving us from many unnecessary, costly comparisons. The only question that remains is, how to create the right pivot since our original list is not ordered? Following this question, we arrive at the Quickselect algorithm.

Quickselect – the Concept

The main goal of the Quickselect algorithm is to yield the correct pivot which corresponds to the k^th position in an ordered list.

In the majority of cases, an attempt at creating the correct pivot will take more than one iteration of pivot selection, but will gradually yield the correct one.

In each iteration, our list li will be consecutively split into left and right sublists, i.e. partitions, which contain elements partitioned accordingly to the candidate pivot in the iteration.

We will implement partitioning by introducing the function partition(). Each algorithm iteration will only on the partition that contains apply partition() the k^th position.

partition() will form two partitions based on its input arguments (list li and partition limits; k is read from the outer scope) and return the pivot index (also known as the partition boundary in computer science literature) as each iteration ends.

If the pivot index i_pivot returned from the most recent iteration of partition() is larger than k, we are certain that the k^th element is somewhere in the left partition. In that case, the algorithm continues by reapplying partition() on the left partition, further creating two smaller subpartitions with a new pivot candidate. In the last iteration, the pivot index will match the k^th position. We will retrieve the correct k^th element, and end the function partition().

The algorithm implementation consists of the outer function, usually called quickselect(), which iteratively calls the inner function partition() and returns the k^th element when found.

quickselect() can be implemented in two ways: as a recursive function or a non-recursive function with a loop. There are also two popular approaches, or schemes with variations to implementing the function partition(): the Lomuto partition scheme and Hoare’s original partition scheme, which are described below. Quickselect algorithm does not use an auxiliary data structure to support its execution. All the steps the algorithm performs are done on the original list li, classifying the algorithm as an in-place algorithm.

Lomuto Partition Scheme

Lomuto partition scheme, designed by Nico Lomuto, also uses an arbitrary position as a pivot (in our example, it is the rightmost position of the examined partition).

The iteration begins by setting both scanners to the same starting position (in our example, it is the leftmost position). The scanners, finder, and replacer, move to the right one step at a time.

The finder examines its element and, if the element is larger than the pivot element, it just skips the element and moves to the right.

The replacer holds its position, pointing to the last encountered element larger than the pivot element. The element, pointed to by the replacer, represents the candidate for eventual element exchange. When the finder finds the element smaller than the pivot element, it exchanges the element with the replacer.

This way, both partitions are simultaneously maintained, and now both scanners move by one step to the right.

Finally, when the finder reaches the rightmost element, the pivot, the iteration’s final exchange takes place. As the replacer still points to the first position in the right partition (as always, the exchange candidate), the replacer’s element is moved to the rightmost place.

The pivot element now takes first place in the right partition and becomes a partition boundary.

# Lomuto partition scheme

def partition(li, leftLimit, rightLimit):
	
	# A common practice is picking a random or 
	# an outermost position as a pivot candidate.
	pivot = rightLimit
	
	# Partition scanners ('finder' and 'replacer').
	# They both start from the same side, e.g. left.
	finder = replacer = leftLimit
	
	# The search for pivot continues recursively
	# until it is found and returned to 
	# the calling function, quickselect().
	while finder < rightLimit:
		
		if li[finder] < li[pivot]:
			li[finder], li[replacer] = li[replacer], li[finder]
			replacer += 1
		
		finder += 1
	
	# 'finder' is now equal to the right limit
	li[finder], li[replacer] = li[replacer], li[finder]
	pivot = replacer
	
	# If the pivot position matches 'k', 
	# we have found our kth element.
	if pivot == k:
		return pivot
	
	# Otherwise, if our kth element is somewhere 
	# in the left partition, the recursive search
	# turns left...
	elif pivot > k:
		return partition(li, leftLimit, pivot-1)
	
	# ... or if our kth element is somewhere in 
	# the right partition, the recursive search
	# turns right.
	else:
		return partition(li, pivot+1, rightLimit)

Hoare’s Original Partition Scheme

Hoare’s original partition scheme works by selecting a pivot, and scanning (scanner variables l and r) for misplaced elements from the left and right sides simultaneously. When misplaced elements are found (e.g. a larger one on the left side or a smaller one on the right side), they are exchanged.

The scanning continues until the scanners cross each other, meaning the split point has been found. As the left scanner currently points to the leftmost element in the right partition, it will swap its content with the pivot. Now that the pivot becomes a partition boundary, the algorithm decides if a partition boundary is equal to k (the search stops) or not equal to k (the search continues).

# Hoare original partition scheme

def partition(li, leftLimit, rightLimit):
		
	# A common practice is picking a random or 
	# an outermost position as a pivot candidate.
	pivot = rightLimit
	
	# Partition scanners (left and right).
	l, r = leftLimit, rightLimit - 1

	# The search for pivot continues recursively
	# until it is found and returned to 
	# the calling function, quickselect().
	while True:
		
		# The left scanner searches and pauses for the 
		# element(s) not belonging in the smaller group. 
		# It stops when the right scanner is crossed.
		while l <= r and li[l] <= li[pivot]:
			l += 1
			
		# The right scanner searches and pauses for the 
		# element(s) not belonging in the larger group. 
		# It stops when the left scanner is crossed.
		while r >= l and li[r] >= li[pivot]:
			r -= 1
		
		# When the left scanner crosses the right scanner,
		# the pivot and left scanner swap their content.
		if l > r:
			li[pivot], li[l] = li[l], li[pivot]

			# A partition boundary is formed by putting 
			# a pivot on its true position, delimiting
			# the two partitions.
			pivot = l
			
			# If pivot position matches 'k', 
			# we have found our kth element.
			if pivot == k:
				return pivot
			
			# Otherwise, if our kth element is somewhere 
			# in the left partition, the recursive search
			# turns left...
			elif pivot > k:
				return partition(li, leftLimit, pivot-1)
			
			# ... or if our kth element is somewhere in 
			# the right partition, the recursive search
			# turns right.
			else:
				return partition(li, pivot+1, rightLimit)
		
		# The scanners haven't crossed yet, but only paused
		# so that the element exchange can take place. The
		# misplaced elements are being placed in their groups.
		li[l], li[r] = li[r], li[l]

Quickselect – Implementation and Testing

After choosing one of the partition schemes, there is little left to be added. We add the main algorithm function quickselect(li, k), which has only two parameters, list li, and the k^th element index k. Although not necessary, for practical reasons, the function partition() is defined inside the main function quickselect(). Alongside the definition of the quickselect(), we also test the correctness of the algorithm implementation.

from random import sample

# Testing sample size
sampleSize = 10000
# Algorithm correctness flag
algorithmCorrect = True

# The main algorithm function
def quickselect(li, k):

	def partition(li, leftLimit, rightLimit):
		# (replace with Lomuto or Hoare partition function body)

	# If 'k' is out of bounds, end the algorithm.
	if k > len(li) or k < 0:
		return
	
	# Initialize the partition limits.
	left = 0
	right = len(li) - 1
	
	# If the list has only one element, end the algorithm.
	if left == right:
		return left
	
	# Start the search over the whole 
	# list as an initial partition.
	return partition(li, left, right)


# Testing phase - not a part of the Quickselect algorithm

# Generates a random list of 'sampleSize' in length.
li_original = sample(range(-sampleSize*100, sampleSize*100), sampleSize)

# Prints the initial, unordered list
print(li_original)

# Inspect the algorithm correctness: the loop invokes quickselect() 
# for each 'k' in the sorted variant of 'li_original' and compares 
# the results. The 'li_original' remains unsorted.
for k, v in enumerate(sorted(li_original)):
	print(f'k = {k}, v = {v}')

	# Quickselect() modifies the list in-place, so a 
	# fresh (shallow) copy is required for each test run.
	li = li_original.copy()
	algorithmResult = li[quickselect(li, k)]
	print(f'Algorithm result: {algorithmResult}', end='\n\n')
	
	# The algorithm is assumed to be correct until 
	# the first mismatch occurs.
	if v != algorithmResult:
		algorithmCorrect = False
		break
	
print(f'Algorithm correct: {algorithmCorrect}')

The object enumerate(sorted(li_original)) serves as a test reference: k = 0 returns the true smallest element in the object, k = 1 returns the next smallest element etc.

Since we expect that quickselect() will do the for same arguments li and k, we compare returned results with the enumerator object’s results. To ensure identical starting conditions for each test case, i.e. for each k, a shallow copy of the original, unordered list li_original is made. Finally, the algorithm is considered correct if no counter example is found during the entire run.

Efficiency Analysis

Quickselect algorithm complexity is entirely determined by the underlying partition scheme.

Hoare’s partition scheme is considered more efficient than the Lomuto partitioning scheme due to fewer average number of element swaps (exchanges). However, both partition schemes’ worst-case complexities degrade to O(n²) for sorted inputs with outermost elements as pivots.

This behavior is because one of the partitions contains only the outermost element during each iteration, and the other partition contains all the other elements. In that case, the partition function discards only the outermost position per iteration (n iterations in total) and scans all the remaining elements (n scans) per iteration.

The best and average-case complexity for both schemes, and therefore, for the entire Quickselect algorithm is O(n log n), assuming that each following partition is, on average, cut in half (log n iterations in total) and that all elements in the partition have to be scanned (n scans).

Conclusion

In this article, we have learned about the Quickselect algorithm.

First, we explained what it does and how to emulate it using a naive approach on a sorted list.
Second, we described how to find the element by applying a brute-force approach.
Third, we looked at the specifics of the sorted list regarding the searched element and gained an insight enabling us to improve our initial approach.
Fourth, we discussed the inner workings behind Quickselect. We also took a look at the main concept of Quickselect.
The fifth and sixth sections familiarized us with the two most common partition schemes used when teaching or implementing Quickselect.
The seventh section exposed the implementation of both partition schemes and the main algorithm, accompanied by an optional, but illustrative example of a simple algorithm correctness test.
In the final, eighth section, we took a look at a brief overview of the algorithm’s efficiency analysis.

Quickselect gives very good results for a random, unordered, average-to-long list of elements. Yet, when the list is relatively short or mainly sorted, simpler algorithms, such as insertion sort, are known to be more efficient, adaptive, and more straightforward to implement.