What Is A Set?

The set data structure is one of the basic collection data types in Python and many other programming languages.

In fact, there are even popular languages for distributed computing that focus almost exclusively on set operations (like MapReduce or Apache Spark) as primitives of the programming language.

Definition: A set is an unordered collection of unique elements.

Let's break this down.

(1) Collection

A set is a collection of elements like a list or a tuple. The collection consists of either primitive elements (e.g. integers, floats, strings), or complex elements (e.g. objects, tuples). However, all data types must be hashable.

What is a hashable data type? 

Here is the relevant excerpt of the documentation:

"An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method)."

The set data structure heavily relies on the hash function to implement the specification (this would be a nice topic for tomorrow).

Let's have a look at an example (we stay with the Harry Potter examples because this is at the top of my mind -- reading it every day with my daughter):

hero = "Harry"
guide = "Dumbledore"
enemy = "Lord V."

print(hash(hero))
# 6175908009919104006

print(hash(guide))
# -5197671124693729851

## Puzzle: can we create a set of strings?

characters = {hero, guide, enemy}
print(characters)
# {'Lord V.', 'Dumbledore', 'Harry'}


## Puzzle: can we create a set of lists?
team_1 = [hero, guide]
team_2 = [enemy]

teams = {team_1, team_2}
# TypeError: unhashable type: 'list'

As you can see, we can create a set of strings because strings are hashable. But we cannot create a set of lists because lists are unhashable.

Why are lists unhashable? 

Because they are mutable: you can change a list by appending or removing elements. If you change the list data type, the hash value changes (it is calculated based on the content of the list). This directly violates the above definition ("hash value [...] never changes during its lifetime").

Key takeaway: mutable data types are not hashable. Therefore, you cannot use them in sets.

(2) Unordered

Unlike lists, sets are unordered because there is no fixed order of the elements. In other words, regardless of the order you put stuff in the set, you can never be sure in which order the set stores these elements.

Here is an example from the above code:

characters = {hero, guide, enemy} 
print(characters)
# {'Lord V.', 'Dumbledore', 'Harry'}

You put in the hero first, but the interpreter prints the enemy first (the Python interpreter is on the dark side, obviously).

(3) Unique

All elements in the set are unique. Each pair of values (x,y) in the set produces a different pair of hash values (hash(x)!=hash(y)). Hence, each pair of elements x and y in the set are different.

This means that we cannot create an army of Harry Potter clones to fight Lord V:

clone_army = {hero, hero, hero, hero, hero, enemy} 
print(clone_army)
# {'Lord V.', 'Harry'}

No matter how often you put the same value into the same set, the set stores only one instance of this value. An extension of the normal set data structure is the “multiset” data structure where a multiset can store multiple instances of the same value.

8 thoughts on “What Is A Set?”

Leave a Reply

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}