Sets and programming languages 鈥� Clayton Cafiero

Sets and programming languages

Author

Clayton Cafiero

Published

2025-01-05

Sets in the context of programming languages

Sets aren鈥檛 quite the same in the context of programming languages. For example, in most commonly used programming languages with a set datatype, two sets containing the same elements are considered equal (equivalent in terms of elements) but not necessarily identical (the same object in memory). This distinction arises because sets are typically treated as separate objects, even if they contain identical data. However, there are scenarios or specialized languages where sets containing the same elements might be treated as identical due to specific constraints or optimization mechanisms.

Many imperative programming languages support sets (of a sort). For example, in Python, two sets are treated as equivalent if they contain the same objects. (Sets are written enclosed in curly braces in Python.)

>>> a = {1, 2, 3}
>>> b = {1, 2, 3}
>>> a == b   # a and b are equivalent...
True
>>> a is b   # but not identical!
False

In this case, a and b are Python sets containing the same objects (the int literals 1, 2, and 3). When we test for equivalence with the == operator, this yields True鈥�a and b are equivalent. However, they are not identical. If we test for identity using the is operator, this yields False鈥�a and b are different objects, despite the fact that they contain the same elements. This is quite different from mathematical sets where, as we have seen, if A and B contain exactly the same elements then A and B are identical.

In a functional programming language like Haskell, we can compare two sets for equivalence in similar fashion.

ghci> import qualified Data.Set as Set
ghci> let set1 = Set.fromList [1, 2, 3]
ghci> let set2 = Set.fromList [1, 2, 3]
ghci> set1 == set2
True
ghci>

However, testing set identity is neither common nor straightforward in Haskell.

In some cases, the compiler may reuse equivalent objects鈥攅specially if they are immutable (as they would be in Haskell). Again, this is not always straightforward. The point is, that mathematical sets and 鈥渟ets鈥� in programming languages are rather different.

That said, let鈥檚 use Python to demonstrate certain set operations.

Creating a set

As we鈥檝e seen, we can create a set in Python using curly braces.

>>> a = {1, 2, 3}

We can confirm the type using Python鈥檚 built-in isinstance:

>>> isinstance(a, set)
True

Python sets cannot include duplicate values. If we try to assign a set literal containing repeated elements, Python ignores the repetitions.

>>> a = {1, 1, 2, 3}
>>> a
{1, 2, 3}

In Python, sets are mutable, so we can add (or remove) elements from a set. We add an element using the .add() method.

>>> a = {1, 2, 3}
>>> a.add(4)
>>> a
{1, 2, 3, 4}

To remove an element, we can use the .discard() method.

>>> a = {1, 2, 3, 4}
>>> a.discard(2)
>>> a
{1, 3, 4}

Notice that both of these methods modify the underlying set, and return None.

If you鈥檝e seen Python dictionaries before (which also use curly braces), you鈥檒l recognize that the set literal {1, 2, 3} is different from a dict literal, in that it has no key/value pairs (separated by :). However, the literal {} is treated as a dict in Python, and not an empty set. (Yes, unlike in mathematics, in Python we can have more than one empty set.) To create an empty set, we must use the set() constructor.

>>> i_am_empty = set()

Set comprehensions

Python supports comprehensions for lists, tuples, and sets. This is a convenient shorthand.

>>> a = {n for n in range(10)}
>>> a
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

In this form, a comprehension doesn鈥檛 do any more than the set() constructor would.

>>> a = set(range(10))
>>> a
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Both of these are equivalent to \{n \: | \: n \in \mathbb{N}\text{, and } 0 \leq n \leq 9\} (on the condition that 0 is included among the natural numbers). If we understand the universe to be \mathbb{N} then we can exclude that predicate from the set builder:

\{n \: | \: 0 \leq n \leq 9\}.

However, we can construct sets in a manner similar to set builder notation by applying a predicate in the comprehension.

>>> a = {n for n in range(10) if n % 2}  # odd integers between 0 and 10
>>> a
{1, 3, 5, 7, 9}

This is equivalent to \{n \: | \: n \equiv 1 \mod 2\text{, and } 0 < n < 10\}. We could also have a little fun using the definition of odd and do it this way:

>>> a = {2 * n + 1 for n in range(5)}  # same
>>> a
{1, 3, 5, 7, 9}

Infinite sets

In programming, we cannot construct infinite sets. This is a significant difference between the world of programming and the world of mathematics. We could try, I suppose, but eventually we鈥檇 run out of memory.

>>> s = set()
>>> i = 0
>>> while True:
...     s.add(i)
...     i += 1
...

If you try this, be prepared to wait a long time before getting an error message.

Finding the cardinality of a set

To find the cardinality of a set, we use the Python built-in len().

>>> a = {'c', 'a', 't'}
>>> len(a)
3
>>> len(set())
0

Testing for subset inclusion

Python allows us to test for subset inclusion using the .issubset() method, but it also overloads comparison operators < (strict subset) and <= (subset or equal). The overloaded operators are analogous to \subset and \subseteq.

>>> a = {'gardenia', 'petunia'}
>>> b = {'petunia', 'begonia', 'gardenia'}
>>> a.issubset(b)   # is `a` a subset of `b`?
True
>>> a <= b   # is `a` a subset of `b`?
True
>>> a < b    # is `a` a strict subset of `b`?
True
>>> b <= a   # is `b` a subset of `a`?
False

Membership

We can test for membership in a set with the Python鈥檚 in.

>>> s = {'gouda', 'cheddar', 'brie', 'romano'}
>>> 'cheddar' in s    # is 'cheddar' an element of s?
True
>>> 'wombat' in s     # is 'wombat' an element of s?
False

We can test for membership in the complement of a set with not in.

>>> s = {'gouda', 'cheddar', 'brie', 'romano'}
>>> 'cheddar' not in s    # is 'cheddar' in the complement of s?
False
>>> 'wombat' not in s     # is 'wombat' in the complement of s?
True

Set difference

Set difference in Python is straightforward, as the - operator is overloaded.

>>> a = {1, 2, 3, 4}
>>> b = {2, 4, 6, 8}
>>> a - b            # a \ b
{1, 3}

This is equivalent to A \setminus B. Note that just as subtraction of real numbers is not commutative, neither is set difference.

Set union and intersection

Python does not overload the + operator for set union. Instead, it provides the .union() method.

>>> a = {1, 2, 3, 4}
>>> b = {2, 4, 6, 8}
>>> a.union(b)
{1, 2, 3, 4, 6, 8}

This is equivalent to A \cup B.

Python does overload the | operator for set union, but tbh, that鈥檚 a little peculiar.

>>> a = {1, 2, 3, 4}
>>> b = {2, 4, 6, 8}
>>> a | b
{1, 2, 3, 4, 6, 8}

I don鈥檛 know about you, but I鈥檒l stick with .union().

For intersection, Python provides the .intersection() method.

>>> a = {1, 2, 3, 4}
>>> b = {2, 4, 6, 8}
>>> a.intersection(b)
{2, 4}

This is equivalent to A \cap B.

Both .union() and .intersection() can take multiple sets as arguments.

>>> a = {1, 2, 3, 4}
>>> b = {2, 4, 6, 8}
>>> c = {1, 3, 5, 7}
>>> a.union(b, c)
{1, 2, 3, 4, 5, 6, 7, 8}
>>> a.intersection(b, c)
set()

Here set() is the empty set, returned by the .intersection() method.

Notice that both methods return the result of the operation鈥攖hey do not modify the underlying set(s).

Symmetric difference

Recall that symmetric difference between A and B is the set of all elements that are in A or in B, but not in both. We can implement this in Python by taking the union and subtracting the intersection:

>>> a = {1, 2, 3, 4}
>>> b = {2, 4, 6, 8}
>>> a.union(b) - a.intersection(b)
{8, 1, 3, 6}

We can also implement this by taking the union of the differences:

>>> a = {1, 2, 3, 4}
>>> b = {2, 4, 6, 8}
>>> (a - b).union(b - a)
{8, 1, 3, 6}

Notice that (A \cup B) \setminus (A \cap B) and (A \setminus B) \cup (B \setminus A) (as above) are equivalent.

Summing (numeric) elements of a set

If a Python set consists only of numeric elements (int or float), we can sum the elements of a set with Python鈥檚 built-in sum().

>>> a = {1, 2, 3, 4}
>>> sum(a)
10

Exercises

Write Python set comprehensions for the following sets (with universe \mathbb{N})
1. \{n \: | \: n < 100 \}
2. \{n \: | \: n \equiv 1 \mod 2\text{, and } n < 100 \}
3. \{n \: | \: n \equiv 0 \mod 2\text{, and } n \equiv 0 \mod 3\text{, and } n < 100 \}
In Python, take the union \{1, 2, 3\} \cup \emptyset. What is the result?
In Python, take the intersection \{1, 2, 3\} \cap \emptyset. What is the result?
Write a Python function which takes two sets as arguments and returns their symmetric difference.
Test your function from #4 with the following inputs, A and B, and verify that it returns the correct values for A \oplus B.

A	B	A \oplus B
\{1, 2, 3\}	\{3, 4, 5\}	\{1, 2, 4, 5\}
\{1, 2, 3\}	\{4, 5, 6\}	\{1, 2, 3, 4, 5, 6\}
\{1, 2, 3\}	\{1, 2, 3\}	\emptyset
\{1, 2, 3\}	\emptyset	\{1, 2, 3\}

抖阴探探