Python

Python collections tutorial

In this post, we’ll discuss the underrated Python collections package, which is part of the standard library. Collections allows you to utilize several data structures beyond base Python.

How to get a count of all the elements in a list

One very useful function in collections is the Counter method, which you can use to return a count of all the elements in a list.


nums = [3, 3, 4, 1, 10, 10, 10, 10, 5]
collections.Counter(nums)

The Counter object that gets returned is also modifiable. Let’s define a variable equal to the result above.


counts = collections.Counter(nums)

counts[20] += 1

Notice how we can add the number 20 to our Counter object without having to initialize it with a 0 value.

Counter can also be used if you have a list of tuples. This can be handy if you want to count pairs of numbers or other types of objects.


temp = [(2, 3), (2, 3), (4, 5), (2, 3), (1, 1), (4, 5)]

collections.Counter(temp)

How to create default values for Python dictionaries

In a previous post, we discussed Python dictionaries. defaultdict provides a modified dictionary structure that lets you assign a default value for any key. This means that if you try referencing a key that you haven’t defined, defaultdict will return some default value rather than a “KeyError”. The default value can also be a data structure. For example, suppose you want to create a dictionary where you expect each element to be a set. defaultdict can automatically create this for you.


sample_map = collections.defaultdict(set)

sample_map["a"].add(5)
sample_map["a"].add(10)
sample_map["b"].add(2)

sample_map

If you try to reference a non-existent key, Python will return an empty set since that’s our default value.


sample_map["x"] # returns empty set

Also, note how we don’t need to initialize sample_map[“a”] to an empty set since defaultdict takes care of it for us. This works for other data types as well. For example, the code below will create a default dictionary structure for lists.


list_container = collections.defaultdict(list)

list_container["x"].append(5)
list_container["x"].append(3)

list_container["y"].append(1)

Using defaultdict is one way of creating adjacency lists in Python to represent graphs. It can be useful in situations where you need to build out the adjacency list based off some prior logic (like running DFS, for example).

In addition to sets and lists, defaultdict also supports ints (integers) as a default value.


collections.defaultdict(int)


# create a dictionary where the default value is 5
def_five = collections.defaultdict(lambda: 5)

def_five["a"] = 10

# "b" is not a key, but running this line
# will return the default value of 5
def_five["b"]

deque

The deque data structure in collections is a double-ended queue. deque is pronounced like “deck”. A deque works similarly to a list, but is more optimized for appending or popping off elements from either the front of the queue or the end. For instance, if you have a very large list in Python and you want to append an element to the start of the list, this operation takes O(n) time since each element in the list needs to be shifted over by one. In other words, if your list has 100,000 elements and you append an element to the front of the list, this requires 100,000 shift operations to move each element over by one in the list. This could become computationally expensive, especially if you need to do this within a loop, for example.


# create a list with 100,000 elements
nums = list(range(100000))

# append -10 to the front of the list
# note that this takes O(n) time, or 100,000 operations since 
# each number if shifted over by one in the list
nums.insert(0, -10)

Doubled-ended queues, on the other hand, can append elements to the front or back in approximately O(1) time. Appending elements to the front of the queue can be done using the appendleft method.


queue = collections.deque()

queue.appendleft(5)
queue.appendleft(4)
queue.appendleft(3)

Similarly, to pop elements off from the left side, we can use the popleft method. Again, normally if you were to pop an element off from the left side of of a list, it would take O(n) operations, but with a doubled-ended queue, you can do this in O(1) time.


queue.popleft()

Double-ended queues also comes with operations to append multiple elements at once to either the front or back of the queue. For example, Python lists have an extend method that adds a list of other elements to the current list.


nums = [2, 3, 4]

nums.extend([5, 6, 7])

# nums = [2, 3, 4, 5, 6, 7]

Double-ended queues have an extendleft method and an extend method to be able to do this for the left and right sides of the list, respectively.


queue.extendleft([1, 2, 3, 4, 5])

Named tuples

Another data structure available is the named tuple. If you’ve used R, you may be familiar with the concept of named vectors where you can reference elements in a vector not only by index, but only by a name. Similarly, named tuples offered by collections let you reference tuple elements by name or index.


Info = collections.namedtuple("info", ["a", "b", "c"])
test = Info(a = 10, b = 20, c = 30)

# reference elements by index
test[0] + test[1]

# reference elements by name
test.a + test.b


Named tuples can also be useful when working with sqlite, as seen here.

Conclusion

That’s all for now! If you enjoyed this post, please share it with your friends. To learn more about collections, check out its documentation here.

Andrew Treadway

Recent Posts

Software Engineering for Data Scientists (New book!)

Very excited to announce the early-access preview (MEAP) of my upcoming book, Software Engineering for…

2 years ago

How to stop long-running code in Python

Ever had long-running code that you don't know when it's going to finish running? If…

3 years ago

Faster alternatives to pandas

Background If you've done any type of data analysis in Python, chances are you've probably…

3 years ago

Automated EDA with Python

In this post, we will investigate the pandas_profiling and sweetviz packages, which can be used…

4 years ago

How to plot XGBoost trees in R

In this post, we're going to cover how to plot XGBoost trees in R. XGBoost…

4 years ago

How to create PDF files with Python

In a previous article we talked about several ways to read PDF files with Python.…

4 years ago