
Why defining constants is important – a Python example

This post will walk through an example of why defining a known constant can save lots of computational time.

How to find the key with the maximum value in a Python dictionary

There are a few ways to get the key associated with the maximum value in a Python dictionary. Both of the ways we’ll show use a list comprehension.
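For reference, one way that does not use a list comprehension is the built-in max function with a key argument. The sketch below is only an illustration on a small made-up dictionary (the scores name and its values are assumptions, not part of this post’s example). Note that it returns a single key, so a list comprehension is still useful when several keys can tie for the max.

```python
# Hypothetical example data, not from the post.
scores = {"a": 3.5, "b": 9.1, "c": 9.1, "d": 0.2}

# max iterates over the keys and compares them by their associated values.
# On ties it returns the first maximal key it encounters.
best_key = max(scores, key=scores.get)
print(best_key)  # b

# A list comprehension recovers every key that shares the max value.
best_val = scores[best_key]
all_best = [key for key, val in scores.items() if val == best_val]
print(all_best)  # ['b', 'c']
```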

First, let’s set the scene by creating a dictionary with 100,000 key-value pairs. The keys are the integers from 0 to 99,999, and we’ll use the random module to assign each key a value drawn from the uniform distribution between 0 and 100,000.


import random
import time

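# Draw 100,000 values from the uniform distribution between 0 and 100,000
vals = [random.uniform(0, 100000) for _ in range(100000)]

# Map the integer keys 0 through 99,999 to those values
mapping = dict(zip(range(100000), vals))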


Now, mapping contains 100,000 key-value pairs.

Naive approach

The first approach we’ll try uses a list comprehension to loop over mapping.items(). The comprehension’s if clause checks whether each value is equal to the max of the values in the mapping dict. As a reminder, calling the items method on a dictionary returns a view of the key-value tuples that make up the dictionary (this view object has what’s known as the dict_items type).

Note – in this approach we’re calculating the max of mapping.values() in every iteration of the list comprehension. A single call to max doesn’t take much time, but each call scans all 100,000 values. Repeating that scan once for each of the 100,000 items adds up to roughly ten billion comparisons, and the cost keeps growing the larger the dictionary becomes.
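To make that repeated work visible, here is a minimal sketch that counts how often max gets called, using a much smaller dictionary so it finishes instantly. The names small, call_count, and counting_max are made up for this illustration and are not part of the original example.

```python
import random

# Small hypothetical dictionary (1,000 items instead of 100,000).
small = {i: random.uniform(0, 1000) for i in range(1000)}

call_count = {"max": 0}

def counting_max(values):
    """Call the built-in max and record that a call happened."""
    call_count["max"] += 1
    return max(values)

# Same shape as the naive comprehension: the if clause runs once per item,
# so the max is recomputed once per item.
naive_keys = [key for key, val in small.items()
              if val == counting_max(small.values())]

print(call_count["max"])  # 1000
```

Each of those 1,000 calls scans all 1,000 values. The full-size version of the same comprehension, with timing, looks like this: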

 
start = time.time()
max_key = [key for key, val in mapping.items()
           if val == max(mapping.values())]
end = time.time()

print(end - start)

When we ran this, the approach took over 169 seconds, which is…very slow. Let’s see the effect of defining a constant for max(mapping.values()).

Much more efficient approach

Now let’s do the same thing more efficiently. Since the max across the values of mapping doesn’t change while we loop, we only need to calculate it once and store that value in a variable.


start = time.time()
m = max(mapping.values()) # define constant for max across values
max_key = [key for key, val in mapping.items() if val == m]
end = time.time()

print(end - start)

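In this case, our task is done in a fraction of a second! This is much faster than our initial approach. As we increase the size of the dictionary, this time difference becomes more and more pronounced, because the work done by the naive version grows with the square of the dictionary size while the work done by this version grows linearly.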
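One way to see that growth for yourself is to time both versions at a few small sizes. The sketch below is an illustration, not a benchmark: the function names and the sizes are assumptions chosen so it runs in well under a second, and it uses time.perf_counter instead of time.time because it is better suited to measuring short intervals. Exact timings will vary by machine.

```python
import random
import time

def naive_max_keys(d):
    # Recomputes max(d.values()) for every item.
    return [key for key, val in d.items() if val == max(d.values())]

def hoisted_max_keys(d):
    # Computes the max once, then makes a single pass over the items.
    m = max(d.values())
    return [key for key, val in d.items() if val == m]

def time_call(func, d):
    """Return the elapsed seconds for one call of func(d)."""
    start = time.perf_counter()
    func(d)
    return time.perf_counter() - start

# Hypothetical sizes, kept small so the naive version stays quick.
for n in (500, 1000, 2000):
    d = {i: random.uniform(0, n) for i in range(n)}
    print(n, time_call(naive_max_keys, d), time_call(hoisted_max_keys, d))
```

If the reasoning above holds, doubling n should roughly quadruple the first timing and roughly double the second.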

That’s it for this post!

Andrew Treadway
