Go to: Na-Rae Han's home page  

Python 2.7 Tutorial

With Videos by mybringback.com

More Functions and List Comprehension

On this page: list comprehension [f(x) for x in li if ...], set(), sum(), min(), max(), all(), any(), re.search().

Supercharging List Comprehension with Additional Functions

List comprehension is already handy, but it becomes even more powerful when you combine it, with a bit of creativity, with Python's built-in functions. Below are some functions that are particularly useful with lists:
Function: what it does, followed by an example

sum(): returns the sum of the numbers in list
>>> sum([1,2,3,4])
10

set(): takes a list or string, returns a set object after removing duplicates
>>> set([3,4,2,5,2,3,4])
set([2,3,4,5])
>>> set('banana')
set(['a', 'b', 'n'])

min(): returns the smallest item in list
>>> min([9,4,15,2001,30])
4

max(): returns the largest item in list
>>> max([9,4,15,2001,30])
2001

any(): returns True only if at least one of the sequence items is True or "considered" to be true
>>> any([False, False, False, True, False])
True
>>> any([False, False, False, False])
False
>>> any([0, 0, 0, 3, 0, 0])
True
>>> any(['', '', 'a', 'ab', ''])
True

all(): returns True only if all of the sequence items are True or "considered" to be true
>>> all([True, True, True, True, True])
True
>>> all([True, True, False, False, True])
False
>>> all([4, 3, 6, 3, 12, 34])
True
>>> all(['abc', '', 'a', 'ab', 'ac'])
False
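
The functions above are most useful when their argument is itself built by a list comprehension. Here is a minimal sketch; the `words` list is made-up sample data, not something from this tutorial, and the code is written to run under both Python 2.7 and Python 3:

```python
# Hypothetical sample data, chosen only for illustration.
words = ['banana', 'kiwi', 'apple', 'fig', 'kiwi']

# Convert every word to its length, then summarize the resulting list.
lengths = [len(w) for w in words]
total_chars = sum(lengths)
shortest = min(lengths)
longest = max(lengths)

# set() drops the duplicate 'kiwi'; sorted() turns the set back into a list.
distinct_words = sorted(set(words))

# any()/all() consume a list of True/False values built by a comprehension.
has_long_word = any([len(w) > 5 for w in words])
all_lowercase = all([w.islower() for w in words])

print('%d %d %d' % (total_chars, shortest, longest))
print(distinct_words)
```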

Exploring State Names

Let's explore the list of United States' 50 state names. First, copy over the states list from the text samples page, towards the bottom. The first mystery to solve: how long is the longest one-word state name? Below, you convert state names without ' ' to their respective lengths and then use max() on the resulting list.
 
>>> max([len(x) for x in states if ' ' not in x])
13 
There you have your straight answer, but perhaps you are curious to see which names are the longest. Here the tuple-making syntax (a, b) comes in handy: each x is converted to (len(x), x), with the string length as the first item. Sorting compares tuples by their leftmost item first and falls back on the next item to break ties, so the result is ordered by length. Now you see 'Massachusetts' is the longest at 13 characters:
 
>>> sorted([(len(x), x) for x in states if ' ' not in x], reverse=True)[:5]
[(13, 'Massachusetts'), (12, 'Pennsylvania'), (11, 'Mississippi'), 
(11, 'Connecticut'), (10, 'Washington')] 
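
To see how tuple sorting behaves on ties, here is a small sketch on a hand-picked subset of state names (the `sample` list is an assumption made for brevity). It also shows `max()` with `key=len`, an alternative this tutorial does not use, for when only the single longest name is needed:

```python
# A hand-picked subset of state names, for illustration only.
sample = ['Ohio', 'Mississippi', 'Connecticut', 'Utah', 'Massachusetts']

# Tuples are compared item by item: length first, then the name itself.
pairs = sorted([(len(x), x) for x in sample], reverse=True)
print(pairs)

# Alternative: let max() compare the names by length directly.
longest = max(sample, key=len)
print(longest)
```

With reverse=True, names of equal length come out in reverse alphabetical order, which is why 'Mississippi' precedes 'Connecticut'.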
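How long are the one-word state names on average? To compute this, you need the sum of their lengths divided by the number of one-word names. sum() is the function to use, and float() keeps Python 2 from performing integer division:
 
>>> oneword = [x for x in states if ' ' not in x]
>>> sum([len(x) for x in oneword])/float(len(oneword))
7.6 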
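
The average can be checked end to end. In the sketch below, the `states` list is typed in directly so that the code is self-contained (the tutorial itself copies the list from the text samples page). Since only one-word names are summed, the total is divided by how many of them there are:

```python
# The 50 state names, typed in here as an assumption about the tutorial's list.
states = [
    'Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
    'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho',
    'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana',
    'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota',
    'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada',
    'New Hampshire', 'New Jersey', 'New Mexico', 'New York',
    'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon',
    'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota',
    'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington',
    'West Virginia', 'Wisconsin', 'Wyoming',
]

# Keep only the names without a space.
one_word = [x for x in states if ' ' not in x]

# Sum of lengths, divided by the number of names that were summed.
total = sum([len(x) for x in one_word])
average = total / float(len(one_word))

print(len(one_word))
print(total)
print(average)
```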
Next up, you wonder which state name uses fewest different alphabetic letters. set() comes in handy here. Below, you convert each state name into the set of lowercase characters in it, and then to its size. Then you use min() to find the smallest value:
 
>>> set('Aloha'.lower())
set(['a', 'h', 'l', 'o'])
>>> len(set('Aloha'.lower()))
4
>>> min([len(set(x.lower())) for x in states])
3 
So, there is a state name which only has 3 different letters! Which would that be? Again, you can try tupling and sorting:
 
>>> sorted([(len(set(x.lower())), x) for x in states])[:5]
[(3, 'Ohio'), (4, 'Alabama'), (4, 'Alaska'), (4, 'Hawaii'), (4, 'Indiana')] 
OK, so 'Ohio' has only 3 distinct letters, but 'Alabama' is much longer and still has only 4, because 'a' is repeated several times. Which has you thinking: which state names have no character repetition? The way to test this is to see if the length of the lower-cased name is the same as the length of the "set-fied" name. Hence:
 
>>> [x for x in states if len(x.lower()) == len(set(x.lower()))]
['Florida', 'Idaho', 'Iowa', 'Maine', 'New York', 'Texas', 'Utah', 'Vermont', 'Wyoming'] 
Now, the opposite question would be: which state name has the most repetition? Could it be 'Alabama'? What you want here is the longest state name with the fewest different letters. The avgrep() function below calculates this as the length of the string divided by the number of unique characters in it:
 
>>> def avgrep(x):
        return float(len(x))/len(set(x.lower()))

>>> avgrep('banana')
2.0
>>> sorted([(avgrep(x), x) for x in states], reverse=True)[:5]
[(2.75, 'Mississippi'), (2.25, 'Tennessee'), (1.75, 'Indiana'), (1.75, 'Alabama'), 
(1.625, 'Massachusetts')] 
The winner is 'Mississippi', where each distinct letter occurs 2.75 times on average.
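
Note that set(x.lower()) also counts the space in a two-word name as a character. A variant that counts letters only could look like the sketch below; `letter_avgrep()` is a hypothetical name, not part of this tutorial, and it assumes the name contains at least one letter:

```python
def letter_avgrep(name):
    """Average number of occurrences per distinct letter, ignoring non-letters."""
    # Keep alphabetic characters only, so spaces are not counted.
    letters = [c for c in name.lower() if c.isalpha()]
    return float(len(letters)) / len(set(letters))

print(letter_avgrep('Mississippi'))   # 11 letters, 4 distinct
print(letter_avgrep('New York'))      # 7 letters, 7 distinct
```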

Nested List Comprehension

For something different: list comprehension inside list comprehension. You wonder if there is any state name that has all of the five "vowel"s. First you define a list of vowels, and then you can test if a word has all of them through list comprehension and the all() function:
 
>>> vowels = list('aeiou')
>>> vowels
['a', 'e', 'i', 'o', 'u'] 
>>> [v in 'auntie' for v in vowels]
[True, True, True, False, True]
>>> all([v in 'auntie' for v in vowels])
False
>>> all([v in 'dialogue' for v in vowels])
True  
Now, you embed this list comprehension inside another list comprehension over the list of state names. Each name is lower-cased first so that an initial capital, such as the 'A' in 'Alabama', still counts as a vowel:
 
>>> [x for x in states if all([v in x.lower() for v in vowels])]
[]  
Unfortunately, it returns an empty list, meaning there is no state name with all 5 vowels.
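
The same test can also be phrased with sets instead of a nested list comprehension. This is an alternative technique, not the one used above: a word has all five vowels when the vowel set is a subset of its characters. The helper name `has_all_vowels()` and the sample words are assumptions made for illustration:

```python
vowels = set('aeiou')

def has_all_vowels(word):
    # True when every vowel appears somewhere in the word, ignoring case.
    return vowels.issubset(word.lower())

# Hypothetical sample words, not taken from the states list.
sample = ['auntie', 'dialogue', 'Education', 'Ohio']
with_all = [w for w in sample if has_all_vowels(w)]
print(with_all)
```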

Regular Expressions in List Comprehension

Regular expressions too should be part of your toolbox when working with list comprehension. Going back to state names, did you notice a lot of them end in 'i...a'? I did, and I implemented the following regular expression search to find the ones that do. The regular expression is r'i.?a$': an 'i', optionally followed by any one character, then an 'a' at the very end of the string. The re.search() function is used rather than re.match(), because the pattern has to be found at the end of the string and not at its beginning. Remember to import the re module first.
 
>>> import re
>>> [x for x in states if re.search(r'i.?a$', x)]
['California', 'Florida', 'Georgia', 'North Carolina', 'Pennsylvania', 
'South Carolina', 'Virginia', 'West Virginia']  
Also, an awful lot of them seem to have 'a' as the last vowel: 'Arizona', 'Texas', etc. To allow consonants, but not vowels, to follow 'a', I used the regular expression r'a[^aeiou]*$'. There are as many as 28 state names that match this pattern:
 
>>> [x for x in states if re.search(r'a[^aeiou]*$', x)]
['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Florida', 'Georgia', 
'Indiana', 'Iowa', 'Kansas', 'Louisiana', 'Maryland', 'Michigan', 'Minnesota', 
'Montana', 'Nebraska', 'Nevada', 'North Carolina', 'North Dakota', 'Oklahoma', 
'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Texas', 'Utah', 
'Virginia', 'West Virginia']
>>> len([x for x in states if re.search(r'a[^aeiou]*$', x)])
28  
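
To make the role of re.search() concrete, the sketch below runs the first pattern on a few hand-picked names (the `names` list is an assumption made for brevity). It precompiles the pattern with re.compile(), which is optional, and contrasts search() with match():

```python
import re

# Compile once, reuse for every name in the list.
pattern = re.compile(r'i.?a$')

# A hand-picked subset of state names, for illustration only.
names = ['Virginia', 'Florida', 'Indiana', 'Iowa']

# search() scans the whole string, so it can find the ending.
matches = [x for x in names if pattern.search(x)]
print(matches)

# match() only tries at the start of the string, so it finds nothing here.
from_start = [x for x in names if pattern.match(x)]
print(from_start)
```

'Indiana' is rejected because it ends in 'ana', and 'Iowa' because the pattern is case-sensitive and its only 'I' is a capital at the start.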

Practice

The state names are very list-comprehendable. I leave you with the following questions to try out:
  1. Which state names do not have 'a' at all?
  2. Which state names contain only one type of "vowel"? 'Alaska' is an example.
  3. Which state name has the largest number of different alphabetic characters?
  4. Which alphabetic letters are not present in any of the state names?