Python 3 Notes: More Functions and List Comprehension

Go to: Na-Rae Han's home page

Python 3 Notes
[ HOME | LING 1330/2330 ]


On this page: list comprehension [f(x) for x in li if ...], set(), sum(), min(), max(), all(), any(), re.search().
Supercharging List Comprehension with Additional Functions
List comprehension is handy enough on its own. Combine it with a few of Python's built-in functions and a bit of creativity, and it becomes even more powerful. Below is a list of built-in functions that are particularly useful in a list context:

sum(): returns the sum of the numbers in a list
>>> sum([1,2,3,4])
10

set(): takes a list or string and returns a set object (displayed with { }) after removing duplicates. A set has no fixed order, so its items may print in any order.
>>> set([3,4,2,5,2,3,4])
{2, 3, 4, 5}
>>> set('banana')
{'a', 'n', 'b'}

min(): returns the smallest item in a list
>>> min([9,4,15,2001,30])
4

max(): returns the largest item in a list
>>> max([9,4,15,2001,30])
2001

any(): returns True if at least one of the sequence items is True or "considered" to be true (for example, a nonzero number or a non-empty string), and False otherwise
>>> any([False, False, False, True, False])
True
>>> any([False, False, False, False])
False
>>> any([0, 0, 0, 3, 0, 0])
True
>>> any(['', '', 'a', 'ab', ''])
True

all(): returns True only if every one of the sequence items is True or "considered" to be true
>>> all([True, True, True, True, True])
True
>>> all([True, True, False, False, True])
False
>>> all([4, 3, 6, 3, 12, 34])
True
>>> all(['abc', '', 'a', 'ab', 'ac'])
False
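To see how "considered true" plays out, you can try any() and all() on a few hand-made lists. The sketch below uses illustrative variable names. It also covers the empty-list edge case: any() returns False there and all() returns True. Finally, it shows that these functions, like sum() and max(), accept a generator expression without the extra square brackets.

```python
# Values Python "considers" false: 0, '' (empty string), empty containers, None, False
falsy = [0, '', [], None, False]
truthy = [3, 'a', [0], True]   # [0] is a non-empty list, so it counts as true

print(any(falsy))    # False: no item is true
print(all(truthy))   # True: every item is true

# Edge case: empty lists
print(any([]))       # False: nothing in it is true
print(all([]))       # True: nothing in it is false

# The built-ins also take a generator expression, no list needed
nums = [9, 4, 15, 2001, 30]
print(sum(n for n in nums if n > 10))   # 2046, i.e. 15 + 2001 + 30
print(max(len(str(n)) for n in nums))   # 4, the digit count of 2001
```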

Exploring State Names
Let's explore the list of the 50 U.S. state names. First, copy over the states list from the text samples page (towards the bottom). The first mystery to solve: how long is the longest one-word state name? Below, you convert each state name that contains no space (' ') to its length, and then use max() on the resulting list.

>>> max([len(x) for x in states if ' ' not in x])
13
There you have your straight answer, but perhaps you are curious to see which names are the longest. Here the tuple-making syntax (a, b) comes in handy: each x is converted to (len(x), x), with the string length as the first item. Tuples are sorted by their leftmost item first, and the next item only breaks ties, so sorting these pairs ranks the names by length. Now you see 'Massachusetts' is the longest at 13 characters:

>>> sorted([(len(x), x) for x in states if ' ' not in x], reverse=True)[:5]
[(13, 'Massachusetts'), (12, 'Pennsylvania'), (11, 'Mississippi'), (11, 'Connecticut'), (10, 'Washington')]
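Here is why the (len(x), x) trick works: Python compares tuples item by item, so the length decides first and the name only matters on a tie. The sketch below uses a short hypothetical sample in place of the full states list. It also shows the key= parameter of sorted(), which produces a length-based ordering without building tuples.

```python
# Hypothetical sample; substitute the full states list from the text samples page
sample = ['Ohio', 'Connecticut', 'Mississippi', 'Massachusetts', 'Iowa']

# Tuples compare item by item: the length first, then the name for ties
print((11, 'Mississippi') > (11, 'Connecticut'))  # True: tie on 11, and 'M' > 'C'

pairs = sorted([(len(x), x) for x in sample], reverse=True)
print(pairs)
# [(13, 'Massachusetts'), (11, 'Mississippi'), (11, 'Connecticut'), (4, 'Ohio'), (4, 'Iowa')]

# key=len sorts the names themselves by length; equal-length names keep
# their original relative order because Python's sort is stable
print(sorted(sample, key=len, reverse=True))
# ['Massachusetts', 'Connecticut', 'Mississippi', 'Ohio', 'Iowa']
```

Note the difference on ties. The tuple version orders equal-length names reverse-alphabetically. The key= version leaves them in the order they had in the input list.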
How long are the state names on average? To compute this, first convert each state name into its length, then funnel the list into sum(), and finally divide the total by 50:

>>> [len(x) for x in states]
[7, 6, 7, 8, 10, 8, 11, 8, 7, 7, 6, 5, 8, 7, 4, 6, 8, 9, 5, 8, 13, 8, 9, 11, 8, 7, 8, 6, 13, 10, 10, 8, 14, 12, 4, 8, 6, 12, 12, 14, 12, 9, 5, 4, 7, 8, 10, 13, 9, 7]
>>> sum([len(x) for x in states])/50
8.44
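Hard-coding 50 works for this particular list. Dividing by the list's own length keeps the average correct for any list, though. Here is a minimal sketch on a hypothetical five-name sample. The standard library's statistics.mean() gives the same value.

```python
import statistics

# Hypothetical sample; with the full 50-name list the result would be 8.44
sample = ['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California']

lengths = [len(x) for x in sample]     # [7, 6, 7, 8, 10]
avg = sum(lengths) / len(lengths)      # divide by the list size, not a fixed number
print(avg)                             # 7.6
print(statistics.mean(lengths))        # 7.6, via the standard library
```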
Next up, you wonder which state name uses the fewest different letters. set() comes in handy here. Below, you convert each state name into the set of lowercase characters in it, and then into the size of that set. Then you use min() to find the smallest value:

>>> set('Aloha'.lower())
{'h', 'l', 'a', 'o'}
>>> len(set('Aloha'.lower()))
4
>>> min([len(set(x.lower())) for x in states])
3
So, there is a state name which only has 3 different letters! Which would that be? Again, you can try tupling and sorting:

>>> sorted([(len(set(x.lower())), x) for x in states])[:5]
[(3, 'Ohio'), (4, 'Alabama'), (4, 'Alaska'), (4, 'Hawaii'), (4, 'Indiana')]
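If you only want the single winner rather than a top-five list, min() and max() take the same key= parameter as sorted(). Below is a sketch with a hypothetical sample list and a made-up helper name, n_distinct. On a tie, min() returns the first qualifying item in list order.

```python
# Hypothetical sample; the full states list has the same winner
sample = ['Alabama', 'Hawaii', 'Ohio', 'Indiana', 'Texas']

def n_distinct(name):
    """Hypothetical helper: number of different characters in name, ignoring case."""
    return len(set(name.lower()))

print([n_distinct(x) for x in sample])   # [4, 4, 3, 4, 5]
print(min(sample, key=n_distinct))       # Ohio

# On a tie, min() keeps the first one it sees
print(min(['Hawaii', 'Alabama'], key=n_distinct))   # Hawaii
```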
OK, so 'Ohio' has only 3 different letters. 'Alabama' is much longer yet has only 4, because 'a' is repeated multiple times. That gets you thinking: which state names have no repeated characters? To test this, check whether the length of the lowercased name equals the length of its "set-ified" version:

>>> [x for x in states if len(x.lower()) == len(set(x.lower()))]
['Florida', 'Idaho', 'Iowa', 'Maine', 'New York', 'Texas', 'Utah', 'Vermont', 'Wyoming']
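Comparing a string's length with the size of its set is a general idiom for detecting duplicates. The sketch below wraps it in a function (no_repeats is a made-up name). It also shows why lowercasing first matters: to Python, 'A' and 'a' are different characters.

```python
def no_repeats(word):
    """Hypothetical helper: True if no character occurs twice, ignoring case."""
    w = word.lower()
    return len(w) == len(set(w))

print(no_repeats('Texas'))      # True
print(no_repeats('New York'))   # True: the space occurs only once
print(no_repeats('Alabama'))    # False: 'a' repeats
print(no_repeats('Arizona'))    # False: 'A' and 'a' collide after lowercasing

# Without lower(), 'A' and 'a' count as different characters
print(len('Arizona') == len(set('Arizona')))   # True
```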
Now, the opposite question: which state name has the most repetition? Could it be 'Alabama'? What you want here is a name that is long but has few different letters. The avgrep() function below measures this as the length of the string divided by the number of unique characters in it:

>>> def avgrep(x):
...     return len(x)/len(set(x.lower()))
...
>>> avgrep('banana')
2.0
>>> sorted([(avgrep(x), x) for x in states], reverse=True)[:5]
[(2.75, 'Mississippi'), (2.25, 'Tennessee'), (1.75, 'Indiana'), (1.75, 'Alabama'), (1.625, 'Massachusetts')]
The winner is 'Mississippi'. Its 11 characters come from just 4 different letters, so each letter appears 2.75 times on average.
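As with min(), you can pass key= to max() and pick the winner directly, without tupling and sorting. The sketch below repeats the avgrep() definition so it runs on its own, and applies it to a hypothetical sample list.

```python
def avgrep(x):
    # Length of the string divided by the number of distinct (lowercased) characters
    return len(x) / len(set(x.lower()))

# Hypothetical sample; the full states list has the same winner
sample = ['Alabama', 'Tennessee', 'Mississippi', 'Indiana', 'Ohio']

print(max(sample, key=avgrep))   # Mississippi
print(avgrep('Mississippi'))     # 2.75: 11 characters, 4 distinct (m, i, s, p)
```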
Nested List Comprehension
For something different: list comprehension inside list comprehension. Is there any state name that has all five "vowels"? First define a list of vowels. Then you can test whether a word contains all of them with a list comprehension and the all() function:

>>> vowels = list('aeiou')
>>> vowels
['a', 'e', 'i', 'o', 'u']
>>> [v in 'auntie' for v in vowels]
[True, True, True, False, True]
>>> all([v in 'auntie' for v in vowels])
False
>>> all([v in 'dialogue' for v in vowels])
True
Now, you embed this list comprehension inside another list comprehension over the list of state names:

>>> [x for x in states if all([v in x.lower() for v in vowels])]
[]
Unfortunately, it returns an empty list, meaning no state name contains all five "vowels". But could there be one with four of them, everything but 'u'? Yes, there is more than one:

>>> [x for x in states if all([v in x.lower() for v in 'aeio'])]
['Georgia', 'Minnesota', 'Rhode Island']
As it turns out, we didn't have to turn 'aeiou' into a list at all. A string is a sequence, so inside a list comprehension it works just like a list of characters.
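You can also phrase the same test with sets. A name contains all the given vowels exactly when the set of those vowels is a subset of the name's set of letters. The sketch below compares the all() version with the set version on a hypothetical sample. Both lowercase the name so that a capitalized vowel (like the 'I' in 'Island') is counted.

```python
# Hypothetical sample; substitute the full states list
sample = ['Georgia', 'Minnesota', 'Rhode Island', 'Ohio', 'Texas']
vowels4 = 'aeio'

# all() version: the inner comprehension checks each vowel in turn
with_all = [x for x in sample if all([v in x.lower() for v in vowels4])]

# Set version: set(a) <= set(b) means "a is a subset of b"
with_sets = [x for x in sample if set(vowels4) <= set(x.lower())]

print(with_all)    # ['Georgia', 'Minnesota', 'Rhode Island']
print(with_sets)   # same list
```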
Regular Expressions in List Comprehension
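Regular expressions, too, should be part of your toolbox when working with list comprehension. Going back to state names, did you notice that a lot of them end in 'i' followed by 'a', with at most one letter in between? I did, and I wrote the following regular expression search to find the ones that do. The regular expression is r'i.?a$': an 'i', then at most one character, then 'a' at the end of the string. I used re.search() because it scans each string for the pattern and returns either a match object (which counts as true) or None. That makes it a handy yes/no test inside the if clause.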

>>> import re
>>> [x for x in states if re.search(r'i.?a$', x)]
['California', 'Florida', 'Georgia', 'North Carolina', 'Pennsylvania', 'South Carolina', 'Virginia', 'West Virginia']
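The sketch below shows what re.search() actually returns: a match object when the pattern is found, None when it is not. It also precompiles the pattern with re.compile(), an optional step that saves repeating the pattern string. The sample list is hypothetical.

```python
import re

# 'i', then at most one character, then 'a' at the very end
pattern = re.compile(r'i.?a$')

m = pattern.search('Virginia')
print(m.group())                   # ia: the part of the string that matched
print(pattern.search('Alabama'))   # None: no 'i' near the end

# Hypothetical sample; substitute the full states list
sample = ['Florida', 'Alabama', 'Virginia', 'Iowa', 'California']
print([x for x in sample if pattern.search(x)])   # ['Florida', 'Virginia', 'California']
```

'Iowa' is not matched because its only 'i' is the capital 'I', and the pattern looks for a lowercase 'i'.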
Also, an awful lot of them seem to have 'a' as the last vowel: 'Arizona', 'Texas', etc. To allow consonants, but not vowels, to follow 'a', I used the regular expression r'a[^aeiou]*$'. There are as many as 28 state names that match this pattern:

>>> [x for x in states if re.search(r'a[^aeiou]*$', x)]
['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Florida', 'Georgia', 'Indiana', 'Iowa', 'Kansas', 'Louisiana', 'Maryland', 'Michigan', 'Minnesota', 'Montana', 'Nebraska', 'Nevada', 'North Carolina', 'North Dakota', 'Oklahoma', 'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota', 'Texas', 'Utah', 'Virginia', 'West Virginia']
>>> len([x for x in states if re.search(r'a[^aeiou]*$', x)])
28
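To see exactly what r'a[^aeiou]*$' accepts, try it on a few names and print the matched part. [^aeiou] matches any single character that is not a lowercase vowel, which includes spaces and capital letters as well as consonants. The * allows zero or more such characters before the end of the string. The helper name matched_part is made up for this sketch.

```python
import re

pattern = r'a[^aeiou]*$'   # an 'a' followed only by non-vowels up to the end

def matched_part(s):
    """Hypothetical helper: the substring matched by pattern, or None."""
    m = re.search(pattern, s)
    return m.group() if m else None

for name in ['Texas', 'Utah', 'Iowa', 'Ohio', 'Maine', 'Rhode Island']:
    print(name, '->', matched_part(name))
# Texas -> as, Utah -> ah, Iowa -> a, Ohio -> None, Maine -> None, Rhode Island -> and
```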

Practice
The state names are very list-comprehendable. (list-comprehensible?) I leave you with the following questions to try out:

Which state names do not have 'a' at all?

[x for x in states if 'a' not in x.lower()]

Which state names start and end with the same letter?

[x for x in states if x[0].lower() == x[-1]]

Which state names start and end with a "vowel" character?

[s for s in states if s[0].lower() in 'aeiou' and s[-1] in 'aeiou']

Which state names contain only one type of "vowel"? 'Alaska' is an example.

[x for x in states if len([v for v in 'aeiou' if v in x.lower()]) == 1]

Which state name has the largest # of different alphabetic characters?

sorted([(len(set(x.lower())), x) for x in states], reverse=True)[:5]
This one counts the space ' ' as a letter, so technically you would want to exclude it. Doing so doesn't change the top answer, however.

Which alphabetic letter is not present in any of the state names? Yep, there's only one!

[x for x in 'abcdefghijklmnopqrstuvwxyz' if not any([x in s.lower() for s in states])]
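For the question about the largest number of different alphabetic characters, you can filter out the space with str.isalpha(), which is True only for letters. Below is a sketch with a made-up helper name, n_letters, and a hypothetical sample list.

```python
# Hypothetical sample; substitute the full states list
sample = ['South Carolina', 'New Hampshire', 'Washington', 'Ohio']

def n_letters(name):
    """Hypothetical helper: distinct alphabetic characters in name, ignoring case."""
    return len(set(c for c in name.lower() if c.isalpha()))

print(sorted([(n_letters(x), x) for x in sample], reverse=True))
# [(11, 'South Carolina'), (10, 'New Hampshire'), (9, 'Washington'), (3, 'Ohio')]
```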