On this page: list comprehension [f(x) for x in li if ...].
Filtering Items in a List
Suppose we have a list. Often, we want to gather only the items that meet certain criteria. Below, we have a list of words, and we want to extract only the ones that contain 'wo'. To do this, we first make a new empty list, and then iterate through the original list, appending each item that qualifies:
|
>>> wood = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'.split()
>>> wood
['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if', 'a',
'woodchuck', 'could', 'chuck', 'wood?']
>>> wolist = []
>>> for x in wood:
...     if 'wo' in x:
...         wolist.append(x)
...
>>> wolist
['wood', 'would', 'woodchuck', 'woodchuck', 'wood?']
>>>
| |
OK, that works, but that's a lot of lines of code. What if I told you that you can accomplish it all with one line of Python code? Well, you can! Behold the superpower of list comprehension:
|
>>> [x for x in wood if 'wo' in x]
['wood', 'would', 'woodchuck', 'woodchuck', 'wood?']
>>>
| |
You want a list of words that are 5+ characters? That too can be done with list comprehension:
|
>>> [x for x in wood if len(x) >= 5]
['would', 'woodchuck', 'chuck', 'woodchuck', 'could', 'chuck', 'wood?']
>>>
| |
Words that are 5+ characters AND end with 'ck':
|
>>> [x for x in wood if len(x) >= 5 and x.endswith('ck')]
['woodchuck', 'chuck', 'woodchuck', 'chuck']
>>>
| |
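The if clause takes any Boolean expression, so or and not work the same way as and. Here is a small sketch of both; the names ck_or_wo and no_wo are made up for this example and are not used elsewhere on this page:

```python
wood = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'.split()

# Keep words that end with 'ck' OR start with 'wo'.
ck_or_wo = [x for x in wood if x.endswith('ck') or x.startswith('wo')]
print(ck_or_wo)
# ['wood', 'would', 'woodchuck', 'chuck', 'woodchuck', 'chuck', 'wood?']

# Keep words that do NOT contain 'wo'.
no_wo = [x for x in wood if 'wo' not in x]
print(no_wo)
# ['How', 'much', 'a', 'chuck', 'if', 'a', 'could', 'chuck']
```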
You get the idea. Basically, a list comprehension for filtering starts with [x for x in li], which by itself creates a new list identical to li, and then tacks on an if ... clause at the end, which acts as the filtering criterion. For example, words that are 4 characters or shorter:
|
>>> [x for x in wood if len(x) <= 4]
['How', 'much', 'wood', 'a', 'if', 'a']
>>>
| |
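To make the filtering pattern concrete, here is a minimal sketch that writes the same filter both ways and checks that the results agree. It also shows that the bare [x for x in li] form builds a separate copy of the list. The names short_loop, short_comp and same are made up for this illustration:

```python
wood = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'.split()

# For-loop version: start with an empty list, append every item that passes.
short_loop = []
for x in wood:
    if len(x) <= 4:
        short_loop.append(x)

# List comprehension version: [x for x in li if <condition>]
short_comp = [x for x in wood if len(x) <= 4]

print(short_comp)                # ['How', 'much', 'wood', 'a', 'if', 'a']
print(short_loop == short_comp)  # True

# Without the if clause, every item is kept: an equal but separate list.
same = [x for x in wood]
print(same == wood)              # True  (same contents)
print(same is wood)              # False (a new list object)
```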
Transforming Items in a List
Another popular type of task with a list is to transform each item. For example, suppose I want to create a new list where each 'o' is replaced by 'oo' in every word. As before, the usual for-loop process gets the job done but is tedious:
|
>>> wood
['How', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if', 'a',
'woodchuck', 'could', 'chuck', 'wood?']
>>> doubleo = []
>>> for x in wood:
...     doubleo.append(x.replace('o', 'oo'))
...
>>> doubleo
['Hoow', 'much', 'wooood', 'woould', 'a', 'woooodchuck', 'chuck', 'if',
'a', 'woooodchuck', 'coould', 'chuck', 'wooood?']
>>>
| |
Again, with list comprehension, all you need is one line of code:
|
>>> [x.replace('o', 'oo') for x in wood]
['Hoow', 'much', 'wooood', 'woould', 'a', 'woooodchuck', 'chuck', 'if',
'a', 'woooodchuck', 'coould', 'chuck', 'wooood?']
>>>
| |
Another example -- capitalizing every word:
|
>>> [x.capitalize() for x in wood]
['How', 'Much', 'Wood', 'Would', 'A', 'Woodchuck', 'Chuck', 'If', 'A',
'Woodchuck', 'Could', 'Chuck', 'Wood?']
>>>
| |
A list of word lengths, one for every word in wood:
|
>>> [len(x) for x in wood]
[3, 4, 4, 5, 1, 9, 5, 2, 1, 9, 5, 5, 5]
>>>
| |
So you can see how handy this is. The syntax works like this: start with [x for x in li], which creates a new list identical to li, then replace the initial x with f(x), some function or expression that takes x as its input. The result is a new list where each x is transformed to f(x).
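The f(x) part does not have to be a built-in method; a function you define yourself works too. Below is a small sketch along those lines. The helper strip_punct is hypothetical, written only for this example, and it handles just a few trailing punctuation marks:

```python
wood = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'.split()

def strip_punct(word):
    """Hypothetical helper: remove trailing '?', '!', '.' or ',' from a word."""
    return word.rstrip('?!.,')

# [f(x) for x in li] with f = strip_punct
cleaned = [strip_punct(x) for x in wood]

print(cleaned[-1])                # wood
print(len(cleaned) == len(wood))  # True: transforming never changes the length
```

Because nothing is filtered out here, the new list always has exactly as many items as the original.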
Filtering and Transformation, Applied Together
You might ask: can we filter AND transform at the same time? Sure we can. Below, we keep only those words that contain 'wo' and uppercase them:
|
>>> [x.upper() for x in wood if 'wo' in x]
['WOOD', 'WOULD', 'WOODCHUCK', 'WOODCHUCK', 'WOOD?']
>>>
| |
What we have here is this syntax: [f(x) for x in li if ...]. Here's another example:
|
>>> [x+'-away' for x in wood if len(x) <= 4]
['How-away', 'much-away', 'wood-away', 'a-away', 'if-away', 'a-away']
>>>
| |
The transformation operation f(x) can be more complex. Below, we keep the words that are 5+ characters long and output each word together with its length as a tuple:
|
>>> [(x, len(x)) for x in wood if len(x) >= 5]
[('would', 5), ('woodchuck', 9), ('chuck', 5), ('woodchuck', 9),
('could', 5), ('chuck', 5), ('wood?', 5)]
>>>
| |
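It can help to see the combined form [f(x) for x in li if ...] unrolled into the equivalent for loop. In the sketch below, note that the if test runs first, on the original x, and only the items that pass are transformed. The names pairs_loop and pairs_comp are made up for this illustration:

```python
wood = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?'.split()

# For-loop version: test the original item, then append the transformed one.
pairs_loop = []
for x in wood:
    if len(x) >= 5:                      # the filter: the "if ..." part
        pairs_loop.append((x, len(x)))   # the transformation: the f(x) part

# The same thing as a single list comprehension.
pairs_comp = [(x, len(x)) for x in wood if len(x) >= 5]

print(pairs_loop == pairs_comp)  # True
print(pairs_comp[:2])            # [('would', 5), ('woodchuck', 9)]
```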
In the NLTK book, you will see a lot of examples of list comprehension in action, performing exciting operations on gigantic lists of words and other linguistic data. You should get comfortable with list comprehension: it will super-charge your text processing.