On this page: .split(), .join(), and list().
Splitting a Sentence into Words: .split()
Below, mary is a single string. Even though it is a sentence, its words are not represented as discrete units. For that, you need a different data type: a list of strings in which each string corresponds to a word. .split() is the method to use:
|
>>> mary = 'Mary had a little lamb'
>>> mary.split()
['Mary', 'had', 'a', 'little', 'lamb']
| |
.split() splits mary on whitespace and returns a list of the words in mary. This list contains 5 items, as the len() function demonstrates. len() on mary, by contrast, returns the number of characters in the string (including the spaces).
|
>>> mwords = mary.split()
>>> mwords
['Mary', 'had', 'a', 'little', 'lamb']
>>> len(mwords)
5
>>> len(mary)
22
| |
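As a small sketch of the difference between the two lengths, the helper below counts words by splitting first. The name count_words is made up for this illustration and is not part of Python:

```python
def count_words(sentence):
    """Return the number of whitespace-separated words in sentence."""
    # count_words is a hypothetical helper, not a built-in.
    return len(sentence.split())

mary = 'Mary had a little lamb'
n_words = count_words(mary)   # number of items in the list from .split()
n_chars = len(mary)           # number of characters, spaces included
print(n_words, n_chars)
```

The same string gives two different counts because len() counts list items in one case and characters in the other.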
Whitespace characters include the space ' ', the newline character '\n', and the tab '\t', among others. .split() treats any run of these characters, in any combination, as a single separator, and ignores whitespace at the beginning and end of the string:
|
>>> chom = ' colorless green \n\tideas\n'
>>> print(chom)
 colorless green
        ideas
>>> chom.split()
['colorless', 'green', 'ideas']
| |
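A few edge cases may make the whitespace rule more concrete. The sample strings below are invented for illustration:

```python
# A run of mixed whitespace counts as one separator, and whitespace
# at the start or end of the string produces no empty items.
messy = '\t one \n\n two\tthree  '
tokens = messy.split()
print(tokens)

# An empty or whitespace-only string yields an empty list.
empty_tokens = ''.split()
blank_tokens = ' \n\t '.split()
print(empty_tokens, blank_tokens)
```

In other words, .split() with no argument never returns empty strings in its result.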
Splitting on a Specific Substring
Given an optional argument, .split('x') splits a string on the specific substring 'x'. Without an argument, .split() splits on whitespace, as seen above.
|
>>> mary = 'Mary had a little lamb'
>>> mary.split('a')
['M', 'ry h', 'd ', ' little l', 'mb']
>>> hi = 'Hello mother,\nHello father.'
>>> print(hi)
Hello mother,
Hello father.
>>> hi.split()
['Hello', 'mother,', 'Hello', 'father.']
>>> hi.split('\n')
['Hello mother,', 'Hello father.']
| |
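One difference worth keeping in mind: with an explicit separator, every occurrence counts, so adjacent separators produce empty strings in the result. A short sketch, using made-up sample strings:

```python
spaced = ' a  b '

# No argument: runs of whitespace collapse, edges are ignored.
by_whitespace = spaced.split()
print(by_whitespace)

# Explicit ' ': each single space is a separator, so empty strings appear.
by_space = spaced.split(' ')
print(by_space)

# The same holds for any separator, including multi-character ones.
fields = 'red,,blue'.split(',')
parts = 'a--b--c'.split('--')
print(fields, parts)
```

So .split() and .split(' ') are not interchangeable, even on text that contains only spaces.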
String into a List of Characters: list()
But what if you want to split a string into a list of characters? In Python, characters are simply strings of length 1. The list() function turns a string into a list of its individual characters, spaces included:
|
>>> list('hello world')
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
| |
More generally, list() is a built-in function that turns any iterable Python object into a list. Given a string, it returns a list of the string's characters. Given other data types, the specifics vary, but the returned type is always a list. See this tutorial for details.
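To illustrate how the specifics vary, here is a brief sketch of list() applied to a few other built-in types. The sample values are arbitrary, and the dictionary key order shown assumes Python 3.7 or later:

```python
from_range = list(range(3))           # the numbers the range produces
from_tuple = list(('a', 'b'))         # the tuple's items, in order
from_dict = list({'x': 1, 'y': 2})    # the dictionary's keys only
print(from_range, from_tuple, from_dict)
```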
Joining a List of Strings: .join()
If you have a list of words, how do you put them back together into a single string? .join() is the method to use. It is called on a "separator" string 'x': 'x'.join(y) joins the elements of the list y into one string, with 'x' placed between each pair of neighbors. Every element of y must itself be a string. Below, the words in mwords are joined back into the sentence string with a space in between:
|
>>> mwords
['Mary', 'had', 'a', 'little', 'lamb']
>>> ' '.join(mwords)
'Mary had a little lamb'
| |
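.join() only accepts strings as elements. The sketch below, with an arbitrary list of numbers, shows the TypeError raised otherwise and one common way around it, converting each item with str() first:

```python
numbers = [3, 1, 4]

# Joining non-string items raises TypeError.
try:
    ' '.join(numbers)
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)

# Converting each item to a string first makes the join work.
joined = ' '.join(str(n) for n in numbers)
print(joined)
```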
Joining can be done on any separator string. Below, '--' and the tab character '\t' are used.
|
>>> '--'.join(mwords)
'Mary--had--a--little--lamb'
>>> '\t'.join(mwords)
'Mary\thad\ta\tlittle\tlamb'
>>> print('\t'.join(mwords))
Mary    had     a       little  lamb
| |
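Because .split() accepts the same kind of separator string, a join can be undone by splitting on that separator. A minimal sketch, which assumes that none of the words contains the separator itself:

```python
mwords = ['Mary', 'had', 'a', 'little', 'lamb']

line = '\t'.join(mwords)        # one tab-separated string
restored = line.split('\t')     # back to the list of words
print(restored == mwords)
```

If an element did contain the separator, the split would return more items than the original list had.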
The method can also be called on the empty string '' as the separator. The result is the elements of the list joined together with nothing in between. Below, a list of characters is put back together into the original string:
|
>>> hi = 'hello world'
>>> hichars = list(hi)
>>> hichars
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']
>>> ''.join(hichars)
'hello world'
| |
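The two methods are often combined. As one possible sketch, splitting on whitespace and joining with a single space tidies up irregular spacing. The variable names here are chosen for illustration:

```python
chom = ' colorless green \n\tideas\n'

# Split into words, then rejoin with exactly one space between them.
normalized = ' '.join(chom.split())
print(normalized)

# Likewise, list() and ''.join() together can rearrange characters.
backwards = ''.join(reversed(list('hello')))
print(backwards)
```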