Go to: Na-Rae Han's home page  

Python 3 Notes


Splitting and Joining Strings

On this page: .split(), .join(), and list().

Splitting a Sentence into Words: .split()

Below, mary is a single string. Even though it is a sentence, the words are not represented as discrete units. For that, you need a different data type: a list of strings where each string corresponds to a word. .split() is the method to use:
 
>>> mary = 'Mary had a little lamb'
>>> mary.split() 
['Mary', 'had', 'a', 'little', 'lamb'] 
.split() splits mary on whitespace, and the returned result is a list of the words in mary. This list contains 5 items, as the len() function demonstrates. len() on mary, by contrast, returns the number of characters in the string (including the spaces).
 
>>> mwords = mary.split() 
>>> mwords
['Mary', 'had', 'a', 'little', 'lamb'] 
>>> len(mwords)                # number of items in mwords
5 
>>> len(mary)                  # number of characters
22 
Whitespace characters include space ' ', the newline character '\n', and tab '\t', among others. .split() separates on any combined sequence of those characters:
 
>>> chom = ' colorless     green \n\tideas\n'       # ' ', '\n', '\t' bunched up
>>> print(chom)
 colorless     green 
	ideas
 
>>> chom.split()
['colorless', 'green', 'ideas'] 
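Because .split() ignores leading, trailing, and repeated whitespace, it can serve as a rough word counter. The sketch below is only an illustration; count_words is a hypothetical helper name, not something defined elsewhere in these notes:

```python
def count_words(text):
    """Return the number of whitespace-separated tokens in text.

    count_words is a hypothetical helper written for this illustration.
    """
    return len(text.split())

chom = ' colorless     green \n\tideas\n'
print(count_words(chom))     # 3
print(count_words(''))       # 0: an empty string splits into an empty list
print(count_words('   '))    # 0: whitespace alone yields no words
```

Note that this counts tokens such as 'mother,' as one word, punctuation included.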

Splitting on a Specific Substring

By providing an optional argument, .split('x') can be used to split a string on a specific substring 'x'. Without an argument, .split() splits on runs of whitespace, as seen above.
 
>>> mary = 'Mary had a little lamb'
>>> mary.split('a')                 # splits on 'a'
['M', 'ry h', 'd ', ' little l', 'mb'] 
>>> hi = 'Hello mother,\nHello father.'
>>> print(hi)
Hello mother,
Hello father. 
>>> hi.split()                # no parameter given: splits on whitespace
['Hello', 'mother,', 'Hello', 'father.'] 
>>> hi.split('\n')                 # splits on '\n' only
['Hello mother,', 'Hello father.'] 
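One detail worth seeing in action: .split(' ') is not the same as .split(). With an explicit separator, every occurrence counts, so two spaces in a row produce an empty string in the result. The example below is a small sketch with made-up variable names; it also shows the optional second argument (maxsplit), which limits the number of splits:

```python
spaced = 'green  ideas'            # two spaces between the words

no_arg = spaced.split()            # a run of whitespace acts as one separator
one_space = spaced.split(' ')      # every single space is a separator

print(no_arg)       # ['green', 'ideas']
print(one_space)    # ['green', '', 'ideas']

mary = 'Mary had a little lamb'
first_two = mary.split(' ', 2)     # split at most 2 times, from the left
print(first_two)    # ['Mary', 'had', 'a little lamb']
```

If you only want the words, calling .split() with no argument is usually the safer choice.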

String into a List of Characters: list()

But what if you want to split a string into a list of characters? In Python, characters are simply strings of length 1. The list() function turns a string into a list of its individual characters, spaces included:
 
>>> list('hello world')
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'] 
More generally, list() is a built-in function that turns an iterable Python object into a list. When a string is given, what's returned is a list of the characters in it. When other iterable types are given, the specifics vary but the returned type is always a list. See this tutorial for details.
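To make "the specifics vary" concrete, here is a short sketch of list() applied to a few other built-in types. The variable names are chosen only for this illustration:

```python
from_range = list(range(3))            # numbers produced by range()
from_tuple = list(('a', 'b'))          # items of a tuple
from_dict = list({'x': 1, 'y': 2})     # a dictionary yields its keys

print(from_range)    # [0, 1, 2]
print(from_tuple)    # ['a', 'b']
print(from_dict)     # ['x', 'y']

# An object that cannot be looped over, such as an integer, is rejected.
try:
    list(5)
    int_rejected = False
except TypeError:
    int_rejected = True
print(int_rejected)  # True
```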

Joining a List of Strings: .join()

If you have a list of words, how do you put them back together into a single string? .join() is the method to use. Called on a "separator" string 'x', 'x'.join(y) joins the elements of the list y, all of which must be strings, with 'x' between them. Below, the words in mwords are joined back into the sentence string with a space in between:
 
>>> mwords
['Mary', 'had', 'a', 'little', 'lamb'] 
>>> ' '.join(mwords)
'Mary had a little lamb' 
Joining can be done on any separator string. Below, '--' and the tab character '\t' are used.
 
>>> '--'.join(mwords)
'Mary--had--a--little--lamb' 
>>> '\t'.join(mwords)
'Mary\thad\ta\tlittle\tlamb' 
>>> print('\t'.join(mwords))
Mary    had     a       little  lamb 
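.join() only accepts strings as elements. If the list holds numbers, each one has to be converted with str() first. The following is a minimal sketch using a made-up list called counts:

```python
counts = [3, 1, 4]                 # hypothetical list of integers

# Joining the integers directly raises a TypeError.
try:
    ','.join(counts)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)                 # True

# Convert each item to a string first, then join.
joined = ','.join(str(n) for n in counts)
print(joined)                      # 3,1,4
```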
The method can also be called on the empty string '' as the separator. The result is the elements of the list joined together with nothing in between. Below, a list of characters is put back together into the original string:
 
>>> hi = 'hello world'
>>> hichars = list(hi)
>>> hichars
['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'] 
>>> ''.join(hichars)
'hello world'
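Putting the two methods together: splitting on whitespace and re-joining with a single space is a common way to tidy up messy spacing. The sketch below reuses the mary and chom strings from earlier on this page; the other variable names are made up for the illustration:

```python
mary = 'Mary had a little lamb'
chom = ' colorless     green \n\tideas\n'

# Split and re-join: mary survives the round trip unchanged
# because its words were already separated by single spaces.
round_trip = ' '.join(mary.split())
print(round_trip == mary)      # True

# chom does not: the extra spaces, newlines, and tab collapse to single spaces.
normalized = ' '.join(chom.split())
print(normalized)              # colorless green ideas
```

The round trip is only exact when the original string uses single spaces between words and has no leading or trailing whitespace.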