- We are interested in the word 'ought'. How many text files in the Gutenberg corpus contain this word? How many lines in total?
- This time, we want to find how many occurrences of 'ought' are NOT immediately followed by 'to'. How would you achieve this? Is there a potential pitfall in your approach?
- How many times does the name 'Ishmael' occur in melville-moby_dick.txt? On which line numbers?
- You want to find out how many lines in the Gutenberg corpus contain both 'sense' and 'leave', and you formulate the following command:
grep -i sense *.txt | grep -i leave | wc -l
Now observe what is problematic with this approach. What optional switch should have been used in this case?
- You want to pull up the lines in which 'joy' and 'tear' occur together. This command achieves it (almost... we'll ignore -w for now):
grep -i joy *.txt | grep -i tear
And then you think: "This only pulls up cases where the two words are on the same line! Better expand the context..." and formulate this command to print out additional lines around the matches:
grep -i -C 2 joy *.txt | grep -i -C 2 tear
Does it give you the intended result? What is problematic with this approach?
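
Before running these on the full corpus, it can help to try a few grep switches on a tiny corpus where the right answer is easy to check by eye. The following is only a warm-up sketch, not a solution to any of the exercises above; the directory, the file names (a.txt, b.txt) and their contents are made up for illustration.

```shell
# Toy corpus in a temporary directory; file names and contents are made up.
corpus=$(mktemp -d)
printf 'You ought to go.\nI thought so.\nWe ought not stay.\n' > "$corpus/a.txt"
printf 'No match here.\nOught we?\n' > "$corpus/b.txt"

# -l prints only the names of files that contain a match.
# -w restricts matches to whole words, so "thought" is skipped.
files_with_match=$(cd "$corpus" && grep -l -i -w ought *.txt)

# -c prints a per-file count of matching lines (lines, not occurrences).
per_file_counts=$(cd "$corpus" && grep -c -i -w ought *.txt)

# -n prefixes each matching line with its line number.
numbered=$(grep -n -i -w ought "$corpus/a.txt")

printf '%s\n' "$files_with_match" "$per_file_counts" "$numbered"
```

On this toy corpus both files match, a.txt has two matching lines and b.txt has one, and the matches in a.txt are on lines 1 and 3. Note that -c counts lines rather than occurrences, which matters whenever a word can appear more than once on the same line.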