In this project, you’ll write a program to calculate and display the number of times each word appears in Life, the Universe and Everything (or any other book). In the process, you’ll practice almost all the Python skills you have acquired over the course, including command-line arguments, file input-output, strings and string methods, functions, lists and dictionaries, custom sorting, loops and if-statements. Let’s get started! (Sidenote: Life, the Universe and Everything is the third book in the Hitchhiker’s Guide to the Galaxy science fiction series by Douglas Adams.)
Note: You need Python 3 installed on your computer to do this project.
Although we provide guidance and instructions, you’ll write all the code yourself. Your program reads text from a file and outputs the number of times each word appears in that file.
Here’s the sample expected interaction. The following command:
python3 wordcount.py hitch3.txt
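should produce two output files: one listing the words by count (most frequent first) and one listing them in alphabetical order. They begin as follows: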
the 3076
and 1599
of 1490
to 1371
a 1344
he 1248
it 1061
was 917
... more lines ...
robot 55
any 54
made 54
will 54
eyes 53
how 53
too 53
anything 51
galaxy 51
mind 51
round 51
got 50
nothing 50
rather 50
right 50
being 49
sky 49
... many more lines ...
' 11
'cos 2
'em 1
'strue 1
- 83
--indeed 1
1 1
10 1
108 4
11 1
... more lines ...
about 173
above 20
abrupt 1
abruptly 1
absence 2
absolute 5
absolutely 2
abstractedly 1
... many more lines ...
Keep reading for more detailed instructions. If you feel confident, try downloading the data and writing the program without reading the rest of the instructions (or consult them only as needed). Once you’re done coding, take the quiz!
- Step 1: Download the data
- You can download the data from the following URL: hitch3.txt. This file contains a plain-text version of Life, the Universe and Everything, the third book in the Hitchhiker’s Guide to the Galaxy science fiction series. (A sample is included below.)
- Tip: I suggest creating another file, say hitch3small.txt, containing only the first 50 lines or so. With a smaller input, it’s easier to print out what your code is doing and inspect the output.
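If you'd rather create the smaller file from Python than by hand, here is a minimal sketch. The helper name `make_sample` is hypothetical, and it assumes the book file is UTF-8 text (plain ASCII is a subset of UTF-8).

```python
from itertools import islice


def make_sample(src, dst, n_lines=50):
    """Copy the first n_lines lines of src into dst (hypothetical helper)."""
    # Assumption: both files are UTF-8 text.
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        # islice stops after n_lines lines, so the rest of the file is never read.
        fout.writelines(islice(fin, n_lines))
```

Calling `make_sample("hitch3.txt", "hitch3small.txt")` writes the first 50 lines. On macOS or Linux, `head -n 50 hitch3.txt > hitch3small.txt` does the same from the shell.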
- Step 2: Get filename from command-line and read the input
- Use the sys module to get the filename from the command line, and then read the file. Relevant tutorials:
- Python3 Modules and Command-line execution
- Python3 Sorting and File input-output
- Tip: After every step, keep printing out the values of your intermediate variables to check if everything is working as you expect it to.
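One possible shape for Step 2, offered as a sketch if you get stuck. `sys.argv` is a list whose first element is the script name, so the filename you typed is `sys.argv[1]`. The function names `get_filename`, `read_text` and `main` are hypothetical, and the UTF-8 encoding is an assumption about the input file.

```python
import sys


def get_filename(argv):
    """Return the input filename from a sys.argv-style list.

    argv[0] is the script name, so the first real argument is argv[1].
    """
    if len(argv) < 2:
        # SystemExit with a string prints the message and exits with status 1.
        raise SystemExit("usage: python3 wordcount.py <filename>")
    return argv[1]


def read_text(filename):
    """Read and return the whole file as a single string."""
    # Assumption: the input is UTF-8 text.
    with open(filename, encoding="utf-8") as f:
        return f.read()


def main(argv=None):
    """Read the file named on the command line and print a short preview."""
    if argv is None:
        argv = sys.argv
    text = read_text(get_filename(argv))
    print(len(text), "characters read")
    print(text[:200])  # peek at the start to check that the file loaded
```

To run it as a script, call `main()` at the bottom of wordcount.py inside an `if __name__ == "__main__":` guard, then try `python3 wordcount.py hitch3small.txt`.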
- Step 3: Split the text into a list of words
- For this exercise, we’ll define a word as any sequence of letters (a-z, A-Z), digits (0-9), apostrophes (') or hyphens (-). For example,
"You're a jerk, Dent," it said simply.
has the following words: You're, a, jerk, Dent, it, said and simply.
- You might find it helpful to define a helper function that checks whether a character is among the characters mentioned above.
- Relevant tutorial: Python3 Lists and Loops
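Here is a minimal sketch of the helper-function approach described above. The names `WORD_CHARS`, `is_word_char` and `split_words` are hypothetical. The loop collects characters into the current word and closes the word as soon as it meets any character outside the allowed set.

```python
import string

# The characters allowed inside a word: letters, digits, apostrophe, hyphen.
WORD_CHARS = set(string.ascii_letters + string.digits + "'-")


def is_word_char(c):
    """Return True if c may appear inside a word (hypothetical helper)."""
    return c in WORD_CHARS


def split_words(text):
    """Split text into maximal runs of word characters."""
    words = []
    current = []  # characters of the word being built
    for c in text:
        if is_word_char(c):
            current.append(c)
        elif current:
            # A non-word character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        # The text may end in the middle of a word.
        words.append("".join(current))
    return words


print(split_words('"You\'re a jerk, Dent," it said simply.'))
# ["You're", 'a', 'jerk', 'Dent', 'it', 'said', 'simply']
```

If you have met regular expressions, `re.findall(r"[A-Za-z0-9'-]+", text)` returns the same list in one line; the loop version only uses the list and loop skills from this course.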
- Step 4: Count the number of occurrences of each word.
- When counting, convert words to lowercase, so that Hello, HELLO and hello all count toward hello.
- Relevant tutorial: Python3 Dictionaries and Tuples
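A dictionary that maps each lowercased word to its count is enough for Step 4. This is a sketch with a hypothetical function name, `count_words`; it uses `dict.get` with a default of 0 so that a word seen for the first time starts at zero.

```python
def count_words(words):
    """Map each lowercased word to the number of times it occurs."""
    counts = {}
    for word in words:
        key = word.lower()  # "Hello", "HELLO" and "hello" share one entry
        counts[key] = counts.get(key, 0) + 1
    return counts


print(count_words(["Hello", "HELLO", "hello", "world"]))  # {'hello': 3, 'world': 1}
```

The standard library's `collections.Counter(w.lower() for w in words)` produces the same counts, but writing the loop yourself is good dictionary practice.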
- Step 5: Sort the items by word count and write them to the first output file
- Most popular first. Break ties by alphabetical order. In the example above, robot comes before any because the word count for robot is higher. But any comes before made because they have the same word count, and any is earlier in alphabetical order.
- Relevant tutorial: Python3 Sorting and File input-output
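Both ordering rules fit in a single sort key, as this sketch shows. The function names are hypothetical, the output filename is left as a parameter, and the one-"word count"-pair-per-line format is an assumption based on the sample output above.

```python
def sort_by_count(counts):
    """Return (word, count) pairs, highest count first, ties alphabetical."""
    # Negating the count sorts it in descending order while the word still
    # sorts in ascending order, so one key handles both rules.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def write_counts(items, filename):
    """Write one "word count" line per (word, count) pair (assumed format)."""
    with open(filename, "w", encoding="utf-8") as f:
        for word, count in items:
            f.write(f"{word} {count}\n")


sample = {"made": 54, "any": 54, "robot": 55, "will": 54}
print(sort_by_count(sample))
# [('robot', 55), ('any', 54), ('made', 54), ('will', 54)]
```

An equivalent approach relies on Python's sort being stable: sort alphabetically first, then sort again by count with `reverse=True`. Words with equal counts keep their alphabetical order.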
- Step 6: Sort the words in alphabetical order and write them to the second output file
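For Step 6, a plain `sorted()` over the dictionary items is enough, as in this sketch with the hypothetical name `sort_alphabetically`. Python compares strings by Unicode code point, so the apostrophe sorts before the hyphen, which sorts before the digits, which sort before lowercase letters. That matches the order in the second sample file above.

```python
def sort_alphabetically(counts):
    """Return (word, count) pairs ordered by word."""
    # Tuples compare by their first element first; dictionary keys are unique,
    # so the counts never need to break a tie.
    return sorted(counts.items())


sample = {"about": 173, "-": 83, "'cos": 2, "10": 1, "'": 11, "1": 1}
print(sort_alphabetically(sample))
# [("'", 11), ("'cos", 2), ('-', 83), ('1', 1), ('10', 1), ('about', 173)]
```

Write these pairs out in the same "word count" format as the first file. Because every word was lowercased in Step 4, mixed-case ordering never comes up.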
- Step 7: Double check everything works as expected and take the quiz!
The file begins as follows:
Douglas Adams Life, the Universe, and Everything
=================================================================
Douglas Adams The Hitch Hiker's Guide to the Galaxy
Douglas Adams The Restaurant at the End of the Universe
Douglas Adams Life, the Universe, and Everything
Douglas Adams So long, and thanks for all the fish
=================================================================
Life, the universe and everything for Sally
=================================================================
Chapter 1
The regular early morning yell of horror was the sound of Arthur Dent waking up and suddenly remembering where he was. It wasn't just that the cave was cold, it wasn't just that it was damp and smelly. It was the fact that the cave was in the middle of Islington and there wasn't a bus due for two million years. ...
The solution to this project is included at the end of the quiz.