In the workspace is a file called "text.txt" which contains a piece of text (it's the opening passage of Charles Darwin's On the Origin of Species).
Write a program that reads the text in the file and lists each unique word, along with its frequency (i.e., how many times it occurs).
Example (these numbers might not be correct; they're just a guide to the sort of output you should generate):
when: 3
we: 3
look: 1
to: 9
the: 11
… etc.
Your program should work not only for this file but also for other texts like it. You may assume the following about the text in the file:
First. Words are separated by whitespace. In the sample text given, all of the whitespace consists of single spaces. But sometimes it might be multiple spaces, tabs, line breaks, etc. Make sure your program can handle all of these different kinds of whitespace. You will find the string method split very helpful.
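To see why split is so helpful here, compare calling it with no argument against calling it with a single space. The string below is made up purely for the demonstration.

```python
# A made-up string mixing single spaces, double spaces, a tab,
# blank lines and trailing whitespace.
text = "When  we\tlook\n\nto the   individuals "

# With no argument, split() treats any run of whitespace as one separator
# and ignores whitespace at the start and end of the string.
words = text.split()
print(words)  # ['When', 'we', 'look', 'to', 'the', 'individuals']

# split(" ") splits on single spaces only: it produces empty strings for
# repeated spaces and leaves tabs and newlines inside the "words".
print(text.split(" "))
```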
Second. The text might contain punctuation marks (as the sample text does). Make sure that you don't include punctuation marks in words. You could remove them from the text altogether, using the programming task from Week 1 as a guide. It will be good enough if your program can handle the following punctuation marks:
. ? ! , ; : ( ) [ ] { } "
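One possible way to remove exactly these marks is sketched below. It is not necessarily the approach from the Week 1 task (which isn't reproduced here); it uses str.maketrans with a third argument listing characters to delete, together with str.translate. The names PUNCTUATION and strip_punctuation are just illustrative.

```python
# The punctuation marks listed above (dashes and single quotes deliberately excluded).
PUNCTUATION = '.?!,;:()[]{}"'

# maketrans("", "", chars) builds a table mapping each char in chars to None,
# so translate() deletes those characters.
REMOVE_PUNCTUATION = str.maketrans("", "", PUNCTUATION)


def strip_punctuation(text):
    """Return text with the listed punctuation marks removed."""
    return text.translate(REMOVE_PUNCTUATION)


print(strip_punctuation('Hello, world, hello! World: hello? GOODBYE.'))
# Hello world hello World hello GOODBYE
```

Because hyphens and single quotes are not in the table, words such as "sub-variety" and "aren't" pass through unchanged.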
Don't worry about dashes and single quote marks:
- '
These are tricky, because sometimes they are used as parts of words (e.g., hyphenated words such as "sub-variety", or contractions such as "aren't") and sometimes they are not (e.g., when used as dashes or quote marks). If you feel like a challenge, you could get your program to deal with them correctly, but you're not expected to, and you won't lose marks if you don't.
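If you do take up the challenge, one simple heuristic (and only a heuristic, not a full solution) is to strip dashes and single quotes from the start and end of each word, which keeps internal ones as in "sub-variety" and "aren't". It still gets some cases wrong: a plural possessive such as "species'" would lose its apostrophe. The function names are hypothetical.

```python
def clean_word(word):
    """Strip dashes and single quotes from the ends of a word only.

    Heuristic: internal hyphens/apostrophes (sub-variety, aren't) are kept,
    while stray ones (a standalone '--' dash, 'quoted' words) are removed.
    """
    return word.strip("-'")


def clean_words(words):
    # Drop anything that becomes empty, e.g. a standalone "--" dash.
    return [w for w in (clean_word(w) for w in words) if w]


print(clean_words(["sub-variety", "aren't", "--", "'quoted'", "well-"]))
# ['sub-variety', "aren't", 'quoted', 'well']
```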
Third. Words might occur both with a capital first letter and without one (e.g., the sample text contains both "When" and "when"). You should consider these to be the same word. You could make all words lower case, turning "When" into "when". Or you could make them all upper case, turning both "When" and "when" into "WHEN". It's up to you.
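Both options come down to a single string method applied to each word, as in this small sketch (the word list is made up):

```python
words = ["When", "we", "look", "when", "WHEN"]

# lower() returns a new, lower-cased copy; the original strings are unchanged.
lowered = [word.lower() for word in words]
print(lowered)  # ['when', 'we', 'look', 'when', 'when']

# The upper-case alternative works the same way.
uppered = [word.upper() for word in words]
print(uppered)  # ['WHEN', 'WE', 'LOOK', 'WHEN', 'WHEN']
```

Whichever you choose, apply it before counting, so that "When" and "when" land on the same key.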
Note that you can add your own files to the workspace. So you could create your own text file, called, for example, "my_text.txt", put whatever text you like in that file, and then check your program by getting it to read that file instead of the sample file. You could add something like the following text, which contains things your program should be able to handle:
Hello, world, hello!
World: hello?
GOODBYE.
A nice thing about this is that you know what answers you should get: "hello" occurs three times, "world" occurs twice, and "goodbye" occurs once. So, if you're converting words to lower case, your output should be something like this:
hello: 3
world: 2
goodbye: 1
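Putting the three assumptions together, a minimal sketch of the whole program might look like the following. It is one possible solution, not the only one: it uses collections.Counter from the standard library for the counting (a plain dict would work too), and the names count_words and main are illustrative. Because Counter remembers the order in which words are first seen, the test text above produces exactly the expected output.

```python
from collections import Counter

# The punctuation marks the assignment asks you to handle.
PUNCTUATION = '.?!,;:()[]{}"'
REMOVE_PUNCTUATION = str.maketrans("", "", PUNCTUATION)


def count_words(text):
    """Return a Counter mapping each lower-case word in text to its frequency."""
    cleaned = text.translate(REMOVE_PUNCTUATION).lower()
    # split() with no argument handles spaces, tabs and line breaks alike.
    return Counter(cleaned.split())


def main(filename):
    """Read filename and print one 'word: count' line per unique word."""
    with open(filename, "r") as file:
        counts = count_words(file.read())
    for word, count in counts.items():
        print(f"{word}: {count}")


# Demonstration on the test text from above, without needing a file.
sample = "Hello, world, hello!\nWorld: hello?\nGOODBYE.\n"
for word, count in count_words(sample).items():
    print(f"{word}: {count}")
# hello: 3
# world: 2
# goodbye: 1
```

To run it on the workspace file, call main("text.txt"), or main("my_text.txt") for your own test file.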
If you're feeling up to it, get the words to appear in alphabetical order. Even better, get them to appear in order of frequency, from the most frequent down to the least frequent. Again, you're not expected to, and you won't lose marks if you don't do either of these things.
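Both orderings can be produced with the built-in sorted function. The sketch below uses a small, made-up dictionary of counts; a Counter behaves the same way. Counter also has a most_common method that lists words from most to least frequent, with ties kept in the order they were first seen.

```python
# Hypothetical counts, standing in for the result of counting a file.
counts = {"hello": 3, "world": 2, "goodbye": 1, "again": 2}

# Alphabetical order: sorting (word, count) pairs sorts by the word first.
for word, count in sorted(counts.items()):
    print(f"{word}: {count}")

# Most frequent first; negating the count sorts it in descending order,
# and equal counts fall back to alphabetical order.
by_frequency = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
for word, count in by_frequency:
    print(f"{word}: {count}")
```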
1. What is the difference between a list and a tuple?
Choices:
2. An ordinary playing card has two attributes: a rank ('A', 2, 3, 4, 5, 6, 7, 8, 9, 10, 'J', 'Q', 'K') and a suit (Hearts, Spades, Diamonds, and Clubs). When considering all 52 playing cards, which of the following collections would be the most appropriate for representing a single playing card?
Choices:
3. Suppose you're writing a program to work with the grades of students in a class. Each student has a unique id, but different students might get the same grade. Which one of the following collections would best allow you to store and modify this data?
Choices:
4. Which one of the following expressions is a list literal?
Choices:
5. The following code was written to show the intersection of the empty set with another set, but it generates an error. Why?
s = {'a', 'b', 'c'}
print({}.intersection(s))
Choices:
6. Which one of the following statements is true?
Choices:
7. Which one of the following statements could you use to extend a list a by appending the elements of a list b?
Choices:
8. Which one of the following statements could you use to add the letter 'a' as an item to the end of a tuple t?
Choices:
9. Suppose that letters is a list of ten letters. Which one of the following pieces of code could you use to change the first two elements to 'a' and 'b'?
Choices:
10. Why does the following piece of code print b?
x = ['b', 'b']
print(x.pop())
Choices:
11. Which one of the following is true?
Choices:
12. Suppose you have a list called "names" which contains a number of names. Suppose you want to join them into a string, separated by commas. Which one of the following expressions could you use?
Choices:
13. Suppose you want to loop through a dictionary d and print each of its values. Which one of the following pieces of code could you use?
Choices:
for x in d:
    print(d[x])
for x, y in d.items():
    print(y)
for x in d.values():
    print(x)
You could use any of the above.
14. Which one of the following is true?
Choices:
15. In the code below, line 2 will produce an error but line 3 will not. Why?
x = ([1], [2], [3])
x[0] = 4 # Error
x[0][0] = 4  # No error
Choices:
16. Suppose that word is a variable that refers to a word. Suppose you want an expression that returns the set of consonants in word. Which one of the following expressions could you use?
Choices:
17. Consider the following code template:
for i in < expression >:
    < statement >
Suppose you want i to loop through the numbers 0, 2, 4, 6, 8 and 10. Which one of the following could you use to replace < expression >?
Choices:
18. Why does the following piece of code generate an error?
word = 'Expediensy'
word[-2] = 'c'
Choices:
19. Suppose the following piece of code executes without error:
with open('myfile', 'r') as file:
    lines = file.readlines()
What will be the value of lines?
Choices:
20. Suppose you're creating a datetime object from the string "June 24, 1968, at 05:30" using datetime.strptime. Which one of the following format strings should you use?
Choices: