CIS071 LAB14 - Counting Word Occurrences                               DUE: (May 8 at noon – FIRM deadline)


 

Lab14: Write a program that reads a stream of characters (a text file) containing a sequence of tokens (English words and punctuation marks), each separated from the next by a blank.  For example:

 

the quick brown fox jumped over the tall white fence . . . and fell !

 

There will be no capital letters in your sequence of characters, and the sequence will end with an end-of-file mark (eof).
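
One possible way to read and classify tokens under the rules above is sketched here. It is only a starting point, not the required design. The names is_word, next_token and MAX_TOKEN are made up for this illustration, and the 63-character token limit is an assumption, since the lab does not state a maximum token length. The sketch also assumes that a word consists only of lowercase letters, so a token containing an apostrophe or a digit would not count as a word.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MAX_TOKEN 64  /* assumed buffer size; the lab gives no token limit */

/* Returns 1 if the token consists only of lowercase letters, 0 otherwise.
   An empty string is not a word. */
int is_word(const char *token)
{
    assert(token != NULL);
    if (token[0] == '\0')
        return 0;
    for (size_t i = 0; token[i] != '\0'; i++) {
        if (!islower((unsigned char)token[i]))
            return 0;
    }
    return 1;
}

/* Reads the next blank-separated token from `in` into `buf`.
   Returns 1 if a token was read, 0 at end of file. */
int next_token(FILE *in, char buf[MAX_TOKEN])
{
    assert(in != NULL);
    /* %63s skips leading whitespace and stops at the next blank,
       never writing more than 63 characters plus the terminator. */
    return fscanf(in, "%63s", buf) == 1;
}
```

A loop of the form while (next_token(stdin, buf)) { ... } then visits every token until the end-of-file mark is reached.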

 

Your program is to produce the following information about this sequence of tokens.

Part 1.                    Count the number of words of each length from 1 through 15 that appear in the stream (words of length 1, words of length 2, words of length 3, and so on).  Also count the total number of occurrences of punctuation marks.  When the program is finished, these 16 counts (15 word-length counts plus the punctuation count) should be displayed in readable form.  (For example, the count of words of length 3 in the above sentence is 4.)

 

Length        Count of Words
Punctuation          4
1                    0
2                    0
3                    4
4                    3
        etc.
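
A minimal sketch of how the Part 1 counts might be gathered and printed follows. The names LengthStats, tally_token, count_lengths and print_lengths are hypothetical, chosen only for this example. Two assumptions are made that the lab text does not settle: a token whose first character is a lowercase letter is treated as a word and anything else as punctuation, and words longer than 15 characters are silently skipped.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEN 15  /* longest word length the lab asks us to count */

typedef struct {
    int by_length[MAX_LEN + 1];  /* slot 0 unused so that index == word length */
    int punctuation;             /* total punctuation tokens seen */
} LengthStats;

/* Adds one token to the running counts. */
void tally_token(LengthStats *stats, const char *token)
{
    assert(stats != NULL && token != NULL);
    size_t len = strlen(token);
    if (len == 0)
        return;
    if (islower((unsigned char)token[0])) {
        /* Assumption: words longer than MAX_LEN are ignored. */
        if (len <= MAX_LEN)
            stats->by_length[len]++;
    } else {
        /* Assumption: every non-word token is punctuation. */
        stats->punctuation++;
    }
}

/* Resets the counts, then tallies every token in the stream. */
void count_lengths(FILE *in, LengthStats *stats)
{
    assert(in != NULL && stats != NULL);
    char token[64];  /* assumed maximum token size */
    memset(stats, 0, sizeof *stats);
    while (fscanf(in, "%63s", token) == 1)
        tally_token(stats, token);
}

/* Prints the punctuation count followed by the counts for lengths 1..15. */
void print_lengths(const LengthStats *stats)
{
    assert(stats != NULL);
    printf("%-12s %s\n", "Length", "Count of Words");
    printf("%-12s %d\n", "Punctuation", stats->punctuation);
    for (int len = 1; len <= MAX_LEN; len++)
        printf("%-12d %d\n", len, stats->by_length[len]);
}
```

Run on the example sentence, this sketch counts 4 punctuation marks, 4 words of length 3 and 3 words of length 4, in agreement with the sample table.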

 

Part 2.                    Count and then display (again in readable form) the number of occurrences of each word in the stream.  (For example, every word in the above sentence appears just once, except for the word "the", which appears twice.)

 

Word         Count
the            2
quick          1
        etc.
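
One way the Part 2 bookkeeping could look, using an array of structs and a simple linear search, is sketched below. WordEntry, WordTable, record_word and print_words are illustrative names only. The capacity of 200 entries and the 15-character word limit are assumptions taken from the sizes mentioned elsewhere in this lab. The sketch simply refuses a new word once the table is full or the word is too long; your program may want to report that case instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORDS    200  /* assumed table capacity */
#define MAX_WORD_LEN 15   /* assumed longest word stored */

typedef struct {
    char text[MAX_WORD_LEN + 1];  /* the word itself, null-terminated */
    int  count;                   /* how many times it has appeared */
} WordEntry;

typedef struct {
    WordEntry entries[MAX_WORDS];
    int       used;               /* number of entries currently filled */
} WordTable;

/* Records one occurrence of `word`.
   Returns the word's updated count, or 0 if it could not be stored. */
int record_word(WordTable *table, const char *word)
{
    assert(table != NULL && word != NULL);
    /* Linear search: fine for a table this small. */
    for (int i = 0; i < table->used; i++) {
        if (strcmp(table->entries[i].text, word) == 0)
            return ++table->entries[i].count;
    }
    if (table->used == MAX_WORDS || strlen(word) > MAX_WORD_LEN)
        return 0;
    strcpy(table->entries[table->used].text, word);
    table->entries[table->used].count = 1;
    table->used++;
    return 1;
}

/* Prints each distinct word with its count, in order of first appearance. */
void print_words(const WordTable *table)
{
    assert(table != NULL);
    printf("%-15s %s\n", "Word", "Count");
    for (int i = 0; i < table->used; i++)
        printf("%-15s %d\n", table->entries[i].text, table->entries[i].count);
}
```

Before the first call to record_word, set the table's used field to 0. Only tokens that are words should be passed in, so punctuation marks never enter the table.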

 

Hint: You will need an array of size 15 to store the word-length counts for Part 1 (the punctuation count can be kept in a separate variable).  You will need an array of structs, of size 200, say, to solve Part 2.

 

Advice: Your initial implementation should be able to read the input stream from the keyboard. You can use a fairly short text to test your program. When this initial implementation works, you should modify it to read from an external file, in which you will be asked to store a few lines of text to be read and processed.
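
If the processing code takes a FILE * instead of reading from the keyboard directly, switching from keyboard to file input later needs very little change. The sketch below illustrates that idea. The helper names open_input and count_tokens are invented for this example, and the 63-character token limit is again an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the keyboard stream when `path` is NULL, otherwise tries to
   open the named file for reading. Returns NULL if the file cannot be
   opened, so the caller should check the result. */
FILE *open_input(const char *path)
{
    if (path == NULL)
        return stdin;
    return fopen(path, "r");
}

/* Counts the blank-separated tokens on any input stream. The same
   function works for the keyboard and for a file. */
int count_tokens(FILE *in)
{
    assert(in != NULL);
    char token[64];  /* assumed maximum token size */
    int n = 0;
    while (fscanf(in, "%63s", token) == 1)
        n++;
    return n;
}
```

When typing input at the keyboard, the end-of-file mark is usually entered as Ctrl-D on Unix-like systems, or Ctrl-Z followed by Enter on Windows.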

 

Advice:  There are two important factors in solving this problem.  First, you need to understand and document the sequence of steps the program must go through to produce the desired results.  Second, you should carefully identify all the variables you need and begin making a list of the separate functions required to complete this task.  Work on this problem incrementally: get one piece working and tested before adding the next.