CSC 185 & 201 / Homework 7: collections (201)

09 Apr 2012 [ 201 week13 hw ]

Email all files as attachments (or zip up your full project folder) to me at jal2016@email.vccs.edu with subject CSC 201 HW7.

Alternately, put your homework in a hw7 folder in your 201-hw Mercurial repository on BitBucket.

Due Tuesday, April 17.

1: Maps

Write a program that keeps track of the number of times each letter from 'a' to 'z' (ignoring case) was read from standard input, and then prints the totals at the end. Here's a start:

import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Count the number of times the 
 */
public class CharCount {
    public static void main(String[] args) throws IOException {
        // Wrap with object that will convert bytes to characters.
        InputStreamReader r = new InputStreamReader(System.in);
        int charInput = 0;
        // Repeatedly read the next character until EOF.
        while ( (charInput = r.read()) != -1 ) {
            // Cast input to char type to treat as letter.
            char c = (char)charInput;
            // Convert to lower case and ensure it's a letter.
            c = Character.toLowerCase(c);
            if (!(c >= 'a' && c <= 'z'))
                continue;
            // ...
        }
    }
}

CharCount.java

To test on a text file, you can run the program like: java CharCount < INPUT.txt.

For those of you in 185, here's a command line using a few common Unix programs that does the same thing (although it's inefficient for large inputs, since it has to do a two sorts on the full input): (sed 's/\(.\)/\1\n/g' | tr 'A-Z' 'a-z' | grep '[a-z]' | sort | uniq -c | sort -n) < INPUT.txt

2: Priority queues

Write a program in a class called LargestWords that, given a number N and an input file specified on the command line, will print the N largest words found in the file. A convenient way to do this is using a priority queue, which can be used to efficiently keep track of the largest words seen so far when processing a long list of words.

Below is a template that you can start with. To use on the command line, you would type something like java LongestWords 20 my-input-file.txt.

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Scanner;

/**
 * Program to print longest words found in file.
 * 
 */
public class LongestWords {
    
    /** Order by String length in ascending order. */
    public static class MinLengthComparator implements Comparator<String> {
        public int compare(String s, String t) {
            return Integer.compare(s.length(), t.length());
        }
    }
    
    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java LongestWords N INPUTFILE");
            System.exit(1);
        }
        
        // Process command line arguments.
        int N = Integer.parseInt(args[0]);
        String filename = args[1];

        // Open input file.
        FileReader reader = null;
        try {
            reader = new FileReader(filename);
        } catch (FileNotFoundException e) {
            System.err.println("Error: file " + filename + " not found.");
            System.exit(1);
        }

        // Create queue ordered by length (ascending), with initial capacity N.
        PriorityQueue<String> queue = 
                new PriorityQueue<String>(N, new MinLengthComparator());
        
        // Read and process all words in file.
        Scanner scanner = new Scanner(reader);
        while (scanner.hasNext()) {
            String word = scanner.next();
            // Insert word in queue if it's long enough ...
        }
        
    }
}

LongestWords.java

The general use of a PriorityQueue in this context is as follows. Each time you process a new word, do the following:

If the queue has less than N elements, add the new word to the queue.
Otherwise, you need to ensure that the queue always contains the N largest words seen so far. To do this:

a. Check to see if the new word is larger than the smallest in the queue (see the peek() method).

b. If so, remove the smallest in the queue, and add the new word. (See the remove() and add() methods.)

Optional: use some additional logic to ensure that you only process alphabetic words (words that consist of only letters, so that long strings of punctuation or other characters are ignored).

Here are links to some example files:

CSC 185 & 201: Spring 2012

Homework 7: collections (201)

1: Maps

2: Priority queues