Week 15 summary -- Python files and command line

08 Dec 2011

We covered some examples outlined in last week's agenda. Next week will have a quiz -- I'll post some practice problems this weekend.

Line count program

Create a program that counts the number of lines in the file named on the command line. For example, typing linecount FILENAME should print the number of lines in FILENAME. Then create an alternate version that counts the lines read from stdin when no FILENAME is given.

import sys

# test to see if a filename was specified
if len(sys.argv) == 2:
    # get filename
    name = sys.argv[1]
    # open file
    f = open(name)
else:
    f = sys.stdin

# get list of lines
lines = f.readlines()
# print number of lines
print(len(lines))
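The readlines call builds a list of every line only so that we can take its length. A minimal sketch of an alternative, assuming we only need the count: iterate over the file object and add 1 per line, so only one line is held in memory at a time. The helper name count_lines is hypothetical, and io.StringIO stands in for a real file in the demo.

```python
import io

def count_lines(f):
    # Iterating over a file object (or sys.stdin) yields one line at a
    # time, so the whole file is never stored in a list.
    return sum(1 for _ in f)

# Demo with an in-memory "file"; in the real program, pass the f
# chosen from sys.argv or sys.stdin instead.
demo = io.StringIO("first\nsecond\nthird\n")
print(count_lines(demo))  # prints 3
```

In the linecount program, print(count_lines(f)) could replace the readlines and len steps; it works the same whether f came from open(name) or is sys.stdin.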

Word count program

Create a program that counts the number of lines containing a particular word. For example, typing wordlines WORD FILENAME should print the number of lines in FILENAME that contain WORD. Then create an alternate version that prints those lines instead of counting them, and try using it in a pipeline with your linecount program (wordlines WORD FILENAME | linecount).

import sys

word = sys.argv[1]
name = sys.argv[2]

f = open(name)

# look in all lines of file for word
count = 0
for line in f:
    if word in line:
        print(line, end='')  # line already ends in a newline; end='' avoids printing a second one
        # First version: keep track of number of matches
        # count = count + 1

# First version: print number of matches
# print(count)
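One way to support both versions of wordlines without commenting code in and out is to put the line test in a generator and build the count on top of it. This is only a sketch; matching_lines and count_matches are hypothetical names, and io.StringIO stands in for an opened file.

```python
import io

def matching_lines(word, f):
    # Yield each line of f that contains word (as a substring).
    for line in f:
        if word in line:
            yield line

def count_matches(word, f):
    # First version: count the matching lines instead of printing them.
    return sum(1 for _ in matching_lines(word, f))

text = "the cat sat\na dog ran\ncat nap\n"
print(count_matches('cat', io.StringIO(text)))  # prints 2
for line in matching_lines('dog', io.StringIO(text)):
    print(line, end='')                         # prints: a dog ran
```

Note that word in line is a substring test, so cat also matches a line containing concatenate. If whole words are wanted, testing word in line.split() is one option, at the cost of missing words with punctuation attached, such as cat. at the end of a sentence.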

Longest word program

Create a program that finds the longest word in a file. Hints:

  • words = line.split() will take a line and split it into a list of words.
  • Break the program up into small functions. Define a function that finds the longest word in a list of words. Then define a function that finds the longest word in the entire file, using the first function.
  • For finding the longest word, think about how you defined the max function to find the max number in a list. You can find the longest word in almost the same way -- you just need to transform the input in some way.

import sys

# split a line into words.
# This could be tweaked later on to split more intelligently.
def words(line):
    return line.split()

# return longest word in a list 
def longest(words):
    long = ''
    for w in words:
        if len(w) > len(long):
            long = w
    return long

name = sys.argv[1]
f = open(name)

# Check every line in the file, keeping track of the longest
# word seen so far.
longest_so_far = ''
for line in f:
    w = words(line)
    longest_on_line = longest(w)
    if len(longest_on_line) > len(longest_so_far):
        longest_so_far = longest_on_line

# When loop is done, longest_so_far contains longest in entire file.
print(longest_so_far)
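The comment on words says it could split more intelligently. With plain split(), punctuation sticks to words, so a line ending in really?! reports really?! (8 characters) as its longest word. A minimal sketch, assuming a word is a whitespace-separated piece with leading and trailing punctuation removed (so don't and well-known stay whole); longest is copied unchanged from above:

```python
import string

def words(line):
    # Assumption: split on whitespace as before, then trim punctuation
    # from both ends of each piece; drop pieces that were all
    # punctuation (like "--").
    result = []
    for piece in line.split():
        w = piece.strip(string.punctuation)
        if w:
            result.append(w)
    return result

# return longest word in a list (same as above)
def longest(words):
    long = ''
    for w in words:
        if len(w) > len(long):
            long = w
    return long

print(words("Hello, world... don't panic!"))
# prints ['Hello', 'world', "don't", 'panic']
print(longest("Wait -- really?!".split()))  # prints really?!
print(longest(words("Wait -- really?!")))   # prints really
```

Because the rest of the program only calls words(line), swapping in this definition changes what counts as a word without touching the loop.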

Instead of working line by line, we could read the entire file at once and then split the whole string into words. Our original words function still works for this, since split() treats newlines as whitespace. Once we have every word in the file, we can pass them all to the longest function and be done. One downside is that this version has to hold the entire file (and the list of all its words) in memory, which matters for very large files.

import sys

# split a string into words.
# This could be tweaked later on to split more intelligently.
def words(line):
    return line.split()

# return longest word in a list 
def longest(words):
    long = ''
    for w in words:
        if len(w) > len(long):
            long = w
    return long

name = sys.argv[1]
f = open(name)

contents = f.read()
all_words = words(contents)
print(longest(all_words))
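The longest function loops over its argument only once, so it does not need a list: any iterable works. That suggests a version with the one-call simplicity of the read() approach that still reads the file one line at a time. A sketch under that idea (longest_in_file is a hypothetical name; words and longest are copied from above, and io.StringIO stands in for a real file):

```python
import io

# split a string into words
def words(line):
    return line.split()

# return longest word in a list (or any iterable of words)
def longest(words):
    long = ''
    for w in words:
        if len(w) > len(long):
            long = w
    return long

def longest_in_file(f):
    # The generator expression produces words lazily, one line at a
    # time, so the whole file is never held in memory at once.
    return longest(w for line in f for w in words(line))

demo = io.StringIO("a short line\nan extraordinarily long one\n")
print(longest_in_file(demo))  # prints extraordinarily
```

With a real file, this would be used as with open(sys.argv[1]) as f: print(longest_in_file(f)). It also follows the earlier hint of writing a file-level function on top of the list-level one.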

As a final version, Python's built-in max function accepts a key argument that says how to compare items; with key=len it compares words by their length. Using that, the program can be written on one line if you really want:

import sys
print(max(open(sys.argv[1]).read().split(), key=len))
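Two details of the max version are worth knowing. When several words tie for longest, max returns the first one it encounters, just like the loop above with its strict > comparison. And on a file with no words, max raises ValueError unless a default is supplied (the default argument needs Python 3.4 or later). A sketch that takes an already-open file object, so the caller can close it with a with statement (longest_word is a hypothetical name; io.StringIO stands in for a real file):

```python
import io

def longest_word(f):
    # key=len makes max compare words by length; default='' is returned
    # instead of raising ValueError when there are no words at all.
    return max(f.read().split(), key=len, default='')

print(longest_word(io.StringIO("tie ant bee\n")))  # prints tie
print(repr(longest_word(io.StringIO(""))))         # prints ''
```

Returning '' for an empty file matches what the loop-based versions above already do.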