Python Clone of Unix wc Program using Click Framework

By Adam McQuistan in Python  12/06/2021

Introduction

In this article I demonstrate the power of the Python based Click package for building beautiful Command Line Interface (CLI) programs by creating a clone of the popular wc Unix program. The wc program is a simple program used to count lines, words, and bytes present in files and serves as a perfect example to demonstrate the basics of the Click CLI framework. For those interested in learning more about the awesome Click package please consider taking my course Building Python CLI Apps with Click.

Prefer Video? Watch on Youtube

Understanding the Requirements

The first place to start in any software project is with a clear understanding of the requirements of what is to be accomplished. In this example it is pretty clear since there is a working example complete with well articulated documentation.

I start by pulling up the wc man page like so.

man wc

On my macOS Monterey machine this gives the following docs.

WC(1)                        General Commands Manual                       WC(1)

NAME
     wc – word, line, character, and byte count

SYNOPSIS
     wc [-clmw] [file ...]

DESCRIPTION
     The wc utility displays the number of lines, words, and bytes contained in
     each input file, or standard input (if no file is specified) to the
     standard output.  A line is defined as a string of characters delimited by
     a ⟨newline⟩ character.  Characters beyond the final ⟨newline⟩ character
     will not be included in the line count.

     A word is defined as a string of characters delimited by white space
     characters.  White space characters are the set of characters for which the
     iswspace(3) function returns true.  If more than one input file is
     specified, a line of cumulative counts for all the files is displayed on a
     separate line after the output for the last file.

     The following options are available:

     -c      The number of bytes in each input file is written to the standard
             output.  This will cancel out any prior usage of the -m option.

     -l      The number of lines in each input file is written to the standard
             output.

     -m      The number of characters in each input file is written to the
             standard output.  If the current locale does not support multibyte
             characters, this is equivalent to the -c option.  This will cancel
             out any prior usage of the -c option.

     -w      The number of words in each input file is written to the standard
             output.

     When an option is specified, wc only reports the information requested by
     that option.  The order of output always takes the form of line, word,
     byte, and file name.  The default action is equivalent to specifying the
     -c, -l and -w options.

     If no files are specified, the standard input is used and no file name is
     displayed.  The prompt will accept input until receiving EOF, or [^D] in
     most environments.

ENVIRONMENT
     The LANG, LC_ALL and LC_CTYPE environment variables affect the execution of
     wc as described in environ(7).

EXIT STATUS
     The wc utility exits 0 on success, and >0 if an error occurs.

EXAMPLES
     Count the number of characters, words and lines in each of the files
     report1 and report2 as well as the totals for both:

           wc -mlw report1 report2

An example would be useful here, so I'll create the following two demo files.

The first file is named martin-fowler.txt and contains a quote from Martin Fowler.

Any fool can write code that a computer can understand.
Good programmers write code that humans can understand.

The second file is named linus-torvalds.txt and contains a quote by Linus Torvalds.

Any program is only as good as it is useful. 😊

Then run them through wc.

wc martin-fowler.txt linus-torvalds.txt

Output.

       2      18     112 martin-fowler.txt
       1      10      45 linus-torvalds.txt
       3      28     157 total
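As a quick sanity check, the counts for martin-fowler.txt can be reproduced in plain Python using the definitions from the man page. This is only a sketch: it assumes the file is UTF-8 encoded and that each line ends with a single newline, and it works on an in-memory string rather than reading the file.

```python
# Contents of martin-fowler.txt as shown above, one trailing newline per line.
text = (
    "Any fool can write code that a computer can understand.\n"
    "Good programmers write code that humans can understand.\n"
)

line_count = text.count("\n")            # wc counts newline characters
word_count = len(text.split())           # runs of non-whitespace characters
byte_count = len(text.encode("utf-8"))   # bytes, not characters

print(line_count, word_count, byte_count)  # 2 18 112
```

These three numbers match the first row of the wc output.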

Setting Up the pywc Click Project

For this simple project I'll need just two Python files. The first is pywc.py, which will contain the source code for the pywc clone of the wc program. The second is setup.py, which uses setuptools to create an installable, executable program.

I start by filling out the setup.py file as shown below.

# setup.py
from setuptools import setup


setup(
  name='pywc',
  version='1.0.0',
  py_modules=['pywc'],
  python_requires=">=3.6",
  install_requires=['Click>=8.0.0'],
  entry_points={
    'console_scripts': [
      'pywc=pywc:main'
    ]
  }
)

The above setup(...) call names the installable package pywc, requires Python 3.6 or later (because I'll use f-strings) along with version 8.0.0 or later of the Click library, and maps the pywc command to a main(...) function in a module named pywc.py.
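To make the 'pywc=pywc:main' entry point string less magical, here is a simplified illustration of how a "command=module:function" spec can be resolved to a callable. It is not the code setuptools or pip actually generate, and resolve_entry_point is a hypothetical helper name used only for this sketch.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'command=module:function' and import the target callable."""
    command, _, target = spec.partition('=')
    module_name, _, attr = target.partition(':')
    module = importlib.import_module(module_name.strip())
    return command.strip(), getattr(module, attr.strip())


# Demonstrated with a standard library target so it runs without pywc installed.
command, func = resolve_entry_point('to-json=json:dumps')
print(command, func([1, 2]))  # to-json [1, 2]
```

For the real project the spec would be 'pywc=pywc:main', so running pywc on the command line ends up calling main() in pywc.py.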

Next I scaffold out the pywc.py module with a main() function that contains only a minimal docstring. The main() function is decorated with the @click.command() decorator, which tells Click to treat this function as a command.

# pywc.py

import click


@click.command()
def main():
    """Python clone of Unix wc program."""
    pass


if __name__ == '__main__':
    main()

Next I create a Python virtual environment with the built-in venv module, then activate it.

python3 -m venv venv
source venv/bin/activate # use .\venv\Scripts\activate.bat if on Windows

At this point my current working directory looks as follows.

.
├── venv/
├── linus-torvalds.txt
├── martin-fowler.txt
├── pywc.py
└── setup.py

Now with the Python virtual environment active I can install the pywc package in editable mode.

pip install -e .

At this point I should be able to pull up the help page of the base, unimplemented pywc command as follows.

pywc --help

Output.

Usage: pywc [OPTIONS]

  Python clone of Unix wc program.

Options:
  --help  Show this message and exit.

This concludes the basic setup of the project. I can now progress on to implementing the functionality.
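The same check can also be done from Python without installing the package, using the CliRunner test helper that ships with Click. This is a minimal sketch that redefines the scaffolded main() command inline so the snippet is self-contained.

```python
import click
from click.testing import CliRunner


@click.command()
def main():
    """Python clone of Unix wc program."""
    pass


# CliRunner invokes the command in-process and captures its output.
runner = CliRunner()
result = runner.invoke(main, ['--help'])
print(result.exit_code)
print(result.output)
```

An exit code of 0 and a help page containing the docstring indicate that the command is wired up correctly.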

Adding an Argument for File Handling

A good starting point is to add a command argument for specifying either a file or standard input for the program to consume. The Click library provides a decorator named argument for this purpose.

# pywc.py

import click


class DataRow:
    def __init__(self, name):
        self.name = name
        self.lines = 0
        self.chars = 0
        self.words = 0
        self.bytes = 0


@click.command()
@click.argument('inputs', type=click.File('r'), nargs=-1)
def main(inputs):
    """Python clone of Unix wc program."""
    for file in inputs:
        row = DataRow(file.name)
        for line in file:
            click.echo(f"{row.name:>18}-> {line}", nl=False)


if __name__ == '__main__':
    main()

This first update uses the argument decorator of the Click library to declare an argument of type click.File. The argument maps to the main(...) function's parameter named inputs and represents a collection of input files. Setting the nargs keyword parameter to -1 tells Click to accept any number of values, though I could have restricted it to a fixed count such as 1, 2, or 3. I also provided a temporary implementation that iterates over the contents of each input file and uses the click.echo(...) function to display each line, prefixed with the file name.
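To see what nargs=-1 actually hands to the function, here is a small sketch that is separate from pywc. The count_lines command and the a.txt and b.txt files are hypothetical names made up for this illustration, and the files are created in a temporary directory by Click's CliRunner.

```python
import click
from click.testing import CliRunner


@click.command()
@click.argument('inputs', type=click.File('r'), nargs=-1)
def count_lines(inputs):
    """Print the line count and name of each input file."""
    # With nargs=-1 Click passes a tuple holding zero or more open files.
    for file in inputs:
        click.echo(f"{sum(1 for _ in file)} {file.name}")


runner = CliRunner()
with runner.isolated_filesystem():
    # Hypothetical demo files, created in a temporary directory.
    with open('a.txt', 'w') as f:
        f.write('one\ntwo\n')
    with open('b.txt', 'w') as f:
        f.write('three\n')

    two_files = runner.invoke(count_lines, ['a.txt', 'b.txt'])
    no_files = runner.invoke(count_lines, [])

print(two_files.output)
```

Passing no file names at all is also accepted. In that case inputs is an empty tuple, so the loop body never runs and nothing is printed.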

If I run the pywc program on my two demo files like so.

pywc martin-fowler.txt linus-torvalds.txt

I get the following output.

 martin-fowler.txt-> Any fool can write code that a computer can understand.
 martin-fowler.txt-> Good programmers write code that humans can understand.
linus-torvalds.txt-> Any program is only as good as it is useful.

Adding Option Flags

Next I need to add option flags for counting bytes (-c), lines (-l), characters (-m), and words (-w). To accomplish this I'll add an option(...) decorator from the Click library above the main function for each flag, along with a help keyword parameter for documenting it. Since these are boolean flags I use the is_flag=True keyword argument on the option decorator.

# pywc.py

import click


class DataRow:
    def __init__(self, name):
        self.name = name
        self.lines = 0
        self.chars = 0
        self.words = 0
        self.bytes = 0

    def output(self,
        show_lines=False,
        show_chars=False,
        show_words=False,
        show_bytes=False
    ):
        num_counts = sum([show_lines, show_chars, show_words, show_bytes])
        fmts = ['{:>8}'] * num_counts
        fmts.append('{:<20}')
        fmt = ' '.join(fmts)

        fields = []
        if show_lines:
            fields.append(self.lines)
        if show_words:
            fields.append(self.words)
        if show_bytes:
            fields.append(self.bytes)
        if show_chars:
            fields.append(self.chars)

        fields.append(self.name)
        return fmt.format(*fields)


@click.command()
@click.argument('inputs', type=click.File('r'), nargs=-1)
@click.option('-c', help='Count bytes in each input file.', is_flag=True)
@click.option('-l', help='Count lines in each input file.', is_flag=True)
@click.option('-m', help='Count characters in each input file.', is_flag=True)
@click.option('-w', help='Count words in each input file.', is_flag=True)
def main(inputs, c, l, m, w):
    """Python clone of Unix wc program."""
    show_default = not any([c, l, m, w])
    total_row = DataRow('total')

    for file in inputs:
        row = DataRow(file.name)
        for line in file:
            row.bytes += len(line.encode('utf-8'))
            row.lines += 1
            row.chars += len(line)
            row.words += len(line.strip().split())

        total_row.bytes += row.bytes
        total_row.lines += row.lines
        total_row.chars += row.chars
        total_row.words += row.words

        click.echo(row.output(
            show_lines=l or show_default,
            show_bytes=c or show_default,
            show_words=w or show_default,
            show_chars=m
        ))

    if len(inputs) > 1:
        click.echo(total_row.output(
            show_lines=l or show_default,
            show_bytes=c or show_default,
            show_words=w or show_default,
            show_chars=m
        ))


if __name__ == '__main__':
    main()

I've also added the implementation that counts the metrics for each input file and formats the output for printing to standard out via the click.echo(...) function.
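The difference between the character count and the byte count only shows up with multibyte characters. The sketch below pulls the per-line counting logic out into a standalone function to make that visible. The tally function and the sample line are made up for this illustration, and the byte count assumes UTF-8 input just as the listing above does.

```python
def tally(lines):
    """Tally wc-style metrics for an iterable of text lines."""
    counts = {'lines': 0, 'words': 0, 'chars': 0, 'bytes': 0}
    for line in lines:
        counts['lines'] += 1
        counts['words'] += len(line.split())          # split() ignores surrounding whitespace
        counts['chars'] += len(line)                  # Unicode code points
        counts['bytes'] += len(line.encode('utf-8'))  # assumes UTF-8 encoded input
    return counts


# Hypothetical one-line input: the emoji is 1 character but 4 UTF-8 bytes.
sample = ['useful. \U0001F60A\n']
counts = tally(sample)
print(counts)  # {'lines': 1, 'words': 2, 'chars': 10, 'bytes': 13}
```

For pure ASCII input the chars and bytes values are identical, which is why the -c and -m flags often report the same number.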

For completeness I should compare the output of wc and pywc.

First I run the two files through the wc program counting the lines and words.

wc -wl linus-torvalds.txt martin-fowler.txt

Output.

       1      11 linus-torvalds.txt
       2      18 martin-fowler.txt
       3      29 total

Next I run them through the pywc clone.

pywc -wl linus-torvalds.txt martin-fowler.txt

Output.

       1       11 linus-torvalds.txt  
       2       18 martin-fowler.txt   
       3       29 total

Mission complete!
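The counts agree, although the pywc columns are one space wider than those of wc because the format string pads each count to 8 characters and then joins the columns with an extra space. If an exact match matters, a formatter along the following lines could be used instead. It is a sketch based only on the wc output shown above, assuming each count is right-aligned in 7 columns after a single leading space, and format_like_wc is a hypothetical helper name.

```python
def format_like_wc(counts, name):
    """Format counts the way the wc output above appears to be laid out."""
    # Assumption: each count is right-aligned in 7 columns after one space.
    columns = ''.join(f' {count:>7}' for count in counts)
    return f'{columns} {name}'


print(format_like_wc([1, 11], 'linus-torvalds.txt'))
print(format_like_wc([3, 29], 'total'))
```

With the counts from the example run, this prints rows that line up with the wc output character for character.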

Conclusion

In this short blog post I built a Python-based clone of the Unix wc program with a surprisingly small amount of code, made possible by the Click framework for CLI development.

As always, I thank you for reading and please feel free to ask questions or critique in the comments section below.
