3.15. Putting it all together

Once again, all the dominoes are in place. We’ve seen how each line of code works. Now let’s step back and see how it all fits together.

Example 3.39. listDirectory


def listDirectory(directory, fileExtList):                                         1
    "get list of file info objects for files of particular extensions"
    fileList = [os.path.normcase(f) for f in os.listdir(directory)]               
    fileList = [os.path.join(directory, f) for f in fileList
                if os.path.splitext(f)[1] in fileExtList]                          2
    def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):       3
        "get file info class from filename extension"                             
        subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]        4
        return hasattr(module, subclass) and getattr(module, subclass) or FileInfo 5
    return [getFileInfoClass(f)(f) for f in fileList]                              6
1 listDirectory is the main attraction of this entire module. It takes a directory (like c:\music\_singles\ in my case) and a list of interesting file extensions (like ['.mp3']), and it returns a list of class instances that act like dictionaries, each containing metadata about one interesting file in that directory. And it does it in just a few straightforward lines of code.
2 As we saw in the previous section, this line of code gets a list of the full pathnames of all the files in directory that have an interesting file extension (as specified by fileExtList).
3 Old-school Pascal programmers may be familiar with them, but most people give me a blank stare when I tell them that Python supports nested functions -- literally, a function within a function. The nested function getFileInfoClass is defined inside listDirectory, so its name is local to listDirectory and it can only be called from there. As with any other function, you don’t need an interface declaration or anything fancy; just define the function and code it.
4 Now that you’ve seen the os module, this line should make more sense. It gets the extension of the file (os.path.splitext(filename)[1]), forces it to uppercase (.upper()), slices off the dot ([1:]), and constructs a class name out of it with string formatting. So c:\music\ap\mahadeva.mp3 becomes .mp3 becomes .MP3 becomes MP3 becomes MP3FileInfo.
5 Having constructed the name of the handler class that would handle this file, we check to see if that handler class actually exists in this module. If it does, we return the class; otherwise, we return the base class FileInfo. This is a very important point: this function returns a class. Not an instance of a class, but the class itself.
6 For each file in our “interesting files” list (fileList), we call getFileInfoClass with the filename (f). Calling getFileInfoClass(f) returns a class; we don’t know exactly which class, but we don’t care. We then create an instance of this class (whatever it is), passing the filename (f again) to the __init__ method. As we saw earlier in this chapter, the __init__ method of FileInfo sets self["name"], which triggers __setitem__, which is overridden in the descendant (MP3FileInfo) to parse the file appropriately to pull out the file’s metadata. We do all that for each interesting file and return a list of the resulting instances.
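
The name-building and class-lookup steps in callouts 4 through 6 can be tried in isolation. What follows is a minimal sketch, not the module’s actual code: the FileInfo and MP3FileInfo classes here are empty stand-ins, demo is a hypothetical module object built on the spot so the lookup has something to inspect, and subclassName is a helper introduced only for illustration. It also uses getattr with a default value, which in this case gives the same result as the and-or expression in the listing.

```python
import os
import types


class FileInfo(dict):
    "stand-in base class: stores only the filename"
    def __init__(self, filename=None):
        super().__init__()
        self["name"] = filename


class MP3FileInfo(FileInfo):
    "stand-in handler: the real class would also parse the file's metadata"


# Hypothetical module object standing in for the module that holds the handlers.
demo = types.ModuleType("demo")
demo.FileInfo = FileInfo
demo.MP3FileInfo = MP3FileInfo


def subclassName(filename):
    "build the handler class name from the filename extension"
    ext = os.path.splitext(filename)[1]      # ".mp3"
    return "%sFileInfo" % ext.upper()[1:]    # ".MP3" -> "MP3" -> "MP3FileInfo"


def getFileInfoClass(filename, module=demo):
    "return the handler class itself (not an instance), or FileInfo as a fallback"
    return getattr(module, subclassName(filename), FileInfo)


handler = getFileInfoClass("music/ap/mahadeva.mp3")   # a class
info = handler("music/ap/mahadeva.mp3")               # an instance of that class
```

Here handler is the class MP3FileInfo, and info is an instance of it. A filename with an extension that has no matching class, such as notes.txt, falls back to FileInfo.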

Note that listDirectory is completely generic. It doesn’t know ahead of time which types of files it will be getting, or which classes are defined that could potentially handle those files. It inspects the directory for the files to process, then introspects its own module to see what special handler classes (like MP3FileInfo) are defined. You can extend this program to handle other types of files simply by defining an appropriately named class: HTMLFileInfo for HTML files, DOCFileInfo for Word .doc files, and so forth. listDirectory will handle them all, without modification, by handing the real work off to the appropriate classes and collating the results.
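
To see that extensibility in action, here is a rough sketch under a few assumptions. FileInfo is a simplified stand-in, HTMLFileInfo is a hypothetical handler that does no real parsing, handlers is a module object created on the spot to play the role of the module being introspected, and runDemo is a helper that exists only to build a scratch directory. listDirectory takes the module as an extra parameter here so the sketch stays self-contained; the listing above looks it up through sys.modules instead.

```python
import os
import tempfile
import types


class FileInfo(dict):
    "simplified stand-in for the base class: stores only the filename"
    def __init__(self, filename=None):
        super().__init__()
        self["name"] = filename


class HTMLFileInfo(FileInfo):
    "hypothetical handler: a real one would parse the file's contents"


# Hypothetical stand-in for the module that listDirectory introspects.
handlers = types.ModuleType("handlers")
handlers.FileInfo = FileInfo
handlers.HTMLFileInfo = HTMLFileInfo


def listDirectory(directory, fileExtList, module=handlers):
    "get list of file info objects for files of particular extensions"
    fileList = [os.path.normcase(f) for f in os.listdir(directory)]
    fileList = [os.path.join(directory, f) for f in fileList
                if os.path.splitext(f)[1] in fileExtList]

    def getFileInfoClass(filename):
        "get file info class from filename extension"
        subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
        return getattr(module, subclass, FileInfo)

    return [getFileInfoClass(f)(f) for f in fileList]


def runDemo():
    "create three empty files and report which class handled each one"
    with tempfile.TemporaryDirectory() as d:
        for name in ("index.html", "notes.txt", "song.mp3"):
            open(os.path.join(d, name), "w").close()
        infos = listDirectory(d, [".html", ".txt"])
        return sorted((os.path.basename(i["name"]), type(i).__name__)
                      for i in infos)
```

Calling runDemo() pairs index.html with HTMLFileInfo and notes.txt with the fallback FileInfo, while song.mp3 is filtered out because .mp3 is not in the extension list. Nothing in listDirectory had to change to pick up the new handler; defining the class under the expected name was enough.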