Thursday, February 19, 2009

Stream of Consciousness

I've been taking a look at functional programming languages in general, and Scala in particular, as of late, partly because I've been sucked into the hype surrounding FP and Scala and partly because I've never really done any FP. I read this post about infinite lists in Scala a couple of months ago and decided that I wanted to try something similar with files. My goal was to create a function that takes a directory and returns a lazily evaluated list that can be used to walk every file and directory beneath it. Here's what I came up with:


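import java.io.File

// listFiles returns null when a directory cannot be read, so treat that
// case as an empty stream instead of failing with a NullPointerException.
def children(dir: File): Stream[File] = {
  val entries = dir.listFiles
  if (entries == null) Stream.empty else entries.toStream
}

// Lazily walk the pending files depth-first: a directory's entries are
// visited before its remaining siblings.
def makeFilestream(filelist: Stream[File]): Stream[File] = {
  if (!filelist.isEmpty) {
    val file = filelist.head
    if (file.isDirectory) {
      Stream.cons(file, makeFilestream(children(file).append(filelist drop 1)))
    } else {
      Stream.cons(file, makeFilestream(filelist drop 1))
    }
  } else {
    Stream.empty
  }
}

// Stream every file and directory under root (root itself is not included).
def filestream(root: File): Stream[File] = makeFilestream(children(root))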

So what's the point of all this? Because the list is built on Scala's Stream class, it is lazily evaluated: it doesn't have to scan the entire directory tree before returning the first element. Furthermore, since Stream behaves like any other collection class, all of the normal collection operations like foreach, map, and reduceLeft/reduceRight are supported, so there's no need to load the whole directory tree into a List before operating on it. One caveat: a Stream caches every element it has already computed, so holding on to the head of the stream (in a val, for example) keeps all the files visited so far in memory.
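To make the laziness concrete, here is a minimal sketch that has nothing to do with files. The names `evaluated`, `counted`, and `firstBig` are made up for illustration. It assumes the same Stream class used in this post (newer Scala versions, 2.13 and later, deprecate Stream in favor of LazyList, but it should still compile there with a warning).

```scala
// Counts how many stream elements have actually been computed.
var evaluated = 0

// Hypothetical helper: an endless stream of integers that bumps the
// counter every time another element has to be produced.
def counted(n: Int): Stream[Int] = {
  evaluated += 1
  Stream.cons(n, counted(n + 1))
}

// find walks the stream only as far as the first match, so the
// infinite tail beyond that point is never computed.
val firstBig = counted(1).find(_ > 3)
```

Here `firstBig` ends up as `Some(4)` and `evaluated` as 4: only the four elements that find had to look at were ever built. The file stream behaves the same way, except that "computing an element" means listing a directory.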

Here are some examples of the cool things you can do with a file stream:

Print everything in the directory /tmp

val tmpdir = filestream(new File("/tmp"))
tmpdir.foreach(println)

Print the first 10 directories under ~

val homedir = filestream(new File("/Users/username"))
homedir.filter(f => f.isDirectory).take(10).foreach(println)

Find the largest file under ~

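val homedir = filestream(new File("/Users/username"))
// File.length is unspecified for directories, so compare regular files only.
// Note that reduceLeft throws if there are no files at all.
val biggestFile = homedir.filter(f => f.isFile).reduceLeft(
  (a, b) => if (a.length > b.length) a else b)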

Find the first Java source file under ~

val homedir = filestream(new File("/Users/username"))
// find returns an Option[File]: Some(file) for the first match, None if nothing matches
val firstJavaFile = homedir.find(f => f.getName().endsWith(".java"))


Pretty much any file searching, selecting, or transformation operation you can think of can be expressed as a one-liner or a chain of one-liners. Thanks to the Stream, any operation that doesn't need to scan the entire directory tree stops as soon as it has its answer instead of walking everything.
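As one more illustration of chaining, here is a rough sketch that adds up the sizes of the first few files with a given extension. `walk` and `totalSize` are hypothetical names, not part of the code above; `walk` is just a compact restatement of the traversal (it includes the root itself) so that the snippet runs on its own, and it assumes a Scala version where `Option(...)` and `#:::` are available on Stream.

```scala
import java.io.File

// Hypothetical compact variant of the traversal in this post.
def walk(pending: Stream[File]): Stream[File] =
  if (pending.isEmpty) Stream.empty
  else {
    val file = pending.head
    // listFiles returns null for plain files and for unreadable directories.
    val kids = Option(file.listFiles).map(_.toStream).getOrElse(Stream.empty)
    Stream.cons(file, walk(kids #::: pending.tail))
  }

// Hypothetical helper: total size in bytes of the first n regular files
// under root whose names end with ext.
def totalSize(root: File, ext: String, n: Int): Long =
  walk(Stream(root))
    .filter(f => f.isFile && f.getName.endsWith(ext))
    .take(n)
    .map(_.length)
    .sum
```

Something like `totalSize(new File("/Users/username"), ".scala", 5)` only walks as much of the tree as it takes to find five matching files.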
