Sometimes you don't pay attention to some details and this lead you
right into the wrong direction. That's exactly what happened to me the
other day. I had a specific problem I intended to solve for a while,
nothing particularly difficult, but an interesting one. After hacking
around for an hour, the situation was getting weird and I decided to ask
for help on the
Clojure Google Group.
This post is an account of how that situation arose and what have been
done to solve it.
Walking Recursively
The task is nearly as simple as explaining it. I needed a way to apply
a function to each strings contained in a tree of nested data structures
in Clojure. Not that easy to explain as you see, well you didn't really
saw me rewrite the previous sentence a thousand times but you can
probably picture it. My first attempt is really naive and leverages the
clojure.walk library written by Stuart Sierra. It's a small library that
provides functions to perform non-recursive walking of generic Clojure
data structures. Lets first have a look at the walk function
documentation.
user> (doc clojure.walk/walk)
-------------------------
clojure.walk/walk
([inner outer form])
Traverses form, an arbitrary data structure. inner and outer are
functions. Applies inner to each element of form, building up a
data structure of the same type, then applies outer to the result.
Recognizes all Clojure data structures except sorted-map-by.
Consumes seqs as with doall.
nil
It isn't hard to take this function and use it in a recursive one to do
the job. Our function to transform strings could go in the inner
argument, it's also where we'll add the recursive call. We don't need
the outer one, so we will just use the identity function.
(use 'clojure.walk)
(defn recursive-string-walk [f form]
(walk #(if (string? %) (f %) (recursive-string-walk f %))
identity form))
It works well but there's an issue with this way of doing things,
recursion will end up filling up the stack which will ultimately
overflow. For the use I intend to make of this function, it isn't really
a big problem, but I always like to take these kind of challenges.
A Buggy Test
The first thing that came to my mind to test the above issue was to
create a nested list of arbitrary depth. We can take the iterate
function to create an infinite sequence of nested function calls to
list, then take and last to fetch the pit we want
to test with.
user> (def pit (iterate list "bottom!"))
#'user/pit
user> (take 6 pit)
("bottom!" ("bottom!") (("bottom!")) ((("bottom!"))) (((("bottom!")))) ((((("bottom!"))))))
user> (last (take 6 pit))
((((("bottom!")))))
user> (recursive-string-walk identity (last (take 6 pit)))
((((("bottom!")))))
user> (recursive-string-walk #(str "reached " %) (last (take 6 pit)))
((((("reached bottom!")))))
user> (recursive-string-walk #(str "reached " %) (last (take 1000 pit)))
; Evaluation aborted.
user> (recursive-string-walk #(str "reached " %) (last (take 100 pit)))
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((("reached bottom!")))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Work great! Now lets create an informal test function. This function
should take a range and a recursive walk function then will test it
using pits starting with the shallowest. This way we can find exactly at
which point the overflow happen.
(defn test-walk [walker shallowest deepest]
(doseq [depth (range shallowest deepest)]
(println (walker #(str depth " reached " %)
(last (take depth pit))))))
We'll use it to make sure it works and see which depth our
recursive-string-walk can reach.
user> (test-walk recursive-string-walk 400 500)
...
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((("486 reached bottom!")))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
(...(; Evaluation aborted.
On my computer, this function cannot go deeper than 486 (I wonder how it
would fare on a 486?) which isn't that much. It shouldn't be all that
difficult to make a lazy version of that function. It boils down to look
for sequences and wrap their recursive calls with lazy-seq.
(defn lazy-recursive-string-walk [f form]
(walk #(cond
(string? %) (f %)
(seq? %) (lazy-seq (lazy-recursive-string-walk f %))
:default (lazy-recursive-string-walk f %))
identity form))
From my understanding of lazy data structures and the walk library, this
version should be able to handle data structures of infinite depth given
infinite resources. However the test function didn't seems to be in
agreement with me as it was crashing at 828. That's about the time I
asked for help and stopped thinking about this problem for a while.
Lazy Zippers
A week-end passed by and in the meantime, some people from the group
tried to help me, yet nobody found what was the real problem. That would
have been miraculous given the actual source of the bug. Furthermore, I
was now more convinced than before than there was no bug in the lazy
version. Tom Hicks provided a lazy version of walk, which was
behaving in the same way. Laurent Petit also suggested that I try to
write a version using zippers.
It was an intriguing suggestion, I never used zippers but heard often
about them. Additionally, that day, MarkCC from Good Math, Bad Math
just posted
an article about them.
So I learned what they are and how to use them. I must say this is a
very neat functional programming technique, really worth the time to
learn. It's not that hard, Mark has done a great job explaining
it. I won't go into the details, but this version is quite easy to
understand when you know that the next function do a depth-first
traversal automatically.
(require '[clojure.zip :as z])
(defn lazy-recursive-string-walk-with-zipper [f form]
(loop [loc (z/seq-zip form)]
(if (z/end? loc)
(z/root loc)
(recur (z/next
(if (string? (z/node loc))
(z/replace loc (f (z/node loc)))
loc))))))
To my surprise, the test function still overflowed the stack using this
version. And it could simply not throw the StackOverflowError as it
executes in a loop. Maybe the error wasn't coming from the lazy walk
function after all.
So Where's The Bug?
I've been negligent about one thing, I didn't look closely enough at the
stack traces. The one thrown by recursive-string-walk happen to be
completely different than the one thrown by
lazy-recursive-string-walk. Here's both of them one after the
other.
No message.
[Thrown class java.lang.StackOverflowError]
Restarts:
0: [ABORT] Return to SLIME's top level.
Backtrace:
0: clojure.lang.RT.boundedLength(RT.java:1128)
1: clojure.lang.RestFn.applyTo(RestFn.java:135)
2: clojure.core$apply__4370.invoke(core.clj:436)
3: clojure.walk$walk__7942.invoke(walk.clj:43)
4: recursive_string_walk$recursive_string_walk__4137.invoke(NO_SOURCE_FILE:1)
5: recursive_string_walk$recursive_string_walk__4137$fn__4139.invoke(NO_SOURCE_FILE:1)
---
No message.
[Thrown class java.lang.StackOverflowError]
Restarts:
0: [ABORT] Return to SLIME's top level.
Backtrace:
0: clojure.lang.RT.assoc(RT.java:666)
1: clojure.core$assoc__4259.invoke(core.clj:146)
2: clojure.lang.AFn.applyToHelper(AFn.java:179)
3: clojure.lang.RestFn.applyTo(RestFn.java:137)
4: clojure.lang.Ref.alter(Ref.java:174)
5: clojure.core$alter__4907.doInvoke(core.clj:1542)
The actual source of the exception is the print function that is
recursive on nested sequences. This make sense, why would someone want
to print such monsters. Because of that inattention I lost time
searching for an imaginary bug. Only a small modification is needed to
make our test function working.
(defn get-bottom [p]
(loop [p (first p)]
(if (seq? p) (recur (first p)) p)))
(defn test-walk [walker shallowest deepest]
(doseq [depth (range shallowest deepest)]
(println (get-bottom
(walker #(str depth " reached " %)
(lazy-seq (last (take depth pit))))))))
We simply search for the bottom item and print it, this will also make
the test output clearer. We can now see what our original walk function
can handle.
user> (test-walk lazy-recursive-string-walk 100000 100001)
...
; Evaluation aborted.
---
Java heap space
[Thrown class java.lang.OutOfMemoryError]
Well, the heap too has its limit! We would need to augment the JVM heap
memory size to be able to run the previous example. Lets reduce our
ambitions and do some benchmarking instead.
user> (time (test-walk lazy-recursive-string-walk 10000 10001))
10000 reached bottom!
"Elapsed time: 25.138697 msecs"
nil
user> (time (test-walk lazy-recursive-string-walk-with-zipper 10000 10001))
10000 reached bottom!
"Elapsed time: 122.691972 msecs"
nil
As you see, the version using clojure.walk is much faster. That's
perfectly understandable considering the more complex concept of zippers
which have multiple uses aside walking trees. As both versions are fully
lazy, the first one is the clear winner here.
What Can We Use This For
After all these trials and tribulations, you may wonder what use I may
have for such a function. It's to help create functions or macros that
perform string replacement on all strings in a Clojure form. This can be
useful with Compojure while deploying your application to a different
root path than "/" for example. The idea come from my experience with
ASP.NET in which URLs can contain a character (~) to denote the
web site root path.
(use 'clojure.contrib.str-utils)
(defn expand* [expansions string]
(reduce (fn [s [regex replacement]]
(re-gsub regex replacement s))
string (partition 2 expansions)))
(defn expand-form [url form]
(recursive-string-walk (partial expand* [#"^~" url]) form))
Having the same functionality in a macro would be nice too.
(defmacro with-expansions [expansions & body]
`(do ~@(recursive-string-walk (partial expand* expansions) body)))
(defmacro expand [url & body]
`(with-expansions [#"^~" ~(str url)] ~@body))
To put this to use I've taken a function written by James Reeves a
few months ago during
a discussion
on the Compojure group. A function with the same name
(with-context) already exists in Compojure though, so we'll rename
it to with-root-path. The way it work is by binding the given
context to a var and then we can use a function called url to add
the context so that internal links still work once the application is
deployed elsewhere. Using the expand-form function on html's
arguments can reduce repetition and make the code less cluttered if
there's a lot of such URLs.
(use 'compojure)
(declare *context*)
(defn with-root-path
[ctx & route-seq]
(let [handler (apply routes route-seq)
pattern (re-pattern (str "^" ctx "(/.*)?"))]
(fn [request]
(if-let [[_ uri] (re-matches pattern (:uri request))]
(binding [*context* ctx]
(handler (assoc request :uri uri)))))))
(defn url [path]
(str *context* path))
(defn html-expand [& trees]
(apply html (expand-form *context* trees)))
(defroutes test-routes
(with-root-path "/foo"
(GET "/" (html-expand [:h1 "test"] [:p "~/"]))
(GET "/bar" (html [:h1 "test"] [:p (url "/bar")]))
(ANY "*" (page-not-found))))
(defn start-test-server []
(run-server {:port 8000} "/*" (servlet test-routes)))
The first route use a special version of the html function, it's
only a slight improvement but I really like that way of doing
things. There's plenty of other possible use cases though, but that
should be enough for this post.
P.S.: I know the title isn't correct, but it sounds way better
than A Recursive Walk on Strings Contained in Arbitrary Nested Data
Structures.