Zen and the art of...


Wickedly Short Wikis

Well well well, let's be a little more serious this time around and talk about code. This is a technical blog (hum... interesting, a stock installation of emacs flyspell mode doesn't know about words like wiki, blog or even flyspell) after all, we ain't gonna talk about nothing!

So, today, I'll talk about wikis or, to be more precise, the software behind them. We'll dissect a very simple implementation, just to see what they are all about. Basically, a wiki is a very simple concept, reduced to its core it's the combination of four principles:

  • Automatic link generation
  • Editable content
  • Simplified formating
  • Backlinks

Upon realizing this, hordes of hackers from around the world flocked to write the shortest one for glory and fame. And thus the shortest wiki contest was born. As everybody expected the winner used Perl a mix of Perl and shell script. This all happened a long time ago (in Internet time) and now I'll tears apart one of these very little beast.

I won't bother with the Perl ones as I'm not masochist. In fact, I've actually already done it for wypy.py a couple of weeks ago. It's the best Python entry and also the best for a language that is not Perl. Even then, I've started with the 23 lines version not loose too much time, just enough. The actual entry to the contest was 11 lines long and contained only 814 characters, which is still nearly four times larger than the winning one, that's short! In this version, the code is at least divided into four functions. It begins with a simple one, load, which returns the content of a text file if it exists, else a string:

# ex is a shortcut for os.path.exists
def load(n): return (ex('w/'+n) and open('w/'+n).read()) or ''

This isn't really exciting, just a good usage of logical operators. The next one has more meat to it, let's see the code first:

def fs(s): return reduce(lambda s, r: re.sub('(?m)'+r[0], r[1], s), (('\r',''),
   ('(^|[^A-Za-z0-9?])(([A-Z][a-z]+){2,})', lambda m: (m.group(1) + '%s<a hr' \
    'ef=wypy?%s'+m.group(2)+'%s>%s</a>') % ((m.group(2),'p=','&amp;q=e','?'),
   ('','','',m.group(2)))[ex('w/'+m.group(2))]),  ('^\{\{$','\n<ul>'),
   ('^\* ','<li>'),  ('^}}$','</ul>'),  ('^---$','<hr>'),  ('\n\n','<p>'),
   ('(ht|f)tp:[^<>"\s]+','<a href="\g<0>">\g<0></a>')), cgi.escape(s))

It look awful, but upon closer inspection we see that the noise is mostly regular expressions and some compact markup in a format string. All in all, it's a fold taking the escaped input string as initial value and reducing a list of tuples which are composed of a regular expression and either of a string or a function. These happen to be the two first arguments of re.sub which is the function used in the lambda expression fed to reduce. There's one little unexplained detail in there, why does '(?m)' is prepended to every regular expressions? After some googling, I've found this explanation:

Caret and dollar match after and before newlines for the remainder of the regular expression. (Older regex flavors may apply this to the entire regex.)

Well, I'll confess I don't truly understand what this is suppose to mean, but it doesn't look that important anyway, let's move on.

Now the do function, obviously the one actually doing something! It's a kind of dispatching function (using a dictionary) performing different type of actions depending on the value of the first parameter:

def do(m, n): return {'get':'<h1>WyPy:<a href=wypy?p=%s&amp;q=f>%s</a></h1>(' \
   '<a href=wypy?p=%s&amp;q=e>edit me</a>)<p>%s' % (n, n, n, fs(load(n)) or n),
   'edit': '<form action=wypy?%s method=POST><h1>%s<input type=hidden name=p' \
   ' value=%s> <input type=submit></h1><textarea name=t rows=15 cols=80>%s</' \
   'textarea></form>' % (n, fs(n), n, load(n) or "Describe %s" % n), 'find': 
   ('<h1>Links: %s</h1>' % fs(n))+fs('{{\n* %s\n}}' % '\n* '.join(filter(
    lambda s: open('w/'+s).read().find(n) > -1, os.listdir('w'))))}.get(m)

It is quite simple actually, get returns a wiki page, edit is obvious and find make a page containing a list of links. The main function then wires up everything to be called by a CGI process and also add a page title.

def main(f): 
   n = f.get('p') or env("QUERY_STRING") or ''; n = ('HomePage',n)[n.isalpha()]
   print "Content-type: text/html; charset=utf-8\r\n\r\n<title>%s</title>" % n
   if env("REQUEST_METHOD") == "POST": open('w/'+n, 'w').write(f['t'])
   print do({'e':'edit', 'f':'find'}.get(f.get('q')) or 'get', n)

That's it, short and sweet. A last interesting point, the only code that I had trouble understanding was this: ('HomePage',n)[n.isalpha()]} which is quite confusing on first sight. It cleverly use the implicit conversion from boolean to integer to choose between the two items of a tuple. It's in fact an old school way of simulating the ternary operator, that feature having been added to Python 2.5 in 2006. That trick might be well know amongst pythonitas, but I'm not one of them as I rarelly code in Python.

While dissecting this code I've refactored (or might I say degolfed) it into a 45 lines version that you can grab here. I've got a nearly finished version of this code translated to Clojure, I'll follow up on this post once it's done, but that may take a while as I'm having fun with the recently released Closure Library.

P.S.: Ward's Wiki is down at the time of this writing so the two first links of this post might not work for some time. You can try to access them through Google's cached links.

No comments:

Post a Comment

About Me

My photo
Quebec, Canada
Your humble servant.