Zen and the art of...

2009-11-18

Wickedly Short Wikis

Well well well, let's be a little more serious this time around and talk about code. This is a technical blog (hum... interesting, a stock installation of emacs flyspell mode doesn't know about words like wiki, blog or even flyspell) after all, we ain't gonna talk about nothing!

So, today, I'll talk about wikis or, to be more precise, the software behind them. We'll dissect a very simple implementation, just to see what they are all about. Basically, a wiki is a very simple concept, reduced to its core it's the combination of four principles:

  • Automatic link generation
  • Editable content
  • Simplified formating
  • Backlinks

Upon realizing this, hordes of hackers from around the world flocked to write the shortest one for glory and fame. And thus the shortest wiki contest was born. As everybody expected the winner used Perl a mix of Perl and shell script. This all happened a long time ago (in Internet time) and now I'll tears apart one of these very little beast.

I won't bother with the Perl ones as I'm not masochist. In fact, I've actually already done it for wypy.py a couple of weeks ago. It's the best Python entry and also the best for a language that is not Perl. Even then, I've started with the 23 lines version not loose too much time, just enough. The actual entry to the contest was 11 lines long and contained only 814 characters, which is still nearly four times larger than the winning one, that's short! In this version, the code is at least divided into four functions. It begins with a simple one, load, which returns the content of a text file if it exists, else a string:

# ex is a shortcut for os.path.exists
def load(n): return (ex('w/'+n) and open('w/'+n).read()) or ''

This isn't really exciting, just a good usage of logical operators. The next one has more meat to it, let's see the code first:

def fs(s): return reduce(lambda s, r: re.sub('(?m)'+r[0], r[1], s), (('\r',''),
   ('(^|[^A-Za-z0-9?])(([A-Z][a-z]+){2,})', lambda m: (m.group(1) + '%s<a hr' \
    'ef=wypy?%s'+m.group(2)+'%s>%s</a>') % ((m.group(2),'p=','&amp;q=e','?'),
   ('','','',m.group(2)))[ex('w/'+m.group(2))]),  ('^\{\{$','\n<ul>'),
   ('^\* ','<li>'),  ('^}}$','</ul>'),  ('^---$','<hr>'),  ('\n\n','<p>'),
   ('(ht|f)tp:[^<>"\s]+','<a href="\g<0>">\g<0></a>')), cgi.escape(s))

It look awful, but upon closer inspection we see that the noise is mostly regular expressions and some compact markup in a format string. All in all, it's a fold taking the escaped input string as initial value and reducing a list of tuples which are composed of a regular expression and either of a string or a function. These happen to be the two first arguments of re.sub which is the function used in the lambda expression fed to reduce. There's one little unexplained detail in there, why does '(?m)' is prepended to every regular expressions? After some googling, I've found this explanation:

Caret and dollar match after and before newlines for the remainder of the regular expression. (Older regex flavors may apply this to the entire regex.)

Well, I'll confess I don't truly understand what this is suppose to mean, but it doesn't look that important anyway, let's move on.

Now the do function, obviously the one actually doing something! It's a kind of dispatching function (using a dictionary) performing different type of actions depending on the value of the first parameter:

def do(m, n): return {'get':'<h1>WyPy:<a href=wypy?p=%s&amp;q=f>%s</a></h1>(' \
   '<a href=wypy?p=%s&amp;q=e>edit me</a>)<p>%s' % (n, n, n, fs(load(n)) or n),
   'edit': '<form action=wypy?%s method=POST><h1>%s<input type=hidden name=p' \
   ' value=%s> <input type=submit></h1><textarea name=t rows=15 cols=80>%s</' \
   'textarea></form>' % (n, fs(n), n, load(n) or "Describe %s" % n), 'find': 
   ('<h1>Links: %s</h1>' % fs(n))+fs('{{\n* %s\n}}' % '\n* '.join(filter(
    lambda s: open('w/'+s).read().find(n) > -1, os.listdir('w'))))}.get(m)

It is quite simple actually, get returns a wiki page, edit is obvious and find make a page containing a list of links. The main function then wires up everything to be called by a CGI process and also add a page title.

def main(f): 
   n = f.get('p') or env("QUERY_STRING") or ''; n = ('HomePage',n)[n.isalpha()]
   print "Content-type: text/html; charset=utf-8\r\n\r\n<title>%s</title>" % n
   if env("REQUEST_METHOD") == "POST": open('w/'+n, 'w').write(f['t'])
   print do({'e':'edit', 'f':'find'}.get(f.get('q')) or 'get', n)

That's it, short and sweet. A last interesting point, the only code that I had trouble understanding was this: ('HomePage',n)[n.isalpha()]} which is quite confusing on first sight. It cleverly use the implicit conversion from boolean to integer to choose between the two items of a tuple. It's in fact an old school way of simulating the ternary operator, that feature having been added to Python 2.5 in 2006. That trick might be well know amongst pythonitas, but I'm not one of them as I rarelly code in Python.

While dissecting this code I've refactored (or might I say degolfed) it into a 45 lines version that you can grab here. I've got a nearly finished version of this code translated to Clojure, I'll follow up on this post once it's done, but that may take a while as I'm having fun with the recently released Closure Library.

P.S.: Ward's Wiki is down at the time of this writing so the two first links of this post might not work for some time. You can try to access them through Google's cached links.

2009-11-15

Hello World!

Hello world! I have currently nothing interesting to say, just posting to see how posts are looking with my selected theme. I have to fill up this post a little bit, so that I'll be able to better judge the appearance of a sufficiently large body of text. How much more completely useless text I still have to write for that state to be attained is not yet clear in my mind. Maybe I'll continue until I'm too tired, or not. There's a chance I'll write it in pieces, spread over many days. Anyhow, this isn't very important, even for a post hopefully devoid of any meaning.

Going the Lorem Ipsum way isn't an option, though, as I'm hopelessly tired of reading that text. Moreover I don't even read Latin. Which bring me to a potentially interesting subject, that is: "How can we assess the pertinence of a given text?" It's quite hard to determine how interesting such a subject is and I fear I could not reveal all of its intricacies. Well, let's not discourage ourselves with these petty concerns and write some more. First and foremost, it's more than obvious that pertinence is a relative concept. Not everybody would be equally interested in reading some collection of more of less related words arranged according to a somewhat coherent set of semantic rules.

And how much the actual meaning is affected by styling concerns? Can it be actually obscured by them? Does a poorly written piece of text containing the same meaning as a masterpiece is necessarily of lesser value. Maybe it can have some advantages over the better one, like it could be of a smaller size, thus requiring less time to read. Is this example even possible? Theoretically and on the top of my head I'd say yes, but that would be more an exception than a rule.

OK, now this post is starting to look like it may be big enough. At last, the end of the tunnel is near, I can feel it. I'll stop anytime now...

No, that wasn't really it. There's still more words for me to write and for you to read, however silly they are. Those two actions will evidently not be done at the same time, unless referring to myself as the reader. And here's another potentially interesting question arising: "Can we write and read at the same time?" Well not really interesting, I would even say that nobody care about that question outside some neurologists somewhere found on this planet. Which bring me to another question, this one incredibly more asinine than the previous ones: "Is there neurologists on other planets?" I'll abstain on commenting on this one though.

In the end, I just hope I'm not making too much sense as it risks to contradict the main objective of this post. I also wonder how many grammatical and styling mistakes I managed to make. There's certainly some vestigial artifacts of my mother tongue here and there. Does punctuation is well used? Did I correctly pluralized each and every words? Will some benevolent readers point them out for me?

P.S.: As usual I have more questions than answers!

About Me

My photo
Quebec, Canada
Your humble servant.