My first ruby
I’m going to talk to you about the first ruby script I ever wrote. It’s called MY first ruby, but I promise it won’t all be about me. There’ll be a little bit of personal history, just so you can get an idea of what sort of programmer I was at the time, but most of the talk will focus on the script itself.
Me in 2010
First the personal history, which I promise won’t take up much time. This is me now. But I wrote this script in 2003, not 2010. So we have to go back in time. So step into my time portal and come with me to the late summer of 2003…
Me in 2003
Cast your mind back. Things were different then. Summers were longer, the sun was brighter, I had more hair, and ruby 1.8.0 (the grandfather of the version you all know and love) had only just been released that August.
I was in the dying stages of my first job out of university. I was interviewed to work primarily as a Java programmer; but I ended up doing a load of C++, a bit of VB and some Python GUI development. It was a great company, but the management decided to move the company offices to Cambridge, which I felt meant I was living the wrong way around; you commute into London, not out of it. I’d never done any Ruby programming, but I did have a taste for dynamic languages having done a lot of (not very OO) Python programming (we used tuples a lot). I still considered myself a Java programmer, not least because I’d just accepted a job at a Java-based SMS gateway company.
An introduction to ruby
I heard of Ruby because my friend, James Adam (pictured), had been doing a PhD instead of working and while he flirted with Java and Python for a couple of early instalments of his thesis code, he finally settled in about 2001 on Ruby for the majority of his code, and wouldn’t stop going on about it to anyone that would listen.
The prime vehicle for James’ evangelism of Ruby was to send messages to a mailing list that our university classmates had set up to keep in touch after we graduated.
eGroups
This mailing list was originally running on egroups. Which was fine. It hosted files and polls and had a web interface. It was pretty sweet for the year 2000.
Yahoo! Groups
Then they got bought by Yahoo! and it became Yahoo! Groups! Basically the same service, but it had a red and blue logo instead of a purple one, and yellow was banished from the UI.
However, and here’s where we finally get to the ruby script, Yahoo! Groups! started dropping emails intermittently, or taking several days for emails to show up.
As you can imagine, this sort of delay in the inane ramblings of a group of 20-somethings debating the merits of their first jobs during the dot-com era was TOO. MUCH. TO. BEAR!
Yahoo! Groups - Cancelled
So despite thousands of available choices, and even a single click install of mailman on our shared hosting1, we decided to write our own replacement.
James somehow convinced us that we should write it in ruby even though he was the only one who knew any ruby. James knocked up a simple test script as a proof of concept and we decided to go ahead.
Iʼm ready to do ruby…
And so, this talk is called “My First Ruby”, and while it’s true that it’s my first ruby, it’s not like I wrote it alone. After James wrote the prototype we both had a free weekend on the 13th September, 2003 and knocked up the initial version together.
…with some help
My first ruby script was written under the guidance of someone who had been using it for a couple of years. This probably isn’t that different to many of you though; any code you’ve written has hopefully had the benefit of other people at your job working on the same project, if you’re lucky you’ve even been pair programming with them. Even if it’s code you’ve written at home, chances are you’ve put it up on github and have the chance of thousands of rubyists looking at it. Don’t be scared of letting other people look at your code.
Reading code
Speaking of reading code, Chris Lowis, LRUG’s resident podcaster, recently wrote a great blog post about open source rails projects and what you can learn from reading the source of them. I think it goes both ways; if you read other people’s code you learn a lot, and if you let other people read your code not only do they learn from you, but so will you when they critique it. They’ll suggest patches for edge-cases you didn’t cover or even a neat re-factoring. It’s also a nice confidence boost when you read some code for say, gemcutter, and you notice something you think could be improved.
Anyway, aside over.
My first ruby
So. Here’s the first code I committed to the project.
This probably isn’t the actual first bit of ruby I ever wrote. As I already mentioned, James had hacked up a prototype and the first commit to our source control was timed at around lunchtime on the Saturday. It’s 7 years ago so I can’t remember exactly what we did that day, although I do recall spending a lot of the morning hunting around for a spare ethernet cable. I’m pretty sure we did some hacking on the code before we decided that CVS2 might be a good idea.
Anyway, I think there’s plenty in here that’s worth talking about. Why did I add this “empty constructor”? The commit message says it’s to allow YAML to make the object good. I’m not sure I know what that means, and on the face of it, it looks like I’m just stamping some error checking on here. I suspect however I was just experimenting with all the fun new things that Ruby can do.
Java vs. Ruby
To compare what this felt like to me in 2003 it’d help to compare the final code to how I’d do the same thing in Java.
(Well, I think so anyway; my Java is rusty). This was pretty amazing! So much less code! First, there’s the fact that by allowing default values in method signatures I can get rid of that entire 2nd constructor. Then there’s using the if statement as a statement modifier, by placing it at the end. I don’t know why, but I’m a massive fan of this format, and I think it’s one of the reasons that I get ruby. It just reads so neatly. Finally, there’s a lack of extraneous syntax.
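As a rough illustration of those two features (this is not the code from the original commit; the Subscriber class and its fields are made up for the example), default argument values and a trailing if might look something like this:

```ruby
# Hypothetical class, invented for illustration only.
class Subscriber
  attr_reader :name, :addresses

  # Default values let one initialize cover both Subscriber.new and
  # Subscriber.new("name", [...]), where Java would need two constructors.
  def initialize(name = nil, addresses = [])
    @name = name
    @addresses = addresses
  end

  # Statement-modifier form: the condition trails the action.
  def validate!
    raise ArgumentError, "name is required" if name.nil?
    self
  end
end
```

Calling Subscriber.new with no arguments gives the “empty constructor” behaviour without a second method body.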
But, you didn’t come here to listen to a talk about why ruby syntax is better than java. And to be fair to Java in 2003, the syntax was nothing like the mess it is now with generics and annotations.
I think having shown my first “committed” ruby, it’s time to talk about the system as a whole rather than go through it and pick holes in every commit of mine.
A content warning
Now I have to warn those of a sensitive nature, for reasons best left unexplored we decided to call our new mailing list software after a favourite insult from our university days. And I’m not going to be able to avoid saying it or showing it on screen, so I have to warn you;
Fucknut
We called it:
Fucknut
Fucknut is a mailing list with an attached web front-end for viewing the archived messages and attachments and managing your user account. Basically, it’s a less accomplished version of mailman.
Fucknut architecture
The main component of fucknut is the part that processes mail, and this is also the oldest part of it as it’s based on James’ initial prototype.
It starts with a .procmailrc file. For those that don’t know, procmail is a UNIX tool that you can get to run against every mail that is delivered to your shell account and the .procmailrc file controls it. You can think of it like a rails routes.rb file for mail (except it doesn’t use ruby or have a nice dsl).
You define a regexp to match against some part of the incoming mail and if it matches you can decide what to do, for example forward the mail to /dev/null, or invoke a script on it (it passes the mail in via STDIN). You also decide if you want to stop processing or continue to see if it matches other rules.
For fucknut we have a rule that matches against the TO (or CC) address, and if it matches the list address we ask procmail to invoke a list handler script for that list.
These handler scripts are slim wrappers that set up the environment for the mailing list processor and then pass it the mail as a ruby object. We use Rmail for this (not the Tmail or Mail gems which you may be more familiar with).
This mail part of fucknut also uses YAML::Syck (which is now the default YAML parser in 1.8.x so you’re just using YAML now3) to deal with some configuration stuff and Net::SMTP to send out email.
We’ll cover this in detail later, but having received an email it stores it in an archive db and then sends it on to the other users on the list.
handleMessage
This is the main method from the mail processing script and it describes the route that the mail takes through the system.
The first thing it does is make sure that the sender of the mail is one of the users. We’re all digital natives so a user is allowed to have several email addresses attached to their account and can post from any of them.
If it’s not a valid user the mail is discarded. If it is a valid user we continue processing it.
The next thing we do is set the from address to the user’s preferred posting address. I might send email from my work account, but I don’t want people using that account to mail me (and this was important at the time because my work email address changed about 4 times as the company underwent furious rebranding every few months at the whim of our VCs) so we tell fucknut to make it seem like all my mail comes from my personal account.
We then massage the subject of the mail to add our list identifier and keep “re:”s down to a minimum.
Then we do various things to the headers, mostly required of us by the 12 hundred RFCs that there are about mailing lists4.
Then we process any attachments to save them to a separate data store. And, because we wrote this in 2003 when many people were still on dialup, remove any attachments over a certain limit.
Then we archive the message to our database (for which read dump the raw mail to disk).
Finally, we go through the complete user list and send the mail out to all the users. Including the sender.
And that’s it. That’s Fucknut at a glance. Now I’ve described the system I’ll go over some of the code that I think is particularly terrible.
Regarding ‘R E :’
Originally we were just going to add the list name in square brackets to the start of the mail, but then we realised we had to do something to prevent various mail clients messing up the re: re: re: re: stuff. After a tortuous requirements gathering thread we decided to settle on [list name] <original subject> and Re: [list name] <original subject without any re:>. As you might be able to see, this took 3 days and 23 messages to argue about and decide to do the thing we were going to do anyway. A further argument, if you ever needed it, for not starting a bikeshed discussion if you can possibly get away with it.
There was nothing else about this app that involved this level of debate.
processSubject
This code is pretty bad. I’m massaging a string, so I should be using regexp here. I don’t necessarily agree with using regexp for everything but doing all this string manipulation here would be much better done with regexp. It wasn’t until a few months into doing ruby professionally (2005-ish I think) that Jon Lim (who I was working with at the time) asked me why I kept using .slice and [] all the time instead of .gsub with a simple regexp. I think this was a hangover from my Java days where regular expressions were perceived as slow and crappy, and strings were immutable so you did everything with StringBuffers.
processSubject (refactored)
I’m pretty sure, that even despite using regexps, this is easier to read and understand what’s going on.
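For readers without the slide to hand, here is a minimal sketch of the agreed rule ([list name] <subject> for new mail, Re: [list name] <subject> for replies) done with a regexp. It is an illustration and not the refactored code from the talk; the method name, argument names and exact pattern are assumptions.

```ruby
# Hypothetical helper, invented for illustration only.
def process_subject(subject, list_name)
  tag = "[#{list_name}]"
  # One pattern for the whole leading run of "Re:" prefixes and list tags.
  prefixes = /\A(?:\s*(?:re:|#{Regexp.escape(tag)}))+\s*/i
  leading = subject[prefixes].to_s
  rest = subject.sub(prefixes, "").strip
  # Collapse however many "Re:"s there were down to a single one.
  reply = leading.match?(/re:/i)
  "#{'Re: ' if reply}#{tag} #{rest}"
end
```

The whole decision lives in one pattern, so there is no index arithmetic with slice to get wrong.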
Extending from Hash
The next thing we do is add list headers. We store those as part of the system config and have this object ListConfiguration, described in a comment at the top as “an extended Hash with some little things to make it’s use more convenient”.
You know what, I’ve never ever used a hash in ruby and thought, “Gee, I wish this was more convenient”. I can’t say I even really think that the new 1.9 hash syntax is that much better. It also shouldn’t extend Hash, it should contain a Hash and delegate the bits of the Hash API that I want and then provide its own methods where I want more convenience.
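A sketch of what “contain a Hash and delegate” could look like, using Forwardable from the standard library. The set of delegated methods, the "list_address" key and the list_address method are assumptions for the example, not details of the real ListConfiguration.

```ruby
require "forwardable"

# Hypothetical rewrite: wraps a Hash instead of inheriting from it.
class ListConfiguration
  extend Forwardable

  # Expose only the parts of the Hash API the rest of the code needs.
  def_delegators :@settings, :[], :fetch, :key?

  def initialize(settings = {})
    @settings = settings
  end

  # A convenience method layered on top; the key name is made up.
  def list_address
    @settings.fetch("list_address")
  end
end
```

Anything not delegated (delete, merge! and so on) simply isn’t available, which keeps the object’s surface small.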
listHeaders convenience method
This is one of those methods. Clearly there’s some premature optimisation going on here. Maybe I misunderstood YAML backed Hashes and thought it would always be hitting the YAML file. Even if I did, don’t optimize until you have to.
Given all that, you know what would be more convenient than having to write that method in the first place…
A more convenient listHeaders
…this.
We mostly assigned instances of ListConfiguration to a @config variable when we use it, so just treat it like a hash.
Or, if I wanted to save 1 char typing whenever I accessed the list headers…
Another more convenient listHeaders
… we could define a listHeaders method and use that on the @config instance. But, really, the first thing is better, I’ve genuinely no idea what was going on here.
The original listHeaders method again
The final weird thing about this ListConfiguration object is, if we look back at that listHeaders method you’ll notice we don’t use symbols or strings as the keys into the hash. We use constants which are defined at the top of the Config module. For example…
String constants
…like these.
Why? I don’t know. It just doesn’t make any sense, until you remember my Java roots where these sorts of “magic” strings would be defined as public static Strings because you’d only want to create that object once (woe betide the Java programmer in the early 2000s who went around creating more objects than they strictly needed to). Thing is, Ruby has symbols which save one char on typing a string and are more idiomatic. I can only assume I didn’t know about symbols when I wrote this. I probably wanted some comfort that I wasn’t spelling a config key wrong and causing a nil error so shied away from using strings. Using constants means I’d get a runtime error saying there’s a missing constant instead of a nil error somewhere down the line.
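If the worry really was a misspelt key quietly returning nil, one alternative (a suggestion, not something the original code did) is to keep symbol keys and look them up with Hash#fetch, which raises KeyError for a missing key. The setting helper and the config contents below are invented for the example.

```ruby
# Hypothetical helper: fail at the lookup, not three methods later.
def setting(config, key)
  config.fetch(key) # raises KeyError if the key is missing
end

config = { list_headers: { "List-Id" => "<list.example.com>" } }

setting(config, :list_headers)    # the headers hash
config[:list_haeders]             # a typo with [] just gives nil
begin
  setting(config, :list_haeders)  # the same typo with fetch raises
rescue KeyError => e
  warn "bad config key: #{e.message}"
end
```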
attachment processing
The next bit of code to talk about is this, it’s a chunk of the attachment processing method.
We get here if the message is multipart, and this fragment is run on each part.
We extract and store all the attachments, but we also remove them entirely if they are over a certain size.
This code is actually not too shabby. It’s quite long because there are loads of edge-cases. Over the years, this is the part of the code that’s seen the most changes. Turns out multipart mime messages are hard and you can nest things and it gets weird. Whatever your naïve approach is, it’s going to crumble as people send richer and richer messages from more and more esoteric mail clients. In fact I had to fix a bug in it only a month ago, someone’s mail client started sending nested multipart messages with multipart/alternative.
What you’re looking at is what the code used to look like. Some of you may already have noticed the bug.
If the part of the mail we’re dealing with didn’t have a filename (such as multipart/alternative which is effectively nesting another set of parts), then our code logged the error but then blindly continued on assuming that it had a filename it could do something with. It never came to bite us until someone sent a multipart/alternative message.
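The shape of that bug, and of the fix, can be shown without the real mail objects. This is a stand-in sketch: the Part struct, the attachment_names method and the log array are all hypothetical, and the original code worked on Rmail parts rather than structs.

```ruby
# Stand-in for a MIME part; invented for illustration only.
Part = Struct.new(:content_type, :filename, :body)

# Returns the filenames of the parts that would be stored.
def attachment_names(parts, log: [])
  parts.each_with_object([]) do |part, stored|
    if part.filename.nil?
      log << "no filename for #{part.content_type} part, skipping"
      next # the step that was missing: stop handling this part
    end
    stored << part.filename
  end
end
```

Without the next, execution falls through to code that assumes a filename exists, which is harmless right up until a part without one arrives.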
This is something I hope would have been fixed by TDD. With TDD I’d probably have mocked the log or file.size calls and built the function up slowly. But, I may not have caught it with TDD because I might never have thought to try out a nested multipart file.
The other thing it shows is that the real world will always conspire to break your code. If I had tests though I probably would have been able to feed the mail that broke it into the test suite and find out what tests suddenly failed.
handleMessage (again)
I’ve covered a few of the little refactorings I’d do to individual methods, but I think the whole thing could do with an overhaul. It’s pretty much one class and it does everything. I think something like the following…
A pipeline architecture
…might work. It’s really a pipeline and, like Rack, it might make sense to have a chain of smaller classes linked together, they all take in a mail message and do something to or with it and then pass it on. That way everything can be tested properly in isolation and it reduces the coupling between things.
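To make the idea concrete, here is a small sketch of such a chain. It is a simplification (a flat list of stages rather than Rack’s nested middleware), and every class name in it is hypothetical. Each stage responds to call, takes a message, and returns it, or returns nil to stop processing.

```ruby
# Stand-in message type; invented for illustration only.
Message = Struct.new(:from, :subject)

# Drops mail from anyone who is not a known user.
class RejectUnknownSenders
  def initialize(known_addresses)
    @known_addresses = known_addresses
  end

  def call(message)
    @known_addresses.include?(message.from) ? message : nil
  end
end

# Adds the list tag to the subject if it is not already there.
class TagSubject
  def initialize(list_name)
    @tag = "[#{list_name}]"
  end

  def call(message)
    message.subject = "#{@tag} #{message.subject}" unless message.subject.start_with?(@tag)
    message
  end
end

# Runs the stages in order, stopping as soon as one returns nil.
class Pipeline
  def initialize(*stages)
    @stages = stages
  end

  def call(message)
    @stages.reduce(message) do |current, stage|
      break nil if current.nil?
      stage.call(current)
    end
  end
end
```

Each stage can then be tested on its own with a single Message, which is the isolation the paragraph above is after.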
RAA search
That’s the first part of fucknut covered, so now we’re onto the 2nd part, the web front-end.
2003 was a dark time for web development in ruby. There were lots of little libraries, but nothing as comprehensive as Rails, certainly not the rails we have now, but not even the rails we got in 2005. The thing that may surprise you is that we didn’t even have gems back in 2003. The first release of gems was in March 2004. To find ruby stuff we scoured something called the Ruby Application Archive which was a website similar to what rubyforge is now5, where you could list ruby projects and categorise them. Except it did no hosting, you just pointed the links to where the data was.
On top of this, someone wrote a program called raa-install, which would go and find projects on RAA, download them and install them. At this point most libraries, if they had any installation, used the ruby setup.rb incantations, and raa-install would run those for you too. It didn’t do dependency graph information, but that’s because this info wasn’t on RAA. The thing that’s not clear to me looking back, is why gems and rubyforge came along when there was already this in place. I’ve not looked into it nor looked at the code. There’s probably an interesting story there.
So, as an aside, I know there’s a bit of hate for gems right now6, mostly about issues over dependencies and keeping applications and system stuff independent. I wouldn’t worry about it though, rubygems is not the first ruby code distribution and management system that’s existed, so maybe it won’t be the last. If it’s served its purpose perhaps it is time to move on; be that bundler, rip or something else.
RAA entry
Anyway, after a couple of false starts we settled on something called narf, which appeared to be the most high-level thing at the time. Now, remember this is me talking about narf as it was in September 2003. It was version 0.3.4 then, and it got up to 0.7.3 before development appeared to stop in 2005, and some of the docs imply it was headed in a direction that would abstract things further.
handler.rb
So, the first thing that seems weird to me, is that narf isn’t just a library that you include into your scripts. It also comes with an executable. As it turns out this executable is there to redirect any exceptions and errors from the ruby process back into the CGI environment. But this fact is buried as an aside in the docs. It’s weird.
Apache config
There’s no nice abstraction of routes, but then that’s not surprising, it’s not a higher-level MVC framework. It’s a web framework. It abstracts the first level of CGI interaction, it doesn’t build on top of it to give you what rails or sinatra gives you.
So, you run it as a CGI script, although you could have used mod_ruby (no, not passenger, the mod_ruby that existed ages ago that nobody used). And if you’ve installed ruby_fastcgi
7 you can run it as fastcgi. So, it’s very bare bones, if you want fancy urls you have to write them yourselves using RewriteRule directives. This is ours.
We clearly got bored, as there are a few more URLs in the webhead other than looking at the archives, but we clearly couldn’t be bothered with making them pretty, most likely because we’d have to write them in this non-expressive regex format.
That said, I’m pretty sure early versions of rails asked you to do the same thing. I could be wrong though. Already it should be clear we are working at a lower level of abstraction. We’re close to the metal here.
narf API
Having asked apache to invoke your script, you require the narf libraries and this gives you a Web object. This object is what you interact with to communicate with the webserver. This is a fragment of our main CGI script.
Apart from showing off my naïve ruby stylings (4 space indent! collect instead of map!) this explains a lot of the narf api:
- Web[] to get params that were sent with the request
- Web.print_template to invoke some template processing on a file providing a list of variables
- Web.flush to send everything back to the webserver
narf view templates
This is what that template looks like.
As you can see here, there are 2 ways of rendering data in these templates.
The first is that, moustache style, it’ll evaluate and render the results of any expressions within {} braces. The key/values in the hash you provide to print_template are available as $vars for evaluation. Much like the :locals hash when rendering a rails partial.
The other way of interacting is to use these <narf:*> prefixed tags. Think of them like rails helpers, except instead of looking like code, they look like HTML. The web community swings back and forth on this sort of thing every so often, should the code in our views look like code (front-enders keep your hands off!) or should it look like markup (front-enders get stuck in!). I can think of only one templating engine for rails (radius which is used by radiant) that does it this way though.
Some of the narf tags, like <narf:foreach>, would emit things and you could use what was emitted inside the curly braces. As far as I can tell though, the braces are just for evaluating simple expressions, no logic. If you wanted logic you have to use narftags.
Anyway, that’s a whistlestop tour of narf as it was. To be honest it’s clearly early days and some of the things littered in the documentation suggest it was pointed in the right direction (it came with a testing framework and the docs suggested building the app test first using that framework).
LoginHandler
showLogin is effectively the render action for the "login" command. If the user isn’t logged in we want to show them the login page, so we’ll call .showLogin on the LoginHandler.
So, what’s going on here? Again I’ve defined strings as constants when they really didn’t need to be, it’s all internal. I’ve also, for no reason, abstracted the call to print_template out into calling it and the args I’d want to pass to it. I think it was just excitement at using the splat operator. Nowhere in the code do I ever call loginTemplateArgs except in this method, and I can’t think where I’d want to given that the showLogin method is simply a pass through to print_template. The only way this might make sense would be if it was like this…
LoginHandler#showLogin refactored
Maybe somewhere else (and I use this form for other handlers and their “actions”) I might want to render a template that shows a fragment of the login ui, and so grabbing the params that it needs from the LoginHandler might make sense.
But I don’t. This just makes it more complex, and it should really be…
LoginHandler#showLogin refactored more
It’s just simpler.
“controllers”
You’ll have noticed that I talk about LoginHandler. Narf doesn’t give you any controller framework, so we came up with our own. There are 3 handlers:
- LoginHandler - this deals with user sessions
- UserUpdater - this deals with letting the user manipulate their details from the user database
- ArchiveDisplayer - this deals with showing archived messages
They all have the same constructor: takes the Web object and the path to the fucknut root. They all have a .do method which looks at the cmd param of the Web object and acts accordingly. For example if the cmd is 'showmsg' in the ArchiveDisplayer, it finds the requested message and displays it. If the cmd is 'update' in the UserUpdater it fetches the current user’s details and updates them based on the POSTed params.
That’s where consistency ends though. LoginHandler returns a Session object (our own wrapper to a couple of methods on Web) and has other methods for rendering templates like showLogin above. For the other 2 calling .do is effectively an action endpoint and that handler will deal with everything from then on.
It’s clear that LoginHandler and the other 2 aren’t really the same sort of thing, and yet I’ve made them look the same. The fact that I was using them differently meant I should have realised that they were different things, or that I was doing something else wrong. A LoginHandler could easily have acted like the other 2 (where .do is a render endpoint) if I’d had some other object that deals with whether the user is logged in or not. Frameworks give you rules and consistency; when you go it alone without much thought you end up with messes like this.
Also, looking at the code for the main handler script and these other cmd handlers I’m amazed at how much plumbing has gone into my code to determine what to do based on the params, as opposed to actually doing it. With a higher level abstraction (like rails or sinatra routing) I can get on with saying: “this url means this code gets run”.
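Even without a framework, most of that plumbing can collapse into a lookup table. The sketch below is hypothetical (CommandRouter is not part of narf or of the original code) and only shows the shape of “this cmd means this code gets run”.

```ruby
# Hypothetical router, invented for illustration only.
class CommandRouter
  def initialize
    @routes = {}
  end

  # Register the block to run for a given cmd value.
  def on(cmd, &action)
    @routes[cmd] = action
    self
  end

  # Look up the cmd param and run its action, or a fallback.
  def dispatch(params)
    action = @routes.fetch(params["cmd"]) { ->(_params) { "unknown command" } }
    action.call(params)
  end
end

router = CommandRouter.new
router.on("showmsg") { |params| "showing message #{params['id']}" }
router.on("update")  { |_params| "updating user" }
```

The handlers then contain only the doing, and the deciding lives in one place.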
UserUpdater
Let’s look at some code in one of those handlers: UserUpdater.
This is some of the code that is run when the 'update' command is sent to the handler.
This code is directly inside the UserUpdater#do method. It’s not even refactored out into its own method! My mind reels at this nowadays, but clearly not in 2003. We’ve got:
- view code - I’m building HTML fragments that later I send to a template.
- model code:
  - data conversion - converting params from strings into objects
  - data validation - checking that the data isn’t nil or invalid
Clearly there’s the shock that I’ve had to hand code all this, then there’s the shock that it’s all there mixed up in one method and finally that although we have a User class it didn’t even cross our minds to keep this logic inside there. There’s nothing inside that User class that deals with validation. If I didn’t check that the sendTo mail address was valid here, it would be saved regardless by the User when I asked it to later in the method.
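A sketch of what keeping that check inside the model might look like. This is not the real User class: the send_to attribute name, the deliberately loose address pattern and the block-based save are all assumptions made to keep the example self-contained.

```ruby
# Hypothetical model, invented for illustration only.
class User
  # Loose pattern for the sketch; real address validation is harder.
  ADDRESS = /\A[^@\s]+@[^@\s]+\z/

  attr_reader :send_to, :errors

  def initialize(send_to)
    @send_to = send_to
    @errors = []
  end

  def valid?
    @errors = []
    @errors << "send_to is not a valid address" unless ADDRESS.match?(send_to.to_s)
    @errors.empty?
  end

  # Refuses to persist invalid data; the block stands in for real storage.
  def save
    return false unless valid?
    yield self if block_given?
    true
  end
end
```

With this in place the handler no longer has to remember to validate before saving; the model won’t let bad data through.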
URL template
Finally, I want to show some code from the ArchiveDisplayer. The code in here isn’t actually so bad. Maybe it’s because I wrote this in January 2004 in a week I had off between jobs. I was clearly more learned, or maybe it’s because once you get past the web stuff, what the code is doing is fairly straightforward. It just has to go through our disk based archive structure:
archive/<year>/<month>/<message_id>.msg
and display the messages. It has some of the flaws already discussed in that everything happens in one class even when it probably could be decomposed more; it has a mix of view code and model code. But actually, looking back at it, although the framework is unfamiliar, some of the methods look not too far from what I’d write as view helpers today.
page navigation
Apart from this one.
That doesn’t look too bad, until you look at the rest of the method (this is just a fragment). That last if statement (which gets the navigation link or placeholder for taking you from the page you are on to the page for the previous year) is repeated almost in its entirety to get the navigation links for the previous month, the next month and the next year.
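The repetition could be folded into one helper called four times. The sketch below is hypothetical: the URL layout, the markup and the idea of passing in the list of [year, month] pairs that have archived mail are all assumptions, not the original code.

```ruby
# Hypothetical helper: a link if that month has mail, else a placeholder.
def nav_link(label, year, month, available)
  if available.include?([year, month])
    %(<a href="/archive/#{year}/#{format('%02d', month)}">#{label}</a>)
  else
    %(<span class="disabled">#{label}</span>)
  end
end

# Builds all four links from the one helper.
def navigation(year, month, available)
  prev_year, prev_month = month == 1 ? [year - 1, 12] : [year, month - 1]
  next_year, next_month = month == 12 ? [year + 1, 1] : [year, month + 1]
  [
    nav_link("previous year", year - 1, month, available),
    nav_link("previous month", prev_year, prev_month, available),
    nav_link("next month", next_year, next_month, available),
    nav_link("next year", year + 1, month, available)
  ]
end
```

Any change to how a link or placeholder is rendered now happens in one place rather than four.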
I remember during my final year in university doing a prolog exercise and getting a good mark for it because somehow I’d managed to bend my mind into making it do a reasonable job at playing noughts and crosses (for a partial board) without using all the memory on the planet. However one of the negative remarks was for a section of code where I’d repeated some lines without abstracting them into another method call. “Surely we put this sort of copy and paste behind us in CS1001?”
Apparently almost 4 years later I was still doing it.
The end
So that’s my first ruby. And before I go, I just want to explain why I gave this talk. Mostly it’s because I hope that after having me come up here and show you this code from 7 years ago, and how bad it is, more people will want to get up and show off their code in future meetings. Either, as I’ve intended, by showing that everyone writes bad code and you needn’t be worried about it. Or, unintentionally, because you’re worried that if you don’t I’ll turn this into a series of lectures where I talk you through every piece of ruby code I’ve ever written.
Thanks.