recurs.es

talking to myself

Bonsai and Elasticsearch and Tire

Working on a client project where we use a statically generated Elasticsearch index. We recently noticed that bonsai.io was available on Heroku for free public beta usage. Previously we self-hosted on Amazon. Obviously it’s nice to have everything in one place, so we decided to move.

On the old system, we used Tire to generate the indices and mappings, as well as to generate the correct query strings and to wrap the results of such queries into nice structures. Needless to say, Tire was doing some heavy lifting for us. In particular, we used it to bulk-populate the indices, which made regenerating them on demand very quick, which is desirable when you’re trying to keep your publish times down.

One last important thing: Tire has an opinion about how you index models in your system – one index per model. This means that multiple types (logical indices) for a single model are possible – eg, having searchserver.com/items/english/<query> and searchserver.com/items/farsi/<query> – in the event you wanted to separate these ‘logical’ indices from Elasticsearch’s notion of a ‘literal’ index[1].

Bonsai has another opinion – one “literal” index per app. Naturally, there is some impedance to overcome.

To do this, I had to pull in two different changes from two lovely guys on the githubs. First was this pull request by dylanahsmith, which implemented the “PUT” mapping API, allowing mappings per logical index (‘type’ in proper ES terms). That got me halfway there, but bulk upload was also broken due to the difference of opinion.

So I pulled in changes from Evan-M, which I found via this issue. These change how you configure Tire so that it is aware of how Bonsai does its indexing.

Once I pulled both of these changes in, I ran the tests – got 4 failures. Fortunately, they’re the same 4 that are on master. To that end, I feel pretty confident about using it, especially since it is a reasonably non-interacting part of the app.

So, if you want to use my combined changes, you can find them here on the master branch. Be warned that these may not be totally operational in every way. Our use case is pretty small in scope, so it’s use-at-your-own-risk.
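For readers who never touched the pre-1.0 Elasticsearch API, here is a rough sketch of the two request shapes involved, assuming that era's API: a `PUT /{index}/{type}/_mapping` call, and a newline-delimited `_bulk` body whose action lines carry `_index` and `_type`. It shows several logical indexes (types) sharing one literal index, which is the layout Bonsai expects. The helper names and the `app`/`items_*` names are invented for illustration and are not Tire's API; types have also since been removed from current Elasticsearch versions.

```ruby
require 'json'

# Hypothetical helper: path and body for a type-level mapping request in the
# pre-1.0 Elasticsearch REST API (PUT /{index}/{type}/_mapping).
def put_mapping_request(index, type, properties)
  path = "/#{index}/#{type}/_mapping"
  body = JSON.generate({ type => { 'properties' => properties } })
  [path, body]
end

# Hypothetical helper: a newline-delimited _bulk body in which every document
# targets the same literal index and differs only by its _type.
def bulk_body(index, type, docs)
  docs.map do |doc|
    action = { 'index' => { '_index' => index, '_type' => type, '_id' => doc.fetch('id') } }
    # One action line followed by one source line; the body must end in "\n".
    "#{JSON.generate(action)}\n#{JSON.generate(doc)}\n"
  end.join
end
```

With one literal index per app, `items_english` and `items_farsi` become two types inside `app` rather than two separate indices.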

[1] the “logical” and “literal” terms are mine. ES calls them “Indexes” and “Types” – but I don’t think “type” is the right word for the thing.

Three Wrongs Make Me Syck

In which I track down a bug to three causes

I recently had to put this comment in a codebase for a client, to explain why I’d written a task that did this:

    task :hack_yaml_parser do
      YAML::ENGINE.yamler = 'syck'
    end

The text of the comment follows:


This dodges a bug with httparty and neo4j and psych/syck. The rundown is thus:

Neo4j uses a bareword response for certain queries (a bare value rather than an object or array at the top level), which is valid JSON in the world of the ECMA people, but not in the world of the RFC.

HTTParty can’t use the superior MultiJson library, because MultiJson interprets (with some force) the RFC version, and thus can’t handle the response. Therefore, HTTParty depends on the older, crustier “crack” JSON library.

Crack uses a – unique – approach to parsing JSON. It munges the text to be valid YAML, then defers its parsing to the Ruby stdlib’s YAML parser.

Ruby upgraded its YAML engine to “Psych”, which is based on libyaml and very snappy.

Unfortunately, some of the YAML that crack generates is less than perfect, so sometimes YAML.load throws a Psych::SyntaxError. Due to what I can only describe as an extremely unfortunate mistake, Psych::SyntaxError derives from SyntaxError, which follows up the chain to ScriptError, Exception, and Object.

Astute readers of Avdi Grimm’s book “Exceptional Ruby” will note that this means it’s not a subclass of StandardError, so neither a bare rescue nor library code that rescues StandardError will catch it (it sits in the same part of the class hierarchy as an actual Ruby syntax error).

So. You are left with a syntax error that the usual rescue clauses won’t catch. You can’t use the parser that would dodge this because it causes worse issues (eg, you can’t get single properties from a node). You can’t use the provided library because it can’t parse without error. What’s a hacker to do?

Well, you can revert to the older “Syck” YAML parsing library, which happily accepts the failing cases, and generates rescueable errors when it hits a YAML syntax bug.

Thus, this unfortunately terrible line of code is present to fix a bug that all boils down to someone deriving from the wrong class, someone else following the wrong rules, and a third person (you) being in the wrong place, at the wrong time.
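To see why the usual rescue misses this, here is a small sketch. `FakeYamlSyntaxError` is a hypothetical stand-in with the ancestry the comment describes (SyntaxError < ScriptError < Exception); the real Psych class isn't used, since its ancestry may differ between Ruby versions.

```ruby
# Hypothetical stand-in with the ancestry described above:
# SyntaxError < ScriptError < Exception, and not StandardError.
class FakeYamlSyntaxError < SyntaxError; end

# A bare `rescue` only catches StandardError and its subclasses,
# so this error escapes the method entirely.
def load_with_bare_rescue
  raise FakeYamlSyntaxError, 'bad yaml'
rescue => e
  "rescued by bare rescue: #{e.class}"
end

# Naming SyntaxError (or the class itself) explicitly does catch it.
def load_with_explicit_rescue
  raise FakeYamlSyntaxError, 'bad yaml'
rescue SyntaxError => e
  "rescued explicitly: #{e.class}"
end
```

The explicit form only helps if you control the code doing the rescuing; a library that rescues StandardError will still let the error through.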


So let this be a lesson: follow RFCs, not communities, if you can. Don’t derive from the wrong class hierarchy, and try not to be in the wrong place at the wrong time.
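One hedged way to cope with a bare top-level value when your JSON parser insists on the older RFC (which required an object or array at the top level) is to wrap the text in an array before parsing. This is a sketch of a general workaround, not what crack, HTTParty, or this project did; `parse_bareword_json` is a hypothetical helper.

```ruby
require 'json'

# Hypothetical workaround: wrap the text in an array so that even a parser
# limited to an object or array at the top level accepts a bare value such
# as 42, "done" or true, then unwrap the single element again.
def parse_bareword_json(text)
  JSON.parse("[#{text}]").first
end
```

It will also accept some malformed input (for example `1, 2` comes back as `1`), so treat it as a stopgap rather than a validator.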

I spent way too much time on this.

Fear

Fear

I’m sitting in an airport right now, about to embark on a living embodiment of fear.

I hate flying. Perhaps it’s just irrationality, or perhaps it’s that burden of knowledge that there is a thousand-ton vehicle about to go careening down a finite runway towards a seemingly-infinite ocean, only to, a few hours later, come careening down towards another finite strip of asphalt at exceptional speed. It’s just the physics that gets me. None of it seems at all safe – and while I know that there are far fewer plane accidents than train or car accidents each year, it doesn’t alleviate my terror at the prospect of breaking up at thirty-two thousand feet above our little marble.

I also hate spiders, and needles, and probably more things than I have a right to hate.

Suffice it to say that I am often afraid, almost always without cause. So why do I get on the plane? Why do I keep going? If I am terrified, why don’t I just not do any of it?

Family

So, I’m sitting here at Gate A-two-one – that’s what they keep saying, as if we wouldn’t figure it out if they just said “A twenty-one” – awaiting this coach of (what I imagine to be) death. I’m going to Sunny California, going to see my in-laws. As I await my inevitable demise, I quietly think to myself, “Fucking in-laws, if it weren’t for them, I could be home. Enjoying the cold and playing video games.” I’m burning off the rest of my vacation time for the year, so I’d be taking vacation in any case, but so motivated was I by that pretty-lady-who-feeds-me, that here I await the reaper.

So I’m motivated to fly, against (perhaps) my better judgement (and certainly against my will), but I will persevere to make that scary-wife-lady happy, because it’s Christmas, and while Santa Claus and Jesus Christ may be bullshit, my wife’s happiness isn’t. So how can I get through my fear? How can I deal with the fact that in less than three hours, I’ll be hurtling through the upper atmosphere?

Finding Confidence

When I’m coding, there are – many times – changes which need to be made, but appear to be hard. Changes that are sweeping, changes that involve large deletions, or philosophical changes to how the project is approached. Sometimes it’s about using a new tool to solve an old problem. Sometimes it’s about realizing a certain cross-cutting concern has been slicing into the codebase in a nonoptimal way. It’s a hard change because the team might be against it, or maybe the team is for it, but they don’t understand how the tool is “really” supposed to be used.

Similar to flying, we are often motivated by necessity to make these changes, but are hesitant due to fear. We’re afraid that the thing won’t work, that our whole infrastructure will come tumbling down, that everything will generally go to shit and we’ll get fired. Or worse, we won’t get fired, and will have to live with the shame as we pick up the tattered pieces of a product.

The risk may be great, and that risk is directly proportional to the fear it induces, but I’m reminded of a quote from one of my favorite books:

I must not fear.

Fear is the mind-killer.

Fear is the little-death that brings total obliteration.

I will face my fear.

I will permit it to pass over me and through me.

When it has gone past I will turn the inner eye to see its path.

Where the fear has gone there will be nothing.

Only I will remain.

– Bene Gesserit Litany against Fear, Dune, Frank Herbert

The problem with rejecting risk due to induced fear is that, oftentimes, that induced fear is phantasmal. It’s not going to crash and burn, it’s not going to drive the company into the ground. In fact, the opposite – it will build the company up.

Make Fear Fuel

I often peruse Ward’s Wiki. Amongst the thousands of amazing articles, there is one which I dearly love – MakeFearFuel. In it, the argument is made that, rather than shirking from risk due to fear, we should embrace it for the same reason. If you’re afraid of something, it’s natural, it’s okay. Fear is designed to protect you. However, Fear is not designed to paralyze you. Fear is a tool to enable you to run faster and further, to be more aware of your actions and surroundings – these latter two especially are of great use to a programmer. The fear from risk comes from the unknown – the inability to see all paths. However, that self-same fear can be harnessed, and made to work for you, to make your product better, to see further, to build something better.

As the article suggests, don’t think of it as fear, but rather as excitement! You must permit fear – not ‘allow’ or ‘be okay with’, you are in control of fear – to pass through you. Fear is subservient, not superior, and when you understand the use of fear as fuel, and moreover, when you begin to apply it to your life (and not just programming, but everywhere), I think you will find yourself more amenable to making good changes, better at mitigating undue risk, and better at recovering from problems when shit really does hit the fan.

For my part, I’m being called to board this black coach, to sit amongst my brothers and sisters before death. I will not fear, fear is the mind-killer, I will face my fear, and give fear the finger.

Lessons Learned: A Sortah Postmortem

Ruby is flexible

Flexible like a Russian gymnast. Seriously, it was never more than a few moments of thought to bend Ruby to my will. Still, there were some wiggly tricks I had to use to get some things to work, and there are still several things which need fixing. For instance:

This doesn’t work
    sortah do
      #...

      router :lenses => [:foo] do
        #...
      end
    end

This doesn’t work. Presently you have to write it as router :root, :lenses => [...], because the “parser” (mine, and in a way, the Ruby parser) doesn’t know how to handle an omitted first argument: the hash ends up bound to the name parameter. What I need is an optional leading positional argument, and I have to simulate it with defaulted arguments. This isn’t a new problem in Ruby; in fact, 2.0 is supposed to provide named arguments, which will make life a bit nicer. One way I could get around this is to hack *cough* enhance the parser to check the class of the first argument. So it would go from:

Current `router` parser from lib/sortah/util/component.rb
    def initialize(name, opts = {}, *potential_block)
      @name = name
      @opts = opts
      @block = potential_block.first unless potential_block.empty?
    end

to something like:

Proposed `router` parser
    def initialize(name, opts = {}, *potential_block)
      if name.is_a? Hash
        # Called as `router :lenses => [...]`: no name given, so default to root.
        @name = :root
        @opts = name
      else
        @name = name
        @opts = opts
      end
      @block = potential_block.first unless potential_block.empty?
    end

which would allow (I think) for defaulting. It might have issues with how I extract the optional block, but that would require actually applying this – which I’m not really willing to do. Mostly because it’s ugly, but also because I’m not sure it’s so bad to have to specify that your root router is in fact a root router when you want to run lenses. I think it makes more sense to not have lenses applied to the root router. Here’s my reasoning.

Almost every lens you run is useful in a fairly limited context. For instance, I may not want to share my spam filters between my personal email and my work email, since the types of email I get in my personal email are very different from the types I get in my work email. This is not something that occurs to you normally; what you think is, “I want to filter my email for spam,” so naturally you would tack the lens onto the root. The problem is that this argument could be made for pretty much every lens (“Ooh, wordcount, gonna need that” or “Hey now, definitely going to need the mailinglist name extractor”), so you end up with a pile of lenses on the root router, and suddenly your email sorting is thrashing the CPU and taking 20 minutes besides.

The practicality of sortah is this: routers are much more likely to be cheap than lenses. Remember that a router should boil down to a few conditionals and consumption of existing metadata; its job is more or less to delegate to other controllers. In a sense, routers are controllers, lenses are models. The former should be thin, the latter should be fat. Thinness here means both the supplied code to do the routing, and the number of lenses attached, should be small. To understand why, we can look at the algorithm used for evaluation:

Sortah core execution methods, #sort and #send_to. From lib/sortah/cleanroom.rb
    def sort
      catch(:finished_execution) { run!(@pointer) } until @pointer.is_a?(Destination)
      self
    end

    private

    def send_to(dest)
      @pointer = if dest.is_a? Hash
        Destination.dynamic(dest[:dynamic])
      else
        @__context__.routers[dest] || @__context__.destinations[dest]
      end
      throw :finished_execution
    end

@pointer is either a Router object or a Destination object. #sort is called with @pointer set to the root router, and calls a local helper #run! on the pointer. #run! first calls the #run_dependencies! function on the pointer, which runs all of the lenses (a noop if there aren’t any), and then does a self.instance_eval on the block contained in the router. This is what ends up calling the #send_to method[1], which sets up the new pointer based on the symbol it’s provided (checking to see if it’s a router first and a destination second, so a router will always override a destination[2]). #send_to then makes use of the niftiest thing I hadn’t known before this project: it uses throw to unwind the call stack, which has the pleasant effect of burning up to the nearest catch block that specifies the given symbol (the one in #sort does!). The punchline is that #send_to now acts like the return keyword. So we can write the nice:

Isn’t that pretty…
sortah do
  router do
    send_to :foo if bar?
    send_to :baz unless quux? && 1 + 2 == 4
  end
end

rather than having to spell out all of the details with a full if/else/end block.
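The catch/throw trick is easy to try outside sortah. This is a toy model of the loop described above, not sortah's code: `ToySorter` is hypothetical, routers are plain procs evaluated with `instance_exec`, and destinations are a small Struct. As in the original, a router that never calls `send_to` would loop forever.

```ruby
# Toy model of the #sort / #send_to loop; all names here are hypothetical.
class ToySorter
  Destination = Struct.new(:path)

  attr_reader :pointer, :visited

  def initialize(routers, destinations)
    @routers = routers           # name => proc evaluated with self = sorter
    @destinations = destinations # name => Destination
    @visited = []
  end

  def sort(start = :root)
    @pointer = @routers.fetch(start)
    # Each pass runs one router; send_to throws back out to this catch.
    catch(:finished_execution) { run!(@pointer) } until @pointer.is_a?(Destination)
    self
  end

  private

  def run!(router)
    instance_exec(&router)
  end

  # Routers shadow destinations of the same name, as in the original.
  def send_to(dest)
    @visited << dest
    @pointer = @routers[dest] || @destinations.fetch(dest)
    throw :finished_execution # unwinds the router block, like `return`
  end
end
```

Because each router's block is abandoned by the throw, code after a successful `send_to` never runs, and the loop in #sort, rather than the Ruby call stack, carries the routing forward.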

The point of all this is to show that delegating to a router with no lenses is crazyfast, so the big price to pay is not routing delegation, but lens overhead. This means that it will be better to minimize the number of lenses you need to run, which means you should really try to minimize the number of lenses run on the root and “higher level” routers. Remember that routers form an arbitrary graph – you can happily call a router from itself if you like. There is no stack to worry about, so while I can’t think of a reason it would be useful to recursively call a router, it is possible. What I see as more likely is a sort of “two stage” approach to writing your routers, the first being “figure out what lenses will need to be run on this thing”, the second being “pipe through something which runs a bunch of lenses and does the real routing for this ‘class’ of mail”.
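Returning to the `router :lenses => [...]` limitation from the start of this section: with the keyword arguments that arrived in Ruby 2.0, the router name can simply default to `:root`. A minimal sketch; `SketchRouter` and this `router` method are hypothetical stand-ins, not sortah's classes.

```ruby
# Hypothetical, simplified stand-in for sortah's router component.
class SketchRouter
  attr_reader :name, :lenses, :block

  # The name is an ordinary optional positional argument; lenses is a keyword.
  def initialize(name = :root, lenses: [], &block)
    @name = name
    @lenses = lenses
    @block = block
  end
end

# Hypothetical DSL method: `router :lenses => [:foo] do ... end` binds the
# symbol-keyed hash to the lenses keyword, so the name falls back to :root.
def router(name = :root, lenses: [], &block)
  SketchRouter.new(name, lenses: lenses, &block)
end
```

This does what the hand-rolled `name.is_a? Hash` check attempts, and the block arrives through `&block` instead of being fished out of a splat.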

Temporary regrets

The big one is lack of direct support for maildir[3]; at the moment you’ll have to tack the new/ directory onto your destinations manually. I don’t see much point in preserving mbox; I don’t like the format. I don’t even really like maildir, but I haven’t thought of something better yet. Future non-sortah plans of mine include a mail reader (I’m shopping for names at the moment, if anyone has a brilliant idea), so perhaps that will become a laboratory for new mailbox format experiments. I’m not particularly opposed to maildir philosophically, it’s just a bit of a pain to get it to work with, for instance, mutt. If I had a “Longterm regrets” column, it would be “Thinking that it would be cool to use mutt” – seriously, I don’t know how people deal with the miserly state of the thing, it’s got that “I’m a unix tool for unix nerds and thus I must be configured only in this incomprehensible language which tries to merge bash and vimscript and shit all together.”

I like bash, I like unix, I am a unix nerd, but this shit is unacceptably awful.

New rule: if you use single-character symbols as a “shortcut” for anything that is actually a variable in your script because you can’t be bothered to do proper substitution of shit, you are to be paraded through town, naked, with a feather shoved…

Well, you get the idea, I don’t like muttrc.

Getmail is kind of crappy too, but I’m less hateful of it because it has the fringe benefit of being easy to tweak from a cut-and-paste I stole from someone else.

I’d really like to roll getmail into sortah in some way. I know that the unix philosophy says “one thing, one thing well”, but it always felt to me like retrieval and processing were really the same thing: acquisition. In my head, my mail is already in the sorted format; in theory, I should just be doing a glorified scp. So naturally I dislike a tool which messes with that mental model. Similarly, reading and sending email always struck me as a “delivery” problem, though the model there is perhaps less clear.

There are a few other minor things I wish were different in sortah, most of them are listed in the “FUTURE_PLANS” doc in the repo, but largely I’m looking forward to getting my own sortah system set up, getting it working (if temporarily) with mutt, and getting off the GUI with my email as it is.
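On the maildir point: the core of a maildir delivery is writing the message into `tmp/` first and then renaming it into `new/`, so a reader never sees a half-written file. A minimal sketch under that assumption; `deliver_to_maildir` is a hypothetical helper, not part of sortah, and its unique-name format is a simplification of the maildir convention.

```ruby
require 'fileutils'
require 'securerandom'
require 'socket'

# Deliver one message into a maildir: write it under tmp/, then rename it into
# new/. The name (time.pid_random.host) is simplified and assumes the hostname
# contains no '/' or ':'.
def deliver_to_maildir(maildir, message)
  %w[tmp new cur].each { |sub| FileUtils.mkdir_p(File.join(maildir, sub)) }
  name = "#{Time.now.to_i}.#{Process.pid}_#{SecureRandom.hex(4)}.#{Socket.gethostname}"
  tmp_path = File.join(maildir, 'tmp', name)
  new_path = File.join(maildir, 'new', name)
  File.write(tmp_path, message)
  File.rename(tmp_path, new_path) # rename is atomic within one filesystem
  new_path
end
```

Mail clients such as mutt then pick the file up from `new/` and move it to `cur/` once it has been seen.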


[1] don’t mind that it’s private, it probably shouldn’t be since it’s getting called externally, but instance_eval bypasses normal checks, and the important thing is that you shouldn’t be able to call it from any context but a sortah block.

[2] this is on the list of, “Things which are a bug no matter which side you choose” – in this case, I chose router because it will cause your filter to hang, rather than gleefully put your email in the wrong (or at least unintended) spot. I figure it’s better to be loud and wrong than quiet and wrong…

[3] This, as of publish time (12-16-2011) is no longer an issue, sort of. Sortah has really hacky support for maildir. It’s not been an issue for me in practice yet, but I have more pressing issues in terms of email to deal with (eg, I hate every textmode email client I’ve tried, so I’m writing my own).

Easy Things

Start flapping…

People have this notion: if you want to be good at something, you should study how to do that thing. This, in and of itself, is not so much ‘wrong’ as it is misunderstood or misinterpreted. When I told my parents I was going to study Math, my Dad scoffed at the thought. “What are you going to do for work, Joe? How are you going to support a family?” Implicitly, I think, he was saying, “It’s nice you like your airy-fairy mathematics, but you should do something practical, like computer science, so you can get a job.”

Bullshit.

Mathematics has made me ten times the programmer I would have been if I’d studied Computer Science directly. When I got to school, I looked at the course requirements for a CS degree, and all I saw was tedium. I didn’t see anything that would challenge me, anything I couldn’t learn on my own. I was already pretty good at CS – certainly not great, but I understood the basics. I’d read the GoF and DDD, I’d tried writing a few compilers, I’d messed around in Scheme and tried to write some Python[1]. The courseload didn’t match my skill, and I sure as shit wasn’t going to pay them to teach me nothing for two years while they caught up.

Compare Math. Out of the gate, I jumped into Multivariable Calculus and Linear Algebra, and the next semester, Differential Equations.

Holy shit.

My brain couldn’t keep up; I couldn’t absorb the knowledge I needed fast enough. It was like someone grabbed me and threw me off a cliff and screamed, “Start flapping, asshole, it’s a long way down.” Towards the middle of my sophomore year, I hit a stride with mathematics, but it never stopped being hard, it never stopped making my brain hurt.

DIY

So what the hell does that have to do with making me a better hacker? Well, in short, it made me a hacker, period. When you’re falling down the side of an intellectual cliff, the first thing you need to do is figure out where the instructions are for the DIY parachute. So too, in studying math, it was a frantic scramble to ply any advantage I could find – same with my classmates. Some of my peers used the fact that they were organized to take copious notes to preserve the knowledge they couldn’t otherwise absorb. Some used their social abilities to tap the resources of others. Some – like me – made use of technology, especially the internet, to glean the information from the ether.

Using the internet to learn is like drinking from a firehose.

But – that’s what made me good at things. I learned, in fact, not how to do math. Indeed, that ability came for free with learning how to learn from the internet. My professors, I realized, weren’t the ones with the ripest knowledge to give, but rather, they were the ones who could help me learn how to separate the intellectual chaff from the wheat. I did not need someone to teach me Group Theory, I needed someone to tell me that, “Gallian’s book for that is great, but here, try Hungerford, he’s a bit higher level, but the proof is more clear his way.” Something that could never occur to me, something that isn’t really “learning math” but more “learning how to math.”

Jim and Jane

Learning mathematics made me good at learning things, sure. Everyone has heard that cliché more times than they care to count. Consider, however, what the consequence of that is. Does learning how to learn mean that you are stuck learning only about the thing you went to study? Of course not; I know how to learn, therefore I can learn anything. So then why is it that, oftentimes, people who study Computer Science (ostensibly, in the process, learning how to learn) continue to only really learn computer science (if they continue to learn anything)? I cannot describe the number of people I know who studied CS, got a Bachelor’s, went to work at some corporate waterfall company, and remain there, churning out ten-penny code when they could be writing practical poetry in another context, working for people who care about being craftspeople.

Now, while the fact that I know many people who fit my claim is by no means proof, it is an interesting sort of anecdote. It is reasonable to consider why. Certainly, one hypothesis is that it’s my fault. After all, I am one thing they all must have in common (or else I wouldn’t be considering them)! However, I have another hypothesis which I think is more likely: those people chose to study something that was not challenging, and thus, they never learned how to learn.

Take, for instance, Jim. Jim is not his name, but it is his moniker for our purposes. Jim studied CS at UMass, a certainly not terrible school. He learned his GoF, he studied how to drive design based on the domain, he learned a little Lisp and a little C and a little Java and a little bit of everything. He was a fairly reasonable Computer Scientist.

Then he started to work writing C# at a company that didn’t have a great set of standards, they didn’t have a passion for code-poetry, they didn’t feel like craftspeople – they felt like code monkeys.

So what’s Jim to do? Jim stagnated. He became a code monkey, he churned out his requisite KLoCs and never wrote tests and fixed bugs and generally proceeded to be just another drop in the waterfall. Jim got stuck, Jim stopped learning.

Compare now, Jane. Jane was a classmate of mine who studied CS. She’d never touched a computer (at least, inasmuch to program it) before. She started her program with an old windows machine which she, “Was gonna put Linux on, once [she] figure[d] out what Linux is.” She didn’t know about OOP, she’d never touched Scheme, she had, seriously, no clue what she was getting into.

But Jane sat down in her first programming class, and studied her ass off to do well. She learned mountains of material. She wrote code all day.

She also failed.

This served mostly to infuriate her. So when she took the class again, she redoubled her efforts, she was in every day for office hours. Hacking away, building her knowledge and her toolset. She was a permanent fixture in the CS lounge. She was a permanent open IM window on my desktop. She had more questions than Columbo, but that was because I had some answers, and she needed them.

Jane took what she needed.

Jane works for a Financial firm now as a Quant. She ended up picking up a double major in Math and CS (focused on stats), and graduated top of her class. She’s not a code monkey. She’s an innovator, she loves her job. She programs all the time, she (finally) got Linux on her laptop.

The fact that Jane has a “better” job is not the moral; the “betterness” here is merely a metric. I think Jane likes her job more than Jim likes his. Jane talks about her job, the challenges she faces, the problems she wrestles with. In much the same way as when she was in school, she is still the girl who failed her first class because it was hard, and who aced the rest of them because she wasn’t one to back down.

Jim doesn’t care anymore. He works for a paycheck, not pride. He doesn’t talk about the new problem he has to solve, there aren’t any. He writes a little glue, surfs the net, writes a little more. There is no concern for quality – it’s not like the project is going to go through anyway. It’ll just be another cancelled plan from upper management. He’ll get reassigned.

What’s the moral here? The moral is that Jim studied something he knew, and so he never needed to learn, and so he never had to learn how to learn. Jane, on the other hand, had no clue going in; she chose a path that would challenge her, and make her get creative in solving her own difficulties with CS.

It’s the challenge, or more so the desire to overcome adversity, that drives Jane. It’s lizardbrain the whole way down.

Creativity, Elegance, and Craftsmanship

My larger point is simple: it is less important to study something you ‘need’ to know, and more important to study what is hard to learn. If you can read a book, and understand it completely without having to really sit and think about it, you should find a better book. This (for me) roughly boils down to hacker-culture. Hacker-culture (or at least, my definition of it[2]) strives for creativity in solutions, elegance in implementations, and craftsmanship above all. Hackers aren’t made by taking the easy path; they aren’t found among the Jims of the world, who study only what they know. Hackers are the Janes, the people who challenge themselves and draw inspiration from fields which no one else would think to combine. They’re the musicians-turned-mathematician, the philosopher-hacker-kings, they’re the people who chart the edge of the world and shout, “No dragons here. Keep sailing!”

Maybe you charge that my view of these hacker-heroes is pure romance; if so, then I am a romantic. I call them as I see them.


[1] The former of which I love; the latter of which I emphatically do not.

[2] A definition which is principally derived from the definitions found in the Jargon File and similar.

Sortah? I Barely Know ‘Er!

What is Sortah?

Sortah is an embedded DSL for Ruby, which provides a framework for writing email processing routines in Plain Old Ruby Code (PORC[1]). It looks a bit like this:

sortah example
require 'csv'

sortah do
  maildir "/home/jfredett/.mail/"

  destination :work, 'work/'
  destination :gmail, 'gmail/'
  destination :personal, :gmail           # aliases 'gmail' destination
  destination :trash, :abs => '/dev/null' # an absolute path to somewhere on
                                          # the system

  # Flags an email as spam when more than 20 of its words are blacklisted.
  lens :spam do
    blacklist = CSV.read("blacklist.csv").flatten
    hits = email.body.to_s.split.count { |word| blacklist.include? word }
    hits > 20
  end

  router do
    send_to :work if Array(email.to).include? 'joe@work.com'
    send_to :spam_filter
  end

  router :spam_filter, :lenses => [:spam] do
    send_to :trash if email.spam
    send_to :personal
  end
end

The above is a pretty straightforward example of how sortah handles email. Things to notice include how ‘lenses’ – which are like filters your email will pass through, which may ‘color’ the email with additional data – are specified as dependencies of ‘routers’, which provide a way to specify routing logic. The main idea is separation of email processing from email routing. The former being tasks like running an email through a Bayesian filter, or inspecting for presence in a blacklist. The latter being tasks which use that information to determine where things go on the system. Sortah is smart about how it runs lens-dependencies, so that they only ever get run once (a bit like rake task dependencies). This allows you to build up a library of lenses which you can keep around even when you need to change sorting logic, and vice versa – encapsulation, it works!
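The run-once behaviour can be modelled with a small memoizing runner. This is a minimal sketch, not sortah's implementation: `LensRunner` is hypothetical, lenses are plain lambdas, and a fresh runner is assumed per email so cached results never leak between messages.

```ruby
# Hypothetical model of "each lens runs at most once per email",
# in the spirit of rake task dependencies.
class LensRunner
  attr_reader :runs

  def initialize(lenses)
    @lenses = lenses    # name => lambda taking the email, returning metadata
    @results = {}       # cached metadata for the current email
    @runs = Hash.new(0) # how many times each lens actually executed
  end

  # Run the named lenses for this email, reusing any cached results.
  def run_dependencies!(names, email)
    names.each do |name|
      next if @results.key?(name)
      @runs[name] += 1
      @results[name] = @lenses.fetch(name).call(email)
    end
    @results.slice(*names)
  end
end
```

Two routers that both depend on `:spam` would then share a single spam check, which is what lets the lens library stay independent of the routing logic.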

Enough about what it is, why is it?

I wrote sortah because things like procmail and sieve, while venerable (the former, at least) and still very powerful, have utterly arcane syntax, and often make even the simplest of processing tasks very difficult. For instance, look at this sieve example:

# Sieve filter

# Declare the extensions used by this script.
#
require ["fileinto", "reject"];

# Messages bigger than 100K will be rejected with an error message
#
if size :over 100K {
  reject "I'm sorry, I do not accept mail over 100kb in size. 
  Please upload larger files to a server and send me a link.
  Thanks.";
}

# Mails from a mailing list will be put into the folder "mailinglist" 
#
elsif address :is ["From", "To"] "mailinglist@blafasel.invalid" {
  fileinto "INBOX.mailinglist";
}

# Spam Rule: Message does not contain my address in To, CC or Bcc
# header, or subject is something with "money" or "Viagra".
#
elsif anyof (not address :all :contains ["To", "Cc", "Bcc"] "me@blafasel.invalid", 
             header :matches "Subject" ["*money*","*Viagra*"]) {
    fileinto "INBOX.spam";
}

This example is similar in complexity to the sortah example above, and while it is miles ahead of procmail syntactically, it’s also ugly as sin. I wrote sortah because I couldn’t understand why anyone thought this:

:0:   # Deliver to a file, let Procmail figure out how to lock it
* ^From scooby
scooby

:0    # Forwarding; no locking required
* ^TO dogbert
! bofh@dilbert.com

:0:snoopy.lock  # Explicitly name a file to use as a lock
* ^Subject:.*snoopy.*
| $HOME/bin/woodstock-enhancer.pl >>snoopy.mbox

was a good idea. It’s, frankly, impossible for me to wrap my brain around this language. Sieve is better, but it has the unfortunate design choice of mixing business (routing) with pleasure (processing) – I think sortah excels here.

One of the most fundamental things I thought should be available to the intrepid email sorting loon is the full weight of a real live programming language. Sortah provides one, Ruby. You can happily write a lens which provides you with a piece of email metadata in the form of a custom class – I actually outline such a case in the README on github, where I use a Contact class to help sort email to appropriate folders in a contact list.

Further, sortah is declarative, which makes it much easier to compose than procmail or sieve. In fact, it generally boils down to calling load somewhere, and routing into the library you’re loading. This means it’s much easier to share common code. Rather than needing to manually bind to SpamAssassin, we could bind to it once, and share, or – even better – make it available directly in Ruby.

Alright, what’s the catch.

Well, it’s a bit slow at the moment, mostly because sortah reloads your sortahrc for every email it processes. A future release will fix that by letting sortah cache a few emails, or maybe run as a server, or – well, I haven’t figured out how I want to solve that problem yet. It also doesn’t provide a way to “re-sort” an inbox, though this should actually be pretty easy to do with find and some ingenuity.
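The “find and some ingenuity” approach to re-sorting might look something like this sketch. It assumes one file per message somewhere under the inbox directory, and `sorter` is a hypothetical callable standing in for however you run sortah on a single message (this post doesn't document such an interface); it returns a destination directory and the file is moved there.

```ruby
require 'fileutils'

# Re-sort every message file under `inbox`. `sorter` is a hypothetical
# stand-in for running sortah on one message: it takes the raw message text
# and returns the destination directory for it.
def resort(inbox, sorter)
  files = Dir.glob(File.join(inbox, '**', '*')).select { |p| File.file?(p) }.sort
  files.map do |path|
    dest = sorter.call(File.read(path))
    FileUtils.mkdir_p(dest)
    target = File.join(dest, File.basename(path))
    FileUtils.mv(path, target) unless File.expand_path(target) == File.expand_path(path)
    target
  end
end
```

The file list is collected before anything moves, so messages moved into subdirectories of the inbox are not visited twice.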

It also doesn’t natively support maildir or mbox or anything, it just writes files to the filesystem. You should be able to make it work with anything you like in fairly short order, but it does take a bit of work at the moment.

Finally, this assumes you’re using getmail, and that you send mail from getmail to sortah as an MDA_external destination. I haven’t tested other uses of it, so YMMV if you try something novel (if you do, tell me about it!)

Where can I get it?

You can get it from my github if you want the source, or you can do

gem install sortah

in a gemset somewhere and use rvm exec, etc. There’s a tutorial for setting it up in the github repo, the setup worked for me, but it’s not particularly optimized or highly tested.

I hope you find sortah useful; I’ve had a lot of fun building it. I really hope it finds some use amongst those of you who found that mutt + procmail|sieve + getmail + blahblahemailtoolsblah was just too much work for delicious commandline email management, and that it takes a weight off your shoulders. I know it was one of the things holding me up.


[1] I’m making this a thing.

A Little Static Site in the Meantime

For now, this is a bit of a placeholder site to alleviate my blogging addiction. I have bigger plans in the works for my own site, but I wanted somewhere to hang out in the meantime.

Occurs to me, I didn’t tell you who I am. I’m Joe Fredette, Mathematician, Haskell Hacker, Rubyist, Web Developer, and generally amazing person. I like to do math, write software, play video games, and talk too much. This is my blog, it will house many things, including words from my brain. You have been warned.

Hope you like it around here!