Tracking updates in web libraries - a small development pattern

My webapp uses various APIs, and in particular it pulls in things like Twitter's widget.js, which doesn't have a GitHub home or similar (unlike, say, underscore.js, which I can track from its source repository).

I could save my own copy and have my webapp always use that, but that's not a great idea, because when Twitter change their API they also change widget.js to compensate.

So my webapp always pulls in the authoritative master copy, but then I tend not to notice when it changes, and sometimes this breaks my assumptions about how it works (changes in widget.js are also a good way to track changes to certain API details that I access outside the widget.js library itself).

So, as there's no RSS feed about updates or similar either, I store widget.js in my source repository even though my app doesn't use the stored copy, and in my development workspace I regularly fetch the latest widget.js (for me, this is in the script that restarts my personal development environment, launches the webserver etc., but you could do it in a cron job):

   wget -O js/3rd/widget.js http://twitter.com/javascripts/widgets/widget.js

And thanks to the wonders of distributed source control systems and their "no checkout required" model (I use bzr; the same is true of git and others), if the authoritative version hasn't changed then this has no effect, but when it does change I suddenly see the file as modified in my "modified files" report ("bzr status"), and I can diff the new version against the old and see whether I need to react accordingly.
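
The version-control system is doing the comparison here. If you'd rather not rely on one (say, in a cron job that only needs to know whether anything changed), a minimal Node.js sketch could keep just a digest of the last copy seen and compare it with the digest of a fresh download. `digestOf`, `checkForChange` and `fetchAndCompare` are hypothetical names, and the global `fetch` assumes Node 18 or later.

```javascript
const crypto = require("crypto");

// SHA-256 digest of some text, as a hex string.
function digestOf(text) {
  return crypto.createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical helper: compare the digest stored last time with a fresh copy.
// Returns whether it changed, plus the new digest to store for next time.
function checkForChange(previousDigest, fetchedText) {
  const newDigest = digestOf(fetchedText);
  return { changed: newDigest !== previousDigest, newDigest };
}

// Fetch a URL (global fetch needs Node 18+) and compare it with the stored digest.
async function fetchAndCompare(url, previousDigest) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} fetching ${url}`);
  return checkForChange(previousDigest, await response.text());
}
```

Unlike the bzr approach, this only tells you that something changed; keeping the full previous copy in the repository is what makes the diff possible.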

It's more of a hack than anything else, but works surprisingly well, and has stopped me being caught out (especially with "silent" changes where there's no official communications about such things).

 

Other projects - extensions for Google Chrome

I wrote a couple of small extensions for Google Chrome, aimed particularly at small annoyances for developers.

It's a great web browser for developers, and there's the Web Developer extension amongst others, but these are two extensions that I can't find elsewhere:

  • JsError is a minimal extension to flag errors in a page (typically in the JavaScript) within Google Chrome™, without the need to open the Developer Tools (which may trigger breakpoints and the like).
  • ForceReload hard-reloads the current tab to overcome caching issues when using the Developer Tools for Google Chrome™.
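
The core of something like JsError (noticing errors without opening the Developer Tools) can be sketched as a listener that counts error events and reports each new total. This is an assumption about the approach, not the extension's actual source: `watchErrors` is a hypothetical name, the target is passed in so that any EventTarget (a page's window in the browser) will do, and the sketch glosses over where in an extension the listener has to run to see a page's errors. In an extension, the callback might forward the count to the background script, which could show it on the toolbar button with chrome.action.setBadgeText.

```javascript
// Hypothetical core of an error-flagging extension: count "error" events
// dispatched on a target and report each new total through a callback.
function watchErrors(target, onCount) {
  let count = 0;
  const listener = (event) => {
    count += 1;
    // ErrorEvent carries message/filename/lineno; a plain Event does not.
    const where = event.filename ? `${event.filename}:${event.lineno}` : "unknown";
    onCount(count, event.message || "(no message)", where);
  };
  target.addEventListener("error", listener);
  // Return a function that stops watching.
  return () => target.removeEventListener("error", listener);
}
```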

I should put the code up on GitHub (it's only a few lines), but here are the links anyway.

 

Other projects - my github account and UglifyJS

I've been adding some features to UglifyJS - a rather nice JavaScript parser / compressor / beautifier that is itself written in JavaScript.

It implements a JavaScript 1.5 parser, and then has various routines to walk the AST doing things like renaming variables and collapsing selected expressions.

I've added the ability to safely replace selected global symbols with constant values (which can then allow the minifier to collapse entire sections, like #ifdef in C++), which has been pulled into the main tree, and the ability to spot and shortcut constant expressions involving &&, || and ?: (i.e. drop code that can never run: the RHS of && or || when a constant LHS already decides the result, as in true || f() or false && f(), or the untaken branch of ?: when the condition is constant).
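
To illustrate the combination of constant substitution and short-circuit folding, here is a sketch over a tiny, made-up AST (the node shapes and the `fold` function are assumptions for illustration, not UglifyJS's internal representation). One detail worth keeping in mind is that JavaScript's && and || return one of their operands rather than a boolean, so `false && f()` folds to `false`, `1 || f()` folds to `1`, and `true && f()` simplifies to plain `f()`.

```javascript
// Hypothetical mini-AST (NOT UglifyJS's real node format):
//   { type: "const", value }                 literal
//   { type: "name", name }                   variable reference
//   { type: "call", callee, args }           call (may have side effects)
//   { type: "logical", op, left, right }     op is "&&" or "||"
//   { type: "cond", test, consequent, alternate }   the ?: operator
function fold(node, defines = {}) {
  switch (node.type) {
    case "name":
      // Replace selected globals with constants (the #ifdef-like part).
      return Object.prototype.hasOwnProperty.call(defines, node.name)
        ? { type: "const", value: defines[node.name] }
        : node;
    case "call":
      return { ...node, args: node.args.map((a) => fold(a, defines)) };
    case "logical": {
      const left = fold(node.left, defines);
      const right = fold(node.right, defines);
      if (left.type === "const") {
        const truthy = Boolean(left.value);
        // `a && b` yields a when a is falsy; `a || b` yields a when a is truthy.
        // Either way b is never evaluated, so it can be dropped.
        if ((node.op === "&&" && !truthy) || (node.op === "||" && truthy)) return left;
        // Otherwise the result is just b (a constant LHS has no side effects).
        return right;
      }
      return { ...node, left, right };
    }
    case "cond": {
      const test = fold(node.test, defines);
      const consequent = fold(node.consequent, defines);
      const alternate = fold(node.alternate, defines);
      if (test.type === "const") return test.value ? consequent : alternate;
      return { ...node, test, consequent, alternate };
    }
    default:
      return node;
  }
}
```

With `defines = { DEBUG: false }`, an expression like `DEBUG && log()` collapses to the constant `false`, which a later pass could then remove entirely if it's used as a statement.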

I've also added the ability to mangle selected object property names, which obviously needs some care, but is great for obfuscating internals and can shorten long method names etc. This change is only in my fork for now: https://github.com/schmerg/UglifyJS (see the mangleprops branch).
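
Property mangling needs care because, unlike local variables, property names are visible to any code that touches the object, including code that builds names as strings for `obj[name]`. A minimal sketch of the renaming side, assuming the caller supplies an explicit allow-list of internal names (`buildPropertyRenames` and its naming scheme are hypothetical, not what the mangleprops branch does); applying the map to the AST isn't shown.

```javascript
// Hypothetical helper: build a rename map for an explicit allow-list of
// property names, skipping any short name already used by a property that
// is NOT being renamed (otherwise two properties could collide).
function buildPropertyRenames(namesToMangle, allPropertyNames) {
  const reserved = new Set(allPropertyNames.filter((n) => !namesToMangle.includes(n)));
  const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
  let counter = 0;
  // Bijective base-52 numbering: 0 -> "a", 51 -> "Z", 52 -> "aa", 53 -> "ab"...
  const nextName = () => {
    let n = counter++;
    let name = "";
    do {
      name = alphabet[n % alphabet.length] + name;
      n = Math.floor(n / alphabet.length) - 1;
    } while (n >= 0);
    return name;
  };
  const renames = new Map();
  // Longest names first, so the biggest savings get the shortest replacements.
  const unique = [...new Set(namesToMangle)].sort((x, y) => y.length - x.length);
  for (const original of unique) {
    let candidate;
    do {
      candidate = nextName();
    } while (reserved.has(candidate));
    renames.set(original, candidate);
  }
  return renames;
}
```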

I also toyed with the idea of extending the parser to understand some features of later versions of JavaScript (e.g. the very useful let statement of JavaScript 1.7) and having an option to compile them down to JavaScript 1.5, so that

   var x = 1, y = 2;
   if (something) {
     let x = x+y;
     console.log("X is now "+x+" and y is "+y);
   }
   console.log("X ends with value "+x);

would be rewritten as

   var x = 1, y = 2;
   if (something) {
     (function(x) {
       console.log("X is now "+x+" and y is "+y);
     }(x+y));
   }
   console.log("X ends with value "+x);

Yeah, I know about things like CoffeeScript, but I don't want to be debugging something too far away from my original source, and I hope that eventually JavaScript 1.7 will be supported in more browsers (in which case I can stop using this conversion).
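
As a quick check that the function-wrapped rewrite keeps the intended meaning, it can be run directly, with an array standing in for console.log. One caveat: under standard ES2015 `let` semantics, the original `let x = x+y;` would actually throw a ReferenceError, because the `x` on the right-hand side already refers to the new block-scoped `x`, which is still in its temporal dead zone. The rewrite makes the intended reading explicit, since `x+y` is evaluated outside the wrapper. The function names below are just for this illustration.

```javascript
// The rewritten form, wrapped so the "output" can be inspected.
function rewritten(something) {
  const log = [];
  var x = 1, y = 2;
  if (something) {
    // The parameter x shadows the outer x; x + y is evaluated with the
    // OUTER x before the call.
    (function (x) {
      log.push("X is now " + x + " and y is " + y);
    }(x + y));
  }
  log.push("X ends with value " + x);
  return log;
}

// Standard ES2015 let: the right-hand x is the block's own, uninitialised x.
function es2015Let(something) {
  var x = 1, y = 2;
  if (something) {
    let x = x + y; // ReferenceError (temporal dead zone)
    return x;
  }
  return x;
}
```

`rewritten(true)` gives `["X is now 3 and y is 2", "X ends with value 1"]`, which is what the let version was meant to print.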

And I'm also thinking about inlining selected methods. I know JavaScript engines do this internally, but there are sometimes big wins to be had from inlining trivial functions (which you've coded as such to avoid repeating the same expression endlessly). Again this is a bit like the preprocessor in C++, but because it would be done as part of the proper parse process (rather than limited text substitution), I think it could be done much more safely.
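
A sketch of why doing this on the AST can be safer than textual substitution, over a made-up AST (an assumption, not UglifyJS's format). Given a trivial function whose body is a single returned expression, a call can be replaced by that expression with its parameters substituted, but only when every argument is a constant or a plain variable (substitution may evaluate an argument zero or several times) and the body uses nothing but its parameters (so moving it into the caller can't capture a different binding of some free variable). So `sq(w)`, where `sq(v)` returns `v*v`, becomes `w*w`, while `sq(next())` is left alone because inlining would call `next()` twice. It also assumes the callee name isn't shadowed at the call site.

```javascript
// Hypothetical mini-AST:
//   { type: "const", value } | { type: "name", name }
//   { type: "binary", op, left, right }
//   { type: "call", callee: "functionName", args: [...] }
// trivialFns maps a function name to { params, body }, where body is the
// single expression the function returns.
const isSimpleArg = (n) => n.type === "const" || n.type === "name";

// True if every variable in expr is one of params (no free variables).
function onlyUsesParams(expr, params) {
  switch (expr.type) {
    case "const": return true;
    case "name": return params.includes(expr.name);
    case "binary": return onlyUsesParams(expr.left, params) && onlyUsesParams(expr.right, params);
    default: return false; // calls etc. in the body: don't inline
  }
}

function substitute(expr, bindings) {
  switch (expr.type) {
    case "name": return bindings.has(expr.name) ? bindings.get(expr.name) : expr;
    case "binary":
      return { ...expr, left: substitute(expr.left, bindings), right: substitute(expr.right, bindings) };
    default: return expr;
  }
}

function inline(node, trivialFns) {
  switch (node.type) {
    case "binary":
      return { ...node, left: inline(node.left, trivialFns), right: inline(node.right, trivialFns) };
    case "call": {
      const args = node.args.map((a) => inline(a, trivialFns));
      const fn = Object.prototype.hasOwnProperty.call(trivialFns, node.callee)
        ? trivialFns[node.callee]
        : undefined;
      if (fn && fn.params.length === args.length && args.every(isSimpleArg) &&
          onlyUsesParams(fn.body, fn.params)) {
        const bindings = new Map(fn.params.map((p, i) => [p, args[i]]));
        return substitute(fn.body, bindings);
      }
      return { ...node, args };
    }
    default:
      return node;
  }
}
```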

Anyway - I'll post things to github as I go...

 

check_goodrcptto - a qpsmtpd plugin for checking recipients against qmail config

I use qpsmtpd, a sort of mod_perl for SMTP, to front-end my SMTP server. This lets me configure various checks in the name of blocking spam and managing mail for my private domain. There are plugins to check messages before accepting them (scan headers, check the body with SpamAssassin, greylist or blacklist senders, check for certain bad behaviour typical of spambots etc.) and plugins to deliver mail to various back-end systems.

I use this to front-end qmail, which, amongst other features, has nice handling of aliases including wildcard addresses. So after getting various 'dictionary spam' attacks (mail sent to random names in the hope of hitting a valid user), as well as spurious bounces from the joe-jobbing activities of spammers, I wanted qpsmtpd to understand which addresses were valid and which weren't.

As a result, I wrote the following plugin that knows how to parse the various qmail config files for user assignments and aliases.

It may not be perfect; in particular:

  • it's written against an old version of qpsmtpd - I think there are conveniences in newer versions to tidy up some logic
  • it knows about qmail files and config setup, so isn't very generalised for anything other than qmail
  • it assumes it'll be able to access the various files that qmail reads - this is true for my setup, YMMV
  • it's not on GitHub or Launchpad or similar (but may be soon)

but if you're using qmail behind qpsmtpd and you use the assigns file or similar, then you may find it handy.

=head1 NAME

check_goodrcptto

=head1 DESCRIPTION

A qpsmtpd plugin that checks that the named recipient is valid according to the
qmail config, and refuses the mail otherwise.

See http://wiki.qpsmtpd.org/ for details of qpsmtpd itself.

=head1 CONFIG

Takes the name of the qmail assign file - normally /var/qmail/users/assign

=head1 AUTHOR

Written by Tim Meadowcroft - http://schmerg.com

Published under the same license as Perl itself - you're free to use this as you see fit.

=cut

sub register {
  my ($self, $qp, @args) = @_;

  die "Requires the path of the assign file (usually /var/qmail/users/assign)"
    unless (@args > 0 and -f $args[0]);

  my $assign = ReadAssignments( $args[0] );
  die $assign unless ref $assign;
  $self->{_assign} = $assign;

  $self->register_hook("rcpt", "rcpt_handler");
}

sub ReadAssignments {
  my $lines = slurp($_[0], sub { [ grep(!/^\s*#/, @_) ] } )
    or return "Can't read the assign file $_[0]";

  chomp @$lines;

  # last line should be a single dot
  return "Assign file not properly terminated" unless $lines->[-1] eq ".";

  # extract simple assignments first
  #   =address:user:uid:gid:directory:dash:extension:
  # Messages for <address> will be delivered as user <user>, with the
  # specified uid and gid, and the file <directory>/.qmail<dash><extension>
  # will specify how the messages are to be delivered.
  #
  #   +prefix:user:uid:gid:directory:dash:prepend:
  # Messages received for <prefix><rest> will be delivered as user <user>,
  # with the specified uid and gid, and the file
  # <directory>/.qmail<dash><prepend><rest> will specify how the messages
  # are to be delivered.
  my %a;
  foreach (@$lines) {
    my $type = substr($_,0,1,"");
    my($address,$user,$uid,$gid,$dir,$dash,$ext) = (split(":", $_), ("")x7);
    if ($type eq "=") {
      # got a user
      $a{user}->{$address} = { user => $user,
                               dir  => $dir,
                               dash => $dash,
                               ext  => $ext };
    } elsif ($type eq "+") {
      # got a prefix
      $a{prefix}->{$address} = { user => $user,
                                 dir  => $dir,
                                 dash => $dash,
                                 ext  => $ext };
    }
  }
  return \%a;
}

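sub slurp {
  my $file = shift;
  # three-argument open, so odd characters in $file aren't treated as modes
  open(my $fh, '<', $file) or return undef;
  my @lines = <$fh>;
  close $fh;
  # optionally post-process the lines with a callback
  return @_ ? $_[0]->(@lines) : \@lines;
}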

# $recipient is a Mail::Address object, see if it looks deliverable
sub rcpt_handler {
  my ($self, $transaction, $recipient) = @_;

  $self->log(LOGDEBUG, "check_goodrcptto of ".$recipient->user);
  return (DECLINED) if $recipient->user eq "";

  # we only check recipients for the domains we accept - let any relayed
  # mail pass through (assuming that relaying is allowed), including mail
  # with no hostname (so plain "postmaster" and "abuse" work)
  my @rcpthosts = $self->qp->config("rcpthosts") or return (DECLINED);
  my @localhosts = ($self->qp->config("me"), "localhost", qx(hostname), "");
  chomp @localhosts;

  my $host = lc $recipient->host;
  return(DECLINED) unless grep($_ eq $host, @rcpthosts, @localhosts) > 0;

  # Look up this user and see if it looks like a valid user
  my $user = $recipient->user;
  $self->log(LOGDEBUG, "check_goodrcptto: $user needs checking");

  if (CanBeDelivered($user => $self->{_assign}))
  {
    $self->log(LOGDEBUG, "$user accepted");
    return DECLINED;
  }
  my $sender = $transaction->sender->address;
  $sender = "" unless defined $sender;
  $self->log(LOGDEBUG, "check_goodrcptto: $user is rejected, tell $sender");

  # genuine mistake or, more likely, spammers flooding us
  return(DENY, "No such account - mail to $user not accepted here")
      if $sender ne "";

  # a bounce (null sender) to a non-existent user - probably backscatter
  # from a joe-job, so recommend SPF
  return(DENY, "No such account as $user - checking SPF records would prevent bouncing of joe-job emails");
}

# Returns a name if we believe a message can be delivered to the specified
# user, or undef if not...
sub CanBeDelivered {
  my($user,$assign) = @_;

  # Look up this user and see if it looks like a valid user.
  # Delivery will be according to that user's ".qmail" or the defaultdelivery file
  return $user if exists $assign->{user}->{$user};

  # if the user isn't directly listed, check the prefixes, longest first
  foreach my $prefix (reverse sort {length($a) <=> length($b)}
              keys %{$assign->{prefix}}) {
    if (substr($user,0,length($prefix)) eq $prefix) {
      # this prefix matches the specified user part of the email address
      my $v = $assign->{prefix}->{$prefix};
      my $rest = substr($user,length($prefix));
      my $dotqmail = $v->{dir}."/.qmail".$v->{dash}.$v->{ext};
      foreach ($dotqmail.$rest, $dotqmail."default") {
        if (-f $_) {
          my $d = slurp($_, sub { chomp @_;
                                  return join(", ", grep(!/^#/, @_)) });
          return $v->{user}." ($d)";
        }
      }
    }
  }

  return undef;
}

1;
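
For readers who don't read Perl, the lookup order in ReadAssignments and CanBeDelivered can be restated as a small JavaScript sketch: an exact "=" entry wins; otherwise "+" prefixes are tried longest first, and a prefix only counts if a matching .qmail file (the specific one or the "default" one) exists. `parseAssign`, `canBeDelivered` and the `fileExists` callback (standing in for Perl's `-f`, so the logic can be exercised without a qmail installation) are hypothetical names for this illustration.

```javascript
// Parse the text of a qmail "assign" file into exact (=) and prefix (+)
// entries. Comment lines are skipped; the last line must be a single ".".
function parseAssign(text) {
  const lines = text.replace(/\n$/, "").split("\n").filter((l) => !/^\s*#/.test(l));
  if (lines[lines.length - 1] !== ".") throw new Error("Assign file not properly terminated");
  const assign = { user: new Map(), prefix: new Map() };
  for (const line of lines) {
    // =address:user:uid:gid:dir:dash:ext:  or  +prefix:user:uid:gid:dir:dash:prepend:
    const [address, user, , , dir = "", dash = "", ext = ""] = line.slice(1).split(":");
    const entry = { user, dir, dash, ext };
    if (line[0] === "=") assign.user.set(address, entry);
    else if (line[0] === "+") assign.prefix.set(address, entry);
  }
  return assign;
}

// An exact "=" entry wins; otherwise try "+" prefixes longest first,
// accepting one only if a matching .qmail file exists. Unlike the Perl,
// the result names the matching file rather than its contents.
// Returns null if the address doesn't look deliverable.
function canBeDelivered(localPart, assign, fileExists) {
  if (assign.user.has(localPart)) return localPart;
  const prefixes = [...assign.prefix.keys()].sort((a, b) => b.length - a.length);
  for (const prefix of prefixes) {
    if (!localPart.startsWith(prefix)) continue;
    const v = assign.prefix.get(prefix);
    const rest = localPart.slice(prefix.length);
    const dotqmail = `${v.dir}/.qmail${v.dash}${v.ext}`;
    for (const candidate of [dotqmail + rest, dotqmail + "default"]) {
      if (fileExists(candidate)) return `${v.user} (${candidate})`;
    }
  }
  return null;
}
```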

Force Reload - an extension for Google Chrome™

Web developers often find that, after using Developer Tools, the browser caches their files: if they change a source file and reload, the browser doesn't use the new version but carries on with the old one (see for example http://code.google.com/p/chromium/issues/detail?id=8742 for reports of the issue).

Personally I've found that closing the tab and opening a new one helps, so this is a minimal extension that does just that. It's my first extension, and only a few lines of code... it works for me, and I hope it works for others.
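
The close-and-reopen trick needs very little code. The sketch below assumes the Manifest V3 chrome.action and promise-returning chrome.tabs APIs, and that the extension can read the tab's URL (e.g. via the activeTab permission granted by clicking the toolbar button); the original 1.x extension predates Manifest V3, so its actual code will differ. `reopenTab` is a hypothetical name, and it takes the tabs API as a parameter so it can be tried outside the browser with a stub.

```javascript
// Hypothetical sketch: replace a tab with a fresh one showing the same URL.
// tabsApi is chrome.tabs in the browser (MV3, promise-returning methods).
async function reopenTab(tabsApi, tab) {
  // Open the replacement first, in the same window and position...
  const fresh = await tabsApi.create({
    url: tab.url,
    windowId: tab.windowId,
    index: tab.index,
    active: true,
  });
  // ...then close the original tab.
  await tabsApi.remove(tab.id);
  return fresh;
}

// Wire it to the toolbar button only when running as an extension.
if (typeof chrome !== "undefined" && chrome.action && chrome.tabs) {
  chrome.action.onClicked.addListener((tab) => {
    reopenTab(chrome.tabs, tab).catch((err) => console.error("Force reload failed:", err));
  });
}
```

Opening the new tab before closing the old one means the window never momentarily loses its last tab; chrome.tabs.reload(tabId, { bypassCache: true }) is the gentler alternative when a plain hard reload is enough.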

You can find the extension on the Chrome Web Store.

The icon used is courtesy of newmoon's handiwork at http://code.google.com/p/ultimate-gnome/

 

The new version (1.1) of the extension includes an option (disabled by default) to catch the Ctrl-F5 keystroke and override the browser's built-in behaviour. Please note that when the extension is first installed, even with this option enabled, it won't catch the keystroke in tabs that are already open until those pages are refreshed some other way; any new pages will behave as required.
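
Catching the keystroke needs a content script in each page, which is also why tabs that were already open don't respond: Chrome injects content scripts declared in the manifest as pages load, not retroactively into existing tabs. A sketch of such a script, assuming it sends a made-up `{ type: "force-reload" }` message that a background script (listening with chrome.runtime.onMessage) would answer by reopening `sender.tab`; `isCtrlF5` and `installCtrlF5Hook` are hypothetical names.

```javascript
// True for Ctrl-F5 on a keydown event (Shift/Alt/Meta must not be held).
function isCtrlF5(event) {
  return event.key === "F5" && Boolean(event.ctrlKey) &&
    !event.shiftKey && !event.altKey && !event.metaKey;
}

// Hypothetical content-script logic: swallow Ctrl-F5 and ask the background
// script (via `send`) to reopen the tab instead. The message shape is made up.
function installCtrlF5Hook(target, send) {
  const onKeyDown = (event) => {
    if (!isCtrlF5(event)) return;
    event.preventDefault(); // suppress the browser's own reload
    send({ type: "force-reload" });
  };
  // Capture phase: runs before handlers registered on page elements.
  target.addEventListener("keydown", onKeyDown, { capture: true });
  return () => target.removeEventListener("keydown", onKeyDown, { capture: true });
}

// Only wire up when running as a content script in a page.
if (typeof window !== "undefined" && typeof chrome !== "undefined" && chrome.runtime) {
  installCtrlF5Hook(window, (message) => chrome.runtime.sendMessage(message));
}
```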

If it doesn't work how you'd expect, then you can contact me as meercat on gmail, or add a comment below. It would help if you could briefly describe:

  • whether you're using the toolbar button or the keystroke to invoke the extension
  • what version of chrome you're running (and platform)
  • what version of the extension you're using
  • how I can contact you !!

 

 

 

The Trip

So maybe I feel some affinity with Steve Coogan because we grew up in the same part of the same small town, and at the same time (as far as I know), but while I've appreciated his work and Rob Brydon's, I've never really felt that close to the bits of their work I've liked, and have freely skipped the parts that seem excessive.

But the new show, The Trip, is a BBC triumph. Undoubtedly it helps if you know who the 2 characters are and the back stories of their public and private personas (but hell, there are American critics who, it seems, are only comprehensible if you're intimate with their entire works), but I'd like to think that even without that knowledge this new show, a few short 30-minute episodes about 2 people with history and baggage, is some of the most interesting, honest, intelligent and illuminating TV I've ever seen.

Along the way it mercilessly rips the piss out of all sorts of cliches that are such sacred cows that they normally can't be touched: ironic road trips, middle-age conceits, the very art of food criticism, amusing chat techniques (professional and amateur), celebrity as superman, comedian as everyman, the educated version of male bravado... It's like a meta-critique of meta-criticism (which is obviously not meta-meta-criticism, or have I just laid myself open to similar ridicule?).

The fact that it's all filmed in HD and mixes in what appears to be genuine footage of at least some of the people involved (the sign of a good meta-level anything is not being patronising when they want to show the authentic item underneath) makes it all doubly delicious. Did they somehow enhance the way Steve Coogan's head appears so small in comparison to the headrests in the Range Rover when set against the skyline of the Lake District and the muted colours of the various autumnal scenes, or is that just part of the understated way it's been filmed? I don't know, and given how the show works I feel it's one of the few genuine cases where asking what of the incidental was intended and what just happened, what was scripted and what was ad-lib, what was set up and what happened in hindsight, would ruin the show itself. The storyline and the various external character interactions are obviously planned, but how many of the smirks and jibes, the monologues and repeated themes, the reactions and the sneers, are character or character actor... I don't think I'd believe the truth if you told me.

I've missed the "normal slot" so I'm watching it on digital catch-up, but if you haven't seen it, grab it while you can. I swear that in 3 or so hours it'll tell you more about people, and some of the ways we genuinely interact, than 10 times as much of The Sopranos, The Wire, or almost anything else I can think of.

Less is more and all that... if I've made it sound all noble and humourless and worthy, it's not; it makes me cry laughing. But when some government finally manages to dismantle the BBC and sell it off in constituent pieces to various vultures and pornographers, this is the sort of show that will be rediscovered in 20 years' time and held up as an ideal of the sort of thing we've lost.

 

Concurrent programming is hard (so don't do it yourself)

Writing code to be concurrent is hard and, I'm convinced, almost always the wrong model. We all know threading is hard, and leads to all sorts of subtle errors, but I'm pretty sure that no layer of dressing this up nicely will really get rid of the fundamental problems.

My particular bugbear is the composability of concurrent code: how do I write a component that executes concurrently, but in a way that suits the concurrency needs of any arbitrary caller, while maintaining the encapsulation of its functionality? Let's say my component has its concurrency implemented properly, and in particular it can perform its work concurrently to make best use of a multicore or multi-machine environment; but how does it know how much concurrency to use? I don't know whether, to the caller, the execution of my component is on the critical path and should use as many resources as it can to complete in the shortest possible time, or whether it's not at all critical and should use the minimum concurrent resources, because they're meant to be dedicated to other components that are on the critical path.

With explicit threading, people tend to simply make the component split its work into as many threads as it can, and then let the machine scheduler decide which threads will run. But even if thread creation and admin overhead were zero (which they aren't), the machine scheduler has no way of knowing which threads are critical and which aren't, and so the more threads (and IPC calls and communication resources etc.) each component creates, the more resources it effectively steals from the truly critical components.

So one suggested approach is to write components that, in addition to their inputs, also take some sort of control block specifying what resources they can use and how to use them ("maxthreads" and "thread priority" values). But this is a very poor composability model: if my component then needs to use a number of others as part of its execution, without knowing precisely how those components work, execute and consume resources, there's no good way to take the control block I've been given and work out what control blocks to pass to those subcomponents.

And of course, if the concurrency requirements and performance are influenced by some non-trivial characteristic of the input data (say not just the number of items in a list but the values involved), so that it actually takes a partial solution of the problem to determine how much concurrency can be achieved, then composing an overall solution from encapsulated components becomes horribly more complex.

Another way to address this is for components to examine their environment and adjust their execution depending on what they find, but the critical path issue arises again. Just because there are execution units available and idle (assuming I can determine this) doesn't mean that, for optimal execution of the system overall, I should go ahead and use them... it may well be that those units are being 'held' for some other component that's not ready to execute yet (it's waiting for its inputs to arrive) but that will be the critical calculation when it's ready to run.

Without composition, I have no way to build components that can be used to construct large ad-hoc programs for truly efficient concurrent execution. Anything more complex than an embarrassingly parallel model becomes a one-off special-case construct, or is written in a way that shoehorns my code into some concurrency framework that probably cripples the way I can express a problem and delivers only a fraction of the performance that could be achieved for a particular problem and a particular set of resources.

So I devised a little model to avoid this: a functional concurrent language model where everything is composable. There's no explicit concurrency, because the model itself can determine the concurrency available in every single statement (much as a modern compiler and CPU can spot chances to boost performance with out-of-order execution) and then assemble larger concurrent blocks. Every component can be written simply as a statement of the calculations required, including use of other components, and the concurrency is determined dynamically, in advance of execution, but only when a complete composition is executed. The same components can be used at multiple points in such a solution, but the concurrency demanded of each one is independent of the others. A program written in this way will dynamically change its behaviour depending on the execution environment: it balances the load across the given resources, aiming to find an execution schedule that completes in the minimum time possible, that time being the time required for the critical path as determined for those execution resources (including the cost of communicating between execution units where required). Running the same program (with the same inputs etc.) on two execution environments that differ in available resources will result in different critical paths.
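
The "minimum time possible" here rests on a standard idea: if every statement is a pure node in a dependency graph, then even with unlimited execution units the completion time can't beat the critical path, the longest chain of dependent work. A minimal sketch of that textbook calculation (not the model described above), assuming made-up task costs and ignoring communication costs; `criticalPath` is a hypothetical name.

```javascript
// tasks: { name: { cost, deps: [names] } }, assumed to form a DAG.
// Computes each task's earliest finish time with unlimited execution units
// (no communication cost) and the critical path that bounds the total.
function criticalPath(tasks) {
  const finish = new Map(); // task -> earliest finish time
  const via = new Map();    // task -> the dependency that finishes last
  const visiting = new Set();

  function earliestFinish(name) {
    if (finish.has(name)) return finish.get(name);
    if (visiting.has(name)) throw new Error(`dependency cycle at ${name}`);
    visiting.add(name);
    let start = 0;
    let latest = null;
    for (const dep of tasks[name].deps) {
      const f = earliestFinish(dep);
      if (f > start) { start = f; latest = dep; }
    }
    visiting.delete(name);
    finish.set(name, start + tasks[name].cost);
    via.set(name, latest);
    return finish.get(name);
  }

  let end = null;
  for (const name of Object.keys(tasks)) {
    const f = earliestFinish(name);
    if (end === null || f > finish.get(end)) end = name;
  }
  // Walk back along the dependencies that determined each start time.
  const path = [];
  for (let n = end; n !== null; n = via.get(n)) path.unshift(n);
  return { length: end === null ? 0 : finish.get(end), path, finish };
}
```

For example, with load (cost 2), parse (3, after load), stats (1, after load), model (4, after parse) and report (1, after model and stats), the critical path is load, parse, model, report with length 10, against 11 units of total work, so no amount of extra hardware gets the run below 10.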

It's not a perfect solution of course, as it stands it makes some simplifying assumptions, and as a small but working model it has certain limitations - in particular the model would require expanding to deal with arbitrary dimensionality of the problem expression. And while functions are composable, it imposes certain limitations on operations such as passing functions themselves as parameters or return values.

But it has some fascinating characteristics: certain whole-program optimisations become available, and as it balances the loads it takes communication times into account. So it doesn't just blindly assign chunks of work to arbitrary execution units, but can take account of the fact that you typically have a grid of multicore machines, and that making data such as inputs and outputs and other artefacts (task completion and related admin info, memoisation caches etc.) available between cores is orders of magnitude quicker than communicating the same information between machines. So it can decide when, for example, it's worth doing local (per-machine) or global memoisation of certain operations. And given its whole-program optimisation, it knows the point at which no further resources will speed up execution: given an excess of execution resources, it will determine a minimal subset of those resources that it needs to meet the calculation demands. This is, admittedly, possibly a sub-optimal minimal set (a better or exhaustive scheduler may be able to do better), but it at least knows that it has no way to use further resources, and in fact will organise itself to load the execution units in an optimal way and leave other resources completely spare rather than spread itself too thinly.
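
The "knows when more resources won't help" point can be illustrated with a greedy list scheduler: simulate the dependency graph on k identical execution units (ignoring communication costs, unlike the model described), and increase k until the schedule reaches the critical-path length, which no number of units can beat. Greedy list scheduling is a heuristic, so, as with the model above, the unit count it finds may not be truly minimal. `makespan` and `minimalUnits` are hypothetical names, and the task costs are illustrative.

```javascript
// tasks: { name: { cost, deps: [names] } }, assumed to form a DAG.
// Greedy list scheduling on `units` identical execution units: whenever a
// unit is free, start the ready task with the longest chain of work from it
// to the end (its "bottom level").
function makespan(tasks, units) {
  const names = Object.keys(tasks);
  const dependents = new Map(names.map((n) => [n, []]));
  for (const n of names) for (const d of tasks[n].deps) dependents.get(d).push(n);

  const bottom = new Map();
  const level = (n) => {
    if (!bottom.has(n)) {
      bottom.set(n, tasks[n].cost + Math.max(0, ...dependents.get(n).map(level)));
    }
    return bottom.get(n);
  };
  names.forEach(level);

  const waiting = new Map(names.map((n) => [n, tasks[n].deps.length]));
  const ready = names.filter((n) => waiting.get(n) === 0);
  let running = []; // entries { name, end }
  let time = 0;
  let done = 0;
  while (done < names.length) {
    // Start as many ready tasks as there are free units, most critical first.
    ready.sort((a, b) => bottom.get(b) - bottom.get(a));
    while (running.length < units && ready.length > 0) {
      const n = ready.shift();
      running.push({ name: n, end: time + tasks[n].cost });
    }
    // Jump to the next completion and release the tasks waiting on it.
    time = Math.min(...running.map((r) => r.end));
    const finished = running.filter((r) => r.end === time);
    running = running.filter((r) => r.end !== time);
    for (const { name } of finished) {
      done += 1;
      for (const m of dependents.get(name)) {
        waiting.set(m, waiting.get(m) - 1);
        if (waiting.get(m) === 0) ready.push(m);
      }
    }
  }
  return time;
}

// Smallest number of units whose greedy schedule reaches the best possible
// time. With one unit per task nothing ever waits, so that run gives the
// critical-path length, which extra units cannot improve on.
function minimalUnits(tasks) {
  const n = Object.keys(tasks).length;
  const best = makespan(tasks, n);
  for (let k = 1; k <= n; k++) {
    if (makespan(tasks, k) === best) return { units: k, time: best };
  }
  return { units: 0, time: 0 }; // no tasks at all
}
```

With five tasks whose critical path is 10 out of 11 units of total work, for instance, one unit takes 11, two units already reach 10, and the remaining units can be left completely spare.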

And given its strongly functional constraints it's also ideally suited to repeated calculations with perturbations to the inputs: the calculation and memoisation model can be tuned so that it performs only the minimal partial recalculations required for any changes. And those repeated partial recalculations can take account of, but are in no way constrained to, the resources that were used for the original calculations. The partial recalculations will each have their own critical path depending on the data changed. This has, needless to say, huge benefits for certain classes of problems.
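
The minimal-recalculation idea can be sketched with a memo table over a dependency graph of pure functions: when an input changes, only it and its transitive dependents are invalidated, and everything else is served from the cache. This is a generic sketch under assumed names (`Incremental`, `get`, `setInput`), not the model described above; a fuller version could also stop propagating when a recomputed value turns out to be unchanged.

```javascript
// nodes: { name: { deps: [names], fn: (...depValues) => value } }
// Inputs are nodes with no deps. Note: setInput mutates the nodes object.
class Incremental {
  constructor(nodes) {
    this.nodes = nodes;
    this.cache = new Map();
    this.dependents = new Map(Object.keys(nodes).map((n) => [n, []]));
    for (const [n, { deps }] of Object.entries(nodes)) {
      for (const d of deps) this.dependents.get(d).push(n);
    }
    this.evaluations = 0; // counts fn calls, to show how much work was redone
  }

  // Memoised evaluation: compute dependencies first, then this node.
  get(name) {
    if (!this.cache.has(name)) {
      const node = this.nodes[name];
      const args = node.deps.map((d) => this.get(d));
      this.evaluations += 1;
      this.cache.set(name, node.fn(...args));
    }
    return this.cache.get(name);
  }

  // Change an input and invalidate it and everything downstream of it.
  setInput(name, value) {
    this.nodes[name] = { deps: [], fn: () => value };
    const stack = [name];
    const seen = new Set();
    while (stack.length > 0) {
      const n = stack.pop();
      if (seen.has(n)) continue;
      seen.add(n);
      this.cache.delete(n);
      stack.push(...this.dependents.get(n));
    }
  }
}
```

In a graph where `total` depends on `doubled` (which depends on input `a`) and on input `b`, while `other` depends only on `b`, changing `a` recomputes just `a`, `doubled` and `total`; `b` and `other` come straight from the cache.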

But where next? It's an intriguing prospect, but it's the sort of thing that I should probably investigate in a more academic environment. It's only a small model, a prototype to illustrate the principles and show what could be done... anyone fancy sponsoring a PhD for a truly concurrent language and execution model?

 

No... thought not... oh well, back to reinventing the way we organise ourselves, and maybe I'll publish the language and code sometime.