Recruiters: Here are the rules.

  1. The first recruiter to tell me the company name, and then send the job spec, gets to forward my details – if I think it’s interesting.
  2. No company name, or spec? No chance.
  3. If you send my details without my OK, you lose. And I tell the company you are a loser (chances are, they are too).
  4. Sending me all the information I could want to know is good – but when you do it on spec (and probably en masse as well), it does not mean that you get to claim the bounty.
  5. Let us know what is happening. That especially includes feedback from the employers.

All of the above have happened to me.

About that last point about feedback? There’s one recruiter on my list (it’s not a good list) because he didn’t bother telling me what the employer was saying about me – but a different recruiter did find out and let me know. The second one still has a chance to place me; the first, not so much. Ironically, the comments were about some rants I posted on my LinkedIn profile. It’s also a potential employer I no longer care about working for.

The strangest story happened to me about 15 years ago. I was working with a small recruiter and spending a couple of days tweaking a CV so that it was just perfect for a potential job (this was when I still wrote CVs; this year, it’s all on websites to read, not MSWord documents). Then I got a phone call from one of the largest recruitment companies in town – they had sent my CV (without my knowledge) and the employer was interested in talking to me. WTF? That was so not good. It was even worse for the small recruiter though – it turns out he knew the rogue recruiter. He was married to her.

Finally, when you do contact a candidate with a potential role, make sure you send them your details – and the details of the job(s). Just a quick email with a note of the what and the where. Without it, they will not know how to get back to you for anything. I know you love to talk on the phone (and it avoids that pesky audit trail), or you might make wonderful notes in your recruitment systems for yourself, but when we developers are looking, we can get a dozen phone calls a day from different recruiters, and quite likely on our mobile phones to boot. Most of the time we don’t get the chance to write it all down, so you should – and drop us a note about it. Otherwise, we can’t get back to you if it was interesting. So, it’s to your own benefit to keep us in the loop.


If you aren’t taking hiring seriously, other people can – and do – hire the people you need.

I’ve been guilty of it before – leaving it a couple of days, or even a week, before getting back to someone who sent in their CV – although, of course, most of the time it didn’t matter. The person wasn’t going to get hired because they were just not good enough (the generally poor quality of developers is a different rant).

A couple of times I have been bitten hard when hiring though, such as being introduced to a sysadmin on a Thursday night, following up late Friday afternoon, and finding out on Monday, when I chased him up, that he had just accepted an offer.

So, what to do? Well, to be honest, all you can do is be swift about things. Check all CVs that come in within a couple of hours at most, and for those that show promise, get back to them and arrange the next step as quickly as you can (probably a quick chat on the phone?) and pencil in – in your own calendar, if not theirs – a potential time to sit down with them properly.

Please though, after you’ve had the interview, get back to them quickly. Occasionally, I’ll have left them with a little thing to do (some code to write, or something to get back to me on); it’s a good idea to drop a quick email confirming that after they step out the door. A couple of times when I was looking for a new job, I emailed back that same afternoon, or before lunchtime the following day, to follow up with some code. Both times I was starting that role inside two weeks.

When I’ve been interviewing, I’ve even offered someone a job before they left the interview. It was obvious that the guy was a good developer – just searching for him online turned up a number of posts he’d made to relevant mailing lists. A few years later, I’d moved on myself and he was freelancing, so on my suggestion he was interviewed again, and promptly hired again.

There has been a cut-throat market for developers in the last few years, and that’s not likely to change. Really good people will always have a choice if they want it. You, as an employer, need to be worth working for.

  • Interesting project(s)
  • Enough money for that not to be an issue – though salaries for the best devs are rising fast
  • Working conditions that don’t get in the way

A future post will touch more on my ‘perfect wish-list’ of working environments.


Kindle 'screensaver'

A quick fun post for those of you with an Amazon Kindle – some instructions on how to a) jailbreak your reader (trivially easy), and then b) put your own wallpapers on there, so you get a more interesting ‘screensaver’.

It’s really easy, no more than 20 mins and a couple of reboots/software updates. Most of the time is literally waiting for the reader to restart after you’ve placed a file in the base directory.

All of the instructions are here:

A few notes to make it easier to understand:

  1. The current version is around 3.3. You can double-check that you have this (or later) by pressing ‘Menu’, and going down to ‘Settings’. The version is on the bottom line: “Page 1 of 3 Version Kindle 3.3 (61680021)”
  2. You will likely need the version marked “*-3.2.1_install.bin”
  3. Once the jailbreak has been installed, you can install the screensaver hack
  4. The zip file has a ‘src’ folder; you’ll also need to copy the src/linkss/ folder to your Kindle, as /linkss/
  5. Finally, you can add suitable wallpapers into the /linkss/screensavers/ directory. They should be greyscale .png or .jpg files

[Image: Kindle screensaver – a non-functional demo unit]

A quick search for “kindle wallpaper” or “kindle screensaver”, especially if you restrict it to 600×800 pixel images, will bring up a lot of possibilities. On the right is one I made myself (I’d heard of this particular joke wallpaper before), which you are welcome to use.


I was out last night at The Big Xmas [bash], near Silicon Roundabout. It was a fun night out meeting various people – tech, business and recruiters. Oh, the shame though – I was wearing the same T-shirt as someone else – and, yes, I have indeed replaced people with small shell scripts.

Now, to the main part of what this post is about – the rant. It’s not aimed at last night’s particular event alone though. It’s alcohol at various tech meetups in general. Look guys, you generally end up buying too much anyway, and all too often it’s also to the exclusion of those who may prefer not to get inebriated.

As an example, the Hacker News meetups will get dozens of pizzas (which are, admittedly, all eaten – there are usually 150+ people attending), but also a couple of stacks of trays of cans and bottles, 24 cans to a tray – several hundred cans at least. It’s just as well they aren’t all drunk on the night – many of the event-goers would be unconscious by the end. At least they also add a few trays of soft drinks, lemonade and cola.

If you want a couple of drinks to help lubricate the social aspect of an evening out, I’ve got no problem at all. I don’t though. I prefer to save my brain cells for doing interesting things, like, oh, writing code?

For other events, how about adding some more soft drinks to replace some of the alcohol? Last night, the choice was booze or fizzy water; that was all.

Thankfully, people don’t generally get blotto at the various meetups – at least not that I’ve seen – but I expect there have been one or two who have swerved their way home sometimes.

Do you have a comment about alcohol being served at the various meetups?  Would you like more, less, or do you think that organisers and sponsors are doing it right?  I would love to start a conversation here about the good or bad of it.


Capistrano makes deployment of code easy. If you need to do a number of additional steps as well, then the fact that they can be scripted and run automatically is a huge win.

If you’ve only got a single machine (or maybe two), then you could certainly write your own quite simple, entirely workable system – I described something just like this in a previous post: “SVN checkouts vs exports for live versions”. That was written and used before I was deploying to multiple machines, however – and it had to be run from the command line of the machine itself. It was OK even when I had a couple of machines to deploy to – I just opened an SSH session to both, and ran the command on them at the same time. When I attended the London Devops roundtable on deployment I even advocated that as a valid deployment mechanism. But at the same time as I was saying that (and it’s in the video), I was also writing Chef cookbooks and a Capistrano script to be able to build, and then deploy code to, at least four different machines at once.

A number of people have already written about how to setup Capistrano to deploy PHP scripts. I’ll not repeat their work, instead I’ll just tell you some of the problems you might come across afterwards.

cap shell is a wonderful thing, until it bites you

The Capistrano shell will let you run a simple command, or an internal task, on one or as many machines as you want. This can be useful when you are trying things out – and if you are in any way unsure where a command can be run, you can practice it. Just do:

cap> with web uptime
cap> on uptime

Those two commands just show how long each machine has been up, and its current load average. Easy and safe – and as they run, they show the list of machines they succeed on.

There are some other useful commands you can try:

## show the currently live REVISION file on each machine
cap> cat /mnt/html/deployed/current/REVISION
## This file is created as each new/updated checkout is done.
## change your path to the ./current/ path as appropriate

Since you should be deploying the same codebase to all your live machines at a time (or staging, or qa/test), the versions (or git SHA1s) should be the same as well.

Finally, in the ‘useful’ list is cap deploy:cleanup – this will remove old deployments. Keeping a few around is useful, but they can take up a lot of space. As cap --explain deploy:cleanup says:

Clean up old releases. By default, the last 5 releases are kept on each server (though you can change this with the keep_releases variable). All other deployed revisions are removed from the servers. By default, this will use sudo to clean up the old releases, but if sudo is not available for your environment, set the :use_sudo variable to false instead.

If you want to change the default to something other than 5, that can be set with the line “set :keep_releases, 10” in deploy.rb.

A few gotchas

When cap shell checks the source repo version

I’ve found that the latest version available in the main source code repository is apparently only checked when the Capistrano shell is first run. This can be useful if you want to deploy to a limited set of machines, run a test, and then deploy to all the machines (you end up with the same version checked out in the same-named ‘releases/’ directory). But if you are sitting at the cap> prompt in the Capistrano shell and doing multiple !deploy commands, you won’t get new versions of code that have since been committed to the repository. Exit the shell and re-run it to solve this.

You checked out a new version, but you can’t see it

Be wary if you are logged into a machine and sitting somewhere inside the ./current/ directory. Because the symlink is changed underneath you to point at a new directory (the newest subdirectory in releases/), if you do not do a cd . to refresh your location within the real directory tree, you will still be in an old copy of the code. The ‘cd’ makes sure you are in the latest place on disk, via the (now changed) symlink.
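You can see the effect, and the fix, from any shell – the directory names here are made up, but the mechanism is exactly what Capistrano does with its ‘current’ symlink:

```shell
# simulate two releases and a 'current' symlink
dir=$(mktemp -d) && cd "$dir"
mkdir -p releases/v1 releases/v2
ln -s releases/v1 current

cd current
pwd -P          # the real directory on disk: .../releases/v1

# a new deployment repoints the symlink underneath us
ln -sfn releases/v2 "$dir/current"
pwd -P          # still .../releases/v1 - we are in the old tree!

cd .            # re-resolve our location via the (logical) symlink path
pwd -P          # now .../releases/v2
```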

Rolling back

Capistrano has the ability to remove the currently live version and change the ‘current’ symlink back to the previous location. Should the worst happen, and a website deployment fail, this can help when ‘rolling forward’ with a fast fix, check-in and redeploy is not easily possible.

# to roll back to a previous deployment:
cap> with !deploy:rollback

If you have rolled back the webservers (PHP/app servers) you will have to restart php-fpm (or maybe Apache) on them, as they do not necessarily pick up the (old) version of the code that is now being run. The same is also true if you have set APC to cache the byte-code and not check the time-stamps of files in case they change. I’ve found that PHP-FPM also has this issue.


I’ve been pretty busy in the last couple of years, first at Binweevils and, in 2011, PeerIndex – hence the utter lack of posts. But, as the note on my personal CV site says, I’m taking some time off before looking for my next role. This gives me the opportunity to write more about PHP, scaling, and the tools around development that I’ve been using in the last couple of years, and that have been piquing my curiosity.

So, it is my plan to investigate other languages, such as Python and Ruby, and tools like Puppet and Node.js. Rest assured, I’ll keep up with the state of the art in PHP and technologies such as MongoDB though!

There are also a number of posts planned right here: more on Beanstalkd (and other queues), deployment with Capistrano, graphing and logging (including how to mark a Capistrano deployment on a graph!) and a few other things, including rants.


I’ve previously shown you why you may want to put some tasks through a queuing system, what sort of jobs you could define, and how to keep a worker process running for as long as you would like (while still being mindful of problems that happen).

In this post, I’ll show you how to put the messages into the queue, and we’ll also make a start on reading them back out.

For PHP, there are two BeanstalkD client libraries available.

Although I’ve previously used the first library in live code, I prefer the second, ‘Pheanstalk’, for this article. It is more regularly worked on and uses object orientation to the fullest, plus it’s got a test suite (based on SimpleTest, which is included in the download).

Using it, according to the example, is simple:
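A minimal sketch along the lines of that example – the server address, tube name and payload are placeholders, and it assumes beanstalkd is listening on its default port (11300):

```php
<?php
// pull in Pheanstalk's own autoloader
require_once 'pheanstalk_init.php';

$pheanstalk = new Pheanstalk('127.0.0.1');

// producer side: put a job into the 'testtube' tube
$pheanstalk->useTube('testtube')->put("job payload goes here\n");

// worker side: reserve a job from that tube, act on it, then delete it
$job = $pheanstalk->watch('testtube')->ignore('default')->reserve();
echo $job->getData();
$pheanstalk->delete($job);
```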

The ‘pheanstalk_init.php’ file adds an autoloader, though you may find it advantageous to move the main class-file hierarchy from where it was downloaded into its own directory, so that an existing auto-loader (for example, Zend Framework’s) can find it.

As you see above, the object orientation lends itself well to an (optional) ‘fluent’ programming style, where an object is returned and can then be acted on in turn:
$pheanstalk->useTube('testtube')->put("job payload goes here\n");

So, putting simple data into the queue is, well, simple (as it should be). There are advantages in wrapping this simplicity into our own class though. Some examples:

  • We want to put the same job into the queue multiple times – for example, a call to check some data in 1, 10 and 20 seconds’ time.
  • Adding a new default priority – or, with multiple classes, a small range of defaults.
  • Adding other (meta) information about the job being run, such as when it was queued, and how important it is. Some tasks might be urgent but not important – i.e. if we have the opportunity, run them now – but they don’t have to be run at all.

Each may be simple enough to handle with a simple loop, but it might be advantageous to push that down into a class – especially with the final idea.

How to store the meta-information then? It should be a text-friendly but concise format, and quick to parse. Here, JSON (or the related YAML) fits the bill quite nicely.

Processing it at the other end, after it has been fetched by the worker, is a simple matter of running ‘json_decode()’ and extracting the [‘task’] from the results before running it.
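As a sketch of both ends – the field names and tube here are my own invention, so adjust to taste:

```php
<?php
// producer side: wrap a task string with some meta-information as JSON
function queue_task($pheanstalk, $task, array $params = array(), $delay = 0)
{
    $payload = json_encode(array(
        'task'   => $task,     // e.g. '/tasks/image/resize'
        'params' => $params,
        'queued' => time(),    // when the job went into the queue
    ));
    return $pheanstalk->useTube('tasks')
                      ->put($payload, Pheanstalk::DEFAULT_PRIORITY, $delay);
}

// worker side: decode, pull the task back out, and dispatch it
$job  = $pheanstalk->watch('tasks')->reserve();
$data = json_decode($job->getData(), true);
run_task($data['task'], $data['params']);   // run_task() is yours to write
$pheanstalk->delete($job);
```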


I’m taking a slight diversion now, to show you how the main worker process runs. There are two parts to it – the actual worker, written in PHP, and the script that keeps it running.

For testing the return from the worker, we’ll just return a random number. In order to avoid returning a normally used exit value, I’ve picked a few numbers for our controls, up around the 100 range. By default a ‘die()’ or ‘exit’ will return 0, so we can’t use that to act on – though we will use it as a fall-back, generic error. Ideally, we won’t get one; instead we want the code in all the workers to just run as planned, and then have the worker execute a planned restart – which we will do immediately. We may also choose to have the worker process specifically stop – and so we’ll have an exit code for that. If there are any codes we don’t understand, we’ll slow the system down with a ‘sleep()’ to avoid a runaway process.
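A skeleton of that worker – the particular exit codes (97/99/100) are my own arbitrary picks from “up around the 100 range”; what matters is that the wrapper script agrees on them:

```php
<?php
// worker.php - skeleton only; the real job processing goes where noted
define('EXIT_RESTART_NOW',   97);  // planned restart, no pause
define('EXIT_STOP',          99);  // stop the wrapper loop entirely
define('EXIT_RESTART_PAUSE', 100); // something unexpected - pause first

// ... fetch and run queued jobs here ...

// for testing, return a random one of our control values
$codes = array(0, EXIT_RESTART_NOW, EXIT_STOP, EXIT_RESTART_PAUSE);
exit($codes[array_rand($codes)]);
```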

The actual script run from the command line is a pretty simple BASH script – all it’s got to do is loop until it gets a particular set of exit values back.

So, if it’s an exit value we know, we either:

  1. pause, then restart;
  2. immediately restart; or
  3. exit the loop.

If it’s any other value, we pause, and restart.

The bash command ‘exec $0 $@’ will re-run the current script ($0) with the original arguments ($@) – and ‘exec’ replaces the current process with the specified command. Normally, when the shell encounters a command, it forks off a child process to actually execute it. Using the exec builtin, the shell does not fork; the exec’ed command replaces the shell.
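A sketch of the wrapper, written as a function so it is self-contained. The exit codes here are the conventions I assumed above (97 restart now, 99 stop) and must match whatever the PHP worker actually returns; the version described in the text re-runs itself with ‘exec $0 $@’ instead of looping, which amounts to the same thing:

```shell
# keep_running CMD [ARGS...] - run the worker until it asks to stop.
#   exit 97 - planned restart: go straight round the loop again
#   exit 99 - deliberate stop: leave the loop
#   anything else - unexpected: pause first so a crashing worker can't spin
keep_running() {
    while true; do
        status=0
        "$@" || status=$?
        case $status in
            97) continue ;;    # planned restart, no pause
            99) return 0 ;;    # worker asked us to stop entirely
            *)  sleep 5 ;;     # unknown exit - pause, then restart
        esac
    done
}

# usage: keep_running php worker.php
```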

Save both the PHP and bash scripts, and then start the wrapper with ‘sh’. Run it a few times to see a lot of (deliberate) errors that cause the bash script to pause before restarting, restart immediately, and finally exit.

With this bash script in place, we can now run the worker as many times as we need – and it will keep running until we specifically tell it to exit. Just as usefully, we can exit the PHP worker and have it execute a planned restart – which clears any overhead the script may have picked up in memory or resource allocation.

Next time, we’ll put some simple tasks into the queue.


The use of Beanstalkd as a queueing system

What is an asynchronous queue?

The classic Wikipedia quote (Message queue):

In computer science, message queues and mailboxes are software-engineering components used for interprocess communication, or for inter-thread communication within the same process. They use a queue for messaging – the passing of control or of content. Group communication systems provide similar kinds of functionality.

So one part of a system puts a message into a queue for another part to read from, and then act upon. The asynchronous nature means that each side is otherwise independent from the other, and does not wait for a response. That independence is an important part of the nature of the system though – and we’ll see later how some of the more advanced functionality for our software of choice here can give some extraordinary flexibility to what can be done.

Why use a queuing system?

You’d be surprised how few things need to happen right now – you go and buy a fancy coffee, and they write your order down and put it into the queue for the barista to make. That disconnected set of actions works exceedingly well for such a distributed system (see Starbucks Does Not Use Two-Phase Commit).

In much the same way as you don’t get your coffee till it’s made, what about websites that have to fetch (or produce) information? One of the simpler examples is when you’ve uploaded an image to a site. That image has to be stored, and then resized into several files. If it’s a large image, it would take some time, and a lot of resources, to do that while you waited – time that you’re left twiddling your thumbs. Instead, the site returns immediately and tells you that the image is being handled in the background – and in a few seconds, or maybe minutes, it shows up on your page.

How about waiting a few seconds for other information? When you log in to a social media website, it could return a simple webpage immediately with what it’s got to hand, and then, in the background, check how many new messages you have, displaying them either by updating the page (with Ajax) or when you view a different page. Is it so vital to find out right now that you have thirty old messages and a few new ones? For a web-mail system like Gmail or Yahoo Mail, that is the point – but what about on another kind of site?


Beanstalkd is a big to-do list for your distributed application. If there is a unit of work that you want to defer to later (say, sending an email, pushing some data to a slow external service, pulling data from a slow external service, generating high-quality image thumbnails) you put a description of that work, a “job”, into Beanstalkd. Some processes (such as web request handlers), “producers”, put jobs into the queue. Other processes, “workers”, take jobs out of the queue and run them.
From the BeanstalkD FAQ

What can it do?

I’ve already mentioned a few ideas for things to have an asynchronous worker do, via a BeanstalkD queue, but there are a number of ways that it can be run, and a number of very useful facilities that BeanstalkD gives a producer of tasks.


Priorities

Simple enough to describe – given more than one task that could be run at a particular time, run the more important one. The most urgent priority is 0; the least urgent priority is 4,294,967,295 (2^32 − 1).


Tubes

This is, in my mind, one of the two secret weapons of Beanstalkd – together with delayed jobs. Tubes, or ‘named queues’, can be created at will, and you can use as many different tubes as you want to put jobs into; those jobs will only be returned to workers that are watching the given tube. Each worker can watch many tubes, but a single job can only be in one particular tube.

If you don’t use a particular tube name, jobs go into ‘default’, but there’s a lot of flexibility in sending particular jobs to specific workers, or groups of workers. For example, you could create a tube called ‘sql’ watched only by workers on a database server, or limited even further by role.

File uploads can create special problems: unless you have some significant back-end systems, they will generally be uploaded to a front-end webserver and then have to be processed there, or moved somewhere else before they can be processed. This is a common event, so how do you make sure that a request to process an image can only be picked up by a particular machine? Send it to a tube named after the hostname of the server! As long as there is a worker process there, it will be picked up and run. What it does from there is up to it – it could resize the image and save it to a local file system, or arrange for the file to be moved to a central file-storage area, and then fire another message into the queue for further processing there.
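A sketch of that routing with Pheanstalk – the tube prefix and payload here are made up for illustration:

```php
<?php
// producer (on any machine): send the job to the tube named after the
// server that actually holds the uploaded file
$pheanstalk->useTube('img-web1')
           ->put('/tasks/image/resize/filename/example.jpg');

// worker (running on web1): watch only this machine's own tube,
// using php_uname('n') to get the local hostname
$job = $pheanstalk->watch('img-' . php_uname('n'))
                  ->ignore('default')
                  ->reserve();
```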

Although BeanstalkD doesn’t (yet) have persistent queues saved to disk, you could also use a tube as a long-term hold. For example, throw a message into a tube called ‘overnight-reports’ – but don’t have a worker pick it up immediately; instead, one is only brought up to run the queued tasks in the quiet overnight hours.

The potential flexibility is enormous.


Delayed jobs

Another of the secret weapons, or killer features, of BeanstalkD is the ability to hold a message within the queue for a defined period before allowing it to be collected and acted upon. If you have an action that has to be checked repeatedly – for example, has a particular person come online? – then you can fire a number of identical tasks into the queue and allow them to slowly come out as time passes.

It can also be useful not to do everything at once – maybe setting a lower-priority task to run a few seconds after someone logs in – for example, updating an internal status or record, or checking for less-requested information.
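With Pheanstalk, the delay (and priority) are just extra arguments to put() – a sketch, with a made-up tube name and payload:

```php
<?php
// fire three identical checks that become available after 1, 10 and 20s
$payload = '/tasks/member/checkonline/id/12345';
foreach (array(1, 10, 20) as $delay) {
    // put($data, $priority, $delay) - a larger priority number is LESS urgent
    $pheanstalk->useTube('checks')->put($payload, 2048, $delay);
}
```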

How to use

Although BeanstalkD allows a large amount of information to go into the job specification (the information that is held in the queue and passed between the producers and workers), I find that a simple string can hold at least a reference to what is required. I take my lead from URLs – and use them to direct the action to be run, with a few parameters as needed. For example, imagine the following strings being sent to a BeanstalkD worker, which decodes and runs each one as a task:

  • /tasks/image/resize/filename/example.jpg
  • /tasks/image/resize/filename/example.jpg/sizeX/640/sizeY/480
  • /tasks/image/move/from/web1/to/centralstore/filename/example.jpg
  • /tasks/member/logintasks/id/12345
  • /tasks/event/add/id/12345/event/27
  • /tasks/mail/fetchcounts/id/12345
  • /tasks/mail/check-for-disallowed/id/596583405

Sending simple messages like these requires very little setup on the producer’s side, and they can be quite easily parsed by any worker process and passed on to a given function. In these examples (some of which I’ve used myself in live code), the path refers to a Zend Framework layout of module/controller/action and parameters. Rather than sending large amounts of text for the actual contents of a mail message (in the last example path), we simply refer to a record in the database for simplicity. Similarly for the image filename in the first item.
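One way a worker might decode those strings – a sketch only; a real dispatcher would validate everything before calling into any code:

```php
<?php
// split a URL-style task string into module/controller/action + params
function parse_task($path)
{
    $parts = explode('/', trim($path, '/'));
    $route = array(
        'module'     => array_shift($parts),   // e.g. 'tasks'
        'controller' => array_shift($parts),   // e.g. 'image'
        'action'     => array_shift($parts),   // e.g. 'resize'
        'params'     => array(),
    );
    // remaining segments are key/value pairs
    while (count($parts) >= 2) {
        $key = array_shift($parts);
        $route['params'][$key] = array_shift($parts);
    }
    return $route;
}

$r = parse_task('/tasks/image/resize/filename/example.jpg/sizeX/640');
// $r['params'] is array('filename' => 'example.jpg', 'sizeX' => '640')
```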

Next time:

Following articles in this series will show code to insert some messages into the queue. From there, I’ll show you how to have a worker keep running reliably, and pick up and run the jobs as required.


Phew. That would have been embarrassing if I’d not passed my ZCE on Thursday afternoon (Jun 4th, 2009).
