Booze at tech meetups

I was out last night at The Big Xmas [bash] #, near Silicon Roundabout. It was a fun night out meeting various people: tech, business and recruiters. Oh, the shame though – I was wearing the same T-shirt as someone else – and, yes, I have indeed replaced people with small shell scripts.

Now, to the main part of what this post is about – the rant. It’s not aimed at last night’s event alone, though; it’s about alcohol at tech meetups in general. Look guys, you generally end up buying too much anyway, and all too often it’s also to the exclusion of those who may prefer not to get inebriated.

As an example, the Hacker News meetups will get dozens of pizzas (which are, admittedly, all eaten – there are usually 150+ people attending), but also a couple of stacks’ worth of trays of cans and bottles, with 24 cans in each tray: several hundred cans at least. It’s just as well they aren’t all drunk on the night – many of the event-goers would be unconscious by the end. At least they also add a few trays of soft drinks, lemonade and cola.

If you want a couple of drinks to help lubricate the social aspect of an evening out, I’ve got no problem at all. I don’t though. I prefer to save my brain cells for doing interesting things, like oh, writing code?

For other events, how about adding some more soft drinks to replace some of the alcohol? Last night, the choice was booze or fizzy water; that was all.

Thankfully, people don’t generally get blotto at the various meetups – at least, not that I’ve seen – but I expect there have been one or two who have swerved their way home sometimes.

Do you have a comment about alcohol being served at the various meetups?  Would you like more, less, or do you think that organisers and sponsors are doing it right?  I would love to start a conversation here about the good or bad of it.

Posted in quick

Deployment with Capistrano – the Gotchas

Capistrano makes deployment of code easy. If you need to do a number of additional steps as well, then the fact that they can be scripted and run automatically is a huge win.

If you’ve only got a single machine (or maybe two), then you could certainly write your own quite simple, and entirely workable, system – I described something just like this in a previous post: “SVN checkouts vs exports for live versions”. That was written and used before I was deploying to multiple machines, however, and it had to be run from the command line of the machine itself. It was OK even when I had a couple of machines to deploy to – I just opened an SSH session to both, and ran the command on them both at the same time. When I attended the London Devops roundtable on Deployment I even advocated that as a valid deployment mechanism. But at the same time as I was saying that (and it’s in the video), I was also writing Chef cookbooks and a Capistrano script to be able to build, and then deploy code to, at least four different machines at once.

A number of people have already written about how to set up Capistrano to deploy PHP scripts. I’ll not repeat their work; instead, I’ll just tell you about some of the problems you might come across afterwards.

cap shell is a wonderful thing, until it bites you

The Capistrano shell will let you run a simple command or an internal task on one machine, or on as many as you want. This can be useful when you are trying things out – and if you are in any way unsure where a command will be run, you can practise with something harmless first:

cap> with web uptime
cap> on host.example.com uptime

Those two commands just show how long each machine has been up, and its current load average. Easy, and safe – and as they run, they show the list of machines they succeeded on.

There are some other useful commands you can try:

## show the currently live REVISION file on each machine
cap> cat /mnt/html/deployed/current/REVISION
## This file is created as each new/updated checkout is done.
## change your path to the ./current/ path as appropriate

Since you should be deploying the same codebase to all your live machines (or staging, or qa/test) at the same time, the versions (or git SHA-1s) should be the same as well.

Finally, in the ‘useful’ list is cap deploy:cleanup – this will remove old deployments. Keeping a few around is useful, but they can take up a lot of space. As cap --explain deploy:cleanup says:

Clean up old releases. By default, the last 5 releases are kept on each server (though you can change this with the keep_releases variable). All other deployed revisions are removed from the servers. By default, this will use sudo to clean up the old releases, but if sudo is not available for your environment, set the :use_sudo variable to false instead.

If you want to change the default to something other than 5, that can be set with the line “set :keep_releases, 10” in deploy.rb.

A few gotchas

When cap shell checks the source repo version

I’ve found that the latest version available in the main source code repository is apparently only checked when the Capistrano shell is first run. This can be useful if you want to check out to a limited set of machines, run a test and then check out to all the machines (you end up with the same version checked out in the same-named ‘releases/’ directory). But if you are sitting at the cap> prompt in the Capistrano shell and doing multiple !deploy commands, you won’t get new versions of code that have been committed to the repository in the meantime. Exit the shell and re-run it to solve this.

You checked out a new version, but you can’t see it

Be wary if you are logged into the machine and sitting somewhere inside the ./current/ directory. The ‘current’ symlink is changed underneath you to point at a new directory (the newest subdirectory in releases/), so if you do not do a cd . to refresh your location within the real directory tree, you will still be in an old copy of the code. The ‘cd’ makes sure you are in the latest place on disk, via the (now changed) symlink.
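The effect is easy to reproduce away from a live server. The sketch below is only an illustration: it builds a throwaway directory with two fake releases (the paths and the REVISION contents are made up), repoints the ‘current’ symlink the way a deploy would, and reads the REVISION file before and after a cd .

```shell
#!/bin/sh
# Throwaway layout standing in for a Capistrano deploy directory (made-up paths).
base=$(mktemp -d)
mkdir -p "$base/releases/1" "$base/releases/2"
echo "rev-1" > "$base/releases/1/REVISION"
echo "rev-2" > "$base/releases/2/REVISION"
ln -s "$base/releases/1" "$base/current"

# We are "logged in" and sitting inside ./current/
cd "$base/current" || exit 1

# A deploy repoints the symlink at the newest release.
ln -sfn "$base/releases/2" "$base/current"

# Our shell is still inside the old release directory on disk...
before=$(cat REVISION)

# ...until we re-resolve our logical path through the (now changed) symlink.
cd . || exit 1
after=$(cat REVISION)

echo "before cd .: $before"
echo "after  cd .: $after"
```

The first read comes from the old release and the second from the new one, even though the prompt would have shown the same path throughout.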

Rolling back

Capistrano has the ability to remove the currently live version and change the ‘current’ symlink back to the previous location. Should the worst happen and a website deployment fail, this can help if ‘rolling forward’ with a fast fix, check-in and redeploy is not easily possible.

# to roll back to a previous deployment:
cap> !deploy:rollback

If you have rolled back the webservers (php/app servers) you will have to restart php-fpm (or maybe Apache) on the servers, as they do not necessarily pick up the (old) version of the code that should now be running. The same is also true if you have set APC to cache the byte-code and not check the time-stamps of files in case they change. I’ve found that PHP-FPM also has this issue.

Posted in tools

Back from the coalface

I’ve been pretty busy in the last couple of years, first at Binweevils and in 2011, PeerIndex – hence the utter lack of posts, but as the note on my personal CV site says, I’m taking some time off between looking for my next role. This does give the opportunity to write more of PHP Scaling and the tools around development that I’ve been using in the last couple of years, and that have been piquing my curiosity.

So, it is my plan to investigate other languages such as Python and Ruby, and tools like Puppet and Node.js. Rest assured, I’ll keep up with the state of the art in PHP and such technologies as MongoDB though!

There are also a number of planned posts right here: more on Beanstalkd (and talking about other queues), Deployment with Capistrano, graphing and logging (including how to mark a Capistrano deployment in a graph!) and a few other things, including rants.

Posted in fun, quick

Doing the work elsewhere – Adding a job to the queue

I’ve previously shown you why you may want to put some tasks through a queuing system, what sort of jobs you could define, plus how to keep a worker process running for as long as you would like (but still be mindful of problems that happen).

In this post, I’ll show you how to put the messages into the queue, and we’ll also make a start on reading them back out.

For PHP, there are two BeanstalkD client libraries available.

Although I’ve previously used the first class in live code, I’m preferring the second, ‘Pheanstalk’, for this article. It is more regularly worked on, and uses object orientation to the fullest, plus it’s got a test suite (based on Simpletest, which is included in the download).

Using it, according to the example, is simple:

The ‘pheanstalk_init.php’ file adds an autoloader, though you may find it advantageous to move the main class file hierarchy from where it had been downloaded into its own directory so that an existing (for example Zend Framework) auto-loader can find it.

As you see above, the object orientation lends itself well to an (optional) ‘fluent’ programming style, where an object is returned and can then be acted on in turn:
$pheanstalk->useTube('testtube')->put("job payload goes here\n");

So, putting simple data into the queue is, well, simple (as it should be). There are advantages in wrapping this simplicity into our own class though. Some examples:

  • Putting the same job into the queue multiple times – for example, a call to check some data in 1, 10 and 20 seconds’ time.
  • Adding a new default priority – or, with multiple classes, a small range of defaults.
  • Adding in other (meta) information about the job that is being run, such as when it was queued, and how important it is. Some tasks might be urgent, but not important – i.e., if we have the opportunity, run them now – but they don’t have to be run at all.

Each may be simple enough to do with a simple loop, but it might be advantageous to push that down into a class – especially for the final idea.
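As a rough illustration of the first idea, here is a sketch in shell (the language of the runner script in this series) rather than PHP. It formats the raw ‘put’ command from the beanstalkd text protocol, put <pri> <delay> <ttr> <bytes>, once for each delay. The function names, the priority of 1024, the 60-second time-to-run and the task string are all illustrative assumptions; a wrapper class around Pheanstalk would run the equivalent loop around its own put() call.

```shell
#!/bin/sh
# Format one beanstalkd "put" command: put <pri> <delay> <ttr> <bytes>\r\n<data>\r\n
# (beanstalk_put_cmd is a hypothetical helper, not part of any client library)
beanstalk_put_cmd() {
    put_pri=$1
    put_delay=$2
    put_ttr=$3
    put_data=$4
    # The protocol wants the body length in bytes.
    put_bytes=$(printf '%s' "$put_data" | wc -c | tr -d ' ')
    printf 'put %d %d %d %d\r\n%s\r\n' \
        "$put_pri" "$put_delay" "$put_ttr" "$put_bytes" "$put_data"
}

# Queue the same check to become ready in 1, 10 and 20 seconds' time.
queue_repeated_check() {
    for check_delay in 1 10 20; do
        beanstalk_put_cmd 1024 "$check_delay" 60 "$1"
    done
}

queue_repeated_check "/tasks/mail/fetchcounts/id/12345"
```

Actually sending that output to a beanstalkd server (it listens on port 11300 by default) is left out here, as that part depends on your setup.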

How to store the meta-information then? It should be a text-friendly but concise format, and quick to parse. Here, JSON (or the related YAML) fits the bill quite nicely.

Processing it at the other end, after it has been fetched by the worker is a simple matter of running ‘json_decode()’ and extracting the [‘task’] from the results before running it.

Posted in php, scaling, tools

Doing the work elsewhere – Sidebar – running the worker

I’m taking a slight diversion now, to show you how the main worker processor runs. There are two parts to it – the actual worker, written in PHP, and the script that keeps running it.

For testing with return from the worker, we’ll just return a random number. In order to avoid returning a normally used exit value, I’ve picked a few numbers for our controls, up around the 100 range. By default a ‘die()’ or ‘exit’ will return a ‘0’, so we can’t use that to act on – though we will use it as a fall-back as a generic error. Ideally, we won’t get one, instead we want the code in all the workers to just run as planned, and then have the worker execute a planned restart – and we will just immediately restart. We may also choose to have the worker process specifically stop – and so we’ll have an exit code for that. If there are any codes we don’t understand, we’ll slow the system down with a ‘sleep()’ to avoid running away with the process.

The actual script that is run from the command line is a pretty simple BASH script – all it’s got to do is to loop, until it gets a particular set of exit values back.


So, if it’s an exit value we know, we either
1/ pause, then restart
2/ immediately restart
3/ exit the loop.
If it’s any other value, we pause, and restart.

The bash command ‘exec $0 $@’ re-runs the current script ($0) with its original arguments ($@). Normally, when the shell encounters a command, it forks off a child process to actually execute it. With the exec builtin, the shell does not fork; the exec’ed command replaces the shell process itself.
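The decision logic described above can be sketched roughly as follows. This is not the original script: the three exit codes (97, 98, 99) and the function names are made up for illustration, and it restarts the worker with a plain while loop rather than re-running itself with exec. If you do keep the exec approach, writing it as exec "$0" "$@" preserves arguments that contain spaces.

```shell
#!/bin/sh
# Hypothetical exit codes; the worker and this runner only have to agree on them.
EXIT_PAUSE_RESTART=97     # pause, then restart
EXIT_PLANNED_RESTART=98   # restart immediately
EXIT_STOP=99              # leave the loop

# Map a worker exit code to what the runner should do next.
action_for_exit_code() {
    case "$1" in
        "$EXIT_PLANNED_RESTART") echo "restart" ;;
        "$EXIT_STOP")            echo "stop" ;;
        "$EXIT_PAUSE_RESTART")   echo "pause" ;;
        *)                       echo "pause" ;;   # 0 or anything unknown: slow down
    esac
}

# run_worker_loop <pause-seconds> <command> [args...]
run_worker_loop() {
    pause_secs=$1
    shift
    while true; do
        code=0
        "$@" || code=$?
        case "$(action_for_exit_code "$code")" in
            stop)    return 0 ;;
            restart) ;;                      # straight back round the loop
            pause)   sleep "$pause_secs" ;;  # avoid a runaway restart loop
        esac
    done
}

# Example (not run here): run_worker_loop 5 php worker.php
```

Keeping the code-to-action mapping in its own function means it can be checked without starting a real worker.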

Save both the PHP and bash script, and then you can start the script with ‘sh runBeanstalkd-worker.sh’, run it a few times to see a lot of (deliberate) errors that cause the bash script to pause before restart, immediately restart and finally exit.

With this bash script in place, we can now run the worker as many times as we need – and it will keep running until we specifically tell it to exit. Just as usefully, we can exit the PHP worker and have it execute a planned restart, which will clear any overheads that the script may have picked up in memory or resource allocation.

Next time, we’ll put some simple tasks into the queue.

Posted in php, scaling

Doing the work elsewhere – Asynchronous Message Queues

The use of Beanstalkd as a queueing system

What is an asynchronous queue

The classic Wikipedia quote (Message queue):

In computer science, message queues and mailboxes are software-engineering components used for interprocess communication, or for inter-thread communication within the same process. They use a queue for messaging – the passing of control or of content. Group communication systems provide similar kinds of functionality.

So one part of a system puts a message into a queue for another part to read from, and then act upon. The asynchronous nature means that each side is otherwise independent from the other, and does not wait for a response. That independence is an important part of the nature of the system though – and we’ll see later how some of the more advanced functionality for our software of choice here can give some extraordinary flexibility to what can be done.

Why use a queuing system?

You’d be surprised how few things need to happen right now – you go and buy a fancy coffee, and they write your order down and put it into the queue for the barista to make. That disconnected set of actions works exceedingly well for such a distributed system (see Starbucks Does Not Use Two-Phase Commit).

In much the same way as you don’t get your coffee till it’s made, what about websites that have to fetch (or produce) information? One of the simpler examples is when you’ve uploaded an image to Flickr.com. That image has to be stored, and then resized into several files. If it’s a large image though, it would take some time, and a lot of resources, to do that while you waited – time in which you’re left twiddling your thumbs. Instead, it returns immediately, and tells you that the image is being handled in the background – and in a few seconds, or maybe minutes, it shows up on your page.

How about waiting a few seconds for other information? How about, when you login to a social media website, it returns a simple webpage immediately with what it’s got to hand, but then in the background, checks how many new messages you have, and displays them either by updating the page (with ajax), or when you view a different page. Is it so vital you find out that you have thirty old messages, and a few new ones – right now? For a web-mail system like Gmail, or Yahoo Mail, that is the point – but what about on another kind of site?

BeanstalkD

Beanstalkd is a big to-do list for your distributed application. If there is a unit of work that you want to defer to later (say, sending an email, pushing some data to a slow external service, pulling data from a slow external service, generating high-quality image thumbnails) you put a description of that work, a “job”, into Beanstalkd. Some processes (such as web request handlers), “producers”, put jobs into the queue. Other processes, “workers”, take jobs out of the queue and run them.
From the BeanstalkD FAQ

What can it do?

I’ve already mentioned a few ideas for things to have an asynchronous worker do, via a BeanstalkD queue, but there are a number of ways that it can be run, and a number of very useful facilities that BeanstalkD gives a producer of tasks.

Priorities

Simple enough to describe – given more than one task that could be run at a particular time, run the more important. The most urgent priority is 0; the least urgent priority is 4,294,967,295 (2^32 − 1).

Tubes

This is, to my mind, one of the two secret weapons of Beanstalkd – together with delayed jobs. Tubes, or ‘named queues’, can be created at will, and you can use as many different tubes as you want to put jobs into, but those jobs will only be returned to workers that are watching the given tube. Each worker can watch many tubes, but a single job can only be in one tube at a time.

If you don’t use a particular tube-name, it goes into ‘default’, but there’s a lot of flexibility in sending particular jobs to specific workers, or groups of workers. For example, you could create a tube called ‘sql’ watched by workers on a database server, or even further limited by role.

File uploads can create special problems. Unless you have some significant back-end systems, they will generally be uploaded to a front-end webserver, and then either have to be processed there or moved on to somewhere else before they can be processed. This is a common event, so how do you make sure that any request to process an image can only be picked up by a particular machine? Send it to a tube named after the hostname of the server! As long as there is a worker process there, it will be picked up, and run. What it does from there is up to it – it could resize the image and save it to a local file system, or arrange for the file to be moved to a central file-storage area, and then fire another message into the queue for further processing there.

Although BeanstalkD doesn’t (yet) have persistent queues saved to disk, you could also use a tube as a long-term hold. For example, throw a message into a tube called ‘overnight-reports’ – but don’t have a worker pick it up immediately, instead one is only brought up to run the queue tasks in the quiet overnight hours.

The potential flexibility is enormous.

Delays

Another of the secret weapons, or killer features, of BeanstalkD is the ability to hold a message within the queue for a defined period before allowing it to be collected and acted upon. If you have an action that has to be checked repeatedly (for example, has a particular person come online?), then you can fire a number of identical tasks into the queue and allow them to slowly come out as the time passes.

It can also be useful to not do everything at once – maybe setting a lower-priority task that would run a few seconds after someone logs in – for example, updating an internal status or record – or checking for lesser-requested information.

How to use

Although BeanstalkD allows a large amount of information to go into the job-specification (the information that is held in the queue and passed between the producers and workers), I find that a simple string can hold at least a reference to what is required. I take my lead from URLs – and use them to direct the action to be run, and a few parameters as needed. For example – imagine the following strings being sent to a BeanstalkD worker, which it decodes and runs as a task:

  • /tasks/image/resize/filename/example.jpg
  • /tasks/image/resize/filename/example.jpg/sizeX/640/sizeY/480
  • /tasks/image/move/from/web1/to/centralstore/filename/example.jpg
  • /tasks/member/logintasks/id/12345
  • /tasks/event/add/id/12345/event/27
  • /tasks/mail/fetchcounts/id/12345
  • /tasks/mail/check-for-disallowed/id/596583405

Sending simple messages like these would require very little setup from the producer’s side, and can be quite easily parsed by any worker process to pass on to a given function. In these examples (some of which I’ve used myself in live code), the path refers to a Zend Framework layout of module/controller/action & parameters. Rather than sending large amounts of text for the actual contents of a mail message (in the last example path), we simply refer to a record in the database for simplicity. Similarly for an image filename in the first item.
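To give a feel for how little parsing is needed, here is a minimal sketch in shell of splitting one of those strings into its module/controller/action and key/value parameters. The function name and the output format are assumptions made for this illustration; a PHP worker would do the equivalent with explode().

```shell
#!/bin/sh
# Split a task string into module/controller/action plus key/value parameters.
# The body runs in a subshell ( ... ) so the IFS and globbing changes stay local.
parse_task_path() (
    path=${1#/}          # drop the leading slash
    set -f               # no filename expansion while splitting
    IFS=/
    set -- $path         # split on "/" into positional parameters
    [ "$#" -ge 3 ] || exit 1   # need at least module/controller/action
    echo "module=$1"
    echo "controller=$2"
    echo "action=$3"
    shift 3
    # Remaining segments are key/value pairs; a trailing key with no value is ignored.
    while [ "$#" -ge 2 ]; do
        echo "param.$1=$2"
        shift 2
    done
)

parse_task_path "/tasks/image/resize/filename/example.jpg/sizeX/640/sizeY/480"
```

For the string above this prints module=tasks, controller=image and action=resize, followed by param.filename=example.jpg, param.sizeX=640 and param.sizeY=480, one per line.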

Next time:

Following articles in this series will show code to insert some messages into the queue. From there, I’ll show you how to have a worker keep running reliably and pick and run the jobs as required.

Posted in advanced, php, scaling

($me instanceOf ZCE) === true

Phew. That would have been embarrassing if I’d not passed my ZCE on Thursday afternoon (Jun 4th, 2009).

Posted in quick, zce

Upcoming posts – keep watching

Just a quick note on what is going to be posted in the next few weeks – I’ve got a few significant pieces in mind for various topics – including:

  • Doing the work elsewhere – asynchronous queues

This is going to be a series of articles – and to support it, I’m rewriting some code that I had originally written for my last job (v2, and so significantly improved over the original). First though, I’ll tell you what is planned, and just how asynchronous queues are used and how they can be incredibly useful for scaling up any significant website, and not just in the obvious ways.

  • Mail queuing, on a vast scale

Dave Marshall has just posted an entry on Using message queues to improve user experience where he queues up some emails in order to spool them out over the course of a few minutes.
For the last 18 months, I’d been doing something very similar, on a far larger scale, with PEAR’s Mail_Queue.
I’ll show you how I did it: how the messages were generated quickly, and how they could be sent out – more importantly, without destroying the system it was running on. As a bonus, I’ll show you how, if you run some form of internal mail system, you could save gigabytes of database space and give yourself vastly more flexibility.

  • Self stubbing mocks

Using the mocking functionality in PHPUnit & SimpleTest can be complicated, with the various calls that are required. There’s just not much documentation around the PHP world on how to use it.
Another method, which can be easier to understand, is self-stubbing – putting your code into the test class. I’ll show some examples of how to do that.

Finally, I’m going to be doing my ZCE exam in the next week or two – quite possibly on Thursday 4th June (2009). Keep a close eye here for the results, and a follow-up.

Posted in php

I laugh at your ZCE exam prep tests #2

Back at the PHP London Conference at the end of February, iBuildings was offering a little test, with a prize for people who could do well answering the sort of questions that are on the ZCE exam. Never one to turn down something useful for free, I took ten minutes to answer the eight questions. A few weeks later, I got an email from them/Zend to say I’d won the chance to take an exam – ZCE, or ZFE (Zend Framework). Although I use ZF, I don’t know it well enough to begin to pass any exam on it, so, as I’ve still not had the chance to take the ZCE, I figured, why not take it on their dime?

About 14 months ago, I’d bought 5 tries on the PHPArch-based ‘Vulcan’ test prep exam. Today, I’ve come back to it, and gone through it again. Like last time, the test (practice and real) is scheduled to take up to 90 minutes, but I whipped through all 70 questions in 45 minutes.

I’m amused by the fact the only part of this I failed was ‘Basic Language’. The first time around it was design patterns. Either way, now I’ve got some time, I’m going to schedule the test for quite possibly later this week and see about getting the paperwork for it.

It’s also still 7 ‘EXCELLENT’s, and a fail – just in different places 🙂

Category                                      Grade
XML & Web Services                            PASS
Arrays                                        PASS
Web Features                                  EXCELLENT
Basic Language                                FAIL
Streams and Network Programming               PASS
Database Access                               PASS
String Manipulation and Regular Expressions   EXCELLENT
PHP 4/5 differences                           EXCELLENT
Security                                      EXCELLENT
OOP                                           EXCELLENT
Functions                                     EXCELLENT
Design                                        EXCELLENT

Overall : EXCELLENT

Posted in php

Riddled me that

Well go figure. I’ve just won $50 (Canadian, that’s about $3000 USD by now) of books and ‘stuff’ from PHP Arch, care of its publisher, Marco Tabini’s, blog.

He’d put a little puzzle up last night, some long numbers, and a few short. I recognised them as almost ISBNs – it wasn’t hard to figure them as having dropped a zero from the front, making them “php|architect’s Guide to Programming with Zend Framework” and “php|architect’s Zend PHP 5 Certification Study Guide, 2nd Edition”. From there, guessing the other numbers were page, line and word counts was easy.

So, what should I buy? I’ve already got a subscription to the magazine – PDF edition (it’s so much easier to ship bits over the Atlantic…).

Posted in php