Tuesday, 18 December 2007

How to multi-thread using PHP

Apologies for the possibly dodgy code formatting: Blogger doesn't seem to respect pre tags in its editor...

Well, nearly...

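We all know that PHP is a single-threaded language, right? Well, that's true, but PHP does have non-blocking I/O. The curl multi functions (curl_multi_*) are a good example: you can ask curl multi to make one or more HTTP requests, go off and do something else, come back, check whether they've finished, and so on. Another good example is fopen, or more precisely streams such as those from fopen() or fsockopen() switched into non-blocking mode with stream_set_blocking(). Using non-blocking I/O we can parallelise our I/O, and hence split our logic into "threads", to get a big speed increase. With several requests in flight at once, the total wait is roughly that of the slowest request rather than the sum of all of them.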
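To make "come back and see if it's finished" concrete, here's a minimal sketch of polling a non-blocking stream. It assumes a Unix-like system. A local socket pair from stream_socket_pair() stands in for a remote web server; a real client would get its stream from something like fsockopen(). The isReadable() helper is a hypothetical name. It calls stream_select() with a zero timeout, so the check never waits.

```php
<?php
// Hypothetical helper: TRUE if $stream has data ready right now.
// A zero timeout means stream_select() returns immediately instead of waiting.
function isReadable($stream): bool
{
    $read = [$stream];
    $write = null;
    $except = null;
    return stream_select($read, $write, $except, 0) > 0;
}

// Assumption for illustration: a local socket pair plays the remote server.
[$server, $client] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
stream_set_blocking($client, false);

// Nothing has been sent yet, and we find that out without blocking.
$readyBefore = isReadable($client);

// ... the script is free to do other work here ...

fwrite($server, "HTTP/1.0 200 OK\r\n\r\nhello");   // the "server" replies
$readyAfter = isReadable($client);
$response = $readyAfter ? fread($client, 8192) : '';

fclose($client);
fclose($server);
```

In a real script, the "other work" would be kicking off more requests. Organising that work is exactly what the pseudo threads in the rest of this post are for.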

An example perchance?

Ok, so imagine that we have a couple of similar pieces of logic, each something like this:

  1. Construct HTTP request
  2. Make HTTP request
  3. Wait for response (JSON or similar)
  4. Process response

Sounds familiar, right? It happens on pretty much every enterprise web server, as web services are pretty much the norm now. So, what we'd really like is to avoid this:

  1. Construct HTTP request 1
  2. Make HTTP request 1
  3. Wait for response 1
  4. Process response 1
  5. Construct HTTP request 2
  6. Make HTTP request 2
  7. Wait for response 2
  8. Process response 2

Well, using non-blocking I/O you can get parallel on its ass.

  1. Construct HTTP request 1
  2. Make HTTP request 1
  3. Construct HTTP request 2
  4. Make HTTP request 2
  5. Wait for response 1 or 2
  6. Process response 2
  7. Wait for response 1
  8. Process response 1

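Notice that here response 2 comes back first! Responses get processed in whatever order they arrive, not the order in which the requests were made.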
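Here's a rough back-of-the-envelope model of the two timelines, written in PHP. The latencies (300 ms and 200 ms) are made-up illustrative numbers, not measurements. compareTimelines() is a hypothetical helper. The model assumes that constructing requests and processing responses take negligible time next to waiting on the network. Under that assumption, sequential cost is the sum of the waits, and parallel cost is roughly the longest one.

```php
<?php
// Hypothetical helper. $latenciesMs maps request number => latency in ms.
// Assumes build/process time is negligible compared with network waits.
function compareTimelines(array $latenciesMs): array
{
    // Sequential: each request only starts after the previous one finished.
    $sequentialMs = array_sum($latenciesMs);

    // Parallel: all requests start at ~0 ms, so they complete in latency
    // order and the batch is done when the slowest one returns.
    $byCompletion = $latenciesMs;
    asort($byCompletion);

    return [
        'order'        => array_keys($byCompletion),
        'sequentialMs' => $sequentialMs,
        'parallelMs'   => max($latenciesMs),
    ];
}

// Request 1 is slower than request 2, so response 2 arrives first.
$result = compareTimelines([1 => 300, 2 => 200]);
```

With these numbers, the sequential version takes about 500 ms and the parallel one about 300 ms, and response 2 is processed first.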

Ok, so how do we manage the path of execution through our code? How do we know which data is ready? Well, let's define some things:

  1. let's call each strand of logic a pseudo thread, or PseuThread;
  2. certain things block execution of the PseuThread - like waiting for an HTTP request to finish - and we will term these things Blockers;
  3. let's have a manager that can look after our threads, a PseuThreadManager;
  4. the entry point for each piece of logic is some method + some arguments, represented as a MethodCall object.

So here we have two threads, each of which is blocked for a while by an HTTP request. If you're feeling headstrong, there is a class diagram further on down... Time for some code maybe?
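The vocabulary above maps fairly naturally onto PHP types. Here's a minimal sketch of two of them. The post doesn't show these classes, so the following are all assumptions for illustration: the Blocker interface, its isBlocking() method, MethodCall::invoke() and the toy CountdownBlocker. MethodCall takes the same callable-plus-arguments pair used in the sample code.

```php
<?php
// Hypothetical: anything that can hold up a PseuThread, e.g. an HTTP request.
interface Blocker
{
    public function isBlocking(): bool;
}

// An entry point: a callable (e.g. array($object, 'method')) plus arguments.
class MethodCall
{
    private $callable;
    private $args;

    public function __construct(callable $callable, array $args = [])
    {
        $this->callable = $callable;
        $this->args = $args;
    }

    // Hypothetical method name: run the call and hand back its return value.
    public function invoke()
    {
        return call_user_func_array($this->callable, $this->args);
    }
}

// Toy blocker that stops blocking after a fixed number of checks.
class CountdownBlocker implements Blocker
{
    private $checksLeft;

    public function __construct(int $checks)
    {
        $this->checksLeft = $checks;
    }

    public function isBlocking(): bool
    {
        return $this->checksLeft-- > 0;
    }
}

$call = new MethodCall('str_repeat', ['ab', 3]);
$blocker = new CountdownBlocker(2);
$checks = [$blocker->isBlocking(), $blocker->isBlocking(), $blocker->isBlocking()];
```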

Some sample code

// object + method + arguments => result (eventually)
$methodCall1 = new MethodCall(
  array($someObject, 'someMethod'),
  array($arg1_1, $arg1_2)
);

$thread1 = $manager->createNewThread($methodCall1);

$methodCall2 = new MethodCall(
  array($someOtherObject, 'someOtherMethod'),
  array($arg2_1)
);

$thread2 = $manager->createNewThread($methodCall2);

// this will block execution until all threads are finished
$manager->blockUntilAllThreadsFinished();

var_dump(
  $thread1->getResult(),
  $thread2->getResult()
);

Ok, so you can see how to define an entry point for each thread using a MethodCall object, and how to turn these into threads, but how do you control execution? Well, let's say $someObject is an instance of SomeClass, and look at some more code.

class SomeClass
{
  function someMethod($someArg, $someOtherArg)
  {
    // construct and make http request
    $httpRequest = $this->getHttpRequest($someArg, $someOtherArg);

    // the thread won't finish or run its next MethodCall until the http request has finished
    $this->getThreadManager()->addBlockerToCurrentThread($httpRequest);

    // continue in someMethodPartTwo() once the http request has finished
    $methodCall = new MethodCall(
      array($this, 'someMethodPartTwo'),
      array($httpRequest)
    );

    return $methodCall;
  }

  function someMethodPartTwo($httpRequest)
  {
    $response = $httpRequest->getResponse();

    // the thread finishes here: nothing is blocking it any more, and we
    // return a plain value rather than another MethodCall
    return $response;
  }

  ...

}

I've deliberately left some code out here, but the important stuff is in there. You can see how control flows from the thread's point of view. But what about from the manager's point of view?

function blockUntilAllThreadsFinished()
{
  while (TRUE)
  {
    foreach ($this->unfinishedThreads as $key => $thread)
    {
      if ( ! $thread->execute() )
      {
        // still blocked, so nothing ran: try again on the next pass
        continue;
      }

      // thread has executed
      if ( $thread->isFinished() )
      {
        unset($this->unfinishedThreads[$key]);
      }
    }

    if ( empty($this->unfinishedThreads) )
    {
      break;
    }

    // have a nap, so we don't spin the CPU while every thread is blocked
    $this->sleep();

  }
}
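To watch that loop terminate, here's a self-contained sketch you can run. The thread here is a hypothetical StubThread, not the real PseuThread. It reports itself blocked for a set number of passes and then finishes in one step. addThread() stands in for createNewThread(). sleep() is assumed to be a short usleep(). The public $passes counter is only there to show how many passes the loop needed.

```php
<?php
// Hypothetical stand-in for PseuThread: blocked for a fixed number of
// passes, then finishes in a single step and records that in a shared log.
class StubThread
{
    private $name;
    private $pollsLeft;
    private $finished = false;
    private $log;

    public function __construct(string $name, int $polls, ArrayObject $log)
    {
        $this->name = $name;
        $this->pollsLeft = $polls;
        $this->log = $log;
    }

    // Same contract as in the post: returns FALSE, doing nothing, while blocked.
    public function execute(): bool
    {
        if ($this->pollsLeft-- > 0) {
            return false;
        }
        $this->finished = true;
        $this->log[] = $this->name;
        return true;
    }

    public function isFinished(): bool
    {
        return $this->finished;
    }
}

class PseuThreadManager
{
    private $unfinishedThreads = [];
    public $passes = 0; // only here to show how many passes were needed

    public function addThread(StubThread $thread): void
    {
        $this->unfinishedThreads[] = $thread;
    }

    public function blockUntilAllThreadsFinished(): void
    {
        while (true) {
            $this->passes++;
            foreach ($this->unfinishedThreads as $key => $thread) {
                if (!$thread->execute()) {
                    continue; // still blocked: try again next pass
                }
                if ($thread->isFinished()) {
                    unset($this->unfinishedThreads[$key]);
                }
            }
            if (empty($this->unfinishedThreads)) {
                break;
            }
            $this->sleep();
        }
    }

    // Assumed implementation of the nap: 1 ms, so we don't spin the CPU.
    private function sleep(): void
    {
        usleep(1000);
    }
}

$log = new ArrayObject();
$manager = new PseuThreadManager();
$manager->addThread(new StubThread('thread1', 3, $log)); // slower "request"
$manager->addThread(new StubThread('thread2', 1, $log)); // faster "request"
$manager->blockUntilAllThreadsFinished();
```

Thread 2 finishes first even though it was added second, just like response 2 in the parallel timeline earlier on.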

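You can probably guess that PseuThread->isBlocked() just ORs together its blockers (the thread is blocked if any one of them is still blocking), and that PseuThread->execute() checks whether the thread is blocked before doing anything, returning FALSE if it is.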
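Here's one way PseuThread might look. This is a minimal sketch under my own assumptions, not the library's actual code:

- Blockers implement a hypothetical Blocker interface with an isBlocking() method.
- A thread forgets its blockers once they've all stopped blocking.
- execute() runs the next MethodCall. A returned MethodCall means "continue later"; anything else is the result.

FlagBlocker is a toy blocker flipped by hand. The closure plays the part of SomeClass::someMethod() calling addBlockerToCurrentThread().

```php
<?php
// Hypothetical sketch: names follow the post where it gives them
// (PseuThread, MethodCall, isBlocked(), execute()); the rest are assumptions.
interface Blocker
{
    public function isBlocking(): bool;
}

class MethodCall
{
    private $callable;
    private $args;

    public function __construct(callable $callable, array $args = [])
    {
        $this->callable = $callable;
        $this->args = $args;
    }

    public function invoke() { return call_user_func_array($this->callable, $this->args); }
}

class PseuThread
{
    private $next;          // MethodCall to run next; null once finished
    private $blockers = [];
    private $result;

    public function __construct(MethodCall $entryPoint) { $this->next = $entryPoint; }

    public function addBlocker(Blocker $blocker): void { $this->blockers[] = $blocker; }

    // OR over the blockers: blocked while any one of them is still blocking.
    public function isBlocked(): bool
    {
        foreach ($this->blockers as $blocker) {
            if ($blocker->isBlocking()) {
                return true;
            }
        }
        return false;
    }

    // Returns FALSE without doing anything if blocked (or already finished).
    public function execute(): bool
    {
        if ($this->isBlocked() || $this->isFinished()) {
            return false;
        }
        $this->blockers = [];            // none are blocking any more
        $returned = $this->next->invoke();
        if ($returned instanceof MethodCall) {
            $this->next = $returned;     // more to do once new blockers clear
        } else {
            $this->next = null;          // plain value: the thread is finished
            $this->result = $returned;
        }
        return true;
    }

    public function isFinished(): bool { return $this->next === null; }

    public function getResult() { return $this->result; }
}

// Toy blocker flipped by hand, standing in for an in-flight HTTP request.
class FlagBlocker implements Blocker
{
    public $blocking = true;

    public function isBlocking(): bool { return $this->blocking; }
}

$flag = new FlagBlocker();
$partTwo = new MethodCall(function ($n) { return $n * 2; }, [21]);
$thread = new PseuThread(new MethodCall(function () use (&$thread, $flag, $partTwo) {
    $thread->addBlocker($flag);  // like addBlockerToCurrentThread()
    return $partTwo;             // continue here once $flag clears
}));

$ran1 = $thread->execute();      // runs part one, which blocks the thread
$ran2 = $thread->execute();      // still blocked: nothing runs
$flag->blocking = false;
$ran3 = $thread->execute();      // runs part two and finishes
```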

Further thoughts

Now, what if the blocker were added to the thread automatically, through a loosely coupled event system, whenever you created an HTTP request object? The threading control would then be transparent :o)
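Here's a minimal sketch of that idea, assuming a tiny static event bus. Everything in it is hypothetical: EventBus, the 'blocker.created' event, HttpRequest and CurrentThread. The request object only fires an event when it's constructed. A listener, standing in for the PseuThreadManager, then attaches the request to the current thread. No actual HTTP request is made.

```php
<?php
// Hypothetical minimal event bus: listeners keyed by event name.
class EventBus
{
    private static $listeners = [];

    public static function listen(string $event, callable $listener): void
    {
        self::$listeners[$event][] = $listener;
    }

    public static function fire(string $event, $payload): void
    {
        foreach (self::$listeners[$event] ?? [] as $listener) {
            $listener($payload);
        }
    }
}

// Hypothetical request class: no threading code, it just announces itself.
class HttpRequest
{
    public $url;

    public function __construct(string $url)
    {
        $this->url = $url;
        EventBus::fire('blocker.created', $this);
    }
}

// Stand-in for the manager's idea of the currently executing thread.
class CurrentThread
{
    public static $blockers = [];
}

// The manager subscribes once; from then on, blocking is automatic.
EventBus::listen('blocker.created', function ($blocker) {
    CurrentThread::$blockers[] = $blocker;
});

// "Business logic" that never mentions threads.
function fetchUser(int $id): HttpRequest
{
    return new HttpRequest("http://example.com/users/$id");
}

$request = fetchUser(7);
```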

Also, note that there is nothing stopping a thread from being a blocker!
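For instance, suppose a thread implements the same hypothetical Blocker interface, with isBlocking() meaning "not finished yet". Then one thread can wait on another. Here's a minimal sketch; StepThread is a hypothetical simplification that just counts down a number of steps rather than running MethodCalls.

```php
<?php
// Hypothetical interface, as in the earlier sketches.
interface Blocker
{
    public function isBlocking(): bool;
}

class StepThread implements Blocker
{
    private $stepsLeft;
    private $blockers = [];

    public function __construct(int $steps)
    {
        $this->stepsLeft = $steps;
    }

    public function waitFor(Blocker $blocker): void
    {
        $this->blockers[] = $blocker;
    }

    public function isBlocked(): bool
    {
        foreach ($this->blockers as $blocker) {
            if ($blocker->isBlocking()) {
                return true;
            }
        }
        return false;
    }

    // Does one step of work unless blocked or already finished.
    public function execute(): bool
    {
        if ($this->isBlocked() || $this->isFinished()) {
            return false;
        }
        $this->stepsLeft--;
        return true;
    }

    public function isFinished(): bool
    {
        return $this->stepsLeft <= 0;
    }

    // A thread blocks whoever waits on it for as long as it is unfinished.
    public function isBlocking(): bool
    {
        return !$this->isFinished();
    }
}

$child = new StepThread(2);
$parent = new StepThread(1);
$parent->waitFor($child);

// Each pass records [child ran?, parent ran?].
$trace = [];
while (!$parent->isFinished()) {
    $trace[] = [$child->execute(), $parent->execute()];
}
```

On the first pass the parent is still blocked by its unfinished child. On the second pass the child finishes and the parent runs straight after it.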

If there's enough interest I will try to release this as a library.

Diagram?

Feedback

All feedback welcome!

1 comment:

Advocate said...

Great write up. An implementation of this : http://www.jaisenmathai.com/blog/2008/05/29/asynchronous-parallel-http-requests-using-php-multi_curl/