Web applications often need to perform multiple tasks in a single request, such as sending emails to notify users about something. However, sending emails can be slow, especially if we have to send many of them one by one. This can make the web request take longer than necessary and hurt the user experience.
How can we avoid this problem? How can we delegate the email sending to another process that runs in the background, without blocking the main web request? This is where asynchronous programming comes in handy. Asynchronous programming is a technique that allows us to execute some code without waiting for its result, so we can continue with other tasks.
In this post, we will explore different ways to implement asynchronous programming in PHP, with or without using additional tools or services. We will see how this can help us scale our web applications and improve their performance.
Using exec()
exec() is a native PHP function that executes an external program and returns its result. In our case, that program is a script that sends emails. The function asks the operating system to spawn a completely new process (blank, with nothing copied or shared), and you can pass any state you need to it as arguments.
Let's take a look at an example.
<?php
// handle a web request
// record the start time of the web request
$start = microtime(true);
$path = __DIR__ . '/send_email.php';
// redirect output to /dev/null and background with & so we don't wait for the result
$command = 'php ' . $path . ' --email=%s > /dev/null &';
$emails = ['joe@blogs.com', 'jack@test.com'];
// for each of the emails, call exec to start a new script
foreach ($emails as $email) {
// Execute the command
exec(sprintf($command, $email));
}
// record the finish time of the web request
$finish = microtime(true);
$duration = round($finish - $start, 4);
// output duration of web request
echo "finished web request in $duration\n";
send_email.php
<?php
$email = explode('--email=', $argv[1])[1];
// this blocking sleep won't affect the web request duration
// (illustrative purposes only)
sleep(5);
// here we can send the email
echo "sending email to $email\n";
Output
$ php src/exec.php
finished web request in 0.0184
The above scripts show that the web request still finishes in milliseconds, even though there is a blocking sleep call in the send_email.php script. The reason it doesn't block is the inclusion of > /dev/null & in the command: it tells the shell to discard the output and run the command in the background, so exec returns immediately instead of waiting for the result, and the web request can continue. In this way, the web request script is simply responsible for launching the script, not for monitoring its execution and/or failure. This is an inherent downside of this solution, as monitoring falls to the spawned process itself and it cannot be restarted if it fails. However, this is an easy way to get asynchronous behaviour into a PHP application without much effort.
exec runs a command on the server, so you have to be careful about how the script is executed, particularly if it involves user input. It can also be hard to manage as you scale the application: the script is likely running on the exact same box that is processing external web requests, so you could end up exhausting CPU and memory if many hundreds or thousands of new processes are spawned via exec.
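To illustrate the user-input concern, here is a hypothetical hardening sketch (not part of the original example): validate the value first, then escape it with escapeshellarg() before it goes anywhere near a shell command.

```php
<?php
// Hypothetical hardening sketch: never interpolate raw user input
// into a shell command.
$email = $_GET['email'] ?? 'joe@blogs.com'; // untrusted input

// Reject anything that isn't a plausible email address
if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
    exit('invalid email');
}

// escapeshellarg() single-quotes the value and escapes embedded quotes,
// so shell metacharacters are passed through as literal text
$arg = escapeshellarg('--email=' . $email);
$command = 'php ' . __DIR__ . '/send_email.php ' . $arg . ' > /dev/null &';
// exec($command); // safe to run now that the argument is escaped
echo $command . "\n";
```

Validation plus escaping is belt-and-braces: filter_var() rejects obviously malicious values outright, and escapeshellarg() guarantees that whatever remains is treated as a single literal argument.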
pcntl_fork
pcntl_fork is a low-level function which requires the PCNTL extension to be enabled. It is a powerful, yet potentially error-prone, method for writing asynchronous code in PHP.
pcntl_fork will fork (clone) the current process, splitting it into a parent and a number of child processes (depending on how many times it is called). By inspecting the return value, which is the child's process ID (PID) in the parent and 0 in the child, we can run different code in the context of the parent process or a child process.
The parent process is responsible for spawning the child processes and waiting until they have completed before it can finish itself.
In this case, we can have more control over how the processes exit and can easily write some logic to handle retries in case of failure in the child process.
Now, on to the example code for our use case to send emails in a non-blocking way.
<?php
function sendEmail($to, $subject, $message)
{
// Code to send email (replace with your email sending logic)
// This is just a mock implementation for demonstration purposes
sleep(3); // Simulating sending email by sleeping for 3 seconds
echo "Email sent to: $to\n";
}
$emails = [
[
'to' => 'john@example.com',
'subject' => 'Hello John',
'message' => 'This is a test email for John.',
],
[
'to' => 'jane@example.com',
'subject' => 'Hello Jane',
'message' => 'This is a test email for Jane.',
],
// Add more email entries as needed
];
$children = [];
foreach ($emails as $email) {
$pid = pcntl_fork();
if ($pid == -1) {
// Fork failed
die('Error: Unable to fork process.');
} elseif ($pid == 0) {
// Child process
sendEmail($email['to'], $email['subject'], $email['message']);
exit(); // Exit the child process
} else {
// Parent process
$children[] = $pid;
}
}
echo "running some other things in parent process\n";
sleep(3);
// Parent process waits for each child process to finish
foreach ($children as $pid) {
pcntl_waitpid($pid, $status);
$status = pcntl_wexitstatus($status);
echo "Child process $pid exited with status: $status\n";
}
echo 'All emails sent.';
In the above example using pcntl_fork, we fork the current process, which copies the parent process into new child processes, and wait for their execution to complete. Additionally, after forking the child processes to send emails, the parent process can continue doing other things before ultimately ensuring the child processes have finished.
This is a step up from using exec, where we were quite limited in what was possible because the scripts run in completely separate contexts, so overall monitoring is not possible.
We also gain process isolation as each child process runs in a separate memory space and does not affect other processes.
By tracking the process IDs we can effectively monitor and manage execution flow.
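The retry logic mentioned earlier can be sketched as a small supervising helper (an assumed helper name, not from the post): the parent re-forks the child whenever it exits with a non-zero status, up to a retry limit.

```php
<?php
// Sketch of restarting a failed child process: re-fork when the child's
// exit status is non-zero, up to $maxAttempts attempts.
function runInChildWithRetries(callable $task, int $maxAttempts = 3): bool
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            return false; // fork failed
        }
        if ($pid === 0) {
            // Child: run the task and report success/failure via exit code
            exit($task() ? 0 : 1);
        }
        // Parent: wait for this child and inspect its exit status
        pcntl_waitpid($pid, $status);
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            return true;
        }
        // Non-zero exit: loop around and fork a fresh attempt
    }
    return false;
}
```

Note this helper waits on each attempt, so it belongs in a supervising process (such as the backgrounded script from the combined approach below), not directly in the web request.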
A downside of forking directly from the web request (the parent process) is that, by waiting for the child processes to finish, we gain nothing in terms of the response time of the original request.
Fortunately, there is a solution to this: combine both exec and pcntl_fork to get the best of both worlds, which looks like this:
- Web request uses exec() to spawn a new PHP process
- The spawned process is passed a list of emails as a batch
- The spawned process becomes the parent as it forks to send each email individually
This can all happen in the background, rather than blocking the original request.
Let's take a look at making this work:
<?php
$start = microtime(true);
$path = __DIR__ . '/pcntl_fork_send_email.php';
$emails = implode(',', ['joe@blogs.com', 'jack@test.com']);
$command = 'php ' . $path . ' --emails=%s > /dev/null &';
// Execute the command
echo "running exec\n";
exec(sprintf($command, $emails));
$finish = microtime(true);
$duration = round($finish - $start, 4);
echo "finished web request in $duration\n";
pcntl_fork_send_email.php
<?php
$param = explode('--emails=', $argv[1])[1];
$emails = explode(',', $param);
function sendEmail($to)
{
sleep(3); // Simulating sending email by sleeping for 3 seconds
echo "Email sent to: $to\n";
}
$children = [];
foreach ($emails as $email) {
$pid = pcntl_fork();
if ($pid == -1) {
// Fork failed
die('Error: Unable to fork process.');
} elseif ($pid == 0) {
// Child process
sendEmail($email);
exit(); // Exit the child process
} else {
// Parent process
$children[] = $pid;
}
}
echo "running some other things in parent process\n";
sleep(3);
// Parent process waits for each child process to finish
foreach ($children as $pid) {
pcntl_waitpid($pid, $status);
$status = pcntl_wexitstatus($status);
echo "Child process $pid exited with status: $status\n";
}
echo "All emails sent.\n";
The beauty of this solution, albeit more complicated, is that you can set up a separate process altogether whose responsibility is to run and monitor forked processes for the purpose of doing work asynchronously.
AMPHP
amphp (Asynchronous Multi-tasking PHP) is a collection of libraries that allow you to build fast, concurrent applications with PHP.
The release of PHP 8.1 in November 2021 shipped support for Fibers which implement a lightweight cooperative concurrency model.
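To make Fibers concrete before we look at amphp, here is a minimal illustration (my own, not from the amphp docs) using the native Fiber API directly: a fiber can suspend itself mid-execution and later be resumed with a value.

```php
<?php
// Minimal illustration of PHP 8.1's native Fiber API (not amphp-specific)
$fiber = new Fiber(function (): string {
    // Pause here, handing the string 'paused' back to the caller
    $input = Fiber::suspend('paused');
    return "resumed with $input";
});

$state = $fiber->start();        // runs until Fiber::suspend()
echo $state . "\n";              // prints "paused"

$fiber->resume('hello');         // continues after the suspend point
echo $fiber->getReturn() . "\n"; // prints "resumed with hello"
```

Libraries like amphp wrap this suspend/resume mechanism in an event loop so that thousands of fibers can cooperatively share one process.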
Now that we know a little bit about how amphp works and why it's exciting for the future of PHP programs, let's take a look at an example:
<?php
require __DIR__ . '/../vendor/autoload.php'; // Include the autoload file for the amphp/amp library
use function Amp\async;
use function Amp\delay;
use function Amp\Future\await;
function sendEmail($to, $subject, $message)
{
    // delay() suspends the current fiber for 3 seconds, simulating a
    // slow email send without blocking the other fibers
    delay(3);
    echo "Email sent to: $to\n";
}
$emails = [
    [
        'to' => 'john@example.com',
        'subject' => 'Hello John',
        'message' => 'This is a test email for John.',
    ],
    [
        'to' => 'jane@example.com',
        'subject' => 'Hello Jane',
        'message' => 'This is a test email for Jane.',
    ],
    // Add more email entries as needed
];
$futures = [];
foreach ($emails as $email) {
    // Each async() call starts a new fiber and returns a Future
    $futures[] = async(static function () use ($email) {
        sendEmail($email['to'], $email['subject'], $email['message']);
    });
}
// Wait for every Future to resolve before declaring success
await($futures);
echo "All emails sent.\n";
The above script is a very simple example of running things asynchronously: async() creates a new fiber from the given closure and returns a Future object that can later be awaited.
This is a much simpler version than rolling your own and does the heavy lifting for you, which is key for building an application as you don't need to worry about how the work is queued internally - you just know it happens asynchronously.
Queues and Workers
A solution to this problem also exists outside of PHP, and prior to PHP 8.1 it could be considered the gold standard, because it is language-independent and highly scalable.
The use of queues such as Amazon SQS, RabbitMQ or Apache Kafka has been a widely accepted solution for some time.
Queues are pieces of infrastructure that let you run workers independent of your application to process work asynchronously. This is not without risk or downside either, but it has been tried and tested over time.
Let's get into an example:
The sender, in this example, is typically your existing web application.
sender.php
<?php
require 'vendor/autoload.php';
use Aws\Sqs\SqsClient;
// Initialize the SQS client
$client = new SqsClient([
'region' => 'us-east-1',
'version' => 'latest',
'credentials' => [
'key' => 'YOUR_AWS_ACCESS_KEY',
'secret' => 'YOUR_AWS_SECRET_ACCESS_KEY',
],
]);
// Define the message details
$message = [
'to' => 'john@example.com',
'subject' => 'Hello John',
'message' => 'This is a test email for John.',
];
// Send the message to SQS
$result = $client->sendMessage([
'QueueUrl' => 'YOUR_SQS_QUEUE_URL',
'MessageBody' => json_encode($message),
]);
echo "Message sent to SQS with MessageId: " . $result['MessageId'] . "\n";
Workers are an additional deployment of running code to process jobs.
worker.php
<?php
require 'vendor/autoload.php';
use Aws\Sqs\SqsClient;
// Initialize the SQS client
$client = new SqsClient([
'region' => 'us-east-1',
'version' => 'latest',
'credentials' => [
'key' => 'YOUR_AWS_ACCESS_KEY',
'secret' => 'YOUR_AWS_SECRET_ACCESS_KEY',
],
]);
// Receive and process messages from SQS
while (true) {
$result = $client->receiveMessage([
'QueueUrl' => 'YOUR_SQS_QUEUE_URL',
'MaxNumberOfMessages' => 1,
'WaitTimeSeconds' => 20,
]);
if (!empty($result['Messages'])) {
foreach ($result['Messages'] as $message) {
$body = json_decode($message['Body'], true);
// Process the message (send email in this case)
sendEmail($body['to'], $body['subject'], $body['message']);
// Delete the message from SQS
$client->deleteMessage([
'QueueUrl' => 'YOUR_SQS_QUEUE_URL',
'ReceiptHandle' => $message['ReceiptHandle'],
]);
}
}
}
function sendEmail($to, $subject, $message)
{
sleep(3); // Simulating sending email by sleeping for 3 seconds
echo "Email sent to: $to\n";
}
This solution consists of two parts:
- Sender (pushes a message to an SQS queue)
- Worker (receives a message from a queue and sends an email)
It can be scaled by increasing the number of workers relative to the number of messages sent by any number of senders.
By using a queue, the worker is completely independent from the sender and can be written in any language as the communication between sender and worker is through JSON messages.
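One practical refinement of the worker loop is retrying transient failures before giving up. This is a sketch with a hypothetical helper (withRetries is my name, not part of the SQS example): wrap sendEmail() in it, and only call deleteMessage() once it succeeds, so a permanently failing message stays on the queue (or goes to a dead-letter queue).

```php
<?php
// Hypothetical helper: retry a flaky job with exponential backoff.
// On final failure the exception propagates, so the caller can leave
// the message visible on the queue instead of deleting it.
function withRetries(callable $job, int $maxAttempts = 3, int $baseDelay = 1): mixed
{
    for ($attempt = 1; ; $attempt++) {
        try {
            return $job();
        } catch (Throwable $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // exhausted: let the caller dead-letter the message
            }
            sleep($baseDelay * (2 ** ($attempt - 1))); // 1s, 2s, 4s, ...
        }
    }
}
```

In the worker above, the processing line would become withRetries(fn () => sendEmail($body['to'], $body['subject'], $body['message'])) followed by the deleteMessage() call.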
Which solution is best?
It's almost impossible to say which of the solutions we've explored above would be best for your application: although they all aim to solve the problem of running asynchronous code in PHP, the implementations are quite different, each with its own benefits and drawbacks.
To summarise each option in a few points:
exec()
- Perhaps the simplest and most effective way to run PHP scripts async
- Fraught with potential security implications particularly around user input
- Nothing being shared can be both a blessing and a curse
- May increase load on existing server resources (CPU/memory)
pcntl_fork()
- Allows management of parent/child processes to customise behaviour
- Can be abstracted away in a simpler API for your application
- Cloning the current process may cause other downstream issues
AMPHP
- Requires PHP 8.1 for the use of Fibers
- Library has abstracted away the "hard parts" of running async code
- Steeper learning curve than more traditional methods (understanding the event loop and multi-tasking in PHP)
Queues and Workers
- Language independent, flexible for any use case
- Introduces a distributed system (can be a good or bad thing in the long run)
- Many off-the-shelf solutions and queue providers make it easy to adopt
Conclusion
The main reason I wanted to dive a bit deeper into all the different possibilities of async code in PHP is to understand how (if at all) the introduction of Fibers in PHP 8.1 changes how we can write async programs in the future.
There are many solutions available without requiring PHP 8.1 that have been battle tested, but it's interesting to see the direction the PHP language is going in to compete with the likes of Golang and Elixir, both of which support async programming and have done for years.
Ultimately, I would probably still reach for a Queue/Worker approach given the scalability and cross-platform/cross-language support. However, I think that over time we might see libraries such as AMPHP become more feature-rich and make this problem easier to solve without introducing new infrastructure.
To see the code samples used in this blog post, you can find them on GitHub.