PHP curl – controlling multi concurrency

Right, so I moved from processing 50+ curl handles with the simple interface to using the curl_multi variant, and suddenly I bumped into a few problems. Firstly, my downloads were incomplete by the time the code stopped looping (and that was with the reference example code!). Secondly, the web server took some strain trying to serve 50+ severely CPU-intensive queries at the same time. An individual download can take up to a few minutes depending on various things, which is why I need concurrency: the server has multiple cores, so I may just as well utilize them.

Right, so as per usual you set up your normal curl handles using something like:

// $fname contains the output file name (I just need the content in files)
$fp = fopen($fname, "w") or die("Error opening file ".$fname." for writing.");

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);

$curl_handles[] = array($ch, $desc);
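
With 50+ downloads it helps to build the pool in a loop. The following is a minimal sketch, assuming a hypothetical $jobs list of URL, output file and description triples and a hypothetical helper make_download_handle(); neither name is part of the code above, and the URLs are placeholders. Nothing is transferred at this point, the handles are only prepared.

```php
<?php
// Hypothetical helper (not part of the original code): opens the output
// file and returns one array($ch, $desc) pair for the pool.
function make_download_handle($url, $fname, $desc)
{
    $fp = fopen($fname, "w");
    if ($fp === false) {
        throw new RuntimeException("Error opening file " . $fname . " for writing.");
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);

    return array($ch, $desc);
}

// Placeholder jobs: URL, output file name, description.
$tmp = sys_get_temp_dir();
$jobs = array(
    array("http://example.com/report-a", $tmp . "/report-a.out", "report A"),
    array("http://example.com/report-b", $tmp . "/report-b.out", "report B"),
);

$curl_handles = array();
foreach ($jobs as $job) {
    list($url, $fname, $desc) = $job;
    $curl_handles[] = make_download_handle($url, $fname, $desc);
}
```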

Note that I drop the actual handles into an array called $curl_handles, and each element is itself an array with 2 elements: the handle and a description (my code currently drops just over 50 in here, and that number is climbing quite a bit). So here is the code I cooked up today, posted here because I could find no alternative samples. It basically processes all the handles in $curl_handles at a concurrency of 4 (as soon as one finishes it starts the next). Note that the reason I don’t use an associative array above is so that I can use array_shift in the code here:

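$handle_descs = array();
$cmh = curl_multi_init();

$running = 0;
do {
    // Top up the multi handle from the pool until 4 transfers are active.
    while ($running < 4 && $curl_handles) {
        list($ch, $d) = array_shift($curl_handles);
        curl_multi_add_handle($cmh, $ch);
        // On PHP 8+ curl handles are objects and cannot be array keys, so key
        // by object id (on older PHP, where they are resources, use (int) $ch).
        $handle_descs[spl_object_id($ch)] = $d;
        fprintf(STDERR, "Starting transfer: %s\n", $d);
        ++$running;
    }

    // Block until there is activity. Some PHP/libcurl combinations return -1
    // straight away, so nap briefly instead of spinning, then always call exec.
    if (curl_multi_select($cmh) == -1)
        usleep(100000);
    while (curl_multi_exec($cmh, $running) == CURLM_CALL_MULTI_PERFORM);

    do {
        $r = curl_multi_info_read($cmh, $remmsg);
        if ($r) {
            if ($r['msg'] == CURLMSG_DONE) {
                $d = $handle_descs[spl_object_id($r['handle'])];
                if ($r['result'] != CURLE_OK) {
                    fprintf(STDERR, "Error transferring: %s\n", $d);
                    exit(1);
                } else
                    fprintf(STDERR, "Completed transfer: %s\n", $d);
                curl_multi_remove_handle($cmh, $r['handle']);
            } else {
                fprintf(STDERR, "Don't know how to handle curl message %d\n", $r['msg']);
                exit(1);
            }
        }
    } while ($remmsg > 0);
// Keep going while transfers are active or handles are still waiting in the pool.
} while ($running > 0 || $curl_handles);

curl_multi_close($cmh);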

Firstly, we set $running to zero so that the code that adds handles from the pool will initially add 4 handles. You can raise or lower the concurrency by changing the 4 in the condition of the inner while loop ($running < 4).
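
The top-up logic can be tried in isolation, without any network traffic. This is a minimal sketch, assuming a hypothetical function take_next() and plain strings standing in for the array($ch, $desc) pairs:

```php
<?php
// Hypothetical helper: takes entries off the front of the pool until
// $running reaches $limit or the pool is empty, and returns what it took.
function take_next(array &$pool, &$running, $limit)
{
    $started = array();
    while ($running < $limit && $pool) {
        $started[] = array_shift($pool);
        ++$running;
    }
    return $started;
}

$pool = array("a", "b", "c", "d", "e", "f");
$running = 0;

$first = take_next($pool, $running, 4);   // nothing running yet
$running = 3;                             // pretend one transfer finished
$second = take_next($pool, $running, 4);  // one free slot
```

With $running at zero the first call takes four entries. Once one transfer has finished, the next call takes exactly one more, in the original order, which is the reason for using array_shift.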

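I use curl_multi_select to block until there is activity on one of the transfers instead of busy-looping (munching a LOT of CPU for no good reason). After that I call curl_multi_exec until it’s happy, that is, until it stops returning CURLM_CALL_MULTI_PERFORM. This is the select/exec pair directly after the loop that adds handles.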
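
The wait step can also be factored into a small function. This is a sketch, assuming a hypothetical name wait_and_perform() and an arbitrary 100 ms nap. On some PHP and libcurl combinations curl_multi_select returns -1 immediately, and the nap keeps that case from turning into a busy loop.

```php
<?php
// Hypothetical helper: waits for activity on the multi handle, then lets
// curl do its work. $running receives the number of active transfers.
function wait_and_perform($cmh, &$running, $timeout = 1.0)
{
    if (curl_multi_select($cmh, $timeout) == -1) {
        // select could not wait on anything; sleep briefly instead of spinning.
        usleep(100000);
    }

    do {
        $status = curl_multi_exec($cmh, $running);
    } while ($status == CURLM_CALL_MULTI_PERFORM);

    return $status;
}

// Usage on an empty multi handle: nothing to transfer, so nothing is running.
$cmh = curl_multi_init();
$running = -1;
$status = wait_and_perform($cmh, $running, 0.1);
curl_multi_close($cmh);
```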

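The curl_multi_info_read loop just removes completed handles from the multi handle and outputs some status information. Note that it does not close the handles or the output files; they are simply left open until the script ends.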
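
If exiting on the first failure is too drastic for your use, the same loop can collect results instead. This is a minimal sketch, assuming a hypothetical function collect_finished(); curl_strerror() (PHP 5.5+) turns the result code into readable text.

```php
<?php
// Hypothetical helper: drains the message queue of a multi handle and
// returns one array('handle' => ..., 'ok' => bool, 'error' => string) per
// finished transfer, removing each finished handle from the multi handle.
function collect_finished($cmh)
{
    $finished = array();
    while (($r = curl_multi_info_read($cmh)) !== false) {
        if ($r['msg'] != CURLMSG_DONE) {
            continue; // CURLMSG_DONE is the only message handled here
        }
        $ok = ($r['result'] == CURLE_OK);
        $finished[] = array(
            'handle' => $r['handle'],
            'ok'     => $ok,
            'error'  => $ok ? '' : curl_strerror($r['result']),
        );
        curl_multi_remove_handle($cmh, $r['handle']);
    }
    return $finished;
}

// Usage on an empty multi handle: there is nothing to report.
$cmh = curl_multi_init();
$finished = collect_finished($cmh);
curl_multi_close($cmh);
```

The caller can then decide per transfer whether to log, retry or give up.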

Please note that status info goes to STDERR; you can easily change this.
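
One way to make the destination configurable is to route all status lines through a single function. This is a sketch, assuming a hypothetical log_status() helper and $log_stream variable:

```php
<?php
// Hypothetical helper: writes one status line to whatever stream is passed in.
function log_status($stream, $format, $desc)
{
    fprintf($stream, $format . "\n", $desc);
}

// Any writable stream works: STDERR, STDOUT, or a log file opened with fopen().
// php://memory is used here so the example leaves nothing behind.
$log_stream = fopen("php://memory", "w+");
log_status($log_stream, "Starting transfer: %s", "report A");
log_status($log_stream, "Completed transfer: %s", "report A");

rewind($log_stream);
$logged = stream_get_contents($log_stream);
fclose($log_stream);
```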
