PHP curl – controlling multi concurrency

Right, so I moved from downloading 50+ URLs with the simple curl interface to the curl_multi variant, and promptly bumped into a few problems. Firstly, my downloads were incomplete by the time the code stopped looping (and this was the reference example code!). Secondly, the web server took some strain trying to serve 50+ severely CPU-intensive queries at the same time. An individual download can take up to a few minutes depending on various things, which is why I want concurrency in the first place: the server has multiple cores, so I may as well use them, just not 50+ at once.

Right, so as per usual you set up your normal curl handles using something like:

// $fname contains the output file name (I just need the content in files)
$fp = fopen($fname, "w") or die("Error opening file ".$fname." for writing.");
 
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
 
$curl_handles[] = array($ch, $desc);
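With 50+ downloads it may be handy to wrap that setup in a small function. The following is only a sketch: queue_download() and the $files array are names I made up for illustration, and the URL in the usage comment is a placeholder. It builds the same array($ch, $desc) entries, and additionally remembers the file pointers so they can be closed once the transfers are done.

```php
<?php
// Hypothetical helper (not part of the original code): queue one download.
// $curl_handles receives array($ch, $desc) entries, the same shape as above.
// $files remembers the output file pointers so they can be fclose()d later.
function queue_download(array &$curl_handles, array &$files, $url, $fname, $desc)
{
    $fp = fopen($fname, "w");
    if ($fp === false) {
        return false; // let the caller decide how to report the failure
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FILE, $fp);   // write the response body to the file
    curl_setopt($ch, CURLOPT_HEADER, 0);   // do not include response headers

    $curl_handles[] = array($ch, $desc);
    $files[] = $fp;
    return true;
}

// Usage with a placeholder URL and file name:
// $curl_handles = array(); $files = array();
// queue_download($curl_handles, $files, "http://example.com/report?id=1", "/tmp/report1.txt", "report 1");
```

Returning false instead of calling die() is a design choice of this sketch, so that one bad file name does not have to kill the whole batch.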

Note that I drop the handles into an array called $curl_handles, where each element is itself an array with two elements: the handle and a description (my code currently drops just over 50 in here, and that number is climbing quite a bit). So here is the code I cooked up today, posted here because I could find no alternative samples. It processes all the handles in $curl_handles at a concurrency of 4 (as soon as one finishes, it starts the next). The reason I don't use an associative array above is that array_shift returns only the value and drops the key, so the description has to travel inside each element for me to use array_shift in the code here:

$handle_descs = array();
$cmh = curl_multi_init();

$running = 0;
do {
    while ($running < 4 && $curl_handles) {
        list($ch, $d) = array_shift($curl_handles);
        curl_multi_add_handle($cmh, $ch);
        $handle_descs[(int) $ch] = $d; // key by handle id; the handle itself is not a valid array key
        fprintf(STDERR, "Starting transfer: %s\n", $d);
        ++$running;
    }

    if (curl_multi_select($cmh) == -1) usleep(1000); // -1: nothing to wait on yet, so nap briefly instead of spinning
    while (curl_multi_exec($cmh, $running) == CURLM_CALL_MULTI_PERFORM);

    do {
        $r = curl_multi_info_read($cmh, $remmsg);
        if ($r) {
            if ($r['msg'] == CURLMSG_DONE) {
                if ($r['result'] != CURLE_OK) {
                    fprintf(STDERR, "Error transferring: %s\n", $handle_descs[(int) $r['handle']]);
                    exit(1);
                } else {
                    fprintf(STDERR, "Completed transfer: %s\n", $handle_descs[(int) $r['handle']]);
                }
                curl_multi_remove_handle($cmh, $r['handle']);
            } else {
                fprintf(STDERR, "Don't know how to handle curl message %d\n", $r['msg']);
                exit(1);
            }
        }
    } while ($remmsg > 0);
} while ($running > 0 || $curl_handles); // also continue while the queue still has handles to start

curl_multi_close($cmh);

Firstly we set $running to zero, so the loop that adds handles from the pool will initially add 4 of them. You can up the concurrency by changing the 4 in the inner while condition (line 6 of the listing).

I use curl_multi_select to block until there is activity on one of the transfers instead of busy-looping (which munches a LOT of CPU for no good reason). After that I call curl_multi_exec for as long as it keeps returning CURLM_CALL_MULTI_PERFORM. This is on lines 14 and 15 of the listing.
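One thing to watch out for: curl_multi_select returns -1 on failure, and on some PHP/libcurl combinations it does so immediately rather than waiting. If curl_multi_exec only runs when the select succeeded, the loop can then spin without making progress. A minimal sketch of a more defensive step follows. multi_step() is a hypothetical helper name, and the 1 ms nap is an arbitrary choice of mine.

```php
<?php
// Hypothetical helper: drive the multi handle once, then wait for activity.
// $running is updated by curl_multi_exec with the number of active transfers.
function multi_step($cmh, &$running, $timeout = 1.0)
{
    // Let curl do whatever work is currently possible.
    do {
        $status = curl_multi_exec($cmh, $running);
    } while ($status == CURLM_CALL_MULTI_PERFORM);

    // Only wait if something is still in flight. If select reports -1,
    // sleep briefly so the caller's loop does not burn CPU.
    if ($running > 0 && curl_multi_select($cmh, $timeout) == -1) {
        usleep(1000);
    }

    return $status;
}
```

Calling curl_multi_exec before curl_multi_select is the reverse of the order in my listing. The idea is that curl gets a chance to open its sockets before we try to wait on them.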

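The curl_multi_info_read loop removes completed handles from the multi handle and outputs some status information. Note that the code as shown does not call curl_close on the handles or fclose on the output files, so do that afterwards if you need the files to be complete on disk before the script carries on.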

Please note that status info goes to STDERR; you can easily change this.
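If you would rather not print to STDERR or exit on the first error, the same loop can be turned into a function that takes the concurrency as a parameter and returns the curl result code per description. This is a sketch under my own assumptions, not a drop-in replacement: run_transfers() and its parameter names are made up, and it assumes the descriptions are unique because they are used as array keys.

```php
<?php
// Hypothetical wrapper around the loop above.
// $curl_handles: list of array($ch, $desc) entries; $limit: max concurrent transfers.
// Returns array($desc => curl result code), where CURLE_OK (0) means success.
function run_transfers(array $curl_handles, $limit = 4)
{
    $descs = array();
    $results = array();
    $cmh = curl_multi_init();
    $running = 0;

    do {
        // Top up until $limit transfers are in flight or the queue is empty.
        while ($running < $limit && $curl_handles) {
            list($ch, $d) = array_shift($curl_handles);
            curl_multi_add_handle($cmh, $ch);
            $descs[(int) $ch] = $d; // key by handle id
            ++$running;
        }

        // Do the work that is possible now, then wait for more activity.
        while (curl_multi_exec($cmh, $running) == CURLM_CALL_MULTI_PERFORM);
        if ($running > 0 && curl_multi_select($cmh) == -1) {
            usleep(1000);
        }

        // Collect finished transfers and free their slots.
        while ($r = curl_multi_info_read($cmh)) {
            if ($r['msg'] != CURLMSG_DONE) {
                continue;
            }
            $results[$descs[(int) $r['handle']]] = $r['result'];
            curl_multi_remove_handle($cmh, $r['handle']);
        }
    } while ($running > 0 || $curl_handles);

    curl_multi_close($cmh);
    return $results;
}
```

The caller can then decide what to do with failures, for example by logging every description whose result is not CURLE_OK. The output files still need an fclose after this returns.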
