Right, so I moved from downloading 50+ curl handles using the simple interface to using the curl_multi variant, and promptly bumped into a few problems. Firstly, my downloads were incomplete by the time the code stopped looping (and this was with the reference example code!). Secondly, the web server took some strain trying to serve 50+ severely CPU-intensive queries at the same time. An individual download can take up to a few minutes depending on various things, which is why I need concurrency: the server has multiple cores, so I may as well use them.
Right, so as per usual you set up your normal curl handles using something like:
// $fname contains the output file name (I just need the content in files)
$fp = fopen($fname, "w") or die("Error opening file ".$fname." for writing.");
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
$curl_handles[] = array($ch, $desc);
Note that I drop the actual handles into an array called $curl_handles, and each element in itself is an array with 2 elements, the handle and a description (my code currently drops just over 50 in here, and climbing quite a bit). So here is the code I cooked up today, posted here because I could find no alternative samples. It basically processes all the handles in $curl_handles at a concurrency of 4 (as soon as one finishes it starts the next). Note that the reason I don’t use an associative array above is so that I can use array_shift in the code here:
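$handle_descs = array();
$cmh = curl_multi_init();
$running = 0;
do {
    // Top up the multi handle until 4 transfers are in flight or the pool is empty.
    while ($running < 4 && $curl_handles) {
        list($ch, $d) = array_shift($curl_handles);
        curl_multi_add_handle($cmh, $ch);
        // (int) gives the resource id of a handle on PHP < 8; on PHP 8+,
        // where handles are objects, use spl_object_id($ch) instead.
        $handle_descs[(int)$ch] = $d;
        fprintf(STDERR, "Starting transfer: %s\n", $d);
        ++$running;
    }
    // Block until there is activity. Some builds return -1 immediately, so
    // back off briefly rather than skipping curl_multi_exec (which could loop forever).
    if (curl_multi_select($cmh) == -1)
        usleep(100000);
    while (curl_multi_exec($cmh, $running) == CURLM_CALL_MULTI_PERFORM);
    // Handle every message that is ready to be read.
    do {
        $r = curl_multi_info_read($cmh, $remmsg);
        if ($r) {
            if ($r['msg'] == CURLMSG_DONE) {
                if ($r['result'] != CURLE_OK) {
                    fprintf(STDERR, "Error transferring: %s\n", $handle_descs[(int)$r['handle']]);
                    exit(1);
                } else
                    fprintf(STDERR, "Completed transfer: %s\n", $handle_descs[(int)$r['handle']]);
                curl_multi_remove_handle($cmh, $r['handle']);
            } else {
                fprintf(STDERR, "Don't know how to handle curl message %d\n", $r['msg']);
                exit(1);
            }
        }
    } while ($remmsg > 0);
    // Keep going while transfers are active or handles are still waiting in the pool.
} while ($running > 0 || $curl_handles);
curl_multi_close($cmh);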
Firstly we set $running to zero, so the code that adds handles from the pool will initially add 4 handles. (You can raise the concurrency by changing the 4 in the while ($running < 4 && $curl_handles) condition.)
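If you would rather not hard-code the 4, the refill step can be pulled out into a small function. This is only a sketch: the name fill_slots and its $max parameter are made up for illustration, and it assumes the pool entries are array($ch, $desc) pairs as in the setup code.

```php
<?php
// Hypothetical helper: moves handles from the pool into the multi handle
// until $max transfers are in flight or the pool is empty.
// Each pool entry is assumed to be array($ch, $desc).
function fill_slots($cmh, array &$pool, &$running, $max = 4)
{
    $started = array();
    while ($running < $max && $pool) {
        list($ch, $desc) = array_shift($pool);
        curl_multi_add_handle($cmh, $ch);
        $started[] = $desc;
        ++$running;
    }
    // Descriptions of the transfers that were just started.
    return $started;
}
```

Calling fill_slots($cmh, $curl_handles, $running, 8) at the top of the outer loop would then run up to 8 transfers at once.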
I use curl_multi_select to block until there is activity on one of the transfers instead of busy-looping (munching a LOT of CPU for no good reason), after which I call curl_multi_exec until it’s happy. These are the curl_multi_select and curl_multi_exec lines right after the handle-adding loop.
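One caveat with this step: on some PHP and libcurl combinations curl_multi_select returns -1 straight away when there is nothing to wait on yet, so a loop that skips curl_multi_exec in that case can spin or stall. Below is a minimal sketch of a more defensive version. The helper name wait_and_exec and the 100 ms back-off are my own assumptions, not part of the curl API.

```php
<?php
// Hypothetical helper (name and back-off value are illustrative).
// Waits for activity on the multi handle, then drives the transfers.
function wait_and_exec($cmh, &$running, $timeout = 1.0)
{
    // curl_multi_select returns -1 on failure or, on some builds, when
    // there is nothing to wait on yet; back off briefly in that case
    // instead of busy-looping.
    if (curl_multi_select($cmh, $timeout) == -1) {
        usleep(100000); // 100 ms
    }
    // Always call curl_multi_exec, even after a -1, so transfers progress.
    do {
        $status = curl_multi_exec($cmh, $running);
    } while ($status == CURLM_CALL_MULTI_PERFORM);
    return $status;
}
```

The return value is the last curl_multi_exec status, so the caller can compare it against CURLM_OK if it wants to bail out on multi-level errors.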
The curl_multi_info_read loop removes completed handles from the multi handle and outputs some status information.
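If calling exit(1) on the first failed transfer is too drastic for your use, the same loop can collect results and leave the decision to the caller. This is a rough sketch under that assumption. The function name drain_finished and the shape of the returned array are invented for illustration, and unlike the code above it silently ignores any message that is not CURLMSG_DONE.

```php
<?php
// Hypothetical helper: reads every pending message from the multi handle,
// detaches each finished easy handle, and reports how it ended.
function drain_finished($cmh)
{
    $finished = array();
    do {
        $remaining = 0;
        $msg = curl_multi_info_read($cmh, $remaining);
        if ($msg && $msg['msg'] == CURLMSG_DONE) {
            $finished[] = array(
                'handle' => $msg['handle'],
                'ok'     => $msg['result'] == CURLE_OK,
                'error'  => curl_strerror($msg['result']),
            );
            curl_multi_remove_handle($cmh, $msg['handle']);
        }
    } while ($msg && $remaining > 0);
    return $finished;
}
```

Each returned entry carries the handle, so the caller can look up its description and log or retry as it sees fit.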
Please note that status info goes to STDERR; you can easily change this.
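One more portability note, on the $handle_descs lookup used in the loop. Since PHP 8.0 curl_init returns a CurlHandle object rather than a resource, and objects cannot be used directly as array keys. A small sketch of a key function that copes with both follows; the name handle_key is hypothetical.

```php
<?php
// Hypothetical helper: returns an integer usable as an array key for a
// curl handle, whether it is a resource (PHP < 8) or an object (PHP 8+).
function handle_key($ch)
{
    if (is_object($ch)) {
        return spl_object_id($ch); // available since PHP 7.2
    }
    return (int) $ch; // resource id
}
```

You would then write $handle_descs[handle_key($ch)] = $d; when adding a handle and $handle_descs[handle_key($r['handle'])] when reading it back. Object ids can be reused once an object is destroyed, so keep the handle referenced for as long as its key is in the array.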