Right, so I moved from downloading 50+ URLs through individual handles with curl's simple interface to using the curl_multi variant, and promptly bumped into a few problems. Firstly, my downloads were incomplete by the time the code stopped looping (and that was with the reference example code!). Secondly, the web server took some strain trying to serve 50+ severely CPU-intensive queries at the same time. An individual download can take up to a few minutes depending on various factors, which is why I want concurrency in the first place: the server has multiple cores, so I may as well use them, just not with 50+ requests at once.
Right, so as per usual you set up your normal curl handles using something like:
    // $fname contains the output file name (I just need the content in files)
    $fp = fopen($fname, "w") or die("Error opening file " . $fname . " for writing.");
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FILE, $fp);   // write the response body straight into the file
    curl_setopt($ch, CURLOPT_HEADER, 0);   // keep the response headers out of the file
    $curl_handles[] = array($ch, $desc);
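When the URLs and output files come from a list, the same setup can simply run in a loop. This is a minimal sketch, assuming a hypothetical `$jobs` list of `[url, output file, description]` triples and a hypothetical helper name `build_curl_handles()`. It produces the same plain list of `[handle, description]` pairs, so entries can later be popped in order with `array_shift()`.

```php
<?php
// Build the $curl_handles queue from a list of jobs.
// $jobs is a hypothetical list of [url, output file name, description] triples.
// The result is a plain (non-associative) list of [curl handle, description]
// pairs, so array_shift() pops them in the order they were queued.
function build_curl_handles(array $jobs)
{
    $curl_handles = array();
    foreach ($jobs as $job) {
        list($url, $fname, $desc) = $job;
        $fp = fopen($fname, "w") or die("Error opening file " . $fname . " for writing.");
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FILE, $fp);   // body goes straight into the file
        curl_setopt($ch, CURLOPT_HEADER, 0);   // no response headers in the file
        $curl_handles[] = array($ch, $desc);
    }
    return $curl_handles;
}
```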
Note that I drop the actual handles into an array called $curl_handles, and each element is itself a two-element array: the handle and a description (my code currently queues just over 50 of these, and the number is climbing quite a bit). Below is the code I cooked up today; I'm posting it because I couldn't find any alternative samples. It processes all the handles in $curl_handles with a concurrency of 4: as soon as one transfer finishes, it starts the next. The reason I don't use an associative array above is so that I can use array_shift() in this code:
     1  $handle_descs = array();
     2  $cmh = curl_multi_init();
     3  $running = 0;
     4
     5  do {
     6      while ($running < 4 && $curl_handles) {
     7          list($ch, $d) = array_shift($curl_handles);
     8          curl_multi_add_handle($cmh, $ch);
     9          $handle_descs[handle_key($ch)] = $d;
    10          fprintf(STDERR, "Starting transfer: %s\n", $d);
    11          ++$running;
    12      }
    13
    14      if (curl_multi_select($cmh) == -1) usleep(100000); // -1: nothing to wait on yet, so back off 100 ms but still run exec
    15      while (curl_multi_exec($cmh, $running) == CURLM_CALL_MULTI_PERFORM);
    16
    17      while (($r = curl_multi_info_read($cmh)) !== false) {
    18          $ch = $r['handle'];
    19          $key = handle_key($ch);
    20          if ($r['msg'] != CURLMSG_DONE) {
    21              fprintf(STDERR, "Don't know how to handle curl message %d\n", $r['msg']);
    22              exit(1);
    23          }
    24          if ($r['result'] != CURLE_OK) {
    25              fprintf(STDERR, "Error transferring: %s (curl error %d)\n", $handle_descs[$key], $r['result']);
    26              exit(1);
    27          }
    28          fprintf(STDERR, "Completed transfer: %s\n", $handle_descs[$key]);
    29          unset($handle_descs[$key]);
    30          curl_multi_remove_handle($cmh, $ch);
    31          curl_close($ch);
    32      }
    33  } while ($running > 0 || $curl_handles); // keep going while transfers are active or still queued
    34
    35  curl_multi_close($cmh);
    36
    37  // curl handles are resources before PHP 8 and CurlHandle objects from PHP 8 on;
    38  // map either kind to an integer id that is safe to use as an array key
    39  function handle_key($ch) {
    40      return is_resource($ch) ? (int)$ch : spl_object_id($ch);
    41  }
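Firstly, we set $running to zero so that the loop that adds handles from the pool initially adds 4 of them (you can raise the concurrency by changing the 4 on line 6). From then on, curl_multi_exec() overwrites $running with the number of transfers that are still active, so whenever one finishes, the next pass of that loop tops the pool back up to 4.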
I use curl_multi_select() to block until one of the transfers has activity, instead of busy-looping (which munches a LOT of CPU for no good reason). After that I call curl_multi_exec() until it stops returning CURLM_CALL_MULTI_PERFORM. This is on lines 14 and 15.
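For comparison, the PHP manual's curl_multi examples drive a fixed batch of handles roughly the other way round: call curl_multi_exec() first, and only block in curl_multi_select() while something is still active. This is a minimal sketch of that shape, assuming every handle was added to the multi handle up front (no refilling from a queue); `run_batch()` is a hypothetical wrapper name. It stops on a CURLM error instead of looping forever, and backs off briefly if curl_multi_select() returns -1.

```php
<?php
// Drive every handle already added to $cmh until all transfers finish.
// exec first (starts/advances transfers), then block in select only while
// something is still active. Returns the last curl_multi_exec() status.
function run_batch($cmh)
{
    do {
        $status = curl_multi_exec($cmh, $active);
        if ($active && curl_multi_select($cmh) == -1) {
            usleep(100000); // nothing to wait on (or select failed): back off 100 ms
        }
        // CURLM_CALL_MULTI_PERFORM only comes from very old libcurl versions;
        // any other non-OK status is a real error, so stop rather than spin.
    } while ($active && ($status == CURLM_OK || $status == CURLM_CALL_MULTI_PERFORM));
    return $status;
}
```

Because curl_multi_exec() runs before the first select, the transfers have already been started (and usually have sockets to wait on) by the time curl_multi_select() is called.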
The curl_multi_info_read() loop removes each completed handle from the multi handle and prints some status information; a failed transfer or an unexpected message aborts the script with exit code 1.
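Instead of keeping a separate `$handle_descs` map keyed by the handle (handles are resources before PHP 8 and CurlHandle objects from PHP 8 on, and objects cannot be array keys), the description can travel with the handle itself via `CURLOPT_PRIVATE` and be read back in the curl_multi_info_read() loop with `curl_getinfo($ch, CURLINFO_PRIVATE)`. This is a minimal sketch; `make_handle()` and `finish_done_transfers()` are hypothetical helper names, and the reaping helper returns the descriptions rather than printing them.

```php
<?php
// Build a handle that carries its own description via CURLOPT_PRIVATE.
// CURLOPT_RETURNTRANSFER keeps this demo quiet; the real script uses CURLOPT_FILE.
function make_handle($url, $desc)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_PRIVATE, $desc);
    return $ch;
}

// Reap finished transfers: read each description back with CURLINFO_PRIVATE,
// remove the handle from the multi handle, and return
// [description, curl result code] pairs.
function finish_done_transfers($cmh)
{
    $finished = array();
    while (($r = curl_multi_info_read($cmh)) !== false) {
        if ($r['msg'] != CURLMSG_DONE) {
            continue;
        }
        $ch = $r['handle'];
        $finished[] = array(curl_getinfo($ch, CURLINFO_PRIVATE), $r['result']);
        curl_multi_remove_handle($cmh, $ch);
    }
    return $finished;
}
```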
Please note that the status information goes to STDERR; you can easily change this.
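Putting the pieces together, here is a self-contained sketch of the same bounded-concurrency loop that can be tried without touching the web server. It uses file:// URLs pointing at temporary files purely for testing (real jobs would use your HTTP URLs), takes the log stream as a parameter so the status lines don't have to go to STDERR, counts failures instead of exiting, and closes each output file as soon as its transfer finishes. `download_all()` and its `[url, output file, description]` job format are hypothetical names chosen for the example.

```php
<?php
// Download every job with at most $limit transfers in flight at once.
// $jobs: list of [url, output file name, description] triples (hypothetical format).
// Status lines go to $log (STDERR if null). Returns the number of failed jobs.
function download_all(array $jobs, $limit = 4, $log = null)
{
    if ($log === null) {
        $log = STDERR;
    }
    $cmh = curl_multi_init();
    $open = array();   // job id => [output stream, description]
    $next_id = 0;
    $running = 0;
    $failures = 0;
    do {
        // Refill free slots from the front of the queue.
        while ($running < $limit && $jobs) {
            list($url, $fname, $desc) = array_shift($jobs);
            $fp = fopen($fname, "w");
            if ($fp === false) {
                throw new RuntimeException("Error opening file $fname for writing.");
            }
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_FILE, $fp);
            curl_setopt($ch, CURLOPT_FAILONERROR, true);          // HTTP status >= 400 counts as an error
            curl_setopt($ch, CURLOPT_PRIVATE, (string)$next_id);  // job id, read back when done
            curl_multi_add_handle($cmh, $ch);
            $open[$next_id++] = array($fp, $desc);
            fprintf($log, "Starting transfer: %s\n", $desc);
            ++$running;
        }

        // Advance all transfers; $running becomes the number still active.
        while (curl_multi_exec($cmh, $running) == CURLM_CALL_MULTI_PERFORM);

        // Reap finished transfers, which frees their slots for the next pass.
        while (($r = curl_multi_info_read($cmh)) !== false) {
            if ($r['msg'] != CURLMSG_DONE) {
                continue;
            }
            $ch = $r['handle'];
            $id = (int)curl_getinfo($ch, CURLINFO_PRIVATE);
            list($fp, $desc) = $open[$id];
            unset($open[$id]);
            if ($r['result'] == CURLE_OK) {
                fprintf($log, "Completed transfer: %s\n", $desc);
            } else {
                fprintf($log, "Error transferring: %s (curl error %d)\n", $desc, $r['result']);
                ++$failures;
            }
            curl_multi_remove_handle($cmh, $ch);
            fclose($fp); // flush this download to disk now rather than at script exit
        }

        // Block until there is activity; back off briefly if select can't wait.
        if ($running > 0 && curl_multi_select($cmh) == -1) {
            usleep(100000);
        }
    } while ($running > 0 || $jobs);

    curl_multi_close($cmh);
    return $failures;
}
```

Keeping the refill, exec, reap and select steps in that order means a slot freed in one pass is refilled at the top of the next, so the number of transfers in flight stays at or below `$limit`.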