Zebra_cURL

Class: Zebra_cURL

source file: /Zebra_cURL.php

A high performance cURL PHP library allowing the running of multiple requests at once, asynchronously.

Author(s):

Version:

  • 1.5.0 (last revision: September 29, 2020)
    See CHANGELOG

License:

Copyright:

  • © 2013 - 2020 Stefan Gabos

Class properties

integer $pause_interval

The number of seconds to wait between processing batches of requests.

If the value of this property is greater than 0, the library will process as many requests as defined by the threads property and then wait for pause_interval seconds before processing the next batch of requests.

Default is 0 (the library will keep as many parallel threads as defined by threads running at all times until there are no more requests to process).

integer $threads

The number of parallel, asynchronous requests to be processed by the library, at once.

    // process 30 requests simultaneously
    $curl->threads = 30;

Note that unless pause_interval is set to a value greater than 0, the library will process a constant number of requests at all times; it does this by starting a new request as soon as another one finishes.

If pause_interval is set to a value greater than 0, the library will process as many requests as set by the threads property and then wait for pause_interval seconds before processing the next batch of requests.

Default is 10
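
As a rough illustration of the batching behavior described above, the sketch below estimates how long the library would spend pausing for a given workload. The helper estimate_pause_seconds is hypothetical and not part of Zebra_cURL, and it assumes that no pause follows the final batch.

```php
<?php
// Hypothetical helper (not part of Zebra_cURL): estimates the total number of
// seconds spent pausing between batches of requests.
// Assumption: no pause is made after the final batch.
function estimate_pause_seconds(int $requests, int $threads, int $pause_interval): int
{
    // with pause_interval at 0 (the default) the library never pauses
    if ($pause_interval <= 0 || $requests <= 0) return 0;

    // number of batches of at most $threads requests each
    $batches = (int) ceil($requests / $threads);

    // one pause between each pair of consecutive batches
    return ($batches - 1) * $pause_interval;
}

// 100 requests, 30 at a time ($curl->threads = 30),
// pausing 2 seconds between batches ($curl->pause_interval = 2):
// 4 batches, 3 pauses, 6 seconds of waiting
echo estimate_pause_seconds(100, 30, 2), "\n";
```

The estimate covers only the pauses; the time taken by the requests themselves comes on top of it.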

Class methods

constructor __construct()

void __construct ( [ boolean $htmlentities = true ] )

Constructor of the class.

Below is the list of default options set by the library when instantiated. Various methods of the library may overwrite some of these options when called (see delete, download, ftp_download, get, header, post, put). The value of any of these options may also be changed with the option method. For a full list of available options and their description, consult the PHP documentation.

  • CURLINFO_HEADER_OUT - the last request string sent
    default: TRUE
  • CURLOPT_AUTOREFERER - TRUE to automatically set the "Referer:" field in requests where it follows a "Location:" redirect
    default: TRUE
  • CURLOPT_COOKIEFILE - the name of the file containing the cookie data. the cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file. if the name is an empty string, no cookies are loaded, but cookie handling is still enabled
    default: an empty string
  • CURLOPT_CONNECTTIMEOUT - the number of seconds to wait while trying to connect
    default: 10 (use 0 to wait indefinitely)
  • CURLOPT_ENCODING - the contents of the "Accept-Encoding: " header. this enables decoding of the response. supported encodings are identity, deflate, and gzip. if an empty string is set, a header containing all supported encoding types is sent
    default: gzip,deflate
  • CURLOPT_FOLLOWLOCATION - TRUE to follow any "Location:" header that the server sends as part of the HTTP header (note this is recursive, PHP will follow as many "Location:" headers that it is sent, unless CURLOPT_MAXREDIRS is set - see below)
    default: TRUE
  • CURLOPT_HEADER - TRUE to include the header in the output
    default: TRUE
  • CURLOPT_MAXREDIRS - the maximum amount of HTTP redirections to follow. use this option alongside CURLOPT_FOLLOWLOCATION - see above
    default: 50
  • CURLOPT_RETURNTRANSFER - TRUE to return the transfer's body as a string instead of outputting it directly
    default: TRUE
  • CURLOPT_SSL_VERIFYHOST - 1 to check the existence of a common name in the SSL peer certificate. 2 to check the existence of a common name and also verify that it matches the hostname provided. 0 to not check the names
    see the ssl method for more info
    default: TRUE
  • CURLOPT_SSL_VERIFYPEER - FALSE to stop cURL from verifying the peer's certificate
    see the ssl method for more info
    default: TRUE
  • CURLOPT_TIMEOUT - the maximum number of seconds to allow cURL functions to execute
    default: 10
  • CURLOPT_USERAGENT - a (slightly) random user agent (Internet Explorer 9 or 10, on Windows Vista, 7 or 8, with other extra strings). Some web services will not respond unless a valid user-agent string is provided
Arguments
boolean $htmlentities

(Optional) Instructs the script whether the response body returned by the get and post methods should be run through PHP's htmlentities function.

Default is TRUE

method cache()

void cache ( string $path [, integer $lifetime = 3600 ] [, boolean $compress = true ] [, octal $chmod = 0755 ] )

Enables caching of request results.

Note that in case of downloads, only the actual request is cached and not the associated downloads

    // instantiate the class
    $curl = new Zebra_cURL();

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // cache results in the "cache" folder and for 86400 seconds (24 hours)
    $curl->cache('cache', 86400);

    // fetch the RSS feeds of some popular tech-related websites
    // and execute a callback function for each request, as soon as it finishes
    $curl->get(array(

        'https://alistapart.com/main/feed/',
        'https://www.smashingmagazine.com/feed/',
        'https://code.tutsplus.com/posts.atom',

    // the callback function receives as argument an object with 4 properties
    // (info, headers, body and response)
    ), function($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                print_r('<pre>');
                print_r($result);

            // show the server's response code
            } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

        // something went wrong
        // ($result still contains all data that could be gathered)
        } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

    });
Arguments
string $path

Path where the cache files are to be stored.

Setting this to FALSE will disable caching.

If set to a non-existing path, the library will try to create the folder and will trigger an error if, for whatever reasons, it is unable to do so. If the folder can be created, its permissions will be set to the value of the $chmod argument.

integer $lifetime

(Optional) The number of seconds after which cache will be considered expired.

Default is 3600 (one hour).

boolean $compress

(Optional) If set to TRUE, cache files will be gzcompress-ed so that they occupy less disk space.

Default is TRUE.

octal $chmod

(Optional) The file system permissions to be set for newly created cache files.

I suggest using the value 0755 but, if you know what you are doing, here is how you can calculate the permission levels:

  • 400 Owner Read
  • 200 Owner Write
  • 100 Owner Execute
  • 40 Group Read
  • 20 Group Write
  • 10 Group Execute
  • 4 Global Read
  • 2 Global Write
  • 1 Global Execute

Default is 0755.
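
The values in the list above are octal and can be combined with the bitwise | (or) operator. The minimal sketch below, which uses only plain PHP, shows how the suggested value of 0755 is put together; the variable names are illustrative.

```php
<?php
// each permission is an octal value; combine them with bitwise OR
$owner = 0400 | 0200 | 0100;    // owner:  read + write + execute
$group = 040 | 010;             // group:  read + execute
$world = 04 | 01;               // global: read + execute

$chmod = $owner | $group | $world;

// print the result in octal notation
printf("%04o\n", $chmod);       // prints "0755"
```

Note the leading 0: in PHP it marks an integer literal as octal, so 0755 and 755 are different values.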

method cookies()

void cookies ( string $path )

Sets the path and name of the file to save cookies to / retrieve cookies from. All cookie data will be stored in this file on a per-domain basis. This is important when cookies need to be stored/restored to maintain the status/session of requests made to the same domains.

This method will automatically set the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options.

Arguments
string $path

The path to a file to save cookies to / retrieve cookies from.

If the file does not exist, the library will attempt to create it and, if it is unable to do so, it will trigger an error.

method delete()

void delete ( mixed $urls [, callable $callback = '' ] )

Performs an HTTP DELETE request to one or more URLs with optional POST data, and executes the callback function specified by the $callback argument for each and every request, as soon as the request finishes.

This method will automatically set the following options:

  • CURLINFO_HEADER_OUT = TRUE
  • CURLOPT_CUSTOMREQUEST = DELETE
  • CURLOPT_HEADER = TRUE
  • CURLOPT_NOBODY = FALSE
  • CURLOPT_POST = FALSE
  • CURLOPT_POSTFIELDS = the POST data

...and will unset the following options:

  • CURLOPT_BINARYTRANSFER
  • CURLOPT_HTTPGET
  • CURLOPT_FILE

Multiple requests are processed asynchronously, in parallel, and the callback function is called for each and every request as soon as the request finishes. The number of parallel requests to be constantly processed, at all times, is set through the threads property. See also pause_interval.

Because requests are done asynchronously, when initiating multiple requests at once, these may not finish in the order in which they were initiated!

    // instantiate the class
    $curl = new Zebra_cURL();

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // do a DELETE request
    // and execute a callback function for each request, as soon as it finishes
    $curl->delete(array(

        'https://www.somewebsite.com'   =>  array(
            'data_1'  =>  'value 1',
            'data_2'  =>  'value 2',
        ),

    // the callback function receives as argument an object with 4 properties
    // (info, headers, body and response)
    ), function($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                print_r('<pre>');
                print_r($result);

            // show the server's response code
            } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

        // something went wrong
        // ($result still contains all data that could be gathered)
        } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

    });
Arguments
mixed $urls

URL(s) to send the request(s) to.

Read full description of the argument at the post method.

callable $callback

(Optional) Callback function to be called as soon as the request finishes.

Read full description of the argument at the get method.

Tags
since:   1.3.3

method download()

void download ( mixed $urls , string $path [, callable $callback = '' ] )

Downloads one or more files from one or more URLs, saves the downloaded files to the path specified by the $path argument, and executes the callback function specified by the $callback argument for each and every request, as soon as the request finishes.

If the path you are downloading from refers to a file, the file's original name will be preserved but, if you are downloading a file generated by a script (e.g. https://foo.com/bar.php?w=1200&h=800), the downloaded file's name will be randomly generated. The downloaded file's name is available in the result's info attribute, in the downloaded_filename section - see the example below.

If you are downloading multiple files with the same name the later ones will overwrite the previous ones.

Downloads are streamed (bytes downloaded are directly written to disk) removing the unnecessary strain from your server of reading files into memory first, and then writing them to disk.

This method will automatically set the following options:

  • CURLINFO_HEADER_OUT = TRUE
  • CURLOPT_BINARYTRANSFER = TRUE
  • CURLOPT_HEADER = TRUE
  • CURLOPT_FILE

...and will unset the following options:

  • CURLOPT_CUSTOMREQUEST
  • CURLOPT_HTTPGET
  • CURLOPT_NOBODY
  • CURLOPT_POST
  • CURLOPT_POSTFIELDS

Multiple requests are processed asynchronously, in parallel, and the callback function is called for each and every request as soon as the request finishes. The number of parallel requests to be constantly processed, at all times, is set through the threads property. See also pause_interval.

Because requests are done asynchronously, when initiating multiple requests at once, these may not finish in the order in which they were initiated!

    // instantiate the class
    $curl = new Zebra_cURL();

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // download 2 images from 2 different websites
    // and execute a callback function for each request, as soon as it finishes
    $curl->download(array(

        'https://www.somewebsite.com/images/alpha.jpg',
        'https://www.otherwebsite.com/images/omega.jpg',

    // the callback function receives as argument an object with 4 properties
    // (info, headers, body and response)
    ), 'destination/path/', function($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                print_r('<pre>');
                print_r($result);

                // get the downloaded file's path
                $result->info['downloaded_filename'];

            // show the server's response code
            } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

        // something went wrong
        // ($result still contains all data that could be gathered)
        } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

    });
Arguments
mixed $urls

URL(s) to send the request(s) to.

Can be any of the following:

    // a string
    $curl->download('https://address.com/file.foo', 'path', 'callback');

    // an array, for multiple requests
    $curl->download(array(
        'https://address1.com/file1.foo',
        'https://address2.com/file2.bar',
    ), 'path', 'callback');

If custom options need to be set for each request, use the following format:

    // this can also be an array of arrays, for multiple requests
    $curl->download(array(

        // mandatory!
        'url'       =>  'https://address.com/file.foo',

        // optional, used to set any cURL option
        // in the same way you would set with the options() method
        'options'   =>  array(
                            CURLOPT_USERAGENT   =>  'Dummy scrapper 1.0',
                        ),

    ), 'path', 'callback');
string $path

The path to where to save the file(s) to.

If path is not pointing to a directory or the directory is not writable, the library will trigger an error.

callable $callback

(Optional) Callback function to be called as soon as the request finishes.

Read full description of the argument at the get method.

method ftp_download()

void ftp_download ( mixed $urls , string $path [, string $username = '' ] [, string $password = '' ] [, callable $callback = '' ] )

Works exactly like the download method but downloads are made from an FTP server.

Downloads one or more files from an FTP server, to which the connection is made using the given $username and $password arguments, saves the downloaded files (with their original name) to the path specified by the $path argument, and executes the callback function specified by the $callback argument for each and every request, as soon as the request finishes.

Downloads are streamed (bytes downloaded are directly written to disk) removing the unnecessary strain from your server of reading files into memory first, and then writing them to disk.

This method will automatically set the following options:

  • CURLINFO_HEADER_OUT = TRUE
  • CURLOPT_BINARYTRANSFER = TRUE
  • CURLOPT_HEADER = TRUE
  • CURLOPT_FILE

...and will unset the following options:

  • CURLOPT_CUSTOMREQUEST
  • CURLOPT_HTTPGET
  • CURLOPT_NOBODY
  • CURLOPT_POST
  • CURLOPT_POSTFIELDS

If you are downloading multiple files with the same name the later ones will overwrite the previous ones.

Multiple requests are processed asynchronously, in parallel, and the callback function is called for each and every request as soon as the request finishes. The number of parallel requests to be constantly processed, at all times, is set through the threads property. See also pause_interval.

Because requests are done asynchronously, when initiating multiple requests at once, these may not finish in the order in which they were initiated!

    // instantiate the class
    $curl = new Zebra_cURL();

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // connect to the FTP server using the given credentials, download a file to a given location
    // and execute a callback function for each request, as soon as it finishes
    $curl->ftp_download(

        'ftp://somefile.ext',
        'destination/path',
        'username',
        'password',

        // the callback function receives as argument an object with 4 properties
        // (info, headers, body and response)
        function($result) {

            // everything went well at cURL level
            if ($result->response[1] == CURLE_OK) {

                // if server responded with code 200 (meaning that everything went well)
                // see https://httpstatus.es/ for a list of possible response codes
                if ($result->info['http_code'] == 200) {

                    // see all the returned data
                    print_r('<pre>');
                    print_r($result);

                // show the server's response code
                } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

            // something went wrong
            // ($result still contains all data that could be gathered)
            } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

        }

    );
Arguments
mixed $urls

URL(s) to send the request(s) to.

Can be any of the following:

    // a string
    $curl->ftp_download(
        'ftp://address.com/file.foo',
        'destination/path',
        'username',
        'password',
        'callback'
    );

    // an array, for multiple requests
    $curl->ftp_download(array(
        'ftp://address1.com/file1.foo',
        'ftp://address2.com/file2.bar',
    ), 'destination/path', 'username', 'password', 'callback');

If custom options need to be set for each request, use the following format:

    // this can also be an array of arrays, for multiple requests
    $curl->ftp_download(array(

        // mandatory!
        'url'       =>  'ftp://address.com/file.foo',

        // optional, used to set any cURL option
        // in the same way you would set with the options() method
        'options'   =>  array(
                            CURLOPT_USERAGENT   =>  'Dummy scrapper 1.0',
                        ),

    ), 'destination/path', 'username', 'password', 'callback');

Note that in all the examples above, you are downloading files from a single FTP server. To make requests to multiple FTP servers, set the CURLOPT_USERPWD option yourself. The $username and $password arguments will be overwritten by the values set like this.

    $curl->ftp_download(array(
        array(
            'url'       =>  'ftp://address1.com/file1.foo',
            'options'   =>  array(
                                CURLOPT_USERPWD =>  'username1:password1',
                            ),
        ),
        array(
            'url'       =>  'ftp://address2.com/file2.foo',
            'options'   =>  array(
                                CURLOPT_USERPWD =>  'username2:password2',
                            ),
        ),
    ), 'destination/path', '', '', 'callback');
string $path

The path to where to save the file(s) to.

If path is not pointing to a directory or is not writable, the library will trigger an error.

string $username (Optional) The username to be used to connect to the FTP server (if required).
string $password (Optional) The password to be used to connect to the FTP server (if required).
callable $callback

(Optional) Callback function to be called as soon as the request finishes.

Read full description of the argument at the get method.

method get()

void get ( mixed $urls [, callable $callback = '' ] )

Performs an HTTP GET request to one or more URLs and executes the callback function specified by the $callback argument for each and every request, as soon as the request finishes.

This method will automatically set the following options:

  • CURLINFO_HEADER_OUT = TRUE
  • CURLOPT_HEADER = TRUE
  • CURLOPT_HTTPGET = TRUE
  • CURLOPT_NOBODY = FALSE

...and will unset the following options:

  • CURLOPT_BINARYTRANSFER
  • CURLOPT_CUSTOMREQUEST
  • CURLOPT_FILE
  • CURLOPT_POST
  • CURLOPT_POSTFIELDS

Multiple requests are processed asynchronously, in parallel, and the callback function is called for each and every request as soon as the request finishes. The number of parallel requests to be constantly processed, at all times, is set through the threads property. See also pause_interval.

Because requests are done asynchronously, when initiating multiple requests at once, these may not finish in the order in which they were initiated!

    // instantiate the class
    $curl = new Zebra_cURL();

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // cache results in the "cache" folder and for 3600 seconds (one hour)
    $curl->cache('cache', 3600);

    // let's fetch the RSS feeds of some popular websites
    // execute the callback function for each request, as soon as it finishes
    $curl->get(array(

        'https://alistapart.com/main/feed/',
        'https://www.smashingmagazine.com/feed/',
        'https://code.tutsplus.com/posts.atom',

    // the callback function receives as argument an object with 4 properties
    // (info, headers, body and response)
    ), function($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                print_r('<pre>');
                print_r($result);

            // show the server's response code
            } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

        // something went wrong
        // ($result still contains all data that could be gathered)
        } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

    });
Arguments
mixed $urls

URL(s) to send the request(s) to.

Can be any of the following:

    // a string
    $curl->get('https://address.com/', 'callback');

    // an array, for multiple requests
    $curl->get(array(
        'https://address1.com/',
        'https://address2.com/',
    ), 'callback');

If custom options need to be set for each request, use the following format:

    // this can also be an array of arrays, for multiple requests
    $curl->get(array(

        // mandatory!
        'url'       =>  'https://address.com/',

        // optional, used to set any cURL option
        // in the same way you would set with the options() method
        'options'   =>  array(
                            CURLOPT_USERAGENT   =>  'Dummy scrapper 1.0',
                        ),

        // optional, you can pass arguments this way also
        'data'      =>  array(
                            'data_1'  =>  'value 1',
                            'data_2'  =>  'value 2',
                        ),

    ), 'callback');
callable $callback

(Optional) Callback function to be called as soon as the request finishes.

May be given as a string representing the name of an existing function, or as an anonymous function.

The callback function receives as first argument an object with 4 properties as described below. Any extra arguments passed to this method will be passed as extra arguments to the callback function:

  • info - an associative array containing information about the request that just finished, as returned by PHP's curl_getinfo() function
  • headers - an associative array with 2 items:
    • last_request - an array with a single entry containing the request headers generated by the last request
      therefore, when redirects are involved, only information from the last request will be available
      if explicitly disabled by setting CURLINFO_HEADER_OUT to 0 or FALSE through the option method, this will be an empty string
    • responses - an empty string as it is not available for this method
  • body - the response of the request (the content of the page at the URL).

    Unless disabled via the constructor, all applicable characters will be converted to HTML entities via PHP's htmlentities function, so remember to use PHP's html_entity_decode function in case you need the decoded values

    if the body is explicitly disabled by setting CURLOPT_NOBODY to 1 or TRUE through the option method, this will be an empty string
  • response - the response given by the cURL library as an array with 2 items:
    • the textual representation of the result's code (e.g. CURLE_OK)
    • the result's code (e.g. 0)

If the callback function returns FALSE while caching is enabled, the library will not cache the respective request, making it easy to retry failed requests without having to clear all cache.
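
The retry behavior described above can be sketched with a named callback. The function cache_only_successes is hypothetical, and the sketch assumes that any return value other than FALSE leaves caching unaffected. The result objects at the end are simulated stand-ins with the shape described above, so the callback can be tried without making real requests.

```php
<?php
// Hypothetical callback: returns FALSE for failed requests so that, with
// caching enabled, the library does not cache them
function cache_only_successes($result)
{
    // 0 is the numeric value of CURLE_OK (see the "response" property above)
    $curl_ok = $result->response[1] == 0;
    $http_ok = $result->info['http_code'] == 200;

    if ($curl_ok && $http_ok) {

        // process $result->body here...

        // assumption: any value other than FALSE lets the result be cached
        return true;

    }

    // failed request: ask the library not to cache it
    return false;
}

// simulated result objects, shaped like the one described above
$ok     = (object) array('info' => array('http_code' => 200), 'response' => array('CURLE_OK', 0));
$failed = (object) array('info' => array('http_code' => 503), 'response' => array('CURLE_OK', 0));

var_dump(cache_only_successes($ok));        // bool(true)
var_dump(cache_only_successes($failed));    // bool(false)
```

Since the callback may be given as the name of an existing function, it would be passed to the library as $curl->get($urls, 'cache_only_successes');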

method header()

void header ( mixed $urls [, callable $callback = '' ] )

Works exactly like the get method, the only difference being that this method will only return the headers, without body.

This method will automatically set the following options:

  • CURLINFO_HEADER_OUT = TRUE
  • CURLOPT_HEADER = TRUE
  • CURLOPT_HTTPGET = TRUE
  • CURLOPT_NOBODY = TRUE

...and will unset the following options:

  • CURLOPT_BINARYTRANSFER
  • CURLOPT_CUSTOMREQUEST
  • CURLOPT_FILE
  • CURLOPT_POST
  • CURLOPT_POSTFIELDS

Multiple requests are processed asynchronously, in parallel, and the callback function is called for each and every request as soon as the request finishes. The number of parallel requests to be constantly processed, at all times, is set through the threads property. See also pause_interval.

Because requests are done asynchronously, when initiating multiple requests at once, these may not finish in the order in which they were initiated!

    // instantiate the class
    $curl = new Zebra_cURL();

    // process given URLs
    // and execute a callback function for each request, as soon as it finishes
    // the callback function receives as argument an object with 4 properties
    // (info, headers, body and response)
    $curl->header('https://www.somewebsite.com', function($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                print_r('<pre>');
                print_r($result);

            // show the server's response code
            } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

        // something went wrong
        // ($result still contains all data that could be gathered)
        } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

    });
Arguments
mixed $urls

URL(s) to send the request(s) to.

Read full description of the argument at the get method.

callable $callback

(Optional) Callback function to be called as soon as the request finishes.

Read full description of the argument at the get method.

method http_authentication()

void http_authentication ( [ string $username = '' ] [, string $password = '' ] [, string $type = CURLAUTH_ANY ] )

Use this method to make requests to pages that require prior HTTP authentication.

    // instantiate the class
    $curl = new Zebra_cURL();

    // prepare user name and password
    $curl->http_authentication('username', 'password');

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // get content from a page that requires prior HTTP authentication
    // the callback function receives as argument an object with 4 properties
    // (info, headers, body and response)
    $curl->get('https://www.some-page-requiring-prior-http-authentication.com', function($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                print_r('<pre>');
                print_r($result);

            // show the server's response code
            } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

        // something went wrong
        // ($result still contains all data that could be gathered)
        } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

    });

If you have to unset previously set values use

Arguments
string $username User name to be used for authentication.
string $password Password to be used for authentication.
string $type

(Optional) The HTTP authentication method(s) to use. The options are:

  • CURLAUTH_BASIC
  • CURLAUTH_DIGEST
  • CURLAUTH_GSSNEGOTIATE
  • CURLAUTH_NTLM
  • CURLAUTH_ANY
  • CURLAUTH_ANYSAFE

The bitwise | (or) operator can be used to combine more than one method. If this is done, cURL will poll the server to see what methods it supports and pick the best one.

CURLAUTH_ANY is an alias for
CURLAUTH_BASIC | CURLAUTH_DIGEST | CURLAUTH_GSSNEGOTIATE | CURLAUTH_NTLM

CURLAUTH_ANYSAFE is an alias for
CURLAUTH_DIGEST | CURLAUTH_GSSNEGOTIATE | CURLAUTH_NTLM

Default is CURLAUTH_ANY
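
As a small illustration of combining methods, the sketch below builds a $type value with the bitwise | operator and checks which methods it contains. It assumes PHP's cURL extension is loaded (it defines the CURLAUTH_* constants). The helper auth_type_includes is hypothetical and not part of the library.

```php
<?php
// Allow only Digest and NTLM; cURL would then pick whichever of the two
// the server supports.
$type = CURLAUTH_DIGEST | CURLAUTH_NTLM;

// Hypothetical helper (not part of Zebra_cURL): tests whether a combined
// value includes a given authentication method.
function auth_type_includes($type, $method)
{
    return ($type & $method) === $method;
}

var_dump(auth_type_includes($type, CURLAUTH_DIGEST)); // bool(true)
var_dump(auth_type_includes($type, CURLAUTH_BASIC));  // bool(false)
```

The combined value would then be passed as the third argument, for example $curl->http_authentication('username', 'password', $type);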

method option()

void option ( mixed $option [, mixed $value = '' ] )

Allows the setting of one or more cURL options.

    // instantiate the class
    $curl = new Zebra_cURL();

    // setting a single option
    $curl->option(CURLOPT_CONNECTTIMEOUT, 10);

    // setting multiple options at once
    $curl->option(array(
        CURLOPT_TIMEOUT         =>  10,
        CURLOPT_CONNECTTIMEOUT  =>  10,
    ));

    // requests are made here...

Arguments
mixed $option

A single option for which to set a value, or an associative array in the form of option => value.

Setting a value to null will unset that option.

mixed $value

(Optional) If the $option argument is not an array, then this argument represents the value to be set for the respective option. If the $option argument is an array, the value of this argument will be ignored.

Setting a value to null will unset that option.
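
The note about null can be sketched as follows. This is only an illustration: clear_timeouts is a hypothetical helper, $curl is assumed to be a Zebra_cURL instance created elsewhere, and the two timeout options are arbitrary examples.

```php
<?php
// Hypothetical helper (not part of Zebra_cURL): unsets two previously set
// timeout options by passing null as their value.
// $curl is assumed to be a Zebra_cURL instance.
function clear_timeouts($curl)
{
    // single option: a null value unsets it
    $curl->option(CURLOPT_TIMEOUT, null);

    // array form: every option mapped to null is unset
    $curl->option(array(
        CURLOPT_CONNECTTIMEOUT => null,
    ));
}
```

Calling clear_timeouts($curl) before the next batch of requests would leave both options unset for those requests.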

method post()

void post ( mixed $urls [, callable $callback = '' ] )

Performs an HTTP POST request to one or more URLs and executes the callback function specified by the $callback argument for each and every request, as soon as the request finishes.

This method will automatically set the following options:

  • CURLINFO_HEADER_OUT = TRUE
  • CURLOPT_HEADER = TRUE
  • CURLOPT_NOBODY = FALSE
  • CURLOPT_POST = TRUE
  • CURLOPT_POSTFIELDS = the POST data

...and will unset the following options:

  • CURLOPT_BINARYTRANSFER
  • CURLOPT_CUSTOMREQUEST
  • CURLOPT_HTTPGET
  • CURLOPT_FILE

Multiple requests are processed asynchronously, in parallel, and the callback function is called for each and every request as soon as the request finishes. The number of parallel requests to be constantly processed, at all times, is set through the threads property. See also pause_interval.

Because requests are done asynchronously, when initiating multiple requests at once, these may not finish in the order in which they were initiated!

    // instantiate the class
    $curl = new Zebra_cURL();

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // do a POST request and execute a callback function for each request, as soon as it finishes
    $curl->post(array(

        'https://www.somewebsite.com'  =>  array(
            'data_1'  =>  'value 1',
            'data_2'  =>  'value 2',
        ),

    // the callback function receives as argument an object with 4 properties
    // (info, header, body and response)
    ), function($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                print_r('<pre>');
                print_r($result);

            // show the server's response code
            } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

        // something went wrong
        // ($result still contains all data that could be gathered)
        } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

    });

When uploading a file, we need to prefix the file name with @

    $curl->post(array(
        'https://www.somewebsite.com'  =>  array(
            'data_1'  =>  'value 1',
            'data_2'  =>  'value 2',
            'data_3'  =>  '@absolute/path/to/file.ext',
        ),
    ), 'mycallback');

Arguments
mixed $urls

URL(s) to send the request(s) to.

Can be any of the following:

    // a string (no POST values sent)
    $curl->post('https://address.com');

    // an array, for multiple requests (no POST values sent)
    $curl->post(array(
        'https://address1.com',
        'https://address2.com',
    ));

    // an associative array in the form of Array(url => post-data),
    // where "post-data" is an associative array in the form of
    // Array(name => value) and represents the value(s) to be set for
    // CURLOPT_POSTFIELDS;
    // "post-data" can also be an arbitrary string - useful if you
    // want to send raw data (like a JSON)
    $curl->post(array('https://address.com' => array(
        'data_1'  =>  'value 1',
        'data_2'  =>  'value 2',
    )));

    // just like above but an *array* of associative arrays, for
    // multiple requests
    $curl->post(array(
        array('https://address.com1' => array(
            'data_1'  =>  'value 1',
            'data_2'  =>  'value 2',
        )),
        array('https://address.com2' => array(
            'data_1'  =>  'value 1',
            'data_2'  =>  'value 2',
        )),
    ));
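
The listing above mentions that "post-data" can also be an arbitrary string, such as JSON, without showing it. A minimal sketch of preparing such a request follows. The URL and payload are placeholders, and the sketch covers only the POST body; any headers the receiving server may expect are not addressed here.

```php
<?php
// Placeholder payload to be sent as raw JSON.
$payload = array(
    'data_1' => 'value 1',
    'data_2' => 'value 2',
);

// Array(url => post-data), where "post-data" is a string instead of an array.
$requests = array(
    'https://address.com' => json_encode($payload),
);

echo $requests['https://address.com']; // {"data_1":"value 1","data_2":"value 2"}
```

The $requests array would then be passed as the first argument, for example $curl->post($requests, 'mycallback');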

If custom options need to be set for each request, use the following format:

    // this can also be an array of arrays, for multiple requests
    $curl->post(array(

        // mandatory!
        'url'       =>  'https://address.com',

        // optional, used to set any cURL option
        // in the same way you would set with the option() method
        'options'   =>  array(
                            CURLOPT_USERAGENT   =>  'Dummy scrapper 1.0',
                        ),

        // optional, if you need to pass any arguments
        // (equivalent of setting CURLOPT_POSTFIELDS using
        // the "options" entry above)
        'data'      =>  array(
                            'data_1'  =>  'value 1',
                            'data_2'  =>  'value 2',
                        ),
    ));

To post a file, prepend the filename with @ and use the full server path.

For PHP 5.5+ files are uploaded using CURLFile and CURLOPT_SAFE_UPLOAD will be set to TRUE.

For lower PHP versions, files are uploaded the old way, and the file's mime type should be specified explicitly by following the filename with the type in the format ';type=mimetype', because most of the time cURL will otherwise send the wrong mime type.

    $curl->post(array('https://address.com' => array(
        'data_1'  =>  'value 1',
        'data_2'  =>  'value 2',
        'data_3'  =>  '@absolute/path/to/file.ext',
    )));
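
The ';type=mimetype' suffix described above is not shown in the example, so here is a minimal sketch of building such a field value. The helper upload_field is hypothetical (not part of the library), and the path and mime type are placeholders.

```php
<?php
// Hypothetical helper (not part of Zebra_cURL): builds the value of a POST
// field that uploads a file. The file name is prefixed with "@" and, when
// given, the mime type is appended in the ";type=mimetype" format.
function upload_field($absolute_path, $mime_type = null)
{
    $value = '@' . $absolute_path;

    if ($mime_type !== null) {
        $value .= ';type=' . $mime_type;
    }

    return $value;
}

$post_data = array(
    'data_1' => 'value 1',
    'data_3' => upload_field('/absolute/path/to/file.png', 'image/png'),
);

echo $post_data['data_3']; // @/absolute/path/to/file.png;type=image/png
```

The resulting array would be used as the POST data, for example $curl->post(array('https://address.com' => $post_data));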

If any data is sent, the "Content-Type" header will be set to "multipart/form-data"

callable $callback

(Optional) Callback function to be called as soon as the request finishes.

Read full description of the argument at the get method.

method proxy()

void proxy ( string $proxy [, string $port = 80 ] [, string $username = '' ] [, string $password = '' ] )

Instructs the library to tunnel all requests through a proxy server.

    // instantiate the class
    $curl = new Zebra_cURL();

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // connect to a proxy server
    // (that's a random one I got from https://www.proxynova.com/proxy-server-list/)
    $curl->proxy('91.221.252.18', '8080');

    // fetch a page and execute a callback function when done
    // the callback function receives as argument an object with 4 properties
    // (info, header, body and response)
    $curl->get('https://www.somewebsite.com/', function($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                print_r('<pre>');
                print_r($result);

            // show the server's response code
            } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

        // something went wrong
        // ($result still contains all data that could be gathered)
        } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

    });

Arguments
string $proxy

The HTTP proxy to tunnel requests through.

Can be a URL or an IP address.

This option can also be set using the option method and setting CURLOPT_PROXY to the desired value.

Setting this argument to FALSE will unset all the proxy-related options.

string $port

(Optional) The port number of the proxy to connect to.

Default is 80.

This option can also be set using the option method and setting CURLOPT_PROXYPORT to the desired value.

string $username

(Optional) The username to be used for the connection to the proxy (if required by the proxy)

Default is "" (an empty string)

The username and the password can also be set using the option method and setting CURLOPT_PROXYUSERPWD to the desired value formatted like [username]:[password].

string $password

(Optional) The password to be used for the connection to the proxy (if required by the proxy)

Default is "" (an empty string)

The username and the password can also be set using the option method and setting CURLOPT_PROXYUSERPWD to the desired value formatted like [username]:[password].
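
The argument descriptions above name the cURL option behind each argument. The sketch below collects them into an array suitable for the option method. The helper proxy_options, the host name and the credentials are hypothetical, and the proxy method itself may set further options internally that this sketch does not cover.

```php
<?php
// Hypothetical helper (not part of Zebra_cURL): builds the cURL options
// that the arguments of proxy() are described as corresponding to.
function proxy_options($proxy, $port = 80, $username = '', $password = '')
{
    $options = array(
        CURLOPT_PROXY     => $proxy,
        CURLOPT_PROXYPORT => $port,
    );

    // credentials are formatted like [username]:[password]
    if ($username !== '') {
        $options[CURLOPT_PROXYUSERPWD] = $username . ':' . $password;
    }

    return $options;
}

$options = proxy_options('proxy.example.com', 8080, 'user', 'secret');

echo $options[CURLOPT_PROXYUSERPWD]; // user:secret
```

The array could then be applied with $curl->option($options);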

method put()

void put ( mixed $urls [, callable $callback = '' ] )

Performs an HTTP PUT request to one or more URLs and executes the callback function specified by the $callback argument for each and every request, as soon as the request finishes.

This method will automatically set the following options:

  • CURLINFO_HEADER_OUT - TRUE
  • CURLOPT_CUSTOMREQUEST - PUT
  • CURLOPT_HEADER - TRUE
  • CURLOPT_NOBODY - FALSE
  • CURLOPT_POST - FALSE
  • CURLOPT_POSTFIELDS - the POST data

...and will unset the following options:

  • CURLOPT_BINARYTRANSFER
  • CURLOPT_HTTPGET
  • CURLOPT_FILE

Multiple requests are processed asynchronously, in parallel, and the callback function is called for each and every request as soon as the request finishes. The number of parallel requests to be constantly processed, at all times, is set through the threads property. See also pause_interval.

Because requests are done asynchronously, when initiating multiple requests at once, these may not finish in the order in which they were initiated!

    // instantiate the class
    $curl = new Zebra_cURL();

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // do a PUT request and execute a callback function for each request, as soon as it finishes
    $curl->put(array(

        'https://www.somewebsite.com'  =>  array(
            'data_1'  =>  'value 1',
            'data_2'  =>  'value 2',
        ),

    // the callback function receives as argument an object with 4 properties
    // (info, header, body and response)
    ), function($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                print_r('<pre>');
                print_r($result);

            // show the server's response code
            } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

        // something went wrong
        // ($result still contains all data that could be gathered)
        } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

    });

Arguments
mixed $urls

URL(s) to send the request(s) to.

Read full description of the argument at the post method.

callable $callback

(Optional) Callback function to be called as soon as the request finishes.

Read full description of the argument at the get method.

Tags
since:   1.3.3

method queue()

void queue ()

Instructs the library to queue requests rather than processing them right away. Useful for grouping different types of requests and treating them as a single request.

Until the start method is called, all calls to the delete, download, ftp_download, get, header, post and put methods will queue up rather than being executed right away. Once the start method is called, all queued requests will be processed, and the values of the threads and pause_interval properties will still apply.

    // the callback function to be executed for each and every
    // request, as soon as the request finishes
    // the callback function receives as argument an object with 4 properties
    // (info, header, body and response)
    function mycallback($result) {

        // everything went well at cURL level
        if ($result->response[1] == CURLE_OK) {

            // if server responded with code 200 (meaning that everything went well)
            // see https://httpstatus.es/ for a list of possible response codes
            if ($result->info['http_code'] == 200) {

                // see all the returned data
                print_r('<pre>');
                print_r($result);

            // show the server's response code
            } else trigger_error('Server responded with code ' . $result->info['http_code'], E_USER_ERROR);

        // something went wrong
        // ($result still contains all data that could be gathered)
        } else trigger_error('cURL responded with: ' . $result->response[0], E_USER_ERROR);

    }

    // instantiate the class
    $curl = new Zebra_cURL();

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // queue requests - useful for grouping different types of requests
    // in this example, when the "start" method is called, we'll execute
    // the "get" and the "post" requests asynchronously
    $curl->queue();

    // do a POST and execute the callback function when done
    $curl->post(array(
        'https://www.somewebsite.com'  =>  array(
            'data_1'  =>  'value 1',
            'data_2'  =>  'value 2',
        ),
    ), 'mycallback');

    // fetch the RSS feeds of some popular websites
    // and execute the callback function for each request, as soon as it finishes
    $curl->get(array(
        'https://alistapart.com/main/feed/',
        'https://www.smashingmagazine.com/feed/',
        'https://code.tutsplus.com/posts.atom',
    ), 'mycallback');

    // execute queued requests
    $curl->start();

Tags
since:   1.3.0
method scrap()

mixed scrap ( string $url [, boolean $body_only = true ] )

A shorthand for making a single get request without the need of a callback function.

    // instantiate the class
    $curl = new Zebra_cURL();

    // if making requests over HTTPS we need to load a CA bundle
    // so we don't get CURLE_SSL_CACERT response from cURL
    // you can get this bundle from https://curl.haxx.se/docs/caextract.html
    $curl->ssl(true, 2, 'path/to/cacert.pem');

    // get page's content only
    $content = $curl->scrap('https://www.somewebsite.com/');

    // print that to screen
    echo $content;

    // also get extra information about the page
    $content = $curl->scrap('https://www.somewebsite.com/', false);

    // print that to screen
    print_r('<pre>');
    print_r($content);

Arguments
string $url

A URL to fetch.

Note that this method only supports a single URL. For processing multiple URLs at once, see the get method.

boolean $body_only

(Optional) When set to TRUE, will instruct the method to return only the page's content, without info, headers, responses, etc.

When set to FALSE, will instruct the method to return everything it can about the scraped page, as an object with properties as described for the $callback argument of the get method.

Default is TRUE.
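
When $body_only is FALSE, the returned object can be checked the same way the callback examples check their $result argument. A minimal sketch, assuming $curl is a Zebra_cURL instance created elsewhere; fetch_body is a hypothetical helper and not part of the library.

```php
<?php
// Hypothetical helper (not part of Zebra_cURL): returns the body of a page
// only if the request succeeded at cURL level and the server responded with
// code 200; returns false otherwise.
// $curl is assumed to be a Zebra_cURL instance.
function fetch_body($curl, $url)
{
    // ask for the full result object instead of the body only
    $result = $curl->scrap($url, false);

    // response[1] holds cURL's result code, info['http_code'] the server's
    if ($result->response[1] == CURLE_OK && $result->info['http_code'] == 200) {
        return $result->body;
    }

    return false;
}
```

A call such as fetch_body($curl, 'https://www.somewebsite.com/') would then yield either the page's content or false.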

Tags
return:   Returns the scraped page's content when $body_only is set to TRUE, or an object with properties as described for the $callback argument of the get method when it is set to FALSE.
since:   1.3.3
method ssl()

void ssl ( [ boolean $verify_peer = true ] [, integer $verify_host = 2 ] [, mixed $file = false ] [, mixed $path = false ] )

Requests made over HTTPS usually require additional configuration, depending on the server. Most of the time the defaults set by the library will get you through but, if the defaults are not working, you can set specific options using this method.

    // instantiate the class
    $curl = new Zebra_cURL();

    // instruct the library to skip verifying peer's SSL certificate
    // (ignored if request is not made through HTTPS)
    $curl->ssl(false);

    // fetch a page
    $curl->get('https://www.somewebsite.com/', function($result) {
        print_r("<pre>");
        print_r($result);
    });

Arguments
boolean $verify_peer

(Optional) Should the peer's certificate be verified by cURL?

Default is TRUE.

This option can also be set using the option method and setting CURLOPT_SSL_VERIFYPEER to the desired value.

When you are communicating over HTTPS (or any other protocol that uses TLS), cURL will, by default, verify that the server's certificate is signed by a trusted Certificate Authority (CA), and this verification will most likely fail.

When it does fail, instead of disabling this check, it is better to download the CA bundle from Mozilla and reference it through the $file argument below.

integer $verify_host

(Optional) Specifies whether to check the existence of a common name in the SSL peer certificate and that it matches with the provided hostname.

  • 1 to check the existence of a common name in the SSL peer certificate
  • 2 to check the existence of a common name and also verify that it matches the hostname provided; in production environments the value of this option should be kept at 2;

Default is 2

Support for value 1 was removed in cURL 7.28.1.

This option can also be set using the option method and setting CURLOPT_SSL_VERIFYHOST to the desired value.

mixed $file

(Optional) An absolute path to a file holding the certificates to verify the peer with. This only makes sense if CURLOPT_SSL_VERIFYPEER is set to TRUE.

Default is FALSE.

This option can also be set using the option method and setting CURLOPT_CAINFO to the desired value.

mixed $path

(Optional) An absolute path to a directory that holds multiple CA certificates. This only makes sense if CURLOPT_SSL_VERIFYPEER is set to TRUE.

Default is FALSE.

This option can also be set using the option method and setting CURLOPT_CAPATH to the desired value.
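
Because the $file argument expects an absolute path, it can help to resolve the location of the CA bundle before handing it over. The sketch below is one way to do that. The helper ca_bundle_path and the file location are hypothetical; it returns FALSE, the documented default for $file, when the bundle cannot be found.

```php
<?php
// Hypothetical helper (not part of Zebra_cURL): resolves the location of a
// CA bundle to an absolute path, or returns false if the file does not
// exist or cannot be read.
function ca_bundle_path($file)
{
    $absolute = realpath($file);

    if ($absolute === false || !is_readable($absolute)) {
        return false;
    }

    return $absolute;
}

// placeholder location: a bundle stored next to the current script
$bundle = ca_bundle_path(__DIR__ . '/cacert.pem');
```

The result would then be passed as the third argument, for example $curl->ssl(true, 2, $bundle);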

method start()

void start ()

Executes queued requests.

See queue method.

Tags
since:   1.3.0