Web scraping in PHP
You can use any of the libraries below. Each one has its pros & cons, so read the notes on each or take the time to try them yourself:
- Guzzle: A standalone HTTP client; it uses cURL when available but does not strictly require it.
- Goutte: Built on Guzzle & several Symfony components, by the creator of Symfony.
- hQuery: A fast scraper with caching capabilities. High performance when scraping large documents.
- Requests: Known for its simple, user-friendly API.
- Buzz: A lightweight HTTP client, ideal for beginners.
- ReactPHP: An event-driven, non-blocking I/O library that suits asynchronous scraping, with comprehensive tutorials & examples.
It is worth checking them all & using each one where it fits best.
I recommend you consider simple_html_dom for this. It will make it very easy.
Here is a working example of how to pull the title and the first image.
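<?php
require 'simple_html_dom.php';

// Fetch and parse the page; file_get_html() returns false on failure.
$html = file_get_html('http://www.google.com/');
if (!$html) {
    exit("Could not load the page\n");
}

// find($selector, 0) returns the first match, or null if there is none.
$title = $html->find('title', 0);
$image = $html->find('img', 0);

echo ($title ? $title->plaintext : '(no title)')."<br>\n";
echo $image ? $image->src : '(no image)';
?>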
Here is a second example that will do the same without an external library. I should note that using regex on HTML is NOT a good idea.
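<?php
$data = file_get_contents('http://www.google.com/');
if ($data === false) {
    exit("Could not load the page\n");
}

// Capture the text between <title> and </title>; empty string if not found.
$title = preg_match('/<title[^>]*>([^<]+)<\/title>/i', $data, $matches) ? $matches[1] : '';

// Capture the src attribute of the first <img> tag; empty string if not found.
$img = preg_match('/<img[^>]*src=[\'"]([^\'"]+)[\'"][^>]*>/i', $data, $matches) ? $matches[1] : '';

echo $title."<br>\n";
echo $img;
?>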
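Given the caveat about regex, here is a rough sketch of a third option that also avoids external libraries but uses a real parser: PHP's bundled DOM extension (DOMDocument and DOMXPath). This is not from the original examples. The function name extractTitleAndFirstImage and the inline sample markup are made up for illustration, and the sketch assumes the DOM extension is enabled, which it is in most PHP builds.

```php
<?php
// Hypothetical helper (not part of the examples above): returns the page title
// and the src of the first <img>, or null for whichever one is missing.
function extractTitleAndFirstImage(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath     = new DOMXPath($doc);
    $titleNode = $xpath->query('//title')->item(0);
    $imgNode   = $xpath->query('//img[@src]')->item(0);

    return [
        'title' => $titleNode ? trim($titleNode->textContent) : null,
        'image' => $imgNode ? $imgNode->getAttribute('src') : null,
    ];
}

// Inline sample markup (an assumption) so the sketch runs without network access.
$sample = '<html><head><title>Example page</title></head>'
        . '<body><img src="/logo.png" alt="logo"><img src="/other.png"></body></html>';

$result = extractTitleAndFirstImage($sample);
echo $result['title'] . "<br>\n";
echo $result['image'];
```

To run it against a live page, pass the string returned by file_get_contents() instead of $sample, after checking that the download did not return false.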