Diggy web scraper

Diggy is a simple wrapper around the PHP DOM extension that lets you find elements using simple query selectors and fail-proof chaining.


Requirements

  • PHP 8.1

Getting started

Diggy includes a simple web client that uses Guzzle under the hood to download a page and return a NodeCollection object. However, you can use any web client you prefer and pass a DOMNode or DOMNodeList object to the NodeCollection constructor.

$client = new \Jerodev\Diggy\WebClient();
$page = $client->get('https://www.deviaene.eu/');

$socials = $page->first('#social')->querySelector('a span')->texts();

//    [
//        'GitHub',
//        'Twitter',
//        'Email',
//        'LinkedIn',
//    ]
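
If you bring your own HTTP client, the response body has to be parsed into a DOM object before Diggy can work with it. The following is a minimal sketch using only PHP's DOM extension. The markup is made up, and the final constructor call is left as a comment because the namespace of NodeCollection is not stated in this README.

```php
<?php

// Made-up markup standing in for a response body fetched with any HTTP client.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<body>
<ul id="social">
    <li><a href="#"><span>GitHub</span></a></li>
    <li><a href="#"><span>Email</span></a></li>
</ul>
</body>
</html>
HTML;

// Collect parser warnings internally instead of emitting them for imperfect markup.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

libxml_clear_errors();
libxml_use_internal_errors($previous);

// DOMDocument extends DOMNode, so it can be handed to the constructor
// described above (import NodeCollection from the Diggy package first):
// $page = new NodeCollection($dom);

$spans = $dom->getElementsByTagName('span');
echo $spans->length, PHP_EOL;
```

Suppressing libxml warnings is optional, but real-world pages are rarely valid HTML and would otherwise produce a warning for every parse error.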

Available functions

These are the available functions on a NodeCollection object. All functions that do not return a native value can be chained without worrying about whether the collection contains any nodes.
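
A chaining guarantee like this is what a null-object pattern provides. The sketch below illustrates that idea only and is not Diggy's actual implementation. The `Chainable`, `Found`, and `Nothing` names are hypothetical, and a nested array stands in for the DOM tree.

```php
<?php

// Hypothetical names throughout; this only illustrates the null-object idea.
interface Chainable
{
    public function first(string $key): Chainable;
    public function text(): ?string;
}

// Wraps a nested array standing in for a DOM tree.
final class Found implements Chainable
{
    public function __construct(private array $data) {}

    public function first(string $key): Chainable
    {
        // A missing key yields a Nothing object instead of null.
        return isset($this->data[$key]) && is_array($this->data[$key])
            ? new Found($this->data[$key])
            : new Nothing();
    }

    public function text(): ?string
    {
        return $this->data['text'] ?? null;
    }
}

// Every chainable call returns itself, every value call returns null.
final class Nothing implements Chainable
{
    public function first(string $key): Chainable
    {
        return $this;
    }

    public function text(): ?string
    {
        return null;
    }
}

$tree = new Found(['nav' => ['a' => ['text' => 'Home']]]);

$hit = $tree->first('nav')->first('a')->text();
$miss = $tree->first('footer')->first('a')->text();

var_dump($hit, $miss);
```

Because `Nothing` implements the same interface, the second chain runs to the end without a null check at each step and only the final native value signals that nothing was found.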

attribute(string $name)

Returns the value of the attribute of the first element in the collection if available.



count()

Returns the number of elements in the current node collection.


each(string $selector, closure $closure, ?int $max = null)

Loops over all DOM elements in the current collection and executes a closure for each element. The return value of this function is an array of the values returned from the closure.

$nodes->each('a', static function (NodeFilter $node) {
    return $node->attribute('href');
});

exists(?string $selector = null)

Indicates whether an element exists in the collection. If a selector is given, the current nodes will first be filtered.


filter(closure $closure)

Filters the current node collection based on a given closure.

$nodes->filter(static function (NodeFilter $node) {
    return $node->text() === 'foo';
});

first(?string $selector = null)

Returns the first element of the node collection. If a selector is given, the current nodes will first be filtered.


is(string $nodeName)

Indicates whether the first element in the current collection has the specified tag name.


last(?string $selector = null)

Returns the last element of the node collection. If a selector is given, the current nodes will first be filtered.



nodeName()

Returns the tag name of the first element in the current node collection.


nth(int $index, ?string $selector = null)

Returns the nth element of the node collection, starting at 0. If a selector is given, the current nodes will first be filtered.

$nodes->nth(1, 'a.active');

querySelector(string $selector)

Finds all elements in the current node collection matching this CSS query selector.


text(?string $selector = null)

Returns the inner text of the first element in the node collection. If a selector is given, the current nodes will first be filtered.



texts()

Returns an array containing the inner text of every root element in the collection.

$nodes->texts('nav > a');

whereHas(closure $closure)

Keeps only the nodes that contain child nodes satisfying the filter described by the closure.

$nodes->whereHas(static function (NodeFilter $node) {
    return $node->first('a[href]');
});

whereHasAttribute(string $key, ?string $value = null)

Filters the current node collection by the existence of a specific attribute. If a value is given, the collection is also filtered by the value of this attribute.


whereHasText(?string $value = null, bool $trim = true, bool $exact = false)

Filters the current node collection by the existence of inner text. Setting a value will also filter the nodes by the actual inner text based on $trim and $exact.

Option   Function
$trim    Indicates that the inner text should be trimmed before it is compared with $value.
$exact   Indicates that the inner text should match $value exactly.
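
How the two flags might combine can be sketched as a standalone predicate. `matchesText` is a hypothetical helper and not part of Diggy. Treating a non-exact match as a substring check is an assumption; the library may compare differently.

```php
<?php

// Hypothetical helper mirroring the whereHasText() parameters.
function matchesText(string $innerText, ?string $value = null, bool $trim = true, bool $exact = false): bool
{
    if ($trim) {
        $innerText = trim($innerText);
    }

    // Without a value, only the presence of text matters.
    if ($value === null) {
        return $innerText !== '';
    }

    // Assumption: a non-exact match means "contains".
    return $exact ? $innerText === $value : str_contains($innerText, $value);
}

var_dump(matchesText('  Hello world  ', 'Hello'));             // bool(true)
var_dump(matchesText('  Hello world  ', 'Hello', true, true)); // bool(false)
var_dump(matchesText('  Hello  ', 'Hello', false, true));      // bool(false)
var_dump(matchesText('   '));                                  // bool(false)
```

The third call shows why trimming defaults to on: surrounding whitespace from the markup would otherwise make an exact comparison fail.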

xPath(string $selector)

Finds all elements in the current node collection matching this XPath query selector.
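
For readers less familiar with XPath syntax, this is what such a selector looks like when run directly against PHP's DOM extension. The sketch only illustrates the expression format; how Diggy evaluates it internally is not covered here, and the markup is made up.

```php
<?php

$dom = new DOMDocument();
// Well-formed, made-up markup, so loadXML() is enough for the demonstration.
$dom->loadXML('<nav><a href="/">Home</a><a>Draft</a><a href="/about">About</a></nav>');

$xpath = new DOMXPath($dom);

// Select every <a> element that carries an href attribute.
$links = $xpath->query('//a[@href]');

$hrefs = [];
foreach ($links as $link) {
    $hrefs[] = $link->getAttribute('href');
}

print_r($hrefs);
```

With Diggy, the same expression string would be passed as `$nodes->xPath('//a[@href]')`.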
