scrape

A command-line scraping utility inspired by scrape.

Features

Scrape using XPath or CSS selectors
Process HTML from a URL, STDIN, or a local file
Extract a particular attribute

Install

Option 1: Binary

Download the latest release from https://github.com/jakewarren/scrape/releases/latest

Option 2: From source

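go get github.com/jakewarren/scrape

With Go 1.18 or newer, go get no longer builds and installs binaries; use go install instead:

go install github.com/jakewarren/scrape@latest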

Usage

Usage of scrape:
  -A, --agent string   user agent string (default "Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30)")
  -a, --attr string    attribute to scrape (default "html")
  -c, --css string     css selector
  -h, --help           usage information
  -k, --insecure       skip SSL verification
  -x, --xpath string   xpath query

Examples:

Read from URL:

❯ scrape -c "h4 a" -a href "https://www.webscraper.io/test-sites/e-commerce/allinone"
/test-sites/e-commerce/allinone/product/244
/test-sites/e-commerce/allinone/product/269
/test-sites/e-commerce/allinone/product/192

Read from STDIN:

❯ curl -A 'Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30)' -s "https://www.webscraper.io/test-sites/e-commerce/allinone" | scrape -x "//h4/a" -a href
/test-sites/e-commerce/allinone/product/223
/test-sites/e-commerce/allinone/product/280
/test-sites/e-commerce/allinone/product/278

Read from file:

❯ scrape -x "//h4/a" /tmp/webscrapetest.html
<a href="/test-sites/e-commerce/allinone/product/223" class="title" title="Aspire E1-510">Aspire E1-510</a>
<a href="/test-sites/e-commerce/allinone/product/280" class="title" title="Lenovo V510 Black">Lenovo V510 Blac...</a>
<a href="/test-sites/e-commerce/allinone/product/278" class="title" title="Lenovo V510 Black">Lenovo V510 Blac...</a>
