A command line scraping utility supporting CSS selectors or XPath

jakewarren/scrape

scrape

MIT License. PRs welcome.

A command line scraping utility inspired by scrape.

Features

  • Scrape using XPath or CSS selectors
  • Process HTML from a URL, STDIN, or a local file
  • Extract a particular attribute

Install

Option 1: Binary

Download the latest release from https://github.com/jakewarren/scrape/releases/latest

Option 2: From source

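go get github.com/jakewarren/scrape

On Go 1.18 or later, go get no longer builds or installs binaries; use go install instead:

go install github.com/jakewarren/scrape@latest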

Usage

Usage of scrape:
  -A, --agent string   user agent string (default "Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30)")
  -a, --attr string    attribute to scrape (default "html")
  -c, --css string     css selector
  -h, --help           usage information
  -k, --insecure       skip SSL verification
  -x, --xpath string   xpath query

Examples:

Read from URL:

❯ scrape -c "h4 a" -a href "https://www.webscraper.io/test-sites/e-commerce/allinone"
/test-sites/e-commerce/allinone/product/244
/test-sites/e-commerce/allinone/product/269
/test-sites/e-commerce/allinone/product/192
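
The hrefs printed above are relative to the site root. As a minimal sketch, assuming a POSIX shell with awk available, the output can be piped through a small helper to get full URLs. The name to_absolute is hypothetical and not part of scrape:

```shell
# to_absolute is a hypothetical helper, not part of scrape.
# It reads paths from STDIN and prefixes every root-relative one
# (a line starting with "/") with the origin given as the first argument.
to_absolute() {
  awk -v origin="$1" '/^\// { print origin $0; next } { print }'
}

# Sample input copied from the output above. In practice the input would come
# from scrape itself, e.g.:
#   scrape -c "h4 a" -a href "https://www.webscraper.io/test-sites/e-commerce/allinone" \
#     | to_absolute "https://www.webscraper.io"
printf '%s\n' "/test-sites/e-commerce/allinone/product/244" \
  | to_absolute "https://www.webscraper.io"
```

Lines that do not start with a slash, such as URLs that are already absolute, pass through unchanged.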

Read from STDIN:

❯ curl -A 'Mozilla/4.0 (Mozilla/4.0; MSIE 7.0; Windows NT 5.1; SV1; .NET CLR 3.0.04506.30)' -s "https://www.webscraper.io/test-sites/e-commerce/allinone" | scrape -x "//h4/a" -a href
/test-sites/e-commerce/allinone/product/223
/test-sites/e-commerce/allinone/product/280
/test-sites/e-commerce/allinone/product/278
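
Because scrape prints one result per line, its output composes with standard text tools. The sketch below, assuming awk and sort are available, keeps only the last path segment of each href and removes duplicates. The helper name product_ids is hypothetical and not part of scrape:

```shell
# product_ids is a hypothetical helper, not part of scrape.
# It keeps the last "/"-separated field of each line and removes duplicates.
product_ids() {
  awk -F/ '{ print $NF }' | sort -u
}

# Sample input copied from the output above. In practice, append
# "| product_ids" to the curl | scrape pipeline.
printf '%s\n' \
  "/test-sites/e-commerce/allinone/product/223" \
  "/test-sites/e-commerce/allinone/product/280" \
  "/test-sites/e-commerce/allinone/product/223" \
  | product_ids
```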

Read from file:

❯ scrape -x "//h4/a" /tmp/webscrapetest.html
<a href="/test-sites/e-commerce/allinone/product/223" class="title" title="Aspire E1-510">Aspire E1-510</a>
<a href="/test-sites/e-commerce/allinone/product/280" class="title" title="Lenovo V510 Black">Lenovo V510 Blac...</a>
<a href="/test-sites/e-commerce/allinone/product/278" class="title" title="Lenovo V510 Black">Lenovo V510 Blac...</a>