Amazon-scraper is a command line application to collect reviews and questions/answers from amazon products.
Read the documentation here: amazon-scraper on readthedocs
~ TODO
$ git clone https://github.com/picorana/amazon-scraper.git
$ cd amazon-scraper
$ pip install -r requirements.txt
$ python setup.py install
Run amazon-scraper
via command line by running
$ amazon-scraper [asin]
asin
is a unique identifier for a product on amazon. You can find it in the url:
A query to https://www.amazon.com/gp/product/B01H2E0J5M
would look like this:
$ amazon-scraper B01H2E0J5M
You can also insert multiple asins:
$ amazon-scraper B01H2E0J5M B01GYLZD8C B0736R3W1F
or load them from file:
$ amazon-scraper --file asins.txt
the file needs to have each asin on one line, like this:
B01H2E0J5M
B01GYLZD8C
B0736R3W1F
amazon-scraper
downloads pages, reviews, questions and answers.
It will save its output in folders:
pages
will contain the main pages of the product, useful for extracting more info about the product.
You can disable this function by using the option --save-pages
results
will contain the reviews, organized in json files.
You can disable scraping the reviews by using the option --no-reviews
questions
will contain the questions and answers, organized in json files.
You can disable scraping the questions by using the option --no-questions
positional arguments:
asin Amazon asin(s) to be scraped
optional arguments:
-h, --help show this help message and exit
--file FILE, -f FILE Specify path to list of asins
--save-main-pages, -p
Saves the main pages scraped
--verbose, -v Logging verbosity level
--no-reviews Do not scrape reviews
--no-questions Do not scrape questions
instagram-scraper has been used as a reference for the structure of the program.
this blogpost has been very useful in understanding the issues of building a scraper.