Debugging Spiders

This document explains the most common techniques for debugging spiders. Consider the following Scrapy spider:

import scrapy
from myproject.items import MyItem


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = (
        "http://example.com/page1",
        "http://example.com/page2",
    )

    def parse(self, response):
        # <processing code not shown>
        # collect `item_urls`
        for item_url in item_urls:
            yield scrapy.Request(item_url, self.parse_item)

    def parse_item(self, response):
        # <processing code not shown>
        item = MyItem()
        # populate `item` fields
        # and extract item_details_url
        yield scrapy.Request(
            item_details_url, self.parse_details, cb_kwargs={"item": item}
        )

    def parse_details(self, response, item):
        # populate more `item` fields
        return item

Basically this is a simple spider which parses two pages of items (the start_urls). Items also
have a details page with additional information, so we use the cb_kwargs functionality of
Request to pass a partially populated item.

Parse Command

The most basic way of checking the output of your spider is to use the parse command. It allows you to check the behaviour of different parts of the spider at the method level. It has the advantage of being flexible and simple to use, but does not allow debugging code inside a method.
In order to see the item scraped from a specific url:

$ scrapy parse --spider=myspider -c parse_item -d 2 <item_url>


[ ... scrapy log lines crawling example.com spider ... ]

>>> STATUS DEPTH LEVEL 2 <<<


# Scraped Items ------------------------------------------------------------
[{'url': <item_url>}]

# Requests -----------------------------------------------------------------
[]

Using the --verbose or -v option we can see the status at each depth level:

$ scrapy parse --spider=myspider -c parse_item -d 2 -v <item_url>


[ ... scrapy log lines crawling example.com spider ... ]

>>> DEPTH LEVEL: 1 <<<


# Scraped Items ------------------------------------------------------------
[]

# Requests -----------------------------------------------------------------
[<GET item_details_url>]

>>> DEPTH LEVEL: 2 <<<


# Scraped Items ------------------------------------------------------------
[{'url': <item_url>}]

# Requests -----------------------------------------------------------------
[]

Checking items scraped from a single start_url can also be easily achieved using:

$ scrapy parse --spider=myspider -d 3 'http://example.com/page1'
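Since parse_details expects an item passed through cb_kwargs, you may also want to feed callback keyword arguments from the command line when testing that callback in isolation. A minimal sketch, assuming the --cbkwargs option of the parse command (which takes a JSON dict) and a placeholder details URL:

$ scrapy parse --spider=myspider -c parse_details --cbkwargs='{"item": {"name": "test"}}' <item_details_url>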

Scrapy Shell

While the parse command is very useful for checking the behaviour of a spider, it is of little help for checking what happens inside a callback, besides showing the response received and the output. How do you debug the situation when parse_details sometimes receives no item?

Fortunately, the shell is your bread and butter in this case (see Invoking the shell from
spiders to inspect responses):

from scrapy.shell import inspect_response

def parse_details(self, response, item=None):
    if item:
        # populate more `item` fields
        return item
    else:
        inspect_response(response, self)

See also: Invoking the shell from spiders to inspect responses.
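When inspect_response() is reached, a Scrapy shell opens in the middle of the crawl with the current response bound to response; once you exit the shell, the crawl resumes. A minimal sketch of such a session (the URL and CSS selector are illustrative assumptions, not part of the example project):

>>> response.url
'http://example.com/item-details/42'
>>> response.css("h1::text").get()  # hypothetical selector: check why the expected data is missing
>>> # hit Ctrl-D (or Ctrl-Z in Windows) to exit the shell and resume the crawl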


Open in browser

Sometimes you just want to see how a certain response looks in a browser; you can use the open_in_browser() function for that:

scrapy.utils.response.open_in_browser(response: TextResponse, _openfunc: Callable[[str], Any] = <function open>) → Any

Open response in a local web browser, adjusting the base tag for external links to work,
e.g. so that images and styles are displayed.

For example:

from scrapy.utils.response import open_in_browser

def parse_details(self, response):
    if "item name" not in response.text:  # response.text is the decoded body (response.body is bytes)
        open_in_browser(response)
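The Scrapy shell offers a related shortcut: inside scrapy shell, view(response) opens the fetched response in your browser in the same way, so you can eyeball a page without writing any spider code. A quick sketch (the URL is just the example project's page):

$ scrapy shell 'http://example.com/page1'
>>> view(response)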

Logging

Logging is another useful option for getting information about your spider run. Although not
as convenient, it comes with the advantage that the logs will be available in all future runs
should they be necessary again:

def parse_details(self, response, item=None):
    if item:
        # populate more `item` fields
        return item
    else:
        self.logger.warning("No item received for %s", response.url)
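To keep such messages around for later runs, you can route the log to a file through Scrapy's standard logging settings. A minimal sketch using LOG_FILE and LOG_LEVEL (the file name is just an example):

# settings.py (or the spider's custom_settings)
LOG_FILE = "myspider.log"  # write the log to a file instead of stderr
LOG_LEVEL = "WARNING"      # record only warnings and above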

For more information, check the Logging section.

Visual Studio Code

To debug spiders with Visual Studio Code, you can use the following launch.json:

{
    "version": "0.1.0",
    "configurations": [
        {
            "name": "Python: Launch Scrapy Spider",
            "type": "python",
            "request": "launch",
            "module": "scrapy",
            "args": [
                "runspider",
                "${file}"
            ],
            "console": "integratedTerminal"
        }
    ]
}

Also, make sure you enable “User Uncaught Exceptions” to catch exceptions in your Scrapy spider.
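If you would rather debug a spider by name from the project root, instead of whichever file is currently open, a variant configuration could look like the sketch below; the spider name myspider and the workspace layout are assumptions based on the example above:

{
    "name": "Python: Scrapy crawl myspider",
    "type": "python",
    "request": "launch",
    "module": "scrapy",
    "args": ["crawl", "myspider"],
    "cwd": "${workspaceFolder}",
    "console": "integratedTerminal"
}

This entry goes inside the same "configurations" array as the one shown earlier.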
