
Web Scraping and POST Requests - Basic Interview Questions

Web Scraping – General


Q1. What is web scraping?

• Extracting data from websites by programmatically accessing and parsing the content.

Q2. What are some common libraries used for web scraping in
Python?

• requests, BeautifulSoup, lxml, Scrapy, Selenium, Playwright

Q3. What is the difference between requests.get() and requests.post()?

• GET: Used to retrieve data.


• POST: Used to send data to a server and often returns a result
based on the posted data.
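
A quick sketch with requests, using a hypothetical search endpoint:

import requests

# GET: parameters travel in the URL query string
r1 = requests.get("https://example.com/search", params={"q": "laptops"})

# POST: data travels in the request body
r2 = requests.post("https://example.com/search", data={"q": "laptops"})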

Q4. When would you use a POST request instead of GET in scraping?

• When data is submitted via a form or the site uses POST to generate content dynamically.

Q5. What are some challenges you might face while scraping a
website?

• JavaScript-rendered content, rate limiting / CAPTCHAs, changing site structure, and legal and ethical concerns.

Q6. How can you handle pagination while scraping?

• Inspect the URL or POST parameters to identify pagination mechanisms (e.g., page number, offset).
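
A minimal pagination loop, assuming a hypothetical site that paginates via a ?page= query parameter:

import requests

pages = []
for page in range(1, 6):
    response = requests.get("https://example.com/products", params={"page": page})
    pages.append(response.text)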

Q7. What are headers and why are they important in HTTP requests?

• Headers (like User-Agent) mimic browser behavior, manage cookies, or provide authentication.
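
A short example of sending custom headers with requests; the header values here are illustrative and would normally be copied from browser developer tools:

import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept-Language": "en-US,en;q=0.9",
}
response = requests.get("https://example.com", headers=headers)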

Practical POST Request Scraping


Q8. How do you simulate form submission using requests.post()?

import requests

url = "https://example.com/form"

# Form fields are sent in the request body, URL-encoded by default
data = {
    "username": "myuser",
    "password": "mypass",
}
response = requests.post(url, data=data)

Q9. What is the purpose of inspecting network traffic in developer tools when scraping?

• To understand how data is sent or received, especially to find API endpoints and POST payloads.

Q10. How do you handle sessions and cookies while scraping?

import requests

s = requests.Session()  # persists cookies and headers across requests

# Hypothetical login credentials for illustration
login_data = {"username": "myuser", "password": "mypass"}

s.get("https://example.com/login")  # pick up any session cookies
s.post("https://example.com/auth", data=login_data)

Q11. How can you deal with JavaScript-rendered content when scraping?

• Use tools like Selenium or Playwright, or inspect network activity to find backend APIs.
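
A minimal sketch with Playwright's sync API, assuming Playwright and its browser binaries are installed:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()  # headless by default
    page = browser.new_page()
    page.goto("https://example.com")
    html = page.content()  # rendered HTML, after JavaScript executes
    browser.close()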

Q12. What is a headless browser and why is it useful for scraping?

• A browser without a GUI. Useful for automation and for scraping JS-heavy pages with tools like Selenium or Playwright.
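
A short headless-Chrome sketch with Selenium, assuming Selenium 4 and Chrome are installed:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)
driver.get("https://example.com")
html = driver.page_source  # page HTML after scripts have run
driver.quit()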

Q13. What is the role of robots.txt in web scraping?

• It’s a site’s guideline for bots. Ethical scrapers respect it, though
it’s not enforced technically.
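
Python's standard library can check robots.txt rules; a small sketch with a hypothetical user agent string:

from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()
print(rp.can_fetch("MyScraper", "https://example.com/some/page"))  # True if allowed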

Q14. How can you avoid being blocked while scraping?

• Use proxies, rotate user agents, delay requests, use headless browsers,
and respect rate limits.
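
A simple sketch combining rotating user agents with randomized delays; the User-Agent strings are illustrative placeholders:

import random
import time

import requests

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

for page in range(1, 4):
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    requests.get(f"https://example.com/items?page={page}", headers=headers)
    time.sleep(random.uniform(1, 3))  # polite, randomized delay between requests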

Q15. How can you extract data from HTML using BeautifulSoup?

• Use methods like .find(), .find_all(), .select(), or .get_text().
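
A self-contained BeautifulSoup example on a tiny inline HTML snippet:

from bs4 import BeautifulSoup

html = "<html><body><h1>Title</h1><p class='intro'>Hello</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.find("h1").get_text())            # Title
print(soup.find_all("p"))                    # list of all <p> tags
print(soup.select("p.intro")[0].get_text())  # CSS selector -> Hello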

Q16. How can you debug failed POST requests?

• Check payload structure, headers, cookies, status codes, and network traffic in browser dev tools.
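
A short debugging pass over a failed POST, using a hypothetical endpoint:

import requests

response = requests.post("https://example.com/api", data={"q": "test"})

print(response.status_code)                  # e.g., 200, 403, 422
print(response.headers.get("Content-Type"))  # HTML error page vs. JSON?
print(response.text[:500])                   # server messages often explain the failure
response.raise_for_status()                  # raises requests.HTTPError on 4xx/5xx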

Q17. What is a session in requests, and why is it useful?

• A requests.Session() object persists cookies and headers, making it easier to manage authenticated sessions.

Q18. What HTTP status codes are relevant when scraping?

• 200 OK, 403 Forbidden, 404 Not Found, 429 Too Many Requests,
301/302 Redirect
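
A sketch of reacting to these codes with requests; the retry logic is deliberately minimal:

import time

import requests

response = requests.get("https://example.com/data")

if response.status_code == 429:
    # Too Many Requests: back off before retrying
    # (Retry-After may also be an HTTP date; a plain number is assumed here)
    time.sleep(int(response.headers.get("Retry-After", 5)))
    response = requests.get("https://example.com/data")
elif response.status_code in (403, 404):
    print("Blocked or missing page:", response.status_code)

Note that requests follows 301/302 redirects automatically unless allow_redirects=False is passed.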
