Concept How To Scrape Dynamic Web Pages

This document discusses two methods for scraping dynamic and AJAX web pages. The first is to use a headless browser, which is slower but easier to implement. The second is to reverse engineer the undocumented API and call it directly, which is faster but requires more technical skill to discover the API endpoint and its structure. The document then outlines the steps for the second method: use the browser's developer tools to discover the endpoint and parameters, replicate the API call programmatically, parse the structured response, extract the desired data, and export the results.

Concept - Scraping dynamic / AJAX web pages

Two possible ways
1. Use a headless browser
- e.g. HtmlUnit for Java
- much slower, since the whole page is loaded and its JavaScript executed
- easier for the target site to detect

2. Reverse engineer the undocumented API and call it directly
- use the browser’s Developer Tools to find it
- very fast
- mostly returns already structured data (XML or JSON)


Concept - Steps
1. Open the page in your browser and find the API endpoint with the Developer Tools

2. Reverse engineer the API call (parameters, headers, cookies, etc.)

3. Replicate the API call with Unirest and parse the data (XML, JSON, sometimes HTML)

4. Extract the desired data

5. Export the results
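
Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the slides' own code. The slides name Unirest; to stay dependency-free, the sketch swaps in the JDK's built-in java.net.http.HttpClient (Java 11+). The endpoint URL, the two request headers, the "name" field, and the class and method names are all hypothetical placeholders for whatever the Developer Tools show for the real site. The regular expression stands in for a proper JSON parser and only handles plain string values without escape sequences.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ApiScrapeSketch {

    // Hypothetical endpoint: substitute the URL seen in the browser's Network tab.
    static final String ENDPOINT = "https://example.com/api/items?page=1";

    // Step 3a: rebuild the request the browser sent.
    // The headers here are assumptions; copy the ones the real call needs.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .header("X-Requested-With", "XMLHttpRequest")
                .GET()
                .build();
    }

    // Step 3b: send the request and return the raw response body.
    static String fetch(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    // Step 4: extract every "name" value from the JSON text.
    // A regex is a stand-in here; a real JSON library is more robust.
    static List<String> extractNames(String json) {
        Pattern pattern = Pattern.compile("\"name\"\\s*:\\s*\"([^\"\\\\]*)\"");
        Matcher matcher = pattern.matcher(json);
        List<String> names = new ArrayList<>();
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    // Step 5: export the results as a one-column CSV with quoted values.
    static String toCsv(List<String> names) {
        StringBuilder csv = new StringBuilder("name\n");
        for (String name : names) {
            csv.append('"').append(name.replace("\"", "\"\"")).append("\"\n");
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest(ENDPOINT);
        System.out.println(request.method() + " " + request.uri());

        // Made-up sample body so the sketch runs without network access;
        // against a real endpoint, use: String body = fetch(request);
        String body = "{\"items\":[{\"id\":1,\"name\":\"Alpha\"},{\"id\":2,\"name\":\"Beta\"}]}";
        System.out.print(toCsv(extractNames(body)));
    }
}
```

As written, main only builds the request and processes a made-up sample body, so it runs offline. With Unirest, buildRequest and fetch would collapse into a single fluent call, while the extract and export steps stay the same.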
