privacy-tech-lab/gpc-web-crawler at V1.0.0

GPC Web Data and Scripts

Data and scripts for researching GPC on the web

Do Not Sell link crawler

Firefox Analysis Mode crawler

The Firefox Analysis Mode crawler automates the Analysis Mode of OptMeowt. It runs OptMeowt's Analysis Mode in Firefox on every site listed in the input CSV file. The crawler is implemented with Puppeteer.

Development

  1. Clone this repo locally or download a zipped copy and unzip it.
  2. Ensure that you have node and npm installed.
  3. Navigate to the root directory of the Firefox Analysis Mode crawler in a terminal by running:
     cd Firefox-analysis-mode-crawler
  4. Open sites.csv and enter the links you want to analyze in the first column. (Some examples are included in the file.)
  5. Install the dependencies by running:
     PUPPETEER_PRODUCT=firefox npm install
  6. To start the crawler, run:
     node crawler.js
  7. The Firefox Nightly browser will be launched. Within about one minute, before page navigation starts, load the OptMeowt extension from source. Then open the extension popup, click 'More' in the upper-right corner to open the Settings page, and switch to Analysis Mode.
  8. After the terminal prints "ALL TESTING DONE", navigate to the Settings page and click 'Export Analysis Data'.

NOTE:
  1. Once page navigation starts, keep the Firefox Nightly browser on the site being tested. Do not open or navigate to other pages, or the crawler will not work.
  2. Killing the crawler before all testing is done will result in the loss of all analysis data.
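
As an illustration of the sites.csv step above, here is a minimal sketch of how the links in the first column could be collected in Node. It is not taken from crawler.js; the function names parseFirstColumn and readSites are hypothetical, and the sketch assumes a simple CSV in which cells do not contain commas.

```javascript
// Hypothetical helper, NOT taken from crawler.js: collects the URLs found in
// the first column of a sites.csv-style file.
// Assumption: simple CSV, no commas inside cells.
const fs = require('fs');

function parseFirstColumn(csvText) {
  return csvText
    .split(/\r?\n/)                                 // accept LF and CRLF line endings
    .map((line) => line.split(',')[0].trim())       // keep only the first column
    .map((cell) => cell.replace(/^"(.*)"$/, '$1'))  // drop optional surrounding quotes
    .filter((cell) => /^https?:\/\//i.test(cell));  // skip blank lines, headers, non-URLs
}

// Reads the file at `path` (for example 'sites.csv') and returns its URLs.
function readSites(path) {
  return parseFirstColumn(fs.readFileSync(path, 'utf8'));
}
```

Calling readSites('sites.csv') from the crawler's root directory would return an array of URLs to visit in order. Rows whose first cell does not start with http:// or https:// are skipped, so a header row or a stray blank line would not be treated as a site.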
