step 1: Fetch the data from Klook with a scraper and upload it to MongoDB Atlas.
activities
- Schema:
{
'activity_id': string,
'review_star': float,
'title': string,
'update_ts': datetime
}
- The activity list is JS-rendered, so I requested the underlying API directly to extract the data, setting the language to 'zh-TW' (default = en-US) and the currency to 'NTD' (default = HKD) in the request headers.
- Transformed the data from JSON to a DataFrame and added the current datetime as an 'update_ts' column for filtering in the next step.
- Uploaded to MongoDB (collection name = 'activity'); see the sketch after this list.
**Note: The MongoDB connection config should be masked for database security (e.g., stored in AWS Secrets Manager).**
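A minimal sketch of this step, assuming a hypothetical activity-list endpoint, header names, and response shape (the real ones come from inspecting the site's network requests); the MongoDB URI and database name are placeholders:

```python
import datetime as dt

import pandas as pd
import requests
from pymongo import MongoClient

MONGO_URI = "mongodb+srv://<user>:<password>@<cluster>/"       # placeholder, keep out of source control
ACTIVITY_API = "https://www.klook.com/v1/example/activities"   # hypothetical endpoint

headers = {
    "Accept-Language": "zh-TW",  # default = en-US
    "Currency": "NTD",           # default = HKD (header name is an assumption)
    "User-Agent": "Mozilla/5.0",
}

resp = requests.get(ACTIVITY_API, headers=headers, timeout=10)
resp.raise_for_status()
items = resp.json()["result"]["activities"]  # assumed response shape

# JSON -> DataFrame: keep the schema fields and stamp the fetch time for later filtering.
df = pd.DataFrame(items)[["activity_id", "review_star", "title"]]
df["update_ts"] = dt.datetime.now(dt.timezone.utc)

client = MongoClient(MONGO_URI)
client["klook"]["activity"].insert_many(df.to_dict("records"))
```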
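For the note above, one way to keep the connection string out of the code is to load it from AWS Secrets Manager at runtime; a short sketch with a hypothetical secret name and payload:

```python
import json

import boto3

# Hypothetical secret name; the secret is assumed to store {"MONGO_URI": "..."}.
secret = boto3.client("secretsmanager", region_name="ap-northeast-1").get_secret_value(
    SecretId="klook-scraper/mongodb"
)
MONGO_URI = json.loads(secret["SecretString"])["MONGO_URI"]
```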
reviews
- Schema:
{
'activity_id': string,
'id': string,
'author_id': string,
'content': string,
'rating': int,
'update_ts': datetime
}
- Requested the review API for each activity_id, with the limit set to 100; see the sketch after this list.
- Uploaded to MongoDB (collection name = 'review').
Note: This part can get IP-banned if too many requests are sent. Scrapy could be used instead to rotate IPs while scraping.
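A minimal sketch of the review fetch, assuming a hypothetical review endpoint that takes the activity_id and a limit parameter, plus the same placeholder MongoDB URI as above; the crude sleep stands in for the Scrapy/rotating-IP approach mentioned in the note:

```python
import datetime as dt
import time

import requests
from pymongo import MongoClient

MONGO_URI = "mongodb+srv://<user>:<password>@<cluster>/"  # placeholder
REVIEW_API = "https://www.klook.com/v1/example/activities/{activity_id}/reviews"  # hypothetical

client = MongoClient(MONGO_URI)
db = client["klook"]

for doc in db["activity"].find({}, {"activity_id": 1}):
    activity_id = doc["activity_id"]
    resp = requests.get(
        REVIEW_API.format(activity_id=activity_id),
        params={"limit": 100},
        headers={"Accept-Language": "zh-TW", "User-Agent": "Mozilla/5.0"},
        timeout=10,
    )
    resp.raise_for_status()

    now = dt.datetime.now(dt.timezone.utc)
    reviews = [
        {
            "activity_id": activity_id,
            "id": r["id"],
            "author_id": r["author_id"],
            "content": r["content"],
            "rating": r["rating"],
            "update_ts": now,
        }
        for r in resp.json()["result"]["reviews"]  # assumed response shape
    ]
    if reviews:
        db["review"].insert_many(reviews)

    time.sleep(2)  # naive throttling to reduce the chance of an IP ban
```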
step 2:
- Fetched the data from MongoDB.
- Filtered the data on 'review_star' and 'update_ts'.
- Exported the result to a CSV file (see the sketch below).
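A minimal sketch of step 2, with example thresholds (review_star >= 4.0, records updated within the last day) standing in for whatever a given run actually uses; the URI, database name, and output path are placeholders:

```python
import datetime as dt

import pandas as pd
from pymongo import MongoClient

MONGO_URI = "mongodb+srv://<user>:<password>@<cluster>/"  # placeholder

client = MongoClient(MONGO_URI)
since = dt.datetime.now(dt.timezone.utc) - dt.timedelta(days=1)

# Example filter: well-rated activities touched by the latest run.
query = {"review_star": {"$gte": 4.0}, "update_ts": {"$gte": since}}
df = pd.DataFrame(client["klook"]["activity"].find(query, {"_id": 0}))

df.to_csv("activities_filtered.csv", index=False)
```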