Program to collect information about games available on Google Play Store
- go to a location of your choice: `my_location`
- clone the code:
  - current_folder: `my_location`
  - execute_command: `git clone git@github.com:Gulats/store_scrapper.git`
- create virtualenv for store_scrapper:
  - current_folder: `my_location`
  - execute_commands:
    - `pip install virtualenv` (if virtualenv is not already installed)
    - `virtualenv -p <path-to-python3.7> store_scrapper` (creates a virtualenv for the project)
- entering virtualenv:
  - current_folder: `my_location`
  - execute_commands:
    - `cd store_scrapper` (change directory location)
    - `source bin/activate` (or the equivalent Windows activate command)
- installing dependencies:
  - current_folder: `store_scrapper`
  - execute_command: `pip install -r requirements.txt` (installs all required dependencies)
- starting program:
  - current_folder: `store_scrapper`
  - execute_command: `python play_server.py` (start execution)
- interacting with program:
  - install Postman on desktop [https://www.getpostman.com/downloads/] or add plugin to Chrome [https://chrome.google.com/webstore/detail/postman/fhbjgbiflinjbdggehcddcbncdddomop?hl=en]
  - open Postman, click on import, and choose the `store_scrapper/store_scrapper_postman.json` file.
  - use the following APIs for the listed tasks:
    - see all active managers:
      - Collection: `GET View`
      - Path: `/view`
    - start a new manager:
      - Collection: `POST Start`
      - Path: `/start`
    - peek an existing manager:
      - Collection: `GET Peek`
      - Path: `/peek?pid=<pid>`
    - flush records of an existing manager: (NOT YET IMPLEMENTED)
      - Collection: `POST Flush`
      - Path: `/flush?pid=<pid>&show_records=<bool>`
    - stop an existing manager:
      - Collection: `POST Stop`
      - Path: `/stop?pid=<pid>&show_records=<bool>`
  - additional APIs for basic testing:
    - get detail by app_id:
      - Collection: `GET Detail`
      - Path: `/detail?app_id=<app_id>`
    - get apps by collection and category:
      - Collection: `GET Collection`
      - Path: `/collection?catg_id=<catg>&coln_id=<coln>&page=<page>&results=<page_size>`
    - get apps similar to a given app:
      - Collection: `GET Similar`
      - Path: `/similar?app_id=<app_id>`
- stopping program:
  - execute_command: press `ctrl+c` (stop execution)
    - The log file is generated at `log/play_server_<timestamp>.log`.
    - On pressing `ctrl+c`, execution stops and the program attempts to gracefully shut down active managers (if not previously stopped using the REST API).
    - NOTE: Press CTRL+C ONLY ONCE, otherwise the data dump will fail.
    - The data for each running manager is written at `opt/sweeper_<timestamp>_<manager_id>.json`.
- exiting virtualenv:
  - current_folder: `store_scrapper`
  - execute_command: `deactivate` (or the equivalent Windows deactivate command)
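
The REST endpoints listed above can also be called without Postman. The following is a minimal sketch using only the Python standard library. The host and port in `BASE_URL` are placeholders (this README does not state them), and the helper names `build_url` and `call` are made up for illustration. The response format is not documented here, so the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: placeholder host and port. Take the real values from
# store_scrapper/store_scrapper_postman.json or the server log.
BASE_URL = "http://localhost:8080"


def build_url(path, **params):
    """Join BASE_URL, an endpoint path, and any non-None query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def call(method, path, **params):
    """Send one request and return the raw response body as text."""
    request = Request(build_url(path, **params), method=method)
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server running, `call("GET", "/view")` would list active managers and `call("GET", "/peek", pid=<pid>)` would peek at one of them. How the server expects `<bool>` values such as `show_records` to be spelled is not specified here, so pass them as strings once you have confirmed the accepted form.
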
- use separate event loops for crawling and server interaction
- implement retry on failures for requests and file writes
- gracefully shut down processes before exit
- investigate TODO issues mentioned inline in code
- use asyncio.shield to protect the important tasks (https://stackoverflow.com/a/52511210/6687477)
- implement flush feature
- implement search feature
- implement previous result aggregation feature
- implement manager for collecting details of aggregated records
- experiment with `hl` and `gl` query params to gather more records
- seems like not all games return records for `/similar`, e.g.: 'io.flowlab.TinyDictator995265', 'com.amelosinteractive.snake'
- `def func(key_arg=<default-value>)` does not populate `key_arg` with `<default-value>` when called as `func(key_arg=None)`. fix this logic in the entire code
- make status an enum and map the fields according to status
- leverage subclasses for properly distinguishing manager type
- fix the following error during program exit when `DETAILS` manager is run:
  `/Users/bhgulati/Documents/harpoon/store_scrapper/lib/python3.7/site-packages/aiohttp/web.py:419: RuntimeWarning: coroutine 'PlayManager._shutdown' was never awaited _cancel_all_tasks(loop)`
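
The keyword-default item in the list above can be illustrated with a small sketch. A default in the signature applies only when the argument is omitted, so an explicit `None` passes straight through. One common fix is to use `None` as the sentinel and resolve the real default inside the function body. The names `DEFAULT_PAGE_SIZE`, `fetch_naive`, and `fetch_fixed` are hypothetical and do not come from the project code.

```python
DEFAULT_PAGE_SIZE = 20  # hypothetical default, for illustration only


def fetch_naive(page_size=DEFAULT_PAGE_SIZE):
    # The signature default is skipped when the caller passes None
    # explicitly, so None leaks into the function body.
    return page_size


def fetch_fixed(page_size=None):
    # Treat None as "not provided" and resolve the default here, so
    # func() and func(page_size=None) behave the same way.
    if page_size is None:
        page_size = DEFAULT_PAGE_SIZE
    return page_size
```

Checking `is None` rather than truthiness keeps legitimate falsy values such as `0` or an empty string intact.
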