I was planning to add anime from a tracking website like LiveChart to my torrent rules list when I came across the idea of scraping the anime titles from the website instead. This is a simple Python notebook that uses BeautifulSoup for CSS selection and cloudscraper to bypass Cloudflare protection. The script scrapes anime titles from the website and saves them to a text file; a user-provided rules.json file is then modified accordingly. The functions can be adjusted to your liking, giving you a project that does all of this for you.
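At its core, the scraping step looks roughly like the sketch below. The URL and the CSS selector are illustrative assumptions; the notebook's actual values may differ.

```python
import cloudscraper
from bs4 import BeautifulSoup

# Session that transparently solves Cloudflare's browser checks
scraper = cloudscraper.create_scraper()

# Fetch a LiveChart page (illustrative URL; the notebook may target a different page)
response = scraper.get("https://www.livechart.me/schedule")
soup = BeautifulSoup(response.text, "html.parser")

# CSS-select the title elements; ".main-title" is a placeholder selector
titles = [el.get_text(strip=True) for el in soup.select(".main-title")]

# Write one title per line to a text file
with open("anime_titles.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(titles))
```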
- Dynamic Rule Writing: Users can define custom rules that the parser will apply during data processing, offering flexibility to cater to various data analysis needs.
- Complex Data Handling: Engineered to manage and parse complex data structures, including nested and multi-layered data.
- High Performance: Optimized to process large datasets quickly without compromising accuracy.
- User-Friendly Interface: Designed with usability in mind, making it accessible for both technical and non-technical users.
Ensure you have Python 3.6 or later installed on your system. This project also requires additional Python libraries (such as cloudscraper and BeautifulSoup), which can be installed using the first cell of this Jupyter Notebook.
- Clone the repository:

  ```bash
  git clone https://github.com/AmmarAhmedl200961/anime-list-parser.git
  cd anime-list-parser
  ```

- Install the required Python libraries by running the first cell of the Jupyter Notebook (see the sketch below).
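That first cell is most likely just a pip invocation along these lines; the exact package list is an assumption based on the libraries named above:

```python
# Install the scraping dependencies from inside the notebook
%pip install cloudscraper beautifulsoup4
```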
To start using the parser, navigate to the project directory and launch the Jupyter Notebook (see the example below). The notebook contains detailed instructions on how to load data, define rules, and apply them to the data.
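Assuming a standard Jupyter installation, launching the notebook looks like this:

```bash
cd anime-list-parser
jupyter notebook parser.ipynb
```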
This parser allows users to write custom rules in a simple and intuitive way. Rules can be defined directly within the script or loaded from an external file. Here's a basic example of how to define a rule:
```python
def custom_rule(data):
    # Your rule logic here
    modified_data = data  # e.g. filter or transform the scraped titles
    return modified_data
```

For more detailed instructions on writing and applying rules, refer to the parser.ipynb notebook.
It is recommended that you explore the notebook and the included rules.json (obfuscated) to understand the usage.
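For a rough idea of how the scraped titles might end up in the rules file, here is a hypothetical sketch. Because the bundled rules.json is obfuscated, the key names (e.g. `mustContain`, modelled on qBittorrent-style RSS rules) and the overall schema below are assumptions, not the notebook's exact format:

```python
import json

def custom_rule(rule, title):
    # Attach the anime title as the rule's match filter
    # ("mustContain" is an assumed key, not necessarily the project's schema)
    rule["mustContain"] = title
    return rule

# Titles produced by the scraping step
with open("anime_titles.txt", encoding="utf-8") as f:
    titles = [line.strip() for line in f if line.strip()]

# Load the user-provided rules file and use its first entry as a template
# (assumes rules.json is a dict of rule objects with at least one entry)
with open("rules.json", encoding="utf-8") as f:
    rules = json.load(f)
template = next(iter(rules.values()))

# Build one rule per scraped title and write the file back out
updated = {title: custom_rule(dict(template), title) for title in titles}
with open("rules.json", "w", encoding="utf-8") as f:
    json.dump(updated, f, indent=4)
```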
The script is designed to work with the LiveChart website and will not work with any other site; if LiveChart's structure changes, the scraper will need to be updated. If Cloudflare's protection changes in a way that cloudscraper cannot handle, the scraper will also break. Finally, the RSS fields need to be changed if you use a different torrent client; the script is designed so that you simply click Import in your RSS reader.
Contributions are welcome! If you have ideas for new features, improvements, or bug fixes, please feel free to fork the repository, make your changes, and submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
For any questions or feedback regarding this project, please contact me on Discord: marto90123