Welcome to Maté!
Maté is an open-science, modular Python framework designed to streamline and simplify the development and management of machine learning projects. It was developed to address the reproducibility crisis in artificial intelligence research by promoting open science and accessible AI.
The framework is built around the software engineering best practices of modularity and separation of concerns, encouraging quality coding, collaboration, and the sharing of models, trainers, data loaders, and knowledge. This modular design simplifies the development and maintenance of machine learning models, leading to an improved developer experience.
With Maté, you can easily install the source code of open-source projects and adhere to modularity and separation of concerns, making your models and modules shareable out of the box. This means you can collaborate more effectively with others and easily share your work.
Thank you for choosing Maté; we can't wait to see the amazing machine learning projects you'll create with it!
- Seamless integration with any Python library, such as PyTorch/Lightning, TensorFlow/Keras, JAX/Flax, and Hugging Face Transformers.
- Easy-to-use interface for adding the source code of models, trainers, and data loaders to your projects.
- Full customizability and reproducibility of results through the inclusion of dependencies in your project.
- Modular project structure that enforces a clean and organized codebase.
- Fully compatible with Python: no need to use mate commands to run your experiments.
- Convenient environment management through the Maté Environment API.
- Support for pip and conda for dependency management.
- Works with Colab out of the box.
- Installation
- Quick Start
- Project Structure
- Example Projects
- Documentation
- Contribution
- Guides
- Open Science
Install Maté with pip:

```sh
pip install yerbamate
```
To initialize a new project (here called deepnet), run:

```sh
mate init deepnet
```
This will generate the following empty project structure:
```
/
|-- models/
|   |-- __init__.py
|-- experiments/
|   |-- __init__.py
|-- trainers/
|   |-- __init__.py
|-- data/
|   |-- __init__.py
```
To install an experiment, you can use `mate install` to install a module and its dependencies from a GitHub repository. See the docs for more details.
```sh
# Short version of GitHub URL https://github.com/oalee/big_transfer/tree/master/big_transfer/experiments/bit
mate install oalee/big_transfer/experiments/bit -yo pip

# Short version of GitHub URL https://github.com/oalee/deep-vision/tree/main/deepnet/experiments/resnet
mate install oalee/deep-vision/deepnet/experiments/resnet -yo pip
```
You can also install independent modules such as models, trainers, and data loaders from GitHub projects that follow the independent modular project structure:
```sh
mate install oalee/lightweight-gan/lgan/trainers/lgan
mate install oalee/big_transfer/models/bit -yo pip
mate install oalee/deep-vision/deepnet/models/vit_pytorch -yo pip
mate install oalee/deep-vision/deepnet/trainers/classification -yo pip
```
Set up your environment before running your experiments. You can do this through your shell or with an env.json file in the root of your project. The Maté API requires results to be set in the environment. For more information, see the docs.
In your shell:

```sh
DATA_PATH=/path/to/data
results=/path/to/results
```
Or in env.json:

```json
{
  "DATA_PATH": "/path/to/data",
  "results": "/path/to/results"
}
```
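These variables can then be read inside your experiments through the Environment API. Here is a minimal sketch, assuming dict-style access (see the docs for the full API):

```python
from yerbamate import Environment

# Assumed behavior: Environment loads variables from the shell
# environment or from the env.json file in the project root.
env = Environment()

data_path = env["DATA_PATH"]   # /path/to/data
results = env["results"]       # /path/to/results
```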
To train a model, you can use the `mate train` command, which runs the specified experiment. For example, to train an experiment called learn in the bit module, use the following command:
```sh
mate train bit learn
# or alternatively use python
python -m deepnet.experiments.bit.learn train
```
Deep learning projects can be organized into the following structure with modularity and separation of concerns in mind. This offers a clean and organized codebase that is easy to maintain and is shareable out of the box. The modular structure involves organizing the project directory as a hierarchical tree, with an arbitrary name given to the root project directory by the user. The project is then broken down into distinct concerns such as models, data, trainers, experiments, analyzers, and simulators, each with its own subdirectory. Within each concern, modules can be defined in their own subdirectories, such as specific models, trainers, data loaders, data augmentations, or loss functions.
```
└── project_name
    ├── data
    │   └── my_independent_data_loader
    │       └── __init__.py
    ├── experiments
    │   └── my_awesome_experiment
    │       └── __init__.py
    ├── __init__.py
    ├── models
    │   └── awesomenet
    │       └── __init__.py
    └── trainers
        └── big_brain_trainer
            └── __init__.py
```
Modularity is a software design principle that focuses on creating self-contained, reusable, and interchangeable components. In the context of a deep learning project, modularity means creating three independent, standalone modules for models, trainers, and data. This allows for a more organized, maintainable, and shareable project structure. The fourth module, experiments, is not independent; rather, it combines the other three to create a complete experiment.
Yerbamate prioritizes organizing the project into independent modules when applicable. Independent modules depend only on Python packages (such as NumPy, PyTorch, TensorFlow, or Hugging Face), and all imports within the module are relative. This makes the module self-contained and reusable in any project once its Python dependencies are installed.
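As a minimal sketch of this idea (the file and class names here are hypothetical, not from an actual project), an independent model module keeps all internal imports relative and depends only on pip-installable packages:

```python
# models/awesomenet/net.py -- depends only on the pip-installable torch package
import torch.nn as nn


class AwesomeNet(nn.Module):
    """A tiny classifier, used only to illustrate an independent module."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.classifier = nn.LazyLinear(num_classes)

    def forward(self, x):
        # Flatten everything after the batch dimension and classify
        return self.classifier(x.flatten(1))


# models/awesomenet/__init__.py -- the relative import references nothing
# outside the module, so the awesomenet/ directory can be reused anywhere:
# from .net import AwesomeNet
```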
In some cases, a combination of independent modules may be necessary for a particular concern. An example of this is the experiment concern, which imports and combines models, data, and trainers to define and create a specific experiment. In such cases, the module is not independent and is designed to combine the previously defined independent modules. In the case of non-independent modules, Yerbamate creates a dependency list of independent modules that can be used to install the code and Python dependencies. This ensures that the necessary modules are installed, and that the code can be run without issue.
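The exact schema of this dependency list is described in the docs; purely as a hypothetical illustration, it records the GitHub locations of the independent modules an experiment imports, along the lines of:

```json
{
  "deps": [
    "https://github.com/oalee/big_transfer/tree/master/big_transfer/models/bit",
    "https://github.com/oalee/deep-vision/tree/main/deepnet/trainers/classification"
  ]
}
```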
The example project structure below highlights this modularity and separation of concerns: the models, data, and trainers modules are independent and can be used in any project, while the experiments module is not independent but rather combines the three modules together to create a complete experiment.
```
.
├── mate.json
└── deepnet
    ├── data
    │   ├── bit
    │   │   ├── fewshot.py
    │   │   ├── __init__.py
    │   │   ├── minibatch_fewshot.py
    │   │   ├── requirements.txt
    │   │   └── transforms.py
    │   └── __init__.py
    ├── experiments
    │   ├── bit
    │   │   ├── aug.py
    │   │   ├── dependencies.json
    │   │   ├── __init__.py
    │   │   ├── learn.py
    │   │   └── requirements.txt
    │   └── __init__.py
    ├── __init__.py
    ├── models
    │   ├── bit_torch
    │   │   ├── downloader
    │   │   │   ├── downloader.py
    │   │   │   ├── __init__.py
    │   │   │   ├── requirements.txt
    │   │   │   └── utils.py
    │   │   ├── __init__.py
    │   │   ├── models.py
    │   │   └── requirements.txt
    │   └── __init__.py
    └── trainers
        ├── bit_torch
        │   ├── __init__.py
        │   ├── lbtoolbox.py
        │   ├── logger.py
        │   ├── lr_schduler.py
        │   ├── requirements.txt
        │   └── trainer.py
        └── __init__.py
```
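To make the role of the experiments concern concrete, here is a minimal, hypothetical sketch of what such a module can look like. The imported names, the trainer interface, and the `env.train` action flag are all illustrative assumptions, not the actual bit code:

```python
# deepnet/experiments/my_awesome_experiment/__init__.py
# The experiment is the only non-independent module: it reaches across
# concerns with relative imports to wire a model, data, and a trainer together.
from ...models.awesomenet import AwesomeNet
from ...data.my_independent_data_loader import load_data
from ...trainers.big_brain_trainer import Trainer

from yerbamate import Environment

env = Environment()  # reads DATA_PATH, results, and the requested action

model = AwesomeNet(num_classes=10)
train_loader, val_loader = load_data(env["DATA_PATH"])
trainer = Trainer(model, save_dir=env["results"])

# `mate train <module> <experiment>` (or `python -m ... train`) selects the
# action to run; assumed here to be exposed by the Environment API as env.train.
if env.train:
    trainer.fit(train_loader, val_loader)
```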
Please check out the transfer learning, vision models, and lightweight GAN example projects.
Please check out the documentation.
For more information on modularity, please check out this guide.
For running experiments on Google Colab, please check out this example.
We welcome contributions from the community! Please check out our contributing guide for more information on how to get started.
For questions, please contact:
oalee(at)proton.me
As an open science work, Yerbamate strives to promote the principles of transparency and collaboration. To this end, the history of the LaTeX files for this work is available on GitHub. These open science repositories are open to collaboration and encourage participation from the community to enhance the validity, reproducibility, accessibility, and quality of this work.