ilex-paraguariensis/yerbamate: a framework-agnostic deep learning package and experiment manager
Maté 🧉: your modular AI project and experiment manager

Welcome to Maté! 🎉

Maté is an open-science, modular Python framework designed to streamline the development and management of machine learning projects. It was created to address the reproducibility crisis in artificial intelligence research by promoting open science and accessible AI.

The framework is built around the software engineering practices of modularity and separation of concerns, encouraging quality code, collaboration, and the sharing of models, trainers, data loaders, and knowledge. This modular design simplifies the development and maintenance of machine learning models and improves the developer experience.

With Maté, you can install the source code of open-source projects directly into your own, and by adhering to modularity and separation of concerns, your models and modules become sharable out of the box. This means you can collaborate more effectively with others and easily share your work.

Thank you for choosing Maté. We can't wait to see the amazing machine learning projects you'll create with it!

Features 🎉

  • Seamless integration with any Python library, such as PyTorch/Lightning, TensorFlow/Keras, JAX/Flax, and Hugging Face Transformers.
  • Easy-to-use interface for adding the source code of models, trainers, and data loaders to your projects.
  • Full customizability and reproducibility of results through the inclusion of dependencies in your project.
  • Modular project structure that enforces a clean and organized codebase.
  • Fully compatible with plain Python: no need to use mate commands to run your experiments.
  • Convenient environment management through the Maté Environment API.
  • Support for pip and conda for dependency management.
  • Works with Google Colab out of the box.

Installation 🔌

pip install yerbamate

Quick Start ⚡

Initialize a project

mate init deepnet

This will generate the following empty project structure:

/
|-- models/
|   |-- __init__.py
|-- experiments/
|   |-- __init__.py
|-- trainers/
|   |-- __init__.py
|-- data/
|   |-- __init__.py

Install an experiment

To install an experiment, use mate install to fetch a module and its dependencies from a GitHub repository. See the docs for more details.

# Short version of GitHub URL https://github.com/oalee/big_transfer/tree/master/big_transfer/experiments/bit
mate install oalee/big_transfer/experiments/bit -yo pip

# Short version of GitHub URL https://github.com/oalee/deep-vision/tree/main/deepnet/experiments/resnet
mate install oalee/deep-vision/deepnet/experiments/resnet -yo pip

Install a module

You can install independent modules such as models, trainers, and data loaders from GitHub projects that follow the independent modular project structure.

mate install oalee/lightweight-gan/lgan/trainers/lgan 
mate install oalee/big_transfer/models/bit -yo pip
mate install oalee/deep-vision/deepnet/models/vit_pytorch -yo pip
mate install oalee/deep-vision/deepnet/trainers/classification -yo pip

Setting up the environment

Set up your environment before running your experiments. You can do this either with shell environment variables or with an env.json file in the root of your project. The Maté API requires results to be set in the environment. For more information, see the docs.

Using shell variables:

DATA_PATH=/path/to/data
results=/path/to/results

Or equivalently, in env.json:

{
    "DATA_PATH": "/path/to/data",
    "results": "/path/to/results"
}
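To see how such settings can be resolved at runtime, here is a minimal hand-rolled sketch of the idea; the load_env helper is hypothetical and not part of yerbamate, which provides this behavior through its Environment API.

```python
import json
import os


def load_env(path="env.json"):
    """Hypothetical helper: read settings from env.json if present,
    falling back to shell environment variables for missing keys."""
    env = {}
    if os.path.exists(path):
        with open(path) as f:
            env.update(json.load(f))
    for key in ("DATA_PATH", "results"):
        # shell variables fill in anything env.json did not define
        env.setdefault(key, os.environ.get(key))
    return env
```

Either source works on its own; the file simply takes precedence when both are present.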

Train a model

To train a model, use the mate train command, which runs the specified experiment. For example, to run an experiment called learn in the bit module:

mate train bit learn
# or alternatively use python
python -m deepnet.experiments.bit.learn train
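Because experiments are plain Python modules, the entry point only needs to dispatch on the action passed on the command line. The sketch below illustrates this shape; the commented-out relative imports and the returned strings are illustrative assumptions, not yerbamate requirements.

```python
import sys

# Hypothetical sketch of an experiment entry point such as
# deepnet/experiments/bit/learn.py. In a real project the model,
# data loader, and trainer would be relative imports of your own
# independent modules, e.g.:
#   from ...models.bit_torch import models
#   from ...trainers.bit_torch import trainer
#   from ...data.bit import fewshot


def run(action: str) -> str:
    """Dispatch the requested action; real code would build the
    model, data loader, and trainer here and start the run."""
    if action == "train":
        return "training"  # e.g. trainer.fit(model, data)
    if action == "test":
        return "testing"   # e.g. trainer.test(model, data)
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    # `python -m deepnet.experiments.bit.learn train` puts the
    # action name in sys.argv[1]
    print(run(sys.argv[1]))
```

Since the module runs under plain `python -m`, no mate-specific runner is needed at execution time.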

Project Structure 📁

Deep learning projects can be organized into the following structure with modularity and separation of concerns in mind. This offers a clean, organized codebase that is easy to maintain and is sharable out of the box. The project directory is arranged as a hierarchical tree, with an arbitrary name given to the root directory by the user. The project is then broken down into distinct concerns such as models, data, trainers, experiments, analyzers, and simulators, each with its own subdirectory. Within each concern, modules get their own subdirectories, such as specific models, trainers, data loaders, data augmentations, or loss functions.

└── project_name
    β”œβ”€β”€ data
    β”‚   β”œβ”€β”€ my_independent_data_loader
    β”‚   └── __init__.py
    β”œβ”€β”€ experiments
    β”‚   β”œβ”€β”€ my_awesome_experiment
    β”‚   └── __init__.py
    β”œβ”€β”€ __init__.py
    β”œβ”€β”€ models
    β”‚   β”œβ”€β”€ awesomenet
    β”‚   └── __init__.py
    └── trainers
        β”œβ”€β”€ big_brain_trainer
        └── __init__.py

Modularity

Modularity is a software design principle that focuses on creating self-contained, reusable, and interchangeable components. In a deep learning project, this means creating three independent standalone modules for models, trainers, and data, which allows for a more organized, maintainable, and sharable project structure. The fourth module, experiments, is not independent; instead, it combines the other three to create a complete experiment.

Independent Modules

Yerbamate prioritizes organizing the project into independent modules where applicable. An independent module depends only on Python packages (such as NumPy, PyTorch, TensorFlow, or Hugging Face), and the code inside it uses relative imports to refer to code within the module. This makes the module reusable in any project once its Python dependencies are installed.
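The effect of relative imports can be demonstrated with a throwaway package built on disk; the package and attribute names below (mymodel, hidden_size) are made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# Create a small package whose __init__.py uses a relative import,
# then import it. Because the internal import is relative, the
# package keeps working no matter which project directory it is
# copied into, as long as its Python dependencies are installed.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "mymodel")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from .layers import hidden_size\n")  # relative import
with open(os.path.join(pkg, "layers.py"), "w") as f:
    f.write("hidden_size = 128\n")

sys.path.insert(0, root)
mod = importlib.import_module("mymodel")
print(mod.hidden_size)
```

Had __init__.py used an absolute import like `from mymodel.layers import hidden_size`, renaming or relocating the package would break it; the relative form survives.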

Non-Independent Modules

In some cases, a concern needs to combine several independent modules. An example is the experiments concern, which imports and combines models, data, and trainers to define a specific experiment. Such a module is not independent; it is designed to combine previously defined independent modules. For non-independent modules, Yerbamate creates a dependency list of the independent modules involved, which can be used to install both the code and the Python dependencies. This ensures that the necessary modules are installed and that the code runs without issue.

Sample Modular Project Structure

This structure highlights modularity and separation of concerns. The models, data, and trainers modules are independent and can be used in any project. The experiments module is not independent; it combines the other three to create a complete experiment.

.
β”œβ”€β”€ mate.json
└── deepnet
    β”œβ”€β”€ data
    β”‚   β”œβ”€β”€ bit
    β”‚   β”‚   β”œβ”€β”€ fewshot.py
    β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”œβ”€β”€ minibatch_fewshot.py
    β”‚   β”‚   β”œβ”€β”€ requirements.txt
    β”‚   β”‚   └── transforms.py
    β”‚   └── __init__.py
    β”œβ”€β”€ experiments
    β”‚   β”œβ”€β”€ bit
    β”‚   β”‚   β”œβ”€β”€ aug.py
    β”‚   β”‚   β”œβ”€β”€ dependencies.json
    β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”œβ”€β”€ learn.py
    β”‚   β”‚   └── requirements.txt
    β”‚   └── __init__.py
    β”œβ”€β”€ __init__.py
    β”œβ”€β”€ models
    β”‚   β”œβ”€β”€ bit_torch
    β”‚   β”‚   β”œβ”€β”€ downloader
    β”‚   β”‚   β”‚   β”œβ”€β”€ downloader.py
    β”‚   β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”‚   β”œβ”€β”€ requirements.txt
    β”‚   β”‚   β”‚   └── utils.py
    β”‚   β”‚   β”œβ”€β”€ __init__.py
    β”‚   β”‚   β”œβ”€β”€ models.py
    β”‚   β”‚   └── requirements.txt
    β”‚   └── __init__.py
    └── trainers
        β”œβ”€β”€ bit_torch
        β”‚   β”œβ”€β”€ __init__.py
        β”‚   β”œβ”€β”€ lbtoolbox.py
        β”‚   β”œβ”€β”€ logger.py
        β”‚   β”œβ”€β”€ lr_schduler.py
        β”‚   β”œβ”€β”€ requirements.txt
        β”‚   └── trainer.py
        └── __init__.py

Example Projects 📚

Please check out the transfer learning, vision models, and lightweight GAN example projects.

Documentation 📚

Please check out the documentation.

Guides 📖

For more information on modularity, please check out this guide.

For running experiments on Google Colab, please check out this example.

Contribution 🤝

We welcome contributions from the community! Please check out our contributing guide for more information on how to get started.

Contact 🤝

For questions please contact:

oalee(at)proton.me

Open Science 📖

As an open science work, Yerbamate strives to promote transparency and collaboration. To this end, the history of the LaTeX files for this work is available on GitHub. These open science repositories welcome collaboration and encourage community participation to enhance the validity, reproducibility, accessibility, and quality of this work.