This repo contains the implementation of FreqCacheEmbedding, which extends the vanilla PyTorch EmbeddingBag with a cache mechanism to enable heterogeneous training for large-scale recommendation models.
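To illustrate the general idea of a frequency-aware cache, here is a minimal pure-Python sketch. It is not the FreqCacheEmbedding implementation: the class name `TinyFreqCache`, its methods, and the eviction policy (drop the least-accessed cached row) are assumptions made for illustration only. The sketch covers lookups only and ignores gradient updates and write-back of evicted rows.

```python
from collections import Counter


class TinyFreqCache:
    """Toy frequency-aware row cache (hypothetical, not this repo's API)."""

    def __init__(self, full_table, cache_rows):
        self.full_table = full_table  # stands in for the large host-side table
        self.cache_rows = cache_rows  # capacity of the small "device" cache
        self.cache = {}               # row id -> cached row
        self.freq = Counter()         # access count per row id
        self.hits = 0
        self.misses = 0

    def _fetch(self, row_id):
        self.freq[row_id] += 1
        if row_id in self.cache:
            self.hits += 1
            return self.cache[row_id]
        self.misses += 1
        if len(self.cache) >= self.cache_rows:
            # Evict the cached row with the lowest access count so far.
            coldest = min(self.cache, key=lambda r: self.freq[r])
            del self.cache[coldest]
        self.cache[row_id] = self.full_table[row_id]
        return self.cache[row_id]

    def bag_sum(self, ids):
        """Sum-pool the rows of one bag, similar to EmbeddingBag(mode='sum')."""
        rows = [self._fetch(i) for i in ids]
        return [sum(col) for col in zip(*rows)]


# Usage: a 6-row table with 2-dimensional embeddings and a 2-row cache.
table = [[float(i), float(i) * 2] for i in range(6)]
emb = TinyFreqCache(table, cache_rows=2)
pooled = emb.bag_sum([0, 0, 3])
print(pooled, emb.hits, emb.misses)
```

Frequently accessed rows tend to stay in the small cache, so most lookups avoid touching the full table. Recommendation workloads usually have a skewed id distribution, which is what makes this kind of cache attractive.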
The preprocessing steps are derived from TorchRec's utilities and the Avazu Kaggle community. See the recsys/datasets/preprocess_scripts directory for details.
At the time this repo was built, another commonly used dataset, Criteo 1TB, was unavailable (see this issue). We will add its preprocessing and run scripts soon.
All commands for running the FreqCacheEmbedding-enabled recommendation models are in run.sh.
Currently, this repo only contains DLRM & DeepFM models, and we are working on testing more recommendation models.