CachedEmbedding/baselines at main · hpcaitech/CachedEmbedding · GitHub

FreqCacheEmbedding

This repo contains the implementation of FreqCacheEmbedding, which extends the vanilla PyTorch EmbeddingBag with a cache mechanism to enable heterogeneous training for large-scale recommendation models.
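
To illustrate the general idea of a frequency-aware cache in front of a large embedding table, here is a minimal pure-Python sketch. It is not this repo's API or its actual eviction policy: the class name `FreqCachedTable`, its methods, and the "evict the least frequently accessed row" rule are all assumptions made for illustration. The full table stands in for CPU memory, the small dictionary stands in for a GPU-resident cache, and `lookup_sum` mimics sum pooling over one bag of ids in the style of EmbeddingBag.

```python
from collections import Counter


class FreqCachedTable:
    """Toy frequency-aware cache over an embedding table (hypothetical, for illustration)."""

    def __init__(self, host_rows, cache_size):
        self.host_rows = host_rows    # full table: row id -> vector (stands in for CPU memory)
        self.cache_size = cache_size  # max number of rows held in the fast cache
        self.cache = {}               # row id -> vector (stands in for GPU memory)
        self.freq = Counter()         # how often each row id has been requested
        self.hits = 0
        self.misses = 0

    def _admit(self, row_id, protected):
        """Bring one row into the cache, evicting the least-used unprotected row if full."""
        if len(self.cache) >= self.cache_size:
            candidates = [r for r in self.cache if r not in protected]
            victim = min(candidates, key=lambda r: self.freq[r])
            # Write the (possibly updated) row back to the full table before dropping it.
            self.host_rows[victim] = self.cache.pop(victim)
        self.cache[row_id] = list(self.host_rows[row_id])

    def lookup_sum(self, ids):
        """Return the element-wise sum of the rows in `ids` (one bag, 'sum' pooling)."""
        if not ids:
            return []
        unique_ids = list(dict.fromkeys(ids))  # de-duplicate, keep first-seen order
        if len(unique_ids) > self.cache_size:
            raise ValueError("bag needs more rows than the cache can hold")
        self.freq.update(ids)
        protected = set(unique_ids)  # rows needed by this bag must not be evicted
        for row_id in unique_ids:
            if row_id in self.cache:
                self.hits += 1
            else:
                self.misses += 1
                self._admit(row_id, protected)
        pooled = [0.0] * len(self.cache[unique_ids[0]])
        for row_id in ids:
            for d, value in enumerate(self.cache[row_id]):
                pooled[d] += value
        return pooled


# Usage: a 6-row table with 2-dimensional vectors and room for 2 cached rows.
table = FreqCachedTable({i: [float(i), 1.0] for i in range(6)}, cache_size=2)
pooled = table.lookup_sum([0, 1])
```

In this sketch, rows that are requested often tend to stay in the small cache while rarely used rows are written back to the full table, which is the intuition behind keeping only hot embedding rows in fast memory.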

Dataset

  1. Criteo Kaggle
  2. Avazu

The preprocessing steps are derived from TorchRec's utilities and the Avazu Kaggle community. Please refer to the recsys/datasets/preprocess_scripts directory for details.

At the time this repo was built, another commonly adopted dataset, Criteo 1TB, was unavailable (see this issue). We will add its preprocessing and running scripts very soon.

Command

All the commands to run the FreqCacheEmbedding-enabled recommendation models are presented in run.sh.

Model

Currently, this repo contains only the DLRM and DeepFM models, and we are working on testing more recommendation models.
