🔮 Future ideas
3 repositories
(CVPR 2023 / TPAMI 2024) Integrally Pre-Trained Transformer Pyramid Networks: A Hierarchical Vision Transformer for Masked Image Modeling
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
Computer Vision Tools & Deep Learning Resources (Codes Written by Sida Dai)