Title | Authors | Venue | Cited by | Year
Supervision exists everywhere: A data efficient contrastive language-image pre-training paradigm | Y Li*, F Liang*, L Zhao*, Y Cui, W Ouyang, J Shao, F Yu, J Yan | International Conference on Learning Representations (ICLR) 2022 | 324 | 2021
Emu: Generative Pretraining in Multimodality | Q Sun*, Q Yu*, Y Cui*, F Zhang*, X Zhang*, Y Wang, H Gao, J Liu, ... | The Twelfth International Conference on Learning Representations | 76* | 2023
Emu2: Generative multimodal models are in-context learners | Q Sun*, Y Cui*, X Zhang*, F Zhang*, Q Yu*, Z Luo, Y Wang, Y Rao, J Liu, ... | arXiv preprint arXiv:2312.13286 | 32 | 2023
Democratizing contrastive language-image pre-training: A CLIP benchmark of data, model, and supervision | Y Cui, L Zhao, F Liang, Y Li, J Shao | ICML First Workshop on Pre-training 2022 | 30 | 2022
Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline | Y Li, B Huang, Z Chen, Y Cui, F Liang, M Shen, F Liu, E Xie, L Sheng, ... | arXiv preprint arXiv:2301.12511 | 13 | 2023
Multi-modal gait recognition via effective spatial-temporal feature fusion | Y Cui, Y Kang | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern … | 11 | 2023
CapsFusion: Rethinking image-text data at scale | Q Yu, Q Sun, X Zhang, Y Cui, F Zhang, X Wang, J Liu | arXiv preprint arXiv:2310.20550 | 7 | 2023
GaitTransformer: Multiple-temporal-scale transformer for cross-view gait recognition | Y Cui, Y Kang | 2022 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6 | 5 | 2022
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | Q Sun, J Wang, Q Yu, Y Cui, F Zhang, X Zhang, X Wang | arXiv preprint arXiv:2402.04252 | 4 | 2024
Learning Multiple Granularity Features for Unsupervised Person Re-Identification | S Wang*, Y Cui*, Y Kang | 2022 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6 | | 2022