Luowei Zhou

Cited by

	All	Since 2019
Citations	5720	5635
h-index	26	26
i10-index	32	32

Citations per year: 2018 (63), 2019 (162), 2020 (342), 2021 (625), 2022 (1216), 2023 (2184), 2024 (1102)

Public access

16 articles

0 articles

available

not available

Based on funding mandates

Co-authors

Jason Corso: Professor of Robotics, Electrical Engineering and Computer Science, University of Michigan. Verified email at umich.edu
Chenliang Xu: Associate Professor, University of Rochester. Verified email at rochester.edu
Jianwei Yang: Principal Researcher, Microsoft Research, Redmond. Verified email at microsoft.com
Zhe Gan: Research Scientist, Apple. Verified email at apple.com
Jianfeng Gao: Microsoft Research, Redmond. Verified email at microsoft.com
Linjie (Lindsey) Li: Senior Researcher, Microsoft. Verified email at microsoft.com
Bin Xiao: Microsoft GenAI. Verified email at microsoft.com
Dongdong Chen: Principal Researcher, GenAI, Microsoft. Verified email at mail.ustc.edu.cn
Yu Cheng: Visiting Professor at Rice University. Verified email at rice.edu
Jie Lei 雷杰: Research Scientist, Meta AI. Verified email at fb.com
Caiming Xiong: Salesforce Research. Verified email at salesforce.com
Richard Socher: you.com. Verified email at stanford.edu
Yingbo Zhou: Senior Research Director, Salesforce Research. Verified email at salesforce.com
Lei Zhang: International Digital Economy Academy (IDEA). Verified email at idea.edu.cn
Hamid Palangi: Microsoft Research and University of Washington. Verified email at microsoft.com
Mike Z. SHOU: National U. of Singapore; Facebook AI; Columbia University. Verified email at columbia.edu
Xinlei Chen: FAIR, Meta. Verified email at meta.com
Marcus Rohrbach: Professor for Multimodal Reliable AI, TU Darmstadt, Germany. Verified email at tu-darmstadt.de
Yannis Kalantidis: NAVER LABS Europe. Verified email at naverlabs.com
Chunlin Chen: Nanjing University. Verified email at nju.edu.cn

Luowei Zhou

Research Scientist, Google DeepMind

Verified email at google.com

Vision and Language, Multimodal Language Models, Video Analysis, Generative Models


Title	Cited by	Year
Unified vision-language pre-training for image captioning and VQA. L Zhou, H Palangi, L Zhang, H Hu, J Corso, J Gao. Proceedings of the AAAI conference on artificial intelligence 34 (07), 13041 …, 2020	856	2020
Towards automatic learning of procedures from web instructional videos. L Zhou, C Xu, J Corso. Proceedings of the AAAI Conference on Artificial Intelligence 32 (1), 2018	681	2018
Florence: A new foundation model for computer vision. L Yuan, D Chen, YL Chen, N Codella, X Dai, J Gao, H Hu, X Huang, B Li, ... arXiv preprint arXiv:2111.11432, 2021	665	2021
End-to-end dense video captioning with masked transformer. L Zhou, Y Zhou, JJ Corso, R Socher, C Xiong. Proceedings of the IEEE conference on computer vision and pattern …, 2018	604	2018
Less is more: ClipBERT for video-and-language learning via sparse sampling. J Lei, L Li, L Zhou, Z Gan, TL Berg, M Bansal, J Liu. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021	581	2021
Gemini: a family of highly capable multimodal models. G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023	427	2023
RegionCLIP: Region-based language-image pretraining. Y Zhong, J Yang, P Zhang, C Li, N Codella, LH Li, L Zhou, X Dai, L Yuan, ... Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022	339	2022
Grounded video description. L Zhou, Y Kalantidis, X Chen, JJ Corso, M Rohrbach. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2019	207	2019
BEVT: BERT pretraining of video transformers. R Wang, D Chen, Z Wu, Y Chen, X Dai, M Liu, YG Jiang, L Zhou, L Yuan. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	171	2022
CLIP-Event: Connecting text and images with event structures. M Li, R Xu, S Wang, L Zhou, X Lin, C Zhu, M Zeng, H Ji, SF Chang. Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022	106	2022
OmniVL: One foundation model for image-language and video-language tasks. J Wang, D Chen, Z Wu, C Luo, L Zhou, Y Zhao, Y Xie, C Liu, YG Jiang, ... Advances in neural information processing systems 35, 5696-5710, 2022	103	2022
Dense video captioning. Y Zhou, L Zhou, C Xiong, R Socher. US Patent 10,542,270, 2020	98	2020
VALUE: A multi-task benchmark for video-and-language understanding evaluation. L Li, J Lei, Z Gan, L Yu, YC Chen, R Pillai, Y Cheng, L Zhou, XE Wang, ... arXiv preprint arXiv:2106.04632, 2021	95	2021
Watch what you just said: Image captioning with text-conditional attention. L Zhou, C Xu, P Koch, JJ Corso. Proceedings of the on Thematic Workshops of ACM Multimedia 2017, 305-313, 2017	92	2017
Language models with image descriptors are strong few-shot video-language learners. Z Wang, M Li, R Xu, L Zhou, J Lei, X Lin, S Wang, Z Yang, C Zhu, ... Advances in Neural Information Processing Systems 35, 8483-8497, 2022	86	2022
Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction. L Zhou, N Louis, JJ Corso. British Machine Vision Conference, 2018	83	2018
UC2: Universal cross-lingual cross-modal vision-and-language pre-training. M Zhou, L Zhou, S Wang, Y Cheng, L Li, Z Yu, J Liu. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021	70	2021
Multiagent reinforcement learning with sparse interactions by negotiation and knowledge transfer. L Zhou, P Yang, C Chen, Y Gao. IEEE transactions on cybernetics 47 (5), 1238-1250, 2016	60	2016
Image caption generation with text-conditional semantic attention. L Zhou, C Xu, P Koch, JJ Corso. arXiv preprint arXiv:1606.04621 2, 2016	47	2016
MIST: Multi-modal iterative spatial-temporal transformer for long-form video question answering. D Gao, L Zhou, L Ji, L Zhu, Y Yang, MZ Shou. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	44	2023
