Qing Li 李庆
Email: dylan.liqing[at]gmail[dot]com
I am a research scientist and team lead at the Beijing Institute for General Artificial Intelligence (BIGAI), China. I received my Ph.D. in 2022 from the University of California, Los Angeles (UCLA), advised by Professor Song-Chun Zhu. During my Ph.D., I interned at Google Research, Microsoft Azure AI, and Amazon Alexa. Before UCLA, I obtained my Bachelor's degree in 2015 and Master's degree in 2018 from the University of Science and Technology of China (USTC).
My long-term research goal is to develop a generalist agent that can perceive the real world, communicate with humans, and learn from feedback. To achieve this goal, I currently focus on:
- AGI Agents: LLM Agents, Vision-Language-Action (VLA), Embodied Agents
- Multimodal Understanding: Vision-Language Modeling (VLM), 3D Visual Grounding, Long-term Video Understanding
- Machine Learning: Neural-Symbolic Learning, Continual Learning, In-Context Learning
Our team is actively recruiting full-time research scientists, engineers, and self-motivated interns. We are also seeking prospective Ph.D. students and long-term collaborators for the TongProgram (通计划). Feel free to contact me if you are interested!
News
2024-08 | 🔥🔥🔥 Three papers accepted at NeurIPS 2024! Check out these awesome works: FIRE, a dataset for feedback refinement of large multimodal models; UltraEdit, a large-scale (~4M) high-quality dataset for instruction-based image editing; and OmniJARVIS, a novel Vision-Language-Action (VLA) model for instruction following in Minecraft. |
---|---|
2024-08 | 🔥🔥🔥 Call for papers for the 1st Workshop on Open-World Agents (NeurIPS 2024). |
2024-07 | Together with Xiaojian Ma and Zhi Gao, I gave a joint tutorial on “Multimodal Generalist Agents: Reasoning, Reflecting, and Learning like Humans” for participants of the TongProgram Summer School 2024. |
2024-07 | 🔥🔥🔥 Three papers accepted at ECCV 2024! Check out these awesome works: PQ3D, a unified model for 3D vision-language understanding; SceneVerse, the first million-scale 3D vision-language dataset; and VideoAgent, an LLM agent that understands videos using a structured memory and four tools. |
2024-06 | Call for papers for IJCLR 2024, which will take place on 20–22 September 2024 in Nanjing! I will serve as an area chair for neuro-symbolic learning and reasoning. If you would like to serve as a PC member, please contact me! |
Selected Publications
- Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation (Best Paper Finalist), International Conference on Multimedia Retrieval, 2016