Qing Li 李庆

Email: dylan.liqing[at]gmail[dot]com

I am a research scientist and team lead at the Beijing Institute for General Artificial Intelligence (BIGAI), China. I received my Ph.D. in 2022 from the Department of Statistics at the University of California, Los Angeles (UCLA), advised by Professor Song-Chun Zhu. During my Ph.D., I interned at Google Research, Microsoft Azure AI, and Amazon Alexa. Before UCLA, I obtained my Bachelor's degree in 2015 and my Master's degree in 2018 from the University of Science and Technology of China (USTC).

My research interests lie at the intersection of machine learning, computer vision, cognition, and robotics. My current research themes include:

  • Multimodal Understanding: vision & language understanding, visual reasoning, 3D scene understanding, video understanding
  • General Machine Learning: neural-symbolic reasoning and learning, structure learning, representation learning, generative modeling, few-shot learning
  • Embodied Agents: language-grounded task planning, reinforcement learning, robotics

Our team is actively recruiting full-time research scientists, engineers, and self-motivated interns. We are also seeking prospective Ph.D. students and long-term collaborators for TongProgram (通计划). Feel free to contact me if you are interested!


News

2024-06 INSIGHT is selected as an ICML 2024 Spotlight (top 3.5%)!
2024-05 🔥🔥🔥 Check out our new work PQ3D, the first unified model capable of handling a wide range of 3D-VL tasks! The code and models will be released soon. Stay tuned!
2024-05 Two papers are accepted by ICML 2024: a 3D embodied generalist agent (LEO) and end-to-end neural-symbolic RL for explainable decision-making (INSIGHT). Congrats to Jiangyong Huang et al. and Lirui Luo et al.!
2024-03 Our paper CLOVA, about building tool-based visual assistants, is accepted by CVPR 2024. Congrats to Zhi Gao et al.!
2023-06 One paper is accepted by ICCV 2023! Check out 3D-VisTA, a 3D vision-language foundation model.

Selected Publications

* Equal contribution, ✉ Corresponding author

  1. End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations (Spotlight, top 3.5%)
    Lirui Luo, Guoxi Zhang, Hongming Xu, Yaodong Yang, Cong Fang, and Qing Li
    International Conference on Machine Learning (ICML), 2024
  2. Unifying 3D Vision-Language Understanding Via Promptable Queries
    Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng, Siyuan Huang, and Qing Li
    arXiv preprint arXiv:2405.11442, 2024
  3. SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
    Baoxiong Jia*, Yixin Chen*, Huangyue Yu, Yan Wang, Xuesong Niu, Tengyu Liu, Qing Li, and Siyuan Huang
    arXiv preprint arXiv:2401.09340, 2024
  4. CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
    Zhi Gao, Yuntao Du, Xintong Zhang, Xiaojian Ma, Wenjuan Han, Song-Chun Zhu, and Qing Li
    The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  5. An Embodied Generalist Agent in 3D World
    Jiangyong Huang*, Silong Yong*, Xiaojian Ma*, Xiongkun Linghu*, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, and Siyuan Huang
    International Conference on Machine Learning (ICML), 2024
  6. Neural-Symbolic Recursive Machine for Systematic Generalization
    International Conference on Learning Representations (ICLR), 2024
  7. Bongard-OpenWorld: Few-Shot Reasoning for Free-Form Visual Concepts in the Real World
    Rujie Wu*, Xiaojian Ma*, Zhenliang Zhang, Wei Wang, Qing Li, Song-Chun Zhu, and Yizhou Wang
    International Conference on Learning Representations (ICLR), 2024
  8. Learning Non-Markovian Decision-Making from State-Only Sequences
    Aoyang Qin, Feng Gao, Qing Li, Song-Chun Zhu, and Sirui Xie
    Advances in Neural Information Processing Systems (NeurIPS), 2023
  9. A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics (Notable-top-25%)
    International Conference on Learning Representations (ICLR), 2023
  10. 3D-VisTA: Pre-Trained Transformer for 3D Vision and Text Alignment
    Ziyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, and Qing Li
    International Conference on Computer Vision (ICCV), 2023
  11. SQA3D: Situated Question Answering in 3D Scenes
    International Conference on Learning Representations (ICLR), 2023
  12. SMART: A Situation Model for Algebra Story Problems via Attributed Grammar
    Yining Hong, Qing Li, Ran Gong, Daniel Ciao, Siyuan Huang, and Song-Chun Zhu
    AAAI Conference on Artificial Intelligence (AAAI), 2021
  13. Learning by Fixing: Solving Math Word Problems with Weak Supervision
    Yining Hong, Qing Li, Daniel Ciao, Siyuan Huang, and Song-Chun Zhu
    AAAI Conference on Artificial Intelligence (AAAI), 2021
  14. YouRefIt: Embodied Reference Understanding with Language and Gesture (Oral)
    Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Song-Chun Zhu, Tao Gao, Yixin Zhu, and Siyuan Huang
    International Conference on Computer Vision (ICCV), 2021
  15. VLGrammar: Grounded Grammar Induction of Vision and Language
    Yining Hong, Qing Li, Song-Chun Zhu, and Siyuan Huang
    International Conference on Computer Vision (ICCV), 2021
  16. A Competence-Aware Curriculum for Visual Concepts Learning Via Question Answering (Oral)
    Qing Li, Siyuan Huang, Yining Hong, and Song-Chun Zhu
    European Conference on Computer Vision (ECCV), 2020
  17. Closed Loop Neural-Symbolic Learning Via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning (Best Paper in ICML Workshop)
    International Conference on Machine Learning (ICML), 2020