Qing Li 李庆
Email: dylan.liqing[at]gmail[dot]com
I am a research scientist and team lead at the Beijing Institute for General Artificial Intelligence (BIGAI), China. I received my Ph.D. in 2022 from the University of California, Los Angeles (UCLA), advised by Professor Song-Chun Zhu. During my Ph.D., I interned at Google Research, Microsoft Azure AI, and Amazon Alexa. Before UCLA, I obtained my Bachelor's degree in 2015 and Master's degree in 2018 from the University of Science and Technology of China (USTC).
My long-term research goal is to develop a generalist agent that can perceive the real world, communicate with humans, and learn from feedback. To achieve this goal, I currently focus on:
- AGI Agents: LLM Agents, Vision-Language-Action (VLA), Embodied Agents
- Multimodal Understanding: Vision-Language Modeling (VLM), 3D Visual Grounding, Long-term Video Understanding
- Machine Learning: Neural-Symbolic Learning, Continual Learning, In-Context Learning
Our team is actively recruiting full-time research scientists, engineers, and self-motivated interns. We are also seeking prospective Ph.D. students and long-term collaborators for the TongProgram (通计划). Feel free to contact me if you are interested!
News
2024-08 | 🔥🔥🔥 Three papers accepted at NeurIPS 2024! Check out these awesome works: FIRE, a dataset for feedback refinement of large multimodal models; UltraEdit, a large-scale (~4M) high-quality dataset for instruction-based image editing; and OmniJARVIS, a novel Vision-Language-Action (VLA) model for instruction following in Minecraft. |
---|---|
2024-08 | 🔥🔥🔥 Call for papers for the 1st Workshop on Open-World Agents (NeurIPS 2024). |
2024-07 | Together with Xiaojian Ma and Zhi Gao, I gave a joint tutorial on “Multimodal Generalist Agents: Reasoning, Reflecting, and Learning like Humans” for participants of the TongProgram Summer School 2024. |
2024-07 | 🔥🔥🔥 Three papers accepted at ECCV 2024! Check out these awesome works: PQ3D, a unified model for 3D vision-language understanding; SceneVerse, the first million-scale 3D vision-language dataset; and VideoAgent, an LLM agent that understands videos using a structured memory and four tools. |
2024-06 | Call for papers for IJCLR 2024, which will take place on 20–22 September 2024 in Nanjing! I will serve as an area chair for neuro-symbolic learning and reasoning. If you would like to serve as a PC member, please contact me! |
Selected Publications
- Action Recognition by Learning Deep Multi-Granular Spatio-Temporal Video Representation (Best Paper Finalist), International Conference on Multimedia Retrieval, 2016