News

2024-08 🔥🔥🔥 Three papers are accepted by NeurIPS 2024! Check out these awesome works: FIRE, a dataset for feedback refinement of large multimodal models; UltraEdit, a large-scale (~4M) high-quality dataset for instruction-based image editing; OmniJARVIS, a novel Vision-Language-Action (VLA) model for instruction following in Minecraft.
2024-08 🔥🔥🔥 Call for papers for the 1st Workshop on Open-World Agents (NeurIPS 2024).
2024-07 Together with Xiaojian Ma and Zhi Gao, I gave a joint tutorial on “Multimodal Generalist Agents: Reasoning, Reflecting, and Learning like Humans” for participants of the TongProgram Summer School 2024.
2024-07 🔥🔥🔥 Three papers are accepted by ECCV 2024! Check out these awesome works: PQ3D, a unified model for 3D vision-language understanding; SceneVerse, the first million-scale 3D vision-language dataset; VideoAgent, an LLM agent that understands videos using a structured memory and four tools.
2024-06 Call for papers for IJCLR 2024, which will take place on 20–22 September 2024 in Nanjing! I will serve as an area chair on neuro-symbolic learning and reasoning. If you are willing to serve as a PC member, please contact me!
2024-06 INSIGHT is selected as ICML 2024 Spotlight (top 3.5%)!
2024-05 Check out our new work PQ3D, the first unified model capable of handling a wide range of 3D-VL tasks! The code and models will be released soon. Stay tuned!
2024-05 Two papers are accepted by ICML 2024, on a 3D embodied generalist agent (LEO) and end-to-end neuro-symbolic RL for explainable decision-making (INSIGHT). Congrats Jiangyong Huang et al. and Lirui Luo et al.!
2024-03 Our paper CLOVA, about building tool-based visual assistants, is accepted by CVPR 2024. Congrats Zhi Gao et al.!
2023-06 One paper is accepted by ICCV 2023! Check out 3D-VisTA, a 3D vision-language foundation model.