2024-08 | 🔥🔥🔥 Three papers are accepted by NeurIPS 2024! Check out these awesome works: FIRE, a dataset for feedback refinement of large multimodal models; UltraEdit, a large-scale (~4M) high-quality dataset for instruction-based image editing; OmniJARVIS, a novel Vision-Language-Action (VLA) model for instruction following in Minecraft. |
2024-08 | 🔥🔥🔥 Call for papers to the 1st workshop on Open-World Agents (NeurIPS 2024). |
2024-07 | Together with Xiaojian Ma and Zhi Gao, I gave a joint tutorial on "Multimodal Generalist Agents: Reasoning, Reflecting, and Learning like Humans" for participants of the TongProgram Summer School 2024. |
2024-07 | 🔥🔥🔥 Three papers are accepted by ECCV 2024! Check out these awesome works: PQ3D, a unified model for 3D vision-language understanding; SceneVerse, the first million-scale 3D vision-language dataset; VideoAgent, an LLM agent that understands videos using structured memory and four tools. |
2024-06 | Call for papers to IJCLR 2024, which will take place on 20–22 September 2024 in Nanjing! I will serve as an area chair on neuro-symbolic learning and reasoning. If you are interested in serving as a PC member, please contact me! |
2024-06 | INSIGHT is selected as an ICML 2024 Spotlight (top 3.5%)! |
2024-05 | Check out our new work PQ3D, the first unified model capable of handling a wide range of 3D-VL tasks! The code and models will be released soon. Stay tuned! |
2024-05 | Two papers are accepted by ICML 2024. They are about a 3D embodied generalist agent (LEO) and end-to-end neuro-symbolic RL for explainable decision-making (INSIGHT). Congrats Jiangyong Huang et al. and Lirui Luo et al.! |
2024-03 | Our paper CLOVA, about building tool-based visual assistants, is accepted by CVPR 2024. Congrats Zhi Gao et al.! |
2023-06 | One paper is accepted by ICCV 2023! Check out 3D-VisTA, a 3D vision-language foundation model. |