1 code implementation • COLING 2022 • Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, ran Xu, Caiming Xiong
Unlike previous methods that only address a fixed set of field items, our method predicts target value for an arbitrary query based on the understanding of the layout and semantics of a form.
1 code implementation • 14 May 2025 • Jiuhai Chen, Zhiyang Xu, Xichen Pan, Yushi Hu, Can Qin, Tom Goldstein, Lifu Huang, Tianyi Zhou, Saining Xie, Silvio Savarese, Le Xue, Caiming Xiong, ran Xu
Building on our innovative model design, training recipe, and datasets, we develop BLIP3-o, a suite of state-of-the-art unified multimodal models.
1 code implementation • 28 Feb 2025 • Yichi Zhang, Bohao Lv, Le Xue, Wenbo Zhang, Yuchen Liu, Yu Fu, Yuan Cheng, Yuan Qi
SemiSAM+ consists of one or multiple promptable foundation models as generalist models, and a trainable task-specific segmentation model as specialist model.
1 code implementation • 20 Feb 2025 • Yichi Zhang, Le Xue, Wenbo Zhang, Lanlan Li, Yuchen Liu, Chen Jiang, Yuan Cheng, Yuan Qi
Positron Emission Tomography (PET) imaging plays a crucial role in modern medical diagnostics by revealing the metabolic processes within a patient's body, which is essential for quantification of therapy response and monitoring treatment progress.
1 code implementation • 10 Jan 2025 • Dominick Reilly, Manish Kumar Govind, Le Xue, Srijan Das
To address this, we aim to leverage the complementary nature of egocentric views to enhance LVLM's understanding of exocentric ADL videos.
Human-Object Interaction Detection
Knowledge Distillation
+1
1 code implementation • 9 Dec 2024 • Jieyu Zhang, Le Xue, Linxin Song, Jun Wang, Weikai Huang, Manli Shu, An Yan, Zixian Ma, Juan Carlos Niebles, Silvio Savarese, Caiming Xiong, Zeyuan Chen, Ranjay Krishna, ran Xu
Our multi-image instruction data leads to an 8% improvement on Mantis-Eval.
Ranked #107 on
Visual Question Answering
on MM-Vet
no code implementations • 12 Nov 2024 • Anas Awadalla, Le Xue, Manli Shu, An Yan, Jun Wang, Senthil Purushwalkam, Sheng Shen, Hannah Lee, Oscar Lo, Jae Sung Park, Etash Guha, Silvio Savarese, Ludwig Schmidt, Yejin Choi, Caiming Xiong, ran Xu
We introduce BLIP3-KALE, a dataset of 218 million image-text pairs that bridges the gap between descriptive synthetic captions and factual web-scale alt-text.
no code implementations • 21 Oct 2024 • Michael S. Ryoo, Honglu Zhou, Shrikant Kendre, Can Qin, Le Xue, Manli Shu, Silvio Savarese, ran Xu, Caiming Xiong, Juan Carlos Niebles
We present xGen-MM-Vid (BLIP-3-Video): a multimodal language model for videos, particularly designed to efficiently capture temporal information over multiple frames.
no code implementations • 22 Aug 2024 • Can Qin, Congying Xia, Krithika Ramakrishnan, Michael Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, Senthil Purushwalkam, Le Xue, Yingbo Zhou, Huan Wang, Silvio Savarese, Juan Carlos Niebles, Zeyuan Chen, ran Xu, Caiming Xiong
We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions.
1 code implementation • 16 Aug 2024 • Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, Shrikant Kendre, Jieyu Zhang, Can Qin, Shu Zhang, Chia-Chih Chen, Ning Yu, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke, Huan Wang, Yejin Choi, Ludwig Schmidt, Zeyuan Chen, Silvio Savarese, Juan Carlos Niebles, Caiming Xiong, ran Xu
The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs.
1 code implementation • 17 Jun 2024 • Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, ran Xu, Yejin Choi, Ludwig Schmidt
Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs).
no code implementations • 13 Jun 2024 • Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha, Manish Kumar Govind, Pu Wang, Francois Bremond, Le Xue, Srijan Das
To address this, we propose a semi-automated framework for curating ADL datasets, creating ADL-X, a multiview, multimodal RGBS instruction-tuning dataset.
2 code implementations • 30 Nov 2023 • Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles
To enable this framework, we devise a scalable pipeline that automatically generates high-quality, instruction-tuning datasets from readily available captioning data across different modalities, and contribute 24K QA data for audio and 250K QA data for 3D.
2 code implementations • 11 Aug 2023 • Zhiwei Liu, Weiran Yao, JianGuo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs).
1 code implementation • 4 Aug 2023 • Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, JianGuo Zhang, Devansh Arpit, ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, seems promising and can be applied to optimize other models in the agent architecture to enhance agent performances over time.
no code implementations • 18 Jul 2023 • Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
1 code implementation • CVPR 2024 • Le Xue, Ning Yu, Shu Zhang, Artemis Panagopoulou, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, ran Xu, Juan Carlos Niebles, Silvio Savarese
It achieves a new SOTA of 50. 6% (top-1) on Objaverse-LVIS and 84. 7% (top-1) on ModelNet40 in zero-shot classification.
Ranked #10 on
3D Point Cloud Classification
on ScanObjectNN
(using extra training data)
no code implementations • 6 Jan 2023 • Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Caiming Xiong, Tom Goldstein, Juan Carlos Niebles, ran Xu
By plugging our proposed modules into the state-of-the-art transformer-based 3D detectors, we improve the previous best results on both benchmarks, with more significant improvements on smaller objects.
1 code implementation • CVPR 2023 • Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, ran Xu, Juan Carlos Niebles, Silvio Savarese
Then, ULIP learns a 3D representation space aligned with the common image-text space, using a small number of automatically synthesized triplets.
Ranked #4 on
Training-free 3D Point Cloud Classification
on ModelNet40
(using extra training data)
2 code implementations • 10 May 2022 • Yu Fu, Yanyan Huang, Yalin Wang, Shunjie Dong, Le Xue, Xunzhao Yin, Qianqian Yang, Yiyu Shi, Cheng Zhuo
In this paper, we propose an end-to-end neural network architecture, referred to as optimal transport based feature pyramid fusion (OTFPF) network, for the brain age estimation with T1 MRIs.
no code implementations • 14 Feb 2022 • Yu Fu, Yanyan Huang, Meng Niu, Le Xue, Shunjie Dong, Shunlin Guo, Junqiang Lei, Cheng Zhuo
This study for the first time discussed the differences between MDD and HC using both rich club and diverse club metrics and found the complementarity of them in analyzing brain networks.
no code implementations • 14 Feb 2022 • Yu Fu, Shunjie Dong, Yi Liao, Le Xue, Yuanfan Xu, Feng Li, Qianqian Yang, Tianbai Yu, Mei Tian, Cheng Zhuo
18F-fluorodeoxyglucose (18F-FDG) Positron Emission Tomography (PET) imaging usually needs a full-dose radioactive tracer to obtain satisfactory diagnostic results, which raises concerns about the potential health risks of radiation exposure, especially for pediatric patients.
1 code implementation • 15 Dec 2021 • Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, ran Xu, Caiming Xiong
Unlike previous methods that only address a fixed set of field items, our method predicts target value for an arbitrary query based on the understanding of the layout and semantics of a form.
1 code implementation • 8 Oct 2021 • Le Xue, Mingfei Gao, Zeyuan Chen, Caiming Xiong, ran Xu
We propose a novel framework to evaluate the robustness of transformer-based form field extraction methods via form attacks.