no code implementations • 18 Nov 2024 • Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Kuo-Kun Tseng, Ruiping Wang
To address these limitations, we introduce RoboGSim, a real2sim2real robotic simulator powered by 3D Gaussian Splatting and a physics engine.
no code implementations • 8 Oct 2024 • Yifei Xing, Xiangyuan Lan, Ruiping Wang, Dongmei Jiang, Wenjun Huang, Qingfang Zheng, YaoWei Wang
In this work, we propose Empowering Multi-modal Mamba with Structural and Hierarchical Alignment (EMMA), which enables the MLLM to extract fine-grained visual information.
no code implementations • 3 Sep 2024 • Bin Fu, Qiyang Wan, Jialin Li, Ruiping Wang, Xilin Chen
Categorization, a core cognitive ability in humans that organizes objects based on common features, is essential to cognitive science as well as computer vision.
1 code implementation • 18 Jul 2024 • Changshuo Wang, Meiqing Wu, Siew-Kei Lam, Xin Ning, Shangshu Yu, Ruiping Wang, Weijun Li, Thambipillai Srikanthan
The core of GPSFormer is the Global Perception Module (GPM) and the Local Structure Fitting Convolution (LSFConv).
Ranked #4 on 3D Point Cloud Classification on ScanObjectNN
no code implementations • 15 Jul 2024 • Hongyu Wang, Shuming Ma, Ruiping Wang, Furu Wei
We introduce Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs).
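As a rough illustration of what "sparsely-activated" can mean in practice, the sketch below keeps only the k largest-magnitude entries of an activation vector and zeroes the rest. The helper name `top_k_sparsify` and the choice of magnitude-based top-k masking are assumptions made for illustration; the snippet above does not spell out the mechanism Q-Sparse itself uses.

```python
def top_k_sparsify(activations, k):
    """Keep the k largest-magnitude entries and zero the rest (hypothetical helper)."""
    if k <= 0:
        return [0.0] * len(activations)
    if k >= len(activations):
        return list(activations)
    # Indices ordered by descending magnitude; sorted() is stable,
    # so ties are broken in favour of the earlier position.
    order = sorted(range(len(activations)), key=lambda i: -abs(activations[i]))
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(activations)]


# Example: only the two strongest activations survive.
sparse = top_k_sparsify([0.1, -2.0, 0.5, 3.0, -0.2], k=2)
print(sparse)
```

With most entries set to zero, downstream matrix products only need the surviving positions, which is where the efficiency argument for activation sparsity usually comes from.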
no code implementations • 17 Jun 2024 • Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jeremy Liu, Ruiping Wang, Hao Dong
The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects. Observing the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have aimed to utilize these models to enhance robotic systems accordingly. However, these methods typically focus on high-level planning corrections using an additional MLLM, and make limited use of failed samples to correct low-level contact poses, where errors are particularly likely during articulated object manipulation. To address this gap, we propose an Autonomous Interactive Correction (AIC) MLLM, which makes use of previous low-level interaction experiences to correct SE(3) pose predictions for articulated objects.
1 code implementation • 24 May 2024 • Hongyu Wang, Jiayu Xu, Senwei Xie, Ruiping Wang, Jialin Li, Zhaojie Xie, Bin Zhang, Chuyan Xiong, Xilin Chen
In this work, we introduce M4U, a novel and challenging benchmark for assessing the capability of multi-discipline multilingual multimodal understanding and reasoning.
3 code implementations • 27 Feb 2024 • Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei
Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs).
1 code implementation • NeurIPS 2023 • Ziyi Bai, Ruiping Wang, Xilin Chen
Instead of that, we train an Encoder-Decoder to generate a set of dynamic event memories at the glancing stage.
Ranked #1 on Video Question Answering on AGQA 2.0 balanced
3 code implementations • 17 Oct 2023 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei
The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption.
1 code implementation • Winter Conference on Applications of Computer Vision (WACV) 2022 • Hui Nie, Ruiping Wang, Xilin Chen
Zero-Shot Detection (ZSD), which aims at localizing and recognizing unseen objects in a complicated scene, usually leverages the visual and semantic information of individual objects alone.
Ranked #6 on Generalized Zero-Shot Object Detection on MS-COCO
Generalized Zero-Shot Object Detection • Scene Understanding • +1
1 code implementation • 8 Nov 2021 • Fengyuan Yang, Ruiping Wang, Xilin Chen
However, humans can learn new classes quickly even from few samples, since they can tell which discriminative features to focus on for each category based on both visual and semantic prior knowledge.
no code implementations • CVPR 2021 • Sijin Wang, Ziwei Yao, Ruiping Wang, Zhongqin Wu, Xilin Chen
Then for evaluating the adequacy of the candidate caption, it highlights the image gist on the visual scene graph under the guidance of the reference captions.
no code implementations • 9 Mar 2021 • Xian Sun, Peijin Wang, Zhiyuan Yan, Feng Xu, Ruiping Wang, Wenhui Diao, Jin Chen, Jihao Li, Yingchao Feng, Tao Xu, Martin Weinmann, Stefan Hinz, Cheng Wang, Kun Fu
In this paper, we propose a novel benchmark dataset with more than 1 million instances and more than 15,000 images for Fine-grAined object recognItion in high-Resolution remote sensing imagery, named FAIR1M.
no code implementations • ICCV 2021 • Jiwei Xiao, Ruiping Wang, Xilin Chen
The inference of the HPG uses GRU to encode the pose features from their corresponding regions in a single RGB image, and passes messages along the graph structure iteratively to improve the predicted poses.
no code implementations • ICCV 2021 • Wenbin Wang, Ruiping Wang, Xilin Chen
To this end, we let the scene graph borrow the ability from the image caption so that it can be a specialist on the basis of remaining all-around, resulting in the so-called Topic Scene Graph.
no code implementations • ICCV 2021 • Difei Gao, Ruiping Wang, Ziyi Bai, Xilin Chen
Visual understanding goes well beyond the study of images or videos on the web.
1 code implementation • 14 Sep 2020 • Vincenzo Lomonaco, Lorenzo Pellegrini, Pau Rodriguez, Massimo Caccia, Qi She, Yu Chen, Quentin Jodelet, Ruiping Wang, Zheda Mai, David Vazquez, German I. Parisi, Nikhil Churamani, Marc Pickett, Issam Laradji, Davide Maltoni
In the last few years, we have witnessed a renewed and fast-growing interest in continual learning with deep neural networks with the shared objective of making current AI systems more adaptive, efficient and autonomous.
1 code implementation • ECCV 2020 • Wenbin Wang, Ruiping Wang, Shiguang Shan, Xilin Chen
Scene graph aims to faithfully reveal humans' perception of image content.
1 code implementation • CVPR 2020 • Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen
Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes.
no code implementations • 4 Nov 2019 • Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen
To tackle the key challenge of hashing on the manifold, a well-studied Riemannian kernel mapping is employed to project data (i.e., covariance matrices) into Euclidean space, which makes it possible to embed the two heterogeneous representations into a common Hamming space, where both intra-space discriminability and inter-space compatibility are considered.
1 code implementation • 11 Oct 2019 • Sijin Wang, Ruiping Wang, Ziwei Yao, Shiguang Shan, Xilin Chen
In light of the recent success of scene graphs in many CV and NLP tasks for describing complex natural scenes, we propose to represent image and text with two kinds of scene graphs: visual scene graph (VSG) and textual scene graph (TSG), each of which is exploited to jointly characterize objects and relationships in the corresponding modality.
no code implementations • 25 Sep 2019 • Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen
In this paper, we propose the hierarchical disentangle network (HDN) to exploit the rich hierarchical characteristics among categories to divide the disentangling process in a coarse-to-fine manner, such that each level only focuses on learning the specific representations in its granularity and finally the common and unique representations in all granularities jointly constitute the raw object.
no code implementations • ICCV 2019 • Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen
Zero-shot learning (ZSL) is a challenging problem that aims to recognize the target categories without seen data, where semantic information is leveraged to transfer knowledge from some source classes.
Ranked #7 on Zero-Shot Learning on SUN Attribute
no code implementations • 8 Aug 2019 • Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen
To comprehensively evaluate such abilities, we propose a VQA benchmark, CRIC, which introduces new types of questions about Compositional Reasoning on vIsion and Commonsense, and an evaluation metric integrating the correctness of answering and commonsense grounding.
no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou
This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.
no code implementations • ECCV 2018 • Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen
Zero-shot learning (ZSL) aims to recognize objects of novel classes without any training samples of specific classes, which is achieved by exploiting the semantic information and auxiliary datasets.
no code implementations • CVPR 2018 • Yong Liu, Ruiping Wang, Shiguang Shan, Xilin Chen
Context is important for accurate visual recognition.
no code implementations • ICCV 2017 • Huajie Jiang, Ruiping Wang, Shiguang Shan, Yi Yang, Xilin Chen
Zero-shot learning (ZSL) aims to transfer knowledge from observed classes to the unseen classes, based on the assumption that both the seen and unseen classes share a common semantic space, among which attributes enjoy a great popularity.
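The shared-semantic-space assumption can be made concrete with a small sketch: every class, seen or unseen, is described by an attribute vector, and an image whose attributes have been predicted is assigned to the unseen class with the most similar attribute vector. The attribute values, class names, and the `predict_unseen_class` helper are hypothetical, and cosine similarity is only one reasonable choice; this is not the method proposed in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_unseen_class(predicted_attributes, class_attributes):
    """Return the class whose attribute vector best matches the prediction (hypothetical helper)."""
    return max(
        class_attributes,
        key=lambda name: cosine(predicted_attributes, class_attributes[name]),
    )


# Made-up attribute order: [striped, four-legged, aquatic].
unseen_classes = {
    "zebra": [1.0, 1.0, 0.0],
    "whale": [0.0, 0.0, 1.0],
}

# Attributes assumed to come from a predictor trained on seen classes only.
label = predict_unseen_class([0.9, 0.8, 0.1], unseen_classes)
print(label)
```

No zebra or whale image is needed at training time in this setup; only their attribute descriptions are, which is what lets knowledge transfer from seen to unseen classes.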
no code implementations • CVPR 2017 • Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen
In this paper we propose a unified framework to address multiple realistic image retrieval tasks concerning both category and attributes.
no code implementations • CVPR 2017 • Wen Wang, Ruiping Wang, Shiguang Shan, Xilin Chen
For face recognition with image sets, while most existing works mainly focus on building robust set models with hand-crafted features, it remains a research gap to learn better image representations which can closely match the subsequent image set modeling and classification.
no code implementations • 17 Aug 2016 • Zhiwu Huang, Ruiping Wang, Xianqiu Li, Wenxian Liu, Shiguang Shan, Luc van Gool, Xilin Chen
Specifically, by exploiting the Riemannian geometry of the manifold of fixed-rank Positive Semidefinite (PSD) matrices, we present a new solution to reduce optimizing over the space of column full-rank transformation matrices to optimizing on the PSD manifold which has a well-established Riemannian structure.
no code implementations • 15 Aug 2016 • Zhiwu Huang, Ruiping Wang, Shiguang Shan, Luc van Gool, Xilin Chen
With this mapping, the problem of learning a cross-view metric between the two source heterogeneous spaces can be expressed as learning a single-view Euclidean distance metric in the target common Euclidean space.
no code implementations • 19 Jul 2016 • Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen
Recent years have seen more and more demand for a unified framework to address multiple realistic image retrieval tasks concerning both category and attributes.
1 code implementation • CVPR 2016 • Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen
In this paper, we present a new hashing method to learn compact binary codes for highly efficient image retrieval on large-scale datasets.
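To show why compact binary codes make retrieval efficient, the sketch below ranks a small database by Hamming distance, which reduces to an XOR followed by a bit count. The 8-bit codes, the item names, and the helpers `hamming` and `retrieve` are made up for illustration; how the codes are learned is the subject of the paper and is not reproduced here.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def retrieve(query_code, database, top_n=2):
    """Return the names of the top_n items closest to the query in Hamming distance."""
    ranked = sorted(database.items(), key=lambda item: hamming(query_code, item[1]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical 8-bit codes; real systems typically use longer codes.
database = {
    "cat_1": 0b10110010,
    "cat_2": 0b10110110,
    "truck_1": 0b01001101,
}

results = retrieve(0b10110011, database)
print(results)
```

Because each comparison is a couple of integer operations on a few bytes, a linear scan over millions of codes stays cheap in both time and memory compared with distances over real-valued features.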
Ranked #1 on Image Retrieval on CIFAR-10
no code implementations • ICCV 2015 • Yan Li, Ruiping Wang, Haomiao Liu, Huajie Jiang, Shiguang Shan, Xilin Chen
In this way, the learned binary codes can be applied to not only fine-grained face image retrieval, but also facial attributes prediction, which is the very innovation of this work, just like killing two birds with one stone.
no code implementations • 16 Nov 2015 • Mengyi Liu, Ruiping Wang, Shiguang Shan, Xilin Chen
Human action recognition remains a challenging task due to the various sources of video data and large intra-class variations.
no code implementations • 16 Nov 2015 • Mengyi Liu, Shiguang Shan, Ruiping Wang, Xilin Chen
3) the local modes on each STM can be instantiated by fitting to UMM, and the corresponding expressionlet is constructed by modeling the variations in each local mode.
Dynamic Facial Expression Recognition • Facial Expression Recognition • +1
no code implementations • CVPR 2015 • Yan Li, Ruiping Wang, Zhiwu Huang, Shiguang Shan, Xilin Chen
Retrieving videos of a specific person given his/her face image as query becomes more and more appealing for applications like smart movie fast-forwards and suspect searching.
no code implementations • CVPR 2015 • Wen Wang, Ruiping Wang, Zhiwu Huang, Shiguang Shan, Xilin Chen
This paper presents a method named Discriminant Analysis on Riemannian manifold of Gaussian distributions (DARG) to solve the problem of face recognition with image sets.
no code implementations • CVPR 2015 • Zhiwu Huang, Ruiping Wang, Shiguang Shan, Xilin Chen
In video based face recognition, great success has been made by representing videos as linear subspaces, which typically lie in a special type of non-Euclidean space known as Grassmann manifold.
no code implementations • BioMedical Engineering OnLine 2014 • Huifang Huang, Jie Liu, Qiang Zhu, Ruiping Wang, Guangshu Hu
This was done in order to improve the classification performance of these two classes of heartbeats by using different features and classification methods.
Ranked #2 on Heartbeat Classification on MIT-BIH AR
no code implementations • CVPR 2014 • Mengyi Liu, Shiguang Shan, Ruiping Wang, Xilin Chen
In this paper, we attempt to solve both problems via manifold modeling of videos based on a novel mid-level representation, i.e., the expressionlet.
Dynamic Facial Expression Recognition • Facial Expression Recognition • +1
no code implementations • CVPR 2014 • Zhiwu Huang, Ruiping Wang, Shiguang Shan, Xilin Chen
Since the points commonly lie in Euclidean space while the sets are typically modeled as elements on a Riemannian manifold, they can be treated as Euclidean points and Riemannian points respectively.