Search Results for author: Ruiping Wang

Found 44 papers, 12 papers with code

RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator

no code implementations 18 Nov 2024 Xinhai Li, Jialin Li, Ziheng Zhang, Rui Zhang, Fan Jia, Tiancai Wang, Haoqiang Fan, Kuo-Kun Tseng, Ruiping Wang

To address these limitations, we introduce RoboGSim, a real2sim2real robotic simulator powered by 3D Gaussian Splatting and a physics engine.

EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment

no code implementations 8 Oct 2024 Yifei Xing, Xiangyuan Lan, Ruiping Wang, Dongmei Jiang, Wenjun Huang, Qingfang Zheng, YaoWei Wang

In this work, we propose Empowering Multi-modal Mamba with Structural and Hierarchical Alignment (EMMA), which enables the MLLM to extract fine-grained visual information.

cross-modal alignment, Hallucination

Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models

no code implementations 3 Sep 2024 Bin Fu, Qiyang Wan, Jialin Li, Ruiping Wang, Xilin Chen

Categorization, a core cognitive ability in humans that organizes objects based on common features, is essential to cognitive science as well as computer vision.

Question Answering, Visual Question Answering

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

no code implementations 15 Jul 2024 Hongyu Wang, Shuming Ma, Ruiping Wang, Furu Wei

We introduce Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs).
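The snippet does not say how sparsity is enforced. A common way to sparsify activations is top-K magnitude selection: keep only the K largest-magnitude entries of an activation vector and zero the rest, so a following matrix-vector product only needs the matching K weight columns. The sketch below assumes that scheme; it is an illustration, not the Q-Sparse implementation, and `top_k_sparsify` and `sparse_matvec` are hypothetical helper names.

```python
def top_k_sparsify(x: list[float], k: int) -> list[float]:
    """Keep the k largest-magnitude entries of x and zero out the rest (assumed scheme).

    sorted() is stable even with reverse=True, so ties are broken by
    position and exactly min(k, len(x)) entries survive.
    """
    if k <= 0:
        return [0.0] * len(x)
    ranked = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def sparse_matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Compute w @ x while reading only the columns of w where x is nonzero."""
    nz = [j for j, v in enumerate(x) if v != 0.0]
    return [sum(row[j] * x[j] for j in nz) for row in w]


activations = [0.1, -2.0, 0.5, 3.0, -0.05, 1.2]
print(top_k_sparsify(activations, k=2))  # [0.0, -2.0, 0.0, 3.0, 0.0, 0.0]
```

With k = 2 out of 6 entries, only a third of the weight columns are touched in `sparse_matvec`, which is where the compute saving of sparse activation comes from.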

AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation

no code implementations 17 Jun 2024 Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jeremy Liu, Ruiping Wang, Hao Dong

The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects. Given the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have used these models to enhance robotic systems. However, these methods typically focus on correcting high-level plans with an additional MLLM and make limited use of failed samples to correct low-level contact poses, an error that is particularly common in articulated object manipulation. To address this gap, we propose an Autonomous Interactive Correction (AIC) MLLM, which uses previous low-level interaction experiences to correct SE(3) pose predictions for articulated objects.

Pose Prediction, Test-time Adaptation

M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models

1 code implementation 24 May 2024 Hongyu Wang, Jiayu Xu, Senwei Xie, Ruiping Wang, Jialin Li, Zhaojie Xie, Bin Zhang, Chuyan Xiong, Xilin Chen

In this work, we introduce M4U, a novel and challenging benchmark for assessing the capability of multi-discipline multilingual multimodal understanding and reasoning.

Multimodal Reasoning

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

3 code implementations 27 Feb 2024 Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs).
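The "1.58" in the title is the information content of a ternary weight: a value in {-1, 0, +1} carries log2(3) ≈ 1.585 bits. The sketch below ternarizes a list of weights with absmean scaling (divide by the mean absolute weight, round, then clip to [-1, 1]). We believe this is the scheme described for BitNet b1.58, but treat it as an illustrative assumption rather than the authors' code; `absmean_ternarize` is a hypothetical name.

```python
import math


def absmean_ternarize(weights: list[float], eps: float = 1e-5) -> tuple[list[int], float]:
    """Quantize weights to {-1, 0, +1} with absmean scaling (assumed scheme).

    Returns the ternary values q and the scale gamma, so that
    weights[i] is approximated by q[i] * gamma.
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    # round() gives an int; clipping keeps every value in {-1, 0, +1}.
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma


bits_per_weight = math.log2(3)  # information content of one ternary value

q, gamma = absmean_ternarize([0.4, -1.2, 0.05, 0.9])
print(q)                         # [1, -1, 0, 1]
print(f"{bits_per_weight:.2f}")  # 1.58
```

Scaling by the mean absolute value before rounding means small weights collapse to 0 while larger ones keep their sign, so multiplications by q reduce to additions, subtractions, or skips.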

BitNet: Scaling 1-bit Transformers for Large Language Models

3 code implementations 17 Oct 2023 Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Huaijie Wang, Lingxiao Ma, Fan Yang, Ruiping Wang, Yi Wu, Furu Wei

The increasing size of large language models has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption.

Language Modelling, Quantization

SEGA: Semantic Guided Attention on Visual Prototype for Few-Shot Learning

1 code implementation 8 Nov 2021 Fengyuan Yang, Ruiping Wang, Xilin Chen

However, humans can learn new classes quickly from only a few samples, because they can tell which discriminative features to focus on for each category based on both visual and semantic prior knowledge.

feature selection, Few-Shot Learning

FAIEr: Fidelity and Adequacy Ensured Image Caption Evaluation

no code implementations CVPR 2021 Sijin Wang, Ziwei Yao, Ruiping Wang, Zhongqin Wu, Xilin Chen

Then, to evaluate the adequacy of the candidate caption, it highlights the image gist on the visual scene graph under the guidance of the reference captions.

Image Captioning

FAIR1M: A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery

no code implementations 9 Mar 2021 Xian Sun, Peijin Wang, Zhiyuan Yan, Feng Xu, Ruiping Wang, Wenhui Diao, Jin Chen, Jihao Li, Yingchao Feng, Tao Xu, Martin Weinmann, Stefan Hinz, Cheng Wang, Kun Fu

In this paper, we propose a novel benchmark dataset named FAIR1M, with more than 1 million instances and more than 15,000 images, for Fine-grAined object recognItion in high-Resolution remote sensing imagery.

Deep Learning, Object

Holistic Pose Graph: Modeling Geometric Structure Among Objects in a Scene Using Graph Inference for 3D Object Prediction

no code implementations ICCV 2021 Jiwei Xiao, Ruiping Wang, Xilin Chen

The inference of the HPG uses a GRU to encode the pose features from their corresponding regions in a single RGB image, and passes messages along the graph structure iteratively to improve the predicted poses.

Object Pose Estimation

Topic Scene Graph Generation by Attention Distillation From Caption

no code implementations ICCV 2021 Wenbin Wang, Ruiping Wang, Xilin Chen

To this end, we let the scene graph borrow the ability of the image caption so that it can become a specialist while remaining all-around, resulting in the so-called Topic Scene Graph.

Caption Generation, Graph Generation

CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions

1 code implementation 14 Sep 2020 Vincenzo Lomonaco, Lorenzo Pellegrini, Pau Rodriguez, Massimo Caccia, Qi She, Yu Chen, Quentin Jodelet, Ruiping Wang, Zheda Mai, David Vazquez, German I. Parisi, Nikhil Churamani, Marc Pickett, Issam Laradji, Davide Maltoni

In the last few years, we have witnessed a renewed and fast-growing interest in continual learning with deep neural networks with the shared objective of making current AI systems more adaptive, efficient and autonomous.

Benchmarking, Continual Learning

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

1 code implementation CVPR 2020 Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen

Then, we introduce three aggregators which guide the message passing from one graph to another to utilize the contexts in various modalities, so as to refine the features of nodes.

Graph Neural Network, Question Answering

Deep Heterogeneous Hashing for Face Video Retrieval

no code implementations 4 Nov 2019 Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen

To tackle the key challenge of hashing on the manifold, a well-studied Riemannian kernel mapping is employed to project data (i.e., covariance matrices) into Euclidean space, which makes it possible to embed the two heterogeneous representations into a common Hamming space where both intra-space discriminability and inter-space compatibility are considered.

Retrieval, Video Retrieval
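The snippet does not name the kernel mapping. One well-studied choice for covariance (SPD) matrices is the log-Euclidean mapping: take the matrix logarithm and vectorize it, after which ordinary Euclidean tools, including sign-based hashing, apply. The sketch below assumes that mapping plus an untrained random projection for the hash; it is not the DHH model, and `spd_log`, `log_euclidean_vector`, and `sign_hash` are hypothetical names.

```python
import numpy as np


def spd_log(c: np.ndarray) -> np.ndarray:
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(c)
    # Scale each eigenvector column by log(eigenvalue): V diag(log w) V^T.
    return (v * np.log(w)) @ v.T


def log_euclidean_vector(c: np.ndarray) -> np.ndarray:
    """Flatten log(C) to a Euclidean vector (upper triangle).

    Off-diagonal entries are scaled by sqrt(2) so the vector's Euclidean
    norm equals the Frobenius norm of log(C).
    """
    l = spd_log(c)
    iu = np.triu_indices(l.shape[0])
    scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return l[iu] * scale


def sign_hash(x: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Binary code in {-1, +1} from a linear projection (hypothetical, untrained proj)."""
    return np.where(proj @ x >= 0, 1, -1)


rng = np.random.default_rng(0)
feats = rng.standard_normal((50, 3))                         # e.g. per-frame features of one video
cov = np.cov(feats, rowvar=False) + 1e-3 * np.eye(3)         # regularized to stay SPD
vec = log_euclidean_vector(cov)                              # 6-dimensional for a 3x3 matrix
code = sign_hash(vec, rng.standard_normal((8, vec.size)))    # 8-bit code
```

In the actual method the projection into Hamming space is learned jointly with the image-side codes; the random `proj` here only shows where that step sits.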

Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval

1 code implementation 11 Oct 2019 Sijin Wang, Ruiping Wang, Ziwei Yao, Shiguang Shan, Xilin Chen

In light of the recent success of scene graphs in many CV and NLP tasks for describing complex natural scenes, we propose to represent image and text with two kinds of scene graphs: visual scene graph (VSG) and textual scene graph (TSG), each of which is exploited to jointly characterize objects and relationships in the corresponding modality.

Graph Matching, Image-text Retrieval

Hierarchical Disentangle Network for Object Representation Learning

no code implementations 25 Sep 2019 Shishi Qiao, Ruiping Wang, Shiguang Shan, Xilin Chen

In this paper, we propose the hierarchical disentangle network (HDN) to exploit the rich hierarchical characteristics among categories to divide the disentangling process in a coarse-to-fine manner, such that each level focuses only on learning the representations specific to its granularity, and the common and unique representations across all granularities jointly constitute the raw object.

Decoder, Disentanglement

Transferable Contrastive Network for Generalized Zero-Shot Learning

no code implementations ICCV 2019 Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen

Zero-shot learning (ZSL) is a challenging problem that aims to recognize target categories for which no training data is available, leveraging semantic information to transfer knowledge from source classes.

Generalized Zero-Shot Learning, Transfer Learning

CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense

no code implementations 8 Aug 2019 Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen

To comprehensively evaluate such abilities, we propose a VQA benchmark, CRIC, which introduces new types of questions about Compositional Reasoning on vIsion and Commonsense, and an evaluation metric integrating the correctness of answering and commonsense grounding.

Question Answering, Visual Question Answering (VQA)

Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition

no code implementations ECCV 2018 Huajie Jiang, Ruiping Wang, Shiguang Shan, Xilin Chen

Zero-shot learning (ZSL) aims to recognize objects of novel classes without any training samples of specific classes, which is achieved by exploiting the semantic information and auxiliary datasets.

Dictionary Learning, Zero-Shot Learning

Learning Discriminative Latent Attributes for Zero-Shot Classification

no code implementations ICCV 2017 Huajie Jiang, Ruiping Wang, Shiguang Shan, Yi Yang, Xilin Chen

Zero-shot learning (ZSL) aims to transfer knowledge from seen classes to unseen classes, based on the assumption that both seen and unseen classes share a common semantic space; attributes are a particularly popular choice for this space.

Attribute, Classification
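The shared-semantic-space idea behind attribute-based ZSL can be made concrete: map a visual feature into attribute space, then predict the unseen class whose attribute signature is most similar. The sketch below is a generic attribute-space classifier, not the latent-attribute method of the paper; it assumes the projection `w` has already been learned on seen classes, and all names and attribute values are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def project(w: list[list[float]], x: list[float]) -> list[float]:
    """Map a visual feature x into attribute space: a = W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def zero_shot_predict(x: list[float], w: list[list[float]],
                      class_attributes: dict[str, list[float]]) -> str:
    """Return the unseen class whose attribute signature best matches W x."""
    a = project(w, x)
    return max(class_attributes, key=lambda c: cosine(a, class_attributes[c]))


# Hypothetical attributes: [has_stripes, has_hooves, is_aquatic]
unseen = {"zebra": [1.0, 1.0, 0.0], "dolphin": [0.0, 0.0, 1.0]}
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(zero_shot_predict([0.9, 0.7, 0.1], identity, unseen))  # zebra
```

No image of a zebra or dolphin is needed at training time: only the class-level attribute vectors, which is what makes the transfer "zero-shot".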

Learning Multifunctional Binary Codes for Both Category and Attribute Oriented Retrieval Tasks

no code implementations CVPR 2017 Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen

In this paper, we propose a unified framework to address multiple realistic image retrieval tasks concerning both categories and attributes.

Attribute, Image Retrieval

Discriminative Covariance Oriented Representation Learning for Face Recognition With Image Sets

no code implementations CVPR 2017 Wen Wang, Ruiping Wang, Shiguang Shan, Xilin Chen

For face recognition with image sets, most existing works focus on building robust set models with hand-crafted features, while learning better image representations that closely match the subsequent image set modeling and classification remains a research gap.

Face Recognition, General Classification

Geometry-aware Similarity Learning on SPD Manifolds for Visual Recognition

no code implementations 17 Aug 2016 Zhiwu Huang, Ruiping Wang, Xianqiu Li, Wenxian Liu, Shiguang Shan, Luc van Gool, Xilin Chen

Specifically, by exploiting the Riemannian geometry of the manifold of fixed-rank Positive Semidefinite (PSD) matrices, we present a new solution to reduce optimizing over the space of column full-rank transformation matrices to optimizing on the PSD manifold which has a well-established Riemannian structure.

Cross Euclidean-to-Riemannian Metric Learning with Application to Face Recognition from Video

no code implementations 15 Aug 2016 Zhiwu Huang, Ruiping Wang, Shiguang Shan, Luc van Gool, Xilin Chen

With this mapping, the problem of learning a cross-view metric between the two source heterogeneous spaces can be expressed as learning a single-view Euclidean distance metric in the target common Euclidean space.

Face Recognition, Metric Learning

Dual Purpose Hashing

no code implementations 19 Jul 2016 Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen

Recent years have seen growing demand for a unified framework to address multiple realistic image retrieval tasks concerning both categories and attributes.

Attribute, Image Retrieval

Deep Supervised Hashing for Fast Image Retrieval

1 code implementation CVPR 2016 Haomiao Liu, Ruiping Wang, Shiguang Shan, Xilin Chen

In this paper, we present a new hashing method to learn compact binary codes for highly efficient image retrieval on large-scale datasets.

Image Retrieval, Retrieval
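The payoff of compact binary codes is cheap search: once each image has a short code, the database can be ranked by Hamming distance with XOR and bit counting. The snippet does not give DSH's training details, so the sketch below shows only the retrieval side: thresholding real-valued network outputs at zero, packing codes into integers, and ranking. It is an illustration, not the paper's code; all names are hypothetical.

```python
def binarize(outputs: list[float]) -> list[int]:
    """Threshold real-valued network outputs at zero to get a {-1, +1} code."""
    return [1 if o >= 0 else -1 for o in outputs]


def pack_code(bits: list[int]) -> int:
    """Pack a {-1, +1} code into an integer, one bit per dimension (+1 -> 1)."""
    value = 0
    for b in bits:
        value = (value << 1) | (1 if b > 0 else 0)
    return value


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: dict[str, int]) -> list[tuple[str, int]]:
    """Return (item, distance) pairs sorted by ascending Hamming distance."""
    return sorted(((k, hamming(query, v)) for k, v in database.items()),
                  key=lambda kv: kv[1])


db = {
    "img_a": pack_code([1, 1, -1, -1]),
    "img_b": pack_code([1, -1, -1, 1]),
    "img_c": pack_code([-1, -1, 1, 1]),
}
q = pack_code(binarize([0.8, 0.3, -0.2, -0.9]))
print(rank_by_hamming(q, db))  # [('img_a', 0), ('img_b', 2), ('img_c', 4)]
```

`bin(...).count("1")` is used instead of `int.bit_count()` so the sketch also runs on Python versions before 3.10.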

Two Birds, One Stone: Jointly Learning Binary Code for Large-Scale Face Image Retrieval and Attributes Prediction

no code implementations ICCV 2015 Yan Li, Ruiping Wang, Haomiao Liu, Huajie Jiang, Shiguang Shan, Xilin Chen

In this way, the learned binary codes can be applied not only to fine-grained face image retrieval but also to facial attribute prediction, which is the key innovation of this work, just like killing two birds with one stone.

Face Image Retrieval, Retrieval

Learning Mid-level Words on Riemannian Manifold for Action Recognition

no code implementations 16 Nov 2015 Mengyi Liu, Ruiping Wang, Shiguang Shan, Xilin Chen

Human action recognition remains a challenging task due to the various sources of video data and large intra-class variations.

Action Recognition, Clustering

Learning Expressionlets via Universal Manifold Model for Dynamic Facial Expression Recognition

no code implementations 16 Nov 2015 Mengyi Liu, Shiguang Shan, Ruiping Wang, Xilin Chen

3) the local modes on each STM can be instantiated by fitting to UMM, and the corresponding expressionlet is constructed by modeling the variations in each local mode.

Dynamic Facial Expression Recognition, Facial Expression Recognition

Face Video Retrieval With Image Query via Hashing Across Euclidean Space and Riemannian Manifold

no code implementations CVPR 2015 Yan Li, Ruiping Wang, Zhiwu Huang, Shiguang Shan, Xilin Chen

Retrieving videos of a specific person given his/her face image as a query is increasingly appealing for applications such as smart movie fast-forwarding and suspect searching.

Retrieval, Video Retrieval

Discriminant Analysis on Riemannian Manifold of Gaussian Distributions for Face Recognition With Image Sets

no code implementations CVPR 2015 Wen Wang, Ruiping Wang, Zhiwu Huang, Shiguang Shan, Xilin Chen

This paper presents a method named Discriminant Analysis on Riemannian manifold of Gaussian distributions (DARG) to solve the problem of face recognition with image sets.

Face Identification, Face Recognition

Projection Metric Learning on Grassmann Manifold With Application to Video Based Face Recognition

no code implementations CVPR 2015 Zhiwu Huang, Ruiping Wang, Shiguang Shan, Xilin Chen

In video-based face recognition, great success has been achieved by representing videos as linear subspaces, which typically lie in a special type of non-Euclidean space known as the Grassmann manifold.

Dimensionality Reduction, Face Recognition
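Treating each video as a linear subspace requires a distance between subspaces. The title suggests the method builds on the standard projection metric on the Grassmann manifold: for q-dimensional subspaces with orthonormal bases U1 and U2, d² = q − ‖U1ᵀU2‖²_F, which equals ‖U1U1ᵀ − U2U2ᵀ‖²_F / 2. The sketch below assumes each video is summarized by an orthonormalized set of frame features and omits the learning step entirely; the function names are hypothetical.

```python
import math


def orthonormalize(vectors: list[list[float]], tol: float = 1e-10) -> list[list[float]]:
    """Modified Gram-Schmidt: orthonormal basis for the span of the input vectors."""
    basis: list[list[float]] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > tol:  # skip vectors already in the span
            basis.append([x / norm for x in w])
    return basis


def projection_distance(u1: list[list[float]], u2: list[list[float]]) -> float:
    """Projection metric between two q-dimensional subspaces given orthonormal bases.

    d^2 = q - ||U1^T U2||_F^2, i.e. ||U1 U1^T - U2 U2^T||_F / sqrt(2).
    """
    if len(u1) != len(u2):
        raise ValueError("subspaces must have the same dimension q")
    overlap = sum(sum(x * y for x, y in zip(a, b)) ** 2 for a in u1 for b in u2)
    return math.sqrt(max(len(u1) - overlap, 0.0))  # clamp tiny negative round-off


frames_a = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # hypothetical features spanning the xy-plane
frames_b = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]  # spanning the xz-plane
d = projection_distance(orthonormalize(frames_a), orthonormalize(frames_b))
print(round(d, 6))  # 1.0
```

The two planes share one direction (x) and differ in the other, so one of the two principal angles is 90° and the distance is exactly 1.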

Learning Euclidean-to-Riemannian Metric for Point-to-Set Classification

no code implementations CVPR 2014 Zhiwu Huang, Ruiping Wang, Shiguang Shan, Xilin Chen

Since the points commonly lie in Euclidean space while the sets are typically modeled as elements on Riemannian manifold, they can be treated as Euclidean points and Riemannian points respectively.

Classification, General Classification
