Search Results for author: Jing Shao

Found 79 papers, 45 papers with code

RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

no code implementations · 28 Mar 2024 · Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments.

Motion Planning

Assessment of Multimodal Large Language Models in Alignment with Human Values

1 code implementation · 26 Mar 2024 · Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

Large Language Models (LLMs) aim to serve as versatile assistants aligned with human values, as defined by the principles of being helpful, honest, and harmless (HHH).

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

1 code implementation · 18 Mar 2024 · Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao

It is a long-standing goal to design a generalist embodied agent that can follow diverse instructions in human-like ways.

Instruction Following

Endora: Video Generation Models as Endoscopy Simulators

no code implementations · 17 Mar 2024 · Chenxin Li, Hengyu Liu, Yifan Liu, Brandon Y. Feng, Wuyang Li, Xinyu Liu, Zhen Chen, Jing Shao, Yixuan Yuan

In a nutshell, Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research, setting the stage for further advances in medical content generation.

Data Augmentation, Video Generation

Exploring Safety Generalization Challenges of Large Language Models via Code

no code implementations · 12 Mar 2024 · Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Yu Qiao, Wai Lam, Lizhuang Ma

The rapid advancement of Large Language Models (LLMs) has brought about remarkable generative capabilities but also raised concerns about their potential misuse.

Code Completion

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models

1 code implementation · 29 Feb 2024 · Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao

This research provides an initial exploration of trustworthiness modeling during LLM pre-training, seeking to unveil new insights and spur further developments in the field.

Fairness, Mutual Information Estimation

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

1 code implementation · 14 Feb 2024 · Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao

Large Language Models (LLMs) are now commonplace in conversation applications.

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models

1 code implementation · 7 Feb 2024 · Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, WangMeng Zuo, Dahua Lin, Yu Qiao, Jing Shao

In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount.

Multiple-choice

PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety

1 code implementation · 22 Jan 2024 · Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao

In this paper, we explore these concerns through the innovative lens of agent psychology, revealing that the dark psychological states of agents constitute a significant threat to safety.

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

1 code implementation · 12 Dec 2023 · Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao

It is a long-standing goal to design an embodied system that can solve long-horizon open-world tasks in human-like ways.

Self Model for Embodied Intelligence: Modeling Full-Body Human Musculoskeletal System and Locomotion Control with Hierarchical Low-Dimensional Representation

no code implementations · 9 Dec 2023 · Kaibo He, Chenhui Zuo, Jing Shao, Yanan Sui

Modeling and control of the human musculoskeletal system is important for understanding human motor functions, developing embodied intelligence, and optimizing human-robot interaction systems.

Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE

1 code implementation · 5 Nov 2023 · Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao

While this phenomenon has been overlooked in previous work, we propose a novel and extensible framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with Multimodal Large Language Models (MLLMs).

Zero-shot Generalization

ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models

1 code implementation · 5 Nov 2023 · Zhelun Shi, Zhipin Wang, Hongxing Fan, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

We will publicly release all the detailed implementations for further analysis, as well as an easy-to-use modular toolkit for the integration of new recipes and models, so that ChEF can be a growing evaluation framework for the MLLM community.

Hallucination, In-Context Learning +2

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

1 code implementation · 5 Oct 2023 · Zhanhui Zhou, Jie Liu, Chao Yang, Jing Shao, Yu Liu, Xiangyu Yue, Wanli Ouyang, Yu Qiao

A single language model (LM), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences.

Language Modelling, Long Form Question Answering

UniG3D: A Unified 3D Object Generation Dataset

no code implementations · 19 Jun 2023 · Qinghong Sun, Yangguang Li, Zexiang Liu, Xiaoshui Huang, Fenggang Liu, Xihui Liu, Wanli Ouyang, Jing Shao

However, the quality and diversity of existing 3D object generation methods are constrained by the inadequacies of existing 3D object datasets, including issues related to text quality, the incompleteness of multi-modal data representation encompassing 2D rendered images and 3D assets, as well as the size of the dataset.

Autonomous Driving, Object

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark

1 code implementation · NeurIPS 2023 · Zhenfei Yin, Jiong Wang, JianJian Cao, Zhelun Shi, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai, Xiaoshui Huang, Zhiyong Wang, Jing Shao, Wanli Ouyang

To the best of our knowledge, we present one of the very first open-source endeavors in the field, LAMM, encompassing a Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.

Latent Distribution Adjusting for Face Anti-Spoofing

2 code implementations · 16 May 2023 · Qinghong Sun, Zhenfei Yin, Yichao Wu, Yuanhan Zhang, Jing Shao

In this work, we propose a unified framework called Latent Distribution Adjusting (LDA), which is latent, discriminative, adaptive, and generic, to improve the robustness of the FAS model by adjusting complex data distributions with multiple prototypes.

Face Anti-Spoofing, Prototype Selection

Mask Hierarchical Features For Self-Supervised Learning

no code implementations · 1 Apr 2023 · Fenggang Liu, Yangguang Li, Feng Liang, Jilan Xu, Bin Huang, Jing Shao

We mask some of the patches in the representation space and then use the sparse visible patches to reconstruct high-level semantic image representations.

Object Detection +1

Siamese DETR

1 code implementation · CVPR 2023 · Zeren Chen, Gengshi Huang, Wei Li, Jianing Teng, Kun Wang, Jing Shao, Chen Change Loy, Lu Sheng

In this work, we present Siamese DETR, a Siamese self-supervised pretraining approach for the Transformer architecture in DETR.

MULTI-VIEW LEARNING, Representation Learning

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

1 code implementation · 29 Jan 2023 · Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao

Fast-BEV consists of five parts. We propose (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder that leverages multi-scale information for better performance, (3) an efficient BEV encoder specifically designed to speed up on-vehicle inference.

Data Augmentation

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception

1 code implementation · 19 Jan 2023 · Bin Huang, Yangguang Li, Enze Xie, Feng Liang, Luya Wang, Mingzhu Shen, Fenggang Liu, Tianqi Wang, Ping Luo, Jing Shao

Recently, pure camera-based Bird's-Eye-View (BEV) perception has removed the need for expensive LiDAR sensors, making it a feasible solution for economical autonomous driving.

Autonomous Driving, Data Augmentation

BEVBert: Multimodal Map Pre-training for Language-guided Navigation

1 code implementation · ICCV 2023 · Dong An, Yuankai Qi, Yangguang Li, Yan Huang, Liang Wang, Tieniu Tan, Jing Shao

Concretely, we build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map.

Vision and Language Navigation, Visual Navigation

R²F: A General Retrieval, Reading and Fusion Framework for Document-level Natural Language Inference

1 code implementation · 22 Oct 2022 · Hao Wang, Yixin Cao, Yangguang Li, Zhen Huang, Kun Wang, Jing Shao

Document-level natural language inference (DOCNLI) is a new challenging task in natural language processing, aiming at judging the entailment relationship between a pair of hypothesis and premise documents.

Natural Language Inference, Retrieval +1

Towards Accurate Binary Neural Networks via Modeling Contextual Dependencies

1 code implementation · 3 Sep 2022 · Xingrun Xing, Yangguang Li, Wei Li, Wenrui Ding, Yalong Jiang, Yufeng Wang, Jing Shao, Chunlei Liu, Xianglong Liu

Second, to improve the robustness of binary models with contextual dependencies, we compute the contextual dynamic embeddings to determine the binarization thresholds in general binary convolutional blocks.

Binarization, Inductive Bias

Task-Balanced Distillation for Object Detection

no code implementations · 5 Aug 2022 · Ruining Tang, Zhenyu Liu, Yangguang Li, Yiguo Song, Hui Liu, Qide Wang, Jing Shao, Guifang Duan, Jianrong Tan

To alleviate this problem, a novel Task-decoupled Feature Distillation (TFD) is proposed by flexibly balancing the contributions of classification and regression tasks.

Classification, Knowledge Distillation +4

Benchmarking Omni-Vision Representation through the Lens of Visual Realms

1 code implementation · 14 Jul 2022 · Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu

We benchmark ReCo and other advances in omni-vision representation studies that are different in architectures (from CNNs to transformers) and in learning paradigms (from supervised learning to self-supervised learning) on OmniBenchmark.

Benchmarking, Contrastive Learning +2

1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)

1 code implementation · 23 Jun 2022 · Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao

Our model consists of three modules: the candidate waypoints predictor (CWP), the history enhanced planner and the tryout controller.

Data Augmentation, Vision and Language Navigation

Robust Face Anti-Spoofing with Dual Probabilistic Modeling

no code implementations · 27 Apr 2022 · Yuanhan Zhang, Yichao Wu, Zhenfei Yin, Jing Shao, Ziwei Liu

In this work, we attempt to fill this gap by automatically addressing the noise problem from both label and data perspectives in a probabilistic manner.

Face Anti-Spoofing

ERGO: Event Relational Graph Transformer for Document-level Event Causality Identification

no code implementations · COLING 2022 · Meiqi Chen, Yixin Cao, Kunquan Deng, Mukai Li, Kun Wang, Jing Shao, Yan Zhang

In this paper, we propose a novel Event Relational Graph TransfOrmer (ERGO) framework for DECI, which improves on existing state-of-the-art (SOTA) methods in two aspects.

Event Causality Identification, Node Classification +2

Few-shot Forgery Detection via Guided Adversarial Interpolation

no code implementations · 12 Apr 2022 · Haonan Qiu, Siyu Chen, Bei Gan, Kun Wang, Huafeng Shi, Jing Shao, Ziwei Liu

Notably, our method is also shown to be robust to the choice of majority and minority forgery approaches.

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

no code implementations · 16 Mar 2022 · Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao

2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns a universal, generalizable representation that transfers to various tasks.

Object Detection +3

Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision

1 code implementation · 11 Mar 2022 · Yufeng Cui, Lichen Zhao, Feng Liang, Yangguang Li, Jing Shao

This is because researchers do not choose consistent training recipes and even use different data, hampering the fair comparison between different methods.

RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-training

no code implementations · 18 Jan 2022 · Luya Wang, Feng Liang, Yangguang Li, Honggang Zhang, Wanli Ouyang, Jing Shao

Recently, self-supervised vision transformers have attracted unprecedented attention for their impressive representation learning ability.

Contrastive Learning, Representation Learning

SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

1 code implementation · 16 Jan 2022 · Hao Wang, Yangguang Li, Zhen Huang, Yong Dou, Lingpeng Kong, Jing Shao

To alleviate feature suppression, we propose contrastive learning for unsupervised sentence embedding with soft negative samples (SNCSE).

Contrastive Learning, Data Augmentation +7

A Simple Long-Tailed Recognition Baseline via Vision-Language Model

1 code implementation · 29 Nov 2021 · Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao

Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.

Ranked #4 on Long-tail Learning on Places-LT (using extra training data)

Contrastive Learning, Language Modelling +3

INTERN: A New Learning Paradigm Towards General Vision

no code implementations · 16 Nov 2021 · Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Enormous waves of technological innovation over the past several years, marked by advances in AI technologies, are profoundly reshaping industry and society.

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

3 code implementations · ICLR 2022 · Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, Fengwei Yu, Junjie Yan

Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks.

Zero-Shot Learning

Few-Shot Domain Expansion for Face Anti-Spoofing

no code implementations · 27 Jun 2021 · Bowen Yang, Jing Zhang, Zhenfei Yin, Jing Shao

In practice, given a handful of labeled samples from a new deployment scenario (target domain) and abundant labeled face images in the existing source domain, the FAS system is expected to perform well in the new scenario without sacrificing the performance on the original domain.

Face Anti-Spoofing, Face Recognition +1

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

2 code implementations · CVPR 2021 · Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu

To counter this emerging threat, we construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across four tasks: 1) Image Forgery Classification, including two-way (real / fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real and 15 respective forgery approaches) classification.

Benchmarking, Classification +2

PV-NAS: Practical Neural Architecture Search for Video Recognition

no code implementations · 2 Nov 2020 · ZiHao Wang, Chen Lin, Lu Sheng, Junjie Yan, Jing Shao

Recently, deep learning has been used to solve the video recognition problem owing to its strong representation ability.

Neural Architecture Search, Video Recognition

Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues

2 code implementations · ECCV 2020 · Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, Jing Shao

As realistic facial manipulation technologies have made remarkable progress, social concerns about their potential malicious abuse have given rise to an emerging research topic: face forgery detection.

1st place solution for AVA-Kinetics Crossover in ActivityNet Challenge 2020

2 code implementations · 16 Jun 2020 · Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu

This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.

Relation Network, Spatio-Temporal Action Localization +1

Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization

3 code implementations · CVPR 2021 · Junting Pan, Siyu Chen, Mike Zheng Shou, Yu Liu, Jing Shao, Hongsheng Li

We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.

Action Detection, Action Recognition +5

Morphing and Sampling Network for Dense Point Cloud Completion

2 code implementations · 30 Nov 2019 · Minghua Liu, Lu Sheng, Sheng Yang, Jing Shao, Shi-Min Hu

3D point cloud completion, the task of inferring the complete geometric shape from a partial point cloud, has been attracting attention in the community.

Point Cloud Completion

Diving into Optimization of Topology in Neural Networks

no code implementations · 25 Sep 2019 · Kun Yuan, Quanquan Li, Yucong Zhou, Jing Shao, Junjie Yan

Seeking effective networks has become one of the most crucial and practical areas in deep learning.

Face Recognition, Image Classification +2

Context and Attribute Grounded Dense Captioning

no code implementations · CVPR 2019 · Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao

Dense captioning aims at simultaneously localizing semantic regions and describing these regions-of-interest (ROIs) with short phrases or sentences in natural language.

Attribute, Dense Captioning

Video Generation from Single Semantic Label Map

2 code implementations · CVPR 2019 · Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang

This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.

Image Generation, Image to Video Generation +1

Unsupervised Bi-directional Flow-based Video Generation from one Snapshot

no code implementations · 3 Mar 2019 · Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy

Imagining multiple consecutive frames given one single snapshot is challenging, since it is difficult to simultaneously predict diverse motions from a single image and faithfully generate novel frames without visual distortions.

Video Generation

Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing

no code implementations · CVPR 2019 · Xihui Liu, ZiHao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li

Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from the visual and textual domains, such as visual attributes, locations, and interactions with surrounding regions.

Referring Expression

Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection

1 code implementation · 16 Sep 2018 · Yongcheng Liu, Lu Sheng, Jing Shao, Junjie Yan, Shiming Xiang, Chunhong Pan

Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module that guides the classification model by the WSD model according to the class-level predictions for the whole image and the object-level visual features for object RoIs.

Classification, General Classification +4

Transductive Centroid Projection for Semi-supervised Large-scale Recognition

no code implementations · ECCV 2018 · Yu Liu, Guanglu Song, Jing Shao, Xiao Jin, Xiaogang Wang

It is inspired by the observation that the weights in the classification layer (called anchors) converge to the central direction of each class in hyperspace.

Clustering, General Classification

Localization Guided Learning for Pedestrian Attribute Recognition

no code implementations · 28 Aug 2018 · Pengze Liu, Xihui Liu, Junjie Yan, Jing Shao

Pedestrian attribute recognition has attracted much attention due to its wide applications in scene understanding and person analysis from surveillance videos.

Attribute, Pedestrian Attribute Recognition +1

BlockQNN: Efficient Block-wise Neural Network Architecture Generation

2 code implementations · 16 Aug 2018 · Zhao Zhong, Zichen Yang, Boyang Deng, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu

The block-wise generation brings unique advantages: (1) it yields state-of-the-art results in comparison to hand-crafted networks on image classification; in particular, the best network generated by BlockQNN achieves a 2.35% top-1 error rate on CIFAR-10.

Image Classification, Q-Learning

Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition

no code implementations · ECCV 2018 · Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy

We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors.

Object

Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration

3 code implementations · CVPR 2018 · Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang

Zero-shot artistic style transfer is an important image synthesis problem aiming at transferring arbitrary style into content images.

Image Generation, Image Reconstruction +1
