no code implementations • ACL 2022 • Yubo Ma, Zehao Wang, Mukai Li, Yixin Cao, Meiqi Chen, Xinze Li, Wenqi Sun, Kunquan Deng, Kun Wang, Aixin Sun, Jing Shao
Events are fundamental building blocks of real-world happenings.
no code implementations • 5 Mar 2025 • Rui Ye, Shuo Tang, Rui Ge, Yaxin Du, Zhenfei Yin, Siheng Chen, Jing Shao
LLM-based multi-agent systems (MAS) have shown significant potential in tackling diverse tasks.
no code implementations • 4 Mar 2025 • Zhenhua Liu, Lijun Li, Ruizhe Chen, Yuxian Jiang, Tong Zhu, Zhaochen Su, Wenliang Chen, Jing Shao
While Reinforcement Learning from Human Feedback (RLHF) has become the predominant method for controlling language model outputs, it suffers from high computational costs and training instability.
no code implementations • 24 Feb 2025 • Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao
Fine-tuning pre-trained Large Language Models (LLMs) for specialized tasks incurs substantial computational and data costs.
1 code implementation • 14 Feb 2025 • Xiaoya Lu, Dongrui Liu, Yi Yu, Luxin Xu, Jing Shao
In this paper, we conduct a comprehensive comparison, revealing that some existing defense methods can improve the robustness of LLMs against multi-turn jailbreaks but compromise usability, i.e., reducing general capabilities or causing the over-refusal problem.
no code implementations • 30 Jan 2025 • Yi Ding, Lijun Li, Bing Cao, Jing Shao
Our experiments demonstrate that fine-tuning InternVL2.5-8B with MIS significantly outperforms both powerful open-source models and API-based models in challenging multi-image tasks requiring safety-related visual reasoning.
no code implementations • 23 Jan 2025 • Erjia Xiao, Hao Cheng, Jing Shao, Jinhao Duan, Kaidi Xu, Le Yang, Jindong Gu, Renjing Xu
The integration of multimodal encoders extends their capabilities, enabling the development of Multimodal Large Language Models that process vision, audio, and text.
1 code implementation • 22 Jan 2025 • Lijun Li, Zhelun Shi, Xuhao Hu, Bowen Dong, Yiran Qin, Xihui Liu, Lu Sheng, Jing Shao
Based on this taxonomy and prompt set, we build a large-scale T2I dataset with 68K manually annotated images and train an evaluator capable of detecting critical risks that previous work has failed to identify, including risks that even ultra-large proprietary models like GPTs cannot correctly detect.
1 code implementation • 17 Dec 2024 • Sheng Yin, Xianghe Pang, Yuanzhuo Ding, Menglan Chen, Yutong Bi, Yichen Xiong, Wenhao Huang, Zhen Xiang, Jing Shao, Siheng Chen
With the integration of large language models (LLMs), embodied agents have strong capabilities to understand and plan complicated natural language instructions.
no code implementations • 3 Dec 2024 • Yunkai Dang, Kaichen Huang, Jiahao Huo, Yibo Yan, Sirui Huang, Dongrui Liu, Mengxi Gao, Jie Zhang, Chen Qian, Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with large language models (LLMs) and computer vision (CV) systems driving advancements in natural language understanding and visual processing, respectively.
no code implementations • 2 Dec 2024 • Rui Ye, Xianghe Pang, Jingyi Chai, Jiaao Chen, Zhenfei Yin, Zhen Xiang, Xiaowen Dong, Jing Shao, Siheng Chen
However, the unchecked adoption of LLMs poses significant risks to the integrity of the peer review system.
no code implementations • 29 Nov 2024 • Xuhao Hu, Dongrui Liu, Hao Li, Xuanjing Huang, Jing Shao
To explain such a counter-intuitive phenomenon, we discover a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image has been revealed in the textual query.
no code implementations • 26 Nov 2024 • Lehan He, Zeren Chen, Zhelun Shi, Tianyu Yu, Jing Shao, Lu Sheng
Through a deconfounded strategy that replaces each topic within the response with the best or worst alternatives generated by the model itself, TPO creates more contrasting pairwise preference feedback, enhancing the feedback quality without human or proprietary model intervention.
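The deconfounded swap can be made concrete with a toy sketch: replace one topic of a response with the model's best or worst self-generated alternative to obtain a contrasting (chosen, rejected) preference pair. The function name and the string-joining detail are hypothetical illustrations, not TPO's actual implementation.

```python
def make_preference_pair(topics, index, best_alt, worst_alt):
    # Swap the topic at `index` for the best / worst alternative to
    # form a (chosen, rejected) pair for preference optimization.
    chosen = topics[:index] + [best_alt] + topics[index + 1:]
    rejected = topics[:index] + [worst_alt] + topics[index + 1:]
    return " ".join(chosen), " ".join(rejected)

# Toy example: only the second topic differs between the two responses.
pair = make_preference_pair(["The cat", "sat on the mat."], 1,
                            "sat calmly.", "flew away.")
```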
1 code implementation • 18 Nov 2024 • ZiYi Yang, Zaibin Zhang, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Jinsong Chen, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin, Guohao Li, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Chaochao Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao
There has been a growing interest in enhancing rule-based agent-based models (ABMs) for social media platforms (i.e., X, Reddit) with more realistic large language model (LLM) agents, thereby allowing for a more nuanced study of complex systems.
no code implementations • 23 Oct 2024 • Yiran Qin, Zhelun Shi, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin, Xihui Liu, Lu Sheng, Jing Shao, Lei Bai, Wanli Ouyang, Ruimao Zhang
WorldSimBench includes Explicit Perceptual Evaluation and Implicit Manipulative Evaluation, encompassing human preference assessments from the visual perspective and action-level evaluations in embodied tasks, covering three representative embodied scenarios: Open-Ended Embodied Environment, Autonomous Driving, and Robot Manipulation.
1 code implementation • 22 Oct 2024 • Chen Qian, Dongrui Liu, Jie Zhang, Yong Liu, Jing Shao
Extensive experimental results demonstrate that DEAN eliminates the trade-off phenomenon and significantly improves LLMs' fairness and privacy awareness simultaneously, e.g., improving Qwen-2-7B-Instruct's fairness awareness by 12.2% and privacy awareness by 14.0%.
1 code implementation • 18 Oct 2024 • Jie Zhang, Dongrui Liu, Chen Qian, Linfeng Zhang, Yong Liu, Yu Qiao, Jing Shao
Therefore, model owners and third parties need to identify whether a suspect model is a subsequent development of the victim model.
2 code implementations • 14 Oct 2024 • Qibing Ren, Hao Li, Dongrui Liu, Zhanxu Xie, Xiaoya Lu, Yu Qiao, Lei Sha, Junchi Yan, Lizhuang Ma, Jing Shao
This study exposes the safety vulnerabilities of Large Language Models (LLMs) in multi-turn interactions, where malicious users can obscure harmful intents across several queries.
1 code implementation • 17 Jul 2024 • Jie Zhang, Dongrui Liu, Chen Qian, Ziyue Gan, Yong Liu, Yu Qiao, Jing Shao
In this paper, we discover that LLMs' personality traits are closely related to their safety abilities, i.e., toxicity, privacy, and fairness, based on the reliable MBTI-M scale.
no code implementations • 8 Jul 2024 • Chenxin Li, Xinyu Liu, Cheng Wang, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan
To tackle these, we propose an innovative Modality-prompted Heterogeneous Graph for Omnimodal Learning (GTP-4o), which embeds the numerous disparate clinical modalities into a unified representation, completes the deficient embedding of missing modality and reformulates the cross-modal learning with a graph-based aggregation.
no code implementations • 30 Jun 2024 • Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu, DaCheng Tao
For the first time, this paper introduces the GenderBias-\emph{VL} benchmark to evaluate occupation-related gender bias in LVLMs using counterfactual visual questions under individual fairness criteria.
1 code implementation • 17 Jun 2024 • Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao
To address these limitations, we propose a Safety Preference Alignment dataset for Vision Language Models named SPA-VL.
no code implementations • 5 Jun 2024 • Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu
However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails to fully harness their emergent powers.
no code implementations • 28 Mar 2024 • Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng
Achieving generalizability in solving out-of-distribution tasks is one of the ultimate goals of learning robotic manipulation.
1 code implementation • 26 Mar 2024 • Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao
Large Language Models (LLMs) aim to serve as versatile assistants aligned with human values, as defined by the principles of being helpful, honest, and harmless (HHH).
1 code implementation • 18 Mar 2024 • Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang
This paper introduces EasyJailbreak, a unified framework simplifying the construction and evaluation of jailbreak attacks against LLMs.
1 code implementation • 18 Mar 2024 • Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao
It is a long-lasting goal to design a generalist embodied agent that can follow diverse instructions in human-like ways.
no code implementations • 17 Mar 2024 • Chenxin Li, Hengyu Liu, Yifan Liu, Brandon Y. Feng, Wuyang Li, Xinyu Liu, Zhen Chen, Jing Shao, Yixuan Yuan
In a nutshell, Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research, setting a substantial stage for further advances in medical content generation.
1 code implementation • 12 Mar 2024 • Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma
The rapid advancement of Large Language Models (LLMs) has brought about remarkable generative capabilities but also raised concerns about their potential misuse.
1 code implementation • 29 Feb 2024 • Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao
This research provides an initial exploration of trustworthiness modeling during LLM pre-training, seeking to unveil new insights and spur further developments in the field.
1 code implementation • 14 Feb 2024 • Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao
Large Language Models (LLMs) are now commonplace in conversation applications.
1 code implementation • 7 Feb 2024 • Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, WangMeng Zuo, Dahua Lin, Yu Qiao, Jing Shao
In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety measures is paramount.
no code implementations • 26 Jan 2024 • Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, LiMin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin, Zhipin Wang
Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses with respect to multi-modal contents.
2 code implementations • 22 Jan 2024 • Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao
In this paper, we explore these concerns through the innovative lens of agent psychology, revealing that the dark psychological states of agents constitute a significant threat to safety.
1 code implementation • CVPR 2024 • Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing Shao
It is a long-lasting goal to design an embodied system that can solve long-horizon open-world tasks in human-like ways.
no code implementations • 9 Dec 2023 • Chenhui Zuo, Kaibo He, Jing Shao, Yanan Sui
Modeling and control of the human musculoskeletal system is important for understanding human motor functions, developing embodied intelligence, and optimizing human-robot interaction systems.
1 code implementation • 5 Nov 2023 • Zhelun Shi, Zhipin Wang, Hongxing Fan, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao
We will publicly release all the detailed implementations for further analysis, as well as an easy-to-use modular toolkit for the integration of new recipes and models, so that ChEF can be a growing evaluation framework for the MLLM community.
2 code implementations • 5 Nov 2023 • Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao
While this phenomenon has been overlooked in previous work, we propose a novel and extensible framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with Multimodal Large Language Models (MLLMs).
2 code implementations • 5 Oct 2023 • Zhanhui Zhou, Jie Liu, Jing Shao, Xiangyu Yue, Chao Yang, Wanli Ouyang, Yu Qiao
A single language model, even when aligned with labelers through reinforcement learning from human feedback (RLHF), may not suit all human preferences.
no code implementations • 19 Jun 2023 • Qinghong Sun, Yangguang Li, Zexiang Liu, Xiaoshui Huang, Fenggang Liu, Xihui Liu, Wanli Ouyang, Jing Shao
However, the quality and diversity of existing 3D object generation methods are constrained by the inadequacies of existing 3D object datasets, including issues related to text quality, the incompleteness of multi-modal data representation encompassing 2D rendered images and 3D assets, as well as the size of the dataset.
2 code implementations • NeurIPS 2023 • Zhenfei Yin, Jiong Wang, JianJian Cao, Zhelun Shi, Dingning Liu, Mukai Li, Lu Sheng, Lei Bai, Xiaoshui Huang, Zhiyong Wang, Jing Shao, Wanli Ouyang
To the best of our knowledge, we present one of the very first open-source endeavors in the field, LAMM, encompassing a Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.
3 code implementations • 16 May 2023 • Qinghong Sun, Zhenfei Yin, Yichao Wu, Yuanhan Zhang, Jing Shao
In this work, we propose a unified framework called Latent Distribution Adjusting (LDA), with latent, discriminative, adaptive, and generic properties, to improve the robustness of the FAS model by adjusting complex data distributions with multiple prototypes.
no code implementations • 1 Apr 2023 • Fenggang Liu, Yangguang Li, Feng Liang, Jilan Xu, Bin Huang, Jing Shao
We mask part of patches in the representation space and then utilize sparse visible patches to reconstruct high semantic image representation.
1 code implementation • CVPR 2023 • Zeren Chen, Gengshi Huang, Wei Li, Jianing Teng, Kun Wang, Jing Shao, Chen Change Loy, Lu Sheng
In this work, we present Siamese DETR, a Siamese self-supervised pretraining approach for the Transformer architecture in DETR.
1 code implementation • 29 Jan 2023 • Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao
Our Fast-BEV consists of five parts. We propose (1) a lightweight, deployment-friendly view transformation that quickly transfers 2D image features to 3D voxel space, (2) a multi-scale image encoder that leverages multi-scale information for better performance, and (3) an efficient BEV encoder particularly designed to speed up on-vehicle inference.
1 code implementation • 19 Jan 2023 • Bin Huang, Yangguang Li, Enze Xie, Feng Liang, Luya Wang, Mingzhu Shen, Fenggang Liu, Tianqi Wang, Ping Luo, Jing Shao
Recently, the pure camera-based Bird's-Eye-View (BEV) perception removes expensive Lidar sensors, making it a feasible solution for economical autonomous driving.
no code implementations • 9 Jan 2023 • Huan Peng, Fenggang Liu, Yangguang Li, Bin Huang, Jing Shao, Nong Sang, Changxin Gao
Human-Object Interaction (HOI) detection aims to learn how humans interact with surrounding objects.
1 code implementation • ICCV 2023 • Dong An, Yuankai Qi, Yangguang Li, Yan Huang, Liang Wang, Tieniu Tan, Jing Shao
Concretely, we build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map.
Ranked #3 on Visual Navigation on R2R
1 code implementation • 22 Oct 2022 • Hao Wang, Yixin Cao, Yangguang Li, Zhen Huang, Kun Wang, Jing Shao
Document-level natural language inference (DOCNLI) is a new challenging task in natural language processing, aiming at judging the entailment relationship between a pair of hypothesis and premise documents.
1 code implementation • 20 Oct 2022 • Yi Wang, Menghan Xia, Lu Qi, Jing Shao, Yu Qiao
Multimodal ambiguity and color bleeding remain challenging in colorization.
1 code implementation • 3 Sep 2022 • Xingrun Xing, Yangguang Li, Wei Li, Wenrui Ding, Yalong Jiang, Yufeng Wang, Jing Shao, Chunlei Liu, Xianglong Liu
Second, to improve the robustness of binary models with contextual dependencies, we compute the contextual dynamic embeddings to determine the binarization thresholds in general binary convolutional blocks.
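As an illustration of the idea above, here is a minimal sketch of binarization with a data-dependent threshold: each channel's threshold is derived from that channel's own statistics rather than fixed at zero. The function name and the particular statistic (a scaled channel mean) are hypothetical assumptions, not the paper's exact formulation.

```python
import numpy as np

def dynamic_binarize(x, gamma=1.0):
    # x: activations of shape (channels, height, width).
    # Hypothetical illustration: derive a per-channel threshold from the
    # channel's mean (scaled by gamma), then binarize each value to
    # +1 / -1 around that threshold instead of around zero.
    thresholds = gamma * x.mean(axis=(1, 2), keepdims=True)
    return np.where(x >= thresholds, 1.0, -1.0)

# One channel, 2x2: the channel mean is 0.175, so 0.1 falls below it.
x = np.array([[[0.2, -0.5], [0.9, 0.1]]])
b = dynamic_binarize(x)
```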
no code implementations • 5 Aug 2022 • Ruining Tang, Zhenyu Liu, Yangguang Li, Yiguo Song, Hui Liu, Qide Wang, Jing Shao, Guifang Duan, Jianrong Tan
To alleviate this problem, a novel Task-decoupled Feature Distillation (TFD) is proposed by flexibly balancing the contributions of classification and regression tasks.
1 code implementation • 14 Jul 2022 • Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu
We benchmark ReCo and other advances in omni-vision representation studies that are different in architectures (from CNNs to transformers) and in learning paradigms (from supervised learning to self-supervised learning) on OmniBenchmark.
1 code implementation • 27 Jun 2022 • Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, Hongsheng Li
This has led to a new research direction in parameter-efficient transfer learning.
Ranked #23 on Action Recognition on Something-Something V2 (using extra training data)
1 code implementation • 23 Jun 2022 • Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao
Our model consists of three modules: the candidate waypoints predictor (CWP), the history enhanced planner and the tryout controller.
no code implementations • 27 Apr 2022 • Yuanhan Zhang, Yichao Wu, Zhenfei Yin, Jing Shao, Ziwei Liu
In this work, we attempt to fill this gap by automatically addressing the noise problem from both label and data perspectives in a probabilistic manner.
no code implementations • COLING 2022 • Meiqi Chen, Yixin Cao, Kunquan Deng, Mukai Li, Kun Wang, Jing Shao, Yan Zhang
In this paper, we propose a novel Event Relational Graph TransfOrmer (ERGO) framework for DECI, which improves existing state-of-the-art (SOTA) methods upon two aspects.
no code implementations • 12 Apr 2022 • Haonan Qiu, Siyu Chen, Bei Gan, Kun Wang, Huafeng Shi, Jing Shao, Ziwei Liu
Notably, our method is also validated to be robust to choices of majority and minority forgery approaches.
no code implementations • 16 Mar 2022 • Yinan He, Gengshi Huang, Siyu Chen, Jianing Teng, Wang Kun, Zhenfei Yin, Lu Sheng, Ziwei Liu, Yu Qiao, Jing Shao
2) Squeeze Stage: X-Learner condenses the model to a reasonable size and learns a universal and generalizable representation for transfer across various tasks.
2 code implementations • 15 Mar 2022 • Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu
This work thus proposes a novel active learning framework for realistic dataset annotation.
Ranked #1 on Image Classification on Food-101 (using extra training data)
1 code implementation • 11 Mar 2022 • Yufeng Cui, Lichen Zhao, Feng Liang, Yangguang Li, Jing Shao
This is because researchers do not choose consistent training recipes and even use different data, hampering the fair comparison between different methods.
1 code implementation • ACL 2022 • Yubo Ma, Zehao Wang, Yixin Cao, Mukai Li, Meiqi Chen, Kun Wang, Jing Shao
We have conducted extensive experiments on three benchmarks, including both sentence- and document-level EAE.
no code implementations • 18 Jan 2022 • Luya Wang, Feng Liang, Yangguang Li, Honggang Zhang, Wanli Ouyang, Jing Shao
Recently, self-supervised vision transformers have attracted unprecedented attention for their impressive representation learning ability.
1 code implementation • 16 Jan 2022 • Hao Wang, Yangguang Li, Zhen Huang, Yong Dou, Lingpeng Kong, Jing Shao
To alleviate feature suppression, we propose contrastive learning for unsupervised sentence embedding with soft negative samples (SNCSE).
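To make the soft-negative idea concrete, here is a minimal sketch of a margin term that treats a soft negative (e.g., a negated variant of the input sentence) more gently than a hard negative: its similarity to the anchor is only required to stay a margin below the positive's, rather than being pushed toward zero. The function names, the cosine-similarity choice, and the margin value are illustrative assumptions, not SNCSE's exact loss.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def soft_negative_margin(anchor, positive, soft_negative, margin=0.2):
    # Illustrative penalty: zero as long as the soft negative's
    # similarity stays at least `margin` below the positive's.
    return max(0.0, cosine(anchor, soft_negative)
                    - cosine(anchor, positive) + margin)

# The soft negative is far enough below the positive to incur no penalty:
loss = soft_negative_margin([1.0, 0.0], [1.0, 0.1], [0.5, 0.5])
```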
no code implementations • 15 Dec 2021 • Yinan He, Lu Sheng, Jing Shao, Ziwei Liu, Zhaofan Zou, Zhizhi Guo, Shan Jiang, Curitis Sun, Guosheng Zhang, Keyao Wang, Haixiao Yue, Zhibin Hong, Wanguo Wang, Zhenyu Li, Qi Wang, Zhenli Wang, Ronghao Xu, Mingwen Zhang, Zhiheng Wang, Zhenhang Huang, Tianming Zhang, Ningning Zhao
The rapid progress of photorealistic synthesis techniques has reached a critical point where the boundary between real and manipulated images starts to blur.
1 code implementation • 29 Nov 2021 • Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao
Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.
Ranked #5 on Long-tail Learning on Places-LT (using extra training data)
no code implementations • 16 Nov 2021 • Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao
Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.
4 code implementations • ICLR 2022 • Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, Fengwei Yu, Junjie Yan
Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks.
no code implementations • 27 Jun 2021 • Bowen Yang, Jing Zhang, Zhenfei Yin, Jing Shao
In practice, given a handful of labeled samples from a new deployment scenario (target domain) and abundant labeled face images in the existing source domain, the FAS system is expected to perform well in the new scenario without sacrificing the performance on the original domain.
2 code implementations • CVPR 2021 • Yinan He, Bei Gan, Siyu Chen, Yichun Zhou, Guojun Yin, Luchuan Song, Lu Sheng, Jing Shao, Ziwei Liu
To counter this emerging threat, we construct the ForgeryNet dataset, an extremely large face forgery dataset with unified annotations in image- and video-level data across four tasks: 1) Image Forgery Classification, including two-way (real / fake), three-way (real / fake with identity-replaced forgery approaches / fake with identity-remained forgery approaches), and n-way (real and 15 respective forgery approaches) classification.
1 code implementation • 25 Feb 2021 • Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu, Shuo Yang, Yuanjun Xiong, Wei Xia, Yan Xu, Man Luo, Jian Liu, Jianshu Li, Zhijun Chen, Mingyu Guo, Hui Li, Junfu Liu, Pengfei Gao, Tianqi Hong, Hao Han, Shijie Liu, Xinhua Chen, Di Qiu, Cheng Zhen, Dashuang Liang, Yufeng Jin, Zhanlong Hao
It is the largest face anti-spoofing dataset in terms of both the number of data samples and the number of subjects.
no code implementations • 2 Nov 2020 • ZiHao Wang, Chen Lin, Lu Sheng, Junjie Yan, Jing Shao
Recently, deep learning has been utilized to solve the video recognition problem due to its prominent representation ability.
no code implementations • ECCV 2020 • Kun Yuan, Quanquan Li, Jing Shao, Junjie Yan
In this paper, we attempt to optimize the connectivity in neural networks.
1 code implementation • ECCV 2020 • Yuanhan Zhang, Zhenfei Yin, Yidong Li, Guojun Yin, Junjie Yan, Jing Shao, Ziwei Liu
The main reason is that current face anti-spoofing datasets are limited in both quantity and diversity.
3 code implementations • ECCV 2020 • Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, Jing Shao
As realistic facial manipulation technologies have achieved remarkable progress, social concerns about potential malicious abuse of these technologies bring out an emerging research topic of face forgery detection.
2 code implementations • 16 Jun 2020 • Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu
This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.
3 code implementations • CVPR 2021 • Junting Pan, Siyu Chen, Mike Zheng Shou, Yu Liu, Jing Shao, Hongsheng Li
We propose to explicitly model the Actor-Context-Actor Relation, which is the relation between two actors based on their interactions with the context.
Ranked #2 on Action Recognition on AVA v2.1
2 code implementations • 30 Nov 2019 • Minghua Liu, Lu Sheng, Sheng Yang, Jing Shao, Shi-Min Hu
3D point cloud completion, the task of inferring the complete geometric shape from a partial point cloud, has been attracting attention in the community.
Ranked #8 on Point Cloud Completion on ShapeNet
1 code implementation • NeurIPS 2019 • Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li
Semantic image synthesis aims at generating photorealistic images from semantic layouts.
no code implementations • 25 Sep 2019 • Kun Yuan, Quanquan Li, Yucong Zhou, Jing Shao, Junjie Yan
Seeking effective networks has become one of the most crucial and practical areas in deep learning.
1 code implementation • ICCV 2019 • Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao
Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Ranked #9 on Image Retrieval on Flickr30K 1K test
no code implementations • CVPR 2019 • Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao
Dense captioning aims at simultaneously localizing semantic regions and describing these regions-of-interest (ROIs) with short phrases or sentences in natural language.
Ranked #3 on Dense Captioning on Visual Genome
no code implementations • CVPR 2019 • Guojun Yin, Bin Liu, Lu Sheng, Nenghai Yu, Xiaogang Wang, Jing Shao
Synthesizing photo-realistic images from text descriptions is a challenging problem.
2 code implementations • CVPR 2019 • Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.
no code implementations • 3 Mar 2019 • Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy
Imagining multiple consecutive frames given one single snapshot is challenging, since it is difficult to simultaneously predict diverse motions from a single image and faithfully generate novel frames without visual distortions.
no code implementations • CVPR 2019 • Xihui Liu, ZiHao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li
Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from visual and textual domain, such as visual attributes, location and interactions with surrounding regions.
1 code implementation • 16 Sep 2018 • Yongcheng Liu, Lu Sheng, Jing Shao, Junjie Yan, Shiming Xiang, Chunhong Pan
Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module that guides the classification model by the WSD model according to the class-level predictions for the whole image and the object-level visual features for object RoIs.
Ranked #9 on Multi-Label Classification on NUS-WIDE
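The distillation step described above can be sketched as follows: the WSD model plays the teacher, and its temperature-softened class predictions supervise the classification student through a cross-entropy term. The function names, the temperature value, and the use of a plain softmax (rather than the paper's exact multi-label formulation) are assumptions for illustration.

```python
import numpy as np

def softened(logits, temperature):
    # Temperature-softened class distribution (softmax of logits / T).
    z = np.asarray(logits, float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between the teacher's (WSD) softened distribution
    # and the student's (classifier) softened distribution.
    p_t = softened(teacher_logits, temperature)
    p_s = softened(student_logits, temperature)
    return float(-(p_t * np.log(p_s + 1e-12)).sum())

# A student that matches the teacher incurs a lower loss than one that disagrees:
teacher = [2.0, 0.0, 0.0]
agree = distillation_loss([2.0, 0.0, 0.0], teacher)
disagree = distillation_loss([0.0, 2.0, 0.0], teacher)
```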
no code implementations • ECCV 2018 • Yu Liu, Guanglu Song, Jing Shao, Xiao Jin, Xiaogang Wang
It is inspired by the observation that the weights in the classification layer (called anchors) converge to the central direction of each class in hyperspace.
no code implementations • 28 Aug 2018 • Pengze Liu, Xihui Liu, Junjie Yan, Jing Shao
Pedestrian attribute recognition has attracted much attention due to its wide applications in scene understanding and person analysis from surveillance videos.
2 code implementations • 16 Aug 2018 • Zhao Zhong, Zichen Yang, Boyang Deng, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu
The block-wise generation brings unique advantages: (1) it yields state-of-the-art results compared with hand-crafted networks on image classification; in particular, the best network generated by BlockQNN achieves a 2.35% top-1 error rate on CIFAR-10.
no code implementations • ECCV 2018 • Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy
We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors.
3 code implementations • CVPR 2018 • Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang
Zero-shot artistic style transfer is an important image synthesis problem aiming at transferring arbitrary style into content images.
no code implementations • CVPR 2018 • Yu Liu, Fangyin Wei, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
This paper proposes learning disentangled but complementary face features with minimal supervision by face identification.
no code implementations • ECCV 2018 • Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang
The aim of image captioning is to generate captions by machine to describe image contents.
no code implementations • ICCV 2017 • Zhongdao Wang, Luming Tang, Xihui Liu, Zhuliang Yao, Shuai Yi, Jing Shao, Junjie Yan, Shengjin Wang, Hongsheng Li, Xiaogang Wang
In our vehicle ReID framework, an orientation invariant feature embedding module and a spatial-temporal regularization module are proposed.
2 code implementations • ICCV 2017 • Xihui Liu, Haiyu Zhao, Maoqing Tian, Lu Sheng, Jing Shao, Shuai Yi, Junjie Yan, Xiaogang Wang
Pedestrian analysis plays a vital role in intelligent video surveillance and is a key component for security-centric computer vision systems.
Ranked #2 on Pedestrian Attribute Recognition on RAP
1 code implementation • CVPR 2018 • Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu
Convolutional neural networks have gained a remarkable success in computer vision.
1 code implementation • CVPR 2017 • Haiyu Zhao, Maoqing Tian, Shuyang Sun, Jing Shao, Junjie Yan, Shuai Yi, Xiaogang Wang, Xiaoou Tang
Person re-identification (ReID) is an important task in video surveillance and has various applications.
no code implementations • CVPR 2016 • Jing Shao, Chen-Change Loy, Kai Kang, Xiaogang Wang
Learning and capturing both appearance and dynamic representations is pivotal for crowd video understanding.
no code implementations • CVPR 2015 • Jing Shao, Kai Kang, Chen Change Loy, Xiaogang Wang
We further measure user study performance on WWW and compare this with the proposed deep models.
no code implementations • CVPR 2014 • Jing Shao, Chen Change Loy, Xiaogang Wang
Groups are the primary entities that make up a crowd.