Search Results for author: Shuo Chen

Found 49 papers, 26 papers with code

Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion

1 code implementation 18 May 2024 Zeyu Zhang, Yiran Wang, Biao Wu, Shuo Chen, Zhiyuan Zhang, Shiya Huang, Wenbo Zhang, Meng Fang, Ling Chen, Yang Zhao

Firstly, we proposed a novel agent-based approach named Motion Avatar, which allows for the automatic generation of high-quality customizable human and animal avatars with motions through text queries.

Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?

no code implementations 4 Apr 2024 Shuo Chen, Zhen Han, Bailan He, Zifeng Ding, Wenqian Yu, Philip Torr, Volker Tresp, Jindong Gu

Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs.

CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

no code implementations 8 Mar 2024 Zhengyi Wang, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen, Dajiang Yu, Chongxuan Li, Hang Su, Jun Zhu

In this work, we present the Convolutional Reconstruction Model (CRM), a high-fidelity feed-forward single image-to-3D generative model.

Image to 3D

PromptKD: Unsupervised Prompt Distillation for Vision-Language Models

1 code implementation 5 Mar 2024 Zheng Li, Xiang Li, Xinyi Fu, Xin Zhang, Weiqiang Wang, Shuo Chen, Jian Yang

To our best knowledge, we are the first to (1) perform unsupervised domain-specific prompt-driven knowledge distillation for CLIP, and (2) establish a practical pre-storing mechanism of text features as shared class vectors between teacher and student.

Knowledge Distillation Prompt Engineering +1

Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images

no code implementations 22 Feb 2024 Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu

Our research evaluates the adversarial robustness of MLLMs when employing CoT reasoning, finding that CoT marginally improves adversarial robustness against existing attack methods.

Adversarial Robustness

Multi-View Neural 3D Reconstruction of Micro-/Nanostructures with Atomic Force Microscopy

1 code implementation 21 Jan 2024 Shuo Chen, Mao Peng, Yijin Li, Bing-Feng Ju, Hujun Bao, Yuan-Liu Chen, Guofeng Zhang

However, conventional AFM scanning struggles to reconstruct complex 3D micro-/nanostructures precisely due to limitations such as incomplete sample topography capturing and tip-sample convolution artifacts.

3D Reconstruction Surface Reconstruction

CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents

1 code implementation 19 Jan 2024 Siyuan Qi, Shuo Chen, Yexin Li, Xiangyu Kong, Junqi Wang, Bangcheng Yang, Pring Wong, Yifan Zhong, Xiaoyuan Zhang, Zhaowei Zhang, Nian Liu, Wei Wang, Yaodong Yang, Song-Chun Zhu

Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning.

Decision Making

Direct Distillation between Different Domains

no code implementations 12 Jan 2024 Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, Masashi Sugiyama

Then, we build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network, while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data.

Domain Adaptation Knowledge Distillation

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

no code implementations 9 Jan 2024 Weimin Wang, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field.

MORPH Video Generation

Understanding and Improving In-Context Learning on Vision-language Models

no code implementations 29 Nov 2023 Shuo Chen, Zhen Han, Bailan He, Mark Buckley, Philip Torr, Volker Tresp, Jindong Gu

Our findings indicate that ICL in VLMs is predominantly driven by the textual information in the demonstrations whereas the visual information in the demonstrations barely affects the ICL performance.

In-Context Learning

Atom-Motif Contrastive Transformer for Molecular Property Prediction

no code implementations 11 Oct 2023 Wentao Yu, Shuo Chen, Chen Gong, Gang Niu, Masashi Sugiyama

As motifs in a molecule are significant patterns that are of great importance for determining molecular properties (e.g., toxicity and solubility), overlooking motif interactions inevitably hinders the effectiveness of MPP.

Molecular Property Prediction Property Prediction

Creative Birds: Self-Supervised Single-View 3D Style Transfer

2 code implementations ICCV 2023 Renke Wang, Guimin Que, Shuo Chen, Xiang Li, Jun Li, Jian Yang

Our focus lies primarily on birds, a popular subject in 3D reconstruction, for which no existing single-view 3D transfer methods have been developed. The method we propose seeks to generate a 3D mesh shape and texture of a bird from two single-view images.

3D Reconstruction Style Transfer

A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

1 code implementation 24 Jul 2023 Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr

This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g. Flamingo), image-text matching models (e.g.

Image-text matching Language Modelling +4

Minimizing Age of Information for Mobile Edge Computing Systems: A Nested Index Approach

no code implementations 3 Jul 2023 Shuo Chen, Ning Yang, Meng Zhang, Jun Wang

In this paper, we consider multiple users offloading tasks to heterogeneous edge servers in a MEC system.


Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation

1 code implementation 16 Jun 2023 Shuo Chen, Yingjun Du, Pascal Mettes, Cees G. M. Snoek

This paper investigates the problem of scene graph generation in videos with the aim of capturing semantic relations between subjects and objects in the form of $\langle$subject, predicate, object$\rangle$ triplets.

Graph Generation Meta-Learning +1

On Underdamped Nesterov's Acceleration

no code implementations 28 Apr 2023 Shuo Chen, Bin Shi, Ya-xiang Yuan

In this paper, based on the high-resolution differential equation framework, we construct the new Lyapunov functions for the underdamped case, which is motivated by the power of the time $t^{\gamma}$ or the iteration $k^{\gamma}$ in the mixed term.

Decision-making with Speculative Opponent Models

no code implementations 22 Nov 2022 Jing Sun, Shuo Chen, Cong Zhang, Yining Ma, Jie Zhang

To address this issue, we introduce Distributional Opponent-aided Multi-agent Actor-Critic (DOMAC), the first speculative opponent modelling algorithm that relies solely on local information (i.e., the controlled agent's observations, actions, and rewards).

Decision Making SMAC+ +1

R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition

2 code implementations 20 Nov 2022 Shuo Chen, Tan Yu, Ping Li

Recently, vision architectures based exclusively on multi-layer perceptrons (MLPs) have gained much attention in the computer vision community.

3D Object Recognition Image Classification +1

IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis

1 code implementation ICCV 2023 Weicai Ye, Shuo Chen, Chong Bao, Hujun Bao, Marc Pollefeys, Zhaopeng Cui, Guofeng Zhang

Existing inverse rendering combined with neural rendering methods can only perform editable novel view synthesis on object-specific scenes, while we present intrinsic neural radiance fields, dubbed IntrinsicNeRF, which introduce intrinsic decomposition into the NeRF-based neural rendering method and can extend its application to room-scale scenes.

Clustering Inverse Rendering +2

Gradient Norm Minimization of Nesterov Acceleration: $o(1/k^3)$

no code implementations 19 Sep 2022 Shuo Chen, Bin Shi, Ya-xiang Yuan

In the history of first-order algorithms, Nesterov's accelerated gradient descent (NAG) is one of the milestones.

Open-Ended Question Answering

Higher-order accurate two-sample network inference and network hashing

1 code implementation 16 Aug 2022 Meijia Shao, Dong Xia, Yuan Zhang, Qiong Wu, Shuo Chen

Two-sample hypothesis testing for network comparison presents many significant challenges, including: leveraging repeated network observations and known node registration, but without requiring them to operate; relaxing strong structural assumptions; achieving finite-sample higher-order accuracy; handling different network sizes and sparsity levels; fast computation and memory parsimony; controlling false discovery rate (FDR) in multiple testing; and theoretical understandings, particularly regarding finite-sample accuracy and minimax optimality.

Vocal Bursts Valence Prediction

PVO: Panoptic Visual Odometry

1 code implementation CVPR 2023 Weicai Ye, Xinyue Lan, Shuo Chen, Yuhang Ming, Xingyuan Yu, Hujun Bao, Zhaopeng Cui, Guofeng Zhang

We present PVO, a novel panoptic visual odometry framework to achieve more comprehensive modeling of the scene motion, geometry, and panoptic segmentation information.

Optical Flow Estimation Pose Estimation +3

Industrial Style Transfer with Large-scale Geometric Warping and Content Preservation

1 code implementation CVPR 2022 Jinchao Yang, Fei Guo, Shuo Chen, Jun Li, Jian Yang

Given a source product, a target product, and an art style image, our method produces a neural warping field that warps the source shape to imitate the geometric style of the target and a neural texture transformation network that transfers the artistic style to the warped source product.

Style Transfer

MVT: Multi-view Vision Transformer for 3D Object Recognition

2 code implementations 25 Oct 2021 Shuo Chen, Tan Yu, Ping Li

Nevertheless, multi-view CNN models cannot model the communications between patches from different views, limiting its effectiveness in 3D object recognition.

3D Object Recognition Inductive Bias +1

Diagnosing Errors in Video Relation Detectors

1 code implementation 25 Oct 2021 Shuo Chen, Pascal Mettes, Cees G. M. Snoek

Video relation detection forms a new and challenging problem in computer vision, where subjects and objects need to be localized spatio-temporally and a predicate label needs to be assigned if and only if there is an interaction between the two.

Action Localization Object +3

Spectrum-to-Kernel Translation for Accurate Blind Image Super-Resolution

no code implementations NeurIPS 2021 Guangpin Tao, Xiaozhong Ji, Wenzhuo Wang, Shuo Chen, Chuming Lin, Yun Cao, Tong Lu, Donghao Luo, Ying Tai

In this paper, we propose a novel blind SR framework to super-resolve LR images degraded by arbitrary blur kernel with accurate kernel estimation in frequency domain.

Image Super-Resolution Translation

Can We Leverage Predictive Uncertainty to Detect Dataset Shift and Adversarial Examples in Android Malware Detection?

1 code implementation 20 Sep 2021 Deqiang Li, Tian Qiu, Shuo Chen, Qianmu Li, Shouhuai Xu

Our main findings are: (i) predictive uncertainty indeed helps achieve reliable malware detection in the presence of dataset shift, but cannot cope with adversarial evasion attacks; (ii) approximate Bayesian methods are promising to calibrate and generalize malware detectors to deal with dataset shift, but cannot cope with adversarial evasion attacks; (iii) adversarial evasion attacks can render calibration methods useless, and it is an open problem to quantify the uncertainty associated with the predicted labels of adversarial examples (i.e., it is not effective to use predictive uncertainty to detect adversarial examples).

Android Malware Detection Malware Detection

Social Fabric: Tubelet Compositions for Video Relation Detection

1 code implementation ICCV 2021 Shuo Chen, Zenglin Shi, Pascal Mettes, Cees G. M. Snoek

We also propose Social Fabric: an encoding that represents a pair of object tubelets as a composition of interaction primitives.

Object Relation +2

Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model

1 code implementation NeurIPS 2021 Jiangning Zhang, Chao Xu, Jian Li, Wenzhou Chen, Yabiao Wang, Ying Tai, Shuo Chen, Chengjie Wang, Feiyue Huang, Yong Liu

Inspired by biological evolution, we explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA) and derive that both of them have consistent mathematical representation.

Image Retrieval Retrieval

Contrastive Embedding for Generalized Zero-Shot Learning

3 code implementations CVPR 2021 Zongyan Han, ZhenYong Fu, Shuo Chen, Jian Yang

To tackle this issue, we propose to integrate the generation model with the embedding model, yielding a hybrid GZSL framework.

Generalized Zero-Shot Learning

Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

7 code implementations NeurIPS 2020 Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang

Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent arbitrary distribution of box locations.

Dense Object Detection General Classification

Neural Architecture Search for Compressed Sensing Magnetic Resonance Image Reconstruction

1 code implementation 22 Feb 2020 Jiangpeng Yan, Shuo Chen, Yongbing Zhang, Xiu Li

Our proposed method can reach a better trade-off between computation cost and reconstruction performance for MR reconstruction problem with good generalizability and offer insights to design neural networks for other medical image applications.

Image Reconstruction Neural Architecture Search +1

Curvilinear Distance Metric Learning

1 code implementation NeurIPS 2019 Shuo Chen, Lei Luo, Jian Yang, Chen Gong, Jun Li, Heng Huang

To address this issue, we first reveal that the traditional linear distance metric is equivalent to the cumulative arc length between the data pair's nearest points on the learned straight measurer lines.

Metric Learning

GANSynth: Adversarial Neural Audio Synthesis

6 code implementations ICLR 2019 Jesse Engel, Kumar Krishna Agrawal, Shuo Chen, Ishaan Gulrajani, Chris Donahue, Adam Roberts

Efficient audio synthesis is an inherently difficult machine learning task, as human perception is sensitive to both global structure and fine-scale waveform coherence.

Audio Generation Audio Synthesis

Formal Specification and Verification of Smart Contracts for Azure Blockchain

1 code implementation 20 Dec 2018 Shuvendu K. Lahiri, Shuo Chen, Yuepeng Wang, Isil Dillig

In this paper, we describe the formal verification of Smart Contracts offered as part of the Azure Blockchain Content and Samples on github.

Programming Languages F.3.1

Adversarial Metric Learning

no code implementations 9 Feb 2018 Shuo Chen, Chen Gong, Jian Yang, Xiang Li, Yang Wei, Jun Li

In distinguishment stage, a metric is exhaustively learned to try its best to distinguish both the adversarial pairs and the original training pairs.

Metric Learning

Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift

4 code implementations CVPR 2019 Xiang Li, Shuo Chen, Xiaolin Hu, Jian Yang

Theoretically, we find that Dropout would shift the variance of a specific neural unit when we transfer the state of that network from train to test.

Deep Multi-Species Embedding

no code implementations 28 Sep 2016 Di Chen, Yexiang Xue, Shuo Chen, Daniel Fink, Carla Gomes

Additionally, we demonstrate the benefit of using a deep neural network to extract features within the embedding and show how they improve the predictive performance of species distribution modelling.
