Search Results for author: Zeyu Wang

Found 56 papers, 30 papers with code

What If We Recaption Billions of Web Images with LLaMA-3?

no code implementations • 12 Jun 2024 • Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie

For discriminative models like CLIP, we observe enhanced zero-shot performance in cross-modal retrieval tasks.

Mamba YOLO: SSMs-Based YOLO For Object Detection

1 code implementation • 9 Jun 2024 • Zeyu Wang, Chen Li, Huiying Xu, Xinzhong Zhu

To further enhance detection performance, Transformer-based structures have been introduced, significantly expanding the model's receptive field and achieving notable performance gains.

Novel Object Detection Object +2

Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

1 code implementation • 8 Jun 2024 • Sucheng Ren, Xiaoke Huang, Xianhang Li, Junfei Xiao, Jieru Mei, Zeyu Wang, Alan Yuille, Yuyin Zhou

This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework.

Conditional Image Generation Denoising +2

Knowledge-enhanced Relation Graph and Task Sampling for Few-shot Molecular Property Prediction

1 code implementation • 24 May 2024 • Zeyu Wang, Tianyi Jiang, Yao Lu, Xiaoze Bao, Shanqing Yu, Bin Wei, Qi Xuan

The knowledge-enhanced relation graph module constructs the molecule-property multi-relation graph (MPMRG) to capture the many-to-many relationships between molecules and properties.

Meta-Learning Molecular Property Prediction +3

G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios

1 code implementation • 13 May 2024 • Zeyu Wang, Yuanchun Shi, Yuntao Wang, Yuchen Yao, Kun Yan, YuHan Wang, Lei Ji, Xuhai Xu, Chun Yu

Modern information querying systems are progressively incorporating multimodal inputs like vision and audio.

Natural Language Queries

Camera-Based Remote Physiology Sensing for Hundreds of Subjects Across Skin Tones

1 code implementation • 7 Apr 2024 • Jiankai Tang, Xinyi Li, Jiacheng Liu, Xiyuxing Zhang, Zeyu Wang, Yuntao Wang

Remote photoplethysmography (rPPG) emerges as a promising method for non-invasive, convenient measurement of vital signs, utilizing the widespread presence of cameras.

HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation

no code implementations • 14 Mar 2024 • Duotun Wang, Hengyu Meng, Zeyu Cai, Zhijing Shao, Qianxi Liu, Lin Wang, Mingming Fan, Xiaohang Zhan, Zeyu Wang

We present HeadEvolver, a novel framework to generate stylized head avatars from text guidance.

Attribute

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

1 code implementation • CVPR 2024 • Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang

We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device.

MemoNav: Working Memory Model for Visual Navigation

1 code implementation • CVPR 2024 • Hongxin Li, Zeyu Wang, Xu Yang, Yuran Yang, Shuqi Mei, Zhaoxiang Zhang

Subsequently, a graph attention module encodes the retained STM and the LTM to generate working memory (WM) which contains the scene features essential for efficient navigation.

Decision Making Graph Attention +2

Multi-Human Mesh Recovery with Transformers

no code implementations • 26 Feb 2024 • Zeyu Wang, Zhenzhen Weng, Serena Yeung-Levy

Conventional approaches to human mesh recovery predominantly employ a region-based strategy.

Human Mesh Recovery

HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification

no code implementations • 24 Jan 2024 • Vidit Jain, Mukund Rungta, Yuchen Zhuang, Yue Yu, Zeyu Wang, Mu Gao, Jeffrey Skolnick, Chao Zhang

The best-performing models aim to learn a static representation by combining document and hierarchical label information.

Language Modelling Multi Label Text Classification +3

BoNuS: Boundary Mining for Nuclei Segmentation with Partial Point Labels

1 code implementation • 15 Jan 2024 • Yi Lin, Zeyu Wang, Dong Zhang, Kwang-Ting Cheng, Hao Chen

To alleviate this problem, in this paper, we propose a weakly-supervised nuclei segmentation method that only requires partial point labels of nuclei.

Multiple Instance Learning Segmentation

Revisiting Adversarial Training at Scale

1 code implementation • CVPR 2024 • Zeyu Wang, Xianhang Li, Hongru Zhu, Cihang Xie

For example, by training on DataComp-1B dataset, our AdvXL empowers a vanilla ViT-g model to substantially surpass the previous records of $l_{\infty}$-, $l_{2}$-, and $l_{1}$-robust accuracy by margins of 11.4%, 14.2% and 12.9%, respectively.

Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry

1 code implementation • 7 Jan 2024 • Zeyu Wang, Tianyi Jiang, Jinhuan Wang, Qi Xuan

Molecular property prediction refers to the task of labeling molecules with some biochemical properties, playing a pivotal role in the drug discovery and design process.

Data Augmentation Drug Discovery +4

A Robust Quantile Huber Loss With Interpretable Parameter Adjustment In Distributional Reinforcement Learning

1 code implementation • 4 Jan 2024 • Parvin Malekzadeh, Konstantinos N. Plataniotis, Zissis Poulos, Zeyu Wang

Distributional Reinforcement Learning (RL) estimates return distribution mainly by learning quantile values via minimizing the quantile Huber loss function, entailing a threshold parameter often selected heuristically or via hyperparameter search, which may not generalize well and can be suboptimal.

Atari Games Distributional Reinforcement Learning +1

Voila-A: Aligning Vision-Language Models with User's Gaze Attention

no code implementations • 22 Dec 2023 • Kun Yan, Lei Ji, Zeyu Wang, Yuntao Wang, Nan Duan, Shuai Ma

In this paper, we introduce gaze information, feasibly collected by AR or VR devices, as a proxy for human attention to guide VLMs and propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.

MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising

no code implementations • 18 Dec 2023 • Bingyuan Wang, Hengyu Meng, Zeyu Cai, Lanjiong Li, Yue Ma, Qifeng Chen, Zeyu Wang

Visual storytelling often uses nontypical aspect-ratio images like scroll paintings, comic strips, and panoramas to create an expressive and compelling narrative.

Denoising Image Generation +1

Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis

no code implementations • 14 Dec 2023 • Frank P.-W. Lo, Jianing Qiu, Zeyu Wang, Junhong Chen, Bo Xiao, Wu Yuan, Stamatia Giannarou, Gary Frost, Benny Lo

Although artificial intelligence (AI)-based solutions have been devised to automate the dietary assessment process, these prior AI methodologies encounter challenges in their ability to generalize across a diverse range of food types, dietary behaviors, and cultural contexts.

Image Captioning Scene Understanding

Rejuvenating image-GPT as Strong Visual Representation Learners

1 code implementation • 4 Dec 2023 • Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie

This paper enhances image-GPT (iGPT), one of the pioneering works that introduce autoregressive pretraining to predict next pixels for visual representation learning.

Representation Learning

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

1 code implementation • CVPR 2024 • Yipeng Gao, Zeyu Wang, Wei-Shi Zheng, Cihang Xie, Yuyin Zhou

Contrastive learning has emerged as a promising paradigm for 3D open-world understanding, i.e., aligning point cloud representation to image and text embedding space individually.

Ranked #1 on Zero-shot 3D classification on Objaverse LVIS (using extra training data)

Contrastive Learning Retrieval +3

FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning

1 code implementation • 6 Oct 2023 • Peiran Xu, Zeyu Wang, Jieru Mei, Liangqiong Qu, Alan Yuille, Cihang Xie, Yuyin Zhou

Federated learning (FL) is an emerging paradigm in machine learning, where a shared model is collaboratively learned using data from multiple devices to mitigate the risk of data leakage.

Federated Learning

Few-Shot Multi-Label Aspect Category Detection Utilizing Prototypical Network with Sentence-Level Weighting and Label Augmentation

no code implementations • 27 Sep 2023 • Zeyu Wang, Mizuho Iwaihara

Since aspect category detection often suffers from limited datasets and data sparsity, the prototypical network with attention mechanisms has been applied for few-shot aspect category detection.

Aspect Category Detection Sentence

DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation

1 code implementation • ICCV 2023 • Zeyu Wang, Dingwen Li, Chenxu Luo, Cihang Xie, Xiaodong Yang

In this work, we propose to boost the representation learning of a multi-camera BEV based student detector by training it to imitate the features of a well-trained LiDAR based teacher detector.

3D Object Detection Autonomous Driving +4

CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a \$10,000 Budget; An Extra \$4,000 Unlocks 81.8% Accuracy

2 code implementations • 27 Jun 2023 • Xianhang Li, Zeyu Wang, Cihang Xie

The recent work CLIPA presents an inverse scaling law for CLIP training -- whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training.

Subgraph Networks Based Contrastive Learning

no code implementations • 6 Jun 2023 • Jinhuan Wang, Jiafei Shao, Zeyu Wang, Shanqing Yu, Qi Xuan, Xiaoniu Yang

In addition, we also investigate the impact of the second-order subgraph augmentation on mining graph structure interactions, and further, propose a contrastive objective that fuses the first-order and second-order subgraph information.

Attribute Contrastive Learning +3

ZeroAvatar: Zero-shot 3D Avatar Generation from a Single Image

no code implementations • 25 May 2023 • Zhenzhen Weng, Zeyu Wang, Serena Yeung

Recent advancements in text-to-image generation have enabled significant progress in zero-shot 3D shape generation.

3D Shape Generation Image to 3D +1

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

3 code implementations • 16 May 2023 • Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, Zhengxin Zhang, Rae Ying Yee Wong, Alan Zhu, Lijie Yang, Xiaoxiang Shi, Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia

Our evaluation shows that SpecInfer outperforms existing LLM serving systems by 1.5-2.8x for distributed LLM inference and by 2.6-3.5x for offloading-based LLM inference, while preserving the same generative performance.

Decoder Language Modelling +1

An Inverse Scaling Law for CLIP Training

1 code implementation • NeurIPS 2023 • Xianhang Li, Zeyu Wang, Cihang Xie

However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration.

Click-Feedback Retrieval

no code implementations • 28 Apr 2023 • Zeyu Wang, Yu Wu

In this work, we study a setting where the feedback is provided through users clicking liked and disliked search results.

Retrieval

A robust method for reliability updating with equality information using sequential adaptive importance sampling

no code implementations • 8 Mar 2023 • Xiong Xiao, Zeyu Wang, Quanwang Li

Reliability updating refers to a problem that integrates Bayesian updating technique with structural reliability analysis and cannot be directly solved by structural reliability methods (SRMs) when it involves equality information.

Computational Efficiency

On the Adversarial Robustness of Camera-based 3D Object Detection

1 code implementation • 25 Jan 2023 • Shaoyuan Xie, Zichao Li, Zeyu Wang, Cihang Xie

In recent years, camera-based 3D object detection has gained widespread attention for its ability to achieve high performance with low computational cost.

3D Object Detection Adversarial Attack +5

Thermal Infrared Image Inpainting via Edge-Aware Guidance

no code implementations • 28 Oct 2022 • Zeyu Wang, Haibin Shen, Changyou Men, Quan Sun, Kejie Huang

In this paper, we propose a novel task -- Thermal Infrared Image Inpainting, which aims to reconstruct missing regions of TIR images.

Image Inpainting

Bag of Tricks for FGSM Adversarial Training

no code implementations • 6 Sep 2022 • Zichao Li, Li Liu, Zeyu Wang, Yuyin Zhou, Cihang Xie

Adversarial training (AT) with samples generated by Fast Gradient Sign Method (FGSM), also known as FGSM-AT, is a computationally simple method to train robust networks.

Masked Autoencoders Enable Efficient Knowledge Distillers

1 code implementation • CVPR 2023 • Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

For example, by distilling the knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84.0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1.2%.

Knowledge Distillation

Predicting Word Learning in Children from the Performance of Computer Vision Systems

no code implementations • 7 Jul 2022 • Sunayana Rane, Mira L. Nencheva, Zeyu Wang, Casey Lew-Williams, Olga Russakovsky, Thomas L. Griffiths

The performance of the computer vision systems is correlated with human judgments of the concreteness of words, which are in turn a predictor of children's word learning, suggesting that these models are capturing the relationship between words and visual phenomena.

Image Captioning

InsMix: Towards Realistic Generative Data Augmentation for Nuclei Instance Segmentation

1 code implementation • 30 Jun 2022 • Yi Lin, Zeyu Wang, Kwang-Ting Cheng, Hao Chen

Nuclei Segmentation from histology images is a fundamental task in digital pathology analysis.

Data Augmentation Instance Segmentation +2

Learning Optimal Treatment Strategies for Sepsis Using Offline Reinforcement Learning in Continuous Space

no code implementations • 22 Jun 2022 • Zeyu Wang, Huiying Zhao, Peng Ren, Yuxi Zhou, Ming Sheng

Sepsis is a leading cause of death in the ICU.

reinforcement-learning Reinforcement Learning (RL)

Can CNNs Be More Robust Than Transformers?

1 code implementation • 7 Jun 2022 • Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie

The recent success of Vision Transformers is shaking the long dominance of Convolutional Neural Networks (CNNs) in image recognition for a decade.

Gamma and Vega Hedging Using Deep Distributional Reinforcement Learning

1 code implementation • 10 May 2022 • Jay Cao, Jacky Chen, Soroush Farghadani, John Hull, Zissis Poulos, Zeyu Wang, Jun Yuan

We show how D4PG can be used in conjunction with quantile regression to develop a hedging strategy for a trader responsible for derivatives that arrive stochastically and depend on a single underlying asset.

Distributional Reinforcement Learning Position +2

Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection

no code implementations • 25 Jan 2022 • Suchin Gururangan, Dallas Card, Sarah K. Dreier, Emily K. Gade, Leroy Z. Wang, Zeyu Wang, Luke Zettlemoyer, Noah A. Smith

Language models increasingly rely on massive web dumps for diverse text data.

Language Modelling

Multi-Query Video Retrieval

1 code implementation • 10 Jan 2022 • Zeyu Wang, Yu Wu, Karthik Narasimhan, Olga Russakovsky

Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years.

Retrieval Video Retrieval

Human Activity Recognition Using 3D Orthogonally-projected EfficientNet on Radar Time-Range-Doppler Signature

no code implementations • 24 Nov 2021 • Zeyu Wang, Chenglin Yao, Jianfeng Ren, Xudong Jiang

In radar activity recognition, 2D signal representations such as spectrogram, cepstrum and cadence velocity diagram are often utilized, while range information is often neglected.

Human Activity Recognition

Semi-Supervised Crowd Counting from Unlabeled Data

no code implementations • 31 Aug 2021 • Haoran Duan, Fan Wan, Rui Sun, Zeyu Wang, Varun Ojha, Yu Guan, Hubert P. H. Shum, Bingzhang Hu, Yang Long

Our method achieved competitive performance in semi-supervised learning approaches on these crowd counting datasets.

Crowd Counting

Chi-square Loss for Softmax: an Echo of Neural Network Structure

no code implementations • 31 Aug 2021 • Zeyu Wang, Meiqing Wang

In addition, we studied the sample distribution of this loss function by visualization and found that the distribution is related to the neural network structure, which is distinct compared to cross-entropy.

maplet: An extensible R toolbox for modular and reproducible omics pipelines

no code implementations • 6 May 2021 • Kelsey Chetnik, Elisa Benedetti, Daniel P. Gomari, Annalise Schweickart, Richa Batra, Mustafa Buyukozkan, Zeyu Wang, Matthias Arnold, Jonas Zierer, Karsten Suhre, Jan Krumsiek

This paper presents maplet, an open-source R package for the creation of highly customizable, fully reproducible statistical pipelines for omics data analysis, with a special focus on metabolomics-based methods.

Data Visualization

Plot2API: Recommending Graphic API from Plot via Semantic Parsing Guided Neural Network

1 code implementation • 2 Apr 2021 • Zeyu Wang, Sheng Huang, Zhongxin Liu, Meng Yan, Xin Xia, Bei Wang, Dan Yang

Considering the lack of technologies in Plot2API, we present a novel deep multi-task learning approach named Semantic Parsing Guided Neural Network (SPGNN), which formulates the Plot2API problem as a multi-label image classification task and an image semantic parsing task.

Data Augmentation Data Visualization +3

Towards Unique and Informative Captioning of Images

1 code implementation • ECCV 2020 • Zeyu Wang, Berthy Feng, Karthik Narasimhan, Olga Russakovsky

We find that modern captioning systems return higher likelihoods for incorrect distractor sentences compared to ground truth captions, and that evaluation metrics like SPICE can be 'topped' using simple captioning systems relying on object detectors.

Image Captioning Re-Ranking

Value of Information Analysis via Active Learning and Knowledge Sharing in Error-Controlled Adaptive Kriging

no code implementations • 6 Feb 2020 • Chi Zhang, Zeyu Wang, Abdollah Shafieezadeh

The proposed VoI analysis framework is applied for an optimal decision-making problem involving load testing of a truss bridge.

Active Learning Bayesian Inference +1

REAK: Reliability analysis through Error rate-based Adaptive Kriging

no code implementations • 4 Feb 2020 • Zeyu Wang, Abdollah Shafieezadeh

Reliability analysis for these systems when failure probabilities are small is significantly challenging, requiring a large number of costly simulations.

Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation

3 code implementations • CVPR 2020 • Zeyu Wang, Klint Qinami, Ioannis Christos Karakozis, Kyle Genova, Prem Nair, Kenji Hata, Olga Russakovsky

We design a simple but surprisingly effective visual recognition benchmark for studying bias mitigation.

Ranked #1 on Out-of-Distribution Generalization on UrbanCars

Activity Recognition Attribute +3

SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines

6 code implementations • 14 Nov 2019 • Yinda Xu, Zeyu Wang, Zuoxin Li, Ye Yuan, Gang Yu

Following these guidelines, we design our Fully Convolutional Siamese tracker++ (SiamFC++) by introducing both classification and target state estimation branch (G1), classification score without ambiguity (G2), tracking without prior knowledge (G3), and estimation quality score (G4).

Ranked #2 on Visual Object Tracking on VOT2017/18 (using extra training data)

Classification General Classification +3

Self-supervised Learning of Detailed 3D Face Reconstruction

1 code implementation • 25 Oct 2019 • Yajing Chen, Fanzi Wu, Zeyu Wang, Yibing Song, Yonggen Ling, Linchao Bao

The displacement map and the coarse model are used to render a final detailed face, which again can be compared with the original input image to serve as a photometric loss for the second stage.

3D Face Reconstruction Face Alignment +1

SCANN: Synthesis of Compact and Accurate Neural Networks

no code implementations • 19 Apr 2019 • Shayan Hassantabar, Zeyu Wang, Niraj K. Jha

To address these challenges, we propose a two-step neural network synthesis methodology, called DR+SCANN, that combines two complementary approaches to design compact and accurate DNNs.

Dimensionality Reduction Neural Network Compression +1

Multi-band Weighted $l_p$ Norm Minimization for Image Denoising

no code implementations • 14 Jan 2019 • Yanchi Su, Zhanshan Li, Haihong Yu, Zeyu Wang

Low rank matrix approximation (LRMA) has drawn increasing attention in recent years, due to its wide range of applications in computer vision and machine learning.

Image Denoising

A Data-driven Adversarial Examples Recognition Framework via Adversarial Feature Genome

no code implementations • 25 Dec 2018 • Li Chen, Qi Li, Weiye Chen, Zeyu Wang, Haifeng Li

In this regard, we propose the Adversarial Feature Genome (AFG), a novel type of data that contains both the differences and features about classes.

General Classification Multi-Label Classification

AniCode: Authoring Coded Artifacts for Network-Free Personalized Animations

1 code implementation • 31 Jul 2018 • Zeyu Wang, Shiyu Qiu, Qingyang Chen, Alexander Ringlein, Julie Dorsey, Holly Rushmeier

We introduce AniCode, a novel framework for authoring and consuming time-based media.

Graphics
