Search Results for author: Yuxuan Zhang

Found 37 papers, 17 papers with code

Decoding the Flow: CauseMotion for Emotional Causality Analysis in Long-form Conversations

no code implementations 1 Jan 2025 Yuxuan Zhang, Yulong Li, Zichen Yu, Feilong Tang, Zhixiang Lu, Chong Li, Kang Dang, Jionglong Su

To address the limitations of large-scale language models (e.g., GPT-4) in capturing intricate emotional causality within extended dialogues, we propose CauseMotion, a long-sequence emotional causal reasoning framework grounded in Retrieval-Augmented Generation (RAG) and multimodal fusion.

Causal Inference RAG

CausalTAD: Causal Implicit Generative Model for Debiased Online Trajectory Anomaly Detection

1 code implementation 25 Dec 2024 Wenbin Li, Di Yao, Chang Gong, Xiaokai Chu, Quanliang Jing, Xiaolei Zhou, Yuxuan Zhang, Yunxia Fan, Jingping Bi

Existing solutions directly train a generative model for observed trajectories and calculate the conditional generative probability $P({T}|{C})$ as the anomaly risk, where ${T}$ and ${C}$ represent the trajectory and SD pair respectively.

Anomaly Detection

Comparison of Tiny Machine Learning Techniques for Embedded Acoustic Emission Analysis

no code implementations 22 Nov 2024 Uditha Muthumala, Yuxuan Zhang, Luciano Sebastian Martinez-Rau, Sebastian Bader

These classifications can be performed based on the entire AE waveform or specific features that have been extracted from it.

Structural Health Monitoring

Edge-Enhanced Dilated Residual Attention Network for Multimodal Medical Image Fusion

1 code implementation 18 Nov 2024 Meng Zhou, Yuxuan Zhang, Xiaolan Xu, Jiayi Wang, Farzad Khalvati

Multimodal medical image fusion is a crucial task that combines complementary information from different imaging modalities into a unified representation, thereby enhancing diagnostic accuracy and treatment planning.

Brain Tumor Classification

On-device Anomaly Detection in Conveyor Belt Operations

no code implementations 16 Nov 2024 Luciano S. Martinez-Rau, Yuxuan Zhang, Bengt Oelmann, Sebastian Bader

This study proposes two distinctive pattern recognition approaches for real-time anomaly detection in the operational cycles of mining conveyor belts, combining feature extraction, threshold-based cycle detection, and tiny machine-learning classification.

Anomaly Detection

RmGPT: Rotating Machinery Generative Pretrained Model

no code implementations 26 Sep 2024 Yilin Wang, Yifei Yu, Kong Sun, Peixuan Lei, Yuxuan Zhang, Enrico Zio, Aiguo Xia, Yuanxiang Li

Extensive experiments demonstrate that RmGPT significantly outperforms state-of-the-art algorithms, achieving near-perfect accuracy in diagnosis tasks and exceptionally low errors in prognosis tasks.

Few-Shot Learning Self-Supervised Learning

Scalable quantum dynamics compilation via quantum machine learning

no code implementations 24 Sep 2024 Yuxuan Zhang, Roeland Wiersema, Juan Carrasquilla, Lukasz Cincio, Yong Baek Kim

In this work, we explore the potential of a VQC scheme by making use of out-of-distribution generalization results in quantum machine learning (QML): By learning the action of a given many-body dynamics on a small data set of product states, we can obtain a unitary circuit that generalizes to highly entangled states such as the Haar random states.

Out-of-Distribution Generalization Quantum Machine Learning

GeoFormer: Learning Point Cloud Completion with Tri-Plane Integrated Transformer

1 code implementation 13 Aug 2024 Jinpeng Yu, Binbin Huang, Yuxuan Zhang, Huaxia Li, Xu Tang, Shenghua Gao

In this paper, we introduce a GeoFormer that simultaneously enhances the global geometric structure of the points and improves the local details.

Point Cloud Completion

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

1 code implementation 12 Aug 2024 Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang

We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixels.

Text-to-Video Generation Video Alignment +2

Stable-Hair: Real-World Hair Transfer via Diffusion Model

1 code implementation 19 Jul 2024 Yuxuan Zhang, Qing Zhang, Yiren Song, Jichao Zhang, Hao Tang, Jiaming Liu

In the second stage, we specifically designed a Hair Extractor and a Latent IdentityNet to transfer the target hairstyle to the bald image with high detail and fidelity.

Triplet

EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

1 code implementation 28 Jun 2024 Yuxuan Zhang, Tianheng Cheng, Rui Hu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

Surprisingly, we observe that: (1) multimodal prompts and (2) vision-language models with early fusion (e.g., BEIT-3) are beneficial for prompting SAM for accurate referring segmentation.

Interactive Segmentation Language Modeling +4

Cost-efficient Active Illumination Camera For Hyper-spectral Reconstruction

no code implementations 27 Jun 2024 Yuxuan Zhang, T. M. Sazzad, Yangyang Song, Spencer J. Chang, Ritesh Chowdhry, Tomas Mejia, Anna Hampton, Shelby Kucharski, Stefan Gerber, Barry Tillman, Marcio F. R. Resende, William M. Hammond, Chris H. Wilson, Alina Zare, Sanjeev J. Koppal

In addition, the ability to reconstruct hyperspectral data from multi-spectral input makes our device compatible with models and algorithms developed for hyperspectral applications, with no modifications required.

Spectral Reconstruction

ProcessPainter: Learn Painting Process from Sequence Data

no code implementations 10 Jun 2024 Yiren Song, Shijie Huang, Chen Yao, Xiaojun Ye, Hai Ci, Jiaming Liu, Yuxuan Zhang, Mike Zheng Shou

The painting process of artists is inherently stepwise and varies significantly among different painters and styles.

Denoising Image Generation

Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023

no code implementations 26 Mar 2024 Hongpeng Pan, Yang Yang, Zhongtian Fu, Yuxuan Zhang, Shian Du, Yi Xu, Xiangyang Ji

To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of the static point in the videos shot by a static camera.

Motion Detection Point Tracking +2

Fast Personalized Text-to-Image Syntheses With Attention Injection

no code implementations 17 Mar 2024 Yuxuan Zhang, Yiren Song, Jinpeng Yu, Han Pan, Zhongliang Jing

Currently, personalized image generation methods mostly require considerable time to finetune and often overfit the concept, resulting in generated images that are similar to custom concepts but difficult to edit by prompts.

Personalized Image Generation Text-to-Image Generation

Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model

1 code implementation 12 Mar 2024 Yuxuan Zhang, Lifu Wei, Qing Zhang, Yiren Song, Jiaming Liu, Huaxia Li, Xu Tang, Yao Hu, Haibo Zhao

Current makeup transfer methods are limited to simple makeup styles, making them difficult to apply in real-world scenarios.

Text-to-Image Generation

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

1 code implementation CVPR 2024 Yuxuan Zhang, Yiren Song, Jiaming Liu, Rui Wang, Jinpeng Yu, Hao Tang, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing

Recent advancements in subject-driven image generation have led to zero-shot generation, yet precise selection and focus on crucial subject representations remain challenging.

Image Generation

CogAgent: A Visual Language Model for GUI Agents

3 code implementations CVPR 2024 Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxuan Zhang, Juanzi Li, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang

People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens.

Language Modeling +4

DPP-based Client Selection for Federated Learning with Non-IID Data

no code implementations 30 Mar 2023 Yuxuan Zhang, Chao Xu, Howard H. Yang, Xijun Wang, Tony Q. S. Quek

This paper proposes a client selection (CS) method to tackle the communication bottleneck of federated learning (FL) while concurrently coping with FL's data heterogeneity issue.

Federated Learning

Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks

no code implementations 24 Feb 2023 Yuxuan Zhang, Qingzhong Wang, Jiang Bian, Yi Liu, Yanwu Xu, Dejing Dou, Haoyi Xiong

Due to the high similarity between MRI data and videos, we conduct extensive empirical studies on video recognition techniques for MRI classification to answer the questions: (1) can we directly use video recognition models for MRI classification, (2) which model is more appropriate for MRI, (3) are the common tricks like data augmentation in video recognition still useful for MRI classification?

Classification Data Augmentation +3

Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography

no code implementations CVPR 2023 Ilya Chugunov, Yuxuan Zhang, Felix Heide

Modern mobile burst photography pipelines capture and merge a short sequence of frames to recover an enhanced image, but often disregard the 3D nature of the scene they capture, treating pixel motion between images as a 2D aggregation problem.

Depth And Camera Motion Pose Estimation

Neural Volume Super-Resolution

no code implementations 9 Dec 2022 Yuval Bahat, Yuxuan Zhang, Hendrik Sommerhoff, Andreas Kolb, Felix Heide

This allows us to super-resolve the 3D scene representation by applying 2D convolutional networks on the 2D feature planes.

Super-Resolution

An Attention-based Multi-Scale Feature Learning Network for Multimodal Medical Image Fusion

1 code implementation 9 Dec 2022 Meng Zhou, Xiaolan Xu, Yuxuan Zhang

Furthermore, we propose a novel fixed fusion strategy termed Softmax-based weighted strategy based on the Softmax weights and matrix nuclear norm.

An Edge Alignment-based Orientation Selection Method for Neutron Tomography

no code implementations 1 Dec 2022 Diyu Yang, Shimin Tang, Singanallur V. Venkatakrishnan, Mohammad S. N. Chowdhury, Yuxuan Zhang, Hassina Z. Bilheux, Gregery T. Buzzard, Charles A. Bouman

Neutron computed tomography (nCT) is a 3D characterization technique used to image the internal morphology or chemical composition of samples in biology and materials sciences.

Diversity

All You Need is RAW: Defending Against Adversarial Attacks with Camera Image Pipelines

1 code implementation 16 Dec 2021 Yuxuan Zhang, Bo Dong, Felix Heide

Various defense methods have proposed image-to-image mapping methods, either including these perturbations in the training process or removing them in a preprocessing denoising step.

Adversarial Defense Denoising +3

The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement

1 code implementation CVPR 2022 Ilya Chugunov, Yuxuan Zhang, Zhihao Xia, Xuaner Zhang, Jiawen Chen, Felix Heide

Modern smartphones can continuously stream multi-megapixel RGB images at 60Hz, synchronized with high-quality 3D pose information and low-resolution LiDAR-driven depth estimates.

CelebHair: A New Large-Scale Dataset for Hairstyle Recommendation based on CelebA

no code implementations 14 Apr 2021 Yutao Chen, Yuxuan Zhang, Zhongrui Huang, Zhenyao Luo, Jinpeng Chen

In this paper, we present a new large-scale dataset for hairstyle recommendation, CelebHair, based on the celebrity facial attributes dataset, CelebA.

Facial Landmark Detection

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

2 code implementations CVPR 2021 Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler

To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts, and 32 car parts.

Decoder Image Segmentation +1

Adaptive Radar Detection and Classification Algorithms for Multiple Coherent Signals

no code implementations 23 Dec 2020 Sudan Han, Linjie Yan, Yuxuan Zhang, Pia Addabbo, Chengpeng Hao, Danilo Orlando

In this paper, we address the problem of target detection in the presence of coherent (or fully correlated) signals, which can be due to multipath propagation effects or electronic attacks by smart jammers.

General Classification

A prognostic dynamic model applicable to infectious diseases providing easily visualized guides -- A case study of COVID-19 in the UK

1 code implementation 14 Dec 2020 Yuxuan Zhang, Chen Gong, Dawei Li, Zhi-Wei Wang, Shengda D Pu, Alex W Robertson, Hong Yu, John Parrington

A reasonable prediction of infectious diseases transmission process under different disease control strategies is an important reference point for policy makers.

Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

no code implementations ICLR 2021 Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler

Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties.

3D geometry Neural Rendering

A Combined Data-driven and Physics-driven Method for Steady Heat Conduction Prediction using Deep Convolutional Neural Networks

no code implementations 16 May 2020 Hao Ma, Xiangyu Hu, Yuxuan Zhang, Nils Thuerey, Oskar J. Haidn

For the data-driven method, introducing the physical equation not only speeds up convergence but also produces physically more consistent solutions.

Deep Neural Network Fingerprinting by Conferrable Adversarial Examples

1 code implementation ICLR 2021 Nils Lukas, Yuxuan Zhang, Florian Kerschbaum

We propose a fingerprinting method for deep neural network classifiers that extracts a set of inputs from the source model so that only surrogates agree with the source model on the classification of such inputs.

Model extraction Transfer Learning
