3 code implementations • 7 Jan 2025 • Nvidia, :, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Jingyi Jin, Seung Wook Kim, Gergely Klár, Grace Lam, Shiyi Lan, Laura Leal-Taixe, Anqi Li, Zhaoshuo Li, Chen-Hsuan Lin, Tsung-Yi Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Arsalan Mousavian, Seungjun Nah, Sriharsha Niverty, David Page, Despoina Paschalidou, Zeeshan Patel, Lindsey Pavao, Morteza Ramezanali, Fitsum Reda, Xiaowei Ren, Vasanth Rao Naik Sabavat, Ed Schmerling, Stella Shi, Bartosz Stefaniak, Shitao Tang, Lyne Tchapmi, Przemek Tredak, Wei-Cheng Tseng, Jibin Varghese, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Xinyue Wei, Jay Zhangjie Wu, Jiashu Xu, Wei Yang, Lin Yen-Chen, Xiaohui Zeng, Yu Zeng, Jing Zhang, Qinsheng Zhang, Yuxuan Zhang, Qingqing Zhao, Artur Zolkowski
We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications.
no code implementations • 1 Jan 2025 • Yuxuan Zhang, Yulong Li, Zichen Yu, Feilong Tang, Zhixiang Lu, Chong Li, Kang Dang, Jionglong Su
To address the limitations of large-scale language models (e. g., GPT-4) in capturing intricate emotional causality within extended dialogues, we propose CauseMotion, a long-sequence emotional causal reasoning framework grounded in Retrieval-Augmented Generation (RAG) and multimodal fusion.
no code implementations • 1 Jan 2025 • Yulong Li, Yuxuan Zhang, Feilong Tang, Mian Zhou, Zhixiang Lu, Haochen Xue, Yifang Wang, Kang Dang, Jionglong Su
To improve the accuracy and applicability of sign language systems, we propose the AuraLLM and SignMST-C models.
1 code implementation • 25 Dec 2024 • Wenbin Li, Di Yao, Chang Gong, Xiaokai Chu, Quanliang Jing, Xiaolei Zhou, Yuxuan Zhang, Yunxia Fan, Jingping Bi
Existing solutions directly train a generative model for observed trajectories and calculate the conditional generative probability $P({T}|{C})$ as the anomaly risk, where ${T}$ and ${C}$ represent the trajectory and SD pair respectively.
no code implementations • 22 Nov 2024 • Uditha Muthumala, Yuxuan Zhang, Luciano Sebastian Martinez-Rau, Sebastian Bader
These classifications can be performed based on the entire AE waveform or specific features that have been extracted from it.
1 code implementation • 18 Nov 2024 • Meng Zhou, Yuxuan Zhang, Xiaolan Xu, Jiayi Wang, Farzad Khalvati
Multimodal medical image fusion is a crucial task that combines complementary information from different imaging modalities into a unified representation, thereby enhancing diagnostic accuracy and treatment planning.
no code implementations • 16 Nov 2024 • Luciano S. Martinez-Rau, Yuxuan Zhang, Bengt Oelmann, Sebastian Bader
This study proposes two distinctive pattern recognition approaches for real-time anomaly detection in the operational cycles of mining conveyor belts, combining feature extraction, threshold-based cycle detection, and tiny machine-learning classification.
1 code implementation • 2 Oct 2024 • Yuxuan Zhang, Ruizhe Li
This approach reduces inference time to less than twice that of single LoRA inference by leveraging parallel computation.
no code implementations • 26 Sep 2024 • Yilin Wang, Yifei Yu, Kong Sun, Peixuan Lei, Yuxuan Zhang, Enrico Zio, Aiguo Xia, Yuanxiang Li
Extensive experiments demonstrate that RmGPT significantly outperforms state-of-the-art algorithms, achieving near-perfect accuracy in diagnosis tasks and exceptionally low errors in prognosis tasks.
no code implementations • 24 Sep 2024 • Yuxuan Zhang, Roeland Wiersema, Juan Carrasquilla, Lukasz Cincio, Yong Baek Kim
In this work, we explore the potential of a VQC scheme by making use of out-of-distribution generalization results in quantum machine learning (QML): By learning the action of a given many-body dynamics on a small data set of product states, we can obtain a unitary circuit that generalizes to highly entangled states such as the Haar random states.
1 code implementation • 13 Aug 2024 • Jinpeng Yu, Binbin Huang, Yuxuan Zhang, Huaxia Li, Xu Tang, Shenghua Gao
In this paper, we introduce a GeoFormer that simultaneously enhances the global geometric structure of the points and improves the local details.
1 code implementation • 12 Aug 2024 • Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang
We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixels.
1 code implementation • 19 Jul 2024 • Yuxuan Zhang, Qing Zhang, Yiren Song, Jichao Zhang, Hao Tang, Jiaming Liu
In the second stage, we specifically designed a Hair Extractor and a Latent IdentityNet to transfer the target hairstyle with highly detailed and high-fidelity to the bald image.
1 code implementation • 28 Jun 2024 • Yuxuan Zhang, Tianheng Cheng, Rui Hu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang
Surprisingly, we observe that: (1) multimodal prompts and (2) vision-language models with early fusion (e. g., BEIT-3) are beneficial for prompting SAM for accurate referring segmentation.
Ranked #3 on Referring Expression Segmentation on RefCOCO testA
no code implementations • 27 Jun 2024 • Yuxuan Zhang, T. M. Sazzad, Yangyang Song, Spencer J. Chang, Ritesh Chowdhry, Tomas Mejia, Anna Hampton, Shelby Kucharski, Stefan Gerber, Barry Tillman, Marcio F. R. Resende, William M. Hammond, Chris H. Wilson, Alina Zare, Sanjeev J. Koppal
In addition, the ability to reconstruct hyperspectral data from multi-spectral input makes our device compatible to models and algorithms developed for hyperspectral applications with no modifications required.
no code implementations • 10 Jun 2024 • Yiren Song, Shijie Huang, Chen Yao, Xiaojun Ye, Hai Ci, Jiaming Liu, Yuxuan Zhang, Mike Zheng Shou
The painting process of artists is inherently stepwise and varies significantly among different painters and styles.
no code implementations • 26 Mar 2024 • Hongpeng Pan, Yang Yang, Zhongtian Fu, Yuxuan Zhang, Shian Du, Yi Xu, Xiangyang Ji
To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of the static point in the videos shot by a static camera.
no code implementations • 17 Mar 2024 • Yuxuan Zhang, Yiren Song, Jinpeng Yu, Han Pan, Zhongliang Jing
Currently, personalized image generation methods mostly require considerable time to finetune and often overfit the concept resulting in generated images that are similar to custom concepts but difficult to edit by prompts.
1 code implementation • 12 Mar 2024 • Yuxuan Zhang, Lifu Wei, Qing Zhang, Yiren Song, Jiaming Liu, Huaxia Li, Xu Tang, Yao Hu, Haibo Zhao
Current makeup transfer methods are limited to simple makeup styles, making them difficult to apply in real-world scenarios.
1 code implementation • CVPR 2024 • Yuxuan Zhang, Yiren Song, Jiaming Liu, Rui Wang, Jinpeng Yu, Hao Tang, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing
Recent advancements in subject-driven image generation have led to zero-shot generation, yet precise selection and focus on crucial subject representations remain challenging.
3 code implementations • CVPR 2024 • Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxuan Zhang, Juanzi Li, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang
People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e. g., computer or smartphone screens.
Ranked #4 on on
no code implementations • Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence Main Track 2023 • Yuxuan Zhang, Wei Yang, Shaowei Wang
In this paper, we propose a uniform network to fill both the gaps, termed FGNet.
no code implementations • 30 Mar 2023 • Yuxuan Zhang, Chao Xu, Howard H. Yang, Xijun Wang, Tony Q. S. Quek
This paper proposes a client selection (CS) method to tackle the communication bottleneck of federated learning (FL) while concurrently coping with FL's data heterogeneity issue.
no code implementations • 24 Feb 2023 • Yuxuan Zhang, Qingzhong Wang, Jiang Bian, Yi Liu, Yanwu Xu, Dejing Dou, Haoyi Xiong
Due to the high similarity between MRI data and videos, we conduct extensive empirical studies on video recognition techniques for MRI classification to answer the questions: (1) can we directly use video recognition models for MRI classification, (2) which model is more appropriate for MRI, (3) are the common tricks like data augmentation in video recognition still useful for MRI classification?
no code implementations • CVPR 2023 • Ilya Chugunov, Yuxuan Zhang, Felix Heide
Modern mobile burst photography pipelines capture and merge a short sequence of frames to recover an enhanced image, but often disregard the 3D nature of the scene they capture, treating pixel motion between images as a 2D aggregation problem.
no code implementations • 9 Dec 2022 • Yuval Bahat, Yuxuan Zhang, Hendrik Sommerhoff, Andreas Kolb, Felix Heide
This allows us to super-resolve the 3D scene representation by applying 2D convolutional networks on the 2D feature planes.
1 code implementation • 9 Dec 2022 • Meng Zhou, Xiaolan Xu, Yuxuan Zhang
Furthermore, we propose a novel fixed fusion strategy termed Softmax-based weighted strategy based on the Softmax weights and matrix nuclear norm.
no code implementations • 1 Dec 2022 • Diyu Yang, Shimin Tang, Singanallur V. Venkatakrishnan, Mohammad S. N. Chowdhury, Yuxuan Zhang, Hassina Z. Bilheux, Gregery T. Buzzard, Charles A. Bouman
Neutron computed tomography (nCT) is a 3D characterization technique used to image the internal morphology or chemical composition of samples in biology and materials sciences.
1 code implementation • 16 Dec 2021 • Yuxuan Zhang, Bo Dong, Felix Heide
Various defense methods have proposed image-to-image mapping methods, either including these perturbations in the training process or removing them in a preprocessing denoising step.
1 code implementation • CVPR 2022 • Ilya Chugunov, Yuxuan Zhang, Zhihao Xia, Xuaner, Zhang, Jiawen Chen, Felix Heide
Modern smartphones can continuously stream multi-megapixel RGB images at 60Hz, synchronized with high-quality 3D pose information and low-resolution LiDAR-driven depth estimates.
no code implementations • 14 Apr 2021 • Yutao Chen, Yuxuan Zhang, Zhongrui Huang, Zhenyao Luo, Jinpeng Chen
In this paper, we present a new large-scale dataset for hairstyle recommendation, CelebHair, based on the celebrity facial attributes dataset, CelebA.
2 code implementations • CVPR 2021 • Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler
To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts, and 32 car parts.
no code implementations • 23 Dec 2020 • Sudan Han, Linjie Yan, Yuxuan Zhang, Pia Addabbo, Chengpeng Hao, Danilo Orlando
In this paper, we address the problem of target detection in the presence of coherent (or fully correlated) signals, which can be due to multipath propagation effects or electronic attacks by smart jammers.
1 code implementation • 14 Dec 2020 • Yuxuan Zhang, Chen Gong, Dawei Li, Zhi-Wei Wang, Shengda D Pu, Alex W Robertson, Hong Yu, John Parrington
A reasonable prediction of infectious diseases transmission process under different disease control strategies is an important reference point for policy makers.
no code implementations • ICLR 2021 • Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler
Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties.
no code implementations • 16 May 2020 • Hao Ma, Xiangyu Hu, Yuxuan Zhang, Nils Thuerey, Oskar J. Haidn
For the data-driven based method, the introduction of physical equation not only is able to speed up the convergence, but also produces physically more consistent solutions.
1 code implementation • ICLR 2021 • Nils Lukas, Yuxuan Zhang, Florian Kerschbaum
We propose a fingerprinting method for deep neural network classifiers that extracts a set of inputs from the source model so that only surrogates agree with the source model on the classification of such inputs.