no code implementations • 10 Jan 2025 • Yifan Zhao, Jia Li, Zeyin Song, Yonghong Tian
Depicting novel classes with language descriptions by observing few-shot samples is inherent in human-learning systems.
1 code implementation • 3 Dec 2024 • Qisen Wang, Yifan Zhao, Jiawei Ma, Jia Li
However, the diffusion model, as an external prior that can directly provide visual supervision, has always underperformed in sparse-view 3D reconstruction using Score Distillation Sampling (SDS) due to the low information entropy of sparse views compared to text, leading to optimization challenges caused by mode deviation.
no code implementations • 31 Oct 2024 • Yuchen Yang, Shubham Ugare, Yifan Zhao, Gagandeep Singh, Sasa Misailovic
Traditional quantization methods have primarily concentrated on maintaining neural network accuracy, either ignoring the impact of quantization on the robustness of the network, or using only empirical techniques for improving robustness.
no code implementations • 27 Sep 2024 • Yipeng Lu, Yifan Zhao, Haiping Wang, Zhiwei Ruan, YuAn Liu, Zhen Dong, Bisheng Yang
In this study, we propose a precise pose estimation method for dashcam images, leveraging the inherent camera motion prior.
no code implementations • 5 Sep 2024 • Yifan Zhao, Zhiqian Zhang, Ziyang Xu, Zhenbin Zhang, Jose Rodriguez
Based on this, an effective current limiting strategy is proposed.
no code implementations • 3 Aug 2024 • Yunshan Qi, Jia Li, Yifan Zhao, Yu Zhang, Lin Zhu
To effectively introduce event streams into the neural volumetric representation learning process, we propose an event-enhanced blur rendering loss and an event rendering loss, which guide the network via modeling the real blur process and event generation process, respectively.
no code implementations • 20 Jun 2024 • Yunshan Qi, Lin Zhu, Yifan Zhao, Nan Bao, Jia Li
Neural Radiance Fields (NeRF) achieves impressive 3D representation learning and novel view synthesis results with high-quality multi-view images as input.
no code implementations • 20 Jun 2024 • Bin Li, Jiayan Pei, Feiyang Xiao, Yifan Zhao, Zhixing Zhang, Diwei Liu, Hengxu He, Jia Jia
OFOS platforms offer dynamic allocation incentives to users and merchants through diverse marketing campaigns to encourage payments while maintaining the platforms' budget efficiency.
no code implementations • 13 Jun 2024 • Yifan Zhao, Xiaoyu Wang, Kaibo Zhou, Xuehui Wang, Yan Wang, Wei Gao, Ruiqi Liu, Feng Shu
To meet the requirements of the network performance, a power allocation (PA) strategy is proposed and adopted in the system.
no code implementations • 15 May 2024 • Li Ma, Yifan Zhao, Peixi Peng, Yonghong Tian
Different from these methods, we propose to decouple the intrinsic attributes into two complementary features for artifact reduction, i.e., the compression-insensitive features to regularize the high-level semantic representations during training and the compression-sensitive features to be aware of the compression degree.
1 code implementation • 25 Apr 2024 • Yifan Zhao, Zhenyu Liang, Zhichao Lu, Ran Cheng
To bridge the gap, we introduce a tailored streamline to transform the task of HW-NAS for real-time semantic segmentation into standard MOPs.
1 code implementation • CVPR 2024 • Lin Zhu, Kangmin Jia, Yifan Zhao, Yunshan Qi, Lizhi Wang, Hua Huang
Spike cameras, leveraging spike-based integration sampling and high temporal resolution, offer distinct advantages over standard cameras.
no code implementations • 7 Nov 2023 • Yifan Zhao, Xuehui Wang, Yan Wang, Xianpeng Wang, Zhilin Chen, Feng Shu, Cunhua Pan, Jiangzhou Wang
Because of the slow linear convergence of iterative GA, the proposed ESMPI-GA has high complexity.
1 code implementation • 15 Jul 2023 • Cheng Chen, Yifan Zhao, Jia Li
Learning multi-label image recognition with incomplete annotation is gaining popularity due to its superior performance and significant labor savings when compared to training with fully labeled datasets.
1 code implementation • 18 Jun 2023 • Yifan Zhao, Tong Zhang, Jia Li, Yonghong Tian
Recent progress in this setting assumes that the base knowledge and novel query samples are distributed in the same domains, which is usually infeasible for realistic applications.
1 code implementation • 21 Apr 2023 • Li Ma, Peixi Peng, Guangyao Chen, Yifan Zhao, Siwei Dong, Yonghong Tian
The sensitivity of deep neural networks to compressed images hinders their usage in many real applications, which means classification networks may fail just after taking a screenshot and saving it as a compressed file.
no code implementations • 12 Apr 2023 • Yuhang Li, Jingxi Li, Yifan Zhao, Tianyi Gan, Jingtian Hu, Mona Jarrahi, Aydogan Ozcan
We demonstrate universal polarization transformers based on an engineered diffractive volume, which can synthesize a large set of arbitrarily-selected, complex-valued polarization scattering matrices between the polarization states at different positions within its input and output field-of-views (FOVs).
1 code implementation • CVPR 2023 • Zeyin Song, Yifan Zhao, Yujun Shi, Peixi Peng, Li Yuan, Yonghong Tian
However, in this work, we find that the CE loss is not ideal for the base session training as it suffers from poor class separation in terms of representations, which further degrades generalization to novel classes.
no code implementations • 16 Mar 2023 • Matevž Fabjančič, Octavian Machidon, Hashim Sharif, Yifan Zhao, Saša Misailović, Veljko Pejović
Runtime-tunable context-dependent network compression would make mobile deep learning (DL) adaptable to often varying resource availability, input "difficulty", or user needs.
no code implementations • ICCV 2023 • Yunpeng Zhai, Peixi Peng, Yifan Zhao, Yangru Huang, Yonghong Tian
Vision-based reinforcement learning (RL) depends on discriminative representation encoders to abstract the observation states.
no code implementations • ICCV 2023 • Yangru Huang, Peixi Peng, Yifan Zhao, Yunpeng Zhai, Haoran Xu, Yonghong Tian
Efficient motion and appearance modeling are critical for vision-based Reinforcement Learning (RL).
1 code implementation • 28 Dec 2022 • Yifan Zhao, Jia Li, Xiaowu Chen, Yonghong Tian
This framework, namely PArt-guided Relational Transformers (PART), is proposed to learn the discriminative part features with an automatic part discovery module, and to explore the intrinsic correlations with a feature transformation module by adapting the Transformer models from the field of natural language processing.
Ranked #8 on Fine-Grained Image Classification on FGVC Aircraft
no code implementations • 28 Dec 2022 • Yifan Zhao, Jia Li, Yonghong Tian
Fine-grained visual parsing, including fine-grained part segmentation and fine-grained object recognition, has attracted considerable critical attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies.
no code implementations • 5 Dec 2022 • Jingxi Li, Tianyi Gan, Yifan Zhao, Bijie Bai, Che-Yung Shen, Songyu Sun, Mona Jarrahi, Aydogan Ozcan
A unidirectional imager would only permit image formation along one direction, from an input field-of-view (FOV) A to an output FOV B, and in the reverse path, the image formation would be blocked.
no code implementations • Neurocomputing 2022 • Heng Wu, Yifan Zhao, Jia Li
Recent ideas propose to explore this problem in an unsupervised setting, i.e., without any labels in base classes, which reduces the heavy consumption of manual annotations.
no code implementations • 21 Jun 2022 • Deniz Mengu, Yifan Zhao, Anika Tabassum, Mona Jarrahi, Aydogan Ozcan
Permutation matrices form an important computational building block frequently used in various fields including, e.g., communications, information security, and data processing.
no code implementations • 15 Jun 2022 • Cagatay Isil, Deniz Mengu, Yifan Zhao, Anika Tabassum, Jingxi Li, Yi Luo, Mona Jarrahi, Aydogan Ozcan
We report a deep learning-enabled diffractive display design that is based on a jointly-trained pair of an electronic encoder and a diffractive optical decoder to synthesize/project super-resolved images using low-resolution wavefront modulators.
no code implementations • 26 May 2022 • Bijie Bai, Yi Luo, Tianyi Gan, Jingtian Hu, Yuhang Li, Yifan Zhao, Deniz Mengu, Mona Jarrahi, Aydogan Ozcan
Here, we demonstrate a camera design that performs class-specific imaging of target objects with instantaneous all-optical erasure of other classes of objects.
no code implementations • 10 Nov 2021 • Jiaxin Li, Yan Ding, Weizhong Zhang, Yifan Zhao, Lingxi Guo, Zhe Yang
Augmented reality technology based on image registration is becoming increasingly popular for the convenience of pre-surgery preparation and medical education.
1 code implementation • ICCV 2021 • Jiawei Zhao, Ke Yan, Yifan Zhao, Xiaowei Guo, Feiyue Huang, Jia Li
Unlike these studies, in this paper we propose a novel Transformer-based Dual Relation learning framework, constructing complementary relationships by exploring two aspects of correlation, i.e., structural relation graph and semantic relation graph.
Ranked #9 on Multi-Label Classification on PASCAL VOC 2007
1 code implementation • ACM MM 2021 • Jiawei Zhao, Yifan Zhao, Jia Li
Multi-label image recognition aims to recognize multiple objects simultaneously in one image.
Ranked #5 on Multi-Label Classification on PASCAL VOC 2007
1 code implementation • ICCV 2021 • Jiajian Zhao, Yifan Zhao, Jia Li, Ke Yan, Yonghong Tian
The crucial problem in vehicle re-identification is to find the same vehicle identity when reviewing this object from cross-view cameras, which sets a higher demand for learning viewpoint-invariant representations.
no code implementations • 8 Sep 2021 • Yifan Zhao, Jiawei Zhao, Jia Li, Xiaowu Chen
To construct our framework and achieve accurate salient detection results, we propose a Ubiquitous Target Awareness (UTA) network to solve three important challenges in the RGB-D SOD task: 1) a depth awareness module to excavate depth information and to mine ambiguous regions via adaptive depth-error weights, 2) a spatial-aware cross-modal interaction and a channel-aware cross-level interaction, exploiting the low-level boundary cues and amplifying high-level salient channels, and 3) a gated multi-scale predictor module to perceive the object saliency in different contextual scales.
Ranked #10 on Thermal Image Segmentation on RGB-T-Glass-Segmentation
1 code implementation • 8 Sep 2021 • Zhongxing Ma, Yifan Zhao, Jia Li
Therefore, we propose a Pose-guided inter- and intra-part relational transformer (Pirt) for occluded person Re-Id, which builds part-aware long-term correlations by introducing transformers.
1 code implementation • ICCV 2021 • Chenxu Zhang, Yifan Zhao, Yifei HUANG, Ming Zeng, Saifeng Ni, Madhukar Budagavi, Xiaohu Guo
In this paper, we propose a talking face generation method that takes an audio signal as input and a short target video clip as reference, and synthesizes a photo-realistic video of the target face with natural lip motions, head poses, and eye blinks that are in sync with the input audio signal.
1 code implementation • 1 Aug 2021 • Yifan Zhao, Le Hui, Jin Xie
To achieve this, we exploit the consistency between the input sparse point cloud and generated dense point cloud for the shapes and rendered images.
no code implementations • CVPR 2021 • Yifan Zhao, Ke Yan, Feiyue Huang, Jia Li
Fine-grained object recognition aims to learn effective features that can identify the subtle differences between visually similar objects.
Ranked #34 on Fine-Grained Image Classification on CUB-200-2011
no code implementations • 1 Jan 2021 • Yali Du, Yifan Zhao, Meng Fang, Jun Wang, Gangyan Xu, Haifeng Zhang
Dealing with multi-agent control in networked systems is one of the biggest challenges in Reinforcement Learning (RL), and limited success has been presented compared to recent deep reinforcement learning in the single-agent domain.
no code implementations • 15 Oct 2020 • Chi Chen, Xin Peng, Zhenchang Xing, Jun Sun, Xin Wang, Yifan Zhao, Wenyun Zhao
APIRec-CST is a deep learning model that combines the API usage with the text information in the source code based on an API Context Graph Network and a Code Token Network that simultaneously learn structural and textual features for API recommendation.
1 code implementation • 17 Sep 2020 • Xin Guo, Yifan Zhao, Jia Li
To explore the relationship between music and dance movements, we propose a cross-modal alignment module that focuses on dancing video clips, accompanied by pre-designed music, to learn a system that can judge the consistency between the visual features of pose sequences and the acoustic features of music.
1 code implementation • 10 Aug 2020 • Zeyuan Wang, Yifan Zhao, Jia Li, Yonghong Tian
Given base classes with sufficient labeled samples, the target of few-shot classification is to recognize unlabeled samples of novel classes with only a few labeled samples.
1 code implementation • 30 May 2020 • Jia-Wei Zhao, Yifan Zhao, Jia Li, Xiaowu Chen
To solve this, many recent RGBD-based networks have been proposed that adopt the depth map as an independent input and fuse its features with RGB information.
Ranked #1 on RGB-D Salient Object Detection on RGBD135
no code implementations • 23 May 2020 • Deniz Mengu, Yifan Zhao, Nezih T. Yardimci, Yair Rivenson, Mona Jarrahi, Aydogan Ozcan
By modeling the undesired layer-to-layer misalignments in 3D as continuous random variables in the optical forward model, diffractive networks are trained to maintain their inference accuracy over a large range of misalignments; we term this diffractive network design as vaccinated D2NN (v-D2NN).
no code implementations • ICCV 2019 • Yifan Zhao, Jia Li, Yu Zhang, Yonghong Tian
In this paper, we propose a joint parsing framework with boundary and semantic awareness to address this challenging problem.
1 code implementation • 31 Jul 2019 • Yi Zheng, Yifan Zhao, Mengyuan Ren, He Yan, Xiangju Lu, Junhui Liu, Jia Li
Recent years have witnessed increasing attention to cartoon media, powered by the strong demands of industrial applications.
no code implementations • 10 Apr 2019 • Kaiwen Yu, Jia Li, Yu Zhang, Yifan Zhao, Long Xu
Along with the development of virtual reality (VR), omnidirectional images play an important role in producing multimedia content with immersive experience.