no code implementations • 3 Sep 2019 • Guoqing Li, Meng Zhang, Qianru Zhang, Ziyang Chen, Wenzhao Liu, Jiaojie Li, Xuzhao Shen, Jianjun Li, Zhenyu Zhu, Chau Yuen
To design more efficient lightweight convolutional neural networks, the Depthwise-Pointwise-Depthwise inverted bottleneck block (DPD block) is proposed, and DPDNet is designed by stacking DPD blocks.
no code implementations • 24 Jul 2021 • Xuetian Lai, Qiongyao Li, Ziyang Chen, Xiaopeng Shao, Jixiong Pu
It is shown that based on the trained YGAN, we can reconstruct images of two adjacent objects from one speckle pattern with high fidelity.
1 code implementation • 10 Nov 2021 • Ziyang Chen, Xixi Hu, Andrew Owens
From whirling ceiling fans to ticking clocks, the sounds that we hear subtly vary as we move through a scene.
1 code implementation • 26 Apr 2022 • Ziyang Chen, David F. Fouhey, Andrew Owens
We adapt the contrastive random walk of Jabri et al. to learn a cycle-consistent representation from unlabeled stereo sounds, resulting in a model that performs on par with supervised methods on "in the wild" internet recordings.
1 code implementation • 15 Sep 2022 • Zhangli Zhou, Shaochen Wang, Ziyang Chen, Mingyu Cai, Zhen Kan
We demonstrate that using parallel branches, as opposed to serially stacked convolutional layers, is a more powerful design for robotic visual grasping tasks.
1 code implementation • 13 Nov 2022 • Nuo Chen, Yan Wang, Haiyun Jiang, Deng Cai, Yuhan Li, Ziyang Chen, Longyue Wang, Jia Li
In this paper, we introduce the Harry Potter Dialogue (HPD) dataset, designed to advance the study of dialogue agents and character alignment.
no code implementations • CVPR 2022 • Xixi Hu, Ziyang Chen, Andrew Owens
This task requires a model to both group a sound mixture into individual sources, and to associate them with a visual signal.
no code implementations • CVPR 2023 • Chao Feng, Ziyang Chen, Andrew Owens
Manipulated videos often contain subtle inconsistencies between their visual and audio signals.
Ranked #3 on DeepFake Detection on FakeAVCeleb
2 code implementations • ICCV 2023 • Ziyang Chen, Shengyi Qian, Andrew Owens
In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources.
1 code implementation • 7 Apr 2023 • Yiwen Ye, Yutong Xie, Jianpeng Zhang, Ziyang Chen, Yong Xia
Moreover, UniSeg also beats other pre-trained models on two downstream datasets, providing the community with a high-quality pre-trained model for 3D medical image segmentation.
no code implementations • 10 Apr 2023 • Ziyang Chen, Yongsheng Pan, Yong Xia
The reconstruction alignment (RA) module uses a variational auto-encoder (VAE) to reconstruct the input image and thus boosts the image representation ability of the network in a self-supervised way.
1 code implementation • CVPR 2023 • Yuexi Du, Ziyang Chen, Justin Salamon, Bryan Russell, Andrew Owens
Second, we propose a model for generating a soundtrack for a silent input video, given a user-supplied example that specifies what the video should "sound like".
1 code implementation • 31 May 2023 • Ziyang Chen, Yongsheng Pan, Yiwen Ye, Hengfei Cui, Yong Xia
In this paper, we propose a multi-source DG method called Treasure in Distribution (TriD), which constructs an unprecedented search space to obtain the model with strong robustness by randomly sampling from a uniform distribution.
no code implementations • 14 Jul 2023 • Linkai Luo, Qiaoling Yang, Hong Peng, Yiding Wang, Ziyang Chen
We first formulate the training and parameter selection of SVC as a minimax optimization problem named MaxMin-L2-SVC-NCH, in which the minimization problem finds the closest points between two normal convex hulls (L2-SVC-NCH) while the maximization problem finds the optimal Gaussian kernel parameters.
no code implementations • 25 Sep 2023 • Qiaoling Yang, Linkai Luo, Haoyu Zhang, Hong Peng, Ziyang Chen
To address this, we propose a sample attention memory network (SAMN) that effectively combines SVM and NN by incorporating a sample attention module, class prototypes, and a memory block into the NN.
1 code implementation • 7 Nov 2023 • Dongfang Li, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Ziyang Chen, Baotian Hu, Aiguo Wu, Min Zhang
Open-domain generative systems have gained significant attention in the field of conversational AI (e.g., generative search engines).
no code implementations • 15 Nov 2023 • Ziyang Chen, Dongfang Li, Xiang Zhao, Baotian Hu, Min Zhang
In this paper, we tackle the significant challenge of temporal knowledge reasoning in Large Language Models (LLMs), an area where such models frequently encounter difficulties.
1 code implementation • 29 Nov 2023 • Yiwen Ye, Yutong Xie, Jianpeng Zhang, Ziyang Chen, Qi Wu, Yong Xia
In this paper, we reconsider versatile self-supervised learning from the perspective of continual learning and propose MedCoSS, a continuous self-supervised learning approach for multi-modal medical data.
1 code implementation • 30 Nov 2023 • Ziyang Chen, Yiwen Ye, Mengkang Lu, Yongsheng Pan, Yong Xia
Distribution shift widely exists in medical images acquired from different medical centres and poses a significant obstacle to deploying the pre-trained semantic segmentation model in real-world applications.
1 code implementation • 15 Dec 2023 • Xiangde Luo, Jia Fu, Yunxin Zhong, Shuolin Liu, Bing Han, Mehdi Astaraki, Simone Bendazzoli, Iuliana Toma-Dasu, Yiwen Ye, Ziyang Chen, Yong Xia, Yanzhou Su, Jin Ye, Junjun He, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Kaixiang Yang, Xin Fang, Zhiwei Wang, Chan Woong Lee, Sang Joon Park, Jaehee Chun, Constantin Ulrich, Klaus H. Maier-Hein, Nchongmaje Ndipenoch, Alina Miron, Yongmin Li, Yimeng Zhang, Yu Chen, Lu Bai, Jinlong Huang, Chengyang An, Lisheng Wang, Kaiwen Huang, Yunqi Gu, Tao Zhou, Mu Zhou, Shichuan Zhang, Wenjun Liao, Guotai Wang, Shaoting Zhang
The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis.
no code implementations • 31 Jan 2024 • Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong
We introduce UniTouch, a unified tactile model for vision-based touch sensors connected to multiple modalities, including vision, language, and sound.
no code implementations • 27 Mar 2024 • Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard
The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms.
2 code implementations • 10 Apr 2024 • Ziyang Chen, Wei Long, He Yao, Yongjun Zhang, Bingshu Wang, Yongbin Qin, Jia Wu
In addition, because edge variations in potential feature channels of the reconstruction error map also affect detail matching, we propose the Reconstruction Error Motif Penalty (REMP) module to further refine the full-resolution disparity estimation.
Ranked #1 on Stereo Depth Estimation on KITTI 2015