Search Results for author: Yang Zhao

Found 155 papers, 48 papers with code

Deep High-Resolution Representation Learning for Visual Recognition

42 code implementations 20 Aug 2019 Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.

 Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

Dichotomous Image Segmentation Face Alignment +7

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

1 code implementation 17 Jul 2023 Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang

Our experiments show that BuboGPT achieves impressive multi-modality understanding and visual grounding abilities during interaction with humans.

Instruction Following Sentence +1

Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study

2 code implementations 10 Apr 2024 Hongru Du, Jianan Zhao, Yang Zhao, Shaochong Xu, Xihong Lin, Yiran Chen, Lauren M. Gardner, Hao, Yang

Forecasting the short-term spread of an ongoing disease outbreak is a formidable challenge due to the complexity of contributing factors, some of which can be characterized through interlinked, multi-modality variables such as epidemiological time series data, viral biology, population demographics, and the intersection of public policy and human behavior.

Representation Learning Time Series

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

1 code implementation 19 Mar 2021 Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Yingyan Lin

To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance of all the networks in the search spaces of both NAS-Bench-201 and FBNet, on six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC).

Hardware Aware Neural Architecture Search Neural Architecture Search

ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design

1 code implementation 18 Oct 2022 Haoran You, Zhanyi Sun, Huihong Shi, Zhongzhi Yu, Yang Zhao, Yongan Zhang, Chaojian Li, Baopu Li, Yingyan Lin

Specifically, on the algorithm level, ViTCoD prunes and polarizes the attention maps to have either denser or sparser fixed patterns for regularizing two levels of workloads without hurting the accuracy, largely reducing the attention computations while leaving room for alleviating the remaining dominant data movements; on top of that, we further integrate a lightweight and learnable auto-encoder module to enable trading the dominant high-cost data movements for lower-cost computations.

Federated Learning with New Knowledge: Fundamentals, Advances, and Futures

1 code implementation 3 Feb 2024 Lixu Wang, Yang Zhao, Jiahua Dong, Ating Yin, Qinbin Li, Xiao Wang, Dusit Niyato, Qi Zhu

Federated Learning (FL) is a privacy-preserving distributed learning approach that is rapidly developing in an era where privacy protection is increasingly valued.

Federated Learning Privacy Preserving

Synchronous Bidirectional Inference for Neural Sequence Generation

1 code implementation 24 Feb 2019 Jiajun Zhang, Long Zhou, Yang Zhao, Cheng-qing Zong

In this work, we propose a synchronous bidirectional inference model to generate outputs using both left-to-right and right-to-left decoding simultaneously and interactively.

Abstractive Text Summarization Machine Translation +1

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs

1 code implementation 6 Jan 2020 Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin

Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.

xFraud: Explainable Fraud Transaction Detection

1 code implementation 24 Nov 2020 Susie Xi Rao, Shuai Zhang, Zhichao Han, Zitao Zhang, Wei Min, Zhiyao Chen, Yinan Shan, Yang Zhao, Ce Zhang

At online retail platforms, it is crucial to actively detect the risks of transactions to improve customer experience and minimize financial loss.

Explainable Models Fraud Detection +1

A Survey of Federated Unlearning: A Taxonomy, Challenges and Future Directions

1 code implementation 30 Oct 2023 Yang Zhao, Jiaxi Yang, Yiling Tao, Lixu Wang, Xiaoxiao Li, Dusit Niyato

Achieving an optimal equilibrium among these facets is crucial for maintaining the effectiveness and usability of FL systems while adhering to privacy and security standards.

Federated Learning Privacy Preserving

Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers

2 code implementations 13 Dec 2023 Haifeng Huang, Zehan Wang, Rongjie Huang, Luping Liu, Xize Cheng, Yang Zhao, Tao Jin, Zhou Zhao

These tokens capture the object's attributes and spatial relationships with surrounding objects in the 3D scene.

Attribute Object +1

Unpaired Image-to-Image Translation via Latent Energy Transport

1 code implementation CVPR 2021 Yang Zhao, Changyou Chen

Instead of explicitly extracting the two codes and applying adaptive instance normalization to combine them, our latent EBM can implicitly learn to transport the source style code to the target style code while preserving the content code, an advantage over existing image translation methods.

Image Reconstruction Image-to-Image Translation +1

Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes

1 code implementation 17 Aug 2023 Zehan Wang, Haifeng Huang, Yang Zhao, Ziang Zhang, Zhou Zhao

This paper presents Chat-3D, which combines the 3D visual perceptual ability of pre-trained 3D representations and the impressive reasoning and conversation capabilities of advanced LLMs to achieve the first universal dialogue systems for 3D scenes.

Language Modelling Large Language Model +1

Structure-Aware Human-Action Generation

1 code implementation ECCV 2020 Ping Yu, Yang Zhao, Chunyuan Li, Junsong Yuan, Changyou Chen

Generating long-range skeleton-based human actions has been a challenging problem since small deviations of one frame can cause a malformed action sequence.

Action Generation graph construction +1

FaultNet: A Deep Convolutional Neural Network for bearing fault classification

1 code implementation 5 Oct 2020 Rishikesh Magar, Lalit Ghule, Junhan Li, Yang Zhao, Amir Barati Farimani

In this work, we analyze vibration signal data of mechanical systems with bearings by combining different signal processing methods and coupling them with machine learning techniques to classify different types of bearing faults.

Ranked #2 on Classification on CWRU Bearing Dataset (using extra training data)

BIG-bench Machine Learning Classification +2

IRS-Aided SWIPT: Joint Waveform, Active and Passive Beamforming Design Under Nonlinear Harvester Model

1 code implementation 10 Dec 2020 Yang Zhao, Bruno Clerckx, Zhenyuan Feng

To facilitate practical implementation, we also propose a low-complexity design based on closed-form adaptive waveform schemes.

Information Theory Signal Processing Information Theory

Learning an Efficient Multimodal Depth Completion Model

1 code implementation 23 Aug 2022 Dewang Hou, Yuanyuan Du, Kai Zhao, Yang Zhao

With the wide application of sparse ToF sensors in mobile devices, RGB image-guided sparse depth completion has attracted extensive attention recently, but still faces some problems.

Depth Completion Depth Estimation +1

Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

1 code implementation 8 Feb 2022 Yang Zhao, Hao Zhang, Xiuyuan Hu

In this paper, we propose an effective method to improve the model generalization by additionally penalizing the gradient norm of loss function during optimization.

Extending Multi-modal Contrastive Representations

1 code implementation 13 Oct 2023 Zehan Wang, Ziang Zhang, Luping Liu, Yang Zhao, Haifeng Huang, Tao Jin, Zhou Zhao

Inspired by recent C-MCR, this paper proposes Extending Multimodal Contrastive Representation (Ex-MCR), a training-efficient and paired-data-free method to flexibly learn unified contrastive representation space for more than three modalities by integrating the knowledge of existing MCR spaces.

3D Object Classification Representation Learning +1

Learning Diverse Stochastic Human-Action Generators by Learning Smooth Latent Transitions

1 code implementation AAAI 2019 Zhenyi Wang, Ping Yu, Yang Zhao, Ruiyi Zhang, Yufan Zhou, Junsong Yuan, Changyou Chen

In this paper, we focus on skeleton-based action generation and propose to model smooth and diverse transitions on a latent space of action sequences with much lower dimensionality.

Action Generation

De novo Drug Design using Reinforcement Learning with Multiple GPT Agents

1 code implementation NeurIPS 2023 Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

A central challenge in this field is to generate molecules with specific properties while also producing a wide range of diverse candidates.

reinforcement-learning

ResiDualGAN: Resize-Residual DualGAN for Cross-Domain Remote Sensing Images Semantic Segmentation

1 code implementation 27 Jan 2022 Yang Zhao, Peng Guo, Zihao Sun, Xiuwan Chen, Han Gao

The performance of a semantic segmentation model for remote sensing (RS) images pretrained on an annotated dataset would greatly decrease when testing on another unannotated dataset because of the domain gap.

Image-to-Image Translation Semantic Segmentation +2

Depth-Assisted ResiDualGAN for Cross-Domain Aerial Images Semantic Segmentation

1 code implementation 21 Aug 2022 Yang Zhao, Peng Guo, Han Gao, Xiuwan Chen

Generative methods are a common approach to minimizing the domain gap of aerial images, which improves the performance of downstream tasks, e.g., cross-domain semantic segmentation.

Segmentation Semantic Segmentation +1

BenchTemp: A General Benchmark for Evaluating Temporal Graph Neural Networks

1 code implementation 31 Aug 2023 Qiang Huang, Jiawei Jiang, Xi Susie Rao, Ce Zhang, Zhichao Han, Zitao Zhang, Xin Wang, Yongjun He, Quanqing Xu, Yang Zhao, Chuang Hu, Shuo Shang, Bo Du

To handle graphs in which features or connectivities are evolving over time, a series of temporal graph neural networks (TGNNs) have been proposed.

Link Prediction Node Classification

FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training

1 code implementation NeurIPS 2020 Yonggan Fu, Haoran You, Yang Zhao, Yue Wang, Chaojian Li, Kailash Gopalakrishnan, Zhangyang Wang, Yingyan Lin

Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at the edge and the required massive training costs for state-of-the-art (SOTA) DNNs.

Quantization

Modelling graph dynamics in fraud detection with "Attention"

1 code implementation 22 Apr 2022 Susie Xi Rao, Clémence Lanfranchi, Shuai Zhang, Zhichao Han, Zitao Zhang, Wei Min, Mo Cheng, Yinan Shan, Yang Zhao, Ce Zhang

At online retail platforms, detecting fraudulent accounts and transactions is crucial to improve customer experience, minimize loss, and avoid unauthorized transactions.

Fraud Detection

Bayesian Meta Sampling for Fast Uncertainty Adaptation

1 code implementation ICLR 2020 Zhenyi Wang, Yang Zhao, Ping Yu, Ruiyi Zhang, Changyou Chen

Specifically, we propose a Bayesian meta sampling framework consisting of two main components: a meta sampler and a sample adapter.

Meta-Learning

Video-Guided Curriculum Learning for Spoken Video Grounding

1 code implementation 1 Sep 2022 Yan Xia, Zhou Zhao, Shangwei Ye, Yang Zhao, Haoyuan Li, Yi Ren

To rectify the discriminative phonemes and extract video-related information from noisy audio, we develop a novel video-guided curriculum learning (VGCL) during the audio pre-training process, which can make use of the vital visual perceptions to help understand the spoken language and suppress the external noise.

Video Grounding

Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems

1 code implementation COLING 2020 Vitou Phy, Yang Zhao, Akiko Aizawa

For instance, specificity is mandatory in a food-ordering dialogue task, whereas fluency is preferred in a language-teaching dialogue system.

Dialogue Evaluation Semantic Similarity +2

NASA: Neural Architecture Search and Acceleration for Hardware Inspired Hybrid Networks

2 code implementations 24 Oct 2022 Huihong Shi, Haoran You, Yang Zhao, Zhongfeng Wang, Yingyan Lin

Multiplication is arguably the most cost-dominant operation in modern deep neural networks (DNNs), limiting their achievable efficiency and thus more extensive deployment in resource-constrained applications.

Neural Architecture Search

EAR-NET: Error Attention Refining Network For Retinal Vessel Segmentation

1 code implementation 3 Jul 2021 Jun Wang, Yang Zhao, Linglong Qian, Xiaohan Yu, Yongsheng Gao

The precise detection of blood vessels in retinal images is crucial to the early diagnosis of retinal vascular diseases, e.g., diabetic, hypertensive and solar retinopathies.

Retinal Vessel Segmentation Segmentation +1

Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding

1 code implementation ICCV 2023 Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao

To accomplish this, we design a novel semantic matching model that analyzes the semantic similarity between object proposals and sentences in a coarse-to-fine manner.

Object Semantic Similarity +3

Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance

1 code implementation ICCV 2021 Xiaohan Yu, Yang Zhao, Yongsheng Gao, Xiaohui Yuan, Shengwu Xiong

The proposed UFG image dataset and evaluation protocols are intended to serve as a benchmark platform that can advance research of visual classification from approaching human performance to beyond human ability, via facilitating benchmark data of artificial intelligence (AI) not to be limited by the labels of human intelligence (HI).

Fine-Grained Visual Categorization

Patchy Image Structure Classification Using Multi-Orientation Region Transform

1 code implementation 2 Dec 2019 Xiaohan Yu, Yang Zhao, Yongsheng Gao, Shengwu Xiong, Xiaohui Yuan

To address the above limitations, this paper proposes a novel Multi-Orientation Region Transform (MORT), which can effectively characterize both contour and structure features simultaneously, for patchy image structure classification.

Classification General Classification

E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation

1 code implementation 9 May 2023 Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong

Furthermore, the ablation studies verify the generalization of our method, where the proposed modal adapter effectively bridges various OCR and MT models.

Machine Translation Optical Character Recognition +2

SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training

1 code implementation 4 Jan 2021 Xiaohan Chen, Yang Zhao, Yue Wang, Pengfei Xu, Haoran You, Chaojian Li, Yonggan Fu, Yingyan Lin, Zhangyang Wang

Results show that: 1) applied to inference, SD achieves up to 2.44x energy efficiency as evaluated via real hardware implementations; 2) applied to training, SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.

Multi-Scale Context Aggregation Network with Attention-Guided for Crowd Counting

1 code implementation 6 Apr 2021 Xin Wang, Yang Zhao, Tangwen Yang, Qiuqi Ruan

In this paper, we propose a multi-scale context aggregation network (MSCANet) based on single-column encoder-decoder architecture for crowd counting, which consists of an encoder based on a dense context-aware module (DCAM) and a hierarchical attention-guided decoder.

Crowd Counting

Improving End-to-End Text Image Translation From the Auxiliary Text Translation Task

1 code implementation 8 Oct 2022 Cong Ma, Yaping Zhang, Mei Tu, Xu Han, Linghui Wu, Yang Zhao, Yu Zhou

End-to-end text image translation (TIT), which aims at translating the source language embedded in images to the target language, has attracted intensive attention in recent research.

Multi-Task Learning Translation

Dilation-Erosion for Single-Frame Supervised Temporal Action Localization

1 code implementation 13 Dec 2022 Bin Wang, Yan Song, Fanming Wang, Yang Zhao, Xiangbo Shu, Yan Rui

To balance the annotation labor and the granularity of supervision, single-frame annotation has been introduced in temporal action localization.

Temporal Action Localization

Harmonizing Base and Novel Classes: A Class-Contrastive Approach for Generalized Few-Shot Segmentation

1 code implementation 24 Mar 2023 Weide Liu, Zhonghua Wu, Yang Zhao, Yuming Fang, Chuan-Sheng Foo, Jun Cheng, Guosheng Lin

Current methods for few-shot segmentation (FSSeg) have mainly focused on improving the performance of novel classes while neglecting the performance of base classes.

Generalized Few-Shot Semantic Segmentation Segmentation +1

CoRTEx: Contrastive Learning for Representing Terms via Explanations with Applications on Constructing Biomedical Knowledge Graphs

1 code implementation 13 Dec 2023 Huaiyuan Ying, Zhengyun Zhao, Yang Zhao, Sihang Zeng, Sheng Yu

Due to a lack of knowledge, previous contrastive learning models trained with Unified Medical Language System (UMLS) synonyms struggle at clustering difficult terms and do not generalize well beyond UMLS terms.

Clustering Contrastive Learning +2

Empirical Evidence for the Fragment level Understanding on Drug Molecular Structure of LLMs

1 code implementation 15 Jan 2024 Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

AI for drug discovery has been a research hotspot in recent years, and SMILES-based language models have been increasingly applied in drug molecular design.

Drug Discovery

Phrase Table as Recommendation Memory for Neural Machine Translation

no code implementations 25 May 2018 Yang Zhao, Yining Wang, Jiajun Zhang, Cheng-qing Zong

Neural Machine Translation (NMT) has drawn much attention due to its promising translation performance recently.

Machine Translation NMT +2

Multispectral Image Intrinsic Decomposition via Low Rank Constraint

no code implementations 24 Feb 2018 Qian Huang, Weixin Zhu, Yang Zhao, Linsen Chen, Yao Wang, Tao Yue, Xun Cao

In this paper, a Low Rank Multispectral Image Intrinsic Decomposition model (LRIID) is presented to decompose the shading and reflectance from a single multispectral image.

Towards Neural Machine Translation with Partially Aligned Corpora

no code implementations IJCNLP 2017 Yining Wang, Yang Zhao, Jiajun Zhang, Cheng-qing Zong, Zhengshan Xue

While neural machine translation (NMT) has become the new paradigm, the parameter optimization requires large-scale parallel data which is scarce in many domains and language pairs.

Machine Translation NMT +2

GUN: Gradual Upsampling Network for Single Image Super-Resolution

no code implementations 13 Mar 2017 Yang Zhao, Guoqing Li, Wenjun Xie, Wei Jia, Hai Min, Xiaoping Liu

The GUN consists of an input layer, multiple upsampling and convolutional layers, and an output layer.

Image Super-Resolution

Parallel Spectral Clustering Algorithm Based on Hadoop

no code implementations 31 May 2015 Yajun Cui, Yang Zhao, Kafei Xiao, Chenglong Zhang, Lei Wang

Spectral clustering and cloud computing are emerging branches of computer science and related disciplines.

Cloud Computing Clustering

Random Occlusion-recovery for Person Re-identification

no code implementations 26 Sep 2018 Di Wu, Kun Zhang, Fei Cheng, Yang Zhao, Qi Liu, Chang-An Yuan, De-Shuang Huang

As a basic task of multi-camera surveillance systems, person re-identification aims to re-identify a query pedestrian observed from multiple non-overlapping cameras or at different times with a single camera.

Generative Adversarial Network Person Re-Identification

Variance Reduction in Stochastic Particle-Optimization Sampling

no code implementations ICML 2020 Jianyi Zhang, Yang Zhao, Changyou Chen

Stochastic particle-optimization sampling (SPOS) is a recently-developed scalable Bayesian sampling framework that unifies stochastic gradient MCMC (SG-MCMC) and Stein variational gradient descent (SVGD) algorithms based on Wasserstein gradient flows.

POS

Self-Adversarially Learned Bayesian Sampling

no code implementations 21 Nov 2018 Yang Zhao, Jianyi Zhang, Changyou Chen

Scalable Bayesian sampling plays an important role in modern machine learning, especially in fast-developing unsupervised (deep) learning models.

Self-Learning

Addressing Troublesome Words in Neural Machine Translation

no code implementations EMNLP 2018 Yang Zhao, Jiajun Zhang, Zhongjun He, Cheng-qing Zong, Hua Wu

One of the weaknesses of Neural Machine Translation (NMT) is in handling low-frequency and ambiguous words, which we refer to as troublesome words.

Machine Translation NMT +1

A Language Model based Evaluator for Sentence Compression

no code implementations ACL 2018 Yang Zhao, Zhiyuan Luo, Akiko Aizawa

We herein present a language-model-based evaluator for deletion-based sentence compression and view this task as a series of deletion-and-evaluation operations using the evaluator.

Language Modelling reinforcement-learning +3

Multispectral Image Intrinsic Decomposition via Subspace Constraint

no code implementations CVPR 2018 Qian Huang, Weixin Zhu, Yang Zhao, Linsen Chen, Yao Wang, Tao Yue, Xun Cao

In this paper, a new Multispectral Image Intrinsic Decomposition model (MIID) is presented to decompose the shading and reflectance from a single multispectral image.

Privacy-Preserving Blockchain-Based Federated Learning for IoT Devices

no code implementations 26 Jun 2019 Yang Zhao, Jun Zhao, Linshan Jiang, Rui Tan, Dusit Niyato, Zengxiang Li, Lingjuan Lyu, Yingbo Liu

To help manufacturers develop a smart home system, we design a federated learning (FL) system leveraging the reputation mechanism to assist home appliance manufacturers to train a machine learning model based on customers' data.

Edge-computing Federated Learning +1

Unsupervised Rewriter for Multi-Sentence Compression

no code implementations ACL 2019 Yang Zhao, Xiaoyu Shen, Wei Bi, Akiko Aizawa

First, the word graph approach that simply concatenates fragments from multiple sentences may yield non-fluent or ungrammatical compression.

Sentence Sentence Compression

MobileFAN: Transferring Deep Hidden Representation for Face Alignment

no code implementations 11 Aug 2019 Yang Zhao, Yifan Liu, Chunhua Shen, Yongsheng Gao, Shengwu Xiong

To this end, we propose an effective lightweight model, namely Mobile Face Alignment Network (MobileFAN), using a simple backbone MobileNetV2 as the encoder and three deconvolutional layers as the decoder.

Face Alignment Facial Landmark Detection

DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition

no code implementations 10 Oct 2019 Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang, Zhichao Han, Bo Li, Wei Min, Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yujing Wang, Bin Cui, Ce Zhang

Despite the wide application of Graph Convolutional Network (GCN), one major limitation is that it does not benefit from the increasing depth and suffers from the oversmoothing problem.

E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings

no code implementations NeurIPS 2019 Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Yingyan Lin, Zhangyang Wang

Extensive simulations and ablation studies, with real energy measurements from an FPGA board, confirm the superiority of our proposed strategies and demonstrate remarkable energy savings for training.

A Deep Gradient Boosting Network for Optic Disc and Cup Segmentation

no code implementations 5 Nov 2019 Qing Liu, Beiji Zou, Yang Zhao, Yixiong Liang

To build connections among prediction branches, this paper introduces the gradient boosting framework to deep classification models and proposes a gradient boosting network called BoostNet.

Segmentation

DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architectures

no code implementations 26 Feb 2020 Yang Zhao, Chaojian Li, Yue Wang, Pengfei Xu, Yongan Zhang, Yingyan Lin

The recent breakthroughs in deep neural networks (DNNs) have spurred a tremendously increased demand for DNN accelerators.

A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision

no code implementations 2 Mar 2020 Hongjie Wang, Yang Zhao, Chaojian Li, Yue Wang, Yingyan Lin

The excellent performance of modern deep neural networks (DNNs) comes at an often prohibitive training cost, limiting the rapid development of DNN innovations and raising various environmental concerns.

Efficient Neural Network

Dual-discriminator GAN: A GAN way of profile face recognition

no code implementations 20 Mar 2020 Xin-Yu Zhang, Yang Zhao, Hao Zhang

Many angle-related problems arise in facial recognition: at present, in many cases the feature extraction network produces eigenvectors that differ greatly between the frontal face and the profile face of the same person.

Face Recognition Generative Adversarial Network

Local Differential Privacy based Federated Learning for Internet of Things

no code implementations 19 Apr 2020 Yang Zhao, Jun Zhao, Mengmeng Yang, Teng Wang, Ning Wang, Lingjuan Lyu, Dusit Niyato, Kwok-Yan Lam

To avoid the privacy threat and reduce the communication cost, in this paper, we propose to integrate federated learning and local differential privacy (LDP) to help crowdsourcing applications obtain the machine learning model.

BIG-bench Machine Learning Federated Learning +1

TIMELY: Pushing Data Movements and Interfaces in PIM Accelerators Towards Local and in Time Domain

no code implementations 3 May 2020 Weitao Li, Pengfei Xu, Yang Zhao, Haitong Li, Yuan Xie, Yingyan Lin

Resistive-random-access-memory (ReRAM) based processing-in-memory (R$^2$PIM) accelerators show promise in bridging the gap between Internet of Things devices' constrained resources and Convolutional/Deep Neural Networks' (CNNs/DNNs') prohibitive energy cost.

SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation

no code implementations 7 May 2020 Yang Zhao, Xiaohan Chen, Yue Wang, Chaojian Li, Haoran You, Yonggan Fu, Yuan Xie, Zhangyang Wang, Yingyan Lin

We present SmartExchange, an algorithm-hardware co-design framework to trade higher-cost memory storage/access for lower-cost computation, for energy-efficient inference of deep neural networks (DNNs).

Model Compression Quantization

Attacks to Federated Learning: Responsive Web User Interface to Recover Training Data from User Gradients

no code implementations 8 Jun 2020 Hans Albert Lianto, Yang Zhao, Jun Zhao

In a case where the aggregator is untrusted and LDP is not applied to each user gradient, the aggregator can recover sensitive user data from these gradients.

Federated Learning

A Flexible Recurrent Residual Pyramid Network for Video Frame Interpolation

no code implementations ECCV 2020 Haoxian Zhang, Yang Zhao, Ronggang Wang

Inspired by classical pyramid energy minimization optical flow algorithms, this paper proposes a recurrent residual pyramid network (RRPN) for video frame interpolation.

Optical Flow Estimation Video Frame Interpolation

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

no code implementations ICLR 2021 Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Cong Hao, Yingyan Lin

To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance (e.g., energy cost and latency) of all the networks in the search space of both NAS-Bench-201 and FBNet, considering six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC).

Hardware Aware Neural Architecture Search Neural Architecture Search

Learning Energy-Based Generative Models via Coarse-to-Fine Expanding and Sampling

no code implementations ICLR 2021 Yang Zhao, Jianwen Xie, Ping Li

Energy-based models (EBMs) for generative modeling parametrize a single net and can be directly trained by maximum likelihood estimation.

Translation Unsupervised Image-To-Image Translation

Role Taxonomy of Units in Deep Neural Networks

no code implementations 2 Nov 2020 Yang Zhao, Hao Zhang, Xiuyuan Hu

Identifying the role of network units in deep neural networks (DNNs) is critical in many respects, including understanding the mechanisms of DNNs and building basic connections between deep learning and neuroscience.

Retrieval Topological Data Analysis

Rethinking deinterlacing for early interlaced videos

no code implementations 27 Nov 2020 Yang Zhao, Wei Jia, Ronggang Wang

Traditional deinterlacing approaches are mainly focused on early interlacing scanning systems and thus cannot handle the complex artifacts in real-world early interlaced videos.

Image Restoration

ReMP: Rectified Metric Propagation for Few-Shot Learning

no code implementations 2 Dec 2020 Yang Zhao, Chunyuan Li, Ping Yu, Changyou Chen

Few-shot learning features the capability of generalizing from a few examples.

Few-Shot Learning

Suspicious Massive Registration Detection via Dynamic Heterogeneous Graph Neural Networks

no code implementations 20 Dec 2020 Susie Xi Rao, Shuai Zhang, Zhichao Han, Zitao Zhang, Wei Min, Mo Cheng, Yinan Shan, Yang Zhao, Ce Zhang

Massive account registration has raised concerns on risk management in e-commerce companies, especially when registration increases rapidly within a short time frame.

Management

Waveform and Beamforming Design for Intelligent Reflecting Surface Aided Wireless Power Transfer: Single-User and Multi-User Solutions

no code implementations 7 Jan 2021 Zhenyuan Feng, Bruno Clerckx, Yang Zhao

This paper highlights the fact that IRS can provide an extra passive beamforming gain on output DC power over conventional WPT designs and significantly influence the waveform design by leveraging the benefit of passive beamforming, frequency diversity and energy harvester nonlinearity.

Information Theory Signal Processing Information Theory

SDA: Improving Text Generation with Self Data Augmentation

no code implementations 2 Jan 2021 Ping Yu, Ruiyi Zhang, Yang Zhao, Yizhe Zhang, Chunyuan Li, Changyou Chen

Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision.

Data Augmentation Imitation Learning +2

Interaction between optical pulse and tumor using finite element analysis

no code implementations 19 Jan 2021 Xianlin Song, Ao Teng, Jianshuang Wei, Hao Chen, Yang Zhao, Jianheng Chen, Fangwei Liu, Qianxiang Wan, Guoning Huang, Lingfang Song, Aojie Zhao, Bo Li, Zihao Li, Qiming He, Jinhong Zhang

As a non-destructive biological tissue imaging technology, photoacoustic imaging has important application value in the field of biomedicine.

Biological Physics

Quantitative Performance Assessment of CNN Units via Topological Entropy Calculation

no code implementations ICLR 2022 Yang Zhao, Hao Zhang

We show that investigating the feature entropy of units on training data alone can discriminate between networks with different generalization ability, from the view of the effectiveness of feature representations.

General Classification Image Classification

Super-Resolving Compressed Video in Coding Chain

no code implementations 26 Mar 2021 Dewang Hou, Yang Zhao, Yuyao Ye, Jiayu Yang, Jian Zhang, Ronggang Wang

Scaling and lossy coding are widely used in video transmission and storage.

Estimating the Generalization in Deep Neural Networks via Sparsity

no code implementations 2 Apr 2021 Yang Zhao, Hao Zhang

By training DNNs with a wide range of generalization gaps on popular datasets, we show that our key quantities and linear model could be efficient tools for estimating the generalization gap of DNNs.

Image Classification

A Comprehensive Survey of 6G Wireless Communications

no code implementations 21 Dec 2020 Yang Zhao, Wenchao Zhai, Jun Zhao, Tinghao Zhang, Sumei Sun, Dusit Niyato, Kwok-Yan Lam

First, we give an overview of 6G from the perspectives of technologies, security and privacy, and applications.

Cascaded Prediction Network via Segment Tree for Temporal Video Grounding

no code implementations CVPR 2021 Yang Zhao, Zhou Zhao, Zhu Zhang, Zhijie Lin

Temporal video grounding aims to localize the target segment which is semantically aligned with the given sentence in an untrimmed video.

Sentence Video Grounding

2-in-1 Accelerator: Enabling Random Precision Switch for Winning Both Adversarial Robustness and Efficiency

no code implementations 11 Sep 2021 Yonggan Fu, Yang Zhao, Qixuan Yu, Chaojian Li, Yingyan Lin

The recent breakthroughs of deep neural networks (DNNs) and the advent of billions of Internet of Things (IoT) devices have excited an explosive demand for intelligent IoT devices equipped with domain-specific DNN accelerators.

Adversarial Robustness Quantization

Knowledge Graph Enhanced Neural Machine Translation via Multi-task Learning on Sub-entity Granularity

no code implementations COLING 2020 Yang Zhao, Lu Xiang, Junnan Zhu, Jiajun Zhang, Yu Zhou, Chengqing Zong

Previous studies combining knowledge graph (KG) with neural machine translation (NMT) have two problems: i) Knowledge under-utilization: they only focus on the entities that appear in both KG and training sentence pairs, making much knowledge in KG unable to be fully utilized.

Machine Translation Multi-Task Learning +3

Multi-frame Joint Enhancement for Early Interlaced Videos

no code implementations 29 Sep 2021 Yang Zhao, Yanbo Ma, Yuan Chen, Wei Jia, Ronggang Wang, Xiaoping Liu

Early interlaced videos usually contain multiple interlacing and complex compression artifacts, which significantly reduce the visual quality.

Video Deinterlacing Video Reconstruction

D$^2$-GCN: Data-Dependent GCNs for Boosting Both Efficiency and Scalability

no code implementations 29 Sep 2021 Chaojian Li, Xu Ouyang, Yang Zhao, Haoran You, Yonggan Fu, Yuchen Gu, Haonan Liu, Siyuan Miao, Yingyan Lin

Graph Convolutional Networks (GCNs) have gained increasing attention thanks to their state-of-the-art (SOTA) performance in graph-based learning tasks.

Rethinking Deep Face Restoration

no code implementations CVPR 2022 Yang Zhao, Yu-Chuan Su, Chun-Te Chu, Yandong Li, Marius Renn, Yukun Zhu, Changyou Chen, Xuhui Jia

While existing approaches for face restoration make significant progress in generating high-quality faces, they often fail to preserve facial features and cannot authentically reconstruct the faces.

Face Generation Face Reconstruction

TransTCN: An Attention-based TCN Framework for Sequential Modeling

no code implementations 29 Sep 2021 Yuan Chai, Liang He, Yang Zhao, Xueyan Li, Zhenxin Wang

The model was evaluated across a wide range of time series tasks that are commonly used to benchmark TCNs and recurrent networks.

Language Modelling Time Series Analysis

Rethinking Sentiment Style Transfer

no code implementations Findings (EMNLP) 2021 Ping Yu, Yang Zhao, Chunyuan Li, Changyou Chen

To overcome this issue, we propose a graph-based method to extract attribute content and attribute-independent content from input sentences in the YELP dataset and IMDB dataset.

Attribute Style Transfer +1

EnergyNet: Energy-Efficient Dynamic Inference

no code implementations NIPS Workshop CDNNRIA 2018 Yue Wang, Tan Nguyen, Yang Zhao, Zhangyang Wang, Yingyan Lin, Richard Baraniuk

The prohibitive energy cost of running high-performance Convolutional Neural Networks (CNNs) has been limiting their deployment on resource-constrained platforms including mobile and wearable devices.

Equivalence between algorithmic instability and transition to replica symmetry breaking in perceptron learning systems

no code implementations 26 Nov 2021 Yang Zhao, Junbin Qiu, Mingshan Xie, Haiping Huang

The binary perceptron is a fundamental model of supervised learning with non-convex optimization, and it lies at the root of today's popular deep learning.

Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks

no code implementations 16 Jan 2022 Yang Zhao, Hao Zhang

NRS leverages the finding that models would benefit from converging to flat minima, and tries to regularize the neighborhood region in weight space to yield approximate outputs.

Image Classification

Spatio-temporal Gait Feature with Global Distance Alignment

no code implementations 7 Mar 2022 Yifan Chen, Yang Zhao, Xuelong Li

In this paper, we try to enhance the discrimination of spatio-temporal gait features from two aspects: effective extraction of spatio-temporal gait features and reasonable refinement of extracted features.

Gait Recognition

Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning

no code implementations 18 Mar 2022 Yang Zhao, Hao Zhang, Xiuyuan Hu

Optimizers in RST would perform a Bernoulli trial at each iteration to choose randomly from base algorithms (SGD) and sharpness-aware algorithms (SAM) with a probability arranged by a predefined scheduling function.

Computational Efficiency Scheduling

Indoor simultaneous localization and mapping based on fringe projection profilometry

no code implementations 23 Apr 2022 Yang Zhao, Kai Zhang, Haotian Yu, Yi Zhang, Dongliang Zheng, Jing Han

Simultaneous Localization and Mapping (SLAM) plays an important role in outdoor and indoor applications ranging from autonomous driving to indoor robotics.

Autonomous Driving Simultaneous Localization and Mapping

Age Minimization in Outdoor and Indoor Communications with Relay-aided Dual RIS

no code implementations 6 May 2022 Wanting Lyu, Yue Xiu, Yang Zhao, Chadi Assi, Zhongpei Zhang

In this paper, we investigate an outdoor and indoor wireless communication network with the assistance of a novel relay-aided double-sided reconfigurable intelligent surface (RIS).

Scheduling

BRIGHT -- Graph Neural Networks in Real-Time Fraud Detection

no code implementations 25 May 2022 Mingxuan Lu, Zhichao Han, Susie Xi Rao, Zitao Zhang, Yang Zhao, Yinan Shan, Ramesh Raghunathan, Ce Zhang, Jiawei Jiang

Apart from rule-based and machine learning filters that are already deployed in production, we want to enable efficient real-time inference with graph neural networks (GNNs), which is useful to catch multihop risk propagation in a transaction graph.

Entity Embeddings Fraud Detection

AntPivot: Livestream Highlight Detection via Hierarchical Attention Mechanism

no code implementations 10 Jun 2022 Yang Zhao, Xuan Lin, Wenqiang Xu, Maozong Zheng, Zhengyong Liu, Zhou Zhao

Recently, streaming technology has greatly promoted the development of the livestream field.

Highlight Detection

Turning to a Teacher for Timestamp Supervised Temporal Action Segmentation

no code implementations 2 Jul 2022 Yang Zhao, Yan Song

To obtain more information to optimize the model, the existing method generated pseudo frame-wise labels iteratively based on the output of a segmentation model and the timestamp annotations.

Action Segmentation Model Optimization +1

A Versatile Adaptive Curriculum Learning Framework for Task-oriented Dialogue Policy Learning

no code implementations Findings (NAACL) 2022 Yang Zhao, Hua Qin, Wang Zhenyu, Changxi Zhu, Shihan Wang

It supports evaluating the difficulty of dialogue tasks using only the learning experiences of the dialogue policy, and skip-level selection according to its learning needs to maximize learning efficiency.

AntCritic: Argument Mining for Free-Form and Visually-Rich Financial Comments

no code implementations 20 Aug 2022 Yang Zhao, Wenqiang Xu, Xuan Lin, Jingjing Huo, Hong Chen, Zhou Zhao

The task of argument mining aims to detect all possible argumentative components and identify their relationships automatically.

Argument Mining

e-G2C: A 0.14-to-8.31 $μ$J/Inference NN-based Processor with Continuous On-chip Adaptation for Anomaly Detection and ECG Conversion from EGM

no code implementations 24 Jul 2022 Yang Zhao, Yongan Zhang, Yonggan Fu, Xu Ouyang, Cheng Wan, Shang Wu, Anton Banta, Mathews M. John, Allison Post, Mehdi Razavi, Joseph Cavallaro, Behnaam Aazhang, Yingyan Lin

This work presents the first silicon-validated dedicated EGM-to-ECG (G2C) processor, dubbed e-G2C, featuring continuous lightweight anomaly detection, event-driven coarse/precise conversion, and on-chip adaptation.

Anomaly Detection

A Simple Yet Effective Corpus Construction Method for Chinese Sentence Compression

no code implementations LREC 2022 Yang Zhao, Hiroshi Kanayama, Issei Yoshida, Masayasu Muraoka, Akiko Aizawa

To remedy this shortcoming, we present a dependency-tree-based method to construct a Chinese corpus with 151k pairs of sentences and their compressions based on Chinese language-specific characteristics.

Sentence Sentence Compression

CoopHash: Cooperative Learning of Multipurpose Descriptor and Contrastive Pair Generator via Variational MCMC Teaching for Supervised Image Hashing

no code implementations 9 Oct 2022 Khoa D. Doan, Jianwen Xie, Yaxuan Zhu, Yang Zhao, Ping Li

Leveraging supervised information can lead to superior retrieval performance in the image hashing domain but the performance degrades significantly without enough labeled data.

Retrieval

Behavioral graph fraud detection in E-commerce

no code implementations 13 Oct 2022 Hang Yin, Zitao Zhang, Zhurong Wang, Yilmazcan Ozyurt, Weiming Liang, Wenyu Dong, Yang Zhao, Yinan Shan

Our experiments show that embedding features learned from a similarity-based behavioral graph achieve a significant performance increase over the baseline fraud detection model in various business scenarios.

Fraud Detection graph construction +1

Boosting Semi-Supervised 3D Object Detection with Semi-Sampling

no code implementations 14 Nov 2022 Xiaopei Wu, Yang Zhao, Liang Peng, Hua Chen, Xiaoshui Huang, Binbin Lin, Haifeng Liu, Deng Cai, Wanli Ouyang

When training a teacher-student semi-supervised framework, we randomly select gt samples and pseudo samples to both labeled frames and unlabeled frames, making a strong data augmentation for them.

3D Object Detection Data Augmentation +2

Stereo Image Rain Removal via Dual-View Mutual Attention

no code implementations 18 Nov 2022 Yanyan Wei, Zhao Zhang, ZhongQiu Zhao, Yang Zhao, Richang Hong, Yi Yang

Stereo images, containing left and right view images with disparity, have recently been utilized in solving low-vision tasks, e.g., rain removal and super-resolution.

Disparity Estimation Image Restoration +2

Deformation measurement of a soil mixing retaining wall using terrestrial laser scanning

no code implementations 12 Jan 2023 Yang Zhao, Lei Fan, Hyungjoon Seo

Retaining walls are often built to prevent excessive lateral movements of the ground surrounding an excavation site.

CoopInit: Initializing Generative Adversarial Networks via Cooperative Learning

no code implementations 21 Mar 2023 Yang Zhao, Jianwen Xie, Ping Li

The proposed algorithm consists of two learning stages: (i) Cooperative initialization stage: The discriminator of GAN is treated as an energy-based model (EBM) and is optimized via maximum likelihood estimation (MLE), with the help of the GAN's generator to provide synthetic data to approximate the learning gradients.

Image-to-Image Translation

Identity Encoder for Personalized Diffusion

no code implementations 14 Apr 2023 Yu-Chuan Su, Kelvin C. K. Chan, Yandong Li, Yang Zhao, Han Zhang, Boqing Gong, Huisheng Wang, Xuhui Jia

Our approach greatly reduces the overhead for personalized image generation and is more applicable in many potential applications.

Image Enhancement Image Generation

BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

no code implementations 1 May 2023 Ziyang Zhang, Huan Li, Yang Zhao, Changyao Lin, Jie Liu

As deep neural networks (DNNs) are being applied to a wide range of edge intelligent applications, it is critical for edge inference platforms to have both high-throughput and low-latency at the same time.

Scheduling

Multi-Teacher Knowledge Distillation For Text Image Machine Translation

no code implementations 9 May 2023 Cong Ma, Yaping Zhang, Mei Tu, Yang Zhao, Yu Zhou, Chengqing Zong

Text image machine translation (TIMT), which translates source-language text in images into target-language sentences, has been widely used in various real-world applications.

Knowledge Distillation Machine Translation +2

Instant-NeRF: Instant On-Device Neural Radiance Field Training via Algorithm-Accelerator Co-Designed Near-Memory Processing

no code implementations 9 May 2023 Yang Zhao, Shang Wu, Jingqun Zhang, Sixu Li, Chaojian Li, Yingyan Lin

Instant on-device Neural Radiance Fields (NeRFs) are in growing demand for unleashing the promise of immersive AR/VR experiences, but are still limited by their prohibitive training time.

Pink-Eggs Dataset V1: A Step Toward Invasive Species Management Using Deep Learning Embedded Solutions

no code implementations 16 May 2023 Di Xu, Yang Zhao, Xiang Hao, Xin Meng

We introduce a novel dataset consisting of images depicting pink eggs that have been identified as Pomacea canaliculata eggs, accompanied by corresponding bounding box annotations.

Management

Revisiting the Stack-Based Inverse Tone Mapping

no code implementations CVPR 2023 Ning Zhang, Yuyao Ye, Yang Zhao, Ronggang Wang

In this paper, we revisit the stack-based ITM approaches and propose a novel method to reconstruct HDR radiance from a single image, which only needs to estimate two exposure images.

inverse tone mapping Inverse-Tone-Mapping +1

CLIP3Dstyler: Language Guided 3D Arbitrary Neural Style Transfer

no code implementations 25 May 2023 Ming Gao, Yanwu Xu, Yang Zhao, Tingbo Hou, Chenkai Zhao, Mingming Gong

In this paper, we propose a novel language-guided 3D arbitrary neural style transfer method (CLIP3Dstyler).

Style Transfer

DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference

no code implementations 2 Jun 2023 Ziyang Zhang, Yang Zhao, Huan Li, Changyao Lin, Jie Liu

Due to limited resources on edge and different characteristics of deep neural network (DNN) models, it is a big challenge to optimize DNN inference performance in terms of energy consumption and end-to-end latency on edge devices.

Collaborative Inference

Towards Authentic Face Restoration with Iterative Diffusion Models and Beyond

no code implementations ICCV 2023 Yang Zhao, Tingbo Hou, Yu-Chuan Su, Xuhui Jia, Yandong Li, Matthias Grundmann

An authentic face restoration system is in increasing demand in many computer vision applications, e.g., image enhancement, video communication, and portrait photography.

Blind Face Restoration Denoising +2

3DRP-Net: 3D Relative Position-aware Network for 3D Visual Grounding

no code implementations 25 Jul 2023 Zehan Wang, Haifeng Huang, Yang Zhao, Linjun Li, Xize Cheng, Yichen Zhu, Aoxiong Yin, Zhou Zhao

3D visual grounding aims to localize the target object in a 3D point cloud by a free-form language description.

Object Position +3

Cross-Dataset-Robust Method for Blind Real-World Image Quality Assessment

no code implementations 26 Sep 2023 Yuan Chen, Zhiliang Ma, Yang Zhao

First, many individual models based on popular and state-of-the-art (SOTA) Swin-Transformer (SwinT) are trained on different real-world BIQA datasets respectively.

Blind Image Quality Assessment

TranDRL: A Transformer-Driven Deep Reinforcement Learning Enabled Prescriptive Maintenance Framework

no code implementations 29 Sep 2023 Yang Zhao, Jiaxi Yang, Wenbo Wang, Helin Yang, Dusit Niyato

Industrial systems demand reliable predictive maintenance strategies to enhance operational efficiency and reduce downtime.

reinforcement-learning

UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs

no code implementations 14 Nov 2023 Yanwu Xu, Yang Zhao, Zhisheng Xiao, Tingbo Hou

Text-to-image diffusion models have demonstrated remarkable capabilities in transforming textual prompts into coherent images, yet the computational cost of their inference remains a persistent challenge.

Text-to-Image Generation

Cut-and-Paste: Subject-Driven Video Editing with Attention Control

no code implementations 20 Nov 2023 Zhichao Zuo, Zhao Zhang, Yan Luo, Yang Zhao, Haijun Zhang, Yi Yang, Meng Wang

This paper presents a novel framework termed Cut-and-Paste for real-world semantic video editing under the guidance of a text prompt and an additional reference image.

Object Video Editing

MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices

no code implementations 28 Nov 2023 Yang Zhao, Yanwu Xu, Zhisheng Xiao, Tingbo Hou

The deployment of large-scale text-to-image diffusion models on mobile devices is impeded by their substantial model size and slow inference speed.

Computational Efficiency Text-to-Image Generation

Connecting Multi-modal Contrastive Representations

no code implementations NeurIPS 2023 Zehan Wang, Yang Zhao, Xize Cheng, Haifeng Huang, Jiageng Liu, Li Tang, Linjun Li, Yongqi Wang, Aoxiong Yin, Ziang Zhang, Zhou Zhao

This paper proposes a novel training-efficient method for learning MCR without paired data called Connecting Multi-modal Contrastive Representations (C-MCR).

3D Point Cloud Classification counterfactual +4

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models

no code implementations 30 Nov 2023 Zhonghao Wang, Wei Wei, Yang Zhao, Zhisheng Xiao, Mark Hasegawa-Johnson, Humphrey Shi, Tingbo Hou

We further extend our method to a novel image editing task: substituting the subject in an image through textual manipulations.

Denoising Image Generation

DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models

no code implementations 5 Dec 2023 Shaoan Xie, Yang Zhao, Zhisheng Xiao, Kelvin C. K. Chan, Yandong Li, Yanwu Xu, Kun Zhang, Tingbo Hou

Our extensive experiments demonstrate the superior performance of our method in terms of visual quality, identity preservation, and text control, showcasing its effectiveness in the context of text-guided subject-driven image inpainting.

Image Inpainting

Adapting Vision Transformer for Efficient Change Detection

no code implementations 8 Dec 2023 Yang Zhao, Yuxiang Zhang, Yanni Dong, Bo Du

Most change detection models based on vision transformers currently follow a "pretraining then fine-tuning" strategy.

Change Detection

Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding

no code implementations 21 Dec 2023 Haifeng Huang, Yang Zhao, Zehan Wang, Yan Xia, Zhou Zhao

Thus, to address this issue and enhance model performance on new scenes, we explore the TVG task in an unsupervised domain adaptation (UDA) setting across scenes for the first time, where the video-query pairs in the source scene (domain) are labeled with temporal boundaries, while those in the target scene are not.

Unsupervised Domain Adaptation Video Grounding

Comparing roughness maps generated by five roughness descriptors for LiDAR-derived digital elevation models

no code implementations 29 Dec 2023 Lei Fan, Yang Zhao

Terrain surface roughness, often described abstractly, poses challenges in quantitative characterisation with various descriptors found in the literature.

Instruct-Imagen: Image Generation with Multi-modal Instruction

no code implementations 3 Jan 2024 Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia

We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.

Image Generation Retrieval

Audio-Infused Automatic Image Colorization by Exploiting Audio Scene Semantics

no code implementations 24 Jan 2024 Pengcheng Zhao, Yanxiang Chen, Yang Zhao, Wei Jia, Zhao Zhang, Ronggang Wang, Richang Hong

Second, the natural co-occurrence of audio and video is utilized to learn the color semantic correlations between audio and visual scenes.

Colorization Image Colorization

A novel spatial-frequency domain network for zero-shot incremental learning

no code implementations 11 Feb 2024 Jie Ren, Yang Zhao, Weichuan Zhang, Changming Sun

The proposed SFDNet has the ability to effectively extract spatial-frequency feature representation from input images, improve the accuracy of image classification, and fundamentally alleviate catastrophic forgetting.

Image Classification Incremental Learning +1

Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning

no code implementations 18 Feb 2024 Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Zhouhao Sun, Jun Shi, Ting Liu, Bing Qin

Through pretraining on a corpus with various sources, Large Language Models (LLMs) have gained impressive performance.

Machine Unlearning

OHTA: One-shot Hand Avatar via Data-driven Implicit Priors

no code implementations 29 Feb 2024 Xiaozheng Zheng, Chao Wen, Zhuo Su, Zeran Xu, Zhaohu Li, Yang Zhao, Zhou Xue

In this paper, we delve into the creation of one-shot hand avatars, attaining high-fidelity and drivable hand representations swiftly from a single image.

MEBS: Multi-task End-to-end Bid Shading for Multi-slot Display Advertising

no code implementations 5 Mar 2024 Zhen Gong, Lvyin Niu, Yang Zhao, Miao Xu, Zhenzhe Zheng, Haoqi Zhang, Zhilin Zhang, Fan Wu, Rongquan Bai, Chuan Yu, Jian Xu, Bo Zheng

Through extensive offline and online experiments, we demonstrate the effectiveness and efficiency of our method, and we obtain a 7.01% lift in Gross Merchandise Volume, a 7.42% lift in Return on Investment, and a 3.26% lift in ad buy count.
