Search Results for author: Zheng Zhang

Found 275 papers, 140 papers with code

MovieChats: Chat like Humans in a Closed Domain

no code implementations EMNLP 2020 Hui Su, Xiaoyu Shen, Zhou Xiao, Zheng Zhang, Ernie Chang, Cheng Zhang, Cheng Niu, Jie Zhou

In this work, we take a close look at the movie domain and present a large-scale high-quality corpus with fine-grained annotations in hope of pushing the limit of movie-domain chatbots.

Chatbot Retrieval

Region Graph Embedding Network for Zero-Shot Learning

no code implementations ECCV 2020 Guo-Sen Xie, Li Liu, Fan Zhu, Fang Zhao, Zheng Zhang, Yazhou Yao, Jie Qin, Ling Shao

To exploit the progressive interactions among these regions, we represent them as a region graph, on which the parts relation reasoning is performed with graph convolutions, thus leading to our PRR branch.

Graph Embedding Relation +1

Towards Analyzing and Understanding the Limitations of DPO: A Theoretical Perspective

no code implementations 6 Apr 2024 Duanyu Feng, Bowen Qin, Chen Huang, Zheng Zhang, Wenqiang Lei

Direct Preference Optimization (DPO), which derives reward signals directly from pairwise preference data, has shown its effectiveness on aligning Large Language Models (LLMs) with human preferences.

PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

1 code implementation 31 Mar 2024 Zhuotong Chen, Zihu Wang, Yifan Yang, Qianxiao Li, Zheng Zhang

This approach reduces the computational cost to that of using just the P controller, instead of the full PID control.

EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs

1 code implementation 30 Mar 2024 Cheng Jiayang, Lin Qiu, Chunkit Chan, Xin Liu, Yangqiu Song, Zheng Zhang

In this work, we propose an initial comprehensive framework called EventGround, which aims to tackle the problem of grounding free-texts to eventuality-centric KGs for contextualized narrative reasoning.

Knowledge Graphs Language Modelling +2

CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network

no code implementations 28 Mar 2024 Jie Wen, Zheng Zhang, Yong Xu, Bob Zhang, Lunke Fei, Guo-Sen Xie

In this paper, we propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net), to address these issues.

Clustering Graph Embedding +1

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

no code implementations 22 Mar 2024 Zheng Zhang, WenBo Hu, Yixing Lao, Tong He, Hengshuang Zhao

3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance.

Novel View Synthesis

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

1 code implementation 21 Mar 2024 Zheng Zhang, Yeyao Ma, Enming Zhang, Xiang Bai

PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges.

Generalized Referring Expression Segmentation Image Segmentation +5

Common 7B Language Models Already Possess Strong Math Capabilities

no code implementations 7 Mar 2024 Chen Li, Weiqi Wang, Jingcheng Hu, Yixuan Wei, Nanning Zheng, Han Hu, Zheng Zhang, Houwen Peng

This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities, as evidenced by its impressive accuracy of 97.7% and 72.0% on the GSM8K and MATH benchmarks, respectively, when selecting the best response from 256 random generations.

GSM8K Math

OffLanDat: A Community Based Implicit Offensive Language Dataset Generated by Large Language Model Through Prompt Engineering

no code implementations 4 Mar 2024 Amit Das, Mostafa Rahgouy, Dongji Feng, Zheng Zhang, Tathagata Bhattacharya, Nilanjana Raychawdhary, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals

Firstly, the existing datasets primarily rely on the collection of texts containing explicit offensive keywords, making it challenging to capture implicitly offensive contents that are devoid of these keywords.

Language Modelling Large Language Model +1

Distilling Large Language Models for Text-Attributed Graph Learning

no code implementations 19 Feb 2024 Bo Pan, Zheng Zhang, Yifei Zhang, Yuntong Hu, Liang Zhao

To address the inherent gaps between LLMs (generative models for texts) and graph models (discriminative models for graphs), we propose first to let LLMs teach an interpreter with rich textual rationale and then let a student model mimic the interpreter's reasoning without LLMs' textual rationale.

Graph Learning TAG

LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models

1 code implementation 18 Feb 2024 Yifan Yang, Jiajun Zhou, Ngai Wong, Zheng Zhang

Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance.

Multi-Task Learning

Multi-Agent Generative Adversarial Interactive Self-Imitation Learning for AUV Formation Control and Obstacle Avoidance

no code implementations 21 Jan 2024 Zheng Fang, Tianhao Chen, Dong Jiang, Zheng Zhang, Guangliang Li

Multi-agent generative adversarial imitation learning (MAGAIL) allows multi-AUV to learn from expert demonstration instead of pre-defined reward functions, but suffers from the deficiency of requiring optimal demonstrations and not surpassing provided expert demonstrations.

Imitation Learning Multi-agent Reinforcement Learning

See the Unseen: Better Context-Consistent Knowledge-Editing by Noises

no code implementations 15 Jan 2024 Youcheng Huang, Wenqiang Lei, Zheng Zhang, Jiancheng Lv, Shuicheng Yan

In this paper, we empirically find that the effects of different contexts upon LLMs in recalling the same knowledge follow a Gaussian-like distribution.

knowledge editing

Real-Time FJ/MAC PDE Solvers via Tensorized, Back-Propagation-Free Optical PINN Training

no code implementations 31 Dec 2023 Yequan Zhao, Xian Xiao, Xinling Yu, Ziyue Liu, Zhixiong Chen, Geza Kurczveil, Raymond G. Beausoleil, Zheng Zhang

Despite the ultra-high speed of optical neural networks, training a PINN on an optical chip is hard due to (1) the large size of photonic devices, and (2) the lack of scalable optical memory devices to store the intermediate results of back-propagation (BP).

Generate E-commerce Product Background by Integrating Category Commonality and Personalized Style

no code implementations 20 Dec 2023 Haohan Wang, Wei Feng, Yang Lu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junjie Shen, Zhangang Lin, Lixing Bo, Jingping Shao

Furthermore, for products with specific and fine-grained requirements in layout, elements, etc, a Personality-Wise Generator is devised to learn such personalized style directly from a reference image to resolve textual ambiguities, and is trained in a self-supervised manner for more efficient training data usage.


Non-Euclidean Spatial Graph Neural Network

1 code implementation 17 Dec 2023 Zheng Zhang, Sirui Li, Jingcheng Zhou, Junxiang Wang, Abhinav Angirekula, Allen Zhang, Liang Zhao

Besides, existing spatial network representation learning methods can only consider networks embedded in Euclidean space, and cannot fully exploit the rich geometric information carried by irregular and non-uniform non-Euclidean space.

Representation Learning

Planning and Rendering: Towards End-to-End Product Poster Generation

no code implementations 14 Dec 2023 Zhaochen Li, Fengheng Li, Wei Feng, Honghe Zhu, An Liu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Xin Zhu, Junjie Shen, Zhangang Lin, Jingping Shao, Zhenglu Yang

At the planning stage, we propose a PlanNet to generate the layout of the product and other visual components considering both the appearance features of the product and semantic features of the text, which improves the diversity and rationality of the layouts.

Image Inpainting

Segment and Caption Anything

1 code implementation 1 Dec 2023 Xiaoke Huang, JianFeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu

We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability to generate regional captions.

Caption Generation object-detection +2

RTQ: Rethinking Video-language Understanding Based on Image-text Model

2 code implementations 1 Dec 2023 Xiao Wang, Yaoyu Li, Tian Gan, Zheng Zhang, Jingjing Lv, Liqiang Nie

Recent advancements in video-language understanding have been established on the foundation of image-text models, resulting in promising outcomes due to the shared knowledge between images and videos.

Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus

1 code implementation 22 Nov 2023 Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, Luoyi Fu

Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.

Hallucination Retrieval

Learning to Complement with Multiple Humans (LECOMH): Integrating Multi-rater and Noisy-Label Learning into Human-AI Collaboration

no code implementations 22 Nov 2023 Zheng Zhang, Kevin Wells, Gustavo Carneiro

The advent of learning with noisy labels (LNL), multi-rater learning, and human-AI collaboration has revolutionised the development of robust classifiers, enabling them to address the challenges posed by different types of data imperfections and complex decision processes commonly encountered in real-world applications.

Learning with noisy labels

A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest

no code implementations 17 Nov 2023 Ruohong Zhang, Luyu Gao, Chen Zheng, Zhen Fan, Guokun Lai, Zheng Zhang, Fangzhou Ai, Yiming Yang, Hongxia Yang

This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources, and the adaptive training of a chatbot with domain-specific inquiries.

Chatbot Text Generation

Asymptotically Fair Participation in Machine Learning Models: an Optimal Control Perspective

no code implementations 16 Nov 2023 Zhuotong Chen, Qianxiao Li, Zheng Zhang

Moreover, we design a surrogate retention system based on existing literature on evolutionary population dynamics to approximate the dynamics of distribution shifts on active user counts, from which the objective of achieving asymptotically fair participation is formulated as an optimal control problem, and the control variables are considered as the model parameters.

WinNet: time series forecasting with a window-enhanced period extracting and interacting

no code implementations 1 Nov 2023 Wenjie Ou, Dongyue Guo, Zheng Zhang, Zhishuo Zhao, Yi Lin

We present a highly accurate and simply structured CNN-based model for long-term time series forecasting tasks, called WinNet, including (i) Inter-Intra Period Encoder (I2PE) to transform 1D sequence into 2D tensor with long and short periodicity according to the predefined periodic window, (ii) Two-Dimensional Period Decomposition (TDPD) to model period-trend and oscillation terms, and (iii) Decomposition Correlation Block (DCB) to leverage the correlations of the period-trend and oscillation terms to support the prediction tasks by CNNs.

Time Series Time Series Forecasting

Semantic-Aware Adversarial Training for Reliable Deep Hashing Retrieval

1 code implementation IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 2023 Xu Yuan, Zheng Zhang, Xunguang Wang, Lin Wu

Further, we, for the first time, formulate the formalized adversarial training of deep hashing into a unified minimax optimization under the guidance of the generated mainstay codes.

Adversarial Attack Adversarial Robustness +2

Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts

1 code implementation 23 Oct 2023 Tengxiao Liu, Qipeng Guo, Yuqing Yang, Xiangkun Hu, Yue Zhang, Xipeng Qiu, Zheng Zhang

As large language models (LLMs) have shown effectiveness with different prompting methods, such as Chain of Thought, Program of Thought, we find that these methods have formed a great complementarity to each other on math reasoning tasks.

Logical Reasoning Math

Large Language Models for Spatial Trajectory Patterns Mining

no code implementations 7 Oct 2023 Zheng Zhang, Hossein Amiri, Zhenke Liu, Andreas Züfle, Liang Zhao

Identifying anomalous human spatial trajectory patterns can indicate dynamic changes in mobility behavior with applications in domains like infectious disease monitoring and elderly care.

Anomaly Detection

Transferable Deep Clustering Model

no code implementations 7 Oct 2023 Zheng Zhang, Liang Zhao

Deep learning has shown remarkable success in the field of clustering recently.

Clustering Deep Clustering +1

Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data

no code implementations 7 Oct 2023 Yuntong Hu, Zheng Zhang, Liang Zhao

Large language models (LLMs) have achieved impressive performance on many natural language processing tasks.


Balancing Specialized and General Skills in LLMs: The Impact of Modern Tuning and Data Strategy

no code implementations 7 Oct 2023 Zheng Zhang, Chen Zheng, Da Tang, Ke Sun, Yukun Ma, Yingtong Bu, Xun Zhou, Liang Zhao

This paper introduces a multifaceted methodology for fine-tuning and evaluating large language models (LLMs) for specialized monetization tasks.

DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

1 code implementation 3 Oct 2023 Aochuan Chen, Yimeng Zhang, Jinghan Jia, James Diffenderfer, Jiancheng Liu, Konstantinos Parasyris, Yihua Zhang, Zheng Zhang, Bhavya Kailkhura, Sijia Liu

Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time.

Adversarial Defense Computational Efficiency +1

KLoB: a Benchmark for Assessing Knowledge Locating Methods in Language Models

1 code implementation 28 Sep 2023 Yiming Ju, Zheng Zhang

KLoB can serve as a benchmark for evaluating existing locating methods in language models, and can contribute a method for reassessing the validity of the locality hypothesis of factual knowledge.

Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

1 code implementation ICCV 2023 Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

Furthermore, we propose a multi-view fusion layer based temporal module which is equipped with a set of object slots and interacts with features from different views by attention mechanism to fulfill sufficient object representation completion.

Object Video Segmentation +1

Partition-A-Medical-Image: Extracting Multiple Representative Sub-regions for Few-shot Medical Image Segmentation

1 code implementation 20 Sep 2023 Yazhou Zhu, Shidong Wang, Tong Xin, Zheng Zhang, Haofeng Zhang

In this work, we present an approach to extract multiple representative sub-regions from a given support medical image, enabling fine-grained selection over the generated image regions.

Image Segmentation Medical Image Segmentation +1

Unsupervised Open-Vocabulary Object Localization in Videos

no code implementations ICCV 2023 Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization.

Object Object Localization +1

FLM-101B: An Open LLM and How to Train It with $100K Budget

no code implementations 7 Sep 2023 Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang

We demonstrate that a 101B-parameter LLM with 0.31T tokens can be trained with a budget of 100K US dollars.


InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

1 code implementation 7 Sep 2023 Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Han Hu, Dong Chen, Baining Guo

We present InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions.

Keypoint Detection

Object-Centric Multiple Object Tracking

1 code implementation ICCV 2023 Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.

Multiple Object Tracking Object +3

Coarse-to-Fine Amodal Segmentation with Shape Prior

1 code implementation ICCV 2023 Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

To address this issue, we propose a convolution refine module to inject fine-grained information and provide a more precise amodal object segmentation based on visual features and coarse-predicted segmentation.

Object Segmentation +1

Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions

1 code implementation 28 Aug 2023 Fengling Li, Lei Zhu, Tianshi Wang, Jingjing Li, Zheng Zhang, Heng Tao Shen

With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval methods struggle to meet the needs of users demanding access to data from various modalities.

Cross-Modal Retrieval Retrieval

Tensor-Compressed Back-Propagation-Free Training for (Physics-Informed) Neural Networks

no code implementations 18 Aug 2023 Yequan Zhao, Xinling Yu, Zhixiong Chen, Ziyue Liu, Sijia Liu, Zheng Zhang

Backward propagation (BP) is widely used to compute the gradients in neural network training.

DETR Doesn't Need Multi-Scale or Locality Design

1 code implementation 3 Aug 2023 Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.

KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection

no code implementations ICCV 2023 Yadan Luo, Zhuoxiao Chen, Zhen Fang, Zheng Zhang, Zi Huang, Mahsa Baktashmotlagh

Achieving a reliable LiDAR-based object detector in autonomous driving is paramount, but its success hinges on obtaining large amounts of precise 3D annotations.

3D Object Detection Active Learning +4

Partial Vessels Annotation-based Coronary Artery Segmentation with Self-training and Prototype Learning

1 code implementation 10 Jul 2023 Zheng Zhang, XiaoLei Zhang, Yaolei Qi, Guanyu Yang

To this end, we propose partial vessels annotation (PVA) based on the challenges of coronary artery segmentation and clinical diagnostic characteristics.

Coronary Artery Segmentation Segmentation +1

A review of dynamics design methods for high-speed and high-precision CNC machine tool feed systems

no code implementations 7 Jul 2023 Xuesong Wang, Dongsheng Zhang, Zheng Zhang

With the development of CNC machine tools toward high speed and high precision, the traditional static design methods can hardly meet the demand.

Distributed Marker Representation for Ambiguous Discourse Markers and Entangled Relations

no code implementations 19 Jun 2023 Dongyu Ru, Lin Qiu, Xipeng Qiu, Yue Zhang, Zheng Zhang

Discourse analysis is an important task because it models intrinsic semantic structures between sentences in a document.


Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing Perspective

1 code implementation 18 Jun 2023 Yan Zhuang, Qi Liu, Yuting Ning, Weizhe Huang, Rui Lv, Zhenya Huang, Guanhao Zhao, Zheng Zhang, Qingyang Mao, Shijin Wang, Enhong Chen

Different tests for different models using efficient adaptive testing -- we believe this has the potential to become a new norm in evaluating large language models.

Mathematical Reasoning

A Gradient-based Approach for Online Robust Deep Neural Network Training with Noisy Labels

no code implementations 8 Jun 2023 Yifan Yang, Alec Koppel, Zheng Zhang

In this paper, we propose a novel gradient-based approach to enable the detection of noisy labels for the online learning of model parameters, named Online Gradient-based Robust Selection (OGRS).

Learning with noisy labels

Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning

1 code implementation 6 Jun 2023 Chujie Zheng, Pei Ke, Zheng Zhang, Minlie Huang

It has always been an important yet challenging problem to control language models to avoid generating texts with undesirable attributes, such as toxic language and unnatural repetition.

Contrastive Learning Text Generation

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

no code implementations 1 Jun 2023 Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang

To improve the convergence, a layer-by-layer distillation is applied to distill a quantized and tensor-compressed student model from a pre-trained transformer.

Natural Language Understanding Quantization

An AMR-based Link Prediction Approach for Document-level Event Argument Extraction

1 code implementation 30 May 2023 Yuqing Yang, Qipeng Guo, Xiangkun Hu, Yue Zhang, Xipeng Qiu, Zheng Zhang

Motivated by the fact that all event structures can be inferred from AMR, this work reformulates EAE as a link prediction problem on AMR graphs.

Event Argument Extraction Link Prediction +1

Exploiting Abstract Meaning Representation for Open-Domain Question Answering

1 code implementation 26 May 2023 Cunxiang Wang, Zhikun Xu, Qipeng Guo, Xiangkun Hu, Xuefeng Bai, Zheng Zhang, Yue Zhang

The Open-Domain Question Answering (ODQA) task involves retrieving and subsequently generating answers from fine-grained relevant passages within a database.

Natural Questions Open-Domain Question Answering +1

Evaluating Open-QA Evaluation

1 code implementation NeurIPS 2023 Cunxiang Wang, Sirui Cheng, Qipeng Guo, Yuanhao Yue, Bowen Ding, Zhikun Xu, Yidong Wang, Xiangkun Hu, Zheng Zhang, Yue Zhang

This study focuses on the evaluation of the Open Question Answering (Open-QA) task, which can directly estimate the factuality of large language models (LLMs).

Question Answering

Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations

1 code implementation 12 May 2023 Yuan Tian, Zheng Zhang, Zheng Ning, Toby Jia-Jun Li, Jonathan K. Kummerfeld, Tianyi Zhang

Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine incorrect queries.


Masked Structural Growth for 2x Faster Language Model Pre-training

1 code implementation 4 May 2023 Yiqun Yao, Zheng Zhang, Jing Li, Yequan Wang

In terms of growth schedule, the impact of each single dimension on a schedule's efficiency is under-explored by existing work.

Language Modelling Large Language Model +1

FlightBERT++: A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework

no code implementations 2 May 2023 Dongyue Guo, Zheng Zhang, Zhen Yan, Jianwei Zhang, Yi Lin

Flight Trajectory Prediction (FTP) is an essential task in Air Traffic Control (ATC), which can assist air traffic controllers in managing airspace more safely and efficiently.

Computational Efficiency Trajectory Prediction

VISAR: A Human-AI Argumentative Writing Assistant with Visual Programming and Rapid Draft Prototyping

no code implementations 16 Apr 2023 Zheng Zhang, Jie Gao, Ranjodh Singh Dhaliwal, Toby Jia-Jun Li

In argumentative writing, writers must brainstorm hierarchical writing goals, ensure the persuasiveness of their arguments, and revise and organize their plans through drafting.

Persuasiveness Text Generation

DeepMIM: Deep Supervision for Masked Image Modeling

1 code implementation 15 Mar 2023 Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu

Deep supervision, which applies extra supervision to the intermediate features of a neural network, was widely used in image classification in the early deep learning era, since it significantly reduces training difficulty and eases optimization (for example, by avoiding vanishing gradients) compared with vanilla training.

Image Classification object-detection +2

Particle-based Online Bayesian Sampling

no code implementations 28 Feb 2023 Yifan Yang, Chang Liu, Zheng Zhang

Online optimization has gained increasing interest due to its capability of tracking real-world streaming data.

Variational Inference

DeepOHeat: Operator Learning-based Ultra-fast Thermal Simulation in 3D-IC Design

no code implementations 25 Feb 2023 Ziyue Liu, Yixing Li, Jing Hu, Xinling Yu, Shinyu Shiau, Xin Ai, Zhiyu Zeng, Zheng Zhang

In this paper, for the first time, we propose DeepOHeat, a physics-aware operator learning framework to predict the temperature field of a family of heat equations with multiple parametric or non-parametric design configurations.

Operator learning

Side Adapter Network for Open-Vocabulary Semantic Segmentation

3 code implementations CVPR 2023 Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai

A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks.

Language Modelling Open Vocabulary Semantic Segmentation +3

PIFON-EPT: MR-Based Electrical Property Tomography Using Physics-Informed Fourier Networks

no code implementations 23 Feb 2023 Xinling Yu, José E. C. Serrallés, Ilias I. Giannakopoulos, Ziyue Liu, Luca Daniel, Riccardo Lattanzi, Zheng Zhang

PIFON-EPT is the first method that can simultaneously reconstruct EP and transmit fields from incomplete noisy MR measurements, providing new opportunities for EPT research.


Tensorized Optical Multimodal Fusion Network

no code implementations 17 Feb 2023 Yequan Zhao, Xian Xiao, Geza Kurczveil, Raymond G. Beausoleil, Zheng Zhang

We propose the first tensorized optical multimodal fusion network architecture with a self-attention mechanism and low-rank tensor fusion.

Apples and Oranges? Assessing Image Quality over Content Recognition

no code implementations 22 Jan 2023 Junyong You, Zheng Zhang

A sequential spatial-channel attention module is proposed to simulate the visual attention and contrast sensitivity mechanisms that are crucial for content recognition and quality assessment.

FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training

no code implementations 18 Jan 2023 Kezhao Huang, Haitian Jiang, Minjie Wang, Guangxuan Xiao, David Wipf, Xiang Song, Quan Gan, Zengfeng Huang, Jidong Zhai, Zheng Zhang

A key performance bottleneck when training graph neural network (GNN) models on large, real-world graphs is loading node features onto a GPU.

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

1 code implementation ICCV 2023 Jia Ning, Chen Li, Zheng Zhang, Zigang Geng, Qi Dai, Kun He, Han Hu

With these new techniques and other designs, we show that the proposed general-purpose task-solver can perform both instance segmentation and depth estimation well.

Instance Segmentation Monocular Depth Estimation +1

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

2 code implementations CVPR 2023 Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu

Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget.

Image Classification Semantic Segmentation

iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition

no code implementations CVPR 2023 Yixuan Wei, Yue Cao, Zheng Zhang, Houwen Peng, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

This paper presents a method that effectively combines two prevalent visual recognition methods, i.e., image classification and contrastive language-image pre-training, dubbed iCLIP.

Classification Image Classification +2

DETR Does Not Need Multi-Scale or Locality Design

1 code implementation ICCV 2023 Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.

Improving CLIP Fine-tuning Performance

1 code implementation ICCV 2023 Yixuan Wei, Han Hu, Zhenda Xie, Ze Liu, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

Experiments suggest that the feature map distillation approach significantly boosts the fine-tuning performance of CLIP models on several typical downstream vision tasks.

object-detection Object Detection +1

Refined Edge Usage of Graph Neural Networks for Edge Prediction

no code implementations 25 Dec 2022 Jiarui Jin, Yangkun Wang, Weinan Zhang, Quan Gan, Xiang Song, Yong Yu, Zheng Zhang, David Wipf

However, existing methods lack elaborate design regarding the distinctions between two tasks that have been frequently overlooked: (i) edges only constitute the topology in the node classification task but can be used as both the topology and the supervision (i.e., labels) in the edge prediction task; (ii) node classification makes a prediction for each individual node, while edge prediction is determined by each pair of nodes.

Link Prediction Node Classification

EASpace: Enhanced Action Space for Policy Transfer

1 code implementation 7 Dec 2022 Zheng Zhang, Qingrui Zhang, Bo Zhu, Xiaohan Wang, Tianjiang Hu

In this paper, a novel algorithm named EASpace (Enhanced Action Space) is proposed, which formulates macro actions in an alternative form to accelerate the learning process using multiple available sub-optimal expert policies.

Q-Learning Transfer Learning

CBNet: A Plug-and-Play Network for Segmentation-Based Scene Text Detection

1 code implementation 5 Dec 2022 Xi Zhao, Wei Feng, Zheng Zhang, Jingjing Lv, Xin Zhu, Zhangang Lin, Jinghe Hu, Jingping Shao

Recently, segmentation-based methods have become quite popular in scene text detection; they mainly consist of two steps: text kernel segmentation and expansion.

Scene Text Detection Segmentation +1

Exploring Discrete Diffusion Models for Image Captioning

1 code implementation 21 Nov 2022 Zixin Zhu, Yixuan Wei, JianFeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one.

Image Captioning Image Generation

Could Giant Pretrained Image Models Extract Universal Representations?

no code implementations 3 Nov 2022 Yutong Lin, Ze Liu, Zheng Zhang, Han Hu, Nanning Zheng, Stephen Lin, Yue Cao

In this paper, we present a study of frozen pretrained models when applied to diverse and representative computer vision tasks, including object detection, semantic segmentation and video action recognition.

Action Recognition In Videos Instance Segmentation +5

RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees

1 code implementation 31 Oct 2022 Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Yue Zhang, Xipeng Qiu, Zheng Zhang

RLET iteratively performs single-step reasoning with sentence selection and deduction generation modules, from which the training signal is accumulated across the tree with an elaborately designed aligned reward function that is consistent with the evaluation.

reinforcement-learning Reinforcement Learning (RL) +1

DORE: Document Ordered Relation Extraction based on Generative Framework

1 code implementation 28 Oct 2022 Qipeng Guo, Yuqing Yang, Hang Yan, Xipeng Qiu, Zheng Zhang

In this paper, we investigate the root cause of the underwhelming performance of the existing generative DocRE models and discover that the culprit is the inadequacy of the training paradigm, instead of the capacities of the models.

Document-level Relation Extraction Relation

Conversation Disentanglement with Bi-Level Contrastive Learning

no code implementations 27 Oct 2022 Chengyu Huang, Zheng Zhang, Hao Fei, Lizi Liao

Conversation disentanglement aims to group utterances into detached sessions, which is a fundamental task in processing multi-party conversations.

Contrastive Learning Conversation Disentanglement +1

Self-supervised Amodal Video Object Segmentation

1 code implementation 23 Oct 2022 Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, Francesco Locatello, David Wipf, Yanwei Fu, Zheng Zhang

The key intuition is that the occluded part of an object can be explained away if that part is visible in other frames, possibly deformed as long as the deformation can be reasonably learned.

Object Segmentation +6

MR-Based Electrical Property Reconstruction Using Physics-Informed Neural Networks

no code implementations 23 Oct 2022 Xinling Yu, José E. C. Serrallés, Ilias I. Giannakopoulos, Ziyue Liu, Luca Daniel, Riccardo Lattanzi, Zheng Zhang

Electrical properties (EP), namely permittivity and electric conductivity, dictate the interactions between electromagnetic waves and biological tissue.

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

4 code implementations 3 Oct 2022 Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, WeiHong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Han Hu

Vision transformers have recently achieved competitive results across various vision tasks but still suffer from heavy computation costs when processing a large number of tokens.

Clustering Depth Estimation +6

Vega-MT: The JD Explore Academy Translation System for WMT22

1 code implementation 20 Sep 2022 Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, DaCheng Tao

As for model sizes, we scale the Transformer-Big up to an extremely large model with nearly 4.7 billion parameters, to fully enhance the model capacity for our Vega-MT.

Data Augmentation Machine Translation +1

Whole-Body Lesion Segmentation in 18F-FDG PET/CT

1 code implementation 16 Sep 2022 Jia Zhang, Yukun Huang, Zheng Zhang, Yuhang Shi

There has been growing research interest in using deep learning-based methods to achieve fully automated segmentation of lesions in positron emission tomography/computed tomography (PET/CT) scans for the prognosis of various cancers.

Image Segmentation Lesion Segmentation +2

An Empirical Study and Analysis of Learning Generalizable Manipulation Skill in the SAPIEN Simulator

no code implementations 31 Aug 2022 Kun Liu, Huiyuan Fu, Zheng Zhang, Huanpu Yin

This paper provides a brief overview of our submission to the no interaction track of SAPIEN ManiSkill Challenge 2021.

CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation

1 code implementation 18 Aug 2022 Jinfeng Zhou, Chujie Zheng, Bo Wang, Zheng Zhang, Minlie Huang

Empathetic conversation is psychologically supposed to be the result of conscious alignment and interaction between the cognition and affection of empathy.

Dialogue Generation Empathetic Response Generation +1

A Survey on Incomplete Multi-view Clustering

1 code implementation 17 Aug 2022 Jie Wen, Zheng Zhang, Lunke Fei, Bob Zhang, Yong Xu, Zhao Zhang, Jinxing Li

However, in practical applications such as disease diagnosis, multimedia analysis, and recommendation systems, it is common that not all views of samples are available, which leads to the failure of conventional multi-view clustering methods.

Clustering Incomplete multi-view clustering

Learning Resolution-Adaptive Representations for Cross-Resolution Person Re-Identification

no code implementations 9 Jul 2022 Lin Wu, Lingqiao Liu, Yang Wang, Zheng Zhang, Farid Boussaid, Mohammed Bennamoun

It is a challenging and practical problem since the query images often suffer from resolution degradation due to the different capturing conditions from real-world cameras.

Person Re-Identification Super-Resolution

TT-PINN: A Tensor-Compressed Neural PDE Solver for Edge Computing

no code implementations 4 Jul 2022 Ziyue Liu, Xinling Yu, Zheng Zhang

Physics-informed neural networks (PINNs) have been increasingly employed due to their capability of modeling complex physics systems.


Domain Adaptive Nuclei Instance Segmentation and Classification via Category-aware Feature Alignment and Pseudo-labelling

no code implementations 4 Jul 2022 Canran Li, Dongnan Liu, Haoran Li, Zheng Zhang, Guangming Lu, Xiaojun Chang, Weidong Cai

In this work, we propose a novel deep neural network, namely Category-Aware feature alignment and Pseudo-Labelling Network (CAPL-Net) for UDA nuclei instance segmentation and classification.

Classification Instance Segmentation +3

Self-Healing Robust Neural Networks via Closed-Loop Control

1 code implementation 26 Jun 2022 Zhuotong Chen, Qianxiao Li, Zheng Zhang

While numerous attack and defense techniques have been developed, this work investigates the robustness issue from a new angle: can we design a self-healing neural network that can automatically detect and fix the vulnerability issue by itself?

Predicting Electricity Infrastructure Induced Wildfire Risk in California

no code implementations 6 Jun 2022 Mengqi Yao, Meghana Bharadwaj, Zheng Zhang, Baihong Jin, Duncan S. Callaway

Our data include historical ignition and wire-down points triggered by grid infrastructure, collected between 2015 and 2019 in Pacific Gas & Electricity territory, along with various weather, vegetation, and very high resolution data on grid infrastructure, including location, age, and materials.

Weather Forecasting

Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation

1 code implementation 27 May 2022 Yixuan Wei, Han Hu, Zhenda Xie, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

These properties, which we aggregately refer to as optimization friendliness, are identified and analyzed by a set of attention- and optimization-related diagnosis tools.

Ranked #2 on Instance Segmentation on COCO test-dev (using extra training data)

Contrastive Learning Image Classification +5

Revealing the Dark Secrets of Masked Image Modeling

1 code implementation CVPR 2023 Zhenda Xie, Zigang Geng, Jingcheng Hu, Zheng Zhang, Han Hu, Yue Cao

In this paper, we compare MIM with the long-dominant supervised pre-trained models from two perspectives, the visualizations and the experiments, to uncover their key representational differences.

Inductive Bias Monocular Depth Estimation +3

Heterogeneous Information Network based Default Analysis on Banking Micro and Small Enterprise Users

no code implementations 24 Apr 2022 Zheng Zhang, Yingsheng Ji, Jiachen Shen, Xi Zhang, Guangwen Yang

Risk assessment is a substantial problem for financial institutions that has been extensively studied both for its methodological richness and its various practical applications.

Feature Engineering Implicit Relations

Dialogue Meaning Representation for Task-Oriented Dialogue Systems

1 code implementation 23 Apr 2022 Xiangkun Hu, Junqi Dai, Hang Yan, Yi Zhang, Qipeng Guo, Xipeng Qiu, Zheng Zhang

We propose Dialogue Meaning Representation (DMR), a pliable and easily extendable representation for task-oriented dialogue.

coreference-resolution Negation +1

iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition

no code implementations 22 Apr 2022 Yixuan Wei, Yue Cao, Zheng Zhang, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

Second, we convert the image classification problem from learning parametric category classifier weights to learning a text encoder as a meta network to generate category classifier weights.

Action Recognition Classification +7

BLISS: Robust Sequence-to-Sequence Learning via Self-Supervised Input Representation

no code implementations 16 Apr 2022 Zheng Zhang, Liang Ding, Dazhao Cheng, Xuebo Liu, Min Zhang, DaCheng Tao

Data augmentation (DA) is core to achieving robust sequence-to-sequence learning on various natural language processing (NLP) tasks.

Grammatical Error Correction Machine Translation +1

Visual Mechanisms Inspired Efficient Transformers for Image and Video Quality Assessment

no code implementations 28 Mar 2022 Junyong You, Zheng Zhang

Meanwhile, representative features for image quality perception in the spatial and frequency domains can also be derived from the IQA model, which are then fed into another windowed transformer architecture for video quality assessment (VQA).

Image Quality Assessment Video Quality Assessment +1

Multi-robot Cooperative Pursuit via Potential Field-Enhanced Reinforcement Learning

no code implementations 9 Mar 2022 Zheng Zhang, Xiaohan Wang, Qingrui Zhang, Tianjiang Hu

Numerical simulations show that the proposed hybrid design outperforms pursuit policies that are either learned with vanilla reinforcement learning or designed with the potential field method.

reinforcement-learning Reinforcement Learning (RL)

AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation

1 code implementation 26 Feb 2022 Chujie Zheng, Sahand Sabour, Jiaxin Wen, Zheng Zhang, Minlie Huang

Applying this approach, we construct AugESC, an augmented dataset for the ESC task, which largely extends the scale and topic coverage of the crowdsourced ESConv corpus.

Data Augmentation Dialogue Generation +2

StoryBuddy: A Human-AI Collaborative Chatbot for Parent-Child Interactive Storytelling with Flexible Parental Involvement

1 code implementation 13 Feb 2022 Zheng Zhang, Ying Xu, Yanhao Wang, Bingsheng Yao, Daniel Ritchie, Tongshuang Wu, Mo Yu, Dakuo Wang, Toby Jia-Jun Li

Despite its benefits for children's skill development and parent-child bonding, many parents do not often engage in interactive storytelling by having story-related dialogues with their child due to limited availability or challenges in coming up with appropriate questions.


Online, Informative MCMC Thinning with Kernelized Stein Discrepancy

1 code implementation 18 Jan 2022 Cole Hawkins, Alec Koppel, Zheng Zhang

A fundamental challenge in Bayesian inference is efficient representation of a target distribution.

Bayesian Inference

A Critical Review of Inductive Logic Programming Techniques for Explainable AI

no code implementations 31 Dec 2021 Zheng Zhang, Liangliang Xu, Levent Yilmaz, Bo Liu

Despite recent advances in modern machine learning algorithms, the opaqueness of their underlying mechanisms continues to be an obstacle in adoption.

BIG-bench Machine Learning Explainable artificial intelligence +2

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

2 code implementations 29 Dec 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Han Hu, Xiang Bai

However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation processes pixels, while CLIP operates on whole images.

Image Classification Language Modelling +8

Representation Learning on Spatial Networks

1 code implementation NeurIPS 2021 Zheng Zhang, Liang Zhao

Specifically, a provably information-lossless and roto-translation invariant representation of spatial information on networks is presented.

Representation Learning Translation

Swin Transformer V2: Scaling Up Capacity and Resolution

19 code implementations CVPR 2022 Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo

Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.

Ranked #4 on Image Classification on ImageNet V2 (using extra training data)

Action Classification Image Classification +3

SimMIM: A Simple Framework for Masked Image Modeling

4 code implementations CVPR 2022 Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu

We also leverage this approach to facilitate the training of a 3B model (SwinV2-G), with which, using $40\times$ less data than in previous practice, we achieve state-of-the-art results on four representative vision benchmarks.

Representation Learning Self-Supervised Image Classification +1

Bootstrap Your Object Detector via Mixed Training

1 code implementation NeurIPS 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Stephen Lin, Han Hu, Xiang Bai

We introduce MixTraining, a new training paradigm for object detection that can improve the performance of existing detectors for free.

Data Augmentation Missing Labels +3

Why Propagate Alone? Parallel Use of Labels and Features on Graphs

no code implementations ICLR 2022 Yangkun Wang, Jiarui Jin, Weinan Zhang, Yongyi Yang, Jiuhai Chen, Quan Gan, Yong Yu, Zheng Zhang, Zengfeng Huang, David Wipf

In this regard, it has recently been proposed to use a randomly-selected portion of the training labels as GNN inputs, concatenated with the original node features for making predictions on the remaining labels.

Node Property Prediction Property Prediction

Kokoyi: Executable LaTeX for End-to-end Deep Learning

no code implementations 29 Sep 2021 Minjie Wang, Haoming Lu, Yu Gai, Lesheng Jin, Zihao Ye, Zheng Zhang

Despite substantial efforts from the deep learning system community to relieve researchers and practitioners from the burden of implementing models with ever-growing complexity, a considerable linguistic gap remains between developing models in the language of mathematics and implementing them in the languages of computers.

Math Translation

Inductive Relation Prediction Using Analogy Subgraph Embeddings

no code implementations ICLR 2022 Jiarui Jin, Yangkun Wang, Kounianhua Du, Weinan Zhang, Zheng Zhang, David Wipf, Yong Yu, Quan Gan

Prevailing methods for relation prediction in heterogeneous graphs aim at learning latent representations (i.e., embeddings) of observed nodes and relations, and thus are limited to the transductive setting where the relation types must be known during training.

Inductive Bias Inductive Relation Prediction +1

Deep Collaborative Multi-Modal Learning for Unsupervised Kinship Estimation

no code implementations 7 Sep 2021 Guan-Nan Dong, Chi-Man Pun, Zheng Zhang

To this end, we propose a novel deep collaborative multi-modal learning (DCML) method that adaptively integrates the underlying information presented in facial properties to strengthen the facial details for effective unsupervised kinship verification.

Face Recognition Kinship Verification

Kinship Verification Based on Cross-Generation Feature Interaction Learning

no code implementations 7 Sep 2021 Guan-Nan Dong, Chi-Man Pun, Zheng Zhang

Specifically, we take parents and children as a whole to extract the expressive local and non-local features.

Kinship Verification

EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training

2 code implementations 3 Aug 2021 Hao Zhou, Pei Ke, Zheng Zhang, Yuxian Gu, Yinhe Zheng, Chujie Zheng, Yida Wang, Chen Henry Wu, Hao Sun, Xiaocong Yang, Bosi Wen, Xiaoyan Zhu, Minlie Huang, Jie Tang

Although pre-trained language models have remarkably enhanced the generation ability of dialogue systems, open-domain Chinese dialogue systems are still limited by the dialogue data and the model size compared with English ones.

MConv: An Environment for Multimodal Conversational Search across Multiple Domains

1 code implementation SIGIR 2021 Lizi Liao, Le Hong Long, Zheng Zhang, Minlie Huang, Tat-Seng Chua

Second, a set of benchmark results for dialogue state tracking, conversational recommendation, and response generation, as well as for a unified model for multiple tasks, is reported.

Conversational Search Dialogue State Tracking +1

Learning Hierarchical Graph Neural Networks for Image Clustering

2 code implementations ICCV 2021 Yifan Xing, Tong He, Tianjun Xiao, Yongxin Wang, Yuanjun Xiong, Wei Xia, David Wipf, Zheng Zhang, Stefano Soatto

Our hierarchical GNN uses a novel approach to merge connected components predicted at each level of the hierarchy to form a new graph at the next level.

Clustering Face Clustering

Context-Aware Attention-Based Data Augmentation for POI Recommendation

no code implementations 30 Jun 2021 Yang Li, Yadan Luo, Zheng Zhang, Shazia W. Sadiq, Peng Cui

It aims at suggesting the next POI to a user in spatial and temporal context, which is a practical yet challenging task in various applications.

Data Augmentation

Video Swin Transformer

14 code implementations CVPR 2022 Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, Han Hu

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.

Ranked #28 on Action Classification on Kinetics-600 (using extra training data)

Action Classification Action Recognition +5

End-to-End Semi-Supervised Object Detection with Soft Teacher

8 code implementations ICCV 2021 Mengde Xu, Zheng Zhang, Han Hu, JianFeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu

This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.

Instance Segmentation object-detection +4