Search Results for author: Hao Zhang

Found 405 papers, 148 papers with code

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

7 code implementations 9 Mar 2023 Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion.

Referring Expression Referring Expression Comprehension +2

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

1 code implementation 21 Sep 2023 Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications.

Chatbot Instruction Following

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

5 code implementations NeurIPS 2023 Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.

Chatbot Language Modelling +2

Efficient Memory Management for Large Language Model Serving with PagedAttention

4 code implementations 12 Sep 2023 Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica

On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage.

Language Modelling Large Language Model +1

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

14 code implementations 7 Mar 2022 Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum

Compared to other models on the leaderboard, DINO significantly reduces its model size and pre-training data size while achieving better results.

Real-Time Object Detection

Segment Everything Everywhere All at Once

2 code implementations NeurIPS 2023 Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, JianFeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee

In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs).

Image Segmentation Interactive Segmentation +4

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

1 code implementation 25 Jan 2024 Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM).

Segmentation

Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V

3 code implementations 17 Oct 2023 Jianwei Yang, Hao Zhang, Feng Li, Xueyan Zou, Chunyuan Li, Jianfeng Gao

We present Set-of-Mark (SoM), a new visual prompting method, to unleash the visual grounding abilities of large multimodal models (LMMs), such as GPT-4V.

Interactive Segmentation Referring Expression +4

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

1 code implementation 28 Jan 2022 Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica

Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving

2 code implementations 22 Feb 2023 Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

Model parallelism is conventionally viewed as a method to scale a single large deep learning model beyond the memory limits of a single device.

DN-DETR: Accelerate DETR Training by Introducing Query DeNoising

16 code implementations CVPR 2022 Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang

Our method is universal and can be easily plugged into any DETR-like methods by adding dozens of lines of code to achieve a remarkable improvement.

Object Detection

Semantic-SAM: Segment and Recognize Anything at Any Granularity

1 code implementation 10 Jul 2023 Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao

In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity.

Image Segmentation Segmentation +1

Visual In-Context Prompting

3 code implementations 22 Nov 2023 Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain.

Segmentation Visual Prompting

DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR

7 code implementations ICLR 2022 Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, Lei Zhang

We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR.

Object Detection

detrex: Benchmarking Detection Transformers

1 code implementation 12 Jun 2023 Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

A Simple Framework for Open-Vocabulary Segmentation and Detection

2 code implementations ICCV 2023 Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang

We present OpenSeeD, a simple Open-vocabulary Segmentation and Detection framework that jointly learns from different segmentation and detection datasets.

Ranked #2 on Instance Segmentation on ADE20K val (using extra training data)

Instance Segmentation Panoptic Segmentation +2

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

1 code implementation 3 Feb 2024 Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang

Autoregressive decoding of large language models (LLMs) is memory bandwidth bounded, resulting in high latency and significant wastes of the parallel processing power of modern accelerators.

Code Completion

How Can Recommender Systems Benefit from Large Language Models: A Survey

1 code implementation 9 Jun 2023 Jianghao Lin, Xinyi Dai, Yunjia Xi, Weiwen Liu, Bo Chen, Hao Zhang, Yong Liu, Chuhan Wu, Xiangyang Li, Chenxu Zhu, Huifeng Guo, Yong Yu, Ruiming Tang, Weinan Zhang

In this paper, we conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.

Ethics Feature Engineering +5

A Strong and Reproducible Object Detector with Only Public Datasets

2 code implementations 25 Apr 2023 Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation.

Ranked #5 on Object Detection on COCO minival (using extra training data)

object-detection Object Detection

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

2 code implementations 27 Aug 2020 Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, Eric P. Xing

Some recent schedulers choose job resources for users, but do so without awareness of how DL training can be re-optimized to better utilize the provided resources.

Fairness Scheduling

Learning Implicit Fields for Generative Shape Modeling

4 code implementations CVPR 2019 Zhiqin Chen, Hao Zhang

We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder, called IM-NET, for shape generation, aimed at improving the visual quality of the generated shapes.

3D Reconstruction 3D Shape Representation +2

Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning

2 code implementations 28 Sep 2017 Pinxin Long, Tingxiang Fan, Xinyi Liao, Wenxi Liu, Hao Zhang, Jia Pan

We validate the learned sensor-level collision avoidance policy in a variety of simulated scenarios with thorough performance evaluations and show that the final learned policy is able to find time efficient, collision-free paths for a large-scale robot system.

Collision Avoidance reinforcement-learning +1

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation 5 Dec 2023 Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.

Neural Dual Contouring

2 code implementations 4 Feb 2022 Zhiqin Chen, Andrea Tagliasacchi, Thomas Funkhouser, Hao Zhang

We introduce neural dual contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC).

Surface Reconstruction

BSP-Net: Generating Compact Meshes via Binary Space Partitioning

3 code implementations CVPR 2020 Zhiqin Chen, Andrea Tagliasacchi, Hao Zhang

The network is trained to reconstruct a shape using a set of convexes obtained from a BSP-tree built on a set of planes.

3D Reconstruction 3D Shape Representation

Learning Mesh Representations via Binary Space Partitioning Tree Networks

1 code implementation 27 Jun 2021 Zhiqin Chen, Andrea Tagliasacchi, Hao Zhang

The network is trained to reconstruct a shape using a set of convexes obtained from a BSP-tree built over a set of planes, where the planes and convexes are both defined by learned network weights.

Detection Transformer with Stable Matching

1 code implementation ICCV 2023 Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang

We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR.

Position

Neural Marching Cubes

1 code implementation 21 Jun 2021 Zhiqin Chen, Hao Zhang

To tackle these challenges, we re-cast MC from a deep learning perspective, by designing tessellation templates more apt at preserving geometric features, and learning the vertex positions and mesh topologies from training meshes, to account for contextual information from nearby cubes.

DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

1 code implementation 5 Oct 2023 Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU.

DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion

1 code implementation ICCV 2023 Maham Tanveer, Yizhi Wang, Ali Mahdavi-Amiri, Hao Zhang

We introduce a novel method to automatically generate an artistic typography by stylizing one or more letter fonts to visually convey the semantics of an input word, while ensuring that the output remains readable.

Denoising

SketchyScene: Richly-Annotated Scene Sketches

2 code implementations ECCV 2018 Changqing Zou, Qian Yu, Ruofei Du, Haoran Mo, Yi-Zhe Song, Tao Xiang, Chengying Gao, Baoquan Chen, Hao Zhang

We contribute the first large-scale dataset of scene sketches, SketchyScene, with the goal of advancing research on sketch understanding at both the object and scene level.

Colorization Image Retrieval +2

Span-based Localizing Network for Natural Language Video Localization

1 code implementation ACL 2020 Hao Zhang, Aixin Sun, Wei Jing, Joey Tianyi Zhou

Given an untrimmed video and a text query, natural language video localization (NLVL) is to locate a matching span from the video that semantically corresponds to the query.

Automatic Photo Adjustment Using Deep Neural Networks

1 code implementation 24 Dec 2014 Zhicheng Yan, Hao Zhang, Baoyuan Wang, Sylvain Paris, Yizhou Yu

Many photographic styles rely on subtle adjustments that depend on the image content and even its semantics.

Photo Retouching

MPCFormer: fast, performant and private Transformer inference with MPC

1 code implementation 2 Nov 2022 Dacheng Li, Rulin Shao, Hongyi Wang, Han Guo, Eric P. Xing, Hao Zhang

Through extensive evaluations, we show that MPCFORMER significantly speeds up Transformer inference in MPC settings while achieving similar ML performance to the input model.

Knowledge Distillation

UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA

1 code implementation 19 Aug 2022 Rachneet Sachdeva, Haritz Puerto, Tim Baumgärtner, Sewin Tariverdian, Hao Zhang, Kexin Wang, Hossain Shaikh Saadi, Leonardo F. R. Ribeiro, Iryna Gurevych

In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations.

Adversarial Attack Explainable Models +2

UKP-SQuARE v3: A Platform for Multi-Agent QA Research

1 code implementation 31 Mar 2023 Haritz Puerto, Tim Baumgärtner, Rachneet Sachdeva, Haishuo Fang, Hao Zhang, Sewin Tariverdian, Kexin Wang, Iryna Gurevych

To ease research in multi-agent models, we extend UKP-SQuARE, an online platform for QA research, to support three families of multi-agent systems: i) agent selection, ii) early-fusion of agents, and iii) late-fusion of agents.

Question Answering

Token Shift Transformer for Video Classification

3 code implementations 5 Aug 2021 Hao Zhang, Yanbin Hao, Chong-Wah Ngo

It is worth noticing that our TokShift transformer is a pure convolutional-free video transformer pilot with computational efficiency for video understanding.

Classification Computational Efficiency +2

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

1 code implementation CVPR 2023 Zequn Zeng, Hao Zhang, Zhengjue Wang, Ruiying Lu, Dongsheng Wang, Bo Chen

Zero-shot capability has been considered as a new revolution of deep learning, letting machines work on tasks without curated training data.

Image Captioning Language Modelling

BAE-NET: Branched Autoencoder for Shape Co-Segmentation

1 code implementation ICCV 2019 Zhiqin Chen, Kangxue Yin, Matthew Fisher, Siddhartha Chaudhuri, Hao Zhang

The unsupervised BAE-NET is trained with a collection of un-segmented shapes, using a shape reconstruction loss, without any ground-truth labels.

One-Shot Learning Representation Learning

TilinGNN: Learning to Tile with Self-Supervised Graph Neural Network

1 code implementation 5 Jul 2020 Hao Xu, Ka Hei Hui, Chi-Wing Fu, Hao Zhang

To start, we reformulate tiling as a graph problem by modeling candidate tile locations in the target shape as graph nodes and connectivity between tile locations as edges.

TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders

1 code implementation 1 Mar 2023 Mingyue Cheng, Qi Liu, Zhiding Liu, Hao Zhang, Rujiao Zhang, Enhong Chen

In this work, we propose TimeMAE, a novel self-supervised paradigm for learning transferrable time series representations based on transformer networks.

Time Series Time Series Analysis +1

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models

1 code implementation 16 Feb 2021 Zhuohan Li, Siyuan Zhuang, Shiyuan Guo, Danyang Zhuo, Hao Zhang, Dawn Song, Ion Stoica

With this key idea, we design TeraPipe, a high-performance token-level pipeline parallel algorithm for synchronous model-parallel training of Transformer-based language models.

FaceDNeRF: Semantics-Driven Face Reconstruction, Prompt Editing and Relighting with Diffusion Models

2 code implementations NeurIPS 2023 Hao Zhang, Yanbo Xu, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

The ability to create high-quality 3D faces from a single image has become increasingly important with wide applications in video conferencing, AR/VR, and advanced video editing in movie industries.

3D Face Reconstruction Video Editing +1

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

1 code implementation 28 Nov 2022 Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang

As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.

object-detection Object Detection +4

BalaGAN: Image Translation Between Imbalanced Domains via Cross-Modal Transfer

1 code implementation 5 Oct 2020 Or Patashnik, Dov Danon, Hao Zhang, Daniel Cohen-Or

State-of-the-art image-to-image translation methods tend to struggle in an imbalanced domain setting, where one image domain lacks richness and diversity.

Image-to-Image Translation Style Transfer +1

Video Corpus Moment Retrieval with Contrastive Learning

1 code implementation 13 May 2021 Hao Zhang, Aixin Sun, Wei Jing, Guoshun Nan, Liangli Zhen, Joey Tianyi Zhou, Rick Siow Mong Goh

We adopt the first approach and introduce two contrastive learning objectives to refine video encoder and text encoder to learn video and text representations separately but with better alignment for VCMR.

Contrastive Learning Moment Retrieval +2

WHAI: Weibull Hybrid Autoencoding Inference for Deep Topic Modeling

1 code implementation ICLR 2018 Hao Zhang, Bo Chen, Dandan Guo, Mingyuan Zhou

To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes.

UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification

1 code implementation WS 2018 Andreas Hanselowski, Hao Zhang, Zile Li, Daniil Sorokin, Benjamin Schiller, Claudia Schulz, Iryna Gurevych

The Fact Extraction and VERification (FEVER) shared task was launched to support the development of systems able to verify claims by extracting supporting or refuting facts from raw text.

Claim Verification Entity Linking +4

CLLMs: Consistency Large Language Models

1 code implementation 28 Feb 2024 Siqi Kou, Lanxiang Hu, Zhezhi He, Zhijie Deng, Hao Zhang

Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM inference, as they break the sequential nature of the LLM decoding process and transform it into parallelizable computation.

LayoutGMN: Neural Graph Matching for Structural Layout Similarity

1 code implementation CVPR 2021 Akshay Gadi Patil, Manyi Li, Matthew Fisher, Manolis Savva, Hao Zhang

In particular, retrieval results by our network better match human judgement of structural layout similarity compared to both IoUs and other baselines including a state-of-the-art method based on graph neural networks and image convolution.

Graph Matching Metric Learning +1

A Prototype-Oriented Framework for Unsupervised Domain Adaptation

1 code implementation NeurIPS 2021 Korawat Tanwisuth, Xinjie Fan, Huangjie Zheng, Shujian Zhang, Hao Zhang, Bo Chen, Mingyuan Zhou

Existing methods for unsupervised domain adaptation often rely on minimizing some statistical distance between the source and target samples in the latent space.

Unsupervised Domain Adaptation

Roof-GAN: Learning to Generate Roof Geometry and Relations for Residential Houses

1 code implementation CVPR 2021 Yiming Qian, Hao Zhang, Yasutaka Furukawa

This paper presents Roof-GAN, a novel generative adversarial network that generates structured geometry of residential roof structures as a set of roof primitives and their relationships.

Generative Adversarial Network

Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects

2 code implementations 9 Aug 2013 Binghang Liu, Yujian Shi, Jianying Yuan, Xuesong Hu, Hao Zhang, Nan Li, Zhenyu Li, Yanxiang Chen, Desheng Mu, Wei Fan

Therefore, it is necessary to develop efficient assembly-independent methods for accurate estimation of these genomic characteristics.

GDPNet: Refining Latent Multi-View Graph for Relation Extraction

1 code implementation 12 Dec 2020 Fuzhao Xue, Aixin Sun, Hao Zhang, Eng Siong Chng

Recent advances on RE task are from BERT-based sequence modeling and graph-based modeling of relationships among the tokens in the sequence.

Ranked #4 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)

Dialog Relation Extraction Dynamic Time Warping +2

Discovering and Explaining the Representation Bottleneck of DNNs

1 code implementation ICLR 2022 Huiqi Deng, Qihan Ren, Hao Zhang, Quanshi Zhang

This paper explores the bottleneck of feature representations of deep neural networks (DNNs), from the perspective of the complexity of interactions between input variables encoded in DNNs.

BIRNAT: Bidirectional Recurrent Neural Networks with Adversarial Training for Video Snapshot Compressive Imaging

1 code implementation ECCV 2020 Ziheng Cheng, Ruiying Lu, Zhengjue Wang, Hao Zhang, Bo Chen, Ziyi Meng, Xin Yuan

This measurement and the modulation masks are fed into our Recurrent Neural Network (RNN) to reconstruct the desired high-speed frames.

Shape-IoU: More Accurate Metric considering Bounding Box Shape and Scale

1 code implementation 29 Dec 2023 Hao Zhang, Shuaijie Zhang

As an important component of the detector localization branch, bounding box regression loss plays a significant role in object detection tasks.

object-detection Object Detection +1

Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning

1 code implementation 8 Feb 2022 Yang Zhao, Hao Zhang, Xiuyuan Hu

In this paper, we propose an effective method to improve the model generalization by additionally penalizing the gradient norm of loss function during optimization.

Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box

1 code implementation 6 Nov 2023 Hao Zhang, Cong Xu, Shuaijie Zhang

Based on the above, we first analyzed the BBR model and concluded that distinguishing different regression samples and using different scales of auxiliary bounding boxes to calculate losses can effectively accelerate the bounding box regression process.

Ranked #1 on Object Detection on AI-TOD (mAP50 metric)

Object Detection regression

An End-to-End Neural Network for Image Cropping by Learning Composition from Aesthetic Photos

2 code implementations 2 Jul 2019 Peng Lu, Hao Zhang, Xujun Peng, Xiaofu Jin

In this paper, we primarily focus on improving the accuracy of automatic image cropping, and on further exploring its potential in public datasets with high efficiency.

Image Cropping

AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness

1 code implementation 13 Oct 2022 Dacheng Li, Hongyi Wang, Eric Xing, Hao Zhang

Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks.

valid

CompoNet: Learning to Generate the Unseen by Part Synthesis and Composition

1 code implementation ICCV 2019 Nadav Schor, Oren Katzir, Hao Zhang, Daniel Cohen-Or

Data-driven generative modeling has made remarkable progress by leveraging the power of deep neural networks.

RPM-Net: Recurrent Prediction of Motion and Parts from Point Cloud

1 code implementation 26 Jun 2020 Zihao Yan, Ruizhen Hu, Xingguang Yan, Luanmin Chen, Oliver van Kaick, Hao Zhang, Hui Huang

We show results of simultaneous motion and part predictions from synthetic and real scans of 3D objects exhibiting a variety of part mobilities, possibly involving multiple movable parts.

Semantic Segmentation

Predictive and Generative Neural Networks for Object Functionality

1 code implementation 28 Jun 2020 Ruizhen Hu, Zihao Yan, Jingwen Zhang, Oliver van Kaick, Ariel Shamir, Hao Zhang, Hui Huang

Given a 3D object in isolation, our functional similarity network (fSIM-NET), a variation of the triplet network, is trained to predict the functionality of the object by inferring functionality-revealing interaction contexts.

Object

Memory-Efficient Network for Large-scale Video Compressive Sensing

2 code implementations CVPR 2021 Ziheng Cheng, Bo Chen, Guanliang Liu, Hao Zhang, Ruiying Lu, Zhengjue Wang, Xin Yuan

With the knowledge of masks, optimization algorithms or deep learning methods are employed to reconstruct the desired high-speed video frames from this snapshot measurement.

Compressive Sensing Demosaicking +1

Group Contextualization for Video Recognition

1 code implementation CVPR 2022 Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Xiangnan He

By utilizing calibrators to embed feature with four different kinds of contexts in parallel, the learnt representation is expected to be more resilient to diverse types of activities.

Action Recognition Egocentric Activity Recognition +1

MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

1 code implementation 7 Jun 2022 Zhifeng Ma, Hao Zhang, Jie Liu

Spatiotemporal predictive learning, which predicts future frames through historical prior knowledge with the aid of deep learning, is widely used in many fields.

Video Prediction

FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF

1 code implementation 5 Jan 2024 Hao Zhang, Yu-Wing Tai, Chi-Keung Tang

However, achieving simultaneously multi-view consistency and temporal coherence while editing video sequences remains a formidable challenge.

Video Editing

High-accuracy mass, spin, and recoil predictions of generic black-hole merger remnants

1 code implementation 24 Sep 2018 Vijay Varma, Davide Gerosa, François Hébert, Leo C. Stein, Hao Zhang

We present accurate fits for the remnant properties of generically precessing binary black holes, trained on large banks of numerical-relativity simulations.

General Relativity and Quantum Cosmology High Energy Astrophysical Phenomena

AutoLoss: Learning Discrete Schedules for Alternate Optimization

1 code implementation 4 Oct 2018 Haowen Xu, Hao Zhang, Zhiting Hu, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing

Many machine learning problems involve iteratively and alternately optimizing different task objectives with respect to different sets of parameters.

Image Generation Machine Translation +4

Symbolic Graph Reasoning Meets Convolutions

1 code implementation NeurIPS 2018 Xiaodan Liang, Zhiting Hu, Hao Zhang, Liang Lin, Eric P. Xing

To cooperate with local convolutions, each SGR is constituted by three modules: a) a primal local-to-semantic voting module where the features of all symbolic nodes are generated by voting from local representations; b) a graph reasoning module propagates information over knowledge graph to achieve global semantic coherency; c) a dual semantic-to-local mapping module learns new associations of the evolved symbolic nodes with local representations, and accordingly enhances local features.

Image Classification Semantic Segmentation

Physical Interaction: Reconstructing Hand-object Interactions with Physics

1 code implementation 22 Sep 2022 Haoyu Hu, Xinyu Yi, Hao Zhang, Jun-Hai Yong, Feng Xu

Single view-based reconstruction of hand-object interaction is challenging due to the severe observation missing caused by occlusions.

Object

MetaSCI: Scalable and Adaptive Reconstruction for Video Compressive Sensing

2 code implementations CVPR 2021 Zhengjue Wang, Hao Zhang, Ziheng Cheng, Bo Chen, Xin Yuan

To capture high-speed videos using a two-dimensional detector, video snapshot compressive imaging (SCI) is a promising system, where the video frames are coded by different masks and then compressed to a snapshot measurement.

Compressive Sensing Video Compressive Sensing

Can learning from natural image denoising be used for seismic data interpolation?

1 code implementation 27 Feb 2019 Hao Zhang, Xiuyan Yang, Jianwei Ma

We propose a convolutional neural network (CNN) denoising based method for seismic data interpolation.

De-aliasing Image Denoising

FLNeRF: 3D Facial Landmarks Estimation in Neural Radiance Fields

1 code implementation 21 Nov 2022 Hao Zhang, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

This paper presents the first significant work on directly predicting 3D face landmarks on neural radiance fields (NeRFs).

De novo Drug Design using Reinforcement Learning with Multiple GPT Agents

1 code implementation NeurIPS 2023 Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

A central challenge in this field is to generate molecules with specific properties while also producing a wide range of diverse candidates.

reinforcement-learning

MS-DETR: Natural Language Video Localization with Sampling Moment-Moment Interaction

1 code implementation 30 May 2023 Jing Wang, Aixin Sun, Hao Zhang, XiaoLi Li

Given a query, the task of Natural Language Video Localization (NLVL) is to localize a temporal moment in an untrimmed video that semantically matches the query.

ARO-Net: Learning Implicit Fields from Anchored Radial Observations

1 code implementation CVPR 2023 Yizhi Wang, Zeyu Huang, Ariel Shamir, Hui Huang, Hao Zhang, Ruizhen Hu

We introduce anchored radial observations (ARO), a novel shape encoding for learning implicit field representation of 3D shapes that is category-agnostic and generalizable amid significant shape variations.

Surface Reconstruction

Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities

1 code implementation 5 Nov 2020 Hao Zhang, Jae Ro, Richard Sproat

Breaking domain names such as openresearch into component words open and research is important for applications like Text-to-Speech synthesis and web search.

Chinese Word Segmentation Speech Synthesis +1

Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities

1 code implementation COLING 2020 Hao Zhang, Jae Ro, Richard Sproat

Breaking domain names such as openresearch into component words open and research is important for applications like Text-to-Speech synthesis and web search.

Chinese Word Segmentation Speech Synthesis +1

GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation

1 code implementation ECCV 2020 Wallace Lira, Johannes Merz, Daniel Ritchie, Daniel Cohen-Or, Hao Zhang

Instead of executing translation directly, we steer the translation by requiring the network to produce in-between images that resemble weighted hybrids between images from the input domains.

Translation Unsupervised Image-To-Image Translation

Adaptive Split-Fusion Transformer

1 code implementation 26 Apr 2022 Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang

Neural networks for visual content understanding have recently evolved from convolutional ones (CNNs) to transformers.

Image Classification

High-throughput, high-resolution registration-free generated adversarial network microscopy

1 code implementation 7 Jan 2018 Hao Zhang, Xinlin Xie, Chunyu Fang, Yicong Yang, Di Jin, Peng Fei

We combine generative adversarial network (GAN) with light microscopy to achieve deep learning super-resolution under a large field of view (FOV).

Generative Adversarial Network Image Registration +2

Focaler-IoU: More Focused Intersection over Union Loss

1 code implementation 19 Jan 2024 Hao Zhang, Shuaijie Zhang

Existing research improves regression performance by utilizing the geometric relationship between bounding boxes, while ignoring the impact of difficult and easy sample distribution on bounding box regression.

Object object-detection +2
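
As background for the entry above, the following sketch computes the plain intersection over union (IoU) of two axis-aligned boxes and the usual `1 - IoU` loss. It shows only the standard quantity; the Focaler variant is not specified in this listing and is not reproduced here. The `(x1, y1, x2, y2)` box layout and the function names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(box_a, box_b):
    """Standard IoU loss: 0 for identical boxes, 1 for disjoint boxes."""
    return 1.0 - iou(box_a, box_b)


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```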

Hybrid Neural Networks for On-device Directional Hearing

1 code implementation AAAI 2022 Anran Wang, Maruchi Kim, Hao Zhang, Shyamnath Gollakota

On-device directional hearing requires audio source separation from a given direction while achieving stringent human-imperceptible latency requirements.

Causal Inference Real-time Directional Hearing

Manifoldron: Direct Space Partition via Manifold Discovery

2 code implementations 14 Jan 2022 Dayang Wang, Feng-Lei Fan, Bo-Jian Hou, Hao Zhang, Zhen Jia, Boce Zhou, Rongjie Lai, Hengyong Yu, Fei Wang

A neural network with the widely-used ReLU activation has been shown to partition the sample space into many convex polytopes for prediction.

BIG-bench Machine Learning

Neural Eigenfunctions Are Structured Representation Learners

1 code implementation 23 Oct 2022 Zhijie Deng, Jiaxin Shi, Hao Zhang, Peng Cui, Cewu Lu, Jun Zhu

Unlike prior spectral methods such as Laplacian Eigenmap that operate in a nonparametric manner, Neural Eigenmap leverages NeuralEF to parametrically model eigenfunctions using a neural network.

Contrastive Learning Data Augmentation +7

BSD-GAN: Branched Generative Adversarial Network for Scale-Disentangled Representation Learning and Image Synthesis

2 code implementations 22 Mar 2018 Zili Yi, Zhiqin Chen, Hao Cai, Wendong Mao, Minglun Gong, Hao Zhang

The key feature of BSD-GAN is that it is trained in multiple branches, progressively covering both the breadth and depth of the network, as resolutions of the training images increase to reveal finer-scale features.

Generative Adversarial Network Image Generation +1

Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling

1 code implementation ICLR 2020 Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou

For bidirectional joint image-text modeling, we develop variational hetero-encoder (VHE) randomized generative adversarial network (GAN), a versatile deep generative model that integrates a probabilistic text decoder, probabilistic image encoder, and GAN into a coherent end-to-end multi-modality learning framework.

Generative Adversarial Network

Spin-Orbit Protection of Induced Superconductivity in Majorana Nanowires

1 code implementation 5 Jul 2018 Jouri D. S. Bommer, Hao Zhang, Önder Gül, Bas Nijholt, Michael Wimmer, Filipp N. Rybakov, Julien Garaud, Donjan Rodic, Egor Babaev, Matthias Troyer, Diana Car, Sébastien R. Plissard, Erik P. A. M. Bakkers, Kenji Watanabe, Takashi Taniguchi, Leo P. Kouwenhoven

Spin-orbit interaction (SOI) plays a key role in creating Majorana zero modes in semiconductor nanowires proximity coupled to a superconductor.

Mesoscale and Nanoscale Physics

Students Need More Attention: BERT-based Attention Model for Small Data with Application to Automatic Patient Message Triage

1 code implementation 22 Jun 2020 Shijing Si, Rui Wang, Jedrek Wosik, Hao Zhang, David Dov, Guoyin Wang, Ricardo Henao, Lawrence Carin

Small and imbalanced datasets commonly seen in healthcare represent a challenge when training classifiers based on deep learning models.

RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures

1 code implementation CVPR 2022 Chengjie Niu, Manyi Li, Kai Xu, Hao Zhang

Each level of the tree corresponds to an assembly of shape parts, represented as implicit functions, to reconstruct the input shape.

Language Model Decomposition: Quantifying the Dependency and Correlation of Language Models

1 code implementation 19 Oct 2022 Hao Zhang

A goodness-of-fit metric for LMD similar to the coefficient of determination is defined and used to measure the linear dependency of a set of LMs.

Language Modelling

DAE-Net: Deforming Auto-Encoder for fine-grained shape co-segmentation

1 code implementation 22 Nov 2023 Zhiqin Chen, Qimin Chen, Hang Zhou, Hao Zhang

We present an unsupervised 3D shape co-segmentation method which learns a set of deformable part templates from a shape collection.

Multi-Task Dense Prediction via Mixture of Low-Rank Experts

1 code implementation 26 Mar 2024 YuQi Yang, Peng-Tao Jiang, Qibin Hou, Hao Zhang, Jinwei Chen, Bo Li

Furthermore, to control the parameters and computational cost brought by the increase in the number of experts, we take inspiration from LoRA and propose to leverage the low-rank format of a vanilla convolution in the expert network.

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

1 code implementation 12 Apr 2024 Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy.

Downstream Transformer Generation of Question-Answer Pairs with Preprocessing and Postprocessing Pipelines

1 code implementation 15 May 2022 Cheng Zhang, Hao Zhang, Jie Wang

We present a system called TP3 to perform a downstream task of transformers on generating question-answer pairs (QAPs) from a given article.

Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion

1 code implementation 25 Mar 2024 Xunpeng Yi, Han Xu, Hao Zhang, Linfeng Tang, Jiayi Ma

Through the text semantic encoder and semantic interaction fusion decoder, Text-IF is accessible to the all-in-one infrared and visible image degradation-aware processing and the interactive flexible fusion outcomes.

Interventional Video Grounding with Dual Contrastive Learning

1 code implementation CVPR 2021 Guoshun Nan, Rui Qiao, Yao Xiao, Jun Liu, Sicong Leng, Hao Zhang, Wei Lu

2) Meanwhile, we introduce a dual contrastive learning approach (DCL) to better align the text and video by maximizing the mutual information (MI) between query and video clips, and the MI between start/end frames of a target moment and the others within a video to learn more informative visual representations.

Causal Inference Contrastive Learning +2

COSY: COunterfactual SYntax for Cross-Lingual Understanding

1 code implementation ACL 2021 Sicheng Yu, Hao Zhang, Yulei Niu, Qianru Sun, Jing Jiang

Pre-trained multilingual language models, e.g., multilingual-BERT, are widely used in cross-lingual tasks, yielding the state-of-the-art performance.

counterfactual Natural Language Inference +3

Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

1 code implementation 15 Jul 2022 Zhicai Wang, Yanbin Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He

They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers.

ShaDDR: Interactive Example-Based Geometry and Texture Generation via 3D Shape Detailization and Differentiable Rendering

1 code implementation 8 Jun 2023 Qimin Chen, Zhiqin Chen, Hang Zhou, Hao Zhang

Furthermore, we showcase the ability of our method to learn geometric details and textures from shapes reconstructed from real-world photos.

Texture Synthesis

Distantly-Supervised Long-Tailed Relation Extraction Using Constraint Graphs

1 code implementation 24 May 2021 Tianming Liang, Yang Liu, Xiaoyan Liu, Hao Zhang, Gaurav Sharma, Maozu Guo

On top of that, we further propose a novel constraint graph-based relation extraction framework (CGRE) to handle the two challenges simultaneously.

Denoising Relation +2

Heterogeneous Autoencoder Empowered by Quadratic Neurons

1 code implementation 2 Apr 2022 Jing-Xiao Liao, Bo-Jian Hou, Hang-Cheng Dong, Hao Zhang, Jianwei Ma, Jinwei Sun, Shiping Zhang, Feng-Lei Fan

Inspired by the complexity and diversity of biological neurons, a quadratic neuron is proposed to replace the inner product in the current neuron with a simplified quadratic function.

Anomaly Detection
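
To illustrate the idea in the entry above, the sketch below contrasts a conventional neuron's inner product with one possible "simplified quadratic function". The listing does not give the exact parameterization, so the form used here, a product of two affine terms plus a weighted sum of squared inputs, is an assumption, as are all function and parameter names. Activation functions are omitted; both functions return pre-activations.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def conventional_neuron(x, w, b):
    """Pre-activation of a standard neuron: w . x + b."""
    return dot(w, x) + b


def quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, c):
    """Assumed quadratic pre-activation (hypothetical parameterization):

    (w_r . x + b_r) * (w_g . x + b_g) + w_b . (x * x) + c
    """
    squared = [xi * xi for xi in x]
    return (dot(w_r, x) + b_r) * (dot(w_g, x) + b_g) + dot(w_b, squared) + c


x = [1.0, 2.0]
print(conventional_neuron(x, [1.0, 1.0], 0.0))
print(quadratic_neuron(x, [1.0, 0.0], 0.0, [0.0, 1.0], 0.0, [1.0, 1.0], 1.0))
```

Under this assumed form the parameter count grows only linearly with the input dimension (three weight vectors instead of one), rather than quadratically as a full quadratic form would.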

Incorporating Instructional Prompts into a Unified Generative Framework for Joint Multiple Intent Detection and Slot Filling

1 code implementation COLING 2022 Yangjun Wu, Han Wang, Dongxiang Zhang, Gang Chen, Hao Zhang

Specifically, we design 5-type templates as instructional prompts, and each template includes a question that acts as the driver to teach UGEN to grasp the paradigm, options that list the candidate intents or slots to reduce the answer search space, and a context that denotes the original utterance.

Intent Detection Question Answering +3

MeaCap: Memory-Augmented Zero-shot Image Captioning

1 code implementation 6 Mar 2024 Zequn Zeng, Yan Xie, Hao Zhang, Chiyu Chen, Zhengjue Wang, Bo Chen

The framework of MeaCap achieves the state-of-the-art performance on a series of zero-shot IC settings.

Caption Generation Image Captioning +4

FAME: 3D Shape Generation via Functionality-Aware Model Evolution

1 code implementation 9 May 2020 Yanran Guan, Han Liu, Kun Liu, Kangxue Yin, Ruizhen Hu, Oliver van Kaick, Yan Zhang, Ersin Yumer, Nathan Carr, Radomir Mech, Hao Zhang

Our tool supports constrained modeling, allowing users to restrict or steer the model evolution with functionality labels.

Graphics

EnsLM: Ensemble Language Model for Data Diversity by Semantic Clustering

1 code implementation ACL 2021 Zhibin Duan, Hao Zhang, Chaojie Wang, Zhengjue Wang, Bo Chen, Mingyuan Zhou

As a result, the backbone learns the shared knowledge among all clusters while modulated weights extract the cluster-specific features.

Clustering Language Modelling

SAC-GAN: Structure-Aware Image Composition

1 code implementation 13 Dec 2021 Hang Zhou, Rui Ma, Ling-Xiao Zhang, Lin Gao, Ali Mahdavi-Amiri, Hao Zhang

Specifically, our network takes the semantic layout features from the input scene image, features encoded from the edges and silhouette in the input object patch, as well as a latent code as inputs, and generates a 2D spatial affine transform defining the translation and scaling of the object patch.

Image Augmentation Object

A Variational Edge Partition Model for Supervised Graph Representation Learning

1 code implementation 7 Feb 2022 Yilin He, Chaojie Wang, Hao Zhang, Bo Chen, Mingyuan Zhou

This paper introduces a graph generative process to model how the observed edges are generated by aggregating the node interactions over a set of overlapping node communities, each of which contributes to the edges via a logical OR mechanism.

Classification Graph Representation Learning +1

Long-term Leap Attention, Short-term Periodic Shift for Video Classification

1 code implementation 12 Jul 2022 Hao Zhang, Lechao Cheng, Yanbin Hao, Chong-Wah Ngo

By replacing a vanilla 2D attention with the LAPS, we could adapt a static transformer into a video one, with zero extra parameters and negligible computation overhead ($\sim$2.6\%).

Video Classification

NoisywikiHow: A Benchmark for Learning with Real-world Noisy Labels in Natural Language Processing

1 code implementation 18 May 2023 Tingting Wu, Xiao Ding, Minji Tang, Hao Zhang, Bing Qin, Ting Liu

To mitigate the effects of label noise, learning with noisy labels (LNL) methods are designed to achieve better generalization performance.

Learning with noisy labels

TLM: Token-Level Masking for Transformers

1 code implementation 28 Oct 2023 Yangjun Wu, Kebin Fang, Dongxiang Zhang, Han Wang, Hao Zhang, Gang Chen

Structured dropout approaches, such as attention dropout and DropHead, have been investigated to regularize the multi-head attention mechanism in Transformers.

Data-to-Text Generation Grammatical Error Correction +1

Revisiting Single Image Reflection Removal In the Wild

1 code implementation 29 Nov 2023 Yurui Zhu, Xueyang Fu, Peng-Tao Jiang, Hao Zhang, Qibin Sun, Jinwei Chen, Zheng-Jun Zha, Bo Li

This research focuses on the issue of single-image reflection removal (SIRR) in real-world conditions, examining it from two angles: the collection pipeline of real reflection pairs and the perception of real reflection locations.

Reflection Removal

Wavelet Regularization Benefits Adversarial Training

1 code implementation 8 Jun 2022 Jun Yan, Huilin Yin, Xiaoyang Deng, Ziming Zhao, Wancheng Ge, Hao Zhang, Gerhard Rigoll

Since adversarial vulnerability can be regarded as a high-frequency phenomenon, it is essential to regulate the adversarially-trained neural network models in the frequency domain.

Adversarial Robustness

Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping

1 code implementation 24 Jun 2023 Daniel Zou, Xinchen Jin, Xueyang Yu, Hao Zhang, James Demmel

In anticipation of workloads that involve serving many of such large models to handle different tasks, we develop Computron, a system that uses memory swapping to serve multiple distributed models on a shared GPU cluster.

Parameter-Efficient Conversational Recommender System as a Language Processing Task

1 code implementation 25 Jan 2024 Mathieu Ravaut, Hao Zhang, Lu Xu, Aixin Sun, Yong Liu

Conversational recommender systems (CRS) aim to recommend relevant items to users by eliciting user preference through natural language conversation.

Dialogue Generation Knowledge Graphs +2

Interpretable Complex-Valued Neural Networks for Privacy Protection

1 code implementation ICLR 2020 Liyao Xiang, Haotian Ma, Hao Zhang, Yifan Zhang, Jie Ren, Quanshi Zhang

Previous studies have found that an adversary attacker can often infer unintended input information from intermediate-layer features.

Quantification and Analysis of Layer-wise and Pixel-wise Information Discarding

1 code implementation 10 Jun 2019 Haotian Ma, Hao Zhang, Fan Zhou, Yinqing Zhang, Quanshi Zhang

We define two types of entropy-based metrics, i.e., (1) the discarding of pixel-wise information used in the forward propagation, and (2) the uncertainty of the input reconstruction, to measure input information contained by a specific layer from two perspectives.

Fairness

Deep N-ary Error Correcting Output Codes

1 code implementation 22 Sep 2020 Hao Zhang, Joey Tianyi Zhou, Tianying Wang, Ivor W. Tsang, Rick Siow Mong Goh

To facilitate the training of N-ary ECOC with deep learning base learners, we further propose three different variants of parameter sharing architectures for deep N-ary ECOC.

Ensemble Learning General Classification +3

Unlocking the Potential of Large Language Models for Explainable Recommendations

1 code implementation 25 Dec 2023 Yucong Luo, Mingyue Cheng, Hao Zhang, Junyu Lu, Qi Liu, Enhong Chen

In this study, we propose LLMXRec, a simple yet effective two-stage explainable recommendation framework aimed at further boosting the explanation quality by employing LLMs.

Decision Making Explainable Recommendation +2

Contrastive Attraction and Contrastive Repulsion for Representation Learning

1 code implementation 8 May 2021 Huangjie Zheng, Xu Chen, Jiangchao Yao, Hongxia Yang, Chunyuan Li, Ya Zhang, Hao Zhang, Ivor Tsang, Jingren Zhou, Mingyuan Zhou

We realize this strategy with contrastive attraction and contrastive repulsion (CACR), which makes the query not only exert a greater force to attract more distant positive samples but also do so to repel closer negative samples.

Contrastive Learning Representation Learning

Combined Invariant Subspace & Frequency-Domain Subspace Method for Identification of Discrete-Time MIMO Linear Systems

1 code implementation 12 Dec 2023 Jingze You, Chao Huang, Hao Zhang

Recently, a novel system identification method based on invariant subspace theory is introduced, aiming to address the identification problem of continuous-time (CT) linear time-invariant (LTI) systems by combining time-domain and frequency-domain methods.

Empirical Evidence for the Fragment level Understanding on Drug Molecular Structure of LLMs

1 code implementation 15 Jan 2024 Xiuyuan Hu, Guoqing Liu, Yang Zhao, Hao Zhang

AI for drug discovery has been a research hotspot in recent years, and SMILES-based language models have been increasingly applied in drug molecular design.

Drug Discovery

Alternating Synthetic and Real Gradients for Neural Language Modeling

1 code implementation 27 Feb 2019 Fangxin Shang, Hao Zhang

Empirically, we demonstrate the effectiveness of alternating training with synthetic and real gradients after periodic warm restarts on language modeling tasks.

Language Modelling

Sentence Bag Graph Formulation for Biomedical Distant Supervision Relation Extraction

1 code implementation 29 Oct 2023 Hao Zhang, Yang Liu, Xiaoyan Liu, Tianming Liang, Gaurav Sharma, Liang Xue, Maozu Guo

We introduce a novel graph-based framework for alleviating key challenges in distantly-supervised relation extraction and demonstrate its effectiveness in the challenging and important domain of biomedical data.

Relation Relation Extraction +1

P2P-NET: Bidirectional Point Displacement Net for Shape Transform

no code implementations 25 Mar 2018 Kangxue Yin, Hui Huang, Daniel Cohen-Or, Hao Zhang

We introduce P2P-NET, a general-purpose deep neural network which learns geometric transformations between point-based shape representations from two domains, e.g., meso-skeletons and surfaces, partial and complete scans, etc.

Semi-Supervised Co-Analysis of 3D Shape Styles from Projected Lines

no code implementations 18 Apr 2018 Fenggen Yu, Yan Zhang, Kai Xu, Ali Mahdavi-Amiri, Hao Zhang

We present a semi-supervised co-analysis method for learning 3D shape styles from projected feature lines, achieving style patch localization with only weak supervision.

Clustering

On the Selection of Anchors and Targets for Video Hyperlinking

no code implementations 14 Apr 2018 Zhi-Qi Cheng, Hao Zhang, Xiao Wu, Chong-Wah Ngo

A principled way of hyperlinking can be carried out by picking centers of clusters as anchors and from there reaching out to targets within or outside of clusters with consideration of neighborhood complexity.

Cavs: A Vertex-centric Programming Interface for Dynamic Neural Networks

no code implementations 11 Dec 2017 Hao Zhang, Shizhen Xu, Graham Neubig, Wei Dai, Qirong Ho, Guangwen Yang, Eric P. Xing

Recent deep learning (DL) models have moved beyond static network architectures to dynamic ones, handling data where the network structure changes every example, such as sequences of variable lengths, trees, and graphs.

graph construction Management +1

Efficient and Effective Single-Document Summarizations and A Word-Embedding Measurement of Quality

no code implementations 1 Oct 2017 Liqun Shao, Hao Zhang, Ming Jia, Jie Wang

We show that the orderings of the ROUGE and WESM scores of our algorithms are highly comparable, suggesting that WESM may serve as a viable alternative for measuring the quality of a summary.

Clustering Keyword Extraction

Mining Deep And-Or Object Structures via Cost-Sensitive Question-Answer-Based Active Annotations

no code implementations 13 Aug 2017 Quanshi Zhang, Ying Nian Wu, Hao Zhang, Song-Chun Zhu

The loss is defined for nodes in all layers of the AOG, including the generative loss (measuring the likelihood of the images) and the discriminative loss (measuring the fitness to human answers).

Question Answering

Generative Semantic Manipulation with Contrasting GAN

no code implementations 1 Aug 2017 Xiaodan Liang, Hao Zhang, Eric P. Xing

Generative Adversarial Networks (GANs) have recently achieved significant improvement on paired/unpaired image-to-image translation, such as photo$\rightarrow$sketch and artist painting style transfer.

Image-to-Image Translation Style Transfer

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters

no code implementations 11 Jun 2017 Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing

We show that Poseidon enables Caffe and TensorFlow to achieve 15.5x speed-up on 16 single-GPU machines, even with limited bandwidth (10GbE) and the challenging VGG19-22K network for image classification.

Image Classification

GRASS: Generative Recursive Autoencoders for Shape Structures

no code implementations 5 May 2017 Jun Li, Kai Xu, Siddhartha Chaudhuri, Ersin Yumer, Hao Zhang, Leonidas Guibas

We introduce a novel neural network architecture for encoding and synthesis of 3D shapes, particularly their structures.

SCAN: Structure Correcting Adversarial Network for Organ Segmentation in Chest X-rays

no code implementations 26 Mar 2017 Wei Dai, Joseph Doyle, Xiaodan Liang, Hao Zhang, Nanqing Dong, Yuan Li, Eric P. Xing

Through this adversarial process the critic network learns the higher order structures and guides the segmentation model to achieve realistic segmentation outcomes.

Organ Segmentation Segmentation

Recurrent Topic-Transition GAN for Visual Paragraph Generation

no code implementations ICCV 2017 Xiaodan Liang, Zhiting Hu, Hao Zhang, Chuang Gan, Eric P. Xing

The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators.

Generative Adversarial Network Image Paragraph Captioning +1

ZM-Net: Real-time Zero-shot Image Manipulation Network

no code implementations 21 Mar 2017 Hao Wang, Xiaodan Liang, Hao Zhang, Dit-yan Yeung, Eric P. Xing

We cast this problem as manipulating an input image according to a parametric model whose key parameters can be conditionally generated from any guiding signal (even unseen ones).

Colorization Descriptive +2

Sequence-based Multimodal Apprenticeship Learning For Robot Perception and Decision Making

no code implementations 24 Feb 2017 Fei Han, Xue Yang, Yu Zhang, Hao Zhang

Apprenticeship learning has recently attracted wide attention due to its capability of allowing robots to learn physical tasks directly from demonstrations provided by human experts.

Decision Making

Simultaneous Feature and Body-Part Learning for Real-Time Robot Awareness of Human Behaviors

no code implementations 24 Feb 2017 Fei Han, Xue Yang, Christopher Reardon, Yu Zhang, Hao Zhang

We formulate FABL as a regression-like optimization problem with structured sparsity-inducing norms to model interrelationships of body parts and features.

Space-Time Representation of People Based on 3D Skeletal Data: A Review

1 code implementation 5 Jan 2016 Fei Han, Brian Reily, William Hoff, Hao Zhang

Spatiotemporal human representation based on 3D visual perception data is a rapidly growing research area.

Feature Engineering

Self-Reflective Risk-Aware Artificial Cognitive Modeling for Robot Response to Human Behaviors

no code implementations 16 May 2016 Fei Han, Christopher Reardon, Lynne E. Parker, Hao Zhang

In order for cooperative robots ("co-robots") to respond to human behaviors accurately and efficiently in human-robot collaboration, interpretation of human actions, awareness of new situations, and appropriate decision making are all crucial abilities for co-robots.

Decision Making

Enforcing Template Representability and Temporal Consistency for Adaptive Sparse Tracking

no code implementations 30 Apr 2016 Xue Yang, Fei Han, Hua Wang, Hao Zhang

Sparse representation has been widely studied in visual tracking, which has shown promising tracking performance.

Descriptive Visual Tracking

On the Reducibility of Submodular Functions

no code implementations 4 Jan 2016 Jincheng Mei, Hao Zhang, Bao-liang Lu

The scalability of submodular optimization methods is critical for their usability in practice.

Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines

no code implementations 19 Dec 2015 Hao Zhang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Gunhee Kim, Qirong Ho, Eric Xing

To investigate how to adapt existing frameworks to efficiently support distributed GPUs, we propose Poseidon, a scalable system architecture for distributed inter-machine communication in existing DL frameworks.

Object Recognition

Online Markov decision processes with policy iteration

no code implementations 15 Oct 2015 Yao Ma, Hao Zhang, Masashi Sugiyama

The online Markov decision process (MDP) is a generalization of the classical Markov decision process that incorporates changing reward functions.

Task Selection for Bandit-Based Task Assignment in Heterogeneous Crowdsourcing

no code implementations 26 Jul 2015 Hao Zhang, Masashi Sugiyama

Task selection (picking an appropriate labeling task) and worker selection (assigning the labeling task to a suitable worker) are two major challenges in task assignment for crowdsourcing.

Active Learning

Bandit-Based Task Assignment for Heterogeneous Crowdsourcing

no code implementations 21 Jul 2015 Hao Zhang, Yao Ma, Masashi Sugiyama

We consider a task assignment problem in crowdsourcing, which is aimed at collecting as many reliable labels as possible within a limited budget.

Statistical models and regularization strategies in statistical image reconstruction of low-dose X-ray CT: a survey

no code implementations 4 Dec 2014 Hao Zhang, Jing Wang, Jianhua Ma, Hongbing Lu, Zhengrong Liang

Statistical image reconstruction (SIR) methods have shown potential to substantially improve the image quality of low-dose X-ray computed tomography (CT) as compared to the conventional filtered back-projection (FBP) method for various clinical tasks.

Computed Tomography (CT) Image Reconstruction

Spatial-Spectral Boosting Analysis for Stroke Patients' Motor Imagery EEG in Rehabilitation Training

no code implementations 23 Oct 2013 Hao Zhang, Liqing Zhang

Current studies of motor-imagery-based rehabilitation training systems for stroke subjects lack an appropriate analytic method that achieves considerable classification accuracy while also detecting gradual changes of imagery patterns during the rehabilitation process and uncovering potential mechanisms of motor function recovery.

EEG Motor Imagery

Dual-label Deep LSTM Dereverberation For Speaker Verification

no code implementations 8 Sep 2018 Hao Zhang, Stephen Zahorian, Xiao Chen, Peter Guzewich, Xiaoyu Liu

In this paper, we present a reverberation removal approach for speaker verification, utilizing dual-label deep neural networks (DNNs).

Speaker Verification

Semantic WordRank: Generating Finer Single-Document Summarizations

no code implementations 12 Sep 2018 Hao Zhang, Jie Wang

We present Semantic WordRank (SWR), an unsupervised method for generating an extractive summary of a single document.

Clustering

SCORES: Shape Composition with Recursive Substructure Priors

no code implementations 14 Sep 2018 Chenyang Zhu, Kai Xu, Siddhartha Chaudhuri, Renjiao Yi, Hao Zhang

The network may significantly alter the geometry and structure of the input parts and synthesize a novel shape structure based on the inputs, while adding or removing parts to minimize a structure plausibility loss.

Toward Understanding the Impact of Staleness in Distributed Machine Learning

no code implementations ICLR 2019 Wei Dai, Yi Zhou, Nanqing Dong, Hao Zhang, Eric P. Xing

Many distributed machine learning (ML) systems adopt the non-synchronous execution in order to alleviate the network communication bottleneck, resulting in stale parameters that do not reflect the latest updates.

BIG-bench Machine Learning

Event Representation through Semantic Roles: Evaluation of Coverage

no code implementations 9 Oct 2018 Aliaksandr Huminski, Hao Zhang

Semantic role theory is a widely used approach for event representation.

Towards Verifying Semantic Roles Co-occurrence

no code implementations 9 Oct 2018 Aliaksandr Huminski, Hao Zhang, Gangeshwar Krishnamurthy

Semantic role theory considers roles as a small universal set of unanalyzed entities.

Hartley Spectral Pooling for Deep Learning

no code implementations 7 Oct 2018 Hao Zhang, Jianwei Ma

In most convolution neural networks (CNNs), downsampling hidden layers is adopted for increasing computation efficiency and the receptive field size.

Dimensionality Reduction

Deep Poisson gamma dynamical systems

no code implementations NeurIPS 2018 Dandan Guo, Bo Chen, Hao Zhang, Mingyuan Zhou

We develop deep Poisson-gamma dynamical systems (DPGDS) to model sequentially observed multivariate count data, improving previously proposed models by not only mining deep hierarchical latent structure from the data, but also capturing both first-order and long-range temporal dependencies.

Data Augmentation Time Series +1

Nearly-tight bounds on linear regions of piecewise linear neural networks

no code implementations 31 Oct 2018 Qiang Hu, Hao Zhang

The developments of deep neural networks (DNN) in recent years have ushered in a brand new era of artificial intelligence.

Fast and Accurate Reordering with ITG Transition RNN

no code implementations COLING 2018 Hao Zhang, Axel Ng, Richard Sproat

Compared to a strong baseline of attention-based RNN, our ITG RNN re-ordering model can reach the same reordering accuracy with only 1/10 of the training data and is 2.5x faster in decoding.

Feature Engineering Machine Translation +3

Learning Multi-Instance Enriched Image Representations via Non-Greedy Ratio Maximization of the l1-Norm Distances

no code implementations CVPR 2018 Kai Liu, Hua Wang, Feiping Nie, Hao Zhang

To tackle these two challenges, in this paper we propose a novel image representation learning method that can integrate the local patches (the instances) of an input image (the bag) and its holistic representation into one single-vector representation.

Representation Learning

Generative Semantic Manipulation with Mask-Contrasting GAN

no code implementations ECCV 2018 Xiaodan Liang, Hao Zhang, Liang Lin, Eric Xing

Despite the promising results on paired/unpaired image-to-image translation achieved by Generative Adversarial Networks (GANs), prior works often only transfer the low-level information (e.g., color or texture changes), but fail to manipulate high-level semantic meanings (e.g., geometric structure or content) of different object regions.

Image-to-Image Translation

VHEGAN: Variational Hetero-Encoder Randomized GAN for Zero-Shot Learning

no code implementations ICLR 2019 Hao Zhang, Bo Chen, Long Tian, Zhengjue Wang, Mingyuan Zhou

To extract and relate visual and linguistic concepts from images and textual descriptions for text-based zero-shot learning (ZSL), we develop variational hetero-encoder (VHE) that decodes text via a deep probabilistic topic model, the variational posterior of whose local latent variables is encoded from an image via a Weibull distribution based inference network.

Image Generation Retrieval +3

Sparse Dictionary Learning for Edit Propagation of High-Resolution Images

no code implementations CVPR 2014 Xiaowu Chen, Dongqing Zou, Jianwei Li, Xiaochun Cao, Qinping Zhao, Hao Zhang

Previous approaches for edit propagation typically employ a global optimization over the whole set of image pixels, incurring a prohibitively high memory and time consumption for high-resolution images.

Dictionary Learning Vocal Bursts Intensity Prediction

AdaCoSeg: Adaptive Shape Co-Segmentation with Group Consistency Loss

no code implementations CVPR 2020 Chenyang Zhu, Kai Xu, Siddhartha Chaudhuri, Li Yi, Leonidas Guibas, Hao Zhang

While the part prior network can be trained with noisy and inconsistently segmented shapes, the final output of AdaCoSeg is a consistent part labeling for the input set, with each shape segmented into up to (a user-specified) K parts.

Instance Segmentation Segmentation +1

LOGAN: Unpaired Shape Transform in Latent Overcomplete Space

no code implementations 25 Mar 2019 Kangxue Yin, Zhiqin Chen, Hui Huang, Daniel Cohen-Or, Hao Zhang

Our network consists of an autoencoder to encode shapes from the two input domains into a common latent space, where the latent codes concatenate multi-scale shape features, resulting in an overcomplete representation.

Generative Adversarial Network Translation

DenseAttentionSeg: Segment Hands from Interacted Objects Using Depth Input

no code implementations 29 Mar 2019 Zihao Bo, Hao Zhang, Junhai Yong, Feng Xu

We propose a real-time DNN-based technique to segment hand and object of interacting motions from depth inputs.

Hand Segmentation Object +1

Multisensory Omni-directional Long-term Place Recognition: Benchmark Dataset and Analysis

no code implementations 18 Apr 2017 Ashwin Mathur, Fei Han, Hao Zhang

We introduce a new dataset, Multisensory Omnidirectional Long-term Place recognition (MOLP), comprising omnidirectional intensity and disparity images.

Robotics

Constrained low-tubal-rank tensor recovery for hyperspectral images mixed noise removal by bilateral random projections

no code implementations 15 May 2019 Hao Zhang, Xi-Le Zhao, Tai-Xiang Jiang, Michael Kwok-Po Ng

In this paper, we propose a novel low-tubal-rank tensor recovery model, which directly constrains the tubal rank prior for effectively removing the mixed Gaussian and sparse noise in hyperspectral images.

Hyperspectral Image Denoising Image Denoising

A Hybrid Precipitation Prediction Method based on Multicellular Gene Expression Programming

no code implementations 1 Apr 2019 Hongya Li, Yuzhong Peng, Chuyan Deng, Yonghua Pan, Daoqing Gong, Hao Zhang

Prompt and accurate precipitation forecast is very important for development management of regional water resource, flood disaster prevention and people's daily activity and production plan; however, non-linear and nonstationary characteristics of precipitation data and noise seriously affect forecast accuracy.

Denoising Management

A Seft-adaptive Multicellular GEP Algorithm Based On Fuzzy Control For Function Optimization

no code implementations 1 Apr 2019 Chuyan Deng, Yuzhong Peng, Hongya Li, Daoqing Gong, Hao Zhang, Zhiping Liu

According to the concentration and dispersion of individual fitness values in population, the crossover rate, mutation rate and real number set mutation rate of genetic operation are dynamically adjusted.

GRAINS: Generative Recursive Autoencoders for INdoor Scenes

no code implementations 24 Jul 2018 Manyi Li, Akshay Gadi Patil, Kai Xu, Siddhartha Chaudhuri, Owais Khan, Ariel Shamir, Changhe Tu, Baoquan Chen, Daniel Cohen-Or, Hao Zhang

We present a generative neural network which enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently.

Graphics

Improving Performance of End-to-End ASR on Numeric Sequences

no code implementations 1 Jul 2019 Cal Peyser, Hao Zhang, Tara N. Sainath, Zelin Wu

This out-of-vocabulary (OOV) issue is addressed in conventional ASR systems by training part of the model on spoken domain utterances (e.g.

speech-recognition Speech Recognition
