Search Results for author: BoWen Zhang

Found 70 papers, 32 papers with code

Few Shot Learning with Simplex

no code implementations 27 Jul 2018 Bowen Zhang, Xifan Zhang, Fan Cheng, Deli Zhao

During testing, combined with the test sample and the points in the class, a new simplex is formed.

Few-Shot Learning

Cross-Modal and Hierarchical Modeling of Video and Text

1 code implementation ECCV 2018 Bowen Zhang, Hexiang Hu, Fei Sha

Similarly, a paragraph may contain sentences with different topics, which collectively convey a coherent message or story.

Action Recognition Retrieval +3

Attacking CNN-based anti-spoofing face authentication in the physical domain

no code implementations 1 Oct 2019 Bowen Zhang, Benedetta Tondi, Mauro Barni

In this paper, we study the vulnerability of anti-spoofing methods based on deep learning against adversarial perturbations.

Cryptography and Security

Visual Storytelling via Predicting Anchor Word Embeddings in the Stories

no code implementations 13 Jan 2020 Bowen Zhang, Hexiang Hu, Fei Sha

To narrate a sequence of images, we use the predicted anchor word embeddings and the image features as the joint input to a seq2seq model.

Visual Storytelling Word Embeddings

Challenging the adversarial robustness of DNNs based on error-correcting output codes

no code implementations 26 Mar 2020 Bowen Zhang, Benedetta Tondi, Xixiang Lv, Mauro Barni

The existence of adversarial examples and the ease with which they can be generated raise several security concerns with regard to deep learning systems, pushing researchers to develop suitable defense mechanisms.

Adversarial Attack Adversarial Robustness +2

Enhancing Cross-target Stance Detection with Transferable Semantic-Emotion Knowledge

no code implementations ACL 2020 Bowen Zhang, Min Yang, Xutao Li, Yunming Ye, Xiaofei Xu, Kuai Dai

Specifically, a semantic-emotion heterogeneous graph is constructed from external semantic and emotion lexicons, which is then fed into a graph convolutional network to learn multi-hop semantic connections between words and emotion tags.

Stance Detection Transfer Learning

Online Action Detection in Streaming Videos with Time Buffers

no code implementations 6 Oct 2020 BoWen Zhang, Hao Chen, Meng Wang, Yuanjun Xiong

We formulate the problem of online temporal action detection in live streaming videos, acknowledging one important property of live streaming videos that there is normally a broadcast delay between the latest captured frame and the actual frame viewed by the audience.

Online Action Detection

Learning to Represent Image and Text with Denotation Graph

no code implementations EMNLP 2020 BoWen Zhang, Hexiang Hu, Vihan Jain, Eugene Ie, Fei Sha

Recent progress has leveraged the ideas of pre-training (from language modeling) and attention layers in Transformers to learn representations from datasets containing images aligned with linguistic expressions that describe the images.

Attribute Image Retrieval +4

Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth

no code implementations 29 Oct 2020 Wei Chen, BoWen Zhang, Shi Jin, Bo Ai, Zhangdui Zhong

Sparse signal recovery problems from noisy linear measurements appear in many areas of wireless communications.

A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus

no code implementations 18 Nov 2020 BoWen Zhang, Hexiang Hu, Joonseok Lee, Ming Zhao, Sheide Chammas, Vihan Jain, Eugene Ie, Fei Sha

Identifying a short segment in a long video that semantically matches a text query is a challenging task that has important application potentials in language-based video search, browsing, and navigation.

Language Modelling Masked Language Modeling +3

Instance and Panoptic Segmentation Using Conditional Convolutions

no code implementations 5 Feb 2021 Zhi Tian, BoWen Zhang, Hao Chen, Chunhua Shen

In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance.

Instance Segmentation Panoptic Segmentation +1

CREATe: Clinical Report Extraction and Annotation Technology

no code implementations 28 Feb 2021 Yichao Zhou, Wei-Ting Chen, BoWen Zhang, David Lee, J. Harry Caufield, Kai-Wei Chang, Yizhou Sun, Peipei Ping, Wei Wang

Clinical case reports are written descriptions of the unique aspects of a particular clinical case, playing an essential role in sharing clinical experiences about atypical disease phenotypes and new therapies.

Enhanced Hyperspectral Image Super-Resolution via RGB Fusion and TV-TV Minimization

1 code implementation 13 Jun 2021 Marija Vella, BoWen Zhang, Wei Chen, João F. C. Mota

Such methods, however, cannot guarantee that the input measurements are satisfied in the recovered image, since the learned parameters by the network are applied to every test image.

Astronomy Hyperspectral Image Super-Resolution +1

Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation

1 code implementation NeurIPS 2021 BoWen Zhang, Yifan Liu, Zhi Tian, Chunhua Shen

This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.

Segmentation Semantic Segmentation +1

Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?

2 code implementations EMNLP 2021 Linlu Qiu, Hexiang Hu, BoWen Zhang, Peter Shaw, Fei Sha

We analyze the grounded SCAN (gSCAN) benchmark, which was recently proposed to study systematic generalization for grounded language understanding.

Systematic Generalization

Visually Grounded Concept Composition

no code implementations Findings (EMNLP) 2021 BoWen Zhang, Hexiang Hu, Linlu Qiu, Peter Shaw, Fei Sha

We investigate ways to compose complex concepts in texts from primitive ones while grounding them in images.

Sentence

FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling

2 code implementations NeurIPS 2021 BoWen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki

However, like other modern SSL algorithms, FixMatch uses a pre-defined constant threshold for all classes to select unlabeled data that contribute to the training, thus failing to consider different learning status and learning difficulties of different classes.

Semi-Supervised Image Classification

Margin Calibration for Long-Tailed Visual Recognition

1 code implementation 14 Dec 2021 Yidong Wang, BoWen Zhang, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki

The long-tailed class distribution in visual recognition tasks poses great challenges for neural networks on how to handle the biased predictions between head and tail classes, i.e., the model tends to classify tail classes as head classes.

StyleSwin: Transformer-based GAN for High-resolution Image Generation

1 code implementation CVPR 2022 BoWen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang, Baining Guo

To this end, we believe that local attention is crucial to strike the balance between computational efficiency and modeling capacity.

Ranked #1 on Image Generation on CelebA 256x256 (FID metric)

Blocking Computational Efficiency +3

Deep Learning-Based Perceptual Stimulus Encoder for Bionic Vision

no code implementations 10 Mar 2022 Lucas Relic, BoWen Zhang, Yi-Lin Tuan, Michael Beyeler

Retinal implants have the potential to treat incurable blindness, yet the quality of the artificial vision they produce is still rudimentary.

CSI-fingerprinting Indoor Localization via Attention-Augmented Residual Convolutional Neural Network

no code implementations 11 May 2022 BoWen Zhang, Houssem Sifaou, Geoffrey Ye Li

On the other hand, considering the generality of a tracking system, we decouple the tracking system from the CSI environments so that one tracking system for all environments becomes possible.

Denoising Indoor Localization

Context-Driven Detection of Invertebrate Species in Deep-Sea Video

no code implementations 1 Jun 2022 R. Austin McEver, BoWen Zhang, Connor Levenson, A S M Iftekhar, B. S. Manjunath

Each video includes annotations indicating the start and end times of substrates across the video in addition to counts of species of interest.

Object Detection

Shape Completion with Points in the Shadow

1 code implementation 17 Sep 2022 BoWen Zhang, Xi Zhao, He Wang, Ruizhen Hu

The core challenge is to generate plausible geometries to fill the unobserved part of the object based on a partial scan, which is under-constrained and suffers from a huge solution space.

Object Point Cloud Completion

Safety-Constrained Policy Transfer with Successor Features

no code implementations 10 Nov 2022 Zeyu Feng, BoWen Zhang, Jianxin Bi, Harold Soh

In this work, we focus on the problem of safe policy transfer in reinforcement learning: we seek to leverage existing policies when learning a new task with specified constraints.

Context-Matched Collage Generation for Underwater Invertebrate Detection

no code implementations 15 Nov 2022 R. Austin McEver, BoWen Zhang, B. S. Manjunath

However, in many scenarios, it can be difficult to collect images for training, not to mention the costs associated with collecting annotations suitable for training these object detectors.

Object object-detection +1

Semantic Sensing and Communications for Ultimate Extended Reality

no code implementations 16 Dec 2022 BoWen Zhang, Zhijin Qin, Yiyu Guo, Geoffrey Ye Li

In particular, semantic sensing is used to improve the sensing efficiency by exploring the spatial-temporal distributions of semantic information.

How would Stance Detection Techniques Evolve after the Launch of ChatGPT?

1 code implementation 30 Dec 2022 BoWen Zhang, Daijun Ding, Liwen Jing

ChatGPT has the potential to be the best AI model for stance detection tasks in NLP, or at least change the research paradigm of this field.

Language Modelling Stance Detection +2

A separation logic for sequences in pointer programs and its decidability

no code implementations 16 Jan 2023 Tianyue Cao, BoWen Zhang, Zhao Jin, Yongzhi Cao, Hanpin Wang

To deal with properties on variable-length sequences and multilevel data structures, we propose sequence-heap separation logic which integrates sequences into logical reasoning on heap-manipulated programs.

Logical Reasoning

Geometric Deep Learning for Autonomous Driving: Unlocking the Power of Graph Neural Networks With CommonRoad-Geometric

no code implementations 2 Feb 2023 Eivind Meyer, Maurice Brenner, BoWen Zhang, Max Schickert, Bilal Musani, Matthias Althoff

Heterogeneous graphs offer powerful data representations for traffic, given their ability to model the complex interaction effects among a varying number of traffic participants and the underlying road infrastructure.

Autonomous Driving Trajectory Prediction

Semantic Communications with Variable-Length Coding for Extended Reality

no code implementations 17 Feb 2023 BoWen Zhang, Zhijin Qin, Geoffrey Ye Li

Wireless extended reality (XR) has attracted wide attention as a promising technology to improve users' mobility and quality of experience.

Large Language Models as Zero-Shot Human Models for Human-Robot Interaction

1 code implementation 6 Mar 2023 BoWen Zhang, Harold Soh

In this work, we explore the potential of large-language models (LLMs) -- which have consumed vast amounts of human-generated text data -- to act as zero-shot human models for HRI.

Investigating Chain-of-thought with ChatGPT for Stance Detection on Social Media

no code implementations 6 Apr 2023 BoWen Zhang, Xianghua Fu, Daijun Ding, Hu Huang, Yangyang Li, Liwen Jing

Stance detection predicts attitudes towards targets in texts and has gained attention with the rise of social media.

Stance Detection

Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

1 code implementation 8 May 2023 Liangliang Cao, BoWen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng

In this paper, we discuss two effective approaches to improve the efficiency and robustness of CLIP training: (1) augmenting the training dataset while maintaining the same number of optimization steps, and (2) filtering out samples that contain text regions in the image.

Adversarial Text Retrieval

SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers

1 code implementation 9 Jun 2023 BoWen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu

This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces SegViTv2.

Continual Learning Continual Semantic Segmentation +2

BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation

1 code implementation 13 Jun 2023 Liyang Liu, Zihan Wang, Minh Hieu Phan, BoWen Zhang, Jinchao Ge, Yifan Liu

Current knowledge distillation approaches in semantic segmentation tend to adopt a holistic approach that treats all spatial locations equally.

Knowledge Distillation Segmentation +1

MOFI: Learning Image Representations from Noisy Entity Annotated Images

1 code implementation 13 Jun 2023 Wentao Wu, Aleksei Timofeev, Chen Chen, BoWen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang

Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model to select the correct entities as labels of the paired image.

Image Classification Image Retrieval +3

Semantic-Aware Image Compressed Sensing

no code implementations 6 Jul 2023 BoWen Zhang, Zhijin Qin, Geoffrey Ye Li

According to the base CS results, the encoder then employs a policy network to analyze the semantic information in images and determines the measurement matrix for different image areas.

Image Compressed Sensing

Investigating the Learning Behaviour of In-context Learning: A Comparison with Supervised Learning

1 code implementation 28 Jul 2023 Xindi Wang, YuFei Wang, Can Xu, Xiubo Geng, BoWen Zhang, Chongyang Tao, Frank Rudzicz, Robert E. Mercer, Daxin Jiang

Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL), where learning a new task from just a few training examples is done without being explicitly pre-trained.

In-Context Learning

MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning

no code implementations 31 Jul 2023 Baoquan Zhang, Chuyao Luo, Demin Yu, Huiwei Lin, Xutao Li, Yunming Ye, BoWen Zhang

Its key idea is learning a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverages it to optimize a task-specific model by using only a few labeled data.

Denoising Few-Shot Learning

Dynamic Token Pruning in Plain Vision Transformers for Semantic Segmentation

1 code implementation ICCV 2023 Quan Tang, BoWen Zhang, Jiajun Liu, Fagui Liu, Yifan Liu

Experiments suggest that the proposed DToP architecture reduces on average 20% to 35% of computational cost for current semantic segmentation methods based on plain vision transformers without accuracy degradation.

Image Classification Segmentation +1

Category Feature Transformer for Semantic Segmentation

1 code implementation 10 Aug 2023 Quan Tang, Chuanjian Liu, Fagui Liu, Yifan Liu, Jun Jiang, BoWen Zhang, Kai Han, Yunhe Wang

Aggregation of multi-stage features has been revealed to play a significant role in semantic segmentation.

Segmentation Semantic Segmentation

Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

1 code implementation 16 Aug 2023 Qi Chen, Chaorui Deng, Zixiong Huang, BoWen Zhang, Mingkui Tan, Qi Wu

In this paper, we propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images using a pre-trained likelihood-based text-to-image generative model, i.e., a higher likelihood indicates better perceptual quality and better text-image alignment.

Text-to-Image Generation

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

no code implementations 8 Sep 2023 Erik Daxberger, Floris Weers, BoWen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du

We empirically show that our sparse Mobile Vision MoEs (V-MoEs) can achieve a better trade-off between performance and efficiency than the corresponding dense ViTs.

CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought

1 code implementation 20 Sep 2023 BoWen Zhang, Kehua Chang, Chunping Li

Unsupervised sentence representation learning endeavors to transform input sentences into fixed-length vectors enriched with intricate semantic information while obviating the reliance on labeled data.

Contrastive Learning Denoising +3

Compressing LLMs: The Truth is Rarely Pure and Never Simple

1 code implementation 2 Oct 2023 Ajay Jaiswal, Zhe Gan, Xianzhi Du, BoWen Zhang, Zhangyang Wang, Yinfei Yang

Recently, several works have shown significant success in training-free and data-free compression (pruning and quantization) of LLMs that achieve 50-60% sparsity and reduce the bit width to 3 or 4 bits per weight, with negligible degradation of perplexity over the uncompressed baseline.

Quantization Retrieval

Reconstructing 3D Human Pose from RGB-D Data with Occlusions

no code implementations 2 Oct 2023 Bowen Dang, Xi Zhao, BoWen Zhang, He Wang

Our key idea is to constrain the solution space of the human body by considering the occluded body parts and visible body parts separately: modeling all plausible poses where the occluded body parts do not penetrate the scene, and constraining the visible body parts using depth data.

Compression Ratio Learning and Semantic Communications for Video Imaging

no code implementations 10 Oct 2023 BoWen Zhang, Zhijin Qin, Geoffrey Ye Li

In this article, we also investigate the data transmission methods for programmable sensors, where the performance of communication systems is evaluated by the reconstructed images or videos rather than the transmission of sensor data itself.

Compressive Sensing Video Compressive Sensing

Ferret: Refer and Ground Anything Anywhere at Any Granularity

1 code implementation 11 Oct 2023 Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, BoWen Zhang, ZiRui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang

We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.

Hallucination Language Modelling +1

Multimodal Large Language Model for Visual Navigation

no code implementations 12 Oct 2023 Yao-Hung Hubert Tsai, Vansh Dhar, Jialu Li, BoWen Zhang, Jian Zhang

Recent efforts to enable visual navigation using large language models have mainly focused on developing complex prompt systems.

Language Modelling Large Language Model +2

Cross-target Stance Detection by Exploiting Target Analytical Perspectives

no code implementations 3 Jan 2024 Daijun Ding, Rong Chen, Liwen Jing, BoWen Zhang, Xu Huang, Li Dong, Xiaowen Zhao, Ge Song

In this paper, we propose a Multi-Perspective Prompt-Tuning (MPPT) model for CTSD that uses the analysis perspective as a bridge to transfer knowledge.

Language Modelling Large Language Model +1

Vision Reimagined: AI-Powered Breakthroughs in WiFi Indoor Imaging

no code implementations 9 Jan 2024 Jianyang Shi, BoWen Zhang, Amartansh Dubey, Ross Murch, Liwen Jing

This is the first research work to consider WiFi indoor imaging as a multi-modal image generation task that converts the measured WiFi power into a high-resolution indoor image.

Image Generation

Source-Free Unsupervised Domain Adaptation with Hypothesis Consolidation of Prediction Rationale

1 code implementation 2 Feb 2024 Yangyang Shu, Xiaofeng Cao, Qi Chen, BoWen Zhang, Ziqin Zhou, Anton Van Den Hengel, Lingqiao Liu

Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data.

Unsupervised Domain Adaptation

MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property

1 code implementation 26 Feb 2024 Shiwen Ni, Minghuan Tan, Yuelin Bai, Fuqiang Niu, Min Yang, BoWen Zhang, Ruifeng Xu, Xiaojun Chen, Chengming Li, Xiping Hu, Ye Li, Jianping Fan

In this paper, we contribute a new benchmark, the first Multilingual-oriented quiZ on Intellectual Property (MoZIP), for the evaluation of LLMs in the IP domain.

Language Modelling Large Language Model +2

Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework

1 code implementation 12 Mar 2024 Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi, Lingqiao Liu, Liyang Liu, BoWen Zhang, Zhibin Liao, Qi Wu, Minh-Son To, Johan W. Verjans

Medical vision language pre-training (VLP) has emerged as a frontier of research, enabling zero-shot pathological recognition by comparing the query image with the textual descriptions for each disease.

Language Modelling Large Language Model

A Challenge Dataset and Effective Models for Conversational Stance Detection

1 code implementation 17 Mar 2024 Fuqiang Niu, Min Yang, Ang Li, Baoquan Zhang, Xiaojiang Peng, BoWen Zhang

Previous stance detection studies typically concentrate on evaluating stances within individual instances, thereby exhibiting limitations in effectively modeling multi-party discussions concerning the same specific topic, as naturally transpire in authentic social media interactions.

Stance Detection

Compress3D: a Compressed Latent Space for 3D Generation from a Single Image

no code implementations 20 Mar 2024 BoWen Zhang, Tianyu Yang, Yu Li, Lei Zhang, Xi Zhao

In this paper, we present a triplane autoencoder, which encodes 3D models into a compact triplane latent space to effectively compress both the 3D geometry and texture information.

Noise Learning for Text Classification: A Benchmark

no code implementations COLING 2022 Bo Liu, Wandi Xu, Yuejia Xiang, XiaoJun Wu, Lejian He, BoWen Zhang, Li Zhu

However, we find that noise learning in text classification is relatively underdeveloped: 1. many methods that have been proven effective in the image domain are not explored in text classification, 2. it is difficult to conduct a fair comparison between previous studies as they do experiments in different noise settings.

Text Classification

Sentiment Interpretable Logic Tensor Network for Aspect-Term Sentiment Analysis

no code implementations COLING 2022 BoWen Zhang, Xu Huang, Zhichao Huang, Hu Huang, Baoquan Zhang, Xianghua Fu, Liwen Jing

SILTN is interpretable because it is a neurosymbolic formalism and a computational model that supports learning and reasoning about data with a differentiable first-order logic language (FOL).

Computational Efficiency Knowledge Distillation +1
