no code implementations • COLING 2022 • Bo Liu, Wandi Xu, Yuejia Xiang, XiaoJun Wu, Lejian He, BoWen Zhang, Li Zhu
However, we find that noise learning in text classification is relatively underdeveloped: (1) many methods that have been proven effective in the image domain have not been explored in text classification, and (2) it is difficult to conduct a fair comparison between previous studies because they run experiments under different noise settings.
no code implementations • COLING 2022 • BoWen Zhang, Xu Huang, Zhichao Huang, Hu Huang, Baoquan Zhang, Xianghua Fu, Liwen Jing
SILTN is interpretable because it is a neurosymbolic formalism and a computational model that supports learning and reasoning about data with a differentiable first-order logic language (FOL).
no code implementations • 1 Sep 2024 • Fuqiang Niu, Zebang Cheng, Xianghua Fu, Xiaojiang Peng, Genan Dai, Yin Chen, Hu Huang, BoWen Zhang
To address this, we introduce a new multimodal multi-turn conversational stance detection dataset (called MmMtCSD).
1 code implementation • 24 Aug 2024 • Jinchao Ge, Zeyu Zhang, Minh Hieu Phan, BoWen Zhang, Akide Liu, Yang Zhao
Active learning enhances annotation efficiency by selecting the most revealing samples for labeling, thereby reducing reliance on extensive human input.
1 code implementation • 14 Aug 2024 • Yongcheng Li, Lingcong Cai, Ying Lu, Cheng Lin, Yupeng Zhang, Jingyan Jiang, Genan Dai, BoWen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan
To address this issue, we propose a novel framework of domain-invariant representation learning (DoRL) via segment anything model (SAM) for blood cell classification.
1 code implementation • 13 Aug 2024 • Yongcheng Li, Lingcong Cai, Ying Lu, Yupeng Zhang, Jingyan Jiang, Genan Dai, BoWen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan
Accurate classification of blood cells plays a vital role in hematological analysis as it aids physicians in diagnosing various medical conditions.
no code implementations • 29 Jul 2024 • Tom Gunter, ZiRui Wang, Chong Wang, Ruoming Pang, Aonan Zhang, BoWen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, Sam Wiseman, Syd Evans, Tao Lei, Vivek Rathod, Xiang Kong, Xianzhi Du, Yanghao Li, Yongqiang Wang, Yuan Gao, Zaid Ahmed, Zhaoyang Xu, Zhiyun Lu, Al Rashid, Albin Madappally Jose, Alec Doane, Alfredo Bencomo, Allison Vanderby, Andrew Hansen, Ankur Jain, Anupama Mann Anupama, Areeba Kamal, Bugu Wu, Carolina Brum, Charlie Maalouf, Chinguun Erdenebileg, Chris Dulhanty, Dominik Moritz, Doug Kang, Eduardo Jimenez, Evan Ladd, Fangping Shi, Felix Bai, Frank Chu, Fred Hohman, Hadas Kotek, Hannah Gillis Coleman, Jane Li, Jeffrey Bigham, Jeffery Cao, Jeff Lai, Jessica Cheung, Jiulong Shan, Joe Zhou, John Li, Jun Qin, Karanjeet Singh, Karla Vega, Kelvin Zou, Laura Heckman, Lauren Gardiner, Margit Bowler, Maria Cordell, Meng Cao, Nicole Hay, Nilesh Shahdadpuri, Otto Godwin, Pranay Dighe, Pushyami Rachapudi, Ramsey Tantawi, Roman Frigg, Sam Davarnia, Sanskruti Shah, Saptarshi Guha, Sasha Sirovica, Shen Ma, Shuang Ma, Simon Wang, Sulgi Kim, Suma Jayaram, Vaishaal Shankar, Varsha Paidi, Vivek Kumar, Xin Wang, Xin Zheng, Walker Cheng, Yael Shrager, Yang Ye, Yasu Tanaka, Yihao Guo, Yunsong Meng, Zhao Tang Luo, Zhi Ouyang, Alp Aygar, Alvin Wan, Andrew Walkingshaw, Andy Narayanan, Antonie Lin, Arsalan Farooq, Brent Ramerth, Colorado Reed, Chris Bartels, Chris Chaney, David Riazati, Eric Liang Yang, Erin Feldman, Gabriel Hochstrasser, Guillaume Seguin, Irina Belousova, Joris Pelemans, Karen Yang, Keivan Alizadeh Vahid, Liangliang Cao, Mahyar Najibi, Marco Zuliani, Max Horton, Minsik Cho, Nikhil Bhendawade, Patrick Dong, Piotr Maj, Pulkit Agrawal, Qi Shan, Qichen Fu, Regan Poston, Sam Xu, Shuangning Liu, Sushma Rao, Tashweena Heeramun, Thomas Merth, 
Uday Rayala, Victor Cui, Vivek Rangarajan Sridhar, Wencong Zhang, Wenqi Zhang, Wentao Wu, Xingyu Zhou, Xinwen Liu, Yang Zhao, Yin Xia, Zhile Ren, Zhongzheng Ren
We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute.
no code implementations • 21 Jul 2024 • BoWen Zhang, Cheng Yang, Xuanhui Liu
The third approach depends on phenomena specific to certain model architectures, complicating its application to large-scale image generation. To address these issues, we propose a novel controllable generation framework that offers a generalized interpretation of backward guidance without relying on specific assumptions.
no code implementations • 9 Jul 2024 • BoWen Zhang, Yiji Cheng, Chunyu Wang, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, Baining Guo
We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image.
no code implementations • 2 Jul 2024 • BoWen Zhang, Zhichao Huang, Genan Dai, Guangning Xu, Xiaomao Fan, Hu Huang
\method{} comprises several key modules, including the core subgraph knowledge submodule, graph domain adaptation module, and few-shot learning module for downstream tasks.
no code implementations • 2 Jul 2024 • BoWen Zhang, Geoffrey Ye Li
In this work, we revisit the 3D-orthogonal matching pursuit (OMP) algorithm and demonstrate that the operation of 3D-OMP is analogous to a specific kind of transformer with 3D attention.
1 code implementation • 27 Jun 2024 • Vu Minh Hieu Phan, Yutong Xie, BoWen Zhang, Yuankai Qi, Zhibin Liao, Antonios Perperidis, Son Lam Phung, Johan W. Verjans, Minh-Son To
To address this, we introduce UNet Structured Transformer (UNest), a novel architecture incorporating structural inductive biases for unpaired medical image synthesis.
no code implementations • 15 Jun 2024 • BoWen Zhang, Ying Chen, Long Bai, Yan Zhao, Yuxiang Sun, Yixuan Yuan, Jianhua Zhang, Hongliang Ren
Our method, inspired by the DINOv2 foundation model, applies low-rank adaptation learning to tailor foundation models for capsule endoscopy diagnosis effectively.
1 code implementation • 14 Jun 2024 • BoWen Zhang, Chunping Li
Semantic Textual Similarity (STS) constitutes a critical research direction in computational linguistics and serves as a key indicator of the encoding capabilities of embedding models.
1 code implementation • 8 Jun 2024 • BoWen Zhang, Chunping Li
Since the introduction of BERT and RoBERTa, research on Semantic Textual Similarity (STS) has made groundbreaking progress.
1 code implementation • 28 May 2024 • BoWen Zhang, Xiaofei Xie, Haotian Lu, Na Ma, Tianlin Li, Qing Guo
The core challenge lies in generating smooth and natural transitions between these segments given the inherent complexity and variability of action transitions.
no code implementations • 24 May 2024 • Runsong Jia, BoWen Zhang, Sergio J. Rodríguez Méndez, Pouya G. Omran
DDM enables a fine-grained representation of the hierarchical structure and semantic relationships within academic papers, while KGQP leverages the KG structure to improve query accuracy and efficiency with LLMs.
no code implementations • 11 Apr 2024 • Haotian Zhang, Haoxuan You, Philipp Dufter, BoWen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang
While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it has certain limitations: it is constrained by the pre-trained fixed visual encoder and fails to perform well on broader tasks.
Ranked #83 on Visual Question Answering on MM-Vet
no code implementations • 11 Apr 2024 • Ming Cheng, BoWen Zhang, Ziyu Wang, Ziyi Zhou, Weiqi Feng, Yi Lyu, Xingjian Diao
Trajectory similarity search plays an essential role in autonomous driving, as it enables vehicles to analyze the information and characteristics of different trajectories to make informed decisions and navigate safely in dynamic environments.
no code implementations • 5 Apr 2024 • BoWen Zhang, Harold Soh
A principal issue is that in prior methods, the KG schema has to be included in the LLM prompt to generate valid triplets; larger and more complex schemas easily exceed the LLM's context window length.
2 code implementations • 5 Apr 2024 • BoWen Zhang, Kehua Chang, Chunping Li
Sentence Embedding stands as a fundamental task within the realm of Natural Language Processing, finding extensive application in search engines, expert systems, and question-and-answer platforms.
1 code implementation • 31 Mar 2024 • Zebang Cheng, Fuqiang Niu, Yuxiang Lin, Zhi-Qi Cheng, BoWen Zhang, Xiaojiang Peng
This paper presents our winning submission to Subtask 2 of SemEval 2024 Task 3 on multimodal emotion cause analysis in conversations.
no code implementations • 28 Mar 2024 • BoWen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo
We introduce a radiance representation that is both structured and fully explicit and thus greatly facilitates 3D generative modeling.
1 code implementation • 23 Mar 2024 • Daijun Ding, Li Dong, Zhichao Huang, Guangning Xu, Xu Huang, Bo Liu, Liwen Jing, BoWen Zhang
To address these issues, we propose an encoder-decoder data augmentation (EDDA) framework.
no code implementations • 20 Mar 2024 • BoWen Zhang, Tianyu Yang, Yu Li, Lei Zhang, Xi Zhao
In this paper, we present a triplane autoencoder, which encodes 3D models into a compact triplane latent space to effectively compress both the 3D geometry and texture information.
1 code implementation • 17 Mar 2024 • Fuqiang Niu, Min Yang, Ang Li, Baoquan Zhang, Xiaojiang Peng, BoWen Zhang
Previous stance detection studies typically concentrate on evaluating stances within individual instances, thereby exhibiting limitations in effectively modeling multi-party discussions concerning the same specific topic, as naturally transpire in authentic social media interactions.
no code implementations • 14 Mar 2024 • Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, BoWen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, ZiRui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang
Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons.
Ranked #30 on Visual Question Answering on MM-Vet
1 code implementation • CVPR 2024 • Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi, Lingqiao Liu, Liyang Liu, BoWen Zhang, Zhibin Liao, Qi Wu, Minh-Son To, Johan W. Verjans
Medical vision language pre-training (VLP) has emerged as a frontier of research, enabling zero-shot pathological recognition by comparing the query image with the textual descriptions for each disease.
1 code implementation • 26 Feb 2024 • Shiwen Ni, Minghuan Tan, Yuelin Bai, Fuqiang Niu, Min Yang, BoWen Zhang, Ruifeng Xu, Xiaojun Chen, Chengming Li, Xiping Hu, Ye Li, Jianping Fan
In this paper, we contribute a new benchmark, the first Multilingual-oriented quiZ on Intellectual Property (MoZIP), for the evaluation of LLMs in the IP domain.
1 code implementation • 2 Feb 2024 • Yangyang Shu, Xiaofeng Cao, Qi Chen, BoWen Zhang, Ziqin Zhou, Anton Van Den Hengel, Lingqiao Liu
Source-Free Unsupervised Domain Adaptation (SFUDA) is a challenging task where a model needs to be adapted to a new domain without access to target domain labels or source domain data.
no code implementations • 9 Jan 2024 • Jianyang Shi, BoWen Zhang, Amartansh Dubey, Ross Murch, Liwen Jing
This is the first research work to consider WiFi indoor imaging as a multi-modal image generation task that converts the measured WiFi power into a high-resolution indoor image.
no code implementations • 3 Jan 2024 • Daijun Ding, Rong Chen, Liwen Jing, BoWen Zhang, Xu Huang, Li Dong, Xiaowen Zhao, Ge Song
In this paper, we propose a Multi-Perspective Prompt-Tuning (MPPT) model for CTSD that uses the analysis perspective as a bridge to transfer knowledge.
1 code implementation • CVPR 2024 • Satish Kumar, BoWen Zhang, Chandrakanth Gudavalli, Connor Levenson, Lacey Hughey, Jared A. Stabach, Irene Amoke, Gordon Ojwang, Joseph Mukeka, Stephen Mwiu, Joseph Ogutu, Howard Frederick, B.S. Manjunath
We introduce WildlifeMapper (WM), a flexible model designed to detect, locate, and identify multiple species in aerial imagery.
no code implementations • 26 Dec 2023 • BoWen Zhang, Daijun Ding, Liwen Jing, Hu Huang
Zero-shot stance detection (ZSSD) aims to detect stances toward unseen targets.
1 code implementation • 12 Nov 2023 • Zeyu Zhang, Xuyin Qi, BoWen Zhang, Biao Wu, Hien Le, Bora Jeong, Zhibin Liao, Yunxiang Liu, Johan Verjans, Minh-Son To, Richard Hartley
This manual process is highly time-consuming and expensive, limiting the number of patients who can receive timely radiotherapy.
no code implementations • 12 Oct 2023 • Yao-Hung Hubert Tsai, Vansh Dhar, Jialu Li, BoWen Zhang, Jian Zhang
Recent efforts to enable visual navigation using large language models have mainly focused on developing complex prompt systems.
2 code implementations • 11 Oct 2023 • Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, BoWen Zhang, ZiRui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang
We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.
1 code implementation • 11 Oct 2023 • Zhengfeng Lai, Haotian Zhang, BoWen Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao
For example, VeCLIP achieves up to +25.2% gain in COCO and Flickr30k retrieval tasks under the 12M setting.
no code implementations • 10 Oct 2023 • BoWen Zhang, Zhijin Qin, Geoffrey Ye Li
In this article, we also investigate the data transmission methods for programmable sensors, where the performance of communication systems is evaluated by the reconstructed images or videos rather than the transmission of sensor data itself.
no code implementations • 2 Oct 2023 • Bowen Dang, Xi Zhao, BoWen Zhang, He Wang
Our key idea is to constrain the solution space of the human body by considering the occluded body parts and visible body parts separately: modeling all plausible poses where the occluded body parts do not penetrate the scene, and constraining the visible body parts using depth data.
1 code implementation • 2 Oct 2023 • Ajay Jaiswal, Zhe Gan, Xianzhi Du, BoWen Zhang, Zhangyang Wang, Yinfei Yang
Recently, several works have shown significant success in training-free and data-free compression (pruning and quantization) of LLMs that achieve 50-60% sparsity and reduce the bit width to 3 or 4 bits per weight, with negligible degradation of perplexity over the uncompressed baseline.
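As a rough illustration of the two compression operations named above, the following minimal sketch prunes a toy weight list to 50% sparsity and quantizes it to 4 bits per weight. Magnitude pruning and min-max uniform quantization are assumptions chosen for brevity, not necessarily the methods this paper studies, and the function names and weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Assumption: plain magnitude pruning, used here only as an illustration.
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_uniform(weights, bits):
    """Snap each weight to one of 2**bits evenly spaced levels in [min, max].

    Assumption: min-max uniform quantization, used here only as an illustration.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


# Hypothetical toy weights.
weights = [0.8, -0.05, 0.3, -0.6, 0.02, 0.1, -0.9, 0.4]
pruned = magnitude_prune(weights, 0.5)    # half of the entries become 0.0
quantized = quantize_uniform(weights, 4)  # at most 16 distinct values
```

With 4 bits, each weight moves by at most half a quantization step, which is one intuition for why perplexity can stay close to the uncompressed baseline on simple metrics.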
2 code implementations • 20 Sep 2023 • BoWen Zhang, Kehua Chang, Chunping Li
In response to these challenges, this paper presents CoT-BERT, an innovative method that harnesses the progressive thinking of Chain-of-Thought reasoning to tap into the latent potential of pre-trained models like BERT.
no code implementations • 8 Sep 2023 • Erik Daxberger, Floris Weers, BoWen Zhang, Tom Gunter, Ruoming Pang, Marcin Eichner, Michael Emmersberger, Yinfei Yang, Alexander Toshev, Xianzhi Du
We empirically show that our sparse Mobile Vision MoEs (V-MoEs) can achieve a better trade-off between performance and efficiency than the corresponding dense ViTs.
1 code implementation • 16 Aug 2023 • Qi Chen, Chaorui Deng, Zixiong Huang, BoWen Zhang, Mingkui Tan, Qi Wu
In this paper, we propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images using a pre-trained likelihood-based text-to-image generative model, i.e., a higher likelihood indicates better perceptual quality and better text-image alignment.
1 code implementation • 10 Aug 2023 • Quan Tang, Chuanjian Liu, Fagui Liu, Yifan Liu, Jun Jiang, BoWen Zhang, Kai Han, Yunhe Wang
Aggregation of multi-stage features has been revealed to play a significant role in semantic segmentation.
1 code implementation • ICCV 2023 • Quan Tang, BoWen Zhang, Jiajun Liu, Fagui Liu, Yifan Liu
Experiments suggest that the proposed DToP architecture reduces on average 20% to 35% of computational cost for current semantic segmentation methods based on plain vision transformers without accuracy degradation.
no code implementations • 31 Jul 2023 • Baoquan Zhang, Chuyao Luo, Demin Yu, Huiwei Lin, Xutao Li, Yunming Ye, BoWen Zhang
Its key idea is learning a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverages it to optimize a task-specific model by using only a few labeled data.
1 code implementation • 28 Jul 2023 • Xindi Wang, YuFei Wang, Can Xu, Xiubo Geng, BoWen Zhang, Chongyang Tao, Frank Rudzicz, Robert E. Mercer, Daxin Jiang
Large language models (LLMs) have shown remarkable capacity for in-context learning (ICL), in which a model learns a new task from just a few training examples without being explicitly pre-trained for it.
no code implementations • 6 Jul 2023 • BoWen Zhang, Zhijin Qin, Geoffrey Ye Li
According to the base CS results, the encoder then employs a policy network to analyze the semantic information in images and determines the measurement matrix for different image areas.
1 code implementation • 13 Jun 2023 • Wentao Wu, Aleksei Timofeev, Chen Chen, BoWen Zhang, Kun Duan, Shuangning Liu, Yantao Zheng, Jonathon Shlens, Xianzhi Du, Zhe Gan, Yinfei Yang
Our approach involves employing a named entity recognition model to extract entities from the alt-text, and then using a CLIP model to select the correct entities as labels of the paired image.
1 code implementation • 13 Jun 2023 • Liyang Liu, Zihan Wang, Minh Hieu Phan, BoWen Zhang, Jinchao Ge, Yifan Liu
Current knowledge distillation approaches in semantic segmentation tend to adopt a holistic approach that treats all spatial locations equally.
1 code implementation • 9 Jun 2023 • BoWen Zhang, Liyang Liu, Minh Hieu Phan, Zhi Tian, Chunhua Shen, Yifan Liu
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder-decoder framework and introduces SegViTv2.
Ranked #16 on Semantic Segmentation on ADE20K
1 code implementation • 8 May 2023 • Liangliang Cao, BoWen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, Wencong Zhang, Zhiyun Lu, Yantao Zheng
In this paper, we discuss two effective approaches to improve the efficiency and robustness of CLIP training: (1) augmenting the training dataset while maintaining the same number of optimization steps, and (2) filtering out samples that contain text regions in the image.
no code implementations • 6 Apr 2023 • BoWen Zhang, Xianghua Fu, Daijun Ding, Hu Huang, Yangyang Li, Liwen Jing
Stance detection predicts attitudes towards targets in texts and has gained attention with the rise of social media.
1 code implementation • 6 Mar 2023 • BoWen Zhang, Harold Soh
In this work, we explore the potential of large language models (LLMs) -- which have consumed vast amounts of human-generated text data -- to act as zero-shot human models for HRI.
no code implementations • 17 Feb 2023 • BoWen Zhang, Zhijin Qin, Geoffrey Ye Li
Wireless extended reality (XR) has attracted wide attention as a promising technology to improve users' mobility and quality of experience.
no code implementations • 2 Feb 2023 • Eivind Meyer, Maurice Brenner, BoWen Zhang, Max Schickert, Bilal Musani, Matthias Althoff
Heterogeneous graphs offer powerful data representations for traffic, given their ability to model the complex interaction effects among a varying number of traffic participants and the underlying road infrastructure.
no code implementations • 30 Jan 2023 • Chen Chen, BoWen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, Albin Madappally Jose, Alexander Toshev, Jonathon Shlens, Ruoming Pang, Yinfei Yang
We extend the CLIP model and build a sparse text and image representation (STAIR), where the image and text are mapped to a sparse token space.
1 code implementation • CVPR 2024 • Xiaojie Jin, BoWen Zhang, Weibo Gong, Kai Xu, Xueqing Deng, Peng Wang, Zhao Zhang, Xiaohui Shen, Jiashi Feng
The first is a Temporal Adaptation Module that is incorporated in the video branch to introduce global and local temporal contexts.
no code implementations • 16 Jan 2023 • Tianyue Cao, BoWen Zhang, Zhao Jin, Yongzhi Cao, Hanpin Wang
To deal with properties on variable-length sequences and multilevel data structures, we propose sequence-heap separation logic which integrates sequences into logical reasoning on heap-manipulated programs.
1 code implementation • 30 Dec 2022 • BoWen Zhang, Daijun Ding, Liwen Jing, Genan Dai, Nan Yin
ChatGPT has the potential to be the best AI model for stance detection tasks in NLP, or at least change the research paradigm of this field.
no code implementations • 16 Dec 2022 • BoWen Zhang, Zhijin Qin, Yiyu Guo, Geoffrey Ye Li
In particular, semantic sensing is used to improve the sensing efficiency by exploring the spatial-temporal distributions of semantic information.
1 code implementation • CVPR 2023 • BoWen Zhang, Chenyang Qi, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang, Fang Wen
In this work, we propose an ID-preserving talking head generation framework, which advances previous methods in two aspects.
1 code implementation • CVPR 2023 • Ziqin Zhou, BoWen Zhang, Yinjie Lei, Lingqiao Liu, Yifan Liu
Recently, CLIP has been applied to pixel-level zero-shot learning tasks via a two-stage scheme.
Ranked #4 on Zero-Shot Semantic Segmentation on PASCAL VOC
no code implementations • 15 Nov 2022 • R. Austin McEver, BoWen Zhang, B. S. Manjunath
However, in many scenarios, it can be difficult to collect images for training, not to mention the costs associated with collecting annotations suitable for training these object detectors.
no code implementations • 10 Nov 2022 • Zeyu Feng, BoWen Zhang, Jianxin Bi, Harold Soh
In this work, we focus on the problem of safe policy transfer in reinforcement learning: we seek to leverage existing policies when learning a new task with specified constraints.
1 code implementation • 12 Oct 2022 • BoWen Zhang, Zhi Tian, Quan Tang, Xiangxiang Chu, Xiaolin Wei, Chunhua Shen, Yifan Liu
We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose the SegVit.
Ranked #4 on Semantic Segmentation on COCO-Stuff test
1 code implementation • 17 Sep 2022 • BoWen Zhang, Xi Zhao, He Wang, Ruizhen Hu
The core challenge is to generate plausible geometries to fill the unobserved part of the object based on a partial scan, which is under-constrained and suffers from a huge solution space.
no code implementations • 1 Jun 2022 • R. Austin McEver, BoWen Zhang, Connor Levenson, A S M Iftekhar, B. S. Manjunath
Each video includes annotations indicating the start and end times of substrates across the video in addition to counts of species of interest.
no code implementations • 11 May 2022 • BoWen Zhang, Houssem Sifaou, Geoffrey Ye Li
On the other hand, considering the generality of a tracking system, we decouple the tracking system from the CSI environments so that one tracking system for all environments becomes possible.
no code implementations • 10 Mar 2022 • Lucas Relic, BoWen Zhang, Yi-Lin Tuan, Michael Beyeler
Retinal implants have the potential to treat incurable blindness, yet the quality of the artificial vision they produce is still rudimentary.
1 code implementation • CVPR 2022 • BoWen Zhang, Shuyang Gu, Bo Zhang, Jianmin Bao, Dong Chen, Fang Wen, Yong Wang, Baining Guo
To this end, we believe that local attention is crucial to strike the balance between computational efficiency and modeling capacity.
Ranked #1 on Image Generation on CelebA 256x256 (FID metric)
1 code implementation • 14 Dec 2021 • Yidong Wang, BoWen Zhang, Wenxin Hou, Zhen Wu, Jindong Wang, Takahiro Shinozaki
The long-tailed class distribution in visual recognition tasks poses great challenges for neural networks on how to handle the biased predictions between head and tail classes, i.e., the model tends to classify tail classes as head classes.
no code implementations • 14 Dec 2021 • BoWen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha
We term this approach as Co-training Videos and Images for Action Recognition (CoVeR).
Ranked #6 on Action Classification on MiT (using extra training data)
2 code implementations • NeurIPS 2021 • BoWen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki
However, like other modern SSL algorithms, FixMatch uses a pre-defined constant threshold for all classes to select unlabeled data that contribute to the training, thus failing to consider different learning status and learning difficulties of different classes.
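The selection step described above can be sketched as follows, assuming the model outputs one probability list per unlabeled sample. The function name, the toy probabilities, and the per-class threshold values are hypothetical and only contrast a constant cutoff with class-dependent ones; they do not reproduce the rule this paper proposes.

```python
def select_pseudo_labels(probs, threshold=0.95):
    """Return (sample_index, predicted_class) pairs whose top confidence passes the cutoff.

    `threshold` is either one float shared by all classes (constant-threshold
    style) or a list with one cutoff per class (hypothetical variant).
    """
    selected = []
    for i, p in enumerate(probs):
        pred = max(range(len(p)), key=p.__getitem__)  # index of the most likely class
        tau = threshold[pred] if isinstance(threshold, (list, tuple)) else threshold
        if p[pred] >= tau:
            selected.append((i, pred))
    return selected


# Hypothetical predictions for three unlabeled samples over three classes.
probs = [
    [0.97, 0.02, 0.01],
    [0.10, 0.85, 0.05],
    [0.30, 0.30, 0.40],
]
constant = select_pseudo_labels(probs, 0.95)                 # same cutoff for every class
per_class = select_pseudo_labels(probs, [0.95, 0.80, 0.95])  # lower cutoff for class 1
```

In this toy case the constant cutoff keeps only sample 0, while lowering the cutoff for class 1 also admits sample 1, which illustrates how a single threshold can starve harder classes of pseudo-labels.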
no code implementations • Findings (EMNLP) 2021 • BoWen Zhang, Hexiang Hu, Linlu Qiu, Peter Shaw, Fei Sha
We investigate ways to compose complex concepts in texts from primitive ones while grounding them in images.
2 code implementations • EMNLP 2021 • Linlu Qiu, Hexiang Hu, BoWen Zhang, Peter Shaw, Fei Sha
We analyze the grounded SCAN (gSCAN) benchmark, which was recently proposed to study systematic generalization for grounded language understanding.
1 code implementation • NeurIPS 2021 • BoWen Zhang, Yifan Liu, Zhi Tian, Chunhua Shen
This neural representation enables our decoder to leverage the smoothness prior in the semantic label space, and thus makes our decoder more efficient.
1 code implementation • 13 Jun 2021 • Marija Vella, BoWen Zhang, Wei Chen, João F. C. Mota
Such methods, however, cannot guarantee that the input measurements are satisfied in the recovered image, since the learned parameters by the network are applied to every test image.
no code implementations • 28 Feb 2021 • Yichao Zhou, Wei-Ting Chen, BoWen Zhang, David Lee, J. Harry Caufield, Kai-Wei Chang, Yizhou Sun, Peipei Ping, Wei Wang
Clinical case reports are written descriptions of the unique aspects of a particular clinical case, playing an essential role in sharing clinical experiences about atypical disease phenotypes and new therapies.
no code implementations • 5 Feb 2021 • Zhi Tian, BoWen Zhang, Hao Chen, Chunhua Shen
In the literature, top-performing instance segmentation methods typically follow the paradigm of Mask R-CNN and rely on ROI operations (typically ROIAlign) to attend to each instance.
no code implementations • 18 Nov 2020 • BoWen Zhang, Hexiang Hu, Joonseok Lee, Ming Zhao, Sheide Chammas, Vihan Jain, Eugene Ie, Fei Sha
Identifying a short segment in a long video that semantically matches a text query is a challenging task that has important application potentials in language-based video search, browsing, and navigation.
no code implementations • 29 Oct 2020 • Wei Chen, BoWen Zhang, Shi Jin, Bo Ai, Zhangdui Zhong
Sparse signal recovery problems from noisy linear measurements appear in many areas of wireless communications.
no code implementations • EMNLP 2020 • BoWen Zhang, Hexiang Hu, Vihan Jain, Eugene Ie, Fei Sha
Recent progress has leveraged the ideas of pre-training (from language modeling) and attention layers in Transformers to learn representations from datasets containing images aligned with linguistic expressions that describe the images.
no code implementations • 6 Oct 2020 • BoWen Zhang, Hao Chen, Meng Wang, Yuanjun Xiong
We formulate the problem of online temporal action detection in live streaming videos, acknowledging one important property of live streaming videos that there is normally a broadcast delay between the latest captured frame and the actual frame viewed by the audience.
no code implementations • ACL 2020 • Bowen Zhang, Min Yang, Xutao Li, Yunming Ye, Xiaofei Xu, Kuai Dai
Specifically, a semantic-emotion heterogeneous graph is constructed from external semantic and emotion lexicons, which is then fed into a graph convolutional network to learn multi-hop semantic connections between words and emotion tags.
no code implementations • 26 Mar 2020 • Bowen Zhang, Benedetta Tondi, Xixiang Lv, Mauro Barni
The existence of adversarial examples and the ease with which they can be generated raise several security concerns with regard to deep learning systems, pushing researchers to develop suitable defense mechanisms.
no code implementations • 13 Jan 2020 • Bowen Zhang, Hexiang Hu, Fei Sha
To narrate a sequence of images, we use the predicted anchor word embeddings and the image features as the joint input to a seq2seq model.
Ranked #15 on Visual Storytelling on VIST
no code implementations • 1 Oct 2019 • Bowen Zhang, Benedetta Tondi, Mauro Barni
In this paper, we study the vulnerability of anti-spoofing methods based on deep learning against adversarial perturbations.
Cryptography and Security
1 code implementation • ECCV 2018 • Bowen Zhang, Hexiang Hu, Fei Sha
Similarly, a paragraph may contain sentences with different topics, which collectively convey a coherent message or story.
no code implementations • 27 Jul 2018 • Bowen Zhang, Xifan Zhang, Fan Cheng, Deli Zhao
During testing, a new simplex is formed from the test sample and the points in the class.
1 code implementation • CVPR 2016 • Bowen Zhang, Li-Min Wang, Zhe Wang, Yu Qiao, Hanli Wang
The deep two-stream architecture exhibited excellent performance on video-based action recognition.
Ranked #73 on Action Recognition on UCF101