19 code implementations • CVPR 2019 • Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao
Because there are large geometrical margins among the minimal-scale kernels, our method effectively splits close text instances, making it easier to use segmentation-based methods to detect arbitrary-shaped text instances.
Ranked #12 on Scene Text Detection on SCUT-CTW1500
20 code implementations • CVPR 2019 • Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang
A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches.
Ranked #98 on Image Classification on CIFAR-100 (using extra training data)
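The fusion step described in the SK excerpt above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: two branch outputs stand in for convolutions with different kernel sizes, and they are combined per channel with softmax weights taken across branches. The excerpt does not say how the attention logits are derived from the branches, so they are simply passed in here. The names `softmax` and `fuse_branches` and all numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(branch_outputs, branch_logits):
    """Fuse per-channel branch outputs with softmax attention across branches.

    branch_outputs[b][c]: output of branch b for channel c.
    branch_logits[b][c]:  attention logit of branch b for channel c
                          (assumed to be given; hypothetical input).
    """
    num_branches = len(branch_outputs)
    num_channels = len(branch_outputs[0])
    fused = []
    for c in range(num_channels):
        # Softmax runs across branches, separately for each channel.
        weights = softmax([branch_logits[b][c] for b in range(num_branches)])
        fused.append(
            sum(weights[b] * branch_outputs[b][c] for b in range(num_branches))
        )
    return fused


# Two branches, two channels (made-up values).
outputs = [[1.0, 2.0], [3.0, 6.0]]
logits = [[0.0, 5.0], [0.0, -5.0]]
fused = fuse_branches(outputs, logits)
```

With equal logits (channel 0) the branches are averaged; with a large logit gap (channel 1) the result stays close to the favored branch.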
16 code implementations • 25 Jun 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
We hope this work will facilitate state-of-the-art Transformer research in computer vision.
Ranked #23 on Object Detection on COCO-O
7 code implementations • NeurIPS 2020 • Xiang Li, Wenhai Wang, Lijun Wu, Shuo Chen, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent arbitrary distribution of box locations.
Ranked #93 on Object Detection on COCO test-dev
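The two ideas in the excerpt above can be illustrated with a small Python sketch under stated assumptions. First, a class target vector whose ground-truth entry holds a localization-quality score instead of a hard 1. Second, a box-edge offset represented as a discrete distribution and read out as its mean. The integer bin discretization, the expectation readout, and the names `joint_target` and `expected_offset` are assumptions for illustration; the excerpt does not specify them.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_target(num_classes, gt_class, quality):
    """Class vector whose ground-truth entry is a quality score in [0, 1]."""
    target = [0.0] * num_classes
    target[gt_class] = quality
    return target


def expected_offset(bin_logits):
    """Mean of a discrete distribution over bins 0..n-1 (assumed discretization)."""
    probs = softmax(bin_logits)
    return sum(i * p for i, p in enumerate(probs))


target = joint_target(num_classes=3, gt_class=1, quality=0.8)
sharp = expected_offset([0.0, 0.0, 10.0, 0.0])  # mass concentrated on bin 2
flat = expected_offset([0.0, 0.0, 0.0, 0.0])    # uniform over bins 0..3
```

A sharply peaked distribution yields an offset near its peak, while a flat one signals an ambiguous edge; both are expressible with the same vector.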
9 code implementations • ICCV 2021 • Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.
Ranked #5 on Semantic Segmentation on SynPASS
5 code implementations • 20 Apr 2023 • Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
Our work, for the first time, uncovers that properly aligning the visual features with an advanced large language model can possess numerous advanced multi-modal abilities demonstrated by GPT-4, such as detailed image description generation and website creation from hand-drawn drafts.
Ranked #9 on Visual Question Answering on BenchLMM
1 code implementation • 14 Oct 2023 • Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny
Motivated by this, we aim to build a unified interface for completing many vision-language tasks including image description, visual question answering, and visual grounding, among others.
Ranked #10 on Visual Question Answering on BenchLMM
4 code implementations • ICLR 2019 • Zhengdao Chen, Xiang Li, Joan Bruna
We show that, in a data-driven manner and without access to the underlying generative models, they can match or even surpass the performance of the belief propagation algorithm on binary and multi-class stochastic block models, which is believed to reach the computational threshold.
Ranked #1 on Community Detection on Amazon (Accuracy-NE metric, using extra training data)
5 code implementations • CVPR 2021 • Xiang Li, Wenhai Wang, Xiaolin Hu, Jun Li, Jinhui Tang, Jian Yang
Such a property makes the distribution statistics of a bounding box highly correlated to its real localization quality.
Ranked #26 on Object Detection on COCO-O
9 code implementations • NeurIPS 2018 • Robert J. Wang, Xiang Li, Charles X. Ling
In this study, we propose an efficient architecture named PeleeNet, which is built with conventional convolution instead.
4 code implementations • 8 Jan 2018 • Han Zhu, Xiang Li, Pengye Zhang, Guozheng Li, Jie He, Han Li, Kun Gai
In systems with large corpus, however, the calculation cost for the learnt model to predict all user-item preferences is tremendous, which makes full corpus retrieval extremely difficult.
9 code implementations • 7 Jun 2018 • Xiang Li, Wenhai Wang, Wenbo Hou, Ruo-Ze Liu, Tong Lu, Jian Yang
To address these problems, we propose a novel Progressive Scale Expansion Network (PSENet), designed as a segmentation-based detector with multiple predictions for each text instance.
Ranked #12 on Scene Text Detection on ICDAR 2017 MLT
3 code implementations • 23 May 2019 • Xiang Li, Xiaolin Hu, Jian Yang
The Convolutional Neural Networks (CNNs) generate the feature representation of complex objects by collecting hierarchical and different parts of semantic sub-features.
Ranked #739 on Image Classification on ImageNet
2 code implementations • ACL 2019 • Sungjin Lee, Qi Zhu, Ryuichi Takanobu, Xiang Li, Yaoqin Zhang, Zheng Zhang, Jinchao Li, Baolin Peng, Xiujun Li, Minlie Huang, Jianfeng Gao
We present ConvLab, an open-source multi-domain end-to-end dialog system platform, that enables researchers to quickly set up experiments with reusable components and compare a large set of different approaches, ranging from conventional pipeline systems to end-to-end neural models, in common environments.
1 code implementation • 25 Jul 2023 • Zhengliang Liu, Tianyang Zhong, Yiwei Li, Yutong Zhang, Yi Pan, Zihao Zhao, Peixin Dong, Chao Cao, Yuxiao Liu, Peng Shu, Yaonai Wei, Zihao Wu, Chong Ma, Jiaqi Wang, Sheng Wang, Mengyue Zhou, Zuowei Jiang, Chunlin Li, Jason Holmes, Shaochen Xu, Lu Zhang, Haixing Dai, Kai Zhang, Lin Zhao, Yuanhao Chen, Xu Liu, Peilong Wang, Pingkun Yan, Jun Liu, Bao Ge, Lichao Sun, Dajiang Zhu, Xiang Li, Wei Liu, Xiaoyan Cai, Xintao Hu, Xi Jiang, Shu Zhang, Xin Zhang, Tuo Zhang, Shijie Zhao, Quanzheng Li, Hongtu Zhu, Dinggang Shen, Tianming Liu
The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP).
1 code implementation • ACL 2020 • Qi Zhu, Zheng Zhang, Yan Fang, Xiang Li, Ryuichi Takanobu, Jinchao Li, Baolin Peng, Jianfeng Gao, Xiaoyan Zhu, Minlie Huang
We present ConvLab-2, an open-source toolkit that enables researchers to build task-oriented dialogue systems with state-of-the-art models, perform an end-to-end evaluation, and diagnose the weakness of systems.
1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen
By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.
1 code implementation • 9 Apr 2023 • Jun Chen, Deyao Zhu, Kilichbek Haydarov, Xiang Li, Mohamed Elhoseiny
Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment.
1 code implementation • 23 Feb 2024 • Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu
Training LLMs at this scale brings unprecedented challenges to training efficiency and stability.
1 code implementation • ICCV 2023 • YuXuan Li, Qibin Hou, Zhaohui Zheng, Ming-Ming Cheng, Jian Yang, Xiang Li
To the best of our knowledge, this is the first time that large and selective kernel mechanisms have been explored in the field of remote sensing object detection.
Ranked #1 on Semantic Segmentation on UAVid
1 code implementation • 18 Mar 2024 • YuXuan Li, Xiang Li, Yimian Dai, Qibin Hou, Li Liu, Yongxiang Liu, Ming-Ming Cheng, Jian Yang
While a considerable amount of research has been dedicated to remote sensing classification, object detection and semantic segmentation, most of these studies have overlooked the valuable prior knowledge embedded within remote sensing scenarios.
1 code implementation • 26 May 2023 • Kai Zhang, Jun Yu, Eashan Adhikarla, Rong Zhou, Zhiling Yan, Yixin Liu, Zhengliang Liu, Lifang He, Brian Davison, Xiang Li, Hui Ren, Sunyang Fu, James Zou, Wei Liu, Jing Huang, Chen Chen, Yuyin Zhou, Tianming Liu, Xun Chen, Yong Chen, Quanzheng Li, Hongfang Liu, Lichao Sun
Conventional task- and modality-specific artificial intelligence (AI) models are inflexible in real-world deployment and maintenance for biomedicine.
Ranked #1 on Text Summarization on MeQSum
1 code implementation • CVPR 2020 • Xiang Li, Tianhan Wei, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang
In this paper, we are interested in few-shot object segmentation where the number of annotated training examples is limited to 5 only.
Ranked #20 on Few-Shot Semantic Segmentation on FSS-1000 (5-shot)
1 code implementation • 10 Jan 2024 • Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao liu, Heng Ji, Hongyi Wang, huan zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao
This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions.
2 code implementations • ICLR 2020 • Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, Zhihua Zhang
In this paper, we analyze the convergence of FedAvg on non-iid data and establish a convergence rate of O(1/T) for strongly convex and smooth problems, where T is the number of SGDs.
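For readers unfamiliar with the algorithm being analyzed, here is a toy Python sketch of a FedAvg-style loop. It is not the paper's experimental setup and does not demonstrate the stated rate. Each client holds a hypothetical objective 0.5 * (w - center)^2 with a different center (a stand-in for non-iid data), runs a few local gradient steps, and the server takes a data-size-weighted average. Deterministic gradients replace stochastic ones for simplicity; all names and numbers are assumptions.

```python
def local_update(w, center, lr, local_steps):
    """Run a few gradient steps on the client objective 0.5 * (w - center)^2."""
    for _ in range(local_steps):
        grad = w - center
        w = w - lr * grad
    return w


def fedavg(centers, sizes, rounds, lr=0.1, local_steps=5):
    """Alternate local client updates with size-weighted server averaging."""
    total = sum(sizes)
    w = 0.0  # shared global model (a single scalar in this toy problem)
    for _ in range(rounds):
        # Every client starts from the current global model.
        local_models = [local_update(w, c, lr, local_steps) for c in centers]
        # Server step: average local models, weighted by client data size.
        w = sum(n * wk for n, wk in zip(sizes, local_models)) / total
    return w


centers = [1.0, 3.0, 8.0]  # different optima per client (non-iid stand-in)
sizes = [10, 10, 20]       # hypothetical client dataset sizes
w_final = fedavg(centers, sizes, rounds=200)
```

In this toy case all clients share the same curvature, so the iterates settle at the size-weighted mean of the client optima, which is also the minimizer of the weighted global objective.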
1 code implementation • 20 May 2022 • Xiang Li, Wenhai Wang, Lingfeng Yang, Jian Yang
Masked AutoEncoder (MAE) has recently led the trends of visual self-supervision area by an elegant asymmetric encoder-decoder design, which significantly optimizes both the pre-training efficiency and fine-tuning accuracy.
Ranked #37 on Object Detection on COCO minival
2 code implementations • 10 Jan 2024 • Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun, Rui Kong, Yile Wang, Hanfei Geng, Jian Luan, Xuefeng Jin, Zilong Ye, Guanjing Xiong, Fan Zhang, Xiang Li, Mengwei Xu, Zhijun Li, Peng Li, Yang Liu, Ya-Qin Zhang, Yunxin Liu
Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.
1 code implementation • 25 Feb 2019 • Yuan Hu, Yunpeng Chen, Xiang Li, Jiashi Feng
In this work, we propose a novel dynamic feature fusion strategy that assigns different fusion weights for different input images and locations adaptively.
1 code implementation • 11 Mar 2024 • YuXuan Li, Xiang Li, Weijie Li, Qibin Hou, Li Liu, Ming-Ming Cheng, Jian Yang
To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created.
Ranked #1 on 2D Object Detection on SARDet-100K (using extra training data)
2 code implementations • ICCV 2021 • Kun Wang, Zhenyu Zhang, Zhiqiang Yan, Xiang Li, Baobei Xu, Jun Li, Jian Yang
Monocular depth estimation aims at predicting depth from a single image or video.
1 code implementation • 21 Mar 2024 • Qiushi Sun, Zhirui Chen, Fangzhi Xu, Kanzhi Cheng, Chang Ma, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Qipeng Guo, Xipeng Qiu, Pengcheng Yin, XiaoLi Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu
Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains.
3 code implementations • 19 Jun 2020 • Yu Zheng, Chen Gao, Xiang Li, Xiangnan He, Depeng Jin, Yong Li
We further demonstrate that the learned embeddings successfully capture the desired causes, and show that DICE guarantees the robustness and interpretability of recommendation.
1 code implementation • 30 Mar 2022 • Gang Li, Xiang Li, Yujie Wang, Yichao Wu, Ding Liang, Shanshan Zhang
Specifically, for pseudo labeling, existing works only focus on the classification score yet fail to guarantee the localization precision of pseudo boxes; For consistency training, the widely adopted random-resize training only considers the label-level consistency but misses the feature-level one, which also plays an important role in ensuring the scale invariance.
1 code implementation • 12 Jul 2022 • Gang Li, Xiang Li, Yujie Wang, Yichao Wu, Ding Liang, Shanshan Zhang
Specifically, we propose the Inverse NMS Clustering (INC) and Rank Matching (RM) to instantiate the dense supervision, without the widely used, conventional sparse pseudo labels.
1 code implementation • 29 Nov 2022 • Zheng Li, Xiang Li, Lingfeng Yang, Borui Zhao, RenJie Song, Lei Luo, Jun Li, Jian Yang
In this paper, we propose a simple curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD), which controls the task difficulty level during the student's learning career through a dynamic and learnable temperature.
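The role of temperature in knowledge distillation can be sketched in Python as follows. This shows only the standard temperature-softened distillation loss, with the temperature as a plain argument. The learnable, dynamically adjusted temperature that the excerpt describes is not reproduced here. The function names and logits are hypothetical.

```python
import math


def softmax_t(logits, temperature):
    """Softmax of logits divided by a temperature; higher values flatten it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl


teacher = [4.0, 1.0, 0.0]  # made-up logits
student = [2.0, 1.5, 0.5]
loss_low_t = kd_loss(student, teacher, temperature=1.0)
loss_high_t = kd_loss(student, teacher, temperature=4.0)
```

Because the temperature changes how much of the teacher's non-argmax probability mass the student must match, varying it over training is one way to vary the difficulty of the distillation task.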
1 code implementation • COLING 2018 • Zikun Hu, Xiang Li, Cunchao Tu, Zhiyuan Liu, Maosong Sun
Specifically, our model outperforms other baselines by more than 50% in the few-shot scenario.
1 code implementation • 9 Nov 2023 • Jinjin Xu, Liwu Xu, Yuzhe Yang, Xiang Li, Fanyi Wang, Yanchun Xie, Yi-Jie Huang, Yaqian Li
Recent advancements in multi-modal large language models (MLLMs) have led to substantial improvements in visual understanding, primarily driven by sophisticated modality alignment strategies.
1 code implementation • 20 Jun 2023 • Jiabao Wang, Yuming Chen, Zhaohui Zheng, Xiang Li, Ming-Ming Cheng, Qibin Hou
Moreover, as mimicking the teacher's predictions is the target of KD, CrossKD offers more task-oriented information in contrast with feature imitation.
2 code implementations • 7 Nov 2022 • Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, Dan Zhu, Mengdi Sun, Ran Duan, Yan Gao, Lingshun Kong, Long Sun, Xiang Li, Xingdong Zhang, Jiawei Zhang, Yaqi Wu, Jinshan Pan, Gaocheng Yu, Jin Zhang, Feng Zhang, Zhe Ma, Hongbin Wang, Hojin Cho, Steve Kim, Huaen Li, Yanbo Ma, Ziwei Luo, Youwei Li, Lei Yu, Zhihong Wen, Qi Wu, Haoqiang Fan, Shuaicheng Liu, Lize Zhang, Zhikai Zong, Jeremy Kwon, Junxi Zhang, Mengyuan Li, Nianxiang Fu, Guanchen Ding, Han Zhu, Zhenzhong Chen, Gen Li, Yuanfan Zhang, Lei Sun, Dafeng Zhang, Neo Yang, Fitz Liu, Jerry Zhao, Mustafa Ayazoglu, Bahri Batuhan Bilecen, Shota Hirose, Kasidis Arunruangsirilert, Luo Ao, Ho Chun Leung, Andrew Wei, Jie Liu, Qiang Liu, Dahai Yu, Ao Li, Lei Luo, Ce Zhu, Seongmin Hong, Dongwon Park, Joonhee Lee, Byeong Hyun Lee, Seunggyu Lee, Se Young Chun, Ruiyuan He, Xuhao Jiang, Haihang Ruan, Xinjian Zhang, Jing Liu, Garas Gendy, Nabil Sabor, Jingchao Hou, Guanghui He
While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints.
1 code implementation • CVPR 2022 • Lingfeng Yang, Xiang Li, RenJie Song, Borui Zhao, Juntian Tao, Shihao Zhou, Jiajun Liang, Jian Yang
Therefore, it is helpful to leverage additional information, e.g., the locations and dates for data shooting, which can be easily accessible but rarely exploited.
2 code implementations • 9 May 2023 • Xiang Li, Congcong Wen, Yuan Hu, Zhenghang Yuan, Xiao Xiang Zhu
Existing AI-related research in remote sensing primarily focuses on visual understanding tasks while neglecting the semantic understanding of the objects and their relationships.
1 code implementation • 5 Mar 2024 • Zheng Li, Xiang Li, Xinyi Fu, Xin Zhang, Weiqiang Wang, Shuo Chen, Jian Yang
To the best of our knowledge, we are the first to (1) perform unsupervised domain-specific prompt-driven knowledge distillation for CLIP, and (2) establish a practical pre-storing mechanism of text features as shared class vectors between teacher and student.
Ranked #1 on Prompt Engineering on Oxford-IIIT Pet Dataset
1 code implementation • 12 Oct 2021 • Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo
Reinforcement Learning (RL) can be considered as a sequence modeling task: given a sequence of past state-action-reward experiences, an agent predicts a sequence of next actions.
1 code implementation • NeurIPS 2020 • Tao Zhuang, Zhixuan Zhang, Yuheng Huang, Xiaoyi Zeng, Kai Shuang, Xiang Li
Experimentally, we show that structured pruning using polarization regularizer achieves much better results than using L1 regularizer.
1 code implementation • 16 Sep 2023 • Cheng Chen, Juzheng Miao, Dufan Wu, Zhiling Yan, Sekeun Kim, Jiang Hu, Aoxiao Zhong, Zhengliang Liu, Lichao Sun, Xiang Li, Tianming Liu, Pheng-Ann Heng, Quanzheng Li
The Segment Anything Model (SAM), a foundation model for general image segmentation, has demonstrated impressive zero-shot performance across numerous natural image segmentation tasks.
1 code implementation • 31 Mar 2022 • Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Huashan Pan, Xiulin Li, Helen Meng
Inspired by Flat-LAttice Transformer (FLAT), we propose an end-to-end Chinese text normalization model, which accepts Chinese characters as direct input and integrates expert knowledge contained in rules into the neural network, both of which contribute to the superior performance of the proposed model on the text normalization task.
2 code implementations • ICLR 2018 • Zhengdao Chen, Xiang Li, Joan Bruna
This graph inference task can be recast as a node-wise graph classification problem, and, as such, computational detection thresholds can be translated in terms of learning within appropriate models.
1 code implementation • 6 Dec 2020 • Jia-Qi Yang, Xiang Li, Shuguang Han, Tao Zhuang, De-Chuan Zhan, Xiaoyi Zeng, Bin Tong
To strike a balance in this trade-off, we propose Elapsed-Time Sampling Delayed Feedback Model (ES-DFM), which models the relationship between the observed conversion distribution and the true conversion distribution.
1 code implementation • 14 Feb 2023 • Chengcheng Han, Renyu Zhu, Jun Kuang, FengJiao Chen, Xiang Li, Ming Gao, Xuezhi Cao, Wei Wu
We design an improved triplet network to map samples and prototype vectors into a low-dimensional space where they are easier to classify, and propose an adaptive margin for each entity type.
4 code implementations • CVPR 2018 • Jifeng Wang, Xiang Li, Le Hui, Jian Yang
Specifically, a shadow image is fed into the first generator which produces a shadow detection mask.
Ranked #3 on RGB Salient Object Detection on ISTD
2 code implementations • 17 Jul 2023 • Ruichen Li, Haotian Ye, Du Jiang, Xuelan Wen, Chuwei Wang, Zhe Li, Xiang Li, Di He, Ji Chen, Weiluo Ren, LiWei Wang
Neural network-based variational Monte Carlo (NN-VMC) has emerged as a promising cutting-edge technique of ab initio quantum chemistry.
1 code implementation • NeurIPS 2019 • Han Zhu, Daqing Chang, Ziru Xu, Pengye Zhang, Xiang Li, Jie He, Han Li, Jian Xu, Kun Gai
The previous work Tree-based Deep Model (TDM) (Zhu et al., 2018) greatly improves recommendation accuracy using a tree index.
1 code implementation • 28 Jul 2023 • Yuan Hu, Jianlong Yuan, Congcong Wen, Xiaonan Lu, Xiang Li
This dataset consists of human-annotated captions and visual question-answer pairs, allowing for a comprehensive assessment of VLMs in the context of RS.
1 code implementation • 2 Apr 2019 • Lingjing Wang, Jianchun Chen, Xiang Li, Yi Fang
In contrast, the proposed point registration neural network (PR-Net) actively learns the registration pattern as a parametric function from a training dataset, and consequently predicts the desired geometric transformation to align a pair of point sets.
1 code implementation • 18 Jan 2024 • Chenghua Gong, Yao Cheng, Xiang Li, Caihua Shan, Siqiang Luo
Graphs are structured data that model complex relations between real-world entities.
1 code implementation • 13 Mar 2024 • Yupeng Zheng, Xiang Li, Pengfei Li, Yuhang Zheng, Bu Jin, Chengliang Zhong, Xiaoxiao Long, Hao Zhao, Qichao Zhang
However, existing methods rely on a complex cascaded framework with relatively limited information to restore 3D scenes, including a dependency on supervision solely on the whole network's output, single-frame input, and the utilization of a small backbone.
1 code implementation • 16 Dec 2022 • Yimian Dai, Xiang Li, Fei Zhou, Yulei Qian, Yaohong Chen, Jian Yang
Finally, we present a new research benchmark for infrared small target detection, consisting of the SIRST-V2 dataset of real-world, high-resolution single-frame targets, the normalized contrast evaluation metric, and the DeepInfrared toolkit for detection.
1 code implementation • 29 Oct 2023 • Zhiling Yan, Kai Zhang, Rong Zhou, Lifang He, Xiang Li, Lichao Sun
In this paper, we critically evaluate the capabilities of the state-of-the-art multimodal large language model, i.e., GPT-4 with Vision (GPT-4V), on the Visual Question Answering (VQA) task.
1 code implementation • 22 May 2023 • Zheng Li, YuXuan Li, Penghai Zhao, RenJie Song, Xiang Li, Jian Yang
Diffusion models have recently achieved astonishing performance in generating high-fidelity photo-realistic images.
1 code implementation • 15 May 2022 • Xiang Li, Renyu Zhu, Yao Cheng, Caihua Shan, Siqiang Luo, Dongsheng Li, Weining Qian
Further, other homophilous nodes excluded from the neighborhood are ignored during information aggregation.
Ranked #2 on Node Classification on pokec
1 code implementation • 12 Apr 2022 • Wenjing Zhu, Xiang Li
Speech Emotion Recognition (SER) is a fundamental task to predict the emotion label from speech data.
1 code implementation • 31 Mar 2024 • Xiang Li, Fan Bu, Ambuj Mehrish, Yingting Li, Jiale Han, Bo Cheng, Soujanya Poria
The pursuit of modern models, like Diffusion Models (DMs), holds promise for achieving high-fidelity, real-time speech synthesis.
1 code implementation • 9 Sep 2022 • Yeon Seonwoo, Guoyin Wang, Changmin Seo, Sajal Choudhary, Jiwei Li, Xiang Li, Puyang Xu, Sunghyun Park, Alice Oh
In this work, we show that the semantic meaning of a sentence is also determined by nearest-neighbor sentences that are similar to the input sentence.
1 code implementation • 4 Jul 2023 • Xiang Li, Varun Belagali, Jinghuan Shang, Michael S. Ryoo
Sequence modeling approaches have shown promising results in robot imitation learning.
2 code implementations • 25 Aug 2023 • Yonghao Song, Bingchuan Liu, Xiang Li, Nanlin Shi, Yijun Wang, Xiaorong Gao
This paper presents a self-supervised framework to demonstrate the feasibility of learning image representations from EEG signals, particularly for object recognition.
1 code implementation • 20 Mar 2023 • Zhengliang Liu, Yue Huang, Xiaowei Yu, Lu Zhang, Zihao Wu, Chao Cao, Haixing Dai, Lin Zhao, Yiwei Li, Peng Shu, Fang Zeng, Lichao Sun, Wei Liu, Dinggang Shen, Quanzheng Li, Tianming Liu, Dajiang Zhu, Xiang Li
The digitization of healthcare has facilitated the sharing and re-using of medical data but has also raised concerns about confidentiality and privacy.
3 code implementations • 7 Jun 2019 • Lingjing Wang, Xiang Li, Jianchun Chen, Yi Fang
In contrast to previous efforts (e.g., coherent point drift), CPD-Net can learn a displacement field function to estimate the geometric transformation from a training dataset and, consequently, predict the desired geometric transformation for the alignment of previously unseen pairs without any additional iterative optimization process.
1 code implementation • 14 Jan 2023 • Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ming-Ming Cheng
In this paper, we study the spatial disequilibrium problem of modern object detectors and propose to quantify this "spatial bias" by measuring the detection performance over zones.
1 code implementation • 20 Oct 2023 • Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ping Wang, Ming-Ming Cheng
A fundamental limitation of object detectors is that they suffer from "spatial bias", and in particular perform less satisfactorily when detecting objects near image borders.
1 code implementation • 31 Jan 2018 • Xi Cheng, Xiang Li, Ying Tai, Jian Yang
Single image super resolution is a very important computer vision task, with a wide range of applications.
Ranked #34 on Image Super-Resolution on BSD100 - 4x upscaling
1 code implementation • ACL 2022 • Bin Liang, Chenwei Lou, Xiang Li, Min Yang, Lin Gui, Yulan He, Wenjie Pei, Ruifeng Xu
Then, the descriptions of the objects serve as a bridge to determine the importance of the association between the objects of the image modality and the contextual words of the text modality, so as to build a cross-modal graph for each multi-modal instance.
1 code implementation • 12 Feb 2024 • Xiaohao Xu, Tianyi Zhang, Sibo Wang, Xiang Li, Yongqi Chen, Ye Li, Bhiksha Raj, Matthew Johnson-Roberson, Xiaonan Huang
To this end, we propose a novel, customizable pipeline for noisy data synthesis, aimed at assessing the resilience of multi-modal SLAM models against various perturbations.
1 code implementation • 5 Sep 2023 • Renyu Zhu, Chengcheng Han, Yong Qian, Qiushi Sun, Xiang Li, Ming Gao, Xuezhi Cao, Yunsen Xian
To solve these issues, in this paper, we propose a novel exchanging-based multimodal fusion model MuSE for text-vision fusion based on Transformer.
1 code implementation • 6 Feb 2018 • Wenhai Wang, Xiang Li, Jian Yang, Tong Lu
Based on an analysis that reveals the equivalence of modern networks, we find that both ResNet and DenseNet are essentially derived from the same "dense topology", yet they only differ in the form of connection: addition (dubbed "inner link") vs. concatenation (dubbed "outer link").
1 code implementation • 8 May 2020 • Xiang Li, Lin Zhang, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang
Deep learning has revolutionized object detection thanks to large-scale datasets, but their object categories are still arguably very limited.
1 code implementation • 20 Oct 2021 • Guanjie Huang, Hongjian He, Xiang Li, Xingchen Li, Ziang Liu
Currently, it is hard to compare and evaluate different style transfer algorithms due to chaotic definitions of style and the absence of agreed objective validation methods in the study of style transfer.
1 code implementation • 18 Jan 2024 • Xianfu Cheng, Weixiao Zhou, Xiang Li, Xiaoming Chen, Jian Yang, Tongliang Li, Zhoujun Li
In this work, we propose the VIsion Permutable extractor for fast and efficient scene Text Recognition (VIPTR), which achieves an impressive balance between high performance and rapid inference speeds in the domain of STR.
1 code implementation • 1 Feb 2021 • Meimei Shang, Fei Gao, Xiang Li, Jingjie Zhu, Lingna Dai
In this paper, we propose a novel method to learn face sketch synthesis models by using unpaired data.
1 code implementation • 4 Jul 2022 • Xiang Li, Jinglu Wang, Xiaohao Xu, Xiao Li, Bhiksha Raj, Yan Lu
Referring Video Object Segmentation (R-VOS) is a challenging task that aims to segment an object in a video based on a linguistic expression.
Ranked #11 on Referring Video Object Segmentation on Refer-YouTube-VOS
1 code implementation • 14 Apr 2023 • Yiqun Yao, Siqi Fan, Xiusheng Huang, Xuezhi Fang, Xiang Li, Ziyi Ni, Xin Jiang, Xuying Meng, Peng Han, Shuo Shang, Kang Liu, Aixin Sun, Yequan Wang
With around 14% of the one-time pre-training cost, we can accurately forecast the loss for models up to 52B.
3 code implementations • 29 Sep 2023 • Xiang Li, Jinglu Wang, Xiaohao Xu, Xiulian Peng, Rita Singh, Yan Lu, Bhiksha Raj
We propose a semantic decomposition method based on product quantization, where the multi-source semantics can be decomposed and represented by several disentangled and noise-suppressed single-source semantics.
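As background for the excerpt above, generic product quantization can be sketched in Python: a vector is split into equal sub-vectors, and each sub-vector is replaced by the nearest codeword from its own sub-codebook. This is the textbook technique only. How the paper ties sub-codebooks to single-source semantics or noise suppression is not described in this listing and is not modeled here. The codebooks, function names, and numbers are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebooks):
    """Return one codeword index per sub-vector (nearest neighbor per sub-codebook)."""
    num_sub = len(codebooks)
    sub_dim = len(vector) // num_sub
    codes = []
    for s, codebook in enumerate(codebooks):
        sub = vector[s * sub_dim:(s + 1) * sub_dim]
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(sub, codebook[k]),
        )
        codes.append(best)
    return codes


def pq_decode(codes, codebooks):
    """Concatenate the selected codewords to rebuild an approximate vector."""
    out = []
    for code, codebook in zip(codes, codebooks):
        out.extend(codebook[code])
    return out


# Two sub-codebooks with two 2-D codewords each (hypothetical values).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
vector = [0.9, 1.2, 0.8, 0.1]
codes = pq_encode(vector, codebooks)
approx = pq_decode(codes, codebooks)
```

Each sub-space is quantized independently, which is what lets a small set of sub-codebooks represent a large number of composite vectors.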
2 code implementations • 7 Mar 2024 • Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazaki, Hao Chen, Xiaonan Huang, Bhiksha Raj
Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive.
1 code implementation • 5 Apr 2024 • JunHao Chen, Xiang Li, Xiaojun Ye, Chao Li, Zhaoxin Fan, Hao Zhao
The definition of an IDEA is the composition of multimodal inputs including text, image, and 3D models.
1 code implementation • 19 Aug 2019 • Congcong Wen, Lina Yang, Ling Peng, Xiang Li, Tianhe Chi
In this paper, we proposed a directionally constrained fully convolutional neural network (D-FCN) that can take the original 3D coordinates and LiDAR intensity as input; thus, it can directly apply to unstructured 3D point clouds for semantic labeling.
1 code implementation • 14 Mar 2022 • Lingfeng Yang, Xiang Li, Borui Zhao, RenJie Song, Jian Yang
In semantic segmentation, RM also surpasses the baseline and CutMix by 1.9 and 1.1 mIoU points under UperNet on ADE20K, respectively.
1 code implementation • 23 Nov 2022 • Ryan Burgert, Kanchana Ranasinghe, Xiang Li, Michael S. Ryoo
In this work, we explore how an off-the-shelf text-to-image diffusion model, trained without exposure to localization information, can ground various semantic phrases without segmentation-specific re-training.
1 code implementation • NeurIPS 2023 • Lingfeng Yang, Yueze Wang, Xiang Li, Xinlong Wang, Jian Yang
Previous works have suggested that incorporating visual prompts, such as colorful boxes or circles, can improve the ability of models to recognize objects of interest.
1 code implementation • COLING 2022 • Fangyu Lei, Shizhu He, Xiang Li, Jun Zhao, Kang Liu
In the real-world question answering scenarios, hybrid form combining both tabular and textual contents has attracted more and more attention, among which numerical reasoning problem is one of the most typical and challenging problems.
1 code implementation • CVPR 2023 • Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
Video generation remains a challenging task due to spatiotemporal complexity and the requirement of synthesizing diverse motions with temporal consistency.
1 code implementation • 27 May 2023 • Zhibin Lan, Jiawei Yu, Xiang Li, Wen Zhang, Jian Luan, Bin Wang, Degen Huang, Jinsong Su
Text image translation (TIT) aims to translate the source texts embedded in the image to target translations, which has a wide range of applications and thus has important research value.
1 code implementation • 19 Mar 2024 • Xiang Li, Zhenyu Li, Chen Shi, Yong Xu, Qing Du, Mingkui Tan, Jun Huang, Wei Lin
The task of financial analysis primarily encompasses two key areas: stock trend prediction and the corresponding financial question answering.
2 code implementations • 17 Apr 2023 • Chong Ma, Zihao Wu, Jiaqi Wang, Shaochen Xu, Yaonai Wei, Zhengliang Liu, Xi Jiang, Lei Guo, Xiaoyan Cai, Shu Zhang, Tuo Zhang, Dajiang Zhu, Dinggang Shen, Tianming Liu, Xiang Li
The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians, and it is typically written by radiologists based on the 'Findings' section.
2 code implementations • ICCV 2023 • Lingyu Xiao, Xiang Li, Sen Yang, Wankou Yang
In this paper, we revisit the limitations of anchor-based lane detection methods, which have predominantly focused on fixed anchors that stem from the edges of the image, disregarding their versatility and quality.
1 code implementation • 14 Feb 2022 • Qiyang Zhang, Xiang Li, Xiangying Che, Xiao Ma, Ao Zhou, Mengwei Xu, Shangguang Wang, Yun Ma, Xuanzhe Liu
Deploying deep learning (DL) on mobile devices has been a notable trend in recent years.
1 code implementation • 31 Oct 2023 • Srijan Das, Tanmay Jain, Dominick Reilly, Pranav Balaji, Soumyajit Karmakar, Shyam Marjit, Xiang Li, Abhijit Das, Michael S. Ryoo
We explore the appropriate SSL tasks that can be optimized alongside the primary task, the training schemes for these tasks, and the data scale at which they can be most effective.
1 code implementation • NeurIPS 2019 • Jianchun Chen, Lingjing Wang, Xiang Li, Yi Fang
To address this issue, we present an end-to-end trainable deep neural network, named Arbitrary Continuous Geometric Transformation Networks (Arbicon-Net), to directly predict the dense displacement field for pairwise image alignment.
1 code implementation • 28 Dec 2022 • Jianxiang Yu, Qingqing Ge, Xiang Li, Aoying Zhou
In addition, we propose a variant model AdaMEOW that adaptively learns soft-valued weights of negative samples to further improve node representation.
1 code implementation • 10 May 2023 • Di Jin, Luzhi Wang, Yizhen Zheng, Guojie Song, Fei Jiang, Xiang Li, Wei Lin, Shirui Pan
We design a dual-intent network to learn user intent from an attention mechanism and the distribution of historical data respectively, which can simulate users' decision-making process in interacting with a new item.
1 code implementation • 3 Jul 2023 • Haixing Dai, Chong Ma, Zhiling Yan, Zhengliang Liu, Enze Shi, Yiwei Li, Peng Shu, Xiaozheng Wei, Lin Zhao, Zihao Wu, Fang Zeng, Dajiang Zhu, Wei Liu, Quanzheng Li, Lichao Sun, Shu Zhang, Tianming Liu, Xiang Li
Starting with an initial point prompt, SAM produces an initial mask, which is then fed into our proposed SAMAug to generate augmented point prompts.
2 code implementations • 23 Oct 2016 • Yiping Song, Rui Yan, Xiang Li, Dongyan Zhao, Ming Zhang
In this paper, we propose a novel ensemble of retrieval-based and generation-based dialog systems in the open domain.
1 code implementation • ACL 2022 • Bin Liang, Qinglin Zhu, Xiang Li, Min Yang, Lin Gui, Yulan He, Ruifeng Xu
In this paper, we propose a joint contrastive learning (JointCL) framework, which consists of stance contrastive learning and target-aware prototypical graph contrastive learning.
1 code implementation • 19 May 2023 • Fangyu Lei, Xiang Li, Yifan Wei, Shizhu He, Yiming Huang, Jun Zhao, Kang Liu
In this paper, we propose a three-stage TextTableQA framework S3HQA, which comprises a retriever, a selector, and a reasoner.
1 code implementation • NeurIPS 2021 • Chenjie Cao, Yuxin Hong, Xiang Li, Chengrong Wang, Chengming Xu, Xiangyang Xue, Yanwei Fu
To address these limitations, we propose a novel model -- image Local Autoregressive Transformer (iLAT), to better facilitate the locally guided image synthesis.
1 code implementation • 29 Mar 2022 • Zhifang Fan, Dan Ou, Yulong Gu, Bairan Fu, Xiang Li, Wentian Bao, Xin-yu Dai, Xiaoyi Zeng, Tao Zhuang, Qingwen Liu
In this paper, we propose a new perspective for context-aware users' behavior modeling by including the whole page-wisely exposed products and the corresponding feedback as contextualized page-wise feedback sequence.
1 code implementation • 25 Mar 2024 • Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo
In addition to faster inference, we discover the resulting models to yield surprisingly good accuracy on long-video tasks, even with no video specific information.
1 code implementation • 16 Nov 2020 • Yufeng Wang, Dan Li, Xiang Li, Min Yang
Further, this classifier is incorporated into the generative adversarial framework to help the generator to yield higher quality imputation results.
1 code implementation • 7 Dec 2022 • Xiang Li, Junbo Yin, Botian Shi, Yikang Li, Ruigang Yang, Jianbing Shen
In this paper, we present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS), which leverages the off-the-shelf 3D data, i.e., point clouds, together with the 3D boxes, as natural weak supervision for training the 2D image instance segmentation models.
1 code implementation • 17 Dec 2018 • Xiang Li, Shihao Ji
The proposed method is generic and can defend white-box and black-box attacks without the need of retraining the original CNN classifiers, and can further strengthen the defense by retraining CNN or end-to-end finetuning the whole pipeline.
1 code implementation • 19 May 2021 • Cong Xu, Xiang Li, Min Yang
Neural networks are susceptible to artificially designed adversarial perturbations.
Ranked #1 on Adversarial Attack on CIFAR-10
1 code implementation • 8 Nov 2021 • Xiang Li, Shihao Ji
Extensive experiments on VGGFace, Traffic Sign and ImageNet show that GDPA achieves higher attack success rates than state-of-the-art patch attacks, while models adversarially trained with GDPA demonstrate greater robustness to adversarial patch attacks than competing methods.
1 code implementation • 19 Jul 2022 • Ningyi Liao, Dingheng Mo, Siqiang Luo, Xiang Li, Pengcheng Yin
Recent advances in data processing have stimulated the demand for learning graphs of very large scales.
1 code implementation • 7 Oct 2022 • Nuo Chen, Qiushi Sun, Renyu Zhu, Xiang Li, Xuesong Lu, Ming Gao
To interpret these models, some probing methods have been applied.
1 code implementation • 16 Oct 2022 • Jianing Wang, Wenkang Huang, Qiuhui Shi, Hongbin Wang, Minghui Qiu, Xiang Li, Ming Gao
In this paper, to address these problems, we introduce a seminal knowledge prompting paradigm and further propose a knowledge-prompting-based PLM framework KP-PLM.
1 code implementation • 30 Sep 2023 • Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, Lingpeng Kong
Large Language Models (LLMs) are evolving at an unprecedented pace and have exhibited considerable capability in the realm of natural language processing (NLP) with world knowledge.
1 code implementation • 8 Oct 2023 • Chengcheng Han, Xiaowei Du, Che Zhang, Yixin Lian, Xiang Li, Ming Gao, Baoyuan Wang
Chain-of-Thought (CoT) prompting has proven to be effective in enhancing the reasoning capabilities of Large Language Models (LLMs) with at least 100 billion parameters.
1 code implementation • 22 Nov 2023 • JunHao Chen, Peng Rong, Jingbo Sun, Chao Li, Xiang Li, Hongwu Lv
We introduce a large language model to parse the text and identify stylization goals and specific styles.
1 code implementation • 3 May 2020 • Xiang Li, Songcan Chen
In aligning, we characterize the global and local structures of multiple labels to be high-rank and low-rank, respectively.
1 code implementation • 23 Mar 2023 • Xiang Li, Ge Wu, Lingfeng Yang, Wenhai Wang, RenJie Song, Jian Yang
The various types of elements deposited in the training history are a rich resource for improving the learning of deep models.
1 code implementation • 2 Feb 2024 • Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj
Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment.
1 code implementation • Findings (NAACL) 2022 • Ziqian Zeng, Weimin Ni, Tianqing Fang, Xiang Li, Xinran Zhao, Yangqiu Song
In this paper, we propose to query a masked language model with cloze style prompts to obtain supervision signals.
1 code implementation • 5 Feb 2023 • Chengcheng Han, Yuhe Wang, Yingnan Fu, Xiang Li, Minghui Qiu, Ming Gao, Aoying Zhou
Few-shot learning has been used to tackle the problem of label scarcity in text classification, of which meta-learning based methods have shown to be effective, such as the prototypical networks (PROTO).
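For readers unfamiliar with prototypical networks, the core classification rule can be sketched in a few lines. This is a minimal illustration of the general idea (class prototype = mean of support embeddings, prediction = nearest prototype), not the method of the paper listed above; the function names and the toy 2-D "embeddings" are assumptions made for the example, and a real system would obtain embeddings from a trained encoder.

```python
from math import dist

def prototypes(support):
    """Compute one prototype per class as the mean of its support embeddings.

    support: dict mapping a label to a list of equal-length embedding vectors.
    """
    protos = {}
    for label, vecs in support.items():
        n = len(vecs)
        protos[label] = [sum(col) / n for col in zip(*vecs)]
    return protos

def classify(query, protos):
    """Assign the query embedding to the class with the nearest prototype."""
    return min(protos, key=lambda label: dist(query, protos[label]))

# Toy 2-way, 2-shot episode with hand-made embeddings (hypothetical data).
support = {
    "pos": [[1.0, 1.0], [3.0, 1.0]],
    "neg": [[-1.0, -1.0], [-3.0, -1.0]],
}
protos = prototypes(support)
label = classify([1.5, 0.5], protos)  # nearest prototype is the "pos" mean [2.0, 1.0]
```

Euclidean distance is used here for simplicity; other distance functions are a design choice.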
1 code implementation • 27 Oct 2023 • Habib Slim, Xiang Li, Yuchen Li, Mahmoud Ahmed, Mohamed Ayman, Ujjwal Upadhyay, Ahmed Abdelreheem, Arpit Prajapati, Suhail Pothigara, Peter Wonka, Mohamed Elhoseiny
In this work, we present 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks.
1 code implementation • IEEE Transactions on Multimedia 2020 • Fan Yang, Yang Wu, Zheng Wang, Xiang Li, Sakriani Sakti, Satoshi Nakamura
Therefore, previous works pre-train their models on rich-labeled photo retrieval data (i.e., source domain) and then fine-tune them on the limited-labeled sketch-to-photo retrieval data (i.e., target domain).
Ranked #1 on Image Retrieval on PKU-Reid
1 code implementation • 14 May 2023 • Qiushi Sun, Chengcheng Han, Nuo Chen, Renyu Zhu, Jingyang Gong, Xiang Li, Ming Gao
Large language models (LLMs) have shown increasing power on various natural language processing (NLP) tasks.
2 code implementations • 22 Sep 2023 • Xirong Cao, Xiang Li, Divyesh Jadav, Yanzhao Wu, Zhehui Chen, Chen Zeng, Wenqi Wei
Diffusion models have gained prominence in the image domain for their capabilities in data generation and transformation, achieving state-of-the-art performance in various tasks in both image and audio domains.
1 code implementation • 24 Sep 2023 • Sekeun Kim, Kyungsang Kim, Jiang Hu, Cheng Chen, Zhiliang Lyu, Ren Hui, Sunghwan Kim, Zhengliang Liu, Aoxiao Zhong, Xiang Li, Tianming Liu, Quanzheng Li
The Segment Anything Model (SAM) has gained significant attention for its robust generalization capabilities across diverse downstream tasks.
2 code implementations • 3 Feb 2018 • Zixiang Ding, Rui Xia, Jianfei Yu, Xiang Li, Jian Yang
Deep neural networks have recently been shown to achieve highly competitive performance in many computer vision tasks due to their abilities of exploring in a much larger hypothesis space.
1 code implementation • 3 Dec 2018 • Jie Zhao, Quanzheng Li, Xiang Li, Hongfeng Li, Li Zhang
Pap smear testing has been widely used for detecting cervical cancers based on the morphology properties of cell nuclei in microscopic image.
1 code implementation • 5 Mar 2021 • Shunyu Jiang, Fuli Feng, Weijian Chen, Xiang Li, Xiangnan He
Graph classification is a highly impactful task that plays a crucial role in a myriad of real-world applications such as molecular property prediction and protein function prediction. Aiming to handle the new classes with limited labeled graphs, few-shot graph classification has become a bridge of existing graph classification solutions and practical usage. This work explores the potential of metric-based meta-learning for solving few-shot graph classification. We highlight the importance of considering structural characteristics in the solution and propose a novel framework which explicitly considers global structure and local structure of the input graph.
1 code implementation • 30 May 2023 • Xiang Li, Chung-Ching Lin, Yinpeng Chen, Zicheng Liu, Jinglu Wang, Bhiksha Raj
The paper introduces PaintSeg, a new unsupervised method for segmenting objects without any training.
1 code implementation • 19 Jan 2024 • Zhengliang Liu, Jason Holmes, Wenxiong Liao, Chenbin Liu, Lian Zhang, Hongying Feng, Peilong Wang, Muhammad Ali Elahi, Hongmin Cai, Lichao Sun, Quanzheng Li, Xiang Li, Tianming Liu, Jiajian Shen, Wei Liu
ROND is specifically designed to address this gap in the domain of radiation oncology, a field that offers many opportunities for NLP exploration.
1 code implementation • CVPR 2023 • Kangyang Luo, Xiang Li, Yunshi Lan, Ming Gao
Federated Learning (FL) has emerged as a de facto machine learning area and has received rapidly increasing research interest from the community.
2 code implementations • ICCV 2023 • Renke Wang, Guimin Que, Shuo Chen, Xiang Li, Jun Li, Jian Yang
Our focus lies primarily on birds, a popular subject in 3D reconstruction, for which no existing single-view 3D transfer methods have been developed. The method we propose seeks to generate a 3D mesh shape and texture of a bird from two single-view images.
1 code implementation • 26 Aug 2023 • Mengwei Xu, Dongqi Cai, Yaozong Wu, Xiang Li, Shangguang Wang
Federated Learning (FL), a method to preserve user data privacy, is often employed in fine-tuning LLMs to downstream mobile tasks, an approach known as FedLLM.
1 code implementation • 30 May 2022 • Di Jin, Luzhi Wang, Yizhen Zheng, Xiang Li, Fei Jiang, Wei Lin, Shirui Pan
As most of the existing graph neural networks yield effective graph representations of a single graph, little effort has been made for jointly learning two graph representations and calculating their similarity score.
1 code implementation • 7 Jun 2023 • Yuting Zhang, Yiqing Wu, Ran Le, Yongchun Zhu, Fuzhen Zhuang, Ruidong Han, Xiang Li, Wei Lin, Zhulin An, Yongjun Xu
Different from traditional recommendation, takeaway recommendation faces two main challenges: (1) Dual Interaction-Aware Preference Modeling.
1 code implementation • 5 Jul 2023 • Hongmin Cai, Xiaoke Huang, Zhengliang Liu, Wenxiong Liao, Haixing Dai, Zihao Wu, Dajiang Zhu, Hui Ren, Quanzheng Li, Tianming Liu, Xiang Li
As AD impairs the patient's language understanding and expression ability, the speech of AD patients can serve as an indicator of this disease.
1 code implementation • 29 Jul 2023 • Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao
The resistance of pFL methods with parameter decoupling is attributed to the heterogeneous classifiers between malicious clients and benign counterparts.
1 code implementation • ICCV 2023 • Jiangwei Yu, Xiang Li, Xinran Zhao, Hongming Zhang, Yu-Xiong Wang
Learning about object state changes in Video Object Segmentation (VOS) is crucial for understanding and interacting with the visual world.
1 code implementation • 29 Nov 2023 • Xiang Li, Qianli Shen, Kenji Kawaguchi
The booming use of text-to-image generative models has raised concerns about their high risk of producing copyright-infringing content.
1 code implementation • 4 Mar 2024 • Lizhou Fan, Wenyue Hua, Xiang Li, Kaijie Zhu, Mingyu Jin, Lingyao Li, Haoyang Ling, Jinkui Chi, Jindong Wang, Xin Ma, Yongfeng Zhang
Understanding the reasoning capabilities of Multimodal Large Language Models (MLLMs) is an important area of research.
1 code implementation • IEEE International Conference on Communications 2020 • Yijun Su, Xiang Li, Baoping Liu, Daren Zha, Ji Xiang, Wei Tang, Neng Gao
With the popularity of location-based social networks (LBSNs), Point-of-Interest (POI) recommendation has become an essential location-based service to help people explore novel locations.
1 code implementation • ACL 2022 • Zhiyong Wu, Wei Bi, Xiang Li, Lingpeng Kong, Ben Kao
We propose knowledge internalization (KI), which aims to complement the lexical knowledge into neural dialog models.
1 code implementation • ACL 2022 • Renyu Zhu, Lei Yuan, Xiang Li, Ming Gao, Wenyuan Cai
In this paper, we consider human behaviors and propose the PGNN-EK model that consists of two main components.
1 code implementation • COLING 2022 • Zhongjian Miao, Xiang Li, Liyan Kang, Wen Zhang, Chulun Zhou, Yidong Chen, Bin Wang, Min Zhang, Jinsong Su
Most existing methods on robust neural machine translation (NMT) construct adversarial examples by injecting noise into authentic examples and indiscriminately exploit two types of examples.
1 code implementation • 2 Mar 2023 • Mengge Liu, Wen Zhang, Xiang Li, Jian Luan, Bin Wang, Yuhang Guo, Shuoying Chen
Simultaneous machine translation (SimulMT) models start translation before the end of the source sentence, making the translation monotonically aligned with the source sentence.
1 code implementation • IEEE International Joint Conference on Neural Network 2020 • Yijun Su, Jia-Dong Zhang, Xiang Li, Daren Zha, Ji Xiang, Wei Tang, Neng Gao
Recent studies mainly utilize social information, categorical information and/or geographical information to supplement the highly sparse check-in data.
1 code implementation • 3 May 2023 • Yucheng Shi, Hehuan Ma, Wenliang Zhong, Qiaoyu Tan, Gengchen Mai, Xiang Li, Tianming Liu, Junzhou Huang
To tackle these limitations, we propose a novel framework that leverages the power of ChatGPT for specific tasks, such as text classification, while improving its interpretability.
1 code implementation • 15 Nov 2023 • Yunshi Lan, Xiang Li, Xin Liu, Yang Li, Wei Qin, Weining Qian
This results in a set of candidate answers.
1 code implementation • 11 Sep 2017 • Xuan Peng, Xunzhang Gao, Xiang Li
To break this dependency between neighboring hidden units and speed up the convergence of training, a novel training strategy is proposed.
1 code implementation • 9 Aug 2019 • Xiang Li, Shihao Ji
Explaining the prediction of deep neural networks (DNNs) and semantic image compression are two active research areas of deep learning with numerous applications in decision-critical systems, such as surveillance cameras, drones and self-driving cars, where interpretable decisions are critical and storage/network bandwidth is limited.
1 code implementation • 1 Jan 2021 • Xiulong Yang, Hui Ye, Yang Ye, Xiang Li, Shihao Ji
We show that our Generative MMC (GMMC) can be trained discriminatively, generatively, or jointly for image classification and generation.
1 code implementation • 29 Sep 2021 • Liang Zongwei, Junan Yang, Keju Huang, Hui Liu, Lin Cui, Lingzhi Qu, Xiang Li
The interpretability of the current temporal KG forecasting models is manifested in providing the reasoning paths.
1 code implementation • IEEE International Conference on Mobile Data Management (MDM) 2018 • Yijun Su, Xiang Li, Wei Tang, Ji Xiang, Yuanye He
In this paper, we propose a unified location prediction framework to integrate the effect of history check-in and the influence of social circles.
2 code implementations • 10 Jun 2022 • Xiang Li, Jinghuan Shang, Srijan Das, Michael S. Ryoo
We investigate whether self-supervised learning (SSL) can improve online reinforcement learning (RL) from pixels.
1 code implementation • COLING 2022 • Yequan Wang, Xiang Li, Aixin Sun, Xuying Meng, Huaming Liao, Jiafeng Guo
CofeNet is able to extract complicated quotations with components of variable lengths and complicated structures.
1 code implementation • 27 Mar 2023 • Xiang Li, Mingfu Shao
Methods have been proposed to bridge paired-end reads in the presence of a reference genome (called reference-based bridging), but the algorithms are far from scaling to de novo bridging, as the underlying compacted de Bruijn graph (cdBG) used in the latter task often contains millions of vertices and edges.
1 code implementation • 26 Jul 2023 • Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj
This work unveils the enigmatic link between phonemes and facial features.
4 code implementations • CVPR 2019 • Xiang Li, Shuo Chen, Xiaolin Hu, Jian Yang
Theoretically, we find that Dropout would shift the variance of a specific neural unit when we transfer the state of that network from train to test.
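The train/test variance mismatch mentioned above can be seen in a small simulation. The sketch below assumes standard inverted dropout (keep each unit with probability `keep_prob` and rescale by `1 / keep_prob` during training, identity at test time) applied to zero-mean, unit-variance inputs; it illustrates the general effect only and is not the paper's own derivation. The function names are hypothetical.

```python
import random

def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def dropout_variance_shift(keep_prob=0.5, n=100_000, seed=0):
    """Return (train-time variance, test-time variance) of a unit under inverted dropout."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Training: drop with probability 1 - keep_prob, rescale survivors by 1 / keep_prob.
    train = [x / keep_prob if rng.random() < keep_prob else 0.0 for x in xs]
    # Testing: dropout is disabled, so the unit passes through unchanged.
    test = xs
    return variance(train), variance(test)

train_var, test_var = dropout_variance_shift()
# For zero-mean, unit-variance inputs the train-time variance is about 1 / keep_prob,
# while the test-time variance stays near 1.
```

The rescaling keeps the mean matched between the two modes but not the variance, so any downstream statistics collected during training (for example by a normalization layer) see a different variance than at test time.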
1 code implementation • AKBC 2020 • Dhruvesh Patel, Shib Sankar Dasgupta, Michael Boratko, Xiang Li, Luke Vilnis, Andrew McCallum
Box Embeddings [Vilnis et al., 2018, Li et al., 2019] represent concepts with hyperrectangles in $n$-dimensional space and are shown to be capable of modeling tree-like structures efficiently by training on a large subset of the transitive closure of the WordNet hypernym graph.
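As a rough illustration of how hyperrectangles can encode hierarchy, the sketch below represents each concept as an axis-aligned box and scores "a given b" as the fraction of b's volume that lies inside a. It uses plain hard-edged boxes with made-up coordinates; the cited papers learn the boxes from data and use their own parameterizations, so the helper names and the toy concepts here are assumptions.

```python
from math import prod

def volume(box):
    """Volume of an axis-aligned box given as (lower_corner, upper_corner)."""
    lo, hi = box
    # Negative side lengths (an empty box) are clamped to zero.
    return prod(max(h - l, 0.0) for l, h in zip(lo, hi))

def intersect(a, b):
    """Intersection of two boxes; may be empty (upper corner below lower corner)."""
    lo = tuple(max(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(min(x, y) for x, y in zip(a[1], b[1]))
    return lo, hi

def conditional(a, b):
    """P(a | b) = vol(a ∩ b) / vol(b), assuming b has positive volume."""
    return volume(intersect(a, b)) / volume(b)

# Hypothetical 2-D boxes: "dog" sits inside "animal", "car" is disjoint from both.
animal = ((0.0, 0.0), (4.0, 4.0))
dog = ((1.0, 1.0), (2.0, 2.0))
car = ((5.0, 5.0), (6.0, 6.0))

p_animal_given_dog = conditional(animal, dog)  # 1.0: every dog is an animal
p_dog_given_animal = conditional(dog, animal)  # 0.0625: dogs cover 1/16 of the animal box
p_car_given_animal = conditional(car, animal)  # 0.0: disjoint boxes
```

Containment gives a conditional probability of one in one direction only, which is what makes boxes a natural fit for asymmetric relations such as hypernymy. Note that disjoint hard-edged boxes yield exactly zero overlap regardless of how far apart they are.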
1 code implementation • 10 Jan 2022 • Lianghao Xia, Chao Huang, Yong Xu, Huance Xu, Xiang Li, WeiGuo Zhang
As the deep learning techniques have expanded to real-world recommendation tasks, many deep neural network based Collaborative Filtering (CF) models have been developed to project user-item interactions into latent feature space, based on various neural architectures, such as multi-layer perceptron, auto-encoder and graph neural networks.
1 code implementation • 23 Sep 2023 • Xiang Li, JunHao Chen, Chao Li, Hongwu Lv
Audio recognition in specialized areas such as birdsong and submarine acoustics faces challenges in large-scale pre-training due to the limitations in available samples imposed by sampling environments and specificity requirements.
1 code implementation • 19 Oct 2023 • Jianing Wang, Qiushi Sun, Nuo Chen, Chengyu Wang, Jun Huang, Ming Gao, Xiang Li
The recent success of large pre-trained language models (PLMs) heavily hinges on massive labeled data, which typically produces inferior performance in low-resource scenarios.
1 code implementation • 7 Nov 2023 • Enhong Liu, Joseph Suarez, Chenhui You, Bo Wu, BingCheng Chen, Jun Hu, Jiaxin Chen, Xiaolong Zhu, Clare Zhu, Julian Togelius, Sharada Mohanty, Weijun Hong, Rui Du, Yibing Zhang, Qinwen Wang, Xinhang Li, Zheng Yuan, Xiang Li, Yuejia Huang, Kun Zhang, Hanhui Yang, Shiqi Tang, Phillip Isola
In this paper, we present the results of the NeurIPS-2022 Neural MMO Challenge, which attracted 500 participants and received over 1,600 submissions.
1 code implementation • 20 Feb 2024 • Xiang Li, Yunshi Lan, Chao Yang
Recently, numerous new benchmarks have been established to evaluate the performance of large language models (LLMs) via either computing a holistic score or employing another LLM as a judge.
1 code implementation • 19 Mar 2024 • Chong Ma, Hanqi Jiang, WenTing Chen, Zihao Wu, Xiaowei Yu, Fang Zeng, Lei Guo, Dajiang Zhu, Tuo Zhang, Dinggang Shen, Tianming Liu, Xiang Li
Additionally, we explore the impact of varying amounts of eye-gaze data on model performance, highlighting the feasibility and utility of integrating this auxiliary data into multi-modal pre-training.
1 code implementation • 29 Dec 2021 • Xiang Li, Wenhao Yang, Jiadong Liang, Zhihua Zhang, Michael I. Jordan
We study Q-learning with Polyak-Ruppert averaging in a discounted Markov decision process in synchronous and tabular settings.
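The setting named above can be sketched on a toy problem. In the synchronous tabular setting, every state-action pair is updated at each iteration from a freshly sampled next state, and Polyak-Ruppert averaging reports the running mean of the iterates instead of the last one. The two-state MDP, the discount factor, and the step size `t ** -0.7` below are all assumptions chosen for illustration, not values taken from the paper.

```python
import random

# Hypothetical 2-state, 2-action MDP invented for this example.
GAMMA = 0.5
REWARD = [[0.0, 1.0], [0.5, 0.0]]    # REWARD[s][a]
P_NEXT0 = [[0.9, 0.2], [0.5, 0.7]]   # probability that the next state is 0, given (s, a)
PAIRS = [(s, a) for s in range(2) for a in range(2)]

def optimal_q(iters=200):
    """Reference Q* computed by value iteration with the known model."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        v = [max(row) for row in q]
        q = [[REWARD[s][a] + GAMMA * (P_NEXT0[s][a] * v[0] + (1 - P_NEXT0[s][a]) * v[1])
              for a in range(2)] for s in range(2)]
    return q

def averaged_q_learning(steps=5000, seed=0):
    """Synchronous Q-learning; returns (last iterate, Polyak-Ruppert average)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    avg = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(1, steps + 1):
        eta = t ** -0.7                   # polynomially decaying step size (assumed)
        v = [max(row) for row in q]       # greedy values from the previous iterate
        new_q = [row[:] for row in q]
        for s, a in PAIRS:                # synchronous: update every (s, a) pair
            nxt = 0 if rng.random() < P_NEXT0[s][a] else 1
            target = REWARD[s][a] + GAMMA * v[nxt]
            new_q[s][a] = (1 - eta) * q[s][a] + eta * target
        q = new_q
        for s, a in PAIRS:                # running mean of all iterates so far
            avg[s][a] += (q[s][a] - avg[s][a]) / t
    return q, avg

q_star = optimal_q()
last, averaged = averaged_q_learning()
max_error = max(abs(averaged[s][a] - q_star[s][a]) for s, a in PAIRS)
```

Averaging smooths out the sampling noise that remains in the last iterate when the step size decays slowly, which is the usual motivation for studying the averaged estimator.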
1 code implementation • 28 Aug 2023 • Jinliang Yuan, Chen Yang, Dongqi Cai, Shihe Wang, Xin Yuan, Zeling Zhang, Xiang Li, Dingge Zhang, Hanzi Mei, Xianqing Jia, Shangguang Wang, Mengwei Xu
Concurrently, each app contributes a concise, offline fine-tuned "adapter" tailored to distinct downstream tasks.
1 code implementation • 14 Oct 2023 • Zhihui Zhang, Jianxiang Yu, Xiang Li
Session-based recommendation (SBR) is a task that aims to predict items based on anonymous sequences of user behaviors in a session.
1 code implementation • 21 Nov 2023 • Shu Zheng, Tiandi Ye, Xiang Li, Ming Gao
We theoretically show that the consensus mechanism can guarantee the convergence of the global objective.
no code implementations • 31 May 2018 • Yu Zhao, Xiang Li, Wei Zhang, Shijie Zhao, Milad Makkie, Mo Zhang, Quanzheng Li, Tianming Liu
Simultaneous modeling of the spatio-temporal variation patterns of brain functional network from 4D fMRI data has been an important yet challenging problem for the field of cognitive neuroscience and medical image analysis.
no code implementations • 28 May 2018 • Yabo Ni, Dan Ou, Shichen Liu, Xiang Li, Wenwu Ou, An-Xiang Zeng, Luo Si
In this work, we propose to learn universal user representations across multiple tasks for more effective personalization.
no code implementations • ACL 2018 • Luke Vilnis, Xiang Li, Shikhar Murty, Andrew McCallum
Embedding methods which enforce a partial order or lattice structure over the concept space, such as Order Embeddings (OE) (Vendrov et al., 2016), are a natural way to model transitive relational data (e.g., entailment graphs).
no code implementations • 9 Feb 2018 • Shuo Chen, Chen Gong, Jian Yang, Xiang Li, Yang Wei, Jun Li
In the distinguishment stage, a metric is exhaustively learned to distinguish both the adversarial pairs and the original training pairs as well as possible.
no code implementations • 6 Dec 2017 • Le Hui, Xiang Li, Jiaxin Chen, Hongliang He, Chen Gong, Jian Yang
Unsupervised Image-to-Image Translation has advanced spectacularly in recent years.
no code implementations • 31 Oct 2017 • Zhe Guo, Xiang Li, Heng Huang, Ning Guo, Quanzheng Li
Image analysis using more than one modality (i.e., multi-modal) has been increasingly applied in the field of biomedical imaging.
no code implementations • 23 Oct 2017 • Mo Zhang, Xiang Li, Mengjia Xu, Quanzheng Li
Reliable cell segmentation and classification from biomedical images is a crucial step for both scientific research and clinical practice.
no code implementations • 1 Aug 2017 • Xiang Li, Luke Vilnis, Andrew McCallum
Recent work in learning ontologies (hierarchical and partially-ordered structures) has leveraged the intrinsic geometry of spaces of learned representations to make predictions that automatically obey complex structural constraints.
no code implementations • 31 Jul 2017 • Jing Mei, Eryu Xia, Xiang Li, Guotong Xie
Precision medicine requires precise disease risk prediction models.
no code implementations • 21 Jul 2017 • Seongah Jeong, Xiang Li, Jiarui Yang, Quanzheng Li, Vahid Tarokh
In order to address the limitations of the unsupervised DLSC-based fMRI studies, we utilize the prior knowledge of task paradigm in the learning step to train a data-driven dictionary and to model the sparse representation.
no code implementations • 19 Jul 2017 • Xiang Li, Aoxiao Zhong, Ming Lin, Ning Guo, Mu Sun, Arkadiusz Sitek, Jieping Ye, James Thrall, Quanzheng Li
However, the development of a robust and reliable deep learning model for computer-aided diagnosis is still highly challenging due to the combination of the high heterogeneity in the medical images and the relative lack of training samples.
no code implementations • 29 May 2017 • Songting Shi, Xiang Li, Arkadiusz Sitek, Quanzheng Li
In this article, we derive a Bayesian model for learning the sparse and low-rank PARAFAC decomposition of an observed tensor with missing values via the elastic net, with the property of finding the true rank and a sparse factor matrix that is robust to noise.
no code implementations • NeurIPS 2016 • Xiang Li, Tao Qin, Jian Yang, Tie-Yan Liu
Based on the 2-Component shared embedding, we design a new RNN algorithm and evaluate it using the language modeling task on several benchmark datasets.
no code implementations • 24 Nov 2015 • Dekang Zhu, Dan P. Guralnik, Xuezhi Wang, Xiang Li, Bill Moran
Distance-based hierarchical clustering (HC) methods are widely used in unsupervised data analysis but few authors take account of uncertainty in the distance data.
no code implementations • CVPR 2016 • Jin-Jie You, An-Cong Wu, Xiang Li, Wei-Shi Zheng
Since only limited information can be exploited from still images, it is hard (if not impossible) to overcome the occlusion, pose and camera-view change, and lighting variation problems.
no code implementations • 26 Apr 2016 • Shangxuan Wu, Ying-Cong Chen, Xiang Li, An-Cong Wu, Jin-Jie You, Wei-Shi Zheng
In this paper, we focus on the feature representation and claim that hand-crafted histogram features can be complementary to Convolutional Neural Network (CNN) features.
no code implementations • 15 Apr 2016 • Xiang Li, Lili Mou, Rui Yan, Ming Zhang
In this paper, we propose StalemateBreaker, a conversation system that can proactively introduce new content when appropriate.
no code implementations • 25 Nov 2015 • Dekang Zhu, Dan P. Guralnik, Xuezhi Wang, Xiang Li, Bill Moran
We derive a statistical model for estimation of a dendrogram from single linkage hierarchical clustering (SLHC) that takes account of uncertainty through noise or corruption in the measurements of separation of data.
no code implementations • 28 Aug 2014 • Chao Zhang, DaCheng Tao, Tao Hu, Xiang Li
We are mainly concerned with two theoretical questions: 1) under what conditions does RMTL perform better with a smaller task sample size than STL?
no code implementations • ECCV 2018 • Xiang Li, An-Cong Wu, Wei-Shi Zheng
The main idea is to learn to attack the feature extractor on the target people by using a GAN to generate very target-like images (imposters), while the model makes the feature extractor learn to tolerate the attack through discriminative learning, so as to realize group-based verification.
no code implementations • 6 Aug 2018 • Jiasha Liu, Xiang Li, Hui Ren, Quanzheng Li
The framework combines two first-level modules: a direct estimation module and a segmentation module.
no code implementations • 1 Oct 2018 • Xiang Li, Qitian Chen, Xing Wang, Ning Guo, Nan Wu, Quanzheng Li
In this work, we developed a network inference method from incomplete data ("PathInf"), as massive and non-uniformly distributed missing values are a common challenge in practical problems.
no code implementations • 8 Oct 2018 • Xi Cheng, Xiang Li, Jian Yang
Single image super resolution is of great importance as a low-level computer vision task.
no code implementations • 17 Oct 2018 • Jing Mei, Shiwan Zhao, Feng Jin, Eryu Xia, Haifeng Liu, Xiang Li
In healthcare, applying deep learning models to electronic health records (EHRs) has drawn considerable attention.
no code implementations • 2 Nov 2018 • Xiang Li, Haiyang Xue, Wei Chen, Yang Liu, Yang Feng, Qun Liu
Although neural machine translation (NMT) has achieved impressive progress recently, it is usually trained on the clean parallel data set and hence cannot work well when the input sentence is the production of the automatic speech recognition (ASR) system due to the enormous errors in the source.
no code implementations • 18 Dec 2018 • Jiechao Ma, Xiang Li, Hongwei Li, Bjoern H. Menze, Sen Liang, Rongguo Zhang, Wei-Shi Zheng
In this paper, we propose a novel and effective abnormality detector implementing the attention mechanism and group convolution on 3D single-shot detector (SSD) called group-attention SSD (GA-SSD).
no code implementations • ECCV 2018 • Zhen-Yu Zhang, Zhen Cui, Chunyan Xu, Zequn Jie, Xiang Li, Jian Yang
In this paper, we propose a novel joint Task-Recursive Learning (TRL) framework for the closing-loop semantic segmentation and monocular depth estimation tasks.
Ranked #76 on Semantic Segmentation on NYU Depth v2
no code implementations • ICLR 2019 • Xiang Li, Luke Vilnis, Dongxu Zhang, Michael Boratko, Andrew McCallum
However, the hard edges of the boxes present difficulties for standard gradient based optimization; that work employed a special surrogate function for the disjoint case, but we find this method to be fragile.