no code implementations • 3 Sep 2024 • Yuchen Yan, Jin Jiang, Yang Liu, Yixin Cao, Xin Xu, Mengdi Zhang, Xunliang Cai, Jian Shao
To the best of our knowledge, we are the first to introduce the spontaneous step-level self-correction ability of LLMs in mathematical reasoning.
no code implementations • 14 Mar 2024 • Chang Zong, Yuyan Chen, Weiming Lu, Jian Shao, Yongfeng Huang, Heng Chang, Yueting Zhuang
Large Language Models (LLMs) have demonstrated efficacy in various linguistic applications, including question answering and controlled text generation.
1 code implementation • 22 Feb 2024 • Chang Zong, Yuchen Yan, Weiming Lu, Jian Shao, Eliot Huang, Heng Chang, Yueting Zhuang
We evaluated the performance of our framework using three benchmark datasets, and the results show that our framework outperforms state-of-the-art systems on the LC-QuAD and YAGO-QA benchmarks, yielding F1 scores of 11.8% and 20.7%, respectively.
no code implementations • 30 Jul 2023 • Wenqing Wang, Kaifeng Gao, Yawei Luo, Tao Jiang, Fei Gao, Jian Shao, Jianwen Sun, Jun Xiao
Video-based scene graph generation (VidSGG) aims to represent video content as a dynamic graph by identifying visual entities and their relationships.
no code implementations • 25 Jun 2023 • Yangjun Mao, Jun Xiao, Dong Zhang, Meng Cao, Jian Shao, Yueting Zhuang, Long Chen
A recent DIC method proposes to generate distinctive captions by comparing the target image with a set of semantically similar reference images, i.e., reference-based DIC (Ref-DIC).
1 code implementation • NeurIPS 2023 • Lin Li, Jun Xiao, Guikun Chen, Jian Shao, Yueting Zhuang, Long Chen
To dynamically fuse different cues, we further introduce a chain-of-thought method that prompts LLMs to generate reasonable weights for different visual cues.
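To make the cue-fusion idea concrete, here is a minimal, hypothetical Python sketch of prompting an LLM for per-cue weights; `call_llm` is a stand-in for any text-completion backend, and the prompt wording and JSON contract are assumptions for illustration, not the paper's actual prompts.

```python
# Hypothetical sketch: ask an LLM (chain-of-thought style) how informative
# each visual cue is for a given predicate, then parse per-cue weights.
import json

def weight_cues(call_llm, predicate: str, cues: list[str]) -> dict[str, float]:
    prompt = (
        f"We want to classify the visual relation '{predicate}'.\n"
        f"Available cues: {', '.join(cues)}.\n"
        "Think step by step about how informative each cue is, then output "
        "a JSON object mapping each cue name to a weight in [0, 1]."
    )
    reply = call_llm(prompt)
    # Extract the JSON object from the (possibly verbose) reply.
    weights = json.loads(reply[reply.index("{"): reply.rindex("}") + 1])
    total = sum(weights.values()) or 1.0
    return {cue: w / total for cue, w in weights.items()}  # normalize to sum to 1
```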
no code implementations • 19 May 2023 • Chenchi Zhang, Jun Xiao, Lei Chen, Jian Shao, Long Chen
In this paper, we argue that their poor interpretability is attributed to the holistic prompt generation and inference process.
no code implementations • 11 Mar 2023 • Zhen Wang, Jun Xiao, Yueting Zhuang, Fei Gao, Jian Shao, Long Chen
To this end, we propose a novel prompt-based framework for CIC that learns Combinatorial Prompts, dubbed ComPro.
no code implementations • 2 Oct 2022 • Chang Zong, Yueting Zhuang, Weiming Lu, Jian Shao, Siliang Tang
In this paper, we propose CTPIR, a new citation trajectory prediction framework that represents the influence (the momentum of citation) of both new and existing publications using the historical information of all their attributes.
no code implementations • 7 Aug 2022 • Lin Li, Long Chen, Hanrong Shi, Wenxiao Wang, Jian Shao, Yi Yang, Jun Xiao
To this end, we propose a novel model-agnostic Label Semantic Knowledge Distillation (LS-KD) for unbiased SGG.
no code implementations • 3 Aug 2022 • Xingchen Li, Long Chen, Jian Shao, Shaoning Xiao, Songyang Zhang, Jun Xiao
Current Scene Graph Generation (SGG) methods tend to predict frequent predicate categories and fail to recognize rare ones due to the severely imbalanced distribution of predicates.
1 code implementation • 22 Jul 2022 • Yangjun Mao, Long Chen, Zhihong Jiang, Dong Zhang, Zhimeng Zhang, Jian Shao, Jun Xiao
Unfortunately, reference images used by existing Ref-DIC works are easy to distinguish: these reference images only resemble the target image at the scene level and share few common objects, such that a Ref-DIC model can trivially generate distinctive captions even without considering the reference images.
1 code implementation • 20 Jul 2022 • Zhen Wang, Long Chen, Wenbo Ma, Guangxing Han, Yulei Niu, Jian Shao, Jun Xiao
Given an image and a reference caption, the image caption editing task aims to correct the misalignment errors and generate a refined caption.
1 code implementation • CVPR 2022 • Kaifeng Gao, Long Chen, Yulei Niu, Jian Shao, Jun Xiao
To this end, we propose a new classification-then-grounding framework for VidSGG, which avoids all three of the overlooked drawbacks.
1 code implementation • EMNLP 2021 • Shaoning Xiao, Long Chen, Jian Shao, Yueting Zhuang, Jun Xiao
Given an untrimmed video and a natural language query, Natural Language Video Localization (NLVL) aims to identify the video moment described by the query.
no code implementations • 3 Sep 2021 • Jiahui Li, Kun Kuang, Lin Li, Long Chen, Songyang Zhang, Jian Shao, Jun Xiao
Deep neural networks have demonstrated remarkable performance in many data-driven and prediction-oriented applications, and sometimes even perform better than humans.
1 code implementation • 21 Jun 2021 • Tao Chen, Haochen Shi, Liyuan Liu, Siliang Tang, Jian Shao, Zhigang Chen, Yueting Zhuang
In this paper, we propose collaborative adversarial training to improve the data utilization, which coordinates virtual adversarial training (VAT) and adversarial training (AT) at different levels.
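As a rough illustration of the two ingredients being coordinated, here is a hedged PyTorch sketch of an AT loss (label-dependent) and a VAT loss (label-free) computed on the same continuous inputs, e.g., token embeddings; the hyperparameters and the schedule for combining the two across levels are assumptions for illustration, not the paper's exact algorithm.

```python
# Sketch: adversarial training (AT) perturbs inputs along the gradient of the
# supervised loss; virtual adversarial training (VAT) perturbs them to
# maximally change the model's output distribution, requiring no labels.
import torch
import torch.nn.functional as F

def at_loss(model, x, y, eps=1e-2):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    x_adv = x + eps * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    return F.cross_entropy(model(x_adv.detach()), y)

def vat_loss(model, x, eps=1e-2, xi=1e-6):
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)           # reference distribution
    d = torch.randn_like(x).requires_grad_(True)  # random probe direction
    kl = F.kl_div(F.log_softmax(model(x + xi * d), dim=-1), p,
                  reduction="batchmean")
    grad, = torch.autograd.grad(kl, d)            # direction of largest change
    r_adv = eps * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    return F.kl_div(F.log_softmax(model(x + r_adv.detach()), dim=-1), p,
                    reduction="batchmean")
```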
no code implementations • 26 May 2021 • Feifei Shao, Long Chen, Jian Shao, Wei Ji, Shaoning Xiao, Lu Ye, Yueting Zhuang, Jun Xiao
With the success of deep neural networks in object detection, both weakly supervised object detection (WSOD) and weakly supervised object localization (WSOL) have received unprecedented attention.
no code implementations • 12 May 2021 • Chenchi Zhang, Wenbo Ma, Jun Xiao, Hanwang Zhang, Jian Shao, Yueting Zhuang, Long Chen
In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on detection confidence (i.e., query-agnostic), hoping that the proposals contain all instances mentioned in the text query (i.e., query-aware).
no code implementations • 15 Mar 2021 • Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, Jun Xiao
State-of-the-art NLVL methods almost all follow a one-stage fashion and can typically be grouped into two categories: 1) anchor-based approaches, which first pre-define a series of video segment candidates (e.g., by sliding window) and then classify each candidate; 2) anchor-free approaches, which directly predict the probability of each video frame being a boundary or intermediate frame of the positive segment.
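For intuition, here is a tiny Python sketch of the anchor-based candidate-enumeration step; the window sizes and stride are arbitrary example values, not the paper's configuration.

```python
# Sketch: enumerate sliding-window segment candidates that an anchor-based
# NLVL model would then score against the language query.
def sliding_window_anchors(num_frames: int, window_sizes=(8, 16, 32), stride=4):
    anchors = []
    for w in window_sizes:
        for start in range(0, max(num_frames - w, 0) + 1, stride):
            anchors.append((start, start + w))  # candidate (start, end) frames
    return anchors

# e.g., sliding_window_anchors(64) yields candidates (0, 8), (4, 12), ...
```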
2 code implementations • CVPR 2017 • Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, Tat-Seng Chua
Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image.
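For reference, a minimal PyTorch sketch of the conventional spatial attention described here: a decoding state induces a probability map over the HxW locations of the final conv feature map, which re-weights and pools the features. The dimensions and scoring projection are illustrative assumptions, not this paper's model.

```python
# Sketch of conventional soft spatial attention over a CNN feature map.
import torch
import torch.nn.functional as F

def spatial_attention(feat, query, proj):
    # feat:  (B, C, H, W) last conv-layer feature map
    # query: (B, C)       current decoding state
    # proj:  scoring projection, e.g. torch.nn.Linear(C, C)
    B, C, H, W = feat.shape
    flat = feat.flatten(2).transpose(1, 2)                    # (B, H*W, C)
    scores = (flat @ proj(query).unsqueeze(-1)).squeeze(-1)   # (B, H*W)
    alpha = F.softmax(scores, dim=-1)                         # spatial probabilities
    attended = (alpha.unsqueeze(-1) * flat).sum(1)            # (B, C) weighted pool
    return attended, alpha.view(B, H, W)
```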