1 code implementation • ECCV 2020 • Yanbo Fan, Baoyuan Wu, Tuanhui Li, Yong Zhang, Mingyang Li, Zhifeng Li, Yujiu Yang
Based on this factorization, we formulate the sparse attack problem as a mixed integer programming (MIP) problem to jointly optimize the binary selection factors and continuous perturbation magnitudes of all pixels, with a cardinality constraint on selection factors to explicitly control the degree of sparsity.
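As a rough illustration of the factorization described above, the sketch below applies a perturbation only where a binary selection factor is 1 and checks a cardinality constraint. It assumes the perturbation is the elementwise product of selection factor and magnitude, and that the constraint fixes the number of selected pixels to exactly `k`; the function name and the integer pixel values are hypothetical, and the MIP solver itself is not shown.

```python
def apply_sparse_perturbation(pixels, magnitudes, selection, k):
    """Perturb pixels[i] by magnitudes[i] only where selection[i] == 1.

    Assumption (not taken from the paper): the perturbation is the
    elementwise product selection[i] * magnitudes[i], and the cardinality
    constraint requires exactly k selected pixels.
    """
    if any(s not in (0, 1) for s in selection):
        raise ValueError("selection factors must be binary (0 or 1)")
    if sum(selection) != k:
        raise ValueError("cardinality constraint violated")
    return [p + s * m for p, m, s in zip(pixels, magnitudes, selection)]


# Hypothetical 4-pixel example: only pixels 1 and 3 are perturbed.
adversarial = apply_sparse_perturbation(
    pixels=[10, 20, 30, 40],
    magnitudes=[5, -3, 2, 7],
    selection=[0, 1, 0, 1],
    k=2,
)
print(adversarial)  # [10, 17, 30, 47]
```

Raising `k` trades sparsity for attack strength; the selection vector here is fixed by hand, whereas the abstract describes optimizing it jointly with the magnitudes.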
1 code implementation • Findings (EMNLP) 2021 • Junjie Wang, Yatai Ji, Jiaqi Sun, Yujiu Yang, Tetsuya Sakai
On the other hand, trilinear models such as the CTI model efficiently utilize the inter-modality information between answers, questions, and images, while ignoring intra-modality information.
no code implementations • 7 Sep 2024 • Runming Yang, Taiqiang Wu, Yujiu Yang
In this paper, we propose a simple yet effective Logit Calibration (LoCa) method, which calibrates the logits from the teacher model based on the ground-truth labels.
no code implementations • 6 Sep 2024 • Zhuoyan Luo, Fengyuan Shi, Yixiao Ge, Yujiu Yang, LiMin Wang, Ying Shan
We present Open-MAGVIT2, a family of auto-regressive image generation models ranging from 300M to 1.5B.
Ranked #19 on Image Generation on ImageNet 256x256
1 code implementation • 29 Jul 2024 • Cheng Yang, Guoping Huang, Mo Yu, Zhirui Zhang, Siheng Li, Mingming Yang, Shuming Shi, Yujiu Yang, Lemao Liu
Existing work addresses this task through a classification model based on a neural network that maps the hidden vector of the input context into its corresponding label (i.e., the candidate target word is treated as a label).
no code implementations • 21 Jul 2024 • Yu Li, Yifan Chen, Gongye Liu, Jie Wu, Yujiu Yang
We find that these methods overly focus on content information and lack constraints on layout spatial structure, resulting in an imbalance of learning content-aware and graphic-aware features.
1 code implementation • 10 Jul 2024 • Yatai Ji, Shilong Zhang, Jie Wu, Peize Sun, Weifeng Chen, Xuefeng Xiao, Sidi Yang, Yujiu Yang, Ping Luo
The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities.
1 code implementation • 28 Jun 2024 • Yuxiang Zhang, Jing Chen, Junjie Wang, Yaxin Liu, Cheng Yang, Chufan Shi, Xinyu Zhu, Zihao Lin, Hanwen Wan, Yujiu Yang, Tetsuya Sakai, Tian Feng, Hayato Yamana
To address this challenge, we introduce a comprehensive diagnostic benchmark, ToolBH.
no code implementations • 21 Jun 2024 • Haoling Li, Xin Zhang, Xiao Liu, Yeyun Gong, Yifan Wang, Yujiu Yang, Qi Chen, Peng Cheng
Large language models (LLMs) have revolutionized many fields of research.
no code implementations • 17 Jun 2024 • Jing Chen, Xinyu Zhu, Cheng Yang, Chufan Shi, Yadong Xi, Yuxiang Zhang, Junjie Wang, Jiashu Pu, Rongsheng Zhang, Yujiu Yang, Tian Feng
Generative AI has demonstrated unprecedented creativity in the field of computer vision, yet such phenomena have not been observed in natural language processing.
1 code implementation • 14 Jun 2024 • Chufan Shi, Cheng Yang, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang
We introduce a new benchmark, ChartMimic, aimed at assessing the visually-grounded code generation capabilities of large multimodal models (LMMs).
1 code implementation • 11 Jun 2024 • Tianle Gu, Zeyang Zhou, Kexin Huang, Dandan Liang, Yixu Wang, Haiquan Zhao, Yuanqi Yao, Xingge Qiao, Keqing Wang, Yujiu Yang, Yan Teng, Yu Qiao, Yingchun Wang
In this paper, we present MLLMGuard, a multidimensional safety evaluation suite for MLLMs, including a bilingual image-text evaluation dataset, inference utilities, and a lightweight evaluator.
1 code implementation • 24 May 2024 • Zhuoyan Luo, Yinghao Wu, Yong liu, Yicheng Xiao, Xiao-Ping Zhang, Yujiu Yang
The newly proposed Generalized Referring Expression Segmentation (GRES) amplifies the formulation of classic RES by involving multiple/non-target scenarios.
1 code implementation • 23 May 2024 • Chufan Shi, Cheng Yang, Xinyu Zhu, Jiahao Wang, Taiqiang Wu, Siheng Li, Deng Cai, Yujiu Yang, Yu Meng
In MoE, each token in the input sequence activates a different subset of experts determined by a routing mechanism.
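A minimal sketch of one common routing mechanism, top-k gating, may make the sentence above concrete. This is an assumption for illustration rather than the routing used in the listed paper: the router scores every expert for a token, keeps the `k` highest-scoring experts, and normalizes their scores with a softmax. The names `route_token` and `router_logits` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Assumption: top-k gating with softmax renormalization over the
    selected experts only.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


# Hypothetical router scores for one token over four experts.
for expert, weight in route_token([0.1, 2.0, -1.0, 1.5], k=2):
    print(expert, round(weight, 3))
```

Because the scores differ from token to token, different tokens generally activate different expert subsets, which is the behaviour the abstract describes.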
no code implementations • 15 Apr 2024 • Jiayi Li, Ruilin Luo, Jiaqi Sun, Jing Xiao, Yujiu Yang
In this paper, we investigate three crucial processes relevant to real-world construction scenarios: (a) the verification process, which arises from the necessity and limitations of human verifiers; (b) the mining process, which identifies the most promising candidates for verification; and (c) the training process, which harnesses verified data for subsequent utilization; in order to achieve a transition toward more realistic challenges.
no code implementations • CVPR 2024 • Chenming Shang, Shiji Zhou, Hengyuan Zhang, Xinzhe Ni, Yujiu Yang, Yuwang Wang
Concept Bottleneck Models (CBMs) map the black-box visual representations extracted by deep neural networks onto a set of interpretable concepts and use the concepts to make predictions, enhancing the transparency of the decision-making process.
no code implementations • 13 Apr 2024 • Chenming Shang, Hengyuan Zhang, Hao Wen, Yujiu Yang
The multimodal deep neural networks, represented by CLIP, have generated rich downstream applications owing to their excellent performance, thus making understanding the decision-making process of CLIP an essential research topic.
1 code implementation • 12 Apr 2024 • Cong Wei, Haoxian Tan, Yujie Zhong, Yujiu Yang, Lin Ma
Recent advancements have empowered Large Language Models for Vision (vLLMs) to generate detailed perceptual outcomes, including bounding boxes and masks.
3 code implementations • 11 Apr 2024 • Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen
After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens.
1 code implementation • CVPR 2024 • Mingdeng Cao, Sidi Yang, Yujiu Yang, Yinqiang Zheng
Additionally, a multi-distortion flow prediction strategy is integrated to further mitigate the issue of inaccurate flow estimation.
no code implementations • 1 Apr 2024 • Xinzhe Ni, Yeyun Gong, Zhibin Gou, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
Additionally, we showcase the use of QaDS in creating efficient fine-tuning mixtures with various selection ratios, and analyze the quality of a wide range of open-source datasets, which can serve as a reference for future work on mathematical reasoning tasks.
1 code implementation • 28 Mar 2024 • Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang
Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources.
no code implementations • 18 Mar 2024 • Yifan Wang, Yafei Liu, Chufan Shi, Haoling Li, Chen Chen, Haonan Lu, Yujiu Yang
Instruction tuning effectively optimizes Large Language Models (LLMs) for downstream tasks.
1 code implementation • 16 Mar 2024 • Tianhe Wu, Kede Ma, Jie Liang, Yujiu Yang, Lei Zhang
While Multimodal Large Language Models (MLLMs) have experienced significant advancement in visual understanding and reasoning, their potential to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored.
no code implementations • 1 Mar 2024 • Qingyan Guo, Rui Wang, Junliang Guo, Xu Tan, Jiang Bian, Yujiu Yang
Accordingly, permutation on the training data is considered as a potential solution, since this can make the model predict antecedent words or tokens.
1 code implementation • 22 Feb 2024 • Zicheng Lin, Zhibin Gou, Tian Liang, Ruilin Luo, Haowei Liu, Yujiu Yang
Utilizing CriticBench, we evaluate and dissect the performance of 17 LLMs in generation, critique, and correction reasoning, i.e., GQC reasoning.
2 code implementations • 20 Feb 2024 • Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kai-Ni Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui
In this paper, we propose RealCompo, a new training-free and transferred-friendly text-to-image generation framework, which aims to leverage the respective advantages of text-to-image models and spatial-aware image diffusion models (e.g., layout, keypoints and segmentation maps) to enhance both realism and compositionality of the generated images.
no code implementations • 18 Feb 2024 • Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen
To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning.
1 code implementation • 10 Feb 2024 • Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
no code implementations • 10 Feb 2024 • Chufan Shi, Deng Cai, Yujiu Yang
In the rapidly evolving field of text generation, the demand for more precise control mechanisms has become increasingly apparent.
2 code implementations • 24 Jan 2024 • Chang Ma, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yujiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong, Junxian He
Evaluating large language models (LLMs) as general-purpose agents is essential for understanding their capabilities and facilitating their integration into practical applications.
no code implementations • 15 Jan 2024 • Zhibo Xiao, Luwei Yang, Tao Zhang, Wen Jiang, Wei Ning, Yujiu Yang
Recently, a new recommendation scenario called Trigger-Induced Recommendation (TIR), in which users can explicitly express their instant interests via trigger items, has come to play an essential role in many e-commerce platforms, e.g., Alibaba.com and Amazon.
no code implementations • 11 Jan 2024 • Ruilin Luo, Tianle Gu, Haoling Li, Junzhe Li, Zicheng Lin, Jiayi Li, Yujiu Yang
Temporal Knowledge Graph Completion (TKGC) is a complex task involving the prediction of missing event links at future timestamps by leveraging established temporal structural knowledge.
1 code implementation • 1 Jan 2024 • Zhuoyan Luo, Yicheng Xiao, Yong liu, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang
Recent transformer-based models have dominated the Referring Video Object Segmentation (RVOS) task due to their superior performance.
1 code implementation • CVPR 2024 • Yong liu, Cairong Zhang, Yitong Wang, Jiahao Wang, Yujiu Yang, Yansong Tang
This paper aims to achieve universal segmentation at arbitrary semantic levels.
Ranked #1 on Referring Expression Segmentation on RefCOCOg-test (using extra training data)
2 code implementations • 1 Dec 2023 • Gongye Liu, Menghan Xia, Yong Zhang, Haoxin Chen, Jinbo Xing, Yibo Wang, Xintao Wang, Yujiu Yang, Ying Shan
To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image.
1 code implementation • CVPR 2024 • Yicheng Xiao, Zhuoyan Luo, Yong liu, Yue Ma, Hengwei Bian, Yatai Ji, Yujiu Yang, Xiu Li
Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis.
Ranked #1 on Highlight Detection on YouTube Highlights
1 code implementation • CVPR 2024 • Haoze Sun, Wenbo Li, Jianzhuang Liu, Haoyu Chen, Renjing Pei, Xueyi Zou, Youliang Yan, Yujiu Yang
We achieve this by marrying image appearance and language understanding to generate a cognitive embedding, which not only activates prior information from large text-to-image diffusion models but also facilitates the generation of high-quality reference images to optimize the SR process.
no code implementations • 3 Nov 2023 • Yifan Wang, Qingyan Guo, Xinzhe Ni, Chufan Shi, Lemao Liu, Haiyun Jiang, Yujiu Yang
In-context learning (ICL) ability has emerged with the increasing scale of large language models (LLMs), enabling them to learn input-label mappings from demonstrations and perform well on downstream tasks.
1 code implementation • 31 Oct 2023 • Tian Liang, Zhiwei He, Jen-tse Huang, Wenxuan Wang, Wenxiang Jiao, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi, Xing Wang
Ideally, an advanced agent should possess the ability to accurately describe a given word using an aggressive description while concurrently maximizing confusion in the conservative description, enhancing its participation in the game.
no code implementations • 23 Oct 2023 • Chufan Shi, Yixuan Su, Cheng Yang, Yujiu Yang, Deng Cai
Although instruction tuning has proven to be a data-efficient method for transforming LLMs into such generalist models, their performance still lags behind specialist models trained exclusively for specific tasks.
no code implementations • 9 Oct 2023 • Yong Lin, Fan Zhou, Lu Tan, Lintao Ma, Jiameng Liu, Yansu He, Yuan Yuan, Yu Liu, James Zhang, Yujiu Yang, Hao Wang
To address this challenge, we then propose Continuous Invariance Learning (CIL), which extracts invariant features across continuously indexed domains.
1 code implementation • 2 Oct 2023 • Yiyao Yu, Junjie Wang, Yuxiang Zhang, Lin Zhang, Yujiu Yang, Tetsuya Sakai
Artificial intelligence (AI) technologies should adhere to human norms to better serve our society and avoid disseminating harmful or misleading information, particularly in Conversational Information Retrieval (CIR).
1 code implementation • 29 Sep 2023 • Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen
Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics.
Ranked #15 on Math Word Problem Solving on MATH (using extra training data)
no code implementations • 29 Sep 2023 • Yong Lin, Lu Tan, Yifan Hao, Honam Wong, Hanze Dong, Weizhong Zhang, Yujiu Yang, Tong Zhang
Contrary to the conventional wisdom that focuses on learning invariant features for better OOD performance, our findings suggest that incorporating a large number of diverse spurious features weakens their individual contributions, leading to improved overall OOD generalization performance.
1 code implementation • 25 Sep 2023 • Jiayi Li, Ruilin Luo, Jiaqi Sun, Jing Xiao, Yujiu Yang
Bilinear-based models are powerful and widely used approaches for Knowledge Graph Completion (KGC).
Ranked #5 on Link Property Prediction on ogbl-biokg
1 code implementation • 15 Sep 2023 • Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang
Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort.
1 code implementation • 14 Sep 2023 • Huayang Li, Siheng Li, Deng Cai, Longyue Wang, Lemao Liu, Taro Watanabe, Yujiu Yang, Shuming Shi
We release our dataset, model, and demo to foster future research in the area of multimodal instruction following.
Ranked #133 on Visual Question Answering on MM-Vet
no code implementations • ICCV 2023 • Yuan Gong, Yong Zhang, Xiaodong Cun, Fei Yin, Yanbo Fan, Xuan Wang, Baoyuan Wu, Yujiu Yang
Moreover, since no paired data is provided, we propose a novel cross-domain training scheme using data from two domains with the designed analogy constraint.
1 code implementation • 12 Aug 2023 • Siheng Li, Yichun Yin, Cheng Yang, Wangjie Jiang, Yiwei Li, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang
In this paper, we propose a novel task, Proactive News Grounded Conversation, in which a dialogue system can proactively lead the conversation based on some key topics of the news.
no code implementations • 12 Aug 2023 • Siheng Li, Cheng Yang, Yichun Yin, Xinyu Zhu, Zesen Cheng, Lifeng Shang, Xin Jiang, Qun Liu, Yujiu Yang
Information-seeking conversation, which aims to help users gather information through conversation, has achieved great progress in recent years.
1 code implementation • 12 Jun 2023 • Rong-Cheng Tu, Yatai Ji, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu
MGSC promotes learning more representative global features, which have a great impact on the performance of downstream tasks, while MLTC reconstructs modal-fusion local tokens, further enhancing accurate comprehension of multimodal data.
1 code implementation • 10 Jun 2023 • Xuanzhou Liu, Lin Zhang, Jiaqi Sun, Yujiu Yang, Haiqin Yang
Subgraph matching is a fundamental building block for graph-based applications and is challenging due to its high-order combinatorial nature.
no code implementations • 3 Jun 2023 • Yiji Cheng, Fei Yin, Xiaoke Huang, Xintong Yu, Jiaxiang Liu, Shikun Feng, Yujiu Yang, Yansong Tang
These elaborated designs enable our model to generate portraits with robust multi-view semantic consistency, eliminating the need for optimization-based methods.
1 code implementation • 30 May 2023 • Tian Liang, Zhiwei He, Wenxiang Jiao, Xing Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi
To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution.
1 code implementation • 29 May 2023 • Yuan Gong, Youxin Pang, Xiaodong Cun, Menghan Xia, Yingqing He, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, Yujiu Yang
Accurate story visualization requires several necessary elements, such as identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in images.
1 code implementation • NeurIPS 2023 • Zhuoyan Luo, Yicheng Xiao, Yong liu, Shuyan Li, Yitong Wang, Yansong Tang, Xiu Li, Yujiu Yang
To address this issue, we propose Semantic-assisted Object Cluster (SOC), which aggregates video content and textual guidance for unified temporal modeling and cross-modal alignment.
Ranked #2 on Referring Expression Segmentation on A2D Sentences (using extra training data)
1 code implementation • 26 May 2023 • Gongye Liu, Haoze Sun, Jiayi Li, Fei Yin, Yujiu Yang
To derive the transitional state during the forward process, we introduce Distortion Adaptive Inversion.
1 code implementation • 23 May 2023 • Xinyu Zhu, Cheng Yang, Bei Chen, Siheng Li, Jian-Guang Lou, Yujiu Yang
Question answering plays a pivotal role in human daily life because it involves our acquisition of knowledge about the world.
1 code implementation • 22 May 2023 • Zhibin Gou, Qingyan Guo, Yujiu Yang
Generative methods greatly promote aspect-based sentiment analysis via generating a sequence of sentiment elements in a specified format.
Ranked #1 on Aspect-Based Sentiment Analysis (ABSA) on ACOS
no code implementations • 19 May 2023 • Xingyu Bai, Taiqiang Wu, Han Guo, Zhe Zhao, Xuefeng Yang, Jiayi Li, Weijie Liu, Qi Ju, Weigang Guo, Yujiu Yang
Event Extraction (EE), aiming to identify and classify event triggers and arguments from event mentions, has benefited from pre-trained language models (PLMs).
1 code implementation • 19 May 2023 • Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
Unlike these models, humans typically utilize external tools to cross-check and refine their initial content, like using a search engine for fact-checking, or a code interpreter for debugging.
no code implementations • 18 May 2023 • Bangrui Jiang, Zhenhua Guo, Yujiu Yang
In the first stage, we invert the input image to an editable latent code using off-the-shelf inversion techniques.
1 code implementation • NeurIPS 2023 • Tianhe Wu, Shuwei Shi, Haoming Cai, Mingdeng Cao, Jing Xiao, Yinqiang Zheng, Yujiu Yang
Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs) without relying on pristine-quality image information.
1 code implementation • 16 May 2023 • Taiqiang Wu, Cheng Hou, Shanshan Lao, Jiayi Li, Ngai Wong, Zhe Zhao, Yujiu Yang
Knowledge Distillation (KD) is a predominant approach for BERT compression.
1 code implementation • 10 May 2023 • Jiaqi Sun, Lin Zhang, Guangyi Chen, Kun Zhang, Peng Xu, Yujiu Yang
Graph neural networks aim to learn representations for graph-structured data and show impressive performance, particularly in node classification.
2 code implementations • 6 May 2023 • Zhiwei He, Tian Liang, Wenxiang Jiao, Zhuosheng Zhang, Yujiu Yang, Rui Wang, Zhaopeng Tu, Shuming Shi, Xing Wang
Compared to typical machine translation that focuses solely on source-to-target mapping, LLM-based translation can potentially mimic the human translation process which might take preparatory steps to ensure high-quality translation.
2 code implementations • 12 Apr 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin
Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.
no code implementations • 24 Mar 2023 • Taiqiang Wu, Zhe Zhao, Jiahao Wang, Xingyu Bai, Lei Wang, Ngai Wong, Yujiu Yang
Distilling high-accuracy Graph Neural Networks (GNNs) to low-latency multilayer perceptrons (MLPs) on graph tasks has become a hot research topic.
1 code implementation • ICCV 2023 • Kunyang Han, Yong liu, Jun Hao Liew, Henghui Ding, Yunchao Wei, Jiajun Liu, Yitong Wang, Yansong Tang, Yujiu Yang, Jiashi Feng, Yao Zhao
Recent advancements in pre-trained vision-language models, such as CLIP, have enabled the segmentation of arbitrary concepts solely from textual inputs, a process commonly referred to as open-vocabulary semantic segmentation (OVS).
no code implementations • 25 Feb 2023 • Chengze Yu, Taiqiang Wu, Jiayi Li, Xingyu Bai, Yujiu Yang
To the best of our knowledge, we are the first to introduce syntactic information to generative ABSA frameworks.
no code implementations • ICCV 2023 • Shanshan Lao, Guanglu Song, Boxiao Liu, Yu Liu, Yujiu Yang
Bridging this semantic gap now requires case-by-case algorithm design which is time-consuming and heavily relies on experienced adjustment.
1 code implementation • 1 Jan 2023 • Fei Yin, Yong Zhang, Baoyuan Wu, Yan Feng, Jingyi Zhang, Yanbo Fan, Yujiu Yang
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget.
no code implementations • CVPR 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin
Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.
no code implementations • ICCV 2023 • Shanshan Lao, Guanglu Song, Boxiao Liu, Yu Liu, Yujiu Yang
In MKD, random patches of the input image are masked, and the corresponding missing feature is recovered by forcing it to imitate the output of the teacher.
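The masking-and-imitation idea above can be sketched as follows. This is a simplified illustration under stated assumptions, not the MKD implementation: patches are represented as plain feature lists, the imitation objective is taken to be a mean squared error computed only on masked patches, and `sample_mask` and `masked_imitation_loss` are hypothetical names.

```python
import random


def sample_mask(num_patches, mask_ratio, rng):
    """Pick which patch indices are hidden from the student."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


def masked_imitation_loss(student_feats, teacher_feats, masked_idx):
    """Mean squared error between student and teacher features,
    restricted to the masked patches (an assumed loss form)."""
    if not masked_idx:
        return 0.0
    total, count = 0.0, 0
    for i in masked_idx:
        for s, t in zip(student_feats[i], teacher_feats[i]):
            total += (s - t) ** 2
            count += 1
    return total / count


# Hypothetical features for three patches with two channels each.
teacher = [[0.0, 0.0], [1.0, 3.0], [2.0, 2.0]]
student = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(masked_imitation_loss(student, teacher, masked_idx=[1]))  # 2.0
```

Restricting the loss to masked positions means the student is only penalized where it had to recover features it could not observe directly.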
no code implementations • 9 Dec 2022 • Xinzhe Ni, Yong liu, Hao Wen, Yatai Ji, Jing Xiao, Yujiu Yang
Then in the visual flow, visual prototypes are computed by a visual prototype-computed module.
1 code implementation • CVPR 2023 • Qingyan Bai, Ceyuan Yang, Yinghao Xu, Xihui Liu, Yujiu Yang, Yujun Shen
Generative adversarial network (GAN) is formulated as a two-player game between a generator (G) and a discriminator (D), where D is asked to differentiate whether an image comes from real data or is produced by G. Under such a formulation, D plays as the rule maker and hence tends to dominate the competition.
no code implementations • CVPR 2023 • Fei Yin, Yong Zhang, Xuan Wang, Tengfei Wang, Xiaoyu Li, Yuan Gong, Yanbo Fan, Xiaodong Cun, Ying Shan, Cengiz Oztireli, Yujiu Yang
It is natural to associate 3D GANs with GAN inversion methods to project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred to as 3D GAN inversion.
1 code implementation • CVPR 2023 • Yatai Ji, RongCheng Tu, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu
Cross-modal alignment is essential for vision-language pre-training (VLP) models to learn the correct corresponding information across different modalities.
Ranked #8 on Zero-Shot Video Retrieval on LSMDC
1 code implementation • 20 Nov 2022 • Taiqiang Wu, Xingyu Bai, Weigang Guo, Weijie Liu, Siheng Li, Yujiu Yang
We extract the knowledge units from the corresponding context and then construct a mention/entity centralized graph.
1 code implementation • 28 Oct 2022 • Xinyu Zhu, Junjie Wang, Lin Zhang, Yuxiang Zhang, Ruyi Gan, Jiaxing Zhang, Yujiu Yang
This inspires us to develop a cooperative reasoning-induced PLM for solving MWPs, called Cooperative Reasoning (CoRe), resulting in a human-like reasoning architecture with system 1 as the generator and system 2 as the verifier.
Ranked #103 on Arithmetic Reasoning on GSM8K
1 code implementation • 21 Oct 2022 • Wangjie Jiang, Zhihao Ye, Zijing Ou, Ruihui Zhao, Jianguang Zheng, Yi Liu, Siheng Li, Bang Liu, Yujiu Yang, Yefeng Zheng
In this work, we define the task of Medical-domain Chinese Spelling Correction and propose MCSCSet, a large scale specialist-annotated dataset that contains about 200k samples.
1 code implementation • 15 Oct 2022 • Jiaqi Sun, Lin Zhang, Shenglin Zhao, Yujiu Yang
Graph neural networks (GNNs) hold the promise of learning efficient representations of graph-structured data, and one of its most important applications is semi-supervised node classification.
1 code implementation • CVPR 2023 • Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang
Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets.
1 code implementation • 11 Oct 2022 • Yong liu, Ran Yu, Jiahao Wang, Xinyuan Zhao, Yitong Wang, Yansong Tang, Yujiu Yang
Besides, we empirically find that low-frequency features should be enhanced in the encoder (backbone), while high-frequency features should be enhanced in the decoder (segmentation head).
1 code implementation • 28 Aug 2022 • Mingdeng Cao, Zhihang Zhong, Yanbo Fan, Jiahao Wang, Yong Zhang, Jue Wang, Yujiu Yang, Yinqiang Zheng
We believe the novel realistic synthesis pipeline and the corresponding RAW video dataset can help the community easily construct customized blur datasets and substantially improve real-world video deblurring performance, instead of laboriously collecting real data pairs.
1 code implementation • 23 Aug 2022 • Weihao Xia, Yujiu Yang, Jing-Hao Xue
The entire sequence is seen as discrete-time observations of a continuous trajectory of the initial latent code, by considering each latent code as a moving particle and the latent space as a high-dimensional dynamic system.
1 code implementation • 18 Jul 2022 • Shuwei Shi, Jinjin Gu, Liangbin Xie, Xintao Wang, Yujiu Yang, Chao Dong
In this paper, we rethink the role of alignment in VSR Transformers and make several counter-intuitive observations.
Ranked #6 on Video Super-Resolution on Vid4 - 4x upscaling
1 code implementation • 16 Jul 2022 • Yong liu, Ran Yu, Fei Yin, Xinyuan Zhao, Wei Zhao, Weihao Xia, Yujiu Yang
However, they mainly focus on better matching between the current frame and the memory frames without explicitly paying attention to the quality of the memory.
Ranked #11 on Semi-Supervised Video Object Segmentation on DAVIS 2016 (using extra training data)
1 code implementation • 1 Jun 2022 • Yutong Wang, Renze Lou, Kai Zhang, MaoYan Chen, Yujiu Yang
To address these problems, in this work, we propose a novel learning framework named MORE (Metric learning-based Open Relation Extraction).
1 code implementation • CVPR 2022 • Mingdeng Cao, Zhihang Zhong, Jiahao Wang, Yinqiang Zheng, Yujiu Yang
This paper proposes the first real-world rolling shutter (RS) correction dataset, BS-RSC, and a corresponding model to correct the RS frames in a distorted video.
1 code implementation • NAACL 2022 • Mao Yan Chen, Siheng Li, Yujiu Yang
To address the bias of the empathetic intents distribution between empathetic dialogue models and humans, we propose a novel model to generate empathetic responses with human-consistent empathetic intents, EmpHi for short.
2 code implementations • 22 Apr 2022 • Shanshan Lao, Yuan Gong, Shuwei Shi, Sidi Yang, Tianhe Wu, Jiahao Wang, Weihao Xia, Yujiu Yang
Image quality assessment (IQA) algorithms aim to quantify the human perception of image quality.
Ranked #1 on Image Quality Assessment on MSU FR VQA Database
2 code implementations • 20 Apr 2022 • Ren Yang, Radu Timofte, Meisong Zheng, Qunliang Xing, Minglang Qiao, Mai Xu, Lai Jiang, Huaida Liu, Ying Chen, Youcheng Ben, Xiao Zhou, Chen Fu, Pei Cheng, Gang Yu, Junyi Li, Renlong Wu, Zhilu Zhang, Wei Shang, Zhengyao Lv, Yunjin Chen, Mingcai Zhou, Dongwei Ren, Kai Zhang, WangMeng Zuo, Pavel Ostyakov, Vyal Dmitry, Shakarim Soltanayev, Chervontsev Sergey, Zhussip Magauiya, Xueyi Zou, Youliang Yan, Pablo Navarrete Michelini, Yunhua Lu, Diankai Zhang, Shaoli Liu, Si Gao, Biao Wu, Chengjian Zheng, Xiaofeng Zhang, Kaidi Lu, Ning Wang, Thuong Nguyen Canh, Thong Bach, Qing Wang, Xiaopeng Sun, Haoyu Ma, Shijie Zhao, Junlin Li, Liangbin Xie, Shuwei Shi, Yujiu Yang, Xintao Wang, Jinjin Gu, Chao Dong, Xiaodi Shi, Chunmei Nian, Dong Jiang, Jucai Lin, Zhihuai Xie, Mao Ye, Dengyan Luo, Liuhan Peng, Shengjie Chen, Qian Wang, Xin Liu, Boyang Liang, Hang Dong, Yuhao Huang, Kai Chen, Xingbei Guo, Yujing Sun, Huilei Wu, Pengxu Wei, Yulin Huang, Junying Chen, Ik Hyun Lee, Sunder Ali Khowaja, Jiseok Yoon
This challenge includes three tracks.
2 code implementations • 19 Apr 2022 • Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, Yujiu Yang
No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception.
Ranked #8 on Video Quality Assessment on MSU SR-QA Dataset
1 code implementation • 17 Apr 2022 • Mingdeng Cao, Yanbo Fan, Yong Zhang, Jue Wang, Yujiu Yang
For multi-frame temporal modeling, we adapt Transformer to fuse multiple spatial features efficiently.
1 code implementation • 21 Mar 2022 • Qingyan Bai, Yinghao Xu, Jiapeng Zhu, Weihao Xia, Yujiu Yang, Yujun Shen
In this work, we propose to involve the padding space of the generator to complement the latent space with spatial information.
1 code implementation • 8 Mar 2022 • Fei Yin, Yong Zhang, Xiaodong Cun, Mingdeng Cao, Yanbo Fan, Xuan Wang, Qingyan Bai, Baoyuan Wu, Jue Wang, Yujiu Yang
Our framework elevates the resolution of the synthesized talking face to 1024×1024 for the first time, even though the training dataset has a lower resolution.
no code implementations • 3 Mar 2022 • Mao Yan Chen, Haiyun Jiang, Yujiu Yang
The short text matching task employs a model to determine whether two short texts have the same semantic meaning or intent.
no code implementations • 15 Feb 2022 • Jiayi Li, Yujiu Yang
Therefore, we propose a corresponding bilinear model Scaling Translation and Rotation (STaR) consisting of the above two parts.
no code implementations • CVPR 2022 • Jiahao Wang, Baoyuan Wu, Rui Su, Mingdeng Cao, Shuwei Shi, Wanli Ouyang, Yujiu Yang
We conduct experiments both from a control theory lens through a phase locus verification and from a network training lens on several models, including CNNs, Transformers, MLPs, and on benchmark datasets.
4 code implementations • NeurIPS 2021 • Han Shu, Jiahao Wang, Hanting Chen, Lin Li, Yujiu Yang, Yunhe Wang
With the new operation, vision transformers constructed using additions can also provide powerful feature representations.
no code implementations • 10 Oct 2021 • Qingyan Bai, Weihao Xia, Fei Yin, Yujiu Yang
Concretely, we propose a novel dual-encoder architecture, in which an identity encoder extracts the identity-related feature, accompanied by a main encoder to obtain the rough contour information and further fuse all the information together.
no code implementations • 12 Sep 2021 • Pengda Si, Yao Qiu, Jinchao Zhang, Yujiu Yang
Further analysis individually proves the effectiveness of the enhanced concept graph and the Edge-Transformer architecture.
1 code implementation • 16 Aug 2021 • Ran Yu, Chenyu Tian, Weihao Xia, Xinyuan Zhao, Haoqian Wang, Yujiu Yang
To alleviate this problem, we propose a mechanism named Inner Center Sampling to improve the accuracy of instance segmentation.
1 code implementation • 22 Jul 2021 • Chenyu Tian, Ran Yu, Xinyuan Zhao, Weihao Xia, Haoqian Wang, Yujiu Yang
This simple framework achieves an unprecedented speed and a competitive accuracy on the COCO benchmark compared with state-of-the-art methods.
1 code implementation • 28 May 2021 • Xiaopei Wan, Guoqiu Li, Yujiu Yang, Zhenhua Guo
Furthermore, AADI is a learning-based anchor augmentation method, but it does not add any parameters or hyper-parameters, which is beneficial for research and downstream tasks.
no code implementations • 7 May 2021 • Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, SungJun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, ZiRui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, Yifan Chen, Yujiu Yang, Yang Li, Tao Zhang, Longtao Feng, Yiting Liao, Junlin Li, William Thong, Jose Costa Pereira, Ales Leonardis, Steven McDonagh, Kele Xu, Lehan Yang, Hengxing Cai, Pengfei Sun, Seyed Mehdi Ayyoubzadeh, Ali Royat, Sid Ahmed Fezza, Dounia Hammou, Wassim Hamidouche, Sewoong Ahn, Gwangjin Yoon, Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa
This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021.
3 code implementations • 23 Apr 2021 • Shuwei Shi, Qingyan Bai, Mingdeng Cao, Weihao Xia, Jiahao Wang, Yifan Chen, Yujiu Yang
Image quality assessment (IQA) aims to assess the perceptual quality of images.
no code implementations • 19 Apr 2021 • Jiahao Wang, Han Shu, Weihao Xia, Yujiu Yang, Yunhe Wang
This paper studies the neural architecture search (NAS) problem for developing efficient generator networks.
2 code implementations • 18 Apr 2021 • Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu
To be specific, we propose a brand new paradigm of text-guided image generation and manipulation based on the superior characteristics of a pretrained GAN model.
Ranked #5 on Text-to-Image Generation on Multi-Modal-CelebA-HQ
1 code implementation • CVPR 2021 • Gengcong Yang, Jingyi Zhang, Yong Zhang, Baoyuan Wu, Yujiu Yang
The ambiguity naturally leads to the issue of implicit multi-label, motivating the need for diverse predictions.
no code implementations • 31 Jan 2021 • Lanbo Lin, Yujiu Yang, Zhenhua Guo
Firstly, AACP represents the structure of a model as a structure vector and introduces a pruning step vector to control the compressing granularity of each layer.
no code implementations • 28 Jan 2021 • Xiaopei Wan, Zhenhua Guo, Chao He, Yujiu Yang, Fangbo Tao
The lack of sufficient high-quality proposals for the RoI box head has long impeded two-stage and multi-stage object detectors, and many previous works try to solve it by improving the RPN's performance or by manually generating proposals from ground truth.
1 code implementation • 14 Jan 2021 • Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang
GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model, for the image to be faithfully reconstructed from the inverted code by the generator.
5 code implementations • CVPR 2021 • Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu
In this work, we propose TediGAN, a novel framework for multi-modal image generation and manipulation with textual descriptions.
Ranked #6 on Text-to-Image Generation on Multi-Modal-CelebA-HQ
1 code implementation • COLING 2020 • Sijin Wu, Yujiu Yang, Nicholas Yung, Zhengchen Shen, Zeyang Lei
With the transformation of education from the traditional classroom environment to online education and assessment, it is more important than ever to accurately assess the difficulty of questions.
1 code implementation • 9 Oct 2020 • Weihao Xia, Yujiu Yang, Jing-Hao Xue, Wensen Feng
The encoder maps images into a well-disentangled and hierarchically-organized latent space.
no code implementations • 13 Aug 2020 • Yiru Wang, Shen Huang, Gongfu Li, Qiang Deng, Dongliang Liao, Pengda Si, Yujiu Yang, Jin Xu
The automatic quality assessment of self-media online articles is a new and urgent issue, which is of great value to online recommendation and search.
no code implementations • ACL 2020 • Miaomiao Yu, Yujiu Yang, Chenhui Li
Recently, deep learning has been used in Medical Subject Headings (MeSH) indexing to reduce the time and monetary cost of manual annotation, with methods including DeepMeSH and TextCNN.
no code implementations • 26 Apr 2020 • Zeyang Lei, Zekang Li, Jinchao Zhang, Fandong Meng, Yang Feng, Yujiu Yang, Cheng Niu, Jie Zhou
Furthermore, to facilitate the convergence of Gaussian mixture prior and posterior distributions, we devise a curriculum optimization strategy to progressively train the model under multiple training criteria from easy to hard.
no code implementations • 1 Dec 2019 • Yiru Wang, Pengda Si, Zeyang Lei, Guangxu Xun, Yujiu Yang
The sequence-to-sequence (Seq2Seq) model generates target words iteratively given the previously observed words during the decoding process, which results in the loss of the holistic semantics in the target response and of the complete semantic relationship between responses and dialogue histories.
no code implementations • 2 Nov 2019 • Weihao Xia, Zhanglin Cheng, Yujiu Yang, Jing-Hao Xue
Most state-of-the-art semantic segmentation approaches only achieve high accuracy in good conditions.
no code implementations • 2 Nov 2019 • Weihao Xia, Yujiu Yang, Jing-Hao Xue, Jing Xiao
Human fingerprints are detailed and nearly unique markers of human identity.
1 code implementation • 2 Nov 2019 • Weihao Xia, Yujiu Yang, Jing-Hao Xue
Image-to-image translation has drawn great attention during the past few years.
no code implementations • 1 Nov 2019 • Weihao Xia, Yujiu Yang, Jing-Hao Xue
Image generation has received increasing attention because of its wide application in security and entertainment.
no code implementations • 5 Oct 2019 • Xinrui Zhuang, Yuexiang Li, Yifan Hu, Kai Ma, Yujiu Yang, Yefeng Zheng
With the development of deep learning, an increasing number of studies try to build computer-aided diagnosis systems for 3D volumetric medical data.
no code implementations • ACL 2018 • Zeyang Lei, Yujiu Yang, Min Yang, Yi Liu
Deep learning approaches for sentiment classification do not fully exploit sentiment linguistic knowledge.
Ranked #5 on Sentiment Analysis on MR
no code implementations • WS 2018 • Pengcheng Zhu, Yujiu Yang, Wenqiang Gao, Yi Liu
Based on the multi-glance mechanism, we design two types of recurrent neural network models for repeated reading: Glance Cell Model (GCM) and Glance Gate Model (GGM).
no code implementations • 1 Jun 2017 • Xiaoxiang Hu, Yujiu Yang
Our approach achieves equivalent performance to the baseline tracker SRDCF on all three datasets.