1 code implementation • 5 Dec 2024 • Jun Zhang, Desen Meng, Ji Qi, Zhenpeng Huang, Tao Wu, LiMin Wang
In this paper, we propose to build efficient MLLMs by leveraging the Mixture-of-Depths (MoD) mechanism, where each transformer decoder layer selects essential vision tokens to process while skipping redundant ones.
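The token-selection idea is concrete enough to sketch. Below is a minimal, hypothetical illustration of how a per-layer router might keep only a fraction of vision tokens while the rest skip the layer; the function and parameter names (`router`, `capacity_ratio`) are assumptions for illustration, not the paper's implementation.

```python
import torch

def mod_select_tokens(hidden_states, router, capacity_ratio=0.5):
    """Sketch of Mixture-of-Depths-style routing: keep the top-scoring
    fraction of vision tokens for this layer, let the rest pass through."""
    # hidden_states: (batch, num_tokens, dim); router: nn.Linear(dim, 1)
    scores = router(hidden_states).squeeze(-1)          # (batch, num_tokens)
    k = max(1, int(capacity_ratio * hidden_states.size(1)))
    topk = scores.topk(k, dim=-1).indices               # tokens the layer will process
    keep_mask = torch.zeros_like(scores, dtype=torch.bool)
    keep_mask.scatter_(1, topk, True)
    return keep_mask  # attention/MLP are applied only to masked tokens
```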
no code implementations • 27 Nov 2024 • Yuze Wang, Aoran Hu, Ji Qi, Yang Liu, Chao Tao
On the one hand, to alleviate the overfitting caused by the model over-trusting the residual errors in high-quality labels, we encode the similarity/aggregation of cropland in the visual/spatial domain to construct an unsupervised learning signal, and use it as a regularization term to constrain the supervised part.
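A minimal sketch of the "supervised loss plus unsupervised regularizer" pattern described above, assuming a simple similarity-weighted consistency term between neighbouring pixels; the exact form of the regularizer here is an illustrative assumption, not the paper's definition.

```python
import torch.nn.functional as F

def cropland_loss(logits, noisy_labels, features, lam=0.1):
    """Supervised cross-entropy on (possibly noisy) labels plus a
    similarity-weighted agreement term between adjacent pixels."""
    # logits: (B, C, H, W); features: (B, D, H, W); noisy_labels: (B, H, W)
    supervised = F.cross_entropy(logits, noisy_labels)

    probs = logits.softmax(dim=1)
    # feature similarity and prediction disagreement of horizontally adjacent pixels
    sim = F.cosine_similarity(features[..., :-1], features[..., 1:], dim=1)   # (B, H, W-1)
    diff = (probs[..., :-1] - probs[..., 1:]).pow(2).sum(dim=1)               # (B, H, W-1)
    regularizer = (sim.clamp(min=0) * diff).mean()

    return supervised + lam * regularizer
```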
no code implementations • 18 Oct 2024 • Jianfa Chen, Emily Shen, Trupti Bavalatti, Xiaowen Lin, Yongkai Wang, Shuming Hu, Harihar Subramanyam, Ksheeraj Sai Vepuri, Ming Jiang, Ji Qi, Li Chen, Nan Jiang, Ankit Jain
Robust content moderation classifiers are essential for the safety of Generative AI systems.
no code implementations • 4 Sep 2024 • Xing Lan, Jian Xue, Ji Qi, Dongmei Jiang, Ke Lu, Tat-Seng Chua
Specifically, we have designed the CoT mechanism from three key perspectives: key observations, overall emotional interpretation, and conclusion.
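A hypothetical prompt skeleton mirroring the three perspectives listed above; the wording is illustrative and not taken from the paper.

```python
def build_fer_cot_prompt(face_description: str) -> str:
    """Illustrative chain-of-thought prompt for facial expression recognition."""
    return (
        "You are analysing a facial expression.\n"
        f"Image description: {face_description}\n\n"
        "1. Key observations: list the salient facial cues (brows, eyes, mouth).\n"
        "2. Overall emotional interpretation: reason about what these cues imply.\n"
        "3. Conclusion: output a single expression label."
    )
```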
Facial Expression Recognition (FER)
3 code implementations • 29 Aug 2024 • Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications.
Ranked #10 on Visual Question Answering on MM-Vet
no code implementations • 23 Jul 2024 • Zeyu Wang, Weichen Dai, Xiangyu Zhou, Ji Qi, Yi Zhou
Vision Transformer and its variants have been adopted in many visual tasks due to their powerful capabilities, but they also bring significant challenges in computation and storage.
1 code implementation • 12 Jun 2024 • Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Xiaotao Gu, Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang
To address this gap, we introduce LVBench, a benchmark specifically designed for long video understanding.
1 code implementation • 7 Apr 2024 • Kai Sun, Yushi Bai, Ji Qi, Lei Hou, Juanzi Li
This highlights the challenging nature of our benchmark for existing models and the significant gap between the multimodal reasoning capabilities of current models and humans.
1 code implementation • 23 Feb 2024 • Zui Chen, Yezeng Chen, Jiaqi Han, Zhijie Huang, Ji Qi, Yi Zhou
Large language models (LLMs) are displaying emergent abilities for math reasoning tasks, and there is growing attention on enhancing the ability of open-source LLMs through supervised fine-tuning (SFT). In this paper, we aim to explore a general data strategy for supervised data to help optimize and expand math reasoning ability. First, we determine the ability boundary of reasoning-path augmentation by identifying the minimal optimal set of these paths. Second, we validate that different abilities of the model can be cumulatively enhanced by mixing the minimal optimal sets of the corresponding data types, and our MMOS models achieve SOTA performance on a series of base models at much lower construction cost. Besides, we point out that GSM-HARD is not really hard and that today's LLMs no longer lack numerical robustness. We also provide an Auto Problem Generator for robustness testing and educational applications. Our code and data are publicly available at https://github.com/cyzhh/MMOS.
Ranked #2 on Math Word Problem Solving on ASDiv-A (using extra training data)
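The minimal-optimal-set idea can be sketched as a small data-pruning step. The selection rule below (keep one shortest correct path per distinct answer, then mix across data types) is an assumption for illustration, not the paper's exact procedure.

```python
def minimal_optimal_set(samples):
    """Prune augmented reasoning paths: keep at most one (shortest) correct
    path per (question, final answer) pair."""
    best = {}
    for s in samples:  # s: dict with 'question', 'answer', 'path', 'correct'
        if not s["correct"]:
            continue
        key = (s["question"], s["answer"])
        if key not in best or len(s["path"]) < len(best[key]["path"]):
            best[key] = s
    return list(best.values())

def mix_minimal_sets(sets_by_type):
    """Mix the minimal optimal sets of different data types (e.g. natural-language
    CoT vs. code-style solutions) into one SFT corpus."""
    mixed = []
    for subset in sets_by_type.values():
        mixed.extend(minimal_optimal_set(subset))
    return mixed
```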
1 code implementation • 6 Feb 2024 • Ji Qi, Ming Ding, Weihan Wang, Yushi Bai, Qingsong Lv, Wenyi Hong, Bin Xu, Lei Hou, Juanzi Li, Yuxiao Dong, Jie Tang
Drawing inspiration from human cognition in solving visual problems (e.g., marking, zooming in), this paper introduces Chain of Manipulations, a mechanism that enables VLMs to solve problems step-by-step with evidence.
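An illustrative loop for this kind of step-by-step visual reasoning with evidence; `vlm.propose_step`, `vlm.answer`, and the manipulation names are hypothetical stand-ins, not the released API.

```python
def chain_of_manipulations(vlm, image, question, max_steps=5):
    """Iteratively let the model pick a manipulation (zoom, mark) or answer,
    accumulating grounded evidence along the way."""
    evidence = []
    current = image
    for _ in range(max_steps):
        step = vlm.propose_step(current, question, evidence)
        if step["action"] == "answer":
            return step["text"], evidence
        if step["action"] == "zoom_in":
            current = current.crop(step["box"])   # inspect a region more closely
        elif step["action"] == "mark":
            evidence.append(step["box"])          # record grounded evidence
    return vlm.answer(current, question, evidence), evidence
```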
1 code implementation • 31 Jan 2024 • Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li
Extending large language models to effectively handle long contexts requires instruction fine-tuning on input sequences of similar length.
4 code implementations • 6 Nov 2023 • Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang
We introduce CogVLM, a powerful open-source visual language foundation model.
Ranked #4 on Visual Question Answering (VQA) on InfiMM-Eval
no code implementations • 16 Oct 2023 • Ji Qi, Kaixuan Ji, Xiaozhi Wang, Jifan Yu, Kaisheng Zeng, Lei Hou, Juanzi Li, Bin Xu
Open Information Extraction (OIE) aims to extract objective structured knowledge from natural text, and building dedicated models guided by human experience has attracted growing attention.
no code implementations • 16 Oct 2023 • Ji Qi, Kaixuan Ji, Jifan Yu, Duokang Wang, Bin Xu, Lei Hou, Juanzi Li
Building models that comprehend videos and respond to specific user instructions is a practical and challenging topic, as it requires mastery of both vision understanding and knowledge reasoning.
no code implementations • 18 Aug 2023 • Pengbo Hu, Ji Qi, Xingyu Li, Hong Li, Xinqi Wang, Bing Quan, Ruiyu Wang, Yi Zhou
Our approach achieves strong performance while significantly reducing the number of inference steps.
1 code implementation • 15 Jun 2023 • Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan YAO, Ning Ding, Lei Hou, Zhiyuan Liu, Bin Xu, Jie Tang, Juanzi Li
The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations.
1 code implementation • 23 May 2023 • Ji Qi, Chuchun Zhang, Xiaozhi Wang, Kaisheng Zeng, Jifan Yu, Jinxin Liu, Jiuding Sun, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu
In this paper, we present the first benchmark that simulates the evaluation of open information extraction models in the real world, where the syntactic and expressive distributions of the same underlying knowledge may drift in various ways.
1 code implementation • 26 Mar 2023 • Ji Qi, Jifan Yu, Teng Tu, Kunyu Gao, Yifan Xu, Xinyu Guan, Xiaozhi Wang, Yuxiao Dong, Bin Xu, Lei Hou, Juanzi Li, Jie Tang, Weidong Guo, Hui Liu, Yu Xu
Despite the recent emergence of video captioning models, how to generate vivid, fine-grained video descriptions based on background knowledge (i.e., long and informative commentary about domain-specific scenes with appropriate reasoning) remains far from solved, even though it has valuable applications such as automatic sports narration.
1 code implementation • 17 Jan 2023 • Ji Qi, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu
In this paper, we propose a syntactically robust training framework that enables models to be trained on a syntactically abundant distribution based on diverse paraphrase generation.
no code implementations • 30 Nov 2022 • Jie Yan, Jing Liu, Ji Qi, Zhong-Yuan Zhang
Federated clustering (FC) is an essential extension of centralized clustering designed for the federated setting, wherein the challenge lies in constructing a global similarity measure without the need to share private data.
1 code implementation • 15 Nov 2022 • Chao Tao, Ji Qi, Mingning Guo, Qing Zhu, Haifeng Li
Deep learning has achieved great success in learning features from massive remote sensing images (RSIs).
1 code implementation • 29 Oct 2022 • Jie Yan, Jing Liu, Ji Qi, Zhong-Yuan Zhang
Federated clustering (FC) is an extension of centralized clustering in federated settings.
no code implementations • 8 Oct 2022 • Ji Qi, Bin Xu, Kaisheng Zeng, Jinxin Liu, Jifan Yu, Qi Gao, Juanzi Li, Lei Hou
Document-level relation extraction with graph neural networks faces a fundamental graph-construction gap between training and inference: the gold graph structure is only available during training, which forces most methods to adopt heuristic or syntactic rules to construct a prior graph as a pseudo proxy.
no code implementations • 23 Apr 2022 • Jie Yan, Xin Liu, Ji Qi, Tao You, Zhong-Yuan Zhang
Clustering ensembles show impressive performance in improving the accuracy and robustness of partition results and have received much attention in recent years.
1 code implementation • 31 May 2021 • Dejie Chang, Mosha Chen, Chaozhen Liu, LiPing Liu, Dongdong Li, Wei Li, Fei Kong, Bangchang Liu, Xiaobin Luo, Ji Qi, Qiao Jin, Bin Xu
In order to accelerate research on domain-specific knowledge graphs in the medical domain, we introduce DiaKG, a high-quality Chinese dataset for a diabetes knowledge graph, which contains 22,050 entities and 6,890 relations in total.
no code implementations • 11 Mar 2021 • ZiYuan Gao, Sanjay Jain, Ji Qi, Philipp Schlicht, Frank Stephan, Jacob Tarr
The present work looks at semiautomatic rings with automatic addition and comparisons, which are dense subrings of the real numbers, and asks how these can be used to represent geometric objects such that certain operations and transformations are automatic.
Formal Languages and Automata Theory, Logic
no code implementations • 2 Oct 2020 • Chao Tao, Ji Qi, Weipeng Lu, Hao Wang, Haifeng Li
With the development of deep learning, supervised learning methods perform well in remote sensing image (RSI) scene classification.
1 code implementation • ISPRS Journal of Photogrammetry and Remote Sensing 2019 • Chao Tao, Ji Qi, Yansheng Li, Hao Wang, Haifeng Li
The validation experiments using three large datasets of very high-resolution (VHR) satellite imagery show that the proposed method can improve road extraction accuracy and provide an output that is more in line with human expectations.
no code implementations • EMNLP 2018 • Changliang Li, Liang Li, Ji Qi
In this work, we propose a novel self-attentive model with gate mechanism to fully utilize the semantic correlation between slot and intent.
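A minimal sketch of a gate that fuses intent context into slot features, under the assumption of a sigmoid gate over concatenated representations; the exact formulation in the paper may differ.

```python
import torch
import torch.nn as nn

class SlotIntentGate(nn.Module):
    """Gate the sentence-level intent representation into token-level slot features."""
    def __init__(self, hidden):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, slot_states, intent_context):
        # slot_states: (B, T, H); intent_context: (B, H) broadcast over time
        intent = intent_context.unsqueeze(1).expand_as(slot_states)
        g = torch.sigmoid(self.gate(torch.cat([slot_states, intent], dim=-1)))
        return slot_states + g * intent   # intent information gated into slot features
```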
no code implementations • WS 2018 • Changliang Li, Ji Qi
A Chinese grammatical error diagnosis system is a very important tool that can help Chinese learners automatically diagnose grammatical errors in many scenarios.
no code implementations • 19 Jun 2017 • Jianyu Lin, Neil T. Clancy, Yang Hu, Ji Qi, Taran Tatla, Danail Stoyanov, Lena Maier-Hein, Daniel S. Elson
Intra-operative measurements of tissue shape and multi-/hyperspectral information have the potential to provide surgical guidance and decision-making support.
no code implementations • 15 Jun 2016 • Jianyu Lin, Neil T. Clancy, Xueqing Sun, Ji Qi, Mirek Janatka, Danail Stoyanov, Daniel S. Elson
In HSI mode standard endoscopic illumination is used, with the fibre probe collecting reflected light and encoding the spatial information into a linear format that can be imaged onto the slit of a spectrograph.