no code implementations • Findings (ACL) 2022 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
Multimodal pre-training with text, layout, and image has recently achieved SOTA performance on visually rich document understanding tasks, demonstrating the great potential of joint learning across different modalities.
1 code implementation • 25 Feb 2025 • Guoxin Wang, Minyu Gao, Shuai Yang, Ya Zhang, Lizhi He, Liang Huang, Hanlin Xiao, Yexuan Zhang, Wanyue Li, Lu Chen, Jintao Fei, Xin Li
Large language models (LLMs), particularly those with reasoning capabilities, have rapidly advanced in recent years, demonstrating significant potential across a wide range of applications.
1 code implementation • 14 Nov 2024 • Xuyang Cao, Guoxin Wang, Sheng Shi, Jun Zhao, Yang Yao, Jintao Fei, Minyu Gao
Specifically, in the first stage, we introduce a decoupled facial representation framework that separates dynamic facial expressions from static 3D facial representations.
no code implementations • 26 Sep 2024 • Chao Li, Chen Jiang, Xiaolong Liu, Jun Zhao, Guoxin Wang
In this paper, we introduce a novel approach for multilingual visual text creation, named JoyType, designed to maintain the font style of text during the image generation process.
no code implementations • 20 Sep 2024 • Sheng Shi, Xuyang Cao, Jun Zhao, Guoxin Wang
In audio-driven video generation, creating Mandarin videos presents significant challenges.
1 code implementation • 14 Sep 2024 • Lei Yu, Jintao Fei, Xinyi Liu, Yang Yao, Jun Zhao, Guoxin Wang, Xin Li
This non-contact, real-time monitoring method holds great potential for home settings.
no code implementations • 9 Sep 2024 • Guoxin Wang, Shreejith Shanker, Avishek Nag, Yong Lian, Deepu John
This paper investigates an electrocardiogram (ECG) based biometric user authentication system using features derived from a convolutional neural network (CNN) and self-supervised contrastive learning.
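As a rough sketch of the contrastive component (a generic SimCLR-style NT-Xent formulation, not necessarily the paper's exact setup), two augmented views of the same ECG segment are embedded by the CNN encoder and pulled together:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent (SimCLR-style) contrastive loss.

    z1, z2: (B, D) embeddings of two augmented views of the same ECG
    segments; row i of z1 and row i of z2 form a positive pair.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, D)
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    B = z1.size(0)
    targets = torch.cat([torch.arange(B, 2 * B), torch.arange(0, B)])
    return F.cross_entropy(sim, targets)

# Usage: x1, x2 = augment(segment), augment(segment)  # e.g., jitter, scaling
# loss = nt_xent_loss(cnn_encoder(x1), cnn_encoder(x2))
```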
1 code implementation • 26 Apr 2024 • Xuri Ge, Songpei Xu, Fuhai Chen, Jie Wang, Guoxin Wang, Shan An, Joemon M. Jose
In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval.
Ranked #1 on Cross-Modal Retrieval on MSCOCO
1 code implementation • 14 Mar 2024 • Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding
Consequently, simply combining the two cannot guarantee both training efficiency and inference efficiency at minimal cost.
no code implementations • 15 Jan 2024 • Guoxin Wang, Sheng Shi, Shan An, Fengmei Fan, Wenshu Ge, Qi Wang, Feng Yu, Zhiren Wang
Previous research on the diagnosis of bipolar disorder has mainly focused on resting-state functional magnetic resonance imaging (rs-fMRI).
no code implementations • 17 Oct 2023 • Guoxin Wang, Qingyuan Wang, Ganesh Neelakanta Iyer, Avishek Nag, Deepu John
Unsupervised learning methods have become increasingly important in deep learning owing to their ability to utilize large amounts of unlabeled data and their higher accuracy in computer vision and natural language processing tasks.
no code implementations • 4 Oct 2023 • Guoxin Wang, Xuyang Cao, Shan An, Fengmei Fan, Chao Zhang, Jinsong Wang, Feng Yu, Zhiren Wang
In this work, we propose a multi-dimension-embedding-aware modality fusion transformer (MFFormer) for schizophrenia and bipolar disorder classification using rs-fMRI and T1-weighted structural MRI (T1w sMRI).
no code implementations • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Yuzhong Zhao, Yilin Jia, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei
In this paper, we present KOSMOS-2.5, a multimodal literate model for machine reading of text-intensive images.
4 code implementations • CVPR 2023 • Zineng Tang, ZiYi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal
UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.
Ranked #5 on Visual Question Answering (VQA) on InfographicVQA (using extra training data)
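A minimal usage sketch, assuming the Hugging Face Transformers port of UDOP (UdopProcessor / UdopForConditionalGeneration with the microsoft/udop-large checkpoint); the image, words, and boxes below are placeholders:

```python
from PIL import Image
from transformers import UdopProcessor, UdopForConditionalGeneration

# apply_ocr=False: we supply OCR words and boxes ourselves.
processor = UdopProcessor.from_pretrained("microsoft/udop-large", apply_ocr=False)
model = UdopForConditionalGeneration.from_pretrained("microsoft/udop-large")

image = Image.open("form.png").convert("RGB")  # placeholder document image
words = ["7/25/93"]                            # placeholder OCR words
boxes = [[100, 120, 180, 140]]                 # matching boxes on a 0-1000 grid

prompt = "Question answering. What is the date on the form?"
encoding = processor(images=image, text=prompt, text_pair=words,
                     boxes=boxes, return_tensors="pt")
ids = model.generate(**encoding, max_new_tokens=20)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```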
no code implementations • 17 Aug 2022 • Hai Pham, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang
Despite several successes in document understanding, the practical task of long document understanding remains largely under-explored due to challenges in computation and in efficiently absorbing long multimodal inputs.
no code implementations • 20 Jul 2021 • Huiqiang Jiang, Guoxin Wang, Weile Chen, Chengxi Zhang, Börje F. Karlsson
While named entity recognition (NER) is a key task in natural language processing, most approaches only target flat entities, ignoring nested structures, which are common in many scenarios.
Ranked #1 on Nested Mention Recognition on ACE 2005
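To make the flat-vs-nested distinction concrete: in "Bank of China Tower", the ORG mention "Bank of China" is nested inside the FAC mention "Bank of China Tower", and a flat tagger can emit only one of the two. A generic span-enumeration view of nested NER (illustrative, not necessarily this paper's method):

```python
def candidate_spans(tokens, max_len=4):
    """Enumerate all (start, end) token spans up to max_len tokens.
    A span-based nested-NER model classifies each span independently,
    so overlapping mentions can both be predicted."""
    n = len(tokens)
    return [(i, j) for i in range(n) for j in range(i + 1, min(i + max_len, n) + 1)]

tokens = ["Bank", "of", "China", "Tower"]
for start, end in candidate_spans(tokens):
    print(" ".join(tokens[start:end]))
# Both "Bank of China" (ORG) and "Bank of China Tower" (FAC) appear
# among the candidates a span classifier would score.
```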
no code implementations • 19 Apr 2021 • Chenyi Lei, Shixian Luo, Yong Liu, Wanggui He, Jiamang Wang, Guoxin Wang, Haihong Tang, Chunyan Miao, Houqiang Li
Pre-trained neural models have recently achieved impressive performance in understanding multimodal content.
6 code implementations • 18 Apr 2021 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.
Ranked #6 on Key-value Pair Extraction on SIBR
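A minimal inference sketch, assuming the Hugging Face Transformers port (LayoutXLMProcessor plus the LayoutLMv2 model classes and the microsoft/layoutxlm-base checkpoint; LayoutLMv2-based models also require detectron2). The words, boxes, and label count are placeholders:

```python
from PIL import Image
from transformers import LayoutXLMProcessor, LayoutLMv2ForTokenClassification

# LayoutXLM shares the LayoutLMv2 architecture with a multilingual vocabulary.
processor = LayoutXLMProcessor.from_pretrained("microsoft/layoutxlm-base", apply_ocr=False)
model = LayoutLMv2ForTokenClassification.from_pretrained(
    "microsoft/layoutxlm-base", num_labels=7)  # placeholder label set

image = Image.open("invoice.png").convert("RGB")  # placeholder document
words = ["Rechnung", "Nr.", "12345"]              # pre-extracted OCR words
boxes = [[80, 40, 190, 60], [200, 40, 240, 60], [250, 40, 330, 60]]  # 0-1000 grid

encoding = processor(image, words, boxes=boxes, return_tensors="pt")
logits = model(**encoding).logits                 # per-token label scores
```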
8 code implementations • ACL 2021 • Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
Ranked #1 on Key Information Extraction on SROIE
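The "layout" signal in LayoutLM-style models is essentially a 2-D position embedding added to each token embedding; the sketch below illustrates the idea with placeholder dimensions (it is not the released model's code):

```python
import torch
import torch.nn as nn

class TextLayoutEmbedding(nn.Module):
    """Illustrative LayoutLM-style input embedding: each token embedding is
    summed with embeddings of its bounding-box coordinates (x0, y0, x1, y1),
    normalized to a 0-1000 grid. All dimensions are placeholders."""
    def __init__(self, vocab_size=30522, hidden=768, max_coord=1001):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.x_emb = nn.Embedding(max_coord, hidden)  # shared by x0 and x1
        self.y_emb = nn.Embedding(max_coord, hidden)  # shared by y0 and y1

    def forward(self, input_ids, bbox):
        # input_ids: (batch, seq); bbox: (batch, seq, 4) integer coordinates
        e = self.tok(input_ids)
        e = e + self.x_emb(bbox[..., 0]) + self.y_emb(bbox[..., 1])
        e = e + self.x_emb(bbox[..., 2]) + self.y_emb(bbox[..., 3])
        return e

emb = TextLayoutEmbedding()
ids = torch.randint(0, 30522, (1, 4))
bbox = torch.tensor([[[80, 40, 190, 60]] * 4])
print(emb(ids, bbox).shape)  # torch.Size([1, 4, 768])
```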
no code implementations • 23 Oct 2020 • Yong Liu, Susen Yang, Chenyi Lei, Guoxin Wang, Haihong Tang, Juyong Zhang, Aixin Sun, Chunyan Miao
Side information of items, e.g., images and text descriptions, has been shown to be effective in contributing to accurate recommendations.
1 code implementation • 14 Nov 2019 • Qianhui Wu, Zijia Lin, Guoxin Wang, Hui Chen, Börje F. Karlsson, Biqing Huang, Chin-Yew Lin
For languages with no annotated resources, transferring knowledge from rich-resource languages is an effective solution for named entity recognition (NER).
Ranked #1 on Cross-Lingual NER on MSRA
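A common zero-shot transfer baseline for this setting (illustrative, not necessarily the method of this paper): fine-tune a multilingual encoder on source-language NER data, then apply it unchanged to the target language:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# A multilingual encoder lets NER supervision in one language transfer
# zero-shot to others through the shared representation space.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained(
    "xlm-roberta-base", num_labels=9)  # placeholder BIO tag set

# ... fine-tune `model` on rich-resource (e.g., English) NER data here ...

ner = pipeline("token-classification", model=model, tokenizer=tokenizer)
print(ner("Angela Merkel besuchte Paris."))  # zero-shot target-language input
```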
1 code implementation • NAACL 2019 • Yuying Zhu, Guoxin Wang, Börje F. Karlsson
Named entity recognition (NER) in Chinese is essential but difficult because of the lack of natural delimiters.
Ranked #1 on Chinese Named Entity Recognition on Weibo NER (Accuracy-NE metric)