Search Results for author: Guoxin Wang

Found 22 papers, 10 papers with code

XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding

no code implementations Findings (ACL) 2022 Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities.

document understanding Form

Citrus: Leveraging Expert Cognitive Pathways in a Medical Language Model for Advanced Medical Decision Support

1 code implementation 25 Feb 2025 Guoxin Wang, Minyu Gao, Shuai Yang, Ya zhang, Lizhi He, Liang Huang, Hanlin Xiao, Yexuan Zhang, Wanyue Li, Lu Chen, Jintao Fei, Xin Li

Large language models (LLMs), particularly those with reasoning capabilities, have rapidly advanced in recent years, demonstrating significant potential across a wide range of applications.

Decision Making Diagnostic +3

JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation

1 code implementation 14 Nov 2024 Xuyang Cao, Guoxin Wang, Sheng Shi, Jun Zhao, Yang Yao, Jintao Fei, Minyu Gao

Specifically, in the first stage, we introduce a decoupled facial representation framework that separates dynamic facial expressions from static 3D facial representations.

Image Animation Motion Generation +1

JoyType: A Robust Design for Multilingual Visual Text Creation

no code implementations 26 Sep 2024 Chao Li, Chen Jiang, Xiaolong Liu, Jun Zhao, Guoxin Wang

In this paper, we introduce a novel approach for multilingual visual text creation, named JoyType, designed to maintain the font style of text during the image generation process.

Image Generation Optical Character Recognition (OCR) +2

JoyHallo: Digital human model for Mandarin

no code implementations 20 Sep 2024 Sheng Shi, Xuyang Cao, Jun Zhao, Guoxin Wang

In audio-driven video generation, creating Mandarin videos presents significant challenges.

model Text Generation +1

ECG Biometric Authentication Using Self-Supervised Learning for IoT Edge Sensors

no code implementations 9 Sep 2024 Guoxin Wang, Shreejith Shanker, Avishek Nag, Yong Lian, Deepu John

This paper investigates an electrocardiogram (ECG) based biometric user authentication system using features derived from the Convolutional Neural Network (CNN) and self-supervised contrastive learning.

Contrastive Learning Quantization +1

3SHNet: Boosting Image-Sentence Retrieval via Visual Semantic-Spatial Self-Highlighting

1 code implementation 26 Apr 2024 Xuri Ge, Songpei Xu, Fuhai Chen, Jie Wang, Guoxin Wang, Shan An, Joemon M. Jose

In this paper, we propose a novel visual Semantic-Spatial Self-Highlighting Network (termed 3SHNet) for high-precision, high-efficiency and high-generalization image-sentence retrieval.

Cross-Modal Retrieval Retrieval +2

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

1 code implementation 14 Mar 2024 Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding

Consequently, a simple combination of them cannot guarantee accomplishing both training efficiency and inference efficiency with minimal costs.

Model Compression parameter-efficient fine-tuning

A Bi-Pyramid Multimodal Fusion Method for the Diagnosis of Bipolar Disorders

no code implementations 15 Jan 2024 Guoxin Wang, Sheng Shi, Shan An, Fengmei Fan, Wenshu Ge, Qi Wang, Feng Yu, Zhiren Wang

Previous research on the diagnosis of Bipolar disorder has mainly focused on resting-state functional magnetic resonance imaging.

Medical Diagnosis

Unsupervised Pre-Training Using Masked Autoencoders for ECG Analysis

no code implementations 17 Oct 2023 Guoxin Wang, Qingyuan Wang, Ganesh Neelakanta Iyer, Avishek Nag, Deepu John

Unsupervised learning methods have become increasingly important in deep learning due to their demonstrated large utilization of datasets and higher accuracy in computer vision and natural language processing tasks.

Unsupervised Pre-training

Multi-Dimension-Embedding-Aware Modality Fusion Transformer for Psychiatric Disorder Clasification

no code implementations 4 Oct 2023 Guoxin Wang, Xuyang Cao, Shan An, Fengmei Fan, Chao Zhang, Jinsong Wang, Feng Yu, Zhiren Wang

In this work, we proposed a multi-dimension-embedding-aware modality fusion transformer (MFFormer) for schizophrenia and bipolar disorder classification using rs-fMRI and T1 weighted structural MRI (T1w sMRI).

Functional Connectivity Time Series

Unifying Vision, Text, and Layout for Universal Document Processing

4 code implementations CVPR 2023 Zineng Tang, ZiYi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal

UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation.

Ranked #5 on Visual Question Answering (VQA) on InfographicVQA (using extra training data)

document understanding Image Reconstruction +1

Understanding Long Documents with Different Position-Aware Attentions

no code implementations 17 Aug 2022 Hai Pham, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang

Despite several successes in document understanding, the practical task for long document understanding is largely under-explored due to several challenges in computation and how to efficiently absorb long multimodal input.

document understanding Position

BoningKnife: Joint Entity Mention Detection and Typing for Nested NER via prior Boundary Knowledge

no code implementations 20 Jul 2021 Huiqiang Jiang, Guoxin Wang, Weile Chen, Chengxi Zhang, Börje F. Karlsson

While named entity recognition (NER) is a key task in natural language processing, most approaches only target flat entities, ignoring nested structures which are common in many scenarios.

named-entity-recognition Named Entity Recognition +3

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

6 code implementations 18 Apr 2021 Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.

Document Image Classification document understanding +2

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

8 code implementations ACL 2021 Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou

Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.

Document Image Classification Document Layout Analysis +7

Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources

1 code implementation 14 Nov 2019 Qianhui Wu, Zijia Lin, Guoxin Wang, Hui Chen, Börje F. Karlsson, Biqing Huang, Chin-Yew Lin

For languages with no annotated resources, transferring knowledge from rich-resource languages is an effective solution for named entity recognition (NER).

Cross-Lingual NER Meta-Learning +4
