1 code implementation • 26 Jan 2024 • XiaoJun Wu, Dixiang Zhang, Ruyi Gan, Junyu Lu, Ziwei Wu, Renliang Sun, Jiaxing Zhang, Pingjian Zhang, Yan Song
Recent advancements in text-to-image models have significantly enhanced image generation capabilities, yet a notable gap of open-source models persists in bilingual or Chinese language support.
no code implementations • 28 Dec 2023 • Dixiang Zhang, Junyu Lu, Pingjian Zhang
To solve this issue, we propose a Unified Lattice Graph Fusion (ULGF) approach for Chinese NER.
Chinese Named Entity Recognition
named-entity-recognition
+2
no code implementations • 8 Dec 2023 • Junyu Lu, Dixiang Zhang, Songxin Zhang, Zejian Xie, Zhuoyang Song, Cong Lin, Jiaxing Zhang, BingYi Jing, Pingjian Zhang
During the instruction fine-tuning stage, we introduce semantic-aware visual feature extraction, a crucial method that enables the model to extract informative features from concrete visual objects.
Ranked #1 on
Image Captioning
on nocaps entire
no code implementations • 7 Dec 2023 • Ruyi Gan, XiaoJun Wu, Junyu Lu, Yuanhe Tian, Dixiang Zhang, Ziwei Wu, Renliang Sun, Chang Liu, Jiaxing Zhang, Pingjian Zhang, Yan Song
However, there are few specialized models in certain domains, such as interior design, which is attributed to the complex textual descriptions and detailed visual elements inherent in design, alongside the necessity for adaptable resolution.
no code implementations • 6 Nov 2023 • Ruyi Gan, Ziwei Wu, Renliang Sun, Junyu Lu, XiaoJun Wu, Dixiang Zhang, Kunhao Pan, Junqing He, Yuanhe Tian, Ping Yang, Qi Yang, Hao Wang, Jiaxing Zhang, Yan Song
Although many such issues are addressed along the line of research on LLMs, an important yet practical limitation is that many studies overly pursue enlarging model sizes without comprehensively analyzing and optimizing the use of pre-training data in their learning process, as well as appropriate organization and leveraging of such data in training LLMs under cost-effective settings.
no code implementations • 12 Oct 2023 • Junyu Lu, Dixiang Zhang, XiaoJun Wu, Xinyu Gao, Ruyi Gan, Jiaxing Zhang, Yan Song, Pingjian Zhang
Recent advancements enlarge the capabilities of large language models (LLMs) in zero-shot image-to-text generation and understanding by integrating multi-modal inputs.
no code implementations • COLING 2022 • Junyu Lu, Dixiang Zhang, Pingjian Zhang
Then, we transform the fine-grained semantic representation of the vision and text into a unified lattice structure and design a novel relative position encoding to match different modalities in Transformer.