no code implementations • NAACL (ACL) 2022 • Gongzheng Li, Yadong Xi, Jingzhen Ding, Duan Wang, Ziyang Luo, Rongsheng Zhang, Bai Liu, Changjie Fan, Xiaoxi Mao, Zeng Zhao
To fill such a gap, we introduce a scalable inference solution: Easy and Efficient Transformer (EET), including a series of transformer inference optimizations at the algorithm and implementation levels.
no code implementations • 20 Dec 2024 • Yuhao Yang, Yue Wang, Dongxu Li, Ziyang Luo, Bei Chen, Chao Huang, Junnan Li
Digital agents for automating tasks across different platforms by directly manipulating the GUIs are increasingly important.
Ranked #4 on Natural Language Visual Grounding on ScreenSpot
no code implementations • 17 Dec 2024 • Yuxi Sun, Wei Gao, Jing Ma, Hongzhan Lin, Ziyang Luo, Wenxuan Zhang
This suggests that modeling human moral judgment by emulating humans' moral strategies is promising for improving the ethical behavior of LLMs.
1 code implementation • 28 Nov 2024 • Rao Fu, Ziyang Luo, Hongzhan Lin, Zhen Ye, Jing Ma
By integrating visual elements and embedded programming logic, ScratchEval requires the model to process both visual information and code structure, thereby comprehensively evaluating its programming intent understanding ability.
no code implementations • 20 Nov 2024 • Ziyang Luo, HaoNing Wu, Dongxu Li, Jing Ma, Mohan Kankanhalli, Junnan Li
To further streamline our evaluation, we introduce VideoAutoBench as an auxiliary benchmark, where human annotators label winners in a subset of VideoAutoArena battles.
no code implementations • 12 Nov 2024 • Chuyi Kong, Ziyang Luo, Hongzhan Lin, Zhiyuan Fan, Yaxin Fan, Yuxi Sun, Jing Ma
The advanced role-playing capabilities of Large Language Models (LLMs) have paved the way for developing Role-Playing Agents (RPAs).
1 code implementation • 8 Nov 2024 • Jianzhao Huang, Hongzhan Lin, Ziyan Liu, Ziyang Luo, Guang Chen, Jing Ma
The proliferation of Internet memes in the age of social media necessitates effective identification of harmful ones.
1 code implementation • 1 Oct 2024 • Ziyang Luo, Xin Li, Hongzhan Lin, Jing Ma, Lidong Bing
To this end, our study introduces the Adaptive Modular Response Evolution (AMR-Evol) framework, which employs a two-stage process to refine response distillation.
1 code implementation • 20 Aug 2024 • Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma
Recent advancements in large language models (LLMs) have showcased impressive code generation capabilities, primarily evaluated through language-to-code benchmarks.
1 code implementation • 17 Jun 2024 • Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, Jing Ma
Large vision-language models (LVLMs) have significantly improved multimodal reasoning tasks, such as visual question answering and image captioning.
4 code implementations • 11 Jun 2024 • Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing
In this paper, we present VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks.
Ranked #3 on Video Question Answering on Perception Test
1 code implementation • 1 May 2024 • Hongzhan Lin, Zixin Chen, Ziyang Luo, Mingfei Cheng, Jing Ma, Guang Chen
Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image.
1 code implementation • 30 Apr 2024 • Yuchen Tian, Weixiang Yan, Qian Yang, Xuandong Zhao, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma, Dawn Song
By evaluating 17 popular LLMs using this benchmark, we reveal significant differences in their accuracy and reliability in code generation, offering detailed insights for further improving the code generation capabilities of LLMs.
3 code implementations • 15 Apr 2024 • Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma
Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts.
no code implementations • 30 Mar 2024 • Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, Aleksandr Drozd, Jordan Clive, Kshitij Gupta, Liangyu Chen, Qi Sun, Ken Tsui, Noah Persaud, Nour Fahmy, Tianlong Chen, Mohit Bansal, Nicolo Monti, Tai Dang, Ziyang Luo, Tien-Tung Bui, Roberto Navigli, Virendra Mehta, Matthew Blumberg, Victor May, Huu Nguyen, Sampo Pyysalo
Despite these efforts, such models encounter challenges such as limited multilingual capabilities, risks of catastrophic forgetting during continual pretraining, and the high costs of training models from scratch, alongside the need to align with AI safety standards and regulatory frameworks.
1 code implementation • 24 Jan 2024 • Hongzhan Lin, Ziyang Luo, Wei Gao, Jing Ma, Bo Wang, Ruichao Yang
Then we propose to fine-tune a small language model as the debate judge for harmfulness inference, to facilitate multimodal fusion between the harmfulness rationales and the intrinsic multimodal information within memes.
no code implementations • 3 Jan 2024 • Hongzhan Lin, Ziyang Luo, Bo Wang, Ruichao Yang, Jing Ma
The exponential growth of social media has profoundly transformed how information is created, disseminated, and absorbed, exceeding any precedent in the digital age.
1 code implementation • 9 Dec 2023 • Hongzhan Lin, Ziyang Luo, Jing Ma, Long Chen
The age of social media is rife with memes.
1 code implementation • CVPR 2024 • Ziyang Luo, Nian Liu, Wangbo Zhao, Xuguang Yang, Dingwen Zhang, Deng-Ping Fan, Fahad Khan, Junwei Han
Salient object detection (SOD) and camouflaged object detection (COD) are related yet distinct binary mapping tasks.
no code implementations • 18 Oct 2023 • Nian Liu, Ziyang Luo, Ni Zhang, Junwei Han
Our previous work, the Visual Saliency Transformer (VST), addressed this constraint from a transformer-based sequence-to-sequence perspective, to unify RGB and RGB-D SOD.
3 code implementations • 14 Jun 2023 • Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang
Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+.
Ranked #6 on Code Generation on CodeContests
1 code implementation • 8 May 2023 • Ziyang Luo, Can Xu, Pu Zhao, Xiubo Geng, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang
We demonstrate that our PKG framework can enhance the performance of "black-box" LLMs on a range of domain knowledge-intensive tasks that require factual (+7.9%), tabular (+11.9%), medical (+3.0%), and multimodal (+8.1%) knowledge.
1 code implementation • 6 Feb 2023 • Ziyang Luo, Pu Zhao, Can Xu, Xiubo Geng, Tao Shen, Chongyang Tao, Jing Ma, Qingwen Lin, Daxin Jiang
The conventional dense retrieval paradigm relies on encoding images and texts into dense representations using dual-stream encoders; however, it faces challenges with low retrieval speed in large-scale retrieval scenarios.
1 code implementation • ICCV 2023 • Ziyang Luo, Pu Zhao, Can Xu, Xiubo Geng, Tao Shen, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang
To address this issue, we propose a novel sparse retrieval paradigm for ITR that exploits sparse representations in the vocabulary space for images and texts.
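As a rough illustration of why sparse representations in a vocabulary space can speed up retrieval, the sketch below scores images against a text query through an inverted index, so only images sharing at least one term with the query are touched. This is not the authors' implementation: the function names, the image IDs, and the term weights are all hypothetical and hand-written here, whereas a real system would learn the weights.

```python
from collections import defaultdict


def build_inverted_index(items):
    """Map each vocabulary term to the (item_id, weight) pairs that contain it."""
    index = defaultdict(list)
    for item_id, weights in items.items():
        for term, weight in weights.items():
            index[term].append((item_id, weight))
    return index


def search(index, query, top_k=2):
    """Score items by the dot product of sparse term weights, best first."""
    scores = defaultdict(float)
    for term, query_weight in query.items():
        # Only posting lists of terms present in the query are visited.
        for item_id, item_weight in index.get(term, []):
            scores[item_id] += query_weight * item_weight
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical sparse image representations (term -> importance weight).
images = {
    "img_dog": {"dog": 1.2, "grass": 0.7, "ball": 0.4},
    "img_cat": {"cat": 1.3, "sofa": 0.8},
    "img_park": {"grass": 1.0, "tree": 0.9, "dog": 0.3},
}

index = build_inverted_index(images)
results = search(index, {"dog": 1.0, "ball": 0.5})
print(results)
```

Here "img_cat" never enters the score table because it shares no term with the query, which is the property that lets an inverted index skip most of a large collection.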
1 code implementation • 2 Dec 2022 • Hongzhan Lin, Pengyao Yi, Jing Ma, Haiyun Jiang, Ziyang Luo, Shuming Shi, Ruifang Liu
The spread of rumors along with breaking events seriously hinders the truth in the era of social media.
1 code implementation • COLING 2022 • Zhiwei Yang, Jing Ma, Hechang Chen, Hongzhan Lin, Ziyang Luo, Yi Chang
Existing fake news detection methods aim to classify a piece of news as true or false and provide veracity explanations, achieving remarkable performances.
Ranked #3 on Fake News Detection on RAWFC
no code implementations • Findings (NAACL) 2022 • Ziyang Luo, Yadong Xi, Jing Ma, Zhiwei Yang, Xiaoxi Mao, Changjie Fan, Rongsheng Zhang
In contrast, the Transformer Decoder with causal attention masks is naturally sensitive to word order.
no code implementations • 14 Feb 2022 • Ziyang Luo, Zhipeng Hu, Yadong Xi, Rongsheng Zhang, Jing Ma
In contrast to these heavy-cost models, we introduce a lightweight image captioning framework (I-Tuning), which contains a small number of trainable parameters.
no code implementations • 30 Jan 2022 • Ziyang Luo, Yadong Xi, Rongsheng Zhang, Jing Ma
Before training the captioning models, an extra object detector is first used to recognize the objects in the image.
no code implementations • 29 Sep 2021 • Ziyang Luo, Yadong Xi, Jing Ma, Xiaoxi Mao, Changjie Fan
A common limitation of Transformer Encoder's self-attention mechanism is that it cannot automatically capture the information of word order, so one needs to feed the explicit position encodings into the target model.
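The order-insensitivity described above can be checked with a toy single-head attention in plain Python. This is a minimal sketch under simplifying assumptions (identity query/key/value projections, no position encodings, hypothetical helper names), not code from the paper. Without a mask, a token's output does not change when the other tokens are reordered; with a causal mask, moving a token changes what it can attend to, and therefore its output.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, causal=False):
    """Single-head attention with identity projections (Q = K = V = tokens)."""
    dim = len(tokens[0])
    outputs = []
    for i, query in enumerate(tokens):
        # With a causal mask, position i may only attend to positions 0..i.
        visible = tokens[: i + 1] if causal else tokens
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(dim)]
        )
    return outputs


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for x, y in zip(a, b))


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
swapped = [tokens[1], tokens[0], tokens[2]]  # swap the first two tokens

# Bidirectional: the last token's output ignores the order of the others.
bi_same = close(self_attention(tokens)[2], self_attention(swapped)[2])

# Causal: the token [1, 0] moves from position 0 to position 1 and now
# also sees [0, 1], so its output changes.
causal_same = close(
    self_attention(tokens, causal=True)[0],
    self_attention(swapped, causal=True)[1],
)

print(bi_same, causal_same)
```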
no code implementations • ACL (GeBNLP) 2021 • Meichun Jiao, Ziyang Luo
Gender bias in word embeddings has gradually become an active research field in recent years.
no code implementations • EACL 2021 • Ziyang Luo
Our results suggest that SMS tasks decrease the average CGI ability of upper layers, while NLI tasks increase it.
Natural Language Inference
Natural Language Understanding
no code implementations • ACL 2021 • Ziyang Luo, Artur Kulmizev, Xiaoxi Mao
In this work, we demonstrate that the contextualized word vectors derived from pretrained masked language model-based encoders share a common, perhaps undesirable pattern across layers.