Search Results for author: Ziyang Luo

Found 33 papers, 18 papers with code

Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model

no code implementations NAACL (ACL) 2022 Gongzheng Li, Yadong Xi, Jingzhen Ding, Duan Wang, Ziyang Luo, Rongsheng Zhang, Bai Liu, Changjie Fan, Xiaoxi Mao, Zeng Zhao

To fill such a gap, we introduce a scalable inference solution: Easy and Efficient Transformer (EET), including a series of transformer inference optimizations at the algorithm and implementation levels.

Decoder Inference Optimization

Aria-UI: Visual Grounding for GUI Instructions

no code implementations20 Dec 2024 Yuhao Yang, Yue Wang, Dongxu Li, Ziyang Luo, Bei Chen, Chao Huang, Junnan Li

Digital agents that automate tasks across different platforms by directly manipulating GUIs are increasingly important.

Natural Language Visual Grounding Visual Grounding

ClarityEthic: Explainable Moral Judgment Utilizing Contrastive Ethical Insights from Large Language Models

no code implementations17 Dec 2024 Yuxi Sun, Wei Gao, Jing Ma, Hongzhan Lin, Ziyang Luo, Wenxuan Zhang

This suggests that modeling human moral judgment by emulating human moral strategies is promising for improving the ethical behaviors of LLMs.

Contrastive Learning

ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges

1 code implementation28 Nov 2024 Rao Fu, Ziyang Luo, Hongzhan Lin, Zhen Ye, Jing Ma

By integrating visual elements and embedded programming logic, ScratchEval requires the model to process both visual information and code structure, thereby comprehensively evaluating its programming intent understanding ability.

Code Generation

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

no code implementations20 Nov 2024 Ziyang Luo, Haoning Wu, Dongxu Li, Jing Ma, Mohan Kankanhalli, Junnan Li

To further streamline our evaluation, we introduce VideoAutoBench as an auxiliary benchmark, where human annotators label winners in a subset of VideoAutoArena battles.

Chatbot Multiple-choice +2

SHARP: Unlocking Interactive Hallucination via Stance Transfer in Role-Playing Agents

no code implementations12 Nov 2024 Chuyi Kong, Ziyang Luo, Hongzhan Lin, Zhiyuan Fan, Yaxin Fan, Yuxi Sun, Jing Ma

The advanced role-playing capabilities of Large Language Models (LLMs) have paved the way for developing Role-Playing Agents (RPAs).

General Knowledge Hallucination +2

Towards Low-Resource Harmful Meme Detection with LMM Agents

1 code implementation8 Nov 2024 Jianzhao Huang, Hongzhan Lin, Ziyan Liu, Ziyang Luo, Guang Chen, Jing Ma

The proliferation of Internet memes in the age of social media necessitates effective identification of harmful ones.

Multimodal Reasoning

AMR-Evol: Adaptive Modular Response Evolution Elicits Better Knowledge Distillation for Large Language Models in Code Generation

1 code implementation1 Oct 2024 Ziyang Luo, Xin Li, Hongzhan Lin, Jing Ma, Lidong Bing

To this end, our study introduces the Adaptive Modular Response Evolution (AMR-Evol) framework, which employs a two-stage process to refine response distillation.

Code Generation HumanEval +1

CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?

1 code implementation20 Aug 2024 Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma

Recent advancements in large language models (LLMs) have showcased impressive code generation capabilities, primarily evaluated through language-to-code benchmarks.

Code Generation Memorization

MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models

1 code implementation17 Jun 2024 Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, Jing Ma

Large vision-language models (LVLMs) have significantly improved multimodal reasoning tasks, such as visual question answering and image captioning.

Benchmarking Fact Checking +5

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

4 code implementations11 Jun 2024 Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing

In this paper, we present VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks.

Multiple-choice Temporal Relation Extraction +3

CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models

1 code implementation1 May 2024 Hongzhan Lin, Zixin Chen, Ziyang Luo, Mingfei Cheng, Jing Ma, Guang Chen

Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image.

Language Modeling Language Modelling +2

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification

1 code implementation30 Apr 2024 Yuchen Tian, Weixiang Yan, Qian Yang, Xuandong Zhao, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma, Dawn Song

By evaluating 17 popular LLMs using this benchmark, we reveal significant differences in their accuracy and reliability in code generation, offering detailed insights for further improving the code generation capabilities of LLMs.

Code Generation Hallucination

MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

3 code implementations15 Apr 2024 Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Zhiyong Huang, Jing Ma

Programming often involves converting detailed and complex specifications into code, a process during which developers typically utilize visual aids to more effectively convey concepts.

Benchmarking Code Generation +1

Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models

1 code implementation24 Jan 2024 Hongzhan Lin, Ziyang Luo, Wei Gao, Jing Ma, Bo Wang, Ruichao Yang

Then we propose to fine-tune a small language model as the debate judge for harmfulness inference, to facilitate multimodal fusion between the harmfulness rationales and the intrinsic multimodal information within memes.

Language Modelling Text Generation

GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse

no code implementations3 Jan 2024 Hongzhan Lin, Ziyang Luo, Bo Wang, Ruichao Yang, Jing Ma

The exponential growth of social media has profoundly transformed how information is created, disseminated, and absorbed, exceeding any precedent in the digital age.

VST++: Efficient and Stronger Visual Saliency Transformer

no code implementations18 Oct 2023 Nian Liu, Ziyang Luo, Ni Zhang, Junwei Han

Our previous work, the Visual Saliency Transformer (VST), addressed this constraint from a transformer-based sequence-to-sequence perspective, to unify RGB and RGB-D SOD.

object-detection Object Detection +1

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

3 code implementations14 Jun 2023 Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang

Moreover, our model even outperforms the largest closed LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+.

Code Generation HumanEval

Augmented Large Language Models with Parametric Knowledge Guiding

1 code implementation8 May 2023 Ziyang Luo, Can Xu, Pu Zhao, Xiubo Geng, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang

We demonstrate that our PKG framework can enhance the performance of "black-box" LLMs on a range of domain knowledge-intensive tasks that require factual (+7.9%), tabular (+11.9%), medical (+3.0%), and multimodal (+8.1%) knowledge.

LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Retrieval

1 code implementation6 Feb 2023 Ziyang Luo, Pu Zhao, Can Xu, Xiubo Geng, Tao Shen, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang

The conventional dense retrieval paradigm relies on encoding images and texts into dense representations using dual-stream encoders; however, it faces challenges with low retrieval speed in large-scale retrieval scenarios.

Image-text Retrieval Text Retrieval

Zero-Shot Rumor Detection with Propagation Structure via Prompt Learning

1 code implementation2 Dec 2022 Hongzhan Lin, Pengyao Yi, Jing Ma, Haiyun Jiang, Ziyang Luo, Shuming Shi, Ruifang Liu

The spread of rumors alongside breaking events seriously obscures the truth in the era of social media.

Domain Adaptation

A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Explainable Fake News Detection

1 code implementation COLING 2022 Zhiwei Yang, Jing Ma, Hechang Chen, Hongzhan Lin, Ziyang Luo, Yi Chang

Existing fake news detection methods aim to classify a piece of news as true or false and provide veracity explanations, achieving remarkable performance.

Fake News Detection

I-Tuning: Tuning Frozen Language Models with Image for Lightweight Image Captioning

no code implementations14 Feb 2022 Ziyang Luo, Zhipeng Hu, Yadong Xi, Rongsheng Zhang, Jing Ma

Unlike these heavy-cost models, we introduce a lightweight image captioning framework (I-Tuning), which contains a small number of trainable parameters.

Decoder Image Captioning +1

A Frustratingly Simple Approach for End-to-End Image Captioning

no code implementations30 Jan 2022 Ziyang Luo, Yadong Xi, Rongsheng Zhang, Jing Ma

Before training the captioning models, an extra object detector is first used to recognize the objects in the image.

Decoder Image Captioning +2

Analyzing the Implicit Position Encoding Ability of Transformer Decoder

no code implementations29 Sep 2021 Ziyang Luo, Yadong Xi, Jing Ma, Xiaoxi Mao, Changjie Fan

A common limitation of the Transformer Encoder's self-attention mechanism is that it cannot automatically capture word-order information, so explicit position encodings must be fed into the target model.

Decoder Language Modeling +2

Positional Artefacts Propagate Through Masked Language Model Embeddings

no code implementations ACL 2021 Ziyang Luo, Artur Kulmizev, Xiaoxi Mao

In this work, we demonstrate that the contextualized word vectors derived from pretrained masked language model-based encoders share a common, perhaps undesirable pattern across layers.

Language Modeling Language Modelling +4
