Search Results for author: Wenqi Zhang

Found 26 papers, 15 papers with code

Think Twice, Click Once: Enhancing GUI Grounding via Fast and Slow Systems

no code implementations • 9 Mar 2025 • Fei Tang, Yongliang Shen, Hang Zhang, Siqi Chen, Guiyang Hou, Wenqi Zhang, Wenqiao Zhang, Kaitao Song, Weiming Lu, Yueting Zhuang

This structured decomposition enables systematic understanding of both interface layouts and visual relationships.

DB-Explore: Automated Database Exploration and Instruction Synthesis for Text-to-SQL

no code implementations • 6 Mar 2025 • Haoyuan Ma, Yongliang Shen, Hengwei Liu, Wenqi Zhang, Haolei Xu, Qiuying Peng, Jun Wang, Weiming Lu

Our framework enables comprehensive database understanding through diverse sampling strategies and automated instruction generation, bridging the gap between database structures and language models.

Logical Reasoning, Natural Language Queries +1
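
As a rough sketch of the exploration-and-synthesis idea described above, the snippet below samples table schemas plus a few example rows from a SQLite database and packs them into a prompt for instruction generation; the helper names, prompt wording, and use of SQLite are assumptions for illustration, not the paper's actual pipeline.

```python
import sqlite3

def sample_database(db_path, rows_per_table=3):
    """Collect each table's DDL and a few example rows (a simple sampling strategy)."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
    snapshot = []
    for name, ddl in cur.fetchall():
        cur.execute(f"SELECT * FROM {name} LIMIT {rows_per_table}")
        snapshot.append(f"{ddl}\n-- example rows: {cur.fetchall()}")
    conn.close()
    return "\n\n".join(snapshot)

def build_instruction_prompt(snapshot):
    """Format sampled structures into a request for synthetic question/SQL pairs;
    the actual LLM call that consumes this prompt is left abstract."""
    return ("Given the following database schema and sample rows, write diverse "
            "natural-language questions with their corresponding SQL:\n\n" + snapshot)
```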

STaR-SQL: Self-Taught Reasoner for Text-to-SQL

no code implementations • 19 Feb 2025 • Mingqian He, Yongliang Shen, Wenqi Zhang, Qiuying Peng, Jun Wang, Weiming Lu

Generating step-by-step "chain-of-thought" rationales has proven effective for improving the performance of large language models on complex reasoning tasks.

Text-To-SQL
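
The self-taught-reasoner recipe named in the title is usually a generate-filter-finetune loop: sample rationale/SQL pairs, keep only those whose execution result matches the gold query, and fine-tune on the survivors. A minimal sketch under that assumption, with a `generate` callable standing in for the LLM:

```python
import sqlite3

def execute_sql(db_path, query):
    """Run a query and return its result set (empty list on failure)."""
    try:
        with sqlite3.connect(db_path) as conn:
            return conn.execute(query).fetchall()
    except sqlite3.Error:
        return []

def collect_rationales(examples, generate, n_samples=4):
    """Keep sampled rationale+SQL pairs whose execution matches the gold result;
    the kept pairs become fine-tuning data for the next iteration."""
    kept = []
    for ex in examples:  # ex: {'question', 'db_path', 'gold_sql'}
        gold = execute_sql(ex["db_path"], ex["gold_sql"])
        for _ in range(n_samples):
            rationale, sql = generate(ex["question"])  # LLM call, abstracted away
            if execute_sql(ex["db_path"], sql) == gold:
                kept.append({"question": ex["question"],
                             "rationale": rationale, "sql": sql})
                break
    return kept
```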

RG-Attn: Radian Glue Attention for Multi-modality Multi-agent Cooperative Perception

1 code implementation • 28 Jan 2025 • Lantao Li, Kang Yang, Wenqi Zhang, Xiaoxue Wang, Chen Sun

To harness every available data source for optimal performance, we design a robust LiDAR-camera cross-modality fusion module, Radian-Glue-Attention (RG-Attn), applicable to both intra-agent and inter-agent cross-modality fusion scenarios, owing to the convenient coordinate conversion by transformation matrices and a unified sampling/inversion mechanism.
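
The coordinate conversion that makes inter-agent fusion convenient is ordinary rigid-body geometry. A minimal sketch of projecting one agent's LiDAR points into the ego frame with a 4x4 homogeneous transform (the matrix here is an illustrative pure translation, not values from the paper):

```python
import numpy as np

def to_ego_frame(points_xyz, T_src_to_ego):
    """Map (n, 3) points from a source agent's frame into the ego frame
    using a 4x4 homogeneous transformation matrix."""
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])  # (n, 4) homogeneous coords
    return (homo @ T_src_to_ego.T)[:, :3]            # back to (n, 3)

# Example: the source agent sits 2 m behind the ego along x.
T = np.eye(4)
T[0, 3] = 2.0
print(to_ego_frame(np.array([[1.0, 0.0, 0.0]]), T))  # -> [[3. 0. 0.]]
```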

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

1 code implementation • 1 Jan 2025 • Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang, Lidong Bing

Compared to its counterparts, our video-centric textbook offers more coherent context, richer knowledge, and better image-text alignment.

Optical Character Recognition (OCR)

GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation

1 code implementation • 15 Oct 2024 • Fei Tang, Yongliang Shen, Hang Zhang, Zeqi Tan, Wenqi Zhang, Zhibiao Huang, Kaitao Song, Weiming Lu, Yueting Zhuang

GaVaMoE introduces two key components: (1) a rating reconstruction module that employs a Variational Autoencoder (VAE) with a Gaussian Mixture Model (GMM) to capture complex user-item collaborative preferences, serving as a pre-trained multi-gating mechanism; and (2) a set of fine-grained expert models coupled with the multi-gating mechanism for generating highly personalized explanations.

Explainable Recommendation, Language Modelling +2
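
A minimal sketch of the gating idea, assuming each latent code is scored against K Gaussian components whose soft posterior routes the input to K experts; the dimensions, diagonal covariances, and linear experts are simplifying assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class GMMGatedExperts(nn.Module):
    def __init__(self, latent_dim, num_experts, hidden_dim):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(num_experts, latent_dim))
        self.log_var = nn.Parameter(torch.zeros(num_experts, latent_dim))
        self.experts = nn.ModuleList(
            nn.Linear(latent_dim, hidden_dim) for _ in range(num_experts))

    def forward(self, z):                    # z: (batch, latent_dim) from the VAE
        diff = z.unsqueeze(1) - self.mu      # (batch, K, latent_dim)
        log_p = -0.5 * ((diff ** 2) / self.log_var.exp() + self.log_var).sum(-1)
        gate = torch.softmax(log_p, dim=-1)  # soft posterior over K components
        outs = torch.stack([e(z) for e in self.experts], dim=1)  # (batch, K, hidden)
        return (gate.unsqueeze(-1) * outs).sum(dim=1)  # gated mixture of experts
```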

Entering Real Social World! Benchmarking the Social Intelligence of Large Language Models from a First-person Perspective

1 code implementation • 8 Oct 2024 • Guiyang Hou, Wenqi Zhang, Yongliang Shen, Zeqi Tan, Sihao Shen, Weiming Lu

Existing evaluations also lack a comprehensive assessment of behavioral intelligence, particularly one incorporating critical human-machine interaction scenarios.

Attribute, Benchmarking +2

DiReDi: Distillation and Reverse Distillation for AIoT Applications

no code implementations • 12 Sep 2024 • Chen Sun, Qing Tong, Wenshuang Yang, Wenqi Zhang

When the user needs to update the edge AI model to better fit the actual scenario, a reverse distillation (RD) process is employed to extract from the edge AI model, using the user's exclusive data, the knowledge that captures the difference between the user's preferences and the manufacturer's presumptions.

Knowledge Distillation, Management
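
If the forward step uses the standard soft-label distillation objective, reverse distillation amounts to swapping which model supplies the teacher signal: the user-adapted edge model, evaluated on the user's exclusive data, supervises the update. A generic sketch under that assumption (not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: KL between temperature-softened distributions.
    Forward KD: the manufacturer's model is the teacher. Reverse RD: the edge
    model adapted to user data plays teacher, exposing where user behavior
    diverges from the manufacturer's presumptions."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)
```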

Apple Intelligence Foundation Language Models

no code implementations • 29 Jul 2024 • Tom Gunter, ZiRui Wang, Chong Wang, Ruoming Pang, Aonan Zhang, BoWen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, Sam Wiseman, Syd Evans, Tao Lei, Vivek Rathod, Xiang Kong, Xianzhi Du, Yanghao Li, Yongqiang Wang, Yuan Gao, Zaid Ahmed, Zhaoyang Xu, Zhiyun Lu, Al Rashid, Albin Madappally Jose, Alec Doane, Alfredo Bencomo, Allison Vanderby, Andrew Hansen, Ankur Jain, Anupama Mann Anupama, Areeba Kamal, Bugu Wu, Carolina Brum, Charlie Maalouf, Chinguun Erdenebileg, Chris Dulhanty, Dominik Moritz, Doug Kang, Eduardo Jimenez, Evan Ladd, Fangping Shi, Felix Bai, Frank Chu, Fred Hohman, Hadas Kotek, Hannah Gillis Coleman, Jane Li, Jeffrey Bigham, Jeffery Cao, Jeff Lai, Jessica Cheung, Jiulong Shan, Joe Zhou, John Li, Jun Qin, Karanjeet Singh, Karla Vega, Kelvin Zou, Laura Heckman, Lauren Gardiner, Margit Bowler, Maria Cordell, Meng Cao, Nicole Hay, Nilesh Shahdadpuri, Otto Godwin, Pranay Dighe, Pushyami Rachapudi, Ramsey Tantawi, Roman Frigg, Sam Davarnia, Sanskruti Shah, Saptarshi Guha, Sasha Sirovica, Shen Ma, Shuang Ma, Simon Wang, Sulgi Kim, Suma Jayaram, Vaishaal Shankar, Varsha Paidi, Vivek Kumar, Xin Wang, Xin Zheng, Walker Cheng, Yael Shrager, Yang Ye, Yasu Tanaka, Yihao Guo, Yunsong Meng, Zhao Tang Luo, Zhi Ouyang, Alp Aygar, Alvin Wan, Andrew Walkingshaw, Andy Narayanan, Antonie Lin, Arsalan Farooq, Brent Ramerth, Colorado Reed, Chris Bartels, Chris Chaney, David Riazati, Eric Liang Yang, Erin Feldman, Gabriel Hochstrasser, Guillaume Seguin, Irina Belousova, Joris Pelemans, Karen Yang, Keivan Alizadeh Vahid, Liangliang Cao, Mahyar Najibi, Marco Zuliani, Max Horton, Minsik Cho, Nikhil Bhendawade, Patrick Dong, Piotr Maj, Pulkit Agrawal, Qi Shan, Qichen Fu, Regan Poston, Sam Xu, Shuangning Liu, Sushma Rao, Tashweena Heeramun, Thomas Merth, Uday Rayala, Victor Cui, Vivek Rangarajan Sridhar, Wencong Zhang, Wenqi Zhang, Wentao Wu, Xingyu Zhou, Xinwen Liu, Yang Zhao, Yin Xia, Zhile Ren, Zhongzheng Ren

We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute.

Language Modeling, Language Modelling

Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

1 code implementation • 9 Jul 2024 • Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

In light of this, we design a multi-modal self-instruct strategy, utilizing large language models and their code capabilities to synthesize massive numbers of abstract images and visual reasoning instructions across daily scenarios.

Chart Understanding, Language Modeling +2
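
A toy instance of the synthesis loop: plotting code, which the LLM itself would write in the described pipeline, renders an abstract image, and a question/answer pair is derived from the same underlying data so the label is correct by construction. The chart content and field names are illustrative only:

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

def render_chart(values, labels, path):
    """Render an abstract bar chart; in the real pipeline this code is synthesized."""
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title("Monthly sales")
    fig.savefig(path)
    plt.close(fig)

values, labels = [3, 7, 5], ["Jan", "Feb", "Mar"]
render_chart(values, labels, "chart.png")
sample = {
    "image": "chart.png",
    "question": "Which month has the highest sales?",
    "answer": labels[values.index(max(values))],  # answer follows from the data
}
```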

Edge AI-Enabled Chicken Health Detection Based on Enhanced FCOS-Lite and Knowledge Distillation

no code implementations • 3 Jul 2024 • Qiang Tong, Jinrui Wang, Wenshuang Yang, Songtao Wu, Wenqi Zhang, Chen Sun, Kuanhong Xu

The utilization of AIoT technology has become a crucial trend in modern poultry management, offering the potential to optimize farming operations and reduce human workloads.

Knowledge Distillation, Quantization

TimeToM: Temporal Space is the Key to Unlocking the Door of Large Language Models' Theory-of-Mind

no code implementations • 1 Jul 2024 • Guiyang Hou, Wenqi Zhang, Yongliang Shen, Linjuan Wu, Weiming Lu

Theory of Mind (ToM), the cognitive ability to reason about the mental states of ourselves and others, is the foundation of social interaction.

Advancing Process Verification for Large Language Models via Tree-Based Preference Learning

no code implementations • 29 Jun 2024 • Mingqian He, Yongliang Shen, Wenqi Zhang, Zeqi Tan, Weiming Lu

For instance, Tree-PLV achieved substantial performance gains over the Mistral-7B self-consistency baseline on GSM8K (67.55% to 82.79%), MATH (17.00% to 26.80%), CSQA (68.14% to 72.97%), and StrategyQA (82.86% to 83.25%). Additionally, our study explores the appropriate granularity for applying preference learning, revealing that step-level guidance provides feedback that better aligns with the evaluation of the reasoning process.

Binary Classification, GSM8K +2
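
Step-level preference learning can be sketched as a pairwise ranking objective over verifier scores for two continuations of the same partial solution; the Bradley-Terry-style form below is an assumption about the loss, not the paper's exact training objective:

```python
import torch
import torch.nn.functional as F

def step_preference_loss(score_preferred, score_rejected):
    """Maximize the log-odds that the preferred reasoning step outscores the
    rejected one, giving feedback at step granularity rather than per answer."""
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Toy usage: verifier scores for three preferred/rejected step pairs.
good = torch.tensor([1.2, 0.4, 0.9])
bad = torch.tensor([0.3, 0.6, -0.1])
print(step_preference_loss(good, bad))
```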

On the Combination of AI and Wireless Technologies: 3GPP Standardization Progress

no code implementations • 17 Jun 2024 • Chen Sun, Tao Cui, Wenqi Zhang, Yingshuang Bai, Shuo Wang, Haojin Li

Combining Artificial Intelligence (AI) and wireless communication technologies has become one of the major technology trends towards 2030.

Management

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

4 code implementations • 11 Jun 2024 • Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing

In this paper, we present VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video- and audio-oriented tasks.

Multiple-choice, Temporal Relation Extraction +3

Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives

no code implementations • 4 Jan 2024 • Wenqi Zhang, Yongliang Shen, Linjuan Wu, Qiuying Peng, Jun Wang, Yueting Zhuang, Weiming Lu

Experiments conducted on a series of reasoning and translation tasks with different LLMs serve to underscore the effectiveness and generality of our strategy.

Language Modeling, Language Modelling +1

TaskBench: Benchmarking Large Language Models for Task Automation

1 code implementation • 30 Nov 2023 • Yongliang Shen, Kaitao Song, Xu Tan, Wenqi Zhang, Kan Ren, Siyu Yuan, Weiming Lu, Dongsheng Li, Yueting Zhuang

To address this, we introduce TaskBench, a comprehensive framework to evaluate the capability of LLMs in task automation.

Benchmarking, Parameter Prediction

An Expression Tree Decoding Strategy for Mathematical Equation Generation

1 code implementation • 14 Oct 2023 • Wenqi Zhang, Yongliang Shen, Qingpeng Nong, Zeqi Tan, Yanna Ma, Weiming Lu

To generate a tree with expressions as its nodes, we employ a layer-wise parallel decoding strategy: we decode multiple independent expressions (leaf nodes) in parallel at each layer, and repeat this parallel decoding layer by layer to sequentially generate the parent-node expressions that depend on earlier ones.

Math, Math Word Problem Solving +1
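
The layer-wise order can be made concrete with a toy evaluator: expressions within a layer are mutually independent (hence decodable in parallel), while later layers reference earlier results through placeholders. The `e0, e1, ...` naming scheme is an illustrative assumption:

```python
def evaluate_layers(layers):
    """Evaluate expressions layer by layer; each layer may refer to the
    results of earlier layers via placeholders e0, e1, ..."""
    env, idx = {}, 0
    for layer in layers:          # expressions in one layer are independent,
        for expr in layer:        # so they could be decoded in parallel
            env[f"e{idx}"] = eval(expr, {}, env)
            idx += 1
    return env[f"e{idx - 1}"]     # value of the root expression

# "(3 + 5) * (10 - 4)" as two leaf expressions plus one parent expression.
print(evaluate_layers([["3 + 5", "10 - 4"], ["e0 * e1"]]))  # -> 48
```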

Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow

1 code implementation • 12 Jun 2023 • Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang

The advancements are twofold: First, it is a code-centric agent that receives human requests and generates code as an intermediary to handle massive data, which is quite flexible for large-scale data processing tasks.
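
A toy version of that code-as-intermediary interface: the agent's generated pandas snippet runs against in-memory tables and whatever it binds to `result` is returned. The `run_generated_code` helper and the example request are hypothetical, and a real deployment would sandbox the `exec` call:

```python
import pandas as pd

def run_generated_code(code, dataframes):
    """Execute generated analysis code with named tables in scope and
    return the value it assigns to `result`."""
    scope = {"pd": pd, **dataframes, "result": None}
    exec(code, scope)  # untrusted code: isolate/sandbox in practice
    return scope["result"]

# A request like "average price per city" might yield code such as:
generated = "result = sales.groupby('city')['price'].mean()"
sales = pd.DataFrame({"city": ["SH", "SH", "HZ"], "price": [10, 20, 30]})
print(run_generated_code(generated, {"sales": sales}))
```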

Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem

1 code implementation • 21 Oct 2022 • Wenqi Zhang, Yongliang Shen, Yanna Ma, Xiaoxia Cheng, Zeqi Tan, Qingpeng Nong, Weiming Lu

A math word problem solver requires both precise relational reasoning about the quantities in the text and reliable generation of diverse equations.

Ranked #2 on Math Word Problem Solving on Math23K (using extra training data)

Contrastive Learning, Math +3
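
Since the snippet does not spell out the loss, the sketch below shows a generic InfoNCE-style objective between two encoded views of the same problem (say, a text-side and an equation-side encoding); treating matching batch rows as positives is an assumption about the pairing:

```python
import torch
import torch.nn.functional as F

def info_nce(view_a, view_b, temperature=0.1):
    """Contrast two views: matching rows are positives, every other pair in
    the batch serves as a negative."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(a.size(0))  # i-th row matches i-th column
    return F.cross_entropy(logits, targets)
```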
