Search Results for author: Wenxuan Wang

Found 99 papers, 47 papers with code

Image Difference Grounding with Natural Language

no code implementations 2 Apr 2025 Wenxuan Wang, Zijia Zhao, Yisi Zhang, Yepeng Tang, Erdong Hu, Xinlong Wang, Jing Liu

We introduce DiffGround, a large-scale and high-quality dataset for IDG, containing image pairs with diverse visual variations along with instructions querying fine-grained differences.

Visual Grounding

STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models

no code implementations 23 Mar 2025 Xunguang Wang, Wenxuan Wang, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Daoyuan Wu, Shuai Wang

Large Language Models (LLMs) have become increasingly vulnerable to jailbreak attacks that circumvent their safety mechanisms.

TimeZero: Temporal Video Grounding with Reasoning-Guided LVLM

1 code implementation 17 Mar 2025 Ye Wang, Boshen Xu, Zihao Yue, Zihan Xiao, Ziheng Wang, Liang Zhang, Dingyi Yang, Wenxuan Wang, Qin Jin

We introduce TimeZero, a reasoning-guided LVLM designed for the temporal video grounding (TVG) task.

Video Grounding

SeCap: Self-Calibrating and Adaptive Prompts for Cross-view Person Re-Identification in Aerial-Ground Networks

1 code implementation 10 Mar 2025 Shining Wang, Yunlong Wang, Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Peng Wang

To address this issue, previous methods attempt to reduce the differences between viewpoints by critical attributes and decoupling the viewpoints.

Person Re-Identification Person Search

VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models

no code implementations 10 Mar 2025 Jen-tse Huang, Jiantong Qin, Jianping Zhang, Youliang Yuan, Wenxuan Wang, Jieyu Zhao

To analyze explicit bias, we directly pose questions to VLMs related to gender and racial differences: (1) Multiple-choice questions based on a given image (e.g., "What is the education level of the person in the image?")

Multiple-choice

VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models

1 code implementation 23 Feb 2025 Jen-tse Huang, Dasen Dai, Jen-Yuan Huang, Youliang Yuan, Xiaoyuan Liu, Wenxuan Wang, Wenxiang Jiao, Pinjia He, Zhaopeng Tu

Multimodal Large Language Models (MLLMs) have demonstrated remarkable advancements in multimodal understanding; however, their fundamental visual cognitive abilities remain largely underexplored.

Benchmarking Spatial Reasoning +1

Mitigating Data Scarcity in Time Series Analysis: A Foundation Model with Series-Symbol Data Generation

no code implementations 21 Feb 2025 Wenxuan Wang, Kai Wu, Yujian Betterest Li, Dan Wang, XiaoYu Zhang, Jing Liu

Building on this concept, we introduce a series-symbol (S2) dual-modality data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic representations.

Time Series Time Series Analysis

A Survey of LLM-based Agents in Medicine: How far are we from Baymax?

no code implementations 16 Feb 2025 Wenxuan Wang, Zizhan Ma, Zheng Wang, Chenghan Wu, WenTing Chen, Xiang Li, Yixuan Yuan

Large Language Models (LLMs) are transforming healthcare through the development of LLM-based agents that can understand, reason about, and assist with medical tasks.

Hallucination Survey

Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs

no code implementations 16 Feb 2025 Wenxuan Wang, Xiaoyuan Liu, Kuiyi Gao, Jen-tse Huang, Youliang Yuan, Pinjia He, Shuai Wang, Zhaopeng Tu

Multimodal Large Language Models (MLLMs) have expanded the capabilities of traditional language models by enabling interaction through both text and images.

Benchmarking

VLMs as GeoGuessr Masters: Exceptional Performance, Hidden Biases, and Privacy Risks

1 code implementation 16 Feb 2025 Jingyuan Huang, Jen-tse Huang, Ziyi Liu, Xiaoyuan Liu, Wenxuan Wang, Jieyu Zhao

Evaluating four VLMs, we find that while these models demonstrate the ability to recognize geographic information from images, achieving up to $53.8\%$ accuracy in city prediction, they exhibit significant regional biases.

Making Them a Malicious Database: Exploiting Query Code to Jailbreak Aligned Large Language Models

1 code implementation 13 Feb 2025 Qingsong Zou, Jingyu Xiao, Qing Li, Zhi Yan, Yuhang Wang, Li Xu, Wenxuan Wang, Kuofeng Gao, Ruoyu Li, Yong Jiang

By treating LLMs as knowledge databases, we translate malicious queries in natural language into structured non-natural query language to bypass the safety alignment mechanisms of LLMs.

Safety Alignment

EVEv2: Improved Baselines for Encoder-Free Vision-Language Models

1 code implementation 10 Feb 2025 Haiwen Diao, Xiaotong Li, Yufeng Cui, Yueze Wang, Haoge Deng, Ting Pan, Wenxuan Wang, Huchuan Lu, Xinlong Wang

Existing encoder-free vision-language models (VLMs) are rapidly narrowing the performance gap with their encoder-based counterparts, highlighting the promising potential for unified multimodal systems with structural simplicity and efficient deployment.

Decoder

Fact-or-Fair: A Checklist for Behavioral Testing of AI Models on Fairness-Related Queries

1 code implementation 9 Feb 2025 Jen-tse Huang, Yuhang Yan, Linqi Liu, Yixin Wan, Wenxuan Wang, Kai-Wei Chang, Michael R. Lyu

Using these statistics, we develop a checklist comprising objective and subjective queries to analyze behavior of large language models (LLMs) and text-to-image (T2I) models.

Diversity Fairness +1

BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR

no code implementations 22 Jan 2025 Guodong Ma, Wenxuan Wang, Lifeng Zhou, Yuting Yang, Yuke Li, Binbin Du

Recently, the Mixture of Expert (MoE) architecture, such as LR-MoE, is often used to alleviate the impact of language confusion on the multilingual ASR (MASR) task.

Mixture-of-Experts

How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs

no code implementations 18 Jan 2025 Jialun Cao, Yuk-Kit Chan, Zixuan Ling, Wenxuan Wang, Shuqing Li, Mingwei Liu, Ruixi Qiao, Yuting Han, Chaozheng Wang, Boxi Yu, Pinjia He, Shuai Wang, Zibin Zheng, Michael R. Lyu, Shing-Chi Cheung

We propose How2Bench, which is comprised of a 55-criteria checklist as a set of guidelines to govern the development of code-related benchmarks comprehensively.

MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs

1 code implementation 19 Dec 2024 Yuxuan Wan, Yi Dong, Jingyu Xiao, Yintong Huo, Wenxuan Wang, Michael R. Lyu

Our study applies existing methods to the MRWeb problem using a newly curated dataset of 500 websites (300 synthetic, 200 real-world).

Sustainable Self-evolution Adversarial Training

no code implementations 3 Dec 2024 Wenxuan Wang, Chenglei Wang, Huihui Qi, Menghao Ye, Xuelin Qian, Peng Wang, Yanning Zhang

With the wide application of deep neural network models in various computer vision tasks, there has been a proliferation of adversarial example generation strategies aimed at deeply exploring model security.

Adversarial Defense Continual Learning

Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation

no code implementations 23 Nov 2024 Fengfan Zhou, Bangjie Yin, Hefei Ling, Qianyu Zhou, Wenxuan Wang

Experimental results demonstrate that our proposed attack method can effectively enhance the transferability of the crafted adversarial face examples.

Adversarial Attack Face Recognition

On the Shortcut Learning in Multilingual Neural Machine Translation

no code implementations 15 Nov 2024 Wenxuan Wang, Wenxiang Jiao, Jen-tse Huang, Zhaopeng Tu, Michael R. Lyu

By carefully designing experiments on different MNMT scenarios and models, we attribute the off-target issue to the overfitting of the shortcuts of (non-centric, centric) language mappings.

Attribute Machine Translation +1

Dynamic Textual Prompt For Rehearsal-free Lifelong Person Re-identification

no code implementations 9 Nov 2024 Hongyu Chen, Bingliang Jiao, Wenxuan Wang, Peng Wang

By leveraging this shared textual space as an anchor, we can prompt the ReID model to embed images from various domains into a unified semantic space, thereby alleviating catastrophic forgetting caused by domain shifts.

Knowledge Distillation Person Re-Identification

Interaction2Code: Benchmarking MLLM-based Interactive Webpage Code Generation from Interactive Prototyping

1 code implementation 5 Nov 2024 Jingyu Xiao, Yuxuan Wan, Yintong Huo, Zixin Wang, Xinyi Xu, Wenxuan Wang, Zhiyao Xu, Yuhang Wang, Michael R. Lyu

To address these limitations, we propose four enhancement strategies: Interactive Element Highlighting, Failure-aware Prompting (FAP), Visual Saliency Enhancement, and Visual-Textual Descriptions Combination, all aiming at improving MLLMs' performance on the Interaction-to-Code task.

Benchmarking Code Generation

Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs

1 code implementation 10 Oct 2024 Xiaoyuan Liu, Wenxuan Wang, Youliang Yuan, Jen-tse Huang, Qiuzhi Liu, Pinjia He, Zhaopeng Tu

This paper explores the problem of commonsense-level vision-knowledge conflict in Multimodal Large Language Models (MLLMs), where visual information contradicts the model's internal commonsense knowledge (see Figure 1).

Diagnostic

Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step

no code implementations 4 Oct 2024 Wenxuan Wang, Kuiyi Gao, Zihan Jia, Youliang Yuan, Jen-tse Huang, Qiuzhi Liu, Shuai Wang, Wenxiang Jiao, Zhaopeng Tu

To assess the safety of existing models, we introduce a novel jailbreaking method called Chain-of-Jailbreak (CoJ) attack, which compromises image generation models through a step-by-step editing process.

Image Generation

Learning to Ask: When LLM Agents Meet Unclear Instruction

no code implementations 31 Aug 2024 Wenxuan Wang, Juluan Shi, Zixuan Ling, Yuk-Kit Chan, Chaozheng Wang, Cheryl Lee, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

Equipped with the capability to call functions, modern large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.

On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents

1 code implementation 2 Aug 2024 Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Michael R. Lyu, Maarten Sap

Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents, each focusing on a specific domain.

Code Generation Large Language Model +1

Diffusion Feedback Helps CLIP See Better

1 code implementation 29 Jul 2024 Wenxuan Wang, Quan Sun, Fan Zhang, Yepeng Tang, Jing Liu, Xinlong Wang

We demonstrate that DIVA improves CLIP's performance on the challenging MMVP-VLM benchmark which assesses fine-grained visual abilities to a large extent (e.g., 3-7%), and enhances the performance of MLLMs and vision models on multimodal understanding and segmentation tasks.

Image Classification

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

2 code implementations 12 Jul 2024 Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu

DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by appending a segment of harmful response to the beginning of a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence.

Position

Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach

no code implementations 24 Jun 2024 Yuxuan Wan, Chaozheng Wang, Yi Dong, Wenxuan Wang, Shuqing Li, Yintong Huo, Michael R. Lyu

We conduct extensive testing with a dataset comprised of real-world websites and various MLLMs and demonstrate that DCGen achieves up to a 14% improvement in visual similarity over competing methods.

Layout Design

Enhancing Visible-Infrared Person Re-identification with Modality- and Instance-aware Visual Prompt Learning

no code implementations 18 Jun 2024 Ruiqi Wu, Bingliang Jiao, Wenxuan Wang, Meng Liu, Peng Wang

In this model, we have designed a series of modality-specific prompts, which could enable our model to adapt to and make use of the specific information inherent in different modality inputs, thereby reducing the interference caused by the modality gap and achieving better identification.

Person Re-Identification Prompt Learning

Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack

1 code implementation 17 Jun 2024 Shangqing Tu, Zhuoran Pan, Wenxuan Wang, Zhexin Zhang, Yuliang Sun, Jifan Yu, Hongning Wang, Lei Hou, Juanzi Li

To bridge this gap, we propose a new task, knowledge-to-jailbreak, which aims to generate jailbreaks from domain knowledge to evaluate the safety of LLMs when applied to those domains.

Language Modeling Language Modelling +1

Diffusion Actor-Critic with Entropy Regulator

1 code implementation 24 May 2024 Yinuo Wang, Likun Wang, YuXuan Jiang, Wenjun Zou, Tong Liu, Xujie Song, Wenxuan Wang, Liming Xiao, Jiang Wu, Jingliang Duan, Shengbo Eben Li

This algorithm conceptualizes the reverse process of the diffusion model as a novel policy function and leverages the capability of the diffusion model to fit multimodal distributions, thereby enhancing the representational capacity of the policy.

Decision Making MuJoCo +1

How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO

1 code implementation 22 Apr 2024 Man Tik Ng, Hui Tung Tse, Jen-tse Huang, Jingjing Li, Wenxuan Wang, Michael R. Lyu

However, existing studies focus on imitating well-known public figures or fictional characters, overlooking the potential for simulating ordinary individuals.

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions

1 code implementation 17 Feb 2024 Wenxuan Wang, Yisi Zhang, Xingjian He, Yichen Yan, Zijia Zhao, Xinlong Wang, Jing Liu

To promote classic VG towards human intention interpretation, we propose a new intention-driven visual grounding (IVG) task and build a large-scale IVG dataset termed IntentionVG with free-form intention expressions.

Visual Grounding

A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models

no code implementations 17 Feb 2024 Jie Liu, Wenxuan Wang, Yihang Su, Jingyuan Huan, WenTing Chen, Yudi Zhang, Cheng-Yi Li, Kao-Jung Chang, Xiaohan Xin, Linlin Shen, Michael R. Lyu

The significant breakthroughs of Medical Multi-Modal Large Language Models (Med-MLLMs) renovate modern healthcare with robust information synthesis and medical decision support.

Diagnostic Visual Question Answering (VQA)

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

1 code implementation CVPR 2024 Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu

To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES.

Descriptive Object +3

LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models

1 code implementation 1 Jan 2024 Yuxuan Wan, Wenxuan Wang, Yiliu Yang, Youliang Yuan, Jen-tse Huang, Pinjia He, Wenxiang Jiao, Michael R. Lyu

We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs) such as ChatGPT and GPT-4.

Code Generation In-Context Learning +2

The Earth is Flat? Unveiling Factual Errors in Large Language Models

no code implementations 1 Jan 2024 Wenxuan Wang, Juluan Shi, Zhaopeng Tu, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

Current methods for evaluating LLMs' veracity are limited by test data leakage or the need for extensive human labor, hindering efficient and accurate error detection.

In-Context Learning Multiple-choice

New Job, New Gender? Measuring the Social Bias in Image Generation Models

1 code implementation 1 Jan 2024 Wenxuan Wang, Haonan Bai, Jen-tse Huang, Yuxuan Wan, Youliang Yuan, Haoyi Qiu, Nanyun Peng, Michael R. Lyu

BiasPainter uses a diverse range of seed images of individuals and prompts the image generation models to edit these images using gender, race, and age-neutral queries.

Bias Detection Fairness +1

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

1 code implementation 13 Dec 2023 Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu

To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES

Descriptive Object +3

Training Multi-layer Neural Networks on Ising Machine

no code implementations 6 Nov 2023 Xujie Song, Tong Liu, Shengbo Eben Li, Jingliang Duan, Wenxuan Wang, Keqiang Li

This paper proposes an Ising learning algorithm to train quantized neural network (QNN), by incorporating two essential techniques, namely binary representation of topological network and order reduction of loss function.

Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models

1 code implementation 31 Oct 2023 Tian Liang, Zhiwei He, Jen-tse Huang, Wenxuan Wang, Wenxiang Jiao, Rui Wang, Yujiu Yang, Zhaopeng Tu, Shuming Shi, Xing Wang

Ideally, an advanced agent should possess the ability to accurately describe a given word using an aggressive description while concurrently maximizing confusion in the conservative description, enhancing its participation in the game.

LFAA: Crafting Transferable Targeted Adversarial Examples with Low-Frequency Perturbations

no code implementations 31 Oct 2023 Kunyu Wang, Juluan Shi, Wenxuan Wang

In this work, we present a novel approach to generate transferable targeted adversarial examples by exploiting the vulnerability of deep neural networks to perturbations on high-frequency components of images.

Adversarial Attack

Med-DANet V2: A Flexible Dynamic Architecture for Efficient Medical Volumetric Segmentation

no code implementations 28 Oct 2023 Haoran Shen, Yifu Zhang, Wenxuan Wang, Chen Chen, Jing Liu, Shanshan Song, Jiangyun Li

As a pioneering work, a dynamic architecture network for medical volumetric segmentation (i.e., Med-DANet) has achieved a favorable accuracy and efficiency trade-off by dynamically selecting a suitable 2D candidate model from the pre-defined model bank for different slices.

Computational Efficiency MRI segmentation +2

Not All Countries Celebrate Thanksgiving: On the Cultural Dominance in Large Language Models

no code implementations 19 Oct 2023 Wenxuan Wang, Wenxiang Jiao, Jingyuan Huang, Ruyi Dai, Jen-tse Huang, Zhaopeng Tu, Michael R. Lyu

This paper identifies a cultural dominance issue within large language models (LLMs) due to the predominant use of English data in model training (e.g., ChatGPT).

All

Distributional Soft Actor-Critic with Three Refinements

2 code implementations 9 Oct 2023 Jingliang Duan, Wenxuan Wang, Liming Xiao, Jiaxin Gao, Shengbo Eben Li, Chang Liu, Ya-Qin Zhang, Bo Cheng, Keqiang Li

To address this issue, we previously proposed the Distributional Soft Actor-Critic (DSAC or DSACv1), an off-policy RL algorithm that enhances value estimation accuracy by learning a continuous Gaussian value distribution.

Decision Making Reinforcement Learning (RL)

Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench

1 code implementation 2 Oct 2023 Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho Lam, Shujie Ren, Youliang Yuan, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

Large Language Models (LLMs) have recently showcased their remarkable capacities, not only in natural language processing tasks but also across diverse domains such as clinical medicine, legal consultation, and education.

Benchmarking Safety Alignment

All Languages Matter: On the Multilingual Safety of Large Language Models

1 code implementation 2 Oct 2023 Wenxuan Wang, Zhaopeng Tu, Chang Chen, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu

In this work, we build the first multilingual safety benchmark for LLMs, XSafety, in response to the global deployment of LLMs in practice.

All Safety Alignment

Boosting Adversarial Transferability by Block Shuffle and Rotation

2 code implementations CVPR 2024 Kunyu Wang, Xuanran He, Wenxuan Wang, Xiaosen Wang

In this work, we observe that existing input transformation based attacks, one of the mainstream transfer-based attacks, result in different attention heatmaps on various models, which might limit the transferability.

An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software

no code implementations 18 Aug 2023 Wenxuan Wang, Jingyuan Huang, Jen-tse Huang, Chang Chen, Jiazhen Gu, Pinjia He, Michael R. Lyu

Moreover, through retraining the models with the test cases generated by OASIS, the robustness of the moderation model can be improved without performance degradation.

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation

no code implementations 18 Aug 2023 Yichen Yan, Xingjian He, Wenxuan Wang, Sihan Chen, Jing Liu

Our method harnesses the potential of the multi-modal features in the segmentation stage and aligns language features of different emphases with image features to achieve fine-grained text-to-pixel correlation.

Image Segmentation Referring Expression Segmentation +2

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

1 code implementation 12 Aug 2023 Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi, Zhaopeng Tu

We propose a novel framework CipherChat to systematically examine the generalizability of safety alignment to non-natural languages -- ciphers.

Ethics Red Teaming +1

Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench

1 code implementation 7 Aug 2023 Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao, Zhaopeng Tu, Michael R. Lyu

Evaluating Large Language Models' (LLMs) anthropomorphic capabilities has become increasingly important in contemporary discourse.

Revisiting the Reliability of Psychological Scales on Large Language Models

1 code implementation 31 May 2023 Jen-tse Huang, Wenxiang Jiao, Man Ho Lam, Eric John Li, Wenxuan Wang, Michael R. Lyu

Recent research has focused on examining Large Language Models' (LLMs) characteristics from a psychological standpoint, acknowledging the necessity of understanding their behavioral characteristics.

Validating Multimedia Content Moderation Software via Semantic Fusion

no code implementations 23 May 2023 Wenxuan Wang, Jingyuan Huang, Chang Chen, Jiazhen Gu, Jianping Zhang, Weibin Wu, Pinjia He, Michael Lyu

To this end, content moderation software has been widely deployed on these platforms to detect and block toxic content.

Sentence software testing

BiasAsker: Measuring the Bias in Conversational AI System

1 code implementation 21 May 2023 Yuxuan Wan, Wenxuan Wang, Pinjia He, Jiazhen Gu, Haonan Bai, Michael Lyu

Particularly, it is hard to generate inputs that can comprehensively trigger potential bias due to the lack of data containing both social groups as well as biased properties.

Bias Detection

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation

no code implementations 19 May 2023 Wenxuan Wang, Jing Liu, Xingjian He, Yisi Zhang, Chen Chen, Jiachen Shen, Yan Zhang, Jiangyun Li

Referring image segmentation (RIS) is a fundamental vision-language task that intends to segment a desired object from an image based on a given natural language expression.

Image Segmentation Segmentation +1

FreMIM: Fourier Transform Meets Masked Image Modeling for Medical Image Segmentation

1 code implementation 21 Apr 2023 Wenxuan Wang, Jing Wang, Chen Chen, Jianbo Jiao, Yuanxiu Cai, Shanshan Song, Jiangyun Li

The research community has witnessed the powerful potential of self-supervised Masked Image Modeling (MIM), which enables models to learn visual representations from unlabeled data.

Image Segmentation Medical Image Segmentation +2

Med-Tuning: A New Parameter-Efficient Tuning Framework for Medical Volumetric Segmentation

no code implementations 21 Apr 2023 Jiachen Shen, Wenxuan Wang, Chen Chen, Jianbo Jiao, Jing Liu, Yan Zhang, Shanshan Song, Jiangyun Li

Thus, it is of increasing importance to fine-tune pre-trained models for medical volumetric segmentation tasks in a both effective and parameter-efficient manner.

Segmentation Transfer Learning

ParroT: Translating during Chat using Large Language Models tuned with Human Translation and Feedback

1 code implementation 5 Apr 2023 Wenxiang Jiao, Jen-tse Huang, Wenxuan Wang, Zhiwei He, Tian Liang, Xing Wang, Shuming Shi, Zhaopeng Tu

Therefore, we propose ParroT, a framework to enhance and regulate the translation abilities during chat based on open-source LLMs (e.g., LLaMA), human-written translation and feedback data.

Instruction Following Machine Translation +1

Improving the Transferability of Adversarial Samples by Path-Augmented Method

1 code implementation CVPR 2023 Jianping Zhang, Jen-tse Huang, Wenxuan Wang, Yichen Li, Weibin Wu, Xiaosen Wang, Yuxin Su, Michael R. Lyu

However, such methods selected the image augmentation path heuristically and may augment images that are semantics-inconsistent with the target images, which harms the transferability of the generated adversarial samples.

Image Augmentation

ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark

no code implementations 15 Mar 2023 Haoran Wu, Wenxuan Wang, Yuxuan Wan, Wenxiang Jiao, Michael Lyu

ChatGPT is a cutting-edge artificial intelligence language model developed by OpenAI, which has attracted a lot of attention due to its surprisingly strong ability in answering follow-up questions.

Grammatical Error Correction Language Modeling +2

MTTM: Metamorphic Testing for Textual Content Moderation Software

1 code implementation 11 Feb 2023 Wenxuan Wang, Jen-tse Huang, Weibin Wu, Jianping Zhang, Yizhan Huang, Shuqing Li, Pinjia He, Michael Lyu

In addition, we leverage the test cases generated by MTTM to retrain the model we explored, which largely improves model robustness (0% to 5.9% EFR) while maintaining the accuracy on the original test set.

Sentence

Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine

1 code implementation 20 Jan 2023 Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, Shuming Shi, Zhaopeng Tu

By evaluating on a number of benchmark test sets, we find that ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages but lags behind significantly on low-resource or distant languages.

Machine Translation Sentence +1

Smoothing Policy Iteration for Zero-sum Markov Games

no code implementations 3 Dec 2022 Yangang Ren, Yao Lyu, Wenxuan Wang, Shengbo Eben Li, Zeyang Li, Jingliang Duan

In this paper, we propose the smoothing policy iteration (SPI) algorithm to solve the zero-sum MGs approximately, where the maximum operator is replaced by the weighted LogSumExp (WLSE) function to obtain the nearly optimal equilibrium policies.

Adversarial Robustness

Tencent's Multilingual Machine Translation System for WMT22 Large-Scale African Languages

1 code implementation 18 Oct 2022 Wenxiang Jiao, Zhaopeng Tu, Jiarui Li, Wenxuan Wang, Jen-tse Huang, Shuming Shi

This paper describes Tencent's multilingual machine translation systems for the WMT22 shared task on Large-Scale Machine Translation Evaluation for African Languages.

Data Augmentation Machine Translation +1

QSAN: A Near-term Achievable Quantum Self-Attention Network

no code implementations 14 Jul 2022 Jinjing Shi, Ren-xin Zhao, Wenxuan Wang, Shichao Zhang, Xuelong Li

Self-Attention Mechanism (SAM) is good at capturing the internal connections of features and greatly improves the performance of machine learning models, especially requiring efficient characterization and feature extraction of high-dimensional data.

Binary Classification Image Classification +3

Positive-Negative Equal Contrastive Loss for Semantic Segmentation

no code implementations 4 Jul 2022 Jing Wang, Jiangyun Li, Wei Li, Lingfei Xuan, Tianxiang Zhang, Wenxuan Wang

The contextual information is critical for various computer vision tasks; previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.

Contrastive Learning Semantic Segmentation

Med-DANet: Dynamic Architecture Network for Efficient Medical Volumetric Segmentation

no code implementations 14 Jun 2022 Wenxuan Wang, Chen Chen, Jing Wang, Sen Zha, Yan Zhang, Jiangyun Li

For 3D medical image (e.g., CT and MRI) segmentation, the difficulty of segmenting each slice in a clinical case varies greatly.

Brain Tumor Segmentation Image Segmentation +5

Understanding and Mitigating the Uncertainty in Zero-Shot Translation

no code implementations 20 May 2022 Wenxuan Wang, Wenxiang Jiao, Shuo Wang, Zhaopeng Tu, Michael R. Lyu

Zero-shot translation is a promising direction for building a comprehensive multilingual neural machine translation (MNMT) system.

Machine Translation Translation

AEON: A Method for Automatic Evaluation of NLP Test Cases

1 code implementation 13 May 2022 Jen-tse Huang, Jianping Zhang, Wenxuan Wang, Pinjia He, Yuxin Su, Michael R. Lyu

However, in practice, many of the generated test cases fail to preserve similar semantic meaning and are unnatural (e.g., grammar errors), which leads to a high false alarm rate and unnatural test cases.

Semantic Similarity Semantic Textual Similarity +1

DST: Dynamic Substitute Training for Data-free Black-box Attack

no code implementations CVPR 2022 Wenxuan Wang, Xuelin Qian, Yanwei Fu, xiangyang xue

With the wide applications of deep neural network models in various computer vision tasks, more and more works study the model vulnerability to adversarial examples.

Knowledge Distillation

Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation

no code implementations ACL 2022 Wenxuan Wang, Wenxiang Jiao, Yongchang Hao, Xing Wang, Shuming Shi, Zhaopeng Tu, Michael Lyu

In this paper, we present a substantial step in better understanding the SOTA sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT).

Decoder Machine Translation +2

TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images

1 code implementation 30 Jan 2022 Jiangyun Li, Wenxuan Wang, Chen Chen, Tianxiang Zhang, Sen Zha, Jing Wang, Hong Yu

Different from TransBTS, the proposed TransBTSV2 is not limited to brain tumor segmentation (BTS) but focuses on general medical image segmentation, providing a stronger and more efficient 3D baseline for volumetric segmentation of medical images.

Brain Tumor Segmentation Image Segmentation +3

BBS-KWS: The Mandarin Keyword Spotting System Won the Video Keyword Wakeup Challenge

no code implementations 3 Dec 2021 Yuting Yang, Binbin Du, Yingxin Zhang, Wenxuan Wang, Yuke Li

We propose a Mandarin keyword spotting system (KWS) with several novel and effective improvements, including a big backbone (B) model, a keyword biasing (B) mechanism and the introduction of syllable modeling units (S).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Optimal Variable Speed Limit Control Strategy on Freeway Segments under Fog Conditions

1 code implementation 30 Jul 2021 Ben Zhai, Yanli Wang, Wenxuan Wang, Bing Wu

This study developed an optimal variable speed limit (VSL) control strategy under fog conditions, with full consideration of the factors that affect traffic safety risks.

Language Models are Good Translators

no code implementations 25 Jun 2021 Shuo Wang, Zhaopeng Tu, Zhixing Tan, Wenxuan Wang, Maosong Sun, Yang Liu

Inspired by the recent progress of large-scale pre-trained language models on machine translation in a limited scenario, we first demonstrate that a single language model (LM4MT) can achieve performance comparable to strong encoder-decoder NMT models on standard machine translation benchmarks, using the same training data and a similar number of model parameters.

Decoder Language Modeling +4

Adv-Makeup: A New Imperceptible and Transferable Attack on Face Recognition

1 code implementation 7 May 2021 Bangjie Yin, Wenxuan Wang, Taiping Yao, Junfeng Guo, Zelun Kong, Shouhong Ding, Jilin Li, Cong Liu

Deep neural networks, particularly face recognition models, have been shown to be vulnerable to both digital and physical adversarial examples.

Adversarial Attack Face Generation +2

Delving into Data: Effectively Substitute Training for Black-box Attack

no code implementations CVPR 2021 Wenxuan Wang, Bangjie Yin, Taiping Yao, Li Zhang, Yanwei Fu, Shouhong Ding, Jilin Li, Feiyue Huang, Xiangyang Xue

Previous substitute training approaches focus on stealing the knowledge of the target model based on real training data or synthetic data, without exploring what kind of data can further improve the transferability between the substitute and target models.

Adversarial Attack

TransBTS: Multimodal Brain Tumor Segmentation Using Transformer

2 code implementations 7 Mar 2021 Wenxuan Wang, Chen Chen, Meng Ding, Jiangyun Li, Hong Yu, Sen Zha

To capture the local 3D context information, the encoder first utilizes a 3D CNN to extract the volumetric spatial feature maps.

Ranked #1 on Brain Tumor Segmentation on BRATS 2019 (Dice Score metric)

Brain Tumor Segmentation Decoder +4

Recurrent Model Predictive Control

no code implementations 23 Feb 2021 Zhengyu Liu, Jingliang Duan, Wenxuan Wang, Shengbo Eben Li, Yuming Yin, Ziyu Lin, Qi Sun, Bo Cheng

This paper proposes an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems.

model Model Predictive Control

Recurrent Model Predictive Control: Learning an Explicit Recurrent Controller for Nonlinear Systems

no code implementations 20 Feb 2021 Zhengyu Liu, Jingliang Duan, Wenxuan Wang, Shengbo Eben Li, Yuming Yin, Ziyu Lin, Bo Cheng

This paper proposes an offline control algorithm, called Recurrent Model Predictive Control (RMPC), to solve large-scale nonlinear finite-horizon optimal control problems.

Model Predictive Control

Rethinking the Value of Transformer Components

no code implementations COLING 2020 Wenxuan Wang, Zhaopeng Tu

The Transformer has become the state-of-the-art translation model, yet how each intermediate component contributes to model performance is not well studied, which poses significant challenges for designing optimal architectures.

Translation

Ultrasound Liver Fibrosis Diagnosis using Multi-indicator guided Deep Neural Networks

no code implementations 10 Sep 2020 Jiali Liu, Wenxuan Wang, Tianyao Guan, Ningbo Zhao, Xiaoguang Han, Zhen Li

An indicator-guided learning mechanism is further proposed to ease the training of the proposed model.

A New Screening Method for COVID-19 based on Ocular Feature Recognition by Machine Learning Tools

no code implementations 4 Sep 2020 Yanwei Fu, Feng Li, Wenxuan Wang, Haicheng Tang, Xuelin Qian, Mengwei Gu, Xiangyang Xue

After more than four months of study, we found that confirmed cases of COVID-19 present consistent ocular pathological signs, and we propose a new screening method that analyzes eye-region images captured by common CCD and CMOS cameras and could reliably provide rapid risk screening for COVID-19 with very high accuracy.

BIG-bench Machine Learning Ethics +2

FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification

no code implementations CVPR 2020 Wenxuan Wang, Yanwei Fu, Xuelin Qian, Yu-Gang Jiang, Qi Tian, Xiangyang Xue

Learning a makeup-invariant face verification model is challenging due to (1) insufficient makeup/non-makeup face training pairs, (2) the lack of diverse makeup faces, and (3) the significant appearance changes caused by cosmetics.

Face Recognition Face Verification

Long-Term Cloth-Changing Person Re-identification

no code implementations 26 May 2020 Xuelin Qian, Wenxuan Wang, Li Zhang, Fangrui Zhu, Yanwei Fu, Tao Xiang, Yu-Gang Jiang, Xiangyang Xue

Specifically, we consider that under cloth-changes, soft-biometrics such as body shape would be more reliable.

Cloth-Changing Person Re-Identification

Signal-to-Noise Ratio of Microwave Photonic Filter With an Interferometric Structure Based on an Incoherent Broadband Optical Source

no code implementations 10 May 2020 Long Huang, Ruoming Li, Peng Xiang, Pan Dai, Wenxuan Wang, Mi Li, Xiangfei Chen, Yuechun Shi

Theoretical analysis shows that the SNR is a function of the center frequency of the passband, the modulation index, the chromatic dispersion, and the shape of the IBOS.

Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition

no code implementations 17 Jan 2020 Wenxuan Wang, Yanwei Fu, Qiang Sun, Tao Chen, Chenjie Cao, Ziqi Zheng, Guoqiang Xu, Han Qiu, Yu-Gang Jiang, Xiangyang Xue

Because uneven data distribution and a lack of samples are common in real-world scenarios, we further use our F2ED to evaluate several few-shot expression learning tasks, which require recognizing facial expressions given only a few training instances.

Facial Expression Recognition Facial Expression Recognition (FER) +1

DeepEnFM: Deep neural networks with Encoder enhanced Factorization Machine

no code implementations 25 Sep 2019 Qiang Sun, Zhinan Cheng, Yanwei Fu, Wenxuan Wang, Yu-Gang Jiang, Xiangyang Xue

Instead of learning the cross features directly, DeepEnFM adopts the Transformer encoder as a backbone to align the feature embeddings with the clues of other fields.

Click-Through Rate Prediction

FPETS: Fully Parallel End-to-End Text-to-Speech System

2 code implementations 12 Dec 2018 Dabiao Ma, Zhiba Su, Wenxuan Wang, Yuhao Lu

End-to-end text-to-speech (TTS) systems can greatly improve the quality of synthesised speech.

Text to Speech

Pose-Normalized Image Generation for Person Re-identification

2 code implementations ECCV 2018 Xuelin Qian, Yanwei Fu, Tao Xiang, Wenxuan Wang, Jie Qiu, Yang Wu, Yu-Gang Jiang, Xiangyang Xue

Person Re-identification (re-id) faces two major challenges: the lack of cross-view paired training data and learning discriminative identity-sensitive and view-invariant features in the presence of large pose variations.

Generative Adversarial Network Image Generation +2
