Search Results for author: Bin Zhu

Found 45 papers, 16 papers with code

AURORA: Navigating UI Tarpits via Automated Neural Screen Understanding

no code implementations • 1 Apr 2024 • Safwat Ali Khan, Wenyu Wang, Yiran Ren, Bin Zhu, Jiangfan Shi, Alyssa McGowan, Wing Lam, Kevin Moran

We evaluated AURORA both on a set of 12 apps with known tarpits from prior work, and on a new set of five of the most popular apps from the Google Play store.

Navigate

Paper
Add Code

From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios

no code implementations • 12 Mar 2024 • Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang

These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain.

Food Recognition

Paper
Add Code

LLMBind: A Unified Modality-Task Integration Framework

no code implementations • 22 Feb 2024 • Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan

In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.

Audio Generation Image Segmentation +3

Paper
Add Code

Video Editing for Video Retrieval

no code implementations • 4 Feb 2024 • Bin Zhu, Kevin Flanagan, Adriano Fragomeni, Michael Wray, Dima Damen

The teacher model is employed to edit the clips in the training set whereas the student model trains on the edited clips.

Retrieval Text Retrieval +2

Paper
Add Code

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Munan Ning, Li Yuan

In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs.

Ranked #52 on Visual Question Answering on MM-Vet

Hallucination Visual Question Answering

2,344

Paper
Code

FoodLMM: A Versatile Food Assistant using Large Multi-modal Model

no code implementations • 22 Dec 2023 • Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo

In the second stage, we construct a multi-round conversation dataset and a reasoning segmentation dataset to fine-tune the model, enabling it to conduct professional dialogues and generate segmentation masks based on complex reasoning in the food domain.

Food Recognition Multi-Task Learning +3

Paper
Add Code

Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models

1 code implementation • 21 Dec 2023 • Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu

Based on the evaluation, our work makes a key analysis of the underlying reason for the success of the attack, namely the inability of LLMs to distinguish between instructions and external content and the absence of LLMs' awareness to not execute instructions within external content.

Benchmarking

Paper
Code

CAR: Consolidation, Augmentation and Regulation for Recipe Retrieval

no code implementations • 8 Dec 2023 • Fangzhou Song, Bin Zhu, Yanbin Hao, Shuo Wang, Xiangnan He

Learning recipe and food image representation in common embedding space is non-trivial but crucial for cross-modal recipe retrieval.

Retrieval

Paper
Add Code

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

1 code implementation • 27 Nov 2023 • Munan Ning, Bin Zhu, Yujia Xie, Bin Lin, Jiaxi Cui, Lu Yuan, Dongdong Chen, Li Yuan

Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.

Decision Making Question Answering

Paper
Code

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

4 code implementations • 16 Nov 2023 • Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan

In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.

Ranked #2 on Zero-Shot Video Question Answer on TGIF-QA

Language Modelling Large Language Model +2

2,344

Paper
Code

X-Transfer: A Transfer Learning-Based Framework for GAN-Generated Fake Image Detection

no code implementations • 7 Oct 2023 • Lei Zhang, Hao Chen, Shu Hu, Bin Zhu, Ching Sheng Lin, Xi Wu, Jinrong Hu, Xin Wang

Generative adversarial networks (GANs) have remarkably advanced in diverse domains, especially image generation and editing.

Fake Image Detection Image Generation +1

Paper
Add Code

LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

4 code implementations • 3 Oct 2023 • Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, Hongfa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Wancai Zhang, Zhifeng Li, Wei Liu, Li Yuan

We thus propose VIDAL-10M with Video, Infrared, Depth, Audio and their corresponding Language, naming as VIDAL-10M.

Ranked #1 on Zero-shot Audio Classification on VGG-Sound (using extra training data)

Audio Classification Contrastive Learning +11

2,344

Paper
Code

Improving Cross-dataset Deepfake Detection with Deep Information Decomposition

no code implementations • 30 Sep 2023 • Shanmin Yang, Shu Hu, Bin Zhu, Ying Fu, Siwei Lyu, Xi Wu, Xin Wang

Deepfake technology poses a significant threat to security and social trust.

DeepFake Detection Face Swapping

Paper
Add Code

Controlling Neural Style Transfer with Deep Reinforcement Learning

no code implementations • 30 Sep 2023 • Chengming Feng, Jing Hu, Xin Wang, Shu Hu, Bin Zhu, Xi Wu, Hongtu Zhu, Siwei Lyu

Controlling the degree of stylization in the Neural Style Transfer (NST) is a little tricky since it usually needs hand-engineering on hyper-parameters.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Image-to-Image Translation with Deep Reinforcement Learning

1 code implementation • 24 Sep 2023 • Xin Wang, Ziwei Luo, Jing Hu, Chengming Feng, Shu Hu, Bin Zhu, Xi Wu, Xin Li, Siwei Lyu

The key feature in the RL-I2IT framework is to decompose a monolithic learning process into small steps with a lightweight model to progressively transform a source image successively to a target image.

Auxiliary Learning Decision Making +3

Paper
Code

CgT-GAN: CLIP-guided Text GAN for Image Captioning

1 code implementation • 23 Aug 2023 • Jiarui Yu, Haoran Li, Yanbin Hao, Bin Zhu, Tong Xu, Xiangnan He

Particularly, we use adversarial training to teach CgT-GAN to mimic the phrases of an external text corpus and CLIP-based reward to provide semantic guidance.

Image Captioning

Paper
Code

MKL-$L_{0/1}$-SVM

no code implementations • 23 Aug 2023 • Bin Zhu, Yijie Shi

This paper presents a Multiple Kernel Learning (abbreviated as MKL) framework for the Support Vector Machine (SVM) with the $(0, 1)$ loss function.

Paper
Add Code

Towards Attack-tolerant Federated Learning via Critical Parameter Analysis

1 code implementation • ICCV 2023 • Sungwon Han, Sungwon Park, Fangzhao Wu, Sundong Kim, Bin Zhu, Xing Xie, Meeyoung Cha

Federated learning is used to train a shared model in a decentralized way without clients sharing private data with each other.

Federated Learning

Paper
Code

FedDefender: Client-Side Attack-Tolerant Federated Learning

1 code implementation • 18 Jul 2023 • Sungwon Park, Sungwon Han, Fangzhao Wu, Sundong Kim, Bin Zhu, Xing Xie, Meeyoung Cha

Evaluations of real-world scenarios across multiple datasets show that the proposed method enhances the robustness of federated learning against model poisoning attacks.

Federated Learning Knowledge Distillation +1

Paper
Code

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

1 code implementation • 17 May 2023 • Wenjun Peng, Jingwei Yi, Fangzhao Wu, Shangxi Wu, Bin Zhu, Lingjuan Lyu, Binxing Jiao, Tong Xu, Guangzhong Sun, Xing Xie

Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers.

Model extraction

Paper
Code

Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation

no code implementations • 19 Apr 2023 • Hao Chen, Peng Zheng, Xin Wang, Shu Hu, Bin Zhu, Jinrong Hu, Xi Wu, Siwei Lyu

As growing usage of social media websites in the recent decades, the amount of news articles spreading online rapidly, resulting in an unprecedented scale of potentially fraudulent information.

Contrastive Learning Misinformation +1

Paper
Add Code

An ADMM Solver for the MKL-$L_{0/1}$-SVM

no code implementations • 8 Mar 2023 • Yijie Shi, Bin Zhu

We formulate the Multiple Kernel Learning (abbreviated as MKL) problem for the support vector machine with the infamous $(0, 1)$-loss function.

Paper
Add Code

Attacking Important Pixels for Anchor-free Detectors

no code implementations • 26 Jan 2023 • Yunxu Xie, Shu Hu, Xin Wang, Quanyu Liao, Bin Zhu, Xi Wu, Siwei Lyu

Existing adversarial attacks on object detection focus on attacking anchor-based detectors, which may not work well for anchor-free detectors.

Adversarial Attack object-detection +2

Paper
Add Code

On the Statistical Consistency of a Generalized Cepstral Estimator

no code implementations • 17 Jan 2023 • Bin Zhu, Mattia Zorzi

We consider the problem to estimate the generalized cepstral coefficients of a stationary stochastic process or stationary multidimensional random field.

Paper
Add Code

RMBench: Benchmarking Deep Reinforcement Learning for Robotic Manipulator Control

1 code implementation • 20 Oct 2022 • Yanfei Xiang, Xin Wang, Shu Hu, Bin Zhu, Xiaomeng Huang, Xi Wu, Siwei Lyu

Reinforcement learning is applied to solve actual complex tasks from high-dimensional, sensory inputs.

Benchmarking Data Augmentation +2

Paper
Code

Text-driven Video Prediction

no code implementations • 6 Oct 2022 • Xue Song, Jingjing Chen, Bin Zhu, Yu-Gang Jiang

Specifically, appearance and motion components are provided by the image and caption separately.

Causal Inference Video Generation +1

Paper
Add Code

EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

3 code implementations • 26 Sep 2022 • Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen

VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets.

Object Segmentation +4

Paper
Code

Robust Quantity-Aware Aggregation for Federated Learning

no code implementations • 22 May 2022 • Jingwei Yi, Fangzhao Wu, Huishuai Zhang, Bin Zhu, Tao Qi, Guangzhong Sun, Xing Xie

Federated learning (FL) enables multiple clients to collaboratively train models without sharing their local data, and becomes an important privacy-preserving machine learning framework.

Federated Learning Privacy Preserving

Paper
Add Code

Cross-lingual Adaptation for Recipe Retrieval with Mixup

no code implementations • 8 May 2022 • Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Wing-Kwong Chan

To bridge the domain gap, recipe mixup loss is proposed to enforce the intermediate domain to locate in the shortest geodesic path between source and target domains in the recipe embedding space.

Retrieval Unsupervised Domain Adaptation

Paper
Add Code

Improving robustness of language models from a geometry-aware perspective

no code implementations • Findings (ACL) 2022 • Bin Zhu, Zhaoquan Gu, Le Wang, Jinyin Chen, Qi Xuan

On top of FADA, we propose geometry-aware adversarial training (GAT) to perform adversarial training on friendly adversarial data so that we can save a large number of search steps.

Data Augmentation

Paper
Add Code

OneLabeler: A Flexible System for Building Data Labeling Tools

1 code implementation • 27 Mar 2022 • Yu Zhang, Yun Wang, Haidong Zhang, Bin Zhu, Siming Chen, Dongmei Zhang

In this paper, we propose a conceptual framework for data labeling and OneLabeler based on the conceptual framework to support easy building of labeling tools for diverse usage scenarios.

Paper
Code

UA-FedRec: Untargeted Attack on Federated News Recommendation

1 code implementation • 14 Feb 2022 • Jingwei Yi, Fangzhao Wu, Bin Zhu, Jing Yao, Zhulin Tao, Guangzhong Sun, Xing Xie

Our study reveals a critical security issue in existing federated news recommendation systems and calls for research efforts to address the issue.

Federated Learning News Recommendation +2

Paper
Code

TREATED:Towards Universal Defense against Textual Adversarial Attacks

no code implementations • 13 Sep 2021 • Bin Zhu, Zhaoquan Gu, Le Wang, Zhihong Tian

Recent work shows that deep neural networks are vulnerable to adversarial examples.

Adversarial Defense

Paper
Add Code

Transferable Adversarial Examples for Anchor Free Object Detection

no code implementations • 3 Jun 2021 • Quanyu Liao, Xin Wang, Bin Kong, Siwei Lyu, Bin Zhu, Youbing Yin, Qi Song, Xi Wu

Deep neural networks have been demonstrated to be vulnerable to adversarial attacks: subtle perturbation can completely change prediction result.

Adversarial Attack Object +2

Paper
Add Code

Imperceptible Adversarial Examples for Fake Image Detection

no code implementations • 3 Jun 2021 • Quanyu Liao, Yuezun Li, Xin Wang, Bin Kong, Bin Zhu, Siwei Lyu, Youbing Yin, Qi Song, Xi Wu

Fooling people with highly realistic fake images generated with Deepfake or GANs brings a great social disturbance to our society.

Face Swapping Fake Image Detection

Paper
Add Code

Pyramid Fusion Dark Channel Prior for Single Image Dehazing

no code implementations • 21 May 2021 • Qiyuan Liang, Bin Zhu, Chong-Wah Ngo

In this paper, we propose the pyramid fusion dark channel prior (PF-DCP) for single image dehazing.

Image Dehazing Single Image Dehazing

Paper
Add Code

An Optimized H.266/VVC Software Decoder On Mobile Platform

no code implementations • 5 Mar 2021 • Yiming Li, Shan Liu, Yu Chen, Yushan Zheng, Sijia Chen, Bin Zhu, Jian Lou

As the successor of H. 265/HEVC, the new versatile video coding standard (H. 266/VVC) can provide up to 50% bitrate saving with the same subjective quality, at the cost of increased decoding complexity.

Paper
Add Code

New Strong Bounds on sub-GeV Dark Matter from Boosted and Migdal Effects

no code implementations • 17 Dec 2020 • Victor V. Flambaum, Liangliang Su, Lei Wu, Bin Zhu

Due to the low nuclear recoils, sub-GeV dark matter (DM) is usually beyond the sensitivity of the conventional DM direct detection experiments.

High Energy Physics - Phenomenology Cosmology and Nongalactic Astrophysics

Paper
Add Code

Line Spectrum Representation for Vector Processes With Application to Frequency Estimation

no code implementations • 24 Jun 2020 • Bin Zhu

A positive semidefinite Toeplitz matrix, which often arises as the finite covariance matrix of a stationary random process, can be decomposed as the sum of a nonnegative multiple of the identity corresponding to a white noise, and a singular term corresponding to a purely deterministic process.

Time Series Analysis

Paper
Add Code

CookGAN: Causality Based Text-to-Image Synthesis

no code implementations • CVPR 2020 • Bin Zhu, Chong-Wah Ngo

Particularly, a cooking simulator sub-network is proposed to incrementally make changes to food images based on the interaction between ingredients and cooking methods over a series of steps.

Image Generation

Paper
Add Code

CPM R-CNN: Calibrating Point-guided Misalignment in Object Detection

1 code implementation • 7 Mar 2020 • Bin Zhu, Qing Song, Lu Yang, Zhihui Wang, Chun Liu, Mengjie Hu

In object detection, offset-guided and point-guided regression dominate anchor-based and anchor-free method separately.

object-detection Object Detection

Paper
Code

An Empirical Bayes Approach to Frequency Estimation

no code implementations • 21 Oct 2019 • Giorgio Picci, Bin Zhu

In this paper we show that the classical problem of frequency estimation can be formulated and solved efficiently in an empirical Bayesian framework by assigning a uniform a priori probability distribution to the unknown frequency.

Paper
Add Code

Automatic Group Cohesiveness Detection With Multi-modal Features

no code implementations • 2 Oct 2019 • Bin Zhu, Xin Guo, Kenneth Barner, Charles Boncelet

The task is to predict the cohesive level for a group of people in images.

Emotion Recognition regression

Paper
Add Code

Graph Neural Networks for Image Understanding Based on Multiple Cues: Group Emotion Recognition and Event Recognition as Use Cases

1 code implementation • 19 Sep 2019 • Xin Guo, Luisa F. Polania, Bin Zhu, Charles Boncelet, Kenneth E. Barner

A graph neural network (GNN) for image understanding based on multiple cues is proposed in this paper.

Emotion Recognition

Paper
Code

R2GAN: Cross-Modal Recipe Retrieval With Generative Adversarial Network

no code implementations • CVPR 2019 • Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Yanbin Hao

Representing procedure text such as recipe for crossmodal retrieval is inherently a difficult problem, not mentioning to generate image from recipe for visualization.

Generative Adversarial Network Retrieval

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.