Search Results for author: Jian Wang

Found 289 papers, 90 papers with code

A Vietnamese-English end-to-end speech translation method based on multi-feature fusion

no code implementations CCL 2022 Houli Ma, Ling Dong, Wenjun Wang, Jian Wang, Shengxiang Gao, Zhengtao Yu

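“The encoder of a speech translation model must encode both the acoustic and the semantic information in speech, and a single Fbank or Wav2vec2 speech feature has limited representational power. By analyzing the differences between hand-crafted Fbank features and self-supervised Wav2vec2 features, this paper proposes an acoustic feature fusion method based on a cross-attention mechanism, and explores different self-supervised features and fusion strategies to strengthen the model's learning of acoustic and semantic information in speech. Taking the characteristics of Vietnamese speech into account, Fbank features serve as the primary features and Pitch features as auxiliary features that are jointly encoded into the Fbank representation, yielding a multi-feature-fusion Vietnamese-English speech translation model. Experiments show that the multi-feature speech translation model outperforms single-feature translation and is more effective than simple feature concatenation; the proposed multi-feature fusion method improves the Vietnamese-English speech translation task by 1.97 BLEU.”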
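
The cross-attention fusion described in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' model: it assumes Fbank frames serve as queries and Wav2vec2 frames as both keys and values, omits the learned query/key/value projections and multiple heads a real model would use, assumes both feature streams already share the same dimension, and adds the attended context back to the Fbank frame as a residual. `cross_attention_fuse` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(fbank: Matrix, ssl: Matrix) -> Matrix:
    """Hypothetical fusion: each Fbank frame (query) attends over all
    self-supervised frames (used as both keys and values), and the
    attended context is added to the query frame. Both inputs are
    assumed to share the same feature size d."""
    d = len(fbank[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for q in fbank:
        # Scaled dot-product scores between this query and every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in ssl]
        weights = softmax(scores)
        # Weighted sum of the value vectors (the same SSL frames).
        context = [sum(w * v[j] for w, v in zip(weights, ssl)) for j in range(d)]
        # Residual connection keeps the Fbank information in the output.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused
```

For example, `cross_attention_fuse([[1.0, 2.0]], [[3.0, 4.0]])` returns `[[4.0, 6.0]]`: with a single key the attention weight is 1, so the output is the query plus that value. Keeping Fbank on the query side mirrors the abstract's choice of Fbank as the primary feature, but the direction of attention is an assumption here.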

RealMedDial: A Real Telemedical Dialogue Dataset Collected from Online Chinese Short-Video Clips

no code implementations COLING 2022 Bo Xu, Hongtong Zhang, Jian Wang, Xiaokun Zhang, Dezhi Hao, Linlin Zong, Hongfei Lin, Fenglong Ma

We collected and annotated a wide range of meta-data with respect to medical dialogue including doctor profiles, hospital departments, diseases and symptoms for fine-grained analysis on language usage pattern and clinical diagnosis.

Response Generation

Automated Quality Evaluation of Cervical Cytopathology Whole Slide Images Based on Content Analysis

no code implementations 20 May 2025 Lanlan Kang, Jian Wang, Jian Qin, Yiqin Liang, Yongjun He

The ThinPrep Cytologic Test (TCT) is the most widely used method for cervical cancer screening, and the sample quality directly impacts the accuracy of the diagnosis.

FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units

no code implementations 13 May 2025 Jian Wang, Baoyuan Wu, Li Liu, Qingshan Liu

The rapid evolution of generative AI has increased the threat of realistic audio-visual deepfakes, demanding robust detection methods.

DeepFake Detection Face Swapping

Agent-as-a-Service based on Agent Network

no code implementations 13 May 2025 Yuhan Zhu, Haojie Liu, Jian Wang, Bing Li, Zikang Yin, Yefei Liao

AaaS-AN unifies the entire agent lifecycle, including construction, integration, interoperability, and networked collaboration, through two core components: (1) a dynamic Agent Network, which models agents and agent groups as vertexes that self-organize within the network based on task and role dependencies; (2) service-oriented agents, incorporating service discovery, registration, and interoperability protocols.

Code Generation Mathematical Reasoning

Seed1.5-VL Technical Report

no code implementations 11 May 2025 Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, PengFei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, Weiwei Liu, Wenqian Wang, Xianhan Zeng, Xiao Liu, Xiaobo Qin, Xiaohan Ding, Xiaojun Xiao, Xiaoying Zhang, Xuanwei Zhang, Xuehan Xiong, Yanghua Peng, Yangrui Chen, Yanwei Li, Yanxu Hu, Yi Lin, Yiyuan Hu, Yiyuan Zhang, Youbin Wu, Yu Li, Yudong Liu, Yue Ling, Yujia Qin, Zanbo Wang, Zhiwu He, Aoxue Zhang, Bairen Yi, Bencheng Liao, Can Huang, Can Zhang, Chaorui Deng, Chaoyi Deng, Cheng Lin, Cheng Yuan, Chenggang Li, Chenhui Gou, Chenwei Lou, Chengzhi Wei, Chundian Liu, Chunyuan Li, Deyao Zhu, Donghong Zhong, Feng Li, Feng Zhang, Gang Wu, Guodong Li, Guohong Xiao, Haibin Lin, Haihua Yang, Haoming Wang, Heng Ji, Hongxiang Hao, Hui Shen, Huixia Li, Jiahao Li, Jialong Wu, Jianhua Zhu, Jianpeng Jiao, Jiashi Feng, Jiaze Chen, Jianhui Duan, Jihao Liu, Jin Zeng, Jingqun Tang, Jingyu Sun, Joya Chen, Jun Long, Junda Feng, Junfeng Zhan, Junjie Fang, Junting Lu, Kai Hua, Kai Liu, Kai Shen, Kaiyuan Zhang, Ke Shen, Ke Wang, Keyu Pan, Kun Zhang, Kunchang Li, Lanxin Li, Lei LI, Lei Shi, Li Han, Liang Xiang, Liangqiang Chen, Lin Chen, Lin Li, Lin Yan, Liying Chi, Longxiang Liu, Mengfei Du, Mingxuan Wang, Ningxin Pan, Peibin Chen, Pengfei Chen, Pengfei Wu, Qingqing Yuan, Qingyao Shuai, Qiuyan Tao, Renjie Zheng, Renrui Zhang, Ru Zhang, Rui Wang, Rui Yang, Rui Zhao, Shaoqiang Xu, Shihao Liang, Shipeng Yan, Shu Zhong, Shuaishuai Cao, Shuangzhi Wu, Shufan Liu, Shuhan Chang, Songhua Cai, Tenglong Ao, Tianhao Yang, Tingting Zhang, Wanjun Zhong, Wei Jia, Wei Weng, Weihao Yu, Wenhao Huang, Wenjia Zhu, Wenli Yang, Wenzhi Wang, Xiang Long, XiangRui Yin, Xiao Li, Xiaolei Zhu, Xiaoying Jia, Xijin Zhang, Xin Liu, Xinchen Zhang, Xinyu Yang, Xiongcai Luo, Xiuli Chen, Xuantong Zhong, Xuefeng Xiao, Xujing Li, Yan Wu, Yawei Wen, Yifan Du, Yihao Zhang, Yining Ye, Yonghui Wu, Yu Liu, Yu Yue, Yufeng Zhou, Yufeng Yuan, Yuhang Xu, Yuhong Yang, Yun Zhang, Yunhao Fang, Yuntao Li, Yurui Ren, Yuwen Xiong, Zehua Hong, Zehua Wang, Zewei Sun, Zeyu Wang, Zhao Cai, Zhaoyue Zha, Zhecheng An, Zhehui Zhao, Zhengzhuo Xu, Zhipeng Chen, Zhiyong Wu, Zhuofan Zheng, ZiHao Wang, Zilong Huang, Ziyu Zhu, Zuquan Song

We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning.

Mixture-of-Experts Multimodal Reasoning +1

DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor

no code implementations 6 May 2025 Wei-Ting Chen, Yu-Jiet Vong, Yi-Tsung Lee, Sy-Yen Kuo, Qiang Gao, Sizhuo Ma, Jian Wang

To address this limitation, we introduce a novel VQA framework, DiffVQA, which harnesses the robust generalization capabilities of diffusion models pre-trained on extensive datasets.

Mamba Video Quality Assessment +1

Explainable Machine Learning for Cyberattack Identification from Traffic Flows

1 code implementation 2 May 2025 Yujing Zhou, Marc L. Jacquet, Robel Dawit, Skyler Fabre, Dev Sarawat, Faheem Khan, Madison Newell, Yongxin Liu, Dahai Liu, Hongyun Chen, Jian Wang, Huihui Wang

The increasing automation of traffic management systems has made them prime targets for cyberattacks, disrupting urban mobility and public safety.

Anomaly Detection Management

Machine Learning for Cyber-Attack Identification from Traffic Flows

1 code implementation 2 May 2025 Yujing Zhou, Marc L. Jacquet, Robel Dawit, Skyler Fabre, Dev Sarawat, Faheem Khan, Madison Newell, Yongxin Liu, Dahai Liu, Hongyun Chen, Jian Wang, Huihui Wang

This paper presents our simulation of cyber-attacks and detection strategies on the traffic control system in Daytona Beach, FL.

GuideSR: Rethinking Guidance for One-Step High-Fidelity Diffusion-Based Super-Resolution

no code implementations 1 May 2025 Aditya Arora, Zhengzhong Tu, YuFei Wang, Ruizheng Bai, Jian Wang, Sizhuo Ma

In this paper, we propose GuideSR, a novel single-step diffusion-based image super-resolution (SR) model specifically designed to enhance image fidelity.

Image Restoration Image Super-Resolution +1

POLYRAG: Integrating Polyviews into Retrieval-Augmented Generation for Medical Applications

no code implementations 21 Apr 2025 Chunjing Gan, Dan Yang, Binbin Hu, Ziqi Liu, Yue Shen, Zhiqiang Zhang, Jian Wang, Jun Zhou

Large language models (LLMs) have become a disruptive force in the industry, introducing unprecedented capabilities in natural language processing, logical reasoning, and other areas.

Hallucination Logical Reasoning +2

SEGA: Drivable 3D Gaussian Head Avatar from a Single Image

no code implementations 19 Apr 2025 Chen Guo, Zhuo Su, Jian Wang, Shuang Li, Xu Chang, Zhaohu Li, Yang Zhao, Guidong Wang, Ruqi Huang

Creating photorealistic 3D head avatars from limited input has become increasingly important for applications in virtual reality, telepresence, and digital entertainment.

Neural Rendering

Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input

no code implementations 11 Apr 2025 Jian Wang, Rishabh Dabral, Diogo Luvizon, Zhe Cao, Lingjie Liu, Thabo Beeler, Christian Theobalt

First, the IMU sensor inputs, the optional egocentric image, and a text description of human motion are encoded into the latent space of a motion VQ-VAE.

Decoder

Efficient Tuning of Large Language Models for Knowledge-Grounded Dialogue Generation

1 code implementation 10 Apr 2025 Bo Zhang, Hui Ma, Dailin Li, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin

Large language models (LLMs) demonstrate remarkable text comprehension and generation capabilities but often lack the ability to utilize up-to-date or domain-specific knowledge not included in their training data.

Dialogue Generation Reading Comprehension

ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving

no code implementations 4 Apr 2025 Sheng Yang, Tong Zhan, Shichen Qiao, Jicheng Gong, Qing Yang, YanFeng Lu, Jian Wang

In typical traffic scenarios such as those in the VoD (View-of-Delft) dataset, experiments show that, at a reasonable inference speed, ZFusion achieves state-of-the-art mAP (mean average precision) in the region of interest and competitive mAP over the entire area compared with baseline methods, demonstrating performance close to LiDAR and greatly outperforming camera-only methods.

3D Object Detection Autonomous Driving +1

SelfMedHPM: Self Pre-training With Hard Patches Mining Masked Autoencoders For Medical Image Segmentation

no code implementations 3 Apr 2025 Yunhao Lv, Lingyu Chen, Jian Wang, Yangxi Li, Fang Chen

In recent years, deep learning methods such as convolutional neural networks (CNNs) and transformers have made significant progress in CT multi-organ segmentation.

Image Segmentation Medical Image Segmentation +3

GeoRAG: A Question-Answering Approach from a Geographical Perspective

no code implementations 2 Apr 2025 Jian Wang, Zhuo Zhao, Zeng Jie Wang, Bo Da Cheng, Lei Nie, Wen Luo, Zhao Yuan Yu, Ling Wang Yuan

Geographic Question Answering (GeoQA) addresses natural language queries in geographic domains to fulfill complex user demands and improve information retrieval efficiency.

Attribute Geographic Question Answering +6

Training-Free Text-Guided Image Editing with Visual Autoregressive Model

1 code implementation 31 Mar 2025 YuFei Wang, Lanqing Guo, Zhihao LI, Jiaxing Huang, Pichao Wang, Bihan Wen, Jian Wang

Text-guided image editing is an essential task that enables users to modify images through natural language descriptions.

text-guided-image-editing

Style Quantization for Data-Efficient GAN Training

no code implementations 31 Mar 2025 Jian Wang, Xin Lan, Jizhe Zhou, Yuxin Tian, Jiancheng Lv

Instead of direct quantization, we first map the input latent variables into a less entangled "style" space and apply quantization using a learnable codebook.

Navigate Quantization
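
The quantization step mentioned above, snapping a vector to its nearest entry in a codebook, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the codebook is fixed here rather than learned, the mapping into the "style" space is omitted (inputs are assumed to already be style vectors), squared Euclidean distance is assumed as the metric, and `quantize` is a hypothetical name. VQ-VAE-style methods commonly pass gradients through this non-differentiable lookup with a straight-through estimator, which is not shown.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance; assumed metric for the codebook lookup.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(style: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Return the index and value of the codebook entry closest to `style`."""
    best_index = min(range(len(codebook)),
                     key=lambda i: squared_distance(style, codebook[i]))
    return best_index, codebook[best_index]
```

For example, `quantize([0.9, 1.2], [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])` returns `(1, [1.0, 1.0])`, since the second codeword is the closest.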

FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video

no code implementations 29 Mar 2025 Andrea Boscolo Camiletto, Jian Wang, Eduardo Alvarado, Rishabh Dabral, Thabo Beeler, Marc Habermann, Christian Theobalt

Egocentric motion capture with a head-mounted body-facing stereo camera is crucial for VR and AR applications but presents significant challenges such as heavy occlusions and limited annotated real-world data.

Pose Prediction Pose Tracking

SceneMI: Motion In-betweening for Modeling Human-Scene Interactions

no code implementations 20 Mar 2025 Inwoo Hwang, Bing Zhou, Young Min Kim, Jian Wang, Chuan Guo

Modeling human-scene interactions (HSI) is essential for understanding and simulating everyday human behaviors.

Denoising motion in-betweening

Towards Harmless Multimodal Assistants with Blind Preference Optimization

no code implementations 18 Mar 2025 Yongqi Li, Lu Yang, Jian Wang, Runyang You, Wenjie Li, Liqiang Nie

Additionally, applying BPO to the MMSafe-PO dataset greatly reduces the base MLLM's unsafe rate on other safety benchmarks (14.5% on MM-SafetyBench and 82.9% on HarmEval), demonstrating the effectiveness and robustness of both the dataset and the approach.

A Survey on Human Interaction Motion Generation

1 code implementation 17 Mar 2025 Kewei Sui, Anindita Ghosh, Inwoo Hwang, Jian Wang, Chuan Guo

Humans inhabit a world defined by interactions -- with other humans, objects, and environments.

Human Dynamics Motion Generation +1

Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation

no code implementations 14 Mar 2025 Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt

Our experiments show that the new camera configurations with back views provide superior support for 3D pose tracking compared to only frontal placements.

3D Human Pose Estimation 3D Reconstruction +1

KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception

1 code implementation 13 Mar 2025 Yunpeng Qu, Kun Yuan, Qizhi Xie, Ming Sun, Chao Zhou, Jian Wang

Inspired by the Human Visual System (HVS) that links global quality to the local texture of different regions and their visual saliency, we propose a Kaleidoscope Video Quality Assessment (KVQ) framework, which aims to effectively assess both saliency and local texture, thereby facilitating the assessment of global quality.

Video Quality Assessment Visual Question Answering (VQA)

Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning

no code implementations 8 Mar 2025 Yanjun Chen, Yirong Sun, Xinghao Chen, Jian Wang, Xiaoyu Shen, Wenjie Li, Wei zhang

Chain-of-Thought (CoT) reasoning has proven effective in natural language tasks but remains underexplored in multimodal alignment.

Multimodal Reasoning

STeCa: Step-level Trajectory Calibration for LLM Agent Learning

1 code implementation 20 Feb 2025 Hanlin Wang, Jian Wang, Chak Tou Leong, Wenjie Li

To address this, we highlight the importance of timely calibration and the need to automatically construct calibration trajectories for training agents.

Decision Making Language Modeling +2

Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region

no code implementations 19 Feb 2025 Chak Tou Leong, Qingyu Yin, Jian Wang, Wenjie Li

The safety alignment of large language models (LLMs) remains vulnerable, as their initial behavior can be easily jailbroken by even relatively simple attacks.

Decision Making Safety Alignment

Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors

1 code implementation 18 Feb 2025 Jian Wang, Yinpei Dai, Yichi Zhang, Ziqiao Ma, Wenjie Li, Joyce Chai

Intelligent tutoring agents powered by large language models (LLMs) have been increasingly explored to deliver personalized guidance in areas such as language learning and science education.

Code Generation Knowledge Tracing

KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems

no code implementations 11 Feb 2025 Jusheng Zhang, Zimeng Huang, Yijia Fan, Ningyuan Liu, Mingyan Li, Zhuojie Yang, Jiawei Yao, Jian Wang, Keze Wang

As scaling large language models faces prohibitive costs, multi-agent systems emerge as a promising alternative, though challenged by static knowledge assumptions and coordination inefficiencies.

Thompson Sampling

EventEgo3D++: 3D Human Motion Capture from a Head-Mounted Event Camera

1 code implementation 11 Feb 2025 Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Alain Pagani, Didier Stricker, Christian Theobalt, Vladislav Golyanik

To address these limitations, we introduce EventEgo3D++, the first approach that leverages a monocular event camera with a fisheye lens for 3D human motion capture.

Molecular Odor Prediction Based on Multi-Feature Graph Attention Networks

no code implementations 3 Feb 2025 HongXin Xie, Jiande Sun, Yi Shao, Shuai Li, Sujuan Hou, YuLong Sun, Jian Wang

Olfactory perception plays a critical role in both human and organismal interactions, yet understanding of its underlying mechanisms and influencing factors remains insufficient.

Graph Attention

Auto-Prompting SAM for Weakly Supervised Landslide Extraction

no code implementations 23 Jan 2025 Jian Wang, Xiaokang Zhang, Xianping Ma, Weikang Yu, Pedram Ghamisi

These informative prompts are able to identify the extent of landslide areas (box prompts) and denote the centers of landslide objects (point prompts), guiding SAM in landslide segmentation.

Landslide segmentation Object Localization +1

Discrete Curvature Graph Information Bottleneck

no code implementations 28 Dec 2024 Xingcheng Fu, Jian Wang, Yisen Gao, Qingyun Sun, Haonan Yuan, JianXin Li, Xianxian Li

CurvGIB advances the Variational Information Bottleneck (VIB) principle for Ricci curvature optimization to learn the optimal information transport pattern for specific downstream tasks.

InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention

no code implementations 9 Dec 2024 Howard Zhang, Yuval Alaluf, Sizhuo Ma, Achuta Kadambi, Jian Wang, Kfir Aberman

Face image restoration aims to enhance degraded facial images while addressing challenges such as diverse degradation types, real-time processing demands, and, most crucially, the preservation of identity-specific features.

Image Restoration

RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation

no code implementations 29 Nov 2024 Xianfeng Tan, Yuhan Li, Wenxiang Shang, Yubo Wu, Jian Wang, Xuanhong Chen, Yi Zhang, Ran Lin, Bingbing Ni

Standard clothing asset generation involves creating forward-facing flat-lay garment images displayed on a clear background by extracting clothing information from diverse real-world contexts, which presents significant challenges due to highly standardized sampling distributions and precise structural requirements in the generated images.

Contrastive Learning RAG +1

TextMaster: Universal Controllable Text Edit

no code implementations 13 Oct 2024 Aoqiang Wang, Jian Wang, Zhenyu Yan, Wenxiang Shang, Ran Lin, Zhao Zhang

In image editing tasks, high-quality text editing capabilities can significantly reduce human and material resource costs.

Optical Character Recognition (OCR) Style Transfer

Delving Deep into Engagement Prediction of Short Videos

1 code implementation 30 Sep 2024 Dasong Li, Wenjie Li, Baili Lu, Hongsheng Li, Sizhuo Ma, Gurunandan Krishnan, Jian Wang

Understanding and modeling the popularity of User Generated Content (UGC) short videos on social media platforms presents a critical challenge with broad implications for content creators and recommendation systems.

Prediction Recommendation Systems +1

Multimodal Trajectory Prediction for Autonomous Driving on Unstructured Roads using Deep Convolutional Network

no code implementations 27 Sep 2024 Lei LI, Zhifa Chen, Jian Wang, Bin Zhou, Guizhen Yu, Xiaoxuan Chen

Recently, the application of autonomous driving in open-pit mining has garnered increasing attention for achieving safe and efficient mineral transportation.

Autonomous Driving Trajectory Prediction

EgoAvatar: Egocentric View-Driven and Photorealistic Full-body Avatars

no code implementations 22 Sep 2024 Jianchun Chen, Jian Wang, yinda zhang, Rohit Pandey, Thabo Beeler, Marc Habermann, Christian Theobalt

Immersive VR telepresence ideally means being able to interact and communicate with digital avatars that are indistinguishable from and precisely reflect the behaviour of their real counterparts.

UniMo: Universal Motion Correction For Medical Images without Network Retraining

1 code implementation 21 Sep 2024 Jian Wang, Razieh Faghihpirayesh, Danny Joca, Polina Golland, Ali Gholipour

In this paper, we introduce a Universal Motion Correction (UniMo) framework, leveraging deep neural networks to tackle the challenges of motion correction across diverse imaging modalities.

E2CL: Exploration-based Error Correction Learning for Embodied Agents

no code implementations 5 Sep 2024 Hanlin Wang, Chak Tou Leong, Jian Wang, Wenjie Li

Language models are exhibiting increasing capability in knowledge utilization and reasoning.

RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment

no code implementations 22 Aug 2024 Xiaohan Wang, Xiaoyan Yang, Yuqi Zhu, Yue Shen, Jian Wang, Peng Wei, Lei Liang, Jinjie Gu, Huajun Chen, Ningyu Zhang

Large Language Models (LLMs) like GPT-4, MedPaLM-2, and Med-Gemini achieve performance competitive with human experts across various medical benchmarks.

Diagnostic

MS$^3$D: A RG Flow-Based Regularization for GAN Training with Limited Data

no code implementations 20 Aug 2024 Jian Wang, Xin Lan, Yuxin Tian, Jiancheng Lv

Generative adversarial networks (GANs) have made impressive advances in image generation, but they often require large-scale training data to avoid degradation caused by discriminator overfitting.

Image Generation

Multi-periodicity dependency Transformer based on spectrum offset for radio frequency fingerprint identification

no code implementations 14 Aug 2024 Jing Xiao, Wenrui Ding, Zeqi Shao, Duona Zhang, Yanan Ma, Yufeng Wang, Jian Wang

These challenges diminish the capability of RFFI methods in feature representation, complicating the effective identification of device identities.

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

1 code implementation 12 Aug 2024 Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming-Chang Yang, Jiaya Jia

In this paper, we propose ControlNeXt: a powerful and efficient method for controllable image and video generation.

Video Generation

Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation

1 code implementation 6 Aug 2024 Hui Ma, Bo Zhang, Bo Xu, Jian Wang, Hongfei Lin, Xiao Sun

During reinforcement learning training, the proximal policy optimization algorithm is used to fine-tune the policy, enabling the generation of empathetic responses.

Empathetic Response Generation reinforcement-learning +3
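
The snippet above names proximal policy optimization (PPO) as the fine-tuning algorithm. As a generic reference point, and not the authors' training code, the standard PPO clipped surrogate objective can be computed as below. The empathy-level reward and the rest of the training loop are not shown; `advantages` are assumed to be precomputed, log-probabilities are per sample, `eps=0.2` is a commonly used default rather than a value taken from the paper, and `ppo_clip_objective` is a hypothetical name.

```python
import math
from typing import List


def ppo_clip_objective(logp_new: List[float], logp_old: List[float],
                       advantages: List[float], eps: float = 0.2) -> float:
    """Mean clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = pi_new(a|s) / pi_old(a|s). This value is to be maximized."""
    total = 0.0
    for ln, lo, a in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated and the old policy.
        ratio = math.exp(ln - lo)
        # Clip the ratio to the trust interval [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Taking the minimum makes the objective a pessimistic bound.
        total += min(ratio * a, clipped * a)
    return total / len(advantages)
```

In practice, frameworks minimize the negative of this value. Because of the `min`, a large probability ratio cannot inflate the objective when the advantage is positive, while a harmful update with a negative advantage is still penalized in full.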

POA: Pre-training Once for Models of All Sizes

1 code implementation 2 Aug 2024 Yingying Zhang, Xin Guo, Jiangwei Lao, Lei Yu, Lixiang Ru, Jian Wang, Guo Ye, Huimei He, Jingdong Chen, Ming Yang

Once pre-trained, POA allows the extraction of pre-trained models of diverse sizes for downstream tasks.

All Representation Learning

Mitral Regurgitation Recognition based on Unsupervised Out-of-Distribution Detection with Residual Diffusion Amplification

no code implementations 31 Jul 2024 Zhe Liu, Xiliang Zhu, Tong Han, Yuhao Huang, Jian Wang, Lian Liu, Fang Wang, Dong Ni, Zhongshan Gou, Xin Yang

Since MR data is limited and has large intra-class variability, we propose an unsupervised out-of-distribution (OOD) detection method to identify MR rather than building a deep classifier.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

Matting by Generation

no code implementations 30 Jul 2024 Zhixiang Wang, Baiang Li, Jian Wang, Yu-Lun Liu, Jinwei Gu, Yung-Yu Chuang, Shin'ichi Satoh

This paper introduces an innovative approach for image matting that redefines the traditional regression-based task as a generative modeling challenge.

Image Matting

SpaER: Learning Spatio-temporal Equivariant Representations for Fetal Brain Motion Tracking

no code implementations 29 Jul 2024 Jian Wang, Razieh Faghihpirayesh, Polina Golland, Ali Gholipour

In this paper, we introduce SpaER, a pioneering method for fetal motion tracking that leverages equivariant filters and self-attention mechanisms to effectively learn spatio-temporal representations.

Data Augmentation

Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight

no code implementations 22 Jul 2024 Ziyuan Huang, Kaixiang Ji, Biao Gong, Zhiwu Qing, Qinglong Zhang, Kecheng Zheng, Jian Wang, Jingdong Chen, Ming Yang

This paper introduces Chain-of-Sight, a vision-language bridge module that accelerates the pre-training of Multimodal Large Language Models (MLLMs).

Soli-enabled Noncontact Heart Rate Detection for Sleep and Meditation Tracking

no code implementations 8 Jul 2024 Luzhou Xu, Jaime Lien, Haiguang Li, Nicholas Gillian, Rajeev Nongpiur, Jihan Li, Qian Zhang, Jian Cui, David Jorgensen, Adam Bernstein, Lauren Bedal, Eiji Hayashi, Jin Yamanaka, Alex Lee, Jian Wang, D Shin, Ivan Poupyrev, Trausti Thormundsson, Anupam Pathak, Shwetak Patel

This study represents the first application of noncontact HR detection technology to sleep and meditation tracking, offering a promising alternative to wearable devices for HR monitoring during sleep and meditation.

Root Cause Localization for Microservice Systems in Cloud-edge Collaborative Environments

no code implementations 19 Jun 2024 Yuhan Zhu, Jian Wang, Bing Li, Xuxian Tang, Hao Li, Neng Zhang, Yuqi Zhao

Experiments conducted on the dataset collected from the benchmark show that MicroCERCL can accurately localize the root cause of microservice systems in such environments, significantly outperforming state-of-the-art approaches with an increase of at least 24.1% in top-1 accuracy.

Graph Neural Network

SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

1 code implementation 14 Jun 2024 Junwei Luo, Zhen Pang, Yongjun Zhang, Tingzhu Wang, LinLin Wang, Bo Dang, Jiangwei Lao, Jian Wang, Jingdong Chen, Yihua Tan, Yansheng Li

Remote Sensing Large Multi-Modal Models (RSLMMs) are developing rapidly and showcase significant capabilities in remote sensing imagery (RSI) comprehension.

Graph Generation Relation +1

RobustSAM: Segment Anything Robustly on Degraded Images

1 code implementation CVPR 2024 Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhuo Ma, Jian Wang

Segment Anything Model (SAM) has emerged as a transformative approach in image segmentation, acclaimed for its robust zero-shot segmentation capabilities and flexible prompting system.

Deblurring Image Dehazing +6

DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer

1 code implementation CVPR 2024 Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao, Sy-Yen Kuo, Sizhuo Ma, Jian Wang

Generic Face Image Quality Assessment (GFIQA) evaluates the perceptual quality of facial images, which is crucial in improving image restoration algorithms and selecting high-quality face images for downstream tasks.

Face Image Quality Face Image Quality Assessment +3

Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

1 code implementation 13 Jun 2024 Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas.

Image Enhancement

A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions

no code implementations 6 Jun 2024 Lei Liu, Xiaoyan Yang, Junchi Lei, Yue Shen, Jian Wang, Peng Wei, Zhixuan Chu, Zhan Qin, Kui Ren

With the advent of Large Language Models (LLMs), medical artificial intelligence (AI) has experienced substantial technological progress and paradigm shifts, highlighting the potential of LLMs to streamline healthcare delivery and improve patient outcomes.

Fairness

No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks

no code implementations 25 May 2024 Chak Tou Leong, Yi Cheng, Kaishuai Xu, Jian Wang, Hanlin Wang, Wenjie Li

In particular, we analyze the two most representative types of attack approaches: Explicit Harmful Attack (EHA) and Identity-Shifting Attack (ISA).

Safety Alignment

Distilling Implicit Multimodal Knowledge into Large Language Models for Zero-Resource Dialogue Generation

1 code implementation 16 May 2024 Bo Zhang, Hui Ma, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin

Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities.

Dialogue Generation Knowledge Distillation

Hi-Gen: Generative Retrieval For Large-Scale Personalized E-commerce Search

no code implementations 24 Apr 2024 Yanjing Wu, Yinfu Feng, Jian Wang, WenJi Zhou, Yunan Ye, Rong Xiao, Jun Xiao

To overcome these problems, we introduce an efficient Hierarchical encoding-decoding Generative retrieval method (Hi-Gen) for large-scale personalized E-commerce search systems.

Language Modelling Metric Learning +3

EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams

1 code implementation CVPR 2024 Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Christian Theobalt, Vladislav Golyanik

In response to the existing limitations, this paper 1) introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens, and 2) proposes the first approach to it called EventEgo3D (EE3D).

3D Pose Estimation 3D Reconstruction +1

Application of Quantum Tensor Networks for Protein Classification

no code implementations 11 Mar 2024 Debarshi Kundu, Archisman Ghosh, Srinivasan Ekambaram, Jian Wang, Nikolay Dokholyan, Swaroop Ghosh

We show that protein sequences can be thought of as sentences in natural language processing and can be parsed using the existing Quantum Natural Language framework into parameterized quantum circuits with a reasonable number of qubits, which can be trained to solve various protein-related machine-learning problems.

Binary Classification Classification +2

Target-constrained Bidirectional Planning for Generation of Target-oriented Proactive Dialogue

1 code implementation 10 Mar 2024 Jian Wang, Dongding Lin, Wenjie Li

Inspired by decision-making theories in cognitive science, we propose a novel target-constrained bidirectional planning (TRIP) approach, which plans an appropriate dialogue path by looking ahead and looking back.

Dialogue Generation

Thyroid ultrasound diagnosis improvement via multi-view self-supervised learning and two-stage pre-training

no code implementations 18 Feb 2024 Jian Wang, Xin Yang, Xiaohong Jia, Wufeng Xue, Rusi Chen, Yanlin Chen, Xiliang Zhu, Lian Liu, Yan Cao, Jianqiao Zhou, Dong Ni, Ning Gu

In this study, we proposed a multi-view contrastive self-supervised method to improve thyroid nodule classification and segmentation performance with limited manual labels.

Classification Segmentation +1

Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue

1 code implementation 10 Feb 2024 Jian Wang, Chak Tou Leong, Jiashuo Wang, Dongding Lin, Wenjie Li, Xiao-Yong Wei

Tuning language models for dialogue generation has been a prevalent paradigm for building capable dialogue agents.

Dialogue Generation

OrchMoE: Efficient Multi-Adapter Learning with Task-Skill Synergy

no code implementations 19 Jan 2024 Haowen Wang, Tao Sun, Kaixiang Ji, Jian Wang, Cong Fan, Jinjie Gu

We advance the field of Parameter-Efficient Fine-Tuning (PEFT) with our novel multi-adapter method, OrchMoE, which capitalizes on modular skill architecture for enhanced forward transfer in neural networks.

Multi-Task Learning parameter-efficient fine-tuning

Enhancing Automatic Modulation Recognition through Robust Global Feature Extraction

no code implementations 2 Jan 2024 Yunpeng Qu, Zhilin Lu, Rui Zeng, Jintao Wang, Jian Wang

Modulated signals exhibit long temporal dependencies, and extracting global features is crucial in identifying modulation schemes.

Automatic Modulation Recognition Data Augmentation

Towards Better Vision-Inspired Vision-Language Models

no code implementations CVPR 2024 Yun-Hao Cao, Kaixiang Ji, Ziyuan Huang, Chuanyang Zheng, Jiajia Liu, Jian Wang, Jingdong Chen, Ming Yang

In this paper, we present a vision-inspired vision-language connection module dubbed VIVL, which efficiently exploits the vision cue for VL models.

Personalized Restoration via Dual-Pivot Tuning

no code implementations 28 Dec 2023 Pradyumna Chari, Sizhuo Ma, Daniil Ostashev, Achuta Kadambi, Gurunandan Krishnan, Jian Wang, Kfir Aberman

This approach ensures that personalization does not interfere with the restoration process, resulting in a natural appearance with high fidelity to the person's identity and the attributes of the degraded image.

Image Restoration

COOPER: Coordinating Specialized Agents towards a Complex Dialogue Goal

1 code implementation 19 Dec 2023 Yi Cheng, Wenge Liu, Jian Wang, Chak Tou Leong, Yi Ouyang, Wenjie Li, Xian Wu, Yefeng Zheng

In recent years, there has been a growing interest in exploring dialogues with more complex goals, such as negotiation, persuasion, and emotional support, which go beyond traditional service-focused dialogue systems.

Robust Communicative Multi-Agent Reinforcement Learning with Active Defense

no code implementations 16 Dec 2023 Lebin Yu, Yunbo Qiu, Quanming Yao, Yuan Shen, Xudong Zhang, Jian Wang

We propose an active defense strategy, where agents automatically reduce the impact of potentially harmful messages on the final decision.

Multi-agent Reinforcement Learning reinforcement-learning

Towards 4D Human Video Stylization

1 code implementation 7 Dec 2023 Tiantian Wang, Xinxin Zuo, Fangzhou Mu, Jian Wang, Ming-Hsuan Yang

To overcome these limitations, we leverage Neural Radiance Fields (NeRFs) to represent videos, conducting stylization in the rendered feature space.

Human Animation Novel View Synthesis

Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation

1 code implementation 14 Nov 2023 Zhihang Zhong, Xiao Sun, Yu Qiao, Gurunandan Krishnan, Sizhuo Ma, Jian Wang

Existing video frame interpolation (VFI) methods blindly predict where each object is at a specific timestep t ("time indexing"), which struggles to predict precise object movements.

Object Video Editing

HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception

1 code implementation NeurIPS 2023 Junkun Yuan, Xinyu Zhang, Hao Zhou, Jian Wang, Zhongwei Qiu, Zhiyin Shao, Shaofeng Zhang, Sifan Long, Kun Kuang, Kun Yao, Junyu Han, Errui Ding, Lanfen Lin, Fei Wu, Jingdong Wang

To further capture human characteristics, we propose a structure-invariant alignment loss that enforces different masked views, guided by the human part prior, to be closely aligned for the same image.

2D Pose Estimation Attribute

A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations

1 code implementation 31 Oct 2023 Hui Ma, Jian Wang, Hongfei Lin, Bo Zhang, Yijia Zhang, Bo Xu

Emotion recognition in conversations (ERC), the task of recognizing the emotion of each utterance in a conversation, is crucial for building empathetic machines.

Emotion Recognition in Conversation Multimodal Emotion Recognition

Self-Detoxifying Language Models via Toxification Reversal

2 code implementations 14 Oct 2023 Chak Tou Leong, Yi Cheng, Jiashuo Wang, Jian Wang, Wenjie Li

Drawing on this idea, we devise a method to identify the toxification direction from the normal generation process to the one prompted with the negative prefix, and then steer the generation to the reversed direction by manipulating the information movement within the attention layers.

Language Modeling Language Modelling

Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation

1 code implementation 11 Oct 2023 Jian Wang, Yi Cheng, Dongding Lin, Chak Tou Leong, Wenjie Li

Target-oriented dialogue systems, designed to proactively steer conversations toward predefined targets or accomplish specific system-side goals, are an exciting area in conversational AI.

PatchProto Networks for Few-shot Visual Anomaly Classification

no code implementations 7 Oct 2023 Jian Wang, Yue Zhuo

Visual anomaly diagnosis can automatically analyze defective products and has been widely applied in industrial quality inspection.

Anomaly Classification Classification

Segmented Harmonic Loss: Handling Class-Imbalanced Multi-Label Clinical Data for Medical Coding with Large Language Models

no code implementations 6 Oct 2023 Surjya Ray, Pratik Mehta, Hongen Zhang, Ada Chaman, Jian Wang, Chung-Jen Ho, Michael Chiou, Tashfeen Suleman

In this paper, we gauge the extent of the impact by evaluating the performance of LLMs for the task of medical coding on real-life noisy data.

Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification

1 code implementation ICCV 2023 Zhiyin Shao, Xinyu Zhang, Changxing Ding, Jian Wang, Jingdong Wang

In this way, the pre-training task and the T2I-ReID task are made consistent with each other on both data and training levels.

Person Re-Identification

FFPN: Fourier Feature Pyramid Network for Ultrasound Image Segmentation

no code implementations 26 Aug 2023 Chaoyu Chen, Xin Yang, Rusi Chen, Junxuan Yu, Liwei Du, Jian Wang, Xindi Hu, Yan Cao, Yingying Liu, Dong Ni

In this paper, we introduce a novel Fourier-anchor-based DTS framework called Fourier Feature Pyramid Network (FFPN) to address the aforementioned issues.

Image Segmentation Semantic Segmentation

Model predictive control strategy in waked wind farms for optimal fatigue loads

no code implementations 25 Aug 2023 Cheng Zhong, Yicheng Ding, Husai Wang, Jikai Chen, Jian Wang, Yang Li

In this paper, a closed-loop model predictive controller is developed that minimizes the wind farm tracking errors, the dynamic fatigue load, and the load equalization.

Model Predictive Control

Group Pose: A Simple Baseline for End-to-End Multi-person Pose Estimation

2 code implementations ICCV 2023 Huan Liu, Qiang Chen, Zichang Tan, Jiang-Jiang Liu, Jian Wang, Xiangbo Su, Xiaolong Li, Kun Yao, Junyu Han, Errui Ding, Yao Zhao, Jingdong Wang

State-of-the-art solutions adopt a DETR-like framework and mainly develop complex decoders: for example, ED-Pose regards pose estimation as keypoint box detection combined with human detection, and PETR predicts hierarchically with a pose decoder and a joint (keypoint) decoder.

Decoder Human Detection

ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation

1 code implementation 1 Aug 2023 Bo Zhang, Jian Wang, Hui Ma, Bo Xu, Hongfei Lin

To overcome this challenge, we propose an innovative multimodal framework, called ZRIGF, which assimilates image-grounded information for dialogue generation in zero-resource situations.

Dialogue Generation Response Generation

Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration

1 code implementation 10 Jul 2023 Meng Li, Yahan Yu, Yi Yang, Guanghao Ren, Jian Wang

In this paper, we propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration.

Image Registration Semantic Segmentation

Image Harmonization with Diffusion Model

no code implementations 17 Jun 2023 Jiajie Li, Jian Wang, Chen Wang, JinJun Xiong

In this paper, we present a novel approach for image harmonization by leveraging diffusion models.

Image Harmonization model

Weakly Supervised Lesion Detection and Diagnosis for Breast Cancers with Partially Annotated Ultrasound Images

no code implementations 12 Jun 2023 Jian Wang, Liang Qiao, Shichong Zhou, Jin Zhou, Jun Wang, Juncheng Li, Shihui Ying, Cai Chang, Jun Shi

To address this issue, a novel Two-Stage Detection and Diagnosis Network (TSDDNet) is proposed based on weakly supervised learning to enhance the diagnostic accuracy of ultrasound-based CAD for breast cancers.

Diagnostic Lesion Detection

Fourier Test-time Adaptation with Multi-level Consistency for Robust Classification

no code implementations 5 Jun 2023 Yuhao Huang, Xin Yang, Xiaoqiong Huang, Xinrui Zhou, Haozhe Chi, Haoran Dou, Xindi Hu, Jian Wang, Xuedong Deng, Dong Ni

Second, we introduce a regularization technique that utilizes style interpolation consistency in the frequency space to encourage self-consistency in the logit space of the model output.

Robust classification Test-time Adaptation

Medical Dialogue Generation via Dual Flow Modeling

1 code implementation 29 May 2023 Kaishuai Xu, Wenjun Hou, Yi Cheng, Jian Wang, Wenjie Li

It extracts the medical entities and dialogue acts used in the dialogue history and models their transitions with an entity-centric graph flow and a sequential act flow, respectively.

Dialogue Generation Dialogue Understanding

Dialogue Planning via Brownian Bridge Stochastic Process for Goal-directed Proactive Dialogue

1 code implementation 9 May 2023 Jian Wang, Dongding Lin, Wenjie Li

The key to achieving this task lies in planning dialogue paths that smoothly and coherently direct conversations towards the target.

Dialogue Generation

Exploring Effective Factors for Improving Visual In-Context Learning

1 code implementation 10 Apr 2023 Yanpeng Sun, Qiang Chen, Jian Wang, Jingdong Wang, Zechao Li

By doing this, the model can leverage the diverse knowledge stored in different parts of the model to improve its performance on new tasks.

In-Context Learning Meta-Learning

Robust Calibrate Proxy Loss for Deep Metric Learning

no code implementations 6 Apr 2023 Xinyue Li, Jian Wang, Wei Song, Yanling Du, Zhixiang Liu

Mainstream research in deep metric learning can be divided into two genres: proxy-based and pair-based methods.

Metric Learning Retrieval

Learning to Recover Spectral Reflectance from RGB Images

1 code implementation 4 Apr 2023 Dong Huo, Jian Wang, Yiming Qian, Yee-Hong Yang

Instead of relying on naive end-to-end training, we also propose a novel architecture that integrates the physical relationship between the spectral reflectance and the corresponding RGB images into the network based on our mathematical analysis.

Auxiliary Learning

Exploring Adversarial Attacks on Neural Networks: An Explainable Approach

1 code implementation 8 Mar 2023 Justus Renkhoff, Wenkai Tan, Alvaro Velasquez, William Yichen Wang, Yongxin Liu, Jian Wang, Shuteng Niu, Lejla Begic Fazlic, Guido Dartmann, Houbing Song

Finally, we demonstrate that layers Block4_conv1 and Block5_conv1 of the VGG-16 model are more susceptible to adversarial attacks.

Autonomous Driving

MetaMorph: Learning Metamorphic Image Transformation With Appearance Changes

no code implementations 8 Mar 2023 Jian Wang, Jiarui Xing, Jason Druzgal, William M. Wells III, Miaomiao Zhang

This paper presents a novel predictive model, MetaMorph, for metamorphic registration of images with appearance changes (e.g., those caused by brain tumors).

Segmentation

Temporal Segment Transformer for Action Segmentation

no code implementations 25 Feb 2023 Zhichao Liu, Leshan Wang, Desen Zhou, Jian Wang, Songyang Zhang, Yang Bai, Errui Ding, Rui Fan

To deal with these issues, we propose an attention-based approach, which we call temporal segment transformer, for joint segment relation modeling and denoising.

Action Segmentation Denoising

DisCO: Portrait Distortion Correction with Perspective-Aware 3D GANs

no code implementations 23 Feb 2023 Zhixiang Wang, Yu-Lun Liu, Jia-Bin Huang, Shin'ichi Satoh, Sizhuo Ma, Gurunandan Krishnan, Jian Wang

Close-up facial images captured at short distances often suffer from perspective distortion, resulting in exaggerated facial features and unnatural/unattractive appearances.

distortion correction Scheduling

Differentiable Rotamer Sampling with Molecular Force Fields

1 code implementation 22 Feb 2023 Congzhou M. Sha, Jian Wang, Nikolay V. Dokholyan

Molecular dynamics is the primary computational method by which modern structural biology explores macromolecule structure and function.

Low Entropy Communication in Multi-Agent Reinforcement Learning

no code implementations 10 Feb 2023 Lebin Yu, Yunbo Qiu, Qiexiang Wang, Xudong Zhang, Jian Wang

Communication in multi-agent reinforcement learning has been drawing attention recently for its significant role in cooperation.

Multi-agent Reinforcement Learning reinforcement-learning

CSDR-BERT: a pre-trained scientific dataset match model for Chinese Scientific Dataset Retrieval

no code implementations 30 Jan 2023 Xintao Chu, Jianping Liu, Jian Wang, XiaoFeng Wang, Yingfei Wang, Meng Wang, Xunxun Gu

As the number of open and shared scientific datasets on the Internet increases under the open science movement, efficiently retrieving these datasets is a crucial task in information retrieval (IR) research.

Information Retrieval Retrieval

Graph Contrastive Learning for Skeleton-based Action Recognition

1 code implementation 26 Jan 2023 Xiaohu Huang, Hao Zhou, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng

In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (SkeletonGCL) to explore the global context across all sequences.

Action Recognition Contrastive Learning

FE-TCM: Filter-Enhanced Transformer Click Model for Web Search

no code implementations 19 Jan 2023 Yingfei Wang, Jianping Liu, Jian Wang, XiaoFeng Wang, Meng Wang, Xintao Chu

In this paper, we use a Transformer as the backbone network for feature extraction, add a novel filter layer, and propose a new Filter-Enhanced Transformer Click Model (FE-TCM) for web search.

Uncertainty-guided Learning for Improving Image Manipulation Detection

no code implementations ICCV 2023 Kaixiang Ji, Feng Chen, Xin Guo, Yadong Xu, Jian Wang, Jingdong Chen

Image manipulation detection (IMD) is of vital importance, as faked images and the misinformation they spread can be malicious and harmful to daily life.

Image Manipulation Image Manipulation Detection

s-Adaptive Decoupled Prototype for Few-Shot Object Detection

no code implementations ICCV 2023 Jinhao Du, Shan Zhang, Qiang Chen, Haifeng Le, Yanpeng Sun, Yao Ni, Jian Wang, Bin He, Jingdong Wang

To provide precise information for the query image, the prototype is decoupled into task-specific ones, which provide tailored guidance for 'where to look' and 'what to look for', respectively.

Few-Shot Object Detection Meta-Learning

Scene-aware Egocentric 3D Human Pose Estimation

1 code implementation CVPR 2023 Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Diogo Luvizon, Christian Theobalt

To this end, we propose an egocentric depth estimation network to predict the scene depth map from a wide-view egocentric fisheye camera while mitigating the occlusion of the human body with a depth-inpainting network.

Ranked #3 on Egocentric Pose Estimation on GlobalEgoMocap Test Dataset (using extra training data)

Depth Estimation Egocentric Pose Estimation

COLA: Improving Conversational Recommender Systems by Collaborative Augmentation

no code implementations 15 Dec 2022 Dongding Lin, Jian Wang, Wenjie Li

Inspired by collaborative filtering, we propose a collaborative augmentation (COLA) method to simultaneously improve both item representation learning and user preference modeling to address these issues.

CoLA Collaborative Filtering

WAIR-D: Wireless AI Research Dataset

no code implementations 5 Dec 2022 Yourui Huangfu, Jian Wang, Shengchen Dai, Rong Li, Jun Wang, Chongwen Huang, Zhaoyang Zhang

Statistical data prevent trained AI models from being further fine-tuned for a specific scenario, and ray-tracing data covering limited environments reduce the generalization capability of trained AI models.

Intelligent Communication

Attention-based Class Activation Diffusion for Weakly-Supervised Semantic Segmentation

no code implementations 20 Nov 2022 Jianqiang Huang, Jian Wang, Qianru Sun, Hanwang Zhang

An intuitive solution is "coupling" the CAM with the long-range attention matrix of vision transformers (ViT). We find that direct "coupling", e.g., pixel-wise multiplication of attention and activation, achieves more global coverage (on the foreground) but unfortunately comes with a large increase in false positives, i.e., background pixels are mistakenly included.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining

no code implementations arXiv 2022 Qiang Chen, Jian Wang, Chuchu Han, Shan Zhang, Zexian Li, Xiaokang Chen, Jiahui Chen, Xiaodi Wang, Shuming Han, Gang Zhang, Haocheng Feng, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Object365, and finally finetuning it on COCO.

Decoder Object

A survey on the development status and application prospects of knowledge graph in smart grids

no code implementations 2 Nov 2022 Jian Wang, Xi Wang, Chaoqun Ma, Lei Kou

With the advent of the electric power big data era, semantic interoperability and interconnection of power data have received extensive attention.

Decision Making

Geo-SIC: Learning Deformable Geometric Shapes in Deep Image Classifiers

1 code implementation 25 Oct 2022 Jian Wang, Miaomiao Zhang

We introduce a newly designed framework that (i) simultaneously derives features from both image and latent shape spaces with large intra-class variations; and (ii) gains increased model interpretability by allowing direct access to the underlying geometric features of image data.

Image Classification

Self-Supervised 2D/3D Registration for X-Ray to CT Image Fusion

no code implementations 14 Oct 2022 Srikrishna Jaganathan, Maximilian Kukla, Jian Wang, Karthik Shetty, Andreas Maier

Deep Learning-based 2D/3D registration enables fast, robust, and accurate X-ray to CT image fusion when large annotated paired datasets are available for training.

Domain Adaptation

U-HRNet: Delving into Improving Semantic Representation of High Resolution Network for Dense Prediction

4 code implementations 13 Oct 2022 Jian Wang, Xiang Long, Guowei Chen, Zewu Wu, Zeyu Chen, Errui Ding

Therefore, we designed a U-shaped High-Resolution Network (U-HRNet), which adds more stages after the feature map with the strongest semantic representation and relaxes the HRNet constraint that all resolutions must be computed in parallel for a newly added stage.

Depth Estimation Depth Prediction

Beam Management in Ultra-dense mmWave Network via Federated Reinforcement Learning: An Intelligent and Secure Approach

no code implementations 4 Oct 2022 Qing Xue, Yi-Jing Liu, Yao Sun, Jian Wang, Li Yan, Gang Feng, Shaodan Ma

Deploying ultra-dense networks that operate in the millimeter wave (mmWave) band is a promising way to address the tremendous growth in mobile data traffic.

Federated Learning Management

Design of the PID temperature controller for an alkaline electrolysis system with time delays

no code implementations 3 Oct 2022 Ruomei Qi, Jiarong Li, Jin Lin, Yonghua Song, Jiepeng Wang, Qiangqiang Cui, Yiwei Qiu, Ming Tang, Jian Wang

This paper focuses on the design of the PID temperature controller for an alkaline electrolysis system to achieve fast and stable temperature control.

Sub-optimal Policy Aided Multi-Agent Reinforcement Learning for Flocking Control

no code implementations 17 Sep 2022 Yunbo Qiu, Yue Jin, Jian Wang, Xudong Zhang

Flocking control is a challenging problem, where multiple agents, such as drones or vehicles, need to reach a target position while maintaining the flock and avoiding collisions with obstacles and collisions among agents in the environment.

Multi-agent Reinforcement Learning reinforcement-learning

Part-aware Prototypical Graph Network for One-shot Skeleton-based Action Recognition

no code implementations 19 Aug 2022 Tailin Chen, Desen Zhou, Jian Wang, Shidong Wang, Qian He, Chuanyang Hu, Errui Ding, Yu Guan, Xuming He

In this paper, we study the problem of one-shot skeleton-based action recognition, which poses unique challenges in learning transferable representation from base classes to novel classes, particularly for fine-grained actions.

Action Recognition Meta-Learning

Multilayer Fisher extreme learning machine for classification

no code implementations Complex & Intelligent Systems 2022 Jie Lai, Xiaodan Wang, Qian Xiang, Jian Wang, Lei Lei

To address this problem, a novel Fisher extreme learning machine autoencoder (FELM-AE) is proposed and is used as the component for the multilayer Fisher extreme learning machine (ML-FELM).

Classification Denoising

Follow Me: Conversation Planning for Target-driven Recommendation Dialogue Systems

1 code implementation 6 Aug 2022 Jian Wang, Dongding Lin, Wenjie Li

Recommendation dialogue systems aim to build social bonds with users and provide high-quality recommendations.

Dialogue Generation

IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation

no code implementations 6 Aug 2022 Zhongwei Qiu, Qiansheng Yang, Jian Wang, Dongmei Fu

In particular, we first formulate video frames as a series of instance-guided tokens, each of which is responsible for predicting the 3D pose of one human instance.

Ranked #11 on 3D Multi-Person Pose Estimation on Panoptic (using extra training data)

2D Pose Estimation 3D Multi-Person Pose Estimation

Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment

2 code implementations ICCV 2023 Qiang Chen, Xiaokang Chen, Jian Wang, Shan Zhang, Kun Yao, Haocheng Feng, Junyu Han, Errui Ding, Gang Zeng, Jingdong Wang

Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth object to one prediction, for end-to-end detection without NMS post-processing.

Data Augmentation Decoder

Action Quality Assessment with Temporal Parsing Transformer

1 code implementation 19 Jul 2022 Yang Bai, Desen Zhou, Songyang Zhang, Jian Wang, Errui Ding, Yu Guan, Yang Long, Jingdong Wang

Action Quality Assessment (AQA) is important for action understanding, and the task poses unique challenges due to subtle visual differences.

Action Quality Assessment Action Understanding

Learning Granularity-Unified Representations for Text-to-Image Person Re-identification

2 code implementations 16 Jul 2022 Zhiyin Shao, Xinyu Zhang, Meng Fang, Zhifeng Lin, Jian Wang, Changxing Ding

In PGU, we adopt a set of shared and learnable prototypes as the queries to extract diverse and semantically aligned features for both modalities in the granularity-unified feature space, which further promotes the ReID performance.

Person Re-Identification Text based Person Retrieval

Privacy-preserving household load forecasting based on non-intrusive load monitoring: A federated deep learning approach

no code implementations 30 Jun 2022 Xinxin Zhou, Jingru Feng, Jian Wang, Jianhong Pan

In this method, the integrated power is decomposed into individual device power by non-intrusive load monitoring, and the power of individual appliances is predicted separately using a federated deep learning model.

Deep Learning Federated Learning

MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue

no code implementations 19 Jun 2022 Pengfei Zhang, Xiaohui Hu, Kaidong Yu, Jian Wang, Song Han, Cao Liu, Chunyang Yuan

First, we build an evaluation metric composed of five groups of parallel sub-metrics, called Multi-Metric Evaluation (MME), to evaluate dialogue quality comprehensively.

Dialogue Evaluation MME

Structured Light with Redundancy Codes

no code implementations 18 Jun 2022 Zhanghao Sun, Yu Zhang, Yicheng Wu, Dong Huo, Yiming Qian, Jian Wang

We propose three applications using our redundancy codes: (1) Self error-correction for SL imaging under strong ambient light, (2) Error detection for adaptive reconstruction under global illumination, and (3) Interference filtering with device-specific projection sequence encoding, especially for event camera-based SL and light curtain devices.

3D geometry

Object Scan Context: Object-centric Spatial Descriptor for Place Recognition within 3D Point Cloud Map

no code implementations 7 Jun 2022 Haodong Yuan, Yudong Zhang, Shengyin Fan, Xue Li, Jian Wang

The integration of a SLAM algorithm with place recognition technology empowers it with the ability to mitigate accumulated errors and to relocalize itself.

Object

One-to-N & N-to-One: Two Advanced Backdoor Attacks Against Deep Learning Models

no code implementations IEEE Transactions on Dependable and Secure Computing 2022 Mingfu Xue, Can He, Jian Wang, Weiqiang Liu

In this article, for the first time, we propose two advanced backdoor attacks, a multi-target backdoor attack and a multi-trigger backdoor attack: 1) the One-to-N attack, where the attacker can trigger multiple backdoor targets by controlling different intensities of the same backdoor; and 2) the N-to-One attack, which is triggered only when all N backdoors are satisfied.

Face Recognition

Human-Object Interaction Detection via Disentangled Transformer

no code implementations CVPR 2022 Desen Zhou, Zhichao Liu, Jian Wang, Leshan Wang, Tao Hu, Errui Ding, Jingdong Wang

To associate the predictions of disentangled decoders, we first generate a unified representation for HOI triplets with a base decoder, and then utilize it as input feature of each disentangled decoder.

Decoder Human-Object Interaction Detection

Implicit Sample Extension for Unsupervised Person Re-Identification

1 code implementation CVPR 2022 Xinyu Zhang, Dongdong Li, Zhigang Wang, Jian Wang, Errui Ding, Javen Qinfeng Shi, Zhaoxiang Zhang, Jingdong Wang

Specifically, we generate support samples from actual samples and their neighbouring clusters in the embedding space through a progressive linear interpolation (PLI) strategy.

Clustering Unsupervised Person Re-Identification

Glass Segmentation with RGB-Thermal Image Pairs

1 code implementation 12 Apr 2022 Dong Huo, Jian Wang, Yiming Qian, Yee-Hong Yang

Because most glass is transparent to visible light but opaque to thermal energy, glass regions in a scene are more distinguishable with a pair of RGB and thermal images than with an RGB image alone.

Segmentation Thermal Image Segmentation

NPC: Neuron Path Coverage via Characterizing Decision Logic of Deep Neural Networks

no code implementations 24 Mar 2022 Xiaofei Xie, Tianlin Li, Jian Wang, Lei Ma, Qing Guo, Felix Juefei-Xu, Yang Liu

Inspired by software testing, a number of structural coverage criteria are designed and proposed to measure the test adequacy of DNNs.

Defect Detection DNN Testing

Exploiting Pairwise Mutual Information for Knowledge-Grounded Dialogue

1 code implementation IEEE/ACM Transactions on Audio, Speech, and Language Processing 2022 Bo Zhang, Jian Wang, Hongfei Lin, Hui Ma, Bo Xu

Correlation integration is designed to fully exploit the pairwise mutual information among dialogue context, knowledge, and responses, while overall integration adopts an integration gate to capture global information.

Dialogue Generation

Hierarchical Memory Learning for Fine-Grained Scene Graph Generation

no code implementations 14 Mar 2022 Youming Deng, Yansheng Li, Yongjun Zhang, Xiang Xiang, Jian Wang, Jingdong Chen, Jiayi Ma

After the autonomous partition of coarse and fine predicates, the model is first trained on the coarse predicates and then learns the fine predicates.

Graph Generation Scene Graph Generation

Image Steganography based on Style Transfer

no code implementations 9 Mar 2022 Donghui Hu, Yu Zhang, Cong Yu, Jian Wang, Yaofei Wang

Image steganography is the art and science of using images as cover for covert communications.

Image Steganography Image Stylization

Thermal Modelling and Controller Design of an Alkaline Electrolysis System under Dynamic Operating Conditions

no code implementations 27 Feb 2022 Ruomei Qi, Jiarong Li, Jin Lin, Yonghua Song, Jiepeng Wang, Qiangqiang Cui, Yiwei Qiu, Ming Tang, Jian Wang

A control-oriented thermal model is established in the form of a third-order time-delay process, which is used for simulation and controller design.

Management

Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision

no code implementations CVPR 2022 Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Diogo Luvizon, Christian Theobalt

Specifically, we first generate pseudo labels for the EgoPW dataset with a spatio-temporal optimization method by incorporating the external-view supervision.

Ranked #4 on Egocentric Pose Estimation on GlobalEgoMocap Test Dataset (using extra training data)

Egocentric Pose Estimation

An Adaptive Neuro-Fuzzy System with Integrated Feature Selection and Rule Extraction for High-Dimensional Classification Problems

no code implementations 10 Jan 2022 Guangdong Xue, Qin Chang, Jian Wang, Kai Zhang, Nikhil R. Pal

The effectiveness of FSRE-AdaTSK is demonstrated on 19 datasets, five of which have more than 2,000 dimensions, including two with more than 7,000 dimensions.

feature selection

Training Object Detectors From Scratch: An Empirical Study in the Era of Vision Transformer

no code implementations CVPR 2022 Weixiang Hong, Jiangwei Lao, Wang Ren, Jian Wang, Jingdong Chen, Wei Chu

Instead of proposing a specific vision-transformer-based detector, our goal in this work is to reveal insights into training vision-transformer-based detectors from scratch.

object-detection Object Detection

Hybrid Atlas Building with Deep Registration Priors

no code implementations 13 Dec 2021 Nian Wu, Jian Wang, Miaomiao Zhang, Guixu Zhang, Yaxin Peng, Chaomin Shen

Registration-based atlas building often poses computational challenges in high-dimensional image spaces.

MFNet: Multi-filter Directive Network for Weakly Supervised Salient Object Detection

1 code implementation ICCV 2021 Yongri Piao, Jian Wang, Miao Zhang, Huchuan Lu

The multiple accurate cues from multiple DFs are then simultaneously propagated to the saliency network with a multi-guidance loss.

object-detection Object Detection

Image-Guided Navigation of a Robotic Ultrasound Probe for Autonomous Spinal Sonography Using a Shadow-aware Dual-Agent Framework

no code implementations 3 Nov 2021 Keyu Li, Yangxin Xu, Jian Wang, Dong Ni, Li Liu, Max Q.-H. Meng

Ultrasound (US) imaging is commonly used to assist in the diagnosis and interventions of spine diseases, while the standardized US acquisitions performed by manually operating the probe require substantial experience and training of sonographers.

Anatomy Decision Making

URIR: Recommendation algorithm of user RNN encoder and item encoder based on knowledge graph

no code implementations 1 Nov 2021 Na Zhao, Zhen Long, Zhi-Dan Zhao, Jian Wang

This implies that URIR can effectively use knowledge graph to obtain better user codes and item codes, thereby obtaining better recommendation results.

Knowledge Graphs Recommendation Systems

One-Bit Matrix Completion with Differential Privacy

no code implementations 2 Oct 2021 Zhengpin Li, Zheng Wei, Zengfeng Huang, Xiaojun Mao, Jian Wang

In this paper, we propose a unified framework for ensuring a strong privacy guarantee of one-bit matrix completion with DP.

Collaborative Filtering Matrix Completion

SAM: A Self-adaptive Attention Module for Context-Aware Recommendation System

no code implementations 1 Oct 2021 Jiabin Liu, Zheng Wei, Zhengpin Li, Xiaojun Mao, Jian Wang, Zhongyu Wei, Qi Zhang

In this work, we propose a novel and general self-adaptive module, the Self-adaptive Attention Module (SAM), which adjusts the selection bias by capturing contextual information based on its representation.

Recommendation Systems Representation Learning

Applying Differential Privacy to Tensor Completion

no code implementations 1 Oct 2021 Zheng Wei, Zhengpin Li, Xiaojun Mao, Jian Wang

Tensor completion aims at filling the missing or unobserved entries based on partially observed tensors.

Tensor Decomposition

To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection

no code implementations4 Sep 2021 Yongri Piao, Jian Wang, Miao Zhang, Zhengxuan Ma, Huchuan Lu

Despite of the success of previous works, explorations on an effective training strategy for the saliency network and accurate matches between image-level annotations and salient objects are still inadequate.

Object Detection

Mining Contextual Information Beyond Image for Semantic Segmentation

1 code implementation ICCV 2021 Zhenchao Jin, Tao Gong, Dongdong Yu, Qi Chu, Jian Wang, Changhu Wang, Jie Shao

To address this, this paper proposes to mine the contextual information beyond individual images to further augment the pixel representations.

Image Segmentation, Segmentation

Learning to Detect: A Data-driven Approach for Network Intrusion Detection

no code implementations 18 Aug 2021 Zachary Tauscher, Yushan Jiang, Kai Zhang, Jian Wang, Houbing Song

With massive data being generated daily and the ever-increasing interconnectivity of the world's Internet infrastructure, a machine-learning-based intrusion detection system (IDS) has become a vital component in protecting our economic and national security.

Network Intrusion Detection, Representation Learning

Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition

1 code implementation 10 Aug 2021 Tailin Chen, Desen Zhou, Jian Wang, Shidong Wang, Yu Guan, Xuming He, Errui Ding

The task of skeleton-based action recognition remains a core challenge in human-centred scene understanding due to the multiple granularities and large variation in human motion.

Action Classification, Action Recognition

Distributed Learning for Time-varying Networks: A Scalable Design

no code implementations 31 Jul 2021 Jian Wang, Yourui Huangfu, Rong Li, Yiqun Ge, Jun Wang

The wireless network is undergoing a shift from "connection of things" to "connection of intelligence".

Federated Learning

A Novel Interactive Two-stage Joint Retail Electricity Market for Multiple Microgrids

no code implementations 27 Jul 2021 Chunyi Huang, Mingzhi Zhang, Chengmin Wang, Ning Xie, Jian Wang, Shi Peng

To accommodate the advent of microgrids (MG) managing distributed energy resources (DER) in distribution systems, an interactive two-stage joint retail electricity market mechanism is proposed to provide an effective platform for these prosumers to proactively join in retail transactions.

Energy Trading

Deep Iterative 2D/3D Registration

no code implementations 21 Jul 2021 Srikrishna Jaganathan, Jian Wang, Anja Borsdorf, Karthik Shetty, Andreas Maier

A refinement step using the classical optimization-based 2D/3D registration method, applied in combination with deep-learning-based techniques, can provide the required accuracy.

Deep Learning, Optical Flow Estimation

Bayesian Atlas Building with Hierarchical Priors for Subject-specific Regularization

1 code implementation 12 Jul 2021 Jian Wang, Miaomiao Zhang

This paper presents a novel hierarchical Bayesian model for unbiased atlas building with subject-specific regularizations of image registration.

Image Registration

Seeing in Extra Darkness Using a Deep-Red Flash

no code implementations CVPR 2021 Jinhui Xiong, Jian Wang, Wolfgang Heidrich, Shree Nayar

We propose a new flash technique for low-light imaging, using deep-red light as an illuminating source.

Video Reconstruction
