Search Results for author: Zheng Zhang

Found 372 papers, 192 papers with code

Region Graph Embedding Network for Zero-Shot Learning

no code implementations ECCV 2020 Guo-Sen Xie, Li Liu, Fan Zhu, Fang Zhao, Zheng Zhang, Yazhou Yao, Jie Qin, Ling Shao

To exploit the progressive interactions among these regions, we represent them as a region graph, on which the parts relation reasoning is performed with graph convolutions, thus leading to our PRR branch.

Graph Embedding Relation +1
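For readers unfamiliar with parts relation reasoning over a region graph, the minimal sketch below shows a single graph-convolution step over region features. The adjacency matrix, feature sizes, and layer names are illustrative assumptions, not the paper's PRR branch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionGraphConv(nn.Module):
    """One graph-convolution step over region features (illustrative only)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, region_feats, adj):
        # region_feats: (N, in_dim) features of N image regions
        # adj: (N, N) region-graph adjacency (assumed given)
        adj = adj + torch.eye(adj.size(0))                 # add self-loops
        deg_inv_sqrt = adj.sum(dim=-1).pow(-0.5)
        norm_adj = deg_inv_sqrt.unsqueeze(-1) * adj * deg_inv_sqrt.unsqueeze(0)
        # aggregate neighboring regions, then project and activate
        return F.relu(self.proj(norm_adj @ region_feats))

# Example: 9 regions with 512-d features on a fully connected region graph.
regions, adj = torch.randn(9, 512), torch.ones(9, 9)
out = RegionGraphConv(512, 256)(regions, adj)              # -> (9, 256)
```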

MovieChats: Chat like Humans in a Closed Domain

no code implementations EMNLP 2020 Hui Su, Xiaoyu Shen, Zhou Xiao, Zheng Zhang, Ernie Chang, Cheng Zhang, Cheng Niu, Jie zhou

In this work, we take a close look at the movie domain and present a large-scale high-quality corpus with fine-grained annotations in hope of pushing the limit of movie-domain chatbots.

Chatbot Retrieval

Denoising Programming Knowledge Tracing with a Code Graph-based Tuning Adaptor

no code implementations7 Jun 2025 Weibo Gao, Qi Liu, Rui Li, Yuze Zhao, Hao Wang, Linan Yue, Fangzhou Yao, Zheng Zhang

However, current PKT studies primarily focus on the implicit relationship between code content and knowledge assessment, often overlooking two types of noise signals in long-term programming activities: unwanted signals from unrelated submissions and weak signals from minor modifications.

Denoising Knowledge Tracing +2

AuthGuard: Generalizable Deepfake Detection via Language Guidance

no code implementations4 Jun 2025 Guangyu Shen, Zhihua Li, Xiang Xu, Tianchen Zhao, Zheng Zhang, Dongsheng An, Zhuowen Tu, Yifan Xing, Qin Zhang

To achieve this, we train an expert deepfake vision encoder by combining discriminative classification with image-text contrastive learning, where the text is generated by generalist MLLMs using few-shot prompting.

Contrastive Learning DeepFake Detection +1

Projection Pursuit Density Ratio Estimation

no code implementations1 Jun 2025 Meilin Wang, Wei Huang, Mingming Gong, Zheng Zhang

Density ratio estimation (DRE) is a paramount task in machine learning, for its broad applications across multiple domains, such as covariate shift adaptation, causal inference, independence tests and beyond.

Causal Inference Density Ratio Estimation

MGS3: A Multi-Granularity Self-Supervised Code Search Framework

no code implementations30 May 2025 Rui Li, Junfeng Kang, Qi Liu, Liyang He, Zheng Zhang, Yunhao Sha, Linbo Zhu, Zhenya Huang

Subsequently, we introduce a novel Multi-Granularity Self-Supervised contrastive learning code Search framework (MGS$^{3}$).

Code Search Contrastive Learning +1

FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression

1 code implementation29 May 2025 Jiayi Tian, Ryan Solgi, Jinming Lu, Yifan Yang, Hai Li, Zheng Zhang

Large Language Models (LLMs) have enabled remarkable progress in natural language processing, yet their high computational and memory demands pose challenges for deployment in resource-constrained environments.

Language Modeling Language Modelling +2

Learning to Select In-Context Demonstration Preferred by Large Language Model

no code implementations26 May 2025 Zheng Zhang, Shaocheng Lan, Lei Song, Jiang Bian, Yexin Li, Kan Ren

In-context learning (ICL) enables large language models (LLMs) to adapt to new tasks during inference using only a few demonstrations.

In-Context Learning Language Modeling +4

Erasing Concepts, Steering Generations: A Comprehensive Survey of Concept Suppression

no code implementations26 May 2025 Yiwei Xie, Ping Liu, Zheng Zhang

Text-to-Image (T2I) models have demonstrated impressive capabilities in generating high-quality and diverse visual content from natural language prompts.

Adversarial Robustness Disentanglement +1

Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models

no code implementations20 May 2025 Ryan Solgi, Kai Zhen, Rupak Vignesh Swaminathan, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann, Zheng Zhang

In this study, we investigate low-rank tensorized LLMs during fine-tuning and propose sparse augmented tensor networks (Saten) to enhance their performance.

Model Compression Tensor Networks

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

1 code implementation8 May 2025 Yunxin Li, Zhenyu Liu, Zitao Li, Xuanyu Zhang, Zhenran Xu, Xinyu Chen, Haoyuan Shi, Shenyuan Jiang, Xintong Wang, Jifang Wang, Shouzheng Huang, Xinping Zhao, Borui Jiang, Lanqing Hong, Longyue Wang, Zhuotao Tian, Baoxing Huai, Wenhan Luo, Weihua Luo, Zheng Zhang, Baotian Hu, Min Zhang

Large Multimodal Reasoning Models (LMRMs) have emerged as a promising paradigm, integrating modalities such as text, images, audio, and video to support complex reasoning capabilities and aiming to achieve comprehensive perception, precise understanding, and deep reasoning.

Multimodal Reasoning

Attosecond Streaking Phase Retrieval Via Deep Learning Methods

no code implementations6 May 2025 Yuzhou Zhu, Zheng Zhang, Ruyi Zhang, Liang Zhou

Attosecond streaking phase retrieval is essential for resolving electron dynamics on sub-femtosecond time scales, yet traditional algorithms rely on iterative minimization and central momentum approximations that degrade accuracy for broadband pulses.

Deep Learning Graph Attention +2

MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind

no code implementations25 Apr 2025 Zheng Zhang, Nuoqian Xiao, Qi Chai, Deheng Ye, Hao Wang

Large Language Model (LLM) agents have demonstrated impressive capabilities in social deduction games (SDGs) like Werewolf, where strategic reasoning and social deception are essential.

Large Language Model Multimodal Reasoning

An Empirical Study on Prompt Compression for Large Language Models

1 code implementation24 Apr 2025 Zheng Zhang, Jinyi Li, Yihuai Lan, Xiang Wang, Hao Wang

Prompt engineering enables Large Language Models (LLMs) to perform a variety of tasks.

Articles Math +2

Enhancing Multi-task Learning Capability of Medical Generalist Foundation Model via Image-centric Multi-annotation Data

no code implementations14 Apr 2025 Xun Zhu, Fanbin Mo, Zheng Zhang, Jiaxi Wang, Yiming Shi, Ming Wu, Chuang Zhang, Miao Li, Ji Wu

In this paper, we introduce the image-centric multi-annotation X-ray dataset (IMAX), the first attempt to enhance the multi-task learning capabilities of medical multi-modal large language models (MLLMs) from the data construction level.

Multi-Task Learning

The Other Side of the Coin: Exploring Fairness in Retrieval-Augmented Generation

1 code implementation11 Apr 2025 Zheng Zhang, Ning li, Qi Liu, Rui Li, Weibo Gao, Qingyang Mao, Zhenya Huang, Baosheng Yu, DaCheng Tao

By referencing this external knowledge, RAG effectively reduces the generation of factually incorrect content and addresses hallucination issues within LLMs.

Fairness Hallucination +3

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

no code implementations10 Apr 2025 ByteDance Seed, :, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, Riwei Chen, Liangqiang Chen, Zixin Chen, Jinsong Chen, Siyan Chen, Kaiyuan Chen, Zhi Chen, Jin Chen, Jiecao Chen, Jinxin Chi, Weinan Dai, Ning Dai, Jiahui Dai, Shihan Dou, Yantao Du, Zhengyin Du, Jianhui Duan, Chen Dun, Ting-Han Fan, Jiazhan Feng, Junda Feng, Ziyuan Feng, Yuwei Fu, Wenqi Fu, Hanjie Fu, Hao Ge, Hongyi Guo, Mingji Han, Li Han, Wenhao Hao, Xintong Hao, Qianyu He, Jerry He, Feng He, Wen Heng, Zehua Hong, Qi Hou, Liang Hu, Shengding Hu, Nan Hu, Kai Hua, Qi Huang, Ziyue Huang, Hongzhi Huang, Zihao Huang, Ting Huang, Wenhao Huang, Wei Jia, Bin Jia, Xiaoying Jia, Yuhua Jiang, Haobin Jiang, Ziheng Jiang, Kaihua Jiang, Chengquan Jiang, Jianpeng Jiao, Xiaoran Jin, Xing Jin, Xunhao Lai, Xiang Li, Liyi Li, Hongkai Li, Zheng Li, Shengxian Wan, Ya Wang, Yunshui Li, Chenggang Li, Niuniu Li, Siyu Li, Xi Li, Xiao Li, Aoyan Li, Yuntao Li, Nianning Liang, Xinnian Liang, Haibin Lin, Weijian Lin, Ye Lin, Zhicheng Liu, Guanlin Liu, Chenxiao Liu, Yan Liu, Gaohong Liu, Juncai Liu, Chundian Liu, Deyi Liu, Kaibo Liu, Siyao Liu, Qi Liu, Yongfei Liu, Kang Liu, Gan Liu, Boyi Liu, Rui Long, Weiqiang Lou, Chenwei Lou, Xiang Luo, Yao Luo, Caiping Lv, Heyang Lv, Bole Ma, Qianli Ma, Hongzhi Ma, Yiyuan Ma, Jin Ma, Wenchang Ma, Tingting Ma, Chen Mao, Qiyang Min, Zhe Nan, Guanghan Ning, Jinxiang Ou, Haojie Pan, Renming Pang, Yanghua Peng, Tao Peng, Lihua Qian, Mu Qiao, Meng Qu, Cheng Ren, Hongbin Ren, Yong Shan, Wei Shen, Ke Shen, Kai Shen, Guangming Sheng, Jinlong Shi, Wenlei Shi, Guang Shi, Shuai Shuai Cao, Yuxin Song, Zuquan Song, Jing Su, Yifan Sun, Tao Sun, Zewei Sun, Borui Wan, Xiaohui Wang, Xi Wang, Shuguang Wang, Jun Wang, Qinlong Wang, Chenyuan Wang, Shuai Wang, Zihan Wang, Changbao Wang, Jiaqiang Wang, Shihang Wang, Xuwu Wang, Zaiyuan Wang, Yuxuan Wang, Wenqi Wang, Taiqing Wang, Chengzhi Wei, Houmin Wei, Ziyun Wei, Shufa Wei, Zheng Wu, Yonghui Wu, Yangjun Wu, Bohong Wu, Shuang Wu, Jingqiao Wu, Ning Wu, Shuangzhi Wu, Jianmin Wu, Chenguang Xi, Fan Xia, Yuqiao Xian, Liang Xiang, Boren Xiang, Bowen Xiao, Zhen Xiao, Xia Xiao, Yongsheng Xiao, Chao Xin, Shulin Xin, Yuwen Xiong, Jingjing Xu, Ziwen Xu, Chenyin Xu, Jiayi Xu, Yifan Xu, Wei Xu, Yufei Xu, Shikun Xu, Shipeng Yan, Shen Yan, Qingping Yang, Xi Yang, Tianhao Yang, Yuehang Yang, Yuan Yang, Ximing Yang, Zeyu Yang, Guang Yang, Yifan Yang, Xuesong Yao, Bairen Yi, Fan Yin, Jianian Yin, Ziqiang Ying, Xiangyu Yu, Hongli Yu, Song Yu, Menghan Yu, Huan Yu, Siyu Yuan, Jun Yuan, Yutao Zeng, Tianyang Zhan, Zheng Zhang, Yun Zhang, Mofan Zhang, Wang Zhang, Ru Zhang, Zhi Zhang, Tianqi Zhang, Xinyi Zhang, Zhexi Zhang, Sijun Zhang, Wenqiang Zhang, Xiangxiang Zhang, Yongtao Zhang, Yuyu Zhang, Ge Zhang, He Zhang, Yue Zhang, Renjie Zheng, Ningxin Zheng, Zhuolin Zheng, Yaowei Zheng, Chen Zheng, Xiaoyun Zhi, Wanjun Zhong, Cheng Zhong, Zheng Zhong, Baoquan Zhong, Xun Zhou, Na Zhou, Huan Zhou, Hang Zhu, Defa Zhu, Wenjia Zhu, Lei Zuo

We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks.

Mixture-of-Experts reinforcement-learning +1

Kimi-VL Technical Report

1 code implementation10 Apr 2025 Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, HaoNing Wu, Haotian Yao, Haoyu Lu, Heng Wang, Hongcheng Gao, Huabin Zheng, Jiaming Li, Jianlin Su, Jianzhou Wang, Jiaqi Deng, Jiezhong Qiu, Jin Xie, Jinhong Wang, Jingyuan Liu, Junjie Yan, Kun Ouyang, Liang Chen, Lin Sui, Longhui Yu, Mengfan Dong, Mengnan Dong, Nuo Xu, Pengyu Cheng, Qizheng Gu, Runjie Zhou, Shaowei Liu, Sihan Cao, Tao Yu, Tianhui Song, Tongtong Bai, Wei Song, Weiran He, Weixiao Huang, Weixin Xu, Xiaokun Yuan, Xingcheng Yao, Xingzhe Wu, Xinxing Zu, Xinyu Zhou, Xinyuan Wang, Y. Charles, Yan Zhong, Yang Li, Yangyang Hu, Yanru Chen, Yejie Wang, Yibo Liu, Yibo Miao, Yidao Qin, Yimin Chen, Yiping Bao, Yiqin Wang, Yongsheng Kang, Yuanxin Liu, Yulun Du, Yuxin Wu, Yuzhi Wang, Yuzi Yan, Zaida Zhou, Zhaowei Li, Zhejun Jiang, Zheng Zhang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Zijia Zhao, Ziwei Chen, Zongyu Lin

We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B).

Long-Context Understanding Mathematical Reasoning +4

DeepOHeat-v1: Efficient Operator Learning for Fast and Trustworthy Thermal Simulation and Optimization in 3D-IC Design

1 code implementation4 Apr 2025 Xinling Yu, Ziyue Liu, Hai Li, Yixing Li, Xin Ai, Zhiyu Zeng, Ian Young, Zheng Zhang

Third, we propose a confidence score to evaluate the trustworthiness of the predicted results, and further develop a hybrid optimization workflow that combines operator learning with finite difference (FD) using Generalized Minimal Residual (GMRES) method for incremental solution refinement, enabling efficient and trustworthy thermal optimization.

Kolmogorov-Arnold Networks Operator learning
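As a rough illustration of incremental refinement with GMRES, the sketch below feeds an assumed operator-learning prediction as the initial guess to SciPy's GMRES solver on a toy finite-difference system. The 1D Poisson stand-in and all variable names are assumptions, not the DeepOHeat-v1 workflow.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Toy 1D finite-difference system A u = b, standing in for the thermal model.
n = 200
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Pretend u0 is the neural-operator prediction; here it is just a rough guess.
u0 = np.linspace(0.0, 1.0, n)

# GMRES incrementally refines the prediction against the FD system.
u, info = gmres(A, b, x0=u0)
print("converged" if info == 0 else f"info={info}",
      "| residual:", np.linalg.norm(A @ u - b))
```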

LLM for Complex Reasoning Task: An Exploratory Study in Fermi Problems

no code implementations3 Apr 2025 Zishuo Liu, Carlos Rabat Villarreal, Mostafa Rahgouy, Amit Das, Zheng Zhang, Chang Ren, Dongji Feng

Comparative experiments confirmed this hypothesis, demonstrating that LLMs performed better on standard FPs in terms of both accuracy and efficiency.

Mathematical Reasoning

Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation

no code implementations31 Mar 2025 Yongle Li, Bo Liu, Sheng Huang, Zheng Zhang, Xiaotong Yuan, Richang Hong

In federated learning, fine-tuning pre-trained foundation models poses significant challenges, particularly regarding high communication cost and suboptimal model performance due to data heterogeneity between the clients.

Federated Learning

Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing

no code implementations CVPR 2025 Zhuowei Li, Tianchen Zhao, Xiang Xu, Zheng Zhang, Zhihua Li, Xuanbai Chen, Qin Zhang, Alessandro Bergamo, Anil K. Jain, Yifan Xing

Developing a face anti-spoofing model that meets the security requirements of clients worldwide is challenging due to the domain gap between training datasets and diverse end-user test data.

Face Anti-Spoofing

TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting

no code implementations11 Mar 2025 Fengyi Zhang, Huitong Yang, Zheng Zhang, Zi Huang, Yadan Luo

Self-supervised 3D occupancy prediction offers a promising solution for understanding complex driving scenes without requiring costly 3D annotations.

Muon is Scalable for LLM Training

2 code implementations24 Feb 2025 Jingyuan Liu, Jianlin Su, Xingcheng Yao, Zhejun Jiang, Guokun Lai, Yulun Du, Yidao Qin, Weixin Xu, Enzhe Lu, Junjie Yan, Yanru Chen, Huabin Zheng, Yibo Liu, Shaowei Liu, Bohong Yin, Weiran He, Han Zhu, Yuzhi Wang, Jianzhou Wang, Mengnan Dong, Zheng Zhang, Yongsheng Kang, Hao Zhang, Xinran Xu, Yutao Zhang, Yuxin Wu, Xinyu Zhou, Zhilin Yang

Recently, the Muon optimizer based on matrix orthogonalization has demonstrated strong results in training small-scale language models, but the scalability to larger models has not been proven.

Computational Efficiency

Connector-S: A Survey of Connectors in Multi-modal Large Language Models

no code implementations17 Feb 2025 Xun Zhu, Zheng Zhang, Xi Chen, Yiming Shi, Miao Li, Ji Wu

With the rapid advancements in multi-modal large language models (MLLMs), connectors play a pivotal role in bridging diverse modalities and enhancing model performance.

Mixture-of-Experts Survey

MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models

no code implementations17 Feb 2025 Zhen Zhang, Yifan Yang, Kai Zhen, Nathan Susanj, Athanasios Mouchtaris, Siegfried Kunzmann, Zheng Zhang

Large language models have demonstrated exceptional capabilities across diverse tasks, but their fine-tuning demands significant memory, posing challenges for resource-constrained environments.

Multi-Task Learning

CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation

1 code implementation16 Feb 2025 Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Zi Yang, Paul Hovland, Bogdan Nicolae, Franck Cappello, Zheng Zhang

Motivated by such observations, we propose CoLA and its memory-efficient implementation, CoLA-M, to replace these full-size layers with compute-efficient auto-encoders that naturally enforce low-rank activations throughout training.

CoLA
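To make the idea of replacing a full-size layer with a compute-efficient bottleneck auto-encoder concrete, here is a minimal sketch; the rank, dimensions, and activation are illustrative assumptions, not CoLA's actual architecture.

```python
import torch
import torch.nn as nn

class LowRankBottleneck(nn.Module):
    """Bottleneck auto-encoder standing in for a dense d_in x d_out layer;
    the rank-r bottleneck keeps activations low-rank (illustrative sketch)."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.encode = nn.Linear(d_in, rank, bias=False)    # down-projection
        self.act = nn.GELU()
        self.decode = nn.Linear(rank, d_out, bias=False)   # up-projection

    def forward(self, x):
        return self.decode(self.act(self.encode(x)))

# A 4096x4096 dense layer holds ~16.8M weights; a rank-256 bottleneck ~2.1M.
layer = LowRankBottleneck(4096, 4096, rank=256)
x = torch.randn(8, 4096)
print(layer(x).shape, sum(p.numel() for p in layer.parameters()))
```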

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

no code implementations25 Jan 2025 Weikang Meng, Yadan Luo, Xin Li, Dongmei Jiang, Zheng Zhang

Linear attention has emerged as a promising alternative to softmax-based attention, leveraging kernelized feature maps to reduce complexity from quadratic to linear in sequence length.
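The sketch below shows generic kernelized linear attention with an elu+1 feature map, which is what makes the complexity linear rather than quadratic in sequence length; it is the standard construction such work builds on, not PolaFormer's polarity-aware feature map.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention, O(N) in sequence length N.
    q, k, v: (batch, N, dim); feature map phi(x) = elu(x) + 1 (assumed)."""
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)           # sum_n phi(k_n) v_n^T
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(dim=1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)  # never forms the N x N matrix

q = k = v = torch.randn(2, 1024, 64)
out = linear_attention(q, k, v)                       # same shape as v
```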

CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs

no code implementations25 Jan 2025 Yuntong Hu, Zhihan Lei, Zhongjie Dai, Allen Zhang, Abhinav Angirekula, Zheng Zhang, Liang Zhao

In this paper, we introduce Contextualized Graph Retrieval-Augmented Generation (CG-RAG), a novel framework that integrates sparse and dense retrieval signals within graph structures to enhance retrieval efficiency and subsequently improve generation quality for research question answering.

Information Retrieval Question Answering +3

Kimi k1.5: Scaling Reinforcement Learning with LLMs

2 code implementations22 Jan 2025 Kimi Team, Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Fengxiang Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, Haotian Yao, Haotian Zhao, Haoyu Lu, Haoze Li, Haozhen Yu, Hongcheng Gao, Huabin Zheng, Huan Yuan, Jia Chen, Jianhang Guo, Jianlin Su, Jianzhou Wang, Jie Zhao, Jin Zhang, Jingyuan Liu, Junjie Yan, Junyan Wu, Lidong Shi, Ling Ye, Longhui Yu, Mengnan Dong, Neo Zhang, Ningchen Ma, Qiwei Pan, Qucheng Gong, Shaowei Liu, Shengling Ma, Shupeng Wei, Sihan Cao, Siying Huang, Tao Jiang, Weihao Gao, Weimin Xiong, Weiran He, Weixiao Huang, Wenhao Wu, Wenyang He, Xianghui Wei, Xianqing Jia, Xingzhe Wu, Xinran Xu, Xinxing Zu, Xinyu Zhou, Xuehai Pan, Y. Charles, Yang Li, Yangyang Hu, Yangyang Liu, Yanru Chen, Yejie Wang, Yibo Liu, Yidao Qin, Yifeng Liu, Ying Yang, Yiping Bao, Yulun Du, Yuxin Wu, Yuzhi Wang, Zaida Zhou, Zhaoji Wang, Zhaowei Li, Zhen Zhu, Zheng Zhang, Zhexu Wang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Ziyao Xu, Zonghan Yang

Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results -- e.g., 60.8 on AIME, 94.6 on MATH500, 47.3 on LiveCodeBench -- outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).

Math reinforcement-learning +2

Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems

1 code implementation17 Jan 2025 Weibo Gao, Qi Liu, Linan Yue, Fangzhou Yao, Rui Lv, Zheng Zhang, Hao Wang, Zhenya Huang

Personalized learning represents a promising educational strategy within intelligent educational systems, aiming to enhance learners' practice efficiency.

Response Generation

DVM: Towards Controllable LLM Agents in Social Deduction Games

no code implementations12 Jan 2025 Zheng Zhang, Yihuai Lan, Yangsen Chen, Lei Wang, Xiang Wang, Hao Wang

This control not only ensures that NPCs can adapt to varying difficulty levels during gameplay, but also provides insights into the safety and fairness of LLM agents.

Fairness

Ultra Memory-Efficient On-FPGA Training of Transformers via Tensor-Compressed Optimization

no code implementations11 Jan 2025 Jiayi Tian, Jinming Lu, Hai Li, Xiangwei Wang, Cong Hao, Ian Young, Zheng Zhang

Compared to uncompressed training on the NVIDIA RTX 3090 GPU, our on-FPGA training achieves a memory reduction of $30\times$ to $51\times$.

Domain Adaptation

A Semantic Knowledge Complementarity based Decoupling Framework for Semi-supervised Class-imbalanced Medical Image Segmentation

no code implementations CVPR 2025 Zheng Zhang, Guanchun Yin, Bo Zhang, Wu Liu, Xiuzhuang Zhou, Wendong Wang

We also design a semantic knowledge complementarity module that adopts labeled data to guide the generation of pseudo labels and enriches the semantic features of labeled data with unlabeled data, which improves the quality of generated pseudo labels and the robustness of the overall model.

Decoder Image Segmentation +5

Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models

1 code implementation19 Dec 2024 Zijun Chen, WenBo Hu, Guande He, Zhijie Deng, Zheng Zhang, Richang Hong

This paper investigates representative MLLMs, focusing on their calibration across various scenarios, including before and after visual fine-tuning, as well as before and after multimodal training of the base LLMs.

Autonomous Driving Image Captioning +2

PerSphere: A Comprehensive Framework for Multi-Faceted Perspective Retrieval and Summarization

1 code implementation17 Dec 2024 Yun Luo, Yingjie Li, Xiangkun Hu, Qinglin Qi, Fang Guo, Qipeng Guo, Zheng Zhang, Yue Zhang

As online platforms and recommendation algorithms evolve, people are increasingly trapped in echo chambers, leading to biased understandings of various issues.

Retrieval

Transferable Adversarial Face Attack with Text Controlled Attribute

2 code implementations16 Dec 2024 Wenyun Li, Zheng Zhang, Xiangyuan Lan, Dongmei Jiang

Extensive experiments on two high-resolution face recognition datasets validate that our TCA$^2$ method can generate natural text-guided adversarial impersonation faces with high transferability.

Attribute Face Recognition

Dense Cross-Connected Ensemble Convolutional Neural Networks for Enhanced Model Robustness

no code implementations9 Dec 2024 Longwei Wang, Xueqian Li, Zheng Zhang

The resilience of convolutional neural networks against input variations and adversarial attacks remains a significant challenge in image recognition tasks.

Ensemble Learning

SUPERMERGE: An Approach For Gradient-Based Model Merging

no code implementations9 Dec 2024 HaoYu Yang, Zheng Zhang, Saket Sathe

A straightforward solution requires fine-tuning the model again for both existing and new tasks, which is computationally expensive and time-consuming.

model

Beyond Idle Channels: Unlocking Idle Space with Signal Alignment in Massive MIMO Cognitive Radio Networks

no code implementations9 Dec 2024 Weidong Zhu, Xueqian Li, Longwei Wang, Zheng Zhang

We propose a comprehensive framework that synergizes spatial spectrum sensing, signal alignment, and resource allocation, specifically designed for secondary users in CRNs.

PoTable: Towards Systematic Thinking via Stage-oriented Plan-then-Execute Reasoning on Tables

no code implementations5 Dec 2024 Qingyang Mao, Qi Liu, Zhi Li, Mingyue Cheng, Zheng Zhang, Rui Li

In recent years, table reasoning has garnered substantial research interest, particularly its integration with Large Language Models (LLMs) which revolutionize natural language applications.

Code Generation Large Language Model

Optimizing Student Ability Assessment: A Hierarchy Constraint-Aware Cognitive Diagnosis Framework for Educational Contexts

no code implementations21 Nov 2024 Xinjie Sun, Qi Liu, Kai Zhang, Shuanghong Shen, Fei Wang, Yan Zhuang, Zheng Zhang, Weiyin Gong, Shijin Wang, Lina Yang, Xingying Huo

To address this, we propose the Hierarchy Constraint-Aware Cognitive Diagnosis Framework (HCD), designed to more accurately represent student ability performance within real educational contexts.

cognitive diagnosis Diagnostic +1

DRL-Based Optimization for AoI and Energy Consumption in C-V2X Enabled IoV

1 code implementation20 Nov 2024 Zheng Zhang, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

Therefore, this paper analyzes the effects of multi-priority queues and NOMA on AoI in the C-V2X vehicular communication system and proposes an energy consumption and AoI optimization method based on DRL.

Deep Reinforcement Learning Scheduling
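For readers new to Age of Information (AoI), the toy tracker below follows the textbook definition: the age grows by one slot per step and resets to the delivered packet's age on reception. It is only a generic illustration, not the paper's multi-priority NOMA queueing model.

```python
def age_of_information(deliveries, horizon):
    """deliveries: {slot: generation_slot} for slots where an update arrives.
    Returns the AoI trace over `horizon` discrete slots (generic definition)."""
    aoi, trace = 0, []
    for t in range(horizon):
        aoi = t - deliveries[t] if t in deliveries else aoi + 1
        trace.append(aoi)
    return trace

# A packet generated at slot 2 arrives at slot 5; one generated at 7 arrives at 8.
print(age_of_information({5: 2, 8: 7}, horizon=12))
```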

Coverage-Constrained Human-AI Cooperation with Multiple Experts

no code implementations18 Nov 2024 Zheng Zhang, Cuong Nguyen, Kevin Wells, Thanh-Toan Do, David Rosewarne, Gustavo Carneiro

However, a notable research gap remains in effectively exploring both L2D and L2C under diverse expert knowledge to improve decision-making, particularly when constrained by the cooperation cost required to achieve a target probability for AI-only selection (i.e., coverage).

Decision Making

Poor Man's Training on MCUs: A Memory-Efficient Quantized Back-Propagation-Free Approach

no code implementations7 Nov 2024 Yequan Zhao, Hai Li, Ian Young, Zheng Zhang

This paper presents a simple BP-free training scheme on an MCU, which makes edge training hardware design as easy as inference hardware design.

Dimensionality Reduction

Can Language Models Learn to Skip Steps?

1 code implementation4 Nov 2024 Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Cheng Jiayang, Yue Zhang, Xipeng Qiu, Zheng Zhang

In this work, we study the ability to skip steps in reasoning - a hallmark of human expertise developed through practice.

Collaborative Cognitive Diagnosis with Disentangled Representation Learning for Learner Modeling

1 code implementation4 Nov 2024 Weibo Gao, Qi Liu, Linan Yue, Fangzhou Yao, Hao Wang, Yin Gu, Zheng Zhang

Motivated by the success of collaborative modeling in various domains, such as recommender systems, we aim to investigate how collaborative signals among learners contribute to the diagnosis of human cognitive states (i.e., knowledge proficiency) in the context of intelligent education.

cognitive diagnosis Disentanglement +1

TAGExplainer: Narrating Graph Explanations for Text-Attributed Graph Learning Models

no code implementations20 Oct 2024 Bo Pan, Zhen Xiong, Guanchen Wu, Zheng Zhang, Yifei Zhang, Liang Zhao

Despite advancements in TAG learning methodologies, challenges remain in explainability due to the black-box nature of existing TAG representation learning models.

Decision Making Graph Learning +6

VideoSAM: Open-World Video Segmentation

no code implementations11 Oct 2024 Pinxue Guo, Zixu Zhao, Jianxiong Gao, Chongruo wu, Tong He, Zheng Zhang, Tianjun Xiao, Wenqiang Zhang

Video segmentation is essential for advancing robotics and autonomous driving, particularly in open-world settings where continuous perception and object association across video frames are critical.

Autonomous Driving Decoder +7

ECon: On the Detection and Resolution of Evidence Conflicts

1 code implementation5 Oct 2024 Cheng Jiayang, Chunkit Chan, Qianqian Zhuang, Lin Qiu, Tianhang Zhang, Tengxiao Liu, Yangqiu Song, Yue Zhang, PengFei Liu, Zheng Zhang

The rise of large language models (LLMs) has significantly influenced the quality of information in decision-making systems, leading to the prevalence of AI-generated content and challenges in detecting misinformation and managing conflicting information, or "inter-evidence conflicts."

Decision Making Misinformation +1

BadCM: Invisible Backdoor Attack Against Cross-Modal Learning

1 code implementation3 Oct 2024 Zheng Zhang, Xu Yuan, Lei Zhu, Jingkuan Song, Liqiang Nie

In this paper, we introduce a novel bilateral backdoor to fill in the missing pieces of the puzzle in the cross-modal backdoor and propose a generalized invisible backdoor framework against cross-modal learning (BadCM).

Backdoor Attack Cross-Modal Retrieval +1

HiReview: Hierarchical Taxonomy-Driven Automatic Literature Review Generation

no code implementations2 Oct 2024 Yuntong Hu, Zhuofeng Li, Zheng Zhang, Chen Ling, Raasikh Kanjiani, Boxin Zhao, Liang Zhao

In this work, we present HiReview, a novel framework for hierarchical taxonomy-driven automatic literature review generation.

Clustering Review Generation

Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging

no code implementations1 Oct 2024 Yiming Ju, Ziyi Ni, Xingrun Xing, Zhixiong Zeng, Hanyu Zhao, Siqi Fan, Zheng Zhang

Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks.

One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

1 code implementation29 Sep 2024 Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou

We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos.

All Image Segmentation +12

VLEU: a Method for Automatic Evaluation for Generalizability of Text-to-Image Models

1 code implementation23 Sep 2024 Jingtao Cao, Zheng Zhang, Hongru Wang, Kam-Fai Wong

Progress in Text-to-Image (T2I) models has significantly improved the generation of images from textual descriptions.

Image Generation

Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation

1 code implementation7 Sep 2024 Jiaxin Cheng, Zixu Zhao, Tong He, Tianjun Xiao, Yicong Zhou, Zheng Zhang

Recent advancements in generative models have significantly enhanced their capacity for image generation, enabling a wide range of applications such as image editing, completion and video editing.

Layout-to-Image Generation Video Editing

Representation Learning of Geometric Trees

no code implementations16 Aug 2024 Zheng Zhang, Allen Zhang, Ruth Nelson, Giorgio Ascoli, Liang Zhao

Geometric trees are characterized by their tree-structured layout and spatially constrained nodes and edges, which significantly impacts their topological attributes.

Representation Learning Self-Supervised Learning

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

1 code implementation15 Aug 2024 Dongyu Ru, Lin Qiu, Xiangkun Hu, Tianhang Zhang, Peng Shi, Shuaichen Chang, Cheng Jiayang, Cunxiang Wang, Shichao Sun, Huanyu Li, Zizhao Zhang, Binjie Wang, Jiarong Jiang, Tong He, Zhiguo Wang, PengFei Liu, Yue Zhang, Zheng Zhang

Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements.

Diagnostic RAG +2

Unified Lexical Representation for Interpretable Visual-Language Alignment

1 code implementation25 Jul 2024 YiFan Li, Yikai Wang, Yanwei Fu, Dongyu Ru, Zheng Zhang, Tong He

On the other hand, lexical representation, a vector whose element represents the similarity between the sample and a word from the vocabulary, is a natural sparse representation and interpretable, providing exact matches for individual words.

Cross-Modal Retrieval Language Modelling
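A toy version of such a lexical representation: score one sample embedding against every vocabulary word embedding and keep only the top-k entries, which yields a sparse, word-indexed vector. The vocabulary size, embedding dimension, and top-k sparsification are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn.functional as F

def lexical_representation(sample_emb, vocab_embs, top_k=8):
    """Similarity of one sample to every vocabulary word, sparsified to top-k
    entries so each non-zero weight points at an individual word (toy sketch)."""
    sims = F.cosine_similarity(sample_emb.unsqueeze(0), vocab_embs, dim=-1)
    lexical = torch.zeros_like(sims)
    top_vals, top_idx = sims.topk(top_k)
    lexical[top_idx] = top_vals
    return lexical

vocab_embs = torch.randn(30522, 512)   # assumed: one embedding per vocab word
sample_emb = torch.randn(512)          # assumed: image or text embedding
lex = lexical_representation(sample_emb, vocab_embs)
print((lex != 0).sum().item(), "non-zero entries out of", lex.numel())
```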

Separable Operator Networks

1 code implementation15 Jul 2024 Xinling Yu, Sean Hooten, Ziyue Liu, Yequan Zhao, Marco Fiorentino, Thomas Van Vaerenbergh, Zheng Zhang

We provide a universal approximation theorem for SepONet proving the existence of a separable approximation to any nonlinear continuous operator.

Benchmarking Operator learning

Joint Optimization of Age of Information and Energy Consumption in NR-V2X System based on Deep Reinforcement Learning

1 code implementation11 Jul 2024 Shulin Song, Zheng Zhang, Qiong Wu, Qiang Fan, Pingyi Fan

To address this, 3GPP has developed Vehicle-to-Everything (V2X) specifications based on 5G New Radio (NR) technology, where Mode 2 Side-Link (SL) communication resembles Mode 4 in LTE-V2X, allowing direct communication between vehicles.

Autonomous Driving Deep Reinforcement Learning

Learning to Complement and to Defer to Multiple Users

1 code implementation9 Jul 2024 Zheng Zhang, Wenjie Ai, Kevin Wells, David Rosewarne, Thanh-Toan Do, Gustavo Carneiro

This process has three options: 1) AI autonomously classifies, 2) learning to complement, where AI collaborates with users, and 3) learning to defer, where AI defers to users.

Decision Making

A Survey of Models for Cognitive Diagnosis: New Developments and Future Directions

no code implementations7 Jul 2024 Fei Wang, Weibo Gao, Qi Liu, Jiatong Li, Guanhao Zhao, Zheng Zhang, Zhenya Huang, Mengxiao Zhu, Shijin Wang, Wei Tong, Enhong Chen

Cognitive diagnosis has been developed for decades as an effective measurement tool to evaluate human cognitive status such as ability level and knowledge mastery.

cognitive diagnosis parameter estimation

AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning

1 code implementation26 Jun 2024 Yifan Yang, Kai Zhen, Ershad Banijamal, Athanasios Mouchtaris, Zheng Zhang

In this paper, we propose the Adaptive Zeroth-order Tensor-Train Adaption (AdaZeta) framework, specifically designed to improve the performance and convergence of the ZO methods.

Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning

1 code implementation21 Jun 2024 Suyi Li, Chenyi Jiang, Shidong Wang, Yang Long, Zheng Zhang, Haofeng Zhang

Inspired by the success of vanilla adversarial learning in Cross-Domain Few-Shot Learning, we take a step further and devise a model-agnostic and Primitive-Based Adversarial training (PBadv) method to deal with this problem.

Attribute Compositional Zero-Shot Learning +2

Investigating Annotator Bias in Large Language Models for Hate Speech Detection

3 code implementations17 Jun 2024 Amit Das, Zheng Zhang, Najib Hasan, Souvika Sarkar, Fatemeh Jamshidi, Tathagata Bhattacharya, Mostafa Rahgouy, Nilanjana Raychawdhary, Dongji Feng, Vinija Jain, Aman Chadha, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals

This paper serves as a crucial resource, guiding researchers and practitioners in harnessing the potential of LLMs for data annotation, thereby fostering advancements in this critical field.

Descriptive Hate Speech Detection

TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs

1 code implementation14 Jun 2024 Zhuofeng Li, Zixing Gou, Xiangnan Zhang, Zhongyuan Liu, Sirui Li, Yuntong Hu, Chen Ling, Zheng Zhang, Liang Zhao

To address this gap, we introduce Textual-Edge Graphs Datasets and Benchmark (TEG-DB), a comprehensive and diverse collection of benchmark textual-edge datasets featuring rich textual descriptions on nodes and edges.

TAG

OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

no code implementations14 Jun 2024 Jingtao Cao, Zheng Zhang, Hongru Wang, Bin Liang, Hao Wang, Kam-Fai Wong

Utilizing the BLIP model for image captioning, PP-OCR and TrOCR for text recognition across multiple languages, and the Qwen LLM for nuanced language understanding, our system is capable of identifying harmful content in memes created in English, Chinese, Malay, and Tamil.

Image Captioning Language Modeling +4

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

1 code implementation CVPR 2024 Ke Fan, Zechen Bai, Tianjun Xiao, Tong He, Max Horn, Yanwei Fu, Francesco Locatello, Zheng Zhang

Moreover, our analysis substantiates that our method exhibits the capability to dynamically adapt the slot number according to each instance's complexity, offering the potential for further exploration in slot attention research.

Decoder Object +1

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

no code implementations13 Jun 2024 Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, Ji Li, Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo

In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system.

Retrieval

Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets

1 code implementation12 Jun 2024 Duanyu Feng, Bowen Qin, Chen Huang, Youcheng Huang, Zheng Zhang, Wenqiang Lei

By leveraging this safety direction, Legend can then leverage the semantic distances of paired responses along this direction to annotate margins automatically.
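A minimal sketch of the margin idea described above: project the difference between a paired chosen/rejected response (as embeddings) onto a unit safety direction and use the signed distance as the margin. The embedding dimension and the direction itself are placeholders; how Legend actually finds the safety direction is not shown here.

```python
import numpy as np

def annotate_margin(chosen_emb, rejected_emb, safety_direction):
    """Signed distance between paired responses along the (unit-normalized)
    safety direction, used as an automatic margin label (illustrative)."""
    d = safety_direction / np.linalg.norm(safety_direction)
    return float(np.dot(chosen_emb - rejected_emb, d))

rng = np.random.default_rng(0)
direction = rng.normal(size=768)              # placeholder safety direction
chosen, rejected = rng.normal(size=768), rng.normal(size=768)
print("margin:", annotate_margin(chosen, rejected, direction))
```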

Benchmarking Neural Decoding Backbones towards Enhanced On-edge iBCI Applications

no code implementations8 Jun 2024 Zhou Zhou, Guohang He, Zheng Zhang, Luziwei Leng, Qinghai Guo, Jianxing Liao, Xuan Song, Ran Cheng

We executed a series of neural decoding experiments involving nonhuman primates engaged in random reaching tasks, evaluating four prospective models, Gated Recurrent Unit (GRU), Transformer, Receptance Weighted Key Value (RWKV), and Selective State Space model (Mamba), across several metrics: single-session decoding, multi-session decoding, new session fine-tuning, inference speed, calibration speed, and scalability.

Benchmarking Mamba

Principles of Designing Robust Remote Face Anti-Spoofing Systems

no code implementations6 Jun 2024 Xiang Xu, Tianchen Zhao, Zheng Zhang, Zhihua Li, Jon Wu, Alessandro Achille, Mani Srivastava

Protecting digital identities of human face from various attack vectors is paramount, and face anti-spoofing plays a crucial role in this endeavor.

Face Anti-Spoofing Face Swapping

Synergetic Event Understanding: A Collaborative Approach to Cross-Document Event Coreference Resolution with Large Language Models

1 code implementation4 Jun 2024 Qingkai Min, Qipeng Guo, Xiangkun Hu, Songfang Huang, Zheng Zhang, Yue Zhang

Experimental results demonstrate that our approach surpasses the performance of both the large and small language models individually, forming a complementary advantage.

coreference-resolution Diversity +1

Dishonesty in Helpful and Harmless Alignment

no code implementations4 Jun 2024 Youcheng Huang, Jingkun Tang, Duanyu Feng, Zheng Zhang, Wenqiang Lei, Jiancheng Lv, Anthony G. Cohn

We find that this also induces dishonesty in helpful and harmless alignment where LLMs tell lies in generating harmless responses.

Xwin-LM: Strong and Scalable Alignment Practice for LLMs

1 code implementation30 May 2024 Bolin Ni, Jingcheng Hu, Yixuan Wei, Houwen Peng, Zheng Zhang, Gaofeng Meng, Han Hu

In this work, we present Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs).

TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations

no code implementations27 May 2024 Zheng Zhang, Yuntong Hu, Bo Pan, Chen Ling, Liang Zhao

Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions, enabling detailed representation of data and their relationships across a broad spectrum of real-world scenarios.

Representation Learning Self-Supervised Learning +1

GRAG: Graph Retrieval-Augmented Generation

1 code implementation26 May 2024 Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao

Naive Retrieval-Augmented Generation (RAG) focuses on individual documents during retrieval and, as a result, falls short in handling networked documents which are very popular in many applications such as citation graphs, social media, and knowledge graphs.

Entity Retrieval Knowledge Graphs +3

CoMERA: Computing- and Memory-Efficient Training via Rank-Adaptive Tensor Optimization

1 code implementation23 May 2024 Zi Yang, Ziyue Liu, Samridhi Choudhary, Xinfeng Xie, Cao Gao, Siegfried Kunzmann, Zheng Zhang

Our method also shows a $\sim 2\times$ speedup over standard pre-training on a BERT-like code-generation LLM while achieving a $4.23\times$ compression ratio in pre-training.

Code Generation Recommendation Systems

Hallucination of Multimodal Large Language Models: A Survey

1 code implementation29 Apr 2024 Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou

By drawing the granular classification and landscapes of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field.

Hallucination Survey

4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs

1 code implementation28 Apr 2024 Minjie Wang, Quan Gan, David Wipf, Zhenkun Cai, Ning li, Jianheng Tang, Yanlin Zhang, Zizhao Zhang, Zunyao Mao, Yakun Song, Yanbo Wang, Jiahang Li, Han Zhang, Guang Yang, Xiao Qin, Chuan Lei, Muhan Zhang, Weinan Zhang, Christos Faloutsos, Zheng Zhang

Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing.

Benchmarking

DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance

no code implementations23 Apr 2024 Linxuan Xin, Zheng Zhang, Jinfu Wei, Wei Gao, Duan Gao

Prior material creation methods had limitations in producing diverse results mainly because reconstruction-based methods relied on real-world measurements and generation-based methods were trained on relatively small material datasets.

Decoder

Towards Analyzing and Understanding the Limitations of DPO: A Theoretical Perspective

no code implementations6 Apr 2024 Duanyu Feng, Bowen Qin, Chen Huang, Zheng Zhang, Wenqiang Lei

Direct Preference Optimization (DPO), which derives reward signals directly from pairwise preference data, has shown its effectiveness on aligning Large Language Models (LLMs) with human preferences.

PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

1 code implementation31 Mar 2024 Zhuotong Chen, Zihu Wang, Yifan Yang, Qianxiao Li, Zheng Zhang

This approach reduces the computational cost to that of using just the P controller, instead of the full PID control.
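For context on the P-versus-PID cost comparison, here is a generic discrete PID controller; setting ki = kd = 0 reduces it to the cheaper proportional-only controller mentioned above. This is the textbook controller, not the paper's embedding-space self-healing formulation.

```python
class PID:
    """Generic discrete PID controller (illustrative, not the paper's method)."""
    def __init__(self, kp, ki=0.0, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev_error = 0.0, None

    def step(self, error, dt=1.0):
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

p_only = PID(kp=0.5)                      # P-only: a single gain multiply per step
full_pid = PID(kp=0.5, ki=0.1, kd=0.05)   # adds integral and derivative terms
print(p_only.step(1.0), full_pid.step(1.0))
```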

EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs

1 code implementation30 Mar 2024 Cheng Jiayang, Lin Qiu, Chunkit Chan, Xin Liu, Yangqiu Song, Zheng Zhang

In this work, we propose an initial comprehensive framework called EventGround, which aims to tackle the problem of grounding free-texts to eventuality-centric KGs for contextualized narrative reasoning.

Graph Neural Network Knowledge Graphs +3

CDIMC-net: Cognitive Deep Incomplete Multi-view Clustering Network

no code implementations28 Mar 2024 Jie Wen, Zheng Zhang, Yong Xu, Bob Zhang, Lunke Fei, Guo-Sen Xie

In this paper, we propose a novel incomplete multi-view clustering network, called Cognitive Deep Incomplete Multi-view Clustering Network (CDIMC-net), to address these issues.

Clustering Graph Embedding +1

Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting

no code implementations22 Mar 2024 Zheng Zhang, WenBo Hu, Yixing Lao, Tong He, Hengshuang Zhao

3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance.

3DGS NeRF +1

PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model

1 code implementation21 Mar 2024 Zheng Zhang, Yeyao Ma, Enming Zhang, Xiang Bai

PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges.

Ranked #2 on Referring Expression Segmentation on RefCoCo val (using extra training data)

Decoder Generalized Referring Expression Segmentation +6

NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

1 code implementation18 Mar 2024 Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Xiangkun Hu, Zheng Zhang, Qian Wang, Yue Zhang

The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in natural language processing, particularly in understanding and processing long-context information.

Benchmarking Question Answering

Common 7B Language Models Already Possess Strong Math Capabilities

2 code implementations7 Mar 2024 Chen Li, Weiqi Wang, Jingcheng Hu, Yixuan Wei, Nanning Zheng, Han Hu, Zheng Zhang, Houwen Peng

This paper shows that the LLaMA-2 7B model with common pre-training already exhibits strong mathematical abilities, as evidenced by its impressive accuracy of 97.7% and 72.0% on the GSM8K and MATH benchmarks, respectively, when selecting the best response from 256 random generations.

GSM8K Math

OffensiveLang: A Community Based Implicit Offensive Language Dataset

1 code implementation4 Mar 2024 Amit Das, Mostafa Rahgouy, Dongji Feng, Zheng Zhang, Tathagata Bhattacharya, Nilanjana Raychawdhary, Fatemeh Jamshidi, Vinija Jain, Aman Chadha, Mary Sandage, Lauramarie Pope, Gerry Dozier, Cheryl Seals

Firstly, the existing datasets primarily rely on the collection of texts containing explicit offensive keywords, making it challenging to capture implicitly offensive contents that are devoid of these keywords.

Language Modelling Large Language Model +1

Distilling Large Language Models for Text-Attributed Graph Learning

no code implementations19 Feb 2024 Bo Pan, Zheng Zhang, Yifei Zhang, Yuntong Hu, Liang Zhao

To address the inherent gaps between LLMs (generative models for texts) and graph models (discriminative models for graphs), we propose first to let LLMs teach an interpreter with rich textual rationale and then let a student model mimic the interpreter's reasoning without LLMs' textual rationale.

Graph Learning TAG

LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models

1 code implementation18 Feb 2024 Yifan Yang, Jiajun Zhou, Ngai Wong, Zheng Zhang

Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance.

Multi-Task Learning parameter-efficient fine-tuning

Multi-Agent Generative Adversarial Interactive Self-Imitation Learning for AUV Formation Control and Obstacle Avoidance

no code implementations21 Jan 2024 Zheng Fang, Tianhao Chen, Dong Jiang, Zheng Zhang, Guangliang Li

Multi-agent generative adversarial imitation learning (MAGAIL) allows multi-AUV to learn from expert demonstration instead of pre-defined reward functions, but suffers from the deficiency of requiring optimal demonstrations and not surpassing provided expert demonstrations.

Imitation Learning Multi-agent Reinforcement Learning

See the Unseen: Better Context-Consistent Knowledge-Editing by Noises

no code implementations15 Jan 2024 Youcheng Huang, Wenqiang Lei, Zheng Zhang, Jiancheng Lv, Shuicheng Yan

In this paper, we empirically find that the effects of different contexts upon LLMs in recalling the same knowledge follow a Gaussian-like distribution.

knowledge editing

Real-Time FJ/MAC PDE Solvers via Tensorized, Back-Propagation-Free Optical PINN Training

no code implementations31 Dec 2023 Yequan Zhao, Xian Xiao, Xinling Yu, Ziyue Liu, Zhixiong Chen, Geza Kurczveil, Raymond G. Beausoleil, Zheng Zhang

Despite the ultra-high speed of optical neural networks, training a PINN on an optical chip is hard due to (1) the large size of photonic devices, and (2) the lack of scalable optical memory devices to store the intermediate results of back-propagation (BP).

Generate E-commerce Product Background by Integrating Category Commonality and Personalized Style

1 code implementation20 Dec 2023 Haohan Wang, Wei Feng, Yaoyu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao

Furthermore, for products with specific and fine-grained requirements in layout, elements, etc., a Personality-Wise Generator is devised to learn such personalized style directly from a reference image to resolve textual ambiguities, and is trained in a self-supervised manner for more efficient training data usage.

2k

Non-Euclidean Spatial Graph Neural Network

1 code implementation17 Dec 2023 Zheng Zhang, Sirui Li, Jingcheng Zhou, Junxiang Wang, Abhinav Angirekula, Allen Zhang, Liang Zhao

Besides, existing spatial network representation learning methods can only consider networks embedded in Euclidean space, and can not well exploit the rich geometric information carried by irregular and non-uniform non-Euclidean space.

Graph Neural Network Representation Learning

Planning and Rendering: Towards Product Poster Generation with Diffusion Models

no code implementations14 Dec 2023 Zhaochen Li, Fengheng Li, Wei Feng, Honghe Zhu, Yaoyu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, Zhenglu Yang

At the planning stage, we propose a PlanNet to generate the layout of the product and other visual components considering both the appearance features of the product and semantic features of the text, which improves the diversity and rationality of the layouts.

Diversity Image Inpainting +1

Segment and Caption Anything

1 code implementation CVPR 2024 Xiaoke Huang, JianFeng Wang, Yansong Tang, Zheng Zhang, Han Hu, Jiwen Lu, Lijuan Wang, Zicheng Liu

We propose a method to efficiently equip the Segment Anything Model (SAM) with the ability to generate regional captions.

Caption Generation object-detection +2

RTQ: Rethinking Video-language Understanding Based on Image-text Model

2 code implementations1 Dec 2023 Xiao Wang, Yaoyu Li, Tian Gan, Zheng Zhang, Jingjing Lv, Liqiang Nie

Recent advancements in video-language understanding have been established on the foundation of image-text models, resulting in promising outcomes due to the shared knowledge between images and videos.

Ranked #9 on Video Captioning on MSR-VTT (using extra training data)

Video Captioning Video Question Answering +1

Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus

2 code implementations22 Nov 2023 Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, Luoyi Fu

Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields.

Hallucination Retrieval

Learning to Complement with Multiple Humans

no code implementations22 Nov 2023 Zheng Zhang, Cuong Nguyen, Kevin Wells, Thanh-Toan Do, Gustavo Carneiro

The ill-posedness of the LNL task requires the adoption of strong assumptions or the use of multiple noisy labels per training image, resulting in accurate models that work well in isolation but fail to optimise human-AI collaborative classification (HAI-CC).

image-classification Learning with noisy labels

A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest

no code implementations17 Nov 2023 Ruohong Zhang, Luyu Gao, Chen Zheng, Zhen Fan, Guokun Lai, Zheng Zhang, Fangzhou Ai, Yiming Yang, Hongxia Yang

This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources, and the adaptive training of a chatbot with domain-specific inquiries.

Chatbot Text Generation

Asymptotically Fair Participation in Machine Learning Models: an Optimal Control Perspective

no code implementations16 Nov 2023 Zhuotong Chen, Qianxiao Li, Zheng Zhang

Moreover, we design a surrogate retention system based on existing literature on evolutionary population dynamics to approximate the dynamics of distribution shifts on active user counts, from which the objective of achieving asymptotically fair participation is formulated as an optimal control problem, and the control variables are considered as the model parameters.

Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts

1 code implementation23 Oct 2023 Tengxiao Liu, Qipeng Guo, Yuqing Yang, Xiangkun Hu, Yue Zhang, Xipeng Qiu, Zheng Zhang

As large language models (LLMs) have shown effectiveness with different prompting methods, such as Chain of Thought, Program of Thought, we find that these methods have formed a great complementarity to each other on math reasoning tasks.

Logical Reasoning Math

Semantic-Aware Adversarial Training for Reliable Deep Hashing Retrieval

1 code implementation IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY 2023 Xu Yuan, Zheng Zhang, Xunguang Wang, Lin Wu

Further, we, for the first time, formulate the formalized adversarial training of deep hashing into a unified minimax optimization under the guidance of the generated mainstay codes.

Adversarial Attack Adversarial Robustness +2

Transferable Deep Clustering Model

no code implementations7 Oct 2023 Zheng Zhang, Liang Zhao

Deep learning has shown remarkable success in the field of clustering recently.

Clustering Deep Clustering +2

Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data

no code implementations7 Oct 2023 Yuntong Hu, Zheng Zhang, Liang Zhao

Large language models (LLMs) have achieved impressive performance on many natural language processing tasks.

Benchmarking

Large Language Models for Spatial Trajectory Patterns Mining

no code implementations7 Oct 2023 Zheng Zhang, Hossein Amiri, Zhenke Liu, Andreas Züfle, Liang Zhao

Identifying anomalous human spatial trajectory patterns can indicate dynamic changes in mobility behavior with applications in domains like infectious disease monitoring and elderly care.

Anomaly Detection

Balancing Specialized and General Skills in LLMs: The Impact of Modern Tuning and Data Strategy

no code implementations7 Oct 2023 Zheng Zhang, Chen Zheng, Da Tang, Ke Sun, Yukun Ma, Yingtong Bu, Xun Zhou, Liang Zhao

This paper introduces a multifaceted methodology for fine-tuning and evaluating large language models (LLMs) for specialized monetization tasks.

DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

1 code implementation3 Oct 2023 Aochuan Chen, Yimeng Zhang, Jinghan Jia, James Diffenderfer, Jiancheng Liu, Konstantinos Parasyris, Yihua Zhang, Zheng Zhang, Bhavya Kailkhura, Sijia Liu

Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time.

Adversarial Defense Computational Efficiency +1

Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation

1 code implementation ICCV 2023 Ke Fan, Jingshi Lei, Xuelin Qian, Miaopeng Yu, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

Furthermore, we propose a multi-view fusion layer based temporal module which is equipped with a set of object slots and interacts with features from different views by attention mechanism to fulfill sufficient object representation completion.

Object Video Segmentation +1

Partition-A-Medical-Image: Extracting Multiple Representative Sub-regions for Few-shot Medical Image Segmentation

1 code implementation20 Sep 2023 Yazhou Zhu, Shidong Wang, Tong Xin, Zheng Zhang, Haofeng Zhang

In this work, we present an approach to extract multiple representative sub-regions from a given support medical image, enabling fine-grained selection over the generated image regions.

Image Segmentation Medical Image Segmentation +1

Unsupervised Open-Vocabulary Object Localization in Videos

1 code implementation ICCV 2023 Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He

In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization.

Object Object Localization +1

FLM-101B: An Open LLM and How to Train It with $100K Budget

no code implementations7 Sep 2023 Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang

Large language models (LLMs) are considered important approaches towards foundational machine intelligence, achieving remarkable success in Natural Language Processing and multimodal tasks, among others.

Memorization

Object-Centric Multiple Object Tracking

1 code implementation ICCV 2023 Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.

Multiple Object Tracking Object +3

Coarse-to-Fine Amodal Segmentation with Shape Prior

1 code implementation ICCV 2023 Jianxiong Gao, Xuelin Qian, Yikai Wang, Tianjun Xiao, Tong He, Zheng Zhang, Yanwei Fu

To address this issue, we propose a convolution refine module to inject fine-grained information and provide a more precise amodal object segmentation based on visual features and coarse-predicted segmentation.

Object Segmentation +1

Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions

1 code implementation28 Aug 2023 Tianshi Wang, Fengling Li, Lei Zhu, Jingjing Li, Zheng Zhang, Heng Tao Shen

With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval methods struggle to meet the needs of users seeking access to data across various modalities.

Cross-Modal Retrieval Retrieval

Tensor-Compressed Back-Propagation-Free Training for (Physics-Informed) Neural Networks

no code implementations18 Aug 2023 Yequan Zhao, Xinling Yu, Zhixiong Chen, Ziyue Liu, Sijia Liu, Zheng Zhang

Backward propagation (BP) is widely used to compute the gradients in neural network training.

DETR Doesn't Need Multi-Scale or Locality Design

1 code implementation3 Aug 2023 Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.

Decoder
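
The abstract describes a "plain" DETR decoder that uses a single-scale feature map and global cross-attention, without multi-scale or locality-specific designs. The sketch below shows one such decoder layer built from standard PyTorch attention; the dimensions and layer structure are illustrative and not the paper's implementation.

import torch
import torch.nn as nn

class PlainDecoderLayer(nn.Module):
    """Object queries attend globally to a single-scale, flattened feature map."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, queries, feat_map):
        # queries: (B, num_queries, dim); feat_map: (B, H*W, dim) single-scale features
        q = self.norm1(queries + self.self_attn(queries, queries, queries)[0])
        q = self.norm2(q + self.cross_attn(q, feat_map, feat_map)[0])  # global cross-attention
        return self.norm3(q + self.ffn(q))

layer = PlainDecoderLayer()
out = layer(torch.randn(2, 100, 256), torch.randn(2, 32 * 32, 256))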

KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection

no code implementations ICCV 2023 Yadan Luo, Zhuoxiao Chen, Zhen Fang, Zheng Zhang, Zi Huang, Mahsa Baktashmotlagh

Achieving a reliable LiDAR-based object detector in autonomous driving is paramount, but its success hinges on obtaining large amounts of precise 3D annotations.

3D Object Detection Active Learning +4

Partial Vessels Annotation-based Coronary Artery Segmentation with Self-training and Prototype Learning

1 code implementation10 Jul 2023 Zheng Zhang, XiaoLei Zhang, Yaolei Qi, Guanyu Yang

To this end, we propose partial vessels annotation (PVA) based on the challenges of coronary artery segmentation and clinical diagnostic characteristics.

Coronary Artery Segmentation Diagnostic +2

A review of dynamics design methods for high-speed and high-precision CNC machine tool feed systems

no code implementations7 Jul 2023 Xuesong Wang, Dongsheng Zhang, Zheng Zhang

With the development of CNC machine tools toward high speed and high precision, the traditional static design methods can hardly meet the demand.

Distributed Marker Representation for Ambiguous Discourse Markers and Entangled Relations

no code implementations19 Jun 2023 Dongyu Ru, Lin Qiu, Xipeng Qiu, Yue Zhang, Zheng Zhang

Discourse analysis is an important task because it models intrinsic semantic structures between sentences in a document.

Sentence

A Gradient-based Approach for Online Robust Deep Neural Network Training with Noisy Labels

no code implementations8 Jun 2023 Yifan Yang, Alec Koppel, Zheng Zhang

In this paper, we propose a novel gradient-based approach to enable the detection of noisy labels for the online learning of model parameters, named Online Gradient-based Robust Selection (OGRS).

Learning with noisy labels

Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning

1 code implementation6 Jun 2023 Chujie Zheng, Pei Ke, Zheng Zhang, Minlie Huang

It has always been an important yet challenging problem to control language models to avoid generating texts with undesirable attributes, such as toxic language and unnatural repetition.

Contrastive Learning Text Generation
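
Click controls generation by contrasting the sequence likelihoods of desirable and undesirable continuations. The snippet below is a simplified margin-based likelihood contrast in that spirit; it is my sketch, not the paper's exact objective, and the shapes and names are assumptions.

import torch
import torch.nn.functional as F

def sequence_log_likelihood(logits, labels, pad_id=0):
    """Sum of token log-probabilities per sequence, ignoring padding."""
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    mask = (labels != pad_id).float()
    return (token_logp * mask).sum(dim=-1)

def likelihood_contrast_loss(pos_logits, pos_labels, neg_logits, neg_labels, margin=1.0):
    """Push likelihoods of desired sequences above undesired ones by a margin."""
    pos_ll = sequence_log_likelihood(pos_logits, pos_labels)
    neg_ll = sequence_log_likelihood(neg_logits, neg_labels)
    return F.relu(margin - (pos_ll - neg_ll)).mean()

# Illustrative shapes: batch 4, length 16, vocabulary 32k.
loss = likelihood_contrast_loss(
    torch.randn(4, 16, 32000), torch.randint(1, 32000, (4, 16)),
    torch.randn(4, 16, 32000), torch.randint(1, 32000, (4, 16)))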

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

no code implementations1 Jun 2023 Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang

To improve the convergence, a layer-by-layer distillation is applied to distill a quantized and tensor-compressed student model from a pre-trained transformer.

Natural Language Understanding Quantization
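
The entry mentions layer-by-layer distillation of a quantized, tensor-compressed student from a pre-trained transformer. Below is a generic per-layer feature-matching distillation loss to illustrate the idea; the matching scheme, weighting, and compression details are assumptions, not the paper's recipe.

import torch
import torch.nn.functional as F

def layerwise_distillation_loss(student_feats, teacher_feats):
    """MSE between matched intermediate layer outputs of student and teacher.

    student_feats / teacher_feats: lists of tensors, one per matched layer,
    each of shape (batch, seq_len, hidden).
    """
    return sum(F.mse_loss(s, t.detach()) for s, t in zip(student_feats, teacher_feats))

# Illustrative usage with two matched layers.
teacher = [torch.randn(8, 128, 768) for _ in range(2)]
student = [t + 0.1 * torch.randn_like(t) for t in teacher]
loss = layerwise_distillation_loss(student, teacher)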

An AMR-based Link Prediction Approach for Document-level Event Argument Extraction

1 code implementation30 May 2023 Yuqing Yang, Qipeng Guo, Xiangkun Hu, Yue Zhang, Xipeng Qiu, Zheng Zhang

Motivated by the fact that all event structures can be inferred from AMR, this work reformulates EAE as a link prediction problem on AMR graphs.

Abstract Meaning Representation Event Argument Extraction +2
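
Reformulating event argument extraction as link prediction means scoring candidate links between the trigger node and argument nodes of the AMR graph and assigning them role labels. The bilinear link/role scorer below is a hedged sketch over node embeddings; the graph encoder, names, and dimensions are illustrative and differ from the paper's architecture.

import torch
import torch.nn as nn

class EdgeRoleScorer(nn.Module):
    """Score a candidate (trigger, node) link and classify its argument role."""
    def __init__(self, dim=256, num_roles=10):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, num_roles)

    def forward(self, trigger_emb, node_emb):
        # trigger_emb, node_emb: (B, dim) node embeddings from some AMR graph encoder.
        return self.bilinear(trigger_emb, node_emb)   # (B, num_roles) role logits

scorer = EdgeRoleScorer()
role_logits = scorer(torch.randn(4, 256), torch.randn(4, 256))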

Exploiting Abstract Meaning Representation for Open-Domain Question Answering

1 code implementation26 May 2023 Cunxiang Wang, Zhikun Xu, Qipeng Guo, Xiangkun Hu, Xuefeng Bai, Zheng Zhang, Yue Zhang

The Open-Domain Question Answering (ODQA) task involves retrieving and subsequently generating answers from fine-grained relevant passages within a database.

Abstract Meaning Representation Diversity +4

Evaluating Open-QA Evaluation

1 code implementation NeurIPS 2023 Cunxiang Wang, Sirui Cheng, Qipeng Guo, Yuanhao Yue, Bowen Ding, Zhikun Xu, Yidong Wang, Xiangkun Hu, Zheng Zhang, Yue Zhang

This study focuses on the evaluation of the Open Question Answering (Open-QA) task, which can directly estimate the factuality of large language models (LLMs).

Question Answering

Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations

1 code implementation12 May 2023 Yuan Tian, Zheng Zhang, Zheng Ning, Toby Jia-Jun Li, Jonathan K. Kummerfeld, Tianyi Zhang

Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine incorrect queries.

Text to SQL Text-To-SQL

Masked Structural Growth for 2x Faster Language Model Pre-training

1 code implementation4 May 2023 Yiqun Yao, Zheng Zhang, Jing Li, Yequan Wang

In terms of growth schedule, the impact of each single dimension on a schedule's efficiency is under-explored by existing work.

Language Modeling Language Modelling +2

A Non-autoregressive Multi-Horizon Flight Trajectory Prediction Framework with Gray Code Representation

1 code implementation2 May 2023 Dongyue Guo, Zheng Zhang, Zhen Yan, Jianwei Zhang, Yi Lin

Additionally, the Gray code representation and the differential prediction paradigm are designed to cope with the high-bit misclassifications of the binary encoding (BE) representation, which significantly reduces the outliers in the predictions.

Computational Efficiency Decoder +1
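
Gray codes are used here because adjacent values differ in exactly one bit, so a single high-bit misclassification cannot produce a large numeric jump. The standard binary/Gray conversions below illustrate that property; the paper's encoding width and attribute-specific details are not reproduced.

def binary_to_gray(n: int) -> int:
    """Gray code of n: consecutive integers map to codes differing in one bit."""
    return n ^ (n >> 1)

def gray_to_binary(g: int) -> int:
    """Invert the Gray code by cumulatively XOR-ing the shifted value."""
    n = g
    while g >> 1:
        g >>= 1
        n ^= g
    return n

# Consecutive codes differ in exactly one bit, and the mapping is invertible.
codes = [binary_to_gray(i) for i in range(16)]
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:]))
assert all(gray_to_binary(binary_to_gray(i)) == i for i in range(16))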

Integrating spoken instructions into flight trajectory prediction to optimize automation in air traffic control

no code implementations2 May 2023 Dongyue Guo, Zheng Zhang, Bo Yang, Jianwei Zhang, Hongyu Yang, Yi Lin

The booming air transportation industry inevitably increases air traffic controllers' workload, which can cause unexpected incidents related to human factors.

Prediction Traffic Prediction +1

VISAR: A Human-AI Argumentative Writing Assistant with Visual Programming and Rapid Draft Prototyping

no code implementations16 Apr 2023 Zheng Zhang, Jie Gao, Ranjodh Singh Dhaliwal, Toby Jia-Jun Li

In argumentative writing, writers must brainstorm hierarchical writing goals, ensure the persuasiveness of their arguments, and revise and organize their plans through drafting.

Persuasiveness Text Generation

DeepMIM: Deep Supervision for Masked Image Modeling

1 code implementation15 Mar 2023 Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu

Deep supervision, which adds extra supervision to the intermediate features of a neural network, was widely used for image classification in the early deep learning era, since it significantly reduces training difficulty and eases optimization, for example by mitigating vanishing gradients compared with vanilla training.

image-classification Image Classification +3
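
Deep supervision simply means attaching auxiliary losses to intermediate features in addition to the final output. The toy network below shows the pattern in generic form; DeepMIM's masked-image-modeling targets and ViT specifics are not reproduced, and all names here are illustrative.

import torch
import torch.nn as nn

class DeeplySupervisedNet(nn.Module):
    """Toy backbone whose intermediate blocks each feed an auxiliary head."""
    def __init__(self, dim=64, num_classes=10, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)])
        self.aux_heads = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_blocks)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        aux_logits = []
        for block, aux in zip(self.blocks, self.aux_heads):
            x = block(x)
            aux_logits.append(aux(x))          # extra supervision on intermediate features
        return self.head(x), aux_logits

net = DeeplySupervisedNet()
criterion = nn.CrossEntropyLoss()
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
logits, aux_logits = net(x)
loss = criterion(logits, y) + 0.3 * sum(criterion(a, y) for a in aux_logits)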

Particle-based Online Bayesian Sampling

no code implementations28 Feb 2023 Yifan Yang, Chang Liu, Zheng Zhang

Online optimization has gained increasing interest due to its capability of tracking real-world streaming data.

Variational Inference
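
Particle-based Bayesian sampling maintains a set of particles that are jointly updated to approximate a posterior. As a representative example only, the sketch below implements a Stein variational gradient descent (SVGD) update; the paper's online formulation and its analysis are not reproduced, and kernel choice and step sizes are assumptions.

import torch

def rbf_kernel(x, h=1.0):
    """RBF kernel matrix and its gradient with respect to the first argument."""
    diff = x.unsqueeze(1) - x.unsqueeze(0)          # (n, n, d), diff[j, i] = x_j - x_i
    sq_dist = (diff ** 2).sum(-1)
    k = torch.exp(-sq_dist / (2 * h ** 2))          # (n, n)
    grad_k = -diff / (h ** 2) * k.unsqueeze(-1)     # d k(x_j, x_i) / d x_j
    return k, grad_k

def svgd_step(particles, log_prob, step_size=0.1):
    """One SVGD update pushing particles toward the target log_prob."""
    x = particles.detach().requires_grad_(True)
    score = torch.autograd.grad(log_prob(x).sum(), x)[0]    # grad log p at each particle
    k, grad_k = rbf_kernel(x.detach())
    phi = (k @ score + grad_k.sum(dim=0)) / x.shape[0]      # kernelized Stein direction
    return (x + step_size * phi).detach()

# Usage: move particles toward a standard Gaussian target.
particles = torch.randn(50, 2) * 3 + 5
log_p = lambda x: -0.5 * (x ** 2).sum(-1)
for _ in range(200):
    particles = svgd_step(particles, log_p)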

DeepOHeat: Operator Learning-based Ultra-fast Thermal Simulation in 3D-IC Design

1 code implementation25 Feb 2023 Ziyue Liu, Yixing Li, Jing Hu, Xinling Yu, Shinyu Shiau, Xin Ai, Zhiyu Zeng, Zheng Zhang

In this paper, for the first time, we propose DeepOHeat, a physics-aware operator learning framework to predict the temperature field of a family of heat equations with multiple parametric or non-parametric design configurations.

Operator learning

PIFON-EPT: MR-Based Electrical Property Tomography Using Physics-Informed Fourier Networks

no code implementations23 Feb 2023 Xinling Yu, José E. C. Serrallés, Ilias I. Giannakopoulos, Ziyue Liu, Luca Daniel, Riccardo Lattanzi, Zheng Zhang

PIFON-EPT is the first method that can simultaneously reconstruct EP and transmit fields from incomplete noisy MR measurements, providing new opportunities for EPT research.

Denoising

Side Adapter Network for Open-Vocabulary Semantic Segmentation

3 code implementations CVPR 2023 Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai

A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks.

Language Modelling Open Vocabulary Semantic Segmentation +3
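
Structurally, the side network consumes features from the frozen CLIP encoder and produces two outputs: mask proposals and an attention bias that is injected back into CLIP to classify the masks. The sketch below captures only that two-branch structure; dimensions, fusion points, and per-head bias handling are simplifications of my own, not the paper's implementation.

import torch
import torch.nn as nn

class SideAdapter(nn.Module):
    """Small side network with two branches: mask proposals and attention bias.

    The frozen CLIP encoder is not included; its (projected) patch tokens are the input.
    """
    def __init__(self, dim=240, num_queries=100):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        encoder_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.mask_head = nn.Linear(dim, dim)
        self.bias_head = nn.Linear(dim, dim)

    def forward(self, patch_tokens):
        # patch_tokens: (B, num_patches, dim)
        b = patch_tokens.shape[0]
        x = torch.cat([self.queries.expand(b, -1, -1), patch_tokens], dim=1)
        x = self.blocks(x)
        q, p = x[:, : self.queries.shape[0]], x[:, self.queries.shape[0]:]
        mask_logits = torch.einsum("bqd,bpd->bqp", self.mask_head(q), p)  # mask proposals
        attn_bias = torch.einsum("bqd,bpd->bqp", self.bias_head(q), p)    # bias for the frozen attention
        return mask_logits, attn_bias

net = SideAdapter()
masks, bias = net(torch.randn(2, 196, 240))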

Tensorized Optical Multimodal Fusion Network

no code implementations17 Feb 2023 Yequan Zhao, Xian Xiao, Geza Kurczveil, Raymond G. Beausoleil, Zheng Zhang

We propose the first tensorized optical multimodal fusion network architecture with a self-attention mechanism and low-rank tensor fusion.
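
Low-rank tensor fusion replaces a full bilinear fusion tensor with a sum of rank-one factors, which keeps the parameter count and (here, optical) hardware cost manageable. The sketch below follows the common low-rank multimodal fusion pattern for two modalities; the optical mapping and the paper's exact factorization are out of scope, and all names are illustrative.

import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    """Fuse two modality vectors via a rank-R decomposition of the bilinear fusion tensor."""
    def __init__(self, dim_a, dim_b, out_dim, rank=4):
        super().__init__()
        # Factor matrices stand in for a full (dim_a+1) x (dim_b+1) x out_dim tensor.
        self.factor_a = nn.Parameter(torch.randn(rank, dim_a + 1, out_dim) * 0.1)
        self.factor_b = nn.Parameter(torch.randn(rank, dim_b + 1, out_dim) * 0.1)

    def forward(self, a, b):
        ones = a.new_ones(a.shape[0], 1)
        a1 = torch.cat([a, ones], dim=-1)                 # appended 1 keeps unimodal terms
        b1 = torch.cat([b, ones], dim=-1)
        proj_a = torch.einsum("bi,rio->bro", a1, self.factor_a)
        proj_b = torch.einsum("bj,rjo->bro", b1, self.factor_b)
        return (proj_a * proj_b).sum(dim=1)               # sum over the rank dimension

fusion = LowRankFusion(dim_a=128, dim_b=64, out_dim=32)
z = fusion(torch.randn(8, 128), torch.randn(8, 64))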

Apples and Oranges? Assessing Image Quality over Content Recognition

no code implementations22 Jan 2023 Junyong You, Zheng Zhang

A sequential spatial-channel attention module is proposed to simulate the visual attention and contrast sensitivity mechanisms that are crucial for content recognition and quality assessment.
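
A sequential attention module of this kind applies channel attention and spatial attention one after the other. The block below follows the common CBAM-style pattern as an illustration; the ordering, pooling choices, and kernel size are assumptions and may differ from the paper's module.

import torch
import torch.nn as nn

class SequentialSpatialChannelAttention(nn.Module):
    """Channel attention followed by spatial attention over a feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # x: (B, C, H, W)
        b, c, _, _ = x.shape
        # Channel attention from globally pooled descriptors.
        avg = x.mean(dim=(2, 3))
        mx = x.amax(dim=(2, 3))
        ch = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx)).view(b, c, 1, 1)
        x = x * ch
        # Spatial attention from channel-pooled maps.
        sp = torch.sigmoid(self.spatial_conv(
            torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)))
        return x * sp

attn = SequentialSpatialChannelAttention(64)
out = attn(torch.randn(2, 64, 32, 32))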

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

1 code implementation ICCV 2023 Jia Ning, Chen Li, Zheng Zhang, Zigang Geng, Qi Dai, Kun He, Han Hu

With these new techniques and other designs, we show that the proposed general-purpose task-solver can perform both instance segmentation and depth estimation well.

All Instance Segmentation +2

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

2 code implementations CVPR 2023 Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu

Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget.

image-classification Image Classification +1

iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition

no code implementations CVPR 2023 Yixuan Wei, Yue Cao, Zheng Zhang, Houwen Peng, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

This paper presents a method that effectively combines two prevalent visual recognition methods, i.e., image classification and contrastive language-image pre-training, dubbed iCLIP.

Classification image-classification +3

DETR Does Not Need Multi-Scale or Locality Design

1 code implementation ICCV 2023 Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.

Decoder Object Detection

Improving CLIP Fine-tuning Performance

1 code implementation ICCV 2023 Yixuan Wei, Han Hu, Zhenda Xie, Ze Liu, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

Experiments suggest that the feature map distillation approach significantly boosts the fine-tuning performance of CLIP models on several typical downstream vision tasks.

Diagnostic object-detection +2

Refined Edge Usage of Graph Neural Networks for Edge Prediction

no code implementations25 Dec 2022 Jiarui Jin, Yangkun Wang, Weinan Zhang, Quan Gan, Xiang Song, Yong Yu, Zheng Zhang, David Wipf

However, existing methods lack elaborate design regarding the distinctions between the two tasks, which have frequently been overlooked: (i) edges only constitute the topology in the node classification task, but can serve as both topology and supervision (i.e., labels) in the edge prediction task; (ii) node classification makes a prediction for each individual node, whereas edge prediction is determined by each pair of nodes.

Link Prediction Node Classification +1

EASpace: Enhanced Action Space for Policy Transfer

1 code implementation7 Dec 2022 Zheng Zhang, Qingrui Zhang, Bo Zhu, Xiaohan Wang, Tianjiang Hu

In this paper, a novel algorithm named EASpace (Enhanced Action Space) is proposed, which formulates macro actions in an alternative form to accelerate the learning process using multiple available sub-optimal expert policies.

Q-Learning Transfer Learning

CBNet: A Plug-and-Play Network for Segmentation-Based Scene Text Detection

1 code implementation5 Dec 2022 Xi Zhao, Wei Feng, Zheng Zhang, Jingjing Lv, Xin Zhu, Zhangang Lin, Jinghe Hu, Jingping Shao

Recently, segmentation-based methods have become quite popular in scene text detection; they mainly involve two steps: text kernel segmentation and expansion.

Scene Text Detection Segmentation +1

Exploring Discrete Diffusion Models for Image Captioning

1 code implementation21 Nov 2022 Zixin Zhu, Yixuan Wei, JianFeng Wang, Zhe Gan, Zheng Zhang, Le Wang, Gang Hua, Lijuan Wang, Zicheng Liu, Han Hu

The image captioning task is typically realized by an auto-regressive method that decodes the text tokens one by one.

Image Captioning Image Generation

Could Giant Pretrained Image Models Extract Universal Representations?

no code implementations3 Nov 2022 Yutong Lin, Ze Liu, Zheng Zhang, Han Hu, Nanning Zheng, Stephen Lin, Yue Cao

In this paper, we present a study of frozen pretrained models when applied to diverse and representative computer vision tasks, including object detection, semantic segmentation and video action recognition.

Action Recognition In Videos Instance Segmentation +5

RLET: A Reinforcement Learning Based Approach for Explainable QA with Entailment Trees

1 code implementation31 Oct 2022 Tengxiao Liu, Qipeng Guo, Xiangkun Hu, Yue Zhang, Xipeng Qiu, Zheng Zhang

RLET iteratively performs single-step reasoning with sentence selection and deduction generation modules, from which the training signal is accumulated across the tree with an elaborately designed aligned reward function that is consistent with the evaluation.

reinforcement-learning Reinforcement Learning (RL) +1

DORE: Document Ordered Relation Extraction based on Generative Framework

1 code implementation28 Oct 2022 Qipeng Guo, Yuqing Yang, Hang Yan, Xipeng Qiu, Zheng Zhang

In this paper, we investigate the root cause of the underwhelming performance of the existing generative DocRE models and discover that the culprit is the inadequacy of the training paradigm, instead of the capacities of the models.

Document-level Relation Extraction Relation

Conversation Disentanglement with Bi-Level Contrastive Learning

no code implementations27 Oct 2022 Chengyu Huang, Zheng Zhang, Hao Fei, Lizi Liao

Conversation disentanglement aims to group utterances into detached sessions, which is a fundamental task in processing multi-party conversations.

Contrastive Learning Conversation Disentanglement +1

MR-Based Electrical Property Reconstruction Using Physics-Informed Neural Networks

no code implementations23 Oct 2022 Xinling Yu, José E. C. Serrallés, Ilias I. Giannakopoulos, Ziyue Liu, Luca Daniel, Riccardo Lattanzi, Zheng Zhang

Electrical properties (EP), namely permittivity and electric conductivity, dictate the interactions between electromagnetic waves and biological tissue.

Self-supervised Amodal Video Object Segmentation

1 code implementation23 Oct 2022 Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, Francesco Locatello, David Wipf, Yanwei Fu, Zheng Zhang

The key intuition is that the occluded part of an object can be explained away if that part is visible in other frames, possibly deformed as long as the deformation can be reasonably learned.

Object Segmentation +6

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

4 code implementations3 Oct 2022 Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, WeiHong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Han Hu

Vision transformers have recently achieved competitive results across various vision tasks but still suffer from heavy computation costs when processing a large number of tokens.

Clustering Depth Estimation +8

Vega-MT: The JD Explore Academy Translation System for WMT22

1 code implementation20 Sep 2022 Changtong Zan, Keqin Peng, Liang Ding, Baopu Qiu, Boan Liu, Shwai He, Qingyu Lu, Zheng Zhang, Chuang Liu, Weifeng Liu, Yibing Zhan, DaCheng Tao

In terms of model size, we scale Transformer-Big up to an extremely large model with nearly 4.7 billion parameters to fully enhance the model capacity of our Vega-MT.

Data Augmentation de-en +2

Whole-Body Lesion Segmentation in 18F-FDG PET/CT

1 code implementation16 Sep 2022 Jia Zhang, Yukun Huang, Zheng Zhang, Yuhang Shi

There has been growing research interest in using deep learning-based methods to achieve fully automated lesion segmentation in positron emission tomography/computed tomography (PET/CT) scans for the prognosis of various cancers.

Image Segmentation Lesion Segmentation +3
