Search Results for author: Xiaohan Zhang

Found 59 papers, 27 papers with code

How Does the Experimental Setting Affect the Conclusions of Neural Encoding Models?

no code implementations LREC 2022 Xiaohan Zhang, Shaonan Wang, Chengqing Zong

Based on these results, we suggest a block-wise cross-validation training method and an adequate data size for increasing the performance of linear encoding models.

Metamon-GS: Enhancing Representability with Variance-Guided Densification and Light Encoding

no code implementations 20 Apr 2025 Junyan Su, Baozhu Zhao, Xiaohan Zhang, Qi Liu

The introduction of 3D Gaussian Splatting (3DGS) has advanced novel view synthesis by utilizing Gaussians to represent scenes.

3DGS Novel View Synthesis

ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario

1 code implementation 17 Jan 2025 Lucen Zhong, Zhengxiao Du, Xiaohan Zhang, Haiyi Hu, Jie Tang

However, evaluating the function calling abilities of LLMs in real-world scenarios remains under-explored due to the complexity of data collection and evaluation.

RobotDiffuse: Motion Planning for Redundant Manipulator based on Diffusion Model

1 code implementation 27 Dec 2024 Xiaohan Zhang, Xudong Mou, Rui Wang, Tianyu Wo, Ningbo Gu, Tiejun Wang, Cangbai Xu, Xudong Liu

This paper introduces RobotDiffuse, a diffusion model-based approach for motion planning in redundant manipulators.

Acrobot Motion Planning

SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control

no code implementations 20 Dec 2024 Xiaohan Zhang, Sebastian Starke, Vladimir Guzov, Zhensong Zhang, Eduardo Pérez Pellitero, Gerard Pons-Moll

To address these limitations, we introduce SCENIC, a diffusion model designed to generate human motion that adapts to dynamic terrains within virtual scenes while enabling semantic control through natural language.

Motion Synthesis

Toy-GS: Assembling Local Gaussians for Precisely Rendering Large-Scale Free Camera Trajectories

no code implementations 13 Dec 2024 Xiaohan Zhang, Zhenyu Sun, Yukui Qiu, Junyan Su, Qi Liu

Currently, 3D rendering for large-scale free camera trajectories, namely, arbitrary input camera trajectories, poses significant challenges: 1) The distribution and observation angles of the cameras are irregular, and various types of scenes are included in the free trajectories; 2) Processing the entire point cloud and all images at once for large-scale scenes requires a substantial amount of GPU memory.

Safe Text-to-Image Generation: Simply Sanitize the Prompt Embedding

no code implementations 15 Nov 2024 Huming Qiu, Guanxu Chen, Mi Zhang, Xiaohan Zhang, Xiaoyu You, Min Yang

Existing safe generation methods typically focus on suppressing inappropriate content by erasing undesired concepts from visual representations, while neglecting to sanitize the textual representation.

Text-to-Image Generation

CogVLM2: Visual Language Models for Image and Video Understanding

3 code implementations 29 Aug 2024 Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications.

MM-Vet MVBench +3

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

1 code implementation 12 Aug 2024 Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Yuxuan Zhang, Weihan Wang, Yean Cheng, Bin Xu, Xiaotao Gu, Yuxiao Dong, Jie Tang

We present CogVideoX, a large-scale text-to-video generation model based on a diffusion transformer, which can generate 10-second continuous videos aligned with the text prompt, with a frame rate of 16 fps and a resolution of 768 × 1360 pixels.

Text-to-Video Generation Video Alignment +2

Cross-View Meets Diffusion: Aerial Image Synthesis with Geometry and Text Guidance

2 code implementations 8 Aug 2024 Ahmad Arrabi, Xiaohan Zhang, Waqas Sultani, Chen Chen, Safwan Wshah

In this paper, we present a novel Geometric Preserving Ground-to-Aerial (G2A) image synthesis (GPG2A) model that can generate realistic aerial images from ground images.

BEV Segmentation Data Augmentation +2

DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning

no code implementations 25 Jun 2024 Xiaohan Zhang, Zainab Altaweel, Yohei Hayamizu, Yan Ding, Saeid Amiri, Hao Yang, Andy Kaminski, Chad Esselink, Shiqi Zhang

Vision-language models (VLMs) have been applied to robot task planning problems, where the robot receives a task in natural language and generates plans based on visual inputs.

Task Planning

SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

no code implementations 21 Jun 2024 Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, Jie Tang

We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users.

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

no code implementations 13 Jun 2024 Yuhang Wu, Wenmeng Yu, Yean Cheng, Yan Wang, Xiaohan Zhang, Jiazheng Xu, Ming Ding, Yuxiao Dong

Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants.

Multiple-choice

Aerial-NeRF: Adaptive Spatial Partitioning and Sampling for Large-Scale Aerial Rendering

no code implementations 10 May 2024 Xiaohan Zhang, Yukui Qiu, Zhenyu Sun, Qi Liu

To that end, we propose Aerial-NeRF with three innovative modifications for jointly adapting NeRF to large-scale aerial rendering: (1) designing an adaptive spatial partitioning and selection method based on drones' poses to adapt to different flight trajectories; (2) using similarity of poses instead of an (expert) network to determine which region a new viewpoint belongs to, for rendering speedup; (3) developing an adaptive sampling approach that covers entire buildings at different heights, for improved rendering performance.

NeRF

NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts

1 code implementation 7 May 2024 Shudan Zhang, Hanlin Zhao, Xiao Liu, Qinkai Zheng, Zehan Qi, Xiaotao Gu, Xiaohan Zhang, Yuxiao Dong, Jie Tang

To fill this gap, we propose NaturalCodeBench (NCB), a challenging code benchmark designed to mirror the complexity and variety of scenarios in real coding tasks.

HumanEval mbpp

AG-NeRF: Attention-guided Neural Radiance Fields for Multi-height Large-scale Outdoor Scene Rendering

1 code implementation 18 Apr 2024 Jingfeng Guo, Xiaohan Zhang, Baozhu Zhao, Qi Liu

Existing neural radiance fields (NeRF)-based novel view synthesis methods for large-scale outdoor scenes are mainly built on a single altitude.

NeRF Novel View Synthesis

A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation

no code implementations 4 Apr 2024 Jifan Yu, Xiaohan Zhang, Yifan Xu, Xuanyu Lei, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li

Recently, knowledge-grounded dialogue generation models, which intentionally invoke external knowledge resources to produce more informative responses, have also proven effective in reducing hallucination.

counterfactual Counterfactual Reasoning +2

AutoWebGLM: A Large Language Model-based Web Navigating Agent

1 code implementation 4 Apr 2024 Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, Jie Tang

Large language models (LLMs) have fueled many intelligent web agents, but most existing ones perform far from satisfactorily in real-world web navigation tasks due to three factors: (1) the complexity of HTML text data, (2) the versatility of actions on webpages, and (3) task difficulty due to the open-domain nature of the web.

Decision Making Language Modeling +2

ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline

3 code implementations 3 Apr 2024 Yifan Xu, Xiao Liu, Xinghan Liu, Zhenyu Hou, Yueyan Li, Xiaohan Zhang, Zihan Wang, Aohan Zeng, Zhengxiao Du, Wenyi Zhao, Jie Tang, Yuxiao Dong

Large language models (LLMs) have shown excellent mastery of human language, but still struggle in real-world applications that require mathematical problem-solving.

Math Mathematical Problem-Solving

ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback

no code implementations 1 Apr 2024 Zhenyu Hou, Yilin Niu, Zhengxiao Du, Xiaohan Zhang, Xiao Liu, Aohan Zeng, Qinkai Zheng, Minlie Huang, Hongning Wang, Jie Tang, Yuxiao Dong

The work presents our practices of aligning LLMs with human preferences, offering insights into the challenges and solutions in RLHF implementations.

MapGuide: A Simple yet Effective Method to Reconstruct Continuous Language from Brain Activities

no code implementations 26 Mar 2024 Xinpei Zhao, Jingyuan Sun, Shaonan Wang, Jing Ye, Xiaohan Zhang, Chengqing Zong

In contrast, we propose a simple yet effective method that guides text reconstruction by directly comparing them with the predicted text embeddings mapped from brain activities.

Text Generation

FORCE: Physics-aware Human-object Interaction

no code implementations 17 Mar 2024 Xiaohan Zhang, Bharat Lal Bhatnagar, Sebastian Starke, Ilya Petrov, Vladimir Guzov, Helisa Dhamo, Eduardo Pérez-Pellitero, Gerard Pons-Moll

Interactions between humans and objects are influenced not only by the object's pose and shape, but also by physical attributes such as object mass and surface friction.

Diversity Friction +2

PointCore: Efficient Unsupervised Point Cloud Anomaly Detector Using Local-Global Features

2 code implementations 4 Mar 2024 Baozhu Zhao, Qiwei Xiong, Xiaohan Zhang, Jingfeng Guo, Qi Liu, Xiaofen Xing, Xiangmin Xu

Three-dimensional point cloud anomaly detection that aims to detect anomaly data points from a training set serves as the foundation for a variety of applications, including industrial inspection and autonomous driving.

Anomaly Detection Autonomous Driving

MulCogBench: A Multi-modal Cognitive Benchmark Dataset for Evaluating Chinese and English Computational Language Models

no code implementations 2 Mar 2024 Yunhao Zhang, Xiaohan Zhang, Chong Li, Shaonan Wang, Chengqing Zong

Results show that language models share significant similarities with human cognitive data and the similarity patterns are modulated by the data modality and stimuli complexity.

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

1 code implementation 22 Dec 2023 Qianrui Zhou, Hua Xu, Hao Li, Hanlei Zhang, Xiaohan Zhang, Yifan Wang, Kai Gao

To establish an optimal multimodal semantic environment for text modality, we develop a modality-aware prompting module (MAP), which effectively aligns and fuses features from text, video and audio modalities with similarity-based modality alignment and cross-modality attention mechanism.

Contrastive Learning Multimodal Intent Recognition +1

Robust Tumor Segmentation with Hyperspectral Imaging and Graph Neural Networks

no code implementations 20 Nov 2023 Mayar Lotfy Mostafa, Anna Alperovich, Tommaso Giannantonio, Bjorn Barz, Xiaohan Zhang, Felix Holm, Nassir Navab, Felix Boehm, Carolin Schwamborn, Thomas K. Hoffmann, Patrick J. Schuler

Despite the limited dataset, the GNN-based model significantly outperforms context-agnostic approaches, accurately distinguishing between healthy and tumor tissues, even in images from previously unseen patients.

Tumor Segmentation

Tuning In to Neural Encoding: Linking Human Brain and Artificial Supervised Representations of Language

no code implementations 5 Oct 2023 Jingyuan Sun, Xiaohan Zhang, Marie-Francine Moens

To understand the algorithm that supports the human brain's language representation, previous research has attempted to predict neural responses to linguistic stimuli using embeddings generated by artificial neural networks (ANNs), a process known as neural encoding.

Natural Language Understanding

GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement

1 code implementation 18 Aug 2023 Xiaohan Zhang, Xingyu Li, Waqas Sultani, Chen Chen, Safwan Wshah

We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details.

Attribute Disentanglement +1

Human Emotion Recognition Based On Galvanic Skin Response signal Feature Selection and SVM

no code implementations 4 Jul 2023 Di Fan, Mingyang Liu, Xiaohan Zhang, Xiaopeng Gong

A novel human emotion recognition method based on automatically selected Galvanic Skin Response (GSR) signal features and SVM is proposed in this paper.

Emotion Recognition feature selection

Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds

1 code implementation 27 May 2023 Yan Ding, Xiaohan Zhang, Saeid Amiri, Nieqing Cao, Hao Yang, Andy Kaminski, Chad Esselink, Shiqi Zhang

Each situation corresponds to a state instance wherein a robot is potentially unable to complete a task using a solution that normally works.

Task Planning World Knowledge

TFDet: Target-Aware Fusion for RGB-T Pedestrian Detection

1 code implementation 26 May 2023 Xue Zhang, Xiaohan Zhang, Jiangtao Wang, Jiacheng Ying, Zehua Sheng, Heng Yu, Chunguang Li, Hui-Liang Shen

Different from them, we comprehensively analyze the impacts of false positives on the detection performance and find that enhancing feature contrast can significantly reduce these false positives.

Multispectral Object Detection object-detection +2

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

1 code implementation 22 Apr 2023 Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, Peter Stone

LLM+P takes in a natural language description of a planning problem, then returns a correct (or optimal) plan for solving that problem in natural language.

Zero-shot Generalization

GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation

1 code implementation 28 Feb 2023 Jing Zhang, Xiaokang Zhang, Daniel Zhang-li, Jifan Yu, Zijun Yao, Zeyao Ma, Yiqi Xu, Haohua Wang, Xiaohan Zhang, Nianyi Lin, Sunrui Lu, Juanzi Li, Jie Tang

We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese, using a search engine to access Internet knowledge.

Dialogue Evaluation Dialogue Generation +3

Time-sensitive Learning for Heterogeneous Federated Edge Intelligence

no code implementations 26 Jan 2023 Yong Xiao, Xiaohan Zhang, Guangming Shi, Marwan Krunz, Diep N. Nguyen, Dinh Thai Hoang

A joint optimization algorithm is proposed to minimize the overall time consumption of model training by selecting the participating edge servers and the local epoch number.

Decision Making Edge-computing +1

Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence

1 code implementation 8 Dec 2022 Xiaohan Zhang, Xingyu Li, Waqas Sultani, Yi Zhou, Safwan Wshah

We attribute this deficiency to the lack of ability to extract the spatial configuration of visual feature layouts and models' overfitting on low-level details from the training set.

Attribute counterfactual +1

Cross-View Image Sequence Geo-localization

1 code implementation 25 Oct 2022 Xiaohan Zhang, Waqas Sultani, Safwan Wshah

In this paper, we present the first cross-view geo-localization method that works on a sequence of limited FOV images.

geo-localization

Robotic Table Wiping via Reinforcement Learning and Whole-body Trajectory Optimization

no code implementations 19 Oct 2022 Thomas Lew, Sumeet Singh, Mario Prats, Jeffrey Bingham, Jonathan Weisz, Benjie Holson, Xiaohan Zhang, Vikas Sindhwani, Yao Lu, Fei Xia, Peng Xu, Tingnan Zhang, Jie Tan, Montserrat Gonzalez

This problem is challenging, as it requires planning wiping actions while reasoning over uncertain latent dynamics of crumbs and spills captured via high-dimensional visual observations.

reinforcement-learning Reinforcement Learning (RL)

Revisiting Optimal Convergence Rate for Smooth and Non-convex Stochastic Decentralized Optimization

no code implementations 14 Oct 2022 Kun Yuan, Xinmeng Huang, Yiming Chen, Xiaohan Zhang, Yingya Zhang, Pan Pan

While (Lu and Sa, 2021) have recently provided an optimal rate for non-convex stochastic decentralized optimization with weight matrices defined over linear graphs, the optimal rate with general weight matrices remains unclear.

Robot Task Planning and Situation Handling in Open Worlds

no code implementations 4 Oct 2022 Yan Ding, Xiaohan Zhang, Saeid Amiri, Nieqing Cao, Hao Yang, Chad Esselink, Shiqi Zhang

For systematic evaluations, we collected a dataset that includes 561 execution-time situations in a dining domain, where each situation corresponds to a state instance of a robot being potentially unable to complete a task using a solution that normally works.

Common Sense Reasoning Task Planning +1

COUCH: Towards Controllable Human-Chair Interactions

no code implementations 1 May 2022 Xiaohan Zhang, Bharat Lal Bhatnagar, Vladimir Guzov, Sebastian Starke, Gerard Pons-Moll

In this work, we study the problem of synthesizing scene interactions conditioned on different contact positions on the object.

Human-Object Interaction Detection Object

Deep Class Incremental Learning from Decentralized Data

no code implementations 11 Mar 2022 Xiaohan Zhang, Songlin Dong, Jinjie Chen, Qi Tian, Yihong Gong, Xiaopeng Hong

In this paper, we focus on a new and challenging decentralized machine learning paradigm in which there are continuous inflows of data to be addressed and the data are stored in multiple repositories.

class-incremental learning Class Incremental Learning +2

Visual and Object Geo-localization: A Comprehensive Survey

no code implementations 30 Dec 2021 Daniel Wilson, Xiaohan Zhang, Waqas Sultani, Safwan Wshah

The concept of geo-localization refers to the process of determining where on earth some 'entity' is located, typically using Global Positioning System (GPS) coordinates.

3D Reconstruction geo-localization +2

Actor-Critic Algorithm for High-dimensional PDEs

no code implementations NeurIPS Workshop DLDE 2021 Xiaohan Zhang

Our model advances the state-of-the-art machine learning PDE solvers in a few aspects: 1) the trainable parameters are reduced by $N$ times, where $N$ is the number of steps to discretize the PDE in time, 2) the model convergence rate is an order of magnitude faster, 3) our model has fewer tuning hyperparameters.

BIG-bench Machine Learning Vocal Bursts Intensity Prediction

Object Tracking and Geo-localization from Street Images

no code implementations 13 Jul 2021 Daniel Wilson, Thayer Alshaabi, Colin Van Oort, Xiaohan Zhang, Jonathan Nelson, Safwan Wshah

Geo-localizing static objects from street images is challenging but also very important for road asset mapping and autonomous driving.

Autonomous Driving geo-localization +2

Episodic memory governs choices: An RNN-based reinforcement learning model for decision-making task

no code implementations 24 Jan 2021 Xiaohan Zhang, Lu Liu, Guodong Long, Jing Jiang, Shenquan Liu

Typical methods to study cognitive function are to record the electrical activities of animal neurons during the training of animals performing behavioral tasks.

Decision Making Hippocampus +3

Efficient Golf Ball Detection and Tracking Based on Convolutional Neural Networks and Kalman Filter

1 code implementation 17 Dec 2020 Tianxiao Zhang, Xiaohan Zhang, Yiju Yang, Zongbo Wang, Guanghui Wang

The detection is performed on small image patches instead of the entire image to increase the performance of small ball detection.

Object object-detection +1

Actor-Critic Algorithm for High-dimensional Partial Differential Equations

no code implementations 7 Oct 2020 Xiaohan Zhang

We develop a deep learning model to effectively solve high-dimensional nonlinear parabolic partial differential equations (PDE).

Deep Reinforcement Learning reinforcement-learning +2

Scale Calibrated Training: Improving Generalization of Deep Networks via Scale-Specific Normalization

no code implementations 31 Aug 2019 Zhuoran Yu, Aojun Zhou, Yukun Ma, Yudian Li, Xiaohan Zhang, Ping Luo

Experimental results show that SCT improves the accuracy of a single ResNet-50 on ImageNet by 1.7% and 11.5% when testing on image sizes of 224 and 128, respectively.

Data Augmentation Image Classification +1
