no code implementations • LREC 2022 • Xiaohan Zhang, Shaonan Wang, Chengqing Zong
Based on these results, we suggest a block-wise cross-validation training method and an adequate data size for increasing the performance of linear encoding models.
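The block-wise cross-validation suggested here can be illustrated with a minimal sketch: hold out contiguous blocks of time points rather than randomly shuffled samples, so temporally autocorrelated data do not leak across folds. This is an illustrative example, not the authors' code; the block count, ridge regression, and correlation metric are assumed choices.

```python
# Illustrative sketch of block-wise cross-validation for a linear encoding
# model on time-series brain data. Not the paper's code; the number of blocks
# and the regression settings are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge

def block_cv_score(X, y, n_blocks=5, alpha=1.0):
    """Hold out contiguous blocks of samples so temporally autocorrelated
    time points do not leak between training and test folds."""
    n = len(X)
    blocks = np.array_split(np.arange(n), n_blocks)  # contiguous index blocks
    scores = []
    for held_out in blocks:
        train = np.setdiff1d(np.arange(n), held_out)
        model = Ridge(alpha=alpha).fit(X[train], y[train])
        pred = model.predict(X[held_out])
        # voxel-wise Pearson correlation between predicted and observed responses
        r = [np.corrcoef(pred[:, v], y[held_out][:, v])[0, 1]
             for v in range(y.shape[1])]
        scores.append(np.nanmean(r))
    return float(np.mean(scores))
```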
1 code implementation • 2 May 2025 • Boyuan Meng, Xiaohan Zhang, Peilin Li, Zhe Wu, Yiming Li, Wenkai Zhao, Beinan Yu, Hui-Liang Shen
Cross-domain few-shot object detection (CD-FSOD) aims to detect novel objects across different domains with limited class instances.
Ranked #2 on Cross-Domain Few-Shot Object Detection on Clipart1k
no code implementations • 20 Apr 2025 • Junyan Su, Baozhu Zhao, Xiaohan Zhang, Qi Liu
The introduction of 3D Gaussian Splatting (3DGS) has advanced novel view synthesis by utilizing Gaussians to represent scenes.
1 code implementation • 17 Jan 2025 • Lucen Zhong, Zhengxiao Du, Xiaohan Zhang, Haiyi Hu, Jie Tang
However, evaluating the function calling abilities of LLMs in real-world scenarios remains under-explored due to the complexity of data collection and evaluation.
1 code implementation • 30 Dec 2024 • Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, YuAn Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, Jiayan Teng, Zhuoyi Yang, Wendi Zheng, Xiao Liu, Ming Ding, Xiaohan Zhang, Xiaotao Gu, Shiyu Huang, Minlie Huang, Jie Tang, Yuxiao Dong
We present a general strategy for aligning visual generation models -- both image and video generation -- with human preference.
1 code implementation • 27 Dec 2024 • Xiaohan Zhang, Xudong Mou, Rui Wang, Tianyu Wo, Ningbo Gu, Tiejun Wang, Cangbai Xu, Xudong Liu
This paper introduces RobotDiffuse, a diffusion model-based approach for motion planning in redundant manipulators.
no code implementations • 20 Dec 2024 • Xiaohan Zhang, Sebastian Starke, Vladimir Guzov, Zhensong Zhang, Eduardo Pérez Pellitero, Gerard Pons-Moll
To address these limitations, we introduce SCENIC, a diffusion model designed to generate human motion that adapts to dynamic terrains within virtual scenes while enabling semantic control through natural language.
no code implementations • 13 Dec 2024 • Xiaohan Zhang, Zhenyu Sun, Yukui Qiu, Junyan Su, Qi Liu
Currently, 3D rendering for large-scale free camera trajectories, namely, arbitrary input camera trajectories, poses significant challenges: 1) The distribution and observation angles of the cameras are irregular, and various types of scenes are included in the free trajectories; 2) Processing the entire point cloud and all images at once for large-scale scenes requires a substantial amount of GPU memory.
no code implementations • 15 Nov 2024 • Huming Qiu, Guanxu Chen, Mi Zhang, Xiaohan Zhang, Xiaoyu You, Min Yang
Existing safe generation methods typically focus on suppressing inappropriate content by erasing undesired concepts from visual representations, while neglecting to sanitize the textual representation.
3 code implementations • 29 Aug 2024 • Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications.
Ranked #10 on Visual Question Answering on MM-Vet
1 code implementation • 12 Aug 2024 • Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Yuxuan Zhang, Weihan Wang, Yean Cheng, Bin Xu, Xiaotao Gu, Yuxiao Dong, Jie Tang
We present CogVideoX, a large-scale text-to-video generation model based on a diffusion transformer, which can generate 10-second continuous videos aligned with a text prompt, with a frame rate of 16 fps and a resolution of 768 * 1360 pixels.
2 code implementations • 8 Aug 2024 • Ahmad Arrabi, Xiaohan Zhang, Waqas Sultani, Chen Chen, Safwan Wshah
In this paper, we present a novel Geometric Preserving Ground-to-Aerial (G2A) image synthesis (GPG2A) model that can generate realistic aerial images from ground images.
no code implementations • 25 Jun 2024 • Xiaohan Zhang, Zainab Altaweel, Yohei Hayamizu, Yan Ding, Saeid Amiri, Hao Yang, Andy Kaminski, Chad Esselink, Shiqi Zhang
Vision-language models (VLMs) have been applied to robot task planning problems, where the robot receives a task in natural language and generates plans based on visual inputs.
no code implementations • 21 Jun 2024 • Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, Jie Tang
We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users.
7 code implementations • 18 Jun 2024 • Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Jingyu Sun, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, Peng Zhang, Qinkai Zheng, Rui Lu, Shuaiqi Duan, Shudan Zhang, Shulin Cao, Shuxun Yang, Weng Lam Tam, Wenyi Zhao, Xiao Liu, Xiao Xia, Xiaohan Zhang, Xiaotao Gu, Xin Lv, Xinghan Liu, Xinyi Liu, Xinyue Yang, Xixuan Song, Xunkai Zhang, Yifan An, Yifan Xu, Yilin Niu, Yuantao Yang, Yueyan Li, Yushi Bai, Yuxiao Dong, Zehan Qi, Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, Zihan Wang
We introduce ChatGLM, an evolving family of large language models that we have been developing over time.
no code implementations • 13 Jun 2024 • Yuhang Wu, Wenmeng Yu, Yean Cheng, Yan Wang, Xiaohan Zhang, Jiazheng Xu, Ming Ding, Yuxiao Dong
Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants.
1 code implementation • 12 Jun 2024 • Weihan Wang, Zehai He, Wenyi Hong, Yean Cheng, Xiaohan Zhang, Ji Qi, Xiaotao Gu, Shiyu Huang, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang
To address this gap, we introduce LVBench, a benchmark specifically designed for long video understanding.
1 code implementation • 25 May 2024 • Xue Zhang, Si-Yuan Cao, Fang Wang, Runmin Zhang, Zhe Wu, Xiaohan Zhang, Xiaokai Bai, Hui-Liang Shen
In this paper, we address this issue by improving the performance of efficient single-branch structures.
no code implementations • 10 May 2024 • Xiaohan Zhang, Yukui Qiu, Zhenyu Sun, Qi Liu
To that end, we propose Aerial-NeRF with three innovative modifications for jointly adapting NeRF to large-scale aerial rendering: (1) designing an adaptive spatial partitioning and selection method based on drones' poses to adapt to different flight trajectories; (2) using pose similarity, instead of an (expert) network, to determine which region a new viewpoint belongs to, which speeds up rendering; (3) developing an adaptive sampling approach that covers entire buildings at different heights to improve rendering performance.
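Modification (2) above amounts to routing a novel viewpoint to a pre-trained regional model by pose similarity rather than through an expert network. The sketch below illustrates one plausible realization; the centroid-distance criterion is our assumption, not the paper's exact metric.

```python
# Hedged sketch: assign a novel viewpoint to the sub-NeRF region whose
# training camera poses are most similar, instead of using a routing network.
import numpy as np

def assign_region(new_cam_pos, region_cam_positions):
    """region_cam_positions: list of (M_i, 3) arrays of training camera centers."""
    centroids = np.stack([p.mean(axis=0) for p in region_cam_positions])
    dists = np.linalg.norm(centroids - new_cam_pos, axis=1)
    return int(np.argmin(dists))  # render with the most similar region's model
```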
1 code implementation • 7 May 2024 • Shudan Zhang, Hanlin Zhao, Xiao Liu, Qinkai Zheng, Zehan Qi, Xiaotao Gu, Xiaohan Zhang, Yuxiao Dong, Jie Tang
To fill this gap, we propose NaturalCodeBench (NCB), a challenging code benchmark designed to mirror the complexity and variety of scenarios in real coding tasks.
1 code implementation • 18 Apr 2024 • Jingfeng Guo, Xiaohan Zhang, Baozhu Zhao, Qi Liu
Existing neural radiance fields (NeRF)-based novel view synthesis methods for large-scale outdoor scenes are mainly built on a single altitude.
no code implementations • 4 Apr 2024 • Jifan Yu, Xiaohan Zhang, Yifan Xu, Xuanyu Lei, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li
Recently, knowledge-grounded dialogue generation models, which intentionally invoke external knowledge resources to produce more informative responses, have also proven effective in reducing hallucination.
1 code implementation • 4 Apr 2024 • Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, Jie Tang
Large language models (LLMs) have fueled many intelligent web agents, but most existing ones perform far from satisfactorily in real-world web navigation tasks due to three factors: (1) the complexity of HTML text data, (2) the versatility of actions on webpages, and (3) task difficulty due to the open-domain nature of the web.
3 code implementations • 3 Apr 2024 • Yifan Xu, Xiao Liu, Xinghan Liu, Zhenyu Hou, Yueyan Li, Xiaohan Zhang, Zihan Wang, Aohan Zeng, Zhengxiao Du, Wenyi Zhao, Jie Tang, Yuxiao Dong
Large language models (LLMs) have shown excellent mastering of human language, but still struggle in real-world applications that require mathematical problem-solving.
no code implementations • 1 Apr 2024 • Zhenyu Hou, Yilin Niu, Zhengxiao Du, Xiaohan Zhang, Xiao Liu, Aohan Zeng, Qinkai Zheng, Minlie Huang, Hongning Wang, Jie Tang, Yuxiao Dong
The work presents our practices of aligning LLMs with human preferences, offering insights into the challenges and solutions in RLHF implementations.
no code implementations • 26 Mar 2024 • Xinpei Zhao, Jingyuan Sun, Shaonan Wang, Jing Ye, Xiaohan Zhang, Chengqing Zong
In contrast, we propose a simple yet effective method that guides text reconstruction by directly comparing them with the predicted text embeddings mapped from brain activities.
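The comparison with predicted text embeddings described above can be sketched as a simple re-ranking step: score candidate reconstructions by their similarity to an embedding predicted from brain activity. This is an illustrative sketch; `predict_text_embedding` and `embed_text` are hypothetical stand-ins for models described in the paper.

```python
# Hedged sketch: re-rank candidate reconstructions by cosine similarity to a
# text embedding predicted from brain activity. The helper functions are
# hypothetical placeholders, not APIs from the paper's release.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def rerank_candidates(brain_activity, candidates, predict_text_embedding, embed_text):
    target = predict_text_embedding(brain_activity)        # brain -> text-embedding space
    scored = [(cosine(embed_text(c), target), c) for c in candidates]
    return max(scored)[1]                                   # best-matching candidate text
```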
no code implementations • 17 Mar 2024 • Xiaohan Zhang, Bharat Lal Bhatnagar, Sebastian Starke, Ilya Petrov, Vladimir Guzov, Helisa Dhamo, Eduardo Pérez-Pellitero, Gerard Pons-Moll
Interactions between humans and objects are influenced not only by the object's pose and shape, but also by physical attributes such as object mass and surface friction.
2 code implementations • 4 Mar 2024 • Baozhu Zhao, Qiwei Xiong, Xiaohan Zhang, Jingfeng Guo, Qi Liu, Xiaofen Xing, Xiangmin Xu
Three-dimensional point cloud anomaly detection, which aims to detect anomalous data points in a training set, serves as the foundation for a variety of applications, including industrial inspection and autonomous driving.
no code implementations • 2 Mar 2024 • Yunhao Zhang, Xiaohan Zhang, Chong Li, Shaonan Wang, Chengqing Zong
Results show that language models share significant similarities with human cognitive data and the similarity patterns are modulated by the data modality and stimuli complexity.
no code implementations • CVPR 2024 • Arjun Majumdar, Anurag Ajay, Xiaohan Zhang, Pranav Putta, Sriram Yenamandra, Mikael Henaff, Sneha Silwal, Paul McVay, Oleksandr Maksymets, Sergio Arnaud, Karmesh Yadav, Qiyang Li, Ben Newman, Mohit Sharma, Vincent Berges, Shiqi Zhang, Pulkit Agrawal, Yonatan Bisk, Dhruv Batra, Mrinal Kalakrishnan, Franziska Meier, Chris Paxton, Alexander Sax, Aravind Rajeswaran
We present a modern formulation of Embodied Question Answering (EQA) as the task of understanding an environment well enough to answer questions about it in natural language.
1 code implementation • 22 Dec 2023 • Qianrui Zhou, Hua Xu, Hao Li, Hanlei Zhang, Xiaohan Zhang, Yifan Wang, Kai Gao
To establish an optimal multimodal semantic environment for text modality, we develop a modality-aware prompting module (MAP), which effectively aligns and fuses features from text, video and audio modalities with similarity-based modality alignment and cross-modality attention mechanism.
Ranked #2 on Multimodal Intent Recognition on MIntRec
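The cross-modality attention mechanism in the modality-aware prompting module described above can be sketched as text tokens attending over concatenated video and audio features. This is a minimal illustration; the dimensions, single attention layer, and residual fusion are our assumptions, not the paper's architecture.

```python
# Hedged sketch of cross-modality attention: text queries over video/audio keys.
import torch
import torch.nn as nn

class CrossModalityAttention(nn.Module):
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text_feats, video_feats, audio_feats):
        # text_feats: (B, Lt, D); video_feats: (B, Lv, D); audio_feats: (B, La, D)
        context = torch.cat([video_feats, audio_feats], dim=1)
        fused, _ = self.attn(query=text_feats, key=context, value=context)
        return text_feats + fused  # residual fusion into the text modality
```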
1 code implementation • 30 Nov 2023 • Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Xiaotao Gu, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang
Alignment has become a critical step for instruction-tuned Large Language Models (LLMs) to become helpful assistants.
1 code implementation • 28 Nov 2023 • Jinfeng Zhou, Zhuang Chen, Dazhen Wan, Bosi Wen, Yi Song, Jifan Yu, Yongkang Huang, Libiao Peng, Jiaming Yang, Xiyao Xiao, Sahand Sabour, Xiaohan Zhang, Wenjing Hou, Yijia Zhang, Yuxiao Dong, Jie Tang, Minlie Huang
In this paper, we present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters.
no code implementations • 20 Nov 2023 • Mayar Lotfy Mostafa, Anna Alperovich, Tommaso Giannantonio, Bjorn Barz, Xiaohan Zhang, Felix Holm, Nassir Navab, Felix Boehm, Carolin Schwamborn, Thomas K. Hoffmann, Patrick J. Schuler
Despite the limited dataset, the GNN-based model significantly outperforms context-agnostic approaches, accurately distinguishing between healthy and tumor tissues, even in images from previously unseen patients.
no code implementations • 5 Oct 2023 • Jingyuan Sun, Xiaohan Zhang, Marie-Francine Moens
To understand the algorithm that supports the human brain's language representation, previous research has attempted to predict neural responses to linguistic stimuli using embeddings generated by artificial neural networks (ANNs), a process known as neural encoding.
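The neural encoding recipe referred to here is typically a regularized linear map from ANN embeddings to brain responses, evaluated by held-out correlation. The sketch below shows that standard recipe for context; it is not the paper's exact setup.

```python
# Illustrative neural-encoding sketch: ridge regression from ANN embeddings of
# linguistic stimuli to brain responses, scored by per-voxel correlation.
import numpy as np
from sklearn.linear_model import RidgeCV

def fit_encoder(ann_embeddings, brain_responses, train_idx, test_idx):
    enc = RidgeCV(alphas=np.logspace(-2, 4, 13))
    enc.fit(ann_embeddings[train_idx], brain_responses[train_idx])
    pred = enc.predict(ann_embeddings[test_idx])
    true = brain_responses[test_idx]
    # per-voxel correlation as encoding performance
    return np.array([np.corrcoef(pred[:, v], true[:, v])[0, 1]
                     for v in range(true.shape[1])])
```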
1 code implementation • 18 Aug 2023 • Xiaohan Zhang, Xingyu Li, Waqas Sultani, Chen Chen, Safwan Wshah
We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details.
no code implementations • 4 Jul 2023 • Di Fan, Mingyang Liu, Xiaohan Zhang, Xiaopeng Gong
This paper proposes a human emotion recognition method based on automatically selected Galvanic Skin Response (GSR) signal features and an SVM classifier.
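A minimal sketch of the described pipeline follows: automatic feature selection over GSR features feeding an SVM. The feature count, selection criterion, and kernel are hypothetical choices, not those reported in the paper.

```python
# Hedged sketch: GSR feature selection + SVM emotion classifier.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_gsr_emotion_classifier(k_features=20):
    return make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=k_features),  # keep the most discriminative GSR features
        SVC(kernel="rbf", C=1.0),
    )

# Usage: clf = build_gsr_emotion_classifier(); clf.fit(X_train, y_train)
```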
1 code implementation • 15 Jun 2023 • Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan YAO, Ning Ding, Lei Hou, Zhiyuan Liu, Bin Xu, Jie Tang, Juanzi Li
The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations.
1 code implementation • 27 May 2023 • Yan Ding, Xiaohan Zhang, Saeid Amiri, Nieqing Cao, Hao Yang, Andy Kaminski, Chad Esselink, Shiqi Zhang
Each situation corresponds to a state instance wherein a robot is potentially unable to complete a task using a solution that normally works.
1 code implementation • 26 May 2023 • Xue Zhang, Xiaohan Zhang, Jiangtao Wang, Jiacheng Ying, Zehua Sheng, Heng Yu, Chunguang Li, Hui-Liang Shen
Different from them, we comprehensively analyze the impacts of false positives on the detection performance and find that enhancing feature contrast can significantly reduce these false positives.
1 code implementation • 22 Apr 2023 • Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, Peter Stone
LLM+P takes in a natural language description of a planning problem, then returns a correct (or optimal) plan for solving that problem in natural language.
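The LLM+P idea described above can be sketched as a three-step pipeline: the LLM translates the natural-language task into a PDDL problem, a classical planner solves it, and the resulting plan is returned (optionally verbalized). The helpers `call_llm` and `run_classical_planner` are hypothetical stand-ins, not APIs from the paper's release.

```python
# Hedged sketch of the LLM+P pipeline: NL task -> PDDL problem -> classical planner -> plan.
def llm_plus_p(problem_nl, domain_pddl, call_llm, run_classical_planner):
    prompt = (
        "Translate the following task into a PDDL problem file for this domain.\n"
        f"Domain:\n{domain_pddl}\nTask:\n{problem_nl}\nPDDL problem:"
    )
    problem_pddl = call_llm(prompt)                          # LLM writes the PDDL problem
    plan = run_classical_planner(domain_pddl, problem_pddl)  # sound/optimal symbolic search
    return plan                                              # optionally verbalized by the LLM
```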
no code implementations • 21 Apr 2023 • Priyam Parashar, Vidhi Jain, Xiaohan Zhang, Jay Vakil, Sam Powers, Yonatan Bisk, Chris Paxton
We see a 4x improvement over the baseline in the mobile manipulation setting.
1 code implementation • 28 Feb 2023 • Jing Zhang, Xiaokang Zhang, Daniel Zhang-li, Jifan Yu, Zijun Yao, Zeyao Ma, Yiqi Xu, Haohua Wang, Xiaohan Zhang, Nianyi Lin, Sunrui Lu, Juanzi Li, Jie Tang
We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese using a search engine to access the Internet knowledge.
no code implementations • 6 Feb 2023 • Tommaso Giannantonio, Anna Alperovich, Piercosimo Semeraro, Manfredo Atzori, Xiaohan Zhang, Christoph Hauger, Alexander Freytag, Siri Luthman, Roeland Vandebriel, Murali Jayapala, Lien Solie, Steven de Vleeschouwer
Surgery for gliomas (intrinsic brain tumors), especially when low-grade, is challenging due to the infiltrative nature of the lesion.
no code implementations • 26 Jan 2023 • Yong Xiao, Xiaohan Zhang, Guangming Shi, Marwan Krunz, Diep N. Nguyen, Dinh Thai Hoang
A joint optimization algorithm is proposed to minimize the overall time consumption of model training by jointly selecting the participating edge servers and the number of local training epochs.
1 code implementation • 8 Dec 2022 • Xiaohan Zhang, Xingyu Li, Waqas Sultani, Yi Zhou, Safwan Wshah
We attribute this deficiency to the lack of ability to extract the spatial configuration of visual feature layouts and models' overfitting on low-level details from the training set.
1 code implementation • 25 Oct 2022 • Xiaohan Zhang, Waqas Sultani, Safwan Wshah
In this paper, we present the first cross-view geo-localization method that works on a sequence of limited FOV images.
no code implementations • 19 Oct 2022 • Thomas Lew, Sumeet Singh, Mario Prats, Jeffrey Bingham, Jonathan Weisz, Benjie Holson, Xiaohan Zhang, Vikas Sindhwani, Yao Lu, Fei Xia, Peng Xu, Tingnan Zhang, Jie Tan, Montserrat Gonzalez
This problem is challenging, as it requires planning wiping actions while reasoning over uncertain latent dynamics of crumbs and spills captured via high-dimensional visual observations.
no code implementations • 14 Oct 2022 • Kun Yuan, Xinmeng Huang, Yiming Chen, Xiaohan Zhang, Yingya Zhang, Pan Pan
While Lu and Sa (2021) have recently provided an optimal rate for non-convex stochastic decentralized optimization with weight matrices defined over linear graphs, the optimal rate with general weight matrices remains unclear.
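For context, the weight (mixing) matrix referred to here enters through the standard gossip-style decentralized SGD update shown below; this is the textbook form of the setting, not the paper's specific algorithm.

```latex
% Standard decentralized SGD update over a mixing matrix W = [w_{ij}]
% (shown for context; the paper studies rates in this setting, not this exact scheme).
x_i^{(k+1)} \;=\; \sum_{j=1}^{n} w_{ij}\,\Bigl(x_j^{(k)} - \gamma \,\nabla F_j\bigl(x_j^{(k)};\,\xi_j^{(k)}\bigr)\Bigr),
\qquad W \ \text{doubly stochastic.}
```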
no code implementations • 4 Oct 2022 • Yan Ding, Xiaohan Zhang, Saeid Amiri, Nieqing Cao, Hao Yang, Chad Esselink, Shiqi Zhang
For systematic evaluations, we collected a dataset that includes 561 execution-time situations in a dining domain, where each situation corresponds to a state instance of a robot being potentially unable to complete a task using a solution that normally works.
no code implementations • 1 May 2022 • Xiaohan Zhang, Bharat Lal Bhatnagar, Vladimir Guzov, Sebastian Starke, Gerard Pons-Moll
In this work, we study the problem of synthesizing scene interactions conditioned on different contact positions on the object.
no code implementations • 11 Mar 2022 • Xiaohan Zhang, Songlin Dong, Jinjie Chen, Qi Tian, Yihong Gong, Xiaopeng Hong
In this paper, we focus on a new and challenging decentralized machine learning paradigm in which there are continuous inflows of data to be addressed and the data are stored in multiple repositories.
no code implementations • 30 Dec 2021 • Daniel Wilson, Xiaohan Zhang, Waqas Sultani, Safwan Wshah
The concept of geo-localization refers to the process of determining where on earth some 'entity' is located, typically using Global Positioning System (GPS) coordinates.
no code implementations • NeurIPS Workshop DLDE 2021 • Xiaohan Zhang
Our model advances the state-of-the-art machine learning PDE solvers in a few aspects: 1) the trainable parameters are reduced by $N$ times, where $N$ is the number of steps to discretize the PDE in time, 2) the model convergence rate is an order of magnitude faster, 3) our model has fewer tuning hyperparameters.
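One plausible way to realize the "parameters reduced by $N$ times" claim is to share a single sub-network across all discretized time steps instead of training one sub-network per step, as in the original deep BSDE solver. The sketch below illustrates that idea under our own assumptions about layer sizes and time conditioning; it is not the paper's architecture.

```python
# Hedged sketch: one time-conditioned sub-network reused at every time step,
# rather than N separate per-step networks. Layer sizes are hypothetical.
import torch
import torch.nn as nn

class SharedStepNet(nn.Module):
    def __init__(self, dim, hidden=110):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.ReLU(),  # +1 for the time coordinate
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        # One set of weights evaluated at every time step, conditioned on t.
        t_col = torch.full_like(x[:, :1], t)
        return self.net(torch.cat([x, t_col], dim=1))
```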
no code implementations • 13 Jul 2021 • Daniel Wilson, Thayer Alshaabi, Colin Van Oort, Xiaohan Zhang, Jonathan Nelson, Safwan Wshah
Geo-localizing static objects from street images is challenging but also very important for road asset mapping and autonomous driving.
no code implementations • 24 Jan 2021 • Xiaohan Zhang, Lu Liu, Guodong Long, Jing Jiang, Shenquan Liu
A typical way to study cognitive function is to record the electrical activity of animal neurons while the animals are trained to perform behavioral tasks.
1 code implementation • 17 Dec 2020 • Tianxiao Zhang, Xiaohan Zhang, Yiju Yang, Zongbo Wang, Guanghui Wang
The detection is performed on small image patches instead of the entire image to increase the performance of small ball detection.
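The patch-based strategy described above can be sketched as tiling the image, detecting on each tile, and mapping boxes back to full-image coordinates. This is an illustrative sketch; `detector`, the patch size, and the stride are hypothetical placeholders.

```python
# Hedged sketch: run detection on small overlapping patches and shift the
# resulting boxes back into full-image coordinates.
def detect_on_patches(image, detector, patch=320, stride=256):
    h, w = image.shape[:2]
    boxes = []
    for y0 in range(0, max(h - patch, 0) + 1, stride):
        for x0 in range(0, max(w - patch, 0) + 1, stride):
            crop = image[y0:y0 + patch, x0:x0 + patch]
            for (x1, y1, x2, y2, score) in detector(crop):
                boxes.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0, score))
    return boxes  # typically followed by non-maximum suppression across patches
```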
no code implementations • 7 Oct 2020 • Xiaohan Zhang
We develop a deep learning model to effectively solve high-dimensional nonlinear parabolic partial differential equations (PDE).
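For reference, the class of equations such deep solvers typically target can be written as a semilinear parabolic PDE of the form below; the paper's exact problem class may differ.

```latex
% General semilinear parabolic PDE (reference form; the paper's class may differ):
\frac{\partial u}{\partial t}(t,x)
  + \tfrac{1}{2}\,\mathrm{Tr}\!\bigl(\sigma\sigma^{\top}(t,x)\,\nabla_x^2 u(t,x)\bigr)
  + \mu(t,x)\!\cdot\!\nabla_x u(t,x)
  + f\bigl(t,x,\,u(t,x),\,\sigma^{\top}\nabla_x u(t,x)\bigr) = 0,
\qquad u(T,x) = g(x).
```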
no code implementations • 31 Aug 2019 • Zhuoran Yu, Aojun Zhou, Yukun Ma, Yudian Li, Xiaohan Zhang, Ping Luo
Experimental results show that SCT improves the accuracy of a single ResNet-50 on ImageNet by 1.7% and 11.5% when testing on image sizes of 224 and 128, respectively.