Search Results for author: Chi Zhang

Found 306 papers, 104 papers with code

Rethink the Role of Deep Learning towards Large-scale Quantum Systems

1 code implementation • 20 May 2025 • Yusheng Zhao, Chi Zhang, Yuxuan Du

Characterizing the ground state properties of quantum systems is fundamental to capturing their behavior but computationally challenging.

A Finite-Sample Analysis of Distributionally Robust Average-Reward Reinforcement Learning

no code implementations • 18 May 2025 • Zachary Roch, Chi Zhang, George Atia, Yue Wang

Robust reinforcement learning (RL) under the average-reward criterion is crucial for long-term decision making under potential environment mismatches, yet its finite-sample complexity study remains largely unexplored.

Reinforcement Learning (RL)

KINDLE: Knowledge-Guided Distillation for Prior-Free Gene Regulatory Network Inference

no code implementations • 14 May 2025 • Rui Peng, Yuchen Lu, Qichen Sun, Yuxing Lu, Chi Zhang, Ziru Liu, Jinzhuo Wang

Subsequent methods integrate prior knowledge to mitigate this challenge by restricting the solution space to biologically plausible interactions.

Synthesizing Images on Perceptual Boundaries of ANNs for Uncovering and Manipulating Human Perceptual Variability

no code implementations • 6 May 2025 • Chen Wei, Chi Zhang, Jiachen Zou, Haotian Deng, Dietmar Heinke, Quanying Liu

Human decision-making in cognitive tasks and daily life exhibits considerable variability, shaped by factors such as task difficulty, individual preferences, and personal experiences.

Decision Making

DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models

no code implementations • 22 Apr 2025 • Jie Zhu, Qian Chen, Huaixia Dou, Junhui Li, Lifan Guo, Feng Chen, Chi Zhang

Effective reasoning remains a core challenge for large language models (LLMs) in the financial domain, where tasks often require domain-specific knowledge, precise numerical calculations, and strict adherence to compliance rules.

Math

Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models

no code implementations • 19 Apr 2025 • Xinlin Zhuang, Jiahui Peng, Ren Ma, Yinfan Wang, Tianyi Bai, Xingjian Wei, Jiantao Qiu, Chi Zhang, Ying Qian, Conghui He

The composition of pre-training datasets for large language models (LLMs) remains largely undisclosed, hindering transparency and efforts to optimize data quality, a critical driver of model performance.

Physics Informed Constrained Learning of Dynamics from Static Data

1 code implementation • 17 Apr 2025 • Pengtao Dang, Tingbo Guo, Melissa Fishel, Guang Lin, Wenzhuo Wu, Sha Cao, Chi Zhang

In this study, we developed a new PINN learning paradigm, namely Constrained Learning, that enables the approximation of first-order derivatives or motions using non-time course or partially observed data.

Probing and Inducing Combinational Creativity in Vision-Language Models

no code implementations • 17 Apr 2025 • Yongqian Peng, Yuxi Ma, Mengmeng Wang, Yuxuan Wang, Yizhou Wang, Chi Zhang, Yixin Zhu, Zilong Zheng

The ability to combine existing concepts into novel ideas stands as a fundamental hallmark of human intelligence.

Phased Array Calibration based on Rotating-Element Harmonic Electric-Field Vector with Time Modulation

no code implementations • 17 Apr 2025 • Shiyuan Li, Yuyue Zhou, Chi Zhang, Liang Kong, Kebin Liu, Yihan Xie, Chong He

While amplitude-only calibration methods offer advantages when phase measurements are impractical, conventional approaches face two key challenges: they typically require high-resolution phase shifters and remain susceptible to phase errors inherent in these components.

OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding

no code implementations • 15 Apr 2025 • Dianbing Xi, Jiepeng Wang, Yuanzhi Liang, Xi Qiu, Yuchi Huo, Rui Wang, Chi Zhang, Xuelong Li

In this paper, we propose a novel framework for controllable video diffusion, OmniVDiff, aiming to synthesize and comprehend multiple video visual content in a single diffusion model.

Semantic Segmentation Video Generation +1

Learning to Be A Doctor: Searching for Effective Medical Agent Architectures

no code implementations • 15 Apr 2025 • Yangyang Zhuang, Wenjia Jiang, Jiayu Zhang, Ze Yang, Joey Tianyi Zhou, Chi Zhang

Motivated by the success of automated machine learning (AutoML), this paper introduces a novel framework for the automated design of medical agent architectures.

AutoML Diagnostic +1

A simulation-heuristics dual-process model for intuitive physics

1 code implementation • 13 Apr 2025 • Shiqian Li, Yuxi Ma, Jiajun Yan, Bo Dai, Yujia Peng, Chi Zhang, Yixin Zhu

The role of mental simulation in human physical reasoning is widely acknowledged, but whether it is employed across scenarios with varying simulation costs and where its boundary lies remains unclear.

HD-RAG: Retrieval-Augmented Generation for Hybrid Documents Containing Text and Hierarchical Tables

no code implementations • 13 Apr 2025 • Chi Zhang, Qiyang Chen

The Hybrid Document RAG task aims to integrate textual and hierarchical tabular data for more comprehensive retrieval and generation in complex scenarios.

Question Answering RAG +4

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

no code implementations • 10 Apr 2025 • ByteDance Seed, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, Riwei Chen, Liangqiang Chen, Zixin Chen, Jinsong Chen, Siyan Chen, Kaiyuan Chen, Zhi Chen, Jin Chen, Jiecao Chen, Jinxin Chi, Weinan Dai, Ning Dai, Jiahui Dai, Shihan Dou, Yantao Du, Zhengyin Du, Jianhui Duan, Chen Dun, Ting-Han Fan, Jiazhan Feng, Junda Feng, Ziyuan Feng, Yuwei Fu, Wenqi Fu, Hanjie Fu, Hao Ge, Hongyi Guo, Mingji Han, Li Han, Wenhao Hao, Xintong Hao, Qianyu He, Jerry He, Feng He, Wen Heng, Zehua Hong, Qi Hou, Liang Hu, Shengding Hu, Nan Hu, Kai Hua, Qi Huang, Ziyue Huang, Hongzhi Huang, Zihao Huang, Ting Huang, Wenhao Huang, Wei Jia, Bin Jia, Xiaoying Jia, Yuhua Jiang, Haobin Jiang, Ziheng Jiang, Kaihua Jiang, Chengquan Jiang, Jianpeng Jiao, Xiaoran Jin, Xing Jin, Xunhao Lai, Xiang Li, Liyi Li, Hongkai Li, Zheng Li, Shengxian Wan, Ya Wang, Yunshui Li, Chenggang Li, Niuniu Li, Siyu Li, Xi Li, Xiao Li, Aoyan Li, Yuntao Li, Nianning Liang, Xinnian Liang, Haibin Lin, Weijian Lin, Ye Lin, Zhicheng Liu, Guanlin Liu, Chenxiao Liu, Yan Liu, Gaohong Liu, Juncai Liu, Chundian Liu, Deyi Liu, Kaibo Liu, Siyao Liu, Qi Liu, Yongfei Liu, Kang Liu, Gan Liu, Boyi Liu, Rui Long, Weiqiang Lou, Chenwei Lou, Xiang Luo, Yao Luo, Caiping Lv, Heyang Lv, Bole Ma, Qianli Ma, Hongzhi Ma, Yiyuan Ma, Jin Ma, Wenchang Ma, Tingting Ma, Chen Mao, Qiyang Min, Zhe Nan, Guanghan Ning, Jinxiang Ou, Haojie Pan, Renming Pang, Yanghua Peng, Tao Peng, Lihua Qian, Mu Qiao, Meng Qu, Cheng Ren, Hongbin Ren, Yong Shan, Wei Shen, Ke Shen, Kai Shen, Guangming Sheng, Jinlong Shi, Wenlei Shi, Guang Shi, Shuai Shuai Cao, Yuxin Song, Zuquan Song, Jing Su, Yifan Sun, Tao Sun, Zewei Sun, Borui Wan, Xiaohui Wang, Xi Wang, Shuguang Wang, Jun Wang, Qinlong Wang, Chenyuan Wang, Shuai Wang, Zihan Wang, Changbao Wang, Jiaqiang Wang, Shihang Wang, Xuwu Wang, Zaiyuan Wang, Yuxuan Wang, Wenqi Wang, Taiqing Wang, Chengzhi Wei, Houmin Wei, Ziyun Wei, Shufa Wei, Zheng Wu, Yonghui Wu, Yangjun Wu, Bohong Wu, Shuang Wu, Jingqiao Wu, Ning Wu, Shuangzhi Wu, Jianmin Wu, Chenguang Xi, Fan Xia, Yuqiao Xian, Liang Xiang, Boren Xiang, Bowen Xiao, Zhen Xiao, Xia Xiao, Yongsheng Xiao, Chao Xin, Shulin Xin, Yuwen Xiong, Jingjing Xu, Ziwen Xu, Chenyin Xu, Jiayi Xu, Yifan Xu, Wei Xu, Yufei Xu, Shikun Xu, Shipeng Yan, Shen Yan, Qingping Yang, Xi Yang, Tianhao Yang, Yuehang Yang, Yuan Yang, Ximing Yang, Zeyu Yang, Guang Yang, Yifan Yang, Xuesong Yao, Bairen Yi, Fan Yin, Jianian Yin, Ziqiang Ying, Xiangyu Yu, Hongli Yu, Song Yu, Menghan Yu, Huan Yu, Siyu Yuan, Jun Yuan, Yutao Zeng, Tianyang Zhan, Zheng Zhang, Yun Zhang, Mofan Zhang, Wang Zhang, Ru Zhang, Zhi Zhang, Tianqi Zhang, Xinyi Zhang, Zhexi Zhang, Sijun Zhang, Wenqiang Zhang, Xiangxiang Zhang, Yongtao Zhang, Yuyu Zhang, Ge Zhang, He Zhang, Yue Zhang, Renjie Zheng, Ningxin Zheng, Zhuolin Zheng, Yaowei Zheng, Chen Zheng, Xiaoyun Zhi, Wanjun Zhong, Cheng Zhong, Zheng Zhong, Baoquan Zhong, Xun Zhou, Na Zhou, Huan Zhou, Hang Zhu, Defa Zhu, Wenjia Zhu, Lei Zuo

We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks.

Mixture-of-Experts reinforcement-learning +1

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

1 code implementation • 7 Apr 2025 • Yu Yue, Yufeng Yuan, Qiying Yu, Xiaochen Zuo, Ruofei Zhu, Wenyuan Xu, Jiaze Chen, Chengyi Wang, Tiantian Fan, Zhengyin Du, Xiangpeng Wei, Xiangyu Yu, Gaohong Liu, Juncai Liu, Lingjun Liu, Haibin Lin, Zhiqi Lin, Bole Ma, Chi Zhang, Mofan Zhang, Wang Zhang, Hang Zhu, Ru Zhang, Xin Liu, Mingxuan Wang, Yonghui Wu, Lin Yan

We present VAPO (Value-based Augmented Proximal Policy Optimization), a novel framework tailored for reasoning models within the value-based paradigm.

Patients Speak, AI Listens: LLM-based Analysis of Online Reviews Uncovers Key Drivers for Urgent Care Satisfaction

no code implementations • 26 Mar 2025 • Xiaoran Xu, Zhaoqian Xue, Chi Zhang, Jhonatan Medri, Junjie Xiong, Jiayan Zhou, Jin Jin, Yongfeng Zhang, Siyuan Ma, Lingyao Li

Our results show that interpersonal factors and operational efficiency emerge as the strongest determinants of patient satisfaction in urgent care, while technical quality, finances, and facilities show no significant independent effects when adjusted for in multivariate models.

Prompt Engineering

InstructVEdit: A Holistic Approach for Instructional Video Editing

no code implementations • 22 Mar 2025 • Chi Zhang, Chengjian Feng, Feng Yan, Qiming Zhang, Mingjin Zhang, Yujie Zhong, Jing Zhang, Lin Ma

Video editing according to instructions is a highly challenging task due to the difficulty in collecting large-scale, high-quality edited video pair data.

Video Editing

Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness

no code implementations • 12 Mar 2025 • Beier Zhu, Jiequan Cui, Hanwang Zhang, Chi Zhang

While image-text foundation models have succeeded across diverse downstream tasks, they still face challenges in the presence of spurious correlations between the input and label.

parameter-efficient fine-tuning

Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge

1 code implementation • 5 Mar 2025 • Fanwen Wang, Zi Wang, Yan Li, Jun Lyu, Chen Qin, Shuo Wang, Kunyuan Guo, Mengting Sun, Mingkai Huang, Haoyu Zhang, Michael Tänzer, Qirong Li, Xinran Chen, Jiahao Huang, Yinzhe Wu, Kian Anvari Hamedani, Yuntong Lyu, Longyu Sun, Qing Li, Ziqiang Xu, Bingyu Xin, Dimitris N. Metaxas, Narges Razizadeh, Shahabedin Nabavi, George Yiasemis, Jonas Teuwen, Zhenxi Zhang, Sha Wang, Chi Zhang, Daniel B. Ennis, Zhihao Xue, Chenxi Hu, Ruru Xu, Ilkay Oksuz, Donghang Lyu, Yanxin Huang, Xinrui Guo, Ruqian Hao, Jaykumar H. Patel, Guanke Cai, Binghua Chen, Yajing Zhang, Sha Hua, Zhensen Chen, Qi Dou, Xiahai Zhuang, Qian Tao, Wenjia Bai, Jing Qin, He Wang, Claudia Prieto, Michael Markl, Alistair Young, Hao Li, Xihong Hu, Lianmin Wu, Xiaobo Qu, Guang Yang, Chengyan Wang

In addition, through a detailed analysis of the results submitted to the challenge, we have also made several findings, including: 1) adaptive prompt-learning embedding is an effective means for achieving strong generalization in reconstruction models; 2) enhanced data consistency based on physics-informed networks is also an effective pathway toward a universal model; 3) traditional evaluation metrics have limitations when assessing ground-truth references with moderate or lower image quality, highlighting the need for subjective evaluation methods.

Benchmarking Image Reconstruction +3

AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

no code implementations • 4 Mar 2025 • Wenjia Jiang, Yangyang Zhuang, Chenxi Song, Xu Yang, Chi Zhang

This allows the agent to focus on tasks requiring more complex reasoning, while simplifying routine actions.

Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator

1 code implementation • 26 Feb 2025 • Xiankang He, Dongyan Guo, Hongji Li, Ruibo Li, Ying Cui, Chi Zhang

Recent advances in zero-shot monocular depth estimation (MDE) have significantly improved generalization by unifying depth distributions through normalized depth representations and by leveraging large-scale unlabeled data via pseudo-label distillation.

Diversity Monocular Depth Estimation +2

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation

no code implementations • 20 Feb 2025 • Mingfu Liang, Xi Liu, Rong Jin, Boyang Liu, Qiuling Suo, Qinghai Zhou, Song Zhou, Laming Chen, Hua Zheng, Zhiyuan Li, Shali Jiang, Jiyan Yang, Xiaozhen Xia, Fan Yang, Yasmine Badr, Ellie Wen, Shuyu Xu, Hansey Chen, Zhengyu Zhang, Jade Nie, Chunzhi Yang, Zhichen Zeng, Weilin Zhang, Xingliang Huang, Qianru Li, Shiquan Wang, Evelyn Lyu, Wenjing Lu, Rui Zhang, Wenjun Wang, Jason Rudy, Mengyue Hang, Kai Wang, Yinbin Ma, Shuaiwen Wang, Sihan Zeng, Tongyi Tang, Xiaohan Wei, Longhao Jin, Jamey Zhang, Marcus Chen, Jiayi Xu, Angie Huang, Xihuan Zeng, Chi Zhang, Zhengli Zhao, Jared Yang, Qiang Jin, Xian Chen, Amit Anand Amlesahwaram, Lexi Song, Liang Luo, Yuchen Hao, Nan Xiao, Yavuz Yetim, Luoshang Pan, Gaoxiang Liu, Yuxi Hu, Yuzhen Huang, Jackie Xu, Rich Zhu, Xin Zhang, Yiqun Liu, Hang Yin, Yuxin Chen, Buyun Zhang, Xiaoyi Liu, Xingyuan Wang, Wenguang Mao, Zhijing Li, Zhehui Zhou, Feifan Gu, Qin Huang, Chonglin Sun, Nancy Yu, Shuo Gu, Shupin Mao, Benjamin Au, Jingzheng Qin, Peggy Yao, Jae-Woo Choi, Bin Gao, Ernest Wang, Lei Zhang, Wen-Yen Chen, Ted Lee, Jay Zha, Yi Meng, Alex Gong, Edison Gao, Alireza Vahdatpour, Yiping Han, Yantao Yao, Toshinari Kureha, Shuo Chang, Musharaf Sultan, John Bocharov, Sagar Chordia, Xiaorui Gan, Peng Sun, Rocky Liu, Bo Long, Wenlin Chen, Santanu Kolay, Huayu Li

Second, large-volume data arrive in a streaming mode with data distributions dynamically shifting, as new users/ads join and existing users/ads leave the system.

Data Augmentation

Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion

1 code implementation • 20 Feb 2025 • Jiangyuan Liu, Hongxuan Ma, Yuxin Guo, Yuhao Zhao, Chi Zhang, Wei Sui, Wei Zou

To address these issues, we propose a monocular framework, which is the first to excel in both segmentation and depth estimation of transparent objects, with only a single-image input.

Monocular Depth Estimation Transparent objects

Neural Force Field: Learning Generalized Physical Representation from a Few Examples

no code implementations • 13 Feb 2025 • Shiqian Li, Ruihong Shen, Chi Zhang, Yixin Zhu

Physical reasoning is a remarkable human ability that enables rapid learning and generalization from limited experience.

PoI: Pixel of Interest for Novel View Synthesis Assisted Scene Coordinate Regression

no code implementations • 7 Feb 2025 • Feifei Li, Qi Song, Chi Zhang, Hui Shuai, Rui Huang

The task of estimating camera poses can be enhanced through novel view synthesis techniques such as NeRF and Gaussian Splatting to increase the diversity and extension of training data.

Diversity NeRF +2

MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent

no code implementations • 5 Feb 2025 • Xinyao Liao, Xianfang Zeng, Liao Wang, Gang Yu, Guosheng Lin, Chi Zhang

Specifically, the agent extracts the object movement and camera motion described in the text and converts them into object trajectories and camera extrinsics, respectively.

Image to Video Generation Motion Generation +1

Learning to Plan with Personalized Preferences

no code implementations • 2 Feb 2025 • Manjie Xu, Xinyi Yang, Wei Liang, Chi Zhang, Yixin Zhu

Effective integration of AI agents into daily life requires them to understand and adapt to individual human preferences, particularly in collaborative roles.

Behavior Modeling Space Reconstruction for E-Commerce Search

no code implementations • 30 Jan 2025 • Yejing Wang, Chi Zhang, Xiangyu Zhao, Qidong Liu, Maolin Wang, Xuetao Wei, Zitao Liu, Xing Shi, Xudong Yang, Ling Zhong, Wei Lin

Delivering superior search services is crucial for enhancing customer experience and driving revenue growth.

Exploring Siamese Networks in Self-Supervised Fast MRI Reconstruction

no code implementations • 18 Jan 2025 • Liyan Sun, Shaocong Yu, Chi Zhang, Xinghao Ding

Reconstructing MR images using deep neural networks from undersampled k-space data without using fully sampled training references offers significant value in practice, which is a self-supervised regression problem calling for effective prior knowledge and supervision.

MRI Reconstruction Self-Supervised Learning

DualOpt: A Dual Divide-and-Optimize Algorithm for the Large-scale Traveling Salesman Problem

1 code implementation • 15 Jan 2025 • Shipei Zhou, Yuandong Ding, Chi Zhang, Zhiguang Cao, Yan Jin

This paper proposes a dual divide-and-optimize algorithm (DualOpt) for solving the large-scale traveling salesman problem (TSP).

Computational Efficiency Traveling Salesman Problem

Molecule-dynamic-based Aging Clock and Aging Roadmap Forecast with Sundial

no code implementations • 4 Jan 2025 • Wei Wu, Zizhen Deng, Chi Zhang, Can Liao, Jinzhuo Wang

Addressing the unavoidable bias inherent in supervised aging clocks, we introduce Sundial, a novel framework that models molecular dynamics through a diffusion field, capturing both the population-level aging process and the individual-level relative aging order.

Training-Free Mitigation of Adversarial Attacks on Deep Learning-Based MRI Reconstruction

no code implementations • 3 Jan 2025 • Mahdi Saberi, Chi Zhang, Mehmet Akcakaya

In this work, we propose a novel approach for mitigating adversarial attacks on MRI reconstruction models without any retraining.

MRI Reconstruction

VAST 1.0: A Unified Framework for Controllable and Consistent Video Generation

no code implementations • 21 Dec 2024 • Chi Zhang, Yuanzhi Liang, Xi Qiu, Fangqiu Yi, Xuelong Li

Generating high-quality videos from textual descriptions poses challenges in maintaining temporal coherence and control over subject motion.

Video Generation

Proposing and solving olympiad geometry with guided tree search

no code implementations • 14 Dec 2024 • Chi Zhang, Jiajun Song, Siyu Li, Yitao Liang, Yuxi Ma, Wei Wang, Yixin Zhu, Song-Chun Zhu

Mathematics olympiads are prestigious competitions, with problem proposing and solving highly honored.

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

no code implementations • 11 Dec 2024 • Mingkun Lei, Xue Song, Beier Zhu, Hao Wang, Chi Zhang

Recent advancements in text-to-image models have improved the nuance of style transformations, yet significant challenges remain, particularly with overfitting to reference styles, limiting stylistic control, and misaligning with textual content.

Style Transfer

UniScene: Unified Occupancy-centric Driving Scene Generation

no code implementations • 6 Dec 2024 • Bohan Li, Jiazhe Guo, Hongsi Liu, Yingshuang Zou, Yikang Ding, Xiwu Chen, Hu Zhu, Feiyang Tan, Chi Zhang, Tiancai Wang, Shuchang Zhou, Li Zhang, Xiaojuan Qi, Hao Zhao, Mu Yang, Wenjun Zeng, Xin Jin

UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies of Gaussian-based Joint Rendering and Prior-guided Sparse Modeling.

Autonomous Driving Scene Generation

Human Multi-View Synthesis from a Single-View Model: Transferred Body and Face Representations

no code implementations • 4 Dec 2024 • Yu Feng, Shunsi Zhang, Jian Shu, HanFeng Zhao, Guoliang Pang, Chi Zhang, Hao Wang

Specifically, we use a single-view model pretrained on a large-scale human dataset to develop a multi-view body representation, aiming to extend the 2D knowledge of the single-view model to a multi-view diffusion model.

Novel View Synthesis

Predicting Pedestrian Crossing Behavior in Germany and Japan: Insights into Model Transferability

no code implementations • 4 Dec 2024 • Chi Zhang, Janis Sprenger, Zhongjun Ni, Christian Berger

When comparing the differences between countries, pedestrians from the study conducted in Japan are more cautious, selecting larger gaps compared to those in Germany.

Prediction Trajectory Prediction

LoRA of Change: Learning to Generate LoRA for the Editing Instruction from A Single Before-After Image Pair

no code implementations • 28 Nov 2024 • Xue Song, Jiequan Cui, Hanwang Zhang, Jiaxin Shi, Jingjing Chen, Chi Zhang, Yu-Gang Jiang

Furthermore, generalizable models for image editing with visual instructions typically require quad data, i.e., a before-after image pair, along with query and target images.

Specificity Text-based Image Editing

Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

1 code implementation • 20 Nov 2024 • Yoel Zimmermann, Adib Bazgir, Zartashia Afzal, Fariha Agbere, Qianxiang Ai, Nawaf Alampara, Alexander Al-Feghali, Mehrad Ansari, Dmytro Antypov, Amro Aswad, Jiaru Bai, Viktoriia Baibakova, Devi Dutta Biswajeet, Erik Bitzek, Joshua D. Bocarsly, Anna Borisova, Andres M Bran, L. Catherine Brinson, Marcel Moran Calderon, Alessandro Canalicchio, Victor Chen, Yuan Chiang, Defne Circi, Benjamin Charmes, Vikrant Chaudhary, Zizhang Chen, Min-Hsueh Chiu, Judith Clymo, Kedar Dabhadkar, Nathan Daelman, Archit Datar, Wibe A. de Jong, Matthew L. Evans, Maryam Ghazizade Fard, Giuseppe Fisicaro, Abhijeet Sadashiv Gangan, Janine George, Jose D. Cojal Gonzalez, Michael Götte, Ankur K. Gupta, Hassan Harb, Pengyu Hong, Abdelrahman Ibrahim, Ahmed Ilyas, Alishba Imran, Kevin Ishimwe, Ramsey Issa, Kevin Maik Jablonka, Colin Jones, Tyler R. Josephson, Greg Juhasz, Sarthak Kapoor, Rongda Kang, Ghazal Khalighinejad, Sartaaj Khan, Sascha Klawohn, Suneel Kuman, Alvin Noe Ladines, Sarom Leang, Magdalena Lederbauer, Sheng-Lun, Liao, Hao liu, Xuefeng Liu, Stanley Lo, Sandeep Madireddy, Piyush Ranjan Maharana, Shagun Maheshwari, Soroush Mahjoubi, José A. Márquez, Rob Mills, Trupti Mohanty, Bernadette Mohr, Seyed Mohamad Moosavi, Alexander Moßhammer, Amirhossein D. Naghdi, Aakash Naik, Oleksandr Narykov, Hampus Näsström, Xuan Vu Nguyen, Xinyi Ni, Dana O'Connor, Teslim Olayiwola, Federico Ottomano, Aleyna Beste Ozhan, Sebastian Pagel, Chiku Parida, Jaehee Park, Vraj Patel, Elena Patyukova, Martin Hoffmann Petersen, Luis Pinto, José M. Pizarro, Dieter Plessers, Tapashree Pradhan, Utkarsh Pratiush, Charishma Puli, Andrew Qin, Mahyar Rajabi, Francesco Ricci, Elliot Risch, Martiño Ríos-García, Aritra Roy, Tehseen Rug, Hasan M Sayeed, Markus Scheidgen, Mara Schilling-Wilhelmi, Marcel Schloz, Fabian Schöppach, Julia Schumann, Philippe Schwaller, Marcus Schwarting, Samiha Sharlin, Kevin Shen, Jiale Shi, Pradip Si, Jennifer D'Souza, Taylor Sparks, Suraj Sudhakar, Leopold Talirz, Dandan Tang, Olga Taran, Carla Terboven, Mark Tropin, Anastasiia Tsymbal, Katharina Ueltzen, Pablo Andres Unzueta, Archit Vasan, Tirtha Vinchurkar, Trung Vo, Gabriel Vogel, Christoph Völker, Jan Weinreich, Faradawn Yang, Mohd Zaki, Chi Zhang, Sylvester Zhang, Weijie Zhang, Ruijie Zhu, Shang Zhu, Jan Janssen, Calvin Li, Ian Foster, Ben Blaiszik

Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions.

Language Modeling Language Modelling +2

On the Foundation Model for Cardiac MRI Reconstruction

no code implementations • 15 Nov 2024 • Chi Zhang, Michael Loecher, Cagan Alkan, Mahmut Yurt, Shreyas S. Vasanawala, Daniel B. Ennis

ML-based reconstruction, however, also requires substantial data and computational time to train the neural network, which is often optimized for a fixed acceleration rate or image contrast.

model MRI Reconstruction

Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models

no code implementations • 14 Nov 2024 • Chutian Meng, Fan Ma, Jiaxu Miao, Chi Zhang, Yi Yang, Yueting Zhuang

We use GPT4V to bridge the gap between the reference image and the text input for the T2I model, allowing T2I models to understand image content.

Image Generation

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

no code implementations • 4 Nov 2024 • Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, Liang Pan

Furthermore, MVPaint employs a UVR module to improve the texture quality in the UV space, which first performs a UV-space Super-Resolution, followed by a Spatial-aware Seam-Smoothing algorithm for revising spatial texturing discontinuities caused by UV unwrapping.

3D Inpainting Super-Resolution

LLM4PR: Improving Post-Ranking in Search Engine with Large Language Models

no code implementations • 2 Nov 2024 • Yang Yan, Yihao Wang, Chi Zhang, Wenyuan Hou, Kang Pan, Xingkai Ren, Zelun Wu, Zhixin Zhai, Enyun Yu, Wenwu Ou, Yang song

In this study, we introduce a novel paradigm named Large Language Models for Post-Ranking in search engine (LLM4PR), which leverages the capabilities of LLMs to accomplish the post-ranking task in SE.

Information Retrieval

ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object

no code implementations • 14 Oct 2024 • Jiwei Chen, Laiyan Ding, Chi Zhang, Feifei Li, Rui Huang

Vision-based BEV (Bird-Eye-View) 3D object detection has recently become popular in autonomous driving.

3D Object Detection Autonomous Driving +1

HybridFlow: A Flexible and Efficient RLHF Framework

3 code implementations • 28 Sep 2024 • Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes data dependencies between the NNs.

Large Language Model

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

no code implementations • 25 Sep 2024 • Chi Zhang, Huaping Zhong, Kuan Zhang, Chengliang Chai, Rui Wang, Xinlin Zhuang, Tianyi Bai, Jiantao Qiu, Lei Cao, Ju Fan, Ye Yuan, Guoren Wang, Conghui He

For each cluster, if we opt to select data from it, we take some samples to evaluate the influence to prevent processing all instances.

Diversity

dnaGrinder: a lightweight and high-capacity genomic foundation model

no code implementations • 24 Sep 2024 • Qihang Zhao, Chi Zhang, Weixiong Zhang

In this context, recent advancements in large language model research have led to the development of both encoder-only and decoder-only foundation models designed to decode intricate information in DNA sequences.

Decoder Language Modelling +1

MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization

1 code implementation • 5 Aug 2024 • YiWen Chen, Yikai Wang, Yihao Luo, Zhengyi Wang, Zilong Chen, Jun Zhu, Chi Zhang, Guosheng Lin

Meshes are the de facto 3D representation in the industry but are labor-intensive to produce.

AppAgent v2: Advanced Agent for Flexible Mobile Interactions

no code implementations • 5 Aug 2024 • Yanda Li, Chi Zhang, Wanqi Yang, Bin Fu, Pei Cheng, Xin Chen, Ling Chen, Yunchao Wei

In the deployment phase, RAG technology enables efficient retrieval and update from this knowledge base, thereby empowering the agent to perform tasks effectively and accurately.

RAG

Efficient and Long-Tailed Generalization for Pre-trained Vision-Language Model

1 code implementation • 18 Jun 2024 • Jiang-Xin Shi, Chi Zhang, Tong Wei, Yu-Feng Li

For efficient adaptation, we treat the CLIP model as a black box and leverage the extracted features to obtain visual and textual prototypes for prediction.

Image-text matching Language Modeling +2

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

1 code implementation • 14 Jun 2024 • YiWen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang

Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement.

Decoder

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

1 code implementation • 13 Jun 2024 • Yucheng Han, Rui Wang, Chi Zhang, Juntao Hu, Pei Cheng, Bin Fu, Hanwang Zhang

Recent advancements in image generation have enabled the creation of high-quality images from text conditions.

Conditional Image Generation

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

1 code implementation • 31 May 2024 • Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen

The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, which is widely preferred in various applications.

Language Modeling Language Modelling +1

Learning Monocular Depth from Focus with Event Focal Stack

no code implementations • 11 May 2024 • Chenxu Jiang, Mingyuan Lin, Chi Zhang, Zhenghai Wang, Lei Yu

Depth from Focus estimates depth by determining the moment of maximum focus from multiple shots at different focal distances, i.e., the Focal Stack.

Super-Resolving Blurry Images with Events

no code implementations • 11 May 2024 • Chi Zhang, Mingyuan Lin, Xiang Zhang, Chenxu Jiang, Lei Yu

Super-resolution from motion-blurred images poses a significant challenge due to the combined effects of motion blur and low spatial resolution.

Super-Resolution

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

1 code implementation • 6 May 2024 • Zheng Zhu, XiaoFeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang

General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems.

Autonomous Driving Decision Making +2

Non-Uniform Exposure Imaging via Neuromorphic Shutter Control

no code implementations • 22 Apr 2024 • Mingyuan Lin, Jian Liu, Chi Zhang, Zibo Zhao, Chu He, Lei Yu

To address this challenge, we propose a novel Neuromorphic Shutter Control (NSC) system to avoid motion blurs and alleviate instant noises, where the extremely low latency of events is leveraged to monitor the real-time motion and facilitate the scene-adaptive exposure.

Image Denoising Self-Supervised Learning

SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus Images

no code implementations • 20 Apr 2024 • Jiaqi Wang, Mengtian Kang, Yong liu, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Shuo Gao, Luigi G. Occhipinti

Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results.

Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning

no code implementations20 Apr 2024 Yong Liu, Mengtian Kang, Shuo Gao, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Arokia Nathan, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Luigi Occhipinti

Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis.

FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving

1 code implementation19 Apr 2024 Xingtai Gui, Tengteng Huang, Haonan Shao, Haotian Yao, Chi Zhang

In this paper, we propose a simple yet effective fully end-to-end framework named Future Instance Prediction Transformer (FipTR), which views the task as BEV instance segmentation and prediction for future frames.

Autonomous Driving Instance Segmentation +3

HybriMap: Hybrid Clues Utilization for Effective Vectorized HD Map Construction

no code implementations17 Apr 2024 Chi Zhang, Qi Song, Feifei Li, Yongquan Chen, Rui Huang

Constructing vectorized high-definition maps from surround-view cameras has garnered significant attention in recent years.

Predicting and Analyzing Pedestrian Crossing Behavior at Unsignalized Crossings

no code implementations15 Apr 2024 Chi Zhang, Janis Sprenger, Zhongjun Ni, Christian Berger

Predicting gap selection behavior and the use of zebra crossings enables driving systems to proactively respond and prevent potential conflicts.

MotionChain: Conversational Motion Controllers via Multimodal Prompts

1 code implementation2 Apr 2024 Biao Jiang, Xin Chen, Chi Zhang, Fukun Yin, Zhuoyuan Li, Gang Yu, Jiayuan Fan

However, this proficiency remains largely unexplored in other multimodal generative models, particularly in human motion models.

Language Modeling Language Modelling +1

Edge-based Parametric Digital Twins for Intelligent Building Indoor Climate Modeling

no code implementations7 Mar 2024 Zhongjun Ni, Chi Zhang, Magnus Karlsson, Shaofang Gong

Digital transformation in the built environment generates vast data for developing data-driven models to optimize building operations.

Deep Learning Edge-computing +1

Neural Networks with (Low-Precision) Polynomial Approximations: New Insights and Techniques for Accuracy Improvement

no code implementations17 Feb 2024 Chi Zhang, Jingjing Fan, Man Ho Au, Siu Ming Yiu

Experiments showed that the combination of our solutions is very effective: at the same precision, our PANN is 10% to 50% more accurate than the state of the art; and at the same accuracy, our PANN requires a precision of only 2^{-9}, whereas the state-of-the-art solution requires a precision of 2^{-12}, using the ResNet-20 model on the CIFAR-10 dataset.

Privacy Preserving

CounterCLR: Counterfactual Contrastive Learning with Non-random Missing Data in Recommendation

no code implementations8 Feb 2024 Jun Wang, Haoxuan Li, Chi Zhang, Dongxu Liang, Enyun Yu, Wenwu Ou, Wenjia Wang

Recommender systems are designed to learn user preferences from observed feedback and comprise many fundamental tasks, such as rating prediction and post-click conversion rate (pCVR) prediction.

Contrastive Learning counterfactual +3

Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection

2 code implementations6 Feb 2024 Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou

Multi-view 3D object detection systems often struggle with generating precise predictions due to the challenges in estimating depth from images, increasing redundant and incorrect detections.

3D Object Detection Denoising +1

Integration of cognitive tasks into artificial general intelligence test for large models

no code implementations4 Feb 2024 Youzhi Qu, Chen Wei, Penghui Du, Wenxin Che, Chi Zhang, Wanli Ouyang, Yatao Bian, Feiyang Xu, Bin Hu, Kai Du, Haiyan Wu, Jia Liu, Quanying Liu

During the evolution of large models, performance evaluation is necessarily performed to assess their capabilities and ensure safety before practical application.

A Survey on Data-Centric Recommender Systems

no code implementations31 Jan 2024 Riwei Lai, Rui Chen, Chi Zhang

Recommender systems (RSs) have become an essential tool for mitigating information overload in a range of real-world applications.

Recommendation Systems Survey

Stream Query Denoising for Vectorized HD Map Construction

no code implementations17 Jan 2024 Shuo Wang, Fan Jia, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao

This paper introduces the Stream Query Denoising (SQD) strategy as a novel approach for temporal modeling in high-definition map (HD-map) construction.

Autonomous Driving Denoising

Adaptive Hardness Negative Sampling for Collaborative Filtering

1 code implementation10 Jan 2024 Riwei Lai, Rui Chen, Qilong Han, Chi Zhang, Li Chen

Negative sampling is essential for implicit collaborative filtering to provide proper negative training signals so as to achieve desirable performance.

Collaborative Filtering

DreamGaussian4D: Generative 4D Gaussian Splatting

1 code implementation28 Dec 2023 Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, Ziwei Liu

Specifically, we propose an integral framework with two major modules: 1) Image-to-4D GS - we initially generate static GS with DreamGaussianHD, followed by HexPlane-based dynamic generation with Gaussian deformation; and 2) Video-to-Video Texture Refinement - we refine the generated UV-space texture maps and meanwhile enhance their temporal consistency by utilizing a pre-trained image-to-video diffusion model.

Video Generation

AppAgent: Multimodal Agents as Smartphone Users

1 code implementation21 Dec 2023 Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu

Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks.

Navigate

Solving the swing-up and balance task for the Acrobot and Pendubot with SAC

no code implementations18 Dec 2023 Chi Zhang, Akhil Sathuluri, Markus Zimmermann

We present a solution to the swing-up and balance task for the Pendubot and Acrobot for participation in the AI Olympics competition at IJCAI 2023.

Acrobot Position +2

M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

2 code implementations17 Dec 2023 Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen

Furthermore, we establish a new benchmark for assessing the performance of large models in understanding multi-modal 3D prompts.

Instruction Following

Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models

2 code implementations15 Dec 2023 Xu Yang, Yingzhe Peng, Haoxuan Ma, Shuo Xu, Chi Zhang, Yucheng Han, Hanwang Zhang

As Archimedes famously said, "Give me a lever long enough and a fulcrum on which to place it, and I shall move the world." In this study, we propose to use a tiny Language Model (LM), e.g., a Transformer with 67M parameters, to lever much larger Vision-Language Models (LVLMs) with 9B parameters.

Image Captioning In-Context Learning +4

Creative Agents: Empowering Agents with Imagination for Creative Tasks

1 code implementation5 Dec 2023 Chi Zhang, Penglin Cai, Yuhui Fu, Haoqi Yuan, Zongqing Lu

We benchmark creative tasks with the challenging open-world game Minecraft, where the agents are asked to create diverse buildings given free-form language instructions.

Instruction Following Language Modelling +2

FaceStudio: Put Your Face Everywhere in Seconds

no code implementations5 Dec 2023 Yuxuan Yan, Chi Zhang, Rui Wang, Yichao Zhou, Gege Zhang, Pei Cheng, Gang Yu, Bin Fu

This study investigates identity-preserving image synthesis, an intriguing task in image generation that seeks to maintain a subject's identity while adding a personalized, stylistic touch.

Image Generation

I-PHYRE: Interactive Physical Reasoning

no code implementations4 Dec 2023 Shiqian Li, Kewen Wu, Chi Zhang, Yixin Zhu

Current evaluation protocols predominantly assess physical reasoning in stationary scenes, creating a gap in evaluating agents' abilities to interact with dynamic events.

Zero-shot Generalization

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

1 code implementation30 Nov 2023 Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen

However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud 3D representations of the 3D scene.

3D dense captioning Dense Captioning +1

ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model

no code implementations29 Nov 2023 Fukun Yin, Xin Chen, Chi Zhang, Biao Jiang, Zibo Zhao, Jiayuan Fan, Gang Yu, Taihao Li, Tao Chen

The advent of large language models, enabling flexibility through instruction-driven approaches, has revolutionized many traditional generative tasks, but large models for 3D data, particularly in comprehensively handling 3D shapes with other modalities, are still under-explored.

3D Shape Generation Language Modeling +2

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

1 code implementation CVPR 2024 Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

This work notably propels the field of autonomous driving by effectively augmenting the training dataset used for advanced BEV perception techniques.

Autonomous Driving Video Generation

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

no code implementations22 Nov 2023 Chi Zhang, Zifan Wang, Ravi Mangal, Matt Fredrikson, Limin Jia, Corina Pasareanu

They improve upon previous neural network models of code, such as code2seq or seq2seq, that already demonstrated competitive results when performing tasks such as code summarization and identifying code vulnerabilities.

Code Summarization

ADriver-I: A General World Model for Autonomous Driving

no code implementations22 Nov 2023 Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, Tiancai Wang

Based on the vision-action pairs, we construct a general world model based on MLLM and diffusion model for autonomous driving, termed ADriver-I.

Autonomous Driving

Self-Supervised 3D Scene Flow Estimation and Motion Prediction using Local Rigidity Prior

1 code implementation17 Oct 2023 Ruibo Li, Chi Zhang, Zhe Wang, Chunhua Shen, Guosheng Lin

By rigidly aligning each region with its potential counterpart in the target point cloud, we obtain a region-specific rigid transformation to generate its pseudo flow labels.

Motion Estimation motion prediction +2

CrossZoom: Simultaneously Motion Deblurring and Event Super-Resolving

1 code implementation29 Sep 2023 Chi Zhang, Xiang Zhang, Mingyuan Lin, Cheng Li, Chu He, Wen Yang, Gui-Song Xia, Lei Yu

Even though the collaboration between traditional and neuromorphic event cameras brings prosperity to frame-event based vision applications, the performance is still confined by the resolution gap crossing two modalities in both spatial and temporal domains.

Deblurring Event-based vision

Learning Parallax for Stereo Event-based Motion Deblurring

no code implementations18 Sep 2023 Mingyuan Lin, Chi Zhang, Chu He, Lei Yu

To tackle this problem, we propose a novel coarse-to-fine framework, named NETwork of Event-based motion Deblurring with STereo event and intensity cameras (St-EDNet), to recover high-quality images directly from the misaligned inputs, consisting of a single blurry image and the concurrent event streams.

Deblurring Stereo Matching

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

no code implementations ICCV 2023 Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen

In this paper, we propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.

Monocular Depth Estimation

PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction

2 code implementations ICCV 2023 Wenjie Ding, Limeng Qiao, Xi Qiu, Chi Zhang

Furthermore, to supervise the position and topology of the vectorized point predictions, we propose a dynamic vectorized sequence loss.

Autonomous Driving

DPF-Net: Combining Explicit Shape Priors in Deformable Primitive Field for Unsupervised Structural Reconstruction of 3D Objects

no code implementations ICCV 2023 Qingyao Shuai, Chi Zhang, Kaizhi Yang, Xuejin Chen

Unsupervised methods for reconstructing structures face significant challenges in capturing the geometric details with consistent structures among diverse shapes of the same category.

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

1 code implementation22 Aug 2023 YiWen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin

Recent strides in Text-to-3D techniques have been propelled by distilling knowledge from powerful large text-to-image diffusion models (LDMs).

3D Generation Text to 3D

X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events

1 code implementation ICCV 2023 Bo Dai, Linge Wang, Baoxiong Jia, Zeyu Zhang, Song-Chun Zhu, Chi Zhang, Yixin Zhu

Intuitive physics is pivotal for human understanding of the physical world, enabling prediction and interpretation of events even in infancy.

Weakly supervised learning for pattern classification in serial femtosecond crystallography

no code implementations30 Jul 2023 Jianan Xie, Ji Liu, Chi Zhang, Xihui Chen, Ping Huai, Jie Zheng, Xiaofeng Zhang

This heavy dependence on labeled datasets will seriously restrict the application of networks, because it is very costly to annotate a large number of diffraction patterns.

Weakly-supervised Learning

ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution

1 code implementation ICCV 2023 Mingjin Zhang, Chi Zhang, Qiming Zhang, Jie Guo, Xinbo Gao, Jing Zhang

Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.

Hyperspectral Image Super-Resolution Image Super-Resolution

A Phase-Coded Time-Domain Interleaved OTFS Waveform with Improved Ambiguity Function

no code implementations26 Jul 2023 Jiajun Zhu, Yanqun Tang, Chao Yang, Chi Zhang, Haoran Yin, Jiaojiao Xiong, Yuhua Chen

To enhance the sensing performance of the orthogonal time frequency space (OTFS) waveform, we propose a novel time-domain interleaved cyclic-shifted P4-coded OTFS (TICP4-OTFS) with improved ambiguity function.

Integrated sensing and communication ISAC

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image

1 code implementation ICCV 2023 Wei Yin, Chi Zhang, Hao Chen, Zhipeng Cai, Gang Yu, Kaixuan Wang, Xiaozhi Chen, Chunhua Shen

State-of-the-art (SOTA) monocular metric depth estimation methods can only handle a single camera model and are unable to perform mixed-data training due to the metric ambiguity.

Ranked #29 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Image Reconstruction Monocular Depth Estimation +1

Brain in a Vat: On Missing Pieces Towards Artificial General Intelligence in Large Language Models

no code implementations7 Jul 2023 Yuxi Ma, Chi Zhang, Song-Chun Zhu

In this perspective paper, we first comprehensively review existing evaluations of Large Language Models (LLMs) using both standardized tests and ability-oriented benchmarks.

Unity

Event Detection from Social Media Stream: Methods, Datasets and Opportunities

no code implementations28 Jun 2023 Quanzhi Li, Yang Chao, Dong Li, Yao Lu, Chi Zhang

Social media streams contain a large and diverse amount of information, ranging from daily-life stories to the latest global and local events and news.

Event Detection

MachMap: End-to-End Vectorized Solution for Compact HD-Map Construction

2 code implementations17 Jun 2023 Limeng Qiao, Yongchao Zheng, Peng Zhang, Wenjie Ding, Xi Qiu, Xing Wei, Chi Zhang

This report introduces the 1st place winning solution for the Autonomous Driving Challenge 2023 - Online HD-map Construction.

Autonomous Driving Decoder

End-to-End Vectorized HD-map Construction with Piecewise Bezier Curve

1 code implementation CVPR 2023 Limeng Qiao, Wenjie Ding, Xi Qiu, Chi Zhang

Vectorized high-definition map (HD-map) construction, which focuses on the perception of centimeter-level environmental information, has attracted significant research interest in the autonomous driving community.

3D geometry Autonomous Driving

MEWL: Few-shot multimodal word learning with referential uncertainty

1 code implementation1 Jun 2023 Guangyuan Jiang, Manjie Xu, Shiji Xin, Wei Liang, Yujia Peng, Chi Zhang, Yixin Zhu

To fill in this gap, we introduce the MachinE Word Learning (MEWL) benchmark to assess how machines learn word meaning in grounded visual scenes.

An Overview of Resource Allocation in Integrated Sensing and Communication

no code implementations15 May 2023 Jinming Du, Yanqun Tang, Xizhang Wei, Jiaojiao Xiong, Jiajun Zhu, Haoran Yin, Chi Zhang, Haibo Chen

Integrated sensing and communication (ISAC) is considered as a promising solution for improving spectrum efficiency and relieving wireless spectrum congestion.

Integrated sensing and communication ISAC

Leveraging Deep Learning and Digital Twins to Improve Energy Performance of Buildings

no code implementations8 May 2023 Zhongjun Ni, Chi Zhang, Magnus Karlsson, Shaofang Gong

Digital transformation in buildings accumulates massive operational data, which calls for smart solutions to utilize these data to improve energy performance.

Deep Learning

You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking

1 code implementation18 Apr 2023 Xiyang Wang, Chunyun Fu, JiaWei He, Mingguang Huang, Ting Meng, Siyu Zhang, Hangning Zhou, Ziyao Xu, Chi Zhang

In the classical tracking-by-detection (TBD) paradigm, detection and tracking are separately and sequentially conducted, and data association must be properly performed to achieve satisfactory tracking performance.

3D Multi-Object Tracking Object +3

Cross or Wait? Predicting Pedestrian Interaction Outcomes at Unsignalized Crossings

no code implementations17 Apr 2023 Chi Zhang, Amir Hossein Kalantari, Yue Yang, Zhongjun Ni, Gustav Markkula, Natasha Merat, Christian Berger

Predicting pedestrian behavior when interacting with vehicles is one of the most critical challenges in the field of automated driving.

Model Selection regression

Model-Agnostic Reachability Analysis on Deep Neural Networks

no code implementations3 Apr 2023 Chi Zhang, Wenjie Ruan, Fu Wang, Peipei Xu, Geyong Min, Xiaowei Huang

Verification plays an essential role in the formal analysis of safety-critical systems.

model

Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks

no code implementations29 Mar 2023 Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu

Our method outperforms baselines by a large margin and is the most sample-efficient demonstration-free RL method to solve Minecraft Tech Tree tasks.

Minecraft Multi-Task Learning +2

Cyclic Delay-Doppler Shift: A Simple Transmit Diversity Technique for Delay-Doppler Waveforms in Doubly Selective Channels

no code implementations22 Feb 2023 Haoran Yin, Jiaojiao Xiong, Yu Zhou, Chi Zhang, Di Zhang, Xizhang Wei, Yanqun Tang

Delay-Doppler waveform design has been considered as a promising solution to achieve reliable communication under high-mobility channels for the space-air-ground-integrated networks (SAGIN).

Diversity

Denoising and Prompt-Tuning for Multi-Behavior Recommendation

1 code implementation12 Feb 2023 Chi Zhang, Rui Chen, Xiangyu Zhao, Qilong Han, Li Li

In practical recommendation scenarios, users often interact with items under multi-typed behaviors (e.g., click, add-to-cart, and purchase).

Collaborative Filtering Denoising

Two-Stage Constrained Actor-Critic for Short Video Recommendation

1 code implementation3 Feb 2023 Qingpeng Cai, Zhenghai Xue, Chi Zhang, Wanqi Xue, Shuchang Liu, Ruohan Zhan, Xueliang Wang, Tianyou Zuo, Wentao Xie, Dong Zheng, Peng Jiang, Kun Gai

On the one hand, the platform aims at optimizing the users' cumulative watch time (main goal) in the long term, which can be effectively optimized by Reinforcement Learning.

Recommendation Systems reinforcement-learning +2

Constrained Policy Optimization with Explicit Behavior Density for Offline Reinforcement Learning

1 code implementation NeurIPS 2023 Jing Zhang, Chi Zhang, Wenjia Wang, Bing-Yi Jing

Due to the inability to interact with the environment, offline reinforcement learning (RL) methods face the challenge of estimating the Out-of-Distribution (OOD) points.

reinforcement-learning Reinforcement Learning +1

Reachability Analysis of Neural Network Control Systems

1 code implementation28 Jan 2023 Chi Zhang, Wenjie Ruan, Peipei Xu

We then reveal the working principles of applying Lipschitzian optimisation on NNCS verification and illustrate it by verifying an adaptive cruise control model.

Rolling Shutter Correction

Computationally Efficient 3D MRI Reconstruction with Adaptive MLP

no code implementations21 Jan 2023 Eric Z. Chen, Chi Zhang, Xiao Chen, Yikang Liu, Terrence Chen, Shanhui Sun

Recon3DMLP improves HR 3D reconstruction and outperforms several existing CNN-based models under similar GPU memory consumption, which demonstrates that Recon3DMLP is a practical solution for HR 3D MRI reconstruction.

3D Reconstruction MRI Reconstruction

Label-Guided Knowledge Distillation for Continual Semantic Segmentation on 2D Images and 3D Point Clouds

1 code implementation ICCV 2023 Ze Yang, Ruibo Li, Evan Ling, Chi Zhang, Yiming Wang, Dezhao Huang, Keng Teck Ma, Minhoe Hur, Guosheng Lin

To address this issue, we propose a new label-guided knowledge distillation (LGKD) loss, where the old model output is expanded and transplanted (with the guidance of the ground truth label) to form a semantically appropriate class correspondence with the new model output.

Continual Semantic Segmentation Knowledge Distillation +1

Discrepant and Multi-Instance Proxies for Unsupervised Person Re-Identification

no code implementations ICCV 2023 Chang Zou, Zeqi Chen, Zhichao Cui, Yuehu Liu, Chi Zhang

To completely and accurately represent the information contained in a cluster and learn discriminative features, we propose to maintain discrepant cluster proxies and multi-instance proxies for a cluster.

Contrastive Learning Unsupervised Person Re-Identification

BEAR: Physics-Principled Building Environment for Control and Reinforcement Learning

1 code implementation27 Nov 2022 Chi Zhang, Yuanyuan Shi, Yize Chen

Recent advancements in reinforcement learning algorithms have opened doors for researchers to operate and optimize building energy management systems autonomously.

energy management Management +4

Semantics-Preserving Sketch Embedding for Face Generation

no code implementations23 Nov 2022 Binxin Yang, Xuejin Chen, Chaoqun Wang, Chi Zhang, Zihan Chen, Xiaoyan Sun

With a semantic feature matching loss for effective semantic supervision, our sketch embedding precisely conveys the semantics in the input sketches to the synthesized images.

Face Generation Image-to-Image Translation

Dual Clustering Co-teaching with Consistent Sample Mining for Unsupervised Person Re-Identification

no code implementations7 Oct 2022 Zeqi Chen, Zhichao Cui, Chi Zhang, Jiahuan Zhou, Yuehu Liu

However, training two networks with a set of noisy pseudo labels reduces the complementarity of the two networks and results in label noise accumulation.

Clustering Pseudo Label +1

On the Learning Mechanisms in Physical Reasoning

no code implementations5 Oct 2022 Shiqian Li, Kewen Wu, Chi Zhang, Yixin Zhu

Taken together, the results on the challenging benchmark of PHYRE show that LfI is, if not better, as good as LfD for dynamics prediction.

Prediction

Infrared: A Meta Bug Detector

no code implementations18 Sep 2022 Chi Zhang, Yu Wang, Linzhang Wang

The recent breakthroughs in deep learning methods have sparked a wave of interest in learning-based bug detectors.

Anomaly Detection

MRF-PINN: A Multi-Receptive-Field convolutional physics-informed neural network for solving partial differential equations

no code implementations6 Sep 2022 Shihong Zhang, Chi Zhang, Bosen Wang

To fill the gaps above, we propose three initiatives in this paper: (1) A Multi-Receptive-Field PINN (MRF-PINN) model is established to solve different types of PDEs on various mesh resolutions without manual tuning; (2) The dimensional balance method is used to estimate the loss weights when solving Navier-Stokes equations; (3) The Taylor polynomial is used to pad the virtual nodes near the boundaries for implementing high-order finite difference.

CRCNet: Few-shot Segmentation with Cross-Reference and Region-Global Conditional Networks

no code implementations23 Aug 2022 Weide Liu, Chi Zhang, Guosheng Lin, Fayao Liu

Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.

Segmentation

Sobolev Training for Implicit Neural Representations with Approximated Image Derivatives

1 code implementation21 Jul 2022 Wentao Yuan, Qingtian Zhu, Xiangyue Liu, Yikang Ding, Haotian Zhang, Chi Zhang

Recently, Implicit Neural Representations (INRs) parameterized by neural networks have emerged as a powerful and promising tool to represent different kinds of signals due to their continuous, differentiable properties, showing advantages over classical discretized representations.

Inverse Rendering

KD-MVS: Knowledge Distillation Based Self-supervised Learning for Multi-view Stereo

1 code implementation21 Jul 2022 Yikang Ding, Qingtian Zhu, Xiangyue Liu, Wentao Yuan, Haotian Zhang, Chi Zhang

Supervised multi-view stereo (MVS) methods have achieved remarkable progress in terms of reconstruction quality, but suffer from the challenge of collecting large-scale ground-truth depth.

Knowledge Distillation Self-Supervised Learning

Few-shot Open-set Recognition Using Background as Unknowns

no code implementations19 Jul 2022 Nan Song, Chi Zhang, Guosheng Lin

First, instead of learning the decision boundaries between seen classes, as is done in standard close-set classification, we reserve space for unseen classes, such that images located in these areas are recognized as the unseen classes.

Open Set Learning

A Synergistic Compilation Workflow for Tackling Crosstalk in Quantum Machines

no code implementations12 Jul 2022 Fei Hua, Yuwei Jin, Ang Li, Chenxu Liu, Meng Wang, Yanhao Chen, Chi Zhang, Ari Hayes, Samuel Stein, Minghao Guo, Yipeng Huang, Eddy Z. Zhang

Evaluations through simulation and on real IBM-Q devices show that our framework can significantly reduce the error rate by up to 6×, with only ~60% circuit depth compared to state-of-the-art gate scheduling approaches.

Scheduling

Automatic Generation of Product-Image Sequence in E-commerce

1 code implementation26 Jun 2022 Xiaochuan Fan, Chi Zhang, Yong Yang, Yue Shang, Xueying Zhang, Zhen He, Yun Xiao, Bo Long, Lingfei Wu

For a platform with billions of products, it is extremely time-costly and labor-expensive to manually pick and organize qualified images.

DETR++: Taming Your Multi-Scale Detection Transformer

no code implementations7 Jun 2022 Chi Zhang, Lijuan Liu, Xiaoxue Zang, Frederick Liu, Hao Zhang, Xinying Song, Jindong Chen

Convolutional Neural Networks (CNN) have dominated the field of detection ever since the success of AlexNet in ImageNet classification [12].

object-detection Small Object Detection

On the Perils of Cascading Robust Classifiers

1 code implementation1 Jun 2022 Ravi Mangal, Zifan Wang, Chi Zhang, Klas Leino, Corina Pasareanu, Matt Fredrikson

We present cascade attack (CasA), an adversarial attack against cascading ensembles, and show that: (1) there exists an adversarial input for up to 88% of the samples where the ensemble claims to be certifiably robust and accurate; and (2) the accuracy of a cascading ensemble under our attack is as low as 11% when it claims to be certifiably robust and accurate on 97% of the test set.

Adversarial Attack

Multi-agent Databases via Independent Learning

no code implementations28 May 2022 Chi Zhang, Olga Papaemmanouil, Josiah P. Hanna, Aditya Akella

Thus, the paper attempts to address the question "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?".

Multi-agent Reinforcement Learning Scheduling

Constrained Reinforcement Learning for Short Video Recommendation

no code implementations26 May 2022 Qingpeng Cai, Ruohan Zhan, Chi Zhang, Jie Zheng, Guangwei Ding, Pinghua Gong, Dong Zheng, Peng Jiang

In this paper, we formulate the problem of short video recommendation as a constrained Markov Decision Process (MDP), where platforms want to optimize the main goal of user watch time in the long term, with the constraint of accommodating the auxiliary responses of user interactions such as sharing/downloading videos.

Recommendation Systems reinforcement-learning +2

Scenario-based Multi-product Advertising Copywriting Generation for E-Commerce

no code implementations21 May 2022 Xueying Zhang, Kai Shen, Chi Zhang, Xiaochuan Fan, Yun Xiao, Zhen He, Bo Long, Lingfei Wu

In this paper, we proposed an automatic Scenario-based Multi-product Advertising Copywriting Generation system (SMPACG) for E-Commerce, which has been deployed on a leading Chinese e-commerce platform.

Attribute Language Modeling +1

Correction of out-of-focus microscopic images by deep learning

1 code implementation Computational and Structural Biotechnology Journal 2022 Chi Zhang, Hao Jiang, Weihuang Liu, Junyi Li, Shiming Tang, Mario Juhas, Yang Zhang

To solve the out-of-focus issue in microscopy, we developed a Cycle Generative Adversarial Network (CycleGAN) based model and a multi-component weighted loss function.

Deep Learning Generative Adversarial Network +2

Efficient Few-Shot Object Detection via Knowledge Inheritance

1 code implementation23 Mar 2022 Ze Yang, Chi Zhang, Ruibo Li, Yi Xu, Guosheng Lin

Upon this baseline, we devise an initializer named knowledge inheritance (KI) to reliably initialize the novel weights for the box classifier, which effectively facilitates the knowledge transfer process and boosts the adaptation speed.

Few-Shot Object Detection Object +2

Learning the Pedestrian-Vehicle Interaction for Pedestrian Trajectory Prediction

no code implementations10 Feb 2022 Chi Zhang, Christian Berger

In this paper, we study the interaction between pedestrians and vehicles and propose a novel neural network structure called the Pedestrian-Vehicle Interaction (PVI) extractor for learning the pedestrian-vehicle interaction.

Pedestrian Trajectory Prediction Trajectory Prediction

Multi-Centroid Representation Network for Domain Adaptive Person Re-ID

no code implementations22 Dec 2021 Yuhang Wu, Tengteng Huang, Haotian Yao, Chi Zhang, Yuanjie Shao, Chuchu Han, Changxin Gao, Nong Sang

First, we present a Domain-Specific Contrastive Learning (DSCL) mechanism to fully explore intradomain information by comparing samples only from the same domain.

Contrastive Learning Domain Adaptive Person Re-Identification +2

DSGPT: Domain-Specific Generative Pre-Training of Transformers for Text Generation in E-commerce Title and Review Summarization

no code implementations SIGIR 2021 Xueying Zhang, Yunjiang Jiang, Yue Shang, Zhaomeng Cheng, Chi Zhang, Xiaochuan Fan, Yun Xiao, Bo Long

We propose a novel domain-specific generative pre-training (DS-GPT) method for text generation and apply it to the product title and review summarization problems on E-commerce mobile display. First, we adopt a decoder-only transformer architecture, which fits well for fine-tuning tasks by combining input and output all together.

Decoder Text Generation

Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

no code implementations 25 Nov 2021 Chi Zhang, Sirui Xie, Baoxiong Jia, Ying Nian Wu, Song-Chun Zhu, Yixin Zhu

Extensive experiments show that by incorporating an algebraic treatment, the ALANS learner outperforms various pure connectionist models in domains requiring systematic generalization.

Abstract Algebra Systematic Generalization

Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework

1 code implementation NeurIPS 2021 Tengteng Huang, Yifan Sun, Xun Wang, Haotian Yao, Chi Zhang

Model smoothing is of central importance for obtaining a reliable teacher model in the student-teacher framework, where the teacher generates surrogate supervision signals to train the student.

Unity

Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations

no code implementations 3 Oct 2021 Chi Zhang, Sanmukh Rao Kuppannagari, Viktor K Prasanna

Current implementations exhibit poor performance due to challenges such as irregular memory accesses and thread-level synchronization overheads on CPU.

reinforcement-learning Reinforcement Learning +1

Degradation Attacks on Certifiably Robust Neural Networks

no code implementations 29 Sep 2021 Klas Leino, Chi Zhang, Ravi Mangal, Matt Fredrikson, Bryan Parno, Corina Pasareanu

Certifiably robust neural networks employ provable run-time defenses against adversarial examples by checking if the model is locally robust at the input under evaluation.

valid

Adaptive Reliability Analysis for Multi-fidelity Models using a Collective Learning Strategy

no code implementations 21 Sep 2021 Chi Zhang, Chaolin Song, Abdollah Shafieezadeh

In this context, CLF provides a new direction for quantifying the impact of new training points and can be easily extended with new learning functions to adapt to different reliability problems.

Meta Navigator: Search for a Good Adaptation Policy for Few-shot Learning

no code implementations ICCV 2021 Chi Zhang, Henghui Ding, Guosheng Lin, Ruibo Li, Changhu Wang, Chunhua Shen

Inspired by the recent success in the Automated Machine Learning (AutoML) literature, in this paper, we present Meta Navigator, a framework that attempts to solve the aforementioned limitation in few-shot learning by seeking a higher-level strategy and proffering to automate the selection from various few-shot learning designs.

AutoML Few-Shot Learning

GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph

1 code implementation 6 Sep 2021 Zhixuan Zhang, Chi Zhang, Zhenning Niu, Le Wang, Yuehu Liu

In this manuscript, we introduce a semi-automatic scene graph annotation tool for images, the GeneAnnotator.

Graph Generation Graph Learning +3

Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development

1 code implementation 1 Sep 2021 Mingkuan Liu, Chi Zhang, Hua Xing, Chao Feng, Monchu Chen, Judith Bishop, Grace Ngapo

Our A/B testing and pilot results demonstrated the HITL pipeline can improve annotation speed and capacity by at least 80% and quality is comparable to or higher than manual double pass annotation.

Vocal Bursts Intensity Prediction

Spatially and Robustly Hybrid Mixture Regression Model for Inference of Spatial Dependence

1 code implementation 1 Sep 2021 Wennan Chang, Pengtao Dang, Changlin Wan, Xiaoyu Lu, Yue Fang, Tong Zhao, Yong Zang, Bo Li, Chi Zhang, Sha Cao

Compared with existing spatial regression models, our proposed model assumes the existence of a few distinct regression models that are estimated based on observations that exhibit similar response-predictor relationships.

regression

Calibrating Class Activation Maps for Long-Tailed Visual Recognition

no code implementations 29 Aug 2021 Chi Zhang, Guosheng Lin, Lvlong Lai, Henghui Ding, Qingyao Wu

First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers, by enforcing network prediction based on important image regions.

Representation Learning

Binocular Mutual Learning for Improving Few-shot Classification

1 code implementation ICCV 2021 Ziqi Zhou, Xi Qiu, Jiangtao Xie, Jianan Wu, Chi Zhang

From the perspective of class space on the base set, existing methods either focus on utilizing all classes under a global view by normal pretraining, or pay more attention to adopting an episodic manner to train meta-tasks within a few classes in a local view.

Classification Decision Making +1

DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection

2 code implementations ICCV 2021 Limeng Qiao, Yuxuan Zhao, Zhiyuan Li, Xi Qiu, Jianan Wu, Chi Zhang

Few-shot object detection, which aims at detecting novel objects rapidly from extremely few annotated examples of previously unseen classes, has attracted significant research interest in the community.

Classification Cross-Domain Few-Shot Object Detection +1

Few-shot Segmentation with Optimal Transport Matching and Message Flow

no code implementations 19 Aug 2021 Weide Liu, Chi Zhang, Henghui Ding, Tzu-Yi Hung, Guosheng Lin

In this work, we argue that every support pixel's information is desired to be transferred to all query pixels and propose a Correspondence Matching Network (CMNet) with an Optimal Transport Matching module to mine out the correspondence between the query and support images.

Few-Shot Semantic Segmentation Multi-Task Learning +2

Unified Regularity Measures for Sample-wise Learning and Generalization

no code implementations 9 Aug 2021 Chi Zhang, Xiaoning Ma, Yu Liu, Le Wang, Yuanqi SU, Yuehu Liu

Fundamental machine learning theory shows that different samples contribute unequally both in learning and testing processes.

Learning Theory Memorization

M2IOSR: Maximal Mutual Information Open Set Recognition

no code implementations 5 Aug 2021 Xin Sun, Henghui Ding, Chi Zhang, Guosheng Lin, Keck-Voon Ling

In this work, we aim to address the challenging task of open set recognition (OSR).

Open Set Learning

IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID

3 code implementations ICCV 2021 Yongxing Dai, Jun Liu, Yifan Sun, Zekun Tong, Chi Zhang, Ling-Yu Duan

To ensure these two properties to better characterize appropriate intermediate domains, we enforce the bridge losses on intermediate domains' prediction space and feature space, and enforce a diversity loss on the two domain factors.

Diversity Domain Adaptive Person Re-Identification +1

Principled Hyperedge Prediction with Structural Spectral Features and Neural Networks

no code implementations 8 Jun 2021 Changlin Wan, Muhan Zhang, Wei Hao, Sha Cao, Pan Li, Chi Zhang

SNALS captures the joint interactions of a hyperedge by its local environment, which is retrieved by collecting the spectrum information of their connections.

Graph Neural Network Hyperedge Prediction

Social-IWSTCNN: A Social Interaction-Weighted Spatio-Temporal Convolutional Neural Network for Pedestrian Trajectory Prediction in Urban Traffic Scenarios

no code implementations 26 May 2021 Chi Zhang, Christian Berger, Marco Dozza

In this paper, we use the recently released large-scale Waymo Open Dataset in urban traffic scenarios, which includes 374 urban training scenes and 76 urban testing scenes to analyze the performance of our proposed algorithm in comparison to the state-of-the-art (SOTA) models.

Pedestrian Trajectory Prediction Trajectory Prediction

More Separable and Easier to Segment: A Cluster Alignment Method for Cross-Domain Semantic Segmentation

no code implementations 7 May 2021 Shuang Wang, Dong Zhao, Yi Li, Chi Zhang, Yuwei Guo, Qi Zang, Biao Hou, Licheng Jiao

Feature alignment between domains is one of the mainstream methods for Unsupervised Domain Adaptation (UDA) semantic segmentation.

Clustering Segmentation +2

Few-Shot Incremental Learning with Continually Evolved Classifiers

1 code implementation CVPR 2021 Chi Zhang, Nan Song, Guosheng Lin, Yun Zheng, Pan Pan, Yinghui Xu

First, we adopt a simple but effective decoupled learning strategy for representations and classifiers, in which only the classifiers are updated in each incremental session, which avoids knowledge forgetting in the representations.

class-incremental learning Few-Shot Class-Incremental Learning +1
