Search Results for author: Lei Zhang

Found 684 papers, 308 papers with code

Momentum Batch Normalization for Deep Learning with Small Batch Size

no code implementations ECCV 2020 Hongwei Yong, Jianqiang Huang, Deyu Meng, Xian-Sheng Hua, Lei Zhang

To gain a deeper understanding of BN, in this work we prove that BN actually introduces a certain level of noise into the sample mean and variance during the training process, while the noise level depends only on the batch size.

Deep Learning

LST-Net: Learning a Convolutional Neural Network with a Learnable Sparse Transform

no code implementations ECCV 2020 Lida Li, Kun Wang, Shuai Li, Xiangchu Feng, Lei Zhang

The 2D convolutional (Conv2d) layer is the fundamental element to a deep convolutional neural network (CNN).

Mining Word Boundaries from Speech-Text Parallel Data for Cross-domain Chinese Word Segmentation

no code implementations 12 Dec 2024 Xuebin Wang, Lei Zhang, Zhenghua Li, Shilin Zhou, Chen Gong, Yang Hou

Inspired by early research on exploring naturally annotated data for Chinese Word Segmentation (CWS), and also by recent research on integration of speech and text processing, this work for the first time proposes to explicitly mine word boundaries from speech-text parallel data.

Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence

no code implementations 10 Dec 2024 Wenbo Huang, Jinghui Zhang, Guang Li, Lei Zhang, Shuoyuan Wang, Fang Dong, Jiahui Jin, Takahiro Ogawa, Miki Haseyama

The Matryoshka Mamba and the hybrid contrastive learning paradigm operate in parallel branches within Manta, enhancing Mamba for FSAR of long sub-sequence.

Contrastive Learning Few-Shot action recognition +2

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

1 code implementation 4 Dec 2024 Lingchen Sun, Rongyuan Wu, Zhiyuan Ma, Shuaizheng Liu, Qiaosi Yi, Lei Zhang

By introducing two adjustable guidance scales on the two LoRA modules to control the strengths of pixel-wise fidelity and semantic-level details during inference, PiSASR can offer flexible SR results according to user preference without re-training.

Image Super-Resolution

Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data

no code implementations 3 Dec 2024 Junhao Liu, Siwei Xu, Lei Zhang, Jing Zhang

To thoroughly evaluate the capability of modern instruction-tuned LLMs in automating the cell type identification process, we introduce SOAR, a comprehensive benchmarking study of LLMs for cell type annotation tasks in single-cell genomics.

Benchmarking

Don't Let Your Robot be Harmful: Responsible Robotic Manipulation

no code implementations 27 Nov 2024 Minheng Ni, Zihan Chen, Lei Zhang, WangMeng Zuo

Additionally, we create the SafeBox synthetic dataset, which includes one hundred responsible robotic manipulation tasks with different safety risk scenarios and instructions, effectively reducing the risks associated with real-world experiments.

ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

1 code implementation 27 Nov 2024 Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen, Zhaoyang Zeng, Tianhe Ren, Lei Zhang

From the data perspective, we build a fully automated data engine and construct the Rexverse-2M dataset which possesses multiple granularities to support the joint training of perception and understanding.

TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video

no code implementations 27 Nov 2024 Jinyuan Qu, Hongyang Li, Shilong Liu, Tianhe Ren, Zhaoyang Zeng, Lei Zhang

In this paper, we present TAPTRv3, which is built upon TAPTRv2 to improve its point tracking robustness in long videos.

Point Tracking

Scaling Speech-Text Pre-training with Synthetic Interleaved Data

1 code implementation 26 Nov 2024 Aohan Zeng, Zhengxiao Du, Mingdao Liu, Lei Zhang, Shengmin Jiang, Yuxiao Dong, Jie Tang

Starting from a pre-trained language model and scaling our pre-training to 1 trillion tokens (with 600B synthetic interleaved speech-text data), we achieve state-of-the-art performance in speech language modeling and spoken question answering, improving performance on spoken questions tasks from the previous SOTA of 13% (Moshi) to 31%.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Neural Network-based High-index Saddle Dynamics Method for Searching Saddle Points and Solution Landscape

no code implementations 25 Nov 2024 Yuankai Liu, Lei Zhang, Jin Zhao

The high-index saddle dynamics (HiSD) method is a powerful approach for computing saddle points and solution landscape.

Adversarial Diffusion Compression for Real-World Image Super-Resolution

no code implementations 20 Nov 2024 Bin Chen, Gehui Li, Rongyuan Wu, Xindong Zhang, Jie Chen, Jian Zhang, Lei Zhang

Real-world image super-resolution (Real-ISR) aims to reconstruct high-resolution images from low-resolution inputs degraded by complex, unknown processes.

Decoder Denoising +1

Advancing Sustainability via Recommender Systems: A Survey

1 code implementation 12 Nov 2024 Xin Zhou, Lei Zhang, Honglei Zhang, Yixin Zhang, Xiaoxiong Zhang, Jie Zhang, Zhiqi Shen

Human behavioral patterns and consumption paradigms have emerged as pivotal determinants in environmental degradation and climate change, with quotidian decisions pertaining to transportation, energy utilization, and resource consumption collectively precipitating substantial ecological impacts.

Recommendation Systems Survey

A Learned Proximal Alternating Minimization Algorithm and Its Induced Network for a Class of Two-block Nonconvex and Nonsmooth Optimization

no code implementations 10 Nov 2024 YunMei Chen, Lezhi Liu, Lei Zhang

For smoothed nonconvex problems we modify the proximal alternating linearized minimization (PALM) scheme by incorporating the residual learning architecture, which has proven to be highly effective in deep network training, and employing the block coordinate descent (BCD) iterates as a safeguard for the convergence of the algorithm.

MRI Reconstruction

Minder: Faulty Machine Detection for Large-scale Distributed Model Training

no code implementations 4 Nov 2024 Yangtao Deng, Xiang Shi, Zhuo Jiang, Xingjian Zhang, Lei Zhang, Zhang Zhang, Bo Li, Zuquan Song, Hang Zhu, Gaohong Liu, Fuliang Li, Shuguang Wang, Haibin Lin, Jianxi Ye, Minlan Yu

To address the drawbacks of the time-consuming and labor-intensive manual scrutiny, we propose Minder, an automatic faulty machine detector for distributed training tasks.

Fault Detection

Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation

no code implementations 3 Nov 2024 Zhenbin Wang, Lei Zhang, Lituan Wang, Minjuan Zhu, Zhenwei Zhang

Medical video generation models are expected to have a profound impact on the healthcare industry, including but not limited to medical education and training, surgical planning, and simulation.

Mamba Optical Flow Estimation +1

Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning

no code implementations 3 Nov 2024 Fei Zhou, Peng Wang, Lei Zhang, Zhenghua Chen, Wei Wei, Chen Ding, Guosheng Lin, Yanning Zhang

Meta-learning offers a promising avenue for few-shot learning (FSL), enabling models to glean a generalizable feature embedding through episodic training on synthetic FSL tasks in a source domain.

cross-domain few-shot learning

Automating Quantum Software Maintenance: Flakiness Detection and Root Cause Analysis

no code implementations 31 Oct 2024 Janakan Sivaloganathan, Ainaz Jamshidi, Andriy Miranskyy, Lei Zhang

We aim to create an automated framework to detect flaky tests in quantum software and an extended dataset of quantum flaky tests, overcoming the limitations of manual methods.

AdaNeg: Adaptive Negative Proxy Guided OOD Detection with Vision-Language Models

1 code implementation 26 Oct 2024 Yabin Zhang, Lei Zhang

To overcome this issue, we introduce adaptive negative proxies, which are dynamically generated during testing by exploring actual OOD images, to align more closely with the underlying OOD label space and enhance the efficacy of negative proxy guidance.

MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms

no code implementations 24 Oct 2024 Ling-Hao Chen, Wenxun Dai, Xuan Ju, Shunlin Lu, Lei Zhang

Previous motion diffusion models lack explicit modeling of the word-level text-motion correspondence and good explainability, hence restricting their fine-grained editing ability.

Motion Generation

FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

1 code implementation 24 Oct 2024 Zhengqiang Zhang, Ruihuang Li, Lei Zhang

While image generation with diffusion models has achieved great success, generating images of higher resolution than the training size remains a challenging task due to the high computational cost.

Image Generation

Efficient Antibody Structure Refinement Using Energy-Guided SE(3) Flow Matching

no code implementations 22 Oct 2024 Jiying Zhang, Zijing Liu, Shengyuan Bai, He Cao, Yu Li, Lei Zhang

In this paper, we develop a novel antibody structure refinement method termed FlowAB based on energy-guided flow matching.

Specificity

Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion

1 code implementation 19 Oct 2024 Chaodong Xiao, Minghan Li, Zhengqiang Zhang, Deyu Meng, Lei Zhang

Selective state space models (SSMs), such as Mamba, highly excel at capturing long-range dependencies in 1D sequential data, while their applications to 2D vision tasks still face challenges.

Image Classification Mamba +1

Toward Generalizing Visual Brain Decoding to Unseen Subjects

1 code implementation 18 Oct 2024 Xiangtao Kong, Kexin Huang, Ping Li, Lei Zhang

Prior works typically focus on decoding brain activity of individuals based on the observation that different subjects exhibit different brain activities, while it remains unclear whether brain decoding can be generalized to unseen subjects.

Brain Decoding

UniG: Modelling Unitary 3D Gaussians for View-consistent 3D Reconstruction

2 code implementations 17 Oct 2024 Jiamin Wu, Kenkun Liu, Yukai Shi, Xiaoke Jiang, Yuan YAO, Lei Zhang

In this work, we present UniG, a view-consistent 3D reconstruction and novel view synthesis model that generates a high-fidelity representation of 3D Gaussians from sparse images.

3D Reconstruction Decoder +1

Self-Supervised Scene Flow Estimation with Point-Voxel Fusion and Surface Representation

no code implementations 17 Oct 2024 Xuezhi Xiang, Xi Wang, Lei Zhang, Denis Ombati, Himaloy Himu, XianTong Zhen

Scene flow estimation aims to generate the 3D motion field of points between two consecutive frames of point clouds, which has wide applications in various fields.

Self-supervised Scene Flow Estimation

LKASeg: Remote-Sensing Image Semantic Segmentation with Large Kernel Attention and Full-Scale Skip Connections

no code implementations 14 Oct 2024 Xuezhi Xiang, Yibo Ning, Lei Zhang, Denis Ombati, Himaloy Himu, XianTong Zhen

In this paper, we propose a remote-sensing image semantic segmentation network named LKASeg, which combines Large Kernel Attention (LSKA) and Full-Scale Skip Connections (FSC).

Decoder Semantic Segmentation

Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

1 code implementation 27 Sep 2024 Jiaming Li, Lei Zhang, Yunshui Li, Ziqiang Liu, Yuelin Bai, Run Luo, Longze Chen, Min Yang

Specifically, Ruler equips LLMs with the ability to generate responses of a specified length based on length constraints within the instructions.

Instruction Following

Self-supervised Monocular Depth Estimation with Large Kernel Attention

no code implementations 26 Sep 2024 Xuezhi Xiang, Yao Wang, Lei Zhang, Denis Ombati, Himaloy Himu, XianTong Zhen

Self-supervised monocular depth estimation has emerged as a promising approach since it does not rely on labeled training data.

Decoder Monocular Depth Estimation

LKA-ReID: Vehicle Re-Identification with Large Kernel Attention

no code implementations 26 Sep 2024 Xuezhi Xiang, Zhushan Ma, Lei Zhang, Denis Ombati, Himaloy Himu, XianTong Zhen

Using attention mechanisms to capture global and local features is crucial to solving the challenge of high similarity between classes in vehicle Re-ID tasks.

DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion

1 code implementation 25 Sep 2024 Yukun Huang, Jianan Wang, Ailing Zeng, Zheng-Jun Zha, Lei Zhang, Xihui Liu

The core of this framework lies in Skeleton-guided Score Distillation and Hybrid 3D Gaussian Avatar representation.

Text to 3D

Path-adaptive Spatio-Temporal State Space Model for Event-based Recognition with Arbitrary Duration

no code implementations 25 Sep 2024 Jiazhou Zhou, Kanghao Chen, Lei Zhang, Lin Wang

Our key insight is to learn the spatiotemporal relationships from the encoded event features via the state space model (SSM) -- whose linear complexity makes it ideal for modeling high temporal resolution events with longer sequences.

Action Recognition

ClearDepth: Enhanced Stereo Perception of Transparent Objects for Robotic Manipulation

no code implementations 13 Sep 2024 Kaixin Bai, Huajian Zeng, Lei Zhang, YiWen Liu, Hongli Xu, Zhaopeng Chen, Jianwei Zhang

Transparent object depth perception poses a challenge in everyday life and logistics, primarily due to the inability of standard 3D sensors to accurately capture depth on transparent or reflective surfaces.

Transparent objects

TMFNet: Two-Stream Multi-Channels Fusion Networks for Color Image Operation Chain Detection

no code implementations 12 Sep 2024 Yakun Niu, Lei Tan, Lei Zhang, Xianyu Zuo

To solve these issues, in this article, we propose a novel two-stream multi-channel fusion network for color image operation chain detection in which the spatial artifact stream and the noise residual stream are explored in a complementary manner.

Image Operation Chain Detection

CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization

no code implementations 9 Sep 2024 Nan Chen, Mengqi Huang, Zhuowei Chen, Yang Zheng, Lei Zhang, Zhendong Mao

This misconstruction leads to both overfitting or underfitting of irrelevant and intrinsic attributes of the subject, i.e., these attributes are over-represented or under-represented simultaneously, causing a trade-off between similarity and controllability.

Contrastive Learning

A Survey of Multimodal Composite Editing and Retrieval

1 code implementation 9 Sep 2024 Suyan Li, Fuxiang Huang, Lei Zhang

To facilitate a deeper understanding of this promising direction, this survey explores multimodal composite editing and retrieval in depth, covering image-text composite editing, image-text composite retrieval, and other multimodal composite retrieval.

Retrieval Survey

Fed-MUnet: Multi-modal Federated Unet for Brain Tumor Segmentation

1 code implementation 2 Sep 2024 Ruojun Zhou, Lisha Qu, Lei Zhang, Ziming Li, Hongwei Yu, Bing Luo

To address the above challenges, we propose a novel multi-modal FL framework for brain tumor segmentation (Fed-MUnet) that is suitable for FL training.

Brain Tumor Segmentation Federated Learning +3

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

1 code implementation 28 Aug 2024 Fangxun Shu, Yue Liao, Le Zhuo, Chenning Xu, Lei Zhang, Guanghao Zhang, Haonan Shi, Long Chen, Tao Zhong, Wanggui He, Siming Fu, Haoyuan Li, Bolin Li, Zhelun Yu, Si Liu, Hongsheng Li, Hao Jiang

We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM).

Computational Efficiency Hallucination +2

Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data

no code implementations 19 Aug 2024 Tao Yang, Yangming Shi, Yunwen Huang, Feng Chen, Yin Zheng, Lei Zhang

Text-to-video (T2V) generation has gained significant attention due to its wide applications to video generation, editing, enhancement and translation, etc.

Descriptive Image to Video Generation

VRCopilot: Authoring 3D Layouts with Generative AI Models in VR

no code implementations 18 Aug 2024 Lei Zhang, Jin Pan, Jacob Gettig, Steve Oney, Anhong Guo

Through a series of user studies, we evaluated the potential and challenges in manual, scaffolded, and automatic creation in immersive authoring.

SkillMimic: Learning Reusable Basketball Skills from Demonstrations

no code implementations 12 Aug 2024 Yinhuai Wang, Qihan Zhao, Runyi Yu, Ailing Zeng, Jing Lin, Zhengyi Luo, Hok Wai Tsui, Jiwen Yu, Xiu Li, Qifeng Chen, Jian Zhang, Lei Zhang, Ping Tan

SkillMimic employs a unified configuration to learn diverse skills from human-ball motion datasets, with skill diversity and generalization improving as the dataset grows.

SSL: A Self-similarity Loss for Improving Generative Image Super-resolution

1 code implementation 11 Aug 2024 Du Chen, Zhengqiang Zhang, Jie Liang, Lei Zhang

Based on the fact that natural images exhibit high self-similarities, i.e., a local patch can have many similar patches to it in the whole image, in this work we propose a simple yet effective self-similarity loss (SSL) to improve the performance of generative Real-ISR models, enhancing the hallucination of structural and textural details while reducing the unpleasant visual artifacts.

Hallucination Image Super-Resolution

Dual-path Collaborative Generation Network for Emotional Video Captioning

1 code implementation 6 Aug 2024 Cheng Ye, Weidong Chen, Jingyu Li, Lei Zhang, Zhendong Mao

Emotional Video Captioning is an emerging task that aims to describe factual content with the intrinsic emotions expressed in videos.

Caption Generation Video Captioning

DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation

1 code implementation 4 Aug 2024 Qinshuo Liu, Zixin Wang, Xi-An Li, Xinyao Ji, Lei Zhang, Lin Liu, Zhonghua Liu

Semiparametric statistics play a pivotal role in a wide range of domains, including but not limited to missing data, causal inference, and transfer learning, to name a few.

Causal Inference Transfer Learning

Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model

1 code implementation 2 Aug 2024 Yang Jin, Lei Zhang, Shi Yan, Bin Fan, Binglu Wang

Gaze object prediction (GOP) aims to predict the category and location of the object that a human is looking at.

Object object-detection +3

Gradient Harmonization in Unsupervised Domain Adaptation

no code implementations 1 Aug 2024 Fuxiang Huang, Suqi Song, Lei Zhang

In this paper, we delve into this issue and introduce two effective solutions known as Gradient Harmonization, including GH and GH++, to mitigate the conflict between domain alignment and classification tasks.

Unsupervised Domain Adaptation

The Llama 3 Herd of Models

1 code implementation31 Jul 2024 Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, Bobbie Chern, Charlotte Caucheteux, Chaya Nayak, Chloe Bi, Chris Marra, Chris McConnell, Christian Keller, Christophe Touret, Chunyang Wu, Corinne Wong, Cristian Canton Ferrer, Cyrus Nikolaidis, Damien Allonsius, Daniel Song, Danielle Pintz, Danny Livshits, Danny Wyatt, David Esiobu, Dhruv Choudhary, Dhruv Mahajan, Diego Garcia-Olano, Diego Perino, Dieuwke Hupkes, Egor Lakomkin, Ehab AlBadawy, Elina Lobanova, Emily Dinan, Eric Michael Smith, Filip Radenovic, Francisco Guzmán, Frank Zhang, Gabriel Synnaeve, Gabrielle Lee, Georgia Lewis Anderson, Govind Thattai, Graeme Nail, Gregoire Mialon, Guan Pang, Guillem Cucurell, Hailey Nguyen, Hannah Korevaar, Hu Xu, Hugo Touvron, Iliyan Zarov, Imanol Arrieta Ibarra, Isabel Kloumann, Ishan Misra, Ivan Evtimov, Jack Zhang, Jade Copet, Jaewon Lee, Jan Geffert, Jana Vranes, Jason Park, Jay Mahadeokar, Jeet Shah, Jelmer Van der Linde, Jennifer Billock, Jenny Hong, Jenya Lee, Jeremy Fu, Jianfeng Chi, Jianyu Huang, Jiawen Liu, Jie Wang, Jiecao Yu, Joanna Bitton, Joe Spisak, Jongsoo Park, Joseph Rocca, Joshua Johnstun, Joshua Saxe, Junteng Jia, Kalyan Vasuden Alwala, Karthik Prasad, Kartikeya Upasani, Kate Plawiak, Ke Li, Kenneth Heafield, Kevin Stone, Khalid El-Arini, Krithika Iyer, Kshitiz Malik, Kuenley Chiu, Kunal Bhalla, Kushal Lakhotia, Lauren Rantala-Yeary, Laurens van der Maaten, Lawrence Chen, Liang Tan, Liz Jenkins, Louis Martin, Lovish Madaan, Lubo Malo, Lukas Blecher, Lukas Landzaat, Luke de Oliveira, Madeline Muzzi, Mahesh Pasupuleti, Mannat Singh, Manohar Paluri, Marcin Kardas, Maria 
Tsimpoukelli, Mathew Oldham, Mathieu Rita, Maya Pavlova, Melanie Kambadur, Mike Lewis, Min Si, Mitesh Kumar Singh, Mona Hassan, Naman Goyal, Narjes Torabi, Nikolay Bashlykov, Nikolay Bogoychev, Niladri Chatterji, Ning Zhang, Olivier Duchenne, Onur Çelebi, Patrick Alrassy, Pengchuan Zhang, Pengwei Li, Petar Vasic, Peter Weng, Prajjwal Bhargava, Pratik Dubal, Praveen Krishnan, Punit Singh Koura, Puxin Xu, Qing He, Qingxiao Dong, Ragavan Srinivasan, Raj Ganapathy, Ramon Calderer, Ricardo Silveira Cabral, Robert Stojnic, Roberta Raileanu, Rohan Maheswari, Rohit Girdhar, Rohit Patel, Romain Sauvestre, Ronnie Polidoro, Roshan Sumbaly, Ross Taylor, Ruan Silva, Rui Hou, Rui Wang, Saghar Hosseini, Sahana Chennabasappa, Sanjay Singh, Sean Bell, Seohyun Sonia Kim, Sergey Edunov, Shaoliang Nie, Sharan Narang, Sharath Raparthy, Sheng Shen, Shengye Wan, Shruti Bhosale, Shun Zhang, Simon Vandenhende, Soumya Batra, Spencer Whitman, Sten Sootla, Stephane Collot, Suchin Gururangan, Sydney Borodinsky, Tamar Herman, Tara Fowler, Tarek Sheasha, Thomas Georgiou, Thomas Scialom, Tobias Speckbacher, Todor Mihaylov, Tong Xiao, Ujjwal Karn, Vedanuj Goswami, Vibhor Gupta, Vignesh Ramanathan, Viktor Kerkez, Vincent Gonguet, Virginie Do, Vish Vogeti, Vítor Albiero, Vladan Petrovic, Weiwei Chu, Wenhan Xiong, Wenyin Fu, Whitney Meers, Xavier Martinet, Xiaodong Wang, Xiaofang Wang, Xiaoqing Ellen Tan, Xide Xia, Xinfeng Xie, Xuchao Jia, Xuewei Wang, Yaelle Goldschlag, Yashesh Gaur, Yasmine Babaei, Yi Wen, Yiwen Song, Yuchen Zhang, Yue Li, Yuning Mao, Zacharie Delpierre Coudert, Zheng Yan, Zhengxing Chen, Zoe Papakipos, Aaditya Singh, Aayushi Srivastava, Abha Jain, Adam Kelsey, Adam Shajnfeld, Adithya Gangidi, Adolfo Victoria, Ahuva Goldstand, Ajay Menon, Ajay Sharma, Alex Boesenberg, Alexei Baevski, Allie Feinstein, Amanda Kallet, Amit Sangani, Amos Teo, Anam Yunus, Andrei Lupu, Andres Alvarado, Andrew Caples, Andrew Gu, Andrew Ho, Andrew Poulton, Andrew Ryan, Ankit Ramchandani, Annie Dong, Annie 
Franco, Anuj Goyal, Aparajita Saraf, Arkabandhu Chowdhury, Ashley Gabriel, Ashwin Bharambe, Assaf Eisenman, Azadeh Yazdan, Beau James, Ben Maurer, Benjamin Leonhardi, Bernie Huang, Beth Loyd, Beto De Paola, Bhargavi Paranjape, Bing Liu, Bo Wu, Boyu Ni, Braden Hancock, Bram Wasti, Brandon Spence, Brani Stojkovic, Brian Gamido, Britt Montalvo, Carl Parker, Carly Burton, Catalina Mejia, Ce Liu, Changhan Wang, Changkyu Kim, Chao Zhou, Chester Hu, Ching-Hsiang Chu, Chris Cai, Chris Tindal, Christoph Feichtenhofer, Cynthia Gao, Damon Civin, Dana Beaty, Daniel Kreymer, Daniel Li, David Adkins, David Xu, Davide Testuggine, Delia David, Devi Parikh, Diana Liskovich, Didem Foss, Dingkang Wang, Duc Le, Dustin Holland, Edward Dowling, Eissa Jamil, Elaine Montgomery, Eleonora Presani, Emily Hahn, Emily Wood, Eric-Tuan Le, Erik Brinkman, Esteban Arcaute, Evan Dunbar, Evan Smothers, Fei Sun, Felix Kreuk, Feng Tian, Filippos Kokkinos, Firat Ozgenel, Francesco Caggioni, Frank Kanayet, Frank Seide, Gabriela Medina Florez, Gabriella Schwarz, Gada Badeer, Georgia Swee, Gil Halpern, Grant Herman, Grigory Sizov, Guangyi, Zhang, Guna Lakshminarayanan, Hakan Inan, Hamid Shojanazeri, Han Zou, Hannah Wang, Hanwen Zha, Haroun Habeeb, Harrison Rudolph, Helen Suk, Henry Aspegren, Hunter Goldman, Hongyuan Zhan, Ibrahim Damlaj, Igor Molybog, Igor Tufanov, Ilias Leontiadis, Irina-Elena Veliche, Itai Gat, Jake Weissman, James Geboski, James Kohli, Janice Lam, Japhet Asher, Jean-Baptiste Gaya, Jeff Marcus, Jeff Tang, Jennifer Chan, Jenny Zhen, Jeremy Reizenstein, Jeremy Teboul, Jessica Zhong, Jian Jin, Jingyi Yang, Joe Cummings, Jon Carvill, Jon Shepard, Jonathan McPhie, Jonathan Torres, Josh Ginsburg, Junjie Wang, Kai Wu, Kam Hou U, Karan Saxena, Kartikay Khandelwal, Katayoun Zand, Kathy Matosich, Kaushik Veeraraghavan, Kelly Michelena, Keqian Li, Kiran Jagadeesh, Kun Huang, Kunal Chawla, Kyle Huang, Lailin Chen, Lakshya Garg, Lavender A, Leandro Silva, Lee Bell, Lei Zhang, Liangpeng Guo, Licheng 
Yu, Liron Moshkovich, Luca Wehrstedt, Madian Khabsa, Manav Avalani, Manish Bhatt, Martynas Mankus, Matan Hasson, Matthew Lennie, Matthias Reso, Maxim Groshev, Maxim Naumov, Maya Lathi, Meghan Keneally, Miao Liu, Michael L. Seltzer, Michal Valko, Michelle Restrepo, Mihir Patel, Mik Vyatskov, Mikayel Samvelyan, Mike Clark, Mike Macey, Mike Wang, Miquel Jubert Hermoso, Mo Metanat, Mohammad Rastegari, Munish Bansal, Nandhini Santhanam, Natascha Parks, Natasha White, Navyata Bawa, Nayan Singhal, Nick Egebo, Nicolas Usunier, Nikhil Mehta, Nikolay Pavlovich Laptev, Ning Dong, Norman Cheng, Oleg Chernoguz, Olivia Hart, Omkar Salpekar, Ozlem Kalinli, Parkin Kent, Parth Parekh, Paul Saab, Pavan Balaji, Pedro Rittner, Philip Bontrager, Pierre Roux, Piotr Dollar, Polina Zvyagina, Prashant Ratanchandani, Pritish Yuvraj, Qian Liang, Rachad Alao, Rachel Rodriguez, Rafi Ayub, Raghotham Murthy, Raghu Nayani, Rahul Mitra, Rangaprabhu Parthasarathy, Raymond Li, Rebekkah Hogan, Robin Battey, Rocky Wang, Russ Howes, Ruty Rinott, Sachin Mehta, Sachin Siby, Sai Jayesh Bondu, Samyak Datta, Sara Chugh, Sara Hunt, Sargun Dhillon, Sasha Sidorov, Satadru Pan, Saurabh Mahajan, Saurabh Verma, Seiji Yamamoto, Sharadh Ramaswamy, Shaun Lindsay, Sheng Feng, Shenghao Lin, Shengxin Cindy Zha, Shishir Patil, Shiva Shankar, Shuqiang Zhang, Sinong Wang, Sneha Agarwal, Soji Sajuyigbe, Soumith Chintala, Stephanie Max, Stephen Chen, Steve Kehoe, Steve Satterfield, Sudarshan Govindaprasad, Sumit Gupta, Summer Deng, Sungmin Cho, Sunny Virk, Suraj Subramanian, Sy Choudhury, Sydney Goldman, Tal Remez, Tamar Glaser, Tamara Best, Thilo Koehler, Thomas Robinson, Tianhe Li, Tianjun Zhang, Tim Matthews, Timothy Chou, Tzook Shaked, Varun Vontimitta, Victoria Ajayi, Victoria Montanez, Vijai Mohan, Vinay Satish Kumar, Vishal Mangla, Vlad Ionescu, Vlad Poenaru, Vlad Tiberiu Mihailescu, Vladimir Ivanov, Wei Li, Wenchen Wang, WenWen Jiang, Wes Bouaziz, Will Constable, Xiaocheng Tang, Xiaojian Wu, Xiaolan Wang, Xilun Wu, 
Xinbo Gao, Yaniv Kleinman, Yanjun Chen, Ye Hu, Ye Jia, Ye Qi, Yenda Li, Yilin Zhang, Ying Zhang, Yossi Adi, Youngjin Nam, Yu, Wang, Yu Zhao, Yuchen Hao, Yundi Qian, Yunlu Li, Yuzi He, Zach Rait, Zachary DeVito, Zef Rosnbrick, Zhaoduo Wen, Zhenyu Yang, Zhiwei Zhao, Zhiyu Ma

This paper presents a new set of foundation models, called Llama 3.

Language Modelling Multi-task Language Understanding +2

GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

no code implementations 25 Jul 2024 Jintong Hu, Bin Xia, Bin Chen, Wenming Yang, Lei Zhang

Although these approaches have shown promising results, their performance is constrained by the limited representation ability of discrete latent codes in the encoded features.

Decoder Image Super-Resolution

Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection

no code implementations 23 Jul 2024 Su Li, Wang Liang, Jianye Wang, Ziheng Zhang, Lei Zhang

Estimating abnormal posture based on 3D pose is vital in human pose analysis, yet it presents challenges, especially when reconstructing 3D human poses from monocular datasets with occlusions.

Language Modelling Optical Flow Estimation

SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition

1 code implementation 23 Jul 2024 Wenbo Huang, Jinghui Zhang, Xuwei Qian, Zhen Wu, Meng Wang, Lei Zhang

High frame-rate (HFR) videos of action recognition improve fine-grained expression while reducing the spatio-temporal relation and motion information density.

Few-Shot action recognition Few Shot Action Recognition +1

TAPTRv2: Attention-based Position Update Improves Tracking Any Point

no code implementations 23 Jul 2024 Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Feng Li, Tianhe Ren, Bohan Li, Lei Zhang

In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task.

Position

LIDIA: Precise Liver Tumor Diagnosis on Multi-Phase Contrast-Enhanced CT via Iterative Fusion and Asymmetric Contrastive Learning

no code implementations 18 Jul 2024 Wei Huang, Wei Liu, XiaoMing Zhang, Xiaoli Yin, Xu Han, Chunli Li, Yuan Gao, Yu Shi, Le Lu, Ling Zhang, Lei Zhang, Ke Yan

The early detection and precise diagnosis of liver tumors are tasks of critical clinical value, yet they pose significant challenges due to the high heterogeneity and variability of liver tumors.

Contrastive Learning

Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

no code implementations 18 Jul 2024 Pengfei Wang, Yuxi Wang, Shuai Li, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

The scarcity of large-scale 3D-text paired data poses a great challenge on open vocabulary 3D scene understanding, and hence it is popular to leverage internet-scale 2D data and transfer their open vocabulary capabilities to 3D models through knowledge distillation.

Knowledge Distillation Representation Learning +1

Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation

no code implementations 17 Jul 2024 Kaixin Bai, Lei Zhang, Zhaopeng Chen, Fang Wan, Jianwei Zhang

Despite the substantial progress in deep learning, its adoption in industrial robotics projects remains limited, primarily due to challenges in data acquisition and labeling.

Deep Learning Instance Segmentation +5

Dilated convolution neural operator for multiscale partial differential equations

no code implementations 16 Jul 2024 Bo Xu, Xinliang Liu, Lei Zhang

This paper introduces a data-driven operator learning method for multiscale partial differential equations, with a particular emphasis on preserving high-frequency information.

Operator learning

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding

no code implementations 13 Jul 2024 Ruihuang Li, Zhengqiang Zhang, Chenhang He, Zhiyuan Ma, Vishal M. Patel, Lei Zhang

Recent vision-language pre-training models have exhibited remarkable generalization ability in zero-shot recognition tasks.

Scene Understanding Zero-Shot Learning

LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models

2 code implementations 12 Jul 2024 Yabin Zhang, Wenjie Zhu, Chenhang He, Lei Zhang

The LAPT framework operates autonomously, requiring only ID class names as input and eliminating the need for manual intervention.

Image Generation Out of Distribution (OOD) Detection +1

A Text-to-Game Engine for UGC-Based Role-Playing Games

no code implementations 11 Jul 2024 Lei Zhang, Xuezheng Peng, Shuyi Yang, Feiyang Wang

The shift from professionally generated content (PGC) to user-generated content (UGC) has revolutionized various media formats, from text to video.

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

1 code implementation 10 Jul 2024 Wanggui He, Siming Fu, Mushui Liu, Xierui Wang, Wenyi Xiao, Fangxun Shu, Yi Wang, Lei Zhang, Zhelun Yu, Haoyuan Li, Ziwei Huang, Leilei Gan, Hao Jiang

Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis.

Image Generation Text Generation

EMBANet: A Flexible Efficient Multi-branch Attention Network

no code implementations 7 Jul 2024 Keke Zu, Hu Zhang, Jian Lu, Lei Zhang, Chen Xu

The proposed MBC module brings new degrees of freedom (DoF) for the design of attention networks by allowing the type of transformation operators and the number of branches to be flexibly adjusted.

SymPoint Revolutionized: Boosting Panoptic Symbol Spotting with Layer Feature Enhancement

1 code implementation 2 Jul 2024 Wenlong Liu, Tianyu Yang, QiZhi Yu, Lei Zhang

In particular, we first propose a Layer Feature-Enhanced module (LFE) to encode the graphical layer information into the primitive feature, which significantly boosts the performance.

TokenPacker: Efficient Visual Projector for Multimodal LLM

1 code implementation 2 Jul 2024 Wentong Li, Yuqian Yuan, Jian Liu, Dongqi Tang, Song Wang, Jie Qin, Jianke Zhu, Lei Zhang

However, the visual tokens are redundant and can be considerably increased when dealing with high-resolution images, impairing the efficiency of MLLMs significantly.

Language Modelling Large Language Model +2

ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation

1 code implementation 2 Jul 2024 Zhiyuan Ma, Yuxiang Wei, Yabin Zhang, Xiangyu Zhu, Zhen Lei, Lei Zhang

Current state-of-the-art methods such as Variational Score Distillation finetune the pretrained diffusion model to minimize the noise prediction error so as to align the distributions; however, they are unstable to train and impair the model's ability to comprehend numerous text prompts.

Text to 3D

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

1 code implementation 26 Jun 2024 Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang

We applied the HCP strategy in experiments with six Repo-Code LLMs, and the results demonstrate that our proposed method can significantly enhance completion accuracy while substantially reducing the length of input.

Code Completion

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

1 code implementation 25 Jun 2024 Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li

Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows.

Benchmarking Long-Context Understanding +2

CausalMMM: Learning Causal Structure for Marketing Mix Modeling

no code implementations 24 Jun 2024 Chang Gong, Di Yao, Lei Zhang, Sheng Chen, Wenbin Li, Yueyang Su, Jingping Bi

We argue that causal MMM needs to dynamically discover specific causal structures for different shops, and that its predictions should comply with previously known marketing response patterns.

Marketing Variational Inference

Soft Masked Mamba Diffusion Model for CT to MRI Conversion

1 code implementation 22 Jun 2024 Zhenbin Wang, Lei Zhang, Lituan Wang, Zhenwei Zhang

Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) are the predominant modalities utilized in the field of medical imaging.

Computed Tomography (CT) Image Generation +2

Toward Exploring the Code Understanding Capabilities of Pre-trained Code Generation Models

no code implementations 18 Jun 2024 Jiayi Lin, Yutao Xie, Yue Yu, Yibiao Yang, Lei Zhang

While these models acquire vast amounts of code knowledge, they perform poorly on code understanding tasks, such as code search and clone detection, as they are specifically trained for generation.

Clone Detection Code Generation +3

Integrating sensing and communications: Simultaneously transmitting and reflecting digital coding metasurfaces

no code implementations 16 Jun 2024 Francesco Verde, Vincenzo Galdi, Lei Zhang, Tie Jun Cui

Wireless networks are undergoing a transformative shift, driven by the crucial factors of cost effectiveness and sustainability.

Voxel Mamba: Group-Free State Space Models for Point Cloud based 3D Object Detection

1 code implementation 15 Jun 2024 Guowen Zhang, Lue Fan, Chenhang He, Zhen Lei, Zhaoxiang Zhang, Lei Zhang

Inspired by the recent advances of state space models (SSMs), we present a Voxel SSM, termed as Voxel Mamba, which employs a group-free strategy to serialize the whole space of voxels into a single sequence.

3D Object Detection Computational Efficiency +3

Interference Analysis for Coexistence of UAVs and Civil Aircrafts Based on Automatic Dependent Surveillance-Broadcast

no code implementations 12 Jun 2024 Yiyang Liao, Ziye Jia, Chao Dong, Lei Zhang, Qihui Wu, Huiling Hu, Zhu Han

However, because channel capacity is a limited resource, equipping UAVs with ADS-B results in interference between UAVs and civil aircraft (CAs), which further impacts the accuracy of the information received at GSs.

One-Step Effective Diffusion Network for Real-World Image Super-Resolution

1 code implementation 12 Jun 2024 Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, Lei Zhang

Most of the existing methods start from random noise to reconstruct the high-quality (HQ) image under the guidance of the given low-quality (LQ) image.

Image Restoration Image Super-Resolution

Open-World Human-Object Interaction Detection via Multi-modal Prompts

no code implementations CVPR 2024 Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang

In this paper, we develop MP-HOI, a powerful Multi-modal Prompt-based HOI detector designed to leverage both textual descriptions for open-set generalization and visual exemplars for handling high ambiguity in descriptions, realizing HOI detection in the open world.

Human-Object Interaction Detection

Autoregressive Pretraining with Mamba in Vision

1 code implementation 11 Jun 2024 Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.

Mamba

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

1 code implementation 30 May 2024 Ling-Hao Chen, Shunlin Lu, Ailing Zeng, Hao Zhang, Benyou Wang, Ruimao Zhang, Lei Zhang

This study delves into the realm of multi-modality (i.e., video and motion modalities) human behavior understanding by leveraging the powerful capabilities of Large Language Models (LLMs).

Network Interdiction Goes Neural

no code implementations 26 May 2024 Lei Zhang, Zhiqian Chen, Chang-Tien Lu, Liang Zhao

Network interdiction problems are combinatorial optimization problems involving two players: one aims to solve an optimization problem on a network, while the other seeks to modify the network to thwart the first player's objectives.

Combinatorial Optimization Graph Matching +1

Spiking Neural Network Phase Encoding for Cognitive Computing

no code implementations 25 May 2024 Lei Zhang

Additionally, the paper discusses the encoding of impulse delays and the phase differences between adjacent frequency components.

Time Series

LIRE: listwise reward enhancement for preference alignment

1 code implementation 22 May 2024 Mingye Zhu, Yi Liu, Lei Zhang, Junbo Guo, Zhendong Mao

Recently, tremendous strides have been made to align the generation of Large Language Models (LLMs) with human values to mitigate toxic or unhelpful content.

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

2 code implementations 16 May 2024 Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

Empirical results demonstrate the effectiveness of Grounding DINO 1.5, with the Grounding DINO 1.5 Pro model attaining a 54.3 AP on the COCO detection benchmark and a 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection.

Ranked #1 on Zero-Shot Object Detection on MSCOCO (using extra training data)

Edge-computing Few-Shot Object Detection +2

PCLMix: Weakly Supervised Medical Image Segmentation via Pixel-Level Contrastive Learning and Dynamic Mix Augmentation

1 code implementation 10 May 2024 Yu Lei, Haolun Luo, Lituan Wang, Zhenwei Zhang, Lei Zhang

In weakly supervised medical image segmentation, the absence of structural priors and the discreteness of class feature distribution present a challenge, i.e., how to accurately propagate supervision signals from local to global regions without excessively spreading them to other irrelevant regions?

Contrastive Learning Decoder +4

MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation

1 code implementation 9 May 2024 Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Lei Zhang, WangMeng Zuo

In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both faithful identity fidelity and flexible editability.

Text-to-Image Generation
