Search Results for author: Mei Chen

Found 35 papers, 14 papers with code

Defense against Prompt Injection Attacks via Mixture of Encodings

1 code implementation10 Apr 2025 Ruiyi Zhang, David Sullivan, Kyle Jackson, Pengtao Xie, Mei Chen

Large Language Models (LLMs) have emerged as a dominant approach for a wide range of NLP tasks, with their access to external information further enhancing their capabilities.

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

1 code implementation17 Feb 2025 Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, HongYu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, Jianchang Wu, Jiangjie Zhen, Ranchen Ming, Song Yuan, Xuelin Zhang, Yu Zhou, Bingxin Li, Buyun Ma, Hongyuan Wang, Kang An, Wei Ji, Wen Li, Xuan Wen, Xiangwen Kong, Yuankai Ma, Yuanwei Liang, Yun Mou, Bahtiyar Ahmidi, Bin Wang, Bo Li, Changxin Miao, Chen Xu, Chenrun Wang, Dapeng Shi, Deshan Sun, Dingyuan Hu, Dula Sai, Enle Liu, Guanzhe Huang, Gulin Yan, Heng Wang, Haonan Jia, Haoyang Zhang, Jiahao Gong, Junjing Guo, Jiashuai Liu, Jiahong Liu, Jie Feng, Jie Wu, Jiaoren Wu, Jie Yang, Jinguo Wang, Jingyang Zhang, Junzhe Lin, Kaixiang Li, Lei Xia, Li Zhou, Liang Zhao, Longlong Gu, Mei Chen, Menglin Wu, Ming Li, Mingxiao Li, Mingliang Li, Mingyao Liang, Na Wang, Nie Hao, Qiling Wu, Qinyuan Tan, Ran Sun, Shuai Shuai, Shaoliang Pang, Shiliang Yang, Shuli Gao, Shanshan Yuan, SiQi Liu, Shihong Deng, Shilei Jiang, Sitong Liu, Tiancheng Cao, Tianyu Wang, Wenjin Deng, Wuxun Xie, Weipeng Ming, Wenqing He, Wen Sun, Xin Han, Xin Huang, Xiaomin Deng, Xiaojia Liu, Xin Wu, Xu Zhao, Yanan Wei, Yanbo Yu, Yang Cao, Yangguang Li, Yangzhen Ma, Yanming Xu, Yaoyu Wang, Yaqiang Shi, Yilei Wang, Yizhuang Zhou, Yinmin Zhong, Yang Zhang, Yaoben Wei, Yu Luo, Yuanwei Lu, Yuhe Yin, Yuchu Luo, Yuanhao Ding, Yuting Yan, Yaqi Dai, Yuxiang Yang, Zhe Xie, Zheng Ge, Zheng Sun, Zhewei Huang, Zhichao Chang, Zhisheng Guan, Zidong Yang, Zili Zhang, Binxing Jiao, Daxin Jiang, Heung-Yeung Shum, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu

Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following.

Instruction Following Voice Cloning

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

3 code implementations14 Feb 2025 Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, Bizhu Huang, Bo wang, Brian Li, Changxing Miao, Chen Xu, Chenfei Wu, Chenguang Yu, Dapeng Shi, Dingyuan Hu, Enle Liu, Gang Yu, Ge Yang, Guanzhe Huang, Gulin Yan, Haiyang Feng, Hao Nie, Haonan Jia, Hanpeng Hu, Hanqi Chen, Haolong Yan, Heng Wang, Hongcheng Guo, Huilin Xiong, Huixin Xiong, Jiahao Gong, Jianchang Wu, Jiaoren Wu, Jie Wu, Jie Yang, Jiashuai Liu, Jiashuo Li, Jingyang Zhang, Junjing Guo, Junzhe Lin, Kaixiang Li, Lei Liu, Lei Xia, Liang Zhao, Liguo Tan, Liwen Huang, Liying Shi, Ming Li, Mingliang Li, Muhua Cheng, Na Wang, Qiaohui Chen, Qinglin He, Qiuyan Liang, Quan Sun, Ran Sun, Rui Wang, Shaoliang Pang, Shiliang Yang, Sitong Liu, SiQi Liu, Shuli Gao, Tiancheng Cao, Tianyu Wang, Weipeng Ming, Wenqing He, Xu Zhao, Xuelin Zhang, Xianfang Zeng, Xiaojia Liu, Xuan Yang, Yaqi Dai, Yanbo Yu, Yang Li, Yineng Deng, Yingming Wang, Yilei Wang, Yuanwei Lu, Yu Chen, Yu Luo, Yuchu Luo, Yuhe Yin, Yuheng Feng, Yuxiang Yang, Zecheng Tang, Zekai Zhang, Zidong Yang, Binxing Jiao, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu, Heung-Yeung Shum, Daxin Jiang

We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length.

Video Generation Video Reconstruction

Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment

no code implementations7 Feb 2025 Minh-Quan Le, Gaurav Mittal, Tianjian Meng, A S M Iftekhar, Vishwas Suryanarayanan, Barun Patra, Dimitris Samaras, Mei Chen

While diffusion models are powerful in generating high-quality, diverse synthetic data for object-centric tasks, existing methods struggle with scene-aware tasks such as Visual Question Answering (VQA) and Human-Object Interaction (HOI) Reasoning, where it is critical to preserve scene attributes in generated images consistent with a multimodal context, i. e. a reference image with accompanying text guidance query.

Diversity Human-Object Interaction Detection +4

PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models

1 code implementation9 Dec 2024 Qian Zhang, Panfeng Chen, Jiali Li, Linkun Feng, Shuyu Liu, Heng Zhao, Mei Chen, Hui Li, Yanhao Wang

Through an in-depth analysis of experimental results, we offer insights into the ability of LLMs to answer pediatric questions in the Chinese context, highlighting their limitations for further improvements.

Benchmarking Instruction Following +1

SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving

no code implementations23 Nov 2024 Su Sun, Cheng Zhao, Zhuoyang Sun, Yingjie Victor Chen, Mei Chen

SplatFlow designs a unified framework to seamlessly integrate time-dependent 4D Gaussian representation within NMFF, where NMFF is a set of implicit functions to model temporal motions of both LiDAR points and Gaussians as continuous motion flow fields.

Autonomous Driving Image Reconstruction +1

Cell as Point: One-Stage Framework for Efficient Cell Tracking

no code implementations22 Nov 2024 Yaxuan Song, Jianan Fan, Heng Huang, Mei Chen, Weidong Cai

To solve these challenges, CAP introduces two key innovations: (1) adaptive event-guided (AEG) sampling, which prioritizes cell division events to mitigate the occurrence imbalance of cell events, and (2) the rolling-as-window (RAW) inference strategy, which ensures continuous and stable tracking of newly emerging cells over extended sequences.

Cell Tracking Diagnostic +1

Phenome-wide causal proteomics enhance systemic lupus erythematosus flare prediction: A study in Asian populations

no code implementations18 Nov 2024 Liying Chen, Ou Deng, Ting Fang, Mei Chen, Xvfeng Zhang, Ruichen Cong, Dingqi Lu, Runrun Zhang, Qun Jin, Xinchang Wang

The identification of key proteins and their causal relationships with flare-related clinical markers provides valuable insights for proactive SLE management and personalized therapeutic approaches.

2k Management

Rule By Example: Harnessing Logical Rules for Explainable Hate Speech Detection

1 code implementation24 Jul 2023 Christopher Clarke, Matthew Hall, Gaurav Mittal, Ye Yu, Sandra Sajeev, Jason Mars, Mei Chen

In this paper, we present Rule By Example (RBE): a novel exemplar-based contrastive learning approach for learning from logical rules for the task of textual content moderation.

Contrastive Learning Deep Learning +1

Rethinking Multimodal Content Moderation from an Asymmetric Angle with Mixed-modality

no code implementations17 May 2023 Jialin Yuan, Ye Yu, Gaurav Mittal, Matthew Hall, Sandra Sajeev, Mei Chen

There is a rapidly growing need for multimodal content moderation (CM) as more and more content on social media is multimodal in nature.

PivoTAL: Prior-Driven Supervision for Weakly-Supervised Temporal Action Localization

no code implementations CVPR 2023 Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen

To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, to approach WTAL from a localization-by-localization perspective by learning to localize the action snippets directly.

Weakly Supervised Action Localization Weakly Supervised Temporal Action Localization

ProTeGe: Untrimmed Pretraining for Video Temporal Grounding by Video Temporal Grounding

no code implementations CVPR 2023 Lan Wang, Gaurav Mittal, Sandra Sajeev, Ye Yu, Matthew Hall, Vishnu Naresh Boddeti, Mei Chen

We present ProTeGe as the first method to perform VTG-based untrimmed pretraining to bridge the gap between trimmed pretrained backbones and downstream VTG tasks.

text similarity

BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation

no code implementations1 Aug 2022 Ye Yu, Jialin Yuan, Gaurav Mittal, Li Fuxin, Mei Chen

It captures object motion in the video via a novel optical flow calibration module that fuses the segmentation mask with optical flow estimation to improve within-object optical flow smoothness and reduce noise at object boundaries.

 Ranked #1 on Video Object Segmentation on DAVIS 2017 (test-dev) (using extra training data)

Object Optical Flow Estimation +6

GateHUB: Gated History Unit with Background Suppression for Online Action Detection

no code implementations CVPR 2022 Junwen Chen, Gaurav Mittal, Ye Yu, Yu Kong, Mei Chen

We present GateHUB, Gated History Unit with Background Suppression, that comprises a novel position-guided gated cross-attention mechanism to enhance or suppress parts of the history as per how informative they are for current frame prediction.

Online Action Detection Optical Flow Estimation

MUSE: Feature Self-Distillation with Mutual Information and Self-Information

no code implementations25 Oct 2021 Yu Gong, Ye Yu, Gaurav Mittal, Greg Mori, Mei Chen

Importantly, we argue and empirically demonstrate that MUSE, compared to other feature discrepancy functions, is a more functional proxy to introduce dependency and effectively improve the expressivity of all features in the knowledge distillation framework.

Image Classification Knowledge Distillation +2

DSNet: A Dual-Stream Framework for Weakly-Supervised Gigapixel Pathology Image Analysis

no code implementations13 Sep 2021 Tiange Xiang, Yang song, Chaoyi Zhang, Dongnan Liu, Mei Chen, Fan Zhang, Heng Huang, Lauren O'Donnell, Weidong Cai

With image-level labels only, patch-wise classification would be sub-optimal due to inconsistency between the patch appearance and image-level label.

Classification whole slide images

Seismic Inverse Modeling Method based on Generative Adversarial Network

no code implementations8 Jun 2021 Pengfei Xie, YanShu Yin, JiaGen Hou, Mei Chen, LiXin Wang

Seismic inverse modeling is a common method in reservoir prediction and it plays a vital role in the exploration and development of oil and gas.

Generative Adversarial Network Seismic Inversion

Revisiting Dynamic Convolution via Matrix Decomposition

1 code implementation ICLR 2021 Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, Nuno Vasconcelos

It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging.

Dimensionality Reduction

Stronger NAS with Weaker Predictors

1 code implementation NeurIPS 2021 Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

We propose a paradigm shift from fitting the whole architecture space using one strong predictor, to progressively fitting a search path towards the high-performance sub-space through a set of weaker predictors.

Neural Architecture Search

Weak NAS Predictor Is All You Need

no code implementations1 Jan 2021 Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan

Rather than expecting a single strong predictor to model the whole space, we seek a progressive line of weak predictors that can connect a path to the best architecture, thus greatly simplifying the learning task of each predictor.

All Neural Architecture Search

PDAM: A Panoptic-Level Feature Alignment Framework for Unsupervised Domain Adaptive Instance Segmentation in Microscopy Images

1 code implementation11 Sep 2020 Dongnan Liu, Donghao Zhang, Yang song, Fan Zhang, Lauren O'Donnell, Heng Huang, Mei Chen, Weidong Cai

In this work, we present an unsupervised domain adaptation (UDA) method, named Panoptic Domain Adaptive Mask R-CNN (PDAM), for unsupervised instance segmentation in microscopy images.

Instance Segmentation Segmentation +3

HyperSTAR: Task-Aware Hyperparameters for Deep Networks

no code implementations CVPR 2020 Gaurav Mittal, Chang Liu, Nikolaos Karianakis, Victor Fragoso, Mei Chen, Yun Fu

To reduce HPO time, we present HyperSTAR (System for Task Aware Hyperparameter Recommendation), a task-aware method to warm-start HPO for deep neural networks.

Hyperparameter Optimization Image Classification

Region and Object based Panoptic Image Synthesis through Conditional GANs

no code implementations14 Dec 2019 Heng Wang, Donghao Zhang, Yang song, Heng Huang, Mei Chen, Weidong Cai

Our contribution consists of the proposal of a significant task worth investigating and a naive baseline of solving it.

Image-to-Image Translation Translation

FARSA: Fully Automated Roadway Safety Assessment

1 code implementation17 Jan 2019 Weilian Song, Scott Workman, Armin Hadzic, Xu Zhang, Eric Green, Mei Chen, Reginald Souleyrette, Nathan Jacobs

An emerging approach for conducting such assessments in the United States is through the US Road Assessment Program (usRAP), which rates roads from highest risk (1 star) to lowest (5 stars).

Cannot find the paper you are looking for? You can Submit a new open access paper.