1 code implementation • 10 Apr 2025 • Ruiyi Zhang, David Sullivan, Kyle Jackson, Pengtao Xie, Mei Chen
Large Language Models (LLMs) have emerged as a dominant approach for a wide range of NLP tasks, with their access to external information further enhancing their capabilities.
1 code implementation • 17 Feb 2025 • Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, HongYu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, Jianchang Wu, Jiangjie Zhen, Ranchen Ming, Song Yuan, Xuelin Zhang, Yu Zhou, Bingxin Li, Buyun Ma, Hongyuan Wang, Kang An, Wei Ji, Wen Li, Xuan Wen, Xiangwen Kong, Yuankai Ma, Yuanwei Liang, Yun Mou, Bahtiyar Ahmidi, Bin Wang, Bo Li, Changxin Miao, Chen Xu, Chenrun Wang, Dapeng Shi, Deshan Sun, Dingyuan Hu, Dula Sai, Enle Liu, Guanzhe Huang, Gulin Yan, Heng Wang, Haonan Jia, Haoyang Zhang, Jiahao Gong, Junjing Guo, Jiashuai Liu, Jiahong Liu, Jie Feng, Jie Wu, Jiaoren Wu, Jie Yang, Jinguo Wang, Jingyang Zhang, Junzhe Lin, Kaixiang Li, Lei Xia, Li Zhou, Liang Zhao, Longlong Gu, Mei Chen, Menglin Wu, Ming Li, Mingxiao Li, Mingliang Li, Mingyao Liang, Na Wang, Nie Hao, Qiling Wu, Qinyuan Tan, Ran Sun, Shuai Shuai, Shaoliang Pang, Shiliang Yang, Shuli Gao, Shanshan Yuan, SiQi Liu, Shihong Deng, Shilei Jiang, Sitong Liu, Tiancheng Cao, Tianyu Wang, Wenjin Deng, Wuxun Xie, Weipeng Ming, Wenqing He, Wen Sun, Xin Han, Xin Huang, Xiaomin Deng, Xiaojia Liu, Xin Wu, Xu Zhao, Yanan Wei, Yanbo Yu, Yang Cao, Yangguang Li, Yangzhen Ma, Yanming Xu, Yaoyu Wang, Yaqiang Shi, Yilei Wang, Yizhuang Zhou, Yinmin Zhong, Yang Zhang, Yaoben Wei, Yu Luo, Yuanwei Lu, Yuhe Yin, Yuchu Luo, Yuanhao Ding, Yuting Yan, Yaqi Dai, Yuxiang Yang, Zhe Xie, Zheng Ge, Zheng Sun, Zhewei Huang, Zhichao Chang, Zhisheng Guan, Zidong Yang, Zili Zhang, Binxing Jiao, Daxin Jiang, Heung-Yeung Shum, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu
Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following.
3 code implementations • 14 Feb 2025 • Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, Bizhu Huang, Bo wang, Brian Li, Changxing Miao, Chen Xu, Chenfei Wu, Chenguang Yu, Dapeng Shi, Dingyuan Hu, Enle Liu, Gang Yu, Ge Yang, Guanzhe Huang, Gulin Yan, Haiyang Feng, Hao Nie, Haonan Jia, Hanpeng Hu, Hanqi Chen, Haolong Yan, Heng Wang, Hongcheng Guo, Huilin Xiong, Huixin Xiong, Jiahao Gong, Jianchang Wu, Jiaoren Wu, Jie Wu, Jie Yang, Jiashuai Liu, Jiashuo Li, Jingyang Zhang, Junjing Guo, Junzhe Lin, Kaixiang Li, Lei Liu, Lei Xia, Liang Zhao, Liguo Tan, Liwen Huang, Liying Shi, Ming Li, Mingliang Li, Muhua Cheng, Na Wang, Qiaohui Chen, Qinglin He, Qiuyan Liang, Quan Sun, Ran Sun, Rui Wang, Shaoliang Pang, Shiliang Yang, Sitong Liu, SiQi Liu, Shuli Gao, Tiancheng Cao, Tianyu Wang, Weipeng Ming, Wenqing He, Xu Zhao, Xuelin Zhang, Xianfang Zeng, Xiaojia Liu, Xuan Yang, Yaqi Dai, Yanbo Yu, Yang Li, Yineng Deng, Yingming Wang, Yilei Wang, Yuanwei Lu, Yu Chen, Yu Luo, Yuchu Luo, Yuhe Yin, Yuheng Feng, Yuxiang Yang, Zecheng Tang, Zekai Zhang, Zidong Yang, Binxing Jiao, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu, Heung-Yeung Shum, Daxin Jiang
We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length.
no code implementations • 7 Feb 2025 • Minh-Quan Le, Gaurav Mittal, Tianjian Meng, A S M Iftekhar, Vishwas Suryanarayanan, Barun Patra, Dimitris Samaras, Mei Chen
While diffusion models are powerful in generating high-quality, diverse synthetic data for object-centric tasks, existing methods struggle with scene-aware tasks such as Visual Question Answering (VQA) and Human-Object Interaction (HOI) Reasoning, where it is critical to preserve scene attributes in generated images consistent with a multimodal context, i. e. a reference image with accompanying text guidance query.
1 code implementation • 9 Dec 2024 • Qian Zhang, Panfeng Chen, Jiali Li, Linkun Feng, Shuyu Liu, Heng Zhao, Mei Chen, Hui Li, Yanhao Wang
Through an in-depth analysis of experimental results, we offer insights into the ability of LLMs to answer pediatric questions in the Chinese context, highlighting their limitations for further improvements.
no code implementations • 23 Nov 2024 • Su Sun, Cheng Zhao, Zhuoyang Sun, Yingjie Victor Chen, Mei Chen
SplatFlow designs a unified framework to seamlessly integrate time-dependent 4D Gaussian representation within NMFF, where NMFF is a set of implicit functions to model temporal motions of both LiDAR points and Gaussians as continuous motion flow fields.
no code implementations • 22 Nov 2024 • Yaxuan Song, Jianan Fan, Heng Huang, Mei Chen, Weidong Cai
To solve these challenges, CAP introduces two key innovations: (1) adaptive event-guided (AEG) sampling, which prioritizes cell division events to mitigate the occurrence imbalance of cell events, and (2) the rolling-as-window (RAW) inference strategy, which ensures continuous and stable tracking of newly emerging cells over extended sequences.
no code implementations • 18 Nov 2024 • Liying Chen, Ou Deng, Ting Fang, Mei Chen, Xvfeng Zhang, Ruichen Cong, Dingqi Lu, Runrun Zhang, Qun Jin, Xinchang Wang
The identification of key proteins and their causal relationships with flare-related clinical markers provides valuable insights for proactive SLE management and personalized therapeutic approaches.
1 code implementation • 14 Jul 2024 • Jianan Fan, Dongnan Liu, Canran Li, Hang Chang, Heng Huang, Filip Braet, Mei Chen, Weidong Cai
Cellular nuclei recognition serves as a fundamental and essential step in the workflow of digital pathology.
no code implementations • 1 Apr 2024 • Akshita Gupta, Gaurav Mittal, Ahmed Magooda, Ye Yu, Graham W. Taylor, Mei Chen
Temporal Action Localization (TAL) involves localizing and classifying action snippets in an untrimmed video.
no code implementations • CVPR 2024 • Jianan Fan, Dongnan Liu, Hang Chang, Heng Huang, Mei Chen, Weidong Cai
Machine learning holds tremendous promise for transforming the fundamental practice of scientific discovery by virtue of its data-driven nature.
1 code implementation • 17 Nov 2023 • Pranav Singh, Raviteja Chukkapalli, Shravan Chaudhari, Luoyao Chen, Mei Chen, Jinqian Pan, Craig Smuda, Jacopo Cirrone
Our study benchmarks these techniques on three distinct medical imaging datasets to evaluate their effectiveness in classification and segmentation tasks.
no code implementations • 26 Oct 2023 • Ahmed Magooda, Alec Helyar, Kyle Jackson, David Sullivan, Chad Atalla, Emily Sheng, Dan Vann, Richard Edgar, Hamid Palangi, Roman Lutz, Hongliang Kong, Vincent Yun, Eslam Kamal, Federico Zarfati, Hanna Wallach, Sarah Bird, Mei Chen
We present a framework for the automated measurement of responsible AI (RAI) metrics for large language models (LLMs) and associated products and services.
no code implementations • 21 Aug 2023 • Pranav Singh, Luoyao Chen, Mei Chen, Jinqian Pan, Raviteja Chukkapalli, Shravan Chaudhari, Jacopo Cirrone
In this paper, we present a deep-learning approach tailored for Medical image segmentation.
1 code implementation • ICCV 2023 • Jianan Fan, Dongnan Liu, Hang Chang, Heng Huang, Mei Chen, Weidong Cai
The success of automated medical image analysis depends on large-scale and expert-annotated training sets.
1 code implementation • 24 Jul 2023 • Christopher Clarke, Matthew Hall, Gaurav Mittal, Ye Yu, Sandra Sajeev, Jason Mars, Mei Chen
In this paper, we present Rule By Example (RBE): a novel exemplar-based contrastive learning approach for learning from logical rules for the task of textual content moderation.
no code implementations • 17 May 2023 • Jialin Yuan, Ye Yu, Gaurav Mittal, Matthew Hall, Sandra Sajeev, Mei Chen
There is a rapidly growing need for multimodal content moderation (CM) as more and more content on social media is multimodal in nature.
no code implementations • CVPR 2023 • Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen
To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, to approach WTAL from a localization-by-localization perspective by learning to localize the action snippets directly.
Weakly Supervised Action Localization
Weakly Supervised Temporal Action Localization
no code implementations • CVPR 2023 • Lan Wang, Gaurav Mittal, Sandra Sajeev, Ye Yu, Matthew Hall, Vishnu Naresh Boddeti, Mei Chen
We present ProTeGe as the first method to perform VTG-based untrimmed pretraining to bridge the gap between trimmed pretrained backbones and downstream VTG tasks.
no code implementations • 1 Aug 2022 • Ye Yu, Jialin Yuan, Gaurav Mittal, Li Fuxin, Mei Chen
It captures object motion in the video via a novel optical flow calibration module that fuses the segmentation mask with optical flow estimation to improve within-object optical flow smoothness and reduce noise at object boundaries.
Ranked #1 on
Video Object Segmentation
on DAVIS 2017 (test-dev)
(using extra training data)
no code implementations • CVPR 2022 • Junwen Chen, Gaurav Mittal, Ye Yu, Yu Kong, Mei Chen
We present GateHUB, Gated History Unit with Background Suppression, that comprises a novel position-guided gated cross-attention mechanism to enhance or suppress parts of the history as per how informative they are for current frame prediction.
Ranked #1 on
Online Action Detection
on TVSeries
no code implementations • 25 Oct 2021 • Yu Gong, Ye Yu, Gaurav Mittal, Greg Mori, Mei Chen
Importantly, we argue and empirically demonstrate that MUSE, compared to other feature discrepancy functions, is a more functional proxy to introduce dependency and effectively improve the expressivity of all features in the knowledge distillation framework.
no code implementations • ICCV 2021 • Jay Patravali, Gaurav Mittal, Ye Yu, Fuxin Li, Mei Chen
We present MetaUVFS as the first Unsupervised Meta-learning algorithm for Video Few-Shot action recognition.
no code implementations • 13 Sep 2021 • Tiange Xiang, Yang song, Chaoyi Zhang, Dongnan Liu, Mei Chen, Fan Zhang, Heng Huang, Lauren O'Donnell, Weidong Cai
With image-level labels only, patch-wise classification would be sub-optimal due to inconsistency between the patch appearance and image-level label.
no code implementations • 8 Jun 2021 • Pengfei Xie, YanShu Yin, JiaGen Hou, Mei Chen, LiXin Wang
Seismic inverse modeling is a common method in reservoir prediction and it plays a vital role in the exploration and development of oil and gas.
1 code implementation • ICLR 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, Nuno Vasconcelos
It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging.
1 code implementation • NeurIPS 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan
We propose a paradigm shift from fitting the whole architecture space using one strong predictor, to progressively fitting a search path towards the high-performance sub-space through a set of weaker predictors.
no code implementations • 1 Jan 2021 • Junru Wu, Xiyang Dai, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Ye Yu, Zhangyang Wang, Zicheng Liu, Mei Chen, Lu Yuan
Rather than expecting a single strong predictor to model the whole space, we seek a progressive line of weak predictors that can connect a path to the best architecture, thus greatly simplifying the learning task of each predictor.
1 code implementation • ACCV 2020 • Jedrzej Kozerawski, Victor Fragoso, Nikolaos Karianakis, Gaurav Mittal, Matthew Turk, Mei Chen
Unfortunately, this imbalance enables a visual recognition system to perform well on head classes but poorly on tail classes.
Ranked #57 on
Long-tail Learning
on ImageNet-LT
1 code implementation • 11 Sep 2020 • Dongnan Liu, Donghao Zhang, Yang song, Fan Zhang, Lauren O'Donnell, Heng Huang, Mei Chen, Weidong Cai
In this work, we present an unsupervised domain adaptation (UDA) method, named Panoptic Domain Adaptive Mask R-CNN (PDAM), for unsupervised instance segmentation in microscopy images.
no code implementations • 14 Jun 2020 • Armin Hadzic, Hunter Blanton, Weilian Song, Mei Chen, Scott Workman, Nathan Jacobs
Roadway free-flow speed captures the typical vehicle speed in low traffic conditions.
no code implementations • CVPR 2020 • Gaurav Mittal, Chang Liu, Nikolaos Karianakis, Victor Fragoso, Mei Chen, Yun Fu
To reduce HPO time, we present HyperSTAR (System for Task Aware Hyperparameter Recommendation), a task-aware method to warm-start HPO for deep neural networks.
1 code implementation • CVPR 2020 • Dongnan Liu, Donghao Zhang, Yang song, Fan Zhang, Lauren O'Donnell, Heng Huang, Mei Chen, Weidong Cai
More specifically, we first propose a nuclei inpainting mechanism to remove the auxiliary generated objects in the synthesized images.
no code implementations • 14 Dec 2019 • Heng Wang, Donghao Zhang, Yang song, Heng Huang, Mei Chen, Weidong Cai
Our contribution consists of the proposal of a significant task worth investigating and a naive baseline of solving it.
1 code implementation • 17 Jan 2019 • Weilian Song, Scott Workman, Armin Hadzic, Xu Zhang, Eric Green, Mei Chen, Reginald Souleyrette, Nathan Jacobs
An emerging approach for conducting such assessments in the United States is through the US Road Assessment Program (usRAP), which rates roads from highest risk (1 star) to lowest (5 stars).