no code implementations • 27 May 2025 • Tianhua Qi, Shiyan Wang, Cheng Lu, Tengfei Song, Hao Yang, Zhanglin Wu, Wenming Zheng
Experimental results demonstrate that PromptEVC outperforms state-of-the-art controllable EVC methods in emotion conversion, intensity control, mixed emotion synthesis, and prosody manipulation.
no code implementations • 3 Apr 2025 • Chu Han, Bingchao Zhao, Jiatai Lin, Shanshan Lyu, Longfei Wang, Tianpeng Deng, Cheng Lu, Changhong Liang, Hannah Y. Wen, Xiaojing Guo, Zhenwei Shi, Zaiyi Liu
Despite the impressive performance across a wide range of applications, current computational pathology models face significant diagnostic efficiency challenges due to their reliance on high-magnification whole-slide image analysis.
no code implementations • 3 Apr 2025 • Shaoyuan Xu, Cheng Lu, Mark Shaw, Peter Bauer, Jan P. Allebach
Digital copiers and printers are widely used nowadays.
no code implementations • 25 Oct 2024 • OpenAI, :, Aaron Hurst, Adam Lerer, Adam P. Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, Aleksander Mądry, Alex Baker-Whitcomb, Alex Beutel, Alex Borzunov, Alex Carney, Alex Chow, Alex Kirillov, Alex Nichol, Alex Paino, Alex Renzin, Alex Tachard Passos, Alexander Kirillov, Alexi Christakis, Alexis Conneau, Ali Kamali, Allan Jabri, Allison Moyer, Allison Tam, Amadou Crookes, Amin Tootoochian, Amin Tootoonchian, Ananya Kumar, Andrea Vallone, Andrej Karpathy, Andrew Braunstein, Andrew Cann, Andrew Codispoti, Andrew Galu, Andrew Kondrich, Andrew Tulloch, Andrey Mishchenko, Angela Baek, Angela Jiang, Antoine Pelisse, Antonia Woodford, Anuj Gosalia, Arka Dhar, Ashley Pantuliano, Avi Nayak, Avital Oliver, Barret Zoph, Behrooz Ghorbani, Ben Leimberger, Ben Rossen, Ben Sokolowsky, Ben Wang, Benjamin Zweig, Beth Hoover, Blake Samic, Bob McGrew, Bobby Spero, Bogo Giertler, Bowen Cheng, Brad Lightcap, Brandon Walkin, Brendan Quinn, Brian Guarraci, Brian Hsu, Bright Kellogg, Brydon Eastman, Camillo Lugaresi, Carroll Wainwright, Cary Bassin, Cary Hudson, Casey Chu, Chad Nelson, Chak Li, Chan Jun Shern, Channing Conger, Charlotte Barette, Chelsea Voss, Chen Ding, Cheng Lu, Chong Zhang, Chris Beaumont, Chris Hallacy, Chris Koch, Christian Gibson, Christina Kim, Christine Choi, Christine McLeavey, Christopher Hesse, Claudia Fischer, Clemens Winter, Coley Czarnecki, Colin Jarvis, Colin Wei, Constantin Koumouzelis, Dane Sherburn, Daniel Kappler, Daniel Levin, Daniel Levy, David Carr, David Farhi, David Mely, David Robinson, David Sasaki, Denny Jin, Dev Valladares, Dimitris Tsipras, Doug Li, Duc Phong Nguyen, Duncan Findlay, Edede Oiwoh, Edmund Wong, Ehsan Asdar, Elizabeth Proehl, Elizabeth Yang, Eric Antonow, Eric Kramer, Eric Peterson, Eric Sigler, Eric Wallace, Eugene Brevdo, Evan Mays, Farzad Khorasani, Felipe Petroski Such, Filippo Raso, Francis Zhang, Fred von Lohmann, Freddie Sulit, Gabriel Goh, Gene Oden, Geoff Salmon, Giulio Starace, Greg Brockman, Hadi Salman, Haiming Bao, Haitang Hu, Hannah Wong, Haoyu Wang, Heather Schmidt, Heather Whitney, Heewoo Jun, Hendrik Kirchner, Henrique Ponde de Oliveira Pinto, Hongyu Ren, Huiwen Chang, Hyung Won Chung, Ian Kivlichan, Ian O'Connell, Ian Osband, Ian Silber, Ian Sohl, Ibrahim Okuyucu, Ikai Lan, Ilya Kostrikov, Ilya Sutskever, Ingmar Kanitscheider, Ishaan Gulrajani, Jacob Coxon, Jacob Menick, Jakub Pachocki, James Aung, James Betker, James Crooks, James Lennon, Jamie Kiros, Jan Leike, Jane Park, Jason Kwon, Jason Phang, Jason Teplitz, Jason Wei, Jason Wolfe, Jay Chen, Jeff Harris, Jenia Varavva, Jessica Gan Lee, Jessica Shieh, Ji Lin, Jiahui Yu, Jiayi Weng, Jie Tang, Jieqi Yu, Joanne Jang, Joaquin Quinonero Candela, Joe Beutler, Joe Landers, Joel Parish, Johannes Heidecke, John Schulman, Jonathan Lachman, Jonathan McKay, Jonathan Uesato, Jonathan Ward, Jong Wook Kim, Joost Huizinga, Jordan Sitkin, Jos Kraaijeveld, Josh Gross, Josh Kaplan, Josh Snyder, Joshua Achiam, Joy Jiao, Joyce Lee, Juntang Zhuang, Justyn Harriman, Kai Fricke, Kai Hayashi, Karan Singhal, Katy Shi, Kavin Karthik, Kayla Wood, Kendra Rimbach, Kenny Hsu, Kenny Nguyen, Keren Gu-Lemberg, Kevin Button, Kevin Liu, Kiel Howe, Krithika Muthukumar, Kyle Luther, Lama Ahmad, Larry Kai, Lauren Itow, Lauren Workman, Leher Pathak, Leo Chen, Li Jing, Lia Guy, Liam Fedus, Liang Zhou, Lien Mamitsuka, Lilian Weng, Lindsay McCallum, Lindsey Held, Long Ouyang, Louis Feuvrier, Lu Zhang, Lukas Kondraciuk, Lukasz Kaiser, Luke Hewitt, Luke Metz, Lyric Doshi, Mada Aflak, Maddie Simens, Madelaine Boyd, Madeleine Thompson, Marat Dukhan, Mark Chen, Mark Gray, Mark Hudnall, Marvin Zhang, Marwan Aljubeh, Mateusz Litwin, Matthew Zeng, Max Johnson, Maya Shetty, Mayank Gupta, Meghan Shah, Mehmet Yatbaz, Meng Jia Yang, Mengchao Zhong, Mia Glaese, Mianna Chen, Michael Janner, Michael Lampe, Michael Petrov, Michael Wu, Michele Wang, Michelle Fradin, Michelle Pokrass, Miguel Castro, Miguel Oom Temudo de Castro, Mikhail Pavlov, Miles Brundage, Miles Wang, Minal Khan, Mira Murati, Mo Bavarian, Molly Lin, Murat Yesildal, Nacho Soto, Natalia Gimelshein, Natalie Cone, Natalie Staudacher, Natalie Summers, Natan LaFontaine, Neil Chowdhury, Nick Ryder, Nick Stathas, Nick Turley, Nik Tezak, Niko Felix, Nithanth Kudige, Nitish Keskar, Noah Deutsch, Noel Bundick, Nora Puckett, Ofir Nachum, Ola Okelola, Oleg Boiko, Oleg Murk, Oliver Jaffe, Olivia Watkins, Olivier Godement, Owen Campbell-Moore, Patrick Chao, Paul McMillan, Pavel Belov, Peng Su, Peter Bak, Peter Bakkum, Peter Deng, Peter Dolan, Peter Hoeschele, Peter Welinder, Phil Tillet, Philip Pronin, Philippe Tillet, Prafulla Dhariwal, Qiming Yuan, Rachel Dias, Rachel Lim, Rahul Arora, Rajan Troll, Randall Lin, Rapha Gontijo Lopes, Raul Puri, Reah Miyara, Reimar Leike, Renaud Gaubert, Reza Zamani, Ricky Wang, Rob Donnelly, Rob Honsby, Rocky Smith, Rohan Sahai, Rohit Ramchandani, Romain Huet, Rory Carmichael, Rowan Zellers, Roy Chen, Ruby Chen, Ruslan Nigmatullin, Ryan Cheu, Saachi Jain, Sam Altman, Sam Schoenholz, Sam Toizer, Samuel Miserendino, Sandhini Agarwal, Sara Culver, Scott Ethersmith, Scott Gray, Sean Grove, Sean Metzger, Shamez Hermani, Shantanu Jain, Shengjia Zhao, Sherwin Wu, Shino Jomoto, Shirong Wu, Shuaiqi, Xia, Sonia Phene, Spencer Papay, Srinivas Narayanan, Steve Coffey, Steve Lee, Stewart Hall, Suchir Balaji, Tal Broda, Tal Stramer, Tao Xu, Tarun Gogineni, Taya Christianson, Ted Sanders, Tejal Patwardhan, Thomas Cunninghman, Thomas Degry, Thomas Dimson, Thomas Raoux, Thomas Shadwell, Tianhao Zheng, Todd Underwood, Todor Markov, Toki Sherbakov, Tom Rubin, Tom Stasi, Tomer Kaftan, Tristan Heywood, Troy Peterson, Tyce Walters, Tyna Eloundou, Valerie Qi, Veit Moeller, Vinnie Monaco, Vishal Kuo, Vlad Fomenko, Wayne Chang, Weiyi Zheng, Wenda Zhou, Wesam Manassra, Will Sheu, Wojciech Zaremba, Yash Patil, Yilei Qian, Yongjik Kim, Youlong Cheng, Yu Zhang, Yuchen He, Yuchen Zhang, Yujia Jin, Yunxing Dai, Yury Malkov
We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.
Ranked #1 on
Visual Question Answering (VQA)
on VLM2-Bench
1 code implementation • 14 Oct 2024 • Cheng Lu, Yang song
Consistency models (CMs) are a powerful class of diffusion-based generative models optimized for fast sampling.
no code implementations • 20 Jul 2024 • Tianhua Qi, Shiyan Wang, Cheng Lu, Yan Zhao, Yuan Zong, Wenming Zheng
Realistic emotional voice conversion (EVC) aims to enhance emotional diversity of converted audios, making the synthesized voices more authentic and natural.
no code implementations • 17 Jul 2024 • Sunan Li, Hailun Lian, Cheng Lu, Yan Zhao, Tianhua Qi, Hao Yang, Yuan Zong, Wenming Zheng
The emotion recognition has attracted more attention in recent decades.
1 code implementation • 1 Apr 2024 • Cheng Lu, Jiusun Zeng, Yu Xia, Jinhui Cai, Shihua Luo
Most existing accelerated estimation methods have to compromise on estimation accuracy with efficiency.
Explainable artificial intelligence
Explainable Artificial Intelligence (XAI)
no code implementations • 3 Mar 2024 • Tianhua Qi, Wenming Zheng, Cheng Lu, Yuan Zong, Hailun Lian
In this paper, we propose Prosody-aware VITS (PAVITS) for emotional voice conversion (EVC), aiming to achieve two major objectives of EVC: high content naturalness and high emotional naturalness, which are crucial for meeting the demands of human perception.
2 code implementations • 1 Feb 2024 • Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang
This leads to the same mode-seeking solution, while enables efficient optimization by circumventing the complexities of RL.
no code implementations • 30 Jan 2024 • Chen Bai, Zeman Shao, Guoxiang Zhang, Di Liang, Jie Yang, Zhuorui Zhang, Yujian Guo, Chengzhang Zhong, Yiqiao Qiu, Zhendong Wang, Yichen Guan, Xiaoyin Zheng, Tao Wang, Cheng Lu
Our proposed general framework encompasses three key processes: 1) integrating a realistic object into a given scene video with proper placement to ensure geometric realism; 2) estimating the sky and environmental lighting distribution and simulating realistic shadows to enhance the light realism; 3) employing a style transfer network that refines the final video output to maximize photorealism.
no code implementations • 19 Jan 2024 • Yong Wang, Cheng Lu, Hailun Lian, Yan Zhao, Björn Schuller, Yuan Zong, Wenming Zheng
These segment-level patches are then encoded using a stack of Swin blocks, in which a local window Transformer is utilized to explore local inter-frame emotional information across frame patches of each segment patch.
no code implementations • 18 Jan 2024 • Cheng Lu, Yuan Zong, Hailun Lian, Yan Zhao, Björn Schuller, Wenming Zheng
In speaker-independent speech emotion recognition, the training and testing samples are collected from diverse speakers, leading to a multi-domain shift challenge across the feature distributions of data from different speakers.
no code implementations • 2 Nov 2023 • Shen Nie, Hanzhong Allan Guo, Cheng Lu, Yuhao Zhou, Chenyu Zheng, Chongxuan Li
We present a unified probabilistic formulation for diffusion-based image editing, where a latent variable is edited in a task-specific manner and generally deviates from the corresponding marginal distribution induced by the original stochastic or ordinary differential equation (SDE or ODE).
2 code implementations • NeurIPS 2023 • Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu
In this work, we propose a novel formulation towards the optimal parameterization during sampling that minimizes the first-order discretization error of the ODE solution.
1 code implementation • 11 Oct 2023 • Huayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu
Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies.
no code implementations • 7 Oct 2023 • Jie Zhu, Yuan Zong, Jingang Shi, Cheng Lu, Hongli Chang, Wenming Zheng
This paper focuses on the research of micro-expression recognition (MER) and proposes a flexible and reliable deep learning method called learning to rank onset-occurring-offset representations (LTR3O).
1 code implementation • 28 May 2023 • Zihan Chen, Lei Nico Zheng, Cheng Lu, Jialu Yuan, Di Zhu
However, its potential for inferring dynamic network structures from temporal textual data, specifically financial news, remains an unexplored frontier.
2 code implementations • NeurIPS 2023 • Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
In comparison, VSD works well with various CFG weights as ancestral sampling from diffusion models and simultaneously improves the diversity and sample quality with a common CFG weight (i. e., $7. 5$).
1 code implementation • 6 May 2023 • Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu
The probability flow ordinary differential equation (ODE) of diffusion models (i. e., diffusion ODEs) is a particular case of continuous normalizing flows (CNFs), which enables deterministic inference and exact likelihood evaluation.
Ranked #1 on
Density Estimation
on CIFAR-10
3 code implementations • 25 Apr 2023 • Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu
The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate.
1 code implementation • NeurIPS 2023 • Tianyu Pang, Cheng Lu, Chao Du, Min Lin, Shuicheng Yan, Zhijie Deng
In this work, we observe that the stochastic reverse process of data scores is a martingale, from which concentration bounds and the optional stopping theorem for data scores can be derived.
no code implementations • 25 Nov 2022 • Cheng Lyu, Jiake Xie, Bo Xu, Cheng Lu, Han Huang, Xin Huang, Ming Wu, Chuang Zhang, Yong Tang
Performance of trimap-free image matting methods is limited when trying to decouple the deterministic and undetermined regions, especially in the scenes where foregrounds are semantically ambiguous, chromaless, or high transmittance.
3 code implementations • 2 Nov 2022 • Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
The commonly-used fast sampler for guided sampling is DDIM, a first-order diffusion ODE solver that generally needs 100 to 250 steps for high-quality samples.
no code implementations • 22 Oct 2022 • Cheng Lu, Wenming Zheng, Hailun Lian, Yuan Zong, Chuangao Tang, Sunan Li, Yan Zhao
The F-Encoder and T-Encoder model the correlations within frequency bands and time frames, respectively, and they are embedded into a time-frequency joint learning strategy to obtain the time-frequency patterns for speech emotions.
1 code implementation • 29 Sep 2022 • Huayu Chen, Cheng Lu, Chengyang Ying, Hang Su, Jun Zhu
To address this problem, we adopt a generative approach by decoupling the learned policy into two parts: an expressive generative behavior model and an action evaluation model.
1 code implementation • 26 Aug 2022 • Tao Sun, Cheng Lu, Haibin Ling
We show that this strategy is more efficient and better correlated with the objective of boosting prediction confidence than adversarial training on input images or intermediate features, as used in previous works.
1 code implementation • ICCV 2023 • Tao Sun, Cheng Lu, Haibin Ling
In this paper, we propose a Local context-aware ADA framework, named LADA, to address this issue.
1 code implementation • 18 Jul 2022 • Tao Sun, Cheng Lu, Haibin Ling
We propose a general rectification module that uses such prior knowledge to refine model generated pseudo labels.
no code implementations • 15 Jul 2022 • Jianwei Lin, Jiatai Lin, Cheng Lu, Hao Chen, Huan Lin, Bingchao Zhao, Zhenwei Shi, Bingjiang Qiu, Xipeng Pan, Zeyan Xu, Biao Huang, Changhong Liang, Guoqiang Han, Zaiyi Liu, Chu Han
To bridge the gap between Transformer and CNN features, we propose a Trans&CNN Feature Calibration block (TCFC) in the decoder.
1 code implementation • 6 Jul 2022 • Runyu Mao, Chen Bai, Yatong An, Fengqing Zhu, Cheng Lu
To the best of our knowledge, 3DG-STFM is the first student-teacher learning method for the local feature matching task.
1 code implementation • 16 Jun 2022 • Cheng Lu, Kaiwen Zheng, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
To fill up this gap, we show that the negative likelihood of the ODE can be bounded by controlling the first, second, and third-order score matching errors; and we further present a novel high-order denoising score matching method to enable maximum likelihood training of score-based diffusion ODEs.
4 code implementations • 2 Jun 2022 • Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
In this work, we propose an exact formulation of the solution of diffusion ODEs.
no code implementations • 20 Apr 2022 • Bo Xu, Jiake Xie, Han Huang, Ziwen Li, Cheng Lu, Yong Tang, Yandong Guo
In this paper, we propose a Situational Perception Guided Image Matting (SPG-IM) method that mitigates subjective bias of matting annotations and captures sufficient situational perception information for better global saliency distilled from the visual-to-textual task.
1 code implementation • CVPR 2022 • Tao Sun, Cheng Lu, Tianshuo Zhang, Haibin Ling
Unsupervised Domain Adaptation (UDA) aims to leverage a label-rich source domain to solve tasks on a related unlabeled target domain.
Ranked #4 on
Domain Adaptation
on Office-31
no code implementations • 13 Apr 2022 • Chu Han, Xipeng Pan, Lixu Yan, Huan Lin, Bingbing Li, Su Yao, Shanshan Lv, Zhenwei Shi, Jinhai Mai, Jiatai Lin, Bingchao Zhao, Zeyan Xu, Zhizhen Wang, Yumeng Wang, Yuan Zhang, Huihui Wang, Chao Zhu, Chunhui Lin, Lijian Mao, Min Wu, Luwen Duan, Jingsong Zhu, Dong Hu, Zijie Fang, Yang Chen, Yongbing Zhang, Yi Li, Yiwen Zou, Yiduo Yu, Xiaomeng Li, Haiming Li, Yanfen Cui, Guoqiang Han, Yan Xu, Jun Xu, Huihua Yang, Chunming Li, Zhenbing Liu, Cheng Lu, Xin Chen, Changhong Liang, Qingling Zhang, Zaiyi Liu
According to the technical reports of the top-tier teams, CAM is still the most popular approach in WSSS.
Data Augmentation
Weakly supervised Semantic Segmentation
+1
no code implementations • 8 Mar 2022 • Bo Xu, Guanze Liu, Han Huang, Cheng Lu, Yandong Guo
Most existing CNN-based salient object detection methods can identify local segmentation details like hair and animal fur, but often misinterpret the real saliency due to the lack of global contextual information caused by the subjectiveness of the SOD task and the locality of convolution layers.
no code implementations • 28 Jan 2022 • Changwei Xu, Jianfei Yang, Haoran Tang, Han Zou, Cheng Lu, Tianshuo Zhang
Unsupervised Domain Adaptation (UDA), a branch of transfer learning where labels for target samples are unavailable, has been widely researched and developed in recent years with the help of adversarially trained models.
no code implementations • 22 Oct 2021 • Ziwen Li, Bo Xu, Han Huang, Cheng Lu, Yandong Guo
In this paper, we propose a new framework Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation (DTS-VIBE), to generate 3D human pose and mesh from RGB videos.
Ranked #53 on
3D Human Pose Estimation
on 3DPW
1 code implementation • ICCV 2021 • Bo Xu, Han Huang, Cheng Lu, Ziwen Li, Yandong Guo
In this paper, we propose a Virtual Multi-modality Foreground Matting (VMFM) method to learn human-object interactive foreground (human and objects interacted with him or her) from a raw RGB image.
1 code implementation • ICLR 2021 • Cheng Lu, Jianfei Chen, Chongxuan Li, Qiuhao Wang, Jun Zhu
Through theoretical analysis, we show that the function space of ImpFlow is strictly richer than that of ResFlows.
no code implementations • 13 Aug 2020 • Xingxun Jiang, Yuan Zong, Wenming Zheng, Chuangao Tang, Wanchuang Xia, Cheng Lu, Jiateng Liu
Experimental results show that DFEW is a well-designed and challenging database, and the proposed EC-STFL can promisingly improve the performance of existing spatiotemporal deep neural networks in coping with the problem of dynamic FER in the wild.
Ranked #23 on
Dynamic Facial Expression Recognition
on DFEW
Dynamic Facial Expression Recognition
Facial Expression Recognition
+1
2 code implementations • CVPR 2020 • Bo Xu, Cheng Lu, Yandong Guo, Jacob Wang
Vision is often used as a complementary modality for audio speech recognition (ASR), especially in the noisy environment where performance of solo audio modality significantly deteriorates.
Ranked #11 on
Audio-Visual Speech Recognition
on LRS3-TED
(using extra training data)
no code implementations • 7 Apr 2020 • Zhecan Wang, Jian Zhao, Cheng Lu, Han Huang, Fan Yang, Lianji Li, Yandong Guo
To better demonstrate the advantage of our methods, we further propose a new benchmark dataset with the most rich distribution of head-gaze combination reflecting real-world scenarios.
1 code implementation • ICML 2020 • Jianfei Chen, Cheng Lu, Biqi Chenli, Jun Zhu, Tian Tian
Generative flows are promising tractable models for density modeling that define probabilistic distributions with invertible transformations.
1 code implementation • 8 Dec 2019 • Fan Yang, Cheng Lu, Yandong Guo, Longin Jan Latecki, Haibin Ling
Feature pyramid architecture has been broadly adopted in object detection and segmentation to deal with multi-scale problem.
1 code implementation • NeurIPS 2019 • Andrey Kolobov, Yuval Peres, Cheng Lu, Eric J. Horvitz
From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e. g., web pages).
no code implementations • CVPR 2017 • Yandong Guo, Cheng Lu, Jan P. Allebach, Charles A. Bouman
Experimental results with a variety of document images demonstrate that our method improves the image quality compared with the observed image, and simultaneously improves the compression ratio.