1 code implementation • 10 Feb 2025 • Haiduo Huang, Fuwei Yang, Zhenhua Liu, Yixing Xu, Jinze Li, Yang Liu, Xuanwu Yin, Dong Li, Pengju Ren, Emad Barsoum
Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to predict multiple tokens, which are then verified in parallel by the larger target model.
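To make the draft-then-verify loop concrete, here is a minimal greedy sketch of generic speculative decoding, not this paper's specific method; `draft_model`, `target_model`, and `gamma` are illustrative names, and HuggingFace-style causal LMs with a `.logits` output and batch size 1 are assumed.

```python
# Minimal greedy draft-then-verify sketch of speculative decoding (batch size 1).
# A real implementation also handles sampled decoding via rejection sampling,
# appends a bonus token when all drafts are accepted, and reuses KV caches.
import torch

@torch.no_grad()
def speculative_step(draft_model, target_model, input_ids, gamma=4):
    # 1) Draft: the small model proposes gamma tokens autoregressively.
    draft_ids = input_ids
    for _ in range(gamma):
        logits = draft_model(draft_ids).logits[:, -1, :]
        next_tok = logits.argmax(dim=-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, next_tok], dim=-1)

    # 2) Verify: the large model scores all proposed positions in one pass.
    target_logits = target_model(draft_ids).logits
    n_prompt = input_ids.shape[1]
    accepted = input_ids
    for i in range(gamma):
        pos = n_prompt + i
        target_tok = target_logits[:, pos - 1, :].argmax(dim=-1, keepdim=True)
        accepted = torch.cat([accepted, target_tok], dim=-1)
        if not torch.equal(target_tok, draft_ids[:, pos:pos + 1]):
            break  # first disagreement: keep the target's correction and stop
    return accepted
```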
no code implementations • 8 Jan 2025 • Zhi-Lin Huang, Yixuan Liu, Chujun Qin, Zhongdao Wang, Dong Zhou, Dong Li, Emad Barsoum
In this paper, we propose a novel Image-guided Video Editing Diffusion model, termed IVEDiff, for image-guided video editing.
no code implementations • 8 Jan 2025 • Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Zicheng Liu, Emad Barsoum
Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results.
no code implementations • 2 Jan 2025 • Yixing Xu, Shivank Nag, Dong Li, Lu Tian, Emad Barsoum
Sliding window attention (SWA) solves this problem by restricting the attention range to a fixed-size local context window.
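As a small illustration of the local-window restriction, the sketch below builds a causal banded attention mask; the window size and function name are illustrative, not the paper's API.

```python
# Sliding-window (banded) attention mask: token i may attend only to tokens j
# with i - window < j <= i, i.e. a causal, fixed-size local context.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)     # query positions
    j = torch.arange(seq_len).unsqueeze(0)     # key positions
    allowed = (j <= i) & (j > i - window)      # causal + local band
    # Additive mask: 0 where attention is allowed, -inf where it is blocked.
    return torch.where(allowed, torch.tensor(0.0), torch.tensor(float("-inf")))

mask = sliding_window_mask(seq_len=8, window=3)
scores = torch.randn(8, 8) + mask              # add to attention scores before softmax
attn = scores.softmax(dim=-1)
```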
2 code implementations • 27 Dec 2024 • Xiaomin Li, Yixuan Liu, Takashi Isobe, Xu Jia, Qinpeng Cui, Dong Zhou, Dong Li, You He, Huchuan Lu, Zhongdao Wang, Emad Barsoum
In text-to-image (T2I) generation applications, negative embeddings have proven to be a simple yet effective approach for enhancing generation quality.
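For context, negative embeddings typically enter a diffusion sampler through classifier-free guidance, where the negative embedding stands in for the unconditional branch. The sketch below shows that generic mechanism only, not this paper's contribution; `unet` and the embedding arguments are placeholders.

```python
# Generic classifier-free guidance step with a negative embedding replacing the
# unconditional branch. `unet(x_t, t, emb)` is assumed to return a noise prediction.
def guided_noise_pred(unet, x_t, t, cond_emb, neg_emb, guidance_scale=7.5):
    eps_cond = unet(x_t, t, cond_emb)   # prediction conditioned on the prompt
    eps_neg = unet(x_t, t, neg_emb)     # prediction conditioned on the negative embedding
    # Push the result away from what the negative embedding describes.
    return eps_neg + guidance_scale * (eps_cond - eps_neg)
```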
1 code implementation • 20 Dec 2024 • Yixiong Huo, Guangfeng Jiang, Hongyang Wei, Ji Liu, Song Zhang, Han Liu, Xingliang Huang, Mingjie Lu, Jinzhang Peng, Dong Li, Lu Tian, Emad Barsoum
To address these issues, we propose EGSRAL, a 3D GS-based method that relies solely on training images without extra annotations.
no code implementations • 16 Dec 2024 • Zekai Li, Jintu Zheng, Ji Liu, Han Liu, Haowei Zhu, Zeping Li, Fuwei Yang, Haiduo Huang, Jinzhang Peng, Dong Li, Lu Tian, Emad Barsoum
To address these issues, we propose a fine-grained token-wise pruning approach for LLMs, which uses a learnable router to adaptively identify less important tokens and skip them across model blocks, reducing computational cost during inference.
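One plausible shape for such a router is a small linear scorer that keeps the top-k tokens for a block and lets the rest bypass it; the sketch below is an assumption about the mechanism, not the paper's implementation, and training the scorer end to end would additionally need a differentiable selection (e.g. straight-through or Gumbel top-k).

```python
# Sketch of token-wise routing: score tokens, run only the top-k through the
# block, and let the remaining tokens skip it unchanged.
import torch
import torch.nn as nn

class TokenRouter(nn.Module):
    def __init__(self, d_model: int, keep_ratio: float = 0.7):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)   # learnable importance score per token
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor, block: nn.Module) -> torch.Tensor:
        # x: [batch, seq, d_model]
        scores = self.scorer(x).squeeze(-1)                     # [batch, seq]
        k = max(1, int(x.shape[1] * self.keep_ratio))
        keep_idx = scores.topk(k, dim=1).indices                # tokens worth computing
        idx = keep_idx.unsqueeze(-1).expand(-1, -1, x.shape[-1])
        kept = torch.gather(x, 1, idx)                          # [batch, k, d_model]
        updated = block(kept)                                   # heavy computation on kept tokens only
        out = x.clone()                                         # skipped tokens pass through
        out.scatter_(1, idx, updated)
        return out
```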
1 code implementation • 14 Dec 2024 • Hao Chen, Ze Wang, Xiang Li, Ximeng Sun, Fangyi Chen, Jiang Liu, Jindong Wang, Bhiksha Raj, Zicheng Liu, Emad Barsoum
With its fully differentiable design and semantically rich latent space, our experiments demonstrate that SoftVQ-VAE achieves efficient tokenization without compromising generation quality, paving the way for more efficient generative models.
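The "soft" quantization idea can be illustrated as a softmax-weighted mixture of codebook entries in place of a hard nearest-neighbor lookup, which keeps the operation fully differentiable. This is a generic reading of soft vector quantization, not the paper's exact layer; all names and hyperparameters below are illustrative.

```python
# Soft (differentiable) vector quantization sketch: each latent is replaced by a
# softmax-weighted combination of codebook vectors instead of its nearest entry.
import torch
import torch.nn as nn

class SoftQuantizer(nn.Module):
    def __init__(self, num_codes: int = 512, dim: int = 64, temperature: float = 0.5):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, dim))
        self.temperature = temperature

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: [batch, tokens, dim]; distances to every codebook entry.
        codes = self.codebook.unsqueeze(0).expand(z.shape[0], -1, -1)
        d = torch.cdist(z, codes)                                # [batch, tokens, num_codes]
        weights = torch.softmax(-d / self.temperature, dim=-1)   # soft assignment
        return weights @ self.codebook                           # differentiable lookup
```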
no code implementations • 10 Dec 2024 • Mingjie Lu, Yuanxian Huang, Ji Liu, Xingliang Huang, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum
To address this problem, we analyze the bottleneck in Occupancy Network inference cost and present a simple and fast Occupancy Network model, which adopts a deformable 2D convolutional layer to lift BEV features to 3D voxel features and an efficient voxel feature pyramid network (FPN) module to improve performance at little computational cost.
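A simplified stand-in for the lifting step is shown below: a plain 2D convolution predicts Z·C channels per BEV cell and reshapes them into a voxel grid. The paper uses a deformable 2D convolution, which this sketch deliberately omits; shapes and names are assumptions.

```python
# Simplified BEV-to-voxel lifting: a 2D conv expands BEV channels into
# voxel_channels * num_z channels, reshaped into a [C, Z, H, W] volume.
import torch
import torch.nn as nn

class BEVLift(nn.Module):
    def __init__(self, bev_channels: int = 256, voxel_channels: int = 32, num_z: int = 16):
        super().__init__()
        self.num_z, self.voxel_channels = num_z, voxel_channels
        self.lift = nn.Conv2d(bev_channels, voxel_channels * num_z, kernel_size=3, padding=1)

    def forward(self, bev: torch.Tensor) -> torch.Tensor:
        # bev: [B, bev_channels, H, W]  ->  voxels: [B, voxel_channels, Z, H, W]
        b, _, h, w = bev.shape
        return self.lift(bev).view(b, self.voxel_channels, self.num_z, h, w)
```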
no code implementations • 22 Oct 2024 • Haowei Zhu, Dehua Tang, Ji Liu, Mingjie Lu, Jintu Zheng, Jinzhang Peng, Dong Li, Yu Wang, Fan Jiang, Lu Tian, Spandan Tiwari, Ashish Sirasao, Jun-Hai Yong, Bin Wang, Emad Barsoum
Finally, our method can identify an optimal SubNet through few-step gradient optimization and a simple post-processing procedure.
2 code implementations • 26 Sep 2024 • Qinpeng Cui, Yixuan Liu, Xinyi Zhang, Qiqi Bao, Qingmin Liao, Li Wang, Tian Lu, Zicheng Liu, Zhongdao Wang, Emad Barsoum
In this paper, we present DoSSR, a Domain Shift diffusion-based SR model that capitalizes on the generative powers of pretrained diffusion models while significantly enhancing efficiency by initiating the diffusion process with low-resolution (LR) images.
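Starting the diffusion process from the LR image resembles partial-noising samplers: rather than beginning from pure noise, the reverse process starts at an intermediate timestep from a noised, upsampled LR input. The sketch below illustrates only that general idea under standard DDPM notation; it is not DoSSR's specific formulation.

```python
# Generic sketch: initialize diffusion from a low-resolution image instead of
# pure noise, by forward-noising the upsampled LR input to timestep start_t.
import torch
import torch.nn.functional as F

def init_from_lr(lr_image: torch.Tensor, alphas_cumprod: torch.Tensor, start_t: int) -> torch.Tensor:
    # lr_image: [B, C, h, w]; upsample to the target resolution first.
    x0 = F.interpolate(lr_image, scale_factor=4, mode="bicubic")
    a_bar = alphas_cumprod[start_t]
    noise = torch.randn_like(x0)
    # x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps; only the remaining
    # reverse steps from start_t need to be run afterwards.
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
```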
no code implementations • 20 Aug 2024 • Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum
Next, we reconstruct a dense model featuring a pruning-friendly weight distribution by reactivating pruned connections with sparse regularization.
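One plausible reading of "sparse regularization" here is an L1 penalty applied while the previously pruned weights are trainable again, keeping the dense model's distribution pruning-friendly; the penalty form and hyperparameter below are assumptions, not the paper's exact recipe.

```python
# Sketch: retrain a dense model while an L1 penalty pushes weight matrices back
# toward a near-sparse, pruning-friendly distribution.
import torch

def regularized_loss(model: torch.nn.Module, task_loss: torch.Tensor,
                     lambda_l1: float = 1e-5) -> torch.Tensor:
    # Penalize only weight matrices (dim > 1), leaving biases and norms alone.
    l1 = sum(p.abs().sum() for p in model.parameters() if p.dim() > 1)
    return task_loss + lambda_l1 * l1
```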
no code implementations • 19 Jun 2024 • Zeping Li, Xinlong Yang, Ziheng Gao, Ji Liu, Guanchen Li, Zhuang Liu, Dong Li, Jinzhang Peng, Lu Tian, Emad Barsoum
On MT-Bench, Amphista delivers up to 2.75$\times$ speedup over vanilla autoregressive decoding and 1.40$\times$ over Medusa on Vicuna 33B in wall-clock time.
no code implementations • 11 Jun 2024 • Tianqi Chen, Zhe Li, Weixiang Xu, Zeyu Zhu, Dong Li, Lu Tian, Emad Barsoum, Peisong Wang, Jian Cheng
The proposed OFF can incorporate semantic information and is insensitive to outliers.
no code implementations • 17 Apr 2024 • Tong Shen, Dong Li, Ziheng Gao, Lu Tian, Emad Barsoum
Video Frame Interpolation (VFI) is a crucial technique in applications such as slow-motion generation, frame rate conversion, and video frame restoration.
no code implementations • 11 Apr 2024 • Ji Liu, Zifeng Zhang, Mingjie Lu, Hongyang Wei, Dong Li, Yile Xie, Jinzhang Peng, Lu Tian, Ashish Sirasao, Emad Barsoum
Our analysis shows that dense anchors are not necessary for lane detection, and we propose a transformer-based lane detection framework built on a sparse anchor mechanism.
no code implementations • 31 Dec 2020 • Emad Barsoum, John Kender, Zicheng Liu
Our model learns to predict multiple future sequences of human poses from the same input sequence.
no code implementations • 4 Jun 2020 • Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum
This paper introduces a novel method to combine gradients called Adasum (for adaptive sum) that converges faster than prior work.
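As a rough sketch, an "adaptive sum" of two gradients can subtract part of the component each gradient shares with the other before adding, so nearly parallel gradients are not double-counted. The coefficients below follow my recollection of the Adasum rule and should be checked against the paper; they are not quoted from it.

```python
# Hedged sketch of an adaptive pairwise gradient sum: each gradient is added
# after removing half of its projection onto the other.
import torch

def adasum_pair(g1: torch.Tensor, g2: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    dot = torch.dot(g1.flatten(), g2.flatten())
    return (1 - dot / (2 * g1.pow(2).sum() + eps)) * g1 + \
           (1 - dot / (2 * g2.pow(2).sum() + eps)) * g2
```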
no code implementations • 1 Oct 2019 • Wenlei Bao, Li-Wen Chang, Yang Chen, Ke Deng, Amit Agarwal, Emad Barsoum, Abe Taha
Various approaches have been developed that leverage techniques such as vectorization and memory layout optimization to improve the performance of integer GEMM.
no code implementations • 20 May 2018 • Weitang Liu, Emad Barsoum, John D. Owens
Our model learns and derives the coordinates of the digits better than its convolutional counterpart, which lacks a routing-by-agreement algorithm, and also performs well when tested on the multi-digit Moving MNIST and KTH datasets.
3 code implementations • 27 Nov 2017 • Emad Barsoum, John Kender, Zicheng Liu
Our model, which we call HP-GAN, learns a probability density function of future human poses conditioned on previous poses.
Ranked #7 on Human Pose Forecasting on Human3.6M (APD metric)
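Because HP-GAN models a distribution over futures, sampling several plausible continuations of one observed sequence amounts to drawing different noise vectors for a conditional generator. The sketch below is schematic, with placeholder shapes and names rather than the paper's interfaces.

```python
# Schematic use of a conditional pose generator: the same observed sequence with
# different latent vectors z yields multiple plausible future pose sequences.
import torch

def sample_futures(generator, observed_poses: torch.Tensor,
                   num_samples: int = 5, z_dim: int = 128) -> torch.Tensor:
    # observed_poses: [1, T_obs, num_joints * 3]
    futures = []
    for _ in range(num_samples):
        z = torch.randn(1, z_dim)                      # new latent per sample
        futures.append(generator(observed_poses, z))   # [1, T_future, num_joints * 3]
    return torch.cat(futures, dim=0)
```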
7 code implementations • 3 Aug 2016 • Emad Barsoum, Cha Zhang, Cristian Canton Ferrer, Zhengyou Zhang
Crowdsourcing has become a widely adopted scheme to collect ground truth labels.
Facial Expression Recognition (FER)
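One standard way to use multiple crowd labels per image is to train against the annotators' label distribution with cross-entropy rather than a single majority-vote class; the sketch below assumes that setup and uses illustrative names, it is not necessarily the scheme adopted in the paper.

```python
# Sketch: cross-entropy against a soft label distribution built from multiple
# crowd annotations per image, instead of a single hard majority-vote label.
import torch
import torch.nn.functional as F

def soft_label_loss(logits: torch.Tensor, vote_counts: torch.Tensor) -> torch.Tensor:
    # logits: [batch, num_classes]; vote_counts: [batch, num_classes] raw tallies.
    target = vote_counts / vote_counts.sum(dim=1, keepdim=True)   # annotator distribution
    log_probs = F.log_softmax(logits, dim=1)
    return -(target * log_probs).sum(dim=1).mean()
```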
no code implementations • 21 Apr 2016 • Emad Barsoum
In this paper, we review recent progress in hand pose estimation from depth sensors.