no code implementations • 25 Nov 2024 • Zun Wang, Jialu Li, Han Lin, Jaehong Yoon, Mohit Bansal
To address these challenges, we propose DreamRunner, a novel story-to-video generation method. First, we structure the input script using a large language model (LLM) to facilitate both coarse-grained scene planning and fine-grained object-level layout and motion planning.
no code implementations • 4 Oct 2024 • Han Lin, Tushar Nagarajan, Nicolas Ballas, Mido Assran, Mojtaba Komeili, Mohit Bansal, Koustuv Sinha
In this work, we show that a strong off-the-shelf frozen pretrained visual encoder, along with a well-designed prediction model, can achieve state-of-the-art (SoTA) performance in forecasting and procedural planning without pretraining the prediction model or requiring additional supervision from language or ASR.
1 code implementation • 22 Jun 2024 • Krzysztof Choromanski, Arijit Sehanobish, Somnath Basu Roy Chowdhury, Han Lin, Avinava Dubey, Tamas Sarlos, Snigdha Chaturvedi
We present a new class of fast polylog-linear algorithms based on the theory of structured matrices (in particular low displacement rank) for integrating tensor fields defined on weighted trees.
1 code implementation • 24 Apr 2024 • Kaining Ying, Fanqing Meng, Jin Wang, Zhiqian Li, Han Lin, Yue Yang, Hao Zhang, Wenbo Zhang, Yuqi Lin, Shuo Liu, Jiayi Lei, Quanfeng Lu, Runjian Chen, Peng Xu, Renrui Zhang, Haozhe Zhang, Peng Gao, Yali Wang, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao
Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation.
1 code implementation • 19 Apr 2024 • Zeyu Ling, Bo Han, Yongkang Wongkan, Han Lin, Mohan Kankanhalli, Weidong Geng
Conditional human motion synthesis (HMS) aims to generate human motion sequences that conform to specific conditions.
Ranked #7 on Motion Synthesis on HumanML3D
no code implementations • 15 Apr 2024 • Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal
ControlNets are widely used for adding spatial control to text-to-image diffusion models with different conditions, such as depth maps, scribbles/sketches, and human poses.
no code implementations • 18 Mar 2024 • Abhay Zala, Jaemin Cho, Han Lin, Jaehong Yoon, Mohit Bansal
Then, we enable the LLM to continuously adapt the generated environments to progressively improve the skills that the agent is weak at, by providing feedback to the LLM in the form of the agent's performance.
no code implementations • 18 Oct 2023 • Abhay Zala, Han Lin, Jaemin Cho, Mohit Bansal
In the second stage, we use a diagram generator, DiagramGLIGEN, and a text label rendering module to generate diagrams (with clear text labels) following the diagram plans.
no code implementations • 26 Sep 2023 • Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal
Our experiments demonstrate that our proposed VideoDirectorGPT framework substantially improves layout and movement control in both single- and multi-scene video generation and can generate multi-scene videos with consistency, while achieving competitive performance with SOTAs in open-domain single-scene T2V generation.
1 code implementation • CVPR 2023 • Han Lin, Guangxing Han, Jiawei Ma, Shiyuan Huang, Xudong Lin, Shih-Fu Chang
Vision Transformers (ViTs) have emerged to achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local features.
1 code implementation • 2 Feb 2023 • Krzysztof Choromanski, Arijit Sehanobish, Han Lin, Yunfan Zhao, Eli Berger, Tetiana Parshakova, Alvin Pan, David Watkins, Tianyi Zhang, Valerii Likhosherstov, Somnath Basu Roy Chowdhury, Avinava Dubey, Deepali Jain, Tamas Sarlos, Snigdha Chaturvedi, Adrian Weller
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds.
no code implementations • 11 Jan 2023 • Fatemeh Haghighi, Soumitra Ghosh, Hai Ngu, Sarah Chu, Han Lin, Mohsen Hejrati, Baris Bingol, Somaye Hashemifar
To this end, we propose an end-to-end deep learning framework based on self-supervised learning for the segmentation and quantification of dopaminergic neurons in PD animal models.
no code implementations • 19 Sep 2022 • Jingxi Xu, Han Lin, Shuran Song, Matei Ciocarlie
In this work, we propose TANDEM3D, a method that applies a co-training framework for exploration and decision making to 3D object recognition with tactile signals.
1 code implementation • ICLR 2022 • Krzysztof Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, Deepali Jain, Michael S Ryoo, Jake Varley, Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller
We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide the most accurate approximation in the defined regions of interest.
1 code implementation • 16 Jul 2021 • Krzysztof Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamas Sarlos, Adrian Weller, Thomas Weingarten
In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformer architectures in a scalable way.
no code implementations • 21 Jan 2021 • Xiangyun Zeng, XiaoFeng Wang, Ali Esamdin, Craig Pellegrino, WeiKang Zheng, Jujia Zhang, Jun Mo, Wenxiong Li, D. Andrew Howell, Alexei V. Filippenko, Han Lin, Thomas G. Brink, Edward A. Baron, Jamison Burke, James M. DerKacy, Curtis McCully, Daichi Hiramatsu, Griffin Hosseinzadeh, Benjamin T. Jeffers, Timothy W. Ross, Benjamin E. Stahl, Samantha Stegman, Stefano Valenti, Lifan Wang, Danfeng Xiang, Jicheng Zhang, Tianmeng Zhang
We present extensive, well-sampled optical and ultraviolet photometry and optical spectra of the Type Ia supernova (SN Ia) 2017hpa.
High Energy Astrophysical Phenomena • Solar and Stellar Astrophysics
no code implementations • 21 Dec 2020 • Ji-Cheng Zhang, Xiao-Feng Wang, Jun Mo, Gao-Bo Xi, Jie Lin, Xiao-Jun Jiang, Xiao-Ming Zhang, Wen-Xiong Li, Sheng-Yu Yan, Zhi-Hao Chen, Lei Hu, Xue Li, Wei-Li Lin, Han Lin, Cheng Miao, Li-Ming Rui, Han-Na Sai, Dan-Feng Xiang, Xing-Han Zhang
The TMTS system can have a FoV of about 9 deg² when monitoring the sky with two bands (i.e., SDSS g and r filters) at the same time, and a maximum FoV of ~18 deg² when four telescopes monitor different sky areas in monochromatic filter mode.
Instrumentation and Methods for Astrophysics
no code implementations • NeurIPS 2020 • Han Lin, Haoxian Chen, Tianyi Zhang, Clement Laroche, Krzysztof Choromanski
Orthogonal Monte Carlo (OMC) is a very effective sampling algorithm imposing structural geometric conditions (orthogonality) on samples for variance reduction.
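As a rough illustration of what "imposing orthogonality on samples" can look like, the sketch below draws a block of Gaussian vectors, orthogonalizes their directions with Gram-Schmidt, and then rescales each direction by the norm of a fresh Gaussian vector so that each sample is still marginally Gaussian. This is one common construction from the orthogonal random features literature and is only an assumption about the setting; it is not necessarily the exact estimator the paper analyzes. The function names (`gram_schmidt`, `orthogonal_gaussian_samples`) are hypothetical.

```python
import math
import random


def gram_schmidt(rows):
    """Orthonormalize a list of vectors with modified Gram-Schmidt."""
    basis = []
    for v in rows:
        w = list(v)
        for b in basis:
            # Remove the component of w along each previously accepted direction.
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def orthogonal_gaussian_samples(d, rng):
    """Return d mutually orthogonal samples in R^d (illustrative sketch).

    Directions come from orthogonalizing a d x d Gaussian matrix; each norm is
    redrawn as the length of an independent standard Gaussian vector, so every
    single sample keeps a standard Gaussian marginal distribution.
    """
    gaussian_rows = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
    directions = gram_schmidt(gaussian_rows)
    samples = []
    for u in directions:
        radius = math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
        samples.append([radius * ui for ui in u])
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    block = orthogonal_gaussian_samples(4, rng)
    # Off-diagonal dot products should vanish up to floating-point error.
    worst = max(
        abs(sum(a * b for a, b in zip(block[i], block[j])))
        for i in range(4)
        for j in range(i + 1, 4)
    )
    print("largest |dot product| between distinct samples:", worst)
```

Compared with drawing the d vectors independently, the samples in one block cannot cluster in the same direction, which is the intuition behind the variance reduction mentioned above.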