1 code implementation • ECCV 2020 • Samuel S. Sohn, Honglu Zhou, Seonghyeon Moon, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia
Predicting the crowd behavior in complex environments is a key requirement for crowd and disaster management, architectural design, and urban planning.
no code implementations • 4 Feb 2025 • Che-Jui Chang, Qingze Tony Liu, Honglu Zhou, Vladimir Pavlovic, Mubbasir Kapadia
Recent advances in generative modeling and tokenization have driven significant progress in text-to-motion generation, leading to enhanced quality and realism in generated motions.
1 code implementation • 2 Jan 2025 • Jihoon Chung, Tyler Zhu, Max Gonzalez Saez-Diez, Juan Carlos Niebles, Honglu Zhou, Olga Russakovsky
Our method, MERV, Multi-Encoder Representation of Videos, instead leverages multiple frozen visual encoders to create a unified representation of a video, providing the VideoLLM with a comprehensive set of specialized visual knowledge.
no code implementations • CVPR 2025 • Artemis Panagopoulou, Honglu Zhou, Silvio Savarese, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles
In our framework, a unit test is represented as a novel image and answer pair meant to verify the logical correctness of a program produced for a given query.
no code implementations • 21 Oct 2024 • Michael S. Ryoo, Honglu Zhou, Shrikant Kendre, Can Qin, Le Xue, Manli Shu, Silvio Savarese, ran Xu, Caiming Xiong, Juan Carlos Niebles
We present xGen-MM-Vid (BLIP-3-Video): a multimodal language model for videos, particularly designed to efficiently capture temporal information over multiple frames.
1 code implementation • 4 Sep 2024 • Chamuditha Jayanaga Galappaththige, Zachary Izzo, Xilin He, Honglu Zhou, Muhammad Haris Khan
In search of this endeavor, we study the challenging problem of semi-supervised domain generalization (SSDG), where the goal is to learn a domain-generalizable model while using only a small fraction of labeled data and a relatively large fraction of unlabeled data.
no code implementations • 22 Aug 2024 • Can Qin, Congying Xia, Krithika Ramakrishnan, Michael Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, Senthil Purushwalkam, Le Xue, Yingbo Zhou, Huan Wang, Silvio Savarese, Juan Carlos Niebles, Zeyuan Chen, ran Xu, Caiming Xiong
We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions.
1 code implementation • 16 Aug 2024 • Le Xue, Manli Shu, Anas Awadalla, Jun Wang, An Yan, Senthil Purushwalkam, Honglu Zhou, Viraj Prabhu, Yutong Dai, Michael S Ryoo, Shrikant Kendre, Jieyu Zhang, Can Qin, Shu Zhang, Chia-Chih Chen, Ning Yu, Juntao Tan, Tulika Manoj Awalgaonkar, Shelby Heinecke, Huan Wang, Yejin Choi, Ludwig Schmidt, Zeyuan Chen, Silvio Savarese, Juan Carlos Niebles, Caiming Xiong, ran Xu
The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs.
4 code implementations • CVPR 2024 • Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan
This knowledge, sourced from training procedure plans and structured as a directed weighted graph, equips the agent to better navigate the complexities of step sequencing and its potential variations.
no code implementations • CVPR 2024 • Che-Jui Chang, Danrui Li, Deep Patel, Parth Goel, Honglu Zhou, Seonghyeon Moon, Samuel S. Sohn, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia
The study of complex human interactions and group activities has become a focal point in human-centric computer vision.
no code implementations • 29 Jun 2023 • Che-Jui Chang, Danrui Li, Deep Patel, Parth Goel, Honglu Zhou, Seonghyeon Moon, Samuel S. Sohn, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia
The study of complex human interactions and group activities has become a focal point in human-centric computer vision.
1 code implementation • CVPR 2023 • Honglu Zhou, Roberto Martín-Martín, Mubbasir Kapadia, Silvio Savarese, Juan Carlos Niebles
This graph can then be used to generate pseudo labels to train a video representation that encodes the procedural knowledge in a more accessible form to generalize to multiple procedure understanding tasks.
1 code implementation • ICCV 2023 • Seonghyeon Moon, Samuel S. Sohn, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Muhammad Haris Khan, Mubbasir Kapadia
To extract information relevant to the target class, a dominant approach in best-performing FSS methods removes background features using a support mask.
Ranked #3 on
Few-Shot Semantic Segmentation
on FSS-1000 (1-shot)
1 code implementation • 24 Mar 2022 • Seonghyeon Moon, Samuel S. Sohn, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Muhammad Haris Khan, Mubbasir Kapadia
A fundamental limitation of FM is the inability to preserve the fine-grained spatial details that affect the accuracy of segmentation mask, especially for small target objects.
Ranked #5 on
Few-Shot Semantic Segmentation
on FSS-1000 (1-shot)
1 code implementation • 22 Dec 2021 • Honglu Zhou, Advith Chegu, Samuel S. Sohn, Zuohui Fu, Gerard de Melo, Mubbasir Kapadia
Digraph Representation Learning (DRL) aims to learn representations for directed homogeneous graphs (digraphs).
1 code implementation • 11 Dec 2021 • Honglu Zhou, Asim Kadav, Aviv Shamsian, Shijie Geng, Farley Lai, Long Zhao, Ting Liu, Mubbasir Kapadia, Hans Peter Graf
Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning of actors and objects.
Ranked #2 on
Group Activity Recognition
on Collective Activity
1 code implementation • ICLR 2021 • Honglu Zhou, Asim Kadav, Farley Lai, Alexandru Niculescu-Mizil, Martin Renqiang Min, Mubbasir Kapadia, Hans Peter Graf
We evaluate over CATER dataset and find that Hopper achieves 73. 2% Top-1 accuracy using just 1 FPS by hopping through just a few critical frames.
Ranked #5 on
Video Object Tracking
on CATER
no code implementations • 8 Dec 2020 • Vahid Azizi, Muhammad Usman, Honglu Zhou, Petros Faloutsos, Mubbasir Kapadia
We present a floorplan embedding technique that uses an attributed graph to represent the geometric information as well as design semantics and behavioral features of the inhabitants as node and edge attributes.
1 code implementation • 9 Oct 2020 • Honglu Zhou, Hareesh Ravi, Carlos M. Muniz, Vahid Azizi, Linda Ness, Gerard de Melo, Mubbasir Kapadia
Given its crucial role, there is a need to better understand and model the dynamics of GitHub as a social platform.
1 code implementation • 6 Jul 2020 • Yingqiang Ge, Shuya Zhao, Honglu Zhou, Changhua Pei, Fei Sun, Wenwu Ou, Yongfeng Zhang
Current research on recommender systems mostly focuses on matching users with proper items based on user interests.
2 code implementations • 19 Apr 2020 • Honglu Zhou, Shuyuan Xu, Zuohui Fu, Gerard de Melo, Yongfeng Zhang, Mubbasir Kapadia
In this paper, we present a Hierarchical Information Diffusion (HID) framework by integrating user representation learning and multiscale modeling.
no code implementations • 13 Oct 2019 • Samuel S. Sohn, Seonghyeon Moon, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia
In this paper, we propose an approach to instantly predict the long-term flow of crowds in arbitrarily large, realistic environments.