Search Results for author: Honglu Zhou

Found 22 papers, 14 papers with code

Laying the Foundations of Deep Long-Term Crowd Flow Prediction

1 code implementation ECCV 2020 Samuel S. Sohn, Honglu Zhou, Seonghyeon Moon, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia

Predicting the crowd behavior in complex environments is a key requirement for crowd and disaster management, architectural design, and urban planning.

Management

CASIM: Composite Aware Semantic Injection for Text to Motion Generation

no code implementations 4 Feb 2025 Che-Jui Chang, Qingze Tony Liu, Honglu Zhou, Vladimir Pavlovic, Mubbasir Kapadia

Recent advances in generative modeling and tokenization have driven significant progress in text-to-motion generation, leading to enhanced quality and realism in generated motions.

Motion Generation

Unifying Specialized Visual Encoders for Video Language Models

1 code implementation 2 Jan 2025 Jihoon Chung, Tyler Zhu, Max Gonzalez Saez-Diez, Juan Carlos Niebles, Honglu Zhou, Olga Russakovsky

Our method, MERV, Multi-Encoder Representation of Videos, instead leverages multiple frozen visual encoders to create a unified representation of a video, providing the VideoLLM with a comprehensive set of specialized visual knowledge.

Multiple-choice Video Understanding

ViUniT: Visual Unit Tests for More Robust Visual Programming

no code implementations CVPR 2025 Artemis Panagopoulou, Honglu Zhou, Silvio Savarese, Caiming Xiong, Chris Callison-Burch, Mark Yatskar, Juan Carlos Niebles

In our framework, a unit test is represented as a novel image and answer pair meant to verify the logical correctness of a program produced for a given query.

Image Generation Image-text matching +4

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

no code implementations 21 Oct 2024 Michael S. Ryoo, Honglu Zhou, Shrikant Kendre, Can Qin, Le Xue, Manli Shu, Silvio Savarese, Ran Xu, Caiming Xiong, Juan Carlos Niebles

We present xGen-MM-Vid (BLIP-3-Video): a multimodal language model for videos, particularly designed to efficiently capture temporal information over multiple frames.

Language Modeling Language Modelling +2

Domain-Guided Weight Modulation for Semi-Supervised Domain Generalization

1 code implementation 4 Sep 2024 Chamuditha Jayanaga Galappaththige, Zachary Izzo, Xilin He, Honglu Zhou, Muhammad Haris Khan

In search of this endeavor, we study the challenging problem of semi-supervised domain generalization (SSDG), where the goal is to learn a domain-generalizable model while using only a small fraction of labeled data and a relatively large fraction of unlabeled data.

Domain Generalization Semi-Supervised Domain Generalization

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

4 code implementations CVPR 2024 Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan

This knowledge, sourced from training procedure plans and structured as a directed weighted graph, equips the agent to better navigate the complexities of step sequencing and its potential variations.

Logical Sequence Navigate

Procedure-Aware Pretraining for Instructional Video Understanding

1 code implementation CVPR 2023 Honglu Zhou, Roberto Martín-Martín, Mubbasir Kapadia, Silvio Savarese, Juan Carlos Niebles

This graph can then be used to generate pseudo labels to train a video representation that encodes the procedural knowledge in a more accessible form to generalize to multiple procedure understanding tasks.

Video Understanding

HM: Hybrid Masking for Few-Shot Segmentation

1 code implementation 24 Mar 2022 Seonghyeon Moon, Samuel S. Sohn, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Muhammad Haris Khan, Mubbasir Kapadia

A fundamental limitation of FM is the inability to preserve the fine-grained spatial details that affect the accuracy of segmentation mask, especially for small target objects.

Few-Shot Semantic Segmentation Segmentation +1

Graph-Based Generative Representation Learning of Semantically and Behaviorally Augmented Floorplans

no code implementations 8 Dec 2020 Vahid Azizi, Muhammad Usman, Honglu Zhou, Petros Faloutsos, Mubbasir Kapadia

We present a floorplan embedding technique that uses an attributed graph to represent the geometric information as well as design semantics and behavioral features of the inhabitants as node and edge attributes.

Representation Learning

GitEvolve: Predicting the Evolution of GitHub Repositories

1 code implementation 9 Oct 2020 Honglu Zhou, Hareesh Ravi, Carlos M. Muniz, Vahid Azizi, Linda Ness, Gerard de Melo, Mubbasir Kapadia

Given its crucial role, there is a need to better understand and model the dynamics of GitHub as a social platform.

Representation Learning

Understanding Echo Chambers in E-commerce Recommender Systems

1 code implementation 6 Jul 2020 Yingqiang Ge, Shuya Zhao, Honglu Zhou, Changhua Pei, Fei Sun, Wenwu Ou, Yongfeng Zhang

Current research on recommender systems mostly focuses on matching users with proper items based on user interests.

Recommendation Systems

HID: Hierarchical Multiscale Representation Learning for Information Diffusion

2 code implementations 19 Apr 2020 Honglu Zhou, Shuyuan Xu, Zuohui Fu, Gerard de Melo, Yongfeng Zhang, Mubbasir Kapadia

In this paper, we present a Hierarchical Information Diffusion (HID) framework by integrating user representation learning and multiscale modeling.

Representation Learning

Deep Crowd-Flow Prediction in Built Environments

no code implementations 13 Oct 2019 Samuel S. Sohn, Seonghyeon Moon, Honglu Zhou, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia

In this paper, we propose an approach to instantly predict the long-term flow of crowds in arbitrarily large, realistic environments.

Management Prediction
