Search Results for author: Yifei HUANG

Found 33 papers, 17 papers with code

Pretraining Language Models with Text-Attributed Heterogeneous Graphs

1 code implementation 19 Oct 2023 Tao Zou, Le Yu, Yifei HUANG, Leilei Sun, Bowen Du

In many real-world scenarios (e.g., academic networks, social platforms), different types of entities are not only associated with texts but also connected by various relationships, which can be abstracted as Text-Attributed Heterogeneous Graphs (TAHGs).

Link Prediction Node Classification +1

Proposal-based Temporal Action Localization with Point-level Supervision

no code implementations 9 Oct 2023 Yuan Yin, Yifei HUANG, Ryosuke Furuta, Yoichi Sato

Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos where only a single point (frame) within every action instance is annotated in training data.

Action Classification Multiple Instance Learning +1

Memory-and-Anticipation Transformer for Online Action Understanding

1 code implementation ICCV 2023 Jiahao Wang, Guo Chen, Yifei HUANG, LiMin Wang, Tong Lu

Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.

Action Understanding Online Action Detection

VideoLLM: Modeling Video Sequence with Large Language Models

1 code implementation 22 May 2023 Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei HUANG, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, LiMin Wang

Building upon this insight, we propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs from natural language processing (NLP) for video sequence understanding.

Video Understanding

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

no code implementations CVPR 2023 Mingfang Zhang, Jinglu Wang, Xiao Li, Yifei HUANG, Yoichi Sato, Yan Lu

The Multiplane Image (MPI), containing a set of fronto-parallel RGBA layers, is an effective and efficient representation for view synthesis from sparse inputs.

3D Reconstruction

Fine-grained Affordance Annotation for Egocentric Hand-Object Interaction Videos

1 code implementation 7 Feb 2023 Zecheng Yu, Yifei HUANG, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato

Object affordance is an important concept in hand-object interaction, providing information on action possibilities based on human motor capacity and objects' physical property thus benefiting tasks such as action anticipation and robot imitation learning.

Action Anticipation Action Recognition +2

Weakly Supervised Temporal Sentence Grounding With Uncertainty-Guided Self-Training

no code implementations CVPR 2023 Yifei HUANG, Lijin Yang, Yoichi Sato

The task of weakly supervised temporal sentence grounding aims at finding the corresponding temporal moments of a language description in the video, given video-language correspondence only at video-level.

Data Augmentation Weakly-supervised Learning

Compound Prototype Matching for Few-shot Action Recognition

no code implementations 12 Jul 2022 Yifei HUANG, Lijin Yang, Yoichi Sato

Each global prototype is encouraged to summarize a specific aspect from the entire video, for example, the start/evolution of the action.

Few-Shot action recognition Few Shot Action Recognition +1

Precise Affordance Annotation for Egocentric Action Video Datasets

no code implementations 11 Jun 2022 Zecheng Yu, Yifei HUANG, Ryosuke Furuta, Takuma Yagi, Yusuke Goutsu, Yoichi Sato

Object affordance is an important concept in human-object interaction, providing information on action possibilities based on human motor capacity and objects' physical property thus benefiting tasks such as action anticipation and robot imitation learning.

Action Anticipation Affordance Recognition +1

CLRNet: Cross Layer Refinement Network for Lane Detection

3 code implementations CVPR 2022 Tu Zheng, Yifei HUANG, Yang Liu, Wenjian Tang, Zheng Yang, Deng Cai, Xiaofei He

In this way, we can exploit more contextual information to detect lanes while leveraging local detailed lane features to improve localization accuracy.

Lane Detection

Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips

no code implementations 2 Dec 2021 Lijin Yang, Yifei HUANG, Yusuke Sugano, Yoichi Sato

Previous works attempted to address this problem by applying temporal attention but failed to consider the global context of the full video, which is critical for determining the relatively significant parts.

Action Recognition Video Understanding

Leveraging Human Selective Attention for Medical Image Analysis with Limited Training Data

no code implementations 2 Dec 2021 Yifei HUANG, Xiaoxiao Li, Lijin Yang, Lin Gu, Yingying Zhu, Hirofumi Seo, Qiuming Meng, Tatsuya Harada, Yoichi Sato

Then we design a novel Auxiliary Attention Block (AAB) to allow information from SAN to be utilized by the backbone encoder to focus on selective areas.

Tumor Segmentation

Ego4D: Around the World in 3,000 Hours of Egocentric Video

4 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Spatio-Temporal Perturbations for Video Attribution

1 code implementation 1 Sep 2021 Zhenqiang Li, Weimin WANG, Zuoyue Li, Yifei HUANG, Yoichi Sato

The attribution method provides a direction for interpreting opaque neural networks in a visual way by identifying and visualizing the input regions/pixels that dominate the output of a network.

Video Understanding

FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning

1 code implementation ICCV 2021 Chenxu Zhang, Yifan Zhao, Yifei HUANG, Ming Zeng, Saifeng Ni, Madhukar Budagavi, Xiaohu Guo

In this paper, we propose a talking face generation method that takes an audio signal as input and a short target video clip as reference, and synthesizes a photo-realistic video of the target face with natural lip motions, head poses, and eye blinks that are in-sync with the input audio signal.

3D Face Animation Talking Face Generation

EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2021: Team M3EM Technical Report

no code implementations 18 Jun 2021 Lijin Yang, Yifei HUANG, Yusuke Sugano, Yoichi Sato

In this report, we describe the technical details of our submission to the 2021 EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition.

Action Recognition Unsupervised Domain Adaptation

Goal-Oriented Gaze Estimation for Zero-Shot Learning

1 code implementation CVPR 2021 Yang Liu, Lei Zhou, Xiao Bai, Yifei HUANG, Lin Gu, Jun Zhou, Tatsuya Harada

Therefore, we introduce a novel goal-oriented gaze estimation module (GEM) to improve the discriminative attribute localization based on the class-level attributes for ZSL.

Gaze Estimation Generalized Zero-Shot Learning

Commonsense Knowledge Aware Concept Selection For Diverse and Informative Visual Storytelling

no code implementations 5 Feb 2021 Hong Chen, Yifei HUANG, Hiroya Takamura, Hideki Nakayama

To enrich the candidate concepts, a commonsense knowledge graph is created for each image sequence from which the concept candidates are proposed.

Informativeness Visual Storytelling

Adversarial Robustness of Stabilized NeuralODEs Might be from Obfuscated Gradients

1 code implementation 28 Sep 2020 Yifei Huang, Yaodong Yu, Hongyang Zhang, Yi Ma, Yuan YAO

Even replacing only the first layer of a ResNet by such an ODE block can exhibit further improvement in robustness, e.g., under PGD-20 ($\ell_\infty=0.031$) attack on CIFAR-10 dataset, it achieves 91.57\% natural accuracy and 62.35\% robust accuracy, while a counterpart architecture of ResNet trained with TRADES achieves natural and robust accuracy 76.29\% and 45.24\%, respectively.

Adversarial Defense Adversarial Robustness

Improving Action Segmentation via Graph-Based Temporal Reasoning

no code implementations CVPR 2020 Yifei Huang, Yusuke Sugano, Yoichi Sato

In this paper, we propose a network module called Graph-based Temporal Reasoning Module (GTRM) that can be built on top of existing action segmentation models to learn the relation of multiple action segments in various time spans.

Action Segmentation Segmentation

Towards Visually Explaining Video Understanding Networks with Perturbation

2 code implementations 1 May 2020 Zhenqiang Li, Weimin WANG, Zuoyue Li, Yifei HUANG, Yoichi Sato

"Making black box models explainable" is a vital problem that accompanies the development of deep learning networks.

Video Understanding

Discovery of Bias and Strategic Behavior in Crowdsourced Performance Assessment

no code implementations 5 Aug 2019 Yifei Huang, Matt Shum, Xi Wu, Jason Zezhong Xiao

With the industry trend of shifting from a traditional hierarchical approach to flatter management structure, crowdsourced performance assessment gained mainstream popularity.

Fairness Management


no code implementations ICLR 2019 Yifei HUANG, Yuan YAO, Weizhi Zhu

A long-standing belief in machine learning holds that enlarging margins over training data accounts for the resistance of models to overfitting by increasing robustness.

Generalization Bounds Test

An Evaluation of Transfer Learning for Classifying Sales Engagement Emails at Large Scale

no code implementations 19 Apr 2019 Yong Liu, Pavel Dmitriev, Yifei HUANG, Andrew Brooks, Li Dong

Our results show that fine-tuning of the BERT model outperforms with as few as 300 labeled samples, but underperforms with fewer than 300 labeled samples, relative to all the feature-based approaches using different embeddings.

Language Modelling Transfer Learning

Manipulation-skill Assessment from Videos with Spatial Attention Network

no code implementations 9 Jan 2019 Zhenqiang Li, Yifei Huang, Minjie Cai, Yoichi Sato

Recent advances in computer vision have made it possible to automatically assess from videos the manipulation skills of humans in performing a task, which breeds many important applications in domains such as health rehabilitation and manufacturing.

Mutual Context Network for Jointly Estimating Egocentric Gaze and Actions

no code implementations 7 Jan 2019 Yifei Huang, Zhenqiang Li, Minjie Cai, Yoichi Sato

In this work, we address two coupled tasks of gaze prediction and action recognition in egocentric videos by exploring their mutual context.

Action Recognition Gaze Prediction +1

Differentiable Fine-grained Quantization for Deep Neural Network Compression

1 code implementation NIPS Workshop CDNNRIA 2018 Hsin-Pai Cheng, Yuanjun Huang, Xuyang Guo, Yifei HUANG, Feng Yan, Hai Li, Yiran Chen

Thus judiciously selecting different precision for different layers/structures can potentially produce more efficient models compared to traditional quantization methods by striking a better balance between accuracy and compression rate.

Neural Network Compression Quantization

Semantic Aware Attention Based Deep Object Co-segmentation

3 code implementations 16 Oct 2018 Hong Chen, Yifei HUANG, Hideki Nakayama

Object co-segmentation is the task of segmenting the same objects from multiple images.


Rethinking Breiman's Dilemma in Neural Networks: Phase Transitions of Margin Dynamics

1 code implementation 8 Oct 2018 Weizhi Zhu, Yifei HUANG, Yuan YAO

In this paper, we revisit Breiman's dilemma in deep neural networks with recently proposed spectrally normalized margins, from a novel perspective based on phase transitions of normalized margin distributions in training dynamics.

Generalization Bounds Test

Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition

2 code implementations ECCV 2018 Yifei Huang, Minjie Cai, Zhenqiang Li, Yoichi Sato

We present a new computational model for gaze prediction in egocentric videos by exploring patterns in temporal shift of gaze fixations (attention transition) that are dependent on egocentric manipulation tasks.

Gaze Prediction Saliency Prediction
