Search Results for author: Roger Zimmermann

Found 68 papers, 31 papers with code

Causal Augmentation for Causal Sentence Classification

1 code implementation EMNLP (CINLP) 2021 Fiona Anting Tan, Devamanyu Hazarika, See-Kiong Ng, Soujanya Poria, Roger Zimmermann

Scarcity of annotated causal texts leads to poor robustness when training state-of-the-art language models for causal sentence classification.

Classification counterfactual +3

Described Spatial-Temporal Video Detection

no code implementations 8 Jul 2024 Wei Ji, Xiangyan Liu, Yingfei Sun, Jiajun Deng, You Qin, Ammar Nuwanna, Mengyao Qiu, Lina Wei, Roger Zimmermann

However, in the video domain, the existing setting, i.e., spatial-temporal video grounding (STVG), is formulated to only detect one pre-existing object in each frame, ignoring the fact that language descriptions can involve none or multiple entities within a video.

Multi-class Classification Temporal Localization +1

Do As I Do: Pose Guided Human Motion Copy

no code implementations 24 Jun 2024 Sifan Wu, Zhenguang Liu, Beibei Zhang, Roger Zimmermann, Zhongjie Ba, Xiaosong Zhang, Kui Ren

Human motion copy is an intriguing yet challenging task in artificial intelligence and computer vision, which strives to generate a fake video of a target person performing the motion of a source person.

PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search with Supplementary Materials

no code implementations 19 Jun 2024 Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Xianjing Han, Yifang Yin, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann

Satellite-based street-view information extraction by cross-view matching refers to a task that extracts the location and orientation information of a given street-view image query by using one or multiple geo-referenced satellite images.

Prompt-Enhanced Spatio-Temporal Graph Transfer Learning

no code implementations 21 May 2024 Junfeng Hu, Xu Liu, Zhencheng Fan, Yifang Yin, Shili Xiang, Savitha Ramasamy, Roger Zimmermann

To bridge this gap, we propose Spatio-Temporal Graph Prompting (STGP), a prompt-enhanced transfer learning framework capable of adapting to diverse tasks in data-scarce domains.

Transfer Learning

ShapeMoiré: Channel-Wise Shape-Guided Network for Image Demoiréing

1 code implementation 28 Apr 2024 Jinming Cao, Sicheng Shen, Qiu Zhou, Yifang Yin, Yangyan Li, Roger Zimmermann

Interestingly, we find that the Shape information effectively captures the moiré patterns in artifact images.

Modeling Spatio-temporal Dynamical Systems with Neural Discrete Learning and Levels-of-Experts

no code implementations 6 Feb 2024 Kun Wang, Hao Wu, Guibin Zhang, Junfeng Fang, Yuxuan Liang, Yuankai Wu, Roger Zimmermann, Yang Wang

In this paper, we address the issue of modeling and estimating changes in the state of the spatio-temporal dynamical systems based on a sequence of observations like video frames.

Optical Flow Estimation

Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classification

no code implementations 18 Jan 2024 Yutong Xia, Runpeng Yu, Yuxuan Liang, Xavier Bresson, Xinchao Wang, Roger Zimmermann

Graph Neural Networks (GNNs) have become the preferred tool to process graph data, with their efficacy being boosted through graph data augmentation techniques.

Data Augmentation Graph Classification

Towards Unifying Diffusion Models for Probabilistic Spatio-Temporal Graph Learning

no code implementations 26 Oct 2023 Junfeng Hu, Xu Liu, Zhencheng Fan, Yuxuan Liang, Roger Zimmermann

Based on this proposal, we introduce Unified Spatio-Temporal Diffusion Models (USTD) to address the tasks uniformly within the uncertainty-aware diffusion framework.

Denoising Graph Learning

UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web

1 code implementation 22 Oct 2023 Yibo Yan, Haomin Wen, Siru Zhong, Wei Chen, Haodong Chen, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

To answer the questions, we leverage the power of Large Language Models (LLMs) and introduce the first-ever LLM-enhanced framework that integrates the knowledge of textual modality into urban imagery profiling, named LLM-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining (UrbanCLIP).

Language Modelling Representation Learning

Domain-wise Invariant Learning for Panoptic Scene Graph Generation

no code implementations 9 Oct 2023 Li Li, You Qin, Wei Ji, Yuxiao Zhou, Roger Zimmermann

Panoptic Scene Graph Generation (PSG) involves the detection of objects and the prediction of their corresponding relationships (predicates).

Graph Generation Panoptic Scene Graph Generation

Towards Complex-query Referring Image Segmentation: A Novel Benchmark

no code implementations 29 Sep 2023 Wei Ji, Li Li, Hao Fei, Xiangyan Liu, Xun Yang, Juncheng Li, Roger Zimmermann

Referring Image Understanding (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms.

Image Segmentation Semantic Segmentation

Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

1 code implementation 13 Sep 2023 Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann, Lei Yang

We theoretically analyzed the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature.


A Survey on Service Route and Time Prediction in Instant Delivery: Taxonomy, Progress, and Prospects

no code implementations 3 Sep 2023 Haomin Wen, Youfang Lin, Lixia Wu, Xiaowei Mao, Tianyue Cai, Yunfeng Hou, Shengnan Guo, Yuxuan Liang, Guangyin Jin, Yiji Zhao, Roger Zimmermann, Jieping Ye, Huaiyu Wan

An emerging research area within these services is service Route&Time Prediction (RTP), which aims to estimate the future service route as well as the arrival time of a given worker.

SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection

1 code implementation 26 Aug 2023 Qiu Zhou, Jinming Cao, Hanchao Leng, Yifang Yin, Yu Kun, Roger Zimmermann

This indicates that the combination of 3D object detection and 3D semantic occupancy leads to a more comprehensive perception of the 3D environment, thereby helping to build more robust autonomous driving systems.

3D Object Detection Autonomous Driving +2

Panoptic Scene Graph Generation with Semantics-Prototype Learning

1 code implementation 28 Jul 2023 Li Li, Wei Ji, Yiming Wu, Mengze Li, You Qin, Lina Wei, Roger Zimmermann

To promise consistency and accuracy during the transfer process, we propose to measure the invariance of representations in each predicate class, and learn unbiased prototypes of predicates with different intensities.

Graph Generation Panoptic Scene Graph Generation

In Defense of Clip-based Video Relation Detection

no code implementations 18 Jul 2023 Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Roger Zimmermann

While recent video-based methods utilizing video tubelets have shown promising results, we argue that the effective modeling of spatial and temporal context plays a more significant role than the choice between clip tubelets and video tubelets.

Feature Compression Object Tracking +2

LaDe: The First Comprehensive Last-mile Delivery Dataset from Industry

no code implementations 19 Jun 2023 Lixia Wu, Haomin Wen, Haoyuan Hu, Xiaowei Mao, Yutong Xia, Ergang Shan, Jianbin Zhen, Junhong Lou, Yuxuan Liang, Liuqing Yang, Roger Zimmermann, Youfang Lin, Huaiyu Wan

In this paper, we introduce LaDe, the first publicly available last-mile delivery dataset with millions of packages from the industry.


Graph Neural Processes for Spatio-Temporal Extrapolation

1 code implementation 30 May 2023 Junfeng Hu, Yuxuan Liang, Zhencheng Fan, Hongyang Chen, Yu Zheng, Roger Zimmermann

We study the task of spatio-temporal extrapolation that generates data at target locations from surrounding contexts in a graph.

Gaussian Processes

A real-time blind quality-of-experience assessment metric for HTTP adaptive streaming

1 code implementation 17 Mar 2023 Chunyi Li, May Lim, Abdelhak Bentaleb, Roger Zimmermann

In today's Internet, HTTP Adaptive Streaming (HAS) is the mainstream standard for video streaming, which switches the bitrate of the video content based on an Adaptive BitRate (ABR) algorithm.

DiffSTG: Probabilistic Spatio-Temporal Graph Forecasting with Denoising Diffusion Models

1 code implementation 31 Jan 2023 Haomin Wen, Youfang Lin, Yutong Xia, Huaiyu Wan, Qingsong Wen, Roger Zimmermann, Yuxuan Liang

Spatio-temporal graph neural networks (STGNN) have emerged as the dominant model for spatio-temporal graph (STG) forecasting.

Decision Making Denoising

Do We Really Need Graph Neural Networks for Traffic Forecasting?

no code implementations 30 Jan 2023 Xu Liu, Yuxuan Liang, Chao Huang, Hengchang Hu, Yushi Cao, Bryan Hooi, Roger Zimmermann

Spatio-temporal graph neural networks (STGNN) have become the most popular solution to traffic forecasting.

AirFormer: Predicting Nationwide Air Quality in China with Transformers

1 code implementation 29 Nov 2022 Yuxuan Liang, Yutong Xia, Songyu Ke, Yiwei Wang, Qingsong Wen, Junbo Zhang, Yu Zheng, Roger Zimmermann

Air pollution is a crucial issue affecting human health and livelihoods, as well as one of the barriers to economic and social growth.

Analyzing Modality Robustness in Multimodal Sentiment Analysis

1 code implementation NAACL 2022 Devamanyu Hazarika, Yingting Li, Bo Cheng, Shuai Zhao, Roger Zimmermann, Soujanya Poria

In this work, we hope to address that by (i) Proposing simple diagnostic checks for modality robustness in a trained multimodal model.

Multimodal Sentiment Analysis

DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition

1 code implementation 9 Dec 2021 Yuxuan Liang, Pan Zhou, Roger Zimmermann, Shuicheng Yan

While transformers have shown great potential on video recognition with their strong capability of capturing long-range dependencies, they often suffer high computational costs induced by the self-attention to the huge number of 3D tokens.

Video Recognition

Decoupling Long- and Short-Term Patterns in Spatiotemporal Inference

no code implementations 16 Sep 2021 Junfeng Hu, Yuxuan Liang, Zhencheng Fan, Li Liu, Yifang Yin, Roger Zimmermann

Specifically, we introduce a joint spatiotemporal graph attention network to learn the relations across space and time for short-term patterns.

Graph Attention

When Do Contrastive Learning Signals Help Spatio-Temporal Graph Forecasting?

1 code implementation 26 Aug 2021 Xu Liu, Yuxuan Liang, Chao Huang, Yu Zheng, Bryan Hooi, Roger Zimmermann

In view of this, one may ask: can we leverage the additional signals from contrastive learning to alleviate data scarcity, so as to benefit STG forecasting?

Contrastive Learning Data Augmentation +2

Recovering the Unbiased Scene Graphs from the Biased Ones

1 code implementation 5 Jul 2021 Meng-Jiun Chiou, Henghui Ding, Hanshu Yan, Changhu Wang, Roger Zimmermann, Jiashi Feng

Given input images, scene graph generation (SGG) aims to produce comprehensive, graphical representations describing visual relationships among salient objects.

Missing Labels Scene Graph Classification +4

Towards Modelling Coherence in Spoken Discourse

no code implementations 31 Dec 2020 Rajaswa Patil, Yaman Kumar Singla, Rajiv Ratn Shah, Mika Hama, Roger Zimmermann

While there has been significant progress towards modelling coherence in written discourse, the work in modelling spoken discourse coherence has been quite limited.

LIFI: Towards Linguistically Informed Frame Interpolation

1 code implementation 30 Oct 2020 Aradhya Neeraj Mathur, Devansh Batra, Yaman Kumar, Rajiv Ratn Shah, Roger Zimmermann

We also release several datasets to test computer vision video generation models on their speech understanding.

Video Generation

Multimodal Research in Vision and Language: A Review of Current and Emerging Trends

no code implementations 19 Oct 2020 Shagun Uppal, Sarthak Bhagat, Devamanyu Hazarika, Navonil Majumdar, Soujanya Poria, Roger Zimmermann, Amir Zadeh

Deep Learning and its applications have cascaded impactful research and development with a diverse range of modalities present in the real-world data.

Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations

1 code implementation 10 Sep 2020 Meng-Jiun Chiou, Roger Zimmermann, Jiashi Feng

Visual relationship detection aims to reason over relationships among salient objects in images, which has drawn increasing attention over the past few years.

Object object-detection +4

Exploring the use of Time-Dependent Cross-Network Information for Personalized Recommendations

no code implementations 25 Aug 2020 Dilruk Perera, Roger Zimmermann

The solution first learns historical user models in the target network by aggregating user preferences from multiple source networks.


LSTM Networks for Online Cross-Network Recommendations

no code implementations 25 Aug 2020 Dilruk Perera, Roger Zimmermann

Existing models (1) fail to capture complex non-linear relationships in user interactions, and (2) are designed for offline settings and hence are not updated online with incoming interactions to capture the dynamics in the recommender environment.

Diversity Recommendation Systems

Zero-Shot Multi-View Indoor Localization via Graph Location Networks

1 code implementation 6 Aug 2020 Meng-Jiun Chiou, Zhenguang Liu, Yifang Yin, An-An Liu, Roger Zimmermann

In this paper, we propose a novel neural network based architecture Graph Location Networks (GLN) to perform infrastructure-free, multi-view image based indoor localization.

Indoor Localization

"Notic My Speech" -- Blending Speech Patterns With Multimedia

no code implementations 12 Jun 2020 Dhruva Sahrawat, Yaman Kumar, Shashwat Aggarwal, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann

To close the gap between speech understanding and multimedia video applications, in this paper, we show the initial experiments by modelling the perception on visual speech and showing its use case on video compression.

speech-recognition Video Compression +1

Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings

no code implementations 19 Oct 2019 Dhruva Sahrawat, Debanjan Mahata, Mayank Kulkarni, Haimin Zhang, Rakesh Gosangi, Amanda Stent, Agniv Sharma, Yaman Kumar, Rajiv Ratn Shah, Roger Zimmermann

In this paper, we formulate keyphrase extraction from scholarly articles as a sequence labeling task solved using a BiLSTM-CRF, where the words in the input text are represented using deep contextualized embeddings.

Keyphrase Extraction Word Embeddings

Conversational Transfer Learning for Emotion Recognition

1 code implementation 11 Oct 2019 Devamanyu Hazarika, Soujanya Poria, Roger Zimmermann, Rada Mihalcea

We propose an approach, TL-ERC, where we pre-train a hierarchical dialogue model on multi-turn conversations (source) and then transfer its parameters to a conversational emotion classifier (target).

Emotion Recognition in Conversation Sentence +1

Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper)

1 code implementation ACL 2019 Santiago Castro, Devamanyu Hazarika, Verónica Pérez-Rosas, Roger Zimmermann, Rada Mihalcea, Soujanya Poria

As a first step towards enabling the development of multimodal approaches for sarcasm detection, we propose a new sarcasm dataset, Multimodal Sarcasm Detection Dataset (MUStARD), compiled from popular TV shows.

Sarcasm Detection

Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper)

1 code implementation 5 Jun 2019 Santiago Castro, Devamanyu Hazarika, Verónica Pérez-Rosas, Roger Zimmermann, Rada Mihalcea, Soujanya Poria

As a first step towards enabling the development of multimodal approaches for sarcasm detection, we propose a new sarcasm dataset, Multimodal Sarcasm Detection Dataset (MUStARD), compiled from popular TV shows.

Sarcasm Detection

Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition

1 code implementation 29 Jan 2019 Yaman Kumar, Dhruva Sahrawat, Shubham Maheshwari, Debanjan Mahata, Amanda Stent, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann

To solve this problem, we present a novel approach to zero-shot learning by generating new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases.

speech-recognition Visual Speech Recognition +1

Kiki Kills: Identifying Dangerous Challenge Videos from Social Media

no code implementations 2 Dec 2018 Nupur Baghel, Yaman Kumar, Paavini Nanda, Rajiv Ratn Shah, Debanjan Mahata, Roger Zimmermann

There has been an upsurge in the number of people participating in challenges made popular through social media channels.

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

no code implementations 16 Jul 2018 Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, Roger Zimmermann, John R. Talburt

Keyword extraction is a fundamental task in natural language processing that facilitates mapping of documents to a concise set of representative single and multi-word phrases.

Keyword Extraction

A Multimodal Approach to Predict Social Media Popularity

no code implementations 16 Jul 2018 Mayank Meghawat, Satyendra Yadav, Debanjan Mahata, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann

In this work, we propose a multimodal dataset consisting of content, context, and social information for popularity prediction.
