Search Results for author: Jiebo Luo

Found 292 papers, 111 papers with code

Irony in Emojis: A Comparative Study of Human and LLM Interpretation

no code implementations20 Jan 2025 Yawen Zheng, Hanjia Lyu, Jiebo Luo

Emojis have become a universal language in online communication, often carrying nuanced and context-dependent meanings.

Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion

no code implementations15 Jan 2025 Jingyuan Chen, Fuchen Long, Jie An, Zhaofan Qiu, Ting Yao, Jiebo Luo, Tao Mei

Extensive experiments of long video generation on the VBench benchmark demonstrate the superiority of our Ouroboros-Diffusion, particularly in terms of subject consistency, motion smoothness, and temporal consistency.

Denoising Video Denoising +1

SafeCFG: Redirecting Harmful Classifier-Free Guidance for Safe Generation

no code implementations20 Dec 2024 Jiadong Pan, Hongcheng Gao, Liang Li, Zheng-Jun Zha, Qingming Huang, Jiebo Luo

Experimental results show that by incorporating HGR, images generated by diffusion models achieve both high quality and strong safety, and safe DMs trained through unsupervised methods according to the harmfulness detected by HGR also exhibit good safety performance.

Image Generation

How Vision-Language Tasks Benefit from Large Pre-trained Models: A Survey

no code implementations11 Dec 2024 Yayun Qi, Hongxi Li, Yiqi Song, Xinxiao wu, Jiebo Luo

The exploration of various vision-language tasks, such as visual captioning, visual question answering, and visual commonsense reasoning, is an important area in artificial intelligence and continuously attracts the research community's attention.

Image Captioning Question Answering +2

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

1 code implementation26 Nov 2024 Shenghai Yuan, Jinfa Huang, Xianyi He, Yunyuan Ge, Yujun Shi, Liuhan Chen, Jiebo Luo, Li Yuan

We propose a hierarchical training strategy to leverage frequency information for identity preservation, transforming a vanilla pre-trained video generation model into an IPT2V model.

Image to Video Generation Text-to-Video Generation

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

no code implementations23 Nov 2024 Hang Hua, Qing Liu, Lingzhi Zhang, Jing Shi, Zhifei Zhang, Yilin Wang, Jianming Zhang, Jiebo Luo

To support this endeavor, we introduce COMPOSITIONCAP, a new dataset for multi-grained region compositional image captioning, which introduces the task of compositional attribute-aware regional image captioning.

Attribute Cross-Modal Retrieval +4

Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics

no code implementations18 Nov 2024 Taowen Wang, Dongfang Liu, James Chenhao Liang, Wenhao Yang, Qifan Wang, Cheng Han, Jiebo Luo, Ruixiang Tang

Recently in robotics, Vision-Language-Action (VLA) models have emerged as a transformative approach, enabling robots to execute complex tasks by integrating visual and linguistic inputs within an end-to-end learning framework.

CRTRE: Causal Rule Generation with Target Trial Emulation Framework

no code implementations10 Nov 2024 Junda Wang, Weijian Li, Han Wang, Hanjia Lyu, Caroline P. Thirukumaran, Addisu Mesfin, Hong Yu, Jiebo Luo

On the ICD code prediction tasks, it achieved AUC Macro scores of 92. 8 on MIMIC-III and 96. 7 on MIMIC-IV, outperforming the state-of-the-art models KEPT and MSMN.

Causal Inference Prediction

On Inductive Biases That Enable Generalization of Diffusion Transformers

1 code implementation28 Oct 2024 Jie An, De Wang, Pengsheng Guo, Jiebo Luo, Alexander Schwing

Furthermore, we empirically find that both the placement and the effective attention size of these local attention windows are crucial factors.

Denoising Inductive Bias

SePPO: Semi-Policy Preference Optimization for Diffusion Alignment

1 code implementation7 Oct 2024 Daoan Zhang, Guangchen Lan, Dong-Jun Han, Wenlin Yao, Xiaoman Pan, Hongming Zhang, Mingxiao Li, Pengcheng Chen, Yu Dong, Christopher Brinton, Jiebo Luo

To address the limitations of both on- and off-policy RLHF, we propose a preference optimization method that aligns DMs with preferences without relying on reward models or paired human-annotated data.

Model Selection

DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis

1 code implementation23 Sep 2024 Zixuan Wang, Jiayi Li, Xiaoyu Qin, Shikun Sun, Songtao Zhou, Jia Jia, Jiebo Luo

Unlike human movements, which are always continuous, dance camera movements involve both continuous sequences of variable lengths and sudden drastic changes to simulate the switching of multiple cameras.

Human Animation

End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting

no code implementations19 Sep 2024 Yongqi Wang, Shuo Yang, Xinxiao wu, Jiebo Luo

To address this challenge, we propose to unify object trajectory detection and relationship classification into an end-to-end open-vocabulary framework.

Decoder Object +6

Semantics Preserving Emoji Recommendation with Large Language Models

1 code implementation16 Sep 2024 Zhongyi Qiu, Kangyi Qiu, Hanjia Lyu, Wei Xiong, Jiebo Luo

Existing emoji recommendation methods are primarily evaluated based on their ability to match the exact emoji a user chooses in the original text.

Learning Brain Tumor Representation in 3D High-Resolution MR Images via Interpretable State Space Models

1 code implementation12 Sep 2024 Qingqiao Hu, Daoan Zhang, Jiebo Luo, Zhenyu Gong, Benedikt Wiestler, JianGuo Zhang, Hongwei Bran Li

Learning meaningful and interpretable representations from high-dimensional volumetric magnetic resonance (MR) images is essential for advancing personalized medicine.

Self-Supervised Learning State Space Models

X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation

no code implementations27 Aug 2024 Hanjia Lyu, Ryan Rossi, Xiang Chen, Md Mehrab Tanjim, Stefano Petrangeli, Somdeb Sarkhel, Jiebo Luo

Large Language Models (LLMs) and Large Multimodal Models (LMMs) have been shown to enhance the effectiveness of enriching item descriptions, thereby improving the accuracy of recommendation systems.

Multimodal Recommendation

Retrieval Augmentation via User Interest Clustering

no code implementations7 Aug 2024 Hanjia Lyu, Hanqing Zeng, Yinglong Xia, Ren Chen, Jiebo Luo

In this paper, we address these challenges by introducing an intermediate "interest" layer between users and items.

Clustering Computational Efficiency +2

Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection

2 code implementations30 Jul 2024 Jinfa Huang, Jinsheng Pan, Zhongwei Wan, Hanjia Lyu, Jiebo Luo

To this end, we propose Evolver, which incorporates LMMs via Chain-of-Evolution (CoE) Prompting, by integrating the evolution attribute and in-context information of memes.

Attribute

3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities

1 code implementation24 Jul 2024 Yanqi Bao, Tianyu Ding, Jing Huo, Yaoli Liu, Yuxin Li, Wenbin Li, Yang Gao, Jiebo Luo

3D Gaussian Splatting (3DGS) has emerged as a prominent technique with the potential to become a mainstream method for 3D representations.

3DGS Survey

Representation Bias in Political Sample Simulations with Large Language Models

no code implementations16 Jul 2024 Weihong Qi, Hanjia Lyu, Jiebo Luo

This study seeks to identify and quantify biases in simulating political samples with Large Language Models, specifically focusing on vote choice and public opinion.

ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

2 code implementations26 Jun 2024 Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan

We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to evaluate the temporal and metamorphic capabilities of the T2V models (e. g. Sora and Lumiere) in time-lapse video generation.

Text-to-Video Generation Video Generation

Dual Attribute-Spatial Relation Alignment for 3D Visual Grounding

no code implementations13 Jun 2024 Yue Xu, Kaizhi Yang, Jiebo Luo, Xuejin Chen

3D visual grounding is an emerging research area dedicated to making connections between the 3D physical world and natural language, which is crucial for achieving embodied intelligence.

3D visual grounding Attribute +1

INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance

1 code implementation13 Jun 2024 Chenwei Lin, Hanjia Lyu, Xian Xu, Jiebo Luo

There is no systematic review of multimodal tasks in the insurance domain, nor a benchmark specifically designed to evaluate the capabilities of LVLMs in insurance.

Multiple-choice Visual Reasoning

PromptFix: You Prompt and We Fix the Photo

1 code implementation27 May 2024 Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo

To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks.

Denoising Image Generation +1

Chain-of-Thought Prompting for Demographic Inference with Large Multimodal Models

1 code implementation24 May 2024 Yongsheng Yu, Jiebo Luo

Conventional demographic inference methods have predominantly operated under the supervision of accurately labeled data, yet struggle to adapt to shifting social landscapes and diverse cultural contexts, leading to narrow specialization and limited accuracy in applications.

Zero-Shot Learning

BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

1 code implementation23 Apr 2024 Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu Jin, Jiebo Luo, Yongfeng Zhang

This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time.

Decision Making Language Modelling

FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

no code implementations23 Apr 2024 Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction.

Hallucination In-Context Learning +2

CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios

no code implementations23 Apr 2024 Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang

In this paper, we extend the scope of Med-VLP to encompass 3D images, specifically targeting full-body scenarios, by using a multimodal dataset of CT images and reports.

Contrastive Learning

The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking

no code implementations22 Apr 2024 Yuying Li, Zeyan Liu, Junyi Zhao, Liangqin Ren, Fengjun Li, Jiebo Luo, Bo Luo

In a benchmarking study, we further evaluate if state-of-the-art open-source and commercial AI image detectors can effectively identify the images in the ARIA dataset.

Benchmarking Misinformation

V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

no code implementations18 Apr 2024 Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

Recent efforts have been made to expand from unimodal to multimodal video summarization, categorizing the task into three sub-tasks based on the summary's modality: video-to-video (V2V), video-to-text (V2T), and a combination of video and text summarization (V2VT).

Text Summarization Video Summarization

Harnessing GPT-4V(ision) for Insurance: A Preliminary Exploration

no code implementations15 Apr 2024 Chenwei Lin, Hanjia Lyu, Jiebo Luo, Xian Xu

The emergence of Large Multimodal Models (LMMs) marks a significant milestone in the development of artificial intelligence.

Hallucination

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

2 code implementations7 Apr 2024 Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo

Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.

Text-to-Video Generation Video Generation

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

no code implementations CVPR 2024 Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei

Technically, SATeCo freezes all the parameters of the pre-trained UNet and VAE, and only optimizes two deliberately-designed spatial feature adaptation (SFA) and temporal feature alignment (TFA) modules, in the decoder of UNet and VAE.

Decoder Denoising +4

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

1 code implementation CVPR 2024 Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo

However, camera movement synthesis with music and dance remains an unsolved challenging problem due to the scarcity of paired data.

SoMeLVLM: A Large Vision Language Model for Social Media Processing

no code implementations20 Feb 2024 Xinnong Zhang, Haoyu Kuang, Xinyi Mou, Hanjia Lyu, Kun Wu, Siming Chen, Jiebo Luo, Xuanjing Huang, Zhongyu Wei

The powerful Large Vision Language Models make it possible to handle a variety of tasks simultaneously, but even with carefully designed prompting methods, the general domain models often fall short in aligning with the unique speaking style and context of social media tasks.

Language Modeling Language Modelling

GaussianStyle: Gaussian Head Avatar via StyleGAN

1 code implementation1 Feb 2024 Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu

Existing methods like Neural Radiation Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made significant strides in facial attribute control such as facial animation and components editing, yet they struggle with fine-grained representation and scalability in dynamic head modeling.

3DGS Attribute +4

Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach

1 code implementation28 Jan 2024 Shaofeng Zhang, Jinfa Huang, Qiang Zhou, Zhibin Wang, Fan Wang, Jiebo Luo, Junchi Yan

At inference, we generate images with arbitrary expansion multiples by inputting an anchor image and its corresponding positional embeddings.

Image Outpainting

Human vs. LMMs: Exploring the Discrepancy in Emoji Interpretation and Usage in Digital Communication

1 code implementation16 Jan 2024 Hanjia Lyu, Weihong Qi, Zhongyu Wei, Jiebo Luo

Leveraging Large Multimodal Models (LMMs) to simulate human behaviors when processing multimodal information, especially in the context of social media, has garnered immense interest due to its broad potential and far-reaching implications.

CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs

1 code implementation5 Jan 2024 Daoan Zhang, Junming Yang, Hanjia Lyu, Zijian Jin, Yuan YAO, Mingkai Chen, Jiebo Luo

When exploring the development of Artificial General Intelligence (AGI), a critical task for these models involves interpreting and processing information from multiple image inputs.

Image Comprehension Image to text +2

Bring Metric Functions into Diffusion Models

no code implementations4 Jan 2024 Jie An, Zhengyuan Yang, JianFeng Wang, Linjie Li, Zicheng Liu, Lijuan Wang, Jiebo Luo

The first module, similar to a standard DDPM, learns to predict the added noise and is unaffected by the metric function.

Denoising

Video Understanding with Large Language Models: A Survey

1 code implementation29 Dec 2023 Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Pinxin Liu, Mingqian Feng, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu

With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.

Survey Video Understanding

SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation

2 code implementations22 Dec 2023 Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, ZongYuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang

Specifically, we achieve this by proposing (1) Collaborative Prompts that describe instrument structures via collaborating category-level and part-level texts; (2) Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) Part-to-Whole Adaptive Fusion and Hierarchical Decoding that adaptively fuse the part-level representations into a whole for accurate instrument segmentation in surgical scenarios.

Segmentation Semantic Segmentation

Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation

no code implementations13 Dec 2023 WenTing Chen, Linlin Shen, Jingyang Lin, Jiebo Luo, Xiang Li, Yixuan Yuan

To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process.

Language Modeling Language Modelling +1

GPT-4V(ision) as A Social Media Analysis Engine

1 code implementation13 Nov 2023 Hanjia Lyu, Jinfa Huang, Daoan Zhang, Yongsheng Yu, Xinyi Mou, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo

Our investigation begins with a preliminary quantitative analysis for each task using existing benchmark datasets, followed by a careful review of the results and a selection of qualitative samples that illustrate GPT-4V's potential in understanding multimodal social media content.

Hallucination Hate Speech Detection +1

Mixture of Weak & Strong Experts on Graphs

1 code implementation9 Nov 2023 Hanqing Zeng, Hanjia Lyu, Diyi Hu, Yinglong Xia, Jiebo Luo

We propose to decouple the two modalities by Mixture of weak and strong experts (Mowst), where the weak expert is a light-weight Multi-layer Perceptron (MLP), and the strong expert is an off-the-shelf GNN.

Graph Neural Network Node Classification

A Survey of Large Language Models in Medicine: Progress, Application, and Challenge

1 code implementation9 Nov 2023 Hongjian Zhou, Fenglin Liu, Boyang Gu, Xinyu Zou, Jinfa Huang, Jinge Wu, Yiru Li, Sam S. Chen, Peilin Zhou, Junling Liu, Yining Hua, Chengfeng Mao, Chenyu You, Xian Wu, Yefeng Zheng, Lei Clifton, Zheng Li, Jiebo Luo, David A. Clifton

Therefore, this review aims to provide a detailed overview of the development and deployment of LLMs in medicine, including the challenges and opportunities they face.

Deceptive Fairness Attacks on Graphs via Meta Learning

1 code implementation24 Oct 2023 Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong

We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively?

Adversarial Robustness Fairness +3

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

no code implementations11 Oct 2023 Jie An, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo

We hope our proposed framework, benchmark, and LMM evaluation could help establish the intriguing interleaved image-text generation task.

Question Answering Text Generation

Understanding Divergent Framing of the Supreme Court Controversies: Social Media vs. News Outlets

no code implementations18 Sep 2023 Jinsheng Pan, Zichen Wang, Weihong Qi, Hanjia Lyu, Jiebo Luo

Understanding the framing of political issues is of paramount importance as it significantly shapes how individuals perceive, interpret, and engage with these matters.

Decision Making

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

1 code implementation17 Aug 2023 Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang

However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline.

Image Segmentation Segmentation +1

Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation

1 code implementation14 Aug 2023 Alexander Martin, Haitian Zheng, Jie An, Jiebo Luo

In this work, we use text-guided latent diffusion models for zero-shot image-to-image translation (I2I) across large domain gaps (longI2I), where large amounts of new visual features and new geometry need to be generated to enter the target domain.

Image-to-Image Translation

User-Controllable Recommendation via Counterfactual Retrospective and Prospective Explanations

1 code implementation2 Aug 2023 Juntao Tan, Yingqiang Ge, Yan Zhu, Yinglong Xia, Jiebo Luo, Jianchao Ji, Yongfeng Zhang

Acknowledging the recent advancements in explainable recommender systems that enhance users' understanding of recommendation mechanisms, we propose leveraging these advancements to improve user controllability.

counterfactual Counterfactual Reasoning +1

MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text

no code implementations31 Jul 2023 Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo

In the basic generation, we take advantage of the pretrained image diffusion model, and adapt it to a high-quality open-domain vertical video generator for mobile devices.

Video Generation

LLM-Rec: Personalized Recommendation via Prompting Large Language Models

no code implementations24 Jul 2023 Hanjia Lyu, Song Jiang, Hanqing Zeng, Yinglong Xia, Qifan Wang, Si Zhang, Ren Chen, Christopher Leung, Jiajie Tang, Jiebo Luo

Notably, the success of LLM-Rec lies in its prompting strategies, which effectively tap into the language model's comprehension of both general and specific item characteristics.

Domain-Scalable Unpaired Image Translation via Latent Space Anchoring

1 code implementation26 Jun 2023 Siyu Huang, Jie An, Donglai Wei, Zudi Lin, Jiebo Luo, Hanspeter Pfister

However, given a UNIT model trained on certain domains, it is difficult for current methods to incorporate new domains because they often need to train the full model on both existing and new domains.

Image-to-Image Translation Translation

Improving Video Colorization by Test-Time Tuning

1 code implementation25 Jun 2023 Yaping Zhao, Haitian Zheng, Jiebo Luo, Edmund Y. Lam

With the advancements in deep learning, video colorization by propagating color information from a colorized reference frame to a monochrome video sequence has been well explored.

Colorization

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA

no code implementations31 May 2023 Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo

In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.

counterfactual Counterfactual Inference +2

Learning to Evaluate the Artness of AI-generated Images

no code implementations8 May 2023 Junyu Chen, Jie An, Hanjia Lyu, Christopher Kanan, Jiebo Luo

Assessing the artness of AI-generated images continues to be a challenge within the realm of image generation.

Image Generation

Meta-causal Learning for Single Domain Generalization

no code implementations CVPR 2023 Jin Chen, Zhi Gao, Xinxiao wu, Jiebo Luo

Under this paradigm, we propose a meta-causal learning method to learn meta-knowledge, that is, how to infer the causes of domain shift between the auxiliary and source domains during training.

counterfactual Counterfactual Inference +2

Bias or Diversity? Unraveling Fine-Grained Thematic Discrepancy in U.S. News Headlines

no code implementations28 Mar 2023 Jinsheng Pan, Weihong Qi, Zichen Wang, Hanjia Lyu, Jiebo Luo

There is a broad consensus that news media outlets incorporate ideological biases in their news articles.

Diversity

Predicting Adverse Neonatal Outcomes for Preterm Neonates with Multi-Task Learning

no code implementations28 Mar 2023 Jingyang Lin, Junyu Chen, Hanjia Lyu, Igor Khodak, Divya Chhabra, Colby L Day Richardson, Irina Prelipcean, Andrew M Dylag, Jiebo Luo

In this work, we first analyze the correlations between three adverse neonatal outcomes and then formulate the diagnosis of multiple neonatal outcomes as a multi-task learning (MTL) problem.

Feature Importance Multi-Task Learning

Grounding 3D Object Affordance from 2D Interactions in Images

1 code implementation ICCV 2023 Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Jiebo Luo, Zheng-Jun Zha

Comprehensive experiments on PIAD demonstrate the reliability of the proposed task and the superiority of our method.

Object

Spatial-Aware Token for Weakly Supervised Object Localization

1 code implementation ICCV 2023 Pingyu Wu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha

Specifically, a spatial token is first introduced in the input space to aggregate representations for localization task.

Object Weakly-Supervised Object Localization

SegPrompt: Using Segmentation Map as a Better Prompt to Finetune Deep Models for Kidney Stone Classification

no code implementations15 Mar 2023 Wei Zhu, Runtao Zhou, Yao Yuan, Campbell Timothy, Rajat Jain, Jiebo Luo

However, the shortage of annotated training data poses a severe problem in improving the performance and generalization ability of the trained model.

Classification Segmentation

Adaptive Siamese Tracking with a Compact Latent Network

no code implementations2 Feb 2023 Xingping Dong, Jianbing Shen, Fatih Porikli, Jiebo Luo, Ling Shao

Under this viewing, we perform an in-depth analysis for them through visual simulations and real tracking examples, and find that the failure cases in some challenging situations can be regarded as the issue of missing decisive samples in offline training.

Computational Assessment of Hyperpartisanship in News Titles

1 code implementation16 Jan 2023 Hanjia Lyu, Jinsheng Pan, Zichen Wang, Jiebo Luo

We first adopt a human-guided machine learning framework to develop a new dataset for hyperpartisan news title detection with 2, 200 manually labeled and 1. 8 million machine-labeled titles that were posted from 2014 to the present by nine representative media organizations across three media bias groups - Left, Central, and Right in an active learning manner.

Active Learning Language Modelling

PromptCap: Prompt-Guided Image Captioning for VQA with GPT-3

no code implementations ICCV 2023 Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo

PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60. 4% on OK-VQA and 59. 6% on A-OKVQA).

Image Captioning Question Answering +3

QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity

1 code implementation CVPR 2023 Siyu Huang, Jie An, Donglai Wei, Jiebo Luo, Hanspeter Pfister

The mechanism of existing style transfer algorithms is by minimizing a hybrid loss function to push the generated image toward high similarities in both content and style.

Quantization Style Transfer +1

A Unified Framework for Contrastive Learning from a Perspective of Affinity Matrix

no code implementations26 Nov 2022 Wenbin Li, Meihao Kong, Xuesong Yang, Lei Wang, Jing Huo, Yang Gao, Jiebo Luo

In this study, we present a new unified contrastive learning representation framework (named UniCLR) suitable for all the above four kinds of methods from a novel perspective of basic affinity matrix.

Contrastive Learning Representation Learning

Holistic Visual-Textual Sentiment Analysis with Prior Models

1 code implementation23 Nov 2022 Junyu Chen, Jie An, Hanjia Lyu, Christopher Kanan, Jiebo Luo

Visual-textual sentiment analysis aims to predict sentiment with the input of a pair of image and text, which poses a challenge in learning effective features for diverse input images.

Sentiment Analysis

Stare at What You See: Masked Image Modeling without Reconstruction

no code implementations CVPR 2023 Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo

However, unlike the low-level features such as pixel values, we argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image. This raises one question: is reconstruction necessary in Masked Image Modeling (MIM) with a teacher model?

PromptCap: Prompt-Guided Task-Aware Image Captioning

1 code implementation15 Nov 2022 Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A Smith, Jiebo Luo

PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60. 4% on OK-VQA and 59. 6% on A-OKVQA).

Image Captioning Language Modelling +5

FeDXL: Provable Federated Learning for Deep X-Risk Optimization

1 code implementation26 Oct 2022 Zhishuai Guo, Rong Jin, Jiebo Luo, Tianbao Yang

To this end, we propose an active-passive decomposition framework that decouples the gradient's components with two types, namely active parts and passive parts, where the active parts depend on local data that are computed with the local model and the passive parts depend on other machines that are communicated/computed based on historical models and samples.

Federated Learning

Learning a Grammar Inducer from Massive Uncurated Instructional Videos

1 code implementation22 Oct 2022 Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, Jiebo Luo

While previous work focuses on building systems for inducing grammars on text that are well-aligned with video content, we investigate the scenario, in which text and video are only in loose correspondence.

Language Acquisition Video Alignment

TLDW: Extreme Multimodal Summarisation of News Videos

no code implementations16 Oct 2022 Peggy Tang, Kun Hu, Lei Zhang, Jiebo Luo, Zhiyong Wang

Multimodal summarisation with multimodal output is drawing increasing attention due to the rapid growth of multimedia data.

Sentence

Contextual Modeling for 3D Dense Captioning on Point Clouds

no code implementations8 Oct 2022 Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma

With such global and local contextual modeling strategies, our proposed model can effectively characterize the object representations and contextual information and thereby generate comprehensive and detailed descriptions of the located objects.

3D dense captioning Dense Captioning +2

Causal Inference via Nonlinear Variable Decorrelation for Healthcare Applications

no code implementations29 Sep 2022 Junda Wang, Weijian Li, Han Wang, Hanjia Lyu, Caroline Thirukumaran, Addisu Mesfin, Jiebo Luo

Causal inference and model interpretability research are gaining increasing attention, especially in the domains of healthcare and bioinformatics.

Causal Inference

Bi-Calibration Networks for Weakly-Supervised Video Representation Learning

1 code implementation21 Jun 2022 Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei

The video-to-text/video-to-query projections over text prototypes/query vocabulary then start the text-to-query or query-to-text calibration to estimate the amendment to query or text.

Representation Learning

Stand-Alone Inter-Frame Attention in Video Models

1 code implementation CVPR 2022 Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Jiebo Luo, Tao Mei

In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location.

Action Classification Action Recognition +1

Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization

no code implementations12 Jun 2022 Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo

The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing.

Domain Generalization Language Modeling +4

Automatic Relation-aware Graph Network Proliferation

1 code implementation CVPR 2022 Shaofei Cai, Liang Li, Xinzhe Han, Jiebo Luo, Zheng-Jun Zha, Qingming Huang

However, the currently used graph search space overemphasizes learning node features and neglects mining hierarchical relational information.

Graph Classification Graph Learning +5

Localized Adversarial Domain Generalization

1 code implementation CVPR 2022 Wei Zhu, Le Lu, Jing Xiao, Mei Han, Jiebo Luo, Adam P. Harrison

Adversarial domain generalization is a popular approach to DG, but conventional approaches (1) struggle to sufficiently align features so that local neighborhoods are mixed across domains; and (2) can suffer from feature space over collapse which can threaten generalization performance.

Domain Generalization

Deep Federated Anomaly Detection for Multivariate Time Series Data

no code implementations9 May 2022 Wei Zhu, Dongjin Song, Yuncong Chen, Wei Cheng, Bo Zong, Takehiko Mizoguchi, Cristian Lumezanu, Haifeng Chen, Jiebo Luo

Specifically, we first design an Exemplar-based Deep Neural network (ExDNN) to learn local time series representations based on their compatibility with an exemplar module which consists of hidden parameters learned to capture varieties of normal patterns on each edge device.

Constrained Clustering Federated Learning +3

Explainable Fairness in Recommendation

no code implementations24 Apr 2022 Yingqiang Ge, Juntao Tan, Yan Zhu, Yinglong Xia, Jiebo Luo, Shuchang Liu, Zuohui Fu, Shijie Geng, Zelong Li, Yongfeng Zhang

In this paper, we study the problem of explainable fairness, which helps to gain insights about why a system is fair or unfair, and guides the design of fair recommender systems with a more informed and unified methodology.

counterfactual Fairness +1

CM-GAN: Image Inpainting with Cascaded Modulation GAN and Object-Aware Training

1 code implementation22 Mar 2022 Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Ning Xu, Sohrab Amirghodsi, Jiebo Luo

We propose cascaded modulation GAN (CM-GAN), a new network design consisting of an encoder with Fourier convolution blocks that extract multi-scale feature representations from the input image with holes and a dual-stream decoder with a novel cascaded global-spatial modulation block at each scale level.

Decoder Image Inpainting

Breast Cancer Induced Bone Osteolysis Prediction Using Temporal Variational Auto-Encoders

no code implementations20 Mar 2022 Wei Xiong, Neil Yeung, Shubo Wang, Haofu Liao, Liyun Wang, Jiebo Luo

Its ability of predicting the development of bone lesions in cancer-invading bones can assist in assessing the risk of impending fractures and choosing proper treatments in breast cancer bone metastasis.

Computed Tomography (CT) Deep Learning

RawlsGCN: Towards Rawlsian Difference Principle on Graph Convolutional Network

no code implementations28 Feb 2022 Jian Kang, Yan Zhu, Yinglong Xia, Jiebo Luo, Hanghang Tong

Graph Convolutional Network (GCN) plays pivotal roles in many real-world applications.

Point Cloud Denoising via Momentum Ascent in Gradient Fields

1 code implementation21 Feb 2022 Yaping Zhao, Haitian Zheng, Zhongrui Wang, Jiebo Luo, Edmund Y. Lam

To achieve point cloud denoising, traditional methods heavily rely on geometric priors, and most learning-based approaches suffer from outliers and loss of details.

Denoising Position

MANet: Improving Video Denoising with a Multi-Alignment Network

1 code implementation20 Feb 2022 Yaping Zhao, Haitian Zheng, Zhongrui Wang, Jiebo Luo, Edmund Y. Lam

In video denoising, the adjacent frames often provide very useful information, but accurate alignment is needed before such information can be harnassed.

Denoising Video Denoising

Cross-modal Contrastive Distillation for Instructional Activity Anticipation

no code implementations18 Jan 2022 Zhengyuan Yang, Jingen Liu, Jing Huang, Xiaodong He, Tao Mei, Chenliang Xu, Jiebo Luo

In this study, we aim to predict the plausible future action steps given an observation of the past and study the task of instructional activity anticipation.

Knowledge Distillation

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Color Editing

no code implementations CVPR 2022 Jing Shi, Ning Xu, Haitian Zheng, Alex Smith, Jiebo Luo, Chenliang Xu

Recently, large pretrained models (e. g., BERT, StyleGAN, CLIP) show great knowledge transfer and generalization capability on various downstream tasks within their domains.

Decoder Image-to-Image Translation +2

Multi-modal Dependency Tree for Video Captioning

no code implementations NeurIPS 2021 Wentian Zhao, Xinxiao wu, Jiebo Luo

To this end, we propose a novel video captioning method that generates a sentence by first constructing a multi-modal dependency tree and then traversing the constructed tree, where the syntactic structure and semantic relationship in the sentence are represented by the tree topology.

Caption Generation Dependency Parsing +3

SpaceEdit: Learning a Unified Editing Space for Open-Domain Image Editing

no code implementations30 Nov 2021 Jing Shi, Ning Xu, Haitian Zheng, Alex Smith, Jiebo Luo, Chenliang Xu

Recently, large pretrained models (e. g., BERT, StyleGAN, CLIP) have shown great knowledge transfer and generalization capability on various downstream tasks within their domains.

Decoder Image-to-Image Translation +2

Music Sentiment Transfer

1 code implementation12 Oct 2021 Miles Sigel, Michael Zhou, Jiebo Luo

Results and literature suggest that the task of music sentiment transfer is more difficult than image sentiment transfer because of the temporal characteristics of music and lack of existing datasets.

Style Transfer

Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning

no code implementations ICCV 2021 Jing Bi, Jiebo Luo, Chenliang Xu

In this work, we leverage instructional videos to study humans' decision-making processes, focusing on learning a model to plan goal-directed actions in real-life videos.

Action Recognition Bayesian Inference +1

CoSeg: Cognitively Inspired Unsupervised Generic Event Segmentation

1 code implementation30 Sep 2021 Xiao Wang, Jingen Liu, Tao Mei, Jiebo Luo

Unlike the mainstream clustering-based methods, our framework exploits a transformer-based feature reconstruction scheme to detect event boundary by reconstruction errors.

Boundary Detection Event Segmentation +1

Federated Learning of Molecular Properties with Graph Neural Networks in a Heterogeneous Setting

no code implementations15 Sep 2021 Wei Zhu, Jiebo Luo, Andrew White

FLIT(+) can align the local training across heterogeneous clients by improving the performance for uncertain samples.

Federated Learning

Learning to Aggregate and Refine Noisy Labels for Visual Sentiment Analysis

no code implementations15 Sep 2021 Wei Zhu, Zihe Zheng, Haitian Zheng, Hanjia Lyu, Jiebo Luo

The learned prototypes and their labels can be regarded as denoising features and labels for the local regions and can guide the training process to prevent the model from overfitting the noisy cases.

Denoising Learning with noisy labels +1

LibFewShot: A Comprehensive Library for Few-shot Learning

2 code implementations10 Sep 2021 Wenbin Li, Ziyi, Wang, Xuesong Yang, Chuanqi Dong, Pinzhuo Tian, Tiexin Qin, Jing Huo, Yinghuan Shi, Lei Wang, Yang Gao, Jiebo Luo

Furthermore, based on LibFewShot, we provide comprehensive evaluations on multiple benchmarks with various backbone architectures to evaluate common pitfalls and effects of different training tricks.

Data Augmentation Few-Shot Image Classification +2

Learning Fine-Grained Motion Embedding for Landscape Animation

no code implementations6 Sep 2021 Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this problem, we propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding for Landscape Animation.

Motion Generation

Multi-Modulation Network for Audio-Visual Event Localization

no code implementations26 Aug 2021 Hao Wang, Zheng-Jun Zha, Liang Li, Xuejin Chen, Jiebo Luo

We propose a novel MultiModulation Network (M2N) to learn the above correlation and leverage it as semantic guidance to modulate the related auditory, visual, and fused features.

audio-visual event localization

UniFaceGAN: A Unified Framework for Temporally Consistent Facial Video Editing

no code implementations12 Aug 2021 Meng Cao, HaoZhi Huang, Hao Wang, Xuan Wang, Li Shen, Sheng Wang, Linchao Bao, Zhifeng Li, Jiebo Luo

Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.

3D Reconstruction Face Reenactment +3

Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph

no code implementations26 Jul 2021 Wentian Zhao, Yao Hu, HeDa Wang, Xinxiao wu, Jiebo Luo

Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article.

Graph Attention Image Captioning +1

Adaptive Recursive Circle Framework for Fine-grained Action Recognition

no code implementations25 Jul 2021 Hanxi Lin, Xinxiao wu, Jiebo Luo

It inherits the operators and parameters of the original layer but is slightly different in the use of those operators and parameters.

ARC Fine-grained Action Recognition

Trip-ROMA: Self-Supervised Learning with Triplets and Random Mappings

1 code implementation22 Jul 2021 Wenbin Li, Xuesong Yang, Meihao Kong, Lei Wang, Jing Huo, Yang Gao, Jiebo Luo

However, in small data regimes, we can not obtain a sufficient number of negative pairs or effectively avoid the over-fitting problem when negatives are not used at all.

Representation Learning Self-Supervised Learning +2

Structured Multi-Level Interaction Network for Video Moment Localization via Language Query

no code implementations CVPR 2021 Hao Wang, Zheng-Jun Zha, Liang Li, Dong Liu, Jiebo Luo

In particular, for cross-modal interaction, we interact the sentence-level query with the whole moment while interact the word-level query with content and boundary, as in a coarse-to-fine manner.

Sentence

Improving OCR-Based Image Captioning by Incorporating Geometrical Relationship

no code implementations CVPR 2021 Jing Wang, Jinhui Tang, Mingkun Yang, Xiang Bai, Jiebo Luo

Under the guidance of the geometrical relationship between OCR tokens, our LSTM-R capitalizes on a newly-devised relation-aware pointer network to select OCR tokens from the scene text for OCR-based image captioning.

Image Captioning Optical Character Recognition (OCR) +1

How COVID-19 Has Changed Crowdfunding: Evidence From GoFundMe

no code implementations18 Jun 2021 Junda Wang, Xupin Zhang, Jiebo Luo

More importantly, sentiment analysis and the paired sample t-test are performed to examine the differences in crowdfunding campaigns before and after the COVID-19 outbreak that started in March 2020.

counterfactual Sentiment Analysis

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

1 code implementation ICCV 2021 Zhengyuan Yang, Songyang Zhang, LiWei Wang, Jiebo Luo

3D visual grounding aims at grounding a natural language description about a 3D scene, usually represented in the form of 3D point clouds, to the targeted object region.

3D visual grounding Object +1

Few-shot Partial Multi-view Learning

no code implementations5 May 2021 Yuan Zhou, Yanrong Guo, Shijie Hao, Richang Hong, Jiebo Luo

The challenges of this task are twofold: (i) it is difficult to overcome the impact of data scarcity under the interference of missing views; (ii) the limited number of data exacerbates information scarcity, thus making it harder to address the view-missing issue in turn.

Few-Shot Learning MULTI-VIEW LEARNING

Video-aided Unsupervised Grammar Induction

1 code implementation NAACL 2021 Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, Jiebo Luo

We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video.

Optical Character Recognition (OCR)

40