no code implementations • ECCV 2020 • Haitian Zheng, Haofu Liao, Lele Chen, Wei Xiong, Tianlang Chen, Jiebo Luo
Example-guided image synthesis has recently been attempted to synthesize an image from a semantic label map and an exemplary image.
no code implementations • 27 Jan 2025 • Jingyuan Chen, Yuan YAO, Mie Anderson, Natalie Hauglund, Celia Kjaerby, Verena Untiet, Maiken Nedergaard, Jiebo Luo
For single-channel inference, our model also outperforms the transformer-based models trained with single-channel signals.
no code implementations • 20 Jan 2025 • Yawen Zheng, Hanjia Lyu, Jiebo Luo
Emojis have become a universal language in online communication, often carrying nuanced and context-dependent meanings.
no code implementations • 15 Jan 2025 • Jingyuan Chen, Fuchen Long, Jie An, Zhaofan Qiu, Ting Yao, Jiebo Luo, Tao Mei
Extensive experiments of long video generation on the VBench benchmark demonstrate the superiority of our Ouroboros-Diffusion, particularly in terms of subject consistency, motion smoothness, and temporal consistency.
no code implementations • 20 Dec 2024 • Jiadong Pan, Hongcheng Gao, Liang Li, Zheng-Jun Zha, Qingming Huang, Jiebo Luo
Experimental results show that by incorporating HGR, images generated by diffusion models achieve both high quality and strong safety, and safe DMs trained through unsupervised methods according to the harmfulness detected by HGR also exhibit good safety performance.
no code implementations • 11 Dec 2024 • Yayun Qi, Hongxi Li, Yiqi Song, Xinxiao wu, Jiebo Luo
The exploration of various vision-language tasks, such as visual captioning, visual question answering, and visual commonsense reasoning, is an important area in artificial intelligence and continuously attracts the research community's attention.
no code implementations • 8 Dec 2024 • Zhenghong Zhou, Jie An, Jiebo Luo
Precise camera pose control is crucial for video generation with diffusion models.
no code implementations • 3 Dec 2024 • Junda Wu, Hanjia Lyu, Yu Xia, Zhehao Zhang, Joe Barrow, Ishita Kumar, Mehrnoosh Mirtaheri, Hongjie Chen, Ryan A. Rossi, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Jiuxiang Gu, Nesreen K. Ahmed, Yu Wang, Xiang Chen, Hanieh Deilamsalehy, Namyong Park, Sungchul Kim, Huanrui Yang, Subrata Mitra, Zhengmian Hu, Nedim Lipka, Dang Nguyen, Yue Zhao, Jiebo Luo, Julian McAuley
We propose an intuitive taxonomy for categorizing the techniques used to personalize MLLMs to individual users, and discuss the techniques accordingly.
1 code implementation • 26 Nov 2024 • Shenghai Yuan, Jinfa Huang, Xianyi He, Yunyuan Ge, Yujun Shi, Liuhan Chen, Jiebo Luo, Li Yuan
We propose a hierarchical training strategy to leverage frequency information for identity preservation, transforming a vanilla pre-trained video generation model into an IPT2V model.
no code implementations • 23 Nov 2024 • Hang Hua, Qing Liu, Lingzhi Zhang, Jing Shi, Zhifei Zhang, Yilin Wang, Jianming Zhang, Jiebo Luo
To support this endeavor, we introduce COMPOSITIONCAP, a new dataset for multi-grained region compositional image captioning, which introduces the task of compositional attribute-aware regional image captioning.
1 code implementation • 20 Nov 2024 • Yongdong Luo, Xiawu Zheng, Xiao Yang, Guilin Li, Haojia Lin, Jinfa Huang, Jiayi Ji, Fei Chao, Jiebo Luo, Rongrong Ji
Existing large video-language models (LVLMs) struggle to comprehend long videos correctly due to limited context.
no code implementations • 18 Nov 2024 • Taowen Wang, Dongfang Liu, James Chenhao Liang, Wenhao Yang, Qifan Wang, Cheng Han, Jiebo Luo, Ruixiang Tang
Recently in robotics, Vision-Language-Action (VLA) models have emerged as a transformative approach, enabling robots to execute complex tasks by integrating visual and linguistic inputs within an end-to-end learning framework.
no code implementations • 10 Nov 2024 • Junda Wang, Weijian Li, Han Wang, Hanjia Lyu, Caroline P. Thirukumaran, Addisu Mesfin, Hong Yu, Jiebo Luo
On the ICD code prediction tasks, it achieved AUC Macro scores of 92. 8 on MIMIC-III and 96. 7 on MIMIC-IV, outperforming the state-of-the-art models KEPT and MSMN.
1 code implementation • 8 Nov 2024 • Jing Xiong, Gongye Liu, Lun Huang, Chengyue Wu, Taiqiang Wu, Yao Mu, Yuan YAO, Hui Shen, Zhongwei Wan, Jinfa Huang, Chaofan Tao, Shen Yan, Huaxiu Yao, Lingpeng Kong, Hongxia Yang, Mi Zhang, Guillermo Sapiro, Jiebo Luo, Ping Luo, Ngai Wong
Autoregressive modeling has been a huge success in the field of natural language processing (NLP).
1 code implementation • 28 Oct 2024 • Jie An, De Wang, Pengsheng Guo, Jiebo Luo, Alexander Schwing
Furthermore, we empirically find that both the placement and the effective attention size of these local attention windows are crucial factors.
no code implementations • 28 Oct 2024 • Xinnong Zhang, Jiayu Lin, Libo Sun, Weihong Qi, Yihang Yang, Yue Chen, Hanjia Lyu, Xinyi Mou, Siming Chen, Jiebo Luo, Xuanjing Huang, Shiping Tang, Zhongyu Wei
We also introduce PPE, a poll-based presidential election benchmark to assess the performance of our framework under the U. S. presidential election scenario.
no code implementations • 13 Oct 2024 • Hang Hua, Yunlong Tang, Ziyun Zeng, Liangliang Cao, Zhengyuan Yang, Hangfeng He, Chenliang Xu, Jiebo Luo
With MMCOMPOSITION, we can quantify and explore the compositionality of the mainstream VLMs.
1 code implementation • 7 Oct 2024 • Daoan Zhang, Guangchen Lan, Dong-Jun Han, Wenlin Yao, Xiaoman Pan, Hongming Zhang, Mingxiao Li, Pengcheng Chen, Yu Dong, Christopher Brinton, Jiebo Luo
To address the limitations of both on- and off-policy RLHF, we propose a preference optimization method that aligns DMs with preferences without relying on reward models or paired human-annotated data.
1 code implementation • 23 Sep 2024 • Zixuan Wang, Jiayi Li, Xiaoyu Qin, Shikun Sun, Songtao Zhou, Jia Jia, Jiebo Luo
Unlike human movements, which are always continuous, dance camera movements involve both continuous sequences of variable lengths and sudden drastic changes to simulate the switching of multiple cameras.
no code implementations • 19 Sep 2024 • Yongqi Wang, Shuo Yang, Xinxiao wu, Jiebo Luo
To address this challenge, we propose to unify object trajectory detection and relationship classification into an end-to-end open-vocabulary framework.
1 code implementation • 16 Sep 2024 • Zhongyi Qiu, Kangyi Qiu, Hanjia Lyu, Wei Xiong, Jiebo Luo
Existing emoji recommendation methods are primarily evaluated based on their ability to match the exact emoji a user chooses in the original text.
1 code implementation • 12 Sep 2024 • Qingqiao Hu, Daoan Zhang, Jiebo Luo, Zhenyu Gong, Benedikt Wiestler, JianGuo Zhang, Hongwei Bran Li
Learning meaningful and interpretable representations from high-dimensional volumetric magnetic resonance (MR) images is essential for advancing personalized medicine.
no code implementations • 27 Aug 2024 • Hanjia Lyu, Ryan Rossi, Xiang Chen, Md Mehrab Tanjim, Stefano Petrangeli, Somdeb Sarkhel, Jiebo Luo
Large Language Models (LLMs) and Large Multimodal Models (LMMs) have been shown to enhance the effectiveness of enriching item descriptions, thereby improving the accuracy of recommendation systems.
no code implementations • 7 Aug 2024 • Hanjia Lyu, Hanqing Zeng, Yinglong Xia, Ren Chen, Jiebo Luo
In this paper, we address these challenges by introducing an intermediate "interest" layer between users and items.
2 code implementations • 30 Jul 2024 • Jinfa Huang, Jinsheng Pan, Zhongwei Wan, Hanjia Lyu, Jiebo Luo
To this end, we propose Evolver, which incorporates LMMs via Chain-of-Evolution (CoE) Prompting, by integrating the evolution attribute and in-context information of memes.
1 code implementation • 24 Jul 2024 • Yanqi Bao, Tianyu Ding, Jing Huo, Yaoli Liu, Yuxin Li, Wenbin Li, Yang Gao, Jiebo Luo
3D Gaussian Splatting (3DGS) has emerged as a prominent technique with the potential to become a mainstream method for 3D representations.
no code implementations • 20 Jul 2024 • Beichen Zhang, Liang Li, Zheng-Jun Zha, Jiebo Luo, Qingming Huang
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
no code implementations • 16 Jul 2024 • Weihong Qi, Hanjia Lyu, Jiebo Luo
This study seeks to identify and quantify biases in simulating political samples with Large Language Models, specifically focusing on vote choice and public opinion.
2 code implementations • 26 Jun 2024 • Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan
We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to evaluate the temporal and metamorphic capabilities of the T2V models (e. g. Sora and Lumiere) in time-lapse video generation.
no code implementations • 13 Jun 2024 • Yue Xu, Kaizhi Yang, Jiebo Luo, Xuejin Chen
3D visual grounding is an emerging research area dedicated to making connections between the 3D physical world and natural language, which is crucial for achieving embodied intelligence.
1 code implementation • 13 Jun 2024 • Chenwei Lin, Hanjia Lyu, Xian Xu, Jiebo Luo
There is no systematic review of multimodal tasks in the insurance domain, nor a benchmark specifically designed to evaluate the capabilities of LVLMs in insurance.
1 code implementation • 27 May 2024 • Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo
To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks.
1 code implementation • 24 May 2024 • Yongsheng Yu, Jiebo Luo
Conventional demographic inference methods have predominantly operated under the supervision of accurately labeled data, yet struggle to adapt to shifting social landscapes and diverse cultural contexts, leading to narrow specialization and limited accuracy in applications.
no code implementations • 17 May 2024 • Bo Wu, Peiye Liu, Wen-Huang Cheng, Bei Liu, Zhaoyang Zeng, Jia Wang, Qiushi Huang, Jiebo Luo
The research progress analysis provides an overall analysis of the solutions and trends in recent years.
1 code implementation • 23 Apr 2024 • Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu Jin, Jiebo Luo, Yongfeng Zhang
This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time.
no code implementations • 23 Apr 2024 • Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo
To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction.
no code implementations • 23 Apr 2024 • Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang
In this paper, we extend the scope of Med-VLP to encompass 3D images, specifically targeting full-body scenarios, by using a multimodal dataset of CT images and reports.
no code implementations • 22 Apr 2024 • Yuying Li, Zeyan Liu, Junyi Zhao, Liangqin Ren, Fengjun Li, Jiebo Luo, Bo Luo
In a benchmarking study, we further evaluate if state-of-the-art open-source and commercial AI image detectors can effectively identify the images in the ARIA dataset.
no code implementations • 18 Apr 2024 • Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo
Recent efforts have been made to expand from unimodal to multimodal video summarization, categorizing the task into three sub-tasks based on the summary's modality: video-to-video (V2V), video-to-text (V2T), and a combination of video and text summarization (V2VT).
no code implementations • 15 Apr 2024 • Chenwei Lin, Hanjia Lyu, Jiebo Luo, Xian Xu
The emergence of Large Multimodal Models (LMMs) marks a significant milestone in the development of artificial intelligence.
2 code implementations • 7 Apr 2024 • Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo
Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.
no code implementations • CVPR 2024 • Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei
Technically, SATeCo freezes all the parameters of the pre-trained UNet and VAE, and only optimizes two deliberately-designed spatial feature adaptation (SFA) and temporal feature alignment (TFA) modules, in the decoder of UNet and VAE.
Ranked #9 on
Video Super-Resolution
on Vid4 - 4x upscaling
1 code implementation • CVPR 2024 • Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo
However, camera movement synthesis with music and dance remains an unsolved challenging problem due to the scarcity of paired data.
no code implementations • 20 Feb 2024 • Xinnong Zhang, Haoyu Kuang, Xinyi Mou, Hanjia Lyu, Kun Wu, Siming Chen, Jiebo Luo, Xuanjing Huang, Zhongyu Wei
The powerful Large Vision Language Models make it possible to handle a variety of tasks simultaneously, but even with carefully designed prompting methods, the general domain models often fall short in aligning with the unique speaking style and context of social media tasks.
1 code implementation • 1 Feb 2024 • Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu
Existing methods like Neural Radiation Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made significant strides in facial attribute control such as facial animation and components editing, yet they struggle with fine-grained representation and scalability in dynamic head modeling.
2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jinfa Huang, Junwu Zhang, Yatian Pang, Peng Jin, Munan Ning, Jiebo Luo, Li Yuan
In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs.
Ranked #146 on
Visual Question Answering
on MM-Vet
1 code implementation • 28 Jan 2024 • Shaofeng Zhang, Jinfa Huang, Qiang Zhou, Zhibin Wang, Fan Wang, Jiebo Luo, Junchi Yan
At inference, we generate images with arbitrary expansion multiples by inputting an anchor image and its corresponding positional embeddings.
1 code implementation • 16 Jan 2024 • Hanjia Lyu, Weihong Qi, Zhongyu Wei, Jiebo Luo
Leveraging Large Multimodal Models (LMMs) to simulate human behaviors when processing multimodal information, especially in the context of social media, has garnered immense interest due to its broad potential and far-reaching implications.
1 code implementation • 5 Jan 2024 • Daoan Zhang, Junming Yang, Hanjia Lyu, Zijian Jin, Yuan YAO, Mingkai Chen, Jiebo Luo
When exploring the development of Artificial General Intelligence (AGI), a critical task for these models involves interpreting and processing information from multiple image inputs.
Ranked #3 on
Visual Reasoning
on Winoground
no code implementations • 4 Jan 2024 • Jie An, Zhengyuan Yang, JianFeng Wang, Linjie Li, Zicheng Liu, Lijuan Wang, Jiebo Luo
The first module, similar to a standard DDPM, learns to predict the added noise and is unaffected by the metric function.
1 code implementation • 29 Dec 2023 • Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Pinxin Liu, Mingqian Feng, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu
With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.
2 code implementations • 22 Dec 2023 • Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, ZongYuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang
Specifically, we achieve this by proposing (1) Collaborative Prompts that describe instrument structures via collaborating category-level and part-level texts; (2) Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) Part-to-Whole Adaptive Fusion and Hierarchical Decoding that adaptively fuse the part-level representations into a whole for accurate instrument segmentation in surgical scenarios.
no code implementations • 13 Dec 2023 • WenTing Chen, Linlin Shen, Jingyang Lin, Jiebo Luo, Xiang Li, Yixuan Yuan
To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports and apply it to CXR-report generation to provide explainability for the generation process.
1 code implementation • 13 Nov 2023 • Hanjia Lyu, Jinfa Huang, Daoan Zhang, Yongsheng Yu, Xinyi Mou, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo
Our investigation begins with a preliminary quantitative analysis for each task using existing benchmark datasets, followed by a careful review of the results and a selection of qualitative samples that illustrate GPT-4V's potential in understanding multimodal social media content.
1 code implementation • 9 Nov 2023 • Hanqing Zeng, Hanjia Lyu, Diyi Hu, Yinglong Xia, Jiebo Luo
We propose to decouple the two modalities by Mixture of weak and strong experts (Mowst), where the weak expert is a light-weight Multi-layer Perceptron (MLP), and the strong expert is an off-the-shelf GNN.
1 code implementation • 9 Nov 2023 • Hongjian Zhou, Fenglin Liu, Boyang Gu, Xinyu Zou, Jinfa Huang, Jinge Wu, Yiru Li, Sam S. Chen, Peilin Zhou, Junling Liu, Yining Hua, Chengfeng Mao, Chenyu You, Xian Wu, Yefeng Zheng, Lei Clifton, Zheng Li, Jiebo Luo, David A. Clifton
Therefore, this review aims to provide a detailed overview of the development and deployment of LLMs in medicine, including the challenges and opportunities they face.
1 code implementation • 24 Oct 2023 • Jian Kang, Yinglong Xia, Ross Maciejewski, Jiebo Luo, Hanghang Tong
We study deceptive fairness attacks on graphs to answer the following question: How can we achieve poisoning attacks on a graph learning model to exacerbate the bias deceptively?
no code implementations • 11 Oct 2023 • Jie An, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo
We hope our proposed framework, benchmark, and LMM evaluation could help establish the intriguing interleaved image-text generation task.
no code implementations • 18 Sep 2023 • Jinsheng Pan, Zichen Wang, Weihong Qi, Hanjia Lyu, Jiebo Luo
Understanding the framing of political issues is of paramount importance as it significantly shapes how individuals perceive, interpret, and engage with these matters.
1 code implementation • 17 Aug 2023 • Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang
However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline.
1 code implementation • 14 Aug 2023 • Alexander Martin, Haitian Zheng, Jie An, Jiebo Luo
In this work, we use text-guided latent diffusion models for zero-shot image-to-image translation (I2I) across large domain gaps (longI2I), where large amounts of new visual features and new geometry need to be generated to enter the target domain.
1 code implementation • 2 Aug 2023 • Juntao Tan, Yingqiang Ge, Yan Zhu, Yinglong Xia, Jiebo Luo, Jianchao Ji, Yongfeng Zhang
Acknowledging the recent advancements in explainable recommender systems that enhance users' understanding of recommendation mechanisms, we propose leveraging these advancements to improve user controllability.
no code implementations • 31 Jul 2023 • Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo
In the basic generation, we take advantage of the pretrained image diffusion model, and adapt it to a high-quality open-domain vertical video generator for mobile devices.
no code implementations • 24 Jul 2023 • Hanjia Lyu, Song Jiang, Hanqing Zeng, Yinglong Xia, Qifan Wang, Si Zhang, Ren Chen, Christopher Leung, Jiajie Tang, Jiebo Luo
Notably, the success of LLM-Rec lies in its prompting strategies, which effectively tap into the language model's comprehension of both general and specific item characteristics.
1 code implementation • 26 Jun 2023 • Siyu Huang, Jie An, Donglai Wei, Zudi Lin, Jiebo Luo, Hanspeter Pfister
However, given a UNIT model trained on certain domains, it is difficult for current methods to incorporate new domains because they often need to train the full model on both existing and new domains.
1 code implementation • 25 Jun 2023 • Yaping Zhao, Haitian Zheng, Jiebo Luo, Edmund Y. Lam
With the advancements in deep learning, video colorization by propagating color information from a colorized reference frame to a monochrome video sequence has been well explored.
no code implementations • 5 Jun 2023 • Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, MingYu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai
It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI.
no code implementations • 31 May 2023 • Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo
In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.
no code implementations • 8 May 2023 • Junyu Chen, Jie An, Hanjia Lyu, Christopher Kanan, Jiebo Luo
Assessing the artness of AI-generated images continues to be a challenge within the realm of image generation.
no code implementations • 17 Apr 2023 • Jie An, Songyang Zhang, Harry Yang, Sonal Gupta, Jia-Bin Huang, Jiebo Luo, Xi Yin
In contrast, we propose a parameter-free temporal shift module that can leverage the spatial U-Net as is for video generation.
no code implementations • CVPR 2023 • Jin Chen, Zhi Gao, Xinxiao wu, Jiebo Luo
Under this paradigm, we propose a meta-causal learning method to learn meta-knowledge, that is, how to infer the causes of domain shift between the auxiliary and source domains during training.
Ranked #2 on
Single-Source Domain Generalization
on PACS
no code implementations • 28 Mar 2023 • Jinsheng Pan, Weihong Qi, Zichen Wang, Hanjia Lyu, Jiebo Luo
There is a broad consensus that news media outlets incorporate ideological biases in their news articles.
no code implementations • 28 Mar 2023 • Jingyang Lin, Junyu Chen, Hanjia Lyu, Igor Khodak, Divya Chhabra, Colby L Day Richardson, Irina Prelipcean, Andrew M Dylag, Jiebo Luo
In this work, we first analyze the correlations between three adverse neonatal outcomes and then formulate the diagnosis of multiple neonatal outcomes as a multi-task learning (MTL) problem.
no code implementations • 23 Mar 2023 • Hanjia Lyu, Arsal Imtiaz, Yufei Zhao, Jiebo Luo
On one hand, human behavior is found to shape the spread of the disease.
1 code implementation • 21 Mar 2023 • Jingyang Lin, Hang Hua, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo
We propose a new joint video and text summarization task.
Ranked #1 on
Video Summarization
on videoxum
1 code implementation • ICCV 2023 • Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Jiebo Luo, Zheng-Jun Zha
Comprehensive experiments on PIAD demonstrate the reliability of the proposed task and the superiority of our method.
1 code implementation • ICCV 2023 • Pingyu Wu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha
Specifically, a spatial token is first introduced in the input space to aggregate representations for localization task.
no code implementations • 15 Mar 2023 • Wei Zhu, Runtao Zhou, Yao Yuan, Campbell Timothy, Rajat Jain, Jiebo Luo
However, the shortage of annotated training data poses a severe problem in improving the performance and generalization ability of the trained model.
no code implementations • 2 Feb 2023 • Xingping Dong, Jianbing Shen, Fatih Porikli, Jiebo Luo, Ling Shao
Under this viewing, we perform an in-depth analysis for them through visual simulations and real tracking examples, and find that the failure cases in some challenging situations can be regarded as the issue of missing decisive samples in offline training.
1 code implementation • 16 Jan 2023 • Hanjia Lyu, Jinsheng Pan, Zichen Wang, Jiebo Luo
We first adopt a human-guided machine learning framework to develop a new dataset for hyperpartisan news title detection with 2, 200 manually labeled and 1. 8 million machine-labeled titles that were posted from 2014 to the present by nine representative media organizations across three media bias groups - Left, Central, and Right in an active learning manner.
no code implementations • ICCV 2023 • Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A. Smith, Jiebo Luo
PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60. 4% on OK-VQA and 59. 6% on A-OKVQA).
no code implementations • CVPR 2023 • Chengzhi Cao, Xueyang Fu, Hongjian Liu, Yukun Huang, Kunyu Wang, Jiebo Luo, Zheng-Jun Zha
Video-based person re-identification (Re-ID) is a prominent computer vision topic due to its wide range of video surveillance applications.
Representation Learning
Video-Based Person Re-Identification
1 code implementation • CVPR 2023 • Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei
Point cloud completion aims to recover the completed 3D shape of an object from its partial observation.
1 code implementation • CVPR 2023 • Siyu Huang, Jie An, Donglai Wei, Jiebo Luo, Hanspeter Pfister
The mechanism of existing style transfer algorithms is by minimizing a hybrid loss function to push the generated image toward high similarities in both content and style.
no code implementations • 17 Dec 2022 • Yuan YAO, Yuanhan Zhang, Zhenfei Yin, Jiebo Luo, Wanli Ouyang, Xiaoshui Huang
The recent success of pre-trained 2D vision models is mostly attributable to learning from large-scale datasets.
no code implementations • 13 Dec 2022 • Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Qing Liu, Yuqian Zhou, Sohrab Amirghodsi, Jiebo Luo
Moreover, the object-level discriminators take aligned instances as inputs to enforce the realism of individual objects.
no code implementations • 26 Nov 2022 • Wenbin Li, Meihao Kong, Xuesong Yang, Lei Wang, Jing Huo, Yang Gao, Jiebo Luo
In this study, we present a new unified contrastive learning representation framework (named UniCLR) suitable for all the above four kinds of methods from a novel perspective of basic affinity matrix.
1 code implementation • 23 Nov 2022 • Junyu Chen, Jie An, Hanjia Lyu, Christopher Kanan, Jiebo Luo
Visual-textual sentiment analysis aims to predict sentiment with the input of a pair of image and text, which poses a challenge in learning effective features for diverse input images.
no code implementations • CVPR 2023 • Hongwei Xue, Peng Gao, Hongyang Li, Yu Qiao, Hao Sun, Houqiang Li, Jiebo Luo
However, unlike the low-level features such as pixel values, we argue the features extracted by powerful teacher models already encode rich semantic correlation across regions in an intact image. This raises one question: is reconstruction necessary in Masked Image Modeling (MIM) with a teacher model?
1 code implementation • 15 Nov 2022 • Yushi Hu, Hang Hua, Zhengyuan Yang, Weijia Shi, Noah A Smith, Jiebo Luo
PromptCap outperforms generic captions by a large margin and achieves state-of-the-art accuracy on knowledge-based VQA tasks (60. 4% on OK-VQA and 59. 6% on A-OKVQA).
Ranked #1 on
Visual Question Answering
on TextVQA test-standard
1 code implementation • 26 Oct 2022 • Zhishuai Guo, Rong Jin, Jiebo Luo, Tianbao Yang
To this end, we propose an active-passive decomposition framework that decouples the gradient's components with two types, namely active parts and passive parts, where the active parts depend on local data that are computed with the local model and the passive parts depend on other machines that are communicated/computed based on historical models and samples.
1 code implementation • 22 Oct 2022 • Songyang Zhang, Linfeng Song, Lifeng Jin, Haitao Mi, Kun Xu, Dong Yu, Jiebo Luo
While previous work focuses on building systems for inducing grammars on text that are well-aligned with video content, we investigate the scenario, in which text and video are only in loose correspondence.
no code implementations • 16 Oct 2022 • Peggy Tang, Kun Hu, Lei Zhang, Jiebo Luo, Zhiyong Wang
Multimodal summarisation with multimodal output is drawing increasing attention due to the rapid growth of multimedia data.
no code implementations • 8 Oct 2022 • Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma
With such global and local contextual modeling strategies, our proposed model can effectively characterize the object representations and contextual information and thereby generate comprehensive and detailed descriptions of the located objects.
no code implementations • 29 Sep 2022 • Junda Wang, Weijian Li, Han Wang, Hanjia Lyu, Caroline Thirukumaran, Addisu Mesfin, Jiebo Luo
Causal inference and model interpretability research are gaining increasing attention, especially in the domains of healthcare and bioinformatics.
1 code implementation • 14 Sep 2022 • Hongwei Xue, Yuchong Sun, Bei Liu, Jianlong Fu, Ruihua Song, Houqiang Li, Jiebo Luo
and 2) how to mitigate the impact of these factors?
Ranked #2 on
Video Retrieval
on MSR-VTT-1kA
(using extra training data)
no code implementations • 26 Jul 2022 • Weijian Li, Wei Zhu, E. Ray Dorsey, Jiebo Luo
Medication for neurological diseases such as the Parkinson's disease usually happens remotely away from hospitals.
1 code implementation • 21 Jun 2022 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei
The video-to-text/video-to-query projections over text prototypes/query vocabulary then start the text-to-query or query-to-text calibration to estimate the amendment to query or text.
1 code implementation • CVPR 2022 • Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Jiebo Luo, Tao Mei
In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location.
Ranked #14 on
Action Recognition
on Something-Something V1
no code implementations • 12 Jun 2022 • Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo
The advent of large-scale pre-trained language models has contributed greatly to the recent progress in natural language processing.
1 code implementation • CVPR 2022 • Shaofei Cai, Liang Li, Xinzhe Han, Jiebo Luo, Zheng-Jun Zha, Qingming Huang
However, the currently used graph search space overemphasizes learning node features and neglects mining hierarchical relational information.
Ranked #2 on
Link Prediction
on TSP/HCP Benchmark set
1 code implementation • CVPR 2022 • Wei Zhu, Le Lu, Jing Xiao, Mei Han, Jiebo Luo, Adam P. Harrison
Adversarial domain generalization is a popular approach to DG, but conventional approaches (1) struggle to sufficiently align features so that local neighborhoods are mixed across domains; and (2) can suffer from feature space over collapse which can threaten generalization performance.
no code implementations • 9 May 2022 • Wei Zhu, Dongjin Song, Yuncong Chen, Wei Cheng, Bo Zong, Takehiko Mizoguchi, Cristian Lumezanu, Haifeng Chen, Jiebo Luo
Specifically, we first design an Exemplar-based Deep Neural network (ExDNN) to learn local time series representations based on their compatibility with an exemplar module which consists of hidden parameters learned to capture varieties of normal patterns on each edge device.
no code implementations • 24 Apr 2022 • Yingqiang Ge, Juntao Tan, Yan Zhu, Yinglong Xia, Jiebo Luo, Shuchang Liu, Zuohui Fu, Shijie Geng, Zelong Li, Yongfeng Zhang
In this paper, we study the problem of explainable fairness, which helps to gain insights about why a system is fair or unfair, and guides the design of fair recommender systems with a more informed and unified methodology.
1 code implementation • 22 Mar 2022 • Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Eli Shechtman, Connelly Barnes, Jianming Zhang, Ning Xu, Sohrab Amirghodsi, Jiebo Luo
We propose cascaded modulation GAN (CM-GAN), a new network design consisting of an encoder with Fourier convolution blocks that extract multi-scale feature representations from the input image with holes and a dual-stream decoder with a novel cascaded global-spatial modulation block at each scale level.
Ranked #3 on
Image Inpainting
on Places2
no code implementations • 20 Mar 2022 • Wei Xiong, Neil Yeung, Shubo Wang, Haofu Liao, Liyun Wang, Jiebo Luo
Its ability of predicting the development of bone lesions in cancer-invading bones can assist in assessing the risk of impending fractures and choosing proper treatments in breast cancer bone metastasis.
2 code implementations • CVPR 2022 • Kai Zhu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha
Non-exemplar class-incremental learning is to recognize both the old and new classes when old class samples cannot be saved.
no code implementations • 28 Feb 2022 • Jian Kang, Yan Zhu, Yinglong Xia, Jiebo Luo, Hanghang Tong
Graph Convolutional Network (GCN) plays pivotal roles in many real-world applications.
1 code implementation • 21 Feb 2022 • Yaping Zhao, Haitian Zheng, Zhongrui Wang, Jiebo Luo, Edmund Y. Lam
To achieve point cloud denoising, traditional methods heavily rely on geometric priors, and most learning-based approaches suffer from outliers and loss of details.
1 code implementation • 20 Feb 2022 • Yaping Zhao, Haitian Zheng, Zhongrui Wang, Jiebo Luo, Edmund Y. Lam
In video denoising, the adjacent frames often provide very useful information, but accurate alignment is needed before such information can be harnassed.
no code implementations • 18 Jan 2022 • Zhengyuan Yang, Jingen Liu, Jing Huang, Xiaodong He, Tao Mei, Chenliang Xu, Jiebo Luo
In this study, we aim to predict the plausible future action steps given an observation of the past and study the task of instructional activity anticipation.
no code implementations • CVPR 2022 • Jing Shi, Ning Xu, Haitian Zheng, Alex Smith, Jiebo Luo, Chenliang Xu
Recently, large pretrained models (e. g., BERT, StyleGAN, CLIP) show great knowledge transfer and generalization capability on various downstream tasks within their domains.
no code implementations • NeurIPS 2021 • Wentian Zhao, Xinxiao wu, Jiebo Luo
To this end, we propose a novel video captioning method that generates a sentence by first constructing a multi-modal dependency tree and then traversing the constructed tree, where the syntactic structure and semantic relationship in the sentence are represented by the tree topology.
no code implementations • 30 Nov 2021 • Jing Shi, Ning Xu, Haitian Zheng, Alex Smith, Jiebo Luo, Chenliang Xu
Recently, large pretrained models (e. g., BERT, StyleGAN, CLIP) have shown great knowledge transfer and generalization capability on various downstream tasks within their domains.
1 code implementation • 12 Oct 2021 • Miles Sigel, Michael Zhou, Jiebo Luo
Results and literature suggest that the task of music sentiment transfer is more difficult than image sentiment transfer because of the temporal characteristics of music and lack of existing datasets.
no code implementations • ICCV 2021 • Jing Bi, Jiebo Luo, Chenliang Xu
In this work, we leverage instructional videos to study humans' decision-making processes, focusing on learning a model to plan goal-directed actions in real-life videos.
1 code implementation • 30 Sep 2021 • Xiao Wang, Jingen Liu, Tao Mei, Jiebo Luo
Unlike the mainstream clustering-based methods, our framework exploits a transformer-based feature reconstruction scheme to detect event boundary by reconstruction errors.
no code implementations • 15 Sep 2021 • Wei Zhu, Jiebo Luo, Andrew White
FLIT(+) can align the local training across heterogeneous clients by improving the performance for uncertain samples.
no code implementations • 15 Sep 2021 • Wei Zhu, Zihe Zheng, Haitian Zheng, Hanjia Lyu, Jiebo Luo
The learned prototypes and their labels can be regarded as denoising features and labels for the local regions and can guide the training process to prevent the model from overfitting the noisy cases.
no code implementations • 14 Sep 2021 • Jinlong Ruan, Wei Wu, Jiebo Luo
The stock market is volatile and complicated, especially in 2020.
2 code implementations • 10 Sep 2021 • Wenbin Li, Ziyi, Wang, Xuesong Yang, Chuanqi Dong, Pinzhuo Tian, Tiexin Qin, Jing Huo, Yinghuan Shi, Lei Wang, Yang Gao, Jiebo Luo
Furthermore, based on LibFewShot, we provide comprehensive evaluations on multiple benchmarks with various backbone architectures to evaluate common pitfalls and effects of different training tricks.
no code implementations • 6 Sep 2021 • Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo
To tackle this problem, we propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding for Landscape Animation.
1 code implementation • 30 Aug 2021 • Gui-Song Xia, Jian Ding, Ming Qian, Nan Xue, Jiaming Han, Xiang Bai, Michael Ying Yang, Shengyang Li, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang, Qiang Zhou, Chao-hui Yu, Kaixuan Hu, Yingjia Bu, Wenming Tan, Zhe Yang, Wei Li, Shang Liu, Jiaxuan Zhao, Tianzhi Ma, Zi-han Gao, Lingqi Wang, Yi Zuo, Licheng Jiao, Chang Meng, Hao Wang, Jiahao Wang, Yiming Hui, Zhuojun Dong, Jie Zhang, Qianyue Bao, Zixiao Zhang, Fang Liu
This report summarizes the results of Learning to Understand Aerial Images (LUAI) 2021 challenge held on ICCV 2021, which focuses on object detection and semantic segmentation in aerial images.
no code implementations • 26 Aug 2021 • Hao Wang, Zheng-Jun Zha, Liang Li, Xuejin Chen, Jiebo Luo
We propose a novel MultiModulation Network (M2N) to learn the above correlation and leverage it as semantic guidance to modulate the related auditory, visual, and fused features.
1 code implementation • ICCV 2021 • Heliang Zheng, Huan Yang, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo
And the reference space is optimized to capture deep image priors that are useful for quality assessment.
no code implementations • 12 Aug 2021 • Meng Cao, HaoZhi Huang, Hao Wang, Xuan Wang, Li Shen, Sheng Wang, Linchao Bao, Zhifeng Li, Jiebo Luo
Compared with the state-of-the-art facial image editing methods, our framework generates video portraits that are more photo-realistic and temporally smooth.
no code implementations • ICCV 2021 • Wei Zhu, Haitian Zheng, Haofu Liao, Weijian Li, Jiebo Luo
We propose to remove the bias information misused by the target task with a cross-sample adversarial debiasing (CSAD) method.
no code implementations • 26 Jul 2021 • Wentian Zhao, Yao Hu, HeDa Wang, Xinxiao wu, Jiebo Luo
Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article.
no code implementations • 25 Jul 2021 • Hanxi Lin, Xinxiao wu, Jiebo Luo
It inherits the operators and parameters of the original layer but is slightly different in the use of those operators and parameters.
1 code implementation • 22 Jul 2021 • Wenbin Li, Xuesong Yang, Meihao Kong, Lei Wang, Jing Huo, Yang Gao, Jiebo Luo
However, in small data regimes, we can not obtain a sufficient number of negative pairs or effectively avoid the over-fitting problem when negatives are not used at all.
no code implementations • NAACL 2021 • Hang Hua, Xingjian Li, Dejing Dou, Cheng-Zhong Xu, Jiebo Luo
The brittleness of this process is often reflected by the sensitivity to random seeds.
no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo
To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.
no code implementations • CVPR 2021 • Hao Wang, Zheng-Jun Zha, Liang Li, Dong Liu, Jiebo Luo
In particular, for cross-modal interaction, we interact the sentence-level query with the whole moment while interact the word-level query with content and boundary, as in a coarse-to-fine manner.
no code implementations • CVPR 2021 • Jing Wang, Jinhui Tang, Mingkun Yang, Xiang Bai, Jiebo Luo
Under the guidance of the geometrical relationship between OCR tokens, our LSTM-R capitalizes on a newly-devised relation-aware pointer network to select OCR tokens from the scene text for OCR-based image captioning.
no code implementations • 18 Jun 2021 • Junda Wang, Xupin Zhang, Jiebo Luo
More importantly, sentiment analysis and the paired sample t-test are performed to examine the differences in crowdfunding campaigns before and after the COVID-19 outbreak that started in March 2020.
1 code implementation • ICCV 2021 • Zhengyuan Yang, Songyang Zhang, LiWei Wang, Jiebo Luo
3D visual grounding aims at grounding a natural language description about a 3D scene, usually represented in the form of 3D point clouds, to the targeted object region.
no code implementations • NeurIPS 2021 • Hongwei Xue, Yupan Huang, Bei Liu, Houwen Peng, Jianlong Fu, Houqiang Li, Jiebo Luo
To tackle this, we propose a fully Transformer visual embedding for VLP to better learn visual relation and further promote inter-modal alignment.
no code implementations • 5 May 2021 • Yuan Zhou, Yanrong Guo, Shijie Hao, Richang Hong, Jiebo Luo
The challenges of this task are twofold: (i) it is difficult to overcome the impact of data scarcity under the interference of missing views; (ii) the limited number of data exacerbates information scarcity, thus making it harder to address the view-missing issue in turn.
1 code implementation • NAACL 2021 • Songyang Zhang, Linfeng Song, Lifeng Jin, Kun Xu, Dong Yu, Jiebo Luo
We investigate video-aided grammar induction, which learns a constituency parser from both unlabeled text and its corresponding video.