Search Results for author: Wei Ji

Found 62 papers, 37 papers with code

Context-Aware Deep Spatio-Temporal Network for Hand Pose Estimation from Depth Images

no code implementations 6 Oct 2018 Yiming Wu, Wei Ji, Xi Li, Gang Wang, Jianwei Yin, Fei Wu

As a fundamental and challenging problem in computer vision, hand pose estimation aims to estimate the hand joint locations from depth images.

Hand Pose Estimation

An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety

no code implementations 30 Apr 2020 Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller

In particular, by analysing speech recordings from these patients, we construct audio-only-based models to automatically categorise the health state of patients from four aspects, including the severity of illness, sleep quality, fatigue, and anxiety.

Sleep Quality

Accurate RGB-D Salient Object Detection via Collaborative Learning

2 code implementations ECCV 2020 Wei Ji, Jingjing Li, Miao Zhang, Yongri Piao, Huchuan Lu

The explicitly extracted edge information goes together with saliency to give more emphasis to the salient regions and object boundaries.

Object object-detection +5

ChemistryQA: A Complex Question Answering Dataset from Chemistry

no code implementations 1 Jan 2021 Zhuoyu Wei, Wei Ji, Xiubo Geng, Yining Chen, Baihua Chen, Tao Qin, Daxin Jiang

We notice that some real-world QA tasks are more complex and can neither be solved by end-to-end neural networks nor translated into any kind of formal representation.

Machine Reading Comprehension Math +1

Boundary Proposal Network for Two-Stage Natural Language Video Localization

no code implementations 15 Mar 2021 Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, Jun Xiao

State-of-the-art NLVL methods almost all follow a one-stage design and can typically be grouped into two categories: 1) anchor-based approaches, which first pre-define a series of video segment candidates (e.g., by sliding window) and then classify each candidate; 2) anchor-free approaches, which directly predict the probability of each video frame being a boundary or an intermediate frame of the positive segment.
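For readers unfamiliar with the anchor-based branch described above, here is a minimal sketch of sliding-window candidate generation; the window sizes and stride are illustrative assumptions, not values from this paper.

```python
def sliding_window_candidates(num_frames, window_sizes=(16, 32, 64), stride=8):
    """Enumerate (start, end) segment proposals, the typical first step of an
    anchor-based NLVL pipeline; each candidate is later scored against the query."""
    candidates = []
    for w in window_sizes:
        for start in range(0, max(num_frames - w + 1, 1), stride):
            candidates.append((start, min(start + w, num_frames)))
    return candidates

# A 128-frame video yields a multi-scale pool of candidate segments.
print(len(sliding_window_candidates(128)))
```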

Vocal Bursts Valence Prediction

Conditional Hyper-Network for Blind Super-Resolution with Multiple Degradations

1 code implementation 8 Apr 2021 Guanghao Yin, Wei Wang, Zehuan Yuan, Wei Ji, Dongdong Yu, Shouqian Sun, Tat-Seng Chua, Changhu Wang

We extract degradation prior at task-level with the proposed ConditionNet, which will be used to adapt the parameters of the basic SR network (BaseNet).
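The module names ConditionNet and BaseNet come from the abstract, but their internals are not given here; the sketch below is one plausible reading in which a task-level degradation embedding rescales the channels of the SR trunk, not the released implementation.

```python
import torch
import torch.nn as nn

class ConditionNet(nn.Module):
    """Toy degradation-prior extractor: pools LR patches of one task into a single embedding."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, lr_patches):                     # (N, 3, h, w) patches, same degradation
        return self.encoder(lr_patches).mean(dim=0)    # (dim,) task-level prior

class BaseNet(nn.Module):
    """Toy SR trunk whose features are modulated by the degradation prior."""
    def __init__(self, dim=64, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, dim, 3, padding=1)
        self.to_gamma = nn.Linear(dim, dim)            # prior -> per-channel scaling
        self.tail = nn.Sequential(nn.Conv2d(dim, 3 * scale * scale, 3, padding=1),
                                  nn.PixelShuffle(scale))

    def forward(self, lr, prior):
        feat = torch.relu(self.head(lr))
        gamma = self.to_gamma(prior).view(1, -1, 1, 1)
        return self.tail(feat * gamma)

lr = torch.randn(2, 3, 24, 24)
sr = BaseNet()(lr, ConditionNet()(lr))                 # (2, 3, 48, 48)
```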

Blind Super-Resolution Image Super-Resolution

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

no code implementations 26 May 2021 Feifei Shao, Long Chen, Jian Shao, Wei Ji, Shaoning Xiao, Lu Ye, Yueting Zhuang, Jun Xiao

With the success of deep neural networks in object detection, both WSOD and WSOL have received unprecedented attention.

Object object-detection +2

Deconfounded Video Moment Retrieval with Causal Intervention

1 code implementation 3 Jun 2021 Xun Yang, Fuli Feng, Wei Ji, Meng Wang, Tat-Seng Chua

To fill the research gap, we propose a causality-inspired VMR framework that builds structural causal model to capture the true effect of query and video content on the prediction.
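The abstract does not spell out the intervention, so the snippet below only illustrates the generic backdoor-adjustment recipe, P(Y|do(X)) = Σ_z P(Y|X,z)P(z); the confounder dictionary and the predictor are hypothetical placeholders, not the paper's components.

```python
import torch

def backdoor_adjusted_score(query_feat, video_feat, confounders, prior, predictor):
    """Approximate P(Y | do(query, video)) by averaging the predictor's output
    over a fixed dictionary of confounder embeddings weighted by their priors."""
    scores = [p_z * predictor(query_feat, video_feat, z)
              for z, p_z in zip(confounders, prior)]   # z: (d,) embedding, p_z: scalar P(z)
    return torch.stack(scores).sum(dim=0)
```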

Moment Retrieval Retrieval

Calibrated RGB-D Salient Object Detection

1 code implementation CVPR 2021 Wei Ji, Jingjing Li, Shuang Yu, Miao Zhang, Yongri Piao, Shunyu Yao, Qi Bi, Kai Ma, Yefeng Zheng, Huchuan Lu, Li Cheng

Complex backgrounds and similar appearances between objects and their surroundings are generally recognized as challenging scenarios in Salient Object Detection (SOD).

Object object-detection +3

Advancing biological super-resolution microscopy through deep learning: a brief review

no code implementations 24 Jun 2021 Tianjie Yang, Yaoru Luo, Wei Ji, Ge Yang

We conclude with an outlook on how deep learning could shape the future of this new generation of light microscopy technology.

Specificity Super-Resolution

Decoupling Strategy and Surface Realization for Task-oriented Dialogues

no code implementations 29 Sep 2021 Chenchen Ye, Lizi Liao, Fuli Feng, Wei Ji, Tat-Seng Chua

The core is to construct a latent content space for strategy optimization and disentangle the surface style from it.

Reinforcement Learning (RL) Style Transfer +1

Meeting Summarization with Pre-training and Clustering Methods

1 code implementation 16 Nov 2021 Andras Huebner, Wei Ji, Xiang Xiao

Lastly, we compare the performance of our baseline models with BART, a state-of-the-art language model that is effective for summarization.

Clustering Language Modelling +2

Rethinking the Two-Stage Framework for Grounded Situation Recognition

1 code implementation 10 Dec 2021 Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Tat-Seng Chua

Since each verb is associated with a specific set of semantic roles, all existing GSR methods resort to a two-stage framework: predicting the verb in the first stage and detecting the semantic roles in the second stage.

Grounded Situation Recognition Object Recognition +1

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering

1 code implementation 12 Dec 2021 Junbin Xiao, Angela Yao, Zhiyuan Liu, Yicong Li, Wei Ji, Tat-Seng Chua

To align with the multi-granular essence of linguistic concepts in language queries, we propose to model video as a conditional graph hierarchy which weaves together visual facts of different granularity in a level-wise manner, with the guidance of corresponding textual cues.

Question Answering Video Question Answering +1

Exploring Denoised Cross-Video Contrast for Weakly-Supervised Temporal Action Localization

no code implementations CVPR 2022 Jingjing Li, Tianyu Yang, Wei Ji, Jue Wang, Li Cheng

Inspired by recent success in unsupervised contrastive representation learning, we propose a novel denoised cross-video contrastive algorithm, aiming to enhance the feature discrimination ability of video snippets for accurate temporal action localization in the weakly-supervised setting.
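As a rough illustration of the contrastive objective (not the paper's exact denoising scheme), the sketch below computes a snippet-level InfoNCE loss in which a boolean mask stands in for filtering out noisy pseudo-labelled positives.

```python
import torch
import torch.nn.functional as F

def denoised_snippet_infonce(anchor, positives, negatives, keep_mask, tau=0.07):
    """InfoNCE over video snippets; `keep_mask` drops cross-video positives whose
    pseudo-labels are judged unreliable (a stand-in for the denoising step)."""
    anchor = F.normalize(anchor, dim=-1)                 # (d,)
    pos = F.normalize(positives[keep_mask], dim=-1)      # (P', d) retained positives
    neg = F.normalize(negatives, dim=-1)                 # (N, d)
    pos_logits = pos @ anchor / tau
    neg_logits = neg @ anchor / tau
    denom = torch.logsumexp(torch.cat([pos_logits, neg_logits]), dim=0)
    return (denom - pos_logits).mean()                   # one term per retained positive

loss = denoised_snippet_infonce(torch.randn(128), torch.randn(6, 128),
                                torch.randn(64, 128),
                                torch.tensor([1, 1, 0, 1, 0, 1]).bool())
```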

Contrastive Learning Denoising +4

Video Question Answering: Datasets, Algorithms and Challenges

1 code implementation 2 Mar 2022 Yaoyao Zhong, Junbin Xiao, Wei Ji, Yicong Li, Weihong Deng, Tat-Seng Chua

Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos.

Question Answering Video Question Answering

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

1 code implementation 27 Apr 2022 Zhedong Zheng, Jiayin Zhu, Wei Ji, Yi Yang, Tat-Seng Chua

This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometry shape and texture of human clothing from a single image.

3D Reconstruction Person Re-Identification +2

PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models

1 code implementation 23 May 2022 Yuan YAO, Qianyu Chen, Ao Zhang, Wei Ji, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

We show that PEVL enables state-of-the-art performance of detector-free VLP models on position-sensitive tasks such as referring expression comprehension and phrase grounding, and also improves the performance on position-insensitive tasks with grounded inputs.
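Position-sensitive here means that box coordinates are exposed to the language model as discrete tokens; the helper below shows that discretization in miniature, with the bin count and the `<pos_k>` token format being assumptions rather than PEVL's exact vocabulary.

```python
def box_to_position_tokens(box, image_w, image_h, num_bins=512):
    """Map an (x1, y1, x2, y2) box to discrete position tokens that can be
    interleaved with the caption text, in the spirit of position-enhanced pre-training."""
    coords = (box[0] / image_w, box[1] / image_h, box[2] / image_w, box[3] / image_h)
    bins = [min(int(c * num_bins), num_bins - 1) for c in coords]
    return [f"<pos_{b}>" for b in bins]

# e.g. "a dog <pos_51> <pos_204> <pos_307> <pos_409> chasing a ball"
print(box_to_position_tokens((64, 256, 384, 512), 640, 640))
```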

Language Modelling Object +7

Invariant Grounding for Video Question Answering

1 code implementation CVPR 2022 Yicong Li, Xiang Wang, Junbin Xiao, Wei Ji, Tat-Seng Chua

At its core is understanding the alignments between visual scenes in video and linguistic semantics in question to yield the answer.

Question Answering Video Question Answering

Structured and Natural Responses Co-generation for Conversational Search

1 code implementation ACM SIGIR Conference on Research and Development in Information Retrieval 2022 Chenchen Ye, Lizi Liao, Fuli Feng, Wei Ji, Tat-Seng Chua

Existing approaches either 1) predict structured dialog acts first and then generate natural response; or 2) map conversation context to natural responses directly in an end-to-end manner.

Conversational Search

MetaComp: Learning to Adapt for Online Depth Completion

no code implementations 21 Jul 2022 Yang Chen, Shanshan Zhao, Wei Ji, Mingming Gong, Liping Xie

However, when facing a new environment where the test data arrives online and differs from the training data in RGB image content and depth sparsity, the trained model might suffer a severe performance drop.

Depth Completion Meta-Learning +1

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization

1 code implementation 14 Nov 2022 Yiyang Chen, Zhedong Zheng, Wei Ji, Leigang Qu, Tat-Seng Chua

The key idea underpinning the proposed method is to integrate fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively.
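One way to read "small and large fluctuations" is as two noise scales injected into the target features during matching; the sketch below follows that reading and is not the released implementation.

```python
import torch
import torch.nn.functional as F

def multi_grained_matching_loss(query, target, sigma_fine=0.05, sigma_coarse=0.5, tau=0.07):
    """Fine-grained matching uses lightly perturbed targets, coarse-grained matching
    heavily perturbed ones; both contrastive terms are averaged."""
    q = F.normalize(query, dim=-1)
    labels = torch.arange(q.size(0))
    losses = []
    for sigma in (sigma_fine, sigma_coarse):
        t = F.normalize(target + sigma * torch.randn_like(target), dim=-1)
        losses.append(F.cross_entropy(q @ t.t() / tau, labels))
    return sum(losses) / len(losses)

loss = multi_grained_matching_loss(torch.randn(8, 256), torch.randn(8, 256))
```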

Composed Image Retrieval (CoIR) Image Retrieval with Multi-Modal Query +1

Driving Style Recognition at First Impression for Online Trajectory Prediction

no code implementations 21 Dec 2022 Tu Xu, Kan Wu, Yongdong Zhu, Wei Ji

This paper proposes a new driving style recognition approach that allows autonomous vehicles (AVs) to perform trajectory predictions for surrounding vehicles with minimal data.

Autonomous Vehicles Trajectory Prediction

Multi-queue Momentum Contrast for Microvideo-Product Retrieval

1 code implementation 22 Dec 2022 Yali Du, Yinwei Wei, Wei Ji, Fan Liu, Xin Luo, Liqiang Nie

The booming development and huge market of micro-videos bring new e-commerce channels for merchants.

Representation Learning Retrieval

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding

no code implementations 26 Dec 2022 Wei Ji, Long Chen, Yinwei Wei, Yiming Wu, Tat-Seng Chua

In this work, we propose a novel multi-resolution temporal video sentence grounding network: MRTNet, which consists of a multi-modal feature encoder, a Multi-Resolution Temporal (MRT) module, and a predictor module.

Descriptive Sentence

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning

1 code implementation CVPR 2023 Wei Ji, Renjie Liang, Zhedong Zheng, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Mengze Li, Tat-Seng Chua

Moreover, we treat the uncertainty score of frames in a video as a whole, and estimate the difficulty of each video, which can further relieve the burden of video selection.
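A simplified view of the video-level difficulty estimate: per-frame uncertainty (here, prediction entropy) is pooled over the whole video, and the most uncertain videos are prioritised for annotation. The pooling and selection rule below are illustrative, not the paper's exact hierarchy.

```python
import torch

def video_difficulty(frame_probs):
    """frame_probs: (T, C) per-frame class probabilities; returns mean entropy over frames."""
    entropy = -(frame_probs * frame_probs.clamp_min(1e-8).log()).sum(dim=-1)
    return entropy.mean()

def select_videos_for_labeling(all_frame_probs, budget):
    """Rank videos by pooled frame uncertainty and pick the hardest ones to annotate."""
    scores = torch.tensor([video_difficulty(p) for p in all_frame_probs])
    return scores.topk(budget).indices

pool = [torch.softmax(torch.randn(50, 4), dim=-1) for _ in range(10)]
print(select_videos_for_labeling(pool, budget=3))
```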

Active Learning Moment Retrieval +1

WINNER: Weakly-Supervised hIerarchical decompositioN and aligNment for Spatio-tEmporal Video gRounding

no code implementations CVPR 2023 Mengze Li, Han Wang, Wenqiao Zhang, Jiaxu Miao, Zhou Zhao, Shengyu Zhang, Wei Ji, Fei Wu

WINNER first builds the language decomposition tree in a bottom-up manner, upon which the structural attention mechanism and top-down feature backtracking jointly build a multi-modal decomposition tree, permitting a hierarchical understanding of unstructured videos.

Contrastive Learning Spatio-Temporal Video Grounding +1

MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer

2 code implementations 19 Jan 2023 Junde Wu, Wei Ji, Huazhu Fu, Min Xu, Yueming Jin, Yanwu Xu

To effectively integrate these two cutting-edge techniques for the Medical image segmentation, we propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.

Image Generation Image Segmentation +3

Scalable Attribution of Adversarial Attacks via Multi-Task Learning

no code implementations 25 Feb 2023 Zhongyi Guo, Keji Han, Yao Ge, Wei Ji, Yun Li

In this paper, AAP is defined as the recognition of three signatures, i.e., the attack algorithm, the victim model, and the hyperparameter.
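Architecturally, recognising the three signatures jointly lends itself to a shared encoder with one classification head per signature; the sketch below is a minimal multi-task layout with placeholder layer sizes, not the paper's network.

```python
import torch
import torch.nn as nn

class AttributionNet(nn.Module):
    """Multi-task head for adversarial attack attribution: one classifier per signature."""
    def __init__(self, n_attacks, n_victims, n_hparams, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(dim), nn.ReLU())
        self.heads = nn.ModuleDict({"attack": nn.Linear(dim, n_attacks),
                                    "victim": nn.Linear(dim, n_victims),
                                    "hparam": nn.Linear(dim, n_hparams)})

    def forward(self, adv_example):
        h = self.encoder(adv_example)
        return {name: head(h) for name, head in self.heads.items()}

logits = AttributionNet(5, 4, 3)(torch.randn(2, 3, 32, 32))   # three sets of logits per input
```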

Multi-Task Learning

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

no code implementations ICCV 2023 Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang

Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-training models to adapt to downstream tasks in a parameter- and data-efficient way, by learning "soft prompts" to condition frozen pre-training models.

Domain Generalization Few-Shot Learning +1

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

1 code implementation ICCV 2023 Qifan Yu, Juncheng Li, Yu Wu, Siliang Tang, Wei Ji, Yueting Zhuang

Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), where models can generalize to unseen predicates in a zero-shot manner.

Graph Generation Language Modelling +1

Segment Anything Is Not Always Perfect: An Investigation of SAM on Different Real-world Applications

1 code implementation 12 Apr 2023 Wei Ji, Jingjing Li, Qi Bi, TingWei Liu, Wenbo Li, Li Cheng

Recently, Meta AI Research introduced a general, promptable Segment Anything Model (SAM) pre-trained on an unprecedentedly large segmentation dataset (SA-1B).

Image Segmentation Segmentation +1

Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation

3 code implementations 25 Apr 2023 Junde Wu, Wei Ji, Yuanpei Liu, Huazhu Fu, Min Xu, Yanwu Xu, Yueming Jin

In Med-SA, we propose Space-Depth Transpose (SD-Trans) to adapt 2D SAM to 3D medical images and Hyper-Prompting Adapter (HyP-Adpt) to achieve prompt-conditioned adaptation.
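SD-Trans is described only at a high level, so the snippet below merely illustrates the shape manipulation implied by a space-depth transpose: per-slice spatial tokens are re-grouped so the same attention blocks can run along the depth axis. The actual adapter placement inside SAM is not reproduced.

```python
import torch

def space_depth_transpose(slice_tokens):
    """slice_tokens: (D, N, C) -- N spatial tokens per slice from a 2D encoder.
    The transposed view (N, D, C) treats each spatial location as a depth-wise sequence."""
    return slice_tokens.permute(1, 0, 2).contiguous()

slices = torch.randn(32, 196, 768)          # 32 slices, 14x14 patch tokens, ViT-B width
depth_view = space_depth_transpose(slices)  # (196, 32, 768)
```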

Image Segmentation Medical Image Segmentation +2

VPGTrans: Transfer Visual Prompt Generator across LLMs

1 code implementation NeurIPS 2023 Ao Zhang, Hao Fei, Yuan YAO, Wei Ji, Li Li, Zhiyuan Liu, Tat-Seng Chua

While developing a new multimodal LLM (MLLM) by pre-training on tremendous image-text pairs from scratch can be exceedingly resource-consuming, connecting an existing LLM with a comparatively lightweight visual prompt generator (VPG) becomes a feasible paradigm.

Transfer Learning

Generating Visual Spatial Description via Holistic 3D Scene Understanding

1 code implementation 19 May 2023 Yu Zhao, Hao Fei, Wei Ji, Jianguo Wei, Meishan Zhang, Min Zhang, Tat-Seng Chua

With an external 3D scene extractor, we obtain the 3D objects and scene features for input images, based on which we construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes.

Scene Understanding Text Generation

Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment

no code implementations 20 May 2023 Shengqiong Wu, Hao Fei, Wei Ji, Tat-Seng Chua

Unpaired cross-lingual image captioning has long suffered from irrelevancy and disfluency issues, due to the inconsistencies of the semantic scene and syntax attributes during transfer.

Image Captioning Translation

In Defense of Clip-based Video Relation Detection

no code implementations 18 Jul 2023 Meng Wei, Long Chen, Wei Ji, Xiaoyu Yue, Roger Zimmermann

While recent video-based methods utilizing video tubelets have shown promising results, we argue that the effective modeling of spatial and temporal context plays a more significant role than the choice between clip tubelets and video tubelets.

Feature Compression Object Tracking +2

Panoptic Scene Graph Generation with Semantics-Prototype Learning

1 code implementation 28 Jul 2023 Li Li, Wei Ji, Yiming Wu, Mengze Li, You Qin, Lina Wei, Roger Zimmermann

To promise consistency and accuracy during the transfer process, we propose to measure the invariance of representations in each predicate class, and learn unbiased prototypes of predicates with different intensities.
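A minimal sketch of the prototype idea, assuming every predicate class appears in the batch: prototypes are mean normalised relation features, and a class's "invariance" is read off as the average cosine similarity of its samples to their prototype. The paper's exact measure may differ.

```python
import torch
import torch.nn.functional as F

def predicate_prototypes(features, labels, num_classes):
    """Per-predicate prototype = mean of L2-normalised relation features of that class."""
    feats = F.normalize(features, dim=-1)
    protos = torch.stack([feats[labels == c].mean(dim=0) for c in range(num_classes)])
    return F.normalize(protos, dim=-1)

def class_invariance(features, labels, prototypes):
    """Average sample-to-prototype cosine similarity per class (tighter class -> higher score)."""
    feats = F.normalize(features, dim=-1)
    sims = (feats * prototypes[labels]).sum(dim=-1)
    return torch.stack([sims[labels == c].mean() for c in range(prototypes.size(0))])

feats, labels = torch.randn(60, 128), torch.randint(0, 5, (60,))
protos = predicate_prototypes(feats, labels, num_classes=5)
print(class_invariance(feats, labels, protos))
```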

Graph Generation Panoptic Scene Graph Generation

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

1 code implementation 8 Aug 2023 Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Hanwang Zhang, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Yueting Zhuang

This shortcoming results in MLLMs' underperformance in comprehending demonstrative instructions consisting of multiple, interleaved, and multimodal instructions that demonstrate the required context to complete a task.

Caption Generation Image Captioning +1

Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation

1 code implementation 8 Aug 2023 Wei Ji, Xiangyan Liu, An Zhang, Yinwei Wei, Yongxin Ni, Xiang Wang

To be specific, we first introduce an ID-aware Multi-modal Transformer module in the item representation learning stage to facilitate information interaction among different features.
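The ID-aware fusion can be pictured as a tiny Transformer over three tokens per item (ID embedding, projected image feature, projected text feature); the dimensions and raw feature sizes below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class IDAwareItemEncoder(nn.Module):
    """Fuse an item-ID embedding with its visual/text features so ID and content
    signals interact before the sequential recommender sees the item."""
    def __init__(self, num_items, dim=64, img_dim=512, txt_dim=384):
        super().__init__()
        self.id_emb = nn.Embedding(num_items, dim)
        self.proj_img = nn.Linear(img_dim, dim)
        self.proj_txt = nn.Linear(txt_dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.fuser = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, item_ids, img_feat, txt_feat):
        tokens = torch.stack([self.id_emb(item_ids),
                              self.proj_img(img_feat),
                              self.proj_txt(txt_feat)], dim=1)   # (B, 3, dim)
        return self.fuser(tokens).mean(dim=1)                    # (B, dim) fused item vector

rep = IDAwareItemEncoder(1000)(torch.randint(0, 1000, (4,)),
                               torch.randn(4, 512), torch.randn(4, 384))
```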

Collaborative Filtering Representation Learning +1

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval

no code implementations 19 Aug 2023 Kaihang Pan, Juncheng Li, Hongye Song, Hao Fei, Wei Ji, Shuo Zhang, Jun Lin, Xiaozhong Liu, Siliang Tang

Recent studies have shown that dense retrieval models, lacking dedicated training data, struggle to perform well across diverse retrieval tasks, as different retrieval tasks often entail distinct search intents.

Retrieval Text-to-Image Generation

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

no code implementations 26 Aug 2023 Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Tat-Seng Chua

In this work, we investigate strengthening the awareness of video dynamics in diffusion models (DMs) for high-quality T2V generation.

In-Context Learning Video Generation

NExT-GPT: Any-to-Any Multimodal LLM

1 code implementation 11 Sep 2023 Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, Tat-Seng Chua

While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides, they mostly fall prey to the limitation of only input-side multimodal understanding, without the ability to produce content in multiple modalities.

Towards Complex-query Referring Image Segmentation: A Novel Benchmark

no code implementations 29 Sep 2023 Wei Ji, Li Li, Hao Fei, Xiangyan Liu, Xun Yang, Juncheng Li, Roger Zimmermann

Referring Image Segmentation (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms.

Image Segmentation Semantic Segmentation

Domain-wise Invariant Learning for Panoptic Scene Graph Generation

no code implementations 9 Oct 2023 Li Li, You Qin, Wei Ji, Yuxiao Zhou, Roger Zimmermann

Panoptic Scene Graph Generation (PSG) involves the detection of objects and the prediction of their corresponding relationships (predicates).

Graph Generation Panoptic Scene Graph Generation

Towards Robust Multi-Modal Reasoning via Model Selection

1 code implementation 12 Oct 2023 Xiangyan Liu, Rongxue Li, Wei Ji, Tao Lin

The reasoning capabilities of LLM (Large Language Model) are widely acknowledged in recent research, inspiring studies on tool learning and autonomous agents.

Language Modelling Large Language Model +1

NExT-Chat: An LMM for Chat, Detection and Segmentation

1 code implementation 8 Nov 2023 Ao Zhang, Yuan YAO, Wei Ji, Zhiyuan Liu, Tat-Seng Chua

The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs).

Referring Expression Referring Expression Segmentation +1

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback

no code implementations 21 Nov 2023 Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks.

Logical Reasoning

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching

no code implementations 21 Nov 2023 Meng Chu, Zhedong Zheng, Wei Ji, Tingyu Wang, Tat-Seng Chua

Navigating drones through natural language commands remains challenging due to the dearth of accessible multi-modal datasets and the stringent precision requirements for aligning visual and textual data.

Drone navigation Language Modelling +2

Cross-Level Multi-Instance Distillation for Self-Supervised Fine-Grained Visual Categorization

no code implementations 16 Jan 2024 Qi Bi, Wei Ji, Jingjun Yi, Haolan Zhan, Gui-Song Xia

To comprehensively learn the relation between informative patches and fine-grained semantics, multi-instance knowledge distillation is applied both to the region/image-crop pairs across the teacher and student networks and to the region-image crops inside each network, which we term intra-level and inter-level multi-instance distillation, respectively.
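The common building block for both distillation levels is a soft-label term between matched student/teacher instances; the sketch below shows only that generic term, with the pairing of regions and image crops left unspecified since the abstract does not detail it.

```python
import torch
import torch.nn.functional as F

def distillation_term(student_logits, teacher_logits, tau=4.0):
    """KL distillation between matched instances (region-region, image-image, or
    region-image pairs, depending on which level is being distilled)."""
    return F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                    F.softmax(teacher_logits / tau, dim=-1),
                    reduction="batchmean") * tau * tau

loss = distillation_term(torch.randn(16, 200), torch.randn(16, 200))
```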

Fine-Grained Visual Categorization Knowledge Distillation +2

GOOD: Towards Domain Generalized Orientated Object Detection

no code implementations 20 Feb 2024 Qi Bi, Beichen Zhou, Jingjun Yi, Wei Ji, Haolan Zhan, Gui-Song Xia

In this paper, we propose the task of domain generalized oriented object detection, which intends to explore the generalization of oriented object detectors on arbitrary unseen target domains.

Hallucination Object +3
