Search Results for author: Zhijie Lin

Found 24 papers, 8 papers with code

PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

1 code implementation • arXiv 2024 • Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng

PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answering and captioning tasks.

Ranked #1 on Zero-Shot Video Question Answer on TGIF-QA

Video-based Generative Performance Benchmarking (Consistency) Video-based Generative Performance Benchmarking (Contextual Understanding) +4

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation

no code implementations • 9 Jan 2024 • Weimin Wang, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng

The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field.

MORPH Video Generation

ChatAnything: Facetime Chat with LLM-Enhanced Personas

no code implementations • 12 Nov 2023 • Yilin Zhao, Xinbin Yuan, ShangHua Gao, Zhijie Lin, Qibin Hou, Jiashi Feng, Daquan Zhou

For the mixture of voices (MoV), we use text-to-speech (TTS) algorithms with a variety of predefined tones and automatically select the best-matching one based on the user-provided text description.

In-Context Learning Novel Concepts +2

Towards Garment Sewing Pattern Reconstruction from a Single Image

1 code implementation • 7 Nov 2023 • Lijuan Liu, Xiangyu Xu, Zhijie Lin, Jiabin Liang, Shuicheng Yan

In this work, we explore the challenging problem of recovering garment sewing patterns from everyday photos to support downstream applications.

Garment Reconstruction Texture Synthesis +1

111 GitHub stars

Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models

no code implementations • 15 Oct 2023 • Zijian Zhang, Luping Liu, Zhijie Lin, Yichen Zhu, Zhou Zhao

We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

1 code implementation • 17 Jul 2023 • Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang

Our experiments show that BuboGPT achieves impressive multi-modal understanding and visual grounding abilities during interaction with humans.

Instruction Following Sentence +1

470 GitHub stars

DATE: Domain Adaptive Product Seeker for E-commerce

no code implementations • CVPR 2023 • Haoyuan Li, Hao Jiang, Tao Jin, Mengyan Li, Yan Chen, Zhijie Lin, Yang Zhao, Zhou Zhao

We present two cooperative seekers that simultaneously search for the image in product retrieval (PR) and localize the product in product grounding (PG).

Domain Adaptation Retrieval +1

Rethinking Multi-Contrast MRI Super-Resolution: Rectangle-Window Cross-Attention Transformer and Arbitrary-Scale Upsampling

no code implementations • ICCV 2023 • Guangyuan Li, Lei Zhao, Jiakai Sun, Zehua Lan, Zhanjie Zhang, Jiafu Chen, Zhijie Lin, Huaizhong Lin, Wei Xing

Recently, several methods have explored the potential of multi-contrast magnetic resonance imaging (MRI) super-resolution (SR) and obtained results superior to those of single-contrast SR methods.

Super-Resolution

Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models

2 code implementations • 26 Dec 2022 • Zijian Zhang, Zhou Zhao, Zhijie Lin

These findings imply that the gap corresponds to the information lost from the image, so the image can be reconstructed by filling the gap.

Image Reconstruction Representation Learning

265 GitHub stars

Pseudo Numerical Methods for Diffusion Models on Manifolds

7 code implementations • ICLR 2022 • Luping Liu, Yi Ren, Zhijie Lin, Zhou Zhao

From this perspective, we propose pseudo numerical methods for diffusion models (PNDMs).

Ranked #11 on Image Generation on CelebA 64x64

Denoising Image Generation

17,091 GitHub stars

A Survey: Deep Learning for Hyperspectral Image Classification with Few Labeled Samples

1 code implementation • 3 Dec 2021 • Sen Jia, Shuguo Jiang, Zhijie Lin, Nanying Li, Meng Xu, Shiqi Yu

In general, deep learning models often contain many trainable parameters and require a massive number of labeled samples to achieve optimal performance.

Active Learning Few-Shot Learning +2

ST-DDPM: Explore Class Clustering for Conditional Diffusion Probabilistic Models

no code implementations • 29 Sep 2021 • Zhijie Lin, Zijian Zhang, Zhou Zhao

Score-based generative models sequentially corrupt the data distribution with noise and then learn to recover the data distribution through score matching.

Clustering Conditional Image Generation

SimulLR: Simultaneous Lip Reading Transducer with Attention-Guided Adaptive Memory

no code implementations • 31 Aug 2021 • Zhijie Lin, Zhou Zhao, Haoyuan Li, Jinglin Liu, Meng Zhang, Xingshan Zeng, Xiaofei He

Lip reading, which aims to recognize spoken sentences from a video of lip movements without relying on the audio stream, has attracted great interest because of its applications in many scenarios.

Lip Reading

Cascaded Prediction Network via Segment Tree for Temporal Video Grounding

no code implementations • CVPR 2021 • Yang Zhao, Zhou Zhao, Zhu Zhang, Zhijie Lin

Temporal video grounding aims to localize the target segment which is semantically aligned with the given sentence in an untrimmed video.

Sentence Video Grounding

Learning to Rehearse in Long Sequence Memorization

no code implementations • 2 Jun 2021 • Zhu Zhang, Chang Zhou, Jianxin Ma, Zhijie Lin, Jingren Zhou, Hongxia Yang, Zhou Zhao

We design a history sampler that selects informative fragments for rehearsal training, so that the memory focuses on crucial information.

Memorization Question Answering +1

To Learn Effective Features: Understanding the Task-Specific Adaptation of MAML

no code implementations • 1 Jan 2021 • Zhijie Lin, Zhou Zhao, Zhu Zhang, Baoxing Huai, Jing Yuan

Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017) is one of the best-known gradient-based meta-learning algorithms; it learns a meta-initialization through inner and outer optimization loops.

Contrastive Learning Meta-Learning

Continual Memory: Can We Reason After Long-Term Memorization?

no code implementations • 1 Jan 2021 • Zhu Zhang, Chang Zhou, Zhou Zhao, Zhijie Lin, Jingren Zhou, Hongxia Yang

Existing reasoning tasks often follow the setting of "reasoning while experiencing", which assumes that the raw contents can always be accessed during reasoning.

Memorization

Counterfactual Contrastive Learning for Weakly-Supervised Vision-Language Grounding

no code implementations • NeurIPS 2020 • Zhu Zhang, Zhou Zhao, Zhijie Lin, Jieming Zhu, Xiuqiang He

Weakly-supervised vision-language grounding aims to localize a target moment in a video or a specific region in an image according to the given sentence query, where only video-level or image-level sentence annotations are provided during training.

Contrastive Learning counterfactual +2

Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos

1 code implementation • 19 Aug 2020 • Zhu Zhang, Zhijie Lin, Zhou Zhao, Jieming Zhu, Xiuqiang He

Thus, most existing weakly-supervised methods fail to distinguish the target moment from plausible negative moments.

Moment Retrieval Retrieval +1

Object-Aware Multi-Branch Relation Networks for Spatio-Temporal Video Grounding

no code implementations • 16 Aug 2020 • Zhu Zhang, Zhou Zhao, Zhijie Lin, Baoxing Huai, Nicholas Jing Yuan

Spatio-temporal video grounding aims to retrieve the spatio-temporal tube of a queried object according to the given sentence.

Object Relation +4

Weakly-Supervised Video Moment Retrieval via Semantic Completion Network

no code implementations • 19 Nov 2019 • Zhijie Lin, Zhou Zhao, Zhu Zhang, Qi Wang, Huasheng Liu

Video moment retrieval aims to find the moment that is most relevant to the given natural language query.

Moment Retrieval Retrieval +2

Localizing Unseen Activities in Video via Image Query

no code implementations • 28 Jun 2019 • Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Deng Cai

We therefore consider a new task, named Image-Based Activity Localization, that localizes unseen activities in videos via image queries.

Action Localization Video Understanding

Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks

no code implementations • 28 Jun 2019 • Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Xiaofei He

Concretely, we first develop a hierarchical convolutional self-attention encoder to efficiently model long-form video contents, which builds the hierarchical structure for video sequences and captures question-aware long-range dependencies from video context.

Answer Generation Question Answering +1

Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos

1 code implementation • 6 Jun 2019 • Zhu Zhang, Zhijie Lin, Zhou Zhao, Zhenxin Xiao

Query-based moment retrieval aims to localize the most relevant moment in an untrimmed video according to the given natural language query.

Moment Retrieval Natural Language Queries +2
