1 code implementation • arXiv 2024 • Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng
PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks.
Ranked #1 on Zero-Shot Video Question Answer on TGIF-QA
Video-based Generative Performance Benchmarking (Consistency), Video-based Generative Performance Benchmarking (Contextual Understanding)
no code implementations • 9 Jan 2024 • Weimin WANG, Jiawei Liu, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng
The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field.
no code implementations • 12 Nov 2023 • Yilin Zhao, Xinbin Yuan, ShangHua Gao, Zhijie Lin, Qibin Hou, Jiashi Feng, Daquan Zhou
For MoV, we utilize text-to-speech (TTS) algorithms with a variety of pre-defined tones and automatically select the best-matching one based on the user-provided text description.
1 code implementation • 7 Nov 2023 • Lijuan Liu, Xiangyu Xu, Zhijie Lin, Jiabin Liang, Shuicheng Yan
In this work, we explore the challenging problem of recovering garment sewing patterns from daily photos for augmenting these applications.
no code implementations • 15 Oct 2023 • Zijian Zhang, Luping Liu, Zhijie Lin, Yichen Zhu, Zhou Zhao
We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models.
1 code implementation • 17 Jul 2023 • Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang
Our experiments show that BuboGPT achieves impressive multi-modality understanding and visual grounding abilities during interaction with humans.
no code implementations • CVPR 2023 • Haoyuan Li, Hao Jiang, Tao Jin, Mengyan Li, Yan Chen, Zhijie Lin, Yang Zhao, Zhou Zhao
Then, we present two cooperative seekers to simultaneously search the image for PR and localize the product for PG.
no code implementations • ICCV 2023 • Guangyuan Li, Lei Zhao, Jiakai Sun, Zehua Lan, Zhanjie Zhang, Jiafu Chen, Zhijie Lin, Huaizhong Lin, Wei Xing
Recently, several methods have explored the potential of multi-contrast magnetic resonance imaging (MRI) super-resolution (SR) and obtained results superior to single-contrast SR methods.
2 code implementations • 26 Dec 2022 • Zijian Zhang, Zhou Zhao, Zhijie Lin
These imply that the gap corresponds to the lost information of the image, and we can reconstruct the image by filling the gap.
7 code implementations • ICLR 2022 • Luping Liu, Yi Ren, Zhijie Lin, Zhou Zhao
Under such a perspective, we propose pseudo numerical methods for diffusion models (PNDMs).
Ranked #11 on Image Generation on CelebA 64x64
1 code implementation • 3 Dec 2021 • Sen Jia, Shuguo Jiang, Zhijie Lin, Nanying Li, Meng Xu, Shiqi Yu
In general, deep learning models often contain many trainable parameters and require a massive number of labeled samples to achieve optimal performance.
no code implementations • 29 Sep 2021 • Zhijie Lin, Zijian Zhang, Zhou Zhao
Score-based generative models involve sequentially corrupting the data distribution with noise and then learning to recover the data distribution based on score matching.
no code implementations • 31 Aug 2021 • Zhijie Lin, Zhou Zhao, Haoyuan Li, Jinglin Liu, Meng Zhang, Xingshan Zeng, Xiaofei He
Lip reading, aiming to recognize spoken sentences according to the given video of lip movements without relying on the audio stream, has attracted great interest due to its application in many scenarios.
no code implementations • CVPR 2021 • Yang Zhao, Zhou Zhao, Zhu Zhang, Zhijie Lin
Temporal video grounding aims to localize the target segment which is semantically aligned with the given sentence in an untrimmed video.
no code implementations • 2 Jun 2021 • Zhu Zhang, Chang Zhou, Jianxin Ma, Zhijie Lin, Jingren Zhou, Hongxia Yang, Zhou Zhao
Further, we design a history sampler to select informative fragments for rehearsal training, making the memory focus on the crucial information.
no code implementations • 1 Jan 2021 • Zhijie Lin, Zhou Zhao, Zhu Zhang, Huai Baoxing, Jing Yuan
Model-Agnostic Meta-Learning (MAML) (Finn et al., 2017) is one of the best-known gradient-based meta-learning algorithms, learning the meta-initialization through inner and outer optimization loops.
no code implementations • 1 Jan 2021 • Zhu Zhang, Chang Zhou, Zhou Zhao, Zhijie Lin, Jingren Zhou, Hongxia Yang
Existing reasoning tasks often follow the setting of "reasoning while experiencing", which rests on the important assumption that the raw contents can always be accessed while reasoning.
no code implementations • NeurIPS 2020 • Zhu Zhang, Zhou Zhao, Zhijie Lin, Jieming Zhu, Xiuqiang He
Weakly-supervised vision-language grounding aims to localize a target moment in a video or a specific region in an image according to the given sentence query, where only video-level or image-level sentence annotations are provided during training.
1 code implementation • 19 Aug 2020 • Zhu Zhang, Zhijie Lin, Zhou Zhao, Jieming Zhu, Xiuqiang He
Thus, these methods fail to distinguish the target moment from plausible negative moments.
no code implementations • 16 Aug 2020 • Zhu Zhang, Zhou Zhao, Zhijie Lin, Baoxing Huai, Nicholas Jing Yuan
Spatio-temporal video grounding aims to retrieve the spatio-temporal tube of a queried object according to the given sentence.
no code implementations • 19 Nov 2019 • Zhijie Lin, Zhou Zhao, Zhu Zhang, Qi Wang, Huasheng Liu
Video moment retrieval aims to find the moment that is most relevant to the given natural language query.
no code implementations • 28 Jun 2019 • Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Deng Cai
Thus, we consider a new task to localize unseen activities in videos via image queries, named Image-Based Activity Localization.
no code implementations • 28 Jun 2019 • Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Xiaofei He
Concretely, we first develop a hierarchical convolutional self-attention encoder to efficiently model long-form video contents, which builds the hierarchical structure for video sequences and captures question-aware long-range dependencies from video context.
1 code implementation • 6 Jun 2019 • Zhu Zhang, Zhijie Lin, Zhou Zhao, Zhenxin Xiao
Query-based moment retrieval aims to localize the most relevant moment in an untrimmed video according to the given natural language query.