Search Results for author: Ziyun Zeng

Found 11 papers, 10 papers with code

GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval

1 code implementation • 8 Oct 2023 • Yuting Wang, Jinpeng Wang, Bin Chen, Ziyun Zeng, Shu-Tao Xia

Current partially relevant video retrieval (PRVR) methods adopt scanning-based clip construction to achieve explicit clip modeling, which is information-redundant and requires a large storage overhead.

Partially Relevant Video Retrieval • Retrieval • +1

Making LLaMA SEE and Draw with SEED Tokenizer

1 code implementation • 2 Oct 2023 • Yuying Ge, Sijie Zhao, Ziyun Zeng, Yixiao Ge, Chen Li, Xintao Wang, Ying Shan

We identify two crucial design principles: (1) Image tokens should be independent of 2D physical patch positions and instead be produced with a 1D causal dependency, exhibiting intrinsic interdependence that aligns with the left-to-right autoregressive prediction mechanism in LLMs.

multimodal generation

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

1 code implementation • 28 Aug 2023 • Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell

Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and experience difficulties tracking small or divergent motions.

Instance Segmentation • Optical Flow Estimation • +5

MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation

1 code implementation • 22 Aug 2023 • Jinpeng Wang, Ziyun Zeng, Yunxiao Wang, Yuting Wang, Xingyu Lu, Tianxiang Li, Jun Yuan, Rui Zhang, Hai-Tao Zheng, Shu-Tao Xia

We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation (SR). On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests while a novel interest-aware decoder is developed to grasp item-modality-interest relations for better sequence representation.

Contrastive Learning • Sequential Recommendation • +1

Planting a SEED of Vision in Large Language Model

1 code implementation • 16 Jul 2023 • Yuying Ge, Yixiao Ge, Ziyun Zeng, Xintao Wang, Ying Shan

Research on image tokenizers has previously reached an impasse, as frameworks employing quantized visual tokens have lost prominence due to subpar performance and convergence in multimodal comprehension (compared to BLIP-2, etc.).

Language Modelling • Large Language Model • +1

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

1 code implementation • 23 May 2023 • Ziyun Zeng, Yixiao Ge, Zhan Tong, Xihui Liu, Shu-Tao Xia, Ying Shan

We argue that tuning a text encoder end-to-end, as done in previous work, is suboptimal since it may overfit in terms of styles, thereby losing its original generalization ability to capture the semantics of various language registers.

Representation Learning

Contrastive Masked Autoencoders for Self-Supervised Video Hashing

1 code implementation • 21 Nov 2022 • Yuting Wang, Jinpeng Wang, Bin Chen, Ziyun Zeng, Shutao Xia

To capture video semantic information for better hashing learning, we adopt an encoder-decoder structure to reconstruct the video from its temporal-masked frames.

Retrieval • Video Retrieval • +2

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

1 code implementation • CVPR 2023 • Ziyun Zeng, Yuying Ge, Xihui Liu, Bin Chen, Ping Luo, Shu-Tao Xia, Yixiao Ge

Pre-training on large-scale video data has become a common recipe for learning transferable spatiotemporal representations in recent years.

Descriptive • Representation Learning • +1

Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval

1 code implementation • 7 Feb 2022 • Jinpeng Wang, Bin Chen, Dongliang Liao, Ziyun Zeng, Gongfu Li, Shu-Tao Xia, Jin Xu

By performing Asymmetric-Quantized Contrastive Learning (AQ-CL) across views, HCQ aligns texts and videos at coarse-grained and multiple fine-grained levels.

Contrastive Learning • Quantization • +4

PHPQ: Pyramid Hybrid Pooling Quantization for Efficient Fine-Grained Image Retrieval

no code implementations • 11 Sep 2021 • Ziyun Zeng, Jinpeng Wang, Bin Chen, Tao Dai, Shu-Tao Xia, Zhi Wang

To improve fine-grained image hashing, we propose Pyramid Hybrid Pooling Quantization (PHPQ).

Deep Hashing • Image Retrieval • +1

Contrastive Quantization with Code Memory for Unsupervised Image Retrieval

1 code implementation • 11 Sep 2021 • Jinpeng Wang, Ziyun Zeng, Bin Chen, Tao Dai, Shu-Tao Xia

The high efficiency in computation and storage makes hashing (including binary hashing and quantization) a common strategy in large-scale retrieval systems.

Contrastive Learning • Deep Hashing • +1
