Search Results for author: Peng Jin

Found 44 papers, 21 papers with code

MoH: Multi-Head Attention as Mixture-of-Head Attention

3 code implementations • 15 Oct 2024 • Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan

We show that multi-head attention can be expressed in the summation form.

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

3 code implementations • 9 Oct 2024 • Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan

In this work, we aim to simultaneously enhance the effectiveness and efficiency of Mixture-of-Experts (MoE) methods.

MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval

1 code implementation • 20 Aug 2024 • Haoran Tang, Meng Cao, Jinfa Huang, Ruyang Liu, Peng Jin, Ge Li, Xiaodan Liang

Text-Video Retrieval (TVR) aims to align and associate relevant video content with corresponding natural language queries.

Mamba Natural Language Queries +2

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

no code implementations • 15 Jul 2024 • Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Runyi Yu, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen

Specifically, we provide an automated method for reference local action sampling and leverage graph attention networks to assess the guiding weight of each local action in the overall motion synthesis.

Graph Attention Motion Generation +1

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

1 code implementation • 26 Jun 2024 • Zhongwei Wan, Ziang Wu, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan

Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference as the growth of their multimodal Key-Value (KV) cache, in response to increasing input lengths, challenges memory and time efficiency.

LLMBind: A Unified Modality-Task Integration Framework

1 code implementation • 22 Feb 2024 • Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan

In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.

AI Agent Audio Generation +3

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation

1 code implementation • 18 Jan 2024 • Zesen Cheng, Kehan Li, Hao Li, Peng Jin, Chang Liu, Xiawu Zheng, Rongrong Ji, Jie Chen

To mold instance queries to follow Brownian bridge and accomplish alignment with class texts, we design Bridge-Text Alignment (BTA) to learn discriminative bridge-level representations of instances via contrastive objectives.

Instance Segmentation Semantic Segmentation +1

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

1 code implementation • 20 Dec 2023 • Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan

The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency.

3D Generation Image to 3D

FreestyleRet: Retrieving Images from Style-Diversified Queries

1 code implementation • 5 Dec 2023 • Hao Li, Curise Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan

In this paper, we propose the Style-Diversified Query-Based Image Retrieval task, which enables retrieval based on various query styles.

Image Retrieval Retrieval

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

5 code implementations • 16 Nov 2023 • Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan

In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.

Language Modelling Large Language Model +3

Synthetic Augmentation with Large-scale Unconditional Pre-training

1 code implementation • 8 Aug 2023 • Jiarong Ye, Haomiao Ni, Peng Jin, Sharon X. Huang, Yuan Xue

To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training.

An Empirical Study of Large-Scale Data-Driven Full Waveform Inversion

no code implementations • 28 Jul 2023 • Peng Jin, Yinan Feng, Shihang Feng, Hanchen Wang, Yinpeng Chen, Benjamin Consolvo, Zicheng Liu, Youzuo Lin

This paper investigates the impact of big data on deep learning models to help solve the full waveform inversion (FWI) problem.

Deep Learning SSIM

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

4 code implementations • 20 May 2023 • Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen

In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings.

Retrieval Video Retrieval

TG-VQA: Ternary Game of Video Question Answering

no code implementations • 17 May 2023 • Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen

Video question answering aims at answering a question about the video content by reasoning about the alignment semantics within them.

Contrastive Learning Question Answering +2

Auto-Linear Phenomenon in Subsurface Imaging

no code implementations • 27 Apr 2023 • Yinan Feng, Yinpeng Chen, Peng Jin, Shihang Feng, Zicheng Liu, Youzuo Lin

Subsurface imaging involves solving full waveform inversion (FWI) to predict geophysical properties from measurements.

Decoder Geophysics +2

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

4 code implementations • CVPR 2023 • Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen

Contrastive learning-based video-language representation learning approaches, e.g., CLIP, which pursue semantic interaction upon pre-defined video-text pairs, have achieved outstanding performance.

Contrastive Learning Question Answering +5

Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation

no code implementations • ICCV 2023 • Kehan Li, Yian Zhao, Zhennan Wang, Zesen Cheng, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen

Interactive segmentation enables users to segment as needed by providing cues of objects, which introduces human-computer interaction for many fields, such as image editing and medical image analysis.

Interactive Segmentation Medical Image Analysis

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

4 code implementations • ICCV 2023 • Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, Jie Chen

Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i.e., p(candidates|query).

Retrieval Video Retrieval

Parallel Vertex Diffusion for Unified Visual Grounding

no code implementations • 13 Mar 2023 • Zesen Cheng, Kehan Li, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen

An intuitive materialization of our paradigm is Parallel Vertex Diffusion (PVD) to directly set vertex coordinates as the generation target and use a diffusion model to train and infer.

Visual Grounding

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

4 code implementations • 21 Nov 2022 • Peng Jin, Jinfa Huang, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen

Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs.

Ranked #2 on Video Retrieval on LSMDC (text-to-video Mean Rank metric)

Contrastive Learning Representation Learning +5

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering

no code implementations • 21 Sep 2022 • Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen

Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grained spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models.

Image Captioning Optical Character Recognition (OCR) +2

An Intriguing Property of Geophysics Inversion

no code implementations • 28 Apr 2022 • Yinan Feng, Yinpeng Chen, Shihang Feng, Peng Jin, Zicheng Liu, Youzuo Lin

In particular, when dealing with the inversion from seismic data to subsurface velocity governed by a wave equation, the integral results of velocity with Gaussian kernels are linearly correlated to the integral of seismic data with sine kernels.

Decoder Geophysics

Extremely Weak Supervision Inversion of Multi-physical Properties

no code implementations • 3 Feb 2022 • Shihang Feng, Peng Jin, Xitong Zhang, Yinpeng Chen, David Alumbaugh, Michael Commer, Youzuo Lin

We explore a multi-physics inversion problem from two distinct measurements (seismic and EM data) to three geophysical properties (velocity, conductivity, and CO$_2$ saturation).

Geophysics

OpenFWI: Large-Scale Multi-Structural Benchmark Datasets for Seismic Full Waveform Inversion

2 code implementations • 4 Nov 2021 • Chengyuan Deng, Shihang Feng, Hanchen Wang, Xitong Zhang, Peng Jin, Yinan Feng, Qili Zeng, Yinpeng Chen, Youzuo Lin

The recent success of data-driven FWI methods results in a rapidly increasing demand for open datasets to serve the geophysics community.

2k Benchmarking +2

Unsupervised Learning of Full-Waveform Inversion: Connecting CNN and Partial Differential Equation in a Loop

no code implementations • ICLR 2022 • Peng Jin, Xitong Zhang, Yinpeng Chen, Sharon Xiaolei Huang, Zicheng Liu, Youzuo Lin

In particular, we use finite difference to approximate the forward modeling of PDE as a differentiable operator (from velocity map to seismic data) and model its inversion by CNN (from seismic data to velocity map).

Geophysics

Learning on Abstract Domains: A New Approach for Verifiable Guarantee in Reinforcement Learning

no code implementations • 13 Jun 2021 • Peng Jin, Min Zhang, Jianwen Li, Li Han, Xuejun Wen

Formally verifying Deep Reinforcement Learning (DRL) systems is a challenging task due to the dynamic continuity of system behaviors and the black-box feature of embedded neural networks.

reinforcement-learning Reinforcement Learning (RL)

Robust Kalman filter-based dynamic state estimation of natural gas pipeline networks

no code implementations • 26 Feb 2021 • Liang Chen, Peng Jin, Jing Yang, Yang Li, Yi Song

To obtain accurate transient states of large-scale natural gas pipeline networks in the presence of bad data and non-zero-mean noise, this paper proposes a robust Kalman filter-based dynamic state estimation method using the linearized gas pipeline transient flow equations.

CN-HIT-IT.NLP at SemEval-2020 Task 4: Enhanced Language Representation with Multiple Knowledge Triples

no code implementations • SEMEVAL 2020 • Yice Zhang, Jiaxuan Lin, Yang Fan, Peng Jin, Yuanchao Liu, Bingquan Liu

For this task, it is obvious that external knowledge, such as knowledge graphs, can help the model understand commonsense in natural language statements.

Knowledge Graphs

Artificial Intelligence Enabled Traffic Monitoring System

no code implementations • 2 Oct 2020 • Vishal Mandal, Abdul Rashid Mussah, Peng Jin, Yaw Adu-Gyamfi

Real-time object detection algorithms coupled with different tracking systems are deployed to automatically detect stranded vehicles as well as perform vehicular counts.

Management object-detection +1

Distributional Discrepancy: A Metric for Unconditional Text Generation

1 code implementation • 4 May 2020 • Ping Cai, Xingyuan Chen, Peng Jin, Hongjun Wang, Tianrui Li

The purpose of unconditional text generation is to train a model with real sentences, then generate novel sentences of the same quality and diversity as the training data.

Diversity Language Modelling +1

Adding A Filter Based on The Discriminator to Improve Unconditional Text Generation

1 code implementation • 5 Apr 2020 • Xingyuan Chen, Ping Cai, Peng Jin, Hongjun Wang, Xin-yu Dai, Jia-Jun Chen

To alleviate exposure bias, generative adversarial networks (GANs) use the discriminator to update the generator's parameters directly, but they fail when evaluated precisely.

Language Modelling Text Generation

Pavement Image Datasets: A New Benchmark Dataset to Classify and Densify Pavement Distresses

no code implementations • 20 Oct 2019 • Hamed Majidifard, Peng Jin, Yaw Adu-Gyamfi, William G. Buttlar

Automated pavement distress detection using road images remains a challenging topic in the computer vision research community.

Deep Learning

The Detection of Distributional Discrepancy for Text Generation

no code implementations • 28 Sep 2019 • Xingyuan Chen, Ping Cai, Peng Jin, Haokun Du, Hongjun Wang, Xingyu Dai, Jia-Jun Chen

In this paper, we theoretically propose two metric functions to measure the distributional difference between real text and generated text.

Language Modelling Text Generation

Restricted Boltzmann Machines with Gaussian Visible Units Guided by Pairwise Constraints

no code implementations • 13 Jan 2017 • Jielei Chu, Hongjun Wang, Hua Meng, Peng Jin, Tianrui Li

To enhance the expressive ability of traditional RBMs, in this paper we propose the pairwise-constraints restricted Boltzmann machine with Gaussian visible units (pcGRBM) model, in which the learning procedure is guided by pairwise constraints and the encoding process is conducted under this guidance.

Clustering

CLTC: A Chinese-English Cross-lingual Topic Corpus

no code implementations • LREC 2012 • Yunqing Xia, Guoyu Tang, Peng Jin, Xia Yang

A preliminary evaluation with the CLTC corpus indicates that the corpus is effective in evaluating cross-lingual topic detection methods.

Clustering Text Clustering
