3 code implementations • 15 Oct 2024 • Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan
We show that multi-head attention can be expressed in the summation form.
3 code implementations • 9 Oct 2024 • Peng Jin, Bo Zhu, Li Yuan, Shuicheng Yan
In this work, we aim to simultaneously enhance the effectiveness and efficiency of Mixture-of-Experts (MoE) methods.
1 code implementation • 20 Aug 2024 • Haoran Tang, Meng Cao, Jinfa Huang, Ruyang Liu, Peng Jin, Ge Li, Xiaodan Liang
Text-Video Retrieval (TVR) aims to align and associate relevant video content with corresponding natural language queries.
no code implementations • 15 Jul 2024 • Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Runyi Yu, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen
Specifically, we provide an automated method for reference local action sampling and leverage graph attention networks to assess the guiding weight of each local action in the overall motion synthesis.
1 code implementation • 26 Jun 2024 • Zhongwei Wan, Ziang Wu, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan
Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference as the growth of their multimodal Key-Value (KV) cache, in response to increasing input lengths, challenges memory and time efficiency.
no code implementations • 29 May 2024 • Meng Cao, Haoran Tang, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li
Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries.
1 code implementation • 22 Feb 2024 • Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan
In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.
1 code implementation • 8 Feb 2024 • Dongyang Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao, Peng Gao
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX.
Ranked #11 on Video Question Answering on MVBench
2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Yatian Pang, Munan Ning, Li Yuan
In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs.
Ranked #117 on Visual Question Answering on MM-Vet
1 code implementation • 18 Jan 2024 • Zesen Cheng, Kehan Li, Hao Li, Peng Jin, Chang Liu, Xiawu Zheng, Rongrong Ji, Jie Chen
To mold instance queries to follow Brownian bridge and accomplish alignment with class texts, we design Bridge-Text Alignment (BTA) to learn discriminative bridge-level representations of instances via contrastive objectives.
1 code implementation • 20 Dec 2023 • Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan
The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency.
1 code implementation • 5 Dec 2023 • Hao Li, Curise Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan
In this paper, we propose the Style-Diversified Query-Based Image Retrieval task, which enables retrieval based on various query styles.
5 code implementations • 16 Nov 2023 • Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan
In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.
Ranked #7 on Zero-Shot Video Question Answer on TGIF-QA
4 code implementations • CVPR 2024 • Peng Jin, Ryuichi Takanobu, Wancai Zhang, Xiaochun Cao, Li Yuan
Large language models have demonstrated impressive universal capabilities across a wide range of open-ended tasks and have extended their utility to encompass multimodal conversations.
Image-based Generative Performance Benchmarking Language Modelling +10
1 code implementation • 8 Aug 2023 • Jiarong Ye, Haomiao Ni, Peng Jin, Sharon X. Huang, Yuan Xue
To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training.
no code implementations • 28 Jul 2023 • Peng Jin, Yinan Feng, Shihang Feng, Hanchen Wang, Yinpeng Chen, Benjamin Consolvo, Zicheng Liu, Youzuo Lin
This paper investigates the impact of big data on deep learning models to help solve the full waveform inversion (FWI) problem.
no code implementations • 21 Jun 2023 • Shihang Feng, Hanchen Wang, Chengyuan Deng, Yinan Feng, Yanhua Liu, Min Zhu, Peng Jin, Yinpeng Chen, Youzuo Lin
We conduct comprehensive numerical experiments to explore the relationship between P-wave and S-wave velocities in seismic data.
no code implementations • 19 Jun 2023 • Zesen Cheng, Peng Jin, Hao Li, Kehan Li, Siheng Li, Xiangyang Ji, Chang Liu, Jie Chen
Bottom-up methods are mainly perturbed by Inferior Positive (IP) errors due to the lack of prior object information.
4 code implementations • 20 May 2023 • Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen
In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings.
no code implementations • 17 May 2023 • Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen
Video question answering aims at answering a question about the video content by reasoning the alignment semantics within them.
no code implementations • 27 Apr 2023 • Yinan Feng, Yinpeng Chen, Peng Jin, Shihang Feng, Zicheng Liu, Youzuo Lin
Subsurface imaging involves solving full waveform inversion (FWI) to predict geophysical properties from measurements.
4 code implementations • CVPR 2023 • Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen
Contrastive learning-based video-language representation learning approaches, e. g., CLIP, have achieved outstanding performance, which pursue semantic interaction upon pre-defined video-text pairs.
Ranked #8 on Video Question Answering on MSRVTT-QA
no code implementations • ICCV 2023 • Kehan Li, Yian Zhao, Zhennan Wang, Zesen Cheng, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen
Interactive segmentation enables users to segment as needed by providing cues of objects, which introduces human-computer interaction for many fields, such as image editing and medical image analysis.
4 code implementations • ICCV 2023 • Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, Jie Chen
Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i. e., p(candidates|query).
Ranked #15 on Video Retrieval on MSVD
no code implementations • 13 Mar 2023 • Zesen Cheng, Kehan Li, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen
An intuitive materialization of our paradigm is Parallel Vertex Diffusion (PVD) to directly set vertex coordinates as the generation target and use a diffusion model to train and infer.
4 code implementations • 21 Nov 2022 • Peng Jin, Jinfa Huang, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen
Most video-and-language representation learning approaches employ contrastive learning, e. g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs.
Ranked #2 on Video Retrieval on LSMDC (text-to-video Mean Rank metric)
no code implementations • 21 Sep 2022 • Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen
Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grain spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models.
no code implementations • 28 Apr 2022 • Yinan Feng, Yinpeng Chen, Shihang Feng, Peng Jin, Zicheng Liu, Youzuo Lin
In particular, when dealing with the inversion from seismic data to subsurface velocity governed by a wave equation, the integral results of velocity with Gaussian kernels are linearly correlated to the integral of seismic data with sine kernels.
no code implementations • 3 Feb 2022 • Shihang Feng, Peng Jin, Xitong Zhang, Yinpeng Chen, David Alumbaugh, Michael Commer, Youzuo Lin
We explore a multi-physics inversion problem from two distinct measurements~(seismic and EM data) to three geophysical properties~(velocity, conductivity, and CO$_2$ saturation).
2 code implementations • 4 Nov 2021 • Chengyuan Deng, Shihang Feng, Hanchen Wang, Xitong Zhang, Peng Jin, Yinan Feng, Qili Zeng, Yinpeng Chen, Youzuo Lin
The recent success of data-driven FWI methods results in a rapidly increasing demand for open datasets to serve the geophysics community.
no code implementations • ICLR 2022 • Peng Jin, Xitong Zhang, Yinpeng Chen, Sharon Xiaolei Huang, Zicheng Liu, Youzuo Lin
In particular, we use finite difference to approximate the forward modeling of PDE as a differentiable operator (from velocity map to seismic data) and model its inversion by CNN (from seismic data to velocity map).
no code implementations • 13 Jun 2021 • Peng Jin, Min Zhang, Jianwen Li, Li Han, Xuejun Wen
Formally verifying Deep Reinforcement Learning (DRL) systems is a challenging task due to the dynamic continuity of system behaviors and the black-box feature of embedded neural networks.
no code implementations • 26 Feb 2021 • Liang Chen, Peng Jin, Jing Yang, Yang Li, Yi Song
To obtain the accurate transient states of the big scale natural gas pipeline networks under the bad data and non-zero mean noises conditions, a robust Kalman filter-based dynamic state estimation method is proposed using the linearized gas pipeline transient flow equations in this paper.
no code implementations • SEMEVAL 2020 • Yice Zhang, Jiaxuan Lin, Yang Fan, Peng Jin, Yuanchao Liu, Bingquan Liu
For this task, it is obvious that external knowledge, such as Knowledge graph, can help the model understand commonsense in natural language statements.
no code implementations • 2 Oct 2020 • Vishal Mandal, Abdul Rashid Mussah, Peng Jin, Yaw Adu-Gyamfi
Real-time object detection algorithms coupled with different tracking systems are deployed to automatically detect stranded vehicles as well as perform vehicular counts.
1 code implementation • 4 May 2020 • Ping Cai, Xingyuan Chen, Peng Jin, Hongjun Wang, Tianrui Li
The purpose of unconditional text generation is to train a model with real sentences, then generate novel sentences of the same quality and diversity as the training data.
1 code implementation • 5 Apr 2020 • Xingyuan Chen, Ping Cai, Peng Jin, Hongjun Wang, Xin-yu Dai, Jia-Jun Chen
To alleviate the exposure bias, generative adversarial networks (GAN) use the discriminator to update the generator's parameters directly, but they fail by being evaluated precisely.
no code implementations • 20 Oct 2019 • Hamed Majidifard, Peng Jin, Yaw Adu-Gyamfi, William G. Buttlar
Automated pavement distresses detection using road images remains a challenging topic in the computer vision research community.
no code implementations • 28 Sep 2019 • Xingyuan Chen, Ping Cai, Peng Jin, Haokun Du, Hongjun Wang, Xingyu Dai, Jia-Jun Chen
In this paper, we theoretically propose two metric functions to measure the distributional difference between real text and generated text.
1 code implementation • 30 May 2019 • Xingyuan Chen, Yanzhe Li, Peng Jin, Jiuhua Zhang, Xin-yu Dai, Jia-Jun Chen, Gang Song
It is easy to improve the existing GAN-based models with this mechanism.
no code implementations • 13 Jan 2017 • Jielei Chu, Hongjun Wang, Hua Meng, Peng Jin, Tianrui Li
To enhance the expression ability of traditional RBMs, in this paper, we propose pairwise constraints restricted Boltzmann machine with Gaussian visible units (pcGRBM) model, in which the learning procedure is guided by pairwise constraints and the process of encoding is conducted under these guidances.
no code implementations • LREC 2012 • Yunqing Xia, Guoyu Tang, Peng Jin, Xia Yang
A preliminary evaluation with CLTC corpus indicates that the corpus is effective in evaluating cross-lingual topic detection methods.