Search Results for author: Jian Yang

Found 389 papers, 169 papers with code

Zero-Shot Image Super-Resolution with Depth Guided Internal Degradation Learning

no code implementations · ECCV 2020 · Xi Cheng, Zhen-Yong Fu, Jian Yang

In the past few years, we have witnessed the great progress of image super-resolution (SR) thanks to the power of deep learning.

Image Super-Resolution

Patch Triplet Similarity Purification for Guided Real-World Low-Dose CT Image Denoising

no code implementations · 1 Feb 2025 · Junhao Long, Fengwei Yang, Juncheng Yan, Baoping Zhang, Chao Jin, Jian Yang, Changliang Zou, Jun Xu

Since non-contrast CT (NCCT) images share content characteristics with the corresponding normal-dose CT (NDCT) images in a three-phase scan, they can potentially provide useful information for real-world low-dose CT (LDCT) image denoising.

Image Denoising Triplet

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

1 code implementation · 23 Jan 2025 · Tao Liu, Kai Wang, Senmao Li, Joost Van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng

Drawing inspiration from the inherent context consistency, we propose a novel training-free method for consistent text-to-image (T2I) generation, termed "One-Prompt-One-Story" (1Prompt1Story).

Story Generation Text-to-Image Generation

Three-view Focal Length Recovery From Homographies

1 code implementation · 13 Jan 2025 · Yaqing Ding, Viktor Kocur, Zuzana Berger Haladová, Qianliang Wu, Shen Cai, Jian Yang, Zuzana Kukelova

In this paper, we propose a novel approach for recovering focal lengths from three-view homographies.

Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation

1 code implementation · 13 Jan 2025 · Yaqing Ding, Václav Vávra, Viktor Kocur, Jian Yang, Torsten Sattler, Zuzana Kukelova

We derive efficient solvers for three cases: (1) two calibrated cameras, (2) two uncalibrated cameras with an unknown but shared focal length, and (3) two uncalibrated cameras with unknown and different focal lengths.

Camera Pose Estimation Depth Estimation +2

Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation

no code implementations · 10 Jan 2025 · Minxing Luo, Zixun Xia, Liaojun Chen, Zhenhang Li, Weichao Zeng, Jianye Wang, Wentao Cheng, Yaxing Wang, Yu Zhou, Jian Yang

In this paper, we introduce a new training-free framework, STGen, which accurately generates visual texts in challenging scenarios (e.g., slanted or curved text layouts) while harmonizing them with the text background.

Text Generation

RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark

1 code implementation · 8 Jan 2025 · Xin Zhang, Xue Yang, YuXuan Li, Jian Yang, Ming-Ming Cheng, Xiang Li

Our approach effectively improves the performance of existing state-of-the-art weakly supervised methods and even surpasses fully supervised models on existing optical benchmarks (i.e., the DOTA-v1.0 dataset).

object-detection Object Detection +1

EasySplat: View-Adaptive Learning makes 3D Gaussian Splatting Easy

no code implementations · 2 Jan 2025 · Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang

In this way, the proposed method addresses the limitations of initialization and optimization, leading to efficient and accurate 3DGS modeling.

3DGS Novel View Synthesis

SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection

1 code implementation · 30 Dec 2024 · YuXuan Li, Xiang Li, Yunheng Li, YiCheng Zhang, Yimian Dai, Qibin Hou, Ming-Ming Cheng, Jian Yang

To address these, we establish a benchmark dataset and propose a unified model, SM3Det (Single Model for Multi-Modal datasets and Multi-Task object Detection).

object-detection Object Detection

Towards Better Spherical Sliced-Wasserstein Distance Learning with Data-Adaptive Discriminative Projection Direction

no code implementations · 26 Dec 2024 · Hongliang Zhang, Shuo Chen, Lei Luo, Jian Yang

To address this issue, we propose a novel data-adaptive Discriminative Spherical Sliced-Wasserstein (DSSW) distance, which utilizes a projected energy function to determine the discriminative projection direction for SSW.

Density Estimation Representation Learning +1

ERGNN: Spectral Graph Neural Network With Explicitly-Optimized Rational Graph Filters

no code implementations · 26 Dec 2024 · Guoming Li, Jian Yang, Shangsong Liang

Approximation-based spectral graph neural networks, which construct graph filters with function approximation, have shown strong performance in graph learning tasks.

Graph Learning Graph Neural Network

Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion

no code implementations · 26 Dec 2024 · Zhiqiang Yan, Zhengxue Wang, Kun Wang, Jun Li, Jian Yang

In this paper, we introduce the Selective Image Guided Network (SigNet), a novel degradation-aware framework that transforms depth completion into depth enhancement for the first time.

Depth Completion Mamba

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

1 code implementation · 20 Dec 2024 · Xiantao Hu, Ying Tai, Xu Zhao, Chen Zhao, Zhenyu Zhang, Jun Li, Bineng Zhong, Jian Yang

These temporal information tokens are used to guide the localization of the target in the next time state, establish long-range contextual relationships between video frames, and capture the temporal trajectory of the target.

Mamba Rgb-T Tracking +1

StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors

1 code implementation · 16 Dec 2024 · Xiaokun Sun, Zeyu Cai, Ying Tai, Jian Yang, Zhenyu Zhang

We propose StrandHead, a novel text-to-3D head avatar generation method capable of generating disentangled 3D hair with a strand representation.

Diversity Text to 3D

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

1 code implementation · 16 Dec 2024 · Liang Chen, Zekun Wang, Shuhuai Ren, Lei Li, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, Tianyu Liu, Baobao Chang

As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks from different modalities can also be effectively encapsulated within the next-token prediction (NTP) framework, transforming multimodal information into tokens and predicting the next one given the context.

Language Modeling Language Modelling +2

Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video

no code implementations · 16 Dec 2024 · Junkai Fan, Kun Wang, Zhiqiang Yan, Xiang Chen, Shangbing Gao, Jun Li, Jian Yang

In this paper, we study the challenging problem of simultaneously removing haze and estimating depth from real monocular hazy videos.

Depth Estimation

ATPrompt: Textual Prompt Learning with Embedded Attributes

1 code implementation · 12 Dec 2024 · Zheng Li, Yibing Song, Penghai Zhao, Ming-Ming Cheng, Xiang Li, Jian Yang

Text-based prompt learning methods primarily employ multiple learnable soft prompts and hard class tokens in a cascading manner as text prompt inputs, aiming to align the image and text (category) spaces for downstream tasks.

Attribute Large Language Model

Agent-based Video Trimming

no code implementations · 12 Dec 2024 · Lingfeng Yang, Zhenyuan Chen, Xiang Li, Peiyang Jia, Liangqu Long, Jian Yang

As information becomes more accessible, user-generated videos are increasing in length, placing a burden on viewers to sift through vast content for valuable insights.

Highlight Detection Moment Retrieval +2

Adaptive$^2$: Adaptive Domain Mining for Fine-grained Domain Adaptation Modeling

no code implementations · 11 Dec 2024 · Wenxuan Sun, Zixuan Yang, Yunli Wang, Zhen Zhang, Zhiqiang Wang, Yu Li, Jian Yang, Yiming Yang, Shiyang Wen, Peng Jiang, Kun Gai

To the best of our knowledge, Adaptive$^2$ is the first approach to automatically learn both domain identification and adaptation in online advertising, opening new research directions for this area.

Domain Adaptation

Customized Generation Reimagined: Fidelity and Editability Harmonized

1 code implementation · 6 Dec 2024 · Jian Jin, Yang Shen, ZhenYong Fu, Jian Yang

Customized generation aims to incorporate a novel concept into a pre-trained text-to-image model, enabling new generations of the concept in novel contexts guided by textual prompts.

Denoising

Behavior Backdoor for Deep Learning Models

no code implementations · 2 Dec 2024 · Jiakai Wang, Pengfei Zhang, Renshuai Tao, Jian Yang, Hao Liu, Xianglong Liu, Yunchao Wei, Yao Zhao

Specifically, to adapt to the optimization goal of the behavior backdoor, we introduce a behavior-driven backdoor object optimizing method based on a bi-target behavior backdoor training loss, which guides the optimization direction of the poisoned model.

Backdoor Attack Deep Learning +1

Towards Robust Cross-Domain Recommendation with Joint Identifiability of User Preference

no code implementations · 26 Nov 2024 · Jing Du, Zesheng Ye, Bin Guo, Zhiwen Yu, Jia Wu, Jian Yang, Michael Sheng, Lina Yao

To achieve this, we introduce a hierarchical user preference modeling framework that organizes user representations by the neural network encoder's depth, allowing separate treatment of shallow and deeper subspaces.

Disentanglement Transfer Learning

LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation

no code implementations · 22 Nov 2024 · Fan Deng, Yaguang Wu, Xinyang Yu, Xiangjun Huang, Jian Yang, Guangyu Yan, Qiang Xu

Recently, text-to-image models based on diffusion have achieved remarkable success in generating high-quality images.

Scaling Laws for Online Advertisement Retrieval

no code implementations · 20 Nov 2024 · Yunli Wang, Zixuan Yang, Zhen Zhang, Zhiqiang Wang, Jian Yang, Shiyang Wen, Peng Jiang, Kun Gai

To the best of our knowledge, this is the first work to study scaling laws for online advertisement retrieval in real-world systems, showing the great potential of scaling laws for advertising system optimization.

Retrieval

Leveraging MLLM Embeddings and Attribute Smoothing for Compositional Zero-Shot Learning

no code implementations · 18 Nov 2024 · Xudong Yan, Songhe Feng, Yang Zhang, Jian Yang, Yueguan Lin, Haojun Fei

Moreover, we propose attribute smoothing with auxiliary attributes generated by a large language model (LLM) for seen compositions, addressing the issue of overconfidence by encouraging the model to learn more attributes in a given composition.

Attribute Compositional Zero-Shot Learning +8

United Domain Cognition Network for Salient Object Detection in Optical Remote Sensing Images

1 code implementation · 11 Nov 2024 · Yanguang Sun, Jian Yang, Lei Luo

Technically, we first design a frequency-spatial domain transformer block that mutually amalgamates the complementary local spatial and global frequency features to strengthen the capability of the initial input features.

object-detection Object Detection +1

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

1 code implementation · 11 Nov 2024 · Taihang Hu, Linxuan Li, Joost Van de Weijer, Hongcheng Gao, Fahad Shahbaz Khan, Jian Yang, Ming-Ming Cheng, Kai Wang, Yaxing Wang

In this paper, we define semantic binding as the task of associating a given object with its attribute, termed attribute binding, or linking it to other related sub-objects, referred to as object binding.

Attribute Image Generation +1

Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement

1 code implementation · 10 Nov 2024 · Zhennan Chen, Yajie Li, Haofan Wang, Zhibo Chen, Zhengkai Jiang, Jun Li, Qian Wang, Jian Yang, Ying Tai

Regional prompting, or compositional generation, which enables fine-grained spatial control, has gained increasing attention for its practicality in real-world applications.

Attribute RAG +1

Compressive Spectrum Sensing with 1-bit ADCs

no code implementations · 7 Nov 2024 · Jian Yang, Zihang Song, Han Zhang, Yue Gao

Efficient wideband spectrum sensing (WSS) is essential for managing spectrum scarcity in wireless communications.

Quantization

Domain Generalization for Cross-Receiver Radio Frequency Fingerprint Identification

no code implementations · 6 Nov 2024 · Ying Zhang, Qiang Li, Hongli Liu, Liu Yang, Jian Yang

Radio Frequency Fingerprint Identification (RFFI) technology identifies emitters by analyzing the unique distortions in the transmitted signal caused by non-ideal hardware.

Domain Generalization Federated Learning

MdEval: Massively Multilingual Code Debugging

no code implementations · 4 Nov 2024 · Shukai Liu, Linzheng Chai, Jian Yang, Jiajun Shi, He Zhu, Liran Wang, Ke Jin, Wei Zhang, Hualei Zhu, Shuyue Guo, Tao Sun, Jiaheng Liu, Yunlong Duan, Yu Hao, Liqun Yang, Guanglin Niu, Ge Zhang, Zhoujun Li

Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet.

Program Repair

Efficient Non-Exemplar Class-Incremental Learning with Retrospective Feature Synthesis

no code implementations · 3 Nov 2024 · Liang Bai, Hong Song, Yucong Lin, Tianyu Fu, Deqiang Xiao, Danni Ai, Jingfan Fan, Jian Yang

Additionally, we introduce a similarity-based feature compensation mechanism that integrates generated old class features with similar new class features to synthesize robust retrospective representations.

class-incremental learning Class Incremental Learning +1

Novel Object Synthesis via Adaptive Text-Image Harmony

no code implementations · 28 Oct 2024 · Zeren Xiong, Zedong Zhang, Zikun Chen, Shuo Chen, Xiang Li, Gan Sun, Jian Yang, Jun Li

In this paper, we study an object synthesis task that combines an object text with an object image to create a new object image.

Object

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

no code implementations · 28 Oct 2024 · Jiaheng Liu, Ken Deng, Congnan Liu, Jian Yang, Shukai Liu, He Zhu, Peng Zhao, Linzheng Chai, Yanan Wu, Ke Jin, Ge Zhang, Zekun Wang, Guoan Zhang, Bangyu Xiang, Wenbo Su, Bo Zheng

Moreover, existing benchmarks usually report overall average scores across different languages, ignoring the fine-grained abilities in different completion scenarios.

Code Completion

Grid4D: 4D Decomposed Hash Encoding for High-Fidelity Dynamic Gaussian Splatting

no code implementations · 28 Oct 2024 · Jiawei Xu, Zexin Fan, Jian Yang, Jin Xie

To tackle these problems, we propose Grid4D, a dynamic scene rendering model based on Gaussian splatting that employs a novel explicit encoding method for the 4D input through hash encoding.

FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation

1 code implementation · 25 Oct 2024 · Tianyu Zhang, Guocheng Qian, Jin Xie, Jian Yang

Point cloud frame interpolation is a challenging task that involves accurately estimating scene flow across frames while maintaining the geometric structure.

Scene Flow Estimation

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

1 code implementation · 19 Oct 2024 · Kun Wang, Zhiqiang Yan, Junkai Fan, Wanlu Zhu, Xiang Li, Jun Li, Jian Yang

In this paper, we introduce DCDepth, a novel framework for the long-standing monocular depth estimation task.

Monocular Depth Estimation

ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing

no code implementations · 18 Oct 2024 · Jimin Dai, Yingzhen Zhang, Shuo Chen, Jian Yang, Lei Luo

While diffusion models (DMs) efficiently generate images under this assumption, the assumption can also cause errors to accumulate during the diffusion process, ultimately degrading the quality of real image reconstruction and editing.

Image Reconstruction SSIM

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

1 code implementation · 17 Oct 2024 · Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J. H. Liu

In our work, to investigate the reasoning patterns of o1, we compare o1 with existing Test-time Compute methods (BoN, Step-wise BoN, Agent Workflow, and Self-Refine) by using OpenAI's GPT-4o as a backbone on general reasoning benchmarks in three domains (i.e., math, coding, and commonsense reasoning).

Math

Degradation Oriented and Regularized Network for Blind Depth Super-Resolution

1 code implementation · 15 Oct 2024 · Zhengxue Wang, Zhiqiang Yan, Jinshan Pan, Guangwei Gao, Kai Zhang, Jian Yang

Recent RGB-guided depth super-resolution methods have achieved impressive performance under the assumption of fixed and known degradation (e.g., bicubic downsampling).

Super-Resolution

MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling

no code implementations · 14 Oct 2024 · Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha

However, we have identified that recent methods inevitably suffer from a loss of image information during understanding tasks, due to either image discretization or diffusion denoising steps.

Denoising Image Generation +2

HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation

no code implementations · 10 Oct 2024 · Shanyan Guan, Yanhao Ge, Ying Tai, Jian Yang, Wei Li, Mingyu You

Recent advancements in text-to-image diffusion models have shown remarkable creative capabilities with textual prompts, but generating personalized instances based on specific subjects, known as subject-driven generation, remains challenging.

In-Context Code-Text Learning for Bimodal Software Engineering

no code implementations · 8 Oct 2024 · Xunzhu Tang, Liran Wang, Yonghui Liu, Linzheng Chai, Jian Yang, Zhoujun Li, Haoye Tian, Jacques Klein, Tegawende F. Bissyande

Unfortunately, the complex interplay between natural-language text and code in software engineering presents unique challenges that prevent pretrained models from generalizing to a variety of tasks.

Clone Detection In-Context Learning +1

LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details

no code implementations · 1 Oct 2024 · Jian Yang, Xukun Wang, Wentao Wang, Guoming Li, Qihang Fang, Ruihong Yuan, Tianyang Wang, Jason Zhaoxin Fan

Our experiments further demonstrate that the high-frequency texture deficiency of the foundation model can be recovered in a temporally consistent manner by the Space-Optimised Vector Quantised Auto Encoder (SOVQAE) we introduced, thereby facilitating the creation of realistic talking head videos.

Denoising Talking Head Generation

HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes

1 code implementation · 30 Sep 2024 · Changfeng Feng, Zhenyuan Chen, Renke Kou, Guangwei Gao, Chunping Wang, Xiang Li, Xiangbo Shu, Yimian Dai, Qiang Fu, Jian Yang

By observing the significant variations in object scale and clarity under different depth and haze conditions, we designed a Depth Conditioned Detector (DeCoDet) to incorporate this prior knowledge.

Object object-detection +1

GrokLST: Towards High-Resolution Benchmark and Toolkit for Land Surface Temperature Downscaling

1 code implementation · 30 Sep 2024 · Qun Dai, Chunyang Yuan, Yimian Dai, YuXuan Li, Xiang Li, Kang Ni, Jianhui Xu, Xiangbo Shu, Jian Yang

Land Surface Temperature (LST) is a critical parameter for environmental studies, but obtaining high-resolution LST data remains challenging due to the spatio-temporal trade-off in satellite remote sensing.

RNG: Relightable Neural Gaussians

no code implementations · 29 Sep 2024 · Jiahui Fan, Fujun Luan, Jian Yang, Miloš Hašan, Beibei Wang

We propose Relightable Neural Gaussians (RNG), a novel 3DGS-based framework that enables the relighting of objects with either hard surfaces or soft boundaries, while avoiding assumptions on the shading model.

3DGS Novel View Synthesis

Cascade Prompt Learning for Vision-Language Model Adaptation

2 code implementations · 26 Sep 2024 · Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li

Prompt learning has emerged as an effective approach to enhance the performance of Vision-Language Models (VLMs) like CLIP when applied to downstream tasks.

General Knowledge Image Classification +3

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

1 code implementation · 24 Sep 2024 · Haoran Que, Feiyu Duan, Liqun He, Yutao Mou, Wangchunshu Zhou, Jiaheng Liu, Wenge Rong, Zekun Moore Wang, Jian Yang, Ge Zhang, Junran Peng, Zhaoxiang Zhang, Songyang Zhang, Kai Chen

Therefore, we introduce the Hierarchical Long Text Generation Benchmark (HelloBench), a comprehensive, in-the-wild, and open-ended benchmark to evaluate LLMs' performance in generating long text.

Long-Context Understanding Text Generation

SoccerNet 2024 Challenges Results

1 code implementation · 16 Sep 2024 · Anthony Cioppa, Silvio Giancola, Vladimir Somers, Victor Joos, Floriane Magera, Jan Held, Seyed Abolfazl Ghasemzadeh, Xin Zhou, Karolina Seweryn, Mateusz Kowalczyk, Zuzanna Mróz, Szymon Łukasik, Michał Hałoń, Hassan Mkhallati, Adrien Deliège, Carlos Hinojosa, Karen Sanchez, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Adam Gorski, Albert Clapés, Andrei Boiarov, Anton Afanasiev, Artur Xarles, Atom Scott, Byoungkwon Lim, Calvin Yeung, Cristian Gonzalez, Dominic Rüfenacht, Enzo Pacilio, Fabian Deuser, Faisal Sami Altawijri, Francisco Cachón, Hankyul Kim, Haobo Wang, Hyeonmin Choe, Hyunwoo J Kim, Il-Min Kim, Jae-Mo Kang, Jamshid Tursunboev, Jian Yang, Jihwan Hong, JiMin Lee, Jing Zhang, Junseok Lee, Kexin Zhang, Konrad Habel, Licheng Jiao, Linyi Li, Marc Gutiérrez-Pérez, Marcelo Ortega, Menglong Li, Milosz Lopatto, Nikita Kasatkin, Nikolay Nemtsev, Norbert Oswald, Oleg Udin, Pavel Kononov, Pei Geng, Saad Ghazai Alotaibi, Sehyung Kim, Sergei Ulasen, Sergio Escalera, Shanshan Zhang, Shuyuan Yang, Sunghwan Moon, Thomas B. Moeslund, Vasyl Shandyba, Vladimir Golovkin, Wei Dai, WonTaek Chung, Xinyu Liu, Yongqiang Zhu, Youngseo Kim, Yuan Li, Yuting Yang, Yuxuan Xiao, Zehua Cheng, Zhihao Li

The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team.

Action Spotting Dense Video Captioning +2

GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection

1 code implementation · 15 Sep 2024 · Yanguang Sun, Hanyu Xuan, Jian Yang, Lei Luo

Recently, biological perception has been a powerful tool for handling the camouflaged object detection (COD) task.

Decoder object-detection +1

IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web

no code implementations · 14 Sep 2024 · Hongcheng Guo, Wei Zhang, JunHao Chen, Yaonan Gu, Jian Yang, Junjia Du, Binyuan Hui, Tianyu Liu, Jianxin Ma, Chang Zhou, Zhoujun Li

We have conducted extensive experiments on existing large multimodal models, offering insights into their performance and areas for improvement in the image-to-web domain.

Image Comprehension

wgatools: an ultrafast toolkit for manipulating whole genome alignments

1 code implementation · 13 Sep 2024 · Wenjie Wei, Songtao Gui, Jian Yang, Erik Garrison, Jianbing Yan, Hai-Jun Liu

Summary: With the rapid development of long-read sequencing technologies, the era of individual complete genomes is approaching.

Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction

1 code implementation · 12 Sep 2024 · Yuan Wu, Zhiqiang Yan, Zhengxue Wang, Xiang Li, Le Hui, Jian Yang

MGHS projects the 2D image features into multiple subspaces, where each grid contains features within reasonable height ranges.

3D geometry

LIME: Less Is More for MLLM Evaluation

2 code implementations · 10 Sep 2024 · King Zhu, Qianbo Zang, Shian Jia, Siwei Wu, Feiteng Fang, Yizhi Li, Shawn Gavin, Tuney Zheng, Jiawei Guo, Bo Li, HaoNing Wu, Xingwei Qu, Jian Yang, Zachary Liu, Xiang Yue, J. H. Liu, Chenghua Lin, Min Yang, Shiwen Ni, Wenhao Huang, Ge Zhang

However, many of these benchmarks include overly simple or uninformative samples, complicating the effective distinction of different MLLMs' performance.

Image Captioning Question Answering +1

FuzzCoder: Byte-level Fuzzing Test via Large Language Model

no code implementations · 3 Sep 2024 · Liqun Yang, Jian Yang, Chaoren Wei, Guanglin Niu, Ge Zhang, Yunli Wang, Linzheng Chai, Wanxu Xia, Hongcheng Guo, Shun Zhang, Jiaheng Liu, Yuwei Yin, Junran Peng, Jiaxin Ma, Liang Sun, Zhoujun Li

In this work, we propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks to guide future fuzzing explorations.

Language Modeling Language Modelling +2

Attention-Guided Multi-scale Interaction Network for Face Super-Resolution

no code implementations · 1 Sep 2024 · Xujie Wan, Wenjie Li, Guangwei Gao, Huimin Lu, Jian Yang, Chia-Wen Lin

Recently, CNN and Transformer hybrid networks have demonstrated excellent performance in face super-resolution (FSR) tasks.

Decoder Super-Resolution

SBDet: A Symmetry-Breaking Object Detector via Relaxed Rotation-Equivariance

no code implementations · 21 Aug 2024 · Zhiqiang Wu, Yingjie Liu, Hanlin Dong, Xuan Tang, Jian Yang, Bo Jin, Mingsong Chen, Xian Wei

Furthermore, we propose a Relaxed Rotation-Equivariant Network (R2Net) as the backbone and further develop the Symmetry-Breaking Object Detector (SBDet) for 2D object detection built upon it.

Image Classification Object +2

Flatten: Video Action Recognition is an Image Classification task

no code implementations · 17 Aug 2024 · Junlin Chen, Chengcheng Xu, Yangfan Xu, Jian Yang, Jun Li, Zhiping Shi

In recent years, video action recognition, as a fundamental task in the field of video understanding, has been deeply explored by numerous researchers. Most traditional video action recognition methods typically involve converting videos into three-dimensional data that encapsulates both spatial and temporal information, subsequently leveraging prevalent image understanding models to model and analyze these data.

Action Recognition Image Classification +2

Barbie: Text to Barbie-Style 3D Avatars

1 code implementation · 17 Aug 2024 · Xiaokun Sun, Zhenyu Zhang, Ying Tai, Qian Wang, Hao Tang, Zili Yi, Jian Yang

In this paper, we propose Barbie, a novel framework for generating 3D avatars that can be dressed in diverse and high-quality Barbie-like garments and accessories.

Disentanglement Diversity

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

no code implementations · 17 Aug 2024 · Xianjie Wu, Jian Yang, Linzheng Chai, Ge Zhang, Jiaheng Liu, Xinrun Du, Di Liang, Daixin Shu, Xianfu Cheng, Tianzhen Sun, Guanglin Niu, Tongliang Li, Zhoujun Li

Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities.

Question Answering

AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

1 code implementation · 13 Aug 2024 · Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, ChengWei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, Xiangjun Huang, Jian Yang

In this paper, we present AquilaMoE, a cutting-edge bilingual 8×16B Mixture of Experts (MoE) language model that has 8 experts with 16 billion parameters each and is developed using an innovative training methodology called EfficientScale.

Language Modelling Transfer Learning

Pick of the Bunch: Detecting Infrared Small Targets Beyond Hit-Miss Trade-Offs via Selective Rank-Aware Attention

1 code implementation · 7 Aug 2024 · Yimian Dai, Peiwen Pan, Yulei Qian, YuXuan Li, Xiang Li, Jian Yang, Huan Wang

Infrared small target detection faces the inherent challenge of precisely localizing dim targets amidst complex background clutter.

From Words to Worth: Newborn Article Impact Prediction with LLM

no code implementations · 7 Aug 2024 · Penghai Zhao, Qinghua Xing, Kairan Dou, Jinyu Tian, Ying Tai, Jian Yang, Ming-Ming Cheng, Xiang Li

As the academic landscape expands, efficiently identifying impactful newly published articles becomes increasingly vital.

parameter-efficient fine-tuning

Synthesizing Text-to-SQL Data from Weak and Strong LLMs

no code implementations · 6 Aug 2024 · Jiaxi Yang, Binyuan Hui, Min Yang, Jian Yang, Junyang Lin, Chang Zhou

The capability gap between open-source and closed-source large language models (LLMs) remains a challenge in text-to-SQL tasks.

Domain Generalization Text-To-SQL

Add-SD: Rational Generation without Manual Reference

1 code implementation · 30 Jul 2024 · Lingfeng Yang, Xinyu Zhang, Xiang Li, Jinwen Chen, Kun Yao, Gang Zhang, Errui Ding, Lingqiao Liu, Jingdong Wang, Jian Yang

Our work contributes in three aspects: proposing a dataset containing numerous instructed image pairs; fine-tuning a diffusion model for rational generation; and generating synthetic data to boost downstream tasks.

Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer

no code implementations · 29 Jul 2024 · Yang Wu, Kaihua Zhang, Jianjun Qian, Jin Xie, Jian Yang

The complex traffic environment and various weather conditions make the collection of LiDAR data expensive and challenging.

Point Cloud Generation

Background Semantics Matter: Cross-Task Feature Exchange Network for Clustered Infrared Small Target Detection With Sky-Annotated Dataset

1 code implementation · 29 Jul 2024 · Mengxuan Xiao, Qun Dai, Yiming Zhu, Kehua Guo, Huan Wang, Xiangbo Shu, Jian Yang, Yimian Dai

To address this, we introduce a new task, clustered infrared small target detection, and present DenseSIRST, a novel benchmark dataset that provides per-pixel semantic annotations for background regions, enabling the transition from sparse to dense target detection.

Semantic Segmentation

Double-Shot 3D Shape Measurement with a Dual-Branch Network for Structured Light Projection Profilometry

no code implementations · 19 Jul 2024 · Mingyang Lei, Jingfan Fan, Long Shao, Hong Song, Deqiang Xiao, Danni Ai, Tianyu Fu, Ying Gu, Jian Yang

Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.

Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model

no code implementations · 3 Jul 2024 · Xia Hou, QiFeng Li, Jian Yang, Tongliang Li, Linzheng Chai, Xianjie Wu, Hangyuan Ji, Zhoujun Li, Jixuan Nie, Jingbo Dun, Wenfeng Song

In this paper, we present a novel framework named R2S that leverages Chain-of-Dialogue (CoD) logic to guide large language models (LLMs) in generating knowledge-intensive multi-turn dialogues for instruction tuning.

Language Modeling Language Modelling +1

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

no code implementations · 2 Jul 2024 · Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai

Additionally, we propose a novel Multi-modal Video Diffusion Transformer (MVDiT) capable of mining both structure information from visual tokens and semantic information from text tokens.

Text-to-Video Generation Video Generation

Complementary Fusion of Deep Network and Tree Model for ETA Prediction

no code implementations · 1 Jul 2024 · Yurui Huang, Jie Zhang, HengDa Bao, Yang Yang, Jian Yang

Estimated time of arrival (ETA) is a very important factor in transportation systems.

LongIns: A Challenging Long-context Instruction-based Exam for LLMs

no code implementations · 25 Jun 2024 · Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang

To address these issues, we propose the LongIns benchmark dataset, a challenging long-context instruction-based exam for LLMs, which is built on existing instruction datasets.

16k 4k

UniCoder: Scaling Code Large Language Model via Universal Code

no code implementations · 24 Jun 2024 · Tao Sun, Linzheng Chai, Jian Yang, Yuwei Yin, Hongcheng Guo, Jiaheng Liu, Bing Wang, Liqun Yang, Zhoujun Li

When applying LLMs for code generation, recent works mainly focus on directing the models to articulate intermediate natural-language reasoning steps, as in chain-of-thought (CoT) prompting, and then output code with the natural language or other structured intermediate steps.

Code Translation Language Modeling +3