no code implementations • ECCV 2020 • Xi Cheng, Zhen-Yong Fu, Jian Yang
In the past few years, we have witnessed the great progress of image super-resolution (SR) thanks to the power of deep learning.
no code implementations • 4 Feb 2025 • Senmao Li, Kai Wang, Joost Van de Weijer, Fahad Shahbaz Khan, Chun-Le Guo, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng
Diffusion priors have been used for blind face restoration (BFR) by fine-tuning diffusion models (DMs) on restoration datasets to recover low-quality images.
no code implementations • 1 Feb 2025 • Junhao Long, Fengwei Yang, Juncheng Yan, Baoping Zhang, Chao Jin, Jian Yang, Changliang Zou, Jun Xu
Since non-contrast CT (NCCT) images share the content characteristics to the corresponding NDCT images in a three-phase scan, they can potentially provide useful information for real-world LDCT image denoising.
1 code implementation • 23 Jan 2025 • Tao Liu, Kai Wang, Senmao Li, Joost Van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng
Drawing inspiration from the inherent context consistency, we propose a novel training-free method for consistent text-to-image (T2I) generation, termed "One-Prompt-One-Story" (1Prompt1Story).
1 code implementation • 13 Jan 2025 • Yaqing Ding, Viktor Kocur, Zuzana Berger Haladová, Qianliang Wu, Shen Cai, Jian Yang, Zuzana Kukelova
In this paper, we propose a novel approach for recovering focal lengths from three-view homographies.
1 code implementation • 13 Jan 2025 • Yaqing Ding, Václav Vávra, Viktor Kocur, Jian Yang, Torsten Sattler, Zuzana Kukelova
We derive efficient solvers for three cases: (1) two calibrated cameras, (2) two uncalibrated cameras with an unknown but shared focal length, and (3) two uncalibrated cameras with unknown and different focal lengths.
no code implementations • 10 Jan 2025 • Minxing Luo, Zixun Xia, Liaojun Chen, Zhenhang Li, Weichao Zeng, Jianye Wang, Wentao Cheng, Yaxing Wang, Yu Zhou, Jian Yang
In this paper, we introduce a new training-free framework, STGen, which accurately generates visual texts in challenging scenarios (\eg, slanted or curved text layouts) while harmonizing them with the text background.
1 code implementation • 8 Jan 2025 • Xin Zhang, Xue Yang, YuXuan Li, Jian Yang, Ming-Ming Cheng, Xiang Li
Our approach can effectively improve the performance of existing state-of-the-art weakly supervised methods and even surpasses fully supervised models on existing optical benchmarks (i. e., DOTA-v1. 0 dataset).
no code implementations • 6 Jan 2025 • Rui Xie, Yinhong Liu, Penghao Zhou, Chen Zhao, Jun Zhou, Kai Zhang, Zhenyu Zhang, Jian Yang, Zhenheng Yang, Ying Tai
Image diffusion models have been adapted for real-world video super-resolution to tackle over-smoothing issues in GAN-based methods.
no code implementations • 2 Jan 2025 • Ao Gao, Luosong Guo, Tao Chen, Zhao Wang, Ying Tai, Jian Yang, Zhenyu Zhang
In this way, the proposed method tackles the limitation on initialization and optimization, leading to an efficient and accurate 3DGS modeling.
no code implementations • 2 Jan 2025 • Shanghaoran Quan, Jiaxi Yang, Bowen Yu, Bo Zheng, Dayiheng Liu, An Yang, Xuancheng Ren, Bofei Gao, Yibo Miao, Yunlong Feng, Zekun Wang, Jian Yang, Zeyu Cui, Yang Fan, Yichang Zhang, Binyuan Hui, Junyang Lin
CodeElo benchmark is mainly based on the official CodeForces platform and tries to align with the platform as much as possible.
1 code implementation • 30 Dec 2024 • YuXuan Li, Xiang Li, Yunheng Li, YiCheng Zhang, Yimian Dai, Qibin Hou, Ming-Ming Cheng, Jian Yang
To address these, we establish a benchmark dataset and propose a unified model, SM3Det (Single Model for Multi-Modal datasets and Multi-Task object Detection).
no code implementations • 26 Dec 2024 • Hongliang Zhang, Shuo Chen, Lei Luo, Jian Yang
To address this issue, we propose a novel data-adaptive Discriminative Spherical Sliced-Wasserstein (DSSW) distance, which utilizes a projected energy function to determine the discriminative projection direction for SSW.
no code implementations • 26 Dec 2024 • Guoming Li, Jian Yang, Shangsong Liang
Approximation-based spectral graph neural networks, which construct graph filters with function approximation, have shown substantial performance in graph learning tasks.
no code implementations • 26 Dec 2024 • Zhiqiang Yan, Zhengxue Wang, Kun Wang, Jun Li, Jian Yang
In this paper, we introduce the Selective Image Guided Network (SigNet), a novel degradation-aware framework that transforms depth completion into depth enhancement for the first time.
1 code implementation • 20 Dec 2024 • Xiantao Hu, Ying Tai, Xu Zhao, Chen Zhao, Zhenyu Zhang, Jun Li, Bineng Zhong, Jian Yang
These temporal information tokens are used to guide the localization of the target in the next time state, establish long-range contextual relationships between video frames, and capture the temporal trajectory of the target.
Ranked #3 on
Rgb-T Tracking
on LasHeR
4 code implementations • 19 Dec 2024 • Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, TianHao Li, Tianyi Tang, Tingyu Xia, Xingzhang Ren, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yu Wan, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zihan Qiu
In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2. 5-Turbo and Qwen2. 5-Plus, both available from Alibaba Cloud Model Studio.
Ranked #6 on
on GPQA
1 code implementation • 18 Dec 2024 • Xiaoqi An, Lin Zhao, Chen Gong, Jun Li, Jian Yang
In particular, compared with LPFormer on Waymo, we reduce the average MPJPE by $10. 0mm$.
1 code implementation • 16 Dec 2024 • Xiaokun Sun, Zeyu Cai, Ying Tai, Jian Yang, Zhenyu Zhang
We propose StrandHead, a novel text to 3D head avatar generation method capable of generating disentangled 3D hair with strand representation.
1 code implementation • 16 Dec 2024 • Liang Chen, Zekun Wang, Shuhuai Ren, Lei LI, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, Tianyu Liu, Baobao Chang
As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks from different modalities can also be effectively encapsulated within the NTP framework, transforming the multimodal information into tokens and predict the next one given the context.
no code implementations • 16 Dec 2024 • Junkai Fan, Kun Wang, Zhiqiang Yan, Xiang Chen, Shangbing Gao, Jun Li, Jian Yang
In this paper, we study the challenging problem of simultaneously removing haze and estimating depth from real monocular hazy videos.
no code implementations • 16 Dec 2024 • Jian Yang, Jiajun Zhang, Jiaxi Yang, Ke Jin, Lei Zhang, Qiyao Peng, Ken Deng, Yibo Miao, Tianyu Liu, Zeyu Cui, Binyuan Hui, Junyang Lin
Code completion has become an essential tool for daily software development.
1 code implementation • 12 Dec 2024 • Zheng Li, Yibing Song, Penghai Zhao, Ming-Ming Cheng, Xiang Li, Jian Yang
Textual-based prompt learning methods primarily employ multiple learnable soft prompts and hard class tokens in a cascading manner as text prompt inputs, aiming to align image and text (category) spaces for downstream tasks.
no code implementations • 12 Dec 2024 • Lingfeng Yang, Zhenyuan Chen, Xiang Li, Peiyang Jia, Liangqu Long, Jian Yang
As information becomes more accessible, user-generated videos are increasing in length, placing a burden on viewers to sift through vast content for valuable insights.
no code implementations • 12 Dec 2024 • Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Zhenheng Yang, Chaoyou Fu, Xiang Li, Jian Yang, Ying Tai
Text-to-video generation has evolved rapidly in recent years, delivering remarkable results.
no code implementations • 11 Dec 2024 • Wenxuan Sun, Zixuan Yang, Yunli Wang, Zhen Zhang, Zhiqiang Wang, Yu Li, Jian Yang, Yiming Yang, Shiyang Wen, Peng Jiang, Kun Gai
To the best of our knowledge, Adaptive$^2$ is the first approach to automatically learn both domain identification and adaptation in online advertising, opening new research directions for this area.
1 code implementation • 6 Dec 2024 • Jian Jin, Yang shen, ZhenYong Fu, Jian Yang
Customized generation aims to incorporate a novel concept into a pre-trained text-to-image model, enabling new generations of the concept in novel contexts guided by textual prompts.
no code implementations • 6 Dec 2024 • Jian Yang, Jiaxi Yang, Ke Jin, Yibo Miao, Lei Zhang, Liqun Yang, Zeyu Cui, Yichang Zhang, Binyuan Hui, Junyang Lin
Code large language models (codeLLMs) have made significant strides in code generation.
no code implementations • 2 Dec 2024 • Jiakai Wang, Pengfei Zhang, Renshuai Tao, Jian Yang, Hao liu, Xianglong Liu, Yunchao Wei, Yao Zhao
Specifically, to adapt the optimization goal of behavior backdoor, we introduce the behavior-driven backdoor object optimizing method by a bi-target behavior backdoor training loss, thus we could guide the poisoned model optimization direction.
no code implementations • 26 Nov 2024 • Jing Du, Zesheng Ye, Bin Guo, Zhiwen Yu, Jia Wu, Jian Yang, Michael Sheng, Lina Yao
To achieve this, we introduce a hierarchical user preference modeling framework that organizes user representations by the neural network encoder's depth, allowing separate treatment of shallow and deeper subspaces.
no code implementations • 23 Nov 2024 • Haijie Yang, Zhenyu Zhang, Hao Tang, Jianjun Qian, Jian Yang
In this paper, we propose ConsistentAvatar, a novel framework for fully consistent and high-fidelity talking avatar generation.
no code implementations • 22 Nov 2024 • Fan Deng, Yaguang Wu, Xinyang Yu, Xiangjun Huang, Jian Yang, Guangyu Yan, Qiang Xu
Recently, text-to-image models based on diffusion have achieved remarkable success in generating high-quality images.
no code implementations • 20 Nov 2024 • Yunli Wang, Zixuan Yang, Zhen Zhang, Zhiqiang Wang, Jian Yang, Shiyang Wen, Peng Jiang, Kun Gai
To the best of our knowledge, this is the first work to study the scaling laws for online advertisement retrieval of real-world systems, showing great potential for scaling law in advertising system optimization.
no code implementations • 18 Nov 2024 • Xudong Yan, Songhe Feng, Yang Zhang, Jian Yang, Yueguan Lin, Haojun Fei
Moreover, we propose attribute smoothing with auxiliary attributes generated by Large Language Model (LLM) for seen compositions, addressing the issue of overconfidence by encouraging the model to learn more attributes in one given composition.
1 code implementation • 11 Nov 2024 • Yanguang Sun, Jian Yang, Lei Luo
Technically, we first design a frequency-spatial domain transformer block that mutually amalgamates the complementary local spatial and global frequency features to strength the capability of initial input features.
1 code implementation • 11 Nov 2024 • Taihang Hu, Linxuan Li, Joost Van de Weijer, Hongcheng Gao, Fahad Shahbaz Khan, Jian Yang, Ming-Ming Cheng, Kai Wang, Yaxing Wang
In this paper, we define semantic binding as the task of associating a given object with its attribute, termed attribute binding, or linking it to other related sub-objects, referred to as object binding.
1 code implementation • 10 Nov 2024 • Zhennan Chen, Yajie Li, Haofan Wang, Zhibo Chen, Zhengkai Jiang, Jun Li, Qian Wang, Jian Yang, Ying Tai
Regional prompting, or compositional generation, which enables fine-grained spatial control, has gained increasing attention for its practicality in real-world applications.
no code implementations • 7 Nov 2024 • Jian Yang, Zihang Song, Han Zhang, Yue Gao
Efficient wideband spectrum sensing (WSS) is essential for managing spectrum scarcity in wireless communications.
no code implementations • 6 Nov 2024 • Ying Zhang, Qiang Li, Hongli Liu, Liu Yang, Jian Yang
Radio Frequency Fingerprint Identification (RFFI) technology uniquely identifies emitters by analyzing unique distortions in the transmitted signal caused by non-ideal hardware.
no code implementations • 4 Nov 2024 • Shukai Liu, Linzheng Chai, Jian Yang, Jiajun Shi, He Zhu, Liran Wang, Ke Jin, Wei zhang, Hualei Zhu, Shuyue Guo, Tao Sun, Jiaheng Liu, Yunlong Duan, Yu Hao, Liqun Yang, Guanglin Niu, Ge Zhang, Zhoujun Li
Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet.
no code implementations • 3 Nov 2024 • Liang Bai, Hong Song, Yucong Lin, Tianyu Fu, Deqiang Xiao, Danni Ai, Jingfan Fan, Jian Yang
Additionally, we introduce a similarity-based feature compensation mechanism that integrates generated old class features with similar new class features to synthesize robust retrospective representations.
no code implementations • 28 Oct 2024 • Zeren Xiong, Zedong Zhang, Zikun Chen, Shuo Chen, Xiang Li, Gan Sun, Jian Yang, Jun Li
In this paper, we study an object synthesis task that combines an object text with an object image to create a new object image.
no code implementations • 28 Oct 2024 • Jiaheng Liu, Ken Deng, Congnan Liu, Jian Yang, Shukai Liu, He Zhu, Peng Zhao, Linzheng Chai, Yanan Wu, Ke Jin, Ge Zhang, Zekun Wang, Guoan Zhang, Bangyu Xiang, Wenbo Su, Bo Zheng
Besides, the existing benchmarks usually report overall average scores of different languages, where the fine-grained abilities in different completion scenarios are ignored.
no code implementations • 28 Oct 2024 • Jiawei Xu, Zexin Fan, Jian Yang, Jin Xie
To tackle these problems, we propose Grid4D, a dynamic scene rendering model based on Gaussian splatting and employing a novel explicit encoding method for the 4D input through the hash encoding.
1 code implementation • 27 Oct 2024 • Ziming Li, Qianbo Zang, David Ma, Jiawei Guo, Tuney Zheng, Minghao Liu, Xinyao Niu, Yue Wang, Jian Yang, Jiaheng Liu, Wanjun Zhong, Wangchunshu Zhou, Wenhao Huang, Ge Zhang
Data science tasks involving tabular data present complex challenges that require sophisticated problem-solving approaches.
1 code implementation • 25 Oct 2024 • Tianyu Zhang, Guocheng Qian, Jin Xie, Jian Yang
Point cloud frame interpolation is a challenging task that involves accurate scene flow estimation across frames and maintaining the geometry structure.
no code implementations • 24 Oct 2024 • Yibo Miao, Bofei Gao, Shanghaoran Quan, Junyang Lin, Daoguang Zan, Jiaheng Liu, Jian Yang, Tianyu Liu, Zhijie Deng
We also contribute a pipeline for collecting preference pairs for DPO on CodeLLMs.
1 code implementation • 19 Oct 2024 • Kun Wang, Zhiqiang Yan, Junkai Fan, Wanlu Zhu, Xiang Li, Jun Li, Jian Yang
In this paper, we introduce DCDepth, a novel framework for the long-standing monocular depth estimation task.
no code implementations • 18 Oct 2024 • Jimin Dai, Yingzhen Zhang, Shuo Chen, Jian Yang, Lei Luo
While DM efficiently generates images under this assumption, it can also accumulate errors during the diffusion process due to the assumption, ultimately negatively impacting the quality of real image reconstruction and editing.
1 code implementation • 17 Oct 2024 • Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J. H. Liu
In our work, to investigate the reasoning patterns of o1, we compare o1 with existing Test-time Compute methods (BoN, Step-wise BoN, Agent Workflow, and Self-Refine) by using OpenAI's GPT-4o as a backbone on general reasoning benchmarks in three domains (i. e., math, coding, commonsense reasoning).
1 code implementation • 15 Oct 2024 • Zhengxue Wang, Zhiqiang Yan, Jinshan Pan, Guangwei Gao, Kai Zhang, Jian Yang
Recent RGB-guided depth super-resolution methods have achieved impressive performance under the assumption of fixed and known degradation (e. g., bicubic downsampling).
no code implementations • 14 Oct 2024 • Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha
However, we have identified that recent methods inevitably suffer from loss of image information during understanding task, due to either image discretization or diffusion denoising steps.
Ranked #212 on
Visual Question Answering
on MM-Vet
no code implementations • 10 Oct 2024 • Shanyan Guan, Yanhao Ge, Ying Tai, Jian Yang, Wei Li, Mingyu You
Recent advancements in text-to-image diffusion models have shown remarkable creative capabilities with textual prompts, but generating personalized instances based on specific subjects, known as subject-driven generation, remains challenging.
no code implementations • 8 Oct 2024 • Xunzhu Tang, Liran Wang, Yonghui Liu, Linzheng Chai, Jian Yang, Zhoujun Li, Haoye Tian, Jacques Klein, Tegawende F. Bissyande
Unfortunately, the complex interplay of natural language text and code in software engineering, presents unique challenges that prevent pretrained models to generalize to a variety of tasks.
no code implementations • 1 Oct 2024 • Jian Yang, Xukun Wang, Wentao Wang, Guoming Li, Qihang Fang, Ruihong Yuan, Tianyang Wang, Jason Zhaoxin Fan
Our experiments further demonstrate that the high-frequency texture deficiency of the foundation model can be temporally consistently recovered by the Space-Optimised Vector Quantised Auto Encoder (SOVQAE) we introduced, thereby facilitating the creation of realistic talking head videos.
1 code implementation • 30 Sep 2024 • Changfeng Feng, Zhenyuan Chen, Renke Kou, Guangwei Gao, Chunping Wang, Xiang Li, Xiangbo Shu, Yimian Dai, Qiang Fu, Jian Yang
By observing the significant variations in object scale and clarity under different depth and haze conditions, we designed a Depth Conditioned Detector (DeCoDet) to incorporate this prior knowledge.
1 code implementation • 30 Sep 2024 • Qun Dai, Chunyang Yuan, Yimian Dai, YuXuan Li, Xiang Li, Kang Ni, Jianhui Xu, Xiangbo Shu, Jian Yang
Land Surface Temperature (LST) is a critical parameter for environmental studies, but obtaining high-resolution LST data remains challenging due to the spatio-temporal trade-off in satellite remote sensing.
no code implementations • 29 Sep 2024 • Jiahui Fan, Fujun Luan, Jian Yang, Miloš Hašan, Beibei Wang
We propose Relightable Neural Gaussians (RNG), a novel 3DGS-based framework that enables the relighting of objects with both hard surfaces or soft boundaries, while avoiding assumptions on the shading model.
2 code implementations • 26 Sep 2024 • Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li
Prompt learning has surfaced as an effective approach to enhance the performance of Vision-Language Models (VLMs) like CLIP when applied to downstream tasks.
1 code implementation • 24 Sep 2024 • Haoran Que, Feiyu Duan, Liqun He, Yutao Mou, Wangchunshu Zhou, Jiaheng Liu, Wenge Rong, Zekun Moore Wang, Jian Yang, Ge Zhang, Junran Peng, Zhaoxiang Zhang, Songyang Zhang, Kai Chen
Therefore, we introduce the Hierarchical Long Text Generation Benchmark (HelloBench), a comprehensive, in-the-wild, and open-ended benchmark to evaluate LLMs' performance in generating long text.
no code implementations • 23 Sep 2024 • Yizhi Li, Ge Zhang, Yinghao Ma, Ruibin Yuan, Kang Zhu, Hangyu Guo, Yiming Liang, Jiaheng Liu, Zekun Wang, Jian Yang, Siwei Wu, Xingwei Qu, Jinjie Shi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Zhaoxiang Zhang, Zachary Liu, Emmanouil Benetos, Wenhao Huang, Chenghua Lin
Recent advancements in multimodal large language models (MLLMs) have aimed to integrate and interpret data across diverse modalities.
2 code implementations • 18 Sep 2024 • Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, Kai Dang, Yang Fan, Yichang Zhang, An Yang, Rui Men, Fei Huang, Bo Zheng, Yibo Miao, Shanghaoran Quan, Yunlong Feng, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, Junyang Lin
In this report, we introduce the Qwen2. 5-Coder series, a significant upgrade from its predecessor, CodeQwen1. 5.
1 code implementation • 16 Sep 2024 • Anthony Cioppa, Silvio Giancola, Vladimir Somers, Victor Joos, Floriane Magera, Jan Held, Seyed Abolfazl Ghasemzadeh, Xin Zhou, Karolina Seweryn, Mateusz Kowalczyk, Zuzanna Mróz, Szymon Łukasik, Michał Hałoń, Hassan Mkhallati, Adrien Deliège, Carlos Hinojosa, Karen Sanchez, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Adam Gorski, Albert Clapés, Andrei Boiarov, Anton Afanasiev, Artur Xarles, Atom Scott, Byoungkwon Lim, Calvin Yeung, Cristian Gonzalez, Dominic Rüfenacht, Enzo Pacilio, Fabian Deuser, Faisal Sami Altawijri, Francisco Cachón, Hankyul Kim, Haobo Wang, Hyeonmin Choe, Hyunwoo J Kim, Il-Min Kim, Jae-Mo Kang, Jamshid Tursunboev, Jian Yang, Jihwan Hong, JiMin Lee, Jing Zhang, Junseok Lee, Kexin Zhang, Konrad Habel, Licheng Jiao, Linyi Li, Marc Gutiérrez-Pérez, Marcelo Ortega, Menglong Li, Milosz Lopatto, Nikita Kasatkin, Nikolay Nemtsev, Norbert Oswald, Oleg Udin, Pavel Kononov, Pei Geng, Saad Ghazai Alotaibi, Sehyung Kim, Sergei Ulasen, Sergio Escalera, Shanshan Zhang, Shuyuan Yang, Sunghwan Moon, Thomas B. Moeslund, Vasyl Shandyba, Vladimir Golovkin, Wei Dai, WonTaek Chung, Xinyu Liu, Yongqiang Zhu, Youngseo Kim, Yuan Li, Yuting Yang, Yuxuan Xiao, Zehua Cheng, Zhihao LI
The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team.
1 code implementation • 15 Sep 2024 • Yanguang Sun, Hanyu Xuan, Jian Yang, Lei Luo
Recently, biological perception has been a powerful tool for handling the camouflaged object detection (COD) task.
no code implementations • 14 Sep 2024 • Hongcheng Guo, Wei zhang, JunHao Chen, Yaonan Gu, Jian Yang, Junjia Du, Binyuan Hui, Tianyu Liu, Jianxin Ma, Chang Zhou, Zhoujun Li
We have conducted extensive experiments on existing large multimodal models, offering insights into their performance and areas for improvement in image-to-web domain.
1 code implementation • 13 Sep 2024 • Wenjie Wei, Songtao Gui, Jian Yang, Erik Garrison, Jianbing Yan, Hai-Jun Liu
Summary: With the rapid development of long-read sequencing technologies, the era of individual complete genomes is approaching.
1 code implementation • 12 Sep 2024 • Yuan Wu, Zhiqiang Yan, Zhengxue Wang, Xiang Li, Le Hui, Jian Yang
MGHS projects the 2D image features into multiple subspaces, where each grid contains features within reasonable height ranges.
2 code implementations • 10 Sep 2024 • King Zhu, Qianbo Zang, Shian Jia, Siwei Wu, Feiteng Fang, Yizhi Li, Shawn Gavin, Tuney Zheng, Jiawei Guo, Bo Li, HaoNing Wu, Xingwei Qu, Jian Yang, Zachary Liu, Xiang Yue, J. H. Liu, Chenghua Lin, Min Yang, Shiwen Ni, Wenhao Huang, Ge Zhang
However, many of these benchmarks include overly simple or uninformative samples, complicating the effective distinction of different MLLMs' performance.
1 code implementation • 4 Sep 2024 • Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Shanghaoran Quan, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang
Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.
no code implementations • 3 Sep 2024 • Liqun Yang, Jian Yang, Chaoren Wei, Guanglin Niu, Ge Zhang, Yunli Wang, Linzheng Chai, Wanxu Xia, Hongcheng Guo, Shun Zhang, Jiaheng Liu, Yuwei Yin, Junran Peng, Jiaxin Ma, Liang Sun, Zhoujun Li
In this work, we propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks to guide future fuzzing explorations.
1 code implementation • 3 Sep 2024 • Yanguang Sun, Chunyan Xu, Jian Yang, Hanyu Xuan, Lei Luo
Camouflaged object detection has attracted a lot of attention in computer vision.
no code implementations • 1 Sep 2024 • Xujie Wan, Wenjie Li, Guangwei Gao, Huimin Lu, Jian Yang, Chia-Wen Lin
Recently, CNN and Transformer hybrid networks demonstrated excellent performance in face super-resolution (FSR) tasks.
1 code implementation • 22 Aug 2024 • Zhiqiang Wu, Yingjie Liu, Licheng Sun, Jian Yang, Hanlin Dong, Shing-Ho J. Lin, Xuan Tang, Jinpeng Mi, Bo Jin, Xian Wei
Group Equivariant Convolution (GConv) can capture rotational equivariance from original data.
no code implementations • 21 Aug 2024 • Zhiqiang Wu, Yingjie Liu, Hanlin Dong, Xuan Tang, Jian Yang, Bo Jin, Mingsong Chen, Xian Wei
Furthermore, we propose a Relaxed Rotation-Equivariant Network (R2Net) as the backbone and further develop the Symmetry-Breaking Object Detector (SBDet) for 2D object detection built upon it.
no code implementations • 17 Aug 2024 • Junlin Chen, Chengcheng Xu, Yangfan Xu, Jian Yang, Jun Li, Zhiping Shi
In recent years, video action recognition, as a fundamental task in the field of video understanding, has been deeply explored by numerous researchers. Most traditional video action recognition methods typically involve converting videos into three-dimensional data that encapsulates both spatial and temporal information, subsequently leveraging prevalent image understanding models to model and analyze these data.
1 code implementation • 17 Aug 2024 • Xiaokun Sun, Zhenyu Zhang, Ying Tai, Qian Wang, Hao Tang, Zili Yi, Jian Yang
In this paper, we propose Barbie, a novel framework for generating 3D avatars that can be dressed in diverse and high-quality Barbie-like garments and accessories.
no code implementations • 17 Aug 2024 • Xianjie Wu, Jian Yang, Linzheng Chai, Ge Zhang, Jiaheng Liu, Xinrun Du, Di Liang, Daixin Shu, Xianfu Cheng, Tianzhen Sun, Guanglin Niu, Tongliang Li, Zhoujun Li
Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities.
1 code implementation • 13 Aug 2024 • Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, ChengWei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, Xiangjun Huang, Jian Yang
In this paper, we present AquilaMoE, a cutting-edge bilingual 8*16B Mixture of Experts (MoE) language model that has 8 experts with 16 billion parameters each and is developed using an innovative training methodology called EfficientScale.
1 code implementation • 7 Aug 2024 • Yimian Dai, Peiwen Pan, Yulei Qian, YuXuan Li, Xiang Li, Jian Yang, Huan Wang
Infrared small target detection faces the inherent challenge of precisely localizing dim targets amidst complex background clutter.
no code implementations • 7 Aug 2024 • Penghai Zhao, Qinghua Xing, Kairan Dou, Jinyu Tian, Ying Tai, Jian Yang, Ming-Ming Cheng, Xiang Li
As the academic landscape expands, the challenge of efficiently identifying impactful newly published articles grows increasingly vital.
no code implementations • 6 Aug 2024 • Jiaxi Yang, Binyuan Hui, Min Yang, Jian Yang, Junyang Lin, Chang Zhou
The capability gap between open-source and closed-source large language models (LLMs) remains a challenge in text-to-SQL tasks.
1 code implementation • 30 Jul 2024 • Lingfeng Yang, Xinyu Zhang, Xiang Li, Jinwen Chen, Kun Yao, Gang Zhang, Errui Ding, Lingqiao Liu, Jingdong Wang, Jian Yang
Our work contributes in three aspects: proposing a dataset containing numerous instructed image pairs; fine-tuning a diffusion model for rational generation; and generating synthetic data to boost downstream tasks.
no code implementations • 29 Jul 2024 • Yang Wu, Kaihua Zhang, Jianjun Qian, Jin Xie, Jian Yang
The complex traffic environment and various weather conditions make the collection of LiDAR data expensive and challenging.
1 code implementation • 29 Jul 2024 • Mengxuan Xiao, Qun Dai, Yiming Zhu, Kehua Guo, Huan Wang, Xiangbo Shu, Jian Yang, Yimian Dai
To address this, we introduce a new task--clustered infrared small target detection, and present DenseSIRST, a novel benchmark dataset that provides per-pixel semantic annotations for background regions, enabling the transition from sparse to dense target detection.
1 code implementation • 22 Jul 2024 • Fei Zhou, Maixia Fu, Yulei Qian, Jian Yang, Yimian Dai
Infrared small target detection is crucial for the efficacy of infrared search and tracking systems.
no code implementations • 19 Jul 2024 • Mingyang Lei, Jingfan Fan, Long Shao, Hong Song, Deqiang Xiao, Danni Ai, Tianyu Fu, Ying Gu, Jian Yang
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
5 code implementations • 15 Jul 2024 • An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, TianHao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, Zhihao Fan
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.
Ranked #1 on
Arithmetic Reasoning
on GSM8K
(using extra training data)
no code implementations • 3 Jul 2024 • Xia Hou, QiFeng Li, Jian Yang, Tongliang Li, Linzheng Chai, Xianjie Wu, Hangyuan Ji, Zhoujun Li, Jixuan Nie, Jingbo Dun, Wenfeng Song
In this paper, we present a novel framework named R2S that leverages the CoD-Chain of Dialogue logic to guide large language models (LLMs) in generating knowledge-intensive multi-turn dialogues for instruction tuning.
no code implementations • 2 Jul 2024 • Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai
Additionally, we propose a novel Multi-modal Video Diffusion Transformer (MVDiT) capable of mining both structure information from visual tokens and semantic information from text tokens.
no code implementations • 1 Jul 2024 • Yurui Huang, Jie Zhang, HengDa Bao, Yang Yang, Jian Yang
Estimated time of arrival (ETA) is a very important factor in the transportation system.
no code implementations • 25 Jun 2024 • Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang
To address these issues, we propose the LongIns benchmark dataset, a challenging long-context instruction-based exam for LLMs, which is built based on the existing instruction datasets.
no code implementations • 24 Jun 2024 • Tao Sun, Linzheng Chai, Jian Yang, Yuwei Yin, Hongcheng Guo, Jiaheng Liu, Bing Wang, Liqun Yang, Zhoujun Li
When applying LLMs for code generation, recent works mainly focus on directing the models to articulate intermediate natural-language reasoning steps, as in chain-of-thought (CoT) prompting, and then output code with the natural language or other structured intermediate steps.