1 code implementation • 10 Apr 2025 • Xingguang Ji, Jiakang Wang, Hongzhi Zhang, Jingyuan Zhang, Haonan Zhou, Chenxi Sun, Yahui Liu, Qi Wang, Fuzheng Zhang
We present in detail the framework design, data construction, and training recipe for developing an MLLM step by step to reach competitive performance.
1 code implementation • 10 Apr 2025 • Jingyuan Zhang, Hongzhi Zhang, Zhou Haonan, Chenxi Sun, Xingguang Ji, Jiakang Wang, Fanheng Kong, Yahui Liu, Qi Wang, Fuzheng Zhang
Data curation plays a crucial role in training powerful Visual Language Models (VLMs).
no code implementations • 12 Dec 2024 • Cheng Mei, Hao He, Yahui Liu, Zhenhua Guo
Notably, our method ranks 1st in the nuScenes lidar-based object detection task.
no code implementations • 21 Apr 2024 • Xiaoran Zhao, Tianhao Wu, Yu Lai, Zhiliang Tian, Zhen Huang, Yahui Liu, Zejiang He, Dongsheng Li
Controllable text-to-image generation synthesizes visual text and objects in images under given conditions, and is frequently applied to emoji and poster generation.
1 code implementation • 8 Mar 2023 • Yahui Liu, Bin Tian, Yisheng Lv, Lingxi Li, FeiYue Wang
To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short.
Ranked #16 on 3D Point Cloud Classification on ScanObjectNN
1 code implementation • 3 Oct 2022 • Yahui Liu, Enver Sangineto, Yajing Chen, Linchao Bao, Haoxian Zhang, Nicu Sebe, Bruno Lepri, Marco De Nadai
Multi-domain image-to-image (I2I) translations can transform a source image according to the style of a target domain.
1 code implementation • 26 Aug 2022 • Jichao Zhang, Aliaksandr Siarohin, Yahui Liu, Hao Tang, Nicu Sebe, Wei Wang
Generative Neural Radiance Fields (GNeRF)-based 3D-aware GANs have showcased remarkable prowess in crafting high-fidelity images while upholding robust 3D consistency, particularly for face generation.
1 code implementation • 3 Aug 2022 • Ziping Yu, Hongbo Huang, Weijun Chen, YongXin Su, Yahui Liu, Xiuying Wang
In this paper, we propose a real-time face detector named YOLO-FaceV2, based on the one-stage detector YOLOv5.
1 code implementation • 25 Jul 2022 • Yingyi Chen, Xi Shen, Yahui Liu, Qinghua Tao, Johan A. K. Suykens
In this paper, we explore solving jigsaw puzzles as a self-supervised auxiliary loss in ViT for image classification, an approach named Jigsaw-ViT.
Ranked #1 on Learning with noisy labels on ANIMAL
1 code implementation • 9 Jun 2022 • Elia Peruzzo, Enver Sangineto, Yahui Liu, Marco De Nadai, Wei Bi, Bruno Lepri, Nicu Sebe
In this work, we propose a different and complementary direction, in which a local bias is introduced using an auxiliary self-supervised task, performed jointly with standard supervised training.
1 code implementation • CVPR 2023 • Bin Ren, Yahui Liu, Yue Song, Wei Bi, Rita Cucchiara, Nicu Sebe, Wei Wang
In particular, MJP first shuffles the selected patches via our block-wise random jigsaw puzzle shuffle algorithm, and their corresponding PEs are occluded.
1 code implementation • NAACL 2022 • Yahui Liu, Haoping Yang, Chen Gong, Qingrong Xia, Zhenghua Li, Min Zhang
1) Based on a frame-free annotation methodology, we avoid writing complex frames for new predicates.
1 code implementation • 5 May 2022 • Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lingpeng Kong, Nigel Collier
MAGIC is a flexible framework and is theoretically compatible with any text generation task that incorporates image grounding.
no code implementations • Findings (ACL) 2022 • Jiannan Xiang, Huayang Li, Yahui Liu, Lemao Liu, Guoping Huang, Defu Lian, Shuming Shi
Current practices in metric evaluation focus on a single dataset, e.g., the Newstest dataset in each year's WMT Metrics Shared Task.
1 code implementation • 26 Sep 2021 • Yahui Liu, Yajing Chen, Linchao Bao, Nicu Sebe, Bruno Lepri, Marco De Nadai
The ISF manipulates the semantics of an input latent code so that the image generated from it lies in the desired visual domain.
no code implementations • CVPR 2021 • Yahui Liu, Enver Sangineto, Yajing Chen, Linchao Bao, Haoxian Zhang, Nicu Sebe, Bruno Lepri, Wei Wang, Marco De Nadai
In this paper, we propose a new training protocol based on three specific losses that help a translation network learn a smooth and disentangled latent style space in which: 1) both intra- and inter-domain interpolations correspond to gradual changes in the generated images, and 2) the content of the source image is better preserved during the translation.
1 code implementation • NeurIPS 2021 • Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco De Nadai
This task encourages the VTs to learn spatial relations within an image and makes the VT training much more robust when training data are scarce.
1 code implementation • Findings (ACL) 2021 • Jiannan Xiang, Yahui Liu, Deng Cai, Huayang Li, Defu Lian, Lemao Liu
An important aspect of developing dialogue systems is how to evaluate and compare the performance of different systems.
1 code implementation • 22 Feb 2021 • Lei Ding, Hao Tang, Yahui Liu, Yilei Shi, Xiao Xiang Zhu, Lorenzo Bruzzone
To address this issue, we propose an adversarial shape learning network (ASLNet) that models building shape patterns to improve the accuracy of building segmentation.
1 code implementation • 19 Oct 2020 • Pierfrancesco Ardino, Yahui Liu, Elisa Ricci, Bruno Lepri, Marco De Nadai
Inspired by recent work on image inpainting, our proposed method leverages semantic segmentation to model the content and structure of the image, and learns the best shape and location of the object to insert.
1 code implementation • 11 Aug 2020 • Raul Gomez, Yahui Liu, Marco De Nadai, Dimosthenis Karatzas, Bruno Lepri, Nicu Sebe
In this paper we propose the use of an image retrieval system to assist the image-to-image translation task.
1 code implementation • 10 Aug 2020 • Yahui Liu, Marco De Nadai, Deng Cai, Huayang Li, Xavier Alameda-Pineda, Nicu Sebe, Bruno Lepri
Our proposed model disentangles the image content from the visual attributes, and it learns to modify the latter using the textual description, before generating a new image from the content and the modified attribute representation.
no code implementations • 6 Apr 2020 • Kai Chen, Jian Yao, Jingmin Tu, Yahui Liu, Yinxuan Li, Li Li
Recently, work on improving the naturalness of stitched images has gained increasing attention.
1 code implementation • 15 Mar 2020 • Yahui Liu, Marco De Nadai, Jian Yao, Nicu Sebe, Bruno Lepri, Xavier Alameda-Pineda
Unsupervised image-to-image translation (UNIT) aims at learning a mapping between several visual domains by using unpaired training images.
no code implementations • 18 Nov 2019 • Yahui Liu, Furao Shen, Jian Zhao
PIGAT introduces the attention mechanism to consider the importance of each interacted user/item to both the user and the item, which captures user interests, item attractions and their influence on the recommendation context.
1 code implementation • 12 Jul 2019 • Yahui Liu, Marco De Nadai, Gloria Zen, Nicu Sebe, Bruno Lepri
In this work, we propose a novel GAN architecture that decouples the required annotations into a category label, which specifies the gesture type, and a simple-to-draw category-independent conditional map, which expresses the location, rotation and size of the hand gesture.
3 code implementations • 8 Jan 2019 • Xiaohu Lu, Yahui Liu, Kai Li
This paper presents a very simple but efficient algorithm for 3D line segment detection from large-scale unorganized point clouds.
1 code implementation • EMNLP 2018 • Yahui Liu, Wei Bi, Jun Gao, Xiaojiang Liu, Jian Yao, Shuming Shi
We observe that in conversation tasks, each query can have multiple responses, which forms a 1-to-n or m-to-n relationship when viewed across the whole corpus.
no code implementations • 12 May 2017 • Yahui Liu, Jian Yao, Li Li, Xiaohu Lu, Jing Han
We develop a novel deep contour detection algorithm with a top-down fully convolutional encoder-decoder network.