Search Results for author: Minheng Ni

Found 14 papers, 6 papers with code

AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

no code implementations 21 Aug 2024 Minheng Ni, Chenfei Wu, Huaying Yuan, Zhengyuan Yang, Ming Gong, Lijuan Wang, Zicheng Liu, WangMeng Zuo, Nan Duan

With the advancement of generative models, the synthesis of different sensory elements such as music, visuals, and speech has achieved significant realism.

Scheduling

Responsible Visual Editing

1 code implementation 8 Apr 2024 Minheng Ni, Yeli Shen, Lei Zhang, WangMeng Zuo

To mitigate the negative implications of harmful images on research, we create a transparent and public dataset, AltBear, which expresses harmful information using teddy bears instead of humans.

Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models

no code implementations 31 Aug 2023 Minheng Ni, Yabo Zhang, Kailai Feng, Xiaoming Li, Yiwen Guo, WangMeng Zuo

In this work, we introduce a novel Referring Diffusional segmentor (Ref-Diff) for this task, which leverages the fine-grained multi-modal information from generative models.

Image Segmentation, Instance Segmentation, +2

ORES: Open-vocabulary Responsible Visual Synthesis

1 code implementation 26 Aug 2023 Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan

In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content.

Image Generation, Language Modelling

NUWA-LIP: Language-Guided Image Inpainting With Defect-Free VQGAN

no code implementations CVPR 2023 Minheng Ni, Xiaoming Li, WangMeng Zuo

Language-guided image inpainting aims to fill the defective regions of an image under the guidance of text while keeping the non-defective regions unchanged.

Image Inpainting

ImaginaryNet: Learning Object Detectors without Real Images and Annotations

1 code implementation 13 Oct 2022 Minheng Ni, Zitong Huang, Kailai Feng, WangMeng Zuo

Given a class label, the language model is used to generate a full description of a scene with a target object, and the text-to-image model is deployed to generate a photo-realistic image.

Image Generation, Language Modelling, +3

NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN

no code implementations 10 Feb 2022 Minheng Ni, Chenfei Wu, Haoyang Huang, Daxin Jiang, WangMeng Zuo, Nan Duan

Language guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping non-defective regions unchanged.

Image Inpainting

Co-GAT: A Co-Interactive Graph Attention Network for Joint Dialog Act Recognition and Sentiment Classification

1 code implementation 24 Dec 2020 Libo Qin, Zhouyang Li, Wanxiang Che, Minheng Ni, Ting Liu

The dialog context information (contextual information) and the mutual interaction information are two key factors that contribute to the two related tasks.

Graph Attention, Sentiment Analysis, +1

DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification

no code implementations 16 Aug 2020 Libo Qin, Wanxiang Che, Yangming Li, Minheng Ni, Ting Liu

In dialog systems, dialog act recognition and sentiment classification are two correlative tasks to capture speakers' intentions, where dialog act and sentiment can indicate the explicit and the implicit intentions, respectively.

Relation, Relation Network, +2

CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP

1 code implementation 11 Jun 2020 Libo Qin, Minheng Ni, Yue Zhang, Wanxiang Che

Compared with the existing work, our method does not rely on bilingual sentences for training, and requires only one training process for multiple target languages.

Data Augmentation

M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training

1 code implementation CVPR 2021 Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Jianfeng Gao, Dongdong Zhang, Nan Duan

We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training into a unified framework via multitask pre-training.

Image Captioning, Image Retrieval, +4

Multi-Domain Spoken Language Understanding Using Domain- and Task-Aware Parameterization

no code implementations 30 Apr 2020 Libo Qin, Minheng Ni, Yue Zhang, Wanxiang Che, Yangming Li, Ting Liu

Spoken language understanding has been addressed as a supervised learning problem, where a set of training data is available for each domain.

Spoken Language Understanding
