Search Results for author: Ruihua Song

Found 29 papers, 12 papers with code

What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning

1 code implementation • 2 Nov 2023 • Yifan Du, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Jinpeng Wang, Chuyuan Wang, Mingchen Cai, Ruihua Song, Ji-Rong Wen

By conducting a comprehensive empirical study, we find that instructions focused on complex visual reasoning tasks are particularly effective in improving the performance of MLLMs on evaluation benchmarks.

Visual Reasoning

Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions

no code implementations • 11 Oct 2023 • Yuchong Sun, Che Liu, Jinwen Huang, Ruihua Song, Fuzheng Zhang, Di Zhang, Zhongyuan Wang, Kun Gai

In this paper, we address these challenges by introducing Parrot, a highly scalable solution designed to automatically generate high-quality instruction-tuning data, which are then used to enhance the effectiveness of chat models in multi-turn conversations.

Instruction Following

ViCo: Engaging Video Comment Generation with Human Preference Rewards

no code implementations • 22 Aug 2023 • Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, Jianlong Fu

Experiments on ViCo-20k show that the comments generated by our ViCo model exhibit the best performance in terms of both quantitative and qualitative results, particularly when engagement is considered.

Comment Generation, Descriptive

Transferring Foundation Models for Generalizable Robotic Manipulation

no code implementations • 9 Jun 2023 • Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, LiMin Wang

Improving the generalization capabilities of general-purpose robotic manipulation agents in the real world has long been a significant challenge.

Imitation Learning, Robot Manipulation

When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm

1 code implementation • 5 Jun 2023 • Lei Wang, Jingsen Zhang, Hao Yang, ZhiYuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen

We argue that these models present significant opportunities for reliable user simulation, and have the potential to revolutionize traditional study paradigms in user behavior analysis.

Language Modelling, Large Language Model, +2

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

no code implementations • 30 May 2023 • Chuhao Jin, Wenhui Tan, Jiange Yang, Bei Liu, Ruihua Song, LiMin Wang, Jianlong Fu

We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks, such as making a smiley face using building blocks.

Robot Manipulation

ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios

no code implementations • 20 May 2023 • Yuyue Wang, Huan Xiao, Yihan Wu, Ruihua Song

Considering comedians have diverse personal speech styles, including personal prosody, rhythm, and fillers, it requires real-world datasets and strong speech style modeling capabilities, which brings challenges.

TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World

1 code implementation • 14 Jan 2023 • Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu

Experimental results indicate that the models incorporating large language models (LLM) can generate more diverse responses, while the model utilizing knowledge graphs to introduce external knowledge performs the best overall.

Knowledge Graphs

Text2Poster: Laying out Stylized Texts on Retrieved Images

1 code implementation • 6 Jan 2023 • Chuhao Jin, Hongteng Xu, Ruihua Song, Zhiwu Lu

Poster generation is a significant task for a wide range of applications, which is often time-consuming and requires lots of manual editing and artistic experience.

Image Retrieval, Layout Design, +1

TeViS: Translating Text Synopses to Video Storyboards

no code implementations • 31 Dec 2022 • Xu Gu, Yuchong Sun, Feiyue Ni, ShiZhe Chen, Xihua Wang, Ruihua Song, Boyuan Li, Xiang Cao

In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS) which aims to retrieve an ordered sequence of images as the video storyboard to visualize the text synopsis.

Language Modelling, Quantization

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

1 code implementation • 30 Nov 2022 • Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian

In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech.

Machine Translation, Speech Recognition, +3

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

1 code implementation • 12 Oct 2022 • Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu

Large-scale video-language pre-training has shown significant improvement in video-language understanding tasks.

Ranked #2 on Video Retrieval on QuerYD (using extra training data)

Contrastive Learning, Question Answering, +3

Multi-Modal Experience Inspired AI Creation

1 code implementation • 2 Sep 2022 • Qian Cao, Xu Chen, Ruihua Song, Hao Jiang, Guang Yang, Zhao Cao

To model such human capabilities, in this paper, we define and solve a novel AI creation problem based on human experiences.

Text Generation

Self-supervised Context-aware Style Representation for Expressive Speech Synthesis

no code implementations • 25 Jun 2022 • Yihan Wu, Xi Wang, Shaofei Zhang, Lei He, Ruihua Song, Jian-Yun Nie

In this paper, we propose a novel framework for learning style representation from abundant plain text in a self-supervised manner.

Contrastive Learning, Deep Clustering, +2

AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios

no code implementations • 1 Apr 2022 • Yihan Wu, Xu Tan, Bohan Li, Lei He, Sheng Zhao, Ruihua Song, Tao Qin, Tie-Yan Liu

We model the speaker characteristics systematically to improve the generalization on new speakers.

Speech Synthesis

Class-aware Sounding Objects Localization via Audiovisual Correspondence

1 code implementation • 22 Dec 2021 • Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen

To address this problem, we propose a two-stage step-by-step learning framework to localize and recognize sounding objects in complex audiovisual scenarios using only the correspondence between audio and vision.

object-detection Object Detection +2

Towards artificial general intelligence via a multimodal foundation model

1 code implementation • 27 Oct 2021 • Nanyi Fei, Zhiwu Lu, Yizhao Gao, Guoxing Yang, Yuqi Huo, Jingyuan Wen, Haoyu Lu, Ruihua Song, Xin Gao, Tao Xiang, Hao Sun, Ji-Rong Wen

To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream cognitive tasks.

Image Classification, Reading Comprehension, +2

Stylistic Retrieval-based Dialogue System with Unparallel Training Data

no code implementations • 12 Sep 2021 • Hao Fu, Yan Wang, Ruihua Song, Tianran Hu, Jianyun Nie

The ability of a dialog system to express consistent language style during conversations has a direct, positive impact on its usability and on user satisfaction.

Chatbot, Data Augmentation, +2

ScriptWriter: Narrative-Guided Script Generation

1 code implementation • ACL 2020 • Yutao Zhu, Ruihua Song, Zhicheng Dou, Jian-Yun Nie, Jin Zhou

In dialogue systems, it would also be useful to drive dialogues by a dialogue plan.

"Love is as Complex as Math": Metaphor Generation System for Social Chatbot

no code implementations • 3 Jan 2020 • Danning Zheng, Ruihua Song, Tianran Hu, Hao Fu, Jin Zhou

By embedding the framework into a chatbot system, we then enable the chatbot to communicate with users using figurative language.


Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences

no code implementations • 24 Nov 2019 • Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou

A storyboard is a sequence of images to illustrate a story containing multiple sentences, which has been a key process to create different story products.

From Text to Sound: A Preliminary Study on Retrieving Sound Effects to Radio Stories

no code implementations • 20 Aug 2019 • Songwei Ge, Curtis Xuan, Ruihua Song, Chao Zou, Wei Liu, Jin Zhou

In this paper, we address the problem of automatically adding sound effects to radio stories with a retrieval-based model.

Retrieval, TAG, +1

A World of Difference: Divergent Word Interpretations among People

no code implementations • 8 Mar 2017 • Tianran Hu, Ruihua Song, Maya Abtahian, Philip Ding, Xing Xie, Jiebo Luo

We propose an approach that quantifies semantic differences in interpretations among different groups of people.
