no code implementations • 15 Apr 2024 • Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang
To further evaluate the IAA capability of MLLMs, we construct the UNIAA-Bench, which consists of three aesthetic levels: Perception, Description, and Assessment.
2 code implementations • 7 Apr 2024 • Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo
Recent advances in Text-to-Video (T2V) generation have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.
no code implementations • 22 Feb 2024 • Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan
In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.
2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Munan Ning, Li Yuan
In this work, we propose MoE-Tuning, a simple yet effective training strategy for LVLMs.
Ranked #58 on Visual Question Answering on MM-Vet
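MoE-Tuning sparsifies the model with mixture-of-experts layers, where a router activates only a few experts per token. A minimal sketch of such a top-k routed MoE layer follows; all layer sizes, the expert count, and `top_k` are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Sparsely activated mixture-of-experts FFN (illustrative sizes)."""
    def __init__(self, dim=8, hidden=16, n_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        self.router = rng.standard_normal((dim, n_experts)) * 0.1
        self.w1 = rng.standard_normal((n_experts, dim, hidden)) * 0.1
        self.w2 = rng.standard_normal((n_experts, hidden, dim)) * 0.1

    def __call__(self, x):
        # x: (tokens, dim). The router scores each token over all experts.
        gates = softmax(x @ self.router)              # (tokens, n_experts)
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            top = np.argsort(gates[t])[-self.top_k:]  # indices of top-k experts
            w = gates[t, top] / gates[t, top].sum()   # renormalized gate weights
            for e, g in zip(top, w):
                h = np.maximum(x[t] @ self.w1[e], 0)  # expert FFN: ReLU MLP
                out[t] += g * (h @ self.w2[e])
        return out

layer = MoELayer()
tokens = np.random.default_rng(1).standard_normal((3, 8))
y = layer(tokens)
print(y.shape)  # (3, 8)
```

Only `top_k` of the experts run per token, so compute stays roughly constant as expert count grows.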
1 code implementation • 27 Nov 2023 • Munan Ning, Bin Zhu, Yujia Xie, Bin Lin, Jiaxi Cui, Lu Yuan, Dongdong Chen, Li Yuan
Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.
4 code implementations • 16 Nov 2023 • Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan
In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.
Ranked #4 on Zero-Shot Video Question Answer on TGIF-QA
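Unifying visual representations into the language feature space, as described above, is commonly done with a learned projection that maps vision-encoder tokens to the LLM's embedding dimension. A minimal sketch follows; the two-layer MLP projector and all dimensions are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def project_visual_tokens(vis_tokens, w1, b1, w2, b2):
    """Map vision-encoder tokens into the LLM embedding space via a
    two-layer MLP projector (a common design; illustrative here)."""
    h = np.maximum(vis_tokens @ w1 + b1, 0)  # hidden layer (ReLU for brevity)
    return h @ w2 + b2                       # (n_tokens, llm_dim)

rng = np.random.default_rng(0)
vis_dim, llm_dim, hidden = 1024, 4096, 2048      # illustrative sizes
vis_tokens = rng.standard_normal((16, vis_dim))  # 16 patch tokens from the encoder
w1 = rng.standard_normal((vis_dim, hidden)) * 0.01
b1 = np.zeros(hidden)
w2 = rng.standard_normal((hidden, llm_dim)) * 0.01
b2 = np.zeros(llm_dim)

lang_tokens = project_visual_tokens(vis_tokens, w1, b1, w2, b2)
print(lang_tokens.shape)  # (16, 4096)
```

The projected tokens can then be concatenated with text token embeddings and fed to the LLM as a single sequence.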
no code implementations • 30 Oct 2023 • Bin Lin, Zhiping Mao, Zhicheng Wang, George Em Karniadakis
We first use DeepONet to learn the solution operator for a set of smooth problems related to the target PDEs with sharp solutions.
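A DeepONet pairs a branch net (encoding the input function sampled at fixed sensor points) with a trunk net (encoding the query coordinate), and combines them by a dot product to evaluate the operator. A minimal untrained forward pass follows; the layer widths and sensor count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Small fully connected net with tanh activations."""
    for w in weights[:-1]:
        x = np.tanh(x @ w)
    return x @ weights[-1]

m, p = 20, 10          # sensor count and latent width (illustrative)
branch_w = [rng.standard_normal((m, 32)), rng.standard_normal((32, p))]
trunk_w  = [rng.standard_normal((1, 32)), rng.standard_normal((32, p))]

def deeponet(u_sensors, y):
    """G(u)(y) ~ branch(u) . trunk(y): the learned operator at point y."""
    b = mlp(u_sensors, branch_w)        # (p,) encoding of the input function
    t = mlp(np.atleast_1d(y), trunk_w)  # (p,) encoding of the query point
    return float(b @ t)

# Input function u(x) = sin(pi x) sampled at m sensors, queried at y = 0.5
xs = np.linspace(0, 1, m)
val = deeponet(np.sin(np.pi * xs), 0.5)
print(type(val))  # float
```

Training fits both nets jointly so the dot product matches operator outputs at sampled query points.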
4 code implementations • 3 Oct 2023 • Bin Zhu, Bin Lin, Munan Ning, Yang Yan, Jiaxi Cui, Hongfa Wang, Yatian Pang, Wenhao Jiang, Junwu Zhang, Zongwei Li, Wancai Zhang, Zhifeng Li, Wei Liu, Li Yuan
We thus propose VIDAL-10M, a dataset pairing Video, Infrared, Depth, and Audio with their corresponding Language.
Ranked #1 on Zero-shot Audio Classification on VGG-Sound (using extra training data)
no code implementations • 28 Sep 2020 • Rungroj Maipradit, Bin Lin, Csaba Nagy, Gabriele Bavota, Michele Lanza, Hideaki Hata, Kenichi Matsumoto
Self-admitted technical debt (SATD) is a particular form of technical debt: developers consciously perform the hack but also document it in the code by adding comments as a reminder (or as an admission of guilt).
Software Engineering
no code implementations • 13 Dec 2019 • Yutao Jiao, Ping Wang, Dusit Niyato, Bin Lin, Dong In Kim
In this paper, we propose an auction-based market model for incentivizing data owners to participate in federated learning.
Computer Science and Game Theory
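The auction-based incentive idea can be illustrated with a simple reverse (procurement) auction: the federated-learning server selects the k lowest-cost data owners and pays each the (k+1)-th lowest bid, a Vickrey-style rule under which truthful cost reporting is a dominant strategy. This generic mechanism is an illustrative stand-in, not the paper's exact market design:

```python
def procurement_auction(bids, k):
    """Select the k lowest-cost data owners; pay each the (k+1)-th lowest
    bid (a Vickrey-style reverse auction -- illustrative stand-in)."""
    order = sorted(range(len(bids)), key=lambda i: bids[i])
    winners = sorted(order[:k])
    price = bids[order[k]]  # lowest excluded bid sets the uniform payment
    return winners, price

# Five data owners report their per-round participation costs
bids = [3.0, 1.2, 4.5, 2.0, 2.8]
winners, price = procurement_auction(bids, k=3)
print(winners, price)  # [1, 3, 4] 3.0
```

Because the payment depends only on the losers' bids, no winner can lower their payment by misreporting their own cost.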