no code implementations • 18 Apr 2024 • Suyuan Huang, Haoxin Zhang, Yan Gao, Yao Hu, Zengchang Qin
Multimodal Large Language Models (MLLMs) have demonstrated profound capabilities in understanding multimodal information, ranging from Image LLMs to the more complex Video LLMs.
no code implementations • 4 Mar 2024 • Chao Zhang, Shiwei Wu, Haoxin Zhang, Tong Xu, Yan Gao, Yao Hu, Di Wu, Enhong Chen
Indeed, learning to generate hashtags/categories can potentially enhance note embeddings, since both tasks compress key note information into limited content.
no code implementations • 31 Jul 2021 • Berken Utku Demirel, Ivan Skelin, Haoxin Zhang, Jack J. Lin, Mohammad Abdullah Al Faruque
This paper proposes a novel lightweight method using the multitaper power spectrum to estimate arousal levels on wearable devices.
no code implementations • 16 Oct 2017 • Bin Li, Hu Luo, Haoxin Zhang, Shunquan Tan, Zhongzhou Ji
In this paper, we present a CNN solution that uses raw DCT (discrete cosine transform) coefficients from JPEG images as input.