no code implementations • 8 Jun 2023 • Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia, Yan Deng, Jonathan Tien
To leverage NLP models, speech input is first force-aligned with its text and then pre-processed into a token sequence that includes both words and phrase-break information.
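As a rough illustration of the pre-processing step described above, the sketch below turns a word-level forced alignment into a token sequence with break markers. It is not the authors' implementation: the `(word, start, end)` record layout, the `<brk>` token name, and the 0.2-second pause threshold are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical alignment record: (word, start time in s, end time in s).
Alignment = Tuple[str, float, float]

BREAK_TOKEN = "<brk>"    # assumed marker name, not taken from the paper
PAUSE_THRESHOLD = 0.2    # assumed minimum silence (seconds) counted as a break


def to_token_sequence(
    alignment: List[Alignment],
    pause_threshold: float = PAUSE_THRESHOLD,
) -> List[str]:
    """Interleave words with break tokens wherever the silence between
    two consecutive aligned words is at least `pause_threshold`."""
    tokens: List[str] = []
    prev_end = None
    for word, start, end in alignment:
        # Insert a break marker when the gap since the previous word is long enough.
        if prev_end is not None and start - prev_end >= pause_threshold:
            tokens.append(BREAK_TOKEN)
        tokens.append(word.lower())
        prev_end = end
    return tokens


if __name__ == "__main__":
    demo = [("I", 0.0, 0.2), ("think", 0.25, 0.6), ("so", 1.0, 1.3)]
    print(to_token_sequence(demo))
```

A single fixed threshold is the simplest choice here. A real system might distinguish several break lengths or normalize pauses by speaking rate.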
no code implementations • 28 Oct 2022 • Zhiyi Wang, Shaoguang Mao, Wenshan Wu, Yan Xia
The token sequence is then fed into the pre-training and fine-tuning pipeline.
no code implementations • 4 Oct 2021 • Yuan Zhang, Jian Cao, Ling Zhang, Xiangcheng Liu, Zhiyi Wang, Feng Ling, Weiqian Chen
Learning subtle representations of object parts plays a vital role in fine-grained visual recognition (FGVR).
Ranked #10 on Fine-Grained Image Classification on Stanford Dogs
Fine-Grained Image Classification • Fine-Grained Visual Recognition