no code implementations • 27 Feb 2025 • Minggui He, Yilun Liu, Shimin Tao, Yuanchang Luo, Hongyong Zeng, Chang Su, Li Zhang, Hongxia Ma, Daimeng Wei, Weibin Meng, Hao Yang, Boxing Chen, Osamu Yoshie
Despite recent breakthroughs in reasoning-enhanced large language models (LLMs) such as DeepSeek-R1, incorporating inference-time reasoning into machine translation (MT), where human translators naturally employ structured, multi-layered chains of thought (CoTs), remains underexplored.
no code implementations • 16 Feb 2025 • Zihan Lan, Weixin Mao, Haosheng Li, Le Wang, Tiancai Wang, Haoqiang Fan, Osamu Yoshie
Built upon the visual backbone of the policy network, we design a lightweight network that predicts an importance score for each view.
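The scoring head itself is not shown in this snippet, so the following is only a minimal sketch, assuming a PyTorch setup where pooled backbone features of each camera view are passed through a small MLP; the class name ViewScorer and the layer sizes are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class ViewScorer(nn.Module):
    """Minimal sketch: score each camera view from pooled backbone features.

    Hypothetical layer sizes; the actual head in the paper may differ.
    """
    def __init__(self, feat_dim: int = 256, hidden_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),  # one scalar importance score per view
        )

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, feat_dim) pooled from the policy backbone
        scores = self.mlp(view_feats).squeeze(-1)   # (batch, num_views)
        return scores.softmax(dim=-1)               # normalized importance weights


# Usage: weight or select views by predicted importance.
feats = torch.randn(2, 4, 256)                      # 2 samples, 4 views
weights = ViewScorer()(feats)                       # (2, 4), rows sum to 1
```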
1 code implementation • 29 Nov 2024 • Weixin Mao, Weiheng Zhong, Zhou Jiang, Dong Fang, Zhongyue Zhang, Zihan Lan, Fan Jia, Tiancai Wang, Haoqiang Fan, Osamu Yoshie
Existing policy learning methods predominantly adopt the task-centric paradigm, necessitating the collection of task data in an end-to-end manner.
1 code implementation • 23 Aug 2024 • Yilun Liu, Minggui He, Feiyu Yao, Yuhe Ji, Shimin Tao, Jingzhou Du, Duan Li, Jian Gao, Li Zhang, Hao Yang, Boxing Chen, Osamu Yoshie
To achieve this, we mined 15 essential dimensions for high-quality prompts from advanced users and curated a multi-turn dataset.
2 code implementations • 28 Jun 2024 • Jihao Liu, Xin Huang, Jinliang Zheng, Boxiao Liu, Jia Wang, Osamu Yoshie, Yu Liu, Hongsheng Li
This paper introduces MM-Instruct, a large-scale dataset of diverse and high-quality visual instruction data designed to enhance the instruction-following capabilities of large multimodal models (LMMs).
Ranked #137 on Visual Question Answering on MM-Vet
1 code implementation • 12 Feb 2024 • Kang Zhang, Osamu Yoshie, Weiran Huang
To address these issues, we introduce BreakGPT, the first large language model for financial breakout detection.
no code implementations • 29 Nov 2023 • Weixin Mao, Tiancai Wang, Diankun Zhang, Junjie Yan, Osamu Yoshie
Pillar-based methods mainly employ a randomly initialized 2D convolutional neural network (ConvNet) for feature extraction and thus fail to benefit from backbone scaling and pretraining in the image domain.
no code implementations • 30 Jun 2023 • Weixin Mao, Jinrong Yang, Zheng Ge, Lin Song, HongYu Zhou, Tiezheng Mao, Zeming Li, Osamu Yoshie
In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for improving depth perception in 3D object detection.
no code implementations • 17 Jan 2023 • Bingchen Zhao, Quan Cui, Hao Wu, Osamu Yoshie, Cheng Yang, Oisin Mac Aodha
In this work, given the excellent scalability of web data, we consider self-supervised pre-training on noisy, web-sourced image-text paired data.
1 code implementation • CVPR 2023 • Muyang Yi, Quan Cui, Hao Wu, Cheng Yang, Osamu Yoshie, Hongtao Lu
LoDA and SimSeg jointly improve a vanilla CLIP model to produce impressive semantic segmentation results.
1 code implementation • 8 Mar 2022 • Quan Cui, Bingchen Zhao, Zhao-Min Chen, Borui Zhao, RenJie Song, Jiajun Liang, Boyan Zhou, Osamu Yoshie
This work simultaneously considers the discriminability and transferability properties of deep representations in the typical supervised learning task, i.e., image classification.
1 code implementation • 17 Dec 2021 • Quan Cui, Boyan Zhou, Yu Guo, Weidong Yin, Hao Wu, Osamu Yoshie, Yubo Chen
However, these works require a tremendous amount of data and computational resources (e.g., billion-level web data and hundreds of GPUs), which prevent researchers with limited resources from reproduction and further exploration.
1 code implementation • 21 Apr 2021 • Xin Huang, Xinxin Wang, Wenyu Lv, Xiaying Bai, Xiang Long, Kaipeng Deng, Qingqing Dang, Shumin Han, Qiwen Liu, Xiaoguang Hu, dianhai yu, Yanjun Ma, Osamu Yoshie
To address these two concerns, we comprehensively evaluate a collection of existing refinements to improve the performance of PP-YOLO while keeping the inference time almost unchanged.
2 code implementations • CVPR 2021 • Zheng Ge, Songtao Liu, Zeming Li, Osamu Yoshie, Jian Sun
Recent advances in label assignment in object detection mainly seek to independently define positive/negative training samples for each ground-truth (gt) object.
Ranked #74 on Object Detection on COCO test-dev
no code implementations • 13 Feb 2021 • Anqing Jiang, LiangYao Chen, Osamu Yoshie
Machine learning, especially deep learning, is dramatically changing the methods associated with optical thin-film inverse design.
1 code implementation • 12 Jan 2021 • Zheng Ge, JianFeng Wang, Xin Huang, Songtao Liu, Osamu Yoshie
A joint loss, defined as the weighted summation of the classification (cls) and regression (reg) losses, is then used as the assigning indicator.
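As a rough illustration of such a loss-aware indicator (not the authors' exact formulation), one can combine per-anchor classification and regression losses into a single cost and mark the lowest-cost anchors of every ground-truth box as positives; the weight lambda_reg and the top-k value below are assumed placeholders.

```python
import torch

def joint_assignment_cost(cls_loss: torch.Tensor,
                          reg_loss: torch.Tensor,
                          lambda_reg: float = 1.5) -> torch.Tensor:
    """Weighted sum of classification and regression losses per (anchor, gt) pair."""
    # cls_loss, reg_loss: (num_anchors, num_gt) losses of each anchor against each gt
    return cls_loss + lambda_reg * reg_loss

def assign_topk_positives(cost: torch.Tensor, k: int = 4) -> torch.Tensor:
    """Mark the k lowest-cost anchors of every gt box as positive samples."""
    pos_mask = torch.zeros_like(cost, dtype=torch.bool)
    topk_idx = cost.topk(k, dim=0, largest=False).indices   # (k, num_gt)
    pos_mask.scatter_(0, topk_idx, True)
    return pos_mask                                          # (num_anchors, num_gt)


# Toy usage with random per-pair losses for 100 anchors and 3 gt boxes.
cost = joint_assignment_cost(torch.rand(100, 3), torch.rand(100, 3))
positives = assign_topk_positives(cost)
```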
no code implementations • ECCV 2020 • Quan Cui, Qing-Yuan Jiang, Xiu-Shen Wei, Wu-Jun Li, Osamu Yoshie
Retrieving content-relevant images from a large-scale fine-grained dataset can suffer from intolerably slow query speed and highly redundant storage cost, due to the high-dimensional real-valued embeddings that aim to distinguish subtle visual differences between fine-grained objects.
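The snippet below is only a hedged illustration of why compact binary codes address this speed and storage issue (it is not the paper's hashing network): sign-binarized embeddings can be compared with cheap Hamming distances instead of high-dimensional float comparisons. The function names and the 48-bit code length are assumptions for the example.

```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Turn real-valued embeddings into 0/1 codes by thresholding at zero."""
    return (embeddings > 0).astype(np.uint8)

def hamming_rank(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Rank database items by Hamming distance to the query (smaller = more similar)."""
    distances = (query_code[None, :] != db_codes).sum(axis=1)
    return np.argsort(distances)


# Toy usage: 10k database items with 48-bit codes, query with the first item.
db = binarize(np.random.randn(10_000, 48))
ranking = hamming_rank(db[0], db)        # db[0] ranks itself first (distance 0)
```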
no code implementations • 23 May 2020 • Zheng Ge, Zequn Jie, Xin Huang, Chengzheng Li, Osamu Yoshie
The first imbalance lies in the large number of low-quality RPN proposals, which makes the R-CNN module (i.e., the post-classification layers) highly biased towards negative proposals in the early training stage.
no code implementations • CVPR 2020 • Xin Huang, Zheng Ge, Zequn Jie, Osamu Yoshie
To acquire the visible parts, a novel Paired-Box Model (PBM) is proposed to simultaneously predict the full and visible boxes of a pedestrian.
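The exact PBM architecture is not described in this snippet; the following is only a minimal sketch of the core idea, assuming a per-proposal RoI feature vector from which two 4-d boxes (full body and visible part) are regressed jointly. The class name PairedBoxHead and the layer sizes are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class PairedBoxHead(nn.Module):
    """Sketch: regress a full-body box and a visible-part box from one proposal feature."""
    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(inplace=True))
        self.full_box = nn.Linear(512, 4)     # (x, y, w, h) offsets for the full box
        self.visible_box = nn.Linear(512, 4)  # (x, y, w, h) offsets for the visible box

    def forward(self, proposal_feats: torch.Tensor):
        h = self.shared(proposal_feats)       # (num_proposals, 512)
        return self.full_box(h), self.visible_box(h)


# Usage: 8 proposals, each with a 1024-d RoI feature.
full, visible = PairedBoxHead()(torch.randn(8, 1024))
```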
no code implementations • 16 Mar 2020 • Zheng Ge, Zequn Jie, Xin Huang, Rong Xu, Osamu Yoshie
PS-RCNN first detects slightly occluded or non-occluded objects with an R-CNN module (referred to as P-RCNN), and then suppresses the detected instances with human-shaped masks so that the features of heavily occluded instances can stand out (a toy sketch of this suppression step is given below).
Ranked #2 on Object Detection on WiderPerson
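The suppression step can be pictured with the toy sketch below (not the authors' implementation): boxes detected by the first module are erased from the feature map with a simple binary mask so that a second detector can focus on the remaining, heavily occluded instances. A rectangular mask stands in here for the human-shaped mask used in the paper, and the function name is an assumption.

```python
import torch

def suppress_detected(features: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
    """Zero out feature-map regions covered by already-detected boxes.

    features: (C, H, W) feature map; boxes: (N, 4) as (x1, y1, x2, y2) in feature coords.
    A rectangular mask is used here for simplicity instead of a human-shaped one.
    """
    mask = torch.ones_like(features[:1])          # (1, H, W), broadcast over channels
    for x1, y1, x2, y2 in boxes.round().long().tolist():
        mask[:, y1:y2, x1:x2] = 0.0
    return features * mask


# Usage: suppress one detection on a 256x64x64 feature map.
feat = torch.randn(256, 64, 64)
suppressed = suppress_detected(feat, torch.tensor([[10.0, 12.0, 30.0, 40.0]]))
```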
no code implementations • 7 Dec 2018 • Anqing Jiang, Osamu Yoshie, LiangYao Chen
This model can converge to the global optimum of the optical thin-film structure, which greatly improves the design efficiency of multi-layer films.