no code implementations • 18 Dec 2024 • Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang
Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks.
no code implementations • 18 Dec 2024 • Xinghang Li, Peiyan Li, Minghuan Liu, Dong Wang, Jirong Liu, Bingyi Kang, Xiao Ma, Tao Kong, Hanbo Zhang, Huaping Liu
The results firmly convince us of why we need VLAs and guide the development of a new family of VLAs, RoboVLMs, which require very few manual designs and achieve new state-of-the-art performance on three simulation tasks and in real-world experiments.
1 code implementation • 7 Nov 2024 • Luting Wang, Yang Zhao, Zijian Zhang, Jiashi Feng, Si Liu, Bingyi Kang
Currently, pixel reconstruction (e.g., VQGAN) dominates the training objective for image tokenizers.
1 code implementation • 5 Nov 2024 • Zilong Huang, Qinghao Ye, Bingyi Kang, Jiashi Feng, Haoqi Fan
Because it does not use text encodings as a contrastive target, SuperClass requires neither a text encoder nor the large batch sizes that CLIP needs.
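As a rough illustration of the objective this snippet describes, the sketch below (assumed interfaces: `image_encoder`, `head`, and the tokenizer vocabulary size) treats the caption's subword tokens as multi-label classification targets, so no text encoder or cross-sample contrast is needed; it is a sketch of the described setup, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def superclass_style_loss(image_encoder, head, images, token_ids, vocab_size):
    """images: [B, 3, H, W]; token_ids: list of B LongTensors of caption tokens."""
    feats = image_encoder(images)          # [B, D] global image features
    logits = head(feats)                   # [B, V] one logit per vocab entry
    # Multi-hot targets: 1 for every subword that appears in the caption.
    targets = torch.zeros(len(token_ids), vocab_size, device=logits.device)
    for i, ids in enumerate(token_ids):
        targets[i, ids] = 1.0
    # Per-sample multi-label loss; unlike CLIP's contrastive objective, it
    # does not depend on other samples, hence no need for a large batch size.
    return F.binary_cross_entropy_with_logits(logits, targets)
```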
1 code implementation • 4 Nov 2024 • Yang Yue, Yulin Wang, Bingyi Kang, Yizeng Han, Shenzhi Wang, Shiji Song, Jiashi Feng, Gao Huang
MLLMs have demonstrated remarkable comprehension and reasoning capabilities with complex language and visual data.
no code implementations • 4 Nov 2024 • Bingyi Kang, Yang Yue, Rui Lu, Zhijie Lin, Yang Zhao, Kaixin Wang, Gao Huang, Jiashi Feng
Our scaling experiments show perfect in-distribution generalization, measurable scaling behavior for combinatorial generalization, but failure in out-of-distribution scenarios.
no code implementations • 3 Oct 2024 • Yuqing Wang, Tianwei Xiong, Daquan Zhou, Zhijie Lin, Yang Zhao, Bingyi Kang, Jiashi Feng, Xihui Liu
Autoregressive large language models (LLMs) have achieved great success in generating coherent, long token sequences in natural language processing, whereas the exploration of autoregressive LLMs for video generation has been limited to short videos of a few seconds.
2 code implementations • 13 Jun 2024 • Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
This work presents Depth Anything V2.
1 code implementation • 8 Feb 2024 • Lior Cohen, Kaixin Wang, Bingyi Kang, Shie Mannor
We incorporate POP in a novel TBWM agent named REM (Retentive Environment Model), showcasing 15.4x faster imagination than prior TBWMs.
5 code implementations • CVPR 2024 • Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao
To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M images), which significantly enlarges data coverage and thus reduces generalization error.
Ranked #3 on Monocular Depth Estimation on ETH3D
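A minimal pseudo-labeling sketch of the data engine described above, with hypothetical names (`teacher`, `student`, `unlabeled_loader`, `train_depth_loss`): a trained teacher automatically annotates unlabeled images, and a student is trained on the enlarged corpus.

```python
import torch

@torch.no_grad()
def annotate(teacher, unlabeled_loader):
    """Yield (image, pseudo-depth) pairs produced by the frozen teacher."""
    teacher.eval()
    for images in unlabeled_loader:
        yield images, teacher(images)      # dense depth pseudo-labels

def train_student(student, teacher, unlabeled_loader, optimizer, train_depth_loss):
    for images, pseudo_depth in annotate(teacher, unlabeled_loader):
        optimizer.zero_grad()
        loss = train_depth_loss(student(images), pseudo_depth)
        loss.backward()
        optimizer.step()
```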
1 code implementation • 22 Dec 2023 • Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang
Our key insight is to introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Ranked #2 on Semantic Segmentation on Cityscapes test
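A minimal sketch of the "meta prompts" idea from the snippet above, with an assumed integration point: a set of learnable embeddings is prepended to the token sequence of a frozen, pre-trained diffusion backbone so its features can be steered toward perception tasks. The prompt count and dimension are illustrative.

```python
import torch
import torch.nn as nn

class MetaPromptedBackbone(nn.Module):
    def __init__(self, diffusion_backbone, num_prompts=64, dim=768):
        super().__init__()
        self.backbone = diffusion_backbone         # frozen, pre-trained
        for p in self.backbone.parameters():
            p.requires_grad = False
        # The only new trainable parameters: the meta prompts.
        self.meta_prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, latent_tokens):               # [B, N, dim]
        B = latent_tokens.shape[0]
        prompts = self.meta_prompts.unsqueeze(0).expand(B, -1, -1)
        tokens = torch.cat([prompts, latent_tokens], dim=1)
        feats = self.backbone(tokens)               # reuse diffusion features
        return feats[:, prompts.shape[1]:]          # drop the prompt positions
```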
1 code implementation • NeurIPS 2023 • Lihe Yang, Xiaogang Xu, Bingyi Kang, Yinghuan Shi, Hengshuang Zhao
Then, we investigate the role of synthetic images by joint training with real images, or pre-training for real images.
2 code implementations • NeurIPS 2023 • Yang Yue, Rui Lu, Bingyi Kang, Shiji Song, Gao Huang
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
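To make the self-excitation pattern concrete, here is a minimal sketch (assumed batch layout) of a bootstrapped TD loss whose target is computed by the very network being trained: even with gradients stopped, each update that raises Q(s', a') raises the next target, the feedback loop behind Q-value divergence in offline RL.

```python
import torch
import torch.nn.functional as F

def td_loss_self_excited(q_net, batch, gamma=0.99):
    s, a, r, s_next = batch                         # a: LongTensor of actions
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    # The bootstrap target comes from the same network being updated, so the
    # target chases the estimate between gradient steps: self-excitation.
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max(dim=1).values
    return F.mse_loss(q, target)
```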
1 code implementation • 17 Jul 2023 • Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang
Our experiments show that BuboGPT achieves impressive multi-modal understanding and visual grounding abilities in interactions with humans.
2 code implementations • 8 Jun 2023 • Yang Yue, Bingyi Kang, Xiao Ma, Qisen Yang, Gao Huang, Shiji Song, Shuicheng Yan
OPER is a plug-and-play component for offline RL algorithms.
1 code implementation • 1 Jun 2023 • Bingyi Kang, Xiao Ma, Yirui Wang, Yang Yue, Shuicheng Yan
Recently, Offline Reinforcement Learning (RL) has achieved remarkable progress with the emergence of various algorithms and datasets.
1 code implementation • NeurIPS 2023 • Bingyi Kang, Xiao Ma, Chao Du, Tianyu Pang, Shuicheng Yan
2) It is incompatible with maximum likelihood-based RL algorithms (e.g., policy gradient methods) as the likelihood of diffusion models is intractable.
1 code implementation • 27 May 2023 • Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, Weinan Zhang
Offline reinforcement learning (RL) aims to learn policies from pre-existing datasets without further interactions, making it a challenging task.
1 code implementation • 9 Feb 2023 • Weichen Yu, Tianyu Pang, Qian Liu, Chao Du, Bingyi Kang, Yan Huang, Min Lin, Shuicheng Yan
With the advance of language models, privacy protection is receiving more attention.
no code implementations • 17 Oct 2022 • Yang Yue, Bingyi Kang, Xiao Ma, Zhongwen Xu, Gao Huang, Shuicheng Yan
Therefore, we propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
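A minimal sketch of the observation in this snippet, under an assumed weighting scheme: resampling transitions, for example in proportion to the return of the episode they came from, changes their frequencies but leaves the support of the dataset unchanged, so offline algorithms' support constraints still hold.

```python
import numpy as np

def resample_by_return(transitions, episode_returns, temperature=1.0):
    """transitions: list of (s, a, r, s'); episode_returns: per-transition
    return of the source episode (hypothetical setup)."""
    w = np.exp(np.asarray(episode_returns) / temperature)
    p = w / w.sum()
    idx = np.random.choice(len(transitions), size=len(transitions), p=p)
    # Every sampled index existed in the original dataset, so the resampled
    # data covers a subset of the same state-action support.
    return [transitions[i] for i in idx]
```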
1 code implementation • NeurIPS 2023 • Xiao Ma, Bingyi Kang, Zhongwen Xu, Min Lin, Shuicheng Yan
In this work, we propose a novel MISA framework to approach offline RL from the perspective of Mutual Information between States and Actions in the dataset by directly constraining the policy improvement direction.
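A rough sketch of the idea named in this snippet, not the paper's exact bound: the policy is anchored to the dataset by behavior cloning and regularized with a mutual-information term between states and actions, here estimated with a generic InfoNCE lower bound as a stand-in for MISA's estimator. The `critic` and `policy` interfaces are assumptions.

```python
import math
import torch
import torch.nn.functional as F

def infonce_mi_lower_bound(critic, states, actions):
    """Lower-bound I(S; A) over a batch: matched (s_i, a_i) pairs are
    positives; pairing each state with the other actions gives negatives."""
    B = states.shape[0]
    s = states.unsqueeze(1).expand(B, B, -1).reshape(B * B, -1)
    a = actions.unsqueeze(0).expand(B, B, -1).reshape(B * B, -1)
    scores = critic(s, a).view(B, B)        # score every (s_i, a_j) pair
    labels = torch.arange(B, device=states.device)
    return math.log(B) - F.cross_entropy(scores, labels)

def policy_loss(policy, critic, states, actions, alpha=1.0):
    # Behavior cloning anchors the policy to the data; the MI term keeps the
    # improvement direction aligned with state-action dependencies in it.
    bc = -policy.log_prob(states, actions).mean()
    mi = infonce_mi_lower_bound(critic, states, actions)
    return bc - alpha * mi
```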
no code implementations • 25 Jun 2022 • Yang Yue, Bingyi Kang, Zhongwen Xu, Gao Huang, Shuicheng Yan
Recently, visual representation learning has been shown to be effective and promising for boosting sample efficiency in RL.
1 code implementation • 9 Oct 2021 • Yifan Zhang, Bingyi Kang, Bryan Hooi, Shuicheng Yan, Jiashi Feng
Deep long-tailed learning, one of the most challenging problems in visual recognition, aims to train well-performing deep models from a large number of images that follow a long-tailed class distribution.
1 code implementation • 7 Jun 2021 • Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng
Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.
Ranked #183 on Image Classification on ImageNet
5 code implementations • 22 Mar 2021 • Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng
In this paper, we show that, unlike convolutional neural networks (CNNs), which can be improved by stacking more convolutional layers, the performance of ViTs saturates quickly when they are scaled to be deeper.
Ranked #465 on Image Classification on ImageNet
no code implementations • 1 Jan 2021 • Bingyi Kang, Shie Mannor, Jiashi Feng
Reinforcement Learning (RL) with safety guarantees is critical for agents performing tasks in risky environments.
no code implementations • ICLR 2021 • Bingyi Kang, Yu Li, Sa Xie, Zehuan Yuan, Jiashi Feng
Motivated by this question, we conduct a series of studies on the performance of self-supervised contrastive learning and supervised learning methods over multiple datasets where training instance distributions vary from a balanced one to a long-tailed one.
Ranked #40 on Long-tail Learning on CIFAR-10-LT (ρ=10)
1 code implementation • ICLR 2021 • Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell
In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks.
2 code implementations • NeurIPS 2020 • Kaixin Wang, Bingyi Kang, Jie Shao, Jiashi Feng
Deep reinforcement learning (RL) agents trained in a limited set of environments tend to suffer overfitting and fail to generalize to unseen testing environments.
1 code implementation • 6 Aug 2020 • Zi-Hang Jiang, Bingyi Kang, Kuangqi Zhou, Jiashi Feng
To be specific, we devise a simple and efficient meta-reweighting strategy that adapts the sample representations and generates soft attention to refine them, so that relevant features from the query and support samples can be extracted for better few-shot classification.
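A minimal sketch (assumed shapes) of one such meta-reweighting step: soft attention between a query feature and the support features emphasizes relevant support evidence and refines the query representation; the exact architecture in the paper may differ.

```python
import torch
import torch.nn.functional as F

def meta_reweight(query, support):
    """query: [D]; support: [K, D] for K support shots. Returns a refined query."""
    # Soft attention over the support set, scaled dot-product style.
    attn = F.softmax(support @ query / query.shape[0] ** 0.5, dim=0)   # [K]
    context = (attn.unsqueeze(1) * support).sum(dim=0)                 # [D]
    return query + context        # residual refinement of the representation
```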
1 code implementation • ECCV 2020 • Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Junhao Liew, Sheng Tang, Steven Hoi, Jiashi Feng
Specifically, we systematically investigate the performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset, and unveil that a major cause is the inaccurate classification of object proposals.
2 code implementations • CVPR 2020 • Yu Li, Tao Wang, Bingyi Kang, Sheng Tang, Chunfeng Wang, Jintao Li, Jiashi Feng
Solving long-tail, large-vocabulary object detection with deep learning based models is a challenging and demanding task that remains under-explored. In this work, we provide the first systematic analysis of the underperformance of state-of-the-art models on long-tailed distributions.
1 code implementation • 29 Oct 2019 • Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Jun Hao Liew, Sheng Tang, Steven Hoi, Jiashi Feng
In this report, we investigate the performance drop phenomenon of state-of-the-art two-stage instance segmentation models when processing extreme long-tail training data based on the LVIS [5] dataset, and find a major cause is the inaccurate classification of object proposals.
4 code implementations • ICLR 2020 • Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis
The long-tail distribution of the visual world poses great challenges to deep learning based classification models in handling the class imbalance problem.
Ranked #3 on Long-tail learning with class descriptors on CUB-LT
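A minimal two-stage sketch of the decoupling recipe associated with this paper, with hypothetical helpers (`train_epoch`, the loaders, epoch counts): learn representations with instance-balanced sampling, then freeze the backbone and retrain only the classifier with class-balanced sampling.

```python
import torch.nn as nn

def stage_one(backbone, classifier, instance_balanced_loader, train_epoch):
    # Standard representation learning on the natural (imbalanced) sampling.
    for _ in range(90):
        train_epoch(nn.Sequential(backbone, classifier), instance_balanced_loader)

def stage_two(backbone, num_features, num_classes, class_balanced_loader, train_epoch):
    for p in backbone.parameters():
        p.requires_grad = False                  # keep the learned representation
    classifier = nn.Linear(num_features, num_classes)   # fresh classifier head
    # Short retraining of the classifier alone, now with balanced sampling.
    for _ in range(10):
        train_epoch(nn.Sequential(backbone, classifier), class_balanced_loader)
    return classifier
```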
2 code implementations • 21 Oct 2019 • Zhuang Liu, Xuanlin Li, Bingyi Kang, Trevor Darrell
In this work, we present the first comprehensive study of regularization techniques with multiple policy optimization algorithms on continuous control tasks.
1 code implementation • 21 Oct 2019 • Zhuang Liu, Hung-Ju Wang, Tinghui Zhou, Zhiqiang Shen, Bingyi Kang, Evan Shelhamer, Trevor Darrell
Interestingly, the processing model's ability to enhance recognition quality can transfer when evaluated on models of different architectures, recognized categories, tasks and training datasets.
no code implementations • 25 Dec 2018 • Huijuan Xu, Bingyi Kang, Ximeng Sun, Jiashi Feng, Kate Saenko, Trevor Darrell
In this paper, we present a conceptually simple and general yet novel framework for few-shot temporal activity detection which detects the start and end time of the few-shot input activities in an untrimmed video.
4 code implementations • ICCV 2019 • Bingyi Kang, Zhuang Liu, Xin Wang, Fisher Yu, Jiashi Feng, Trevor Darrell
The feature learner extracts meta features that are generalizable to detect novel object classes, using training data from base classes with sufficient samples.
Ranked #23 on Few-Shot Object Detection on MS-COCO (30-shot)
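A minimal sketch (assumed shapes) of the feature reweighting this entry describes: a reweighting module turns support examples into per-class coefficient vectors that channel-wise modulate the meta features of the query image, yielding class-specific features for the detection head.

```python
import torch

def reweight_features(meta_features, class_codes):
    """meta_features: [B, C, H, W] from the feature learner;
    class_codes: [N, C], one reweighting vector per (novel) class."""
    B, C, H, W = meta_features.shape
    N = class_codes.shape[0]
    # Broadcast: each class code scales the channels of the shared features,
    # producing [B, N, C, H, W] class-specific feature maps.
    return meta_features.unsqueeze(1) * class_codes.view(1, N, C, 1, 1)
```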
no code implementations • ICML 2018 • Bingyi Kang, Zequn Jie, Jiashi Feng
Exploration remains a significant challenge to reinforcement learning methods, especially in environments where reward signals are sparse.
no code implementations • ICLR 2018 • Tom Zahavy, Bingyi Kang, Alex Sivak, Jiashi Feng, Huan Xu, Shie Mannor
As most deep learning algorithms are stochastic (e.g., Stochastic Gradient Descent, Dropout, and Bayes-by-backprop), we revisit the robustness arguments of Xu & Mannor, and introduce a new approach, ensemble robustness, that concerns the robustness of a population of hypotheses.