no code implementations • 30 Jul 2024 • WeiYu Chen, James T. Kwok
However, it suffers from scalability issues when the number of tasks is large.
1 code implementation • 4 Jul 2024 • Tao Li, Weisen Jiang, Fanghui Liu, Xiaolin Huang, James T. Kwok
Pre-training followed by fine-tuning is widely adopted among practitioners.
no code implementations • 19 Jun 2024 • Hansi Yang, James T. Kwok
Distributed learning, which does not require gathering training data in a central location, has become increasingly important in the big-data era.
no code implementations • 3 Jun 2024 • Lifeng Shen, Jincheng Yu, Hansi Yang, James T. Kwok
Mixup and its variants form a popular class of data augmentation techniques. Given a random pair of samples, mixup generates a new sample by linearly interpolating their inputs and labels.
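For reference, a minimal sketch of the basic mixup operation described above (the standard formulation, not the specific variant studied in this paper); the `alpha` value and one-hot label format are assumptions:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Basic mixup: linearly interpolate a random pair of inputs and their one-hot labels."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient sampled from Beta(alpha, alpha)
    x_new = lam * x1 + (1.0 - lam) * x2     # interpolate inputs
    y_new = lam * y1 + (1.0 - lam) * y2     # interpolate one-hot labels
    return x_new, y_new
```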
no code implementations • 31 May 2024 • Runsheng Yu, Yong Wang, Xiaoqi Jiao, Youzhi Zhang, James T. Kwok
To alleviate this problem, we investigate using the intrinsic knowledge within the LLM being fine-tuned on the fly to obtain relative qualities and help refine the loss function.
no code implementations • 1 May 2024 • Zhili Liu, Yunhao Gou, Kai Chen, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok
As the capabilities of large language models (LLMs) have expanded dramatically, aligning these models with human values presents a significant challenge.
no code implementations • 14 Mar 2024 • Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung, James T. Kwok, Yu Zhang
To construct robust MLLMs, we propose ECSO (Eyes Closed, Safety On), a novel training-free protecting approach that exploits the inherent safety awareness of MLLMs, and generates safer responses via adaptively transforming unsafe images into texts to activate the intrinsic safety mechanism of pre-aligned LLMs in MLLMs.
no code implementations • 8 Feb 2024 • Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok
It also obtains new state-of-the-art self-supervised learning results on detection and segmentation.
1 code implementation • 4 Feb 2024 • Yanbin Wei, Qiushi Huang, James T. Kwok, Yu Zhang
Knowledge Graph Completion (KGC) is crucial for addressing knowledge graph incompleteness and supporting downstream applications.
1 code implementation • 3 Feb 2024 • Yanbin Wei, Shuai Fu, Weisen Jiang, Zejian Zhang, Zhixiong Zeng, Qi Wu, James T. Kwok, Yu Zhang
Large Language Models (LLMs) are increasingly used for various tasks with graph structures.
no code implementations • 19 Dec 2023 • Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-yan Yeung, James T. Kwok, Yu Zhang
Instruction tuning of Large Vision-language Models (LVLMs) has revolutionized the development of versatile models with zero-shot generalization across a wide range of downstream vision-language tasks.
no code implementations • 10 Nov 2023 • Mingwei Xu, Xiaofeng Cao, Ivor W. Tsang, James T. Kwok
In this paper, we replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model.
no code implementations • 3 Oct 2023 • Weisen Jiang, Baijiong Lin, Han Shi, Yu Zhang, Zhenguo Li, James T. Kwok
Recently, various merging methods have been proposed to build a multi-task model from task-specific finetuned models without retraining.
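For context, a minimal sketch of the simplest merging strategy (uniform parameter averaging of task-specific finetuned models); this is only a baseline illustration under that assumption, not the method proposed in the paper:

```python
import torch

def average_merge(state_dicts):
    """Merge task-specific finetuned models by uniformly averaging their parameters."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
    return merged
```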
no code implementations • 23 Sep 2023 • Yulong Zhang, Shuhao Chen, Weisen Jiang, Yu Zhang, Jiangang Lu, James T. Kwok
However, the performance of existing UDA methods is constrained by the large domain shift and limited target domain data.
1 code implementation • 21 Sep 2023 • Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu
Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%.
Ranked #51 on Arithmetic Reasoning on GSM8K (using extra training data)
1 code implementation • 23 Aug 2023 • Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu, James T. Kwok
Multi-task learning (MTL), a learning paradigm to learn multiple related tasks simultaneously, has achieved great success in various fields.
no code implementations • 15 Aug 2023 • Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok
Instead of using forward or backward reasoning alone, we propose FOBAR to combine FOrward and BAckward Reasoning for verification.
1 code implementation • 1 Jun 2023 • Weisen Jiang, Yu Zhang, James T. Kwok
Combining meta-learning the prompt pool and RepVerb, we propose MetaPrompter for effective structured prompting.
1 code implementation • 18 May 2023 • Qianli Ma, Zhen Liu, Zhenjing Zheng, Ziyang Huang, Siying Zhu, Zhongzhong Yu, James T. Kwok
Time-Series Mining (TSM) is an important research area since it shows great potential in practical applications.
no code implementations • 27 Sep 2022 • Hui Zhang, Quanming Yao, James T. Kwok, Xiang Bai
We design a domain-specific search space by exploring principles for having good feature extractors.
no code implementations • 29 Jul 2022 • Xiaofeng Cao, Weixin Bu, Shengjun Huang, Min-Ling Zhang, Ivor W. Tsang, Yew Soon Ong, James T. Kwok
In the future, learning on small data that approximates the generalization ability of big data will be one of the ultimate goals of AI, as it requires machines to recognize objectives and scenarios from only small amounts of data, as humans do.
no code implementations • 30 Jun 2022 • Xiaofeng Cao, Yaming Guo, Ivor W. Tsang, James T. Kwok
An inherent assumption is that this learning manner can drive those updates toward the optimal hypothesis.
no code implementations • ICLR 2022 • Han Shi, Jiahui Gao, Hang Xu, Xiaodan Liang, Zhenguo Li, Lingpeng Kong, Stephen M. S. Lee, James T. Kwok
Recently, the over-smoothing phenomenon of Transformer-based models has been observed in both the vision and language domains.
1 code implementation • 17 Sep 2021 • Zac Wellmer, James T. Kwok
By training the World Model using dropout, a nearly infinite number of different dream environments can be created.
no code implementations • 13 Jun 2021 • Huapeng Wu, Jie Gui, Jun Zhang, James T. Kwok, Zhihui Wei
Recently, deep convolutional neural network methods have achieved excellent performance in image super-resolution (SR), but they cannot be easily applied to embedded devices due to their large memory cost.
no code implementations • 13 Jun 2021 • Huapeng Wu, Jie Gui, Jun Zhang, James T. Kwok, Zhihui Wei
Recently, convolutional neural network (CNN) based image super-resolution (SR) methods have achieved significant performance improvement.
1 code implementation • NeurIPS 2021 • Haoang Chi, Feng Liu, Wenjing Yang, Long Lan, Tongliang Liu, Bo Han, William K. Cheung, James T. Kwok
To this end, we propose a target orientated hypothesis adaptation network (TOHAN) to solve the FHA problem, where we generate highly-compatible unlabeled data (i.e., an intermediate domain) to help train a target-domain classifier.
1 code implementation • 25 Feb 2021 • Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok
A surprising result is that diagonal elements in the attention map are the least important compared with other attention positions.
1 code implementation • 9 Nov 2020 • Bo Han, Quanming Yao, Tongliang Liu, Gang Niu, Ivor W. Tsang, James T. Kwok, Masashi Sugiyama
Classical machine learning implicitly assumes that labels of the training data are sampled from a clean distribution, which can be too restrictive for real-world scenarios.
no code implementations • 14 Aug 2020 • Yaqing Wang, Quanming Yao, James T. Kwok
Extensive low-rank matrix completion experiments on a number of synthetic and real-world data sets show that the proposed method obtains state-of-the-art recovery performance while being the fastest in comparison to existing low-rank matrix learning methods.
no code implementations • 26 Nov 2019 • Han Shi, Haozheng Fan, James T. Kwok
We propose the triad decoder, which considers and predicts the three edges involved in a local triad together.
1 code implementation • NeurIPS 2020 • Han Shi, Renjie Pi, Hang Xu, Zhenguo Li, James T. Kwok, Tong Zhang
In this work, we propose BONAS (Bayesian Optimized Neural Architecture Search), a sample-based NAS framework which is accelerated using weight-sharing to evaluate multiple related architectures simultaneously.
no code implementations • 25 Sep 2019 • Han Shi, Renjie Pi, Hang Xu, Zhenguo Li, James T. Kwok, Tong Zhang
Inspired by the nature of the graph structure of a neural network, we propose BOGCN-NAS, a NAS algorithm using Bayesian Optimization with Graph Convolutional Network (GCN) predictor.
1 code implementation • NeurIPS 2019 • Shuai Zheng, Ziyue Huang, James T. Kwok
In particular, on distributed ResNet training with 7 workers on ImageNet, the proposed algorithm achieves the same testing accuracy as momentum SGD using full-precision gradients, but with $46\%$ less wall clock time.
no code implementations • 23 May 2019 • Shuai Zheng, James T. Kwok
Stochastic methods with coordinate-wise adaptive stepsize (such as RMSprop and Adam) have been widely used in training deep neural networks.
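For reference, a minimal sketch of an RMSprop-style coordinate-wise adaptive update, illustrating the general class of methods (RMSprop, Adam) referred to above; the hyperparameter values are assumptions:

```python
import numpy as np

def rmsprop_step(w, grad, v, lr=1e-3, beta=0.9, eps=1e-8):
    """One RMSprop-style update: each coordinate gets its own effective stepsize."""
    v = beta * v + (1.0 - beta) * grad**2      # running average of squared gradients
    w = w - lr * grad / (np.sqrt(v) + eps)     # coordinate-wise scaled step
    return w, v
```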
no code implementations • ICLR 2019 • Lu Hou, Ruiliang Zhang, James T. Kwok
We show that (i) weight-quantized networks converge to an error related to the weight quantization resolution and weight dimension; (ii) quantizing gradients slows convergence by a factor related to the gradient quantization resolution and dimension; and (iii) clipping the gradient before quantization renders this factor dimension-free, thus allowing the use of fewer bits for gradient quantization.
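As a minimal sketch of the clip-then-quantize idea in point (iii), using simple uniform stochastic quantization; the clipping threshold and bit-width are assumptions, not the paper's exact scheme:

```python
import numpy as np

def clip_and_quantize(grad, clip=1.0, bits=4, rng=None):
    """Clip gradient entries to [-clip, clip], then stochastically round them to a uniform grid."""
    if rng is None:
        rng = np.random.default_rng()
    g = np.clip(grad, -clip, clip)
    levels = 2 ** bits - 1
    scaled = (g + clip) / (2 * clip) * levels           # map to [0, levels]
    q = np.floor(scaled + rng.random(scaled.shape))     # stochastic (unbiased) rounding
    return q / levels * (2 * clip) - clip               # map back to [-clip, clip]
```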
no code implementations • 8 Mar 2019 • Yaqing Wang, James T. Kwok, Lionel M. Ni
However, existing CSC methods can only model noise from a Gaussian distribution, which is restrictive and unrealistic.
no code implementations • 23 Nov 2018 • Quanming Yao, Xiawei Guo, James T. Kwok, WeiWei Tu, Yuqiang Chen, Wenyuan Dai, Qiang Yang
To meet the standard of differential privacy, noise is usually added into the original data, which inevitably deteriorates the predicting performance of subsequent learning algorithms.
1 code implementation • 23 Jul 2018 • Quanming Yao, James T. Kwok, Bo Han
Due to its ease of optimization, the convex overlapping nuclear norm has been popularly used for tensor completion.
no code implementations • ICML 2018 • Shuai Zheng, James T. Kwok
The memory cost of SSAG does not depend on the sample size, while that of S-SAGA is the same as those of variance reduction methods on unperturbed data.
no code implementations • 4 May 2018 • Lu Hou, James T. Kwok
The power law has been observed in the degree distributions of many biological neural networks.
no code implementations • ICML 2018 • Yaqing Wang, Quanming Yao, James T. Kwok, Lionel M. Ni
Convolutional sparse coding (CSC) has been popularly used for the learning of shift-invariant dictionaries in image and signal processing.
1 code implementation • ICLR 2018 • Lu Hou, James T. Kwok
The huge size of deep networks hinders their use in small computing devices.
no code implementations • 1 Aug 2017 • Quanming Yao, James T. Kwok, Taifeng Wang, Tie-Yan Liu
Based on it, we develop a proximal gradient algorithm (and its accelerated variant) with inexact proximal splitting, and prove that a convergence rate of $O(1/T)$, where $T$ is the number of iterations, is guaranteed.
no code implementations • ICML 2017 • Shuai Zheng, James T. Kwok
Deep networks are highly nonlinear and difficult to optimize.
no code implementations • 21 Jun 2017 • Yaqing Wang, Quanming Yao, James T. Kwok, Lionel M. Ni
Convolutional sparse coding (CSC) improves sparse coding by learning a shift-invariant dictionary from the data.
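For reference, a common formulation of the CSC objective mentioned above (given here only as background, not necessarily the exact formulation used in the paper): given signals $x_i$, one learns convolutional filters $d_k$ and sparse codes $z_{i,k}$ by

```latex
\min_{\{d_k\},\,\{z_{i,k}\}} \;\sum_i \frac{1}{2}\Big\| x_i - \sum_k d_k * z_{i,k} \Big\|_2^2
\;+\; \lambda \sum_{i,k} \| z_{i,k} \|_1
\qquad \text{s.t.}\;\; \|d_k\|_2 \le 1 \;\;\forall k .
```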
no code implementations • 4 Apr 2017 • Yue Zhu, James T. Kwok, Zhi-Hua Zhou
In fact, in real-world applications, both cases may occur: some label correlations are globally applicable, while others are shared only within a local group of instances.
no code implementations • 29 Dec 2016 • Quanming Yao, James T. Kwok, Fei Gao, Wei Chen, Tie-Yan Liu
The proximal gradient algorithm has been popularly used for convex optimization.
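As a reference point, a minimal sketch of the standard proximal gradient (ISTA) iteration for an $\ell_1$-regularized least-squares problem; the problem instance is an assumption used purely for illustration:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, b, lam, n_iters=200):
    """Proximal gradient for min_w 0.5*||A w - b||^2 + lam*||w||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ w - b)                          # gradient of the smooth part
        w = soft_threshold(w - step * grad, step * lam)   # proximal step
    return w
```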
1 code implementation • 5 Nov 2016 • Lu Hou, Quanming Yao, James T. Kwok
Deep neural network models, though very powerful and highly successful, are computationally expensive in terms of space and time.
no code implementations • 29 Oct 2016 • Quanming Yao, James T. Kwok, Xiawei Guo
In this paper, we show that a closed-form solution can be derived for the proximal step associated with this regularizer.
no code implementations • 27 Jul 2016 • Quanming Yao, James T. Kwok
Learning of low-rank matrices is fundamental to many machine learning applications.
no code implementations • 13 Jun 2016 • Quanming Yao, James T. Kwok
The nonconvex regularizer is then transformed into a familiar convex regularizer, while the resultant loss function can still be guaranteed to be smooth.
no code implementations • 24 Apr 2016 • Shuai Zheng, James T. Kwok
The alternating direction method of multipliers (ADMM) is a powerful optimization solver in machine learning.
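For context, a minimal sketch of ADMM applied to the lasso problem (a textbook example, not the setting studied in the paper); `rho` and the iteration count are assumptions:

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, n_iters=100):
    """ADMM for min_w 0.5*||A w - b||^2 + lam*||w||_1, with the splitting w = z."""
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)                                  # scaled dual variable
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))     # cached for the w-update
    for _ in range(n_iters):
        w = M @ (A.T @ b + rho * (z - u))            # w-update: ridge-like solve
        z = np.sign(w + u) * np.maximum(np.abs(w + u) - lam / rho, 0.0)  # z-update: soft-threshold
        u = u + w - z                                # dual update
    return z
```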
no code implementations • 25 Feb 2016 • Shuai Zheng, Ruiliang Zhang, James T. Kwok
In regularized risk minimization, the associated optimization problem becomes particularly difficult when both the loss and regularizer are nonsmooth.
1 code implementation • 3 Dec 2015 • Quanming Yao, James T. Kwok, Wenliang Zhong
This allows the use of power method to approximate the SVD efficiently.
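As a reminder of the building block mentioned above, a minimal power-method sketch for approximating the leading singular pair of a matrix; this illustrates the general idea rather than the paper's exact procedure:

```python
import numpy as np

def power_method_svd(X, n_iters=50, rng=None):
    """Approximate the leading singular triple (u, s, v) of X by power iteration on X^T X."""
    if rng is None:
        rng = np.random.default_rng(0)
    v = rng.standard_normal(X.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        v = X.T @ (X @ v)          # one step of power iteration on X^T X
        v /= np.linalg.norm(v)
    s = np.linalg.norm(X @ v)      # leading singular value
    u = X @ v / s                  # leading left singular vector
    return u, s, v
```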
no code implementations • 7 Aug 2015 • Ruiliang Zhang, Shuai Zheng, James T. Kwok
With the recent proliferation of large-scale learning problems, there has been a lot of interest in distributed machine learning algorithms, particularly those based on stochastic gradient descent (SGD) and its variants.
no code implementations • 16 Aug 2013 • Leon Wenliang Zhong, James T. Kwok
This matches the convergence rate of the batch ADMM algorithm, but without the need to visit all the samples in each iteration.
no code implementations • 6 Mar 2013 • Yu-Feng Li, Ivor W. Tsang, James T. Kwok, Zhi-Hua Zhou
In this paper, we study the problem of learning from weakly labeled data, where labels of the training examples are incomplete.
no code implementations • NeurIPS 2012 • Wei Bi, James T. Kwok
However, while there have been many MLNP methods in hierarchical multiclass classification, performing MLNP in hierarchical multilabel classification is much more difficult.
no code implementations • NeurIPS 2012 • James T. Kwok, Ryan P. Adams
We show how to perform MAP inference with DPP priors in latent Dirichlet allocation and in mixture models, leading to better intuition for the latent variable representation and quantitatively improved unsupervised feature extraction, without compromising the generative aspects of the model.
no code implementations • NeurIPS 2009 • Chonghai Hu, Weike Pan, James T. Kwok
Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., $\ell_1$-regularizer).
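For concreteness, the generic regularized risk minimization objective referred to above, with the hinge loss and $\ell_1$-regularizer as the non-smooth examples mentioned (a standard formulation, given here only for reference):

```latex
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} \max\big(0,\, 1 - y_i\, w^\top x_i\big) \;+\; \lambda \|w\|_1 .
```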