1 code implementation • 21 Apr 2025 • Hongcheng Gao, Yue Liu, Yufei He, Longxu Dou, Chao Du, Zhijie Deng, Bryan Hooi, Min Lin, Tianyu Pang
This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query.
1 code implementation • 26 Mar 2025 • Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin
DeepSeek-R1-Zero has shown that reinforcement learning (RL) at scale can directly enhance the reasoning capabilities of LLMs without supervised fine-tuning.
1 code implementation • 3 Mar 2025 • Xinyi Wan, Penghui Qi, Guangxing Huang, Jialin Li, Min Lin
In this paper, we focus on addressing this challenge by leveraging the under-explored memory offload strategy in PP.
1 code implementation • 29 Jan 2025 • Rui Min, Tianyu Pang, Chao Du, Qian Liu, Minhao Cheng, Min Lin
We first introduce a straightforward target-only rigging strategy that focuses on new battles involving $m_{t}$, identifying it via watermarking or a binary classifier, and exclusively voting for $m_{t}$ wins.
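As a rough illustration of the target-only rule described above (a hypothetical sketch; the watermarking/classifier step for identifying the anonymized $m_{t}$ is abstracted into exact name matches here):

```python
def target_only_vote(model_a: str, model_b: str, target: str = "m_t"):
    """Hypothetical sketch of the target-only rigging rule: vote only in battles
    that involve the target model, and always in its favor; otherwise cast no vote.
    Identifying the anonymized target (via watermarking or a classifier) is assumed."""
    if model_a == target:
        return "A"
    if model_b == target:
        return "B"
    return None  # battle does not involve the target; abstain
```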
no code implementations • 15 Jan 2025 • Xianqi Wang, Hao Yang, Gangwei Xu, Junda Cheng, Min Lin, Yong Deng, Jinliang Zang, Yurui Chen, Xin Yang
This pipeline utilizes arbitrary single images as left images and pseudo disparities generated by a monocular depth estimation model to synthesize high-quality corresponding right images.
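For intuition, the geometry behind synthesizing a pseudo right view is a horizontal warp by the disparity; the sketch below is a naive forward warp (names and the lack of occlusion/hole handling are illustrative assumptions, not the paper's pipeline):

```python
import numpy as np

def synthesize_right_view(left: np.ndarray, disparity: np.ndarray) -> np.ndarray:
    """Naive forward warp: for a rectified pair, a left-image pixel at column x maps
    to column x - d in the right view. Occluded/unfilled pixels stay zero."""
    h, w = left.shape[:2]
    right = np.zeros_like(left)
    for y in range(h):
        for x in range(w):
            xr = int(round(x - disparity[y, x]))
            if 0 <= xr < w:
                right[y, xr] = left[y, x]
    return right
```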
no code implementations • 23 Dec 2024 • Min Lin, Gangwei Xu, Yun Wang, Xianqi Wang, Xin Yang
In this paper, we propose a novel global-aware scene flow estimation network with global motion propagation, named FlowMamba.
1 code implementation • 27 Nov 2024 • Zekun Shi, Zheyuan Hu, Min Lin, Kenji Kawaguchi
Separately, the exponential scaling in $k$ for univariate functions ($d=1$) was addressed with high-order auto-differentiation (AD).
no code implementations • 6 Nov 2024 • Tianbo Li, Min Lin, Stephen Dale, Zekun Shi, A. H. Castro Neto, Kostya S. Novoselov, Giovanni Vignale
We present a novel approach to address the challenges of variable occupation numbers in direct optimization of density functional theory (DFT).
1 code implementation • 3 Nov 2024 • Zichen Liu, Changyu Chen, Chao Du, Wee Sun Lee, Min Lin
The results demonstrate that SEA achieves highly sample-efficient alignment with the oracle's preferences, outperforming recent active exploration methods for LLMs.
1 code implementation • 24 Oct 2024 • Shen Nie, Fengqi Zhu, Chao Du, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, Chongxuan Li
Masked diffusion models (MDMs) have shown promise in language modeling, yet their scalability and effectiveness in core language tasks, such as text generation and language understanding, remain underexplored.
1 code implementation • 17 Oct 2024 • Xuan Zhang, Cunxiao Du, Chao Du, Tianyu Pang, Wei Gao, Min Lin
To mitigate this issue, we present SimLayerKV, a simple yet effective method that reduces inter-layer KV cache redundancies by selectively dropping cache in identified lazy layers.
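A minimal sketch of the cache-trimming idea (hypothetical; how lazy layers are identified and how many tokens are kept are assumptions here, not the paper's exact policy):

```python
def trim_kv_cache(kv_cache, lazy_layers, keep_recent=4):
    """For layers flagged as 'lazy', keep only the most recent tokens of the KV
    cache; non-lazy layers keep their full cache. kv_cache maps layer index to
    (keys, values) tensors shaped (..., seq_len, head_dim)."""
    trimmed = {}
    for layer, (keys, values) in kv_cache.items():
        if layer in lazy_layers:
            trimmed[layer] = (keys[..., -keep_recent:, :], values[..., -keep_recent:, :])
        else:
            trimmed[layer] = (keys, values)
    return trimmed
```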
1 code implementation • 16 Oct 2024 • Hongcheng Gao, Tianyu Pang, Chao Du, Taihang Hu, Zhijie Deng, Min Lin
With the rapid progress of diffusion-based content generation, significant efforts are being made to unlearn harmful or copyrighted concepts from pretrained diffusion models (DMs) to prevent potential model misuse.
1 code implementation • 14 Oct 2024 • Xiangming Gu, Tianyu Pang, Chao Du, Qian Liu, Fengzhuo Zhang, Cunxiao Du, Ye Wang, Min Lin
In this work, we first demonstrate that attention sinks exist universally in LMs with various inputs, even in small models.
1 code implementation • 14 Oct 2024 • Kuofeng Gao, Tianyu Pang, Chao Du, Yong Yang, Shu-Tao Xia, Min Lin
To overcome this limitation, we propose poisoning-based DoS (P-DoS) attacks for LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit.
1 code implementation • 10 Oct 2024 • Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin
Specifically, the behavior that untargeted unlearning attempts to approximate is unpredictable and may involve hallucinations, and existing regularization is insufficient for targeted unlearning.
1 code implementation • 9 Oct 2024 • Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin
Achieving high win rates on these benchmarks can significantly boost the promotional impact of newly released language models.
1 code implementation • 18 Jul 2024 • Chaofan Tao, Qian Liu, Longxu Dou, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min Lin, Ngai Wong
We investigate how vocabulary size impacts LLM scaling laws by training models ranging from 33M to 3B parameters on up to 500B characters with various vocabulary configurations.
1 code implementation • 1 Jul 2024 • Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, Jing Jiang, Min Lin
RegMix trains many small models on diverse data mixtures, uses regression to predict the performance of unseen mixtures, and applies the best predicted mixture to train a large-scale model with orders of magnitude more compute.
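A toy sketch of that recipe (the regressor family, number of proxy runs, and domain count below are placeholders, not the paper's settings):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 1) Train small proxy models on random data mixtures and record their losses.
mixtures = np.random.dirichlet(np.ones(5), size=64)       # 64 mixtures over 5 data domains
proxy_losses = np.random.rand(64)                          # stand-in for measured proxy losses

# 2) Fit a regressor from mixture weights to loss.
reg = LinearRegression().fit(mixtures, proxy_losses)

# 3) Search many candidate mixtures and pick the one predicted to be best
#    for the single large-scale training run.
candidates = np.random.dirichlet(np.ones(5), size=100_000)
best_mixture = candidates[np.argmin(reg.predict(candidates))]
```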
1 code implementation • 14 Jun 2024 • Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin
In this work, we make a novel observation that this implicit reward model can by itself be used in a bootstrapping fashion to further align the LLM.
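For reference, the implicit reward typically recovered in this setting is the DPO-style log-ratio between the policy and the reference model (standard notation shown below; the paper's exact parameterization may differ):

$$
r_\theta(x, y) \;=\; \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}.
$$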
1 code implementation • 13 Jun 2024 • Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin
The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving.
1 code implementation • 3 Jun 2024 • Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin
In addition, we conduct comprehensive and elaborate (e.g., making sure to use correct system prompts) evaluations against other aligned LLMs and advanced defenses, where our method consistently achieves nearly 100% ASRs.
1 code implementation • 31 May 2024 • Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin
Many red-teaming efforts aim to jailbreak LLMs, where among these efforts, the Greedy Coordinate Gradient (GCG) attack's success has led to a growing interest in the study of optimization-based jailbreaking techniques.
1 code implementation • 24 May 2024 • Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin
To address this, we introduce a family of memory efficient building blocks with controllable activation memory, which can reduce the peak activation memory to 1/2 of 1F1B without sacrificing efficiency, and even to 1/3 with comparable throughput.
3 code implementations • 4 Apr 2024 • Longxu Dou, Qian Liu, Guangtao Zeng, Jia Guo, Jiahui Zhou, Wei Lu, Min Lin
We present Sailor, a family of open language models ranging from 0.5B to 7B parameters, tailored for South-East Asian (SEA) languages.
1 code implementation • 12 Mar 2024 • Tongyao Zhu, Qian Liu, Liang Pang, Zhengbao Jiang, Min-Yen Kan, Min Lin
Through carefully designed synthetic tasks, covering the scenarios of full recitation, selective recitation, and grounded question answering, we reveal that LMs manage to access their memory sequentially while struggling to randomly access memorized content.
1 code implementation • 26 Feb 2024 • Yijing Liu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, Wei Chen
Recent research has made significant progress in optimizing diffusion models for downstream objectives, which is an important pursuit in fields such as graph generation for drug design.
no code implementations • 19 Feb 2024 • Tianlin Li, Qian Liu, Tianyu Pang, Chao Du, Qing Guo, Yang Liu, Min Lin
The emerging success of large language models (LLMs) heavily relies on collecting abundant training data from external (untrusted) sources.
1 code implementation • 13 Feb 2024 • Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin
A multimodal large language model (MLLM) agent can receive instructions, capture images, retrieve histories from memory, and decide which tools to use.
1 code implementation • 13 Feb 2024 • Dong Lu, Tianyu Pang, Chao Du, Qian Liu, Xianjun Yang, Min Lin
Backdoor attacks are commonly executed by contaminating training data, such that a trigger can activate predetermined harmful effects during the test phase.
no code implementations • 23 Jan 2024 • Zichen Liu, Chao Du, Wee Sun Lee, Min Lin
Unfortunately, NN-based models need re-training on all accumulated data at every interaction step to achieve FTL, which is computationally expensive for lifelong agents.
1 code implementation • 22 Jan 2024 • Jiawei Zhang, Tianyu Pang, Chao Du, Yi Ren, Bo Li, Min Lin
This technical report aims to fill a deficiency in the assessment of large multimodal models (LMMs) by specifically examining the self-consistency of their outputs when subjected to common corruptions.
1 code implementation • 30 Nov 2023 • Min Lin
We present a set of primitive operators that serve as foundational building blocks for constructing several key types of functionals.
1 code implementation • 30 Nov 2023 • Penghui Qi, Xinyi Wan, Guangxing Huang, Min Lin
Pipeline parallelism is one of the key components for large-scale distributed training, yet its efficiency suffers from pipeline bubbles which were deemed inevitable.
1 code implementation • 14 Nov 2023 • Ming Li, Pan Zhou, Jia-Wei Liu, Jussi Keppo, Min Lin, Shuicheng Yan, Xiangyu Xu
We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt.
1 code implementation • 11 Nov 2023 • Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, Mohan Kankanhalli
The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases.
1 code implementation • 1 Nov 2023 • Xiaosen Zheng, Tianyu Pang, Chao Du, Jing Jiang, Min Lin
Data attribution seeks to trace model outputs back to training data.
2 code implementations • 4 Oct 2023 • Xiangming Gu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, Ye Wang
Looking into this, we first observe that memorization behaviors tend to occur on smaller-sized datasets, which motivates our definition of effective model memorization (EMM), a metric measuring the maximum size of training data at which a learned diffusion model approximates its theoretical optimum.
1 code implementation • 29 Sep 2023 • Shengyi Huang, Jiayi Weng, Rujikorn Charakorn, Min Lin, Zhongwen Xu, Santiago Ontañón
Distributed Deep Reinforcement Learning (DRL) aims to leverage more computational resources to train autonomous agents with less training time.
2 code implementations • 25 Jul 2023 • Chengsong Huang, Qian Liu, Bill Yuchen Lin, Tianyu Pang, Chao Du, Min Lin
This paper investigates LoRA composability for cross-task generalization and introduces LoraHub, a simple framework devised for the purposive assembly of LoRA modules trained on diverse given tasks, with the objective of achieving adaptable performance on unseen tasks.
1 code implementation • NeurIPS 2023 • Stefan Lionar, Xiangyu Xu, Min Lin, Gim Hee Lee
Second, our Repulsive UDF is a novel alternative to the occupancy field used in MCC, significantly improving the quality of 3D object reconstruction.
1 code implementation • NeurIPS 2023 • Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Chongxuan Li, Ngai-Man Cheung, Min Lin
Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented performance in response generation, especially with visual inputs, enabling more creative and adaptable interaction than large language models such as ChatGPT.
2 code implementations • 3 May 2023 • Chao Du, Tianbo Li, Tianyu Pang, Shuicheng Yan, Min Lin
Sliced-Wasserstein Flow (SWF) is a promising approach to nonparametric generative modeling but has not been widely adopted due to its suboptimal generative quality and lack of conditional modeling capabilities.
1 code implementation • 17 Apr 2023 • Qian Liu, Fan Zhou, Zhengbao Jiang, Longxu Dou, Min Lin
Empirical results on various benchmarks validate that the integration of SQL execution leads to significant improvements in zero-shot scenarios, particularly in table reasoning.
1 code implementation • CVPR 2023 • Yunqing Zhao, Chao Du, Milad Abdollahzadeh, Tianyu Pang, Min Lin, Shuicheng Yan, Ngai-Man Cheung
To this end, we propose knowledge truncation to mitigate this issue in FSIG, which is a complementary operation to knowledge preservation and is implemented by a lightweight pruning-based method.
1 code implementation • 17 Mar 2023 • Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Ngai-Man Cheung, Min Lin
Diffusion models (DMs) have demonstrated advantageous potential on generative tasks.
no code implementations • 1 Mar 2023 • Tianbo Li, Min Lin, Zheyuan Hu, Kunhao Zheng, Giovanni Vignale, Kenji Kawaguchi, A. H. Castro Neto, Kostya S. Novoselov, Shuicheng Yan
Kohn-Sham Density Functional Theory (KS-DFT) has been traditionally solved by the Self-Consistent Field (SCF) method.
1 code implementation • NeurIPS 2023 • Tianyu Pang, Cheng Lu, Chao Du, Min Lin, Shuicheng Yan, Zhijie Deng
In this work, we observe that the stochastic reverse process of data scores is a martingale, from which concentration bounds and the optional stopping theorem for data scores can be derived.
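For readers less familiar with the tool being invoked, the optional stopping theorem states (in its simplest bounded form, generic notation):

$$
\mathbb{E}[X_\tau] \;=\; \mathbb{E}[X_0]
\quad \text{for a martingale } (X_t)_{t \ge 0} \text{ and a bounded stopping time } \tau .
$$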
1 code implementation • 9 Feb 2023 • Weichen Yu, Tianyu Pang, Qian Liu, Chao Du, Bingyi Kang, Yan Huang, Min Lin, Shuicheng Yan
With the advance of language models, privacy protection is receiving more attention.
4 code implementations • 9 Feb 2023 • Zekai Wang, Tianyu Pang, Chao Du, Min Lin, Weiwei Liu, Shuicheng Yan
Under the $\ell_\infty$-norm threat model with $\epsilon=8/255$, our models achieve $70.69\%$ and $42.67\%$ robust accuracy on CIFAR-10 and CIFAR-100, respectively, i.e., improving upon previous state-of-the-art models by $+4.58\%$ and $+8.03\%$.
1 code implementation • 28 Jan 2023 • Haozhe Feng, Tianyu Pang, Chao Du, Wei Chen, Shuicheng Yan, Min Lin
BAFFLE is 1) memory-efficient and easily fits within upload bandwidth; 2) compatible with inference-only hardware optimization and model quantization or pruning; and 3) well-suited to trusted execution environments, because the clients in BAFFLE only execute forward propagation and return a set of scalars to the server.
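A hypothetical sketch of how a client could contribute with forward passes only (a generic finite-difference estimator along random directions; the paper's actual estimator and aggregation protocol may differ):

```python
import numpy as np

def forward_only_update_scalars(loss_fn, params, sigma=1e-3, num_probes=8, seed=0):
    """Each client evaluates the loss at randomly perturbed parameters and returns
    only the resulting scalars; a server sharing the random seed can reconstruct
    the perturbation directions and hence a gradient estimate."""
    rng = np.random.default_rng(seed)
    base = loss_fn(params)                                   # one forward pass
    scalars = []
    for _ in range(num_probes):
        u = rng.standard_normal(params.shape)
        scalars.append((loss_fn(params + sigma * u) - base) / sigma)
    return scalars                                           # a handful of floats per round
```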
no code implementations • ICCV 2023 • Yun Wang, Cheng Chi, Min Lin, Xin Yang
This approach circulates high-resolution estimated information (scene flow and feature) from the preceding iteration back to the low-resolution layer of the current iteration.
1 code implementation • NeurIPS 2023 • Xiao Ma, Bingyi Kang, Zhongwen Xu, Min Lin, Shuicheng Yan
In this work, we propose a novel MISA framework to approach offline RL from the perspective of Mutual Information between States and Actions in the dataset by directly constraining the policy improvement direction.
no code implementations • 26 Sep 2022 • Yun Zhao, Hang Chen, Min Lin, Haiou Zhang, Tao Yan, Xing Lin, Ruqi Huang, Qionghai Dai
Increasing the layer number of on-chip photonic neural networks (PNNs) is essential to improving their model performance.
3 code implementations • 21 Jun 2022 • Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu, Shuicheng Yan
EnvPool is open-sourced at https://github.com/sail-sg/envpool.
no code implementations • 26 May 2022 • Tianyu Pang, Shuicheng Yan, Min Lin
In this paper, we substitute the Slater determinant with a pairwise antisymmetry construction, which is easy to implement and can reduce the computational cost to $O(N^2)$.
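As background, one textbook way to build an antisymmetric wave function from pairwise factors, costing $O(N^2)$ pair evaluations, is a product of antisymmetric pair functions (shown for intuition only; the paper's construction may differ):

$$
\Psi(x_1, \dots, x_N) \;=\; \prod_{1 \le i < j \le N} \phi(x_i, x_j),
\qquad \phi(x_i, x_j) = -\phi(x_j, x_i),
$$

so that exchanging any two particles flips the sign of their pair factor while merely permuting the remaining factors, and $\Psi$ changes sign overall.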
no code implementations • COLING 2022 • Ziqing Yang, Zihang Xu, Yiming Cui, Baoxin Wang, Min Lin, Dayong Wu, Zhigang Chen
It covers Standard Chinese, Yue Chinese, and six other ethnic minority languages.
1 code implementation • 21 Feb 2022 • Tianyu Pang, Min Lin, Xiao Yang, Jun Zhu, Shuicheng Yan
The trade-off between robustness and accuracy has been widely studied in the adversarial literature.
1 code implementation • 30 Dec 2021 • Yongduo Sui, Xiang Wang, Jiancan Wu, Min Lin, Xiangnan He, Tat-Seng Chua
To endow the classifier with better interpretation and generalization, we propose the Causal Attention Learning (CAL) strategy, which discovers the causal patterns and mitigates the confounding effect of shortcuts.
1 code implementation • NeurIPS 2021 • Xinhsuai Dong, Luu Anh Tuan, Min Lin, Shuicheng Yan, Hanwang Zhang
The fine-tuning of pre-trained language models has achieved great success in many NLP fields.
1 code implementation • 27 Oct 2021 • Kun Li, Meng Li, Yanling Li, Min Lin
Traditional trend prediction models predict short-term trends better than long-term trends.
no code implementations • 13 May 2021 • Bai Zhao, Min Lin, Ming Cheng, Wei-Ping Zhu, Naofal Al-Dhahir
This paper proposes a robust beamforming scheme to enhance the physical layer security (PLS) of multicast transmission in a cognitive satellite and aerial network (CSAN) operating in the millimeter wave frequency band.
no code implementations • NeurIPS 2020 • Massimo Caccia, Pau Rodriguez, Oleksiy Ostapenko, Fabrice Normandin, Min Lin, Lucas Page-Caccia, Issam Hadj Laradji, Irina Rish, Alexandre Lacoste, David Vázquez, Laurent Charlin
The main challenge is that the agent must not forget previous tasks and also adapt to novel tasks in the stream.
no code implementations • ICML Workshop LifelongML 2020 • Xu He, Min Lin
We compare these approaches in terms of both compression and forgetting and empirically study the reasons that limit the performance of continual learning methods based on variational posterior approximation.
1 code implementation • NeurIPS 2020 • Massimo Caccia, Pau Rodriguez, Oleksiy Ostapenko, Fabrice Normandin, Min Lin, Lucas Caccia, Issam Laradji, Irina Rish, Alexandre Lacoste, David Vazquez, Laurent Charlin
We propose Continual-MAML, an online extension of the popular MAML algorithm as a strong baseline for this scenario.
2 code implementations • NeurIPS 2019 • Rahaf Aljundi, Eugene Belilovsky, Tinne Tuytelaars, Laurent Charlin, Massimo Caccia, Min Lin, Lucas Page-Caccia
Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks.
1 code implementation • 11 Aug 2019 • Rahaf Aljundi, Lucas Caccia, Eugene Belilovsky, Massimo Caccia, Min Lin, Laurent Charlin, Tinne Tuytelaars
Methods based on replay, either generative or from a stored memory, have been shown to be effective approaches for continual learning, matching or exceeding the state of the art in a number of standard benchmarks.
no code implementations • 16 Jun 2019 • Min Lin, Jie Fu, Yoshua Bengio
In this study, we analyze parameter sharing under the conditional computation framework where the parameters of a neural network are conditioned on each input example.
5 code implementations • NeurIPS 2019 • Rahaf Aljundi, Min Lin, Baptiste Goujaud, Yoshua Bengio
To prevent forgetting, a replay buffer is usually employed to store the previous data for the purpose of rehearsal.
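As a minimal sketch of the rehearsal mechanism mentioned above (a plain reservoir-sampling buffer, not the gradient-based sample selection the paper itself proposes):

```python
import random

class ReplayBuffer:
    """Bounded rehearsal buffer: reservoir sampling keeps an approximately uniform
    sample of everything seen so far, which is replayed alongside new data to
    reduce forgetting."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0

    def add(self, example):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            idx = random.randrange(self.n_seen)
            if idx < self.capacity:
                self.data[idx] = example

    def sample(self, batch_size: int):
        return random.sample(self.data, min(batch_size, len(self.data)))
```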
2 code implementations • ICLR 2019 • Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville
Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy.
no code implementations • 16 Dec 2017 • Jun-Bo Wang, Junyuan Wang, Yongpeng Wu, Jin-Yuan Wang, Huiling Zhu, Min Lin, Jiangzhou Wang
Moreover, optimal or near-optimal solutions of historical scenarios can be searched offline and stored in advance.
4 code implementations • 20 Apr 2017 • Min Lin
In the generator training phase, the target is to assign equal probability to all data points in the batch, each with probability $\frac{1}{M+N}$.
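A small sketch of that generator objective under one plausible sign convention (assumed here, not taken from the paper): cross-entropy between the batch-wise softmax of discriminator scores and the uniform target $\frac{1}{M+N}$.

```python
import torch
import torch.nn.functional as F

def generator_loss(scores_real: torch.Tensor, scores_fake: torch.Tensor) -> torch.Tensor:
    """Push the softmax over all M + N batch scores toward the uniform
    distribution, i.e. probability 1/(M+N) for every data point."""
    scores = torch.cat([scores_real, scores_fake])            # shape (M + N,)
    log_probs = F.log_softmax(scores, dim=0)
    target = torch.full_like(log_probs, 1.0 / scores.numel())
    return -(target * log_probs).sum()                        # cross-entropy vs uniform target
```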
2 code implementations • 3 Dec 2015 • Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang
This paper describes both the API design and the system implementation of MXNet, and explains how embedding of both symbolic expression and tensor operation is handled in a unified fashion.
no code implementations • 18 Jan 2015 • Canyi Lu, Jinhui Tang, Min Lin, Liang Lin, Shuicheng Yan, Zhouchen Lin
In this paper, we study the robust subspace clustering problem, which aims to cluster the given possibly noisy data points into their underlying subspaces.
1 code implementation • 19 Dec 2014 • Min Lin, Shuo Li, Xuan Luo, Shuicheng Yan
In this paper, we introduce a novel deep learning framework, termed Purine.
17 code implementations • 16 Dec 2013 • Min Lin, Qiang Chen, Shuicheng Yan
With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
Ranked #4 on Face Identification on DroneSURF
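A minimal sketch of the global-average-pooling classification head described above (module and parameter names are illustrative; the micro network producing the feature maps is omitted):

```python
import torch
import torch.nn as nn

class GAPHead(nn.Module):
    """Map features to one map per class with a 1x1 convolution, then reduce each
    map to a single logit by global average pooling instead of fully connected layers."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.to_class_maps = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (B, C, H, W)
        class_maps = self.to_class_maps(x)                    # (B, num_classes, H, W)
        return class_maps.mean(dim=(2, 3))                    # (B, num_classes)
```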