Search Results for author: Tian Ding

Found 13 papers, 5 papers with code

CoRT: Code-integrated Reasoning within Thinking

1 code implementation • 11 Jun 2025 • Chengpeng Li, Zhengyang Tang, Ziniu Li, Mingfeng Xue, Keqin Bao, Tian Ding, Ruoyu Sun, Benyou Wang, Xiang Wang, Junyang Lin, Dayiheng Liu

Large Reasoning Models (LRMs) like o1 and DeepSeek-R1 have shown remarkable progress in natural language reasoning with long chain-of-thought (CoT), yet they remain inefficient or inaccurate when handling complex mathematical operations.

Mathematical Reasoning
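The excerpt above is about letting the model hand heavy computation off to code while it is still "thinking." Below is a minimal, hypothetical sketch of such a loop, assuming a stand-in `generate` function in place of a real Large Reasoning Model and a simple tag-based code-extraction convention; it is an illustration of the general idea, not the CoRT pipeline or its released code.

```python
# Minimal, hypothetical sketch of code-integrated reasoning (not the CoRT pipeline):
# when the chain-of-thought hits a heavy computation, the model emits a code snippet,
# a sandbox runs it, and the printed result is fed back into the reasoning context.
import io
import contextlib

def generate(context: str) -> str:
    """Stand-in for a Large Reasoning Model; it returns a canned snippet between tags."""
    return "<code>print(sum(k**2 for k in range(1, 101)))</code>"

def run_code(snippet: str) -> str:
    """Execute the emitted code and capture stdout (a real system would sandbox this)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(snippet, {})
    return buf.getvalue().strip()

context = "Question: compute 1^2 + 2^2 + ... + 100^2. <think>Tedious by hand, use code.</think>"
reply = generate(context)
snippet = reply.split("<code>")[1].split("</code>")[0]
context += f" [tool output] {run_code(snippet)}"   # the exact result, 338350, is appended
print(context)
```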

Enabling Scalable Oversight via Self-Evolving Critic

no code implementations • 10 Jan 2025 • Zhengyang Tang, Ziniu Li, Zhenyang Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Fei Huang, Tianyu Liu, Bowen Yu, Junyang Lin

Despite their remarkable performance, the development of Large Language Models (LLMs) faces a critical challenge in scalable oversight: providing effective feedback for tasks where human evaluation is difficult or where LLMs outperform humans.

An Efficient Unsupervised Framework for Convex Quadratic Programs via Deep Unrolling

no code implementations • 2 Dec 2024 • Linxin Yang, Bingheng Li, Tian Ding, Jianghua Wu, Akang Wang, Yuyi Wang, Jiliang Tang, Ruoyu Sun, Xiaodong Luo

Unlike the standard learning-to-optimize framework that requires optimization solutions generated by solvers, our unsupervised method adjusts the network weights directly from the evaluation of the primal-dual gap.
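To make the primal-dual-gap idea concrete, here is a hedged NumPy sketch for a convex QP, minimize ½xᵀQx + cᵀx subject to Ax ≤ b with Q positive definite: the "predicted" primal and dual variables stand in for a network's outputs, and only the gap evaluation (the unsupervised training signal) is the point. The penalty weight and problem data are assumptions, not the paper's architecture or loss.

```python
# Hedged sketch: evaluating a primal-dual gap as an unsupervised training signal for a
# convex QP  min 0.5*x'Qx + c'x  s.t.  Ax <= b  (Q positive definite). Illustration of
# the general idea only; not the paper's network, loss weighting, or training loop.
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)          # positive definite quadratic term
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m) + 1.0

def primal(x):
    return 0.5 * x @ Q @ x + c @ x

def dual(lam):
    # g(lam) = min_x L(x, lam), attained at x = -Q^{-1}(c + A' lam)
    x_star = -np.linalg.solve(Q, c + A.T @ lam)
    return primal(x_star) + lam @ (A @ x_star - b)

# Pretend these came from a neural network's forward pass.
x_pred = rng.standard_normal(n)
lam_pred = np.abs(rng.standard_normal(m))      # dual variables must be >= 0

violation = np.maximum(A @ x_pred - b, 0.0)    # primal infeasibility
gap = primal(x_pred) - dual(lam_pred)          # >= 0 at feasible x_pred by weak duality
loss = gap + 10.0 * violation.sum()            # differentiable in the network outputs
print(f"primal-dual gap: {gap:.4f}, infeasibility: {violation.sum():.4f}")
```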

MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning

no code implementations • 30 Jul 2024 • Yupeng Chen, Senmiao Wang, Yushun Zhang, Zhihang Lin, Haozhe Zhang, Weijian Sun, Tian Ding, Ruoyu Sun

Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks.

Adam-mini: Use Fewer Learning Rates To Gain More

1 code implementation • 24 Jun 2024 • Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Diederik P. Kingma, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun

Adam-mini reduces memory by cutting down the learning rate resources in Adam (i.e., $1/\sqrt{v}$).
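As a back-of-the-envelope illustration of what "cutting down the learning rate resources" means, the toy step below keeps a single second-moment scalar per parameter block instead of one per parameter; the block partition (one block per tensor), the missing bias correction, and the update rule details are assumptions for illustration, not the released Adam-mini optimizer.

```python
# Toy sketch of the memory idea (illustrative only, not the official Adam-mini code):
# Adam stores a second-moment estimate v per parameter, i.e. one effective learning
# rate 1/sqrt(v) per parameter; here each parameter *block* shares one scalar v, so
# the optimizer state for v shrinks from O(#params) to O(#blocks).
import numpy as np

def blockwise_adam_step(params, grads, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """params/grads: dict of block name -> array; state keeps full m but one scalar v per block."""
    for name, g in grads.items():
        m = state["m"][name] = b1 * state["m"][name] + (1 - b1) * g
        # One shared v per block: mean of squared gradients over the whole block.
        v = state["v"][name] = b2 * state["v"][name] + (1 - b2) * float(np.mean(g * g))
        params[name] -= lr * m / (np.sqrt(v) + eps)
    return params

params = {"W": np.ones((4, 4)), "b": np.zeros(4)}
grads = {"W": np.full((4, 4), 0.1), "b": np.full(4, -0.2)}
state = {"m": {k: np.zeros_like(v) for k, v in params.items()},
         "v": {k: 0.0 for k in params}}
blockwise_adam_step(params, grads, state)
print(state["v"])   # a single scalar per block instead of a full array
```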

PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

1 code implementation • 4 Jun 2024 • Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun

In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net, and propose a two-stage L2O method to solve large-scale LP problems.
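For intuition about what "FOM-unrolled" means here, the sketch below writes one primal-dual hybrid gradient (PDHG) iteration for an equality-constrained LP as a function that can be repeated for a fixed depth, the way unrolled layers would be; the channel lifting and learned weights of PDHG-Net are omitted, so treat this as a generic unrolling illustration under assumed step sizes and problem data.

```python
# Hedged sketch of the update that gets unrolled, for an LP  min c'x  s.t. Ax = b, x >= 0.
# PDHG-Net turns such iterates into multi-channel layers with learned weights; this plain
# NumPy version only shows the underlying first-order step.
import numpy as np

def pdhg_layer(x, y, A, b, c, tau=0.1, sigma=0.1):
    """One PDHG step; stacking many such steps 'unrolls' the solver into a network."""
    x_new = np.maximum(x - tau * (c - A.T @ y), 0.0)   # projected primal step
    x_bar = 2.0 * x_new - x                            # extrapolation
    y_new = y + sigma * (b - A @ x_bar)                # dual ascent step
    return x_new, y_new

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 6))
x_feas = np.abs(rng.standard_normal(6))
b = A @ x_feas                                         # guarantees a feasible LP
c = np.abs(rng.standard_normal(6))                     # keeps the LP bounded below

x, y = np.zeros(6), np.zeros(3)
for _ in range(2000):                                  # fixed iteration count, like depth
    x, y = pdhg_layer(x, y, A, b, c)
print("objective:", c @ x, "equality residual:", np.linalg.norm(A @ x - b))
```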

Why Transformers Need Adam: A Hessian Perspective

2 code implementations • 26 Feb 2024 • Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo

SGD performs worse than Adam by a significant margin on Transformers, but the reason remains unclear.

Federated Learning with Lossy Distributed Source Coding: Analysis and Optimization

no code implementations • 23 Apr 2022 • Huiyuan Yang, Tian Ding, Xiaojun Yuan

We then conduct an FL convergence analysis to connect the aggregation distortion and the FL convergence performance.

Federated Learning • Quantization
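As a rough illustration of the "aggregation distortion" that the convergence analysis is connected to, here is a hedged toy simulation in which clients compress their local updates before aggregation; the uniform quantizer and equal client weighting are assumptions, not the paper's lossy distributed source-coding scheme.

```python
# Hedged toy: aggregation distortion when federated clients send lossily compressed
# (here: uniformly quantized) model updates. Not the paper's coding scheme; it only
# illustrates the distortion incurred at the aggregation step.
import numpy as np

def quantize(u, step=0.05):
    """Uniform scalar quantizer standing in for a lossy distributed source code."""
    return step * np.round(u / step)

rng = np.random.default_rng(2)
num_clients, dim = 10, 1000
updates = [rng.standard_normal(dim) * 0.1 for _ in range(num_clients)]

exact_avg = np.mean(updates, axis=0)                     # distortion-free aggregation
lossy_avg = np.mean([quantize(u) for u in updates], axis=0)

distortion = np.mean((lossy_avg - exact_avg) ** 2)        # aggregation distortion (MSE)
print(f"per-coordinate aggregation distortion: {distortion:.2e}")
```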

The Global Landscape of Neural Networks: An Overview

no code implementations • 2 Jul 2020 • Ruoyu Sun, Dawei Li, Shiyu Liang, Tian Ding, R. Srikant

Second, we discuss a few rigorous results on the geometric properties of wide networks such as "no bad basin", and some modifications that eliminate sub-optimal local minima and/or decreasing paths to infinity.

Sub-Optimal Local Minima Exist for Neural Networks with Almost All Non-Linear Activations

no code implementations • 4 Nov 2019 • Tian Ding, Dawei Li, Ruoyu Sun

More specifically, we prove that for any multi-layer network with generic input data and non-linear activation functions, sub-optimal local minima can exist, no matter how wide the network is (as long as the last hidden layer has at least two neurons).

On the Benefit of Width for Neural Networks: Disappearance of Bad Basins

no code implementations • 28 Dec 2018 • Dawei Li, Tian Ding, Ruoyu Sun

Wide networks are often believed to have a nice optimization landscape, but what rigorous results can we prove?

Sparsity Learning Based Multiuser Detection in Grant-Free Massive-Device Multiple Access

no code implementations • 28 Jul 2018 • Tian Ding, Xiaojun Yuan, Soung Chang Liew

In this work, we study the multiuser detection (MUD) problem for a grant-free massive-device multiple access (MaDMA) system, where a large number of single-antenna user devices transmit sporadic data to a multi-antenna base station (BS).

User Identification
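The sporadic-traffic setup above is what makes detection a sparse problem: only a few of the many devices are active in any slot, so the user-activity matrix is row-sparse. Below is a hedged sketch of a generic group-sparse recovery step for joint activity detection, with an assumed pilot matrix, step size, and thresholding rule; it is not the sparsity-learning algorithm proposed in the paper.

```python
# Hedged sketch: grant-free multiuser detection as row-sparse recovery. Received pilots
# Y = P @ X + noise, where row i of X is nonzero only if device i is active. A few ISTA
# steps with row-wise soft thresholding recover the active rows (user identification).
# Generic illustration only; not the paper's sparsity-learning method.
import numpy as np

rng = np.random.default_rng(3)
num_users, pilot_len, num_antennas, num_active = 100, 40, 8, 5

P = rng.standard_normal((pilot_len, num_users)) / np.sqrt(pilot_len)  # pilot matrix
active = rng.choice(num_users, num_active, replace=False)
X_true = np.zeros((num_users, num_antennas))
X_true[active] = rng.standard_normal((num_active, num_antennas))
Y = P @ X_true + 0.01 * rng.standard_normal((pilot_len, num_antennas))

def row_shrink(X, t):
    """Soft-threshold each row's norm (group sparsity across antennas)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X * np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))

X = np.zeros_like(X_true)
eta, lam = 0.1, 0.02
for _ in range(200):                                   # ISTA iterations
    X = row_shrink(X - eta * P.T @ (P @ X - Y), eta * lam)

detected = np.where(np.linalg.norm(X, axis=1) > 0.1)[0]
print("true active:", sorted(active), "detected:", sorted(detected))
```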
