no code implementations • 28 Feb 2024 • Sridhar Mahadevan

In this paper, we propose GAIA, a generative AI architecture based on category theory.

no code implementations • 17 Jul 2023 • Yichuan Deng, Zhihang Li, Sridhar Mahadevan, Zhao Song

We demonstrate the convergence of our algorithm, highlighting its effectiveness in efficiently computing gradients for large-scale LLMs.

no code implementations • 10 Apr 2023 • Yichuan Deng, Sridhar Mahadevan, Zhao Song

It runs in $\widetilde{O}(\mathrm{nnz}(X) + n^{\omega})$ time, succeeds with probability $1-\delta$, and chooses $m = O(n \log(n/\delta))$.
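
The sample size $m = O(n \log(n/\delta))$ depends only logarithmically on the failure probability $\delta$. The sketch below shows that arithmetic. The abstract does not give the constant hidden by $O(\cdot)$, so `c = 1.0` and the function name `sample_size` are hypothetical.

```python
import math

def sample_size(n: int, delta: float, c: float = 1.0) -> int:
    """Evaluate m = c * n * log(n / delta), rounded up.

    c is a hypothetical placeholder for the unspecified constant in O(.).
    """
    if n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need n > 0 and 0 < delta < 1")
    return math.ceil(c * n * math.log(n / delta))

# Shrinking delta by a factor of 10 adds only about c * n * ln(10) samples.
for delta in (1e-2, 1e-3, 1e-4):
    print(delta, sample_size(100, delta))
```

Because $\log(n/\delta) = \log n + \log(1/\delta)$, each extra factor of 10 in confidence costs an additive $c\,n \ln 10$ samples, not a multiplicative increase.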

no code implementations • 29 Mar 2023 • Yeqi Gao, Sridhar Mahadevan, Zhao Song

Mathematically, we define the neural function $F: \mathbb{R}^{d \times m} \times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation function.
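
The abstract fixes only the signature of $F$. The sketch below assumes the common over-parameterized two-layer form $F(W, x) = \frac{1}{\sqrt{m}} \sum_{r=1}^{m} a_r \exp(\langle w_r, x \rangle)$ with fixed output signs $a_r \in \{\pm 1\}$. The scaling, the signs `a`, and the name `neural_function` are assumptions, not details taken from the paper.

```python
import math
import random
from typing import Sequence

def neural_function(W: Sequence[Sequence[float]], x: Sequence[float],
                    a: Sequence[float]) -> float:
    """Assumed form: F(W, x) = (1/sqrt(m)) * sum_r a_r * exp(<w_r, x>).

    W is passed as m rows w_r of length d, i.e. the columns of the d x m matrix.
    """
    m = len(W)
    total = 0.0
    for w_r, a_r in zip(W, a):
        inner = sum(wi * xi for wi, xi in zip(w_r, x))  # <w_r, x>
        total += a_r * math.exp(inner)                   # exponential activation
    return total / math.sqrt(m)

# Example: d = 3 input features, m = 4 hidden neurons with random weights.
rng = random.Random(0)
d, m = 3, 4
W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
a = [rng.choice([-1.0, 1.0]) for _ in range(m)]
x = [0.1, -0.2, 0.3]
print(neural_function(W, x, a))
```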

no code implementations • 18 Dec 2022 • Sridhar Mahadevan

At the second layer, causal models are defined by a graph-type category.

no code implementations • 3 Nov 2022 • Shiv Shankar, Ritwik Sinha, Saayan Mitra, Viswanathan Swaminathan, Sridhar Mahadevan, Moumita Sinha

We propose a two-stage experimental design, where the two brands only need to agree on high-level aggregate parameters of the experiment to test the alternate experiences.

no code implementations • 13 Sep 2022 • Sridhar Mahadevan

We present a unified formalism for structure discovery of causal models and predictive state representation (PSR) models in reinforcement learning (RL) using higher-order category theory.

no code implementations • 23 Aug 2022 • Sridhar Mahadevan

Categoroids are defined as a hybrid of two categories: the first encodes a preordered lattice structure defined by objects and arrows between them; the second, a dual parameterization, involves trigonoidal objects and morphisms defining a conditional independence structure, with bridge morphisms providing the interface between the binary and ternary structures.

no code implementations • 6 Jul 2022 • Sridhar Mahadevan

The second result, the Causal Reproducing Property (CRP), states that any causal influence of an object X on another object Y is representable as a natural transformation between two abstract causal diagrams.

no code implementations • 23 Apr 2022 • Kai Wang, Zhao Song, Georgios Theocharous, Sridhar Mahadevan

Smoothed online combinatorial optimization considers a learner who repeatedly chooses a combinatorial decision to minimize an unknown changing cost function with a penalty on switching decisions in consecutive rounds.
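
The learner's objective is the sum of per-round costs plus a penalty for switching decisions. The sketch below evaluates that objective under several assumptions: decisions are 0/1 vectors over items, the switching penalty is `beta` times the Hamming distance between consecutive decisions, and the first decision incurs no switching cost. These modelling choices and the function names are illustrative, not the paper's exact setup.

```python
from typing import Callable, Sequence, Tuple

Decision = Tuple[int, ...]  # 0/1 indicator vector over items (assumed encoding)

def hamming(u: Decision, v: Decision) -> int:
    """Number of coordinates where two decisions differ."""
    return sum(ui != vi for ui, vi in zip(u, v))

def smoothed_cost(decisions: Sequence[Decision],
                  costs: Sequence[Callable[[Decision], float]],
                  beta: float) -> float:
    """Hitting cost sum_t f_t(x_t) plus beta * Hamming(x_{t-1}, x_t) (assumed penalty)."""
    total = 0.0
    prev = None
    for x_t, f_t in zip(decisions, costs):
        total += f_t(x_t)
        if prev is not None:
            total += beta * hamming(prev, x_t)
        prev = x_t
    return total

def linear(c):
    """Linear cost over items: sum of the costs of the selected items."""
    return lambda x: sum(ci * xi for ci, xi in zip(c, x))

costs = [linear((1, 2, 3)), linear((3, 2, 1))]  # cost vector changes between rounds
stay = [(1, 0, 0), (1, 0, 0)]    # keep item 0 in both rounds
switch = [(1, 0, 0), (0, 0, 1)]  # follow the cheapest item each round
print(smoothed_cost(stay, costs, beta=2.0), smoothed_cost(switch, costs, beta=2.0))
```

With `beta = 2.0`, following the per-round minimizer costs 6 and staying put costs 4. Without the penalty (`beta = 0`), switching would cost only 2, which shows how the switching term changes the best strategy.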

no code implementations • 28 Oct 2021 • Sridhar Mahadevan

Decision objects in a UDM correspond to instances of decision tasks, ranging from causal models and dynamical systems such as Markov decision processes and predictive state representations, to network multiplayer games and Witsenhausen's intrinsic models, which generalize all these previous formalisms.

no code implementations • 20 Sep 2021 • Sridhar Mahadevan

Semantic entropy quantifies the reduction in entropy when edges are removed by causal intervention.

no code implementations • 20 Sep 2021 • Sridhar Mahadevan

Network economics is the study of a rich class of equilibrium problems that occur in the real world, from traffic management to supply chains and two-sided online marketplaces.

no code implementations • 20 Sep 2021 • Sridhar Mahadevan

Second, a diverse range of graphical models used to represent causal structures can be represented in a unified way in terms of a topological representation of the induced poset structure.

no code implementations • 19 Sep 2021 • Sridhar Mahadevan, Anup Rao, Georgios Theocharous, Jennifer Healey

Many real-world applications require aligning two temporal sequences, including bioinformatics, handwriting recognition, activity recognition, and human-robot coordination.
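
Dynamic time warping (DTW) is a standard baseline for this alignment problem. The abstract does not say the paper uses it, so read the sketch below as a generic illustration rather than the authors' method. It computes the minimum cumulative cost of a monotone alignment between two 1-D sequences, using absolute difference as the local cost (an assumption).

```python
import math
from typing import List, Sequence

def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment of a[:i] with b[:j].
    D: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            # Match both elements, or repeat an element of either sequence.
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

# The second sequence is a time-stretched copy of the first.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because repeated elements can be matched to the same point, a time-stretched copy aligns at zero cost, whereas a pointwise comparison would not even be defined for sequences of different lengths.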

no code implementations • NeurIPS 2012 • Bo Liu, Sridhar Mahadevan, Ji Liu

We present a novel $l_1$ regularized off-policy convergent TD-learning method (termed RO-TD), which is able to learn sparse representations of value functions with low computational complexity.
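
An $l_1$ penalty is typically handled with the soft-thresholding operator, the proximal map of $\tau \lVert\theta\rVert_1$, which sets small weights exactly to zero and so yields sparse value-function weights. The sketch below applies one on-policy linear TD(0) step followed by soft-thresholding. This is a simplified proximal TD update for illustration, not the off-policy RO-TD algorithm itself; the function names, features, and step sizes are hypothetical.

```python
from typing import List, Sequence

def soft_threshold(theta: Sequence[float], tau: float) -> List[float]:
    """Proximal operator of tau * ||theta||_1 (componentwise shrinkage)."""
    out = []
    for t in theta:
        if t > tau:
            out.append(t - tau)
        elif t < -tau:
            out.append(t + tau)
        else:
            out.append(0.0)  # small weights are set exactly to zero
    return out

def l1_td0_step(theta: Sequence[float], phi: Sequence[float],
                phi_next: Sequence[float], reward: float,
                gamma: float, alpha: float, lam: float) -> List[float]:
    """Hypothetical helper: one linear TD(0) step, then l1 shrinkage."""
    v = sum(t * f for t, f in zip(theta, phi))
    v_next = sum(t * f for t, f in zip(theta, phi_next))
    delta = reward + gamma * v_next - v  # TD error
    stepped = [t + alpha * delta * f for t, f in zip(theta, phi)]
    return soft_threshold(stepped, alpha * lam)

theta = [0.0, 0.0, 0.0]
theta = l1_td0_step(theta, phi=[1.0, 0.0, 0.05], phi_next=[0.0, 1.0, 0.0],
                    reward=1.0, gamma=0.9, alpha=0.5, lam=0.2)
print(theta)
```

The weakly active third feature receives a small TD update (0.025) that the threshold (0.1) removes entirely, which is the sparsity effect the $l_1$ term is meant to produce.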

1 code implementation • 6 Jun 2020 • Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms.
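
GTD2 is a standard member of the gradient TD family for linear value functions. An auxiliary vector `w` tracks the expected TD error per feature, and the main weights `theta` move along $(\phi - \gamma\phi')(\phi^\top w)$. The sketch below shows only the plain GTD2 step as background. It does not show any proximal or mirror-prox variant, and the step sizes and function names are illustrative.

```python
from typing import List, Sequence, Tuple

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))

def gtd2_step(theta: Sequence[float], w: Sequence[float],
              phi: Sequence[float], phi_next: Sequence[float],
              reward: float, gamma: float,
              alpha: float, beta: float) -> Tuple[List[float], List[float]]:
    """One GTD2 update for a transition (phi, reward, phi_next)."""
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)  # TD error
    phi_w = dot(phi, w)
    # Main weights: theta += alpha * (phi - gamma * phi_next) * (phi . w)
    new_theta = [t + alpha * (f - gamma * fn) * phi_w
                 for t, f, fn in zip(theta, phi, phi_next)]
    # Auxiliary weights: w += beta * (delta - phi . w) * phi
    new_w = [wi + beta * (delta - phi_w) * f for wi, f in zip(w, phi)]
    return new_theta, new_w

theta, w = [0.0, 0.0], [0.0, 0.0]
for _ in range(2):
    theta, w = gtd2_step(theta, w, phi=[1.0, 0.0], phi_next=[0.0, 1.0],
                         reward=1.0, gamma=0.9, alpha=0.5, beta=0.5)
print(theta, w)
```

On the first step `theta` does not move, because `w` starts at zero. The auxiliary estimate must pick up the TD error before the main weights follow it, which is the two-timescale structure these algorithms are analyzed under.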

no code implementations • 6 Jun 2020 • Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik

In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms.

1 code implementation • ICML 2020 • Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, the underlying Markov decision process is stationary.

no code implementations • 4 Aug 2018 • Ian Gemp, Sridhar Mahadevan

In optimization, the negative gradient of a function denotes the direction of steepest descent.
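
Following the gradient field can fail when the problem is an equilibrium rather than a minimization. The sketch below uses the bilinear game $\min_x \max_y xy$, whose equilibrium is $(0, 0)$. Simultaneous gradient descent-ascent spirals outward. The extragradient method, a standard variational-inequality solver used here as a stand-in rather than the paper's own technique, converges. The step size and iteration count are arbitrary choices.

```python
import math
from typing import Tuple

def field(x: float, y: float) -> Tuple[float, float]:
    """VI operator F(x, y) = (df/dx, -df/dy) for f(x, y) = x * y."""
    return y, -x

def gda(x: float, y: float, eta: float, steps: int) -> float:
    """Simultaneous gradient descent (on x) / ascent (on y)."""
    for _ in range(steps):
        gx, gy = field(x, y)
        x, y = x - eta * gx, y - eta * gy
    return math.hypot(x, y)  # distance to the equilibrium (0, 0)

def extragradient(x: float, y: float, eta: float, steps: int) -> float:
    """Look-ahead step, then update using the field at the look-ahead point."""
    for _ in range(steps):
        gx, gy = field(x, y)
        xh, yh = x - eta * gx, y - eta * gy
        gx, gy = field(xh, yh)
        x, y = x - eta * gx, y - eta * gy
    return math.hypot(x, y)

start = math.hypot(1.0, 1.0)
print(start, gda(1.0, 1.0, 0.1, 100), extragradient(1.0, 1.0, 0.1, 100))
```

Each GDA step multiplies the distance to the equilibrium by $\sqrt{1+\eta^2} > 1$. Each extragradient step multiplies it by $\sqrt{1-\eta^2+\eta^4}$, which is below 1 for $0 < \eta < 1$.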

1 code implementation • 28 Apr 2018 • Sridhar Mahadevan, Bamdev Mishra, Shalini Ghosh

We present a novel framework for domain adaptation, whereby both geometric and statistical differences between a labeled source domain and unlabeled target domain can be integrated by exploiting the curved Riemannian geometry of statistical manifolds.
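
Covariance matrices are one common way to place domains on a curved manifold, the cone of symmetric positive-definite (SPD) matrices. The sketch below computes the affine-invariant Riemannian distance $d(A, B) = \lVert \log(A^{-1/2} B A^{-1/2}) \rVert_F = \sqrt{\sum_i \log^2 \lambda_i}$, where $\lambda_i$ are the eigenvalues of $A^{-1}B$. It is restricted to 2x2 matrices so that it needs no linear-algebra library. Whether the paper uses exactly this metric is an assumption, and the two covariances are toy values.

```python
import math
from typing import List

Matrix = List[List[float]]

def spd_distance_2x2(A: Matrix, B: Matrix) -> float:
    """Affine-invariant distance between two 2x2 SPD matrices (assumed metric)."""
    (a, b), (c, d) = A
    det_a = a * d - b * c
    # Inverse of A, then M = A^{-1} B.
    inv = [[d / det_a, -b / det_a], [-c / det_a, a / det_a]]
    M = [[sum(inv[i][k] * B[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    # A^{-1}B is similar to A^{-1/2} B A^{-1/2}, so its eigenvalues are real and positive.
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    lam1, lam2 = (tr + disc) / 2.0, (tr - disc) / 2.0
    return math.hypot(math.log(lam1), math.log(lam2))

source_cov = [[2.0, 0.5], [0.5, 1.0]]   # toy source-domain covariance
target_cov = [[1.0, -0.3], [-0.3, 3.0]]  # toy target-domain covariance
print(spd_distance_2x2(source_cov, target_cov))
```

Unlike the Euclidean distance between matrix entries, this distance is unchanged when both covariances are rescaled by the same factor. That invariance is one reason curved geometries are used for comparing domain statistics.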

no code implementations • 19 Oct 2017 • Ian Gemp, Sridhar Mahadevan

Algorithmic game theory (AGT) focuses on the design and analysis of algorithms for interacting agents, with interactions rigorously formalized within the framework of games.

no code implementations • 8 Mar 2017 • Stephen Giguere, Francisco Garcia, Sridhar Mahadevan

Although many machine learning algorithms involve learning subspaces with particular characteristics, optimizing a parameter matrix that is constrained to represent a subspace can be challenging.
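
A common way to keep a parameter matrix on the set of orthonormal bases is a two-step update. This set is the Stiefel manifold, and the column spans of its points form the Grassmannian. The update takes an ordinary gradient step and then retracts back with a QR (Gram-Schmidt) orthonormalization. The sketch below stores the matrix as a list of column vectors and uses a made-up gradient. It illustrates the generic retraction idea, not the specific manifold method of the paper.

```python
import math
from typing import List

Vector = List[float]

def gram_schmidt(columns: List[Vector]) -> List[Vector]:
    """Orthonormalize columns in order (modified Gram-Schmidt, the Q of a thin QR)."""
    basis: List[Vector] = []
    for v in columns:
        u = list(v)
        for q in basis:
            proj = sum(ui * qi for ui, qi in zip(u, q))
            u = [ui - proj * qi for ui, qi in zip(u, q)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        basis.append([ui / norm for ui in u])
    return basis

def retracted_step(U: List[Vector], grad: List[Vector], lr: float) -> List[Vector]:
    """Euclidean gradient step on each column, then re-orthonormalize."""
    stepped = [[u - lr * g for u, g in zip(col, gcol)] for col, gcol in zip(U, grad)]
    return gram_schmidt(stepped)

# A 2-dimensional subspace of R^3, stored as two orthonormal columns.
U = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
grad = [[0.0, 0.0, -1.0], [0.5, 0.0, 0.0]]  # hypothetical gradient
U = retracted_step(U, grad, lr=0.1)
print(U)
```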

1 code implementation • 5 Nov 2016 • Ishan Durugkar, Ian Gemp, Sridhar Mahadevan

Generative adversarial networks (GANs) are a framework for producing a generative model by way of a two-player minimax game.

Ranked #70 on Image Generation on CIFAR-10 (Inception score metric)
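
In the standard GAN minimax game $\min_G \max_D V(D, G)$, the value is $V = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{x \sim p_g}[\log(1 - D(x))]$. The optimal discriminator is $D^*(x) = p_{\text{data}}(x) / (p_{\text{data}}(x) + p_g(x))$, and the resulting value equals $-\log 4 + 2\,\mathrm{JSD}(p_{\text{data}} \Vert p_g)$. The sketch below checks this identity on toy discrete distributions. It illustrates the classic two-player game only, not the paper's contribution.

```python
import math
from typing import Sequence

def optimal_value(p_data: Sequence[float], p_g: Sequence[float]) -> float:
    """V(D*, G) for discrete distributions, with D*(x) = p_data / (p_data + p_g)."""
    v = 0.0
    for pd, pg in zip(p_data, p_g):
        if pd > 0:
            v += pd * math.log(pd / (pd + pg))  # E_data[log D*(x)]
        if pg > 0:
            v += pg * math.log(pg / (pd + pg))  # E_g[log(1 - D*(x))]
    return v

def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in nats."""
    def kl_to_mid(a, b):
        # KL(a || (a + b) / 2), skipping zero-probability terms of a
        return sum(ai * math.log(2 * ai / (ai + bi)) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl_to_mid(p, q) + 0.5 * kl_to_mid(q, p)

p_data = [0.5, 0.3, 0.2]
p_g = [0.2, 0.3, 0.5]
print(optimal_value(p_data, p_g), -math.log(4) + 2 * jsd(p_data, p_g))
```

When $p_g = p_{\text{data}}$ the JSD term vanishes and the value is exactly $-\log 4$, the global minimum over generators.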

no code implementations • 29 Aug 2016 • Ian Gemp, Sridhar Mahadevan

This paper presents a new framework for analyzing and designing no-regret algorithms for dynamic (possibly adversarial) systems.
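
The multiplicative-weights (Hedge) algorithm is the textbook example of a no-regret method for adversarial loss sequences. With losses in $[0, 1]$, $N$ experts, and learning rate $\eta$, its expected regret after $T$ rounds is at most $\ln N / \eta + \eta T / 8$. The sketch below is that standard algorithm, offered as background. It is not the framework the paper proposes, and the loss sequence is a toy.

```python
import math
from typing import List, Sequence

def hedge(loss_rounds: Sequence[Sequence[float]], eta: float) -> float:
    """Run Hedge and return its expected regret against the best fixed expert."""
    n = len(loss_rounds[0])
    log_w = [0.0] * n  # log-weights, for numerical stability
    learner_loss = 0.0
    for losses in loss_rounds:
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        z = sum(weights)
        probs = [wt / z for wt in weights]
        learner_loss += sum(p * l for p, l in zip(probs, losses))  # expected loss
        log_w = [lw - eta * l for lw, l in zip(log_w, losses)]     # multiplicative update
    best_expert = min(sum(r[i] for r in loss_rounds) for i in range(n))
    return learner_loss - best_expert

# Toy adversary: expert 0 errs every third round, expert 1 errs otherwise.
T = 200
rounds: List[List[float]] = [[1.0, 0.0] if t % 3 == 0 else [0.0, 1.0] for t in range(T)]
eta = math.sqrt(8 * math.log(2) / T)  # minimizes the bound ln N / eta + eta T / 8
print(hedge(rounds, eta), math.log(2) / eta + eta * T / 8)
```

Setting the derivative of $\ln N/\eta + \eta T/8$ to zero gives $\eta = \sqrt{8 \ln N / T}$. The bound then becomes $\sqrt{T \ln N / 2}$, so the average regret per round goes to zero.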

no code implementations • 21 Aug 2016 • Ian Gemp, Ishan Durugkar, Mario Parente, M. Darby Dyar, Sridhar Mahadevan

Recent advances in semi-supervised learning with deep generative models have shown promise in generalizing from small labeled datasets ($\mathbf{x},\mathbf{y}$) to large unlabeled ones ($\mathbf{x}$).

no code implementations • 15 Jun 2016 • Ishan P. Durugkar, Clemens Rosenbaum, Stefan Dernbach, Sridhar Mahadevan

Deep reinforcement learning has been shown to be a powerful framework for learning policies from complex high-dimensional sensory inputs to actions in complex tasks, such as the Atari domain.

no code implementations • 28 Jul 2015 • Sridhar Mahadevan, Sarath Chandar

In this paper, we introduce a new approach to capture analogies in continuous word representations, based on modeling not just individual word vectors, but rather the subspaces spanned by groups of words.

no code implementations • 26 May 2014 • Sridhar Mahadevan, Bo Liu, Philip Thomas, Will Dabney, Steve Giguere, Nicholas Jacek, Ian Gemp, Ji Liu

In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding, important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms; (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarantees and remains in a stable region of the parameter space; (iii) how to design "off-policy" temporal difference learning algorithms in a reliable and stable manner; and finally (iv) how to integrate the study of reinforcement learning into the rich theory of stochastic optimization.

no code implementations • NeurIPS 2013 • Philip S. Thomas, William C. Dabney, Stephen Giguere, Sridhar Mahadevan

Natural actor-critics are a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes.

no code implementations • NeurIPS 2010 • Sridhar Mahadevan, Bo Liu

This paper explores links between basis construction methods in Markov decision processes and power series expansions of value functions.
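
Consider a fixed policy with transition matrix $P$, reward vector $R$, and discount $\gamma < 1$. Its value function has the Neumann series expansion $V = (I - \gamma P)^{-1} R = \sum_{k \ge 0} \gamma^k P^k R$. The vectors $R, PR, P^2R, \dots$ (a Krylov basis) therefore span increasingly accurate approximations of $V$. The sketch below compares the truncated series with a direct linear solve on a made-up two-state chain. The numbers and function names are illustrative only.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def mat_vec(P: Matrix, v: Vector) -> Vector:
    return [sum(pij * vj for pij, vj in zip(row, v)) for row in P]

def krylov_vectors(P: Matrix, R: Vector, k: int) -> List[Vector]:
    """Return the first k Krylov vectors R, PR, ..., P^{k-1} R."""
    basis = [list(R)]
    for _ in range(k - 1):
        basis.append(mat_vec(P, basis[-1]))
    return basis

def value_by_series(P: Matrix, R: Vector, gamma: float, terms: int) -> Vector:
    """Truncated Neumann series sum_{k < terms} gamma^k P^k R."""
    V = [0.0] * len(R)
    for k, vec in enumerate(krylov_vectors(P, R, terms)):
        V = [v + gamma ** k * x for v, x in zip(V, vec)]
    return V

def value_by_solve(P: Matrix, R: Vector, gamma: float) -> Vector:
    """Solve (I - gamma P) V = R by Gauss-Jordan elimination with partial pivoting."""
    n = len(R)
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] + [R[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

P = [[0.5, 0.5], [0.2, 0.8]]  # made-up row-stochastic transition matrix
R = [1.0, 0.0]
print(value_by_series(P, R, 0.9, 300), value_by_solve(P, R, 0.9))
```

For a stochastic $P$, the truncation error after $K$ terms is at most $\gamma^K \max_s |R(s)| / (1-\gamma)$, so convergence slows as $\gamma \to 1$.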

Papers With Code is a free resource with all data licensed under CC-BY-SA.