Search Results for author: Ziwei Ji

Found 31 papers, 8 papers with code

Contrastive Learning for Inference in Dialogue

1 code implementation 19 Oct 2023 Etsuko Ishii, Yan Xu, Bryan Wilie, Ziwei Ji, Holy Lovenia, Willy Chung, Pascale Fung

Inference, especially inferences derived from inductive processes, is a crucial component of conversation, complementing the information implicitly or explicitly conveyed by a speaker.

Contrastive Learning

Towards Mitigating Hallucination in Large Language Models via Self-Reflection

no code implementations 10 Oct 2023 Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, Pascale Fung

Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.

Answer Generation, Hallucination, +1

Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models

no code implementations 9 Oct 2023 Holy Lovenia, Wenliang Dai, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung

Object hallucination poses a significant challenge in vision-language (VL) models, often leading to the generation of nonsensical or unfaithful responses with non-existent objects.

Hallucination, Object, +2

Think before you speak: Training Language Models With Pause Tokens

no code implementations 3 Oct 2023 Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan

Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token.

GSM8K, Question Answering
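
The decoding arithmetic described in the abstract above can be made concrete with a small sketch. The snippet below is a hypothetical illustration, not the authors' implementation: a single attention step computes the next hidden state from the $K$ preceding hidden vectors, and appending $M$ learnable <pause> embeddings (the idea in the title) simply enlarges that set to $K + M$ vectors before the answer token is produced. All names, shapes, and the pause_embedding vector are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): the (K+1)-th token's hidden
# state is computed from the K hidden vectors of the preceding tokens, and
# appending M learnable <pause> embeddings enlarges that set to K + M.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                                  # hidden size
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
pause_embedding = rng.normal(size=d)                    # assumed learnable <pause> vector

def next_hidden(prefix_hidden):
    """One attention step over all preceding hidden vectors."""
    q = prefix_hidden[-1] @ Wq                          # query from the last token
    keys, values = prefix_hidden @ Wk, prefix_hidden @ Wv
    scores = keys @ q / np.sqrt(d)
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ values                                # weighted sum of the values

K, M = 5, 3
prefix = rng.normal(size=(K, d))                        # hidden states of K prior tokens

h_standard = next_hidden(prefix)                        # standard decoding: K vectors
padded = np.vstack([prefix, np.tile(pause_embedding, (M, 1))])
h_paused = next_hidden(padded)                          # pause decoding: K + M vectors
print(h_standard.shape, h_paused.shape)                 # (16,) (16,)
```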

Improving Query-Focused Meeting Summarization with Query-Relevant Knowledge

no code implementations 5 Sep 2023 Tiezheng Yu, Ziwei Ji, Pascale Fung

Query-Focused Meeting Summarization (QFMS) aims to generate a summary of a given meeting transcript conditioned upon a query.

Meeting Summarization

Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

1 code implementation 1 Jun 2023 Yan Xu, Deqian Kong, Dehong Xu, Ziwei Ji, Bo Pang, Pascale Fung, Ying Nian Wu

The capability to generate responses with diversity and faithfulness using factual knowledge is paramount for creating a human-like, trustworthy dialogue system.

Dialogue Generation, Response Generation

Depth Dependence of $\mu$P Learning Rates in ReLU MLPs

no code implementations 13 May 2023 Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar

In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization.
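
As a rough illustration of this setting (not the paper's code or its exact parameterization), the sketch below constructs a width-$n$, depth-$L$ fully connected ReLU network with a mean-field-style initialization: hidden layers are scaled by 1/sqrt(fan-in) and the readout carries an extra $1/n$ factor, one common convention.

```python
# Rough sketch (assumed conventions): a width-n, depth-L fully connected ReLU
# network with a mean-field-style initialization; scaling is applied in the
# forward pass, with an extra 1/n factor on the readout layer.
import numpy as np

def init_mlp(d_in, n, L, rng):
    dims = [d_in] + [n] * L + [1]
    return [rng.normal(size=(dims[i], dims[i + 1])) for i in range(len(dims) - 1)]

def forward(weights, x):
    h = x
    for W in weights[:-1]:
        h = np.maximum(h @ W / np.sqrt(W.shape[0]), 0.0)   # 1/sqrt(fan-in) + ReLU
    W_out = weights[-1]
    return (h @ W_out) / W_out.shape[0]                    # mean-field 1/n readout

rng = np.random.default_rng(0)
weights = init_mlp(d_in=8, n=256, L=4, rng=rng)
x = rng.normal(size=(32, 8))
print(forward(weights, x).shape)                           # (32, 1)
```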

RHO ($\rho$): Reducing Hallucination in Open-domain Dialogues with Knowledge Grounding

1 code implementation 3 Dec 2022 Ziwei Ji, Zihan Liu, Nayeon Lee, Tiezheng Yu, Bryan Wilie, Min Zeng, Pascale Fung

Dialogue systems can leverage large pre-trained language models and knowledge to generate fluent and informative responses.

Hallucination, Representation Learning, +1

Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training

1 code implementation 14 Oct 2022 Wenliang Dai, Zihan Liu, Ziwei Ji, Dan Su, Pascale Fung

Large-scale vision-language pre-trained (VLP) models are prone to hallucinate non-existent visual objects when generating text based on visual information.

Hallucination, Image Augmentation, +3

Survey of Hallucination in Natural Language Generation

no code implementations 8 Feb 2022 Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Delong Chen, Ho Shu Chan, Wenliang Dai, Andrea Madotto, Pascale Fung

This advancement has led to more fluent and coherent NLG and, in turn, to improved performance on downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation.

Abstractive Text Summarization, Data-to-Text Generation, +4

Agnostic Learnability of Halfspaces via Logistic Loss

no code implementations 31 Jan 2022 Ziwei Ji, Kwangjun Ahn, Pranjal Awasthi, Satyen Kale, Stefani Karp

In this paper, we close this gap by constructing a well-behaved distribution such that the global minimizer of the logistic risk over this distribution only achieves $\Omega(\sqrt{\textrm{OPT}})$ misclassification risk, matching the upper bound in (Frei et al., 2021).

regression

Actor-critic is implicitly biased towards high entropy optimal policies

no code implementations ICLR 2022 Yuzheng Hu, Ziwei Ji, Matus Telgarsky

We show that the simplest actor-critic method -- a linear softmax policy updated with TD through interaction with a linear MDP, but featuring no explicit regularization or exploration -- does not merely find an optimal policy, but moreover prefers high entropy optimal policies.


Fast Margin Maximization via Dual Acceleration

no code implementations 1 Jul 2021 Ziwei Ji, Nathan Srebro, Matus Telgarsky

We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of $\widetilde{\mathcal{O}}(1/t^2)$.
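
A loose sketch of the phenomenon under stated assumptions follows: gradient descent with heavy-ball-style momentum on the logistic loss for a linear classifier, tracking the normalized $\ell_2$ margin on separable synthetic data. The step size, momentum constant, and data are illustrative; this is not the paper's accelerated dual method.

```python
# Illustrative sketch (not the paper's accelerated dual method): heavy-ball-style
# momentum on the logistic loss for a linear classifier, tracking the normalized
# l2 margin min_i y_i <w, x_i> / ||w|| on separable synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))                     # separable by construction

def grad(w):
    margins = y * (X @ w)
    return (-y / (1.0 + np.exp(margins))) @ X / n       # gradient of mean log(1+e^{-m})

def normalized_margin(w):
    return np.min(y * (X @ w)) / (np.linalg.norm(w) + 1e-12)

w, v = np.zeros(d), np.zeros(d)
lr, beta = 0.1, 0.9
for t in range(1, 5001):
    v = beta * v - lr * grad(w + beta * v)              # momentum lookahead step
    w = w + v
    if t % 1000 == 0:
        print(t, "normalized margin:", round(normalized_margin(w), 4))
```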

Early-stopped neural networks are consistent

no code implementations NeurIPS 2021 Ziwei Ji, Justin D. Li, Matus Telgarsky

This work studies the behavior of shallow ReLU networks trained with the logistic loss via gradient descent on binary classification data where the underlying data distribution is general, and the (optimal) Bayes risk is not necessarily zero.

Binary Classification
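
The following is a minimal, assumption-laden sketch of that setting: a one-hidden-layer ReLU network trained on noisily labeled data (so the Bayes risk is nonzero) with the logistic loss by full-batch gradient descent, stopped early once a held-out loss stops improving. The architecture, step size, and stopping rule are illustrative, not the paper's prescription.

```python
# Minimal sketch of the setting (assumed hyperparameters): a one-hidden-layer
# ReLU network trained with the logistic loss by full-batch gradient descent on
# noisily labeled data, stopped early when the held-out loss stops improving.
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 5, 64, 400
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=n))          # noisy labels: Bayes risk > 0
Xtr, ytr, Xva, yva = X[:300], y[:300], X[300:], y[300:]

W = rng.normal(size=(d, m)) / np.sqrt(d)                  # trained hidden layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)          # fixed output layer

def val_loss(W):
    return np.mean(np.log1p(np.exp(-yva * (np.maximum(Xva @ W, 0.0) @ a))))

lr, best, best_W, patience = 0.1, np.inf, W.copy(), 0
for t in range(5000):
    h = np.maximum(Xtr @ W, 0.0)
    s = -ytr / (1.0 + np.exp(ytr * (h @ a)))              # per-example loss derivative
    W -= lr * Xtr.T @ ((s[:, None] * (Xtr @ W > 0)) * a) / len(ytr)
    v = val_loss(W)
    if v < best - 1e-5:
        best, best_W, patience = v, W.copy(), 0
    else:
        patience += 1
        if patience >= 50:                                # early stopping rule
            break
print("stopped at step", t, "best validation loss", round(best, 4))
```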

Generalization bounds via distillation

no code implementations ICLR 2021 Daniel Hsu, Ziwei Ji, Matus Telgarsky, Lan Wang

This paper theoretically investigates the following empirical phenomenon: given a high-complexity network with poor generalization bounds, one can distill it into a network with nearly identical predictions but low complexity and vastly smaller generalization bounds.

Data Augmentation, Generalization Bounds
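
As a toy, hedged illustration of distillation itself (architectures, loss, and hyperparameters are assumptions, not the paper's setup): a low-complexity linear student is fit to the soft predictions of a wide random-feature teacher, and we check how often their predictions agree.

```python
# Toy distillation sketch (assumed setup, not the paper's): a low-complexity
# linear-softmax student is fit to the soft predictions of a wide random-feature
# teacher by minimizing cross-entropy against the teacher's probabilities.
import numpy as np

rng = np.random.default_rng(0)
n, d, classes = 500, 20, 3
X = rng.normal(size=(n, d))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# "High-complexity" teacher: wide random ReLU features plus a random readout.
W_feat = rng.normal(size=(d, 2000)) / np.sqrt(d)
W_out = rng.normal(size=(2000, classes)) / np.sqrt(2000)
teacher_probs = softmax(np.maximum(X @ W_feat, 0.0) @ W_out)

# "Low-complexity" student: a single linear layer trained on the soft labels.
V = np.zeros((d, classes))
for _ in range(2000):
    P = softmax(X @ V)
    V -= 0.5 * X.T @ (P - teacher_probs) / n     # cross-entropy gradient w.r.t. V

agree = np.mean(np.argmax(X @ V, axis=1) == np.argmax(teacher_probs, axis=1))
print("student/teacher top-1 agreement:", round(agree, 3))
```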

Model Generalization on COVID-19 Fake News Detection

no code implementations 11 Jan 2021 Yejin Bang, Etsuko Ishii, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung

Amid the COVID-19 pandemic, the world is facing an unprecedented infodemic, with a proliferation of both fake and real information.

Fake News Detection, Misinformation

Multi-hop Question Generation with Graph Convolutional Network

1 code implementation Findings of the Association for Computational Linguistics 2020 Dan Su, Yan Xu, Wenliang Dai, Ziwei Ji, Tiezheng Yu, Pascale Fung

Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple pieces of scattered evidence from different paragraphs.

Question Generation, Question-Generation, +1

Gradient descent follows the regularization path for general losses

no code implementations 19 Jun 2020 Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias.

Directional convergence and alignment in deep learning

no code implementations NeurIPS 2020 Ziwei Ji, Matus Telgarsky

In this paper, we show that although the minimizers of cross-entropy and related classification losses are off at infinity, network weights learned by gradient flow converge in direction, with an immediate corollary that network predictions, training errors, and the margin distribution also converge.

General Classification

Neural tangent kernels, transportation mappings, and universal approximation

no code implementations ICLR 2020 Ziwei Ji, Matus Telgarsky, Ruicheng Xian

This paper establishes rates of universal approximation for the shallow neural tangent kernel (NTK): network weights are only allowed microscopic changes from random initialization, which entails that activations are mostly unchanged, and the network is nearly equivalent to its linearization.
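
A small numerical sketch of the linearization claim, with assumed sizes: for a shallow ReLU network, a microscopic perturbation of the weights leaves the output close to its first-order Taylor expansion $f(x; W_0) + \langle \nabla_W f(x; W_0), W - W_0 \rangle$.

```python
# Numerical sketch of the linearization claim (sizes are assumptions): with a
# microscopic weight perturbation, a shallow ReLU network's output stays close
# to its first-order Taylor expansion around the random initialization.
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 4096
x = rng.normal(size=d)
W0 = rng.normal(size=(m, d)) / np.sqrt(d)           # hidden weights at init
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # fixed output weights

def f(W):
    return a @ np.maximum(W @ x, 0.0)

# Gradient of f w.r.t. W at W0: row j is a_j * 1[w_j . x > 0] * x.
active = (W0 @ x > 0).astype(float)
grad_W = (a * active)[:, None] * x[None, :]

delta = 1e-3 * rng.normal(size=(m, d))              # "microscopic" weight change
exact = f(W0 + delta)
linearized = f(W0) + np.sum(grad_W * delta)
print("exact:", exact, "linearized:", linearized)   # nearly identical
```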

Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

no code implementations ICLR 2020 Ziwei Ji, Matus Telgarsky

Recent theoretical work has guaranteed that overparameterized networks trained by gradient descent achieve arbitrarily low training error, and sometimes even low test error.

Approximation power of random neural networks

no code implementations 18 Jun 2019 Bolton Bailey, Ziwei Ji, Matus Telgarsky, Ruicheng Xian

This paper investigates the approximation power of three types of random neural networks: (a) infinite width networks, with weights following an arbitrary distribution; (b) finite width networks obtained by subsampling the preceding infinite width networks; (c) finite width networks obtained by starting with standard Gaussian initialization, and then adding a vanishingly small correction to the weights.

Characterizing the implicit bias via a primal-dual analysis

no code implementations 11 Jun 2019 Ziwei Ji, Matus Telgarsky

On the other hand, with a properly chosen but aggressive step size schedule, we prove $O(1/t)$ rates for both $\ell_2$ margin maximization and implicit bias, whereas prior work (including all first-order methods for the general hard-margin linear SVM problem) proved $\widetilde{O}(1/\sqrt{t})$ margin rates, or $O(1/t)$ margin rates to a suboptimal margin, with an implied (slower) bias rate.

Gradient descent aligns the layers of deep linear networks

no code implementations ICLR 2019 Ziwei Ji, Matus Telgarsky

This paper establishes risk convergence and asymptotic weight matrix alignment (a form of implicit regularization) of gradient flow and gradient descent when applied to deep linear networks on linearly separable data.

Risk and parameter convergence of logistic regression

no code implementations 20 Mar 2018 Ziwei Ji, Matus Telgarsky

Gradient descent, when applied to the task of logistic regression, outputs iterates which are biased to follow a unique ray defined by the data.

regression
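
A brief, hedged sketch of this bias (data, step size, and iteration counts are illustrative): on linearly separable data, gradient descent on the logistic loss produces iterates whose norm keeps growing while the direction $w_t/\|w_t\|$ settles down along a data-dependent ray.

```python
# Brief sketch (illustrative data and step size): on separable data, gradient
# descent on the logistic loss has iterates whose norm keeps growing while the
# direction w_t / ||w_t|| stabilizes along a data-dependent ray.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]))          # separable by construction

def grad(w):
    return (-y / (1.0 + np.exp(y * (X @ w)))) @ X / n

w, prev_dir = np.zeros(d), np.zeros(d)
for t in range(1, 20001):
    w -= 1.0 * grad(w)
    if t % 5000 == 0:
        direction = w / np.linalg.norm(w)
        print(t, "||w|| =", round(np.linalg.norm(w), 2),
              "direction drift =", round(float(np.linalg.norm(direction - prev_dir)), 4))
        prev_dir = direction
```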

Social welfare and profit maximization from revealed preferences

no code implementations 6 Nov 2017 Ziwei Ji, Ruta Mehta, Matus Telgarsky

Consider the seller's problem of finding optimal prices for her $n$ (divisible) goods when faced with a set of $m$ consumers, given that she can only observe their purchased bundles at posted prices, i.e., revealed preferences.
