Search Results for author: Kevin Lin

Found 79 papers, 41 papers with code

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

no code implementations ICML 2020 Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph Gonzalez

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.

Machine Translation Quantization +1

Sleep-time Compute: Beyond Inference Scaling at Test-time

1 code implementation 17 Apr 2025 Kevin Lin, Charlie Snell, Yu Wang, Charles Packer, Sarah Wooders, Ion Stoica, Joseph E. Gonzalez

Scaling test-time compute has emerged as a key ingredient for enabling large language models (LLMs) to solve difficult problems, but comes with high latency and inference cost.

SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

1 code implementation 10 Apr 2025 Xiyao Wang, Zhengyuan Yang, Chao Feng, Hongjin Lu, Linjie Li, Chung-Ching Lin, Kevin Lin, Furong Huang, Lijuan Wang

In this paper, we present an effective method to enhance visual reasoning with significantly fewer training samples, relying purely on self-improvement with no knowledge distillation.

Knowledge Distillation Visual Reasoning

Measurement of LLM's Philosophies of Human Nature

1 code implementation 3 Apr 2025 Minheng Ni, Ennan Wu, Zidong Gong, Zhengyuan Yang, Linjie Li, Chung-Ching Lin, Kevin Lin, Lijuan Wang, WangMeng Zuo

The widespread application of artificial intelligence (AI) in various tasks, along with frequent reports of conflicts or violations involving AI, has sparked societal concerns about interactions with AI systems.

Moral Scenarios

BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

no code implementations 26 Mar 2025 Yuyang Peng, Shishi Xiao, Keming Wu, Qisheng Liao, Bohan Chen, Kevin Lin, Danqing Huang, Ji Li, Yuhui Yuan

In contrast to most previous works that focus on a limited number of sub-regions and sentence-level prompts, ensuring precise adherence to ultra-dense layouts with tens or even hundreds of sub-regions in business content is far more challenging.

Descriptive Sentence +1

Zero-Shot Audio-Visual Editing via Cross-Modal Delta Denoising

no code implementations 26 Mar 2025 Yan-Bo Lin, Kevin Lin, Zhengyuan Yang, Linjie Li, JianFeng Wang, Chung-Ching Lin, Xiaofei Wang, Gedas Bertasius, Lijuan Wang

In this paper, we introduce zero-shot audio-video editing, a novel task that requires transforming original audio-visual content to align with a specified textual prompt without additional model training.

Denoising Video Editing

Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension

1 code implementation 4 Dec 2024 Xiyao Wang, Zhengyuan Yang, Linjie Li, Hongjin Lu, Yuancheng Xu, Chung-Ching Lin, Kevin Lin, Furong Huang, Lijuan Wang

In this paper, we present Vision Value Model (VisVM) that can guide VLM inference-time search to generate responses with better visual comprehension.

Descriptive Language Modeling +3

LiVOS: Light Video Object Segmentation with Gated Linear Matching

1 code implementation 5 Nov 2024 Qin Liu, JianFeng Wang, Zhengyuan Yang, Linjie Li, Kevin Lin, Marc Niethammer, Lijuan Wang

Semi-supervised video object segmentation (VOS) has been largely driven by space-time memory (STM) networks, which store past frame features in a spatiotemporal memory to segment the current frame via softmax attention.

Semantic Segmentation Semi-Supervised Video Object Segmentation +1

GenXD: Generating Any 3D and 4D Scenes

no code implementations 4 Nov 2024 Yuyang Zhao, Chung-Ching Lin, Kevin Lin, Zhiwen Yan, Linjie Li, Zhengyuan Yang, JianFeng Wang, Gim Hee Lee, Lijuan Wang

Due to the lack of real-world 4D data in the community, we first propose a data curation pipeline to obtain camera poses and object motion strength from videos.

DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning

no code implementations 31 Oct 2024 Zhenyu Jiang, Yuqi Xie, Kevin Lin, Zhenjia Xu, Weikang Wan, Ajay Mandlekar, Linxi Fan, Yuke Zhu

To this end, we introduce DexMimicGen, a large-scale automated data generation system that synthesizes trajectories from a handful of human demonstrations for humanoid robots with dexterous hands.

Imitation Learning

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

no code implementations 30 Oct 2024 Yining Hong, Beide Liu, Maxine Wu, Yuanhao Zhai, Kai-Wei Chang, Linjie Li, Kevin Lin, Chung-Ching Lin, JianFeng Wang, Zhengyuan Yang, YingNian Wu, Lijuan Wang

Our approach incorporates a masked conditional video diffusion model for the slow learning of world dynamics, alongside an inference-time fast learning strategy based on a temporal LoRA module.

Video Generation

Latency correction in sparse neuronal spike trains with overlapping global events

no code implementations 19 Oct 2024 Arturo Mariani, Federico Senocrate, Jason Mikiel-Hunter, David Mcalpine, Barbara Beiderbeck, Michael Pecka, Kevin Lin, Thomas Kreuz

New Method: Here we propose an iterative scheme that combines the advantages of the two original methods by using in each step as much of the latency information as possible and by employing a very fast extrapolation direct shift method instead of the much slower simulated annealing.

Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration

1 code implementation 17 Oct 2024 Yun-Yen Chuang, Hung-Min Hsu, Kevin Lin, Chen-Sheng Gu, Ling Zhen Li, Ray-I Chang, Hung-Yi Lee

The diffusion model, a new generative modeling paradigm, has achieved significant success in generating images, audio, video, and text.

Denoising Scheduling +2

Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization

no code implementations 4 Oct 2024 Zichen Miao, Zhengyuan Yang, Kevin Lin, Ze Wang, Zicheng Liu, Lijuan Wang, Qiang Qiu

We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data.

Image Generation Style Transfer

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

no code implementations 3 Oct 2024 Kaizhi Zheng, Xiaotong Chen, Xuehai He, Jing Gu, Linjie Li, Zhengyuan Yang, Kevin Lin, JianFeng Wang, Lijuan Wang, Xin Eric Wang

Given the steep learning curve of professional 3D software and the time-consuming process of managing large 3D assets, language-guided 3D scene editing has significant potential in fields such as virtual reality, augmented reality, and gaming.

3D scene Editing

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

1 code implementation 1 Aug 2024 Weihao Yu, Zhengyuan Yang, Lingfeng Ren, Linjie Li, JianFeng Wang, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang, Xinchao Wang

Using MM-Vet v2 to benchmark large multimodal models, we found that Claude 3.5 Sonnet is the best model with a score of 71.8, slightly outperforming GPT-4o which scored 71.0.

Math MM-Vet +3

IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

1 code implementation 15 Jul 2024 Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, JianFeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang

First, to enable dual-modal generation and maximize the information exchange between video and depth generation, we propose a unified dual-modal U-Net, a parameter-sharing framework for joint video and depth denoising, wherein a modality label guides the denoising target, and cross-modal attention enables the mutual information flow.

Denoising Monocular Depth Estimation +2

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

1 code implementation 12 Jun 2024 Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, JianFeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics.

counterfactual Future prediction +1

Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation

no code implementations 13 May 2024 Aaditya Prasad, Kevin Lin, Jimmy Wu, Linqi Zhou, Jeannette Bohg

Many robotic systems, such as mobile manipulators or quadrotors, cannot be equipped with high-end GPUs due to space, weight, and power constraints.

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

1 code implementation 25 Apr 2024 An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, JianFeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang

Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image.

Visual Grounding Visual Question Answering +1

Diffusion and Multi-Domain Adaptation Methods for Eosinophil Segmentation

no code implementations 17 Mar 2024 Kevin Lin, Donald Brown, Sana Syed, Adam Greene

Eosinophilic Esophagitis (EoE) represents a challenging condition for medical providers today.

Domain Adaptation

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

no code implementations 1 Jan 2024 Alex Jinpeng Wang, Linjie Li, Kevin Qinghong Lin, JianFeng Wang, Kevin Lin, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou

COSMO, our unified framework, merges unimodal and multimodal elements, enhancing model performance for tasks involving textual and visual data while notably reducing learnable parameters.

Language Modelling Reading Comprehension +1

MM-VID: Advancing Video Understanding with GPT-4V(ision)

1 code implementation 30 Oct 2023 Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, JianFeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.

Script Generation Video Understanding

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

1 code implementation 23 Oct 2023 Kevin Lin, Zhengyuan Yang, Linjie Li, JianFeng Wang, Lijuan Wang

For DEsignBench benchmarking, we perform human evaluations on generated images in DEsignBench gallery, against the criteria of image-text alignment, visual aesthetic, and design creativity.

Benchmarking Image Generation

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

no code implementations 12 Oct 2023 Zhengyuan Yang, JianFeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We introduce "Idea to Image," a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation.

MemGPT: Towards LLMs as Operating Systems

1 code implementation 12 Oct 2023 Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez

Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis.

Management

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

no code implementations 11 Oct 2023 Jie An, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo

We hope our proposed framework, benchmark, and LMM evaluation could help establish the intriguing interleaved image-text generation task.

Question Answering Text Generation

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

2 code implementations 29 Sep 2023 Zhengyuan Yang, Linjie Li, Kevin Lin, JianFeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models.

Ranked #3 on MMR total on MRR-Benchmark (using extra training data)

MMR total

Uncertainty Quantification for Eosinophil Segmentation

no code implementations 28 Sep 2023 Kevin Lin, Donald Brown, Sana Syed, Adam Greene

The uncertainty can be visualized in an output image to evaluate model performance, provide insight into how deep learning algorithms function, and assist pathologists in identifying eosinophils.

Deep Learning Image Segmentation +3

Few-Shot Adaptation for Parsing Contextual Utterances with LLMs

1 code implementation 18 Sep 2023 Kevin Lin, Patrick Xia, Hao Fang

We evaluate the ability of semantic parsers based on large language models (LLMs) to handle contextual utterances.

In-Context Learning Semantic Parsing

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

1 code implementation 4 Aug 2023 Weihao Yu, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang

Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking.

Math MM-Vet +1

Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models

no code implementations 27 Jul 2023 Xin Yuan, Linjie Li, JianFeng Wang, Zhengyuan Yang, Kevin Lin, Zicheng Liu, Lijuan Wang

In this paper, we study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.

Denoising

Lost in the Middle: How Language Models Use Long Contexts

6 code implementations 6 Jul 2023 Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context.

Language Modelling Position +2

DisCo: Disentangled Control for Realistic Human Dance Generation

2 code implementations CVPR 2024 Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

In this paper, we depart from the traditional paradigm of human motion transfer and emphasize two additional critical attributes for the synthesis of human dance content in social media contexts: (i) Generalizability: the model should be able to generalize beyond generic human viewpoints as well as unseen human subjects, backgrounds, and poses; (ii) Compositionality: it should allow for the seamless composition of seen/unseen subjects, backgrounds, and poses from different sources.

Attribute

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

4 code implementations 26 Jun 2023 Fuxiao Liu, Kevin Lin, Linjie Li, JianFeng Wang, Yaser Yacoob, Lijuan Wang

To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts.

Hallucination Visual Question Answering

Decomposing Complex Queries for Tip-of-the-tongue Retrieval

no code implementations 24 May 2023 Kevin Lin, Kyle Lo, Joseph E. Gonzalez, Dan Klein

When re-finding items, users who forget or are uncertain about identifying details often rely on creative strategies for expressing their information needs -- complex queries that describe content elements (e.g., book characters or events), information beyond the document text (e.g., descriptions of book covers), or personal context (e.g., when they read a book).

Retrieval

Neural Voting Field for Camera-Space 3D Hand Pose Estimation

no code implementations CVPR 2023 Lin Huang, Chung-Ching Lin, Kevin Lin, Lin Liang, Lijuan Wang, Junsong Yuan, Zicheng Liu

We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.

3D Hand Pose Estimation regression

An Empirical Study of Multimodal Model Merging

1 code implementation 28 Apr 2023 Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang

In this paper, we expand on this concept to a multimodal setup by merging transformers trained on different modalities.

model Retrieval +2

Adaptive Human Matting for Dynamic Videos

1 code implementation CVPR 2023 Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu

The most recent efforts in video matting have focused on eliminating trimap dependency since trimap annotations are expensive and trimap-based methods are less adaptable for real-time applications.

Decoder Image Matting +1

Partial-View Object View Synthesis via Filtered Inversion

no code implementations 3 Apr 2023 Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber

At inference, given one or more views of a novel real-world object, FINV first finds a set of latent codes for the object by inverting the generative model from multiple initial seeds.

Object

Equivariant Similarity for Vision-Language Foundation Models

1 code implementation ICCV 2023 Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

Unlike the existing image-text similarity objective which only categorizes matched pairs as similar and unmatched pairs as dissimilar, equivariance also requires similarity to vary faithfully according to the semantic changes.

Image-text Retrieval Text Retrieval +2

MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction

2 code implementations 24 Nov 2022 Kevin Lin, Chung-Ching Lin, Lin Liang, Zicheng Liu, Lijuan Wang

Traditional methods of reconstructing 3D human pose and mesh from single images rely on paired image-mesh datasets, which can be difficult and expensive to obtain.

3D Human Pose Estimation Hand Pose Estimation

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

1 code implementation CVPR 2023 Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang

In this work, we explore a unified VidL framework LAVENDER, where Masked Language Modeling (MLM) is used as the common interface for all pre-training and downstream tasks.

Decoder Language Modeling +8

GIT: A Generative Image-to-text Transformer for Vision and Language

1 code implementation 27 May 2022 JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.

Decoder Image Captioning +10

Cross-modal Representation Learning for Zero-shot Action Recognition

no code implementations CVPR 2022 Chung-Ching Lin, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu

The model design provides a natural mechanism for visual and semantic representations to be learned in a shared knowledge space, whereby it encourages the learned visual embedding to be discriminative and more semantically consistent.

Action Recognition Representation Learning +1

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

1 code implementation CVPR 2022 Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames as opposed to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e.g., video question answering).

Caption Generation Question Answering +3

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling

1 code implementation 24 Nov 2021 Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Further, unlike previous studies that found pre-training tasks on video inputs (e.g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.

Question Answering Retrieval +5

Mesh Graphormer

3 code implementations ICCV 2021 Kevin Lin, Lijuan Wang, Zicheng Liu

We present a graph-convolution-reinforced transformer, named Mesh Graphormer, for 3D human pose and mesh reconstruction from a single image.

3D Hand Pose Estimation 3D Human Pose Estimation

Do Abstractions Have Politics? Toward a More Critical Algorithm Analysis

no code implementations 4 Jan 2021 Kevin Lin

The expansion of computer science (CS) education in K-12 and higher education in the United States has prompted deeper engagement with equity that moves beyond inclusion toward a more critical CS education.

Computers and Society K.3.2

Constructing Taxonomies from Pretrained Language Models

no code implementations NAACL 2021 Catherine Chen, Kevin Lin, Dan Klein

The tree reconciliation module treats the task as a graph optimization problem and outputs the maximum spanning tree of this graph.

Nifty Web Apps: Build a Web App for Any Text-Based Programming Assignment

1 code implementation 9 Oct 2020 Kevin Lin, Sumant Guha, Joe Spaniac, Andy Zheng

While many students now interact with web apps across a variety of smart devices, the vast majority of our Nifty Assignments still present traditional user interfaces such as console input/output and desktop GUI.

Computers and Society K.3.2

Evaluating NLP Models via Contrast Sets

no code implementations 1 Oct 2020 Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou

Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.

Reading Comprehension Sentiment Analysis

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning

no code implementations 28 Sep 2020 Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps).

Image Captioning Object +1

Learning Nonparametric Human Mesh Reconstruction from a Single Image without Ground Truth Meshes

no code implementations 28 Feb 2020 Kevin Lin, Lijuan Wang, Ying Jin, Zicheng Liu, Ming-Ting Sun

Experimental results on multiple public datasets show that without using 3D ground truth meshes, the proposed approach outperforms the previous state-of-the-art approaches that require ground truth meshes for training.

Segmentation

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

2 code implementations 26 Feb 2020 Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.

Machine Translation Quantization +1

Learning to Generate Multiple Style Transfer Outputs for an Input Sentence

no code implementations WS 2020 Kevin Lin, Ming-Yu Liu, Ming-Ting Sun, Jan Kautz

Specifically, we decompose the latent representation of the input sentence to a style code that captures the language style variation and a content code that encodes the language style-independent content.

Sentence Style Transfer +1

Neural Module Networks for Reasoning over Text

2 code implementations ICLR 2020 Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations.

Diversity Inductive Bias

QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions

no code implementations IJCNLP 2019 Oyvind Tafjord, Matt Gardner, Kevin Lin, Peter Clark

QuaRTz contains general qualitative statements, e.g., "A sunscreen with a higher SPF protects the skin longer."

General Knowledge

Reasoning Over Paragraph Effects in Situations

no code implementations WS 2019 Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner

A system is presented a background passage containing at least one of these relations, a novel situation that uses this background, and questions that require reasoning about effects of the relationships in the background passage in the context of the situation.

Reading Comprehension

Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation

3 code implementations 11 Jul 2019 Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun

On the other hand, if part labels are also available in the real-images during training, our method outperforms the supervised state-of-the-art methods by a large margin.

Ranked #1 on Human Part Segmentation on PASCAL-Part (using extra training data)

Domain Adaptation Human Part Segmentation +3

Grammar-based Neural Text-to-SQL Generation

no code implementations 30 May 2019 Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, Matt Gardner

The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar.

Text-To-SQL

Adversarial Learning for Fine-grained Image Search

no code implementations 6 Jul 2018 Kevin Lin, Fan Yang, Qiaosong Wang, Robinson Piramuthu

Fine-grained image search is still a challenging problem due to the difficulty in capturing subtle differences regardless of pose variations of objects from fine-grained categories.

Generative Adversarial Network Image Retrieval

A Sharp Error Analysis for the Fused Lasso, with Application to Approximate Changepoint Screening

no code implementations NeurIPS 2017 Kevin Lin, James L. Sharpnack, Alessandro Rinaldo, Ryan J. Tibshirani

In the 1-dimensional multiple changepoint detection problem, we derive a new fast error rate for the fused lasso estimator, under the assumption that the mean vector has a sparse number of changepoints.

Adversarial Ranking for Language Generation

1 code implementation NeurIPS 2017 Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun

Rather than training the discriminator to learn and assign absolute binary predicate for individual data sample, the proposed RankGAN is able to analyze and rank a collection of human-written and machine-written sentences by giving a reference group.

Generative Adversarial Network Text Generation

Learning Compact Binary Descriptors With Unsupervised Deep Neural Networks

no code implementations CVPR 2016 Kevin Lin, Jiwen Lu, Chu-Song Chen, Jie Zhou

In this paper, we propose a new unsupervised deep learning approach called DeepBit to learn compact binary descriptor for efficient visual object matching.

Image Retrieval Object +3

Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks

1 code implementation 1 Jul 2015 Huei-Fang Yang, Kevin Lin, Chu-Song Chen

SSDH is simple and can be realized by a slight enhancement of an existing deep architecture for classification; yet it is effective and outperforms other hashing approaches on several benchmarks and large datasets.

Attribute Classification +3

Optimization for Compressed Sensing: the Simplex Method and Kronecker Sparsification

no code implementations 16 Dec 2013 Robert Vanderbei, Han Liu, Lie Wang, Kevin Lin

For the first approach, we note that the zero vector can be taken as the initial basic (infeasible) solution for the linear programming problem and therefore, if the true signal is very sparse, some variants of the simplex method can be expected to take only a small number of pivots to arrive at a solution.

compressed sensing
