Search Results for author: Kevin Lin

Found 56 papers, 26 papers with code

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

no code implementations ICML 2020 Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph Gonzalez

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.

Machine Translation Quantization +1

Diffusion and Multi-Domain Adaptation Methods for Eosinophil Segmentation

no code implementations17 Mar 2024 Kevin Lin, Donald Brown, Sana Syed, Adam Greene

Eosinophilic Esophagitis (EoE) represents a challenging condition for medical providers today.

Domain Adaptation

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

no code implementations1 Jan 2024 Alex Jinpeng Wang, Linjie Li, Kevin Qinghong Lin, JianFeng Wang, Kevin Lin, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou

\ModelName, our unified framework, merges unimodal and multimodal elements, enhancing model performance for tasks involving textual and visual data while notably reducing learnable parameters.

Language Modelling Reading Comprehension +1

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

no code implementations29 Nov 2023 Chaoyi Zhang, Kevin Lin, Zhengyuan Yang, JianFeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We present MM-Narrator, a novel system leveraging GPT-4 with multimodal in-context learning for the generation of audio descriptions (AD).

In-Context Learning Text Generation

MM-VID: Advancing Video Understanding with GPT-4V(ision)

no code implementations30 Oct 2023 Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, JianFeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.

Video Understanding

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

1 code implementation23 Oct 2023 Kevin Lin, Zhengyuan Yang, Linjie Li, JianFeng Wang, Lijuan Wang

For DEsignBench benchmarking, we perform human evaluations on generated images in DEsignBench gallery, against the criteria of image-text alignment, visual aesthetic, and design creativity.

Benchmarking Image Generation

MemGPT: Towards LLMs as Operating Systems

no code implementations12 Oct 2023 Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez

Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis.

Management

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

no code implementations12 Oct 2023 Zhengyuan Yang, JianFeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We introduce ``Idea to Image,'' a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation.

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

no code implementations11 Oct 2023 Jie An, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo

We hope our proposed framework, benchmark, and LMM evaluation could help establish the intriguing interleaved image-text generation task.

Question Answering Text Generation

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

1 code implementation29 Sep 2023 Zhengyuan Yang, Linjie Li, Kevin Lin, JianFeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models.

Uncertainty Quantification for Eosinophil Segmentation

no code implementations28 Sep 2023 Kevin Lin, Donald Brown, Sana Syed, Adam Greene

The uncertainty can be visualized in an output image to evaluate model performance, provide insight to how deep learning algorithms function, and assist pathologists in identifying eosinophils.

Image Segmentation Segmentation +2

Few-Shot Adaptation for Parsing Contextual Utterances with LLMs

1 code implementation18 Sep 2023 Kevin Lin, Patrick Xia, Hao Fang

We evaluate the ability of semantic parsers based on large language models (LLMs) to handle contextual utterances.

In-Context Learning Semantic Parsing

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

1 code implementation4 Aug 2023 Weihao Yu, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang

Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking.

Math Zero-Shot Visual Question Answring

Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models

no code implementations27 Jul 2023 Xin Yuan, Linjie Li, JianFeng Wang, Zhengyuan Yang, Kevin Lin, Zicheng Liu, Lijuan Wang

In this paper, we study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.

Denoising

Lost in the Middle: How Language Models Use Long Contexts

4 code implementations6 Jul 2023 Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context.

Language Modelling Position +2

DisCo: Disentangled Control for Realistic Human Dance Generation

1 code implementation30 Jun 2023 Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

In this paper, we depart from the traditional paradigm of human motion transfer and emphasize two additional critical attributes for the synthesis of human dance content in social media contexts: (i) Generalizability: the model should be able to generalize beyond generic human viewpoints as well as unseen human subjects, backgrounds, and poses; (ii) Compositionality: it should allow for the seamless composition of seen/unseen subjects, backgrounds, and poses from different sources.

Attribute

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

3 code implementations26 Jun 2023 Fuxiao Liu, Kevin Lin, Linjie Li, JianFeng Wang, Yaser Yacoob, Lijuan Wang

To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts.

Hallucination Visual Question Answering

Decomposing Complex Queries for Tip-of-the-tongue Retrieval

no code implementations24 May 2023 Kevin Lin, Kyle Lo, Joseph E. Gonzalez, Dan Klein

When re-finding items, users who forget or are uncertain about identifying details often rely on creative strategies for expressing their information needs -- complex queries that describe content elements (e. g., book characters or events), information beyond the document text (e. g., descriptions of book covers), or personal context (e. g., when they read a book).

Retrieval

An Empirical Study of Multimodal Model Merging

1 code implementation28 Apr 2023 Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang

In this paper, we expand on this concept to a multimodal setup by merging transformers trained on different modalities.

Retrieval Visual Question Answering (VQA)

Adaptive Human Matting for Dynamic Videos

1 code implementation CVPR 2023 Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu

The most recent efforts in video matting have focused on eliminating trimap dependency since trimap annotations are expensive and trimap-based methods are less adaptable for real-time applications.

Image Matting Video Matting

Partial-View Object View Synthesis via Filtered Inversion

no code implementations3 Apr 2023 Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber

At inference, given one or more views of a novel real-world object, FINV first finds a set of latent codes for the object by inverting the generative model from multiple initial seeds.

Object

Equivariant Similarity for Vision-Language Foundation Models

1 code implementation ICCV 2023 Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

Unlike the existing image-text similarity objective which only categorizes matched pairs as similar and unmatched pairs as dissimilar, equivariance also requires similarity to vary faithfully according to the semantic changes.

Retrieval Text Retrieval +2

MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction

no code implementations24 Nov 2022 Kevin Lin, Chung-Ching Lin, Lin Liang, Zicheng Liu, Lijuan Wang

Traditional methods of reconstructing 3D human pose and mesh from single images rely on paired image-mesh datasets, which can be difficult and expensive to obtain.

3D Human Pose Estimation Hand Pose Estimation

ReCo: Region-Controlled Text-to-Image Generation

no code implementations CVPR 2023 Zhengyuan Yang, JianFeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

Human evaluation on PaintSkill shows that ReCo is +19. 28% and +17. 21% more accurate in generating images with correct object count and spatial relationship than the T2I model.

Conditional Text-to-Image Synthesis Position

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

1 code implementation CVPR 2023 Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang

In this work, we explore a unified VidL framework LAVENDER, where Masked Language Modeling (MLM) is used as the common interface for all pre-training and downstream tasks.

Language Modelling Masked Language Modeling +6

GIT: A Generative Image-to-text Transformer for Vision and Language

2 code implementations27 May 2022 JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.

Image Captioning Image Classification +7

Cross-modal Representation Learning for Zero-shot Action Recognition

no code implementations CVPR 2022 Chung-Ching Lin, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu

The model design provides a natural mechanism for visual and semantic representations to be learned in a shared knowledge space, whereby it encourages the learned visual embedding to be discriminative and more semantically consistent.

Action Recognition Representation Learning +1

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

1 code implementation CVPR 2022 Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames as opposed to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e. g., video question answering).

Caption Generation Question Answering +3

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling

1 code implementation24 Nov 2021 Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Further, unlike previous studies that found pre-training tasks on video inputs (e. g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.

Question Answering Retrieval +5

Mesh Graphormer

1 code implementation ICCV 2021 Kevin Lin, Lijuan Wang, Zicheng Liu

We present a graph-convolution-reinforced transformer, named Mesh Graphormer, for 3D human pose and mesh reconstruction from a single image.

3D Hand Pose Estimation 3D Human Pose Estimation

Do Abstractions Have Politics? Toward a More Critical Algorithm Analysis

no code implementations4 Jan 2021 Kevin Lin

The expansion of computer science (CS) education in K--12 and higher-education in the United States has prompted deeper engagement with equity that moves beyond inclusion toward a more critical CS education.

Computers and Society K.3.2

Constructing Taxonomies from Pretrained Language Models

no code implementations NAACL 2021 Catherine Chen, Kevin Lin, Dan Klein

The tree reconciliation module treats the task as a graph optimization problem and outputs the maximum spanning tree of this graph.

Nifty Web Apps: Build a Web App for Any Text-Based Programming Assignment

1 code implementation9 Oct 2020 Kevin Lin, Sumant Guha, Joe Spaniac, Andy Zheng

While many students now interact with web apps across a variety of smart devices, the vast majority of our Nifty Assignments still present traditional user interfaces such as console input/output and desktop GUI.

Computers and Society K.3.2

Evaluating NLP Models via Contrast Sets

no code implementations1 Oct 2020 Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou

Unfortunately, when a dataset has systematic gaps (e. g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.

Reading Comprehension Sentiment Analysis

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning

no code implementations28 Sep 2020 Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps).

Image Captioning Object +1

Learning Nonparametric Human Mesh Reconstruction from a Single Image without Ground Truth Meshes

no code implementations28 Feb 2020 Kevin Lin, Lijuan Wang, Ying Jin, Zicheng Liu, Ming-Ting Sun

Experimental results on multiple public datasets show that without using 3D ground truth meshes, the proposed approach outperforms the previous state-of-the-art approaches that require ground truth meshes for training.

Segmentation

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

2 code implementations26 Feb 2020 Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.

Machine Translation Quantization +1

Learning to Generate Multiple Style Transfer Outputs for an Input Sentence

no code implementations WS 2020 Kevin Lin, Ming-Yu Liu, Ming-Ting Sun, Jan Kautz

Specifically, we decompose the latent representation of the input sentence to a style code that captures the language style variation and a content code that encodes the language style-independent content.

Sentence Style Transfer +1

Neural Module Networks for Reasoning over Text

2 code implementations ICLR 2020 Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations.

Inductive Bias

QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions

no code implementations IJCNLP 2019 Oyvind Tafjord, Matt Gardner, Kevin Lin, Peter Clark

QuaRTz contains general qualitative statements, e. g., "A sunscreen with a higher SPF protects the skin longer.

General Knowledge

Reasoning Over Paragraph Effects in Situations

no code implementations WS 2019 Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner

A system is presented a background passage containing at least one of these relations, a novel situation that uses this background, and questions that require reasoning about effects of the relationships in the background passage in the context of the situation.

Reading Comprehension

Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation

3 code implementations11 Jul 2019 Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun

On the other hand, if part labels are also available in the real-images during training, our method outperforms the supervised state-of-the-art methods by a large margin.

 Ranked #1 on Human Part Segmentation on PASCAL-Part (using extra training data)

Domain Adaptation Human Part Segmentation +3

Grammar-based Neural Text-to-SQL Generation

no code implementations30 May 2019 Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, Matt Gardner

The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar.

Semantic Parsing Text-To-SQL

Adversarial Learning for Fine-grained Image Search

no code implementations6 Jul 2018 Kevin Lin, Fan Yang, Qiaosong Wang, Robinson Piramuthu

Fine-grained image search is still a challenging problem due to the difficulty in capturing subtle differences regardless of pose variations of objects from fine-grained categories.

Generative Adversarial Network Image Retrieval

A Sharp Error Analysis for the Fused Lasso, with Application to Approximate Changepoint Screening

no code implementations NeurIPS 2017 Kevin Lin, James L. Sharpnack, Alessandro Rinaldo, Ryan J. Tibshirani

In the 1-dimensional multiple changepoint detection problem, we derive a new fast error rate for the fused lasso estimator, under the assumption that the mean vector has a sparse number of changepoints.

Adversarial Ranking for Language Generation

1 code implementation NeurIPS 2017 Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun

Rather than training the discriminator to learn and assign absolute binary predicate for individual data sample, the proposed RankGAN is able to analyze and rank a collection of human-written and machine-written sentences by giving a reference group.

Generative Adversarial Network Text Generation

Learning Compact Binary Descriptors With Unsupervised Deep Neural Networks

no code implementations CVPR 2016 Kevin Lin, Jiwen Lu, Chu-Song Chen, Jie zhou

In this paper, we propose a new unsupervised deep learning approach called DeepBit to learn compact binary descriptor for efficient visual object matching.

Image Retrieval Object +3

Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks

1 code implementation1 Jul 2015 Huei-Fang Yang, Kevin Lin, Chu-Song Chen

SSDH is simple and can be realized by a slight enhancement of an existing deep architecture for classification; yet it is effective and outperforms other hashing approaches on several benchmarks and large datasets.

Attribute Classification +3

Optimization for Compressed Sensing: the Simplex Method and Kronecker Sparsification

no code implementations16 Dec 2013 Robert Vanderbei, Han Liu, Lie Wang, Kevin Lin

For the first approach, we note that the zero vector can be taken as the initial basic (infeasible) solution for the linear programming problem and therefore, if the true signal is very sparse, some variants of the simplex method can be expected to take only a small number of pivots to arrive at a solution.

Cannot find the paper you are looking for? You can Submit a new open access paper.