no code implementations • ICML 2020 • Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph Gonzalez
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.
1 code implementation • 14 Jun 2022 • Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang
In this work, we explore a unified VidL framework LAVENDER, where Masked Language Modeling (MLM) is used as the common interface for all pre-training and downstream tasks.
no code implementations • 27 May 2022 • JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang
In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.
Ranked #1 on Image Captioning on nocaps-XD out-of-domain
no code implementations • CVPR 2022 • Chung-Ching Lin, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu
The model design provides a natural mechanism for visual and semantic representations to be learned in a shared knowledge space, whereby it encourages the learned visual embedding to be discriminative and more semantically consistent.
Ranked #1 on Zero-Shot Action Recognition on HMDB51
1 code implementation • CVPR 2022 • Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang
Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames as opposed to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e.g., video question answering).
1 code implementation • 24 Nov 2021 • Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu
Further, unlike previous studies that found pre-training tasks on video inputs (e.g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.
no code implementations • 8 Aug 2021 • Sheng Liu, Kevin Lin, Lijuan Wang, Junsong Yuan, Zicheng Liu
We introduce the task of open-vocabulary visual instance search (OVIS).
1 code implementation • ICCV 2021 • Kevin Lin, Lijuan Wang, Zicheng Liu
We present a graph-convolution-reinforced transformer, named Mesh Graphormer, for 3D human pose and mesh reconstruction from a single image.
Ranked #1 on 3D Hand Pose Estimation on FreiHAND
no code implementations • 4 Jan 2021 • Kevin Lin
The expansion of computer science (CS) education in K-12 and higher education in the United States has prompted deeper engagement with equity that moves beyond inclusion toward a more critical CS education.
Computers and Society K.3.2
1 code implementation • CVPR 2021 • Kevin Lin, Lijuan Wang, Zicheng Liu
We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image.
Ranked #3 on 3D Hand Pose Estimation on FreiHAND
no code implementations • NAACL 2021 • Catherine Chen, Kevin Lin, Dan Klein
The tree reconciliation module treats the task as a graph optimization problem and outputs the maximum spanning tree of this graph.
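As a rough illustration of what "maximum spanning tree" means here, the following is a minimal sketch using Kruskal's algorithm on an undirected weighted graph. The function name, the edge format, and the toy weights are assumptions made for this example and are not taken from the paper; if the graph in question is directed, an arborescence algorithm such as Chu-Liu/Edmonds would be needed instead.

```python
def maximum_spanning_tree(num_nodes, edges):
    """Return (total_weight, chosen_edges) for a maximum spanning tree.

    `edges` is a list of (weight, u, v) tuples over nodes 0..num_nodes-1.
    Hypothetical helper for illustration; assumes an undirected graph.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    total = 0
    # Kruskal's algorithm, visiting the heaviest edges first.
    for weight, u, v in sorted(edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            chosen.append((u, v, weight))
            total += weight
    return total, chosen


# Toy graph with made-up integer scores on each candidate edge.
edges = [(9, 0, 1), (2, 0, 2), (8, 1, 2), (5, 2, 3), (1, 1, 3)]
total, tree = maximum_spanning_tree(4, edges)
print(total, tree)
```

Sorting in descending order is the only change from the usual minimum spanning tree version of Kruskal's algorithm.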
1 code implementation • 9 Oct 2020 • Kevin Lin, Sumant Guha, Joe Spaniac, Andy Zheng
While many students now interact with web apps across a variety of smart devices, the vast majority of our Nifty Assignments still present traditional user interfaces such as console input/output and desktop GUI.
Computers and Society K.3.2
no code implementations • 1 Oct 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
no code implementations • 28 Sep 2020 • Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu
It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps).
Ranked #2 on Image Captioning on nocaps-XD out-of-domain
no code implementations • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.
no code implementations • 28 Feb 2020 • Kevin Lin, Lijuan Wang, Ying Jin, Zicheng Liu, Ming-Ting Sun
Experimental results on multiple public datasets show that without using 3D ground truth meshes, the proposed approach outperforms the previous state-of-the-art approaches that require ground truth meshes for training.
2 code implementations • 26 Feb 2020 • Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.
no code implementations • WS 2020 • Kevin Lin, Ming-Yu Liu, Ming-Ting Sun, Jan Kautz
Specifically, we decompose the latent representation of the input sentence into a style code that captures the language style variation and a content code that encodes the language style-independent content.
2 code implementations • ICLR 2020 • Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner
Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations.
no code implementations • IJCNLP 2019 • Oyvind Tafjord, Matt Gardner, Kevin Lin, Peter Clark
QuaRTz contains general qualitative statements, e.g., "A sunscreen with a higher SPF protects the skin longer.
no code implementations • WS 2019 • Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner
A system is presented with a background passage containing at least one of these relations, a novel situation that uses this background, and questions that require reasoning about effects of the relationships in the background passage in the context of the situation.
3 code implementations • 11 Jul 2019 • Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun
On the other hand, if part labels are also available in the real-images during training, our method outperforms the supervised state-of-the-art methods by a large margin.
Ranked #1 on Human Part Segmentation on PASCAL-Part (using extra training data)
no code implementations • 30 May 2019 • Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, Matt Gardner
The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar.
no code implementations • 6 Jul 2018 • Kevin Lin, Fan Yang, Qiaosong Wang, Robinson Piramuthu
Fine-grained image search is still a challenging problem due to the difficulty in capturing subtle differences regardless of pose variations of objects from fine-grained categories.
no code implementations • NeurIPS 2017 • Kevin Lin, James L. Sharpnack, Alessandro Rinaldo, Ryan J. Tibshirani
In the 1-dimensional multiple changepoint detection problem, we derive a new fast error rate for the fused lasso estimator, under the assumption that the mean vector has a sparse number of changepoints.
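For orientation, the 1-dimensional fused lasso estimator is commonly written as below. This is the standard textbook form, given here as a sketch; the symbols $y$, $\theta$, $n$, and $\lambda$ are notation assumed for this note and may differ from the paper's.

```latex
\hat{\theta} \;=\; \operatorname*{arg\,min}_{\theta \in \mathbb{R}^n}
  \; \frac{1}{2} \sum_{i=1}^{n} (y_i - \theta_i)^2
  \;+\; \lambda \sum_{i=1}^{n-1} \lvert \theta_{i+1} - \theta_i \rvert
```

The penalty acts on successive differences, so a large enough $\lambda \ge 0$ yields a piecewise-constant fit, and the indices $i$ where $\hat{\theta}_{i+1} \neq \hat{\theta}_i$ serve as the estimated changepoints.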
1 code implementation • NeurIPS 2017 • Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun
Rather than training the discriminator to learn and assign an absolute binary predicate to each individual data sample, the proposed RankGAN is able to analyze and rank a collection of human-written and machine-written sentences by giving a reference group.
Ranked #2 on Text Generation on Chinese Poems
no code implementations • CVPR 2016 • Kevin Lin, Jiwen Lu, Chu-Song Chen, Jie Zhou
In this paper, we propose a new unsupervised deep learning approach called DeepBit to learn compact binary descriptors for efficient visual object matching.
1 code implementation • 1 Jul 2015 • Huei-Fang Yang, Kevin Lin, Chu-Song Chen
SSDH is simple and can be realized by a slight enhancement of an existing deep architecture for classification; yet it is effective and outperforms other hashing approaches on several benchmarks and large datasets.
no code implementations • 16 Dec 2013 • Robert Vanderbei, Han Liu, Lie Wang, Kevin Lin
For the first approach, we note that the zero vector can be taken as the initial basic (infeasible) solution for the linear programming problem and therefore, if the true signal is very sparse, some variants of the simplex method can be expected to take only a small number of pivots to arrive at a solution.