Search Results for author: Kevin Lin

Found 57 papers, 28 papers with code

Optimization for Compressed Sensing: the Simplex Method and Kronecker Sparsification

no code implementations • 16 Dec 2013 • Robert Vanderbei, Han Liu, Lie Wang, Kevin Lin

For the first approach, we note that the zero vector can be taken as the initial basic (infeasible) solution for the linear programming problem and therefore, if the true signal is very sparse, some variants of the simplex method can be expected to take only a small number of pivots to arrive at a solution.

Paper
Add Code

Adversarial Learning for Fine-grained Image Search

no code implementations • 6 Jul 2018 • Kevin Lin, Fan Yang, Qiaosong Wang, Robinson Piramuthu

Fine-grained image search is still a challenging problem due to the difficulty in capturing subtle differences regardless of pose variations of objects from fine-grained categories.

Generative Adversarial Network Image Retrieval

Paper
Add Code

A Sharp Error Analysis for the Fused Lasso, with Application to Approximate Changepoint Screening

no code implementations • NeurIPS 2017 • Kevin Lin, James L. Sharpnack, Alessandro Rinaldo, Ryan J. Tibshirani

In the 1-dimensional multiple changepoint detection problem, we derive a new fast error rate for the fused lasso estimator, under the assumption that the mean vector has a sparse number of changepoints.

Paper
Add Code

Learning Compact Binary Descriptors With Unsupervised Deep Neural Networks

no code implementations • CVPR 2016 • Kevin Lin, Jiwen Lu, Chu-Song Chen, Jie zhou

In this paper, we propose a new unsupervised deep learning approach called DeepBit to learn compact binary descriptor for efficient visual object matching.

Image Retrieval Object +3

Paper
Add Code

Grammar-based Neural Text-to-SQL Generation

no code implementations • 30 May 2019 • Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, Matt Gardner

The sequence-to-sequence paradigm employed by neural text-to-SQL models typically performs token-level decoding and does not consider generating SQL hierarchically from a grammar.

Semantic Parsing Text-To-SQL

Paper
Add Code

Reasoning Over Paragraph Effects in Situations

no code implementations • WS 2019 • Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner

A system is presented a background passage containing at least one of these relations, a novel situation that uses this background, and questions that require reasoning about effects of the relationships in the background passage in the context of the situation.

Reading Comprehension

Paper
Add Code

QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions

no code implementations • IJCNLP 2019 • Oyvind Tafjord, Matt Gardner, Kevin Lin, Peter Clark

QuaRTz contains general qualitative statements, e. g., "A sunscreen with a higher SPF protects the skin longer.

General Knowledge

Paper
Add Code

Learning to Generate Multiple Style Transfer Outputs for an Input Sentence

no code implementations • WS 2020 • Kevin Lin, Ming-Yu Liu, Ming-Ting Sun, Jan Kautz

Specifically, we decompose the latent representation of the input sentence to a style code that captures the language style variation and a content code that encodes the language style-independent content.

Sentence Style Transfer +1

Paper
Add Code

Learning Nonparametric Human Mesh Reconstruction from a Single Image without Ground Truth Meshes

no code implementations • 28 Feb 2020 • Kevin Lin, Lijuan Wang, Ying Jin, Zicheng Liu, Ming-Ting Sun

Experimental results on multiple public datasets show that without using 3D ground truth meshes, the proposed approach outperforms the previous state-of-the-art approaches that require ground truth meshes for training.

Segmentation

Paper
Add Code

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

no code implementations • ICML 2020 • Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph Gonzalez

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.

Machine Translation Quantization +1

Paper
Add Code

VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning

no code implementations • 28 Sep 2020 • Xiaowei Hu, Xi Yin, Kevin Lin, Lijuan Wang, Lei Zhang, Jianfeng Gao, Zicheng Liu

It is highly desirable yet challenging to generate image captions that can describe novel objects which are unseen in caption-labeled training data, a capability that is evaluated in the novel object captioning challenge (nocaps).

Ranked #3 on Image Captioning on nocaps-XD out-of-domain

Image Captioning Object +1

Paper
Add Code

Constructing Taxonomies from Pretrained Language Models

no code implementations • NAACL 2021 • Catherine Chen, Kevin Lin, Dan Klein

The tree reconciliation module treats the task as a graph optimization problem and outputs the maximum spanning tree of this graph.

Paper
Add Code

Do Abstractions Have Politics? Toward a More Critical Algorithm Analysis

no code implementations • 4 Jan 2021 • Kevin Lin

The expansion of computer science (CS) education in K--12 and higher-education in the United States has prompted deeper engagement with equity that moves beyond inclusion toward a more critical CS education.

Computers and Society K.3.2

Paper
Add Code

Evaluating NLP Models via Contrast Sets

no code implementations • 1 Oct 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, A. Zhang, Ben Zhou

Unfortunately, when a dataset has systematic gaps (e. g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities.

Reading Comprehension Sentiment Analysis

Paper
Add Code

OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning

no code implementations • 8 Aug 2021 • Sheng Liu, Kevin Lin, Lijuan Wang, Junsong Yuan, Zicheng Liu

We introduce the task of open-vocabulary visual instance search (OVIS).

Instance Search Representation Learning

Paper
Add Code

Cross-modal Representation Learning for Zero-shot Action Recognition

no code implementations • CVPR 2022 • Chung-Ching Lin, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu

The model design provides a natural mechanism for visual and semantic representations to be learned in a shared knowledge space, whereby it encourages the learned visual embedding to be discriminative and more semantically consistent.

Ranked #3 on Zero-Shot Action Recognition on ActivityNet

Action Recognition Representation Learning +1

Paper
Add Code

MPT: Mesh Pre-Training with Transformers for Human Pose and Mesh Reconstruction

no code implementations • 24 Nov 2022 • Kevin Lin, Chung-Ching Lin, Lin Liang, Zicheng Liu, Lijuan Wang

Traditional methods of reconstructing 3D human pose and mesh from single images rely on paired image-mesh datasets, which can be difficult and expensive to obtain.

Ranked #13 on 3D Human Pose Estimation on 3DPW

3D Human Pose Estimation Hand Pose Estimation

Paper
Add Code

ReCo: Region-Controlled Text-to-Image Generation

no code implementations • CVPR 2023 • Zhengyuan Yang, JianFeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

Human evaluation on PaintSkill shows that ReCo is +19. 28% and +17. 21% more accurate in generating images with correct object count and spatial relationship than the T2I model.

Ranked #2 on Conditional Text-to-Image Synthesis on COCO-MIG

Conditional Text-to-Image Synthesis Position

Paper
Add Code

Partial-View Object View Synthesis via Filtered Inversion

no code implementations • 3 Apr 2023 • Fan-Yun Sun, Jonathan Tremblay, Valts Blukis, Kevin Lin, Danfei Xu, Boris Ivanovic, Peter Karkus, Stan Birchfield, Dieter Fox, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Marco Pavone, Nick Haber

At inference, given one or more views of a novel real-world object, FINV first finds a set of latent codes for the object by inverting the generative model from multiple initial seeds.

Object

Paper
Add Code

Neural Voting Field for Camera-Space 3D Hand Pose Estimation

no code implementations • CVPR 2023 • Lin Huang, Chung-Ching Lin, Kevin Lin, Lin Liang, Lijuan Wang, Junsong Yuan, Zicheng Liu

We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.

Ranked #4 on 3D Hand Pose Estimation on HO-3D

3D Hand Pose Estimation regression

Paper
Add Code

Decomposing Complex Queries for Tip-of-the-tongue Retrieval

no code implementations • 24 May 2023 • Kevin Lin, Kyle Lo, Joseph E. Gonzalez, Dan Klein

When re-finding items, users who forget or are uncertain about identifying details often rely on creative strategies for expressing their information needs -- complex queries that describe content elements (e. g., book characters or events), information beyond the document text (e. g., descriptions of book covers), or personal context (e. g., when they read a book).

Retrieval

Paper
Add Code

Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models

no code implementations • 27 Jul 2023 • Xin Yuan, Linjie Li, JianFeng Wang, Zhengyuan Yang, Kevin Lin, Zicheng Liu, Lijuan Wang

In this paper, we study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis.

Denoising

Paper
Add Code

Uncertainty Quantification for Eosinophil Segmentation

no code implementations • 28 Sep 2023 • Kevin Lin, Donald Brown, Sana Syed, Adam Greene

The uncertainty can be visualized in an output image to evaluate model performance, provide insight to how deep learning algorithms function, and assist pathologists in identifying eosinophils.

Image Segmentation Segmentation +2

Paper
Add Code

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation

no code implementations • 12 Oct 2023 • Zhengyuan Yang, JianFeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We introduce ``Idea to Image,'' a system that enables multimodal iterative self-refinement with GPT-4V(ision) for automatic image design and generation.

Paper
Add Code

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation

no code implementations • 11 Oct 2023 • Jie An, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo

We hope our proposed framework, benchmark, and LMM evaluation could help establish the intriguing interleaved image-text generation task.

Question Answering Text Generation

Paper
Add Code

MM-VID: Advancing Video Understanding with GPT-4V(ision)

no code implementations • 30 Oct 2023 • Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, JianFeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.

Video Understanding

Paper
Add Code

MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning

no code implementations • 29 Nov 2023 • Chaoyi Zhang, Kevin Lin, Zhengyuan Yang, JianFeng Wang, Linjie Li, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We present MM-Narrator, a novel system leveraging GPT-4 with multimodal in-context learning for the generation of audio descriptions (AD).

In-Context Learning Text Generation

Paper
Add Code

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

no code implementations • 1 Jan 2024 • Alex Jinpeng Wang, Linjie Li, Kevin Qinghong Lin, JianFeng Wang, Kevin Lin, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou

\ModelName, our unified framework, merges unimodal and multimodal elements, enhancing model performance for tasks involving textual and visual data while notably reducing learnable parameters.

Language Modelling Reading Comprehension +1

Paper
Add Code

Diffusion and Multi-Domain Adaptation Methods for Eosinophil Segmentation

no code implementations • 17 Mar 2024 • Kevin Lin, Donald Brown, Sana Syed, Adam Greene

Eosinophilic Esophagitis (EoE) represents a challenging condition for medical providers today.

Domain Adaptation

Paper
Add Code

Nifty Web Apps: Build a Web App for Any Text-Based Programming Assignment

1 code implementation • 9 Oct 2020 • Kevin Lin, Sumant Guha, Joe Spaniac, Andy Zheng

While many students now interact with web apps across a variety of smart devices, the vast majority of our Nifty Assignments still present traditional user interfaces such as console input/output and desktop GUI.

Computers and Society K.3.2

Paper
Code

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

1 code implementation • 23 Oct 2023 • Kevin Lin, Zhengyuan Yang, Linjie Li, JianFeng Wang, Lijuan Wang

For DEsignBench benchmarking, we perform human evaluations on generated images in DEsignBench gallery, against the criteria of image-text alignment, visual aesthetic, and design creativity.

Benchmarking Image Generation

Paper
Code

Few-Shot Adaptation for Parsing Contextual Utterances with LLMs

1 code implementation • 18 Sep 2023 • Kevin Lin, Patrick Xia, Hao Fang

We evaluate the ability of semantic parsers based on large language models (LLMs) to handle contextual utterances.

In-Context Learning Semantic Parsing

Paper
Code

Adversarial Ranking for Language Generation

1 code implementation • NeurIPS 2017 • Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-Ting Sun

Rather than training the discriminator to learn and assign absolute binary predicate for individual data sample, the proposed RankGAN is able to analyze and rank a collection of human-written and machine-written sentences by giving a reference group.

Ranked #1 on Text Generation on Chinese Poems

Generative Adversarial Network Text Generation

Paper
Code

An Empirical Study of Multimodal Model Merging

1 code implementation • 28 Apr 2023 • Yi-Lin Sung, Linjie Li, Kevin Lin, Zhe Gan, Mohit Bansal, Lijuan Wang

In this paper, we expand on this concept to a multimodal setup by merging transformers trained on different modalities.

Retrieval Visual Question Answering (VQA)

Paper
Code

An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling

1 code implementation • CVPR 2023 • Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Masked visual modeling (MVM) has been recently proven effective for visual pre-training.

Ranked #1 on Video Question Answering on LSMDC-MC

Fill Mask Optical Flow Estimation +10

Paper
Code

Adaptive Human Matting for Dynamic Videos

1 code implementation • CVPR 2023 • Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu

The most recent efforts in video matting have focused on eliminating trimap dependency since trimap annotations are expensive and trimap-based methods are less adaptable for real-time applications.

Decoder Image Matting +1

Paper
Code

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

1 code implementation • CVPR 2023 • Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang

In this work, we explore a unified VidL framework LAVENDER, where Masked Language Modeling (MLM) is used as the common interface for all pre-training and downstream tasks.

Decoder Language Modelling +7

Paper
Code

Evaluating Models' Local Decision Boundaries via Contrast Sets

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, Ben Zhou

Reading Comprehension Sentiment Analysis

Paper
Code

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

1 code implementation • 25 Apr 2024 • An Yan, Zhengyuan Yang, Junda Wu, Wanrong Zhu, Jianwei Yang, Linjie Li, Kevin Lin, JianFeng Wang, Julian McAuley, Jianfeng Gao, Lijuan Wang

Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image.

Ranked #47 on Visual Question Answering on MM-Vet

Visual Grounding Visual Question Answering +1

Paper
Code

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

2 code implementations • 13 Nov 2023 • An Yan, Zhengyuan Yang, Wanrong Zhu, Kevin Lin, Linjie Li, JianFeng Wang, Jianwei Yang, Yiwu Zhong, Julian McAuley, Jianfeng Gao, Zicheng Liu, Lijuan Wang

We first benchmark MM-Navigator on our collected iOS screen dataset.

Action Localization

106

Paper
Code

Neural Module Networks for Reasoning over Text

2 code implementations • ICLR 2020 • Nitish Gupta, Kevin Lin, Dan Roth, Sameer Singh, Matt Gardner

Answering compositional questions that require multiple steps of reasoning against text is challenging, especially when they involve discrete, symbolic operations.

Inductive Bias

120

Paper
Code

Equivariant Similarity for Vision-Language Foundation Models

1 code implementation • ICCV 2023 • Tan Wang, Kevin Lin, Linjie Li, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

Unlike the existing image-text similarity objective which only categorizes matched pairs as similar and unmatched pairs as dissimilar, equivariance also requires similarity to vary faithfully according to the semantic changes.

Ranked #7 on Visual Reasoning on Winoground

Retrieval Text Retrieval +2

121

Paper
Code

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling

1 code implementation • 24 Nov 2021 • Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Further, unlike previous studies that found pre-training tasks on video inputs (e. g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.

Ranked #20 on Zero-Shot Video Retrieval on DiDeMo

Question Answering Retrieval +5

136

Paper
Code

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

1 code implementation • 4 Aug 2023 • Weihao Yu, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang

Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking.

Math Zero-Shot Visual Question Answring

177

Paper
Code

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

1 code implementation • 29 Sep 2023 • Zhengyuan Yang, Linjie Li, Kevin Lin, JianFeng Wang, Chung-Ching Lin, Zicheng Liu, Lijuan Wang

We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models.

182

Paper
Code

Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks

1 code implementation • 1 Jul 2015 • Huei-Fang Yang, Kevin Lin, Chu-Song Chen

SSDH is simple and can be realized by a slight enhancement of an existing deep architecture for classification; yet it is effective and outperforms other hashing approaches on several benchmarks and large datasets.

Attribute Classification +3

205

Paper
Code

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

4 code implementations • 26 Jun 2023 • Fuxiao Liu, Kevin Lin, Linjie Li, JianFeng Wang, Yaser Yacoob, Lijuan Wang

To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts.

Ranked #4 on Visual Question Answering (VQA) on HallusionBench

Hallucination Visual Question Answering

211

Paper
Code

SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

1 code implementation • CVPR 2022 • Kevin Lin, Linjie Li, Chung-Ching Lin, Faisal Ahmed, Zhe Gan, Zicheng Liu, Yumao Lu, Lijuan Wang

Based on this model architecture, we show that video captioning can benefit significantly from more densely sampled video frames as opposed to previous successes with sparsely sampled video frames for video-and-language understanding tasks (e. g., video question answering).

Caption Generation Question Answering +3

226

Paper
Code

Lost in the Middle: How Language Models Use Long Contexts

4 code implementations • 6 Jul 2023 • Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context.

Language Modelling Position +2

264

Paper
Code

Cross-Domain Complementary Learning Using Pose for Multi-Person Part Segmentation

3 code implementations • 11 Jul 2019 • Kevin Lin, Lijuan Wang, Kun Luo, Yinpeng Chen, Zicheng Liu, Ming-Ting Sun

On the other hand, if part labels are also available in the real-images during training, our method outperforms the supervised state-of-the-art methods by a large margin.

Ranked #1 on Human Part Segmentation on PASCAL-Part (using extra training data)

Domain Adaptation Human Part Segmentation +3

271

Paper
Code

Mesh Graphormer

1 code implementation • ICCV 2021 • Kevin Lin, Lijuan Wang, Zicheng Liu

We present a graph-convolution-reinforced transformer, named Mesh Graphormer, for 3D human pose and mesh reconstruction from a single image.

Ranked #1 on 3D Hand Pose Estimation on FreiHAND

3D Hand Pose Estimation 3D Human Pose Estimation

363

Paper
Code

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

2 code implementations • 26 Feb 2020 • Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.

Machine Translation Quantization +1

378

Paper
Code

GIT: A Generative Image-to-text Transformer for Vision and Language

1 code implementation • 27 May 2022 • JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.

Ranked #1 on Image Captioning on nocaps-XD near-domain

Decoder Image Captioning +8

518

Paper
Code

End-to-End Human Pose and Mesh Reconstruction with Transformers

1 code implementation • CVPR 2021 • Kevin Lin, Lijuan Wang, Zicheng Liu

We present a new method, called MEsh TRansfOrmer (METRO), to reconstruct 3D human pose and mesh vertices from a single image.

Ranked #4 on 3D Hand Pose Estimation on FreiHAND

3D Absolute Human Pose Estimation 3D Hand Pose Estimation

585

Paper
Code

DisCo: Disentangled Control for Realistic Human Dance Generation

1 code implementation • 30 Jun 2023 • Tan Wang, Linjie Li, Kevin Lin, Yuanhao Zhai, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang

In this paper, we depart from the traditional paradigm of human motion transfer and emphasize two additional critical attributes for the synthesis of human dance content in social media contexts: (i) Generalizability: the model should be able to generalize beyond generic human viewpoints as well as unseen human subjects, backgrounds, and poses; (ii) Compositionality: it should allow for the seamless composition of seen/unseen subjects, backgrounds, and poses from different sources.

Attribute

903

Paper
Code

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

1 code implementation • 20 Mar 2023 • Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

We propose MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action.

Ranked #24 on Visual Question Answering on MM-Vet

Multimodal Reasoning Visual Question Answering

907

Paper
Code

MemGPT: Towards LLMs as Operating Systems

1 code implementation • 12 Oct 2023 • Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, Joseph E. Gonzalez

Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis.

Management

9,122

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.