Search Results for author: Lu Jiang

Found 61 papers, 34 papers with code

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction

1 code implementation ECCV 2020 Junwei Liang, Lu Jiang, Alexander Hauptmann

We approach this problem through the real-data-free setting in which the model is trained only on 3D simulation data and applied out-of-the-box to a wide variety of real cameras.

Adversarial Attack Adversarial Defense +2

BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation

1 code implementation 1 Jan 2024 Libin Lan, Pengzhou Cai, Lu Jiang, Xiaojuan Liu, Yongmei Li, Yudong Zhang

Specifically, BRAU-Net++ uses bi-level routing attention as the core building block to design our u-shaped encoder-decoder structure, in which both encoder and decoder are hierarchically constructed, so as to learn global semantic information while reducing computational complexity.

Image Segmentation Medical Image Segmentation +3

Spatial-Temporal Interplay in Human Mobility: A Hierarchical Reinforcement Learning Approach with Hypergraph Representation

no code implementations 25 Dec 2023 Zhaofan Zhang, Yanan Xiao, Lu Jiang, Dingqi Yang, Minghao Yin, Pengyang Wang

In the realm of human mobility, the decision-making process for selecting the next-visit location is intricately influenced by a trade-off between spatial and temporal constraints, which are reflective of individual needs and preferences.

Decision Making Hierarchical Reinforcement Learning +1

Fine-grained Controllable Video Generation via Object Appearance and Context

no code implementations 5 Dec 2023 Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang

To achieve detailed control, we propose a unified framework to jointly inject control signals into the existing text-to-video model.

Text-to-Video Generation Video Generation

VideoGLUE: Video General Understanding Evaluation of Foundation Models

1 code implementation 6 Jul 2023 Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task.

Action Recognition Temporal Localization +1

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

no code implementations NeurIPS 2023 Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.

In-Context Learning multimodal generation

Learning Disentangled Prompts for Compositional Image Synthesis

no code implementations 1 Jun 2023 Kihyuk Sohn, Albert Shaw, Yuan Hao, Han Zhang, Luisa Polania, Huiwen Chang, Lu Jiang, Irfan Essa

We study domain-adaptive image synthesis, the problem of teaching pretrained image generative models a new style or concept from as few as one image to synthesize novel images, to better understand compositional image synthesis.

Domain Adaptation Image Generation

Auditing Gender Presentation Differences in Text-to-Image Models

1 code implementation 7 Feb 2023 Yanzhe Zhang, Lu Jiang, Greg Turk, Diyi Yang

Text-to-image models, which can generate high-quality images based on textual input, have recently enabled various content-creation tools.

Muse: Text-To-Image Generation via Masked Generative Transformers

4 code implementations 2 Jan 2023 Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan

Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.

Language Modelling Large Language Model

Multi-View MOOC Quality Evaluation via Information-Aware Graph Representation Learning

no code implementations 1 Jan 2023 Lu Jiang, Yibin Wang, Jianan Wang, Pengyang Wang, Minghao Yin

To tackle these challenges, we formulate the problem as a course representation learning task and develop an Information-aware Graph Representation Learning (IaGRL) framework for multi-view MOOC quality evaluation.

Graph Representation Learning

A Multi-Source Information Learning Framework for Airbnb Price Prediction

no code implementations 1 Jan 2023 Lu Jiang, Yuanhan Li, Na Luo, Jianan Wang, Qiao Ning

Thirdly, we use the points of interest (POI) around the rental house to generate a variety of spatial network graphs, and learn the embedding of each network to obtain the spatial feature embedding.

Streaming Traffic Flow Prediction Based on Continuous Reinforcement Learning

no code implementations 24 Dec 2022 Yanan Xiao, Minyu Liu, Zichen Zhang, Lu Jiang, Minghao Yin, Jianan Wang

We propose to formulate the problem as a continuous reinforcement learning task, where the agent is the next flow value predictor, the action is the next time-series flow value in the sensor, and the environment state is a dynamically fused representation of the sensor and transportation network.

reinforcement-learning Reinforcement Learning (RL) +2
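The RL formulation quoted above (the agent is the next-flow-value predictor, the action is the predicted value, the state is a fused representation) can be illustrated with a toy gym-style environment. This is a hedged sketch under simplifying assumptions, not the paper's setup: the fused sensor/transportation-network state is replaced by a plain sliding window of recent readings, and the reward is the negative absolute prediction error.

```python
import numpy as np

class FlowPredictionEnv:
    """Traffic-flow prediction cast as a continuous RL task (toy sketch).

    State:  a window of recent sensor readings (stand-in for the fused
            sensor/network representation in the paper).
    Action: the predicted next flow value.
    Reward: negative absolute prediction error.
    """

    def __init__(self, series, window=4):
        self.series = np.asarray(series, dtype=float)
        self.window = window
        self.t = window

    def reset(self):
        self.t = self.window
        return self.series[: self.window]

    def step(self, action):
        target = self.series[self.t]          # true next flow value
        reward = -abs(action - target)        # perfect prediction -> 0 reward
        self.t += 1
        done = self.t >= len(self.series)
        state = self.series[self.t - self.window : self.t]
        return state, reward, done
```

Any continuous-action RL algorithm could then be trained against `step`/`reset`; a perfect predictor collects zero cumulative reward.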

Visual Prompt Tuning for Generative Transfer Learning

1 code implementation CVPR 2023 Kihyuk Sohn, Yuan Hao, José Lezama, Luisa Polania, Huiwen Chang, Han Zhang, Irfan Essa, Lu Jiang

We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens fed to autoregressive or non-autoregressive transformers.

Image Generation Transfer Learning

Improved Masked Image Generation with Token-Critic

1 code implementation 9 Sep 2022 José Lezama, Huiwen Chang, Lu Jiang, Irfan Essa

Given a masked-and-reconstructed real image, the Token-Critic model is trained to distinguish which visual tokens belong to the original image and which were sampled by the generative transformer.

Image Generation
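The critic-guided sampling idea described above can be sketched in a few lines. This is a toy illustration, not the paper's trained model: `critic_score` and `resample_fn` are hypothetical stand-ins for the Token-Critic network and the generative transformer, and the re-mask schedule is a simple linear decay.

```python
import numpy as np

def critic_guided_refine(tokens, critic_score, resample_fn, steps=4, seed=0):
    """Iteratively refine a token sequence using a critic (sketch).

    Each step, critic_score(tokens) rates how "real" every token looks;
    the lowest-scoring tokens are re-masked and re-sampled by the
    generator, with the re-sampled fraction shrinking over the schedule.
    """
    rng = np.random.default_rng(seed)
    tokens = np.array(tokens)
    n = len(tokens)
    for step in range(1, steps + 1):
        scores = critic_score(tokens)            # higher = more realistic
        k = max(1, int(n * (1 - step / steps)))  # how many tokens to redo
        worst = np.argsort(scores)[:k]           # least realistic positions
        tokens[worst] = resample_fn(tokens, worst, rng)
    return tokens
```

With a critic that recognizes a target pattern and a generator that samples toward it, the sequence converges in a handful of steps.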

MaskGIT: Masked Generative Image Transformer

6 code implementations CVPR 2022 Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman

At inference time, the model begins with generating all tokens of an image simultaneously, and then refines the image iteratively conditioned on the previous generation.

 Ranked #1 on Image Outpainting on LHQC (Block-FID (Right Extend) metric)

Image Manipulation Image Outpainting
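The iterative parallel decoding described above (generate all tokens at once, then refine, keeping only confident predictions each round) can be sketched as follows. This is a hedged toy version, not the released MaskGIT code: `predict_fn` is a hypothetical stand-in for the bidirectional transformer, and the mask schedule is the cosine decay commonly associated with this family of decoders.

```python
import numpy as np

MASK = -1  # sentinel id for a masked position

def parallel_iterative_decode(predict_fn, length, steps=8, seed=0):
    """Masked parallel decoding (sketch).

    predict_fn(tokens, rng) -> (proposals, confidences) proposes a token
    and a confidence for every position. Each step commits the most
    confident proposals and re-masks the rest; the number of masked
    positions shrinks on a cosine schedule until none remain.
    """
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK)
    for step in range(1, steps + 1):
        proposals, conf = predict_fn(tokens, rng)
        new = np.where(tokens == MASK, proposals, tokens)  # keep committed tokens
        conf = np.where(tokens == MASK, conf, np.inf)      # committed = certain
        n_mask = int(length * np.cos(np.pi / 2 * step / steps))
        new[np.argsort(conf)[:n_mask]] = MASK              # re-mask least confident
        tokens = new
    return tokens
```

Because every position is predicted in parallel, the sequence is produced in `steps` forward passes rather than one pass per token as in autoregressive decoding.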

BLT: Bidirectional Layout Transformer for Controllable Layout Generation

1 code implementation 9 Dec 2021 Xiang Kong, Lu Jiang, Huiwen Chang, Han Zhang, Yuan Hao, Haifeng Gong, Irfan Essa

During inference, BLT first generates a draft layout from the input and then iteratively refines it into a high-quality layout by masking out low-confident attributes.

Pyramid Adversarial Training Improves ViT Performance

1 code implementation CVPR 2022 Charles Herrmann, Kyle Sargent, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun

In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance.

Ranked #9 on Domain Generalization on ImageNet-C (using extra training data)

Adversarial Attack Data Augmentation +2

ViTGAN: Training GANs with Vision Transformers

3 code implementations ICLR 2022 Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.

Image Generation

Self-supervised and Supervised Joint Training for Resource-rich Machine Translation

no code implementations 8 Jun 2021 Yong Cheng, Wei Wang, Lu Jiang, Wolfgang Macherey

Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).

Low-Resource Neural Machine Translation NMT +1

Simplifying Reinforced Feature Selection via Restructured Choice Strategy of Single Agent

no code implementations 19 Sep 2020 Xiaosa Zhao, Kunpeng Liu, Wei Fan, Lu Jiang, Xiaowei Zhao, Minghao Yin, Yanjie Fu

To address this question, we develop a single-agent reinforced feature selection approach integrated with a restructured choice strategy.

feature selection

Text as Neural Operator: Image Manipulation by Text Instruction

1 code implementation 11 Aug 2020 Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa

In recent years, text-guided image manipulation has gained increasing attention in the multimedia and computer vision community.

Conditional Image Generation Image Captioning +2

RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

no code implementations ECCV 2020 Hung-Yu Tseng, Hsin-Ying Lee, Lu Jiang, Ming-Hsuan Yang, Weilong Yang

Image generation from scene description is a cornerstone technique for the controlled generation, which is beneficial to applications such as content creation and image editing.

Image Generation Retrieval

Controllable and Progressive Image Extrapolation

no code implementations 25 Dec 2019 Yijun Li, Lu Jiang, Ming-Hsuan Yang

Image extrapolation aims at expanding the narrow field of view of a given image patch.

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

1 code implementation CVPR 2020 Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann

The first contribution is a new dataset, created in a realistic 3D simulator, which is based on real world trajectory data, and then extrapolated by human annotators to achieve different latent goals.

Autonomous Driving Human motion prediction +5

Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels

2 code implementations ICML 2020 Lu Jiang, Di Huang, Mason Liu, Weilong Yang

Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic label noise, and real-world label noise has never been studied in a controlled setting.

Image Classification

Confident Learning: Estimating Uncertainty in Dataset Labels

4 code implementations 31 Oct 2019 Curtis G. Northcutt, Lu Jiang, Isaac L. Chuang

Confident learning (CL) is an alternative approach which focuses instead on label quality by characterizing and identifying label errors in datasets, based on the principles of pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence.

Learning with noisy labels Sentiment Analysis +1
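The counting-with-probabilistic-thresholds idea in the abstract above can be sketched minimally. This is a simplified illustration of the thresholding step, not the full confident-learning method (no joint-distribution estimation or pruning/ranking stages), and `find_label_issues` here is a hypothetical helper, not the `cleanlab` API.

```python
import numpy as np

def find_label_issues(probs, labels):
    """Flag likely label errors via per-class confidence thresholds (sketch).

    probs:  (n, k) out-of-sample predicted class probabilities
    labels: (n,)   given (possibly noisy) integer labels

    Threshold t_j is the mean self-confidence of examples labeled j.
    An example is flagged when it confidently belongs (prob >= t_k) to
    some class k that differs from its given label.
    """
    n, k = probs.shape
    thresholds = np.array([probs[labels == j, j].mean() for j in range(k)])
    above = probs >= thresholds                      # confident memberships
    confident_class = np.where(above, probs, -np.inf).argmax(axis=1)
    return above.any(axis=1) & (confident_class != labels)
```

On a toy batch, an example labeled class 1 but predicted class 0 with high confidence is the only one flagged.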

Synthetic vs Real: Deep Learning on Controlled Noise

no code implementations 25 Sep 2019 Lu Jiang, Di Huang, Weilong Yang

Performing controlled experiments on noisy data is essential in thoroughly understanding deep learning across a spectrum of noise levels.

Feature Partitioning for Efficient Multi-Task Architectures

no code implementations ICLR 2020 Alejandro Newell, Lu Jiang, Chong Wang, Li-Jia Li, Jia Deng

Multi-task learning holds the promise of requiring less data, fewer parameters, and less time than training separate models.

Multi-Task Learning

State-aware Re-identification Feature for Multi-target Multi-camera Tracking

no code implementations 4 Jun 2019 Peng Li, Jiabin Zhang, Zheng Zhu, Yanwei Li, Lu Jiang, Guan Huang

Multi-target Multi-camera Tracking (MTMCT) aims to extract the trajectories from videos captured by a set of cameras.

Eidetic 3D LSTM: A Model for Video Prediction and Beyond

3 code implementations ICLR 2019 Yunbo Wang, Lu Jiang, Ming-Hsuan Yang, Li-Jia Li, Mingsheng Long, Li Fei-Fei

We first evaluate the E3D-LSTM network on widely used future video prediction datasets and achieve state-of-the-art performance.

 Ranked #1 on Video Prediction on KTH (Cond metric)

Activity Recognition Video Prediction +1

Revisiting EmbodiedQA: A Simple Baseline and Beyond

no code implementations 8 Apr 2019 Yu Wu, Lu Jiang, Yi Yang

In this paper, we empirically study this problem and introduce 1) a simple yet effective baseline that achieves promising performance; 2) an easier and practical setting for EmbodiedQA where an agent has a chance to adapt the trained model to a new environment before it actually answers users' questions.

Embodied Question Answering Question Answering

Let's Transfer Transformations of Shared Semantic Representations

2 code implementations 2 Mar 2019 Nam Vo, Lu Jiang, James Hays

In this work we show how one can learn transformations with no training examples by learning them on another domain and then transfer to the target domain.

Attribute Image Retrieval +3

Contrastive Adaptation Network for Unsupervised Domain Adaptation

2 code implementations CVPR 2019 Guoliang Kang, Lu Jiang, Yi Yang, Alexander G. Hauptmann

Unsupervised Domain Adaptation (UDA) makes predictions for the target domain data while manual annotations are only available in the source domain.

Unsupervised Domain Adaptation

Composing Text and Image for Image Retrieval - An Empirical Odyssey

4 code implementations CVPR 2019 Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays

In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.

Image Retrieval Image Retrieval with Multi-Modal Query +1

Focal Visual-Text Attention for Visual Question Answering

2 code implementations CVPR 2018 Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering.

Memex Question Answering Question Answering +1

Decoupled Novel Object Captioner

1 code implementation 11 Apr 2018 Yu Wu, Linchao Zhu, Lu Jiang, Yi Yang

Thus, the sequence model can be decoupled from the novel object descriptions.

Image Captioning Novel Concepts +2

Graph Distillation for Action Detection with Privileged Modalities

1 code implementation ECCV 2018 Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei

We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available.

Action Classification Action Detection +1

MemexQA: Visual Memex Question Answering

1 code implementation 4 Aug 2017 Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann

This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.

Memex Question Answering Question Answering +1

Exploiting Multi-modal Curriculum in Noisy Web Data for Large-scale Concept Learning

1 code implementation 16 Jul 2016 Junwei Liang, Lu Jiang, Deyu Meng, Alexander Hauptmann

Automatically learning video concept detectors from big but noisy web data with no additional manual annotations is a novel but challenging area in the multimedia and machine learning communities.

BIG-bench Machine Learning

Strategies for Searching Video Content with Text Queries or Video Examples

no code implementations 17 Jun 2016 Shoou-I Yu, Yi Yang, Zhongwen Xu, Shicheng Xu, Deyu Meng, Zexi Mao, Zhigang Ma, Ming Lin, Xuanchong Li, Huan Li, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann, Chuang Gan, Xingzhong Du, Xiaojun Chang

The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search.

Event Detection Retrieval +1

What Objective Does Self-paced Learning Indeed Optimize?

no code implementations 19 Nov 2015 Deyu Meng, Qian Zhao, Lu Jiang

Self-paced learning (SPL) is a recently proposed methodology designed by simulating the learning principle of humans and animals.

Self-Paced Learning with Diversity

no code implementations NeurIPS 2014 Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, Alexander Hauptmann

Self-paced learning (SPL) is a recently proposed learning regime inspired by the learning process of humans and animals that gradually incorporates easy to more complex samples into training.
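The easy-to-hard regime described above can be sketched with a minimal example. This is a toy illustration of vanilla SPL (binary sample weights, growing age parameter) applied to robust mean estimation, assuming a squared loss; it is not the diversity-regularized method of this paper.

```python
import numpy as np

def self_paced_mean(x, lam0=1.0, growth=2.0, iters=5):
    """Self-paced estimation of a robust mean (sketch).

    Alternates between (1) selecting "easy" samples whose loss falls
    below the age parameter lambda and (2) refitting on the selected
    set, while lambda grows so that harder samples are gradually
    admitted into training.
    """
    x = np.asarray(x, dtype=float)
    mu = np.median(x)               # easy-first initialization
    lam = lam0
    for _ in range(iters):
        loss = (x - mu) ** 2        # per-sample loss under current model
        v = loss < lam              # binary weights: keep easy samples
        if v.any():
            mu = x[v].mean()        # refit on the currently-easy set
        lam *= growth               # admit harder samples next round
    return mu
```

On data with a tight inlier cluster and one gross outlier, the outlier's loss stays above lambda for the whole schedule, so the estimate tracks the inliers.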
