1 code implementation • ECCV 2020 • Junwei Liang, Lu Jiang, Alexander Hauptmann
We approach this problem through the real-data-free setting in which the model is trained only on 3D simulation data and applied out-of-the-box to a wide variety of real cameras.
Ranked #1 on Trajectory Forecasting on ActEV
no code implementations • 11 Apr 2025 • Team Seawead, Ceyuan Yang, Zhijie Lin, Yang Zhao, Shanchuan Lin, Zhibei Ma, Haoyuan Guo, Hao Chen, Lu Qi, Sen Wang, Feng Cheng, Feilong Zuo, Xuejiao Zeng, Ziyan Yang, Fangyuan Kong, Zhiwu Qing, Fei Xiao, Meng Wei, Tuyen Hoang, Siyu Zhang, Peihao Zhu, Qi Zhao, Jiangqiao Yan, Liangke Gui, Sheng Bi, Jiashi Li, Yuxi Ren, Rui Wang, Huixia Li, Xuefeng Xiao, Shu Liu, Feng Ling, Heng Zhang, Houmin Wei, Huafeng Kuang, Jerry Duncan, Junda Zhang, Junru Zheng, Li Sun, Manlin Zhang, Renfei Sun, Xiaobin Zhuang, Xiaojie Li, Xin Xia, Xuyan Chi, Yanghua Peng, Yuping Wang, Yuxuan Wang, Zhongkai Zhao, Zhuo Chen, Zuquan Song, Zhenheng Yang, Jiashi Feng, Jianchao Yang, Lu Jiang
This technical report highlights the key design decisions that enhance the performance of the medium-sized diffusion model.
no code implementations • 26 Mar 2025 • Qi Zhao, Xingyu Ni, Ziyu Wang, Feng Cheng, Ziyan Yang, Lu Jiang, Bohan Wang
We investigate how to enhance the physical fidelity of video generation models by leveraging synthetic videos derived from computer graphics pipelines.
no code implementations • 13 Mar 2025 • Yuwei Guo, Ceyuan Yang, Ziyan Yang, Zhibei Ma, Zhijie Lin, Zhenheng Yang, Dahua Lin, Lu Jiang
Recent advances in video generation can produce realistic, minute-long single-shot videos with scalable diffusion transformers.
1 code implementation • 13 Mar 2025 • Yuanxin Liu, Rui Zhu, Shuhuai Ren, Jiacong Wang, Haoyuan Guo, Xu Sun, Lu Jiang
To evaluate the performance of automatic metrics in unified AIGV evaluation, we introduce a benchmark called UVE-Bench.
no code implementations • 13 Mar 2025 • Hao He, Ceyuan Yang, Shanchuan Lin, Yinghao Xu, Meng Wei, Liangke Gui, Qi Zhao, Gordon Wetzstein, Lu Jiang, Hongsheng Li
This paper introduces CameraCtrl II, a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model.
no code implementations • 14 Jan 2025 • Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, Lu Jiang
Diffusion models are widely used for image and video generation, but their iterative generation process is slow and expensive.
no code implementations • 10 Jan 2025 • Junfei Xiao, Feng Cheng, Lu Qi, Liangke Gui, Jiepeng Cen, Zhibei Ma, Alan Yuille, Lu Jiang
We further introduce a Long Narrative Video Director to enhance both visual and semantic coherence in generated videos and emphasize the role of aligning visual embeddings to achieve improved overall video quality.
no code implementations • 2 Jan 2025 • Jianyi Wang, Zhijie Lin, Meng Wei, Yang Zhao, Ceyuan Yang, Chen Change Loy, Lu Jiang
Video restoration poses non-trivial challenges in maintaining fidelity while recovering temporally consistent details from unknown degradations in the wild.
no code implementations • 1 Jan 2025 • Libin Lan, Lu Jiang, Tianshu Yu, Xiaojuan Liu, Zhongshi He
Based on this, we propose a transformer-like architecture, named FullTransNet, which has a full encoder-decoder structure with local-global sparse attention for video summarization.
no code implementations • 27 Dec 2024 • Weichen Yu, Ziyan Yang, Shanchuan Lin, Qi Zhao, Jianyi Wang, Liangke Gui, Matt Fredrikson, Lu Jiang
In text-to-image (T2I) generation, a prevalent training technique involves utilizing Vision Language Models (VLMs) for image re-captioning.
no code implementations • 15 Nov 2024 • Qi Hao, Runchang Liang, Yue Gao, Hao Dong, Wei Fan, Lu Jiang, Pengyang Wang
Variable Subset Forecasting (VSF) refers to a unique scenario in multivariate time series forecasting, where available variables in the inference phase are only a subset of the variables in the training phase.
no code implementations • 14 Oct 2024 • Pengzhou Cai, Lu Jiang, Yanxin Li, Xiaojuan Liu, Libin Lan
To address this issue, we introduce a dynamic, query-aware sparse attention mechanism for ultrasound image segmentation.
no code implementations • 17 Sep 2024 • Jieyun Bai, ZiHao Zhou, Zhanhong Ou, Gregor Koehler, Raphael Stock, Klaus Maier-Hein, Marawan Elbatel, Robert Martí, Xiaomeng Li, Yaoyang Qiu, Panjie Gou, Gongping Chen, Lei Zhao, Jianxun Zhang, Yu Dai, Fangyijie Wang, Guénolé Silvestre, Kathleen Curran, Hongkun Sun, Jing Xu, Pengzhou Cai, Lu Jiang, Libin Lan, Dong Ni, Mei Zhong, Gaowen Chen, Víctor M. Campello, Yaosheng Lu, Karim Lekadir
This challenge aimed to enhance the development of automatic segmentation algorithms at an international scale, providing the largest dataset to date with 5,101 intrapartum ultrasound images collected from two ultrasound machines across three hospitals from two institutions.
1 code implementation • 4 Jun 2024 • Jinghan Zhang, Xiting Wang, Weijieying Ren, Lu Jiang, Dongjie Wang, Kunpeng Liu
To address these limitations, we introduce the Retrieval Augmented Thought Tree (RATT), a novel thought structure that considers both overall logical soundness and factual correctness at each step of the thinking process.
no code implementations • 22 May 2024 • Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli
Here, we propose a novel training approach to effectively learn arbitrary conditional distributions in the audiovisual space. Our key contribution lies in how we parameterize the diffusion timestep in the forward diffusion process.
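A minimal numpy sketch of one plausible reading of that sentence, assuming the forward-diffusion timestep is drawn independently per modality so that a clean (t = 0) modality acts as conditioning; the schedule, shapes, and modality features below are illustrative stand-ins, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 2e-2, T)            # illustrative linear schedule
alpha_bar = np.cumprod(1.0 - betas)

def forward_diffuse(x, t):
    """Standard DDPM forward process q(x_t | x_0) for one modality."""
    if t == 0:
        return x                               # t = 0 keeps the modality clean
    noise = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar[t - 1]) * x + np.sqrt(1.0 - alpha_bar[t - 1]) * noise

# Toy "video" and "audio" features for one training example (hypothetical shapes).
video = rng.standard_normal((4, 8))
audio = rng.standard_normal((4, 2))

# Sampling an independent timestep per modality covers every conditional:
# (t_v > 0, t_a = 0) trains audio-to-video, (t_v = 0, t_a > 0) video-to-audio,
# (both > 0) joint generation, all with one network.
t_video = int(rng.integers(0, T + 1))
t_audio = int(rng.integers(0, T + 1))
noisy_video = forward_diffuse(video, t_video)
noisy_audio = forward_diffuse(audio, t_audio)
print(t_video, t_audio, noisy_video.shape, noisy_audio.shape)
```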
no code implementations • 14 May 2024 • Xinhao Zhang, Zaitian Wang, Lu Jiang, Wanfu Gao, Pengfei Wang, Kunpeng Liu
In this paper, we propose a novel feature weighting method to address the limitation of existing feature processing methods for tabular data.
no code implementations • 13 Mar 2024 • Lu Jiang, Qi Wang, Yuhang Chang, Jianing Song, Haoyue Fu, Xiaochun Yang
Traditional machine learning methods tend to favor the majority class, leaving minority-class information underrepresented in the model.
1 code implementation • 1 Jan 2024 • Libin Lan, Pengzhou Cai, Lu Jiang, Xiaojuan Liu, Yongmei Li, Yudong Zhang
Specifically, BRAU-Net++ uses bi-level routing attention as the core building block to design our u-shaped encoder-decoder structure, in which both encoder and decoder are hierarchically constructed, so as to learn global semantic information while reducing computational complexity.
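As a rough illustration of the u-shaped encoder-decoder described above, here is a toy skeleton in which `bra_block` is a placeholder for a bi-level routing attention block; the downsampling, depth, and skip-connection wiring are assumptions for illustration only, not BRAU-Net++'s actual layers.

```python
import numpy as np

def bra_block(x):
    """Stand-in for a bi-level routing attention block (identity here)."""
    return x

def down(x):
    return x[:, ::2, ::2]                            # toy 2x spatial downsampling

def up(x):
    return x.repeat(2, axis=1).repeat(2, axis=2)     # toy 2x spatial upsampling

def u_shaped(x, depth=3):
    """Hierarchical encoder-decoder with skip connections, as in the abstract."""
    skips = []
    for _ in range(depth):                           # encoder: attend, then shrink
        x = bra_block(x)
        skips.append(x)
        x = down(x)
    x = bra_block(x)                                 # bottleneck
    for skip in reversed(skips):                     # decoder: grow, fuse skip
        x = up(x) + skip
        x = bra_block(x)
    return x

out = u_shaped(np.zeros((1, 64, 64)))
print(out.shape)                                     # (1, 64, 64)
```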
no code implementations • 25 Dec 2023 • Zhaofan Zhang, Yanan Xiao, Lu Jiang, Dingqi Yang, Minghao Yin, Pengyang Wang
In the realm of human mobility, the decision-making process for selecting the next-visit location is intricately influenced by a trade-off between spatial and temporal constraints, which reflect individual needs and preferences.
1 code implementation • 21 Dec 2023 • Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, Ming-Hsuan Yang, Irfan Essa, Huisheng Wang, David A. Ross, Bryan Seybold, Lu Jiang
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals.
Ranked #4 on Text-to-Video Generation on MSR-VTT
no code implementations • 11 Dec 2023 • Agrim Gupta, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, José Lezama
We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling.
Ranked #1 on Video Generation on UCF-101 (using extra training data)
no code implementations • 5 Dec 2023 • Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang
To achieve detailed control, we propose a unified framework to jointly inject control signals into the existing text-to-video model.
1 code implementation • CVPR 2024 • Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang
Language has emerged as a natural interface for image editing.
3 code implementations • 9 Oct 2023 • Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation.
Ranked #2 on Video Generation on Kinetics-600 (12 frames, 64x64)
no code implementations • 30 Sep 2023 • Pengzhou Cai, Lu Jiang, Yanxin Li, Libin Lan
In this paper, we propose a method, named BRAU-Net, to solve the pubic symphysis-fetal head segmentation task.
1 code implementation • 6 Jul 2023 • Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong
We evaluate the video understanding capabilities of existing foundation models (FMs) using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring an FM for downstream tasks.
no code implementations • NeurIPS 2023 • Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang
In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos.
4 code implementations • 1 Jun 2023 • Kihyuk Sohn, Nataniel Ruiz, Kimin Lee, Daniel Castro Chin, Irina Blok, Huiwen Chang, Jarred Barber, Lu Jiang, Glenn Entis, Yuanzhen Li, Yuan Hao, Irfan Essa, Michael Rubinstein, Dilip Krishnan
Pre-trained large text-to-image models synthesize impressive images with an appropriate use of text prompts.
1 code implementation • 1 Jun 2023 • Kihyuk Sohn, Albert Shaw, Yuan Hao, Han Zhang, Luisa Polania, Huiwen Chang, Lu Jiang, Irfan Essa
We study domain-adaptive image synthesis: teaching pretrained image generative models a new style or concept from as few as one image so they can synthesize novel images, in order to better understand compositional image synthesis.
1 code implementation • 7 Feb 2023 • Yanzhe Zhang, Lu Jiang, Greg Turk, Diyi Yang
Text-to-image models, which can generate high-quality images based on textual input, have recently enabled various content-creation tools.
5 code implementations • 2 Jan 2023 • Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan
Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding.
Ranked #1 on Text-to-Image Generation on MS-COCO (FID metric)
no code implementations • 1 Jan 2023 • Lu Jiang, Yuanhan Li, Na Luo, Jianan Wang, Qiao Ning
Third, we use the points of interest (POI) around each rental house to generate a variety of spatial network graphs, and learn embeddings of these networks to obtain the spatial feature embedding.
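A small sketch of the kind of pipeline this sentence describes, with hypothetical coordinates: connect each rental house to nearby POIs in a spatial graph and take a spectral embedding of that graph as the spatial feature. The paper's actual graph construction and embedding method may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
n_houses, n_pois, k = 5, 20, 3
coords = rng.uniform(0, 1, size=(n_houses + n_pois, 2))   # hypothetical locations

# Adjacency: connect every house to its k nearest POIs.
A = np.zeros((n_houses + n_pois, n_houses + n_pois))
for h in range(n_houses):
    d = np.linalg.norm(coords[n_houses:] - coords[h], axis=1)
    for p in np.argsort(d)[:k]:
        A[h, n_houses + p] = A[n_houses + p, h] = 1.0

# Spectral embedding: eigenvectors of the normalized graph Laplacian.
deg = A.sum(1)
deg[deg == 0] = 1.0
L = np.eye(len(A)) - A / np.sqrt(np.outer(deg, deg))
eigvals, eigvecs = np.linalg.eigh(L)
spatial_embedding = eigvecs[:, 1:9]            # 8-d embedding per node
print(spatial_embedding[:n_houses].shape)      # (5, 8) house features
```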
no code implementations • 1 Jan 2023 • Lu Jiang, Yibin Wang, Jianan Wang, Pengyang Wang, Minghao Yin
To tackle these challenges, we formulate the problem as a course representation learning task and develop an Information-aware Graph Representation Learning (IaGRL) framework for multi-view MOOC quality evaluation.
no code implementations • 24 Dec 2022 • Yanan Xiao, Minyu Liu, Zichen Zhang, Lu Jiang, Minghao Yin, Jianan Wang
We propose to formulate the problem as a continuous reinforcement learning task, where the agent is the next flow value predictor, the action is the next time-series flow value in the sensor, and the environment state is a dynamically fused representation of the sensor and transportation network.
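A compact sketch of that formulation with toy stand-ins: the action is the next flow value, the state fuses sensor history with a (stubbed) transportation-network representation, and the reward penalizes prediction error. The linear "agent" and its update below are illustrative, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(2)
flows = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * rng.standard_normal(200)
window = 12

def fuse_state(history, network_context):
    """State: sensor history fused with a transportation-network representation."""
    return np.concatenate([history, network_context])

network_context = rng.standard_normal(4)        # stub for the learned graph feature
w = np.zeros(window + 4)                        # linear "agent" for illustration
lr = 0.01

for t in range(window, len(flows) - 1):
    state = fuse_state(flows[t - window:t], network_context)
    action = w @ state                          # action = predicted next flow value
    reward = -(action - flows[t]) ** 2          # environment returns negative error
    w += lr * 2 * (flows[t] - action) * state   # one-step improvement on the reward

print("final reward:", reward)
```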
1 code implementation • CVPR 2023 • Lijun Yu, Yong Cheng, Kihyuk Sohn, José Lezama, Han Zhang, Huiwen Chang, Alexander G. Hauptmann, Ming-Hsuan Yang, Yuan Hao, Irfan Essa, Lu Jiang
We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model.
Ranked #1 on Video Prediction on Something-Something V2
1 code implementation • CVPR 2023 • Kihyuk Sohn, Yuan Hao, José Lezama, Luisa Polania, Huiwen Chang, Han Zhang, Irfan Essa, Lu Jiang
We base our framework on state-of-the-art generative vision transformers that represent an image as a sequence of visual tokens fed to autoregressive or non-autoregressive transformers.
1 code implementation • 9 Sep 2022 • José Lezama, Huiwen Chang, Lu Jiang, Irfan Essa
Given a masked-and-reconstructed real image, the Token-Critic model is trained to distinguish which visual tokens belong to the original image and which were sampled by the generative transformer.
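A minimal sketch of the training signal described above, with random stand-ins for the VQ tokenizer and the generative transformer: mask part of a real token sequence, fill the masked slots with sampled tokens, and train the critic to say which tokens are original. The shapes and the toy per-token critic are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab, seq_len = 1024, 256

real_tokens = rng.integers(0, vocab, seq_len)          # stand-in for VQ tokens
mask = rng.uniform(size=seq_len) < 0.5                 # positions to resample
fake_tokens = rng.integers(0, vocab, seq_len)          # stand-in for generator output
mixed = np.where(mask, fake_tokens, real_tokens)

labels = (~mask).astype(float)                         # 1 = original, 0 = sampled

# Toy critic: per-token logistic regression over token ids.
W = np.zeros(vocab)
logits = W[mixed]
probs = 1.0 / (1.0 + np.exp(-logits))
bce = -(labels * np.log(probs + 1e-9) + (1 - labels) * np.log(1 - probs + 1e-9)).mean()
print("token-critic BCE:", bce)

# At sampling time, critic scores (not raw generator confidences) would decide
# which tokens to keep and which to re-mask for the next refinement step.
```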
6 code implementations • CVPR 2022 • Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman
At inference time, the model begins with generating all tokens of an image simultaneously, and then refines the image iteratively conditioned on the previous generation.
Ranked #2 on Text-to-Image Generation on LHQC
1 code implementation • 9 Dec 2021 • Xiang Kong, Lu Jiang, Huiwen Chang, Han Zhang, Yuan Hao, Haifeng Gong, Irfan Essa
During inference, BLT first generates a draft layout from the input and then iteratively refines it into a high-quality layout by masking out low-confident attributes.
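A toy sketch of the draft-then-refine loop that sentence describes, with random stand-ins for BLT itself; representing a layout as per-element attributes and re-masking the single lowest-confidence attribute each round are illustrative simplifications.

```python
import numpy as np

rng = np.random.default_rng(5)
n_elems = 4
attrs = ["category", "x", "y", "width", "height"]

# Draft layout plus one confidence score per attribute of each element (stubbed).
layout = {a: rng.integers(0, 32, n_elems) for a in attrs}
conf = {a: rng.uniform(size=n_elems) for a in attrs}

for _ in range(3):                               # a few refinement rounds
    # Re-mask the lowest-confidence attribute value and regenerate it.
    a, i = min(((a, i) for a in attrs for i in range(n_elems)),
               key=lambda ai: conf[ai[0]][ai[1]])
    layout[a][i] = rng.integers(0, 32)           # stand-in for the model's resample
    conf[a][i] = rng.uniform(0.5, 1.0)           # assume confidence improves

print({a: layout[a].tolist() for a in attrs})
```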
1 code implementation • CVPR 2022 • Charles Herrmann, Kyle Sargent, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun
In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance.
Ranked #9 on Domain Generalization on ImageNet-C (using extra training data)
1 code implementation • ICLR 2022 • Chengzhi Mao, Lu Jiang, Mostafa Dehghani, Carl Vondrick, Rahul Sukthankar, Irfan Essa
Vision Transformer (ViT) is emerging as the state-of-the-art architecture for image recognition.
Ranked #3 on Domain Generalization on Stylized-ImageNet
3 code implementations • ICLR 2022 • Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu
Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.
Ranked #32 on Image Generation on CIFAR-10
no code implementations • 8 Jun 2021 • Yong Cheng, Wei Wang, Lu Jiang, Wolfgang Macherey
Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
1 code implementation • 30 Apr 2021 • Youjiang Xu, Linchao Zhu, Lu Jiang, Yi Yang
It has been shown that deep neural networks are prone to overfitting on biased training data.
1 code implementation • CVPR 2021 • Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, Weilong Yang
Recent years have witnessed the rapid progress of generative adversarial networks (GANs).
Ranked #1 on Image Generation on CIFAR-100
no code implementations • 19 Sep 2020 • Xiaosa Zhao, Kunpeng Liu, Wei Fan, Lu Jiang, Xiaowei Zhao, Minghao Yin, Yanjie Fu
To address this question, we develop a single-agent reinforced feature selection approach integrated with a restructured choice strategy.
1 code implementation • 11 Aug 2020 • Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa
In recent years, text-guided image manipulation has gained increasing attention in the multimedia and computer vision community.
no code implementations • ECCV 2020 • Hung-Yu Tseng, Hsin-Ying Lee, Lu Jiang, Ming-Hsuan Yang, Weilong Yang
Image generation from scene description is a cornerstone technique for controlled generation, benefiting applications such as content creation and image editing.
no code implementations • ACL 2020 • Yong Cheng, Lu Jiang, Wolfgang Macherey, Jacob Eisenstein
In this paper, we propose a new adversarial augmentation method for Neural Machine Translation (NMT).
Ranked #22 on Machine Translation on WMT2014 English-German
1 code implementation • 4 Apr 2020 • Junwei Liang, Lu Jiang, Alexander Hauptmann
We refer to our method as SimAug.
Ranked #2 on Trajectory Prediction on ActEV
no code implementations • 25 Dec 2019 • Yijun Li, Lu Jiang, Ming-Hsuan Yang
Image extrapolation aims at expanding the narrow field of view of a given image patch.
no code implementations • ECCV 2020 • Hsin-Ying Lee, Lu Jiang, Irfan Essa, Phuong B Le, Haifeng Gong, Ming-Hsuan Yang, Weilong Yang
The first module predicts a graph with complete relations from a graph with user-specified relations.
1 code implementation • CVPR 2020 • Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann
The first contribution is a new dataset, created in a realistic 3D simulator, which is based on real world trajectory data, and then extrapolated by human annotators to achieve different latent goals.
Ranked #1 on Multi-future Trajectory Prediction on ForkingPaths
3 code implementations • ICML 2020 • Lu Jiang, Di Huang, Mason Liu, Weilong Yang
Due to the lack of suitable datasets, previous research has only examined deep learning on controlled synthetic label noise, and real-world label noise has never been studied in a controlled setting.
Ranked #12 on Image Classification on WebVision-1000
4 code implementations • 31 Oct 2019 • Curtis G. Northcutt, Lu Jiang, Isaac L. Chuang
Confident learning (CL) is an alternative approach which focuses instead on label quality by characterizing and identifying label errors in datasets, based on the principles of pruning noisy data, counting with probabilistic thresholds to estimate noise, and ranking examples to train with confidence.
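A small numpy sketch of the counting step named in that sentence: per-class probability thresholds define a "confident joint" whose off-diagonal entries estimate label errors to prune. The data here are synthetic and the code only illustrates the mechanics.

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 1000, 3
labels = rng.integers(0, K, n)                       # possibly noisy given labels
probs = rng.dirichlet(np.ones(K), n)                 # stand-in model probabilities

# Per-class threshold: mean self-confidence of examples carrying that label.
thresh = np.array([probs[labels == j, j].mean() for j in range(K)])

# Confident joint: count examples whose predicted class clears its threshold.
C = np.zeros((K, K), dtype=int)                      # rows: given label, cols: guessed true label
for i in range(n):
    above = np.where(probs[i] >= thresh)[0]
    if len(above):
        C[labels[i], above[probs[i, above].argmax()]] += 1

# Off-diagonal mass estimates label errors; those examples would be pruned.
print(C, "estimated errors:", C.sum() - np.trace(C))
```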
no code implementations • 25 Sep 2019 • Lu Jiang, Di Huang, Weilong Yang
Performing controlled experiments on noisy data is essential in thoroughly understanding deep learning across a spectrum of noise levels.
no code implementations • ICLR 2020 • Alejandro Newell, Lu Jiang, Chong Wang, Li-Jia Li, Jia Deng
Multi-task learning holds the promise of requiring less data, fewer parameters, and less time than training separate models.
no code implementations • ACL 2019 • Yong Cheng, Lu Jiang, Wolfgang Macherey
Neural machine translation (NMT) often suffers from the vulnerability to noisy perturbations in the input.
no code implementations • 4 Jun 2019 • Peng Li, Jiabin Zhang, Zheng Zhu, Yanwei Li, Lu Jiang, Guan Huang
Multi-target Multi-camera Tracking (MTMCT) aims to extract the trajectories from videos captured by a set of cameras.
3 code implementations • ICLR 2019 • Yunbo Wang, Lu Jiang, Ming-Hsuan Yang, Li-Jia Li, Mingsheng Long, Li Fei-Fei
We first evaluate the E3D-LSTM network on widely-used future video prediction datasets and achieve the state-of-the-art performance.
Ranked #1 on Video Prediction on KTH (Cond metric)
no code implementations • 8 Apr 2019 • Yu Wu, Lu Jiang, Yi Yang
In this paper, we empirically study this problem and introduce 1) a simple yet effective baseline that achieves promising performance; 2) an easier and practical setting for EmbodiedQA where an agent has a chance to adapt the trained model to a new environment before it actually answers users' questions.
2 code implementations • 2 Mar 2019 • Nam Vo, Lu Jiang, James Hays
In this work we show how one can learn transformations with no training examples by learning them on another domain and then transfer to the target domain.
2 code implementations • CVPR 2019 • Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander Hauptmann, Li Fei-Fei
To facilitate the training, the network is learned with an auxiliary task of predicting future location in which the activity will happen.
Ranked #1 on Activity Prediction on ActEV
2 code implementations • CVPR 2019 • Guoliang Kang, Lu Jiang, Yi Yang, Alexander G. Hauptmann
Unsupervised Domain Adaptation (UDA) makes predictions for the target domain data while manual annotations are only available in the source domain.
Ranked #9 on Domain Adaptation on Office-31
4 code implementations • CVPR 2019 • Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays
In this paper, we study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.
Ranked #2 on Image Retrieval with Multi-Modal Query on MIT-States
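A skeleton of the task setup described in the entry above, using feature addition as a deliberately simple composition baseline (the paper proposes a learned composition function); all features below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(8)
d, n_targets = 16, 100

img_feat = rng.standard_normal(d)                  # query image feature (stubbed)
txt_feat = rng.standard_normal(d)                  # modification text feature (stubbed)

# Simple composition baseline: add the text "edit" to the image feature.
query = img_feat + txt_feat
query /= np.linalg.norm(query)

gallery = rng.standard_normal((n_targets, d))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

scores = gallery @ query                           # cosine-similarity retrieval
print("top-5 retrieved indices:", np.argsort(-scores)[:5])
```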
1 code implementation • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Yannis Kalantidis, Li-Jia Li, Alexander Hauptmann
In addition to a text answer, a few grounding photos are also given to justify the answer.
Ranked #1 on Memex Question Answering on MemexQA
2 code implementations • CVPR 2018 • Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander Hauptmann
Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering.
Ranked #1 on Memex Question Answering on MemexQA
1 code implementation • 11 Apr 2018 • Yu Wu, Linchao Zhu, Lu Jiang, Yi Yang
Thus, the sequence model can be decoupled from the novel object descriptions.
1 code implementation • ICML 2018 • Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei
Recent deep networks are capable of memorizing the entire data even when the labels are completely random.
Ranked #16 on Image Classification on WebVision-1000
1 code implementation • ECCV 2018 • Zelun Luo, Jun-Ting Hsieh, Lu Jiang, Juan Carlos Niebles, Li Fei-Fei
We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available.
1 code implementation • 4 Aug 2017 • Lu Jiang, Junwei Liang, Liangliang Cao, Yannis Kalantidis, Sachin Farfade, Alexander Hauptmann
This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection.
no code implementations • 5 Jul 2017 • Po-Yao Huang, Ye Yuan, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann
We report on CMU Informedia Lab's system used in Google's YouTube 8 Million Video Understanding Challenge.
no code implementations • 12 Aug 2016 • Mengyi Liu, Lu Jiang, Shiguang Shan, Alexander G. Hauptmann
Multimedia event detection has been receiving increasing attention in recent years.
1 code implementation • 16 Jul 2016 • Junwei Liang, Lu Jiang, Deyu Meng, Alexander Hauptmann
Learning video concept detectors automatically from big but noisy web data with no additional manual annotations is a novel but challenging problem in the multimedia and machine learning communities.
no code implementations • 17 Jun 2016 • Shoou-I Yu, Yi Yang, Zhongwen Xu, Shicheng Xu, Deyu Meng, Zexi Mao, Zhigang Ma, Ming Lin, Xuanchong Li, Huan Li, Zhenzhong Lan, Lu Jiang, Alexander G. Hauptmann, Chuang Gan, Xingzhong Du, Xiaojun Chang
The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search.
no code implementations • ICCV 2015 • Dingwen Zhang, Deyu Meng, Chao Li, Lu Jiang, Qian Zhao, Junwei Han
As an interesting and emerging topic, co-saliency detection aims at simultaneously extracting common salient objects in a group of images.
no code implementations • 19 Nov 2015 • Deyu Meng, Qian Zhao, Lu Jiang
Self-paced learning (SPL) is a recently proposed methodology designed by simulating the learning principle of humans and animals.
no code implementations • NeurIPS 2014 • Lu Jiang, Deyu Meng, Shoou-I Yu, Zhenzhong Lan, Shiguang Shan, Alexander Hauptmann
Self-paced learning (SPL) is a recently proposed learning regime inspired by the learning process of humans and animals that gradually incorporates easy to more complex samples into training.
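A minimal sketch of that easy-to-hard regime using the classic hard SPL weighting v_i = 1[loss_i < λ] on a toy least-squares problem; the growth schedule for λ is illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200)
y[:20] += 5.0                                    # a few "hard" corrupted samples

w = np.zeros(5)
lam = 0.5                                        # age parameter: loss threshold
for epoch in range(20):
    losses = (X @ w - y) ** 2
    v = (losses < lam).astype(float)             # hard SPL weights: easy samples only
    if v.sum() > 0:                              # weighted least-squares update
        Xv = X * v[:, None]
        w = np.linalg.lstsq(Xv, v * y, rcond=None)[0]
    lam *= 1.5                                   # gradually admit harder samples

print("learned w:", np.round(w, 2), "selected:", int(v.sum()))
```

Growing λ slowly is what realizes the easy-to-hard curriculum: corrupted samples enter training only once the model is already well fit to the clean ones.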