no code implementations • ICCV 2017 • SouYoung Jin, Hang Su, Chris Stauffer, Erik Learned-Miller
We introduce a novel verification method, rank-1 counts verification, that has this property, and use it in a link-based clustering scheme.
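As a rough illustration of the idea, the sketch below shows how a rank-1-counts style verification score might be computed: for each feature dimension, the candidate face is counted only if it is closer to the query than every face in a reference gallery, and a high count suggests the two faces should be linked. The feature representation, gallery, and threshold here are placeholders, not the authors' exact construction.

```python
import numpy as np

def rank1_counts(feat_a, feat_b, gallery):
    """Count the feature dimensions in which feat_b is the nearest neighbour
    of feat_a, compared against every gallery face in that same dimension.

    feat_a, feat_b : (D,) feature vectors of the two faces being verified
    gallery        : (N, D) features of reference faces (assumed distinct identities)
    """
    d_ab = np.abs(feat_a - feat_b)            # per-dimension distance A -> B, shape (D,)
    d_ag = np.abs(feat_a[None, :] - gallery)  # per-dimension distance A -> each gallery face, (N, D)
    # B is "rank 1" in a dimension if it beats every gallery face there.
    return int(np.sum(d_ab < d_ag.min(axis=0)))

def same_identity(feat_a, feat_b, gallery, threshold):
    # A simple link decision: a high rank-1 count suggests the same person.
    return rank1_counts(feat_a, feat_b, gallery) >= threshold
```

Pairs whose count clears the threshold could then be linked, with connected components of linked pairs giving the clusters, in the spirit of the link-based scheme the abstract mentions.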
no code implementations • ECCV 2018 • SouYoung Jin, Aruni RoyChowdhury, Huaizu Jiang, Ashish Singh, Aditya Prasad, Deep Chakraborty, Erik Learned-Miller
In this work, we show how large numbers of hard negatives can be obtained automatically by analyzing the output of a trained detector on video sequences.
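The excerpt does not spell out the mining criterion; the sketch below illustrates one plausible reading, in which detections that never join a temporally consistent track are treated as likely false positives and kept as hard negatives. The greedy IoU-based association and the thresholds are illustrative assumptions, not the paper's exact procedure.

```python
def mine_hard_negatives(detections_per_frame, iou, min_track_len=5, iou_thr=0.5):
    """Flag detections that never join a temporally consistent track.

    detections_per_frame : list over frames, each a list of boxes (x1, y1, x2, y2)
    iou                  : function computing IoU between two boxes
    Returns a list of (frame_idx, box) pairs kept as candidate hard negatives.
    """
    # Greedy tracker: link a detection to a track whose last box overlaps it
    # in the immediately preceding frame; otherwise start a new track.
    tracks = []  # each track is a list of (frame_idx, box)
    for t, boxes in enumerate(detections_per_frame):
        for box in boxes:
            for track in tracks:
                last_t, last_box = track[-1]
                if last_t == t - 1 and iou(last_box, box) > iou_thr:
                    track.append((t, box))
                    break
            else:
                tracks.append([(t, box)])
    # Short, isolated tracks are likely detector flickers -> hard negatives.
    return [d for track in tracks if len(track) < min_track_len for d in track]
```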
1 code implementation • CVPR 2019 • Aruni RoyChowdhury, Prithvijit Chakrabarty, Ashish Singh, SouYoung Jin, Huaizu Jiang, Liangliang Cao, Erik Learned-Miller
Our results demonstrate the usefulness of incorporating hard examples obtained from tracking and the advantage of using soft labels via a distillation loss rather than hard labels, and show promising performance as a simple method for unsupervised domain adaptation of object detectors, with minimal dependence on hyper-parameters.
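For context, soft-label distillation replaces the one-hot target with the teacher's softened output distribution. The snippet below is a standard PyTorch-style distillation loss of this kind, given here as a generic sketch rather than the exact loss used in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: match the student's distribution to the
    teacher's softened distribution instead of a one-hot (hard) label."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between teacher and student, scaled by T^2 as is customary.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2
```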
no code implementations • CVPR 2021 • Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva
With this in mind, the descriptions people generate for videos of different dynamic events can greatly improve our understanding of the key information of interest in each video.
no code implementations • ACL 2022 • Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass
Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.
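A common way to obtain such a shared embedding space is a symmetric contrastive (InfoNCE-style) objective over paired samples. The sketch below shows this for video-text pairs, with audio treated analogously; it is a generic formulation, not necessarily the objective used in this work.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss pulling paired video/text embeddings
    together in a shared space while pushing apart mismatched pairs."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature            # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))
```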
no code implementations • CVPR 2023 • Camilo L. Fosco, SouYoung Jin, Emilie Josephs, Aude Oliva
We show that including information from the ETM during training improves action recognition and anticipation performance on various egocentric video datasets.
no code implementations • 11 Oct 2023 • Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim
We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings.
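Purely as an illustration of language serving as the perceptual interface, the sketch below converts a detected-object observation into a short caption and lets a text-only policy choose the next action. The captioning scheme, prompt format, and the language_model callable are hypothetical placeholders, not the system described in the paper.

```python
def describe_observation(objects_in_view):
    """Turn a panoramic observation into a short textual description.
    `objects_in_view` maps a heading ('left', 'ahead', ...) to detected object names."""
    parts = [f"{', '.join(objs)} to the {direction}"
             for direction, objs in objects_in_view.items() if objs]
    return "You see " + "; ".join(parts) + "."

def choose_action(language_model, instruction, history, observation_text):
    # The navigation policy reads only text: the instruction, the trajectory
    # so far, and the described observation.
    prompt = (f"Instruction: {instruction}\nHistory: {history}\n"
              f"Observation: {observation_text}\nNext action:")
    return language_model(prompt)
```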
1 code implementation • NeurIPS 2023 • Howard Zhong, Samarth Mishra, Donghyun Kim, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Aude Oliva, Rogerio Feris
To this end, we present, for the first time, a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
no code implementations • 9 Dec 2023 • Xingjian Diao, Ming Cheng, Wayner Barrios, SouYoung Jin
This result highlights our model's ability to bridge first-person statements and dynamic face generation, and provides useful guidance for future work.