no code implementations • 16 Jul 2024 • Jiahao Zhang, Frederic Z. Zhang, Cristian Rodriguez, Yizhak Ben-Shabat, Anoop Cherian, Stephen Gould
In this paper, we tackle this issue by simultaneously grounding a sequence of step diagrams.
1 code implementation • CVPR 2024 • Ming Xu, Stephen Gould
We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
1 code implementation • 14 Mar 2024 • Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould
This study uses linear probing to shed light on the hidden knowledge at the output layers of LVLMs.
no code implementations • 12 Feb 2024 • Weijie Tu, Weijian Deng, Dylan Campbell, Stephen Gould, Tom Gedeon
Vision-Language Models (VLMs) have emerged as the dominant approach for zero-shot recognition, adept at handling diverse scenarios and significant distribution changes.
1 code implementation • 1 Feb 2024 • Qinyu Zhao, Ming Xu, Kartik Gupta, Akshay Asthana, Liang Zheng, Stephen Gould
Feature shaping refers to a family of methods that exhibit state-of-the-art performance for out-of-distribution (OOD) detection.
no code implementations • CVPR 2024 • Weijian Deng, Dylan Campbell, Chunyi Sun, Shubham Kanitkar, Matthew E. Shaffer, Stephen Gould
Neural implicit surface reconstruction leveraging volume rendering has led to significant advances in multi-view reconstruction.
1 code implementation • CVPR 2024 • Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Dylan Campbell, Stephen Gould
We circumvent this problem by introducing guiding points and use them to steer the optimization towards the true shape via small incremental changes for which the loss formulation has a good descent direction.
no code implementations • CVPR 2024 • Yunzhong Hou, Stephen Gould, Liang Zheng
Inspired by this, we study selective view pipelining for efficient multi-view understanding, which breaks the computation of multiple views into steps and computes only the most helpful views/steps in parallel for the best efficiency.
no code implementations • 29 Nov 2023 • Hamed Damirchi, Cristian Rodríguez-Opazo, Ehsan Abbasnejad, Damien Teney, Javen Qinfeng Shi, Stephen Gould, Anton Van Den Hengel
Large pre-trained models can dramatically reduce the amount of task-specific data required to solve a problem, but they often fail to capture domain-specific nuances out of the box.
no code implementations • 19 Oct 2023 • Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, Stephen Gould
Our work highlights the potential of LLMs in 3D modeling, offering a basic framework for future advancements in scene generation and animation.
1 code implementation • ICCV 2023 • Frederic Z. Zhang, Yuhui Yuan, Dylan Campbell, Zhuoyao Zhong, Stephen Gould
Recently, the DETR framework has emerged as the dominant approach for human-object interaction (HOI) research.
Ranked #2 on Human-Object Interaction Detection on HICO-DET
1 code implementation • ICCV 2023 • Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao
Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.
1 code implementation • ICCV 2023 • Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan
Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot.
1 code implementation • 26 Jun 2023 • Zhiwei Xu, Hao Wang, Yanbin Liu, Stephen Gould
We explore two differentiable deep declarative layers, namely least squares on sphere (LESS) and implicit eigen decomposition (IED), for learning the principal matrix features (PMaF).
no code implementations • 24 Jun 2023 • Stephen Gould, Ming Xu, Zhiwei Xu, Yanbin Liu
We explore conditions for when the gradient of a deep declarative node can be approximated by ignoring constraint terms and still result in a descent direction for the global loss function.
2 code implementations • 25 May 2023 • Zheyuan Liu, Weixuan Sun, Damien Teney, Stephen Gould
An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set.
Ranked #3 on Image Retrieval on CIRR
no code implementations • 18 Apr 2023 • Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould, Robert Mahony
It is composed of a visual servoing module that reaches and grasps assembly parts in an unstructured multi-instance and dynamic environment, an action recognition module that performs human action prediction for implicit communication, and a visual handover module that uses the perceptual understanding of human behaviour to produce an intuitive and efficient collaborative assembly experience.
no code implementations • 30 Mar 2023 • Thalaiyasingam Ajanthan, Matt Ma, Anton Van Den Hengel, Stephen Gould
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration as the learnable parameters are being updated.
1 code implementation • 29 Mar 2023 • Zheyuan Liu, Weixuan Sun, Yicong Hong, Damien Teney, Stephen Gould
Composed image retrieval searches for a target image based on a multi-modal user query comprised of a reference image and modification text describing the desired changes.
Ranked #8 on Image Retrieval on Fashion IQ
1 code implementation • CVPR 2023 • Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould
In this paper, we consider a novel setting where such an alignment is between (i) instruction steps that are depicted as assembly diagrams (commonly seen in IKEA assembly manuals) and (ii) video segments from in-the-wild videos; these videos comprise an enactment of the assembly actions in the real world.
1 code implementation • 19 Mar 2023 • Ming Xu, Sourav Garg, Michael Milford, Stephen Gould
An interesting byproduct of this formulation is that DecDTW outputs the optimal warping path between two time series as opposed to a soft approximation, recoverable from Soft-DTW.
1 code implementation • CVPR 2024 • Yizhak Ben-Shabat, Oren Shrout, Stephen Gould
We propose a novel method for 3D point cloud action recognition.
1 code implementation • 10 Mar 2023 • Yunzhong Hou, Stephen Gould, Liang Zheng
Multiview camera setups have proven useful in many computer vision applications for reducing ambiguities, mitigating occlusions, and increasing field-of-view coverage.
no code implementations • 2 Feb 2023 • Weijian Deng, Yumin Suh, Stephen Gould, Liang Zheng
This work aims to assess how well a model performs under distribution shifts without using labels.
1 code implementation • CVPR 2023 • Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Stephen Gould
We propose a two-step approach, OG-INR, where we (1) construct a discrete octree and label what is inside and outside, and (2) optimize for a continuous and high-fidelity shape using an INR that is initially guided by the octree's labelling.
no code implementations • ICCV 2023 • Peixia Li, Pulak Purkait, Thalaiyasingam Ajanthan, Majid Abdolshah, Ravi Garg, Hisham Husain, Chenchen Xu, Stephen Gould, Wanli Ouyang, Anton Van Den Hengel
Each learning group consists of a teacher network, a student network and a novel filter module.
no code implementations • 22 Dec 2022 • Kartik Gupta, Thalaiyasingam Ajanthan, Anton Van Den Hengel, Stephen Gould
Most current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective and then discard the learned projection head after training.
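For reference, the InfoNCE objective mentioned above can be sketched in a few lines of plain Python. This is only an illustration and not the authors' code: the function name `info_nce`, the cosine-similarity scoring, and the default temperature are assumptions, and all vectors are assumed to be non-zero.

```python
import math


def info_nce(query, keys, positive_index, temperature=0.1):
    """InfoNCE loss for one query against a list of keys (hypothetical helper).

    The loss is the negative log-softmax score of the positive key, where
    scores are cosine similarities divided by a temperature.
    """

    def normalise(v):
        # Scale to unit length so the dot product is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    q = normalise(query)
    logits = [
        sum(a * b for a, b in zip(q, normalise(k))) / temperature for k in keys
    ]

    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_denominator - logits[positive_index]


# The loss is small when the positive key matches the query and large otherwise.
aligned = info_nce([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], positive_index=0)
misaligned = info_nce([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], positive_index=1)
```

In a typical setup the query and keys would be outputs of the projection head, which is the component that is discarded after training.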
no code implementations • 7 Dec 2022 • Chunyi Sun, Yanbin Liu, Junlin Han, Stephen Gould
Specifically, we use a NeRF model to generate numerous image-angle pairs to train an adjustor, which can adjust the StyleGAN latent code to generate high-fidelity stylized images for any given angle.
no code implementations • CVPR 2023 • Jaskirat Singh, Stephen Gould, Liang Zheng
The user scribbles control the color composition while the text prompt provides control over the overall image semantics.
no code implementations • 17 Aug 2022 • Yunzhong Hou, Liang Zheng, Stephen Gould
To this end, we propose a color quantization network, ColorCNN, which learns to structure an image in limited color spaces by minimizing the classification loss.
no code implementations • 17 Aug 2022 • Yunzhong Hou, Stephen Gould, Liang Zheng
In this paper, we take the best of both worlds and propose multi-view correlation consistency (MVCC) learning: it considers rich pairwise relationships in self-correlation matrices and matches them across views to provide robust supervision.
no code implementations • 14 Jul 2022 • Weijian Deng, Stephen Gould, Liang Zheng
Generalization and invariance are two essential properties of any machine learning model.
1 code implementation • CVPR 2022 • Yicong Hong, Zun Wang, Qi Wu, Stephen Gould
To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments.
no code implementations • 24 Feb 2022 • Stephen Gould, Dylan Campbell, Itzik Ben-Shabat, Chamin Hewa Koneputugodage, Zhiwei Xu
Deep declarative networks and other recent related works have shown how to differentiate the solution map of a (continuous) parametrized optimization problem, opening up the possibility of embedding mathematical optimization problems into end-to-end learnable models.
1 code implementation • CVPR 2022 • Yizhak Ben-Shabat, Chamin Hewa Koneputugodage, Stephen Gould
In this paper, we propose a divergence guided shape representation learning approach that does not require normal vectors as input.
1 code implementation • CVPR 2022 • Frederic Z. Zhang, Dylan Campbell, Stephen Gould
Recent developments in transformer models for visual data have led to significant improvements in recognition and detection tasks.
Ranked #10 on Human-Object Interaction Detection on V-COCO
1 code implementation • 6 Oct 2021 • Asiri Wijesinghe, Qing Wang, Stephen Gould
This framework provides a novel optimal transport distance metric, namely Regularized Wasserstein (RW) discrepancy, which can preserve both features and structure of graphs via Wasserstein distances on features and their local variations, local barycenters and global connectivity.
3 code implementations • ICCV 2021 • Zheyuan Liu, Cristian Rodriguez-Opazo, Damien Teney, Stephen Gould
We demonstrate that with a relatively simple architecture, CIRPLANT outperforms existing methods on open-domain images, while matching state-of-the-art accuracy on the existing narrow datasets, such as fashion.
Ranked #12 on Image Retrieval on CIRR
1 code implementation • 21 Jun 2021 • Yizhak Ben-Shabat, Chamin Hewa Koneputugodage, Stephen Gould
In this paper, we propose a divergence guided shape representation learning approach that does not require normal vectors as input.
no code implementations • CVPR 2021 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
In this paper we propose a recurrent BERT model that is time-aware for use in VLN.
no code implementations • 10 Jun 2021 • Weijian Deng, Stephen Gould, Liang Zheng
In this work, we train semantic classification and rotation prediction in a multi-task way.
no code implementations • 2 Jan 2021 • Sourav Garg, Niko Sünderhauf, Feras Dayoub, Douglas Morrison, Akansel Cosgun, Gustavo Carneiro, Qi Wu, Tat-Jun Chin, Ian Reid, Stephen Gould, Peter Corke, Michael Milford
In robotics and related research fields, the study of understanding is often referred to as semantics, which dictates what the world "means" to a robot, and is strongly tied to the question of how to represent that meaning.
no code implementations • 12 Dec 2020 • Weijian Deng, Joshua Marsh, Stephen Gould, Liang Zheng
The memory module stores the prototypical feature representation for each category as a moving average.
Ranked #51 on Fine-Grained Image Classification on CUB-200-2011
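As a rough illustration of keeping one prototype per category as a moving average, the sketch below uses an exponential moving average over feature vectors. The listing does not give the paper's exact update rule, so the function name `update_prototype`, the dictionary-based memory, and the momentum value are all assumptions.

```python
def update_prototype(memory, label, feature, momentum=0.9):
    """Update the stored prototype for `label` with a new feature vector.

    `memory` maps a category label to its prototype (a list of floats).
    The first feature seen for a category initialises its prototype;
    later features are blended in with an exponential moving average.
    """
    old = memory.get(label)
    if old is None:
        memory[label] = list(feature)
    else:
        memory[label] = [
            momentum * o + (1.0 - momentum) * f for o, f in zip(old, feature)
        ]
    return memory[label]


# Usage: two features from the same category, blended equally.
memory = {}
update_prototype(memory, "sparrow", [1.0, 0.0])
blended = update_prototype(memory, "sparrow", [0.0, 1.0], momentum=0.5)
```

A higher momentum makes each prototype change more slowly, which trades responsiveness for stability as the feature extractor is updated.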
2 code implementations • ICCV 2021 • Frederic Z. Zhang, Dylan Campbell, Stephen Gould
We address the problem of detecting human-object interactions in images using graphical neural networks.
Ranked #18 on Human-Object Interaction Detection on V-COCO (using extra training data)
no code implementations • CVPR 2021 • Fatemeh Saleh, Sadegh Aliakbarian, Hamid Rezatofighi, Mathieu Salzmann, Stephen Gould
Despite the recent advances in multiple object tracking (MOT), achieved by joint detection and tracking, dealing with long occlusions remains a challenge.
1 code implementation • 26 Nov 2020 • Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
In this paper we propose a recurrent BERT model that is time-aware for use in VLN.
Ranked #7 on Visual Navigation on R2R
1 code implementation • NeurIPS 2021 • Sameera Ramasinghe, Moshiur Farazi, Salman Khan, Nick Barnes, Stephen Gould
Conditional GANs (cGAN), in their rudimentary form, suffer from critical drawbacks such as the lack of diversity in generated outputs and distortion between the latent and output manifolds.
1 code implementation • NeurIPS 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould
From both the textual and visual perspectives, we find that the relationships among the scene, its objects, and directional clues are essential for the agent to interpret complex instructions and correctly perceive the environment.
1 code implementation • 13 Oct 2020 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Basura Fernando, Hongdong Li, Stephen Gould
This paper studies the task of temporal moment localization in a long untrimmed video using a natural language query.
no code implementations • ICLR 2021 • Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, Stephen Gould
Although deep learning has achieved appealing results on several machine learning tasks, most of the models are deterministic at inference, limiting their application to single-modal settings.
2 code implementations • ECCV 2020 • Dylan Campbell, Liu Liu, Stephen Gould
We instead propose the first fully end-to-end trainable network for solving the blind PnP problem efficiently and globally, that is, without the need for pose priors.
3 code implementations • ECCV 2020 • Yunzhong Hou, Liang Zheng, Stephen Gould
First, how should we aggregate cues from the multiple views?
Ranked #5 on Multiview Detection on MultiviewX
1 code implementation • 1 Jul 2020 • Yizhak Ben-Shabat, Xin Yu, Fatemeh Sadat Saleh, Dylan Campbell, Cristian Rodriguez-Opazo, Hongdong Li, Stephen Gould
The availability of a large labeled dataset is a key requirement for applying deep learning methods to solve various computer vision tasks.
1 code implementation • 22 Jun 2020 • Yao Lu, Stephen Gould, Thalaiyasingam Ajanthan
The problem of vanishing and exploding gradients has been a long-standing obstacle that hinders the effective training of neural networks.
no code implementations • WS 2020 • Edison Marrese-Taylor, Cristian Rodriguez-Opazo, Jorge A. Balazs, Stephen Gould, Yutaka Matsuo
Despite the recent advances in opinion mining for written reviews, few works have tackled the problem on other sources of reviews.
no code implementations • 28 Apr 2020 • Rodrigo Santa Cruz, Anoop Cherian, Basura Fernando, Dylan Campbell, Stephen Gould
This paper presents a framework to recognize temporal compositions of atomic actions in videos.
no code implementations • 16 Apr 2020 • Fatemeh Saleh, Sadegh Aliakbarian, Mathieu Salzmann, Stephen Gould
One of the core components in online multiple object tracking (MOT) frameworks is associating new detections with existing tracklets, typically done via a scoring function.
1 code implementation • EMNLP 2020 • Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould
Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions.
1 code implementation • ECCV 2020 • Yizhak Ben-Shabat, Stephen Gould
We propose a surface fitting method for unstructured 3D point clouds.
Ranked #6 on Surface Normals Estimation on PCPNet
1 code implementation • CVPR 2020 • Yunzhong Hou, Liang Zheng, Stephen Gould
Color and structure are the two pillars that construct an image.
no code implementations • 26 Feb 2020 • Shihao Jiang, Dylan Campbell, Miaomiao Liu, Stephen Gould, Richard Hartley
We address the problem of joint optical flow and camera motion estimation in rigid scenes by incorporating geometric constraints into an unsupervised deep learning framework.
no code implementations • ICCV 2021 • Sadegh Aliakbarian, Fatemeh Sadat Saleh, Lars Petersson, Stephen Gould, Mathieu Salzmann
We tackle the task of diverse 3D human motion prediction, that is, forecasting multiple plausible future 3D poses given a sequence of observed 3D poses.
1 code implementation • 4 Dec 2019 • Sameera Ramasinghe, Salman Khan, Nick Barnes, Stephen Gould
Point-clouds are a popular choice for vision and graphics tasks due to their accurate shape description and direct acquisition from range-scanners.
no code implementations • 30 Nov 2019 • Sameera Ramasinghe, Salman Khan, Nick Barnes, Stephen Gould
In this work, we propose a novel "volumetric convolution" operation that can effectively model and convolve arbitrary functions in $\mathbb{B}^3$.
1 code implementation • 11 Sep 2019 • Stephen Gould, Richard Hartley, Dylan Campbell
We show how these declarative processing nodes can be implemented in the popular PyTorch deep learning software library allowing declarative and imperative nodes to co-exist within the same network.
no code implementations • 24 Aug 2019 • Sameera Ramasinghe, Salman Khan, Nick Barnes, Stephen Gould
Existing networks directly learn feature representations on 3D point clouds for shape analysis.
1 code implementation • 20 Aug 2019 • Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Fatemeh Sadat Saleh, Hongdong Li, Stephen Gould
Given an untrimmed video and a sentence as the query, the goal is to determine the start and end of the relevant visual moment in the video that corresponds to the query sentence.
no code implementations • 2 Aug 2019 • Mohammad Sadegh Aliakbarian, Fatemeh Sadat Saleh, Mathieu Salzmann, Lars Petersson, Stephen Gould, Amirhossein Habibian
In this paper, we introduce an approach to stochastically combine the root of variations with previous pose information, which forces the model to take the noise into account.
1 code implementation • ICLR 2020 • Namhoon Lee, Thalaiyasingam Ajanthan, Stephen Gould, Philip H. S. Torr
Alternatively, a recent approach shows that pruning can be done at initialization prior to training, based on a saliency criterion called connection sensitivity.
1 code implementation • ICCV 2019 • Amirreza Shaban, Amir Rahimi, Shray Bansal, Stephen Gould, Byron Boots, Richard Hartley
We model the selection as an energy minimization problem with unary and pairwise potential functions.
no code implementations • CVPR 2019 • Dylan Campbell, Lars Petersson, Laurent Kneip, Hongdong Li, Stephen Gould
Determining the position and orientation of a calibrated camera from a single image with respect to a 3D model is an essential task for many applications.
no code implementations • NeurIPS 2018 • Peter Anderson, Stephen Gould, Mark Johnson
To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets.
no code implementations • CVPR 2018 • Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley
As these features are often non-linear, we propose a novel pooling method, kernelized rank pooling, that represents a given sequence compactly as the pre-image of the parameters of a hyperplane in a reproducing kernel Hilbert space, projections of data onto which capture their temporal order.
no code implementations • CVPR 2018 • Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould
In an attempt to tackle this problem, we propose discriminative pooling, based on the notion that among the deep features generated on all short clips, there is at least one that characterizes the action.
no code implementations • 26 Jan 2018 • Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould
In this paper, we build on the compositionality principle and develop an "algebra" to compose classifiers for complex visual concepts.
8 code implementations • CVPR 2018 • Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton Van Den Hengel
This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.
Ranked #10 on Visual Navigation on R2R
no code implementations • 19 Sep 2017 • Tengda Han, Jue Wang, Anoop Cherian, Stephen Gould
For effective human-robot interaction, it is important that a robotic assistant can forecast the next action a human will consider in a given task.
65 code implementations • CVPR 2018 • Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.
Ranked #29 on Visual Question Answering (VQA) on VQA v2 test-std
no code implementations • 24 Jul 2017 • Sam Toyer, Anoop Cherian, Tengda Han, Stephen Gould
Human pose forecasting is an important problem in computer vision with applications to human-robot interaction, visual surveillance, and autonomous driving.
no code implementations • 6 Jun 2017 • Fatemeh Sadat Saleh, Mohammad Sadegh Aliakbarian, Mathieu Salzmann, Lars Petersson, Jose M. Alvarez, Stephen Gould
We then show how to obtain multi-class masks by the fusion of foreground/background ones with information extracted from a weakly-supervised localization network.
1 code implementation • 30 May 2017 • Basura Fernando, Stephen Gould
First, we present "discriminative rank pooling" in which the shared weights of our video representation and the parameters of the action classifiers are estimated jointly for a given training dataset of labelled vector sequences using a bilevel optimization formulation of the learning problem.
no code implementations • 23 Apr 2017 • Anoop Cherian, Stephen Gould
We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space.
no code implementations • CVPR 2017 • Rodrigo Santa Cruz, Basura Fernando, Anoop Cherian, Stephen Gould
Unrolling these iterations in a Sinkhorn network layer, we propose DeepPermNet, an end-to-end CNN model for this task.
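Assuming "these iterations" refers to standard Sinkhorn normalisation, the idea is to alternately normalise the rows and columns of a positive square matrix so that it approaches a doubly stochastic matrix, a continuous relaxation of a permutation matrix. The sketch below is a plain-Python illustration, not the DeepPermNet layer itself; the function name `sinkhorn` and the iteration count are assumptions.

```python
def sinkhorn(matrix, n_iters=20):
    """Alternately normalise rows and columns of a positive square matrix.

    Returns a new matrix whose rows and columns each sum to roughly one.
    Every entry of `matrix` is assumed to be strictly positive.
    """
    n = len(matrix)
    m = [list(row) for row in matrix]
    for _ in range(n_iters):
        # Row normalisation: make every row sum to one.
        for i in range(n):
            row_sum = sum(m[i])
            m[i] = [v / row_sum for v in m[i]]
        # Column normalisation: make every column sum to one.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


# Usage: a 2x2 positive matrix becomes approximately doubly stochastic.
result = sinkhorn([[1.0, 2.0], [3.0, 4.0]], n_iters=50)
```

Because each step is a differentiable division, a fixed number of such iterations can be unrolled inside a network and trained end to end, which is the property the sentence above relies on.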
no code implementations • CVPR 2017 • Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould
Most popular deep models for action recognition split video sequences into short sub-sequences consisting of a few frames; frame-based features are then pooled for recognizing the activity.
no code implementations • 6 Apr 2017 • Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould
Applying multiple instance learning in an SVM setup, we use the parameters of this separating hyperplane as a descriptor for the video.
no code implementations • 19 Jan 2017 • Anoop Cherian, Piotr Koniusz, Stephen Gould
The HOK descriptors are then generated from the higher-order co-occurrences of these feature maps, and are then used as input to a video-level classifier.
1 code implementation • EMNLP 2017 • Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould
Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects.
no code implementations • 2 Dec 2016 • Basura Fernando, Sareh Shirazi, Stephen Gould
On the MPII Cooking dataset we detect action segments with a precision of 21.6% and recall of 11.7% over 946 long video pairs and over 5000 ground truth action segments.
no code implementations • CVPR 2017 • Basura Fernando, Hakan Bilen, Efstratios Gavves, Stephen Gould
On action classification, our method obtains 60.3% on the UCF101 dataset using only UCF101 data for training which is approximately 10% better than current state-of-the-art self-supervised learning methods.
Ranked #47 on Self-Supervised Action Recognition on UCF101
no code implementations • 2 Sep 2016 • Fatemehsadat Saleh, Mohammad Sadegh Ali Akbarian, Mathieu Salzmann, Lars Petersson, Stephen Gould, Jose M. Alvarez
Hence, weak supervision using only image tags could have a significant impact in semantic segmentation.
11 code implementations • 29 Jul 2016 • Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould
There is considerable interest in the task of automatically generating image captions.
no code implementations • 19 Jul 2016 • Stephen Gould, Basura Fernando, Anoop Cherian, Peter Anderson, Rodrigo Santa Cruz, Edison Guo
Some recent works in machine learning and computer vision involve the solution of a bi-level optimization problem.
1 code implementation • CVPR 2016 • Hakan Bilen, Basura Fernando, Efstratios Gavves, Andrea Vedaldi, Stephen Gould
We introduce the concept of dynamic image, a novel compact representation of videos useful for video analysis especially when convolutional neural networks (CNNs) are used.
Ranked #62 on Action Recognition on HMDB-51
no code implementations • CVPR 2016 • Basura Fernando, Peter Anderson, Marcus Hutter, Stephen Gould
We present hierarchical rank pooling, a video sequence encoding method for activity recognition.
no code implementations • ICCV 2015 • Trung T. Pham, Ian Reid, Yasir Latif, Stephen Gould
Specifically, we relax the labelling problem to a regression, and generalize the higher-order associative $P^n$ Potts model to a new family of arbitrary higher-order models based on regression forests.
no code implementations • 23 Jul 2015 • Seyed Hamid Rezatofighi, Stephen Gould, Ba Tuong Vo, Ba-Ngu Vo, Katarina Mele, Richard Hartley
To deal with this, we propose a bootstrap filter composed of an estimator and a tracker.
no code implementations • 24 Jun 2015 • Jian Guo, Stephen Gould
We report on the methods used in our recent DeepEnsembleCoco submission to the PASCAL VOC 2012 challenge, which achieves state-of-the-art performance on the object detection task.
no code implementations • CVPR 2014 • Xuming He, Stephen Gould
We address the problem of joint detection and segmentation of multiple object instances in an image, a key step towards scene understanding.
no code implementations • NeurIPS 2009 • Stephen Gould, Tianshi Gao, Daphne Koller
Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other.
no code implementations • NeurIPS 2008 • Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller
We demonstrate the effectiveness of our method on a large set of natural images by combining the subtasks of scene categorization, object detection, multiclass image segmentation, and 3D scene reconstruction.
no code implementations • NeurIPS 2008 • Gal Elidan, Stephen Gould
In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound.