1 code implementation • 12 Mar 2025 • Yefei He, Yuanyu He, Shaoxuan He, Feng Chen, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
In this paper, we propose Neighboring Autoregressive Modeling (NAR), a novel paradigm that formulates autoregressive visual generation as a progressive outpainting procedure, following a near-to-far "next-neighbor prediction" mechanism.
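As a rough illustration of what a near-to-far generation order could look like (a sketch under assumed details: the grid size, start position, and Chebyshev distance metric are illustrative choices, not taken from the paper):

```python
def near_to_far_order(height, width, start=(0, 0)):
    """Return all grid positions sorted by Chebyshev distance from the start token."""
    positions = [(r, c) for r in range(height) for c in range(width)]
    return sorted(
        positions,
        key=lambda p: max(abs(p[0] - start[0]), abs(p[1] - start[1])),
    )

order = near_to_far_order(3, 3)
# The start token comes first; its immediate neighbors follow before farther tokens.
```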
1 code implementation • 10 Mar 2025 • Zeyu Zhang, Yiran Wang, Wei Mao, Danning Li, Rui Zhao, Biao Wu, Zirui Song, Bohan Zhuang, Ian Reid, Richard Hartley
Conditional motion generation has been extensively studied in computer vision, yet two critical challenges remain.
Ranked #1 on Motion Synthesis on TMD
no code implementations • 18 Dec 2024 • Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang
To address the inefficiency, model merging strategies have emerged, merging all LLMs into one model to reduce the memory footprint during inference.
1 code implementation • 5 Dec 2024 • Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
By decoding multiple tokens simultaneously in a single forward pass, the number of forward passes required to generate an image is significantly reduced, resulting in a substantial improvement in generation efficiency.
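The efficiency claim is easy to see with back-of-the-envelope arithmetic; the token counts below are illustrative only:

```python
import math

def num_forward_passes(total_tokens, tokens_per_pass):
    """Forward passes needed when decoding a fixed number of tokens per pass."""
    return math.ceil(total_tokens / tokens_per_pass)

sequential = num_forward_passes(1024, 1)   # one token per pass
parallel = num_forward_passes(1024, 16)    # sixteen tokens per pass
# Decoding 16 tokens at a time cuts 1024 passes down to 64.
```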
no code implementations • 2 Dec 2024 • Zhuokun Chen, Jinwu Hu, Zeshuai Deng, Yufeng Wang, Bohan Zhuang, Mingkui Tan
By merging the parameters of language models from these MLLMs, VisionFuse allows a single language model to align with various vision encoders, significantly reducing deployment overhead.
no code implementations • 22 Nov 2024 • Feng Chen, Chenhui Gou, Jing Liu, Yang Yang, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu
To address this, we introduce AbilityLens, a unified benchmark designed to evaluate MLLMs across six key perception abilities, focusing on both accuracy and stability, with each ability encompassing diverse question types, domains, and metrics.
1 code implementation • 10 Nov 2024 • Zeyu Zhang, Hang Gao, Akide Liu, Qi Chen, Feng Chen, Yiran Wang, Danning Li, Rui Zhao, ZhenMing Li, Zhongwen Zhou, Hao Tang, Bohan Zhuang
The recent Mamba architecture shows promising results in efficiently modeling long and complex sequences, yet two significant challenges remain: Firstly, directly applying Mamba to extended motion generation is ineffective, as the limited capacity of the implicit memory leads to memory decay.
1 code implementation • 7 Nov 2024 • Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, Jianfei Cai
To evaluate MVSplat360's performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where MVSplat360 achieves superior visual quality compared to state-of-the-art methods on wide-sweeping or even 360° NVS tasks.
no code implementations • 11 Oct 2024 • Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
The efficiency of large vision-language models (LVLMs) is constrained by the computational bottleneck of the attention mechanism during the prefill phase and the memory bottleneck of fetching the key-value (KV) cache in the decoding phase, particularly in scenarios involving high-resolution images or videos.
no code implementations • 1 Sep 2024 • Sanuwani Dayarathna, Kh Tohidul Islam, Bohan Zhuang, Guang Yang, Jianfei Cai, Meng Law, Zhaolin Chen
Moreover, existing methods for multi-contrast MRI synthesis often fail to accurately map feature-level information across multiple imaging contrasts.
1 code implementation • 6 Aug 2024 • Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J Seibel, Junjun He, Yu Qiao
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields.
1 code implementation • 14 Jul 2024 • Zeyu Zhang, Akide Liu, Qi Chen, Feng Chen, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
Text-to-motion generation holds potential for film, gaming, and robotics, yet current methods often prioritize short motion generation, making it challenging to produce long motion sequences effectively: (1) Current methods struggle to handle long motion sequences as a single input due to prohibitively high computational cost; (2) Breaking down the generation of long motion sequences into shorter segments can result in inconsistent transitions and requires interpolation or inpainting, which lacks entire sequence modeling.
no code implementations • 6 Jul 2024 • Guoan Wang, Jin Ye, Junlong Cheng, Tianbin Li, Zhaolin Chen, Jianfei Cai, Junjun He, Bohan Zhuang
Supervised Finetuning (SFT) serves as an effective way to adapt such foundation models to task-specific downstream tasks, but at the cost of degrading the general knowledge previously stored in the original foundation model. To address this, we propose SAM-Med3D-MoE, a novel framework that seamlessly integrates task-specific finetuned models with the foundational model, creating a unified model at minimal additional training expense for an extra gating network.
no code implementations • 13 Jun 2024 • Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang
LLM development involves pre-training a foundation model on massive data, followed by fine-tuning on task-specific data to create specialized experts.
1 code implementation • 30 May 2024 • Feng Chen, Zhen Yang, Bohan Zhuang, Qi Wu
We present a novel task called online video editing, which is designed to edit streaming frames while maintaining temporal consistency.
1 code implementation • 23 May 2024 • Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang
In terms of efficiency, ZipCache also showcases a 37.3% reduction in prefill-phase latency, a 56.9% reduction in decoding-phase latency, and a 19.8% reduction in GPU memory usage when evaluating the LLaMA3-8B model with an input length of 4096.
no code implementations • 23 May 2024 • Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang
In this paper, we present a simple yet effective approach, called MiniCache, to compress the KV cache across layers from a novel depth perspective, significantly reducing the memory footprint for LLM inference.
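A minimal sketch of the depth-wise idea, assuming a simple averaging merge (the actual MiniCache merging and retention mechanisms are more involved than this illustration):

```python
import numpy as np

def merge_layer_pair(kv_a, kv_b):
    """Merge two adjacent layers' KV caches into one shared tensor (simple average)."""
    return 0.5 * (kv_a + kv_b)

kv_layer_i = np.ones((2, 4, 8))        # (heads, seq_len, head_dim)
kv_layer_j = 3 * np.ones((2, 4, 8))    # the next layer's cache, same shape
shared = merge_layer_pair(kv_layer_i, kv_layer_j)
# Storing one shared tensor for the pair halves their cache footprint.
```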
1 code implementation • 4 Apr 2024 • Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang
In this way, we encode video representations that incorporate both local and global information, enabling the LLM to generate comprehensive responses for long-term videos.
1 code implementation • 21 Mar 2024 • Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai
We introduce MVSplat, an efficient model that, given sparse multi-view images as input, predicts clean feed-forward 3D Gaussians.
Ranked #1 on Generalizable Novel View Synthesis on ACID
1 code implementation • 12 Mar 2024 • Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
Human motion generation stands as a significant pursuit in generative computer vision, while achieving long-sequence and efficient motion generation remains challenging.
Ranked #12 on Motion Synthesis on KIT Motion-Language
1 code implementation • 21 Feb 2024 • Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar
Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.
1 code implementation • CVPR 2024 • Xinyu Wang, Bohan Zhuang, Qi Wu
This alignment process, which synchronizes a language model trained on textual data with encoders and decoders trained on multi-modal data, often necessitates extensive training of several projection layers in multiple stages.
1 code implementation • CVPR 2024 • Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang
In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints.
1 code implementation • NeurIPS 2023 • Zeshuai Deng, Zhuokun Chen, Shuaicheng Niu, Thomas H. Li, Bohan Zhuang, Mingkui Tan
Then, we adapt the SR model by implementing feature-level reconstruction learning from the initial test image to its second-order degraded counterparts, which helps the SR model generate plausible HR images.
1 code implementation • NeurIPS 2023 • Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang
By reusing predictions from key frames, we circumvent the need to process a large volume of video frames individually with resource-intensive segmentors, alleviating temporal redundancy and significantly reducing computational costs.
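A toy sketch of the key-frame reuse pattern described above; the fixed key-frame interval and plain copying of predictions are simplifying assumptions (a real system would propagate and refine them):

```python
def segment_video(frames, segment_fn, key_every=4):
    """Run the expensive segmentor only on every key_every-th frame; reuse otherwise."""
    preds, last = [], None
    for i, frame in enumerate(frames):
        if i % key_every == 0:
            last = segment_fn(frame)   # expensive call, key frames only
        preds.append(last)             # cheap reuse for in-between frames
    return preds

calls = []
preds = segment_video(list(range(10)), lambda f: calls.append(f) or f, key_every=4)
# The segmentor runs on frames 0, 4 and 8 only: 3 expensive calls for 10 frames.
```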
1 code implementation • 18 Oct 2023 • Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen
Subsequently, we propose an additional reassembly step to seamlessly integrate the respective editing results and the non-editing region to obtain the final edited image.
2 code implementations • 12 Oct 2023 • Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang
Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly.
1 code implementation • 5 Oct 2023 • Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
In this paper, we introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
no code implementations • 14 Sep 2023 • Xinyu Wang, Bohan Zhuang, Qi Wu
To bridge this gap, we propose a novel approach, \methodname, from a modality conversion perspective that evolves a text-based LLM into a multi-modal one.
1 code implementation • 30 Jun 2023 • Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang
With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility.
1 code implementation • 28 May 2023 • Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang
To this end, we propose LoRAPrune, a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner.
1 code implementation • NeurIPS 2023 • Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process.
1 code implementation • ICCV 2023 • Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang
Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks, tuning only a small number of parameters while freezing the vast majority to ease the storage burden and optimization difficulty.
2 code implementations • CVPR 2023 • Zizheng Pan, Jianfei Cai, Bohan Zhuang
As each model family consists of pretrained models with diverse scales (e.g., DeiT-Ti/S/B), a fundamental question naturally arises: how to efficiently assemble these readily available models in a family for dynamic accuracy-efficiency trade-offs at runtime.
no code implementations • 2 Feb 2023 • Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen
Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by the efficient use of computation and memory resources.
no code implementations • ICCV 2023 • Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.
no code implementations • 14 Nov 2022 • Yefei He, Zhenyu Lou, Luoming Zhang, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization.
1 code implementation • 19 Sep 2022 • Jing Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang
To this end, we propose a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, called EcoFormer, to map the original queries and keys into low-dimensional binary codes in Hamming space.
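For intuition only, here is a classic random-hyperplane hashing sketch that maps vectors to binary codes compared in Hamming space; EcoFormer instead learns its hash functions via kernelized hashing, so treat this as an analogy rather than the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, bits = 64, 16
hyperplanes = rng.standard_normal((dim, bits))

def to_binary_code(x):
    """Map a dim-dimensional vector to a {0, 1}^bits code via signs of projections."""
    return (x @ hyperplanes > 0).astype(np.uint8)

q = rng.standard_normal(dim)
k = rng.standard_normal(dim)
hamming = int(np.sum(to_binary_code(q) != to_binary_code(k)))
# Vectors pointing in similar directions tend to agree on more bits (small Hamming distance).
```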
no code implementations • 23 Aug 2022 • Jing Liu, Jianfei Cai, Bohan Zhuang
During architecture search, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which forms a gap between training and deployment.
no code implementations • 21 Jul 2022 • Yuetian Weng, Zizheng Pan, Mingfei Han, Xiaojun Chang, Bohan Zhuang
The task of action detection aims at deducing both the action category and localization of the start and end moment for each action instance in a long, untrimmed video.
5 code implementations • 26 May 2022 • Zizheng Pan, Jianfei Cai, Bohan Zhuang
Therefore, we propose to disentangle the high/low frequency patterns in an attention layer by separating the heads into two groups, where one group encodes high frequencies via self-attention within each local window, and another group encodes low frequencies by performing global attention between the average-pooled low-frequency keys and values from each window and each query position in the input feature map.
Ranked #190 on Image Classification on ImageNet (GFLOPs metric)
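A minimal numpy sketch of the low-frequency branch described in the abstract, with keys and values average-pooled before global attention; the 1-D window, shapes, and scaling are illustrative assumptions (the high-frequency head group would instead attend locally within each window):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def low_freq_attention(q, k, v, w=4):
    """Attend against keys/values average-pooled over non-overlapping windows of size w."""
    n, d = k.shape
    k_pooled = k.reshape(n // w, w, d).mean(axis=1)
    v_pooled = v.reshape(n // w, w, d).mean(axis=1)
    attn = softmax(q @ k_pooled.T / np.sqrt(d))
    return attn @ v_pooled

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out = low_freq_attention(q, k, v, w=4)
# Each of the 16 queries attends to only 4 pooled positions instead of all 16 keys.
```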
2 code implementations • CVPR 2023 • Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang
In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.
Ranked #24 on Semantic Segmentation on ADE20K
2 code implementations • CVPR 2022 • Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang
First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth.
4 code implementations • 24 Nov 2021 • Jing Liu, Jianfei Cai, Bohan Zhuang
However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading the performance.
Ranked #205 on Image Classification on CIFAR-100
3 code implementations • 23 Nov 2021 • Haoyu He, Jianfei Cai, Jing Liu, Zizheng Pan, Jing Zhang, DaCheng Tao, Bohan Zhuang
Relying on the single-path space, we introduce learnable binary gates to encode the operation choices in MSA layers.
Ranked #18 on Efficient ViTs on ImageNet-1K (with DeiT-T)
3 code implementations • 22 Nov 2021 • Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang
While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences.
no code implementations • 3 Aug 2021 • Jing Liu, Bohan Zhuang, Mingkui Tan, Xu Liu, Dinh Phung, Yuanqing Li, Jianfei Cai
More critically, EAS is able to find compact architectures within 0.1 seconds for 50 deployment scenarios.
2 code implementations • 29 May 2021 • Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai
Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision.
1 code implementation • 4 May 2021 • Haoyu He, Bohan Zhuang, Jing Zhang, Jianfei Cai, DaCheng Tao
To address three main challenges in OSHP, i.e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net).
2 code implementations • ICCV 2021 • Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai
However, the routine of the current ViT model is to maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation.
Ranked #22 on Efficient ViTs on ImageNet-1K (with DeiT-T)
no code implementations • 13 Jan 2021 • Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan
By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.
1 code implementation • 29 Nov 2020 • Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen
With the rising popularity of intelligent mobile devices, it is of great practical significance to develop accurate, real-time and energy-efficient image Super-Resolution (SR) inference methods.
no code implementations • ICCV 2021 • Peng Chen, Bohan Zhuang, Chunhua Shen
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
1 code implementation • CVPR 2021 • Peng Chen, Jing Liu, Bohan Zhuang, Mingkui Tan, Chunhua Shen
Network quantization allows inference to be conducted using low-precision arithmetic for improved inference efficiency of deep neural networks on edge devices.
1 code implementation • ICLR 2020 • Jie Fu, Xue Geng, Zhijian Duan, Bohan Zhuang, Xingdi Yuan, Adam Trischler, Jie Lin, Chris Pal, Hao Dong
To our knowledge, existing methods overlook the fact that although the student absorbs extra knowledge from the teacher, both models share the same input data -- and this data is the only medium by which the teacher's knowledge can be demonstrated.
3 code implementations • ECCV 2020 • Shoukai Xu, Haokun Li, Bohan Zhuang, Jing Liu, JieZhang Cao, Chuangrun Liang, Mingkui Tan
More critically, our method achieves much higher accuracy on 4-bit quantization than the existing data free quantization method.
Ranked #2 on Data Free Quantization on CIFAR-100
no code implementations • 7 Feb 2020 • Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond
Instantaneous and on demand accuracy-efficiency trade-off has been recently explored in the context of neural networks slimming.
no code implementations • 3 Feb 2020 • Luis Guerra, Bohan Zhuang, Ian Reid, Tom Drummond
In particular, for ResNet-18 on ImageNet, we prune 26.12% of the model size with Binarized Neural Network quantization, achieving a top-1 classification accuracy of 47.32% in a model of 2.47 MB, and 59.30% with a 2-bit DoReFa-Net in 4.36 MB.
1 code implementation • 4 Jan 2020 • Jing Liu, Bohan Zhuang, Zhuangwei Zhuang, Yong Guo, Junzhou Huang, Jinhui Zhu, Mingkui Tan
In this paper, we propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power.
no code implementations • 22 Sep 2019 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Peng Chen, Lingqiao Liu, Ian Reid
Experiments on classification, semantic segmentation, and object detection tasks demonstrate the superior performance of the proposed methods over various quantized networks in the literature.
no code implementations • 5 Sep 2019 • Yifan Liu, Bohan Zhuang, Chunhua Shen, Hao Chen, Wei Yin
The most current methods can be categorized as either: (i) hard parameter sharing where a subset of the parameters is shared among tasks while other parameters are task-specific; or (ii) soft parameter sharing where all parameters are task-specific but they are jointly regularized.
no code implementations • 21 Aug 2019 • Chunlei Liu, Wenrui Ding, Xin Xia, Yuan Hu, Baochang Zhang, Jianzhuang Liu, Bohan Zhuang, Guodong Guo
Binarized convolutional neural networks (BCNNs) are widely used to improve the memory and computation efficiency of deep convolutional neural networks (DCNNs) for applications based on mobile devices and AI chips.
no code implementations • 10 Aug 2019 • Bohan Zhuang, Jing Liu, Mingkui Tan, Lingqiao Liu, Ian Reid, Chunhua Shen
Furthermore, we propose a second progressive quantization scheme which gradually decreases the bit-width from high-precision to low-precision during training.
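The progressive bit-width idea can be sketched as a simple schedule; the epochs and bit-widths below are made-up milestones, not values from the paper:

```python
def bitwidth_at_epoch(epoch, schedule=((0, 32), (10, 8), (20, 4), (30, 2))):
    """Return the bit-width in effect at a given epoch (schedule sorted by start epoch)."""
    current = schedule[0][1]
    for start_epoch, bits in schedule:
        if epoch >= start_epoch:
            current = bits
    return current

# Precision steps down from 32-bit toward the 2-bit target as training proceeds.
```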
no code implementations • CVPR 2020 • Bohan Zhuang, Lingqiao Liu, Mingkui Tan, Chunhua Shen, Ian Reid
In this paper, we seek to tackle a challenge in training low-precision networks: the notorious difficulty in propagating gradient through a low-precision network due to the non-differentiable quantization function.
no code implementations • CVPR 2019 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid
In this paper, we propose to train convolutional neural networks (CNNs) with both binarized weights and activations, leading to quantized models specifically for mobile devices with limited power capacity and computation resources.
1 code implementation • NeurIPS 2018 • Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu
Channel pruning is one of the predominant approaches for deep model compression.
no code implementations • 8 Aug 2018 • Bohan Zhuang, Chunhua Shen, Ian Reid
In this paper, we propose to train a network with binary weights and low-bitwidth activations, designed especially for mobile devices with limited power consumption.
no code implementations • CVPR 2018 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel
To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in variable-length natural expression descriptions, from short phrase queries to long multi-round dialogs.
2 code implementations • CVPR 2018 • Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid
This paper tackles the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations.
1 code implementation • ICCV 2017 • Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid
The proposed method still builds one classifier for one interaction (as per type (ii) above), but the classifier built is adaptive to context via weights which are context dependent.
no code implementations • 7 Jul 2017 • Hao Lu, Zhiguo Cao, Yang Xiao, Bohan Zhuang, Chunhua Shen
To our knowledge, this is the first time that a plant-related counting problem has been considered using computer vision technologies in an unconstrained field-based environment.
no code implementations • 28 May 2017 • Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton Van Den Hengel
In addressing this problem we first construct a large-scale human-centric visual relationship detection dataset (HCVRD), which provides many more types of relationship annotation (nearly 10K categories) than the previous released datasets.
Tasks: Human-Object Interaction Detection, Relationship Detection, +1
no code implementations • 18 Mar 2017 • Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Ian Reid
Recognizing how objects interact with each other is a crucial task in visual recognition.
no code implementations • CVPR 2017 • Yao Li, Guosheng Lin, Bohan Zhuang, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel
In this work, we propose to model the relational information between people as a sequence prediction task.
no code implementations • CVPR 2017 • Bohan Zhuang, Lingqiao Liu, Yao Li, Chunhua Shen, Ian Reid
Large-scale datasets have driven the rapid development of deep neural networks for visual recognition.
no code implementations • 27 Jul 2016 • Bohan Zhuang, Lijun Wang, Huchuan Lu
In the discriminative model, we exploit the advances of deep learning architectures to learn generic features which are robust to both background clutters and foreground appearance variations.
no code implementations • CVPR 2016 • Bohan Zhuang, Guosheng Lin, Chunhua Shen, Ian Reid
To solve the first stage, we design a large-scale high-order binary code inference algorithm that reduces the high-order objective to a standard binary quadratic problem, so that graph cuts can be used to efficiently infer the binary codes, which serve as the labels of the training data.