Search Results for author: Mingyu Ding

Found 58 papers, 27 papers with code

RoadBEV: Road Surface Reconstruction in Bird's Eye View

1 code implementation • 9 Apr 2024 • Tong Zhao, Lei Yang, Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Yintao Wei

This paper uniformly proposes two simple yet effective models for road elevation reconstruction in BEV named RoadBEV-mono and RoadBEV-stereo, which estimate road elevation with monocular and stereo images, respectively.

Autonomous Driving Monocular Depth Estimation +2

104

Paper
Code

Q-SLAM: Quadric Representations for Monocular SLAM

no code implementations • 12 Mar 2024 • Chensheng Peng, Chenfeng Xu, Yue Wang, Mingyu Ding, Heng Yang, Masayoshi Tomizuka, Kurt Keutzer, Marco Pavone, Wei Zhan

This focus results in a significant disconnect between NeRF applications, i. e., novel-view synthesis and the requirements of SLAM.

3D Reconstruction Depth Estimation +2

Paper
Add Code

PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models

no code implementations • 26 Feb 2024 • Dingkun Guo, Yuqi Xiang, Shuqi Zhao, Xinghao Zhu, Masayoshi Tomizuka, Mingyu Ding, Wei Zhan

With these two capabilities, PhyGrasp is able to accurately assess the physical properties of object parts and determine optimal grasping poses.

Object Physical Commonsense Reasoning +1

Paper
Add Code

RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

no code implementations • 25 Feb 2024 • Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI.

Ranked #77 on Visual Question Answering on MM-Vet

Code Generation Multimodal Reasoning +1

Paper
Add Code

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

no code implementations • 22 Feb 2024 • Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo

To bridge this ``ideal-to-real'' gap, this paper presents \textbf{RobotScript}, a platform for 1) a deployable robot manipulation pipeline powered by code generation; and 2) a code generation benchmark for robot manipulation tasks in free-form natural language.

Code Generation Common Sense Reasoning +2

Paper
Add Code

Depth-aware Volume Attention for Texture-less Stereo Matching

1 code implementation • 14 Feb 2024 • Tong Zhao, Mingyu Ding, Wei Zhan, Masayoshi Tomizuka, Yintao Wei

Furthermore, we propose a more rigorous evaluation metric that considers depth-wise relative error, providing comprehensive evaluations for universal stereo matching and depth estimation models.

Depth Estimation Stereo Matching

Paper
Code

SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution

1 code implementation • 18 Dec 2023 • Zhixuan Liang, Yao Mu, Hengbo Ma, Masayoshi Tomizuka, Mingyu Ding, Ping Luo

Experiments on multi-task robotic manipulation benchmarks like Meta-World and LOReL demonstrate state-of-the-art performance and human-interpretable skill representations from SkillDiffuser.

Trajectory Planning

Paper
Code

A Survey of Reasoning with Foundation Models

1 code implementation • 17 Dec 2023 • Jiankai Sun, Chuanyang Zheng, Enze Xie, Zhengying Liu, Ruihang Chu, Jianing Qiu, Jiaqi Xu, Mingyu Ding, Hongyang Li, Mengzhe Geng, Yue Wu, Wenhai Wang, Junsong Chen, Zhangyue Yin, Xiaozhe Ren, Jie Fu, Junxian He, Wu Yuan, Qi Liu, Xihui Liu, Yu Li, Hao Dong, Yu Cheng, Ming Zhang, Pheng Ann Heng, Jifeng Dai, Ping Luo, Jingdong Wang, Ji-Rong Wen, Xipeng Qiu, Yike Guo, Hui Xiong, Qun Liu, Zhenguo Li

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation.

Medical Diagnosis

370

Paper
Code

Interfacing Foundation Models' Embeddings

1 code implementation • 12 Dec 2023 • Xueyan Zou, Linjie Li, JianFeng Wang, Jianwei Yang, Mingyu Ding, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang

The proposed interface is adaptive to new tasks, and new models.

Decoder Image Segmentation +3

Paper
Code

EgoPlan-Bench: Benchmarking Egocentric Embodied Planning with Multimodal Large Language Models

1 code implementation • 11 Dec 2023 • Yi Chen, Yuying Ge, Yixiao Ge, Mingyu Ding, Bohao Li, Rui Wang, Ruifeng Xu, Ying Shan, Xihui Liu

Given diverse environmental inputs, including real-time task progress, visual observations, and open-form language instructions, a proficient task planner is expected to predict feasible actions, which is a feat inherently achievable by Multimodal Large Language Models (MLLMs).

Benchmarking Human-Object Interaction Detection

Paper
Code

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

no code implementations • 12 Oct 2023 • Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo

This paper studies close-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations.

Decision Making

Paper
Add Code

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions

no code implementations • ICCV 2023 • Chengyang Zhao, Yikang Shen, Zhenfang Chen, Mingyu Ding, Chuang Gan

To tackle this problem, we propose a new framework TextPSG consisting of four modules, i. e., a region grouper, an entity grounder, a segment merger, and a label generator, with several novel techniques.

Graph Generation Panoptic Scene Graph Generation +1

Paper
Add Code

Human-oriented Representation Learning for Robotic Manipulation

no code implementations • 4 Oct 2023 • Mingxiao Huo, Mingyu Ding, Chenfeng Xu, Thomas Tian, Xinghao Zhu, Yao Mu, Lingfeng Sun, Masayoshi Tomizuka, Wei Zhan

We introduce Task Fusion Decoder as a plug-and-play embedding translator that utilizes the underlying relationships among these perceptual skills to guide the representation learning towards encoding meaningful structure for what's important for all perceptual skills, ultimately empowering learning of downstream robotic manipulation tasks.

Decoder Hand Detection +2

Paper
Add Code

LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving

no code implementations • 4 Oct 2023 • Hao Sha, Yao Mu, YuXuan Jiang, Li Chen, Chenfeng Xu, Ping Luo, Shengbo Eben Li, Masayoshi Tomizuka, Wei Zhan, Mingyu Ding

Existing learning-based autonomous driving (AD) systems face challenges in comprehending high-level information, generalizing to rare events, and providing interpretability.

Autonomous Driving Decision Making

Paper
Add Code

RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving

no code implementations • 3 Oct 2023 • Tong Zhao, Chenfeng Xu, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan, Yintao Wei

This paper addresses the growing demands for safety and comfort in intelligent robot systems, particularly autonomous vehicles, where road conditions play a pivotal role in overall driving performance.

Autonomous Driving Depth Estimation +3

Paper
Add Code

Generalizable Long-Horizon Manipulations with Large Language Models

no code implementations • 3 Oct 2023 • Haoyu Zhou, Mingyu Ding, Weikun Peng, Masayoshi Tomizuka, Lin Shao, Chuang Gan

This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations with novel objects and unseen tasks.

Paper
Add Code

Towards Free Data Selection with General-Purpose Models

1 code implementation • NeurIPS 2023 • Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan

However, current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.

Active Learning

Paper
Code

Pre-training on Synthetic Driving Data for Trajectory Prediction

no code implementations • 18 Sep 2023 • Yiheng Li, Seth Z. Zhao, Chenfeng Xu, Chen Tang, Chenran Li, Mingyu Ding, Masayoshi Tomizuka, Wei Zhan

We propose to augment both HD maps and trajectories and apply pre-training strategies on top of them.

Autonomous Driving Trajectory Forecasting

Paper
Add Code

An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training

no code implementations • 29 Jun 2023 • Zitian Chen, Mingyu Ding, Yikang Shen, Wei Zhan, Masayoshi Tomizuka, Erik Learned-Miller, Chuang Gan

We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.

Continual Learning Multi-Task Learning

Paper
Add Code

Doubly Robust Self-Training

1 code implementation • 1 Jun 2023 • Banghua Zhu, Mingyu Ding, Philip Jacobson, Ming Wu, Wei Zhan, Michael Jordan, Jiantao Jiao

Self-training is an important technique for solving semi-supervised learning problems.

3D Object Detection Autonomous Driving +2

Paper
Code

EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought

no code implementations • NeurIPS 2023 • Yao Mu, Qinglong Zhang, Mengkang Hu, Wenhai Wang, Mingyu Ding, Jun Jin, Bin Wang, Jifeng Dai, Yu Qiao, Ping Luo

In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities.

Image Captioning Language Modelling +3

Paper
Add Code

VDT: General-purpose Video Diffusion Transformers via Mask Modeling

1 code implementation • 22 May 2023 • Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding

We also propose a unified spatial-temporal mask modeling mechanism, seamlessly integrated with the model, to cater to diverse video generation scenarios.

Autonomous Driving Video Generation +1

182

Paper
Code

Quadric Representations for LiDAR Odometry, Mapping and Localization

no code implementations • 27 Apr 2023 • Chao Xia, Chenfeng Xu, Patrick Rim, Mingyu Ding, Nanning Zheng, Kurt Keutzer, Masayoshi Tomizuka, Wei Zhan

Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes and achieve high accuracy in autonomous driving tasks.

Autonomous Driving

Paper
Add Code

EC^2: Emergent Communication for Embodied Control

no code implementations • 19 Apr 2023 • Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan

We learn embodied representations of video trajectories, emergent language, and natural language using a language model, which is then used to finetune a lightweight policy network for downstream control.

Contrastive Learning Language Modelling

Paper
Add Code

Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following

no code implementations • 7 Apr 2023 • Mingyu Ding, Yan Xu, Zhenfang Chen, David Daniel Cox, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

ECL consists of: (i) an instruction parser that translates the natural languages into executable programs; (ii) an embodied concept learner that grounds visual concepts based on language descriptions; (iii) a map constructor that estimates depth and constructs semantic maps by leveraging the learned concepts; and (iv) a program executor with deterministic policies to execute each program.

Instruction Following Self-Supervised Learning

Paper
Add Code

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention

1 code implementation • CVPR 2023 • Mingyu Ding, Yikang Shen, Lijie Fan, Zhenfang Chen, Zitian Chen, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

When looking at an image, we can decompose the scene into entities and their parts as well as obtain the dependencies between them.

Paper
Code

Planning with Large Language Models for Code Generation

no code implementations • 9 Mar 2023 • Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, Chuang Gan

Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process.

Code Generation Language Modelling +1

Paper
Add Code

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

2 code implementations • 13 Feb 2023 • Haoyu Lu, Yuqi Huo, Guoxing Yang, Zhiwu Lu, Wei Zhan, Masayoshi Tomizuka, Mingyu Ding

Particularly, on the MSRVTT retrieval task, UniAdapter achieves 49. 7% recall@1 with 2. 2% model parameters, outperforming the latest competitors by 2. 0%.

Retrieval Text Retrieval +3

Paper
Code

AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners

1 code implementation • 3 Feb 2023 • Zhixuan Liang, Yao Mu, Mingyu Ding, Fei Ni, Masayoshi Tomizuka, Ping Luo

For example, AdaptDiffuser not only outperforms the previous art Diffuser by 20. 8% on Maze2D and 7. 5% on MuJoCo locomotion, but also adapts better to new tasks, e. g., KUKA pick-and-place, by 27. 9% without requiring additional expert data.

Paper
Code

Understanding Self-Supervised Pretraining with Part-Aware Representation Learning

1 code implementation • 27 Jan 2023 • Jie Zhu, Jiyang Qi, Mingyu Ding, Xiaokang Chen, Ping Luo, Xinggang Wang, Wenyu Liu, Leye Wang, Jingdong Wang

The study is mainly motivated by that random views, used in contrastive learning, and random masked (visible) patches, used in masked image modeling, are often about object parts.

Contrastive Learning Object +1

Paper
Code

EC2: Emergent Communication for Embodied Control

no code implementations • CVPR 2023 • Yao Mu, Shunyu Yao, Mingyu Ding, Ping Luo, Chuang Gan

Contrastive Learning Language Modelling

Paper
Add Code

Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners

no code implementations • CVPR 2023 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik G. Learned-Miller, Chuang Gan

To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').

Multi-Task Learning

Paper
Add Code

Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners

no code implementations • 15 Dec 2022 • Zitian Chen, Yikang Shen, Mingyu Ding, Zhenfang Chen, Hengshuang Zhao, Erik Learned-Miller, Chuang Gan

To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad').

Multi-Task Learning

Paper
Add Code

NeRF-Loc: Transformer-Based Object Localization Within Neural Radiance Fields

no code implementations • 24 Sep 2022 • Jiankai Sun, Yan Xu, Mingyu Ding, Hongwei Yi, Chen Wang, Jingdong Wang, Liangjun Zhang, Mac Schwager

Using current NeRF training tools, a robot can train a NeRF environment model in real-time and, using our algorithm, identify 3D bounding boxes of objects of interest within the NeRF for downstream navigation or manipulation tasks.

Object Localization Robot Navigation

Paper
Add Code

LGDN: Language-Guided Denoising Network for Video-Language Modeling

no code implementations • 23 Sep 2022 • Haoyu Lu, Mingyu Ding, Nanyi Fei, Yuqi Huo, Zhiwu Lu

However, this hypothesis often fails for two reasons: (1) With the rich semantics of video contents, it is difficult to cover all frames with a single video-level description; (2) A raw video typically has noisy/meaningless information (e. g., scenery shot, transition or teaser).

Denoising Language Modelling

Paper
Add Code

Multimodal foundation models are better simulators of the human brain

1 code implementation • 17 Aug 2022 • Haoyu Lu, Qiongyi Zhou, Nanyi Fei, Zhiwu Lu, Mingyu Ding, Jingyuan Wen, Changde Du, Xin Zhao, Hao Sun, Huiguang He, Ji-Rong Wen

Further, from the perspective of neural encoding (based on our foundation model), we find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones.

Paper
Code

CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer

1 code implementation • 17 Jun 2022 • Yao Mu, Shoufa Chen, Mingyu Ding, Jianyu Chen, Runjian Chen, Ping Luo

In visual control, learning transferable state representation that can transfer between different control tasks is important to reduce the training sample size.

Transfer Learning

Paper
Code

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos

no code implementations • ICLR 2022 • Zhenfang Chen, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

In this paper, we take an initial step to highlight the importance of inferring the hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset.

Paper
Add Code

DaViT: Dual Attention Vision Transformers

3 code implementations • 7 Apr 2022 • Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, Lu Yuan

We show that these two self-attentions complement each other: (i) since each channel token contains an abstract representation of the entire image, the channel attention naturally captures global interactions and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained interactions across spatial locations, which in turn helps the global information modeling in channel attention.

Ranked #1 on Instance Segmentation on Object Detection on COCO minival

Computational Efficiency Image Classification +4

30,231

Paper
Code

Context Autoencoder for Self-Supervised Representation Learning

6 code implementations • 7 Feb 2022 • Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang

The pretraining tasks include two tasks: masked representation prediction - predict the representations for the masked patches, and masked patch reconstruction - reconstruct the masked patches.

Ranked #14 on Self-Supervised Image Classification on ImageNet (finetuned)

Decoder Instance Segmentation +6

3,110

Paper
Code

Compressed Video Contrastive Learning

no code implementations • NeurIPS 2021 • Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

To enhance the representation ability of the motion vectors, hence the effectiveness of our method, we design a cross guidance contrastive learning algorithm based on multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa.

Contrastive Learning Representation Learning

Paper
Add Code

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

no code implementations • NeurIPS 2021 • Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine.

counterfactual Visual Reasoning

Paper
Add Code

L2M-GAN: Learning To Manipulate Latent Space Semantics for Facial Attribute Editing

2 code implementations • CVPR 2021 • Guoxing Yang, Nanyi Fei, Mingyu Ding, Guangzhen Liu, Zhiwu Lu, Tao Xiang

To overcome these limitations, we propose a novel latent space factorization model, called L2M-GAN, which is learned end-to-end and effective for editing both local and global attributes.

Attribute Disentanglement

Paper
Code

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

1 code implementation • CVPR 2021 • Mingyu Ding, Xiaochen Lian, Linjie Yang, Peng Wang, Xiaojie Jin, Zhiwu Lu, Ping Luo

Last, we proposed an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space, and finds optimal architectures given various tasks and computation resources.

Image Classification Neural Architecture Search +3

138

Paper
Code

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

1 code implementation • 5 May 2021 • Enze Xie, Wenhai Wang, Mingyu Ding, Ruimao Zhang, Ping Luo

Extensive experiments demonstrate the effectiveness of both PolarMask and PolarMask++, which achieve competitive results on instance segmentation in the challenging COCO dataset with single-model and single-scale training and testing, as well as new state-of-the-art results on rotate text detection and cell segmentation.

Ranked #81 on Instance Segmentation on COCO test-dev (using extra training data)

Cell Segmentation Instance Segmentation +5

869

Paper
Code

Learning Versatile Neural Architectures by Propagating Network Codes

1 code implementation • ICLR 2022 • Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo

(4) Thorough studies of NCP on inter-, cross-, and intra-tasks highlight the importance of cross-task neural architecture design, i. e., multitask neural architectures and architecture transferring between different tasks.

Image Segmentation Neural Architecture Search +2

Paper
Code

Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

no code implementations • 1 Jan 2021 • Yuqi Huo, Mingyu Ding, Haoyu Lu, Zhiwu Lu, Tao Xiang, Ji-Rong Wen, Ziyuan Huang, Jianwen Jiang, Shiwei Zhang, Mingqian Tang, Songfang Huang, Ping Luo

With the constrained jigsaw puzzles, instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable but meanwhile still ensure that the learned representation is sensitive to spatiotemporal continuity at both the local and global levels.

Representation Learning

Paper
Add Code

IEPT: Instance-Level and Episode-Level Pretext Tasks for Few-Shot Learning

1 code implementation • ICLR 2021 • Manli Zhang, Jianhong Zhang, Zhiwu Lu, Tao Xiang, Mingyu Ding, Songfang Huang

Importantly, at the episode-level, two SSL-FSL hybrid learning objectives are devised: (1) The consistency across the predictions of an FSL classifier from different extended episodes is maximized as an episode-level pretext task.

Few-Shot Learning Self-Supervised Learning +2

Paper
Code

Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking

2 code implementations • ECCV 2020 • Jian-Feng Yan, Zizhuang Wei, Hongwei Yi, Mingyu Ding, Runze Zhang, Yisong Chen, Guoping Wang, Yu-Wing Tai

In this paper, we propose an efficient and effective dense hybrid recurrent multi-view stereo net with dynamic consistency checking, namely $D^{2}$HC-RMVSNet, for accurate dense point cloud reconstruction.

Point cloud reconstruction

109

Paper
Code

Segmenting Transparent Objects in the Wild

1 code implementation • ECCV 2020 • Enze Xie, Wenjia Wang, Wenhai Wang, Mingyu Ding, Chunhua Shen, Ping Luo

To address this important problem, this work proposes a large-scale dataset for transparent object segmentation, named Trans10K, consisting of 10, 428 images of real scenarios with carefully manual annotations, which are 10 times larger than the existing datasets.

Ranked #4 on Semantic Segmentation on Trans10K

Segmentation Semantic Segmentation +1

Paper
Code

Domain-Adaptive Few-Shot Learning

1 code implementation • 19 Mar 2020 • An Zhao, Mingyu Ding, Zhiwu Lu, Tao Xiang, Yulei Niu, Jiechao Guan, Ji-Rong Wen, Ping Luo

Existing few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples.

Domain Adaptation Few-Shot Learning

136

Paper
Code

SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud

no code implementations • 13 Feb 2020 • Hongwei Yi, Shaoshuai Shi, Mingyu Ding, Jiankai Sun, Kui Xu, Hui Zhou, Zhe Wang, Sheng Li, Guoping Wang

First, the semantic context information in LiDAR is seldom explored in previous works, which may help identify ambiguous vehicles.

Autonomous Driving Semantic Segmentation

Paper
Add Code

Learning Depth-Guided Convolutions for Monocular 3D Object Detection

2 code implementations • CVPR 2020 • Mingyu Ding, Yuqi Huo, Hongwei Yi, Zhe Wang, Jianping Shi, Zhiwu Lu, Ping Luo

3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information.

Ranked #17 on Vehicle Pose Estimation on KITTI Cars Hard

Monocular 3D Object Detection Object +2

313

Paper
Code

Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation

1 code implementation • ECCV 2020 • Hongwei Yi, Zizhuang Wei, Mingyu Ding, Runze Zhang, Yisong Chen, Guoping Wang, Yu-Wing Tai

n this paper, we propose an effective and efficient pyramid multi-view stereo (MVS) net with self-adaptive view aggregation for accurate and complete dense point cloud reconstruction.

3D Point Cloud Reconstruction Depth Estimation +1

Paper
Code

Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow

no code implementations • 28 Nov 2019 • Mingyu Ding, Zhe Wang, Bolei Zhou, Jianping Shi, Zhiwu Lu, Ping Luo

Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference.

Optical Flow Estimation Segmentation +3

Paper
Add Code

CamNet: Coarse-to-Fine Retrieval for Camera Re-Localization

no code implementations • ICCV 2019 • Mingyu Ding, Zhe Wang, Jiankai Sun, Jianping Shi, Ping Luo

Camera re-localization is an important but challenging task in applications like robotics and autonomous driving.

Autonomous Driving regression +1

Paper
Add Code

Face-Focused Cross-Stream Network for Deception Detection in Videos

no code implementations • CVPR 2019 • Mingyu Ding, An Zhao, Zhiwu Lu, Tao Xiang, Ji-Rong Wen

To address the training data scarcity problem, our FFCSN model is trained with both meta learning and adversarial learning.

Deception Detection In Videos Emotion Recognition +2

Paper
Add Code

Domain-Invariant Projection Learning for Zero-Shot Recognition

no code implementations • NeurIPS 2018 • An Zhao, Mingyu Ding, Jiechao Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen

This is made possible by learning a projection between a feature space and a semantic space (e. g. attribute space).

Attribute Transfer Learning +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.