Sentence-Level Resampling for Named Entity Recognition

As a fundamental task in natural language processing, named entity recognition (NER) aims to locate and classify named entities in unstructured text.

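Chinese Word-Formation Prediction based on Representations of Word-Related Features

“As a paratactic language, Chinese has word-formation structures that characterize how the components of a word combine, and these structures are key to recognizing and understanding word meaning. In Chinese information processing, previous work on word-formation structure recognition has mostly reused coarse-grained syntactic-level labels and has mainly modeled inter-word information such as context, overlooking the role of intra-word information such as morpheme meaning and word meaning. This paper adopts a word-formation label system from a linguistic perspective, builds a dataset of Chinese word-formation structures and related information, and proposes a model based on Bi-LSTM and self-attention to explore how intra-word and inter-word information may influence word-formation structure recognition and what performance can be reached. The experiments achieve good prediction results, with an accuracy of 77.87% and an F1 score of 78.36%; comparative tests also show that intra-word morpheme-meaning information contributes significantly to word-formation structure recognition, whereas inter-word contextual information contributes weakly and is rather unstable. The prediction method and dataset will offer new perspectives and solutions for a range of Chinese information processing tasks, such as morpheme and word-structure analysis, word-sense recognition and generation, and language research and lexicography.”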

Rethinking Directional Integration in Neural Radiance Fields

To that end, we introduce a modification to the NeRF rendering equation which is as simple as a few lines of code change for any NeRF variations, while greatly improving the rendering quality of view-dependent effects.

A Language Agent for Autonomous Driving

Our approach, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection.

Enhancing Few-shot CLIP with Semantic-Aware Fine-Tuning

Hence, we propose fine-tuning the parameters of the attention pooling layer during the training process to encourage the model to focus on task-specific semantics.

Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps

We propose a novel framework to integrate SD maps into online map prediction and propose a Transformer-based encoder, SD Map Encoder Representations from transFormers, to leverage priors in SD maps for the lane-topology prediction task.

G-SPEED: General SParse Efficient Editing MoDel

Large Language Models (LLMs) have demonstrated incredible capabilities in understanding, generating, and manipulating languages.

GPT-Driver: Learning to Drive with GPT

In this paper, we propose a novel approach to motion planning that capitalizes on the strong reasoning capabilities and generalization potential inherent to Large Language Models (LLMs).

Sparsity-Based Channel Estimation Exploiting Deep Unrolling for Downlink Massive MIMO

Massive multiple-input multiple-output (MIMO) enjoys great advantage in 5G wireless communication systems owing to its spectrum and energy efficiency.

RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair

Automatic program repair (APR) is crucial to reduce manual debugging efforts for developers and improve software reliability.

RGAT: A Deeper Look into Syntactic Dependency Information for Coreference Resolution

Our experiments on the public Gendered Ambiguous Pronouns (GAP) dataset show that, with supervised learning of the syntactic dependency graph and without fine-tuning the entire BERT, we increase the F1-score over the previous best model (RGCN-with-BERT) from 80.3% to 82.5%, and over single BERT embeddings from 78.5% to 82.5%.

Toward High Quality Facial Representation Learning

To improve the facial representation quality, we use the feature map of a pre-trained visual backbone as a supervision item and use a partially pre-trained decoder for masked image modeling.

StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction

This approach limits their stability and performance in complex scenarios such as occlusions, largely due to the absence of temporal information.

Harnessing the Power of David against Goliath: Exploring Instruction Data Generation without Using Closed-Source Models

Instruction tuning is instrumental in enabling Large Language Models (LLMs) to follow user instructions to complete various open-domain tasks.

Order-of-mutation effects on cancer progression: models for myeloproliferative neoplasm

no code implementations • 19 Aug 2023 • Yue Wang, Blerta Shtylla, Tom Chou

In some patients with myeloproliferative neoplasms, two genetic mutations can be found, JAK2 V617F and TET2.

Exploiting Point-Wise Attention in 6D Object Pose Estimation Based on Bidirectional Prediction

Traditional estimation methods based on geometric registration exploit the CAD model only implicitly, which makes them dependent on observation quality and vulnerable to occlusion.

Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases.

Joint Radio Frequency Fingerprints Identification via Multi-antenna Receiver

When the number is small, the Mutual Information Weighting Scheme (MIWS) is developed by calculating the weighted voting of RFFI result at each antenna; when the number is moderate, the Distortions Filtering Scheme (DFS) is developed by filtering out the channel noise and receiver distortions; when the number is large enough, the Group-Distortions Filtering and Weighting Scheme (GDFWS) is developed, which integrates the advantages of MIWS and DFS.
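The weighted-voting idea behind MIWS can be illustrated with a minimal sketch. It assumes each antenna yields one predicted device label and one non-negative weight; the function name `weighted_vote`, the labels, and the weight values are hypothetical, and the mutual-information computation that would produce the weights in the paper is not reproduced here.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per antenna (hypothetical inputs).
    weights: one non-negative weight per antenna; in MIWS these would be
             derived from mutual information, here they are simply given.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight  # accumulate the weight behind each label
    return max(scores, key=scores.get)


# Three antennas: one confident vote for dev_A, two weak votes for dev_B.
decision = weighted_vote(["dev_A", "dev_B", "dev_B"], [0.7, 0.2, 0.2])
print(decision)
```

With these illustrative weights the single high-weight antenna outvotes the two low-weight ones, which is the behavior a plain majority vote would not give.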

Privately generating tabular data using language models

Privately generating synthetic data from a table is an important brick of a privacy-first world.

An Empirical Study on Challenging Math Problem Solving with GPT-4

Employing Large Language Models (LLMs) to address mathematical problems is an intriguing research endeavor, considering the abundance of math problems expressed in natural language across numerous science and engineering fields.

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM

In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.

Leveraging BEV Representation for 360-degree Visual Place Recognition

In addition, the image and point cloud cues can be easily stated in the same coordinates, which benefits sensor fusion for place recognition.

'Tax-free' 3DMM Conditional Face Generation

This results in a new model that effectively removes the quality tax between 3DMM conditioned face GANs and the unconditional StyleGAN.

Achieving the Minimax Optimal Sample Complexity of Offline Reinforcement Learning: A DRO-Based Approach

We show that an improved sample complexity of $\mathcal{O}(SC^{\pi^*}\epsilon^{-2}(1-\gamma)^{-3})$ can be obtained, which matches the minimax lower bound for offline reinforcement learning and is thus minimax optimal.


Discounted Thompson Sampling for Non-Stationary Bandit Problems

Under mild assumptions, we show that DS-TS with Gaussian priors can achieve nearly optimal regret bound on the order of $\tilde{O}(\sqrt{TB_T})$ for abruptly changing and $\tilde{O}(T^{\beta})$ for smoothly changing, where $T$ is the number of time steps, $B_T$ is the number of breakpoints, $\beta$ is associated with the smoothly changing environment and $\tilde{O}$ hides the parameters independent of $T$ as well as logarithmic terms.

Model-Free Robust Average-Reward Reinforcement Learning

Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs.

GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training

This paper tries to address a fundamental question in point cloud self-supervised learning: what is a good signal we should leverage to learn features from point clouds without annotations?

CodeT5+: Open Code Large Language Models for Code Understanding and Generation

To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.

On Uni-Modal Feature Learning in Supervised Multi-Modal Learning

We abstract the features (i.e., learned representations) of multi-modal data into 1) uni-modal features, which can be learned from uni-modal training, and 2) paired features, which can only be learned from cross-modal interactions.

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving

3D occupancy prediction, which estimates the detailed occupancy states and semantics of a scene, is an emerging task to overcome these limitations.

How to Control Hydrodynamic Force on Fluidic Pinball via Deep Reinforcement Learning

The findings of this work can be used to control the hydrodynamic force in the operation of a fluidic pinball system and may pave the way for exploring efficient active flow control strategies in other complex fluid dynamics problems.

Neural Map Prior for Autonomous Driving

To the best of our knowledge, this is the first learning-based system for creating a global map prior.

How Does Imperfect Automatic Indexing Affect Semantic Search Performance?

In this work, we aim to understand the performance impact of using imperfectly assigned terms in Boolean semantic searches.

Simplifying Low-Light Image Enhancement Networks with Relative Loss Functions

In this paper, to make the learning easier in low-light image enhancement, we introduce FLW-Net (Fast and LightWeight Network) and two relative loss functions.

Object-centric Inference for Language Conditioned Placement: A Foundation Model based Approach

We focus on the task of language-conditioned object placement, in which a robot should generate placements that satisfy all the spatial relational constraints in language instructions.

UrbanGIRAFFE: Representing Urban Scenes as Compositional Generative Neural Feature Fields

Generating photorealistic images with controllable camera pose and scene contents is essential for many applications including AR/VR and simulation.

Optimal Smoothing Distribution Exploration for Backdoor Neutralization in Deep Learning-based Traffic Systems

The effectiveness of the proposed method is verified on a simulated traffic system based on a microscopic traffic simulator, where experimental results show that the smoothed traffic controller can neutralize all trigger samples while maintaining its performance in relieving traffic congestion.

Physical Backdoor Trigger Activation of Autonomous Vehicle using Reachability Analysis

Recent studies reveal that Autonomous Vehicles (AVs) can be manipulated by hidden backdoors, causing them to perform harmful actions when activated by physical triggers.

GOOD: General Optimization-based Fusion for 3D Object Detection via LiDAR-Camera Object Candidates

In this paper, we propose GOOD, a general optimization-based fusion framework that can achieve satisfying detection without training additional models and is available for any combinations of 2D and 3D detectors to improve the accuracy and robustness of 3D detection.

FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization

One is to regularize the frequency range of NeRF's inputs, while the other is to penalize the near-camera density fields.

Efficient Gridless DoA Estimation Method of Non-uniform Linear Arrays with Applications in Automotive Radars

This paper focuses on the gridless direction-of-arrival (DoA) estimation for data acquired by non-uniform linear arrays (NLAs) in automotive applications.

AERK: Aligned Entropic Reproducing Kernels through Continuous-time Quantum Walks

For a pair of graphs, the proposed AERK kernel is defined by computing a reproducing-kernel-based similarity between the quantum Shannon entropies of each pair of their aligned vertices.

Multimodal Industrial Anomaly Detection via Hybrid Fusion

2D-based industrial anomaly detection has been widely discussed; however, multimodal industrial anomaly detection based on 3D point clouds and RGB images remains largely unexplored.

NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition

Neural networks have shown great potential in accelerating the solution of partial differential equations (PDEs).

GRAFS: Graphical Faceted Search System to Support Conceptual Understanding in Exploratory Search

When people search for information about a new topic within large document collections, they implicitly construct a mental model of the unfamiliar information space to represent what they currently know and guide their exploration into the unknown.

Mathematical models for order of mutation problem in myeloproliferative neoplasm: non-additivity and non-commutativity

In some patients of myeloproliferative neoplasm, two genetic mutations can be found: JAK2 V617F and TET2.

Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation

To address these limitations, we propose the Monte Carlo Neural PDE Solver (MCNP Solver) for training unsupervised neural solvers via the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.

GE-Blender: Graph-Based Knowledge Enhancement for Blender

Despite the great success of open-domain dialogue generation, unseen entities can have a large impact on the dialogue generation task.

Three facets of mathematical cancer biology research

In this review, we will discuss three mathematical approaches for studying cancer biology: population dynamics, gene regulation, and developmental biology.

Using Fano factors to determine certain types of gene autoregulation

These two propositions form a simple but robust method to infer the existence of autoregulation in certain scenarios from gene expression data.
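For readers unfamiliar with the statistic named in the title, the Fano factor of a set of counts is the variance divided by the mean, and a Poisson distribution has a Fano factor of 1. The sketch below only computes this quantity from data; the function name and the sample counts are illustrative assumptions, and the paper's propositions for inferring autoregulation are not reproduced here.

```python
from statistics import mean, pvariance


def fano_factor(counts):
    """Return variance / mean for a sequence of non-negative counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("Fano factor is undefined when the mean is zero")
    # Population variance, computed around the mean obtained above.
    return pvariance(counts, mu=m) / m


# Made-up expression counts from eight cells (illustrative only).
example_counts = [2, 4, 4, 4, 5, 5, 7, 9]
print(fano_factor(example_counts))
```

Whether the population or the sample variance is the right choice depends on the estimator one wants; the population variance is used here purely for simplicity.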

Super-Resolution Harmonic Retrieval of Non-Circular Signals

This paper proposes a super-resolution harmonic retrieval method for uncorrelated strictly non-circular signals, whose covariance and pseudo-covariance present Toeplitz and Hankel structures, respectively.

Cell Population Growth Kinetics in the Presence of Stochastic Heterogeneity of Cell Phenotype

Recent studies at individual cell resolution have revealed phenotypic heterogeneity in nominally clonal tumor cell populations.

Algorithms for the uniqueness of the longest common subsequence

In this paper, we consider how to determine the uniqueness of the longest common subsequence.
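As a point of reference, uniqueness can be checked on small inputs by enumerating every distinct longest common subsequence and testing whether exactly one exists. The sketch below does this with a standard length table plus a memoized collection step; it is a simple baseline under that assumption, not the algorithms proposed in the paper, and the function names are hypothetical. The number of distinct subsequences can grow exponentially, so this is only practical for short strings.

```python
def lcs_strings(a, b):
    """Return the set of all distinct longest common subsequences of a and b."""
    n, m = len(a), len(b)
    # length[i][j] is the LCS length of the suffixes a[i:] and b[j:].
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = max(length[i + 1][j], length[i][j + 1])

    memo = {}

    def collect(i, j):
        if i == n or j == m:
            return {""}
        if (i, j) in memo:
            return memo[(i, j)]
        if a[i] == b[j]:
            # Every LCS of the two suffixes starts with this shared character.
            result = {a[i] + rest for rest in collect(i + 1, j + 1)}
        else:
            result = set()
            # Follow only the branches that preserve the optimal length.
            if length[i + 1][j] == length[i][j]:
                result |= collect(i + 1, j)
            if length[i][j + 1] == length[i][j]:
                result |= collect(i, j + 1)
        memo[(i, j)] = result
        return result

    return collect(0, 0)


def has_unique_lcs(a, b):
    """True when a and b have exactly one distinct LCS (as a string)."""
    return len(lcs_strings(a, b)) == 1


print(has_unique_lcs("abcde", "ace"))
print(has_unique_lcs("abc", "acb"))
```

Here uniqueness is judged on the resulting string, so two different alignments that spell the same subsequence count once; a definition based on distinct alignments would need a different count.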

Robust Average-Reward Markov Decision Processes

We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.

SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory

Neural Radiance Fields (NeRF) have demonstrated superior novel view synthesis performance but are slow at rendering.

RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding

However, existing data augmentation techniques in natural language understanding (NLU) may not fully capture the complexity of natural language variations, and they can be challenging to apply to large datasets.

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data.

ATASI-Net: An Efficient Sparse Reconstruction Network for Tomographic SAR Imaging with Adaptive Threshold

In addition, an adaptive threshold is introduced for each azimuth-range pixel, enabling the threshold shrinkage to be not only layer-varied but also element-wise.


Detect-Localize-Repair: A Unified Framework for Learning to Debug with CodeT5

Specifically, we propose three objectives to adapt the generic CodeT5 for debugging: a bug detection objective to determine whether a given code snippet is buggy or not, a bug localization objective to identify the buggy lines, and a program repair objective to translate the buggy code to its fixed version.

Open-Set Object Detection Using Classification-free Object Proposal and Instance-level Contrastive Learning

To disambiguate unknown objects and background in the first subtask, we propose to use a classification-free region proposal network (CF-RPN), which estimates the objectness score of each region purely from cues about the object's location and shape, preventing overfitting to the training categories.

Compressive Spectrum Sensing Using Blind-Block Orthogonal Least Squares

In this paper, we propose a blind-block orthogonal least squares-based compressive spectrum sensing (B-BOLS-CSS) algorithm, which uses a novel blind stopping rule to remove the dependence on this prior information.

Compressive Spectrum Sensing Using Sampling-Controlled Block Orthogonal Matching Pursuit

To this end, the minimum number of required measurements for successful recovery is first derived in terms of its probabilistic lower bound.

HAQJSK: Hierarchical-Aligned Quantum Jensen-Shannon Kernels for Graph Classification

In this work, we propose a family of novel quantum kernels, namely the Hierarchical Aligned Quantum Jensen-Shannon Kernels (HAQJSK), for un-attributed graphs.

Downlink Massive MIMO Channel Estimation via Deep Unrolling : Sparsity Exploitations in Angular Domain

no code implementations • 31 Oct 2022 • An Chen, Wenbo Xu, Liyang Lu, Yue Wang

Compressive Sensing Rolling Shutter Correction

no code implementations • 30 Oct 2022 • Xiaorui Ding, Wenbo Xu, Yue Wang

Distributed Swarm Learning for Internet of Things at the Edge: Where Artificial Intelligence Meets Biological Intelligence

With the proliferation of versatile Internet of Things (IoT) services, smart IoT devices are increasingly deployed at the edge of wireless networks to perform collaborative machine learning tasks using locally collected data, giving rise to the edge learning paradigm.

Robust Distributed Learning Against Both Distributional Shifts and Byzantine Attacks

In this paper, we propose a new algorithm that equips distributed learning with robustness measures against both distributional shifts and byzantine attacks.

RING++: Roto-translation Invariant Gram for Global Localization on a Sparse Scan Map

In addition, we derive sufficient conditions of feature extractors for the representation preserving the roto-translation invariance, making RING++ a framework applicable to generic multi-channel features.


A Robust and Constrained Multi-Agent Reinforcement Learning Electric Vehicle Rebalancing Method in AMoD Systems

In this work, we design a robust and constrained multi-agent reinforcement learning (MARL) framework with state transition kernel uncertainty for EV AMoD systems.

Robust Constrained Reinforcement Learning

We then investigate a concrete example of $\delta$-contamination uncertainty set, design an online and model-free algorithm and theoretically characterize its sample complexity.

Finite-Time Error Bounds for Greedy-GQ

Our techniques in this paper provide a general approach for finite-sample analysis of non-convex two timescale value-based reinforcement learning algorithms.

CB-DSL: Communication-efficient and Byzantine-robust Distributed Swarm Learning on Non-i.i.d. Data

To address non-i.i.d. data issues and Byzantine attacks, global data samples are introduced in CB-DSL and shared among IoT workers, which not only alleviates local data heterogeneity effectively but also makes it possible to fully exploit the exploration-exploitation mechanism of swarm intelligence.

QC-ODKLA: Quantized and Communication-Censored Online Decentralized Kernel Learning via Linearized ADMM

We then propose a novel learning framework named Online Decentralized Kernel learning via Linearized ADMM (ODKLA) to efficiently solve the online decentralized kernel learning problem.


ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

In this work, we propose ViP3D, a query-based visual trajectory prediction pipeline that exploits rich information from raw videos to directly predict future trajectories of agents in a scene.

CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

To address the limitations, we propose "CodeRL", a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL).

Towards Two-view 6D Object Pose Estimation: A Comparative Study on Fusion Strategy

We ascertain that the Mid-Fusion approach is the best at restoring the most precise 3D keypoints useful for object pose estimation.

Predicting Stock Price Movement after Disclosure of Corporate Annual Reports: A Case Study of 2021 China CSI 300 Stocks

We conclude that, according to financial indicators based on a company's just-released annual report, the predictability of the stock price movement on the second day after disclosure is weak, with a maximum accuracy of about 59.6% and a maximum precision of about 0.56 on our test set using the random forest classifier, and that stock filtering does not improve the performance.

Deep Random Vortex Method for Simulation and Inference of Navier-Stokes Equations

To this end, we propose the Deep Random Vortex Method (DRVM), which combines the neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation.

VectorMapNet: End-to-end Vectorized HD Map Learning

To the best of our knowledge, VectorMapNet is the first work designed towards end-to-end vectorized map learning from onboard observations.

Collaborative Knowledge Graph Fusion by Exploiting the Open Corpus

To alleviate the challenges of building Knowledge Graphs (KG) from scratch, a more general task is to enrich a KG using triples from an open corpus, where the obtained triples contain noisy entities and relations.

MBGDT: Robust Mini-Batch Gradient Descent

In high dimensions, most machine learning methods are fragile even when there are only a few outliers.


Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward

The remarkable success of reinforcement learning (RL) heavily relies on observing the reward of every visited state-action pair.

DPCN++: Differentiable Phase Correlation Network for Versatile Pose Registration

Next, the rotation, scale, and translation are independently and efficiently estimated in the spectrum step-by-step using the DPC solver.


Policy Gradient Method For Robust Reinforcement Learning

We further develop a smoothed robust policy gradient method and show that to achieve an $\epsilon$-global optimum, the complexity is $\mathcal O(\epsilon^{-3})$.

Learning A Simulation-based Visual Policy for Real-world Peg In Unseen Holes

This paper proposes a learning-based visual peg-in-hole that enables training with several shapes in simulation, and adapting to arbitrary unseen shapes in real world with minimal sim-to-real cost.

TomoSAR-ALISTA: Efficient TomoSAR Imaging via Deep Unfolded Network

Synthetic aperture radar (SAR) tomography (TomoSAR) has attracted remarkable interest for its ability in achieving three-dimensional reconstruction along the elevation direction from multiple observations.

MUTR3D: A Multi-camera Tracking Framework via 3D-to-2D Queries

In contrast to prior works, MUTR3D does not explicitly rely on the spatial and appearance similarity of objects.

Neural Operator with Regularity Structure for Modeling Dynamics Driven by SPDEs

Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics.

Accurate Portraits of Scientific Resources and Knowledge Service Components

With the advent of the cloud computing era, the cost of creating, capturing and managing information has gradually decreased.

Blind Orthogonal Least Squares based Compressive Spectrum Sensing

As an enabling technique of cognitive radio (CR), compressive spectrum sensing (CSS) based on compressive sensing (CS) can detect the spectrum opportunities from wide frequency bands efficiently and accurately by using sub-Nyquist sampling rate.

A Visual Navigation Perspective for Category-Level Object Pose Estimation

In this paper, we take a deeper look at the inference of analysis-by-synthesis from the perspective of visual navigation, and investigate what is a good navigation policy for this specific task.

Academic Resource Text Level Multi-label Classification based on Attention

We propose an attention-based hierarchical multi-label classification algorithm for academic texts (AHMCA). By integrating features such as text, keywords, and hierarchical structure, it classifies academic documents into the most relevant categories.

FUTR3D: A Unified Sensor Fusion Framework for 3D Detection

Sensor fusion is an essential topic in many perception systems, such as autonomous driving and robotics.

PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks

Based on our theoretical analysis and experimental results, we demonstrate the effectiveness of PiDAn in defending against backdoor attacks that use different settings of poisoned samples on GTSRB and ILSVRC2012 datasets.

CtlGAN: Few-shot Artistic Portraits Generation with Contrastive Transfer Learning

We propose a new encoder which embeds real faces into Z+ space, and a dual-path training strategy to better cope with the adapted decoder and eliminate the artifacts.

Translation Invariant Global Estimation of Heading Angle Using Sinogram of LiDAR Point Cloud

Global point cloud registration is an essential module for localization, of which the main difficulty exists in estimating the rotation globally without initial value.

Inference on autoregulation in gene expression

These two propositions form a simple but robust method to infer the existence of autoregulation from gene expression data.

Writing Style Aware Document-level Event Extraction

This verifies the writing style contains valuable information that could improve the performance of the event extraction task.

Robust factored principal component analysis for matrix-valued outlier accommodation and detection

To solve the robustness problem suffered by FPCA and make it applicable to matrix data, in this paper we propose a robust extension of FPCA (RFPCA), which is built upon a $t$-type distribution called matrix-variate $t$ distribution.

Auto-Tag: Tagging-Data-By-Example in Data Lakes

As data lakes become increasingly popular in large enterprises today, there is a growing need to tag or classify data assets (e.g., files and databases) in data lakes with additional metadata (e.g., semantic column-types), as the inferred metadata can enable a range of downstream applications like data governance (e.g., GDPR compliance) and dataset search.


Transformer-based Network for RGB-D Saliency Detection

TFFM conducts a sufficient feature fusion by integrating features from multiple scales and two modalities over all positions simultaneously.

Weakly Supervised Prototype Topic Model with Discriminative Seed Words: Modifying the Category Prior by Self-exploring Supervised Signals

Recent generative dataless methods construct document-specific category priors using seed word occurrences only; however, such category priors often contain very limited and even noisy supervised signals.

EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation

We also evaluate two types of baseline on EventNarrative: a graph-to-text specific model and two state-of-the-art language models, which previous work has shown to be adaptable to the knowledge graph-to-text domain.

BEV-SGD: Best Effort Voting SGD for Analog Aggregation Based Federated Learning against Byzantine Attackers

As a promising distributed learning technology, analog aggregation based federated learning over the air (FLOA) provides high communication efficiency and privacy provisioning under the edge computing paradigm.

Object DGCNN: 3D Object Detection using Dynamic Graphs

Our method models 3D object detection as message passing on a dynamic graph, generalizing the DGCNN framework to predict a set of objects.

Revisiting Latent-Space Interpolation via a Quantitative Evaluation Framework

In this work, we show how data labeled with semantically continuous attributes can be utilized to conduct a quantitative evaluation of latent-space interpolation algorithms, for variational autoencoders.

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries

This top-down approach outperforms its bottom-up counterpart in which object bounding box prediction follows per-pixel depth estimation, since it does not suffer from the compounding error introduced by a depth prediction model.

Machine Translation Verbosity Control for Automatic Dubbing

Automatic dubbing aims at seamlessly replacing the speech in a video document with synthetic speech in a different language.

Are BERT Families Zero-Shot Learners? A Study on Their Potential and Limitations

Starting from the resurgence of deep learning, language models (LMs) have never been so popular.

Online Robust Reinforcement Learning with Model Uncertainty

In this paper, we focus on model-free robust RL, where the uncertainty set is defined to be centered at a misspecified MDP that generates a single sample trajectory sequentially and is assumed to be unknown.

Learning Interpretable BEV Based VIO without Deep Neural Networks

Specifically, we first adopt Unscented Kalman Filter as a differentiable layer to predict the pitch and roll, where the covariance matrices of noise are learned to filter out the noise of the IMU raw data.

Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation

Current monocular-based 6D object pose estimation methods generally achieve less competitive results than RGBD-based methods, mostly due to the lack of 3D information.

Domain Generalization for Vision-based Driving Trajectory Generation

In this paper, we propose a domain generalization method for vision-based driving trajectory generation for autonomous vehicles in urban environments, which can be seen as a solution to extend the Invariant Risk Minimization (IRM) method in complex problems.

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers.

A framework for massive scale personalized promotion

In order to do effective optimization in the second stage, counterfactual prediction and noise-reduction are essential for the first stage.


Inference on the structure of gene regulatory networks

For scenarios that have not been covered in literature, if the structure can be inferred, we propose new mathematical inference methods and evaluate them on simulated data.

HDMapNet: An Online HD Map Construction and Evaluation Framework

By introducing the method and metrics, we invite the community to study this novel map learning problem.

Improving Multi-Modal Learning with Uni-Modal Teachers

We name this problem Modality Failure, and hypothesize that the imbalance of modalities and the implicit bias of common objectives in fusion method prevent encoders of each modality from sufficient feature learning.

Improved Radar Localization on Lidar Maps Using Shared Embedding

We present a heterogeneous localization framework for solving radar global localization and pose tracking on pre-built lidar maps.

Incorporating NODE with Pre-trained Neural Differential Operator for Learning Dynamics

In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervised signal in the training of NODE.

Feature-based Style Randomization for Domain Generalization

As a recent noticeable topic, domain generalization (DG) aims to first learn a generic model on multiple source domains and then directly generalize to an arbitrary unseen target domain without any additional adaption.

Towards Modeling the Style of Translators in Neural Machine Translation

We show that our style-augmented translation models are able to capture the style variations of translators and to generate translations with different styles on new data.

Deep Multi-agent Reinforcement Learning for Highway On-Ramp Merging in Mixed Traffic

On-ramp merging is a challenging task for autonomous vehicles (AVs), especially in mixed traffic where AVs coexist with human-driven vehicles (HDVs).

On Feature Decorrelation in Self-Supervised Learning

In self-supervised representation learning, a common idea behind most of the state-of-the-art approaches is to enforce the robustness of the representations to predefined augmentations.

Joint Optimization of Communications and Federated Learning Over the Air

Federated learning (FL) is an attractive paradigm for making use of rich distributed data while protecting data privacy.

Non-Asymptotic Analysis for Two Time-scale TDC with General Smooth Function Approximation

Temporal-difference learning with gradient correction (TDC) is a two time-scale algorithm for policy evaluation in reinforcement learning.

1-Bit Compressive Sensing for Efficient Federated Learning Over the Air

For distributed learning among collaborative users, this paper develops and analyzes a communication-efficient scheme for federated learning (FL) over the air, which incorporates 1-bit compressive sensing (CS) into analog aggregation transmissions.

Efficient learning of goal-oriented push-grasping synergy in clutter

In this paper, a goal-conditioned hierarchical reinforcement learning formulation with high sample efficiency is proposed to learn a push-grasping policy for grasping a specific object in clutter.

Learn to Differ: Sim2Real Small Defection Segmentation Network

In this paper, we propose the network SSDS, which learns to distinguish small defects between two images regardless of the context, so that the network can be trained once and for all.

Collaborative Recognition of Feasible Region with Aerial and Ground Robots through DPCN

Taking advantage of the aerial robot's large-scale variation in viewpoints over the same route that the ground robot is on, the collaboration provides global road-segmentation information to the ground robot, enabling it to obtain the feasible region and adjust its pose ahead of time.

Using Prior Knowledge to Guide BERT's Attention in Semantic Textual Matching Tasks

We study the problem of incorporating prior knowledge into a deep Transformer-based model, i.e., Bidirectional Encoder Representations from Transformers (BERT), to enhance its performance on semantic textual matching tasks.

Radar-to-Lidar: Heterogeneous Place Recognition via Joint Learning

Place recognition is critical for both offline mapping and online localization.

SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training

Results show that: 1) applied to inference, SD achieves up to 2.44x energy efficiency as evaluated via real hardware implementations; 2) applied to training, SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance (e.g., energy cost and latency) of all the networks in the search space of both NAS-Bench-201 and FBNet, considering six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC).

SACoD: Sensor Algorithm Co-Design Towards Efficient CNN-powered Intelligent PhlatCam

PhlatCam, with its form factor potentially reduced by orders of magnitude, has emerged as a promising solution to the first aforementioned challenge, while the second one remains a bottleneck.

FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training

Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at the edge and the required massive training costs for state-of-the-art (SOTA) DNNs.


3D Point-to-Keypoint Voting Network for 6D Pose Estimation

In this paper, we propose a framework for 6D pose estimation from RGB-D data based on spatial structure characteristics of 3D keypoints.

6D Pose Estimation

CORAL: Colored structural representation for bi-modal place recognition

In this way, we fuse the structural features and visual features in the consistent bird-eye view frame, yielding a semantic representation, namely CORAL.

Visual Place Recognition

Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings

Further analyses show that our multi-head attention is able to attend to information from various aspects and boost classification or generation in diverse scenarios.

PREGAN: Pose Randomization and Estimation for Weakly Paired Image Style Translation

Utilizing the trained model under different conditions without data annotation is attractive for robot applications.

Self-supervised Representation Learning for Evolutionary Neural Architecture Search

To enhance the predictive performance of neural predictors, we devise two self-supervised learning methods from different perspectives to pre-train the architecture embedding part of neural predictors to generate a meaningful representation of neural architectures.

Improving the generalization of network based relative pose regression: dimension reduction as a regularizer

Through experiments on real-world RGB-D datasets, we validate the effectiveness of our design in improving both generalization performance and robustness to viewpoint change, and also show the potential of regression-based visual localization networks in challenging scenarios that are difficult for geometry-based visual localization methods.

3D Reconstruction, Dimensionality Reduction

DiSCO: Differentiable Scan Context with Orientation

In this paper, we propose a LiDAR-based place recognition method, named Differentiable Scan Context with Orientation (DiSCO), which simultaneously finds the scan at a similar place and estimates their relative orientation.

Imitation Learning of Hierarchical Driving Model: from Continuous Intention to Continuous Trajectory

One of the challenges in reducing the gap between machine and human-level driving is how to endow the system with the learning capacity to deal with the coupled complexity of environments, intentions, and dynamics.


Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks

To verify the effectiveness of our proposed method, we conduct extensive experiments on four real-world datasets and compare our method with state-of-the-art methods.

Inference on tissue transplantation experiments

This method can provide the most probable results of a group of experiments or the probability of a specific result for each experiment.

Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection

A common dilemma in 3D object detection for autonomous driving is that high-quality, dense point clouds are only available during training, but not testing.

Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product

We annotate a multimodal product attribute value dataset that contains 87,194 instances, and the experimental results on this dataset demonstrate that explicitly modeling the relationship between attributes and values helps our method establish the correspondence between them, and that selectively utilizing visual product information is necessary for the task.

RaLL: End-to-end Radar Localization on Lidar Map Using Differentiable Measurement Model

In this paper, we propose an end-to-end deep learning framework for Radar Localization on Lidar Map (RaLL) to bridge the gap, which not only achieves robust radar localization but also exploits mature lidar mapping techniques, thus reducing the cost of radar mapping.

Synergistic saliency and depth prediction for RGB-D saliency detection

Evaluation on seven RGB-D datasets demonstrates that, even without saliency ground truth for RGB-D datasets and using only the RGB data of RGB-D datasets at inference, our semi-supervised system performs favorably against state-of-the-art fully-supervised RGB-D saliency detection methods that use saliency ground truth for RGB-D datasets at training and depth data at inference on the two largest testing datasets.

Learning hierarchical behavior and motion planning for autonomous driving

To improve tactical decision-making in learning-based driving solutions, we introduce hierarchical behavior and motion planning (HBMP) to explicitly model the behavior in such solutions.

SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation

We present SmartExchange, an algorithm-hardware co-design framework to trade higher-cost memory storage/access for lower-cost computation, for energy-efficient inference of deep neural networks (DNNs).

