Complex question generation over knowledge bases (KB) aims to generate natural language questions involving multiple KB relations or functional constraints.
State-of-the-art approaches to video-based action and gesture recognition often employ two key concepts: First, they employ multistream processing; second, they use an ensemble of convolutional networks.
Ranked #1 on Action Classification on Jester test
Seq2seq神经网络模型在中英文文本摘要的研究中取得了良好的效果, 但在低资源语言的文本摘要研究还处于探索阶段, 尤其是在藏语中。此外, 目前还没有大规模的标注语料库进行摘要提取。本文提出了一种生成藏文新闻摘要的统一模型。利用TextRank算法解决了藏语标注训练数据不足的问题。然后, 采用两层双GRU神经网络提取代表原始新闻的句子, 减少冗余信息。最后, 使用基于注意力机制的Seq2Seq来生成理解式摘要。同时, 我们加入了指针网络来处理未登录词的问题。实验结果表明, ROUGE-1评分比传统模型提高了2%。 关键词:文本摘要;藏文;TextRank; 指针网络;Bi-GRU
Comparing with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) directly outputs an integrate summary in the form of sub-graph which is more informative and coherent.
We propose novel approaches for simultaneously identifying important weights of a convolutional neural network (ConvNet) and providing more attention to the important weights during training.
“语义依存图是NLP处理语义的深层分析方法, 能够对句子中词与词之间的语义进行分析。该文针对古代汉语特点, 在制定古代汉语语义依存图标注规范的基础上, 以《二十四史》为语料来源, 完成标注了规模为3000句的古代汉语语义依存图库, 标注一致性的kappa值为78. 83%。通过与现代汉语语义依存图库的对比, 对依存图库基本情况进行统计, 分析古代汉语的语义特色和规律。统计显示, 古代汉语语义分布宏观上符合齐普夫定律, 在语义事件描述上具有强烈的历史性叙事和正式文体特征, 如以人物纪传为中心, 时间、地点等周边角色描述细致, 叙事语言冷静客观, 缺少描述情态、语气、程度、时间状态等的修饰词语等。 "
“基于深度学习的有监督机器翻译取得了良好的效果, 但训练过程中需要大量质量较高的对齐语料。对于中文古今翻译场景, 高质量的平行语料并不多, 而粗对齐的篇章、段语料比较容易获得, 因此语料对齐很有研究价值和研究必要。在传统双语平行语料的句子对齐研究中, 传统方法根据双语文本中的长度、词汇、共现文字等语法信息, 建立一个综合评判标准来衡量两个句对之间相似度。此类方法虽然在单句对齐上取得了较好的效果, 但是对于句子语义匹配的能力有限, 并且在一些多对多的对齐模式上的性能表现不佳。在本文中我们提出尝试利用现在发展迅速且具有强大语义表示能力的预训练语言模型来考虑双语的语义信息, 但是单独使用预训练语言模型只能考虑相对局部的信息, 因此我们提出采用基于动态规划算法的强化学习训练目标来整合段落全局信息, 并且进行无监督训练。实验结果证明我们提出的方法训练得到的模型性能优于此前获得最好表现的基线模型, 尤其相较于传统模型难以处理的多对多对齐模式下, 性能提升较大。”
As a result, they perform poorly on the real generated text and are biased heavily by their single-source upstream tasks.
Real-world image super-resolution (RISR) has received increased focus for improving the quality of SR images under unknown complex degradation.
However, these UDA solutions just yield unsatisfactory 3D detection results when there is a severe domain shift, e. g., from Waymo (64-beam) to nuScenes (32-beam).
By introducing stochastic prediction and the parallel encoder-decoder, SAIM significantly improve the performance of autoregressive image modeling.
However, existing datasets for unsupervised anomaly detection are biased towards manufacturing inspection, not considering maintenance inspection which is usually conducted under outdoor uncontrolled environment such as varying camera viewpoints, messy background and degradation of object surface after long-term working.
Recognizing out-of-distribution (OOD) samples is critical for machine learning systems deployed in the open world.
However, it is difficult to apply such networks to 3D object detection in autonomous driving due to its large computation cost and slow reasoning speed.
In addition, we delicately examine the explainability of the CBR system in the decision-making process of bankruptcy prediction.
We first measure a model's factual robustness by its success rate to defend against adversarial attacks when generating factual information.
Diffusion generative models have recently greatly improved the power of text-conditioned image generation.
Recently methods of graph neural networks (GNNs) have been applied to solving the problems in high energy physics (HEP) and have shown its great potential for quark-gluon tagging with graph representation of jet events.
Though model robustness has been extensively studied in language understanding, the robustness of Seq2Seq generation remains understudied.
Multiple predictive path points are dynamically generated by a deep Markov model optimized using RL approach for robot to track.
This is actually a matching task between a query and candidate entities based on their historical structures, which reflect behavioral trends of the entities at different timestamps.
Based on this consideration, we propose a novel framework to learn the geometric primitives shared in seen and unseen categories' objects, where the learned geometric primitives are served for transferring knowledge from seen to unseen categories.
Prompt tuning, a parameter- and data-efficient transfer learning paradigm that tunes only a small number of parameters in a model's input space, has become a trend in the vision community since the emergence of large vision-language models like CLIP.
Our method can model the common pattern behind different stocks with a meta-learner, while modeling the specific pattern for each stock across time spans with stock-dependent parameters.
In order to comprehensively investigate the influence caused by the misalignment, we proposed a method for estimating the performance of a 4f-ONN in response to various misalignment in the context of the image classification task. The misalignment in numerical simulation is estimated by manipulating the optical intensity distributions in the fourth focus plane in the 4f system.
6 code implementations • 5 Oct 2022 • Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li
The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.
Obj2Seq is able to flexibly determine input categories to satisfy customized requirements, and be easily extended to different visual tasks.
While Reinforcement Learning can achieve impressive results for complex tasks, the learned policies are generally prone to fail in downstream tasks with even minor model mismatch or unexpected perturbations.
Our key insight is that dynamic systems with different parameters provide different levels of difficulty for the policy, and the difficulty of behaving well in a system is constantly changing due to the evolution of the policy.
In view of this, we propose a new goal area-based framework, named Goal Area Network (GANet), for motion forecasting, which models goal areas rather than exact goal coordinates as preconditions for trajectory prediction, performing more robustly and accurately.
Ranked #2 on Motion Forecasting on Argoverse 2 Motion Forecasting
Because each Guzheng playing technique is applied to a note, a dedicated onset detector is trained to divide an audio into several notes and its predictions are fused with frame-wise IPT predictions.
Specifically, a channel separation-aggregation (CSA) structure is designed to simplify the complexity of stacked separable convolutions, and a dynamic receptive field (DRF) mechanism is developed to maintain high accuracy by customizing the convolution kernel and its perception range dynamically when reducing the network complexity.
Self-training has shown great potential in semi-supervised learning.
The existing AOOD methods face the challenges of ambiguity and high costs in angle representation.
However, the inconsistent features for the localization and classification tasks in AOOD models may lead to ambiguity and low-quality object predictions, which constrains the detection performance.
Text information including extensive prior knowledge about land cover classes has been ignored in hyperspectral image classification (HSI) tasks.
Currently, cross-scene hyperspectral image (HSI) classification has drawn increasing attention.
Second, to improve the robustness of binary models with contextual dependencies, we compute the contextual dynamic embeddings to determine the binarization thresholds in general binary convolutional blocks.
Existing top-performance 3D object detectors typically rely on the multi-modal fusion strategy.
Graph coloring, a classical and critical NP-hard problem, is the problem of assigning connected nodes as different colors as possible.
To alleviate these issues, we propose to simultaneously conduct feature alignment in two individual spaces focusing on different domains, and create for each space a domain-oriented classifier tailored specifically for that domain.
Then, Next Hybrid Strategy (NHS) is designed to stack NCB and NTB in an efficient hybrid paradigm, which boosts performance in various downstream tasks.
Ranked #233 on Image Classification on ImageNet
To further leverage these two paradigms, we propose a selective and joint HDR and denoising (SJ-HD$^2$R) imaging framework, utilizing scenario-specific priors to conduct the path selection with an accuracy of more than 93. 3$\%$.
We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models.
To address these issues, we establish a new high-quality dataset named RealRain-1k, consisting of $1, 120$ high-resolution paired clean and rainy images with low- and high-density rain streaks, respectively.
To this end, a multi-patch attention network (MPANet) based on the axial-attention encoder and the multi-scale patch branch (MSPB) structure is proposed.
From the time the MVTec AD dataset was proposed to the present, new research methods that are constantly being proposed push its precision to saturation.
Constraint violation has been a building block to design evolutionary multi-objective optimization algorithms for solving constrained multi-objective optimization problems.
no code implementations • 25 May 2022 • Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, Javen Qinfeng Shi, Dong Gong, Dan Zhu, Mengdi Sun, Guannan Chen, Yang Hu, Haowei Li, Baozhu Zou, Zhen Liu, Wenjie Lin, Ting Jiang, Chengzhi Jiang, Xinpeng Li, Mingyan Han, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Juan Marín-Vega, Michael Sloth, Peter Schneider-Kamp, Richard Röttger, Chunyang Li, Long Bao, Gang He, Ziyao Xu, Li Xu, Gen Zhan, Ming Sun, Xing Wen, Junlin Li, Shuang Feng, Fei Lei, Rui Liu, Junxiang Ruan, Tianhong Dai, Wei Li, Zhan Lu, Hengyan Liu, Peian Huang, Guangyu Ren, Yonglin Luo, Chang Liu, Qiang Tu, Fangya Li, Ruipeng Gang, Chenghua Li, Jinjing Li, Sai Ma, Chenming Liu, Yizhen Cao, Steven Tel, Barthelemy Heyrman, Dominique Ginhac, Chul Lee, Gahyeon Kim, Seonghyun Park, An Gia Vien, Truong Thanh Nhat Mai, Howoon Yoon, Tu Vo, Alexander Holston, Sheir Zaheer, Chan Y. Park
The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i. e. solutions can not exceed a given number of operations).
The ability of capturing fine spectral discriminative information enables hyperspectral images (HSIs) to observe, detect and identify objects with subtle spectral discrepancy.
In this paper, the depiction of $(O, G)$-granular variable precision fuzzy rough sets ($(O, G)$-GVPFRSs for short) is first given based on overlap and grouping functions.
In order to further generalize the FRS theory to more complicated data environments, we firstly propose four types of fuzzy neighborhood operators based on fuzzy covering by overlap functions and their implicators in this paper.
In this paper, we mainly construct three types of $L$-fuzzy $\beta$-covering-based rough set models and study the axiom sets, matrix representations and interdependency of these three pairs of $L$-fuzzy $\beta$-covering-based rough approximation operators.
We present Answer-Me, a task-aware multi-task framework which unifies a variety of question answering tasks, such as, visual question answering, visual entailment, visual reasoning.
Although few-shot learning has attracted much attention from the fields of image and audio classification, few efforts have been made on few-shot speaker identification.
Sign language recognition and translation first uses a recognition module to generate glosses from sign language videos and then employs a translation module to translate glosses into spoken sentences.
The problem of effectively exploiting the information multiple data sources has become a relevant but challenging research topic in remote sensing.
In this paper, we propose a unified network for TAD, termed Faster-TAD, by re-purposing a Faster-RCNN like architecture.
SEAL consists of two kinds of annotations, SEAL Tubes and SEAL Clips.
Considering the characteristics and differences of multi-source remote sensing images, a feature-based registration algorithm named Multi-scale Histogram of Local Main Orientation (MS-HLMO) is proposed.
We propose FindIt, a simple and versatile framework that unifies a variety of visual grounding and localization tasks including referring expression comprehension, text-based localization, and object detection.
We propose a memory efficient method, named Stochastic Backpropagation (SBP), for training deep neural networks on videos.
We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span.
To this end, we propose a novel open-vocabulary detector based on DETR -- hence the name OV-DETR -- which, once trained, can detect any object given its class name or an exemplar image.
Ranked #6 on Open Vocabulary Object Detection on MSCOCO
Promising performance has been achieved for visual perception on the point cloud.
In particular, we propose to conduct grounded learning on both images and texts via a sharing grounded space, which helps bridge unaligned images and texts, and align the visual and textual semantic spaces on different types of corpora.
Furthermore, these models are all trained offline, which cannot well adapt to the changes of evolutional patterns from then on.
Recently, adversarial attacks have been applied in visual object tracking to deceive deep trackers by injecting imperceptible perturbations into video frames.
Furthermore, our method can also exploit single-centric-object dataset such as ImageNet and outperforms BYOL by 2. 5% with the same pre-training epochs in linear probing, and surpass current self-supervised object detection methods on COCO dataset, demonstrating its universality and potential.
In this survey, we provide a systematic overview of the research progress on the faithfulness problem of NLG, including problem analysis, evaluation metrics and optimization methods.
A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability.
Ranked #1 on Semi-Supervised Semantic Segmentation on Pascal VOC 2012 12.5% labeled (using extra training data)
In this paper, we address this issue by explicitly leveraging temporal motion cues and propose DMT, a Detector-free Motion prediction based 3D Tracking network that totally removes the usage of complicated 3D detectors, which is lighter, faster, and more accurate than previous trackers.
Moreover, we utilize multi-source information (e. g., MFCC and deep features) to further improve the scoring system performance.
Chorus detection is a challenging problem in musical signal processing as the chorus often repeats more than once in popular songs, usually with rich instruments and complex rhythm forms.
In this paper, we propose TONet, a plug-and-play model that improves both tone and octave perceptions by leveraging a novel input representation and a novel network architecture.
Multi-temporal hyperspectral images can be used to detect changed information, which has gradually attracted researchers' attention.
Extracting roads from high-resolution remote sensing images (HRSIs) is vital in a wide variety of applications, such as autonomous driving, path planning, and road navigation.
Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) have recently been integrated to take the advantage of the both methods for better exploration and exploitation. The evolutionary part in these hybrid methods maintains a population of policy networks. However, existing methods focus on optimizing the parameters of policy network, which is usually high-dimensional and tricky for EA. In this paper, we shift the target of evolution from high-dimensional parameter space to low-dimensional action space. We propose Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel hybrid method of EA and DRL. In EAS, we focus on optimizing the action chosen by the policy network and attempt to obtain high-quality actions to promote policy learning through an evolutionary algorithm.
A multimode fiber represents the ultimate limit in miniaturization of imaging endoscopes.
In contrast, our large-scale VIdeo Panoptic Segmentation in the Wild (VIPSeg) dataset provides 3, 536 videos and 84, 750 frames with pixel-level panoptic annotations, covering a wide range of real-world scenarios and categories.
However, the application value of SG on downstream tasks is severely limited by the predicate classification bias, which is caused by long-tailed data and presented as semantic bias of predicted relation predicates.
Recently, there have been many advances in autonomous driving society, attracting a lot of attention from academia and industry.
With the DANN, only a small fraction of input configurations (2d images) needs to be labeled, which is automatically chosen, in order to capture the critical point.
RDIM and region fitting do not require extra running time and these three steps can be well integrated into other attacks.
On the one hand, audio encoder and visual encoder separately encode audio data and visual data into two different latent spaces.
Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain.
However, current methods can not effectively map image features to a tractable base distribution and ignore the relationship between local and global features which are important to identify anomalies.
Ranked #5 on Anomaly Detection on MVTec AD (using extra training data)
Owing to effective and flexible data acquisition, unmanned aerial vehicle (UAV) has recently become a hotspot across the fields of computer vision (CV) and remote sensing (RS).
Comparing with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) directly outputs an integrate summary in the form of sub-graph which is more informative and coherent.
Moreover, we make a key observation that subtle forgery artifacts can be further exposed in the patch-wise phase and amplitude spectrum and exhibit different clues.
In this paper, we study a new problem named Referring Self-supervised Learning (RSL) on 3D scene understanding: Given the 3D synthetic models with labels and the unlabeled 3D real scene scans, our goal is to distinguish the identical semantic objects on an unseen scene according to the referring synthetic 3D models.
Specifically, an anchor-free object-adaptation label assignment (OLA) strategy is presented to define the positive candidates based on two-dimensional (2-D) oriented Gaussian heatmaps, which reflect the shape and direction features of arbitrary-oriented objects.
Ranked #24 on Object Detection In Aerial Images on DOTA
The advantage of the proposed method is higher efficiency, more accurate, and less hyperparameters than the strong form PINN with subdomains.
Previous works on the Recurrent Neural Network-Transducer (RNN-T) models have shown that, under some conditions, it is possible to simplify its prediction network with little or no loss in recognition accuracy (arXiv:2003. 07705 [eess. AS], , arXiv:2012. 06749 [cs. CL]).
In this paper, we benchmark our model on these three tasks.
1 code implementation • 30 Aug 2021 • Gui-Song Xia, Jian Ding, Ming Qian, Nan Xue, Jiaming Han, Xiang Bai, Michael Ying Yang, Shengyang Li, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang, Qiang Zhou, Chao-hui Yu, Kaixuan Hu, Yingjia Bu, Wenming Tan, Zhe Yang, Wei Li, Shang Liu, Jiaxuan Zhao, Tianzhi Ma, Zi-han Gao, Lingqi Wang, Yi Zuo, Licheng Jiao, Chang Meng, Hao Wang, Jiahao Wang, Yiming Hui, Zhuojun Dong, Jie Zhang, Qianyue Bao, Zixiao Zhang, Fang Liu
This report summarizes the results of Learning to Understand Aerial Images (LUAI) 2021 challenge held on ICCV 2021, which focuses on object detection and semantic segmentation in aerial images.
Trading volume movement prediction is the key in a variety of financial applications.
Besides the security concerns of potential adversarial examples, adversarial training can also improve the generalization ability of neural networks, train robust neural networks, and provide interpretability for neural networks.
no code implementations • 19 Aug 2021 • Mingren Shen, Guanzhao Li, Dongxia Wu, YuHan Liu, Jacob Greaves, Wei Hao, Nathaniel J. Krakauer, Leah Krudy, Jacob Perez, Varun Sreenivasan, Bryan Sanchez, Oigimer Torres, Wei Li, Kevin Field, Dane Morgan
Electron microscopy is widely used to explore defects in crystal structures, but human detecting of defects is often time-consuming, error-prone, and unreliable, and is not scalable to large numbers of images or real-time analysis.
To tackle this issue, we propose Semantic Concentration for Domain Adaptation (SCDA), which encourages the model to concentrate on the most principal features via the pair-wise adversarial alignment of prediction distributions.
The frequency-guided upsampling module reconstructs details from multiple frequency-specific components with rich details.
This paper describes our submission to SemEval 2021 Task 2.
The rapid development of artificial intelligence methods contributes to their wide applications for forecasting various financial risks in recent years.
Phonation mode is an essential characteristic of singing style as well as an important expression of performance.
As a result, we can leverage the spatial information (the size of objects), temporal information (the direction and magnitude of motions) as our learning target.
We design a new instance-to-track matching objective to learn appearance embedding that compares a candidate detection to the embedding of the tracks persisted in the tracker.
We integrate this ranking scheme with two frequency models and a GPT-2 styled language model, along with the acceptance model to yield 27. 80% and 37. 64% increase in TOP1 and TOP5 accuracy, respectively.
More importantly, the masked tokens together with the remaining tokens are further recovered by a global image decoder, which preserves the spatial information of the image and is more friendly to the downstream dense prediction tasks.
Generative Adversarial Networks (GAN) have promoted a variety of applications in computer vision, natural language processing, etc.
Specifically, at the clue searching stage, CluSTeR learns a beam search policy via reinforcement learning (RL) to induce multiple clues from historical facts.
In order to accelerate the research for domain-specific knowledge graphs in the medical domain, we introduce DiaKG, a high-quality Chinese dataset for Diabetes knowledge graph, which contains 22, 050 entities and 6, 890 relations in total.
Abstractive summarization for long-document or multi-document remains challenging for the Seq2Seq architecture, as Seq2Seq is not good at analyzing long-distance relations in text.
Signed network embedding is an approach to learn low-dimensional representations of nodes in signed networks with both positive and negative links, which facilitates downstream tasks such as link prediction with general data mining frameworks.
To capture these properties effectively and efficiently, we propose a novel Recurrent Evolution network based on Graph Convolution Network (GCN), called RE-GCN, which learns the evolutional representations of entities and relations at each timestamp by modeling the KG sequence recurrently.
We tackle the problem of object completion from point clouds and propose a novel point cloud completion network employing an Asymmetrical Siamese Feature Matching strategy, termed as ASFM-Net.
To achieve these results, we pose discovering attack paths as a Reinforcement Learning (RL) problem and train an agent to discover multi-domain cyberspace attack paths.
no code implementations • 9 Apr 2021 • Léni K. Le Goff, Edgar Buchanan, Emma Hart, Agoston E. Eiben, Wei Li, Matteo De Carlo, Alan F. Winfield, Matthew F. Hale, Robert Woolley, Mike Angus, Jon Timmis, Andy M. Tyrrell
This causes a potential mismatch between the structure of an inherited controller and the new body.
We present a novel method for single image depth estimation using surface normal constraints.
Domain adaptation (DA) enables knowledge transfer from a labeled source domain to an unlabeled target domain by reducing the cross-domain distribution discrepancy.
To remedy this, we propose a Transferable Semantic Augmentation (TSA) approach to enhance the classifier adaptation ability through implicitly generating source features towards target semantics.
We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).
With the path-specific parameters obtained by the proposed channel tracking, the proposed PTRM can not only match the time dispersion as conventional PTRM, but also the doubly-spread channel, since the path-specific delay and Doppler scaler factor can help to match the channel in both time and frequency domain.
A model based on the underwater acoustic channel's correlation can be used as the state-space model in the Kalman filter to improve the underwater acoustic channel tracking compared that without a model.
(3) The detection of neural disease was demonstrated to be benefit from thermodynamic model, implying the immense potential of thermodynamics in auxiliary diagnosis.
A simple model is presented that explains absorption line-shapes of disordered systems, and we also provide a strategy to determine the excitonic disorder energy.
Optics Disordered Systems and Neural Networks
1 code implementation • • Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel
The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption.
Strong fluctuations in the low-$T$ quantum critical regime can give rise to a large thermal entropy change and thus significant cooling effect when approaching the QCP.
Strongly Correlated Electrons
In this paper, we propose LEAD, i. e., LiDAR Extender for Autonomous Driving, to extend the MEMS LiDAR by coupled image w. r. t both FoV and range.
Quantum Ising model on a triangular lattice hosts a finite temperature Berezinskii-Kosterlitz-Thouless (BKT) phase with emergent U(1) symmetry, and it will transit into an up-up-down (UUD) phase with $C_3$ symmetry breaking upon an infinitesimal external field along the longitudinal direction, but the overall phase diagram spanned by the axes of external field and temperature remains opaque due to the lack of systematic invesitgations with controlled methodologies.
Strongly Correlated Electrons Statistical Mechanics
The physical essences of the quantum critical points are determined by analyzing the susceptibility exponents for all of the source terms in particle-hole and particle-particle channels.
Strongly Correlated Electrons Materials Science
In the context of trade liberalisation and market harmonisation in the European markets, accurate price forecasting becomes difficult for electricity market participants to obtain because electricity forecasting requires the consideration of features from ever-growing coupling markets.
The topical domain generalization (DG) problem asks trained models to perform well on an unseen target domain with different data statistics from the source training domains.
In this paper, we probe this direction by deriving a relationship between the estimation of unknown parameters of the probability density function (pdf) of input data and classification accuracy.
To characterize the power of GNNs for the graph coloring problem, we first formalize the discrimination power of GNNs as the capability to assign nodes different colors.
Existed pre-training methods either focus on single-modal tasks or multi-modal tasks, and cannot effectively adapt to each other.
Ranked #3 on Image Captioning on COCO
In order to remove this factor, we first develop gradient descent averaging (GDA), which is a general projection-based dual averaging algorithm in the strongly convex setting.
Recent deep learning based methods focus on learning clustering oriented representations.
In this work, we present a lightweight, tightly-coupled deep depth network and visual-inertial odometry (VIO) system, which can provide accurate state estimates and dense depth maps of the immediate surroundings.
When DSA module and object confidence task are applied in RetinaNet together, the detection performances based on ResNet50 and ResNet101 can be increased by 1. 0% AP and 1. 4% AP respectively.
We present a novel method for multi-view depth estimation from a single video, which is a critical task in various applications, such as perception, reconstruction and robot navigation.
However, we found that in the outdoor point cloud, the improvement obtained in this way is quite limited.
Ranked #2 on 3D Semantic Segmentation on ScribbleKITTI
However, the negative pressure waves or guided stress waves may not be easily detected with environmental interference, e. g., the oil and gas pipelines in offshore environment.
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system.
This concept represents a novel one-dimensional realization of artificial neural networks, enabling an efficient application of optical deep learning methods to the analysis and processing of serial data signals, while offering a new overall perspective for the temporal signal processing.
In this paper, we propose Adversarial Privacy Graph Embedding (APGE), a graph adversarial training framework that integrates the disentangling and purging mechanisms to remove users' private information from learned node representations.
The 2nd-pass model plays a key role in the quality improvement of the end-to-end model to surpass the conventional model.
One of the current state-of-the-art multilingual document embedding model LASER is based on the bidirectional LSTM neural machine translation model.
Transfer learning-based methods address this problem by pre-training in the source domain and fine-tuning on the target domain.
Current computer vision tasks based on deep learning require a huge amount of data with annotations for model training or testing, especially in some dense estimation tasks, such as optical flow segmentation and depth estimation.
We then propose a structure-aware network for lane marker extraction in DVS images.
In this paper, we propose a novel method (TEAM, Taylor Expansion-Based Adversarial Methods) to generate more powerful adversarial examples than previous methods.
To get clear street-view and photo-realistic simulation in autonomous driving, we present an automatic video inpainting algorithm that can remove traffic agents from videos and synthesize missing regions with the guidance of depth/point cloud.
Ranked #1 on Image Inpainting on ApolloScape