The emotion cause is a stimulus for human emotions.
The power of the framework is a novel difficulty assessment model, which forecasts how challenging an unlabelled sample is to the latest trained instance segmentation model.
Dense retrievers have achieved impressive performance, but their demand for abundant training data limits their application scenarios.
However, when the perturbation occurs earlier, the SDE model outperforms the ODE model, and we demonstrate that the error of sample generation due to pulse-shape error can be exponentially suppressed as the diffusion term's magnitude increases to infinity.
Clothing segmentation and fine-grained attribute recognition are challenging tasks at the crossing of computer vision and fashion, which segment the entire ensemble clothing instances as well as recognize detailed attributes of the clothing products from any input human images.
Through multiple quantitative metrics evaluated on our dataset and a user study, we demonstrate AnimeDiffusion outperforms state-of-the-art GANs-based models for anime face line drawing colorization.
Automatic colorization of anime line drawing has attracted much attention in recent years since it can substantially benefit the animation industry.
We present Twin Answer Sentences Attack (TASA), an adversarial attack method for question answering (QA) models that produces fluent and grammatical adversarial contexts while maintaining gold answers.
To this end, we propose SpikeSim, a tool that can perform realistic performance, energy, latency and area evaluation of IMC-mapped SNNs.
Pre-Training (PT) of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
Segmentation from point cloud data is essential in many applications such as remote sensing, mobile robots, or autonomous cars.
However, for high-level QM methods, such as density functional theory (DFT) at the meta-GGA level and/or with exact exchange, quantum Monte Carlo, etc., generating a sufficient amount of data for training a ML potential has remained computationally challenging due to their high cost.
On the theory side, we discuss how to tailor the velocity field to the target and establish general conditions under which the proposed estimator is a perfect estimator with zero-variance.
Generating high-quality textual adversarial examples is critical for investigating the pitfalls of natural language processing (NLP) models and further promoting their robustness.
We present IBR, an Iterative Backward Reasoning model to solve the proof generation tasks on rule-based Question Answering (QA), where models are required to reason over a series of textual rules and facts to find out the related proof path and derive the final answer.
Besides accelerating the computation using custom compute elements (CE) and in-memory computing, COIN aims at minimizing the intra- and inter-CE communication in GCN operations to optimize the performance and energy efficiency.
To alleviate the above data issues, we propose a data manipulation method, which is model-agnostic to be packed with any persona-based dialogue generation model to improve its performance.
For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e. g. mBART, the self-supervised pretraining task is trained on a wide range of monolingual languages, e. g. 25 languages from CommonCrawl, while the downstream cross-lingual tasks generally progress on a bilingual language subset, e. g. English-German, making there exists the data discrepancy, namely domain discrepancy, and cross-lingual learning objective discrepancy, namely task discrepancy, between the pretraining and finetuning stages.
To improve the IVF success rate, we propose a knowledge-based decision support system that can provide medical advice on the treatment protocol and medication adjustment for each patient visit during IVF treatment cycle.
We are interested in exploring its capability in human pose estimation, and thus propose a novel model based on transformer architecture, enhanced with a feature pyramid fusion structure.
In this thesis we present a semantic representation formalism based on directed graphs and explore its linguistic adequacy and explanatory benefits in the semantics of plurality and quantification.
With MIG, A100 can be the most cost-efficient GPU ever for serving Deep Neural Networks (DNNs).
In-memory computing (IMC) on a monolithic chip for deep learning faces dramatic challenges on area, yield, and on-chip interconnection cost due to the ever-increasing model sizes.
This paper presents a novel dataset for the development of visual navigation and simultaneous localisation and mapping (SLAM) algorithms as well as for underwater intervention tasks.
In this technique, we use analytical models of NoC to evaluate end-to-end communication latency of any given DNN.
Our method leverages structured sparsification to reduce computational cost without hurting the model capacity at the end of offline training so that a full-size model is available in the recurring training stage to learn new data in real-time.
The model encodes discourse information as a graph with elementary discourse units (EDUs) and discourse relations, and learns the discourse-aware features via a graph network for downstream QA tasks.
Ranked #24 on Reading Comprehension on ReClor
Apart from recovering the inference accuracy, our RA-BNN after growing also shows significantly higher resistance to BFA.
All these factors pose a significant challenge to effective polyp detection in a colonoscopy.
This paper presents the design of a tune-free (human-out-of-the-loop parameter tuning) control framework, aiming at accelerating large scale autonomous driving system deployed on various vehicles and driving environments.
In this work, we study dialogue models with multiple input sources adapted from the pretrained language model GPT2.
Many techniques have been developed, such as model compression, to make Deep Neural Networks (DNNs) inference more efficiently.
This paper builds upon the success of previous models and develops a novel architecture, which combines object segmentation and convolutional neural networks (CNN) to construct an effective classifier of ROP stages 1-3 based on neonatal retinal images.
High-fidelity clothing reconstruction is the key to achieving photorealism in a wide range of applications including human digitization, virtual try-on, etc.
The first one is that our dataset is not fully labeled, i. e., only a subset of all lesion instances are marked.
This classifier branch is equipped with Feature Aggregation and Local Magnification Layers to enhance the classifier branch.
The model we design consists of an encoder to extract multi-scale semantic features and a decoder to expand the feature maps to a polyp segmentation map.
We analyze the lesion-vs-image scale carefully and propose a large-size feature pyramid network (LFPN) to preserve more image details for mini lesion instance detection.
On the other hand, it further reduces domain distribution discrepancy through conditional adversarial learning across domains.
Inspired by the observation that brain networks follow the Small-World model, we propose a novel structural pruning scheme, which includes (1) hierarchically trimming the network into a Small-World model before training, (2) training the network for a given dataset, and (3) optimizing the network for accuracy.
1 code implementation • • Alex Warstadt, Yu Cao, Ioana Grosu, Wei Peng, Hagen Blix, Yining Nie, Anna Alsop, Shikha Bordia, Haokun Liu, Alicia Parrish, Sheng-Fu Wang, Jason Phang, Anhad Mohananey, Phu Mon Htut, Paloma Jeretič, Samuel R. Bowman
We conclude that a variety of methods is necessary to reveal all relevant aspects of a model's grammatical knowledge in a given domain.
Though challenging, with the great advances in object detection techniques, automated polyp detection still demonstrates a great potential in reducing the false negative rate while maintaining a high precision.
The anchor mechanism is removed and lesions are formalized as single keypoints.
We are interested in the optimal scheduling of a collection of multi-component application jobs in an edge computing system that consists of geo-distributed edge computing nodes connected through a wide area network.
We present a 3D Convolutional Neural Networks (CNNs) based single shot detector for spatial-temporal action detection tasks.
Training of convolutional neural networks (CNNs)on embedded platforms to support on-device learning is earning vital importance in recent days.
Such a system requires learning from the data stream, training the model to preserve previous information and adapt to a new task, and generating a single-headed vector for future inference.
A typical training pipeline to mitigate over-parameterization is to pre-define a DNN structure first with redundant learning units (filters and neurons) under the goal of high accuracy, then to prune redundant learning units after training with the purpose of efficient inference.
Today a canonical approach to reduce the computation cost of Deep Neural Networks (DNNs) is to pre-define an over-parameterized model before training to guarantee the learning capacity, and then prune unimportant learning units (filters and neurons) during training to improve model compactness.
Graph convolutional networks are used to obtain a relation-aware representation of nodes for entity graphs built from documents with multi-level features.
no code implementations • 23 May 2018 • Chetan Singh Thakur, Jamal Molin, Gert Cauwenberghs, Giacomo Indiveri, Kundan Kumar, Ning Qiao, Johannes Schemmel, Runchun Wang, Elisabetta Chicca, Jennifer Olson Hasler, Jae-sun Seo, Shimeng Yu, Yu Cao, André van Schaik, Ralph Etienne-Cummings
Neuromorphic engineering (NE) encompasses a diverse range of approaches to information processing that are inspired by neurobiological systems, and this feature distinguishes neuromorphic systems from conventional computing systems.
The proposed generative sensing framework aims at transforming low-end, low-quality sensor data into higher quality sensor data in terms of achieved classification accuracy.
We present a new back propagation based training algorithm for discrete-time spiking neural networks (SNN).
In this paper, we present DeepDSL, a domain specific language (DSL) embedded in Scala, that compiles deep networks written in DeepDSL to Java source code.
We applied our proposed approach to two real-world food image data sets (UEC-256 and Food-101) and achieved impressive results.
In this paper, we propose a method to compress deep neural networks by using the Fisher Information metric, which we estimate through a stochastic optimization method that keeps track of second-order information in the network.
By adopting a natural and widely used assumption -- "the data samples from the same class should lay on a low-dimensional subspace, even if they come from different domains", the proposed method circumvents the limitation of the global domain shift, and solves the cross-domain recognition by finding the compact joint subspaces of source and target domain.
In this paper, we propose a new method that can recognize human activities from partially observed videos in the general case.