Contrastive Language-Image Pretraining (CLIP) has demonstrated impressive zero-shot learning abilities for image understanding, yet limited effort has been made to investigate CLIP for zero-shot video recognition.
While the US is not a standard paradigm for spinal surgery, the scarcity of intra-operative clinical US data is an insurmountable bottleneck in training a neural network.
Molecular property calculations are the bedrock of chemical physics.
In this work, we propose a novel method, FedStyle, to learn a more generalized global model by infusing local style information with local content information for contrastive learning, and to learn more personalized local models by inducing local style information for downstream tasks.
In this paper, we present a machine learning framework that uses the bounds of the benefit function that are estimable from the finite population data to learn the bounds of the benefit function for each cell of characteristics.
The proposed method uses the framework of adversarial learning to construct a video multimodal feature fusion network and a feature mapping network as generator, a modality discrimination network as discriminator.
Quantum Machine Learning continues to be a highly active area of interest within Quantum Computing.
The assumption is that one is in possession of a large enough sample to permit an accurate estimation of the experimental and observational distributions.
The unit selection problem defined by Li and Pearl identifies individuals who have desired counterfactual behavior patterns, for example, individuals who would respond positively if encouraged and would not otherwise.
Then the whole heterogeneous information network is transformed into a hypergraph composed of different hyperedges.
To solve the above challenges, aiming at the data information of scientific research teams closely related to science and technology, we proposed an academic heterogeneous information network embedding representation learning method based on federated learning (FedAHE), which utilizes node attention and meta path attention mechanism to learn low-dimensional, dense and real-valued vector representations while preserving the rich topological information and meta-path-based semantic information of nodes in network.
KGUPN contains three main layers, which are the propagation representation layer, the contextual information layer and collaborative relation layer.
We also show that layer normalization is a better choice in FL which can mitigate the external covariate shift and improve the performance of the global model.
Based on this measure, we also design a computation-efficient client sampling strategy, such that the actively selected clients will generate a more class-balanced grouped dataset with theoretical guarantees.
The rapid growth and deployment of deep learning (DL) has witnessed emerging privacy and security concerns.
The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms.
We propose an algorithm to test the identifiability of the nonbinary benefit function and an algorithm to compute the bounds of the nonbinary benefit function using experimental and observational data.
Particularly, we develop a hardware-friendly sparse attention operator and a length-aware hardware resource scheduling algorithm.
Federated Learning aims at training a global model from multiple decentralized devices (i. e. clients) without exchanging their private local data.
Evaluations through simulation and on real IBM-Q devices show that our framework can significantly reduce the error rate by up to 6$\times$, with only $\sim$60\% circuit depth compared to state-of-the-art gate scheduling approaches.
Aspect-Based Sentiment Analysis (ABSA) is a fine-grained task in the field of sentiment analysis, which aims to predict the polarity of aspects.
We provide a simple and general solution for the discovery of scarce topics in unbalanced short-text datasets, namely, a word co-occurrence network-based model CWIBTD, which can simultaneously address the sparsity and unbalance of short-text topics and attenuate the effect of occasional pairwise occurrences of words, allowing the model to focus more on the discovery of scarce topics.
To address this limitation, we propose SememeWSD Synonym (SWSDS) model to assign a different vector to every sense of polysemous words with the help of word sense disambiguation (WSD) and synonym set in OpenHowNet.
Graph Neural Networks (GNNs) have drawn tremendous attention due to their unique capability to extend Machine Learning (ML) approaches to applications broadly-defined as having unstructured data, especially graphs.
Based on this, we propose a car review text sentiment analysis model based on adversarial training and whole word mask BERT(ATWWM-BERT).
Being a promising model to be deployed in resource-limited devices, Binarized Neural Networks (BNNs) have drawn extensive attention from both academic and industry.
To close the accuracy gap, in this paper we propose to add a complementary activation function (AF) ahead of the sign based binarization, and rely on the genetic algorithm (GA) to automatically search for the ideal AFs.
In this paper, we study the text sentiment classification of online travel reviews based on social media online comments and propose the SCCL model based on capsule network and sentiment lexicon.
With the rapid development of information technology, "information overload" has become the main theme that plagues people's online life.
However, caused by severe domain gaps (e. g., the field of view (FOV), pixel size, and object size among datasets), Mono3D detectors have difficulty in generalization, leading to drastic performance degradation on unseen domains.
The IGGNet contains two key ingredients, i. e., a Geometry-Aware Attention (GAA) module and an Iterative Cross Guidance (ICG) strategy.
With the advent of the information age, the scale of data on the Internet is getting larger and larger, and it is full of text, images, videos, and other information.
Based on this, we develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain.
The scientific and technological resources of experts and scholars are mainly composed of basic attributes and scientific research achievements.
In the era of big data, intellectual property-oriented scientific and technological resources show the trend of large data scale, high information density and low value density, which brings severe challenges to the effective use of intellectual property resources, and the demand for mining hidden information in intellectual property is increasing.
In recent years, with the continuous progress of science and technology, the number of scientific research achievements is increasing day by day, as the exchange platform and medium of scientific research achievements, the scientific and technological academic conferences have become more and more abundant.
A challenging problem in task-free continual learning is the online selection of a representative replay memory from data streams.
In this paper, we advance FastMapSVM -- an interpretable Machine Learning framework for classifying complex objects -- as an advantageous alternative to Neural Networks for general classification tasks.
The knowledge extraction task is to extract triple relations (head entity-relation-tail entity) from unstructured text data.
To further strengthen the results, we co-design a customized ML model FLNet and its personalization under the decentralized training scenario.
We propose an attention-based hierarchical multi-label classification algorithm of academic texts (AHMCA) by integrating features such as text, keywords, and hierarchical structure, the academic documents are classified into the most relevant categories.
Aiming at the problem that the current general-purpose semantic text similarity calculation methods are difficult to use the semantic information of scientific academic conference data, a semantic similarity calculation algorithm for scientific academic conferences by fusion with domain features is proposed.
In the era of big data, it is possible to carry out cooperative research on the research results of researchers through papers, patents and other data, so as to study the role of researchers, and produce results in the analysis of results.
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art method for graph-based learning tasks.
Ranked #1 on Node Classification on Reddit
In light of this, we propose a scientific and technological information oriented Semantics-adversarial and Media-adversarial Cross-media Retrieval method (SMCR) to find an effective common subspace.
The proposed work is the first investigation in the emotion recognition oriented EEG topological feature analysis, which brought a novel insight into the brain neural system nonlinear dynamics analysis and feature extraction.
In this paper we propose a novel hardware accelerator for GCN inference, called I-GCN, that significantly improves data locality and reduces unnecessary computation.
For the text modality, we use global vectors for word representation (GloVe) to perform word embeddings and the embeddings are fed into the Bi-LSTM network.
Inspired by this observation, we propose a reinforcement learning solution TrojanSeeker to find approximate trigger actions for the trojan agents, and further propose an efficient approach to mitigate the trojan agents based on machine unlearning.
We introduce M2DGR: a novel large-scale dataset collected by a ground robot with a full sensor-suite including six fish-eye and one sky-pointing RGB cameras, an infrared camera, an event camera, a Visual-Inertial Sensor (VI-sensor), an inertial measurement unit (IMU), a LiDAR, a consumer-grade Global Navigation Satellite System (GNSS) receiver and a GNSS-IMU navigation system with real-time kinematic (RTK) signals.
To bridge this gap, we aim to learn a spatial-aware visual representation that can describe the three-dimensional space and is more suitable and effective for these tasks.
Surprisingly, we show Vision Transformers perform significantly worse than Convolutional Neural Networks when only a small set of labeled data is available.
Furthermore, we derive a certified robustness guarantee against model poisoning attacks and a convergence guarantee to FedAvg after applying our FL-WBC.
While end-to-end jointly optimizing GNNs and their accelerators is promising in boosting GNNs' inference efficiency and expediting the design process, it is still underexplored due to the vast and distinct design spaces of GNNs and their accelerators.
The unit selection problem aims to identify a set of individuals who are most likely to exhibit a desired mode of behavior, for example, selecting individuals who would respond one way if encouraged and a different way if not encouraged.
no code implementations • 25 Aug 2021 • Austin Derrow-Pinion, Jennifer She, David Wong, Oliver Lange, Todd Hester, Luis Perez, Marc Nunkesser, Seongjae Lee, Xueying Guo, Brett Wiltshire, Peter W. Battaglia, Vishal Gupta, Ang Li, Zhongwen Xu, Alvaro Sanchez-Gonzalez, Yujia Li, Petar Veličković
Travel-time prediction constitutes a task of high importance in transportation networks, with web mapping services like Google Maps regularly serving vast quantities of travel time queries from users and enterprises alike.
Deep complex networks (DCN), in contrast, can learn from complex data, but have high computational costs; therefore, they cannot satisfy the instant decision-making requirements of many deployable systems dealing with short observations or short signal bursts.
Existing representation learning methods on graphs have achieved state-of-the-art performance on various graph-related tasks such as node classification, link prediction, etc.
Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning.
This paper addresses the problem of estimating causal effects when adjustment variables in the back-door or front-door criterion are partially observed.
The key idea of our defense is learning to perturb data representation such that the quality of the reconstructed data is severely degraded, while FL performance is maintained.
A promising countermeasure against such forgeries is deep inpainting detection, which aims to locate the inpainted regions in an image.
no code implementations • 17 May 2021 • Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao, Jin-Hua Du, Pei-Lin Wu, Chao Ge, Jiaoyang Yao, Fangwen Tu, Bo Li, Jung Eun Yoo, Kwanggyoon Seo, Jialei Xu, Zhenyu Li, Xianming Liu, Junjun Jiang, Wei-Chi Chen, Shayan Joya, Huanhuan Fan, Zhaobing Kang, Ang Li, Tianpeng Feng, Yang Liu, Chuannan Sheng, Jian Yin, Fausto T. Benavide
While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference.
Based upon this observation, we propose a novel metric called Neural Mean Discrepancy (NMD), which compares neural means of the input examples and training data.
Crowd counting has drawn much attention due to its importance in safety-critical surveillance systems.
Motivated by the complex neural networks, in this paper we introduce complex representation into the BNNs and propose Binary complex neural network -- a novel network design that processes binary complex inputs and weights through complex convolution, but still can harvest the extraordinary computation efficiency of BNNs.
In order to reveal the detailed flow physics that result in significant fluid forces alternations, the detailed flow visualization is provided by the numerical simulation: the small gap between two cylinders in a side-by-side configuration will result in a strong gap jet that enhances the energy dissipation and increase the drag, while due to the flow blocking effect for two cylinders in a tandem configuration, the drag coefficient decreases.
We employ the Cram\'er-Rao bound (CRB) as a performance metric of target estimation, under both point and extended target scenarios.
The framework shows that the subset selection process, a deciding factor for subset aggregation methods, can be viewed as a code design problem.
In this paper, we mitigate these limitations by developing a novel deep meta reinforcement learning (DMRL) algorithm.
In this paper, we propose GenQu, a hybrid and general-purpose quantum framework for learning classical data through quantum states.
In this work, we show our key observation that the data representation leakage from gradients is the essential cause of privacy leakage in FL.
However, existing FL methods 1) perform poorly when data across clients are non-IID, 2) cannot handle data with new label domains, and 3) cannot leverage unlabeled data, while all these issues naturally happen in real-world graph-based problems.
Video inpainting aims to restore missing regions of a video and has many applications such as video editing and object removal.
Link prediction in dynamic graphs (LPDG) is an important research problem that has diverse applications such as online recommendations, studies on disease contagion, organizational studies, etc.
Next, we reformulate the evasion attack against GNNs to be related to calculating label influence on LP, which is applicable to multi-layer GNNs and does not need to know the GNN model.
Rather than learning a shared global model in classic federated learning, each client learns a personalized model via LotteryFL; the communication cost can be significantly reduced due to the compact size of lottery networks.
Despite foreseeing tremendous speedups over conventional deep neural networks, the performance advantage of binarized neural networks (BNNs) has merely been showcased on general-purpose processors such as CPUs and GPUs.
Load shedding has been one of the most widely used and effective emergency control approaches against voltage instability.
This problem is often referred to as catastrophic forgetting, a key challenge in continual learning of neural networks.
The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years.
For evaluation, we measure the execution fidelity of a subset of QASMBench applications on 12 IBM-Q machines through density matrix state tomography, which comprises 25K circuit evaluations.
In this paper, we propose a monocular visual localization pipeline leveraging semantic and depth cues.
Instead of performing stylization frame by frame, only key frames in the original video are processed by a pre-trained deep neural network (DNN) on edge servers, while the rest of stylized intermediate frames are generated by our designed optical-flow-based frame interpolation algorithm on mobile phones.
Through the cooperation between a photographer and a stranger, the stranger's face in a photo can be automatically blurred upon his request when the photo is taken.
The goal of this framework is to learn a feature extractor that can hide the privacy information from the intermediate representations; while maximally retaining the original information embedded in the raw data for the data collector to accomplish unknown learning tasks.
Hence, the measurement of these corrections has the potential of providing important information on the equation of state of nuclear matter.
High Energy Astrophysical Phenomena General Relativity and Quantum Cosmology
In this paper, we focus on the reconfigurable intelligent surface (RIS)-enhanced two-way device-to-device (D2D) multi-pair orthogonal-frequency-division-multiplexing (OFDM) communication systems.
The dataset is collected by annotating videos from the Kinetics-700 dataset using the AVA annotation protocol, and extending the original AVA dataset with these new AVA annotated Kinetics clips.
In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We propose the OpenHybrid framework, which is composed of an encoder to encode the input data into a joint embedding space, a classifier to classify samples to inlier classes, and a flow-based density estimator to detect whether a sample belongs to the unknown category.
Deep Reinforcement Learning (RL) is proven powerful for decision making in simulated environments.
Recently, when we used this method to identify aircraft targets in remote sensing images, we found that there are some defects in our own YOLOv2 and Darknet-19 network.
In this paper, we propose to address this issue from a parameter space perspective and study an approach to restrict the direction of the gradient updates to avoid forgetting previously-learned data.
Our experiments on CelebA and LFW datasets show that the quality of the reconstructed images from the obfuscated features of the raw image is dramatically decreased from 0. 9458 to 0. 3175 in terms of multi-scale structural similarity.
A promising approach is to embed the high-dimensional observations into a lower-dimensional latent representation space, estimate the latent dynamics model, then utilize this model for control in the latent space.
Deep learning systems have been successfully applied to Euclidean data such as images, video, and audio.
Forexample, given a male image with image region of one eye missing, current models may restore it with a female eye.
Image inpainting aims at restoring missing regions of corrupted images, which has many applications such as image restoration and object removal.
We compare PhysGAN with a set of state-of-the-art baseline methods including several of our self-designed ones, which further demonstrate the robustness and efficacy of our approach.
The ability to navigate from visual observations in unfamiliar environments is a core component of intelligent agents and an ongoing challenge for Deep Reinforcement Learning (RL).
High performance multi-GPU computing becomes an inevitable trend due to the ever-increasing demand on computation capability in emerging domains such as deep learning, big data and planet-scale simulations.
Hardware Architecture Distributed, Parallel, and Cluster Computing Networking and Internet Architecture Performance
To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (teacher assistant) to bridge the gap between the student and the teacher.
Population Based Training (PBT) is a recent approach that jointly optimizes neural network weights and hyperparameters which periodically copies weights of the best performers and mutates hyperparameters during training.
Among the issues with this approach is that to make the distributed cluster work with high utilization, the workload distributed to each node must be large, which implies nontrivial growth in the SGD mini-batch size.
We propose a Consistency-aware Selective Fusion (CSF) to integrate the pairwise orders into a globally consistent order.
We address the recognition of agent-in-place actions, which are associated with agents who perform them and places where they occur, in the context of outdoor home surveillance.
Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for the training, but also dynamically allocates the memory for convolution workspaces to achieve the high performance.
In contrast, we argue that it is essential to prune neurons in the entire neuron network jointly based on a unified goal: minimizing the reconstruction error of important responses in the "final response layer" (FRL), which is the second-to-last layer before classification, for a pruned network to retrain its predictive power.
We introduce count-guided weakly supervised localization (C-WSL), an approach that uses per-class object count as a new form of supervision to improve weakly supervised localization (WSL).
We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images.
Understanding visual relationships involves identifying the subject, the object, and a predicate relating them.
Real-world image recognition systems need to recognize tens of thousands of classes that constitute a plethora of visual concepts.
Ranked #2 on Zero-Shot Transfer Image Classification on SUN
Since interactions between objects can be reduced to a limited set of atomic spatial relations in 3D, we study the possibility of inferring 3D structure from a text description rather than an image, applying physical relation models to synthesize holistic 3D abstract object layouts satisfying the spatial constraints present in a textual description.
Deep learning modeling lifecycle generates a rich set of data artifacts, such as learned parameters and training logs, and comprises of several frequently conducted tasks, e. g., to understand the model behaviors and to try out new models.
We examined temporal patterns of the linguistic expression of individuals on Chinese social media (Weibo).
The proposed DNBS representation is applied to object tracking through SSD-based template matching.
Nowadays, people are motivated to share their experiences and feelings on social media, so we propose to sense SWB from the vast user generated data on social media.