The new ideas in the current paper are: (a) new variants of mixup with negative as well as positive coefficients, and extend the sample-wise mixup to be pixel-wise.
Deploying neural networks to different devices or platforms is in general challenging, especially when the model size is large or model complexity is high.
Detecting spoofing attacks on the positions of unmanned aerial vehicles (UAVs) within a swarm is challenging.
The attacker then adversarially regenerates the graph structural correlations while maximizing the FL training loss, and subsequently generates malicious local models using the adversarial graph structure and the training data features of the benign ones.
Pansharpening, a pivotal task in remote sensing, involves integrating low-resolution multispectral images with high-resolution panchromatic images to synthesize an image that is both high-resolution and retains multispectral information.
These priors are subsequently utilized by RGformer to guide the decomposition of image features into their respective reflectance and illumination components.
Accurate measurement of the offset from roof-to-footprint in very-high-resolution remote sensing imagery is crucial for urban information extraction tasks.
Finally, we propose an effective alignment method that explores diverse generation strategies, which can reasonably reduce the misalignment rate under our attack.
We model and capture the time and frequency dimensions of the audio independently using a multi-layered RNN along each dimension.
The extraction of lakes from remote sensing images is a complex challenge due to the varied lake shapes and data noise.
The integration of different modalities, such as audio and visual information, plays a crucial role in human perception of the surrounding environment.
no code implementations • 14 Aug 2023 • Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji
This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23).
LEFormer contains four main modules: CNN encoder, Transformer encoder, cross-encoder fusion, and lightweight decoder.
Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis.
On the prey side, we propose an adversarial training framework, Camouflageator, which introduces an auxiliary generator to generate more camouflaged objects that are harder for a COD method to detect.
In this paper, we propose a consistency regularization framework to develop a more generalizable SFDA method, which simultaneously boosts model performance on both target training and testing datasets.
no code implementations • 19 Jul 2023 • Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, Yusheng Zhang, Rongyu Zhang, Hang Shi, Qihang Xu, Longan Xiao, Zhiliang Ma, Mirko Agarla, Luigi Celona, Claudio Rota, Raimondo Schettini, Zhiwei Huang, Yanan Li, Xiaotao Wang, Lei Lei, Hongye Liu, Wei Hong, Ironhead Chuang, Allen Lin, Drake Guan, Iris Chen, Kae Lou, Willy Huang, Yachun Tasi, Yvonne Kao, Haotian Fan, Fangyuan Kong, Shiqi Zhou, Hao liu, Yu Lai, Shanshan Chen, Wenqi Wang, HaoNing Wu, Chaofeng Chen, Chunzheng Zhu, Zekun Guo, Shiling Zhao, Haibing Yin, Hongkui Wang, Hanene Brachemi Meftah, Sid Ahmed Fezza, Wassim Hamidouche, Olivier Déforges, Tengfei Shi, Azadeh Mansouri, Hossein Motamednia, Amir Hossein Bakhtiari, Ahmad Mahmoudi Aznaveh
61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions.
However, software is hardly consistently cited: one software entity can be cited as different objects, and the citations can change over time.
Specifically, we extract features from an HQ image and explicitly insert the features, which are expected to encode HQ cues, into the enhancement network to guide the LQ enhancement with the variational normalization module.
This paper aims to address these two issues by proposing the Class-Balanced Mean Teacher (CBMT) model.
In recent years, ubiquitous semantic Metaverse has been studied to revolutionize immersive cyber-virtual experiences for augmented reality (AR) and virtual reality (VR) users, which leverages advanced semantic understanding and representation to enable seamless, context-aware interactions within mixed-reality environments.
We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments.
In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).
As language models increase in size by the day, methods for efficient inference are critical to leveraging their capabilities for various applications.
Crucially, we find that $k$NN-LMs are more susceptible to leaking private information from their private datastore than parametric models.
It remains a challenging task since (1) it is hard to distinguish concealed objects from the background due to the intrinsic similarity and (2) the sparsely-annotated training data only provide weak supervision for model learning.
We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images.
We further propose a consistency learning based mean teacher model to effectively adapt the learned UDA model using labeled and unlabeled target samples.
Satellite imagery analysis plays a pivotal role in remote sensing; however, information loss due to cloud cover significantly impedes its application.
In this paper, we propose an approach for cI2V using novel latent flow diffusion models (LFDM) that synthesize an optical flow sequence in the latent space based on the given condition to warp the given image.
In the surface defect detection, there are some suspicious regions that cannot be uniquely classified as abnormal or normal.
Source-free object detection (SFOD) aims to transfer a detector pre-trained on a label-rich source domain to an unlabeled target domain without seeing source data.
COD is a challenging task due to the intrinsic similarity of camouflaged objects with the background, as well as their ambiguous boundaries.
Heterogeneous image fusion (HIF) techniques aim to enhance image quality by merging complementary information from images captured by different sensors.
Federated learning casts a light on the collaboration of distributed local clients with privacy protected to attain a more generic global model.
This operation maximizes the contribution of discriminative frames to further capture the similarity of support and query samples from the same category.
Source-free domain adaptation (SFDA) is an emerging research topic that studies how to adapt a pretrained source model using unlabeled target data.
Then, inspired by the large number of connections between cortical regions and the thalamus, the model fuses the auditory and visual information in a thalamic subnetwork through top-down connections.
Ranked #1 on Speech Separation on VoxCeleb2
Neural Radiance Fields (NeRFs) encode the radiance in a scene parameterized by the scene's plenoptic function.
It is also unveiled that the reliability and trust of ML in UAV operations and applications require significant attention before full automation of UAVs and potential cooperation between UAVs and humans come to fruition.
The Audio Deep Synthesis Detection (ADD) Challenge has been held to detect generated human-like speech.
In addition, a large-size version of TDANet obtained SOTA results on three datasets, with MACs still only 10\% of Sepformer and the CPU inference time only 24\% of Sepformer.
Ranked #2 on Speech Separation on WHAM!
With the advancement of sciences, many scientific methods are being proposed, modified, and used in academic literature.
The design of MAFENN framework and algorithm are dedicated to enhance the learning capability of the feedfoward DL networks or their variations with the simple data feedback.
For the first time, we show the feasibility of recovering text from large batch sizes of up to 128 sentences.
We study few-shot debugging of transformer based natural language understanding models, using recently popularized test suites to not just diagnose but correct a problem.
The downlink channel covariance matrix (CCM) acquisition is the key step for the practical performance of massive multiple-input and multiple-output (MIMO) systems, including beamforming, channel tracking, and user scheduling.
Based on the identified latent directions of attributes, we propose Compositional Attribute Adjustment to adjust the latent code, resulting in better compositionality of image synthesis.
Federated learning (FL) has been increasingly considered to preserve data training privacy from eavesdropping attacks in mobile edge computing-based Internet of Thing (EdgeIoT).
Gradient inversion attack (or input recovery from gradient) is an emerging threat to the security and privacy preservation of Federated learning, whereby malicious eavesdroppers or participants in the protocol can recover (partially) the clients' private data.
The deep policy gradient method has demonstrated promising results in many large-scale games, where the agent learns purely from its own experience.
For the second issue, we reconsider how to improve detection efficiency with excellent performance, and then propose our lightweight encoder-decoder architecture termed CarNet.
In scientific literature, XANES/Raman data are usually plotted in line graphs which is a visually appropriate way to represent the information when the end-user is a human reader.
Numerous detection problems in computer vision, including road crack detection, suffer from exceedingly foreground-background imbalance.
They also fail to sense the entire space of the input, which is critical for high-quality MR image SR. To address those problems, we propose squeeze and excitation reasoning attention networks (SERAN) for accurate MR image SR. We propose to squeeze attention from global spatial information of the input and obtain global descriptors.
Ranked #2 on Image Super-Resolution on IXI
In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize the coordination transfer in more varieties of scenarios.
In this paper, we propose a cooperative MARL method with sequential credit assignment (SeCA) that deduces each agent's contribution to the team's success one by one to learn better cooperation.
This paper studies Semi-Supervised Domain Adaptation (SSDA), a practical yet under-investigated research topic that aims to learn a model of good performance using unlabeled samples and a few labeled samples in the target domain, with the help of labeled samples from a source domain.
Contrasted with prior work, this paper provides a complementary solution to align domains by learning the same auxiliary tasks in both domains simultaneously.
By utilizing this dataset, we propose an object-detection based image retrieval framework that models the UI context and hierarchical structure.
With the goal of tuning up the brightness, low-light image enhancement enjoys numerous applications, such as surveillance, remote sensing and computational photography.
This paper proposes a neural network for multi-level low-light image enhancement, which is user-friendly to meet various requirements by selecting different images as brightness reference.
In this paper, we propose to enhance the temporal coherence by Consistency-Regularized Graph Neural Networks (CRGNN) with the aid of a synthesized video matting dataset.
Unsupervised domain adaptation (UDA) is to make predictions for unlabeled data in a target domain with labeled data from source domain available.
Owning to the unremitting efforts by a few institutes, significant progress has recently been made in designing superhuman AIs in No-limit Texas Hold'em (NLTH), the primary testbed for large-scale imperfect-information game research.
To obtain a high-performance vehicle ReID model, we present a novel Distance Shrinking with Angular Marginalizing (DSAM) loss function to perform hybrid learning in both the Original Feature Space (OFS) and the Feature Angular Space (FAS) using the local verification and the global identification information.
1 code implementation • 10 Nov 2020 • Andrey Ignatov, Radu Timofte, Zhilu Zhang, Ming Liu, Haolin Wang, WangMeng Zuo, Jiawei Zhang, Ruimao Zhang, Zhanglin Peng, Sijie Ren, Linhui Dai, Xiaohong Liu, Chengqi Li, Jun Chen, Yuichi Ito, Bhavya Vasudeva, Puneesh Deora, Umapada Pal, Zhenyu Guo, Yu Zhu, Tian Liang, Chenghua Li, Cong Leng, Zhihong Pan, Baopu Li, Byung-Hoon Kim, Joonyoung Song, Jong Chul Ye, JaeHyun Baek, Magauiya Zhussip, Yeskendir Koishekenov, Hwechul Cho Ye, Xin Liu, Xueying Hu, Jun Jiang, Jinwei Gu, Kai Li, Pengliang Tan, Bingxin Hou
This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results.
In this paper, we address this limitation with an efficient learning objective that considers the discriminative feature distributions between the visual objects and sentence words.
To address the issue that deep neural networks (DNNs) are vulnerable to model inversion attacks, we design an objective function, which adjusts the separability of the hidden data representations, as a way to control the trade-off between data utility and vulnerability to inversion attacks.
There is a fundamental trade-off between the channel representation resolution of codebooks and the overheads of feedback communications in the fifth generation new radio (5G NR) frequency division duplex (FDD) massive multiple-input and multiple-output (MIMO) systems.
In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e. g., BERT) for any sentence or sentence-pair task.
This paper introduces InstaHide, a simple encryption of training images, which can be plugged into existing distributed deep learning pipelines.
This paper presents a learning-based approach to synthesize the view from an arbitrary camera position given a sparse set of images.
We establish a benchmark suite consisting of different types of PDF document datasets that can be utilized for cross-domain DOD model training and evaluation.
This paper attempts to answer the question whether neural network pruning can be used as a tool to achieve differential privacy without losing much data utility.
This paper presents a new approach for caching in CDNs that uses machine learning to approximate the Belady MIN algorithm.
We instead reformulate ZSL as a conditioned visual classification problem, i. e., classifying visual features based on the classifiers learned from the semantic descriptions.
It outperforms the current best method by 6. 8% relatively for image retrieval and 4. 8% relatively for caption retrieval on MS-COCO (Recall@1 using 1K test set).
Ranked #8 on Image Retrieval on Flickr30K 1K test
A key challenge is online MPT and data collection in the presence of on-board control of a UAV (e. g., patrolling velocity) for preventing battery drainage and data queue overflow of the sensing devices, while up-to-date knowledge on battery level and data queue of the devices is not available at the UAV.
However, capturing the short-term effects of drugs and therapeutic interventions on patient physiological state remains challenging.
To circumvent the limited enclave memory (128 MB with the latest Intel CPUs), we propose to place the memory buffer of the eLSM store outside the enclave and protect the buffer using a new authenticated data structure by digesting individual LSM-tree levels.
Cryptography and Security Databases Distributed, Parallel, and Cluster Computing Data Structures and Algorithms
Face Anti-spoofing gains increased attentions recently in both academic and industrial fields.
To address this issue, we design local and non-local attention blocks to extract features that capture the long-range dependencies between pixels and pay more attention to the challenging parts.
Convolutional nets have been shown to achieve state-of-the-art accuracy in many biomedical image analysis tasks.
This paper presents a very simple but efficient algorithm for 3D line segment detection from large scale unorganized point cloud.
To capture the graph dynamics, we use the graph prediction stream to predict the dynamic graph structures, and the predicted structures are fed into the flow prediction stream.
To ensure scalability and separability, a softmax-like function is formulated to push apart the positive and negative support sets.
To solve these problems, we propose the very deep residual channel attention networks (RCAN).
Ranked #14 on Image Super-Resolution on BSD100 - 4x upscaling
First, a novel cost-sensitive multi-task loss function is designed to learn transferable aging features by training on the source population.
However, most existing deep hashing methods directly learn the hash functions by encoding the global semantic information, while ignoring the local spatial information of images.
Gaussian processes (GPs), or distributions over arbitrary functions in a continuous domain, can be generalized to the multi-output case: a linear model of coregionalization (LMC) is one approach.
In the scenario of real-time monitoring of hospital patients, high-quality inference of patients' health status using all information available from clinical covariates and lab tests is essential to enable successful medical interventions and improve patient outcomes.
We detect the topological properties of Chern insulators with strong Coulomb interactions by use of cluster perturbation theory and variational cluster approach.
Strongly Correlated Electrons
We argue that the problem lies in the mix-up of two interpretations of the extensive form game structures: game rules or game runs which do not always coincide.
With the prevalence of the commodity depth cameras, the new paradigm of user interfaces based on 3D motion capturing and recognition have dramatically changed the way of interactions between human and computers.
Fisheye image rectification and estimation of intrinsic parameters for real scenes have been addressed in the literature by using line information on the distorted images.
In this work, we propose a novel hash learning framework that encodes feature's rank orders instead of numeric values in a number of optimal low-dimensional ranking subspaces.