In spite of the success of pre-trained language models in many NLP tasks, the learned text representation only captures the correlation among the words within the sentence itself and ignores the implicit relationships between arbitrary tokens in the sequence.
1 code implementation • 17 Nov 2023 • Pietro Melzi, Ruben Tolosana, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Ivan DeAndres-Tame, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Weisong Zhao, Xiangyu Zhu, Zheyu Yan, Xiao-Yu Zhang, Jinlin Wu, Zhen Lei, Suvidha Tripathi, Mahak Kothari, Md Haider Zama, Debayan Deb, Bernardo Biesseck, Pedro Vidal, Roger Granada, Guilherme Fickel, Gustavo Führ, David Menotti, Alexander Unnervik, Anjith George, Christophe Ecabert, Hatef Otroshi Shahreza, Parsa Rahimi, Sébastien Marcel, Ioannis Sarridis, Christos Koutlis, Georgia Baltsou, Symeon Papadopoulos, Christos Diou, Nicolò Di Domenico, Guido Borghi, Lorenzo Pellegrini, Enrique Mas-Candela, Ángela Sánchez-Pérez, Andrea Atzori, Fadi Boutros, Naser Damer, Gianni Fenu, Mirko Marras
Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are still several challenges that must be addressed in more detail.
Existing LLM-based systems for writing long-form stories or story outlines frequently suffer from unnatural pacing, whether glossing over important events or over-elaborating on insignificant details, resulting in a jarring experience for the reader.
This paper presents a novel approach, called Prototype-based Self-Distillation (ProS), for unsupervised face representation learning.
Long-Term Person Re-Identification (LT-ReID) has become increasingly crucial in computer vision and biometrics.
However, prior discrete prompt optimization methods require expert knowledge to design the base prompt set and identify high-quality prompts, which is costly, inefficient, and subjective.
no code implementations • 29 Jun 2023 • Feng Liu, Ryan Ashbaugh, Nicholas Chimitt, Najmul Hassan, Ali Hassani, Ajay Jaiswal, Minchul Kim, Zhiyuan Mao, Christopher Perry, Zhiyuan Ren, Yiyang Su, Pegah Varghaei, Kai Wang, Xingguang Zhang, Stanley Chan, Arun Ross, Humphrey Shi, Zhangyang Wang, Anil Jain, Xiaoming Liu
Whole-body biometric recognition is an important area of research due to its vast applications in law enforcement, border security, and surveillance.
Our novel Patch-wise style extractor and Time-step dependent ID loss enable DCFace to consistently produce face images of the same subject under different styles with precise control.
As a result, the algorithm is encouraged to learn both comprehensive features and the inherent hierarchical nature of different forgery attributes, thereby improving the IFDL representation.
To resolve this, we reformulate MIM from reconstructing a single masked image to reconstructing a pair of masked images, enabling the pretraining of the transformer module.
This work studies the generalization issue of face anti-spoofing (FAS) models on domain gaps, such as image resolution, blurriness and sensor variations.
We call this 'model parsing of adversarial attacks' - a task to uncover 'arcana' in terms of the concealed VM information in attacks.
This observation motivates us to decouple the video depth estimation into two components, a normalized pose estimation over a flowmap and a logged residual depth estimation over a mono-depth map.
The objective of this paper is to learn dense 3D shape correspondence for topology-varying generic objects in an unsupervised manner.
Machine-Generated Text (MGT) detection, a task that discriminates MGT from Human-Written Text (HWT), plays a crucial role in preventing misuse of text generative models, which have recently excelled at mimicking human writing styles.
Advances in attention and recurrent modules have led to feature fusion that can model the relationship among the images in the input set.
Ranked #1 on Face Verification on IJB-B (TAR @ FAR=0.001 metric)
In this work, we study multi-domain learning for face anti-spoofing (MD-FAS), where a pre-trained FAS model needs to be updated to perform equally well on both source and target domains while only using target domain data for updating.
As a result, DEVIANT is equivariant to the depth translations in the projective manifold whereas vanilla networks are not.
In light of this, we propose a novel image-conditioned neural implicit field, which can leverage 2D supervisions from GAN-generated multi-view images and perform the single-view reconstruction of generic objects.
To address this problem, we propose a controllable face synthesis model (CFSM) that can mimic the distribution of target datasets in a style latent space.
Ranked #1 on Face Verification on IJB-S
In this work, we introduce another aspect of adaptiveness in the loss function, namely the image quality.
Ranked #1 on Surveillance-to-Booking on IJB-S
That is, a template-protected real image and its manipulated version are more easily discriminated than the original real image and its manipulated counterpart.
However, carefully crafted, tiny adversarial perturbations are difficult to recover by optimizing a unilateral RED objective.
The rapid emergence of airborne platforms and imaging sensors is enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment, and covert observation capabilities.
With complementary supervision from both 3D detection and reconstruction, the 3D voxel features become geometry- and context-preserving, benefiting both tasks. The effectiveness of our approach is demonstrated through 3D detection and reconstruction in single-object and multiple-object scenarios.
Recent advances in generative adversarial networks (GANs) have led to remarkable achievements in face image synthesis.
A distinctive feature of Doppler radar is the measurement of velocity in the radial direction for radar points.
To tackle this problem, we propose a framework with two components: a Fingerprint Estimation Network (FEN), which estimates a GM fingerprint from a generated image by training with four constraints to encourage the fingerprint to have desired properties, and a Parsing Network (PN), which predicts network architecture and loss functions from the estimated fingerprints.
Here we propose a radar-to-pixel association stage which learns a mapping from radar returns to pixels.
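For reference, a hedged sketch of the purely geometric baseline that such a learned association stage refines: projecting a radar return into the image with a pinhole camera model. The intrinsics `K` and the point coordinates below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def radar_to_pixel(point_xyz, K):
    """Project a radar return (given in camera coordinates, metres)
    to a pixel with the pinhole model. This is only the geometric
    baseline; the paper's association stage *learns* the mapping
    to cope with radar's poor angular resolution."""
    u, v, w = K @ np.asarray(point_xyz, dtype=float)
    return np.array([u / w, v / w])

# Hypothetical intrinsics: 100 px focal length, principal point (50, 50).
K = np.array([[100.0,   0.0, 50.0],
              [  0.0, 100.0, 50.0],
              [  0.0,   0.0,  1.0]])
pixel = radar_to_pixel([0.0, 0.0, 2.0], K)  # a point on the optical axis
```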
This paper presents a method for riggable 3D face reconstruction from monocular images, which jointly estimates a personalized face rig and per-image parameters including expressions, poses, and illuminations.
The proposed UniFAD outperforms prevailing defense methods and their fusion with an overall TDR = 94.73% @ 0.2% FDR on a large fake face dataset consisting of 341K bona fide images and 448K attack images of 25 types across all 3 categories.
Furthermore, we introduce a method to use the shadow mask to estimate the ambient light intensity in an image, and are thus able to leverage multiple datasets during training with different global lighting intensities.
That is, for a 2D image of a generic object, we decompose it into latent representations of category, shape and albedo, lighting, and camera projection matrix; decode the representations into segmented 3D shape and albedo, respectively; and fuse these components to render an image that closely approximates the input image.
In this paper, we present and integrate GrooMeD-NMS -- a novel Grouped Mathematically Differentiable NMS for monocular 3D object detection, such that the network is trained end-to-end with a loss on the boxes after NMS.
Ranked #14 on Monocular 3D Object Detection on KITTI Cars Moderate
To defend against manipulation of image content, such as splicing, copy-move, and removal, we develop a Progressive Spatio-Channel Correlation Network (PSCC-Net) to detect and localize image manipulations.
However, using the trending graph neural networks (GNNs) as encoders has a drawback: GNNs aggregate redundant information from the neighborhood and generate indistinguishable user representations, a phenomenon known as over-smoothing.
Compared to the previous state-of-the-art learning algorithms for non-rigid registration of face scans, SMF only requires the raw data to be rigidly aligned (with scaling) with a pre-defined face template.
The additive process describes spoofing as spoof material introducing extra patterns (e.g., moiré patterns), where the live counterpart can be recovered by removing those patterns.
To tackle this research gap, we propose a novel duet representation learning framework named \sysname to fuse local information (user-item interaction data) and global information (external knowledge graph) for the top-$N$ recommendation, which is composed of two separate sub-models.
During training, FaceGuard automatically synthesizes challenging and diverse adversarial attacks, enabling a classifier to learn to distinguish them from real faces, while a purifier attempts to remove the adversarial perturbations in the image space.
Robotic apple harvesting has received much research attention in the past few years due to the growing labor shortage and rising labor costs.
It is challenging for a surface radar to detect small floating objects in sea clutter.
In this work, we propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization.
Ranked #5 on 3D Object Detection on Rope3D
Prior studies show that the key to face anti-spoofing lies in the subtle image pattern, termed "spoof trace", e.g., color distortion, 3D mask edges, moiré patterns, and many others.
Our proposed group adaptive classifier mitigates bias by using adaptive convolution kernels and attention mechanisms on faces based on their demographic attributes.
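To make the idea of demographic-adaptive processing concrete, here is a minimal sketch (not the paper's architecture): a group index selects a group-specific kernel, so faces from different groups are filtered differently. In the actual model the kernels and an attention map are learned end-to-end rather than hard-selected.

```python
import numpy as np

def group_adaptive_filter(x, kernels, group_id):
    """Minimal stand-in for a group-adaptive convolution: apply the
    kernel assigned to the sample's demographic group."""
    return np.convolve(x, kernels[group_id], mode="same")

kernels = {0: np.array([1.0]),              # identity kernel
           1: np.array([0.25, 0.5, 0.25])}  # smoothing kernel
signal = np.array([0.0, 1.0, 0.0, 1.0])
out0 = group_adaptive_filter(signal, kernels, 0)  # unchanged for group 0
```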
In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities.
Ranked #1 on Face Alignment on Menpo
As an emerging topic in face recognition, designing margin-based loss functions can increase the feature margin between different classes for enhanced discriminability.
Ranked #13 on Face Verification on IJB-C (TAR @ FAR=1e-4 metric)
In this work we study the mutual benefits of two common computer vision tasks, self-supervised depth estimation and semantic segmentation from images.
Printed photographs and replayed videos of biometric modalities, such as the iris, fingerprint, and face, are common attacks used to fool recognition systems into granting access as the genuine user.
Graph convolution network (GCN) attracts intensive research interest with broad applications.
To improve the performance on those hard samples for general tasks, we propose a novel Distribution Distillation Loss to narrow the performance gap between easy and hard samples, which is simple, effective, and generic for various types of facial variations.
FAN can leverage both paired and unpaired data as we disentangle the features into identity and non-identity components and adapt the distribution of the identity features, which breaks the limit of current face super-resolution methods.
Our results show that object detection can help improve the accuracy of some skin disease classes.
We address the problem of bias in automated face recognition and demographic attribute estimation algorithms, where errors are lower on certain cohorts belonging to specific demographic groups.
Instead of simply using multi-task learning to simultaneously detect manipulated images and predict the manipulated mask (regions), we propose to utilize an attention mechanism to process and improve the feature maps for the classification task.
The LSTM integrates pose features over time as a dynamic gait feature while canonical features are averaged as a static gait feature.
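A minimal numpy sketch of this two-branch aggregation; the exponential recurrence here is only a stand-in for the LSTM, and the feature dimensions are made up.

```python
import numpy as np

def aggregate_gait(pose_seq, canonical_seq, decay=0.9):
    """Two-branch gait feature: a recurrent pass integrates pose
    features over time into a dynamic descriptor (a simple exponential
    recurrence stands in for the LSTM), while canonical features are
    averaged into a static descriptor; the two are concatenated."""
    h = np.zeros(pose_seq.shape[1])
    for x_t in pose_seq:                      # dynamic branch
        h = decay * h + (1.0 - decay) * x_t
    static = canonical_seq.mean(axis=0)       # static branch
    return np.concatenate([h, static])

feat = aggregate_gait(np.random.rand(30, 64), np.random.rand(30, 64))
```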
Understanding the world in 3D is a critical component of urban autonomous driving.
Most of the existing gait recognition methods take silhouettes or articulated body models as the gait features.
By improving the nonlinear 3D morphable model in both learning objective and network architecture, we present a model that captures a higher level of detail than the linear or its preceding nonlinear counterparts.
Ranked #21 on 3D Face Reconstruction on REALY
We also show that the standard Mean Squared Error (MSE) loss function can promote depth mixing, and thus propose instead to use cross-entropy loss for DC.
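A toy numeric illustration of the depth-mixing argument; the depth values and 1 m bin layout are made up for illustration.

```python
import numpy as np

# A pixel near an object boundary sees two depth modes:
# 2 m (foreground) and 10 m (background), equally often.
samples = np.array([2.0] * 50 + [10.0] * 50)

# The MSE-optimal point estimate is the mean -- a "mixed" depth
# that belongs to neither surface.
mse_pred = samples.mean()  # 6.0

# Depth classification (DC) with cross-entropy instead treats depth as
# a distribution over discrete bins, so the prediction can sit on a
# genuine mode rather than between them.
edges = np.arange(0.0, 13.0, 1.0)            # hypothetical 1 m bins
hist, _ = np.histogram(samples, bins=edges)
centers = (edges[:-1] + edges[1:]) / 2.0
dc_pred = centers[hist.argmax()]             # lands on a real mode
```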
In recent years, heatmap regression based models have shown their effectiveness in face alignment and pose estimation.
Understanding the world around us and making decisions about the future is a critical component to human intelligence.
To address these problems, this paper proposes an innovative framework to learn a nonlinear 3DMM model from a large set of in-the-wild face images, without collecting 3D face scans.
In this work, motivated by the noise modeling and denoising algorithms, we identify a new problem of face de-spoofing, for the purpose of anti-spoofing: inversely decomposing a spoof face into a spoof noise and a live face, and then utilizing the spoof noise for classification.
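A hedged numpy sketch of this decomposition idea; the `estimated_live` argument stands in for the paper's learned live-face estimator, and the RMS-energy score is a simplification of its classifier.

```python
import numpy as np

def spoofness_score(face, estimated_live):
    """Model the input as spoof = live + spoof noise, recover the
    noise by subtracting the estimated live face, and score spoofness
    by the noise energy (RMS)."""
    spoof_noise = face - estimated_live
    return float(np.sqrt(np.mean(spoof_noise ** 2)))

live = np.random.rand(32, 32)
spoof = live + 0.1 * np.random.rand(32, 32)   # synthetic spoof pattern
live_score = spoofness_score(live, live)      # noise vanishes for live input
spoof_score = spoofness_score(spoof, live)
```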
As a classic statistical model of 3D facial shape and texture, the 3D Morphable Model (3DMM) is widely used in facial analysis, e.g., model fitting and image synthesis.
Ranked #2 on Face Alignment on AFLW2000
In this paper, we propose to tackle these three challenges in a new alignment framework termed 3D Dense Face Alignment (3DDFA), in which a dense 3D Morphable Model (3DMM) is fitted to the image via Cascaded Convolutional Neural Networks.
Ranked #3 on Face Alignment on AFLW
This paper proposes an encoder-decoder network to disentangle shape features during 3D face reconstruction from single 2D images, such that the tasks of reconstructing accurate 3D face shapes and learning discriminative shape features for face recognition can be accomplished simultaneously.
Face anti-spoofing is the crucial step to prevent face recognition systems from a security breach.
In this paper, we propose a center-based feature transfer framework to augment the feature space of under-represented subjects from the regular subjects that have sufficiently diverse samples.
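A minimal sketch of the center-based transfer idea, assuming features and class centers are plain vectors; the centers and dimensions below are hypothetical.

```python
import numpy as np

def transfer_variation(regular_feats, regular_center, ur_center):
    """Center-based feature transfer (sketch): the intra-class
    variation of a well-sampled regular subject (offsets from its
    center) is re-applied around an under-represented subject's
    center, synthesizing diverse features for the latter."""
    offsets = regular_feats - regular_center
    return ur_center + offsets

regular = np.random.rand(20, 8)
reg_center = regular.mean(axis=0)
ur_center = np.full(8, 5.0)          # hypothetical UR-subject center
augmented = transfer_variation(regular, reg_center, ur_center)
```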
Recent developments in deep domain adaptation have allowed knowledge transfer from a labeled source domain to an unlabeled target domain at the level of intermediate features or input pixels.
We present a novel deep end-to-end trainable Face Super-Resolution Network (FSRNet), which makes full use of geometric priors, i.e., facial landmark heatmaps and parsing maps, to super-resolve very low-resolution (LR) face images without requiring well-aligned inputs.
We present visual-analytics methods to reveal and analyze this hierarchy of similar classes in relation with CNN-internal data.
This paper presents an automated monocular-camera-based computer vision system for autonomously backing up a vehicle toward a trailer by continuously estimating the 3D trailer-coupler position and feeding it to the vehicle control system until the tow hitch is aligned with the trailer's coupler.
Extensive experiments show that the proposed method can achieve the state-of-the-art accuracy in both face alignment and 3D face reconstruction, and benefit face recognition owing to its reconstructed PEN 3D face.
We apply MemNet to three image restoration tasks, i.e., image denoising, super-resolution, and JPEG deblocking.
Specifically, residual learning is adopted, both in global and local manners, to mitigate the difficulty of training very deep networks; recursive learning is used to control the model parameters while increasing the depth.
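A toy numpy sketch of the two ideas in combination; the tanh block and sizes are illustrative, not the paper's layers.

```python
import numpy as np

def recursive_residual_block(x, W, recursions=4):
    """One weight matrix W is reused at every recursion (recursive
    learning keeps the parameter count fixed while effective depth
    grows), and skip connections are used both locally (inside the
    loop) and globally (around the whole block) to ease training."""
    h = x
    for _ in range(recursions):
        h = h + np.tanh(h @ W)    # local residual with shared weights
    return x + h                  # global residual

x = np.random.rand(16)
W = 0.01 * np.random.rand(16, 16)
y = recursive_residual_block(x, W)
```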
Ranked #10 on Video Super-Resolution on MSU Video Upscalers: Quality Enhancement (VMAF metric)
To make use of the valuable information in the corrupted data, we propose to impute the missing data by leveraging the relatedness among different modalities.
When placed properly, the additional supervision helps guide features in shared layers to become more sophisticated and helpful for the downstream pedestrian detector.
Ranked #20 on Pedestrian Detection on Caltech
First, the encoder-decoder structure of the generator enables DR-GAN to learn a representation that is both generative and discriminative, which can be used for face image synthesis and pose-invariant face recognition.
Despite recent advances in face recognition using deep learning, severe accuracy drops are observed for large pose variations in unconstrained environments.
RNN-based approaches have achieved outstanding performance on action recognition with skeleton inputs.
Ranked #1 on Skeleton Based Action Recognition on SBU
First, we propose a multi-task Convolutional Neural Network (CNN) for face recognition where identity classification is the main task and pose, illumination, and expression estimations are the side tasks.
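The main-task/side-task split can be sketched as a weighted objective; `side_weight` below is a hypothetical value, not taken from the paper.

```python
def multitask_loss(id_loss, pose_loss, illum_loss, expr_loss,
                   side_weight=0.1):
    """Sketch of the multi-task objective: identity classification
    dominates, while pose, illumination, and expression estimation
    act as down-weighted side tasks that regularize the shared CNN."""
    return id_loss + side_weight * (pose_loss + illum_loss + expr_loss)

total = multitask_loss(1.0, 0.5, 0.5, 0.5)  # 1.0 + 0.1 * 1.5
```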
Given a collection of "in-the-wild" face images captured under a variety of unknown pose, expression, and illumination conditions, this paper presents a method for reconstructing a 3D face surface model of an individual along with albedo information.
Large-pose face alignment is a very challenging problem in computer vision, which is used as a prerequisite for many important vision tasks, e.g., face recognition and 3D face reconstruction.
Global motion compensation (GMC) removes the impact of camera motion and creates a video in which the background appears static over the progression of time.
Face alignment, which fits a face model to an image and extracts the semantic meanings of facial pixels, has been an important topic in the CV community.
Ranked #3 on 3D Face Reconstruction on Florence
Second, by leveraging emerging face alignment techniques and our novel normal field-based Laplace editing, a combination of landmark constraints and photometric stereo-based normals drives our surface reconstruction.
First, leaf segmentation and alignment are applied on the last frame of a plant video to find a number of well-aligned leaf candidates.
Overlap is one of the characteristics of social networks, in which a person may belong to more than one social group.
Social and Information Networks · Data Structures and Algorithms · Physics and Society