It also improves the present state-of-the-art by 0. 35 and 0. 12 BLEU points for German-English and Spanish-English and respectively.
Given that 44% of Indian population speaks and operates in Hindi language and we address the above challenges by presenting an English–to–Hindi neural machine translation (NMT) system to translate the product reviews available on e-commerce websites by creating an in-domain parallel corpora and handling various types of noise in reviews via two data augmentation techniques and viz.
We validate the effectiveness of our approach on a variety of datasets and scenes preserving the visual quality while consuming 10-20x less memory and faster training/inference speed.
Implicit Neural Representations (INR) or neural fields have emerged as a popular framework to encode multimedia signals such as images and radiance fields while retaining high-quality.
Recognizing and generating object-state compositions has been a challenging task, especially when generalizing to unseen compositions.
We present a self-supervised technique that directly optimizes on a sparse collection of images of a particular object/object category to obtain consistent dense correspondences across the collection.
We compare ViTs trained through different methods of supervision, and show that they learn a diverse range of behaviors in terms of their attention, representations, and downstream performance.
We present a robot-to-human object handover algorithm and implement it on a 7-DOF arm equipped with a 3-finger mechanical hand.
We introduce LilNetX, an end-to-end trainable technique for neural networks that enables learning models with specified accuracy-rate-computation trade-off.
We study a referential game (a type of signaling game) where two agents communicate with each other via a discrete bottleneck to achieve a common goal.
We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents and 3D objects.
The recently proposed Lottery Ticket Hypothesis (LTH) states that deep neural networks trained on large datasets contain smaller subnetworks that achieve on par performance as the dense networks.
Our simple encoder-decoder framework, comprised of a novel identity encoder and class-conditional viewpoint generator, generates 3D consistent depth maps.
Generating a new layout or extending an existing layout requires understanding the relationships between these primitives.
We present a generalized grasping algorithm that uses point clouds (i. e. a group of points and their respective surface normals) to discover grasp pose solutions for multiple grasp types, executed by a mechanical gripper, in near real-time.
Unsupervised representation learning holds the promise of exploiting large amounts of unlabeled data to learn general representations.