Furthermore, our method outperforms existing lightweight methods in terms of accuracy and efficiency for the gaze estimation task.
We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
Ranked #1 on Speech Recognition on Common Voice English (using extra training data)
no code implementations • 21 Nov 2022 • Shiqiang Zhu, Ting Yu, Tao Xu, Hongyang Chen, Schahram Dustdar, Sylvain Gigan, Deniz Gunduz, Ekram Hossain, Yaochu Jin, Feng Lin, Bo Liu, Zhiguo Wan, Ji Zhang, Zhifeng Zhao, Wentao Zhu, Zuoning Chen, Tariq Durrani, Huaimin Wang, Jiangxing Wu, Tongyi Zhang, Yunhe Pan
In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting digital revolution in the era of big data, artificial intelligence and internet-of-things with new computing theories, architectures, methods, systems, and applications.
In the first stage, we introduce a base encoder that converts the input image to a latent code.
TReaderXML dynamically obtains teacher knowledge for each text by similar texts and hierarchical label information in training sets to release the ability of distinctly fine-grained label-oriented semantic scope.
This work aims to provide an effective deep learning framework to predict the vector-soliton solutions of the coupled nonlinear equations and their interactions.
As the ageing population and childlessness are increasing in rural China, social pensions will become the mainstream choice for farmers, and the level of social pensions must be supported by better social insurance.
As the quality of few shot facial animation from landmarks increases, new applications become possible, such as ultra low bandwidth video chat compression with a high degree of realism.
One of the most significant challenges of EEG-based emotion recognition is the cross-subject EEG variations, leading to poor performance and generalizability.
The international mega-event, such as the Winter Olympic Game, has been considered as one of the most carbon intensive activities worldwide.
The issue of voltage variations caused by integration of renewables has been addressed in this paper through distributed management of Microgrids (MGs).
1 code implementation • 24 Jan 2022 • Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, Lilian Weng
Similarly to text embeddings, we train code embedding models on (text, code) pairs, obtaining a 20. 8% relative improvement over prior best work on code search.
Ranked #1 on Code Search on CodeSearchNet
The hard samples, which are beneficial for classifier learning, are often mistakenly treated as noises in such a setting since both the hard samples and ones with noisy labels lead to a relatively larger loss value than the easy cases.
no code implementations • 11 Oct 2021 • Jesse Michael Han, Igor Babuschkin, Harrison Edwards, Arvind Neelakantan, Tao Xu, Stanislas Polu, Alex Ray, Pranav Shyam, Aditya Ramesh, Alec Radford, Ilya Sutskever
We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models.
In many applications of computer graphics, art and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph or layout, and have a computer system automatically generate photo-realistic images that adhere to the input content.
no code implementations • 9 Mar 2021 • Xian Sun, Peijin Wang, Zhiyuan Yan, Feng Xu, Ruiping Wang, Wenhui Diao, Jin Chen, Jihao Li, Yingchao Feng, Tao Xu, Martin Weinmann, Stefan Hinz, Cheng Wang, Kun fu
In this paper, we propose a novel benchmark dataset with more than 1 million instances and more than 15, 000 images for Fine-grAined object recognItion in high-Resolution remote sensing imagery which is named as FAIR1M.
no code implementations • 8 Mar 2021 • Christopher Bendkowski, Laurent Mennillo, Tao Xu, Mohamed Elsayed, Filip Stojic, Harrison Edwards, Shuailong Zhang, Cindi Morshead, Vijay Pawar, Aaron R. Wheeler, Danail Stoyanov, Michael Shaw
Optoelectronic tweezer-driven microrobots (OETdMs) are a versatile micromanipulation technology based on the use of light induced dielectrophoresis to move small dielectric structures (microrobots) across a photoconductive substrate.
no code implementations • 13 Jan 2021 • OpenAI OpenAI, Matthias Plappert, Raul Sampedro, Tao Xu, Ilge Akkaya, Vineet Kosaraju, Peter Welinder, Ruben D'Sa, Arthur Petron, Henrique Ponde de Oliveira Pinto, Alex Paino, Hyeonwoo Noh, Lilian Weng, Qiming Yuan, Casey Chu, Wojciech Zaremba
We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects.
no code implementations • 1 Dec 2020 • Maxime Oquab, Pierre Stock, Oran Gafni, Daniel Haziza, Tao Xu, Peizhao Zhang, Onur Celebi, Yana Hasson, Patrick Labatut, Bobo Bose-Kolanu, Thibault Peyronel, Camille Couprie
To unlock video chat for hundreds of millions of people hindered by poor connectivity or unaffordable data costs, we propose to authentically reconstruct faces on the receiver's device using facial landmarks extracted at the sender's side and transmitted over the network.
For smooth convex loss functions with (non)-smooth regularization, we propose the first differentially private ADMM (DP-ADMM) algorithm with performance guarantee of $(\epsilon,\delta)$-differential privacy ($(\epsilon,\delta)$-DP).
Methods for environmental image capture, 3D reconstruction (photogrammetry) and the creation of foreground assets are presented along with a flexible and user-friendly simulation interface.
Existing multimodal emotion databases in the real-world conditions are few and small, with a limited number of subjects and expressed in a single language.
We propose a masking mechanism for feature map reuse, so that memory and computational costs stay nearly constant as the search space expands.
Ranked #68 on Neural Architecture Search on ImageNet
In this paper, we propose a novel and more flexible GCN model with a feature encoder that adaptively updates the adjacency matrix during learning and demonstrate that this model design leads to improved performance.
In the ten-fold cross-validation process, the CADx approach, HIENet, achieved a 76. 91 $\pm$ 1. 17% (mean $\pm$ s. d.) classification accuracy for four classes of endometrial tissue, namely normal endometrium, endometrial polyp, endometrial hyperplasia, and endometrial adenocarcinoma.
1 code implementation • 22 Oct 2018 • Malay Haldar, Mustafa Abdool, Prashant Ramanathan, Tao Xu, Shulin Yang, Huizhong Duan, Qing Zhang, Nick Barrow-Williams, Bradley C. Turnbull, Brendan M. Collins, Thomas Legrand
The application to search ranking is one of the biggest machine learning success stories at Airbnb.
PMODE and HECO-PDE are compared with the algorithms from the IEEE CEC 2018 competition and another recent MOEA for constrained optimisation.
Convolutional neural networks are powerful tools for image segmentation and classification.
no code implementations • 10 Feb 2018 • Tao Tan, Zhang Li, Haixia Liu, Ping Liu, Wenfang Tang, Hui Li, Yue Sun, Yusheng Yan, Keyu Li, Tao Xu, Shanshan Wan, Ke Lou, Jun Xu, Huiming Ying, Quchang Ouyang, Yuling Tang, Zheyu Hu, Qiang Li
To help doctors to be more selective on biopsies and provide a second opinion on diagnosis, in this work, we propose a computer-aided diagnosis (CAD) system for lung diseases including cancers and tuberculosis (TB).
In this paper, we propose an Attentional Generative Adversarial Network (AttnGAN) that allows attention-driven, multi-stage refinement for fine-grained text-to-image generation.
Ranked #8 on Text-to-Image Generation on Multi-Modal-CelebA-HQ
When evaluated with neural distance, our bounds show that generalization is guaranteed as long as the discriminator set is small enough, regardless of the size of the generator or hypothesis set.
In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images.
Ranked #5 on Text-to-Image Generation on Oxford 102 Flowers
Extensive experimental results demonstrate the effectiveness of the proposed SegAN with multi-scale loss: on BRATS 2013 SegAN gives performance comparable to the state-of-the-art for whole tumor and tumor core segmentation while achieves better precision and sensitivity for Gd-enhance tumor core segmentation; on BRATS 2015 SegAN achieves better performance than the state-of-the-art in both dice score and precision.
Ranked #1 on Brain Tumor Segmentation on BRATS-2013 leaderboard
Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.
Ranked #3 on Text-to-Image Generation on Oxford 102 Flowers (Inception score metric)
In this paper, we propose a new CNN architecture that integrates semantic part detection and abstraction (SPDA-CNN) for fine-grained classification.
It is well known that scene images are well characterized by particular arrangements of their local structures, we divide the scene image into the non-overlapping sub-regions and compute the proposed higher order structural features among them.
In object recognition, Fisher vector (FV) representation is one of the state-of-art image representations ways at the expense of dense, high dimensional features and increased computation time.