This strategy can implicitly enhance the model's robustness during the optimization process, mitigating instability risks arising from multimodal information inputs.
Deep learning-based hyperspectral image (HSI) super-resolution, which aims to generate high spatial resolution HSI (HR-HSI) by fusing hyperspectral image (HSI) and multispectral image (MSI) with deep neural networks (DNNs), has attracted lots of attention.
The self-attention mechanism (SAM) is widely used in various fields of artificial intelligence and has successfully boosted the performance of different models.
In computer vision, the performance of deep neural networks (DNNs) is highly related to the feature extraction ability, i. e., the ability to recognize and focus on key pixel regions in an image.
Our approach can make text-to-image diffusion models easier to use with better user experience, which demonstrates our approach has the potential for further advancing the development of user-friendly text-to-image generation models by bridging the semantic gap between simple narrative prompts and complex keyword-based prompts.
This technique enables the mitigation of the extra costs for performance improvement during training, such as parameter size and inference time, through these transformations during inference, and therefore SRP has great potential for industrial and practical applications.
Naturally, we also present a unified multi-task Geometric Transformer framework, Geoformer, to tackle calculation and proving problems simultaneously in the form of sequence generation, which finally shows the reasoning ability can be improved on both two tasks by unifying formulation.
Ranked #3 on Mathematical Reasoning on PGPS9K
More and more empirical and theoretical evidence shows that deepening neural networks can effectively improve their performance under suitable training settings.
However, most existing methods mainly focus on the dialogue context or assist with global satisfaction prediction based on multi-task learning, which ignore the grounded relationships among the causal variables, like the user state and labor cost.
Recently many effective attention modules are proposed to boot the model performance by exploiting the internal information of convolutional neural networks in computer vision.
Real-world image super-resolution is a practical image restoration problem that aims to obtain high-quality images from in-the-wild input, has recently received considerable attention with regard to its tremendous application potentials.
To address this issue and make a step towards interpretable MWP solving, we first construct a high-quality MWP dataset named InterMWP which consists of 11, 495 MWPs and annotates interpretable logical formulas based on algebraic knowledge as the grounded linguistic logic of each solution equation.
However, current solvers exist solving bias which consists of data bias and learning bias due to biased dataset and improper training strategy.
Previous math word problem solvers following the encoder-decoder paradigm fail to explicitly incorporate essential math symbolic constraints, leading to unexplainable and unreasonable predictions.
Therefore, we propose a Geometric Question Answering dataset GeoQA, containing 4, 998 geometric problems with corresponding annotated programs, which illustrate the solving process of the given problems.
Ranked #4 on Mathematical Reasoning on PGPS9K
In contrast to existing studies that ignore difficulty diversity, we adopt different stage of a neural network to perform image restoration.
A practical automatic textual math word problems (MWPs) solver should be able to solve various textual MWPs while most existing works only focused on one-unknown linear MWPs.
Ranked #10 on Math Word Problem Solving on ALG514
Capitalized on the topic-level dialogue graph, we propose a new evaluation metric GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation.
Target-guided open-domain conversation aims to proactively and naturally guide a dialogue agent or human to achieve specific goals, topics or keywords during open-ended conversations.
To identify whether a region is easy or hard, we propose a novel image difficulty recognition network based on PSNR prior.
no code implementations • 3 Oct 2018 • Andrey Ignatov, Radu Timofte, Thang Van Vu, Tung Minh Luu, Trung X. Pham, Cao Van Nguyen, Yongwoo Kim, Jae-Seok Choi, Munchurl Kim, Jie Huang, Jiewen Ran, Chen Xing, Xingguang Zhou, Pengfei Zhu, Mingrui Geng, Yawei Li, Eirikur Agustsson, Shuhang Gu, Luc van Gool, Etienne de Stoutz, Nikolay Kobyshev, Kehui Nie, Yan Zhao, Gen Li, Tong Tong, Qinquan Gao, Liu Hanwen, Pablo Navarrete Michelini, Zhu Dan, Hu Fengshuo, Zheng Hui, Xiumei Wang, Lirui Deng, Rang Meng, Jinghui Qin, Yukai Shi, Wushao Wen, Liang Lin, Ruicheng Feng, Shixiang Wu, Chao Dong, Yu Qiao, Subeesh Vasu, Nimisha Thekke Madam, Praveen Kandula, A. N. Rajagopalan, Jie Liu, Cheolkon Jung
This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones.