Therefore, in this paper, we are the first to jointly perform multi-modal ATE (MATE) and multi-modal ASC (MASC), and we propose a multi-modal joint learning approach with auxiliary cross-modal relation detection for multi-modal aspect-level sentiment analysis (MALSA).
Therefore, we propose SpeechTokenizer, a unified speech tokenizer for speech large language models.
TFISeg does not require training a semantic and/or instance segmentation model and avoids the need for instance-level image annotations.
A recent DIC method proposes to generate distinctive captions by comparing the target image with a set of semantically similar reference images, i.e., reference-based DIC (Ref-DIC).
no code implementations • 30 May 2023 • Maruf Adewole, Jeffrey D. Rudie, Anu Gbadamosi, Oluyemisi Toyobo, Confidence Raymond, Dong Zhang, Olubukola Omidiji, Rachel Akinola, Mohammad Abba Suwaid, Adaobi Emegoakor, Nancy Ojo, Kenneth Aguh, Chinasa Kalaiwo, Gabriel Babatunde, Afolabi Ogunleye, Yewande Gbadamosi, Kator Iorpagher, Evan Calabrese, Mariam Aboian, Marius Linguraru, Jake Albrecht, Benedikt Wiestler, Florian Kofler, Anastasia Janas, Dominic LaBella, Anahita Fathi Kzerooni, Hongwei Bran Li, Juan Eugenio Iglesias, Keyvan Farahani, James Eddy, Timothy Bergquist, Verena Chung, Russell Takeshi Shinohara, Walter Wiggins, Zachary Reitman, Chunhao Wang, Xinyang Liu, Zhifan Jiang, Ariana Familiar, Koen van Leemput, Christina Bukas, Maire Piraud, Gian-Marco Conte, Elaine Johansson, Zeke Meier, Bjoern H Menze, Ujjwal Baid, Spyridon Bakas, Farouk Dako, Abiodun Fatade, Udunna C Anazodo
Thus, the BraTS-Africa Challenge provides a unique opportunity to include brain MRI glioma cases from SSA in global efforts through the BraTS Challenge to develop and evaluate computer-aided-diagnostic (CAD) methods for the detection and characterization of glioma in resource-limited settings, where CAD tools have the greatest potential to transform healthcare.
The key point is to bridge the modality gap between speech and text so that useful MT techniques can be applied to ST.
Multi-modal large language models are regarded as a crucial step towards Artificial General Intelligence (AGI) and have garnered significant interest with the emergence of ChatGPT.
Sharing revenue with data providers using such a scoring system would encourage more data owners to participate in the revenue-sharing program.
Medical image segmentation is a fundamental task in the community of medical image analysis.
In this paper, we propose a novel image forgery detection paradigm for boosting the model learning capacity on both forgery-sensitive and genuine compact visual patterns.
Specifically, a flexible context aggregation module is proposed to capture the global object context in different granular spaces.
Recently, the advent of the vision Transformer (ViT) has brought substantial advancements to 3D dataset benchmarks, particularly in 3D volumetric medical image segmentation (Vol-MedSeg).
SSC is a well-known ill-posed problem as the prediction model has to "imagine" what is behind the visible surface, which is usually represented by Truncated Signed Distance Function (TSDF).
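The TSDF representation mentioned above can be illustrated with a minimal sketch: signed distances to the visible surface are truncated to a fixed band and normalized, so only the region near the surface boundary carries gradient information. The truncation threshold and 1-D setup below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def tsdf(signed_dist, trunc=0.3):
    """Truncate signed distances to [-trunc, trunc] and normalize to [-1, 1].

    Positive values lie in free space in front of the visible surface,
    negative values behind it. Truncation discards distances far from the
    surface, which is why SSC must "imagine" the occluded region near the
    boundary rather than read it off directly.
    """
    d = np.clip(np.asarray(signed_dist, dtype=float), -trunc, trunc)
    return d / trunc

# Voxels at increasing depth relative to a surface located at distance 0.
dists = np.array([0.5, 0.15, 0.0, -0.15, -0.5])
print(tsdf(dists))  # the surface voxel maps to 0; far voxels saturate at +/-1
```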
In this paper, we propose a novel framework, TransPro, that translates 3D Optical Coherence Tomography (OCT) images into exclusive 3D OCTA images using an image translation pattern.
no code implementations • 2 Feb 2023 • Meng Zhao, Yifan Hu, Ruixuan Jiang, Yuanli Zhao, Dong Zhang, Yan Zhang, Rong Wang, Yong Cao, Qian Zhang, Yonggang Ma, Jiaxi Li, Shaochen Yu, Wenjie Li, Ran Zhang, Yefeng Zheng, Shuo Wang, Jizong Zhao
Conclusions: The proposed deep learning algorithms can be an effective tool for early identification of hemorrhage etiologies based on NCCT scans.
Noticing that both the absolute- and relative-velocity protocols can solve the second-order consensus of multi-agent systems, this paper investigates which of the two protocols has better anti-disturbance capability, measured by the L2 gain from the disturbance to the consensus error.
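The L2-gain measure used here can be stated explicitly (this is the standard definition; the paper's precise error signal is an assumption). With disturbance $w$ and consensus error $e$,

```latex
\gamma \;=\; \sup_{\|w\|_{L_2}\neq 0}\,\frac{\|e\|_{L_2}}{\|w\|_{L_2}},
\qquad
\|x\|_{L_2} = \left(\int_0^{\infty}\|x(t)\|^2\,dt\right)^{1/2},
```

so a smaller $\gamma$ means the consensus error is less amplified by the disturbance, i.e., better anti-disturbance capability.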
To address this problem, in this paper, we propose a Centralized Feature Pyramid (CFP) for object detection, which is based on a globally explicit centralized feature regulation.
Over the past few years, the rapid development of deep learning technologies for computer vision has significantly improved the performance of medical image segmentation (MedISeg).
In this paper, we propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern.
Unfortunately, the reference images used by existing Ref-DIC works are easy to distinguish: they resemble the target image only at the scene level and share few common objects, so a Ref-DIC model can trivially generate distinctive captions even without considering the reference images.
To relax this assumption, in this work, we propose a label-agnostic unified federated learning framework, named FedMix, for medical image segmentation based on mixed image labels.
In this letter, we first underline the importance of the neck network in object detection from the perspective of information bottleneck.
2 code implementations • 4 Mar 2022 • Maxime Gasse, Quentin Cappart, Jonas Charfreitag, Laurent Charlin, Didier Chételat, Antonia Chmiela, Justin Dumouchelle, Ambros Gleixner, Aleksandr M. Kazachkov, Elias Khalil, Pawel Lichocki, Andrea Lodi, Miles Lubin, Chris J. Maddison, Christopher Morris, Dimitri J. Papageorgiou, Augustin Parjadis, Sebastian Pokutta, Antoine Prouvost, Lara Scavuzzo, Giulia Zarpellon, Linxin Yang, Sha Lai, Akang Wang, Xiaodong Luo, Xiang Zhou, Haohan Huang, Shengcheng Shao, Yuanming Zhu, Dong Zhang, Tao Quan, Zixuan Cao, Yang Xu, Zhewei Huang, Shuchang Zhou, Chen Binbin, He Minggui, Hao Hao, Zhang Zhiyu, An Zhiwu, Mao Kun
Combinatorial optimization is a well-established area in operations research and computer science.
To enhance the robustness of the system and reduce data collecting efforts, we design a data augmentation framework for mmWave signals based on correlations between signal patterns and gesture variations.
This manuscript presents an algorithm for individual Lithium-ion (Li-ion) battery cell state of charge (SOC) estimation in a large-scale battery pack under minimal sensing, where only pack-level voltage and current are measured.
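As context for the SOC problem above, a coulomb-counting baseline is sketched below: SOC is updated by integrating current over time. This is a deliberately simple reference point, not the paper's estimator; the function name and per-cell capacity input are illustrative assumptions, and the sketch highlights why pack-level-only sensing is hard, since all cells share one measured current while their individual SOCs drift apart.

```python
def coulomb_count_soc(soc0, current_a, dt_s, capacity_ah):
    """Baseline coulomb-counting SOC update.

    soc0: initial state of charge in [0, 1]; current_a: per-sample current
    in amperes (positive = discharge); dt_s: sample period in seconds;
    capacity_ah: cell capacity in ampere-hours.
    """
    soc = soc0
    for i in current_a:
        # Convert capacity to ampere-seconds and subtract the charge drawn.
        soc -= (i * dt_s) / (capacity_ah * 3600.0)
    return soc

# Discharging a full 10 Ah cell at 5 A for one hour leaves 50% SOC.
print(coulomb_count_soc(1.0, [5.0] * 3600, 1.0, 10.0))
```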
More specifically, we scan the whole input image and its priority maps in the form of column vectors to obtain a relevance matrix that estimates their similarity.
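A minimal sketch of this column-vector relevance computation, using NumPy: the image and priority map are each flattened into a column vector, and a relevance matrix is formed from their normalized outer product. The cosine-style normalization is an illustrative assumption; the paper's exact similarity measure may differ.

```python
import numpy as np

def relevance_matrix(image, priority_map):
    """Flatten both 2-D arrays into column vectors and compare them.

    Returns an (N, N) relevance matrix, N = H * W, whose entry (i, j)
    relates pixel i of the image to pixel j of the priority map.
    """
    x = image.reshape(-1, 1).astype(float)          # column vector (N, 1)
    y = priority_map.reshape(-1, 1).astype(float)   # column vector (N, 1)
    r = x @ y.T                                     # outer-product relevance
    return r / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-12)

img = np.arange(4).reshape(2, 2)
pri = np.ones((2, 2))
print(relevance_matrix(img, pri).shape)  # (4, 4)
```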
Specifically, for a given set of feature maps, CG first computes the feature similarity between each channel and the remaining channels as the intermediary calibration guidance.
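The channel-similarity step of CG can be sketched as follows, assuming cosine similarity over flattened (C, H, W) feature maps; the diagonal is zeroed so each channel is guided only by the *remaining* channels, as the sentence above describes. The exact similarity function is an assumption here.

```python
import numpy as np

def channel_guidance(feats):
    """Similarity between each channel and every other channel.

    feats: (C, H, W) feature maps. Returns a (C, C) matrix whose row c
    holds channel c's cosine similarity to all other channels, serving
    as the intermediary calibration guidance.
    """
    c = feats.shape[0]
    flat = feats.reshape(c, -1).astype(float)
    norms = np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12
    sim = (flat / norms) @ (flat / norms).T
    np.fill_diagonal(sim, 0.0)  # exclude each channel's self-similarity
    return sim

g = channel_guidance(np.random.rand(8, 4, 4))
print(g.shape)  # (8, 8)
```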
In this paper, a novel OpenFlow-enabled deep packet inspection (OFDPI) approach is proposed based on the SDN paradigm to provide adaptive and efficient packet inspection.
no code implementations • 15 Oct 2020 • Tri Vu, Anthony DiSpirito III, Daiwei Li, Zixuan Zhang, Xiaoyi Zhu, Maomao Chen, Laiming Jiang, Dong Zhang, Jianwen Luo, Yu Shrike Zhang, Qifa Zhou, Roarke Horstmeyer, Junjie Yao
Photoacoustic microscopy (PAM) is an emerging imaging method combining light and sound.
We present a causal inference framework to improve Weakly-Supervised Semantic Segmentation (WSSS).
Ranked #26 on Weakly-Supervised Semantic Segmentation on COCO 2014 val
As local pose estimation is ill-conditioned, local pose estimation failures happen regularly, making the overall SLAM system brittle.
This paper introduces our approaches for the Mask and Breathing Sub-Challenge in the Interspeech COMPARE Challenge 2020.
Yet, the non-local spatial interactions are not across scales, and thus they fail to capture the non-local contexts of objects (or parts) residing in different scales.
One primary technical challenge in photoacoustic microscopy (PAM) is the necessary compromise between spatial resolution and imaging speed.
Paper-reviewer recommendation task is of significant academic importance for conference chairs and journal editors.
The proposed DMQCA model consists of a multiview module with two attention mechanisms, a key-frame module, and a regression module to achieve direct and accurate multiple-index estimation.
On the one hand, our approach represents each utterance and each speaker as a node.
Ranked #46 on Emotion Recognition in Conversation on MELD
With multiple crowd gatherings of millions of people every year, in events ranging from pilgrimages to protests, concerts to marathons, and festivals to funerals, visual crowd analysis is emerging as a new frontier in computer vision.
Ranked #7 on Crowd Counting on UCF-QNRF
Since the source sentence is broken into two fragments, the left fragment (before the blank) and the right fragment (after the blank), traditional Recurrent Neural Networks cannot encode this structure accurately: the missing word can vary widely in both its location and its type within the source sentence.
To reduce the large search space, the first stage (ClusterNet) takes in a set of extremely large video frames, combines the motion and appearance information within the convolutional architecture, and proposes regions of objects of interest (ROOBI).
Given the action proposals in a video, the goal of the proposed work is to generate a few better action proposals that are ranked properly.
In this paper, we propose a two-view label propagation approach to semi-supervised reader emotion classification by exploiting two views, namely source text and response text in a label propagation algorithm.
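The propagation step underlying such an approach can be sketched in a single view: labels spread over a similarity graph while labeled nodes stay clamped. The two-view variant would build one graph from source texts and one from response texts and alternate between them; the single-view version, the `alpha` value, and the graph below are illustrative assumptions.

```python
import numpy as np

def propagate_labels(sim, labels, mask, alpha=0.9, iters=50):
    """Iterative label propagation on a similarity graph.

    sim: (N, N) nonnegative similarities; labels: (N, K) one-hot rows for
    labeled nodes, zeros elsewhere; mask: (N,) 1 where a node is labeled.
    Labeled nodes are re-clamped each iteration so seeds never drift.
    """
    p = sim / (sim.sum(axis=1, keepdims=True) + 1e-12)  # row-stochastic
    f = labels.astype(float).copy()
    for _ in range(iters):
        f = alpha * (p @ f) + (1 - alpha) * labels
        f[mask == 1] = labels[mask == 1]
    return f.argmax(axis=1)

# Tiny graph: nodes {0, 1} are similar, {2, 3} are similar;
# seeds: node 0 -> class 0, node 2 -> class 1.
sim = np.array([[0.0, 1.0, 0.1, 0.1],
                [1.0, 0.0, 0.1, 0.1],
                [0.1, 0.1, 0.0, 1.0],
                [0.1, 0.1, 1.0, 0.0]])
labels = np.array([[1, 0], [0, 0], [0, 1], [0, 0]], dtype=float)
mask = np.array([1, 0, 1, 0])
print(propagate_labels(sim, labels, mask))  # unlabeled nodes inherit cluster labels
```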
Accordingly, we propose an end-to-end face recognition method, based on convolutional networks, that handles pose and illumination simultaneously by extracting discriminative nonlinear features invariant to both.
For character detection, we use the HSC features instead of using the Histograms of Oriented Gradients (HOG) features.
Using the idea of "Association", the optimal tracklets are generated for each abstract body part, in order to enforce the spatiotemporal constraints between body parts in adjacent frames.
In contrast, we propose to extract cross-image features, i.e., features across the pair of images, which, as we demonstrate, are more discriminative with respect to the similarity and dissimilarity of faces.
The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video.