The task of semantic code search is to retrieve code snippets from a source code corpus based on an information need expressed in natural language.
Researchers have proposed simple yet effective techniques for the retrieval problem based on using BERT as a relevance classifier to rerank initial candidates from keyword search.
This paper describes the first publicly available corpus of Hmong, a minority language of China, Vietnam, Laos, Thailand, and various countries in Europe and the Americas.
For COCO, DE-ViT outperforms the open-vocabulary SoTA by 6. 9 AP50 and achieves 50 AP50 in novel classes.
Ranked #1 on Few-Shot Object Detection on MS-COCO (30-shot)
1 code implementation • 19 Sep 2023 • Chengyan Wang, Jun Lyu, Shuo Wang, Chen Qin, Kunyuan Guo, Xinyu Zhang, Xiaotong Yu, Yan Li, Fanwen Wang, Jianhua Jin, Zhang Shi, Ziqiang Xu, Yapeng Tian, Sha Hua, Zhensen Chen, Meng Liu, Mengting Sun, Xutong Kuang, Kang Wang, Haoran Wang, Hao Li, Yinghua Chu, Guang Yang, Wenjia Bai, Xiahai Zhuang, He Wang, Jing Qin, Xiaobo Qu
However, a limitation of CMR is its slow imaging speed, which causes patient discomfort and introduces artifacts in the images.
To adapt to environmental changes, we model the task graph scheduling for computation offloading as a Markov Decision Process (MDP).
In this work, we revisit this key challenge through the lens of gradient conflicts on the server side.
However, it is difficult to restore the lost details in the dark area by relying only on the RGB domain.
A standard convex SPCA-based model with PSD constraint for unsupervised feature selection is proposed.
In some previous works, additional modules like graph neural networks (GNNs) are trained on retrieved knowledge from external knowledge bases, aiming to mitigate the problem of lacking domain-specific knowledge.
Previous methods use external knowledge as references for text generation to enhance factuality but often struggle with the knowledge mix-up(e. g., entity mismatch) of irrelevant references.
Multi-modal fusion is increasingly being used for autonomous driving tasks, as images from different modalities provide unique information for feature extraction.
In this paper, we introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) for building end-to-end generative information-seeking models that are capable of retrieving candidate quotes and generating attributed explanations.
The language models, especially the basic text classification models, have been shown to be susceptible to textual adversarial attacks such as synonym substitution and word insertion attacks.
Single-frame infrared small target detection is considered to be a challenging task, due to the extreme imbalance between target and background, bounding box regression is extremely sensitive to infrared small targets, and small target information is easy to lose in the high-level semantic layer.
Our fine-tuning procedure outperforms state-of-the-art techniques for unsupervised semantic segmentation through linear probing, without the use of any labeled data.
In this paper, we propose a hybrid point-wise Radar-Optical fusion approach for object detection in autonomous driving scenarios.
When artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have made significant strides.
This scheme generalizes the Mixup strategy for general images to special WSIs via pseudo-bags so as to be applied in MIL-based WSI classification.
We discuss how Pyserini - a widely used toolkit for reproducible IR research can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
Transfer learning provides the possibility to solve this problem, but there are too many features in natural images that are not related to the target domain.
In this work, we aim to re-expose the captured photo in post-processing to provide a more flexible way of addressing those issues within a unified framework.
Ranked #4 on Deblurring on GoPro (using extra training data)
A robot for the field application environment was proposed, and a lightweight global spatial planning technique for the robot based on the graph-search algorithm taking mode switching point optimization into account, with an emphasis on energy efficiency, searching speed, and the viability of real deployment.
The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs.
Supervised ranking methods based on bi-encoder or cross-encoder architectures have shown success in multi-stage text ranking tasks, but they require large amounts of relevance judgments as training data.
To quantify the correlation in multi-modal information, we model the uncertainty, as the inverse of data information, in different modalities and embed it in the bounding box generation.
In this paper, we introduce a new NLP task -- generating short factual articles with references for queries by mining supporting evidence from the Web.
no code implementations • 3 Apr 2023 • Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang
The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another.
Our work significantly outperforms the state-of-the-art for three-finger robotic hands.
To fully exploit information with event streams to detect objects, a dual-memory aggregation network (DMANet) is proposed to leverage both long and short memory along event streams to aggregate effective information for object detection.
In essence, instead of predicting the pixel-wise depth, we regress the height to the ground to achieve a distance-agnostic formulation to ease the optimization process of camera-only perception methods.
Ranked #1 on 3D Object Detection on DAIR-V2X-I
In this paper, we propose a novel active imitation learning framework based on a teacher-student interaction model, in which the teacher's goal is to identify the best teaching behavior and actively affect the student's learning process.
We present Spacerini, a modular framework for seamless building and deployment of interactive search applications, designed to facilitate the qualitative analysis of large scale research datasets.
Recent progress in information retrieval finds that embedding query and document representation into multi-vector yields a robust bi-encoder retriever on out-of-distribution datasets.
The deployment of 3D detectors strikes one of the major challenges in real-world self-driving scenarios.
With the increasing complexity of the traffic environment, the importance of safety perception in intelligent driving is growing.
We start with a N=50 user study to investigate how different factors may affect human performance, which help us gain prior knowledge in framework design.
This paper proposes a generative design approach based on the generative pre-trained language model (PLM) to automatically retrieve and map biological analogy and generate BID in the form of natural language.
Label smoothing is a regularization technique widely used in supervised learning to improve the generalization of models on various tasks, such as image classification and machine translation.
That is to say, the smaller the model, the lower the mask ratio needs to be.
Machine learning and deep learning classification models are data-driven, and the model and the data jointly determine their classification performance.
RFID localization is considered the key enabler of automating the process of inventory tracking and management for high-performance logistic network.
MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc retrieval across 18 different languages, which collectively encompass over three billion native speakers around the world.
However, only a handful of the 7000+ languages on the planet benefit from specialized, custom-built tokenization algorithms, while the other languages are stuck with a "default" whitespace tokenizer, which cannot capture the intricacies of different languages.
Hyperlinks, which are commonly used in Web pages, have been leveraged for designing pre-training objectives.
In this paper, we proposed a novel method to design a waveform to synthesize the STAF based on suppressing the interference power.
As such, we propose a novel camera-LiDAR fusion 3D MOT framework based on the Combined Appearance-Motion Optimization (CAMO-MOT), which uses both camera and LiDAR data and significantly reduces tracking failures caused by occlusion and false detection.
In this work, we propose a curriculum learning framework for context-aware document ranking, in which the ranking model learns matching signals between the search context and the candidate document in an easy-to-hard manner.
Our experiments on 13 benchmark datasets across five natural language understanding tasks demonstrate the superiority of our method.
The novelty of this project is to utilize the ordinal information among labels and add a new regression task, which can help the model learn more discriminative feature embedding for fine-grained classification tasks.
1 code implementation • 21 Jul 2022 • Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, Gang Zhang, Xinyu Zhang, Zhigang Wang, Jinwen Chen, Jian Wang, Lufei Liu, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang
UFO aims to benefit each single task with a large-scale pretraining on all tasks.
In PGU, we adopt a set of shared and learnable prototypes as the queries to extract diverse and semantically aligned features for both modalities in the granularity-unified feature space, which further promotes the ReID performance.
Besides, by leveraging full training set and the additional 48K raw images of KITTI, it can further improve the MonoFlex by +4. 65% improvement on AP@0. 7 for car detection, reaching 18. 54% AP@0. 7, which ranks the 1st place among all monocular based methods on KITTI test leaderboard.
Heart rate measuring based on remote photoplethysmography (rPPG) plays an important role in health caring, which estimates heart rate from facial video in a non-contact, less-constrained way.
In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human-, and self-supervision during the pre-training of Vision Transformers (ViTs).
For example, on MS MARCO Passage v1, our method yields an average candidate set size of 27 out of 1, 000 which increases the reranking speed by about 37 times, while the MRR@10 is greater than a pre-specified value of 0. 38 with about 90% empirical coverage and the empirical baselines fail to provide such guarantee.
Specifically, we generate support samples from actual samples and their neighbouring clusters in the embedding space through a progressive linear interpolation (PLI) strategy.
Dense retrieval models using a transformer-based bi-encoder design have emerged as an active area of research.
To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR).
Therefore, this study proposed a novel multi-channel convolutional neural network (CNN) architecture to address the multi-class classification task of thyroid disease.
However, to figure out whether PLMs can be reliable knowledge sources and used as alternative knowledge bases (KBs), we need to further explore some critical features of PLMs.
State-of-the-art quantization techniques are currently applied to both the weights and activations; however, pruning is most often applied to only the weights of the network.
no code implementations • 14 Oct 2021 • Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, Xinyu Zhang, Lei Chen, Zhicheng Dou, Xipeng Qiu, Zikai Guo, Ruofei Lai, Jiawen Wu, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao
To increase the number of activated experts without an increase in computational cost, we propose SAM (Switch and Mixture) routing, an efficient hierarchical routing mechanism that activates multiple experts in a same device (GPU).
ELUE is dedicated to depict the Pareto Frontier for various language understanding tasks, such that it can tell whether and how much a method achieves Pareto improvement.
We propose a novel modification of the standard upper confidence bound (UCB) method for the stochastic multi-armed bandit (MAB) problem which tunes the confidence bound of a given bandit based on its distance to others.
Finally, supervisory signal in rear compressor is computed based on condition probability and thus can control sample dynamic and further enhance the model performance.
To learn a more robust representation of the user behavior sequence, we propose a method based on contrastive learning, which takes into account the possible variations in user's behavior sequences.
In this paper, we propose to leverage the large-scale hyperlinks and anchor texts to pre-train the language model for ad-hoc retrieval.
We present Mr. TyDi, a multi-lingual benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages, designed to evaluate ranking with learned dense representations.
Currently, the most popular method for open-domain Question Answering (QA) adopts "Retriever and Reader" pipeline, where the retriever extracts a list of candidate documents from a large set of documents followed by a ranker to rank the most relevant documents and the reader extracts answer from the candidates.
Vehicle detection with visual sensors like lidar and camera is one of the critical functions enabling autonomous driving.
Through an IPS (Intersection Perception System) installed at the diagonal of the intersection, this paper proposes a high-quality multimodal dataset for the intersection perception task.
But more importantly, the proposed $W$++ space achieves superior performance in both reconstruction quality and editing quality.
In this paper, we show that a novel objective function for the training of the ensemble internal classifiers can be naturally induced from the perspective of ensemble learning and information theory.
Then we propose a dual encoder-decoder structure to model the generation of responses in both positive and negative side based on the changes of the user's emotion status in the conversation.
To address this issue, we propose a novel graph convolutional network-based LDCT denoising model, namely GCN-MIF, to explicitly perform multi-information fusion for denoising purpose.
3D object detection with a single image is an essential and challenging task for autonomous driving.
There has recently been growing interest in utilizing multimodal sensors to achieve robust lane line segmentation.
3D object detection plays a crucial role in environmental perception for autonomous vehicles, which is the prerequisite of decision and control.
Person search aims to localize and identify a specific person from a gallery of images.
Many recent trackers adopt the multiple-stage tracking strategy to improve the quality of bounding box estimation.
Ranked #13 on Semi-Supervised Video Object Segmentation on VOT2020
In this paper, we present P2PLocate, a peer-to-peer localization system that enables a single-antenna device co-located with a batteryless backscatter tag to localize another single-antenna device with decimeter-level accuracy.
Object detection methods are widely adopted for computer-aided diagnosis using medical images.
The second limitation comes from a common sense that these sensors can only pick up a narrow band (85-100Hz) of speech signals due to a sampling ceiling of 200Hz.