1 code implementation • ICML 2020 • Xu Chu, Yang Lin, Xiting Wang, Xin Gao, Qi Tong, Hailong Yu, Yasha Wang
Distance metric learning (DML) is to learn a representation space equipped with a metric, such that examples from the same class are closer than examples from different classes with respect to the metric.
no code implementations • 23 Dec 2024 • Xin Gao, Yang Lin, Ruiqing Li, Yasha Wang, Xu Chu, Xinyu Ma, Hailong Yu
Data mining and knowledge discovery are essential aspects of extracting valuable insights from vast datasets.
1 code implementation • 31 Oct 2024 • Xinke Jiang, Rihong Qiu, Yongxin Xu, Wentao Zhang, Yichen Zhu, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang
Graph Neural Networks (GNNs) have become essential in interpreting relational data across various domains, yet, they often struggle to generalize to unseen graph data that differs markedly from training instances.
no code implementations • 22 Oct 2024 • Zhijie Tan, Xu Chu, Weiping Li, Tong Mo
Furthermore, we demonstrate that popular MLLMs pay special attention to certain multimodal context positions, particularly the beginning and end.
no code implementations • 14 Oct 2024 • Yongxin Xu, Ruizhe Zhang, Xinke Jiang, Yujie Feng, Yuzhen Xiao, Xinyu Ma, Runchuan Zhu, Xu Chu, Junfeng Zhao, Yasha Wang
Retrieval-Augmented Generation (RAG) offers an effective solution to the issues faced by Large Language Models (LLMs) in hallucination generation and knowledge obsolescence by incorporating externally retrieved knowledge.
no code implementations • 13 Oct 2024 • Hongxin Ding, Yue Fang, Runchuan Zhu, Xinke Jiang, Jinyang Zhang, Yongxin Xu, Xu Chu, Junfeng Zhao, Yasha Wang
To address this, we propose a two-stage model-centric data selection framework, Decomposed Difficulty Data Selection (3DS), which aligns data with the model's knowledge distribution for optimized adaptation.
no code implementations • 10 Oct 2024 • Weibin Liao, Xu Chu, Yasha Wang
In the domain of complex reasoning tasks, such as mathematical reasoning, recent advancements have proposed the use of Direct Preference Optimization (DPO) to suppress output of dispreferred responses, thereby enhancing the long-chain reasoning capabilities of large language models (LLMs).
1 code implementation • 23 Aug 2024 • Zhihao Yu, Yujie Jin, Yongxin Xu, Xu Chu, Yasha Wang, Junfeng Zhao
While pioneering deep learning methods have made great strides in analyzing electronic health record (EHR) data, they often struggle to fully capture the semantics of diverse medical codes from limited data.
1 code implementation • 19 Aug 2024 • Yujie Feng, Xu Chu, Yongxin Xu, Guangyuan Shi, Bo Liu, Xiao-Ming Wu
A practical dialogue system requires the capacity for ongoing skill acquisition and adaptability to new tasks while preserving prior knowledge.
2 code implementations • 17 Aug 2024 • Xinke Jiang, Yue Fang, Rihong Qiu, Haoyu Zhang, Yongxin Xu, Hao Chen, Wentao Zhang, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang
In the pursuit of enhancing domain-specific Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) emerges as a promising solution to mitigate issues such as hallucinations, outdated knowledge, and limited expertise in highly specialized queries.
1 code implementation • 9 Aug 2024 • Yujie Feng, Xu Chu, Yongxin Xu, Zexin Lu, Bo Liu, Philip S. Yu, Xiao-Ming Wu
In this paper, we introduce a novel CL framework for language models, named Knowledge Localization and Fusion (KlF), which boosts knowledge transfer without depending on memory replay.
no code implementations • 6 Aug 2024 • Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao, Runchuan Zhu, Xinke Jiang, Xu Chu, Junfeng Zhao, Yasha Wang
Simultaneously, we proposed a rewriting strategy and data ratio optimization strategy to address preference imbalances.
no code implementations • 28 May 2024 • Renzhi Wu, Pramod Chunduri, Dristi J Shah, Ashmitha Julius Aravind, Ali Payani, Xu Chu, Joy Arulraj, Kexin Rong
In this paper, we will present SketchQL, a video database management system (VDBMS) for retrieving video moments with a sketch-based query interface.
1 code implementation • 15 May 2024 • Zhihao Yu, Xu Chu, Yujie Jin, Yasha Wang, Junfeng Zhao
To tackle this problem, we propose SMART, a Self-Supervised Missing-Aware RepresenTation Learning approach for patient health status prediction, which encodes missing information via elaborated attentions and learns to impute missing values through a novel self-supervised pre-training approach that reconstructs missing data representations in the latent space.
no code implementations • 15 Apr 2024 • Yang Lin, Xinyu Ma, Xu Chu, Yujie Jin, Zhibang Yang, Yasha Wang, Hong Mei
We then demonstrate the theoretical mechanism of our LoRA Dropout mechanism from the perspective of sparsity regularization by providing a generalization error bound under this framework.
2 code implementations • 5 Apr 2024 • Xinyu Ma, Xu Chu, Zhibang Yang, Yang Lin, Xin Gao, Junfeng Zhao
With the increasingly powerful performances and enormous scales of pretrained models, promoting parameter efficiency in fine-tuning has become a crucial need for effective and efficient adaptation to various downstream tasks.
no code implementations • 21 Mar 2024 • Yang Yao, Xin Wang, Zeyang Zhang, Yijian Qin, Ziwei Zhang, Xu Chu, Yuekui Yang, Wenwu Zhu, Hong Mei
In this paper, we propose LLM4GraphGen to explore the ability of LLMs for graph generation with systematical task designs and extensive experiments.
no code implementations • 30 Jan 2024 • Weibin Liao, Yinghao Zhu, Zixiang Wang, Xu Chu, Yasha Wang, Liantao Ma
PAI no longer introduces any imputed data but constructs a learnable prompt to model the implicit preferences of the downstream model for missing values, resulting in a significant performance improvement for all EHR analysis models.
1 code implementation • 18 Jan 2024 • Ruizhe Zhang, Xinke Jiang, Yuchen Fang, Jiayuan Luo, Yongxin Xu, Yichen Zhu, Xu Chu, Junfeng Zhao, Yasha Wang
Graph Neural Networks (GNNs) have shown considerable effectiveness in a variety of graph learning tasks, particularly those based on the message-passing approach in recent years.
1 code implementation • 14 Jan 2024 • Zhihao Yu, Xu Chu, Liantao Ma, Yasha Wang, Wenwu Zhu
To bridge this gap, we propose PRIME, a Prototype Recurrent Imputation ModEl, which integrates both intra-series and inter-series information for imputing missing values in irregularly sampled time series.
1 code implementation • 26 Dec 2023 • Xinke Jiang, Ruizhe Zhang, Yongxin Xu, Rihong Qiu, Yue Fang, Zhiyuan Wang, Jinyi Tang, Hongxin Ding, Xu Chu, Junfeng Zhao, Yasha Wang
In this paper, we investigate the retrieval-augmented generation (RAG) based on Knowledge Graphs (KGs) to improve the accuracy and reliability of Large Language Models (LLMs).
1 code implementation • 20 Aug 2023 • Peng Li, Zhiyi Chen, Xu Chu, Kexin Rong
Data preprocessing is a crucial step in the machine learning process that transforms raw data into a more usable format for downstream ML models.
1 code implementation • NeurIPS 2023 • Xinyu Ma, Xu Chu, Yasha Wang, Yang Lin, Junfeng Zhao, Liantao Ma, Wenwu Zhu
To solve this problem, we propose a novel graph mixup algorithm called FGWMixup, which seeks a midpoint of source graphs in the Fused Gromov-Wasserstein (FGW) metric space.
no code implementations • CVPR 2023 • Lianzhe Wang, Shiji Zhou, Shanghang Zhang, Xu Chu, Heng Chang, Wenwu Zhu
Despite the broad interest in meta-learning, the generalization problem remains one of the significant challenges in this field.
no code implementations • 13 Nov 2022 • Renzhi Wu, Alexander Bendeck, Xu Chu, Yeye He
We also show that a deep learning EM end model (DeepMatcher) trained on labels generated from our weak supervision approach is comparable to an end model trained using tens of thousands of ground-truth labels, demonstrating that our approach can significantly reduce the labeling efforts required in EM.
1 code implementation • 28 Oct 2022 • Yujie Jin, Xu Chu, Yasha Wang, Wenwu Zhu
Based on the proposed term of invariance, we propose a novel deep DG method called Angular Invariance Domain Generalization Network (AIDGN).
1 code implementation • 28 Oct 2022 • Chaohe Zhang, Xu Chu, Liantao Ma, Yinghao Zhu, Yasha Wang, Jiangtao Wang, Junfeng Zhao
M3Care is an end-to-end model compensating the missing information of the patients with missing modalities to perform clinical analysis.
1 code implementation • 15 Sep 2022 • Hantian Zhang, Ki Hyun Tae, Jaeyoung Park, Xu Chu, Steven Euijong Whang
We then propose an approximate linear programming algorithm and provide theoretical guarantees on how close its result is to the optimal solution in terms of the number of label flips.
1 code implementation • 27 Jul 2022 • Renzhi Wu, Shen-En Chen, Jieyu Zhang, Xu Chu
We train the model on synthetic data generated in the way that ensures the model approximates the analytical optimal solution, and build the model upon Graph Neural Network (GNN) to ensure the model prediction being invariant (or equivariant) to the permutation of LFs (or data points).
no code implementations • 21 Apr 2022 • Xinyu Ma, Xu Chu, Yasha Wang, Hailong Yu, Liantao Ma, Wen Tang, Junfeng Zhao
Thus, to address the issues, we expect to group up strongly correlated features and learn feature correlations in a group-wise manner to reduce the learning complexity without losing generality.
1 code implementation • 6 Feb 2022 • Renzhi Wu, Bolin Ding, Xu Chu, Zhewei Wei, Xiening Dai, Tao Guan, Jingren Zhou
We derive conditions of the learning framework under which the learned model is workload agnostic, in the sense that the model/estimator can be trained with synthetically generated training data, and then deployed into any data warehouse simply as, e. g., user-defined functions (UDFs), to offer efficient (within microseconds on CPU) and accurate NDV estimations for unseen tables and workloads.
no code implementations • 21 Jun 2021 • Renzhi Wu, Prem Sakala, Peng Li, Xu Chu, Yeye He
Panda's IDE includes many novel features purpose-built for EM, such as smart data sampling, a builtin library of EM utility functions, automatically generated LFs, visual debugging of LFs, and finally, an EM-specific labeling model.
no code implementations • 13 Mar 2021 • Hantian Zhang, Xu Chu, Abolfazl Asudeh, Shamkant B. Navathe
Existing techniques for producing fair ML models either are limited to the type of fairness constraints they can handle (e. g., preprocessing) or require nontrivial modifications to downstream ML training algorithms (e. g., in-processing).
1 code implementation • 11 May 2020 • Bojan Karlaš, Peng Li, Renzhi Wu, Nezihe Merve Gürel, Xu Chu, Wentao Wu, Ce Zhang
Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data.
1 code implementation • 16 Aug 2019 • Renzhi Wu, Sanya Chaba, Saurabh Sawlani, Xu Chu, Saravanan Thirumuruganathan
We investigate an important problem that vexes practitioners: is it possible to design an effective algorithm for ER that requires Zero labeled examples, yet can achieve performance comparable to supervised approaches?
no code implementations • 20 Apr 2019 • Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, Ce Zhang
Data quality affects machine learning (ML) model performances, and data scientists spend considerable amount of time on data cleaning before model training.
1 code implementation • 11 Mar 2019 • Nilaksh Das, Sanya Chaba, Renzhi Wu, Sakshi Gandhi, Duen Horng Chau, Xu Chu
We build the GOGGLES system that implements affinity coding for labeling image datasets by designing a novel set of reusable affinity functions for images, and propose a novel hierarchical generative model for class inference using a small development set.
no code implementations • 1 Nov 2018 • Xu Chu, Yang Lin, Jingyue Gao, Jiangtao Wang, Yasha Wang, Leye Wang
However, the shallow models leveraging bilinear forms suffer from limitations on capturing complicated nonlinear interactions between drug pairs.
no code implementations • 14 Sep 2017 • Yuanduo He, Xu Chu, Juguang Peng, Jingyue Gao, Yasha Wang
Time series prediction is of great significance in many applications and has attracted extensive attention from the data mining community.