Search Results for author: ran Xu

Found 55 papers, 27 papers with code

DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents

1 code implementation COLING 2022 Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, ran Xu, Caiming Xiong

Unlike previous methods that only address a fixed set of field items, our method predicts target value for an arbitrary query based on the understanding of the layout and semantics of a form.

document understanding Language Modelling +1

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

1 code implementation17 Mar 2024 Guohao Sun, Can Qin, Jiamian Wang, Zeyuan Chen, ran Xu, Zhiqiang Tao

Recent advancements in the vision-language model have shown notable generalization in vision-language tasks after visual instruction tuning.

Language Modelling Question Answering +2

NaturalVLM: Leveraging Fine-grained Natural Language for Affordance-Guided Visual Manipulation

no code implementations13 Mar 2024 ran Xu, Yan Shen, Xiaoqi Li, Ruihai Wu, Hao Dong

To address these challenges, we introduce a comprehensive benchmark, NrVLM, comprising 15 distinct manipulation tasks, containing over 4500 episodes meticulously annotated with fine-grained language instructions.

Robot Manipulation

FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability

1 code implementation28 Feb 2024 Congying Xia, Chen Xing, Jiangshu Du, Xinyi Yang, Yihao Feng, ran Xu, Wenpeng Yin, Caiming Xiong

This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats, a crucial yet underexamined capability for their application as AI agents.

RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records

no code implementations25 Feb 2024 ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Bowen Jin, May D. Wang, Joyce C. Ho, Carl Yang

We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs).

Retrieval

Multimodal Fusion of EHR in Structures and Semantics: Integrating Clinical Records and Notes with Hypergraph and LLM

no code implementations19 Feb 2024 Hejie Cui, Xinyu Fang, ran Xu, Xuan Kan, Joyce C. Ho, Carl Yang

While there has been a lot of research on representation learning of structured EHR data, the fusion of different types of EHR data (multimodal fusion) is not well studied.

Decision Making Representation Learning

MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation

no code implementations14 Jan 2024 Jiaqi Chen, Bingqian Lin, ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-making and generalization abilities across various tasks.

Decision Making Vision and Language Navigation

EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records

1 code implementation13 Jan 2024 Wenqi Shi, ran Xu, Yuchen Zhuang, Yue Yu, Jieyu Zhang, Hang Wu, Yuanda Zhu, Joyce Ho, Carl Yang, May D. Wang

Large language models (LLMs) have demonstrated exceptional capabilities in planning and tool utilization as autonomous agents, but few have been developed for medical problem-solving.

Code Generation Few-Shot Learning +1

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

no code implementations19 Dec 2023 Jiaming Liu, ran Xu, Senqiao Yang, Renrui Zhang, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang Zhang

To tackle these issues, we propose a continual self-supervised method, Adaptive Distribution Masked Autoencoders (ADMA), which enhances the extraction of target domain knowledge while mitigating the accumulation of distribution shifts.

Self-Supervised Learning Test-time Adaptation

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

1 code implementation30 Nov 2023 Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs).

Visual Reasoning

Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models

1 code implementation1 Nov 2023 ran Xu, Hejie Cui, Yue Yu, Xuan Kan, Wenqi Shi, Yuchen Zhuang, Wei Jin, Joyce Ho, Carl Yang

Clinical natural language processing requires methods that can address domain-specific challenges, such as complex medical terminology and clinical contexts.

Clinical Knowledge Knowledge Graphs +1

Distribution-Aware Continual Test-Time Adaptation for Semantic Segmentation

no code implementations24 Sep 2023 Jiayi Ni, Senqiao Yang, ran Xu, Jiaming Liu, Xiaoqi Li, Wenyu Jiao, Zehui Chen, Yi Liu, Shanghang Zhang

In this paper, we propose a distribution-aware tuning (DAT) method to make the semantic segmentation CTTA efficient and practical in real-world applications.

Autonomous Driving Semantic Segmentation +1

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

no code implementations4 Aug 2023 Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, JianGuo Zhang, Devansh Arpit, ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, seems promising and can be applied to optimize other models in the agent architecture to enhance agent performances over time.

Language Modelling

Weakly-Supervised Scientific Document Classification via Retrieval-Augmented Multi-Stage Training

1 code implementation12 Jun 2023 ran Xu, Yue Yu, Joyce C. Ho, Carl Yang

To address this challenge, we propose a weakly-supervised approach for scientific document classification using label names only.

Document Classification Retrieval

R-Mixup: Riemannian Mixup for Biological Networks

no code implementations5 Jun 2023 Xuan Kan, Zimu Li, Hejie Cui, Yue Yu, ran Xu, Shaojun Yu, Zilong Zhang, Ying Guo, Carl Yang

Biological networks are commonly used in biomedical and healthcare domains to effectively model the structure of complex biological systems with interactions linking biological entities.

Data Augmentation

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

1 code implementation14 May 2023 Le Xue, Ning Yu, Shu Zhang, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, ran Xu, Juan Carlos Niebles, Silvio Savarese

Recent advancements in multimodal pre-training methods have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions.

Ranked #4 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)

3D Point Cloud Classification Representation Learning +1

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

no code implementations CVPR 2023 Vibashan VS, Ning Yu, Chen Xing, Can Qin, Mingfei Gao, Juan Carlos Niebles, Vishal M. Patel, ran Xu

In summary, an OV method learns task-specific information using strong supervision from base annotations and novel category information using weak supervision from image-captions pairs.

Image Captioning Instance Segmentation +2

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

1 code implementation ICCV 2023 Can Qin, Ning Yu, Chen Xing, Shu Zhang, Zeyuan Chen, Stefano Ermon, Yun Fu, Caiming Xiong, ran Xu

Empirical results show that GlueNet can be trained efficiently and enables various capabilities beyond previous state-of-the-art models: 1) multilingual language models such as XLM-Roberta can be aligned with existing T2I models, allowing for the generation of high-quality images from captions beyond English; 2) GlueNet can align multi-modal encoders such as AudioCLIP with the Stable Diffusion model, enabling sound-to-image generation; 3) it can also upgrade the current text encoder of the latent diffusion model for challenging case generation.

Image Generation

HIVE: Harnessing Human Feedback for Instructional Visual Editing

1 code implementation16 Mar 2023 Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, ran Xu

Incorporating human feedback has been shown to be crucial to align text generated by large language models to human preferences.

Text-based Image Editing

Neighborhood-Regularized Self-Training for Learning with Few Labels

1 code implementation10 Jan 2023 ran Xu, Yue Yu, Hejie Cui, Xuan Kan, Yanqiao Zhu, Joyce Ho, Chao Zhang, Carl Yang

Our further analysis demonstrates that our proposed data selection strategy reduces the noise of pseudo labels by 36. 8% and saves 57. 3% of the time when compared with the best baseline.

Model-Agnostic Hierarchical Attention for 3D Object Detection

no code implementations6 Jan 2023 Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Juan Carlos Niebles, Caiming Xiong, ran Xu

By plugging our proposed modules into the state-of-the-art transformer-based 3D detector, we improve the previous best results on both benchmarks, with the largest improvement margin on small objects.

3D Object Detection Object +1

Tackling Data Heterogeneity in Federated Learning with Class Prototypes

1 code implementation6 Dec 2022 Yutong Dai, Zeyuan Chen, Junnan Li, Shelby Heinecke, Lichao Sun, ran Xu

We propose FedNH, a novel method that improves the local models' performance for both personalization and generalization by combining the uniformity and semantics of class prototypes.

Personalized Federated Learning

Learning Task-Aware Effective Brain Connectivity for fMRI Analysis with Graph Neural Networks

1 code implementation1 Nov 2022 Yue Yu, Xuan Kan, Hejie Cui, ran Xu, Yujia Zheng, Xiangchen Song, Yanqiao Zhu, Kun Zhang, Razieh Nabi, Ying Guo, Chao Zhang, Carl Yang

To better adapt GNNs for fMRI analysis, we propose TBDS, an end-to-end framework based on \underline{T}ask-aware \underline{B}rain connectivity \underline{D}AG (short for Directed Acyclic Graph) \underline{S}tructure generation for fMRI analysis.

Time Series Time Series Analysis

Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach

1 code implementation15 Sep 2022 Yue Yu, Rongzhi Zhang, ran Xu, Jieyu Zhang, Jiaming Shen, Chao Zhang

Large Language Models have demonstrated remarkable few-shot performance, but the performance can be sensitive to the selection of few-shot instances.

Language Modelling Text Classification

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation

1 code implementation3 Aug 2022 Jun Wang, Mingfei Gao, Yuqian Hu, Ramprasaath R. Selvaraju, Chetan Ramaiah, ran Xu, Joseph F. JaJa, Larry S. Davis

To address this deficiency, we develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image.

Answer Generation Question-Answer-Generation +3

Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework

1 code implementation CVPR 2022 Shu Zhang, ran Xu, Caiming Xiong, Chetan Ramaiah

Current contrastive learning frameworks focus on leveraging a single supervisory signal to learn representations, which limits the efficacy on unseen data and downstream tasks.

Contrastive Learning Representation Learning

SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles

no code implementations CVPR 2022 ran Xu, Fangzhou Mu, Jayoung Lee, Preeti Mukherjee, Somali Chaterji, Saurabh Bagchi, Yin Li

In this paper, we ask, and answer, the wide-ranging question across all MBODFs: How to expose the right set of execution branches and then how to schedule the optimal one at inference time?

object-detection Video Object Detection

Value Retrieval with Arbitrary Queries for Form-like Documents

1 code implementation15 Dec 2021 Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, ran Xu, Caiming Xiong

Unlike previous methods that only address a fixed set of field items, our method predicts target value for an arbitrary query based on the understanding of the layout and semantics of a form.

document understanding Language Modelling +1

Burn After Reading: Online Adaptation for Cross-domain Streaming Data

no code implementations8 Dec 2021 Luyu Yang, Mingfei Gao, Zeyuan Chen, ran Xu, Abhinav Shrivastava, Chetan Ramaiah

In the context of online privacy, many methods propose complex privacy and security preserving measures to protect sensitive data.

Unsupervised Domain Adaptation

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

1 code implementation18 Nov 2021 Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, ran Xu, Wenhao Liu, Caiming Xiong

To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs.

Object object-detection +1

Robustness Evaluation of Transformer-based Form Field Extractors via Form Attacks

1 code implementation8 Oct 2021 Le Xue, Mingfei Gao, Zeyuan Chen, Caiming Xiong, ran Xu

We propose a novel framework to evaluate the robustness of transformer-based form field extraction methods via form attacks.

Optical Character Recognition (OCR)

MetaHistoSeg: A Python Framework for Meta Learning in Histopathology Image Segmentation

no code implementations29 Sep 2021 Zheng Yuan, Andre Esteva, ran Xu

We also curate a histopathology meta dataset - a benchmark dataset for training and validating models on out-of-distribution performance across a range of cancer types.

Domain Generalization Few-Shot Learning +3

JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads

no code implementations9 Dec 2020 Karthick Shankar, Pengcheng Wang, ran Xu, Ashraf Mahgoub, Somali Chaterji

In addition, we also look at the pros and cons of some of the proprietary deep-learning object detection packages, such as Amazon Rekognition, Google Vision, and Azure Cognitive Services, to contrast with open-source and tunable solutions, such as Faster R-CNN (FRCNN).

Anomaly Detection Benchmarking +4

ApproxDet: Content and Contention-Aware Approximate Object Detection for Mobiles

1 code implementation21 Oct 2020 ran Xu, Chen-Lin Zhang, Pengcheng Wang, Jayoung Lee, Subrata Mitra, Somali Chaterji, Yin Li, Saurabh Bagchi

In this paper we introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements in the face of changing content and resource contention scenarios.

Object object-detection +3

WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos

no code implementations CVPR 2021 Mingfei Gao, Yingbo Zhou, ran Xu, Richard Socher, Caiming Xiong

Online action detection in untrimmed videos aims to identify an action as it happens, which makes it very important for real-time applications.

Action Recognition Online Action Detection

Context-aware Active Multi-Step Reinforcement Learning

no code implementations11 Nov 2019 Gang Chen, Dingcheng Li, ran Xu

Then given the selected samples, we propose the adaptive multi-step TD, which generalizes TD($\lambda$), but adaptively switch on/off the backups from future returns of different steps.

Active Learning Decision Making +2

ApproxNet: Content and Contention-Aware Video Analytics System for Embedded Clients

no code implementations28 Aug 2019 Ran Xu, Rakesh Kumar, Pengcheng Wang, Peter Bai, Ganga Meghanath, Somali Chaterji, Subrata Mitra, Saurabh Bagchi

None of the current approximation techniques for object classification DNNs can adapt to changing runtime conditions, e. g., changes in resource availability on the device, the content characteristics, or requirements from the user.

Object Detection

TempoCave: Visualizing Dynamic Connectome Datasets to Support Cognitive Behavioral Therapy

1 code implementation18 Jun 2019 Ran Xu, Manu Mathew Thomas, Alex Leow, Olusola Ajilore, Angus G. Forbes

We introduce TempoCave, a novel visualization application for analyzing dynamic brain networks, or connectomes.

Human-Computer Interaction Neurons and Cognition

Human Action Segmentation With Hierarchical Supervoxel Consistency

no code implementations CVPR 2015 Jiasen Lu, ran Xu, Jason J. Corso

Detailed analysis of human action, such as action classification, detection and localization has received increasing attention from the community; datasets like JHMDB have made it plausible to conduct studies analyzing the impact that such deeper information has on the greater action understanding problem.

Action Classification Action Segmentation +3

Sequential Labeling with online Deep Learning

no code implementations10 Dec 2014 Gang Chen, ran Xu, Sargur Srihari

Deep learning has attracted great attention recently and yielded the state of the art performance in dimension reduction and classification problems.

Dimensionality Reduction

Compositional Structure Learning for Action Understanding

no code implementations21 Oct 2014 Ran Xu, Gang Chen, Caiming Xiong, Wei Chen, Jason J. Corso

The focus of the action understanding literature has predominately been classification, how- ever, there are many applications demanding richer action understanding such as mobile robotics and video search, with solutions to classification, localization and detection.

Action Detection Action Understanding +1

Cannot find the paper you are looking for? You can Submit a new open access paper.