Search Results for author: Ran Xu

Found 79 papers, 41 papers with code

DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents

1 code implementation COLING 2022 Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, Ran Xu, Caiming Xiong

Unlike previous methods that only address a fixed set of field items, our method predicts target value for an arbitrary query based on the understanding of the layout and semantics of a form.

document understanding Form +3

MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale

no code implementations 4 Jun 2025 Ran Xu, Yuchen Zhuang, Yishan Zhong, Yue Yu, Xiangru Tang, Hang Wu, May D. Wang, Peifeng Ruan, Donghan Yang, Tao Wang, Guanghua Xiao, Carl Yang, Yang Xie, Wenqi Shi

We introduce MedAgentGym, the first publicly available training environment designed to enhance coding-based medical reasoning capabilities in large language model (LLM) agents.

Benchmarking Language Modeling +3

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

1 code implementation 14 May 2025 Jiuhai Chen, Zhiyang Xu, Xichen Pan, Yushi Hu, Can Qin, Tom Goldstein, Lifu Huang, Tianyi Zhou, Saining Xie, Silvio Savarese, Le Xue, Caiming Xiong, Ran Xu

Building on our innovative model design, training recipe, and datasets, we develop BLIP3-o, a suite of state-of-the-art unified multimodal models.

Image Generation

DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs

no code implementations 23 Apr 2025 Zhenhailong Wang, Senthil Purushwalkam, Caiming Xiong, Silvio Savarese, Heng Ji, Ran Xu

Extensive experiments on image and video understanding tasks demonstrate that DyMU can reduce the average visual token count by 32%-85% while achieving comparable performance to full-length models across diverse VLM architectures, including the recently popularized AnyRes-based visual encoders.

Token Reduction Video Understanding

Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration

1 code implementation 7 Apr 2025 Ran Xu, Wenqi Shi, Yuchen Zhuang, Yue Yu, Joyce C. Ho, Haoyu Wang, Carl Yang

Retrieval-Augmented Generation (RAG) systems often struggle to handle multi-hop question-answering tasks accurately due to irrelevant context retrieval and limited complex reasoning capabilities.

Language Modeling Language Modelling +6

Could AI Trace and Explain the Origins of AI-Generated Images and Text?

1 code implementation 5 Apr 2025 Hongchao Fang, Yixin Liu, Jiangshu Du, Can Qin, Ran Xu, Feng Liu, Lichao Sun, Dongwon Lee, Lifu Huang, Wenpeng Yin

AI-generated content is becoming increasingly prevalent in the real world, leading to serious ethical and societal concerns.

RoseRAG: Robust Retrieval-augmented Generation with Small-scale LLMs via Margin-aware Preference Optimization

no code implementations 16 Feb 2025 Tianci Liu, Haoxiang Jiang, Tianze Wang, Ran Xu, Yue Yu, Linjun Zhang, Tuo Zhao, Haoyu Wang

Large language models (LLMs) have achieved impressive performance but face high computational costs and latency, limiting their deployment in resource-constrained settings.

Open-Domain Question Answering RAG +3

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

no code implementations 12 Nov 2024 Anas Awadalla, Le Xue, Manli Shu, An Yan, Jun Wang, Senthil Purushwalkam, Sheng Shen, Hannah Lee, Oscar Lo, Jae Sung Park, Etash Guha, Silvio Savarese, Ludwig Schmidt, Yejin Choi, Caiming Xiong, Ran Xu

We introduce BLIP3-KALE, a dataset of 218 million image-text pairs that bridges the gap between descriptive synthetic captions and factual web-scale alt-text.

Descriptive Image Captioning

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

no code implementations 21 Oct 2024 Michael S. Ryoo, Honglu Zhou, Shrikant Kendre, Can Qin, Le Xue, Manli Shu, Silvio Savarese, Ran Xu, Caiming Xiong, Juan Carlos Niebles

We present xGen-MM-Vid (BLIP-3-Video): a multimodal language model for videos, particularly designed to efficiently capture temporal information over multiple frames.

Language Modeling Language Modelling +2

Trust but Verify: Programmatic VLM Evaluation in the Wild

no code implementations 17 Oct 2024 Viraj Prabhu, Senthil Purushwalkam, An Yan, Caiming Xiong, Ran Xu

Next, to evaluate free-form model responses to queries in PROVE, we propose a programmatic evaluation strategy that measures both the helpfulness and truthfulness of a response within a unified scene graph-based framework.

Benchmarking Language Modelling +1

xLAM: A Family of Large Action Models to Empower AI Agent Systems

1 code implementation 5 Sep 2024 JianGuo Zhang, Tian Lan, Ming Zhu, Zuxin Liu, Thai Hoang, Shirley Kokane, Weiran Yao, Juntao Tan, Akshara Prabhakar, Haolin Chen, Zhiwei Liu, Yihao Feng, Tulika Awalgaonkar, Rithesh Murthy, Eric Hu, Zeyuan Chen, Ran Xu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong

By releasing the xLAM series, we aim to advance the performance of open-source LLMs for autonomous AI agents, potentially accelerating progress and democratizing access to high-performance models for agent tasks.

AI Agent

Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation

no code implementations 22 Jul 2024 Jiaming Shen, Ran Xu, Yennie Jun, Zhen Qin, Tianqi Liu, Carl Yang, Yi Liang, Simon Baumgartner, Michael Bendersky

Unlike traditional methods, which generate two responses before obtaining the preference label, RMBoost first generates one response and selects a preference label, followed by generating the second more (or less) preferred response conditioned on the pre-selected preference label and the first response.

Synthetic Data Generation

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

1 code implementation 17 Jun 2024 Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Ludwig Schmidt

Multimodal interleaved datasets featuring free-form interleaved sequences of images and text are crucial for training frontier large multimodal models (LMMs).

TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data

1 code implementation 14 Jun 2024 Ziyang Zhang, Hejie Cui, Ran Xu, Yuzhang Xie, Joyce C. Ho, Carl Yang

In this work, we introduce TACCO, a novel framework that jointly discovers clusters of clinical concepts and patient visits based on a hypergraph modeling of EHR data.

Clustering Phenotype classification +1

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

no code implementations 12 Jun 2024 Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, JianGuo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese

The deployment of Large Language Models (LLMs) and Large Multimodal Models (LMMs) on mobile devices has gained significant attention due to the benefits of enhanced privacy, stability, and personalization.

Benchmarking Model Compression +1

From Basic to Extra Features: Hypergraph Transformer Pretrain-then-Finetuning for Balanced Clinical Predictions on EHR

no code implementations 9 Jun 2024 Ran Xu, Yiwen Lu, Chang Liu, Yong Chen, Yan Sun, Xiao Hu, Joyce C Ho, Carl Yang

Electronic Health Records (EHRs) contain rich patient information and are crucial for clinical research and practice.

Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

1 code implementation 26 May 2024 Rongyu Zhang, Aosong Cheng, Yulin Luo, Gaole Dai, Huanrui Yang, Jiaming Liu, Ran Xu, Li Du, Yuan Du, Yanbing Jiang, Shanghang Zhang

Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models.

feature selection Mixture-of-Experts +1

MedAdapter: Efficient Test-Time Adaptation of Large Language Models towards Medical Reasoning

1 code implementation 5 May 2024 Wenqi Shi, Ran Xu, Yuchen Zhuang, Yue Yu, Haotian Sun, Hang Wu, Carl Yang, May D. Wang

Faced with the challenges of balancing model performance, computational resources, and data privacy, MedAdapter provides an efficient, privacy-preserving, cost-effective, and transparent solution for adapting LLMs to the biomedical domain.

Privacy Preserving Test-time Adaptation

BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers

1 code implementation 29 Apr 2024 Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Yanqiao Zhu, May D. Wang, Joyce C. Ho, Chao Zhang, Carl Yang

Developing effective biomedical retrieval models is important for excelling at knowledge-intensive biomedical tasks but still challenging due to the deficiency of sufficient publicly annotated biomedical data and computational resources.

Retrieval Unsupervised Pre-training

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

2 code implementations 17 Mar 2024 Guohao Sun, Can Qin, Jiamian Wang, Zeyuan Chen, Ran Xu, Zhiqiang Tao

Recent advances in vision-language models have shown notable generalization in broad tasks through visual instruction tuning.

Language Modelling Question Answering +2

NaturalVLM: Leveraging Fine-grained Natural Language for Affordance-Guided Visual Manipulation

no code implementations 13 Mar 2024 Ran Xu, Yan Shen, Xiaoqi Li, Ruihai Wu, Hao Dong

To address these challenges, we introduce a comprehensive benchmark, NrVLM, comprising 15 distinct manipulation tasks, containing over 4500 episodes meticulously annotated with fine-grained language instructions.

Robot Manipulation

FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability

1 code implementation 28 Feb 2024 Congying Xia, Chen Xing, Jiangshu Du, Xinyi Yang, Yihao Feng, Ran Xu, Wenpeng Yin, Caiming Xiong

This paper presents FoFo, a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats, a crucial yet underexamined capability for their application as AI agents.

RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records

1 code implementation 25 Feb 2024 Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Bowen Jin, May D. Wang, Joyce C. Ho, Carl Yang

We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs).

Retrieval

Multimodal Fusion of EHR in Structures and Semantics: Integrating Clinical Records and Notes with Hypergraph and LLM

no code implementations 19 Feb 2024 Hejie Cui, Xinyu Fang, Ran Xu, Xuan Kan, Joyce C. Ho, Carl Yang

While there has been a lot of research on representation learning of structured EHR data, the fusion of different types of EHR data (multimodal fusion) is not well studied.

Decision Making Representation Learning

Text2Data: Low-Resource Data Generation with Textual Control

no code implementations 8 Feb 2024 Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang, Caiming Xiong, Silvio Savarese

Natural language serves as a common and straightforward signal for humans to interact seamlessly with machines.

Audio Synthesis Time Series

MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation

no code implementations 14 Jan 2024 Jiaqi Chen, Bingqian Lin, Ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

Embodied agents equipped with GPT as their brains have exhibited extraordinary decision-making and generalization abilities across various tasks.

Decision Making Vision and Language Navigation

EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records

1 code implementation 13 Jan 2024 Wenqi Shi, Ran Xu, Yuchen Zhuang, Yue Yu, Jieyu Zhang, Hang Wu, Yuanda Zhu, Joyce Ho, Carl Yang, May D. Wang

Large language models (LLMs) have demonstrated exceptional capabilities in planning and tool utilization as autonomous agents, but few have been developed for medical problem-solving.

Code Generation Few-Shot Learning +1

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation

no code implementations CVPR 2024 Jiaming Liu, Ran Xu, Senqiao Yang, Renrui Zhang, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang Zhang

To tackle these issues, we propose a continual self-supervised method, Adaptive Distribution Masked Autoencoders (ADMA), which enhances the extraction of target domain knowledge while mitigating the accumulation of distribution shifts.

Decoder Self-Supervised Learning +1

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

2 code implementations 30 Nov 2023 Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

To enable this framework, we devise a scalable pipeline that automatically generates high-quality, instruction-tuning datasets from readily available captioning data across different modalities, and contribute 24K QA data for audio and 250K QA data for 3D.

Visual Reasoning

Distribution-Aware Continual Test-Time Adaptation for Semantic Segmentation

1 code implementation 24 Sep 2023 Jiayi Ni, Senqiao Yang, Ran Xu, Jiaming Liu, Xiaoqi Li, Wenyu Jiao, Zehui Chen, Yi Liu, Shanghang Zhang

In this paper, we propose a distribution-aware tuning (DAT) method to make the semantic segmentation CTTA efficient and practical in real-world applications.

Autonomous Driving Semantic Segmentation +1

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

1 code implementation 4 Aug 2023 Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, JianGuo Zhang, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, seems promising and can be applied to optimize other models in the agent architecture to enhance agent performances over time.

Language Modelling

Weakly-Supervised Scientific Document Classification via Retrieval-Augmented Multi-Stage Training

1 code implementation 12 Jun 2023 Ran Xu, Yue Yu, Joyce C. Ho, Carl Yang

To address this challenge, we propose a weakly-supervised approach for scientific document classification using label names only.

Document Classification Retrieval

A Review on Knowledge Graphs for Healthcare: Resources, Applications, and Promises

no code implementations 7 Jun 2023 Hejie Cui, Jiaying Lu, Ran Xu, Shiyu Wang, Wenjing Ma, Yue Yu, Shaojun Yu, Xuan Kan, Chen Ling, Liang Zhao, Zhaohui S. Qin, Joyce C. Ho, Tianfan Fu, Jing Ma, Mengdi Huai, Fei Wang, Carl Yang

This comprehensive review aims to provide an overview of the current state of Healthcare Knowledge Graphs (HKGs), including their construction, utilization models, and applications across various healthcare and biomedical research domains.

Knowledge Graphs

R-Mixup: Riemannian Mixup for Biological Networks

no code implementations 5 Jun 2023 Xuan Kan, Zimu Li, Hejie Cui, Yue Yu, Ran Xu, Shaojun Yu, Zilong Zhang, Ying Guo, Carl Yang

Biological networks are commonly used in biomedical and healthcare domains to effectively model the structure of complex biological systems with interactions linking biological entities.

Data Augmentation

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

no code implementations CVPR 2023 Vibashan VS, Ning Yu, Chen Xing, Can Qin, Mingfei Gao, Juan Carlos Niebles, Vishal M. Patel, Ran Xu

In summary, an OV method learns task-specific information using strong supervision from base annotations and novel category information using weak supervision from image-captions pairs.

Image Captioning Instance Segmentation +3

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

1 code implementation ICCV 2023 Can Qin, Ning Yu, Chen Xing, Shu Zhang, Zeyuan Chen, Stefano Ermon, Yun Fu, Caiming Xiong, Ran Xu

Empirical results show that GlueNet can be trained efficiently and enables various capabilities beyond previous state-of-the-art models: 1) multilingual language models such as XLM-Roberta can be aligned with existing T2I models, allowing for the generation of high-quality images from captions beyond English; 2) GlueNet can align multi-modal encoders such as AudioCLIP with the Stable Diffusion model, enabling sound-to-image generation; 3) it can also upgrade the current text encoder of the latent diffusion model for challenging case generation.

Decoder Image Generation

Neighborhood-Regularized Self-Training for Learning with Few Labels

1 code implementation 10 Jan 2023 Ran Xu, Yue Yu, Hejie Cui, Xuan Kan, Yanqiao Zhu, Joyce Ho, Chao Zhang, Carl Yang

Our further analysis demonstrates that our proposed data selection strategy reduces the noise of pseudo labels by 36.8% and saves 57.3% of the time when compared with the best baseline.

Hierarchical Point Attention for Indoor 3D Object Detection

no code implementations 6 Jan 2023 Manli Shu, Le Xue, Ning Yu, Roberto Martín-Martín, Caiming Xiong, Tom Goldstein, Juan Carlos Niebles, Ran Xu

By plugging our proposed modules into the state-of-the-art transformer-based 3D detectors, we improve the previous best results on both benchmarks, with more significant improvements on smaller objects.

3D Object Detection Object +1

Tackling Data Heterogeneity in Federated Learning with Class Prototypes

1 code implementation 6 Dec 2022 Yutong Dai, Zeyuan Chen, Junnan Li, Shelby Heinecke, Lichao Sun, Ran Xu

We propose FedNH, a novel method that improves the local models' performance for both personalization and generalization by combining the uniformity and semantics of class prototypes.

Personalized Federated Learning

Learning Task-Aware Effective Brain Connectivity for fMRI Analysis with Graph Neural Networks

1 code implementation 1 Nov 2022 Yue Yu, Xuan Kan, Hejie Cui, Ran Xu, Yujia Zheng, Xiangchen Song, Yanqiao Zhu, Kun Zhang, Razieh Nabi, Ying Guo, Chao Zhang, Carl Yang

To better adapt GNNs for fMRI analysis, we propose TBDS, an end-to-end framework based on Task-aware Brain connectivity DAG (short for Directed Acyclic Graph) Structure generation for fMRI analysis.

Time Series Time Series Analysis

Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach

1 code implementation 15 Sep 2022 Yue Yu, Rongzhi Zhang, Ran Xu, Jieyu Zhang, Jiaming Shen, Chao Zhang

Large Language Models have demonstrated remarkable few-shot performance, but the performance can be sensitive to the selection of few-shot instances.

Diversity Language Modeling +2

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation

1 code implementation 3 Aug 2022 Jun Wang, Mingfei Gao, Yuqian Hu, Ramprasaath R. Selvaraju, Chetan Ramaiah, Ran Xu, Joseph F. JaJa, Larry S. Davis

To address this deficiency, we develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image.

Answer Generation Question-Answer-Generation +4

Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework

1 code implementation CVPR 2022 Shu Zhang, Ran Xu, Caiming Xiong, Chetan Ramaiah

Current contrastive learning frameworks focus on leveraging a single supervisory signal to learn representations, which limits the efficacy on unseen data and downstream tasks.

All Contrastive Learning +1

SmartAdapt: Multi-Branch Object Detection Framework for Videos on Mobiles

no code implementations CVPR 2022 Ran Xu, Fangzhou Mu, Jayoung Lee, Preeti Mukherjee, Somali Chaterji, Saurabh Bagchi, Yin Li

In this paper, we ask, and answer, the wide-ranging question across all MBODFs: How to expose the right set of execution branches and then how to schedule the optimal one at inference time?

object-detection Video Object Detection

Value Retrieval with Arbitrary Queries for Form-like Documents

1 code implementation 15 Dec 2021 Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, Ran Xu, Caiming Xiong

Unlike previous methods that only address a fixed set of field items, our method predicts target value for an arbitrary query based on the understanding of the layout and semantics of a form.

document understanding Form +3

Burn After Reading: Online Adaptation for Cross-domain Streaming Data

no code implementations 8 Dec 2021 Luyu Yang, Mingfei Gao, Zeyuan Chen, Ran Xu, Abhinav Shrivastava, Chetan Ramaiah

In the context of online privacy, many methods propose complex privacy and security preserving measures to protect sensitive data.

Diversity Unsupervised Domain Adaptation

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

1 code implementation 18 Nov 2021 Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong

To enlarge the set of base classes, we propose a method to automatically generate pseudo bounding-box annotations of diverse objects from large-scale image-caption pairs.

Object object-detection +2

Robustness Evaluation of Transformer-based Form Field Extractors via Form Attacks

1 code implementation 8 Oct 2021 Le Xue, Mingfei Gao, Zeyuan Chen, Caiming Xiong, Ran Xu

We propose a novel framework to evaluate the robustness of transformer-based form field extraction methods via form attacks.

Form Optical Character Recognition (OCR)

MetaHistoSeg: A Python Framework for Meta Learning in Histopathology Image Segmentation

no code implementations 29 Sep 2021 Zheng Yuan, Andre Esteva, Ran Xu

We also curate a histopathology meta dataset - a benchmark dataset for training and validating models on out-of-distribution performance across a range of cancer types.

Domain Generalization Few-Shot Learning +3

JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads

no code implementations 9 Dec 2020 Karthick Shankar, Pengcheng Wang, Ran Xu, Ashraf Mahgoub, Somali Chaterji

In addition, we also look at the pros and cons of some of the proprietary deep-learning object detection packages, such as Amazon Rekognition, Google Vision, and Azure Cognitive Services, to contrast with open-source and tunable solutions, such as Faster R-CNN (FRCNN).

Anomaly Detection Benchmarking +4

ApproxDet: Content and Contention-Aware Approximate Object Detection for Mobiles

1 code implementation 21 Oct 2020 Ran Xu, Chen-Lin Zhang, Pengcheng Wang, Jayoung Lee, Subrata Mitra, Somali Chaterji, Yin Li, Saurabh Bagchi

In this paper we introduce ApproxDet, an adaptive video object detection framework for mobile devices to meet accuracy-latency requirements in the face of changing content and resource contention scenarios.

Object object-detection +3

WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos

no code implementations CVPR 2021 Mingfei Gao, Yingbo Zhou, Ran Xu, Richard Socher, Caiming Xiong

Online action detection in untrimmed videos aims to identify an action as it happens, which makes it very important for real-time applications.

Action Recognition Online Action Detection

Context-aware Active Multi-Step Reinforcement Learning

no code implementations 11 Nov 2019 Gang Chen, Dingcheng Li, Ran Xu

Then, given the selected samples, we propose the adaptive multi-step TD, which generalizes TD($\lambda$) but adaptively switches on/off the backups from future returns of different steps.

Active Learning Decision Making +3

ApproxNet: Content and Contention-Aware Video Analytics System for Embedded Clients

no code implementations 28 Aug 2019 Ran Xu, Rakesh Kumar, Pengcheng Wang, Peter Bai, Ganga Meghanath, Somali Chaterji, Subrata Mitra, Saurabh Bagchi

None of the current approximation techniques for object classification DNNs can adapt to changing runtime conditions, e.g., changes in resource availability on the device, the content characteristics, or requirements from the user.

Object Detection

TempoCave: Visualizing Dynamic Connectome Datasets to Support Cognitive Behavioral Therapy

1 code implementation 18 Jun 2019 Ran Xu, Manu Mathew Thomas, Alex Leow, Olusola Ajilore, Angus G. Forbes

We introduce TempoCave, a novel visualization application for analyzing dynamic brain networks, or connectomes.

Human-Computer Interaction Neurons and Cognition

Human Action Segmentation With Hierarchical Supervoxel Consistency

no code implementations CVPR 2015 Jiasen Lu, Ran Xu, Jason J. Corso

Detailed analysis of human action, such as action classification, detection and localization has received increasing attention from the community; datasets like JHMDB have made it plausible to conduct studies analyzing the impact that such deeper information has on the greater action understanding problem.

Action Classification Action Segmentation +3

Sequential Labeling with online Deep Learning

no code implementations 10 Dec 2014 Gang Chen, Ran Xu, Sargur Srihari

Deep learning has attracted great attention recently and yielded the state of the art performance in dimension reduction and classification problems.

Deep Learning Dimensionality Reduction

Compositional Structure Learning for Action Understanding

no code implementations 21 Oct 2014 Ran Xu, Gang Chen, Caiming Xiong, Wei Chen, Jason J. Corso

The focus of the action understanding literature has predominantly been classification; however, there are many applications demanding richer action understanding, such as mobile robotics and video search, with solutions to classification, localization and detection.

Action Detection Action Understanding +1
