Search Results for author: Guo Chen

Found 25 papers, 16 papers with code

一种非结构化数据表征增强的术后风险预测模型(An Unstructured Data Representation Enhanced Model for Postoperative Risk Prediction)

no code implementations CCL 2022 Yaqiang Wang, Xiao Yang, Xuechao Hao, Hongping Shu, Guo Chen, Tao Zhu

"Accurate postoperative risk prediction has a positive effect on clinical resource planning and contingency preparation, and on reducing patients' postoperative risk and mortality. Postoperative risk prediction currently relies mainly on structured preoperative and intraoperative data such as basic patient information, laboratory tests, and vital signs, while the value of unstructured preoperative diagnoses, which carry rich semantic information, remains to be verified. To address this problem, this paper proposes an unstructured data representation enhanced model for postoperative risk prediction that uses a self-attention mechanism to perform weighted information fusion of structured data and preoperative diagnosis text. On clinical data, the proposed method is compared with statistical machine learning models commonly used for postoperative risk prediction as well as recent deep neural networks; it not only improves prediction performance but also provides good interpretability for the prediction model."
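The abstract describes fusing structured perioperative features with preoperative diagnosis text through self-attention, with the attention weights doubling as an interpretability signal. Since no code is released, the PyTorch sketch below only illustrates that general fusion pattern under assumed dimensions; the module and parameter names (FusionRiskModel, struct_proj, text_proj) are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FusionRiskModel(nn.Module):
    """Illustrative attention-based fusion of structured features with a text embedding."""
    def __init__(self, n_struct, d_text, d_model=128):
        super().__init__()
        self.struct_proj = nn.Linear(n_struct, d_model)   # project labs / vital signs / demographics
        self.text_proj = nn.Linear(d_text, d_model)       # project preoperative-diagnosis text embedding
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(d_model, 1)           # postoperative risk score

    def forward(self, x_struct, x_text):
        # Treat the two modalities as a length-2 token sequence and let
        # self-attention learn how much weight each modality receives.
        tokens = torch.stack([self.struct_proj(x_struct),
                              self.text_proj(x_text)], dim=1)
        fused, weights = self.attn(tokens, tokens, tokens)
        risk = torch.sigmoid(self.classifier(fused.mean(dim=1)))
        return risk, weights  # attention weights give a coarse interpretability signal
```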

SPMamba: State-space model is all you need in speech separation

1 code implementation 2 Apr 2024 Kai Li, Guo Chen

Notably, within computer vision, Mamba-based methods have been celebrated for their formidable performance and reduced computational requirements.

Speech Separation
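As background for why a state-space model is attractive on long speech sequences, the toy NumPy sketch below runs a discretized linear state-space recurrence in time linear in the sequence length. It is a generic SSM layer for illustration only, not the SPMamba architecture or its selective-scan kernel.

```python
import numpy as np

def ssm_scan(u, A, B, C, dt=1.0):
    """Toy discretized state-space recurrence x_{t+1} = Ad x_t + Bd u_t, y_t = C x_t.

    u: (T, d_in) input sequence; A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state).
    Runs in O(T), the property that makes SSMs attractive for long audio sequences.
    """
    d_state = A.shape[0]
    # Simple Euler-style discretization (illustrative, not Mamba's exact scheme).
    Ad = np.eye(d_state) + dt * A
    Bd = dt * B
    x = np.zeros(d_state)
    ys = []
    for u_t in u:
        x = Ad @ x + Bd @ u_t
        ys.append(C @ x)
    return np.stack(ys)

# Example: 100 frames of 8-dim features, 16 hidden states, 4 output channels.
rng = np.random.default_rng(0)
y = ssm_scan(rng.standard_normal((100, 8)),
             A=-0.1 * np.eye(16),
             B=0.1 * rng.standard_normal((16, 8)),
             C=0.1 * rng.standard_normal((4, 16)))
```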

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

1 code implementation 24 Mar 2024 Yifei HUANG, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, LiMin Wang, Yu Qiao

Along with the videos we record high-quality gaze data and provide detailed multimodal annotations, formulating a playground for modeling the human ability to bridge asynchronous procedural actions from different viewpoints.

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

2 code implementations 22 Mar 2024 Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei HUANG, Yu Qiao, Yali Wang, LiMin Wang

We introduce InternVideo2, a new video foundation model (ViFM) that achieves the state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.

Ranked #1 on Audio Classification on ESC-50 (using extra training data)

Action Classification Action Recognition +12

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

1 code implementation 14 Mar 2024 Guo Chen, Yifei HUANG, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, LiMin Wang

We categorize Mamba into four roles for modeling videos, deriving a Video Mamba Suite composed of 14 models/modules, and evaluating them on 12 video understanding tasks.

Moment Retrieval Temporal Action Localization +1

Retrieval-Augmented Egocentric Video Captioning

no code implementations 1 Jan 2024 Jilan Xu, Yifei HUANG, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie

In this paper, (1) we develop EgoInstructor, a retrieval-augmented multimodal captioning model that automatically retrieves semantically relevant third-person instructional videos to enhance the video captioning of egocentric videos.

Representation Learning Retrieval +1
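The retrieval step described in this snippet, finding exocentric instructional videos that are semantically relevant to an egocentric clip, can be pictured as nearest-neighbour search in a shared embedding space. A minimal sketch assuming pre-computed, L2-normalised clip embeddings; the function and variable names are hypothetical and this is not EgoInstructor's actual retrieval pipeline.

```python
import numpy as np

def retrieve_exo_videos(ego_embedding, exo_embeddings, k=3):
    """Return indices of the k exocentric videos closest to the egocentric query.

    ego_embedding: (d,) query vector; exo_embeddings: (N, d) gallery.
    Both are assumed L2-normalised, so the dot product equals cosine similarity.
    """
    sims = exo_embeddings @ ego_embedding   # (N,) cosine similarities
    return np.argsort(-sims)[:k]            # top-k most similar videos

# The retrieved exocentric clips would then be passed to the captioner as extra context.
```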

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

2 code implementations 21 Dec 2023 Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.

Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT-full (using extra training data)

Image Retrieval Image-to-Text Retrieval +10

Decoupling SQL Query Hardness Parsing for Text-to-SQL

no code implementations 11 Dec 2023 Jiawen Yi, Guo Chen

This framework decouples the Text-to-SQL task based on query hardness by analyzing questions and schemas, simplifying the multi-hardness task into a single-hardness challenge.

Language Modelling Text-To-SQL
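The decoupling idea, first estimating a question's SQL hardness from the question and schema and then delegating to a component specialised for that hardness level, can be sketched as a simple router. The hardness labels follow the common Spider convention; the classifier and per-level parsers below are hypothetical placeholders rather than the paper's components.

```python
from typing import Callable, Dict

# Hardness levels commonly used for Spider-style Text-to-SQL evaluation.
HARDNESS_LEVELS = ("easy", "medium", "hard", "extra")

def route_text_to_sql(question: str,
                      schema: str,
                      predict_hardness: Callable[[str, str], str],
                      parsers: Dict[str, Callable[[str, str], str]]) -> str:
    """Decouple a multi-hardness Text-to-SQL task into a single-hardness one:
    estimate hardness (expected to be one of HARDNESS_LEVELS) from the question
    and schema, then delegate to the parser specialised for that level."""
    level = predict_hardness(question, schema)
    if level not in parsers:
        level = "medium"            # fall back to a default parser
    return parsers[level](question, schema)
```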

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

1 code implementation 28 Nov 2023 Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, LiMin Wang, Yu Qiao

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.

Fairness Multiple-choice +8

Memory-and-Anticipation Transformer for Online Action Understanding

1 code implementation ICCV 2023 Jiahao Wang, Guo Chen, Yifei HUANG, LiMin Wang, Tong Lu

Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.

Action Understanding Online Action Detection

AVSegFormer: Audio-Visual Segmentation with Transformer

1 code implementation 3 Jul 2023 Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu

In this paper, we propose AVSegFormer, a novel framework for AVS tasks that leverages the transformer architecture.

Scene Understanding Segmentation

VideoLLM: Modeling Video Sequence with Large Language Models

1 code implementation 22 May 2023 Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei HUANG, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, LiMin Wang

Building upon this insight, we propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs from natural language processing (NLP) for video sequence understanding.

Video Understanding
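The snippet describes treating a video as a token sequence that a pre-trained LLM can reason over. A common generic pattern for this is a small adapter that projects per-frame visual features into the LLM's embedding space before concatenating them with the prompt tokens; the sketch below shows only that pattern, with assumed feature sizes, and is not VideoLLM's released implementation.

```python
import torch
import torch.nn as nn

class VideoTokenAdapter(nn.Module):
    """Map per-frame visual features into an LLM's token-embedding space so a
    frozen language model can treat the video as an ordinary token sequence."""
    def __init__(self, d_visual=768, d_llm=4096):
        super().__init__()
        self.proj = nn.Linear(d_visual, d_llm)

    def forward(self, frame_feats, text_embeds):
        # frame_feats: (B, T, d_visual) from a video encoder
        # text_embeds: (B, L, d_llm) token embeddings of the task prompt
        video_tokens = self.proj(frame_feats)                  # (B, T, d_llm)
        return torch.cat([video_tokens, text_embeds], dim=1)   # prepend video "tokens"
```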

MRSN: Multi-Relation Support Network for Video Action Detection

no code implementations 24 Apr 2023 Yin-Dong Zheng, Guo Chen, Minglei Yuan, Tong Lu

Action detection is a challenging video understanding task, requiring modeling spatio-temporal and interaction relations.

Action Detection Relation +1

Champion Solution for the WSDM2023 Toloka VQA Challenge

1 code implementation 22 Jan 2023 Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu

In this report, we present our champion solution to the WSDM2023 Toloka Visual Question Answering (VQA) Challenge.

Question Answering Visual Grounding +1

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

1 code implementation 6 Dec 2022 Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.

Ranked #1 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification Contrastive Learning +8
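InternVideo combines a masked video modelling objective with a video-language contrastive objective and coordinates them in a learnable manner. One standard way to weight two pretraining losses learnably is the uncertainty-style weighting of Kendall et al.; the sketch below shows that generic pattern with placeholder losses and is not InternVideo's actual coordination module.

```python
import torch
import torch.nn as nn

class DualObjectivePretrainer(nn.Module):
    """Generic combination of a masked-modelling loss and a contrastive loss
    with learnable per-task log-variance weights."""
    def __init__(self):
        super().__init__()
        self.log_var_mask = nn.Parameter(torch.zeros(()))       # weight for masked video modelling
        self.log_var_contrast = nn.Parameter(torch.zeros(()))   # weight for contrastive learning

    def forward(self, loss_mask, loss_contrast):
        # Each loss is scaled by a learned precision and regularised by its log-variance,
        # letting training decide how strongly each objective shapes the representation.
        return (torch.exp(-self.log_var_mask) * loss_mask + self.log_var_mask
                + torch.exp(-self.log_var_contrast) * loss_contrast + self.log_var_contrast)
```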

Exploring adaptation of VideoMAE for Audio-Visual Diarization & Social @ Ego4d Looking at me Challenge

no code implementations 17 Nov 2022 Yinan He, Guo Chen

In this report, we present the transfer of pretrained video masked autoencoders (VideoMAE) to egocentric tasks for the Ego4d Looking at me Challenge.

BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

2 code implementations 5 May 2022 Min Yang, Guo Chen, Yin-Dong Zheng, Tong Lu, LiMin Wang

Empirical results demonstrate that our PlusTAD is very efficient and significantly outperforms the previous methods on the datasets of THUMOS14 and FineAction.

Action Detection object-detection +3

Optically-generated focused ultrasound for noninvasive brain stimulation with ultrahigh precision

no code implementations 19 Apr 2022 Yueming Li, Ying Jiang, Lu Lan, Xiaowei Ge, Ran Cheng, Yuewei Zhan, Guo Chen, Linli Shi, Runyu Wang, Nan Zheng, Chen Yang, Ji-Xin Cheng

Here, we report optically-generated focused ultrasound (OFUS) for non-invasive brain stimulation with ultrahigh precision.

DCAN: Improving Temporal Action Detection via Dual Context Aggregation

1 code implementation 7 Dec 2021 Guo Chen, Yin-Dong Zheng, LiMin Wang, Tong Lu

Specifically, we design the Multi-Path Temporal Context Aggregation (MTCA) to achieve smooth context aggregation on boundary level and precise evaluation of boundaries.

Action Detection Temporal Action Localization

Non-genetic acoustic stimulation of single neurons by a tapered fiber optoacoustic emitter

no code implementations 17 Dec 2020 Linli Shi, Ying Jiang, Fernando R. Fernandez, Lu Lan, Guo Chen, Heng-ye Man, John A. White, Ji-Xin Cheng, Chen Yang

As an emerging technology, transcranial focused ultrasound has been demonstrated to successfully evoke motor responses in mice, rabbits, and sensory/motor responses in humans.
