Search Results for author: Guo Chen

Found 25 papers, 16 papers with code

一种非结构化数据表征增强的术后风险预测模型(An Unstructured Data Representation Enhanced Model for Postoperative Risk Prediction)

no code implementations CCL 2022 Yaqiang Wang, Xiao Yang, Xuechao Hao, Hongping Shu, Guo Chen, Tao Zhu

"Accurate postoperative risk prediction has a positive effect on clinical resource planning and contingency preparation, and on reducing patients' postoperative risk and mortality. Postoperative risk prediction currently relies mainly on structured preoperative and intraoperative data such as basic patient information, laboratory tests, and vital signs, while the value of unstructured preoperative diagnoses, which carry rich semantic information, remains to be verified. To address this problem, this paper proposes an unstructured data representation enhanced model for postoperative risk prediction that uses a self-attention mechanism to perform weighted information fusion of structured data and preoperative diagnosis text. On clinical data, the proposed method is compared with statistical machine learning models commonly used for postoperative risk prediction as well as recent deep neural networks; it not only improves prediction performance but also provides good interpretability for the prediction model."
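The abstract describes fusing structured perioperative features with preoperative diagnosis text through self-attention, with the attention weights doubling as an interpretability signal. Since no code is released, the PyTorch sketch below only illustrates that general fusion pattern under assumed dimensions; the module and parameter names (FusionRiskModel, struct_proj, text_proj) are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FusionRiskModel(nn.Module):
    """Illustrative attention-based fusion of structured features with a text embedding."""
    def __init__(self, n_struct, d_text, d_model=128):
        super().__init__()
        self.struct_proj = nn.Linear(n_struct, d_model)   # project labs / vital signs / demographics
        self.text_proj = nn.Linear(d_text, d_model)       # project preoperative-diagnosis text embedding
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(d_model, 1)           # postoperative risk score

    def forward(self, x_struct, x_text):
        # Treat the two modalities as a length-2 token sequence and let
        # self-attention learn how much weight each modality receives.
        tokens = torch.stack([self.struct_proj(x_struct),
                              self.text_proj(x_text)], dim=1)
        fused, weights = self.attn(tokens, tokens, tokens)
        risk = torch.sigmoid(self.classifier(fused.mean(dim=1)))
        return risk, weights  # attention weights give a coarse interpretability signal
```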

SPMamba: State-space model is all you need in speech separation

1 code implementation 2 Apr 2024 Kai Li, Guo Chen

Notably, within computer vision, Mamba-based methods have been celebrated for their formidable performance and reduced computational requirements.

Speech Separation
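As background for why a state-space model is attractive on long speech sequences, the toy NumPy sketch below runs a discretized linear state-space recurrence in time linear in the sequence length. It is a generic SSM layer for illustration only, not the SPMamba architecture or its selective-scan kernel.

```python
import numpy as np

def ssm_scan(u, A, B, C, dt=1.0):
    """Toy discretized state-space recurrence x_{t+1} = Ad x_t + Bd u_t, y_t = C x_t.

    u: (T, d_in) input sequence; A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state).
    Runs in O(T), the property that makes SSMs attractive for long audio sequences.
    """
    d_state = A.shape[0]
    # Simple Euler-style discretization (illustrative, not Mamba's exact scheme).
    Ad = np.eye(d_state) + dt * A
    Bd = dt * B
    x = np.zeros(d_state)
    ys = []
    for u_t in u:
        x = Ad @ x + Bd @ u_t
        ys.append(C @ x)
    return np.stack(ys)

# Example: 100 frames of 8-dim features, 16 hidden states, 4 output channels.
rng = np.random.default_rng(0)
y = ssm_scan(rng.standard_normal((100, 8)),
             A=-0.1 * np.eye(16),
             B=0.1 * rng.standard_normal((16, 8)),
             C=0.1 * rng.standard_normal((4, 16)))
```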

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World

1 code implementation 24 Mar 2024 Yifei HUANG, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, LiMin Wang, Yu Qiao

Along with the videos we record high-quality gaze data and provide detailed multimodal annotations, formulating a playground for modeling the human ability to bridge asynchronous procedural actions from different viewpoints.

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

2 code implementations 22 Mar 2024 Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Guo Chen, Baoqi Pei, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei HUANG, Yu Qiao, Yali Wang, LiMin Wang

We introduce InternVideo2, a new video foundation model (ViFM) that achieves the state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.

Ranked #1 on Audio Classification on ESC-50 (using extra training data)

Action Classification Action Recognition +12

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

1 code implementation 14 Mar 2024 Guo Chen, Yifei HUANG, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, LiMin Wang

We categorize Mamba into four roles for modeling videos, deriving a Video Mamba Suite composed of 14 models/modules, and evaluating them on 12 video understanding tasks.

Moment Retrieval Temporal Action Localization +1

Retrieval-Augmented Egocentric Video Captioning

no code implementations 1 Jan 2024 Jilan Xu, Yifei HUANG, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie

In this paper, (1) we develop EgoInstructor, a retrieval-augmented multimodal captioning model that automatically retrieves semantically relevant third-person instructional videos to enhance the video captioning of egocentric videos.

Representation Learning Retrieval +1
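The retrieval step described in this snippet, finding exocentric instructional videos that are semantically relevant to an egocentric clip, can be pictured as nearest-neighbour search in a shared embedding space. A minimal sketch assuming pre-computed, L2-normalised clip embeddings; the function and variable names are hypothetical and this is not EgoInstructor's actual retrieval pipeline.

```python
import numpy as np

def retrieve_exo_videos(ego_embedding, exo_embeddings, k=3):
    """Return indices of the k exocentric videos closest to the egocentric query.

    ego_embedding: (d,) query vector; exo_embeddings: (N, d) gallery.
    Both are assumed L2-normalised, so the dot product equals cosine similarity.
    """
    sims = exo_embeddings @ ego_embedding   # (N,) cosine similarities
    return np.argsort(-sims)[:k]            # top-k most similar videos

# The retrieved exocentric clips would then be passed to the captioner as extra context.
```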

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

2 code implementations 21 Dec 2023 Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.

Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT-full (using extra training data)

Image Retrieval Image-to-Text Retrieval +10

Decoupling SQL Query Hardness Parsing for Text-to-SQL

no code implementations 11 Dec 2023 Jiawen Yi, Guo Chen

This framework decouples the Text-to-SQL task based on query hardness by analyzing questions and schemas, simplifying the multi-hardness task into a single-hardness challenge.

Language Modelling Text-To-SQL
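The decoupling idea, first estimating a question's SQL hardness from the question and schema and then delegating to a component specialised for that hardness level, can be sketched as a simple router. The hardness labels follow the common Spider convention; the classifier and per-level parsers below are hypothetical placeholders rather than the paper's components.

```python
from typing import Callable, Dict

# Hardness levels commonly used for Spider-style Text-to-SQL evaluation.
HARDNESS_LEVELS = ("easy", "medium", "hard", "extra")

def route_text_to_sql(question: str,
                      schema: str,
                      predict_hardness: Callable[[str, str], str],
                      parsers: Dict[str, Callable[[str, str], str]]) -> str:
    """Decouple a multi-hardness Text-to-SQL task into a single-hardness one:
    estimate hardness (expected to be one of HARDNESS_LEVELS) from the question
    and schema, then delegate to the parser specialised for that level."""
    level = predict_hardness(question, schema)
    if level not in parsers:
        level = "medium"            # fall back to a default parser
    return parsers[level](question, schema)
```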

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

1 code implementation 28 Nov 2023 Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, LiMin Wang, Yu Qiao

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.

Fairness Multiple-choice +8

Memory-and-Anticipation Transformer for Online Action Understanding

1 code implementation ICCV 2023 Jiahao Wang, Guo Chen, Yifei HUANG, LiMin Wang, Tong Lu

Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.

Action Understanding Online Action Detection

AVSegFormer: Audio-Visual Segmentation with Transformer

1 code implementation 3 Jul 2023 Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu

In this paper, we propose AVSegFormer, a novel framework for AVS tasks that leverages the transformer architecture.

Scene Understanding Segmentation

VideoLLM: Modeling Video Sequence with Large Language Models

1 code implementation 22 May 2023 Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei HUANG, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, LiMin Wang

Building upon this insight, we propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs from natural language processing (NLP) for video sequence understanding.

Video Understanding
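The snippet describes treating a video as a token sequence that a pre-trained LLM can reason over. A common generic pattern for this is a small adapter that projects per-frame visual features into the LLM's embedding space before concatenating them with the prompt tokens; the sketch below shows only that pattern, with assumed feature sizes, and is not VideoLLM's released implementation.

```python
import torch
import torch.nn as nn

class VideoTokenAdapter(nn.Module):
    """Map per-frame visual features into an LLM's token-embedding space so a
    frozen language model can treat the video as an ordinary token sequence."""
    def __init__(self, d_visual=768, d_llm=4096):
        super().__init__()
        self.proj = nn.Linear(d_visual, d_llm)

    def forward(self, frame_feats, text_embeds):
        # frame_feats: (B, T, d_visual) from a video encoder
        # text_embeds: (B, L, d_llm) token embeddings of the task prompt
        video_tokens = self.proj(frame_feats)                  # (B, T, d_llm)
        return torch.cat([video_tokens, text_embeds], dim=1)   # prepend video "tokens"
```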

MRSN: Multi-Relation Support Network for Video Action Detection

no code implementations 24 Apr 2023 Yin-Dong Zheng, Guo Chen, Minglei Yuan, Tong Lu

Action detection is a challenging video understanding task, requiring modeling spatio-temporal and interaction relations.

Action Detection Relation +1

Champion Solution for the WSDM2023 Toloka VQA Challenge

1 code implementation 22 Jan 2023 Shengyi Gao, Zhe Chen, Guo Chen, Wenhai Wang, Tong Lu

In this report, we present our champion solution to the WSDM2023 Toloka Visual Question Answering (VQA) Challenge.

Question Answering Visual Grounding +1

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

1 code implementation 6 Dec 2022 Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.

Ranked #1 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification Contrastive Learning +8
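InternVideo combines a masked video modelling objective with a video-language contrastive objective and coordinates them in a learnable manner. One standard way to weight two pretraining losses learnably is the uncertainty-style weighting of Kendall et al.; the sketch below shows that generic pattern with placeholder losses and is not InternVideo's actual coordination module.

```python
import torch
import torch.nn as nn

class DualObjectivePretrainer(nn.Module):
    """Generic combination of a masked-modelling loss and a contrastive loss
    with learnable per-task log-variance weights."""
    def __init__(self):
        super().__init__()
        self.log_var_mask = nn.Parameter(torch.zeros(()))       # weight for masked video modelling
        self.log_var_contrast = nn.Parameter(torch.zeros(()))   # weight for contrastive learning

    def forward(self, loss_mask, loss_contrast):
        # Each loss is scaled by a learned precision and regularised by its log-variance,
        # letting training decide how strongly each objective shapes the representation.
        return (torch.exp(-self.log_var_mask) * loss_mask + self.log_var_mask
                + torch.exp(-self.log_var_contrast) * loss_contrast + self.log_var_contrast)
```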

Exploring adaptation of VideoMAE for Audio-Visual Diarization & Social @ Ego4d Looking at me Challenge

no code implementations 17 Nov 2022 Yinan He, Guo Chen

In this report, we present the transfer of pretrained video masked autoencoders (VideoMAE) to egocentric tasks for the Ego4d Looking at me Challenge.

BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

2 code implementations 5 May 2022 Min Yang, Guo Chen, Yin-Dong Zheng, Tong Lu, LiMin Wang

Empirical results demonstrate that our PlusTAD is very efficient and significantly outperforms the previous methods on the datasets of THUMOS14 and FineAction.

Action Detection object-detection +3

Optically-generated focused ultrasound for noninvasive brain stimulation with ultrahigh precision

no code implementations 19 Apr 2022 Yueming Li, Ying Jiang, Lu Lan, Xiaowei Ge, Ran Cheng, Yuewei Zhan, Guo Chen, Linli Shi, Runyu Wang, Nan Zheng, Chen Yang, Ji-Xin Cheng

Here, we report optically-generated focused ultrasound (OFUS) for non-invasive brain stimulation with ultrahigh precision.

DCAN: Improving Temporal Action Detection via Dual Context Aggregation

1 code implementation 7 Dec 2021 Guo Chen, Yin-Dong Zheng, LiMin Wang, Tong Lu

Specifically, we design the Multi-Path Temporal Context Aggregation (MTCA) to achieve smooth context aggregation on boundary level and precise evaluation of boundaries.

Action Detection Temporal Action Localization

Non-genetic acoustic stimulation of single neurons by a tapered fiber optoacoustic emitter

no code implementations 17 Dec 2020 Linli Shi, Ying Jiang, Fernando R. Fernandez, Lu Lan, Guo Chen, Heng-ye Man, John A. White, Ji-Xin Cheng, Chen Yang

As an emerging technology, transcranial focused ultrasound has been demonstrated to successfully evoke motor responses in mice, rabbits, and sensory/motor responses in humans.
