no code implementations • 24 Mar 2025 • Meng Cao, Pengfei Hu, Yingyao Wang, Jihao Gu, Haoran Tang, Haoze Zhao, Jiahua Dong, Wangbo Yu, Ge Zhang, Ian Reid, Xiaodan Liang
Recent advancements in Large Video Language Models (LVLMs) have highlighted their potential for multi-modal understanding, yet evaluating their factual grounding in video contexts remains a critical unsolved challenge.
1 code implementation • 3 Jan 2025 • Kang Yi, Haoran Tang, Yumeng Li, Jing Xu, Jun Zhang
RGB-D salient object detection (SOD), aiming to highlight prominent regions of a given scene by jointly modeling RGB and depth information, is one of the challenging pixel-level prediction tasks.
1 code implementation • 2 Dec 2024 • Meng Cao, Haoran Tang, Haoze Zhao, Hangyu Guo, Jiaheng Liu, Ge Zhang, Ruyang Liu, Qiang Sun, Ian Reid, Xiaodan Liang
In this paper, we propose PhysGame as a pioneering benchmark to evaluate physical commonsense violations in gameplay videos.
1 code implementation • 4 Nov 2024 • Ruyang Liu, Haoran Tang, Haibo Liu, Yixiao Ge, Ying Shan, Chen Li, Jiankun Yang
In this paper, we identify the key issue as the redundant content in videos.
1 code implementation • 31 Oct 2024 • Dingli Yuan, Shitong Wu, Haoran Tang, Lu Yang, Chenghui Peng
The main idea is to construct a fixed-point equation for channel estimation, which can be implemented into the deep equilibrium (DEQ) model with a fixed network.
1 code implementation • 20 Aug 2024 • Haoran Tang, Meng Cao, Jinfa Huang, Ruyang Liu, Peng Jin, Ge Li, Xiaodan Liang
Text-Video Retrieval (TVR) aims to align and associate relevant video content with corresponding natural language queries.
1 code implementation • 2 Jul 2024 • Xiang Li, Haoran Tang, Siyu Chen, Ziwei Wang, Ryan Chen, Marcin Abram
This effect is especially visible for open questions and questions of high difficulty or novelty.
1 code implementation • 19 Jun 2024 • Yicong Li, Yu Yang, Jiannong Cao, Shuaiqi Liu, Haoran Tang, Guandong Xu
We first identify biased structural evolutions in a dynamic graph based on the evolving trend of vertex degree and then propose FairDGE, the first structurally Fair Dynamic Graph Embedding algorithm.
no code implementations • 29 May 2024 • Meng Cao, Haoran Tang, Jinfa Huang, Peng Jin, Can Zhang, Ruyang Liu, Long Chen, Xiaodan Liang, Li Yuan, Ge Li
Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries.
no code implementations • 29 May 2024 • Xueyao Sun, Kaize Shi, Haoran Tang, Guandong Xu, Qing Li
Large language models (LLMs) can elicit social bias during generation, especially when performing inference with toxic prompts.
1 code implementation • 30 Mar 2024 • Ruyang Liu, Chen Li, Haoran Tang, Yixiao Ge, Ying Shan, Ge Li
In this paper, we investigate a straightforward yet unexplored question: Can we feed all spatial-temporal tokens into the LLM, thus delegating the task of video sequence modeling to the LLMs?
no code implementations • 27 Feb 2024 • Jingying Wang, Haoran Tang, Taylor Kantor, Tandis Soltani, Vitaliy Popov, Xu Wang
The segmentation pipeline enables functionalities to create visual questions and feedback desired by surgeons from a formative study.
2 code implementations • 12 Dec 2023 • Xiang Li, Haoran Tang, Siyu Chen, Ziwei Wang, Anurag Maravi, Marcin Abram
In this paper, we explore the challenges inherent to Large Language Models (LLMs) like GPT-4, particularly their propensity for hallucinations, logic mistakes, and incorrect conclusions when tasked with answering complex questions.
no code implementations • 5 Dec 2023 • Haoran Tang, Jieren Deng, Zhihong Pan, Hao Tian, Pratik Chaudhari, Xin Zhou
Previous methods use the same reference image as the target.
no code implementations • 20 Oct 2023 • Zixuan Wang, Haoran Tang, Haibo Wang, Bo Qin, Mark D. Butala, Weiming Shen, Hongwei Wang
Despite the remarkable results that can be achieved by data-driven intelligent fault diagnosis techniques, they presuppose the same distribution of training and test data as well as sufficient labeled data.
no code implementations • ICCV 2023 • Yuanyi Zhong, Haoran Tang, Jun-Kun Chen, Yu-Xiong Wang
Though self-supervised contrastive learning (CL) has shown its potential to achieve state-of-the-art accuracy without any supervision, its behavior remains under-investigated by academia.
no code implementations • 29 Nov 2022 • Lukas Zhornyak, Zhengjie Xu, Haoran Tang, Jianbo Shi
We present HashEncoding, a novel autoencoding architecture that leverages a non-parametric multiscale coordinate hash function to facilitate a per-pixel decoder without convolutions.
no code implementations • 10 Nov 2022 • Zeqian Li, Keyu Qiu, Chenxu Jiao, Wen Zhu, Haoran Tang
This paper describes a French dialect recognition system that will appropriately distinguish between different regional French dialects.
no code implementations • 10 Jun 2022 • Yuanyi Zhong, Haoran Tang, Junkun Chen, Jian Peng, Yu-Xiong Wang
Our insight has implications in improving the downstream robustness of supervised learning.
no code implementations • 28 Jan 2022 • Changwei Xu, Jianfei Yang, Haoran Tang, Han Zou, Cheng Lu, Tianshuo Zhang
Unsupervised Domain Adaptation (UDA), a branch of transfer learning where labels for target samples are unavailable, has been widely researched and developed in recent years with the help of adversarially trained models.
no code implementations • 23 Sep 2019 • Ofir Nachum, Haoran Tang, Xingyu Lu, Shixiang Gu, Honglak Lee, Sergey Levine
Hierarchical reinforcement learning has demonstrated significant success at solving difficult reinforcement learning (RL) tasks.
Hierarchical Reinforcement Learning
reinforcement-learning
no code implementations • 8 Nov 2018 • Dennis Lee, Haoran Tang, Jeffrey O. Zhang, Huazhe Xu, Trevor Darrell, Pieter Abbeel
We present a novel modular architecture for StarCraft II AI.
4 code implementations • ICML 2017 • Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, Sergey Levine
We propose a method for learning expressive energy-based policies for continuous states and actions, which was previously feasible only in tabular domains.
3 code implementations • NeurIPS 2017 • Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks.
Ranked #1 on Atari Games on Atari 2600 Freeway