no code implementations • EMNLP 2020 • Yiquan Wu, Kun Kuang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Jun Xiao, Yueting Zhuang, Luo Si, Fei Wu
Court's view generation is a novel but essential task for legal AI, aiming at improving the interpretability of judgment prediction results and enabling automatic legal document generation.
1 code implementation • 18 Apr 2025 • Sijing Li, Tianwei Lin, Lingshuai Lin, Wenqiao Zhang, Jiang Liu, Xiaoda Yang, Juncheng Li, Yucheng He, Xiaohui Song, Jun Xiao, Yueting Zhuang, Beng Chin Ooi
In this paper, we propose the Eyecare Kit, which systematically tackles the aforementioned three key challenges with the tailored dataset, benchmark and model: First, we construct a multi-agent data engine with real-life ophthalmology data to produce Eyecare-100K, a high-quality ophthalmic visual instruction dataset.
no code implementations • 16 Apr 2025 • Kaifeng Gao, Siqi Chen, Hanwang Zhang, Jun Xiao, Yueting Zhuang, Qianru Sun
To this end, we propose to model visual relations as continuous embeddings, and design diffusion models to achieve generalized VRD in a conditional generative manner, termed Diff-VRD.
1 code implementation • 27 Mar 2025 • Wenqi Zhang, Mengna Wang, Gangao Liu, Xu Huixin, Yiwei Jiang, Yongliang Shen, Guiyang Hou, Zhe Zheng, Hang Zhang, Xin Li, Weiming Lu, Peng Li, Yueting Zhuang
Recent advances in deep thinking models have demonstrated remarkable reasoning capabilities on mathematical and coding tasks.
no code implementations • 10 Mar 2025 • Haoyu Zheng, Qifan Yu, Binghe Yu, Yang Dai, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang
Diffusion models have achieved remarkable progress in image and video stylization.
no code implementations • 9 Mar 2025 • Yuchen Yan, Yongliang Shen, Yang Liu, Jin Jiang, Mengdi Zhang, Jian Shao, Yueting Zhuang
Advanced reasoning in large language models has achieved remarkable performance on challenging tasks, but the prevailing long-context reasoning paradigm faces critical limitations: quadratic computational scaling with sequence length, reasoning constrained by maximum context boundaries, and performance degradation beyond pre-training context windows.
no code implementations • 9 Mar 2025 • Fei Tang, Yongliang Shen, Hang Zhang, Siqi Chen, Guiyang Hou, Wenqi Zhang, Wenqiao Zhang, Kaitao Song, Weiming Lu, Yueting Zhuang
This structured decomposition enables systematic understanding of both interface layouts and visual relationships.
no code implementations • 17 Feb 2025 • Yuchen Yan, Yongliang Shen, Yang Liu, Jin Jiang, Xin Xu, Mengdi Zhang, Jian Shao, Yueting Zhuang
We then apply these models to enhance existing mathematical reasoning datasets by inserting detailed intermediate steps into their solution chains, creating MathFimer-expanded versions.
1 code implementation • 14 Feb 2025 • Tianwei Lin, Wenqiao Zhang, Sijing Li, Yuqian Yuan, Binhe Yu, Haoyuan Li, Wanggui He, Hao Jiang, Mengze Li, Xiaohui Song, Siliang Tang, Jun Xiao, Hui Lin, Yueting Zhuang, Beng Chin Ooi
To effectively learn the HealthGPT, we devise a comprehensive medical domain-specific comprehension and generation dataset called VL-Health.
no code implementations • 7 Feb 2025 • Zhenwei Wu, Jinxiong Lu, Yuxiao Chen, Yunxin Liu, Yueting Zhuang, Luhui Hu
Humanoid robotics presents significant challenges in artificial intelligence, requiring precise coordination and control of high-degree-of-freedom systems.
1 code implementation • 1 Jan 2025 • Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang, Lidong Bing
Compared to its counterparts, our video-centric textbook offers more coherent context, richer knowledge, and better image-text alignment.
2 code implementations • 31 Dec 2024 • Yuqian Yuan, Hang Zhang, Wentong Li, Zesen Cheng, Boqiang Zhang, Long Li, Xin Li, Deli Zhao, Wenqiao Zhang, Yueting Zhuang, Jianke Zhu, Lidong Bing
Finally, we meticulously create a VideoRefer-Bench to comprehensively assess the spatial-temporal understanding capability of a Video LLM, evaluating it across various aspects.
no code implementations • 28 Dec 2024 • Haoyu Zheng, Wenqiao Zhang, Zheqi Lv, Yu Zhong, Yang Dai, Jianxiang An, Yongliang Shen, Juncheng Li, Dongping Zhang, Siliang Tang, Yueting Zhuang
Diffusion-based text-to-image (T2I) models have demonstrated remarkable results in global video editing tasks.
no code implementations • 27 Dec 2024 • Jiang Liu, Bolin Li, Haoyuan Li, Tianwei Lin, Wenqiao Zhang, Tao Zhong, Zhelun Yu, Jinghao Wei, Hao Cheng, Hao Jiang, Zheqi Lv, Juncheng Li, Siliang Tang, Yueting Zhuang
Efficient multimodal large language models (EMLLMs), in contrast to multimodal large language models (MLLMs), reduce model size and computational costs and are often deployed on resource-constrained devices.
no code implementations • 18 Dec 2024 • Yaoke Wang, Yun Zhu, Xintong Bao, Wenqiao Zhang, Suyang Dai, Kehan Chen, Wenqiang Li, Gang Huang, Siliang Tang, Yueting Zhuang
Despite the remarkable capabilities of large language models (LLMs) in natural language understanding and reasoning, they often display undesirable behaviors, such as generating hallucinations and unfaithful reasoning.
no code implementations • 13 Dec 2024 • Zhiqi Ge, Juncheng Li, Xinglei Pang, Minghe Gao, Kaihang Pan, Wang Lin, Hao Fei, Wenqiao Zhang, Siliang Tang, Yueting Zhuang
Digital agents are increasingly employed to automate tasks in interactive digital environments such as web pages, software applications, and operating systems.
no code implementations • 9 Dec 2024 • Qifan Yu, Zhebei Shen, Zhongqi Yue, Yang Wu, Wenqiao Zhang, Yunfei Li, Juncheng Li, Siliang Tang, Yueting Zhuang
Instruction tuning fine-tunes pre-trained Multi-modal Large Language Models (MLLMs) to handle real-world tasks.
no code implementations • 29 Nov 2024 • Haiyi Qiu, Minghe Gao, Long Qian, Kaihang Pan, Qifan Yu, Juncheng Li, Wenjie Wang, Siliang Tang, Yueting Zhuang, Tat-Seng Chua
Video Large Language Models (Video-LLMs) have recently shown strong performance in basic video understanding tasks, such as captioning and coarse-grained question answering, but struggle with compositional reasoning that requires multi-step spatio-temporal inference across object relations, interactions, and events.
no code implementations • 26 Nov 2024 • Travis Davies, Jiahuan Yan, Xiang Chen, Yu Tian, Yueting Zhuang, Yiqi Huang, Luhui Hu
Our results demonstrate that our approach significantly boosts the success rate across diverse camera exposures, where previous models experience performance collapse.
no code implementations • 24 Nov 2024 • Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang Tang, Hanwang Zhang, Yueting Zhuang
Instruction-based image editing aims to modify specific image elements with natural language instructions.
no code implementations • 14 Nov 2024 • Chutian Meng, Fan Ma, Jiaxu Miao, Chi Zhang, Yi Yang, Yueting Zhuang
We use GPT4V to bridge the gap between the reference image and the text input for the T2I model, allowing T2I models to understand image content.
1 code implementation • 15 Oct 2024 • Fei Tang, Yongliang Shen, Hang Zhang, Zeqi Tan, Wenqi Zhang, Zhibiao Huang, Kaitao Song, Weiming Lu, Yueting Zhuang
GaVaMoE introduces two key components: (1) a rating reconstruction module that employs Variational Autoencoder (VAE) with a Gaussian Mixture Model (GMM) to capture complex user-item collaborative preferences, serving as a pre-trained multi-gating mechanism; and (2) a set of fine-grained expert models coupled with the multi-gating mechanism for generating highly personalized explanations.
no code implementations • 2 Oct 2024 • Bingchen Miao, Wenqiao Zhang, Juncheng Li, Siliang Tang, Zhaocheng Li, Haochen Shi, Jun Xiao, Yueting Zhuang
To address this practical challenge, we introduce a first-of-its-kind study that comprehensively investigates Modality-Incomplete Industrial Anomaly Detection (MIIAD), to consider the imperfect learning environment in which the multimodal information may be incomplete.
1 code implementation • 27 Sep 2024 • Hongzhe Huang, Jiang Liu, Zhewen Yu, Li Cai, Dian Jiao, Wenqiao Zhang, Siliang Tang, Juncheng Li, Hao Jiang, Haoyuan Li, Yueting Zhuang
Recent advances in Multi-modal Large Language Models (MLLMs), such as LLaVA-series models, are driven by massive machine-generated instruction-following data tuning.
1 code implementation • 19 Aug 2024 • Tianwei Lin, Jiang Liu, Wenqiao Zhang, Zhaocheng Li, Yang Dai, Haoyuan Li, Zhelun Yu, Wanggui He, Juncheng Li, Hao Jiang, Siliang Tang, Yueting Zhuang
Considering this, we introduce an innovative PEFT method, TeamLoRA, consisting of a collaboration and competition module for experts, and thus achieving the right balance of effectiveness and efficiency: (i) For collaboration, a novel knowledge-sharing and -organizing mechanism is devised to appropriately reduce the scale of matrix operations, thereby boosting the training and inference speed.
Ranked #196 on Visual Question Answering on MM-Vet
no code implementations • 28 Jul 2024 • Dong Chen, Shilin Zhang, Fei Gao, Yueting Zhuang, Siliang Tang, Qidong Liu, Mingliang Xu
Subsequently, based on the function base, LD fine-tunes S-LLMs to learn the logic employed by L-LLMs in planning and decision-making.
no code implementations • 21 Jul 2024 • Yunyi Xuan, WeiJie Chen, Shicai Yang, Di Xie, Luojun Lin, Yueting Zhuang
In this paper, we discuss the extension of DFKD to Vision-Language Foundation Models without access to the billion-level image-text datasets.
1 code implementation • 15 Jul 2024 • Jie Cao, Dian Jiao, Qiang Yan, Wenqiao Zhang, Siliang Tang, Yueting Zhuang
Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization.
no code implementations • 12 Jul 2024 • Hanrong Shi, Lin Li, Jun Xiao, Yueting Zhuang, Long Chen
Despite remarkable progress in PSG, almost all existing methods neglect the importance of shape-aware features, which inherently focus on the contours and boundaries of objects.
1 code implementation • 9 Jul 2024 • Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang
In light of this, we design a multi-modal self-instruct, utilizing large language models and their code capabilities to synthesize massive abstract images and visual reasoning instructions across daily scenarios.
1 code implementation • 6 Jul 2024 • Kai Shen, Lingfei Wu, Siliang Tang, Fangli Xu, Bo Long, Yueting Zhuang, Jian Pei
The visual question generation (VQG) task aims to generate human-like questions from an image and potentially other side information (e.g., answer type).
1 code implementation • 18 Jun 2024 • Yaoke Wang, Yun Zhu, Wenqiao Zhang, Yueting Zhuang, Yunfei Li, Siliang Tang
Representation learning on text-attributed graphs (TAGs) is vital for real-world applications, as they combine semantic textual and contextual structural information.
1 code implementation • 15 Jun 2024 • Dong Chen, Shuo Zhang, Yueting Zhuang, Siliang Tang, Qidong Liu, Hua Wang, Mingliang Xu
On the other hand, certain tasks can be broken down into multiple subtasks, some of which can be completed without powerful capabilities.
no code implementations • 11 Jun 2024 • Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang
In this work, we propose a two-stage sign language production (SLP) paradigm that first encodes sign language sequences into discrete codes and then autoregressively generates sign language from text based on the learned codebook.
no code implementations • 12 May 2024 • Dian Jiao, Li Cai, Jingsheng Huang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang
Retrieval-Augmented Generation (RAG) methods augment the input of Large Language Models (LLMs) with relevant retrieved passages, reducing factual errors in knowledge-intensive tasks.
1 code implementation • 3 May 2024 • Kaihang Pan, Siliang Tang, Juncheng Li, Zhaoyu Fan, Wei Chow, Shuicheng Yan, Tat-Seng Chua, Yueting Zhuang, Hanwang Zhang
For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge.
1 code implementation • 28 Apr 2024 • Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang
As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios.
no code implementations • 21 Apr 2024 • Haoyu Zheng, Wenqiao Zhang, Yaoke Wang, Hao Zhou, Jiang Liu, Juncheng Li, Zheqi Lv, Siliang Tang, Yueting Zhuang
Revolutionary advancements in text-to-image models have unlocked new dimensions for sophisticated content creation, e.g., text-conditioned image editing, allowing us to edit the diverse images that convey highly complex visual concepts according to the textual guidance.
no code implementations • 17 Apr 2024 • Minghe Gao, Shuang Chen, Liang Pang, Yuan YAO, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua
Their ability to execute intricate compositional reasoning tasks is also constrained, culminating in a stagnation of learning progression for these models.
1 code implementation • 20 Mar 2024 • Wenqiao Zhang, Tianwei Lin, Jiang Liu, Fangxun Shu, Haoyuan Li, Lei Zhang, He Wanggui, Hao Zhou, Zheqi Lv, Hao Jiang, Juncheng Li, Siliang Tang, Yueting Zhuang
Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks.
Ranked #198 on Visual Question Answering on MM-Vet
no code implementations • 14 Mar 2024 • Chang Zong, Yuyan Chen, Weiming Lu, Jian Shao, Yongfeng Huang, Heng Chang, Yueting Zhuang
Large Language Models (LLMs) have demonstrated efficacy in various linguistic applications, including question answering and controlled text generation.
1 code implementation • 27 Feb 2024 • Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, Weiming Lu
Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks.
1 code implementation • 22 Feb 2024 • Chang Zong, Yuchen Yan, Weiming Lu, Jian Shao, Eliot Huang, Heng Chang, Yueting Zhuang
We evaluated the performance of our framework using three benchmark datasets, and the results show that our framework outperforms state-of-the-art systems on the LC-QuAD and YAGO-QA benchmarks, yielding F1 scores of 11.8% and 20.7%, respectively.
1 code implementation • 18 Feb 2024 • Long Qian, Juncheng Li, Yu Wu, Yaobo Ye, Hao Fei, Tat-Seng Chua, Yueting Zhuang, Siliang Tang
Large Language Models (LLMs) demonstrate remarkable proficiency in comprehending and handling text-based tasks.
no code implementations • 23 Jan 2024 • Kexin Li, Tao Jiang, Zongxin Yang, Yi Yang, Yueting Zhuang, Jun Xiao
Interactive Video Object Segmentation (iVOS) is a challenging task that requires real-time human-computer interaction.
Interactive Video Object Segmentation
Semantic Segmentation
no code implementations • 4 Jan 2024 • Wenqi Zhang, Yongliang Shen, Linjuan Wu, Qiuying Peng, Jun Wang, Yueting Zhuang, Weiming Lu
Experiments conducted on a series of reasoning and translation tasks with different LLMs serve to underscore the effectiveness and generality of our strategy.
1 code implementation • 30 Nov 2023 • Yongliang Shen, Kaitao Song, Xu Tan, Wenqi Zhang, Kan Ren, Siyu Yuan, Weiming Lu, Dongsheng Li, Yueting Zhuang
To address this, we introduce TaskBench, a comprehensive framework to evaluate the capability of LLMs in task automation.
1 code implementation • CVPR 2024 • Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang
Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.
no code implementations • CVPR 2024 • Wenqiao Zhang, Zheqi Lv, Hao Zhou, Jia-Wei Liu, Juncheng Li, Mengze Li, Siliang Tang, Yueting Zhuang
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate. This setting neglects the more practical scenario where training data are collected from multiple sources.
no code implementations • 21 Nov 2023 • Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Zheqi Lv, Wenqiao Zhang, Siliang Tang, Yueting Zhuang
Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks.
no code implementations • 25 Oct 2023 • WeiJie Chen, Haoyu Wang, Shicai Yang, Lei Zhang, Wei Wei, Yanning Zhang, Luojun Lin, Di Xie, Yueting Zhuang
Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator as well as the corresponding unlabeled target data.
1 code implementation • 4 Oct 2023 • Dong Chen, Kaihang Pan, Guoming Wang, Yueting Zhuang, Siliang Tang
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality, and then the latent space of vision modality will be learned with the guidance of the matrix.
no code implementations • 15 Aug 2023 • Bosheng Qin, Wentao Ye, Qifan Yu, Siliang Tang, Yueting Zhuang
Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion.
1 code implementation • 8 Aug 2023 • Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Hanwang Zhang, Yueting Zhuang
This shortcoming results in MLLMs' underperformance in comprehending demonstrative instructions consisting of multiple, interleaved, and multimodal instructions that demonstrate the required context to complete a task.
no code implementations • 2 Aug 2023 • Zixuan Ni, Longhui Wei, Jiacheng Li, Siliang Tang, Yueting Zhuang, Qi Tian
In this work, we propose a novel strategy named Degeneration-Tuning (DT) to shield contents of unwanted concepts from SD weights.
no code implementations • 5 Jul 2023 • Jiahao Li, Yuanyou Xu, Zongxin Yang, Yi Yang, Yueting Zhuang
The Associating Objects with Transformers (AOT) framework has exhibited exceptional performance in a wide range of complex scenarios for video object segmentation.
no code implementations • 5 Jul 2023 • Yuanyou Xu, Jiahao Li, Zongxin Yang, Yi Yang, Yueting Zhuang
MSDeAOT efficiently propagates object masks from previous frames to the current frame using two feature scales of 16 and 8.
no code implementations • 25 Jun 2023 • Yangjun Mao, Jun Xiao, Dong Zhang, Meng Cao, Jian Shao, Yueting Zhuang, Long Chen
A recent DIC method proposes to generate distinctive captions by comparing the target image with a set of semantic-similar reference images, i.e., reference-based DIC (Ref-DIC).
1 code implementation • 12 Jun 2023 • Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang
The advancements are twofold: First, it is a code-centric agent that receives human requests and generates code as an intermediary to handle massive data, which is quite flexible for large-scale data processing tasks.
1 code implementation • 26 May 2023 • Yongliang Shen, Zeqi Tan, Shuhui Wu, Wenqi Zhang, Rongsheng Zhang, Yadong Xi, Weiming Lu, Yueting Zhuang
Prompt learning is a new paradigm for utilizing pre-trained language models and has achieved great success in many tasks.
Ranked #1 on Nested Named Entity Recognition on ACE 2004
1 code implementation • 22 May 2023 • Qifan Yu, Juncheng Li, Wentao Ye, Siliang Tang, Yueting Zhuang
Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.
3 code implementations • 22 May 2023 • Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang
In this paper, we propose DiffusionNER, which formulates the named entity recognition task as a boundary-denoising diffusion process and thus generates named entities from noisy spans.
Ranked #2 on Nested Named Entity Recognition on GENIA
1 code implementation • NeurIPS 2023 • Lin Li, Jun Xiao, Guikun Chen, Jian Shao, Yueting Zhuang, Long Chen
To dynamically fuse different cues, we further introduce a chain-of-thought method that prompts LLMs to generate reasonable weights for different visual cues.
no code implementations • 21 May 2023 • Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang
We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions.
no code implementations • 11 May 2023 • Zixuan Ni, Longhui Wei, Siliang Tang, Yueting Zhuang, Qi Tian
Moreover, we empirically and theoretically demonstrate how SD leads to a performance decline for CLIP on cross-modal retrieval tasks.
1 code implementation • 5 May 2023 • Zeqi Tan, Shen Huang, Zixia Jia, Jiong Cai, Yinghui Li, Weiming Lu, Yueting Zhuang, Kewei Tu, Pengjun Xie, Fei Huang, Yong Jiang
Also, we discover that the limited context length causes the retrieval knowledge to be invisible to the model.
Multilingual Named Entity Recognition
named-entity-recognition
no code implementations • ICCV 2023 • Wenqiao Zhang, Changshuo Liu, Lingze Zeng, Beng Chin Ooi, Siliang Tang, Yueting Zhuang
Conventional multi-label classification (MLC) methods assume that all samples are fully labeled and identically distributed.
1 code implementation • NeurIPS 2023 • Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang
Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence.
Automatic Machine Learning Model Selection
Model Selection
1 code implementation • ICCV 2023 • Qifan Yu, Juncheng Li, Yu Wu, Siliang Tang, Wei Ji, Yueting Zhuang
Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), where models can generalize to unseen predicates in a zero-shot manner.
no code implementations • ICCV 2023 • Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang
Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-training models to adapt to downstream tasks in a parameter- and data-efficient way, by learning the "soft prompts" to condition frozen pre-training models.
no code implementations • 11 Mar 2023 • Zhen Wang, Jun Xiao, Yueting Zhuang, Fei Gao, Jian Shao, Long Chen
To this end, we propose a novel prompt-based framework for CIC by learning Combinatorial Prompts, dubbed as ComPro.
no code implementations • 7 Mar 2023 • Jiacheng Li, Longhui Wei, Zongyuan Zhan, Xin He, Siliang Tang, Qi Tian, Yueting Zhuang
To better accelerate the generative transformers while keeping good generation quality, we propose Lformer, a semi-autoregressive text-to-image generation model.
no code implementations • 22 Jan 2023 • Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang
To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.
no code implementations • 12 Jan 2023 • Yilu Guo, Xingyue Shi, WeiJie Chen, Shicai Yang, Di Xie, ShiLiang Pu, Yueting Zhuang
In the test-time training stage, we use the pre-trained model to assign noisy labels to the unlabeled target data, and propose a Label-Periodically-Updated DivideMix method for noisy label learning.
no code implementations • 12 Jan 2023 • Wei Zhao, Binbin Chen, WeiJie Chen, Shicai Yang, Di Xie, ShiLiang Pu, Yueting Zhuang
The domain adaptation part is implemented as a Source-Free Domain Adaptation paradigm, which only uses the pre-trained model and the unlabeled target data to further optimize in a self-supervised training manner.
no code implementations • ICCV 2023 • Weizhen He, WeiJie Chen, Binbin Chen, Shicai Yang, Di Xie, Luojun Lin, Donglian Qi, Yueting Zhuang
In this paper, we delve into this problem and propose an Unsupervised Prompt Tuning framework for text-driven object detection, which is composed of two novel mean teaching mechanisms.
no code implementations • 24 Nov 2022 • Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang
Furthermore, we show that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, optimizing the attention in bilinear form.
no code implementations • 3 Nov 2022 • Zeqi Tan, Yongliang Shen, Xuming Hu, Wenqi Zhang, Xiaoxia Cheng, Weiming Lu, Yueting Zhuang
Joint entity and relation extraction has been a core task in the field of information extraction.
Contrastive Learning
Joint Entity and Relation Extraction
no code implementations • 2 Oct 2022 • Chang Zong, Yueting Zhuang, Weiming Lu, Jian Shao, Siliang Tang
In this paper, we propose CTPIR, a new citation trajectory prediction framework that is able to represent the influence (the momentum of citation) of either new or existing publications using the history information of all their attributes.
1 code implementation • 4 Aug 2022 • Juncheng Li, Xin He, Longhui Wei, Long Qian, Linchao Zhu, Lingxi Xie, Yueting Zhuang, Qi Tian, Siliang Tang
Large-scale vision-language pre-training has shown impressive advances in a wide range of downstream tasks.
1 code implementation • 3 Aug 2022 • Juncheng Li, Junlin Xie, Linchao Zhu, Long Qian, Siliang Tang, Wenqiao Zhang, Haochen Shi, Shengyu Zhang, Longhui Wei, Qi Tian, Yueting Zhuang
In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles.
no code implementations • 9 Jul 2022 • Wenqiao Zhang, Jiannan Guo, Mengze Li, Haochen Shi, Shengyu Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang
In this scenario, the input image serves as an intuitive context and background for the search, while the corresponding language expressly requests new traits on how specific characteristics of the query image should be modified in order to get the intended target image.
1 code implementation • 1 Jul 2022 • Naiyuan Liu, Xiaohan Wang, Xiaobo Li, Yi Yang, Yueting Zhuang
In this report, we present the ReLER@ZJU-Alibaba submission to the Ego4D Natural Language Queries (NLQ) Challenge in CVPR 2022.
Ranked #5 on Natural Language Queries on Ego4D
3 code implementations • CVPR 2022 • Binbin Chen, WeiJie Chen, Shicai Yang, Yunyi Xuan, Jie Song, Di Xie, ShiLiang Pu, Mingli Song, Yueting Zhuang
To remedy this issue, we present a novel label assignment mechanism for self-training framework, namely proposal self-assignment, which injects the proposals from student into teacher and generates accurate pseudo labels to match each proposal in the student model accordingly.
1 code implementation • CVPR 2022 • Rang Meng, WeiJie Chen, Shicai Yang, Jie Song, Luojun Lin, Di Xie, ShiLiang Pu, Xinchao Wang, Mingli Song, Yueting Zhuang
In this paper, we introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank, from which models of different capacities can be sampled to accommodate different accuracy-efficiency trade-offs.
no code implementations • 13 Jun 2022 • Junchu Huang, WeiJie Chen, Shicai Yang, Di Xie, ShiLiang Pu, Yueting Zhuang
This framework can reduce the impact of noisy labels from CLIP model effectively by combining both techniques.
2 code implementations • 13 Jun 2022 • Meilin Chen, WeiJie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Yunfeng Yan, Donglian Qi, Yueting Zhuang, Di Xie, ShiLiang Pu
In addition, we conduct anchor adaptation in parallel with localization adaptation, since anchor can be regarded as a learnable parameter.
1 code implementation • 4 Jun 2022 • Dong Chen, Lingfei Wu, Siliang Tang, Xiao Yun, Bo Long, Yueting Zhuang
Moreover, when handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise on a corrupted dataset.
1 code implementation • CVPR 2022 • Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang
To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.
1 code implementation • ACL 2022 • Yongliang Shen, Xiaobin Wang, Zeqi Tan, Guangwei Xu, Pengjun Xie, Fei Huang, Weiming Lu, Yueting Zhuang
Each instance query predicts one entity, and by feeding all instance queries simultaneously, we can query all entities in parallel.
Ranked #1 on Nested Named Entity Recognition on GENIA
Chinese Named Entity Recognition
named-entity-recognition
1 code implementation • SemEval (NAACL) 2022 • Xinyu Wang, Yongliang Shen, Jiong Cai, Tao Wang, Xiaobin Wang, Pengjun Xie, Fei Huang, Weiming Lu, Yueting Zhuang, Kewei Tu, Wei Lu, Yong Jiang
Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
Multilingual Named Entity Recognition
Named Entity Recognition
1 code implementation • 1 Jan 2022 • Xiaoqiang Wang, Lei Zhu, Siliang Tang, Huazhu Fu, Ping Li, Fei Wu, Yi Yang, Yueting Zhuang
The depth estimation branch is trained with RGB-D images and then used to estimate the pseudo depth maps for all unlabeled RGB images to form the paired data.
no code implementations • CVPR 2022 • Yadong Ding, Yu Wu, Chengyue Huang, Siliang Tang, Yi Yang, Longhui Wei, Yueting Zhuang, Qi Tian
Existing NAS-based meta-learning methods apply a two-stage strategy, i.e., first searching architectures and then re-training meta-weights on the searched architecture.
no code implementations • 13 Dec 2021 • Wenqiao Zhang, Haochen Shi, Jiannan Guo, Shengyu Zhang, Qingpeng Cai, Juncheng Li, Sihui Luo, Yueting Zhuang
We propose the Multimodal relAtional Graph adversarIal inferenCe (MAGIC) framework for diverse and unpaired TextCap.
no code implementations • 2 Dec 2021 • Wenqiao Zhang, Xin Eric Wang, Siliang Tang, Haizhou Shi, Haocheng Shi, Jun Xiao, Yueting Zhuang, William Yang Wang
Such a setting can help explain the decisions of captioning models and prevents the model from hallucinating object words in its description.
no code implementations • 2 Dec 2021 • Wenqiao Zhang, Haochen Shi, Siliang Tang, Jun Xiao, Qiang Yu, Yueting Zhuang
Contemporary visual captioning models frequently hallucinate objects that are not actually in a scene, due to visual misclassification or over-reliance on priors, which results in semantic inconsistency between the visual information and the target lexical words.
1 code implementation • NeurIPS 2021 • Shen Kai, Lingfei Wu, Siliang Tang, Yueting Zhuang, Zhen He, Zhuoye Ding, Yun Xiao, Bo Long
The task of visual question generation (VQG) aims to generate human-like neural questions from an image and potentially other side information (e.g., answer type or the answer itself).
no code implementations • 18 Nov 2021 • Zixuan Ni, Siliang Tang, Yueting Zhuang
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.
no code implementations • 29 Sep 2021 • Haizhou Shi, Youcai Zhang, Zijin Shen, Siliang Tang, Yaqian Li, Yandong Guo, Yueting Zhuang
This paper investigates the feasibility of federated representation learning under the constraints of communication cost and privacy protection.
1 code implementation • EMNLP 2021 • Shaoning Xiao, Long Chen, Jian Shao, Yueting Zhuang, Jun Xiao
Given an untrimmed video and a natural language query, Natural Language Video Localization (NLVL) aims to identify the video moment described by the query.
no code implementations • 30 Jul 2021 • Haizhou Shi, Youcai Zhang, Siliang Tang, Wenjie Zhu, Yaqian Li, Yandong Guo, Yueting Zhuang
It is a consensus that small models perform quite poorly under the paradigm of self-supervised contrastive learning.
no code implementations • 26 Jul 2021 • Zixuan Ni, Haizhou Shi, Siliang Tang, Longhui Wei, Qi Tian, Yueting Zhuang
After investigating existing strategies, we observe that there is a lack of study on how to prevent the inter-phase confusion.
no code implementations • ICCV 2021 • Juncheng Li, Siliang Tang, Linchao Zhu, Haochen Shi, Xuanwen Huang, Fei Wu, Yi Yang, Yueting Zhuang
Secondly, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network from three hierarchies.
1 code implementation • ACL 2021 • Tao Chen, Haizhou Shi, Siliang Tang, Zhigang Chen, Fei Wu, Yueting Zhuang
The journey of reducing noise in training data generated by distant supervision (DS) has been underway since DS was first introduced into the relation extraction (RE) task.
1 code implementation • 21 Jun 2021 • Tao Chen, Haochen Shi, Liyuan Liu, Siliang Tang, Jian Shao, Zhigang Chen, Yueting Zhuang
In this paper, we propose collaborative adversarial training to improve the data utilization, which coordinates virtual adversarial training (VAT) and adversarial training (AT) at different levels.
no code implementations • 26 May 2021 • Feifei Shao, Long Chen, Jian Shao, Wei Ji, Shaoning Xiao, Lu Ye, Yueting Zhuang, Jun Xiao
With the success of deep neural networks in object detection, both WSOD and WSOL have received unprecedented attention.
1 code implementation • 19 May 2021 • Zeqi Tan, Yongliang Shen, Shuai Zhang, Weiming Lu, Yueting Zhuang
We utilize a non-autoregressive decoder to predict the final set of entities in one pass, in which we are able to capture dependencies between entities.
Ranked #6 on Nested Named Entity Recognition on ACE 2005
no code implementations • 12 May 2021 • Chenchi Zhang, Wenbo Ma, Jun Xiao, Hanwang Zhang, Jian Shao, Yueting Zhuang, Long Chen
In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on the detection confidence (i.e., query-agnostic), hoping that the proposals contain all instances mentioned in the text query (i.e., query-aware).
no code implementations • 13 Apr 2021 • Zongshen Mu, Siliang Tang, Jie Tan, Qiang Yu, Yueting Zhuang
In this paper, we propose a novel graph learning framework for phrase grounding in the image.
Ranked #6 on Phrase Grounding on Flickr30k Entities Test
no code implementations • 23 Feb 2021 • WeiJie Chen, Luojun Lin, Shicai Yang, Di Xie, ShiLiang Pu, Yueting Zhuang, Wenqi Ren
Usually, the given source-domain pre-trained model is expected to be optimized with only unlabeled target data, a setting termed source-free unsupervised domain adaptation.
no code implementations • 1 Feb 2021 • WeiJie Chen, Yilu Guo, Shicai Yang, Zhaoyang Li, Zhenxin Ma, Binbin Chen, Long Zhao, Di Xie, ShiLiang Pu, Yueting Zhuang
Therefore, we turn our attention to suppressing false positives in each target domain in an unsupervised way.
no code implementations • 1 Jan 2021 • Chengyue Huang, Lingfei Wu, Yadong Ding, Siliang Tang, Fangli Xu, Chang Zong, Chilie Tan, Yueting Zhuang
To this end, we learn a differentiable graph neural network as a surrogate model to rank candidate architectures, which enables us to obtain gradients w.r.t. the input architectures.
no code implementations • 1 Jan 2021 • Haizhou Shi, Dongliang Luo, Siliang Tang, Jian Wang, Yueting Zhuang
Recently, a newly proposed self-supervised framework Bootstrap Your Own Latent (BYOL) seriously challenges the necessity of negative samples in contrastive-based learning frameworks.
no code implementations • 1 Jan 2021 • Yadong Ding, Yu Wu, Chengyue Huang, Siliang Tang, Yi Yang, Yueting Zhuang
In this paper, we aim to obtain better meta-learners by co-optimizing the architecture and meta-weights simultaneously.
no code implementations • 1 Jan 2021 • Dong Chen, Lingfei Wu, Siliang Tang, Fangli Xu, Juncheng Li, Chang Zong, Chilie Tan, Yueting Zhuang
In particular, we first cast the meta-overfitting problem (overfitting on sampling and label noise) as a gradient noise problem, since having few available samples causes the meta-learner to overfit on existing examples (clean or corrupted) of an individual task at every gradient step.
no code implementations • ICCV 2021 • Jiannan Guo, Haochen Shi, Yangyang Kang, Kun Kuang, Siliang Tang, Zhuoren Jiang, Changlong Sun, Fei Wu, Yueting Zhuang
Although current mainstream methods begin to combine SSL and AL (SSL-AL) to excavate the diverse expressions of unlabeled samples, these methods' fully supervised task models are still trained only with labeled data.
no code implementations • 1 Jan 2021 • Shen Kai, Lingfei Wu, Siliang Tang, Fangli Xu, Zhu Zhang, Yu Qiang, Yueting Zhuang
The task of visual question generation (VQG) aims to generate human-like questions from an image and potentially other side information (e.g., answer type or the answer itself).
no code implementations • 10 Dec 2020 • Xianfeng Li, WeiJie Chen, Di Xie, Shicai Yang, Peng Yuan, ShiLiang Pu, Yueting Zhuang
However, it is difficult to evaluate the quality of pseudo labels since no labels are available in target domain.
no code implementations • 22 Nov 2020 • Haizhou Shi, Dongliang Luo, Siliang Tang, Jian Wang, Yueting Zhuang
Recently, a newly proposed self-supervised framework Bootstrap Your Own Latent (BYOL) seriously challenges the necessity of negative samples in contrastive learning frameworks.
1 code implementation • NeurIPS 2020 • Guoliang Kang, Yunchao Wei, Yi Yang, Yueting Zhuang, Alexander G. Hauptmann
The conventional solution to this task is to minimize the discrepancy between source and target to enable effective knowledge transfer.
Ranked #27 on Synthetic-to-Real Translation on SYNTHIA-to-Cityscapes
no code implementations • 18 Oct 2020 • Fengda Zhang, Kun Kuang, Zhaoyang You, Tao Shen, Jun Xiao, Yin Zhang, Chao Wu, Yueting Zhuang, Xiaolin Li
FURL poses two new challenges: (1) data distribution shift (Non-IID distribution) among clients would make local models focus on different categories, leading to the inconsistency of representation spaces.
no code implementations • 28 Aug 2020 • Siliang Tang, Qi Zhang, Tianpeng Zheng, Mengdi Zhou, Zhan Chen, Lixing Shen, Xiang Ren, Yueting Zhuang, ShiLiang Pu, Fei Wu
When patients need to take medicine, particularly more than one kind of drug simultaneously, they should be alerted to possible drug-drug interactions.
no code implementations • 11 Aug 2020 • Jiacheng Li, Siliang Tang, Juncheng Li, Jun Xiao, Fei Wu, ShiLiang Pu, Yueting Zhuang
In this paper, we focus on enhancing the generalization ability of the VIST model by considering the few-shot setting.
no code implementations • ACL 2020 • Jie Tan, Changlin Yang, Ying Li, Siliang Tang, Chen Huang, Yueting Zhuang
Measuring the scholarly impact of a document without citations is an important and challenging problem.
1 code implementation • 12 Jun 2020 • Anpeng Wu, Kun Kuang, Junkun Yuan, Bo Li, Runze Wu, Qiang Zhu, Yueting Zhuang, Fei Wu
The fundamental problem in treatment effect estimation from observational data is confounder identification and balancing.
no code implementations • 9 Jun 2020 • Kun Kuang, Bo Li, Peng Cui, Yue Liu, Jianrong Tao, Yueting Zhuang, Fei Wu
To address this problem, we assume that the relationships between the causal variables and the response variable are invariant across data, and propose a conditional independence test based algorithm that separates those causal variables using a seed variable as a prior and adopts them for stable prediction.
no code implementations • 8 Jun 2020 • Kun Kuang, Hengtao Zhang, Fei Wu, Yueting Zhuang, Aijun Zhang
However, this assumption is often violated in practice because the sample selection bias may induce the distribution shift from training data to test data.
2 code implementations • CVPR 2020 • Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, ShiLiang Pu, Yueting Zhuang
To reduce language biases, several recent works introduce an auxiliary question-only model to regularize the training of the targeted VQA model, and achieve dominant performance on VQA-CP.
Ranked #1 on Visual Question Answering (VQA) on VQA-CP (using extra training data)
no code implementations • 14 Jan 2020 • Boyuan Pan, Yazheng Yang, Zhou Zhao, Yueting Zhuang, Deng Cai
Neural Machine Translation (NMT) has become a popular technology in recent years, and the encoder-decoder framework is the mainstream among all the methods.
no code implementations • 9 Dec 2019 • Du Chen, Zewei He, Yanpeng Cao, Jiangxin Yang, Yanlong Cao, Michael Ying Yang, Siliang Tang, Yueting Zhuang
Firstly, we propose a novel Orientation-Aware feature extraction and fusion Module (OAM), which contains a mixture of 1D and 2D convolutional kernels (i.e., 5 × 1, 1 × 5, and 3 × 3) for extracting orientation-aware features.
no code implementations • CVPR 2020 • Juncheng Li, Xin Wang, Siliang Tang, Haizhou Shi, Fei Wu, Yueting Zhuang, William Yang Wang
Visual navigation is the task of training an embodied agent to intelligently navigate to a target object (e.g., a television) using only visual observations.
1 code implementation • 11 Nov 2019 • Ziqiang Cheng, Yang Yang, Wei Wang, Wenjie Hu, Yueting Zhuang, Guojie Song
Time series modeling has attracted extensive research efforts; however, achieving both reliable efficiency and interpretability from a unified model still remains a challenging problem.
no code implementations • IJCNLP 2019 • Weike Jin, Zhou Zhao, Mao Gu, Jun Xiao, Furu Wei, Yueting Zhuang
Video dialog is a new and challenging task, which requires the agent to answer questions combining video information with dialog history.
2 code implementations • IJCNLP 2019 • Xiyuan Yang, Xiaotao Gu, Sheng Lin, Siliang Tang, Yueting Zhuang, Fei Wu, Zhigang Chen, Guoping Hu, Xiang Ren
Despite the recent success of collective entity linking (EL) methods, these "global" inference methods may yield sub-optimal results when the "all-mention coherence" assumption breaks, and often suffer from high computational cost at the inference stage due to the complex search space.
Ranked #5 on Entity Disambiguation on AIDA-CoNLL
no code implementations • 5 Aug 2019 • Juncheng Li, Siliang Tang, Fei Wu, Yueting Zhuang
The experimental results and further analysis prove that the agent with the MIND module is superior to its counterparts not only in EQA performance but in many other aspects such as route planning, behavioral interpretation, and the ability to generalize from a few examples.
1 code implementation • ACL 2018 • Boyuan Pan, Yazheng Yang, Zhou Zhao, Yueting Zhuang, Deng Cai, Xiaofei He
We observe that people usually use some discourse markers such as "so" or "but" to represent the logical relationship between two sentences.
Ranked #16 on Natural Language Inference on SNLI
no code implementations • NeurIPS 2018 • Boyuan Pan, Yazheng Yang, Hao Li, Zhou Zhao, Yueting Zhuang, Deng Cai, Xiaofei He
In this paper, we transfer knowledge learned from machine comprehension to the sequence-to-sequence tasks to deepen the understanding of the text.
1 code implementation • 7 Jul 2019 • Jiacheng Li, Haizhou Shi, Siliang Tang, Fei Wu, Yueting Zhuang
To solve this problem, we propose a method to mine the cross-modal rules to help the model infer these informative concepts given certain visual input.
Ranked #11 on Visual Storytelling on VIST
no code implementations • 1 Jul 2019 • Yutong Wang, Jiyuan Zheng, Qijiong Liu, Zhou Zhao, Jun Xiao, Yueting Zhuang
More specifically, we devise a discriminator, Relation Guider, to capture the relations between the whole passage and the associated answer; the Multi-Interaction mechanism is then deployed to transfer the knowledge dynamically for our question generation system.
1 code implementation • ACL 2019 • Sheng Lin, Luye Zheng, Bo Chen, Siliang Tang, Yueting Zhuang, Fei Wu, Zhigang Chen, Guoping Hu, Xiang Ren
Fine-grained Entity Typing is a tough task that suffers from noisy samples extracted from distant supervision.
1 code implementation • 6 Jun 2019 • Zhou Yu, Dejing Xu, Jun Yu, Ting Yu, Zhou Zhao, Yueting Zhuang, DaCheng Tao
It is both crucial and natural to extend this research direction to the video domain for video question answering (VideoQA).
Ranked #31 on Video Question Answering on ActivityNet-QA
Visual Question Answering (VQA)
Zero-Shot Video Question Answer
1 code implementation • NAACL 2019 • Qi Zhang, Siliang Tang, Xiang Ren, Fei Wu, ShiLiang Pu, Yueting Zhuang
This paper provides a new way to improve the efficiency of the REINFORCE training process.
no code implementations • NAACL 2019 • Bo Chen, Xiaotao Gu, Yu-Feng Hu, Siliang Tang, Guoping Hu, Yueting Zhuang, Xiang Ren
Recently, distant supervision has gained great success on Fine-grained Entity Typing (FET).
1 code implementation • 27 Dec 2018 • Yujin Yuan, Liyuan Liu, Siliang Tang, Zhongfei Zhang, Yueting Zhuang, ShiLiang Pu, Fei Wu, Xiang Ren
Distant supervision leverages knowledge bases to automatically label instances, thus allowing us to train relation extractors without human annotations.
no code implementations • 29 Nov 2017 • Rui Feng, Yang Yang, Wenjie Hu, Fei Wu, Yueting Zhuang
Existing network embedding works primarily focus on preserving the microscopic structure, such as the first- and second-order proximity of vertexes, while the macroscopic scale-free property is largely ignored.
no code implementations • EMNLP 2017 • Siliang Tang, Ning Zhang, Jinjiang Zhang, Fei Wu, Yueting Zhuang
In domain-specific NER, due to insufficient labeled training data, deep models usually fail to behave normally.
1 code implementation • ICCV 2017 • Liming Zhao, Xi Li, Jingdong Wang, Yueting Zhuang
In this paper, we address the problem of person re-identification, which refers to associating the persons captured from different cameras.
Ranked #112 on Person Re-Identification on Market-1501
no code implementations • 20 Jul 2017 • Yunan Ye, Zhou Zhao, Yimeng Li, Long Chen, Jun Xiao, Yueting Zhuang
Video Question Answering is a challenging problem in visual information retrieval, which provides the answer to the referenced video content according to the question.
no code implementations • CVPR 2017 • Yanan Li, Donghui Wang, Huanhang Hu, Yuetan Lin, Yueting Zhuang
This mapping is learned on training data of seen classes and is expected to have transfer ability to unseen classes.
no code implementations • 22 Feb 2017 • Yuetan Lin, Zhangyang Pang, Donghui Wang, Yueting Zhuang
Visual question answering (VQA) has witnessed great progress since May 2015 as a classic problem unifying visual and textual data into a system.
no code implementations • 27 Jan 2016 • Siyu Huang, Xi Li, Zhongfei Zhang, Zhouzhou He, Fei Wu, Wei Liu, Jinhui Tang, Yueting Zhuang
The highly effective visual representation and deep context models ensure that our framework achieves a deep semantic understanding of the scene and motion pattern, consequently improving the performance of the visual path prediction task.
no code implementations • CVPR 2016 • Pingbo Pan, Zhongwen Xu, Yi Yang, Fei Wu, Yueting Zhuang
In this paper, we propose a new approach, namely Hierarchical Recurrent Neural Encoder (HRNE), to exploit temporal information of videos.
no code implementations • 19 Oct 2015 • Xi Li, Liming Zhao, Lina Wei, Ming-Hsuan Yang, Fei Wu, Yueting Zhuang, Haibin Ling, Jingdong Wang
A key problem in salient object detection is how to effectively model the semantic properties of salient objects in a data-driven manner.
no code implementations • 21 Jul 2015 • Xi Li, Chunhua Shen, Anthony Dick, Zhongfei Zhang, Yueting Zhuang
Object identification results for an entire video sequence are achieved by systematically combining the tracking information and visual recognition at each frame.
no code implementations • 4 Dec 2014 • Liming Zhao, Xi Li, Jun Xiao, Fei Wu, Yueting Zhuang
As an important and challenging problem in computer vision and graphics, keypoint-based object tracking is typically formulated in a spatio-temporal statistical learning framework.