no code implementations • 12 Feb 2025 • Xingtong Yu, Chang Zhou, Zhongwei Kuai, Xinming Zhang, Yuan Fang
Specifically, we decompose the adaptation process for each downstream task into a series of inference steps, with each step consisting of prompt-based inference, "thought" generation, and thought-conditioned prompt learning.
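A minimal sketch of this step loop, assuming frozen node embeddings from a pre-trained graph encoder; the feature-wise prompting and the thought-update rule are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def adapt(node_emb, labels, num_classes, num_steps=3, lr=1e-2):
    d = node_emb.size(1)
    prompt = torch.zeros(d, requires_grad=True)   # learnable prompt vector
    thought = torch.zeros(d)                      # running "thought" summary
    centroids = torch.randn(num_classes, d)       # stand-in class prototypes
    opt = torch.optim.Adam([prompt], lr=lr)
    for _ in range(num_steps):
        # 1) prompt-based inference, conditioned on the current thought
        h = node_emb * (1 + prompt + thought)     # feature-wise prompting
        logits = h @ centroids.T
        # 2) "thought" generation: summarize this step's inference
        new_thought = h.mean(dim=0).detach()
        # 3) thought-conditioned prompt learning on the downstream labels
        loss = F.cross_entropy(logits, labels)
        opt.zero_grad(); loss.backward(); opt.step()
        thought = 0.5 * thought + 0.5 * new_thought
    return prompt

# toy usage: 8 nodes, 16-dim embeddings, 3 classes
adapt(torch.randn(8, 16), torch.randint(0, 3, (8,)), num_classes=3)
```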
1 code implementation • 8 Feb 2025 • Xingtong Yu, Zechuan Gong, Chang Zhou, Yuan Fang, Hui Zhang
This raises an important question: How can we train a graph foundational model on multiple source domains and adapt to an unseen target domain?
no code implementations • 29 Dec 2024 • Chiyu Cheng, Chang Zhou, Yang Zhao, Jin Cao
The exponential growth of data storage demands has necessitated the evolution of hierarchical storage management strategies [1].
no code implementations • 29 Dec 2024 • Chiyu Cheng, Chang Zhou, Yang Zhao, Jin Cao
Traditional heuristics employed for storage performance optimization often fail to adapt to the variability and complexity of contemporary workloads, leading to significant performance bottlenecks and resource inefficiencies.
no code implementations • 29 Dec 2024 • Chiyu Cheng, Chang Zhou, Yang Zhao, Jin Cao
The management of data writes to SSD caches plays a crucial role in improving overall system performance, reducing latency, and extending the lifespan of storage devices.
7 code implementations • 18 Sep 2024 • Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, Junyang Lin
We present the Qwen2-VL Series, an advanced upgrade of the previous Qwen-VL models that redefines the conventional predetermined-resolution approach in visual processing.
Ranked #3 on Video Question Answering on TVBench
Tasks: Natural Language Visual Grounding, Temporal Relation Extraction (+2 more)
no code implementations • 14 Sep 2024 • Hongcheng Guo, Wei Zhang, JunHao Chen, Yaonan Gu, Jian Yang, Junjia Du, Binyuan Hui, Tianyu Liu, Jianxin Ma, Chang Zhou, Zhoujun Li
We have conducted extensive experiments on existing large multimodal models, offering insights into their performance and areas for improvement in the image-to-web domain.
no code implementations • 20 Aug 2024 • Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou
Transformer-based large language models (LLMs) exhibit limitations such as generating unsafe responses, unreliable reasoning, etc.
no code implementations • 6 Aug 2024 • Jiaxi Yang, Binyuan Hui, Min Yang, Jian Yang, Junyang Lin, Chang Zhou
The capability gap between open-source and closed-source large language models (LLMs) remains a challenge in text-to-SQL tasks.
2 code implementations • 15 Jul 2024 • Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, YuanJun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou
We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions.
6 code implementations • 15 Jul 2024 • An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, TianHao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, Zhihao Fan
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models.
Ranked #3 on Arithmetic Reasoning on GSM8K (using extra training data)
no code implementations • 3 Jul 2024 • Yang Zhao, Chang Zhou, Jin Cao, Yi Zhao, Shaobo Liu, Chiyu Cheng, Xingchen Li
This paper explores multi-scenario optimization on large platforms using multi-agent reinforcement learning (MARL).
1 code implementation • 20 Jun 2024 • Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Junjie Hu, Tianyu Liu, Baobao Chang
Recently, mathematical verifiers have achieved success in mathematical reasoning tasks by validating the correctness of solutions generated by policy models.
1 code implementation • 19 Jun 2024 • Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, Jingren Zhou
AutoIF transforms the validation of instruction-following data quality into code verification, requiring LLMs to generate instructions, the corresponding code to check the correctness of the instruction responses, and unit test samples to verify the code's correctness (a sketch of this loop follows the ranking below).
Ranked #1 on Instruction Following on IFEval
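A hedged sketch of the AutoIF-style loop described above; `llm(prompt)` is a hypothetical text-completion callable, not a real API, and in practice the generated code must run in a sandbox:

```python
def autoif_filter(llm, seed_instruction):
    instruction = llm(f"Rewrite as a verifiable instruction: {seed_instruction}")
    checker_src = llm(f"Write a Python function check(response) -> bool for: {instruction}")
    tests_src = llm(f"Write assert-based unit tests for check() given: {instruction}")
    scope = {}
    try:
        exec(checker_src, scope)          # define check()
        exec(tests_src, scope)            # cross-validate the checker itself
    except Exception:
        return None                       # discard samples whose code or tests fail
    response = llm(instruction)
    if scope["check"](response):          # keep only verified instruction data
        return {"instruction": instruction, "response": response}
    return None
```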
1 code implementation • 18 Jun 2024 • Zhe Yang, Yichang Zhang, Tianyu Liu, Jian Yang, Junyang Lin, Chang Zhou, Zhifang Sui
Furthermore, we introduce a consistency score to quantitatively measure this inconsistency, and analyze the potential for improving consistency via a relative consistency score.
no code implementations • 4 Jun 2024 • Chang Zhou, Yang Zhao, Yuelin Zou, Jin Cao, Wenhan Fan, Yi Zhao, Chiyu Cheng
This paper proposes new methods to enhance click-through rate (CTR) prediction models using the Deep Interest Network (DIN) model, specifically applied to the advertising system of Alibaba's Taobao platform.
no code implementations • 4 Jun 2024 • Chang Zhou, Yang Zhao, Shaobo Liu, Yi Zhao, Xingchen Li, Chiyu Cheng
In a society where traffic accidents frequently occur, fatigue driving has emerged as a grave issue.
1 code implementation • 28 May 2024 • Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, Chang Zhou
Effectively aligning Large Language Models (LLMs) with human-centric values while preventing the degradation of abilities acquired through Pre-training and Supervised Fine-tuning (SFT) poses a central challenge in Reinforcement Learning from Human Feedback (RLHF).
no code implementations • 22 May 2024 • Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhang
To address these issues, we propose MDGPT, a text free Multi-Domain Graph Pre-Training and adaptation framework designed to exploit multi-domain knowledge for graph learning.
no code implementations • 22 May 2024 • Chang Zhou, Yang Zhao, Jin Cao, Yi Shen, Xiaoling Cui, Chiyu Cheng
This paper explores the integration of strategic optimization methods in search advertising, focusing on ad ranking and bidding mechanisms within E-commerce platforms.
1 code implementation • 17 May 2024 • Tingyu Xia, Bowen Yu, Yuan Wu, Yi Chang, Chang Zhou
In this paper, we initiate our discussion by demonstrating how Large Language Models (LLMs), when tasked with responding to queries, display a more even probability distribution in their answers if they are more adept, as opposed to their less skilled counterparts.
no code implementations • 17 May 2024 • Wenhan Fan, Zhicheng Ding, Ruixin Huang, Chang Zhou, Xuyang Zhang
The confusion matrix for the training set shows a total of 177 correct predictions and 52 incorrect predictions, with an accuracy of 77%, a precision of 88%, a recall of 77%, and an F1 score of 82%.
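A quick consistency check of the reported numbers (the per-class counts of the confusion matrix are not given in the entry):

```python
correct, incorrect = 177, 52
accuracy = correct / (correct + incorrect)           # 177/229 ≈ 0.773 -> 77%
precision, recall = 0.88, 0.77
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.821 -> 82%
print(f"accuracy={accuracy:.1%}, f1={f1:.1%}")
```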
no code implementations • 6 May 2024 • Jinyin Wang, Xingchen Li, Yixuan Jin, Yihao Zhong, Keke Zhang, Chang Zhou
This project investigates the human multi-modal behavior identification algorithm utilizing deep neural networks.
1 code implementation • 11 Mar 2024 • Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang
To this end, we introduce FastV, a versatile plug-and-play method designed to optimize computational efficiency by learning adaptive attention patterns in early layers and pruning visual tokens in subsequent ones.
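A rough sketch of attention-guided visual-token pruning in this spirit; the batch-of-one shapes, the 50% keep ratio, and the scoring rule are illustrative assumptions, not FastV's exact procedure:

```python
import torch

def prune_visual_tokens(hidden, attn, vis_start, vis_end, keep_ratio=0.5):
    """hidden: (T, d) hidden states; attn: (heads, T, T) from an early layer."""
    recv = attn.mean(0).sum(0)                   # attention each token receives
    vis = torch.arange(vis_start, vis_end)       # positions of visual tokens
    k = max(1, int(len(vis) * keep_ratio))
    top = vis[recv[vis].topk(k).indices]         # best-attended visual tokens
    keep = torch.cat([torch.arange(0, vis_start),
                      top.sort().values,
                      torch.arange(vis_end, hidden.size(0))])
    return hidden[keep], keep                    # pruned sequence for later layers

# toy usage: 10 tokens, positions 2..7 are visual
pruned, kept = prune_visual_tokens(torch.randn(10, 16), torch.rand(4, 10, 10), 2, 8)
```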
1 code implementation • 27 Feb 2024 • Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong
The ability of Large Language Models (LLMs) to process and generate coherent text is markedly weakened when the number of input tokens exceeds their pretraining length.
no code implementations • 26 Feb 2024 • Jonathan W. Lee, Han Wang, Kathy Jang, Amaury Hayat, Matthew Bunting, Arwa Alanqary, William Barbour, Zhe Fu, Xiaoqian Gong, George Gunter, Sharon Hornstein, Abdul Rahman Kreidieh, Nathan Lichtlé, Matthew W. Nice, William A. Richardson, Adit Shah, Eugene Vinitsky, Fangyu Wu, Shengquan Xiang, Sulaiman Almatrudi, Fahd Althukair, Rahul Bhadani, Joy Carpio, Raphael Chekroun, Eric Cheng, Maria Teresa Chiri, Fang-Chieh Chou, Ryan Delorenzo, Marsalis Gibson, Derek Gloudemans, Anish Gollakota, Junyi Ji, Alexander Keimer, Nour Khoudari, Malaika Mahmood, Mikail Mahmood, Hossein Nick Zinat Matin, Sean McQuade, Rabie Ramadan, Daniel Urieli, Xia Wang, Yanbing Wang, Rita Xu, Mengsha Yao, Yiling You, Gergely Zachár, Yibo Zhao, Mostafa Ameli, Mirza Najamuddin Baig, Sarah Bhaskaran, Kenneth Butts, Manasi Gowda, Caroline Janssen, John Lee, Liam Pedersen, Riley Wagner, Zimo Zhang, Chang Zhou, Daniel B. Work, Benjamin Seibold, Jonathan Sprinkle, Benedetto Piccoli, Maria Laura Delle Monache, Alexandre M. Bayen
The upper layer, called the Speed Planner, is a centralized optimal control algorithm.
1 code implementation • 12 Feb 2024 • Qian Yang, Jin Xu, Wenrui Liu, Yunfei Chu, Ziyue Jiang, Xiaohuan Zhou, Yichong Leng, YuanJun Lv, Zhou Zhao, Chang Zhou, Jingren Zhou
By revealing the limitations of existing LALMs through evaluation results, AIR-Bench can provide insights into the direction of future research.
1 code implementation • 26 Jan 2024 • Chao Chen, Jie Liu, Chang Zhou, Jie Tang, Gangshan Wu
At the "Sketch" stage, local directions of keypoints can be easily estimated by fast convolutional layers.
1 code implementation • 23 Jan 2024 • Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou
Nevertheless, we posit that LLMs inherently harbor role-play capabilities, owing to the extensive knowledge of characters and potential dialogues ingrained in their vast training corpora.
1 code implementation • 21 Dec 2023 • Xiaolong Shen, Jianxin Ma, Chang Zhou, Zongxin Yang
For 3D GAN inversion, we introduce two methods which aim to enhance the representation of style codes and alleviate 3D inconsistencies.
1 code implementation • 28 Nov 2023 • Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhang
Hence, in this paper, we propose MultiGPrompt, a novel multi-task pre-training and prompting framework to exploit multiple pretext tasks for more comprehensive pre-trained knowledge.
no code implementations • 15 Nov 2023 • Keming Lu, Hongyi Yuan, Runji Lin, Junyang Lin, Zheng Yuan, Chang Zhou, Jingren Zhou
Zooter shows computation efficiency in inference as it introduces only a minor computation overhead of a routing function compared with reward model ranking methods.
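A minimal sketch of this kind of routing: a small router scores the query per candidate LLM and only the argmax expert is invoked, so inference adds one tiny forward pass instead of running every model and reranking with a reward model. The router design and its training signal are assumptions:

```python
import torch
import torch.nn as nn

class Router(nn.Module):
    def __init__(self, emb_dim, num_experts):
        super().__init__()
        self.score = nn.Linear(emb_dim, num_experts)

    def forward(self, query_emb):                # (emb_dim,) -> (num_experts,)
        return self.score(query_emb).softmax(-1)

def route(router, query_emb, experts):
    idx = int(router(query_emb).argmax())        # the minor routing overhead
    return experts[idx](query_emb)               # run only the chosen expert

# toy usage: three stand-in "experts"
experts = [lambda q, i=i: f"answer from expert {i}" for i in range(3)]
print(route(Router(16, 3), torch.randn(16), experts))
```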
no code implementations • 15 Nov 2023 • Hongyi Yuan, Keming Lu, Fei Huang, Zheng Yuan, Chang Zhou
Large language models~(LLMs) exhibit exceptional performance in language tasks, yet their auto-regressive inference is limited due to high computational requirements and is sub-optimal due to the exposure bias.
2 code implementations • 14 Nov 2023 • Yunfei Chu, Jin Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, Jingren Zhou
Recently, instruction-following audio-language models have received broad attention for audio interaction with humans.
Ranked #1 on Acoustic Scene Classification on TUT Acoustic Scenes 2017 (using extra training data)
1 code implementation • 14 Nov 2023 • Shengguang Wu, Keming Lu, Benfeng Xu, Junyang Lin, Qi Su, Chang Zhou
The key to our data sampling technique lies in the enhancement of diversity in the chosen subsets, as the model selects new data points most distinct from any existing ones according to its current embedding space.
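This selection rule reads like farthest-point sampling in the model's embedding space; a sketch under that assumption:

```python
import numpy as np

def diverse_subset(emb, k):
    """emb: (N, d) embeddings; returns indices of k mutually distant points."""
    chosen = [0]                                    # seed with an arbitrary point
    dist = np.linalg.norm(emb - emb[0], axis=1)     # distance to nearest chosen
    for _ in range(k - 1):
        nxt = int(dist.argmax())                    # most distinct from chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(emb - emb[nxt], axis=1))
    return chosen

print(diverse_subset(np.random.rand(100, 8), k=5))
```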
no code implementations • 5 Nov 2023 • Huan Cui, Jie Cao, Qun Hao, Haoyu Zhang, Chang Zhou
Experimentally, at a sampling ratio of 0.0084 relative to HR FSI with 1024×768 pixels, UFFSI with 255×341 cells achieves an 89% reduction in data redundancy while the ROI retains significantly better imaging quality, meeting practical imaging needs.
1 code implementation • 25 Oct 2023 • Mingfeng Xue, Dayiheng Liu, Kexin Yang, Guanting Dong, Wenqiang Lei, Zheng Yuan, Chang Zhou, Jingren Zhou
Furthermore, we assemble three test sets for comprehensive evaluation, an occu-test set covering 25 occupational categories, an estate set focusing on real estate, and an occu-quora set containing real-world questions from Quora.
2 code implementations • 9 Oct 2023 • Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, Jingren Zhou
We propose four intriguing research questions to explore the association between model performance and various factors including data amount, composition ratio, model size and SFT strategies.
1 code implementation • 9 Oct 2023 • Chengpeng Li, Zheng Yuan, Hongyi Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, Chang Zhou
In this paper, we conduct an investigation for such data augmentation in math reasoning and are intended to answer: (1) What strategies of data augmentation are more effective; (2) What is the scaling relationship between the amount of augmented data and model performance; and (3) Can data augmentation incentivize generalization to out-of-domain mathematical reasoning tasks?
Ranked #60 on Arithmetic Reasoning on GSM8K (using extra training data)
2 code implementations • 7 Oct 2023 • Zhihao Du, JiaMing Wang, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang
Previous mainstream audio-and-text LLMs use discrete audio tokens to represent both input and output audio; however, they suffer from performance degradation on tasks such as automatic speech recognition, speech-to-text translation, and speech enhancement over models using continuous speech features.
2 code implementations • 28 Sep 2023 • Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, Tianhang Zhu
Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans.
Ranked #3 on Multi-Label Text Classification on CC3M-TagMask
1 code implementation • 31 Aug 2023 • Shuai Bai, Shusheng Yang, Jinze Bai, Peng Wang, Xingxuan Zhang, Junyang Lin, Xinggang Wang, Chang Zhou, Jingren Zhou
Large vision-language models (LVLMs) have recently witnessed rapid advancements, exhibiting a remarkable capacity for perceiving, understanding, and processing visual information by connecting a visual receptor with large language models (LLMs).
2 code implementations • 24 Aug 2023 • Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images.
Ranked #2 on Spatial Reasoning on EmbSpatial-Bench
1 code implementation • 14 Aug 2023 • Keming Lu, Hongyi Yuan, Zheng Yuan, Runji Lin, Junyang Lin, Chuanqi Tan, Chang Zhou, Jingren Zhou
Based on this observation, we propose a data selector based on InsTag to select 6K diverse and complex samples from open-source datasets and fine-tune models on InsTag-selected data.
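A hedged sketch of tag-based selection in this spirit, treating tag count as a complexity proxy and new-tag coverage as the diversity criterion; the paper's exact procedure may differ:

```python
def select_by_tags(samples, budget=6000):
    # samples: list of dicts like {"text": ..., "tags": {"math", "multi-step"}}
    ranked = sorted(samples, key=lambda s: len(s["tags"]), reverse=True)
    seen, picked = set(), []
    for s in ranked:
        if s["tags"] - seen:                 # contributes at least one new tag
            picked.append(s)
            seen |= s["tags"]
        if len(picked) == budget:
            break
    return picked
```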
1 code implementation • 3 Aug 2023 • Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Keming Lu, Chuanqi Tan, Chang Zhou, Jingren Zhou
We find that, with augmented samples containing more distinct reasoning paths, RFT improves mathematical reasoning performance more for LLMs (a sketch of this data construction follows the ranking below).
Ranked #111 on Arithmetic Reasoning on GSM8K (using extra training data)
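A sketch of the RFT data construction: sample k solutions per problem, reject incorrect ones, and keep only distinct reasoning paths. `llm.sample` is a hypothetical generation call, and the two helpers below are likewise stand-ins:

```python
def extract_answer(solution):            # hypothetical: last line is the answer
    return solution.strip().splitlines()[-1]

def normalize_path(solution):            # hypothetical: the reasoning sans answer
    return "\n".join(solution.strip().splitlines()[:-1])

def build_rft_data(llm, problems, k=8):
    data = []
    for prob in problems:
        seen_paths = set()
        for _ in range(k):
            sol = llm.sample(prob["question"], temperature=0.8)
            if extract_answer(sol) != prob["answer"]:
                continue                 # reject incorrect solutions
            path = normalize_path(sol)
            if path not in seen_paths:   # keep only distinct reasoning paths
                seen_paths.add(path)
                data.append({"question": prob["question"], "solution": sol})
    return data                          # fine-tune the model on this set
```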
1 code implementation • ICCV 2023 • Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang
Our method includes an encoder-decoder transformer architecture to fuse 2D and 3D representations for achieving 2D&3D aligned results in a coarse-to-fine manner, and a novel 3D joint contrastive learning approach for adding explicit global supervision to the 3D feature space.
no code implementations • ICCV 2023 • Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang
However, such SPC-based representation i) optimizes under the volatile observation space, which leads to pose misalignment between the training and inference stages, and ii) lacks the global relationships among human parts that are critical for handling the incomplete painted SMPL.
2 code implementations • 24 May 2023 • Benfeng Xu, An Yang, Junyang Lin, Quan Wang, Chang Zhou, Yongdong Zhang, Zhendong Mao
The answering quality of an aligned large language model (LLM) can be drastically improved if treated with proper crafting of prompts.
2 code implementations • 18 May 2023 • Peng Wang, Shijie Wang, Junyang Lin, Shuai Bai, Xiaohuan Zhou, Jingren Zhou, Xinggang Wang, Chang Zhou
In this work, we explore a scalable way for building a general representation model toward unlimited modalities.
Ranked #1 on Semantic Segmentation on ADE20K (using extra training data)
1 code implementation • 26 Apr 2023 • Chang Zhou, Jie Liu, Jie Tang, Gangshan Wu
To better model correlations and to produce more accurate motion fields, we propose the Densely Queried Bilateral Correlation (DQBC) that gets rid of the receptive field dependency problem and thus is more friendly to small and fast-moving objects.
Ranked #1 on Video Frame Interpolation on MSU Video Frame Interpolation (VMAF metric)
1 code implementation • CVPR 2023 • Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang
However, a single kind of modeling structure struggles to balance the learning of short-term and long-term temporal correlations, and may bias the network toward one of them, leading to undesirable predictions such as global location shift, temporal inconsistency, and insufficient local details.
Ranked #56 on 3D Human Pose Estimation on 3DPW
1 code implementation • 17 Feb 2023 • Yukang Gan, Yixiao Ge, Chang Zhou, Shupeng Su, Zhouchuan Xu, Xuyuan Xu, Quanchao Hui, Xiang Chen, Yexin Wang, Ying Shan
To tackle the challenge, we propose a binary embedding-based retrieval (BEBR) engine equipped with a recurrent binarization algorithm that enables customized bits per dimension.
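One plausible reading of "recurrent binarization" is residual binary quantization, where each pass binarizes what the previous passes left over, letting callers trade bits per dimension for accuracy; this is an interpretation of the entry, not the published algorithm:

```python
import numpy as np

def recurrent_binarize(x, num_bits=2):
    """x: (N, d) float embeddings -> (N, num_bits, d) signs plus per-pass scales."""
    codes, scales, residual = [], [], x.copy()
    for _ in range(num_bits):
        sign = np.sign(residual); sign[sign == 0] = 1
        scale = np.abs(residual).mean()          # one scalar per pass (simplified)
        codes.append(sign); scales.append(scale)
        residual = residual - scale * sign       # binarize what is left over
    return np.stack(codes, 1), np.array(scales)

def reconstruct(codes, scales):
    return (codes * scales[None, :, None]).sum(1)

x = np.random.randn(4, 8)
codes, scales = recurrent_binarize(x, num_bits=3)
print(np.abs(x - reconstruct(codes, scales)).mean())  # error shrinks with more bits
```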
1 code implementation • 19 Dec 2022 • Junyang Lin, Xuancheng Ren, Yichang Zhang, Gao Liu, Peng Wang, An Yang, Chang Zhou
This paper proposes a new method, OFA-OCR, to transfer multimodal pretrained models to text recognition.
1 code implementation • 8 Dec 2022 • Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang, Zeyu Cui, Yu Han, Shuai Bai, Wenbin Ge, Jianxin Ma, Junyang Lin, Jingren Zhou, Chang Zhou
As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data.
no code implementations • 6 Dec 2022 • Jianxin Ma, Shuai Bai, Chang Zhou
Generative modeling of human motion has broad applications in computer animation, virtual reality, and robotics.
1 code implementation • 29 Nov 2022 • Xiaohuan Zhou, JiaMing Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou
Therefore, we propose to introduce the phoneme modality into pre-training, which can help capture modality-invariant information between Mandarin speech and text.
Ranked #4 on Speech Recognition on AISHELL-1
Tasks: Automatic Speech Recognition (ASR) (+3 more)
no code implementations • 26 Nov 2022 • Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou
To achieve this task, we construct a synthetic dataset and develop an effective framework.
1 code implementation • 2 Nov 2022 • An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou
The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining.
Ranked #1 on Zero-shot Image Retrieval on MUGE Retrieval
no code implementations • 23 Oct 2022 • Yulei Niu, Long Chen, Chang Zhou, Hanwang Zhang
The network response serves as additional supervision to formulate the machine domain, which uses the data collected from the human domain as a transfer set.
1 code implementation • 4 Aug 2022 • Hao Yang, Junyang Lin, An Yang, Peng Wang, Chang Zhou, Hongxia Yang
Prompt tuning has become a new paradigm for model tuning and it has demonstrated success in natural language pretraining and even vision pretraining.
Ranked #2 on Visual Entailment on SNLI-VE test
1 code implementation • 19 Jul 2022 • Shuai Bai, Huiling Zhou, Zhikang Li, Chang Zhou, Hongxia Yang
Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image.
Ranked #3 on Virtual Try-on on VITON
no code implementations • 4 Jun 2022 • Yuezihan Jiang, Hao Yang, Junyang Lin, Hanyu Zhao, An Yang, Chang Zhou, Hongxia Yang, Zhi Yang, Bin Cui
Prompt Learning has recently gained great popularity in bridging the gap between pretraining tasks and various downstream tasks.
no code implementations • 24 May 2022 • Zhikang Li, Huiling Zhou, Shuai Bai, Peike Li, Chang Zhou, Hongxia Yang
The fashion industry has diverse applications in multi-modal image generation and editing.
no code implementations • 17 May 2022 • Zeyu Cui, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang
Industrial recommender systems have been growing increasingly complex, may involve diverse domains such as e-commerce products and user-generated contents, and can comprise a myriad of tasks such as retrieval, ranking, explanation generation, and even AI-assisted content production.
1 code implementation • 29 Mar 2022 • Xiao Pan, Peike Li, Zongxin Yang, Huiling Zhou, Chang Zhou, Hongxia Yang, Jingren Zhou, Yi Yang
By contrast, pixel-level optimization is more explicit, however, it is sensitive to the visual quality of training data and is not robust to object deformation.
no code implementations • 23 Mar 2022 • Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, Longbo Huang
Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally bring more information.
4 code implementations • 7 Feb 2022 • Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang
In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization.
Ranked #1 on Visual Question Answering on VQA v2 test-std (yes/no metric)
2 code implementations • 30 Dec 2021 • Qingsong Lv, Ming Ding, Qiang Liu, Yuxiang Chen, Wenzheng Feng, Siming He, Chang Zhou, Jianguo Jiang, Yuxiao Dong, Jie Tang
Heterogeneous graph neural networks (HGNNs) have been blossoming in recent years, but the unique data processing and evaluation setups used by each work obstruct a full understanding of their advancements.
no code implementations • 8 Oct 2021 • Junyang Lin, An Yang, Jinze Bai, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Yong Li, Wei Lin, Jingren Zhou, Hongxia Yang
Recent expeditious developments in deep learning algorithms, distributed training, and even hardware design for large models have enabled training extreme-scale models, say GPT-3 and Switch Transformer possessing hundreds of billions or even trillions of parameters.
1 code implementation • ICCV 2021 • Tan Wang, Chang Zhou, Qianru Sun, Hanwang Zhang
Attention modules do not always help deep models learn causal features that are robust in any confounding context, e.g., a foreground object feature is invariant to different backgrounds.
no code implementations • 8 Jun 2021 • Jingjing Xiong, Lai-Man Po, Wing-Yin Yu, Chang Zhou, Pengfei Xian, Weifeng Ou
Real-time semantic segmentation has received considerable attention due to growing demands in many practical applications, such as autonomous vehicles, robotics, etc.
no code implementations • 2 Jun 2021 • Zhu Zhang, Chang Zhou, Jianxin Ma, Zhijie Lin, Jingren Zhou, Hongxia Yang, Zhou Zhao
Further, we design a history sampler to select informative fragments for rehearsal training, making the memory focus on the crucial information.
1 code implementation • 31 May 2021 • Haonan Wang, Chang Zhou, Carl Yang, Hongxia Yang, Jingrui He
A better way is to present a sequence of products with increasingly floral attributes based on the white dress, and allow the customer to select the most satisfactory one from the sequence.
1 code implementation • 31 May 2021 • Shuai Bai, Zhedong Zheng, Xiaohan Wang, Junyang Lin, Zhu Zhang, Chang Zhou, Yi Yang, Hongxia Yang
In this paper, we apply one new modality, i.e., the language description, to search the vehicle of interest and explore the potential of this task in the real-world scenario.
no code implementations • 31 May 2021 • An Yang, Junyang Lin, Rui Men, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Jiamang Wang, Yong Li, Di Zhang, Wei Lin, Lin Qu, Jingren Zhou, Hongxia Yang
Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters at constant computation cost, and thus they have become a trend in model scaling.
no code implementations • Findings (ACL) 2021 • Peng Wang, Junyang Lin, An Yang, Chang Zhou, Yichang Zhang, Jingren Zhou, Hongxia Yang
Experimental results demonstrate that our method outperforms the previous state-of-the-art methods in both automatic and human evaluation, especially on coverage and faithfulness.
no code implementations • NeurIPS 2021 • Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang
Conditional image synthesis aims to create an image according to some multi-modal guidance in the forms of textual descriptions, reference images, and image blocks to preserve, as well as their combinations.
4 code implementations • NeurIPS 2021 • Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang
Text-to-Image generation in the general domain has long been an open problem, which requires both a powerful generative model and cross-modal understanding.
Ranked #53 on Text-to-Image Generation on MS COCO (using extra training data)
no code implementations • 1 Mar 2021 • Junyang Lin, Rui Men, An Yang, Chang Zhou, Ming Ding, Yichang Zhang, Peng Wang, Ang Wang, Le Jiang, Xianyan Jia, Jie Zhang, Jianwei Zhang, Xu Zou, Zhikang Li, Xiaodong Deng, Jie Liu, Jinbao Xue, Huiling Zhou, Jianxin Ma, Jin Yu, Yong Li, Wei Lin, Jingren Zhou, Jie Tang, Hongxia Yang
In this work, we construct the largest dataset for multimodal pretraining in Chinese, which consists of over 1.9TB of images and 292GB of texts covering a wide range of domains.
1 code implementation • 1 Mar 2021 • Yukuo Cen, Zhenyu Hou, Yan Wang, Qibin Chen, Yizhen Luo, Zhongming Yu, Hengrui Zhang, Xingcheng Yao, Aohan Zeng, Shiguang Guo, Yuxiao Dong, Yang Yang, Peng Zhang, Guohao Dai, Yu Wang, Chang Zhou, Hongxia Yang, Jie Tang
In CogDL, we propose a unified design for the training and evaluation of GNN models for various graph tasks, making it unique among existing graph learning libraries.
no code implementations • 1 Jan 2021 • Yue Wu, Jianqiang Huang, Jiangjie Zhen, Guokun Wang, Chen Shen, Chang Zhou, Xian-Sheng Hua
The past years have witnessed an explosion of deep learning frameworks like PyTorch and TensorFlow since the success of deep neural networks.
no code implementations • 1 Jan 2021 • Jiezhong Qiu, Yukuo Cen, Qibin Chen, Chang Zhou, Jingren Zhou, Hongxia Yang, Jie Tang
Based on the theoretical analysis, we propose Local Clustering Graph Neural Networks (LCGNN), a GNN learning paradigm that utilizes local clustering to efficiently search for small but compact subgraphs for GNN training and inference.
no code implementations • 1 Jan 2021 • Zhu Zhang, Chang Zhou, Zhou Zhao, Zhijie Lin, Jingren Zhou, Hongxia Yang
Existing reasoning tasks often follow the setting of "reasoning while experiencing", which has an important assumption that the raw contents can be always accessed while reasoning.
1 code implementation • NeurIPS 2020 • Ming Ding, Chang Zhou, Hongxia Yang, Jie Tang
BERTs are incapable of processing long texts due to their quadratically increasing memory and time consumption.
1 code implementation • 23 Aug 2020 • Jianxin Ma, Chang Zhou, Hongxia Yang, Peng Cui, Xin Wang, Wenwu Zhu
There exist two challenges: i) reconstructing a future sequence containing many behaviors is exponentially harder than reconstructing a single next behavior, which can lead to difficulty in convergence, and ii) the sequence of all future behaviors can involve many intentions, not all of which may be predictable from the sequence of earlier behaviors.
4 code implementations • 20 May 2020 • Zhen Yang, Ming Ding, Chang Zhou, Hongxia Yang, Jingren Zhou, Jie Tang
To the best of our knowledge, we are the first to derive the theory and quantify that the negative sampling distribution should be positively but sub-linearly correlated to their positive sampling distribution.
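The principle admits a compact sketch: draw negatives from p_neg(u) ∝ p_pos(u)^α with 0 < α < 1 (0.75 below, the classic word2vec choice; the paper derives its own exponent):

```python
import numpy as np

def negative_sampler(pos_counts, alpha=0.75):
    p = np.asarray(pos_counts, dtype=float) ** alpha   # sub-linear damping
    p /= p.sum()
    return lambda size: np.random.choice(len(p), size=size, p=p)

sample_negatives = negative_sampler([100, 10, 1])
print(sample_negatives(5))   # frequent items drawn more often, but damped
```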
no code implementations • 20 May 2020 • Chang Zhou, Jianxin Ma, Jianwei Zhang, Jingren Zhou, Hongxia Yang
Deep candidate generation (DCG) that narrows down the collection of relevant items from billions to hundreds via representation learning has become prevalent in industrial recommender systems.
2 code implementations • 19 May 2020 • Yukuo Cen, Jianwei Zhang, Xu Zou, Chang Zhou, Hongxia Yang, Jie Tang
Recent works usually give an overall embedding from a user's behavior sequence.
no code implementations • ICLR 2020 • Baichuan Yuan, Xiaowei Wang, Jianxin Ma, Chang Zhou, Andrea L. Bertozzi, Hongxia Yang
To bridge this gap, we introduce a declustering based hidden variable model that leads to an efficient inference procedure via a variational autoencoder (VAE).
no code implementations • 2 Dec 2019 • Chunnan Wang, Hongzhi Wang, Chang Zhou, Hanxiao Chen
Motivated by this, we propose ExperienceThinking algorithm to quickly find the best possible hyperparameter configuration of machine learning algorithms within a few configuration evaluations.
no code implementations • NeurIPS 2019 • Jianxin Ma, Chang Zhou, Peng Cui, Hongxia Yang, Wenwu Zhu
Our approach achieves macro disentanglement by inferring the high-level concepts associated with user intentions (e.g., to buy a shirt or a cellphone), while capturing the preference of a user regarding the different concepts separately.
no code implementations • 25 Sep 2019 • Xu Zou, Qiuye Jia, Jianwei Zhang, Chang Zhou, Zijun Yao, Hongxia Yang, Jie Tang
In this paper, we propose a method named Dimensional reweighting Graph Convolutional Networks (DrGCNs), to tackle the problem of variance between dimensional information in the node representations of GCNs.
2 code implementations • 4 Jul 2019 • Xu Zou, Qiuye Jia, Jianwei Zhang, Chang Zhou, Hongxia Yang, Jie Tang
Graph Convolution Networks (GCNs) are becoming more and more popular for learning node representations on graphs.
1 code implementation • 13 Jun 2019 • Zhengxiao Du, Chang Zhou, Ming Ding, Hongxia Yang, Jie Tang
Inferring new facts from existing knowledge graphs (KG) with explainable reasoning processes is a significant problem and has received much attention recently.
2 code implementations • ACL 2019 • Ming Ding, Chang Zhou, Qibin Chen, Hongxia Yang, Jie Tang
We propose a new CogQA framework for multi-hop question answering in web-scale documents.
Ranked #50 on Question Answering on HotpotQA
no code implementations • 3 Apr 2019 • Jinze Bai, Chang Zhou, Junshuai Song, Xiaoru Qu, Weiting An, Zhao Li, Jun Gao
In particular, BGN improves the precision of the best competitors by 16% on average while maintaining the highest diversity on four datasets, and yields a 3.85x improvement in response time over the best competitors in the bundle list recommendation problem.
no code implementations • 23 Feb 2019 • Rong Zhu, Kun Zhao, Hongxia Yang, Wei Lin, Chang Zhou, Baole Ai, Yong Li, Jingren Zhou
An increasing number of machine learning tasks require dealing with large graph datasets, which capture rich and complex relationship among potentially billions of elements.
Distributed, Parallel, and Cluster Computing
15 code implementations • 11 Sep 2018 • Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, Kun Gai
An easy-to-use, modular, and extendible package of deep-learning-based CTR models, including DeepFM, Deep Interest Network (DIN), Deep Interest Evolution Network (DIEN), Deep Cross Network (DCN), Attentional Factorization Machine (AFM), Neural Factorization Machine (NFM), and AutoInt.
Ranked #1 on Click-Through Rate Prediction on Amazon Dataset
2 code implementations • 17 Nov 2017 • Chang Zhou, Jinze Bai, Junshuai Song, Xiaofei Liu, Zhengchao Zhao, Xiusi Chen, Jun Gao
Downstream applications then can use the user behavior vectors via vanilla attention.
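A sketch of consuming such behavior vectors via vanilla scaled dot-product attention, where a downstream query (e.g. a candidate item embedding) attends over the user's behavior embeddings; shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def attend(query, behaviors):
    # query: (d,); behaviors: (T, d) per-behavior vectors from the pretrained model
    scores = behaviors @ query / behaviors.size(1) ** 0.5
    weights = F.softmax(scores, dim=0)
    return weights @ behaviors            # (d,) user representation

user_vec = attend(torch.randn(32), torch.randn(10, 32))
```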