no code implementations • ACL 2022 • Shijie Geng, Zuohui Fu, Yingqiang Ge, Lei LI, Gerard de Melo, Yongfeng Zhang
In modern recommender systems, there are usually comments or reviews from users that justify their ratings for different items.
1 code implementation • 30 Nov 2024 • Bytedance-Seed-Foundation-Code-Team, :, Yao Cheng, Jianfeng Chen, Jie Chen, Li Chen, Liyu Chen, Wentao Chen, Zhengyu Chen, Shijie Geng, Aoyan Li, Bo Li, Bowen Li, Linyi Li, Boyi Liu, Jerry Liu, Kaibo Liu, Qi Liu, Shukai Liu, Siyao Liu, Tianyi Liu, Tingkai Liu, Yongfei Liu, Rui Long, Jing Mai, Guanghan Ning, Z. Y. Peng, Kai Shen, Jiahao Su, Jing Su, Tao Sun, Yifan Sun, Yunzhe Tao, Guoyin Wang, Siwei Wang, Xuwu Wang, Yite Wang, Zihan Wang, Jinxiang Xia, Liang Xiang, Xia Xiao, Yongsheng Xiao, Chenguang Xi, Shulin Xin, Jingjing Xu, Shikun Xu, Hongxia Yang, Jack Yang, Yingxiang Yang, Jianbo Yuan, Jun Zhang, Yufeng Zhang, Yuyu Zhang, Shen Zheng, He Zhu, Ming Zhu
As the capabilities of code large language models (LLMs) continue to expand, their applications across diverse code intelligence domains are rapidly increasing.
2 code implementations • 9 May 2024 • Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details.
2 code implementations • 11 Mar 2024 • Linyi Li, Shijie Geng, Zhenwen Li, Yibo He, Hao Yu, Ziyue Hua, Guanghan Ning, Siwei Wang, Tao Xie, Hongxia Yang
We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
1 code implementation • 8 Feb 2024 • Dongyang Liu, Renrui Zhang, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao, Peng Gao
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX.
Ranked #14 on
Video Question Answering
on MVBench
1 code implementation • 27 Sep 2023 • Haonan Chang, Kowndinya Boyalakuntla, Shiyang Lu, Siwei Cai, Eric Jing, Shreesh Keskar, Shijie Geng, Adeeb Abbas, Lifeng Zhou, Kostas Bekris, Abdeslam Boularias
We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries.
1 code implementation • 23 May 2023 • Shijie Geng, Juntao Tan, Shuchang Liu, Zuohui Fu, Yongfeng Zhang
In light of this, we propose the development of a multimodal foundation model (MFM) considering visual, textual, and personalization modalities under the P5 recommendation paradigm, thus named VIP5 (Visual P5), to unify various modalities and recommendation tasks.
3 code implementations • 28 Apr 2023 • Peng Gao, Jiaming Han, Renrui Zhang, Ziyi Lin, Shijie Geng, Aojun Zhou, Wei zhang, Pan Lu, Conghui He, Xiangyu Yue, Hongsheng Li, Yu Qiao
This strategy effectively alleviates the interference between the two tasks of image-text alignment and instruction following and achieves strong multi-modal reasoning with only a small-scale image-text and instruction dataset.
Ranked #6 on
Visual Question Answering (VQA)
on InfiMM-Eval
1 code implementation • CVPR 2023 • Yuxiao Chen, Jianbo Yuan, Yu Tian, Shijie Geng, Xinyu Li, Ding Zhou, Dimitris N. Metaxas, Hongxia Yang
However, direct aligning cross-modal information using such representations is challenging, as visual patches and text tokens differ in semantic levels and granularities.
1 code implementation • 6 Mar 2023 • Shijie Geng, Jianbo Yuan, Yu Tian, Yuxiao Chen, Yongfeng Zhang
The success of large-scale contrastive vision-language pretraining (CLIP) has benefited both visual recognition and multimodal content understanding.
1 code implementation • 30 Jan 2023 • Haonan Chang, Dhruv Metha Ramesh, Shijie Geng, Yuqiu Gan, Abdeslam Boularias
We present Mono-STAR, the first real-time 3D reconstruction system that simultaneously supports semantic fusion, fast motion tracking, non-rigid object deformation, and topological change under a unified framework.
2 code implementations • 6 Aug 2022 • Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li
Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos.
Ranked #31 on
Action Classification
on Kinetics-400
1 code implementation • 20 Jul 2022 • Yuxiao Chen, Long Zhao, Jianbo Yuan, Yu Tian, Zhaoyang Xia, Shijie Geng, Ligong Han, Dimitris N. Metaxas
Despite the success of fully-supervised human skeleton sequence modeling, utilizing self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at large scales is difficult.
no code implementations • 24 Apr 2022 • Yingqiang Ge, Juntao Tan, Yan Zhu, Yinglong Xia, Jiebo Luo, Shuchang Liu, Zuohui Fu, Shijie Geng, Zelong Li, Yongfeng Zhang
In this paper, we study the problem of explainable fairness, which helps to gain insights about why a system is fair or unfair, and guides the design of fair recommender systems with a more informed and unified methodology.
2 code implementations • 24 Mar 2022 • Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, Yongfeng Zhang
For a long time, different recommendation tasks typically require designing task-specific architectures and training objectives.
1 code implementation • 17 Feb 2022 • Juntao Tan, Shijie Geng, Zuohui Fu, Yingqiang Ge, Shuyuan Xu, Yunqi Li, Yongfeng Zhang
For quantitatively evaluating the generated explanations without the requirement of ground-truth, we design metrics based on Counterfactual and Factual reasoning to evaluate the necessity and sufficiency of the explanations.
1 code implementation • 11 Dec 2021 • Honglu Zhou, Asim Kadav, Aviv Shamsian, Shijie Geng, Farley Lai, Long Zhao, Ting Liu, Mubbasir Kapadia, Hans Peter Graf
Group Activity Recognition detects the activity collectively performed by a group of actors, which requires compositional reasoning of actors and objects.
Ranked #2 on
Group Activity Recognition
on Collective Activity
1 code implementation • 29 Nov 2021 • Teli Ma, Shijie Geng, Mengmeng Wang, Jing Shao, Jiasen Lu, Hongsheng Li, Peng Gao, Yu Qiao
Recent advances in large-scale contrastive visual-language pretraining shed light on a new pathway for visual recognition.
Ranked #5 on
Long-tail Learning
on Places-LT
(using extra training data)
no code implementations • 13 Oct 2021 • Ankit P. Shah, Shijie Geng, Peng Gao, Anoop Cherian, Takaaki Hori, Tim K. Marks, Jonathan Le Roux, Chiori Hori
In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).
3 code implementations • 9 Oct 2021 • Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao
Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning.
no code implementations • 24 Sep 2021 • Lei Shi, Kai Shuang, Shijie Geng, Peng Gao, Zuohui Fu, Gerard de Melo, Yunpeng Chen, Sen Su
To overcome these issues, we propose unbiased Dense Contrastive Visual-Linguistic Pretraining (DCVLP), which replaces the region regression and classification with cross-modality region contrastive learning that requires no annotations.
no code implementations • 5 Sep 2021 • Yingqiang Ge, Shuchang Liu, Zelong Li, Shuyuan Xu, Shijie Geng, Yunqi Li, Juntao Tan, Fei Sun, Yongfeng Zhang
While recent years have witnessed the emergence of various explainable methods in machine learning, to what degree the explanations really represent the reasoning process behind the model prediction -- namely, the faithfulness of explanation -- is still an open problem.
no code implementations • 4 Jun 2021 • Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li
In this paper, we propose a novel Scalable Transformers, which naturally contains sub-Transformers of different scales and have shared parameters.
1 code implementation • 24 Jan 2021 • Shijie Geng, Peng Gao, Zuohui Fu, Yongfeng Zhang
In this paper, we leverage gradient regularized self-distillation for RObust training of Multi-Exit BERT (RomeBERT), which can effectively solve the performance imbalance problem between early and late exits.
1 code implementation • 29 Oct 2020 • Yikun Xian, Zuohui Fu, Handong Zhao, Yingqiang Ge, Xu Chen, Qiaoying Huang, Shijie Geng, Zhou Qin, Gerard de Melo, S. Muthukrishnan, Yongfeng Zhang
User profiles can capture prominent user behaviors from the history, and provide valuable signals about which kinds of path patterns are more likely to lead to potential items of interest for the user.
no code implementations • 23 Sep 2020 • Peng Gao, Chiori Hori, Shijie Geng, Takaaki Hori, Jonathan Le Roux
In contrast with previous approaches where information flows only towards deeper layers of a stack, we consider a multi-pass transformer (MPT) architecture in which earlier layers are allowed to process information in light of the output of later layers.
no code implementations • 26 Jul 2020 • Lei Shi, Kai Shuang, Shijie Geng, Peng Su, Zhengkai Jiang, Peng Gao, Zuohui Fu, Gerard de Melo, Sen Su
We evaluate CVLP on several down-stream tasks, including VQA, GQA and NLVR2 to validate the superiority of contrastive learning on multi-modality representation learning.
no code implementations • 8 Jul 2020 • Shijie Geng, Peng Gao, Moitreya Chatterjee, Chiori Hori, Jonathan Le Roux, Yongfeng Zhang, Hongsheng Li, Anoop Cherian
Given an input video, its associated audio, and a brief caption, the audio-visual scene aware dialog (AVSD) task requires an agent to indulge in a question-answer dialog with a human about the audio-visual content.
no code implementations • 3 Jun 2020 • Zuohui Fu, Yikun Xian, Ruoyuan Gao, Jieyu Zhao, Qiaoying Huang, Yingqiang Ge, Shuyuan Xu, Shijie Geng, Chirag Shah, Yongfeng Zhang, Gerard de Melo
There has been growing attention on fairness considerations recently, especially in the context of intelligent decision making systems.
no code implementations • 9 May 2020 • Shijie Geng, Ji Zhang, Zuohui Fu, Peng Gao, Hang Zhang, Gerard de Melo
Without identifying the connection between appearing people and character names, a model is not able to obtain a genuine understanding of the plots.
no code implementations • 29 Jan 2020 • Zuohui Fu, Yikun Xian, Shijie Geng, Yingqiang Ge, Yuting Wang, Xin Dong, Guang Wang, Gerard de Melo
A number of cross-lingual transfer learning approaches based on neural networks have been proposed for the case when large amounts of parallel text are at our disposal.
no code implementations • 3 Jan 2020 • Lei Shi, Shijie Geng, Kai Shuang, Chiori Hori, Songxiang Liu, Peng Gao, Sen Su
To solve the issue for the intermediate layers, we propose an efficient Quaternion Block Network (QBN) to learn interaction not only for the last layer but also for all intermediate layers simultaneously.
no code implementations • 16 Jul 2019 • Shijie Geng, Ji Zhang, Hang Zhang, Ahmed Elgammal, Dimitris N. Metaxas
We present a simple method that achieves unexpectedly superior performance for Complex Reasoning involved Visual Question Answering.
1 code implementation • 20 Aug 2018 • Zhiqiang Tang, Xi Peng, Shijie Geng, Yizhe Zhu, Dimitris N. Metaxas
We design a new connectivity pattern for the U-Net architecture.
Ranked #30 on
Pose Estimation
on MPII Human Pose
1 code implementation • ECCV 2018 • Zhiqiang Tang, Xi Peng, Shijie Geng, Lingfei Wu, Shaoting Zhang, Dimitris Metaxas
Finally, to reduce the memory consumption and high precision operations both in training and testing, we further quantize weights, inputs, and gradients of our localization network to low bit-width numbers.
Ranked #19 on
Pose Estimation
on MPII Human Pose