no code implementations • 11 Apr 2025 • Junliang Guo, Yang Ye, Tianyu He, HaoYu Wu, Yushu Jiang, Tim Pearce, Jiang Bian
In evaluation, we propose new metrics to assess not only visual quality but also the action-following capacity when generating new scenes, which is crucial for a world model.
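The paper's metrics themselves aren't reproduced here; as a purely hypothetical illustration of what an action-following score could measure, the sketch below compares actions a probe infers from generated frames against the actions the model was conditioned on. All names are assumptions, not the paper's API.

```python
import numpy as np

def action_following_rate(conditioned_actions, inferred_actions):
    """Toy action-following score: the fraction of generated steps whose
    inferred action matches the action the model was conditioned on.
    Inputs are equal-length 1-D arrays of discrete action ids."""
    conditioned = np.asarray(conditioned_actions)
    inferred = np.asarray(inferred_actions)
    assert conditioned.shape == inferred.shape
    return float((conditioned == inferred).mean())

# 4 of 5 generated steps follow the commanded action
print(action_following_rate([0, 1, 1, 2, 3], [0, 1, 2, 2, 3]))  # 0.8
```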
no code implementations • 18 Mar 2025 • Yang Ye, Junliang Guo, HaoYu Wu, Tianyu He, Tim Pearce, Tabish Rashid, Katja Hofmann, Jiang Bian
Autoregressive Transformer models have demonstrated impressive performance in video generation, but their sequential token-by-token decoding process poses a major bottleneck, particularly for long videos represented by tens of thousands of tokens.
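To make the bottleneck concrete, here is a minimal sketch of naive token-by-token decoding; `model` is a stand-in for any autoregressive video Transformer, not the paper's interface.

```python
import torch

@torch.no_grad()
def autoregressive_decode(model, prompt_tokens, num_new_tokens):
    """Naive sequential decoding: every new video token requires a full
    forward pass, so a clip of tens of thousands of tokens costs tens of
    thousands of strictly serial model calls."""
    tokens = prompt_tokens.clone()                       # (1, T0)
    for _ in range(num_new_tokens):                      # serial bottleneck
        logits = model(tokens)                           # (1, T, vocab)
        next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_token], dim=1)  # grow by one token
    return tokens
```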
no code implementations • 24 Jan 2025 • Hengxu You, Yang Ye, Tianyu Zhou, Jing Du
The framework simulates realistic force-based interactions, enhancing the training data's quality for precise robotic manipulation in construction tasks.
5 code implementations • 28 Nov 2024 • Bin Lin, Yunyang Ge, Xinhua Cheng, Zongjian Li, Bin Zhu, Shaodong Wang, Xianyi He, Yang Ye, Shenghai Yuan, Liuhan Chen, Tanghui Jia, Junwu Zhang, Zhenyu Tang, Yatian Pang, Bin She, Cen Yan, Zhiheng Hu, Xiaoyi Dong, Lin Chen, Zhang Pan, Xing Zhou, Shaoling Dong, Yonghong Tian, Li Yuan
We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model capable of producing high-resolution, long-duration videos from various user inputs.
2 code implementations • 26 Nov 2024 • Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan
However, as the resolution and duration of generated videos increase, the encoding cost of Video VAEs becomes a limiting bottleneck in training LVDMs.
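As a back-of-the-envelope illustration of why, the latent grid a video VAE must produce grows linearly with duration and quadratically with resolution. The downsampling factors below are typical assumed values, not necessarily the paper's exact ones.

```python
def video_latent_count(frames, height, width, t_stride=4, s_stride=8):
    """Rough latent-grid size for a video VAE with temporal downsampling
    t_stride and spatial downsampling s_stride. Encoding cost tracks this
    count, so higher resolution and longer duration quickly dominate."""
    return (frames // t_stride) * (height // s_stride) * (width // s_stride)

print(video_latent_count(96, 480, 854))    # ~1.5e5 latents at 480p
print(video_latent_count(96, 1080, 1920))  # ~7.8e5 latents at 1080p
```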
no code implementations • 29 Jul 2024 • Tom Gunter, ZiRui Wang, Chong Wang, Ruoming Pang, Aonan Zhang, BoWen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, Sam Wiseman, Syd Evans, Tao Lei, Vivek Rathod, Xiang Kong, Xianzhi Du, Yanghao Li, Yongqiang Wang, Yuan Gao, Zaid Ahmed, Zhaoyang Xu, Zhiyun Lu, Al Rashid, Albin Madappally Jose, Alec Doane, Alfredo Bencomo, Allison Vanderby, Andrew Hansen, Ankur Jain, Anupama Mann Anupama, Areeba Kamal, Bugu Wu, Carolina Brum, Charlie Maalouf, Chinguun Erdenebileg, Chris Dulhanty, Dominik Moritz, Doug Kang, Eduardo Jimenez, Evan Ladd, Fangping Shi, Felix Bai, Frank Chu, Fred Hohman, Hadas Kotek, Hannah Gillis Coleman, Jane Li, Jeffrey Bigham, Jeffery Cao, Jeff Lai, Jessica Cheung, Jiulong Shan, Joe Zhou, John Li, Jun Qin, Karanjeet Singh, Karla Vega, Kelvin Zou, Laura Heckman, Lauren Gardiner, Margit Bowler, Maria Cordell, Meng Cao, Nicole Hay, Nilesh Shahdadpuri, Otto Godwin, Pranay Dighe, Pushyami Rachapudi, Ramsey Tantawi, Roman Frigg, Sam Davarnia, Sanskruti Shah, Saptarshi Guha, Sasha Sirovica, Shen Ma, Shuang Ma, Simon Wang, Sulgi Kim, Suma Jayaram, Vaishaal Shankar, Varsha Paidi, Vivek Kumar, Xin Wang, Xin Zheng, Walker Cheng, Yael Shrager, Yang Ye, Yasu Tanaka, Yihao Guo, Yunsong Meng, Zhao Tang Luo, Zhi Ouyang, Alp Aygar, Alvin Wan, Andrew Walkingshaw, Andy Narayanan, Antonie Lin, Arsalan Farooq, Brent Ramerth, Colorado Reed, Chris Bartels, Chris Chaney, David Riazati, Eric Liang Yang, Erin Feldman, Gabriel Hochstrasser, Guillaume Seguin, Irina Belousova, Joris Pelemans, Karen Yang, Keivan Alizadeh Vahid, Liangliang Cao, Mahyar Najibi, Marco Zuliani, Max Horton, Minsik Cho, Nikhil Bhendawade, Patrick Dong, Piotr Maj, Pulkit Agrawal, Qi Shan, Qichen Fu, Regan Poston, Sam Xu, Shuangning Liu, Sushma Rao, Tashweena Heeramun, Thomas Merth, Uday Rayala, Victor Cui, Vivek Rangarajan Sridhar, Wencong Zhang, Wenqi Zhang, Wentao Wu, Xingyu Zhou, Xinwen Liu, Yang Zhao, Yin Xia, Zhile Ren, Zhongzheng Ren
We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute.
no code implementations • 19 Apr 2024 • Yang Ye, Shihao Ji
This paper proposes GDPNet, the first hybrid Generative and Discriminative PointNet, which extends the Joint Energy-based Model (JEM) for point cloud classification and generation.
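For context, JEM reinterprets a classifier's logits as an energy function. A minimal sketch of that view, which GDPNet extends to point clouds (the generative term additionally requires SGLD-style sampling, omitted here):

```python
import torch
import torch.nn.functional as F

def jem_losses(logits, labels):
    """JEM's two readings of the same logits: softmax yields the
    discriminative p(y|x), while E(x) = -logsumexp(logits) defines an
    unnormalized generative log p(x)."""
    ce = F.cross_entropy(logits, labels)        # discriminative objective
    energy = -torch.logsumexp(logits, dim=-1)   # energy for the generative side
    return ce, energy
```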
2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jinfa Huang, Junwu Zhang, Yatian Pang, Peng Jin, Munan Ning, Jiebo Luo, Li Yuan
In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs.
Ranked #150 on Visual Question Answering on MM-Vet
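A minimal sketch of the kind of sparse mixture-of-experts layer that MoE-style tuning inserts into an LVLM, assuming top-1 routing; this shows the general mechanism, not the MoE-LLaVA implementation.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Top-1 mixture-of-experts FFN: a router scores experts per token,
    and each token is processed only by its highest-scoring expert."""
    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                       # x: (tokens, dim)
        gate = self.router(x).softmax(dim=-1)   # routing probabilities
        idx = gate.argmax(dim=-1)               # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                     # tokens routed to expert e
            if mask.any():
                out[mask] = gate[mask, e:e + 1] * expert(x[mask])
        return out
```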
6 code implementations • 16 Nov 2023 • Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan
In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.
Ranked #9 on Zero-Shot Video Question Answer on TGIF-QA
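The unification step amounts to projecting visual encoder features into the LLM's token-embedding space, so image and video features arrive as ordinary language-space tokens. A sketch with illustrative dimensions, not the paper's exact module:

```python
import torch.nn as nn

class VisualProjector(nn.Module):
    """Shared projector mapping vision-encoder features into the LLM's
    embedding space; both image and video patches pass through it, which
    is what unifies the visual representation."""
    def __init__(self, vision_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, visual_feats):            # (num_patches, vision_dim)
        return self.proj(visual_feats)          # (num_patches, llm_dim)
```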
no code implementations • 13 Oct 2023 • Jiaohao Wu, Yang Ye, Jing Du
Drones are vital for urban emergency search and rescue (SAR), where navigating dynamic environments with obstacles such as buildings and wind is a major challenge.
no code implementations • 29 May 2023 • Mingchen Li, Yang Ye, Jeremy Yeung, Huixue Zhou, Huaiyuan Chu, Rui Zhang
Contrastive learning has become a popular solution for few-shot Named Entity Recognition (NER).
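A sketch of the standard token-level supervised contrastive objective that such few-shot NER methods build on (not this paper's exact loss): tokens sharing an entity label are pulled together, all others pushed apart.

```python
import torch
import torch.nn.functional as F

def token_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over token embeddings: positives are
    tokens with the same entity label. embeddings: (N, d), labels: (N,)."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t() / temperature                     # scaled cosine similarity
    self_mask = torch.eye(len(z), dtype=torch.bool)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))   # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    per_token = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    return -(per_token / pos_mask.sum(dim=1).clamp(min=1)).mean()
```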
no code implementations • 25 Apr 2023 • Yang Ye, Hengxu You, Jing Du
A human-subject experiment showed that incorporating ChatGPT in robots significantly increased trust in human-robot collaboration, which can be attributed to the robot's ability to communicate more effectively with humans.
no code implementations • 21 Apr 2023 • Hengxu You, Yang Ye, Tianyu Zhou, Qi Zhu, Jing Du
To extend current robot systems' ability in sequential understanding, this paper introduces RoboGPT, a novel system that leverages the advanced reasoning capabilities of ChatGPT, a large language model, for automated sequence planning in robot-based assembly applied to construction tasks.
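A hypothetical sketch of LLM-driven sequence planning in this spirit: describe the task and available parts, then ask the model for an ordered step list. The prompt, model name, and parsing are assumptions for illustration, not the paper's actual setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def plan_assembly(task, parts):
    """Ask an LLM for an ordered assembly sequence; illustrative only."""
    prompt = (
        f"Task: {task}\nAvailable parts: {', '.join(parts)}\n"
        "Return a numbered assembly sequence, one robot action per line."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# print(plan_assembly("assemble a scaffold frame", ["pole A", "pole B", "coupler"]))
```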
1 code implementation • 11 Oct 2022 • Yang Ye, Xiulong Yang, Shihao Ji
Traditional task-agnostic sampling methods, such as farthest point sampling (FPS), do not consider downstream tasks when sampling point clouds, and thus often sample points that are uninformative for those tasks.
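For reference, the FPS baseline this line describes is the standard greedy algorithm below: repeatedly pick the point farthest from everything selected so far, ignoring the downstream task entirely.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedy FPS: maintain each point's distance to the selected set and
    repeatedly add the farthest point. points: (N, 3) array; returns (k, 3)."""
    selected = [np.random.randint(len(points))]     # arbitrary seed point
    dists = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        idx = int(dists.argmax())                   # farthest remaining point
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return points[selected]

cloud = np.random.rand(2048, 3)
print(farthest_point_sampling(cloud, 128).shape)    # (128, 3)
```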
1 code implementation • 16 Apr 2022 • Mingchen Li, Junfan Chen, Samuel Mensah, Nikolaos Aletras, Xiulong Yang, Yang Ye
Thus, in this paper, we propose a Hierarchical N-Gram framework for Zero-Shot Link Prediction (HNZSLP), which considers the dependencies among character n-grams of the relation surface name for ZSLP.
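The raw units the hierarchy is built over are simply the character n-grams of a relation's surface name; a sketch of that extraction step (the hierarchical n-gram graph itself is omitted):

```python
def char_ngrams(surface_name, n_max=3):
    """All character n-grams of a relation surface name up to length
    n_max, with spaces normalized; these are the atoms HNZSLP composes."""
    s = surface_name.replace(" ", "_")
    return {s[i:i + n] for n in range(1, n_max + 1) for i in range(len(s) - n + 1)}

print(sorted(char_ngrams("born in", n_max=2)))
```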
no code implementations • 29 Jan 2021 • Yang Ye, Qingpeng Zhang, Zhidong Cao, Frank Youhua Chen, Houmin Yan, H. Eugene Stanley, Daniel Dajun Zeng
The threshold model captures the shortage contagion on the global PPE trade network.
Physics and Society
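A generic linear-threshold cascade illustrates the mechanism (the paper's trade network, thresholds, and dynamics differ): a node enters shortage once the fraction of its neighbors in shortage crosses its threshold.

```python
import networkx as nx

def shortage_cascade(graph, seeds, threshold=0.3):
    """Iterate until no node changes: a country falls into shortage when
    at least `threshold` of its trade partners are already in shortage."""
    shortage = set(seeds)
    changed = True
    while changed:
        changed = False
        for node in graph.nodes:
            if node in shortage:
                continue
            nbrs = list(graph.neighbors(node))
            if nbrs and sum(n in shortage for n in nbrs) / len(nbrs) >= threshold:
                shortage.add(node)
                changed = True
    return shortage

g = nx.erdos_renyi_graph(50, 0.1, seed=0)
print(len(shortage_cascade(g, seeds={0, 1, 2})))
```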
1 code implementation • 1 Jan 2021 • Xiulong Yang, Hui Ye, Yang Ye, Xiang Li, Shihao Ji
We show that our Generative MMC (GMMC) can be trained discriminatively, generatively, or jointly for image classification and generation.
1 code implementation • 30 Aug 2020 • Kaiyang Li, Guangchun Luo, Yang Ye, Wei Li, Shihao Ji, Zhipeng Cai
In this paper, we propose Adversarial Privacy Graph Embedding (APGE), a graph adversarial training framework that integrates the disentangling and purging mechanisms to remove users' private information from learned node representations.
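A sketch of the generic adversarial component in privacy-preserving embedding (not APGE itself): a discriminator tries to recover a private attribute from node embeddings, and the encoder is trained to defeat it.

```python
import torch
import torch.nn as nn

class PrivacyAdversary(nn.Module):
    """Discriminator that predicts a private attribute from embeddings;
    the encoder's privacy objective is the negated adversary loss."""
    def __init__(self, emb_dim, num_private_classes):
        super().__init__()
        self.clf = nn.Sequential(
            nn.Linear(emb_dim, 64), nn.ReLU(), nn.Linear(64, num_private_classes)
        )

    def forward(self, z):
        return self.clf(z)

adv = PrivacyAdversary(emb_dim=128, num_private_classes=2)
z = torch.randn(32, 128)                  # node embeddings from some encoder
adv_loss = nn.functional.cross_entropy(adv(z), torch.randint(0, 2, (32,)))
encoder_privacy_loss = -adv_loss          # encoder maximizes the adversary's loss
```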
1 code implementation • 2 Dec 2019 • Yang Ye, Shihao Ji
Among the variants of GNNs, Graph Attention Networks (GATs) learn to assign dense attention coefficients over all neighbors of a node for feature aggregation, and improve the performance of many graph learning tasks.
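For reference, the dense single-node GAT attention computation (Veličković et al., 2018) that this line refers to: every neighbor receives a nonzero softmax coefficient, which is exactly the density a sparse variant prunes.

```python
import torch
import torch.nn.functional as F

def gat_attention(h_i, h_neighbors, W, a):
    """One node's GAT aggregation. Shapes: h_i (d,), h_neighbors (k, d),
    W (d, d'), a (2*d',). Returns the attention-weighted feature (d',)."""
    wh_i = h_i @ W                                           # (d',)
    wh_j = h_neighbors @ W                                   # (k, d')
    pair = torch.cat([wh_i.expand_as(wh_j), wh_j], dim=-1)   # (k, 2*d')
    e = F.leaky_relu(pair @ a, negative_slope=0.2)           # raw scores (k,)
    alpha = e.softmax(dim=0)                                 # dense coefficients
    return (alpha.unsqueeze(-1) * wh_j).sum(dim=0)           # aggregated feature
```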