Search Results for author: Yang Ye

Found 19 papers, 9 papers with code

MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft

no code implementations · 11 Apr 2025 · Junliang Guo, Yang Ye, Tianyu He, HaoYu Wu, Yushu Jiang, Tim Pearce, Jiang Bian

In evaluation, we propose new metrics to assess not only visual quality but also the action following capacity when generating new scenes, which is crucial for a world model.

Minecraft

Fast Autoregressive Video Generation with Diagonal Decoding

no code implementations · 18 Mar 2025 · Yang Ye, Junliang Guo, HaoYu Wu, Tianyu He, Tim Pearce, Tabish Rashid, Katja Hofmann, Jiang Bian

Autoregressive Transformer models have demonstrated impressive performance in video generation, but their sequential token-by-token decoding process poses a major bottleneck, particularly for long videos represented by tens of thousands of tokens.

Video Generation

Force-Based Robotic Imitation Learning: A Two-Phase Approach for Construction Assembly Tasks

no code implementations · 24 Jan 2025 · Hengxu You, Yang Ye, Tianyu Zhou, Jing Du

The framework simulates realistic force-based interactions, enhancing the training data's quality for precise robotic manipulation in construction tasks.

Imitation Learning

Open-Sora Plan: Open-Source Large Video Generation Model

5 code implementations · 28 Nov 2024 · Bin Lin, Yunyang Ge, Xinhua Cheng, Zongjian Li, Bin Zhu, Shaodong Wang, Xianyi He, Yang Ye, Shenghai Yuan, Liuhan Chen, Tanghui Jia, Junwu Zhang, Zhenyu Tang, Yatian Pang, Bin She, Cen Yan, Zhiheng Hu, Xiaoyi Dong, Lin Chen, Zhang Pan, Xing Zhou, Shaoling Dong, Yonghong Tian, Li Yuan

We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model for generating desired high-resolution videos with long durations based on various user inputs.

Video Generation

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

2 code implementations · 26 Nov 2024 · Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan

However, as the resolution and duration of generated videos increase, the encoding cost of Video VAEs becomes a limiting bottleneck in training LVDMs.

Apple Intelligence Foundation Language Models

no code implementations · 29 Jul 2024 · Tom Gunter, ZiRui Wang, Chong Wang, Ruoming Pang, Aonan Zhang, BoWen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, Sam Wiseman, Syd Evans, Tao Lei, Vivek Rathod, Xiang Kong, Xianzhi Du, Yanghao Li, Yongqiang Wang, Yuan Gao, Zaid Ahmed, Zhaoyang Xu, Zhiyun Lu, Al Rashid, Albin Madappally Jose, Alec Doane, Alfredo Bencomo, Allison Vanderby, Andrew Hansen, Ankur Jain, Anupama Mann Anupama, Areeba Kamal, Bugu Wu, Carolina Brum, Charlie Maalouf, Chinguun Erdenebileg, Chris Dulhanty, Dominik Moritz, Doug Kang, Eduardo Jimenez, Evan Ladd, Fangping Shi, Felix Bai, Frank Chu, Fred Hohman, Hadas Kotek, Hannah Gillis Coleman, Jane Li, Jeffrey Bigham, Jeffery Cao, Jeff Lai, Jessica Cheung, Jiulong Shan, Joe Zhou, John Li, Jun Qin, Karanjeet Singh, Karla Vega, Kelvin Zou, Laura Heckman, Lauren Gardiner, Margit Bowler, Maria Cordell, Meng Cao, Nicole Hay, Nilesh Shahdadpuri, Otto Godwin, Pranay Dighe, Pushyami Rachapudi, Ramsey Tantawi, Roman Frigg, Sam Davarnia, Sanskruti Shah, Saptarshi Guha, Sasha Sirovica, Shen Ma, Shuang Ma, Simon Wang, Sulgi Kim, Suma Jayaram, Vaishaal Shankar, Varsha Paidi, Vivek Kumar, Xin Wang, Xin Zheng, Walker Cheng, Yael Shrager, Yang Ye, Yasu Tanaka, Yihao Guo, Yunsong Meng, Zhao Tang Luo, Zhi Ouyang, Alp Aygar, Alvin Wan, Andrew Walkingshaw, Andy Narayanan, Antonie Lin, Arsalan Farooq, Brent Ramerth, Colorado Reed, Chris Bartels, Chris Chaney, David Riazati, Eric Liang Yang, Erin Feldman, Gabriel Hochstrasser, Guillaume Seguin, Irina Belousova, Joris Pelemans, Karen Yang, Keivan Alizadeh Vahid, Liangliang Cao, Mahyar Najibi, Marco Zuliani, Max Horton, Minsik Cho, Nikhil Bhendawade, Patrick Dong, Piotr Maj, Pulkit Agrawal, Qi Shan, Qichen Fu, Regan Poston, Sam Xu, Shuangning Liu, Sushma Rao, Tashweena Heeramun, Thomas Merth, Uday Rayala, Victor Cui, Vivek Rangarajan Sridhar, Wencong Zhang, Wenqi Zhang, Wentao Wu, Xingyu Zhou, Xinwen Liu, Yang Zhao, Yin Xia, Zhile Ren, Zhongzheng Ren

We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute.

Language Modeling, Language Modelling

A Hybrid Generative and Discriminative PointNet on Unordered Point Sets

no code implementations · 19 Apr 2024 · Yang Ye, Shihao Ji

This paper proposes GDPNet, the first hybrid Generative and Discriminative PointNet that extends JEM for point cloud classification and generation.

Image Classification, Point Cloud Classification +2

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

6 code implementations · 16 Nov 2023 · Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan

In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.

Language Modeling, Language Modelling +5

Urban Drone Navigation: Autoencoder Learning Fusion for Aerodynamics

no code implementations · 13 Oct 2023 · Jiaohao Wu, Yang Ye, Jing Du

Drones are vital for urban emergency search and rescue (SAR) due to the challenges of navigating dynamic environments with obstacles like buildings and wind.

Drone navigation, Multi-Objective Reinforcement Learning +1

Improved Trust in Human-Robot Collaboration with ChatGPT

no code implementations · 25 Apr 2023 · Yang Ye, Hengxu You, Jing Du

A human-subject experiment showed that incorporating ChatGPT in robots significantly increased trust in human-robot collaboration, which can be attributed to the robot's ability to communicate more effectively with humans.

Robot-Enabled Construction Assembly with Automated Sequence Planning based on ChatGPT: RoboGPT

no code implementations · 21 Apr 2023 · Hengxu You, Yang Ye, Tianyu Zhou, Qi Zhu, Jing Du

To expand the ability of the current robot system in sequential understanding, this paper introduces RoboGPT, a novel system that leverages the advanced reasoning capabilities of ChatGPT, a large language model, for automated sequence planning in robot-based assembly applied to construction tasks.

Language Modeling, Language Modelling +1

APSNet: Attention Based Point Cloud Sampling

1 code implementation · 11 Oct 2022 · Yang Ye, Xiulong Yang, Shihao Ji

Traditional task-agnostic sampling methods, such as farthest point sampling (FPS), do not consider downstream tasks when sampling point clouds, and thus non-informative points to the tasks are often sampled.

3D Point Cloud Classification, Knowledge Distillation +2

A Hierarchical N-Gram Framework for Zero-Shot Link Prediction

1 code implementation · 16 Apr 2022 · Mingchen Li, Junfan Chen, Samuel Mensah, Nikolaos Aletras, Xiulong Yang, Yang Ye

Thus, in this paper, we propose a Hierarchical N-Gram framework for Zero-Shot Link Prediction (HNZSLP), which considers the dependencies among character n-grams of the relation surface name for ZSLP.

Knowledge Graphs, Link Prediction +1

Generative Max-Mahalanobis Classifiers for Image Classification, Generation and More

1 code implementation · 1 Jan 2021 · Xiulong Yang, Hui Ye, Yang Ye, Xiang Li, Shihao Ji

We show that our Generative MMC (GMMC) can be trained discriminatively, generatively, or jointly for image classification and generation.

Adversarial Robustness, Classification +4

Adversarial Privacy Preserving Graph Embedding against Inference Attack

1 code implementation · 30 Aug 2020 · Kaiyang Li, Guangchun Luo, Yang Ye, Wei Li, Shihao Ji, Zhipeng Cai

In this paper, we propose Adversarial Privacy Graph Embedding (APGE), a graph adversarial training framework that integrates the disentangling and purging mechanisms to remove users' private information from learned node representations.

Graph Embedding, Inference Attack +4

Sparse Graph Attention Networks

1 code implementation · 2 Dec 2019 · Yang Ye, Shihao Ji

Among the variants of GNNs, Graph Attention Networks (GATs) learn to assign dense attention coefficients over all neighbors of a node for feature aggregation, and improve the performance of many graph learning tasks.

General Classification, Graph Attention +5
