Search Results for author: Zun Wang

Found 30 papers, 16 papers with code

DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management

no code implementations • 19 May 2025 • Xuerui Su, Liya Guo, Yue Wang, Yi Zhu, ZhiMing Ma, Zun Wang, YuTing Liu

On the other hand, we observe that reward variance significantly affects both convergence speed and final model performance.

Management, Reinforcement Learning (RL)

UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion

no code implementations • 9 Mar 2025 • Gongbo Zhang, Yanting Li, Renqian Luo, Pipi Hu, Zeru Zhao, Lingbo Li, Guoqing Liu, Zun Wang, Ran Bi, Kaiyuan Gao, Liya Guo, Yu Xie, Chang Liu, Jia Zhang, Tian Xie, Robert Pinsler, Claudio Zeni, Ziheng Lu, Yingce Xia, Marwin Segler, Maik Riechert, Li Yuan, Lei Chen, Haiguang Liu, Tao Qin

We validate the effectiveness of UniGenX on material and small molecule generation tasks, achieving a significant leap in state-of-the-art performance for material crystal structure prediction and establishing new state-of-the-art results for small molecule structure prediction, de novo design, and conditional generation.

Text Generation

Enhancing the Scalability and Applicability of Kohn-Sham Hamiltonians for Molecular Systems

no code implementations • 26 Feb 2025 • Yunyang Li, Zaishuo Xia, Lin Huang, Xinran Wei, Han Yang, Sam Harshe, Zun Wang, Chang Liu, Jia Zhang, Bin Shao, Mark B. Gerstein

In this study, we generate a substantially larger training set (PubChemQH) than used previously and use it to create a scalable model for DFT calculations with physical accuracy.

HybriDNA: A Hybrid Transformer-Mamba2 Long-Range DNA Language Model

no code implementations • 15 Feb 2025 • Mingqian Ma, Guoqing Liu, Chuan Cao, Pan Deng, Tri Dao, Albert Gu, Peiran Jin, Zhao Yang, Yingce Xia, Renqian Luo, Pipi Hu, Zun Wang, Yuan-Jyue Chen, Haiguang Liu, Tao Qin

To address these challenges, we propose HybriDNA, a decoder-only DNA language model that incorporates a hybrid Transformer-Mamba2 architecture, seamlessly integrating the strengths of attention mechanisms with selective state-space models.

Language Modeling, Language Modelling +1

Efficient and Scalable Density Functional Theory Hamiltonian Prediction through Adaptive Sparsity

1 code implementation • 3 Feb 2025 • Erpai Luo, Xinran Wei, Lin Huang, Yunyang Li, Han Yang, Zaishuo Xia, Zun Wang, Chang Liu, Bin Shao, Jia Zhang

Beyond Hamiltonian prediction, the proposed sparsification techniques also hold significant potential for improving the efficiency and scalability of other SE(3) equivariant networks, further broadening their applicability and impact.

Computational chemistry, Prediction

E2Former: A Linear-time Efficient and Equivariant Transformer for Scalable Molecular Modeling

no code implementations • 31 Jan 2025 • Yunyang Li, Lin Huang, Zhihao Ding, Chu Wang, Xinran Wei, Han Yang, Zun Wang, Chang Liu, Yu Shi, Peiran Jin, Jia Zhang, Mark Gerstein, Tao Qin

Equivariant Graph Neural Networks (EGNNs) have demonstrated significant success in modeling microscale systems, including those in chemistry, biology and materials science.

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

1 code implementation • 11 Dec 2024 • Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, LiMin Wang

In this paper, we introduce a Self-Refining Data Flywheel (SRDF) that generates high-quality and large-scale navigational instruction-trajectory pairs by iteratively refining the data pool through the collaboration between two models, the instruction generator and the navigator, without any human-in-the-loop annotation.

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

1 code implementation • 7 Dec 2024 • Gengze Zhou, Yicong Hong, Zun Wang, Chongyang Zhao, Mohit Bansal, Qi Wu

The academic field of learning instruction-guided visual navigation can be generally categorized into high-level category-specific search and low-level language-guided navigation, depending on the granularity of language instruction, in which the former emphasizes the exploration process, while the latter concentrates on following detailed textual commands.

General Knowledge, Mixture-of-Experts +1

Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates

no code implementations • 2 Dec 2024 • Kaiyuan Gao, Yusong Wang, Haoxiang Guan, Zun Wang, Qizhi Pei, John E. Hopcroft, Kun He, Lijun Wu

Two primary obstacles emerge: (1) the difficulty in designing a 3D line notation that ensures SE(3)-invariant atomic coordinates, and (2) the non-trivial task of tokenizing continuous coordinates for use in LMs, which inherently require discrete inputs.

molecular representation, Property Prediction

DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation

no code implementations • 25 Nov 2024 • Zun Wang, Jialu Li, Han Lin, Jaehong Yoon, Mohit Bansal

To address these challenges, we propose DreamRunner, a novel story-to-video generation method: First, we structure the input script using a large language model (LLM) to facilitate both coarse-grained scene planning as well as fine-grained object-level layout and motion planning.

Large Language Model, Motion Planning +4

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

1 code implementation • 17 Jul 2024 • Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu

Capitalizing on the remarkable advancements in Large Language Models (LLMs), there is a burgeoning initiative to harness LLMs for instruction following robotic navigation.

Instruction Following, Vision and Language Navigation

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

1 code implementation • 9 Jul 2024 • Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi

Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance their development.

Vision and Language Navigation

FreeCG: Free the Design Space of Clebsch-Gordan Transform for Machine Learning Force Fields

no code implementations • 2 Jul 2024 • Shihao Shao, Haoran Geng, Zun Wang, Qinghua Cui

However, the permutation-equivariance requirement of MLFFs limits the design space of CG transform, that is, intensive CG transform has to be conducted for each neighboring edge and the operations should be performed in the same manner for all edges.

Property Prediction

Infusing Self-Consistency into Density Functional Theory Hamiltonian Prediction via Deep Equilibrium Models

1 code implementation • 6 Jun 2024 • Zun Wang, Chang Liu, Nianlong Zou, He Zhang, Xinran Wei, Lin Huang, Lijun Wu, Bin Shao

In this study, we introduce a unified neural network architecture, the Deep Equilibrium Density Functional Theory Hamiltonian (DEQH) model, which incorporates Deep Equilibrium Models (DEQs) for predicting Density Functional Theory (DFT) Hamiltonians.

SE3Set: Harnessing equivariant hypergraph neural networks for molecular representation learning

1 code implementation • 26 May 2024 • Hongfei Wu, Lijun Wu, Guoqing Liu, Zhirong Liu, Bin Shao, Zun Wang

In this paper, we develop SE3Set, an SE(3) equivariant hypergraph neural network architecture tailored for advanced molecular representation learning.

Computational chemistry, molecular representation +1

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding

2 code implementations • 22 Mar 2024 • Yi Wang, Kunchang Li, Xinhao Li, Jiashuo Yu, Yinan He, Chenting Wang, Guo Chen, Baoqi Pei, Ziang Yan, Rongkun Zheng, Jilan Xu, Zun Wang, Yansong Shi, Tianxiang Jiang, Songze Li, Hongjie Zhang, Yifei HUANG, Yu Qiao, Yali Wang, LiMin Wang

We introduce InternVideo2, a new family of video foundation models (ViFM) that achieve the state-of-the-art results in video recognition, video-text tasks, and video-centric dialogue.

Action Classification, Action Recognition +13

Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction

no code implementations • 14 Mar 2024 • He Zhang, Chang Liu, Zun Wang, Xinran Wei, Siyuan Liu, Nanning Zheng, Bin Shao, Tie-Yan Liu

Predicting the mean-field Hamiltonian matrix in density functional theory is a fundamental formulation to leverage machine learning for solving molecular science problems.

Prediction, Property Prediction

Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey

2 code implementations • 3 Mar 2024 • Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Yue Wang, Zun Wang, Tao Qin, Rui Yan

The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.

Property Prediction

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

3 code implementations • CVPR 2024 • Kunchang Li, Yali Wang, Yinan He, Yizhuo Li, Yi Wang, Yi Liu, Zun Wang, Jilan Xu, Guo Chen, Ping Luo, LiMin Wang, Yu Qiao

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.

3D Question Answering (3D-QA), Diagnostic +12

Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields

no code implementations • 11 Aug 2023 • Yatao Li, Wanling Gao, Lei Wang, Lixin Sun, Zun Wang, Jianfeng Zhan

This suite of metrics has demonstrated a better ability to assess a model's performance in real-world scientific applications, in contrast to traditional AI benchmarking methodologies.

Benchmarking

Scaling Data Generation in Vision-and-Language Navigation

1 code implementation • ICCV 2023 • Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao

Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents.

Imitation Learning, Vision and Language Navigation +1

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments

1 code implementation • 6 Apr 2023 • Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang

To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability of obstacle-avoiding control in continuous environments.

Autonomous Navigation, Navigate +1

InternVideo: General Video Foundation Models via Generative and Discriminative Learning

2 code implementations • 6 Dec 2022 • Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications.

Ranked #1 on Action Recognition on Something-Something V1 (using extra training data)

Action Classification, Contrastive Learning +8

1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition (CVPR 2022)

1 code implementation • 23 Jun 2022 • Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao

Our model consists of three modules: the candidate waypoints predictor (CWP), the history enhanced planner and the tryout controller.

Data Augmentation, Vision and Language Navigation

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

1 code implementation • CVPR 2022 • Yicong Hong, Zun Wang, Qi Wu, Stephen Gould

To bridge the discrete-to-continuous gap, we propose a predictor to generate a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments.

Imitation Learning, Vision and Language Navigation

Heterogeneous relational message passing networks for molecular dynamics simulations

no code implementations • 2 Sep 2021 • Zun Wang, Chong Wang, Sibo Zhao, Yong Xu, Shaogang Hao, Chang Yu Hsieh, Bing-Lin Gu, Wenhui Duan

With many frameworks based on message passing neural networks proposed to predict molecular and bulk properties, machine learning methods have tremendously shifted the paradigms of computational sciences underpinning physics, material science, chemistry, and biology.

BIG-bench Machine Learning
