no code implementations • 27 Sep 2024 • Min Yang, Zichen Zhang, LiMin Wang
With this unified token representation, Temporal2Seq can train a generalist model within a single architecture on different video understanding tasks.
no code implementations • 31 Jul 2024 • Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang
To address this, we propose a novel model, PEAR (Phrase-Based Hand-Object Interaction Anticipation), which jointly anticipates interaction intention and manipulation.
no code implementations • 11 Jul 2024 • Yue Bai, Zichen Zhang, Jiasen Lu, Yun Fu
Training large language models (LLMs) and multimodal LLMs necessitates significant computing resources, and existing publicly available LLMs are typically pre-trained on diverse, privately curated datasets spanning various tasks.
no code implementations • 28 Jun 2024 • Kuo-Hao Zeng, Zichen Zhang, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Alvaro Herrasti, Ross Girshick, Aniruddha Kembhavi, Luca Weihs
We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real world without adaptation despite being trained purely in simulation.
no code implementations • 9 May 2024 • Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang
Building upon this relationship, a novel Bidirectional prOgressive Transformer (BOT), which introduces a bidirectional progressive mechanism into the anticipation of interaction intention, is established.
no code implementations • CVPR 2024 • Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following novel instructions.
1 code implementation • 28 Dec 2023 • Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
We present Unified-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action.
no code implementations • 12 Oct 2023 • Zichen Zhang, Yunshuang Li, Osbert Bastani, Abhishek Gupta, Dinesh Jayaraman, Yecheng Jason Ma, Luca Weihs
Learning long-horizon manipulation tasks, however, is a long-standing challenge, and demands decomposing the overarching task into several manageable subtasks to facilitate policy learning and generalization to unseen tasks.
no code implementations • 30 Mar 2023 • Zichen Zhang, Luca Weihs
Finally, we find that our proposed approach can dramatically reduce the number of resets required for training other embodied tasks; in particular, for RoboTHOR ObjectNav, we obtain higher success rates than episodic approaches using 99.97% fewer resets.
no code implementations • 24 Dec 2022 • Yanan Xiao, Minyu Liu, Zichen Zhang, Lu Jiang, Minghao Yin, Jianan Wang
We propose to formulate the problem as a continuous reinforcement learning task, where the agent is the next flow value predictor, the action is the next time-series flow value in the sensor, and the environment state is a dynamically fused representation of the sensor and transportation network.
no code implementations • NeurIPS 2023 • Zichen Zhang, Johannes Kirschner, Junxi Zhang, Francesco Zanini, Alex Ayoub, Masood Dehghan, Dale Schuurmans
A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle.
1 code implementation • 16 Dec 2022 • Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans
To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM, by using an ensemble of CEM instances running independently from one another, and each performing a local improvement of its own sampling distribution.
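The ensemble idea in the excerpt above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 1-D Gaussian sampling distribution, the even split of the sample budget, and all default parameters are assumptions made for illustration only.

```python
import random
import statistics


def cem_step(cost, mean, std, n_samples, n_elite, rng):
    """One classical CEM update for a 1-D Gaussian sampling distribution."""
    samples = [rng.gauss(mean, std) for _ in range(n_samples)]
    elites = sorted(samples, key=cost)[:n_elite]  # lowest-cost samples
    new_mean = statistics.fmean(elites)
    new_std = statistics.pstdev(elites) + 1e-6  # small floor avoids a zero std
    return new_mean, new_std, elites[0]


def decentralized_cem(cost, init_means, init_std=2.0, total_samples=198,
                      n_elite=5, n_iters=20, seed=0):
    """Run one independent CEM instance per initial mean (hypothetical setup).

    Each instance refines only its own distribution; the per-iteration
    sample budget is split evenly across instances, and the best sample
    seen by any instance is returned.
    """
    rng = random.Random(seed)
    per_instance = total_samples // len(init_means)
    best_x, best_cost = None, float("inf")
    for mean in init_means:
        std = init_std
        for _ in range(n_iters):
            mean, std, top = cem_step(cost, mean, std, per_instance, n_elite, rng)
            top_cost = cost(top)
            if top_cost < best_cost:
                best_x, best_cost = top, top_cost
    return best_x, best_cost


if __name__ == "__main__":
    # Toy quadratic cost with its minimum at x = 3.
    x, c = decentralized_cem(lambda x: (x - 3.0) ** 2, init_means=[-5.0, 0.0, 5.0])
    print(x, c)
```

Because the instances never share elites, one instance that collapses early onto a poor region does not pull the others with it; the final answer is simply the best sample found across the ensemble.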
2 code implementations • 6 Oct 2022 • Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan
We show that a wide spectrum of robot manipulation tasks can be expressed with multimodal prompts, interleaving textual and visual tokens.
no code implementations • 23 Feb 2022 • Cameron Haigh, Zichen Zhang, Negar Hassanpour, Khurram Javed, Yingying Fu, Shayan Shahramian, Shawn Zhang, Jun Luo
In light of the need to tweak the target specifications throughout the circuit design cycle, we also develop a variant in which the agent can learn to quickly adapt to draw new inductors for moderately different target specifications.
no code implementations • 21 Oct 2021 • Zichen Zhang, Lang Wang, Shuhao Wang
Breast cancer is the most common cancer among women worldwide.
no code implementations • 29 Sep 2021 • Zichen Zhang, Jun Jin, Martin Jagersand, Jun Luo, Dale Schuurmans
Further, we extend the decentralized approach to sequential decision-making problems where we show in 13 continuous control benchmark environments that it matches or outperforms the state-of-the-art CEM algorithms in most cases, under the same budget of the total number of samples for planning.
no code implementations • 11 Feb 2021 • Roberto Vega, Pouneh Gorji, Zichen Zhang, Xuebin Qin, Abhilash Rakkunedeth Hareendranathan, Jeevesh Kapur, Jacob L. Jaremko, Russell Greiner
This complicates its use in tasks like image-based medical diagnosis, where the small training datasets are usually insufficient to learn appropriate data representations.
5 code implementations • 12 Jan 2021 • Xuebin Qin, Deng-Ping Fan, Chenyang Huang, Cyril Diagne, Zichen Zhang, Adrià Cabeza Sant'Anna, Albert Suàrez, Martin Jagersand, Ling Shao
In this paper, we propose a simple yet powerful Boundary-Aware Segmentation Network (BASNet), which comprises a predict-refine architecture and a hybrid loss, for highly accurate image segmentation.
no code implementations • SEMEVAL 2020 • Junyi Li, Xiaobing Zhou, Zichen Zhang
We only participate in the English part of subtask A, which aims to identify offensive language in English.
29 code implementations • 18 May 2020 • Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, Martin Jagersand
In this paper, we design a simple yet powerful deep network architecture, U$^2$-Net, for salient object detection (SOD).
Ranked #1 on Salient Object Detection on SOD
no code implementations • 19 Dec 2019 • Zichen Zhang, Qingfeng Lan, Lei Ding, Yue Wang, Negar Hassanpour, Russell Greiner
We learn two groups of latent random variables, where one group corresponds to variables that only cause selection bias, and the other group is relevant for outcome prediction.
3 code implementations • 18 Feb 2019 • Max Allan, Alex Shvets, Thomas Kurmann, Zichen Zhang, Rahul Duggal, Yun-Hsuan Su, Nicola Rieke, Iro Laina, Niveditha Kalavakonda, Sebastian Bodenstedt, Luis Herrera, Wenqi Li, Vladimir Iglovikov, Huoling Luo, Jian Yang, Danail Stoyanov, Lena Maier-Hein, Stefanie Speidel, Mahdi Azizian
In mainstream computer vision and machine learning, public datasets such as ImageNet, COCO and KITTI have helped drive enormous improvements by enabling researchers to understand the strengths and limitations of different algorithms via performance comparison.
1 code implementation • 29 Sep 2018 • Jun Jin, Laura Petrich, Masood Dehghan, Zichen Zhang, Martin Jagersand
Our proposed method can directly learn from raw videos, which removes the need for hand-engineered task specification.
Robotics
1 code implementation • 8 Jan 2018 • Zichen Zhang, Min Tang, Dana Cobzas, Dornoosh Zonoobi, Martin Jagersand, Jacob L. Jaremko
We propose an end-to-end neural network that improves the segmentation accuracy of fully convolutional networks by incorporating a localization unit.
no code implementations • 31 Oct 2017 • Min Tang, Zichen Zhang, Dana Cobzas, Martin Jagersand, Jacob L. Jaremko
We propose an attention mechanism for 3D medical image segmentation.
1 code implementation • 10 Aug 2017 • Shida He, Xuebin Qin, Zichen Zhang, Martin Jagersand
This approach reduces a 3D line segment fitting problem into two 2D line segment fitting problems and takes advantage of both images and depth maps.