Search Results for author: ZiRui Wang

Found 41 papers, 27 papers with code

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

no code implementations • 14 Mar 2024 • Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, BoWen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Ankur Jain, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, ZiRui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang

Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons.

Ranked #18 on Visual Question Answering on MM-Vet

In-Context Learning • Visual Question Answering

Improving Language Understanding from Screenshots

1 code implementation • 21 Feb 2024 • Tianyu Gao, ZiRui Wang, Adithya Bhaskar, Danqi Chen

An emerging family of language models (LMs), capable of processing both text and images within a single visual view, has the promise to unlock complex tasks such as chart understanding and UI navigation.

Language Models as Science Tutors

1 code implementation • 16 Feb 2024 • Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Toni Annala, Max Jameson Aragon, Arturo Rodríguez Fanlo, Simon Frieder, Simon Machado, Akshara Prabhakar, Ellie Thieu, Jiachen T. Wang, ZiRui Wang, Xindi Wu, Mengzhou Xia, Wenhan Jia, Jiatong Yu, Jun-Jie Zhu, Zhiyong Jason Ren, Sanjeev Arora, Danqi Chen

We use TutorChat to fine-tune Llemma models with 7B and 34B parameters.

GSM8K • Math • +1

Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response

1 code implementation • 18 Dec 2023 • Junfeng Long, ZiRui Wang, Quanyi Li, Jiawei Gao, Liu Cao, Jiangmiao Pang

Robust locomotion control depends on accurate state estimations.

Contrastive Learning

138

TokenCompose: Grounding Diffusion with Token-level Supervision

1 code implementation • 6 Dec 2023 • ZiRui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu

We present TokenCompose, a Latent Diffusion Model for text-to-image generation that achieves enhanced consistency between user-specified text prompts and model-generated images.

Denoising • Object • +1

Ferret: Refer and Ground Anything Anywhere at Any Granularity

1 code implementation • 11 Oct 2023 • Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, BoWen Zhang, ZiRui Wang, Liangliang Cao, Shih-Fu Chang, Yinfei Yang

We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.

Hallucination • Language Modelling • +1

7,716

A Read Margin Enhancement Circuit with Dynamic Bias Optimization for MRAM

no code implementations • 18 Sep 2023 • Renhe Chen, Albert Lee, ZiRui Wang, Di Wu, Xufeng Kou

This brief introduces a read bias circuit to improve readout yield of magnetic random access memories (MRAMs).

Guiding Image Captioning Models Toward More Specific Captions

no code implementations • ICCV 2023 • Simon Kornblith, Lala Li, ZiRui Wang, Thao Nguyen

We further explore the use of language models to guide the decoding process, obtaining small improvements over the Pareto frontier of reference-free vs. reference-based captioning metrics that arises from classifier-free guidance, and substantially improving the quality of captions generated from a model trained only on minimally curated web data.

Image Captioning • Image Retrieval

SHISRCNet: Super-resolution And Classification Network For Low-resolution Breast Cancer Histopathology Image

1 code implementation • 25 Jun 2023 • Luyuan Xie, Cong Li, ZiRui Wang, Xin Zhang, Boyan Chen, Qingni Shen, Zhonghai Wu

The CF module extracts and fuses the multi-scale features of SR images for classification.

Histopathological Image Classification • Image Classification • +1

Language Models Meet World Models: Embodied Experiences Enhance Language Models

1 code implementation • NeurIPS 2023 • Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, ZiRui Wang, Zichao Yang, Zhiting Hu

While large language models (LMs) have shown remarkable capabilities across numerous tasks, they often struggle with simple reasoning and planning in physical environments, such as understanding object permanence or planning household activities.

MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts

1 code implementation • 18 May 2023 • Qiuhui Chen, Xinyue Hu, ZiRui Wang, Yi Hong

Vision-language pre-training (VLP) models have been demonstrated to be effective in many computer vision applications.

Medical Visual Question Answering • Question Answering • +2

PaLM 2 Technical Report

1 code implementation • 17 May 2023 • Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.

Ranked #1 on Question Answering on StrategyQA

Code Generation • Common Sense Reasoning • +6

Neural Refinement for Absolute Pose Regression with Feature Synthesis

1 code implementation • 17 Mar 2023 • Shuai Chen, Yash Bhalgat, Xinghui Li, Jiawang Bian, Kejie Li, ZiRui Wang, Victor Adrian Prisacariu

To enhance the robustness of our model, we introduce a feature fusion module and a progressive training strategy.

Pose Estimation • regression

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

1 code implementation • CVPR 2023 • Wenjing Bian, ZiRui Wang, Kejie Li, Jia-Wang Bian, Victor Adrian Prisacariu

Recent advances in this direction demonstrate the possibility of jointly optimising a NeRF and camera poses in forward-facing scenes.

Pose Estimation

344

REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

1 code implementation • CVPR 2023 • Ziniu Hu, Ahmet Iscen, Chen Sun, ZiRui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A. Ross, Alireza Fathi

REVEAL consists of four key components: the memory, the encoder, the retriever and the generator.

Ranked #9 on Visual Question Answering (VQA) on OK-VQA

Image Captioning • Language Modelling • +4

2,979

VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners

no code implementations • 9 Dec 2022 • Shen Yan, Tao Zhu, ZiRui Wang, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu

We explore an efficient approach to establish a foundational video-text model.

Ranked #1 on Video Captioning on ActivityNet Captions (using extra training data)

Question Answering • Retrieval • +9

Generative Data Augmentation for Non-IID Problem in Decentralized Clinical Machine Learning

no code implementations • 2 Dec 2022 • ZiRui Wang, Shaoming Duan, Chengyue Wu, Wenhao Lin, Xinyu Zha, Peiyi Han, Chuanyi Liu

To address this problem, we propose a generative augmentation framework in swarm learning called SL-GAN, which augments the non-IID data by generating the synthetic data from participants.

Data Augmentation • Edge-computing • +1

Exploiting Category Names for Few-Shot Classification with Vision-Language Models

no code implementations • 29 Nov 2022 • Taihong Xiao, ZiRui Wang, Liangliang Cao, Jiahui Yu, Shengyang Dai, Ming-Hsuan Yang

Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks.

Classification • Few-Shot Image Classification

On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning

1 code implementation • 19 Oct 2022 • Yifan Xu, Nicklas Hansen, ZiRui Wang, Yung-Chieh Chan, Hao Su, Zhuowen Tu

Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so.

Atari Games 100k • Model-based Reinforcement Learning • +2

AS-IntroVAE: Adversarial Similarity Distance Makes Robust IntroVAE

1 code implementation • 28 Jun 2022 • Changjie Lu, Shen Zheng, ZiRui Wang, Omar Dib, Gaurav Gupta

However, due to the unavailability of an effective metric to evaluate the difference between the real and the fake images, the posterior collapse and the vanishing gradient problem still exist, reducing the fidelity of the synthesized images.

Image Generation

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

2 code implementations • 22 Jun 2022 • Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.

Ranked #1 on Text-to-Image Generation on LAION COCO

Machine Translation • Text-to-Image Generation • +1

505

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

3 code implementations • 9 Jun 2022 • Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning • Math • +1

2,641

CoCa: Contrastive Captioners are Image-Text Foundation Models

5 code implementations • 4 May 2022 • Jiahui Yu, ZiRui Wang, Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, Yonghui Wu

We apply a contrastive loss between unimodal image and text embeddings, in addition to a captioning loss on the multimodal decoder outputs which predicts text tokens autoregressively.

Ranked #1 on Visual Question Answering on VQA v2 test-dev

Action Classification • Image Captioning • +9

8,328

DFNet: Enhance Absolute Pose Regression with Direct Feature Matching

1 code implementation • 1 Apr 2022 • Shuai Chen, Xinghui Li, ZiRui Wang, Victor Adrian Prisacariu

We introduce a camera relocalization pipeline that combines absolute pose regression (APR) and direct feature matching.

Camera Relocalization • Novel View Synthesis • +3

HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images

1 code implementation • 20 Dec 2021 • Meirui Jiang, ZiRui Wang, Qi Dou

Multiple medical institutions collaboratively training a model using federated learning (FL) has become a promising solution for maximizing the potential of data-driven models, yet the non-independent and identically distributed (non-iid) data in medical images is still an outstanding challenge in real-world practice.

Federated Learning • Image Classification • +1

Towards Zero-Label Language Learning

no code implementations • 19 Sep 2021 • ZiRui Wang, Adams Wei Yu, Orhan Firat, Yuan Cao

This paper explores zero-label learning in Natural Language Processing (NLP), whereby no human-annotated data is used anywhere during training and models are trained purely on synthetic data.

Data Augmentation

SimVLM: Simple Visual Language Model Pretraining with Weak Supervision

2 code implementations • ICLR 2022 • ZiRui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao

With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks.

Ranked #4 on Visual Entailment on SNLI-VE val

Image Captioning • Language Modelling • +2

Ray-ONet: Efficient 3D Reconstruction From A Single RGB Image

1 code implementation • 5 Jul 2021 • Wenjing Bian, ZiRui Wang, Kejie Li, Victor Adrian Prisacariu

We propose Ray-ONet to reconstruct detailed 3D models from monocular images efficiently.

3D Reconstruction

NTIRE 2021 Challenge on Perceptual Image Quality Assessment

no code implementations • 7 May 2021 • Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, SungJun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, ZiRui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, Yifan Chen, Yujiu Yang, Yang Li, Tao Zhang, Longtao Feng, Yiting Liao, Junlin Li, William Thong, Jose Costa Pereira, Ales Leonardis, Steven McDonagh, Kele Xu, Lehan Yang, Hengxing Cai, Pengfei Sun, Seyed Mehdi Ayyoubzadeh, Ali Royat, Sid Ahmed Fezza, Dounia Hammou, Wassim Hamidouche, Sewoong Ahn, Gwangjin Yoon, Koki Tsubota, Hiroaki Akutsu, Kiyoharu Aizawa

This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021.

Image Quality Assessment • Image Restoration

Direct-PoseNet: Absolute Pose Regression with Photometric Consistency

1 code implementation • 8 Apr 2021 • Shuai Chen, ZiRui Wang, Victor Prisacariu

We present a relocalization pipeline, which combines an absolute pose regression (APR) network with a novel view synthesis based direct matching module, offering superior accuracy while maintaining low inference time.

Camera Relocalization • Novel View Synthesis • +1

NeRF--: Neural Radiance Fields Without Known Camera Parameters

5 code implementations • 14 Feb 2021 • ZiRui Wang, Shangzhe Wu, Weidi Xie, Min Chen, Victor Adrian Prisacariu

Considering the problem of novel view synthesis (NVS) from only a set of 2D images, we simplify the training process of Neural Radiance Field (NeRF) on forward-facing scenes by removing the requirement of known or pre-computed camera parameters, including both intrinsics and 6DoF poses.

Novel View Synthesis

570

Temporal Cue Guided Video Highlight Detection With Low-Rank Audio-Visual Fusion

no code implementations • ICCV 2021 • Qinghao Ye, Xiyue Shen, Yuan Gao, ZiRui Wang, Qi Bi, Ping Li, Guang Yang

Video highlight detection plays an increasingly important role in social media content filtering; however, it remains highly challenging to develop automated video highlight detection methods because of the lack of temporal annotations (i.e., where the highlight moments are in long videos) for supervised learning.

Highlight Detection • Model Optimization

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models

1 code implementation • ICLR 2021 • ZiRui Wang, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Massively multilingual models subsuming tens or even hundreds of languages pose great challenges to multi-task optimization.

Machine Translation • Multi-Task Learning • +1

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

1 code implementation • EMNLP 2020 • ZiRui Wang, Zachary C. Lipton, Yulia Tsvetkov

Modern multilingual models are trained on concatenated text from multiple languages in hopes of conferring benefits to each (positive transfer), with the most pronounced benefits accruing to low-resource languages.

Meta-Learning

Efficient Meta Lifelong-Learning with Limited Memory

no code implementations • EMNLP 2020 • ZiRui Wang, Sanket Vaibhav Mehta, Barnabás Póczos, Jaime Carbonell

State-of-the-art lifelong language learning methods store past examples in episodic memory and replay them at both training and inference time.

Multi-Task Learning • Question Answering • +2

Neighbourhood-Insensitive Point Cloud Normal Estimation Network

1 code implementation • 23 Aug 2020 • Zirui Wang, Victor Adrian Prisacariu

We introduce a novel self-attention-based normal estimation network that is able to focus softly on relevant points and adjust the softness by learning a temperature parameter, making it able to work naturally and effectively within a large neighbourhood range.

FlowNet3D++: Geometric Losses For Deep Scene Flow Estimation

no code implementations • 3 Dec 2019 • Zirui Wang, Shuda Li, Henry Howard-Jenkins, Victor Adrian Prisacariu, Min Chen

We present FlowNet3D++, a deep scene flow estimation network.

3D Reconstruction • Scene Flow Estimation

Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework

2 code implementations • ICLR 2020 • Zirui Wang, Jiateng Xie, Ruochen Xu, Yiming Yang, Graham Neubig, Jaime Carbonell

Learning multilingual representations of text has proven a successful method for many cross-lingual transfer learning tasks.

Bilingual Lexicon Induction • Cross-Lingual NER • +2

Characterizing and Avoiding Negative Transfer

no code implementations • CVPR 2019 • Zirui Wang, Zihang Dai, Barnabás Póczos, Jaime Carbonell

When labeled data is scarce for a specific target task, transfer learning often offers an effective solution by utilizing data from a related source task.

Transfer Learning

Theoretical Guarantees of Transfer Learning

no code implementations • 14 Oct 2018 • Zirui Wang

Transfer learning has been proven effective when within-target labeled data is scarce.

Transfer Learning

Towards more Reliable Transfer Learning

no code implementations • 6 Jul 2018 • Zirui Wang, Jaime Carbonell

Multi-source transfer learning has been proven effective when within-target labeled data is scarce.

Active Learning • Transfer Learning
