Search Results for author: Yuan Zhang

Found 148 papers, 60 papers with code

SASE: A Searching Architecture for Squeeze and Excitation Operations

no code implementations13 Nov 2024 Hanming Wang, YunLong Li, Zijun Wu, Huifen Wang, Yuan Zhang

Neural Architecture Search (NAS), meanwhile, is able to automate the design of neural networks and spares the numerous experiments required for an optimal architecture.

Neural Architecture Search

Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures

1 code implementation10 Oct 2024 Yiming Chen, Yuan Zhang, Liyuan Cao, Kun Yuan, Zaiwen Wen

However, traditional first-order (FO) fine-tuning algorithms incur substantial memory overhead due to the need to store activation values for back-propagation during gradient computation, particularly in long-context fine-tuning tasks.

parameter-efficient fine-tuning

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

1 code implementation6 Oct 2024 Yuan Zhang, Chun-Kai Fan, Junpeng Ma, Wenzhao Zheng, Tao Huang, Kuan Cheng, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Shanghang Zhang

In vision-language models (VLMs), visual tokens usually consume a significant amount of computational overhead, despite their sparser information density compared to text tokens.

Language Modelling Video Understanding

Generic Diagonalizability, Structural Functional Observability and Output Controllability

1 code implementation25 Sep 2024 Yuan Zhang, Tyrone Fernando, Mohamed Darouach

This paper investigates the structural functional observability (SFO) and structural output controllability (SOC) of a class of systems with generically diagonalizable state matrices and explores the associated minimal sensor and actuator placement problems.

A Novel Perspective for Multi-modal Multi-label Skin Lesion Classification

no code implementations19 Sep 2024 Yuan Zhang, Yutong Xie, Hu Wang, Jodie C Avery, M Louise Hull, Gustavo Carneiro

The efficacy of deep learning-based Computer-Aided Diagnosis (CAD) methods for skin diseases relies on analyzing multiple data modalities (i. e., clinical+dermoscopic images, and patient metadata) and addressing the challenges of multi-label classification.

Lesion Classification Multi-Label Classification +1

Human-AI Collaborative Multi-modal Multi-rater Learning for Endometriosis Diagnosis

no code implementations3 Sep 2024 Hu Wang, David Butler, Yuan Zhang, Jodie Avery, Steven Knox, Congbo Ma, Louise Hull, Gustavo Carneiro

HAICOMM is the first method that explores three important aspects of this problem: 1) multi-rater learning to extract a cleaner label from the multiple "noisy" labels available per training sample; 2) multi-modal learning to leverage the presence of T1/T2 MRI images for training and testing; and 3) human-AI collaboration to build a system that leverages the predictions from clinicians and the AI model to provide more accurate classification than standalone clinicians and AI models.

La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection

no code implementations23 Aug 2024 Hang Zou, Chenxi Du, HUI ZHANG, Yuan Zhang, Ajian Liu, Jun Wan, Zhen Lei

Some studies attempt to combine the sparse data from both types of attacks into a single dataset and try to find a common feature space, which is often impractical due to the space is difficult to be found or even non-existent.

A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method

no code implementations19 Aug 2024 Hang Zou, Chenxi Du, Ajian Liu, Yuan Zhang, Jing Liu, MingChuan Yang, Jun Wan, HUI ZHANG

Specifically, we utilize the Mixture of Experts (MoE) to fit complex data distributions using multiple sub-neural networks.

Iris Recognition

Minimal Sensor Placement for Generic State and Unknown Input Observability

no code implementations19 Aug 2024 Ranbo Cheng, Yuan Zhang, Amin MD Al, Yuanqing Xia

Based on these conditions, we prove that determining the minimum number of dedicated sensors to achieve generic state and input observability is NP-hard, which contrasts sharply with the polynomial-time complexity of the corresponding problem with known inputs.

RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction

1 code implementation16 Aug 2024 Xiucheng Wang, Keda Tao, Nan Cheng, Zhisheng Yin, Zan Li, Yuan Zhang, Xuemin Shen

Thus, to enhance RM construction performance, in this paper, the sampling-free RM construction is modeled as a conditional generative problem, where a denoised diffusion-based method, named RadioDiff, is proposed to achieve high-quality RM construction.

PlacidDreamer: Advancing Harmony in Text-to-3D Generation

1 code implementation19 Jul 2024 Shuo Huang, Shikun Sun, Zixuan Wang, Xiaoyu Qin, Yanmin Xiong, Yuan Zhang, Pengfei Wan, Di Zhang, Jia Jia

Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms.

3D Generation Text to 3D

4Dynamic: Text-to-4D Generation with Hybrid Priors

no code implementations17 Jul 2024 Yu-Jie Yuan, Leif Kobbelt, Jiwen Liu, Yuan Zhang, Pengfei Wan, Yu-Kun Lai, Lin Gao

The static 3D generation is achieved under the guidance of the input text and the first frame of the reference video, while in the dynamic generation stage, we introduce a customized SDS loss to ensure multi-view consistency, a video-based SDS loss to improve temporal consistency, and most importantly, direct priors from the reference video to ensure the quality of geometry and texture.

3D Generation Text to 3D

Latent Linear Quadratic Regulator for Robotic Control Tasks

no code implementations15 Jul 2024 Yuan Zhang, Shaohui Yang, Toshiyuki Ohtsuka, Colin Jones, Joschka Boedecker

Model predictive control (MPC) has played a more crucial role in various robotic control tasks, but its high computational requirements are concerning, especially for nonlinear dynamical models.

Model Predictive Control

DSCENet: Dynamic Screening and Clinical-Enhanced Multimodal Fusion for MPNs Subtype Classification

1 code implementation11 Jul 2024 Yuan Zhang, Yaolei Qi, Xiaoming Qi, Yongyue Wei, Guanyu Yang

(1) A dynamic screening module is proposed to flexibly adapt the feature learning of local patches, reducing the interference of irrelevant features and enhancing their diagnostic representativeness.

whole slide images

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

1 code implementation3 Jul 2024 Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang

Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability.

Computational Efficiency Face Reenactment +3

ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding

1 code implementation17 Jun 2024 Tianren Ma, Lingxi Xie, Yunjie Tian, Boyu Yang, Yuan Zhang, David Doermann, Qixiang Ye

Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location, bringing extra burdens in training MLLMs to communicate between language and vision.

Decoder Visual Reasoning

Artemis: Towards Referential Understanding in Complex Videos

1 code implementation1 Jun 2024 Jihao Qiu, Yuan Zhang, Xi Tang, Lingxi Xie, Tianren Ma, Pengyu Yan, David Doermann, Qixiang Ye, Yunjie Tian

Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fell short in referential understanding scenarios such as video-based referring.

Text Summarization Video Grounding

Benchmarking and Improving Detail Image Caption

1 code implementation29 May 2024 Hongyuan Dong, Jiawen Li, Bohong Wu, Jiacong Wang, Yuan Zhang, Haoyuan Guo

We also design a more reliable caption evaluation metric called CAPTURE (CAPtion evaluation by exTracting and coUpling coRE information).

Benchmarking Image Captioning +1

SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance

no code implementations24 May 2024 Guibao Shen, Luozhou Wang, Jiantao Lin, Wenhang Ge, Chaozhe Zhang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Guangyong Chen, Yijun Li, Ying-Cong Chen

In this paper, we introduce the Scene Graph Adapter(SG-Adapter), leveraging the structured representation of scene graphs to rectify inaccuracies in the original text embeddings.

Text-to-Image Generation

Unveiling the Tapestry of Consistency in Large Vision-Language Models

1 code implementation23 May 2024 Yuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan Guo

To this end, we provide a multi-modal benchmark ConBench, to intuitively analyze how LVLMs perform when the solution space of a prompt revolves around a knowledge point.

A rapid approach to urban traffic noise mapping with a generative adversarial network

no code implementations21 May 2024 Xinhao Yang, Zhen Han, Xiaodong Lu, Yuan Zhang

With rapid urbanisation and the accompanying increase in traffic density, traffic noise has become a major concern in urban planning.

Generative Adversarial Network SSIM

UDUC: An Uncertainty-driven Approach for Learning-based Robust Control

no code implementations4 May 2024 Yuan Zhang, Jasper Hoffmann, Joschka Boedecker

Learning-based techniques have become popular in both model predictive control (MPC) and reinforcement learning (RL).

Contrastive Learning Model Predictive Control +3

Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios

no code implementations11 Apr 2024 Yuan Zhang, Xiaomei Tao, Hanxu Ai, Tao Chen, Yanling Gan

The method proposed in this paper not only contributes to a deeper understanding of the impact of instructional videos on learners' emotional states but also provides a beneficial reference for future research on emotion recognition in MOOC learning scenarios.

Language Modelling Large Language Model +2

ReALM: Reference Resolution As Language Modeling

no code implementations29 Mar 2024 Joel Ruben Antony Moniz, Soundarya Krishnan, Melis Ozyildirim, Prathamesh Saraf, Halim Cagri Ates, Yuan Zhang, Hong Yu

Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds.

Language Modelling

Constrained Reinforcement Learning with Smoothed Log Barrier Function

no code implementations21 Mar 2024 Baohe Zhang, Yuan Zhang, Lilli Frison, Thomas Brox, Joschka Bödecker

However, for many real-world problems, it is often more convenient to formulate optimization problems in terms of rewards and constraints simultaneously.

reinforcement-learning Reinforcement Learning +1

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

1 code implementation8 Mar 2024 Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, Paul Voigtlaender, Rohan Jain, Gabriela Surita, Kareem Mohamed, Rory Blevins, Junwhan Ahn, Tao Zhu, Kornraphop Kawintiranon, Orhan Firat, Yiming Gu, Yujing Zhang, Matthew Rahtz, Manaal Faruqui, Natalie Clay, Justin Gilmer, JD Co-Reyes, Ivo Penchev, Rui Zhu, Nobuyuki Morioka, Kevin Hui, Krishna Haridasan, Victor Campos, Mahdis Mahdieh, Mandy Guo, Samer Hassan, Kevin Kilgour, Arpi Vezer, Heng-Tze Cheng, Raoul de Liedekerke, Siddharth Goyal, Paul Barham, DJ Strouse, Seb Noury, Jonas Adler, Mukund Sundararajan, Sharad Vikram, Dmitry Lepikhin, Michela Paganini, Xavier Garcia, Fan Yang, Dasha Valter, Maja Trebacz, Kiran Vodrahalli, Chulayuth Asawaroengchai, Roman Ring, Norbert Kalb, Livio Baldini Soares, Siddhartha Brahma, David Steiner, Tianhe Yu, Fabian Mentzer, Antoine He, Lucas Gonzalez, Bibo Xu, Raphael Lopez Kaufman, Laurent El Shafey, Junhyuk Oh, Tom Hennigan, George van den Driessche, Seth Odoom, Mario Lucic, Becca Roelofs, Sid Lall, Amit Marathe, Betty Chan, Santiago Ontanon, Luheng He, Denis Teplyashin, Jonathan Lai, Phil Crone, Bogdan Damoc, Lewis Ho, Sebastian Riedel, Karel Lenc, Chih-Kuan Yeh, Aakanksha Chowdhery, Yang Xu, Mehran Kazemi, Ehsan Amid, Anastasia Petrushkina, Kevin Swersky, Ali Khodaei, Gowoon Chen, Chris Larkin, Mario Pinto, Geng Yan, Adria Puigdomenech Badia, Piyush Patil, Steven Hansen, Dave Orr, Sebastien M. R. Arnold, Jordan Grimstad, Andrew Dai, Sholto Douglas, Rishika Sinha, Vikas Yadav, Xi Chen, Elena Gribovskaya, Jacob Austin, Jeffrey Zhao, Kaushal Patel, Paul Komarek, Sophia Austin, Sebastian Borgeaud, Linda Friso, Abhimanyu Goyal, Ben Caine, Kris Cao, Da-Woon Chung, Matthew Lamm, Gabe Barth-Maron, Thais Kagohara, Kate Olszewska, Mia Chen, Kaushik Shivakumar, Rishabh Agarwal, Harshal Godhia, Ravi Rajwar, Javier Snaider, Xerxes Dotiwalla, YuAn Liu, Aditya Barua, Victor Ungureanu, Yuan Zhang, Bat-Orgil Batsaikhan, Mateo Wirth, James Qin, Ivo Danihelka, Tulsee Doshi, Martin Chadwick, Jilin Chen, Sanil Jain, Quoc Le, Arjun Kar, Madhu Gurumurthy, Cheng Li, Ruoxin Sang, Fangyu Liu, Lampros Lamprou, Rich Munoz, Nathan Lintz, Harsh Mehta, Heidi Howard, Malcolm Reynolds, Lora Aroyo, Quan Wang, Lorenzo Blanco, Albin Cassirer, Jordan Griffith, Dipanjan Das, Stephan Lee, Jakub Sygnowski, Zach Fisher, James Besley, Richard Powell, Zafarali Ahmed, Dominik Paulus, David Reitter, Zalan Borsos, Rishabh Joshi, Aedan Pope, Steven Hand, Vittorio Selo, Vihan Jain, Nikhil Sethi, Megha Goel, Takaki Makino, Rhys May, Zhen Yang, Johan Schalkwyk, Christina Butterfield, Anja Hauth, Alex Goldin, Will Hawkins, Evan Senter, Sergey Brin, Oliver Woodman, Marvin Ritter, Eric Noland, Minh Giang, Vijay Bolina, Lisa Lee, Tim Blyth, Ian Mackinnon, Machel Reid, Obaid Sarvana, David Silver, Alexander Chen, Lily Wang, Loren Maggiore, Oscar Chang, Nithya Attaluri, Gregory Thornton, Chung-Cheng Chiu, Oskar Bunyan, Nir Levine, Timothy Chung, Evgenii Eltyshev, Xiance Si, Timothy Lillicrap, Demetra Brady, Vaibhav Aggarwal, Boxi Wu, Yuanzhong Xu, Ross Mcilroy, Kartikeya Badola, Paramjit Sandhu, Erica Moreira, Wojciech Stokowiec, Ross Hemsley, Dong Li, Alex Tudor, Pranav Shyam, Elahe Rahimtoroghi, Salem Haykal, Pablo Sprechmann, Xiang Zhou, Diana Mincu, Yujia Li, Ravi Addanki, Kalpesh Krishna, Xiao Wu, Alexandre Frechette, Matan Eyal, Allan Dafoe, Dave Lacey, Jay Whang, Thi Avrahami, Ye Zhang, Emanuel Taropa, Hanzhao Lin, Daniel Toyama, Eliza Rutherford, Motoki Sano, HyunJeong Choe, Alex Tomala, Chalence Safranek-Shrader, Nora Kassner, Mantas Pajarskas, Matt Harvey, Sean Sechrist, Meire Fortunato, Christina Lyu, Gamaleldin Elsayed, Chenkai Kuang, James Lottes, Eric Chu, Chao Jia, Chih-Wei Chen, Peter Humphreys, Kate Baumli, Connie Tao, Rajkumar Samuel, Cicero Nogueira dos santos, Anders Andreassen, Nemanja Rakićević, Dominik Grewe, Aviral Kumar, Stephanie Winkler, Jonathan Caton, Andrew Brock, Sid Dalmia, Hannah Sheahan, Iain Barr, Yingjie Miao, Paul Natsev, Jacob Devlin, Feryal Behbahani, Flavien Prost, Yanhua Sun, Artiom Myaskovsky, Thanumalayan Sankaranarayana Pillai, Dan Hurt, Angeliki Lazaridou, Xi Xiong, Ce Zheng, Fabio Pardo, Dan Horgan, Joe Stanton, Moran Ambar, Fei Xia, Alejandro Lince, Mingqiu Wang, Basil Mustafa, Albert Webson, Hyo Lee, Rohan Anil, Martin Wicke, Timothy Dozat, Abhishek Sinha, Enrique Piqueras, Elahe Dabir, Shyam Upadhyay, Anudhyan Boral, Lisa Anne Hendricks, Corey Fry, Josip Djolonga, Yi Su, Jake Walker, Jane Labanowski, Ronny Huang, Vedant Misra, Jeremy Chen, RJ Skerry-Ryan, Avi Singh, Shruti Rijhwani, Dian Yu, Alex Castro-Ros, Beer Changpinyo, Romina Datta, Sumit Bagri, Arnar Mar Hrafnkelsson, Marcello Maggioni, Daniel Zheng, Yury Sulsky, Shaobo Hou, Tom Le Paine, Antoine Yang, Jason Riesa, Dominika Rogozinska, Dror Marcus, Dalia El Badawy, Qiao Zhang, Luyu Wang, Helen Miller, Jeremy Greer, Lars Lowe Sjos, Azade Nova, Heiga Zen, Rahma Chaabouni, Mihaela Rosca, Jiepu Jiang, Charlie Chen, Ruibo Liu, Tara Sainath, Maxim Krikun, Alex Polozov, Jean-Baptiste Lespiau, Josh Newlan, Zeyncep Cankara, Soo Kwak, Yunhan Xu, Phil Chen, Andy Coenen, Clemens Meyer, Katerina Tsihlas, Ada Ma, Juraj Gottweis, Jinwei Xing, Chenjie Gu, Jin Miao, Christian Frank, Zeynep Cankara, Sanjay Ganapathy, Ishita Dasgupta, Steph Hughes-Fitt, Heng Chen, David Reid, Keran Rong, Hongmin Fan, Joost van Amersfoort, Vincent Zhuang, Aaron Cohen, Shixiang Shane Gu, Anhad Mohananey, Anastasija Ilic, Taylor Tobin, John Wieting, Anna Bortsova, Phoebe Thacker, Emma Wang, Emily Caveness, Justin Chiu, Eren Sezener, Alex Kaskasoli, Steven Baker, Katie Millican, Mohamed Elhawaty, Kostas Aisopos, Carl Lebsack, Nathan Byrd, Hanjun Dai, Wenhao Jia, Matthew Wiethoff, Elnaz Davoodi, Albert Weston, Lakshman Yagati, Arun Ahuja, Isabel Gao, Golan Pundak, Susan Zhang, Michael Azzam, Khe Chai Sim, Sergi Caelles, James Keeling, Abhanshu Sharma, Andy Swing, Yaguang Li, Chenxi Liu, Carrie Grimes Bostock, Yamini Bansal, Zachary Nado, Ankesh Anand, Josh Lipschultz, Abhijit Karmarkar, Lev Proleev, Abe Ittycheriah, Soheil Hassas Yeganeh, George Polovets, Aleksandra Faust, Jiao Sun, Alban Rrustemi, Pen Li, Rakesh Shivanna, Jeremiah Liu, Chris Welty, Federico Lebron, Anirudh Baddepudi, Sebastian Krause, Emilio Parisotto, Radu Soricut, Zheng Xu, Dawn Bloxwich, Melvin Johnson, Behnam Neyshabur, Justin Mao-Jones, Renshen Wang, Vinay Ramasesh, Zaheer Abbas, Arthur Guez, Constant Segal, Duc Dung Nguyen, James Svensson, Le Hou, Sarah York, Kieran Milan, Sophie Bridgers, Wiktor Gworek, Marco Tagliasacchi, James Lee-Thorp, Michael Chang, Alexey Guseynov, Ale Jakse Hartman, Michael Kwong, Ruizhe Zhao, Sheleem Kashem, Elizabeth Cole, Antoine Miech, Richard Tanburn, Mary Phuong, Filip Pavetic, Sebastien Cevey, Ramona Comanescu, Richard Ives, Sherry Yang, Cosmo Du, Bo Li, Zizhao Zhang, Mariko Iinuma, Clara Huiyi Hu, Aurko Roy, Shaan Bijwadia, Zhenkai Zhu, Danilo Martins, Rachel Saputro, Anita Gergely, Steven Zheng, Dawei Jia, Ioannis Antonoglou, Adam Sadovsky, Shane Gu, Yingying Bi, Alek Andreev, Sina Samangooei, Mina Khan, Tomas Kocisky, Angelos Filos, Chintu Kumar, Colton Bishop, Adams Yu, Sarah Hodkinson, Sid Mittal, Premal Shah, Alexandre Moufarek, Yong Cheng, Adam Bloniarz, Jaehoon Lee, Pedram Pejman, Paul Michel, Stephen Spencer, Vladimir Feinberg, Xuehan Xiong, Nikolay Savinov, Charlotte Smith, Siamak Shakeri, Dustin Tran, Mary Chesus, Bernd Bohnet, George Tucker, Tamara von Glehn, Carrie Muir, Yiran Mao, Hideto Kazawa, Ambrose Slone, Kedar Soparkar, Disha Shrivastava, James Cobon-Kerr, Michael Sharman, Jay Pavagadhi, Carlos Araya, Karolis Misiunas, Nimesh Ghelani, Michael Laskin, David Barker, Qiujia Li, Anton Briukhov, Neil Houlsby, Mia Glaese, Balaji Lakshminarayanan, Nathan Schucher, Yunhao Tang, Eli Collins, Hyeontaek Lim, Fangxiaoyu Feng, Adria Recasens, Guangda Lai, Alberto Magni, Nicola De Cao, Aditya Siddhant, Zoe Ashwood, Jordi Orbay, Mostafa Dehghani, Jenny Brennan, Yifan He, Kelvin Xu, Yang Gao, Carl Saroufim, James Molloy, Xinyi Wu, Seb Arnold, Solomon Chang, Julian Schrittwieser, Elena Buchatskaya, Soroush Radpour, Martin Polacek, Skye Giordano, Ankur Bapna, Simon Tokumine, Vincent Hellendoorn, Thibault Sottiaux, Sarah Cogan, Aliaksei Severyn, Mohammad Saleh, Shantanu Thakoor, Laurent Shefey, Siyuan Qiao, Meenu Gaba, Shuo-Yiin Chang, Craig Swanson, Biao Zhang, Benjamin Lee, Paul Kishan Rubenstein, Gan Song, Tom Kwiatkowski, Anna Koop, Ajay Kannan, David Kao, Parker Schuh, Axel Stjerngren, Golnaz Ghiasi, Gena Gibson, Luke Vilnis, Ye Yuan, Felipe Tiengo Ferreira, Aishwarya Kamath, Ted Klimenko, Ken Franko, Kefan Xiao, Indro Bhattacharya, Miteyan Patel, Rui Wang, Alex Morris, Robin Strudel, Vivek Sharma, Peter Choy, Sayed Hadi Hashemi, Jessica Landon, Mara Finkelstein, Priya Jhakra, Justin Frye, Megan Barnes, Matthew Mauger, Dennis Daun, Khuslen Baatarsukh, Matthew Tung, Wael Farhan, Henryk Michalewski, Fabio Viola, Felix de Chaumont Quitry, Charline Le Lan, Tom Hudson, Qingze Wang, Felix Fischer, Ivy Zheng, Elspeth White, Anca Dragan, Jean-Baptiste Alayrac, Eric Ni, Alexander Pritzel, Adam Iwanicki, Michael Isard, Anna Bulanova, Lukas Zilka, Ethan Dyer, Devendra Sachan, Srivatsan Srinivasan, Hannah Muckenhirn, Honglong Cai, Amol Mandhane, Mukarram Tariq, Jack W. Rae, Gary Wang, Kareem Ayoub, Nicholas FitzGerald, Yao Zhao, Woohyun Han, Chris Alberti, Dan Garrette, Kashyap Krishnakumar, Mai Gimenez, Anselm Levskaya, Daniel Sohn, Josip Matak, Inaki Iturrate, Michael B. Chang, Jackie Xiang, Yuan Cao, Nishant Ranka, Geoff Brown, Adrian Hutter, Nanxin Chen, Kaisheng Yao, Zoltan Egyed, Francois Galilee, Tyler Liechty, Praveen Kallakuri, Evan Palmer, Sanjay Ghemawat, Jasmine Liu, David Tao, Chloe Thornton, Tim Green, Mimi Jasarevic, Sharon Lin, Victor Cotruta, Yi-Xuan Tan, Noah Fiedel, Hongkun Yu, Ed Chi, Alexander Neitz, Jens Heitkaemper, Anu Sinha, Denny Zhou, Yi Sun, Charbel Kaed, Brice Hulse, Swaroop Mishra, Maria Georgaki, Sneha Kudugunta, Clement Farabet, Izhak Shafran, Daniel Vlasic, Anton Tsitsulin, Rajagopal Ananthanarayanan, Alen Carin, Guolong Su, Pei Sun, Shashank V, Gabriel Carvajal, Josef Broder, Iulia Comsa, Alena Repina, William Wong, Warren Weilun Chen, Peter Hawkins, Egor Filonov, Lucia Loher, Christoph Hirnschall, Weiyi Wang, Jingchen Ye, Andrea Burns, Hardie Cate, Diana Gage Wright, Federico Piccinini, Lei Zhang, Chu-Cheng Lin, Ionel Gog, Yana Kulizhskaya, Ashwin Sreevatsa, Shuang Song, Luis C. Cobo, Anand Iyer, Chetan Tekur, Guillermo Garrido, Zhuyun Xiao, Rupert Kemp, Huaixiu Steven Zheng, Hui Li, Ananth Agarwal, Christel Ngani, Kati Goshvadi, Rebeca Santamaria-Fernandez, Wojciech Fica, Xinyun Chen, Chris Gorgolewski, Sean Sun, Roopal Garg, Xinyu Ye, S. M. Ali Eslami, Nan Hua, Jon Simon, Pratik Joshi, Yelin Kim, Ian Tenney, Sahitya Potluri, Lam Nguyen Thiet, Quan Yuan, Florian Luisier, Alexandra Chronopoulou, Salvatore Scellato, Praveen Srinivasan, Minmin Chen, Vinod Koverkathu, Valentin Dalibard, Yaming Xu, Brennan Saeta, Keith Anderson, Thibault Sellam, Nick Fernando, Fantine Huot, Junehyuk Jung, Mani Varadarajan, MICHAEL QUINN, Amit Raul, Maigo Le, Ruslan Habalov, Jon Clark, Komal Jalan, Kalesha Bullard, Achintya Singhal, Thang Luong, Boyu Wang, Sujeevan Rajayogam, Julian Eisenschlos, Johnson Jia, Daniel Finchelstein, Alex Yakubovich, Daniel Balle, Michael Fink, Sameer Agarwal, Jing Li, DJ Dvijotham, Shalini Pal, Kai Kang, Jaclyn Konzelmann, Jennifer Beattie, Olivier Dousse, Diane Wu, Remi Crocker, Chen Elkind, Siddhartha Reddy Jonnalagadda, Jong Lee, Dan Holtmann-Rice, Krystal Kallarackal, Rosanne Liu, Denis Vnukov, Neera Vats, Luca Invernizzi, Mohsen Jafari, Huanjie Zhou, Lilly Taylor, Jennifer Prendki, Marcus Wu, Tom Eccles, Tianqi Liu, Kavya Kopparapu, Francoise Beaufays, Christof Angermueller, Andreea Marzoca, Shourya Sarcar, Hilal Dib, Jeff Stanway, Frank Perbet, Nejc Trdin, Rachel Sterneck, Andrey Khorlin, Dinghua Li, Xihui Wu, Sonam Goenka, David Madras, Sasha Goldshtein, Willi Gierke, Tong Zhou, Yaxin Liu, Yannie Liang, Anais White, Yunjie Li, Shreya Singh, Sanaz Bahargam, Mark Epstein, Sujoy Basu, Li Lao, Adnan Ozturel, Carl Crous, Alex Zhai, Han Lu, Zora Tung, Neeraj Gaur, Alanna Walton, Lucas Dixon, Ming Zhang, Amir Globerson, Grant Uy, Andrew Bolt, Olivia Wiles, Milad Nasr, Ilia Shumailov, Marco Selvi, Francesco Piccinno, Ricardo Aguilar, Sara McCarthy, Misha Khalman, Mrinal Shukla, Vlado Galic, John Carpenter, Kevin Villela, Haibin Zhang, Harry Richardson, James Martens, Matko Bosnjak, Shreyas Rammohan Belle, Jeff Seibert, Mahmoud Alnahlawi, Brian McWilliams, Sankalp Singh, Annie Louis, Wen Ding, Dan Popovici, Lenin Simicich, Laura Knight, Pulkit Mehta, Nishesh Gupta, Chongyang Shi, Saaber Fatehi, Jovana Mitrovic, Alex Grills, Joseph Pagadora, Dessie Petrova, Danielle Eisenbud, Zhishuai Zhang, Damion Yates, Bhavishya Mittal, Nilesh Tripuraneni, Yannis Assael, Thomas Brovelli, Prateek Jain, Mihajlo Velimirovic, Canfer Akbulut, Jiaqi Mu, Wolfgang Macherey, Ravin Kumar, Jun Xu, Haroon Qureshi, Gheorghe Comanici, Jeremy Wiesner, Zhitao Gong, Anton Ruddock, Matthias Bauer, Nick Felt, Anirudh GP, Anurag Arnab, Dustin Zelle, Jonas Rothfuss, Bill Rosgen, Ashish Shenoy, Bryan Seybold, Xinjian Li, Jayaram Mudigonda, Goker Erdogan, Jiawei Xia, Jiri Simsa, Andrea Michi, Yi Yao, Christopher Yew, Steven Kan, Isaac Caswell, Carey Radebaugh, Andre Elisseeff, Pedro Valenzuela, Kay McKinney, Kim Paterson, Albert Cui, Eri Latorre-Chimoto, Solomon Kim, William Zeng, Ken Durden, Priya Ponnapalli, Tiberiu Sosea, Christopher A. Choquette-Choo, James Manyika, Brona Robenek, Harsha Vashisht, Sebastien Pereira, Hoi Lam, Marko Velic, Denese Owusu-Afriyie, Katherine Lee, Tolga Bolukbasi, Alicia Parrish, Shawn Lu, Jane Park, Balaji Venkatraman, Alice Talbert, Lambert Rosique, Yuchung Cheng, Andrei Sozanschi, Adam Paszke, Praveen Kumar, Jessica Austin, Lu Li, Khalid Salama, Wooyeol Kim, Nandita Dukkipati, Anthony Baryshnikov, Christos Kaplanis, XiangHai Sheng, Yuri Chervonyi, Caglar Unlu, Diego de Las Casas, Harry Askham, Kathryn Tunyasuvunakool, Felix Gimeno, Siim Poder, Chester Kwak, Matt Miecnikowski, Vahab Mirrokni, Alek Dimitriev, Aaron Parisi, Dangyi Liu, Tomy Tsai, Toby Shevlane, Christina Kouridi, Drew Garmon, Adrian Goedeckemeyer, Adam R. Brown, Anitha Vijayakumar, Ali Elqursh, Sadegh Jazayeri, Jin Huang, Sara Mc Carthy, Jay Hoover, Lucy Kim, Sandeep Kumar, Wei Chen, Courtney Biles, Garrett Bingham, Evan Rosen, Lisa Wang, Qijun Tan, David Engel, Francesco Pongetti, Dario de Cesare, Dongseong Hwang, Lily Yu, Jennifer Pullman, Srini Narayanan, Kyle Levin, Siddharth Gopal, Megan Li, Asaf Aharoni, Trieu Trinh, Jessica Lo, Norman Casagrande, Roopali Vij, Loic Matthey, Bramandia Ramadhana, Austin Matthews, CJ Carey, Matthew Johnson, Kremena Goranova, Rohin Shah, Shereen Ashraf, Kingshuk Dasgupta, Rasmus Larsen, Yicheng Wang, Manish Reddy Vuyyuru, Chong Jiang, Joana Ijazi, Kazuki Osawa, Celine Smith, Ramya Sree Boppana, Taylan Bilal, Yuma Koizumi, Ying Xu, Yasemin Altun, Nir Shabat, Ben Bariach, Alex Korchemniy, Kiam Choo, Olaf Ronneberger, Chimezie Iwuanyanwu, Shubin Zhao, David Soergel, Cho-Jui Hsieh, Irene Cai, Shariq Iqbal, Martin Sundermeyer, Zhe Chen, Elie Bursztein, Chaitanya Malaviya, Fadi Biadsy, Prakash Shroff, Inderjit Dhillon, Tejasi Latkar, Chris Dyer, Hannah Forbes, Massimo Nicosia, Vitaly Nikolaev, Somer Greene, Marin Georgiev, Pidong Wang, Nina Martin, Hanie Sedghi, John Zhang, Praseem Banzal, Doug Fritz, Vikram Rao, Xuezhi Wang, Jiageng Zhang, Viorica Patraucean, Dayou Du, Igor Mordatch, Ivan Jurin, Lewis Liu, Ayush Dubey, Abhi Mohan, Janek Nowakowski, Vlad-Doru Ion, Nan Wei, Reiko Tojo, Maria Abi Raad, Drew A. Hudson, Vaishakh Keshava, Shubham Agrawal, Kevin Ramirez, Zhichun Wu, Hoang Nguyen, Ji Liu, Madhavi Sewak, Bryce Petrini, DongHyun Choi, Ivan Philips, Ziyue Wang, Ioana Bica, Ankush Garg, Jarek Wilkiewicz, Priyanka Agrawal, Xiaowei Li, Danhao Guo, Emily Xue, Naseer Shaik, Andrew Leach, Sadh MNM Khan, Julia Wiesinger, Sammy Jerome, Abhishek Chakladar, Alek Wenjiao Wang, Tina Ornduff, Folake Abu, Alireza Ghaffarkhah, Marcus Wainwright, Mario Cortes, Frederick Liu, Joshua Maynez, Andreas Terzis, Pouya Samangouei, Riham Mansour, Tomasz Kępa, François-Xavier Aubet, Anton Algymr, Dan Banica, Agoston Weisz, Andras Orban, Alexandre Senges, Ewa Andrejczuk, Mark Geller, Niccolo Dal Santo, Valentin Anklin, Majd Al Merey, Martin Baeuml, Trevor Strohman, Junwen Bai, Slav Petrov, Yonghui Wu, Demis Hassabis, Koray Kavukcuoglu, Jeffrey Dean, Oriol Vinyals

In this report, we introduce the Gemini 1. 5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio.

1 Image, 2*2 Stitching Code Generation +7

Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer

no code implementations6 Mar 2024 Naifu Xue, Qi Mao, Zijian Wang, Yuan Zhang, Siwei Ma

Recent progress in generative compression technology has significantly improved the perceptual quality of compressed data.

Image Generation

"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach

1 code implementation1 Mar 2024 Lingyu Gu, Yongqi Du, Yuan Zhang, Di Xie, ShiLiang Pu, Robert C. Qiu, Zhenyu Liao

Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of increased depth and having more parameters per layer, making their training and inference more computationally challenging.

Model Compression Quantization

Open Ad Hoc Teamwork with Cooperative Game Theory

1 code implementation23 Feb 2024 Jianhong Wang, Yang Li, Yuan Zhang, Wei Pan, Samuel Kaski

Ad hoc teamwork poses a challenging problem, requiring the design of an agent to collaborate with teammates without prior coordination or joint training.

Instruction Backdoor Attacks Against Customized LLMs

1 code implementation14 Feb 2024 Rui Zhang, Hongwei Li, Rui Wen, Wenbo Jiang, Yuan Zhang, Michael Backes, Yun Shen, Yang Zhang

The increasing demand for customized Large Language Models (LLMs) has led to the development of solutions like GPTs.

Language Modelling Large Language Model +2

Understanding Time Series Anomaly State Detection through One-Class Classification

no code implementations3 Feb 2024 Hanxu Zhou, Yuan Zhang, Guangjie Leng, Ruofan Wang, Zhi-Qin John Xu

Therefore, in this article, we try to re-understand and define the time series anomaly detection problem through OCC, which we call 'time series anomaly state detection problem'.

One-Class Classification Time Series +2

Can Large Language Models Understand Context?

no code implementations1 Feb 2024 YIlun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng

Understanding context is key to understanding human language, an ability which Large Language Models (LLMs) have been increasingly seen to demonstrate to an impressive extent.

In-Context Learning Quantization

ChatterBox: Multi-round Multimodal Referring and Grounding

1 code implementation24 Jan 2024 Yunjie Tian, Tianren Ma, Lingxi Xie, Jihao Qiu, Xi Tang, Yuan Zhang, Jianbin Jiao, Qi Tian, Qixiang Ye

In this study, we establish a baseline for a new task named multimodal multi-round referring and grounding (MRG), opening up a promising direction for instance-level multimodal dialogues.

Language Modelling Visual Grounding

Enhancing the Power of OOD Detection via Sample-Aware Model Selection

no code implementations CVPR 2024 Feng Xue, Zi He, Yuan Zhang, Chuanlong Xie, Zhenguo Li, Falong Tan

In this work we present a novel perspective on detecting out-of-distribution (OOD) samples and propose an algorithm for sample-aware model selection to enhance the effectiveness of OOD detection.

Model Selection

Cloud-Device Collaborative Learning for Multimodal Large Language Models

no code implementations CVPR 2024 Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang

However, the deployment of these large-scale MLLMs on client devices is hindered by their extensive model parameters, leading to a notable decline in generalization capabilities when these models are compressed for device deployment.

Device-Cloud Collaboration Knowledge Distillation +1

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

1 code implementation20 Dec 2023 Tao Zhang, Xingye Tian, Yikang Zhou, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu

We present the \textbf{D}ecoupled \textbf{VI}deo \textbf{S}egmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS).

Contrastive Learning Denoising +6

FedSODA: Federated Cross-assessment and Dynamic Aggregation for Histopathology Segmentation

no code implementations20 Dec 2023 Yuan Zhang, Yaolei Qi, Xiaoming Qi, Lotfi Senhadji, Yongyue Wei, Feng Chen, Guanyu Yang

Federated learning (FL) for histopathology image segmentation involving multiple medical sites plays a crucial role in advancing the field of accurate disease diagnosis and treatment.

Federated Learning Image Segmentation +2

Stable Segment Anything Model

1 code implementation27 Nov 2023 Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang

Thus, our solution, termed Stable-SAM, offers several advantages: 1) improved SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality, with 3) minimal learnable parameters (0. 08 M) and fast adaptation (by 1 training epoch).

Segmentation

FreeKD: Knowledge Distillation via Semantic Frequency Prompt

1 code implementation CVPR 2024 Yuan Zhang, Tao Huang, Jiaming Liu, Tao Jiang, Kuan Cheng, Shanghang Zhang

(2) During the distillation period, a pixel-wise frequency mask is generated via Frequency Prompt, to localize those pixel of interests (PoIs) in various frequency bands.

Knowledge Distillation

Collaboration and Transition: Distilling Item Transitions into Multi-Query Self-Attention for Sequential Recommendation

1 code implementation2 Nov 2023 Tianyu Zhu, Yansong Shi, Yuan Zhang, Yihong Wu, Fengran Mo, Jian-Yun Nie

Second, we develop a transition-aware embedding distillation module that distills global item-to-item transition patterns into item embeddings, which enables the model to memorize and leverage transitional signals and serves as a calibrator for collaborative signals.

Sequential Recommendation

Generalized Cactus and Structural Controllability of Switched Linear Continuous-Time Systems

no code implementations19 Sep 2023 Yuan Zhang, Yuanqing Xia, Aming Li

This paper explores the structural controllability of switched linear continuous-time systems.

1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation

1 code implementation28 Aug 2023 Tao Zhang, Xingye Tian, Yikang Zhou, Yu Wu, Shunping Ji, Cilin Yan, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

Video instance segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving.

Autonomous Driving Denoising +6

Label-free Deep Learning Driven Secure Access Selection in Space-Air-Ground Integrated Networks

no code implementations28 Aug 2023 Zhaowei Wang, Zhisheng Yin, Xiucheng Wang, Nan Cheng, Yuan Zhang, Tom H. Luan

Considering the inherent co-channel interference due to spectrum sharing among multi-tier access networks in SAGIN, it can be leveraged to assist the physical layer security among heterogeneous transmissions.

Effectively Heterogeneous Federated Learning: A Pairing and Split Learning Based Approach

no code implementations26 Aug 2023 Jinglong Shen, Xiucheng Wang, Nan Cheng, Longfei Ma, Conghao Zhou, Yuan Zhang

As a promising paradigm federated Learning (FL) is widely used in privacy-preserving machine learning, which allows distributed devices to collaboratively train a model while avoiding data transmission among clients.

Federated Learning Privacy Preserving

Functional Observability, Structural Functional Observability and Optimal Sensor Placement

no code implementations18 Jul 2023 Yuan Zhang, Tyrone Fernando, Mohamed Darouach

Based on these results, the problems of selecting the minimal sensors from a prior set to achieve functional observability and SFO are shown to be NP-hard.

Measuring Item Global Residual Value for Fair Recommendation

1 code implementation17 Jul 2023 Jiayin Wang, Weizhi Ma, Chumeng Jiang, Min Zhang, Yuan Zhang, Biao Li, Peng Jiang

In this paper, we call for a shift of attention from modeling user preferences to developing fair exposure mechanisms for items.

Recommendation Systems

Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation

2 code implementations ICCV 2023 Yaolei Qi, Yuting He, Xiaoming Qi, Yuan Zhang, Guanyu Yang

In this work, we note the specificity of tubular structures and use this knowledge to guide our DSCNet to simultaneously enhance perception in three stages: feature extraction, feature fusion, and loss constraint.

Segmentation Specificity

Distilling Missing Modality Knowledge from Ultrasound for Endometriosis Diagnosis with Magnetic Resonance Images

no code implementations5 Jul 2023 Yuan Zhang, Hu Wang, David Butler, Minh-Son To, Jodie Avery, M Louise Hull, Gustavo Carneiro

Next, we distill the knowledge from the teacher TVUS POD obliteration detector to train the student MRI model by minimizing a regression loss that approximates the output of the student to the teacher using unpaired TVUS and MRI data.

Knowledge Distillation

Razor SNN: Efficient Spiking Neural Network with Temporal Embeddings

no code implementations30 Jun 2023 Yuan Zhang, Jian Cao, Ling Zhang, Jue Chen, Wenyu Sun, YuAn Wang

The event streams generated by dynamic vision sensors (DVS) are sparse and non-uniform in the spatial domain, while still dense and redundant in the temporal domain.

1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation

1 code implementation7 Jun 2023 Tao Zhang, Xingye Tian, Haoran Wei, Yu Wu, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

In this report, we successfully validated the effectiveness of the decoupling strategy in video panoptic segmentation.

Autonomous Driving Segmentation +2

DVIS: Decoupled Video Instance Segmentation Framework

1 code implementation ICCV 2023 Tao Zhang, Xingye Tian, Yu Wu, Shunping Ji, Xuebo Wang, Yuan Zhang, Pengfei Wan

The efficacy of the decoupling strategy relies on two crucial elements: 1) attaining precise long-term alignment outcomes via frame-by-frame association during tracking, and 2) the effective utilization of temporal information predicated on the aforementioned accurate alignment outcomes during refinement.

Autonomous Driving Instance Segmentation +5

Knowledge Diffusion for Distillation

1 code implementation NeurIPS 2023 Tao Huang, Yuan Zhang, Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Chang Xu

To address this, we propose to denoise student features using a diffusion model trained by teacher features.

Denoising Image Classification +4

Distribution-Free Matrix Prediction Under Arbitrary Missing Pattern

no code implementations19 May 2023 Meijia Shao, Yuan Zhang

This paper studies the open problem of conformalized entry prediction in a row/column-exchangeable matrix.

Conformal Prediction

Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty

1 code implementation4 May 2023 Yuan Zhang, Weihua Chen, Yichen Lu, Tao Huang, Xiuyu Sun, Jian Cao

Knowledge distillation is an effective paradigm for boosting the performance of pocket-size model, especially when multiple teacher models are available, the student would break the upper limit again.

Knowledge Distillation object-detection +3

Observability Blocking for Functional Privacy of Linear Dynamic Networks

no code implementations17 Apr 2023 Yuan Zhang, Ranbo Cheng, Yuanqing Xia

This paper addresses the problem of determining the minimum set of state variables in a network that need to be blocked from direct measurements in order to protect functional privacy with respect to {\emph{any}} output matrices.

Blocking

Judicial Intelligent Assistant System: Extracting Events from Divorce Cases to Detect Disputes for the Judge

no code implementations23 Mar 2023 Yuan Zhang, Chuanyi Li, Yu Sheng, Jidong Ge, Bin Luo

It is a difficult but necessary task to extract the key information for the cases from these textual materials and to clarify the dispute focus of related parties.

On the Reachability and Controllability of Temporal Continuous-Time Linear Networks: A Generic Analysis

no code implementations23 Feb 2023 Yuan Zhang, Yuanqing Xia, Long Wang

This paper investigates the reachability and controllability of temporal continuous-time linear networks from a generic viewpoint, where only the zero-nonzero patterns of subsystem matrices are known.

Temporal Sequences

Real-time speech enhancement with dynamic attention span

no code implementations21 Feb 2023 Chengyu Zheng, Yuan Zhou, Xiulian Peng, Yuan Zhang, Yan Lu

For real-time speech enhancement (SE) including noise suppression, dereverberation and acoustic echo cancellation, the time-variance of the audio signals becomes a severe challenge.

Acoustic echo cancellation Decoder +1

Capacity Analysis of Holographic MIMO Channels with Practical Constraints

no code implementations29 Dec 2022 Yuan Zhang, Jianhua Zhang, Yuxiang Zhang, Yuan YAO, Guangyi Liu

However, the channel might not satisfy isotropic scattering because of generalized angle distributions, and the antenna gain is limited by the array aperture in reality.

Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction

1 code implementation23 Nov 2022 Kai Shen, Yichong Leng, Xu Tan, Siliang Tang, Yuan Zhang, Wenjie Liu, Edward Lin

Since the error rate of the incorrect sentence is usually low (e. g., 10\%), the correction model can only learn to correct on limited error tokens but trivially copy on most tokens (correct tokens), which harms the effective training of error correction.

Decoder Sentence +2

Disentangled Feature Learning for Real-Time Neural Speech Coding

no code implementations22 Nov 2022 Xue Jiang, Xiulian Peng, Yuan Zhang, Yan Lu

Recently end-to-end neural audio/speech coding has shown its great potential to outperform traditional signal analysis based audio codecs.

Disentanglement Speech Representation Learning +1

Differentially Private Optimizers Can Learn Adversarially Robust Models

no code implementations16 Nov 2022 Yuan Zhang, Zhiqi Bu

Machine learning models have shone in a variety of domains and attracted increasing attention from both the security and the privacy communities.

Adversarial Robustness

FV2ES: A Fully End2End Multimodal System for Fast Yet Effective Video Emotion Recognition Inference

1 code implementation21 Sep 2022 Qinglan Wei, Xuling Huang, Yuan Zhang

In the latest social networks, more and more people prefer to express their emotions in videos through text, speech, and rich facial expressions.

Multimodal Emotion Recognition Video Emotion Recognition

Real-time Short Video Recommendation on Mobile Devices

no code implementations20 Aug 2022 Xudong Gong, Qinlin Feng, Yuan Zhang, Jiangling Qin, Weijie Ding, Biao Li, Peng Jiang, Kun Gai

However, as users continue to watch videos and feedback, the changing context leads the ranking of the server-side recommendation system inaccurate.

Recommendation Systems Re-Ranking

Higher-order accurate two-sample network inference and network hashing

1 code implementation16 Aug 2022 Meijia Shao, Dong Xia, Yuan Zhang, Qiong Wu, Shuo Chen

Two-sample hypothesis testing for network comparison presents many significant challenges, including: leveraging repeated network observations and known node registration, but without requiring them to operate; relaxing strong structural assumptions; achieving finite-sample higher-order accuracy; handling different network sizes and sparsity levels; fast computation and memory parsimony; controlling false discovery rate (FDR) in multiple testing; and theoretical understandings, particularly regarding finite-sample accuracy and minimax optimality.

Vocal Bursts Valence Prediction

Latent-Domain Predictive Neural Speech Coding

no code implementations18 Jul 2022 Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods.

Quantization

Cross-Scale Vector Quantization for Scalable Neural Speech Coding

no code implementations7 Jul 2022 Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu

In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ), in which multi-scale features are encoded progressively with stepwise feature fusion and refinement.

Quantization

Robust Reinforcement Learning in Continuous Control Tasks with Uncertainty Set Regularization

1 code implementation5 Jul 2022 Yuan Zhang, Jianhong Wang, Joschka Boedecker

To deal with unknown uncertainty sets, we further propose a novel adversarial approach to generate them based on the value function.

continuous-control Continuous Control +3

Masked Distillation with Receptive Tokens

1 code implementation29 May 2022 Tao Huang, Yuan Zhang, Shan You, Fei Wang, Chen Qian, Jian Cao, Chang Xu

To obtain a group of masks, the receptive tokens are learned via the regular task loss but with teacher fixed, and we also leverage a Dice loss to enrich the diversity of learned masks.

object-detection Object Detection +1

CIRS: Bursting Filter Bubbles by Counterfactual Interactive Recommender System

1 code implementation4 Apr 2022 Chongming Gao, Shiqi Wang, Shijun Li, Jiawei Chen, Xiangnan He, Wenqiang Lei, Biao Li, Yuan Zhang, Peng Jiang

The basic idea is to first learn a causal user model on historical data to capture the overexposure effect of items on user satisfaction.

Causal Inference counterfactual +2

On Polynomially Solvable Constrained Input Selections for Fixed and Switched Linear Structured Systems

no code implementations3 Apr 2022 Yuan Zhang, Yuanqing Xia, Shenyu Liu, Zhongqi Sun

In this paper, it is found that, if the input structure satisfies certain `regularizations', which are characterized by the proposed restricted total unimodulairty notion, those problems can be solvable in polynomial time via linear programming (LP) relaxations.

LBCF: A Large-Scale Budget-Constrained Causal Forest Algorithm

1 code implementation29 Jan 2022 Meng Ai, Biao Li, Heyang Gong, Qingwei Yu, Shengjie Xue, Yuan Zhang, Yunzhou Zhang, Peng Jiang

The proposed approach is currently serving over hundreds of millions of users on the platform and achieves one of the most tremendous improvements over these months.

Distributed Computing Marketing

In Defense of Kalman Filtering for Polyp Tracking from Colonoscopy Videos

no code implementations27 Jan 2022 David Butler, Yuan Zhang, Tim Chen, Seon Ho Shin, Rajvinder Singh, Gustavo Carneiro

We advocate that the field should instead focus on the development of simple and efficient detectors that an be combined with effective trackers to allow the implementation of real-time polyp detectors.

End-to-End Neural Speech Coding for Real-Time Communications

no code implementations24 Jan 2022 Xue Jiang, Xiulian Peng, Chengyu Zheng, Huaying Xue, Yuan Zhang, Yan Lu

Deep-learning based methods have shown their advantages in audio coding over traditional ones but limited attention has been paid on real-time communications (RTC).

Decoder Packet Loss Concealment

On real structured controllability/stabilizability/stability radius: Complexity and unified rank-relaxation based methods

no code implementations4 Jan 2022 Yuan Zhang, Yuanqing Xia, Yufeng Zhan

This paper addresses the real structured controllability, stabilizability, and stability radii (RSCR, RSSZR, and RSSR, respectively) of linear systems, which involve determining the distance (in terms of matrix norms) between a (possibly large-scale) system and its nearest uncontrollable, unstabilizable, and unstable systems, respectively, with a prescribed affine structure.

Learning Multi-granularity User Intent Unit for Session-based Recommendation

1 code implementation25 Dec 2021 Jiayan Guo, Yaming Yang, Xiangchen Song, Yuan Zhang, Yujing Wang, Jing Bai, Yan Zhang

Specifically, we creatively propose Multi-granularity Intent Heterogeneous Session Graph which captures the interactions between different granularity intent units and relieves the burden of long-dependency.

Graph Neural Network Session-Based Recommendations

MobRecon: Mobile-Friendly Hand Mesh Reconstruction from Monocular Image

1 code implementation CVPR 2022 Xingyu Chen, Yufeng Liu, Yajiao Dong, Xiong Zhang, Chongyang Ma, Yanmin Xiong, Yuan Zhang, Xiaoyan Guo

In this work, we propose a framework for single-view hand mesh reconstruction, which can simultaneously achieve high reconstruction accuracy, fast inference speed, and temporal coherence.

3D Hand Pose Estimation Position +2

Controllable Semantic Parsing via Retrieval Augmentation

1 code implementation EMNLP 2021 Panupong Pasupat, Yuan Zhang, Kelvin Guu

In practical applications of semantic parsing, we often want to rapidly change the behavior of the parser, such as enabling it to handle queries in a new domain, or changing its predictions on certain targeted queries.

Retrieval Semantic Parsing

Practical and Private Heterogeneous Federated Learning

no code implementations29 Sep 2021 Hanxiao Chen, Meng Hao, Hongwei Li, Guangxiao Niu, Guowen Xu, Huawei Wang, Yuan Zhang, Tianwei Zhang

Heterogeneous federated learning (HFL) enables clients with different computation/communication capabilities to collaboratively train their own customized models, in which the knowledge of models is shared via clients' predictions on a public dataset.

Federated Learning Privacy Preserving

AdaPruner: Adaptive Channel Pruning and Effective Weights Inheritance

no code implementations14 Sep 2021 Xiangcheng Liu, Jian Cao, Hongyi Yao, Wenyu Sun, Yuan Zhang

While previous pruning methods have mostly focused on identifying unimportant channels, channel pruning is considered as a special case of neural architecture search in recent years.

Image Classification Neural Architecture Search

r-GAT: Relational Graph Attention Network for Multi-Relational Graphs

no code implementations13 Sep 2021 Meiqi Chen, Yuan Zhang, Xiaoyu Kou, Yuntao Li, Yan Zhang

To tackle this issue, we propose r-GAT, a relational graph attention network to learn multi-channel entity representations.

Graph Attention Knowledge Graphs +1

PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition

1 code implementation9 Sep 2021 Zhi Qiao, Yu Zhou, Jin Wei, Wei Wang, Yuan Zhang, Ning Jiang, Hongbin Wang, Weiping Wang

In this paper, we propose a Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency.

Decoder Scene Text Recognition

Total Unimodularity and Strongly Polynomial Solvability of Constrained Minimum Input Selections for Structural Controllability: an LP-based Method

no code implementations10 Aug 2021 Yuan Zhang, Yuanqing Xia, Yufeng Zhan

Subject to a certain condition on the prescribed input configurations that contains the dedicated input one as a special case, we demonstrate that the constraint matrices of these ILPs are totally unimodular.

Distilling Neuron Spike with High Temperature in Reinforcement Learning Agents

no code implementations5 Aug 2021 Ling Zhang, Jian Cao, Yuan Zhang, Bohan Zhou, Shuo Feng

This method uses distillation to effectively avoid the weakness of STBP, which can achieve SOTA performance in classification, and can obtain a smaller, faster convergence and lower power consumption SNN reinforcement learning model.

reinforcement-learning Reinforcement Learning +2

QA-Driven Zero-shot Slot Filling with Weak Supervision Pretraining

no code implementations ACL 2021 Xinya Du, Luheng He, Qi Li, Dian Yu, Panupong Pasupat, Yuan Zhang

To address this problem, we introduce QA-driven slot filling (QASF), which extracts slot-filler spans from utterances with a span-based QA model.

slot-filling Zero-shot Slot Filling

Latent Space Model for Higher-order Networks and Generalized Tensor Decomposition

no code implementations30 Jun 2021 Zhongyuan Lyu, Dong Xia, Yuan Zhang

We formulate the relationship between the latent positions and the observed data via a generalized multilinear kernel as the link function.

Link Prediction Tensor Decomposition

Partial Strong Structural Controllability

no code implementations26 Jun 2021 Yuan Zhang, Yuanqing Xia

This paper introduces a new controllability notion, termed partial strong structural controllability (PSSC), on a structured system whose entries of system matrices are either fixed zero or indeterminate, which naturally extends the conventional strong structural controllability (SSC) and bridges the gap between structural controllability and SSC.

SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

1 code implementation31 May 2021 Jianhong Wang, Yuan Zhang, Yunjie Gu, Tae-Kyun Kim

This paper studies a theoretical framework for value factorisation with interpretability via Shapley value theory.

Fairness Q-Learning +2

Recent Standard Development Activities on Video Coding for Machines

no code implementations26 May 2021 Wen Gao, Shan Liu, Xiaozhong Xu, Manouchehr Rafie, Yuan Zhang, Igor Curcio

Specifically, we will first provide an overview of the MPEG VCM group including use cases, requirements, processing pipelines, plan for potential VCM standards, followed by the evaluation framework including machine-vision tasks, dataset, evaluation metrics, and anchor generation.

object-detection Object Detection

Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations

2 code implementations15 Apr 2021 Jonathan Herzig, Peter Shaw, Ming-Wei Chang, Kelvin Guu, Panupong Pasupat, Yuan Zhang

Sequence-to-sequence (seq2seq) models are prevalent in semantic parsing, but have been found to struggle at out-of-distribution compositional generalization.

Text-To-SQL

Few-shot Intent Classification and Slot Filling with Retrieved Examples

no code implementations NAACL 2021 Dian Yu, Luheng He, Yuan Zhang, Xinya Du, Panupong Pasupat, Qi Li

Few-shot learning arises in important practical scenarios, such as when a natural language understanding system needs to learn new semantic labels for an emerging, resource-scarce domain.

Classification Few-Shot Learning +8

ProDCoNN-server: a web server for protein sequence prediction and design from a three-dimensional structure

1 code implementation bioRxiv 2021 Yuan Zhang, Arunima Mandal, Kevin Cui, Xiuwen Liu, Jinfeng Zhang

The prediction is very fast compared with other protein sequence prediction servers - it takes only a few minutes for a query protein on average.

Protein Design

PTSC: a New Definition for Structural Controllability under Numerical Perturbations

no code implementations22 Mar 2021 Yuan Zhang, Yuanqing Xia

This paper proposes a novel notion for structural controllability under structured numerical perturbations, namely the perturbation-tolerant structural controllability (PTSC), on a single-input structured system whose entries can be classified into three categories: fixed zero entries, unknown generic entries whose values are fixed but unknown, and perturbed entries that can take arbitrary complex values.