Search Results for author: Jieru Mei

Found 27 papers, 20 papers with code

STAR-1: Safer Alignment of Reasoning LLMs with 1K Data

no code implementations • 2 Apr 2025 • Zijun Wang, Haoqin Tu, YuHan Wang, Juncheng Wu, Jieru Mei, Brian R. Bartoldson, Bhavya Kailkhura, Cihang Xie

This paper introduces STAR-1, a high-quality, just-1k-scale safety dataset specifically designed for large reasoning models (LRMs) like DeepSeek-R1.

Diversity · Safety Alignment

AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation

1 code implementation • 11 Oct 2024 • Zijun Wang, Haoqin Tu, Jieru Mei, Bingchen Zhao, Yisen Wang, Cihang Xie

This paper studies the vulnerabilities of transformer-based Large Language Models (LLMs) to jailbreaking attacks, focusing specifically on the optimization-based Greedy Coordinate Gradient (GCG) strategy.

Safety Alignment

From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation

no code implementations • 2 Sep 2024 • Yunfei Xie, Cihang Xie, Alan Yuille, Jieru Mei

Local aggregation is employed to form superpixels, leveraging the inherent redundancy of the image data to produce segments closely aligned with specific parts of the object, guided by object-level supervision.

Computational Efficiency · Image Segmentation · +4

Autoregressive Pretraining with Mamba in Vision

1 code implementation • 11 Jun 2024 • Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.

Mamba

Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

1 code implementation • 8 Jun 2024 • Sucheng Ren, Xiaoke Huang, Xianhang Li, Junfei Xiao, Jieru Mei, Zeyu Wang, Alan Yuille, Yuyin Zhou

This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework.

Conditional Image Generation · Denoising · +2

Mamba-R: Vision Mamba ALSO Needs Registers

1 code implementation • 23 May 2024 • Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

Similar to what has been observed in Vision Transformers, this paper identifies artifacts within the feature maps of Vision Mamba.

Mamba · Semantic Segmentation
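
The title points to register tokens as the remedy. As a rough illustration only (not the paper's exact recipe; the register count and the prepend placement below are assumptions), adding registers to a vision backbone can be as simple as concatenating a few learnable tokens to the patch sequence:

    import torch
    import torch.nn as nn

    class PatchTokensWithRegisters(nn.Module):
        """Prepend a few learnable register tokens to the patch-token sequence."""
        def __init__(self, dim: int, num_registers: int = 4):
            super().__init__()
            self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))

        def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
            # patch_tokens: (batch, num_patches, dim)
            regs = self.registers.expand(patch_tokens.shape[0], -1, -1)
            return torch.cat([regs, patch_tokens], dim=1)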

3D-TransUNet for Brain Metastases Segmentation in the BraTS2023 Challenge

1 code implementation • 23 Mar 2024 • Siwei Yang, Xianhang Li, Jieru Mei, Jieneng Chen, Cihang Xie, Yuyin Zhou

We find that the decoder-only 3D-TransUNet model offers enhanced efficacy in the segmentation of brain metastases, as indicated by our 5-fold cross-validation on the training set.

Brain Tumor Segmentation · Decoder · +2

SPFormer: Enhancing Vision Transformer with Superpixel Representation

no code implementations • 5 Jan 2024 • Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie

In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation.

Superpixels

A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties

1 code implementation • 21 Dec 2023 • Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie

Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models.

Common Sense Reasoning · Descriptive · +1

Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning

no code implementations • 18 Dec 2023 • Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, Cihang Xie

This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs).

Domain Adaptation
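
As a minimal sketch of the general idea of such parameter-efficient tuning (freezing the pretrained model and updating only normalization parameters; tuning every LayerNorm rather than only those inside attention blocks is a simplification here, and the paper's exact parameter selection may differ):

    import torch.nn as nn

    def freeze_all_but_layernorm(model: nn.Module) -> None:
        """Mark only LayerNorm parameters as trainable; freeze everything else."""
        for module in model.modules():
            trainable = isinstance(module, nn.LayerNorm)
            for param in module.parameters(recurse=False):
                param.requires_grad = trainable

With a selection like this, the trainable parameters are a tiny fraction of the full model.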

SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference

1 code implementation • 4 Dec 2023 • Feng Wang, Jieru Mei, Alan Yuille

Specifically, we replace the traditional self-attention block in the last layer of CLIP's vision encoder with our CSA module and reuse its pretrained projection matrices of query, key, and value, leading to a training-free adaptation approach for CLIP's zero-shot semantic segmentation.

Segmentation · Semantic Segmentation · +2
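
A rough sketch of such a correlative self-attention is shown below (single-head, no masking; the tensor shapes and the exact combination of the query-query and key-key terms are assumptions rather than the paper's verbatim formulation):

    import torch
    import torch.nn.functional as F

    def correlative_self_attention(x, w_q, w_k, w_v, scale):
        """Attention built from query-query and key-key correlations,
        reusing the pretrained q/k/v projection matrices."""
        q, k, v = x @ w_q, x @ w_k, x @ w_v                # each (n_tokens, dim)
        attn = F.softmax(q @ q.transpose(-2, -1) * scale, dim=-1) \
             + F.softmax(k @ k.transpose(-2, -1) * scale, dim=-1)
        return attn @ v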

3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers

3 code implementations • 11 Oct 2023 • Jieneng Chen, Jieru Mei, Xianhang Li, Yongyi Lu, Qihang Yu, Qingyue Wei, Xiangde Luo, Yutong Xie, Ehsan Adeli, Yan Wang, Matthew Lungren, Lei Xing, Le Lu, Alan Yuille, Yuyin Zhou

In this paper, we extend the 2D TransUNet architecture to a 3D network by building upon the state-of-the-art nnU-Net architecture, and fully exploring Transformers' potential in both the encoder and decoder design.

Decoder · Image Segmentation · +4

FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning

1 code implementation • 6 Oct 2023 • Peiran Xu, Zeyu Wang, Jieru Mei, Liangqiong Qu, Alan Yuille, Cihang Xie, Yuyin Zhou

Federated learning (FL) is an emerging paradigm in machine learning, where a shared model is collaboratively learned using data from multiple devices to mitigate the risk of data leakage.

Federated Learning
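
For readers unfamiliar with the setup, a FedAvg-style aggregation step captures the "shared model learned across devices" idea; this is generic federated averaging, not the FedConv architecture itself, and the weighting by client sample counts is an assumption:

    import copy
    import torch

    def federated_average(client_states, client_sizes):
        """Weighted average of client model state_dicts (FedAvg-style)."""
        total = float(sum(client_sizes))
        avg = copy.deepcopy(client_states[0])
        for name in avg:
            avg[name] = sum(n * sd[name].float() for sd, n in
                            zip(client_states, client_sizes)) / total
        return avg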

Superpixel Transformers for Efficient Semantic Segmentation

no code implementations • 28 Sep 2023 • Alex Zihao Zhu, Jieru Mei, Siyuan Qiao, Hang Yan, Yukun Zhu, Liang-Chieh Chen, Henrik Kretzschmar

Finally, we directly project the superpixel class predictions back into the pixel space using the associations between the superpixels and the image pixel features.

Autonomous Driving · Segmentation · +2
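
That final projection step is essentially a soft un-pooling; a minimal sketch follows, assuming the association matrix is a row-normalized soft assignment of pixels to superpixels:

    import torch

    def project_to_pixels(assoc: torch.Tensor, sp_logits: torch.Tensor) -> torch.Tensor:
        """assoc: (num_pixels, num_superpixels) soft pixel-to-superpixel assignments.
        sp_logits: (num_superpixels, num_classes) per-superpixel class predictions.
        Returns dense per-pixel logits of shape (num_pixels, num_classes)."""
        return assoc @ sp_logits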

3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation

1 code implementation • ICCV 2023 • Yi Zhang, Pengliang Ji, Angtian Wang, Jieru Mei, Adam Kortylewski, Alan Yuille

Motivated by the recent success of generative models in rigid object pose estimation, we propose 3D-aware Neural Body Fitting (3DNBF) - an approximate analysis-by-synthesis approach to 3D human pose estimation with SOTA performance and occlusion robustness.

3D Human Pose Estimation · Contrastive Learning

SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

1 code implementation • 24 Jul 2023 • YiQing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan Yuille, Cihang Xie, Yuyin Zhou

To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for enabling accurate and data-efficient self-supervised medical image analysis.

Contrastive Learning · Image Reconstruction · +5

Waymo Open Dataset: Panoramic Video Panoptic Segmentation

1 code implementation • 15 Jun 2022 • Jieru Mei, Alex Zihao Zhu, Xinchen Yan, Hang Yan, Siyuan Qiao, Yukun Zhu, Liang-Chieh Chen, Henrik Kretzschmar, Dragomir Anguelov

We therefore present the Waymo Open Dataset: Panoramic Video Panoptic Segmentation Dataset, a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving.

3D Multi-Object Tracking · Autonomous Driving · +5

In Defense of Image Pre-Training for Spatiotemporal Recognition

1 code implementation • 3 May 2022 • Xianhang Li, Huiyu Wang, Chen Wei, Jieru Mei, Alan Yuille, Yuyin Zhou, Cihang Xie

Inspired by this observation, we hypothesize that the key to effectively leveraging image pre-training lies in the decomposition of learning spatial and temporal features, and in revisiting image pre-training as the appearance prior for initializing 3D kernels.

STS · Video Recognition
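
One common way to use image pre-training as an appearance prior for 3D kernels is I3D-style inflation, sketched below; this is a standard technique given purely as illustration, and the paper's actual initialization scheme may differ:

    import torch

    def inflate_2d_kernel(w2d: torch.Tensor, kt: int) -> torch.Tensor:
        """Inflate a pretrained 2D conv kernel (out, in, kh, kw) into a 3D kernel
        (out, in, kt, kh, kw) by repeating along time and rescaling so a static
        clip produces the same response as the 2D network."""
        return w2d.unsqueeze(2).repeat(1, 1, kt, 1, 1) / kt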

Fast AdvProp

1 code implementation • ICLR 2022 • Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie

Specifically, our modifications in Fast AdvProp are guided by the hypothesis that disentangled learning with adversarial examples is the key to performance improvements, while other training recipes (e.g., paired clean and adversarial training samples, multi-step adversarial attackers) could be largely simplified.

Data Augmentation · object-detection · +1

Batch Normalization with Enhanced Linear Transformation

1 code implementation • 28 Nov 2020 • Yuhui Xu, Lingxi Xie, Cihang Xie, Jieru Mei, Siyuan Qiao, Wei Shen, Hongkai Xiong, Alan Yuille

Batch normalization (BN) is a fundamental unit in modern deep networks, in which a linear transformation module is designed to improve BN's flexibility in fitting complex data distributions.
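
The standard BN affine step is y = gamma * x_hat + beta, a per-channel scalar scale and shift. One way to make this linear transformation richer, sketched below as an illustrative guess rather than the paper's exact module, is to replace the scalar affine with a small depthwise convolution:

    import torch.nn as nn

    class BNWithEnhancedLinear(nn.Module):
        """BatchNorm whose per-channel affine is replaced by a richer linear map,
        here a depthwise convolution (the kernel size is an assumption)."""
        def __init__(self, channels: int, kernel_size: int = 3):
            super().__init__()
            self.bn = nn.BatchNorm2d(channels, affine=False)
            self.linear = nn.Conv2d(channels, channels, kernel_size,
                                    padding=kernel_size // 2, groups=channels)

        def forward(self, x):
            return self.linear(self.bn(x))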

Shape-Texture Debiased Neural Network Training

1 code implementation • ICLR 2021 • Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

To prevent models from exclusively attending to a single cue in representation learning, we augment training data with images carrying conflicting shape and texture information (e.g., an image with chimpanzee shape but lemon texture) and, most importantly, provide the corresponding supervision from shape and texture simultaneously.

 Ranked #1 on Image Classification on ImageNet (Hardware Burden metric)

Adversarial Robustness · Data Augmentation · +2
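
Concretely, supervising a cue-conflict image with both of its labels can be written as a convex combination of two cross-entropy terms; the weighting alpha below is an assumption, and with one-hot targets this is equivalent to cross-entropy against a mixed soft label:

    import torch.nn.functional as F

    def shape_texture_debiased_loss(logits, shape_labels, texture_labels, alpha=0.5):
        """Supervise a shape-texture cue-conflict image with both of its labels."""
        return alpha * F.cross_entropy(logits, shape_labels) \
            + (1.0 - alpha) * F.cross_entropy(logits, texture_labels)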

Neural Architecture Search for Lightweight Non-Local Networks

2 code implementations • CVPR 2020 • Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille

However, embedding NL blocks in mobile neural networks has rarely been explored, mainly due to the following challenges: 1) NL blocks generally have a heavy computation cost, which makes them difficult to apply where computational resources are limited, and 2) discovering an optimal configuration for embedding NL blocks into mobile neural networks remains an open problem.

Image Classification · Neural Architecture Search
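
For context, a standard embedded-Gaussian non-local block is sketched below; the (hw x hw) pairwise attention map is what makes the vanilla block expensive on mobile backbones. This is the classical formulation, not the lightweight variant searched in the paper:

    import torch
    import torch.nn as nn

    class NonLocalBlock(nn.Module):
        """Standard (embedded-Gaussian) non-local block."""
        def __init__(self, channels: int, inner: int):
            super().__init__()
            self.theta = nn.Conv2d(channels, inner, 1)
            self.phi = nn.Conv2d(channels, inner, 1)
            self.g = nn.Conv2d(channels, inner, 1)
            self.out = nn.Conv2d(inner, channels, 1)

        def forward(self, x):
            b, c, h, w = x.shape
            q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, inner)
            k = self.phi(x).flatten(2)                     # (b, inner, hw)
            v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, inner)
            attn = torch.softmax(q @ k, dim=-1)            # (b, hw, hw): quadratic cost
            y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
            return x + self.out(y)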

CAKES: Channel-wise Automatic KErnel Shrinking for Efficient 3D Networks

1 code implementation • 28 Mar 2020 • Qihang Yu, Yingwei Li, Jieru Mei, Yuyin Zhou, Alan L. Yuille

3D Convolution Neural Networks (CNNs) have been widely applied to 3D scene understanding, such as video analysis and volumetric image recognition.

3D Medical Imaging Segmentation · Action Recognition · +3

AtomNAS: Fine-Grained End-to-End Neural Architecture Search

1 code implementation • ICLR 2020 • Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, Jianchao Yang

We propose a fine-grained search space comprised of atomic blocks, a minimal search unit that is much smaller than the ones used in recent NAS algorithms.

Neural Architecture Search

Online Dictionary Learning for Approximate Archetypal Analysis

no code implementations • ECCV 2018 • Jieru Mei, Chunyu Wang, Wen-Jun Zeng

The archetypes generally correspond to the extremal points in the dataset and are learned by requiring them to be convex combinations of the training data.

Dictionary Learning
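
For reference, the classical archetypal-analysis objective this describes can be written as follows (the notation is mine, not necessarily the paper's: X is the d x n data matrix, the k archetypes are Z = XB, and both coefficient matrices are column-stochastic):

    \min_{A \in \mathbb{R}^{k \times n},\, B \in \mathbb{R}^{n \times k}}
        \| X - X B A \|_F^2
    \quad \text{s.t.} \quad
        A \ge 0,\; \mathbf{1}^\top A = \mathbf{1}^\top, \qquad
        B \ge 0,\; \mathbf{1}^\top B = \mathbf{1}^\top .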
