Search Results for author: Baizhou Huang

Found 8 papers, 4 papers with code

Generative Evaluation of Complex Reasoning in Large Language Models

1 code implementation • 3 Apr 2025 • Haowei Lin, Xiangyu Wang, Ruilin Yan, Baizhou Huang, Haotian Ye, Jianhua Zhu, ZiHao Wang, James Zou, Jianzhu Ma, Yitao Liang

Moreover, LLM performance on KUMO tasks correlates strongly with results on newly released real-world reasoning benchmarks, underscoring KUMO's value as a robust, enduring assessment tool for genuine LLM reasoning capabilities.

Benchmarking, Memorization

$B^4$: A Black-Box Scrubbing Attack on LLM Watermarks

no code implementations • 2 Nov 2024 • Baizhou Huang, Xiao Pu, Xiaojun Wan

Specifically, we formulate the watermark scrubbing attack as a constrained optimization problem by capturing its objectives with two distributions, a Watermark Distribution and a Fidelity Distribution.

MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency

no code implementations • 19 Jun 2024 • Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Baizhou Huang, Xu Zhang, Xinyu Hu, Xiaojun Wan

Our benchmark facilitates independent correction of misreading and misrecognition errors by editing the corresponding knowledge component.

knowledge editing

WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

no code implementations • 22 May 2024 • Baizhou Huang, Xiaojun Wan

To this end, we introduce WaterPool, a simple yet effective key module that preserves a complete key sampling space required by imperceptibility while utilizing semantics-based search to improve the key restoration process.

Selecting Large Language Model to Fine-tune via Rectified Scaling Law

1 code implementation • 4 Feb 2024 • Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, ZiHao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, Yitao Liang

The ever-growing ecosystem of LLMs has posed a challenge in selecting the most appropriate pre-trained model to fine-tune amidst a sea of options.

Language Modeling, Language Modelling, +1

ALCUNA: Large Language Models Meet New Knowledge

1 code implementation • 23 Oct 2023 • Xunjian Yin, Baizhou Huang, Xiaojun Wan

With the rapid development of NLP, large-scale language models (LLMs) now excel in various tasks across multiple domains.

Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency

1 code implementation • 29 Sep 2023 • Baizhou Huang, Shuai Lu, Weizhu Chen, Xiaojun Wan, Nan Duan

We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency across outputs from multiple perspectives.

Code Generation, HumanEval, +1
