Search Results

CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

1 code implementation · 19 Apr 2024

We present CyberSecEval 2, a novel benchmark to quantify LLM security risks and capabilities.

When LLMs Meet Cybersecurity: A Systematic Literature Review

1 code implementation · 6 May 2024

The rapid development of large language models (LLMs) has opened new avenues across various fields, including cybersecurity, which faces an evolving threat landscape and demand for innovative technologies.

Systematic Literature Review

CAI: An Open, Bug Bounty-Ready Cybersecurity AI

1 code implementation · 8 Apr 2025

By 2028 most cybersecurity actions will be autonomous, with humans teleoperating.

Cryptography and Security

EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges

2 code implementations · 24 Sep 2024

Although language model (LM) agents are demonstrating growing potential in many domains, their success in cybersecurity has been limited due to simplistic design and the lack of fundamental features for this domain.

Language Modeling

Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams

1 code implementation · 2 Oct 2017

As a prospective filter for the human analyst, we present an online unsupervised deep learning approach to detect anomalous network activity from system logs in real time.

Anomaly Detection

Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions

4 code implementations · 20 Aug 2021

The most notable of these comes in the form of the first self-described 'AI pair programmer', GitHub Copilot, a language model trained over open-source GitHub code.

Code Generation, Diversity

The SEED Internet Emulator and Its Applications in Cybersecurity Education

1 code implementation · 10 Jan 2022

While the emulator was initially developed for cybersecurity courses, it can also be used for network courses, for students to learn how the Internet technologies work, such as routing, BGP, IP Anycast, and DNS.

Cryptography and Security

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

3 code implementations · 15 Aug 2024

To evaluate agent capabilities, we construct a cybersecurity agent and evaluate 8 models: GPT-4o, OpenAI o1-preview, Claude 3 Opus, Claude 3.5 Sonnet, Mixtral 8x22b Instruct, Gemini 1.5 Pro, Llama 3 70B Chat, and Llama 3.1 405B Instruct.

Ranked #3 on Cybench