DARPA OpTC (DARPA Operationally Transparent Cyber Dataset)

Operationally Transparent Cyber (OpTC) was a technology transition pilot study funded under Boston Fusion Corp.'s Cyber APT Scenarios for Enterprise Systems (CASES) project. Its primary objective was to determine whether DARPA Transparent Computing (TC) program technologies could scale without loss of detection performance, in order to address cyber defense capability gaps identified in USTRANSCOM's Joint Deployment Distribution Enterprise (JDDE) solicitation for government fiscal years 2019-2023. Boston Fusion worked with two performers from the TC program, Five Directions (endpoint telemetry, TA1) and BAE (analysis over the data, TA2), to scale their systems from two machines to one thousand machines. The OpTC team conducted scaling and detection tests in the fall of 2019. A third performer, Provatek, which was not originally associated with the TC program, acted as red team and test coordinator. This data set represents a subset of that activity.

The OpTC system architecture is based on the one used in TC program evaluations. Kafka, an open-source stream-processing server, passes information among the system components. Each Windows 10 endpoint is equipped with an endpoint sensor that monitors host events, packs them into JSON records, and sends them to Kafka. As these records flow into Kafka, a translation server aggregates them into new records in a format called eCAR and pushes those records back to Kafka. As the eCAR records arrive, a data analytics component integrates them into a graph data structure for analysis and visualization.
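The data flow described above (sensor, Kafka, translation server, Kafka again, graph analytics) can be sketched in a single process. This is a minimal illustration under stated assumptions, not the OpTC implementation: two in-memory deques stand in for Kafka topics, and the raw event keys (`actor`, `object_type`, `target`) and eCAR-style field names (`actorID`, `action`, `object`, `objectID`, `hostname`) are assumptions chosen for the example. The real translation server aggregates sensor records; this sketch maps them one-to-one for brevity.

```python
import json
from collections import defaultdict, deque

# In-memory deques stand in for two Kafka topics (assumption: a real
# deployment would use a Kafka client and named topics instead).
raw_topic = deque()   # JSON strings produced by endpoint sensors
ecar_topic = deque()  # eCAR-style dicts produced by the translator


def endpoint_sensor(hostname, events):
    """Hypothetical sensor: pack each host event into a JSON record."""
    for event in events:
        raw_topic.append(json.dumps({"hostname": hostname, **event}))


def translation_server():
    """Hypothetical translator: drain raw records, emit eCAR-style records.

    Maps records one-to-one for brevity; all field names are assumptions.
    """
    while raw_topic:
        rec = json.loads(raw_topic.popleft())
        ecar_topic.append({
            "hostname": rec["hostname"],
            "actorID": rec["actor"],
            "action": rec["action"],
            "object": rec["object_type"],
            "objectID": rec["target"],
        })


def analytics():
    """Integrate eCAR-style records into an adjacency-list graph keyed by actor."""
    graph = defaultdict(list)
    while ecar_topic:
        r = ecar_topic.popleft()
        graph[r["actorID"]].append((r["action"], r["object"], r["objectID"]))
    return dict(graph)


if __name__ == "__main__":
    # Made-up events for one hypothetical host.
    endpoint_sensor("host-001", [
        {"actor": "proc-1", "action": "CREATE", "object_type": "PROCESS", "target": "proc-2"},
        {"actor": "proc-2", "action": "WRITE", "object_type": "FILE", "target": "file-9"},
    ])
    translation_server()
    print(analytics())
```

Keeping each stage as a separate function that only reads from and writes to a "topic" mirrors the decoupling Kafka provides: stages can be replaced or scaled independently as long as the record format between them stays stable.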

OpTC took TC system components that worked well on two hosts in TC program tests and scaled them up to work with one thousand hosts. The scaled-up system was evaluated over two weeks in a highly instrumented environment, and this collection contains approximately one terabyte of compressed, JSON-compatible data from that evaluation. The evaluation began with a period of benign record generation, followed by the injection of malware by a red team. Benign traffic ran continuously during the red team activity. Because of limited storage space during collection, data were collected from five hundred hosts rather than from the full set of one thousand hosts.
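Given the data volume, records are best streamed rather than loaded whole. The sketch below assumes each file is gzip-compressed with one JSON object per line, and that every record carries `hostname` and `timestamp` fields with ISO-8601 timestamp strings; the file layout, field names, and cutoff value are all assumptions, not details stated here. It counts records per host and splits them around a caller-supplied cutoff. Because benign traffic continued during the red team activity, a time cutoff only isolates the benign-only period; it does not separate malicious events from benign ones.

```python
import gzip
import io
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(source) -> Iterator[dict]:
    """Stream one JSON object per line from gzip data (path or binary file object)."""
    with gzip.open(source, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def summarize(records: Iterable[dict], cutoff: str) -> tuple[Counter, int, int]:
    """Count records per host and split them around a timestamp cutoff.

    Assumes 'timestamp' values share one ISO-8601 format, so string
    comparison matches chronological order.
    """
    per_host = Counter()
    before = after = 0
    for rec in records:
        per_host[rec["hostname"]] += 1
        if rec["timestamp"] < cutoff:
            before += 1
        else:
            after += 1
    return per_host, before, after


if __name__ == "__main__":
    # Made-up sample records held in memory in place of a real dataset file.
    sample = "\n".join(json.dumps(r) for r in [
        {"hostname": "host-001", "timestamp": "2019-01-01T10:00:00"},
        {"hostname": "host-002", "timestamp": "2019-01-05T09:30:00"},
    ]).encode("utf-8")
    counts, before, after = summarize(
        iter_records(io.BytesIO(gzip.compress(sample))), "2019-01-03T00:00:00")
    print(counts, before, after)
```

Using a generator keeps memory use constant regardless of file size, which matters when processing data from five hundred hosts.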

License


  • Distribution A: Approved for public release: distribution unlimited