Scalable Bilevel Optimization for Generating Maximally Representative OPF Datasets
New generations of power systems, containing high shares of renewable energy resources, require improved data-driven tools that can swiftly adapt to changes in system operation. Many of these tools, such as those using machine learning, rely on high-quality training datasets to construct probabilistic models. Such models should accurately represent the system when it operates at its limits (i.e., with a high degree of "active constraints"). However, generating training datasets that accurately represent the many possible combinations of these active constraints is a particularly challenging task, especially for nonlinear AC Optimal Power Flow (OPF), since most active constraints cannot be enforced explicitly. Using bilevel optimization, this paper introduces a data collection routine that sequentially solves for OPF solutions that are "optimally far" from previously acquired voltage, power, and load profile data points. The routine, termed RAMBO, samples critical data close to a system's boundaries much more effectively than a random sampling benchmark. Simulation results are reported on the 30-, 57-, and 118-bus PGLib test cases.
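The abstract describes the routine only at a high level: it repeatedly adds the data point that is "optimally far" from the points already collected. The sketch below illustrates just that sequential "maximally far" idea with a much simpler, swapped-in technique: greedy farthest-point selection. It is not the paper's RAMBO formulation. It assumes a finite, hypothetical pool of already-feasible operating points and plain Euclidean distance. In the paper, by contrast, each new point comes from a bilevel problem and is itself an AC-OPF solution. All function names here are illustrative.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def min_distance(x: Point, data: List[Point]) -> float:
    """Euclidean distance from x to its nearest neighbor in data."""
    return min(math.dist(x, d) for d in data)


def farthest_point_sampling(
    candidates: List[Point], n_samples: int, seed_index: int = 0
) -> List[int]:
    """Greedily pick candidates that are maximally far from those already chosen.

    Hypothetical stand-in for a sequential "optimally far" sampler: rather than
    solving a bilevel OPF problem, each step selects, from a fixed pool of
    precomputed points, the one whose nearest already-selected point is farthest.
    """
    if not 0 < n_samples <= len(candidates):
        raise ValueError("n_samples must be between 1 and len(candidates)")
    chosen = [seed_index]
    while len(chosen) < n_samples:
        selected = [candidates[i] for i in chosen]
        # Score every unchosen candidate by its distance to the current dataset.
        best = max(
            (i for i in range(len(candidates)) if i not in chosen),
            key=lambda i: min_distance(candidates[i], selected),
        )
        chosen.append(best)
    return chosen


def coverage_radius(candidates: List[Point], chosen: List[int]) -> float:
    """Largest distance from any candidate to its nearest selected point (smaller = better coverage)."""
    selected = [candidates[i] for i in chosen]
    return max(min_distance(c, selected) for c in candidates)


if __name__ == "__main__":
    # Compare greedy selection with uniform random selection on a synthetic 2-D pool.
    rng = random.Random(0)
    pool = [(rng.random(), rng.random()) for _ in range(500)]
    greedy = farthest_point_sampling(pool, 20)
    rand = rng.sample(range(len(pool)), 20)
    print("greedy coverage radius:", coverage_radius(pool, greedy))
    print("random coverage radius:", coverage_radius(pool, rand))
```

Greedy farthest-point selection is a standard 2-approximation for the k-center covering objective. That makes it a convenient toy analogue of "spreading" samples to a pool's boundaries, but it cannot generate new feasible OPF points the way a bilevel formulation can.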