You have correctly stated that heuristic alignment patches like RLHF do not represent a fundamental solution to AI safety. The underlying truth is that without formal, deterministic bounds, we are simply hoping for the best.
The OHM NI Stack’s vision is to address AI security rigorously and mathematically, as a best practice to which the sharpest minds can contribute. We have filed patent applications (#63/994,444, #63/997,472, and #64/1,997,472) for the AEGIS Cascade and the TLA (Monotonic Risk Ratchet). This framework abandons probability for physics, using "Kaiostic Entropy" to enforce one-way mathematical bounds on context erosion.
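To make the "one-way" property concrete, here is a minimal sketch of what a monotonic risk ratchet could look like: a risk bound that can only tighten, never relax. This is purely an illustrative assumption on our part; the class and method names (`MonotonicRiskRatchet`, `tighten`, `permits`) are hypothetical and do not describe the patented design.

```python
class MonotonicRiskRatchet:
    """Illustrative sketch: an upper bound on permitted risk that can only decrease."""

    def __init__(self, initial_bound: float):
        self._bound = initial_bound

    @property
    def bound(self) -> float:
        return self._bound

    def tighten(self, new_bound: float) -> None:
        # The ratchet moves one way only: a proposed looser bound is ignored.
        self._bound = min(self._bound, new_bound)

    def permits(self, risk: float) -> bool:
        # An action is allowed only if its risk sits within the current bound.
        return risk <= self._bound
```

For example, starting from a bound of 1.0, `tighten(0.5)` lowers the bound to 0.5, and a later `tighten(0.8)` has no effect, so the bound never widens once narrowed.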
We are presenting this mathematically bounded paradigm at the Planetary Summit in Vienna this May. We want MIRI to contribute to this deterministic foundation, and we invite you to try to formally break our mathematical bounds. If our physics-based deterministic cascade holds against your evaluation, we can establish it as the global best practice for alignment.