Microsoft
Windows Kernel Developer on the Windows Hardware Error Architecture (WHEA).
Notable Technical Projects:
- Implementation of Downstream Port Containment in the Windows PCI driver.
- Extension of Downstream Port Containment support for E1.S drive surprise removal.
- Windows support for AMD's MCAx (Extended Machine Check Architecture).
- Porting of platform-specific hardware reliability features from firmware code into kernel code.
- Porting of Windows hardware reliability features for Arm platforms.
- Documentation champion, running local "documentation days" to improve documentation culture.
WHEA? Yes! We even have our own MSDN pages. WHEA owns BSOD code 0x124 (WHEA_UNCORRECTABLE_ERROR). If you see a BSOD 0x124, it means your hardware has slightly lost its mind and the Windows kernel is making a controlled emergency landing. Sadly, you won't be reaching your destination this boot session, but all your files will get to ride those fine inflatable slides off the side of the plane and return home safely for when you next boot your machine. We don't always crash your machine, though; we also implement recovery mechanisms to keep it alive.
Usually things are not the fault of hardware, but ECC exists for a reason. When hardware does flip bits, it's preferable to contain the damage and bring the system down in a controlled fashion rather than risk executing random instructions. And at cloud scale, a one-in-a-million event is a daily occurrence. You can even use memory errors to attack machines, and try to induce them using hair dryers! I'm serious: the paper "Using Memory Errors to Attack a Virtual Machine" by Sudhakar Govindavajhala and Andrew W. Appel features a figure with a heat lamp attached to a desktop to induce memory errors.
Industry Contributions:
- Input on updates to the Arm RAS Extensions.
- Review of updates to ACPI, specifically in the APEI tables.
- Participation in the Arm RAS workgroup for SBSA requirements.