ACPI ERST DEVICE
================

The ACPI ERST device supports the ACPI Error Record Serialization
Table (ERST) functionality. This feature stores error records in
persistent storage for future reference and/or debugging.

The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
(APEI)", and specifically subsection "Error Serialization", outlines a
method for storing error records into persistent storage.

The format of error records is described in the UEFI specification[2],
in Appendix N "Common Platform Error Record".

While the ACPI specification allows for an NVRAM "mode" (see
GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES) where non-volatile RAM is
exposed for direct access by the OS/guest, this device implements the
non-NVRAM "mode". This non-NVRAM "mode" is what most BIOS
implementations provide (since flash memory requires programming
operations in order to update its contents). Furthermore, as of the
time of this writing, Linux only supports the non-NVRAM "mode".


Background/Motivation
---------------------

Linux uses the persistent storage filesystem, pstore, to record
information (e.g. dmesg tail) upon panics and shutdowns. Pstore is
independent of, and runs before, kdump. In certain scenarios (e.g.
hosts/guests with root filesystems on NFS/iSCSI where networking
software and/or hardware fails, and thus kdump fails), pstore may
contain information available for post-mortem debugging.

Two common storage backends for the pstore filesystem are ACPI ERST
and UEFI. Most BIOS implementations provide ACPI ERST. UEFI is not
utilized in all guests. With QEMU supporting ACPI ERST, it becomes a
viable pstore storage backend for virtual machines (as it is now for
bare metal machines).

Enabling support for ACPI ERST facilitates a consistent method to
capture kernel panic information in a wide range of guests: from
resource-constrained microvms to very large guests, and in particular,
in direct-boot environments (which would lack UEFI run-time services).

Note that Microsoft Windows also utilizes the ACPI ERST for certain
crash information, if available[3].


Configuration|Usage
-------------------

To use ACPI ERST, a memory-backend-file object and acpi-erst device
can be created, for example::

    qemu ... \
    -object memory-backend-file,id=erstnvram,mem-path=acpi-erst.backing,size=0x10000,share=on \
    -device acpi-erst,memdev=erstnvram

For proper operation, the ACPI ERST device needs a memory-backend-file
object with the following parameters:

 - id: The id of the memory-backend-file object is used to associate
   this memory with the acpi-erst device.
 - size: The size of the ACPI ERST backing storage. This parameter is
   required.
 - mem-path: The location of the ACPI ERST backing storage file. This
   parameter is also required.
 - share: The share=on parameter is required so that updates to the
   ERST backing store are written to the file.

and an acpi-erst device with the following parameters:

 - memdev: The object id of the memory-backend-file.
 - record_size: Specifies the size of the records (or slots) in the
   backend storage. Must be a power of two value greater than or
   equal to 4096 (PAGE_SIZE).


PCI Interface
-------------

The ERST device is a PCI device with two BARs, one for accessing the
programming registers, and the other for accessing the record exchange
buffer.

BAR0 contains the programming interface consisting of ACTION and VALUE
64-bit registers.
All ERST actions/operations/side effects happen on
the write to ACTION, by design. Any data needed by the action must
be placed into VALUE prior to writing ACTION. Reading VALUE
simply returns the register contents, which can be updated by a
previous ACTION.

BAR1 contains the 8KiB record exchange buffer, which is the
implemented maximum record size.


Backend Storage Format
----------------------

The backend storage is divided into fixed size "slots", 8KiB in
length, with each slot storing a single record. Not all slots need to
be occupied, and they need not be occupied in a contiguous fashion.
The ability to clear/erase specific records allows for the formation
of unoccupied slots.

Slot 0 contains a backend storage header that identifies the contents
as ERST and also facilitates efficient access to the records.
Depending upon the size of the backend storage, additional slots will
be designated to be a part of the slot 0 header. For example, at 8KiB,
the slot 0 header can accommodate 1021 records. Thus a storage size
of 8MiB (8KiB * 1024) requires an additional slot for use by the
header. In this scenario, slot 0 and slot 1 form the backend storage
header, and records can be stored starting at slot 2.
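
The header sizing above can be checked with a short calculation. The
sketch below is illustrative only and is not taken from the QEMU
sources. It assumes that the fixed part of the header is 0x18 bytes
(the offset of record_id[0] in the header layout given in this
document), that each record_id occupies 8 bytes, and that the header
reserves one record_id entry for every slot in the storage, header
slots included. The function names are hypothetical.

```python
# Illustrative sketch only; the names and the sizing rule are assumptions
# derived from the layout described in this document.
SLOT_BYTES = 8 * 1024        # fixed slot size, 8KiB
HEADER_FIXED_BYTES = 0x18    # assumed: offset of record_id[0] in the header
RECORD_ID_BYTES = 8          # record identifiers are 64-bit


def record_ids_in_first_slot(slot_bytes=SLOT_BYTES):
    """Number of record_id entries that fit in slot 0 after the fixed fields."""
    return (slot_bytes - HEADER_FIXED_BYTES) // RECORD_ID_BYTES


def header_slots(storage_bytes, slot_bytes=SLOT_BYTES):
    """Slots consumed by the header, assuming one record_id entry per slot."""
    total_slots = storage_bytes // slot_bytes
    header_bytes = HEADER_FIXED_BYTES + total_slots * RECORD_ID_BYTES
    # Round up to a whole number of slots (ceiling division).
    return (header_bytes + slot_bytes - 1) // slot_bytes


if __name__ == "__main__":
    print(record_ids_in_first_slot())       # prints 1021
    print(header_slots(0x10000))            # 64KiB storage: prints 1
    print(header_slots(8 * 1024 * 1024))    # 8MiB storage: prints 2
```

Under these assumptions the result agrees with the text: 1021 entries
fit in slot 0, and an 8MiB storage needs two header slots, so records
start at slot 2.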

Below is an example layout of the backend storage format (for storage
size less than 8MiB). The size of the storage is a multiple of 8KiB,
and contains N number of slots to store records. The example below
shows two records (in CPER format) in the backend storage, while the
remaining slots are empty/available.

::

 Slot   Record
        <------------------ 8KiB -------------------->
        +--------------------------------------------+
    0   | storage header                             |
        +--------------------------------------------+
    1   | empty/available                            |
        +--------------------------------------------+
    2   | CPER                                       |
        +--------------------------------------------+
    3   | CPER                                       |
        +--------------------------------------------+
  ...   |                                            |
        +--------------------------------------------+
    N   | empty/available                            |
        +--------------------------------------------+

The storage header consists of some basic information and an array
of CPER record_id's to efficiently access records in the backend
storage.

All fields in the header are stored in little endian format.
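
As an illustration of how this header might be consumed, the sketch
below scans the record_id array of a header image and lists the
occupied slots. It is not QEMU code: the function name and its
parameters are hypothetical, it relies only on details stated in this
document (the magic value, the array starting at offset 0x0018, 64-bit
little endian entries, and all 0s or all 1s marking an available
slot), and it ignores the fields between the magic and the array.

```python
import struct

ERST_MAGIC = 0x524F545354535245        # 'magic' value given in this document
RECORD_ID_ARRAY_OFFSET = 0x18          # offset of record_id[0] in the layout
INVALID_IDS = (0, 0xFFFFFFFFFFFFFFFF)  # all 0s or all 1s: slot is available


def occupied_slots(header, slot_count):
    """Return (slot_index, record_id) pairs for slots holding a record.

    'header' is the raw header bytes and 'slot_count' is the number of
    slots in the backend storage (both hypothetical parameters).
    """
    (magic,) = struct.unpack_from("<Q", header, 0)  # little endian 64-bit
    if magic != ERST_MAGIC:
        raise ValueError("not an ERST backend storage header")
    ids = struct.unpack_from("<%dQ" % slot_count, header,
                             RECORD_ID_ARRAY_OFFSET)
    return [(i, rid) for i, rid in enumerate(ids) if rid not in INVALID_IDS]


if __name__ == "__main__":
    # Hand-built 4-slot header image: magic, 16 zero bytes standing in for
    # the remaining fixed fields, then four record_id entries.
    image = struct.pack("<Q", ERST_MAGIC) + bytes(16)
    image += struct.pack("<4Q", 0, 0x1111, 0xFFFFFFFFFFFFFFFF, 0x2222)
    print(image[:8])                 # prints b'ERSTSTOR'
    print(occupied_slots(image, 4))  # slots 1 and 3 hold records
```

Stored in little endian order, the magic value reads as the ASCII
string "ERSTSTOR" at the start of the backing file, which gives a
quick way to recognise an initialised file in a hex dump.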

::

    +--------------------------------------------+
    | magic                                      | 0x0000
    +--------------------------------------------+
    | record_offset        | record_size         | 0x0008
    +--------------------------------------------+
    | record_count         | reserved | version  | 0x0010
    +--------------------------------------------+
    | record_id[0]                               | 0x0018
    +--------------------------------------------+
    | record_id[1]                               | 0x0020
    +--------------------------------------------+
    | record_id[...]                             |
    +--------------------------------------------+
    | record_id[N]                               | 0x1FF8
    +--------------------------------------------+

The 'magic' field contains the value 0x524F545354535245.

The 'record_size' field contains the value 0x2000, 8KiB.

The 'record_offset' field points to the first record_id in the array,
0x0018.

The 'version' field contains 0x0100, the first version.

The 'record_count' field contains the number of valid records in the
backend storage.

The 'record_id' array fields are the 64-bit record identifiers of the
CPER record in the corresponding slot.
Stated differently, the
location of a CPER record_id in the record_id[] array provides the
slot index for the corresponding record in the backend storage.

Note that, for example, with a backend storage less than 8MiB, slot 0
contains the header, so record_id[0] will never contain a valid
CPER record_id. Instead, slot 1 is the first available slot and thus
record_id[1] may contain a valid CPER record_id.

A 'record_id' of all 0s or all 1s indicates an invalid record (i.e. the
slot is available).


References
----------

[1] "Advanced Configuration and Power Interface Specification",
    version 4.0, June 2009.

[2] "Unified Extensible Firmware Interface Specification",
    version 2.1, October 2008.

[3] "Windows Hardware Error Architecture", specifically
    "Error Record Persistence Mechanism".