Ramoops oops/panic logger
=========================

Sergiu Iordache <sergiu@chromium.org>

Updated: 10 Feb 2021

Introduction
------------

Ramoops is an oops/panic logger that writes its logs to RAM before the system
crashes. It works by logging oopses and panics in a circular buffer. Ramoops
needs a system with persistent RAM so that the content of that area can
survive a restart.

Ramoops concepts
----------------

Ramoops uses a predefined memory area to store the dump. The start, size, and
type of the memory area are set using three variables:

  * ``mem_address`` for the start
  * ``mem_size`` for the size. The memory size will be rounded down to a
    power of two.
  * ``mem_type`` to specify the memory type (default is
    ``pgprot_writecombine``).

Typically the default value of ``mem_type=0`` should be used, as that sets the
pstore mapping to ``pgprot_writecombine``. Setting ``mem_type=1`` attempts to
use ``pgprot_noncached``, which only works on some platforms. This is because
pstore depends on atomic operations. At least on ARM, ``pgprot_noncached``
causes the memory to be mapped strongly ordered, and atomic operations on
strongly ordered memory are implementation defined and do not work on many ARM
platforms, such as OMAP. Setting ``mem_type=2`` attempts to treat the memory
region as normal memory, which enables full caching on it. This can improve
performance.

The memory area is divided into ``record_size`` chunks (also rounded down to a
power of two) and each kmsg dump writes a ``record_size`` chunk of
information.

Which kinds of kmsg dumps are stored can be limited via the ``max_reason``
value, as defined in include/linux/kmsg_dump.h's ``enum kmsg_dump_reason``.
For example, to store both Oopses and Panics, ``max_reason`` should be set to
2 (KMSG_DUMP_OOPS); to store only Panics, ``max_reason`` should be set to 1
(KMSG_DUMP_PANIC). Setting this to 0 (KMSG_DUMP_UNDEF) means the reason
filtering will be controlled by the ``printk.always_kmsg_dump`` boot param: if
unset, it'll be KMSG_DUMP_OOPS, otherwise KMSG_DUMP_MAX.

The module uses a counter to record multiple dumps, but the counter gets reset
on restart (i.e. new dumps after the restart will overwrite old ones).

Ramoops also supports software ECC protection of persistent memory regions.
This might be useful when a hardware reset was used to bring the machine back
to life (e.g. a watchdog triggered). In such cases, RAM may be somewhat
corrupt, but usually it is restorable.

Setting the parameters
----------------------

The ramoops parameters can be set in several ways:

 A. Use the module parameters (which have the names of the variables
    described above). For quick debugging, you can also reserve parts of
    memory during boot and then use the reserved memory for ramoops. For
    example, assuming a machine with > 128 MB of memory, the following kernel
    command line will tell the kernel to use only the first 128 MB of memory,
    and place an ECC-protected ramoops region at the 128 MB boundary::

        mem=128M ramoops.mem_address=0x8000000 ramoops.ecc=1

 B. Use Device Tree bindings, as described in
    ``Documentation/devicetree/bindings/reserved-memory/ramoops.txt``.
    For example::

        reserved-memory {
                #address-cells = <2>;
                #size-cells = <2>;
                ranges;

                ramoops@8f000000 {
                        compatible = "ramoops";
                        reg = <0 0x8f000000 0 0x100000>;
                        record-size = <0x4000>;
                        console-size = <0x4000>;
                };
        };

 C. Use a platform device and set the platform data. The parameters can then
    be set through that platform data. An example of doing that is:

    .. code-block:: c

      #include <linux/platform_device.h>
      #include <linux/pstore_ram.h>
      [...]

      /* Values are platform specific and left elided here. */
      static struct ramoops_platform_data ramoops_data = {
            .mem_size               = <...>,
            .mem_address            = <...>,
            .mem_type               = <...>,
            .record_size            = <...>,
            .max_reason             = <...>,
            .ecc                    = <...>,
      };

      static struct platform_device ramoops_dev = {
            .name = "ramoops",
            .dev = {
                    .platform_data = &ramoops_data,
            },
      };

      [... inside a function ...]
      int ret;

      ret = platform_device_register(&ramoops_dev);
      if (ret) {
            pr_err("unable to register platform device\n");
            return ret;
      }

You can specify either RAM memory or peripheral devices' memory. However, when
specifying RAM, be sure to reserve the memory by issuing memblock_reserve()
very early in the architecture code, e.g.::

        #include <linux/memblock.h>

        memblock_reserve(ramoops_data.mem_address, ramoops_data.mem_size);

Dump format
-----------

The data dump begins with a header, currently defined as ``====`` followed by a
timestamp and a new line. The dump then continues with the actual data.

Reading the data
----------------

The dump data can be read from the pstore filesystem. The format for these
files is ``dmesg-ramoops-N``, where N is the record number in memory. To delete
a stored record from RAM, simply unlink the respective pstore file.

Persistent function tracing
---------------------------

Persistent function tracing might be useful for debugging software or hardware
related hangs. The function call chain log is stored in a ``ftrace-ramoops``
file. Here is an example of usage::

 # mount -t debugfs debugfs /sys/kernel/debug/
 # echo 1 > /sys/kernel/debug/pstore/record_ftrace
 # reboot -f
 [...]
 # mount -t pstore pstore /mnt/
 # tail /mnt/ftrace-ramoops
 0 ffffffff8101ea64  ffffffff8101bcda  native_apic_mem_read <- disconnect_bsp_APIC+0x6a/0xc0
 0 ffffffff8101ea44  ffffffff8101bcf6  native_apic_mem_write <- disconnect_bsp_APIC+0x86/0xc0
 0 ffffffff81020084  ffffffff8101a4b5  hpet_disable <- native_machine_shutdown+0x75/0x90
 0 ffffffff81005f94  ffffffff8101a4bb  iommu_shutdown_noop <- native_machine_shutdown+0x7b/0x90
 0 ffffffff8101a6a1  ffffffff8101a437  native_machine_emergency_restart <- native_machine_restart+0x37/0x40
 0 ffffffff811f9876  ffffffff8101a73a  acpi_reboot <- native_machine_emergency_restart+0xaa/0x1e0
 0 ffffffff8101a514  ffffffff8101a772  mach_reboot_fixups <- native_machine_emergency_restart+0xe2/0x1e0
 0 ffffffff811d9c54  ffffffff8101a7a0  __const_udelay <- native_machine_emergency_restart+0x110/0x1e0
 0 ffffffff811d9c34  ffffffff811d9c80  __delay <- __const_udelay+0x30/0x40
 0 ffffffff811d9d14  ffffffff811d9c3f  delay_tsc <- __delay+0xf/0x20
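
The call chain log shown above can be post-processed with ordinary shell
tools. As a minimal sketch, assuming every line follows the layout of the
sample output (CPU number, two addresses, the traced function, ``<-``, then
the caller), the helper below counts how often each traced function appears.
The name ``summarize_ftrace`` is hypothetical and not part of pstore, and the
line layout may differ between kernel versions:

.. code-block:: sh

    # Hypothetical helper, not part of pstore.
    # Assumed line layout (taken from the sample output):
    #   <cpu> <ip> <parent_ip> <function> <- <caller>+<offset>/<size>
    # Lines whose fifth field is not "<-" are ignored.
    summarize_ftrace() {
        awk '$5 == "<-" { count[$4]++ }
             END { for (f in count) print count[f], f }' "$@" |
            sort -k1,1nr -k2,2
    }

For example, ``summarize_ftrace /mnt/ftrace-ramoops`` prints one
``count function`` pair per line, most frequent first. With no file argument
the helper reads the log from standard input.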