.. SPDX-License-Identifier: GPL-2.0

===============================================
RISC-V Kernel Boot Requirements and Constraints
===============================================

:Author: Alexandre Ghiti <alexghiti@rivosinc.com>
:Date: 23 May 2023

This document describes what the RISC-V kernel expects from bootloaders and
firmware, and also the constraints that any developer must have in mind when
touching the early boot process. For the purposes of this document, the
``early boot process`` refers to any code that runs before the final virtual
mapping is set up.

Pre-kernel Requirements and Constraints
=======================================

The RISC-V kernel expects the following of bootloaders and platform firmware:

Register state
--------------

The RISC-V kernel expects:

  * ``$a0`` to contain the hartid of the current core.
  * ``$a1`` to contain the address of the devicetree in memory.

CSR state
---------

The RISC-V kernel expects:

  * ``$satp = 0``: the MMU, if present, must be disabled.
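
The register and CSR expectations above can be summarized as a small check.
The following is only a sketch: ``struct boot_state`` and ``boot_state_ok()``
are hypothetical names that do not exist in the kernel, and treating a zero
devicetree address as invalid is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the state handed over by the previous boot stage. */
struct boot_state {
    uint64_t a0;   /* expected: hartid of the current core */
    uint64_t a1;   /* expected: address of the devicetree in memory */
    uint64_t satp; /* expected: 0, i.e. the MMU is disabled */
};

/*
 * Return true when the snapshot satisfies the requirements listed above.
 * The hartid in a0 is not checked because any value may be a valid hartid.
 */
static bool boot_state_ok(const struct boot_state *s)
{
    if (s->satp != 0)
        return false; /* the MMU, if present, must be disabled */
    if (s->a1 == 0)
        return false; /* assumption: a zero devicetree address is invalid */
    return true;
}
```

A bootloader test harness could fill such a structure right before jumping to
the kernel and refuse to continue when the check fails.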

Reserved memory for resident firmware
-------------------------------------

The RISC-V kernel must not map any resident memory, or memory protected with
PMPs, in the direct mapping, so the firmware must correctly mark those regions
as per the devicetree specification and/or the UEFI specification.

Kernel location
---------------

The RISC-V kernel expects to be placed at a PMD boundary (2MB aligned for rv64
and 4MB aligned for rv32). Note that the EFI stub will physically relocate the
kernel if that's not the case.

Hardware description
--------------------

The firmware can pass either a devicetree or ACPI tables to the RISC-V kernel.

The devicetree is either passed directly to the kernel from the previous stage
using the ``$a1`` register, or when booting with UEFI, it can be passed using the
EFI configuration table.

The ACPI tables are passed to the kernel using the EFI configuration table. In
this case, a tiny devicetree is still created by the EFI stub. Please refer to
the "EFI stub and devicetree" section below for details about this devicetree.
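
As an illustration of the "Kernel location" requirement above, the following
sketch checks whether a physical load address sits on a PMD boundary and rounds
it up if it does not. The helper names are hypothetical and are not taken from
the kernel or from any bootloader. Only the 2MB (rv64) and 4MB (rv32) values
come from this document.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M 0x200000ULL /* PMD size on rv64 */
#define SZ_4M 0x400000ULL /* PMD size on rv32 */

/* Required alignment of the kernel load address. */
static uint64_t kernel_align(bool is_rv64)
{
    return is_rv64 ? SZ_2M : SZ_4M;
}

/* True if the physical address pa sits on a PMD boundary. */
static bool kernel_addr_aligned(uint64_t pa, bool is_rv64)
{
    return (pa & (kernel_align(is_rv64) - 1)) == 0;
}

/* Round pa up to the next PMD boundary (returns pa if already aligned). */
static uint64_t kernel_addr_align_up(uint64_t pa, bool is_rv64)
{
    uint64_t align = kernel_align(is_rv64);

    return (pa + align - 1) & ~(align - 1);
}
```

For instance, ``0x80200000`` is a valid load address on rv64 but not on rv32,
where the next suitable address is ``0x80400000``.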

Kernel entry
------------

On SMP systems, there are two methods to enter the kernel:

- ``RISCV_BOOT_SPINWAIT``: the firmware releases all harts in the kernel, one hart
  wins a lottery and executes the early boot code while the other harts are
  parked waiting for the initialization to finish. This method is mostly used to
  support older firmware that lacks the SBI HSM extension, as well as M-mode
  RISC-V kernels.
- ``Ordered booting``: the firmware releases only one hart that will execute the
  initialization phase and then will start all other harts using the SBI HSM
  extension. The ordered booting method is the preferred booting method for
  booting the RISC-V kernel because it can support CPU hotplug and kexec.

UEFI
----

UEFI memory map
~~~~~~~~~~~~~~~

When booting with UEFI, the RISC-V kernel will use only the EFI memory map to
populate the system memory.

The UEFI firmware must parse the subnodes of the ``/reserved-memory`` devicetree
node and abide by the devicetree specification to convert the attributes of
those subnodes (``no-map`` and ``reusable``) into their correct EFI equivalent
(refer to section "3.5.4 /reserved-memory and UEFI" of the devicetree
specification v0.4-rc1).

RISCV_EFI_BOOT_PROTOCOL
~~~~~~~~~~~~~~~~~~~~~~~

When booting with UEFI, the EFI stub requires the boot hartid in order to pass
it to the RISC-V kernel in ``$a0``. The EFI stub retrieves the boot hartid using
one of the following methods:

- ``RISCV_EFI_BOOT_PROTOCOL`` (**preferred**).
- ``boot-hartid`` devicetree subnode (**deprecated**).

Any new firmware must implement ``RISCV_EFI_BOOT_PROTOCOL`` as the devicetree
based approach is deprecated now.

Early Boot Requirements and Constraints
=======================================

The RISC-V kernel's early boot process operates under the following constraints:

EFI stub and devicetree
-----------------------

When booting with UEFI, the devicetree is supplemented (or created) by the EFI
stub with the same parameters as arm64 which are described in the paragraph
"UEFI kernel support on ARM" in Documentation/arch/arm/uefi.rst.

Virtual mapping installation
----------------------------

The installation of the virtual mapping is done in 2 steps in the RISC-V kernel:

1. ``setup_vm()`` installs a temporary kernel mapping in ``early_pg_dir`` which
   allows discovery of the system memory. Only the kernel text/data are mapped
   at this point. When establishing this mapping, no allocation can be done
   (since the system memory is not known yet), so the ``early_pg_dir`` page
   table is statically allocated (using only one table for each level).

2. ``setup_vm_final()`` creates the final kernel mapping in ``swapper_pg_dir``
   and takes advantage of the discovered system memory to create the linear
   mapping. When establishing this mapping, the kernel can allocate memory but
   cannot access it directly (since the direct mapping is not present yet), so
   it uses temporary mappings in the fixmap region to be able to access the
   newly allocated page table levels.

For ``virt_to_phys()`` and ``phys_to_virt()`` to be able to correctly convert
direct mapping addresses to physical addresses, they need to know the start of
the DRAM. The start of the DRAM becomes known after step 1, right before step 2
installs the direct mapping (see the ``setup_bootmem()`` function in
arch/riscv/mm/init.c). Any usage of those macros before the final virtual
mapping is installed must be carefully examined.
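
The dependency on the start of the DRAM can be illustrated with a simplified
model of the direct mapping. This is a sketch under stated assumptions, not the
kernel's implementation. ``page_offset``, ``dram_base``, ``set_dram_base()``
and the two conversion helpers are hypothetical stand-ins, and the value chosen
for ``page_offset`` is illustrative only.

```c
#include <stdint.h>

/* Illustrative virtual base of the direct mapping (assumed value). */
static const uint64_t page_offset = 0xffffffd800000000ULL;

/* Start of the DRAM; stays 0 until system memory has been discovered. */
static uint64_t dram_base;

/* Record the start of the DRAM once memory discovery has run. */
static void set_dram_base(uint64_t base)
{
    dram_base = base;
}

/* Convert a physical address to its direct mapping virtual address. */
static uint64_t linear_phys_to_virt(uint64_t pa)
{
    return pa - dram_base + page_offset;
}

/* Convert a direct mapping virtual address back to a physical address. */
static uint64_t linear_virt_to_phys(uint64_t va)
{
    return va - page_offset + dram_base;
}
```

In this model, calling ``linear_phys_to_virt()`` before ``set_dram_base()``
silently yields an address that is off by the DRAM start, which is why early
uses of such conversions deserve scrutiny.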

Devicetree mapping via fixmap
-----------------------------

As the ``reserved_mem`` array is initialized with virtual addresses established
by ``setup_vm()``, and used with the mapping established by
``setup_vm_final()``, the RISC-V kernel uses the fixmap region to map the
devicetree. This ensures that the devicetree remains accessible by both virtual
mappings.

Pre-MMU execution
-----------------

A few pieces of code need to run before even the first virtual mapping is
established. These are the installation of the first virtual mapping itself,
patching of early alternatives and the early parsing of the kernel command line.
That code must be very carefully compiled as:

- ``-fno-pie``: This is needed for relocatable kernels which use ``-fPIE``,
  since otherwise, any access to a global symbol would go through the GOT which
  is only relocated virtually.
- ``-mcmodel=medany``: Any access to a global symbol must be PC-relative so
  that no relocation needs to happen before the MMU is set up.
- *all* instrumentation must also be disabled (that includes KASAN, ftrace and
  others).

As using a symbol from a different compilation unit requires this unit to be
compiled with those flags, we advise, as much as possible, not to use external
symbols.