*** Settings ***
Documentation       Utility for RAS test scenarios through HOST & BMC.
Resource            ../../lib/utils.robot
Resource            ../../lib/ras/host_utils.robot
Resource            ../../lib/resource.robot
Resource            ../../lib/state_manager.robot
Resource            ../../lib/boot_utils.robot
Variables           ../../lib/ras/variables.py
Variables           ../../data/variables.py
Resource            ../../lib/dump_utils.robot

Library             DateTime
Library             OperatingSystem
Library             random
Library             Collections

*** Variables ***
${stack_mode}       normal

*** Keywords ***

Verify And Clear Gard Records On HOST
    [Documentation]    Verify and clear gard records on HOST.

    ${output}=    Gard Operations On OS    list
    Should Not Contain    ${output}    No GARD
    Gard Operations On OS    clear all

Verify Error Log Entry
    [Documentation]    Verify error log entry & signature description.
    [Arguments]    ${signature_desc}    ${log_prefix}
    # Description of argument(s):
    # signature_desc    Error log signature description.
    # log_prefix        Log path prefix.

    Error Logs Should Exist

    Collect eSEL Log    ${log_prefix}
    ${error_log_file_path}=    Catenate    ${log_prefix}esel.txt
    ${rc}    ${output}=    Run and Return RC and Output
    ...    grep -i ${signature_desc} ${error_log_file_path}
    Should Be Equal    ${rc}    ${0}
    Should Not Be Empty    ${output}

Inject Recoverable Error With Threshold Limit
    [Documentation]    Inject and verify recoverable error on processor through
    ...                BMC/HOST.
    ...                Test sequence:
    ...                1. Inject recoverable error on a given target
    ...                   (e.g. Processor core, CAPP, MCA) through BMC/HOST.
    ...                2. Check if HOST is running.
    ...                3. Verify that no gard records exist.
    ...                4. Verify error log entry & signature description.
    [Arguments]    ${interface_type}    ${fir_address}    ${value}    ${threshold_limit}
    ...    ${signature_desc}    ${log_prefix}
    # Description of argument(s):
    # interface_type     Inject error through 'BMC' or 'HOST'.
    # fir_address        FIR (Fault isolation register) value (e.g. 2011400).
    # value              Error injection value (e.g. 2000000000000000).
    # threshold_limit    Threshold limit (e.g. 1, 5, 32).
    # signature_desc     Error log signature description.
    # log_prefix         Log path prefix.

    Run Keyword    Inject Error Through ${interface_type}
    ...    ${fir_address}    ${value}    ${threshold_limit}    ${master_proc_chip}

    Is Host Running
    ${output}=    Gard Operations On OS    list
    Should Contain    ${output}    No GARD
    Verify Error Log Entry    ${signature_desc}    ${log_prefix}


Inject Unrecoverable Error
    [Documentation]    Inject and verify unrecoverable error on processor through
    ...                BMC/HOST.
    ...                Test sequence:
    ...                1. Inject unrecoverable error on a given target
    ...                   (e.g. Processor core, CAPP, MCA) through BMC/HOST.
    ...                2. Check if HOST is rebooted.
    ...                3. Verify error log entry & signature description.
    ...                4. Verify & clear dump entry.
    ...                5. Verify & clear gard records.
    [Arguments]    ${interface_type}    ${fir_address}    ${value}    ${threshold_limit}
    ...    ${signature_desc}    ${log_prefix}    ${bmc_reboot}=${0}
    # Description of argument(s):
    # interface_type     Inject error through 'BMC' or 'HOST'.
    # fir_address        FIR (Fault isolation register) value (e.g. 2011400).
    # value              Error injection value (e.g. 2000000000000000).
    # threshold_limit    Threshold limit (e.g. 1, 5, 32).
    # signature_desc     Error log signature description
    #                    (e.g. 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable').
    # log_prefix         Log path prefix.
    # bmc_reboot         Reboot the BMC after injection if bmc_reboot is set.

    Run Keyword    Inject Error Through ${interface_type}
    ...    ${fir_address}    ${value}    ${threshold_limit}    ${master_proc_chip}

    # Reboot the BMC after error injection if requested; otherwise wait for
    # the host to reboot on its own.
    Run Keyword If    ${bmc_reboot}    Run Keywords
    ...    Initiate BMC Reboot
    ...    Wait For BMC Ready
    ...    Initiate Host PowerOff
    ...    Initiate Host Boot
    ...    ELSE
    ...    Wait Until Keyword Succeeds    500 sec    20 sec    Is Host Rebooted

    Wait for OS
    Verify Error Log Entry    ${signature_desc}    ${log_prefix}

    ${dump_service_status}    ${stderr}    ${rc}=    BMC Execute Command
    ...    systemctl status xyz.openbmc_project.Dump.Manager.service
    Should Contain    ${dump_service_status}    Active: active (running)

    ${resp}=    OpenBMC Get Request    ${DUMP_URI}
    Run Keyword If    '${resp.status_code}' == '${HTTP_NOT_FOUND}'
    ...    Set Test Variable    ${DUMP_ENTRY_URI}    /xyz/openbmc_project/dump/entry/

    Read Properties    ${DUMP_ENTRY_URI}list
    Delete All BMC Dump
    Verify And Clear Gard Records On HOST


Fetch FIR Address Translation Value
    [Documentation]    Fetch FIR address translation value through HOST.
    [Arguments]    ${fir_address}    ${target_type}
    # Description of argument(s):
    # fir_address    FIR (Fault isolation register) value (e.g. '2011400').
    # target_type    Target type (e.g. 'EX', 'EQ', 'C').

    Login To OS Host
    Copy Address Translation Utils To HOST OS

    # Fetch processor chip IDs.
    ${proc_chip_id}=    Get ProcChipId From OS    Processor    ${master_proc_chip}
    # Example output:
    # 00000000

    ${core_ids}=    Get Core IDs From OS    ${proc_chip_id[-1]}
    # Example output:
    # ./probe_cpus.sh | grep 'CHIP ID: 0' | cut -c21-22
    # ['14', '15', '16', '17']

    # Ignore the master core ID (first entry in the list).
    ${non_master_core_ids}=    Get Slice From List    ${core_ids}    1
    # Fetch a random non-master core ID.
    ${core_ids_sub_list}=    Evaluate
    ...    random.sample(${non_master_core_ids}, 1)    random
    ${core_id}=    Get From List    ${core_ids_sub_list}    0
    ${translated_fir_addr}=    FIR Address Translation Through HOST
    ...    ${fir_address}    ${core_id}    ${target_type}

    [Return]    ${translated_fir_addr}

RAS Test SetUp
    [Documentation]    Validate input parameters and boot the host to OS.

    Should Not Be Empty
    ...    ${OS_HOST}    msg=You must provide DNS name/IP of the OS host.
    Should Not Be Empty
    ...    ${OS_USERNAME}    msg=You must provide OS host user name.
    Should Not Be Empty
    ...    ${OS_PASSWORD}    msg=You must provide OS host user password.

    Smart Power Off

    # Boot to OS.
    REST Power On    quiet=${1}
    # Add a delay after host bring-up.
    Sleep    60s

RAS Suite Setup
    [Documentation]    Create RAS log directory to store all RAS test logs.

    ${RAS_LOG_DIR_PATH}=    Catenate    ${EXECDIR}/RAS_logs/
    Set Suite Variable    ${RAS_LOG_DIR_PATH}
    Set Suite Variable    ${master_proc_chip}    False

    Create Directory    ${RAS_LOG_DIR_PATH}
    OperatingSystem.Directory Should Exist    ${RAS_LOG_DIR_PATH}
    Empty Directory    ${RAS_LOG_DIR_PATH}

    Should Not Be Empty    ${ESEL_BIN_PATH}
    Set Environment Variable    PATH    %{PATH}:${ESEL_BIN_PATH}

    # Boot to OS.
    REST Power On    quiet=${1}

    # Check that the Opal-PRD service is enabled on the host.
    ${opal_prd_state}=    Is Opal-PRD Service Enabled
    Run Keyword If    '${opal_prd_state}' == 'disabled'
    ...    Enable Opal-PRD Service On HOST

RAS Suite Cleanup
    [Documentation]    Perform RAS suite cleanup and verify that host
    ...                boots after test suite run.

    # Boot to OS.
    REST Power On
    Delete Error Logs
    Gard Operations On OS    clear all


Inject Error At HOST Boot Path
    [Documentation]    Inject and verify recoverable error on processor through
    ...                BMC using pdbg tool at HOST boot path.
    ...                Test sequence:
    ...                1. Inject error on a given target
    ...                   (e.g. Processor core, CAPP, MCA) through BMC using
    ...                   pdbg tool at HOST boot path.
    ...                2. Check if HOST is rebooted and running.
    ...                3. Verify error log entry & signature description.
    ...                4. Verify & clear gard records.
    [Arguments]    ${fir_address}    ${value}    ${signature_desc}    ${log_prefix}
    # Description of argument(s):
    # fir_address       FIR (Fault isolation register) value (e.g. 2011400).
    # value             Error injection value (e.g. 2000000000000000).
    # signature_desc    Error log signature description.
    # log_prefix        Log path prefix.

    Inject Error Through BMC At HOST Boot    ${fir_address}    ${value}

    Wait Until Keyword Succeeds    500 sec    20 sec    Is Host Rebooted
    Wait for OS
    Verify Error Log Entry    ${signature_desc}    ${log_prefix}
    Verify And Clear Gard Records On HOST