*** Settings ***
Documentation       Utility for RAS test scenarios through HOST & BMC.
Resource            ../../lib/utils.robot
Resource            ../../lib/ras/host_utils.robot
Resource            ../../lib/resource.robot
Resource            ../../lib/state_manager.robot
Resource            ../../lib/boot_utils.robot
Variables           ../../lib/ras/variables.py
Variables           ../../data/variables.py
Resource            ../../lib/dump_utils.robot

Library             DateTime
Library             OperatingSystem
Library             random
Library             Collections

*** Variables ***
${stack_mode}       normal

*** Keywords ***

Verify And Clear Gard Records On HOST
    [Documentation]  Verify and clear gard records on HOST.

    ${output}=  Gard Operations On OS  list
    Should Not Contain  ${output}  No GARD
    Gard Operations On OS  clear all

Verify Error Log Entry
    [Documentation]  Verify error log entry & signature description.
    [Arguments]  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # signature_desc  Error log signature description.
    # log_prefix      Log path prefix.

    Error Logs Should Exist

    Collect eSEL Log  ${log_prefix}
    ${error_log_file_path}=  Catenate  ${log_prefix}esel.txt
    ${rc}  ${output}=  Run and Return RC and Output
    ...  grep -i ${signature_desc} ${error_log_file_path}
    Should Be Equal  ${rc}  ${0}
    Should Not Be Empty  ${output}

Inject Recoverable Error With Threshold Limit
    [Documentation]  Inject and verify recoverable error on processor through
    ...              BMC/HOST.
    ...              Test sequence:
    ...              1. Inject recoverable error on a given target
    ...                 (e.g. Processor core, CAPP, MCA) through BMC/HOST.
    ...              2. Check if HOST is running.
    ...              3. Verify error log entry & signature description.
    ...              4. Verify & clear gard records.
    [Arguments]  ${interface_type}  ${fir_address}  ${value}  ${threshold_limit}
    ...  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # interface_type   Inject error through 'BMC' or 'HOST'.
    # fir_address      FIR (Fault isolation register) value (e.g. 2011400).
    # value            (e.g. 2000000000000000).
    # threshold_limit  Threshold limit (e.g. 1, 5, 32).
    # signature_desc   Error log signature description.
    # log_prefix       Log path prefix.

    Run Keyword  Inject Error Through ${interface_type}
    ...  ${fir_address}  ${value}  ${threshold_limit}  ${master_proc_chip}

    Is Host Running
    ${output}=  Gard Operations On OS  list
    Should Contain  ${output}  No GARD
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}


Inject Unrecoverable Error
    [Documentation]  Inject and verify unrecoverable error on processor through
    ...              BMC/HOST.
    ...              Test sequence:
    ...              1. Inject unrecoverable error on a given target
    ...                 (e.g. Processor core, CAPP, MCA) through BMC/HOST.
    ...              2. Check if HOST is rebooted.
    ...              3. Verify & clear gard records.
    ...              4. Verify error log entry & signature description.
    ...              5. Verify & clear dump entry.
    [Arguments]  ${interface_type}  ${fir_address}  ${value}  ${threshold_limit}
    ...  ${signature_desc}  ${log_prefix}  ${bmc_reboot}=${0}
    # Description of argument(s):
    # interface_type   Inject error through 'BMC' or 'HOST'.
    # fir_address      FIR (Fault isolation register) value (e.g. 2011400).
    # value            (e.g. 2000000000000000).
    # threshold_limit  Threshold limit (e.g. 1, 5, 32).
    # signature_desc   Error log signature description.
    #                  (e.g. 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable')
    # log_prefix       Log path prefix.
    # bmc_reboot       Reboot the BMC if bmc_reboot is set.

    Run Keyword  Inject Error Through ${interface_type}
    ...  ${fir_address}  ${value}  ${threshold_limit}  ${master_proc_chip}

    # Reboot the BMC after error injection if requested; otherwise wait for
    # the host to reboot on its own.
    Run Keyword If  ${bmc_reboot}  Run Keywords
    ...    Initiate BMC Reboot
    ...    Wait For BMC Ready
    ...    Initiate Host PowerOff
    ...    Initiate Host Boot
    ...  ELSE
    ...    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted

    Wait for OS
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}

    ${dump_service_status}  ${stderr}  ${rc}=  BMC Execute Command
    ...  systemctl status xyz.openbmc_project.Dump.Manager.service
    Should Contain  ${dump_service_status}  Active: active (running)

    ${resp}=  OpenBMC Get Request  ${DUMP_URI}
    Run Keyword If  '${resp.status_code}' == '${HTTP_NOT_FOUND}'
    ...  Set Test Variable  ${DUMP_ENTRY_URI}  /xyz/openbmc_project/dump/entry/

    Read Properties  ${DUMP_ENTRY_URI}list
    Delete All BMC Dump
    Verify And Clear Gard Records On HOST


Fetch FIR Address Translation Value
    [Documentation]  Fetch FIR address translation value through HOST.
    [Arguments]  ${fir_address}  ${target_type}
    # Description of argument(s):
    # fir_address  FIR (Fault isolation register) value (e.g. '2011400').
    # target_type  Target type (e.g. 'EX', 'EQ', 'C').

    Login To OS Host
    Copy Address Translation Utils To HOST OS

    # Fetch processor chip IDs.
    ${proc_chip_id}=  Get ProcChipId From OS  Processor  ${master_proc_chip}
    # Example output:
    # 00000000

    ${core_ids}=  Get Core IDs From OS  ${proc_chip_id[-1]}
    # Example output:
    # ./probe_cpus.sh | grep 'CHIP ID: 0' | cut -c21-22
    # ['14', '15', '16', '17']

    # Ignore the master core ID (first entry in the list).
    ${output}=  Get Slice From List  ${core_ids}  1
    # Fetch a random non-master core ID (e.g. '9') from the sliced list.
    ${core_ids_sub_list}=  Evaluate  random.sample(${output}, 1)  random
    ${core_id}=  Get From List  ${core_ids_sub_list}  0
    ${translated_fir_addr}=  FIR Address Translation Through HOST
    ...  ${fir_address}  ${core_id}  ${target_type}

    RETURN  ${translated_fir_addr}

RAS Test SetUp
    [Documentation]  Validates input parameters.

    Should Not Be Empty
    ...  ${OS_HOST}  msg=You must provide DNS name/IP of the OS host.
    Should Not Be Empty
    ...  ${OS_USERNAME}  msg=You must provide OS host user name.
    Should Not Be Empty
    ...  ${OS_PASSWORD}  msg=You must provide OS host user password.

    Smart Power Off

    # Boot to OS.
    REST Power On  quiet=${1}
    # Adding delay after host bring up.
    Sleep  60s

RAS Suite Setup
    [Documentation]  Create RAS log directory to store all RAS test logs.

    ${RAS_LOG_DIR_PATH}=  Catenate  ${EXECDIR}/RAS_logs/
    Set Suite Variable  ${RAS_LOG_DIR_PATH}
    Set Suite Variable  ${master_proc_chip}  False

    Create Directory  ${RAS_LOG_DIR_PATH}
    OperatingSystem.Directory Should Exist  ${RAS_LOG_DIR_PATH}
    Empty Directory  ${RAS_LOG_DIR_PATH}

    Should Not Be Empty  ${ESEL_BIN_PATH}
    Set Environment Variable  PATH  %{PATH}:${ESEL_BIN_PATH}

    # Boot to OS.
    REST Power On  quiet=${1}

    # Check Opal-PRD service enabled on host.
    ${opal_prd_state}=  Is Opal-PRD Service Enabled
    Run Keyword If  '${opal_prd_state}' == 'disabled'
    ...  Enable Opal-PRD Service On HOST

RAS Suite Cleanup
    [Documentation]  Perform RAS suite cleanup and verify that host
    ...              boots after test suite run.

    # Boot to OS.
    REST Power On
    Delete Error Logs
    Gard Operations On OS  clear all


Inject Error At HOST Boot Path
    [Documentation]  Inject and verify recoverable error on processor through
    ...              BMC using pdbg tool at HOST Boot path.
    ...              Test sequence:
    ...              1. Inject error on a given target
    ...                 (e.g. Processor core, CAPP, MCA) through BMC using
    ...                 pdbg tool at HOST Boot path.
    ...              2. Check if HOST is rebooted and running.
    ...              3. Verify error log entry & signature description.
    ...              4. Verify & clear gard records.
    [Arguments]  ${fir_address}  ${value}  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # fir_address     FIR (Fault isolation register) value (e.g. 2011400).
    # value           (e.g. 2000000000000000).
    # signature_desc  Error log signature description.
    # log_prefix      Log path prefix.

    Inject Error Through BMC At HOST Boot  ${fir_address}  ${value}

    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted
    Wait for OS
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}
    Verify And Clear Gard Records On HOST