*** Settings ***
Documentation       Utility for RAS test scenarios through HOST & BMC.
Resource            ../../lib/utils.robot
Resource            ../../lib/ras/host_utils.robot
Resource            ../../lib/resource.robot
Resource            ../../lib/state_manager.robot
Resource            ../../lib/boot_utils.robot
Variables           ../../lib/ras/variables.py
Variables           ../../data/variables.py
Resource            ../../lib/dump_utils.robot

Library             DateTime
Library             OperatingSystem
Library             random
Library             Collections

*** Variables ***
${stack_mode}       normal

*** Keywords ***

Verify And Clear Gard Records On HOST
    [Documentation]  Verify and clear gard records on HOST.

    ${output}=  Gard Operations On OS  list
    Should Not Contain  ${output}  No GARD
    Gard Operations On OS  clear all

Verify Error Log Entry
    [Documentation]  Verify error log entry & signature description.
    [Arguments]  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # signature_desc  Error log signature description.
    # log_prefix      Log path prefix.

    Error Logs Should Exist

    Collect eSEL Log  ${log_prefix}
    ${error_log_file_path}=  Catenate  ${log_prefix}esel.txt
    ${rc}  ${output}=  Run and Return RC and Output
    ...  grep -i ${signature_desc} ${error_log_file_path}
    Should Be Equal  ${rc}  ${0}
    Should Not Be Empty  ${output}

Inject Recoverable Error With Threshold Limit
    [Documentation]  Inject and verify recoverable error on processor through
    ...              BMC/HOST.
    ...              Test sequence:
    ...              1. Inject recoverable error on a given target
    ...                 (e.g. Processor core, CAPP, MCA) through BMC/HOST.
    ...              2. Check if HOST is running.
    ...              3. Verify error log entry & signature description.
    ...              4. Verify & clear gard records.
    [Arguments]  ${interface_type}  ${fir_address}  ${value}  ${threshold_limit}
    ...          ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # interface_type   Inject error through 'BMC' or 'HOST'.
    # fir_address      FIR (Fault isolation register) value (e.g. 2011400).
    # value            (e.g. 2000000000000000).
    # threshold_limit  Threshold limit (e.g. 1, 5, 32).
    # signature_desc   Error log signature description.
    # log_prefix       Log path prefix.

    Run Keyword  Inject Error Through ${interface_type}
    ...  ${fir_address}  ${value}  ${threshold_limit}  ${master_proc_chip}

    Is Host Running
    ${output}=  Gard Operations On OS  list
    Should Contain  ${output}  No GARD
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}


Inject Unrecoverable Error
    [Documentation]  Inject and verify unrecoverable error on processor through
    ...              BMC/HOST.
    ...              Test sequence:
    ...              1. Inject unrecoverable error on a given target
    ...                 (e.g. Processor core, CAPP, MCA) through BMC/HOST.
    ...              2. Check if HOST is rebooted.
    ...              3. Verify & clear gard records.
    ...              4. Verify error log entry & signature description.
    ...              5. Verify & clear dump entry.
    [Arguments]  ${interface_type}  ${fir_address}  ${value}  ${threshold_limit}
    ...          ${signature_desc}  ${log_prefix}  ${bmc_reboot}=${0}
    # Description of argument(s):
    # interface_type   Inject error through 'BMC' or 'HOST'.
    # fir_address      FIR (Fault isolation register) value (e.g. 2011400).
    # value            (e.g. 2000000000000000).
    # threshold_limit  Threshold limit (e.g. 1, 5, 32).
    # signature_desc   Error log signature description.
    #                  (e.g. 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable')
    # log_prefix       Log path prefix.
    # bmc_reboot       Reboot the BMC if bmc_reboot is set.

    Run Keyword  Inject Error Through ${interface_type}
    ...  ${fir_address}  ${value}  ${threshold_limit}  ${master_proc_chip}

    # Do BMC reboot after error injection.
    Run Keyword If  ${bmc_reboot}  Run Keywords
    ...    Initiate BMC Reboot
    ...    Wait For BMC Ready
    ...    Initiate Host PowerOff
    ...    Initiate Host Boot
    ...  ELSE
    ...    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted

    Wait for OS
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}
    Read Properties  ${DUMP_ENTRY_URI}list
    Delete All BMC Dump
    Verify And Clear Gard Records On HOST

Fetch FIR Address Translation Value
    [Documentation]  Fetch FIR address translation value through HOST.
    [Arguments]  ${fir_address}  ${target_type}
    # Description of argument(s):
    # fir_address  FIR (Fault isolation register) value (e.g. '2011400').
    # target_type  Target type (e.g. 'EX', 'EQ', 'C').

    Login To OS Host
    Copy Address Translation Utils To HOST OS

    # Fetch processor chip IDs.
    ${proc_chip_id}=  Get ProcChipId From OS  Processor  ${master_proc_chip}
    # Example output:
    # 00000000

    ${core_ids}=  Get Core IDs From OS  ${proc_chip_id[-1]}
    # Example output:
    # ./probe_cpus.sh | grep 'CHIP ID: 0' | cut -c21-22
    # ['14', '15', '16', '17']

    # Ignore the master core ID (first entry in the list).
    ${output}=  Get Slice From List  ${core_ids}  1
    # Fetch a random non-master core ID from the sliced list.
    ${core_ids_sub_list}=  Evaluate  random.sample(${output}, 1)  random
    ${core_id}=  Get From List  ${core_ids_sub_list}  0
    ${translated_fir_addr}=  FIR Address Translation Through HOST
    ...  ${fir_address}  ${core_id}  ${target_type}

    [Return]  ${translated_fir_addr}

RAS Test SetUp
    [Documentation]  Validates input parameters.

    Should Not Be Empty
    ...  ${OS_HOST}  msg=You must provide DNS name/IP of the OS host.
    Should Not Be Empty
    ...  ${OS_USERNAME}  msg=You must provide OS host user name.
    Should Not Be Empty
    ...  ${OS_PASSWORD}  msg=You must provide OS host user password.

    Smart Power Off

    # Boot to OS.
    REST Power On  quiet=${1}
    # Adding delay after host bring up.
    Sleep  60s

RAS Suite Setup
    [Documentation]  Create RAS log directory to store all RAS test logs.

    ${RAS_LOG_DIR_PATH}=  Catenate  ${EXECDIR}/RAS_logs/
    Set Suite Variable  ${RAS_LOG_DIR_PATH}
    Set Suite Variable  ${master_proc_chip}  False

    Create Directory  ${RAS_LOG_DIR_PATH}
    OperatingSystem.Directory Should Exist  ${RAS_LOG_DIR_PATH}
    Empty Directory  ${RAS_LOG_DIR_PATH}

    Should Not Be Empty  ${ESEL_BIN_PATH}
    Set Environment Variable  PATH  %{PATH}:${ESEL_BIN_PATH}

    # Boot to OS.
    REST Power On  quiet=${1}

    # Check Opal-PRD service enabled on host.
    ${opal_prd_state}=  Is Opal-PRD Service Enabled
    Run Keyword If  '${opal_prd_state}' == 'disabled'
    ...  Enable Opal-PRD Service On HOST

RAS Suite Cleanup
    [Documentation]  Perform RAS suite cleanup and verify that host
    ...              boots after test suite run.

    # Boot to OS.
    REST Power On
    Delete Error Logs
    Gard Operations On OS  clear all


Inject Error At HOST Boot Path
    [Documentation]  Inject and verify recoverable error on processor through
    ...              BMC using pdbg tool at HOST Boot path.
    ...              Test sequence:
    ...              1. Inject error on a given target
    ...                 (e.g. Processor core, CAPP, MCA) through BMC using
    ...                 pdbg tool at HOST Boot path.
    ...              2. Check if HOST is rebooted and running.
    ...              3. Verify error log entry & signature description.
    ...              4. Verify & clear gard records.
    [Arguments]  ${fir_address}  ${value}  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # fir_address     FIR (Fault isolation register) value (e.g. 2011400).
    # value           (e.g. 2000000000000000).
    # signature_desc  Error log signature description.
    # log_prefix      Log path prefix.

    Inject Error Through BMC At HOST Boot  ${fir_address}  ${value}

    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted
    Wait for OS
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}
    Verify And Clear Gard Records On HOST