*** Settings ***
Documentation       Utility for RAS test scenarios through HOST & BMC.
Resource            ../../lib/utils.robot
Resource            ../../lib/ras/host_utils.robot
Resource            ../../lib/resource.robot
Resource            ../../lib/state_manager.robot
Resource            ../../lib/boot_utils.robot
Variables           ../../lib/ras/variables.py
Variables           ../../data/variables.py
Resource            ../../lib/dump_utils.robot

Library             DateTime
Library             OperatingSystem
Library             random
Library             Collections

*** Variables ***
${stack_mode}       normal

*** Keywords ***

Verify And Clear Gard Records On HOST
    [Documentation]  Verify and clear gard records on HOST.

    ${output}=  Gard Operations On OS  list
    Should Not Contain  ${output}  No GARD
    Gard Operations On OS  clear all

Verify Error Log Entry
    [Documentation]  Verify error log entry & signature description.
    [Arguments]  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # signature_desc  Error log signature description.
    # log_prefix      Log path prefix.

    # TODO: Need to move this keyword to common utility.

    Error Logs Should Exist

    Collect eSEL Log  ${log_prefix}
    ${error_log_file_path}=  Catenate  ${log_prefix}esel.txt
    ${rc}  ${output}=  Run and Return RC and Output
    ...  grep -i ${signature_desc} ${error_log_file_path}
    Should Be Equal  ${rc}  ${0}
    Should Not Be Empty  ${output}

Inject Recoverable Error With Threshold Limit
    [Documentation]  Inject and verify recoverable error on processor through
    ...              BMC/HOST.
    ...              Test sequence:
    ...              1. Inject recoverable error on a given target
    ...                 (e.g. Processor core, CAPP, MCA) through BMC/HOST.
    ...              2. Check if HOST is running.
    ...              3. Verify that no gard records exist.
    ...              4. Verify error log entry & signature description.
    [Arguments]      ${interface_type}  ${fir_address}  ${value}  ${threshold_limit}
    ...              ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # interface_type      Inject error through 'BMC' or 'HOST'.
    # fir_address         FIR (Fault isolation register) value (e.g. 2011400).
    # value               Value to inject (e.g. 2000000000000000).
    # threshold_limit     Threshold limit (e.g. 1, 5, 32).
    # signature_desc      Error log signature description.
    # log_prefix          Log path prefix.

    Run Keyword  Inject Error Through ${interface_type}
    ...  ${fir_address}  ${value}  ${threshold_limit}  ${master_proc_chip}

    Is Host Running
    ${output}=  Gard Operations On OS  list
    Should Contain  ${output}  No GARD
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}
    # TODO: Verify SOL console logs.


Inject Unrecoverable Error
    [Documentation]  Inject and verify unrecoverable error on processor through
    ...              BMC/HOST.
    ...              Test sequence:
    ...              1. Inject unrecoverable error on a given target
    ...                 (e.g. Processor core, CAPP, MCA) through BMC/HOST.
    ...              2. Check if HOST is rebooted.
    ...              3. Verify error log entry & signature description.
    ...              4. Verify & clear dump entry.
    ...              5. Verify & clear gard records.
    [Arguments]      ${interface_type}  ${fir_address}  ${value}  ${threshold_limit}
    ...              ${signature_desc}  ${log_prefix}  ${bmc_reboot}=${0}
    # Description of argument(s):
    # interface_type      Inject error through 'BMC' or 'HOST'.
    # fir_address         FIR (Fault isolation register) value (e.g. 2011400).
    # value               Value to inject (e.g. 2000000000000000).
    # threshold_limit     Threshold limit (e.g. 1, 5, 32).
    # signature_desc      Error log signature description.
    #                     (e.g. 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable')
    # log_prefix          Log path prefix.
    # bmc_reboot          Reboot the BMC if bmc_reboot is set.

    Run Keyword  Inject Error Through ${interface_type}
    ...  ${fir_address}  ${value}  ${threshold_limit}  ${master_proc_chip}

    # Do BMC reboot after error injection.
    Run Keyword If  ${bmc_reboot}  Run Keywords
    ...    Initiate BMC Reboot
    ...    Wait For BMC Ready
    ...    Initiate Host PowerOff
    ...    Initiate Host Boot
    ...  ELSE
    ...    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted

    Wait for OS
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}
    Read Properties  ${DUMP_ENTRY_URI}list
    Delete All BMC Dump
    Verify And Clear Gard Records On HOST

Fetch FIR Address Translation Value
    [Documentation]  Fetch FIR address translation value through HOST.
    [Arguments]  ${fir_address}  ${target_type}
    # Description of argument(s):
    # fir_address      FIR (Fault isolation register) value (e.g. '2011400').
    # target_type      Target type (e.g. 'EX', 'EQ', 'C').

    Login To OS Host
    Copy Address Translation Utils To HOST OS

    # Fetch processor chip IDs.
    ${proc_chip_id}=  Get ProcChipId From OS  Processor  ${master_proc_chip}
    # Example output:
    # 00000000

    ${core_ids}=  Get Core IDs From OS  ${proc_chip_id[-1]}
    # Example output:
    # ./probe_cpus.sh | grep 'CHIP ID: 0' | cut -c21-22
    # ['14', '15', '16', '17']

    # Ignoring master core ID (the first entry in the list).
    ${output}=  Get Slice From List  ${core_ids}  1
    # Fetch random non-master core ID from the sliced list.
    ${core_ids_sub_list}=  Evaluate  random.sample(${output}, 1)  random
    ${core_id}=  Get From List  ${core_ids_sub_list}  0
    ${translated_fir_addr}=  FIR Address Translation Through HOST
    ...  ${fir_address}  ${core_id}  ${target_type}

    [Return]  ${translated_fir_addr}

RAS Test SetUp
    [Documentation]  Validate input parameters and boot the host to OS.

    Should Not Be Empty
    ...  ${OS_HOST}  msg=You must provide DNS name/IP of the OS host.
    Should Not Be Empty
    ...  ${OS_USERNAME}  msg=You must provide OS host user name.
    Should Not Be Empty
    ...  ${OS_PASSWORD}  msg=You must provide OS host user password.

    Smart Power Off

    # Boot to OS.
    REST Power On  quiet=${1}
    # Adding delay after host bring up.
    Sleep  60s

RAS Suite Setup
    [Documentation]  Create RAS log directory to store all RAS test logs.

    ${RAS_LOG_DIR_PATH}=  Catenate  ${EXECDIR}/RAS_logs/
    Set Suite Variable  ${RAS_LOG_DIR_PATH}
    Set Suite Variable  ${master_proc_chip}  False

    Create Directory  ${RAS_LOG_DIR_PATH}
    OperatingSystem.Directory Should Exist  ${RAS_LOG_DIR_PATH}
    Empty Directory  ${RAS_LOG_DIR_PATH}

    Should Not Be Empty  ${ESEL_BIN_PATH}
    Set Environment Variable  PATH  %{PATH}:${ESEL_BIN_PATH}

    # Boot to OS.
    REST Power On  quiet=${1}

    # Check Opal-PRD service enabled on host.
    ${opal_prd_state}=  Is Opal-PRD Service Enabled
    Run Keyword If  '${opal_prd_state}' == 'disabled'
    ...  Enable Opal-PRD Service On HOST

RAS Suite Cleanup
    [Documentation]  Perform RAS suite cleanup and verify that host
    ...              boots after test suite run.

    # Boot to OS.
    REST Power On
    Delete Error Logs
    Gard Operations On OS  clear all


Inject Error At HOST Boot Path
    [Documentation]  Inject and verify recoverable error on processor through
    ...              BMC using pdbg tool at HOST Boot path.
    ...              Test sequence:
    ...              1. Inject error on a given target
    ...                 (e.g. Processor core, CAPP, MCA) through BMC using
    ...                 pdbg tool at HOST Boot path.
    ...              2. Check if HOST is rebooted and running.
    ...              3. Verify error log entry & signature description.
    ...              4. Verify & clear gard records.
    [Arguments]      ${fir_address}  ${value}  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # fir_address         FIR (Fault isolation register) value (e.g. 2011400).
    # value               Value to inject (e.g. 2000000000000000).
    # signature_desc      Error log signature description.
    # log_prefix          Log path prefix.

    Inject Error Through BMC At HOST Boot  ${fir_address}  ${value}

    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted
    Wait for OS
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}
    Verify And Clear Gard Records On HOST
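
# Example usage (a hedged sketch, kept as comments because a resource file
# cannot contain test cases). It shows how a test suite might wire up the
# setup, cleanup, and injection keywords defined in this file.
# Assumptions, none of which come from this file: the resource path
# 'ras_utils.robot', the test case name, the variable ${example_signature},
# and the log prefix suffix 'example_'. The FIR address, value, and threshold
# reuse the example values given in the argument descriptions of
# 'Inject Recoverable Error With Threshold Limit'.
#
# *** Settings ***
# Resource          ras_utils.robot
# Suite Setup       RAS Suite Setup
# Test Setup        RAS Test SetUp
# Suite Teardown    RAS Suite Cleanup
#
# *** Test Cases ***
# Example Recoverable Error Injection Through Host
#     Inject Recoverable Error With Threshold Limit  HOST  2011400
#     ...  2000000000000000  1  ${example_signature}
#     ...  ${RAS_LOG_DIR_PATH}example_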