# Source: /openbmc/openbmc-test-automation/openpower/ras/ras_utils.robot (revision 6fb70d98f2f1cb9273ba912deaa2cebe3c23ea86)
*** Settings ***
Documentation       Utility for RAS test scenarios through HOST & BMC.
Resource            ../../lib/utils.robot
Resource            ../../lib/ras/host_utils.robot
Resource            ../../lib/resource.robot
Resource            ../../lib/state_manager.robot
Resource            ../../lib/boot_utils.robot
Variables           ../../lib/ras/variables.py
Variables           ../../data/variables.py
Resource            ../../lib/dump_utils.robot

Library             DateTime
Library             OperatingSystem
Library             random
Library             Collections

*** Variables ***
${stack_mode}       normal

*** Keywords ***

Verify And Clear Gard Records On HOST
    [Documentation]  Verify that gard records exist on HOST, then clear them.

    ${output}=  Gard Operations On OS  list
    Should Not Contain  ${output}  No GARD
    Gard Operations On OS  clear all

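The gard check above can be sketched in Python as follows. This is a hypothetical illustration and not part of the suite. The names `gard_records_present` and `verify_and_clear` are made up for it, and the `clear` callback stands in for `Gard Operations On OS  clear all`. As in the keyword, any output containing `No GARD` is taken to mean that no records exist.

```python
from typing import Callable


def gard_records_present(gard_list_output: str) -> bool:
    """Return True if 'gard list' output reports at least one record.

    Mirrors the keyword's check: output containing the substring
    'No GARD' is treated as "no records".
    """
    return "No GARD" not in gard_list_output


def verify_and_clear(gard_list_output: str, clear: Callable[[], None]) -> None:
    # Fail like 'Should Not Contain' when no records were listed;
    # otherwise invoke the clear callback (stand-in for 'clear all').
    if not gard_records_present(gard_list_output):
        raise AssertionError("Expected gard records, but none were listed")
    clear()
```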
Verify Error Log Entry
    [Documentation]  Verify that an error log entry exists and that the eSEL
    ...              log contains the signature description.
    [Arguments]  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # signature_desc  Error log signature description.
    # log_prefix      Log path prefix.

    Error Logs Should Exist

    Collect eSEL Log  ${log_prefix}
    ${error_log_file_path}=  Catenate  ${log_prefix}esel.txt
    # Quote the signature and match it as a fixed string (-F): signatures
    # contain spaces, parentheses and brackets (e.g. 'MCFIR[0]').
    ${rc}  ${output}=  Run and Return RC and Output
    ...  grep -iF '${signature_desc}' ${error_log_file_path}
    Should Be Equal  ${rc}  ${0}
    Should Not Be Empty  ${output}

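Signature descriptions such as `'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable'` contain brackets. A regex grep would read these as a character class, which is why the lookup should match a literal string. The following Python sketch of a `grep -iF` equivalent is for illustration only. The function name `grep_fixed_ignore_case` is hypothetical, and the sketch assumes the eSEL file is plain text.

```python
from pathlib import Path
from typing import List


def grep_fixed_ignore_case(pattern: str, path: str) -> List[str]:
    """Return lines of `path` that contain `pattern`, ignoring case.

    Roughly equivalent to `grep -iF pattern path`: the pattern is matched
    literally, so 'MCFIR[0]' is not treated as a regex character class.
    """
    needle = pattern.casefold()
    lines = Path(path).read_text(errors="replace").splitlines()
    return [line for line in lines if needle in line.casefold()]
```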
Inject Recoverable Error With Threshold Limit
    [Documentation]  Inject and verify recoverable error on processor through
    ...              BMC/HOST.
    ...              Test sequence:
    ...              1. Inject recoverable error on a given target
    ...                 (e.g. processor core, CAPP, MCA) through BMC/HOST.
    ...              2. Check if HOST is running.
    ...              3. Verify that no gard records were created.
    ...              4. Verify error log entry & signature description.
    [Arguments]      ${interface_type}  ${fir_address}  ${value}  ${threshold_limit}
    ...              ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # interface_type      Inject error through 'BMC' or 'HOST'.
    # fir_address         FIR (Fault Isolation Register) address (e.g. 2011400).
    # value               Value written to the FIR (e.g. 2000000000000000).
    # threshold_limit     Threshold limit (e.g. 1, 5, 32).
    # signature_desc      Error log signature description.
    # log_prefix          Log path prefix.

    Run Keyword  Inject Error Through ${interface_type}
    ...  ${fir_address}  ${value}  ${threshold_limit}  ${master_proc_chip}

    Is Host Running
    ${output}=  Gard Operations On OS  list
    Should Contain  ${output}  No GARD
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}

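`Run Keyword  Inject Error Through ${interface_type}` builds the keyword name at run time from the interface argument. The following Python sketch shows that dispatch, using the hypothetical helper `injection_keyword`. It is stricter than Robot Framework, whose keyword names are case-insensitive: the sketch only accepts the exact values `'BMC'` and `'HOST'` named in the argument description.

```python
VALID_INTERFACES = ("BMC", "HOST")


def injection_keyword(interface_type: str) -> str:
    """Return the keyword name that 'Run Keyword' would resolve."""
    if interface_type not in VALID_INTERFACES:
        raise ValueError(
            f"interface_type must be one of {VALID_INTERFACES}, got {interface_type!r}"
        )
    return f"Inject Error Through {interface_type}"
```

Failing fast on an unknown interface gives a clearer error than a "No keyword with name ... found" failure later in the test.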
Inject Unrecoverable Error
    [Documentation]  Inject and verify unrecoverable error on processor through
    ...              BMC/HOST.
    ...              Test sequence:
    ...              1. Inject unrecoverable error on a given target
    ...                 (e.g. processor core, CAPP, MCA) through BMC/HOST.
    ...              2. Check if HOST is rebooted.
    ...              3. Verify error log entry & signature description.
    ...              4. Verify & clear dump entry.
    ...              5. Verify & clear gard records.
    [Arguments]      ${interface_type}  ${fir_address}  ${value}  ${threshold_limit}
    ...              ${signature_desc}  ${log_prefix}  ${bmc_reboot}=${0}
    # Description of argument(s):
    # interface_type      Inject error through 'BMC' or 'HOST'.
    # fir_address         FIR (Fault Isolation Register) address (e.g. 2011400).
    # value               Value written to the FIR (e.g. 2000000000000000).
    # threshold_limit     Threshold limit (e.g. 1, 5, 32).
    # signature_desc      Error log signature description.
    #                     (e.g. 'mcs(n0p0c0) (MCFIR[0]) mc internal recoverable')
    # log_prefix          Log path prefix.
    # bmc_reboot          If set, reboot the BMC and power cycle the host
    #                     after error injection.

    Run Keyword  Inject Error Through ${interface_type}
    ...  ${fir_address}  ${value}  ${threshold_limit}  ${master_proc_chip}

    # Reboot the BMC after error injection if requested; otherwise wait for
    # the host to reboot on its own.
    Run Keyword If  ${bmc_reboot}  Run Keywords
    ...    Initiate BMC Reboot
    ...    Wait For BMC Ready
    ...    Initiate Host PowerOff
    ...    Initiate Host Boot
    ...  ELSE
    ...    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted

    Wait for OS
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}

    # Verify that the dump manager service is running.
    ${dump_service_status}  ${stderr}  ${rc}=  BMC Execute Command
    ...  systemctl status xyz.openbmc_project.Dump.Manager.service
    Should Contain  ${dump_service_status}  Active: active (running)

    # Use /xyz/openbmc_project/dump/entry/ if the dump URI is not found.
    ${resp}=  OpenBMC Get Request  ${DUMP_URI}
    Run Keyword If  '${resp.status_code}' == '${HTTP_NOT_FOUND}'
    ...  Set Test Variable  ${DUMP_ENTRY_URI}  /xyz/openbmc_project/dump/entry/

    Read Properties  ${DUMP_ENTRY_URI}list
    Delete All BMC Dump
    Verify And Clear Gard Records On HOST

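The host-reboot wait uses `Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted`. The same check-until-success pattern can be sketched in Python to show the timeout and retry-interval interplay. This is only an approximation, not Robot Framework's implementation. The name `wait_until_succeeds` and the injectable `clock`/`sleep` parameters are assumptions, added so that the behaviour can be exercised without real delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_succeeds(
    func: Callable[[], T],
    timeout: float,
    retry_interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it returns without raising.

    Re-raises the last error if another retry would start after
    `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        try:
            return func()
        except Exception:
            # Give up if waiting one more interval would pass the deadline.
            if clock() + retry_interval > deadline:
                raise
            sleep(retry_interval)
```

With `timeout=500` and `retry_interval=20`, the check runs at most 26 times (at 0, 20, ..., 500 seconds) when each attempt itself takes negligible time.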
Fetch FIR Address Translation Value
    [Documentation]  Fetch FIR address translation value through HOST.
    [Arguments]  ${fir_address}  ${target_type}
    # Description of argument(s):
    # fir_address          FIR (Fault Isolation Register) address (e.g. '2011400').
    # target_type          Target type (e.g. 'EX', 'EQ', 'C').

    Login To OS Host
    Copy Address Translation Utils To HOST OS

    # Fetch processor chip IDs.
    ${proc_chip_id}=  Get ProcChipId From OS  Processor  ${master_proc_chip}
    # Example output:
    # 00000000

    ${core_ids}=  Get Core IDs From OS  ${proc_chip_id[-1]}
    # Example command and output:
    # ./probe_cpus.sh | grep 'CHIP ID: 0' | cut -c21-22
    # ['14', '15', '16', '17']

    # Ignore the master core ID (the first entry in the list).
    ${non_master_core_ids}=  Get Slice From List  ${core_ids}  1
    # Fetch a random non-master core ID.
    ${core_ids_sub_list}=  Evaluate  random.sample(${non_master_core_ids}, 1)  random
    ${core_id}=  Get From List  ${core_ids_sub_list}  0
    ${translated_fir_addr}=  FIR Address Translation Through HOST
    ...  ${fir_address}  ${core_id}  ${target_type}

    RETURN  ${translated_fir_addr}

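The core selection can be sketched in Python: skip the master core, then pick one of the remaining IDs at random. Like the keyword's `Get Slice From List  ${core_ids}  1`, the sketch assumes that the master core is the first entry. `pick_non_master_core` is a hypothetical name, and the optional `rng` argument is added only to make the choice reproducible. `random.choice` is equivalent to taking the single element of `random.sample(..., 1)`.

```python
import random
from typing import Optional, Sequence


def pick_non_master_core(
    core_ids: Sequence[str], rng: Optional[random.Random] = None
) -> str:
    """Pick a random core ID, skipping the first (master) entry."""
    non_master = list(core_ids[1:])
    if not non_master:
        raise ValueError("No non-master core IDs available")
    rng = rng or random.Random()
    return rng.choice(non_master)
```

For the example output `['14', '15', '16', '17']`, the result is always one of `'15'`, `'16'` or `'17'`.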
RAS Test SetUp
    [Documentation]  Validate the OS host input parameters, power off the
    ...              system, and boot it to the OS.

    Should Not Be Empty
    ...  ${OS_HOST}  msg=You must provide DNS name/IP of the OS host.
    Should Not Be Empty
    ...  ${OS_USERNAME}  msg=You must provide OS host user name.
    Should Not Be Empty
    ...  ${OS_PASSWORD}  msg=You must provide OS host user password.

    Smart Power Off

    # Boot to OS.
    REST Power On  quiet=${1}
    # Give the host time to settle after boot.
    Sleep  60s

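The parameter checks at the start of the setup can be expressed as a table of required names and failure messages. The following Python sketch does this, using the same messages as the `Should Not Be Empty` calls. The function `require_os_host_params` and the plain `dict` of parameters are assumptions made for illustration. The sketch does not cover the power-off and boot steps.

```python
from typing import Mapping

# Required parameter name -> message used when it is missing or empty.
REQUIRED_OS_PARAMS = {
    "OS_HOST": "You must provide DNS name/IP of the OS host.",
    "OS_USERNAME": "You must provide OS host user name.",
    "OS_PASSWORD": "You must provide OS host user password.",
}


def require_os_host_params(params: Mapping[str, str]) -> None:
    """Raise AssertionError for the first missing or empty parameter."""
    for name, message in REQUIRED_OS_PARAMS.items():
        if not params.get(name):
            raise AssertionError(message)
```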
RAS Suite Setup
    [Documentation]  Create the RAS log directory, append the eSEL tool path
    ...              to PATH, boot the host to OS, and enable the Opal-PRD
    ...              service on the host if it is disabled.

    ${RAS_LOG_DIR_PATH}=  Catenate  ${EXECDIR}/RAS_logs/
    Set Suite Variable  ${RAS_LOG_DIR_PATH}
    Set Suite Variable  ${master_proc_chip}  False

    Create Directory  ${RAS_LOG_DIR_PATH}
    OperatingSystem.Directory Should Exist  ${RAS_LOG_DIR_PATH}
    Empty Directory  ${RAS_LOG_DIR_PATH}

    Should Not Be Empty  ${ESEL_BIN_PATH}
    Set Environment Variable  PATH  %{PATH}:${ESEL_BIN_PATH}

    # Boot to OS.
    REST Power On  quiet=${1}

    # Enable the Opal-PRD service on the host if it is disabled.
    ${opal_prd_state}=  Is Opal-PRD Service Enabled
    Run Keyword If  '${opal_prd_state}' == 'disabled'
    ...  Enable Opal-PRD Service On HOST

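The file-system part of the suite setup can be sketched in Python. The sketch creates `RAS_logs/` under the execution directory, empties it (files and subdirectories, as the OperatingSystem library's `Empty Directory` does), and appends the eSEL tool directory to `PATH`. The helper names and the explicit `env` dictionary are hypothetical. They are used so that the sketch does not modify the real process environment.

```python
import os
import shutil
from pathlib import Path
from typing import MutableMapping


def prepare_ras_log_dir(exec_dir: str) -> Path:
    """Create <exec_dir>/RAS_logs/ if needed and delete everything inside it."""
    log_dir = Path(exec_dir) / "RAS_logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    for entry in log_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return log_dir


def append_to_path(env: MutableMapping[str, str], directory: str) -> None:
    """Append `directory` to env['PATH'], like %{PATH}:${ESEL_BIN_PATH}."""
    if not directory:
        raise ValueError("ESEL_BIN_PATH must not be empty")
    env["PATH"] = env.get("PATH", "") + os.pathsep + directory
```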
RAS Suite Cleanup
    [Documentation]  Perform RAS suite cleanup and verify that the host
    ...              boots after the test suite run.

    # Boot to OS.
    REST Power On
    Delete Error Logs
    Gard Operations On OS  clear all

Inject Error At HOST Boot Path
    [Documentation]  Inject and verify recoverable error on processor through
    ...              BMC using the pdbg tool in the HOST boot path.
    ...              Test sequence:
    ...              1. Inject error on a given target
    ...                 (e.g. processor core, CAPP, MCA) through BMC using
    ...                 the pdbg tool in the HOST boot path.
    ...              2. Check if HOST is rebooted and running.
    ...              3. Verify error log entry & signature description.
    ...              4. Verify & clear gard records.
    [Arguments]      ${fir_address}  ${value}  ${signature_desc}  ${log_prefix}
    # Description of argument(s):
    # fir_address         FIR (Fault Isolation Register) address (e.g. 2011400).
    # value               Value written to the FIR (e.g. 2000000000000000).
    # signature_desc      Error log signature description.
    # log_prefix          Log path prefix.

    Inject Error Through BMC At HOST Boot  ${fir_address}  ${value}

    Wait Until Keyword Succeeds  500 sec  20 sec  Is Host Rebooted
    Wait for OS
    Verify Error Log Entry  ${signature_desc}  ${log_prefix}
    Verify And Clear Gard Records On HOST

228