*** Settings *** Documentation Keywords for system test. Library ../lib/gen_robot_keyword.py Library ../lib/gen_print.py Library ../lib/gen_robot_print.py Resource ../lib/boot_utils.robot Resource ../extended/obmc_boot_test_resource.robot Resource ../lib/utils.robot Resource ../lib/state_manager.robot Resource ../lib/rest_client.robot Resource ../lib/htx_resource.robot Library OperatingSystem Library DateTime *** Variables *** ${htx_log_dir_path} ${EXECDIR}${/}logs${/} # Error strings to check from dmesg. ${ERROR_REGEX} error|GPU|NVRM|nvidia # GPU specific error message from dmesg. ${ERROR_DBE_MSG} (DBE) has been detected on GPU # Inventory - List of I/O devices to collect for Inventory @{I/O} communication disk display generic input multimedia ... network printer tape # Inventory Paths of the JSON and YAML files ${json_tmp_file_path} ${EXECDIR}/inventory_temp_file.json ${yaml_file_path} ${EXECDIR}/inventory_temp_file.yaml *** Keywords *** Login To OS [Documentation] Login to OS Host. [Arguments] ${os_host}=${OS_HOST} ${os_username}=${OS_USERNAME} ... ${os_password}=${OS_PASSWORD} ... ${alias_name}=os_connection # Description of argument(s): # os_host IP address of the OS Host. # os_username OS Host Login user name. # os_password OS Host Login passwrd. # alias_name Default OS SSH session connection alias name. Ping Host ${os_host} SSHLibrary.Open Connection ${os_host} alias=${alias_name} SSHLibrary.Login ${os_username} ${os_password} Tool Exist [Documentation] Check whether given tool is installed on OS. [Arguments] ${tool_name} # Description of argument(s): # tool_name Tool name whose existence is to be checked. ${output} ${stderr} ${rc}= OS Execute Command which ${tool_name} Should Contain ${output} ${tool_name} ... msg=Please install ${tool_name} tool. Boot To OS [Documentation] Boot host OS. Run Key OBMC Boot Test \ REST Power On Power Off Host [Documentation] Power off host. Run Key OBMC Boot Test \ REST Power Off File Exist On OS [Documentation] Check if the given file path exist on OS. [Arguments] ${file_path} # Description of argument(s): # file_path Absolute file path. Login To OS ${out} ${stderr} ${rc}= OS Execute Command ls ${file_path} Log To Console \n File Exist: ${out} Is HTX Running [Documentation] Return "True" if the HTX is running, "False" ... otherwise. # Example usage: # ${status}= Is HTX Running # Run Keyword If '${status}' == 'True' Shutdown HTX Exerciser ${status} ${stderr} ${rc}= OS Execute Command ... htxcmdline -getstats ignore_err=1 # Get HTX state # (idle, currently running, selected_mdt but not running). ${running}= Set Variable If ... "Currently running" in """${status}""" ${True} ${False} [Return] ${running} Write Log Data To File [Documentation] Write log data to the logs directory. [Arguments] ${data}= ${log_file_path}= # Description of argument(s): # data String buffer. # log_file_path The log file path. Create File ${log_file_path} ${data} Collect HTX Log Files [Documentation] Collect status and error log files. # Collects the following files: # HTX error log file /tmp/htxerr # HTX status log file /tmp/htxstats # Create logs directory and get current datetime. Create Directory ${htx_log_dir_path} ${cur_datetime}= Get Current Date result_format=%Y%m%d%H%M%S%f File Exist On OS /tmp/htxerr ${htx_err} ${std_err} ${rc}= OS Execute Command cat /tmp/htxerr Write Log Data To File ... ${htx_err} ${htx_log_dir_path}/${OS_HOST}${cur_datetime}.htxerr File Exist On OS /tmp/htxstats ${htx_stats} ${std_err} ${rc}= OS Execute Command ... cat /tmp/htxstats Write Log Data To File ... ${htx_stats} ${htx_log_dir_path}/${OS_HOST}_${cur_datetime}.htxstats REST Upload File To BMC [Documentation] Upload a file via REST to BMC. # Generate 32 MB file size Run dd if=/dev/zero of=dummyfile bs=1 count=0 seek=32MB OperatingSystem.File Should Exist dummyfile # Get the content of the file and upload to BMC ${image_data}= OperatingSystem.Get Binary File dummyfile # Get REST session to BMC Initialize OpenBMC # Create the REST payload headers and data ${data}= Create Dictionary data ${image_data} ${headers}= Create Dictionary Content-Type=application/octet-stream ... Accept=application/octet-stream Set To Dictionary ${data} headers ${headers} ${resp}= Post Request openbmc /upload/image &{data} Should Be Equal As Strings ${resp.status_code} ${HTTP_BAD_REQUEST} ... msg=Openbmc /upload/image failed. # Take SSH connection to BMC and switch to BMC connection to perform # the task. &{bmc_connection_args}= Create Dictionary alias=bmc_connection Open Connection And Log In &{bmc_connection_args} # Currently OS SSH session is active, switch to BMC connection. Switch Connection bmc_connection # Switch back to OS SSH connection. Switch Connection os_connection Get CPU Min Frequency Limit [Documentation] Get CPU minimum assignable frequency. # lscpu | grep min returns # CPU min MHz: 1983.0000 ${cmd}= Catenate lscpu | grep min | tr -dc '0-9.\n' ${cpu_freq} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${cpu_freq} Get CPU Min Frequency [Documentation] Get CPU assigned minimum frequency. # ppc64_cpu --frequency -t 10 returns # min: 3.295 GHz (cpu 143) # max: 3.295 GHz (cpu 0) # avg: 3.295 GHz ${cmd}= Catenate ppc64_cpu --frequency -t 10 | grep min ... | cut -f 2 | cut -d ' ' -f 1 | tr -dc '0-9\n' ${cpu_freq} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${cpu_freq} Get CPU Max Frequency Limit [Documentation] Get CPU maximum assignable frequency. # lscpu | grep max returns # CPU max MHz: 3300.0000 ${cmd}= Catenate lscpu | grep max | tr -dc '0-9.\n' ${cpu_freq} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${cpu_freq} Get CPU Max Frequency [Documentation] Get CPU assigned maximum frequency. # ppc64_cpu --frequency -t 10 returns # min: 3.295 GHz (cpu 143) # max: 3.295 GHz (cpu 0) # avg: 3.295 GHz ${cmd}= Catenate ppc64_cpu --frequency -t 10 | grep max ... | cut -f 2 | cut -d ' ' -f 1 | tr -dc '0-9\n' ${cpu_freq} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${cpu_freq} Get CPU Max Temperature [Documentation] Get the highest CPU Temperature. ${temperature_objs}= Read Properties ... ${SENSORS_URI}temperature/enumerate # Filter the dictionary to get just the CPU temperature info. ${cmd}= Catenate {k:v for k,v in $temperature_objs.items() ... if re.match('${SENSORS_URI}temperature/p.*core.*temp', k)} ${cpu_temperatuture_objs} Evaluate ${cmd} modules=re # Create a list of the CPU temperature values (current). ${cpu_temperatures}= Evaluate ... [ x['Value'] for x in $cpu_temperatuture_objs.values() ] ${cpu_max_temp} Evaluate int(max(map(int, $cpu_temperatures))/1000) [Return] ${cpu_max_temp} Get CPU Min Temperature [Documentation] Get the CPU Temperature. ${temperature_objs}= Read Properties ... ${SENSORS_URI}temperature/enumerate # Filter the dictionary to get just the CPU temperature info. ${cmd}= Catenate {k:v for k,v in $temperature_objs.items() ... if re.match('${SENSORS_URI}temperature/p.*core.*temp', k)} ${cpu_temperatuture_objs}= Evaluate ${cmd} modules=re # Create a list of the CPU temperature values (current). ${cpu_temperatures}= Evaluate ... [ x['Value'] for x in $cpu_temperatuture_objs.values() ] ${cpu_min_temp} Evaluate int(min(map(int, $cpu_temperatures))/1000) [Return] ${cpu_min_temp} Check For Errors On OS Dmesg Log [Documentation] Check if dmesg has nvidia errors logged. ${dmesg_log} ${stderr} ${rc}= OS Execute Command ... dmesg | egrep '${ERROR_REGEX}' # To enable multiple string check. Should Not Contain ${dmesg_log} ${ERROR_DBE_MSG} ... msg=OS dmesg shows ${ERROR_DBE_MSG}. Collect NVIDIA Log File [Documentation] Collect ndivia-smi command output. [Arguments] ${suffix} # Description of argument(s): # suffix String name to append. # Collects the output of ndivia-smi cmd output. # +--------------------------------------------------------------------+ # | NVIDIA-SMI 361.89 Driver Version: 361.89 | # |-------------------------------+----------------------+-------------+ # | GPU Name Persistence-M| Bus-Id Disp.A | GPU ECC | # | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | Utiliz err | # |===============================+======================+=============| # | 0 Tesla P100-SXM2... On | 0002:01:00.0 Off | 0 | # | N/A 25C P0 35W / 300W | 931MiB / 16280MiB | 0% Default | # +-------------------------------+----------------------+-------------+ # | 1 Tesla P100-SXM2... On | 0003:01:00.0 Off | 0 | # | N/A 26C P0 40W / 300W | 1477MiB / 16280MiB | 0% Default | # +-------------------------------+----------------------+-------------+ # | 2 Tesla P100-SXM2... On | 0006:01:00.0 Off | 0 | # | N/A 25C P0 35W / 300W | 931MiB / 16280MiB | 0% Default | # +-------------------------------+----------------------+-------------+ # | 3 Tesla P100-SXM2... On | 0007:01:00.0 Off | 0 | # | N/A 44C P0 290W / 300W | 965MiB / 16280MiB | 0% Default | # +-------------------------------+----------------------+-------------+ # +--------------------------------------------------------------------+ # | Processes: GPU Memory | # | GPU PID Type Process name Usage | # |====================================================================| # | 0 28459 C hxenvidia 929MiB | # | 1 28460 C hxenvidia 1475MiB | # | 2 28461 C hxenvidia 929MiB | # | 3 28462 C hxenvidia 963MiB | # +--------------------------------------------------------------------+ # Create logs directory and get current datetime. Create Directory ${htx_log_dir_path} ${cur_datetime}= Get Current Date result_format=%Y%m%d%H%M%S%f ${nvidia_out} ${stderr} ${rc}= OS Execute Command nvidia-smi Write Log Data To File ... ${nvidia_out} ... ${htx_log_dir_path}/${OS_HOST}_${cur_datetime}.nvidia_${suffix} Get GPU Power Limit [Documentation] Get NVIDIA GPU maximum permitted power draw. # nvidia-smi --query-gpu=power.limit --format=csv returns # power.limit [W] # 300.00 W # 300.00 W # 300.00 W # 300.00 W ${cmd}= Catenate nvidia-smi --query-gpu=power.limit ... --format=csv | cut -f 1 -d ' ' | sort -n -u | tail -n 1 ${nvidia_out} ${stderr} ${rc}= OS Execute Command ${cmd} # Allow for sensor overshoot. That is, max power reported for # a GPU could be a few watts above the limit. ${power_max}= Evaluate ${nvidia_out}+${7.00} [Return] ${power_max} Get GPU Max Power [Documentation] Get the maximum GPU power dissipation. # nvidia-smi --query-gpu=power.draw --format=csv returns # power.draw [W] # 34.12 W # 34.40 W # 36.55 W # 36.05 W ${cmd}= Catenate nvidia-smi --query-gpu=power.draw ... --format=csv | cut -f 1 -d ' ' | sort -n -u | tail -n 1 ${nvidia_out} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${nvidia_out} Get GPU Min Power [Documentation] Return the minimum GPU power value as record by ... nvidia-smi. ${cmd}= Catenate nvidia-smi --query-gpu=power.draw --format=csv | ... grep -v 'power.draw' | cut -f 1 -d ' ' | sort -n -u | head -1 ${gpu_min_power} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${gpu_min_power} Get GPU Temperature Limit [Documentation] Get NVIDIA GPU maximum permitted temperature. # nvidia-smi -q -d TEMPERATURE | grep "GPU Max" returns # GPU Max Operating Temp : 83 C # GPU Max Operating Temp : 83 C # GPU Max Operating Temp : 83 C # GPU Max Operating Temp : 83 C ${cmd}= Catenate nvidia-smi -q -d TEMPERATURE | grep "GPU Max" ... | cut -f 2 -d ":" | tr -dc '0-9\n' | sort -n -u | tail -n 1 ${nvidia_out} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${nvidia_out} Get GPU Min Temperature [Documentation] Get the minimum GPU temperature. ${cmd}= Catenate nvidia-smi --query-gpu=temperature.gpu ... --format=csv | grep -v 'temperature.gpu' | sort -n -u | head -1 ${nvidia_out} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${nvidia_out} Get GPU Max Temperature [Documentation] Get the maximum GPU temperature. # nvidia-smi --query-gpu=temperature.gpu --format=csv returns # 38 # 41 # 38 # 40 ${cmd}= Catenate nvidia-smi --query-gpu=temperature.gpu ... --format=csv | sort -n -u | tail -n 1 ${nvidia_out} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${nvidia_out} Get GPU Temperature Via REST [Documentation] Return the temperature in degrees C of the warmest GPU ... as reportd by REST. # NOTE: This endpoint path is not defined until system has been powered-on. ${temperature_objs}= Read Properties ${SENSORS_URI}temperature/enumerate ... timeout=30 quiet=1 ${core_temperatures_list}= Catenate {k:v for k,v in $temperature_objs.items() ... if re.match('${SENSORS_URI}temperature/.*_core_temp', k)} ${gpu_temperature_objs_dict}= Evaluate ${core_temperatures_list} modules=re # Create a list containing all of the GPU temperatures. ${gpu_temperatures}= Evaluate ... [ x['Value'] for x in $gpu_temperature_objs_dict.values() ] # Find the max temperature value and divide by 1000 to get just the integer # portion. ${max_gpu_temperature}= Evaluate ... int(max(map(int, $gpu_temperatures))/1000) [Return] ${max_gpu_temperature} Get GPU Clock Limit [Documentation] Get NVIDIA GPU maximum permitted graphics clock. # nvidia-smi --query-gpu=clocks.max.gr --format=csv returns # 1530 MHz # 1530 MHz # 1530 MHz # 1530 MHz ${cmd}= Catenate nvidia-smi --query-gpu=clocks.max.gr ... --format=csv | cut -f 1 -d ' ' | sort -n -u | tail -n 1 ${nvidia_out} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${nvidia_out} Get GPU Clock [Documentation] Get the highest assigned value of the GPU graphics clock. # nvidia-smi --query-gpu=clocks.gr --format=csv returns # 1230 MHz # 1230 MHz # 135 MHz # 150 MHz ${cmd}= Catenate nvidia-smi --query-gpu=clocks.gr ... --format=csv | cut -f 1 -d ' ' | sort -n -u | tail -n 1 ${nvidia_out} ${stderr} ${rc}= OS Execute Command ${cmd} [Return] ${nvidia_out} Count GPUs From BMC [Documentation] Determine number of GPUs from the BMC. Hostboot ... needs to have been run previously because the BMC gets GPU data ... from Hostboot. # Example of gv* endpoint data: # "/xyz/openbmc_project/inventory/system/chassis/motherboard/gv100card0": { # "Functional": 1, # "Present": 1, # "PrettyName": "" # }, ${num_bmc_gpus}= Set Variable ${0} ${gpu_list}= Get Endpoint Paths ... ${HOST_INVENTORY_URI}system/chassis/motherboard gv* FOR ${gpu_uri} IN @{gpu_list} ${present}= Read Attribute ${gpu_uri} Present ${state}= Read Attribute ${gpu_uri} Functional Rpvars gpu_uri present state ${num_bmc_gpus}= Run Keyword If ${present} and ${state} ... Evaluate ${num_bmc_gpus}+${1} END [Return] ${num_bmc_gpus} Create Default MDT Profile [Documentation] Create default mdt.bu profile and run. Print Timen Create HTX mdt profile. ${profile} ${stderr} ${rc}= OS Execute Command ... htxcmdline -createmdt Printn ${profile} Should Contain ${profile} mdts are created successfully ... msg=Create MDT profile failed. response=${profile} Run MDT Profile [Documentation] Load user pre-defined MDT profile. [Arguments] ${HTX_MDT_PROFILE}=${HTX_MDT_PROFILE} # Description of argument(s): # HTX_MDT_PROFILE MDT profile to be executed (e.g. "mdt.bu"). Print Timen Start HTX mdt profile execution. ${htx_run} ${stderr} ${rc}= OS Execute Command ... htxcmdline -run -mdt ${HTX_MDT_PROFILE} Printn ${htx_run} Should Contain ${htx_run} Activated ... msg=htxcmdline run mdt did not return "Activated" status. Check HTX Run Status [Documentation] Get HTX exerciser status and check for error. [Arguments] ${sleep_time}=${0} # Description of argument(s): # sleep_time The amount of time to sleep after checking status, # for example "3s" or "2m". Print Timen Check HTX mdt Status and error. ${htx_status} ${stderr} ${rc}= OS Execute Command ... htxcmdline -status -mdt ${HTX_MDT_PROFILE} Printn ${htx_status} ${htx_errlog} ${stderr} ${rc}= OS Execute Command ... htxcmdline -geterrlog Printn ${htx_errlog} Should Contain ${htx_errlog} file is empty ... msg=HTX geterrorlog was not empty. Return From Keyword If "${sleep_time}" == "${0}" Run Key U Sleep \ ${sleep_time} Shutdown HTX Exerciser [Documentation] Shut down HTX exerciser run. Print Timen Shutdown HTX Run ${shutdown} ${stderr} ${rc}= OS Execute Command ... htxcmdline -shutdown -mdt ${HTX_MDT_PROFILE} Printn ${shutdown} ${down1}= Evaluate 'shutdown successfully' in $shutdown Return From Keyword If '${down1}' == 'True' ${down2}= Evaluate 'No MDT is currently running' in $shutdown Return From Keyword If '${down2}' == 'True' Fail msg=Shutdown returned unexpected message. Create JSON Inventory File [Documentation] Create a JSON inventory file, and make a YAML copy. [Arguments] ${json_file_path} # Description of argument: # json_file_path Where the inventory file is wrtten to. Login To OS Compile Inventory JSON Run json2yaml ${json_tmp_file_path} ${yaml_file_path} # Format to JSON pretty print to file. Run python -m json.tool ${json_tmp_file_path} > ${json_file_path} OperatingSystem.File Should Exist ${json_file_path} ... msg=File ${json_file_path} does not exist. Compile Inventory JSON [Documentation] Compile the Inventory into a JSON file. Create File ${json_tmp_file_path} Write New JSON List ${json_tmp_file_path} Inventory Retrieve HW Info And Write processor ${json_tmp_file_path} Retrieve HW Info And Write memory ${json_tmp_file_path} Retrieve HW Info And Write List ${I/O} ${json_tmp_file_path} I/O last Close New JSON List ${json_tmp_file_path} Write New JSON List [Documentation] Start a new JSON list element in file. [Arguments] ${json_tmp_file_path} ${json_field_name} # Description of argument(s): # json_tmp_file_path Name of file to write to. # json_field_name Name to give json list element. Append to File ${json_tmp_file_path} { "${json_field_name}" : [ Close New JSON List [Documentation] Close JSON list element in file. [Arguments] ${json_tmp_file_path} # Description of argument(s): # json_tmp_file_path Path of file to write to. Append to File ${json_tmp_file_path} ]} Retrieve HW Info And Write [Documentation] Retrieve and write info, add a comma if not last item. [Arguments] ${class} ${json_tmp_file_path} ${last}=false # Description of argument(s): # class Device class to retrieve with lshw. # json_tmp_file_path Path of file to write to. # last Is this the last element in the parent JSON? Write New JSON List ${json_tmp_file_path} ${class} ${output}= Retrieve Hardware Info ${class} ${output}= Clean Up String ${output} Run Keyword if ${output.__class__ is not type(None)} ... Append To File ${json_tmp_file_path} ${output} Close New JSON List ${json_tmp_file_path} Run Keyword if '${last}' == 'false' ... Append to File ${json_tmp_file_path} , Retrieve HW Info And Write List [Documentation] Does a Retrieve/Write with a list of classes and ... encapsulates them into one large JSON element. [Arguments] ${list} ${json_tmp_file_path} ${json_field_name} ... ${last}=false # Description of argument(s): # list The list of devices classes to retrieve with lshw. # json_tmp_file_path Path of file to write to. # json_field_name Name of the JSON element to encapsulate this list. # last Is this the last element in the parent JSON? Write New JSON List ${json_tmp_file_path} ${json_field_name} FOR ${class} IN @{list} ${tail} Get From List ${list} -1 Run Keyword if '${tail}' == '${class}' ... Retrieve HW Info And Write ${class} ${json_tmp_file_path} true ... ELSE Retrieve HW Info And Write ${class} ${json_tmp_file_path} END Close New JSON List ${json_tmp_file_path} Run Keyword if '${last}' == 'false' ... Append to File ${json_tmp_file_path} , Retrieve Hardware Info [Documentation] Retrieves the lshw output of the device class as JSON. [Arguments] ${class} # Description of argument(s): # class Device class to retrieve with lshw. ${output} ${stderr} ${rc}= OS Execute Command lshw -c ${class} -json ${output}= Verify JSON string ${output} [Return] ${output} Verify JSON String [Documentation] Ensure the JSON string content is separated by commas. [Arguments] ${unver_string} # Description of argument(s): # unver_string JSON String we will be checking for lshw comma errors. ${unver_string}= Convert to String ${unver_string} ${ver_string}= Replace String Using Regexp ${unver_string} }\\s*{ },{ [Return] ${ver_string} Clean Up String [Documentation] Remove extra whitespace and trailing commas. [Arguments] ${dirty_string} # Description of argument(s): # dirty_string String that will be space stripped and have comma removed. ${clean_string}= Strip String ${dirty_string} ${last_char}= Get Substring ${clean_string} -1 ${trimmed_string}= Get Substring ${clean_string} 0 -1 ${clean_string}= Set Variable If '${last_char}' == ',' ... ${trimmed_string} ${clean_string} [Return] ${clean_string} Get OS Network Interface Names [Documentation] Return a list of interface names on the OS. ${stdout} ${stderr} ${rc}= OS Execute Command ls /sys/class/net @{interface_names}= Split String ${stdout} [Return] @{interface_names} Run Soft Bootme [Documentation] Run a soft bootme for a period of an hour. [Arguments] ${bootme_period}=3 # Description of argument(s): # bootme_time Bootme period to be rebooting the system. ${output} ${stderr} ${rc}= OS Execute Command ... htxcmdline -bootme on mode:soft period:${bootme_period} Should Contain ${output} bootme on is completed successfully Shutdown Bootme [Documentation] Stop the bootme process. ${output} ${stderr} ${rc}= OS Execute Command htxcmdline -bootme off Should Contain ${output} bootme off is completed successfully