# RAS Data File definition

When hardware reports an error, the analyzer will call the isolator which will
return a list of active attentions in the hardware, referred to as `signatures`.
The analyzer will then filter and sort the list to find the root cause
signature. The RAS Data files are used to define, in a data driven fashion, the
appropriate RAS actions that should be taken for the root cause signature.

The RAS Data will be defined in the JSON data format. Each file will contain a
single JSON object (with nested values) which will define the RAS actions for a
single chip model and EC level.

## 1) `model_ec` keyword (required)

The value of this keyword is a `string` representing a 32-bit hexadecimal number
in the format `[0-9A-Fa-f]{8}`. This value is used to determine the chip model
and EC level for which this data is defined.

## 2) `version` keyword (required)

A new version number should be used for each new RAS data file format so that
user applications will know how to properly parse the files. The value of this
keyword is a positive integer. Version `1` has been deprecated. The current
version is `2`.

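As a rough sketch of how the `model_ec` and `version` keywords sit at the top
level of a file, a minimal skeleton might look like the following. The
`model_ec` value is a made-up placeholder that merely matches the required
format; it is not taken from a real chip. The other required keywords are left
empty here only to keep the sketch short.

```json
{
    "model_ec" : "0123abcd",
    "version" : 2,
    "actions" : {},
    "signatures" : {}
}
```
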
## 3) `units` keyword

The value of this keyword is a JSON object representing all of the guardable
unit targets on this chip. Each element of this object will have the format:

```text
"<unit_name>" : "<relative_devtree_path>"
```

Where `<unit_name>` is simply an alphanumeric label for the unit and
`<relative_devtree_path>` is a string representing the devtree path of the unit
relative to the chip defined by the file. When necessary, the user application
should be able to concatenate the devtree path of the chip and the relative
devtree path of the unit to get the full devtree path of the unit.

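For illustration only, a `units` object with two entries could look like the
sketch below. Both the unit names and the relative paths are hypothetical
placeholders; real values depend on the chip and its devtree layout.

```json
{
    "units" : {
        "unitA0" : "hypothetical/relative/path/unitA0",
        "unitA1" : "hypothetical/relative/path/unitA1"
    }
}
```
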
## 4) `buses` keyword

The value of this keyword is a JSON object representing all of the buses
connected to this chip. Each element of this object will have the format:

```text
"<bus_name>" : { <bus_details> }
```

Where `<bus_name>` is simply an alphanumeric label for the bus and
`<bus_details>` is a JSON object containing details of the bus connection.

### 4.1) `<bus_details>` object

This describes how the bus is connected to this chip. Note that the `unit`
keyword is optional; when it is omitted, the chip itself is used as the endpoint
of the connection. Omitting it is usually intended for cases where the chip is
the child and we need to find the connected `parent` chip/unit.

| Keyword | Description                                                       |
| ------- | ----------------------------------------------------------------- |
| type    | The bus connection type. Values (string): `SMP_BUS` and `OMI_BUS` |
| unit    | Optional. The `<unit_name>` of the bus endpoint on this chip.     |

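A `buses` object might be sketched as follows. The bus names and the unit name
`unitA0` are hypothetical; `unitA0` is assumed to be a label defined under the
`units` keyword of the same file. The second entry omits `unit`, so the chip
itself acts as the endpoint.

```json
{
    "buses" : {
        "busX0" : {
            "type" : "SMP_BUS",
            "unit" : "unitA0"
        },
        "busUp" : {
            "type" : "OMI_BUS"
        }
    }
}
```
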
## 5) `actions` keyword (required)

The value of this keyword is a JSON object representing all of the defined
actions available for the file. Each element of this object contains an array of
RAS actions, to be performed in order, with the format:

```text
"<action_name>" : [ { <action_element> }, ... ]
```

Where `<action_name>` is simply an alphanumeric label for a set of actions. This
will be the keyword referenced by the `signatures` or by a special
`<action_element>` for nested actions (see below).

### 5.1) `<action_element>` object

All `<action_element>` are JSON objects and they all require the `type` keyword,
which is used to determine the action type. The remaining required keywords are
dependent on the action type.

Actions with a `priority` keyword can only use the following values (string):

| Priority | Description                                                        |
| -------- | ------------------------------------------------------------------ |
| `HIGH`   | Service is mandatory.                                              |
| `MED`    | Service one at a time, in order, until issue is resolved.          |
| `MED_A`  | Same as `MED` except all in group A replaced at the same time.     |
| `MED_B`  | Same as `MED` except all in group B replaced at the same time.     |
| `MED_C`  | Same as `MED` except all in group C replaced at the same time.     |
| `LOW`    | Same as `MED*`, but only if higher priority service does not work. |

NOTE: If a part is called out more than once, only the highest priority callout
will be used.

Actions with a `guard` keyword can only use the following values (boolean):

| Guard | Description                       |
| ----- | --------------------------------- |
| true  | Request guard on associated part. |
| false | No guard request.                 |

#### 5.1.1) action type `action`

This is a special action type that allows using an action that has already been
defined (nested actions).

| Keyword | Description                                            |
| ------- | ------------------------------------------------------ |
| type    | value (string): `action`                               |
| name    | The `<action_name>` of a previously defined action.    |

#### 5.1.2) action type `callout_self`

This will request to callout the chip defined by this file.

| Keyword  | Description                    |
| -------- | ------------------------------ |
| type     | value (string): `callout_self` |
| priority | See `priority` table above.    |
| guard    | See `guard` table above.       |

#### 5.1.3) action type `callout_unit`

This will request to callout a unit of the chip defined by this file.

| Keyword  | Description                                          |
| -------- | ---------------------------------------------------- |
| type     | value (string): `callout_unit`                       |
| name     | The `<unit_name>` as defined by the `units` keyword. |
| priority | See `priority` table above.                          |
| guard    | See `guard` table above.                             |

#### 5.1.4) action type `callout_connected`

This will request to callout a connected chip/unit on the other side of a bus.

| Keyword  | Description                                         |
| -------- | --------------------------------------------------- |
| type     | value (string): `callout_connected`                 |
| name     | The `<bus_name>` as defined by the `buses` keyword. |
| priority | See `priority` table above.                         |
| guard    | See `guard` table above.                            |

#### 5.1.5) action type `callout_bus`

This will request to callout all parts associated with a bus (RX/TX endpoints
and everything else in between the endpoints). All parts will be called out with
the same priority. If particular parts, like the endpoints, need to be called
out at a different priority, they will need to be called out using a different
action type. For example:

- `callout_self` with priority `MED_A`. (RX endpoint MED_A)
- `callout_connected` with priority `MED_A`. (TX endpoint MED_A)
- `callout_bus` with priority `LOW`. (everything else LOW)

| Keyword  | Description                                         |
| -------- | --------------------------------------------------- |
| type     | value (string): `callout_bus`                       |
| name     | The `<bus_name>` as defined by the `buses` keyword. |
| priority | See `priority` table above.                         |
| guard    | See `guard` table above.                            |

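The three-step example listed in this section could be written as a single
action, sketched below. The action name `busX0_example`, the bus name `busX0`,
and all of the `guard` values are assumptions made for illustration. Because
only the highest priority callout is used when a part is called out more than
once, the endpoints end up at `MED_A` and the rest of the bus at `LOW`.

```json
{
    "actions" : {
        "busX0_example" : [
            {
                "type"     : "callout_self",
                "priority" : "MED_A",
                "guard"    : false
            },
            {
                "type"     : "callout_connected",
                "name"     : "busX0",
                "priority" : "MED_A",
                "guard"    : false
            },
            {
                "type"     : "callout_bus",
                "name"     : "busX0",
                "priority" : "LOW",
                "guard"    : false
            }
        ]
    }
}
```
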
#### 5.1.6) action type `callout_clock`

This will request to callout a clock associated with this chip.

| Keyword  | Description                     |
| -------- | ------------------------------- |
| type     | value (string): `callout_clock` |
| name     | See `clock type` table below.   |
| priority | See `priority` table above.     |
| guard    | See `guard` table above.        |

Supported clock types:

| Clock Type      | Description                  |
| --------------- | ---------------------------- |
| OSC_REF_CLOCK_0 | Oscillator reference clock 0 |
| OSC_REF_CLOCK_1 | Oscillator reference clock 1 |
| TOD_CLOCK       | Time of Day (TOD) clock      |

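A single `callout_clock` action element might look like the sketch below. The
choice of clock, priority, and guard value is arbitrary and only meant to show
how the keywords fit together.

```json
{
    "type"     : "callout_clock",
    "name"     : "OSC_REF_CLOCK_0",
    "priority" : "MED",
    "guard"    : false
}
```
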
#### 5.1.7) action type `callout_procedure`

This will request to callout a service procedure.

| Keyword  | Description                         |
| -------- | ----------------------------------- |
| type     | value (string): `callout_procedure` |
| name     | See `procedures` table below.       |
| priority | See `priority` table above.         |

Supported procedures:

| Procedure | Description             |
| --------- | ----------------------- |
| LEVEL2    | Request Level 2 support |

#### 5.1.8) action type `callout_part`

This will request special part callouts that cannot be managed by the other
callout actions (e.g. the PNOR).

| Keyword  | Description                    |
| -------- | ------------------------------ |
| type     | value (string): `callout_part` |
| name     | See `parts` table below.       |
| priority | See `priority` table above.    |

Supported parts:

| Part Type | Description                  |
| --------- | ---------------------------- |
| PNOR      | The part containing the PNOR |

#### 5.1.9) action type `plugin`

Some RAS actions require additional support that cannot be defined easily in
these data files. User applications can define plugins to perform these
additional tasks. Use of this action type should be avoided if possible.
Remember, the goal is to make the user applications as data driven as possible
to avoid platform specific code.

| Keyword  | Description                                                       |
| -------- | ----------------------------------------------------------------- |
| type     | value (string): `plugin`                                          |
| name     | A string representing the plugin name.                            |
| instance | Some plugins may be defined for multiple register/unit instances. |

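A `plugin` action element could be sketched as shown below. The plugin name is
hypothetical, and representing `instance` as an integer is an assumption, since
the table above does not state the value's type.

```json
{
    "type"     : "plugin",
    "name"     : "example_plugin",
    "instance" : 0
}
```
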
### 5.2) `actions` example

```json
    "actions" : {
        "self_L" : [
            {
                "type"     : "callout_self",
                "priority" : "LOW",
                "guard"    : false
            }
        ],
        "level2_M_self_L" : [
            {
                "type"     : "callout_procedure",
                "name"     : "LEVEL2",
                "priority" : "MED"
            },
            {
                "type" : "action",
                "name" : "self_L"
            }
        ]
    }
```

## 6) `signatures` keyword (required)

The value of this keyword is a JSON object representing all of the signatures
from this chip requiring RAS actions. Each element of this object will have the
format:

```text
"<sig_id>" : { "<sig_bit>" : { "<sig_inst>" : "<action_name>", ... }, ... }
```

Where `<sig_id>` (16-bit), `<sig_bit>` (8-bit), and `<sig_inst>` (8-bit) are
lowercase hexadecimal values with NO preceding '0x'. See the details of these
fields in the isolator's `Signature` object. The `<action_name>` is a label
defined by the `actions` keyword above.

272