Copyright 2017 IBM

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

## Intro

This document describes a protocol for host to BMC communication via the
mailbox registers present on the Aspeed 2400 and 2500 chips.
The protocol is specifically designed to allow a host to request and manage
access to the flash. The details of how the host is required to control
this access are described below.

## Version

Both version 1 and version 2 of the protocol are described below, with
version 2 specifics marked with V2 in brackets - (V2).

## Problem Overview

"mbox" is the name we use to represent a protocol we have established between
the host and the BMC via the Aspeed mailbox registers. This protocol is used
for the host to control the flash.

Prior to the mbox protocol, the host used a backdoor into the BMC address space
(the iLPC-to-AHB bridge) to directly manipulate the BMC's own flash controller.

This is not sustainable for a number of reasons. The main ones are:

1. Every piece of the host software stack that needs flash access (HostBoot,
   OCC, OPAL, ...) has to have a complete driver for the flash controller,
   update it on each BMC generation, have all the quirks for all the flash
   chips supported, etc. We have 3 copies on the host already in addition to
   the one in the BMC itself.

2. There are serious issues of access conflicts to that controller between the
   host and the BMC.

3. It's very hard to support "BMC reboots" when doing that.

4. It's slow.

5. Last but probably most important, having that backdoor open is a security
   risk. It means the host can access any address on the BMC internal bus and
   implant malware in the BMC itself. So if the host is a "bare metal" shared
   system in some kind of data center, not only does the host flash need to be
   reflashed when switching from one customer to another, but the entire BMC
   flash too, as nothing can be trusted. So we want to disable it.

To address all these, we have implemented a new mechanism that we call mbox.

When using this mechanism, the BMC is solely responsible for directly accessing
the flash controller. All flash erase and write operations are performed by the
BMC and the BMC only. (We can allow direct reads from flash under some
circumstances but we tend to prefer going via memory.)

The host uses the mailbox registers to send "commands" to the BMC, which
responds via the same mechanism. Those commands allow the host to control a
"window" (which is the LPC -> AHB FW space mapping) that is either a read
window or a write window onto the flash.

When set for writing, the BMC makes the window point to a chunk of RAM instead.
When the host "commits" a change (via MBOX), the BMC can perform the
actual flashing from the data in the RAM window.

The idea is to have the LPC FW space be routed to an active "window". That
window can be a read or a write window. The commands control which
window is active and which offset into the flash it maps.

* A read window can be a direct window to the flash controller space (i.e.
  0x3000\_0000) or it can be a window to a RAM image of a flash. Per the
  protocol it doesn't have to be the full size of the flash (commands can be
  used to "slide" it to various parts of the flash), but if it's set to map the
  actual flash controller space at 0x3000\_0000, it's probably simpler to make
  it the full flash. The host makes no assumption; it's your choice what to
  provide. The simplest implementation is to just route to the flash read-only.

* A write window has to be a chunk of BMC memory. The minimum size is not
  defined in the spec, but it should be at least one block (4k for now but it
  should support larger block sizes in the future). When the BMC receives the
  command to map the write window at a given offset of the flash, the BMC should
  copy that portion of the flash into a reserved memory buffer, and modify the
  LPC mapping to point to that buffer.

The host can then write to that window directly (updating the BMC memory) and
send a command to "commit" those updates to flash.

Finally, there is a `RESET_STATE`. It's the state in which the bootloader in the
SEEPROM of the POWER9 chip will find what it needs to load HostBoot. The
details are still being ironed out: either mapping the full flash read-only or
resetting to a "window" that is either at the bottom or top of the flash. The
current implementation resets to point to the full flash.

## Where is the code?

The mbox userspace is available [on GitHub](https://github.com/openbmc/mboxbridge).
This is Apache licensed but we are keen to see any enhancements you may have.

The kernel driver is still in the process of being upstreamed but can be found
in the OpenBMC Linux kernel staging tree:

https://github.com/openbmc/linux/commit/85770a7d1caa6a1fa1a291c33dfe46e05755a2ef

## Building

The Autotools build requires the autoconf-archive package to be installed on
your system.

## The Hardware

The Aspeed mailbox consists of 16 (8-bit) data registers; see Layout for their
use. Mailbox interrupt enabling, masking and triggering is done using a pair
of control registers, one accessible by the host and the other by the BMC.
Interrupts can also be raised per write to each data register, for BMC and
host. Write-triggered interrupts are configured using two 8-bit registers where
each bit corresponds to a data register and selects whether an interrupt should
fire on a write to it. Two 8-bit registers are also present to act as a mask
for write-triggered interrupts.

### Layout

```
Byte 0: COMMAND
Byte 1: Sequence
Byte 2-12: Arguments
Byte 13: Response code
Byte 14: Host controlled status reg
Byte 15: BMC controlled status reg
```

Note: when the BMC is writing a response to the mbox registers (as described
above), the "Response Code" (Register 13) must be the last register written to.
Writing register 13 will trigger an interrupt to the host indicating a complete
response has been written. Triggering the interrupt by writing register 13
prior to completing the response may lead to a data race, and must, therefore,
be avoided.

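The layout and the write-ordering rule can be illustrated with a minimal Python sketch. The `write_reg` callback and the helper name `write_response` are hypothetical stand-ins for whatever register access a platform provides; they are not part of the protocol or of any existing driver. The sketch leaves the command byte untouched and only shows that register 13 is written last.

```python
# Register indices taken from the layout above.
REG_SEQUENCE = 1
REG_ARGS_FIRST = 2
REG_ARGS_LAST = 12
REG_RESPONSE = 13
NUM_ARGS = REG_ARGS_LAST - REG_ARGS_FIRST + 1  # 11 argument bytes


def write_response(write_reg, seq, args, response_code):
    """Write a BMC response, touching the response code register last.

    write_reg(index, value) is a hypothetical callback that stores one
    byte into mailbox data register `index`.
    """
    if len(args) > NUM_ARGS:
        raise ValueError("at most 11 argument bytes fit in registers 2-12")
    padded = bytes(args) + bytes(NUM_ARGS - len(args))
    write_reg(REG_SEQUENCE, seq)
    for i, value in enumerate(padded):
        write_reg(REG_ARGS_FIRST + i, value)
    # Writing register 13 raises the host interrupt, so it must come last.
    write_reg(REG_RESPONSE, response_code)


# Record the order of register writes instead of touching hardware.
order = []
write_response(lambda idx, val: order.append((idx, val)), 0x10, b"\x02", 1)
```

Recording the writes in a list makes the ordering easy to inspect: the final entry is always the response code register.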
## Low Level Protocol Flow

What we essentially have is a set of registers which either the host or BMC can
write to in order to communicate with the other, which will respond in some way.
There are 3 basic types of communication.

1. Commands sent from the Host to the BMC
2. Responses sent from the BMC to the Host in response to commands
3. Asynchronous events raised by the BMC

### General Use

Messages usually originate from the host to the BMC. There are special
cases for a back channel for the BMC to pass new information to the
host which will be discussed later.

To initiate a request the host must set a command code (see Commands) into
mailbox data register 0, and generate a sequence number (see Sequence Numbers)
to write to mailbox data register 1. After these two values, any
command-specific data should be written (see Layout). The host must then
generate an interrupt to the BMC by using bit 0 of its control register and
wait for an interrupt on the response register. Generating an interrupt
automatically sets bit 7 of the corresponding control register. This bit can be
used to poll for messages.

On receiving an interrupt (or polling on bit 7 of its Control
Register) the BMC should read the message from the general registers
of the mailbox and perform the necessary action before responding. On
responding the BMC must ensure that the sequence number is the same as
the one in the request from the host. The BMC must also ensure that
mailbox data register 13 is a valid response code (see Responses). The
BMC should then use its control register to generate an interrupt for
the host to notify it of a response.

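The request/response exchange described above can be sketched as a round trip over an in-memory register file. Everything here is a simplification for illustration: `FakeMailbox`, `host_send`, `bmc_service` and `host_receive` are hypothetical names, and the two `*_pending` flags merely stand in for bit 7 of the respective control registers rather than modelling the real hardware.

```python
SUCCESS = 1
GET_MBOX_INFO = 0x02


class FakeMailbox:
    """In-memory stand-in for the 16 data registers (not real hardware)."""

    def __init__(self):
        self.data = bytearray(16)
        self.bmc_pending = False   # stands in for bit 7 seen by the BMC
        self.host_pending = False  # stands in for bit 7 seen by the host


def host_send(mbox, command, seq, args=b""):
    """Host side: command, then sequence, then arguments, then interrupt."""
    if len(args) > 11:
        raise ValueError("arguments must fit in registers 2-12")
    mbox.data[0] = command
    mbox.data[1] = seq
    mbox.data[2:2 + len(args)] = args
    mbox.bmc_pending = True


def bmc_service(mbox, handler):
    """BMC side: read the request, act on it, echo the sequence number."""
    if not mbox.bmc_pending:
        return
    mbox.bmc_pending = False
    command, seq = mbox.data[0], mbox.data[1]
    resp_args, code = handler(command, bytes(mbox.data[2:13]))
    mbox.data[1] = seq
    mbox.data[2:13] = resp_args[:11].ljust(11, b"\x00")
    mbox.data[13] = code  # response code is written last
    mbox.host_pending = True


def host_receive(mbox, expected_seq):
    """Host side: check the sequence number, return (code, args)."""
    if not mbox.host_pending:
        raise RuntimeError("no response pending")
    mbox.host_pending = False
    if mbox.data[1] != expected_seq:
        raise RuntimeError("sequence number mismatch")
    return mbox.data[13], bytes(mbox.data[2:13])


mbox = FakeMailbox()
host_send(mbox, GET_MBOX_INFO, seq=7, args=bytes([2]))
bmc_service(mbox, lambda cmd, args: (bytes([min(args[0], 2)]), SUCCESS))
code, resp = host_receive(mbox, expected_seq=7)
```

The handler passed to `bmc_service` is a toy that answers with a version no higher than 2; a real daemon would dispatch on the command code.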
### Asynchronous BMC to Host Events

BMC to host communication is also possible for notification of events
from the BMC. This requires that the host have interrupts enabled on
mailbox data register 15 (or otherwise poll on bit 7 of mailbox status
register 1). On receiving such a notification the host should read
mailbox data register 15 to determine the event code which was set by the
BMC (see BMC Event notifications in Commands for detail). Events which are
defined as acknowledgeable must be acknowledged by the host with a
BMC_EVENT_ACK command.

## High Level Protocol Flow

When a host wants to communicate with the BMC via the mbox protocol the first
thing it should do is call GET_MBOX_INFO in order to establish the protocol
version which each understands. Before this, the only other commands which are
allowed are RESET_STATE and BMC_EVENT_ACK.

After this, the host can open and close windows with the CREATE_READ_WINDOW,
CREATE_WRITE_WINDOW and CLOSE_WINDOW commands. Creating a window is how the
host requests access to a section of flash. It is worth noting that the host
can only ever have one window that it is accessing at a time - henceforth
referred to as the active window.

When the active window is a write window the host can perform MARK_WRITE_DIRTY,
MARK_WRITE_ERASED and WRITE_FLUSH commands to identify changed blocks and
control when the changed blocks are written to flash.

Independently, and at any point not during an existing mbox command
transaction, the BMC may raise asynchronous events with the host to
communicate a change in state.

### Version Negotiation

Given that a majority of command and response arguments are specified as a
multiple of block size, it is necessary for the host and BMC to agree on a
protocol version as this determines the block size. In V1 it is hard coded at
4K and in V2 the BMC chooses and specifies this to the host as a response
argument to `GET_MBOX_INFO`. Thus the host must always call `GET_MBOX_INFO`
before any other command which specifies an argument in block size.

When invoking `GET_MBOX_INFO` the host must provide the BMC its highest
supported version of the protocol. The BMC must respond with a protocol version
less than or equal to that requested by the host, or in the event that there is
no such value, an error code. In the event that an error is returned the host
must not continue to communicate with the BMC. Otherwise, the protocol version
returned by the BMC is the agreed protocol version for all further
communication. The host may at a future point request a change in protocol
version by issuing a subsequent `GET_MBOX_INFO` command.

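The negotiation rule can be sketched in a few lines of Python. The function names and the use of `None` to represent "respond with an error code" are assumptions made for illustration; the text does not say which error code the BMC returns.

```python
V1_BLOCK_SIZE = 4096  # V1 hard codes the block size at 4K


def negotiate_version(host_max, bmc_supported):
    """Return the highest BMC-supported version <= host_max.

    Returns None when no such version exists, in which case the BMC
    would answer with an error code and the host must stop talking.
    """
    candidates = [v for v in bmc_supported if v <= host_max]
    return max(candidates) if candidates else None


def block_size_for(version, bmc_block_shift):
    """Block size implied by the agreed version.

    bmc_block_shift is the shift the BMC would report in its V2
    GET_MBOX_INFO response; it is ignored for V1.
    """
    if version == 1:
        return V1_BLOCK_SIZE
    return 1 << bmc_block_shift


agreed = negotiate_version(host_max=3, bmc_supported=(1, 2))
```

Here a host that supports up to version 3 talking to a BMC that supports versions 1 and 2 ends up on version 2, and the block size then comes from the BMC's response rather than from the fixed 4K value.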
### Window Management

In order to access flash contents, the host must request a window be opened at
the flash offset it would like to access. The host may give a hint as to how
much data it would like to access or otherwise set this argument to zero. The
BMC must respond with the LPC bus address to access this window and the
window size. The host must not access past the end of the active window.

There is only ever one active window which is the window created by the most
recent CREATE_READ_WINDOW or CREATE_WRITE_WINDOW call which succeeded. Even
though there are two types of windows there can still only be one active window
irrespective of type. A host must not write to a read window. A host may read
from a write window and the BMC must guarantee that the window reflects what
the host has written there.

A window can be closed by calling CLOSE_WINDOW in which case there is no active
window and the host must not access the LPC window after it has been closed.
If the host closes an active write window then the BMC must perform an
implicit flush. If the host tries to open a new window with an already active
window then the active window is closed (and implicitly flushed if it was a
write window). If the new window is successfully opened then it is the new
active window; if the command fails then there is no active window and the
previously active window must no longer be accessed.

The host must not access an LPC address other than that which is contained by
the active window. The host must not use write management functions (see below)
if the active window is a read window or if there is no active window.

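The active-window rules above amount to a small state machine, sketched here in Python. `WindowTracker` and its methods are hypothetical names, and the `flushes` counter only exists to make the implicit flushes visible; a real implementation would write dirty data to flash at those points.

```python
class WindowTracker:
    """Tracks the single active window as described in the text."""

    def __init__(self):
        self.active = None  # None, "read" or "write"
        self.flushes = 0    # number of implicit flushes performed

    def _close_current(self):
        # Closing a write window always implies a flush.
        if self.active == "write":
            self.flushes += 1
        self.active = None

    def create(self, kind, succeeded):
        """Handle CREATE_READ_WINDOW / CREATE_WRITE_WINDOW.

        The previous window is closed whether or not the new one is
        opened; on failure there is no active window afterwards.
        """
        self._close_current()
        if succeeded:
            self.active = kind
        return self.active

    def close(self):
        """Handle CLOSE_WINDOW."""
        self._close_current()

    def may_use_write_commands(self):
        """Write management commands need an active write window."""
        return self.active == "write"


tracker = WindowTracker()
tracker.create("write", succeeded=True)
tracker.create("read", succeeded=True)    # implicitly flushes the write window
tracker.create("read", succeeded=False)   # failure leaves no active window
```

After this sequence there is no active window and exactly one implicit flush has happened, matching the rules in the paragraphs above.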
### Write Management

The BMC has no method for intercepting writes that occur over the LPC bus. Thus
the host must explicitly notify the BMC of where and when a write has
occurred. The host must use the MARK_WRITE_DIRTY command to tell the BMC where
within the write window it has modified. The host may also use the
MARK_WRITE_ERASED command to erase large parts of the active window without the
need to write 0xFF. The BMC must ensure that if the host
reads from an area it has erased that the read values are 0xFF. Any part of the
active window marked dirty/erased is only marked for the lifetime of the current
active write window and does not persist if the active window is closed either
implicitly or explicitly by the host or the BMC. The BMC may at any time,
and must on a call to WRITE_FLUSH, flush the changes which it has been notified
of back to the flash, at which point the dirty or erased marking is cleared
for the active window. The host must not assume that any changes have been
written to flash unless an explicit flush call was successful, a close of an
active write window was successful or a create window command with an active
write window was successful - otherwise consistency between the flash and memory
contents cannot be guaranteed.

The host is not required to perform an erase before a write command and the
BMC must ensure that a write performs as expected - that is if an erase is
required before a write then the BMC must perform this itself.

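One way a BMC-side implementation might track these markings is a per-block state list, sketched below. This is an illustrative assumption, not the daemon's actual data structure; `WriteWindow` and its method names are hypothetical, and `flush` only returns the pending blocks instead of programming flash.

```python
CLEAN, DIRTY, ERASED = "clean", "dirty", "erased"


class WriteWindow:
    """Per-block dirty/erased tracking for one active write window."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.state = [CLEAN] * num_blocks
        self.ram = bytearray(num_blocks * block_size)

    def _check(self, offset, count):
        # Offset + number past the end of the window must not succeed.
        if offset < 0 or count < 0 or offset + count > len(self.state):
            raise ValueError("range exceeds the active window")

    def mark_dirty(self, offset, count):
        self._check(offset, count)
        for block in range(offset, offset + count):
            self.state[block] = DIRTY

    def mark_erased(self, offset, count):
        self._check(offset, count)
        start = offset * self.block_size
        end = (offset + count) * self.block_size
        # Reads from an erased area must return 0xFF.
        self.ram[start:end] = b"\xff" * (end - start)
        for block in range(offset, offset + count):
            self.state[block] = ERASED

    def flush(self):
        """Return the blocks needing work and clear all markings."""
        pending = [(i, s) for i, s in enumerate(self.state) if s != CLEAN]
        self.state = [CLEAN] * len(self.state)
        return pending


window = WriteWindow(num_blocks=4, block_size=4096)
window.mark_dirty(0, 1)
window.mark_erased(2, 2)
pending = window.flush()
```

A second `flush` right after the first has nothing left to do, which mirrors the rule that markings are cleared once the changes have been written back.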
### BMC Events

The BMC can raise events with the host asynchronously to communicate to the
host a change in state which it should take notice of. The host must (if
possible for the given event) acknowledge it to inform the BMC it has been
received.

If the BMC raises a BMC Reboot event then the host must renegotiate the
protocol version so that both the BMC and the host agree on the block size.
A BMC Reboot event implies a BMC Windows Reset event.
If the BMC raises a BMC Windows Reset event then the host must
assume that there is no longer an active window - that is if there was an
active window it has been closed by the BMC and if it was a write window
then the host must not assume that it was flushed unless a previous explicit
flush call was successful.

The BMC may at some points require access to the flash and the BMC daemon must
set the BMC Flash Control Lost event when the BMC is accessing the flash behind
the BMC daemon's back. When this event is set the host must assume that the
contents of the active window could be inconsistent with the contents of flash.

## Protocol Definition

### Commands

```
RESET_STATE          0x01
GET_MBOX_INFO        0x02
GET_FLASH_INFO       0x03
CREATE_READ_WINDOW   0x04
CLOSE_WINDOW         0x05
CREATE_WRITE_WINDOW  0x06
MARK_WRITE_DIRTY     0x07
WRITE_FLUSH          0x08
BMC_EVENT_ACK        0x09
MARK_WRITE_ERASED    0x0a	(V2)
```

### Responses

```
SUCCESS		1
PARAM_ERROR	2
WRITE_ERROR	3
SYSTEM_ERROR	4
TIMEOUT		5
BUSY		6	(V2)
WINDOW_ERROR	7	(V2)
SEQ_ERROR	8	(V2)
```

### Sequence Numbers

Sequence numbers are included in messages for correlation of commands and
responses. V1 and V2 of the protocol permit either zero or one commands to be
in progress (yet to receive a response).

For generality, the host must generate a sequence number that is unique with
respect to the previous command (one that has received a response) and any
in-progress commands. Sequence numbers meeting this requirement are considered
valid. The BMC's response to a command must contain the same sequence number
issued by the host as found in the relevant command.

Sequence numbers may be reused in accordance with the constraints outlined
above. However, it is not an error if the BMC receives a `GET_MBOX_INFO` with an
invalid sequence number. For all other cases, the BMC must respond with
`SEQ_ERROR` if the constraints are violated. If the host receives a `SEQ_ERROR`
response it must consider any in-progress commands to have failed. The host may
retry the affected command(s) after generating a suitable sequence number.

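Because at most one command is ever in progress, a simple wrapping counter is one way to satisfy the uniqueness constraint. The sketch below assumes that approach; `next_sequence` and `bmc_sequence_ok` are hypothetical helpers, and the one-byte range follows from the sequence number occupying a single data register.

```python
GET_MBOX_INFO = 0x02
SEQ_ERROR = 8


def next_sequence(previous):
    """Host side: pick a value that differs from the previous command's."""
    return (previous + 1) & 0xFF  # the sequence number is a single byte


def bmc_sequence_ok(command, seq, last_seq):
    """BMC side: True if the command may proceed, False for SEQ_ERROR.

    last_seq is the sequence number of the previous command, or None
    if no command has been seen yet.
    """
    if command == GET_MBOX_INFO:
        return True  # an invalid sequence number is tolerated here
    return seq != last_seq


seq = next_sequence(255)  # wraps around to 0
```

A counter is not the only valid choice; any scheme that never repeats the previous command's value meets the stated requirement.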
#### Response Code Description:

SUCCESS		- Command completed successfully

PARAM_ERROR	- Error with parameters supplied or command invalid

WRITE_ERROR	- Error writing to the backing file system

SYSTEM_ERROR	- Error in BMC performing system action

TIMEOUT		- Timeout in performing action

BUSY		- Daemon in suspended state (currently unable to access flash)
		- Retry later

WINDOW_ERROR	- Command not valid for active window or no active window
		- Try opening an appropriate window and retrying the command

SEQ_ERROR	- Invalid sequence number supplied with command
		- Retry with a valid sequence number (see Sequence Numbers)

### Information
- All multibyte values are LSB first (little endian)
- All responses must have a valid return code in byte 13

### Commands in detail

Block size refers to an agreed value which is used as a unit for the
arguments of various commands and responses. Having a block size multiplier
allows us to specify larger values with fewer command and response fields.

In V1 block size is hard coded to 4K.
In V2 it is variable and must be queried with the GET_MBOX_INFO command.
Note that for simplicity block size must always be a power-of-2.
Block size must also be greater than or equal to 4K. This is due to the
fact that we have a 28-bit LPC address space and commands which return an
LPC address do so in 16 bits, thus we need at least a 12-bit unit to ensure
that we can specify the entire address space. This additionally allows us
to specify flash addresses of at least 256MB.

Sizes and addresses are specified in either bytes, marked (bytes), or
blocks, marked (blocks).
Sizes and addresses specified in blocks must be converted to bytes by
multiplying by the block size.
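
The block arithmetic can be checked with a short Python sketch. The helper names are hypothetical; the block size is represented as a shift, in line with the power-of-2 requirement.

```python
MIN_BLOCK_SHIFT = 12  # 4K is the smallest permitted block size


def blocks_to_bytes(blocks, block_shift):
    """Convert a value in blocks to bytes (multiply by the block size)."""
    if block_shift < MIN_BLOCK_SHIFT:
        raise ValueError("block size must be at least 4K")
    return blocks << block_shift


def bytes_to_blocks(nbytes, block_shift):
    """Convert bytes to blocks, rounding up to a whole block."""
    if block_shift < MIN_BLOCK_SHIFT:
        raise ValueError("block size must be at least 4K")
    return (nbytes + (1 << block_shift) - 1) >> block_shift


# A 16-bit field counts 2**16 blocks; at 4K each that spans 2**28 bytes.
addressable = blocks_to_bytes(1 << 16, MIN_BLOCK_SHIFT)
```

With a 12-bit shift the 16-bit fields cover the full 28-bit (256MB) space mentioned above, and a larger shift extends that range proportionally.
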
```
Command:
	RESET_STATE
	Implemented in Versions:
		V1, V2
	Arguments:
		-
	Response:
		-
	Notes:
		This command is designed to inform the BMC that it should put
		host LPC mapping back in a state where the SBE will be able to
		use it. Currently, this means pointing back to BMC flash
		pre mailbox protocol. Final behavior is still TBD.

Command:
	GET_MBOX_INFO
	Implemented in Versions:
		V1, V2
	Arguments:
		V1:
		Args 0: API version

		V2:
		Args 0: API version

	Response:
		V1:
		Args 0: API version
		Args 1-2: default read window size (blocks)
		Args 3-4: default write window size (blocks)

		V2:
		Args 0: API version
		Args 1-2: reserved
		Args 3-4: reserved
		Args 5: Block size as power of two (encoded as a shift)
		Args 6-7: Suggested Timeout (seconds)
	Notes:
		The suggested timeout is a hint to the host as to how long
		it should wait after issuing a command to the BMC before it
		times out waiting for a response. This is the maximum time
		which the BMC thinks it could take to service any command which
		the host could issue. This may be set to zero to indicate that
		the BMC does not wish to provide a hint in which case the host
		must choose some reasonable value.

Command:
	GET_FLASH_INFO
	Implemented in Versions:
		V1, V2
	Arguments:
		-
	Response:
		V1:
		Args 0-3: Flash size (bytes)
		Args 4-7: Erase granule (bytes)

		V2:
		Args 0-1: Flash size (blocks)
		Args 2-3: Erase granule (blocks)

Command:
	CREATE_{READ/WRITE}_WINDOW
	Implemented in Versions:
		V1, V2
	Arguments:
		V1:
		Args 0-1: Requested flash offset (blocks)

		V2:
		Args 0-1: Requested flash offset (blocks)
		Args 2-3: Requested flash size to access (blocks)

	Response:
		V1:
		Args 0-1: LPC bus address of window (blocks)

		V2:
		Args 0-1: LPC bus address of window (blocks)
		Args 2-3: Window size (blocks)
		Args 4-5: Flash offset mapped by window (blocks)
	Notes:
		The flash offset which the host requests access to is always
		taken from the start of flash - that is it is an absolute
		offset into flash.

		LPC bus address is always given from the start of the LPC
		address space - that is it is an absolute address.

		The requested access size is only a hint. The response
		indicates the actual size of the window. The BMC may
		want to use the requested size to pre-load the remainder
		of the request. The host must not access past the end of the
		active window.

		The flash offset mapped by the window is an absolute flash
		offset and must be less than or equal to the flash offset
		requested by the host. It is the responsibility of the host
		to use this information to access any offset which is required.

		The requested window size may be zero. In this case the
		BMC is free to create any sized window but it must contain
		at least the first block of data requested by the host. A large
		window is of course preferred and should correspond to
		the default size returned in the GET_MBOX_INFO command.

		If this command returns successfully then the created window
		is the active window. If it fails then there is no active
		window.

Command:
	CLOSE_WINDOW
	Implemented in Versions:
		V1, V2
	Arguments:
		V1:
		-

		V2:
		Args 0: Flags
	Response:
		-
	Notes:
		Closes the active window. Any further access to the LPC bus
		address specified to address the previously active window will
		have undefined effects. If the active window is a
		write window then the BMC must perform an implicit flush.

		The Flags argument allows the host to provide some
		hints to the BMC. Defined Values:
			0x01 - Short Lifetime:
				The window is unlikely to be accessed
				again in the near future. The effect of
				this will depend on BMC implementation. In
				the event that the BMC performs some caching
				the BMC daemon could mark data contained in a
				window closed with this flag as first to be
				evicted from the cache.

Command:
	MARK_WRITE_DIRTY
	Implemented in Versions:
		V1, V2
	Arguments:
		V1:
		Args 0-1: Flash offset to mark from base of flash (blocks)
		Args 2-5: Number to mark dirty at offset (bytes)

		V2:
		Args 0-1: Window offset to mark (blocks)
		Args 2-3: Number to mark dirty at offset (blocks)

	Response:
		-
	Notes:
		The BMC has no method for intercepting writes that
		occur over the LPC bus. The host must explicitly notify
		the daemon of where and when a write has occurred so it
		can be flushed to backing storage.

		Offsets are absolute (either into flash (V1) or into the
		active window (V2)) and a zero offset refers to the first
		block. If the offset + number exceeds the size of the active
		window then the command must not succeed.

Command:
	WRITE_FLUSH
	Implemented in Versions:
		V1, V2
	Arguments:
		V1:
		Args 0-1: Flash offset to mark from base of flash (blocks)
		Args 2-5: Number to mark dirty at offset (bytes)

		V2:
		-

	Response:
		-
	Notes:
		Flushes any dirty/erased blocks in the active window to
		the backing storage.

		In V1 this can also be used to mark parts of the flash
		dirty and flush in a single command. In V2 the explicit
		mark dirty command must be used before a call to flush
		since there are no longer any arguments. If the offset + number
		exceeds the size of the active window then the command must not
		succeed.


Command:
	BMC_EVENT_ACK
	Implemented in Versions:
		V1, V2
	Arguments:
		Args 0:	Bits in the BMC status byte (mailbox data
			register 15) to ack
	Response:
		*clears the bits in mailbox data register 15*
	Notes:
		The host should use this command to acknowledge BMC events
		supplied in mailbox register 15.

Command:
	MARK_WRITE_ERASED
	Implemented in Versions:
		V2
	Arguments:
		V2:
		Args 0-1: Window offset to erase (blocks)
		Args 2-3: Number to erase at offset (blocks)
	Response:
		-
	Notes:
		This command allows the host to erase a large area
		without the need to individually write 0xFF
		repetitively.

		Offset is the offset within the active window to start erasing
		from (zero refers to the first block of the active window) and
		number is the number of blocks of the active window to erase
		starting at offset. If the offset + number exceeds the size of
		the active window then the command must not succeed.
```

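As an illustration of the V2 argument layouts above, the following Python sketch decodes a `GET_MBOX_INFO` response and a `CREATE_{READ/WRITE}_WINDOW` response from the 11 argument bytes (registers 2-12), then works out the LPC address for a flash offset inside the window. The function names and the sample byte values are made up for the example; only the field positions and the little-endian encoding come from this document.

```python
import struct


def parse_get_mbox_info_v2(args):
    """Return (api_version, block_shift, timeout_seconds)."""
    version = args[0]
    block_shift = args[5]
    (timeout_s,) = struct.unpack_from("<H", args, 6)
    return version, block_shift, timeout_s


def parse_create_window_v2(args, block_shift):
    """Return (lpc_address, window_size, flash_offset), all in bytes."""
    lpc_blocks, size_blocks, flash_blocks = struct.unpack_from("<HHH", args, 0)
    return (lpc_blocks << block_shift,
            size_blocks << block_shift,
            flash_blocks << block_shift)


def lpc_address_for(flash_offset, window):
    """Map an absolute flash offset to an LPC address in the window.

    The window may start below the offset the host asked for, so the
    host has to apply the difference itself.
    """
    lpc_base, size, flash_base = window
    if not flash_base <= flash_offset < flash_base + size:
        raise ValueError("offset is outside the active window")
    return lpc_base + (flash_offset - flash_base)


# Sample responses with invented values: version 2, 4K blocks, 30 s timeout.
info_args = bytes([2, 0, 0, 0, 0, 12, 30, 0, 0, 0, 0])
version, shift, timeout = parse_get_mbox_info_v2(info_args)

# Window of 0x10 blocks at LPC block 0x0FF0 mapping flash block 0x20.
win_args = struct.pack("<HHH", 0x0FF0, 0x0010, 0x0020) + bytes(5)
window = parse_create_window_v2(win_args, shift)
addr = lpc_address_for(0x21000, window)
```

The last step shows why the "flash offset mapped by window" field matters: the host asked about flash offset 0x21000, which sits one block into a window that starts at flash offset 0x20000.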
### BMC Events in Detail:

If the BMC needs to tell the host something then it simply
writes to Byte 15. The host should have interrupts enabled
on that register, or otherwise be polling it.

#### Bit Definitions:

Events which should be ACKed:
```
0x01: BMC Reboot
0x02: BMC Windows Reset (V2)
```

Events which cannot be ACKed (BMC will clear when no longer
applicable):
```
0x40: BMC Flash Control Lost (V2)
0x80: BMC MBOX Daemon Ready (V2)
```

#### Event Description:

Events which must be ACKed:
The host should acknowledge these events with BMC_EVENT_ACK to
let the BMC know that they have been received and understood.
```
0x01 - BMC Reboot:
	Used to inform the host that a BMC reboot has occurred.
	The host must perform protocol version negotiation again and
	must assume it has no active window. The host must not assume
	that any command succeeded unless it received a response
	saying so.
0x02 - BMC Windows Reset: (V2)
	The host must assume that its active window has been closed and
	that it no longer has an active window. The host is not
	required to perform protocol version negotiation again. The
	host must not assume that any command succeeded unless it
	received a response saying so.
```

Events which cannot be ACKed:
These events cannot be acknowledged by the host and a call to
BMC_EVENT_ACK with these bits set will have no effect. The BMC
will clear these bits when they are no longer applicable.
```
0x40 - BMC Flash Control Lost: (V2)
	The BMC daemon has been suspended and thus no longer
	controls access to the flash (most likely because some
	other process on the BMC required direct access to the
	flash and has suspended the BMC daemon to preclude
	concurrent access).
	The BMC daemon must clear this bit itself when it regains
	control of the flash (the host isn't able to clear it
	through an acknowledge command).
	The host must not assume that the contents of the active window
	correctly reflect the contents of flash while this bit is set.
0x80 - BMC MBOX Daemon Ready: (V2)
	Used to inform the host that the BMC daemon is ready to
	accept command requests. The host isn't able to clear
	this bit through an acknowledge command; the BMC daemon must
	clear it before it terminates (assuming it didn't
	terminate unexpectedly).
	The host should not expect a response while this bit is
	not set.
	Note that this bit being set is not a guarantee that the BMC daemon
	will respond as it or the BMC may have crashed without clearing
	it.
```

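A host might decode the event byte along the following lines. This is only a sketch: the constant names, `handle_events` and the action strings are invented for illustration, while the bit values and the required reactions come from the definitions above.

```python
EVENT_BMC_REBOOT = 0x01
EVENT_WINDOWS_RESET = 0x02
EVENT_FLASH_CONTROL_LOST = 0x40
EVENT_DAEMON_READY = 0x80

# Only these bits can be cleared with BMC_EVENT_ACK.
ACKABLE_MASK = EVENT_BMC_REBOOT | EVENT_WINDOWS_RESET


def handle_events(status):
    """Return (ack_mask, actions) for the value read from byte 15."""
    actions = []
    if status & EVENT_BMC_REBOOT:
        actions.append("renegotiate_version")
    # A reboot implies a windows reset, so both drop the active window.
    if status & (EVENT_BMC_REBOOT | EVENT_WINDOWS_RESET):
        actions.append("drop_active_window")
    if status & EVENT_FLASH_CONTROL_LOST:
        actions.append("distrust_window_contents")
    if not status & EVENT_DAEMON_READY:
        actions.append("expect_no_response")
    # Bits outside ACKABLE_MASK are left for the BMC to clear.
    return status & ACKABLE_MASK, actions


ack, actions = handle_events(0x81)  # reboot reported, daemon ready
```

The returned `ack` value is what the host would pass as Args 0 of `BMC_EVENT_ACK`; the non-acknowledgeable bits are deliberately masked out since acknowledging them has no effect.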