The most frequent cause of problems when porting U-Boot to new
hardware, or when using a sloppy port on some board, is memory errors.
In most cases these are not caused by failing hardware, but by
incorrect initialization of the memory controller.  It is therefore
a good idea to always test whether the memory is working correctly
before looking for any other potential causes of problems.

U-Boot implements 3 different approaches to perform memory tests:

1. The get_ram_size() function (see "common/memsize.c").

   This function is supposed to be used in each and every U-Boot port
   to determine the presence and actual size of each of the potential
   memory banks on this piece of hardware.  The code is supposed to be
   very fast, so running it on each reboot does not hurt.  It is a
   little known and generally underrated fact that this code will also
   catch 99% of hardware related (i.e. reliably reproducible) memory
   errors.  It is strongly recommended to always use this function, in
   each and every port of U-Boot.

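To illustrate why a size probe such as the one in item 1 also catches
many static faults, here is a minimal hosted-C sketch of probing at
power-of-two offsets.  It is not the code from "common/memsize.c":
the names bank_read(), bank_write(), probe_words() and REAL_WORDS are
made up for this example, the RAM bank is simulated by an array whose
upper addresses alias onto the lower ones, and saving and restoring
the original memory contents is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulated bank: only REAL_WORDS words really exist. */
#define REAL_WORDS 1024u

static long bank[REAL_WORDS];

/*
 * Accesses beyond the real size wrap around, as they would with
 * unconnected upper address lines.
 */
static void bank_write(size_t idx, long val)
{
	bank[idx % REAL_WORDS] = val;
}

static long bank_read(size_t idx)
{
	return bank[idx % REAL_WORDS];
}

/*
 * Probe a window of max_words words (must be a power of two) and
 * return the number of words that respond without aliasing.
 * A result of 0 means that even the first word failed.
 */
static size_t probe_words(size_t max_words)
{
	size_t cnt;

	assert(max_words != 0 && (max_words & (max_words - 1)) == 0);

	/* Mark every power-of-two offset, highest first. */
	for (cnt = max_words >> 1; cnt > 0; cnt >>= 1)
		bank_write(cnt, ~(long)cnt);

	/* Mark the base last; an aliased offset is overwritten here. */
	bank_write(0, 0);
	if (bank_read(0) != 0)
		return 0;

	/* The first offset that lost its marker gives the real size. */
	for (cnt = 1; cnt < max_words; cnt <<= 1)
		if (bank_read(cnt) != ~(long)cnt)
			return cnt;

	return max_words;
}
```

With REAL_WORDS set to 1024, probe_words(4096) returns 1024: the
marker written at offset 1024 lands on word 0 and is overwritten by
the final write to the base.  In this model a stuck address line
would show up the same way, as a size smaller than expected.
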
2. The "mtest" command.

   This is probably the best known memory test utility in U-Boot.
   Unfortunately, it is also the most problematic, and the most
   useless one.

   There are a number of serious problems with this command:

   - It is terribly slow.  Running "mtest" on the whole system RAM
     takes a _long_ time before there is any significance in the fact
     that no errors have been found so far.

   - It is difficult to configure and to use, and any errors here
     will reliably crash or hang your system.  "mtest" is dumb and has
     no knowledge about memory ranges that may be in use for other
     purposes, like exception code, U-Boot code and data, stack,
     malloc arena, video buffer, log buffer, etc.  If you let it, it
     will happily "test" all such areas, which of course will cause
     some problems.

   - Because it is not easy to configure and use, a large number of
     systems are seriously misconfigured.  The original idea was to
     test basically the whole system RAM, exempting only the areas
     used by U-Boot itself - on most systems these are the areas
     used for the exception vectors (usually at the very low end of
     system memory) and for U-Boot (code, data, etc. - see above;
     these are usually at the very upper end of system memory).  But
     experience has shown that a very large number of ports use
     pretty much bogus settings of CONFIG_SYS_MEMTEST_START and
     CONFIG_SYS_MEMTEST_END; this results in useless tests (because
     the range is too small and/or badly located) or in critical
     failures (system crashes).

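The range constraint described in the bullets above can be pictured
with a small overlap check.  This is a sketch only: struct mem_region
and memtest_range_ok() are hypothetical names, not part of U-Boot,
and the list of reserved regions (exception vectors, U-Boot code and
data, stack, malloc arena, and so on) would have to be supplied by
the port.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a region that a memory test must skip. */
struct mem_region {
	uintptr_t start;	/* first byte of the region */
	uintptr_t end;		/* one past the last byte */
};

/*
 * Return true if the half-open test range [start, end) is non-empty
 * and does not overlap any of the n reserved regions.
 */
static bool memtest_range_ok(uintptr_t start, uintptr_t end,
			     const struct mem_region *resv, size_t n)
{
	size_t i;

	if (start >= end)
		return false;

	for (i = 0; i < n; i++)
		if (start < resv[i].end && resv[i].start < end)
			return false;

	return true;
}
```

A port could run such a check against the values it picks for
CONFIG_SYS_MEMTEST_START and CONFIG_SYS_MEMTEST_END.  For example,
with 128 MiB of RAM, exception vectors in the lowest 12 KiB and
U-Boot in the top 1 MiB (values made up for this example), a test
range of 0x00100000 to 0x07e00000 passes, while a range covering the
whole RAM is rejected.
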
   Because of these issues, the "mtest" command is considered
   deprecated.  It should not be enabled in most normal ports of
   U-Boot, especially not in production.  If you really need a memory
   test, then see 1. above and 3. below.

3. The most thorough memory test facility is available as part of the
   POST (Power-On Self Test) sub-system, see "post/drivers/memory.c".

   If you really need to perform memory tests (for example, because
   it is a mandatory part of your requirement specification), then
   enable this test, which is generic and should work on all
   architectures.

WARNING:

It should be pointed out that _all_ these memory tests have one
fundamental, unfixable design flaw:  they are based on the assumption
that memory errors can be found by writing to and reading from memory.
Unfortunately, this is only true for the relatively harmless, usually
static errors like shorts between data or address lines, unconnected
pins, etc.  All the really nasty errors which will first turn your
hair gray, only to make you tear it out later, are dynamic errors,
which usually happen not with simple read or write cycles on the bus,
but when performing back-to-back data transfers in burst mode.  Such
accesses usually happen only for certain DMA operations, or for heavy
cache use (instruction fetching, cache flushing).  So far I am not
aware of any freely available code that implements a generic, and
efficient, memory test like that.  The best known test case to stress
a system like that is to boot Linux with the root file system mounted
over NFS, and then build some larger software package natively (say,
compile a Linux kernel on the system) - this will cause enough context
switches, network traffic (and thus DMA transfers from the network
controller), varying RAM use, etc. to trigger any weak spots in this
area.

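The static errors mentioned above are the kind that a plain write
and read-back can find.  A minimal sketch of a "walking ones" data
line test follows, run against a simulated memory cell; cell_write(),
cell_read(), data_line_test() and stuck_low_mask are hypothetical
names for this example and are not taken from the U-Boot tests.
Dynamic, burst-mode errors are out of reach for this kind of loop.

```c
#include <stdint.h>

/* Simulated fault: data lines set in this mask always read back as 0. */
static uint32_t stuck_low_mask;

static uint32_t cell;	/* stands in for one 32-bit word of RAM */

static void cell_write(uint32_t val)
{
	cell = val & ~stuck_low_mask;
}

static uint32_t cell_read(void)
{
	return cell;
}

/*
 * Drive each data line high on its own and read the word back.
 * Returns a bitmask of the lines that failed; 0 means all passed.
 */
static uint32_t data_line_test(void)
{
	uint32_t bad = 0;
	uint32_t bit;

	for (bit = 1; bit != 0; bit <<= 1) {
		cell_write(bit);
		if (cell_read() != bit)
			bad |= bit;
	}

	return bad;
}
```

With stuck_low_mask set to 0x20 (data line 5 stuck low),
data_line_test() returns 0x20; with no simulated fault it returns 0.
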
Note: An attempt was made once to implement such a test to catch
memory problems on a specific board.  The code is pretty much board
specific (for example, it includes setting specific GPIO signals to
provide triggers for an attached logic analyzer), but you can get an
idea of how it works: see "examples/standalone/test_burst*".

Note 2: Ironically enough, the "test_burst" code did not catch any
RAM errors, not a single one ever.  The problems this code was
supposed to catch did not happen when accessing the RAM, but when
reading from NOR flash.