AMCC suggests setting the PMU bit to 0 for best performance on the
PPC440 DDR controller. The common 440 DDR setup files (sdram.c and
spd_sdram.c) have been changed accordingly, so all 440 boards using
these setup routines automatically receive this performance
increase.

The benchmarks below, run by AMCC, demonstrate the performance
change:


----------------------------------------
SDRAM0_CFG0[PMU] = 1 (U-Boot default for Bamboo, Yosemite and Yellowstone)
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 112345 microseconds.
   (= 112345 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         256.7683       0.1248       0.1246       0.1250
Scale:        246.0157       0.1302       0.1301       0.1302
Add:          255.0316       0.1883       0.1882       0.1885
Triad:        253.1245       0.1897       0.1896       0.1899


TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  -> localhost
ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57
ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw

----------------------------------------
SDRAM0_CFG0[PMU] = 0 (Suggested modification)
Setting PMU = 0 provides a noticeable performance improvement:
* 2% to 5% improvement in memory performance (Stream).
* Improves the Mbit/sec for the TTCP benchmark by roughly 77%.
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 120066 microseconds.
   (= 120066 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         262.5167       0.1221       0.1219       0.1223
Scale:        258.4856       0.1238       0.1238       0.1240
Add:          262.5404       0.1829       0.1828       0.1831
Triad:        266.8594       0.1800       0.1799       0.1802

TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  -> localhost
ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89
ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw


2006-07-28, Stefan Roese <sr@denx.de>
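
The Stream gain from clearing PMU can be checked directly from the two
rate tables. The snippet below is only an illustrative sketch: the rates
are copied from this README, while the names (pmu1, pmu0, gain_percent,
stream_gains) are made up for the example and are not part of U-Boot.

```python
# Stream rates in MB/s, copied from the two result tables in this README.
pmu1 = {"Copy": 256.7683, "Scale": 246.0157, "Add": 255.0316, "Triad": 253.1245}
pmu0 = {"Copy": 262.5167, "Scale": 258.4856, "Add": 262.5404, "Triad": 266.8594}


def gain_percent(before, after):
    """Relative improvement of 'after' over 'before', in percent."""
    return (after - before) / before * 100.0


# Per-kernel improvement when going from PMU = 1 to PMU = 0.
stream_gains = {name: gain_percent(pmu1[name], pmu0[name]) for name in pmu1}

for name, gain in stream_gains.items():
    print(f"{name:<6} {gain:5.1f}%")
```

On these figures the per-kernel gains fall between roughly 2% and 5.5%,
with Copy and Add at the low end and Scale and Triad at the high end.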
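
The TTCP improvement can be cross-checked the same way from the two
runs. This is a minimal sketch using only the throughput and calls/sec
values reported in this README; the variable names are hypothetical and
chosen just for the example.

```python
# Values copied from the "ttcp-t:" summary lines of the two runs.
mbit_pmu1, mbit_pmu0 = 454.29, 804.06        # Mbit/sec
calls_pmu1, calls_pmu0 = 7268.57, 12864.89   # I/O calls per second

# Relative improvement of PMU = 0 over PMU = 1, in percent.
ttcp_gain = (mbit_pmu0 - mbit_pmu1) / mbit_pmu1 * 100.0
calls_gain = (calls_pmu0 - calls_pmu1) / calls_pmu1 * 100.0

print(f"throughput gain: {ttcp_gain:.1f}%")
print(f"calls/sec gain:  {calls_gain:.1f}%")
```

Both ratios should agree closely, since each run transfers the same
16777216 bytes in the same 2048 I/O calls; on these figures both come
out near 77%.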