AMCC suggested setting the PMU bit to 0 for best performance on the
PPC440 DDR controller. The common 440 DDR setup files (sdram.c &
spd_sdram.c) have been changed accordingly, so all 440 boards using
these setup routines automatically receive this performance
increase.
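
As a rough illustration of the change described above, the following
sketch clears a single bit in a 32-bit configuration word. It is not
the code from sdram.c or spd_sdram.c: the mask EXAMPLE_CFG0_PMU and the
helper cfg0_clear_pmu() are hypothetical names chosen for this example,
and the real bit position must be taken from the PPC440 user's manual.

```c
#include <stdint.h>

/*
 * Hypothetical mask for the PMU bit in SDRAM0_CFG0.
 * This value is an assumption for illustration only; it is NOT taken
 * from the PPC440 manual or from the U-Boot headers.
 */
#define EXAMPLE_CFG0_PMU 0x04000000u

/* Return the given SDRAM0_CFG0 value with the PMU bit cleared (PMU = 0). */
static uint32_t cfg0_clear_pmu(uint32_t cfg0)
{
	return cfg0 & ~EXAMPLE_CFG0_PMU;
}
```

A setup routine would read the current register value, pass it through
a helper like this and write the result back, leaving all other bits
untouched.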

The benchmarks below, done by AMCC, demonstrate these performance
changes:


----------------------------------------
SDRAM0_CFG0[PMU] = 1 (U-Boot default for Bamboo, Yosemite and Yellowstone)
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 112345 microseconds.
   (= 112345 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         256.7683       0.1248       0.1246       0.1250
Scale:        246.0157       0.1302       0.1301       0.1302
Add:          255.0316       0.1883       0.1882       0.1885
Triad:        253.1245       0.1897       0.1896       0.1899


TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  ->
localhost
ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57
ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw

----------------------------------------
SDRAM0_CFG0[PMU] = 0 (Suggested modification)
Setting PMU = 0 provides a noticeable performance improvement:
* 2% to 5% improvement in memory performance.
* Improves the Mbit/sec for the TTCP benchmark by almost 77%.
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 120066 microseconds.
   (= 120066 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         262.5167       0.1221       0.1219       0.1223
Scale:        258.4856       0.1238       0.1238       0.1240
Add:          262.5404       0.1829       0.1828       0.1831
Triad:        266.8594       0.1800       0.1799       0.1802

TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  ->
localhost
ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89
ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw
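
The improvement figures quoted for PMU = 0 can be re-derived from the
two result sets above. The sketch below is only a convenience for that
arithmetic; the helper names improvement_pct() and print_gains() are
made up for this example, and the rates are copied from the listings in
this file.

```c
#include <stdio.h>

/* Relative improvement of 'after' over 'before', in percent. */
static double improvement_pct(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Print the gain for each benchmark, using the rates listed above. */
static void print_gains(void)
{
	static const struct {
		const char *name;
		double pmu1;	/* rate measured with PMU = 1 */
		double pmu0;	/* rate measured with PMU = 0 */
	} r[] = {
		{ "Copy  (MB/s)",     256.7683, 262.5167 },
		{ "Scale (MB/s)",     246.0157, 258.4856 },
		{ "Add   (MB/s)",     255.0316, 262.5404 },
		{ "Triad (MB/s)",     253.1245, 266.8594 },
		{ "TTCP  (Mbit/sec)", 454.29,   804.06   },
	};
	size_t i;

	for (i = 0; i < sizeof(r) / sizeof(r[0]); i++)
		printf("%-18s %+6.2f%%\n", r[i].name,
		       improvement_pct(r[i].pmu1, r[i].pmu0));
}
```

Calling print_gains() from a small main() shows Stream gains of roughly
2% to 5.5% and a TTCP gain of roughly 77%.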


2006-07-28, Stefan Roese <sr@denx.de>