AMCC suggested setting the PMU bit to 0 for best performance on the
PPC440 DDR controller. The common 440 DDR setup files (sdram.c &
spd_sdram.c) are changed accordingly, so all 440 boards using
these setup routines will automatically receive this performance
increase.

Please see below some benchmarks done by AMCC to demonstrate this
performance change:


----------------------------------------
SDRAM0_CFG0[PMU] = 1 (U-Boot default for Bamboo, Yosemite and Yellowstone)
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 112345 microseconds.
   (= 112345 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         256.7683       0.1248       0.1246       0.1250
Scale:        246.0157       0.1302       0.1301       0.1302
Add:          255.0316       0.1883       0.1882       0.1885
Triad:        253.1245       0.1897       0.1896       0.1899


TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000 tcp ->
localhost
ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57
ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw

----------------------------------------
SDRAM0_CFG0[PMU] = 0 (Suggested modification)
Setting PMU = 0 provides a noticeable performance improvement:
* 2% to 5% improvement in memory performance.
* Improves the Mbit/sec for the TTCP benchmark by about 77%
  (454.29 to 804.06 Mbit/sec).
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 120066 microseconds.
   (= 120066 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         262.5167       0.1221       0.1219       0.1223
Scale:        258.4856       0.1238       0.1238       0.1240
Add:          262.5404       0.1829       0.1828       0.1831
Triad:        266.8594       0.1800       0.1799       0.1802

TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000 tcp ->
localhost
ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89
ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw


2006-07-28, Stefan Roese <sr@denx.de>
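
Note: the relative gains between the two runs can be rechecked from the
Rate (MB/s) and Mbit/sec figures listed above. The following is only a
minimal sketch of that arithmetic. The names used here (percent_gain,
STREAM_PMU1 and so on) are hypothetical and are not part of U-Boot or
of the AMCC benchmark tools; only the numbers are taken from this file.

```python
# Benchmark figures copied from the two runs listed in this README.
STREAM_PMU1 = {"Copy": 256.7683, "Scale": 246.0157,
               "Add": 255.0316, "Triad": 253.1245}   # MB/s, PMU = 1
STREAM_PMU0 = {"Copy": 262.5167, "Scale": 258.4856,
               "Add": 262.5404, "Triad": 266.8594}   # MB/s, PMU = 0
TTCP_PMU1 = 454.29  # Mbit/sec, PMU = 1
TTCP_PMU0 = 804.06  # Mbit/sec, PMU = 0


def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Per-function STREAM gain when switching from PMU = 1 to PMU = 0.
stream_gains = {name: percent_gain(STREAM_PMU1[name], STREAM_PMU0[name])
                for name in STREAM_PMU1}

for name, gain in stream_gains.items():
    print(f"{name:6s} {gain:5.2f} %")
print(f"TTCP   {percent_gain(TTCP_PMU1, TTCP_PMU0):5.2f} %")
```

The STREAM gains computed this way stay within a few percent, while the
TTCP throughput gain is far larger, which matches the summary given with
the PMU = 0 results.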