AMCC suggested setting the PMU bit to 0 for best performance on the
PPC440 DDR controller. The common 440 DDR setup files (sdram.c and
spd_sdram.c) have been changed accordingly, so all 440 boards using
these setup routines will automatically receive this performance
increase.

Below are some benchmarks done by AMCC that demonstrate this
performance change:


----------------------------------------
SDRAM0_CFG0[PMU] = 1 (U-Boot default for Bamboo, Yosemite and Yellowstone)
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 112345 microseconds.
   (= 112345 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         256.7683       0.1248       0.1246       0.1250
Scale:        246.0157       0.1302       0.1301       0.1302
Add:          255.0316       0.1883       0.1882       0.1885
Triad:        253.1245       0.1897       0.1896       0.1899


TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000 tcp ->
localhost
ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57
ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw

----------------------------------------
SDRAM0_CFG0[PMU] = 0 (Suggested modification)
Setting PMU = 0 provides a noticeable performance improvement:
* 2% to 5% improvement in memory performance (Stream).
* Almost 77% higher throughput (Mbit/sec) in the TTCP benchmark.
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 120066 microseconds.
   (= 120066 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         262.5167       0.1221       0.1219       0.1223
Scale:        258.4856       0.1238       0.1238       0.1240
Add:          262.5404       0.1829       0.1828       0.1831
Triad:        266.8594       0.1800       0.1799       0.1802

TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000 tcp ->
localhost
ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89
ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw


2006-07-28, Stefan Roese <sr@denx.de>