.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=============================================
Chelsio N210 10Gb Ethernet Network Controller
=============================================

Driver Release Notes for Linux

Version 2.1.1

June 20, 2005

.. Contents

 INTRODUCTION
 FEATURES
 PERFORMANCE
 DRIVER MESSAGES
 KNOWN ISSUES
 SUPPORT


Introduction
============

 This document describes the Linux driver for the Chelsio 10Gb Ethernet
 Network Controller. This driver supports the Chelsio N210 NIC and is
 backward compatible with the Chelsio N110 model 10Gb NICs.


Features
========

Adaptive Interrupts (adaptive-rx)
---------------------------------

  This feature provides an adaptive algorithm that adjusts the interrupt
  coalescing parameters, allowing the driver to adapt its latency settings
  dynamically for the best performance under varying network load.

  This feature is controlled with ethtool. See the ethtool manpage for
  additional usage information.

  By default, adaptive-rx is disabled.
  To enable adaptive-rx::

      ethtool -C <interface> adaptive-rx on

  To disable adaptive-rx::

      ethtool -C <interface> adaptive-rx off

  After adaptive-rx is disabled, the timer latency is set to 50us.
  To set a different timer latency after disabling adaptive-rx::

      ethtool -C <interface> rx-usecs <microseconds>

  For example, to set the timer latency to 100us on eth0::

      ethtool -C eth0 rx-usecs 100

  You can also set the timer latency in the same command that disables
  adaptive-rx::

      ethtool -C <interface> adaptive-rx off rx-usecs <microseconds>

  If adaptive-rx is disabled and a timer latency value is specified, the
  timer keeps that value until the user changes it or adaptive-rx is
  re-enabled.

  To view the current adaptive-rx setting and timer latency::

      ethtool -c <interface>
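
The adaptive-rx commands above can be wrapped in a small shell function. This is a minimal sketch, not part of the Chelsio driver package: the function name ``set_rx_latency`` and the ``ETHTOOL`` override variable are hypothetical, added only so the command can be dry-run by setting ``ETHTOOL=echo``.

```shell
# Hypothetical helper: disable adaptive-rx and pin the interrupt timer latency.
# ETHTOOL defaults to "ethtool"; setting ETHTOOL=echo prints the arguments
# instead of changing the NIC (a dry run).
set_rx_latency() {
    iface=$1
    usecs=$2
    # rx-usecs must be a plain non-negative integer (microseconds)
    case $usecs in
        ''|*[!0-9]*)
            echo "set_rx_latency: invalid rx-usecs value '$usecs'" >&2
            return 1
            ;;
    esac
    # A single ethtool call disables adaptive-rx and sets the timer latency
    "${ETHTOOL:-ethtool}" -C "$iface" adaptive-rx off rx-usecs "$usecs"
}
```

Run it as root, for example ``set_rx_latency eth0 100``, and confirm the result with ``ethtool -c eth0``.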


TCP Segmentation Offloading (TSO) Support
-----------------------------------------

  This feature, also known as "large send", enables a system's protocol stack
  to offload portions of outbound TCP processing to a network interface card,
  thereby reducing system CPU utilization and enhancing performance.

  This feature is controlled with ethtool version 1.8 or higher. See the
  ethtool manpage for additional usage information.

  By default, TSO is enabled.
  To disable TSO::

      ethtool -K <interface> tso off

  To enable TSO::

      ethtool -K <interface> tso on

  To view the status of TSO::

      ethtool -k <interface>
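
To check TSO from a script instead of reading the full ``ethtool -k`` listing, the TSO line can be parsed. This minimal sketch assumes the line is labelled ``tcp-segmentation-offload:`` (current ethtool releases) or ``tcp segmentation offload:`` (older releases); other versions may format it differently, and the function name ``tso_state`` is hypothetical.

```shell
# Hypothetical helper: read `ethtool -k` output on stdin and print the TSO
# state ("on" or "off"). Returns non-zero if no TSO line is found.
tso_state() {
    awk -F': *' '
        # Match both the hyphenated and the space-separated label
        tolower($1) ~ /^tcp[- ]segmentation[- ]offload$/ {
            split($2, word, " ")   # drop suffixes such as "[fixed]"
            print word[1]
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    '
}
```

For example, ``ethtool -k eth0 | tso_state`` prints ``on`` when TSO is enabled.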


Performance
===========

 The following information is provided as an example of how to change system
 parameters for "performance tuning" and which values to use. You may or may
 not want to change these system parameters, depending on your
 server/workstation application. Doing so is not warranted in any way by
 Chelsio Communications and is done at "YOUR OWN RISK". Chelsio will not be
 held responsible for loss of data or damage to equipment.

 Your distribution may have a different way of doing things, or you may prefer
 a different method. These commands are shown only as examples and are by no
 means definitive.

 Changes made with the following commands last only until you reboot your
 system. You may want to write a script that runs at boot time and applies the
 optimal settings for your system.

  Setting PCI Latency Timer::

      setpci -d 1425:* 0x0c.l=0x0000F800
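
In the setpci command, ``1425`` is Chelsio's PCI vendor ID and ``*`` matches any device ID. ``0x0c.l`` writes a 32-bit value at configuration offset 0x0C; because PCI configuration space is little-endian, the low byte lands in the cache line size register (0x0C) and the next byte in the latency timer (0x0D). The hypothetical helper ``decode_cfg_dword`` below is a minimal sketch that only decodes a value; it does not touch hardware.

```shell
# Hypothetical helper: show which register bytes a setpci "0x0c.l=<value>"
# write produces (PCI config space is little-endian).
decode_cfg_dword() {
    # $(( )) accepts the 0x prefix, so pass the value as given to setpci
    value=$(( $1 ))
    low=$(( value & 0xff ))             # offset 0x0C: cache line size
    timer=$(( (value >> 8) & 0xff ))    # offset 0x0D: latency timer
    printf 'cache_line_size=0x%02x latency_timer=0x%02x (%d clocks)\n' \
        "$low" "$timer" "$timer"
}
```

For 0x0000F800 it reports a latency timer of 0xf8 (248 PCI clocks) and a cache line size of 0; the upper two bytes (header type and BIST) are written as zero. If you only want to change the latency timer, setpci also accepts register names, for example ``setpci -d 1425:* latency_timer=f8``; check setpci(8) on your system.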

  Disabling TCP timestamps::

      sysctl -w net.ipv4.tcp_timestamps=0

  Disabling SACK::

      sysctl -w net.ipv4.tcp_sack=0

  Increasing the backlog of pending incoming connection requests::

      sysctl -w net.ipv4.tcp_max_syn_backlog=3000

  Setting maximum receive socket buffer size::

      sysctl -w net.core.rmem_max=1024000

  Setting maximum send socket buffer size::

      sysctl -w net.core.wmem_max=1024000

  Setting smp_affinity to a single CPU (on a multiprocessor system)::

      echo 1 > /proc/irq/<interrupt_number>/smp_affinity
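
The value written to smp_affinity is a hexadecimal bitmask in which bit N selects CPU N, so ``1`` means CPU 0. A minimal sketch of computing the mask for another CPU follows; ``cpu_mask`` is a hypothetical helper and deliberately handles only CPUs 0-31, because larger masks are written as comma-separated 32-bit groups. Newer kernels also provide ``/proc/irq/<interrupt_number>/smp_affinity_list``, which accepts a CPU number directly.

```shell
# Hypothetical helper: print the smp_affinity mask that selects one CPU.
cpu_mask() {
    cpu=$1
    # Accept a plain decimal CPU number without leading zeros
    case $cpu in
        ''|*[!0-9]*|0[0-9]*)
            echo "cpu_mask: invalid CPU number '$cpu'" >&2
            return 1
            ;;
    esac
    # Larger masks use comma-separated 32-bit groups; not handled here
    if [ "$cpu" -gt 31 ]; then
        echo "cpu_mask: this sketch only handles CPUs 0-31" >&2
        return 1
    fi
    # Set bit <cpu> and print the mask in hex without a 0x prefix
    printf '%x\n' $(( 1 << cpu ))
}
```

As root, ``cpu_mask 2 > /proc/irq/<interrupt_number>/smp_affinity`` binds the interrupt to CPU 2.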

  Setting default receive socket buffer size::

      sysctl -w net.core.rmem_default=524287

  Setting default send socket buffer size::

      sysctl -w net.core.wmem_default=524287

  Setting maximum ancillary (option) memory buffer size per socket::

      sysctl -w net.core.optmem_max=524287

  Setting maximum input backlog (packets queued before the kernel starts
  dropping them)::

      sysctl -w net.core.netdev_max_backlog=300000

  Setting TCP read buffers (min/default/max)::

      sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"

  Setting TCP write buffers (min/default/max)::

      sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"

  Setting TCP memory limits (min/pressure/max, in pages)::

      sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
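
Because these settings are lost at reboot, they can be reapplied by a boot-time script. The following is a minimal sketch that replays the sysctl values listed above; the function name ``apply_tuning`` and the ``SYSCTL`` dry-run override are hypothetical, and the sketch does not cover the setpci or smp_affinity steps. On many distributions, putting the same ``key=value`` lines in /etc/sysctl.conf achieves the same result without a script.

```shell
# Hypothetical boot-time helper: apply the example sysctl values from this
# section. SYSCTL defaults to "sysctl"; SYSCTL=echo gives a dry run.
apply_tuning() {
    sysctl_cmd=${SYSCTL:-sysctl}
    status=0
    while read -r setting; do
        # Skip blank lines and comments in the settings list
        case $setting in
            ''|'#'*) continue ;;
        esac
        # Quote the whole key=value pair so multi-value settings stay intact
        "$sysctl_cmd" -w "$setting" || status=1
    done <<'EOF'
# key=value pairs taken from the examples in this section
net.ipv4.tcp_timestamps=0
net.ipv4.tcp_sack=0
net.ipv4.tcp_max_syn_backlog=3000
net.core.rmem_max=1024000
net.core.wmem_max=1024000
net.core.rmem_default=524287
net.core.wmem_default=524287
net.core.optmem_max=524287
net.core.netdev_max_backlog=300000
net.ipv4.tcp_rmem=10000000 10000000 10000000
net.ipv4.tcp_wmem=10000000 10000000 10000000
net.ipv4.tcp_mem=10000000 10000000 10000000
EOF
    return "$status"
}
```

The function returns non-zero if any individual setting fails, so a boot script can log the failure and continue.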

  TCP window size for single connections:

   The receive buffer (RX_WINDOW) size must be at least as large as the
   Bandwidth-Delay Product of the communication link between the sender and
   receiver. Because the RTT varies, you may want to increase the buffer size
   up to twice the Bandwidth-Delay Product. See page 289 of "TCP/IP
   Illustrated, Volume 1, The Protocols" by W. Richard Stevens.

   At 10Gb speeds (1.25 GBytes/s, or 1.25 MBytes per millisecond), use the
   following formula::

       RX_WINDOW >= 1.25MBytes * RTT(in milliseconds)
       Example for an RTT of 100us: RX_WINDOW = (1,250,000 * 0.1) = 125,000 bytes

   RX_WINDOW sizes of 256KB - 512KB should be sufficient.

   Setting the min, default, and max receive buffer (RX_WINDOW) sizes::

       sysctl -w net.ipv4.tcp_rmem="<min> <default> <max>"
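
The formula can be checked with integer arithmetic: 10 Gb/s is 1,250,000,000 bytes/s, or 1,250 bytes per microsecond, so the Bandwidth-Delay Product in bytes is 1250 * RTT (in whole microseconds). This minimal sketch assumes a 10Gb link; ``rx_window_bytes`` is a hypothetical helper whose optional second argument is the headroom factor (up to 2) mentioned above.

```shell
# Hypothetical helper: RX_WINDOW in bytes for a 10Gb link.
# $1 = RTT in whole microseconds, $2 = optional headroom factor (default 1)
rx_window_bytes() {
    rtt_us=$1
    factor=${2:-1}
    # 10 Gbit/s = 1,250,000,000 bytes/s = 1,250 bytes per microsecond
    bdp=$(( 1250 * rtt_us ))
    # Optional headroom for RTT variation
    echo $(( bdp * factor ))
}
```

``rx_window_bytes 100`` gives 125000, matching the example above; ``rx_window_bytes 100 2`` gives 250000 bytes, close to the 256KB lower end of the suggested range.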

  TCP window size for multiple connections:

   The receive buffer (RX_WINDOW) size may be calculated as for a single
   connection and then divided by the number of connections. The smaller
   window prevents congestion and facilitates better pacing, especially when
   MAC-level flow control does not work well or is not supported on the
   machine. Experimentation may be necessary to find the right value; this
   method is only a starting point for choosing the receive buffer size.

   Setting the min, default, and max receive buffer (RX_WINDOW) sizes is
   done in the same way as for a single connection.
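
A minimal sketch of the per-connection calculation, again assuming a 10Gb link and an RTT in whole microseconds; ``per_conn_window`` is a hypothetical helper. Setting min, default, and max to the same value, as in the usage below, mirrors the tcp_rmem example earlier in this section and is only one possible choice.

```shell
# Hypothetical helper: per-connection RX_WINDOW in bytes for a 10Gb link.
# $1 = RTT in whole microseconds, $2 = number of concurrent connections
per_conn_window() {
    rtt_us=$1
    conns=$2
    if [ "$conns" -lt 1 ]; then
        echo "per_conn_window: need at least one connection" >&2
        return 1
    fi
    window=$(( 1250 * rtt_us ))   # single-connection BDP at 10Gb, in bytes
    echo $(( window / conns ))    # integer share for each connection
}
```

For an RTT of 400us shared by 4 connections, ``w=$(per_conn_window 400 4)`` yields 125000, which can then be applied with ``sysctl -w net.ipv4.tcp_rmem="$w $w $w"``.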


Driver Messages
===============

 The following are the most common driver messages logged by syslog. They
 can be found in /var/log/messages.

  Driver up::

     Chelsio Network Driver - version 2.1.1

  NIC detected::

     eth#: Chelsio N210 1x10GBaseX NIC (rev #), PCIX 133MHz/64-bit

  Link up::

     eth#: link is up at 10 Gbps, full duplex

  Link down::

     eth#: link is down
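
To check the most recent link transition from a script, the "Link up" and "Link down" messages above can be searched for in the log. A minimal sketch, assuming the messages appear in /var/log/messages in the formats shown, with ``eth#`` replaced by the real interface name; ``last_link_state`` is a hypothetical helper.

```shell
# Hypothetical helper: print "up" or "down" for the last link message logged
# for one interface. $1 = interface, $2 = log file (default /var/log/messages)
last_link_state() {
    dev=$1
    log=${2:-/var/log/messages}
    awk -v dev="$dev" '
        {
            # Prefix a space so a match at the start of the line also counts,
            # and require a space before the name so "veth0" is not "eth0".
            line = " " $0
            if (index(line, " " dev ": link is up"))
                state = "up"
            else if (index(line, " " dev ": link is down"))
                state = "down"
        }
        END {
            if (state == "") exit 1
            print state
        }
    ' "$log"
}
```

``last_link_state eth0`` returns a non-zero status if no link message for eth0 is found.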


Known Issues
============

 These issues have been identified during testing. A workaround is provided
 for each one. In some cases, the problem is inherent to Linux or to a
 particular Linux distribution and/or hardware platform.

  1. Large number of TCP retransmits on a multiprocessor (SMP) system.

      On a system with multiple CPUs, the interrupt (IRQ) for the network
      controller may be bound to more than one CPU. This causes TCP
      retransmits when packet data is split across different CPUs and
      reassembled in a different order than expected.

      To eliminate the TCP retransmits, set smp_affinity on the particular
      interrupt to a single CPU. You can locate the interrupt (IRQ) used by
      the N110/N210 with ifconfig::

          ifconfig <dev_name> | grep Interrupt

      Set the smp_affinity to a single CPU::

          echo 1 > /proc/irq/<interrupt_number>/smp_affinity

      It is strongly recommended that you do not run the irqbalance daemon on
      your system, as it will change any smp_affinity setting you have
      applied. The irqbalance daemon runs at a 10-second interval and binds
      interrupts to the CPU it determines to be least loaded. To disable this
      daemon::

          chkconfig --level 2345 irqbalance off

      By default, some Linux distributions enable the kernel irqbalance
      feature, which performs the same function as the daemon. To disable
      this feature, add the following parameter to the kernel command line
      in your bootloader configuration::

          noirqbalance

      Example using the GRUB bootloader::

          title Red Hat Enterprise Linux AS (2.4.21-27.ELsmp)
          root (hd0,0)
          kernel /vmlinuz-2.4.21-27.ELsmp ro root=/dev/hda3 noirqbalance
          initrd /initrd-2.4.21-27.ELsmp.img

  2. After running insmod, the driver is loaded and the incorrect network
     interface is brought up without running ifup.

      When using 2.4.x kernels, including RHEL kernels, the Linux kernel
      invokes a script named "hotplug". This script is primarily used to
      bring up USB devices automatically when they are plugged in; however,
      it also attempts to bring up a network interface automatically after
      the kernel module is loaded. The hotplug script does this by scanning
      the ifcfg-eth# config files in /etc/sysconfig/network-scripts, looking
      for HWADDR=<mac_address>.

      If the hotplug script does not find the HWADDR within any of the
      ifcfg-eth# files, it brings up the device with the next available
      interface name. If this interface is already configured for a
      different network card, your new interface will have an incorrect IP
      address and incorrect network settings.

      To solve this issue, add the HWADDR=<mac_address> key to the interface
      config file of your network controller.

      To disable this "hotplug" feature, you could add the driver (module
      name) to the "blacklist" file located in /etc/hotplug; however, this
      has been found not to work for network devices because the net.agent
      script does not use the blacklist file. Instead, remove or rename the
      net.agent script located in /etc/hotplug to disable this feature.

  3. Transport Protocol (TP) hangs when running heavy multi-connection traffic
     on an AMD Opteron system with a HyperTransport PCI-X Tunnel chipset.

      If your AMD Opteron system uses the AMD-8131 HyperTransport PCI-X Tunnel
      chipset, you may experience the "133-MHz Mode Split Completion Data
      Corruption" bug identified by AMD while using a 133MHz PCI-X card on the
      PCI-X bus.

      AMD states, "Under highly specific conditions, the AMD-8131 PCI-X Tunnel
      can provide stale data via split completion cycles to a PCI-X card that
      is operating at 133 MHz", causing data corruption.

      AMD provides three workarounds for this problem; however, Chelsio
      recommends the first option for best performance with this bug:

        For 133MHz secondary bus operation, limit the transaction length and
        the number of outstanding transactions, via BIOS configuration
        programming of the PCI-X card, to the following:

           Data Length (bytes): 1k

           Total allowed outstanding transactions: 2

      Please refer to AMD 8131-HT/PCI-X Errata 26310 Rev 3.08 August 2004,
      section 56, "133-MHz Mode Split Completion Data Corruption" for more
      details on this bug and the workarounds suggested by AMD.

      It may be possible to work outside AMD's recommended PCI-X settings;
      try increasing the Data Length to 2k bytes for better performance. If
      you have issues with these settings, revert to the "safe" settings and
      reproduce the problem before submitting a bug report or asking for
      support.

      .. note::

            The default setting on most systems is 8 outstanding transactions
            and a data length of 2k bytes.

  4. On multiprocessor systems, it has been noted that an application which
     is handling 10Gb networking can migrate between CPUs, causing degraded
     and/or unstable performance.

      If you are taking performance measurements on an SMP system, it is
      suggested that you either run the latest netperf (2.4.0 or later) or
      use a binding tool such as Tim Hockin's procstate utilities (runon)
      <http://www.hockin.org/~thockin/procstate/>.

      Binding netserver and netperf (or other applications) to particular
      CPUs can make a significant difference in performance measurements.
      You may need to experiment to find which CPU to bind the application
      to in order to achieve the best performance for your system.

      If you are developing an application designed for 10Gb networking,
      consider using the sched_setaffinity() and sched_getaffinity() system
      calls to bind your application to particular CPUs.

      If you are just running user-space applications such as ftp, telnet,
      etc., you may want to try the runon tool provided by Tim Hockin's
      procstate utility. You could also try binding the interface to a
      particular CPU::

          runon 0 ifup eth0
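
Issue 1 locates the IRQ with ``ifconfig``, but newer ifconfig versions may not print an Interrupt line. A minimal sketch of reading the IRQ number from /proc/interrupts instead; ``irq_for_name`` is a hypothetical helper, and it assumes the name the driver registered for its interrupt appears as its own field on the line. That name may be the interface name (such as eth0) or something else, such as a PCI address, so check /proc/interrupts and pass whatever name it shows.

```shell
# Hypothetical helper: print the IRQ number(s) whose /proc/interrupts line
# lists the given name. $1 = name, $2 = file (default /proc/interrupts)
irq_for_name() {
    name=$1
    file=${2:-/proc/interrupts}
    awk -v want="$name" '
        # Only lines that start with a numeric IRQ such as "24:"
        $1 ~ /^[0-9]+:$/ {
            for (i = 2; i <= NF; i++) {
                field = $i
                sub(/,$/, "", field)    # shared IRQs list "eth1, eth2"
                if (field == want) {
                    irq = $1
                    sub(/:$/, "", irq)
                    print irq
                    found = 1
                    break
                }
            }
        }
        END { if (!found) exit 1 }
    ' "$file"
}
```

For example, ``irq=$(irq_for_name eth0)`` followed, as root, by ``echo 1 > /proc/irq/$irq/smp_affinity``. If the name appears on several IRQ lines, the function prints one number per line.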


Support
=======

 If you have problems with the software or hardware, please contact our
 customer support team via email at support@chelsio.com or check our website
 at http://www.chelsio.com

-------------------------------------------------------------------------------

::

 Chelsio Communications
 370 San Aleso Ave.
 Suite 100
 Sunnyvale, CA 94085
 http://www.chelsio.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2, as
published by the Free Software Foundation.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

THIS SOFTWARE IS PROVIDED ``AS IS`` AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

Copyright |copy| 2003-2005 Chelsio Communications. All rights reserved.