======================================================
Net DIM - Generic Network Dynamic Interrupt Moderation
======================================================

:Author: Tal Gilboa <talgi@mellanox.com>

.. contents:: :depth: 2

Assumptions
===========

This document assumes the reader has basic knowledge of network drivers
and of interrupt moderation in general.


Introduction
============

Dynamic Interrupt Moderation (DIM), in networking, refers to changing the
interrupt moderation configuration of a channel in order to optimize packet
processing. The mechanism includes an algorithm which decides if and how to
change moderation parameters for a channel, usually by analysing runtime data
sampled from the system. Net DIM is such a mechanism. In each iteration of the
algorithm, it analyses a given data sample, compares it to the previous sample
and, if required, changes some of the interrupt moderation configuration
fields. A data sample is composed of data bandwidth, the number of packets and
the number of events. The time between samples is also measured. Net DIM
compares the current and the previous data and returns an adjusted interrupt
moderation configuration object. In some cases, the algorithm might decide not
to change anything. The configuration fields are the minimum duration
(in microseconds) allowed between events and the maximum number of wanted
packets per event. The Net DIM algorithm gives priority to increasing bandwidth
over reducing the interrupt rate.
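
To make the shape of a data sample concrete, the following userspace sketch
derives bandwidth, packet rate and event rate from two raw snapshots and the
time between them. It is not the kernel implementation: the ``sketch_*`` names,
the 64-bit counters and the per-millisecond units are assumptions made for
illustration only.

.. code-block:: c

  #include <stdint.h>

  /* Hypothetical raw snapshot; not the kernel's struct dim_sample */
  struct sketch_snapshot {
	uint64_t time_us;	/* timestamp of the snapshot, microseconds */
	uint64_t bytes;		/* running byte counter */
	uint64_t packets;	/* running packet counter */
	uint64_t events;	/* running interrupt (event) counter */
  };

  /* Rates derived from two snapshots (assumed unit: per millisecond) */
  struct sketch_rates {
	uint64_t bytes_per_ms;
	uint64_t packets_per_ms;
	uint64_t events_per_ms;
  };

  /* The two configuration fields named in the text above */
  struct sketch_moder {
	uint16_t usec;	/* minimum duration allowed between events */
	uint16_t pkts;	/* maximum number of wanted packets per event */
  };

  /* Returns 0 on success, -1 if no time elapsed between the snapshots */
  int sketch_calc_rates(const struct sketch_snapshot *start,
			const struct sketch_snapshot *end,
			struct sketch_rates *out)
  {
	uint64_t delta_us = end->time_us - start->time_us;

	if (delta_us == 0)
		return -1;

	out->bytes_per_ms = (end->bytes - start->bytes) * 1000 / delta_us;
	out->packets_per_ms = (end->packets - start->packets) * 1000 / delta_us;
	out->events_per_ms = (end->events - start->events) * 1000 / delta_us;
	return 0;
  }

In this sketch, 3000000 bytes, 2000 packets and 10 events measured over 2000
microseconds yield 1500000 bytes, 1000 packets and 5 events per millisecond.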


Net DIM Algorithm
=================

Each iteration of the Net DIM algorithm follows these steps:

#. Calculates a new data sample.
#. Compares it to the previous sample.
#. Makes a decision - suggests interrupt moderation configuration fields.
#. Schedules a work function, which applies the suggested configuration.

The first two steps are straightforward: both the new and the previous data are
supplied by the driver registered to Net DIM. The previous data is the new data
supplied to the previous iteration. The comparison step checks the difference
between the new and previous data and judges the outcome of the last step the
algorithm took. A step is rated "better" if bandwidth increases and "worse" if
bandwidth decreases. If there is no change in bandwidth, the packet rate is
compared in a similar fashion: an increase is "better" and a decrease is
"worse". If there is no change in the packet rate either, the interrupt rate is
compared. Here the algorithm tries to optimize for a lower interrupt rate, so an
increase in the interrupt rate is considered "worse" and a decrease is
considered "better". Step #2 has an optimization for avoiding false results: it
only considers a difference between samples as valid if it is greater than a
certain percentage. Also, since Net DIM does not measure anything by itself, it
assumes the data provided by the driver is valid.
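
The comparison order described above can be sketched in userspace C as follows.
This is an illustration, not the kernel code: the ``sketch_*`` names are
hypothetical, and the 10 percent threshold is an assumed value, since this
document only says "a certain percentage".

.. code-block:: c

  #include <stdint.h>

  enum sketch_result { SKETCH_SAME, SKETCH_BETTER, SKETCH_WORSE };

  /* Hypothetical per-sample rates (assumed unit: per millisecond) */
  struct sketch_rates {
	uint64_t bytes_per_ms;
	uint64_t packets_per_ms;
	uint64_t events_per_ms;
  };

  /* Assumed threshold; smaller differences are treated as noise */
  #define SKETCH_MIN_DIFF_PERCENT 10

  static int sketch_significant(uint64_t cur, uint64_t prev)
  {
	uint64_t diff = cur > prev ? cur - prev : prev - cur;

	if (prev == 0)
		return cur != 0;
	return diff * 100 / prev > SKETCH_MIN_DIFF_PERCENT;
  }

  enum sketch_result sketch_compare(const struct sketch_rates *cur,
				    const struct sketch_rates *prev)
  {
	/* Bandwidth first: more is better */
	if (sketch_significant(cur->bytes_per_ms, prev->bytes_per_ms))
		return cur->bytes_per_ms > prev->bytes_per_ms ?
		       SKETCH_BETTER : SKETCH_WORSE;

	/* Then packet rate: more is better */
	if (sketch_significant(cur->packets_per_ms, prev->packets_per_ms))
		return cur->packets_per_ms > prev->packets_per_ms ?
		       SKETCH_BETTER : SKETCH_WORSE;

	/* Finally interrupt rate: fewer is better */
	if (sketch_significant(cur->events_per_ms, prev->events_per_ms))
		return cur->events_per_ms < prev->events_per_ms ?
		       SKETCH_BETTER : SKETCH_WORSE;

	return SKETCH_SAME;
  }

With these assumptions, a 5 percent change in bandwidth is ignored and the
comparison falls through to the packet rate and then to the interrupt rate.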

Step #3 decides on the suggested configuration based on the result from step #2
and the internal state of the algorithm. The states reflect the "direction" of
the algorithm: is it going left (reducing moderation), right (increasing
moderation) or standing still. Another optimization is that if a decision
to stay still is made multiple times, the interval between iterations of the
algorithm is increased in order to reduce calculation overhead. Also, after
"parking" on the leftmost or rightmost decision, the algorithm may
decide to verify this decision by taking a step in the other direction. This is
done in order to avoid getting stuck in a "deep sleep" scenario. Once a
decision is made, an interrupt moderation configuration is selected from
the predefined profiles.
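
A much simplified model of the decision step is sketched below. It only covers
the left, right and standing-still directions and the selection of a profile. It
does not model the longer interval after repeated standing-still decisions or
the verification step after parking. All ``sketch_*`` names, the profile values
and the choice of direction when leaving the parked state are assumptions of
this sketch, not the kernel's state machine.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  enum sketch_result { SKETCH_SAME, SKETCH_BETTER, SKETCH_WORSE };
  enum sketch_dir { SKETCH_PARKED, SKETCH_LEFT, SKETCH_RIGHT };

  struct sketch_moder {
	uint16_t usec;	/* minimum duration allowed between events */
	uint16_t pkts;	/* maximum number of wanted packets per event */
  };

  /* Hypothetical profiles, ordered from least to most moderation */
  static const struct sketch_moder sketch_profiles[] = {
	{ 1, 64 }, { 8, 64 }, { 64, 64 }, { 128, 64 }, { 256, 64 },
  };
  #define SKETCH_NPROFILES \
	(sizeof(sketch_profiles) / sizeof(sketch_profiles[0]))

  struct sketch_state {
	enum sketch_dir dir;
	size_t profile_ix;
  };

  const struct sketch_moder *sketch_decide(struct sketch_state *st,
					   enum sketch_result res)
  {
	if (st->dir == SKETCH_PARKED) {
		/* Keep standing still while nothing changes */
		if (res == SKETCH_SAME)
			return &sketch_profiles[st->profile_ix];
		/* Something changed: start moving again (sketch's choice) */
		st->dir = st->profile_ix ? SKETCH_LEFT : SKETCH_RIGHT;
	} else if (res == SKETCH_SAME) {
		/* The last step had no measurable effect: stand still */
		st->dir = SKETCH_PARKED;
		return &sketch_profiles[st->profile_ix];
	} else if (res == SKETCH_WORSE) {
		/* The last step made things worse: turn around */
		st->dir = st->dir == SKETCH_LEFT ? SKETCH_RIGHT : SKETCH_LEFT;
	}

	/* Take one step in the current direction; park at either edge */
	if (st->dir == SKETCH_LEFT) {
		if (st->profile_ix == 0)
			st->dir = SKETCH_PARKED;
		else
			st->profile_ix--;
	} else {
		if (st->profile_ix == SKETCH_NPROFILES - 1)
			st->dir = SKETCH_PARKED;
		else
			st->profile_ix++;
	}
	return &sketch_profiles[st->profile_ix];
  }

Starting at the second profile while going right, a "better" result moves the
sketch one profile to the right, and a following "worse" result turns it around
and moves it back.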

The last step is to notify the registered driver that it should apply the
suggested configuration. This is done by scheduling a work function, defined by
the Net DIM API and provided by the registered driver.

As you can see, Net DIM itself does not actively interact with the system. It
would have trouble making the correct decisions if the wrong data is supplied to
it, and it would be useless if the work function did not apply the suggested
configuration. This does, however, allow the registered driver some room for
manoeuvre, as it may provide partial data or ignore the algorithm's suggestion
under some conditions.


Registering a Network Device to DIM
===================================

The Net DIM API exposes the main function net_dim(). This function is the entry
point to the Net DIM algorithm and has to be called every time the driver would
like to check if it should change interrupt moderation parameters. The driver
should provide two data structures: :c:type:`struct dim <dim>` and
:c:type:`struct dim_sample <dim_sample>`. :c:type:`struct dim <dim>`
describes the state of DIM for a specific object (RX queue, TX queue,
other queues, etc.). This includes the currently selected profile, previous data
samples, the callback function provided by the driver and more.
:c:type:`struct dim_sample <dim_sample>` describes a data sample,
which will be compared to the data sample stored in :c:type:`struct dim <dim>`
in order to decide on the algorithm's next step. The sample should include
bytes, packets and interrupts, measured by the driver.

In order to use Net DIM from a networking driver, the driver needs to call the
main net_dim() function. The recommended method is to call net_dim() on each
interrupt. Since Net DIM has built-in moderation and might decide to skip
iterations under certain conditions, there is no need to moderate the net_dim()
calls as well. As mentioned above, the driver needs to provide an object of type
:c:type:`struct dim <dim>` to the net_dim() function call. It is advised for
each entity using Net DIM to hold a :c:type:`struct dim <dim>` as part of its
data structure and use it as the main Net DIM API object.
The :c:type:`struct dim_sample <dim_sample>` should hold the latest
bytes, packets and interrupts count. There is no need to perform any
calculations; just include the raw data.
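
The following userspace sketch illustrates what "raw data" can look like on the
driver side, assuming the counters are running totals that are simply copied
into the sample on each interrupt. The ``sketch_*`` types and function are
hypothetical and stand in for a driver's own bookkeeping; they are not part of
the Net DIM API.

.. code-block:: c

  #include <stdint.h>

  /* Hypothetical per-queue bookkeeping kept by a driver */
  struct sketch_entity {
	uint64_t events;
	uint64_t packets;
	uint64_t bytes;
  };

  /* Hypothetical raw sample: copies of the counters, no rates */
  struct sketch_raw_sample {
	uint64_t events;
	uint64_t packets;
	uint64_t bytes;
  };

  /* Called once per simulated interrupt with the work it completed */
  void sketch_on_interrupt(struct sketch_entity *e, uint64_t npackets,
			   uint64_t nbytes, struct sketch_raw_sample *out)
  {
	/* Update the running totals */
	e->events++;
	e->packets += npackets;
	e->bytes += nbytes;

	/* Hand over the totals as they are; no per-second maths here */
	out->events = e->events;
	out->packets = e->packets;
	out->bytes = e->bytes;
  }

Any rate calculation is left to the consumer of the sample, which can compare
two such samples and the time between them.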

The net_dim() call itself does not return anything. Instead, Net DIM relies on
the driver to provide a callback function, which is called when the algorithm
decides to make a change in the interrupt moderation parameters. This callback
will be scheduled and run in a separate thread in order not to add overhead to
the data flow. After the work is done, the Net DIM algorithm needs to be set to
the proper state in order to move to the next iteration.


Example
=======

The following code demonstrates how to register a driver to Net DIM. The
example is not complete, but it should make the outline of the usage clear.

.. code-block:: c

  #include <linux/dim.h>

  /* Callback for net DIM to schedule on a decision to change moderation */
  void my_driver_do_dim_work(struct work_struct *work)
  {
	/* Get struct dim from struct work_struct */
	struct dim *dim = container_of(work, struct dim,
				       work);
	/* Do interrupt moderation related stuff */
	...

	/* Signal net DIM work is done and it should move to next iteration */
	dim->state = DIM_START_MEASURE;
  }

  /* My driver's interrupt handler */
  int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...)
  {
	...
	/* A struct to hold current measured data */
	struct dim_sample dim_sample;
	...
	/* Initialize data sample struct with current data */
	dim_update_sample(my_entity->events,
		          my_entity->packets,
		          my_entity->bytes,
		          &dim_sample);
	/* Call net DIM */
	net_dim(&my_entity->dim, dim_sample);
	...
  }

  /* My entity's initialization function (my_entity was already allocated) */
  int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...)
  {
	...
	/* Initialize struct work_struct with my driver's callback function */
	INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work);
	...
  }

Dynamic Interrupt Moderation (DIM) library API
==============================================

.. kernel-doc:: include/linux/dim.h
    :internal: