======================================================
Net DIM - Generic Network Dynamic Interrupt Moderation
======================================================

:Author: Tal Gilboa <talgi@mellanox.com>

.. contents:: :depth: 2

Assumptions
===========

This document assumes the reader has basic knowledge of network drivers
and of interrupt moderation in general.


Introduction
============

In networking, Dynamic Interrupt Moderation (DIM) refers to changing the
interrupt moderation configuration of a channel in order to optimize packet
processing. The mechanism includes an algorithm which decides if and how to
change moderation parameters for a channel, usually by performing an analysis on
runtime data sampled from the system. Net DIM is such a mechanism. In each
iteration of the algorithm, it analyses a given sample of the data, compares it
to the previous sample and, if required, changes some of the interrupt
moderation configuration fields. The data sample is composed of data
bandwidth, the number of packets and the number of events. The time between
samples is also measured. Net DIM compares the current and the previous data and
returns an adjusted interrupt moderation configuration object.
In some cases, the algorithm might decide not to change anything. The
configuration fields are the minimum duration (in microseconds) allowed between
events and the maximum number of packets wanted per event. The Net DIM
algorithm prioritizes increasing bandwidth over reducing the interrupt rate.


Net DIM Algorithm
=================

Each iteration of the Net DIM algorithm follows these steps:

#. Calculates a new data sample.
#. Compares it to the previous sample.
#. Makes a decision - suggests interrupt moderation configuration fields.
#. Schedules a work function, which applies the suggested configuration.

The first two steps are straightforward: both the new and the previous data are
supplied by the driver registered to Net DIM. The previous data is the new data
supplied to the previous iteration. The comparison step checks the difference
between the new and previous data and rates the outcome of the previous step.
A step is rated "better" if bandwidth increased and "worse" if bandwidth
decreased. If there is no change in bandwidth, the packet rate is
compared in a similar fashion - increase == "better" and decrease == "worse".
If the packet rate has not changed either, the interrupt rate is compared.
Here the algorithm tries to optimize for a lower interrupt rate, so an
increase in the interrupt rate is considered "worse" and a decrease is
considered "better". Step #2 has an optimization for avoiding false results: it
only considers a difference between samples as valid if it is greater than a
certain percentage. Also, since Net DIM does not measure anything by itself, it
assumes the data provided by the driver is valid.

Step #3 decides on the suggested configuration based on the result from step #2
and the internal state of the algorithm. The states reflect the "direction" of
the algorithm: is it going left (reducing moderation), right (increasing
moderation) or standing still. Another optimization is that if a decision
to stay still is made multiple times, the interval between iterations of the
algorithm increases in order to reduce calculation overhead. Also, after
"parking" on one of the leftmost or rightmost decisions, the algorithm may
decide to verify this decision by taking a step in the other direction. This is
done in order to avoid getting stuck in a "deep sleep" scenario. Once a
decision is made, an interrupt moderation configuration is selected from
the predefined profiles.

The last step is to notify the registered driver that it should apply the
suggested configuration. This is done by scheduling a work function, defined by
the Net DIM API and provided by the registered driver.
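
The comparison order used in step #2 can be illustrated with a small user-space
sketch. This is not the kernel implementation: ``struct rates``,
``compare_rates()``, the 10% threshold and the handling of a previous sample
with no traffic are all assumptions made for illustration. It only shows
the priority described above: bandwidth first, then packet rate, then interrupt
rate, ignoring differences below a threshold.

.. code-block:: c

  #include <assert.h>
  #include <stdlib.h>

  /* Hypothetical per-interval rates derived from two raw samples */
  struct rates {
  	long bytes_per_ms;	/* bandwidth */
  	long packets_per_ms;	/* packet rate */
  	long events_per_ms;	/* interrupt rate */
  };

  enum cmp_result { CMP_SAME, CMP_BETTER, CMP_WORSE };

  /* Assumed threshold: differences of 10% or less are treated as noise */
  #define SIGNIFICANT_PERCENT 10

  /* True if val differs from ref by more than SIGNIFICANT_PERCENT of ref */
  static int significant_diff(long val, long ref)
  {
  	return ref != 0 &&
  	       (100 * labs(val - ref)) / ref > SIGNIFICANT_PERCENT;
  }

  static enum cmp_result compare_rates(const struct rates *cur,
  				     const struct rates *prev)
  {
  	assert(cur && prev);

  	/* Assumption of this sketch: no previous traffic, any traffic is better */
  	if (prev->bytes_per_ms == 0)
  		return cur->bytes_per_ms ? CMP_BETTER : CMP_SAME;

  	/* Bandwidth has the highest priority: more is better */
  	if (significant_diff(cur->bytes_per_ms, prev->bytes_per_ms))
  		return cur->bytes_per_ms > prev->bytes_per_ms ?
  		       CMP_BETTER : CMP_WORSE;

  	/* Then the packet rate: more is better */
  	if (significant_diff(cur->packets_per_ms, prev->packets_per_ms))
  		return cur->packets_per_ms > prev->packets_per_ms ?
  		       CMP_BETTER : CMP_WORSE;

  	/* Finally the interrupt rate: fewer interrupts is better */
  	if (significant_diff(cur->events_per_ms, prev->events_per_ms))
  		return cur->events_per_ms < prev->events_per_ms ?
  		       CMP_BETTER : CMP_WORSE;

  	return CMP_SAME;
  }

With a previous sample of 1000 bytes/ms, a new sample of 1050 bytes/ms falls
under the assumed threshold, so the sketch moves on to compare the packet rate
rather than reporting "better".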

As you can see, Net DIM itself does not actively interact with the system. It
would have trouble making the correct decisions if the wrong data were supplied
to it, and it would be useless if the work function did not apply the suggested
configuration. This does, however, allow the registered driver some room for
manoeuvre, as it may provide partial data or ignore the algorithm's suggestion
under some conditions.


Registering a Network Device to DIM
===================================

The Net DIM API exposes the main function net_dim().
This function is the entry point to the Net
DIM algorithm and has to be called every time the driver would like to check if
it should change interrupt moderation parameters. The driver should provide two
data structures: :c:type:`struct dim <dim>` and
:c:type:`struct dim_sample <dim_sample>`. :c:type:`struct dim <dim>`
describes the state of DIM for a specific object (RX queue, TX queue,
other queues, etc.). This includes the currently selected profile, previous data
samples, the callback function provided by the driver and more.
:c:type:`struct dim_sample <dim_sample>` describes a data sample,
which will be compared to the data sample stored in :c:type:`struct dim <dim>`
in order to decide on the algorithm's next
step. The sample should include bytes, packets and interrupts, measured by
the driver.
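
To show why raw counters are enough, the following user-space sketch turns two
raw samples into the per-millisecond rates that a comparison can work on. It is
an illustration only, not the kernel code: ``struct raw_sample``,
``struct sample_rates`` and ``calc_rates()`` are hypothetical names, and the
rounding used by the real implementation may differ.

.. code-block:: c

  #include <assert.h>
  #include <stdint.h>

  /* Hypothetical raw sample: cumulative counters plus a timestamp */
  struct raw_sample {
  	uint64_t time_us;	/* time the sample was taken, in microseconds */
  	uint64_t bytes;
  	uint64_t packets;
  	uint64_t events;	/* interrupts */
  };

  /* Hypothetical rates derived from two raw samples */
  struct sample_rates {
  	uint64_t bytes_per_ms;
  	uint64_t packets_per_ms;
  	uint64_t events_per_ms;
  };

  /* Returns 0 on success, -1 if no time elapsed between the samples */
  static int calc_rates(const struct raw_sample *start,
  		      const struct raw_sample *end,
  		      struct sample_rates *out)
  {
  	uint64_t delta_us;

  	assert(start && end && out);

  	if (end->time_us <= start->time_us)
  		return -1;

  	delta_us = end->time_us - start->time_us;

  	/* Counter difference scaled from per-microsecond to per-millisecond */
  	out->bytes_per_ms = (end->bytes - start->bytes) * 1000 / delta_us;
  	out->packets_per_ms = (end->packets - start->packets) * 1000 / delta_us;
  	out->events_per_ms = (end->events - start->events) * 1000 / delta_us;

  	return 0;
  }

The driver side stays simple in this model: it only records counters, and all
rate calculations happen on the receiving side of the sample.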

In order to use Net DIM from a networking driver, the driver needs to call the
main net_dim() function. The recommended method is to call net_dim() on each
interrupt. Since Net DIM has built-in moderation and might decide to skip
iterations under certain conditions, there is no need to moderate the net_dim()
calls as well. As mentioned above, the driver needs to provide an object of type
:c:type:`struct dim <dim>` to the net_dim() function call. It is advised that
each entity using Net DIM hold a :c:type:`struct dim <dim>` as part of its
data structure and use it as the main Net DIM API object.
The :c:type:`struct dim_sample <dim_sample>` should hold the latest
bytes, packets and interrupts count. There is no need to perform any
calculations; just include the raw data.

The net_dim() call itself does not return anything. Instead, Net DIM relies on
the driver to provide a callback function, which is called when the algorithm
decides to make a change in the interrupt moderation parameters. This callback
will be scheduled and run in a separate thread in order not to add overhead to
the data flow. After the work is done, the Net DIM algorithm needs to be set to
the proper state in order to move to the next iteration.


Example
=======

The following code demonstrates how to register a driver to Net DIM.
The example is not complete, but it should make the outline of the usage clear.

.. code-block:: c

  #include <linux/dim.h>

  /* Callback for net DIM to schedule on a decision to change moderation */
  void my_driver_do_dim_work(struct work_struct *work)
  {
  	/* Get struct dim from struct work_struct */
  	struct dim *dim = container_of(work, struct dim, work);
  	/* Do interrupt moderation related stuff */
  	...

  	/* Signal net DIM work is done and it should move to next iteration */
  	dim->state = DIM_START_MEASURE;
  }

  /* My driver's interrupt handler */
  int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...)
  {
  	...
  	/* A struct to hold current measured data */
  	struct dim_sample dim_sample;
  	...
  	/* Initialize data sample struct with current data */
  	dim_update_sample(my_entity->events,
  			  my_entity->packets,
  			  my_entity->bytes,
  			  &dim_sample);
  	/* Call net DIM */
  	net_dim(&my_entity->dim, dim_sample);
  	...
  }

  /* My entity's initialization function (my_entity was already allocated) */
  int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...)
  {
  	...
  	/* Initialize struct work_struct with my driver's callback function */
  	INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work);
  	...
  }

Dynamic Interrupt Moderation (DIM) library API
==============================================

.. kernel-doc:: include/linux/dim.h
    :internal: