.. SPDX-License-Identifier: GPL-2.0

=======================================
The padata parallel execution mechanism
=======================================

:Date: May 2020

Padata is a mechanism by which the kernel can farm jobs out to be done in
parallel on multiple CPUs while optionally retaining their ordering.

It was originally developed for IPsec, which needs to perform encryption and
decryption on large numbers of packets without reordering those packets.  This
is currently the sole consumer of padata's serialized job support.

Padata also supports multithreaded jobs, splitting up the job evenly while load
balancing and coordinating between threads.

Running Serialized Jobs
=======================

Initializing
------------

The first step in using padata to run serialized jobs is to set up a
padata_instance structure for overall control of how jobs are to be run::

    #include <linux/padata.h>

    struct padata_instance *padata_alloc(const char *name);

'name' simply identifies the instance.


Then, complete padata initialization by allocating a padata_shell::

    struct padata_shell *padata_alloc_shell(struct padata_instance *pinst);

A padata_shell is used to submit a job to padata and allows a series of such
jobs to be serialized independently.  A padata_instance may have one or more
padata_shells associated with it, each allowing a separate series of jobs.

Modifying cpumasks
------------------

The CPUs used to run jobs can be changed in two ways: programmatically with
padata_set_cpumask() or via sysfs.  The former is defined::

    int padata_set_cpumask(struct padata_instance *pinst, int cpumask_type,
                           cpumask_var_t cpumask);

Here cpumask_type is one of PADATA_CPU_PARALLEL or PADATA_CPU_SERIAL, where a
parallel cpumask describes which processors will be used to execute jobs
submitted to this instance in parallel and a serial cpumask defines which
processors are allowed to be used as the serialization callback processor.
cpumask specifies the new cpumask to use.

There may be sysfs files for an instance's cpumasks.  For example, pcrypt's
live in /sys/kernel/pcrypt/<instance-name>.  Within an instance's directory
there are two files, parallel_cpumask and serial_cpumask, and either cpumask
may be changed by echoing a bitmask into the file, for example::

    echo f > /sys/kernel/pcrypt/pencrypt/parallel_cpumask

Reading one of these files shows the user-supplied cpumask, which may be
different from the 'usable' cpumask.

Padata maintains two pairs of cpumasks internally, the user-supplied cpumasks
and the 'usable' cpumasks.  (Each pair consists of a parallel and a serial
cpumask.)  The user-supplied cpumasks default to all possible CPUs on instance
allocation and may be changed as above.  The usable cpumasks are always a
subset of the user-supplied cpumasks and contain only the online CPUs in the
user-supplied masks; these are the cpumasks padata actually uses.  So it is
legal to supply a cpumask to padata that contains offline CPUs.  Once an
offline CPU in the user-supplied cpumask comes online, padata will start using
it.

Changing a CPU mask is an expensive operation, so it should not be done with
great frequency.

Running A Job
-------------

Actually submitting work to the padata instance requires the creation of a
padata_priv structure, which represents one job::

    struct padata_priv {
        /* Other stuff here... */
        void (*parallel)(struct padata_priv *padata);
        void (*serial)(struct padata_priv *padata);
    };

This structure will almost certainly be embedded within some larger
structure specific to the work to be done.  Most of its fields are private to
padata, but the structure should be zeroed at initialisation time, and the
parallel() and serial() functions should be provided.  Those functions will
be called in the process of getting the work done as we will see
momentarily.

The submission of the job is done with::

    int padata_do_parallel(struct padata_shell *ps,
                           struct padata_priv *padata, int *cb_cpu);

The ps and padata structures must be set up as described above; cb_cpu
points to the preferred CPU to be used for the final callback when the job is
done; it must be in the current instance's CPU mask (if not, the cb_cpu pointer
is updated to point to the CPU actually chosen).  The return value from
padata_do_parallel() is zero on success, indicating that the job is in
progress.  -EBUSY means that somebody, somewhere else is messing with the
instance's CPU mask, while -EINVAL is a complaint about cb_cpu not being in the
serial cpumask, no online CPUs in the parallel or serial cpumasks, or a stopped
instance.
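
As a rough illustration of the embedding described above, the following
user-space sketch models a job structure that contains a padata_priv.  It is
not kernel code: the ``struct padata_priv`` here is a stand-in holding only the
two callbacks, ``container_of()`` is a simplified version of the kernel macro,
and ``struct demo_job`` and the ``demo_*`` functions are hypothetical names
chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel structure; only the two callbacks are modelled. */
struct padata_priv {
    void (*parallel)(struct padata_priv *padata);
    void (*serial)(struct padata_priv *padata);
};

/* Simplified container_of(); the kernel macro adds type checking. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical job: work-specific fields plus the embedded padata_priv. */
struct demo_job {
    int input;
    int result;
    int serialized;
    struct padata_priv padata;
};

/* Parallel step: recover the enclosing job and do the work. */
static void demo_parallel(struct padata_priv *padata)
{
    struct demo_job *job = container_of(padata, struct demo_job, padata);

    job->result = job->input * 2;
}

/* Serial step: recover the enclosing job and mark it as finalized. */
static void demo_serial(struct padata_priv *padata)
{
    struct demo_job *job = container_of(padata, struct demo_job, padata);

    job->serialized = 1;
}

/* Zero the whole structure first, then provide both callbacks. */
static void demo_job_init(struct demo_job *job, int input)
{
    memset(job, 0, sizeof(*job));
    job->input = input;
    job->padata.parallel = demo_parallel;
    job->padata.serial = demo_serial;
}
```

In a real user, a pointer to the embedded ``padata`` member is what would be
handed to padata_do_parallel(), and padata, not the caller, would invoke the
two callbacks.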

Each job submitted to padata_do_parallel() will, in turn, be passed to
exactly one call to the above-mentioned parallel() function, on one CPU, so
true parallelism is achieved by submitting multiple jobs.  parallel() runs with
software interrupts disabled and thus cannot sleep.  The parallel()
function gets the padata_priv structure pointer as its lone parameter;
information about the actual work to be done is probably obtained by using
container_of() to find the enclosing structure.

Note that parallel() has no return value; the padata subsystem assumes that
parallel() will take responsibility for the job from this point.  The job
need not be completed during this call, but, if parallel() leaves work
outstanding, it should be prepared to be called again with a new job before
the previous one completes.

Serializing Jobs
----------------

When a job does complete, parallel() (or whatever function actually finishes
the work) should inform padata of the fact with a call to::

    void padata_do_serial(struct padata_priv *padata);

At some point in the future, padata_do_serial() will trigger a call to the
serial() function in the padata_priv structure.  That call will happen on
the CPU requested in the initial call to padata_do_parallel(); it, too, is
run with local software interrupts disabled.
Note that this call may be deferred for a while since the padata code takes
pains to ensure that jobs are completed in the order in which they were
submitted.

Destroying
----------

Cleaning up a padata instance predictably involves calling the two free
functions that correspond to the allocation in reverse::

    void padata_free_shell(struct padata_shell *ps);
    void padata_free(struct padata_instance *pinst);

It is the user's responsibility to ensure all outstanding jobs are complete
before any of the above are called.

Running Multithreaded Jobs
==========================

A multithreaded job has a main thread and zero or more helper threads, with the
main thread participating in the job and then waiting until all helpers have
finished.  padata splits the job into units called chunks, where a chunk is a
piece of the job that one thread completes in one call to the thread function.

A user has to do three things to run a multithreaded job.  First, describe the
job by defining a padata_mt_job structure, which is explained in the Interface
section.  This includes a pointer to the thread function, which padata will
call each time it assigns a job chunk to a thread.  Then, define the thread
function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
the first two delimit the range that the thread operates on and the last is a
pointer to the job's shared state, if any.  Prepare the shared state, which is
typically allocated on the main thread's stack.  Last, call
padata_do_multithreaded(), which will return once the job is finished.

Interface
=========

.. kernel-doc:: include/linux/padata.h
.. kernel-doc:: kernel/padata.c