1cb7f6a7bSJulius Volz /* 2cb7f6a7bSJulius Volz * IPVS: Shortest Expected Delay scheduling module 3cb7f6a7bSJulius Volz * 4cb7f6a7bSJulius Volz * Authors: Wensong Zhang <wensong@linuxvirtualserver.org> 5cb7f6a7bSJulius Volz * 6cb7f6a7bSJulius Volz * This program is free software; you can redistribute it and/or 7cb7f6a7bSJulius Volz * modify it under the terms of the GNU General Public License 8cb7f6a7bSJulius Volz * as published by the Free Software Foundation; either version 9cb7f6a7bSJulius Volz * 2 of the License, or (at your option) any later version. 10cb7f6a7bSJulius Volz * 11cb7f6a7bSJulius Volz * Changes: 12cb7f6a7bSJulius Volz * 13cb7f6a7bSJulius Volz */ 14cb7f6a7bSJulius Volz 15cb7f6a7bSJulius Volz /* 16cb7f6a7bSJulius Volz * The SED algorithm attempts to minimize each job's expected delay until 17cb7f6a7bSJulius Volz * completion. The expected delay that the job will experience is 18cb7f6a7bSJulius Volz * (Ci + 1) / Ui if sent to the ith server, in which Ci is the number of 19cb7f6a7bSJulius Volz * jobs on the ith server and Ui is the fixed service rate (weight) of 20cb7f6a7bSJulius Volz * the ith server. The SED algorithm adopts a greedy policy that each does 21cb7f6a7bSJulius Volz * what is in its own best interest, i.e. to join the queue which would 22cb7f6a7bSJulius Volz * minimize its expected delay of completion. 23cb7f6a7bSJulius Volz * 24cb7f6a7bSJulius Volz * See the following paper for more information: 25cb7f6a7bSJulius Volz * A. Weinrib and S. Shenker, Greed is not enough: Adaptive load sharing 26cb7f6a7bSJulius Volz * in large heterogeneous systems. In Proceedings IEEE INFOCOM'88, 27cb7f6a7bSJulius Volz * pages 986-994, 1988. 28cb7f6a7bSJulius Volz * 29cb7f6a7bSJulius Volz * Thanks must go to Marko Buuri <marko@buuri.name> for talking SED to me. 30cb7f6a7bSJulius Volz * 31cb7f6a7bSJulius Volz * The difference between SED and WLC is that SED includes the incoming 32cb7f6a7bSJulius Volz * job in the cost function (the increment of 1). SED may outperform 33cb7f6a7bSJulius Volz * WLC, while scheduling big jobs under larger heterogeneous systems 34cb7f6a7bSJulius Volz * (the server weight varies a lot). 35cb7f6a7bSJulius Volz * 36cb7f6a7bSJulius Volz */ 37cb7f6a7bSJulius Volz 389aada7acSHannes Eder #define KMSG_COMPONENT "IPVS" 399aada7acSHannes Eder #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt 409aada7acSHannes Eder 41cb7f6a7bSJulius Volz #include <linux/module.h> 42cb7f6a7bSJulius Volz #include <linux/kernel.h> 43cb7f6a7bSJulius Volz 44cb7f6a7bSJulius Volz #include <net/ip_vs.h> 45cb7f6a7bSJulius Volz 46cb7f6a7bSJulius Volz 47cb7f6a7bSJulius Volz static inline unsigned int 48cb7f6a7bSJulius Volz ip_vs_sed_dest_overhead(struct ip_vs_dest *dest) 49cb7f6a7bSJulius Volz { 50cb7f6a7bSJulius Volz /* 51cb7f6a7bSJulius Volz * We only use the active connection number in the cost 52cb7f6a7bSJulius Volz * calculation here. 53cb7f6a7bSJulius Volz */ 54cb7f6a7bSJulius Volz return atomic_read(&dest->activeconns) + 1; 55cb7f6a7bSJulius Volz } 56cb7f6a7bSJulius Volz 57cb7f6a7bSJulius Volz 58cb7f6a7bSJulius Volz /* 59cb7f6a7bSJulius Volz * Weighted Least Connection scheduling 60cb7f6a7bSJulius Volz */ 61cb7f6a7bSJulius Volz static struct ip_vs_dest * 62cb7f6a7bSJulius Volz ip_vs_sed_schedule(struct ip_vs_service *svc, const struct sk_buff *skb) 63cb7f6a7bSJulius Volz { 64cb7f6a7bSJulius Volz struct ip_vs_dest *dest, *least; 65cb7f6a7bSJulius Volz unsigned int loh, doh; 66cb7f6a7bSJulius Volz 671e3e238eSHannes Eder IP_VS_DBG(6, "%s(): Scheduling...\n", __func__); 68cb7f6a7bSJulius Volz 69cb7f6a7bSJulius Volz /* 70cb7f6a7bSJulius Volz * We calculate the load of each dest server as follows: 71cb7f6a7bSJulius Volz * (server expected overhead) / dest->weight 72cb7f6a7bSJulius Volz * 73cb7f6a7bSJulius Volz * Remember -- no floats in kernel mode!!! 74cb7f6a7bSJulius Volz * The comparison of h1*w2 > h2*w1 is equivalent to that of 75cb7f6a7bSJulius Volz * h1/w1 > h2/w2 76cb7f6a7bSJulius Volz * if every weight is larger than zero. 77cb7f6a7bSJulius Volz * 78cb7f6a7bSJulius Volz * The server with weight=0 is quiesced and will not receive any 79cb7f6a7bSJulius Volz * new connections. 80cb7f6a7bSJulius Volz */ 81cb7f6a7bSJulius Volz 829be52abaSJulian Anastasov list_for_each_entry_rcu(dest, &svc->destinations, n_list) { 83cb7f6a7bSJulius Volz if (!(dest->flags & IP_VS_DEST_F_OVERLOAD) && 84cb7f6a7bSJulius Volz atomic_read(&dest->weight) > 0) { 85cb7f6a7bSJulius Volz least = dest; 86cb7f6a7bSJulius Volz loh = ip_vs_sed_dest_overhead(least); 87cb7f6a7bSJulius Volz goto nextstage; 88cb7f6a7bSJulius Volz } 89cb7f6a7bSJulius Volz } 9041ac51eeSPatrick Schaaf ip_vs_scheduler_err(svc, "no destination available"); 91cb7f6a7bSJulius Volz return NULL; 92cb7f6a7bSJulius Volz 93cb7f6a7bSJulius Volz /* 94cb7f6a7bSJulius Volz * Find the destination with the least load. 95cb7f6a7bSJulius Volz */ 96cb7f6a7bSJulius Volz nextstage: 979be52abaSJulian Anastasov list_for_each_entry_continue_rcu(dest, &svc->destinations, n_list) { 98cb7f6a7bSJulius Volz if (dest->flags & IP_VS_DEST_F_OVERLOAD) 99cb7f6a7bSJulius Volz continue; 100cb7f6a7bSJulius Volz doh = ip_vs_sed_dest_overhead(dest); 101cb7f6a7bSJulius Volz if (loh * atomic_read(&dest->weight) > 102cb7f6a7bSJulius Volz doh * atomic_read(&least->weight)) { 103cb7f6a7bSJulius Volz least = dest; 104cb7f6a7bSJulius Volz loh = doh; 105cb7f6a7bSJulius Volz } 106cb7f6a7bSJulius Volz } 107cb7f6a7bSJulius Volz 108cb7f6a7bSJulius Volz IP_VS_DBG_BUF(6, "SED: server %s:%u " 109cb7f6a7bSJulius Volz "activeconns %d refcnt %d weight %d overhead %d\n", 110cb7f6a7bSJulius Volz IP_VS_DBG_ADDR(svc->af, &least->addr), ntohs(least->port), 111cb7f6a7bSJulius Volz atomic_read(&least->activeconns), 112cb7f6a7bSJulius Volz atomic_read(&least->refcnt), 113cb7f6a7bSJulius Volz atomic_read(&least->weight), loh); 114cb7f6a7bSJulius Volz 115cb7f6a7bSJulius Volz return least; 116cb7f6a7bSJulius Volz } 117cb7f6a7bSJulius Volz 118cb7f6a7bSJulius Volz 119cb7f6a7bSJulius Volz static struct ip_vs_scheduler ip_vs_sed_scheduler = 120cb7f6a7bSJulius Volz { 121cb7f6a7bSJulius Volz .name = "sed", 122cb7f6a7bSJulius Volz .refcnt = ATOMIC_INIT(0), 123cb7f6a7bSJulius Volz .module = THIS_MODULE, 124cb7f6a7bSJulius Volz .n_list = LIST_HEAD_INIT(ip_vs_sed_scheduler.n_list), 125cb7f6a7bSJulius Volz .schedule = ip_vs_sed_schedule, 126cb7f6a7bSJulius Volz }; 127cb7f6a7bSJulius Volz 128cb7f6a7bSJulius Volz 129cb7f6a7bSJulius Volz static int __init ip_vs_sed_init(void) 130cb7f6a7bSJulius Volz { 131cb7f6a7bSJulius Volz return register_ip_vs_scheduler(&ip_vs_sed_scheduler); 132cb7f6a7bSJulius Volz } 133cb7f6a7bSJulius Volz 134cb7f6a7bSJulius Volz static void __exit ip_vs_sed_cleanup(void) 135cb7f6a7bSJulius Volz { 136cb7f6a7bSJulius Volz unregister_ip_vs_scheduler(&ip_vs_sed_scheduler); 137cb7f6a7bSJulius Volz } 138cb7f6a7bSJulius Volz 139cb7f6a7bSJulius Volz module_init(ip_vs_sed_init); 140cb7f6a7bSJulius Volz module_exit(ip_vs_sed_cleanup); 141cb7f6a7bSJulius Volz MODULE_LICENSE("GPL"); 142