.. _kernel_hacking_hack:

============================================
Unreliable Guide To Hacking The Linux Kernel
============================================

:Author: Rusty Russell

Introduction
============

Welcome, gentle reader, to Rusty's Remarkably Unreliable Guide to Linux
Kernel Hacking. This document describes the common routines and general
requirements for kernel code: its goal is to serve as a primer for Linux
kernel development for experienced C programmers. I avoid implementation
details: that's what the code is for, and I ignore whole tracts of
useful routines.

Before you read this, please understand that I never wanted to write
this document, being grossly under-qualified, but I always wanted to
read it, and this was the only way. I hope it will grow into a
compendium of best practice, common starting points and random
information.

The Players
===========

At any time each of the CPUs in a system can be:

- not associated with any process, serving a hardware interrupt;

- not associated with any process, serving a softirq or tasklet;

- running in kernel space, associated with a process (user context);

- running a process in user space.

There is an ordering between these. The bottom two can preempt each
other, but above that is a strict hierarchy: each can only be preempted
by the ones above it. For example, while a softirq is running on a CPU,
no other softirq will preempt it, but a hardware interrupt can. However,
any other CPUs in the system execute independently.

We'll see a number of ways that the user context can block interrupts,
to become truly non-preemptable.

User Context
------------

User context is when you are coming in from a system call or other trap:
like userspace, you can be preempted by more important tasks and by
interrupts. You can sleep, by calling :c:func:`schedule()`.

.. note::

    You are always in user context on module load and unload, and on
    operations on the block device layer.

In user context, the ``current`` pointer (indicating the task we are
currently executing) is valid, and :c:func:`in_interrupt()`
(``include/linux/preempt.h``) is false.

.. warning::

    Beware that if you have preemption or softirqs disabled (see below),
    :c:func:`in_interrupt()` will return a false positive.

Hardware Interrupts (Hard IRQs)
-------------------------------

Timer ticks, network cards and keyboards are examples of real hardware
which produce interrupts at any time. The kernel runs interrupt
handlers, which service the hardware. The kernel guarantees that a
handler is never re-entered: if the same interrupt arrives, it is queued
(or dropped). Because it disables interrupts, the handler has to be
fast: frequently it simply acknowledges the interrupt, marks a 'software
interrupt' for execution and exits.

You can tell you are in a hardware interrupt, because in_hardirq() returns
true.

.. warning::

    Beware that this will return a false positive if interrupts are
    disabled (see below).

Software Interrupt Context: Softirqs and Tasklets
-------------------------------------------------

Whenever a system call is about to return to userspace, or a hardware
interrupt handler exits, any 'software interrupts' which are marked
pending (usually by hardware interrupts) are run (``kernel/softirq.c``).

Much of the real interrupt handling work is done here. Early in the
transition to SMP, there were only 'bottom halves' (BHs), which didn't
take advantage of multiple CPUs. Shortly after we switched from wind-up
computers made of match-sticks and snot, we abandoned this limitation
and switched to 'softirqs'.

``include/linux/interrupt.h`` lists the different softirqs. A very
important softirq is the timer softirq (``include/linux/timer.h``): you
can register to have it call functions for you in a given length of
time.

Softirqs are often a pain to deal with, since the same softirq can run
simultaneously on more than one CPU.
For this reason, tasklets
(``include/linux/interrupt.h``) are more often used: they are
dynamically-registrable (meaning you can have as many as you want), and
they also guarantee that any tasklet will only run on one CPU at any
time, although different tasklets can run simultaneously.

.. warning::

    The name 'tasklet' is misleading: they have nothing to do with
    'tasks'.

You can tell you are in a softirq (or tasklet) using the
:c:func:`in_softirq()` macro (``include/linux/preempt.h``).

.. warning::

    Beware that this will return a false positive if a
    :ref:`bottom half lock <local_bh_disable>` is held.

Some Basic Rules
================

No memory protection
    If you corrupt memory, whether in user context or interrupt context,
    the whole machine will crash. Are you sure you can't do what you
    want in userspace?

No floating point or MMX
    The FPU context is not saved; even in user context the FPU state
    probably won't correspond with the current process: you would mess
    with some user process' FPU state. If you really want to do this,
    you would have to explicitly save/restore the full FPU state (and
    avoid context switches). It is generally a bad idea; use fixed point
    arithmetic first.

A rigid stack limit
    Depending on configuration options the kernel stack is about 3K to
    6K for most 32-bit architectures: it's about 14K on most 64-bit
    archs, and often shared with interrupts so you can't use it all.
    Avoid deep recursion and huge local arrays on the stack (allocate
    them dynamically instead).

The Linux kernel is portable
    Let's keep it that way. Your code should be 64-bit clean, and
    endian-independent. You should also minimize CPU specific stuff,
    e.g. inline assembly should be cleanly encapsulated and minimized to
    ease porting. Generally it should be restricted to the
    architecture-dependent part of the kernel tree.
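
As a rough illustration of the portability rule, here is a userspace
sketch (so that it compiles on its own) of reading and writing a 32-bit
big-endian field byte by byte with fixed-width types. The helper names
``read_be32`` and ``write_be32`` are made up for this example; inside the
kernel you would normally reach for the byte-order helpers described
later in this document instead.

```c
#include <stdint.h>

/*
 * Hypothetical helper: read a 32-bit big-endian value from a byte
 * buffer. Assembling the value with shifts gives the same result on
 * little-endian and big-endian hosts and needs no particular alignment.
 */
static uint32_t read_be32(const uint8_t *p)
{
        return ((uint32_t)p[0] << 24) |
               ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8)  |
                (uint32_t)p[3];
}

/* Hypothetical helper: store a 32-bit value in big-endian byte order. */
static void write_be32(uint8_t *p, uint32_t v)
{
        p[0] = (uint8_t)(v >> 24);
        p[1] = (uint8_t)(v >> 16);
        p[2] = (uint8_t)(v >> 8);
        p[3] = (uint8_t)v;
}
```

Using ``uint32_t`` rather than ``long`` keeps the field width the same on
32-bit and 64-bit builds, which is most of what "64-bit clean" asks for.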

ioctls: Not writing a new system call
=====================================

A system call generally looks like this::

    asmlinkage long sys_mycall(int arg)
    {
            return 0;
    }


First, in most cases you don't want to create a new system call. You
create a character device and implement an appropriate ioctl for it.
This is much more flexible than system calls, doesn't have to be entered
in every architecture's ``include/asm/unistd.h`` and
``arch/kernel/entry.S`` file, and is much more likely to be accepted by
Linus.

If all your routine does is read or write some parameter, consider
implementing a :c:func:`sysfs()` interface instead.

Inside the ioctl you're in user context to a process. When an error
occurs you return a negated errno (see
``include/uapi/asm-generic/errno-base.h``,
``include/uapi/asm-generic/errno.h`` and ``include/linux/errno.h``),
otherwise you return 0.
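
The return convention can be sketched in plain userspace C. This is only
a model, not a real driver: the command numbers, ``struct mydev`` and
``mydev_ioctl_model()`` are all hypothetical names invented for the
example, and a real handler would be wired up through
``struct file_operations`` and would copy any user pointers with the
routines described under Common Routines.

```c
#include <errno.h>

/* Hypothetical command numbers, for illustration only. */
#define MYDEV_CMD_RESET     1u
#define MYDEV_CMD_SET_LEVEL 2u

/* Hypothetical per-device state. */
struct mydev {
        int level;
};

/*
 * Model of an ioctl-style handler: return 0 on success and a negated
 * errno value on failure.
 */
static long mydev_ioctl_model(struct mydev *dev, unsigned int cmd, long arg)
{
        switch (cmd) {
        case MYDEV_CMD_RESET:
                dev->level = 0;
                return 0;
        case MYDEV_CMD_SET_LEVEL:
                if (arg < 0 || arg > 100)
                        return -EINVAL; /* argument out of range */
                dev->level = (int)arg;
                return 0;
        default:
                return -ENOTTY;         /* unknown command */
        }
}
```

Note that a failed call leaves the device state untouched, so the caller
can simply retry with corrected arguments.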

After you have slept you should check if a signal occurred: the Unix/Linux
way of handling signals is to temporarily exit the system call with the
``-ERESTARTSYS`` error. The system call entry code will switch back to
user context, process the signal handler and then your system call will
be restarted (unless the user disabled that). So you should be prepared
to process the restart, e.g. if you're in the middle of manipulating
some data structure.

::

    if (signal_pending(current))
            return -ERESTARTSYS;


If you're doing longer computations: first think userspace. If you
**really** want to do it in kernel you should regularly check if you need
to give up the CPU (remember there is cooperative multitasking per CPU).
Idiom::

    cond_resched(); /* Will sleep */


A short note on interface design: the UNIX system call motto is "Provide
mechanism not policy".

Recipes for Deadlock
====================

You cannot call any routines which may sleep, unless:

- You are in user context.

- You do not own any spinlocks.

- You have interrupts enabled (actually, Andi Kleen says that the
  scheduling code will enable them for you, but that's probably not
  what you wanted).

Note that some functions may sleep implicitly: common ones are the user
space access functions (\*_user) and memory allocation functions
without ``GFP_ATOMIC``.

You should always compile your kernel with ``CONFIG_DEBUG_ATOMIC_SLEEP`` on,
and it will warn you if you break these rules. If you **do** break the
rules, you will eventually lock up your box.

Really.

Common Routines
===============

:c:func:`printk()`
------------------

Defined in ``include/linux/printk.h``

:c:func:`printk()` feeds kernel messages to the console, dmesg, and
the syslog daemon.
It is useful for debugging and reporting errors, and
can be used inside interrupt context, but use with caution: a machine
which has its console flooded with printk messages is unusable. It uses
a format string mostly compatible with ANSI C printf, and C string
concatenation to give it a first "priority" argument::

    printk(KERN_INFO "i = %u\n", i);


See ``include/linux/kern_levels.h`` for other ``KERN_`` values; these are
interpreted by syslog as the level. Special case: for printing an IP
address use::

    __be32 ipaddress;
    printk(KERN_INFO "my ip: %pI4\n", &ipaddress);


:c:func:`printk()` internally uses a 1K buffer and does not catch
overruns. Make sure that will be enough.

.. note::

    You will know when you are a real kernel hacker when you start
    typoing printf as printk in your user programs :)

.. note::

    Another sidenote: the original Unix Version 6 sources had a comment
    on top of its printf function: "Printf should not be used for
    chit-chat". You should follow that advice.

:c:func:`copy_to_user()` / :c:func:`copy_from_user()` / :c:func:`get_user()` / :c:func:`put_user()`
---------------------------------------------------------------------------------------------------

Defined in ``include/linux/uaccess.h`` / ``asm/uaccess.h``

**[SLEEPS]**

:c:func:`put_user()` and :c:func:`get_user()` are used to get
and put single values (such as an int, char, or long) from and to
userspace. A pointer into userspace should never be simply dereferenced:
data should be copied using these routines. Both return ``-EFAULT`` or
0.

:c:func:`copy_to_user()` and :c:func:`copy_from_user()` are
more general: they copy an arbitrary amount of data to and from
userspace.

.. warning::

    Unlike :c:func:`put_user()` and :c:func:`get_user()`, they
    return the amount of uncopied data (ie. 0 still means success).

[Yes, this objectionable interface makes me cringe. The flamewar comes
up every year or so. --RR.]

The functions may sleep implicitly.
They should never be called outside
user context (it makes no sense), with interrupts disabled, or with a
spinlock held.

:c:func:`kmalloc()`/:c:func:`kfree()`
-------------------------------------

Defined in ``include/linux/slab.h``

**[MAY SLEEP: SEE BELOW]**

These routines are used to dynamically request pointer-aligned chunks of
memory, like malloc and free do in userspace, but
:c:func:`kmalloc()` takes an extra flag word. Important values:

``GFP_KERNEL``
    May sleep and swap to free memory. Only allowed in user context, but
    is the most reliable way to allocate memory.

``GFP_ATOMIC``
    Don't sleep. Less reliable than ``GFP_KERNEL``, but may be called
    from interrupt context. You should **really** have a good
    out-of-memory error-handling strategy.

``GFP_DMA``
    Allocate ISA DMA lower than 16MB. If you don't know what that is you
    don't need it. Very unreliable.

If you see a "sleeping function called from invalid context" warning
message, then maybe you called a sleeping allocation function from
interrupt context without ``GFP_ATOMIC``. You should really fix that.
Run, don't walk.

If you are allocating at least ``PAGE_SIZE`` (``asm/page.h`` or
``asm/page_types.h``) bytes, consider using :c:func:`__get_free_pages()`
(``include/linux/gfp.h``). It takes an order argument (0 for page sized,
1 for double page, 2 for four pages etc.) and the same memory priority
flag word as above.

If you are allocating more than a page worth of bytes you can use
:c:func:`vmalloc()`. It'll allocate virtual memory in the kernel
map. This block is not contiguous in physical memory, but the MMU makes
it look like it is for you (so it'll only look contiguous to the CPUs,
not to external device drivers). If you really need large physically
contiguous memory for some weird device, you have a problem: it is
poorly supported in Linux because after some time memory fragmentation
in a running kernel makes it hard. The best way is to allocate the block
early in the boot process via the :c:func:`alloc_bootmem()`
routine.
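
To make the order argument of :c:func:`__get_free_pages()` a little more
concrete, here is a small userspace sketch of the arithmetic: the order
is the smallest power-of-two count of pages that covers the request. The
page size of 4096 bytes and the names ``SKETCH_PAGE_SIZE`` and
``size_to_order()`` are assumptions made for this example; real kernel
code should use ``PAGE_SIZE`` and the kernel's own ``get_order()`` helper
rather than anything like this.

```c
#include <stddef.h>

/* Assumed page size for the sketch; the real value is PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size.
 * Not guarded against sizes so large that the shift would overflow.
 */
static unsigned int size_to_order(size_t size)
{
        unsigned int order = 0;
        size_t span = SKETCH_PAGE_SIZE;

        while (span < size) {
                span <<= 1;     /* double the number of pages */
                order++;
        }
        return order;
}
```

Under these assumptions a request for one byte more than a page already
needs order 1, and one byte more than two pages needs order 2, so odd
sizes can waste close to half of the allocation.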

Before inventing your own cache of often-used objects consider using a
slab cache in ``include/linux/slab.h``

:c:macro:`current`
------------------

Defined in ``include/asm/current.h``

This global variable (really a macro) contains a pointer to the current
task structure, so is only valid in user context. For example, when a
process makes a system call, this will point to the task structure of
the calling process. It is **not NULL** in interrupt context.

:c:func:`mdelay()`/:c:func:`udelay()`
-------------------------------------

Defined in ``include/asm/delay.h`` / ``include/linux/delay.h``

The :c:func:`udelay()` and :c:func:`ndelay()` functions can be
used for small pauses. Do not use large values with them as you risk
overflow - the helper function :c:func:`mdelay()` is useful here, or
consider :c:func:`msleep()`.

:c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()`
-----------------------------------------------------------------------------------------------

Defined in ``include/asm/byteorder.h``

The :c:func:`cpu_to_be32()` family (where the "32" can be replaced
by 64 or 16, and the "be" can be replaced by "le") are the general way
to do endian conversions in the kernel: they return the converted value.
All variations supply the reverse as well:
:c:func:`be32_to_cpu()`, etc.

There are two major variations of these functions: the pointer
variation, such as :c:func:`cpu_to_be32p()`, which take a pointer
to the given type, and return the converted value. The other variation
is the "in-situ" family, such as :c:func:`cpu_to_be32s()`, which
convert the value referred to by the pointer, and return void.

:c:func:`local_irq_save()`/:c:func:`local_irq_restore()`
--------------------------------------------------------

Defined in ``include/linux/irqflags.h``

These routines disable hard interrupts on the local CPU, and restore
them.
They are reentrant, saving the previous state in their one
``unsigned long flags`` argument. If you know that interrupts are
enabled, you can simply use :c:func:`local_irq_disable()` and
:c:func:`local_irq_enable()`.

.. _local_bh_disable:

:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()`
--------------------------------------------------------

Defined in ``include/linux/bottom_half.h``


These routines disable soft interrupts on the local CPU, and restore
them. They are reentrant; if soft interrupts were disabled before, they
will still be disabled after this pair of functions has been called.
They prevent softirqs and tasklets from running on the current CPU.

:c:func:`smp_processor_id()`
----------------------------

Defined in ``include/linux/smp.h``

:c:func:`get_cpu()` disables preemption (so you won't suddenly get
moved to another CPU) and returns the current processor number, between
0 and ``NR_CPUS``. Note that the CPU numbers are not necessarily
continuous. You return it again with :c:func:`put_cpu()` when you
are done.

If you know you cannot be preempted by another task (ie. you are in
interrupt context, or have preemption disabled) you can use
smp_processor_id().

``__init``/``__exit``/``__initdata``
------------------------------------

Defined in ``include/linux/init.h``

After boot, the kernel frees up a special section; functions marked with
``__init`` and data structures marked with ``__initdata`` are dropped
after boot is complete: similarly modules discard this memory after
initialization. ``__exit`` is used to declare a function which is only
required on exit: the function will be dropped if this file is not
compiled as a module. See the header file for use. Note that it makes no
sense for a function marked with ``__init`` to be exported to modules
with :c:func:`EXPORT_SYMBOL()` or :c:func:`EXPORT_SYMBOL_GPL()`- this
will break.

:c:func:`__initcall()`/:c:func:`module_init()`
----------------------------------------------

Defined in ``include/linux/init.h`` / ``include/linux/module.h``

Many parts of the kernel are well served as a module
(dynamically-loadable parts of the kernel).
Using the
:c:func:`module_init()` and :c:func:`module_exit()` macros it
is easy to write code without #ifdefs which can operate either as a module
or built into the kernel.

The :c:func:`module_init()` macro defines which function is to be
called at module insertion time (if the file is compiled as a module),
or at boot time: if the file is not compiled as a module the
:c:func:`module_init()` macro becomes equivalent to
:c:func:`__initcall()`, which through linker magic ensures that
the function is called on boot.

The function can return a negative error number to cause module loading
to fail (unfortunately, this has no effect if the module is compiled
into the kernel). This function is called in user context with
interrupts enabled, so it can sleep.

:c:func:`module_exit()`
-----------------------


Defined in ``include/linux/module.h``

This macro defines the function to be called at module removal time (or
never, in the case of the file compiled into the kernel). It will only
be called if the module usage count has reached zero.
This function can 471c4fcd7caSMauro Carvalho Chehabalso sleep, but cannot fail: everything must be cleaned up by the time 472c4fcd7caSMauro Carvalho Chehabit returns. 473c4fcd7caSMauro Carvalho Chehab 474c4fcd7caSMauro Carvalho ChehabNote that this macro is optional: if it is not present, your module will 475c4fcd7caSMauro Carvalho Chehabnot be removable (except for 'rmmod -f'). 476c4fcd7caSMauro Carvalho Chehab 477dca1e58eSMauro Carvalho Chehab:c:func:`try_module_get()`/:c:func:`module_put()` 478dca1e58eSMauro Carvalho Chehab------------------------------------------------- 479dca1e58eSMauro Carvalho Chehab 480dca1e58eSMauro Carvalho ChehabDefined in ``include/linux/module.h`` 481c4fcd7caSMauro Carvalho Chehab 482c4fcd7caSMauro Carvalho ChehabThese manipulate the module usage count, to protect against removal (a 483c4fcd7caSMauro Carvalho Chehabmodule also can't be removed if another module uses one of its exported 484c4fcd7caSMauro Carvalho Chehabsymbols: see below). Before calling into module code, you should call 485c4fcd7caSMauro Carvalho Chehab:c:func:`try_module_get()` on that module: if it fails, then the 486c4fcd7caSMauro Carvalho Chehabmodule is being removed and you should act as if it wasn't there. 487c4fcd7caSMauro Carvalho ChehabOtherwise, you can safely enter the module, and call 488c4fcd7caSMauro Carvalho Chehab:c:func:`module_put()` when you're finished. 489c4fcd7caSMauro Carvalho Chehab 490c4fcd7caSMauro Carvalho ChehabMost registerable structures have an owner field, such as in the 491c4fcd7caSMauro Carvalho Chehab:c:type:`struct file_operations <file_operations>` structure. 492c4fcd7caSMauro Carvalho ChehabSet this field to the macro ``THIS_MODULE``. 
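
The get/call/put pattern described above can be tried out with a small
userspace toy model. This is only a sketch of the calling convention:
``struct toy_module``, ``toy_try_get()``, ``toy_put()``, ``call_into()``,
``toy_entry()`` and ``toy_demo()`` are hypothetical names invented here,
not kernel interfaces, and the model ignores the locking and concurrency
that the real :c:func:`try_module_get()` has to handle::

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical stand-in for a module: a usage count plus a
     * "being removed" flag. This is NOT the kernel's struct module. */
    struct toy_module {
            int refcnt;
            bool going;
    };

    /* Models try_module_get(): refuse once removal has started. */
    static bool toy_try_get(struct toy_module *m)
    {
            if (m->going)
                    return false;
            m->refcnt++;
            return true;
    }

    /* Models module_put(): drop the reference taken above. */
    static void toy_put(struct toy_module *m)
    {
            assert(m->refcnt > 0);
            m->refcnt--;
    }

    /* Caller pattern: pin the owner, call in, unpin. Returns -1 when the
     * owner is going away, i.e. we act as if it was never there. */
    static int call_into(struct toy_module *owner, int (*fn)(void))
    {
            int ret;

            if (!toy_try_get(owner))
                    return -1;
            ret = fn();
            toy_put(owner);
            return ret;
    }

    /* Hypothetical entry point living "inside" the toy module. */
    static int toy_entry(void)
    {
            return 42;
    }

    /* Usage: returns 0 if the model behaves as described in the text. */
    static int toy_demo(void)
    {
            struct toy_module m = { .refcnt = 0, .going = false };

            if (call_into(&m, toy_entry) != 42 || m.refcnt != 0)
                    return 1;
            m.going = true;         /* removal has started */
            if (call_into(&m, toy_entry) != -1 || m.refcnt != 0)
                    return 1;
            return 0;
    }

The point of the shape is that every successful get is paired with exactly
one put, and that a failed get is treated as "no such module" rather than
as an error to retry.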

Wait Queues ``include/linux/wait.h``
====================================

**[SLEEPS]**

A wait queue is used to wait for someone to wake you up when a certain
condition is true. They must be used carefully to ensure there is no
race condition. You declare a :c:type:`wait_queue_head_t`, and then processes
which want to wait for that condition declare a :c:type:`wait_queue_entry_t`
referring to themselves, and place that in the queue.

Declaring
---------

You declare a ``wait_queue_head_t`` using the
:c:func:`DECLARE_WAIT_QUEUE_HEAD()` macro, or using the
:c:func:`init_waitqueue_head()` routine in your initialization
code.

Queuing
-------

Placing yourself in the waitqueue is fairly complex, because you must
put yourself in the queue before checking the condition. There is a
macro to do this: :c:func:`wait_event_interruptible()`
(``include/linux/wait.h``). The first argument is the wait queue head, and
the second is an expression which is evaluated; the macro returns 0 when
this expression is true, or ``-ERESTARTSYS`` if a signal is received. The
:c:func:`wait_event()` version ignores signals.

Waking Up Queued Tasks
----------------------

Call :c:func:`wake_up()` (``include/linux/wait.h``), which will wake
up every process in the queue. The exception is if one has
``TASK_EXCLUSIVE`` set, in which case the remainder of the queue will
not be woken. There are other variants of this basic function available
in the same header.

Atomic Operations
=================

Certain operations are guaranteed atomic on all platforms. The first
class of operations work on :c:type:`atomic_t` (``include/asm/atomic.h``);
this contains a signed integer (at least 32 bits long), and you must use
these functions to manipulate or read :c:type:`atomic_t` variables.
:c:func:`atomic_read()` and :c:func:`atomic_set()` get and set
the counter; :c:func:`atomic_add()`, :c:func:`atomic_sub()`,
:c:func:`atomic_inc()`, :c:func:`atomic_dec()`, and
:c:func:`atomic_dec_and_test()` (returns true if it was
decremented to zero) modify it.

Yes. It returns true (i.e. != 0) if the atomic variable is zero.

Note that these functions are slower than normal arithmetic, and so
should not be used unnecessarily.

The second class of atomic operations is atomic bit operations on an
``unsigned long``, defined in ``include/linux/bitops.h``. These
operations generally take a pointer to the bit pattern, and a bit
number: 0 is the least significant bit. :c:func:`set_bit()`,
:c:func:`clear_bit()` and :c:func:`change_bit()` set, clear,
and flip the given bit. :c:func:`test_and_set_bit()`,
:c:func:`test_and_clear_bit()` and
:c:func:`test_and_change_bit()` do the same thing, except return
true if the bit was previously set; these are particularly useful for
atomically setting flags.

It is possible to call these operations with bit indices greater than
``BITS_PER_LONG``. The resulting behavior is strange on big-endian
platforms though so it is a good idea not to do this.

Symbols
=======

Within the kernel proper, the normal linking rules apply (i.e. unless a
symbol is declared to be file scope with the ``static`` keyword, it can
be used anywhere in the kernel). However, for modules, a special
exported symbol table is kept which limits the entry points to the
kernel proper. Modules can also export symbols.

:c:func:`EXPORT_SYMBOL()`
-------------------------

Defined in ``include/linux/export.h``

This is the classic method of exporting a symbol: dynamically loaded
modules will be able to use the symbol as normal.

:c:func:`EXPORT_SYMBOL_GPL()`
-----------------------------

Defined in ``include/linux/export.h``

Similar to :c:func:`EXPORT_SYMBOL()` except that the symbols
exported by :c:func:`EXPORT_SYMBOL_GPL()` can only be seen by
modules with a :c:func:`MODULE_LICENSE()` that specifies a GPL
compatible license. It implies that the function is considered an
internal implementation issue, and not really an interface. Some
maintainers and developers may however require ``EXPORT_SYMBOL_GPL()``
when adding any new APIs or functionality.

:c:func:`EXPORT_SYMBOL_NS()`
----------------------------

Defined in ``include/linux/export.h``

This is the variant of :c:func:`EXPORT_SYMBOL()` that allows specifying a
symbol namespace. Symbol Namespaces are documented in
Documentation/core-api/symbol-namespaces.rst

:c:func:`EXPORT_SYMBOL_NS_GPL()`
--------------------------------

Defined in ``include/linux/export.h``

This is the variant of :c:func:`EXPORT_SYMBOL_GPL()` that allows specifying
a symbol namespace. Symbol Namespaces are documented in
Documentation/core-api/symbol-namespaces.rst

Routines and Conventions
========================

Double-linked lists ``include/linux/list.h``
--------------------------------------------

There used to be three sets of linked-list routines in the kernel
headers, but this one is the winner. If you don't have some particular
pressing need for a singly-linked list, it's a good choice.

In particular, :c:func:`list_for_each_entry()` is useful.

Return Conventions
------------------

For code called in user context, it's very common to defy C convention,
and return 0 for success, and a negative error number (e.g. ``-EFAULT``) for
failure. This can be unintuitive at first, but it's fairly widespread in
the kernel.

Use :c:func:`ERR_PTR()` (``include/linux/err.h``) to encode a
negative error number into a pointer, and :c:func:`IS_ERR()` and
:c:func:`PTR_ERR()` to get it back out again: this avoids a separate
pointer parameter for the error number. Icky, but in a good way.

Breaking Compilation
--------------------

Linus and the other developers sometimes change function or structure
names in development kernels; this is not done just to keep everyone on
their toes: it reflects a fundamental change (e.g. can no longer be
called with interrupts on, or does extra checks, or doesn't do checks
which were caught before). Usually this is accompanied by a fairly
complete note to the appropriate kernel development mailing list; search
the archives. Simply doing a global replace on the file usually makes
things **worse**.

Initializing structure members
------------------------------

The preferred method of initializing structures is to use designated
initialisers, as defined by ISO C99, e.g.::

    static struct block_device_operations opt_fops = {
            .open               = opt_open,
            .release            = opt_release,
            .ioctl              = opt_ioctl,
            .check_media_change = opt_media_change,
    };

This makes it easy to grep for, and makes it clear which structure
fields are set. You should do this because it looks cool.

GNU Extensions
--------------

GNU Extensions are explicitly allowed in the Linux kernel. Note that
some of the more complex ones are not very well supported, due to lack
of general use, but the following are considered standard (see the GCC
info page section "C Extensions" for more details - Yes, really the info
page, the man page is only a short summary of the stuff in info).

- Inline functions

- Statement expressions (i.e. the ({ and }) constructs).

- Declaring attributes of a function / variable / type
  (__attribute__)

- typeof

- Zero length arrays

- Macro varargs

- Arithmetic on void pointers

- Non-Constant initializers

- Assembler Instructions (not outside arch/ and include/asm/)

- Function names as strings (__func__).

- __builtin_constant_p()

Be wary when using ``long long`` in the kernel: the code gcc generates for
it is horrible and, worse, division and multiplication do not work on
i386 because the GCC runtime functions for them are missing from the
kernel environment.
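
A few of the extensions listed above can be tried in userspace with GCC or
Clang. The following is a minimal sketch, not kernel code: ``toy_max()``,
``toy_is_const()``, ``toy_advance()``, ``toy_name()`` and ``toy_gnu_demo()``
are hypothetical names made up for this example. It spells the keyword
``__typeof__`` so that it also builds in strict ISO modes, whereas kernel
code normally writes ``typeof``::

    #include <assert.h>

    /* Statement expression plus typeof: each argument is evaluated
     * exactly once, unlike the classic ((a) > (b) ? (a) : (b)). */
    #define toy_max(a, b) ({                        \
            __typeof__(a) _a = (a);                 \
            __typeof__(b) _b = (b);                 \
            _a > _b ? _a : _b; })

    /* Non-zero when the compiler knows x is a compile-time constant. */
    #define toy_is_const(x) __builtin_constant_p(x)

    /* Arithmetic on void pointers: GNU C treats sizeof(void) as 1. */
    static void *toy_advance(void *p, unsigned long n)
    {
            return p + n;
    }

    /* __attribute__ on a function, and the function name as a string. */
    static const char * __attribute__((noinline)) toy_name(void)
    {
            return __func__;
    }

    /* Usage: exercises each helper and returns 0. */
    static int toy_gnu_demo(void)
    {
            char buf[8];
            int i = 1;

            assert(toy_max(i++, 0) == 1);   /* i++ ran only once... */
            assert(i == 2);                 /* ...so i is now 2 */
            assert(toy_is_const(20000));
            assert(toy_advance(buf, 3) == (void *)&buf[3]);
            assert(toy_name()[0] == 't');
            return 0;
    }

The single evaluation of each macro argument is the usual reason for
reaching for a statement expression instead of a plain conditional
expression.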

C++
---

Using C++ in the kernel is usually a bad idea, because the kernel does
not provide the necessary runtime environment and the include files are
not tested for it. It is still possible, but not recommended. If you
really want to do this, forget about exceptions at least.

#if
---

It is generally considered cleaner to use macros in header files (or at
the top of .c files) to abstract away functions rather than using ``#if``
pre-processor statements throughout the source code.

Putting Your Stuff in the Kernel
================================

In order to get your stuff into shape for official inclusion, or even to
make a neat patch, there's administrative work to be done:

- Figure out who are the owners of the code you've been modifying. Look
  at the top of the source files, inside the ``MAINTAINERS`` file, and
  last of all in the ``CREDITS`` file. You should coordinate with these
  people to make sure you're not duplicating effort, or trying something
  that's already been rejected.

  Make sure you put your name and email address at the top of any files
  you create or modify significantly. This is the first place people
  will look when they find a bug, or when **they** want to make a change.

- Usually you want a configuration option for your kernel hack. Edit
  ``Kconfig`` in the appropriate directory. The Config language is
  simple to use by cut and paste, and there's complete documentation in
  ``Documentation/kbuild/kconfig-language.rst``.

  In your description of the option, make sure you address both the
  expert user and the user who knows nothing about your feature.
  Mention incompatibilities and issues here. **Definitely** end your
  description with “if in doubt, say N” (or, occasionally, ``Y``); this
  is for people who have no idea what you are talking about.

- Edit the ``Makefile``: the CONFIG variables are exported here so you
  can usually just add a "obj-$(CONFIG_xxx) += xxx.o" line. The syntax
  is documented in ``Documentation/kbuild/makefiles.rst``.

- Put yourself in ``CREDITS`` if you consider what you've done
  noteworthy, usually beyond a single file (your name should be at the
  top of the source files anyway). ``MAINTAINERS`` means you want to be
  consulted when changes are made to a subsystem, and hear about bugs;
  it implies a more-than-passing commitment to some part of the code.

- Finally, don't forget to read
  ``Documentation/process/submitting-patches.rst``

Kernel Cantrips
===============

Some favorites from browsing the source. Feel free to add to this list.

``arch/x86/include/asm/delay.h``::

    #define ndelay(n) (__builtin_constant_p(n) ? \
            ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
            __ndelay(n))


``include/linux/fs.h``::

    /*
     * Kernel pointers have redundant information, so we can use a
     * scheme where we can return either an error code or a dentry
     * pointer with the same return value.
     *
     * This should be a per-architecture thing, to allow different
     * error and pointer decisions.
     */
    #define ERR_PTR(err)    ((void *)((long)(err)))
    #define PTR_ERR(ptr)    ((long)(ptr))
    #define IS_ERR(ptr)     ((unsigned long)(ptr) > (unsigned long)(-1000))

``arch/x86/include/asm/uaccess_32.h:``::

    #define copy_to_user(to,from,n)                         \
            (__builtin_constant_p(n) ?                      \
             __constant_copy_to_user((to),(from),(n)) :     \
             __generic_copy_to_user((to),(from),(n)))


``arch/sparc/kernel/head.S:``::

    /*
     * Sun people can't spell worth damn. "compatability" indeed.
     * At least we *know* we can't spell, and use a spell-checker.
     */

    /* Uh, actually Linus it is I who cannot spell. Too much murky
     * Sparc assembly will do this to ya.
     */
    C_LABEL(cputypvar):
            .asciz "compatibility"

    /* Tested on SS-5, SS-10. Probably someone at Sun applied a spell-checker. */
            .align 4
    C_LABEL(cputypvar_sun4m):
            .asciz "compatible"


``arch/sparc/lib/checksum.S:``::

    /* Sun, you just can't beat me, you just can't. Stop trying,
     * give up. I'm serious, I am going to kick the living shit
     * out of you, game over, lights out.
     */


Thanks
======

Thanks to Andi Kleen for the idea, answering my questions, fixing my
mistakes, filling content, etc. Philipp Rumpf for more spelling and
clarity fixes, and some excellent non-obvious points. Werner Almesberger
for giving me a great summary of :c:func:`disable_irq()`, and Jes
Sorensen and Andrea Arcangeli added caveats. Michael Elizabeth Chastain
for checking and adding to the Configure section. Telsa Gwynne for
teaching me DocBook.