.. _kernel_hacking_hack:

============================================
Unreliable Guide To Hacking The Linux Kernel
============================================

:Author: Rusty Russell

Introduction
============

Welcome, gentle reader, to Rusty's Remarkably Unreliable Guide to Linux
Kernel Hacking. This document describes the common routines and general
requirements for kernel code: its goal is to serve as a primer for Linux
kernel development for experienced C programmers. I avoid implementation
details: that's what the code is for, and I ignore whole tracts of
useful routines.

Before you read this, please understand that I never wanted to write
this document, being grossly under-qualified, but I always wanted to
read it, and this was the only way. I hope it will grow into a
compendium of best practice, common starting points and random
information.

The Players
===========

At any time each of the CPUs in a system can be:

-  not associated with any process, serving a hardware interrupt;

-  not associated with any process, serving a softirq or tasklet;

-  running in kernel space, associated with a process (user context);

-  running a process in user space.

There is an ordering between these. The bottom two can preempt each
other, but above that is a strict hierarchy: each can only be preempted
by the ones above it. For example, while a softirq is running on a CPU,
no other softirq will preempt it, but a hardware interrupt can. However,
any other CPUs in the system execute independently.

We'll see a number of ways that the user context can block interrupts,
to become truly non-preemptable.

User Context
------------

User context is when you are coming in from a system call or other trap:
like userspace, you can be preempted by more important tasks and by
interrupts. You can sleep, by calling :c:func:`schedule()`.

.. note::

    You are always in user context on module load and unload, and on
    operations on the block device layer.

In user context, the ``current`` pointer (indicating the task we are
currently executing) is valid, and :c:func:`in_interrupt()`
(``include/linux/preempt.h``) is false.

.. warning::

    Beware that if you have preemption or softirqs disabled (see below),
    :c:func:`in_interrupt()` will return a false positive.

Hardware Interrupts (Hard IRQs)
-------------------------------

Timer ticks, network cards and keyboards are examples of real hardware
which produce interrupts at any time. The kernel runs an interrupt
handler, which services the hardware. The kernel guarantees that this
handler is never re-entered: if the same interrupt arrives, it is queued
(or dropped). Because it disables interrupts, this handler has to be
fast: frequently it simply acknowledges the interrupt, marks a 'software
interrupt' for execution and exits.

You can tell you are in a hardware interrupt, because in_hardirq() returns
true.

.. warning::

    Beware that this will return a false positive if interrupts are
    disabled (see below).

Software Interrupt Context: Softirqs and Tasklets
-------------------------------------------------

Whenever a system call is about to return to userspace, or a hardware
interrupt handler exits, any 'software interrupts' which are marked
pending (usually by hardware interrupts) are run (``kernel/softirq.c``).

Much of the real interrupt handling work is done here. Early in the
transition to SMP, there were only 'bottom halves' (BHs), which didn't
take advantage of multiple CPUs. Shortly after we switched from wind-up
computers made of match-sticks and snot, we abandoned this limitation
and switched to 'softirqs'.

``include/linux/interrupt.h`` lists the different softirqs. A very
important softirq is the timer softirq (``include/linux/timer.h``): you
can register to have it call functions for you in a given length of
time.

Softirqs are often a pain to deal with, since the same softirq will run
simultaneously on more than one CPU. For this reason, tasklets
(``include/linux/interrupt.h``) are more often used: they are
dynamically-registrable (meaning you can have as many as you want), and
they also guarantee that any tasklet will only run on one CPU at any
time, although different tasklets can run simultaneously.

.. warning::

    The name 'tasklet' is misleading: they have nothing to do with
    'tasks'.

You can tell you are in a softirq (or tasklet) using the
:c:func:`in_softirq()` macro (``include/linux/preempt.h``).

.. warning::

    Beware that this will return a false positive if a
    :ref:`bottom half lock <local_bh_disable>` is held.
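
To see where that false positive comes from, here is a small userspace
model of the bookkeeping. It is only a sketch: the ``model_`` names are
made up for illustration, and the bit layout (an 8-bit softirq field
starting at bit 8, with bottom-half disabling adding twice the softirq
offset) is an assumption based on ``include/linux/preempt.h`` which you
should check against your own tree::

    #include <stdbool.h>

    /* Assumed layout: the softirq count lives in bits 8..15. */
    #define MODEL_SOFTIRQ_SHIFT             8
    #define MODEL_SOFTIRQ_OFFSET            (1UL << MODEL_SOFTIRQ_SHIFT)
    #define MODEL_SOFTIRQ_MASK              (0xffUL << MODEL_SOFTIRQ_SHIFT)
    /* Disabling adds twice the offset, so bit 8 alone means "serving". */
    #define MODEL_SOFTIRQ_DISABLE_OFFSET    (2 * MODEL_SOFTIRQ_OFFSET)

    static unsigned long model_count;   /* stand-in for the preempt count */

    static void model_local_bh_disable(void)
    {
            model_count += MODEL_SOFTIRQ_DISABLE_OFFSET;
    }

    static void model_local_bh_enable(void)
    {
            model_count -= MODEL_SOFTIRQ_DISABLE_OFFSET;
    }

    /* True while serving a softirq OR while bottom halves are disabled. */
    static bool model_in_softirq(void)
    {
            return (model_count & MODEL_SOFTIRQ_MASK) != 0;
    }

    /* True only while a softirq handler is actually running. */
    static bool model_in_serving_softirq(void)
    {
            return (model_count & MODEL_SOFTIRQ_OFFSET) != 0;
    }

In this model, calling ``model_local_bh_disable()`` from plain user
context makes ``model_in_softirq()`` true although no softirq is being
served, while ``model_in_serving_softirq()`` stays false. That is the
distinction to look for in the real header if you need to tell the two
cases apart.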

Some Basic Rules
================

No memory protection
    If you corrupt memory, whether in user context or interrupt context,
    the whole machine will crash. Are you sure you can't do what you
    want in userspace?

No floating point or MMX
    The FPU context is not saved; even in user context the FPU state
    probably won't correspond with the current process: you would mess
    with some user process' FPU state. If you really want to do this,
    you would have to explicitly save/restore the full FPU state (and
    avoid context switches). It is generally a bad idea; use fixed point
    arithmetic first.

A rigid stack limit
    Depending on configuration options the kernel stack is about 3K to
    6K for most 32-bit architectures: it's about 14K on most 64-bit
    archs, and often shared with interrupts so you can't use it all.
    Avoid deep recursion and huge local arrays on the stack (allocate
    them dynamically instead).

The Linux kernel is portable
    Let's keep it that way. Your code should be 64-bit clean, and
    endian-independent. You should also minimize CPU specific stuff,
    e.g. inline assembly should be cleanly encapsulated and minimized to
    ease porting. Generally it should be restricted to the
    architecture-dependent part of the kernel tree.

ioctls: Not writing a new system call
=====================================

A system call generally looks like this::

    asmlinkage long sys_mycall(int arg)
    {
            return 0;
    }


First, in most cases you don't want to create a new system call. You
create a character device and implement an appropriate ioctl for it.
This is much more flexible than system calls, doesn't have to be entered
in every architecture's ``include/asm/unistd.h`` and
``arch/kernel/entry.S`` file, and is much more likely to be accepted by
Linus.

If all your routine does is read or write some parameter, consider
implementing a :c:func:`sysfs()` interface instead.

Inside the ioctl you're in user context to a process. When an error
occurs you return a negated errno (see
``include/uapi/asm-generic/errno-base.h``,
``include/uapi/asm-generic/errno.h`` and ``include/linux/errno.h``),
otherwise you return 0.
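
As a rough illustration of that return convention, here is a userspace
model of an ioctl-style command dispatcher. The command numbers and the
``mydev_`` names are hypothetical; a real handler would be reached
through your driver's file operations and would fetch its argument from
userspace first::

    #include <errno.h>

    /* Hypothetical command numbers, for illustration only. */
    #define MYDEV_CMD_RESET         1
    #define MYDEV_CMD_SET_RATE      2

    static int mydev_rate;

    /* Returns 0 on success or a negated errno, never a positive errno. */
    static long mydev_do_cmd(unsigned int cmd, long arg)
    {
            switch (cmd) {
            case MYDEV_CMD_RESET:
                    mydev_rate = 0;
                    return 0;
            case MYDEV_CMD_SET_RATE:
                    if (arg < 0)
                            return -EINVAL;     /* bad argument */
                    mydev_rate = (int)arg;
                    return 0;
            default:
                    return -ENOTTY;             /* unknown command */
            }
    }

Unknown commands conventionally get ``-ENOTTY``. The caller in
userspace never sees the negative number: the call fails and ``errno``
holds the positive value.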

After you have slept you should check if a signal occurred: the Unix/Linux
way of handling signals is to temporarily exit the system call with the
``-ERESTARTSYS`` error. The system call entry code will switch back to
user context, process the signal handler and then your system call will
be restarted (unless the user disabled that). So you should be prepared
to process the restart, e.g. if you're in the middle of manipulating
some data structure.

::

    if (signal_pending(current))
            return -ERESTARTSYS;


If you're doing longer computations: first think userspace. If you
**really** want to do it in kernel you should regularly check if you need
to give up the CPU (remember there is cooperative multitasking per CPU).
Idiom::

    cond_resched(); /* Will sleep */


A short note on interface design: the UNIX system call motto is "Provide
mechanism not policy".

Recipes for Deadlock
====================

You cannot call any routines which may sleep, unless:

-  You are in user context.

-  You do not own any spinlocks.

-  You have interrupts enabled (actually, Andi Kleen says that the
   scheduling code will enable them for you, but that's probably not
   what you wanted).

Note that some functions may sleep implicitly: common ones are the user
space access functions (\*_user) and memory allocation functions
without ``GFP_ATOMIC``.

You should always compile your kernel with ``CONFIG_DEBUG_ATOMIC_SLEEP``
on, and it will warn you if you break these rules. If you **do** break the
rules, you will eventually lock up your box.

Really.

Common Routines
===============

:c:func:`printk()`
------------------

Defined in ``include/linux/printk.h``

:c:func:`printk()` feeds kernel messages to the console, dmesg, and
the syslog daemon. It is useful for debugging and reporting errors, and
can be used inside interrupt context, but use with caution: a machine
which has its console flooded with printk messages is unusable. It uses
a format string mostly compatible with ANSI C printf, and C string
concatenation to give it a first "priority" argument::

    printk(KERN_INFO "i = %u\n", i);
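
If the "priority" argument looks odd, it helps to see that it is not a
separate argument at all. The sketch below uses local stand-ins (the
``MODEL_`` and ``model_`` names are not kernel names) shaped like the
definitions in ``include/linux/kern_levels.h``, on the assumption that a
level is encoded as an SOH byte followed by a digit::

    /* Stand-ins for the kernel's level macros (assumed encoding). */
    #define MODEL_KERN_SOH  "\001"
    #define MODEL_KERN_INFO MODEL_KERN_SOH "6"

    /* Adjacent string literals are joined by the compiler into one. */
    static const char model_fmt[] = MODEL_KERN_INFO "i = %u\n";

    /* Return the level digit, or -1 if there is no level prefix. */
    static int model_level(const char *fmt)
    {
            if (fmt[0] == '\001' && fmt[1] >= '0' && fmt[1] <= '7')
                    return fmt[1] - '0';
            return -1;
    }

    /* Return the text that follows the level prefix, if any. */
    static const char *model_skip_level(const char *fmt)
    {
            return model_level(fmt) >= 0 ? fmt + 2 : fmt;
    }

So the level simply travels as the first two bytes of the format
string, which is why there is no comma after ``KERN_INFO``.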


See ``include/linux/kern_levels.h`` for other ``KERN_`` values; these are
interpreted by syslog as the level. Special case: for printing an IP
address use::

    __be32 ipaddress;
    printk(KERN_INFO "my ip: %pI4\n", &ipaddress);


:c:func:`printk()` internally uses a 1K buffer and does not catch
overruns. Make sure that will be enough.

.. note::

    You will know when you are a real kernel hacker when you start
    typoing printf as printk in your user programs :)

.. note::

    Another sidenote: the original Unix Version 6 sources had a comment
    on top of its printf function: "Printf should not be used for
    chit-chat". You should follow that advice.

:c:func:`copy_to_user()` / :c:func:`copy_from_user()` / :c:func:`get_user()` / :c:func:`put_user()`
---------------------------------------------------------------------------------------------------

Defined in ``include/linux/uaccess.h`` / ``asm/uaccess.h``

**[SLEEPS]**

:c:func:`put_user()` and :c:func:`get_user()` are used to get
and put single values (such as an int, char, or long) from and to
userspace. A pointer into userspace should never be simply dereferenced:
data should be copied using these routines. Both return ``-EFAULT`` or
0.

:c:func:`copy_to_user()` and :c:func:`copy_from_user()` are
more general: they copy an arbitrary amount of data to and from
userspace.

.. warning::

    Unlike :c:func:`put_user()` and :c:func:`get_user()`, they
    return the amount of uncopied data (i.e. 0 still means success).

[Yes, this objectionable interface makes me cringe. The flamewar comes
up every year or so. --RR.]

The functions may sleep implicitly. They should never be called outside
user context (it makes no sense), with interrupts disabled, or with a
spinlock held.
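
Because the return value is "bytes not copied" rather than an error
code, callers usually turn any shortfall into ``-EFAULT`` themselves.
The following userspace model shows the shape of that idiom. Everything
named ``model_`` is a stand-in invented for this sketch, and the extra
``readable`` parameter only exists to simulate a fault part-way through
the source buffer::

    #include <errno.h>
    #include <string.h>

    /* Stand-in: copies what it can, returns the number of bytes NOT copied. */
    static unsigned long model_copy_from_user(void *to, const void *from,
                                              unsigned long n,
                                              unsigned long readable)
    {
            unsigned long done = n < readable ? n : readable;

            memcpy(to, from, done);
            return n - done;
    }

    struct model_args {
            int rate;
            int channel;
    };

    /* The usual idiom: any shortfall at all becomes -EFAULT. */
    static long model_fetch_args(struct model_args *dst, const void *src,
                                 unsigned long readable)
    {
            if (model_copy_from_user(dst, src, sizeof(*dst), readable))
                    return -EFAULT;
            return 0;
    }

Note that returning the raw result of the copy would hand a positive
byte count back to the caller, which is not a valid error return.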

:c:func:`kmalloc()`/:c:func:`kfree()`
-------------------------------------

Defined in ``include/linux/slab.h``

**[MAY SLEEP: SEE BELOW]**

These routines are used to dynamically request pointer-aligned chunks of
memory, like malloc and free do in userspace, but
:c:func:`kmalloc()` takes an extra flag word. Important values:

``GFP_KERNEL``
    May sleep and swap to free memory. Only allowed in user context, but
    is the most reliable way to allocate memory.

``GFP_ATOMIC``
    Don't sleep. Less reliable than ``GFP_KERNEL``, but may be called
    from interrupt context. You should **really** have a good
    out-of-memory error-handling strategy.

``GFP_DMA``
    Allocate ISA DMA lower than 16MB. If you don't know what that is you
    don't need it. Very unreliable.

If you see a sleeping function called from invalid context warning
message, then maybe you called a sleeping allocation function from
interrupt context without ``GFP_ATOMIC``. You should really fix that.
Run, don't walk.

If you are allocating at least ``PAGE_SIZE`` (``asm/page.h`` or
``asm/page_types.h``) bytes, consider using :c:func:`__get_free_pages()`
(``include/linux/gfp.h``). It takes an order argument (0 for page sized,
1 for double page, 2 for four pages etc.) and the same memory priority
flag word as above.
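
The order is just a base-2 logarithm of the page count, rounded up. As
a sketch of the arithmetic (this ``model_order_for()`` is a made-up
userspace helper, and the page size is passed in rather than assumed;
look for an existing helper in your tree before writing your own)::

    #include <stddef.h>

    /* Smallest order such that (page_size << order) >= size. */
    static unsigned int model_order_for(size_t size, size_t page_size)
    {
            unsigned int order = 0;
            size_t span = page_size;

            while (span < size) {
                    span <<= 1;     /* each order doubles the block */
                    order++;
            }
            return order;
    }

Since each order doubles the allocation, a request just over a power of
two pages wastes nearly half of what you get: with 4096-byte pages,
8193 bytes already needs order 2, i.e. four pages.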

If you are allocating more than a page worth of bytes you can use
:c:func:`vmalloc()`. It'll allocate virtual memory in the kernel
map. This block is not contiguous in physical memory, but the MMU makes
it look like it is for you (so it'll only look contiguous to the CPUs,
not to external device drivers). If you really need large physically
contiguous memory for some weird device, you have a problem: it is
poorly supported in Linux because after some time memory fragmentation
in a running kernel makes it hard. The best way is to allocate the block
early in the boot process via the :c:func:`alloc_bootmem()`
routine.

Before inventing your own cache of often-used objects consider using a
slab cache in ``include/linux/slab.h``.

:c:macro:`current`
------------------

Defined in ``include/asm/current.h``

This global variable (really a macro) contains a pointer to the current
task structure, so is only valid in user context. For example, when a
process makes a system call, this will point to the task structure of
the calling process. It is **not NULL** in interrupt context.

:c:func:`mdelay()`/:c:func:`udelay()`
-------------------------------------

Defined in ``include/asm/delay.h`` / ``include/linux/delay.h``

The :c:func:`udelay()` and :c:func:`ndelay()` functions can be
used for small pauses. Do not use large values with them as you risk
overflow - the helper function :c:func:`mdelay()` is useful here, or
consider :c:func:`msleep()`.

:c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()`
-----------------------------------------------------------------------------------------------

Defined in ``include/asm/byteorder.h``

The :c:func:`cpu_to_be32()` family (where the "32" can be replaced
by 64 or 16, and the "be" can be replaced by "le") are the general way
to do endian conversions in the kernel: they return the converted value.
All variations supply the reverse as well:
:c:func:`be32_to_cpu()`, etc.

There are two major variations of these functions: the pointer
variation, such as :c:func:`cpu_to_be32p()`, which takes a pointer
to the given type, and returns the converted value. The other variation
is the "in-situ" family, such as :c:func:`cpu_to_be32s()`, which
converts the value referred to by the pointer, and returns void.
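
To make the three calling shapes concrete, here is a userspace model of
the 32-bit big-endian members of the family. The ``model_`` functions
are written for this sketch with plain byte shuffling so they behave the
same on any host; they are not the kernel's implementations, which use
whatever the architecture does best::

    #include <stdint.h>
    #include <string.h>

    /* Value form: result's in-memory bytes are most significant first. */
    static uint32_t model_cpu_to_be32(uint32_t v)
    {
            uint8_t b[4] = {
                    (uint8_t)(v >> 24), (uint8_t)(v >> 16),
                    (uint8_t)(v >> 8),  (uint8_t)v,
            };
            uint32_t out;

            memcpy(&out, b, sizeof(out));
            return out;
    }

    /* Reverse direction: read big-endian bytes back into a CPU value. */
    static uint32_t model_be32_to_cpu(uint32_t v)
    {
            uint8_t b[4];

            memcpy(b, &v, sizeof(b));
            return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                   ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    }

    /* Pointer form: reads through the pointer, returns the result. */
    static uint32_t model_cpu_to_be32p(const uint32_t *p)
    {
            return model_cpu_to_be32(*p);
    }

    /* In-situ form: converts the pointed-to value in place. */
    static void model_cpu_to_be32s(uint32_t *p)
    {
            *p = model_cpu_to_be32(*p);
    }

The value and pointer forms leave their input alone, so they suit
building a wire-format copy; the in-situ form overwrites the original,
after which it must no longer be treated as a CPU-order number.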

:c:func:`local_irq_save()`/:c:func:`local_irq_restore()`
--------------------------------------------------------

Defined in ``include/linux/irqflags.h``

These routines disable hard interrupts on the local CPU, and restore
them. They are reentrant, saving the previous state in their one
``unsigned long flags`` argument. If you know that interrupts are
enabled, you can simply use :c:func:`local_irq_disable()` and
:c:func:`local_irq_enable()`.
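
Why the saved state matters is easiest to see with a toy. In the
userspace model below a single boolean plays the part of the CPU's
interrupt-enable flag; all ``model_`` names are invented for the sketch,
and the save routine returns the flags instead of filling in a named
variable the way the real macro does::

    #include <stdbool.h>

    static bool model_irqs_on = true;   /* stand-in for the CPU flag */

    static unsigned long model_irq_save(void)
    {
            unsigned long flags = model_irqs_on;

            model_irqs_on = false;
            return flags;
    }

    static void model_irq_restore(unsigned long flags)
    {
            model_irqs_on = flags != 0;
    }

    static void model_irq_disable(void) { model_irqs_on = false; }
    static void model_irq_enable(void)  { model_irqs_on = true; }

    /* Safe whether the caller had interrupts on or off. */
    static void model_helper_save_restore(void)
    {
            unsigned long flags = model_irq_save();

            /* ... critical section ... */
            model_irq_restore(flags);
    }

    /* Only correct if the caller is known to have interrupts on. */
    static void model_helper_disable_enable(void)
    {
            model_irq_disable();
            /* ... critical section ... */
            model_irq_enable();
    }

If a caller that already has interrupts off invokes
``model_helper_disable_enable()``, it comes back with interrupts on in
the middle of its own critical section. The save/restore helper puts
things back exactly as it found them.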

.. _local_bh_disable:

:c:func:`local_bh_disable()`/:c:func:`local_bh_enable()`
--------------------------------------------------------

Defined in ``include/linux/bottom_half.h``


These routines disable soft interrupts on the local CPU, and restore
them. They are reentrant; if soft interrupts were disabled before, they
will still be disabled after this pair of functions has been called.
They prevent softirqs and tasklets from running on the current CPU.

:c:func:`smp_processor_id()`
----------------------------

Defined in ``include/linux/smp.h``

:c:func:`get_cpu()` disables preemption (so you won't suddenly get
moved to another CPU) and returns the current processor number, between
0 and ``NR_CPUS``. Note that the CPU numbers are not necessarily
continuous. You return it again with :c:func:`put_cpu()` when you
are done.

If you know you cannot be preempted by another task (i.e. you are in
interrupt context, or have preemption disabled) you can use
smp_processor_id().

``__init``/``__exit``/``__initdata``
------------------------------------

Defined in  ``include/linux/init.h``

After boot, the kernel frees up a special section; functions marked with
``__init`` and data structures marked with ``__initdata`` are dropped
after boot is complete: similarly modules discard this memory after
initialization. ``__exit`` is used to declare a function which is only
required on exit: the function will be dropped if this file is not
compiled as a module. See the header file for use. Note that it makes no
sense for a function marked with ``__init`` to be exported to modules
with :c:func:`EXPORT_SYMBOL()` or :c:func:`EXPORT_SYMBOL_GPL()`: this
will break.

:c:func:`__initcall()`/:c:func:`module_init()`
----------------------------------------------

Defined in  ``include/linux/init.h`` / ``include/linux/module.h``

Many parts of the kernel are well served as a module
(dynamically-loadable parts of the kernel). Using the
:c:func:`module_init()` and :c:func:`module_exit()` macros it
is easy to write code without #ifdefs which can operate either as a
module or built into the kernel.

The :c:func:`module_init()` macro defines which function is to be
called at module insertion time (if the file is compiled as a module),
or at boot time: if the file is not compiled as a module the
:c:func:`module_init()` macro becomes equivalent to
:c:func:`__initcall()`, which through linker magic ensures that
the function is called on boot.

The function can return a negative error number to cause module loading
to fail (unfortunately, this has no effect if the module is compiled
into the kernel). This function is called in user context with
interrupts enabled, so it can sleep.

:c:func:`module_exit()`
-----------------------

Defined in ``include/linux/module.h``

This macro defines the function to be called at module removal time (or
never, if the file is compiled into the kernel). It will only
be called if the module usage count has reached zero. This function can
also sleep, but cannot fail: everything must be cleaned up by the time
it returns.

Note that this macro is optional: if it is not present, your module will
not be removable (except for ``rmmod -f``).
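
As a rough sketch of how the init and exit macros fit together (the
``hello_*`` names and the messages are invented for illustration, not taken
from any real driver)::

    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/printk.h>

    /* Hypothetical example module: every hello_* name is made up. */
    static int __init hello_init(void)
    {
            pr_info("hello: loaded\n");
            /* A negative errno here would make module loading fail. */
            return 0;
    }

    /* Dropped by the build if this file is compiled into the kernel. */
    static void __exit hello_exit(void)
    {
            pr_info("hello: unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);

    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("Illustrative module skeleton");

Built as a module, ``hello_init()`` runs at insertion and ``hello_exit()``
at removal; built in, only ``hello_init()`` runs, at boot.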

:c:func:`try_module_get()`/:c:func:`module_put()`
-------------------------------------------------

Defined in ``include/linux/module.h``

These manipulate the module usage count, to protect against removal (a
module also can't be removed if another module uses one of its exported
symbols: see below). Before calling into module code, you should call
:c:func:`try_module_get()` on that module: if it fails, then the
module is being removed and you should act as if it wasn't there.
Otherwise, you can safely enter the module, and call
:c:func:`module_put()` when you're finished.

Most registerable structures have an owner field, such as in the
:c:type:`struct file_operations <file_operations>` structure.
Set this field to the macro ``THIS_MODULE``.
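
A sketch of the pattern, assuming a made-up ``struct frob_ops`` with an
owner field; only :c:func:`try_module_get()`, :c:func:`module_put()` and
``THIS_MODULE`` are real kernel names here::

    #include <linux/errno.h>
    #include <linux/module.h>

    /* Hypothetical registrable structure with an owner field. */
    struct frob_ops {
            struct module *owner;
            int (*frob)(int value);
    };

    /* Provider side: the module that implements the operation. */
    static int my_frob(int value)
    {
            return value + 1;
    }

    static const struct frob_ops my_frob_ops = {
            .owner = THIS_MODULE,
            .frob  = my_frob,
    };

    /* Caller side: only enter the module while holding a reference. */
    static int call_frob(const struct frob_ops *ops, int value)
    {
            int ret;

            if (!try_module_get(ops->owner))
                    return -ENODEV; /* being removed: act as if absent */

            ret = ops->frob(value);
            module_put(ops->owner);
            return ret;
    }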

Wait Queues ``include/linux/wait.h``
====================================

**[SLEEPS]**

A wait queue is used to wait for someone to wake you up when a certain
condition is true. Wait queues must be used carefully to ensure there is no
race condition. You declare a :c:type:`wait_queue_head_t`, and then processes
which want to wait for that condition declare a :c:type:`wait_queue_entry_t`
referring to themselves, and place that in the queue.

Declaring
---------

You declare a ``wait_queue_head_t`` using the
:c:func:`DECLARE_WAIT_QUEUE_HEAD()` macro, or using the
:c:func:`init_waitqueue_head()` routine in your initialization
code.

Queuing
-------

Placing yourself in the wait queue is fairly complex, because you must
put yourself in the queue before checking the condition. There is a
macro to do this: :c:func:`wait_event_interruptible()`
(``include/linux/wait.h``). The first argument is the wait queue head, and
the second is an expression which is evaluated; the macro returns 0 when
this expression is true, or ``-ERESTARTSYS`` if a signal is received. The
:c:func:`wait_event()` version ignores signals.

Waking Up Queued Tasks
----------------------

Call :c:func:`wake_up()` (``include/linux/wait.h``), which will wake
up every process in the queue. The exception is if one has
``WQ_FLAG_EXCLUSIVE`` set, in which case the remainder of the queue will
not be woken. There are other variants of this basic function available
in the same header.
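
Put together, a waiter and a waker might look roughly like this. The
``data_wq`` and ``data_ready`` names are hypothetical, and real code
usually protects the condition with a lock as well::

    #include <linux/compiler.h>
    #include <linux/wait.h>

    /* Hypothetical example: the names below are made up. */
    static DECLARE_WAIT_QUEUE_HEAD(data_wq);
    static int data_ready;

    /* Waiter, in user context: sleeps until data_ready is non-zero. */
    static int wait_for_data(void)
    {
            /* 0 once the condition is true, -ERESTARTSYS on a signal. */
            return wait_event_interruptible(data_wq, READ_ONCE(data_ready));
    }

    /* Waker: make the condition true first, then wake the queue. */
    static void publish_data(void)
    {
            WRITE_ONCE(data_ready, 1);
            wake_up(&data_wq);
    }

Note that the wait macro takes the queue head itself, while
:c:func:`wake_up()` takes a pointer to it.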

Atomic Operations
=================

Certain operations are guaranteed atomic on all platforms. The first
class of operations works on :c:type:`atomic_t` (``include/asm/atomic.h``);
this contains a signed integer (at least 32 bits long), and you must use
these functions to manipulate or read :c:type:`atomic_t` variables.
:c:func:`atomic_read()` and :c:func:`atomic_set()` get and set
the counter; :c:func:`atomic_add()`, :c:func:`atomic_sub()`,
:c:func:`atomic_inc()` and :c:func:`atomic_dec()` modify it; and
:c:func:`atomic_dec_and_test()` returns true if the counter was
decremented to zero.

Yes. It returns true (i.e. != 0) if the atomic variable is zero.

Note that these functions are slower than normal arithmetic, and so
should not be used unnecessarily.
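
For instance, a hand-rolled usage count could be sketched as follows. The
``struct frob`` type and its helpers are hypothetical; new code would
normally reach for ``refcount_t`` or ``struct kref`` rather than a bare
:c:type:`atomic_t`::

    #include <linux/atomic.h>
    #include <linux/slab.h>

    /* Hypothetical reference-counted object. */
    struct frob {
            atomic_t users;
    };

    static struct frob *frob_alloc(void)
    {
            struct frob *f = kmalloc(sizeof(*f), GFP_KERNEL);

            if (f)
                    atomic_set(&f->users, 1);       /* caller's reference */
            return f;
    }

    static void frob_get(struct frob *f)
    {
            atomic_inc(&f->users);
    }

    static void frob_put(struct frob *f)
    {
            /* True only for the caller that drops the count to zero. */
            if (atomic_dec_and_test(&f->users))
                    kfree(f);
    }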

The second class of atomic operations is atomic bit operations on an
``unsigned long``, defined in ``include/linux/bitops.h``. These
operations generally take a pointer to the bit pattern, and a bit
number: 0 is the least significant bit. :c:func:`set_bit()`,
:c:func:`clear_bit()` and :c:func:`change_bit()` set, clear,
and flip the given bit. :c:func:`test_and_set_bit()`,
:c:func:`test_and_clear_bit()` and
:c:func:`test_and_change_bit()` do the same thing, except return
true if the bit was previously set; these are particularly useful for
atomically setting flags.

It is possible to call these operations with bit indices greater than
``BITS_PER_LONG``. The resulting behavior is strange on big-endian
platforms, though, so it is a good idea not to do this.
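
A small sketch of the "atomically setting flags" case, using a made-up
``FROB_BUSY`` flag; note that the bit operations take a bit number, not a
mask::

    #include <linux/bitops.h>
    #include <linux/errno.h>

    #define FROB_BUSY 0     /* hypothetical flag: a bit number */

    static unsigned long frob_flags;

    static int frob_start(void)
    {
            /* Old value is returned: non-zero means someone beat us to it. */
            if (test_and_set_bit(FROB_BUSY, &frob_flags))
                    return -EBUSY;
            return 0;
    }

    static void frob_finish(void)
    {
            clear_bit(FROB_BUSY, &frob_flags);
    }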

Symbols
=======

Within the kernel proper, the normal linking rules apply (i.e. unless a
symbol is declared to be file scope with the ``static`` keyword, it can
be used anywhere in the kernel). However, for modules, a special
exported symbol table is kept which limits the entry points to the
kernel proper. Modules can also export symbols.

:c:func:`EXPORT_SYMBOL()`
-------------------------

Defined in ``include/linux/export.h``

This is the classic method of exporting a symbol: dynamically loaded
modules will be able to use the symbol as normal.

:c:func:`EXPORT_SYMBOL_GPL()`
-----------------------------

Defined in ``include/linux/export.h``

Similar to :c:func:`EXPORT_SYMBOL()` except that the symbols
exported by :c:func:`EXPORT_SYMBOL_GPL()` can only be seen by
modules with a :c:func:`MODULE_LICENSE()` that specifies a
GPL-compatible license. It implies that the function is considered an
internal implementation issue, and not really an interface. Some
maintainers and developers may, however, require
:c:func:`EXPORT_SYMBOL_GPL()` when adding any new APIs or functionality.
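
In source form, exporting looks roughly like this; both functions are
invented for the example::

    #include <linux/export.h>

    /* Hypothetical helper, usable by any module. */
    int frob_add(int a, int b)
    {
            return a + b;
    }
    EXPORT_SYMBOL(frob_add);

    /* Hypothetical internal helper, for GPL-compatible modules only. */
    int frob_internal_state(void)
    {
            return 0;
    }
    EXPORT_SYMBOL_GPL(frob_internal_state);

The export macro follows the definition of the symbol; users still need a
matching declaration from a header.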

:c:func:`EXPORT_SYMBOL_NS()`
----------------------------

Defined in ``include/linux/export.h``

This is the variant of :c:func:`EXPORT_SYMBOL()` that allows specifying a
symbol namespace. Symbol Namespaces are documented in
Documentation/core-api/symbol-namespaces.rst.

:c:func:`EXPORT_SYMBOL_NS_GPL()`
--------------------------------

Defined in ``include/linux/export.h``

This is the variant of :c:func:`EXPORT_SYMBOL_GPL()` that allows specifying
a symbol namespace. Symbol Namespaces are documented in
Documentation/core-api/symbol-namespaces.rst.

Routines and Conventions
========================

Double-linked lists ``include/linux/list.h``
--------------------------------------------

There used to be three sets of linked-list routines in the kernel
headers, but this one is the winner. If you don't have some particular
pressing need for a singly linked list, it's a good choice.

In particular, :c:func:`list_for_each_entry()` is useful.
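
A minimal sketch of walking such a list, with a hypothetical ``struct
frob`` and with locking left out for brevity::

    #include <linux/list.h>

    /* Hypothetical item type: the list_head lives inside the object. */
    struct frob {
            int value;
            struct list_head node;
    };

    static LIST_HEAD(frob_list);

    static void frob_register(struct frob *f)
    {
            list_add_tail(&f->node, &frob_list);
    }

    static int frob_sum(void)
    {
            struct frob *f;
            int sum = 0;

            /* f points at each containing struct frob in turn. */
            list_for_each_entry(f, &frob_list, node)
                    sum += f->value;
            return sum;
    }

The third argument to the iterator is the name of the ``struct
list_head`` member inside the containing structure.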

Return Conventions
------------------

For code called in user context, it's very common to defy C convention,
and return 0 for success, and a negative error number (e.g. ``-EFAULT``) for
failure. This can be unintuitive at first, but it's fairly widespread in
the kernel.

Using :c:func:`ERR_PTR()` (``include/linux/err.h``) to encode a
negative error number into a pointer, and :c:func:`IS_ERR()` and
:c:func:`PTR_ERR()` to get it back out again, avoids a separate
pointer parameter for the error number. Icky, but in a good way.
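
The idea can be tried outside the kernel with a few stand-in macros. The
definitions below are a user-space approximation written for this example
(kernel code would simply include ``include/linux/err.h``), and the
``frob`` names are hypothetical::

    #include <errno.h>
    #include <stddef.h>

    /* User-space stand-ins for the kernel helpers (an approximation). */
    #define MAX_ERRNO       4095
    #define ERR_PTR(err)    ((void *)(long)(err))
    #define PTR_ERR(ptr)    ((long)(ptr))
    #define IS_ERR(ptr)     ((unsigned long)(ptr) >= (unsigned long)-MAX_ERRNO)

    struct frob {
            int id;
    };

    static struct frob frobs[] = { { 1 }, { 2 } };

    /* Returns a valid pointer, or an error number encoded as a pointer. */
    struct frob *frob_lookup(int id)
    {
            size_t i;

            if (id <= 0)
                    return ERR_PTR(-EINVAL);
            for (i = 0; i < sizeof(frobs) / sizeof(frobs[0]); i++)
                    if (frobs[i].id == id)
                            return &frobs[i];
            return ERR_PTR(-ENOENT);
    }

    /* Caller: one return value carries either the object or the error. */
    long frob_id_or_error(int id)
    {
            struct frob *f = frob_lookup(id);

            if (IS_ERR(f))
                    return PTR_ERR(f);
            return f->id;
    }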

Breaking Compilation
--------------------

Linus and the other developers sometimes change function or structure
names in development kernels; this is not done just to keep everyone on
their toes: it reflects a fundamental change (e.g. can no longer be
called with interrupts on, or does extra checks, or doesn't do checks
which were caught before). Usually this is accompanied by a fairly
complete note to the appropriate kernel development mailing list; search
the archives. Simply doing a global replace on the file usually makes
things **worse**.

Initializing structure members
------------------------------

The preferred method of initializing structures is to use designated
initializers, as defined by ISO C99, e.g.::

    static struct block_device_operations opt_fops = {
            .open               = opt_open,
            .release            = opt_release,
            .ioctl              = opt_ioctl,
            .check_media_change = opt_media_change,
    };

This makes it easy to grep for, and makes it clear which structure
fields are set. You should do this because it looks cool.

GNU Extensions
--------------

GNU Extensions are explicitly allowed in the Linux kernel. Note that
some of the more complex ones are not very well supported, due to lack
of general use, but the following are considered standard (see the GCC
info page section "C Extensions" for more details. Yes, really the info
page: the man page is only a short summary of the stuff in info).

-  Inline functions

-  Statement expressions (i.e. the ``({`` and ``})`` constructs).

-  Declaring attributes of a function / variable / type
   (``__attribute__``)

-  ``typeof``

-  Zero length arrays

-  Macro varargs

-  Arithmetic on void pointers

-  Non-Constant initializers

-  Assembler Instructions (not outside arch/ and include/asm/)

-  Function names as strings (``__func__``).

-  ``__builtin_constant_p()``
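
A few of these can be tried in an ordinary user-space file built with GCC
or Clang. The sketch below is illustrative only; ``demo_max()`` and the
function names are made up::

    #include <stddef.h>

    /*
     * Statement expression plus typeof: each argument is evaluated once.
     * Kernel code spells the keyword typeof; __typeof__ also works when
     * the compiler is in a strict ISO mode.
     */
    #define demo_max(a, b) ({               \
            __typeof__(a) _a = (a);         \
            __typeof__(b) _b = (b);         \
            _a > _b ? _a : _b;              \
    })

    int max_once(int *counter, int limit)
    {
            /* The increment happens exactly once inside the macro. */
            return demo_max((*counter)++, limit);
    }

    /* Non-zero when the argument is a compile-time constant. */
    int is_constant_literal(void)
    {
            return __builtin_constant_p(42);
    }

    /* Arithmetic on void pointers counts in bytes, as for char pointers. */
    void *skip_bytes(void *p, size_t n)
    {
            return p + n;
    }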

Be wary when using ``long long`` in the kernel: the code GCC generates for
it is horrible and, worse, division and multiplication do not work on
i386 because the GCC runtime functions for them are missing from the
kernel environment.
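
When a 64-bit division is unavoidable, the helpers in
``include/linux/math64.h`` sidestep the missing runtime functions. A
sketch, with a hypothetical ``frob_average()``::

    #include <linux/math64.h>
    #include <linux/types.h>

    /* Hypothetical helper: 64-bit total averaged over a 32-bit count. */
    static u64 frob_average(u64 total, u32 count)
    {
            /* div_u64() divides a u64 by a u32 without calling libgcc. */
            return count ? div_u64(total, count) : 0;
    }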

C++
---

Using C++ in the kernel is usually a bad idea, because the kernel does
not provide the necessary runtime environment and the include files are
not tested for it. It is still possible, but not recommended. If you
really want to do this, forget about exceptions at least.

#if
---

It is generally considered cleaner to use macros in header files (or at
the top of .c files) to abstract away functions rather than using ``#if``
pre-processor statements throughout the source code.
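
One common shape for this, sketched with a hypothetical ``CONFIG_FROB``
option: the header supplies either the real declaration or a do-nothing
stub, so callers never mention the option. An inline stub is used here in
place of a macro so that the arguments are still type-checked::

    /* frob.h (hypothetical): the only place that tests CONFIG_FROB. */
    #ifdef CONFIG_FROB
    int frob_setup(int unit);
    #else
    static inline int frob_setup(int unit)
    {
            return 0;       /* feature compiled out: nothing to do */
    }
    #endif

    /* driver.c (hypothetical): no #if needed at the call site. */
    int driver_probe(int unit)
    {
            return frob_setup(unit);
    }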

Putting Your Stuff in the Kernel
================================

In order to get your stuff into shape for official inclusion, or even to
make a neat patch, there's administrative work to be done:

-  Figure out who owns the code you've been modifying. Look
   at the top of the source files, inside the ``MAINTAINERS`` file, and
   last of all in the ``CREDITS`` file. You should coordinate with these
   people to make sure you're not duplicating effort, or trying something
   that's already been rejected.

   Make sure you put your name and email address at the top of any files
   you create or modify significantly. This is the first place people
   will look when they find a bug, or when **they** want to make a change.

-  Usually you want a configuration option for your kernel hack. Edit
   ``Kconfig`` in the appropriate directory. The Config language is
   simple to use by cut and paste, and there's complete documentation in
   ``Documentation/kbuild/kconfig-language.rst``.

   In your description of the option, make sure you address both the
   expert user and the user who knows nothing about your feature.
   Mention incompatibilities and issues here. **Definitely** end your
   description with “if in doubt, say N” (or, occasionally, ``Y``); this
   is for people who have no idea what you are talking about.

-  Edit the ``Makefile``: the CONFIG variables are exported here so you
   can usually just add an ``obj-$(CONFIG_xxx) += xxx.o`` line. The syntax
   is documented in ``Documentation/kbuild/makefiles.rst``.

-  Put yourself in ``CREDITS`` if you consider what you've done
   noteworthy, usually beyond a single file (your name should be at the
   top of the source files anyway). ``MAINTAINERS`` means you want to be
   consulted when changes are made to a subsystem, and hear about bugs;
   it implies a more-than-passing commitment to some part of the code.

-  Finally, don't forget to read
   ``Documentation/process/submitting-patches.rst``.

Kernel Cantrips
===============

Some favorites from browsing the source. Feel free to add to this list.

``arch/x86/include/asm/delay.h``::

    #define ndelay(n) (__builtin_constant_p(n) ? \
            ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
            __ndelay(n))

``include/linux/fs.h``::

    /*
     * Kernel pointers have redundant information, so we can use a
     * scheme where we can return either an error code or a dentry
     * pointer with the same return value.
     *
     * This should be a per-architecture thing, to allow different
     * error and pointer decisions.
     */
    #define ERR_PTR(err)    ((void *)((long)(err)))
    #define PTR_ERR(ptr)    ((long)(ptr))
    #define IS_ERR(ptr)     ((unsigned long)(ptr) > (unsigned long)(-1000))

``arch/x86/include/asm/uaccess_32.h``::

    #define copy_to_user(to,from,n)                         \
            (__builtin_constant_p(n) ?                      \
             __constant_copy_to_user((to),(from),(n)) :     \
             __generic_copy_to_user((to),(from),(n)))

``arch/sparc/kernel/head.S``::

    /*
     * Sun people can't spell worth damn. "compatability" indeed.
     * At least we *know* we can't spell, and use a spell-checker.
     */

    /* Uh, actually Linus it is I who cannot spell. Too much murky
     * Sparc assembly will do this to ya.
     */
    C_LABEL(cputypvar):
            .asciz "compatibility"

    /* Tested on SS-5, SS-10. Probably someone at Sun applied a spell-checker. */
            .align 4
    C_LABEL(cputypvar_sun4m):
            .asciz "compatible"

``arch/sparc/lib/checksum.S``::

            /* Sun, you just can't beat me, you just can't.  Stop trying,
             * give up.  I'm serious, I am going to kick the living shit
             * out of you, game over, lights out.
             */

Thanks
======

Thanks to Andi Kleen for the idea, answering my questions, fixing my
mistakes, filling content, etc. Philipp Rumpf for more spelling and
clarity fixes, and some excellent non-obvious points. Werner Almesberger
for giving me a great summary of :c:func:`disable_irq()`, and Jes
Sorensen and Andrea Arcangeli added caveats. Michael Elizabeth Chastain
for checking and adding to the Configure section. Telsa Gwynne for
teaching me DocBook.