====================
Credentials in Linux
====================

By: David Howells <dhowells@redhat.com>

.. contents:: :local:

Overview
========

There are several parts to the security check performed by Linux when one
object acts upon another:

 1. Objects.

     Objects are things in the system that may be acted upon directly by
     userspace programs.  Linux has a variety of actionable objects, including:

	- Tasks
	- Files/inodes
	- Sockets
	- Message queues
	- Shared memory segments
	- Semaphores
	- Keys

     As a part of the description of all these objects there is a set of
     credentials.  What's in the set depends on the type of object.

 2. Object ownership.

     Amongst the credentials of most objects, there will be a subset that
     indicates the ownership of that object.  This is used for resource
     accounting and limitation (disk quotas and task rlimits for example).

     In a standard UNIX filesystem, for instance, this will be defined by the
     UID marked on the inode.

 3. The objective context.

     Also amongst the credentials of those objects, there will be a subset that
     indicates the 'objective context' of that object.  This may or may not be
     the same set as in (2) - in standard UNIX files, for instance, this is
     defined by the UID and the GID marked on the inode.

     The objective context is used as part of the security calculation that is
     carried out when an object is acted upon.

 4. Subjects.

     A subject is an object that is acting upon another object.

     Most of the objects in the system are inactive: they don't act on other
     objects within the system.  Processes/tasks are the obvious exception:
     they do stuff; they access and manipulate things.

     Objects other than tasks may under some circumstances also be subjects.
     For instance an open file may send SIGIO to a task using the UID and EUID
     given to it by a task that called ``fcntl(F_SETOWN)`` upon it.  In this case,
     the file struct will have a subjective context too.

 5. The subjective context.

     A subject has an additional interpretation of its credentials.  A subset
     of its credentials forms the 'subjective context'.  The subjective context
     is used as part of the security calculation that is carried out when a
     subject acts.

     A Linux task, for example, has the FSUID, FSGID and the supplementary
     group list for when it is acting upon a file - which are quite separate
     from the real UID and GID that normally form the objective context of the
     task.

 6. Actions.

     Linux has a number of actions available that a subject may perform upon an
     object.  The set of actions available depends on the nature of the subject
     and the object.

     Actions include reading, writing, creating and deleting files; forking or
     signalling and tracing tasks.

 7. Rules, access control lists and security calculations.

     When a subject acts upon an object, a security calculation is made.  This
     involves taking the subjective context, the objective context and the
     action, and searching one or more sets of rules to see whether the subject
     is granted or denied permission to act in the desired manner on the
     object, given those contexts.

     There are two main sources of rules:

     a. Discretionary access control (DAC):

	 Sometimes the object will include sets of rules as part of its
	 description.  This is an 'Access Control List' or 'ACL'.  A Linux
	 file may supply more than one ACL.

	 A traditional UNIX file, for example, includes a permissions mask that
	 is an abbreviated ACL with three fixed classes of subject ('user',
	 'group' and 'other'), each of which may be granted certain privileges
	 ('read', 'write' and 'execute' - whatever those map to for the object
	 in question).  UNIX file permissions do not allow the arbitrary
	 specification of subjects, however, and so are of limited use.

	 A Linux file might also sport a POSIX ACL.  This is a list of rules
	 that grants various permissions to arbitrary subjects.

     b. Mandatory access control (MAC):

	 The system as a whole may have one or more sets of rules that get
	 applied to all subjects and objects, regardless of their source.
	 SELinux and Smack are examples of this.

	 In the case of SELinux and Smack, each object is given a label as part
	 of its credentials.  When an action is requested, they take the
	 subject label, the object label and the action and look for a rule
	 that says that this action is either granted or denied.

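The permissions-mask calculation described in 7(a) can be sketched in
userspace C.  This is a minimal illustration and not the kernel's
implementation: ``subj_ctx``, ``obj_ctx`` and ``dac_mode_permits()`` are
hypothetical names invented here, and the sketch ignores capabilities, POSIX
ACLs and LSM hooks, any of which can change the real outcome.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical subjective context: what the acting task presents. */
struct subj_ctx {
	uid_t fsuid;
	gid_t fsgid;
	const gid_t *groups;	/* supplementary groups */
	size_t ngroups;
};

/* Hypothetical objective context: what is marked on the object. */
struct obj_ctx {
	uid_t uid;
	gid_t gid;
	mode_t mode;		/* rwxrwxrwx permission bits */
};

/* Requested actions, using the traditional r/w/x bit values. */
enum { ACT_EXEC = 1, ACT_WRITE = 2, ACT_READ = 4 };

/* Does the subject's FSGID or any supplementary group match gid? */
static bool in_group(const struct subj_ctx *s, gid_t gid)
{
	if (s->fsgid == gid)
		return true;
	for (size_t i = 0; i < s->ngroups; i++)
		if (s->groups[i] == gid)
			return true;
	return false;
}

/*
 * Pick exactly one class ('user', 'group' or 'other') and test the
 * requested action bits against that class's three permission bits.
 */
bool dac_mode_permits(const struct subj_ctx *s, const struct obj_ctx *o,
		      unsigned int action)
{
	unsigned int bits;

	if (s->fsuid == o->uid)
		bits = (o->mode >> 6) & 7;
	else if (in_group(s, o->gid))
		bits = (o->mode >> 3) & 7;
	else
		bits = o->mode & 7;

	return (bits & action) == action;
}
```

Only one class is consulted in this sketch: an owner whose 'user' bits deny
write is refused even if the 'group' bits would have allowed it.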

Types of Credentials
====================

The Linux kernel supports the following types of credentials:

 1. Traditional UNIX credentials.

	- Real User ID
	- Real Group ID

     The UID and GID are carried by most, if not all, Linux objects, even if in
     some cases they have to be invented (FAT or CIFS files for example, which
     are derived from Windows).  These (mostly) define the objective context of
     that object, with tasks being slightly different in some cases.

	- Effective, Saved and FS User ID
	- Effective, Saved and FS Group ID
	- Supplementary groups

     These are additional credentials used by tasks only.  Usually, an
     EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
     will be used as the objective.  For tasks, it should be noted that this is
     not always true.

 2. Capabilities.

	- Set of permitted capabilities
	- Set of inheritable capabilities
	- Set of effective capabilities
	- Capability bounding set

     These are only carried by tasks.  They indicate superior capabilities
     granted piecemeal to a task that an ordinary task wouldn't otherwise have.
     These are manipulated implicitly by changes to the traditional UNIX
     credentials, but can also be manipulated directly by the ``capset()``
     system call.

     The permitted capabilities are those caps that the process might grant
     itself to its effective or permitted sets through ``capset()``.  The
     inheritable set might also be so constrained.

     The effective capabilities are the ones that a task is actually allowed to
     make use of itself.

     The inheritable capabilities are the ones that may get passed across
     ``execve()``.

     The bounding set limits the capabilities that may be inherited across
     ``execve()``, especially when a binary is executed that will execute as
     UID 0.

 3. Secure management flags (securebits).

     These are only carried by tasks.  These govern the way the above
     credentials are manipulated and inherited over certain operations such as
     ``execve()``.  They aren't used directly as objective or subjective
     credentials.

 4. Keys and keyrings.

     These are only carried by tasks.  They carry and cache security tokens
     that don't fit into the other standard UNIX credentials.  They are for
     making such things as network filesystem keys available to the file
     accesses performed by processes, without the necessity of ordinary
     programs having to know about security details involved.

     Keyrings are a special type of key.  They carry sets of other keys and can
     be searched for the desired key.  Each process may subscribe to a number
     of keyrings:

	- Per-thread keyring
	- Per-process keyring
	- Per-session keyring

     When a process accesses a key, if not already present, it will normally be
     cached on one of these keyrings for future accesses to find.

     For more information on using keys, see ``Documentation/security/keys/*``.

 5. LSM

     The Linux Security Module allows extra controls to be placed over the
     operations that a task may do.  Currently Linux supports several LSM
     options.

     Some work by labelling the objects in a system and then applying sets of
     rules (policies) that say what operations a task with one label may do to
     an object with another label.

 6. AF_KEY

     This is a socket-based approach to credential management for networking
     stacks [RFC 2367].  It isn't discussed by this document as it doesn't
     interact directly with task and file credentials; rather it keeps system
     level credentials.

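Some of the traditional UNIX credentials in (1) can be inspected from
userspace.  The sketch below gathers the real and effective IDs and the
supplementary group list of the calling process using the POSIX calls
``getuid()``, ``geteuid()``, ``getgid()``, ``getegid()`` and ``getgroups()``.
The ``unix_ids`` structure and the two helper functions are hypothetical names
used only for illustration; the saved and FS IDs are not covered.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the IDs that userspace can query directly. */
struct unix_ids {
	uid_t ruid, euid;
	gid_t rgid, egid;
	gid_t *groups;		/* supplementary groups, malloc()ed */
	int ngroups;
};

/* Fill in *ids for the calling process; returns 0 on success, -1 on error. */
int read_unix_ids(struct unix_ids *ids)
{
	int n;

	ids->ruid = getuid();
	ids->euid = geteuid();
	ids->rgid = getgid();
	ids->egid = getegid();

	/* A size of 0 asks getgroups() only for the number of groups. */
	n = getgroups(0, NULL);
	if (n < 0)
		return -1;

	/* Allocate at least one slot so that malloc(0) is never requested. */
	ids->groups = malloc((n ? n : 1) * sizeof(gid_t));
	if (!ids->groups)
		return -1;

	n = getgroups(n, ids->groups);
	if (n < 0) {
		free(ids->groups);
		ids->groups = NULL;
		return -1;
	}
	ids->ngroups = n;
	return 0;
}

/* Print the IDs gathered by read_unix_ids(). */
void print_unix_ids(const struct unix_ids *ids)
{
	printf("uid: real=%u effective=%u\n",
	       (unsigned int)ids->ruid, (unsigned int)ids->euid);
	printf("gid: real=%u effective=%u\n",
	       (unsigned int)ids->rgid, (unsigned int)ids->egid);
	printf("groups:");
	for (int i = 0; i < ids->ngroups; i++)
		printf(" %u", (unsigned int)ids->groups[i]);
	printf("\n");
}
```

The caller owns ``ids->groups`` after a successful call and should ``free()``
it when done.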

When a file is opened, part of the opening task's subjective context is
recorded in the file struct created.  This allows operations using that file
struct to use those credentials instead of the subjective context of the task
that issued the operation.  An example of this would be a file opened on a
network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.

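Of the credential types listed above, the permitted, inheritable and effective
capability sets can also be read from userspace.  The sketch below uses the raw
``capget()`` system call and the definitions in ``<linux/capability.h>``.  The
wrapper names ``read_own_caps()`` and ``cap_is_effective()`` are hypothetical,
and the sketch assumes a kernel that accepts version 3 of the capability
header; it does not read the bounding set.

```c
#include <linux/capability.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: fetch the caller's own capability sets. */
int read_own_caps(struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3])
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 means the caller itself */
	};

	/* Returns 0 on success, -1 with errno set on failure. */
	return (int)syscall(SYS_capget, &hdr, data);
}

/* Test one capability number (for example CAP_CHOWN) in the effective set. */
bool cap_is_effective(const struct __user_cap_data_struct *data, int cap)
{
	return (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap)) != 0;
}
```

Each 32-bit word holds 32 capability numbers, which is why the capability
number is split into an index and a mask.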

File Markings
=============

Files on disk or obtained over the network may have annotations that form the
objective security context of that file.  Depending on the type of filesystem,
this may include one or more of the following:

 * UNIX UID, GID, mode;
 * Windows user ID;
 * Access control list;
 * LSM security label;
 * UNIX exec privilege escalation bits (SUID/SGID);
 * File capabilities exec privilege escalation bits.

These are compared to the task's subjective security context, and certain
operations allowed or disallowed as a result.  In the case of ``execve()``, the
privilege escalation bits come into play, and may allow the resulting process
extra privileges, based on the annotations on the executable file.

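The UNIX markings in the list above are visible from userspace through
``stat()``.  The sketch below collects the UID, GID, permission bits and the
SUID/SGID bits of a file.  The ``file_markings`` structure and
``read_file_markings()`` are hypothetical names; ACLs, LSM labels and file
capabilities are not covered by this sketch.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical summary of the UNIX markings found on one file. */
struct file_markings {
	uid_t uid;
	gid_t gid;
	mode_t perms;	/* rwx bits for user, group and other */
	bool suid;	/* set-user-ID bit */
	bool sgid;	/* set-group-ID bit */
};

/* Returns 0 on success, -1 (with errno set by stat()) on failure. */
int read_file_markings(const char *path, struct file_markings *m)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return -1;

	m->uid = st.st_uid;
	m->gid = st.st_gid;
	m->perms = st.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO);
	m->suid = (st.st_mode & S_ISUID) != 0;
	m->sgid = (st.st_mode & S_ISGID) != 0;
	return 0;
}
```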

Task Credentials
================

In Linux, all of a task's credentials are held in a refcounted structure of
type 'struct cred', either directly (uid, gid) or through pointers to other
structures (groups, keys, LSM security).  Each task points to its credentials
by a pointer called 'cred' in its task_struct.

Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:

 1. its reference count may be changed;

 2. the reference count on the group_info struct it points to may be changed;

 3. the reference count on the security data it points to may be changed;

 4. the reference count on any keyrings it points to may be changed;

 5. any keyrings it points to may be revoked, expired or have their security
    attributes changed; and

 6. the contents of any keyrings to which it points may be changed (the whole
    point of keyrings being a shared set of credentials, modifiable by anyone
    with appropriate access).

To alter anything in the cred struct, the copy-and-replace principle must be
adhered to.  First take a copy, then alter the copy and then use RCU to change
the task pointer to make it point to the new copy.  There are wrappers to aid
with this (see below).

A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials.  This means the ``capset()`` system call
is no longer permitted to take any PID other than the one of the current
process.  Also, the ``keyctl_instantiate()`` and ``keyctl_negate()`` functions
no longer permit attachment to process-specific keyrings in the requesting
process as the instantiating process may need to create them.


Immutable Credentials
---------------------

Once a set of credentials has been made public (by calling ``commit_creds()``
for example), it must be considered immutable, barring two exceptions:

 1. The reference count may be altered.

 2. While the keyring subscriptions of a set of credentials may not be
    changed, the keyrings subscribed to may have their contents altered.

To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file.  Furthermore,
certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
pointers, so callers need no casts; internally, these functions temporarily
drop the const qualification in order to alter the reference count.


Accessing Task Credentials
--------------------------

A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of locking
-- which simplifies things greatly.  It can just call::

	const struct cred *current_cred()

to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.

There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case)::

	uid_t current_uid(void)		Current's real UID
	gid_t current_gid(void)		Current's real GID
	uid_t current_euid(void)	Current's effective UID
	gid_t current_egid(void)	Current's effective GID
	uid_t current_fsuid(void)	Current's file access UID
	gid_t current_fsgid(void)	Current's file access GID
	kernel_cap_t current_cap(void)	Current's effective capabilities
	struct user_struct *current_user(void)  Current's user account

There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials::

	void current_uid_gid(uid_t *, gid_t *);
	void current_euid_egid(uid_t *, gid_t *);
	void current_fsuid_fsgid(uid_t *, gid_t *);

which return these pairs of values through their arguments after retrieving
them from the current task's credentials.


In addition, there is a function for obtaining a reference on the current
process's current set of credentials::

	const struct cred *get_current_cred(void);

and functions for getting references to one of the credentials that don't
actually live in struct cred::

	struct user_struct *get_current_user(void);
	struct group_info *get_current_groups(void);

which get references to the current process's user accounting structure and
supplementary groups list respectively.

Once a reference has been obtained, it must be released with ``put_cred()``,
``free_uid()`` or ``put_group_info()`` as appropriate.


Accessing Another Task's Credentials
------------------------------------

While a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials.  It
must use the RCU read lock and ``rcu_dereference()``.

The ``rcu_dereference()`` is wrapped by::

	const struct cred *__task_cred(struct task_struct *task);

This should be used inside the RCU read lock, as in the following example::

	void foo(struct task_struct *t, struct foo_data *f)
	{
		const struct cred *tcred;
		...
		rcu_read_lock();
		/* tcred is only valid until rcu_read_unlock() */
		tcred = __task_cred(t);
		f->uid = tcred->uid;
		f->gid = tcred->gid;
		/* take a reference so the group list outlives the lock */
		f->groups = get_group_info(tcred->groups);
		rcu_read_unlock();
		...
	}

Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep while doing so, then the caller should get a
reference on them using::

	const struct cred *get_task_cred(struct task_struct *task);

This does all the RCU magic inside of it.  The caller must call ``put_cred()``
on the credentials so obtained once it has finished with them.

.. note::
   The result of ``__task_cred()`` should not be passed directly to
   ``get_cred()`` as this may race with ``commit_creds()``.

There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller::

	uid_t task_uid(task)		Task's real UID
	uid_t task_euid(task)		Task's effective UID

If the caller is holding the RCU read lock at the time anyway, then::

	__task_cred(task)->uid
	__task_cred(task)->euid

should be used instead.  Similarly, if multiple aspects of a task's credentials
need to be accessed, the RCU read lock should be taken, ``__task_cred()``
called, the result stored in a temporary pointer and then the credential
aspects read from that before dropping the lock.  This prevents the potentially
expensive RCU magic from being invoked multiple times.

Should some other single aspect of another task's credentials need to be
accessed, then this can be used::

	task_cred_xxx(task, member)

where 'member' is a non-pointer member of the cred struct.  For instance::

	uid_t task_cred_xxx(task, suid);

will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
magic.  This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.


Altering Credentials
--------------------

As previously mentioned, a task may only alter its own credentials, and may not
alter those of another task.  This means that it doesn't need to use any
locking to alter its own credentials.

To alter the current process's credentials, a function should first prepare a
new set of credentials by calling::

	struct cred *prepare_creds(void);

this locks ``current->cred_replace_mutex`` and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful.  It returns NULL if not successful (out of memory).

The mutex prevents ``ptrace()`` from altering the ptrace state of a process
while security checks on credentials construction and changing are taking place
as the ptrace state may alter the outcome, particularly in the case of
``execve()``.

The new credentials set should be altered appropriately, and any security
checks and hooks done.  Both the current and the proposed sets of credentials
are available for this purpose as ``current_cred()`` will return the current
set still at this point.

When replacing the group list, the new list must be sorted before it
is added to the credential, as a binary search is used to test for
membership.  In practice, this means ``groups_sort()`` should be
called before ``set_groups()`` or ``set_current_groups()``.
``groups_sort()`` must not be called on a ``struct group_info`` which
is shared as it may permute elements as part of the sorting process
even if the array is already sorted.

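The reason for the sorting requirement can be seen in a userspace analogy.
The functions below are hypothetical stand-ins built on the C library's
``qsort()`` and ``bsearch()``; they are not the kernel's ``groups_sort()`` or
its membership test, but they show the same dependency: the binary search only
gives correct answers once the list has been sorted.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* Order two group IDs for qsort() and bsearch(). */
static int gid_cmp(const void *a, const void *b)
{
	gid_t x = *(const gid_t *)a;
	gid_t y = *(const gid_t *)b;

	return (x > y) - (x < y);
}

/* Sort a private (unshared) group list in place. */
void demo_groups_sort(gid_t *gids, size_t n)
{
	qsort(gids, n, sizeof(*gids), gid_cmp);
}

/* Binary search for gid; the list must already be sorted. */
bool demo_in_group(const gid_t *gids, size_t n, gid_t gid)
{
	return bsearch(&gid, gids, n, sizeof(*gids), gid_cmp) != NULL;
}
```
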
When the credential set is ready, it should be committed to the current process
by calling::

	int commit_creds(struct cred *new);

This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise.  It will then use ``rcu_assign_pointer()`` to
actually commit the new credentials to ``current->cred``, release
``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and
notify the scheduler and others of the changes.

This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as ``sys_setresuid()``.

Note that this function consumes the caller's reference to the new credentials.
The caller should _not_ call ``put_cred()`` on the new credentials afterwards.

Furthermore, once this function has been called on a new set of credentials,
those credentials may _not_ be changed further.


Should the security checks fail or some other error occur after
``prepare_creds()`` has been called, then the following function should be
invoked::

	void abort_creds(struct cred *new);

This releases the lock on ``current->cred_replace_mutex`` that
``prepare_creds()`` got and then releases the new credentials.


A typical credentials alteration function would look something like this::

	int alter_suid(uid_t suid)
	{
		struct cred *new;
		int ret;

		/* take a private, mutable copy of the current credentials */
		new = prepare_creds();
		if (!new)
			return -ENOMEM;

		new->suid = suid;
		ret = security_alter_suid(new);
		if (ret < 0) {
			/* discard the copy; current's credentials are unchanged */
			abort_creds(new);
			return ret;
		}

		/* publish the copy; this consumes the reference on 'new' */
		return commit_creds(new);
	}

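The prepare, alter, then commit-or-abort sequence can also be modelled in plain
userspace C.  Everything below is a hypothetical, single-threaded model:
``demo_cred``, ``demo_task`` and the ``demo_*()`` functions are invented names,
there is no mutex and no RCU, and the old set is freed immediately rather than
after a grace period.  It is meant only to show where the copy is made, when it
becomes visible, and who owns which reference.

```c
#include <stdlib.h>

/* Hypothetical, much-reduced stand-in for struct cred. */
struct demo_cred {
	int usage;		/* reference count */
	unsigned int uid;
	unsigned int suid;
};

/* Hypothetical stand-in for a task; the pointer is const on purpose. */
struct demo_task {
	const struct demo_cred *cred;
};

/* Drop one reference, freeing the set when the last one goes away. */
static void demo_put_cred(const struct demo_cred *cred)
{
	struct demo_cred *c = (struct demo_cred *)cred;

	if (--c->usage == 0)
		free(c);
}

/* Step 1: copy the task's current set; the copy is private and mutable. */
struct demo_cred *demo_prepare_creds(const struct demo_task *task)
{
	struct demo_cred *new = malloc(sizeof(*new));

	if (!new)
		return NULL;
	*new = *task->cred;
	new->usage = 1;
	return new;
}

/* Step 3a: publish the copy and drop the task's reference to the old set. */
int demo_commit_creds(struct demo_task *task, struct demo_cred *new)
{
	const struct demo_cred *old = task->cred;

	task->cred = new;	/* the kernel uses rcu_assign_pointer() here */
	demo_put_cred(old);
	return 0;
}

/* Step 3b: discard the copy; the task's current set is untouched. */
void demo_abort_creds(struct demo_cred *new)
{
	demo_put_cred(new);
}

/* Step 2 happens between prepare and commit: alter and check the copy. */
int demo_alter_suid(struct demo_task *task, unsigned int suid, int allowed)
{
	struct demo_cred *new = demo_prepare_creds(task);

	if (!new)
		return -1;

	new->suid = suid;
	if (!allowed) {		/* stands in for a failed security check */
		demo_abort_creds(new);
		return -1;
	}
	return demo_commit_creds(task, new);
}
```

In this model, as in the text above, the committed set is never written to
again: every later change goes through a fresh copy.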

Managing Credentials
--------------------

There are some functions to help manage credentials:

 - ``void put_cred(const struct cred *cred);``

     This releases a reference to the given set of credentials.  If the
     reference count reaches zero, the credentials will be scheduled for
     destruction by the RCU system.

 - ``const struct cred *get_cred(const struct cred *cred);``

     This gets a reference on a live set of credentials, returning a pointer to
     that set of credentials.

 - ``struct cred *get_new_cred(struct cred *cred);``

     This gets a reference on a set of credentials that is under construction
     and is thus still mutable, returning a pointer to that set of credentials.


Open File Credentials
=====================

When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as ``f_cred`` in place of
``f_uid`` and ``f_gid``.  Code that used to access ``file->f_uid`` and
``file->f_gid`` should now access ``file->f_cred->fsuid`` and
``file->f_cred->fsgid``.

It is safe to access ``f_cred`` without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).

To avoid "confused deputy" privilege escalation attacks, access control checks
during subsequent operations on an opened file should use these credentials
instead of "current"'s credentials, as the file may have been passed to a more
privileged process.

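The "confused deputy" rule can be illustrated with a small userspace model.
The structures and functions below are hypothetical and far simpler than the
kernel's: the model stores a bare pointer where the kernel takes a reference,
and the owner-only rule is just an example policy.  The point is which
credentials the check consults.

```c
#include <stdbool.h>

/* Hypothetical, reduced credential set. */
struct demo_cred {
	unsigned int fsuid;
};

/* Hypothetical open file: remembers who opened it. */
struct demo_file {
	const struct demo_cred *f_cred;	/* captured at open time */
	unsigned int owner_uid;		/* owner of the underlying object */
};

/* Record the opener's credentials in the file at open time. */
void demo_open(struct demo_file *file, const struct demo_cred *opener,
	       unsigned int owner_uid)
{
	file->f_cred = opener;
	file->owner_uid = owner_uid;
}

/*
 * Owner-only check for a later operation on the open file.  The decision
 * uses the opener's credentials, not those of whichever task calls now.
 */
bool demo_write_permitted(const struct demo_file *file,
			  const struct demo_cred *caller)
{
	(void)caller;	/* deliberately ignored */
	return file->f_cred->fsuid == file->owner_uid;
}
```
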
Overriding the VFS's Use of Credentials
=======================================

Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as
``vfs_mkdir()`` with a different set of credentials.  This is done in the
following places:

 * ``sys_faccessat()``.
 * ``do_coredump()``.
 * nfs4recover.c.
564af777cd1SKees Cook * nfs4recover.c.
565