====================
Credentials in Linux
====================

By: David Howells <dhowells@redhat.com>

.. contents:: :local:

Overview
========

There are several parts to the security check performed by Linux when one
object acts upon another:

 1. Objects.

     Objects are things in the system that may be acted upon directly by
     userspace programs. Linux has a variety of actionable objects, including:

        - Tasks
        - Files/inodes
        - Sockets
        - Message queues
        - Shared memory segments
        - Semaphores
        - Keys

     As a part of the description of all these objects there is a set of
     credentials. What's in the set depends on the type of object.

 2. Object ownership.

     Amongst the credentials of most objects, there will be a subset that
     indicates the ownership of that object. This is used for resource
     accounting and limitation (disk quotas and task rlimits for example).

     In a standard UNIX filesystem, for instance, this will be defined by the
     UID marked on the inode.

 3. The objective context.

     Also amongst the credentials of those objects, there will be a subset that
     indicates the 'objective context' of that object. This may or may not be
     the same set as in (2) - in standard UNIX files, for instance, this is
     defined by the UID and the GID marked on the inode.

     The objective context is used as part of the security calculation that is
     carried out when an object is acted upon.

 4. Subjects.

     A subject is an object that is acting upon another object.

     Most of the objects in the system are inactive: they don't act on other
     objects within the system. Processes/tasks are the obvious exception:
     they do stuff; they access and manipulate things.

     Objects other than tasks may under some circumstances also be subjects.
     For instance an open file may send SIGIO to a task using the UID and EUID
     given to it by a task that called ``fcntl(F_SETOWN)`` upon it. In this
     case, the file struct will have a subjective context too.

 5. The subjective context.

     A subject has an additional interpretation of its credentials. A subset
     of its credentials forms the 'subjective context'. The subjective context
     is used as part of the security calculation that is carried out when a
     subject acts.

     A Linux task, for example, has the FSUID, FSGID and the supplementary
     group list for when it is acting upon a file - which are quite separate
     from the real UID and GID that normally form the objective context of the
     task.

 6. Actions.

     Linux has a number of actions available that a subject may perform upon an
     object. The set of actions available depends on the nature of the subject
     and the object.

     Actions include reading, writing, creating and deleting files; forking or
     signalling and tracing tasks.

 7. Rules, access control lists and security calculations.

     When a subject acts upon an object, a security calculation is made. This
     involves taking the subjective context, the objective context and the
     action, and searching one or more sets of rules to see whether the subject
     is granted or denied permission to act in the desired manner on the
     object, given those contexts.

     There are two main sources of rules:

     a. Discretionary access control (DAC):

         Sometimes the object will include sets of rules as part of its
         description. This is an 'Access Control List' or 'ACL'. A Linux
         file may supply more than one ACL.

         A traditional UNIX file, for example, includes a permissions mask that
         is an abbreviated ACL with three fixed classes of subject ('user',
         'group' and 'other'), each of which may be granted certain privileges
         ('read', 'write' and 'execute' - whatever those map to for the object
         in question). UNIX file permissions do not allow the arbitrary
         specification of subjects, however, and so are of limited use.

         A Linux file might also sport a POSIX ACL. This is a list of rules
         that grants various permissions to arbitrary subjects.

     b. Mandatory access control (MAC):

         The system as a whole may have one or more sets of rules that get
         applied to all subjects and objects, regardless of their source.
         SELinux and Smack are examples of this.

         In the case of SELinux and Smack, each object is given a label as part
         of its credentials. When an action is requested, they take the
         subject label, the object label and the action and look for a rule
         that says that this action is either granted or denied.


Types of Credentials
====================

The Linux kernel supports the following types of credentials:

 1. Traditional UNIX credentials.

        - Real User ID
        - Real Group ID

     The UID and GID are carried by most, if not all, Linux objects, even if in
     some cases it has to be invented (FAT or CIFS files for example, which are
     derived from Windows). These (mostly) define the objective context of
     that object, with tasks being slightly different in some cases.

        - Effective, Saved and FS User ID
        - Effective, Saved and FS Group ID
        - Supplementary groups

     These are additional credentials used by tasks only. Usually, an
     EUID/EGID/GROUPS will be used as the subjective context, and real UID/GID
     will be used as the objective. For tasks, it should be noted that this is
     not always true.

 2. Capabilities.

        - Set of permitted capabilities
        - Set of inheritable capabilities
        - Set of effective capabilities
        - Capability bounding set

     These are only carried by tasks. They indicate superior capabilities
     granted piecemeal to a task that an ordinary task wouldn't otherwise have.
     These are manipulated implicitly by changes to the traditional UNIX
     credentials, but can also be manipulated directly by the ``capset()``
     system call.

     The permitted capabilities are those caps that the process might grant
     itself to its effective or permitted sets through ``capset()``. The
     inheritable set might also be so constrained.

     The effective capabilities are the ones that a task is actually allowed to
     make use of itself.

     The inheritable capabilities are the ones that may get passed across
     ``execve()``.

     The bounding set limits the capabilities that may be inherited across
     ``execve()``, especially when a binary is executed that will execute as
     UID 0.

 3. Secure management flags (securebits).

     These are only carried by tasks. These govern the way the above
     credentials are manipulated and inherited over certain operations such as
     ``execve()``. They aren't used directly as objective or subjective
     credentials.

 4. Keys and keyrings.

     These are only carried by tasks. They carry and cache security tokens
     that don't fit into the other standard UNIX credentials. They are for
     making such things as network filesystem keys available to the file
     accesses performed by processes, without the necessity of ordinary
     programs having to know about security details involved.

     Keyrings are a special type of key. They carry sets of other keys and can
     be searched for the desired key. Each process may subscribe to a number
     of keyrings:

        Per-thread keyring
        Per-process keyring
        Per-session keyring

     When a process accesses a key, if not already present, it will normally be
     cached on one of these keyrings for future accesses to find.

     For more information on using keys, see ``Documentation/security/keys/*``.

 5. LSM

     The Linux Security Module allows extra controls to be placed over the
     operations that a task may do. Currently Linux supports several LSM
     options.

     Some work by labelling the objects in a system and then applying sets of
     rules (policies) that say what operations a task with one label may do to
     an object with another label.

 6. AF_KEY

     This is a socket-based approach to credential management for networking
     stacks [RFC 2367]. It isn't discussed by this document as it doesn't
     interact directly with task and file credentials; rather it keeps system
     level credentials.


When a file is opened, part of the opening task's subjective context is
recorded in the file struct created. This allows operations using that file
struct to use those credentials instead of the subjective context of the task
that issued the operation. An example of this would be a file opened on a
network filesystem where the credentials of the opened file should be presented
to the server, regardless of who is actually doing a read or a write upon it.


File Markings
=============

Files on disk or obtained over the network may have annotations that form the
objective security context of that file. Depending on the type of filesystem,
this may include one or more of the following:

 * UNIX UID, GID, mode;
 * Windows user ID;
 * Access control list;
 * LSM security label;
 * UNIX exec privilege escalation bits (SUID/SGID);
 * File capabilities exec privilege escalation bits.

These are compared to the task's subjective security context, and certain
operations allowed or disallowed as a result. In the case of ``execve()``, the
privilege escalation bits come into play, and may allow the resulting process
extra privileges, based on the annotations on the executable file.


Task Credentials
================

In Linux, all of a task's credentials are held in a refcounted structure of
type 'struct cred', either directly (uid, gid) or through pointers (groups,
keys, LSM security).
Each task points to its credentials by a pointer called 'cred' in its
task_struct.

Once a set of credentials has been prepared and committed, it may not be
changed, barring the following exceptions:

 1. its reference count may be changed;

 2. the reference count on the group_info struct it points to may be changed;

 3. the reference count on the security data it points to may be changed;

 4. the reference count on any keyrings it points to may be changed;

 5. any keyrings it points to may be revoked, expired or have their security
    attributes changed; and

 6. the contents of any keyrings to which it points may be changed (the whole
    point of keyrings being a shared set of credentials, modifiable by anyone
    with appropriate access).

To alter anything in the cred struct, the copy-and-replace principle must be
adhered to. First take a copy, then alter the copy and then use RCU to change
the task pointer to make it point to the new copy. There are wrappers to aid
with this (see below).

A task may only alter its _own_ credentials; it is no longer permitted for a
task to alter another's credentials. This means the ``capset()`` system call
is no longer permitted to take any PID other than the one of the current
process. Also, the ``keyctl_instantiate()`` and ``keyctl_negate()`` functions
no longer permit attachment to process-specific keyrings in the requesting
process, as the instantiating process may need to create them.


Immutable Credentials
---------------------

Once a set of credentials has been made public (by calling ``commit_creds()``
for example), it must be considered immutable, barring two exceptions:

 1. The reference count may be altered.

 2. While the keyring subscriptions of a set of credentials may not be
    changed, the keyrings subscribed to may have their contents altered.
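
The copy-and-replace rule and the immutability of published credentials can be
sketched in userspace C. This is a simplified model under stated assumptions,
not kernel code: every ``toy_*`` name is hypothetical, a plain pointer store
stands in for the RCU update, and no locking, keyrings or LSM hooks are
modelled.

```c
/* Userspace model only. All toy_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct toy_cred {
	atomic_int usage;	/* reference count: may change after publication */
	unsigned int uid;	/* everything else is frozen once published */
	unsigned int suid;
};

struct toy_task {
	const struct toy_cred *cred;	/* const: published creds are read-only */
};

/* Allocate an initial credential set holding one reference. */
static struct toy_cred *toy_cred_alloc(unsigned int uid, unsigned int suid)
{
	struct toy_cred *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	atomic_init(&c->usage, 1);
	c->uid = uid;
	c->suid = suid;
	return c;
}

/* Take a reference. The const is dropped only to touch the refcount. */
static const struct toy_cred *toy_get_cred(const struct toy_cred *cred)
{
	atomic_fetch_add(&((struct toy_cred *)cred)->usage, 1);
	return cred;
}

/* Drop a reference, freeing the set when the last one goes away. */
static void toy_put_cred(const struct toy_cred *cred)
{
	struct toy_cred *c = (struct toy_cred *)cred;

	if (atomic_fetch_sub(&c->usage, 1) == 1)
		free(c);
}

/* Step 1: copy the task's current set into a new, still-mutable set. */
static struct toy_cred *toy_prepare_creds(const struct toy_task *task)
{
	return toy_cred_alloc(task->cred->uid, task->cred->suid);
}

/* Step 3: publish the altered copy and drop the task's ref on the old set. */
static int toy_commit_creds(struct toy_task *task, struct toy_cred *new)
{
	const struct toy_cred *old = task->cred;

	task->cred = new;	/* the kernel would use an RCU pointer update */
	toy_put_cred(old);
	return 0;
}
```

A caller would prepare a copy, alter the copy (step 2), and then commit it.
Anything still holding a reference to the old set keeps seeing the old,
unchanged values, which is the point of treating published credentials as
immutable.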

To catch accidental credential alteration at compile time, struct task_struct
has _const_ pointers to its credential sets, as does struct file. Furthermore,
certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
pointers, thus rendering casts unnecessary for their callers, although
internally they have to temporarily discard the const qualification to be able
to alter the reference count.


Accessing Task Credentials
--------------------------

A task being able to alter only its own credentials permits the current process
to read or replace its own credentials without the need for any form of locking
-- which simplifies things greatly. It can just call::

	const struct cred *current_cred()

to get a pointer to its credentials structure, and it doesn't have to release
it afterwards.

There are convenience wrappers for retrieving specific aspects of a task's
credentials (the value is simply returned in each case)::

	uid_t current_uid(void)                 Current's real UID
	gid_t current_gid(void)                 Current's real GID
	uid_t current_euid(void)                Current's effective UID
	gid_t current_egid(void)                Current's effective GID
	uid_t current_fsuid(void)               Current's file access UID
	gid_t current_fsgid(void)               Current's file access GID
	kernel_cap_t current_cap(void)          Current's effective capabilities
	struct user_struct *current_user(void)  Current's user account

There are also convenience wrappers for retrieving specific associated pairs of
a task's credentials::

	void current_uid_gid(uid_t *, gid_t *);
	void current_euid_egid(uid_t *, gid_t *);
	void current_fsuid_fsgid(uid_t *, gid_t *);

which return these pairs of values through their arguments after retrieving
them from the current task's credentials.
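
From userspace, the real and effective IDs of the calling process can be
observed with the standard POSIX calls. The sketch below is a userspace
analogue, not kernel code: the helper names ``user_uid_gid()`` and
``user_euid_egid()`` are hypothetical and merely mirror the pointer-argument
style of the pair wrappers above.

```c
/* Userspace analogue only; user_* helper names are hypothetical. */
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the caller's real UID and GID through the two arguments. */
static void user_uid_gid(uid_t *uid, gid_t *gid)
{
	assert(uid && gid);
	*uid = getuid();
	*gid = getgid();
}

/* Return the caller's effective UID and GID through the two arguments. */
static void user_euid_egid(uid_t *euid, gid_t *egid)
{
	assert(euid && egid);
	*euid = geteuid();
	*egid = getegid();
}
```

As with the kernel wrappers, nothing has to be released afterwards: the values
are copied out, so the caller holds no reference to any credentials structure.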


In addition, there is a function for obtaining a reference on the current
process's current set of credentials::

	const struct cred *get_current_cred(void);

and functions for getting references to one of the credentials that don't
actually live in struct cred::

	struct user_struct *get_current_user(void);
	struct group_info *get_current_groups(void);

which get references to the current process's user accounting structure and
supplementary groups list respectively.

Once a reference has been obtained, it must be released with ``put_cred()``,
``free_uid()`` or ``put_group_info()`` as appropriate.


Accessing Another Task's Credentials
------------------------------------

While a task may access its own credentials without the need for locking, the
same is not true of a task wanting to access another task's credentials. It
must use the RCU read lock and ``rcu_dereference()``.

The ``rcu_dereference()`` is wrapped by::

	const struct cred *__task_cred(struct task_struct *task);

This should be used inside the RCU read lock, as in the following example::

	void foo(struct task_struct *t, struct foo_data *f)
	{
		const struct cred *tcred;
		...
		rcu_read_lock();
		tcred = __task_cred(t);
		f->uid = tcred->uid;
		f->gid = tcred->gid;
		f->groups = get_group_info(tcred->groups);
		rcu_read_unlock();
		...
	}

Should it be necessary to hold another task's credentials for a long period of
time, and possibly to sleep while doing so, then the caller should get a
reference on them using::

	const struct cred *get_task_cred(struct task_struct *task);

This does all the RCU magic inside of it. The caller must call ``put_cred()``
on the credentials so obtained once it has finished with them.

.. note::
   The result of ``__task_cred()`` should not be passed directly to
   ``get_cred()`` as this may race with ``commit_creds()``.

There are a couple of convenience functions to access bits of another task's
credentials, hiding the RCU magic from the caller::

	uid_t task_uid(task)		Task's real UID
	uid_t task_euid(task)		Task's effective UID

If the caller is holding the RCU read lock at the time anyway, then::

	__task_cred(task)->uid
	__task_cred(task)->euid

should be used instead. Similarly, if multiple aspects of a task's credentials
need to be accessed, the RCU read lock should be taken, ``__task_cred()``
called, the result stored in a temporary pointer and then the credential
aspects read from that before dropping the lock. This prevents the potentially
expensive RCU magic from being invoked multiple times.

Should some other single aspect of another task's credentials need to be
accessed, then this can be used::

	task_cred_xxx(task, member)

where 'member' is a non-pointer member of the cred struct. For instance::

	uid_t task_cred_xxx(task, suid);

will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
magic. This may not be used for pointer members as what they point to may
disappear the moment the RCU read lock is dropped.


Altering Credentials
--------------------

As previously mentioned, a task may only alter its own credentials, and may not
alter those of another task. This means that it doesn't need to use any
locking to alter its own credentials.

To alter the current process's credentials, a function should first prepare a
new set of credentials by calling::

	struct cred *prepare_creds(void);

This locks ``current->cred_replace_mutex`` and then allocates and constructs a
duplicate of the current process's credentials, returning with the mutex still
held if successful.
It returns NULL if not successful (out of memory).

The mutex prevents ``ptrace()`` from altering the ptrace state of a process
while security checks on credentials construction and changing are taking place
as the ptrace state may alter the outcome, particularly in the case of
``execve()``.

The new credentials set should be altered appropriately, and any security
checks and hooks done. Both the current and the proposed sets of credentials
are available for this purpose as ``current_cred()`` will return the current
set still at this point.

When replacing the group list, the new list must be sorted before it
is added to the credential, as a binary search is used to test for
membership. In practice, this means ``groups_sort()`` should be
called before ``set_groups()`` or ``set_current_groups()``.
``groups_sort()`` must not be called on a ``struct group_info`` which
is shared as it may permute elements as part of the sorting process
even if the array is already sorted.

When the credential set is ready, it should be committed to the current process
by calling::

	int commit_creds(struct cred *new);

This will alter various aspects of the credentials and the process, giving the
LSM a chance to do likewise. It will then use ``rcu_assign_pointer()`` to
actually commit the new credentials to ``current->cred``, release
``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and
notify the scheduler and others of the changes.

This function is guaranteed to return 0, so that it can be tail-called at the
end of such functions as ``sys_setresuid()``.

Note that this function consumes the caller's reference to the new credentials.
The caller should _not_ call ``put_cred()`` on the new credentials afterwards.

Furthermore, once this function has been called on a new set of credentials,
those credentials may _not_ be changed further.


Should the security checks fail or some other error occur after
``prepare_creds()`` has been called, then the following function should be
invoked::

	void abort_creds(struct cred *new);

This releases the lock on ``current->cred_replace_mutex`` that
``prepare_creds()`` got and then releases the new credentials.


A typical credentials alteration function would look something like this::

	int alter_suid(uid_t suid)
	{
		struct cred *new;
		int ret;

		/* Take a private, mutable copy of the current credentials. */
		new = prepare_creds();
		if (!new)
			return -ENOMEM;

		/* Alter the copy, then let the security hook veto the change. */
		new->suid = suid;
		ret = security_alter_suid(new);
		if (ret < 0) {
			abort_creds(new);
			return ret;
		}

		/* Publish the copy; this consumes our reference to it. */
		return commit_creds(new);
	}


Managing Credentials
--------------------

There are some functions to help manage credentials:

 - ``void put_cred(const struct cred *cred);``

     This releases a reference to the given set of credentials. If the
     reference count reaches zero, the credentials will be scheduled for
     destruction by the RCU system.

 - ``const struct cred *get_cred(const struct cred *cred);``

     This gets a reference on a live set of credentials, returning a pointer to
     that set of credentials.

 - ``struct cred *get_new_cred(struct cred *cred);``

     This gets a reference on a set of credentials that is under construction
     and is thus still mutable, returning a pointer to that set of credentials.


Open File Credentials
=====================

When a new file is opened, a reference is obtained on the opening task's
credentials and this is attached to the file struct as ``f_cred`` in place of
``f_uid`` and ``f_gid``. Code that used to access ``file->f_uid`` and
``file->f_gid`` should now access ``file->f_cred->fsuid`` and
``file->f_cred->fsgid``.
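
The idea of a file carrying its opener's credentials can be sketched with a
small userspace model. This is an illustration under stated assumptions, not
kernel code: the ``toy_*`` names and the ownership-only permission rule are
hypothetical, and reference counting on the credentials is left out for
brevity.

```c
/* Userspace model only. All toy_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>

struct toy_cred {
	unsigned int fsuid;
	unsigned int fsgid;
};

struct toy_file {
	const struct toy_cred *f_cred;	/* opener's creds, fixed at open time */
	unsigned int owner_uid;		/* stand-in for the UID on the inode */
};

/* Record the opener's credentials in the file at open time. */
static struct toy_file toy_open(const struct toy_cred *opener,
				unsigned int owner_uid)
{
	struct toy_file f = { .f_cred = opener, .owner_uid = owner_uid };

	assert(opener);
	return f;
}

/*
 * Later check on the open file: the decision uses the credentials recorded
 * in the file and deliberately ignores whoever is calling now.
 */
static bool toy_may_write(const struct toy_file *f,
			  const struct toy_cred *caller)
{
	(void)caller;
	return f->f_cred->fsuid == f->owner_uid;
}
```

In this model, a file opened by an unprivileged task stays unprivileged even
if the open file is later handed to a task with a different ``fsuid``.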

It is safe to access ``f_cred`` without the use of RCU or locking because the
pointer will not change over the lifetime of the file struct, and nor will the
contents of the cred struct pointed to, barring the exceptions listed above
(see the Task Credentials section).

To avoid "confused deputy" privilege escalation attacks, access control checks
during subsequent operations on an opened file should use these credentials
instead of "current"'s credentials, as the file may have been passed to a more
privileged process.

Overriding the VFS's Use of Credentials
=======================================

Under some circumstances it is desirable to override the credentials used by
the VFS, and that can be done by calling into functions such as
``vfs_mkdir()`` with a different set of credentials. This is done in the
following places:

 * ``sys_faccessat()``.
 * ``do_coredump()``.
 * nfs4recover.c.