=============================
No-MMU memory mapping support
=============================

The kernel has limited support for memory mapping under no-MMU conditions, such
as those used in uClinux environments. From the userspace point of view, memory
mapping is used by the mmap() system call, the shmat() call and the execve()
system call. From the kernel's point of view, execve() mapping is actually
performed by the binfmt drivers, which call back into the mmap() routines to do
the actual work.

Memory mapping behaviour also involves the way fork(), vfork(), clone() and
ptrace() work. Under uClinux there is no fork(), and clone() must be supplied
the CLONE_VM flag.

The behaviour is similar between the MMU and no-MMU cases, but not identical,
and it is much more restricted in the latter case:

 (#) Anonymous mapping, MAP_PRIVATE

        In the MMU case: VM regions backed by arbitrary pages; copy-on-write
        across fork.

        In the no-MMU case: VM regions backed by arbitrary contiguous runs of
        pages.

 (#) Anonymous mapping, MAP_SHARED

        These behave very much like private mappings, except that in the MMU
        case they are shared across fork() or clone() without CLONE_VM. Since
        the no-MMU case supports neither fork() nor clone() without CLONE_VM,
        the behaviour there is identical to MAP_PRIVATE.

 (#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE

        In the MMU case: VM regions backed by pages read from file; changes to
        the underlying file are reflected in the mapping; copied across fork.

        In the no-MMU case:

        - The kernel will reuse an existing mapping of the same segment of
          the same file if one exists with compatible permissions, even if
          it was created by another process.

        - If possible, the file will be mapped directly from the backing
          device, provided that device has the NOMMU_MAP_DIRECT capability
          and appropriate mapping protection capabilities. Ramfs, romfs,
          cramfs and MTD might all permit this.

        - If the backing device can't or won't permit direct sharing, but
          does have the NOMMU_MAP_COPY capability, then a copy of the
          relevant part of the file will be read into a contiguous block of
          memory and any space beyond the EOF will be cleared.

        - Writes to the file do not affect the mapping; writes to the mapping
          are visible in other processes (there is no MMU protection), but
          should not happen.

 (#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE

        In the MMU case: like the non-PROT_WRITE case, except that the pages in
        question get copied before the write actually happens. From that point
        on, writes to the file underneath that page are no longer reflected in
        the mapping's backing pages, and the page is backed by swap instead.

        In the no-MMU case: works much like the non-PROT_WRITE case, except
        that a copy is always taken and never shared.

 (#) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: VM regions backed by pages read from file; changes to
        the pages are written back to the file; writes to the file are
        reflected in the pages backing the mapping; shared across fork.

        In the no-MMU case: not supported.

 (#) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: As for ordinary regular files.

        In the no-MMU case: The filesystem providing the memory-backed file
        (such as ramfs or tmpfs) may choose to honour an open, truncate, mmap
        sequence by providing a contiguous sequence of pages to map. In that
        case, a shared-writable memory mapping will be possible, and it will
        work as in the MMU case. If the filesystem does not provide any such
        support, then the mapping request will be denied.

 (#) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: As for ordinary regular files.

        In the no-MMU case: As for memory backed regular files, but the
        blockdev must be able to provide a contiguous run of pages without
        truncate being called. The ramdisk driver could do this if it allocated
        all its memory as a contiguous array up front.

 (#) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE

        In the MMU case: As for ordinary regular files.

        In the no-MMU case: The character device driver may choose to honour
        the mmap() by providing direct access to the underlying device if it
        provides memory or quasi-memory that can be accessed directly. Examples
        of such devices are frame buffers and flash devices. If the driver does
        not provide any such support, then the mapping request will be denied.


Further notes on no-MMU MMAP
============================

 (#) A request for a private mapping of a file may return a buffer that is not
     page-aligned. This is because XIP (execute in place) may take place, and
     the data may not be page-aligned in the backing store.

 (#) A request for an anonymous mapping will always be page-aligned. If
     possible, the size of the request should be a power of two; otherwise
     some of the space may be wasted, as the kernel must allocate a power-of-2
     granule but will only discard the excess if appropriately configured,
     because discarding it affects fragmentation.

 (#) The memory allocated by a request for an anonymous mapping will normally
     be cleared by the kernel before being returned, in accordance with the
     Linux man pages (ver 2.22 or later).

     In the MMU case, this can be achieved with reasonable performance, as
     regions are backed by virtual pages, with the contents only being mapped
     to cleared physical pages when a write happens on that specific page
     (prior to which, the pages are effectively mapped to the global zero page
     from which reads can take place). This spreads the cost of initializing
     the pages over the life of the mapping, according to how much of it is
     actually written.

     In the no-MMU case, however, anonymous mappings are backed by physical
     pages, and the entire map is cleared at allocation time. This can cause
     significant delays during a userspace malloc(), as the C library does an
     anonymous mapping and the kernel then does a memset for the entire map.

     However, for memory that does not need to be pre-cleared, such as that
     returned by malloc(), mmap() can take a MAP_UNINITIALIZED flag to
     indicate to the kernel that it shouldn't bother clearing the memory before
     returning it. Note that CONFIG_MMAP_ALLOW_UNINITIALIZED must be enabled
     to permit this; otherwise the flag will be ignored.

     uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses it
     to allocate the brk and stack region.

 (#) A list of all the private copy and anonymous mappings on the system is
     visible through /proc/maps in no-MMU mode.

 (#) A list of all the mappings in use by a process is visible through
     /proc/<pid>/maps in no-MMU mode.

 (#) Supplying MAP_FIXED or requesting a particular mapping address will
     result in an error.

 (#) Files mapped privately usually have to have a read method provided by the
     driver or filesystem so that the contents can be read into the memory
     allocated if mmap() chooses not to map the backing device directly. An
     error will result if they don't. This is most likely to be encountered
     with character device files, pipes, FIFOs and sockets.


Interprocess shared memory
==========================

Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU
mode: the former through the usual mechanism, the latter through files created
on ramfs or tmpfs mounts.


Futexes
=======

Futexes are supported in NOMMU mode if the architecture supports them.
An error will be returned if an address passed to the futex system call lies
outside the mappings made by a process, or if the mapping in which the address
lies does not support futexes (such as an I/O chardev mapping).


No-MMU mremap
=============

The mremap() function is partially supported. It may change the size of a
mapping, and may move it [#]_ if MREMAP_MAYMOVE is specified and if the new size
of the mapping exceeds the size of the slab object currently occupied by the
memory to which the mapping refers, or if a smaller slab object could be used.

MREMAP_FIXED is not supported, though it is ignored if there's no change of
address and the object does not need to be moved.

Shared mappings may not be moved. Shareable mappings may not be moved either,
even if they are not currently shared.

The mremap() function must be given an exact match for the base address and
size of a previously mapped object. It may not be used to create holes in
existing mappings, move parts of existing mappings or resize parts of mappings.
It must act on a complete mapping.

.. [#] Not currently supported.
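
As a hedged illustration of these rules, the userspace sketch below resizes a
whole anonymous mapping with mremap(). The helper names resize_whole_mapping()
and resize_and_check() are hypothetical. The call always passes the exact base
address and length that mmap() returned, as no-MMU requires, and treats failure
as an ordinary outcome: since moving is not currently supported on no-MMU, a
request that cannot be satisfied in place may fail even with MREMAP_MAYMOVE.

```c
#define _GNU_SOURCE /* mremap() and MREMAP_MAYMOVE are Linux extensions */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Resize a complete mapping previously returned by mmap(). addr and
 * old_len must describe that whole mapping exactly; no-MMU only accepts
 * requests that cover a complete mapping. Returns the (possibly moved)
 * address, or NULL on failure, in which case the old mapping is unchanged.
 */
void *resize_whole_mapping(void *addr, size_t old_len, size_t new_len)
{
    void *p = mremap(addr, old_len, new_len, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Map old_pages of anonymous memory, resize it to new_pages and check that
 * the first byte survived. Returns 1 on success and 0 on any failure.
 */
int resize_and_check(size_t old_pages, size_t new_pages)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    size_t old_len = old_pages * pg;
    size_t new_len = new_pages * pg;

    unsigned char *p = mmap(NULL, old_len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 0;
    p[0] = 0x5a;

    unsigned char *q = resize_whole_mapping(p, old_len, new_len);
    if (q == NULL) {
        /* The original mapping is still ours to release. */
        munmap(p, old_len);
        return 0;
    }

    int ok = (q[0] == 0x5a);
    munmap(q, new_len);
    return ok;
}
```

On an MMU kernel both growing and shrinking are expected to succeed; on a
no-MMU kernel, growing past the space already allocated behind the mapping may
fail, which is why the caller keeps responsibility for the original mapping.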


Providing shareable character device support
============================================

To provide shareable character device support, a driver must provide a
file->f_op->get_unmapped_area() operation. The mmap() routines will call this
to get a proposed address for the mapping. This may return an error if the
driver doesn't wish to honour the mapping, for example because it is too long,
starts at an unsupported offset, or uses an unsupported combination of flags.

The driver should also provide backing device information with capabilities set
to indicate the permitted types of mapping on such devices. By default, such a
device is assumed to be readable and writable, not executable, and shareable
only directly (it cannot be copied).

The file->f_op->mmap() operation will be called to actually set up the mapping.
It can be rejected at that point. Returning -ENOSYS from it will cause the
mapping to be copied instead if NOMMU_MAP_COPY is specified.

The vm_ops->close() routine will be invoked when the last mapping on a chardev
is removed. Where possible, an existing mapping will be shared, in whole or in
part, without notifying the driver.

The file->f_op->get_unmapped_area() operation is also permitted to return
-ENOSYS. This is taken to mean that the driver does not want to handle the
request, even though it provides the operation. For instance, it might try to
redirect the call to a secondary driver which turns out not to implement it;
the framebuffer driver does this when it passes the call on to the
device-specific driver. In such cases, the mapping request will be rejected if
NOMMU_MAP_COPY is not specified, and a copy will be mapped otherwise.

.. important::

        Some types of device may present a different appearance to anyone
        looking at them in certain modes. Flash chips can be like this; for
        instance, if they're in programming or erase mode, you might see the
        status reflected in the mapping instead of the data.

        In such a case, care must be taken that userspace does not see a
        shared or private mapping showing such information while the driver
        is busy controlling the device. Remember especially: private
        executable mappings may still be mapped directly off the device under
        some circumstances!
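
The fallback rules in this section can be summarised as a small decision
function. This is a userspace model, not kernel code: model_chardev_map() and
the map_outcome values are hypothetical, 0 or a negative errno stands in for
what the driver's get_unmapped_area() and mmap() operations report, and
can_copy stands in for the NOMMU_MAP_COPY capability. Rejection when mmap()
returns -ENOSYS without NOMMU_MAP_COPY is inferred from the text rather than
stated in it.

```c
#include <errno.h>
#include <stdbool.h>

/* Possible results of a chardev mapping request in this model. */
enum map_outcome {
    OUTCOME_REJECTED, /* the mapping request is denied */
    OUTCOME_DIRECT,   /* the device memory is mapped directly */
    OUTCOME_COPIED,   /* a copy is mapped instead */
};

/*
 * can_copy: whether the device advertises NOMMU_MAP_COPY (modelled).
 * gua_ret:  0 if get_unmapped_area() proposed an address, else -errno.
 * mmap_ret: 0 if the driver's mmap() accepted the mapping, else -errno.
 */
enum map_outcome model_chardev_map(bool can_copy, int gua_ret, int mmap_ret)
{
    /* -ENOSYS: the operation exists but declines to handle this request. */
    if (gua_ret == -ENOSYS)
        return can_copy ? OUTCOME_COPIED : OUTCOME_REJECTED;
    /* Any other error from get_unmapped_area() refuses the mapping. */
    if (gua_ret < 0)
        return OUTCOME_REJECTED;

    /* The driver's mmap() may still reject the mapping or defer to a copy. */
    if (mmap_ret == -ENOSYS)
        return can_copy ? OUTCOME_COPIED : OUTCOME_REJECTED;
    if (mmap_ret < 0)
        return OUTCOME_REJECTED;

    return OUTCOME_DIRECT;
}
```

The model deliberately leaves out sharing of existing mappings and the
protection checks, which happen elsewhere in the mapping path.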


Providing shareable memory-backed file support
==============================================

Providing shared mappings on memory-backed files is similar to supporting
shared mappings of character devices. The main difference is that the
filesystem providing the service will probably allocate a contiguous collection
of pages and permit mappings to be made on that.

If such a file is empty, it is recommended that a truncate operation that
increases its size be taken as a request to gather enough pages to honour a
mapping. This is required to support POSIX shared memory.

Memory-backed devices are indicated by the mapping's backing device info having
the memory_backed flag set.


Providing shareable block device support
========================================

Providing shared mappings on block device files works exactly as for character
devices. If there isn't a real device underneath, then the driver should
allocate sufficient contiguous memory to honour any supported mapping.
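
From userspace, the open, truncate, mmap sequence recommended above for
memory-backed files (which is also how POSIX shared memory is provided on ramfs
or tmpfs) might look like the following sketch. The helper name
shared_mapping_roundtrip() is hypothetical, and on a no-MMU system the path is
assumed to lie on a ramfs or tmpfs mount. The file is created empty, extended
to one page with ftruncate(), and then mapped twice with MAP_SHARED so that a
write through one mapping can be observed through the other.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a write through one MAP_SHARED mapping is visible through a
 * second mapping of the same file, 0 if it is not, and -1 on error.
 */
int shared_mapping_roundtrip(const char *path)
{
    long pg = sysconf(_SC_PAGESIZE);
    char *a = MAP_FAILED;
    char *b = MAP_FAILED;
    int result = -1;

    /* Start from an empty file. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Growing the empty file is the hint a memory-backed filesystem may
     * use to gather a contiguous run of pages for the mapping. */
    if (pg > 0 && ftruncate(fd, (off_t)pg) == 0) {
        a = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        b = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (a != MAP_FAILED && b != MAP_FAILED) {
            a[0] = 42;             /* write through the first mapping */
            result = (b[0] == 42); /* read it back through the second */
        }
    }

    if (a != MAP_FAILED)
        munmap(a, (size_t)pg);
    if (b != MAP_FAILED)
        munmap(b, (size_t)pg);
    close(fd);
    unlink(path);
    return result;
}
```

If the filesystem does not honour the sequence, mmap() fails and the sketch
reports -1.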


Adjusting page trimming behaviour
=================================

NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages
when performing an allocation. This can have adverse effects on memory
fragmentation, so the behaviour is configurable. The default behaviour is to
aggressively trim allocations and discard any excess pages back into the page
allocator. In order to retain finer-grained control over fragmentation, this
behaviour can either be disabled completely, or raised so that trimming only
begins once the excess reaches a higher page watermark.

Page trimming behaviour is configurable via the sysctl ``vm.nr_trim_pages``.
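
The arithmetic behind this setting can be modelled as follows. This is a
sketch based on the description of ``vm.nr_trim_pages`` in the kernel's sysctl
documentation (0 disables trimming, 1 trims aggressively and is the default,
and larger values act as the watermark at which trimming begins). It assumes
the excess is compared against the watermark with >=, and nommu_pages_kept()
is a hypothetical helper, not a kernel function.

```c
/*
 * Hypothetical model of no-MMU page trimming. A request for `requested`
 * pages is first rounded up to a power-of-2 allocation; the excess is
 * returned to the page allocator only if trimming is enabled
 * (nr_trim_pages != 0) and the excess reaches the nr_trim_pages watermark.
 * Returns the number of pages the mapping ends up holding.
 */
unsigned long nommu_pages_kept(unsigned long requested,
                               unsigned long nr_trim_pages)
{
    unsigned long total = 1;

    if (requested == 0)
        return 0;
    while (total < requested)
        total <<= 1; /* round up to the next power of two */

    unsigned long excess = total - requested;
    if (nr_trim_pages != 0 && excess >= nr_trim_pages)
        return requested; /* the excess pages are trimmed off */
    return total;         /* the whole power-of-2 block is kept */
}
```

With the default of 1, a 5-page request rounded up to 8 pages is trimmed back
to 5; with ``vm.nr_trim_pages`` set to 4, the 3-page excess is below the
watermark and all 8 pages are kept. On a no-MMU kernel the value can be changed
at runtime with ``sysctl -w vm.nr_trim_pages=4``.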