=========================
Unaligned Memory Accesses
=========================

:Author: Daniel Drake <dsd@gentoo.org>,
:Author: Johannes Berg <johannes@sipsolutions.net>

:With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt,
  Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz,
  Vadim Lobanov


Linux runs on a wide variety of architectures which have varying behaviour
when it comes to memory access. This document presents some details about
unaligned accesses, why you need to write code that doesn't cause them,
and how to write such code!


The definition of an unaligned access
=====================================

Unaligned memory accesses occur when you try to read N bytes of data starting
from an address that is not evenly divisible by N (i.e. addr % N != 0).
For example, reading 4 bytes of data from address 0x10004 is fine, but
reading 4 bytes of data from address 0x10005 would be an unaligned memory
access.

The above may seem a little vague, as memory access can happen in different
ways. The context here is at the machine code level: certain instructions read
or write a number of bytes to or from memory (e.g. movb, movw, movl in x86
assembly). As will become clear, it is relatively easy to spot C statements
which will compile to multiple-byte memory access instructions, namely when
dealing with types such as u16, u32 and u64.


Natural alignment
=================

The rule mentioned above forms what we refer to as natural alignment:
when accessing N bytes of memory, the base memory address must be evenly
divisible by N, i.e. addr % N == 0.

When writing code, assume the target architecture has natural alignment
requirements.

In reality, only a few architectures require natural alignment on all sizes
of memory access. However, we must consider ALL supported architectures;
writing code that satisfies natural alignment requirements is the easiest way
to achieve full portability.
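
To make the rule concrete, it can be expressed in a couple of lines of C.
The helper below is a sketch for illustration only (the helper name is made
up here; the kernel provides a similar IS_ALIGNED() macro), relying on the
fact that the access sizes we care about are powers of two::

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Sketch only: returns true if an N-byte access at addr would be
     * naturally aligned. Because N is a power of two for the types we
     * care about (2, 4, 8), addr % N can be computed with a cheap
     * bitwise AND.
     */
    static bool is_naturally_aligned(uintptr_t addr, size_t n)
    {
        return (addr & (n - 1)) == 0;
    }

    /* is_naturally_aligned(0x10004, 4) -> true  (0x10004 % 4 == 0) */
    /* is_naturally_aligned(0x10005, 4) -> false (0x10005 % 4 == 1) */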


Why unaligned access is bad
===========================

The effects of performing an unaligned memory access vary from architecture
to architecture. It would be easy to write a whole document on the differences
here; a summary of the common scenarios is presented below:

 - Some architectures are able to perform unaligned memory accesses
   transparently, but there is usually a significant performance cost.
 - Some architectures raise processor exceptions when unaligned accesses
   happen. The exception handler is able to correct the unaligned access,
   at significant cost to performance.
 - Some architectures raise processor exceptions when unaligned accesses
   happen, but the exceptions do not contain enough information for the
   unaligned access to be corrected.
 - Some architectures are not capable of unaligned memory access, but will
   silently perform a different memory access to the one that was requested,
   resulting in a subtle code bug that is hard to detect!

It should be obvious from the above that if your code causes unaligned
memory accesses to happen, your code will not work correctly on certain
platforms and will cause performance problems on others.


Code that does not cause unaligned access
=========================================

At first, the concepts above may seem a little hard to relate to actual
coding practice. After all, you don't have a great deal of control over
the memory addresses of certain variables, etc.

Fortunately things are not too complex, as in most cases, the compiler
ensures that things will work for you. For example, take the following
structure::

    struct foo {
        u16 field1;
        u32 field2;
        u8 field3;
    };

Let us assume that an instance of the above structure resides in memory
starting at address 0x10000. With a basic level of understanding, it would
not be unreasonable to expect that accessing field2 would cause an unaligned
access. You'd be expecting field2 to be located at offset 2 bytes into the
structure, i.e. address 0x10002, but that address is not evenly divisible
by 4 (remember, we're reading a 4 byte value here).

Fortunately, the compiler understands the alignment constraints, so in the
above case it would insert 2 bytes of padding between field1 and field2.
Therefore, for standard structure types you can always rely on the compiler
to pad structures so that accesses to fields are suitably aligned (assuming
you do not cast the field to a type of a different length).
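
If you want to see this padding for yourself, a small userspace sketch
(using the standard C fixed-width types rather than the kernel's u16/u32/u8)
can print the offsets the compiler actually chose::

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    struct foo {
        uint16_t field1;
        uint32_t field2;
        uint8_t field3;
    };

    int main(void)
    {
        /* Typically prints "field1=0 field2=4 field3=8 size=12":
         * two bytes of padding were inserted after field1 so that
         * field2 is 4-byte aligned, plus tail padding at the end. */
        printf("field1=%zu field2=%zu field3=%zu size=%zu\n",
               offsetof(struct foo, field1),
               offsetof(struct foo, field2),
               offsetof(struct foo, field3),
               sizeof(struct foo));
        return 0;
    }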

Similarly, you can also rely on the compiler to align variables and function
parameters to a naturally aligned scheme, based on the size of the type of
the variable.

At this point, it should be clear that accessing a single byte (u8 or char)
will never cause an unaligned access, because all memory addresses are evenly
divisible by one.

On a related topic, with the above considerations in mind you may observe
that you could reorder the fields in the structure in order to place fields
where padding would otherwise be inserted, and hence reduce the overall
resident memory size of structure instances. The optimal layout of the
above example is::

    struct foo {
        u32 field2;
        u16 field1;
        u8 field3;
    };

For a natural alignment scheme, the compiler would only have to add a single
byte of padding at the end of the structure. This padding is added in order
to satisfy alignment constraints for arrays of these structures.

Another point worth mentioning is the use of __attribute__((packed)) on a
structure type. This GCC-specific attribute tells the compiler never to
insert any padding within structures, which is useful when you want to use a
C struct to represent some data that comes in a fixed arrangement 'off the
wire'.

You might be inclined to believe that usage of this attribute can easily
lead to unaligned accesses when accessing fields that do not satisfy
architectural alignment requirements. However, again, the compiler is aware
of the alignment constraints and will generate extra instructions to perform
the memory access in a way that does not cause unaligned access. Of course,
the extra instructions cause a loss in performance compared to the
non-packed case, so the packed attribute should only be used when avoiding
structure padding is of importance.
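
For example, a wire-format header might be described as follows. This struct
is invented for illustration and not taken from any real protocol::

    /* Hypothetical on-the-wire header, for illustration only. */
    struct wire_hdr {
        u8  type;   /* offset 0 */
        u32 seq;    /* offset 1 - not naturally aligned! */
        u16 len;    /* offset 5 */
    } __attribute__((packed));

    /*
     * sizeof(struct wire_hdr) == 7: no padding was inserted. The
     * compiler knows 'seq' may be misaligned and will emit safe
     * (byte-wise or fixup) code when you access it directly through
     * the struct.
     */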


Code that causes unaligned access
=================================

With the above in mind, let's move on to a real-life example of a function
that can cause an unaligned memory access. The following function, taken
from include/linux/etherdevice.h, is an optimized routine to compare two
ethernet MAC addresses for equality::

    bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
    {
    #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
        u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
                   ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));

        return fold == 0;
    #else
        const u16 *a = (const u16 *)addr1;
        const u16 *b = (const u16 *)addr2;
        return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
    #endif
    }

In the above function, when the hardware has efficient unaligned access
capability, there is no issue with this code. But when the hardware isn't
able to access memory on arbitrary boundaries, the reference to a[0] causes
2 bytes (16 bits) to be read from memory starting at address addr1.

Think about what would happen if addr1 was an odd address such as 0x10003.
(Hint: it'd be an unaligned access.)

Despite the potential unaligned access problems with the above function, it
is included in the kernel anyway, but is understood to only work normally on
16-bit-aligned addresses. It is up to the caller to ensure this alignment or
not use this function at all. This alignment-unsafe function is still useful
as it is a decent optimization for the cases when you can ensure alignment,
which is true almost all of the time in the ethernet networking context.
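
When the caller cannot guarantee 16-bit alignment, the kernel provides an
alignment-safe variant, ether_addr_equal_unaligned(), in the same header.
It is built along these lines (sketched from memory here; check
include/linux/etherdevice.h for the current definition)::

    static inline bool ether_addr_equal_unaligned(const u8 *addr1,
                                                  const u8 *addr2)
    {
    #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
        /* Unaligned loads are cheap here; reuse the fast path. */
        return ether_addr_equal(addr1, addr2);
    #else
        /* A byte-wise comparison never performs an unaligned access. */
        return memcmp(addr1, addr2, ETH_ALEN) == 0;
    #endif
    }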

Here is another example of some code that could cause unaligned accesses::

    void myfunc(u8 *data, u32 value)
    {
        [...]
        *((u32 *) data) = cpu_to_le32(value);
        [...]
    }

This code will cause unaligned accesses every time the data parameter points
to an address that is not evenly divisible by 4.

In summary, the 2 main scenarios where you may run into unaligned access
problems involve:

 1. Casting variables to types of different lengths
 2. Pointer arithmetic followed by access to at least 2 bytes of data


Avoiding unaligned accesses
===========================

The easiest way to avoid unaligned access is to use the get_unaligned() and
put_unaligned() macros provided by the <asm/unaligned.h> header file.

Going back to an earlier example of code that potentially causes unaligned
access::

    void myfunc(u8 *data, u32 value)
    {
        [...]
        *((u32 *) data) = cpu_to_le32(value);
        [...]
    }

To avoid the unaligned memory access, you would rewrite it as follows::

    void myfunc(u8 *data, u32 value)
    {
        [...]
        value = cpu_to_le32(value);
        put_unaligned(value, (u32 *) data);
        [...]
    }

The get_unaligned() macro works similarly. Assuming 'data' is a pointer to
memory and you wish to avoid unaligned access, its usage is as follows::

    u32 value = get_unaligned((u32 *) data);

These macros work for memory accesses of any length (not just 32 bits as
in the examples above). Be aware that when compared to standard access of
aligned memory, using these macros to access unaligned memory can be costly
in terms of performance.

If use of such macros is not convenient, another option is to use memcpy(),
where the source or destination (or both) are of type u8* or unsigned char*.
Due to the byte-wise nature of this operation, unaligned accesses are avoided.
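
For example, the myfunc() store above could equally be written with
memcpy(). This sketch assumes the same surrounding code as before::

    void myfunc(u8 *data, u32 value)
    {
        [...]
        value = cpu_to_le32(value);
        /* memcpy() is defined in terms of byte copies, so the compiler
         * will not assume that 'data' is 4-byte aligned. */
        memcpy(data, &value, sizeof(value));
        [...]
    }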


Alignment vs. Networking
========================

On architectures that require aligned loads, networking requires that the IP
header is aligned on a four-byte boundary to optimise the IP stack. For
regular ethernet hardware, the constant NET_IP_ALIGN is used. On most
architectures this constant has the value 2 because the normal ethernet
header is 14 bytes long, so in order to get proper alignment one needs to
DMA to an address which can be expressed as 4*n + 2. One notable exception
here is powerpc, which defines NET_IP_ALIGN to 0 because DMA to unaligned
addresses can be very expensive and dwarf the cost of unaligned loads.

Some ethernet hardware cannot DMA to addresses of the form 4*n + 2, and some
non-ethernet hardware has similar constraints; in those cases it is then
required to copy the incoming frame into an aligned buffer. Because this is
unnecessary on architectures that can do unaligned accesses, the code can be
made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so::

    #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
        skb = original skb
    #else
        skb = copy skb
    #endif
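
In a driver receive path, the usual way to arrive at a 4*n + 2 address is to
reserve NET_IP_ALIGN bytes of headroom before setting up DMA. A simplified
sketch follows; RX_BUF_SIZE is a made-up driver constant, and all other
driver details are omitted::

    struct sk_buff *skb;

    skb = netdev_alloc_skb(dev, RX_BUF_SIZE + NET_IP_ALIGN);
    if (!skb)
        return -ENOMEM;

    /* Shift the data pointer by 2 bytes (on most architectures) so
     * that the 14-byte ethernet header ends on a 4-byte boundary and
     * the IP header that follows it is 4-byte aligned. */
    skb_reserve(skb, NET_IP_ALIGN);

    /* ... set up DMA to skb->data ... */

The kernel also provides a helper, netdev_alloc_skb_ip_align(), which folds
the NET_IP_ALIGN reservation into the allocation.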