15ecd0a06SJonathan Corbet================================= 25ecd0a06SJonathan Corbet(How to avoid) Botching up ioctls 35ecd0a06SJonathan Corbet================================= 45ecd0a06SJonathan Corbet 5e7b4311eSAlexander A. KlimovFrom: https://blog.ffwll.ch/2013/11/botching-up-ioctls.html 65ecd0a06SJonathan Corbet 75ecd0a06SJonathan CorbetBy: Daniel Vetter, Copyright © 2013 Intel Corporation 85ecd0a06SJonathan Corbet 95ecd0a06SJonathan CorbetOne clear insight kernel graphics hackers gained in the past few years is that 105ecd0a06SJonathan Corbettrying to come up with a unified interface to manage the execution units and 115ecd0a06SJonathan Corbetmemory on completely different GPUs is a futile effort. So nowadays every 125ecd0a06SJonathan Corbetdriver has its own set of ioctls to allocate memory and submit work to the GPU. 135ecd0a06SJonathan CorbetWhich is nice, since there's no more insanity in the form of fake-generic, but 145ecd0a06SJonathan Corbetactually only used once interfaces. But the clear downside is that there's much 155ecd0a06SJonathan Corbetmore potential to screw things up. 165ecd0a06SJonathan Corbet 175ecd0a06SJonathan CorbetTo avoid repeating all the same mistakes again I've written up some of the 185ecd0a06SJonathan Corbetlessons learned while botching the job for the drm/i915 driver. Most of these 195ecd0a06SJonathan Corbetonly cover technicalities and not the big-picture issues like what the command 205ecd0a06SJonathan Corbetsubmission ioctl exactly should look like. Learning these lessons is probably 215ecd0a06SJonathan Corbetsomething every GPU driver has to do on its own. 225ecd0a06SJonathan Corbet 235ecd0a06SJonathan Corbet 245ecd0a06SJonathan CorbetPrerequisites 255ecd0a06SJonathan Corbet------------- 265ecd0a06SJonathan Corbet 275ecd0a06SJonathan CorbetFirst the prerequisites. Without these you have already failed, because you 285ecd0a06SJonathan Corbetwill need to add a 32-bit compat layer: 295ecd0a06SJonathan Corbet 305ecd0a06SJonathan Corbet * Only use fixed sized integers. To avoid conflicts with typedefs in userspace 315ecd0a06SJonathan Corbet the kernel has special types like __u32, __s64. Use them. 325ecd0a06SJonathan Corbet 335ecd0a06SJonathan Corbet * Align everything to the natural size and use explicit padding. 32-bit 345ecd0a06SJonathan Corbet platforms don't necessarily align 64-bit values to 64-bit boundaries, but 355ecd0a06SJonathan Corbet 64-bit platforms do. So we always need padding to the natural size to get 365ecd0a06SJonathan Corbet this right. 375ecd0a06SJonathan Corbet 385ecd0a06SJonathan Corbet * Pad the entire struct to a multiple of 64-bits if the structure contains 395ecd0a06SJonathan Corbet 64-bit types - the structure size will otherwise differ on 32-bit versus 405ecd0a06SJonathan Corbet 64-bit. Having a different structure size hurts when passing arrays of 415ecd0a06SJonathan Corbet structures to the kernel, or if the kernel checks the structure size, which 425ecd0a06SJonathan Corbet e.g. the drm core does. 435ecd0a06SJonathan Corbet 4453379797SFederico Vaga * Pointers are __u64, cast from/to a uintptr_t on the userspace side and 455ecd0a06SJonathan Corbet from/to a void __user * in the kernel. Try really hard not to delay this 465ecd0a06SJonathan Corbet conversion or worse, fiddle the raw __u64 through your code since that 475ecd0a06SJonathan Corbet diminishes the checking tools like sparse can provide. The macro 485ecd0a06SJonathan Corbet u64_to_user_ptr can be used in the kernel to avoid warnings about integers 49b275fb60SChris Packham and pointers of different sizes. 505ecd0a06SJonathan Corbet 515ecd0a06SJonathan Corbet 525ecd0a06SJonathan CorbetBasics 535ecd0a06SJonathan Corbet------ 545ecd0a06SJonathan Corbet 555ecd0a06SJonathan CorbetWith the joys of writing a compat layer avoided we can take a look at the basic 565ecd0a06SJonathan Corbetfumbles. Neglecting these will make backward and forward compatibility a real 575ecd0a06SJonathan Corbetpain. And since getting things wrong on the first attempt is guaranteed you 585ecd0a06SJonathan Corbetwill have a second iteration or at least an extension for any given interface. 595ecd0a06SJonathan Corbet 605ecd0a06SJonathan Corbet * Have a clear way for userspace to figure out whether your new ioctl or ioctl 615ecd0a06SJonathan Corbet extension is supported on a given kernel. If you can't rely on old kernels 625ecd0a06SJonathan Corbet rejecting the new flags/modes or ioctls (since doing that was botched in the 635ecd0a06SJonathan Corbet past) then you need a driver feature flag or revision number somewhere. 645ecd0a06SJonathan Corbet 655ecd0a06SJonathan Corbet * Have a plan for extending ioctls with new flags or new fields at the end of 665ecd0a06SJonathan Corbet the structure. The drm core checks the passed-in size for each ioctl call 675ecd0a06SJonathan Corbet and zero-extends any mismatches between kernel and userspace. That helps, 685ecd0a06SJonathan Corbet but isn't a complete solution since newer userspace on older kernels won't 695ecd0a06SJonathan Corbet notice that the newly added fields at the end get ignored. So this still 705ecd0a06SJonathan Corbet needs a new driver feature flags. 715ecd0a06SJonathan Corbet 725ecd0a06SJonathan Corbet * Check all unused fields and flags and all the padding for whether it's 0, 735ecd0a06SJonathan Corbet and reject the ioctl if that's not the case. Otherwise your nice plan for 745ecd0a06SJonathan Corbet future extensions is going right down the gutters since someone will submit 755ecd0a06SJonathan Corbet an ioctl struct with random stack garbage in the yet unused parts. Which 765ecd0a06SJonathan Corbet then bakes in the ABI that those fields can never be used for anything else 775ecd0a06SJonathan Corbet but garbage. This is also the reason why you must explicitly pad all 785ecd0a06SJonathan Corbet structures, even if you never use them in an array - the padding the compiler 795ecd0a06SJonathan Corbet might insert could contain garbage. 805ecd0a06SJonathan Corbet 815ecd0a06SJonathan Corbet * Have simple testcases for all of the above. 825ecd0a06SJonathan Corbet 835ecd0a06SJonathan Corbet 845ecd0a06SJonathan CorbetFun with Error Paths 855ecd0a06SJonathan Corbet-------------------- 865ecd0a06SJonathan Corbet 875ecd0a06SJonathan CorbetNowadays we don't have any excuse left any more for drm drivers being neat 885ecd0a06SJonathan Corbetlittle root exploits. This means we both need full input validation and solid 895ecd0a06SJonathan Corbeterror handling paths - GPUs will die eventually in the oddmost corner cases 905ecd0a06SJonathan Corbetanyway: 915ecd0a06SJonathan Corbet 925ecd0a06SJonathan Corbet * The ioctl must check for array overflows. Also it needs to check for 935ecd0a06SJonathan Corbet over/underflows and clamping issues of integer values in general. The usual 945ecd0a06SJonathan Corbet example is sprite positioning values fed directly into the hardware with the 955ecd0a06SJonathan Corbet hardware just having 12 bits or so. Works nicely until some odd display 965ecd0a06SJonathan Corbet server doesn't bother with clamping itself and the cursor wraps around the 975ecd0a06SJonathan Corbet screen. 985ecd0a06SJonathan Corbet 995ecd0a06SJonathan Corbet * Have simple testcases for every input validation failure case in your ioctl. 1005ecd0a06SJonathan Corbet Check that the error code matches your expectations. And finally make sure 1015ecd0a06SJonathan Corbet that you only test for one single error path in each subtest by submitting 1025ecd0a06SJonathan Corbet otherwise perfectly valid data. Without this an earlier check might reject 1035ecd0a06SJonathan Corbet the ioctl already and shadow the codepath you actually want to test, hiding 1045ecd0a06SJonathan Corbet bugs and regressions. 1055ecd0a06SJonathan Corbet 1065ecd0a06SJonathan Corbet * Make all your ioctls restartable. First X really loves signals and second 1075ecd0a06SJonathan Corbet this will allow you to test 90% of all error handling paths by just 1085ecd0a06SJonathan Corbet interrupting your main test suite constantly with signals. Thanks to X's 1095ecd0a06SJonathan Corbet love for signal you'll get an excellent base coverage of all your error 1105ecd0a06SJonathan Corbet paths pretty much for free for graphics drivers. Also, be consistent with 1115ecd0a06SJonathan Corbet how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its 1125ecd0a06SJonathan Corbet userspace library. The i915 driver botched this with the set_tiling ioctl, 1135ecd0a06SJonathan Corbet now we're stuck forever with some arcane semantics in both the kernel and 1145ecd0a06SJonathan Corbet userspace. 1155ecd0a06SJonathan Corbet 1165ecd0a06SJonathan Corbet * If you can't make a given codepath restartable make a stuck task at least 1175ecd0a06SJonathan Corbet killable. GPUs just die and your users won't like you more if you hang their 1185ecd0a06SJonathan Corbet entire box (by means of an unkillable X process). If the state recovery is 1195ecd0a06SJonathan Corbet still too tricky have a timeout or hangcheck safety net as a last-ditch 1205ecd0a06SJonathan Corbet effort in case the hardware has gone bananas. 1215ecd0a06SJonathan Corbet 1225ecd0a06SJonathan Corbet * Have testcases for the really tricky corner cases in your error recovery code 1235ecd0a06SJonathan Corbet - it's way too easy to create a deadlock between your hangcheck code and 1245ecd0a06SJonathan Corbet waiters. 1255ecd0a06SJonathan Corbet 1265ecd0a06SJonathan Corbet 1275ecd0a06SJonathan CorbetTime, Waiting and Missing it 1285ecd0a06SJonathan Corbet---------------------------- 1295ecd0a06SJonathan Corbet 1305ecd0a06SJonathan CorbetGPUs do most everything asynchronously, so we have a need to time operations and 1315ecd0a06SJonathan Corbetwait for outstanding ones. This is really tricky business; at the moment none of 1325ecd0a06SJonathan Corbetthe ioctls supported by the drm/i915 get this fully right, which means there's 1335ecd0a06SJonathan Corbetstill tons more lessons to learn here. 1345ecd0a06SJonathan Corbet 1355ecd0a06SJonathan Corbet * Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and 1365ecd0a06SJonathan Corbet v4l use by default nowadays. But let userspace know which timestamps are 1375ecd0a06SJonathan Corbet derived from different clock domains like your main system clock (provided 1385ecd0a06SJonathan Corbet by the kernel) or some independent hardware counter somewhere else. Clocks 1395ecd0a06SJonathan Corbet will mismatch if you look close enough, but if performance measuring tools 1405ecd0a06SJonathan Corbet have this information they can at least compensate. If your userspace can 1415ecd0a06SJonathan Corbet get at the raw values of some clocks (e.g. through in-command-stream 1425ecd0a06SJonathan Corbet performance counter sampling instructions) consider exposing those also. 1435ecd0a06SJonathan Corbet 1445ecd0a06SJonathan Corbet * Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most 1455ecd0a06SJonathan Corbet convenient time specification, but it's mostly the standard. 1465ecd0a06SJonathan Corbet 1475ecd0a06SJonathan Corbet * Check that input time values are normalized and reject them if not. Note 1485ecd0a06SJonathan Corbet that the kernel native struct ktime has a signed integer for both seconds 1495ecd0a06SJonathan Corbet and nanoseconds, so beware here. 1505ecd0a06SJonathan Corbet 1515ecd0a06SJonathan Corbet * For timeouts, use absolute times. If you're a good fellow and made your 1525ecd0a06SJonathan Corbet ioctl restartable relative timeouts tend to be too coarse and can 1535ecd0a06SJonathan Corbet indefinitely extend your wait time due to rounding on each restart. 1545ecd0a06SJonathan Corbet Especially if your reference clock is something really slow like the display 1555ecd0a06SJonathan Corbet frame counter. With a spec lawyer hat on this isn't a bug since timeouts can 1565ecd0a06SJonathan Corbet always be extended - but users will surely hate you if their neat animations 1575ecd0a06SJonathan Corbet starts to stutter due to this. 1585ecd0a06SJonathan Corbet 1595ecd0a06SJonathan Corbet * Consider ditching any synchronous wait ioctls with timeouts and just deliver 1605ecd0a06SJonathan Corbet an asynchronous event on a pollable file descriptor. It fits much better 1615ecd0a06SJonathan Corbet into event driven applications' main loop. 1625ecd0a06SJonathan Corbet 1635ecd0a06SJonathan Corbet * Have testcases for corner-cases, especially whether the return values for 1645ecd0a06SJonathan Corbet already-completed events, successful waits and timed-out waits are all sane 1655ecd0a06SJonathan Corbet and suiting to your needs. 1665ecd0a06SJonathan Corbet 1675ecd0a06SJonathan Corbet 1685ecd0a06SJonathan CorbetLeaking Resources, Not 1695ecd0a06SJonathan Corbet---------------------- 1705ecd0a06SJonathan Corbet 1715ecd0a06SJonathan CorbetA full-blown drm driver essentially implements a little OS, but specialized to 1725ecd0a06SJonathan Corbetthe given GPU platforms. This means a driver needs to expose tons of handles 1735ecd0a06SJonathan Corbetfor different objects and other resources to userspace. Doing that right 1745ecd0a06SJonathan Corbetentails its own little set of pitfalls: 1755ecd0a06SJonathan Corbet 1765ecd0a06SJonathan Corbet * Always attach the lifetime of your dynamically created resources to the 1775ecd0a06SJonathan Corbet lifetime of a file descriptor. Consider using a 1:1 mapping if your resource 1785ecd0a06SJonathan Corbet needs to be shared across processes - fd-passing over unix domain sockets 1795ecd0a06SJonathan Corbet also simplifies lifetime management for userspace. 1805ecd0a06SJonathan Corbet 1815ecd0a06SJonathan Corbet * Always have O_CLOEXEC support. 1825ecd0a06SJonathan Corbet 1835ecd0a06SJonathan Corbet * Ensure that you have sufficient insulation between different clients. By 1845ecd0a06SJonathan Corbet default pick a private per-fd namespace which forces any sharing to be done 1855ecd0a06SJonathan Corbet explicitly. Only go with a more global per-device namespace if the objects 1865ecd0a06SJonathan Corbet are truly device-unique. One counterexample in the drm modeset interfaces is 1875ecd0a06SJonathan Corbet that the per-device modeset objects like connectors share a namespace with 1885ecd0a06SJonathan Corbet framebuffer objects, which mostly are not shared at all. A separate 1895ecd0a06SJonathan Corbet namespace, private by default, for framebuffers would have been more 1905ecd0a06SJonathan Corbet suitable. 1915ecd0a06SJonathan Corbet 1925ecd0a06SJonathan Corbet * Think about uniqueness requirements for userspace handles. E.g. for most drm 1935ecd0a06SJonathan Corbet drivers it's a userspace bug to submit the same object twice in the same 1945ecd0a06SJonathan Corbet command submission ioctl. But then if objects are shareable userspace needs 1955ecd0a06SJonathan Corbet to know whether it has seen an imported object from a different process 1965ecd0a06SJonathan Corbet already or not. I haven't tried this myself yet due to lack of a new class 1975ecd0a06SJonathan Corbet of objects, but consider using inode numbers on your shared file descriptors 1985ecd0a06SJonathan Corbet as unique identifiers - it's how real files are told apart, too. 1995ecd0a06SJonathan Corbet Unfortunately this requires a full-blown virtual filesystem in the kernel. 2005ecd0a06SJonathan Corbet 2015ecd0a06SJonathan Corbet 2025ecd0a06SJonathan CorbetLast, but not Least 2035ecd0a06SJonathan Corbet------------------- 2045ecd0a06SJonathan Corbet 2055ecd0a06SJonathan CorbetNot every problem needs a new ioctl: 2065ecd0a06SJonathan Corbet 2075ecd0a06SJonathan Corbet * Think hard whether you really want a driver-private interface. Of course 2085ecd0a06SJonathan Corbet it's much quicker to push a driver-private interface than engaging in 2095ecd0a06SJonathan Corbet lengthy discussions for a more generic solution. And occasionally doing a 2105ecd0a06SJonathan Corbet private interface to spearhead a new concept is what's required. But in the 211*d56b699dSBjorn Helgaas end, once the generic interface comes around you'll end up maintaining two 2125ecd0a06SJonathan Corbet interfaces. Indefinitely. 2135ecd0a06SJonathan Corbet 2145ecd0a06SJonathan Corbet * Consider other interfaces than ioctls. A sysfs attribute is much better for 2155ecd0a06SJonathan Corbet per-device settings, or for child objects with fairly static lifetimes (like 2165ecd0a06SJonathan Corbet output connectors in drm with all the detection override attributes). Or 2175ecd0a06SJonathan Corbet maybe only your testsuite needs this interface, and then debugfs with its 2185ecd0a06SJonathan Corbet disclaimer of not having a stable ABI would be better. 2195ecd0a06SJonathan Corbet 2205ecd0a06SJonathan CorbetFinally, the name of the game is to get it right on the first attempt, since if 2215ecd0a06SJonathan Corbetyour driver proves popular and your hardware platforms long-lived then you'll 2225ecd0a06SJonathan Corbetbe stuck with a given ioctl essentially forever. You can try to deprecate 2235ecd0a06SJonathan Corbethorrible ioctls on newer iterations of your hardware, but generally it takes 2245ecd0a06SJonathan Corbetyears to accomplish this. And then again years until the last user able to 2255ecd0a06SJonathan Corbetcomplain about regressions disappears, too. 226