==========================================
Xillybus driver for generic FPGA interface
==========================================

:Author: Eli Billauer, Xillybus Ltd. (http://xillybus.com)
:Email: eli.billauer@gmail.com or as advertised on Xillybus' site.

.. Contents:

 - Introduction
 -- Background
 -- Xillybus Overview

 - Usage
 -- User interface
 -- Synchronization
 -- Seekable pipes

 - Internals
 -- Source code organization
 -- Pipe attributes
 -- Host never reads from the FPGA
 -- Channels, pipes, and the message channel
 -- Data streaming
 -- Data granularity
 -- Probing
 -- Buffer allocation
 -- The "nonempty" message (supporting poll)


Introduction
============

Background
----------

An FPGA (Field Programmable Gate Array) is a piece of logic hardware, which
can be
programmed to become virtually anything that is usually found as a
dedicated chipset: For instance, a display adapter, network interface card,
or even a processor with its peripherals. FPGAs are the LEGO of hardware:
Based upon certain building blocks, you make your own toys the way you like
them. It's usually pointless to reimplement something that is already
available on the market as a chipset, so FPGAs are mostly used when some
special functionality is needed, and the production volume is relatively low
(hence not justifying the development of an ASIC).

The challenge with FPGAs is that everything is implemented at a very low
level, even lower than assembly language. In order to allow FPGA designers to
focus on their specific project, and not reinvent the wheel over and over
again, pre-designed building blocks called IP cores are often used. These are
the FPGA parallels of library functions. IP cores may implement certain
mathematical functions, a functional unit (e.g. a USB interface), an entire
processor (e.g. ARM) or anything that might come in handy. Think of them as
building blocks, with electrical wires dangling on the sides for connection
to other blocks.

One of the daunting tasks in FPGA design is communicating with a full-blown
operating system (actually, with the processor running it): Implementing the
low-level bus protocol and the somewhat higher-level interface with the host
(registers, interrupts, DMA etc.) is a project in itself. When the FPGA's
function is a well-known one (e.g. a video adapter card, or a NIC), it can
make sense to design the FPGA's interface logic specifically for the project.
A special driver is then written to present the FPGA as a well-known interface
to the kernel and/or user space. In that case, there is no reason to treat the
FPGA differently from any other device on the bus.

However, it's common that the desired data communication doesn't fit any
well-known peripheral function. Also, the effort of designing an elegant
abstraction for the data exchange is often considered too big. In those cases,
a quicker and possibly less elegant solution is sought: The driver is
effectively written as a user space program, leaving the kernel space part
with just elementary data transport. This still requires designing some
interface logic for the FPGA, and writing a simple ad-hoc driver for the
kernel.

Xillybus Overview
-----------------

Xillybus is an IP core and a Linux driver.
Together, they form a kit for elementary data transport between an FPGA and
the host, providing pipe-like data streams with a straightforward user
interface. It's intended as a low-effort solution for mixed FPGA-host
projects, for which it makes sense to have the project-specific part of the
driver running in a user-space program.

Since the communication requirements may vary significantly from one FPGA
project to another (the number of data pipes needed in each direction and
their attributes), there is no single, fixed piece of logic that constitutes
the Xillybus IP core. Rather, the IP core is configured and built based upon a
specification given by its end user.

Xillybus presents independent data streams, which resemble pipes or TCP/IP
communication to the user. At the host side, a character device file is used
just like any pipe file. On the FPGA side, hardware FIFOs are used to stream
the data. This is contrary to a common method of communicating through
fixed-size buffers (even though such buffers are used by Xillybus under the
hood). Depending on the configuration, a single IP core may have anywhere from
one to more than a hundred of these streams.

In order to ease the deployment of the Xillybus IP core, it contains a simple
data structure which completely defines the core's configuration.
The Linux driver fetches this data structure during its initialization
process, and sets up the DMA buffers and character devices accordingly. As a
result, a single driver works out of the box with any Xillybus IP core.

The data structure just mentioned should not be confused with PCI's
configuration space or the Flattened Device Tree.

Usage
=====

User interface
--------------

On the host, all interaction with Xillybus is done through /dev/xillybus_*
device files, which are generated automatically as the driver loads. The
names of these files depend on the IP core that is loaded in the FPGA (see
Probing below). To communicate with the FPGA, open the device file that
corresponds to the hardware FIFO you want to send data to or receive data
from, and use plain write() or read() calls, just like with a regular pipe.
In particular, it makes perfect sense to run::

  $ cat mydata > /dev/xillybus_thisfifo

  $ cat /dev/xillybus_thatfifo > hisdata

possibly pressing CTRL-C at some stage: the xillybus_* pipes are capable of
sending an EOF, but a given pipe may never do so.
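A program that doesn't go through the shell can do the same with plain
read() and write() calls. The following is a minimal sketch of a rough C
counterpart of the second cat command above, assuming nothing beyond standard
POSIX calls. The helpers copy_stream() and dump_fifo_to_file() are
hypothetical names, and /dev/xillybus_thatfifo is only the example name used
above; the actual device names depend on the IP core. As with any pipe, the
loop tolerates short writes and EINTR, and stops only when read() returns 0
(EOF).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy everything from in_fd to out_fd until read()
 * reports EOF. Returns the number of bytes copied, or -1 on error.
 */
long long copy_stream(int in_fd, int out_fd)
{
	char buf[4096];
	long long total = 0;

	for (;;) {
		ssize_t n = read(in_fd, buf, sizeof(buf));

		if (n == 0)
			return total;		/* EOF from the other side */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -1;
		}

		/* write() may accept fewer bytes than requested: loop. */
		ssize_t done = 0;
		while (done < n) {
			ssize_t w = write(out_fd, buf + done, (size_t)(n - done));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			done += w;
		}
		total += n;
	}
}

/* Hypothetical: rough counterpart of "cat <dev> > <path>". 0 on success. */
int dump_fifo_to_file(const char *dev, const char *path)
{
	int in = open(dev, O_RDONLY);
	if (in < 0)
		return -1;

	int out = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out < 0) {
		close(in);
		return -1;
	}

	long long copied = copy_stream(in, out);

	close(in);
	close(out);
	return copied < 0 ? -1 : 0;
}
```

A call such as dump_fifo_to_file("/dev/xillybus_thatfifo", "hisdata") returns
only once the FPGA side sends an EOF, which, as noted above, may never happen.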

The driver and hardware are designed to behave sensibly as pipes, including:

* Supporting non-blocking I/O (by setting O_NONBLOCK on open()).

* Supporting poll() and select().

* Being bandwidth-efficient under load (using DMA), while also handling small
  pieces of data sent across (like TCP/IP) by autoflushing.

A device file can be read-only, write-only or bidirectional. Bidirectional
device files are treated like two independent pipes (except for sharing a
"channel" structure in the implementation code).

Synchronization
---------------

Xillybus pipes are configured (on the IP core) to be either synchronous or
asynchronous. For a synchronous pipe, write() returns successfully only after
some data has been submitted and acknowledged by the FPGA. This slows down
bulk data transfers, and makes synchronous pipes nearly impossible to use with
streams that require data at a constant rate: No data is transmitted to the
FPGA between write() calls, in particular when the process loses the CPU.

When a pipe is configured as asynchronous, write() returns as soon as there is
enough room in the buffers to store any of the data, without waiting for the
FPGA to acknowledge it.
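With an asynchronous host to FPGA pipe, a successful write() therefore only
means that the data reached the driver's buffers. If the program wants that
data on its way to the FPGA promptly, rather than waiting for autoflushing,
the driver treats a write() with a byte count of zero as a non-blocking flush
request (see Data streaming below). The following is a minimal sketch of that
pattern, assuming a host to FPGA device file that is already open; the names
write_all() and send_and_flush() are hypothetical. On an ordinary Linux pipe, a
zero-length write() simply returns 0, so the same code can be exercised
without hardware.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all len bytes to fd, retrying on short writes
 * and EINTR. Returns 0 on success, -1 on error (errno set by write()).
 */
int write_all(int fd, const void *data, size_t len)
{
	const char *p = data;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: send a message, then issue a zero-length write(),
 * which a Xillybus host to FPGA pipe interprets as "submit whatever is in
 * the DMA buffers now". The flush request itself does not block.
 */
int send_and_flush(int fd, const void *data, size_t len)
{
	if (write_all(fd, data, len) < 0)
		return -1;
	return write(fd, data, 0) < 0 ? -1 : 0;
}
```

On a synchronous pipe the extra zero-length write() is unnecessary, since
write() doesn't return before the data has been acknowledged by the FPGA.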

For FPGA to host pipes, asynchronous pipes allow data transfer from the FPGA
as soon as the respective device file is opened, regardless of whether the
data has been requested by a read() call. On synchronous pipes, only the
amount of data requested by a read() call is transmitted.

In summary, for synchronous pipes, data between the host and FPGA is
transmitted only to satisfy the read() or write() call currently handled
by the driver, and those calls wait for the transmission to complete before
returning.

Note that the synchronization attribute has nothing to do with the possibility
that read() or write() completes fewer bytes than requested. There is a
separate configuration flag ("allowpartial") that determines whether such a
partial completion is allowed.

Seekable pipes
--------------

A synchronous pipe can be configured to have the stream's position exposed
to the user logic at the FPGA. Such a pipe is also seekable through the host
API. With this feature, a memory or register interface can be attached on the
FPGA side to the seekable stream.
Reading from or writing to a certain address in
the attached memory is done by seeking to the desired address, and calling
read() or write() as required.


Internals
=========

Source code organization
------------------------

The Xillybus driver consists of a core module, xillybus_core.c, and modules
that depend on the specific bus interface (xillybus_of.c and xillybus_pcie.c).

The bus-specific modules are those probed when a suitable device is found by
the kernel. Since the DMA mapping and synchronization functions, which are
bus-dependent by their nature, are used by the core module, a
xilly_endpoint_hardware structure is passed to the core module on
initialization. This structure is populated with pointers to wrapper functions
which execute the DMA-related operations on the bus.

Pipe attributes
---------------

Each pipe has a number of attributes which are set when the FPGA component
(IP core) is built.
They are fetched from the IDT (the data structure which
defines the core's configuration, see Probing below) by xilly_setupchannels()
in xillybus_core.c as follows:

* is_writebuf: The pipe's direction. A non-zero value means it's an FPGA to
  host pipe (the FPGA "writes").

* channelnum: The pipe's identification number in communication between the
  host and FPGA.

* format: The underlying data width. See Data granularity below.

* allowpartial: A non-zero value means that a read() or write() (whichever
  applies) may return with fewer than the requested number of bytes. The
  common choice is a non-zero value, to match standard UNIX behavior.

* synchronous: A non-zero value means that the pipe is synchronous. See
  Synchronization above.

* bufsize: Each DMA buffer's size. Always a power of two.

* bufnum: The number of buffers allocated for this pipe. Always a power of two.

* exclusive_open: A non-zero value forces exclusive opening of the associated
  device file. If the device file is bidirectional, and already opened only in
  one direction, the opposite direction may be opened once.

* seekable: A non-zero value indicates that the pipe is seekable. See
  Seekable pipes above.

* supports_nonempty: A non-zero value (which is typical) indicates that the
  hardware will send the messages that are necessary to support select() and
  poll() for this pipe.

Host never reads from the FPGA
------------------------------

Even though PCI Express is hot-pluggable in general, a typical motherboard
doesn't expect a card to go away all of a sudden. But since the PCIe card
is based upon reprogrammable logic, a sudden disappearance from the bus is
quite likely as a result of an accidental reprogramming of the FPGA while the
host is up. In practice, nothing happens immediately in such a situation. But
if the host attempts to read from an address that is mapped to the PCI Express
device, that leads to an immediate freeze of the system on some motherboards,
even though the PCIe standard requires a graceful recovery.

In order to avoid these freezes, the Xillybus driver refrains completely from
reading from the device's register space. All communication from the FPGA to
the host is done through DMA.
In particular, the Interrupt Service Routine
doesn't follow the common practice of checking a status register when it's
invoked. Rather, the FPGA prepares a small buffer containing short messages
that inform the host what the interrupt was about.

This mechanism is used on non-PCIe buses as well for the sake of uniformity.


Channels, pipes, and the message channel
----------------------------------------

Each of the (possibly bidirectional) pipes presented to the user is allocated
a data channel between the FPGA and the host. The distinction between channels
and pipes is necessary only because of channel 0, which is used for
interrupt-related messages from the FPGA, and has no pipe attached to it.

Data streaming
--------------

Even though a non-segmented data stream is presented to the user at both
sides, the implementation relies on a set of DMA buffers which is allocated
for each channel. For the sake of illustration, let's take the FPGA to host
direction: As data streams into the respective channel's interface in the
FPGA, the Xillybus IP core writes it to one of the DMA buffers.
When the
buffer is full, the FPGA informs the host about that (by appending a
XILLYMSG_OPCODE_RELEASEBUF message to channel 0 and sending an interrupt if
necessary). The host responds by making the data available for reading through
the character device. When all data has been read, the host writes to the
FPGA's buffer control register, allowing the buffer to be overwritten. Flow
control mechanisms exist on both sides to prevent underflows and overflows.

This is not good enough for creating a TCP/IP-like stream: If the data flow
stops momentarily before a DMA buffer is filled, the intuitive expectation is
that the partial data in the buffer will arrive anyhow, despite the buffer not
being completed. This is implemented by adding a field in the
XILLYMSG_OPCODE_RELEASEBUF message, through which the FPGA informs not just
which buffer is submitted, but how much data it contains.

But the FPGA will submit a partially filled buffer only if directed to do so
by the host. This situation occurs when the read() method has been blocking
for XILLY_RX_TIMEOUT jiffies (currently 10 ms), after which the host commands
the FPGA to submit a DMA buffer as soon as it can. This timeout mechanism
balances bus bandwidth efficiency (preventing a lot of partially filled
buffers from being sent) against keeping the latency fairly low for tails of
data.
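The buffer hand-off described above can be illustrated with a toy,
single-threaded C model. Everything in it is hypothetical and does not mirror
xillybus_core.c: struct toy_channel, fpga_push(), fpga_submit_partial() and
host_read() are invented names, the released[] flag stands in for both the
XILLYMSG_OPCODE_RELEASEBUF message (with its byte count) and the write to the
buffer control register, and there are no interrupts, DMA or timeouts. It only
shows why a partially filled buffer stays invisible to the host until the host
asks for it, and how a full ring stalls the FPGA side.

```c
#include <stddef.h>
#include <string.h>

#define NBUFS   4	/* plays the role of bufnum (a power of two) */
#define BUFSIZE 8	/* plays the role of bufsize; tiny for illustration */

/* Toy model of one FPGA to host channel. All names are hypothetical. */
struct toy_channel {
	unsigned char buf[NBUFS][BUFSIZE];
	size_t fill[NBUFS];	/* byte count reported when released */
	int released[NBUFS];	/* 1: submitted by FPGA, owned by host */
	unsigned int fpga_idx;	/* buffer the FPGA is currently filling */
	size_t fpga_pos;	/* write offset in that buffer */
	unsigned int host_idx;	/* next buffer the host reads from */
	size_t host_pos;	/* read offset in that buffer */
};

/* FPGA side: hand the current buffer to the host with 'count' valid bytes. */
static void fpga_release(struct toy_channel *ch, size_t count)
{
	ch->fill[ch->fpga_idx] = count;
	ch->released[ch->fpga_idx] = 1;
	ch->fpga_idx = (ch->fpga_idx + 1) % NBUFS;
	ch->fpga_pos = 0;
}

/* FPGA side: stream bytes in. Stops when the next buffer is still owned by
 * the host (flow control). Returns the number of bytes accepted. */
size_t fpga_push(struct toy_channel *ch, const unsigned char *data, size_t len)
{
	size_t done = 0;

	while (done < len && !ch->released[ch->fpga_idx]) {
		ch->buf[ch->fpga_idx][ch->fpga_pos++] = data[done++];
		if (ch->fpga_pos == BUFSIZE)
			fpga_release(ch, BUFSIZE);	/* buffer full */
	}
	return done;
}

/* The host asked for a partial buffer (e.g. read() blocked for a while). */
void fpga_submit_partial(struct toy_channel *ch)
{
	if (ch->fpga_pos > 0)
		fpga_release(ch, ch->fpga_pos);
}

/* Host side: copy up to n bytes out of submitted buffers, in order. */
size_t host_read(struct toy_channel *ch, unsigned char *dst, size_t n)
{
	size_t done = 0;

	while (done < n && ch->released[ch->host_idx]) {
		size_t avail = ch->fill[ch->host_idx] - ch->host_pos;
		size_t chunk = avail < n - done ? avail : n - done;

		memcpy(dst + done, ch->buf[ch->host_idx] + ch->host_pos, chunk);
		done += chunk;
		ch->host_pos += chunk;
		if (ch->host_pos == ch->fill[ch->host_idx]) {
			/* Fully consumed: give the buffer back to the FPGA. */
			ch->released[ch->host_idx] = 0;
			ch->host_idx = (ch->host_idx + 1) % NBUFS;
			ch->host_pos = 0;
		}
	}
	return done;
}
```

Pushing 20 bytes into this model releases two full 8-byte buffers at once,
while the remaining 4 bytes reach the host only after fpga_submit_partial(),
which is the situation the XILLY_RX_TIMEOUT mechanism addresses.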

A similar mechanism is used in the host to FPGA direction. The handling of
partial DMA buffers is somewhat different, though. The user can tell the
driver to submit all data it has in the buffers to the FPGA, by issuing a
write() with the byte count set to zero. This is similar to a flush request,
but it doesn't block. There is also an autoflushing mechanism, which triggers
an equivalent flush roughly XILLY_RX_TIMEOUT jiffies after the last write().
This allows the user to be oblivious to the underlying buffering mechanism
and yet enjoy a stream-like interface.

Note that the issue of partial buffer flushing is irrelevant for pipes having
the "synchronous" attribute nonzero, since synchronous pipes don't allow data
to lie around in the DMA buffers between read() and write() anyhow.

Data granularity
----------------

The data arrives or is sent at the FPGA as 8-, 16- or 32-bit wide words, as
configured by the "format" attribute. Whenever possible, the driver attempts
to hide this when the pipe is accessed differently from its natural alignment.
For example, reading single bytes from a pipe with 32-bit granularity works
with no issues.
Writing single bytes to pipes with 16- or 32-bit granularity
will also work, but the driver can't send partially completed words to the
FPGA, so the transmission of up to one word may be held until it's fully
occupied with user data.

This somewhat complicates the handling of host to FPGA streams, because
when a buffer is flushed, it may contain up to 3 bytes that don't form a
complete word in the FPGA, and hence can't be sent. To prevent loss of data,
these leftover bytes need to be moved to the next buffer. The parts in
xillybus_core.c that mention "leftovers" in some way are related to this
complication.

Probing
-------

As mentioned earlier, the number of pipes that are created when the driver
loads and their attributes depend on the Xillybus IP core in the FPGA. During
the driver's initialization, a blob containing configuration info, the
Interface Description Table (IDT), is sent from the FPGA to the host. The
bootstrap process is done in three phases:

1. Acquire the length of the IDT, so a buffer can be allocated for it. This
   is done by sending a quiesce command to the device, since the
   acknowledgment for this command contains the IDT's buffer length.

2. Acquire the IDT itself.

3. Create the interfaces according to the IDT.

Buffer allocation
-----------------

In order to simplify the logic that prevents illegal boundary crossings of
PCIe packets, the following rule applies: If a buffer is smaller than 4kB,
it must not cross a 4kB boundary. Otherwise, it must be 4kB aligned. The
xilly_setupchannels() function allocates these buffers by requesting whole
pages from the kernel, and dividing them into DMA buffers as necessary. Since
all buffers' sizes are powers of two, it's possible to pack any set of such
buffers while wasting at most one page of memory.

All buffers are allocated when the driver is loaded. This is necessary,
since large contiguous physical memory segments are sometimes requested,
which are more likely to be available when the system is freshly booted.

The allocation of buffer memory takes place in the same order as the buffers
appear in the IDT. The driver relies on a rule that the pipes are sorted with
decreasing buffer size in the IDT. If a requested buffer is larger than or
equal to a page, the necessary number of pages is requested from the kernel,
and these are used for this buffer.
If the requested buffer is smaller than a page, a
single page is requested from the kernel, and that page is partially used,
unless a partially used page is already at hand, in which case the buffer is
packed into that page. It can be shown that all pages requested from the
kernel (except possibly for the last) are 100% utilized this way.

The "nonempty" message (supporting poll)
----------------------------------------

In order to support the "poll" method (and hence select()), there is a small
catch regarding the FPGA to host direction: The FPGA may have filled a DMA
buffer with some data, but not submitted that buffer. If the host waited for
the buffer's submission by the FPGA, there would be a possibility that the
FPGA side has sent data, but a select() call would still block, because the
host has not received any notification about this. This is solved with
XILLYMSG_OPCODE_NONEMPTY messages sent by the FPGA when a channel goes from
completely empty to containing some data.

These messages are used only to support poll() and select(). The IP core can
be configured not to send them, which saves a small amount of bandwidth.
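From user space, these messages are what let an ordinary poll() on an FPGA to
host device file return as soon as data is available, just as with any pipe.
The following is a minimal sketch of waiting for input with a timeout, using
only the standard poll() and read() calls. The helpers wait_readable() and
read_with_timeout() are hypothetical, as is the convention of returning -2 on
timeout. For a pipe whose IP core was built without these messages
(supports_nonempty set to zero), poll() and select() should not be relied
upon.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms milliseconds for fd to become
 * readable. Returns 1 if read() won't block (data or EOF is pending),
 * 0 on timeout, -1 on error.
 */
int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	for (;;) {
		int rc = poll(&pfd, 1, timeout_ms);

		if (rc < 0 && errno == EINTR)
			continue;	/* note: restarts with the full timeout */
		if (rc <= 0)
			return rc;	/* 0 = timeout, -1 = error */
		return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
	}
}

/*
 * Hypothetical helper: read up to len bytes, waiting at most timeout_ms.
 * Returns bytes read, 0 on EOF, -1 on error, -2 on timeout.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
	int r = wait_readable(fd, timeout_ms);

	if (r == 0)
		return -2;
	if (r < 0)
		return -1;
	return read(fd, buf, len);
}
```

A typical caller would open a device file such as /dev/xillybus_thatfifo
with O_RDONLY (optionally adding O_NONBLOCK) and then loop on
read_with_timeout().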