Content
1. intro
A general-purpose paravirtualized I/O protocol that works across hypervisors and guest VMs.
2. virtio: the three goals
3 goals
driver unification
uniformity: provide a common ABI for (1) general publication and (2) use of buffers.
device probing and config
If developers are familiar with Linux, they may map the Linux API directly onto the (virtual I/O mechanism?) ABI.
cross-device: a common ABI for (1) general publication and (2) use of buffers.
provide two complete ABI implementations
using (1) the virtio_ring infrastructure and (2) the Linux API for virtual I/O devices.
they implement the final part of a virtual I/O device: device probing and configuration
explicitly separate (1) driver, (2) transport, and (3) configuration
Counterexample: for another hypervisor to use the Xen Linux network driver, it must support the Xenbus probing and configuration system.
3. virtio: a Linux internal abstraction API
4 parts of config op
RW feature bits
RW config space
RW status bits
device reset
feature bit
device feature: e.g. VIRTIO_NET_F_CSUM is the checksum offload feature of a net device
device specific
config space
when device feature VIRTIO_NET_F_MAC is set
MAC address of device is in config space
RW 8-bit status word
indicates the status of device probe
When VIRTIO_CONFIG_S_DRIVER_OK is set, the guest driver has finished feature probing, so the host knows which features the driver understands and wants to use.
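The four config operations above can be sketched as a toy model. This is a minimal illustration, not the Linux API: `ToyDevice` and `probe` are hypothetical names, the numeric values of the feature and status constants are taken from the Linux virtio headers as I recall them and should be treated as assumptions, and placing the MAC address at offset 0 of the config space is likewise assumed.

```python
# Toy model of virtio feature/status negotiation (not the Linux API).
# Constant values are assumptions based on the Linux virtio headers.
VIRTIO_NET_F_CSUM = 0   # feature bit index: checksum offload
VIRTIO_NET_F_MAC = 5    # feature bit index: MAC address is in config space

VIRTIO_CONFIG_S_ACKNOWLEDGE = 1
VIRTIO_CONFIG_S_DRIVER = 2
VIRTIO_CONFIG_S_DRIVER_OK = 4


class ToyDevice:
    """Hypothetical host-side device exposing the four config operations."""

    def __init__(self, features, config_space):
        self.device_features = features            # bitmask offered by the host
        self.driver_features = 0                   # bitmask accepted by the guest
        self.config_space = bytearray(config_space)
        self.status = 0                            # 8-bit status word

    def reset(self):
        """Device reset: forget negotiated features and clear the status."""
        self.driver_features = 0
        self.status = 0


def probe(dev, understood):
    """Guest-side probe: accept only the features both sides know about."""
    dev.status |= VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER
    dev.driver_features = dev.device_features & understood
    mac = None
    if dev.driver_features & (1 << VIRTIO_NET_F_MAC):
        # Assumed layout: the MAC occupies the first 6 bytes of config space.
        mac = bytes(dev.config_space[:6])
    dev.status |= VIRTIO_CONFIG_S_DRIVER_OK  # feature probing is complete
    return mac


dev = ToyDevice(
    features=(1 << VIRTIO_NET_F_CSUM) | (1 << VIRTIO_NET_F_MAC),
    config_space=bytes.fromhex("525400123456"),
)
# This toy driver understands the MAC feature but not checksum offload.
mac = probe(dev, understood=1 << VIRTIO_NET_F_MAC)
```

The negotiated feature set is the intersection of what the host offers and what the driver understands, which is why checksum offload is dropped here even though the device offers it.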
Transport Abstraction: Virtqueue
virtio-blk has one queue
virtio-net/console has two queues: one for input and one for output.
each buffer is a scatter-gather array
virtqueue op struct: 5 ops
add_buf
kick
get_buf
enable/disable_cb
data exchange flow: add_buf() => kick() notifies the host => host updates the data => get_buf()
notes on the 5 ops
disable_cb is a hint that the guest doesn't want to know when a buffer is used; it is equivalent to disabling a device's interrupt.
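The add_buf / kick / get_buf flow can be sketched with a small in-memory queue. This is only a toy under stated assumptions: `ToyVirtqueue` and `echo_host` are hypothetical names, the "host" runs synchronously inside `kick()`, and the method signatures are simplified rather than copied from the kernel.

```python
from collections import deque


class ToyVirtqueue:
    """In-memory stand-in for a virtqueue; the host runs inside kick()."""

    def __init__(self, host_handler, callback=None):
        self._pending = deque()   # buffers added but not yet seen by the host
        self._used = deque()      # (token, bytes_written) handed back by the host
        self._host_handler = host_handler
        self._callback = callback
        self._cb_enabled = True

    def add_buf(self, out_bufs, in_bufs, token):
        """Queue one scatter-gather buffer: readable parts, then writable parts."""
        self._pending.append((list(out_bufs), list(in_bufs), token))

    def kick(self):
        """Notify the host that new buffers are available."""
        while self._pending:
            out_bufs, in_bufs, token = self._pending.popleft()
            written = self._host_handler(out_bufs, in_bufs)
            self._used.append((token, written))
        # The callback plays the role of the completion interrupt.
        if self._cb_enabled and self._callback and self._used:
            self._callback(self)

    def get_buf(self):
        """Return the next used (token, bytes_written), or None if none is ready."""
        return self._used.popleft() if self._used else None

    def disable_cb(self):
        self._cb_enabled = False

    def enable_cb(self):
        self._cb_enabled = True


def echo_host(out_bufs, in_bufs):
    """Hypothetical host: copy the readable data into the writable buffers."""
    data = b"".join(out_bufs)
    written = 0
    for buf in in_bufs:
        chunk = data[written:written + len(buf)]
        buf[:len(chunk)] = chunk
        written += len(chunk)
    return written


events = []
vq = ToyVirtqueue(echo_host, callback=lambda q: events.append("interrupt"))
reply = bytearray(5)
vq.add_buf([b"he", b"llo"], [reply], token="req-1")
vq.kick()
result = vq.get_buf()
```

After `disable_cb()`, further kicks in this toy still complete buffers, but the callback is no longer invoked, so the guest has to poll with `get_buf()`.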
4. virtio_ring: a transport implementation for virtio
virtio ring consists of 3 parts
descriptor array: chains of (addr, length) pairs built by the guest.
avail ring: the guest indicates which chains are ready to use.
used ring: the host indicates which chains it has used.
descriptor array:
(addr, length)
optional next
flags: 2 bits, one marks the buffer as read-only or write-only, one marks whether the next field is valid
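A descriptor chain of this shape can be sketched with Python's `struct` module. The field widths (64-bit addr, 32-bit length, 16-bit flags, 16-bit next), the little-endian byte order, and the flag values are assumptions not stated in these notes; `pack_chain` and `walk_chain` are hypothetical helper names.

```python
import struct

# Assumed flag values and layout; the notes only say there are two flags.
VRING_DESC_F_NEXT = 1    # the `next` field is valid
VRING_DESC_F_WRITE = 2   # the host may write into this buffer (else read-only)

DESC = struct.Struct("<QIHH")  # addr, length, flags, next


def pack_chain(buffers):
    """Pack [(addr, length, writable), ...] into a contiguous descriptor table."""
    table = bytearray()
    for i, (addr, length, writable) in enumerate(buffers):
        flags = VRING_DESC_F_WRITE if writable else 0
        nxt = 0
        if i + 1 < len(buffers):
            flags |= VRING_DESC_F_NEXT   # link to the following descriptor
            nxt = i + 1
        table += DESC.pack(addr, length, flags, nxt)
    return bytes(table)


def walk_chain(table, head):
    """Follow `next` links from `head` until a descriptor without F_NEXT."""
    out = []
    idx = head
    while True:
        addr, length, flags, nxt = DESC.unpack_from(table, idx * DESC.size)
        out.append((addr, length, bool(flags & VRING_DESC_F_WRITE)))
        if not flags & VRING_DESC_F_NEXT:
            return out
        idx = nxt


# A two-part request: a 16-byte read-only header, then a 512-byte writable buffer.
chain = [(0x1000, 16, False), (0x2000, 512, True)]
table = pack_chain(chain)
```

Walking the table from descriptor 0 recovers the original chain, which is roughly what the host does when the guest publishes a chain head in the available ring.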
The used ring is deliberately placed on a different page from the available ring and the descriptor array, which gives better cache behavior.
interrupt suppression flag
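Both the available ring and the used ring have one.
used to optimize away virtqueue kicks (vmexit/trap) and completion interrupts
In the available ring, the guest uses the flag to tell the host that it does not need to send a completion interrupt (disable_cb).
In the used ring, the host uses the flag to tell the guest that it does not need to kick the host.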
optimization example?
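One possible illustration of the suppression flags is a counter-based toy. `ToyRing` is a hypothetical class, the flag names and values are assumed from the Linux vring header, and real implementations also need memory barriers and re-checks that are omitted here.

```python
# Assumed flag names/values; each ring carries a single suppression bit.
VRING_USED_F_NO_NOTIFY = 1      # host -> guest: "no need to kick me"
VRING_AVAIL_F_NO_INTERRUPT = 1  # guest -> host: "no need to interrupt me"


class ToyRing:
    """Counts the expensive notifications that the flags let each side skip."""

    def __init__(self):
        self.avail_flags = 0   # written only by the guest
        self.used_flags = 0    # written only by the host
        self.kicks = 0         # guest -> host notifications (vmexit/trap)
        self.interrupts = 0    # host -> guest completion interrupts

    def guest_publish(self):
        """Guest made a chain available; kick unless the host opted out."""
        if not self.used_flags & VRING_USED_F_NO_NOTIFY:
            self.kicks += 1

    def host_complete(self):
        """Host consumed a chain; interrupt unless the guest opted out."""
        if not self.avail_flags & VRING_AVAIL_F_NO_INTERRUPT:
            self.interrupts += 1


ring = ToyRing()
ring.guest_publish()                            # host idle: one kick is needed
ring.used_flags |= VRING_USED_F_NO_NOTIFY       # host is now busy polling the ring
ring.guest_publish()
ring.guest_publish()                            # no further kicks

ring.avail_flags |= VRING_AVAIL_F_NO_INTERRUPT  # guest called disable_cb
for _ in range(3):
    ring.host_complete()                        # no interrupts delivered
```

In this run three buffers are published with a single kick, and three completions cause no interrupt at all, which is the saving the flags are meant to provide.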
4.1. a note on Zero-Copy and religion of page flipping
efficient I/O depends on 2 things
the number of notifications per operation => reduced by the virtio_ring interrupt suppression flag.
the amount of cache-cold data that is accessed.
Zero-Copy
page flipping
5. current virtio drivers
virtio block driver: single request queue
the first 16 bytes of each request are a read-only header = (type, ioprio, sector)
4 kinds of type: read, write, SCSI command, write barrier
ioprio: I/O priority hint
sector: offset in units of 512 bytes
SCSI command: e.g. (a) eject virtual CDROM. (b) implement SCSI HBA over virtio.
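The 16-byte request header can be sketched as a packed structure. The field widths (32-bit type, 32-bit ioprio, 64-bit sector), the little-endian byte order, the constant values, and the treatment of the write barrier as a flag OR-ed into the type are assumptions based on the Linux virtio block header rather than on these notes; `make_header` is a hypothetical helper.

```python
import struct

# Assumed request type values; the barrier is modelled as a high flag bit.
VIRTIO_BLK_T_IN = 0                  # read
VIRTIO_BLK_T_OUT = 1                 # write
VIRTIO_BLK_T_SCSI_CMD = 2            # generic SCSI command
VIRTIO_BLK_T_BARRIER = 0x80000000    # write barrier before this request

OUTHDR = struct.Struct("<IIQ")       # type, ioprio, sector: 4 + 4 + 8 = 16 bytes
SECTOR_SIZE = 512


def make_header(req_type, byte_offset, ioprio=0, barrier=False):
    """Build the read-only header for a request starting at byte_offset."""
    if byte_offset % SECTOR_SIZE:
        raise ValueError("offset must be a multiple of 512 bytes")
    if barrier:
        req_type |= VIRTIO_BLK_T_BARRIER
    return OUTHDR.pack(req_type, ioprio, byte_offset // SECTOR_SIZE)


# A write at byte offset 4096 (sector 8), preceded by a barrier.
hdr = make_header(VIRTIO_BLK_T_OUT, 4096, barrier=True)
```

In a full request this header would be the first, read-only element of the scatter-gather buffer, followed by the data and a writable status byte; those parts are left out of the sketch.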
virtio network driver
2 virtqueues: one for transmission and one for reception.
The virtual hardware sets a large MTU, which reduces the number of hypercalls.
A large MTU means fewer PCI transfers to the card.
In a virtual environment, it means fewer calls out of the virtual environment.
The guest can set the interrupt suppression flag for the transmission virtqueue, since it does not care when a transmission finishes.
The only exception is when the queue is full.
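The transmit-side policy described above might look like the following toy. `ToyTxQueue` is hypothetical and heavily simplified: completions are reclaimed lazily on the next send, and the callback is only enabled once the queue fills up.

```python
class ToyTxQueue:
    """Transmit queue that keeps completion callbacks off unless it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_flight = []       # packets handed to the host, not yet reclaimed
        self.completed = 0        # how many in-flight packets the host has sent
        self.cb_enabled = False   # interrupt suppression is on by default

    def host_finish(self, n):
        """Host side: mark up to n more in-flight packets as transmitted."""
        self.completed = min(len(self.in_flight), self.completed + n)

    def _reclaim(self):
        """Free finished packets lazily instead of on a completion interrupt."""
        del self.in_flight[:self.completed]
        self.completed = 0

    def xmit(self, packet):
        """Try to queue a packet; return False if the queue is full."""
        self._reclaim()
        if len(self.in_flight) >= self.capacity:
            self.cb_enabled = True    # full: now the guest wants to be told
            return False
        self.cb_enabled = False
        self.in_flight.append(packet)
        return True


q = ToyTxQueue(capacity=2)
sent_a = q.xmit("a")
sent_b = q.xmit("b")
sent_c = q.xmit("c")            # queue full: rejected, callback enabled
full_cb_state = q.cb_enabled
q.host_finish(1)                # host transmits "a"
retry_c = q.xmit("c")           # space reclaimed, callback suppressed again
```

The design choice is that in the common case the guest never takes a transmit-completion interrupt; it only asks for one when it has no room left and must wait for the host.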