QEMU Internal: RTL8139


In my previous post, I gave a basic introduction to PCNET emulation and showed the stack trace of the emulation's execution flow.
In this post I will give an introduction to RTL8139 emulation in QEMU. Unlike the previous post, I will omit the execution flow of RTL8139 I/O operations. Instead, I want to focus on how the emulated registers are used and how user-controlled data reaches the vulnerable function and triggers the vulnerability.
In QEMU, all RTL8139 emulation is implemented in rtl8139.c.
The concept of DMA is introduced in this post, but more details will be given in the next post.

Device Initialization

Similar to PCNET, device initialization is done in the following two functions:
RTL8139 Class Initialization

static void rtl8139_class_init(ObjectClass *klass, void *data)
    DeviceClass *dc = DEVICE_CLASS(klass);
    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);

    k->realize = pci_rtl8139_realize;
    k->exit = pci_rtl8139_uninit;
    k->romfile = "efi-rtl8139.rom";
    k->vendor_id = PCI_VENDOR_ID_REALTEK;
    k->device_id = PCI_DEVICE_ID_REALTEK_8139;
    k->revision = RTL8139_PCI_REVID; /* >=0x20 is for 8139C+ */
    dc->reset = rtl8139_reset;
    dc->vmsd = &vmstate_rtl8139;
    dc->props = rtl8139_properties;
    set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);

RTL8139 Device Initialization (realize)

static void pci_rtl8139_realize(PCIDevice *dev, Error **errp)
    RTL8139State *s = RTL8139(dev);
    DeviceState *d = DEVICE(dev);
    uint8_t *pci_conf;

    pci_conf = dev->config;
    pci_conf[PCI_INTERRUPT_PIN] = 1;    /* interrupt pin A */
    /* TODO: start of capability list, but no capability
     * list bit in status register, and offset 0xdc seems unused. */
    pci_conf[PCI_CAPABILITY_LIST] = 0xdc;

    memory_region_init_io(&s->bar_io, OBJECT(s), &rtl8139_io_ops, s,
                          "rtl8139", 0x100);
    memory_region_init_io(&s->bar_mem, OBJECT(s), &rtl8139_mmio_ops, s,
                          "rtl8139", 0x100);
    pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->bar_io);
    pci_register_bar(dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->bar_mem);


    /* prepare eeprom */
    s->eeprom.contents[0] = 0x8129;
#if 1
    /* PCI vendor and device ID should be mirrored here */
    s->eeprom.contents[1] = PCI_VENDOR_ID_REALTEK;
    s->eeprom.contents[2] = PCI_DEVICE_ID_REALTEK_8139;
#endif
    s->eeprom.contents[7] = s->conf.macaddr.a[0] | s->conf.macaddr.a[1] << 8;
    s->eeprom.contents[8] = s->conf.macaddr.a[2] | s->conf.macaddr.a[3] << 8;
    s->eeprom.contents[9] = s->conf.macaddr.a[4] | s->conf.macaddr.a[5] << 8;

    s->nic = qemu_new_nic(&net_rtl8139_info, &s->conf,
                          object_get_typename(OBJECT(dev)), d->id, s);
    qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);

    s->cplus_txbuffer = NULL;
    s->cplus_txbuffer_len = 0;
    s->cplus_txbuffer_offset = 0;

    s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, rtl8139_timer, s);
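
The EEPROM lines above store the MAC address as three 16-bit words, low byte first. Here is a minimal sketch of that packing outside QEMU; pack_mac_words is a hypothetical helper of mine, not a function in rtl8139.c.

```c
#include <stdint.h>

/* Hypothetical helper (not part of QEMU): pack a 6-byte MAC address into
 * three 16-bit words, low byte first, the same way the realize function
 * fills eeprom.contents[7..9]. */
static void pack_mac_words(const uint8_t mac[6], uint16_t words[3])
{
    for (int i = 0; i < 3; i++) {
        /* even byte goes to bits 0-7, odd byte to bits 8-15 */
        words[i] = (uint16_t)(mac[2 * i] | (mac[2 * i + 1] << 8));
    }
}
```

For example, a MAC of 52:54:00:12:34:56 ends up as the words 0x5452, 0x1200 and 0x5634.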

Emulated Register

The emulated registers for RTL8139 are much more complicated than those of PCNET. More details on the meaning of these registers will be given in the next chapter.

typedef struct RTL8139State {
    /*< private >*/
    PCIDevice parent_obj;
    /*< public >*/

    uint8_t phys[8]; /* mac address */
    uint8_t mult[8]; /* multicast mask array */

    uint32_t TxStatus[4]; /* TxStatus0 in C mode*/ /* also DTCCR[0] and DTCCR[1] in C+ mode */
    uint32_t TxAddr[4];   /* TxAddr0 */
    uint32_t RxBuf;       /* Receive buffer */
    uint32_t RxBufferSize;/* internal variable, receive ring buffer size in C mode */
    uint32_t RxBufPtr;
    uint32_t RxBufAddr;

    uint16_t IntrStatus;
    uint16_t IntrMask;

    uint32_t TxConfig;
    uint32_t RxConfig;
    uint32_t RxMissed;

    uint16_t CSCR;

    uint8_t  Cfg9346;
    uint8_t  Config0;
    uint8_t  Config1;
    uint8_t  Config3;
    uint8_t  Config4;
    uint8_t  Config5;

    uint8_t  clock_enabled;
    uint8_t  bChipCmdState;

    uint16_t MultiIntr;

    uint16_t BasicModeCtrl;
    uint16_t BasicModeStatus;
    uint16_t NWayAdvert;
    uint16_t NWayLPAR;
    uint16_t NWayExpansion;

    uint16_t CpCmd;
    uint8_t  TxThresh;

    NICState *nic;
    NICConf conf;

    /* C ring mode */
    uint32_t   currTxDesc;

    /* C+ mode */
    uint32_t   cplus_enabled;

    uint32_t   currCPlusRxDesc;
    uint32_t   currCPlusTxDesc;

    uint32_t   RxRingAddrLO;
    uint32_t   RxRingAddrHI;

    EEprom9346 eeprom;

    uint32_t   TCTR;
    uint32_t   TimerInt;
    int64_t    TCTR_base;

    /* Tally counters */
    RTL8139TallyCounters tally_counters;

    /* Non-persistent data */
    uint8_t   *cplus_txbuffer;
    int        cplus_txbuffer_len;
    int        cplus_txbuffer_offset;

    /* PCI interrupt timer */
    QEMUTimer *timer;

    MemoryRegion bar_io;
    MemoryRegion bar_mem;

    /* Support migration to/from old versions */
    int rtl8139_mmio_io_addr_dummy;
} RTL8139State;

A simpler and more general description of this structure is given in [1].

I/O Communication

Based on our experience with PCNET, we can quickly locate the critical functions:

static void rtl8139_ioport_write(void *opaque, hwaddr addr,
                                 uint64_t val, unsigned size)
{
    /* dispatch on the access width requested by the guest */
    switch (size) {
    case 1:
        rtl8139_io_writeb(opaque, addr, val);
        break;
    case 2:
        rtl8139_io_writew(opaque, addr, val);
        break;
    case 4:
        rtl8139_io_writel(opaque, addr, val);
        break;
    }
}

static uint64_t rtl8139_ioport_read(void *opaque, hwaddr addr,
                                    unsigned size)
{
    switch (size) {
    case 1:
        return rtl8139_io_readb(opaque, addr);
    case 2:
        return rtl8139_io_readw(opaque, addr);
    case 4:
        return rtl8139_io_readl(opaque, addr);
    }

    /* unsupported access width */
    return -1;
}

Actually, only rtl8139_ioport_write, i.e. outb/outw/outl, is used in the exploit given in [2].

Set Receiver Ring

The first step in the exploit is to initialize the receive ring. Before setting the receive ring, the rtl8139_desc_config_rx function prepares an array of rtl8139_ring of size 0x2c and then obtains the physical address of the descriptor array in the guest machine. Then it sets the receive ring address to that physical address.

struct rtl8139_desc {
	uint32_t dw0;
	uint32_t dw1;
	uint32_t buf_lo;
	uint32_t buf_hi;
};

struct rtl8139_ring {
	struct rtl8139_desc *desc;
	void                *buffer;
};

addr = (uint32_t)gva_to_gpa(desc);
outl(addr, RTL8139_PORT + RxRingAddrLO);
outl(0x0, RTL8139_PORT + RxRingAddrHI);

On the QEMU side, rtl8139_io_writel stores the lower 32 bits and the upper 32 bits of the receive ring address.

RTL8139State *s = opaque;
switch (addr)
    case RxRingAddrLO:
         DPRINTF("C+ RxRing low bits write val=0x%08x\n", val);
         s->RxRingAddrLO = val;

    case RxRingAddrHI:
         DPRINTF("C+ RxRing high bits write val=0x%08x\n", val);
         s->RxRingAddrHI = val;
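
To make the LO/HI split concrete, here is a small standalone sketch. The struct and both helpers are illustrative names of mine, not QEMU code; the recombination only mirrors what I understand rtl8139_addr64 to do with a low and a high half.

```c
#include <stdint.h>

/* Toy model of the two 32-bit ring-address registers (field names follow
 * the post; the struct itself is hypothetical). */
struct ring_regs {
    uint32_t RxRingAddrLO;
    uint32_t RxRingAddrHI;
};

/* Guest side: split a guest physical address across the LO/HI registers,
 * as the two outl() calls do. */
static void write_ring_addr(struct ring_regs *r, uint64_t gpa)
{
    r->RxRingAddrLO = (uint32_t)(gpa & 0xffffffffu);
    r->RxRingAddrHI = (uint32_t)(gpa >> 32);
}

/* Device side: rebuild the 64-bit address from the two halves. */
static uint64_t ring_addr64(const struct ring_regs *r)
{
    return (uint64_t)r->RxRingAddrLO | ((uint64_t)r->RxRingAddrHI << 32);
}
```

Since the exploit casts the physical address to uint32_t and writes 0 to RxRingAddrHI, the rebuilt address is simply the low half.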

Set Transmit Descriptor Buffer Address

The next step in the exploit is to initialize the transmit address. More specifically, s->TxAddr[0] is set to the physical address of the descriptor buffer in the guest machine, and s->TxAddr[1], which serves as the upper 32 bits, is set to 0.

addr = (uint32_t)gva_to_gpa(desc);
outl(addr, RTL8139_PORT + TxAddr0);
outl(0x0, RTL8139_PORT + TxAddr0 + 0x4);

On the QEMU side, rtl8139_io_writel forwards the write to rtl8139_TxAddr_write:

RTL8139State *s = opaque;
switch (addr)
     case TxAddr0 ... TxAddr0+4*4-1:
          rtl8139_TxAddr_write(s, addr-TxAddr0, val);

static void rtl8139_TxAddr_write(RTL8139State *s, uint32_t txAddrOffset, uint32_t val)
    DPRINTF("TxAddr write offset=0x%x val=0x%08x\n", txAddrOffset, val);

    s->TxAddr[txAddrOffset/4] = val;
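
The index computation is easy to check in isolation. The sketch below is a toy model, not QEMU code: TOY_TXADDR0 is my assumption for the register offset of TxAddr0 (0x20 on the real chip, as far as I know), and txaddr_write is a hypothetical stand-in for the case range plus rtl8139_TxAddr_write.

```c
#include <stdint.h>

/* Assumed register offset of TxAddr0; treat the value as an assumption. */
#define TOY_TXADDR0 0x20u

struct tx_regs {
    uint32_t TxAddr[4];
};

/* Hypothetical model of the TxAddr0 ... TxAddr0+4*4-1 case: map a register
 * address to one of the four TxAddr slots.
 * Returns 0 on success, -1 if addr is outside the TxAddr range. */
static int txaddr_write(struct tx_regs *s, uint32_t addr, uint32_t val)
{
    if (addr < TOY_TXADDR0 || addr >= TOY_TXADDR0 + 4 * 4) {
        return -1;
    }
    s->TxAddr[(addr - TOY_TXADDR0) / 4] = val;
    return 0;
}
```

With this mapping, the write to TxAddr0 lands in TxAddr[0] and the write to TxAddr0 + 0x4 lands in TxAddr[1], which is exactly the pair that is later combined into the descriptor address.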

Set RTL8139 Card

When configuring the RTL8139 card, s->TxConfig is set to TxLoopBack and s->RxConfig is set to AcceptMyPhys.

outl(TxLoopBack, RTL8139_PORT + TxConfig);
outl(AcceptMyPhys, RTL8139_PORT + RxConfig);

On the QEMU side, rtl8139_io_writel dispatches these writes:

RTL8139State *s = opaque;
switch (addr)
    case TxConfig:
         rtl8139_TxConfig_write(s, val);
    case RxConfig:
         rtl8139_RxConfig_write(s, val);

static void rtl8139_TxConfig_write(RTL8139State *s, uint32_t val)
    if (!rtl8139_transmitter_enabled(s))
        return; /* the write is ignored while the transmitter is disabled */
    val = SET_MASKED(val, TxVersionMask | 0x8070f80f, s->TxConfig);
    s->TxConfig = val;

static void rtl8139_RxConfig_write(RTL8139State *s, uint32_t val)
    val = SET_MASKED(val, 0xf0fc0040, s->RxConfig);
    s->RxConfig = val;
    rtl8139_reset_rxring(s, 8192 << ((s->RxConfig >> 11) & 0x3));
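
Two pieces of arithmetic in these functions are worth spelling out. The sketch below assumes that SET_MASKED keeps the bits selected by the mask from the current register value and takes the remaining bits from the guest-supplied value, which is how I read the macro in rtl8139.c; set_masked and rx_ring_size are standalone helper names of mine.

```c
#include <stdint.h>

/* Assumed semantics of SET_MASKED: bits set in mask are read-only and come
 * from curr, all other bits come from the new input value. */
static uint32_t set_masked(uint32_t input, uint32_t mask, uint32_t curr)
{
    return (input & ~mask) | (curr & mask);
}

/* Receive ring size as computed in rtl8139_RxConfig_write: bits 11-12 of
 * RxConfig select 8K, 16K, 32K or 64K. */
static uint32_t rx_ring_size(uint32_t rx_config)
{
    return 8192u << ((rx_config >> 11) & 0x3);
}
```

Under that assumption, a guest that writes all ones to RxConfig can only change the bits outside 0xf0fc0040, and the ring size is always one of four powers of two.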

Then s->CpCmd is set to CPlusRxEnb|CPlusTxEnb. The same write also sets s->cplus_enabled to 1.

outw(CPlusRxEnb|CPlusTxEnb, RTL8139_PORT + CpCmd);

static void rtl8139_CpCmd_write(RTL8139State *s, uint32_t val)
    val &= 0xffff;

    DPRINTF("C+ command register write(w) val=0x%04x\n", val);

    s->cplus_enabled = 1;

    /* mask unwritable bits */
    val = SET_MASKED(val, 0xff84, s->CpCmd);

    s->CpCmd = val;

Next, writing CmdRxEnb|CmdTxEnb to ChipCmd sets both s->currCPlusRxDesc and s->currCPlusTxDesc to 0.

outb(CmdRxEnb|CmdTxEnb, RTL8139_PORT + ChipCmd);

case ChipCmd:
     rtl8139_ChipCmd_write(s, val);

static void rtl8139_ChipCmd_write(RTL8139State *s, uint32_t val)
    DeviceState *d = DEVICE(s);
    val &= 0xff;
    if (val & CmdReset)
        rtl8139_reset(d);
    if (val & CmdRxEnb)
        s->currCPlusRxDesc = 0;
    if (val & CmdTxEnb)
        s->currCPlusTxDesc = 0;

    /* mask unwritable bits */
    val = SET_MASKED(val, 0xe3, s->bChipCmdState);

    /* Deassert reset pin before next read */
    val &= ~CmdReset;

    s->bChipCmdState = val;

Start to Trigger

Then we come to the trigger, which eventually invokes the vulnerable function.

outb(CPlus, RTL8139_PORT + TxPoll);

In rtl8139_io_writeb, the TxPoll case calls rtl8139_cplus_transmit, which in turn calls rtl8139_cplus_transmit_one with the values previously stored in s.

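case TxPoll:
     if (val & (1 << 7))
         ; /* high-priority transmission is not implemented */
     if (val & (1 << 6))
         rtl8139_cplus_transmit(s); /* normal-priority transmission */

static void rtl8139_cplus_transmit(RTL8139State *s)
    int txcount = 0;

    while (rtl8139_cplus_transmit_one(s))
        ++txcount;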

Vulnerable Code: rtl8139_cplus_transmit_one

Now let us dive into the vulnerable code and see how it works.
Step 1
First of all, it computes the physical address of the transmit descriptor and reads the descriptor's four 32-bit fields from guest memory.

dma_addr_t cplus_tx_ring_desc = rtl8139_addr64(s->TxAddr[0], s->TxAddr[1]);

pci_dma_read(d, cplus_tx_ring_desc,    (uint8_t *)&val, 4);
txdw0 = le32_to_cpu(val);
pci_dma_read(d, cplus_tx_ring_desc+4,  (uint8_t *)&val, 4);
txdw1 = le32_to_cpu(val);
pci_dma_read(d, cplus_tx_ring_desc+8,  (uint8_t *)&val, 4);
txbufLO = le32_to_cpu(val);
pci_dma_read(d, cplus_tx_ring_desc+12, (uint8_t *)&val, 4);
txbufHI = le32_to_cpu(val);

A very important function here is pci_dma_read. DMA is the abbreviation of Direct Memory Access, which enables data transfer between memory and a PCI device. Here, pci_dma_read emulates such a transfer between the guest's physical memory and a local variable in the QEMU process. The details will be given in the next post.
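
Until that post, the following toy model may help to picture what the four reads in Step 1 do. It is only a sketch under my own assumptions: guest RAM is a flat byte array, and toy_dma_read, load_le32 and read_desc_field are hypothetical names, not QEMU APIs. The real pci_dma_read goes through QEMU's memory API instead of a plain memcpy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy guest RAM: a flat buffer indexed by guest physical address. */
struct guest_ram {
    uint8_t *base;
    size_t   size;
};

/* Copy len bytes from guest physical address gpa into buf.
 * Returns 0 on success, -1 if the range falls outside the toy RAM. */
static int toy_dma_read(const struct guest_ram *ram, uint64_t gpa,
                        void *buf, size_t len)
{
    if (gpa > ram->size || len > ram->size - gpa) {
        return -1;
    }
    memcpy(buf, ram->base + gpa, len);
    return 0;
}

/* Decode a little-endian 32-bit value independent of host byte order,
 * which is the job le32_to_cpu does in the excerpt. */
static uint32_t load_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Read one 32-bit descriptor field located at desc + off. */
static int read_desc_field(const struct guest_ram *ram, uint64_t desc,
                           unsigned off, uint32_t *out)
{
    uint8_t raw[4];

    if (toy_dma_read(ram, desc + off, raw, sizeof(raw)) != 0) {
        return -1;
    }
    *out = load_le32(raw);
    return 0;
}
```

In this model, txdw0, txdw1, txbufLO and txbufHI correspond to read_desc_field calls with offsets 0, 4, 8 and 12. The point to remember is that every one of these values comes straight from memory the guest controls.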

Step 2
Using the data read from the transmit descriptor, it then fills the transmit buffer (s->cplus_txbuffer), which stores the packet data to be sent later.

int txsize = txdw0 & CP_TX_BUFFER_SIZE_MASK;
dma_addr_t tx_addr = rtl8139_addr64(txbufLO, txbufHI);

if (!s->cplus_txbuffer)
    s->cplus_txbuffer_len = CP_TX_BUFFER_SIZE;
    s->cplus_txbuffer = g_malloc(s->cplus_txbuffer_len);
    s->cplus_txbuffer_offset = 0;
if (s->cplus_txbuffer_offset + txsize >= s->cplus_txbuffer_len)
    txsize = s->cplus_txbuffer_len - s->cplus_txbuffer_offset;
pci_dma_read(d, tx_addr, s->cplus_txbuffer + s->cplus_txbuffer_offset, txsize);
s->cplus_txbuffer_offset += txsize;
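
The size clamp in this step can be isolated as a pure function. This is a sketch with names of my own choosing; TOY_BUFFER_SIZE stands in for CP_TX_BUFFER_SIZE, and its value here is an assumption on my part.

```c
#include <stdint.h>

/* Stand-in for CP_TX_BUFFER_SIZE; the value is an assumption. */
#define TOY_BUFFER_SIZE 0x10000

/* Return how many bytes may be appended to a buffer of buf_len bytes that
 * already holds offset bytes. Mirrors the check in the excerpt: a request
 * that would reach the end of the buffer is truncated to the space left. */
static int clamp_txsize(int offset, int requested, int buf_len)
{
    if (offset + requested >= buf_len) {
        return buf_len - offset;
    }
    return requested;
}
```

So the copy into s->cplus_txbuffer itself stays within the allocation. The problem discussed in this post arises later, when header fields inside the copied data are trusted.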

Step 3
After gathering the information needed for the packet, it begins to parse the packet. The later operations work on saved_buffer and saved_size.

uint8_t *saved_buffer  = s->cplus_txbuffer;
int      saved_size    = s->cplus_txbuffer_offset;
int      saved_buffer_len = s->cplus_txbuffer_len;

Since we have set desc->dw0 to CP_TX_OWN | CP_TX_EOR | CP_TX_LS | CP_TX_LGSEN | CP_TX_IPCS | CP_TX_TCPCS earlier, the code then parses the packet according to the protocol that was set.
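
For reference, this is how such a dw0 value can be composed and how the buffer size is extracted from it in Step 2. The bit positions below are as I recall them from rtl8139.c and should be treated as assumptions; the TOY_ prefix and both helpers are mine.

```c
#include <stdint.h>

/* Bit positions recalled from rtl8139.c; treat them as assumptions. */
#define TOY_CP_TX_OWN    (1u << 31)
#define TOY_CP_TX_EOR    (1u << 30)
#define TOY_CP_TX_LS     (1u << 28)
#define TOY_CP_TX_LGSEN  (1u << 27)
#define TOY_CP_TX_IPCS   (1u << 18)
#define TOY_CP_TX_TCPCS  (1u << 16)
#define TOY_CP_TX_BUFFER_SIZE_MASK 0xffffu

/* Combine the flag bits with the buffer size in the low 16 bits. */
static uint32_t make_txdw0(uint32_t flags, uint32_t size)
{
    return flags | (size & TOY_CP_TX_BUFFER_SIZE_MASK);
}

/* Recover the buffer size the way Step 2 computes txsize. */
static uint32_t txdw0_size(uint32_t dw0)
{
    return dw0 & TOY_CP_TX_BUFFER_SIZE_MASK;
}
```

The flags occupy the upper half of the word, so they never disturb the size field that txsize is taken from.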


Step 4
Finally we come to the vulnerable code in CVE-2015-5165.

if (proto == ETH_P_IP)
    eth_payload_data = saved_buffer + ETH_HLEN;
    eth_payload_len  = saved_size   - ETH_HLEN;

    ip = (ip_header*)eth_payload_data;
    if (IP_HEADER_VERSION(ip) != IP_HEADER_VERSION_4) {
        ip = NULL;
    } else {
        hlen = IP_HEADER_LENGTH(ip);
        ip_protocol = ip->ip_p;
        ip_data_len = be16_to_cpu(ip->ip_len) - hlen;
    }

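At this point, we finally reach the place where the vulnerability occurs. Both ip->ip_len and hlen come from the packet and are therefore under the attacker's control, and nothing in this excerpt checks that ip_len is at least hlen. Maliciously crafted data makes ip_data_len wrong and eventually triggers the out-of-bounds read.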
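
The arithmetic behind this can be reproduced outside QEMU. The sketch below assumes that the result is stored in a 16-bit unsigned variable, so that an ip_len smaller than the header length wraps around. The helper names are mine, and the checked variant is just one possible validation, not the upstream patch.

```c
#include <stdint.h>

/* The low nibble of the first IPv4 header byte counts 32-bit words. */
static int ip_header_len(uint8_t ver_ihl)
{
    return (ver_ihl & 0x0f) * 4;
}

/* Same subtraction as in the excerpt, truncated to 16 bits (an assumption
 * about the width of ip_data_len). */
static uint16_t ip_payload_len_unchecked(uint16_t ip_len, uint8_t ver_ihl)
{
    return (uint16_t)(ip_len - ip_header_len(ver_ihl));
}

/* One possible check: reject packets whose total length is shorter than
 * their own header. Returns 0 on success, -1 for a malformed packet. */
static int ip_payload_len_checked(uint16_t ip_len, uint8_t ver_ihl,
                                  uint16_t *out)
{
    int hlen = ip_header_len(ver_ihl);

    if (ip_len < hlen) {
        return -1;
    }
    *out = (uint16_t)(ip_len - hlen);
    return 0;
}
```

With a 20-byte header and ip_len set to 19, the unchecked version yields 0xffff, a length far larger than the data that was actually copied into the buffer.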


In this post, I gave a detailed explanation of RTL8139 emulation. More importantly, I mentioned pci_dma_read in the QEMU source code, which will be the topic of my next post.


[1] http://www.phrack.org/papers/vm-escape-qemu-case-study.html
[2] https://github.com/dangokyo/QEMU_ESCAPE/blob/master/CVE_2015_5165_leak.c
