qemu/0.9.1/qemu-tech.texi

    1: \input texinfo @c -*- texinfo -*-
    2: @c %**start of header
    3: @setfilename qemu-tech.info
    4: @settitle QEMU Internals
    5: @exampleindent 0
    6: @paragraphindent 0
    7: @c %**end of header
    8: 
    9: @iftex
   10: @titlepage
   11: @sp 7
   12: @center @titlefont{QEMU Internals}
   13: @sp 3
   14: @end titlepage
   15: @end iftex
   16: 
   17: @ifnottex
   18: @node Top
   19: @top
   20: 
   21: @menu
   22: * Introduction::
   23: * QEMU Internals::
   24: * Regression Tests::
   25: * Index::
   26: @end menu
   27: @end ifnottex
   28: 
   29: @contents
   30: 
   31: @node Introduction
   32: @chapter Introduction
   33: 
   34: @menu
   35: * intro_features::        Features
   36: * intro_x86_emulation::   x86 emulation
   37: * intro_arm_emulation::   ARM emulation
   38: * intro_mips_emulation::  MIPS emulation
   39: * intro_ppc_emulation::   PowerPC emulation
   40: * intro_sparc_emulation:: SPARC emulation
   41: @end menu
   42: 
   43: @node intro_features
   44: @section Features
   45: 
   46: QEMU is a FAST! processor emulator using a portable dynamic
   47: translator.
   48: 
   49: QEMU has two operating modes:
   50: 
   51: @itemize @minus
   52: 
   53: @item
   54: Full system emulation. In this mode, QEMU emulates a full system
   55: (usually a PC), including a processor and various peripherals. It can
   56: be used to launch a different operating system without rebooting the
   57: PC or to debug system code.
   58: 
   59: @item
   60: User mode emulation (Linux host only). In this mode, QEMU can launch
   61: Linux processes compiled for one CPU on another CPU. It can be used to
   62: launch the Wine Windows API emulator (@url{http://www.winehq.org}) or
   63: to ease cross-compilation and cross-debugging.
   64: 
   65: @end itemize
   66: 
   67: As QEMU requires no host kernel driver to run, it is very safe and
   68: easy to use.
   69: 
   70: QEMU generic features:
   71: 
   72: @itemize
   73: 
   74: @item User space only or full system emulation.
   75: 
   76: @item Using dynamic translation to native code for reasonable speed.
   77: 
   78: @item Working on x86 and PowerPC hosts. Being tested on ARM, Sparc32, Alpha and S390.
   79: 
   80: @item Self-modifying code support.
   81: 
   82: @item Precise exceptions support.
   83: 
   84: @item The virtual CPU is a library (@code{libqemu}) which can be used
   85: in other projects (see @file{qemu/tests/qruncom.c} for an
   86: example of user mode @code{libqemu} usage).
   87: 
   88: @end itemize
   89: 
   90: QEMU user mode emulation features:
   91: @itemize
   92: @item Generic Linux system call converter, including most ioctls.
   93: 
   94: @item clone() emulation using native CPU clone() to use Linux scheduler for threads.
   95: 
   96: @item Accurate signal handling by remapping host signals to target signals.
   97: @end itemize
   98: 
   99: QEMU full system emulation features:
  100: @itemize
  101: @item QEMU can either use a full software MMU for maximum portability or use the host system call mmap() to simulate the target MMU.
  102: @end itemize
  103: 
  104: @node intro_x86_emulation
  105: @section x86 emulation
  106: 
  107: QEMU x86 target features:
  108: 
  109: @itemize
  110: 
  111: @item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation.
  112: LDT/GDT and IDT are emulated. VM86 mode is also supported to run DOSEMU.
  113: 
  114: @item Support of host page sizes bigger than 4KB in user mode emulation.
  115: 
  116: @item QEMU can emulate itself on x86.
  117: 
  118: @item An extensive Linux x86 CPU test program is included (@file{tests/test-i386}).
  119: It can be used to test other x86 virtual CPUs.
  120: 
  121: @end itemize
  122: 
  123: Current QEMU limitations:
  124: 
  125: @itemize
  126: 
  127: @item No SSE/MMX support (yet).
  128: 
  129: @item No x86-64 support.
  130: 
  131: @item IPC syscalls are missing.
  132: 
  133: @item The x86 segment limits and access rights are not tested at every
  134: memory access (yet). Fortunately, very few OSes seem to rely on that for
  135: normal use.
  136: 
  137: @item On non-x86 host CPUs, @code{double}s are used instead of the non-standard
  138: 10 byte @code{long double}s of x86 for floating point emulation to get
  139: maximum performance.
  140: 
  141: @end itemize
  142: 
  143: @node intro_arm_emulation
  144: @section ARM emulation
  145: 
  146: @itemize
  147: 
  148: @item Full ARM 7 user emulation.
  149: 
  150: @item NWFPE FPU support included in user Linux emulation.
  151: 
  152: @item Can run most ARM Linux binaries.
  153: 
  154: @end itemize
  155: 
  156: @node intro_mips_emulation
  157: @section MIPS emulation
  158: 
  159: @itemize
  160: 
  161: @item The system emulation allows full MIPS32/MIPS64 Release 2 emulation,
  162: including privileged instructions, FPU and MMU, in both little and big
  163: endian modes.
  164: 
  165: @item The Linux userland emulation can run many 32 bit MIPS Linux binaries.
  166: 
  167: @end itemize
  168: 
  169: Current QEMU limitations:
  170: 
  171: @itemize
  172: 
  173: @item Self-modifying code is not always handled correctly.
  174: 
  175: @item 64 bit userland emulation is not implemented.
  176: 
  177: @item The system emulation is not complete enough to run real firmware.
  178: 
  179: @item The watchpoint debug facility is not implemented.
  180: 
  181: @end itemize
  182: 
  183: @node intro_ppc_emulation
  184: @section PowerPC emulation
  185: 
  186: @itemize
  187: 
  188: @item Full PowerPC 32 bit emulation, including privileged instructions,
  189: FPU and MMU.
  190: 
  191: @item Can run most PowerPC Linux binaries.
  192: 
  193: @end itemize
  194: 
  195: @node intro_sparc_emulation
  196: @section SPARC emulation
  197: 
  198: @itemize
  199: 
  200: @item Full SPARC V8 emulation, including privileged
  201: instructions, FPU and MMU. SPARC V9 emulation includes most privileged
  202: and VIS instructions, FPU and I/D MMU. Alignment is fully enforced.
  203: 
  204: @item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and
  205: some 64-bit SPARC Linux binaries.
  206: 
  207: @end itemize
  208: 
  209: Current QEMU limitations:
  210: 
  211: @itemize
  212: 
  213: @item IPC syscalls are missing.
  214: 
  215: @item Floating point exception support is buggy.
  216: 
  217: @item Atomic instructions are not correctly implemented.
  218: 
  219: @item Sparc64 emulators are not usable for anything yet.
  220: 
  221: @end itemize
  222: 
  223: @node QEMU Internals
  224: @chapter QEMU Internals
  225: 
  226: @menu
  227: * QEMU compared to other emulators::
  228: * Portable dynamic translation::
  229: * Register allocation::
  230: * Condition code optimisations::
  231: * CPU state optimisations::
  232: * Translation cache::
  233: * Direct block chaining::
  234: * Self-modifying code and translated code invalidation::
  235: * Exception support::
  236: * MMU emulation::
  237: * Hardware interrupts::
  238: * User emulation specific details::
  239: * Bibliography::
  240: @end menu
  241: 
  242: @node QEMU compared to other emulators
  243: @section QEMU compared to other emulators
  244: 
  245: Like bochs [3], QEMU emulates an x86 CPU. But QEMU is much faster than
  246: bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC
  247: emulation while QEMU can emulate several processors.
  248: 
  249: Like Valgrind [2], QEMU does user space emulation and dynamic
  250: translation. Valgrind is mainly a memory debugger, a use QEMU does not
  251: support (QEMU could be used to detect out-of-bounds memory
  252: accesses as Valgrind does, but it cannot track uninitialised data
  253: as Valgrind does). The Valgrind dynamic translator generates better code
  254: than QEMU (in particular it does register allocation) but it is closely
  255: tied to an x86 host and target and has no support for precise exceptions
  256: and system emulation.
  257: 
  258: EM86 [4] is the closest project to user space QEMU (and QEMU still uses
  259: some of its code, in particular the ELF file loader). EM86 was limited
  260: to an alpha host and used a proprietary and slow interpreter (the
  261: interpreter part of the FX!32 Digital Win32 code translator [5]).
  262: 
  263: TWIN [6] is a Windows API emulator like Wine. It is less accurate than
  264: Wine but includes a protected mode x86 interpreter to launch x86 Windows
  265: executables. Such an approach has greater potential because most of the
  266: Windows API is executed natively but it is far more difficult to develop
  267: because all the data structures and function parameters exchanged
  268: between the API and the x86 code must be converted.
  269: 
  270: User mode Linux [7] was the only solution before QEMU to launch a
  271: Linux kernel as a process while not needing any host kernel
  272: patches. However, user mode Linux requires heavy kernel patches while
  273: QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is
  274: slower.
  275: 
  276: The new Plex86 [8] PC virtualizer is done in the same spirit as the
  277: qemu-fast system emulator. It requires a patched Linux kernel to work
  278: (you cannot launch the same kernel on your PC), but the patches are
  279: really small. As it is a PC virtualizer (no emulation is done except
  280: for some privileged instructions), it has the potential of being
  281: faster than QEMU. The downside is that a complicated (and potentially
  282: unsafe) host kernel patch is needed.
  283: 
  284: The commercial PC Virtualizers (VMWare [9], VirtualPC [10], TwoOStwo
  285: [11]) are faster than QEMU, but they all need specific, proprietary
  286: and potentially unsafe host drivers. Moreover, they are unable to
  287: provide cycle exact simulation as an emulator can.
  288: 
  289: @node Portable dynamic translation
  290: @section Portable dynamic translation
  291: 
  292: QEMU is a dynamic translator. When it first encounters a piece of code,
  293: it converts it to the host instruction set. Usually dynamic translators
  294: are very complicated and highly CPU dependent. QEMU uses some tricks
  295: which make it relatively simple and easy to port while achieving good
  296: performance.
  297: 
  298: The basic idea is to split every x86 instruction into a few simpler
  299: instructions. Each simple instruction is implemented by a piece of C
  300: code (see @file{target-i386/op.c}). Then a compile time tool
  301: (@file{dyngen}) takes the corresponding object file (@file{op.o})
  302: to generate a dynamic code generator which concatenates the simple
  303: instructions to build a function (see @file{op.h:dyngen_code()}).
  304: 
  305: In essence, the process is similar to [1], but more work is done at
  306: compile time.
  307: 
  308: A key idea to get optimal performance is that constant parameters can
  309: be passed to the simple operations. For that purpose, dummy ELF
  310: relocations are generated with gcc for each constant parameter. Then,
  311: the tool (@file{dyngen}) can locate the relocations and generate the
  312: appropriate C code to resolve them when building the dynamic code.
  313: 
  314: That way, QEMU is no more difficult to port than a dynamic linker.
  315: 
  316: To go even faster, GCC static register variables are used to keep the
  317: state of the virtual CPU.
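
The concatenation of simple operations with constant parameters can be
sketched in Python. This is only an analogy under stated assumptions:
@file{dyngen} copies host machine code and resolves relocations, whereas
the sketch binds each constant into a closure. The names
@code{op_movl_T0_im}, @code{op_addl_T0_im}, @code{op_movl_reg_T0} and
@code{build_block} are hypothetical.

```python
# Analogy only: each "micro-op" is a template taking one constant parameter,
# standing in for a compiled fragment whose constant is a dummy relocation.

def op_movl_T0_im(param):
    # T0 = constant
    def run(cpu):
        cpu["T0"] = param & 0xFFFFFFFF
    return run

def op_addl_T0_im(param):
    # T0 = T0 + constant, with 32-bit wraparound
    def run(cpu):
        cpu["T0"] = (cpu["T0"] + param) & 0xFFFFFFFF
    return run

def op_movl_reg_T0(reg):
    # guest register = T0
    def run(cpu):
        cpu[reg] = cpu["T0"]
    return run

def build_block(ops):
    # "Concatenate" the micro-ops. Binding each constant plays the role of
    # resolving a relocation while the code is being copied.
    bound = [template(param) for template, param in ops]
    def block(cpu):
        for op in bound:
            op(cpu)
    return block

# A guest sequence that loads 5, adds 7 and stores the result in EAX.
block = build_block([
    (op_movl_T0_im, 5),
    (op_addl_T0_im, 7),
    (op_movl_reg_T0, "EAX"),
])
cpu = dict(T0=0, EAX=0)
block(cpu)
```

The point of the design is that all per-instruction work (choosing the
operations and their constants) happens once, when the block is built.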
  318: 
  319: @node Register allocation
  320: @section Register allocation
  321: 
  322: Since QEMU uses fixed simple instructions, no efficient register
  323: allocation can be done. However, because RISC CPUs have a lot of
  324: registers, most of the virtual CPU state can be put in registers without
  325: doing complicated register allocation.
  326: 
  327: @node Condition code optimisations
  328: @section Condition code optimisations
  329: 
  330: Efficient emulation of the CPU condition codes (@code{EFLAGS} register on x86) is
  331: critical for good performance. QEMU uses lazy condition code
  332: evaluation: instead of computing the condition codes after each x86
  333: instruction, it just stores one operand (called @code{CC_SRC}), the
  334: result (called @code{CC_DST}) and the type of operation (called
  335: @code{CC_OP}).
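
A minimal Python sketch of lazy condition code evaluation for 32 bit
additions and subtractions follows. It assumes operands are already in
the range 0 to 2^32-1 and only rebuilds the zero and carry flags; the
class name @code{LazyFlags} and the operation tags are hypothetical.

```python
MASK32 = 0xFFFFFFFF
CC_OP_ADDL = "addl"   # hypothetical operation tags
CC_OP_SUBL = "subl"

class LazyFlags:
    def __init__(self):
        self.cc_src = 0
        self.cc_dst = 0
        self.cc_op = None

    def add(self, a, b):
        # Do the arithmetic; record only what is needed to rebuild flags later.
        res = (a + b) & MASK32
        self.cc_src, self.cc_dst, self.cc_op = b, res, CC_OP_ADDL
        return res

    def sub(self, a, b):
        res = (a - b) & MASK32
        self.cc_src, self.cc_dst, self.cc_op = b, res, CC_OP_SUBL
        return res

    def zero_flag(self):
        # Computed on demand from the stored result.
        return self.cc_dst == 0

    def carry_flag(self):
        if self.cc_op == CC_OP_ADDL:
            # Unsigned overflow happened iff the result wrapped below the operand.
            return self.cc_dst < self.cc_src
        if self.cc_op == CC_OP_SUBL:
            # Borrow happened iff the first operand (dst + src) was below the second.
            first = (self.cc_dst + self.cc_src) & MASK32
            return first < self.cc_src
        raise ValueError("no flag-setting operation recorded")
```

If no later instruction asks for a flag, the only cost paid per
arithmetic instruction is the three stores.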
  336: 
  337: @code{CC_OP} is almost never explicitly set in the generated code
  338: because it is known at translation time.
  339: 
  340: To improve performance, a backward pass is performed on the
  341: generated simple instructions (see
  342: @code{target-i386/translate.c:optimize_flags()}). When it can be proved that
  343: the condition codes are not needed by the next instructions, no
  344: condition codes are computed at all.
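
The backward pass can be illustrated with a small liveness scan. This is
a simplified sketch, not the code in @file{target-i386/translate.c}: it
treats all condition codes as one unit, and the function name
@code{drop_dead_flag_updates} and the tuple layout are assumptions.

```python
def drop_dead_flag_updates(ops):
    # ops: list of (name, writes_flags, reads_flags) in program order.
    # Returns a list of (name, must_compute_flags) in the same order.
    live = True   # conservatively assume flags are needed after the block
    out = []
    for name, writes, reads in reversed(ops):
        # A flag update is only needed if something later may observe it.
        out.append((name, writes and live))
        if writes:
            live = False   # earlier flag values are overwritten here
        if reads:
            live = True    # this operation consumes flags produced earlier
    out.reverse()
    return out

example = drop_dead_flag_updates([
    ("add", True, False),   # flags overwritten by "sub" before any use
    ("sub", True, False),   # flags consumed by "jz"
    ("jz", False, True),
    ("and", True, False),   # last writer: kept because flags may be live on exit
])
```

In the example, only the update performed by @code{add} is dropped.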
  345: 
  346: @node CPU state optimisations
  347: @section CPU state optimisations
  348: 
  349: The x86 CPU has many internal states which change the way it evaluates
  350: instructions. To achieve good speed, the translation phase
  351: assumes that some state information of the virtual x86 CPU cannot
  352: change within a translated block. For example, if the SS, DS and ES segments have a zero
  353: base, then the translator does not even generate an addition for the
  354: segment base.
  355: 
  356: [The FPU stack pointer register is not handled that way yet].
  357: 
  358: @node Translation cache
  359: @section Translation cache
  360: 
  361: A 16 MByte cache holds the most recently used translations. For
  362: simplicity, it is completely flushed when it is full. A translation unit
  363: contains just a single basic block (a block of x86 instructions
  364: terminated by a jump or by a virtual CPU state change which the
  365: translator cannot deduce statically).
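
The flush-when-full policy can be sketched as follows. The class and
method names are hypothetical, translated code is represented by plain
byte strings, and the caller is assumed to call @code{lookup} before
@code{insert} so that no block is stored twice.

```python
class TranslationCache:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.blocks = dict()   # guest pc -> translated code
        self.flushes = 0

    def lookup(self, pc):
        # Returns the translated code, or None if it must be (re)translated.
        return self.blocks.get(pc)

    def insert(self, pc, code):
        if len(code) > self.capacity:
            raise ValueError("block larger than the whole cache")
        if self.used + len(code) > self.capacity:
            # No per-block eviction bookkeeping: drop everything and start over.
            self.blocks.clear()
            self.used = 0
            self.flushes += 1
        self.blocks[pc] = code
        self.used += len(code)
```

A full flush costs retranslation of the hot blocks, but it avoids any
bookkeeping about which translations were used most recently.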
  366: 
  367: @node Direct block chaining
  368: @section Direct block chaining
  369: 
  370: After each translated basic block is executed, QEMU uses the simulated
  371: Program Counter (PC) and other CPU state information (such as the CS
  372: segment base value) to find the next basic block.
  373: 
  374: In order to accelerate the most common cases where the new simulated PC
  375: is known, QEMU can patch a basic block so that it jumps directly to the
  376: next one.
  377: 
  378: The most portable code uses an indirect jump. An indirect jump makes
  379: it easier to make the jump target modification atomic. On some host
  380: architectures (such as x86 or PowerPC), the @code{JUMP} opcode is
  381: directly patched so that the block chaining has no overhead.
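
The effect of chaining can be shown with a toy model in which each block
has one statically known successor (an assumption made for brevity). A
Python object reference stands in for the patched jump; the names
@code{Block} and @code{run} are hypothetical.

```python
class Block:
    def __init__(self, pc, next_pc):
        self.pc = pc
        self.next_pc = next_pc   # statically known successor (assumption)
        self.chained = None      # direct link, patched in on first use

def run(blocks, start_pc, max_blocks):
    # blocks: dict mapping guest pc -> Block.
    # Returns (trace of executed pcs, number of table lookups performed).
    trace = []
    lookups = 1
    block = blocks[start_pc]
    for _ in range(max_blocks):
        trace.append(block.pc)
        if block.chained is None:
            # Slow path: find the successor by pc, then patch the link.
            lookups += 1
            block.chained = blocks[block.next_pc]
        # Fast path on later visits: follow the link without any lookup.
        block = block.chained
    return trace, lookups
```

For a loop made of two blocks, the number of lookups stays at three
however many iterations are executed.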
  382: 
  383: @node Self-modifying code and translated code invalidation
  384: @section Self-modifying code and translated code invalidation
  385: 
  386: Self-modifying code is a special challenge in x86 emulation because no
  387: instruction cache invalidation is signaled by the application when code
  388: is modified.
  389: 
  390: When translated code is generated for a basic block, the corresponding
  391: host page is write protected if it is not already read-only (with the
  392: system call @code{mprotect()}). Then, if a write access is done to the
  393: page, Linux raises a SEGV signal. QEMU then invalidates all the
  394: translated code in the page and enables write accesses to the page.
  395: 
  396: Correct translated code invalidation is done efficiently by maintaining
  397: a linked list of every translated block contained in a given page. Other
  398: linked lists are also maintained to undo direct block chaining.
  399: 
  400: Although the overhead of doing @code{mprotect()} calls is significant,
  401: most MSDOS programs can be emulated at reasonable speed with QEMU and
  402: DOSEMU.
  403: 
  404: Note that QEMU also invalidates pages of translated code when it detects
  405: that memory mappings are modified with @code{mmap()} or @code{munmap()}.
  406: 
  407: When using a software MMU, the code invalidation is more efficient: if
  408: a given code page is invalidated too often because of write accesses,
  409: then a bitmap representing all the code inside the page is
  410: built. Every store into that page checks the bitmap to see if the code
  411: really needs to be invalidated. It avoids invalidating the code when
  412: only data is modified in the page.
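
The bitmap check can be sketched as below. Two simplifications are
assumed: the bitmap is built on the first store rather than after
repeated invalidations, and invalidating drops all code in the page. The
class name @code{CodePage} and its methods are hypothetical.

```python
PAGE_SIZE = 4096

class CodePage:
    def __init__(self):
        self.code_ranges = []   # (offset, length) of translated guest code
        self.bitmap = None      # built lazily, one entry per byte of the page
        self.invalidations = 0

    def add_code(self, offset, length):
        self.code_ranges.append((offset, length))
        self.bitmap = None      # stale as soon as new code is translated

    def build_bitmap(self):
        self.bitmap = bytearray(PAGE_SIZE)
        for offset, length in self.code_ranges:
            for i in range(offset, min(offset + length, PAGE_SIZE)):
                self.bitmap[i] = 1

    def store(self, offset, size):
        # Returns True if translated code had to be invalidated.
        if self.bitmap is None:
            self.build_bitmap()
        if any(self.bitmap[offset:offset + size]):
            self.code_ranges = []
            self.bitmap = None
            self.invalidations += 1
            return True
        return False   # the store touched data only; translations survive
```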
  413: 
  414: @node Exception support
  415: @section Exception support
  416: 
  417: longjmp() is used when an exception such as division by zero is
  418: encountered.
  419: 
  420: The host SIGSEGV and SIGBUS signal handlers are used to catch invalid
  421: memory accesses. The exact CPU state can be retrieved because all the
  422: x86 registers are stored in fixed host registers. The simulated program
  423: counter is found by retranslating the corresponding basic block and by
  424: looking where the host program counter was at the exception point.
  425: 
  426: The virtual CPU cannot retrieve the exact @code{EFLAGS} register because
  427: in some cases it is not computed because of condition code
  428: optimisations. It is not a big concern because the emulated code can
  429: still be restarted in any case.
  430: 
  431: @node MMU emulation
  432: @section MMU emulation
  433: 
  434: For system emulation, QEMU uses the mmap() system call to emulate the
  435: target CPU MMU. It works as long as the emulated OS does not use an area
  436: reserved by the host OS (such as the area above 0xc0000000 on x86
  437: Linux).
  438: 
  439: In order to be able to launch any OS, QEMU also supports a soft
  440: MMU. In that mode, the MMU virtual to physical address translation is
  441: done at every memory access. QEMU uses an address translation cache to
  442: speed up the translation.
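
An address translation cache of this kind can be sketched as a small
direct-mapped table in front of the guest page table. The page size, the
table size, the flat dictionary used as page table and the name
@code{SoftMMU} are all assumptions made for illustration.

```python
PAGE_BITS = 12
PAGE_MASK = (1 << PAGE_BITS) - 1
TLB_SIZE = 256   # hypothetical number of cache entries

class SoftMMU:
    def __init__(self, page_table):
        self.page_table = page_table   # virtual page number -> physical page number
        self.tlb = [None] * TLB_SIZE   # each entry is (vpn, ppn) or None
        self.walks = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        index = vpn % TLB_SIZE
        entry = self.tlb[index]
        if entry is None or entry[0] != vpn:
            # Slow path: consult the guest page table, then refill the cache.
            self.walks += 1
            if vpn not in self.page_table:
                raise LookupError("guest page fault")
            entry = (vpn, self.page_table[vpn])
            self.tlb[index] = entry
        return (entry[1] << PAGE_BITS) | (vaddr & PAGE_MASK)

    def flush(self):
        # Needed whenever the guest changes its mappings.
        self.tlb = [None] * TLB_SIZE
```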
  443: 
  444: In order to avoid flushing the translated code each time the MMU
  445: mappings change, QEMU uses a physically indexed translation cache. It
  446: means that each basic block is indexed with its physical address.
  447: 
  448: When MMU mappings change, only the chaining of the basic blocks is
  449: reset (i.e. a basic block can no longer jump directly to another one).
  450: 
  451: @node Hardware interrupts
  452: @section Hardware interrupts
  453: 
  454: In order to be faster, QEMU does not check at every basic block whether a
  455: hardware interrupt is pending. Instead, the user must asynchronously
  456: call a specific function to tell that an interrupt is pending. This
  457: function resets the chaining of the currently executing basic
  458: block. It ensures that the execution will return soon in the main loop
  459: of the CPU emulator. Then the main loop can test if the interrupt is
  460: pending and handle it.
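
The mechanism can be modelled in a few lines. In this sketch the
asynchronous caller is simulated by a hook invoked after each block, and
the names @code{Block}, @code{CPU}, @code{raise_interrupt} and
@code{run_chain} are hypothetical.

```python
class Block:
    def __init__(self, name):
        self.name = name
        self.chained = None   # direct jump to the next block, if patched in

class CPU:
    def __init__(self):
        self.current = None
        self.interrupt_pending = False

    def raise_interrupt(self):
        # Record the request and break the chain of the running block, so
        # that execution falls back to the main loop when the block ends.
        self.interrupt_pending = True
        if self.current is not None:
            self.current.chained = None

    def run_chain(self, block, on_block=None):
        # Follow direct chains without ever testing interrupt_pending.
        executed = []
        while block is not None:
            self.current = block
            executed.append(block.name)
            if on_block is not None:
                on_block(self, block)   # stands in for an asynchronous event
            block = block.chained
        self.current = None
        return executed
```

With two blocks chained into an endless loop, @code{run_chain} only
returns once @code{raise_interrupt} has been called; the main loop can
then test @code{interrupt_pending} and re-chain the blocks later.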
  461: 
  462: @node User emulation specific details
  463: @section User emulation specific details
  464: 
  465: @subsection Linux system call translation
  466: 
  467: QEMU includes a generic system call translator for Linux. It means that
  468: the parameters of the system calls can be converted to fix the
  469: endianness and 32/64 bit issues. The IOCTLs are converted with a generic
  470: type description system (see @file{ioctls.h} and @file{thunk.c}).
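
The kind of conversion involved can be sketched with Python's
@code{struct} module. The field descriptions below are an assumption made
for illustration and do not reproduce the format used by
@file{thunk.c}; the sketch also ignores alignment padding.

```python
import struct

# Hypothetical field descriptions: (name, struct format character).
TIMEVAL32 = [("tv_sec", "i"), ("tv_usec", "i")]   # 32 bit target layout
TIMEVAL64 = [("tv_sec", "q"), ("tv_usec", "q")]   # 64 bit host layout

def layout(desc, byte_order):
    # byte_order is "<" (little endian) or ">" (big endian).
    return byte_order + "".join(fmt for _, fmt in desc)

def target_to_host(data, target_desc, target_order, host_desc, host_order):
    # Decode with the target's endianness and field widths,
    # then re-encode the same values for the host.
    values = struct.unpack(layout(target_desc, target_order), data)
    return struct.pack(layout(host_desc, host_order), *values)

# A big endian 32 bit target structure converted for a
# little endian 64 bit host.
target_bytes = struct.pack(">ii", 1, 500000)
host_bytes = target_to_host(target_bytes, TIMEVAL32, ">", TIMEVAL64, "<")
```

Driving the conversion from a description means a new structure needs
only a new table entry, not new conversion code.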
  471: 
  472: QEMU supports host CPUs which have pages bigger than 4KB. It records all
  473: the mappings the process does and tries to emulate the @code{mmap()}
  474: system calls in cases where the host @code{mmap()} call would fail
  475: because of bad page alignment.
  476: 
  477: @subsection Linux signals
  478: 
  479: Normal and real-time signals are queued along with their information
  480: (@code{siginfo_t}) as it is done in the Linux kernel. Then an interrupt
  481: request is done to the virtual CPU. When it is interrupted, one queued
  482: signal is handled by generating a stack frame in the virtual CPU as the
  483: Linux kernel does. The @code{sigreturn()} system call is emulated to return
  484: from the virtual signal handler.
  485: 
  486: Some signals (such as SIGALRM) directly come from the host. Other
  487: signals are synthesized from the virtual CPU exceptions such as SIGFPE
  488: when a division by zero is done (see @code{main.c:cpu_loop()}).
  489: 
  490: The blocked signal mask is still handled by the host Linux kernel so
  491: that most signal system calls can be redirected directly to the host
  492: Linux kernel. Only the @code{sigaction()} and @code{sigreturn()} system
  493: calls need to be fully emulated (see @file{signal.c}).
  494: 
  495: @subsection clone() system call and threads
  496: 
  497: The Linux clone() system call is usually used to create a thread. QEMU
  498: uses the host clone() system call so that real host threads are created
  499: for each emulated thread. One virtual CPU instance is created for each
  500: thread.
  501: 
  502: The virtual x86 CPU atomic operations are emulated with a global lock so
  503: that their semantics are preserved.
  504: 
  505: Note that currently there are still some locking issues in QEMU. In
  506: particular, the translation cache flush is not protected yet against
  507: reentrancy.
  508: 
  509: @subsection Self-virtualization
  510: 
  511: QEMU was conceived so that ultimately it can emulate itself. Although
  512: it is not very useful, it is an important test to show the power of the
  513: emulator.
  514: 
  515: Achieving self-virtualization is not easy because there may be address
  516: space conflicts. QEMU solves this problem by being an executable ELF
  517: shared object, like the ld-linux.so ELF interpreter. That way, it can be
  518: relocated at load time.
  519: 
  520: @node Bibliography
  521: @section Bibliography
  522: 
  523: @table @asis
  524: 
  525: @item [1]
  526: @url{http://citeseer.nj.nec.com/piumarta98optimizing.html}, Optimizing
  527: direct threaded code by selective inlining (1998) by Ian Piumarta, Fabio
  528: Riccardi.
  529: 
  530: @item [2]
  531: @url{http://developer.kde.org/~sewardj/}, Valgrind, an open-source
  532: memory debugger for x86-GNU/Linux, by Julian Seward.
  533: 
  534: @item [3]
  535: @url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project,
  536: by Kevin Lawton et al.
  537: 
  538: @item [4]
  539: @url{http://www.cs.rose-hulman.edu/~donaldlf/em86/index.html}, the EM86
  540: x86 emulator on Alpha-Linux.
  541: 
  542: @item [5]
  543: @url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf},
  544: DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton
  545: Chernoff and Ray Hookway.
  546: 
  547: @item [6]
  548: @url{http://www.willows.com/}, Windows API library emulation from
  549: Willows Software.
  550: 
  551: @item [7]
  552: @url{http://user-mode-linux.sourceforge.net/},
  553: The User-mode Linux Kernel.
  554: 
  555: @item [8]
  556: @url{http://www.plex86.org/},
  557: The new Plex86 project.
  558: 
  559: @item [9]
  560: @url{http://www.vmware.com/},
  561: The VMWare PC virtualizer.
  562: 
  563: @item [10]
  564: @url{http://www.microsoft.com/windowsxp/virtualpc/},
  565: The VirtualPC PC virtualizer.
  566: 
  567: @item [11]
  568: @url{http://www.twoostwo.org/},
  569: The TwoOStwo PC virtualizer.
  570: 
  571: @end table
  572: 
  573: @node Regression Tests
  574: @chapter Regression Tests
  575: 
  576: In the directory @file{tests/}, various interesting testing programs
  577: are available. They are used for regression testing.
  578: 
  579: @menu
  580: * test-i386::
  581: * linux-test::
  582: * qruncom.c::
  583: @end menu
  584: 
  585: @node test-i386
  586: @section @file{test-i386}
  587: 
  588: This program executes most of the 16 bit and 32 bit x86 instructions and
  589: generates a text output. It can be compared with the output obtained with
  590: a real CPU or another emulator. The target @code{make test} runs this
  591: program and a @code{diff} on the generated output.
  592: 
  593: The Linux system call @code{modify_ldt()} is used to create x86 selectors
  594: to test some 16 bit addressing and 32 bit with segmentation cases.
  595: 
  596: The Linux system call @code{vm86()} is used to test vm86 emulation.
  597: 
  598: Various exceptions are raised to test most of the x86 user space
  599: exception reporting.
  600: 
  601: @node linux-test
  602: @section @file{linux-test}
  603: 
  604: This program tests various Linux system calls. It is used to verify
  605: that the system call parameters are correctly converted between target
  606: and host CPUs.
  607: 
  608: @node qruncom.c
  609: @section @file{qruncom.c}
  610: 
  611: Example of usage of @code{libqemu} to emulate a user mode i386 CPU.
  612: 
  613: @node Index
  614: @chapter Index
  615: @printindex cp
  616: 
  617: @bye