NAME
    xen-tscmode - Xen TSC (time stamp counter) and timekeeping discussion

OVERVIEW
    As of Xen 4.0, a new config option called tsc_mode may be specified for
    each domain. The default for tsc_mode handles the vast majority of
    hardware and software environments. This document is intended for Xen
    users and administrators who may need to select a non-default tsc_mode.

    Proper selection of tsc_mode depends on an understanding not only of the
    guest operating system (OS), but also of the application set that will
    ever run on this guest OS. This is because tsc_mode applies equally to
    both the OS and ALL apps that are running on this domain, now or in the
    future.

    Key questions to be answered for the OS and/or each application are:

    *   Does the OS/app use the rdtsc instruction at all? (We will explain
        below how to determine this.)

    *   At what frequency is the rdtsc instruction executed by either the OS
        or any running apps? If the sum exceeds about 10,000 rdtsc
        instructions per second per processor, we call this a
        "high-TSC-frequency" OS/app/environment. (This is relatively rare,
        and developers of OS's and apps that are high-TSC-frequency are
        usually aware of it.)

    *   If the OS/app does use rdtsc, will it behave incorrectly if "time
        goes backwards" or if the frequency of the TSC suddenly changes? If
        so, we call this a "TSC-sensitive" app or OS; otherwise it is
        "TSC-resilient".

    This last is the US$64,000 question as it may be very difficult (or, for
    legacy apps, even impossible) to predict all possible failure cases. As
    a result, unless proven otherwise, any app that uses rdtsc must be
    assumed to be TSC-sensitive and, as we will see, this is the default
    starting in Xen 4.0.
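
    One rough way to answer the first question for a given application
    binary on Linux is to search its disassembly for rdtsc instructions.
    This is only an illustration: it catches statically visible uses in
    that one binary, not code generated at run time or rdtsc use inside
    shared libraries, and /path/to/app is a placeholder:

        $ objdump -d /path/to/app | grep -c rdtsc

    A nonzero count only shows that rdtsc (or rdtscp) is present somewhere
    in the binary; the "xl" method described below reports how often
    emulated rdtsc instructions are actually executed.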

    Xen's new tsc_mode parameter determines the circumstances under which
    the rdtsc family of instructions is executed "natively" vs emulated.
    Roughly speaking, native means rdtsc is fast but TSC-sensitive apps
    may, under unpredictable circumstances, run incorrectly; emulated means
    there is some performance degradation (unobservable in most cases), but
    TSC-sensitive apps will always run correctly. Prior to Xen 4.0, all
    rdtsc instructions were native: "fast but potentially incorrect."
    Starting with Xen 4.0, the default is that all rdtsc instructions are
    "correct but potentially slow". The tsc_mode parameter in 4.0 provides
    an intelligent default but allows system administrators to adjust, for
    each domain, how rdtsc instructions are executed.

    The non-default choices for tsc_mode are:

    *   tsc_mode=1 (always emulate).

        All rdtsc instructions are emulated; this is the best choice when
        TSC-sensitive apps are running and it is necessary to understand
        worst-case performance degradation for a specific hardware
        environment.

    *   tsc_mode=2 (never emulate).

        This is the same as prior to Xen 4.0 and is the best choice if it is
        certain that all apps running in this VM are TSC-resilient and
        highest performance is required.

    *   tsc_mode=3 (PVRDTSCP).

        This mode has been removed.

    If tsc_mode is left unspecified (or set to tsc_mode=0), a hybrid
    algorithm is utilized to ensure correctness while providing the best
    performance possible given:

    *   the requirement of correctness,

    *   the underlying hardware, and

    *   whether or not the VM has been saved/restored/migrated

    To understand this in more detail, the rest of this document must be
    read.
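
    As a concrete illustration, tsc_mode is set per-domain in the guest's
    configuration file and takes effect the next time the domain is
    created (for example with "xl create"). A minimal sketch follows; the
    excerpt is hypothetical, and recent toolstacks may also accept string
    values such as "always_emulate" -- see xl.cfg(5) for the exact syntax
    supported by your release:

        # excerpt from the guest's xl configuration file
        tsc_mode = 1   # 0 = default, 1 = always emulate, 2 = never emulate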

DETERMINING RDTSC FREQUENCY
    To determine the frequency of rdtsc instructions that are emulated, an
    "xl" command can be used by a privileged user of domain0. The command:

        # xl debug-key s; xl dmesg | tail

    provides information about TSC usage in each domain where TSC emulation
    is currently enabled.

TSC HISTORY
    To understand tsc_mode completely, some background on TSC is required:

    The x86 "timestamp counter", or TSC, is a 64-bit register on each
    processor that increases monotonically. Historically, TSC incremented
    every processor cycle, but on recent processors, it increases at a
    constant rate even if the processor changes frequency (for example, to
    reduce processor power usage). TSC is known by x86 programmers as the
    fastest, highest-precision measurement of the passage of time so it is
    often used as a foundation for performance monitoring. And since it is
    guaranteed to be monotonically increasing and, at 64 bits, is
    guaranteed not to wrap around within 10 years, it is sometimes used as
    a random number or a unique sequence identifier, such as to stamp
    transactions so they can be replayed in a specific order.

    On most older SMP and early multi-core machines, TSC was not
    synchronized between processors. Thus if an application were to read the
    TSC on one processor, then was moved by the OS to another processor,
    then read TSC again, it might appear that "time went backwards". This
    loss of monotonicity resulted in many obscure application bugs when
    TSC-sensitive apps were ported from a uniprocessor to an SMP
    environment; as a result, many applications -- especially in the Windows
    world -- removed their dependency on TSC and replaced their timestamp
    needs with OS-specific functions, losing both performance and precision.
    On some more recent generations of multi-core machines, especially
    multi-socket multi-core machines, the TSC was synchronized but if one
    processor were to enter certain low-power states, its TSC would stop,
    destroying the synchrony and again causing obscure bugs. This reinforced
    decisions to avoid use of TSC altogether. On the most recent
    generations of multi-core machines, however, synchronization is
    provided across all processors in all power states, even on
    multi-socket machines, and a flag is provided that indicates that TSC
    is synchronized and "invariant".
    Thus TSC is once again useful for applications, and even newer operating
    systems are using and depending upon TSC for critical timekeeping tasks
    when running on these recent machines.

    We will refer to hardware that ensures TSC is both synchronized and
    invariant as "TSC-safe" and any hardware on which TSC is not (or may not
    remain) synchronized as "TSC-unsafe".

    As a result of TSC's sordid history, two classes of applications use
    TSC: old applications designed for single processors, and the most
    recent enterprise applications which require high-frequency
    high-precision timestamping.

    We will refer to apps that might break if running on a TSC-unsafe
    machine as "TSC-sensitive"; apps that don't use TSC, or that use TSC
    in a way for which monotonicity and frequency invariance are
    unimportant, as "TSC-resilient".

    The emergence of virtualization once again complicates the usage of TSC.
    When features such as save/restore or live migration are employed, a
    guest OS and all its currently running applications may be invisibly
    transported to an entirely different physical machine. While TSC may be
    "safe" on one machine, it is essentially impossible to precisely
    synchronize TSC across a data center or even a pool of machines. As a
    result, when run in a virtualized environment, rare and obscure "time
    going backwards" problems might once again occur for those TSC-sensitive
    applications. Worse, if a guest OS moves from, for example, a 3GHz
    machine to a 1.5GHz machine, attempts by an OS/app to measure time
    intervals with TSC may without notice be incorrect by a factor of two.

    The rdtsc (read timestamp counter) instruction is used to read the TSC
    register. The rdtscp instruction is a variant of rdtsc on recent
    processors. We refer to these together as the rdtsc family of
    instructions, or just "rdtsc". Instructions in the rdtsc family are
    non-privileged, but privileged software may set a control bit (such as
    CR4.TSD, or a hardware virtualization intercept) to cause all rdtsc
    family instructions to trap. This trap can be detected by Xen,
    which can then transparently "emulate" the results of the rdtsc
    instruction and return control to the code following the rdtsc
    instruction.
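
    As a side note, whether the processor visible to a Linux guest
    advertises the rdtscp variant can be checked by looking for the
    corresponding flag in /proc/cpuinfo (the flag name is Linux-specific;
    this is only a quick sanity check):

        $ grep -o -w -m1 rdtscp /proc/cpuinfo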

    To provide a "safe" TSC, i.e. to ensure both TSC monotonicity and a
    fixed rate, Xen provides rdtsc emulation whenever necessary or when
    explicitly specified by a per-VM configuration option. TSC emulation is
    relatively slow -- roughly 15-20 times slower than the rdtsc instruction
    when executed natively. However, except when an OS or application uses
    the rdtsc instruction at a high frequency (e.g. more than about 10,000
    times per second per processor), this performance degradation is not
    noticeable (i.e. <0.3%). And, TSC emulation is nearly always faster than
    OS-provided alternatives (e.g. Linux's gettimeofday). For environments
    where it is certain that all apps are TSC-resilient (i.e.
    "TSC-safeness" is not necessary) and highest performance is a
    requirement, TSC emulation may be entirely disabled (tsc_mode==2).
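
    To see how the numbers above fit together (an illustrative calculation
    only, assuming an emulated rdtsc costs on the order of 0.3
    microseconds in total):

        10,000 emulated rdtsc/sec/processor x ~0.3 usec each
            = ~3 msec of emulation overhead per second
            = ~0.3% of one processor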

    The default mode (tsc_mode==0) checks TSC-safeness of the underlying
    hardware on which the virtual machine is launched. If it is TSC-safe,
    rdtsc will execute at hardware speed; if it is not, rdtsc will be
    emulated. Once a virtual machine is saved/restored or migrated, however,
    there are two possibilities: TSC remains native IF the source physical
    machine and target physical machine have the same TSC frequency (or, for
    HVM/PVH guests, if TSC scaling support is available); else TSC is
    emulated. Note that, though emulated, the "apparent" TSC frequency will
    be the TSC frequency of the initial physical machine, even after
    migration.

    Finally, tsc_mode==1 always enables TSC emulation, regardless of the
    underlying physical hardware. The "apparent" TSC frequency will be the
    TSC frequency of the initial physical machine, even after migration.
    This mode is useful to measure any performance degradation that might
    be encountered by a tsc_mode==0 domain after migration occurs (or that
    would have been encountered by the now-removed tsc_mode==3 running on
    TSC-unsafe hardware).

    Note that while Xen ensures that an emulated TSC is "safe" across
    migration, it does not ensure that it continues to tick at the same rate
    during the actual migration. As an oversimplified example, if TSC is
    ticking once per second in a guest, and the guest is saved when the TSC
    is 1000, then restored 30 seconds later, TSC is only guaranteed to be
    greater than or equal to 1001, not precisely 1030. This has some OS
    implications as will be seen in the next section.

TSC INVARIANT BIT and NO_MIGRATE
    Related to TSC emulation, the "TSC Invariant" bit is an architecturally
    defined cpuid bit on the most recent x86 processors. If set, TSC
    invariance ensures that the TSC is "safe", that is it will increment at
    a constant rate regardless of power events, will be synchronized across
    all processors, and was properly initialized to zero on all processors
    at boot-time by system hardware/BIOS. As long as system software never
    writes to TSC, TSC will be safe and continuously incremented at a fixed
    rate and thus can be used as a system "clocksource".

    This bit is used by some OS's, and specifically by Linux starting with
    version 2.6.30(?), to select TSC as a system clocksource. Once selected,
    TSC remains the Linux system clocksource unless manually overridden. In
    a virtualized environment, since it is not possible to synchronize TSC
    across all the machines in a pool or data center, a migration may
    "break" TSC as a usable clocksource; while time will not go backwards,
    it may not track wallclock time well enough to avoid certain
    time-sensitive consequences. As a result, Xen can only expose the TSC
    Invariant bit to a guest OS if it is certain that the domain will never
    migrate. As of Xen 4.0, the "no_migrate=1" VM configuration option may
    be specified to disable migration. If no_migrate is selected and the VM
    is running on a physical machine with "TSC Invariant", Linux 2.6.30+
    will safely use TSC as the system clocksource. But, attempts to migrate
    or, once saved, restore this domain will fail.
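
    For example, such a domain might be configured along these lines (a
    sketch only; the option is taken from the description above, and the
    exact spelling accepted by a given toolstack version should be
    confirmed in xl.cfg(5)):

        # excerpt from the guest's configuration file
        no_migrate = 1   # never migrate; TSC Invariant may then be exposed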

    There is another cpuid-related complication: The x86 cpuid instruction
    is non-privileged. HVM domains are configured to always trap this
    instruction to Xen, where Xen can "filter" the result. In a PV OS, all
    cpuid instructions have been replaced by a paravirtualized equivalent of
    the cpuid instruction ("pvcpuid") and also trap to Xen. But apps in a PV
    guest that use a cpuid instruction execute it directly, without a trap
    to Xen. As a result, an app may directly examine the physical TSC
    Invariant cpuid bit and make decisions based on that bit.
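
    As a rough illustration of what a guest-side check looks like, on a
    Linux guest the kernel's view of TSC safety can be inspected via
    /proc/cpuinfo; the constant_tsc and nonstop_tsc flag names are
    Linux-specific and only approximate the architectural Invariant TSC
    bit:

        $ grep -o -w -e constant_tsc -e nonstop_tsc /proc/cpuinfo | sort -u

    In an HVM guest these flags reflect cpuid results that Xen may have
    filtered; an application in a PV guest issuing cpuid directly sees the
    physical values, as described above.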

HARDWARE TSC SCALING
    Intel VMX TSC scaling and AMD SVM TSC ratio allow the guest TSC, as
    read by guest rdtsc/rdtscp, to increase at a different frequency than
    the host TSC frequency.

    If an HVM container in the default TSC mode (tsc_mode=0) is created on
    a host that provides a constant TSC, its guest TSC frequency will be
    the same as the host's. If it is later migrated to another host that
    provides a constant TSC and supports Intel VMX TSC scaling/AMD SVM TSC
    ratio, its guest TSC frequency will be the same before and after
    migration.

    For the above HVM container in the default TSC mode (tsc_mode=0), if
    the above hosts support rdtscp, both guest rdtsc and rdtscp
    instructions will be executed natively before and after migration.

AUTHORS
    Dan Magenheimer <dan.magenheimer@oracle.com>