Sophie: xen-doc-4.5.0-1m.mo8 i686

xen-doc-4.5.0-1m.mo8.i686.rpm

-----------------------
XSM/FLASK Configuration
-----------------------

Xen provides a security framework called XSM, and FLASK is an implementation of
a security model using this framework (at the time of writing, it is the only
one). FLASK defines a mandatory access control policy providing fine-grained
controls over Xen domains, allowing the policy writer to define what
interactions between domains, devices, and the hypervisor are permitted.

Some examples of what FLASK can do:
- Prevent two domains from communicating via event channels or grants
- Control which domains can use device passthrough (and which devices)
- Restrict or audit operations performed by privileged domains
- Prevent a privileged domain from arbitrarily mapping pages from other domains

Some of these examples require dom0 disaggregation to be useful, since the
domain build process requires the ability to write to the new domain's memory.

Security Status of dom0 disaggregation
--------------------------------------

Xen supports disaggregation of various support and management
functions into their own domains, via the XSM mechanisms described in
this document.

However the implementations of these support and management interfaces
were originally written to be used only by the totally-privileged
dom0, and have not been reviewed for security when exposed to
supposedly-only-semi-privileged disaggregated management domains. But
such management domains are (in such a design) to be seen as
potentially hostile, e.g. due to privilege escalation following
exploitation of a bug in the management domain.

Until the interfaces have been properly reviewed for security against
hostile callers, the Xen.org security team intends (subject of course
to the permission of anyone disclosing to us) to handle these and
future vulnerabilities in these interfaces in public, as if they were
normal non-security-related bugs.

This applies only to bugs which do no more than reduce the security of
a radically disaggregated system to the security of a
non-disaggregated one. Here a "radically disaggregated system" is one
which uses the XSM mechanism to delegate the affected interfaces to
other-than-fully-trusted domains.

This policy does not apply to bugs which affect stub device models,
driver domains, or stub xenstored - even if those bugs do no worse
than reduce the security of such a system to one whose device models,
backend drivers, or xenstore, run in dom0.

For more information see http://xenbits.xen.org/xsa/advisory-77.html.

The following interfaces are covered by this statement. Interfaces
not listed here are considered safe for disaggregation, security
issues found in interfaces not listed here will be handled according
to the normal security problem response policy
http://www.xenproject.org/security-policy.html.

__HYPERVISOR_domctl (xen/include/public/domctl.h)

The following subops are covered by this statement. subops not listed
here are considered safe for disaggregation.

* XEN_DOMCTL_createdomain
* XEN_DOMCTL_destroydomain
* XEN_DOMCTL_getmemlist
* XEN_DOMCTL_setvcpuaffinity
* XEN_DOMCTL_shadow_op
* XEN_DOMCTL_max_mem
* XEN_DOMCTL_setvcpucontext
* XEN_DOMCTL_getvcpucontext
* XEN_DOMCTL_max_vcpus
* XEN_DOMCTL_scheduler_op
* XEN_DOMCTL_iomem_permission
* XEN_DOMCTL_gethvmcontext
* XEN_DOMCTL_sethvmcontext
* XEN_DOMCTL_set_address_size
* XEN_DOMCTL_assign_device
* XEN_DOMCTL_pin_mem_cacheattr
* XEN_DOMCTL_set_ext_vcpucontext
* XEN_DOMCTL_get_ext_vcpucontext
* XEN_DOMCTL_test_assign_device
* XEN_DOMCTL_set_target
* XEN_DOMCTL_deassign_device
* XEN_DOMCTL_get_device_group
* XEN_DOMCTL_set_machine_address_size
* XEN_DOMCTL_debug_op
* XEN_DOMCTL_gethvmcontext_partial
* XEN_DOMCTL_mem_event_op
* XEN_DOMCTL_mem_sharing_op
* XEN_DOMCTL_setvcpuextstate
* XEN_DOMCTL_getvcpuextstate
* XEN_DOMCTL_set_access_required
* XEN_DOMCTL_set_virq_handler
* XEN_DOMCTL_set_broken_page_p2m
* XEN_DOMCTL_setnodeaffinity
* XEN_DOMCTL_gdbsx_guestmemio

__HYPERVISOR_sysctl (xen/include/public/sysctl.h)

The following subops are covered by this statement. subops not listed
here are considered safe for disaggregation.

* XEN_SYSCTL_readconsole
* XEN_SYSCTL_tbuf_op
* XEN_SYSCTL_physinfo
* XEN_SYSCTL_sched_id
* XEN_SYSCTL_perfc_op
* XEN_SYSCTL_getdomaininfolist
* XEN_SYSCTL_debug_keys
* XEN_SYSCTL_getcpuinfo
* XEN_SYSCTL_availheap
* XEN_SYSCTL_get_pmstat
* XEN_SYSCTL_cpu_hotplug
* XEN_SYSCTL_pm_op
* XEN_SYSCTL_page_offline_op
* XEN_SYSCTL_lockprof_op
* XEN_SYSCTL_topologyinfo
* XEN_SYSCTL_numainfo
* XEN_SYSCTL_cpupool_op
* XEN_SYSCTL_scheduler_op
* XEN_SYSCTL_coverage_op

__HYPERVISOR_memory_op (xen/include/public/memory.h)

The following subops are covered by this statement. subops not listed
here are considered safe for disaggregation.

* XENMEM_set_pod_target
* XENMEM_get_pod_target
* XENMEM_claim_pages

__HYPERVISOR_tmem_op (xen/include/public/tmem.h)

The following tmem control ops, that is the sub-subops of
TMEM_CONTROL, are covered by this statement.

Note that TMEM is also subject to a similar policy arising from
XSA-15 http://lists.xen.org/archives/html/xen-announce/2012-09/msg00006.html.
Due to this existing policy all TMEM Ops are already subject to
reduced security support.

* TMEMC_THAW
* TMEMC_FREEZE
* TMEMC_FLUSH
* TMEMC_DESTROY
* TMEMC_LIST
* TMEMC_SET_WEIGHT
* TMEMC_SET_CAP
* TMEMC_SET_COMPRESS
* TMEMC_QUERY_FREEABLE_MB
* TMEMC_SAVE_BEGIN
* TMEMC_SAVE_GET_VERSION
* TMEMC_SAVE_GET_MAXPOOLS
* TMEMC_SAVE_GET_CLIENT_WEIGHT
* TMEMC_SAVE_GET_CLIENT_CAP
* TMEMC_SAVE_GET_CLIENT_FLAGS
* TMEMC_SAVE_GET_POOL_FLAGS
* TMEMC_SAVE_GET_POOL_NPAGES
* TMEMC_SAVE_GET_POOL_UUID
* TMEMC_SAVE_GET_NEXT_PAGE
* TMEMC_SAVE_GET_NEXT_INV
* TMEMC_SAVE_END
* TMEMC_RESTORE_BEGIN
* TMEMC_RESTORE_PUT_PAGE
* TMEMC_RESTORE_FLUSH_PAGE

Setting up FLASK
----------------

Xen must be compiled with XSM and FLASK enabled; by default, the security
framework is disabled. Edit Config.mk or the .config file to set XSM_ENABLE and
FLASK_ENABLE to "y"; this change requires a make clean and rebuild.

FLASK uses only one domain configuration parameter (seclabel) defining the
full security label of the newly created domain. If using the example policy,
"seclabel='system_u:system_r:domU_t'" is an example of a normal domain. The
labels are in the same format as SELinux labels; see http://selinuxproject.org
for more details on the use of the user, role, and optional MLS/MCS labels.

FLASK policy overview
---------------------

Most of FLASK policy consists of defining the interactions allowed between
different types (domU_t would be the type in this example). For simple policies,
only type enforcement is used and the user and role are set to system_u and
system_r for all domains.

The FLASK security framework is mostly configured using a security policy file.
This policy file is not normally generated during the Xen build process because
it relies on the SELinux compiler "checkpolicy"; run

make -C tools/flask/policy

to compile the example policy included with Xen. The policy is generated from
definition files under this directory. When creating or modifying security
policy, most modifications will be made to the xen type enforcement (.te) file
tools/flask/policy/policy/modules/xen/xen.te or the macro definitions in xen.if.
The XSM policy file needs to be copied to /boot and loaded as a module by grub.
The exact position of the module does not matter as long as it is after the Xen
kernel; it is normally placed either just above the dom0 kernel or at the end.
Once dom0 is running, the policy can be reloaded using "xl loadpolicy".

The example policy included with Xen demonstrates most of the features of FLASK
that can be used without dom0 disaggregation. The main types for domUs are:

- domU_t is a domain that can communicate with any other domU_t
- isolated_domU_t can only communicate with dom0
- prot_domU_t is a domain type whose creation can be disabled with a boolean
- nomigrate_t is a domain that must be created via the nomigrate_t_building
type, and whose memory cannot be read by dom0 once created

HVM domains with stubdomain device models use two types (one per domain):
- domHVM_t is an HVM domain that uses a stubdomain device model
- dm_dom_t is the device model for a domain with type domHVM_t

One disadvantage of using type enforcement to enforce isolation is that a new
type is needed for each group of domains. The user field can be used to address
this for the most common case of groups that can communicate internally but not
externally; see "Users and roles" below.

Type transitions
----------------

Xen defines a number of operations such as memory mapping that are necessary for
a domain to perform on itself, but are also undesirable to allow a domain to
perform on every other domain of the same label. While it is possible to address
this by only creating one domain per type, this solution significantly limits
the flexibility of the type system. Another method to address this issue is to
duplicate the permission names for every operation that can be performed on the
current domain or on other domains; however, this significantly increases the
necessary number of permissions and complicates the XSM hooks. Instead, this is
addressed by allowing a distinct type to be used for a domain's access to
itself. The same applies for a device model domain's access to its designated
target, allowing the IS_PRIV_FOR checks used in Xen's DAC model to be
implemented in FLASK.

Upon domain creation (or relabel), a type transition is computed using the
domain's label as the source and target. The result of this computation is used
as the target when the domain accesses itself. In the example policy, this
computed type is the result of appending _self to a domain's type: domU_t_self
for domU_t. If no type transition rule exists, the domain will continue to use
its own label for both the source and target. An AVC message will look like:

scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t_self

A similar type transition is done when a device model domain is associated with
its target using the set_target operation. The transition is computed with the
target domain as the source and the device model domain as the target: this
ordering was chosen in order to preserve the original label for the target when
no type transition rule exists. In the example policy, these computed types are
the result of appending _target to the domain.

Type transitions are also used to compute the labels of event channels.

Users and roles
---------------

Users are defined in tools/flask/policy/policy/users. The example policy defines
two users (customer_1 and customer_2) in addition to the system user system_u.
Users are visible in the labels of domains and associated objects (event
channels); in the example policy, "customer_1:vm_r:domU_t" is a valid label for
the customer_1 user.

Access control rules involving users and roles are defined in the policy
constraints file (tools/flask/policy/policy/constraints). The example policy
provides constraints that prevent different users from communicating using
grants or event channels, while still allowing communication with the system_u
user where dom0 resides.

Resource Policy
---------------

The example policy also includes a resource type (nic_dev_t) for device
passthrough, configured to allow use by domU_t. To label the PCI device 3:2.0
for passthrough, run:

tools/flask/utils/flask-label-pci 0000:03:02.0 system_u:object_r:nic_dev_t

This command must be rerun on each boot or after any policy reload.

The example policy was only tested with simple domain creation and may be
missing rules allowing accesses by dom0 or domU when a number of hypervisor
features are used. When first loading or writing a policy, you should run FLASK
in permissive mode (the default) and check the Xen logs (xl dmesg) for AVC
denials before using it in enforcing mode (flask_enforcing=1 on the command
line, or xl setenforce).

MLS/MCS policy
--------------

If you want to use the MLS policy, then set TYPE=xen-mls in the policy Makefile
before building the policy. Note that the MLS constraints in policy/mls
are incomplete and are only a sample.

AVC denials
-----------

XSM:Flask will emit avc: denied messages when a permission is denied by the
policy, just like SELinux. For example, if the HVM rules are removed from the
declare_domain and create_domain interfaces:

# xl dmesg | grep avc
(XEN) avc: denied { setparam } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { getparam } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { irqlevel } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { pciroute } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { setparam } for domid=4 scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { cacheattr } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm
(XEN) avc: denied { pcilevel } for domid=0 scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=hvm

Existing SELinux tools such as audit2allow can be applied to these denials, e.g.
xl dmesg | audit2allow

The generated allow rules can then be fed back into the policy by
adding them to xen.te, although manual review is advised and will
often lead to adding parameterized rules to the interfaces in xen.if
to address the general case.

Device Labeling in Policy
-------------------------

FLASK is capable of labeling devices and enforcing policies associated with
them. There are two methods to label devices: dynamic labeling using
flask-label-pci or similar tools run in dom0, or static labeling defined in
policy. Static labeling will make security policy machine-specific and may
prevent the system from booting after any hardware changes (adding PCI cards,
memory, or even changing certain BIOS settings). Dynamic labeling requires that
the domain performing the labeling be trusted to label all the devices in the
system properly.

To enable static device labeling, a checkpolicy >= 2.0.20 and libsepol >=2.0.39
are required. The policy Makefile (tools/flask/policy/Makefile) must also be
changed as follows:

########################################
#
# Build a binary policy locally
#
$(POLVER): policy.conf
@echo "Compiling $(NAME) $(POLVER)"
$(QUIET) $(CHECKPOLICY) $^ -o $@ (Comment out this line)
# Uncomment line below to enable policies for devices
# $(QUIET) $(CHECKPOLICY) -t Xen $^ -o $@ (Uncomment this line)

########################################
#
# Install a binary policy
#
$(LOADPATH): policy.conf
@echo "Compiling and installing $(NAME) $(LOADPATH)"
$(QUIET) $(CHECKPOLICY) $^ -o $@ (Comment out this line)
# Uncomment line below to enable policies for devices
# $(QUIET) $(CHECKPOLICY) -t Xen $^ -o $@ (Uncomment this line)

IRQs, PCI devices, I/O memory and ports can all be labeled. There are
commented out lines in xen.te policy for examples on how to label devices.

Device Labeling
---------------

The "lspci -vvn" command can be used to output all the devices and identifiers
associated with them. For example, to label an Intel e1000e ethernet card the
lspci output is..

00:19.0 0200: 8086:10de (rev 02)
Subsystem: 1028:0276
Interrupt: pin A routed to IRQ 33
Region 0: Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at febd9000 (32-bit, non-prefetchable) [size=4K]
Region 2: I/O ports at ecc0 [size=32]
Kernel modules: e1000e

The labeling can be done with these commands

pirqcon 33 system_u:object_r:nicP_t
iomemcon 0xfebe0-0xfebff system_u:object_r:nicP_t
iomemcon 0xfebd9 system_u:object_r:nicP_t
ioportcon 0xecc0-0xecdf system_u:object_r:nicP_t
pcidevicecon 0xc800 system_u:object_r:nicP_t

The PCI device label must be computed as the 32-bit SBDF number for the PCI
device. It the PCI device is aaaa:bb:cc.d or bb:cc.d, then the SBDF can be
calculated using:
SBDF = (a << 16) | (b << 8) | (c << 3) | d

The AVC denials for IRQs, memory, ports, and PCI devices will normally contain
the ranges being denied to more easily determine what resources are required.
When running in permissive mode, only the first denial of a given
source/destination is printed to the log, so labeling devices using this method
may require multiple passes to find all required ranges.

Additional notes on XSM:FLASK
-----------------------------

1) xen command line parameters

a) flask_enforcing

The default value for flask_enforcing is '0'. This parameter causes the
platform to boot in permissive mode which means that the policy is loaded
but not enforced. This mode is often helpful for developing new systems
and policies as the policy violations are reported on the xen console and
may be viewed in dom0 through 'xl dmesg'.

To boot the platform into enforcing mode, which means that the policy is
loaded and enforced, append 'flask_enforcing=1' on the grub line.

This parameter may also be changed through the flask hypercall.

b) flask_enabled

The default value for flask_enabled is '1'. This parameter causes the
platform to enable the FLASK security module under the XSM framework.
The parameter may be enabled/disabled only once per boot. If the parameter
is set to '0', only a reboot can re-enable flask. When flask_enabled is '0'
the DUMMY module is enforced.

This parameter may also be changed through the flask hypercall. But may
only be performed once per boot.