<?xml version='1.0'?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
"http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd"
>
<chapter id="architecture">
	<title>Overview of the Waf architecture</title>
	<section id="core_lib">
		<title>The core library</title>
		<para>
			Waf is based on 12 modules which constitute the core library. They are located in the directory <filename>wafadmin/</filename>. The modules located under <filename>wafadmin/Tools</filename> add support for programming languages and more tools, but are not essential for the Waf core.
<table id="core_lib_ref">
<title>The core library</title>
<tgroup cols='2' align='left' colsep='1' rowsep='1'>

<colspec colname='c1'/>
<colspec colname='c2'/>

<thead>
<row>
  <entry>Module</entry>
  <entry>Role</entry>
</row>
</thead>

<tbody>

<row>
  <entry>Build</entry>
  <entry>Defines the build context class, which holds the data for one build (paths, configuration data)</entry>
</row>

<row>
  <entry>Configure</entry>
  <entry>Contains the configuration context class, which is used for launching configuration tests, and the extension system</entry>
</row>

<row>
  <entry>Constants</entry>
  <entry>Provides the constants used in the project</entry>
</row>

<row>
  <entry>Environment</entry>
  <entry>Contains a dictionary class which supports a lightweight copy scheme and provides persistence services</entry>
</row>

<row>
  <entry>Logs</entry>
  <entry>Provides a logging system</entry>
</row>

<row>
  <entry>Node</entry>
  <entry>Contains the file system representation class</entry>
</row>

<row>
  <entry>Options</entry>
  <entry>Provides a custom command-line option processing system based on optparse</entry>
</row>

<row>
  <entry>Runner</entry>
  <entry>Contains the task execution system (threaded producer-consumer)</entry>
</row>

<row>
  <entry>Scripting</entry>
  <entry>Constitutes the entry point of the Waf application; uses the command line to launch the configuration, the build, etc.</entry>
</row>

<row>
  <entry>TaskGen</entry>
  <entry>Provides the task generator system, and its extension system based on method addition</entry>
</row>

<row>
  <entry>Task</entry>
  <entry>Contains the task classes and the task containers</entry>
</row>

<row>
  <entry>Utils</entry>
  <entry>Contains the support functions and classes re-used in other Waf modules</entry>
</row>


</tbody>
</tgroup>
</table>
	</para>

	<para>
		The essential classes and methods of the core library are represented in the following diagram:
		<graphic format="png" fileref="classes.png" align="center"/>
	</para>

	</section>

	<section id="build_context">
		<title>Build context instances</title>
		<para>
			Executing tasks, accessing the file system and consulting the results of a previous build are very different concerns which have to be encapsulated properly. The core class representing a build is a build context.
		</para>
		<sect2>
			<title>Build context and persistence</title>
			<para>
				The build context holds all the information necessary for a build. To accelerate the start-up, a part of the information is stored and loaded between the runs. The persistent attributes are the following:
<table id="build_context_persistence">
<title>Build context persistence</title>

<tgroup cols='2' align='left' colsep='1' rowsep='1'>

<colspec colname='c1'/>
<colspec colname='c2'/>

<thead>
<row>
  <entry>Attribute</entry>
  <entry>Information</entry>
</row>
</thead>

<tbody>

<row>
  <entry>root</entry>
  <entry>Node representing the root of the file system</entry>
</row>
<row>
  <entry>srcnode</entry>
  <entry>Node representing the source directory</entry>
</row>
<row>
  <entry>bldnode</entry>
  <entry>Node representing the build directory</entry>
</row>
<row>
  <entry>node_sigs</entry>
  <entry>File hashes (dict mapping Node ids to hashes)</entry>
</row>
<row>
  <entry>node_deps</entry>
  <entry>Implicit dependencies (dict mapping Node ids)</entry>
</row>
<row>
  <entry>raw_deps</entry>
  <entry>Implicit file dependencies which could not be resolved (dict mapping Node ids to lists of strings)</entry>
</row>
<row>
  <entry>task_sigs</entry>
  <entry>Signature of the tasks previously run (dict mapping a Task id to a hash)</entry>
</row>
<row>
  <entry>id_nodes</entry>
  <entry>Sequence for generating unique node instance ids (id of the last Node created)</entry>
</row>

</tbody>
</tgroup>
</table>
			</para>
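			<para>
				The following is a minimal sketch of such a persistence scheme, not the actual Waf implementation. The attribute names come from the table above; the class <emphasis>SketchBuildContext</emphasis>, the list <emphasis>SAVED_ATTRS</emphasis>, the function <emphasis>roundtrip_demo</emphasis> and the use of the <emphasis>pickle</emphasis> module are assumptions made for illustration (the sketch is written for Python 3).
			</para>
<programlisting><![CDATA[
```python
import os
import pickle
import tempfile

# Attribute names taken from the persistence table; the rest is hypothetical.
SAVED_ATTRS = ['root', 'srcnode', 'bldnode', 'node_sigs', 'node_deps',
               'raw_deps', 'task_sigs', 'id_nodes']

class SketchBuildContext(object):
    """Hypothetical stand-in for a build context, for illustration only."""

    def __init__(self):
        # Start with empty values for every persistent attribute.
        self.root = self.srcnode = self.bldnode = None
        self.node_sigs = {}
        self.node_deps = {}
        self.raw_deps = {}
        self.task_sigs = {}
        self.id_nodes = 0

    def save(self, path):
        # Write only the persistent attributes, not the whole object.
        data = dict((name, getattr(self, name)) for name in SAVED_ATTRS)
        with open(path, 'wb') as f:
            pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

    def load(self, path):
        # Restore the persistent attributes saved by a previous run.
        with open(path, 'rb') as f:
            data = pickle.load(f)
        for name in SAVED_ATTRS:
            setattr(self, name, data[name])

def roundtrip_demo():
    # Save one context, then load its state into a fresh one.
    first = SketchBuildContext()
    first.task_sigs['compile_main'] = 'abc123'
    first.id_nodes = 42
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        first.save(path)
        second = SketchBuildContext()
        second.load(path)
    finally:
        os.remove(path)
    return second
```
]]></programlisting>
			<para>
				Storing only a fixed list of attributes keeps the saved file small and leaves out transient state that is rebuilt on every run.
			</para>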
		</sect2>
		<sect2>
			<title>Build context access</title>
			<para>
				In previous Waf releases, the build context was supposed to be a unique object (one build active at a time). To enable the use of Waf as a library, the dependency on the singleton <emphasis>Build.bld</emphasis> was removed. This implies that each object should be able to obtain its build context from its attributes. Here are a few examples:
<table id="build_context_access">
<title>Build context access</title>

<tgroup cols='2' align='left' colsep='1' rowsep='1'>

<colspec colname='c1'/>
<colspec colname='c2'/>

<thead>
<row>
  <entry>Object type</entry>
  <entry>Build context access</entry>
</row>
</thead>

<tbody>

<row>
  <entry>Node</entry>
  <entry>self.__class__.bld</entry>
</row>
<row>
  <entry>task_gen</entry>
  <entry>self.bld</entry>
</row>
<row>
  <entry>Task</entry>
  <entry>self.generator.bld</entry>
</row>

</tbody>
</tgroup>
</table>
			</para>
		</sect2>
		<sect2>
			<title>Parallelization concerns</title>
			<para>
				Build contexts perform an <emphasis>os.chdir</emphasis> call before starting to execute the tasks. When running build contexts within build contexts (tasks), the current working directory may cause various problems. To work around them, it may be necessary to change the compilation rules (compile from the file system root) and to inject code (replace bld.compile).
			</para>
			<para>
				Direct <emphasis>Node</emphasis> instances are not used anywhere in the Waf code. Instead, each build context creates a new Node subclass (bld.node_class), on which the build context instance is attached as a class attribute.
			</para>
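			<para>
				The idea of one Node subclass per build context can be sketched as follows. This is an illustration under stated assumptions, not the Waf source: the classes <emphasis>SketchBuildContext</emphasis>, <emphasis>SketchTaskGen</emphasis> and <emphasis>SketchTask</emphasis> and the subclass name <emphasis>SketchNode</emphasis> are hypothetical, and only the attribute names <emphasis>bld</emphasis>, <emphasis>generator</emphasis> and <emphasis>node_class</emphasis> are taken from the text.
			</para>
<programlisting><![CDATA[
```python
class Node(object):
    """Hypothetical base class; a real node class has many more members."""
    bld = None  # overridden on each per-context subclass

    def __init__(self, name):
        self.name = name

class SketchBuildContext(object):
    def __init__(self):
        # Create a fresh Node subclass that carries this context
        # as a class attribute.
        self.node_class = type('SketchNode', (Node,), {'bld': self})

class SketchTaskGen(object):
    def __init__(self, bld):
        self.bld = bld

class SketchTask(object):
    def __init__(self, generator):
        self.generator = generator

# Two independent build contexts living in the same process.
bld_a = SketchBuildContext()
bld_b = SketchBuildContext()
node_a = bld_a.node_class('main.c')
node_b = bld_b.node_class('main.c')
task = SketchTask(SketchTaskGen(bld_a))
```
]]></programlisting>
			<para>
				Because the context is attached to the subclass rather than to each instance, every node can reach its build context through <emphasis>self.__class__.bld</emphasis> without storing an extra reference per node.
			</para>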
		</sect2>
		<sect2>
			<title>Threading concerns</title>
			<para>
				Nearly all the code is executed in the main thread. The other threads merely wait for new tasks and execute the methods <emphasis>run</emphasis> and <emphasis>install</emphasis> of the task instances. As a consequence, such methods should contain as little code as possible, and access the BuildContext in a read-only manner. If such tasks must declare new nodes while executing the build (find_dir, find_resource, etc.), then locks must be used to prevent concurrent access (concurrent directory and node creation).
			</para>
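			<para>
				A minimal sketch of the locking idea is given below. It does not use the Waf classes: <emphasis>SketchDirNode</emphasis>, <emphasis>find_or_declare</emphasis> and <emphasis>declare_from_threads</emphasis> are hypothetical names, and a plain dictionary stands in for the node tree.
			</para>
<programlisting><![CDATA[
```python
import threading

class SketchDirNode(object):
    """Hypothetical directory node keeping one child object per name."""

    def __init__(self):
        self.children = {}
        self.lock = threading.Lock()

    def find_or_declare(self, name):
        # Serialize lookups and creations so that two threads never
        # create two different objects for the same name.
        with self.lock:
            node = self.children.get(name)
            if node is None:
                node = object()
                self.children[name] = node
            return node

def declare_from_threads(directory, name, count):
    # Ask for the same node from several worker threads at once.
    results = []

    def worker():
        results.append(directory.find_or_declare(name))

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```
]]></programlisting>
			<para>
				Without the lock, two threads could both see that the name is missing and each create their own object, which would break the rule that one file is represented by a single node.
			</para>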
			<para>
				If the run methods have to modify the build context, it is recommended to overload the method <emphasis>get_out</emphasis> of the scheduler and to execute methods in an event-like manner (data is attached to the task, and the method get_out executes the code). Adding more tasks from a running task is demonstrated in <xref linkend="runtime_discovered_outputs"/>.
			</para>
		</sect2>
	</section>


	<section id="execution_overview">
		<title>Overview of the Waf execution</title>
		<para>
		</para>
		<sect2>
			<title>File system access</title>
			<para>
				File system access is performed through an abstraction layer formed by the build context and <emphasis>Node</emphasis> instances. The data structure was carefully designed to maximize performance, so it is unlikely that it will change again in the future. The idea is to represent one file or one directory by a single Node instance. Dependent data such as file hashes are stored on the build context object and can be persisted. Three kinds of nodes are declared: files, build files and folders. The nodes in a particular directory are unique, but build files used in several variants add duplicate entries to the build context cache.
			</para>
			<para>
				To access a file, the methods <emphasis>Node::find_resource</emphasis>, <emphasis>Node::find_build</emphasis> (find an existing resource or declare a build node) and <emphasis>Node::find_dir</emphasis> must be used. While searching for a particular node, each folder is automatically scanned once for its files. Old nodes (which do not have a corresponding file) are automatically removed, except for the build nodes. In some cases (lots of files added and removed), it may be necessary to perform a <emphasis>Waf clean</emphasis> to eliminate the information on build files which no longer exist.
			</para>
		</sect2>
		<sect2>
			<title>Task classes</title>
			<para>
				The whole process of generating tasks through Waf is performed by methods added to the class task_gen by code injection. This often puzzles programmers who are used to static languages, where new functions or classes cannot be defined at runtime.
			</para>
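			<para>
				Code injection of this kind can be sketched in a few lines of Python. The example below is an illustration only: the class body, the decorator <emphasis>feature_method</emphasis> and the method <emphasis>apply_copy</emphasis> are hypothetical and do not reproduce the Waf API.
			</para>
<programlisting><![CDATA[
```python
class task_gen(object):
    """Hypothetical, stripped-down task generator class."""

    def __init__(self, source):
        self.source = source
        self.tasks = []

def feature_method(func):
    # Bind the function on the class at runtime, so that every instance
    # (existing or future) gains a method of the same name.
    setattr(task_gen, func.__name__, func)
    return func

@feature_method
def apply_copy(self):
    # Hypothetical method: record one "task" per source file.
    for name in self.source:
        self.tasks.append(('copy', name))

gen = task_gen(['a.txt', 'b.txt'])
gen.apply_copy()
```
]]></programlisting>
			<para>
				The class definition itself never mentions <emphasis>apply_copy</emphasis>; the method exists only because the decorated function was attached after the class was created.
			</para>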
			<para>
				The task generators automatically inherit the build context attribute <emphasis>bld</emphasis> when created from bld(...). Likewise, tasks created from a task generator (create_task) automatically inherit their generator, and their build context. Direct instantiation may result in problems when running Waf as a library.
			</para>
			<para>
				The tasks created by task generator methods are automatically stored on the build context task manager, which stores each task in a task group. The task groups are later used by the scheduler to obtain the tasks which may run (state machine). Target (un)installation is performed right after a task has run, using the method <emphasis>install</emphasis>.
			</para>
		</sect2>
	</section>

	<section id="performance">
		<title>Performance and build accuracy</title>
		<para>
			From their experience with tools such as SCons, users may be concerned about performance and think that build systems based on interpreted languages such as Python cannot scale. We will now describe why this is not the case for Waf and why Waf should be chosen for building very large projects.
		</para>

		<sect2>
			<title>Comparing Waf against other build systems</title>
			<para>
				Since Waf considers the file contents in the build process, it is often thought that Waf would be much slower than make. For a test project having 5000 files (generated from the script located in <filename>tools/genbench.py</filename>), on a 1.5GHz computer, the Waf runtime is actually slightly faster than the GNU Make one (less than one second). The reason is the time needed to launch new processes: make is usually called recursively, once per directory.
			</para>
			<para>
				For huge projects, calling make recursively is necessary for flexibility, but it hurts performance (many processes are launched) and CPU utilization (fewer tasks can run in parallel). Make-based build systems such as CMake or Autotools inherit the limitations of Make.
			</para>
			<para>
				Though Waf uses a design similar to that of SCons, Waf is about 15 times faster for similar features, without sacrificing build accuracy. The main reasons for this are the following:
				<itemizedlist>
					<listitem><para>The Waf data structures (file system representation, tasks) have been carefully chosen to minimize memory usage and data duplication</para></listitem>
					<listitem><para>For a project of the same size, SCons requires at least 10 times as many function calls</para></listitem>
				</itemizedlist>
				A few benchmarks are maintained at <ulink url="http://freehackers.org/~tnagy/bench.txt">this location</ulink>.
			</para>
		</sect2>

		<sect2>
			<title>Waf hashing schemes and build accuracy</title>
			<para>
				To rebuild targets when source files change, the file contents are hashed and compared. The hashes are used to identify the tasks, and to retrieve the files from a cache (folder defined by the environment variable <emphasis>WAFCACHE</emphasis>). Besides command lines, this scheme also takes file dependencies into account: it is more accurate than caching systems such as <emphasis>ccache</emphasis>.
			</para>
			<para>
				The Waf hashing scheme uses the md5 algorithm provided by the Python distribution. It is fast enough for up to about 100MB of data and about 10000 files, and very safe (virtually no risk of collision).
			</para>
			<para>
				If more than 100MB of data is present in the project, it may be necessary to use a faster hashing algorithm. An implementation of the fnv algorithm is present in the Waf distribution, and can replace md5 without really degrading accuracy.
			</para>
			<para>
				If more than 10000 files are present, it may be necessary to replace the hashing system by a <emphasis>file name+size+timestamp hash scheme</emphasis>. An example is provided in the comment section of the module <filename>Utils.py</filename>. That scheme is more efficient but less accurate: the Waf cache should not be used with this scheme.
			</para>
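			<para>
				The difference between the two schemes can be sketched as follows. This is not the example from <filename>Utils.py</filename>; the function names <emphasis>h_file_stat</emphasis> and <emphasis>h_file_contents</emphasis> are hypothetical, and the sketch assumes Python 3 with the standard <emphasis>hashlib</emphasis> module.
			</para>
<programlisting><![CDATA[
```python
import hashlib
import os
import tempfile

def h_file_stat(path):
    # Hash the file name, size and modification time, not the contents:
    # the file is never opened, only its metadata is read.
    st = os.stat(path)
    m = hashlib.md5()
    m.update(os.fsencode(path))
    m.update(str(st.st_size).encode('ascii'))
    m.update(str(st.st_mtime).encode('ascii'))
    return m.digest()

def h_file_contents(path):
    # Content-based hash, shown for comparison: every byte is read.
    m = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(100000), b''):
            m.update(chunk)
    return m.digest()
```
]]></programlisting>
			<para>
				The metadata-based hash changes whenever a file is touched, even if its contents are identical, and it stays the same if the contents change while name, size and timestamp do not. This is why it is cheaper to compute but less accurate than the content-based hash.
			</para>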
		</sect2>
	</section>
</chapter>