<!DOCTYPE html> <!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]--> <!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]--> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Parallel Computing — Julia Language 0.3.4 documentation</title> <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'> <link rel="stylesheet" href="../_static/julia.css" type="text/css" /> <link rel="stylesheet" href="../_static/julia.css" type="text/css" /> <link rel="top" title="Julia Language 0.3.4 documentation" href="../index.html"/> <link rel="next" title="Running External Programs" href="running-external-programs.html"/> <link rel="prev" title="Networking and Streams" href="networking-and-streams.html"/> <script src="https://cdnjs.cloudflare.com/ajax/libs/modernizr/2.6.2/modernizr.min.js"></script> </head> <body class="wy-body-for-nav" role="document"> <div class="wy-grid-for-nav"> <nav data-toggle="wy-nav-shift" class="wy-nav-side"> <div class="wy-side-nav-search"> <a href="http://julialang.org/"><img src="../_static/julia-logo.svg" class="logo"></a> <!-- <a href="../index.html" class="fa fa-home"> Julia Language</a> --> <div role="search"> <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get"> <input type="text" name="q" placeholder="Search docs" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> </div> </div> <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation"> <ul class="current"> <li class="toctree-l1"><a class="reference internal" href="introduction.html">Introduction</a></li> <li class="toctree-l1"><a class="reference internal" href="getting-started.html">Getting Started</a><ul> <li class="toctree-l2"><a class="reference internal" href="getting-started.html#resources">Resources</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="variables.html">Variables</a><ul> <li class="toctree-l2"><a class="reference internal" href="variables.html#allowed-variable-names">Allowed Variable Names</a></li> <li class="toctree-l2"><a class="reference internal" href="variables.html#stylistic-conventions">Stylistic Conventions</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="integers-and-floating-point-numbers.html">Integers and Floating-Point Numbers</a><ul> <li class="toctree-l2"><a class="reference internal" href="integers-and-floating-point-numbers.html#integers">Integers</a></li> <li class="toctree-l2"><a class="reference internal" href="integers-and-floating-point-numbers.html#floating-point-numbers">Floating-Point Numbers</a></li> <li class="toctree-l2"><a class="reference internal" href="integers-and-floating-point-numbers.html#arbitrary-precision-arithmetic">Arbitrary Precision Arithmetic</a></li> <li class="toctree-l2"><a class="reference internal" href="integers-and-floating-point-numbers.html#numeric-literal-coefficients">Numeric Literal Coefficients</a></li> <li class="toctree-l2"><a class="reference internal" href="integers-and-floating-point-numbers.html#literal-zero-and-one">Literal zero and one</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="mathematical-operations.html">Mathematical Operations and Elementary Functions</a><ul> <li class="toctree-l2"><a class="reference internal" href="mathematical-operations.html#arithmetic-operators">Arithmetic Operators</a></li> <li class="toctree-l2"><a class="reference internal" href="mathematical-operations.html#bitwise-operators">Bitwise Operators</a></li> <li class="toctree-l2"><a class="reference internal" href="mathematical-operations.html#updating-operators">Updating operators</a></li> <li class="toctree-l2"><a class="reference internal" href="mathematical-operations.html#numeric-comparisons">Numeric Comparisons</a></li> <li class="toctree-l2"><a class="reference internal" href="mathematical-operations.html#elementary-functions">Elementary Functions</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="complex-and-rational-numbers.html">Complex and Rational Numbers</a><ul> <li class="toctree-l2"><a class="reference internal" href="complex-and-rational-numbers.html#complex-numbers">Complex Numbers</a></li> <li class="toctree-l2"><a class="reference internal" href="complex-and-rational-numbers.html#rational-numbers">Rational Numbers</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="strings.html">Strings</a><ul> <li class="toctree-l2"><a class="reference internal" href="strings.html#characters">Characters</a></li> <li class="toctree-l2"><a class="reference internal" href="strings.html#string-basics">String Basics</a></li> <li class="toctree-l2"><a class="reference internal" href="strings.html#unicode-and-utf-8">Unicode and UTF-8</a></li> <li class="toctree-l2"><a class="reference internal" href="strings.html#interpolation">Interpolation</a></li> <li class="toctree-l2"><a class="reference internal" href="strings.html#common-operations">Common Operations</a></li> <li class="toctree-l2"><a class="reference internal" href="strings.html#non-standard-string-literals">Non-Standard String Literals</a></li> <li class="toctree-l2"><a class="reference internal" href="strings.html#regular-expressions">Regular Expressions</a></li> <li class="toctree-l2"><a class="reference internal" href="strings.html#id3">Byte Array Literals</a></li> <li class="toctree-l2"><a class="reference internal" href="strings.html#version-number-literals">Version Number Literals</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="functions.html">Functions</a><ul> <li class="toctree-l2"><a class="reference internal" href="functions.html#argument-passing-behavior">Argument Passing Behavior</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#the-return-keyword">The <tt class="docutils literal"><span class="pre">return</span></tt> Keyword</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#id1">Operators Are Functions</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#operators-with-special-names">Operators With Special Names</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#anonymous-functions">Anonymous Functions</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#multiple-return-values">Multiple Return Values</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#varargs-functions">Varargs Functions</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#optional-arguments">Optional Arguments</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#keyword-arguments">Keyword Arguments</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#evaluation-scope-of-default-values">Evaluation Scope of Default Values</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#block-syntax-for-function-arguments">Block Syntax for Function Arguments</a></li> <li class="toctree-l2"><a class="reference internal" href="functions.html#further-reading">Further Reading</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="control-flow.html">Control Flow</a><ul> <li class="toctree-l2"><a class="reference internal" href="control-flow.html#compound-expressions">Compound Expressions</a></li> <li class="toctree-l2"><a class="reference internal" href="control-flow.html#conditional-evaluation">Conditional Evaluation</a></li> <li class="toctree-l2"><a class="reference internal" href="control-flow.html#short-circuit-evaluation">Short-Circuit Evaluation</a></li> <li class="toctree-l2"><a class="reference internal" href="control-flow.html#repeated-evaluation-loops">Repeated Evaluation: Loops</a></li> <li class="toctree-l2"><a class="reference internal" href="control-flow.html#exception-handling">Exception Handling</a></li> <li class="toctree-l2"><a class="reference internal" href="control-flow.html#tasks-aka-coroutines">Tasks (aka Coroutines)</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="variables-and-scoping.html">Scope of Variables</a><ul> <li class="toctree-l2"><a class="reference internal" href="variables-and-scoping.html#for-loops-and-comprehensions">For Loops and Comprehensions</a></li> <li class="toctree-l2"><a class="reference internal" href="variables-and-scoping.html#constants">Constants</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="types.html">Types</a><ul> <li class="toctree-l2"><a class="reference internal" href="types.html#type-declarations">Type Declarations</a></li> <li class="toctree-l2"><a class="reference internal" href="types.html#abstract-types">Abstract Types</a></li> <li class="toctree-l2"><a class="reference internal" href="types.html#bits-types">Bits Types</a></li> <li class="toctree-l2"><a class="reference internal" href="types.html#composite-types">Composite Types</a></li> <li class="toctree-l2"><a class="reference internal" href="types.html#immutable-composite-types">Immutable Composite Types</a></li> <li class="toctree-l2"><a class="reference internal" href="types.html#declared-types">Declared Types</a></li> <li class="toctree-l2"><a class="reference internal" href="types.html#tuple-types">Tuple Types</a></li> <li class="toctree-l2"><a class="reference internal" href="types.html#type-unions">Type Unions</a></li> <li class="toctree-l2"><a class="reference internal" href="types.html#man-parametric-types">Parametric Types</a></li> <li class="toctree-l2"><a class="reference internal" href="types.html#type-aliases">Type Aliases</a></li> <li class="toctree-l2"><a class="reference internal" href="types.html#operations-on-types">Operations on Types</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="methods.html">Methods</a><ul> <li class="toctree-l2"><a class="reference internal" href="methods.html#defining-methods">Defining Methods</a></li> <li class="toctree-l2"><a class="reference internal" href="methods.html#method-ambiguities">Method Ambiguities</a></li> <li class="toctree-l2"><a class="reference internal" href="methods.html#parametric-methods">Parametric Methods</a></li> <li class="toctree-l2"><a class="reference internal" href="methods.html#note-on-optional-and-keyword-arguments">Note on Optional and keyword Arguments</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="constructors.html">Constructors</a><ul> <li class="toctree-l2"><a class="reference internal" href="constructors.html#outer-constructor-methods">Outer Constructor Methods</a></li> <li class="toctree-l2"><a class="reference internal" href="constructors.html#inner-constructor-methods">Inner Constructor Methods</a></li> <li class="toctree-l2"><a class="reference internal" href="constructors.html#incomplete-initialization">Incomplete Initialization</a></li> <li class="toctree-l2"><a class="reference internal" href="constructors.html#parametric-constructors">Parametric Constructors</a></li> <li class="toctree-l2"><a class="reference internal" href="constructors.html#case-study-rational">Case Study: Rational</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="conversion-and-promotion.html">Conversion and Promotion</a><ul> <li class="toctree-l2"><a class="reference internal" href="conversion-and-promotion.html#conversion">Conversion</a></li> <li class="toctree-l2"><a class="reference internal" href="conversion-and-promotion.html#promotion">Promotion</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="modules.html">Modules</a><ul> <li class="toctree-l2"><a class="reference internal" href="modules.html#summary-of-module-usage">Summary of module usage</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="metaprogramming.html">Metaprogramming</a><ul> <li class="toctree-l2"><a class="reference internal" href="metaprogramming.html#expressions-and-eval">Expressions and Eval</a></li> <li class="toctree-l2"><a class="reference internal" href="metaprogramming.html#macros">Macros</a></li> <li class="toctree-l2"><a class="reference internal" href="metaprogramming.html#reflection">Reflection</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="arrays.html">Multi-dimensional Arrays</a><ul> <li class="toctree-l2"><a class="reference internal" href="arrays.html#arrays">Arrays</a></li> <li class="toctree-l2"><a class="reference internal" href="arrays.html#sparse-matrices">Sparse Matrices</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="linear-algebra.html">Linear algebra</a><ul> <li class="toctree-l2"><a class="reference internal" href="linear-algebra.html#matrix-factorizations">Matrix factorizations</a></li> <li class="toctree-l2"><a class="reference internal" href="linear-algebra.html#special-matrices">Special matrices</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="networking-and-streams.html">Networking and Streams</a><ul> <li class="toctree-l2"><a class="reference internal" href="networking-and-streams.html#basic-stream-i-o">Basic Stream I/O</a></li> <li class="toctree-l2"><a class="reference internal" href="networking-and-streams.html#text-i-o">Text I/O</a></li> <li class="toctree-l2"><a class="reference internal" href="networking-and-streams.html#working-with-files">Working with Files</a></li> <li class="toctree-l2"><a class="reference internal" href="networking-and-streams.html#a-simple-tcp-example">A simple TCP example</a></li> <li class="toctree-l2"><a class="reference internal" href="networking-and-streams.html#resolving-ip-addresses">Resolving IP Addresses</a></li> </ul> </li> <li class="toctree-l1 current"><a class="current reference internal" href="">Parallel Computing</a><ul> <li class="toctree-l2"><a class="reference internal" href="#data-movement">Data Movement</a></li> <li class="toctree-l2"><a class="reference internal" href="#parallel-map-and-loops">Parallel Map and Loops</a></li> <li class="toctree-l2"><a class="reference internal" href="#synchronization-with-remote-references">Synchronization With Remote References</a></li> <li class="toctree-l2"><a class="reference internal" href="#scheduling">Scheduling</a></li> <li class="toctree-l2"><a class="reference internal" href="#distributed-arrays">Distributed Arrays</a></li> <li class="toctree-l2"><a class="reference internal" href="#constructing-distributed-arrays">Constructing Distributed Arrays</a></li> <li class="toctree-l2"><a class="reference internal" href="#distributed-array-operations">Distributed Array Operations</a></li> <li class="toctree-l2"><a class="reference internal" href="#shared-arrays-experimental">Shared Arrays (Experimental)</a></li> <li class="toctree-l2"><a class="reference internal" href="#clustermanagers">ClusterManagers</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="running-external-programs.html">Running External Programs</a><ul> <li class="toctree-l2"><a class="reference internal" href="running-external-programs.html#interpolation">Interpolation</a></li> <li class="toctree-l2"><a class="reference internal" href="running-external-programs.html#quoting">Quoting</a></li> <li class="toctree-l2"><a class="reference internal" href="running-external-programs.html#pipelines">Pipelines</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="calling-c-and-fortran-code.html">Calling C and Fortran Code</a><ul> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#mapping-c-types-to-julia">Mapping C Types to Julia</a></li> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#accessing-data-through-a-pointer">Accessing Data through a Pointer</a></li> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#passing-pointers-for-modifying-inputs">Passing Pointers for Modifying Inputs</a></li> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#garbage-collection-safety">Garbage Collection Safety</a></li> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#non-constant-function-specifications">Non-constant Function Specifications</a></li> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#indirect-calls">Indirect Calls</a></li> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#calling-convention">Calling Convention</a></li> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#accessing-global-variables">Accessing Global Variables</a></li> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#passing-julia-callback-functions-to-c">Passing Julia Callback Functions to C</a></li> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#c">C++</a></li> <li class="toctree-l2"><a class="reference internal" href="calling-c-and-fortran-code.html#handling-platform-variations">Handling Platform Variations</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="interacting-with-julia.html">Interacting With Julia</a><ul> <li class="toctree-l2"><a class="reference internal" href="interacting-with-julia.html#the-different-prompt-modes">The different prompt modes</a></li> <li class="toctree-l2"><a class="reference internal" href="interacting-with-julia.html#key-bindings">Key bindings</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="embedding.html">Embedding Julia</a><ul> <li class="toctree-l2"><a class="reference internal" href="embedding.html#high-level-embedding">High-Level Embedding</a></li> <li class="toctree-l2"><a class="reference internal" href="embedding.html#converting-types">Converting Types</a></li> <li class="toctree-l2"><a class="reference internal" href="embedding.html#calling-julia-functions">Calling Julia Functions</a></li> <li class="toctree-l2"><a class="reference internal" href="embedding.html#memory-management">Memory Management</a></li> <li class="toctree-l2"><a class="reference internal" href="embedding.html#working-with-arrays">Working with Arrays</a></li> <li class="toctree-l2"><a class="reference internal" href="embedding.html#exceptions">Exceptions</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="packages.html">Packages</a><ul> <li class="toctree-l2"><a class="reference internal" href="packages.html#package-status">Package Status</a></li> <li class="toctree-l2"><a class="reference internal" href="packages.html#adding-and-removing-packages">Adding and Removing Packages</a></li> <li class="toctree-l2"><a class="reference internal" href="packages.html#installing-unregistered-packages">Installing Unregistered Packages</a></li> <li class="toctree-l2"><a class="reference internal" href="packages.html#updating-packages">Updating Packages</a></li> <li class="toctree-l2"><a class="reference internal" href="packages.html#checkout-pin-and-free">Checkout, Pin and Free</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="packages.html#package-development">Package Development</a><ul> <li class="toctree-l2"><a class="reference internal" href="packages.html#initial-setup">Initial Setup</a></li> <li class="toctree-l2"><a class="reference internal" href="packages.html#generating-a-new-package">Generating a New Package</a></li> <li class="toctree-l2"><a class="reference internal" href="packages.html#making-your-package-available">Making Your Package Available</a></li> <li class="toctree-l2"><a class="reference internal" href="packages.html#publishing-your-package">Publishing Your Package</a></li> <li class="toctree-l2"><a class="reference internal" href="packages.html#tagging-package-versions">Tagging Package Versions</a></li> <li class="toctree-l2"><a class="reference internal" href="packages.html#fixing-package-requirements">Fixing Package Requirements</a></li> <li class="toctree-l2"><a class="reference internal" href="packages.html#man-package-requirements">Requirements Specification</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="performance-tips.html">Performance Tips</a><ul> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#avoid-global-variables">Avoid global variables</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#measure-performance-with-time-and-pay-attention-to-memory-allocation">Measure performance with <tt class="docutils literal"><span class="pre">@time</span></tt> and pay attention to memory allocation</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#tools">Tools</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#avoid-containers-with-abstract-type-parameters">Avoid containers with abstract type parameters</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#type-declarations">Type declarations</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#break-functions-into-multiple-definitions">Break functions into multiple definitions</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#write-type-stable-functions">Write “type-stable” functions</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#avoid-changing-the-type-of-a-variable">Avoid changing the type of a variable</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#separate-kernel-functions">Separate kernel functions</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#access-arrays-in-memory-order-along-columns">Access arrays in memory order, along columns</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#pre-allocating-outputs">Pre-allocating outputs</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#avoid-string-interpolation-for-i-o">Avoid string interpolation for I/O</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#fix-deprecation-warnings">Fix deprecation warnings</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#tweaks">Tweaks</a></li> <li class="toctree-l2"><a class="reference internal" href="performance-tips.html#performance-annotations">Performance Annotations</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="style-guide.html">Style Guide</a><ul> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#write-functions-not-just-scripts">Write functions, not just scripts</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#avoid-writing-overly-specific-types">Avoid writing overly-specific types</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#handle-excess-argument-diversity-in-the-caller">Handle excess argument diversity in the caller</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#append-to-names-of-functions-that-modify-their-arguments">Append <cite>!</cite> to names of functions that modify their arguments</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#avoid-strange-type-unions">Avoid strange type Unions</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#try-to-avoid-nullable-fields">Try to avoid nullable fields</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#avoid-elaborate-container-types">Avoid elaborate container types</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#avoid-underscores-in-names">Avoid underscores in names</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#don-t-overuse-try-catch">Don’t overuse try-catch</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#don-t-parenthesize-conditions">Don’t parenthesize conditions</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#don-t-overuse">Don’t overuse ...</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#don-t-use-unnecessary-static-parameters">Don’t use unnecessary static parameters</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#avoid-confusion-about-whether-something-is-an-instance-or-a-type">Avoid confusion about whether something is an instance or a type</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#don-t-overuse-macros">Don’t overuse macros</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#don-t-expose-unsafe-operations-at-the-interface-level">Don’t expose unsafe operations at the interface level</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#don-t-overload-methods-of-base-container-types">Don’t overload methods of base container types</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#be-careful-with-type-equality">Be careful with type equality</a></li> <li class="toctree-l2"><a class="reference internal" href="style-guide.html#do-not-write-x-f-x">Do not write <tt class="docutils literal"><span class="pre">x->f(x)</span></tt></a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="faq.html">Frequently Asked Questions</a><ul> <li class="toctree-l2"><a class="reference internal" href="faq.html#sessions-and-the-repl">Sessions and the REPL</a></li> <li class="toctree-l2"><a class="reference internal" href="faq.html#functions">Functions</a></li> <li class="toctree-l2"><a class="reference internal" href="faq.html#types-type-declarations-and-constructors">Types, type declarations, and constructors</a></li> <li class="toctree-l2"><a class="reference internal" href="faq.html#nothingness-and-missing-values">Nothingness and missing values</a></li> <li class="toctree-l2"><a class="reference internal" href="faq.html#julia-releases">Julia Releases</a></li> <li class="toctree-l2"><a class="reference internal" href="faq.html#developing-julia">Developing Julia</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="noteworthy-differences.html">Noteworthy Differences from other Languages</a><ul> <li class="toctree-l2"><a class="reference internal" href="noteworthy-differences.html#noteworthy-differences-from-matlab">Noteworthy differences from MATLAB</a></li> <li class="toctree-l2"><a class="reference internal" href="noteworthy-differences.html#noteworthy-differences-from-r">Noteworthy differences from R</a></li> <li class="toctree-l2"><a class="reference internal" href="noteworthy-differences.html#noteworthy-differences-from-python">Noteworthy differences from Python</a></li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../stdlib/base.html">The Standard Library</a><ul> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#introduction">Introduction</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#getting-around">Getting Around</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#all-objects">All Objects</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#types">Types</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#generic-functions">Generic Functions</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#syntax">Syntax</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#iteration">Iteration</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#general-collections">General Collections</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#iterable-collections">Iterable Collections</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#indexable-collections">Indexable Collections</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#associative-collections">Associative Collections</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#set-like-collections">Set-Like Collections</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#dequeues">Dequeues</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#strings">Strings</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#i-o">I/O</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#network-i-o">Network I/O</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#text-i-o">Text I/O</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#multimedia-i-o">Multimedia I/O</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#memory-mapped-i-o">Memory-mapped I/O</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#standard-numeric-types">Standard Numeric Types</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#mathematical-operators">Mathematical Operators</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#mathematical-functions">Mathematical Functions</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#data-formats">Data Formats</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#numbers">Numbers</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#bigfloats">BigFloats</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#random-numbers">Random Numbers</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#arrays">Arrays</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#combinatorics">Combinatorics</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#statistics">Statistics</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#signal-processing">Signal Processing</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#numerical-integration">Numerical Integration</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#parallel-computing">Parallel Computing</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#distributed-arrays">Distributed Arrays</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#shared-arrays-experimental-unix-only-feature">Shared Arrays (Experimental, UNIX-only feature)</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#system">System</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#c-interface">C Interface</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#errors">Errors</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#tasks">Tasks</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#events">Events</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#reflection">Reflection</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/base.html#internals">Internals</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/sparse.html">Sparse Matrices</a></li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/linalg.html">Linear Algebra</a></li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/linalg.html#module-Base.LinAlg.BLAS">BLAS Functions</a></li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/constants.html">Constants</a></li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/file.html">Filesystem</a></li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/punctuation.html">Punctuation</a></li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/sort.html">Sorting and Related Functions</a><ul> <li class="toctree-l2"><a class="reference internal" href="../stdlib/sort.html#sorting-functions">Sorting Functions</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/sort.html#order-related-functions">Order-Related Functions</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/sort.html#sorting-algorithms">Sorting Algorithms</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/pkg.html">Package Manager Functions</a></li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/collections.html">Collections and Data Structures</a><ul> <li class="toctree-l2"><a class="reference internal" href="../stdlib/collections.html#priorityqueue">PriorityQueue</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/collections.html#heap-functions">Heap Functions</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/graphics.html">Graphics</a><ul> <li class="toctree-l2"><a class="reference internal" href="../stdlib/graphics.html#geometry">Geometry</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/test.html">Unit and Functional Testing</a><ul> <li class="toctree-l2"><a class="reference internal" href="../stdlib/test.html#overview">Overview</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/test.html#handlers">Handlers</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/test.html#macros">Macros</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/test.html#functions">Functions</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/test.html#testing-base-julia">Testing Base Julia</a></li> <li class="toctree-l1"><a class="reference internal" href="../stdlib/profile.html">Profiling</a><ul> <li class="toctree-l2"><a class="reference internal" href="../stdlib/profile.html#basic-usage">Basic usage</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/profile.html#accumulation-and-clearing">Accumulation and clearing</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/profile.html#options-for-controlling-the-display-of-profile-results">Options for controlling the display of profile results</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/profile.html#configuration">Configuration</a></li> <li class="toctree-l2"><a class="reference internal" href="../stdlib/profile.html#function-reference">Function reference</a></li> </ul> </li> </ul> <ul> <li class="toctree-l1"><a class="reference internal" href="../devdocs/julia.html">Documentation of Julia’s Internals</a><ul> <li class="toctree-l2"><a class="reference internal" href="../devdocs/cartesian.html">Base.Cartesian</a></li> <li class="toctree-l2"><a class="reference internal" href="../devdocs/sysimg.html">System Image Building</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="../devdocs/C.html">Developing/debugging Julia’s C code</a><ul> <li class="toctree-l2"><a class="reference internal" href="../devdocs/backtraces.html">Reporting and analyzing crashes (segfaults)</a></li> <li class="toctree-l2"><a class="reference internal" href="../devdocs/debuggingtips.html">gdb debugging tips</a></li> </ul> </li> </ul> </div> </nav> <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"> <nav class="wy-nav-top" role="navigation" aria-label="top navigation"> <i data-toggle="wy-nav-top" class="fa fa-bars"></i> <a href="../index.html">Julia Language</a> </nav> <div class="wy-nav-content"> <div class="rst-content"> <div role="navigation" aria-label="breadcrumbs navigation"> <ul class="wy-breadcrumbs"> <li><a href="../index.html">Docs</a> »</li> <li>Parallel Computing</li> <li class="wy-breadcrumbs-aside"> <a href="../_sources/manual/parallel-computing.txt" rel="nofollow"> View page source</a> </li> </ul> <hr/> </div> <div role="main" class="document"> <div class="section" id="parallel-computing"> <span id="man-parallel-computing"></span><h1>Parallel Computing<a class="headerlink" href="#parallel-computing" title="Permalink to this headline">¶</a></h1> <p>Most modern computers possess more than one CPU, and several computers can be combined together in a cluster. Harnessing the power of these multiple CPUs allows many computations to be completed more quickly. There are two major factors that influence performance: the speed of the CPUs themselves, and the speed of their access to memory. In a cluster, it’s fairly obvious that a given CPU will have fastest access to the RAM within the same computer (node). Perhaps more surprisingly, similar issues are relevant on a typical multicore laptop, due to differences in the speed of main memory and the <a class="reference external" href="http://www.akkadia.org/drepper/cpumemory.pdf">cache</a>. Consequently, a good multiprocessing environment should allow control over the “ownership” of a chunk of memory by a particular CPU. Julia provides a multiprocessing environment based on message passing to allow programs to run on multiple processes in separate memory domains at once.</p> <p>Julia’s implementation of message passing is different from other environments such as MPI <a class="footnote-reference" href="#mpi2rma" id="id1">[1]</a>. Communication in Julia is generally “one-sided”, meaning that the programmer needs to explicitly manage only one process in a two-process operation. Furthermore, these operations typically do not look like “message send” and “message receive” but rather resemble higher-level operations like calls to user functions.</p> <p>Parallel programming in Julia is built on two primitives: <em>remote references</em> and <em>remote calls</em>. A remote reference is an object that can be used from any process to refer to an object stored on a particular process. A remote call is a request by one process to call a certain function on certain arguments on another (possibly the same) process. A remote call returns a remote reference to its result. Remote calls return immediately; the process that made the call proceeds to its next operation while the remote call happens somewhere else. You can wait for a remote call to finish by calling <tt class="docutils literal"><span class="pre">wait</span></tt> on its remote reference, and you can obtain the full value of the result using <tt class="docutils literal"><span class="pre">fetch</span></tt>. You can store a value to a remote reference using <tt class="docutils literal"><span class="pre">put</span></tt>.</p> <p>Let’s try this out. Starting with <tt class="docutils literal"><span class="pre">julia</span> <span class="pre">-p</span> <span class="pre">n</span></tt> provides <tt class="docutils literal"><span class="pre">n</span></tt> worker processes on the local machine. Generally it makes sense for <tt class="docutils literal"><span class="pre">n</span></tt> to equal the number of CPU cores on the machine.</p> <div class="highlight-julia"><div class="highlight"><pre><span class="o">$</span> <span class="o">./</span><span class="n">julia</span> <span class="o">-</span><span class="n">p</span> <span class="mi">2</span> <span class="n">julia</span><span class="o">></span> <span class="n">r</span> <span class="o">=</span> <span class="n">remotecall</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">rand</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="n">RemoteRef</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">5</span><span class="p">)</span> <span class="n">julia</span><span class="o">></span> <span class="n">fetch</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="mi">2</span><span class="n">x2</span> <span class="kt">Float64</span> <span class="n">Array</span><span class="p">:</span> <span class="mf">0.60401</span> <span class="mf">0.501111</span> <span class="mf">0.174572</span> <span class="mf">0.157411</span> <span class="n">julia</span><span class="o">></span> <span class="n">s</span> <span class="o">=</span> <span class="p">@</span><span class="n">spawnat</span> <span class="mi">2</span> <span class="mi">1</span> <span class="o">.+</span> <span class="n">fetch</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="n">RemoteRef</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">7</span><span class="p">)</span> <span class="n">julia</span><span class="o">></span> <span class="n">fetch</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="mi">2</span><span class="n">x2</span> <span class="kt">Float64</span> <span class="n">Array</span><span class="p">:</span> <span class="mf">1.60401</span> <span class="mf">1.50111</span> <span class="mf">1.17457</span> <span class="mf">1.15741</span> </pre></div> </div> <p>The first argument to <tt class="docutils literal"><span class="pre">remotecall</span></tt> is the index of the process that will do the work. Most parallel programming in Julia does not reference specific processes or the number of processes available, but <tt class="docutils literal"><span class="pre">remotecall</span></tt> is considered a low-level interface providing finer control. The second argument to <tt class="docutils literal"><span class="pre">remotecall</span></tt> is the function to call, and the remaining arguments will be passed to this function. As you can see, in the first line we asked process 2 to construct a 2-by-2 random matrix, and in the second line we asked it to add 1 to it. The result of both calculations is available in the two remote references, <tt class="docutils literal"><span class="pre">r</span></tt> and <tt class="docutils literal"><span class="pre">s</span></tt>. The <tt class="docutils literal"><span class="pre">@spawnat</span></tt> macro evaluates the expression in the second argument on the process specified by the first argument.</p> <p>Occasionally you might want a remotely-computed value immediately. This typically happens when you read from a remote object to obtain data needed by the next local operation. The function <tt class="docutils literal"><span class="pre">remotecall_fetch</span></tt> exists for this purpose. It is equivalent to <tt class="docutils literal"><span class="pre">fetch(remotecall(...))</span></tt> but is more efficient.</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">julia</span><span class="o">></span> <span class="n">remotecall_fetch</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="n">getindex</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="mf">0.10824216411304866</span> </pre></div> </div> <p>Remember that <tt class="docutils literal"><span class="pre">getindex(r,1,1)</span></tt> is <a class="reference internal" href="arrays.html#man-array-indexing"><em>equivalent</em></a> to <tt class="docutils literal"><span class="pre">r[1,1]</span></tt>, so this call fetches the first element of the remote reference <tt class="docutils literal"><span class="pre">r</span></tt>.</p> <p>The syntax of <tt class="docutils literal"><span class="pre">remotecall</span></tt> is not especially convenient. The macro <tt class="docutils literal"><span class="pre">@spawn</span></tt> makes things easier. It operates on an expression rather than a function, and picks where to do the operation for you:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">julia</span><span class="o">></span> <span class="n">r</span> <span class="o">=</span> <span class="p">@</span><span class="n">spawn</span> <span class="n">rand</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="n">RemoteRef</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">)</span> <span class="n">julia</span><span class="o">></span> <span class="n">s</span> <span class="o">=</span> <span class="p">@</span><span class="n">spawn</span> <span class="mi">1</span> <span class="o">.+</span> <span class="n">fetch</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="n">RemoteRef</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span> <span class="n">julia</span><span class="o">></span> <span class="n">fetch</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="mf">1.10824216411304866</span> <span class="mf">1.13798233877923116</span> <span class="mf">1.12376292706355074</span> <span class="mf">1.18750497916607167</span> </pre></div> </div> <p>Note that we used <tt class="docutils literal"><span class="pre">1</span> <span class="pre">.+</span> <span class="pre">fetch(r)</span></tt> instead of <tt class="docutils literal"><span class="pre">1</span> <span class="pre">.+</span> <span class="pre">r</span></tt>. This is because we do not know where the code will run, so in general a <tt class="docutils literal"><span class="pre">fetch</span></tt> might be required to move <tt class="docutils literal"><span class="pre">r</span></tt> to the process doing the addition. In this case, <tt class="docutils literal"><span class="pre">@spawn</span></tt> is smart enough to perform the computation on the process that owns <tt class="docutils literal"><span class="pre">r</span></tt>, so the <tt class="docutils literal"><span class="pre">fetch</span></tt> will be a no-op.</p> <p>(It is worth noting that <tt class="docutils literal"><span class="pre">@spawn</span></tt> is not built-in but defined in Julia as a <a class="reference internal" href="metaprogramming.html#man-macros"><em>macro</em></a>. It is possible to define your own such constructs.)</p> <p>One important point is that your code must be available on any process that runs it. For example, type the following into the julia prompt:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">julia</span><span class="o">></span> <span class="k">function</span><span class="nf"> rand2</span><span class="p">(</span><span class="n">dims</span><span class="o">...</span><span class="p">)</span> <span class="k">return</span> <span class="mi">2</span><span class="o">*</span><span class="n">rand</span><span class="p">(</span><span class="n">dims</span><span class="o">...</span><span class="p">)</span> <span class="k">end</span> <span class="n">julia</span><span class="o">></span> <span class="n">rand2</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="mi">2</span><span class="n">x2</span> <span class="kt">Float64</span> <span class="n">Array</span><span class="p">:</span> <span class="mf">0.153756</span> <span class="mf">0.368514</span> <span class="mf">1.15119</span> <span class="mf">0.918912</span> <span class="n">julia</span><span class="o">></span> <span class="p">@</span><span class="n">spawn</span> <span class="n">rand2</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="n">RemoteRef</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span> <span class="n">julia</span><span class="o">></span> <span class="p">@</span><span class="n">spawn</span> <span class="n">rand2</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="n">RemoteRef</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">)</span> <span class="n">julia</span><span class="o">></span> <span class="n">exception</span> <span class="n">on</span> <span class="mi">2</span><span class="p">:</span> <span class="k">in</span> <span class="n">anonymous</span><span class="p">:</span> <span class="n">rand2</span> <span class="n">not</span> <span class="n">defined</span> </pre></div> </div> <p>Process 1 knew about the function <tt class="docutils literal"><span class="pre">rand2</span></tt>, but process 2 did not. To make your code available to all processes, the <tt class="docutils literal"><span class="pre">require</span></tt> function will automatically load a source file on all currently available processes:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">julia</span><span class="o">></span> <span class="n">require</span><span class="p">(</span><span class="s">"myfile"</span><span class="p">)</span> </pre></div> </div> <p>In a cluster, the contents of the file (and any files loaded recursively) will be sent over the network. It is also useful to execute a statement on all processes. This can be done with the <tt class="docutils literal"><span class="pre">@everywhere</span></tt> macro:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">julia</span><span class="o">></span> <span class="p">@</span><span class="n">everywhere</span> <span class="n">id</span> <span class="o">=</span> <span class="n">myid</span><span class="p">()</span> <span class="n">julia</span><span class="o">></span> <span class="n">remotecall_fetch</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="p">()</span><span class="o">-></span><span class="n">id</span><span class="p">)</span> <span class="mi">2</span> <span class="p">@</span><span class="n">everywhere</span> <span class="n">include</span><span class="p">(</span><span class="s">"defs.jl"</span><span class="p">)</span> </pre></div> </div> <p>A file can also be preloaded on multiple processes at startup, and a driver script can be used to drive the computation:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">julia</span> <span class="o">-</span><span class="n">p</span> <span class="o"><</span><span class="n">n</span><span class="o">></span> <span class="o">-</span><span class="n">L</span> <span class="n">file1</span><span class="o">.</span><span class="n">jl</span> <span class="o">-</span><span class="n">L</span> <span class="n">file2</span><span class="o">.</span><span class="n">jl</span> <span class="n">driver</span><span class="o">.</span><span class="n">jl</span> </pre></div> </div> <p>Each process has an associated identifier. The process providing the interactive julia prompt always has an id equal to 1, as would the julia process running the driver script in the example above. The processes used by default for parallel operations are referred to as <tt class="docutils literal"><span class="pre">workers</span></tt>. When there is only one process, process 1 is considered a worker. Otherwise, workers are considered to be all processes other than process 1.</p> <p>The base Julia installation has in-built support for two types of clusters:</p> <blockquote> <div><ul class="simple"> <li>A local cluster specified with the <tt class="docutils literal"><span class="pre">-p</span></tt> option as shown above.</li> <li>A cluster spanning machines using the <tt class="docutils literal"><span class="pre">--machinefile</span></tt> option. This uses a passwordless <tt class="docutils literal"><span class="pre">ssh</span></tt> login to start julia worker processes (from the same path as the current host) on the specified machines.</li> </ul> </div></blockquote> <p>Functions <tt class="docutils literal"><span class="pre">addprocs</span></tt>, <tt class="docutils literal"><span class="pre">rmprocs</span></tt>, <tt class="docutils literal"><span class="pre">workers</span></tt>, and others are available as a programmatic means of adding, removing and querying the processes in a cluster.</p> <p>Other types of clusters can be supported by writing your own custom ClusterManager. See section on ClusterManagers.</p> <div class="section" id="data-movement"> <h2>Data Movement<a class="headerlink" href="#data-movement" title="Permalink to this headline">¶</a></h2> <p>Sending messages and moving data constitute most of the overhead in a parallel program. Reducing the number of messages and the amount of data sent is critical to achieving performance and scalability. To this end, it is important to understand the data movement performed by Julia’s various parallel programming constructs.</p> <p><tt class="docutils literal"><span class="pre">fetch</span></tt> can be considered an explicit data movement operation, since it directly asks that an object be moved to the local machine. <tt class="docutils literal"><span class="pre">@spawn</span></tt> (and a few related constructs) also moves data, but this is not as obvious, hence it can be called an implicit data movement operation. Consider these two approaches to constructing and squaring a random matrix:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="c"># method 1</span> <span class="n">A</span> <span class="o">=</span> <span class="n">rand</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span><span class="mi">1000</span><span class="p">)</span> <span class="n">Bref</span> <span class="o">=</span> <span class="p">@</span><span class="n">spawn</span> <span class="n">A</span><span class="o">^</span><span class="mi">2</span> <span class="o">...</span> <span class="n">fetch</span><span class="p">(</span><span class="n">Bref</span><span class="p">)</span> <span class="c"># method 2</span> <span class="n">Bref</span> <span class="o">=</span> <span class="p">@</span><span class="n">spawn</span> <span class="n">rand</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span><span class="mi">1000</span><span class="p">)</span><span class="o">^</span><span class="mi">2</span> <span class="o">...</span> <span class="n">fetch</span><span class="p">(</span><span class="n">Bref</span><span class="p">)</span> </pre></div> </div> <p>The difference seems trivial, but in fact is quite significant due to the behavior of <tt class="docutils literal"><span class="pre">@spawn</span></tt>. In the first method, a random matrix is constructed locally, then sent to another process where it is squared. In the second method, a random matrix is both constructed and squared on another process. Therefore the second method sends much less data than the first.</p> <p>In this toy example, the two methods are easy to distinguish and choose from. However, in a real program designing data movement might require more thought and likely some measurement. For example, if the first process needs matrix <tt class="docutils literal"><span class="pre">A</span></tt> then the first method might be better. Or, if computing <tt class="docutils literal"><span class="pre">A</span></tt> is expensive and only the current process has it, then moving it to another process might be unavoidable. Or, if the current process has very little to do between the <tt class="docutils literal"><span class="pre">@spawn</span></tt> and <tt class="docutils literal"><span class="pre">fetch(Bref)</span></tt> then it might be better to eliminate the parallelism altogether. Or imagine <tt class="docutils literal"><span class="pre">rand(1000,1000)</span></tt> is replaced with a more expensive operation. Then it might make sense to add another <tt class="docutils literal"><span class="pre">@spawn</span></tt> statement just for this step.</p> </div> <div class="section" id="parallel-map-and-loops"> <h2>Parallel Map and Loops<a class="headerlink" href="#parallel-map-and-loops" title="Permalink to this headline">¶</a></h2> <p>Fortunately, many useful parallel computations do not require data movement. A common example is a Monte Carlo simulation, where multiple processes can handle independent simulation trials simultaneously. We can use <tt class="docutils literal"><span class="pre">@spawn</span></tt> to flip coins on two processes. First, write the following function in <tt class="docutils literal"><span class="pre">count_heads.jl</span></tt>:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="k">function</span><span class="nf"> count_heads</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="n">c</span><span class="p">::</span><span class="kt">Int</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">i</span><span class="o">=</span><span class="mi">1</span><span class="p">:</span><span class="n">n</span> <span class="n">c</span> <span class="o">+=</span> <span class="n">randbool</span><span class="p">()</span> <span class="k">end</span> <span class="n">c</span> <span class="k">end</span> </pre></div> </div> <p>The function <tt class="docutils literal"><span class="pre">count_heads</span></tt> simply adds together <tt class="docutils literal"><span class="pre">n</span></tt> random bits. Here is how we can perform some trials on two machines, and add together the results:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">require</span><span class="p">(</span><span class="s">"count_heads"</span><span class="p">)</span> <span class="n">a</span> <span class="o">=</span> <span class="p">@</span><span class="n">spawn</span> <span class="n">count_heads</span><span class="p">(</span><span class="mi">100000000</span><span class="p">)</span> <span class="n">b</span> <span class="o">=</span> <span class="p">@</span><span class="n">spawn</span> <span class="n">count_heads</span><span class="p">(</span><span class="mi">100000000</span><span class="p">)</span> <span class="n">fetch</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="o">+</span><span class="n">fetch</span><span class="p">(</span><span class="n">b</span><span class="p">)</span> </pre></div> </div> <p>This example demonstrates a powerful and often-used parallel programming pattern. Many iterations run independently over several processes, and then their results are combined using some function. The combination process is called a <em>reduction</em>, since it is generally tensor-rank-reducing: a vector of numbers is reduced to a single number, or a matrix is reduced to a single row or column, etc. In code, this typically looks like the pattern <tt class="docutils literal"><span class="pre">x</span> <span class="pre">=</span> <span class="pre">f(x,v[i])</span></tt>, where <tt class="docutils literal"><span class="pre">x</span></tt> is the accumulator, <tt class="docutils literal"><span class="pre">f</span></tt> is the reduction function, and the <tt class="docutils literal"><span class="pre">v[i]</span></tt> are the elements being reduced. It is desirable for <tt class="docutils literal"><span class="pre">f</span></tt> to be associative, so that it does not matter what order the operations are performed in.</p> <p>Notice that our use of this pattern with <tt class="docutils literal"><span class="pre">count_heads</span></tt> can be generalized. We used two explicit <tt class="docutils literal"><span class="pre">@spawn</span></tt> statements, which limits the parallelism to two processes. To run on any number of processes, we can use a <em>parallel for loop</em>, which can be written in Julia like this:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">nheads</span> <span class="o">=</span> <span class="p">@</span><span class="n">parallel</span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="o">=</span><span class="mi">1</span><span class="p">:</span><span class="mi">200000000</span> <span class="n">int</span><span class="p">(</span><span class="n">randbool</span><span class="p">())</span> <span class="k">end</span> </pre></div> </div> <p>This construct implements the pattern of assigning iterations to multiple processes, and combining them with a specified reduction (in this case <tt class="docutils literal"><span class="pre">(+)</span></tt>). The result of each iteration is taken as the value of the last expression inside the loop. The whole parallel loop expression itself evaluates to the final answer.</p> <p>Note that although parallel for loops look like serial for loops, their behavior is dramatically different. In particular, the iterations do not happen in a specified order, and writes to variables or arrays will not be globally visible since iterations run on different processes. Any variables used inside the parallel loop will be copied and broadcast to each process.</p> <p>For example, the following code will not work as intended:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">a</span> <span class="o">=</span> <span class="n">zeros</span><span class="p">(</span><span class="mi">100000</span><span class="p">)</span> <span class="p">@</span><span class="n">parallel</span> <span class="k">for</span> <span class="n">i</span><span class="o">=</span><span class="mi">1</span><span class="p">:</span><span class="mi">100000</span> <span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span> <span class="k">end</span> </pre></div> </div> <p>Notice that the reduction operator can be omitted if it is not needed. However, this code will not initialize all of <tt class="docutils literal"><span class="pre">a</span></tt>, since each process will have a separate copy of it. Parallel for loops like these must be avoided. Fortunately, distributed arrays can be used to get around this limitation, as we will see in the next section.</p> <p>Using “outside” variables in parallel loops is perfectly reasonable if the variables are read-only:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">a</span> <span class="o">=</span> <span class="n">randn</span><span class="p">(</span><span class="mi">1000</span><span class="p">)</span> <span class="p">@</span><span class="n">parallel</span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="o">=</span><span class="mi">1</span><span class="p">:</span><span class="mi">100000</span> <span class="n">f</span><span class="p">(</span><span class="n">a</span><span class="p">[</span><span class="n">rand</span><span class="p">(</span><span class="mi">1</span><span class="p">:</span><span class="k">end</span><span class="p">)])</span> <span class="k">end</span> </pre></div> </div> <p>Here each iteration applies <tt class="docutils literal"><span class="pre">f</span></tt> to a randomly-chosen sample from a vector <tt class="docutils literal"><span class="pre">a</span></tt> shared by all processes.</p> <p>In some cases no reduction operator is needed, and we merely wish to apply a function to all integers in some range (or, more generally, to all elements in some collection). This is another useful operation called <em>parallel map</em>, implemented in Julia as the <tt class="docutils literal"><span class="pre">pmap</span></tt> function. For example, we could compute the singular values of several large random matrices in parallel as follows:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">M</span> <span class="o">=</span> <span class="p">{</span><span class="n">rand</span><span class="p">(</span><span class="mi">1000</span><span class="p">,</span><span class="mi">1000</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="o">=</span><span class="mi">1</span><span class="p">:</span><span class="mi">10</span><span class="p">}</span> <span class="n">pmap</span><span class="p">(</span><span class="n">svd</span><span class="p">,</span> <span class="n">M</span><span class="p">)</span> </pre></div> </div> <p>Julia’s <tt class="docutils literal"><span class="pre">pmap</span></tt> is designed for the case where each function call does a large amount of work. In contrast, <tt class="docutils literal"><span class="pre">@parallel</span> <span class="pre">for</span></tt> can handle situations where each iteration is tiny, perhaps merely summing two numbers. Only worker processes are used by both <tt class="docutils literal"><span class="pre">pmap</span></tt> and <tt class="docutils literal"><span class="pre">@parallel</span> <span class="pre">for</span></tt> for the parallel computation. In case of <tt class="docutils literal"><span class="pre">@parallel</span> <span class="pre">for</span></tt>, the final reduction is done on the calling process.</p> </div> <div class="section" id="synchronization-with-remote-references"> <h2>Synchronization With Remote References<a class="headerlink" href="#synchronization-with-remote-references" title="Permalink to this headline">¶</a></h2> </div> <div class="section" id="scheduling"> <h2>Scheduling<a class="headerlink" href="#scheduling" title="Permalink to this headline">¶</a></h2> <p>Julia’s parallel programming platform uses <a class="reference internal" href="control-flow.html#man-tasks"><em>Tasks (aka Coroutines)</em></a> to switch among multiple computations. Whenever code performs a communication operation like <tt class="docutils literal"><span class="pre">fetch</span></tt> or <tt class="docutils literal"><span class="pre">wait</span></tt>, the current task is suspended and a scheduler picks another task to run. A task is restarted when the event it is waiting for completes.</p> <p>For many problems, it is not necessary to think about tasks directly. However, they can be used to wait for multiple events at the same time, which provides for <em>dynamic scheduling</em>. In dynamic scheduling, a program decides what to compute or where to compute it based on when other jobs finish. This is needed for unpredictable or unbalanced workloads, where we want to assign more work to processes only when they finish their current tasks.</p> <p>As an example, consider computing the singular values of matrices of different sizes:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">M</span> <span class="o">=</span> <span class="p">{</span><span class="n">rand</span><span class="p">(</span><span class="mi">800</span><span class="p">,</span><span class="mi">800</span><span class="p">),</span> <span class="n">rand</span><span class="p">(</span><span class="mi">600</span><span class="p">,</span><span class="mi">600</span><span class="p">),</span> <span class="n">rand</span><span class="p">(</span><span class="mi">800</span><span class="p">,</span><span class="mi">800</span><span class="p">),</span> <span class="n">rand</span><span class="p">(</span><span class="mi">600</span><span class="p">,</span><span class="mi">600</span><span class="p">)}</span> <span class="n">pmap</span><span class="p">(</span><span class="n">svd</span><span class="p">,</span> <span class="n">M</span><span class="p">)</span> </pre></div> </div> <p>If one process handles both 800x800 matrices and another handles both 600x600 matrices, we will not get as much scalability as we could. The solution is to make a local task to “feed” work to each process when it completes its current task. This can be seen in the implementation of <tt class="docutils literal"><span class="pre">pmap</span></tt>:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="k">function</span><span class="nf"> pmap</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">lst</span><span class="p">)</span> <span class="n">np</span> <span class="o">=</span> <span class="n">nprocs</span><span class="p">()</span> <span class="c"># determine the number of processes available</span> <span class="n">n</span> <span class="o">=</span> <span class="n">length</span><span class="p">(</span><span class="n">lst</span><span class="p">)</span> <span class="n">results</span> <span class="o">=</span> <span class="n">cell</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span> <span class="c"># function to produce the next work item from the queue.</span> <span class="c"># in this case it's just an index.</span> <span class="n">nextidx</span><span class="p">()</span> <span class="o">=</span> <span class="p">(</span><span class="n">idx</span><span class="o">=</span><span class="n">i</span><span class="p">;</span> <span class="n">i</span><span class="o">+=</span><span class="mi">1</span><span class="p">;</span> <span class="n">idx</span><span class="p">)</span> <span class="p">@</span><span class="n">sync</span> <span class="k">begin</span> <span class="k">for</span> <span class="n">p</span><span class="o">=</span><span class="mi">1</span><span class="p">:</span><span class="n">np</span> <span class="k">if</span> <span class="n">p</span> <span class="o">!=</span> <span class="n">myid</span><span class="p">()</span> <span class="o">||</span> <span class="n">np</span> <span class="o">==</span> <span class="mi">1</span> <span class="p">@</span><span class="n">async</span> <span class="k">begin</span> <span class="k">while</span> <span class="n">true</span> <span class="n">idx</span> <span class="o">=</span> <span class="n">nextidx</span><span class="p">()</span> <span class="k">if</span> <span class="n">idx</span> <span class="o">></span> <span class="n">n</span> <span class="k">break</span> <span class="k">end</span> <span class="n">results</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span> <span class="o">=</span> <span class="n">remotecall_fetch</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">f</span><span class="p">,</span> <span class="n">lst</span><span class="p">[</span><span class="n">idx</span><span class="p">])</span> <span class="k">end</span> <span class="k">end</span> <span class="k">end</span> <span class="k">end</span> <span class="k">end</span> <span class="n">results</span> <span class="k">end</span> </pre></div> </div> <p><tt class="docutils literal"><span class="pre">@async</span></tt> is similar to <tt class="docutils literal"><span class="pre">@spawn</span></tt>, but only runs tasks on the local process. We use it to create a “feeder” task for each process. Each task picks the next index that needs to be computed, then waits for its process to finish, then repeats until we run out of indexes. Note that the feeder tasks do not begin to execute until the main task reaches the end of the <tt class="docutils literal"><span class="pre">@sync</span></tt> block, at which point it surrenders control and waits for all the local tasks to complete before returning from the function. The feeder tasks are able to share state via <tt class="docutils literal"><span class="pre">nextidx()</span></tt> because they all run on the same process. No locking is required, since the threads are scheduled cooperatively and not preemptively. This means context switches only occur at well-defined points: in this case, when <tt class="docutils literal"><span class="pre">remotecall_fetch</span></tt> is called.</p> </div> <div class="section" id="distributed-arrays"> <h2>Distributed Arrays<a class="headerlink" href="#distributed-arrays" title="Permalink to this headline">¶</a></h2> <p>Large computations are often organized around large arrays of data. In these cases, a particularly natural way to obtain parallelism is to distribute arrays among several processes. This combines the memory resources of multiple machines, allowing use of arrays too large to fit on one machine. Each process operates on the part of the array it owns, providing a ready answer to the question of how a program should be divided among machines.</p> <p>Julia distributed arrays are implemented by the <tt class="docutils literal"><span class="pre">DArray</span></tt> type. A <tt class="docutils literal"><span class="pre">DArray</span></tt> has an element type and dimensions just like an <tt class="docutils literal"><span class="pre">Array</span></tt>. A <tt class="docutils literal"><span class="pre">DArray</span></tt> can also use arbitrary array-like types to represent the local chunks that store actual data. The data in a <tt class="docutils literal"><span class="pre">DArray</span></tt> is distributed by dividing the index space into some number of blocks in each dimension.</p> <p>Common kinds of arrays can be constructed with functions beginning with <tt class="docutils literal"><span class="pre">d</span></tt>:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">dzeros</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span><span class="mi">100</span><span class="p">,</span><span class="mi">10</span><span class="p">)</span> <span class="n">dones</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span><span class="mi">100</span><span class="p">,</span><span class="mi">10</span><span class="p">)</span> <span class="n">drand</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span><span class="mi">100</span><span class="p">,</span><span class="mi">10</span><span class="p">)</span> <span class="n">drandn</span><span class="p">(</span><span class="mi">100</span><span class="p">,</span><span class="mi">100</span><span class="p">,</span><span class="mi">10</span><span class="p">)</span> <span class="n">dfill</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span><span class="mi">100</span><span class="p">,</span><span class="mi">10</span><span class="p">)</span> </pre></div> </div> <p>In the last case, each element will be initialized to the specified value <tt class="docutils literal"><span class="pre">x</span></tt>. These functions automatically pick a distribution for you. For more control, you can specify which processes to use, and how the data should be distributed:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">dzeros</span><span class="p">((</span><span class="mi">100</span><span class="p">,</span><span class="mi">100</span><span class="p">),</span> <span class="n">workers</span><span class="p">()[</span><span class="mi">1</span><span class="p">:</span><span class="mi">4</span><span class="p">],</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">4</span><span class="p">])</span> </pre></div> </div> <p>The second argument specifies that the array should be created on the first four workers. When dividing data among a large number of processes, one often sees diminishing returns in performance. Placing <tt class="docutils literal"><span class="pre">DArray</span></tt>s on a subset of processes allows multiple <tt class="docutils literal"><span class="pre">DArray</span></tt> computations to happen at once, with a higher ratio of work to communication on each process.</p> <p>The third argument specifies a distribution; the nth element of this array specifies how many pieces dimension n should be divided into. In this example the first dimension will not be divided, and the second dimension will be divided into 4 pieces. Therefore each local chunk will be of size <tt class="docutils literal"><span class="pre">(100,25)</span></tt>. Note that the product of the distribution array must equal the number of processes.</p> <p><tt class="docutils literal"><span class="pre">distribute(a::Array)</span></tt> converts a local array to a distributed array.</p> <p><tt class="docutils literal"><span class="pre">localpart(a::DArray)</span></tt> obtains the locally-stored portion of a <tt class="docutils literal"><span class="pre">DArray</span></tt>.</p> <p><tt class="docutils literal"><span class="pre">localindexes(a::DArray)</span></tt> gives a tuple of the index ranges owned by the local process.</p> <p><tt class="docutils literal"><span class="pre">convert(Array,</span> <span class="pre">a::DArray)</span></tt> brings all the data to the local process.</p> <p>Indexing a <tt class="docutils literal"><span class="pre">DArray</span></tt> (square brackets) with ranges of indexes always creates a <tt class="docutils literal"><span class="pre">SubArray</span></tt>, not copying any data.</p> </div> <div class="section" id="constructing-distributed-arrays"> <h2>Constructing Distributed Arrays<a class="headerlink" href="#constructing-distributed-arrays" title="Permalink to this headline">¶</a></h2> <p>The primitive <tt class="docutils literal"><span class="pre">DArray</span></tt> constructor has the following somewhat elaborate signature:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">DArray</span><span class="p">(</span><span class="n">init</span><span class="p">,</span> <span class="n">dims</span><span class="p">[,</span> <span class="n">procs</span><span class="p">,</span> <span class="n">dist</span><span class="p">])</span> </pre></div> </div> <p><tt class="docutils literal"><span class="pre">init</span></tt> is a function that accepts a tuple of index ranges. This function should allocate a local chunk of the distributed array and initialize it for the specified indices. <tt class="docutils literal"><span class="pre">dims</span></tt> is the overall size of the distributed array. <tt class="docutils literal"><span class="pre">procs</span></tt> optionally specifies a vector of process IDs to use. <tt class="docutils literal"><span class="pre">dist</span></tt> is an integer vector specifying how many chunks the distributed array should be divided into in each dimension.</p> <p>The last two arguments are optional, and defaults will be used if they are omitted.</p> <p>As an example, here is how to turn the local array constructor <tt class="docutils literal"><span class="pre">fill</span></tt> into a distributed array constructor:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">dfill</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">args</span><span class="o">...</span><span class="p">)</span> <span class="o">=</span> <span class="n">DArray</span><span class="p">(</span><span class="n">I</span><span class="o">-></span><span class="n">fill</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">map</span><span class="p">(</span><span class="n">length</span><span class="p">,</span><span class="n">I</span><span class="p">)),</span> <span class="n">args</span><span class="o">...</span><span class="p">)</span> </pre></div> </div> <p>In this case the <tt class="docutils literal"><span class="pre">init</span></tt> function only needs to call <tt class="docutils literal"><span class="pre">fill</span></tt> with the dimensions of the local piece it is creating.</p> </div> <div class="section" id="distributed-array-operations"> <h2>Distributed Array Operations<a class="headerlink" href="#distributed-array-operations" title="Permalink to this headline">¶</a></h2> <p>At this time, distributed arrays do not have much functionality. Their major utility is allowing communication to be done via array indexing, which is convenient for many problems. As an example, consider implementing the “life” cellular automaton, where each cell in a grid is updated according to its neighboring cells. To compute a chunk of the result of one iteration, each process needs the immediate neighbor cells of its local chunk. The following code accomplishes this:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="k">function</span><span class="nf"> life_step</span><span class="p">(</span><span class="n">d</span><span class="p">::</span><span class="n">DArray</span><span class="p">)</span> <span class="n">DArray</span><span class="p">(</span><span class="n">size</span><span class="p">(</span><span class="n">d</span><span class="p">),</span><span class="n">procs</span><span class="p">(</span><span class="n">d</span><span class="p">))</span> <span class="k">do</span> <span class="n">I</span> <span class="n">top</span> <span class="o">=</span> <span class="n">mod</span><span class="p">(</span><span class="n">first</span><span class="p">(</span><span class="n">I</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span><span class="o">-</span><span class="mi">2</span><span class="p">,</span><span class="n">size</span><span class="p">(</span><span class="n">d</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span><span class="o">+</span><span class="mi">1</span> <span class="n">bot</span> <span class="o">=</span> <span class="n">mod</span><span class="p">(</span> <span class="n">last</span><span class="p">(</span><span class="n">I</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="p">,</span><span class="n">size</span><span class="p">(</span><span class="n">d</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span><span class="o">+</span><span class="mi">1</span> <span class="n">left</span> <span class="o">=</span> <span class="n">mod</span><span class="p">(</span><span class="n">first</span><span class="p">(</span><span class="n">I</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span><span class="o">-</span><span class="mi">2</span><span class="p">,</span><span class="n">size</span><span class="p">(</span><span class="n">d</span><span class="p">,</span><span class="mi">2</span><span class="p">))</span><span class="o">+</span><span class="mi">1</span> <span class="n">right</span> <span class="o">=</span> <span class="n">mod</span><span class="p">(</span> <span class="n">last</span><span class="p">(</span><span class="n">I</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span> <span class="p">,</span><span class="n">size</span><span class="p">(</span><span class="n">d</span><span class="p">,</span><span class="mi">2</span><span class="p">))</span><span class="o">+</span><span class="mi">1</span> <span class="n">old</span> <span class="o">=</span> <span class="n">Array</span><span class="p">(</span><span class="kt">Bool</span><span class="p">,</span> <span class="n">length</span><span class="p">(</span><span class="n">I</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span> <span class="n">length</span><span class="p">(</span><span class="n">I</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span><span class="o">+</span><span class="mi">2</span><span class="p">)</span> <span class="n">old</span><span class="p">[</span><span class="mi">1</span> <span class="p">,</span> <span class="mi">1</span> <span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">top</span> <span class="p">,</span> <span class="n">left</span><span class="p">]</span> <span class="c"># left side</span> <span class="n">old</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="k">end</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span> <span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">I</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">left</span><span class="p">]</span> <span class="n">old</span><span class="p">[</span><span class="k">end</span> <span class="p">,</span> <span class="mi">1</span> <span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">bot</span> <span class="p">,</span> <span class="n">left</span><span class="p">]</span> <span class="n">old</span><span class="p">[</span><span class="mi">1</span> <span class="p">,</span> <span class="mi">2</span><span class="p">:</span><span class="k">end</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">top</span> <span class="p">,</span> <span class="n">I</span><span class="p">[</span><span class="mi">2</span><span class="p">]]</span> <span class="n">old</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="k">end</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">:</span><span class="k">end</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">I</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">I</span><span class="p">[</span><span class="mi">2</span><span class="p">]]</span> <span class="c"># middle</span> <span class="n">old</span><span class="p">[</span><span class="k">end</span> <span class="p">,</span> <span class="mi">2</span><span class="p">:</span><span class="k">end</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">bot</span> <span class="p">,</span> <span class="n">I</span><span class="p">[</span><span class="mi">2</span><span class="p">]]</span> <span class="n">old</span><span class="p">[</span><span class="mi">1</span> <span class="p">,</span> <span class="k">end</span> <span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">top</span> <span class="p">,</span> <span class="n">right</span><span class="p">]</span> <span class="c"># right side</span> <span class="n">old</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="k">end</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="k">end</span> <span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">I</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">right</span><span class="p">]</span> <span class="n">old</span><span class="p">[</span><span class="k">end</span> <span class="p">,</span> <span class="k">end</span> <span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="n">bot</span> <span class="p">,</span> <span class="n">right</span><span class="p">]</span> <span class="n">life_rule</span><span class="p">(</span><span class="n">old</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> </pre></div> </div> <p>As you can see, we use a series of indexing expressions to fetch data into a local array <tt class="docutils literal"><span class="pre">old</span></tt>. Note that the <tt class="docutils literal"><span class="pre">do</span></tt> block syntax is convenient for passing <tt class="docutils literal"><span class="pre">init</span></tt> functions to the <tt class="docutils literal"><span class="pre">DArray</span></tt> constructor. Next, the serial function <tt class="docutils literal"><span class="pre">life_rule</span></tt> is called to apply the update rules to the data, yielding the needed <tt class="docutils literal"><span class="pre">DArray</span></tt> chunk. Nothing about <tt class="docutils literal"><span class="pre">life_rule</span></tt> is <tt class="docutils literal"><span class="pre">DArray</span></tt>-specific, but we list it here for completeness:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="k">function</span><span class="nf"> life_rule</span><span class="p">(</span><span class="n">old</span><span class="p">)</span> <span class="n">m</span><span class="p">,</span> <span class="n">n</span> <span class="o">=</span> <span class="n">size</span><span class="p">(</span><span class="n">old</span><span class="p">)</span> <span class="nb">new</span> <span class="o">=</span> <span class="n">similar</span><span class="p">(</span><span class="n">old</span><span class="p">,</span> <span class="n">m</span><span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="n">n</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span> <span class="k">for</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">2</span><span class="p">:</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span> <span class="k">for</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">2</span><span class="p">:</span><span class="n">m</span><span class="o">-</span><span class="mi">1</span> <span class="n">nc</span> <span class="o">=</span> <span class="o">+</span><span class="p">(</span><span class="n">old</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="n">j</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="n">old</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="n">j</span><span class="p">],</span> <span class="n">old</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="n">j</span><span class="o">+</span><span class="mi">1</span><span class="p">],</span> <span class="n">old</span><span class="p">[</span><span class="n">i</span> <span class="p">,</span><span class="n">j</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="n">old</span><span class="p">[</span><span class="n">i</span> <span class="p">,</span><span class="n">j</span><span class="o">+</span><span class="mi">1</span><span class="p">],</span> <span class="n">old</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span><span class="n">j</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="n">old</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span><span class="n">j</span><span class="p">],</span> <span class="n">old</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span><span class="n">j</span><span class="o">+</span><span class="mi">1</span><span class="p">])</span> <span class="nb">new</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="n">j</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">nc</span> <span class="o">==</span> <span class="mi">3</span> <span class="o">||</span> <span class="n">nc</span> <span class="o">==</span> <span class="mi">2</span> <span class="o">&&</span> <span class="n">old</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="p">])</span> <span class="k">end</span> <span class="k">end</span> <span class="nb">new</span> <span class="k">end</span> </pre></div> </div> </div> <div class="section" id="shared-arrays-experimental"> <h2>Shared Arrays (Experimental)<a class="headerlink" href="#shared-arrays-experimental" title="Permalink to this headline">¶</a></h2> <p>Shared Arrays use system shared memory to map the same array across many processes. While there are some similarities to a <tt class="docutils literal"><span class="pre">DArray</span></tt>, the behavior of a <tt class="docutils literal"><span class="pre">SharedArray</span></tt> is quite different. In a <tt class="docutils literal"><span class="pre">DArray</span></tt>, each process has local access to just a chunk of the data, and no two processes share the same chunk; in contrast, in a <tt class="docutils literal"><span class="pre">SharedArray</span></tt> each “participating” process has access to the entire array. A <tt class="docutils literal"><span class="pre">SharedArray</span></tt> is a good choice when you want to have a large amount of data jointly accessible to two or more processes on the same machine.</p> <p><tt class="docutils literal"><span class="pre">SharedArray</span></tt> indexing (assignment and accessing values) works just as with regular arrays, and is efficient because the underlying memory is available to the local process. Therefore, most algorithms work naturally on <tt class="docutils literal"><span class="pre">SharedArrays</span></tt>, albeit in single-process mode. In cases where an algorithm insists on an <tt class="docutils literal"><span class="pre">Array</span></tt> input, the underlying array can be retrieved from a <tt class="docutils literal"><span class="pre">SharedArray</span></tt> by calling <tt class="docutils literal"><span class="pre">sdata(S)</span></tt>. For other <tt class="docutils literal"><span class="pre">AbstractArray</span></tt> types, <tt class="docutils literal"><span class="pre">sdata</span></tt> just returns the object itself, so it’s safe to use <tt class="docutils literal"><span class="pre">sdata</span></tt> on any Array-type object.</p> <p>The constructor for a shared array is of the form:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">SharedArray</span><span class="p">(</span><span class="n">T</span><span class="p">::</span><span class="n">Type</span><span class="p">,</span> <span class="n">dims</span><span class="p">::</span><span class="n">NTuple</span><span class="p">;</span> <span class="n">init</span><span class="o">=</span><span class="n">false</span><span class="p">,</span> <span class="n">pids</span><span class="o">=</span><span class="kt">Int</span><span class="p">[])</span> </pre></div> </div> <p>which creates a shared array of a bitstype <tt class="docutils literal"><span class="pre">T</span></tt> and size <tt class="docutils literal"><span class="pre">dims</span></tt> across the processes specified by <tt class="docutils literal"><span class="pre">pids</span></tt>. Unlike distributed arrays, a shared array is accessible only from those participating workers specified by the <tt class="docutils literal"><span class="pre">pids</span></tt> named argument (and the creating process too, if it is on the same host).</p> <p>If an <tt class="docutils literal"><span class="pre">init</span></tt> function, of signature <tt class="docutils literal"><span class="pre">initfn(S::SharedArray)</span></tt>, is specified, it is called on all the participating workers. You can arrange it so that each worker runs the <tt class="docutils literal"><span class="pre">init</span></tt> function on a distinct portion of the array, thereby parallelizing initialization.</p> <p>Here’s a brief example:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">julia</span><span class="o">></span> <span class="n">addprocs</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span> <span class="mi">3</span><span class="o">-</span><span class="n">element</span> <span class="n">Array</span><span class="p">{</span><span class="kt">Any</span><span class="p">,</span><span class="mi">1</span><span class="p">}:</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">4</span> <span class="n">julia</span><span class="o">></span> <span class="n">S</span> <span class="o">=</span> <span class="n">SharedArray</span><span class="p">(</span><span class="kt">Int</span><span class="p">,</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">),</span> <span class="n">init</span> <span class="o">=</span> <span class="n">S</span> <span class="o">-></span> <span class="n">S</span><span class="p">[</span><span class="n">localindexes</span><span class="p">(</span><span class="n">S</span><span class="p">)]</span> <span class="o">=</span> <span class="n">myid</span><span class="p">())</span> <span class="mi">3</span><span class="n">x4</span> <span class="n">SharedArray</span><span class="p">{</span><span class="kt">Int64</span><span class="p">,</span><span class="mi">2</span><span class="p">}:</span> <span class="mi">2</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">4</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">3</span> <span class="mi">4</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">4</span> <span class="mi">4</span> <span class="n">julia</span><span class="o">></span> <span class="n">S</span><span class="p">[</span><span class="mi">3</span><span class="p">,</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="mi">7</span> <span class="mi">7</span> <span class="n">julia</span><span class="o">></span> <span class="n">S</span> <span class="mi">3</span><span class="n">x4</span> <span class="n">SharedArray</span><span class="p">{</span><span class="kt">Int64</span><span class="p">,</span><span class="mi">2</span><span class="p">}:</span> <span class="mi">2</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">4</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">3</span> <span class="mi">4</span> <span class="mi">2</span> <span class="mi">7</span> <span class="mi">4</span> <span class="mi">4</span> </pre></div> </div> <p><tt class="docutils literal"><span class="pre">localindexes</span></tt> provides disjoint one-dimensional ranges of indexes, and is sometimes convenient for splitting up tasks among processes. You can, of course, divide the work any way you wish:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">julia</span><span class="o">></span> <span class="n">S</span> <span class="o">=</span> <span class="n">SharedArray</span><span class="p">(</span><span class="kt">Int</span><span class="p">,</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">),</span> <span class="n">init</span> <span class="o">=</span> <span class="n">S</span> <span class="o">-></span> <span class="n">S</span><span class="p">[</span><span class="n">myid</span><span class="p">()</span><span class="o">-</span><span class="mi">1</span><span class="p">:</span><span class="n">nworkers</span><span class="p">():</span><span class="n">length</span><span class="p">(</span><span class="n">S</span><span class="p">)]</span> <span class="o">=</span> <span class="n">myid</span><span class="p">())</span> <span class="mi">3</span><span class="n">x4</span> <span class="n">SharedArray</span><span class="p">{</span><span class="kt">Int64</span><span class="p">,</span><span class="mi">2</span><span class="p">}:</span> <span class="mi">2</span> <span class="mi">2</span> <span class="mi">2</span> <span class="mi">2</span> <span class="mi">3</span> <span class="mi">3</span> <span class="mi">3</span> <span class="mi">3</span> <span class="mi">4</span> <span class="mi">4</span> <span class="mi">4</span> <span class="mi">4</span> </pre></div> </div> <p>Since all processes have access to the underlying data, you do have to be careful not to set up conflicts. For example:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="p">@</span><span class="n">sync</span> <span class="k">begin</span> <span class="k">for</span> <span class="n">p</span> <span class="k">in</span> <span class="n">workers</span><span class="p">()</span> <span class="p">@</span><span class="n">async</span> <span class="k">begin</span> <span class="n">remotecall_wait</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">fill</span><span class="o">!</span><span class="p">,</span> <span class="n">S</span><span class="p">,</span> <span class="n">p</span><span class="p">)</span> <span class="k">end</span> <span class="k">end</span> <span class="k">end</span> </pre></div> </div> <p>would result in undefined behavior: because each process fills the <em>entire</em> array with its own <tt class="docutils literal"><span class="pre">pid</span></tt>, whichever process is the last to execute (for any particular element of <tt class="docutils literal"><span class="pre">S</span></tt>) will have its <tt class="docutils literal"><span class="pre">pid</span></tt> retained.</p> </div> <div class="section" id="clustermanagers"> <h2>ClusterManagers<a class="headerlink" href="#clustermanagers" title="Permalink to this headline">¶</a></h2> <p>Julia worker processes can also be spawned on arbitrary machines, enabling Julia’s natural parallelism to function quite transparently in a cluster environment. The <tt class="docutils literal"><span class="pre">ClusterManager</span></tt> interface provides a way to specify a means to launch and manage worker processes. For example, <tt class="docutils literal"><span class="pre">ssh</span></tt> clusters are also implemented using a <tt class="docutils literal"><span class="pre">ClusterManager</span></tt>:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">immutable</span> <span class="n">SSHManager</span> <span class="o"><:</span> <span class="n">ClusterManager</span> <span class="n">launch</span><span class="p">::</span><span class="n">Function</span> <span class="n">manage</span><span class="p">::</span><span class="n">Function</span> <span class="n">machines</span><span class="p">::</span><span class="n">AbstractVector</span> <span class="n">SSHManager</span><span class="p">(;</span> <span class="n">machines</span><span class="o">=</span><span class="p">[])</span> <span class="o">=</span> <span class="nb">new</span><span class="p">(</span><span class="n">launch_ssh_workers</span><span class="p">,</span> <span class="n">manage_ssh_workers</span><span class="p">,</span> <span class="n">machines</span><span class="p">)</span> <span class="k">end</span> <span class="k">function</span><span class="nf"> launch_ssh_workers</span><span class="p">(</span><span class="n">cman</span><span class="p">::</span><span class="n">SSHManager</span><span class="p">,</span> <span class="n">np</span><span class="p">::</span><span class="n">Integer</span><span class="p">,</span> <span class="n">config</span><span class="p">::</span><span class="n">Dict</span><span class="p">)</span> <span class="o">...</span> <span class="k">end</span> <span class="k">function</span><span class="nf"> manage_ssh_workers</span><span class="p">(</span><span class="n">id</span><span class="p">::</span><span class="n">Integer</span><span class="p">,</span> <span class="n">config</span><span class="p">::</span><span class="n">Dict</span><span class="p">,</span> <span class="n">op</span><span class="p">::</span><span class="n">Symbol</span><span class="p">)</span> <span class="o">...</span> <span class="k">end</span> </pre></div> </div> <p>where <tt class="docutils literal"><span class="pre">launch_ssh_workers</span></tt> is responsible for instantiating new Julia processes and <tt class="docutils literal"><span class="pre">manage_ssh_workers</span></tt> provides a means to manage those processes, e.g. for sending interrupt signals. New processes can then be added at runtime using <tt class="docutils literal"><span class="pre">addprocs</span></tt>:</p> <div class="highlight-julia"><div class="highlight"><pre><span class="n">addprocs</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="n">cman</span><span class="o">=</span><span class="n">LocalManager</span><span class="p">())</span> </pre></div> </div> <p>which specifies a number of processes to add and a <tt class="docutils literal"><span class="pre">ClusterManager</span></tt> to use for launching those processes.</p> <p class="rubric">Footnotes</p> <table class="docutils footnote" frame="void" id="mpi2rma" rules="none"> <colgroup><col class="label" /><col /></colgroup> <tbody valign="top"> <tr><td class="label"><a class="fn-backref" href="#id1">[1]</a></td><td>In this context, MPI refers to the MPI-1 standard. Beginning with MPI-2, the MPI standards committee introduced a new set of communication mechanisms, collectively referred to as Remote Memory Access (RMA). The motivation for adding RMA to the MPI standard was to facilitate one-sided communication patterns. For additional information on the latest MPI standard, see <a class="reference external" href="http://www.mpi-forum.org/docs">http://www.mpi-forum.org/docs</a>.</td></tr> </tbody> </table> </div> </div> </div> <footer> <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation"> <a href="running-external-programs.html" class="btn btn-neutral float-right" title="Running External Programs"/>Next <span class="fa fa-arrow-circle-right"></span></a> <a href="networking-and-streams.html" class="btn btn-neutral" title="Networking and Streams"><span class="fa fa-arrow-circle-left"></span> Previous</a> </div> <hr/> <div role="contentinfo"> <p> </p> </div> <a href="https://github.com/snide/sphinx_rtd_theme">Sphinx theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a> </footer> </div> </div> </section> </div> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT:'../', VERSION:'0.3.4', COLLAPSE_INDEX:false, FILE_SUFFIX:'.html', HAS_SOURCE: true }; </script> <script type="text/javascript" src="../_static/jquery.js"></script> <script type="text/javascript" src="../_static/underscore.js"></script> <script type="text/javascript" src="../_static/doctools.js"></script> <script type="text/javascript" src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script> <script type="text/javascript" src="../_static/js/theme.js"></script> <script type="text/javascript"> jQuery(function () { SphinxRtdTheme.StickyNav.enable(); }); </script> </body> </html>