Riak Setup Instructions
------

This document explains how to set up a Riak cluster. It assumes that
you have already downloaded and successfully built Riak. For help with
those steps, please refer to riak/README.


Overview
---

Riak has many knobs to tweak, affecting everything from distribution to
disk storage. This document will attempt to give a description of the
common configuration parameters, then describe two typical setups, one
small, one large.


Configuration
---

Configurations are stored in the simple text files vm.args and
app.config. Initial versions of these files are stored in the
rel/files/ subdirectory of the riak source tree. When a release is
generated, these files are copied to rel/riak/etc/.


vm.args
---

The vm.args configuration file sets the parameters passed on the
command line to the Erlang VM. Lines starting with a '#' are comments,
and are ignored. The other lines are concatenated and passed on the
command line verbatim.

Two important parameters to configure here are "-name", the name to
give the Erlang node running Riak, and "-setcookie", the cookie that
all Riak nodes need to share in order to communicate.


app.config
---

The app.config configuration file is formatted as an Erlang VM config
file. The syntax is simply:

    [
     {AppName, [
                {Option1, Value1},
                {Option2, Value2},
                ...
               ]},
     ...
    ].

Normally, this will look something like:

    [
     {riak_kv, [
                {storage_backend, riak_kv_dets_backend},
                {riak_kv_dets_backend_root, "data/dets"}
               ]},
     {sasl, [
             {sasl_error_logger, {file, "log/sasl-error.log"}}
            ]}
    ].

This would set the 'storage_backend' and 'riak_kv_dets_backend_root'
options for the 'riak_kv' application, and the 'sasl_error_logger'
option for the 'sasl' application.

The following parameters can be used in app.config to configure Riak
behavior. Some of the terminology used below is better explained in
riak/doc/architecture.txt.

cluster_name: string
  The name of the cluster. Can be anything. Used mainly in saving ring
  configuration.
  All nodes should have the same cluster name.

gossip_interval: integer
  The period, in milliseconds, at which ring state gossiping will
  happen. A good default is 60000 (sixty seconds). Best not to change
  it unless you know what you're doing.

ring_creation_size: integer
  The number of partitions to divide the keyspace into. This can be
  any number, but you probably don't want to go lower than 16, and
  production deployments will probably want something like 1024 or
  greater. This is a very difficult parameter to change after your
  ring has been created, so choose a number that allows for growth, if
  you expect to add nodes to this cluster in the future.

ring_state_dir: string
  Directory in which the ring state should be stored. Ring state is
  stored to allow an entire cluster to be restarted.

storage_backend: atom
  Name of the module that implements the storage for a vnode. The
  backends that ship with Riak are riak_kv_bitcask_backend,
  riak_kv_cache_backend, riak_kv_dets_backend, riak_kv_ets_backend,
  riak_kv_fs_backend, and riak_kv_gb_trees_backend. Some backends have
  their own set of configuration parameters.

  riak_kv_bitcask_backend:
    Stores data in the Bitcask key/value store.

  riak_kv_cache_backend:
    Behaves as an LRU-with-timed-expiry cache.

  riak_kv_dets_backend:
    A backend that uses DETS to store its data.

    riak_kv_dets_backend_root: string
      The directory under which this backend will store its files.

  riak_kv_ets_backend:
    A backend that uses ETS to store its data.

  riak_kv_fs_backend:
    A backend that uses the filesystem directly to store data. Data
    are stored in Erlang binary format in files in a directory
    structure on disk.

    riak_kv_fs_backend_root: string
      The directory under which this backend will store its files.

  riak_kv_gb_trees_backend:
    A backend that uses Erlang gb_trees to store data.


Single-node Configuration
---

If you're running a single Riak node, you likely don't need to change
any configuration at all.
After compiling and generating the release ("./rebar compile
generate"), simply start Riak from the rel/ directory. (Details about
the "riak" control script can be found in the README.)


Large (Production) Configuration
---

If you're running any sort of cluster that could be labeled
"production", "deployment", "scalable", "enterprise", or any other word
implying that the cluster will be running interminably with on-going
maintenance, then you will want to change configurations a bit. Some
recommended changes:

* Uncomment the "-heart" line in vm.args. This will cause the "heart"
  utility to monitor the Riak node, and restart it if it stops.

* Change the name of the Riak node in vm.args from riak@127.0.0.1 to
  riak@VISIBLE.HOSTNAME. This will allow Riak nodes on separate
  machines to communicate.

* Change 'web_ip' in the 'riak_core' section of app.config if you'll
  be accessing that interface from a non-host-local client.

* Consider adding a 'ring_creation_size' entry to app.config, and
  setting it to a number higher than the default of 64. More
  partitions will allow you to add more Riak nodes later, if you need
  to.

To get the cluster up and running, first start Riak on each node with
the usual "riak start" command. Next, tell each node to join the
cluster with the riak-admin script:

    box2$ bin/riak-admin join riak@box1.example.com
    Sent join request to riak@box1.example.com

To check that all nodes have joined the cluster, attach a console to
any node, and request the ring from the ring manager, then check that
all nodes are represented:

    $ bin/riak attach
    Attaching to /tmp/erlang.pipe.1 (^D to exit)
    (riak@box1.example.com)1> {ok, R} = riak_core_ring_manager:get_my_ring().
    {ok,{chstate,'riak@box1.example.com',
    ...snip...
    (riak@box1.example.com)2> riak_core_ring:all_members(R).
    ['riak@box1.example.com','riak@box2.example.com']

Your cluster should now be ready to accept requests. See
riak/doc/basic-client.txt for simple instructions on connecting,
storing and fetching data.
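
The app.config changes recommended earlier in this section could look
roughly like the fragment below. This is only a sketch: the address
192.0.2.10 is a placeholder, the value 1024 is simply the figure
suggested for production deployments in the parameter list, and
placing 'ring_creation_size' in the 'riak_core' section is an
assumption, since this document names that section only for 'web_ip'.
Merge these entries into the generated rel/riak/etc/app.config rather
than replacing the whole file.

```erlang
%% Hypothetical production overrides for app.config (a sketch, not a
%% complete file). Only options discussed in this document are shown.
[
 {riak_core, [
     %% Assumed to belong in riak_core. Pick this before the ring is
     %% first created; it is very difficult to change afterwards.
     {ring_creation_size, 1024},

     %% Placeholder address: use one that your non-host-local
     %% clients can actually reach.
     {web_ip, "192.0.2.10"}
 ]}
].
```

The "-name" and "-heart" changes are made in vm.args, not here, so
they do not appear in this fragment.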

Starting more nodes in production is just as easy:

1. Install Riak on another host, modifying hostnames in configuration
   files, if necessary.
2. Start the node with "riak start"
3. Add the node to the cluster with
   "riak-admin join ExistingClusterNode"


Developer Configuration
---

If you're hacking on Riak, and you need to run multiple nodes on a
single physical machine, use the "devrel" make command:

    $ make devrel
    mkdir -p dev
    (cd rel && ../rebar generate target_dir=../dev/dev1 overlay_vars=vars/dev1_vars.config)
    ==> rel (generate)
    mkdir -p dev
    (cd rel && ../rebar generate target_dir=../dev/dev2 overlay_vars=vars/dev2_vars.config)
    ==> rel (generate)
    mkdir -p dev
    (cd rel && ../rebar generate target_dir=../dev/dev3 overlay_vars=vars/dev3_vars.config)
    ==> rel (generate)

This make target creates a release, and then modifies configuration
files such that each Riak node uses a different Erlang node name
(dev1-3), web port (8091-3), data directory (dev/dev1-3/data/), etc.

Start each developer node just as you would a regular node, but use
the 'riak' script in dev/devN/bin/.
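
To give a feel for what the devrel target changes, the fragment below
sketches settings that might differ in the second developer node's
app.config. It is not copied from a generated release: the 'web_port'
option name is an assumption, and the choice of the DETS backend just
reuses the example from the app.config section. Only the port number
(8092, from the 8091-3 range) comes from the description above.

```erlang
%% Sketch of per-node settings for the second developer node.
%% Assumed for illustration; inspect the generated files under
%% dev/dev2/ for the real values.
[
 {riak_core, [
     {web_ip, "127.0.0.1"},
     %% 'web_port' is an assumed option name. Per the text above,
     %% the three dev nodes use web ports 8091, 8092 and 8093.
     {web_port, 8092}
 ]},
 {riak_kv, [
     %% Same backend options as the app.config example earlier in
     %% this document. The relative path is assumed to resolve under
     %% the node's own directory, keeping each node's data separate.
     {storage_backend, riak_kv_dets_backend},
     {riak_kv_dets_backend_root, "data/dets"}
 ]}
].
```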