pacemaker-1.1.19-13.mga9.src.rpm

From 4d6f6e01b309cda7b3f8fe791247566d247d8028 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jan=20Pokorn=C3=BD?= <jpokorny@redhat.com>
Date: Tue, 16 Apr 2019 00:08:28 +0200
Subject: [PATCH] High: pacemakerd vs. IPC/procfs confused deputy authenticity
 issue (4/4)

[4/4: CPG users to be careful about now-more-probable rival processes]

In essence, this comes down to pacemaker confusing at-node CPG members
with the effectively sole instance plausible to co-exist at a
particular node, an assumption that doesn't hold and asks for a wider
reconciliation of this reality-check.

However, in practical terms, since there are two factors lowering the
priority of doing so:

1/ possibly the only non-self-inflicted scenario is that some of the
   cluster stack processes fail -- a problem that shall rather be
   deferred to arranged node disarming/fencing so as to stay on the
   safe side with 100% certainty, at the cost of a possibly
   long-lasting failover process at the other nodes
   (as for the other possibility, someone running some of these
   daemons by accident so that they effectively become rival
   processes, that is like getting one's hands cut when playing
   with a lawnmower in an unintended way)

2/ for state tracking of the peer nodes, it may possibly cause trouble
   in case the process observed as left wasn't the last one for the
   particular node, even if presumably just temporarily, since the
   situation may eventually resolve with the imposed serialization of
   the rival processes via the API end-point singleton restriction
   (this is also the most likely cause of why such a non-final leave
   gets observed in the first place), except in one case -- the
   legitimate API end-point carrier won't possibly be acknowledged as
   returned by its peers, at least not immediately, unless it tries
   to join anew, which verges on undefined behaviour (at least per
   the corosync documentation)

we make do just with a light code change so as to

* limit 1/ some more with an in-daemon self-check for a pre-existing
  end-point (this complements the checks already made in the parent
  daemon prior to spawning new instances, only some moments later;
  note that we don't have any lock file or similar mechanism to
  prevent parallel runs of the same daemons, and people could run
  them at their own deliberation), and to

* guard against interference from rivals at the same node per 2/ by
  ignoring their non-final leave messages altogether.

Note that CPG at this point is already expected to be authenticity-safe.

Regarding the now-more-probable part, we actually traded the inherently
racy procfs scanning for something (exactly that singleton mentioned
above) rather firm (and unfakeable), but we admittedly lost track of
processes that are after CPG membership (that is, another form of
a shared state) prior to (or in a non-deterministic order allowing for
the same) caring about publishing the end-point.

Big thanks are owed to Yan Gao of SUSE for the early discovery and
reporting of this discrepancy arising from the earlier commits in the
set.
---
 attrd/main.c      | 19 +++++++++-
 cib/main.c        | 35 ++++++++++--------
 crmd/main.c       | 35 ++++++++++--------
 fencing/main.c    | 32 +++++++++-------
 lib/cluster/cpg.c | 94 +++++++++++++++++++++++++++++++++++++++++++----
 5 files changed, 163 insertions(+), 52 deletions(-)

diff --git a/attrd/main.c b/attrd/main.c
index 4cc15cc534..e0a1e7c6ca 100644
--- a/attrd/main.c
+++ b/attrd/main.c
@@ -1,5 +1,7 @@
 /*
- * Copyright (C) 2013 Andrew Beekhof <andrew@beekhof.net>
+ * Copyright 2013-2019 the Pacemaker project contributors
+ *
+ * The version control history for this file may have further details.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public
@@ -336,6 +338,7 @@ main(int argc, char **argv)
     int index = 0;
     int argerr = 0;
     qb_ipcs_service_t *ipcs = NULL;
+    crm_ipc_t *old_instance = NULL;
 
     attrd_init_mainloop();
     crm_log_preinit(NULL, argc, argv);
@@ -372,6 +375,20 @@ main(int argc, char **argv)
 
     crm_log_init(T_ATTRD, LOG_INFO, TRUE, FALSE, argc, argv, FALSE);
     crm_info("Starting up");
+
+    old_instance = crm_ipc_new(T_ATTRD, 0);
+    if (crm_ipc_connect(old_instance)) {
+        /* IPC end-point already up */
+        crm_ipc_close(old_instance);
+        crm_ipc_destroy(old_instance);
+        crm_err("attrd is already active, aborting startup");
+        crm_exit(EX_OK);
+    } else {
+        /* not up or not authentic, we'll proceed either way */
+        crm_ipc_destroy(old_instance);
+        old_instance = NULL;
+    }
+
     attributes = g_hash_table_new_full(crm_str_hash, g_str_equal, NULL, free_attribute);
 
     attrd_exit_status = attrd_cluster_connect();
diff --git a/cib/main.c b/cib/main.c
index 5473d40958..7c745dafc5 100644
--- a/cib/main.c
+++ b/cib/main.c
@@ -1,19 +1,10 @@
 /*
- * Copyright (C) 2004 Andrew Beekhof <andrew@beekhof.net>
+ * Copyright 2004-2019 the Pacemaker project contributors
  *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
+ * The version control history for this file may have further details.
  *
- * This software is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public
- * License along with this library; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ * This source code is licensed under the GNU General Public License version 2
+ * or later (GPLv2+) WITHOUT ANY WARRANTY.
  */
 
 #include <crm_internal.h>
@@ -135,13 +126,12 @@ main(int argc, char **argv)
     int index = 0;
     int argerr = 0;
     struct passwd *pwentry = NULL;
+    crm_ipc_t *old_instance = NULL;
 
     crm_log_preinit(NULL, argc, argv);
     crm_set_options(NULL, "[options]",
                     long_options, "Daemon for storing and replicating the cluster configuration");
 
-    crm_peer_init();
-
     mainloop_add_signal(SIGTERM, cib_shutdown);
     mainloop_add_signal(SIGPIPE, cib_enable_writes);
 
@@ -216,6 +206,19 @@ main(int argc, char **argv)
 
     crm_log_init(NULL, LOG_INFO, TRUE, FALSE, argc, argv, FALSE);
 
+    old_instance = crm_ipc_new(cib_channel_ro, 0);
+    if (crm_ipc_connect(old_instance)) {
+        /* IPC end-point already up */
+        crm_ipc_close(old_instance);
+        crm_ipc_destroy(old_instance);
+        crm_err("cib is already active, aborting startup");
+        crm_exit(EX_OK);
+    } else {
+        /* not up or not authentic, we'll proceed either way */
+        crm_ipc_destroy(old_instance);
+        old_instance = NULL;
+    }
+
     if (cib_root == NULL) {
         if ((g_file_test(CRM_CONFIG_DIR "/cib.xml", G_FILE_TEST_EXISTS) == FALSE)
             && (g_file_test(CRM_LEGACY_CONFIG_DIR "/cib.xml", G_FILE_TEST_EXISTS) == TRUE)) {
@@ -238,6 +241,8 @@ main(int argc, char **argv)
         return 100;
     }
 
+    crm_peer_init();
+
     /* read local config file */
     rc = cib_init();
 
diff --git a/crmd/main.c b/crmd/main.c
index e8baa125e9..6eb7c032f9 100644
--- a/crmd/main.c
+++ b/crmd/main.c
@@ -1,19 +1,10 @@
 /* 
- * Copyright (C) 2004 Andrew Beekhof <andrew@beekhof.net>
- * 
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
- * 
- * This software is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- * 
- * You should have received a copy of the GNU General Public
- * License along with this library; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ * Copyright 2004-2019 the Pacemaker project contributors
+ *
+ * The version control history for this file may have further details.
+ *
+ * This source code is licensed under the GNU General Public License version 2
+ * or later (GPLv2+) WITHOUT ANY WARRANTY.
  */
 
 #include <crm_internal.h>
@@ -61,6 +52,7 @@ main(int argc, char **argv)
     int flag;
     int index = 0;
     int argerr = 0;
+    crm_ipc_t *old_instance = NULL;
 
     crmd_mainloop = g_main_new(FALSE);
     crm_log_preinit(NULL, argc, argv);
@@ -104,6 +96,19 @@ main(int argc, char **argv)
         crm_help('?', EX_USAGE);
     }
 
+    old_instance = crm_ipc_new(CRM_SYSTEM_CRMD, 0);
+    if (crm_ipc_connect(old_instance)) {
+        /* IPC end-point already up */
+        crm_ipc_close(old_instance);
+        crm_ipc_destroy(old_instance);
+        crm_err("crmd is already active, aborting startup");
+        crm_exit(EX_OK);
+    } else {
+        /* not up or not authentic, we'll proceed either way */
+        crm_ipc_destroy(old_instance);
+        old_instance = NULL;
+    }
+
     if (pcmk__daemon_can_write(PE_STATE_DIR, NULL) == FALSE) {
         crm_err("Terminating due to bad permissions on " PE_STATE_DIR);
         fprintf(stderr,
diff --git a/fencing/main.c b/fencing/main.c
index 16663f6259..c46c9a531b 100644
--- a/fencing/main.c
+++ b/fencing/main.c
@@ -1,19 +1,10 @@
 /*
- * Copyright (C) 2009 Andrew Beekhof <andrew@beekhof.net>
+ * Copyright 2009-2019 the Pacemaker project contributors
  *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public
- * License as published by the Free Software Foundation; either
- * version 2 of the License, or (at your option) any later version.
+ * The version control history for this file may have further details.
  *
- * This software is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public
- * License along with this library; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ * This source code is licensed under the GNU General Public License version 2
+ * or later (GPLv2+) WITHOUT ANY WARRANTY.
  */
 
 #include <crm_internal.h>
@@ -1289,6 +1280,7 @@ main(int argc, char **argv)
     int option_index = 0;
     crm_cluster_t cluster;
     const char *actions[] = { "reboot", "off", "on", "list", "monitor", "status" };
+    crm_ipc_t *old_instance = NULL;
 
     crm_log_preinit("stonith-ng", argc, argv);
     crm_set_options(NULL, "mode [options]", long_options,
@@ -1459,6 +1451,20 @@ main(int argc, char **argv)
     }
 
     crm_log_init("stonith-ng", LOG_INFO, TRUE, FALSE, argc, argv, FALSE);
+
+    old_instance = crm_ipc_new("stonith-ng", 0);
+    if (crm_ipc_connect(old_instance)) {
+        /* IPC end-point already up */
+        crm_ipc_close(old_instance);
+        crm_ipc_destroy(old_instance);
+        crm_err("stonithd is already active, aborting startup");
+        crm_exit(EX_OK);
+    } else {
+        /* not up or not authentic, we'll proceed either way */
+        crm_ipc_destroy(old_instance);
+        old_instance = NULL;
+    }
+
     mainloop_add_signal(SIGTERM, stonith_shutdown);
 
     crm_peer_init();
diff --git a/lib/cluster/cpg.c b/lib/cluster/cpg.c
index a61d492387..c5ecc67164 100644
--- a/lib/cluster/cpg.c
+++ b/lib/cluster/cpg.c
@@ -385,6 +385,20 @@ pcmk_message_common_cs(cpg_handle_t handle, uint32_t nodeid, uint32_t pid, void
     return NULL;
 }
 
+static int cmp_member_list_nodeid(const void *first,
+                                  const void *second)
+{
+    const struct cpg_address *const a = *((const struct cpg_address **) first),
+                             *const b = *((const struct cpg_address **) second);
+    if (a->nodeid < b->nodeid) {
+        return -1;
+    } else if (a->nodeid > b->nodeid) {
+        return 1;
+    }
+    /* don't bother with "reason" nor "pid" */
+    return 0;
+}
+
 void
 pcmk_cpg_membership(cpg_handle_t handle,
                     const struct cpg_name *groupName,
@@ -396,29 +410,91 @@ pcmk_cpg_membership(cpg_handle_t handle,
     gboolean found = FALSE;
     static int counter = 0;
     uint32_t local_nodeid = get_local_nodeid(handle);
+    const struct cpg_address *key, **rival, **sorted;
+
+    sorted = malloc(member_list_entries * sizeof(const struct cpg_address *));
+    CRM_ASSERT(sorted != NULL);
+
+    for (size_t iter = 0; iter < member_list_entries; iter++) {
+        sorted[iter] = member_list + iter;
+    }
+    /* so that cross-matching the multiply-subscribed nodes is then cheap */
+    qsort(sorted, member_list_entries, sizeof(const struct cpg_address *),
+          cmp_member_list_nodeid);
 
     for (i = 0; i < left_list_entries; i++) {
         crm_node_t *peer = crm_find_peer(left_list[i].nodeid, NULL);
 
-        crm_info("Node %u left group %s (peer=%s, counter=%d.%d)",
+        crm_info("Node %u left group %s (peer=%s:%llu, counter=%d.%d)",
                  left_list[i].nodeid, groupName->value,
-                 (peer? peer->uname : "<none>"), counter, i);
+                 (peer? peer->uname : "<none>"),
+                 (unsigned long long) left_list[i].pid, counter, i);
+
+        /* in CPG world, NODE:PROCESS-IN-MEMBERSHIP-OF-G is a 1:N relation
+           and not playing by this rule may go wild in case of multiple
+           residual instances of the same pacemaker daemon at the same node
+           -- we must ensure that the possible local rival(s) won't make us
+           cry out and bail (e.g. when they quit themselves), since all the
+           surrounding logic denies this simple fact that the full membership
+           is discriminated also per the PID of the process beside mere node
+           ID (and implicitly, group ID); practically, this will be sound in
+           terms of not preventing progress, since all the CPG joiners are
+           also API end-point carriers, and that's what matters locally
+           (who's the winner);
+           remotely, we will just compare left_list and member_list and if
+           the left process has its node retained in member_list (under some
+           other PID, anyway) we will just ignore it as well
+           XXX: long-term fix is to establish in-out PID-aware tracking? */
         if (peer) {
-            crm_update_peer_proc(__FUNCTION__, peer, crm_proc_cpg, OFFLINESTATUS);
+            key = &left_list[i];
+            rival = bsearch(&key, sorted, member_list_entries,
+                            sizeof(const struct cpg_address *),
+                            cmp_member_list_nodeid);
+            if (rival == NULL) {
+                crm_update_peer_proc(__FUNCTION__, peer, crm_proc_cpg,
+                                     OFFLINESTATUS);
+            } else if (left_list[i].nodeid == local_nodeid) {
+                crm_info("Ignoring the above event %s.%d, comes from a local"
+                         " rival process (presumably not us): %llu",
+                         groupName->value, counter,
+                         (unsigned long long) left_list[i].pid);
+            } else {
+                crm_info("Ignoring the above event %s.%d, comes from"
+                         " a rival-rich node: %llu (e.g. %llu process"
+                         " carries on)",
+                         groupName->value, counter,
+                         (unsigned long long) left_list[i].pid,
+                         (unsigned long long) (*rival)->pid);
+            }
         }
     }
+    free(sorted);
+    sorted = NULL;
 
     for (i = 0; i < joined_list_entries; i++) {
-        crm_info("Node %u joined group %s (counter=%d.%d)",
-                 joined_list[i].nodeid, groupName->value, counter, i);
+        crm_info("Node %u joined group %s (counter=%d.%d, pid=%llu,"
+                 " unchecked for rivals)",
+                 joined_list[i].nodeid, groupName->value, counter, i,
+                 (unsigned long long) joined_list[i].pid);
     }
 
     for (i = 0; i < member_list_entries; i++) {
         crm_node_t *peer = crm_get_peer(member_list[i].nodeid, NULL);
 
-        crm_info("Node %u still member of group %s (peer=%s, counter=%d.%d)",
+        crm_info("Node %u still member of group %s (peer=%s:%llu,"
+                 " counter=%d.%d, at least once)",
                  member_list[i].nodeid, groupName->value,
-                 (peer? peer->uname : "<none>"), counter, i);
+                 (peer? peer->uname : "<none>"), (unsigned long long) member_list[i].pid,
+                 counter, i);
+
+        if (member_list[i].nodeid == local_nodeid
+                && member_list[i].pid != getpid()) {
+            /* see the note above */
+            crm_info("Ignoring the above event %s.%d, comes from a local rival"
+                     " process: %llu", groupName->value, counter,
+                     (unsigned long long) member_list[i].pid);
+            continue;
+        }
 
         /* Anyone that is sending us CPG messages must also be a _CPG_ member.
          * But it's _not_ safe to assume it's in the quorum membership.
@@ -438,7 +514,9 @@ pcmk_cpg_membership(cpg_handle_t handle,
                  *
                  * Set the threshold to 1 minute
                  */
-                crm_err("Node %s[%u] appears to be online even though we think it is dead", peer->uname, peer->id);
+                crm_err("Node %s[%u] appears to be online even though we think"
+                        " it is dead (unchecked for rivals)",
+                        peer->uname, peer->id);
                 if (crm_update_peer_state(__FUNCTION__, peer, CRM_NODE_MEMBER, 0)) {
                     peer->votes = 0;
                 }