678 lines
26 KiB
Python
678 lines
26 KiB
Python
![]() |
"""
|
|||
|
Algorithm for testing d-separation in DAGs.
|
|||
|
|
|||
|
*d-separation* is a test for conditional independence in probability
|
|||
|
distributions that can be factorized using DAGs. It is a purely
|
|||
|
graphical test that uses the underlying graph and makes no reference
|
|||
|
to the actual distribution parameters. See [1]_ for a formal
|
|||
|
definition.
|
|||
|
|
|||
|
The implementation is based on the conceptually simple linear time
|
|||
|
algorithm presented in [2]_. Refer to [3]_, [4]_ for a couple of
|
|||
|
alternative algorithms.
|
|||
|
|
|||
|
The functional interface in NetworkX consists of three functions:
|
|||
|
|
|||
|
- `find_minimal_d_separator` returns a minimal d-separator set ``z``.
|
|||
|
That is, removing any node or nodes from it makes it no longer a d-separator.
|
|||
|
- `is_d_separator` checks if a given set is a d-separator.
|
|||
|
- `is_minimal_d_separator` checks if a given set is a minimal d-separator.
|
|||
|
|
|||
|
D-separators
|
|||
|
------------
|
|||
|
|
|||
|
Here, we provide a brief overview of d-separation and related concepts that
|
|||
|
are relevant for understanding it:
|
|||
|
|
|||
|
The ideas of d-separation and d-connection relate to paths being open or blocked.
|
|||
|
|
|||
|
- A "path" is a sequence of nodes connected in order by edges. Unlike for most
|
|||
|
graph theory analysis, the direction of the edges is ignored. Thus the path
|
|||
|
can be thought of as a traditional path on the undirected version of the graph.
|
|||
|
- A "candidate d-separator" ``z`` is a set of nodes being considered as
|
|||
|
possibly blocking all paths between two prescribed sets ``x`` and ``y`` of nodes.
|
|||
|
We refer to each node in the candidate d-separator as "known".
|
|||
|
- A "collider" node on a path is a node that is a successor of its two neighbor
|
|||
|
nodes on the path. That is, ``c`` is a collider if the edge directions
|
|||
|
along the path look like ``... u -> c <- v ...``.
|
|||
|
- If a collider node or any of its descendants are "known", the collider
|
|||
|
is called an "open collider". Otherwise it is a "blocking collider".
|
|||
|
- Any path can be "blocked" in two ways. If the path contains a "known" node
|
|||
|
that is not a collider, the path is blocked. Also, if the path contains a
|
|||
|
collider that is not a "known" node, the path is blocked.
|
|||
|
- A path is "open" if it is not blocked. That is, it is open if every node is
|
|||
|
either an open collider or not a "known". Said another way, every
|
|||
|
"known" in the path is a collider and every collider is open (has a
|
|||
|
"known" as a inclusive descendant). The concept of "open path" is meant to
|
|||
|
demonstrate a probabilistic conditional dependence between two nodes given
|
|||
|
prescribed knowledge ("known" nodes).
|
|||
|
- Two sets ``x`` and ``y`` of nodes are "d-separated" by a set of nodes ``z``
|
|||
|
if all paths between nodes in ``x`` and nodes in ``y`` are blocked. That is,
|
|||
|
if there are no open paths from any node in ``x`` to any node in ``y``.
|
|||
|
Such a set ``z`` is a "d-separator" of ``x`` and ``y``.
|
|||
|
- A "minimal d-separator" is a d-separator ``z`` for which no node or subset
|
|||
|
of nodes can be removed with it still being a d-separator.
|
|||
|
|
|||
|
The d-separator blocks some paths between ``x`` and ``y`` but opens others.
|
|||
|
Nodes in the d-separator block paths if the nodes are not colliders.
|
|||
|
But if a collider or its descendant nodes are in the d-separation set, the
|
|||
|
colliders are open, allowing a path through that collider.
|
|||
|
|
|||
|
Illustration of D-separation with examples
|
|||
|
------------------------------------------
|
|||
|
|
|||
|
A pair of two nodes, ``u`` and ``v``, are d-connected if there is a path
|
|||
|
from ``u`` to ``v`` that is not blocked. That means, there is an open
|
|||
|
path from ``u`` to ``v``.
|
|||
|
|
|||
|
For example, if the d-separating set is the empty set, then the following paths are
|
|||
|
open between ``u`` and ``v``:
|
|||
|
|
|||
|
- u <- n -> v
|
|||
|
- u -> w -> ... -> n -> v
|
|||
|
|
|||
|
If on the other hand, ``n`` is in the d-separating set, then ``n`` blocks
|
|||
|
those paths between ``u`` and ``v``.
|
|||
|
|
|||
|
Colliders block a path if they and their descendants are not included
|
|||
|
in the d-separating set. An example of a path that is blocked when the
|
|||
|
d-separating set is empty is:
|
|||
|
|
|||
|
- u -> w -> ... -> n <- v
|
|||
|
|
|||
|
The node ``n`` is a collider in this path and is not in the d-separating set.
|
|||
|
So ``n`` blocks this path. However, if ``n`` or a descendant of ``n`` is
|
|||
|
included in the d-separating set, then the path through the collider
|
|||
|
at ``n`` (... -> n <- ...) is "open".
|
|||
|
|
|||
|
D-separation is concerned with blocking all paths between nodes from ``x`` to ``y``.
|
|||
|
A d-separating set between ``x`` and ``y`` is one where all paths are blocked.
|
|||
|
|
|||
|
D-separation and its applications in probability
|
|||
|
------------------------------------------------
|
|||
|
|
|||
|
D-separation is commonly used in probabilistic causal-graph models. D-separation
|
|||
|
connects the idea of probabilistic "dependence" with separation in a graph. If
|
|||
|
one assumes the causal Markov condition [5]_, (every node is conditionally
|
|||
|
independent of its non-descendants, given its parents) then d-separation implies
|
|||
|
conditional independence in probability distributions.
|
|||
|
Symmetrically, d-connection implies dependence.
|
|||
|
|
|||
|
The intuition is as follows. The edges on a causal graph indicate which nodes
|
|||
|
influence the outcome of other nodes directly. An edge from u to v
|
|||
|
implies that the outcome of event ``u`` influences the probabilities for
|
|||
|
the outcome of event ``v``. Certainly knowing ``u`` changes predictions for ``v``.
|
|||
|
But also knowing ``v`` changes predictions for ``u``. The outcomes are dependent.
|
|||
|
Furthermore, an edge from ``v`` to ``w`` would mean that ``w`` and ``v`` are dependent
|
|||
|
and thus that ``u`` could indirectly influence ``w``.
|
|||
|
|
|||
|
Without any knowledge about the system (candidate d-separating set is empty)
|
|||
|
a causal graph ``u -> v -> w`` allows all three nodes to be dependent. But
|
|||
|
if we know the outcome of ``v``, the conditional probabilities of outcomes for
|
|||
|
``u`` and ``w`` are independent of each other. That is, once we know the outcome
|
|||
|
for ``v``, the probabilities for ``w`` do not depend on the outcome for ``u``.
|
|||
|
This is the idea behind ``v`` blocking the path if it is "known" (in the candidate
|
|||
|
d-separating set).
|
|||
|
|
|||
|
The same argument works whether the direction of the edges are both
|
|||
|
left-going and when both arrows head out from the middle. Having a "known"
|
|||
|
node on a path blocks the collider-free path because those relationships
|
|||
|
make the conditional probabilities independent.
|
|||
|
|
|||
|
The direction of the causal edges does impact dependence precisely in the
|
|||
|
case of a collider e.g. ``u -> v <- w``. In that situation, both ``u`` and ``w``
|
|||
|
influence ``v``. But they do not directly influence each other. So without any
|
|||
|
knowledge of any outcomes, ``u`` and ``w`` are independent. That is the idea behind
|
|||
|
colliders blocking the path. But, if ``v`` is known, the conditional probabilities
|
|||
|
of ``u`` and ``w`` can be dependent. This is the heart of Berkson's Paradox [6]_.
|
|||
|
For example, suppose ``u`` and ``w`` are boolean events (they either happen or do not)
|
|||
|
and ``v`` represents the outcome "at least one of ``u`` and ``w`` occur". Then knowing
|
|||
|
``v`` is true makes the conditional probabilities of ``u`` and ``w`` dependent.
|
|||
|
Essentially, knowing that at least one of them is true raises the probability of
|
|||
|
each. But further knowledge that ``w`` is true (or false) change the conditional
|
|||
|
probability of ``u`` to either the original value or 1. So the conditional
|
|||
|
probability of ``u`` depends on the outcome of ``w`` even though there is no
|
|||
|
causal relationship between them. When a collider is known, dependence can
|
|||
|
occur across paths through that collider. This is the reason open colliders
|
|||
|
do not block paths.
|
|||
|
|
|||
|
Furthermore, even if ``v`` is not "known", if one of its descendants is "known"
|
|||
|
we can use that information to know more about ``v`` which again makes
|
|||
|
``u`` and ``w`` potentially dependent. Suppose the chance of ``n`` occurring
|
|||
|
is much higher when ``v`` occurs ("at least one of ``u`` and ``w`` occur").
|
|||
|
Then if we know ``n`` occurred, it is more likely that ``v`` occurred and that
|
|||
|
makes the chance of ``u`` and ``w`` dependent. This is the idea behind why
|
|||
|
a collider does no block a path if any descendant of the collider is "known".
|
|||
|
|
|||
|
When two sets of nodes ``x`` and ``y`` are d-separated by a set ``z``,
|
|||
|
it means that given the outcomes of the nodes in ``z``, the probabilities
|
|||
|
of outcomes of the nodes in ``x`` are independent of the outcomes of the
|
|||
|
nodes in ``y`` and vice versa.
|
|||
|
|
|||
|
Examples
|
|||
|
--------
|
|||
|
A Hidden Markov Model with 5 observed states and 5 hidden states
|
|||
|
where the hidden states have causal relationships resulting in
|
|||
|
a path results in the following causal network. We check that
|
|||
|
early states along the path are separated from late state in
|
|||
|
the path by the d-separator of the middle hidden state.
|
|||
|
Thus if we condition on the middle hidden state, the early
|
|||
|
state probabilities are independent of the late state outcomes.
|
|||
|
|
|||
|
>>> G = nx.DiGraph()
|
|||
|
>>> G.add_edges_from(
|
|||
|
... [
|
|||
|
... ("H1", "H2"),
|
|||
|
... ("H2", "H3"),
|
|||
|
... ("H3", "H4"),
|
|||
|
... ("H4", "H5"),
|
|||
|
... ("H1", "O1"),
|
|||
|
... ("H2", "O2"),
|
|||
|
... ("H3", "O3"),
|
|||
|
... ("H4", "O4"),
|
|||
|
... ("H5", "O5"),
|
|||
|
... ]
|
|||
|
... )
|
|||
|
>>> x, y, z = ({"H1", "O1"}, {"H5", "O5"}, {"H3"})
|
|||
|
>>> nx.is_d_separator(G, x, y, z)
|
|||
|
True
|
|||
|
>>> nx.is_minimal_d_separator(G, x, y, z)
|
|||
|
True
|
|||
|
>>> nx.is_minimal_d_separator(G, x, y, z | {"O3"})
|
|||
|
False
|
|||
|
>>> z = nx.find_minimal_d_separator(G, x | y, {"O2", "O3", "O4"})
|
|||
|
>>> z == {"H2", "H4"}
|
|||
|
True
|
|||
|
|
|||
|
If no minimal_d_separator exists, `None` is returned
|
|||
|
|
|||
|
>>> other_z = nx.find_minimal_d_separator(G, x | y, {"H2", "H3"})
|
|||
|
>>> other_z is None
|
|||
|
True
|
|||
|
|
|||
|
|
|||
|
References
|
|||
|
----------
|
|||
|
|
|||
|
.. [1] Pearl, J. (2009). Causality. Cambridge: Cambridge University Press.
|
|||
|
|
|||
|
.. [2] Darwiche, A. (2009). Modeling and reasoning with Bayesian networks.
|
|||
|
Cambridge: Cambridge University Press.
|
|||
|
|
|||
|
.. [3] Shachter, Ross D. "Bayes-ball: The rational pastime (for
|
|||
|
determining irrelevance and requisite information in belief networks
|
|||
|
and influence diagrams)." In Proceedings of the Fourteenth Conference
|
|||
|
on Uncertainty in Artificial Intelligence (UAI), (pp. 480–487). 1998.
|
|||
|
|
|||
|
.. [4] Koller, D., & Friedman, N. (2009).
|
|||
|
Probabilistic graphical models: principles and techniques. The MIT Press.
|
|||
|
|
|||
|
.. [5] https://en.wikipedia.org/wiki/Causal_Markov_condition
|
|||
|
|
|||
|
.. [6] https://en.wikipedia.org/wiki/Berkson%27s_paradox
|
|||
|
|
|||
|
"""
|
|||
|
|
|||
|
from collections import deque
|
|||
|
from itertools import chain
|
|||
|
|
|||
|
import networkx as nx
|
|||
|
from networkx.utils import UnionFind, not_implemented_for
|
|||
|
|
|||
|
__all__ = [
|
|||
|
"is_d_separator",
|
|||
|
"is_minimal_d_separator",
|
|||
|
"find_minimal_d_separator",
|
|||
|
]
|
|||
|
|
|||
|
|
|||
|
@not_implemented_for("undirected")
|
|||
|
@nx._dispatchable
|
|||
|
def is_d_separator(G, x, y, z):
|
|||
|
"""Return whether node sets `x` and `y` are d-separated by `z`.
|
|||
|
|
|||
|
Parameters
|
|||
|
----------
|
|||
|
G : nx.DiGraph
|
|||
|
A NetworkX DAG.
|
|||
|
|
|||
|
x : node or set of nodes
|
|||
|
First node or set of nodes in `G`.
|
|||
|
|
|||
|
y : node or set of nodes
|
|||
|
Second node or set of nodes in `G`.
|
|||
|
|
|||
|
z : node or set of nodes
|
|||
|
Potential separator (set of conditioning nodes in `G`). Can be empty set.
|
|||
|
|
|||
|
Returns
|
|||
|
-------
|
|||
|
b : bool
|
|||
|
A boolean that is true if `x` is d-separated from `y` given `z` in `G`.
|
|||
|
|
|||
|
Raises
|
|||
|
------
|
|||
|
NetworkXError
|
|||
|
The *d-separation* test is commonly used on disjoint sets of
|
|||
|
nodes in acyclic directed graphs. Accordingly, the algorithm
|
|||
|
raises a :exc:`NetworkXError` if the node sets are not
|
|||
|
disjoint or if the input graph is not a DAG.
|
|||
|
|
|||
|
NodeNotFound
|
|||
|
If any of the input nodes are not found in the graph,
|
|||
|
a :exc:`NodeNotFound` exception is raised
|
|||
|
|
|||
|
Notes
|
|||
|
-----
|
|||
|
A d-separating set in a DAG is a set of nodes that
|
|||
|
blocks all paths between the two sets. Nodes in `z`
|
|||
|
block a path if they are part of the path and are not a collider,
|
|||
|
or a descendant of a collider. Also colliders that are not in `z`
|
|||
|
block a path. A collider structure along a path
|
|||
|
is ``... -> c <- ...`` where ``c`` is the collider node.
|
|||
|
|
|||
|
https://en.wikipedia.org/wiki/Bayesian_network#d-separation
|
|||
|
"""
|
|||
|
try:
|
|||
|
x = {x} if x in G else x
|
|||
|
y = {y} if y in G else y
|
|||
|
z = {z} if z in G else z
|
|||
|
|
|||
|
intersection = x & y or x & z or y & z
|
|||
|
if intersection:
|
|||
|
raise nx.NetworkXError(
|
|||
|
f"The sets are not disjoint, with intersection {intersection}"
|
|||
|
)
|
|||
|
|
|||
|
set_v = x | y | z
|
|||
|
if set_v - G.nodes:
|
|||
|
raise nx.NodeNotFound(f"The node(s) {set_v - G.nodes} are not found in G")
|
|||
|
except TypeError:
|
|||
|
raise nx.NodeNotFound("One of x, y, or z is not a node or a set of nodes in G")
|
|||
|
|
|||
|
if not nx.is_directed_acyclic_graph(G):
|
|||
|
raise nx.NetworkXError("graph should be directed acyclic")
|
|||
|
|
|||
|
# contains -> and <-> edges from starting node T
|
|||
|
forward_deque = deque([])
|
|||
|
forward_visited = set()
|
|||
|
|
|||
|
# contains <- and - edges from starting node T
|
|||
|
backward_deque = deque(x)
|
|||
|
backward_visited = set()
|
|||
|
|
|||
|
ancestors_or_z = set().union(*[nx.ancestors(G, node) for node in x]) | z | x
|
|||
|
|
|||
|
while forward_deque or backward_deque:
|
|||
|
if backward_deque:
|
|||
|
node = backward_deque.popleft()
|
|||
|
backward_visited.add(node)
|
|||
|
if node in y:
|
|||
|
return False
|
|||
|
if node in z:
|
|||
|
continue
|
|||
|
|
|||
|
# add <- edges to backward deque
|
|||
|
backward_deque.extend(G.pred[node].keys() - backward_visited)
|
|||
|
# add -> edges to forward deque
|
|||
|
forward_deque.extend(G.succ[node].keys() - forward_visited)
|
|||
|
|
|||
|
if forward_deque:
|
|||
|
node = forward_deque.popleft()
|
|||
|
forward_visited.add(node)
|
|||
|
if node in y:
|
|||
|
return False
|
|||
|
|
|||
|
# Consider if -> node <- is opened due to ancestor of node in z
|
|||
|
if node in ancestors_or_z:
|
|||
|
# add <- edges to backward deque
|
|||
|
backward_deque.extend(G.pred[node].keys() - backward_visited)
|
|||
|
if node not in z:
|
|||
|
# add -> edges to forward deque
|
|||
|
forward_deque.extend(G.succ[node].keys() - forward_visited)
|
|||
|
|
|||
|
return True
|
|||
|
|
|||
|
|
|||
|
@not_implemented_for("undirected")
|
|||
|
@nx._dispatchable
|
|||
|
def find_minimal_d_separator(G, x, y, *, included=None, restricted=None):
|
|||
|
"""Returns a minimal d-separating set between `x` and `y` if possible
|
|||
|
|
|||
|
A d-separating set in a DAG is a set of nodes that blocks all
|
|||
|
paths between the two sets of nodes, `x` and `y`. This function
|
|||
|
constructs a d-separating set that is "minimal", meaning no nodes can
|
|||
|
be removed without it losing the d-separating property for `x` and `y`.
|
|||
|
If no d-separating sets exist for `x` and `y`, this returns `None`.
|
|||
|
|
|||
|
In a DAG there may be more than one minimal d-separator between two
|
|||
|
sets of nodes. Minimal d-separators are not always unique. This function
|
|||
|
returns one minimal d-separator, or `None` if no d-separator exists.
|
|||
|
|
|||
|
Uses the algorithm presented in [1]_. The complexity of the algorithm
|
|||
|
is :math:`O(m)`, where :math:`m` stands for the number of edges in
|
|||
|
the subgraph of G consisting of only the ancestors of `x` and `y`.
|
|||
|
For full details, see [1]_.
|
|||
|
|
|||
|
Parameters
|
|||
|
----------
|
|||
|
G : graph
|
|||
|
A networkx DAG.
|
|||
|
x : set | node
|
|||
|
A node or set of nodes in the graph.
|
|||
|
y : set | node
|
|||
|
A node or set of nodes in the graph.
|
|||
|
included : set | node | None
|
|||
|
A node or set of nodes which must be included in the found separating set,
|
|||
|
default is None, which means the empty set.
|
|||
|
restricted : set | node | None
|
|||
|
Restricted node or set of nodes to consider. Only these nodes can be in
|
|||
|
the found separating set, default is None meaning all nodes in ``G``.
|
|||
|
|
|||
|
Returns
|
|||
|
-------
|
|||
|
z : set | None
|
|||
|
The minimal d-separating set, if at least one d-separating set exists,
|
|||
|
otherwise None.
|
|||
|
|
|||
|
Raises
|
|||
|
------
|
|||
|
NetworkXError
|
|||
|
Raises a :exc:`NetworkXError` if the input graph is not a DAG
|
|||
|
or if node sets `x`, `y`, and `included` are not disjoint.
|
|||
|
|
|||
|
NodeNotFound
|
|||
|
If any of the input nodes are not found in the graph,
|
|||
|
a :exc:`NodeNotFound` exception is raised.
|
|||
|
|
|||
|
References
|
|||
|
----------
|
|||
|
.. [1] van der Zander, Benito, and Maciej Liśkiewicz. "Finding
|
|||
|
minimal d-separators in linear time and applications." In
|
|||
|
Uncertainty in Artificial Intelligence, pp. 637-647. PMLR, 2020.
|
|||
|
"""
|
|||
|
if not nx.is_directed_acyclic_graph(G):
|
|||
|
raise nx.NetworkXError("graph should be directed acyclic")
|
|||
|
|
|||
|
try:
|
|||
|
x = {x} if x in G else x
|
|||
|
y = {y} if y in G else y
|
|||
|
|
|||
|
if included is None:
|
|||
|
included = set()
|
|||
|
elif included in G:
|
|||
|
included = {included}
|
|||
|
|
|||
|
if restricted is None:
|
|||
|
restricted = set(G)
|
|||
|
elif restricted in G:
|
|||
|
restricted = {restricted}
|
|||
|
|
|||
|
set_y = x | y | included | restricted
|
|||
|
if set_y - G.nodes:
|
|||
|
raise nx.NodeNotFound(f"The node(s) {set_y - G.nodes} are not found in G")
|
|||
|
except TypeError:
|
|||
|
raise nx.NodeNotFound(
|
|||
|
"One of x, y, included or restricted is not a node or set of nodes in G"
|
|||
|
)
|
|||
|
|
|||
|
if not included <= restricted:
|
|||
|
raise nx.NetworkXError(
|
|||
|
f"Included nodes {included} must be in restricted nodes {restricted}"
|
|||
|
)
|
|||
|
|
|||
|
intersection = x & y or x & included or y & included
|
|||
|
if intersection:
|
|||
|
raise nx.NetworkXError(
|
|||
|
f"The sets x, y, included are not disjoint. Overlap: {intersection}"
|
|||
|
)
|
|||
|
|
|||
|
nodeset = x | y | included
|
|||
|
ancestors_x_y_included = nodeset.union(*[nx.ancestors(G, node) for node in nodeset])
|
|||
|
|
|||
|
z_init = restricted & (ancestors_x_y_included - (x | y))
|
|||
|
|
|||
|
x_closure = _reachable(G, x, ancestors_x_y_included, z_init)
|
|||
|
if x_closure & y:
|
|||
|
return None
|
|||
|
|
|||
|
z_updated = z_init & (x_closure | included)
|
|||
|
y_closure = _reachable(G, y, ancestors_x_y_included, z_updated)
|
|||
|
return z_updated & (y_closure | included)
|
|||
|
|
|||
|
|
|||
|
@not_implemented_for("undirected")
|
|||
|
@nx._dispatchable
|
|||
|
def is_minimal_d_separator(G, x, y, z, *, included=None, restricted=None):
|
|||
|
"""Determine if `z` is a minimal d-separator for `x` and `y`.
|
|||
|
|
|||
|
A d-separator, `z`, in a DAG is a set of nodes that blocks
|
|||
|
all paths from nodes in set `x` to nodes in set `y`.
|
|||
|
A minimal d-separator is a d-separator `z` such that removing
|
|||
|
any subset of nodes makes it no longer a d-separator.
|
|||
|
|
|||
|
Note: This function checks whether `z` is a d-separator AND is
|
|||
|
minimal. One can use the function `is_d_separator` to only check if
|
|||
|
`z` is a d-separator. See examples below.
|
|||
|
|
|||
|
Parameters
|
|||
|
----------
|
|||
|
G : nx.DiGraph
|
|||
|
A NetworkX DAG.
|
|||
|
x : node | set
|
|||
|
A node or set of nodes in the graph.
|
|||
|
y : node | set
|
|||
|
A node or set of nodes in the graph.
|
|||
|
z : node | set
|
|||
|
The node or set of nodes to check if it is a minimal d-separating set.
|
|||
|
The function :func:`is_d_separator` is called inside this function
|
|||
|
to verify that `z` is in fact a d-separator.
|
|||
|
included : set | node | None
|
|||
|
A node or set of nodes which must be included in the found separating set,
|
|||
|
default is ``None``, which means the empty set.
|
|||
|
restricted : set | node | None
|
|||
|
Restricted node or set of nodes to consider. Only these nodes can be in
|
|||
|
the found separating set, default is ``None`` meaning all nodes in ``G``.
|
|||
|
|
|||
|
Returns
|
|||
|
-------
|
|||
|
bool
|
|||
|
Whether or not the set `z` is a minimal d-separator subject to
|
|||
|
`restricted` nodes and `included` node constraints.
|
|||
|
|
|||
|
Examples
|
|||
|
--------
|
|||
|
>>> G = nx.path_graph([0, 1, 2, 3], create_using=nx.DiGraph)
|
|||
|
>>> G.add_node(4)
|
|||
|
>>> nx.is_minimal_d_separator(G, 0, 2, {1})
|
|||
|
True
|
|||
|
>>> # since {1} is the minimal d-separator, {1, 3, 4} is not minimal
|
|||
|
>>> nx.is_minimal_d_separator(G, 0, 2, {1, 3, 4})
|
|||
|
False
|
|||
|
>>> # alternatively, if we only want to check that {1, 3, 4} is a d-separator
|
|||
|
>>> nx.is_d_separator(G, 0, 2, {1, 3, 4})
|
|||
|
True
|
|||
|
|
|||
|
Raises
|
|||
|
------
|
|||
|
NetworkXError
|
|||
|
Raises a :exc:`NetworkXError` if the input graph is not a DAG.
|
|||
|
|
|||
|
NodeNotFound
|
|||
|
If any of the input nodes are not found in the graph,
|
|||
|
a :exc:`NodeNotFound` exception is raised.
|
|||
|
|
|||
|
References
|
|||
|
----------
|
|||
|
.. [1] van der Zander, Benito, and Maciej Liśkiewicz. "Finding
|
|||
|
minimal d-separators in linear time and applications." In
|
|||
|
Uncertainty in Artificial Intelligence, pp. 637-647. PMLR, 2020.
|
|||
|
|
|||
|
Notes
|
|||
|
-----
|
|||
|
This function works on verifying that a set is minimal and
|
|||
|
d-separating between two nodes. Uses criterion (a), (b), (c) on
|
|||
|
page 4 of [1]_. a) closure(`x`) and `y` are disjoint. b) `z` contains
|
|||
|
all nodes from `included` and is contained in the `restricted`
|
|||
|
nodes and in the union of ancestors of `x`, `y`, and `included`.
|
|||
|
c) the nodes in `z` not in `included` are contained in both
|
|||
|
closure(x) and closure(y). The closure of a set is the set of nodes
|
|||
|
connected to the set by a directed path in G.
|
|||
|
|
|||
|
The complexity is :math:`O(m)`, where :math:`m` stands for the
|
|||
|
number of edges in the subgraph of G consisting of only the
|
|||
|
ancestors of `x` and `y`.
|
|||
|
|
|||
|
For full details, see [1]_.
|
|||
|
"""
|
|||
|
if not nx.is_directed_acyclic_graph(G):
|
|||
|
raise nx.NetworkXError("graph should be directed acyclic")
|
|||
|
|
|||
|
try:
|
|||
|
x = {x} if x in G else x
|
|||
|
y = {y} if y in G else y
|
|||
|
z = {z} if z in G else z
|
|||
|
|
|||
|
if included is None:
|
|||
|
included = set()
|
|||
|
elif included in G:
|
|||
|
included = {included}
|
|||
|
|
|||
|
if restricted is None:
|
|||
|
restricted = set(G)
|
|||
|
elif restricted in G:
|
|||
|
restricted = {restricted}
|
|||
|
|
|||
|
set_y = x | y | included | restricted
|
|||
|
if set_y - G.nodes:
|
|||
|
raise nx.NodeNotFound(f"The node(s) {set_y - G.nodes} are not found in G")
|
|||
|
except TypeError:
|
|||
|
raise nx.NodeNotFound(
|
|||
|
"One of x, y, z, included or restricted is not a node or set of nodes in G"
|
|||
|
)
|
|||
|
|
|||
|
if not included <= z:
|
|||
|
raise nx.NetworkXError(
|
|||
|
f"Included nodes {included} must be in proposed separating set z {x}"
|
|||
|
)
|
|||
|
if not z <= restricted:
|
|||
|
raise nx.NetworkXError(
|
|||
|
f"Separating set {z} must be contained in restricted set {restricted}"
|
|||
|
)
|
|||
|
|
|||
|
intersection = x.intersection(y) or x.intersection(z) or y.intersection(z)
|
|||
|
if intersection:
|
|||
|
raise nx.NetworkXError(
|
|||
|
f"The sets are not disjoint, with intersection {intersection}"
|
|||
|
)
|
|||
|
|
|||
|
nodeset = x | y | included
|
|||
|
ancestors_x_y_included = nodeset.union(*[nx.ancestors(G, n) for n in nodeset])
|
|||
|
|
|||
|
# criterion (a) -- check that z is actually a separator
|
|||
|
x_closure = _reachable(G, x, ancestors_x_y_included, z)
|
|||
|
if x_closure & y:
|
|||
|
return False
|
|||
|
|
|||
|
# criterion (b) -- basic constraint; included and restricted already checked above
|
|||
|
if not (z <= ancestors_x_y_included):
|
|||
|
return False
|
|||
|
|
|||
|
# criterion (c) -- check that z is minimal
|
|||
|
y_closure = _reachable(G, y, ancestors_x_y_included, z)
|
|||
|
if not ((z - included) <= (x_closure & y_closure)):
|
|||
|
return False
|
|||
|
return True
|
|||
|
|
|||
|
|
|||
|
@not_implemented_for("undirected")
|
|||
|
def _reachable(G, x, a, z):
|
|||
|
"""Modified Bayes-Ball algorithm for finding d-connected nodes.
|
|||
|
|
|||
|
Find all nodes in `a` that are d-connected to those in `x` by
|
|||
|
those in `z`. This is an implementation of the function
|
|||
|
`REACHABLE` in [1]_ (which is itself a modification of the
|
|||
|
Bayes-Ball algorithm [2]_) when restricted to DAGs.
|
|||
|
|
|||
|
Parameters
|
|||
|
----------
|
|||
|
G : nx.DiGraph
|
|||
|
A NetworkX DAG.
|
|||
|
x : node | set
|
|||
|
A node in the DAG, or a set of nodes.
|
|||
|
a : node | set
|
|||
|
A (set of) node(s) in the DAG containing the ancestors of `x`.
|
|||
|
z : node | set
|
|||
|
The node or set of nodes conditioned on when checking d-connectedness.
|
|||
|
|
|||
|
Returns
|
|||
|
-------
|
|||
|
w : set
|
|||
|
The closure of `x` in `a` with respect to d-connectedness
|
|||
|
given `z`.
|
|||
|
|
|||
|
References
|
|||
|
----------
|
|||
|
.. [1] van der Zander, Benito, and Maciej Liśkiewicz. "Finding
|
|||
|
minimal d-separators in linear time and applications." In
|
|||
|
Uncertainty in Artificial Intelligence, pp. 637-647. PMLR, 2020.
|
|||
|
|
|||
|
.. [2] Shachter, Ross D. "Bayes-ball: The rational pastime
|
|||
|
(for determining irrelevance and requisite information in
|
|||
|
belief networks and influence diagrams)." In Proceedings of the
|
|||
|
Fourteenth Conference on Uncertainty in Artificial Intelligence
|
|||
|
(UAI), (pp. 480–487). 1998.
|
|||
|
"""
|
|||
|
|
|||
|
def _pass(e, v, f, n):
|
|||
|
"""Whether a ball entering node `v` along edge `e` passes to `n` along `f`.
|
|||
|
|
|||
|
Boolean function defined on page 6 of [1]_.
|
|||
|
|
|||
|
Parameters
|
|||
|
----------
|
|||
|
e : bool
|
|||
|
Directed edge by which the ball got to node `v`; `True` iff directed into `v`.
|
|||
|
v : node
|
|||
|
Node where the ball is.
|
|||
|
f : bool
|
|||
|
Directed edge connecting nodes `v` and `n`; `True` iff directed `n`.
|
|||
|
n : node
|
|||
|
Checking whether the ball passes to this node.
|
|||
|
|
|||
|
Returns
|
|||
|
-------
|
|||
|
b : bool
|
|||
|
Whether the ball passes or not.
|
|||
|
|
|||
|
References
|
|||
|
----------
|
|||
|
.. [1] van der Zander, Benito, and Maciej Liśkiewicz. "Finding
|
|||
|
minimal d-separators in linear time and applications." In
|
|||
|
Uncertainty in Artificial Intelligence, pp. 637-647. PMLR, 2020.
|
|||
|
"""
|
|||
|
is_element_of_A = n in a
|
|||
|
# almost_definite_status = True # always true for DAGs; not so for RCGs
|
|||
|
collider_if_in_Z = v not in z or (e and not f)
|
|||
|
return is_element_of_A and collider_if_in_Z # and almost_definite_status
|
|||
|
|
|||
|
queue = deque([])
|
|||
|
for node in x:
|
|||
|
if bool(G.pred[node]):
|
|||
|
queue.append((True, node))
|
|||
|
if bool(G.succ[node]):
|
|||
|
queue.append((False, node))
|
|||
|
processed = queue.copy()
|
|||
|
|
|||
|
while any(queue):
|
|||
|
e, v = queue.popleft()
|
|||
|
preds = ((False, n) for n in G.pred[v])
|
|||
|
succs = ((True, n) for n in G.succ[v])
|
|||
|
f_n_pairs = chain(preds, succs)
|
|||
|
for f, n in f_n_pairs:
|
|||
|
if (f, n) not in processed and _pass(e, v, f, n):
|
|||
|
queue.append((f, n))
|
|||
|
processed.append((f, n))
|
|||
|
|
|||
|
return {w for (_, w) in processed}
|