Bayesian Networks — pomegranate 0.14.6 documentation
Bayesian Networks¶
Bayesian networks are probabilistic models that are especially good at inference given incomplete data. Much like a hidden Markov model, they consist of a directed graphical model (though Bayesian networks must also be acyclic) and a set of probability distributions. The edges encode dependency statements between the variables, where the lack of an edge between any pair of variables indicates a conditional independence. Each node encodes a probability distribution, where root nodes encode univariate probability distributions and inner/leaf nodes encode conditional probability distributions. Bayesian networks are exceptionally flexible when doing inference, as any subset of variables can be observed and inference performed over all the others, without needing to define these groups in advance. In fact, the set of observed variables can change from one sample to the next without needing to modify the underlying algorithm at all.
Currently, pomegranate only supports discrete Bayesian networks, meaning that the values must be categories, i.e. ‘apples’ and ‘oranges’, or 1 and 2, where 1 and 2 refer to categories, not numbers, and so 2 is not explicitly ‘bigger’ than 1.
Initialization¶
Bayesian networks can be initialized in two ways, depending on whether the underlying graphical structure is known or not: (1) the graphical structure can be built one node at a time with pre-initialized distributions set for each node, or (2) both the graphical structure and distributions can be learned directly from data. This mirrors the other models that are implemented in pomegranate. However, typically expectation maximization is used to fit the parameters of the distribution, and so initialization (such as through k-means) is typically fast whereas fitting is slow. For Bayesian networks, the opposite is the case. Fitting can be done quickly by just summing counts through the data, while initialization is hard as it requires an exponential time search through all possible DAGs to identify the optimal graph. More is discussed in the tutorials above and in the fitting section below.
Let’s take a look at initializing a Bayesian network in the first manner by quickly implementing the Monty Hall problem. The Monty Hall problem arose from the gameshow Let’s Make a Deal, where a guest had to choose which one of three doors had a prize behind it. The twist was that after the guest chose, the host, originally Monty Hall, would then open one of the doors the guest did not pick and ask if the guest wanted to switch which door they had picked. Initial inspection may lead you to believe that if there are only two doors left, there is a 50-50 chance of you picking the right one, and so there is no advantage one way or the other. However, it has been proven both through simulations and analytically that there is in fact a 66% chance of getting the prize if the guest switches their door, regardless of the door they initially went with.
Our network will have three nodes, one for the guest, one for the prize, and one for the door Monty chooses to open. The door the guest initially chooses and the door the prize is behind are uniform random processes across the three doors, but the door which Monty opens is dependent on both the door the guest chooses (it cannot be the door the guest chooses), and the door the prize is behind (it cannot be the door with the prize behind it).
from pomegranate import *

guest = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
prize = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
monty = ConditionalProbabilityTable(
    [['A', 'A', 'A', 0.0],
     ['A', 'A', 'B', 0.5],
     ['A', 'A', 'C', 0.5],
     ['A', 'B', 'A', 0.0],
     ['A', 'B', 'B', 0.0],
     ['A', 'B', 'C', 1.0],
     ['A', 'C', 'A', 0.0],
     ['A', 'C', 'B', 1.0],
     ['A', 'C', 'C', 0.0],
     ['B', 'A', 'A', 0.0],
     ['B', 'A', 'B', 0.0],
     ['B', 'A', 'C', 1.0],
     ['B', 'B', 'A', 0.5],
     ['B', 'B', 'B', 0.0],
     ['B', 'B', 'C', 0.5],
     ['B', 'C', 'A', 1.0],
     ['B', 'C', 'B', 0.0],
     ['B', 'C', 'C', 0.0],
     ['C', 'A', 'A', 0.0],
     ['C', 'A', 'B', 1.0],
     ['C', 'A', 'C', 0.0],
     ['C', 'B', 'A', 1.0],
     ['C', 'B', 'B', 0.0],
     ['C', 'B', 'C', 0.0],
     ['C', 'C', 'A', 0.5],
     ['C', 'C', 'B', 0.5],
     ['C', 'C', 'C', 0.0]], [guest, prize])

s1 = Node(guest, name="guest")
s2 = Node(prize, name="prize")
s3 = Node(monty, name="monty")

model = BayesianNetwork("Monty Hall Problem")
model.add_states(s1, s2, s3)
model.add_edge(s1, s3)
model.add_edge(s2, s3)
model.bake()
Note
The objects ‘state’ and ‘node’ are really the same thing and can be used interchangeably. The only difference is the name: the hidden Markov model literature frequently uses ‘state’, whereas the Bayesian network literature frequently uses ‘node’.
The conditional distribution must be explicitly spelled out in this example, followed by a list of the parents in the same order as the corresponding columns of the table (e.g. the columns in the table above correspond to guest, prize, monty, probability).
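With the model baked, we can already pose the classic question: given the guest's pick and the door Monty opens, where is the prize likely to be? Below is a minimal sketch of this query using predict_proba (documented in the API reference below); the column order is guest, prize, monty, as above.

beliefs = model.predict_proba([['A', None, 'B']])
print(beliefs[0][1].parameters[0])
# approximately {'A': 0.333, 'B': 0.0, 'C': 0.667} -- switching to door 'C'
# wins two-thirds of the time

The observed values pass through unchanged, while the unobserved prize entry comes back as a DiscreteDistribution over the remaining doors.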
However, one can also initialize a Bayesian network based completely on data. As mentioned before, the exact version of this algorithm takes exponential time with the number of variables and typically can’t be done on more than ~25 variables. This is because there are a super-exponential number of directed acyclic graphs that one could define over a set of variables, but fortunately one can use dynamic programming in order to reduce this complexity down to “simply exponential.” The implementation of the exact algorithm actually goes further than the original dynamic programming algorithm by implementing an A* search to somewhat reduce computational time but drastically reduce required memory, sometimes by an order of magnitude.
from pomegranate import *
import numpy

X = numpy.load('data.npy')
model = BayesianNetwork.from_samples(X, algorithm='exact')
The exact algorithm is not the default, though. The default is a novel greedy algorithm that greedily chooses a topological ordering of the variables but optimally identifies the best parents for each variable given that ordering. It is significantly faster and more memory-efficient than the exact algorithm and produces far better estimates than a Chow-Liu tree. It is set as the default to avoid locking up the computers of users who unintentionally ask for a near-impossible task.
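To get a feel for how the algorithm choice plays out, one might compare learners on synthetic data. A minimal sketch under assumed data (the seed, sizes, and planted dependencies below are illustrative, not from the documentation):

from pomegranate import BayesianNetwork
import numpy

numpy.random.seed(0)
X = numpy.random.randint(2, size=(1000, 5))
X[:, 1] = X[:, 0]        # plant a dependency: column 1 copies column 0
X[:, 4] = 1 - X[:, 3]    # and column 4 is the complement of column 3

# each learned structure is a tuple of parent tuples, one per column;
# both learners should link the planted column pairs
print(BayesianNetwork.from_samples(X, algorithm='greedy').structure)
print(BayesianNetwork.from_samples(X, algorithm='chow-liu').structure)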
Probability¶
You can calculate the probability of a sample under a Bayesian network as the product of the probability of each variable given its parents, if it has any. This can be expressed as \(P = \prod\limits_{i=1}^{d} P(D_{i}|Pa_{i})\) for a sample with \(d\) dimensions. In the Monty Hall problem, for example, the probability of a show is the probability of the guest choosing the respective door, times the probability of the prize being behind a given door, times the probability of Monty opening a given door given the previous two values. For example, using the manually initialized network above:
>>> print(model.probability([['A', 'A', 'A'],
                             ['A', 'A', 'B'],
                             ['C', 'C', 'B']]))
[ 0.          0.05555556  0.05555556]

The second value, for example, is P(guest='A') * P(prize='A') * P(monty='B' | guest='A', prize='A') = (1/3)(1/3)(0.5), which is approximately 0.0556.
Prediction¶
Bayesian networks are frequently used to infer/impute the values of missing variables given the observed values. In other models there is typically a single or fixed set of missing variables, such as latent factors, that need to be imputed, and so returning a fixed vector or matrix as the predictions makes sense. In the case of Bayesian networks, however, we can make no such assumption, and so when data is passed in for prediction it should be formatted as a matrix with None in place of the variables that need to be inferred. The return value is a filled-in matrix where each None has been replaced with the imputed value. For example:
>>> print(model.predict([['A', 'B', None],
                         ['A', 'C', None],
                         ['C', 'B', None]]))
[['A' 'B' 'C']
 ['A' 'C' 'B']
 ['C' 'B' 'A']]
In this example, the final column is the one that is always missing, but a more complex example is as follows:
>>> print(model.predict([['A', 'B', None],
                         ['A', None, 'C'],
                         [None, 'B', 'A']]))
[['A' 'B' 'C']
 ['A' 'B' 'C']
 ['C' 'B' 'A']]
Fitting¶
Fitting a Bayesian network to data is a fairly simple process. Essentially, for each variable, you need to consider only that column of data and the columns corresponding to that variable's parents. If it is a univariate distribution, then the maximum likelihood estimate is just the count of each symbol divided by the number of samples in the data. If it is a conditional distribution, the estimates are the probabilities of each symbol in the variable of interest given each combination of symbols in the parents. For example, consider a binary dataset with two variables, X and Y, where X is a parent of Y. First, we would go through the dataset and calculate P(X=0) and P(X=1). Then, we would calculate P(Y=0|X=0), P(Y=1|X=0), P(Y=0|X=1), and P(Y=1|X=1). Those values encode all of the parameters of the Bayesian network, as in the sketch below.
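A minimal sketch of this X -> Y example, fitting the parameters with from_structure (documented below); the data is illustrative, and the structure tuple ((), (0,)) states that column 0 (X) has no parents while column 1 (Y) has column 0 as its parent:

from pomegranate import BayesianNetwork
import numpy

data = numpy.array([[0, 0], [0, 0], [0, 1],
                    [1, 1], [1, 1], [1, 0]])
model = BayesianNetwork.from_structure(data, structure=((), (0,)))

# P(X=1) * P(Y=1|X=1) = (3/6) * (2/3) = 1/3
print(model.probability([[1, 1]]))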
API Reference¶
class pomegranate.BayesianNetwork.BayesianNetwork¶

A Bayesian Network Model.

A Bayesian network is a directed graph where nodes represent variables, edges represent conditional dependencies of the children on their parents, and the lack of an edge represents a conditional independence.

Parameters:
- name : str, optional
  The name of the model. Default is None.

Attributes:
- states : list, shape (n_states,)
  A list of all the state objects in the model.
- graph : networkx.DiGraph
  The underlying graph object.
add_edge()¶
Add a transition from state a to state b which indicates that b is dependent on a in ways specified by the distribution.

add_node()¶
Add a node to the graph.

add_nodes()¶
Add multiple nodes to the graph.

add_state()¶
Another name for add_node.

add_states()¶
Another name for add_nodes.

add_transition()¶
Transitions and edges are the same.
bake()¶
Finalize the topology of the model.

Assign a numerical index to every state and create the underlying arrays corresponding to the states and edges between the states. This method must be called before any of the probability-calculating methods. This includes converting conditional probability tables into joint probability tables and creating a list of both marginal and table nodes.

Parameters:
- None

Returns:
- None
clear_summaries()¶
Clear the summary statistics stored in the object.

copy()¶
Return a deep copy of this distribution object.

This object will not be tied to any other distribution or connected in any form.

Parameters:
- None

Returns:
- distribution : Distribution
  A copy of the distribution with the same parameters.
dense_transition_matrix()¶
Returns the dense transition matrix. Useful if the transitions of somewhat small models need to be analyzed.

edge_count()¶
Returns the number of edges present in the model.
fit()¶
Fit the model to data using MLE estimates.

Fit the model to the data by updating each of the components of the model, which are univariate or multivariate distributions. This uses a simple MLE estimate to update the distributions according to their summarize or fit methods.

This is a wrapper for the summarize and from_summaries methods.

Parameters:
- X : array-like or generator, shape (n_samples, n_nodes)
  The data to train on, where each row is a sample and each column corresponds to the associated variable.
- weights : array-like, shape (n_samples), optional
  The weight of each sample as a positive double. Default is None.
- inertia : double, optional
  The inertia for updating the distributions, passed along to the distribution method. Default is 0.0.
- pseudocount : double, optional
  A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0-probability symbols if they don't happen to occur in the data. Only affects models defined over discrete distributions. Default is 0.
- verbose : bool, optional
  Whether or not to print out improvement information over iterations. Only required if doing semisupervised learning. Default is False.
- n_jobs : int
  The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.

Returns:
- self : BayesianNetwork
  The fit Bayesian network object with updated model parameters.
freeze()¶
Freeze the distribution, preventing updates from occurring.

from_dict()¶
Deserialize this object from a dictionary of parameters.

from_json()¶
Deserialize this object from its JSON representation.

Parameters:
- s : str
  A JSON formatted string containing the file.

Returns:
- model : object
  A properly initialized and baked model.
from_samples()¶
Learn the structure of the network from data.

There are currently two types of approaches implemented. The first, the Chow-Liu algorithm, finds a tree-like structure from symmetric mutual-information scores given a root node (the root parameter). The second type searches through structures and returns the structure that maximizes the following objective function:

log P(D|M) - penalty * |M|

where P(D|M) is the probability of the data given the found model, penalty is a user-specified parameter, and |M| is the number of parameters in the model. When this penalty is log2(|D|) / 2 (the default), where |D| is the sum of the weights of the examples, this is equivalent to the minimum description length (MDL).

There are currently three ways that the learned structure can be controlled. The first is to increase the penalty term to increase sparsity. The second is to pass in a specified list of edges that must exist (include_edges) or cannot exist (exclude_edges). Lastly, a constraint graph can be specified where each node in the graph is a set of variables being modeled and the edges in the graph indicate which sets of variables can be parents to which other sets of variables (and where a self-loop is the normal structure learning step).

Parameters:
- X : array-like or generator, shape (n_samples, n_nodes)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- weights : array-like, shape (n_samples), optional
  The weight of each sample as a positive double. Default is None.
- algorithm : str, one of 'chow-liu', 'greedy', 'exact', 'exact-dp', optional
  The algorithm to use for learning the Bayesian network. The default is 'greedy', which greedily attempts to find the best structure and frequently can identify the optimal structure. 'exact' uses DP/A* to find the optimal Bayesian network, and 'exact-dp' tries to find the shortest path on the entire order lattice, which is more memory- and computationally expensive. 'exact' and 'exact-dp' should give identical results, with 'exact-dp' remaining an option mostly for debugging reasons. 'chow-liu' will return the optimal tree-like structure for the Bayesian network, which is a very fast approximation but not always the best network.
- max_parents : int, optional
  The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
- penalty : float or None, optional
  The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity, whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.
- root : int, optional
  For algorithms which require a single root ('chow-liu'), this is the root from which all edges point away. The user may specify which column to use as the root. Default is the first column.
- constraint_graph : networkx.DiGraph or None, optional
  A directed graph showing valid parent sets for each variable. Each node is a set of variables, and edges represent which variables can be valid parents of those variables. The naive structure learning task is just all variables in a single node with a self edge, meaning that you know nothing about the structure beforehand. Default is None.
- include_edges : list or None, optional
  A list of (parent, child) tuples that are edges which must be present in the found structure. Default is None.
- exclude_edges : list or None, optional
  A list of (parent, child) tuples that are edges which cannot be present in the found structure. Default is None.
- pseudocount : double, optional
  A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0-probability symbols if they don't happen to occur in the data. Default is 0.
- state_names : array-like, shape (n_nodes), optional
  A list of meaningful names to be applied to nodes.
- name : str, optional
  The name of the model. Default is None.
- reduce_dataset : bool, optional
  Given the discrete nature of these datasets, frequently a user will pass in a dataset that has many identical samples. It is time consuming to go through these redundant samples, and a far more efficient use of time to simply calculate a new dataset comprised of the subset of unique observed samples weighted by the number of times they occur in the dataset. This typically will speed up all algorithms, including when using a constraint graph. Default is True.
- keys : list, optional
  A list of sets where each set is the keys present in that column. If there are d columns in the data set, then this list should have d sets, and each set should have at least two keys in it. Default is None.
- low_memory : bool or None, optional
  Whether to use a low-memory version of the search algorithm. This option only affects algorithm="greedy" and algorithm="exact". Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm="greedy" and disable it otherwise. Default is None.
- n_jobs : int, optional
  The number of threads to use when learning the structure of the network. If a constraint graph is provided, this will parallelize the tasks as directed by the constraint graph. If one is not provided, it will parallelize the building of the parent graphs. Both cases will provide large speed gains.

Returns:
- model : BayesianNetwork
  The learned BayesianNetwork.
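A short usage sketch combining the options above; the data and the specific edge constraints are illustrative assumptions:

from pomegranate import BayesianNetwork
import numpy

X = numpy.random.randint(2, size=(500, 4))

# require the edge 0 -> 1 and forbid 3 -> 0 during the search
model = BayesianNetwork.from_samples(X, algorithm='exact',
                                     include_edges=[(0, 1)],
                                     exclude_edges=[(3, 0)])
print(model.structure)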
from_structure()¶
Return a Bayesian network from a predefined structure.

Pass in the structure of the network as a tuple of tuples and get a fit network in return. The tuple should contain n tuples, with one for each node in the graph. Each inner tuple should contain the parents for that node. For example, a three node graph where both node 0 and node 1 have node 2 as a parent would be specified as ((2,), (2,), ()).

Parameters:
- X : array-like, shape (n_samples, n_nodes)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- structure : tuple of tuples
  The parents for each node in the graph. If a node has no parents, then do not specify any parents.
- weights : array-like, shape (n_samples), optional
  The weight of each sample as a positive double. Default is None.
- pseudocount : double, optional
  A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0-probability symbols if they don't happen to occur in the data. Default is 0.
- name : str, optional
  The name of the model. Default is None.
- state_names : array-like, shape (n_nodes), optional
  A list of meaningful names to be applied to nodes.
- keys : list
  A list of sets where each set is the keys present in that column. If there are d columns in the data set, then this list should have d sets, and each set should have at least two keys in it.

Returns:
- model : BayesianNetwork
  A Bayesian network with the specified structure.
from_summaries()¶
Use MLE on the stored sufficient statistics to train the model.

This uses MLE estimates on the stored sufficient statistics to train the model.

Parameters:
- inertia : double, optional
  The inertia for updating the distributions, passed along to the distribution method. Default is 0.0.
- pseudocount : double, optional
  A pseudocount to add to the emission of each distribution. This effectively smoothes the states to prevent 0-probability symbols if they don't happen to occur in the data. Default is 0.

Returns:
- None
from_yaml()¶
Deserialize this object from its YAML representation.

log_probability()¶
Return the log probability of samples under the Bayesian network.

The log probability is just the sum of the log probabilities under each of the components. The log probability of a sample under the graph A -> B is just log P(A) + log P(B|A). This will return a vector of log probabilities, one for each sample.

Parameters:
- X : array-like, shape (n_samples, n_dim)
  The sample is a vector of points where each dimension represents the same variable as added to the graph originally. It doesn't matter what the connections between these variables are, just that they are all ordered the same.
- check_input : bool, optional
  Check to make sure that the observed symbol is a valid symbol for that distribution to produce. Default is True.
- n_jobs : int
  The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.

Returns:
- logp : numpy.ndarray or double
  The log probability of the samples if many, or the single log probability.
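For instance, with the Monty Hall model built earlier, the log probability of a full assignment is the sum of the component log probabilities (a sketch, assuming that model is still in scope):

import math

# log P(guest='A') + log P(prize='A') + log P(monty='B' | 'A', 'A')
print(model.log_probability([['A', 'A', 'B']]))
print(math.log(1. / 3 * 1. / 3 * 0.5))   # both approximately -2.890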
marginal()¶
Return the marginal probabilities of each variable in the graph.

This is equivalent to a pass of belief propagation on a graph where no data has been given. This will calculate the probability of each variable being in each possible emission when nothing is known.

Parameters:
- None

Returns:
- marginals : array-like, shape (n_nodes)
  An array of univariate distribution objects showing the marginal probabilities of that variable.
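With the Monty Hall model and nothing observed, each marginal comes back uniform over the three doors (a sketch, assuming that model is still in scope):

for name, dist in zip(['guest', 'prize', 'monty'], model.marginal()):
    print(name, dist.parameters[0])   # each approximately {'A': 1/3, 'B': 1/3, 'C': 1/3}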
node_count()¶
Returns the number of nodes/states in the model.

plot()¶
Draw this model's graph using pygraphviz and matplotlib.

If no filename is given, it uses pygraphviz to write a temporary png file and matplotlib to imshow() it. If using Jupyter or IPython, enable %matplotlib inline and this will immediately display your graph. Otherwise, per usual matplotlib convention, you have to issue a plt.show() or matplotlib.pyplot.show() to open a window with the image.

TODO: Use svg or pdf for the original image. Jupyter and IPython can render SVG directly, e.g. from IPython.display import SVG and SVG(filename=…).

Parameters:
- filename : str, optional
  Filename for saving the .pdf graph. Default is None.

Returns:
- None
predict()¶
Predict missing values of a data matrix using MLE.

Impute the missing values of a data matrix using the maximally likely predictions according to the forward-backward algorithm. Run each sample through the algorithm (predict_proba) and replace missing values with the maximally likely predicted emission.

Parameters:
- X : array-like, shape (n_samples, n_nodes)
  Data matrix to impute. Missing values must be either None (if lists) or np.nan (if numpy.ndarray). Will fill in these values with the maximally likely ones.
- max_iterations : int, optional
  Number of iterations to run loopy belief propagation for. Default is 100.
- check_input : bool, optional
  Check to make sure that the observed symbol is a valid symbol for that distribution to produce. Default is True.
- n_jobs : int
  The number of jobs to use to parallelize, either the number of threads or the number of processes to use. -1 means use all available resources. Default is 1.

Returns:
- y_hat : numpy.ndarray, shape (n_samples, n_nodes)
  This is the data matrix with the missing values imputed.
predict_proba()¶
Returns the probabilities of each variable in the graph given evidence.

This calculates the marginal probability distributions for each state given the evidence provided through loopy belief propagation. Loopy belief propagation is an approximate algorithm which is exact for certain graph structures.

Parameters:
- X : dict or array-like, shape <= n_nodes
  The evidence supplied to the graph. This can either be a dictionary with keys being state names and values being the observed values (either the emissions or a distribution over the emissions), or an array with the values ordered according to the nodes' incorporation in the graph (the order fed into .add_states/add_nodes) and None for variables which are unknown. It can also be vectorized, so a list of dictionaries can be passed in where each dictionary is a single sample, or a list of lists where each list is a single sample, both formatted as mentioned before.
- max_iterations : int, optional
  The number of iterations with which to do loopy belief propagation. Usually requires only 1. Default is 100.
- check_input : bool, optional
  Check to make sure that the observed symbol is a valid symbol for that distribution to produce. Default is True.
- n_jobs : int, optional
  The number of threads to use when parallelizing the job. This parameter is passed directly into joblib. Default is 1, indicating no parallelism.

Returns:
- y_hat : array-like, shape (n_samples, n_nodes)
  An array of univariate distribution objects showing the probabilities of each variable.
probability()¶
Return the probability of the given symbol under this distribution.

Parameters:
- symbol : object
  The symbol to calculate the probability of.

Returns:
- probability : double
  The probability of that point under the distribution.
sample()¶
Sample the network, optionally given some evidence.

Uses rejection sampling to condition on non-root nodes.

Parameters:
- n : int, optional
  The number of samples to generate. Defaults to 1.
- evidences : list of dict, optional
  Evidence to hold constant while samples are generated.
- algorithm : str, one of 'gibbs', 'rejection', optional. Default is 'rejection'.
  Rejection sampling successively samples each node given its parents. When evidence is given on non-root nodes, only draws that are compatible with the evidence are kept. Rejection sampling is a good option when the evidence nodes are not far from the root nodes or when the given evidence is likely; rare evidence leads to a high rate of rejected samples, and thus to a significant slowdown of the sampling.
  Gibbs sampling is a Markov chain Monte Carlo (MCMC) technique designed to speed up the sampling. It builds the conditional probability of state transitions for each node given the neighbours in its Markov blanket. It works well with a lot of evidence in the network, even when the evidence is far from the root nodes. Drawback: convergence is only guaranteed when there is a non-null probability path between states. If the posterior consists of isolated islands of high probability, Gibbs sampling will stay stuck in one of the islands and never transition to the others, and successive samples will have high correlation.
- min_prob : float < 1, optional. If algorithm == 'rejection'.
  Stop iterating when Sum P(X|evidence) < min_prob; the generated samples for a given evidence will then be incomplete (fewer than n).
- initial_state : dict, optional
  Initial state used by the Gibbs sampler. Default is to use the maximum joint-probability values, calculated with self.predict(). The default should be optimal.
- scan_order : str, one of 'topological', 'random', optional. If algorithm == 'gibbs'.
  Scan order of the Gibbs sampler, indicating the order in which nodes are sampled. Topological order is good for chain-like networks (lots of successive nodes); random order yields better results with more connected networks. Default is 'random'.
- burnin : int, optional. If algorithm == 'gibbs'.
  Number of samples to discard at the beginning of the sampling. Default is 0.
- random_state : seed or seeded numpy instance (for gibbs, only seed)

Returns:
- A nested list of sampled states of shape [n * len(evidences), len(nodes)].

Examples

>>> network.sample(evidences=[{'HLML': '2'},
...                           {'HLML': '2', 'TYPL': '1'},
...                           {'NBPI': '02', 'TYPL': '1'}])
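A short sketch of the Gibbs option using the parameters documented above; the burn-in length and sample count are illustrative:

# draw 100 samples, discarding the first 50 as burn-in; the Gibbs scheme
# tolerates evidence far from the root nodes better than rejection sampling
samples = network.sample(n=100, algorithm='gibbs', burnin=50)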
score()¶
Return the accuracy of the model on a data set.

Parameters:
- X : numpy.ndarray, shape=(n, d)
  The values of the data set.
- y : numpy.ndarray, shape=(n,)
  The labels of each value.

state_count()¶
Returns the number of states present in the model.
summarize()¶
Summarize a batch of data and store the sufficient statistics.

This will partition the dataset into columns which belong to their appropriate distribution. If the distribution has parents, then multiple columns are sent to the distribution. This relies mostly on the summarize function of the underlying distribution.

Parameters:
- X : array-like, shape (n_samples, n_nodes)
  The data to train on, where each row is a sample and each column corresponds to the associated variable.
- weights : array-like, shape (n_samples), optional
  The weight of each sample as a positive double. Default is None.

Returns:
- None
thaw()¶
Thaw the distribution, re-allowing updates to occur.

to_dict()¶
Serialize this object to a dictionary of parameters.

to_json()¶
Serialize the model to JSON.

Parameters:
- separators : tuple, optional
  The two separators to pass to the json.dumps function for formatting. Default is (',', ' : ').
- indent : int, optional
  The indentation to use at each level. Passed to json.dumps for formatting. Default is 4.

Returns:
- json : str
  A properly formatted JSON object.

to_yaml()¶
Serialize the model to YAML for compactness.
class pomegranate.BayesianNetwork.ParentGraph¶

Generate a parent graph for a single variable over its parents.

This will generate the parent graph for a single variable given the data. A parent graph is the dynamically generated best parent set and respective score for each combination of parent variables. For example, if we are generating a parent graph for x1 over x2, x3, and x4, we may calculate that having x2 as a parent is better than x2,x3 and so store the value of x2 in the node for x2,x3.

Parameters:
- X : numpy.ndarray, shape=(n, d)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- weights : numpy.ndarray, shape=(n,)
  The weight of each sample as a positive double. Default is None.
- key_count : numpy.ndarray, shape=(d,)
  The number of unique keys in each column.
- include_parents : tuple
  A set of parents that this node must have.
- exclude_parents : tuple
  A set of parents that this node cannot have.
- pseudocount : double
  A pseudocount to add to each possibility.
- penalty : float or None, optional
  The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity, whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.
- max_parents : int
  The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
- low_memory : bool or None, optional
  Whether to use a low-memory version of the search algorithm. This option only affects algorithm="greedy" and algorithm="exact". Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm="greedy" and disable it otherwise. Default is None.

Returns:
- structure : tuple, shape=(d,)
  The parents for each variable in this SCC.
pomegranate.BayesianNetwork.discrete_exact_a_star()¶

Find the optimal graph over a set of variables with no other knowledge.

This is the naive dynamic programming structure learning task where the optimal graph is identified from a set of variables using an order graph and parent graphs. This can be used either when no constraint graph is provided or for an SCC which is made up of a node containing a self-loop. It uses DP/A* in order to find the optimal graph without considering all possible topological sorts. A greedy version of the algorithm can be used that massively reduces both the computational and memory cost while frequently producing the optimal graph.

Parameters:
- X : numpy.ndarray, shape=(n, d)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- weights : numpy.ndarray, shape=(n,)
  The weight of each sample as a positive double. Default is None.
- key_count : numpy.ndarray, shape=(d,)
  The number of unique keys in each column.
- include_edges : list or None
  A list of (parent, child) tuples where each tuple corresponds to an edge that must exist in the found structure.
- exclude_edges : list or None
  A list of (parent, child) tuples where each tuple corresponds to an edge that cannot exist in the found structure.
- pseudocount : double
  A pseudocount to add to each possibility.
- penalty : float or None, optional
  The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity, whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.
- max_parents : int
  The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
- low_memory : bool or None, optional
  Whether to use a low-memory version of the search algorithm. This option only affects algorithm="greedy" and algorithm="exact". Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm="greedy" and disable it otherwise. Default is None.
- n_jobs : int
  The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns:
- structure : tuple, shape=(d,)
  The parents for each variable in this SCC.
pomegranate.BayesianNetwork.discrete_exact_component()¶

Find the optimal graph over a multi-node component of the constraint graph.

The general algorithm in this case is to begin with each variable and add all possible single children for that entry recursively until completion. This will result in a far sparser order graph than before. In addition, one can eliminate entries from the parent graphs that contain invalid parents, as they are a waste of computational time.

Parameters:
- X : numpy.ndarray, shape=(n, d)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- weights : numpy.ndarray, shape=(n,)
  The weight of each sample as a positive double. Default is None.
- include_edges : list or None
  A list of (parent, child) tuples where each tuple corresponds to an edge that must exist in the found structure.
- exclude_edges : list or None
  A list of (parent, child) tuples where each tuple corresponds to an edge that cannot exist in the found structure.
- key_count : numpy.ndarray, shape=(d,)
  The number of unique keys in each column.
- pseudocount : double
  A pseudocount to add to each possibility.
- penalty : float or None, optional
  The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity, whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.
- max_parents : int
  The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
- n_jobs : int
  The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns:
- structure : tuple, shape=(d,)
  The parents for each variable in this SCC.
pomegranate.BayesianNetwork.discrete_exact_dp()¶

Find the optimal graph over a set of variables with no other knowledge.

This is the naive dynamic programming structure learning task where the optimal graph is identified from a set of variables using an order graph and parent graphs. This can be used either when no constraint graph is provided or for an SCC which is made up of a node containing a self-loop. This is a reference implementation that uses the naive shortest path algorithm over the entire order graph. The 'exact' option uses the A* path in order to avoid considering the full order graph.

Parameters:
- X : numpy.ndarray, shape=(n, d)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- weights : numpy.ndarray, shape=(n,)
  The weight of each sample as a positive double. Default is None.
- key_count : numpy.ndarray, shape=(d,)
  The number of unique keys in each column.
- include_edges : list or None
  A set of (parent, child) tuples where each tuple is an edge that must exist in the found structure.
- exclude_edges : list or None
  A set of (parent, child) tuples where each tuple is an edge that cannot exist in the found structure.
- penalty : float or None, optional
  The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity, whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.
- max_parents : int
  The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
- n_jobs : int
  The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns:
- structure : tuple, shape=(d,)
  The parents for each variable in this SCC.
pomegranate.BayesianNetwork.discrete_exact_slap()¶

Find the optimal graph in a node with a Self Loop And Parents (SLAP).

Instead of just performing exact BNSL over the set of all parents and removing the offending edges, there are efficiencies that can be gained by considering the structure. In particular, parents not coming from the main node do not need to be considered in the order graph but can simply be added to each entry after creation of the order graph. This is because those variables occur earlier in the topological ordering, but it doesn't matter how they occur otherwise. Parent graphs must be defined over all variables, however.

Parameters:
- X : numpy.ndarray, shape=(n, d)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- weights : numpy.ndarray, shape=(n,)
  The weight of each sample as a positive double. Default is None.
- key_count : numpy.ndarray, shape=(d,)
  The number of unique keys in each column.
- include_edges : list or None
  A list of (parent, child) tuples where each tuple corresponds to an edge that must exist in the found structure.
- exclude_edges : list or None
  A list of (parent, child) tuples where each tuple corresponds to an edge that cannot exist in the found structure.
- pseudocount : double
  A pseudocount to add to each possibility.
- penalty : float or None, optional
  The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity, whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.
- max_parents : int
  The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
- n_jobs : int
  The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns:
- structure : tuple, shape=(d,)
  The parents for each variable in this SCC.
pomegranate.BayesianNetwork.discrete_exact_with_constraints()¶

This returns the optimal Bayesian network given a set of constraints.

This function controls the process of learning the Bayesian network by taking in a constraint graph, identifying the strongly connected components (SCCs) and solving each one using the appropriate algorithm. This is mostly an internal function.

Parameters:
- X : numpy.ndarray, shape=(n, d)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- weights : numpy.ndarray, shape=(n,)
  The weight of each sample as a positive double. Default is None.
- key_count : numpy.ndarray, shape=(d,)
  The number of unique keys in each column.
- include_edges : list or None
  A set of (parent, child) tuples where each tuple is an edge that must exist in the found structure.
- exclude_edges : list or None
  A set of (parent, child) tuples where each tuple is an edge that cannot exist in the found structure.
- pseudocount : double
  A pseudocount to add to each possibility.
- penalty : float or None, optional
  The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity, whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.
- max_parents : int
  The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
- constraint_graph : networkx.DiGraph
  A directed graph showing valid parent sets for each variable. Each node is a set of variables, and edges represent which variables can be valid parents of those variables. The naive structure learning task is just all variables in a single node with a self edge, meaning that you know nothing about the structure beforehand.
- low_memory : bool or None, optional
  Whether to use a low-memory version of the search algorithm. This option only affects algorithm="greedy" and algorithm="exact". Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm="greedy" and disable it otherwise. Default is None.
- n_jobs : int
  The number of threads to use when learning the structure of the network. This parallelizes both the creation of the parent graphs for each variable and the solving of the SCCs. -1 means use all available resources. Default is 1, meaning no parallelism.

Returns:
- structure : tuple, shape=(d,)
  The parents for each variable in the network.
pomegranate.BayesianNetwork.discrete_exact_with_constraints_task()¶

This is a wrapper for the function to be parallelized by joblib.

This function takes in a single task as an id and a set of parents and children and calls the appropriate function. This is mostly a wrapper for joblib to parallelize.

Parameters:
- X : numpy.ndarray, shape=(n, d)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- weights : numpy.ndarray, shape=(n,)
  The weight of each sample as a positive double. Default is None.
- key_count : numpy.ndarray, shape=(d,)
  The number of unique keys in each column.
- include_edges : list or None
  A set of (parent, child) tuples where each tuple is an edge that must exist in the found structure.
- exclude_edges : list or None
  A set of (parent, child) tuples where each tuple is an edge that cannot exist in the found structure.
- pseudocount : double
  A pseudocount to add to each possibility.
- penalty : float or None, optional
  The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity, whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.
- max_parents : int
  The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
- task : tuple
  A 3-tuple containing the id, the set of parents, and the set of children to learn a component of the Bayesian network over. The cases represent an SCC of the following:
  0 - Self loop and no parents
  1 - Self loop and parents
  2 - Parents and no self loop
  3 - Multiple nodes
- low_memory : bool or None, optional
  Whether to use a low-memory version of the search algorithm. This option only affects algorithm="greedy" and algorithm="exact". Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm="greedy" and disable it otherwise. Default is None.
- n_jobs : int
  The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs for each task, or the finding of best parents in case 2.

Returns:
- structure : tuple, shape=(d,)
  The parents for each variable in this SCC.
pomegranate.BayesianNetwork.discrete_greedy()¶

Find the optimal graph over a set of variables with no other knowledge.

Parameters:
- X : numpy.ndarray, shape=(n, d)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- weights : numpy.ndarray, shape=(n,)
  The weight of each sample as a positive double. Default is None.
- key_count : numpy.ndarray, shape=(d,)
  The number of unique keys in each column.
- include_edges : list or None
  A list of (parent, child) tuples where each tuple corresponds to an edge that must exist in the found structure.
- exclude_edges : list or None
  A list of (parent, child) tuples where each tuple corresponds to an edge that cannot exist in the found structure.
- pseudocount : double
  A pseudocount to add to each possibility.
- penalty : float or None, optional
  The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity, whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.
- max_parents : int
  The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
- low_memory : bool or None, optional
  Whether to use a low-memory version of the search algorithm. This option only affects algorithm="greedy" and algorithm="exact". Although the low-memory version of both the greedy and exact algorithms will use less memory, it will also significantly slow down the exact algorithm. However, setting this to True will also significantly speed up the greedy algorithm. Setting this value to None will enable it when algorithm="greedy" and disable it otherwise. Default is None.
- n_jobs : int
  The number of threads to use when learning the structure of the network. This parallelizes the creation of the parent graphs.

Returns:
- structure : tuple, shape=(d,)
  The parents for each variable in this SCC.
pomegranate.BayesianNetwork.generate_parent_graph()¶

Generate a parent graph for a single variable over its parents.

This will generate the parent graph for a single variable given the data. A parent graph is the dynamically generated best parent set and respective score for each combination of parent variables. For example, if we are generating a parent graph for x1 over x2, x3, and x4, we may calculate that having x2 as a parent is better than x2,x3 and so store the value of x2 in the node for x2,x3.

Parameters:
- X : numpy.ndarray, shape=(n, d)
  The data to fit the structure to, where each row is a sample and each column corresponds to the associated variable.
- weights : numpy.ndarray, shape=(n,)
  The weight of each sample as a positive double. Default is None.
- key_count : numpy.ndarray, shape=(d,)
  The number of unique keys in each column.
- i : int
  The column index to build the parent graph for.
- include_edges : list or None
  A list of (parent, child) tuples where each tuple corresponds to an edge that must exist in the found structure.
- exclude_edges : list or None
  A list of (parent, child) tuples where each tuple corresponds to an edge that cannot exist in the found structure.
- pseudocount : double
  A pseudocount to add to each possibility.
- penalty : float or None, optional
  The weighting of the model complexity term in the objective function. Increasing this value will encourage sparsity, whereas setting the value to 0 will result in an unregularized structure. Default is log2(|D|) / 2 where |D| is the sum of the weights of the data.
- max_parents : int
  The maximum number of parents a node can have. If used, this means using the k-learn procedure. Can drastically speed up algorithms. If -1, no max on parents. Default is -1.
- parent_set : tuple, default ()
  The variables which are possible parents for this variable. If nothing is passed in, then it defaults to all other variables, as one would expect in the naive case. This allows for cases where we want to build a parent graph over only a subset of the variables.

Returns:
- structure : tuple, shape=(d,)
  The parents for each variable in this SCC.