Should I be reading this documentation?

This is the documentation for the GOBNILP source code for development purposes.
For help in using GOBNILP, see the PDF manual included in the distribution.

Overview of the code layout


circuit_cuts.c contains code to look for cluster cuts by searching for cyclic graphs in a given solution (typically the LP relaxation). This code is 'logically' part of cons_dagcluster.c since it adds constraints (as cutting planes) which follow from acyclicity.


cons_chordal.c implements a constraint handler for constraints which ensure that DAGs have no immoralities and thus are equivalent to chordal undirected graphs (aka decomposable models).


cons_ci.c implements a constraint handler for conditional independence constraints.


Enforcing a constraint to rule out cycles in the Bayesian network is non-trivial. The approach in GOBNILP is to use ‘cluster constraints’. As there are exponentially many of these, they are added during solving as cutting planes. The main code associated with creating and updating these constraints is contained in cons_dagcluster.c .


cons_distinguishedrep.c implements a constraint handler for constraints which ensure that only one representative from each equivalence class of Markov equivalent DAGs is feasible.


cons_lop.c implements a constraint handler for linear ordering constraints. This constraint handler was written by Marc Pfetsch and is part of the LOP example that comes with the SCIP distribution. It is copied across when GOBNILP is installed.


cons_partialordering.c implements a constraint handler for partial order constraints. This constraint handler implements constraints ensuring that DAGs are consistent with a partial order. The partial order can be minimal in which case it is the ancestor relation for the DAG. The constraint handler was written by small modifications to cons_lop.c . This file is created when GOBNILP is installed by applying a patch to a copy of cons_lop.c.


cons_vanilla.c implements a constraint handler for constraints stating a DAG that no 'obviously' suboptimal DAGs are feasible under the assumption that no constraints (other than acyclicity) are imposed on DAGs.


convex4_cuts.c contains code to look for cuts which are (generalisations of) facets of the convex hull of 4-node DAGs. This code is 'logically' part of cons_dagcluster.c since it adds constraints (typically as cutting planes) which follow from acyclicity.


disp_clearcols.c contains code to generate clearer column headings for GOBNILP output


event_splitdag.c provides a simple event handler to detect when the program should attempt to split dagcluster constraints into their individual strongly connected components


fractional_circuit_cuts.c generalises the approach taken in circuit_cuts.c This code is 'logically' part of cons_dagcluster.c since it adds constraints (as cutting planes) which follow from acyclicity.


The main entry point in to the code is in the file gobnilp.c . This file contains a single procedure mostly laying out the general order of operations for a problem in SCIP. The idea is that this main calling function should able to be largely ignorant of how the underlying implementation of the problem so that representations and functionality can be altered without affecting the general order of operations of the program. Unless you are adding a new plugin, command line parameter or altering the basic workflow of the code, you probably shouldn't be making alterations in this file.


heur_sinks.c has most of the code necessary to implement a heuristic for finding Bayesian networks. In general, this file should be self contained, except for the function needed to include the heuristic in the program. The one excpetion to this is that the heuristic shouldn't assign variables for the pedigree problem. These should instead be set from pedigrees.c in order to maintain the correct modularisation of data.


matroid_cuts.c contains functions for finding 'matroid' cuts (as proposed by Milan Studeny)


metadata.c contains metadata relating to the problem that is not necessarily part of the ILP that will be solved.


model_averaging.c contains all the functionality to perform model averaging over the n best networks found. As far as possible, the fact that GOBNILP is capable of model averaging should be hidden from other parts of the program. There are two exceptions to this. First, gobnilp.c needs to know about model averaging in order that it can make appropriate calls to perform the model averaging. Second, output.c needs to be aware that the program can perform model averaging so that the output functions can produce suitable output if a model average is to be printed.


As the name suggests, output.c contains functions related to outputing data. In fact, with a small number of exceptions, all functions related to file output should appear in this file. The exceptions to this are those functions which would break the modular nature of the program if put into this file. At present, the only substantial output function not appearing in this file is that for outputting pedigrees, which instead appears in pedigrees.c . Functions appearing in this file for outputting solutions should also be capable of sensible output when called with model averaging data.


parent_set_data.c contains code for copying, deleting and altering ParentSetData which contains information on candidate parent sets for vertices. The header file parent_set_data.h defines several data structures that are used throughout the program. Most importantly, this is where the main data structure for storing instance data ParentSetData is defined.


pedigree_data.c contains functions for managing pedigree data (stored in a PedigreeData instance)


pedigree_scorer.c contains functions for creating scores for pedigrees.


GOBNILP is capable of learning pedigrees in addition to general purpose Bayesian network learning. As much of the additional behaviour that this requires is unrelated to normal Bayesian network learning, the code for it is all assembled in one place where it can be altered without affecting the main program. Any functions related to pedigrees should appear in this file. Outside pedigrees.c , the data and constraints that the pedigree part of the program is using should be treated as a black box. A function PD_inPedigreeMode is defined which allows the rest of the program to determine whether pedigrees are being used and call the appropriate pedigree specific functions if so.


The majority of operations that are more specific to Bayesian network learning are in the file probdata_bn.c . For example, the functions that create the variables and constraints needed to learn BNs are found in this file. If adding a relatively small change in functionality to GOBNILP , this is probably the first place to consider adding it.


property_data.c contains functions related to managing PropertyData.


scoring.c computes local scores (currently only BDeu scores) for candidate parent sets from complete discrete data. Includes code which allows certain constraints (e.g. conditional independence ones) to limit which local scores need be computed. ADtrees (plus some additional caching of log-likelihoods) are used to improve efficiency over a naive approach.


set_packing_cuts.c adds initial constraints (not cuts despite the name of the file!) of the sort $I(A \leftarrow B,C) + I(B \leftarrow A,C) + I(C \leftarrow A,B) \leq 1$. These are 'redundant' but improve efficiency. See inequality (12) in Bartlett and Cussens (UAI'13).


solution_info contains code for collecting and storing information on a solution (typically an LP solution) which is then used by eg. several separators


stack.c implements a data type representing a stack of integers


subip_cuts.c constructs and solves a sub-IP for finding cluster cuts. Used by both cons_dagcluster.c and cons_ci.c


utils.c contains a number of functions that may be useful in several other files. At present most of these are shortcuts to several commonly used SCIP functions which have lots of parameters but which are nearly always called with the same parameters for many of these. The functions in this file just make the rest of the code easier to read by hiding this distraction.


vector.c implements a data type representing a resizeable array of integers


vectorlist.c implements a data type representing a resizeable array resizeable arrays of integers


versiongit.h just contains some versioning information and probably shouldn't be changed, except to update the version numbers.


gobnilp.c has the overall structure of the program. probdata_bn.c provides the functions related to creating the Bayesian network problem. parent_set_data.h contain the main shared data structures for the programs. Pedigree functions are all in pedigrees.c , the behaviour of which should be separated from the rest of the program, such that pedigree related constraints can be added or removed without affecting the correctness of the rest of the program. cons_dagcluster.c and heur_sinks.c provide SCIP plugins and should be entirely self contained except for calls to pedigree functions. The program should be unaware of model averaging except for calls from gobnilp.c and output.c .