An Introduction to Bayesian Networks with Jayes

At Eclipse Code Recommenders, most of our recommendation engines use Bayesian networks, which are a compact representation of probability distributions. They thus serve to express relationships between variables in a partially observable world. Our recommenders use these networks to predict what developers want to use next, based on what they have done previously.

When the Code Recommenders project first started, there was a need for a new open-source, pure-Java Bayesian network library. As part of my bachelor thesis, I created such a library, called Jayes. Jayes has since become the backend of most of Code Recommenders' recommendation engines. Its development continues, and a new version of Jayes will be included in the upcoming Code Recommenders 2.0.

This post describes how to use Jayes for your own inference tasks.

Guest Blogger: Michael

Michael Kutschke is currently completing his Master of Science at the Computer Science department of Technische Universität Darmstadt. His areas of interest include large scale data analysis, statistics and optimization. Michael contributes to the open source Code Recommenders project through his student work terms with Codetrails.

What Jayes is, and what it isn’t

Jayes is a library for Bayesian networks and the inference in such networks. At the moment, there is no learning component included. (We can, however, recommend the Apache Mahout library.)

Where can I get it?

There are two sources for getting your hands on Jayes’ source code:

The version of Jayes used in this post has not, at the time of writing, been merged into the Eclipse repository, but will be soon. It is also still under development and does not yet carry the version number 2.0, as we are just about to start the 2.0 branch of Code Recommenders. For the most current version of Jayes, my GitHub repository is therefore the place to go. The GitHub repository also contains the classes used for evaluating and benchmarking Jayes, which will not move to Eclipse in the foreseeable future; these classes allow you to assess the runtime performance of Jayes.

How do I use it?

Before we start, you’ll need to have a rough idea of what a Bayesian network is and what it looks like. Wikipedia has a pretty good introduction. So let’s get started. For the use case of inference, there are only three classes you need to know:

  • org.eclipse.recommenders.jayes.BayesNet
  • org.eclipse.recommenders.jayes.BayesNode
  • org.eclipse.recommenders.jayes.inference.junctionTree.JunctionTreeAlgorithm

The first two are used for setting up the model itself, while the third is the algorithm used for inference.

The BayesNet is a container for BayesNodes, which represent the random variables of the probability distribution you are modeling. The preferred way to create BayesNodes in Jayes 2.0 is through BayesNet.createNode(String name). This is most likely the only method of BayesNet you will need to use.

A BayesNode has outcomes, parents, and a conditional probability table. It is important to set the probabilities last, after the node's outcomes, its parents, and the parents' outcomes have all been set.

The methods from BayesNode that perform these steps are

  • BayesNode.addOutcomes(String...),
  • BayesNode.setParents(List<BayesNode>), and
  • BayesNode.setProbabilities(double...).

The probabilities in the conditional probability table need to be specified in a particular order. This is shown in the following code snippet:

BayesNet net = new BayesNet();
BayesNode a = net.createNode("a");
a.addOutcomes("true", "false");
a.setProbabilities(0.2, 0.8);

BayesNode b = net.createNode("b");
b.addOutcomes("one", "two", "three");
b.setParents(Arrays.asList(a));
b.setProbabilities(
    0.1, 0.4, 0.5, // a == true
    0.3, 0.4, 0.3  // a == false
);

BayesNode c = net.createNode("c");
c.addOutcomes("true", "false");
c.setParents(Arrays.asList(a, b));
c.setProbabilities(
    // a == true
    0.1, 0.9, // b == one
    0.0, 1.0, // b == two
    0.5, 0.5, // b == three
    // a == false
    0.2, 0.8, // b == one
    0.0, 1.0, // b == two
    0.7, 0.3  // b == three
);
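The ordering rule is that of a row-major multidimensional array flattened into one dimension: parents vary slowest (in the order passed to setParents, with the first parent slowest of all), and the node's own outcome varies fastest. As a plain-Java illustration, not part of the Jayes API (class and method names are mine), the flat position of a CPT entry can be computed like this:

```java
public class CptIndex {

    // Computes the flat index of a CPT entry: parent outcome indices vary
    // slowest (first parent slowest of all), the node's own outcome fastest.
    static int cptIndex(int[] parentIdx, int[] parentCard, int outcomeIdx, int outcomeCard) {
        int index = 0;
        for (int i = 0; i < parentIdx.length; i++) {
            index = index * parentCard[i] + parentIdx[i];
        }
        return index * outcomeCard + outcomeIdx;
    }

    public static void main(String[] args) {
        // Node c above: parents (a, b) with 2 and 3 outcomes, c with 2 outcomes.
        // a == "false" (index 1), b == "three" (index 2), c == "true" (index 0)
        // lands at flat index 10, which holds the value 0.7 in the table above.
        System.out.println(cptIndex(new int[] { 1, 2 }, new int[] { 2, 3 }, 0, 2));
    }
}
```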

We now have a network and want to perform inference. The class used for this task is JunctionTreeAlgorithm.

IBayesInferer inferer = new JunctionTreeAlgorithm();
inferer.setNetwork(net);

Map<BayesNode, String> evidence = new HashMap<BayesNode, String>();
evidence.put(a, "false");
evidence.put(b, "three");
inferer.setEvidence(evidence);

double[] beliefsC = inferer.getBeliefs(c);

This gives us the probability distribution P(c | a = "false", b = "three").

Potential pitfalls

Inference algorithms use an internal representation of the network that will not be updated when you update the BayesNet. Should your use case require changes to the BayesNet, you need to call IBayesInferer.setNetwork() again to update the internal representation.
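Assuming the net, nodes, evidence map, and inferer from the snippets above, the pattern is as follows (a sketch, not a complete program):

```java
// change the model, e.g. give node a a new prior distribution
a.setProbabilities(0.3, 0.7);

// the inferer still holds the old internal representation,
// so hand the network over again to rebuild it
inferer.setNetwork(net);

// to be safe, set the evidence again before querying beliefs
inferer.setEvidence(evidence);
double[] beliefsC = inferer.getBeliefs(c);
```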

We discourage using a BayesNode from a different network as a parent. The inference algorithms will access all BayesNodes through the BayesNet and this mixing of BayesNodes from different networks is very likely to lead to errors.

Advanced features

In Jayes 2.0, we added several advanced features that let you trade off between three major performance indicators of the inference engine: memory consumption, runtime performance, and numerical stability. For example, in terms of numerical stability, one limitation of Jayes is network size. With increasing network size, any observed event becomes so unlikely that it is numerically indistinguishable from an impossible event. Everything then appears to have zero probability, which is of course not true, so Jayes throws an exception to inform the user about this situation. Some of the advanced features described below influence when this problem appears, and therefore how large your networks can become.

Out-of-the-box, Jayes allows for fine-tuning in several areas:

  • Floating point representation
  • Logarithmic values
  • Graph elimination algorithm
  • Factor decomposition

Floating point representation

Jayes can compute with double precision as well as single precision. Using single precision consumes less memory, but networks with more than ~200 variables are likely to suffer from numerical instability. Double precision can, on the other hand, easily support several thousand variables.

To set the floating point representation, use org.eclipse.recommenders.jayes.factor.FactorFactory.setFloatingPointType(Class); valid arguments are float.class and double.class. The default is double precision, but using single precision reduces memory consumption by approximately 50% and has no measurable impact on runtime performance.

Important: You need to set floating point precision before the network is set in the inference algorithm.

JunctionTreeAlgorithm algo = new JunctionTreeAlgorithm();
// switch the factory to single precision before setting the network
algo.getFactory().setFloatingPointType(float.class);
algo.setNetwork(net);

Logarithmic values

Jayes can also use logarithmic values internally. This drastically improves numerical stability, but approximately doubles the time needed for inference. The FactorFactory is again the class that provides this option.

JunctionTreeAlgorithm algo = new JunctionTreeAlgorithm();
// enable logarithmic computation via the algorithm's FactorFactory
// (algo.getFactory()) before calling setNetwork()
algo.setNetwork(net);

Graph elimination algorithm

Jayes also lets you choose the graph elimination algorithm used to generate the junction tree that JunctionTreeAlgorithm uses internally. This influences the setup time of the algorithm, as well as, potentially, the performance of the resulting inference engine, both in terms of memory consumption and runtime. Two heuristic algorithms are available, which can be set in the JunctionTreeAlgorithm. Both reside in the org.eclipse.recommenders.jayes.util.triangulation package.

  • MinFillIn: the best available quality, but not suited for big networks with several hundred variables, as loading will take too long. It is thus best suited for small, complex networks. This is the default.
  • MinDegree: suitable for networks of any size, but with complex networks the quality may suffer a bit. This can lead to a higher memory footprint, increased inference times, and eventually the danger of numerical instability.

JunctionTreeAlgorithm algo = new JunctionTreeAlgorithm();
JunctionTreeBuilder builder = JunctionTreeBuilder.forHeuristic(new MinFillIn());
algo.setJunctionTreeBuilder(builder); // attach the heuristic before setNetwork()
algo.setNetwork(net);

Factor decomposition

For probability distributions learned from real data, many parameters are zero because of a lack of data. However, in order to predict in previously unseen cases, the distributions are smoothed: some of the probability mass is distributed among the cases we did not see in our data.
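As a self-contained illustration (plain Java, independent of Jayes; alpha is a hypothetical smoothing strength), Laplace-style smoothing turns counts containing zeros into a dense distribution:

```java
public class Smoothing {

    // Turns raw outcome counts into a smoothed probability distribution:
    // every outcome, even one never observed, receives some probability mass.
    static double[] smooth(int[] counts, double alpha) {
        int total = 0;
        for (int count : counts) {
            total += count;
        }
        double[] probs = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            probs[i] = (counts[i] + alpha) / (total + alpha * counts.length);
        }
        return probs;
    }

    public static void main(String[] args) {
        // the third outcome was never observed, yet gets non-zero probability
        double[] smoothed = smooth(new int[] { 8, 2, 0 }, 1.0);
        System.out.println(java.util.Arrays.toString(smoothed));
    }
}
```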

Jayes is able to take advantage of sparse distributions. However, smoothing intentionally leads to non-sparse distributions. Using some linear-algebra magic, Jayes provides algorithms that make the smoothing an explicit part of the model, in the form of new variables. This allows the distributions to stay sparse, which saves memory (about 20-30% for our models). The extra variables make the model more complex, leading to increased inference times (about twice the time for our models). So this, again, is a memory/time trade-off.

The decomposition algorithms can be found in the org.eclipse.recommenders.jayes.transformation bundle. The algorithm to use for smoothed distributions is the SmoothedFactorDecomposition. This class has one public method, decompose(BayesNet,BayesNode), which will decompose the given BayesNode and augment the network with the results.
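Assuming the net, node c, and inferer from the earlier snippets, usage might look like this (a sketch; the text above only documents decompose(BayesNet, BayesNode), so the instantiation details are my assumption):

```java
// decompose node c's smoothed CPT; this augments the network
// with new auxiliary variables holding the sparse factors
new SmoothedFactorDecomposition().decompose(net, c);

// the network changed structurally, so re-set it (see "Potential pitfalls")
inferer.setNetwork(net);
```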

Here are the most important things to think about when using this feature:

  • Evaluate the use of this feature for every model you use; for some models there will be no memory benefit.
  • Decomposing all nodes is not the best strategy, so choose which nodes to decompose. The more a distribution has been smoothed, the more the decomposition will pay off.

Jayes and You

I hope this article has given you an overview of what Jayes does and how you can use it. If you have any questions regarding Jayes, please contact us on the Eclipse Code Recommenders mailing list.

If you like what the Code Recommenders and Codetrails team is doing and want to keep up to date: Follow us on Google+.


Ashwin Jayaprakash:

Hi Michael, would you be able to tell me, or direct me to, how the Eclipse plugin builds this network? I imagine the network is very complex and built by analyzing the AST of all loaded classes.

I looked in the source code but couldn't figure out where the network was actually being built.

(Source links: #1 and #2)


Michael Kutschke:

Hi Ashwin,

The static analysis and the training routines are actually closed source at the moment, which is why you can't find them. As Marcel said on the mailing list, though, there is an example of how to create an override model from serialized JSON data here:

Sam:

Hi; thanks for the post. My problem is how to add the Jayes API to the path of a project. Is there any .jar to use it directly?

Thanks in advance

Thamarai Selvan:

Hi, can you give me some sample inputs for Jayes (preferably ones where the attributes have values that determine the class)?

Thank you

Michael Kutschke:


What exactly are you looking for? There are two small example networks in the project (dog and rain), which you can load with the appropriate readers from the project. Those should be enough to play around with a little bit.

For examples on how to use the API, the network examples I use for testing are a good place to look at.

Does that help? It seems like you are looking at classification. You can also look at the actual Code Recommenders models if you like; code for loading and exporting those was just recently contributed here. These models look very similar to what a Naive Bayes classifier would look like, although we use them differently.

David:

Hi Michael,

I'm interested in providing a network with soft / virtual evidence, and I see some indications that you've worked on this (git:// refs/changes/44/30544/5), but it doesn't seem to have been merged into the main branch, and development stopped ~8 months ago. Are there any plans to merge this soon?



Michael Kutschke:

Hi David,

You can give the code you found a try; it should be working. Currently there are no plans to merge this into the main branch any time soon, but it would be great to get feedback concerning the API and any bugs. The reason we are reluctant to merge this is that we as a project don't need the functionality, while it makes the code more complex and doesn't have a natural place in the API so far. I can rebuild the thing tonight to see why the build failed 8 months ago, but feel free to give it a test ride. I think the problem was related to non-functional things.

Jhonatan Oliveira:

Hi Michael.

First, congratulations on the library!
I have tried to implement multiplication/marginalization of factors before (mostly by following the proposed "efficient algorithm" in Koller & Friedman's 2009 book), but the result was too slow even after further optimizations.

On the other hand, Jayes is very fast. I was wondering if you could give me more details, either by recommending a paper or by sharing some insights into how you implemented factor operations (multiplication/marginalization) in Jayes.

Thank you.

Best regards.

Prithvi Shah:

Why is this library not able to tackle the reverse inference problem? i.e. P(a| b="three", c="false"), where a and b are parents of c.

Michael Kutschke:

Hi Prithvi, this kind of query should be supported. Do you have problems with a specific example? In fact, the algorithm used for inference works on an undirected model internally, so parent-child relations don't matter at inference time.

Prithvi Shah:

I got it, thanks!

Julia Walker:

Just downloaded the Jayes libraries and went through your example above. I noticed that some of the classes, such as IBayesInferer, JunctionTreeAlgorithm, etc., are deprecated, but found no comment in the Javadoc about which classes replace them. Thanks.


Michael Kutschke:

Hi Julia,

The replacements are:

IBayesInferer -> IBayesInferrer (the first version is spelled wrong...)

AbstractInferer -> AbstractInferrer (again, spelling; we were cautious with API changes, so we decided to deprecate rather than just rename)

JunctionTreeAlgorithm is not deprecated. In the source code, there is a SuppressWarnings annotation for deprecation; it is just there so we can still use AbstractInferer as a base class, for backwards-compatibility reasons, without getting a warning. The same goes for the other classes that use this annotation. The deprecated methods in BayesNode and BayesNet have comments on what you should use instead. Any others I missed?

Ljuba:

Hi Michael,

Please can you tell me which is the correct Jayes version? I checked this page: and in the pom.xml there is no Jayes version.



Ehsan:


Is there any jar file of this library which I can use in another project? I would like to use this library in an Android project. Is it possible?


Johannes Dorn:

Hi Ehsan,

Have a look at my previous comment. The repository contains the Jayes JARs.
