YABTE
(Yet Another Bayes’ Theorem Emulator)

Version 1.1
2014-01-31

by Bill Seymour

Copyright Bill Seymour 2013, 2014.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)


Abstract:

This paper describes an open-source program that calculates probabilities using Bayes’ Theorem. It’s not intended for serious numerical work in the sciences or statistics, but rather is for use when the various input probabilities are not well known (when studying ancient history, for example), and so the answers aren’t expected to be particularly precise and might even be a rather broad range of probabilities (like “20% to 40%” or some such).

A Web-based version is available at http://www.cstdbill.com/cgi-bin/yabte. Versions that you can run off-line are coming Real Soon Now. (If anyone would like to help with that, see the internal documentation, which contains links to the current source code.)

This CGI version requires that JavaScript be enabled in your browser.

I still consider this to be a beta version of the program. I believe that everything works as advertised; but the only browsers that I’ve tested it on so far are Firefox 25.0.1 and IE 8.0. The mechanism for making sure that priors sum to 1 could probably be improved.


Acknowledgements:

I’d like to thank Richard Carrier for his input on how such a program would likely be used, and for his description of Bayes’ Theorem in Chapter 3 of Proving History: Bayes’s Theorem and the Quest for the Historical Jesus, Amherst, New York: Prometheus Books, 2012. ISBN 978-1-61614-559-0.


Brief glossary:

Here are three terms that Carrier uses in Proving History along with what I mean by them in this paper. (I hypothesize that I understand them. ☺) And three terms that I’ve made up in the hope of eliminating some ambiguity:


The User Interface:

User Name:
Hypothesis Name:
Main Hypothesis

Description:
Prior:
Consequent:
Alternate Hypothesis 1

Description:
Prior:
Consequent:
Alternate Hypothesis n

Description:
Prior:
Consequent:
         
Validation accuracy:  ±10
Displayed Result:   Precision:     Decimal    Percent    Odds
Most recent result:


The Input Fields:

User Name:  This is a string that will identify you as the user in the hope of keeping your work separate from others’. It will actually wind up being the name of a disk directory (a.k.a., “folder”); so to keep it maximally portable across various operating systems, it must begin with a letter of the English alphabet; and subsequent characters can be any English letter, decimal digit, or the underscore (_). It must not have any spaces in it. Case is not significant (for example, “foo”, “Foo”, and “FOO” are all considered the same string). You’ll probably want to set this once at the beginning of your session and not change it thereafter. If nothing is entered, it defaults to “j_random_user.”
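The naming rules above can be sketched in a few lines of C++. This is only an illustration, not the program’s actual code: the function names is_valid_name and canonical_user_name are made up for this example, and folding to lower case is an assumption (the text says only that case is not significant).

```cpp
#include <string>

// Hypothetical helper: true if `name` begins with an English letter and
// continues with only English letters, decimal digits, or underscores.
bool is_valid_name(const std::string& name)
{
    auto is_letter = [](char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    };
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };

    if (name.empty() || !is_letter(name[0])) return false;
    for (char c : name)
        if (!is_letter(c) && !is_digit(c) && c != '_') return false;
    return true;
}

// Hypothetical helper: supplies the documented default for an empty entry
// and folds case so that "foo", "Foo", and "FOO" all compare equal.
// (Folding to lower case rather than upper case is an assumption.)
std::string canonical_user_name(const std::string& name)
{
    if (name.empty()) return "j_random_user";
    std::string out;
    for (char c : name)
        out += (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
    return out;
}
```

With these definitions, “foo_1” passes the check, while “1foo” and “foo bar” do not.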

Note that this beta version provides no mechanism for “signing in,” and so there’s no way to keep you from modifying the work of others. Be nice. ☺

Hypothesis Name:  This is a handle that the program will use to identify a particular complete hypothesis. Similar to the User Name field, it will actually wind up being the name of a disk file; so it has the same requirements as do User Name entries. There’s no default.

To begin entering data for a new complete hypothesis, or to return to one already saved, just enter the new or existing name. The page will reload when you move away from the Hypothesis Name field as if you had clicked the Reset button.

You can also use a complete hypothesis simply to give a name to a constant value.

Description:  This is strictly for H. sapiens who are trying to understand your work. You may enter any arbitrary ISO 10646 text in this field except that you can’t have an ASCII record separator, U+001E, as the only character on a line.

Prior and Consequent:  These are probabilities. They may, in general, be any arithmetic expression that yields a value, or range of values (a pair of probabilities representing a minimum and a maximum), between 0 and 1, both inclusive. The program will also recognize a logical negation operator that yields what I’ll call the “logical complement of a probability” (1 minus its operand), a colon (:) operator used to express probabilities as odds, and vertical bar (|) and “+/−” operators for creating ranges of values. Any term may also be the name of some other hypothesis. There’s a complete description of what you can enter as probabilities later in this paper.

There will always be at least one alternate hypothesis. When there’s exactly one, any prior you enter will simply be ignored (so you can just leave it blank), and the program will use the complement of the main hypothesis’ prior for the alternate’s. (When there’s exactly one alternate hypothesis, it’s “not the main one” by definition.) All the priors (main and all the alternates) must add up to 1. How the program enforces this is explained later.

Except for the only alternate’s prior, if you leave any probability entry blank, it will default to 1.

The Add Alt. button creates another alternate hypothesis that’s ready for your input.

The Remove Alt. button deletes the final alternate hypothesis. (This is mainly to allow you to get rid of an alternate if you clicked Add Alt. by mistake; but note that it’ll delete the final alternate even if the hypothesis has already been saved.) Note that the program won’t let you delete the only alternate.

The Save button saves the current complete hypothesis for later. It doesn’t validate the input in any way; that’s done by the Compute button.

The Delete button permanently deletes the complete hypothesis.

The Reset button restores all previously saved data, or just clears all entered data except the user and hypothesis names if the current complete hypothesis has yet to be saved.

The Compute button actually performs the calculation. If parsing any probability entry fails, or if the calculation of any probability yields a value less than 0 or greater than 1, you’ll get an error message and the calculation will abort. If you’ve made changes since the last time you saved the complete hypothesis, Compute will first save the data as if you had clicked Save.

Options that are global for all hypotheses:

Validation accuracy:  This is how picky you want the program to be when enforcing the requirements that probabilities be between 0 and 1 and that all the priors add up to 1. By default, you get ±10⁻⁴, which means that individual probabilities can be between −0.0001 and +1.0001 for your calculation to proceed, and that the sum of all the priors can be between 0.9999 and 1.0001 (otherwise the program will make an adjustment as explained later).

Note that you would normally just enter a positive number here; but if you don’t, that’s OK…the program will happily use the negative of the absolute value of whatever integer you enter.
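Here is a minimal sketch of how such a tolerance check might look, assuming that the integer you enter, n, is used as the exponent in 10 to the power −|n|. The names tolerance, probability_ok, and priors_sum_ok are hypothetical and are not taken from the program.

```cpp
#include <cmath>
#include <cstdlib>

// Hypothetical: turns the entered integer n into a tolerance of 10^(-|n|),
// so that entering 4 or -4 both give 0.0001.
double tolerance(int n) { return std::pow(10.0, -std::abs(n)); }

// A probability passes if it lies in [0 - eps, 1 + eps].
bool probability_ok(double p, double eps) { return p >= -eps && p <= 1.0 + eps; }

// The priors pass if their sum is within eps of 1.
bool priors_sum_ok(double sum, double eps) { return std::fabs(1.0 - sum) <= eps; }
```

With the default of 4, a probability of 1.00005 would be accepted and 1.001 would be rejected.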

Displayed Result:  These are options that affect how the most recent computed result, if any, will be shown.

The last computed result, if any, will be displayed at the bottom. A range of values will normally be displayed as minimum to maximum unless the two values compare equal after rounding to your selected output precision, in which case you’ll get a single value.
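The collapsing of a range into a single value can be sketched as follows. This is an illustration for the Decimal display only; the function names and the “ to ” separator are assumptions, not the program’s actual output format.

```cpp
#include <cstdio>
#include <string>

// Hypothetical: formats one value with the given number of decimal places.
std::string format_value(double v, int precision)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*f", precision, v);
    return buf;
}

// Hypothetical: shows "min to max" unless both ends look identical after
// rounding to the selected precision, in which case one value is shown.
std::string format_result(double min, double max, int precision)
{
    std::string lo = format_value(min, precision);
    std::string hi = format_value(max, precision);
    return lo == hi ? lo : lo + " to " + hi;
}
```

Comparing the rounded text rather than the raw numbers is what makes the two ends “compare equal after rounding”.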

In the author’s CGI, the decimal point will always be displayed as a period. In off-line versions coming Real Soon Now, there will probably be some mechanism to allow the user to specify the decimal point character.


Named constants:

Many computer programmers like to avoid what they call magic numbers. These are (usually integer) constants for things like the number of elements in an array or the number of times to execute a loop. Careful programmers will give these values meaningful names so that, a week, a month, or a year from now, when they have to make changes to the program for some reason, they won’t have to puzzle out what all those numeric literals are for.

You can create probability values that are named constants. Just create a complete hypothesis with one alternate, put the constant value in the main prior, and leave everything else blank (so it defaults to 1 as stated above). For example, let’s say that one out of five writings by somebody named Billvs has some serious error, and that this value is useful in several of your hypotheses. You could create a one-alternate “hypothesis” called billvs_error_rate, set its prior to 0.2, and just leave everything else blank. The unsurprising calculation would be:

           0.2 × 1.0              0.2        0.2
     —————————————————————  =  —————————  =  ———  =  0.2
     0.2 × 1.0 + 0.8 × 1.0     0.2 + 0.8     1.0
and then you could use “billvs_error_rate” instead of “0.2” in other priors and consequents to make those entries more self-documenting.


A formal description of probability entries:

Here’s the formal grammar for probability entries in a modified Backus-Naur Form (BNF). Briefly:

probability-expression:
  additive-expression
  probability-expression '|' additive-expression
  probability-expression '+/-' additive-expression
  probability-expression '±' additive-expression

additive-expression:
  complement-expression
  additive-expression '+' complement-expression
  additive-expression '-' complement-expression

complement-expression:
  multiplicative-expression
  '˜' complement-expression
  '¬' complement-expression

multiplicative-expression:
  odds-expression
  multiplicative-expression '*' odds-expression
  multiplicative-expression '×' odds-expression
  multiplicative-expression '/' odds-expression
  multiplicative-expression '÷' odds-expression

odds-expression:
  primary-expression
  primary-expression ':' primary-expression

primary-expression:
  decimal-number
  decimal-number '%'
  hypothesis-name
  other-user-name '\' hypothesis-name
  '(' probability-expression ')'

Semantics:

Operator precedence, highest to lowest:
    %  \
    :  
    *  ×  /  ÷
    ˜  ¬
    +  −
    |  +/−  ±
for example, x+y*z means x+(y*z) because * has the higher precedence.

:, \ and % are non-associative; in other words, you’re not allowed to write any of “x:y:z”, “x\y\z” or “x%%”.

The complement operators associate right-to-left; for example ˜˜x means ˜(˜x) which, in turn, just means x.

All other operators with equal precedence associate left-to-right; for example, x+y−z means (x+y)−z.

The vertical bar (|) operator yields a range of values (minimum and maximum) in no particular order. (The program will happily use the larger of the two as the maximum, and so there’s no need for any kind of “down to” operator.)

The +/- operator yields a range of values such that the maximum is the sum of the two operands, and the minimum is the left-hand operand minus the right-hand operand. The program will also recognize ± as this operator. The ± character, U+00B1, doesn’t seem to be present on Microsoft’s US-International keyboard; but it’s Shift+Option+= on the Mac.

The unary tilde (˜) operator yields what I’ll call the “logical complement of the probability”, that is, 1 minus the operand. For example, ˜0.25 means 0.75. The program also recognizes logical NOT (¬) as this operator. The ¬ character, U+00AC, is AltGr+\ on Microsoft’s US-International keyboard, and Option+l (letter l) on the Mac.

Note that the complement operators’ precedence is lower than that of the multiplicative operators but higher than the precedence of the additive operators. In other words, multiplication and division group tighter than complementing, but complementing groups tighter than addition and subtraction. For example, ˜1/4 = 3/4; but ˜0.25 + 0.1 = 0.75 + 0.1 = 0.85. If you want the complement operator to apply to, say, a whole additive-expression, just use parentheses for grouping. For example, ˜(0.25 + 0.1) = ˜0.35 = 0.65.

The program recognizes both * and × as the multiplication operator. The latter, U+00D7, is AltGr+= on Microsoft’s US-International keyboard, but doesn’t seem to be present on the Mac.

The program recognizes both / and ÷ as the division operator. The latter, U+00F7, is Shift+AltGr+= on Microsoft’s US-International keyboard, and Option+/ on the Mac.

An expression with a colon (:) is a probability expressed as odds. In other words, x:y means x/(x+y). The expression, ˜x:y, just means y:x; for example, the complement of “three-to-five odds” is “five-to-three odds”.

A primary-expression of the form, decimal-number %, is a probability expressed as a percentage. For example, 10% means 0.1.

The decimal point in a decimal-number may be either a period or a comma, which should take care of most Occidental locales. Other grouping marks (e.g., thousands separators) are not allowed.

A primary-expression of the form, hypothesis-name, is the posterior probability of the named complete hypothesis. If you’re sharing work with some other user, you can include that person’s user name and a backslash (\) before the hypothesis-name.

Any operation on a range of values yields a range of values.
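A range of values and the two range-forming operators might be modeled like this. It is a sketch under assumptions: the Range type and the function names are hypothetical, and the text doesn’t say how the program itself represents ranges.

```cpp
#include <algorithm>

// Hypothetical representation of a range of probabilities.
struct Range { double min; double max; };

// x | y: the operands may come in either order.
Range bar(double x, double y) { return { std::min(x, y), std::max(x, y) }; }

// x +/- y: from x - y up to x + y (reordered if y happens to be negative,
// which is an assumption beyond what the text states).
Range plus_minus(double x, double y) { return bar(x - y, x + y); }

// Complement of a range: the ends swap roles, so the result is still
// ordered from minimum to maximum.
Range complement(Range r) { return { 1.0 - r.max, 1.0 - r.min }; }
```

For example, bar(0.4, 0.2) and bar(0.2, 0.4) both give the range 0.2 to 0.4, and plus_minus(0.5, 0.25) gives 0.25 to 0.75.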

Note that there’s no unary + or − allowed in a decimal-number. I could easily add that if I get complaints from users; but, for reasons beyond the scope of this paper, it would result in spaces sometimes being significant in probability expressions; and since we’re dealing with non-negative numbers generally, that would seem to be more trouble than it’s worth for the user.

Note also that this grammar allows for a very large superset of all the utterances that would actually make sense; so if I enter something like

    (~60% | 0.7):(3 * some_other_hypothesis)
the program won’t bat an eye; and if by chance that expression happens to yield minimum and maximum values between 0 and 1, the Bayes’ Theorem calculation will merrily proceed as advertised and give me an answer that’s no better than I deserve.


What it means for the priors to sum to 1:

There’s a requirement that all the priors (main and all the alternates) add up to 1.

When there’s exactly one alternate, implementing this is trivial:  the alternate’s prior is just 1 minus the main’s prior by definition. Even when the entered probabilities are ranges of values, it’s still no problem:  the calculation is just done twice to create a minimum and a maximum.

The interesting problem is what to do when there’s more than one alternate given that any probability entry, prior or consequent, can be a range of values.

The program currently always performs the calculation twice: once to create a minimum (using the main hypothesis’ minimum prior and consequent and the alternates’ maximum priors and consequents), and again to create a maximum (with the main hypothesis’ maximum values and the alternates’ minimum values). While the terms for the denominator are being accumulated, the priors are summed, and the minimum and maximum alternate consequents are noted.

Then, just before the final division, if the priors don’t sum to 1 within your selected accuracy range, one alternate’s prior is adjusted upward (if the prior sum is less than 1) or downward (if the sum is greater than 1).

Here’s an excerpt from the actual code if you’re curious:
        double zero = 1.0 - prior_sum;
        if (zero < -epsilon() || zero > epsilon())
        {
            bool use_max_cons = (zero < 0.0) == finding_max;
            bayes_denom += zero * (use_max_cons ? max_cons : min_cons);
        }

That clearly will create the most extreme value; but it might be too extreme:  it’s not necessarily the case that the alternate with the largest, or smallest, consequent is the one that should have its prior adjusted.

It’s also not clear that the program should have the hubris to be quietly making this adjustment at all (as opposed to just complaining and letting the user decide what to do). The problem is that, when the probabilities are ranges of values, the sums of the minimum and maximum priors will almost certainly be less than 1, and greater than 1, respectively, possibly by large amounts, in which case error messages would be generated on every calculation, likely a major annoyance for the user.

Suggestions on how to improve this will be greatly appreciated.

Proof that it really is a prior that’s being adjusted in the code snippet above:

Let a = the required adjustment
Let p = the prior to be adjusted
Let c = that prior’s consequent
Let d0 = the computed denominator
Let d1 = the adjusted denominator

d1 = d0 − pc + (p + a)c
d1 = d0 + (−p + p + a)c
d1 = d0 + ac


All suggestions and corrections will be welcome; all flames will be amusing.
Mail to was at pobox dot com.