Version 1.1
2014-01-31
A Web-based version is available at http://www.cstdbill.com/cgi-bin/yabte. Versions that you can run off-line are coming Real Soon Now. (If anyone would like to help with that, see the internal documentation, which contains links to the current source code.)
This CGI version requires that JavaScript be enabled in your browser.
I still consider this to be a beta version of the program.
I believe that everything works as advertised;
but the only browsers that I’ve tested it on so far are Firefox 25.0.1
and
[Input form omitted. Its recoverable labels are: Validation accuracy: ±10^−(exponent entry); Displayed Result: Precision, with a choice of Decimal, Percent or Odds; Most recent result.]
Note that this beta version provides no mechanism for “signing in,” and so there’s no way to keep you from modifying the work of others. Be nice. ☺
Hypothesis Name: This is a handle that the program will use to identify a particular complete hypothesis. Similar to the User Name field, it will actually wind up being the name of a disk file; so it has the same requirements as do User Name entries. There’s no default.
To begin entering data for a new complete hypothesis, or to return to one already saved, just enter the new or existing name. The page will reload when you move away from the Hypothesis Name field as if you had clicked the Reset button.
You can also use a complete hypothesis simply to give a name to a constant value.
Description: This is strictly for H. sapiens
who are trying to understand your work. You may enter any arbitrary
ISO 10646
text in this field except that
you can’t have an ASCII record separator,
Prior and Consequent: These are probabilities.
They may, in general, be any arithmetic expression that yields a value,
or range of values (a pair of probabilities representing a minimum
and a maximum), between
There will always be at least one alternate hypothesis. When there’s exactly one, any prior you enter will simply be ignored (so you can just leave it blank), and the program will use the complement of the main hypothesis’ prior for the alternate’s. (When there’s exactly one alternate hypothesis, it’s “not the main one” by definition.) All the priors (main and all the alternates) must add up to 1. How the program enforces this is explained later.
Except for the only alternate’s prior, if you leave any probability entry blank, it will default to 1.
The Add Alt. button creates another alternate hypothesis that’s ready for your input.
The Remove Alt. button deletes the final alternate hypothesis. (This is mainly to allow you to get rid of an alternate if you clicked Add Alt. by mistake; but note that it’ll delete the final alternate even if the hypothesis has already been saved.) Note that the program won’t let you delete the only alternate.
The Save button saves the current complete hypothesis for later. It doesn’t validate the input in any way; that’s done by the Compute button.
The Delete button permanently deletes the complete hypothesis.
The Reset button restores all previously saved data, or just clears all entered data except the user and hypothesis names if the current complete hypothesis has yet to be saved.
The Compute button actually performs the calculation. If parsing any probability entry fails, or if the calculation of any probability yields a value less than 0 or greater than 1, you’ll get an error message and the calculation will abort. If you’ve made changes since the last time you saved the complete hypothesis, Compute will first save the data as if you had clicked Save.
Note that you would normally just enter a positive number here;
but if you don’t, that’s
Displayed Result: These are options that affect how the most recent computed result, if any, will be shown.
In the author’s CGI, the decimal point will always be displayed as a period. In off-line versions coming Real Soon Now, there will probably be some mechanism to allow the user to specify the decimal point character.
You can create probability values that are named constants. Just
create a complete hypothesis with one alternate, put the constant value
in the main prior, and leave everything else blank (so it defaults
to 1 as stated above). For example, let’s say
that one out of five writings by somebody named Billvs has some
serious error, and that this value is useful in several of your
hypotheses. You could create a
$$\frac{0.2 \times 1.0}{0.2 \times 1.0 + 0.8 \times 1.0} = \frac{0.2}{0.2 + 0.8} = \frac{0.2}{1.0} = 0.2$$

and then you could use “billvs_error_rate” instead of “0.2” in other priors and consequents to make those entries more self-documenting.
Here’s the formal grammar for probability entries in a modified Backus-Naur Form (BNF). Briefly:
probability-expression:
    additive-expression
    probability-expression '|' additive-expression
    probability-expression '+/-' additive-expression
    probability-expression '±' additive-expression

additive-expression:
    complement-expression
    additive-expression '+' complement-expression
    additive-expression '-' complement-expression

complement-expression:
    multiplicative-expression
    '~' complement-expression
    '¬' complement-expression

multiplicative-expression:
    odds-expression
    multiplicative-expression '*' odds-expression
    multiplicative-expression '×' odds-expression
    multiplicative-expression '/' odds-expression
    multiplicative-expression '÷' odds-expression

odds-expression:
    primary-expression
    primary-expression ':' primary-expression

primary-expression:
    decimal-number
    decimal-number '%'
    hypothesis-name
    other-user-name '\' hypothesis-name
    '(' probability-expression ')'
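
As a rough illustration of how this grammar reads, here is a minimal recursive-descent recognizer in C++. It only accepts or rejects an entry and computes nothing. It is a sketch and not the program's actual parser: the class name `Recognizer`, the helper `matches_grammar`, the rule that names look like C identifiers, and the rule that a decimal-number is digits with an optional period are all assumptions made for the example. It takes the ASCII `~` as the complement operator and matches the non-ASCII operators as UTF-8 byte sequences.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Accept/reject recognizer for the grammar above; one method per nonterminal.
class Recognizer {
public:
    explicit Recognizer(const std::string& text) : s(text), pos(0) {}
    bool accepts() {
        if (!probability()) return false;
        skip_ws();
        return pos == s.size();                  // the whole entry must be consumed
    }

private:
    std::string s;
    std::size_t pos;

    static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    static bool is_name_start(char c) {          // assumption: names look like C identifiers
        return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
    }
    void skip_ws() {
        while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
    }
    bool peek(const std::string& tok) {          // is tok next, ignoring whitespace?
        skip_ws();
        return s.compare(pos, tok.size(), tok) == 0;
    }
    bool eat(const std::string& tok) {           // consume tok if it is next
        if (!peek(tok)) return false;
        pos += tok.size();
        return true;
    }

    bool probability() {
        if (!additive()) return false;
        while (eat("|") || eat("+/-") || eat("\xC2\xB1"))      // U+00B1 plus-minus
            if (!additive()) return false;
        return true;
    }
    bool additive() {
        if (!complement()) return false;
        for (;;) {
            if (peek("+/-")) return true;        // leave the range operator to probability()
            if (!eat("+") && !eat("-")) return true;
            if (!complement()) return false;
        }
    }
    bool complement() {
        if (eat("~") || eat("\xC2\xAC")) return complement();  // U+00AC not sign
        return multiplicative();
    }
    bool multiplicative() {
        if (!odds()) return false;
        while (eat("*") || eat("/") || eat("\xC3\x97") || eat("\xC3\xB7"))  // U+00D7, U+00F7
            if (!odds()) return false;
        return true;
    }
    bool odds() {
        if (!primary()) return false;
        return eat(":") ? primary() : true;      // at most one ':' (non-associative)
    }
    bool primary() {
        if (eat("(")) return probability() && eat(")");
        if (pos < s.size() && (is_digit(s[pos]) || s[pos] == '.')) {
            if (!number()) return false;
            eat("%");                            // optional percent sign
            return true;
        }
        if (!name()) return false;
        return eat("\\") ? name() : true;        // other-user-name '\' hypothesis-name
    }
    bool number() {                              // assumption: digits with an optional period
        std::size_t digits = 0;
        while (pos < s.size() && is_digit(s[pos])) { ++pos; ++digits; }
        if (pos < s.size() && s[pos] == '.') {
            ++pos;
            while (pos < s.size() && is_digit(s[pos])) { ++pos; ++digits; }
        }
        return digits > 0;
    }
    bool name() {
        skip_ws();
        if (pos >= s.size() || !is_name_start(s[pos])) return false;
        while (pos < s.size() && (is_name_start(s[pos]) || is_digit(s[pos]))) ++pos;
        return true;
    }
};

bool matches_grammar(const std::string& text) { return Recognizer(text).accepts(); }

// A few entries checked against the grammar.
void check_examples() {
    assert(matches_grammar("(~60% | 0.7):(3 * some_other_hypothesis)"));
    assert(matches_grammar("~~0.2"));            // complements nest right-to-left
    assert(!matches_grammar("1:2:3"));           // ':' may not be chained
    assert(!matches_grammar("-0.5"));            // no unary minus
}
```

Each nonterminal becomes one method, and the left-recursive rules become loops, which is the usual way to hand-code a grammar of this shape.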
Operator precedence, highest to lowest:
    % \
    :
    * × / ÷
    ~ ¬
    + -
    | +/- ±
for example,
:, \ and %
are non-associative; in other words, you’re not allowed to write any of
The complement operators associate right-to-left; for example
All other operators with equal precedence associate left-to-right; for example,
The vertical bar (|) operator yields a range of values (minimum and maximum)
in no particular order. (The program will happily use the larger of the two
as the maximum, and so there’s no need for any kind
The +/- operator yields a range of values such that
the maximum is the sum of the two operands, and the minimum is the left-hand
operand minus the right-hand operand. The program will also recognize
±
as this operator. The ± character, U+00B1,
doesn’t seem to be present on Microsoft’s
The unary
Note that the complement operators’ precedence is lower than
that of the multiplicative operators but higher than the precedence
of the additive operators. In other words, multiplication and division
group tighter than complementing, but complementing groups tighter than
addition and subtraction. For example,
The program recognizes both * and
×
as the multiplication operator. The latter, U+00D7, is
The program recognizes both / and
÷
as the division operator. The latter, U+00F7, is
An expression with a
A primary-expression of the form,
The decimal point in a
A primary-expression of the form,
Any operation on a range of values yields a range of values.
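
To make the range idea concrete, here is a small C++ sketch of the two range-forming operators and the complement. It is illustrative only and not the program's code: the type `Range` and the function names are hypothetical, the operands are plain numbers for simplicity (the grammar also allows ranges as operands), and the way a complement maps a range's bounds is an assumption.

```cpp
#include <algorithm>
#include <cassert>

// A probability or a range of probabilities; a single value has lo == hi.
struct Range {
    double lo;
    double hi;
};

// a | b: the operands may be given in either order.
Range bar(double a, double b) {
    return Range{std::min(a, b), std::max(a, b)};
}

// a +/- b: the minimum is a - b and the maximum is a + b.
// Assumption for this sketch: b is not negative.
Range plus_minus(double a, double b) {
    assert(b >= 0.0);
    return Range{a - b, a + b};
}

// Complement. Assumption: the complement of [lo, hi] is [1 - hi, 1 - lo],
// so the bounds swap roles.
Range complement(Range r) {
    return Range{1.0 - r.hi, 1.0 - r.lo};
}

// The check that Compute is described as making: no value below 0 or above 1.
bool in_unit_interval(Range r) {
    return r.lo >= 0.0 && r.hi <= 1.0;
}
```

For instance, `plus_minus(0.95, 0.1)` has a maximum above 1, so `in_unit_interval` would reject it.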
Note that there’s no unary + or − allowed in a
Note also that this grammar allows for a very large superset of all the utterances that would actually make sense; so if I enter something like
    (~60% | 0.7):(3 * some_other_hypothesis)

the program won’t bat an eye; and if by chance that expression happens to yield minimum and maximum values between 0 and 1, the Bayes’ Theorem calculation will merrily proceed as advertised and give me an answer that’s no better than I deserve.
When there’s exactly one alternate, implementing this is trivial: the alternate’s prior is just 1 minus the main’s prior by definition. Even when the entered probabilities are ranges of values, it’s still no problem: the calculation is just done twice to create a minimum and a maximum.
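
A minimal C++ sketch of the one-alternate case follows. The names `Range`, `posterior` and `posterior_range` are hypothetical, and pairing main's minimum values with the alternate's maximum consequent (and vice versa) is an assumption that follows the description of the general case; the program's actual code may differ.

```cpp
#include <cassert>

struct Range { double lo, hi; };

// Bayes' Theorem for a main hypothesis with exactly one alternate.
// prior is the main prior, cons the main consequent, alt_cons the alternate's
// consequent. The alternate's prior is 1 - prior by definition.
double posterior(double prior, double cons, double alt_cons) {
    double numer = prior * cons;
    double denom = numer + (1.0 - prior) * alt_cons;
    assert(denom > 0.0);                 // this sketch does not handle a zero denominator
    return numer / denom;
}

// The same calculation done twice, once for each end of the result range.
Range posterior_range(Range prior, Range cons, Range alt_cons) {
    return Range{
        posterior(prior.lo, cons.lo, alt_cons.hi),   // minimum
        posterior(prior.hi, cons.hi, alt_cons.lo)    // maximum
    };
}
```

With single values, `posterior(0.2, 1.0, 1.0)` reproduces the named-constant case, where a prior of 0.2 with every consequent left at 1 yields 0.2.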
The interesting problem is what to do when there’s more than one alternate given that any probability entry, prior or consequent, can be a range of values.
What this program currently does is always do the calculation twice, once to create a minimum (using the main hypothesis’ minimum prior and consequent and the alternates’ maximum priors and consequents), and again to create a maximum (with main’s max values and the alternates’ min values). While the terms for the denominator are being accumulated, the priors are summed, and the minimum and maximum alternate consequents are noted.
Then just before the final division, if the priors didn’t sum to 1 within your selected accuracy range, one alternate’s prior is adjusted upward (if the prior sum is less than 1) or downward (if the sum is greater than 1):
Here’s an excerpt from the actual code if you’re curious:
    double zero = 1.0 - prior_sum;
    if (zero < -epsilon() || zero > epsilon())
    {
        bool use_max_cons = (zero < 0.0) == finding_max;
        bayes_denom += zero * (use_max_cons ? max_cons : min_cons);
    }
That clearly will create the most extreme value; but it might be too extreme: it’s not necessarily the case that the alternate with the largest, or smallest, consequent is the one that should have its prior adjusted.
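
To show the excerpt in context, here is a self-contained C++ sketch of one pass of the calculation with several alternates. Only the adjustment step comes from the author's excerpt. The surrounding function `one_pass`, the types `Range` and `Hypothesis`, passing `epsilon` as a parameter, and taking `min_cons` and `max_cons` over the consequent values used in the current pass are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Range { double lo, hi; };
struct Hypothesis { Range prior; Range cons; };

// One pass of the calculation. With finding_max false, main uses its minimum
// values and the alternates their maximums; with finding_max true, the reverse.
double one_pass(const Hypothesis& main_h, const std::vector<Hypothesis>& alts,
                bool finding_max, double epsilon) {
    double main_prior = finding_max ? main_h.prior.hi : main_h.prior.lo;
    double main_cons  = finding_max ? main_h.cons.hi  : main_h.cons.lo;
    double bayes_numer = main_prior * main_cons;
    double bayes_denom = bayes_numer;
    double prior_sum = main_prior;
    double min_cons = 1.0, max_cons = 0.0;

    for (const Hypothesis& alt : alts) {
        double p = finding_max ? alt.prior.lo : alt.prior.hi;
        double c = finding_max ? alt.cons.lo  : alt.cons.hi;
        bayes_denom += p * c;            // accumulate the denominator terms
        prior_sum += p;                  // sum the priors as we go
        min_cons = std::min(min_cons, c);
        max_cons = std::max(max_cons, c);
    }

    // Adjustment step, as in the author's excerpt: make up the shortfall or
    // excess in the priors using the most extreme alternate consequent.
    double zero = 1.0 - prior_sum;
    if (zero < -epsilon || zero > epsilon) {
        bool use_max_cons = (zero < 0.0) == finding_max;
        bayes_denom += zero * (use_max_cons ? max_cons : min_cons);
    }

    assert(bayes_denom > 0.0);           // this sketch does not handle a zero denominator
    return bayes_numer / bayes_denom;
}
```

As a worked case, take a main hypothesis with prior 0.4 to 0.6 and consequent 0.7 to 0.9, and two alternates with priors 0.2 to 0.3 and 0.1 to 0.4 and consequents 0.1 to 0.2 and 0.3 to 0.5. The priors sum to 1.1 in the minimum pass and 0.9 in the maximum pass, so both passes adjust, and the results come out near 0.538 and 0.9.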
It’s also not clear that the program should have the hubris to be quietly making this adjustment at all (as opposed to just complaining and letting the user decide what to do). The problem is that, when the probabilities are ranges of values, the sums of the minimum and maximum priors will almost certainly be less than 1, and greater than 1, respectively, possibly by large amounts, in which case error messages would be generated on every calculation, likely a major annoyance for the user.
Suggestions on how to improve this will be greatly appreciated.
<aside>
Proof that it really is a prior that’s being adjusted in the code snippet above: Let a = the required adjustment</aside>
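
One way to sketch the argument, in notation that is an assumption of this note and not necessarily the author's: write $p_0 c_0$ for the main hypothesis' term in the denominator, $p_i$ and $c_i$ for the prior and consequent of alternate $i$, and $a$ for the required adjustment (`zero` in the snippet). If alternate $k$'s prior is changed from $p_k$ to $p_k + a$, the denominator $D$ becomes

$$D' = p_0 c_0 + \sum_{i \ne k} p_i c_i + (p_k + a)\, c_k = D + a\, c_k$$

so adding $a\,c_k$ to the accumulated denominator has the same effect as adjusting that one prior by $a$. In the snippet, $c_k$ is `max_cons` or `min_cons`.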