### Omar Javed

Discussion - Nice high-level talk on the difference between Frequentism and Bayesianism

https://clip.mn/video/yt-KhAUfqhLakw




moderator

This is perhaps the first real crack in the wall for the almost-universal use of the null hypothesis significance testing procedure (NHSTP). The journal, Basic...


I guess they took the phrase "lies, damned lies, and statistics" a bit too literally. By the way, this is not really a "crack": Cumming's "dance of the p-values" argument is valid, but do we have a better alternative that keeps its simplicity?


In my PhD thesis I am working on a historical language change from Latin to Old French (probably a very boring subject for many people), and I am trying to use Bayesian inference on my data, which is mainly categorical (with a logistic regression GLM as a prior). Surprisingly, I have never seen any previous linguistic study that has used Bayesian statistics (except language-evolution prediction models, which are different). I am not even sure how to present and explain my choice of priors and my models to a non-statistical, non-Bayesian audience. I would greatly appreciate your insights!


Thank you, Mad!


Help me solve this:

"In a study of 1000 people, 8% were found to have tuberculosis. The 1000 people were then given a new test, which came back positive for 96% of those who have the disease and 2% of those who don't. What's the probability that a randomly chosen perso
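The question is cut off, but it reads like the classic "probability of disease given a positive test" exercise; under that assumption, a quick sketch:

```python
# Classic Bayes' theorem calculation for the tuberculosis question above,
# assuming the intended question is: given a positive test, what is the
# probability the person actually has the disease?

prevalence = 0.08        # P(TB): 8% of the 1000 people have tuberculosis
sensitivity = 0.96       # P(positive | TB)
false_positive = 0.02    # P(positive | no TB)

# Law of total probability: overall chance of a positive test
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' theorem: P(TB | positive)
p_tb_given_positive = sensitivity * prevalence / p_positive

print(f"P(positive) = {p_positive:.4f}")
print(f"P(TB | positive) = {p_tb_given_positive:.4f}")
```

With these numbers the posterior comes out around 0.81, because the test is fairly accurate and the 8% prevalence is not tiny.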


I have been asked to improve the way that predictions of the reliability of my company's products are made. I think I understand Bayes' theorem, but putting it into practice is another thing. I have the number of hours each sub-system runs before it fails, whether we have a corrective action for the failure mode (I don't have a lot of confidence in the root causes or effectiveness), the total number of machines built in a quarter, and the percentage of new content in the next generation of machine. So can I predict what the reliability will be at the start of production of the new product, and a year into production? Can I use Excel, R, Minitab? I use a Mac, by the way.


Since no one else has answered you, I will: sure you can, and you can do it in Excel and especially in R. You have to use survival analysis, with the number of machines etc. as covariates. See for example:

http://www.hindawi.com/journals/mpe/2012/329489/

or

http://www.springer.com/statistics/physical+%26+information+science/book/978-0-387-77948-5

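As a hedged sketch of the purely Bayesian side (separate from the survival-analysis references above): if sub-system lifetimes are roughly exponential, a conjugate Gamma prior on the failure rate gives a closed-form update. All the numbers below are invented for illustration:

```python
# Illustrative only: Bayesian update of an exponential failure rate with a
# conjugate Gamma prior. All numbers below are invented for the example.

# Prior belief about the failure rate lambda (failures per hour):
# Gamma(shape=a0, rate=b0), with prior mean a0 / b0.
a0, b0 = 2.0, 20000.0          # prior mean rate: 1e-4 failures/hour

# Hypothetical field data: observed failures and total accumulated run hours
failures = 5
total_hours = 80000.0

# Conjugate update: posterior is Gamma(a0 + failures, b0 + total_hours)
a_post = a0 + failures
b_post = b0 + total_hours

posterior_mean_rate = a_post / b_post
mttf = 1.0 / posterior_mean_rate   # implied posterior-mean time to failure

print(f"posterior mean failure rate: {posterior_mean_rate:.2e} per hour")
print(f"implied MTTF: {mttf:.0f} hours")
```

Covariates like machine counts or percent new content need a regression model (e.g. the survival models in the links above); this sketch only shows the prior-to-posterior mechanics.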


moderator

Interesting to think about. Bayesian statistics became popular when it became computationally feasible due to MCMC and Moore's Law. Could it become infeasible again due to new larger data sets and become less popular?

For our first seminar of the year, we are very pleased to have a talk which will combine two themes close to the heart of the statistics department: Steve Scott, "Bayes and Big Data": Abstract: A useful definition of "big data" is data that is too big to fit on a single machine, either because ...


moderator

"Some of my fellow scientists have it easy. They use predefined methods like linear regression and ANOVA to test simple hypotheses; they live in the innocent world of bivariate plots and lm(). Sometimes they notice that the data have odd histograms and they use glm(). The more educated ones use generalized linear mixed effect models."

http://www.r-bloggers.com/the-joy-and-martyrdom-of-trying-to-be-a-bayesian/



Excellent article. Thanks for sharing!


moderator

"The most important thing to note about these categorizations is that the type of randomness depends on your perspective. The cards you hold in your hand are Type 0 randomness to you, but to the person sitting across the poker table from you, they are Type 2 randomness."

http://www.statisticsblog.com/2012/02/a-classification-scheme-for-types-of-randomness/



Pre-Bayesian: Ridiculous, probabilities are without doubt objective. They can be seen in the relative frequencies they cause.

Bayesian: So if p = 0.75 for some event, after 1000 trials we'll see exactly 750 such events?

Pre-Bayesian: You might, but most likely you won't see that exactly. You're just likely to see something close to it.

Bayesian: Likely? Close? How do you define or quantify these things without making reference to your degrees of belief for what will happen?

Pre-Bayesian: Well, in any case, in the infinite limit the correct frequency will definitely occur.

Bayesian: How would I know? Are you saying that in one billion trials I could not possibly see an "incorrect" frequency? In one trillion?

Pre-Bayesian: OK, you can in principle see an incorrect frequency, but it'd be ever less likely!

Bayesian: Tell me once again, what does 'likely' mean?
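As a side note, the "exactly 750" point is easy to quantify: even though 750 is the single most probable count, its probability is small. A quick sketch in plain Python:

```python
from math import comb

# Probability of exactly k successes in n Bernoulli(p) trials
def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p)**(n - k)

# p = 0.75, 1000 trials: 750 is the most probable single count,
# yet it occurs with probability under 3%.
p_exact = binom_pmf(750, 1000, 0.75)
print(f"P(exactly 750 of 1000) = {p_exact:.4f}")
```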


+charles griffiths But sometimes a decision needs to be made, whether or not we have "answers about the actual world". Losses are least using the Bayesian approach.


moderator

"This is the Bayesian approach. You have a belief according to existing evidence and theories. If a new bit of evidence comes in you don’t discard all prior knowledge, or pretend that we currently know nothing. You simply update your belief, adding the new information to existing information. In this way our beliefs slowly evolve, tracking with new evidence and ideas (unless you have a large emotional investment in one belief, but that’s another post)."

http://theness.com/neurologicablog/index.php/in-defense-of-prior-probability/



This is a writing problem concerning a Bayesian analysis I hope to publish. There is a simple idea that I just can't justify succinctly. People must have to deal with it all the time, but I can't find any references. It's driving me nuts!

The concept I want to express: as we wish to consider a larger range of data values, a model must be made more complicated in order to remain useful.

As an example: if I drop a rock a distance of one metre, I can probably get away with a constant-acceleration model. If I drop a rock a distance of a kilometre, I have to consider air resistance. If I drop it a distance of 1000 kilometres, I must consider orbital dynamics.

Correspondingly, one way of managing the need for model complexity in a Bayesian model is to limit the range of data values. In my particular circumstances, I can do that at an acceptable cost.

There must be a name for this concept. It's got to be published somewhere. Google is failing me. The paper will lose a lot of focus if I have to chase this tangent. Can anybody suggest a useful reference? Or even a useful term to google?
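The rock-drop example can be made concrete with a rough simulation. The drag constant below is invented purely for illustration; the point is only that the simple and complicated models diverge as the range of the data grows:

```python
# Compare fall times for a rock: constant-acceleration model vs. a model with
# quadratic air drag. Parameters are invented for illustration.
from math import sqrt

G = 9.81        # gravitational acceleration, m/s^2
K = 0.005       # drag constant (c_d * rho * A / (2 m)), 1/m -- made up

def fall_time_vacuum(distance: float) -> float:
    """Constant-acceleration model: d = g t^2 / 2."""
    return sqrt(2.0 * distance / G)

def fall_time_drag(distance: float, dt: float = 1e-3) -> float:
    """Euler integration of a = g - K v^2 (quadratic drag)."""
    t, v, d = 0.0, 0.0, 0.0
    while d < distance:
        v += (G - K * v * v) * dt
        d += v * dt
        t += dt
    return t

for d in (1.0, 1000.0):
    t0, t1 = fall_time_vacuum(d), fall_time_drag(d)
    print(f"{d:7.0f} m: vacuum {t0:7.2f} s, with drag {t1:7.2f} s")
```

Over one metre the two models agree to well under a percent; over a kilometre the drag model takes substantially longer, so the simple model stops being useful.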


Dan Mazur


The concept is just that of approximation.

I would say "Within the range <describe range>, the model can be approximated by a simplified version where <list parameters> are ignored."



Good evening all,

I have encountered a counter-intuitive result while thinking about Bayesian networks and decided to ask the members of this group.

Suppose A is the probability space of all possible events (with P(A) = 1, of course).

Now suppose A is partitioned into A1 and A2 such that P(A1) = P(A2) = 0.5, and let a be some event in A.

According to Bayes' rule,

P(A1|a) + P(A2|a) = P(A1)/P(a) * P(a|A1) + P(A2)/P(a) * P(a|A2)

= 0.5/P(a) * (P(a|A1) + P(a|A2)) = 0.5, since P(A1) = P(A2) = 0.5.

P(A1 ∪ A2) = 1 and A1 and A2 are disjoint, yet P(A1 ∪ A2 | a) ≠ P(A1|a) + P(A2|a) = 0.5.

But A1 and A2 are disjoint and P(A1) + P(A2) = 1, so a must be fully contained in the union of A1 and A2, since it is contained in the universal probability space A.

Likewise, P(a|A1) + P(a|A2) = (P(a)/0.5) * (P(A1|a) + P(A2|a)).

My head is spinning from this. Is there a rationalization I don't know about?
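As a sanity check: by the law of total probability, P(a) = 0.5 · (P(a|A1) + P(a|A2)), so the factor 0.5/P(a) · (P(a|A1) + P(a|A2)) equals 1, not 0.5, and posteriors over a partition always sum to 1. A concrete check with a made-up four-outcome space:

```python
# Concrete check of the partition calculation with a 4-outcome space.
# Outcomes 1,2 form A1; outcomes 3,4 form A2; each outcome has probability 0.25.
P = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
A1, A2 = {1, 2}, {3, 4}
a = {2, 3, 4}                      # an arbitrary event straddling the partition

def prob(event):
    return sum(P[w] for w in event)

def cond(x, y):                    # P(x | y)
    return prob(x & y) / prob(y)

# Posteriors over the partition sum to 1 (up to float rounding), not 0.5:
total = cond(A1, a) + cond(A2, a)
assert abs(total - 1.0) < 1e-12

# 0.5 * (P(a|A1) + P(a|A2)) is exactly P(a) itself, by total probability,
# so 0.5/P(a) * (P(a|A1) + P(a|A2)) = 1.
assert abs(0.5 * (cond(a, A1) + cond(a, A2)) - prob(a)) < 1e-12
print("posteriors over the partition sum to", total)
```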


Could someone point me to some literature on setting priors?

Specifically, I want to set a prior for click-through-rate estimation, but I want to penalize a subset of the results based on the cardinality of the set, and that doesn't sound like a very Bayesian thing to do. Naturally, some reading could help. Thanks!
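For background, the textbook Bayesian treatment of a click-through rate is a Beta prior updated with binomial click data; a minimal sketch with invented counts:

```python
# Minimal Beta-Binomial sketch for click-through-rate (CTR) estimation.
# The prior and the data below are invented for illustration.

# Beta(a, b) prior on the CTR; a/(a+b) is the prior mean.
a, b = 3.0, 97.0          # prior mean CTR of 3%

# Hypothetical observations: 12 clicks out of 200 impressions
clicks, impressions = 12, 200

# Conjugate update: posterior is Beta(a + clicks, b + non-clicks)
a_post = a + clicks
b_post = b + (impressions - clicks)

posterior_mean = a_post / (a_post + b_post)
print(f"posterior mean CTR: {posterior_mean:.4f}")
```

The prior pseudo-counts (a, b) are what you would tune to express any penalty; how to make a cardinality-based penalty properly Bayesian is exactly the question being asked.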


Splendid, thank you. I will go through this!


As part of a somewhat complicated high-school conditional-probability question, I devised the following argument to get my answer to agree with the given answer. Please tell me whether this argument is valid, providing links to any relevant theorems.

Three genes M, N, O that are responsible for eye colour occur randomly among adults, and one person can have only one of these genes. For children, the probabilities of having brown or black eyes, given that the parents are a combination MM, MN, MO, etc., are given separately.

P(Ci) = probability of parents with random genes M, N, O joining to produce a baby;

i.e. C1 = MM, C2 = MN, C3 = MO, C4 = NM, C5 = NN, C6 = NO, ..., C9 = OO.

P(A) = probability that both parents have black eyes.

P(B) = probability that the child has brown eyes.

P(A∩B)

= Σ P(A∩B | Ci) P(Ci)

= Σ P(A | Ci) P(B | Ci) P(Ci)

= Σ P(A | Ci) P(B∩Ci)

Note: in the second step I assumed

P(A∩B | Ci) = P(A | Ci) P(B | Ci)

because A and B are both events that depend only on C. So I take it that, given that Ci has occurred (when the sample space is restricted to Ci), the events A and B can be considered independent of each other.

Is this argument correct? Please support your answer with links or references to any theorems.
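The step in question is precisely a conditional-independence assumption: it is not automatic, but it holds whenever the joint distribution factorises given Ci. A small check on an invented joint distribution built to satisfy that factorisation:

```python
# Check conditional independence P(A∩B | C) = P(A|C) P(B|C) on a tiny
# made-up joint distribution. The numbers are invented for illustration.
from itertools import product

# Build a joint over (a, b, c) so that A and B are conditionally
# independent given C: P(a, b, c) = P(c) P(a|c) P(b|c).
p_c = {0: 0.4, 1: 0.6}
p_a_given_c = {0: 0.2, 1: 0.7}
p_b_given_c = {0: 0.5, 1: 0.1}

joint = {}
for a, b, c in product((0, 1), (0, 1), (0, 1)):
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    joint[(a, b, c)] = p_c[c] * pa * pb

def p(pred):
    return sum(v for k, v in joint.items() if pred(*k))

for c in (0, 1):
    pc = p(lambda a, b, cc: cc == c)
    lhs = p(lambda a, b, cc: a == 1 and b == 1 and cc == c) / pc
    rhs = (p(lambda a, b, cc: a == 1 and cc == c) / pc) \
        * (p(lambda a, b, cc: b == 1 and cc == c) / pc)
    assert abs(lhs - rhs) < 1e-12   # holds because we built it that way
print("P(A∩B|C) = P(A|C) P(B|C) for this constructed joint")
```

The factorisation holds here only because the joint was constructed from it; for the genetics problem, whether it is justified depends on whether parents' eye colour and child's eye colour are determined solely through the gene combination Ci.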


Top 11 Free Software for Text Analysis, Text Mining, Text Analytics

KH Coder, Carrot2, GATE, tm, Gensim, Natural Language Toolkit, RapidMiner, Unstructured Information Management Architecture, OpenNLP, KNIME, Orange-Textable and LPU

Read more: http://wp.me/p43LB9-o


Most scientific data are analysed and reported using frequentist statistics, with a cut-off of P < 0.05 used to classify study results as positive rather than negative. Now a new study that compares traditional frequentist analysis with Bayesian inference has concluded that the choice of P < 0.05 is the reason for the excess of false-positive results that never get reproduced. The authors suggest using more stringent statistical standards, with a cut-off of 0.005 or lower for reporting purposes.

http://goo.gl/UY3DgN
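A toy simulation of the mechanism behind that claim (all parameters invented): when only a small fraction of tested hypotheses are truly non-null, a large share of P < 0.05 results come from null effects:

```python
# Toy simulation: many two-sided z-tests where only 10% of hypotheses are
# truly non-null. All parameters (effect size, counts) are invented.
import random
from statistics import NormalDist

random.seed(0)
norm = NormalDist()

def p_value(z: float) -> float:
    """Two-sided p-value for a z statistic."""
    return 2 * (1 - norm.cdf(abs(z)))

n_tests = 20000
prior_true = 0.10      # only 10% of tested effects are real
effect = 2.5           # z-shift for a real effect (moderate power)

false_pos = true_pos = 0
for _ in range(n_tests):
    is_real = random.random() < prior_true
    z = random.gauss(effect if is_real else 0.0, 1.0)
    if p_value(z) < 0.05:
        if is_real:
            true_pos += 1
        else:
            false_pos += 1

fdr = false_pos / (false_pos + true_pos)
print(f"share of 'significant' results that are false: {fdr:.2f}")
```

Under these invented settings roughly a third to a half of the "significant" findings are false positives, which is the kind of excess the study attributes to the 0.05 threshold.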


Interesting lies!


Why Gamification Can Get Even The Uninterested Very Interested In Mathematics?
