Stream


João Neto
moderator

Discussion  - 
This is perhaps the first real crack in the wall for the almost-universal use of the null hypothesis significance testing procedure (NHSTP). The journal, Basic...
2 comments
 
I guess they took the phrase "lies, damned lies, and statistics" a bit literally. By the way, this is not really a "crack": Cumming's "dance of the p-values" argument is valid, but do we have a better alternative that keeps NHSTP's simplicity?

Mortal Kolle

Discussion  - 
 
Good evening, all.

I have encountered a counter-intuitive result while thinking about Bayesian networks, and decided to ask the members of this group.

Suppose A is the probability space of all possible events (with P(A) = 1, of course).

Now suppose A is partitioned into A1 and A2 such that P(A1) = P(A2) = 0.5, and let a be some event in A.

According to Bayes' rule,

P(A1|a) + P(A2|a) = P(A1)/P(a) * P(a|A1) + P(A2)/P(a) * P(a|A2) =

= 0.5/P(a) * (P(a|A1) + P(a|A2)) = 0.5          since P(A1) = P(A2) = 0.5

P(A1 ∪ A2) = 1 and A1 and A2 are disjoint, yet P(A1 ∪ A2 | a) ≠ P(A1|a) + P(A2|a) = 0.5.

But A1 and A2 are disjoint and P(A1) + P(A2) = 1, so a must be fully contained in the union of A1 and A2, since it is contained in the universal probability space A.

Likewise, P(a|A1) + P(a|A2) = (P(a)/0.5) * (P(A1|a) + P(A2|a)).

My head is spinning from this. Is there a rationalization I don't know about?
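A quick numerical check (toy numbers, invented purely for illustration) shows where the algebra slips: by the law of total probability, P(a) = 0.5 * (P(a|A1) + P(a|A2)), so 0.5/P(a) * (P(a|A1) + P(a|A2)) cancels to 1, not 0.5. In R:

    # Toy sample space: integers 1..10, uniform, with A1 = {1..5},
    # A2 = {6..10}, and an event a = {4,5,6} straddling the partition.
    p  <- rep(1/10, 10)
    A1 <- 1:5; A2 <- 6:10; a <- 4:6

    p_a    <- sum(p[a])                                  # P(a)    = 0.3
    p_a_A1 <- sum(p[intersect(a, A1)]) / sum(p[A1])      # P(a|A1) = 0.4
    p_a_A2 <- sum(p[intersect(a, A2)]) / sum(p[A2])      # P(a|A2) = 0.2

    # Bayes' rule for each cell of the partition
    p_A1_a <- p_a_A1 * sum(p[A1]) / p_a                  # = 2/3
    p_A2_a <- p_a_A2 * sum(p[A2]) / p_a                  # = 1/3
    p_A1_a + p_A2_a                                      # = 1, not 0.5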

Busuru Elyvanus

Discussion  - 
 
Help me solve this:
 
 "in research of 1000 people 8% tested to have tuberculosis.the 1000 people then given new test found that tuberculosis was in 96% of those who have it and 2% for those who dont have.whats the probability of randomly chosen perso
 
High school conditional probability, please help

As part of a somewhat complicated high school conditional probability question, I devised the following argument to make my answer agree with the given answer. Please tell me whether this argument is valid, ideally with links to relevant theorems.

The setup:
Three genes M, N, and O that are responsible for eye color occur randomly among adults, and one person can have only one of these genes. For children, the probabilities of having brown or black eyes, given that the parents are a combination MM, MN, MO, etc., are given separately.

P(Ci) = probability of the parents' genes M, N, O combining to produce a baby,
            i.e., C1 = MM, C2 = MN, C3 = MO, C4 = NM, C5 = NN, C6 = NO, ..., C9 = OO
P(A)  = probability that both parents have black eyes
P(B)  = probability that the child has brown eyes

****************************
P(A∩B)

= Σᵢ P([A∩B] | Ci) P(Ci)

= Σᵢ P(A | Ci) P(B | Ci) P(Ci)

= Σᵢ P(A | Ci) P(B∩Ci)

******************************

Note that in the second step I assumed
P([A∩B] | Ci) = P(A | Ci) P(B | Ci)
because A and B are both events that depend on C only. So I take it that, given that Ci has occurred (when the sample space is restricted to Ci), the events A and B can be considered independent of each other.

Is this argument correct? Please support your answer with links or references to relevant theorems.
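The assumption in question is exactly conditional independence of A and B given Ci. Whether it holds depends on the problem's structure, but a quick simulation (with invented conditional probabilities) shows what the identity asserts, in R:

    # Toy model (all probabilities made up): given the gene combination C,
    # the events A and B are drawn independently with C-specific probabilities.
    set.seed(1)
    n <- 1e5
    C <- sample(1:3, n, replace = TRUE)
    A <- rbinom(n, 1, c(0.9, 0.5, 0.1)[C])   # P(A = 1 | C)
    B <- rbinom(n, 1, c(0.2, 0.6, 0.7)[C])   # P(B = 1 | C)

    i <- C == 1
    mean(A[i] == 1 & B[i] == 1)   # empirical P(A ∩ B | C = 1)
    mean(A[i]) * mean(B[i])       # P(A | C = 1) * P(B | C = 1): matches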
 
Top 11 Free Software for Text Analysis, Text Mining, Text Analytics

KH Coder, Carrot2, GATE, tm, Gensim, Natural Language Toolkit, RapidMiner, Unstructured Information Management Architecture, OpenNLP, KNIME, Orange-Textable and LPU

Read more: http://wp.me/p43LB9-o

Able Lawrence

Discussion  - 
 
Weak statistics to blame for non-reproducible scientific results
Most scientific data are analysed and reported using frequentist statistics, with a cut-off of P < 0.05 used to separate positive from negative study results. Now a new study that compares traditional frequentist analysis with Bayesian inference has concluded that the lax P < 0.05 threshold is a reason for the excess of false positive results that never get reproduced. It suggests more stringent statistical standards, with a cut-off of 0.005 or lower for reporting purposes.
Revised standard for statistical inference http://goo.gl/Bo6S2H
Weak statistical standard implicated in scientific irreproducibility 
http://goo.gl/UY3DgN
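One back-of-the-envelope way to see the point (not the method of the cited paper, just a standard calibration) is the Sellke-Bayarri-Berger upper bound on the Bayes factor implied by a p-value:

    # Upper bound on the Bayes factor in favour of H1 implied by a p-value
    # (Sellke, Bayarri & Berger 2001; valid for p < 1/e).
    bf_bound <- function(p) 1 / (-exp(1) * p * log(p))
    bf_bound(0.05)     # ~2.5  -- weak evidence at best
    bf_bound(0.005)    # ~13.9 -- reasonably strong evidence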
4 comments
 
Interesting lies!
 
Why Gamification Can Get Even The Uninterested Very Interested In Mathematics?

Rasmey Yem

Discussion  - 
 
Hi, everyone. I'm pleased to be a member of this community. I'm just a beginner starting to learn statistics, and I hope to get advice from you.
2 comments
 
Welcome!

Sajjit Thampy

Discussion  - 
 
The law of the unconscious statistician
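For anyone unfamiliar: the law says E[g(X)] = Σ g(x) P(X = x), i.e., you can average g over the distribution of X without ever deriving the distribution of g(X). A two-line check in R, with a toy X uniform on 1..6 and g(x) = x²:

    x <- 1:6
    sum(x^2 * rep(1/6, 6))          # LOTUS: E[X^2] = 91/6 ~ 15.17
    mean(sample(x, 1e6, TRUE)^2)    # simulation agrees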

Wayne Hajas

Discussion  - 
 
This is a writing problem concerning a Bayesian analysis I hope to publish. There is a simple idea that I just can't justify succinctly. People must deal with it all the time, but I can't find any references. It's driving me nuts!

The concept I want to express: as we consider a larger range of data values, a model must be made more complicated in order to remain useful.

As an example: if I drop a rock a distance of one metre, I can probably get away with a constant-acceleration model. If I drop a rock a distance of a kilometre, I have to consider air resistance. If I drop it a distance of 1000 kilometres, I must consider orbital dynamics.

Correspondingly, one way of managing the need for model complexity in a Bayesian model is to limit the range of data values. In my particular circumstances, I can do that at an acceptable cost.

There must be a name for this concept. It's got to be published somewhere, but Google is failing me. The paper will lose a lot of focus if I have to chase this tangent. Can anybody suggest a useful reference, or even a useful term to google?
5 comments
 
The concept is just that of approximation.

I would say "Within the range <describe range>, the model can be approximated by a simplified version where <list parameters> are ignored."

Olga Scrivner

Discussion  - 
 
My PhD thesis is on a historical language change from Latin to Old French (probably a very boring subject for many people), and I am trying to use Bayesian inference for my data, which is mainly categorical (with logistic regression, glm, as a prior). Surprisingly, I have never seen a previous linguistic study that used Bayesian statistics (except for language-evolution prediction models, which are different). I am not even sure how to present and explain my choice of priors and my models to a non-statistical, non-Bayesian audience. I would greatly appreciate your insights!
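As a concrete starting point for a non-Bayesian audience, a minimal sketch of a Bayesian logistic regression (data and variable names invented here; bayesglm() from the arm package, with the weakly informative Cauchy priors of Gelman et al.):

    library(arm)
    set.seed(2)
    n <- 500
    d <- data.frame(
      century = sample(9:13, n, replace = TRUE),    # stand-in predictors
      genre   = factor(sample(c("legal", "literary"), n, replace = TRUE))
    )
    d$new_form <- rbinom(n, 1, plogis(-20 + 1.8 * d$century +      # 1 = innovative
                                      0.5 * (d$genre == "legal"))) #     variant used

    fit <- bayesglm(new_form ~ century + genre, family = binomial, data = d,
                    prior.scale = 2.5, prior.df = 1)   # Cauchy(0, 2.5) priors
    display(fit)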
8 comments
 
Thank you, Mad!
 
Could someone point me to some literature on setting priors? 
Specifically, I want to set a prior for click-through-rate (CTR) estimation, but I also want to penalize a subset of the results based on the cardinality of the set, and that doesn't sound like a very Bayesian thing to do. Naturally, some reading would help. Thanks!
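For the CTR part on its own, the standard conjugate setup (a sketch, not necessarily what you need; all counts invented) is a Beta prior updated by binomial click data:

    a <- 2; b <- 50                  # prior: mean CTR ~3.8%, fairly diffuse
    clicks <- 30; impressions <- 1000
    post_a <- a + clicks
    post_b <- b + impressions - clicks
    post_a / (post_a + post_b)               # posterior mean CTR ~3.0%
    qbeta(c(0.025, 0.975), post_a, post_b)   # 95% credible interval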
4 comments
 
Splendid, thank you. I will go through this!

Mark Adams

Discussion  - 
 
I have been asked to improve the way predictions of the reliability of my company's products are made. I think I understand Bayes' theorem, but putting it into practice is another thing. I have: the number of hours a subsystem lasts before it fails; whether we have a corrective action for the failure mode (I don't have a lot of confidence in the root causes or its effectiveness); the total number of machines built in a quarter; and the percentage of new content in the next generation of machine. So can I predict what the reliability will be at the start of production of the new product, and a year into production? Can I use Excel, R, Minitab? I use a Mac, by the way.
 
Since no one else has answered you, I will: sure you can, and you can do it in Excel and especially in R. You have to use survival analysis, with the number of machines, etc. as covariates. See for example:

http://www.hindawi.com/journals/mpe/2012/329489/

or

http://www.springer.com/statistics/physical+%26+information+science/book/978-0-387-77948-5
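A minimal sketch of that suggestion (all data and column names invented): a parametric survival regression on hours-to-failure with the covariates mentioned.

    library(survival)
    set.seed(42)
    n <- 200
    d <- data.frame(
      pct_new = runif(n, 0, 0.5),                  # fraction of new content
      n_built = sample(50:200, n, replace = TRUE)  # machines built that quarter
    )
    d$hours  <- rweibull(n, shape = 1.5, scale = 5000 * exp(-d$pct_new))
    d$failed <- rbinom(n, 1, 0.8)                  # 1 = failure seen, 0 = censored

    fit <- survreg(Surv(hours, failed) ~ pct_new + n_built,
                   data = d, dist = "weibull")
    summary(fit)

    # Predicted median life of a hypothetical next-generation machine
    predict(fit, newdata = data.frame(pct_new = 0.3, n_built = 120),
            type = "quantile", p = 0.5)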

João Neto
moderator

Discussion  - 
 
 
Interesting to think about. Bayesian statistics became popular when it became computationally feasible, thanks to MCMC and Moore's law. Could it become infeasible again as data sets grow ever larger, and so become less popular?
For our first seminar of the year, we are very pleased to have a talk which will combine two themes close to the heart of the statistics department: Steve Scott, "Bayes and Big Data": Abstract: A useful definition of "big data" is data that is too big to fit on a single machine, either because ...

João Neto
moderator

Discussion  - 
 
"Some of my fellow scientists have it easy. They use predefined methods like linear regression and ANOVA to test simple hypotheses; they live in the innocent world of bivariate plots and lm(). Sometimes they notice that the data have odd histograms and they use glm(). The more educated ones use generalized linear mixed effect models."

http://www.r-bloggers.com/the-joy-and-martyrdom-of-trying-to-be-a-bayesian/
 
Excellent article. Thanks for sharing!

João Neto
moderator

Discussion  - 
 
"The most important thing to note about these categorizations is that the type of randomness depends on your perspective. The cards you hold in your hand are Type 0 randomness to you, but to the person sitting across the poker table from you, they are Type 2 randomness."

http://www.statisticsblog.com/2012/02/a-classification-scheme-for-types-of-randomness/
We often speak implicitly of different types of randomness but neglect to name or categorize them. Consider this post to be a kind of white paper or rough draft on the division of randomness into five categories. If you start using these distinctions explicitly, even if only in your own head, ...

Ismael Navas

Discussion  - 
 
Big Data Analytics Master
Máster en Big Data Analytics (BDA) [Master's in Big Data Analytics]. Information on the 1st edition. Pre-registration and enrolment. Price: €9,000. Pre-registration: 23 June to 15 October 2013. Enrolment: 16 October to 30 October 2013. Details. Teaching mode: online ...