Brent's posts

Post has attachment

After more than a month with what appeared to be croup, a surgeon did an exploratory bronchoscopy and pulled this piece of plastic from my daughter's airway. She's doing great now--back to eating beets and getting into mischief.

2015-01-13

Post has attachment

I have been thinking about model selection on genomic data.

(http://biostars.org/post/show/42955/genome-wide-model-selection-and-interactions-terms/)

Normally we choose a single model and fit it across the genome. Let's say:

~ disease + age + gender + pack_years

but what if in most places:

~ disease + age + gender

is more appropriate (or maybe just ~ age)?

I'd like to just give an over-specified full model with all possible terms, and then do model selection at each site. But then there are problems with multiple-testing correction (i.e. how to do it).

Anyone got any references on this?
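To make the per-site idea concrete, here is a minimal sketch. It assumes ordinary least squares and AIC as the selection criterion, neither of which is settled above, and it uses simulated data; `aic_ols` and `select_model` are hypothetical helper names. It does nothing about the multiple-testing problem, which is still the open question.

```python
import numpy as np

def aic_ols(y, X):
    """AIC of an ordinary least squares fit of y on the columns of X.

    Computed up to an additive constant that is shared by all models
    fit to the same y, so it is only useful for comparing models.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

def select_model(y, covariates, candidates):
    """Return (terms, aic) for the candidate model with the lowest AIC."""
    n = len(y)
    best, best_aic = None, np.inf
    for terms in candidates:
        # Intercept column plus one column per term in this candidate.
        X = np.column_stack([np.ones(n)] + [covariates[t] for t in terms])
        score = aic_ols(y, X)
        if score < best_aic:
            best, best_aic = terms, score
    return best, best_aic

# Simulated covariates (assumption: made-up values, not real data).
rng = np.random.default_rng(0)
n = 200
covariates = {
    "disease": rng.integers(0, 2, n).astype(float),
    "age": rng.normal(50, 10, n),
    "gender": rng.integers(0, 2, n).astype(float),
    "pack_years": rng.gamma(2.0, 10.0, n),
}
candidates = [
    ("disease", "age", "gender", "pack_years"),
    ("disease", "age", "gender"),
    ("age",),
]

# One simulated site where pack_years matters, one where only age does.
y_smoking = (0.02 * covariates["age"] + 0.05 * covariates["pack_years"]
             + rng.normal(0, 0.5, n))
y_age_only = 0.02 * covariates["age"] + rng.normal(0, 0.5, n)

for label, y in [("smoking site", y_smoking), ("age-only site", y_age_only)]:
    terms, score = select_model(y, covariates, candidates)
    print(label, "->", " + ".join(terms))
```

Run across the genome, this yields a different chosen model per site, which is exactly where the downstream p-values stop being comparable without some extra correction.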

Post has attachment

This was a hard-earned pub: 4 example analyses and 3 revisions for a 2-page applications note.

We're using it to find differentially methylated regions in 450K and CHARM data.

Here's the free link: http://bioinformatics.oxfordjournals.org/cgi/reprint/bts545?ijkey=ZTTOnczUJYLfKgw&keytype=ref

Post has attachment

My obvious thoughts on methylation and correlation, using the IPython Notebook:

http://brentp.github.com/correlation.html

I do see a lot of publications ignoring this.

Post has attachment

I updated my aligner comparison to include 75bp single-end SOLiD 5500 data. Scroll down for the ROC-like plot:

https://github.com/brentp/bowfast/tree/master/aligner-compare

Again, BFAST has the highest number of true positives (and the lowest number of false positives given a proper mapping quality cutoff). Bowtie does much better on these reads, which are better quality.

Mapping rate is much higher in 5500 (almost 80%) than in SOLiD 3 (< 60%).

Post has attachment

An ROC analysis using a number of aligners for colorspace (scroll down on the linked page to see notes, image):

https://github.com/brentp/bowfast/tree/master/aligner-compare

It uses real SOLiD 3 data from a targeted resequencing project, so I can gauge accuracy by the number of reads that are mapped inside the target region.

This is preliminary, and I ***welcome any feedback***. I certainly have not optimized the parameters for all (any?) aligners. I did not trim the reads.

BFAST does quite well. Novoalign takes forever but maps a lot more reads, most of them accurate.
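For anyone wondering how the points on a ROC-like plot of this kind can be tallied, here is a rough sketch. It assumes each mapped read has been reduced to a (mapping quality, inside target region) pair, with on-target reads standing in for true positives and off-target reads for false positives, as described above. The function name and the reads are made up for illustration; the scripts in the repo may well do this differently.

```python
def roc_like_points(reads, cutoffs):
    """For each mapping quality cutoff, count reads with mapq >= cutoff
    that fall off target (false-positive proxy) and on target
    (true-positive proxy)."""
    points = []
    for cutoff in cutoffs:
        kept = [on_target for mapq, on_target in reads if mapq >= cutoff]
        on = sum(kept)           # True counts as 1
        off = len(kept) - on
        points.append((cutoff, off, on))
    return points

# Hypothetical reads: (mapping quality, mapped inside target region?)
reads = [(0, False), (3, False), (10, True), (20, False),
         (30, True), (40, True), (60, True)]

points = roc_like_points(reads, cutoffs=[0, 10, 30])
for cutoff, off, on in points:
    print(f"mapq >= {cutoff}: off-target={off} on-target={on}")
```

Plotting off-target against on-target counts, one curve per aligner, gives the ROC-like picture: raising the cutoff trades on-target reads for fewer off-target ones.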

Post has attachment

When my car starts sliding in the snow, there's a flashing indicator that comes on the dash (a car with some squiggly lines behind it), apparently to distract me from the fact that I'm, umm, sliding.

I don't see how that's a useful feature.

This is the first day I drove the 8 miles to work instead of cycling. Next time, I might snow-shoe it.

Post has attachment

Great title:

http://genomebiology.com/content/12/10/130

I see this Multi-Dimensional Scaling (MDS) more frequently lately.

It's not clear to me how it differs from PCA. Anyone know why/when it's more useful?
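One known relationship that may help frame the question: classical (Torgerson) MDS run on plain Euclidean distances gives the same coordinates as PCA, up to the sign of each axis. So MDS only differs in practice when the input is some other dissimilarity, or a non-metric variant is used. The sketch below checks this on simulated data; `pca_scores` and `classical_mds` are hypothetical helper names, and it says nothing about which method the linked paper used.

```python
import numpy as np

def pca_scores(X, k=2):
    """Scores on the first k principal components (rows are samples)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS from a matrix of pairwise distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]       # k largest eigenvalues
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

# Simulated samples x features matrix (assumption: made-up data).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

# Pairwise Euclidean distances between samples.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

pca = pca_scores(X)
mds = classical_mds(D)

# Axes can be flipped, so compare absolute values.
print(np.allclose(np.abs(pca), np.abs(mds), atol=1e-6))
```

Swapping `D` for a different dissimilarity (for example one minus a correlation) is where the two methods start to give different pictures.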

Post has attachment

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002240

A meta-analysis paper looking at the association of published and random gene sets with breast cancer outcome. They find that you can choose a random set of genes and find--in cases where more than 100 genes are involved--that 90% of those sets are "associated" with breast cancer outcome.

This is another very interesting paper related to how we do our statistical analyses. It's essentially the problem of correlation != causation. And more so, that (from the paper):

"the question is not whether a given set of genes is related to survival, but whether it is more related to survival than random sets of genes"

Because of this problem they find that their random gene sets and those with unrelated processes--including "postprandial laughter"--are associated with breast cancer outcome. And 28 of 47 published studies had an association that was not stronger than expected by chance.

Only 18 of those 47 were more significant than all but 5% of the random signatures (p < 0.05).

This problem should be somewhat specific to cancer because it causes [sic] **crazy expression changes in lots of genes.**
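The comparison the paper asks for can be sketched as an empirical p-value against random gene sets of the same size. This is a simplified illustration, not the paper's method. It assumes a continuous outcome and scores a set by the absolute Pearson correlation between its mean expression and the outcome, and the data are simulated so that half the genes track one latent factor that also drives the outcome. All names here are hypothetical.

```python
import numpy as np

def set_score(expr, outcome, gene_idx):
    """|Pearson r| between a gene set's mean expression and the outcome."""
    signature = expr[gene_idx].mean(axis=0)
    return abs(np.corrcoef(signature, outcome)[0, 1])

def empirical_p(expr, outcome, gene_idx, n_random=1000, seed=0):
    """Score a gene set, then compare it to random sets of the same size."""
    rng = np.random.default_rng(seed)
    observed = set_score(expr, outcome, gene_idx)
    n_genes = expr.shape[0]
    null = np.array([
        set_score(expr, outcome,
                  rng.choice(n_genes, size=len(gene_idx), replace=False))
        for _ in range(n_random)
    ])
    # Add-one correction so the p-value is never exactly zero.
    p = (1 + np.sum(null >= observed)) / (1 + n_random)
    return observed, p

# Simulated data (assumption): 2000 genes x 100 samples. The first 1000
# genes load on a latent factor that also drives the outcome.
rng = np.random.default_rng(1)
n_genes, n_samples = 2000, 100
factor = rng.normal(size=n_samples)
loadings = np.where(np.arange(n_genes) < 1000, 0.5, 0.0)
expr = loadings[:, None] * factor[None, :] + rng.normal(size=(n_genes, n_samples))
outcome = factor + rng.normal(size=n_samples)

# A "signature" that is really just 100 randomly chosen genes.
signature_genes = rng.choice(n_genes, size=100, replace=False)
observed, p = empirical_p(expr, outcome, signature_genes)
print(f"|r| with outcome = {observed:.2f}, p vs random sets = {p:.3f}")
```

In a setup like this an arbitrary 100-gene set correlates well with the outcome on its own, and only the comparison against other random sets shows whether it is doing anything special.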