Brent Pedersen
Bioinformatics Programmer at the University of Utah Center for Genetic Discovery

Post has attachment
After more than a month with what appeared to be croup, a surgeon did an exploratory bronchoscopy and pulled this piece of plastic from my daughter's airway. She's doing great now--back to eating beets and getting into mischief.
2015-01-13

Post has attachment
I have been thinking about model selection on genomic data. 
(http://biostars.org/post/show/42955/genome-wide-model-selection-and-interactions-terms/)

Normally we choose a single model and fit it across the genome. Let's say:

    ~ disease + age + gender + pack_years

but what if in most places:

    ~ disease + age + gender

is more appropriate. (or maybe just ~ age)

I'd like to supply an over-specified full model with all possible terms and then do model selection at each site. But then multiple-testing correction becomes a problem: it's not clear how to do it.

Anyone got any references on this?
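
To make the per-site selection step concrete, here is a minimal sketch, assuming ordinary least squares and AIC as the selection criterion (the post names neither). The function names `aic_ols` and `best_model_per_site` and the input layout are made up for illustration. It only shows the selection step and does nothing about the multiple-testing question raised above.

```python
import itertools

import numpy as np


def aic_ols(y, X):
    """AIC of a Gaussian linear model fit by least squares (constant terms dropped)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k


def best_model_per_site(Y, covariates):
    """For each site (row of Y), return the covariate subset with the lowest AIC.

    Y: (n_sites, n_samples) array of measurements.
    covariates: dict mapping a term name to an (n_samples,) array.
    """
    names = list(covariates)
    intercept = np.ones(Y.shape[1])
    # Every subset of terms, from intercept-only up to the full model.
    subsets = [s for r in range(len(names) + 1)
               for s in itertools.combinations(names, r)]
    chosen = []
    for y in Y:
        scores = {}
        for subset in subsets:
            X = np.column_stack([intercept] + [covariates[c] for c in subset])
            scores[subset] = aic_ols(y, X)
        chosen.append(min(scores, key=scores.get))
    return chosen
```

The number of subsets doubles with every added term, so an exhaustive search like this is only practical for a handful of covariates.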

Post has attachment
This was a hard-earned pub: 4 example analyses and 3 revisions for a 2-page applications note.
We're using it to find differentially methylated regions in 450K and CHARM data.

Here's the free link: http://bioinformatics.oxfordjournals.org/cgi/reprint/bts545?ijkey=ZTTOnczUJYLfKgw&keytype=ref

Post has attachment
My obvious thoughts on methylation and correlation using the IPython Notebook:

http://brentp.github.com/correlation.html

I do see a lot of publications ignoring this.

Post has attachment
I updated my aligner comparison to include 75bp single-end SOLiD 5500 data. Scroll down for the ROC-like plot:
https://github.com/brentp/bowfast/tree/master/aligner-compare

Again, BFAST has the highest number of true positives (and the lowest number of false positives given a proper mapping quality cutoff). Bowtie does much better on these reads, which are of better quality.

Mapping rate is much higher in SOLiD 5500 (almost 80%) than in SOLiD 3 (< 60%).

Post has attachment
An ROC analysis using a number of aligners for colorspace (scroll down on the linked page to see notes, image):
https://github.com/brentp/bowfast/tree/master/aligner-compare

It is for real SOLiD 3 data from a targeted resequencing project, so I can gauge accuracy by the number of reads that are mapped inside the target region.

This is preliminary, and I *welcome any feedback*. I certainly have not optimized
the parameters for all (any?) aligners. I did not trim the reads.

BFAST does quite well; novoalign takes forever but maps a lot more reads, most of them accurately.
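
For anyone wanting to reproduce this kind of plot, here is a rough sketch of the counting behind it, assuming on-target reads are treated as true positives and off-target reads as false positives. The function name `roc_like_curve` and the `(mapq, on_target)` input layout are made up for illustration and are not taken from the linked repository.

```python
def roc_like_curve(reads, cutoffs):
    """Count on-target and off-target reads at each mapping-quality cutoff.

    reads: iterable of (mapq, on_target) pairs, one per mapped read.
    cutoffs: mapping-quality thresholds to evaluate.
    Returns a list of (cutoff, on_target_count, off_target_count).
    """
    reads = list(reads)
    curve = []
    for cutoff in cutoffs:
        # Keep only reads at or above the current mapping-quality cutoff.
        kept = [on_target for mapq, on_target in reads if mapq >= cutoff]
        on = sum(kept)
        curve.append((cutoff, on, len(kept) - on))
    return curve
```

Plotting off-target counts on the x-axis against on-target counts on the y-axis, one line per aligner, gives the ROC-like picture.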

Post has attachment
When my car starts sliding in the snow, there's a flashing indicator that comes on the dash (a car with some squiggly lines behind it), apparently to distract me from the fact that I'm, umm, sliding.
I don't see how that's a useful feature.

This is the first day I drove the 8 miles to work instead of cycling. Next time, I might snow-shoe it.

Post has attachment
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002240


A meta-analysis paper looking at the association of published and random gene
sets with Breast Cancer (outcome). They find that you can choose a
random set of genes and find--in cases where there are > 100 genes
involved--that 90% of those sets are "associated" with Breast Cancer outcome.
This is another very interesting paper related to how we do our statistical
analyses. It's essentially the problem of correlation != causation.
And more so, that (from the paper):

"
the question is not whether a given set of genes is related to survival, but
whether it is more related to survival than random sets of genes
"

Because of this problem they find that their random gene sets and those with
unrelated processes--including "postprandial laughter"--are associated with
breast cancer outcome. And 28 of 47 published studies had an association that
was not stronger than expected by chance.
Only 18 of those 47 were more significant than all but 5% of the random
signatures (p < 0.05).

This problem should be somewhat specific to Cancer because it causes [sic]
crazy expression changes in lots of genes.
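
The comparison in that quote can be sketched as an empirical p-value: score the signature, score many random gene sets of the same size, and ask how often the random sets do at least as well. This is a minimal sketch under my own assumptions, not the paper's code; `empirical_p_value` and `score_fn` are hypothetical names, and `score_fn` stands in for whatever association statistic is used (higher meaning more strongly associated).

```python
import random


def empirical_p_value(score_fn, signature, all_genes, n_random=1000, seed=0):
    """Fraction of same-size random gene sets scoring at least as high as the signature.

    score_fn: callable taking a list of genes and returning a number
              (assumed: higher means a stronger association with outcome).
    signature: list of genes in the set being tested.
    all_genes: list of all genes that random sets are drawn from.
    """
    rng = random.Random(seed)
    observed = score_fn(signature)
    size = len(signature)
    hits = sum(
        score_fn(rng.sample(all_genes, size)) >= observed
        for _ in range(n_random)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_random + 1)
```

A signature is only interesting if this value is small, not merely if its own association test is significant.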