Brent Pedersen
Works at University of Colorado
Attended University of California, Berkeley
Lives in Denver, CO

Stream

Brent Pedersen

1 comment (Aaron Quinlan)
Thanks Brent. The algorithm is dead simple, but very effective. If only the CUDA GPU libs were a bit more accessible for the typical user.

Brent Pedersen

My obvious thoughts on methylation and correlation, using the IPython Notebook:

http://brentp.github.com/correlation.html

I do see a lot of publications ignoring this.
11 comments (Istvan Albert, Fernando Perez, Joseph Fass, Sean Davis)
BTW, now that we have the nbconvert-based notebook viewer up, you can share an HTML view of any notebook on the web by prepending http://nbviewer.ipython.org/urls/ to its URL.

Brent Pedersen

I updated my aligner comparison to include 75 bp single-end SOLiD 5500 data. Scroll down for the ROC-like plot:
https://github.com/brentp/bowfast/tree/master/aligner-compare

Again, BFAST has the highest number of true positives (and the lowest number of false positives, given a proper mapping quality cutoff). Bowtie does much better on these reads, which are of higher quality.

The mapping rate is much higher for the 5500 (almost 80%) than for SOLiD 3 (< 60%).
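A ROC-like curve of this kind can be built by sweeping the mapping-quality cutoff from strict to permissive and counting correct and incorrect mappings that survive each cutoff. The sketch below is hypothetical and is not the code in the bowfast repository: it assumes each read already carries a MAPQ and a boolean saying whether its mapping is correct (from simulation truth, or from an on-target check). Plotting the false-positive counts on the x-axis against the true-positive counts on the y-axis gives the curve.

```python
from collections import Counter

def roc_by_mapq(reads):
    """Cumulative true/false positives as the MAPQ cutoff is lowered.

    reads: iterable of (mapq, is_correct) pairs (hypothetical input format).
    Returns [(cutoff, n_true_pos, n_false_pos)], one point per observed MAPQ,
    counting every read with mapq >= cutoff.
    """
    correct = Counter()
    wrong = Counter()
    for mapq, ok in reads:
        if ok:
            correct[mapq] += 1
        else:
            wrong[mapq] += 1

    points = []
    tp = fp = 0
    # Walk from the strictest cutoff to the most permissive, accumulating counts.
    for cutoff in sorted(set(correct) | set(wrong), reverse=True):
        tp += correct[cutoff]
        fp += wrong[cutoff]
        points.append((cutoff, tp, fp))
    return points


reads = [(60, True), (60, True), (37, True), (37, False), (0, False), (0, True)]
for cutoff, tp, fp in roc_by_mapq(reads):
    print(cutoff, tp, fp)   # 60 2 0, then 37 3 1, then 0 4 2
```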
5 comments (Brent Pedersen, Nils Homer, Gavin Oliver)
Hi +Nils Homer, I guess it would depend on the application, but 20 sounds more than acceptable for most purposes to me. My initial confusion stemmed from the fact that I had previously tested BWA in my pipeline, used no post-alignment filtering, and obtained very few false positives. BFAST is producing many more false positives, but this is likely because it is producing more alignments in general, and therefore more low-quality alignments. I will filter the alignments and reassess. By the way, how do I deal with paired-end reads split across 2 files with BFAST?

Brent Pedersen

Great title:
http://genomebiology.com/content/12/10/130

I've been seeing Multidimensional Scaling (MDS) more frequently lately.
It's not clear to me how it differs from PCA. Does anyone know why or when it's more useful?
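One concrete point of contact, offered as a sketch rather than an answer from the comments: classical (Torgerson) MDS applied to Euclidean distances between samples gives the same coordinates as PCA scores, up to the sign of each axis. MDS becomes genuinely different when the input is some other dissimilarity (for example 1 minus correlation), or in non-metric MDS, which only tries to preserve the rank order of the distances. The data below are random and purely illustrative.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: coordinates from an n x n distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]           # indices of the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

def pca_scores(X, k=2):
    """PCA scores (projections on the top k components) via SVD of centered X."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                   # 20 samples x 5 features, random toy data
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

mds = classical_mds(D)
pca = pca_scores(X)
print(np.allclose(np.abs(mds), np.abs(pca)))   # True: same axes up to sign
```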
2 comments (Brent Pedersen, Jeremy Leipzig)
great, thanks!

Brent Pedersen

http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002240


A meta-analysis paper looking at the association of published and random gene
sets with breast cancer outcome. They find that you can choose a random set of
genes and, in cases where more than 100 genes are involved, find that 90% of
those sets are "associated" with breast cancer outcome.
This is another very interesting paper about how we do our statistical
analyses. It's essentially the problem of correlation != causation.
And more so, that (from the paper):

"the question is not whether a given set of genes is related to survival, but
whether it is more related to survival than random sets of genes"

Because of this problem, they find that their random gene sets, and sets for
unrelated processes (including "postprandial laughter"), are associated with
breast cancer outcome. And 28 of 47 published studies had an association that
was not stronger than expected by chance.
Only 18 of those 47 were more significant than all but 5% of the random
signatures (p < 0.05).

This problem should be somewhat specific to cancer, because it causes [sic]
crazy expression changes in lots of genes.
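The paper's criterion, that a signature should beat random gene sets of the same size, amounts to an empirical p-value. The sketch below is hypothetical throughout: it swaps in |Pearson r| between a mean-expression score and a continuous outcome for the survival models the paper uses, and the toy data are simulated so that one shared factor (a stand-in for a broad cancer-driven expression change) drives both many genes and the outcome, which makes random sets look "associated" too.

```python
import numpy as np

def signature_score(expr, genes):
    # Hypothetical per-sample score: mean expression of the chosen genes.
    return expr[genes].mean(axis=0)

def association(score, outcome):
    # Hypothetical stand-in for a survival model: |Pearson r| with a continuous outcome.
    return abs(np.corrcoef(score, outcome)[0, 1])

def empirical_p(expr, outcome, signature, n_random=1000, seed=0):
    """Fraction of random same-size gene sets at least as associated as the signature."""
    rng = np.random.default_rng(seed)
    observed = association(signature_score(expr, signature), outcome)
    null = np.empty(n_random)
    for i in range(n_random):
        genes = rng.choice(expr.shape[0], size=len(signature), replace=False)
        null[i] = association(signature_score(expr, genes), outcome)
    # Add-one correction so the estimate is never exactly zero.
    return (1 + np.sum(null >= observed)) / (1 + n_random)


rng = np.random.default_rng(1)
n_genes, n_samples = 2000, 100
factor = rng.normal(size=n_samples)                   # shared signal
loadings = rng.uniform(0, 1, size=(n_genes, 1))       # many genes follow it
expr = loadings * factor + rng.normal(size=(n_genes, n_samples))
outcome = factor + rng.normal(size=n_samples)

signature = np.arange(100)   # a "published" signature: here just the first 100 genes
print(association(signature_score(expr, signature), outcome))  # association of the signature
print(empirical_p(expr, outcome, signature))                   # compared with random sets
```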

Brent Pedersen

My summary of the paper by the authors of the LAST aligner.

http://bioinformatics.oxfordjournals.org/content/early/2011/10/05/bioinformatics.btr537.short?rss=1

Introduce "aligned column accuracy", which differs from "mapping accuracy"
in that the latter only checks whether the mapped location overlaps the true
location, whereas the former is per-base and more important for the accuracy of variant calls.
Use probabilistic alignments based on posterior decoding (I imagine this is
like the dynamic programming of converting colorspace alignments to
base space?) instead of relying on the maximum score.
Actually try 2 probabilistic alignment models.
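To make the distinction concrete, here is a simplified, hypothetical version of both metrics; the paper's exact definitions may differ (e.g. in how unaligned bases are counted). A read whose deletion is placed two bases away from its true position still counts as correctly mapped, but loses aligned-column accuracy.

```python
def mapping_correct(mapped, true):
    """Mapping accuracy criterion: the mapped and true intervals overlap.
    Intervals are (start, end) reference coordinates, half-open."""
    return mapped[0] < true[1] and true[0] < mapped[1]

def aligned_column_accuracy(aligned_pos, true_pos):
    """Per-base criterion: fraction of read bases aligned to their true
    reference position. One entry per read base; None marks a base that is
    not aligned to the reference (e.g. an insertion)."""
    correct = sum(1 for a, t in zip(aligned_pos, true_pos) if a is not None and a == t)
    return correct / len(true_pos)


# A 10 bp read whose true alignment has a 1 bp deletion of reference base 105.
true_pos = [100, 101, 102, 103, 104, 106, 107, 108, 109, 110]
# The aligner places the same deletion at reference base 107 instead.
aligned_pos = [100, 101, 102, 103, 104, 105, 106, 108, 109, 110]

print(mapping_correct((100, 111), (100, 111)))        # True
print(aligned_column_accuracy(aligned_pos, true_pos))  # 0.8
```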

Use dwgsim and Stampy for simulations, and sets of 36 and 76 bp reads from SRA.
Compare their LAST with BWA, Bowtie, Novoalign, SHRiMP2 and Stampy.

Simulated Data
==============
Probabilistic alignment improves LAST's indel sensitivity and gap accuracy by
2% and 6%, respectively.

LAST has the highest sensitivity and PPV among the tested aligners for mapping
accuracy, aligned column accuracy, and gap accuracy. Novoalign fares well. BWA doesn't
look so great, "because it is designed to be more accurate and faster on
queries with low error rates".
They credit LAST's adaptive seeds for its good performance.

Real Data
=========
Table 1. The probabilistic models in LAST take 4-5 times longer (194 and 184
minutes) than unmodified LAST (41 minutes) but are still faster than Novoalign
(518 minutes). Bowtie takes 3 minutes and BWA 16.

LAST seems to map many fewer reads on the 36 bp data: 341,304 compared to 400K
to 500K for the other aligners. On the 76 bp data it's more in line with the
others. LAST uses a lot more memory than the other aligners (though only 15 GB).

Simulated Data for SNP calling
==============================
Table 2. It seems that, due to the high error rate, LAST greatly
outperforms the other aligners in terms of downstream SNP calls (by samtools),
with much higher sensitivity and PPV, especially at 20x and 40x coverage (less so at
10x coverage).
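For reference, sensitivity and PPV here are the usual ratios against a simulated truth set. A minimal sketch, assuming calls and truth are represented as sets of hypothetical (chrom, pos, alt) tuples; the paper may match calls to truth differently.

```python
def snp_call_metrics(called, truth):
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP)."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    sensitivity = tp / (tp + fn) if truth else 0.0
    ppv = tp / (tp + fp) if called else 0.0
    return sensitivity, ppv


truth = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr1", 300, "G"), ("chr1", 400, "C")}
called = {("chr1", 100, "A"), ("chr1", 200, "T"), ("chr1", 250, "G")}
sens, ppv = snp_call_metrics(called, truth)
print(f"{sens:.2f} {ppv:.2f}")   # 0.50 0.67
```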


Would have been interesting to see how this compares to SRMA or GATK
realignment post-processing. Not sure why that wasn't included...
2 comments (Brent Pedersen, Heng Li)
I wondered about some of the tables...

So in this case, you mean an ROC of incorrectly mapped bases vs correctly mapped bases? Or I guess called SNPs vs false SNPs as in your review?

Brent Pedersen

T and I went hiking around Mt Evans to see the aspens changing colors.
1 comment (kevin pedersen)
Nice

Brent Pedersen

I have been thinking about model selection on genomic data. 
(http://biostars.org/post/show/42955/genome-wide-model-selection-and-interactions-terms/)

Normally we choose a single model and fit it across the genome. Let's say:

    ~ disease + age + gender + pack_years

But what if, in most places,

    ~ disease + age + gender

is more appropriate (or maybe just ~ age)?

I'd like to just give an over-specified full model with all possible terms, and then do model selection at each site. But then there are problems with multiple-testing correction (e.g. how to do it).

Anyone got any references on this?
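The per-site selection step itself is easy to sketch; the multiple-testing part is the hard, open question and is not addressed here. Below is a minimal numpy-only version that fits every subset of the full model by least squares at each site and keeps the lowest-AIC one. AIC is an assumption (BIC or likelihood-ratio tests are alternatives), and the covariates and methylation values are simulated, hypothetical data.

```python
import itertools
import numpy as np

def ols_aic(y, X):
    """AIC of a Gaussian linear model fit by least squares (X includes the intercept)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Maximized Gaussian log-likelihood, with sigma^2 estimated as rss / n.
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * (p + 1) - 2 * loglik  # p coefficients + 1 variance parameter

def best_model(y, covariates, required=()):
    """Fit every model with the required terms plus any subset of the rest.
    covariates: dict of name -> 1-D array. Returns (aic, terms) of the best fit."""
    optional = [c for c in covariates if c not in required]
    n = len(y)
    best = None
    for r in range(len(optional) + 1):
        for extra in itertools.combinations(optional, r):
            terms = list(required) + list(extra)
            X = np.column_stack([np.ones(n)] + [covariates[t] for t in terms])
            aic = ols_aic(y, X)
            if best is None or aic < best[0]:
                best = (aic, terms)
    return best


rng = np.random.default_rng(0)
n = 200
covs = {
    "disease": rng.integers(0, 2, n).astype(float),
    "age": rng.uniform(30, 80, n),
    "gender": rng.integers(0, 2, n).astype(float),
    "pack_years": rng.uniform(0, 40, n),
}
# Hypothetical methylation values at three sites, each driven only by age.
sites = [0.05 * covs["age"] + rng.normal(0, 0.5, n) for _ in range(3)]
for i, y in enumerate(sites):
    aic, terms = best_model(y, covs)
    print(i, "~ " + " + ".join(terms))
```

With 4 optional terms this is only 16 fits per site. Note that p-values read off the selected model at each site are optimistic, since the same data chose the model.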

Brent Pedersen

This was a hard-earned pub: 4 example analyses and 3 revisions for a 2-page applications note.
We're using it to find differentially methylated regions in 450K and CHARM data.

Here's the free link: http://bioinformatics.oxfordjournals.org/cgi/reprint/bts545?ijkey=ZTTOnczUJYLfKgw&keytype=ref
2 comments (Brent Pedersen, Istvan Albert)
It's pretty naive.
You define a threshold, a distance, and a minimum. It will seed a peak/trough on anything less than minimum and extend the peak as long as it finds something less than threshold within distance.
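That seed-and-extend procedure fits in a few lines. This is one reading of the description, not the combined-pvalues code itself: it assumes sites on one chromosome sorted by position, uses hypothetical parameter names (`seed` for the minimum, `threshold`, `dist`), and treats sites at or above the threshold as neither extending nor breaking a region. A run of sub-threshold sites, each within `dist` of the previous one, is reported only if it contains at least one seed.

```python
def find_regions(positions, pvalues, seed, threshold, dist):
    """Return (start, end, n_sites) for each region; positions must be sorted."""
    regions = []
    start = end = None
    n_sites = 0
    has_seed = False
    for pos, p in zip(positions, pvalues):
        if p >= threshold:
            continue  # neither extends nor breaks the current region
        if start is not None and pos - end > dist:
            # Gap too large: close the current run, keep it only if it was seeded.
            if has_seed:
                regions.append((start, end, n_sites))
            start = None
        if start is None:
            start, n_sites, has_seed = pos, 0, False
        end = pos
        n_sites += 1
        has_seed = has_seed or p < seed
    if start is not None and has_seed:
        regions.append((start, end, n_sites))
    return regions


positions = [100, 150, 180, 400, 420, 900, 950]
pvalues = [0.04, 0.001, 0.03, 0.2, 0.04, 0.01, 0.04]
print(find_regions(positions, pvalues, seed=0.005, threshold=0.05, dist=100))
# [(100, 180, 3)]: the 900-950 run never drops below the seed cutoff
```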

I added the free link: http://bioinformatics.oxfordjournals.org/cgi/reprint/bts545?ijkey=ZTTOnczUJYLfKgw&keytype=ref

and it's on github:
https://github.com/brentp/combined-pvalues/

Brent Pedersen

An ROC analysis using a number of aligners for colorspace (scroll down on the linked page to see notes, image):
https://github.com/brentp/bowfast/tree/master/aligner-compare

It is for real SOLiD 3 data from a targeted resequencing project, so I can gauge accuracy by the number of reads that are mapped inside the target region.

This is preliminary, and I *welcome any feedback*. I certainly have not optimized
the parameters for all (any?) aligners. I did not trim the reads.

BFAST does quite well. Novoalign takes forever but maps a lot more reads, most of them accurately.
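Gauging accuracy this way needs only an overlap test between each read and the target intervals. A minimal sketch using binary search, assuming a single chromosome, non-overlapping half-open targets, and hypothetical function names; in practice a tool such as bedtools intersect would do this. Since capture is imperfect, some correctly mapped reads land off-target, so this is a proxy for accuracy rather than ground truth.

```python
import bisect

def build_index(targets):
    """targets: non-overlapping (start, end) half-open intervals on one chromosome."""
    targets = sorted(targets)
    return [s for s, _ in targets], targets

def on_target(index, read_start, read_end):
    """True if the read [read_start, read_end) overlaps any target."""
    starts, targets = index
    # Targets at indices < i start before the read ends.
    i = bisect.bisect_left(starts, read_end)
    # Sorted, non-overlapping targets: the last such target also ends last,
    # so it is the only one that needs checking.
    return i > 0 and targets[i - 1][1] > read_start


index = build_index([(1000, 1200), (5000, 5300)])
reads = [(1150, 1225), (1200, 1275), (4900, 4975), (4950, 5025)]
hits = [on_target(index, s, e) for s, e in reads]
print(hits)                      # [True, False, False, True]
print(sum(hits) / len(reads))    # 0.5
```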
14 comments (Brent Pedersen, Joseph Fass)
The same thing happens here: http://lh3lh3.users.sourceforge.net/alnROC.shtml (note the x-axis is log-scaled).
I think the line does not start out vertical in my data because some reads are correctly mapped outside the target region, since the capture wasn't perfect.

Brent Pedersen

When my car starts sliding in the snow, a flashing indicator lights up on the dash (a car with some squiggly lines behind it), apparently to distract me from the fact that I'm, umm, sliding.
I don't see how that's a useful feature.

This is the first day I drove the 8 miles to work instead of cycling. Next time, I might snow-shoe it.
4 comments (Brent Pedersen, Aaron Quinlan)
I went off the grid for 2002 and was a ski bum. Lived in a little town near Breckenridge. What a year.

Brent Pedersen

2 comments (Istvan Albert, Brent Pedersen)
Looks like a nice hike!
Story
Tagline
Bioinformatics Programmer at CU, Anschutz Medical Campus, Denver
Education
  • University of California, Berkeley
    Ph.D.
  • University of Arizona
    BS
Basic Information
Gender
Male
Work
Occupation
Bioinformatics Programmer, Computational Biologist
Employment
  • University of Colorado
    computational biologist, 2011 - present
  • UC Berkeley
    programmer, computational biologist, 2005 - 2010
Places
Currently
Denver, CO
Previously
Berkeley - Kaneohe, HI - Tucson, AZ