Brent Pedersen
Worked at University of Colorado
Attended University of California, Berkeley
Lives in Salt Lake City, UT


Brent Pedersen

Shared publicly  - 
After more than a month of what appeared to be croup, a surgeon did an exploratory bronchoscopy and pulled this piece of plastic from my daughter's airway. She's doing great now: back to eating beets and getting into mischief.

Brent Pedersen

Shared publicly  - 
This was a hard-earned pub: 4 example analyses and 3 revisions for a 2-page applications note.
We're using it to find differentially methylated regions in 450K and CHARM data.

Here's the free link:
Summary: comb-p is a command-line tool and a python library that manipulates BED files of possibly irregularly spaced p-values and 1) calculates auto-correlation, 2) combines adjacent p-values, 3) performs false discovery adjustment, 4) finds regions of enrichment (i.e., series of adjacent low p-values), and 5) assigns significance to those regions. In addition, tools are provided for visualization and assessment. We provide validation and exampl...
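The combining step (2 above) is a Stouffer-Liptak style combination that accounts for correlation between neighbouring p-values. Below is a minimal sketch of that idea, not comb-p's code: it assumes the pairwise correlation matrix is already known (comb-p estimates it from the auto-correlation step), and the function name `stouffer_liptak` is hypothetical.

```python
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def stouffer_liptak(pvalues, corr):
    """Combine one-sided p-values while accounting for their correlation.

    pvalues: sequence of p-values in (0, 1).
    corr: square matrix (list of lists) of correlations between the
          underlying z-scores, with corr[i][i] == 1. Assumed to be given;
          this sketch does not estimate it.
    """
    n = len(pvalues)
    if n == 0 or len(corr) != n:
        raise ValueError("need one correlation row per p-value")
    # Small p-values become large positive z-scores.
    zs = [-_NORMAL.inv_cdf(p) for p in pvalues]
    # Variance of a sum of unit-variance z-scores with these correlations.
    var_sum = sum(corr[i][j] for i in range(n) for j in range(n))
    z_combined = sum(zs) / var_sum ** 0.5
    # Back to a one-sided p-value.
    return _NORMAL.cdf(-z_combined)
```

With two independent p-values of 0.05 the combined p-value is about 0.01; if the two tests are perfectly correlated it stays at 0.05, which is why ignoring correlation between nearby probes would overstate significance.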
It's pretty naive.
You define a threshold, a distance, and a minimum. It will seed a peak/trough on anything less than minimum and extend the peak as long as it finds something less than threshold within distance.
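Read literally, the peak-finding rule above can be sketched like this. It is a hypothetical, forward-only reading of the description (the names `find_regions`, `minimum`, `threshold` and `distance` are illustrative), not comb-p's implementation, and it assumes sorted positions on one chromosome and `minimum <= threshold`.

```python
def find_regions(positions, pvalues, minimum, threshold, distance):
    """Seed a region on p < minimum; extend it while the next p < threshold
    lies within `distance` of the region's current end.

    positions: sorted positions on a single chromosome.
    Returns a list of (start, end) pairs.
    """
    regions = []
    current = None  # [start, end] of the region being grown, or None
    for pos, p in zip(positions, pvalues):
        if current is not None and p < threshold and pos - current[1] <= distance:
            current[1] = pos  # extend the open region
            continue
        if current is not None:
            regions.append(tuple(current))  # close the previous region
        current = [pos, pos] if p < minimum else None  # maybe seed a new one
    if current is not None:
        regions.append(tuple(current))
    return regions
```

For example, with positions 100, 150, 200, 1000, 1050 and p-values 0.04, 0.001, 0.03, 0.2, 0.0001 (`minimum=0.01`, `threshold=0.05`, `distance=100`), this returns `[(150, 200), (1050, 1050)]`; the sub-threshold probe at 100 is left out because the sketch only extends forward from the seed.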

I added the free link:

And it's on GitHub:

Brent Pedersen

Shared publicly  - 
I updated my aligner comparison to include 75 bp single-end SOLiD 5500 data. Scroll down for the ROC-like plot:

Again, BFAST has the highest number of true positives (and the lowest number of false positives given a proper mapping quality cutoff). Bowtie does much better on these reads, which are of higher quality.

The mapping rate is much higher for the 5500 (almost 80%) than for SOLiD 3 (< 60%).
Hi +Nils Homer - I guess it would depend on the application, but 20 sounds more than acceptable for most purposes to me. My initial confusion stemmed from the fact that I had previously tested BWA in my pipeline, used no post-alignment filtering and obtained very few false positives. BFAST is producing many more false positives, but this is likely because it is producing more alignments in general, and therefore more lower-quality alignments. I will filter the alignments and reassess. By the way, how do I deal with paired-end reads stored in 2 files with BFAST?
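For reference, SAM mapping qualities are Phred-scaled, MAPQ = -10 log10 P(reported position is wrong), so a cutoff of 20 corresponds to an estimated 1% chance of a wrong placement. A small sketch of the conversion and a cutoff filter; reads are assumed to be plain `(name, mapq)` tuples, and how well each aligner calibrates its MAPQ is a separate question.

```python
import math


def mapq_to_error_prob(mapq):
    """Phred-scaled mapping quality -> estimated probability the placement is wrong."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p):
    """Inverse conversion: error probability -> Phred-scaled mapping quality."""
    return -10 * math.log10(p)


def filter_by_mapq(reads, cutoff=20):
    """Keep (name, mapq) pairs whose mapping quality is at least `cutoff`."""
    return [r for r in reads if r[1] >= cutoff]
```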

Brent Pedersen

Shared publicly  - 
When my car starts sliding in the snow, a flashing indicator comes on the dash (a car with some squiggly lines behind it), apparently to distract me from the fact that I'm, umm, sliding.
I don't see how that's a useful feature.

This is the first day I drove the 8 miles to work instead of cycling. Next time, I might snow-shoe it.
I went off the grid for 2002 and was a ski bum. Lived in a little town near Breckenridge. What a year.

Brent Pedersen

Shared publicly  - 
Looks like a nice hike!

Brent Pedersen

Shared publicly  - 
+Aaron Quinlan congrats on the paper. Love figure 3.
Motivation: The comparison of diverse genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g., genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect: that is, they overlap and thus share a common genome inter...
Thanks Brent.  The algorithm is dead simple, but very effective.  If only the CUDA GPU libs were a bit more accessible for the typical user.
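The paper's algorithm runs on the GPU; as a CPU-only illustration of the underlying question (how many intervals in one set intersect each interval in another), here is a sort-and-binary-search sketch. It is not the paper's method, and it assumes half-open `[start, end)` intervals on a single chromosome with `start < end`.

```python
from bisect import bisect_left, bisect_right


def count_overlaps(queries, database):
    """For each half-open [start, end) query, count database intervals overlapping it.

    [s, e) overlaps [qs, qe) iff s < qe and e > qs. Every database interval
    with e <= qs also has s < qe, so the overlap count is
    (#intervals with s < qe) - (#intervals with e <= qs).
    """
    starts = sorted(s for s, _ in database)
    ends = sorted(e for _, e in database)
    counts = []
    for qs, qe in queries:
        started_before_query_end = bisect_left(starts, qe)  # s < qe
        ended_before_query_start = bisect_right(ends, qs)   # e <= qs
        counts.append(started_before_query_end - ended_before_query_start)
    return counts
```

Sorting once costs O(m log m) and each query is then just two binary searches, which is one reason counting intersections can be much cheaper than enumerating them.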

Brent Pedersen

Shared publicly  - 
I have been thinking about model selection on genomic data. 

Normally we choose a single model and fit it across the genome. Let's say:

    ~ disease + age + gender + pack_years

but what if in most places:

    ~ disease + age + gender

is more appropriate (or maybe just ~ age)?

I'd like to just give an over-specified full model with all possible terms and then do model selection at each site. But then there are problems with multiple-testing correction (e.g., how to do it).

Anyone got any references on this?
contrast a linear model like: expression ~ disease + gender + age. with: expression ~ disease + gender + disease:gender + age. In both cases, I pull out the p-values for the "disease" parameters. Each...
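One way to prototype per-site selection is to fit every candidate model at each site and keep the one with the lowest AIC. This is a sketch under assumptions, not an answer to the multiple-testing question: AIC is a swapped-in criterion, the covariate names and candidate sets are hypothetical, and p-values computed after choosing a model on the same data are not calibrated in the usual way.

```python
import numpy as np


def ols_aic(y, X):
    """AIC (up to an additive constant) of an ordinary least-squares fit of y on X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k


def select_models(Y, covariates, candidate_models):
    """Pick, per site (column of Y), the candidate model with the lowest AIC.

    Y: (n_samples, n_sites) array of responses (e.g. expression).
    covariates: dict mapping covariate name -> (n_samples,) array.
    candidate_models: dict mapping model label -> list of covariate names.
    Every model gets an intercept.
    """
    n = Y.shape[0]
    intercept = np.ones(n)
    designs = {
        label: np.column_stack([intercept] + [covariates[name] for name in terms])
        for label, terms in candidate_models.items()
    }
    chosen = []
    for j in range(Y.shape[1]):
        aics = {label: ols_aic(Y[:, j], X) for label, X in designs.items()}
        chosen.append(min(aics, key=aics.get))
    return chosen
```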

Brent Pedersen

Shared publicly  - 
My obvious thoughts on methylation and correlation using the IPython Notebook:

I do see a lot of publications ignoring this.
Correlation. I see a lot of publications doing correlations on methylation data. It's something that I've thought about a lot as well. A particular cytosine may be always methylated. If we mea...
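The notebook itself isn't reproduced here, but one precaution in the same spirit is to skip sites whose methylation barely varies (e.g., an always-methylated cytosine) before correlating them with anything, since a correlation there mostly reflects small measurement noise. A minimal sketch; the `min_sd` cutoff of 0.02 on the beta-value scale is an arbitrary assumption.

```python
import math
from statistics import pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (both must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlate_sites(beta_by_site, phenotype, min_sd=0.02):
    """Correlate each site's beta values with a phenotype.

    Sites whose standard deviation is below `min_sd` (an assumed cutoff)
    get None instead of r.
    """
    results = {}
    for site, betas in beta_by_site.items():
        results[site] = pearson(betas, phenotype) if pstdev(betas) >= min_sd else None
    return results
```

For example, a site with beta values 0.97, 0.98, 0.97, 0.98 against a phenotype of 1, 2, 3, 4 has r ≈ 0.45 from differences of only 0.01, which the filter reports as None.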
BTW, now that we have the nbconvert-based notebook viewer up, you can share an HTML view of any notebook on the web by prepending  to its URL.

Brent Pedersen

Shared publicly  - 
An ROC analysis using a number of aligners for colorspace (scroll down on the linked page to see notes, image):

It is for real SOLiD 3 data from a targeted resequencing project, so I can gauge accuracy by the number of reads that are mapped inside the target region.

This is preliminary, and I *welcome any feedback*. I certainly have not optimized
the parameters for all (any?) aligners. I did not trim the reads.

BFAST does quite well; Novoalign takes forever but maps a lot more reads, most of them accurately.
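The ROC-like curve described here can be sketched as follows, treating reads mapped inside the target region as true positives and reads outside it as false positives (which, as the comment below notes, penalizes reads correctly mapped off-target). Ranking reads by mapping quality is an assumption about how the curve is traced, and the `(mapq, on_target)` input format is hypothetical.

```python
def roc_points(reads):
    """reads: iterable of (mapq, on_target) pairs.

    Returns [(cutoff, n_off_target, n_on_target), ...] for each distinct MAPQ,
    from strictest to most permissive, counting reads with mapq >= cutoff.
    """
    reads = sorted(reads, key=lambda r: r[0], reverse=True)
    points = []
    on_target = off_target = 0
    i = 0
    while i < len(reads):
        cutoff = reads[i][0]
        # Absorb every read sharing this MAPQ before emitting a point.
        while i < len(reads) and reads[i][0] == cutoff:
            if reads[i][1]:
                on_target += 1
            else:
                off_target += 1
            i += 1
        points.append((cutoff, off_target, on_target))
    return points
```

Plotting `n_off_target` on the x-axis against `n_on_target` on the y-axis, one curve per aligner, gives the kind of comparison described above.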
The same thing happens here: (note that the x-axis is log-scaled)
I think the line is not initially vertical in my data because some reads are correctly mapped outside the target region, since the capture wasn't perfect.

Brent Pedersen

Shared publicly  - 
Great title:

I've been seeing Multi-Dimensional Scaling (MDS) more frequently lately.
It's not clear to me how it differs from PCA. Does anyone know why or when it's more useful?
1, 2, 3: Counting the fingers on a chicken wing. Carkett MD and Logan MPO. Genome Biology 2011, 12:130 (28 October 2011).
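On the MDS vs. PCA question: classical (Torgerson) MDS applied to Euclidean distances gives the same coordinates as PCA, up to the sign of each axis. MDS differs, and becomes useful, when you start from some other dissimilarity (e.g., 1 - correlation) or only have the distances. A small numpy sketch that checks the equivalence:

```python
import numpy as np


def pairwise_euclidean(X):
    """Matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS from a matrix of pairwise distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]    # keep the k largest
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))


def pca_scores(X, k=2):
    """Scores of the rows of X on the first k principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]
```

For Euclidean D, `classical_mds(pairwise_euclidean(X))` and `pca_scores(X)` agree column by column up to a sign flip, because the double-centred matrix B equals Xc Xc^T.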
great, thanks!

Brent Pedersen

Shared publicly  -

A meta-analysis paper looking at the association of published and random gene
sets with breast cancer outcome. They find that you can choose a
random set of genes and find (in cases where there are > 100 genes
involved) that 90% of those sets are "associated" with breast cancer outcome.
This is another very interesting paper related to how we do our statistical
analyses. It's essentially the problem of correlation != causation.
And more so, that (from the paper):

the question is not whether a given set of genes is related to survival, but
whether it is more related to survival than random sets of genes

Because of this problem, they find that their random gene sets, and even gene
sets for unrelated processes (including "postprandial laughter"), are associated
with breast cancer outcome. And 28 of 47 published studies had an association
that was not stronger than expected by chance.
Only 18 of those 47 were more significant than all but 5% of the random
signatures (p < 0.05).

This problem should be somewhat specific to cancer because it causes [sic]
crazy expression changes in lots of genes.
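The paper's point, that a signature should beat random gene sets rather than merely be associated with survival, amounts to an empirical p-value against random signatures. A minimal sketch: `score_fn` is a hypothetical stand-in for whatever association statistic is used (larger means more strongly associated), and the number of random sets is arbitrary.

```python
import random


def empirical_pvalue(observed, random_scores):
    """Share of random signatures scoring at least as high as the observed one.

    Uses the (1 + count) / (1 + N) convention so the estimate is never 0.
    """
    at_least = sum(1 for s in random_scores if s >= observed)
    return (1 + at_least) / (1 + len(random_scores))


def random_signature_scores(all_genes, size, score_fn, n_sets=1000, seed=0):
    """Score `n_sets` random gene sets of the same size as the signature.

    score_fn: hypothetical association statistic (bigger = stronger).
    """
    rng = random.Random(seed)
    return [score_fn(rng.sample(all_genes, size)) for _ in range(n_sets)]
```

A published signature would then be judged by whether `empirical_pvalue` is small, not by whether its own survival association is significant.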

Brent Pedersen

Shared publicly  - 
My summary of the paper by the authors of the LAST aligner.

They introduce "aligned column accuracy", which differs from "mapping accuracy"
in that the latter only checks whether the mapped location overlaps the true
location, whereas the former is per-base and more important for the accuracy of variant calls.
They use probabilistic alignments based on posterior decoding (I imagine this is
like the dynamic programming used to convert colorspace alignments to
base-space?) instead of relying on the maximum score.
They actually try 2 probabilistic alignment models.
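To make the distinction between the two metrics concrete, here is one simple way to compute them for a single read; it is an illustrative definition, not necessarily the paper's exact one. Each read base is assumed to carry the reference position it was aligned to (None if unaligned), and intervals are half-open.

```python
def mapping_correct(mapped_start, mapped_end, true_start, true_end):
    """Coarse 'mapping accuracy': does the mapped interval overlap the true one?"""
    return mapped_start < true_end and true_start < mapped_end


def aligned_column_accuracy(aligned_cols, true_cols):
    """Per-base accuracy: fraction of read bases placed on their true reference column.

    aligned_cols / true_cols: for each read base, the reference position it is
    aligned to (None if unaligned, e.g. an inserted or clipped base).
    """
    correct = sum(1 for a, t in zip(aligned_cols, true_cols) if a is not None and a == t)
    return correct / len(true_cols)
```

A read whose alignment is shifted by one base after an indel still overlaps its true location, so it counts as correctly mapped, yet only the columns before the shift are right.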

They use dwgsim and Stampy for simulations, and a set of 36 and 76 bp reads from SRA.
They compare LAST with BWA, Bowtie, Novoalign, SHRiMP2 and Stampy.

Simulated Data
Probabilistic alignment improves LAST's sensitivity by 2% and 6% for indel and
gap accuracy, respectively.

LAST has the highest sensitivity and PPV among the tested aligners for mapping
accuracy, aligned column accuracy, and gap accuracy. Novoalign fares well. BWA doesn't
look so great "because it is designed to be more accurate and faster on
queries with low error rates".
They credit LAST's adaptive seeds for its good performance.

Real Data
Table 1. The probabilistic models in LAST take 4-5 times longer (194 and 184
minutes) than unmodified LAST (41 minutes) but are still faster than Novoalign
(518 minutes). Bowtie takes 3 minutes and BWA 16.

LAST seems to map far fewer reads on the 36 bp data (341,304, compared to 400K
to 500K for the other aligners). On the 76 bp data it's more in line with the
others. LAST uses a lot more memory than the other aligners (though only 15 GB).

Simulated Data for SNP calling
Table 2. It seems that, due to the high error rate, LAST greatly
outperforms the other aligners in terms of downstream SNP calls (by samtools).
Much higher sensitivity and PPV especially at 20x and 40x coverage (less so at
10x coverage).

Would have been interesting to see how this compares to SRMA or GATK
realignment post-processing. Not sure why that wasn't included...
I wondered about some of the tables...

So in this case, you mean an ROC of incorrectly mapped bases vs correctly mapped bases? Or I guess called SNPs vs false SNPs as in your review?
Bioinformatics Programmer at University of Utah Center for Genetic Discovery
  • University of California, Berkeley
  • University of Arizona
Basic Information
Bioinformatics Programmer, Computational Biologist
  • University of Colorado
    Computational Biologist, 2011 - 2015
  • UC Berkeley
    Programmer, Computational Biologist, 2005 - 2010
  • University of Utah
    Computational Biologist, 2015 - present
Salt Lake City, UT
Berkeley - Kaneohe, HI - Tucson, AZ - Denver, CO