## Profile

Guillaume Filion
Lives in Barcelona
121 followers|47,432 views

## Stream

### Guillaume Filion

Shared publicly  -

I finally took the time to write this post. It explains how I had misunderstood Principal Component Analysis and how a real-world example made me understand something very important about unsupervised classification.

"The fact that experiments segregate by laboratory of origin does not mean that this dominates the signal. The differences between labs were tiny, but they were systematic, happening always on the same few loci. The differences between proteins were large, but unstructured, so they were not picked up by the PCA."

#bioinformatics
#machinelearning  ﻿
ENCODE data, Principal Components and racism | Filed under ENCODE, racism, series: genetics and racism, Principal Component Analysis.

### Guillaume Filion

Shared publicly  -

Great article about CPU caches by Ulrich Drepper.

"A simple computation can show how effective caches can theoretically be. Assume access to main memory takes 200 cycles and access to the cache memory takes 15 cycles. Then code using 100 data elements 100 times each will spend 2,000,000 cycles on memory operations if there is no cache and only 168,500 cycles if all data can be cached. That is an improvement of 91.5%. (...) These are the numbers Intel lists for a Pentium M (actual access times measured in CPU cycles):

Register <= 1
L1d ~3
L2 ~14
Main Memory ~240"
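
The arithmetic in the quote can be checked with a short sketch. It assumes, as the quoted figures imply, that the first access to each element goes to main memory and every later access hits the cache. The function name `memory_cycles` is made up for this illustration.

```python
def memory_cycles(n_elements, n_uses, mem_cost, cache_cost):
    """Return (cycles without cache, cycles with a cache that holds all data)."""
    accesses = n_elements * n_uses
    no_cache = accesses * mem_cost
    # Assumption: the first touch of each element pays the main-memory cost,
    # every later touch pays the cache cost.
    cached = n_elements * mem_cost + (accesses - n_elements) * cache_cost
    return no_cache, cached


no_cache, cached = memory_cycles(100, 100, mem_cost=200, cache_cost=15)
saving = 1 - cached / no_cache
print(no_cache, cached, saving)
```

The saving works out to 0.91575, which the quote reports as 91.5%.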

#computerscience
#softwaredevelopment  ﻿
[Editor's note: This is the second installment in Ulrich Drepper's "What every programmer should know about memory" document. Those who have not read the first part will likely want to start there. This is good stuff, and we once again thank Ulrich for allowing us to publish it.

### Guillaume Filion

Shared publicly  -

Great article by Ulrich Drepper about computer memory architecture.
#computerscience
#softwaredevelopment  ﻿
[Editor's introduction: Ulrich Drepper recently approached us asking if we would be interested in publishing a lengthy document he had written on how memory and software interact. We did not have to look at the text for long to realize that it would be of interest to many LWN readers.

### Guillaume Filion

Shared publicly  -

Very well put: "I want developer parity so that I can spend my time improving code rather than debugging differences between environments."
#bioinformatics  ﻿
How I develop. Published Fri July 24 2015. I've been in the bioinformatics field for almost 10 years, originally coming from a molecular biology degree background, I decided to move into computing after struggling to find a job doing lab work. This post is a general outline of how I now develop ...

### Guillaume Filion

Shared publicly  -

The distribution of the largest fragment of a broken stick was worked out a long time ago, but somehow this result is difficult to find on the Internet. With the help of the Cross Validated community, I found readable proofs for this distribution and its asymptotic limit.
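
For readers who want to play with the result, here is a minimal sketch of the classical formula as I recall it, not necessarily in the form used in the linked proofs. It assumes a unit stick broken at n-1 independent uniform points (so n fragments), for which P(longest <= x) is the alternating sum over j of (-1)^j C(n, j) (1 - jx)^(n-1), keeping only the terms with 1 - jx > 0. The asymptotic limit is the Gumbel law for n * longest - log(n). Both function names are invented for this example.

```python
from math import comb, exp, log


def cdf_longest_fragment(x, n):
    """P(longest of n fragments <= x), unit stick, n-1 uniform break points."""
    if x >= 1:
        return 1.0
    if x * n <= 1:
        return 0.0  # the longest fragment is always at least 1/n
    total = 0.0
    for j in range(n + 1):
        rest = 1 - j * x
        if rest <= 0:
            break  # remaining terms vanish
        total += (-1) ** j * comb(n, j) * rest ** (n - 1)
    return total


def gumbel_approximation(x, n):
    """Large-n approximation: exp(-exp(-(n*x - log n))) = exp(-n * exp(-n*x))."""
    return exp(-n * exp(-n * x))


n = 200
x = (log(n) + 1) / n
print(cdf_longest_fragment(x, n), gumbel_approximation(x, n))
```

Two quick sanity checks: with n = 2 the formula reduces to 2x - 1 on [1/2, 1], and with n = 3 it gives 1/4 at x = 1/2, the well-known probability that three fragments can form a triangle.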
#statistics
#bioinformatics  ﻿

### Guillaume Filion

Shared publicly  -

"Time and energy spent on trying to increase internet clicks is time and energy we don't spend on the tedious administrative activities that are needed to actually affect change."﻿
Throughout history, engineers, medical doctors and other applied scientists have helped convert basic science discoveries into products, public goods and policy that have greatly improved our quality of life. With rare exceptions, it has taken years if not decades to establish these discoveries.

### Guillaume Filion

Shared publicly  -

Some distributions can produce "bad" samples for which usual estimators will fail. What to do in this case?
#statistics  ﻿

### Guillaume Filion

Shared publicly  -

In this blog post I explain how to use the so-called "stick breaking" process in the DNA alignment problem.

"Inserting k mutations at random in a sequencing read will produce k+1 (possibly empty) subsequences without errors. The process is analogous to inserting k breaks at random in a stick of length 1, and we can approximate the distribution of the longest subsequence without error by that of the longest fragment when breaking the stick."
#bioinformatics
#statistics  ﻿
Stick breaking and DNA alignment | Filed under heuristic, sequence alignment, spacings, stick breaking, bioinformatics.
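
The analogy in the quote can be tried out with a small simulation. This is only a sketch under assumptions that are mine, not the post's: mutations are substitutions at k distinct positions chosen uniformly at random, and the stick result is rescaled by the read_length - k error-free nucleotides. It uses the known mean of the longest of n = k + 1 stick fragments, H_n / n, where H_n is the n-th harmonic number. All names are hypothetical.

```python
import random


def longest_error_free(read_length, k, rng):
    """Longest error-free run after placing k mutations at distinct positions."""
    positions = sorted(rng.sample(range(read_length), k))
    longest, previous = 0, -1
    for pos in positions + [read_length]:  # sentinel closes the last run
        longest = max(longest, pos - previous - 1)
        previous = pos
    return longest


def stick_approximation(read_length, k):
    """Mean longest fragment of a stick with k breaks, rescaled to the read."""
    n = k + 1
    harmonic = sum(1 / i for i in range(1, n + 1))
    return (read_length - k) * harmonic / n


rng = random.Random(1)
trials = 20000
simulated = sum(longest_error_free(100, 4, rng) for _ in range(trials)) / trials
print(simulated, stick_approximation(100, 4))
```

The two numbers should land close to each other, though not exactly on top of each other, because the read is discrete and the stick is continuous.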

### Guillaume Filion

Shared publicly  -

A very simple Python example of using kd-trees, with a practical application: counting shootings near schools.
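
To give a flavour of the idea, here is a minimal pure-Python sketch of a 2-d kd-tree with a radius count. It is not the code from the linked post; the function names and the random coordinates standing in for incidents and a school are made up for illustration.

```python
import random


def build_kdtree(points, depth=0):
    """Build a 2-d kd-tree by splitting on the median, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def count_within(node, center, radius):
    """Count the stored points at Euclidean distance <= radius from center."""
    if node is None:
        return 0
    px, py = node["point"]
    inside = (px - center[0]) ** 2 + (py - center[1]) ** 2 <= radius ** 2
    count = 1 if inside else 0
    axis = node["axis"]
    diff = center[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    count += count_within(near, center, radius)
    # Visit the far side only if the search disc crosses the splitting line.
    if abs(diff) <= radius:
        count += count_within(far, center, radius)
    return count


rng = random.Random(0)
incidents = [(rng.random(), rng.random()) for _ in range(1000)]  # hypothetical data
tree = build_kdtree(incidents)
school = (0.5, 0.5)  # hypothetical location
print(count_within(tree, school, 0.1))
```

The tree pays off when the same set of incidents is queried for many locations, since each query can skip whole branches instead of scanning every point.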
#python   ﻿

### Guillaume Filion

Shared publicly  -

Updating the lab website with our publications. It's nice to keep it alive.﻿

### Guillaume Filion

Shared publicly  -

Ever wanted to compute eigenvalues with sparse matrices? ARPACK is the real deal. Surprisingly, it is not available in R by default, but the developers of the igraph package have written a nice port. So in case you are looking for it, here it is.

#computing
#mathematics
#statistics  ﻿
Arguments. func: The function to perform the matrix-vector multiplication. ARPACK requires to perform these by the user. The function gets the vector x as the first argument, and it should return Ax, where A is the “input matrix”. (The input matrix is never given explicitly.) ...
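
The "matrix-free" interface described above, where the user supplies only a function computing Ax, can be illustrated with a much simpler method than the one ARPACK uses. The sketch below is plain power iteration in Python, not ARPACK and not the igraph API; the dictionary storage and all names are assumptions made for this example.

```python
from math import sqrt


def sparse_matvec(entries, x):
    """Return A x, where A is stored as {(row, col): value} with zeros omitted."""
    y = [0.0] * len(x)
    for (i, j), a in entries.items():
        y[i] += a * x[j]
    return y


def dominant_eigenvalue(matvec, n, iterations=200):
    """Power iteration: needs only a function x -> A x, never A itself."""
    # Fixed start vector; it must not be orthogonal to the dominant eigenvector.
    x = [1.0] * n
    for _ in range(iterations):
        y = matvec(x)
        norm = sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    y = matvec(x)
    return sum(a * b for a, b in zip(x, y))  # Rayleigh quotient of a unit vector


# Symmetric 3x3 example with eigenvalues 3, 1 and 1.
A = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0, (2, 2): 1.0}
print(dominant_eigenvalue(lambda x: sparse_matvec(A, x), 3))
```

Power iteration finds only the dominant eigenvalue and converges slowly when the top two eigenvalues are close, which is one reason to reach for a real ARPACK port on serious problems.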

### Guillaume Filion

Shared publicly  -

Lior Pachter recently offered a cash prize for answering a scientific question. He got over a million views and received several answers from distinguished biologists. Very neat.

#bioinformatics
#evolution  ﻿
Two weeks ago in my post Pachter's P-value Prize I offered $\frac{\$100}{p}$ for justifying a reasonable null model and a p-value (p) associated to the statement "Strikingly, 95% of ca...

## People

In his circles
191 people
Have him in circles
121 people

## Work

Occupation
Team leader at the CRG, Barcelona

## Basic Information

Gender
Male

## Story

Tagline
Randomness is an attitude

Introduction
"Because I give a fuck!" That's why I am doing what I do. I give a fuck that everybody gets a chance to learn. I give a fuck that knowledge be shared. I give a fuck that children will live in a better world than ours.

## Places

Currently
Barcelona

## Links

Contributor to

## +1's

Guillaume Filion's +1's are the things they like, agree with, or want to recommend.

ENCODE data, Principal Components and racism
blog.thegrandlocus.com
ENCODE data, Principal Components and racism | Filed under ENCODE, racism, series: genetics and racism, Principal Component Analysis.

Memory part 2: CPU caches [LWN.net]
lwn.net
[Editor's note: This is the second installment in Ulrich Drepper's "What every programmer should know about memory" document. Those who have

What every programmer should know about memory, Part 1 [LWN.net]
lwn.net
[Editor's introduction: Ulrich Drepper recently approached us asking if we would be interested in publishing a lengthy document he had writt

Using KDTree's in python to calculate neighbor counts
andrewpwheeler.wordpress.com
For a few different projects I've had to take a set of crime data and calculate the number of events nearby. It is a regular geospatial task

Bioinformatics Zen - How I develop
www.bioinformaticszen.com
How I develop. Published Fri July 24 2015. I've been in the bioinformatics field for almost 10 years, originally coming from a molecular bio

arpack {igraph} | inside-R | A Community Site for R
www.inside-r.org
Arguments. func: The function to perform the matrix-vector multiplication. ARPACK requires to perform these by the user. The function gets t

I was wrong
liorpachter.wordpress.com
Two weeks ago in my post Pachter's P-value Prize I offered $\frac{\$100}{p}$ for justifying a reasonable null model and a p-valu

Bayesian networks and causation
blog.thegrandlocus.com
Bayesian networks and causation | Filed under statistics, Bayesian networks, causes, correlation.

GigaScience | Full text | The ocean sampling day consortium
www.gigasciencejournal.com
Ocean Sampling Day was initiated by the EU-funded Micro B3 (Marine Microbial Biodiversity, Bioinformatics, Biotechnology) project to obtain

Starcode: sequence clustering based on all-pairs search
bioinformatics.oxfordjournals.org
Abstract Motivation: The increasing throughput of sequencing technologies offers new applications and challenges for computational biology.

What is bioinformatics about?
blog.thegrandlocus.com
What is bioinformatics about? | Filed under information retrieval, PubMed, journals, bioinformatics.

Genome Biology | Full text | When will ‘open science’ become simply ‘sci...
genomebiology.com
Open science describes the practice of carrying out scientific research in a completely transparent manner, and making the results of that r

Why do bioinformatics?
blog.thegrandlocus.com
Why do bioinformatics? | Filed under software pollution, benchmark, bioinformatics.

Google+
plus.google.com
Google+ is a place to connect with friends and family, and explore all of your interests. Share photos, send messages, and stay in touch wit

borborygmi
nhoffman.github.io
Illumina provides a program for demultiplexing sequencing output called bcl2fastq . They get a gold star for releasing the source - the down

5 Steps to Develop a Basic AngularJS Application with Example
www.thegeekstuff.com
Angular.js is a client-side javascript framework developed and maintained by Google. It is a MVW ( Model View Whatever ) framework. It gives

The Extent and Consequences of P-Hacking in Science
journals.plos.org
Publication bias resulting from so-called "p-hacking" is pervasive throughout the life sciences; however, its effects on general conclusions

I found the service at ShBarcelona to be good, with a high level of responsiveness. The personnel is committed, foreigner-friendly and amicable. One negative point: I found the repair services (plumbers, electricians etc.) they contract overpriced. But overall they left me with a good impression.
Public - 11 months ago
reviewed 11 months ago
1 review