This is an interesting discussion about the time it takes to publish papers in academia. The distribution has a rather heavy tail, so we have to think about what this means when publications are used for evaluation.
#statistics
Our work on polymer physics and chromatin has been published in Nature Communications. We show that DNA loops can cause the accumulation of transcription factors. This explains a puzzling observation from my postdoc and shows that the geometry of the genome influences the distribution of transcription factors. The future of this research line is exciting. https://rdcu.be/NGHt #biology #physics
"As a beginner in statistics, I remember being confused about one important thing. Every statistical test has a null hypothesis of the form “the correlation between the variables is 0”. Because statistics are approximate, I understood such hypotheses as “the correlation between the variables is approximately 0”. As it turns out, this has been the most harmful misconception I could have."

In this blog post, I explain some of the dangers associated with Big Data, and how trusting p-values blindly can fool you into believing that all your hypotheses are true.
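
To make the point concrete, here is a minimal sketch (not taken from the blog post itself). It assumes a hypothetical true correlation of about 0.02, which most people would call "approximately 0", and uses the Fisher z-transform with a normal approximation to get a two-sided p-value. The function names and the slope value are illustrative assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Two-sided p-value for H0 'the correlation is exactly 0'.

    Uses the Fisher z-transform and a normal approximation,
    which is reasonable for large n.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n, slope=0.02, seed=1):
    """Draw n pairs with a tiny (hypothetical) dependence and test it."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, 1) for x in xs]
    r = pearson_r(xs, ys)
    return r, p_value(r, n)


if __name__ == "__main__":
    # The same negligible correlation is "not significant" on a small
    # sample and overwhelmingly "significant" on a very large one.
    for n in (100, 10_000, 1_000_000):
        print(n, p_value(0.02, n))
```

With the correlation held fixed at 0.02, the p-value shrinks as the sample grows: the test rejects "exactly 0" even though the effect is still negligible. With enough data, almost any hypothesis of the form "there is some effect" will pass.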

#statistics
#bioinformatics
#BigData
For this blog post, I asked my friend and colleague Miguel Beato to share some of his thoughts on science and research.

"Guillaume Filion: What do you think has been the most important revolution in science since the beginning of your career?

Miguel Beato: The transition from analysing single events to global events in the cell. Actually, changing the microscope for statistics."
I am very excited to share my recent work on analytic combinatorics. The story started more than three years ago on Coursera. I took a class by Robert Sedgewick and got very intrigued by the topic. I bought the textbook and did as many exercises as I could for about two years, until I finally got an idea of what we could do with it in bioinformatics.

This is not an easy read, but I tried to be as clear as possible. With the help of friends and colleagues I sought the best didactic approach. I hope it is successful to some extent. In the end I am very proud to have written the first document with "analytic combinatorics" and "bioinformatics" in the title.
"T47D_rep2 and b1913e6c1_51720e9cf were two Hi-C samples. They were born and processed at the same time, yet their fates were very different. The life of b1913e6c1_51720e9cf was simple and fruitful, while that of T47D_rep2 was full of accidents and sorrow. At the heart of these differences lies the fact that b1913e6c1_51720e9cf was born under a lab culture of Documentation, Automation, Traceability, Autonomy and compliance with the FAIR Principles. Their lives are a lesson for those who wish to embark on the journey of managing high throughput sequencing data."
Here is our latest work on Hi-C normalization:

"We observed that current normalization methods are not robust to the presence of large-scale copy number variations, potentially obscuring biological differences and enhancing batch effects. To address this issue, we developed an alternative approach designed to take into account chromosomal abnormalities."
In this article, we discuss current challenges of data storage and organization. The literature is mostly concerned with defining standards, but little attention is paid to why people do not follow them.

Here, we follow the fate of a medium-size project and discuss the everyday issues of its members and the solutions they can use in practice. The narrative is largely autobiographical, but many researchers will recognize some of their own difficulties. Our hope is that this will help the community understand where the problems lie (lack of communication between experimenters and analysts, high turnover of laboratory members, lack of long-term planning for analysis and lack of awareness of standardization), and that it will help researchers implement solutions that work in their organization.

#bioinformatics