Tomorrow is a talk by the talented Paul Magwene. He is one of the deepest thinkers among my former trainees.
@lpachter kindly pointed out that Paul, Paul Lizardi, and I conceptualized pseudotime, and I want to use this to talk about method dev in compbio
1/n https://twitter.com/blkwomencompbio/status/1323307261616336898
In the early years of functional genomics, Spellman et al. was a landmark dataset for studying the transcriptome. https://www.molbiolcell.org/doi/full/10.1091/mbc.9.12.3273
My former colleague Paul Lizardi, who was interested in cancer progression, asked me about the possibility of using the transcriptome for time ordering
2/n
Paul L. is an incredibly creative biologist who invented rolling circle amplification, molecular beacons, methylome for long-read assembly, the universal microarray, etc. So, if anybody is to be given credit for conceptualizing time reconstruction from func genomics data, it should be Paul L.
3/n
When Paul M. and I started thinking about the problem, our first thought was to use Hastie's principal curves, which I didn't know about at the time and which Paul taught me. But given the sparsity of the data (~18 time points), it didn't seem reasonable.
http://web.stanford.edu/~hastie/Papers/Principal_Curves.pdf
4/n
Additional inspirations came from:
Traveling Salesman Problem curve reconstruction
Nina Amenta et al.'s work on combinatorial shape
Tenenbaum et al.'s nonlinear dimensionality reduction
https://dl.acm.org/doi/abs/10.5555/338219.338627
https://escholarship.org/uc/item/8pb179vt
https://science.sciencemag.org/content/290/5500/2319.full
5/n
Given the noise and sampling density, we settled on a tree-graph model with the idea of a diameter path. Then Paul and I discussed the modifications we needed to deal with high curvature (like cycling genes). He came up with the key PQ data structure to order the possible paths.
6/n
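For intuition, here is a minimal sketch of the tree-graph/diameter-path idea (not the paper's implementation): build a minimum spanning tree over samples in expression space, then read the tree's longest path (its diameter) as a candidate time ordering. Euclidean distance, the SciPy routines, and all function/variable names are my illustrative choices.

```python
# Sketch: MST over samples, then the tree's diameter path as a rough ordering.
# Assumptions (not from the paper): Euclidean distance, SciPy csgraph routines.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform

def diameter_path_order(expr):
    """expr: (n_samples, n_genes) array. Returns sample indices along the
    MST's diameter path, i.e. a candidate time ordering."""
    d = squareform(pdist(expr))               # pairwise Euclidean distances
    mst = minimum_spanning_tree(d)            # sparse MST of the sample graph
    # distances along the tree (edges treated as undirected), with predecessors
    tree_d, pred = shortest_path(mst, directed=False, return_predecessors=True)
    # diameter endpoints: the pair of samples farthest apart *on the tree*
    i, j = np.unravel_index(np.argmax(tree_d), tree_d.shape)
    # walk predecessors back from j to i to recover the path
    path = [int(j)]
    while path[-1] != i:
        path.append(int(pred[i, path[-1]]))
    return path[::-1]

# toy example: ~18 samples along a noisy 1-D trajectory in "expression" space
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 18)
expr = np.c_[np.sin(2 * t), np.cos(2 * t)] + rng.normal(0, 0.01, (18, 2))
order = diameter_path_order(expr)
```

On clean data like this toy, the MST is a simple chain and the diameter path recovers the sampling order (up to direction); the PQ structure and delta-shortest criterion in the thread address the harder cases where the tree branches.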
And then we added a delta-shortest criterion to resolve the possible paths. We published our paper in 2003 (below), but it didn't receive much attention until, fortunately, C. Trapnell adopted the ideas into Monocle for single-cell analysis.
7/n https://academic.oup.com/bioinformatics/article/19/7/842/197339
So, my view on the keys to comp methods:
(1) breadth of knowledge of methods/theories (Paul had vast quant breadth);
(2) depth of technical knowledge to engineer a fit to the problem at hand (again, Paul);
(3) numbers are numbers: almost always there is a preexisting body of knowledge.
8/n
New data types call for engineering adjustments, almost never de novo creation. Compbio people should be judged by their breadth and depth of knowledge, not the popularity of their methods. Lots of VHS out there. So, again, I strongly recommend checking out Paul's talk.
/end