单细胞数据高级分析之构建成熟路径 | Identifying a maturation trajectory

其实就是另一种形式的打分。

Identifying a maturation trajectory.

To assign each cell a maturation score that is proportional to the developmental progress, we first performed dimensionality reduction as described above using all genes that were detected in at least 2% of the cells (8,014 genes). This resulted in four significant dimensions. We then fit a principal curve (R package princurve, smoother= ‘lowess’, f= 1/3) through the data. The maturation score of a cell is then the arc-length from the beginning of the curve to the point at which the cell projects onto the curve.

The resulting curve is directionless, so we assign the ‘beginning’ of the curve so that the expression of Nes is negatively correlated with maturation. Nes is a known ventricular zone marker and therefore should only be highly expressed early in the trajectory. Maturation scores are normalized to the interval [0, 1]. In an independent analysis, we also used Monocle2 to order cells along a pseudo-time. We used Monocle version 2.3.6 with expression response variable set to negative binomial. We estimated size factors and dispersion using the default functions.

For ordering cells, we reduced the set of genes based on results of the monocle dispersion Table function, and only considered 718 genes with mean expression≥0.01 and an empirical dispersion at least twice as large as the fitted dispersion. Dimensionality reduction was carried out using the default method (DDRTree)

Defining mitotic and post mitotic populations.

We observed a sharp transition point along the maturation trajectory at which cells uniformly transitioned into a postmitotic state, corresponding to the loss of proliferation potential and exit from the cell cycle (Fig. 1f, Extended Data Fig. 1).

We therefore subdivided the maturation trajectory into a mitotic and postmitotic phase to facilitate downstream analyses. We defined cells with a high phase-specific enrichment score (score >2, see section ‘Removal of cell cycle effect’) as being in the S or the G2/M phase.

We then fitted a smooth curve (loess, span=0.33, degree=2) to number of cells in S, G2/M phases as a function of maturation score. The point where this curve falls below half the global average marks the dividing threshold (Fig. 1f).

# for maturation trajectory

# fit maturation trajectory
maturation.trajectory <- function(cm, md, expr, pricu.f=1/3) {
  cat('Fitting maturation trajectory\n')
  genes <- apply(cm[rownames(expr), ] > 0, 1, mean) >= 0.02 & apply(cm[rownames(expr), ] > 0, 1, sum) >= 3
  rd <- dim.red(expr[genes, ], max.dim=50, ev.red.th=0.04)
  # for a consisten look use Nes expression to orient each axis
  for (i in 1:ncol(rd)) {
    if (cor(expr['Nes', ], rd[, i]) > 0) {
      rd[, i] <- -rd[, i]
    }
  }
  
  md <- md[, !grepl('^DMC', colnames(md))]
  md <- cbind(md, rd)
  
  pricu <- principal.curve(rd, smoother='lowess', trace=TRUE, f=pricu.f, stretch=333)
  pc.line <- as.data.frame(pricu$s[order(pricu$lambda), ])
  md$maturation.score <- pricu$lambda/max(pricu$lambda)
  
  # orient maturation score using Nes expression
  if (cor(md$maturation.score, expr['Nes', ]) > 0) {
    md$maturation.score <- -(md$maturation.score - max(md$maturation.score))
  }
  
  # use 1% of neighbor cells to smooth maturation score
  md$maturation.score.smooth <- nn.smooth(md$maturation.score, rd[, 1:2], round(ncol(expr)*0.01, 0))
  
  
  # pick maturation score cutoff to separate mitotic from post-mitotic cells
  md$in.cc.phase <- md$cc.phase != 0
  fit <- loess(as.numeric(md$in.cc.phase) ~ md$maturation.score.smooth, span=0.5, degree=2)
  md$cc.phase.fit <- fit$fitted
  # pick MT threshold based on drop in cc.phase cells
  # ignore edges of MT because of potential outliers
  mt.th <- max(subset(md, cc.phase.fit > mean(md$in.cc.phase)/2 & maturation.score.smooth >= 0.2 & maturation.score.smooth <= 0.8)$maturation.score.smooth)
  
  md$postmitotic <- md$maturation.score.smooth > mt.th
  return(list(md=md, pricu=pricu, pc.line=pc.line, mt.th=mt.th))
}


# for smoothing maturation score

nn.smooth <- function(y, coords, k) {
  knn.out <- FNN::get.knn(coords, k)
  w <- 1 / (knn.out$nn.dist+.Machine$double.eps)
  w <- w / apply(w, 1, sum)
  v <- apply(knn.out$nn.index, 2, function(i) y[i])
  return(apply(v*w, 1, sum))
}

# maturation score colors
my.cols.RYG <- colorRampPalette(c("#a50026", "#d73027", "#f46d43", "#fdae61", "#fee08b",
                                  "#ffffbf", "#d9ef8b", "#a6d96a", "#66bd63", "#1a9850", "#006837"))(11)