Daniel Carlin's Current Research

I am currently building the Stem Cell Genomics Browser (SCGB). The central idea of the SCGB is to display gene expression data on top of the mouse or human genome. Initially, the main source of this data will be publicly available data sets, such as those in the Gene Expression Omnibus, but the browser will also support direct submissions from researchers. In addition to stem cell data, I will include some differentiated cell studies for contrast. The SCGB will include all of the relevant features of the CGB, including the ability to sort experiments by metadata such as mutant versus wild type, cell type, and which cell surface markers were used to sort the cells. The browser will include built-in statistical tests, such as the t-test, to identify significantly up- and down-regulated genes. Also, since the browser will have access to the UCSC Genome Browser Database, the expression data will be easily exportable so that it can be viewed alongside all of the regulation, conservation, and epigenetic data available in the CGB or Genome Browser.

The SCGB will put powerful bioinformatics tools in the hands of stem cell biologists. For instance, imagine a researcher is interested in genes that are up-regulated in cells positive for Sca1 and c-kit, two cell surface markers. Using the SCGB, the researcher could select the data associated with a study, sort the samples by surface marker, select the Sca1- and c-kit-positive cells for analysis, and click t-test. The browser would then return all genes that are significantly (according to a user-defined p-value threshold) up- or down-regulated in those cells, as compared to all data that are negative for those surface markers.
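The workflow in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the browser's actual code: the function names, the sample names, and the expression values are all made up, and the sketch assumes Welch's unequal-variance form of the t-test, which the text above does not specify. The p-value is obtained by numerically integrating the t density so that the example needs nothing beyond the standard library.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the t density integrated over [-|t|, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps  # Simpson's rule needs an even number of steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3
    return max(0.0, 1.0 - central_mass)

def welch_t_test(a, b):
    """Return (t statistic, two-sided p-value) for two groups of values."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    if va + vb == 0.0:
        # The statistic is undefined; this sketch treats it as not significant.
        return 0.0, 1.0
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, two_sided_p(t, df)

def differential_genes(expression, positive, negative, alpha=0.05):
    """Split genes into (up, down) lists for marker-positive vs. negative samples.

    expression maps gene -> {sample name: expression value}.
    """
    up, down = [], []
    for gene, values in expression.items():
        pos = [values[s] for s in positive]
        neg = [values[s] for s in negative]
        t, p = welch_t_test(pos, neg)
        if p < alpha:
            (up if t > 0 else down).append(gene)
    return up, down

# Made-up toy data: s1-s4 stand for Sca1/c-kit-positive samples, s5-s8 for negative ones.
positive = ["s1", "s2", "s3", "s4"]
negative = ["s5", "s6", "s7", "s8"]
samples = positive + negative
toy_expression = {
    "GeneA": dict(zip(samples, [8.1, 7.9, 8.3, 8.0, 5.0, 5.2, 4.9, 5.1])),
    "GeneB": dict(zip(samples, [3.0, 3.2, 2.9, 3.1, 6.0, 6.3, 5.8, 6.1])),
    "GeneC": dict(zip(samples, [6.0, 6.1, 5.9, 6.0, 6.0, 5.9, 6.1, 6.0])),
}
up, down = differential_genes(toy_expression, positive, negative, alpha=0.01)
print("up:", up, "down:", down)
```

Here `alpha` plays the role of the user-defined p-value threshold, and the sign of the t statistic decides whether a significant gene is reported as up- or down-regulated.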

I am also building a Stem Cell Classifier Compendium (SCCC) to provide formal models that can be used for automated classification of stem cells from molecular profiles. I will train a classifier for each stem cell type using the data that I have compiled for the stem cell browser and test its efficacy against random controls in a cross-validation setting. Once the classifiers are trained, a user will be able to use any of the accurate classifiers to discover whether their own expression data carry the signature of any stem cell type. The classifiers will provide the probability that a particular expression profile fits each of the stem cell types, or none at all. This will provide an objective way to test hypotheses, such as whether cancer cells with higher metastatic potential are more similar to stem cells, or have a higher stem cell content, than more benign cells.
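To make the evaluation idea concrete, here is a minimal sketch of cross-validated classification with per-class probabilities. Several things in it are assumptions rather than the SCCC design: a nearest-centroid classifier stands in for the real models, leave-one-out is used as the cross-validation scheme, "random controls" is interpreted as repeating the evaluation with shuffled labels, and the class names and two-gene profiles are invented toy values.

```python
import math
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length expression vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_centroids(profiles, labels):
    """Group the profiles by label and compute one centroid per class."""
    groups = {}
    for row, label in zip(profiles, labels):
        groups.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in groups.items()}

def class_probabilities(profile, centroids):
    """Softmax over negative squared distances to each class centroid."""
    scores = {label: -sum((x - c) ** 2 for x, c in zip(profile, center))
              for label, center in centroids.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

def leave_one_out_accuracy(profiles, labels):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i, held_out in enumerate(profiles):
        train_x = profiles[:i] + profiles[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        probs = class_probabilities(held_out, fit_centroids(train_x, train_y))
        if max(probs, key=probs.get) == labels[i]:
            correct += 1
    return correct / len(profiles)

def shuffled_label_baseline(profiles, labels, rounds=200, seed=0):
    """Mean leave-one-out accuracy after randomly permuting the labels."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        permuted = labels[:]
        rng.shuffle(permuted)
        total += leave_one_out_accuracy(profiles, permuted)
    return total / rounds

# Invented two-gene profiles for two hypothetical stem cell classes and a "none" class.
profiles = [[9.0, 1.0], [8.5, 1.5], [9.2, 0.8],
            [1.0, 9.0], [1.4, 8.6], [0.9, 9.3],
            [5.0, 5.0], [5.2, 4.7], [4.8, 5.1]]
labels = ["HSC-like"] * 3 + ["ESC-like"] * 3 + ["none"] * 3

trained = fit_centroids(profiles, labels)
print("cross-validated accuracy:", leave_one_out_accuracy(profiles, labels))
print("shuffled-label baseline:", shuffled_label_baseline(profiles, labels))
print("probabilities for a new profile:", class_probabilities([8.8, 1.2], trained))
```

A classifier would count as accurate in this sketch only if its cross-validated accuracy clearly exceeds the shuffled-label baseline. The "none" class is one simple way to let a profile match no stem cell type at all.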

I will derive classifiers using standard machine-learning techniques, such as support-vector machines and penalized discriminants. Given known positive and negative examples (our data from the browser), a classifier assigns a new data case to either the positive or the negative set. Several studies (e.g. Guyon et al. 2000 and Fishel et al. 2007) have addressed this problem in cancer, where there is a strong incentive for classifying tissue as normal or tumorous, and metastatic or benign. These techniques are ripe for application to stem cell biology.
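As an illustration of the positive/negative setting, the following sketch trains a linear support-vector machine by batch subgradient descent on the L2-penalized hinge loss. It is a teaching example under assumptions, not the method of the cited studies: the function names, the hyperparameters (`penalty`, `rate`, `epochs`), and the two-feature profiles are hypothetical, and the toy values are assumed to be centered log-ratios.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(samples, labels, penalty=0.01, rate=0.05, epochs=500):
    """Minimise penalty/2 * |w|^2 + mean hinge loss by subgradient descent.

    labels must be +1 (positive examples) or -1 (negative examples).
    """
    n = len(samples)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [penalty * w for w in weights]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Only samples inside the margin contribute to the hinge subgradient.
            if y * (dot(weights, x) + bias) < 1.0:
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        weights = [w - rate * g for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b
    return weights, bias

def classify(profile, weights, bias):
    """Assign a new profile to the positive (+1) or negative (-1) set."""
    return 1 if dot(weights, profile) + bias >= 0 else -1

# Invented, centered log-ratio profiles: four positive and four negative examples.
samples = [[2.0, 0.5], [2.5, -0.3], [1.8, 0.2], [2.2, -0.4],
           [-2.0, 0.4], [-2.4, -0.2], [-1.9, 0.1], [-2.1, -0.3]]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

weights, bias = train_linear_svm(samples, labels)
print("new profile assigned to:", classify([3.0, 0.0], weights, bias))
```

Real expression data have far more genes than samples, which is why penalized methods are attractive. The `penalty` term here is the simplest form of that idea.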

My previous research projects can be found here.
