See the video tutorial: Getting Started (5 minutes)
Note that this video tutorial was created with GenePattern 3.2
The GenePattern Tutorial introduces you to GenePattern by providing step-by-step instructions for analyzing gene expression. It takes approximately 40 minutes to complete.
Note that this hands-on tutorial was created with GenePattern 3.8.0.
All of the information you need to successfully complete this tutorial is contained in the tutorial. For users who like additional discussion along the way, the tutorial includes pointers to more information in other GenePattern guides. Feel free to follow these links or to ignore them, depending on your learning style.
To follow the hands-on instructions in this tutorial, you must have access to the following:
The gene expression dataset used in the tutorial is from Golub and Slonim et al. (1999), which used clustering and prediction algorithms to find genes that distinguish between two subtypes of leukemia, ALL and AML. The dataset consists of 38 bone marrow samples (27 ALL, 11 AML) obtained from acute leukemia patients.
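Class labels for a dataset like this are stored in a CLS file (used later in this tutorial as all_aml_train.cls). The following is a minimal sketch, assuming the usual three-line CLS layout: a count line, a class-name line beginning with #, and a line of space-separated class indices. It parses such text and tallies the samples per class. The function name parse_cls is hypothetical, and the example text simply lists all ALL samples first; the real file may order its samples differently.

```python
from collections import Counter

def parse_cls(text):
    """Parse CLS-formatted text; return (class_names, per-sample labels)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # Line 1: <number of samples> <number of classes> 1
    n_samples, n_classes = (int(x) for x in lines[0].split()[:2])
    # Line 2: "# name0 name1 ..."
    class_names = lines[1].lstrip("#").split()
    # Line 3: one class index per sample (assumed numeric in this sketch)
    indices = [int(x) for x in lines[2].split()]
    if len(indices) != n_samples or len(class_names) != n_classes:
        raise ValueError("CLS header does not match its contents")
    return class_names, [class_names[i] for i in indices]

# Illustrative CLS text using the class sizes stated above (27 ALL, 11 AML).
example = "38 2 1\n# ALL AML\n" + " ".join(["0"] * 27 + ["1"] * 11) + "\n"
names, labels = parse_cls(example)
counts = Counter(labels)
print(names, dict(counts))
```

A check like this is a quick way to confirm that a class file describes the number of samples you expect before you submit it to an analysis.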
For the purposes of this tutorial, your goal is to identify marker genes for the two subtypes of leukemia:
Note: If you are using a different version of any analysis, follow the instructions as closely as possible, but be aware that your results might not match those shown in the tutorial.
To start GenePattern:
Note: Your home page may look slightly different depending on your GenePattern server, browser, and operating system. For this picture, we are using our public GenePattern server, Chrome, and Windows 7.
Click the GenePattern icon to return to this home page at any time.
The upper right corner shows your user name.
The navigation bar provides access to other pages.
The Modules & Pipelines panel provides access to the analyses that you can run. Enter the first few characters of a module or pipeline name in the search box to locate that analysis. Click the Browse Modules button to list them alphabetically, by category, or by suite. You will also find your Favorite Modules in this panel, as well as Recent Modules you have used.
The center pane is the main display pane, which GenePattern uses to display information and to prompt you for input.
The Jobs tab lists the most recent analyses that you have run and their results files. The Files tab lists files that you have copied to the GenePattern server. When you start GenePattern for the first time, these tabs are empty.
For more information: see User Interface in the GenePattern User Guide.
Now that you have started GenePattern, you are ready to analyze your data. In this section, you learn how to:
Run the ComparativeMarkerSelection analysis to find the genes in the dataset that are most closely correlated with the two phenotypes (ALL and AML) in the dataset. To run the ComparativeMarkerSelection analysis:
all_aml_train.gct (https://datasets.genepattern.org/data/all_aml/all_aml_train.gct).
all_aml_train.cls (https://datasets.genepattern.org/data/all_aml/all_aml_train.cls).
For more information: see Running Modules and Pipelines.
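To give a feel for what "most closely correlated with the two phenotypes" can mean, here is a minimal sketch that scores each gene with a two-sample t-statistic and ranks genes by the size of that score. This is an illustration only, not the ComparativeMarkerSelection implementation: the choice of statistic is an assumption, any significance testing is omitted, and the function names and toy values are made up.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Two-sample t-statistic with unequal variances (Welch's form)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def rank_genes(expression, labels, class_a, class_b):
    """Return (gene, score) pairs sorted by descending absolute score."""
    scores = []
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores.append((gene, t_statistic(a, b)))
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)

# Toy data (made up for illustration, not taken from all_aml_train.gct).
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
expression = {
    "geneA": [10.0, 11.0, 12.0, 30.0, 31.0, 32.0],  # clearly lower in ALL
    "geneB": [20.0, 25.0, 22.0, 21.0, 24.0, 23.0],  # little difference
    "geneC": [50.0, 52.0, 51.0, 40.0, 42.0, 41.0],  # higher in ALL
}
ranked = rank_genes(expression, labels, "ALL", "AML")
for gene, score in ranked:
    print(f"{gene}\t{score:.2f}")
```

The sign of the score shows which class has the higher expression, and the magnitude shows how well the gene separates the classes relative to its within-class variation.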
When you return to the GenePattern home page, the Jobs tab shows the analysis job that you ran and the associated analysis results files:
Download | Downloads a zip file containing all analysis results files for this job.
Reload | Displays the analysis and its parameters, with the parameters set to the values used for this analysis job.
Delete | Deletes the analysis job and its analysis results files from the GenePattern server.
Info | Displays the parameter values and the analysis results files for this job.
View Java Code, View MATLAB Code, View R Code | Displays the command line that you would use to run this job in the Java, MATLAB, or R programming environments. These commands are useful for programmers who want to access GenePattern from one of these programming environments or from their own applications.
Delete | Deletes the file from the GenePattern server.
Save | Downloads the file from the GenePattern server.
Create Pipeline | Creates a GenePattern pipeline that reproduces this analysis results file. Pipelines are discussed later in this tutorial.
List of analyses | Lists analyses that commonly use this type of file as an input parameter. Select an analysis to display its parameters with this results file specified as the first input parameter.
Open the analysis results file, all_aml_train.comp.marker.odf, to display it in a text viewer. The amount of information it contains makes the file difficult to understand. This file, like most analysis results files, is not intended to be viewed as a text file, but rather to be used as input to subsequent analyses.
For more information: see Working with Analysis Results.
After running the ComparativeMarkerSelection analysis, run the ComparativeMarkerSelectionViewer to examine the analysis results. To run the ComparativeMarkerSelectionViewer:
all_aml_train.comp.marker.odf results file.
all_aml_train.comp.marker.odf results file.
all_aml_train.gct file (https://datasets.genepattern.org/data/all_aml/all_aml_train.gct).
The ComparativeMarkerSelectionViewer appears:
Now that you have examined the ComparativeMarkerSelection analysis results, you want to create a new dataset that contains only the most promising marker genes from the results file for further analysis. To run the ExtractComparativeMarkerResults analysis:
all_aml_train.comp.marker.odf.
all_aml_train.comp.marker.odf results file.
all_aml_train.gct file (https://datasets.genepattern.org/data/all_aml/all_aml_train.gct).
The HeatMapViewer displays expression values in a color-coded heat map. The largest expression values are displayed in red (hot) and the smallest values are displayed in blue (cool). Intermediate values are displayed in different shades of red and blue. The color-coding provides a quick, coherent view of gene expression levels.
To display your new dataset in the HeatMapViewer:
all_aml_train.comp.marker.filt.gct.
all_aml_train.comp.marker.filt.gct results file.
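The red-to-blue color coding described above can be sketched as a simple mapping from an expression value to an RGB color. This is a minimal illustration, not the HeatMapViewer's actual color scheme: the function name heat_color, the white midpoint, and the scaling against one row's minimum and maximum are all assumptions.

```python
def heat_color(value, lo, hi):
    """Map value in [lo, hi] to an (r, g, b) tuple: blue at lo, red at hi."""
    if hi == lo:
        return (255, 255, 255)  # no spread in the data: neutral white
    t = (value - lo) / (hi - lo)  # 0.0 at the minimum, 1.0 at the maximum
    t = min(1.0, max(0.0, t))     # clamp values outside the range
    if t < 0.5:
        # Lower half: shades of blue, fading from pure blue toward white.
        fade = round(255 * (t / 0.5))
        return (fade, fade, 255)
    # Upper half: shades of red, fading from white toward pure red.
    fade = round(255 * ((1.0 - t) / 0.5))
    return (255, fade, fade)

row = [120.0, 480.0, 300.0, 840.0]  # made-up expression values for one gene
colors = [heat_color(v, min(row), max(row)) for v in row]
for value, color in zip(row, colors):
    print(value, color)
```

Scaling each gene against its own range, as done here, makes differences between samples visible even when genes have very different overall expression levels.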
As you have seen, GenePattern makes it easy to run individual analyses and to review analysis results. Pipelines take this one step further: they make it easy to run multiple analyses. You can define a pipeline to run multiple analyses against a single dataset or to run a sequence of analyses, where the output from one analysis becomes the input for a subsequent analysis. Modules run from a pipeline work exactly the same as those run directly from GenePattern.
In this tutorial, you have run two analyses: ComparativeMarkerSelection and ExtractComparativeMarkerResults. The analysis results file from the first analysis became the input file for the second analysis. Running these two analyses produced a new dataset that contains the 100 genes in your dataset (all_aml_train.gct) that are most closely correlated with phenotypes in your class file (all_aml_train.cls).
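The hand-off between the two analyses can be pictured as follows: the first analysis yields a ranking of genes, and the second keeps only the top-ranked genes from the original dataset. This is a minimal sketch under that assumption, with a hypothetical function name and made-up toy data; it is not the ExtractComparativeMarkerResults implementation.

```python
def extract_top_genes(ranked_genes, dataset, max_genes=100):
    """Return a new dataset holding only the first max_genes ranked genes."""
    return {gene: dataset[gene]
            for gene in ranked_genes[:max_genes] if gene in dataset}

# Toy stand-ins (made up): a ranking from the first analysis and a dataset.
ranked_genes = ["geneC", "geneA", "geneE", "geneB", "geneD"]  # best marker first
dataset = {
    gene: [float(i), float(i) + 1.0]
    for i, gene in enumerate(["geneA", "geneB", "geneC", "geneD", "geneE"])
}

filtered = extract_top_genes(ranked_genes, dataset, max_genes=2)
print(list(filtered))
```

With the default max_genes=100 and a dataset of thousands of genes, the result corresponds to the 100-gene dataset described above.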
In this section, you will:
For more information: see Working with Pipelines in the GenePattern User Guide.
See the video tutorial: Exploring the New GenePattern Pipeline Designer
You can create a pipeline in one of three ways:
In this tutorial, you want to create a pipeline based on the ExtractComparativeMarkerResults results file:
all_aml_train.comp.marker.filt.gct.
In this section, you first explore the pipeline designer and then examine the content of the pipeline.
Each time you create or edit a pipeline, GenePattern displays the pipeline designer:
The pipeline designer comprises three main parts (from left to right):
At the top of the pipeline designer, the toolbar provides the following options:
Displays the basic pipeline properties in the Editing Pipeline panel, as shown here.
Saves your changes without closing the designer, and provides the option to run after saving.
Loads the last saved version of the pipeline, overwriting any unsaved changes.
Displays the pipeline designer section of the GenePattern documentation.
The pipeline displayed in the pipeline designer reproduces the ExtractComparativeMarkerResults analysis results file:
The ComparativeMarkerSelection module has two input files. As shown by the connections in the diagram, the all_aml_train.gct file is the input for the input.file parameter and the all_aml_train.cls file is the input for the cls.file parameter.
The ExtractComparativeMarkerResults module also has two input files. The odf output file of the ComparativeMarkerSelection module is the input for the comparative.marker.selection.filename parameter and the all_aml_train.gct file is the input for the dataset.filename parameter.
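The connections described above can be written down as plain data, which may help if you prefer to read the pipeline as text rather than as a diagram. The structure below is a hypothetical representation invented for this illustration (it is not GenePattern's pipeline file format); only the module names, parameter names, and file names come from the tutorial.

```python
# Each step lists its parameters. A parameter value is either a file name or
# a reference to the output of an earlier step (hypothetical representation).
pipeline = [
    {
        "module": "ComparativeMarkerSelection",
        "params": {
            "input.file": {"file": "all_aml_train.gct"},
            "cls.file": {"file": "all_aml_train.cls"},
        },
    },
    {
        "module": "ExtractComparativeMarkerResults",
        "params": {
            "comparative.marker.selection.filename": {"output_of": 0, "type": "odf"},
            "dataset.filename": {"file": "all_aml_train.gct"},
        },
    },
]

def describe_connections(steps):
    """Return one 'source -> module.parameter' line per connection."""
    lines = []
    for step in steps:
        for name, source in step["params"].items():
            if "file" in source:
                origin = source["file"]
            else:
                upstream = steps[source["output_of"]]["module"]
                origin = f"{upstream} ({source['type']} output)"
            lines.append(f"{origin} -> {step['module']}.{name}")
    return lines

connections = describe_connections(pipeline)
print("\n".join(connections))
```

Note that all_aml_train.gct appears twice in the output, once for each module that reads it.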
Click the ExtractComparativeMarkerResults module to display its parameters in the editing panel. Note that all of the parameters are set to the values you used when you initially ran the module; in particular, statistic=rank and max=100.
To edit pipeline details:
When GenePattern displays the Pipeline Saved confirmation window, click Close to close the window.
The pipeline contains the two analysis modules used to create the analysis results file: ComparativeMarkerSelection and ExtractComparativeMarkerResults. In your original analysis, after creating the analysis results file, you used the HeatMapViewer to review the results.
To add the HeatMapViewer module to your pipeline:
To run the pipeline:
You have created a pipeline that duplicates your original analysis: it runs the ComparativeMarkerSelection analysis on the all_aml_train data (gct and cls) files, uses the analysis results as input to the ExtractComparativeMarkerResults analysis, and then displays the analysis results using the HeatMapViewer.
You can make the pipeline more generally useful by having it prompt you for the data (gct and cls) files to be analyzed, rather than simply analyzing the all_aml_train data files. To do this, mark the input file parameters as prompt-when-run. When GenePattern runs the pipeline, it will prompt the user to enter values for the prompt-when-run parameters.
To edit the pipeline:
To add parameters to the pipeline, mark the parameters of interest as prompt-when-run:
all_aml_train.gct.
all_aml_train.cls.
In the editing panel:
all_aml_train.res.
all_aml_train.gct file, and then modify the dataset.filename parameter to mark it prompt-when-run.
When you save and run the pipeline, GenePattern displays the pipeline parameters (if any):
To run the edited pipeline:
all_aml_train.gct (https://datasets.genepattern.org/data/all_aml/all_aml_train.gct).
all_aml_train.cls (https://datasets.genepattern.org/data/all_aml/all_aml_train.cls).
all_aml_train.gct again (https://datasets.genepattern.org/data/all_aml/all_aml_train.gct).
The pipeline requires that you enter the same data file (all_aml_train.gct) twice: once for the input file parameter of the ComparativeMarkerSelection module and again for the dataset filename parameter of the ExtractComparativeMarkerResults module. Ideally, you want to enter the data file once and have the pipeline use it for both the ComparativeMarkerSelection and ExtractComparativeMarkerResults modules. For more information on how that can be done, see Reusing a User-Supplied File in the Working With Pipelines section of the GenePattern User Guide.
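The prompt-when-run behavior can be sketched in a few lines: parameters marked as prompt-when-run have no stored value, so they must be collected before the pipeline runs. The sketch below is a hypothetical model, not GenePattern code; the PROMPT sentinel and the function names are invented for illustration. It shows why the edited pipeline asks three questions, two of which have the same answer.

```python
PROMPT = object()  # sentinel: ask the user for this value at run time

steps = [
    ("ComparativeMarkerSelection", {"input.file": PROMPT, "cls.file": PROMPT}),
    ("ExtractComparativeMarkerResults", {"dataset.filename": PROMPT, "max": 100}),
]

def prompts_needed(steps):
    """List (module, parameter) pairs that must be supplied at run time."""
    return [(module, name)
            for module, params in steps
            for name, value in params.items() if value is PROMPT]

def bind(steps, answers):
    """Return steps with every prompt-when-run value replaced by an answer."""
    bound = []
    for module, params in steps:
        filled = {name: answers[(module, name)] if value is PROMPT else value
                  for name, value in params.items()}
        bound.append((module, filled))
    return bound

needed = prompts_needed(steps)
answers = {
    ("ComparativeMarkerSelection", "input.file"): "all_aml_train.gct",
    ("ComparativeMarkerSelection", "cls.file"): "all_aml_train.cls",
    ("ExtractComparativeMarkerResults", "dataset.filename"): "all_aml_train.gct",
}
bound = bind(steps, answers)
print(len(needed), "values requested")
```

Parameters with stored values, such as max=100, pass through unchanged; only the marked parameters are requested from the user.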
As described earlier in the tutorial, analyses are run on the GenePattern server and analysis results files are stored on the server. Server storage is temporary and analysis results files are deleted after they have been on the server for a certain length of time (by default, one week).
To save your analysis results files, you must copy each file from the server to a more permanent location. If you do not need your analysis results, you can delete them at any time.
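If you keep your own record of when jobs finished, a small script can remind you which results to save before they are removed. This is a minimal sketch assuming the default one-week retention mentioned above; your server may be configured differently, and the job names, timestamps, and function names here are hypothetical.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # the default stated above; servers may differ

def purge_time(completed_at, retention=RETENTION):
    """Return the time at which a job's results become eligible for deletion."""
    return completed_at + retention

def at_risk(jobs, now, warn_within=timedelta(days=2)):
    """Return names of jobs whose results will be purged within warn_within."""
    return [name for name, done in jobs.items()
            if purge_time(done) - now <= warn_within]

jobs = {  # hypothetical completion times
    "ComparativeMarkerSelection": datetime(2024, 1, 1, 9, 0),
    "ExtractComparativeMarkerResults": datetime(2024, 1, 5, 9, 0),
}
now = datetime(2024, 1, 6, 12, 0)
soon = at_risk(jobs, now)
print(soon)
```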
To save an analysis results file:
To delete an analysis results file:
To save or delete a job and all of its analysis results files, click the icon next to the job and click Download or Delete.
To exit from GenePattern, click the Sign out link in the top right corner of the title bar and then close the web browser window.
Thank you for taking this time to learn about GenePattern!
As you continue to work with GenePattern, please explore the rest of the site.
We welcome your feedback. If you have suggestions, comments, or questions, please visit our forum.