Lab 11. Expression analysis III. Coregulation analysis
In this lab we will isolate clusters of similarly expressed
genes and examine their promoters for shared motifs indicative of
coregulation. We will also use another tool, GoMiner,
to test a cluster for enrichment of GO categories. Finally, we will
view an animated display of gene expression on a biochemical-pathway map
over the course of a time-series experiment.
I. Expression data retrieval and clustering
1. We will use a cDNA
microarray dataset from yeast cell-cycle studies. At the Stanford Microarray Database
(SMD) site, click on the Publications
button in the left-hand panel. Set the List Display organism to Saccharomyces cerevisiae and click
on the Re-list
button. Find the publication by Spellman et al. (1998) near the
bottom of the page and click on the SMD book icon at right.
2. Of the four different
cDNA microarray datasets listed in the new window, we will use the one
labeled Spellman et al: cdc15
block-release. This dataset is derived from yeast cells that
were blocked in telophase by holding a cdc15-2
temperature-sensitive mutant at the restrictive temperature. The culture
was then shifted to the permissive temperature (23°C), releasing the cells
into the cell cycle in synchrony, so that they pass through the same
stages together. Click on the Data
Retrieval and Analysis button corresponding to the above dataset.
3. In the new window, under the heading Next, choose your biological annotation,
leave the setting at SMD default
annotation. In the next option, check the Use Experiment name box and uncheck
the Use Slide name box. Then
click on the Proceed to Data Filtering
button.
4. In the new window,
click on the Retrieve Data
button. Some time will pass while the data are retrieved from the
database and processed (filtered and normalized). Right-click on
the Download PreClustering File
link at the bottom. Choose Save link as... or Save
Target As... (depending on your browser) to save the data as a
text file on your PC (change the suffix from .pcl to .txt). This file holds the
full gene set; we will need its gene names when we use GoMiner in step 20.
5. Click on Proceed to Gene Filtering. Accept
the default parameters, which appear to be aimed mostly at filtering
out genes whose expression doesn't vary much over the time-course
experiment, and click Filter Data.
Note that we've gotten rid of about 80% of the genes in this way.
6. Right-click on
the Download PreClustering File
link at the bottom of this page, which now holds the filtered data. Choose Save link as... or Save
Target As... (depending on your browser) to save the data as a
text file on your PC (change the suffix from .pcl to .txt so that TMeV will be able to
load it). Then click Proceed to
Clustering.
7. In the Clustering and Image Options page,
accept the default selections No
partitioning, Pearson
Correlation (non-centered), Do
no experiment clustering, and Use
red/green color scheme. You can turn off Show spot images -- I couldn't make
the feature work anyway. Now click Cluster.
This is just to show that there are a few analysis options available in
the SMD. We'll
pay more attention to the more elaborate TMeV analysis tools.
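The Pearson Correlation (non-centered) option in step 7 is usually understood as a correlation computed without first subtracting each profile's mean, which makes it the cosine of the angle between the two profiles. Here is a minimal sketch under that assumption (SMD's own implementation is not shown in this lab, and the two profiles are made up):

```python
import math

def uncentered_pearson(x, y):
    """Correlation without mean subtraction (cosine of the angle between x and y)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def centered_pearson(x, y):
    """Ordinary Pearson correlation: subtract each mean, then proceed as above."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return uncentered_pearson([a - mean_x for a in x], [b - mean_y for b in y])

# Two made-up profiles with the same shape but different offsets.
profile_a = [1.0, 2.0, 3.0]
profile_b = [3.0, 4.0, 5.0]

r_centered = centered_pearson(profile_a, profile_b)      # shape only
r_uncentered = uncentered_pearson(profile_a, profile_b)  # shape and offset
```

The centered version ignores the offset between the two profiles, while the non-centered version is pulled below 1 by it. For log-ratio data, where 0 means "no change", that offset carries information.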
8. Load the filtered preclustering data
into TMeV as described in Lab 9, except in
the Expression File Loader
window, click the Two-color Array
button. When prompted to click the top left expression cell, be sure
you skip the weight row and column, which are filled with 1s.
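If you would rather look at the preclustering file in a script than in a spreadsheet, the layout described in step 8 can be read as sketched below. This assumes the usual tab-delimited layout with a GWEIGHT column and an EWEIGHT row. The expression values and the experiment names (cdc15_10, cdc15_30) are invented for illustration.

```python
import csv
import io

# Hypothetical three-gene excerpt; the numbers and experiment names are made up.
SAMPLE_PCL = (
    "YORF\tNAME\tGWEIGHT\tcdc15_10\tcdc15_30\n"
    "EWEIGHT\t\t\t1\t1\n"
    "YAL001C\tTFC3\t1\t-0.15\t0.21\n"
    "YAL002W\tVPS8\t1\t\t0.05\n"
    "YAL003W\tEFB1\t1\t0.33\t-0.40\n"
)

def read_pcl(handle):
    """Return (experiment names, {ORF: values}), skipping the weight row and column."""
    rows = list(csv.reader(handle, delimiter="\t"))
    header = rows[0]
    first_data_col = header.index("GWEIGHT") + 1  # expression starts after the weights
    experiments = header[first_data_col:]
    profiles = {}
    for row in rows[1:]:
        if not row or row[0] == "EWEIGHT":
            continue  # the weight row is not a gene
        # Empty cells are missing measurements; keep them as None.
        values = [float(v) if v.strip() else None for v in row[first_data_col:]]
        profiles[row[0]] = values
    return experiments, profiles

experiments, profiles = read_pcl(io.StringIO(SAMPLE_PCL))
```

The "top left expression cell" that TMeV asks for corresponds to the first value kept here: the first row after EWEIGHT and the first column after GWEIGHT.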
9. In the Multiple Array
Viewer window, select k-Means/Medians
Clustering and accept the default parameters of Euclidean distance and 10 clusters
with 50 iterations.
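To see what the k-means option in step 9 does with these settings, here is a bare-bones sketch using Euclidean distance and a cap on the number of iterations. It is an illustration only: the way TMeV picks starting centroids and decides when to stop may differ, and the four profiles are made up.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(profiles, k, iterations=50, seed=0):
    """Plain k-means: returns (cluster index per profile, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(profiles, k)  # start from k randomly chosen profiles
    assignment = [None] * len(profiles)
    for _ in range(iterations):
        # Assign every profile to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in profiles
        ]
        if new_assignment == assignment:
            break  # nothing moved, so we have converged
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Made-up profiles: two genes that rise and fall together, two that do the opposite.
toy_profiles = [
    [1.0, 0.0, -1.0, 0.0],
    [1.1, 0.1, -0.9, 0.0],
    [-1.0, 0.0, 1.0, 0.0],
    [-1.1, -0.1, 0.9, 0.1],
]
assignment, centroids = k_means(toy_profiles, k=2)
```

Because the starting centroids are random, different seeds can give different clusterings on real data, which is one reason to rerun the analysis before trusting any single cluster.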
10. After the clustering
operation has finished, in the panel at left open the KMC entry and the Expression Graphs under that, and
click on All Clusters. Examine
the clustering results, looking for cyclical expression behavior. You
can study individual clusters more closely by clicking on their entries. What do you expect to be the length
of a cycle? It may help to scan the Spellman
et al. article (linked from the data page) describing the
experiment.
11. What is the "best" number of clusters to
look for in this data set? One approach is called Figure of Merit. Try
this option under the Clustering
menu/button, leaving the settings at their default values. Examine the FOM vs. # of Clusters curve. Based on the description of the
feature in the MeV manual, what can you conclude about an optimum
cluster number for this data set?
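For orientation while reading the manual, the figure of merit is commonly defined as in Yeung et al. (2001). Whether MeV follows this definition in every detail is an assumption here, so treat the manual as the authority. The data are clustered with one condition e left out, and the left-out condition is then used to measure how well the genes within each cluster agree:

```latex
\mathrm{FOM}(e, k) = \sqrt{\frac{1}{n} \sum_{i=1}^{k} \sum_{x \in C_i}
    \bigl( R(x, e) - \mu_{C_i}(e) \bigr)^2 },
\qquad
\mathrm{FOM}(k) = \sum_{e=1}^{m} \mathrm{FOM}(e, k)
```

Here n is the number of genes, m the number of conditions, C_1 ... C_k the clusters found without condition e, R(x, e) the expression level of gene x under condition e, and \mu_{C_i}(e) the mean of cluster C_i under condition e.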
12. Try a couple of other cluster numbers as
input into KMC and briefly
report on the appearance of the clusters with respect to cyclical
behavior of the period you expect from the article.
13. After identifying a
cluster that appears to show cyclical behavior of the expected period,
open its entry in Table views
in the left panel and right-click in the table to Save cluster to your directory as a
text file. Do you expect a
clean sine-wave figure from beginning to end of the time course? What
would be the most likely reason why you don't? How did the authors of
this study identify cyclically expressed genes (as well as you
understand it)? Also, for
comparison in the GO analysis below, save another cluster that does NOT
appear to show cyclical behavior.
II. Searching promoters of coregulated genes for shared motifs
14. Navigate
to the WebMOTIFs
page, a site set up to run several motif-finding programs and
synthesize their results. With
Microsoft
Excel, open the gene cluster file you just saved, and select and copy
the column under label YORF
(which we can guess means Yeast ORF). Click
on the Try it! link and, in
the submission page, enter your e-mail address with a job name, and
paste your ORF names into the box. After you Submit Query,
you may receive a message that not all of the ORF names could be found,
but that WebMOTIFs will continue to work on the remainder. It should
also warn you that these jobs typically take 6 hours, so you may want
to order a pizza and some beverages.
15. Other motif finders require us to input the actual sequences we wish to
search. We will retrieve from the S. cerevisiae Promoter Database
(SCPD)
the promoter sequences for the genes in our cluster. On the SCPD site,
click on the link Retrieve promoter
sequences and in the next page, paste your ORF names in the box,
leave the settings unchanged, and Submit.
From
the resulting page, select all (Ctrl-A) and copy; paste the text into a
text document and save.
16. While waiting for the
seriously slow WebMOTIFs site, let's see if we can find a faster
option. One is Melina
II, which also invokes four different motif finders. On my test
example of 54 promoters it took less than a minute to produce results.
Again, you can use the default parameters (since we really don't know
any better). If, after submitting, you get the prompt "Please
checking your query sequences", look through your sequences for any
entries that consist of a tag line with no sequence. You will need to remove these.
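With more than a handful of promoters, hunting for empty entries by eye gets tedious. The sketch below drops every record that has a tag line but no sequence. It assumes the text saved in step 15 is FASTA-like, with each tag line starting with ">"; check your saved file before relying on it. The sequences shown are made up.

```python
def parse_fasta(text):
    """Return a list of [tag, sequence] pairs from FASTA-like text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])  # new record, sequence still empty
        elif records:
            records[-1][1] += line          # sequence may span several lines
    return records

def drop_empty_records(text):
    """Rebuild the FASTA text, keeping only records that have a sequence."""
    kept = [(tag, seq) for tag, seq in parse_fasta(text) if seq]
    return "".join(f">{tag}\n{seq}\n" for tag, seq in kept)

# Made-up example: the middle entry has a tag but no sequence.
raw = ">YAL001C\nACGT\n>YAL002W\n>YAL003W\nGGCC\nTTAA\n"
cleaned = drop_empty_records(raw)
```

Note that this sketch writes each kept sequence on a single line, which motif finders generally accept.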
17. Navigate to MEME, enter your e-mail
address, supply your promoter sequences, leave the parameters at their
default values, and submit; you
should receive your results within a half hour. Note
that this is one of the software tools also used by WebMOTIFs. Also
navigate to the AlignACE
site and submit your sequences there; this job finishes very quickly.
18. Register for a free
account at the Match
site and then log in. Paste your promoters into the text box and select
the Fungi group of matrices
(since S. cerevisiae
is a
fungus). Submit for a search against the motif profiles stored in the
database. Good luck -- I didn't find anything in most of the promoters
in my cluster. Be aware that the public version of this database is
incomplete (and already was when it was released four years ago!). For the
complete database you need to pay. Anyway, scroll through your results
to locate just one for which a matrix was found, and click on the
hyperlink to it under the label matrix
identifier. Give two
ways in which you can determine the number of aligned sequences from
which this matrix was constructed.
19. For your report, you
will need to compare the results of the individual programs run by
these sites. Comment on the consistency of motifs across programs and
on the frequency of occurrence of the same motif in many promoters in
your cluster. Where provided, follow links to identify any functional
assignment associated with a motif.
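One way to put a number on "frequency of occurrence" in step 19 is to count how many promoters in the cluster contain a reported consensus on either strand. The sketch below does this for a consensus written in IUPAC codes. The motif and the promoter sequences are invented for illustration, and an exact consensus match is a cruder test than the scoring the motif finders themselves use.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(motif):
    """Reverse complement of an upper-case IUPAC consensus."""
    return motif.translate(COMPLEMENT)[::-1]

def motif_regex(motif):
    """Compile an upper-case IUPAC consensus into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))

def promoters_with_motif(promoters, motif):
    """Names of promoters that contain the motif on either strand."""
    motif = motif.upper()
    forward = motif_regex(motif)
    reverse = motif_regex(reverse_complement(motif))
    return [
        name for name, seq in promoters.items()
        if forward.search(seq.upper()) or reverse.search(seq.upper())
    ]

# Hypothetical promoters and a hypothetical consensus reported by a motif finder.
toy_promoters = {
    "geneA": "TTACGCGTAA",  # forward-strand match
    "geneB": "GGGGCCCC",    # no match
    "geneC": "CCTCGCGTCC",  # match on the reverse strand only
}
hits = promoters_with_motif(toy_promoters, "ACGCGW")
fraction = len(hits) / len(toy_promoters)
```

Running the same count for the consensus reported by each program gives a quick, if rough, way to see whether the programs agree.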
III. Another GO analysis
20. In step 4 you saved the full gene set from the yeast cell-cycle
experiment. Open this file with Excel, select the 6000+ gene names in
the first column, and copy-paste them into another worksheet. Do a
search-and-replace of <space> with nothing, since this file must
have only one column. Then save as text (I'll assume you name it allgenes.txt).
From the two clusters that you saved in step 13, retrieve the gene names and
paste them into two other files, which I'll assume you call changedgenescyclical.txt and changedgenesnoncyclical.txt.
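If you prefer a script to Excel for step 20, pulling out the first column takes only a few lines. The sketch below assumes the file saved in step 4 is tab-delimited with one header row and an EWEIGHT row; the cluster files saved from TMeV may be laid out differently, so look at them first. The sample rows are made up.

```python
def first_column_names(table_text):
    """Return the first-column entries of a tab-delimited table, spaces removed."""
    names = []
    for line in table_text.splitlines()[1:]:  # skip the header row
        name = line.split("\t")[0].replace(" ", "")
        if name and name != "EWEIGHT":        # skip blank lines and the weight row
            names.append(name)
    return names

# Hypothetical excerpt; the real input is the file saved in step 4.
sample = (
    "YORF\tNAME\tGWEIGHT\tcdc15_10\n"
    "EWEIGHT\t\t\t1\n"
    "YAL001C \tTFC3\t1\t-0.15\n"
    "YAL002W\tVPS8\t1\t0.05\n"
)
names = first_column_names(sample)
one_column_text = "\n".join(names) + "\n"  # the text to save as allgenes.txt
```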
21. Navigate to the GoMiner
WWW interface (we could also download the software for a more
interactive graphical display). Use the Browse buttons to enter your total file (allgenes.txt) and one of your changed files. Choose SGD as the data source and S. cerevisiae as the organism, and
leave the other options at their defaults. Finally, enter your e-mail
address and Submit Query.
Run the analysis once with the cyclical file and once with the noncyclical file. After a
few minutes you'll be e-mailed, for each analysis, a message with a
URL where you can examine your results. I won't give you detailed
directions for exploring these files, but at least compare the GO enrichment results
for the two clusters.
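To make sense of the enrichment p-values, it helps to see what such a test computes. The sketch below assumes a one-sided hypergeometric (Fisher's exact) test, which is a common choice for this kind of analysis. Take GoMiner's own documentation as the authority on what it actually reports. The gene counts are made up.

```python
from math import comb

def enrichment_p_value(total, total_in_category, changed, changed_in_category):
    """Probability of seeing at least `changed_in_category` category members
    when `changed` genes are drawn at random from `total` genes."""
    upper = min(total_in_category, changed)
    favorable = sum(
        comb(total_in_category, i) * comb(total - total_in_category, changed - i)
        for i in range(changed_in_category, upper + 1)
    )
    return favorable / comb(total, changed)

# Made-up numbers: 6000 genes in all, 100 of them in some GO category;
# a cluster of 50 genes, 8 of which fall in that category.
p = enrichment_p_value(total=6000, total_in_category=100,
                       changed=50, changed_in_category=8)
```

By chance we would expect fewer than one category member in a cluster of 50 (50 x 100 / 6000), so finding 8 gives a very small p-value. Comparing such values for the cyclical and noncyclical clusters is the point of step 21.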
IV. Viewing a time-course experiment dynamically in the Omics Viewer
22. Save these yeast
nitrogen-depletion data to your local disk (right click on the link
and choose Save link as...).
Navigate to SGD's Pathway
Tools Omics Viewer and scroll down the page to the line beginning File containing experimental data.
Use the Browse button to enter
the path to your data file.
23. In the section below,
labeled Animated time series,
enter the numbers 1 to 9 in the first scroll box, hitting the Enter
key after each so that it's on a line by itself. These numbers are the
9 time points in the series we'll try to animate. Leave the
radio button set to a single data
column.
24. Scroll down to the
end of the page and click Submit.
25. In your report,
identify a couple of pathways whose components seem to be varying in a
systematic way during progressive nitrogen depletion in yeast.