PLPTH 613
Bioinformatics Applications
Spring 2009

Lab 11. Expression analysis III. Coregulation analysis

In this lab we will isolate clusters of similarly expressed genes and examine their promoters for shared motifs indicative of coregulation. We will also use another tool, GoMiner, to investigate GO-category enrichment in a cluster. Finally, we will see a dynamic display of gene expression on a biochemical-pathway map over the course of a time-course experiment.

I. Expression data retrieval and clustering

1. We will use a cDNA microarray dataset from yeast cell-cycle studies.  At the Stanford Microarray Database (SMD) site, click on the Publications button in the left-hand panel. Set the List Display organism to Saccharomyces cerevisiae and click on the Re-list button.  Find the publication by Spellman et al. (1998) near the bottom of the page and click on the SMD book icon at right.

2. Of the four different cDNA microarray datasets listed in the new window, we will use the one labeled Spellman et al: cdc15 block-release. This dataset is derived from yeast cells that were blocked in telophase using a cdc15-2 temperature-sensitive mutant at restrictive temperature. The culture was then shifted to permissive temperature (23°C) and released into the cell cycle, so that the cells passed through its stages in synchrony. Click on the Data Retrieval and Analysis button corresponding to the above dataset.

3. In the new window, under the option labeled "Next, choose your biological annotation", leave the setting at SMD default annotation. In the next option, check the Use Experiment name box and uncheck the Use Slide name box. Then click on the Proceed to Data Filtering button.

4. In the new window, click on the Retrieve Data button. Some time will pass while the data are retrieved from the database and processed (filtered and normalized). Right-click on the Download PreClustering File link at the bottom. Choose Save link as... or Save Target As... (depending on your browser) to save the data as a text file on your PC (change the suffix from .pcl to .txt). We will need this for the gene names when we use GoMiner.

5. Click on Proceed to Gene Filtering. Accept the default parameters, which appear to be aimed mostly at filtering out genes whose expression doesn't vary much over the time-course experiment, and click Filter Data. Note that we've gotten rid of about 80% of the genes in this way.

6. Right-click on the Download PreClustering File link at the bottom. Choose Save link as... or Save Target As... (depending on your browser) to save the data as a text file on your PC (change the suffix from .pcl to .txt so that TMEV will be able to load it). Then click Proceed to Clustering.

7. In the Clustering and Image Options page, accept the default selections No partitioning, Pearson Correlation (non-centered), Do no experiment clustering, and Use red/green color scheme. You can turn off Show spot images -- I couldn't make the feature work anyway. Now click Cluster. This is just to show that there are a few analysis options available in the SMD. We'll pay more attention to the more elaborate TMEV analysis tools.
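To see what the "non-centered" option changes, here is a minimal sketch comparing the ordinary (centered) Pearson correlation with the uncentered form, which skips the mean subtraction and so penalizes profiles that have the same shape but a different offset. The two toy profiles are made up, and whether SMD computes the measure in exactly this way is an assumption.

```python
import math

def pearson_centered(x, y):
    """Ordinary Pearson correlation: subtract each profile's mean first."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def pearson_uncentered(x, y):
    """Uncentered form: the means are taken to be zero (cosine similarity)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

# Made-up profiles with the same shape; gene_b is gene_a shifted up by 2.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [3.0, 4.0, 5.0, 6.0]

centered = pearson_centered(gene_a, gene_b)      # the offset is ignored
uncentered = pearson_uncentered(gene_a, gene_b)  # the offset lowers the score
print(centered, uncentered)
```

For log-ratio data, where zero means "no change", the uncentered form treats a gene that is always induced differently from one that oscillates around zero, even if the two curves have the same shape.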

8. Load the filtered preclustering data into TMEV as described in Lab 9, except that in the Expression File Loader window you should click the Two-color Array button. When prompted to click the top left expression cell, be sure you skip the weight row and column, which are filled with 1s.
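If you want to look at the same file outside TMEV, the sketch below reads a preclustering file and skips the weight row and column just as in the step above. It assumes the usual tab-delimited layout (identifier, NAME, GWEIGHT, then one column per experiment, with an EWEIGHT row under the header); the experiment names and expression values in the sample are made up.

```python
import csv
import io

def read_pcl(handle):
    """Return (experiment names, {gene id: list of values}) from a PCL file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    experiments = header[3:]              # skip identifier, NAME, GWEIGHT
    genes = {}
    for row in reader:
        if not row or row[0] == "EWEIGHT":
            continue                      # skip blank lines and the weight row
        # Empty cells are missing measurements; keep them as None.
        genes[row[0]] = [float(v) if v.strip() else None for v in row[3:]]
    return experiments, genes

# Tiny made-up sample in the assumed layout.
sample = (
    "YORF\tNAME\tGWEIGHT\tcdc15_10\tcdc15_30\n"
    "EWEIGHT\t\t\t1\t1\n"
    "YAL001C\tTFC3\t1\t0.15\t-0.22\n"
    "YAL002W\tVPS8\t1\t\t0.40\n"
)
experiments, genes = read_pcl(io.StringIO(sample))
print(experiments)
print(genes)
```

To read your own file, pass an open file handle instead of the `io.StringIO` sample, for example `read_pcl(open("filtered.txt", newline=""))`, where `filtered.txt` is a hypothetical name for the file saved in step 6.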

9. In the Multiple Array Viewer window, select k-Means/Medians Clustering and accept the default parameters of Euclidean distance and 10 clusters with 50 iterations.
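For reference, the idea behind k-means with Euclidean distance can be sketched in a few lines. This is a plain textbook version, not TMEV's implementation; the random seeding of the centroids and the toy profiles are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, iterations=50, seed=0):
    """Assign each profile to one of k clusters; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break                          # nothing moved, so we have converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Made-up profiles: two induced genes and two repressed genes.
profiles = [[1.0, 1.1, 0.9], [0.9, 1.0, 1.0],
            [-1.0, -0.9, -1.1], [-1.1, -1.0, -1.0]]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Because the starting centroids are chosen at random, different seeds can give different clusterings on real data, which is one reason to rerun the analysis before reading much into any single cluster.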

10. After the clustering operation has finished, in the panel at left open the KMC entry and the Expression Graphs under that, and click on All Clusters. Examine the clustering results, looking for cyclical expression behavior. You can study individual clusters more closely by clicking on their entries. What do you expect to be the length of a cycle? It may help to scan the Spellman et al. article (linked from the data page) describing the experiment.

11. What is the "best" number of clusters to look for in this data set? One approach is called Figure of Merit. Try this option under the Clustering menu/button, leaving the settings at their default values. Examine the FOM vs. # of Clusters curve. Based on the description of the feature in the MeV manual, what can you conclude about an optimum cluster number for this data set?
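The general idea of a figure of merit can be sketched as follows: leave out one time point, cluster the genes on the remaining ones, and then measure how tightly the left-out values agree within each cluster. The code is a simplified illustration of that leave-one-out scheme, assuming a tiny k-means seeded with the first k rows and made-up data; it is not taken from the MeV source, and MeV may apply an adjustment that is omitted here.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_labels(rows, k, iterations=50):
    """Tiny k-means; the first k rows seed the centroids (a simplification)."""
    centroids = [list(r) for r in rows[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c]))
                      for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def figure_of_merit(data, k):
    """Sum, over left-out conditions, of the within-cluster RMS deviation."""
    n_genes = len(data)
    total = 0.0
    for left_out in range(len(data[0])):
        # Cluster using every condition except the one left out.
        reduced = [row[:left_out] + row[left_out + 1:] for row in data]
        labels = kmeans_labels(reduced, k)
        sq_error = 0.0
        for c in range(k):
            held = [row[left_out] for row, lab in zip(data, labels) if lab == c]
            if held:
                mean = sum(held) / len(held)
                sq_error += sum((v - mean) ** 2 for v in held)
        total += math.sqrt(sq_error / n_genes)
    return total

# Made-up data: induced and repressed genes alternate, four conditions each.
data = [[1.0, 1.2, 0.9, 1.1], [-1.0, -1.1, -0.9, -1.2],
        [0.9, 1.0, 1.1, 1.0], [-1.1, -0.9, -1.0, -1.0],
        [1.1, 0.9, 1.0, 1.2], [-0.9, -1.0, -1.2, -1.1]]
fom_k1 = figure_of_merit(data, 1)
fom_k2 = figure_of_merit(data, 2)
print(fom_k1, fom_k2)
```

On this toy set the value drops sharply from one cluster to two, since the data really contain two groups. On a real curve, look for the cluster number beyond which the value stops falling appreciably.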

12. Try a couple of other cluster numbers as input into KMC and briefly report on the appearance of the clusters with respect to cyclical behavior of the period you expect from the article.

13. After identifying a cluster that appears to show cyclical behavior of the expected period, open its entry in Table views in the left panel and right-click in the table to Save cluster to your directory as a text file. Do you expect a clean sine-wave figure from beginning to end of the time course? What would be the most likely reason why you don't? How did the authors of this study identify cyclically expressed genes (as well as you understand it)? Also, for comparison in the GO analysis below, save another cluster that does NOT appear to show cyclical behavior.

14. Searching promoters of coregulated genes for shared motifs. Navigate to the WebMOTIFs page, a site set up to run several motif-finding programs and synthesize their results. With Microsoft Excel, open the gene cluster file you just saved, and select and copy the column under the label YORF (which we can guess means Yeast ORF). Click on the Try it! link and, in the submission page, enter your e-mail address with a job name, and paste your ORF names into the box. After you Submit Query, you may receive a message that not all of the ORF names could be found, but that WebMOTIFs will continue to work on the remainder. It should also warn you that these jobs typically take 6 hours, so you may want to order a pizza and some beverages.

15. Other motif finders require us to input the actual sequences we wish to search. We will retrieve from the S. cerevisiae Promoter Database (SCPD) the promoter sequences for the genes in our cluster. On the SCPD site, click on the link Retrieve promoter sequences and in the next page, paste your ORF names in the box, leave the settings unchanged, and Submit. From the resulting page, select all (Ctrl-A) and copy; paste the text into a text document and save.

16. While waiting for the seriously slow WebMOTIFs site, let's see if we can find a faster option. One is Melina II, which also invokes four different motif finders. On my test example of 54 promoters it took less than a minute to produce results. Again, you can use the default parameters (since we really don't know any better). If, after submitting, you get the ungrammatical prompt "Please checking your query sequences", look through your sequences for entries that have a tag but no sequence. These you will need to remove.
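With dozens of promoters, finding the entries that have a tag but no sequence by eye is tedious. The sketch below does it in code, assuming the text you saved from SCPD is in FASTA format (a ">" tag line followed by sequence lines); the two records in the sample are made up.

```python
def parse_fasta(text):
    """Return a list of (tag, sequence) pairs from FASTA-formatted text."""
    records = []
    tag, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if tag is not None:
                records.append((tag, "".join(chunks)))
            tag, chunks = line[1:], []
        elif tag is not None:
            chunks.append(line)
    if tag is not None:
        records.append((tag, "".join(chunks)))
    return records

def drop_empty(records):
    """Keep only the records that actually carry sequence."""
    return [(tag, seq) for tag, seq in records if seq]

def to_fasta(records):
    return "".join(">%s\n%s\n" % (tag, seq) for tag, seq in records)

# Made-up sample: the second entry has a tag but no sequence.
sample = ">YAL001C\nACGTACGT\nTTGA\n>YAL002W\n>YAL003W\nGGCC\n"
records = parse_fasta(sample)
cleaned = drop_empty(records)
print(to_fasta(cleaned))
```

The printed text can be pasted straight into the submission box in place of the original.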

17. Navigate to MEME, enter your e-mail address, leave the parameters at their default values, and submit; you should receive your results within a half hour. Note that this is one of the software tools also used by WebMOTIFs. Also navigate to the AlignAce site and submit your sequences there; this job finishes very quickly.

18. Register for a free account at the Match site and then log in. Paste your promoters into the text box and select the Fungi group of matrices (since S. cerevisiae is a fungus). Submit for a search against the motif profiles stored in the database. Good luck -- I didn't find anything in most of the promoters in my cluster. Be aware that the public version of this database is incomplete (and already was when it was released four years ago!). For the complete database you need to pay. Anyway, scroll through your results to locate just one for which a matrix was found, and click on the hyperlink to it under the label matrix identifier. Give two ways in which you can determine the number of aligned sequences from which this matrix was constructed.

19. For your report, you will need to compare the results of the individual programs run by these sites. Comment on the consistency of motifs across programs and on the frequency of occurrence of the same motif in many promoters in your cluster. Where provided, follow links to identify any functional assignment associated with a motif.
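One simple way to put a number on "frequency of occurrence" is to count how many promoters contain a reported consensus on either strand. The sketch below turns an IUPAC consensus into a regular expression for that purpose. The consensus CRCGAAA and the three toy promoters are used only for illustration, and exact consensus matching is an assumed simplification that is cruder than the matrix-based scoring most of these programs use.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(motif):
    return motif.translate(COMPLEMENT)[::-1]

def consensus_to_regex(motif):
    return re.compile("".join(IUPAC[base] for base in motif.upper()))

def promoters_with_motif(promoters, motif):
    """Names of promoters containing the motif on either strand."""
    forward = consensus_to_regex(motif)
    reverse = consensus_to_regex(reverse_complement(motif.upper()))
    hits = []
    for name, seq in promoters.items():
        seq = seq.upper()
        if forward.search(seq) or reverse.search(seq):
            hits.append(name)
    return hits

# Made-up promoters: geneA has a forward match, geneB a reverse-strand match.
promoters = {"geneA": "TTTCACGAAATT",
             "geneB": "GGTTTCGCGAA",
             "geneC": "ATATATATATAT"}
hits = promoters_with_motif(promoters, "CRCGAAA")
print(hits, "%d of %d promoters" % (len(hits), len(promoters)))
```

Running the same count for the motifs reported by each program gives a quick, comparable figure to quote in the report.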

20. Another GO analysis. In step 4 you saved the full gene set from the yeast cell-cycle experiment. Open this file with Excel, select the 6000+ gene names in the first column, and copy-paste them into another worksheet. Do a search-and-replace of <space> with nothing, since this file must have only one column. Then save as text (I'll assume you name it allgenes.txt). From the two clusters that you saved in step 13, retrieve the names and paste them into two other files, which I'll assume you call changedgenescyclical.txt and changedgenesnoncyclical.txt.
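The Excel steps for the full gene set can also be done with a short script. This sketch takes the first column of the preclustering file, removes any spaces, and returns one gene name per line; it assumes a tab-delimited file with one header line and an EWEIGHT row, and the sample content is made up.

```python
def gene_ids_from_pcl(text):
    """First-column gene names from PCL text, with spaces removed."""
    ids = []
    for line_number, line in enumerate(text.splitlines()):
        if line_number == 0 or not line.strip():
            continue                       # skip the header and blank lines
        first = line.split("\t")[0].replace(" ", "")
        if first and first != "EWEIGHT":   # the weight row is not a gene
            ids.append(first)
    return ids

# Made-up sample; note the stray space after the second gene name.
sample = (
    "YORF\tNAME\tGWEIGHT\texp1\n"
    "EWEIGHT\t\t\t1\n"
    "YAL001C\tTFC3\t1\t0.15\n"
    "YAL002W \tVPS8\t1\t0.40\n"
)
ids = gene_ids_from_pcl(sample)
one_column = "\n".join(ids) + "\n"
print(one_column)
```

To produce allgenes.txt, read your saved file into a string, pass it to `gene_ids_from_pcl`, and write `one_column` to disk.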

21. Navigate to the GoMiner WWW interface (we could also download the software for a more interactive graphical display). Use the Browse buttons to enter your total file and one of your changed files. Choose SGD as the data source and S. cerevisiae as the organism, and leave the other options at their defaults. Finally, enter your e-mail address and Submit Query. Run the analysis twice, once with the cyclical file and once with the noncyclical file. After a few minutes you'll be e-mailed, for each analysis, a message with a URL where you can examine your results. I won't give you detailed directions for exploring these files, but at least compare the GO enrichment results for the two clusters.
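To make sense of the enrichment p-values, it helps to see the calculation behind them. The sketch below computes a one-sided Fisher's exact (hypergeometric) p-value for a single GO category: the chance of seeing at least the observed number of category members among the changed genes if they had been drawn at random from the total set. That GoMiner's reported value corresponds to exactly this one-sided test is an assumption here, and all the counts are made up.

```python
from math import comb

def enrichment_p_value(total_genes, category_total, changed_genes, category_changed):
    """P(at least category_changed category members among the changed genes)."""
    upper = min(category_total, changed_genes)
    ways_total = comb(total_genes, changed_genes)
    ways_as_extreme = sum(
        comb(category_total, i)
        * comb(total_genes - category_total, changed_genes - i)
        for i in range(category_changed, upper + 1)
    )
    return ways_as_extreme / ways_total

# Small made-up example that can be checked by hand:
# 10 genes in total, 4 in the category, 3 changed, 2 of those in the category.
p_small = enrichment_p_value(10, 4, 3, 2)

# Made-up numbers on the scale of this lab: 6000 genes, a category of 100,
# a cluster of 50 genes, 10 of which fall in the category.
p_large = enrichment_p_value(6000, 100, 50, 10)
print(p_small, p_large)
```

In the small example the p-value is (6*6 + 4)/120 = 1/3, so two category members out of three changed genes is not surprising. In the larger example fewer than one category member is expected by chance, so ten is strong evidence of enrichment.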

22. Viewing a time-course experiment dynamically in the Omics Viewer. Save these yeast nitrogen-depletion data to your local disk (right click on the link and choose Save link as...). Navigate to SGD's Pathway Tools Omics Viewer and scroll down the page to the line beginning File containing experimental data. Use the Browse button to enter the path to your data file.

23. In the section below, labeled Animated time series, enter the numbers 1 to 9 in the first scroll box, hitting the Enter key after each so that each number is on a line by itself. These numbers are the 9 time points in the series we'll try to animate. Leave the radio button set to a single data column.

24. Scroll down to the end of the page and click Submit.

25. In your report, identify a couple of pathways whose components seem to be varying in a systematic way during progressive nitrogen depletion in yeast.