PLPTH 613
Bioinformatics Applications
Spring 2009



Lab 13. Viewing and analyzing interaction networks

Purpose

Become familiar with Cytoscape, an open-source software package for viewing and analyzing networks, and examine the topology of an interaction network. The first half of today's lab consists simply of following tutorials to begin to understand what can be done with this rich package. Questions to be addressed in your report are in the usual color.

I. Using Cytoscape

1. Download and install Cytoscape 2.6.2. (It can also be run via WebStart as shown in the tutorials you will follow, but the version launched with WebStart is 2.4 and some of the instructions are written for 2.6, so they don't always match what you see on the screen.)

2. Follow this tutorial on Getting Started.

3. Scan this tutorial on Filters and Editing, but don't bother to follow it because we won't need these operations in our exercises. The main thing to realize is that edges and nodes have attributes and that we can set up filters to view only the ones we want.

4. Cytoscape is extensible (by anyone able to program in Java) via externally written plugins, some of which we will investigate. Choose menu item Plugins/Manage Plugins. In the dialog showing available plugins, open the Functional Enrichment category, select BiNGO (information about this plugin may be found at the BiNGO site), and click the Install button. Close the dialog. Now open the Gene Ontology Analysis tutorial and follow it to the end.

5. In the final tutorial section, Find significant enrichment of GO terms in a subnetwork, consider step 4. For your report, compare this operation to what we did in steps 24 and 25 in Lab 8. Now consider step 5. Which of the two p-values is generally larger, and why do you expect this to be the case?

6. We are now finished with the galactose example network and its children and can remove them by selecting the Network tab in the left-hand Control Panel, right-clicking on the names of these networks, and choosing Destroy Network. However, don't destroy the Ontology DAGs or the Yeast GO slim networks, as we'll be reusing them.

7. We'll now look more closely at one of the clusters that showed cyclical behavior in Lab 11 last week. For this, you'll need the names of the ORFs, which you will find in this file. Right-click on the hyperlink and save to your disk.

8. Open the tutorial on downloading data into Cytoscape and find the section entitled Saccharomyces Genome Database (SGD). When you reach step 2, use section Upload a File of Feature/Standard Gene names, and browse to your file of gene names. In the Other data selection, click the checkbox for physical interactions and Submit. You'll be informed that one of the gene names could not be found, but Proceed and download the .gz file that will be presented.

9. To uncompress this file on your local disk, try right-clicking. There should be some software installed on the lab machines that will do the job, but if not, download 7-Zip.
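If no unzip utility is at hand, the .gz file can also be uncompressed with a few lines of Python. This is a minimal sketch using only the standard library; the file names are hypothetical placeholders, and the demonstration creates its own throwaway file rather than touching the SGD download.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

def gunzip(src, dst):
    """Decompress the gzip file at src into a new file at dst."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)

# Self-contained demonstration on a throwaway file (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    packed = Path(tmp) / "interactions.sif.gz"
    unpacked = Path(tmp) / "interactions.sif"
    with gzip.open(packed, "wt") as f:
        f.write("RAD53 methodA RAD9\n")
    gunzip(packed, unpacked)
    restored = unpacked.read_text()

print(restored)
```

To use it on the real download, call gunzip with the path of the saved .gz file and the name you want for the uncompressed file.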

10. Open the physical-interaction network file with a text editor (or with Excel) and note the very simple format. The second word on each line describes the method by which the interaction was established, and all other words are gene identifiers. Notice that many genes, such as RAD53, have been assigned interactions by multiple technologies, and also that some of the specific interactor genes, such as RAD9, were found with more than one of the technologies.
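The format described in this step (a gene, then the interaction method, then one or more interacting genes) is simple enough to read with a short script. The following Python sketch assumes whitespace-separated fields; the method names and GENE_X are made-up placeholders, not values from the SGD file.

```python
from collections import defaultdict

def parse_sif(lines):
    """Return {(source, target): set of interaction types} from SIF-style lines."""
    edges = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and nodes listed without interactions
        source, interaction, targets = fields[0], fields[1], fields[2:]
        for target in targets:
            edges[(source, target)].add(interaction)
    return dict(edges)

# Hypothetical sample lines; methodA and methodB stand in for real method names.
sample = [
    "RAD53 methodA RAD9 GENE_X",
    "RAD53 methodB RAD9",
]
edges = parse_sif(sample)
for (source, target), methods in sorted(edges.items()):
    print(source, target, sorted(methods))
```

A pair that appears with more than one method, like RAD53 and RAD9 in the sample, ends up with several entries in its set. In Cytoscape the same pair is drawn with several parallel edges.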

11. Import the physical-interaction file with Import/Network (Multiple file types) and apply a spring-embedding layout with Layout/Cytoscape Layout/Spring Embedded. To understand the varieties of layout, see the manual section describing layout algorithms. Be sure you understand what spring embedding means -- I may ask you to explain it some day.
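The idea behind spring embedding can be illustrated with a toy simulation: every edge behaves like a spring with a preferred length, every pair of nodes repels, and the nodes are moved a little along the net force until the layout settles. The Python sketch below is a generic illustration under those assumptions, not Cytoscape's actual algorithm; all parameter names and values are invented for the example.

```python
import math
from itertools import combinations

def spring_layout(nodes, edges, iterations=500, rest_length=1.0,
                  spring_k=1.0, repulsion_k=0.1, step=0.05):
    """Toy 2-D spring embedding: edges are springs, all node pairs repel."""
    n = len(nodes)
    # Start on a circle so the result is deterministic.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Inverse-square repulsion between every pair of nodes.
        for u, v in combinations(nodes, 2):
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            f = repulsion_k / dist ** 2
            force[u][0] += f * dx / dist
            force[u][1] += f * dy / dist
            force[v][0] -= f * dx / dist
            force[v][1] -= f * dy / dist
        # Hooke's-law attraction (or push) along every edge.
        for u, v in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            f = spring_k * (dist - rest_length)
            force[u][0] += f * dx / dist
            force[u][1] += f * dy / dist
            force[v][0] -= f * dx / dist
            force[v][1] -= f * dy / dist
        # Move each node a small step along its net force.
        for v in nodes:
            pos[v][0] += step * force[v][0]
            pos[v][1] += step * force[v][1]
    return pos

# Two triangles joined by one bridging edge.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"), ("c", "d")]
pos = spring_layout(nodes, edges)

linked = {frozenset(e) for e in edges}
edge_lengths = [math.dist(pos[u], pos[v]) for u, v in edges]
other_lengths = [math.dist(pos[u], pos[v]) for u, v in combinations(nodes, 2)
                 if frozenset((u, v)) not in linked]
mean_edge = sum(edge_lengths) / len(edge_lengths)
mean_other = sum(other_lengths) / len(other_lengths)
print("mean edge length:", round(mean_edge, 2))
print("mean distance between unlinked nodes:", round(mean_other, 2))
```

In the settled layout, linked nodes sit closer together on average than unlinked ones, which is why densely connected groups of genes show up as visible clumps.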

12. Locate RAD53 (menu item Select/Nodes/By name, or just use the key shortcut Ctrl-F). Center it in the view by dragging the view box in the miniature view at lower left, and zoom in using the mouse wheel, right-click + dragging, the + button in the toolbar, or any other method you notice. Now you'll be able to see the gene labels on the nodes. Find RAD9 and select and drag it into an open area of the screen, noting that several edges connect it to RAD53. To map these edges to their interaction types, click the Edge Attribute Browser in the Data Panel and then click the Select Attributes icon at top left of the panel and click in the interaction checkbox. Right-click to close. Now from the menu choose Select/Mouse Drag Selects/Edges Only. Now you can click on these edges and view their interaction types in the Data Panel.

13. Repeat with these data the BiNGO procedure in the Ontology tutorial, starting in section Find significant enrichment of GO terms in a network at step 2 (do not create a child network as you did in the tutorial), and comment on your findings. Comment also on the validity of using the setting Test cluster vs. whole annotation, given the source of the data that we used.
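To see why the choice of reference set matters, it helps to look at the arithmetic of an over-representation test. The sketch below assumes a hypergeometric test, which as far as I know is BiNGO's default; it is not BiNGO's code, and all the counts are invented for illustration.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a sample of n,
    drawn from a reference set of N genes of which K carry the GO term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Made-up counts: 5 of our 50 genes carry a term that annotates 100 genes.
p_whole = hypergeom_pvalue(k=5, n=50, K=100, N=6000)   # reference: whole annotation
p_narrow = hypergeom_pvalue(k=5, n=50, K=100, N=800)   # reference: a much smaller set
print("reference of 6000 genes:", p_whole)
print("reference of 800 genes: ", p_narrow)
```

The same 5 hits look far more surprising against the large reference set than against the small one, so the result depends heavily on which genes could in principle have appeared in the data.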

II. Calculating the topological parameters of a network

1. We will use a Perl script and Excel to compute the degree distribution of the yeast protein-protein interaction network. Probably Cytoscape has plugins that will do this, but I haven't yet found one.

2. The Database of Interacting Proteins (DIP) has one of the largest collections of publicly available protein-protein interaction data. We will use the latest yeast dataset for today’s lab. In fact, Cytoscape allows direct connection to DIP and many other interaction-data repositories, but this is another feature I haven't fully explored yet.

3. At the DIP site, click on the Files link on the left panel. In the new window, click on the SPECIES link under the Standard data sets category. Note that there are several species-specific datasets available, and S. cerevisiae (baker’s yeast) is the default selection.

4. Don't bother to register as a DIP user and download the data, because you would have to do some Excel manipulation to simplify it. I've done it for you. The latest yeast file contains 17,611 interactions discovered by many different methods, but I've condensed it to just the DIP IDs of the interacting molecules. Right-click on this link and save the file to your work directory.

5. Download this Perl script to your working directory. Open a command-line window and navigate to it, and enter

perl calc_network_topology.pl <interaction file> <output file> N

The N tells the script that the file has the simple two-column format (if we omit the N, the script will also work for .sif-formatted files of the kind you saw in step 10 of part I).
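For those curious about what a degree-distribution calculation involves, here is a minimal Python sketch. It is not the course's Perl script and may differ from it in detail. It assumes each interaction is a pair of identifiers, counts a duplicated pair once, and ignores self-interactions; the identifiers A to D are placeholders.

```python
from collections import Counter, defaultdict

def degree_distribution(pairs):
    """Return {k: P(k)}, the fraction of nodes that have exactly k distinct partners."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # assumption: self-interactions do not count toward degree
        neighbors[a].add(b)
        neighbors[b].add(a)
    counts = Counter(len(partners) for partners in neighbors.values())
    total = len(neighbors)
    return {k: counts[k] / total for k in sorted(counts)}

# Placeholder interactions; the repeated A-B pair is counted only once.
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
distribution = degree_distribution(pairs)
for k, p in distribution.items():
    print(k, p)
```

Here A has degree 3, B and C have degree 2, and D has degree 1, so the sketch reports P(1) = 0.25, P(2) = 0.5, and P(3) = 0.25.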

6. Load your output file into Excel. Using the expected relation for a scale-free network,
P(k) ~ k^{-γ}, which is shorthand for P(k) = C k^{-γ}, with C a constant whose value is of no special interest, perform any transformation and statistical operations necessary for estimating γ. Do the results support a hypothesis that node degree in the yeast protein interaction network follows a power-law distribution? Based on what we've seen of interaction data sets in this lab, what characteristics of these DIP data might lead to error in your conclusions?
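One common way to estimate the exponent is to take logarithms of the power-law relation, which turns P(k) = C k^{-γ} into the straight line log P(k) = log C - γ log k, and then fit that line by least squares. The Python sketch below does this on synthetic data generated with a known exponent. It illustrates the technique only, is not a prescribed solution, and uses an invented function name.

```python
import math

def estimate_gamma(distribution):
    """Least-squares fit of log10 P(k) against log10 k.
    Returns (gamma, C, r_squared) for the model P(k) = C * k**(-gamma)."""
    points = [(math.log10(k), math.log10(p))
              for k, p in distribution.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return -slope, 10 ** intercept, 1 - ss_res / ss_tot

# Synthetic, noise-free data with a known exponent of 2.5 (not real DIP values).
synthetic = {k: 0.5 * k ** -2.5 for k in range(1, 21)}
gamma, C, r_squared = estimate_gamma(synthetic)
print("gamma:", round(gamma, 3), "C:", round(C, 3), "R^2:", round(r_squared, 3))
```

With real data the high-degree tail contains few nodes and is noisy, so the quality of the fit (for example R^2, or simply the look of the log-log plot) deserves as much attention as the estimate itself.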