Lab 13. Viewing and analyzing interaction networks
Purpose
Become familiar with Cytoscape, an open-source software package for
viewing and analysis of networks, and examine the topology of an
interaction network. The first half of today's lab consists simply of
following tutorials, to begin to understand what can be done with this
rich package. Questions to be addressed in your report are in the usual color.
I. Using Cytoscape
1. Download
and install Cytoscape 2.6.2. (It can also be run via WebStart, as shown
in the tutorials you will follow, but the version launched with
WebStart is 2.4 and some of the instructions are written for 2.6, so
they don't always match what you see on the screen.)
2. Follow this
tutorial on Getting Started.
3. Scan this
tutorial on Filters and Editing, but don't bother to follow it
because we won't need these operations in our exercises. The main thing
to realize is that edges and nodes have attributes and that we can set up filters to view only the ones we
want.
4. Cytoscape is
extensible (by anyone able to program in Java) via externally written plugins, some of which we will
investigate. Choose menu item
Plugins/Manage Plugins.
In the dialog showing available plugins, open the Functional Enrichment category,
select BiNGO (information
about this plugin may be found at the BiNGO site), and
click the Install button.
Close the dialog. Now open the Gene
Ontology Analysis tutorial and follow it to the end.
5. In the final tutorial
section, Find significant enrichment
of GO terms in a subnetwork, consider step 4. For your report, compare this operation to what we
did in steps 24 and 25 in Lab 8. Now consider step 5. Which of the two p-values is
generally larger, and why do you expect this to be the case?
6. We are now finished
with the galactose example network and its children and can remove it
by selecting the Network tab
in the left-hand Control Panel, right-clicking on the names of these
networks, and choosing Destroy Network.
However, don't destroy the Ontology
DAGs or the Yeast GO slim
networks, as we'll now be reusing them.
7. We'll now look more
closely at one of the clusters that showed cyclical behavior in Lab 11 last week. For this, you'll need the
names of the ORFs, which you will find in this file. Right-click
on the hyperlink and save to your disk.
8. Open the tutorial
on downloading data into Cytoscape and find the section entitled Saccharomyces Genome Database (SGD).
When you reach step 2, use section Upload
a File of Feature/Standard Gene names, and browse to your file
of gene names. In the Other data
selection, click the checkbox for physical interactions and Submit. You'll be informed that one
of the gene names could not be found, but Proceed and download the .gz file that will be presented.
9. To uncompress this
file on your local disk, try right-clicking. There should be some
software installed on the lab machines that will do the job, but if
not, download 7-Zip.
10. Open the
physical-interaction network file with a text editor (or with Excel)
and note the very simple format. The second word on each line describes
the method by which the interaction was established, and all other
words are gene identifiers. Notice that many genes, such as RAD53, have been assigned
interactions by multiple technologies, and also that some of the
specific interactor genes, such as RAD9,
were found with more than one of the technologies.
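The format described in step 10 can be read with a few lines of code. The following is a minimal sketch, assuming whitespace-separated fields as described above (first word a gene, second word the method, remaining words the interacting genes). The sample lines, the method names method_A and method_B, and the identifier GENE_X are made up for illustration and are not taken from the downloaded file.

```python
from collections import defaultdict

def parse_sif(lines):
    """Return {(source, target): set of methods} from SIF-style lines."""
    edges = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and lines with no interactor
        source, method, targets = fields[0], fields[1], fields[2:]
        for target in targets:
            edges[(source, target)].add(method)
    return edges

# Hypothetical sample lines; method names and GENE_X are placeholders.
sample = [
    "RAD53 method_A RAD9 GENE_X",
    "RAD53 method_B RAD9",
]
edges = parse_sif(sample)

# Gene pairs whose interaction was reported by more than one method.
multi = {pair: sorted(methods) for pair, methods in edges.items() if len(methods) > 1}
print(multi)  # {('RAD53', 'RAD9'): ['method_A', 'method_B']}
```

Splitting on any whitespace is an assumption; a file whose method names contain spaces would need to be split on tabs instead.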
11. Import the
physical-interaction file with Import/Network
(Multiple file types) and apply a spring-embedding layout with Layout/Cytoscape Layout/Spring Embedded.
To understand the varieties of layout, see the manual
section describing layout algorithms. Be sure you understand what spring embedding means -- I may
ask you to explain it some day.
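As an aid to understanding the idea behind spring embedding, here is a toy force-directed layout. It is a generic sketch, not Cytoscape's implementation: every edge is treated as a spring with a preferred length, every pair of nodes repels, and the nodes are moved a little at a time until the forces roughly balance. All parameter names and values (spring_k, repulsion, max_step and so on) are assumptions chosen for illustration.

```python
import math
import random

def spring_layout(nodes, edges, iterations=200, rest_length=1.0,
                  spring_k=0.1, repulsion=0.05, max_step=0.1, seed=0):
    """Toy spring embedder: edges act as springs, all node pairs repel."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}

    def apply(force, a, b, magnitude_of):
        # A positive magnitude pushes a and b apart; a negative one pulls them together.
        dx = pos[a][0] - pos[b][0]
        dy = pos[a][1] - pos[b][1]
        dist = math.hypot(dx, dy) or 1e-9
        f = magnitude_of(dist)
        force[a][0] += f * dx / dist
        force[a][1] += f * dy / dist
        force[b][0] -= f * dx / dist
        force[b][1] -= f * dy / dist

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair keeps nodes from piling up.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                apply(force, a, b, lambda d: repulsion / d ** 2)
        # Hooke's-law spring on each edge drives it toward rest_length.
        for a, b in edges:
            apply(force, a, b, lambda d: -spring_k * (d - rest_length))
        # Move each node along its net force, capping the step for stability.
        for n in nodes:
            fx, fy = force[n]
            size = math.hypot(fx, fy)
            scale = min(1.0, max_step / size) if size > 0 else 0.0
            pos[n][0] += fx * scale
            pos[n][1] += fy * scale
    return pos

def distance(pos, a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])

# A three-node path A-B-C: A and C share no edge, so they should end up farther apart.
layout = spring_layout(["A", "B", "C"], [("A", "B"), ("B", "C")])
print(distance(layout, "A", "B"), distance(layout, "A", "C"))
```

The point to take away is that connected nodes settle near each other while unconnected nodes drift apart, which is why densely interconnected groups of genes appear as clumps in the laid-out network.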
12. Locate RAD53 (menu item Select/Nodes/By name, or just use
the key shortcut Ctrl-F).
Center it in the view by dragging the view box in the miniature view at
lower left, and zoom in using the mouse wheel, right-click + dragging,
the + button in the toolbar, or any other method you notice. Now you'll
be able to see the gene labels on the nodes. Find RAD9 and select and drag it into an
open area of the screen, noting that several edges connect it to RAD53. To map these edges to their
interaction types, click the Edge
Attribute Browser in the Data Panel and then click the Select Attributes icon at top left
of the panel and click in the interaction
checkbox. Right-click to close. Now from the menu choose Select/Mouse Drag Selects/Edges Only.
Now you can click on these edges and view their interaction types in
the Data Panel.
13. Repeat with these
data the BiNGO procedure in the Ontology
tutorial, starting in section Find
significant enrichment of GO terms in a network at step 2 (do
not create a child network as you did in the tutorial), and comment on your findings. Comment also on the validity of
using the setting Test cluster
vs. whole annotation, given
the source of the data that we used.
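To see why the choice of reference set matters, consider how an enrichment p-value is computed. The sketch below assumes a hypergeometric test, a standard choice for GO enrichment, and asks how likely it is to draw at least x annotated genes in a sample of a given size. The function name and all the counts are hypothetical and chosen only to illustrate the effect of the reference set.

```python
from math import comb

def hypergeom_enrichment_p(x, sample_size, annotated, reference_size):
    """P(at least x annotated genes in a random sample drawn from the reference set)."""
    total = comb(reference_size, sample_size)
    upper = min(sample_size, annotated)
    hits = sum(
        comb(annotated, i) * comb(reference_size - annotated, sample_size - i)
        for i in range(x, upper + 1)
    )
    return hits / total

# Hypothetical counts: 10 of 50 sampled genes carry some GO term.
# Reference = a whole annotation of 6000 genes, 100 of which carry the term.
p_whole = hypergeom_enrichment_p(10, 50, 100, 6000)
# Reference = a smaller, pre-selected set of 600 genes, 60 of which carry the term.
p_small = hypergeom_enrichment_p(10, 50, 60, 600)
print(p_whole, p_small)
```

The same observation looks far more surprising against the whole annotation than against a reference set that is already rich in the term, which is the issue to weigh when judging the Test cluster vs. whole annotation setting for these data.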
II. Calculating the topological parameters of a network
1. We will use a Perl
script and Excel to compute the degree
distribution of the yeast protein-protein interaction network. Probably
Cytoscape has plugins that will do this, but I haven't yet found one.
2. The Database of Interacting Proteins
(DIP) has one of the largest collections of publicly available
protein-protein interaction data. We will use the latest yeast dataset
for today’s lab. In fact Cytoscape allows direct connection to DIP and
many other interaction-data repositories, but this is another feature I
haven't fully explored yet.
3. At the DIP site,
click on
the Files link on the left
panel. In the new window, click
on the SPECIES link under the Standard data sets category. Note
that there are several species-specific datasets available, and S. cerevisiae (baker’s yeast) is the
default selection.
4. Don't bother to
register as a DIP user and download the data, because you would have to
do some Excel manipulation to simplify it. I've done it for you. The
latest yeast file contains 17,611 interactions discovered by many
different methods, but I've condensed it to just the DIP IDs of the
interacting molecules. Right-click on this link and save the
file to your work directory.
5. Download this Perl script to
your working directory. Open
a command-line window, navigate to that directory, and enter
perl calc_network_topology.pl <interaction file> <output file> N
The N tells the script that the file has the simple two-column format
(if we omit the N, the script will also work for .sif-formatted files
of the kind you saw in step 10 of part I).
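For those curious about what a degree-distribution calculation involves, here is a minimal sketch in Python. It is not the Perl script itself, and its handling of duplicate pairs and self-interactions is an assumption (duplicates are counted once, self-interactions are ignored). The identifiers P1 to P4 are placeholders, not DIP IDs.

```python
from collections import Counter

def degree_distribution(pairs):
    """Return {k: fraction of nodes having exactly k distinct interaction partners}."""
    neighbors = {}
    for a, b in pairs:
        if a == b:
            continue  # assumption: ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    counts = Counter(len(partners) for partners in neighbors.values())
    n_nodes = len(neighbors)
    return {k: counts[k] / n_nodes for k in sorted(counts)}

# Placeholder two-column data; the last pair repeats the first and is counted once.
pairs = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P1", "P2")]
pk = degree_distribution(pairs)
print(pk)  # {1: 0.25, 2: 0.5, 3: 0.25}
```

Here the degree k of a node is the number of distinct partners it has, and P(k) is the fraction of nodes with that degree.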
6. Load your output file
into Excel. Using the expected relation for a scale-free network, P(k) ~ k^{-γ}, which is shorthand for P(k) = C k^{-γ}, with C a constant
whose value is of no special interest, perform any transformation and
statistical operations necessary for estimating γ. Do the results support a
hypothesis
that node degree in the yeast protein interaction network follows a
power-law
distribution? Based on what we've seen of interaction data sets in this
lab, what characteristics of these DIP data might lead to error in your
conclusions?
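One common transformation is to take logarithms of both sides of the power-law relation, which gives log P(k) = log C - γ log k. On a log-log plot the points should then fall on a straight line with slope -γ. The sketch below fits that line by ordinary least squares, much as Excel's trendline would. The data are synthetic, generated from γ = 2 and C = 0.5, and the function name is an assumption for illustration.

```python
import math

def fit_power_law(pk):
    """Least-squares line through (log10 k, log10 P(k)); returns (gamma, C, r_squared)."""
    points = [(math.log10(k), math.log10(p)) for k, p in pk.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return -slope, 10 ** intercept, r_squared

# Synthetic, noise-free data following P(k) = 0.5 * k^-2 (not real DIP results).
synthetic = {k: 0.5 * k ** -2 for k in range(1, 21)}
gamma, c, r2 = fit_power_law(synthetic)
print(round(gamma, 6), round(c, 6), round(r2, 6))
```

Real degree data are noisy at high k, where only a handful of nodes contribute to each point, so the quality of the fit (for example r_squared) deserves as much attention as the estimate of γ.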