Lab 4. Making genetic maps
Setting up an XWindows connection
The mapping program we'll use, CarthaGene,
is free and is available in a Windows version. However, today we will
use our Windows PCs as terminals, while CarthaGene runs on a remote
Linux host in the Nelson lab. Thus CarthaGene needn't be loaded onto
each of the lab PCs.
If you wish to work on the exercises on your own PC, install CarthaGene
according to the instructions that come with the package. If you do
this, you will
need to download the data files via the hyperlinks listed in the
exercises below, rather than leaving them on the remote host where
CarthaGene is running -- but otherwise the exercises are the same.
XWindows is a protocol that allows a description of a graphical display
on one computer to be sent to a remote computer and drawn on that
computer's screen. We typically use XWindows to run Unix or Linux
applications while controlling them from a graphical user interface
(GUI) that appears on our PC's screen. We do this through an
XWindows server (ours is called Xming)
and a slightly modified putty
connection. Here are the instructions for making the connection and
starting CarthaGene:
- Find Xming (probably
in Start/Internet Tools) and
start it.
- Open putty and in
the Session dialog, enter host name coding.plantpath.ksu.edu.
Do not start the session!
- In the left-hand Category panel, click SSH and then click the Tunnels
option (this may be X11 in some versions). In the panel that appears,
tick the checkbox Enable X11 forwarding.
- Click again in Session
in the left-hand panel. I'll come and help you log on to the host with
a username and password, since you don't have your own account.
- In your terminal window, type cg to start CarthaGene. This should
bring up the CarthaGene interface, and you're ready to start mapping.
CarthaGene exercises
To give practice with features of CarthaGene,
I've assembled a few
data sets to play with. They cover
- one linkage group, with codominant F2 data
- same, with dominance
- finding linkage groups in multichromosome data
- merging sections of the same linkage group
CarthaGene offers some quick ways to build a rough map, a few slower
ways to build better maps, and a few slower ways still to refine the
marker orders of the maps you have built. You can do much of the work
by pushing buttons in the GUI, but for more control over the analysis
parameters you'll need to learn some of the commands, which you have
to type. There's a manual that you can consult.
Not everything about CarthaGene is as convenient as you might like.
Entering a desired map order via the command line, for example,
involves constructing a list of marker codes, putting it between
{curly brackets}, and pasting it after the command mrkselset, which is
awkward if you want to stick to pushing buttons. Knowing this trick
matters: if you decide to drop a marker from the current best map,
CarthaGene doesn't carry on with the remaining marker order but makes
you start over. This is easier if you're familiar with the scripting
language, but a good mapping program ought to be fully usable by
nonprogrammers, which most of us are. Overall, CarthaGene has pluses
and minuses in comparison with other publicly available mapping
programs.
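If you would rather not retype the marker list by hand each time you drop a marker, a few lines of Python can prepare the text to paste. This is only a convenience sketch run outside CarthaGene: the helper names braced_list and mrkselset_command are made up here, and the command form (mrkselset followed by [mrkids {...}]) is simply copied from the way this handout writes it in the dominant-marker exercise.

```python
def braced_list(markers):
    """Format marker names as a braced list, e.g. {a1 a2 a3}."""
    return "{" + " ".join(markers) + "}"


def mrkselset_command(markers, drop=()):
    """Build the command text for a marker order, leaving out any
    markers named in `drop` (hypothetical helper, not part of CarthaGene)."""
    dropped = set(drop)
    kept = [m for m in markers if m not in dropped]
    return "mrkselset [mrkids " + braced_list(kept) + "]"


# Example: keep the current order but drop marker a3.
order = ["a1", "a2", "a3", "a4", "a5"]
print(mrkselset_command(order, drop=["a3"]))
```

Paste the printed line at the CarthaGene command line and press Enter.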
Ordering codominant F2 data in a single linkage
group
- CarthaGene starts by showing you all
of the available commands. You can ignore them for now, since we'll
use mostly the buttons and menus.
- The first
dataset looks like this. But we don't need to download the data
files to our PCs, since that's not where CarthaGene is running.
Instead, choose Load and
navigate to directory
/var/www/localhost/htdocs/PLPTH_613/Labs/data/Lab_4/data.
If you are used to Windows, note that to "go up" in the directory
system
you'll need to double-click on the ".."
in the Directories side of the
file
selection dialog. When you find the data file s_scrambled.txt, select it and click
OK to load it into
CarthaGene.
- CarthaGene's command window
should show something like
{1 intercross 40 100
C:/myDatafiles/s_scrambled.txt
- The map order from which these marker data were generated
is,
reasonably enough, a1 .. a40,
on a chromosome 120 cM long. So you'll be able to judge how well the
mapping algorithms work by comparing the resulting orders and length
with the known ones.
- Quick and dirty maps can be made with the nicemapd and nicemapl
methods, which order markers by simply placing next to one another the
markers separated by the smallest distances (nicemapd) or joined by the
largest linkage LOD scores (nicemapl). (A LOD in this context is the
log10 of the odds favoring linkage of a marker pair over no linkage,
so a larger LOD means stronger evidence of linkage.) Often the two
methods produce the same maps. Try typing nicemapd on the command line
(press the Enter key to execute).
The button does the same thing.
- According to the manual, mfmapd
and mfmapl are usually better,
so try them too. You'll have to type them at the command line, since no
menus or buttons are supplied.
- You'll notice that the order returned is not a1 .. a40, so we'll investigate this a bit.
- You can view these maps by choosing Graphical.
CarthaGene stores the best maps (10 by default) in what it calls the heap, and will show you all of them
in this view, which can help you see how maps have been rearranged in
the direction of higher likelihood.
- To see a table of the map information, type heaprintd. (Or choose button Detail
or menu item Maps/Detailed,
which gives the same result).
- Examine the Distance/Haldane
column of the last table and notice that some of the values are 0. So
identical data for some of the marker pairs (a3/a4, a5/a6, a8/a9, a30/a31) accounts for some of the
apparent reversals. CarthaGene actually lets us remove duplicates in
order
to speed
calculations and prevent the reporting of equivalent maps, but for now
we'll
leave these alone. Notice that the total map length is 138.4 cM and the log likelihood
(which we'll henceforth abbreviate LL) is -442.86. We can see that even
ignoring the duplicates, there is a genuine reversal in the first few
markers. Do
other CarthaGene ordering operations fix this?
- In the Build
method,
CarthaGene builds up candidate orders by choosing the best insertion
point for each
successive marker. Instead of clicking on the Build button, type build 10 at the command line. The
program
will build 10 orders at once (starting, I assume, from different
initial
markers), and report the top (highest-likelihood) one. In this case you
should
find a map with a slightly improved LL of -442.52 and all the markers in the
order we know to be the true one.
- Of course in real life we wouldn't know that we had found
the
true map, so we would probably want to test local rearrangements to
improve the map. You may have used Mapmaker's ripple; the equivalent in CarthaGene
is flips. Type
flips 4 1 1
- This tells CarthaGene to permute each set of 4 adjacent
markers
and see if the likelihood increases. (The other two parameters say that
we wish CarthaGene to repeat the operation if a new order of LL one unit
higher than the current one is found). In the output, part of which is
shown here,
2 3 1 3 2 3 4 1 1 1 2 1 3 2 1 2 2 2 2 2 1 3 3 2 3 1 1 3 1 3 3   log10
4 7 0 6 2 2 4 3 0 9 5 5 1 3 2 2 8 3 9 6 1 0 6 3 0 5 9 7 7 1 5 1 8 6 4 8 4 9 7 8   -442.52
[- - 3 2]- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -   0.00
- -[1 0 3 2]- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -   0.00
- -[- - 3 2]- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -   0.00
- -[2 3 1 0]- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -   -0.34
- -[2 3 0 1]- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -   -0.34
the vertically oriented numbers in the first two rows are the marker
indices, the brackets indicate the permutation carried out in each row,
and the right-hand column shows the change in LL (we're
hoping for
a positive change). It turns out there are no positive changes found,
so
we can't improve our order by flipping, at least at this local scale.
- Save your best map (use output from
Maps/Detail) and place it in your report.
- This was a well-behaved little toy dataset. Let's go on to
a more
challenging one.
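As a rough illustration of the two ideas used in this exercise (a quick nearest-neighbour ordering, then a flips-style test of local rearrangements), here is a minimal Python sketch. It is not CarthaGene's code: it scores an order by the sum of adjacent distances (shorter is better) as a stand-in for the log likelihood, and the function names and the made-up marker positions are assumptions for the example only.

```python
from itertools import combinations, permutations


def map_length(order, dist):
    """Sum of distances between adjacent markers (stand-in for -LL)."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def greedy_order(markers, dist):
    """Seed with the closest pair, then keep attaching the unplaced
    marker that is nearest to either end of the growing chain."""
    first, second = min(combinations(markers, 2),
                        key=lambda p: dist[p[0]][p[1]])
    chain = [first, second]
    remaining = [m for m in markers if m not in chain]
    while remaining:
        left = min(remaining, key=lambda m: dist[chain[0]][m])
        right = min(remaining, key=lambda m: dist[chain[-1]][m])
        if dist[chain[0]][left] < dist[chain[-1]][right]:
            chain.insert(0, left)
            remaining.remove(left)
        else:
            chain.append(right)
            remaining.remove(right)
    return chain


def window_flips(order, dist, window=3):
    """Try every permutation of each run of `window` adjacent markers,
    keep any that shortens the map, and repeat until nothing improves."""
    best = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - window + 1):
            for perm in permutations(best[i:i + window]):
                cand = best[:i] + list(perm) + best[i + window:]
                if map_length(cand, dist) < map_length(best, dist):
                    best, improved = cand, True
    return best


# Made-up positions (cM) for five markers; distances are exact here.
positions = {"a1": 0, "a2": 5, "a3": 11, "a4": 18, "a5": 26}
dist = {a: {b: abs(positions[a] - positions[b]) for b in positions}
        for a in positions}

rough = greedy_order(["a3", "a5", "a1", "a4", "a2"], dist)
print(rough, map_length(rough, dist))
print(window_flips(["a2", "a1", "a3", "a4", "a5"], dist))
```

With real data the pairwise distances are noisy estimates, which is why a greedy order can contain local reversals that a window search may or may not repair.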
Ordering dominant markers in a single linkage group
- Discard the current dataset, using menu choice Data/Session/Reset. Take a look at the
next dataset. Can you identify
the markers of both dominance types? This is another set of 40 markers,
but
this time 4 of them are dominant from one parent and 10 from the other.
So
we expect to get uncertainty from all linkage calculations involving
dominant markers, but particularly unreliable numbers from linkage
between dominant markers of opposite phase.
- Choose Load, find
file s10.25_scrambled.txt in
the same directory as above, and load it into CarthaGene.
- If you've been wondering why these files are scrambled, it's simply to assure you
that the orders generated by CarthaGene are independent of the order in
which
the data are presented in the file.
- Try mfmapl and mfmapd. Notice that one of them
gives a higher (less negative) LL than the
other, so we'd be better off starting with the better one (which
CarthaGene will do automatically for us). Use heaprintd (or choose menu item Maps/Detail) to view the table. We
can
see that the dominance doesn't seem to have messed up ordering terribly
badly. Can we get the right order though?
- Try build and flips. (You can use the buttons,
which will use the default parameters. Or you can vary the parameters
by typing the commands at the command line; for example flips 3 1 1).
- You've probably noticed that CarthaGene offers ordering methods
called Annealing, Taboo, and Genetic. These are heuristic methods: they
are not guaranteed to produce the best solution to the ordering
problem, something no one knows how to do efficiently. All are based on
choosing at random some suborder of a map, flipping it, evaluating the
resulting order, and iterating. They are described in the manual and in
other documentation available at the CarthaGene WWW site. They take
some time to run, and I haven't explored them in depth. However, you're
free to try them, and if you later use CarthaGene for your own datasets
you should investigate them. In this case, when I chose Annealing with
default parameters, it produced after about 20 minutes a map with LL
-411.02 in which only a1..a4 were inverted. With flips this improved to
-408.41 for a map of length 139.0 cM.
- Several of the markers in this dataset were completely
linked, so
that some of the apparent inversions are not the problem that they
might appear. (You can eliminate these double markers and speed up
ordering with CarthaGene's commands mrkdouble
and mrkmerge, as described in
the manual). However, it's still hard to get a1..a5 in the right order, probably
because a2 is a dominant
marker.
- Suppose you want CarthaGene to evaluate an order that you give it,
rather than building one by its own methods. Here's how. In a text
editor, prepare a list of markers between curly brackets, like this:
{a0 a1 a2 a3 a4 a5 a6 ...}. At the CarthaGene command line, type
mrkselset [mrkids {a0 a1 a2 a3 a4 a5 a6}]. When you press Enter,
CarthaGene will use this marker order. To see its LL, use the SEM
button or type sem.
- Save
your best map (use output from Maps/Detail) and
place it in your report.
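The map lengths quoted in these exercises, and the Distance/Haldane column of the detailed table, are in centimorgans. As a minimal sketch, assuming the column uses the standard Haldane mapping function d = -50 ln(1 - 2r) and that total map length is the sum of the interval distances, the following Python shows how recombination fractions translate into cM. The function names and the example fractions are made up. Note that completely linked markers (r = 0) contribute a distance of 0, which is what the zeros in that column reflect.

```python
import math


def haldane_cm(r):
    """Haldane map distance in cM for a recombination fraction r,
    valid for 0 <= r < 0.5 (assumes no crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def map_length_cm(rec_fractions):
    """Total map length as the sum of adjacent-interval distances."""
    return sum(haldane_cm(r) for r in rec_fractions)


# Made-up recombination fractions for four adjacent intervals;
# the second interval joins two completely linked markers.
intervals = [0.10, 0.0, 0.05, 0.20]
for r in intervals:
    print(r, round(haldane_cm(r), 2))
print("total:", round(map_length_cm(intervals), 2))
```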
Grouping
- We may not get datasets that are already broken up by
chromosome,
especially if working with a new species or a new kind of genetic
marker. In such cases we need to do it ourselves. For an example, load this
dataset. You'll find it in the same directory as above, under the
name of RiceCAP_fake_mapping_data.txt.
This time I've shuffled all the marker labels, so that you
won't be able to tell when you've found the original orders!
- CarthaGene, like any other mapping program, computes linkages and
linkage LOD scores when the data are loaded. To find linkage groups,
you specify a maximum linkage distance and a minimum LOD score for
testing a candidate marker against members of an existing group. If
both criteria are satisfied with at least one member, the marker is
placed into that group.
- If you select menu item Loci/Identify groups, CarthaGene will use
default thresholds of 0.5 and 3. You can change these defaults with
Loci/Config, or supply your own thresholds by typing the command
yourself. Here, let's start with 0.2 and 5; type the following:
group 0.2 5
- For this data set we get lots of groups. Pretend you don't
know
how many chromosomes this species has (you don't yet, but I'll tell you
that
it's fewer than 13). Now, by playing with the grouping thresholds, see
if
you can reduce the number to a fairly stable one. What we are looking
for
is a few good-sized groups that are not too sensitive to a change in
grouping thresholds, and a few leftover markers or small groups. You
can see that
you can increase the size of groups by relaxing the "admission
standards"
-- reducing LOD threshold and/or increasing linkage threshold.
- Now that you've picked a set of candidate groups, let's
pick one
to start working on, using Loci/Select a group. Notice that
CarthaGene turns the marker IDs into names for you in the dialog.
- Before we specify a group, let's see how to merge fragments
of
groups. Note and remember the group IDs of two different large groups
(they're unlikely to represent the same chromosome), and then Cancel
the dialog. At the
command line, enter
groupmerge 1 7
(or whatever numbers you chose).
- We'll also need to know how to unselect
a group, for example when we wish to select a different group, or
regroup the markers. The command is rather ugly:
mrkselset [mrkallget].
There's no menu equivalent; you have to type it.
- Now try to order the current set of markers. Note how you
can
tell that the markers don't belong together.
- Construct
maps for
these data, incorporating as many of the markers as you can. Your
report should show the output from Maps/Detail, which you may
copy and paste.
- If you've used Mapmaker, you're probably going to miss the try
command. The CarthaGene manual advises that BuildFW (build framework) is more
powerful than build, and in
fact performs the function of trying
all markers in a framework map -- one for which the marker order is
strongly supported. This approach is preferable to forcing all
available markers into a map no matter how badly some of them may
"fit". You can try buildfw
now, but when you have some time, do read the manual entry to learn
more about it.
- How
many chromosomes do you think this organism has? About how
many cM long are they?
- In
lecture we saw an equation showing how to combine recombination
fractions in adjacent intervals. Show why this equation holds.
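The grouping rule described in this exercise (a marker joins a group if it passes both thresholds against at least one member) can be sketched in a few lines of Python. This is an illustration only, not CarthaGene's implementation: it assumes simple phase-known recombinant counts rather than F2 intercross data, it applies the first threshold to the recombination fraction, and the function names and counts are made up.

```python
import math


def two_point(n_recomb, n_total):
    """Recombination fraction and LOD (log10 likelihood at the estimate
    minus log10 likelihood at r = 0.5) from recombinant counts."""
    r = min(n_recomb / n_total, 0.5)

    def log10_lik(x):
        total = 0.0
        if n_recomb:
            total += n_recomb * math.log10(x)
        if n_total - n_recomb:
            total += (n_total - n_recomb) * math.log10(1.0 - x)
        return total

    return r, log10_lik(r) - log10_lik(0.5)


def find_groups(markers, pair_stats, max_r, min_lod):
    """Single-linkage grouping: connect every pair that passes both
    thresholds, then collect the connected sets of markers."""
    linked = {m: set() for m in markers}
    for (a, b), (r, lod) in pair_stats.items():
        if r <= max_r and lod >= min_lod:
            linked[a].add(b)
            linked[b].add(a)
    groups, seen = [], set()
    for m in markers:
        if m in seen:
            continue
        group, stack = [], [m]
        seen.add(m)
        while stack:
            cur = stack.pop()
            group.append(cur)
            for nb in linked[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


# Made-up recombinant counts out of 100 individuals per marker pair.
counts = {("m1", "m2"): 5, ("m2", "m3"): 10, ("m1", "m3"): 14,
          ("m4", "m5"): 8, ("m3", "m4"): 48}
stats = {pair: two_point(k, 100) for pair, k in counts.items()}
markers = ["m1", "m2", "m3", "m4", "m5"]

print(find_groups(markers, stats, max_r=0.2, min_lod=5))
print(find_groups(markers, stats, max_r=0.06, min_lod=5))
```

Tightening the first threshold in the second call splits the groups apart, which is the same behavior you see when you vary the thresholds given to the group command.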