PLPTH 613
Bioinformatics Applications
Spring 2009

Lab 4. Making genetic maps

Setting up an XWindows connection

The mapping program we'll use, CarthaGene, is free and is available in a Windows version. However, today we will use our Windows PCs as terminals, while CarthaGene runs on a remote Linux host in the Nelson lab. Thus CarthaGene needn't be loaded onto each of the lab PCs.

If you wish to work on the exercises on your own PC, install CarthaGene according to the instructions that come with the package. If you do this, you will need to download the data files via the hyperlinks listed in the exercises below, rather than leaving them on the remote host where CarthaGene is running -- but otherwise the exercises are the same.

XWindows is a protocol that allows a description of a graphical display on one computer to be sent to a remote computer and drawn on that computer's screen. We typically use XWindows to run Unix or Linux applications while controlling them from a graphical user interface (GUI) that appears on our PC's screen. We do this through an XWindows server (ours is called Xming) and a PuTTY connection with X11 forwarding enabled. Here are the instructions for making the connection and starting CarthaGene:
  1. Find Xming (probably in Start/Internet Tools) and start it.
  2. Open PuTTY and, in the Session dialog, enter the host name coding.plantpath.ksu.edu. Do not start the session yet!
  3. In the left-hand Category panel, click SSH and then click in the Tunnels option (this may be X11 in some versions). In the panel that appears, click in the checkbox Enable X11 forwarding.
  4. Click again in Session in the left-hand panel. I'll come and help you log on to the host with a username and password, since you don't have your own account.
  5. In your terminal window, type cg to start CarthaGene. This should bring up the CarthaGene interface, and you're ready to start mapping.

CarthaGene exercises

To give practice with features of CarthaGene, I've assembled a few data sets to play with. They cover
  • one linkage group, with codominant F2 data
  • same, with dominance
  • finding linkage groups in multichromosome data
  • merging sections of the same linkage group
CarthaGene offers some quick ways to build a rough map, a few slower ways to build better maps, and a few similarly slow ways to refine the orders of the maps you get. You can do much of the work by pushing buttons in the GUI, but for more control over the analysis parameters you'll need to learn some of the commands, which you have to type. There's a manual that you can consult.

This is not to say that everything about CarthaGene is just the way you would like it. Entering a desired map order via the command line, for example, involves constructing a list of marker codes, putting it between {curly brackets}, and pasting it after the command mrkselset, which is awkward if you want to stick to pushing buttons. And knowing this trick matters, since if you decide to drop a marker from the current best map, CarthaGene doesn't just keep on with the marker order that's left, but makes you start over. It's easier if you're familiar with the scripting language, but a good mapping program ought to be fully usable by nonprogrammers, which most of us are. So overall, CarthaGene has some pluses and minuses in comparison with other publicly available mapping programs.
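If typing the bracketed list by hand feels error-prone, the command text can be assembled by a small script and then pasted at the CarthaGene command line. The following is a minimal Python sketch, not part of CarthaGene: the function name build_mrkselset and the example marker names are hypothetical, and the command form mrkselset [mrkids {...}] is assumed from the dominant-marker exercise later in this lab.

```python
def build_mrkselset(marker_names):
    """Return a CarthaGene command line that selects markers in the given order.

    Hypothetical helper: it only formats text to paste at the command line.
    """
    names = list(marker_names)
    if not names:
        raise ValueError("need at least one marker name")
    for name in names:
        # Whitespace, braces, or brackets inside a name would break the {...} list.
        if not name or any(ch.isspace() or ch in "{}[]" for ch in name):
            raise ValueError(f"unsafe marker name: {name!r}")
    return "mrkselset [mrkids {" + " ".join(names) + "}]"


if __name__ == "__main__":
    # Example: keep markers a1..a6 in order but drop a3.
    order = [f"a{i}" for i in range(1, 7) if i != 3]
    print(build_mrkselset(order))
```

Dropping a marker then amounts to removing one name from the Python list and regenerating the command, rather than retyping the whole order.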

Ordering codominant F2 data in a single linkage group

  1. CarthaGene starts by showing you all of the available commands. You can ignore them for now, since we'll use mostly the buttons and menus.
  2. The first dataset looks like this. But we don't need to download the data files to our PCs, since that's not where CarthaGene is running. Instead, choose Load and navigate to directory

    /var/www/localhost/htdocs/PLPTH_613/Labs/data/Lab_4/data

    If you are used to Windows, note that to "go up" in the directory system you'll need to double-click on the ".." in the Directories side of the file selection dialog. When you find the data file s_scrambled.txt, select it and click OK to load it into CarthaGene.
  3. CarthaGene's command window should show something like
    {1 intercross 40 100 C:/myDatafiles/s_scrambled.txt}
  4. The map order from which these marker data were generated is, reasonably enough, a1 .. a40, on a chromosome 120 cM long. So you'll be able to judge how well the mapping algorithms work by comparing the resulting orders and length with the known ones.
  5. Quick and dirty maps can be made with the nicemapd and nicemapl methods, which order markers by simply placing next to one another the markers with the smallest distances (nicemapd) or the largest linkage LOD scores (nicemapl). (A LOD in this context measures the strength of the evidence that the marker pair is linked.) Often they produce the same maps. Try typing nicemapd on the command line (press the Enter key to execute). The button does the same thing.
  6. According to the manual, mfmapd and mfmapl are usually better, so try them too. You'll have to type them at the command line, since no menus or buttons are supplied.
  7. You'll notice that the order returned is not a1 .. a40, so we'll investigate this a bit.
  8. You can view these maps by choosing Graphical. CarthaGene stores the best maps (10 by default) in what it calls the heap, and will show you all of them in this view, which can help you see how maps have been rearranged in the direction of higher likelihood.
  9. To see a table of the map information, type heaprintd. (Or choose button Detail or menu item Maps/Detailed, which gives the same result).
  10. Examine the Distance/Haldane column of the last table and notice that some of the values are 0. So identical data for some of the marker pairs (a3/a4, a5/a6, a8/a9, a30/a31) accounts for some of the apparent reversals. CarthaGene actually lets us remove duplicates in order to speed calculations and prevent the reporting of equivalent maps, but for now we'll leave these alone. Notice that the total map length is 138.4 cM and the log likelihood (which we'll henceforth abbreviate LL) is -442.86. We can see that even ignoring the duplicates, there is a genuine reversal in the first few markers. Do other CarthaGene ordering operations fix this?
  11. In the Build method, CarthaGene builds up candidate orders by choosing the best insertion point for each successive marker. Instead of clicking on the Build button, type build 10 at the command line. The program will build 10 orders at once (starting, I assume, from different initial markers), and report the top (highest-likelihood) one. In this case you should find a map with a slightly improved LL of -442.52 and all the markers in the order we know to be the true one.
  12. Of course in real life we wouldn't know that we had found the true map, so we would probably want to test local rearrangements to improve the map. You may have used Mapmaker's ripple; the equivalent in CarthaGene is flips. Type
    flips 4 1 1
  13. This tells CarthaGene to permute each set of 4 adjacent markers and see if the likelihood increases. (The other two parameters say that we wish CarthaGene to repeat the operation if a new order of LL one unit higher than the current one is found). In the output, part of which is shown here,

       2 3 1   3 2 3 4     1 1   1 2   1 3   2 1 2 2 2 2 2 1     3 3 2 3 1 1 3 1 3 3  log10
     4 7 0 6 2 2 4 3 0 9 5 5 1 3 2 2 8 3 9 6 1 0 6 3 0 5 9 7 7 1 5 1 8 6 4 8 4 9 7 8   -442.52
    [- - 3 2]- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -      0.00
     - -[1 0 3 2]- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -      0.00
     - -[- - 3 2]- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -      0.00
     - -[2 3 1 0]- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -     -0.34
     - -[2 3 0 1]- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -     -0.34
the vertically oriented numbers in the first two rows are the marker indices, the brackets indicate the permutation carried out in each row, and the right-hand column shows the change in LL (we're hoping for a positive change). It turns out there are no positive changes found, so we can't improve our order by flipping, at least at this local scale.
  14. Save your best map (use output from Maps/Detail) and place it in your report.
  15. This was a well-behaved little toy dataset. Let's go on to a more challenging one.
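To make the idea behind flips concrete, here is a minimal Python sketch of a sliding-window permutation search. It is an illustration under stated assumptions, not CarthaGene's implementation: the names best_flip and toy_score are hypothetical, the toy score merely stands in for the log likelihood, and the two extra flips parameters (threshold and repeat) are not modeled.

```python
from itertools import permutations


def best_flip(order, window, score):
    """Return (gain, new_order) for the best window permutation, or None.

    score(order) must return a number where higher is better; it stands in
    for the log likelihood that a mapping program would compute.
    """
    base = score(order)
    best = None
    for start in range(len(order) - window + 1):
        block = order[start:start + window]
        for perm in permutations(block):
            if list(perm) == block:
                continue  # skip the unchanged order
            candidate = order[:start] + list(perm) + order[start + window:]
            gain = score(candidate) - base
            if gain > 0 and (best is None or gain > best[0]):
                best = (gain, candidate)
    return best


def toy_score(order):
    """Toy stand-in for LL: penalize jumps between adjacent marker numbers."""
    return -sum(abs(a - b) for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    start_order = [1, 3, 2, 4, 5]  # markers 2 and 3 are swapped
    print(best_flip(start_order, 3, toy_score))
    # An already optimal order yields no positive gain, so None is returned.
    print(best_flip([1, 2, 3, 4, 5], 3, toy_score))
```

A window of size n costs n! evaluations per position, which is why small windows such as 3 or 4 are the practical choice.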

Ordering dominant markers in a single linkage group

  1. Discard the current dataset, using menu choice Data/Session/Reset. Take a look at the next dataset. Can you identify the markers of both dominance types? This is another set of 40 markers, but this time 4 of them are dominant from one parent and 10 from the other. So we expect to get uncertainty from all linkage calculations involving dominant markers, but particularly unreliable numbers from linkage between dominant markers of opposite phase.
  2. Choose Load, find file s10.25_scrambled.txt in the same directory as above, and load it into CarthaGene.
  3. If you've been wondering why these files are scrambled, it's simply to assure you that the orders generated by CarthaGene are independent of the order in which the data are presented in the file.
  4. Try mfmapl and mfmapd. Notice that one of them gives a higher (less negative) LL than the other, so we'd be better off starting with the better one (which CarthaGene will do automatically for us). Use heaprintd (or choose menu item Maps/Detail) to view the table. We can see that the dominance doesn't seem to have messed up ordering terribly badly. Can we get the right order though?
  5. Try build and flips. (You can use the buttons, which will use the default parameters. Or you can vary the parameters by typing the commands at the command line; for example flips 3 1 1).
  6. You've probably noticed that CarthaGene offers ordering methods called Annealing, Taboo, and Genetic. These are heuristic methods, meaning they are not guaranteed to produce the best solution to the ordering problem (no one knows how to guarantee that efficiently). All are based on choosing at random some suborder of a map, flipping it, evaluating the resulting order, and iterating. They are described in the manual and in other documentation available at the CarthaGene WWW site. They take some time to run, and I haven't explored them in depth. However, you're free to try them, and if you later use CarthaGene for your own datasets you should investigate them. In this case when I chose Annealing with default parameters, it produced after about 20 minutes a map with LL -411.02 in which only a1..a4 were inverted. With flips this improved to -408.41 for a map of length 139.0 cM.
  7. Several of the markers in this dataset were completely linked, so that some of the apparent inversions are not the problem that they might appear. (You can eliminate these double markers and speed up ordering with CarthaGene's commands mrkdouble and mrkmerge, as described in the manual). However, it's still hard to get a1..a5 in the right order, probably because a2 is a dominant marker.
  8. Suppose you want CarthaGene to evaluate an order that you give it, rather than building one by its own methods. Here's how. In a text editor, prepare a list of markers, between curly brackets, like this: {a0 a1 a2 a3 a4 a5 a6 ...}. At the CarthaGene command line, type mrkselset [mrkids {a0 a1 a2 a3 a4 a5 a6}]. On Enter, CarthaGene will now use this marker order. To see its LL, use the SEM button or type sem.
  9. Save your best map (use output from Maps/Detail) and place it in your report.
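The completely linked markers mentioned in step 7 can also be spotted outside CarthaGene by comparing genotype strings directly. The sketch below is a simplified Python illustration: the function name, the genotype codes, and the example data are all hypothetical, and it assumes that "double" means exactly identical calls in every individual. CarthaGene's own mrkdouble criterion may differ (for example in how it treats missing or dominant data), so consult the manual before relying on it.

```python
from collections import defaultdict


def identical_marker_sets(genotypes):
    """Group marker names whose genotype calls match in every individual.

    genotypes maps marker name -> sequence of calls, one per individual.
    Returns only the groups that contain more than one marker.
    """
    by_pattern = defaultdict(list)
    for name, calls in genotypes.items():
        by_pattern[tuple(calls)].append(name)
    return [names for names in by_pattern.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical calls for six individuals (A, H, B = the two homozygotes and the heterozygote).
    data = {
        "a3": "AHHBAH",
        "a4": "AHHBAH",
        "a5": "AHBBAH",
        "a8": "BHHAAB",
        "a9": "BHHAAB",
    }
    print(identical_marker_sets(data))
```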

Grouping

  1. We may not get datasets that are already broken up by chromosome, especially if working with a new species or a new kind of genetic marker. In such cases we need to do it ourselves. For an example, load this dataset. You'll find it in the same directory as above, under the name of RiceCAP_fake_mapping_data.txt. This time I've shuffled all the marker labels, so that you won't be able to tell when you've found the original orders!
  2. CarthaGene, like any other mapping program, computes linkages and linkage LOD scores when the data are loaded. To find linkage groups, you specify a maximum linkage distance and a minimum LOD score for testing a candidate marker against members of an existing group. If both criteria are satisfied with at least one member, the marker is placed into that group.
  3. If you select menu item Loci/Identify groups, CarthaGene will use default thresholds of 0.5 and 3. You can change these defaults with Loci/Config. To supply your own thresholds directly, type the command yourself. Here, let's start with 0.2 and 5; type the following:
    group 0.2 5
  4. For this data set we get lots of groups. Pretend you don't know how many chromosomes this species has (you don't yet, but I'll tell you that it's fewer than 13). Now, by playing with the grouping thresholds, see if you can reduce the number to a fairly stable one. What we are looking for is a few good-sized groups that are not too sensitive to a change in grouping thresholds, and a few leftover markers or small groups. You can see that you can increase the size of groups by relaxing the "admission standards" -- reducing LOD threshold and/or increasing linkage threshold.
  5. Now that you've picked a set of candidate groups, let's pick one to start working on, using Loci/Select a group. Notice that CarthaGene turns the marker IDs into names for you in the dialog.
  6. Before we specify a group, let's see how to merge fragments of groups. Note and remember the group IDs of two different large groups (they're unlikely to represent the same chromosome), and then Cancel the dialog. At the command line, enter
    groupmerge 1 7
    (or whatever numbers you chose).
  7. We'll also need to know how to unselect a group, for example when we wish to select a different group, or regroup the markers. The command is rather ugly, and there's no menu equivalent; you have to type it:
    mrkselset [mrkallget]
  8. Now try to order the current set of markers. Note how you can tell that the markers don't belong together.
  9. Construct maps for these data, incorporating as many of the markers as you can. Your report should show the output from Maps/Detail, which you may copy and paste.
  10. If you've used Mapmaker, you're probably going to miss the try command. The CarthaGene manual advises that BuildFW (build framework) is more powerful than build, and in fact performs the function of trying all markers in a framework map -- one for which the marker order is strongly supported. This approach is preferable to forcing all available markers into a map no matter how badly some of them may "fit". You can try buildfw now, but when you have some time, do read the manual entry to learn more about it.
  11. How many chromosomes do you think this organism has? About how many cM long are they?
  12. In lecture we saw an equation showing how to combine recombination fractions in adjacent intervals. Show why this equation holds.
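The grouping rule from step 2 of this section (a marker joins a group if it passes both thresholds with at least one member) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CarthaGene's code: the function name find_groups, the marker names, and the pairwise values are hypothetical, and pair statistics are assumed to be supplied as (distance, LOD) tuples.

```python
def find_groups(markers, pair_stats, max_dist, min_lod):
    """Single-linkage grouping: link pairs that pass both thresholds.

    pair_stats maps (marker_a, marker_b) -> (distance, lod).
    Returns a list of groups, each a sorted list of marker names.
    """
    linked = {m: set() for m in markers}
    for (a, b), (dist, lod) in pair_stats.items():
        if dist <= max_dist and lod >= min_lod:
            linked[a].add(b)
            linked[b].add(a)

    groups, seen = [], set()
    for m in markers:
        if m in seen:
            continue
        # Collect everything reachable from m through passing links.
        group, stack = [], [m]
        seen.add(m)
        while stack:
            cur = stack.pop()
            group.append(cur)
            for nb in linked[cur]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    markers = ["m1", "m2", "m3", "m4", "m5"]
    stats = {  # hypothetical pairwise (distance, LOD) values
        ("m1", "m2"): (0.10, 8.0),
        ("m2", "m3"): (0.15, 6.0),
        ("m3", "m4"): (0.30, 4.0),
        ("m4", "m5"): (0.05, 12.0),
    }
    print(find_groups(markers, stats, 0.2, 5))  # stricter thresholds
    print(find_groups(markers, stats, 0.5, 3))  # relaxed thresholds
```

With the stricter thresholds the weak m3/m4 link fails and two groups result; relaxing the thresholds lets that single link merge them, which is the sensitivity you are asked to explore in step 4.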