7.05.2007

How do you map a gene?

One of the questions that's most frequently asked about my research, and one that I'm never able to answer concisely and satisfactorily, is this one: How do you map a gene? And although the answer is a bit long and involved, it's not too difficult conceptually, once you get a few basics out of the way.

The first thing you need in order to map a gene is some sort of variation in that gene, be it mutants vs. wild-type, or a polymorphism in the population (say, red hair vs. brown hair). You need to be able to sort out, based on phenotype, which individuals have the wild-type version and which ones have the mutation or polymorphism. Usually, this means you need an assay, whether it's based on morphology, drug treatment, behavior... however you do it, you need to sort your animals into two categories. One other important thing to keep in mind is that the mutants and the wild-types, in most cases, come from the same families: they are siblings, so they're genetically nearly identical except for the gene you're looking for, which is causing the phenotype you're sorting by.

The next thing you need is a library of markers of known genomic location. For zebrafish and many other animals, this has been published - and new markers are added constantly. zfin.org is one place to find these published markers. Here is a link to a map of these markers. Click on a chromosome button (referred to as LG, or linkage group, on that site) to see the map of the chromosome and all the markers. Click on an individual marker to see all the published information about it - location, sequence, who discovered it, etc.

Most of the markers in this particular map, which is one that I use extensively in my work, are known as Z markers, and are identified with a Z and then a number, such as z1234. These "markers" are also known as SSLPs, or simple sequence-length polymorphisms. This means that they are sites which vary in length between different strains and even individuals within a strain. This makes them very handy for our purposes.

SSLPs are usually found around sequences of di- or tri-nucleotide repeats, such as a stretch of AGAGAGAGAGAG base pairs. The reason they have length polymorphisms is that when the DNA replication machinery is copying these short repeated sequences, the enzyme is prone to "slip" and either copy a few bases twice or skip them. These events happen fairly often (evolutionarily speaking), and randomly, and the end result is that there are many different length alleles in the population. Length differences are easily detected by PCR and gel electrophoresis. This gives us an easy way to determine a fish's genotype at a particular site. By comparing individuals to their parents, we can determine which chromosome is from Mom and which one is from Dad.
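To make the genotyping step a bit more concrete, here is a minimal Python sketch of calling a fish's genotype at one SSLP from the band lengths read off a gel. The function name, the allele lengths (150 and 162 bp), and the 2 bp sizing tolerance are all made up for illustration; real allele sizes come from running the parental strains side by side.

```python
def call_genotype(band_lengths, ab_length, wik_length, tolerance=2):
    """Call a genotype at one SSLP marker from PCR band lengths in base pairs.

    ab_length and wik_length are the (hypothetical) allele sizes seen in the
    AB and WIK strains; tolerance allows for imprecise sizing on a gel.
    """
    if abs(ab_length - wik_length) <= 2 * tolerance:
        # The two alleles cannot be told apart: a non-informative marker.
        return "non-informative"
    has_ab = any(abs(b - ab_length) <= tolerance for b in band_lengths)
    has_wik = any(abs(b - wik_length) <= tolerance for b in band_lengths)
    if has_ab and has_wik:
        return "AB/WIK"
    if has_ab:
        return "AB/AB"
    if has_wik:
        return "WIK/WIK"
    return "no call"


print(call_genotype([150, 162], ab_length=150, wik_length=162))  # AB/WIK
print(call_genotype([151], ab_length=150, wik_length=162))       # AB/AB
```

Two bands means the fish carries one allele of each length; one band means both of its chromosomes carry the same allele at that site.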

There's one more important point to make before I get into the nitty-gritty of actually mapping the gene. Mutations are made on a lab "wild-type" strain - in our case, we use the *AB line for mutagenesis. ABs are useful because they are fairly genetically uniform, and have very few lethal mutations hiding out in their genome. But they are bad for mapping, because the SSLPs tend to be the same length in all the fish. Once a family has been identified with a mutation, one of the carrier parents is outcrossed to another strain - WIK, in our lab - which has different SSLP sizes at most of the sites, and is actually known as a "polymorphic mapping strain" in many labs.* So once the outcross is done and another carrier pair identified - these fish are *AB/WIK genotype - the offspring of this cross are sorted by phenotype and then their DNA is extracted. We also get the DNA from Mom and Dad, as well as the founder grandparent fish, and the wild-type WIK animals used for outcrossing.

The first step in mapping the gene is to determine gross linkage, or to answer the question: What chromosome is the mutation on? In order to figure this out, we make pools of DNA samples from the mutant and the wild-type sibling fish. We then test SSLP markers on pooled mutant DNA, pooled sibling DNA, and Mom, Dad, and grandparent DNA samples. Here, we're looking for a particular pattern: Mom, Dad, and the wild-type siblings should each have two bands, or two length alleles for the marker (since they have both an AB and a WIK chromosome); the grandparent and mutant samples should each only have one, and it should be the same one (since the mutation was made on the AB background).

Here are some simulated ASCII gels. The lanes, from left to right, are: mutant pool, sibling pool, Mom, Dad, founder grandparent. (Note: on all these gels, the single line should line up with the bottom of the double lines. It doesn't really work right in this font, but pretend.)
The first gel is a non-informative marker:
-----
All the samples have a single band of the same size. We can't learn anything from this.
The second gel is an informative, but unlinked, marker:
====-
Mutants, siblings, Mom and Dad all have alleles from both the AB and the WIK chromosomes. These are recessive mutations, and the mutation is carried on the AB chromosome, so it can't be here, since the mutants have AB/WIK genotypes.
The third gel is an informative, linked marker:
-===-
Mutants have just the AB band, meaning they are homozygous at this location. This is good evidence that the marker is near the mutation.
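The way I read these three gels can be written down as a small Python sketch. It takes a gel in the same one-character-per-lane notation used above and names the pattern. The function and the wording of its return values are my own invention for this post, not any standard tool, and the notation can't show which band a single-band lane carries, so the "same band as the grandparent" check is assumed rather than tested.

```python
LANES = ("mutant pool", "sibling pool", "Mom", "Dad", "founder grandparent")


def classify_marker(gel):
    """Classify a pooled-DNA gel written as one character per lane.

    '-' means one band and '=' means two bands, in the lane order of LANES.
    This notation cannot show which band a single-band lane carries, so the
    check that mutants share the grandparent's (AB) band is assumed, not tested.
    """
    if len(gel) != len(LANES) or set(gel) - {"-", "="}:
        raise ValueError("expected five lanes made of '-' or '='")
    mutant, sibling, mom, dad, grandparent = gel
    if "=" not in gel:
        # Every lane has the same single band: AB and WIK don't differ here.
        return "non-informative"
    if mom == dad == "=" and grandparent == "-":
        if mutant == "=":
            # Mutants carry a WIK allele, so the mutation can't be nearby.
            return "informative, unlinked"
        if sibling == "=":
            # Mutants are homozygous while siblings still show both alleles.
            return "informative, linked"
    return "unclear, repeat or try another marker"


for gel in ("-----", "====-", "-===-"):
    print(gel, "->", classify_marker(gel))
```

Anything that doesn't fit one of the three patterns is deliberately reported as unclear, since on a real gel that usually means a failed PCR or a marker worth rerunning.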

So you test markers on each chromosome (zebrafish have 25) and look for the linkage pattern. Once you find a chromosome that shows linkage to the mutation, it's time to switch tactics and go for fine mapping.

For fine mapping, or determining where on the chromosome the mutation is, we abandon our pooled DNA and work with individual DNA samples. We need as many of these as we can get, so we keep breeding our mapping pair (Mom and Dad) and sorting out the offspring based on mutant phenotype. (Remember that recessive traits are found in 1/4 of a carrier pair's offspring... do a Punnett square if you can't remember how that works.) So 1/4 of the offspring are identified as mutants, and the other 3/4 are siblings. Of these siblings, 2/3 (or 1/2 of the total) are heterozygous, or carriers, and 1/3 (1/4 of the total) are homozygous wild-type, or don't carry the mutation. Most importantly, though, every individual identified as a mutant must be homozygous at the site of the mutation.
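If you'd rather not draw the Punnett square by hand, here is a short Python sketch that does the same bookkeeping with exact fractions. The '+' (wild-type) and 'm' (mutant) allele notation and the function name are just choices made for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Return offspring genotype frequencies for a one-gene cross.

    Each parent is a two-character string of alleles, '+' for wild-type
    and 'm' for the mutation (a notation chosen for this sketch).
    """
    # Each (a, b) pair is one square of the Punnett square; sorting the
    # alleles makes '+m' and 'm+' count as the same genotype.
    squares = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(squares.values())
    return {genotype: Fraction(n, total) for genotype, n in squares.items()}


offspring = cross("+m", "+m")  # two carriers, like our mapping pair
mutants = offspring["mm"]
siblings = 1 - mutants
carriers_among_siblings = offspring["+m"] / siblings

print(mutants, siblings, carriers_among_siblings)  # 1/4 3/4 2/3
```

The last number is the 2/3 from the paragraph above: carriers are 1/2 of all offspring, but 2/3 of the offspring that look wild-type.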

So what we do here is test markers all up and down the chromosome we've identified, on all of our DNA samples - mutants, siblings, parents, and grandparents. Due to recombination, not all the mutants will be homozygous at all of the locations we test, and the proportion that are homozygous (showing one band, "-", instead of two, "=") goes up the closer the marker is to the mutation. Recall that during meiosis, when germ cells (sperm and egg) are being formed, crossing over occurs between homologous chromosomes (i.e. your copy of chromosome 5 from Mom and your copy of chromosome 5 from Dad), creating new chromosomes with bits of each. BUT - we know that all the mutants must have only the AB chromosome at the location where the mutation is, so we use this information to narrow down where the mutation is.
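To put a number on "how close", the usual trick is to count recombinants per meiosis. Here is a rough Python sketch of that arithmetic. The counts are hypothetical, the function name is mine, and it makes two simplifying assumptions: every two-band mutant is counted as exactly one recombinant chromosome, and 1% recombination is treated as about 1 centimorgan (cM), which only holds for short distances.

```python
def recombination_fraction(n_mutants, n_heterozygous):
    """Estimate the recombination fraction between a marker and the mutation.

    Every mutant carries two mutant-bearing chromosomes, one per parent, so
    n_mutants fish represent 2 * n_mutants meioses. A mutant that shows two
    bands at the marker is counted as one recombinant chromosome.
    """
    if n_mutants <= 0 or not 0 <= n_heterozygous <= n_mutants:
        raise ValueError("need n_mutants > 0 and 0 <= n_heterozygous <= n_mutants")
    return n_heterozygous / (2 * n_mutants)


# Hypothetical counts: 3 of 200 mutants are heterozygous at the marker.
fraction = recombination_fraction(200, 3)
print(f"{fraction:.2%} recombination, roughly {fraction * 100:.2f} cM")
```

With these made-up numbers that's 3 recombinants in 400 meioses, or 0.75%. How many base pairs that corresponds to varies from region to region, so I wouldn't read it as a physical distance.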

Here's another sample gel. This time, individuals are listed vertically, and each column is that fish's genotype at each of 5 different markers.

mutant A - - - = =
mutant B = - - - -
mutant C = = - - -
mutant D - - - - =
mutant E = = - - -
wt sib A = = = = =
wt sib B - = = = =
wt sib C - - = = =
wt sib D = = = - -
wt sib E = = = = -

Based on these results, we can conclude that the mutation is closest to the third marker - since all the mutants are homozygous and all the sibs are heterozygous here. (In reality, some siblings would also have just one line corresponding to the upper band, but I can't really do that with the ASCII at my disposal.) By testing hundreds, if not a thousand, mutant and wild-type fish, you can find a pair of markers between which the mutation must lie. By testing markers that are closer and closer together, you can narrow down the region to a few hundred thousand base-pairs, after which the mutation is mapped, and now needs to be cloned. But that's another post for another day.
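For the curious, here is how reading that sample gel could be sketched in Python, using the five mutant rows exactly as written above. The function names are invented for this post, and the logic is only the simple version of the reasoning: count the homozygous mutants at each marker, take the marker where the count is highest, and then look for the nearest marker on each side that has at least one recombinant mutant.

```python
MUTANTS = {
    "mutant A": "---==",
    "mutant B": "=----",
    "mutant C": "==---",
    "mutant D": "----=",
    "mutant E": "==---",
}


def homozygous_counts(fish):
    """Count, for each marker column, how many fish show a single band ('-')."""
    genotypes = list(fish.values())
    return [sum(g[i] == "-" for g in genotypes) for i in range(len(genotypes[0]))]


def flanking_interval(counts, n_fish):
    """Return 1-based (left, closest, right) marker numbers.

    closest is the marker with the most homozygous mutants; left and right are
    the nearest markers on each side with at least one recombinant mutant,
    or None if there is no such marker on that side.
    """
    best = counts.index(max(counts))
    left = next((i + 1 for i in range(best - 1, -1, -1) if counts[i] < n_fish), None)
    right = next((i + 1 for i in range(best + 1, len(counts)) if counts[i] < n_fish), None)
    return left, best + 1, right


counts = homozygous_counts(MUTANTS)
print(counts)                                   # [2, 3, 5, 4, 3]
print(flanking_interval(counts, len(MUTANTS)))  # (2, 3, 4)
```

So in this toy example the mutation sits nearest marker 3, somewhere between markers 2 and 4. With only five mutants that interval is huge, which is exactly why you need hundreds of fish.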

This all sounds pretty easy and straightforward, and while it's conceptually simple, it's a lot harder in practice. One challenge that has hampered my progress is finding polymorphic markers - markers with different lengths between AB and WIK fish. There are also challenges with breeding and identifying mutant fish - sometimes the fish don't "give" (spawn) well, and after about a year an old pair will just stop giving. It can take a long time to map and clone a gene, as I've proven by taking more than a year and a half to find this one... or you can also get lucky and find it relatively quickly. Like anything in science, it's probably 50% luck, 50% hard work.

So that's my post on how to map a gene. The details vary by organism, but it's pretty much the same in principle - whether you're mapping the cystic fibrosis gene in humans or a novel mutation in zebrafish, fruit flies, or yeast. This whole process is known as "positional cloning" - finding the gene by its position in the genome. It's labor-intensive and slow at times, but it's a powerful method for finding a mutation that could be anywhere.

(* I have my own theories as to why this line is so polymorphic, but they're all unfounded at this point, just based on observation and hearsay. There's a chance that I'll end up exploring this as a part of my Ph.D. work... but until then, I'm going to leave those theories out.)

1 comment:

Anonymous said...

Wow, props to you for attempting to explain this! I was asked this exact question in an interview and was at a loss to explain it coherently to a bunch of cell biologists...>_< Thanks!