The title of THIS diary in the overview diary of this series was: The technical enablers of modern biology
An overview of microbiological theory and practice
Microbiology has been empowered by a combination of physical, chemical, biological, and mathematical techniques. It is the most complex of the sciences, studying the most complex thing in the universe: life.
Below the fold, this first diary in a series gives the 50,000-foot view of the technology behind modern biology.
Biology has gone from being zoology (classify and name) to being engineering (modifying genomes, modulating cell signaling networks with drugs) and information technology. This happened quite recently. The genetic code (which DNA triplet codes for which amino acid) was not worked out until 1965. In 1970, very little was understood about the molecular details of cell biology. Organelles (parts of the cell visibly distinct under an optical microscope) had names, but in many cases little was known about how they worked.
Today, we have databases of complete genomes for hundreds of species. We have open-access databases of atomic-scale models of tens of thousands of the protein molecules that make up the components of the cell. We know the functions, and many of the molecular components, of all the organelles. (And the components are unbelievably complicated; e.g., each nuclear pore complex is assembled from roughly 30 different proteins, each present in multiple copies, for hundreds of protein molecules in all.) We have circuit diagrams of the immense number of biochemical signaling pathways that keep the cell stable (in homeostasis), e.g., the intensely studied p53 cancer gene pathway.
We now understand that only 1-2% of our DNA actually codes for proteins (the exons). The bulk of the genome does not code for protein (introns, which sit inside genes, account for part of it; the rest lies between genes), but it is not simply "junk DNA". We now know that the human genome encodes roughly 20,000-25,000 proteins, each of which can have variants due to "alternative splicing". We understand that there are even more information and control systems besides the DNA sequence itself: epigenetics (methylation and other chemical marks on DNA and its packaging proteins), alternative splicing (the different ways exons can be assembled into whole proteins), and small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs), some of which are encoded in introns and act as control circuitry. The HapMap (Haplotype Map) Project has helped show that, within the human species, properties of the genome previously thought to be fixed, such as the number of copies of a specific gene and that gene's chromosomal location, can vary from one individual to the next (so-called Copy Number Variation).
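To make alternative splicing a little more concrete, here is a toy Python sketch. It assumes the simplest pattern, "cassette" exons that are either kept or skipped, and a hypothetical four-exon gene; real splicing has other patterns and is tightly regulated, so treat this as an illustration of the combinatorics, not a model of any real gene.

```python
from itertools import product

def splice_isoforms(exons, optional):
    """List every isoform made by including or skipping each optional ("cassette") exon.

    Exons not in `optional` are constitutive and always kept; exon order is preserved.
    """
    choices = [(True, False) if exon in optional else (True,) for exon in exons]
    return [[exon for exon, keep in zip(exons, pattern) if keep]
            for pattern in product(*choices)]

# Hypothetical four-exon gene whose two middle exons can be skipped.
for isoform in splice_isoforms(["E1", "E2", "E3", "E4"], optional={"E2", "E3"}):
    print("-".join(isoform))
```

Even this tiny gene yields four different messages (E1-E2-E3-E4, E1-E2-E4, E1-E3-E4, E1-E4); with n optional exons there are up to 2^n combinations, which is one way a modest number of genes can yield a much larger number of protein variants.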
While understanding exactly how various proteins do their jobs (so that we can cure diseases) is still complex, Nobel-prize-level research, it is a well-explored field. The "wild west" of biological science is now morphology: how the timing, spatial localization, and expression levels of proteins and other signaling molecules give rise to complex structures such as the heart, the brain, or the developing fetus.
With that tour of the horizon, let's get down to the nuts and bolts of the technology.
Protein - Theoretical and Practical Advances to date
The first biologically famous structure determined by X-ray diffraction was the DNA double helix (1953); the first protein structure, myoglobin, followed in 1958. Structure determinations used to be difficult science, as attested by the over-representation of protein crystallographers on the list of Nobel laureates. Today, technique has advanced so far that a graduate student can determine a new structure in a few months. Today's student has access to perfected hanging-drop crystallization techniques, heavy-atom (isomorphous) replacement for solving the phase problem, high-intensity synchrotron beamlines, and off-the-shelf software for solving structures.
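Why is the "phase problem" worth solving at all? A diffraction experiment records only the intensities of the diffracted beams (the squared amplitudes of the Fourier components of the electron density); the phases are lost. The toy one-dimensional Python sketch below, with a made-up eight-point density, shows that amplitudes plus phases give the density back, while amplitudes alone give the wrong picture. Real crystallographic software works in three dimensions and recovers phases with methods such as heavy-atom replacement, none of which is modeled here.

```python
import cmath

def dft(values):
    """Discrete Fourier transform: the 'structure factors' of a 1-D periodic density."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * j * k / n) for j, v in enumerate(values))
            for k in range(n)]

def inverse_dft(coeffs):
    """Inverse transform; keeps only the real part, since a density is a real quantity."""
    n = len(coeffs)
    return [(sum(c * cmath.exp(2j * cmath.pi * j * k / n) for k, c in enumerate(coeffs)) / n).real
            for j in range(n)]

# Hypothetical toy electron density, sampled at 8 points across one unit cell.
density = [0, 1, 3, 1, 0, 0, 2, 0]
factors = dft(density)

# Amplitudes together with phases: the density comes back (up to rounding error).
with_phases = inverse_dft(factors)

# A diffraction pattern gives intensities |F|^2, i.e. amplitudes only; the phases are lost.
# Rebuilding from amplitudes with every phase set to zero gives the wrong density.
amplitudes_only = inverse_dft([abs(f) for f in factors])

print([round(x, 2) for x in with_phases])
print([round(x, 2) for x in amplitudes_only])
```

The amplitude-only reconstruction always comes out symmetric (its values at positions 1 and 7 are equal, and so on) even though the true density is not, because the discarded phases carry exactly the information that distinguishes the two.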
Protein structures from all species pour into the databases at thousands per year. Even you can get into the act with Folding@home. Through this project, volunteers donate spare computer cycles to run simulations of how proteins fold after they come off their ribosomal assembly line. Theorists estimate that there are only a few thousand generic "fold families" to which all proteins can be assigned. (The slides "chap2_motif.ppt" from this directory at TU Norway discuss folds and families.) Are protein structures like words in a language, where most people get by with a vocabulary of fewer than ten thousand words?
One result about proteins that I found fascinating from a "Darwin vs. the fundies" POV is that somewhere between 10% and 30% of all proteins are misfolded upon synthesis. Cells have special protein machinery, called "chaperones", to deal with them. Some chaperones (the chaperonins) are literally containers: they capture dangerous or non-functional proteins and enclose them in a chemical environment that favors correct folding.
Think about that. The "perfect" machinery of every living cell has a 10-30% error rate, and even after chaperoning, a non-trivial percentage of protein goes straight from the production line to the cellular trash bin (the proteasome). Protein misfolding causes horrible diseases, such as Alzheimer's and mad cow disease, so the existence and failure of chaperones is not some obscure fact that can be ignored.
DNA - Practical Advances to date
Most people know that Watson, Crick, and Franklin discovered the structure of DNA. Fewer know that the exact year was 1953. Fewer still know that the three-base code that translates DNA into protein was not deciphered until 1965, or that practical DNA sequencing methods (Sanger, Maxam-Gilbert) were not invented until 1975-1977. In the early 1970s, biologists succeeded in using bacterial plasmids to transfer genetic material between organisms, creating the first artificial, human-made "recombinant DNA".
Unless you are a biologist, you probably haven't heard of "restriction enzymes", but these enzymes are a critical tool in genetics. A restriction enzyme is a pair of molecular scissors that cuts DNA only at a specific short sequence, its recognition site. The first such sequence-specific enzyme was isolated in 1970, and the work on restriction enzymes earned another Nobel Prize (1978). Today, companies make a business of discovering, cataloging, manufacturing, and selling microgram quantities of such enzymes to research labs. With these scissors, DNA can be cut and spliced in any way you desire.
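Here is a minimal Python sketch of what a restriction digest does to a sequence. It uses EcoRI as the example enzyme, which recognizes GAATTC and cuts between the G and the first A; the sketch looks at one strand only and ignores the staggered "sticky ends" on the complementary strand. The function names and the test sequence are hypothetical.

```python
def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]

def digest(seq, site="GAATTC", cut_offset=1):
    """Cut `seq` at every recognition site and return the fragments in order.

    The defaults model EcoRI (GAATTC, cut after the first base, G^AATTC),
    on the top strand only.
    """
    cuts = [i + cut_offset for i in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]

print(digest("AAGAATTCTTGAATTCAA"))  # two EcoRI sites -> three fragments
```

Swapping in another enzyme is just a matter of changing `site` and `cut_offset`, which is roughly what a lab does when it picks an enzyme from a supplier's catalog.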
Here is another political aside. The possibility that the current three-letter DNA code evolved from an earlier two-letter code is an active area of investigation. This research is motivated by the fact that, for a large share of the three-letter codons, the third letter is redundant; i.e., the same amino acid is coded for no matter which base sits in position three.
Isn't this a fun factoid? Suppose you run into a fundie who says, "I accept that there is a genetic code, but I don't accept evolution." You can ask him, in reply, whether he accepts the possibility that the genetic code itself evolved.
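The third-position redundancy is easy to check for yourself. The Python sketch below builds the standard genetic code from the compact 64-letter string that NCBI uses for translation table 1 (codons ordered TTT, TTC, TTA, TTG, TCT, and so on, with T standing in for RNA's U) and lists every two-letter prefix whose four codons all code for the same amino acid. The helper names are my own, for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codon order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def fourfold_boxes(code):
    """Two-letter prefixes whose four codons all mean the same thing ('*' = stop)."""
    boxes = []
    for first, second in product(BASES, repeat=2):
        meanings = {code[first + second + third] for third in BASES}
        if len(meanings) == 1:
            boxes.append(first + second)
    return boxes

boxes = fourfold_boxes(CODE)
print(len(boxes), "of 16 prefixes:", boxes)
```

It reports 8 of the 16 possible prefixes (TC, CT, CC, CG, AC, GT, GC, GG), so for half of all 64 codons the third letter makes no difference at all; many of the remaining codons are redundant in a weaker, two-way sense.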
Another important tool is the polymerase chain reaction (PCR), developed in the mid-1980s and rewarded with yet another Nobel Prize. PCR uses a naturally occurring, heat-stable DNA polymerase (an enzyme that joins individual nucleotides into a DNA polymer, or chain). By repeating cycles of heating and cooling, each of which roughly doubles the amount of the target DNA, PCR can amplify minute traces of DNA into quantities large enough for laboratory sequencing (to discover what proteins some new species is making, or whose blood is on the murder weapon).
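The power of PCR comes from repeated doubling. Here is a back-of-the-envelope Python sketch, assuming every cycle copies a fixed fraction of the templates (an efficiency of 1.0 means perfect doubling) and ignoring the plateau real reactions reach as reagents run out; the function name is hypothetical.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected copies of the target after `cycles` rounds of PCR.

    `efficiency` is the fraction of templates copied per cycle: 1.0 is perfect
    doubling; real reactions are somewhat lower and eventually plateau
    (the plateau is not modelled here).
    """
    return initial_copies * (1 + efficiency) ** cycles

print(f"{pcr_copies(1, 30):,.0f}")       # 1,073,741,824
print(f"{pcr_copies(1, 30, 0.9):,.0f}")  # same run at 90% efficiency per cycle
```

Thirty ideal cycles turn a single molecule into about a billion copies; at a more modest 90% efficiency per cycle, the yield is several times smaller, but still enormous.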
The most recent advance in methodology, taking the bio world by storm since about 2000, has been "RNA interference", or RNAi. Again, RNAi technology re-purposes a naturally occurring system. In living cells, the RNAi machinery helps defend against viruses and regulates gene expression. Its trigger is double-stranded RNA, which many viruses produce and which is unusual in a healthy cell. An enzyme called "Dicer" chops double-stranded RNA into short pieces, about 21-23 nucleotides long: long enough that each piece is very likely to match only its intended target. These pieces are then loaded into another component, the RNA-induced silencing complex (RISC), which binds messenger RNA with a matching sequence and destroys it. It is an anti-RNA guided missile. Researchers use it to "knock down" genes (i.e., silence their expression) so that the effect of a single gene can be observed.
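Why is a piece only about 21 letters long specific enough? A rough Python sketch, assuming a random sequence with equal base frequencies and a genome of about 3.1 billion bases (both simplifications; in practice siRNAs act on transcribed RNA, a much smaller target), estimates how many chance matches to one k-letter piece we should expect, counting both strands. The function names are hypothetical.

```python
def expected_chance_matches(k, genome_length):
    """Expected exact matches to one k-mer in a random genome, counting both strands."""
    return 2 * genome_length / 4 ** k

def shortest_unique_length(genome_length):
    """Smallest k for which fewer than one chance match is expected."""
    k = 1
    while expected_chance_matches(k, genome_length) >= 1:
        k += 1
    return k

HUMAN_GENOME = 3.1e9  # approximate haploid length in bases (assumption for illustration)

print(shortest_unique_length(HUMAN_GENOME))
print(expected_chance_matches(21, HUMAN_GENOME))
```

Under these assumptions, pieces of 17 letters or more are expected to match fewer than once by chance, and a 21-letter piece has fewer than 0.002 expected chance matches, which is why short RNA guides can be so specific.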
DNA - Theoretical Advances to date
After producing huge amounts of sequence data, we began to analyze it and to compare genomes between species and between individuals of the same species. This led to some important empirical findings.
The first fact worth mentioning is that nearly all species use the same genetic code (the few known variants differ in only a handful of codons). The second fact is that the slow accumulation of mutations in DNA sequences gives us a "molecular clock" for measuring the evolutionary "distance" between species. And it turns out that classifying species relationships by DNA distance gives almost the same "tree of life" that Linnaean taxonomists and zoologists determined by observing physical features. Slight differences have usually been resolved in favor of the DNA evidence.
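A molecular-clock comparison starts from a simple count of differences. The Python sketch below computes the p-distance (fraction of differing sites) between aligned, gap-free sequences and applies the Jukes-Cantor correction, a standard textbook formula that allows for several mutations having hit the same site. The three short sequences are made up for illustration; real studies use long alignments and more elaborate models.

```python
import math

def p_distance(seq_a, seq_b):
    """Fraction of aligned positions at which two equal-length, gap-free sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return diffs / len(seq_a)

def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site, correcting for repeat hits."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

species = {  # hypothetical aligned sequences
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACTTACGAAC",
}
names = sorted(species)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        p = p_distance(species[x], species[y])
        print(f"{x}-{y}: p = {p:.2f}, Jukes-Cantor d = {jukes_cantor(p):.3f}")
```

Species A and B come out closest; pairwise distances like these are what tree-building methods then turn into a "tree of life".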
Political aside: Everyone has heard that humans share 99% of their DNA with chimpanzees; but what's even more telling in favor of evolution is that most of the 6,000 genes in common yeast have their counterparts in humans. That is, we share 25% of our genes with brewer's yeast.
Interestingly, mitochondria have their own separate genome, which they read with a slightly different genetic code. Unlike the nuclear code, the mitochondrial code is not constant across all species; in vertebrate mitochondria, for example, the codon UGA means tryptophan rather than "stop". By comparing the number of mitochondrial genes across species, researchers have determined that, over evolutionary time, genes have migrated out of the mitochondria and into the nuclear DNA, adapting to the nuclear code in the process. These two facts, a separate genome with its own code and the ongoing shrinkage of the mitochondrial genome, are evidence for the "endosymbiont hypothesis". Here is a figure and some further discussion of this important hypothesis.
This hypothesis suggests that a unique event created the mitochondria (and, in some more speculative versions, that another created the cell nucleus). In that event, one cell engulfed a bacterium (for mitochondria, a relative of today's alphaproteobacteria), but the eater was unable to digest its meal. Instead, the two creatures settled into a symbiosis (like a lichen), with the engulfed bacterium becoming the mitochondrion of its host. The progeny of this cell evolved into all of today's eukaryotes, the single-celled ones as well as the multicellular plants, fungi, and animals.
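To see how small, yet real, the differences between genetic codes are, here is a Python sketch comparing NCBI's translation table 1 (the standard code) with table 2 (the vertebrate mitochondrial code), each written as the compact 64-letter string NCBI uses (codons ordered TTT, TTC, TTA, and so on, with T for U). The helper names are my own.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation tables 1 (standard) and 2 (vertebrate mitochondrial); '*' = stop.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"

def build(table):
    """Map each codon (TTT, TTC, ...) to its one-letter meaning."""
    return {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), table)}

def differences(code_a, code_b):
    """Codons the two codes read differently, as (codon, meaning in a, meaning in b)."""
    return [(c, code_a[c], code_b[c]) for c in code_a if code_a[c] != code_b[c]]

for codon, std, mito in differences(build(STANDARD), build(VERT_MITO)):
    print(codon, std, "->", mito)
```

Only four codons differ: TGA (stop in the standard code, tryptophan in vertebrate mitochondria), ATA (isoleucine versus methionine), and AGA and AGG (arginine versus stop).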
Another important fact is that the rate at which mutations accumulate over evolutionary time is not uniform across a genome. It depends on how "critical" a specific amino acid is to the correct operation of the protein. This suggests that the default state for mutation is "on", and that natural selection weeds out changes mainly at the vital points in the genome.
In the overview, protein folds were mentioned. It turns out that the components of the business end (i.e., the active site) of enzymes tend to occur at the "bends" (the joints of the folds) between the structural elements (alpha helices, beta sheets) of a protein. Looking across species at the sequence of a single enzyme, it is amazing to watch the codons for amino acids in the structural elements mutate, while the amino acids in the active site are preserved by natural selection.
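The cross-species comparison just described can be sketched as a per-position conservation score over an alignment of one protein from several species. In this minimal Python sketch the four sequences are hypothetical, the alignment is assumed to be gap-free, and the score is simply the fraction of species sharing the most common amino acid at each position; real analyses weight by evolutionary distance and by the kind of substitution.

```python
from collections import Counter

def column_conservation(alignment):
    """For each column of a gap-free alignment, the fraction of sequences sharing the commonest residue."""
    n = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    return [Counter(column).most_common(1)[0][1] / n for column in zip(*alignment)]

# Hypothetical alignment of one enzyme in four species; the histidine at
# position 3 (counting from 0) stands in for an active-site residue.
alignment = ["MKTHAL", "MRSHAV", "MKEHGL", "LKTHAI"]
print(column_conservation(alignment))
```

Only position 3 is identical in every species; the other positions have drifted to varying degrees.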
With this principle in hand, scientists applied it to mapping the rate of change within introns, and that is how they discovered regulatory non-coding RNA in the introns: pieces of introns are preserved because they are doing critical regulatory jobs. Recently, statistical analysis has shown a higher mutation rate for "off-line" DNA (DNA kept wound on its storage "reels", the histones, and so not being expressed) than for "on-line", unwound DNA.
Political aside: The juxtaposition of constant, slow change everywhere that mutations are irrelevant, with rock-solid stability at the critical points in the DNA - found time after time in species after species - is the most powerful evidence that natural selection exists which one could ever demand.
Another empirical finding has been huge numbers of duplicate genes scattered around the genome. It is now understood that, thanks to the elaborate layering of control systems, a cell can carry a duplicate copy of an entire gene. Once there are two copies, one can diverge into a new gene without destroying production of the original gene product. This finding shows how incompletely changed copies of a gene can survive through generations without killing the carrier. It also demonstrates that mutations can be much larger than a single base change. (Duplications can arise in several ways, including the activity of transposable elements, which also figure in horizontal gene transfer; but that's a digression.)
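Here is a toy Python sketch of the duplication argument, under deliberately crude assumptions: any change to the essential original copy is treated as lethal and therefore never survives, while the redundant copy keeps one random point mutation per generation. The gene sequence and function name are hypothetical.

```python
import random

BASES = "ACGT"

def duplicate_and_diverge(gene, generations, rng):
    """Toy model of gene duplication followed by divergence (illustrative assumptions only).

    Copy 1 is assumed essential: any mutation in it is assumed lethal and removed
    by selection, so it is simply left unchanged. Copy 2 is redundant, so one
    random point mutation per generation is allowed to persist.
    """
    original, duplicate = gene, gene
    for _ in range(generations):
        pos = rng.randrange(len(duplicate))
        new_base = rng.choice([b for b in BASES if b != duplicate[pos]])
        duplicate = duplicate[:pos] + new_base + duplicate[pos + 1:]
    return original, duplicate

original, duplicate = duplicate_and_diverge("ATGGCCAAGTTT", generations=5, rng=random.Random(1))
n_diff = sum(a != b for a, b in zip(original, duplicate))
print(original, duplicate, n_diff)
```

The original copy is untouched no matter how many generations run, while the spare copy drifts away from it, free to become something new.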
Does all this technology rise to the level of "micro-agriculture"?
The title of this diary series is "The Micro-agricultural revolution". There has been some legitimate pushback against my claim that we are on the verge of such a revolution reaching the public. That claim has two components (the methods, and their widespread use for commercial purposes), which I would like to examine separately.
As to the methods, I would say that, in this quick Cook's tour, we have seen that a large number of "bits" of cellular machinery have been torn loose from their parent "animal" and put to human purposes. That is, we are using the piece parts of the whole micro-animal in the same way that agricultural civilizations used hair, leather, and bone, or milk, meat, and eggs.
One technology I have not yet mentioned is the so-called "cell-free system". In these systems, the entire protein manufacturing machinery of a cell is delicately separated out. This "naked" protein production system can be placed in a lab flask and fed any bit of DNA. It will produce nothing but the single protein encoded in the DNA snippet, until the cell-free system expires from natural wear and tear in a day or two. If getting the equivalent of a dead horse's leg to pull a wagon is not micro-agriculture, I am at a loss for a definition.
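What the cell-free system does with its DNA snippet can be sketched, very loosely, as translation from a start codon to a stop codon. This Python sketch skips transcription (it reads the DNA coding strand directly, T for U), uses the standard genetic code in NCBI's compact 64-letter form, and assumes the input contains only A, C, G, and T; the function name and test sequence are hypothetical. A real cell-free system does this with ribosomes, tRNAs, and an energy supply, and needs the right control sequences on the DNA.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def express(dna):
    """Translate the coding strand from the first ATG up to the first in-frame stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon: nothing is made
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(express("GGATGGCTTGGTAAGC"))  # MAW
```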
As to widespread use, I would draw an analogy. In the early days of individual UNIX workstations, half the computing power sold went into designing the next generation of workstations. It was only when Moore's Law brought the performance down to everyday prices that workstation applications became cheap and widespread.
I would argue that, today, most of the biological tools sold go into research labs designing the next generation of biological tools. But as the $1,000 genome meets the ability to mass-produce protein on demand, there will be one of those "hockey stick" curves in the commercial applicability of all these technologies.
At the edge of the world of biological science
Before you conclude that biology has jumped completely on the reductionist bandwagon (that is, that life is just inorganic machinery, blind, unthinking, etc.), note that there is a huge amount of work at the micro- and macro- frontiers of the discipline, in much the same way that the real action in physics is at the extremes of sub-atomic particles and astrophysics.
The small-scale issue is that we simply have no theory of why organic molecules can make such tiny working machines. It is astounding in its own way (the following quotation is scientific, not political):
Between the nano- and micrometre scales, the collective behaviour of matter can give rise to startling emergent properties that hint at the nexus between biology and physics.
Up to about 30 nanometres, there is little difference between gold and niobium. It’s beyond this point that the electrons in niobium start binding together into the coupled electrons known as ‘Cooper pairs’. By the time we reach the micrometer scale, these pairs have congregated in their billions to form a single quantum state, transforming the crystal into an entirely new metallic state — that of a superconductor, which conducts without resistance, excludes magnetic fields and has the ability to levitate magnets.
In assemblies of softer, organic molecules, a tenth of a micrometre is big enough for the emergence of life. Self-sustaining microbes little more than 200 nanometres in size have recently been discovered. Although we understand the principles that govern the superconductor, we have not yet grasped those that govern the emergence of life on roughly the same spatial scale.
- P. Coleman, "Frontiers at your fingertips", Nature, 22 Mar 07, p. 379.
The large-scale issue is morphology. That is, how do all these proteins, controlled by all these genes, give rise to these hugely complex organisms, which then maintain themselves in homeostasis for decades (mammals) to centuries (trees)? Complex adaptive systems researchers, such as those at the Santa Fe Institute, are investigating these questions. Needless to say, including such material in this particular diary series would make it so complicated as to be unreadable. Perhaps another time.
Political aside: the renewed discussion between the hard-core "genes only" crowd and the "structuralist" school of morphology is one of the "controversies" that has been pointed to by "teach the controversy" creationists. No space for that in this diary. I'll come back to it in Diary 3 (The end of DNA dictatorship).
Ethical implications of the technology
Whether or not this "revolution" is a good thing is an entirely different argument. There is already a huge fight about whether comparing the DNA of different human beings (the so-called Haplotype Map Project, or HapMap) will inevitably lead to racism, ethnically targeted diseases or weapons, etc. Another "terrorist" worry, to be discussed in a later diary, is "synthetic biology".
Then there is the increasing corporatization of academic research. Government has paid huge sums of money for this research, and yet the professors (often in cahoots with corporate funders) are encouraged by the government to grab the patents we paid for and start their own companies, while paying ZERO royalties back to the public. (Meanwhile, universities have nasty Intellectual Property offices that want their pound of the public's IP flesh.) Now, I am the last person who wants to restrict science, but the current situation is simply robbery of the public treasury.
At least, the raw data itself is in the public domain (e.g., the various databanks mentioned above). There are also public domain tools for searching these databases, usually found on the database site itself. At least the public gets to see what it paid for and to use some of the tools we paid to create.
One final political point. Microbiology is simultaneously very concrete (atom-level analysis of DNA and proteins) and very abstract (statistical matching of gene similarity). The problem is that the concreteness happens at such tiny dimensions that the entire abstract apparatus of physical science interposes itself between the phenomena and the facts in question. That is, there is very little "common sense" in molecular biology. Hence, the entire endeavor of molecular biology and genetics comes across to the average person as an opaque and threatening high priesthood that is attacking the sacred way of life of "good folks".
The increasing divide between what science knows and can do versus what the population understands about science and wants done is a recipe for political disaster.
What the masses refuse to recognize is the fortuitousness that pervades reality. They are predisposed to all ideologies because they explain facts as mere examples of laws and eliminate coincidences by inventing an embracing omnipotence which is supposed to be at the root of every accident. Totalitarian propaganda thrives on this escape from reality into fiction, from coincidence into consistency...
- Hannah Arendt, "The Origins of Totalitarianism"
In closing, an Apology
Please think of this diary series as a popularized, idiosyncratic ramble through biology. Textbooks on this topic run to thousands of pages and require more than one complicated illustration per page. Even as a bad joke, I cannot pretend to be saying anything at a level above a "cocktail party" conversation by a techie for a politically-oriented audience. I just hope that, coupled with the links given, these diaries will convince politically-oriented people that what ordinary people know (or, worse, don't know) about science is of extreme importance at this moment in history.