Xover 3.0


Introduction

Xover is an online bioinformatic tool for the automatic analysis and plotting of DNA/protein crossover patterns. It quickly and accurately analyzes, without human intervention, the complex DNA/protein mosaic patterns generated by DNA chimeragenesis technologies such as DNA shuffling. It then presents the results in a graph that is both rich in information and easy to understand. Xover is part of the online computational suite ReX; see also the other ReX tool, REcut.

Please cite: Huang W., Johnston W.A., Boden M. and Gillam E.M.J. (2016) ReX: A suite of computational tools for the design, visualization, and analysis of chimeric protein libraries. Biotechniques, 60(2):91-94. (PMID: 26842355)


Quick Start

  1. Paste FASTA sequences of parents into the top textbox.

  2. Paste FASTA sequences of chimeras into the bottom textbox.

  3. Both DNA and protein sequences are supported, and pre-alignment is not required. You may submit up to 200 KB of data, which is about 200 DNA sequences of 1 kb or 600 protein sequences of 300 aa.

  4. By default, the crossover graph will be drawn in pre-defined colors. Alternatively, you can try the Random Colors option.

  5. Statistics such as the means and standard deviations of crossovers and point mutations, library quality indicators, and a compressed crossover graph are displayed within a few seconds. A high-resolution, uncompressed crossover graph suitable for publication can be downloaded in BMP format using the link at the bottom of the result page. A link to the full dataset used to generate the graph is also provided at the bottom of the page for further analysis.
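Before pasting sequences, it can help to check that the input stays under the 200 KB limit. The Python sketch below is a hypothetical helper, not part of Xover. It formats (name, sequence) pairs as FASTA text for the two textboxes and totals their size. It assumes that "200 KB" means 200 x 1024 bytes and that the limit applies to both textboxes combined; Xover does not state either detail.

```python
# Hypothetical helper (not part of Xover): formats sequences as FASTA text
# for the parent and chimera textboxes and checks their combined size.
# Assumptions: "200 KB" means 200 * 1024 bytes, and the limit applies to
# both textboxes together.

SUBMISSION_LIMIT_BYTES = 200 * 1024  # assumed reading of "200 KB"


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping sequences at `width` characters."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


def check_submission(parents, chimeras):
    """Return (parent_text, chimera_text, total_bytes, within_limit)."""
    parent_text = to_fasta(parents)
    chimera_text = to_fasta(chimeras)
    total_bytes = len(parent_text.encode("utf-8")) + len(chimera_text.encode("utf-8"))
    return parent_text, chimera_text, total_bytes, total_bytes <= SUBMISSION_LIMIT_BYTES


if __name__ == "__main__":
    # Toy 1 kb sequences, for illustration only.
    parents = [("parentA", "ACGT" * 250), ("parentB", "TGCA" * 250)]
    chimeras = [(f"mut{i:03d}", "ACGT" * 250) for i in range(1, 51)]
    _, _, size, ok = check_submission(parents, chimeras)
    print(f"{size} bytes, within limit: {ok}")
```

FASTA headers and line breaks add a few bytes per record, so the number of sequences that fit is slightly lower than the raw sequence lengths alone would suggest.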


Demonstration

This demonstration uses a cytochrome P450 3A subfamily DNA shuffling library, created with 3A4, 3A5, 3A7 and 3A9 as parents. After the parent and chimera DNA sequences are submitted, the crossover graph and the corresponding library statistics are displayed within a few seconds. An example result page is provided below:



The top of the result page gives the essential statistics of the library, such as the means and standard deviations of crossovers and point mutations. A figure legend follows the statistics and explains the drawing and the symbols used in the graph. Briefly, library parents are shown in different colors, assigned in the order in which the parents were entered. At each position, a large dot indicates a 100% parental match (i.e. the position matches only one parent) and a small dot indicates a match to more than one parent. For each chimera, a solid line shows the library parent identified for the sequence between crossovers. A set of parallel horizontal lines between crossovers indicates that multiple parents match equally well (e.g. the 5' ends of mutants 003 and 005, around 100 bp). A plus sign marks a mutation, i.e. a position in a chimera that does not match the corresponding position in any parental sequence. A vertical spike indicates a switch to another parent at a single position followed by an immediate switch back (e.g. the downward spike in the middle of mutant 152, around 700 bp). A horizontal black bar indicates a fragment insertion or deletion.
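The legend implies a per-position classification of each chimera against the parents. The Python sketch below illustrates one way to do this and is not Xover's algorithm. It assumes that the chimera and the parents are already aligned to the same length without gaps; Xover performs its own alignment, so pre-alignment is not required of users. The sketch labels each position as a unique match (large dot), a shared match (small dot) or a point mutation (plus sign). It then places crossovers greedily so that their number is minimal. The function names call_positions, segment and library_stats are hypothetical, and reporting the sample standard deviation is also an assumption.

```python
from statistics import mean, stdev

# Illustrative sketch, not Xover's implementation. Assumes the chimera and
# all parents are pre-aligned strings of equal length with no gaps.


def call_positions(chimera, parents):
    """For each position, return the set of parent indices matching the chimera.

    Size 1 = unique parental match (large dot); size > 1 = shared match
    (small dot); empty set = point mutation (plus sign).
    """
    return [{k for k, p in enumerate(parents) if p[i] == base}
            for i, base in enumerate(chimera)]


def segment(calls):
    """Greedily split positions into the fewest parent-consistent segments.

    Each segment is (start, end, candidate_parents). Mutated positions
    (empty sets) do not constrain the current segment. Extending each
    segment as far as possible yields the minimum number of segments.
    """
    segments = []
    start, current = 0, None
    for i, cands in enumerate(calls):
        if not cands:  # point mutation: matches no parent, skip
            continue
        if current is None:
            current = set(cands)
        elif current & cands:
            current &= cands  # still consistent with some parent(s)
        else:
            segments.append((start, i, current))  # crossover before position i
            start, current = i, set(cands)
    if current is not None:
        segments.append((start, len(calls), current))
    return segments


def library_stats(chimeras, parents):
    """Mean and sample SD of crossovers and point mutations (needs >= 2 chimeras)."""
    xovers, muts = [], []
    for chim in chimeras:
        calls = call_positions(chim, parents)
        xovers.append(max(len(segment(calls)) - 1, 0))
        muts.append(sum(1 for c in calls if not c))
    return {
        "crossovers": (mean(xovers), stdev(xovers)),
        "point_mutations": (mean(muts), stdev(muts)),
    }
```

In this sketch a single-position switch (the vertical spike in the graph) counts as two crossovers. Within a run of positions shared by several parents, the crossover is placed at the first position that breaks consistency. Xover may resolve both cases differently.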

To evaluate the relationship between mutants and parents, the mutations actually introduced into each mutant by DNA shuffling are measured by Levenshtein distance. The Levenshtein distance between a mutant and a parent is the minimum number of single-position edits (substitutions, insertions or deletions) required to convert the parent into the mutant. The parent with the shortest distance is the closest parent to the mutant. This shortest distance is called the Effective mutation, i.e. the minimum number of point mutations required to back-mutate a mutant to its closest parent. Effective mutation therefore measures the actual diversity introduced into each mutant and is used to evaluate the diversity of the library.
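The Effective mutation measure can be sketched in Python with the standard dynamic-programming Levenshtein distance. The function names levenshtein and effective_mutation are hypothetical. Xover's own implementation may differ, for example in how it breaks ties between equally close parents.

```python
# Illustrative sketch of "Effective mutation": the Levenshtein distance
# from a mutant to its closest parent. Not Xover's actual code.


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the rolling row short
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def effective_mutation(mutant: str, parents: dict) -> tuple:
    """Return (closest_parent_name, distance).

    `parents` maps parent names to sequences; ties go to the parent
    listed first (an assumption of this sketch).
    """
    distances = {name: levenshtein(seq, mutant) for name, seq in parents.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]
```

For example, effective_mutation("ACGTTA", {"P1": "ACGTAA", "P2": "TTTTTT"}) returns ("P1", 1), because a single substitution converts P1 into the mutant. The toy sequences are for illustration only.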

To display large graphs quickly over the internet, the result page shows a compressed graph, which may occasionally show a slight loss of image quality. For publication, the full-resolution, lossless graph can be downloaded in BMP format using the link at the bottom of the result page. The full dataset analyzed by the program can also be downloaded. It contains all the information and analysis data used to plot the crossover graph, including sequence alignments, per-position parent calling, point mutation detection and detailed crossover information. The data are stored in CSV format, which can be opened in any spreadsheet program such as Microsoft Excel.


Contact

If you have any questions or need assistance, please don't hesitate to contact:

Dr Weiliang Huang
Email: weiliang.huang@uqconnect.edu.au
University of Maryland, Baltimore USA
Prof. Dr Elizabeth Gillam
Email: e.gillam@uq.edu.au
University of Queensland, Australia

Your suggestions are always welcome.