| Automated physical mapping is crucial to large-scale genome sequencing and genome research. The automation involves three steps: (1) high-throughput fingerprinting of (BI)BAC clones, (2) accurate and efficient high-throughput data processing, and (3) efficient contig assembly, which groups clones into contigs.
In the NSF-supported Wheat D genome physical mapping project, a protocol for high-throughput fingerprinting of BAC clones with the SNaPshot™ Kit on the ABI 3100 capillary genetic analyzer has been developed and used (Luo et al. 2003). Other high-throughput fingerprinting methods are also available (Ding et al. 1999, 2001). For contig assembly, the FPC (FingerPrint Contig) software developed by Soderlund et al. (1997) is widely used to build contigs in physical mapping projects. However, manually editing sizes from raw sample files and pre-analyzing large amounts of fragment size data are time-consuming and error-prone in a large-scale physical mapping project. As high-throughput fingerprinting protocols are applied in large-scale projects, fingerprinting data processing becomes the bottleneck in automating physical mapping.
GenoProfiler is a cross-platform software package for fully automated processing of fingerprinting data from ABI 3100/3700/3730 Genetic Analyzers. It extracts traces and peak data (size, peak height, peak area, and scan number or data point) directly from sample files produced by the ABI 3100/3700 Genetic Analyzer, without ABI GeneScan, GeneMapper, or other software. It then automatically removes background (noise) bands and "false" bands, detects and splits double and triple bands, and writes edited size files in FPC size format. These files can be used directly for contig assembly with FPC and for other purposes, such as genotyping for phylogenetic analysis. GenoProfiler also provides tools that help make fingerprinting data processing fully automatic: detecting cross-contamination of clones within 384-well and 96-well plates, detecting chloroplast DNA contamination of clones, analyzing the frequency of produced fragment sizes, renaming clones, sub-setting clone sizes from edited size files, sub-setting sample files from sample file directories, sub-setting clones from contigs in FPC files (*.fpc), and viewing fingerprints.
GenoProfiler is a Java-based windowed application that should run on any operating system with JDK 1.4 or later.
Since version 1.05, processing of ABI 3730 fingerprinting data has been supported. For these data, the software takes as input the fragment size files created by GeneMapper software (Applied Biosystems).
GenoProfiler is composed of several modules, each with a different function. Version 1.06 provides the following modules:
Trace view of a sample file
Use this tool to inspect trace quality and to check size calling and editing quality before batch-processing a large number of sample files. Adjusting the parameters here lets you find the settings that give the best results. This module has the following functions:
- Viewing traces of different colors in a single sample file
- Automatically calling and editing fragment sizes
- Viewing all processed peak data, including peak size, peak height, peak area, data point (scan number), and the ratio of peak area to peak height
- Checking size editing quality
- Saving the edited fragment sizes in FPC size format

Batch processing of sample files
This module performs automatic size calling and editing of sample files from the ABI 3100/3700 Genetic Analyzer in batch mode. It has the following features:
- Batch size calling and editing of sample files, with no limitation on memory or on the number of sample files.
- Reading raw sample files directly, without pre-processing them in ABI GeneScan, GeneMapper, or other software, and writing processed size files in FPC size file format for FPC contig assembly. Other size formats can be output for other purposes.
- Setting parameters in Size Calling and Editing Setting to:
  - Call and edit sizes of one color and output FPC size files with only that color.
  - Call and edit sizes of multiple colors and output FPC size files with multiple colors.
  - Limit called sizes to a specified range, e.g. from 70 to 450 bp.
  - Export bands with or without editing, i.e. with or without removing background bands, detecting double/triple bands, or finding "true bands", depending on your requirements.
- Selectively processing sample files whose clone names match specified patterns.
- Reporting empty clones and clones with failed fingerprinting during processing, and saving the list of these clones to a file.
- Reporting a processing summary, including the number of empty clones, the number of failed fingerprinting clones, the list of empty and failed clones, and the band frequency distribution of the created clone size files, and saving the summary to a file.
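The selection and range-limiting features above can be illustrated with a minimal Python sketch. It is not GenoProfiler's implementation: the function name `batch_filter`, the input layout (a dictionary from clone name to already-called sizes), and the clone names used with it are assumptions made for illustration only.

```python
import fnmatch


def batch_filter(clones, pattern="*", size_min=70, size_max=450):
    """Select clones by name pattern and keep sizes inside a range.

    clones: dict mapping a clone name to a list of called fragment
    sizes in bp (a hypothetical in-memory stand-in for size data).
    Returns (kept, empty): kept maps each selected clone to its sorted
    in-range sizes; empty lists selected clones with no sizes left.
    """
    kept, empty = {}, []
    for name, sizes in clones.items():
        # Skip clones whose names do not match the shell-style pattern.
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        in_range = sorted(s for s in sizes if size_min <= s <= size_max)
        if in_range:
            kept[name] = in_range
        else:
            empty.append(name)
    return kept, empty
```

Calling `batch_filter(clones, "WD*")` would process only clones whose names start with `WD` and would report the ones left without any band in the 70 to 450 bp range.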
BAC contamination check
Two kinds of contamination can occur during BAC library construction, filter making and duplication, and fingerprinting:
- cross-contamination between adjacent clones and between non-adjacent clones in plates, and
- chloroplast DNA contamination in library construction.
For cross-contamination, there are three possible sources:
- cross-contamination of adjacent clones in 384-well plates,
- cross-contamination of adjacent clones in 96-well plates transferred from 384-well plates, and
- cross-contamination of non-adjacent clones within 384-well or 96-well plates (very high profile-sharing between clones).
This module detects all four contamination sources (the three cross-contamination sources and chloroplast DNA contamination) by comparing fingerprint pattern similarity between clones, or between a clone and the chloroplast DNA fingerprint pattern. If the match percentage of two fingerprints exceeds a user-defined threshold, the two clones are likely cross-contaminated, or the clone is likely contaminated with chloroplast DNA.
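The comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not GenoProfiler's algorithm: the text does not define the match percentage, so the sketch assumes it is the number of shared bands divided by the band count of the smaller fingerprint, with two bands counted as shared when their sizes differ by no more than a tolerance. The function names and default values are hypothetical.

```python
def shared_bands(a, b, tolerance=0.4):
    """Count one-to-one band pairs whose sizes differ by <= tolerance."""
    a, b = sorted(a), sorted(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # a[i] is too small to match any remaining band of b
        else:
            j += 1  # b[j] is too small to match any remaining band of a
    return shared


def match_percentage(a, b, tolerance=0.4):
    """Assumed definition: shared bands over the smaller band count."""
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return 100.0 * shared_bands(a, b, tolerance) / smaller


def flag_contaminated(fingerprints, threshold=50.0, tolerance=0.4):
    """Return (clone, clone, percent) for pairs above the threshold."""
    names = sorted(fingerprints)
    flagged = []
    for idx, first in enumerate(names):
        for second in names[idx + 1:]:
            pct = match_percentage(
                fingerprints[first], fingerprints[second], tolerance
            )
            if pct > threshold:
                flagged.append((first, second, pct))
    return flagged
```

The same `match_percentage` function could be applied between each clone and a chloroplast DNA reference pattern. Restricting the pairwise loop to neighboring wells would target the adjacent-clone cases, while the all-pairs loop shown here also covers non-adjacent clones.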
Fragment frequency analysis
The frequency distribution of fragment sizes is useful for detecting vector bands, chloroplast DNA fragments, and repeat fragments in random clones, and for checking the size distribution of each fragment across replicate fingerprints of the same clone. The size distribution of each fragment provides the evidence used to set the tolerance parameter in FPC contig assembly. The module provides an algorithm that finds peaks and calculates peak statistics (area, height, width, total frequency, standard deviation, and standard error).
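A simple way to picture this analysis is a histogram of fragment sizes followed by a search for local maxima. The Python sketch below is an assumption-laden illustration, not the module's algorithm: the bin width, the minimum count, the window used to collect the sizes belonging to a peak, and the function name `peak_statistics` are all hypothetical choices.

```python
from collections import Counter
from math import floor, sqrt
from statistics import mean, stdev


def peak_statistics(sizes, bin_width=1.0, min_count=3, half_window=1):
    """Find histogram peaks and summarize the sizes around each one.

    A bin is treated as a peak when its count is at least min_count,
    greater than the count of the bin to its left, and not smaller
    than the count of the bin to its right.
    """
    hist = Counter(floor(s / bin_width) for s in sizes)
    peaks = []
    for b, count in sorted(hist.items()):
        if count < min_count:
            continue
        if count > hist.get(b - 1, 0) and count >= hist.get(b + 1, 0):
            # Collect the sizes in the peak bin and its neighbors.
            lo = (b - half_window) * bin_width
            hi = (b + half_window + 1) * bin_width
            members = [s for s in sizes if lo <= s < hi]
            sd = stdev(members) if len(members) > 1 else 0.0
            peaks.append({
                "bin_start": b * bin_width,
                "height": count,
                "n": len(members),
                "mean": mean(members),
                "std": sd,
                "std_error": sd / sqrt(len(members)),
            })
    return peaks
```

Under these assumptions, a peak that appears in nearly every clone would be a candidate vector band, and the standard deviation of a peak built from replicate fingerprints gives a rough sense of how wide the FPC tolerance needs to be.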
File management
GenoProfiler supports three file formats: sample files (*.fsa) from the ABI 3100/3700, FPC size files (*.sizes) produced by GenoProfiler, and FPC files (*.fpc) produced by FPC. In practice, fingerprinting data processing and FPC contig assembly require some basic operations on these three kinds of files. This module provides tools for those operations, including removing vector/repeat bands, renaming clones, sub-setting clone sizes from edited size files, sub-setting sample files from sample file directories, and sub-setting clones from contigs in FPC files (*.fpc).
Fingerprint viewing
Fingerprints Viewer is a visual tool for checking the fingerprint patterns of processed clones. It provides the following information:
- Total number of bands and number of bands produced by each dye (color)
- Fingerprint patterns in separate dyes
- Shared bands and Sulston scores of a clone against all other clones, with a visual display of shared bands on the fingerprint image
- An output clone list with the total number of bands and the number of bands produced by each dye
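For readers unfamiliar with the Sulston score mentioned above, the sketch below computes it in the form commonly given in the FPC literature: the probability that at least the observed number of bands match by chance. How GenoProfiler computes the score internally is not stated here, so treat this as an illustration. The function and parameter names are assumptions; `tolerance` and `gel_length` must be in the same units.

```python
from math import comb


def sulston_score(n_low, n_high, shared, tolerance, gel_length):
    """Probability that `shared` or more bands match by coincidence.

    n_low, n_high: band counts of the clone with fewer and with more
    bands. shared: number of bands the two clones have in common.
    tolerance: matching tolerance. gel_length: number of possible band
    positions, in the same units as tolerance.
    """
    # Probability that one band matches none of the n_high bands.
    p = (1.0 - 2.0 * tolerance / gel_length) ** n_high
    return sum(
        comb(n_low, m) * (1.0 - p) ** m * p ** (n_low - m)
        for m in range(shared, n_low + 1)
    )
```

A smaller score means the overlap is less likely to be coincidental, so two clones that share most of their bands get a score close to zero.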
Conversion between hybridization grid position and clone name (ID)
Hybridizing markers to arrayed BAC clones (filters) provides clone-marker associations for anchoring BAC contigs to a chromosome, to a specific region of a chromosome, or to genetic maps. This module converts the filter well positions that are positive for a marker into clone names or clone IDs.
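The sketch below illustrates one half of such a conversion: turning a plate number and a row and column on a 384-well plate (16 rows A to P, 24 columns) into a clone name and back. It is only an illustration. The step from a filter spot to a plate and well depends on the gridding pattern, which is not described here, and the naming scheme (a library prefix, a four-digit plate number, then the well label) and the prefix `LIB` are assumptions.

```python
import string

ROWS, COLS = 16, 24  # 384-well plate: rows A-P, columns 1-24


def well_label(row, col):
    """Convert zero-based row and column indices to a label like 'A01'."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("position outside a 384-well plate")
    return f"{string.ascii_uppercase[row]}{col + 1:02d}"


def clone_name(plate, row, col, prefix="LIB"):
    """Build a clone name under the assumed prefix+plate+well scheme."""
    return f"{prefix}{plate:04d}{well_label(row, col)}"


def parse_clone_name(name, prefix="LIB"):
    """Invert clone_name: return (plate, row, col), zero-based row/col."""
    body = name[len(prefix):]
    plate, row_letter, col = int(body[:4]), body[4], int(body[5:])
    return plate, string.ascii_uppercase.index(row_letter), col - 1
```

A real library would substitute its own naming convention and add the filter-to-plate lookup in front of `clone_name`.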
Conversion to Ace file format from clone-marker hybridization data
After contigs are built with FPC, markers associated with clones in the assembly can be merged and integrated into the contigs. Hybridization of markers to arrayed BAC clones (filters) provides the clone-marker data for anchoring BAC contigs to a chromosome or a specific region of a chromosome. FPC requires hybridization data in a specific Ace file format. This module converts the hybridization results (clone-marker pairs) to Ace file format, which FPC can then use directly for marker merging. |