**************************************************
* Description of the GGsnp_update.pl PERL script *
**************************************************

GGsnp_update_Help.txt file
VERSION 0.1
19 February 2003

AMILHAT Laurence, LEROY Philippe
Annotation & Cartographie In Silico
UMR INRA-UBP ASP Clermont-Ferrand
234 Avenue du Brezet, Domaine de Crouel
63 039 Clermont-Ferrand Cedex 2, France
contact : leroy@clermont.inra.fr

This PERL script has been written by Laurence AMILHAT to facilitate the
extraction of a valid FASTA file of the "Unigene" (contig consensus
sequences + singletons) using the updated data files of the
"Wheat SNP development" initiative. This script should therefore be used
only within this context.

The PERL script is normally distributed with the updated data files at the
following address :
http://wheat.pw.usda.gov/ITMI/2002/WheatSNP.html

The script source can also be downloaded from :
http://wheat.pw.usda.gov/ITMI/2002/WheatSNP/software.html

To use the GGsnp_update.pl script correctly, you need to have in the same
directory :

1. The singleton file in FASTA format
2. The directory with all the contigs
3. The GGsnp_update.pl PERL script

Remark : when you unzip and untar the downloaded files, you may find other
files (cultivar_* files, for example).

Concerning the script itself, you may have to modify its first line so that
it points to your PERL interpreter. In our case this line is :

#!/usr/local/public/bin/perl

IN YOUR CASE THIS PATH MAY BE DIFFERENT.

The purpose of the script is to build a single FASTA file with all the
contigs and singletons.
All characters other than A, T, G, C or N (IUB codes and *) are translated
into N.

Here are some examples of how the updated data files should be formatted :

singletons.fasta file
=====================

>U19CA653661
GGACCAGCTTGTTGAAATCATCAAGGTCCTTGGTACCCCTACAAGGGAAGAAATTAAATG
CATGAACCCAAATTATACAGAGTTCAAATTCCCACAGATTAAAGCACACCCATGGCACAA
GGTATTCCATAAAAGGATGCCACCTGAAGCTGTTGATCTAGTCTCTCGGCTCCTCCAGTA
CTCGCCCAACCTAAGATGCACTGCTGTTGAGGCACTTGTTCACCCCTTCTTTGATGAGCT
...........

contigs Directory
=================

The contigs directory should contain thousands of files named contig-###,
each file having a different number (###).
Each contig file should start with the contig consensus sequence, followed
by the EST sequences used to build the consensus sequence.
The contig files are not in FASTA format.
It is important that each contig consensus sequence is named >contig-###

example
=======

>contig-9997.1
AGGCACGCTCCCGGCGCTCGAGCTCTGCTACTCCATGACCCAGCGCCTCCACCTGGTGGACGACGACAAGGACAAGCCGCCGCTGCTGCAGCAGCAGCACGAAGGCGCTGCCGCCGCACCGGATGCCTCCGCGCCGTCGCCGCCCGTCACCAACTGGAAGATCTCCAGCCCCGGTGACAGCCCGGACGAGGTGAAGGCGCGGCTCAAGTACTGGGCGCAGGCGGTGGCGTGCACCGTTCGCCTCTGCAGCTGAACGACGGCGACCATCCACGCGGCACACCATTCGCCGCCGGCAAACACACCATACTATATATCATACATGCCGCCGCCGCAGCCCGCAAGAAACGATCGCCGACGAGGACAACGAAAGAGTAAACGACGGACCAAGAAGGGACATGCCGGACGACCAGACCTCACCTGATCGCGAACGCGCTGCTGCTGCTGCTGCCGTCGTAAAGTGATGGGGGGCGTTGGCAACTGCATGCCATGATTGATTGCAGCTGGAGAGTTTGGTGGAGCGTTGTTGCTTGTTGGTAGTCGTCGGAGCAGCACACATGGGCTGGGCAGGTGGGAGGCGGTTTGGTTGGTTGGTTGCCCGAGTTGCTGTGTGCTCTTGTTCCAGCTGAGCAACCTGCGTCTCCTTAGCAGCAGTGATCAAAATAATGTGCAGAGTAATTAGTAGGACTACTGCAAATCTGGGGATCGGAGTGATTGAATGAATGAATAAATAAATCCACAGATCTCATTATGTTTGTTGCTTTCTTAATC
>CSPCA487051
AGGCACGCTCCCGGCGCTCGAGCTCTGCTACTCCATGACCCAGCGCCTCCACCTGGTGGACGACGACAAGGACAAGCCGCCGCTGCTGCAGCAGCAGCACGAAGGCGCTGCCGCCGCACCGGATGCCTCCGCGCCGTCGCCGCCCGTCACCAACTGGAAGATCTCCAGCCCCGGTGACAGCCCGGACGAGGTGAAGGCGCGGCTCAAGTACTGGGCGCAGGCGGTGGCGTGCACCGTTCGCCTCTGCAGCTGAACGACGGCGACCATCCACGCGGCACACCATTCGCCGCCGGCAAACACACCATACTATATATCATACATGCCGCCGCCGCAGCCCGCAAGAAACGATCGCCGACGAGGACAACGAAAGAGTAAACGACGGACCAAGAAGGGACATGCCGGACGACCAGACCTCACCTGATCGCGAACGCGCTGCTGCTGCTGCTGCCGTCGTAAAGTGATGGGGGGCGTTGGCAACTGCATGCCATGATTGATTGCAGCTGGAGAGTTTGGTGGAGCGTTGTTGCTTGTTGGTAGTCGTCGGAGCAGCACACATGGGCTGGGCAGGTGGGAGGCGGTTTGGTTGG-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>CSPCA485268
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------GGTGACAGCCCGGACGAGGTGAAGGCGCGGCTCAAGTACTGGGCGCAGGCGGTGGCGTGCACCGTTCGCCTCTGCAGCTGAACGACGGCGACCATCCACGCGGCACACCATTCGCCGCCGGCAAACACACCATACTATATATCATACATGCCGCCGCCGCAGCCCGCAAGAAACGATCGCCGACGAGGACAACGAAAGAGTAAACGACGGACCAAGAAGGGACATGCCGGACGACCAGACCTCACCTGATCGCGAACGCG.........
etc.
...
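
For readers who want to check their data files before running the script, the processing described in this file (one FASTA file built from the singletons and from the consensus sequence at the top of each contig file, with every character other than A, T, G, C or N replaced by N) can be sketched as follows. This is a minimal illustration written in Python; it is NOT the source of GGsnp_update.pl. All function names (clean_sequence, read_records, format_record, build_unigene) are hypothetical, and several details are assumptions: lowercase bases are left untouched, output sequences are wrapped at 60 columns, and a contig file whose first record is not named contig-### is skipped.

```python
import re
from pathlib import Path

# Anything other than A, T, G, C or N (IUB ambiguity codes, '*', ...) becomes N.
# Assumption: lowercase a/t/g/c/n are accepted and left as they are.
NOT_ATGCN = re.compile(r"[^ATGCNatgcn]")


def clean_sequence(seq):
    """Replace every character that is not A, T, G, C or N by N."""
    return NOT_ATGCN.sub("N", seq)


def read_records(lines):
    """Yield (name, sequence) pairs from '>name' header lines followed by sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def format_record(name, seq, width=60):
    """Return one FASTA record, with the sequence wrapped at `width` columns."""
    body = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + name] + body) + "\n"


def build_unigene(singleton_file, contig_dir, out_file):
    """Write contig consensus sequences, then singletons, to out_file; return the record count."""
    count = 0
    with open(out_file, "w") as out:
        # The consensus is the first record of each contig-### file;
        # the aligned EST sequences that follow it are ignored.
        for path in sorted(Path(contig_dir).glob("contig-*")):
            with open(path) as handle:
                first = next(read_records(handle), None)
            if first is None or not first[0].startswith("contig-"):
                continue  # assumption: skip files whose consensus is not named >contig-###
            out.write(format_record(first[0], clean_sequence(first[1])))
            count += 1
        # Singletons are copied record by record after the same clean-up.
        with open(singleton_file) as handle:
            for name, seq in read_records(handle):
                out.write(format_record(name, clean_sequence(seq)))
                count += 1
    return count
```

With the layout described above, a call such as build_unigene("singletons.fasta", "contigs", "unigene.fasta") would write the combined file; the output file name here is only a placeholder.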