GrainGenes Home

 
The GrainGenes SQL Interface

Using the GrainGenes SQL Interface

from Dave Matthews, 14jul04

The SQL (Structured Query Language) interface at http://wheat.pw.usda.gov/cgi-bin/graingenes/sql.cgi provides direct access to the GrainGenes Database via SQL queries. This is the power query interface. Please use it.

Writing SQL queries isn't easy, I'll be the first to acknowledge that. So there's some help about that below. But the first thing is to address your possible concern about damaging our server.

No qualms!

We're fully aware that it's extremely unusual to provide unrestricted SQL query access to a large database from the Web. However we have some years of experience doing this in the old ACEDB version of GrainGenes, e.g. the AQL interface. The results show that yes of course there have been some problems. But, of course, nothing that came from abuse. The problems came from limitations inherent in the ACEDB database software. Now that we've moved to MySQL, most of the queries that were making trouble (minutes) happen in a few seconds.

The problems that did happen were nothing like "crashing the server", whatever that phrase might conjure up for you. We use Unix servers, which are not subject to anything like the "Blue Screen of Death" from mere software causes. The worst that can happen is a few minutes' unavailability of the server.

It's quite easy to write a pathological SQL query that sends the server into never-never land. I do it all the time. No problem. Hit the "Stop" button on your browser and the system will recover in a few minutes.

That said, if you still feel uneasy about it, or if you're having any problems after giving it a try, or for any reason at all, feel free to contact the GrainGenes Curators at and we'll develop your query for you.

Writing SQL queries

The best tutorial on SQL I've found is this one, at MySQL.com. Pay particular attention to the brief page on "Using More Than one Table." GrainGenes has a highly relational structure and nearly every query will involve one or more table joins. For help on SQL syntax, the "Search the MySQL manual:" box is useful.

The GrainGenes SQL interface provides several Premade Queries. Choose one and it will execute immediately and show the results. Then the query itself will be available for you to edit.
For example:


Many additional premade queries are available on the Quick Queries page, http://wheat.pw.usda.gov/GG2/quickquery.shtml. Again, clicking on a query executes it, and also provides the text of the SQL query itself in an edit box. For example here is the result of the Quick Query "Nearby Loci":

GrainGenes schema

The SQL interface page includes two links to the GrainGenes schema: Diagrams for a graphical view, and Table definitions for a text representation with some additional comments. Both are directories of files containing the tables for a particular data class: Allele, Germplasm, Sequence, etc.

Many of the tables are for relationships between two classes. The table relating Allele and Germplasm, for example, might either be in the Allele file and named allelegermplasm or in the Germplasm file and named germplasmallele. (In fact it's the former.)

All primary-key fields are named "id". All foreign-key fields are named for the table they point to followed by "id". For example in the allelegermplasm table, the field alleleid is a foreign key to the id field of the allele table.

GrainGenes is a product of the Agricultural Research Service of the US Department of Agriculture.