Glossary

There are many specialized terms and concepts used in genomics and bioinformatics. On this page, we have compiled a glossary of some of the scientific terminology used in the Genome Portal to describe data types, methods, and quality metrics.

To facilitate communication, promote the FAIR principles, and ensure scientific accuracy, the glossary uses terms and definitions from external biological ontologies and controlled vocabularies whenever possible. In particular, we try to use ontologies included in the Open Biological and Biomedical Ontology (OBO) Foundry, an open-source, non-redundant collection of ontologies.

On the rare occasion that a glossary term does not have a definition in any external ontology, we write our own definition and link to the references used to support it. If you have any questions or suggestions for new glossary terms, please contact us at dsn-eb@scilifelab.se or through the Contact page.

Term (or synonym) | Ontology accession number | Definition | Reference / Ontology version
annotation track (EDAM) | data_3002 | Annotation of one particular positional feature on a biomolecular (typically genome) sequence, suitable for import and display in a genome browser. | EDAM 1.25
assembly | SO:0001248 | A region of the genome of known length that is composed by ordering and aligning two or more different regions. | SO v2024-06-05
assembly level - chromosome | FBcv:0003235 | There is sequence for one or more chromosomes. This could be a completely sequenced chromosome without gaps or a chromosome containing scaffolds or contigs with gaps between them. There may also be unplaced or unlocalized scaffolds. | FB2024_04
assembly level - scaffold | FBcv:0003236 | Some sequence contigs have been connected across gaps to create scaffolds, but no scaffolds have been placed on chromosomes. | FB2024_04
contig | SO:0000149 | A contiguous sequence derived from sequence assembly. Has no gaps, but may contain N’s from unavailable bases. | SO v2024-06-05
contig N50 | OBI_0001941 | N50 statistic computed for the contigs produced by the assembly process. A contig N50 is calculated by first ordering every contig by length from longest to shortest. Next, starting from the longest contig, the lengths of each contig are summed, until this running sum equals one-half of the total length of all contigs in the assembly. The contig N50 of the assembly is the length of the shortest contig in this list. | OBI v2024-06-10
contig L50 | GENEPIO_0001633 | Contig L50 is the number of contigs equal to or longer than contig N50. The length of the assembly itself is used in the calculation. | GenEpiO v0.8
genome assembly | FBcv:0003086 | The result of reconstruction of the genome achieved by aligning and merging smaller fragments. | FB2024_04
genome representation - full | FBcv:0003237 | The assembly represents the whole genome, though there may still be gaps. | FB2024_04
genome representation - partial | FBcv:0003238 | The assembly represents only part of the organism’s genome, because only a single chromosome was targeted, coverage is <1, or total length is less than half the average for other assemblies of the same species. | FB2024_04
isoform | SO:0001149 | Description of sequence variants produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. | SO v2024-06-05
mitochondrion | GO:0005739 | A semiautonomous, self replicating organelle that occurs in varying numbers, shapes, and sizes in the cytoplasm of virtually all eukaryotic cells. It is notably the site of tissue respiration. | GO 2024-09-08
mitochondrial chromosome | GO:0000262 | A chromosome found in the mitochondrion of a eukaryotic cell. | GO 2024-09-08
ncRNA | SO:0000655 | An RNA transcript that does not encode for a protein rather the RNA molecule is the gene product. | SO v2024-06-05
ncRNA_gene | SO:0001263 | A gene that encodes a non-coding RNA. | SO v2024-06-05
protein_coding | SO:0000010 | A gene which, when transcribed, can be translated into a protein. | SO v2024-06-05
pseudogene | SO:0000336 | A sequence that closely resembles a known functional gene, at another locus within a genome, that is non-functional as a consequence of (usually several) mutations that prevent either its transcription or translation (or both). In general, pseudogenes result from either reverse transcription of a transcript of their “normal” paralog (SO:0000043) (in which case the pseudogene typically lacks introns and includes a poly(A) tail) or from recombination (SO:0000044) (in which case the pseudogene is typically a tandem duplication of its “normal” paralog). | SO v2024-06-05
reference_genome | SO:0001505 | A collection of sequences (often chromosomes) taken as the standard for a given organism and genome assembly. | SO v2024-06-05
reference genome assembly | FBcv:0003087 | The result of extensive sequence alignment and merging to produce a genome assembly that serves as the current reference for a particular organism. | FB2024_04
reference genome sequence | FBsv:0001013 | A whole genome sequence used as the standard sequence for a species. | FB2024_04
reference sequence | GENO_0000017 | A sequence that serves as a standard against which other sequences at the same location are compared. | GENO v2023-10-08
refNameAlias (reference name alias) | In-house definition | A reference name alias is a tab-delimited file that links the FASTA headers in an assembly to all of their known synonymous names. This allows tracks that use alternative identifiers for the same genomic sequence to be displayed correctly in a genome browser. The first column of the refNameAlias file contains the FASTA header as it appears in the genome assembly loaded in JBrowse 2, and each subsequent column contains a synonymous name for that sequence. | JBrowse 2 documentation
RNA-Seq | FBcv:0003068 | A quantitative method for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing. | FB2024_04
scaffold | SO:0000148 | One or more contigs that have been ordered and oriented using end-read information. Contains gaps that are filled with N’s. | SO v2024-06-05
scaffold N50 | OBI_0001945 | N50 statistic computed for the scaffold produced by the assembly process. The method for computing the value is similar to that of contig N50 but uses scaffold information instead of contig information. | OBI v2024-06-10
tRNA | SO:0000253 | Transfer RNA (tRNA) molecules are approximately 80 nucleotides in length. Their secondary structure includes four short double-helical elements and three loops (D, anti-codon, and T loops). Further hydrogen bonds mediate the characteristic L-shaped molecular structure. Transfer RNAs have two regions of fundamental functional importance: the anti-codon, which is responsible for specific mRNA codon recognition, and the 3’ end, to which the tRNA’s corresponding amino acid is attached (by aminoacyl-tRNA synthetases). Transfer RNAs cope with the degeneracy of the genetic code in two manners: having more than one tRNA (with a specific anti-codon) for a particular amino acid; and ‘wobble’ base-pairing, i.e. permitting non-standard base-pairing at the 3rd anti-codon position. | SO v2024-06-05
tRNA_gene | SO:0001272 | A noncoding RNA that binds to a specific amino acid to allow that amino acid to be used by the ribosome during translation of RNA. | SO v2024-06-05
transcript | SO:0000673 | An RNA synthesized on a DNA or RNA template by an RNA polymerase. | SO v2024-06-05
transcriptome assembly (EDAM) | operation_3258 | Infer a transcriptome sequence by analysis of short sequence reads. | EDAM 1.25
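To make the N50 and L50 entries and the contig/scaffold relationship in the glossary more concrete, here is a minimal Python sketch. It is an illustration, not the Genome Portal's own implementation. It sorts lengths from longest to shortest and stops once the running sum reaches at least half of the total length ("equals one-half" in the ontology wording, since sums rarely hit the exact midpoint). It reports L50 as the number of sequences needed to get there, which is the common convention but can differ from a literal count of "contigs equal to or longer than N50" when several contigs share the N50 length. The helper `split_scaffold` and its `min_gap=10` threshold are hypothetical: the sketch assumes that runs of at least that many N's mark gaps between contigs, while shorter runs are unavailable bases inside a contig.

```python
import re
from typing import Iterable


def n50_l50(lengths: Iterable[int]) -> tuple[int, int]:
    """Return (N50, L50) for a collection of sequence lengths.

    Lengths are ordered longest to shortest and summed until the running total
    reaches at least half of the total length. N50 is the length of the last
    sequence added; L50 is how many sequences were summed.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("cannot compute N50 of an empty assembly")
    half = sum(ordered) / 2  # total assembly length is the reference
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise RuntimeError("unreachable for non-negative lengths")


def split_scaffold(scaffold: str, min_gap: int = 10) -> list[str]:
    """Hypothetical helper: split a scaffold into contigs at runs of N.

    Assumption: a run of at least `min_gap` N/n characters is a gap; shorter
    runs are kept inside the contig as unavailable bases.
    """
    gap = re.compile(rf"[Nn]{{{min_gap},}}")  # e.g. "[Nn]{10,}"
    return [contig for contig in gap.split(scaffold) if contig]


# Toy assembly (made-up sequences) with three scaffolds.
scaffolds = [
    "ACGT" * 5 + "N" * 10 + "ACGT" * 3,  # 20 bp + 10-N gap + 12 bp
    "ACGTACGTAC",                        # 10 bp, no gap
    "GGGGGGGG",                          # 8 bp, no gap
]
scaffold_lengths = [len(s) for s in scaffolds]
contig_lengths = [len(c) for s in scaffolds for c in split_scaffold(s)]
print("scaffold N50, L50:", n50_l50(scaffold_lengths))  # (42, 1)
print("contig N50, L50:", n50_l50(contig_lengths))      # (12, 2)
```

Because scaffolding joins contigs across gaps, the scaffold N50 of an assembly is typically larger than its contig N50, as in this toy example. Sorting dominates the cost, so the calculation scales as O(n log n) in the number of sequences.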
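The refNameAlias entry describes a file layout rather than an ontology concept, so a short example may help. The following Python sketch reads a file laid out as the glossary describes (the FASTA header in the first column, synonyms in the remaining tab-separated columns) and translates a name used by a track into the assembly's FASTA header. JBrowse 2 resolves aliases internally, so this only shows how the file is interpreted. The function names, the choice to skip blank lines and lines starting with `#`, and the sequence names in the example are all hypothetical.

```python
import io
from typing import TextIO


def read_ref_name_aliases(handle: TextIO) -> dict[str, str]:
    """Map every known name of a sequence to its FASTA header in the assembly.

    Assumed layout: tab-delimited; column 1 is the FASTA header of the assembly
    loaded in the browser, later columns are synonyms. Blank lines and lines
    starting with '#' are skipped (an assumption made for this sketch).
    """
    alias_to_header: dict[str, str] = {}
    for raw_line in handle:
        line = raw_line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        header, *synonyms = line.split("\t")
        alias_to_header[header] = header  # the FASTA header maps to itself
        for synonym in synonyms:
            if synonym:  # tolerate empty trailing columns
                alias_to_header[synonym] = header
    return alias_to_header


def to_assembly_name(ref_name: str, aliases: dict[str, str]) -> str:
    """Translate a track's sequence name into the assembly's FASTA header."""
    if ref_name not in aliases:
        raise KeyError(f"{ref_name!r} is not a known name for any sequence")
    return aliases[ref_name]


# Hypothetical refNameAlias content: FASTA header, then synonyms.
alias_file = io.StringIO(
    "# hypothetical refNameAlias file\n"
    "Scaffold_1\tchr1\t1\n"
    "Scaffold_MT\tchrM\tMT\n"
)
aliases = read_ref_name_aliases(alias_file)
print(to_assembly_name("chrM", aliases))  # Scaffold_MT
print(to_assembly_name("1", aliases))     # Scaffold_1
```

Mapping the FASTA header to itself means that tracks that already use the assembly's own names resolve through the same lookup as tracks that use synonyms.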