Glossary

There are many specialized terms and concepts used in genomics and bioinformatics. On this page, we have compiled a glossary of some of the scientific terminology used in the Genome Portal to describe data types, methods, and quality metrics.

To facilitate communication, promote the FAIR principles, and ensure scientific accuracy, the glossary uses terms and definitions from external biological ontologies and controlled vocabularies whenever possible. In particular, we try to use ontologies included in the Open Biological and Biomedical Ontology (OBO) Foundry, which is an open-source, non-redundant collection of ontologies.

On the rare occasion that a glossary term does not have a definition in any external ontology, we produce our own definition and include links to the references used to support it. If you have any questions or suggestions for new glossary terms, please contact us at dsn-eb@scilifelab.se or through the Contact page.

| Term (or synonym) | Ontology accession number | Definition | Reference / Ontology version |
|---|---|---|---|
| annotation track | (EDAM) data_3002 | Annotation of one particular positional feature on a biomolecular (typically genome) sequence, suitable for import and display in a genome browser. | EDAM 1.25 |
| assembly | SO:0001248 | A region of the genome of known length that is composed by ordering and aligning two or more different regions. | SO v2024-06-05 |
| assembly level - chromosome | FBcv:0003235 | There is sequence for one or more chromosomes. This could be a completely sequenced chromosome without gaps or a chromosome containing scaffolds or contigs with gaps between them. There may also be unplaced or unlocalized scaffolds. | FB2024_04 |
| assembly level - scaffold | FBcv:0003236 | Some sequence contigs have been connected across gaps to create scaffolds, but no scaffolds have been placed on chromosomes. | FB2024_04 |
| contig | SO:0000149 | A contiguous sequence derived from sequence assembly. Has no gaps, but may contain N’s from unavailable bases. | SO v2024-06-05 |
| contig N50 | OBI_0001941 | N50 statistic computed for the contigs produced by the assembly process. A contig N50 is calculated by first ordering every contig by length from longest to shortest. Next, starting from the longest contig, the lengths of each contig are summed, until this running sum equals one-half of the total length of all contigs in the assembly. The contig N50 of the assembly is the length of the shortest contig in this list. | OBI v2024-06-10 |
| contig L50 | GENEPIO_0001633 | Contig L50 is the number of contigs equal to or longer than contig N50. The length of the assembly itself is used in the calculation. | GenEpiO v0.8 |
| genome assembly | FBcv:0003086 | The result of reconstruction of the genome achieved by aligning and merging smaller fragments. | FB2024_04 |
| genome representation - full | FBcv:0003237 | The assembly represents the whole genome, though there may still be gaps. | FB2024_04 |
| genome representation - partial | FBcv:0003238 | The assembly represents only part of the organism’s genome, because only a single chromosome was targeted, coverage is <1, or total length is less than half the average for other assemblies of the same species. | FB2024_04 |
| isoform | SO:0001149 | Description of sequence variants produced by alternative splicing, alternative promoter usage, alternative initiation and ribosomal frameshifting. | SO v2024-06-05 |
| mitochondrion | GO:0005739 | A semiautonomous, self replicating organelle that occurs in varying numbers, shapes, and sizes in the cytoplasm of virtually all eukaryotic cells. It is notably the site of tissue respiration. | GO 2024-09-08 |
| mitochondrial chromosome | GO:0000262 | A chromosome found in the mitochondrion of a eukaryotic cell. | GO 2024-09-08 |
| ncRNA | SO:0000655 | An RNA transcript that does not encode for a protein rather the RNA molecule is the gene product. | SO v2024-06-05 |
| ncRNA_gene | SO:0001263 | A gene that encodes a non-coding RNA. | SO v2024-06-05 |
| protein_coding | SO:0000010 | A gene which, when transcribed, can be translated into a protein. | SO v2024-06-05 |
| pseudogene | SO:0000336 | A sequence that closely resembles a known functional gene, at another locus within a genome, that is non-functional as a consequence of (usually several) mutations that prevent either its transcription or translation (or both). In general, pseudogenes result from either reverse transcription of a transcript of their “normal” paralog (SO:0000043) (in which case the pseudogene typically lacks introns and includes a poly(A) tail) or from recombination (SO:0000044) (in which case the pseudogene is typically a tandem duplication of its “normal” paralog). | SO v2024-06-05 |
| reference_genome | SO:0001505 | A collection of sequences (often chromosomes) taken as the standard for a given organism and genome assembly. | SO v2024-06-05 |
| reference genome assembly | FBcv:0003087 | The result of extensive sequence alignment and merging to produce a genome assembly that serves as the current reference for a particular organism. | FB2024_04 |
| reference genome sequence | FBsv:0001013 | A whole genome sequence used as the standard sequence for a species. | FB2024_04 |
| reference sequence | GENO_0000017 | A sequence that serves as a standard against which other sequences at the same location are compared. | GENO v2023-10-08 |
| refNameAlias (reference name alias) | In-house definition | A reference name alias is a tab-delimited file that contains links between the FASTA headers in an assembly and all their known synonymous names. This allows tracks that use alternative identifiers for the same genomic sequence to be properly displayed in a genome browser. The first column of the refNameAlias contains the FASTA header of the genome assembly that will be loaded in JBrowse 2, and each subsequent column contains synonymous FASTA headers. | JBrowse 2 documentation |
| RNA-Seq | FBcv:0003068 | A quantitative method for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing. | FB2024_04 |
| scaffold | SO:0000148 | One or more contigs that have been ordered and oriented using end-read information. Contains gaps that are filled with N’s. | SO v2024-06-05 |
| scaffold N50 | OBI_0001945 | N50 statistic computed for the scaffold produced by the assembly process. The method for computing the value is similar to that of contig N50 but uses scaffold information instead of contig information. | OBI v2024-06-10 |
| tRNA | SO:0000253 | Transfer RNA (tRNA) molecules are approximately 80 nucleotides in length. Their secondary structure includes four short double-helical elements and three loops (D, anti-codon, and T loops). Further hydrogen bonds mediate the characteristic L-shaped molecular structure. Transfer RNAs have two regions of fundamental functional importance: the anti-codon, which is responsible for specific mRNA codon recognition, and the 3’ end, to which the tRNA’s corresponding amino acid is attached (by aminoacyl-tRNA synthetases). Transfer RNAs cope with the degeneracy of the genetic code in two manners: having more than one tRNA (with a specific anti-codon) for a particular amino acid; and ‘wobble’ base-pairing, i.e. permitting non-standard base-pairing at the 3rd anti-codon position. | SO v2024-06-05 |
| tRNA_gene | SO:0001272 | A noncoding RNA that binds to a specific amino acid to allow that amino acid to be used by the ribosome during translation of RNA. | SO v2024-06-05 |
| transcript | SO:0000673 | An RNA synthesized on a DNA or RNA template by an RNA polymerase. | SO v2024-06-05 |
| Transcriptome assembly | (EDAM) operation_3258 | Infer a transcriptome sequence by analysis of short sequence reads. | EDAM 1.25 |
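
As an illustration of the contig N50 and contig L50 definitions in the glossary, the following minimal Python sketch computes both values from a list of sequence lengths. The function name `n50_l50` and the example lengths are hypothetical and chosen only for this illustration. The sketch also assumes that the running sum is compared with "reaches or exceeds one-half of the total length", since the sum rarely lands exactly on one-half in real assemblies. The same function can be applied to scaffold lengths to obtain a scaffold N50.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")

    # Order every sequence by length, from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2

    # Sum lengths from the longest sequence until the running sum
    # reaches (or exceeds) one-half of the total assembly length.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running >= half_total:
            n50 = length  # shortest sequence in the summed list
            break

    # L50 as defined in the glossary: the number of sequences
    # equal to or longer than N50.
    l50 = sum(1 for length in ordered if length >= n50)
    return n50, l50


# Hypothetical contig lengths (in bp), for illustration only.
contig_lengths = [100, 80, 60, 40, 20]
n50, l50 = n50_l50(contig_lengths)
print(n50, l50)  # prints "80 2"
```

In this example the total length is 300, so the half-way point is 150. The running sum passes it at the second contig (100 + 80 = 180), which gives an N50 of 80 and an L50 of 2.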
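
To make the refNameAlias layout described in the glossary more concrete, here is a small Python sketch that reads such a tab-delimited file into a lookup from any synonym to the FASTA header in the first column. The function name `load_ref_name_aliases` and the sequence names in the example are hypothetical. This is not how JBrowse 2 itself processes the file, only an illustration of the column structure.

```python
import csv
import io


def load_ref_name_aliases(handle):
    """Map every name in a refNameAlias file to its first-column name."""
    alias_to_ref = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        # First column: FASTA header of the assembly loaded in the browser.
        # Remaining columns: synonymous names for the same sequence.
        ref_name, *aliases = row
        alias_to_ref[ref_name] = ref_name
        for alias in aliases:
            if alias:
                alias_to_ref[alias] = ref_name
    return alias_to_ref


# Hypothetical file content, for illustration only.
example = io.StringIO("chr1\tNC_000001.1\t1\nchr2\tNC_000002.1\t2\n")
aliases = load_ref_name_aliases(example)
print(aliases["NC_000002.1"])  # prints "chr2"
```

With a lookup like this, a track that refers to a sequence as `NC_000002.1` or `2` can be matched to the `chr2` sequence of the assembly.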