RenalDB: A Knowledge Database for Kidney RNA Expression
Institute of Cardiovascular Regeneration, Goethe University Frankfurt

Overview

RenalDB is a tool designed to assist researchers with the in silico screening of enriched or specific transcripts of humans, mice, and zebrafish with respect to renal tissues and cells, developmental stages, and other metadata. Furthermore, RenalDB serves as a test case for future databases focused on multi-attribute representations of RNA expression. The objective of RenalDB is simple: to make hypothesis-driven research easier. We built RenalDB from the standpoint of researchers to answer very fundamental questions in nephrology, such as, “Are there any RNAs (transcripts) that are expressed specifically in patients suffering from kidney diseases but not in healthy people?”

Search

While all types of RNA transcripts are included in RenalDB, the primary motivation for its creation is the study of long non-coding RNAs (lncRNAs). A major feature of lncRNAs is the high specificity of their expression patterns to particular cells and/or tissues. To highlight this feature, RenalDB provides advanced search capabilities. The search bar allows users to combine a variety of tags with boolean operators to create arbitrarily complex searches. See the tables under "Operators" for a list of supported search tags and boolean operators.
Search View
Clicking the [LOCI] button displayed at the top of the window takes users to the loci view. This view contains a search form followed by a list of relevant "loci" (i.e. genes, transcripts, etc.). If no query is entered, the list contains all loci in RenalDB. When the [GO] button is pressed, the search is executed and the filtered list of loci is returned. Detailed information about building queries can be found by clicking the button with a question mark, or in the tables under "Operators" on this page. The number of rows displayed can be changed by selecting a value from the "Rows Per Page" drop-down list. The page of results can be changed by modifying the number next to "Page" or by clicking the "Previous" or "Next" buttons.
Universal Genomic Accessions
Users may notice the unconventional-looking accession numbers used in RenalDB. These are Universal Genomic Accessions (UGAs), a hash-based accession system that allows accessions to be generated in a decentralized way. For details about UGAs, please refer to the UGAHash web interface (http://ugahash.uni-frankfurt.de) and the following publication:

Weirick, T., John, D., Uchida, S. (2016). Resolving the problem of multiple accessions of the same transcript deposited across various public databases. Briefings in Bioinformatics, bbv067. PMID: 26921280

UGAs have a number of advantages over traditional accession systems. In RenalDB, the main advantage is easy retrieval of relations to other genomic databases (e.g. ENSEMBL, UCSC Genome Browser). These relations are retrieved by sending cross-origin requests (CORS, cross-origin resource sharing) to the UGAHash server, so they are updated automatically whenever the UGAHash server is updated. In the search bar, users can also enter accessions from other databases (e.g. ENSEMBL, NCBI, NONCODE).

Operators
The following tables list the operators and search tags that can be used to filter searches.

Boolean Operators

Expression: and | AND
Example: GO:0001822 AND TAXID:9606
Description: Returns accessions present in both query sets.

Expression: or | OR
Example: BIOTYPE:lincRNA OR BIOTYPE:antisense
Description: Returns accessions present in either of the two query sets.

Expression: not | NOT, and not | AND NOT
Example: NOT TAXID:9606
Example: EXPRESSED:Kidney AND NOT EXPRESSED:"Kidney Cortex"
Description: Returns accessions not in the query.

Expression: ()
Example: BIOTYPE:lincRNA AND (ENRICHED:Heart OR ENRICHED:Kidney)
Description: Increases the precedence of a query.
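
The descriptions above suggest that each query term resolves to a set of accessions and that the boolean operators combine those sets. The following is a minimal sketch of that reading using Python sets. The accession names and the contents of each set are made up for illustration, and this is not RenalDB's actual query engine.

```python
# Hypothetical result sets; the accession names are placeholders, not real UGAs.
all_loci = {"locus_a", "locus_b", "locus_c", "locus_d"}
lincrna = {"locus_a", "locus_b"}           # stands in for BIOTYPE:lincRNA
enriched_heart = {"locus_b", "locus_c"}    # stands in for ENRICHED:Heart
enriched_kidney = {"locus_a", "locus_d"}   # stands in for ENRICHED:Kidney

# AND keeps accessions present in both sets (intersection).
both = lincrna & enriched_heart

# OR keeps accessions present in either set (union).
either = enriched_heart | enriched_kidney

# NOT is taken relative to everything in the database (complement).
not_lincrna = all_loci - lincrna

# AND NOT removes the second set from the first (difference).
lincrna_not_heart = lincrna - enriched_heart

# Parentheses group the OR so that it is evaluated before the AND.
grouped = lincrna & (enriched_heart | enriched_kidney)
```

Under this reading, a query such as BIOTYPE:lincRNA AND (ENRICHED:Heart OR ENRICHED:Kidney) first merges the two tissue sets and then intersects the result with the lincRNA set.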
Important! If there is whitespace in your query, the query must be enclosed in quotes.

Expression: No tag
Example: ENST00000358073
Example: NONHSAT008273
Example: PAXIP1-AS1
Description: Returns matched gene/transcript names, UGAs, or external accessions. Note: Ensembl annotations were used for alignment in RenalDB. Therefore, all Ensembl 83 IDs should return values. Other accession systems may return values if they map to a UGA that also maps to an Ensembl UGA.

Expression: SPECIFIC:(tissue)
Example: SPECIFIC:"Kidney Cortex"
Description: Returns sequences detected in only one type of tissue. See [Help] for a list of possible values.

Expression: ENRICHED:(tissue)
Example: ENRICHED:Heart
Description: Returns sequences which have increased expression in the given tissue. See [Help] for a list of possible values.

Expression: EXPRESSED:(tissue)
Example: EXPRESSED:Heart
Description: Returns sequences expressed in the given tissue type. See [Help] for a list of possible values.

Expression: METADATA:(metadata)
Example: METADATA:"Weeks 3"
Description: Returns sequences from samples with the given metadata. See [Help] for a list of possible values.

Expression: POS:(tax_id):(chromosome):(start)-(end)
Example: POS:9606:1:914880-969309
Description: Returns sequences falling within the given position.

Expression: BIOTYPE:(biotype)
Example: BIOTYPE:lincRNA
Description: Returns sequences which contain a transcript annotated as the given biotype. See [Help] for a list of possible values.

Expression: SEQTYPE:(gene|transcript)
Example: SEQTYPE:transcript
Description: Returns the given type of sequence.

Expression: TAXID:(7955|9606|10090)
Example: TAXID:7955
Description: Returns sequences associated with the given taxonomy. Taxonomy IDs (TAXID) are: 7955 (zebrafish), 9606 (human), and 10090 (mouse).

Expression: GO:(GO term accession)
Example: GO:0005515
Description: Returns sequences which contain an annotation of the given Gene Ontology (GO) term.
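
When queries are assembled programmatically before being pasted into the search bar, the quoting rule for whitespace is easy to forget. The following is a minimal sketch of a query-string builder. The helper names `term` and `pos` are hypothetical and not part of RenalDB; only the tag syntax is taken from the table above.

```python
def term(tag, value):
    """Build a TAG:value search term, quoting the value if it contains whitespace."""
    value = str(value)
    if any(ch.isspace() for ch in value):
        value = f'"{value}"'
    return f"{tag}:{value}"


def pos(tax_id, chromosome, start, end):
    """Build a POS:(tax_id):(chromosome):(start)-(end) search term."""
    return f"POS:{tax_id}:{chromosome}:{start}-{end}"


# Combine terms with a boolean operator and parentheses.
query = (
    f"{term('BIOTYPE', 'lincRNA')} AND "
    f"({term('ENRICHED', 'Heart')} OR {term('ENRICHED', 'Kidney')})"
)
print(query)
print(term("SPECIFIC", "Kidney Cortex"))
print(pos(9606, 1, 914880, 969309))
```

The resulting strings are intended to be entered into the search bar as they are; the sketch does not contact RenalDB.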
Searchable Values
A number of the search tags will only return useful results if given specific values. The lists below contain the values that can be used in queries.

Sources
Ureteric Tip
Ureteric Stalk
Ureter
Testis
Renal Vesicle
Proximal Tubule
Podocytes
Placenta
Ovary
Nephron Progenitor Cells
Muscle
Metanephros
Mesangial Cells
Lung
Liver
Kidney Tubules
Kidney Cortex
Kidney
Hypothalamus
HK-2 Cells
Hippocampus
HEK-293T Cells
HEK-293F Cells
HEK-293 Cells
Heart
Head Kidney
Epithelial Cells
Endothelial Cells
Cortical Collecting Duct
Colon
Cerebellum
Cap Mesenchyme
Breast
Brainstem
Brain
Bladder
Aorta
Adrenal
Adipose
Biotypes
3prime_overlapping_ncrna
antisense
bidirectional_promoter_lncrna
IG_C_gene
IG_C_pseudogene
IG_D_gene
IG_D_pseudogene
IG_J_gene
IG_J_pseudogene
IG_LV_gene
IG_pseudogene
IG_V_gene
IG_V_pseudogene
lincRNA
macro_lncRNA
miRNA
misc_RNA
Mt_rRNA
Mt_tRNA
nonsense_mediated_decay
non_coding
non_stop_decay
polymorphic_pseudogene
processed_pseudogene
processed_transcript
protein_coding
pseudogene
retained_intron
ribozyme
rRNA
scaRNA
sense_intronic
sense_overlapping
snoRNA
snRNA
sRNA
TEC
transcribed_processed_pseudogene
transcribed_unitary_pseudogene
transcribed_unprocessed_pseudogene
translated_processed_pseudogene
translated_unprocessed_pseudogene
TR_C_gene
TR_D_gene
TR_J_gene
TR_J_pseudogene
TR_V_gene
TR_V_pseudogene
unitary_pseudogene
unprocessed_pseudogene
vaultRNA
Metadata
age Adult
age E10.5
age E11.5
age E12.5
age E13.5
age E14
age E14.5
age E15.5
age E16
age E16.5
age E18
age P1
age P4
age Weeks 14
age Weeks 15
age Weeks 16
age Weeks 17
age Weeks 24
age Weeks 3
age Weeks 42
age Weeks 5.5
age Weeks 6
age Weeks 8
age Weeks 9
age Years 19
age Years 22
age Years 29
age Years 37
age Years 45
age Years 47
age Years 51
age Years 60
age Years 65
age Years 67.5
age Years 68
age Years 73
age Years 77
sex Female
sex Male
strain 129X1/SvJ
strain African
strain BALB/c
strain C57/BL6
strain C57BL/6
strain C57BL/6J
strain C57BL6/6
strain Caucasian
strain CD-1
strain DBA/2J
strain DBAxC57BL/6J
strain Singapore
strain Six2-TGC

Venn Diagrams

Clicking the [VENN] button takes users to the Venn diagram view. This view allows users to combine several searches, each of which works like a search in the loci view.


Logic Programming

RenalDB is unique in that it uses logic programming when determining expression specificity. This allows samples at different levels of the anatomical hierarchy to be compared. For example, whole organs, organ sub-tissues, and even component cell types can all be compared at once. The knowledge bases used in RenalDB can be found below, implemented in pyDatalog.
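
To illustrate why a hierarchy matters for specificity, the following is a minimal sketch in plain Python rather than pyDatalog. The part-of relations, the function names, and the specificity rule itself are assumptions made for illustration; they are not RenalDB's actual knowledge base or rules. The idea is that expression detected in a whole organ does not contradict specificity for one of its parts, whereas expression in a sibling or unrelated tissue does.

```python
# Illustrative part-of relations (assumed, not RenalDB's knowledge base).
PART_OF = {
    "Kidney Cortex": "Kidney",
    "Kidney Tubules": "Kidney",
    "Podocytes": "Kidney",
}


def lineage(tissue):
    """Return the tissue followed by every structure that contains it."""
    chain = [tissue]
    while chain[-1] in PART_OF:
        chain.append(PART_OF[chain[-1]])
    return chain


def comparable(a, b):
    """True if one tissue is the same as, or contained in, the other."""
    return a in lineage(b) or b in lineage(a)


def is_specific(target, expressed_in):
    """Assumed rule: at least one expressing sample lies within the target,
    and every expressing sample is on the target's own lineage."""
    expressed_in = set(expressed_in)
    hit = any(target in lineage(t) for t in expressed_in)
    return hit and all(comparable(target, t) for t in expressed_in)


# Whole-kidney expression does not rule out cortex specificity.
print(is_specific("Kidney Cortex", {"Kidney", "Kidney Cortex"}))
# Expression in a sibling cell type does.
print(is_specific("Kidney Cortex", {"Kidney Cortex", "Podocytes"}))
```

In a logic programming language such as Datalog, the same relations would typically be written as facts and recursive rules rather than as explicit loops.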


Record Pages

Record pages can be accessed from the loci view. Record pages contain detailed information about a sequence contained within RenalDB. Record pages include annotation data, genomic coordinates, expression data, homologs, and GO terms depending on the sequence type.

Visualization of Expression Data

There are two ways to view expression data: a graphical view and a table view. Users can toggle between the two views using the provided buttons.

The graphical view features a tree showing the organization of the expressed samples and a heatmap showing the average magnitude of expression. In the heatmap, darker values indicate stronger expression and lighter values indicate weaker expression. The tint of the heatmap is determined using the equation y = 100.0 - (log(x)+12)*((3*x)/(x+1)+1), where x is the expression value and y is the lightness value in the HSLA color.
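
The lightness equation can be evaluated directly, as in the following minimal sketch. Several details are assumptions not stated in the text: that log is the natural logarithm, that the result is clamped to the 0 to 100 range, and the hue, saturation, and alpha values, which are placeholder parameters. The function names are hypothetical.

```python
import math


def lightness(x):
    """Lightness y (in percent) for an expression value x > 0.

    Assumption: log denotes the natural logarithm.
    """
    return 100.0 - (math.log(x) + 12) * ((3 * x) / (x + 1) + 1)


def hsla(x, hue=0, saturation=100, alpha=1.0):
    """Format an HSLA color string; hue, saturation, and alpha are placeholders."""
    # Assumption: lightness is clamped so the color stays valid for extreme x.
    y = min(100.0, max(0.0, lightness(x)))
    return f"hsla({hue}, {saturation}%, {y:.1f}%, {alpha})"


print(lightness(1.0))   # log(1) = 0, so y = 100 - 12 * 2.5
print(hsla(1.0))
```

Note that the equation is undefined for x = 0, so zero expression would need to be handled separately.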

Expression data is represented as between-sample normalized counts, divided by effective length, and finally multiplied by 10^3 to provide an FPKM-like value. Note: FPKM could be obtained by dividing by the number of reads per sample and then multiplying by 10^6. We excluded this step in RenalDB, as between-sample normalization was already performed.
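
The calculation described above can be sketched as follows. The function and parameter names are hypothetical, and the input numbers are made up for illustration.

```python
def fpkm_like(normalized_count, effective_length):
    """Between-sample normalized count per effective length, scaled by 10^3."""
    return normalized_count * 1e3 / effective_length


def fpkm(normalized_count, effective_length, reads_in_sample):
    """The additional per-sample scaling step that RenalDB omits."""
    return fpkm_like(normalized_count, effective_length) / reads_in_sample * 1e6


# Made-up example: 200 normalized counts on a transcript of effective length 1000.
print(fpkm_like(200, 1000))
print(fpkm(200, 1000, 2_000_000))
```

Keeping the two steps in separate functions makes it clear which part of the conventional FPKM calculation is skipped.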


Downloads

In the interest of open and repeatable research, the programs used to create RenalDB as well as the data contained in RenalDB are freely available under the MIT license.












