Pig RNA sequencing

The pig tissue samples were collected and analyzed in collaboration with BGI. Pig tissue used for mRNA analysis was collected and handled in accordance with national guidance for large experimental animals, with permission from the local ethics committee (ethical permission numbers No.44410500000078 and BGI-IRB18135), and in line with European directives and regulations.

Animal details

The experimental minipigs (Chinese Bama minipig) were provided by Peral Lab Animal Sci & Tech Co., Ltd (permit number SYXK2017-0123). Male (n = 2) and female (n = 2) Chinese Bama minipigs (1 year old) were housed in a specific-pathogen-free stable facility under standard conditions. Body weights were 36 kg for female 1 (pig 1), 41.4 kg for female 2 (pig 2), 37.5 kg for male 1 (pig 3) and 30.2 kg for male 2 (pig 4).

Sampling strategy

The four animals were sampled by the same team using a consistent strategy. Brain samples were taken from one hemisphere, while the other hemisphere was fixed in formalin for later protein analysis. For each animal, one eye was dissected into three tissue types: lens, cornea and retina, with the retina removed with as little of the pigment layer as possible. The other eye was fixed whole for paraffin embedding and staining protocols. Peripheral tissue samples were divided into two adjacent pieces: one was frozen for RNA extraction and the other was submerged in formalin for fixation, enabling morphological verification and quality assessment.

Normalization of transcriptomics data

The transcriptomics data were normalized in a similar manner to the Human Protein Atlas. In brief, transcripts per million (TPM) values were calculated for each sample (n = 350) for all protein-coding genes, referred to as pTPM. Samples of the same tissue type (n = 98) were then aggregated using the average pTPM per gene; the resulting values were corrected sample-wise using trimmed mean of M values (TMM) and then Pareto scaled gene-wise, yielding an expression score referred to as NX. Expression values for grouped tissues were calculated as the maximum expression across their sub-tissues. Both TMM-corrected and NX values were used in downstream analyses, as specified in each section below.
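The first steps of this pipeline can be illustrated with a minimal Python sketch, assuming read counts and effective transcript lengths are already available per sample and that the supplied gene set is the protein-coding set. It computes pTPM, averages it per tissue type, and takes the maximum over sub-tissues for a grouped tissue. The function names and input layout are hypothetical, the TMM correction and Pareto scaling steps are not shown, and the grouped-tissue maximum is demonstrated on averaged pTPM for brevity rather than on the final expression values.

```python
from collections import defaultdict


def ptpm(counts, lengths):
    """Transcripts per million over protein-coding genes only (pTPM).

    counts:  gene -> mapped read count for one sample
    lengths: gene -> effective transcript length (bp); only genes listed here
             (assumed to be the protein-coding set) enter the calculation.
    """
    # Length-normalize counts, then rescale so the values sum to one million.
    rates = {gene: counts.get(gene, 0) / length for gene, length in lengths.items()}
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def average_by_tissue(sample_values, sample_tissue):
    """Average pTPM per gene across all samples of the same tissue type."""
    sums = defaultdict(lambda: defaultdict(float))
    n_samples = defaultdict(int)
    for sample, values in sample_values.items():
        tissue = sample_tissue[sample]
        n_samples[tissue] += 1
        for gene, value in values.items():
            sums[tissue][gene] += value
    return {
        tissue: {gene: s / n_samples[tissue] for gene, s in genes.items()}
        for tissue, genes in sums.items()
    }


def grouped_tissue_max(tissue_values, groups):
    """Value of a grouped tissue = per-gene maximum over its sub-tissues."""
    result = {}
    for group, subtissues in groups.items():
        genes = tissue_values[subtissues[0]].keys()
        result[group] = {
            gene: max(tissue_values[t][gene] for t in subtissues) for gene in genes
        }
    return result
```

For instance, `ptpm({"A": 100, "B": 300}, {"A": 1000, "B": 1000})` gives 250,000 for A and 750,000 for B, up to floating-point rounding.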

Classification of transcriptomics data

Each gene was individually classified in terms of specificity and distribution based on relative NX expression values across 44 different tissue types. The specificity categories were defined as follows. Tissue enriched: a single tissue has at least 4-fold higher NX than any other tissue. Group enriched: 2-5 tissues have NX larger than a fourth of the maximum NX, and their average NX is at least 4-fold higher than that of any other tissue. Tissue enhanced: the gene is neither tissue enriched nor group enriched, and one or more tissues have NX at least 4-fold higher than the average NX. Low tissue specificity: the gene is neither tissue enriched, group enriched nor tissue enhanced, and is detected above the cutoff (NX ≥ 1) in at least one tissue. The pig gene expression distribution categories were defined as follows. Detected in all: NX ≥ 1 in all tissues. Detected in many: NX ≥ 1 in at least 31% (n = 14) of tissues, but not in all. Detected in some: NX ≥ 1 in more than one tissue but in fewer than 31% (n = 14) of tissues. Detected in single: NX ≥ 1 in a single tissue. A gene was classified as Not detected if no tissue had NX ≥ 1.
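The classification rules above can be expressed as a short Python sketch, assuming one dictionary of NX values per gene (tissue name to NX), a detection cutoff of NX ≥ 1, and that "the average NX" in the Tissue enhanced rule means the mean over all tissues. For Group enriched, the sketch only tests the top 2-5 tissues ranked by NX, which is an implementation choice rather than something stated in the text. Function names are hypothetical.

```python
import math


def specificity_category(nx, cutoff=1.0):
    """Classify one gene's tissue specificity from its NX values (tissue -> NX)."""
    values = sorted(nx.values(), reverse=True)
    top = values[0]
    if top < cutoff:
        return "Not detected"
    # Tissue enriched: the top tissue has at least 4-fold higher NX than any other.
    second = values[1] if len(values) > 1 else 0.0
    if top >= 4 * second:
        return "Tissue enriched"
    # Group enriched: the top k tissues (k = 2..5) all exceed a fourth of the
    # maximum NX, and their mean is at least 4-fold above the best remaining tissue.
    for k in range(2, min(5, len(values) - 1) + 1):
        group = values[:k]
        if min(group) > top / 4 and sum(group) / k >= 4 * values[k]:
            return "Group enriched"
    # Tissue enhanced: some tissue is at least 4-fold above the mean NX of all tissues.
    if top >= 4 * (sum(values) / len(values)):
        return "Tissue enhanced"
    return "Low tissue specificity"


def distribution_category(nx, cutoff=1.0, many_fraction=0.31):
    """Classify in how many tissues a gene is detected (NX >= cutoff)."""
    n_tissues = len(nx)
    n_detected = sum(1 for v in nx.values() if v >= cutoff)
    many_threshold = math.ceil(many_fraction * n_tissues)  # 14 for 44 tissues
    if n_detected == 0:
        return "Not detected"
    if n_detected == n_tissues:
        return "Detected in all"
    if n_detected == 1:
        return "Detected in single"
    if n_detected >= many_threshold:
        return "Detected in many"
    return "Detected in some"
```

For example, a gene with NX values 10, 9, 1 and 1 in four tissues is classified as Group enriched by this sketch. The single-tissue check comes before the 31% threshold so that very small tissue panels behave sensibly; with 44 tissues the order makes no difference.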

Within-tissue variation classification

Genes were categorized as either variable or not variable within each of the grouped tissues (adipose tissue, brain, cartilage, heart, kidney, large intestine, lung, lymphoid tissue, male glands, mesothelial tissue, mouth, skin, small intestine, stomach, upper respiratory system) using a linear model in the limma R package (v 3.44.3). First, genes were filtered based on their TMM-corrected expression, removing every gene that met any of the following criteria: 1) zero variance in expression; 2) maximum expression < 1; 3) total range of expression less than 4-fold. For each grouped tissue, a linear model was fitted using lmFit with the design ~ 0 + tissue type, and contrasts were defined with makeContrasts between each unique pair of tissue types within the grouped tissue (e.g. for adipose tissue: abdominal – orbital, abdominal – subcutaneous, orbital – subcutaneous). The linear fit was then moderated using limma's empirical Bayes function eBayes. P-values were extracted and Benjamini-Hochberg (BH) corrected, and a gene was considered differentially expressed if the adjusted p-value was below 0.05 and the contrast ratio for that gene was at least 4-fold. As the grouped tissue "brain" contained a considerably higher number of contrasts, an additional criterion was added: at least one sub-tissue had to have a log expression deviating more than two standard deviations from the mean for the gene to be classified as variable.
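The model itself is fitted in R with limma, but the surrounding filtering and decision rules can be sketched in Python. The sketch below assumes the pre-filter operates on a gene's TMM-corrected values across samples, takes limma's moderated p-values and log2 fold changes as given inputs, implements the standard Benjamini-Hochberg adjustment, and, for the extra brain criterion, assumes log2(x + 1) expression and the sample standard deviation across sub-tissues. All function names are hypothetical.

```python
import math
import statistics
from itertools import combinations


def passes_prefilter(expr):
    """Keep a gene only if its TMM-corrected values vary, reach at least 1,
    and span at least a 4-fold range (max >= 4 * min)."""
    if statistics.pvariance(expr) == 0:
        return False
    if max(expr) < 1:
        return False
    return max(expr) >= 4 * min(expr)


def pairwise_contrasts(tissue_types):
    """All unique pairs of tissue types within one grouped tissue."""
    return [f"{a} - {b}" for a, b in combinations(tissue_types, 2)]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_variable(adj_pvalues, log2_fold_changes, alpha=0.05, min_fold=4.0):
    """Variable if any contrast is significant and at least min_fold apart."""
    min_log2 = math.log2(min_fold)
    return any(p < alpha and abs(lfc) >= min_log2
               for p, lfc in zip(adj_pvalues, log2_fold_changes))


def has_outlier_subtissue(subtissue_expr, n_sd=2.0):
    """Extra brain criterion: some sub-tissue's log2(x + 1) expression lies
    more than n_sd sample standard deviations from the mean."""
    logs = [math.log2(x + 1) for x in subtissue_expr]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return sd > 0 and any(abs(v - mean) > n_sd * sd for v in logs)
```

For the adipose example in the text, `pairwise_contrasts(["abdominal", "orbital", "subcutaneous"])` yields the same three pairwise contrasts.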

UMAP gene clustering

Genes were clustered based on their expression across all samples in order to stratify them into groups with related expression patterns and functions, so that global transcriptomic structures can easily be navigated. In doing so, manual decisions were made in the clustering to keep the number of clusters reasonably low (n = 84) and their average size neither too large nor too small. The resulting clustering favors accessibility and visualization rather than optimizing a particular metric.

To functionally annotate the gene clusters from the UMAP analysis, gene ontology (GO) analysis was carried out using the enrichR R package (v 2.1). For each cluster, pig genes were mapped to human gene names using the established orthologues and analyzed for enrichment in the GO 2018 databases: Biological Process, Cellular Component and Molecular Function. These results, together with manual investigation of the genes, their tissue specificity categories and general expression levels, were used to manually annotate each cluster in terms of specificity and function where possible.

Human orthologue translation

Human orthologues were used for cross-species comparison of gene specificity classification and tissue-wide expression based on transcriptomics data. The analyses were based on pig genes having a human orthologue in Ensembl release 92; the orthologues included were the one2one orthologues (n = 15,483) and the one2many orthologues having a single high-confidence pair (n = 756). Many2many orthologues, and one2many orthologues with only low-confidence pairs or with several high-confidence pairs, were excluded since the analyses rely on gene-to-gene comparisons.
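A minimal Python sketch of the orthologue selection rule, assuming pairs exported from Ensembl as (pig gene, human gene, homology type, high-confidence flag) tuples with homology types named ortholog_one2one, ortholog_one2many and ortholog_many2many. The record layout, the function name and the way one2many pairs are grouped (by the gene shared by several pairs) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def select_orthologue_pairs(records):
    """Keep all one2one pairs plus one2many groups containing exactly one
    high-confidence pair; drop many2many and all other one2many pairs.

    records: iterable of (pig_gene, human_gene, homology_type, high_confidence)
    """
    records = list(records)
    kept = [(p, h) for p, h, kind, _ in records if kind == "ortholog_one2one"]

    one2many = [(p, h, hc) for p, h, kind, hc in records if kind == "ortholog_one2many"]
    pig_counts = Counter(p for p, _, _ in one2many)
    # Group each one2many pair by its "one" side, i.e. the gene shared by
    # several pairs (pig side if the pig gene recurs, otherwise human side).
    groups = defaultdict(list)
    for p, h, hc in one2many:
        key = ("pig", p) if pig_counts[p] > 1 else ("human", h)
        groups[key].append((p, h, hc))
    for pairs in groups.values():
        confident = [(p, h) for p, h, hc in pairs if hc]
        if len(confident) == 1:  # exactly one high-confidence pair
            kept.extend(confident)
    return kept
```

Grouping by the shared gene means a pig gene with two high-confidence human partners is dropped entirely, matching the exclusion of one2many orthologues with several high-confidence pairs.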

Human disease genes

The list of human disease genes was retrieved from the KEGG DISEASE database, in which diseases are viewed as perturbed states of the molecular network system. Non-coding genes, which are not within the focus of this study, were removed. In total, the body-wide expression of 3,432 human disease genes with one-to-one orthologues in pig is shown on the individual gene summary pages.

Tissue profiling

Using HPA antibodies on pig tissues

The resource of antibodies produced in the HPA project was used for protein profiling of pig tissues. Genes of interest were only investigated if they fulfilled two criteria: (1) an antibody with high reliability existed (based on human antibody validation as described in Sivertsson Å et al. (2020)), and (2) the sequence identity between the PrEST (protein epitope signature tag used for immunization, Berglund L et al. (2008)) and the corresponding pig orthologue was over 80%. Access to the exact amino acid (aa) sequence of the antigens (PrESTs) used for immunization enables comparison with the sequence of the corresponding pig orthologue. All antibodies used are published on the Human Protein Atlas portal, where more details about antibody reliability, the aa sequence for each antibody and the tissue distribution in human are available.
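The two selection criteria can be written as a simple check. The Python sketch below assumes the PrEST and pig sequences have already been aligned to equal length (with "-" marking gaps) by an alignment tool not shown here, and defines identity as identical residues divided by alignment columns, excluding columns that are gaps in both sequences; other identity definitions would give somewhat different numbers. Function names are hypothetical.

```python
def percent_identity(prest_aligned, pig_aligned):
    """Percent identity between two pre-aligned, equal-length sequences.

    Identity = identical residues / alignment columns, ignoring columns where
    both sequences have a gap ('-').
    """
    if len(prest_aligned) != len(pig_aligned):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(prest_aligned, pig_aligned)
               if not (a == "-" and b == "-")]
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def antibody_eligible(high_reliability, prest_aligned, pig_aligned, min_identity=80.0):
    """Both criteria: a high-reliability antibody and over 80% PrEST-pig identity."""
    return high_reliability and percent_identity(prest_aligned, pig_aligned) > min_identity
```

With one mismatch in a 10-residue alignment, `percent_identity("ACDEFGHIKL", "ACDEFGHIKM")` is 90.0, so a high-reliability antibody against that PrEST would pass both criteria.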

Immunohistochemical protocol

A tissue microarray (TMA) created from tissue blocks representing most normal tissue types was used for immunohistochemical analysis of protein localization in pig tissues. Sections of the TMA were used for immunohistochemical (IHC) staining and pig tissue protein profiling, enabling comparison between protein detection and RNA expression levels. Pig and human IHC stainings were performed on the same staining occasion, thereby minimizing the risk of technical bias. The standard IHC protocol used within the HPA pipeline was applied; the protocol is available for download from the Human Protein Atlas website.

Pig tissue dictionary

A representative section of each tissue block was stained with hematoxylin and eosin (H&E) for morphological examination. These sections were also digitized to generate the online tissue dictionary, with one or two representative examples uploaded for each tissue type. Joint cartilage, synovial tissue and larynx did not include fixed tissue and are therefore not included in the tissue dictionary. Only seven brain tissues are included in the dictionary, representing the central nervous system (olfactory bulb, cerebral cortex, basal ganglia, hippocampus, hypothalamus, cerebellum and spinal cord).