Documentation of the computational methods used to generate the data presented in this webserver
All analyses presented on this webserver were conducted by Elias Dohmen, who also developed this webserver platform. Where underlying data from other sources was used, the source is clearly indicated in the respective sections below. The results and data presented here are freely available for use and publication by anyone, provided appropriate attribution is given.
When using data from this webserver, please reference this webserver and its URL in your publications. For specific methodological details and tool citations, refer to the individual method sections below.
For any questions, please contact Elias Dohmen.
We analyzed 200 genome annotations generated with BRAKER3 as described in Saenko et al. (2025). For all downstream analyses, only the longest isoform per gene was retained. Protein domains were annotated with pfam_scan.pl v1.6 using the Pfam database version 37.3 (Mistry et al., 2021).
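The longest-isoform filter described above can be sketched as follows. This is a minimal illustration, not the script used for the webserver data. It assumes transcript IDs follow a `gene.tN` pattern (for example `g12.t3`); that naming convention and the helper names `parse_fasta`, `gene_of` and `longest_isoforms` are assumptions made for this example.

```python
def parse_fasta(text):
    """Yield (sequence_id, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gene_of(transcript_id):
    # ASSUMPTION: IDs look like "g12.t3" (gene "g12", isoform "t3").
    # IDs without a ".t" suffix are treated as their own gene.
    return transcript_id.rsplit(".t", 1)[0]


def longest_isoforms(records):
    """Return {gene: (transcript_id, sequence)} keeping the longest protein.

    Ties are resolved in favour of the isoform seen first.
    """
    best = {}
    for transcript_id, seq in records:
        gene = gene_of(transcript_id)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (transcript_id, seq)
    return best
```

Calling `longest_isoforms(parse_fasta(open(path).read()))` on a proteome file would give one entry per gene; how ties between equally long isoforms were handled in the actual pipeline is not stated here.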
From the full set of 200 species, a reduced subset of 163 species was created using the filtering script GEvol_filter_by_DOGMA_quality.py.
Filtering was performed according to the following criteria:
Proteome quality was assessed with DOGMA v3.8.37.3 (Dohmen et al., 2016), run in proteome mode with the insect core set and Pfam v37.3. All other parameters were default. The insect core set includes 4182 single-domain Conserved Domain Arrangements (CDAs) and 4582 multi-domain CDAs.
Orthogroups and multiple sequence alignments (MSAs) were constructed using OrthoFinder v3.0.1b1 (Emms et al., 2025; Emms & Kelly, 2019), executed via the official Docker container davidemms/orthofinder:3.0.1b1 on DockerHub. The following parameter choices were made:
-M msa: Generate multiple sequence alignments (MSAs)
-t 48: Use 48 threads for parallel processing

Dataset | Sites | Patterns | Gaps (%) | Invariant Sites (%) |
---|---|---|---|---|
200 species set | 26,323 | 25,487 | 19.93% | 7.63% |
163 species set | 31,125 | 29,660 | 11.85% | 9.60% |
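To make the columns of the table above concrete, the following sketch computes comparable statistics for a small alignment. It is an illustration only: the definitions used here (a pattern is a distinct alignment column, a gap is the `-` character, and an invariant site is a column whose non-gap residues are all identical) are assumptions, and the tool that produced the reported numbers may define these quantities differently.

```python
def alignment_stats(seqs, gap_chars="-"):
    """Summarise an alignment given as a list of equal-length strings."""
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("need non-empty aligned sequences of equal length")
    n_sites = lengths.pop()

    # Transpose the alignment so that each string is one column (site).
    columns = ["".join(s[i] for s in seqs) for i in range(n_sites)]

    n_gaps = sum(c in gap_chars for col in columns for c in col)
    # ASSUMPTION: a site is invariant if exactly one residue type remains
    # after gap characters are ignored.
    n_invariant = sum(
        len({c for c in col if c not in gap_chars}) == 1 for col in columns
    )
    return {
        "sites": n_sites,
        "patterns": len(set(columns)),  # distinct columns
        "gaps_pct": 100.0 * n_gaps / (n_sites * len(seqs)),
        "invariant_pct": 100.0 * n_invariant / n_sites,
    }
```

Because identical columns collapse into a single pattern, the pattern count can never exceed the site count, which matches the relationship between the two columns in the table.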
Maximum-Likelihood phylogenetic trees were constructed with RAxML-NG v2.0.0 (Kozlov et al., 2019). The best-fit evolutionary model was selected using:
raxml-ng-2 --msa <msa-file> --model AA
JTT+FC+IU{0.071649}+G4m{0.91863}
JTT+FC+IU{0.090326}+G4m{0.937173}
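Model strings such as the two above consist of `+`-separated components, some of which carry numeric values in braces. The sketch below splits such a string into its components so the values can be compared programmatically. It only handles the syntax visible in these strings (plus `/`-separated value lists) and does not interpret what each component means; `parse_model` is a name chosen for this example.

```python
import re

# A component is an alphanumeric name, optionally followed by {values}.
TOKEN = re.compile(r"^(?P<name>[A-Za-z0-9]+)(?:\{(?P<values>[^}]*)\})?$")


def parse_model(model):
    """Split a model string into a list of (name, [float values]) tuples."""
    parts = []
    for token in model.split("+"):
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised model component: {token!r}")
        values = match.group("values")
        numbers = [float(v) for v in values.split("/")] if values else []
        parts.append((match.group("name"), numbers))
    return parts
```

For example, `parse_model("JTT+FC+IU{0.071649}+G4m{0.91863}")` yields four components, of which `IU` and `G4m` each carry one numeric value.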
Trees were inferred with:
raxml-ng-2 --all --msa <msa-file> --model <evolutionary-model> --seed 7 --threads <n> --bs-metric fbp,tbe
run mode: Tree with branch support (adaptive) (Felsenstein Bootstrap + Transfer Bootstrap)
start tree(s): adaptive
bootstrap replicates: parsimony (max: 1000) + bootstopping (autoMRE, cutoff: 0.030000)
random seed: 7
tip-inner: OFF
pattern compression: ON
per-rate scalers: OFF
site repeats: ON
logLH epsilon: general: 10.000000, brlen-triplet: 1000.000000
stopping rule: KH
fast spr radius: AUTO
spr subtree cutoff: 1.000000
fast CLV updates: ON
branch lengths: proportional (ML estimate, algorithm: NR-FAST)
SIMD kernels: AVX
parallelization: coarse-grained (auto), PTHREADS (20 threads), thread pinning: OFF
run mode: ML tree search + bootstrapping (adaptive) (Felsenstein Bootstrap + Transfer Bootstrap)
start tree(s): adaptive
bootstrap replicates: parsimony (max: 1000) + bootstopping (autoMRE, cutoff: 0.030000)
random seed: 7
tip-inner: OFF
pattern compression: ON
per-rate scalers: OFF
site repeats: ON
logLH epsilon: general: 10.000000, brlen-triplet: 1000.000000
stopping rule: KH
fast spr radius: AUTO
spr subtree cutoff: 1.000000
fast CLV updates: ON
branch lengths: proportional (ML estimate, algorithm: NR-FAST)
SIMD kernels: AVX2
parallelization: coarse-grained (auto), PTHREADS (48 threads), thread pinning: OFF
In Python 3.12 with the ETE4 library (Huerta-Cepas et al., 2016), the resulting tree underwent the following processing steps:
Rooting with the .set_outgroup() function
Conversion to an ultrametric tree with the .to_ultrametric() function
This temporal scaling is consistent with the estimates of Thomas et al. (2020) and Misof et al. (2014).
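To illustrate what converting a tree to ultrametric form and scaling it to a time axis means, here is a library-free sketch. It is not ETE4's implementation and not the code used for the webserver: it is a deliberately simple stand-in that places each internal node at a height proportional to its topological distance from the tips and then rescales so that the root sits at a chosen age. The nested-tuple tree representation and the root age of 100 in the usage note are hypothetical.

```python
def node_height(tree):
    """Topological height: 0 for a leaf (a str), else 1 + tallest child."""
    if isinstance(tree, str):
        return 0
    return 1 + max(node_height(child) for child in tree)


def to_ultrametric_newick(tree, root_age):
    """Return a Newick string in which every tip lies root_age from the root.

    `tree` is a nested tuple of leaf names, e.g. (("A", "B"), "C").
    The root must be an internal node.
    """
    root_height = node_height(tree)
    if root_height == 0:
        raise ValueError("root must have at least one child")
    unit = root_age / root_height  # time represented by one height step

    def render(node, parent_height):
        height = node_height(node)
        # Branch length is the height difference to the parent, in time units.
        length = (parent_height - height) * unit
        if isinstance(node, str):
            return f"{node}:{length:g}"
        children = ",".join(render(child, height) for child in node)
        return f"({children}):{length:g}"

    return "(" + ",".join(render(c, root_height) for c in tree) + ");"
```

With the hypothetical input `(("A", "B"), "C")` and `root_age=100`, tips A and B receive two branches of length 50 each along their path and tip C a single branch of length 100, so all tips end up equidistant from the root.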
Ancestral domain content across the phylogenetic tree and domain rearrangement events were reconstructed with DomRates (Dohmen et al., 2020).
(-p parameter)
(-s and -d parameters)
(-g parameter)
(-a parameter), as described in the Datasets and Annotation section
(-t parameter), as described in the Phylogenetic Tree Construction section
Gene Ontology (GO) term enrichment analysis is carried out with the topGO package in R (Alexa et al., 2006) using the scripts analyseGo.r and domain2topGo.py and is based on the DomRates results as described in the Domain Rearrangements section.
The GO universe is composed of all domain arrangements that are present in all species as well as the reconstructed domain arrangement sets in the ancestral nodes.
New domain arrangements that can be explained by an exact or non-ambiguous solution (see DomRates) are annotated with the pfam2go mapping (v37.3) of Pfam domains to GO terms (Mitchell et al., 2015). The GO terms of all these new domain arrangements are compared to the GO terms of the GO universe described above, either per node or for the whole tree.
weight01 method
make_wordcloud.py
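The pfam2go annotation step described above can be sketched as follows. The parser targets the line layout of the pfam2go file (`Pfam:<accession> <name> > GO:<term name> ; <GO ID>`, with comment lines starting with `!`). Representing a domain arrangement as a list of Pfam accessions and annotating it with the union of its domains' GO terms are assumptions made for this example; the scripts analyseGo.r and domain2topGo.py may proceed differently.

```python
def parse_pfam2go(lines):
    """Return {pfam_accession: set of GO IDs} from pfam2go-formatted lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        left, _, right = line.partition(" > ")
        pfam_acc = left.split()[0].removeprefix("Pfam:")
        # The GO ID is the last ';'-separated field on the line.
        go_id = right.rsplit(";", 1)[-1].strip()
        mapping.setdefault(pfam_acc, set()).add(go_id)
    return mapping


def go_terms_for_arrangement(arrangement, mapping):
    """Union of GO IDs over all domains in an arrangement.

    ASSUMPTION: `arrangement` is a list of Pfam accessions; domains
    without a pfam2go entry contribute no terms.
    """
    terms = set()
    for accession in arrangement:
        terms |= mapping.get(accession, set())
    return terms
```

Term sets built this way for the new arrangements and for the GO universe could then be handed to an enrichment tool such as topGO.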
The mapping between NCBI Sequence IDs and BRAKER Sequence IDs is based on reciprocal BLASTp where only the 1:1 top hit is reported. The reciprocal BLASTp analysis was done and provided by Chetan Munegowda. The following NCBI annotations were used:
GCF_000001215.4
GCF_000002335.3
From the BRAKER annotations, only the longest isoforms were used, while all isoforms from the NCBI annotations were included in the analysis.
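The 1:1 reciprocal top-hit criterion can be illustrated with a short sketch. It assumes BLAST tabular output (12 tab-separated columns with the bit score in the last one) and ranks hits by bit score; both choices, as well as the function names, are assumptions made for this example and do not necessarily reflect how the provided analysis was run.

```python
def best_hits(blast_lines):
    """Return {query: subject} keeping the highest-bit-score hit per query."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Columns 1, 2 and 12 of tabular output: query, subject, bit score.
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return {a_id: b_id} for pairs that are each other's top hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}
```

A pair is reported only if the top hit of sequence A in proteome B is sequence B and the top hit of B in proteome A is A again, so each sequence appears in at most one mapping.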
The Orthogroups from OrthoFinder3 are mapped to BRAKER Sequence IDs based on the OrthoFinder3 results file Orthogroups.tsv.
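A minimal sketch of reading such a mapping from Orthogroups.tsv is shown below. It relies on the file layout of one orthogroup per row, one species per column, and comma-separated sequence IDs within each cell. Keying the result by (species, sequence ID) is a choice made for this example, on the assumption that the same sequence ID may occur in more than one species.

```python
import csv
import io


def sequence_to_orthogroup(tsv_text):
    """Return {(species, sequence_id): orthogroup} from Orthogroups.tsv text."""
    mapping = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, None)
    if header is None:
        return mapping
    species_names = header[1:]  # first column holds the orthogroup ID
    for row in reader:
        if not row:
            continue
        orthogroup = row[0]
        for species, cell in zip(species_names, row[1:]):
            # Each cell lists that species' members, separated by commas.
            for seq_id in cell.split(","):
                seq_id = seq_id.strip()
                if seq_id:
                    mapping[(species, seq_id)] = orthogroup
    return mapping
```

Empty cells (species without a member in an orthogroup) simply contribute no entries.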
For Pfam domain arrangement mappings, the BRAKER Sequence IDs were mapped to the annotated Pfam domain arrangements based on the annotation files mentioned above in the Datasets and Annotation section.