Enter new data
Upload new dataset
Create new strain
Analyse single dataset
Select dataset and define parameters for analysis
Distribution of NGS count
The logarithm of the NGS counts of the DelVGs given in the selected
dataset.
Frame shift
Shift of the reading frame introduced by the deletion site. A chi-squared
test is performed to compare the distribution against random shifts.
(***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
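The frame shift test can be sketched as follows. This is a minimal illustration, not the application's code. It assumes that Start and End are the retained positions flanking the deletion (so the deleted length is End - Start - 1) and that "random shifts" means an equal share for each of the three frames. The function names are hypothetical.

```python
import math
from collections import Counter

def frame_shift_counts(deletions):
    """Count deletions by (deleted length mod 3); shift 0 keeps the reading frame."""
    # ASSUMPTION: deleted length = end - start - 1 (Start/End are retained positions).
    counts = Counter((end - start - 1) % 3 for start, end in deletions)
    return [counts[0], counts[1], counts[2]]

def chi2_uniform_three(observed):
    """Chi-squared goodness-of-fit test against an equal 1/3 : 1/3 : 1/3 split."""
    expected = sum(observed) / 3
    stat = sum((o - expected) ** 2 / expected for o in observed)
    # With 3 categories there are 2 degrees of freedom, where the
    # chi-squared survival function has the closed form exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value

def significance_label(p):
    """Map a p-value to the notation used in the plots."""
    if p < 0.00001:
        return "***"
    if p < 0.001:
        return "**"
    if p < 0.05:
        return "*"
    return "ns."

# The five (Start, End) pairs from the exemplary input file on this page.
example = [(163, 2139), (167, 1990), (113, 2165), (109, 2152), (163, 2152)]
print(frame_shift_counts(example))            # [0, 2, 3]
stat, p = chi2_uniform_three([60, 15, 15])    # hypothetical counts
print(stat, significance_label(p))            # 45.0 ***
```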
Segment distribution
Distribution of the DelVGs over the eight segments. A chi-squared test
checks whether this distribution matches the one that would be expected
if DelVGs occurred solely in proportion to the RNA sequence length.
(***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
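The length-proportional expectation can be illustrated with a short sketch. The segment lengths below are illustrative values, not taken from the application, and should be replaced by the lengths from the strain's FASTA files. The function names are hypothetical. Only the test statistic is computed here. With eight segments it has 7 degrees of freedom, and a statistics library is needed to turn it into a p-value.

```python
# ASSUMPTION: illustrative segment lengths in nucleotides; use the real
# lengths of the selected strain instead.
SEGMENT_LENGTHS = {
    "PB2": 2341, "PB1": 2341, "PA": 2233, "HA": 1775,
    "NP": 1565, "NA": 1413, "M": 1027, "NS": 890,
}

def expected_counts(segment_lengths, total):
    """Expected DelVG count per segment if occurrence depends only on length."""
    genome_length = sum(segment_lengths.values())
    return {seg: total * length / genome_length
            for seg, length in segment_lengths.items()}

def chi2_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all segments."""
    return sum((observed.get(seg, 0) - exp) ** 2 / exp
               for seg, exp in expected.items())

# Hypothetical observation: 100 DelVGs, most of them on the polymerase segments.
observed = {"PB2": 40, "PB1": 30, "PA": 25, "HA": 2, "NP": 1, "NA": 1, "M": 1, "NS": 0}
expected = expected_counts(SEGMENT_LENGTHS, sum(observed.values()))
print(round(chi2_statistic(observed, expected), 1))
```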
DelVG lengths
Distribution of deletion sites
Locations and nucleotides of all start and end positions of the
deletion sites. If data about the packaging signal is available, the
incorporation signal is shown in blue and the bundling signal in red.
3' and 5' sequence end comparison
Comparison of the lengths of the 3' and 5' ends. If data about the
packaging signal is available, the incorporation signal is shown in
blue and the bundling signal in red.
Deletion site mapping
The plot displays the connection between the start and end positions
of the DelVGs.
Frequency of direct repeats
The length of the overlapping sequence at the start and end of the
deletion site is calculated and shown as a bar plot. The results are
compared against data from a random sampling approach using a
chi-squared test.
(***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
Nucleotide enrichment at start of deletion site
Nucleotide enrichment at end of deletion site
Analyse multiple datasets
Select dataset and define parameters for analysis
NGS counts
Summary statistics of the NGS counts for the selected datasets.
Deletion shift
Distribution of the reading frame shift for the selected datasets. A
chi-squared test is performed to compare the distribution against
random shifts.
(***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
Segment distribution
Distribution of the DelVGs over the eight segments. A chi-squared test
checks whether this distribution matches the one that would be expected
if DelVGs occurred solely in proportion to the RNA sequence length.
(***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
DelVG length
Direct repeats
Distribution of the direct repeat lengths. The distribution is
compared against data from a random sampling approach using a
chi-squared test.
(***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
Nucleotide enrichment (Start position)
Distribution of nucleotide enrichment at the start of the deletion site.
For each position, the difference from randomly sampled data is tested
using a one-way ANOVA.
(***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
Nucleotide enrichment (End position)
Distribution of nucleotide enrichment at the end of the deletion site.
For each position, the difference from randomly sampled data is tested
using a one-way ANOVA.
(***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
Identification of promising DelVGs by intersecting multiple datasets
Select datasets for intersection analysis
Intersecting DelVG candidates per dataset pair
Number of candidates per segment that occur in multiple datasets
Identified candidates by using mean and sum scoring
About this application
What this application is meant for
This application lets the user investigate datasets of deletion-containing
viral genomes (DelVGs) of influenza viruses. Influenza DelVGs contain
large, usually single, internal deletions in their vRNA sequence. In the
corresponding datasets, each deletion is defined by its start and end
point. Start corresponds to the 5'-end and End to the 3'-end of the
vRNA sequence.
Overview of the different tabs
The application consists of multiple tabs that provide different
analyses.
- Add new dataset: Allows the user to upload a new custom dataset. If the dataset comes from a strain that is not yet included, the strain has to be added first. For that, the FASTA files of the individual segments need to be provided.
- Single dataset: Allows the user to investigate a single dataset. The read support cutoff (RSC), which is the minimum NGS count for a DelVG to be included, can be set individually. In addition, the user decides whether the DelVGs are weighted by their NGS count (unflattened) or not (flattened).
- Multiple datasets: Provides the same analyses as the 'single dataset' tab, but allows multiple datasets to be compared.
- Dataset intersection: In this tab the user can search for DelVGs that are present in multiple datasets. It shows the candidates and the overlap between the datasets. Additionally, a plot of the NGS counts is given where the DelVGs with the highest occurrence are marked.
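The read support cutoff, the flattened/unflattened weighting, and the dataset intersection can be illustrated with a small sketch. It assumes each dataset is a list of (Segment, Start, End, NGS_read_count) tuples and that a DelVG is identified by its segment, start and end. The function names and the second dataset are hypothetical, and the application's own implementation may differ.

```python
from collections import Counter

def apply_rsc(dataset, rsc):
    """Keep only DelVGs whose NGS count is at least the read support cutoff."""
    return [row for row in dataset if row[3] >= rsc]

def segment_counts(dataset, flattened=True):
    """Count DelVGs per segment, once each (flattened) or weighted by NGS count."""
    counts = Counter()
    for segment, _start, _end, ngs_count in dataset:
        counts[segment] += 1 if flattened else ngs_count
    return dict(counts)

def intersect(*datasets):
    """DelVGs (segment, start, end) that are present in every given dataset."""
    keys = [{(seg, start, end) for seg, start, end, _ in ds} for ds in datasets]
    return set.intersection(*keys)

# The five DelVGs from the exemplary input file on this page.
dataset_a = [
    ("PB2", 163, 2139, 42), ("PA", 167, 1990, 161), ("PB1", 113, 2165, 37),
    ("PB2", 109, 2152, 69), ("PB2", 163, 2152, 73),
]
# Hypothetical second dataset, for illustration only.
dataset_b = [("PB2", 163, 2139, 10), ("NS", 50, 700, 5)]

kept = apply_rsc(dataset_a, rsc=50)
print(segment_counts(kept, flattened=True))    # {'PA': 1, 'PB2': 2}
print(segment_counts(kept, flattened=False))   # {'PA': 161, 'PB2': 142}
print(intersect(dataset_a, dataset_b))         # {('PB2', 163, 2139)}
```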
Verified datasets
The following datasets were added to the application by us. All others
were uploaded by other users, and we cannot guarantee their quality.
How to enter a custom dataset?
General info:
- Custom datasets need to be in *.csv format.
- They have exactly four columns.
- The order of the four columns is crucial!
The columns must appear in the order and with the header names shown below. The header names start with a capital letter.
Property | Column 1 | Column 2 | Column 3 | Column 4 |
Header name | Segment | Start | End | NGS_read_count |
Column data type | character (string) | integer | integer | integer |
Description | Name of the segment the DelVG originates from | Start position of the deletion site | End position of the deletion site | Number of reads in the NGS data for this specific DelVG candidate |
Exemplary input file with five DelVGs:
Segment | Start | End | NGS_read_count |
PB2 | 163 | 2139 | 42 |
PA | 167 | 1990 | 161 |
PB1 | 113 | 2165 | 37 |
PB2 | 109 | 2152 | 69 |
PB2 | 163 | 2152 | 73 |
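Before uploading, a custom file can be checked against these rules with a few lines of code. The sketch below is an illustration only. It assumes a comma-separated file with the header row shown above, and `read_delvg_csv` is a hypothetical helper, not part of the application.

```python
import csv
import io

EXPECTED_HEADER = ["Segment", "Start", "End", "NGS_read_count"]

def read_delvg_csv(text):
    """Parse CSV text into (Segment, Start, End, NGS_read_count) tuples.

    Raises ValueError if the header, the column count or an integer is invalid.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise ValueError(f"header must be {EXPECTED_HEADER}, got {header}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:  # skip blank lines
            continue
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(row)}")
        segment, start, end, count = row
        # int() raises ValueError for non-integer entries.
        rows.append((segment, int(start), int(end), int(count)))
    return rows

# ASSUMPTION: comma as delimiter; content mirrors the exemplary file above.
example_csv = """Segment,Start,End,NGS_read_count
PB2,163,2139,42
PA,167,1990,161
PB1,113,2165,37
PB2,109,2152,69
PB2,163,2152,73
"""
print(len(read_delvg_csv(example_csv)))  # 5
```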
What are 'direct repeats'?
The RNA sequence directly before the start point and directly before
the end point of the deletion site can be the same (it is repeated).
This phenomenon is described in the literature as a 'direct (sequence)
repeat'. Direct repeats can have different lengths and are discussed as
a driving factor in the generation of DelVGs.
The actual start and end point of a DelVG that has a direct repeat longer than 0 cannot be determined with certainty. In the given image, a direct repeat of length 3 is depicted. There are four possible ways in which it could have been created during the experiment. If n is the length of the direct repeat, there are always n+1 possible ways in which the DelVG could have been created.
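The n+1 rule can be checked with a short sketch. It uses Python slice indices rather than the application's coordinate convention: `start` is the index of the first deleted nucleotide and `end` the index of the first retained nucleotide after the deletion. The sequence and the function names are hypothetical.

```python
def direct_repeat_length(seq, start, end):
    """Number of matching nucleotides directly before start and before end."""
    n = 0
    while n < start and seq[start - 1 - n] == seq[end - 1 - n]:
        n += 1
    return n

def equivalent_deletions(seq, start, end):
    """All (start, end) pairs that yield the same DelVG sequence."""
    n = direct_repeat_length(seq, start, end)
    return [(start - k, end - k) for k in range(n + 1)]

def delvg_sequence(seq, start, end):
    """Sequence that remains after removing seq[start:end]."""
    return seq[:start] + seq[end:]

# Hypothetical sequence: 'GCU' precedes both the start (index 6) and the end (index 15).
seq = "CCAGCUAAAAAGGCUGGAC"
print(direct_repeat_length(seq, 6, 15))   # 3
print(equivalent_deletions(seq, 6, 15))   # [(6, 15), (5, 14), (4, 13), (3, 12)]
print({delvg_sequence(seq, s, e) for s, e in equivalent_deletions(seq, 6, 15)})
# {'CCAGCUGGAC'}
```

All four candidate deletions produce the same remaining sequence, which is why the true junction cannot be recovered from the sequencing data alone.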
Contact information
To contact the development team, open a new issue on GitHub. There you
can get help with more detailed questions and suggest new ideas and
features for the application.