Enter new data

Upload new dataset

More information about the structure of the CSV file can be found in the About tab. The upload can take 5 to 20 minutes, because the random data used as expected values has to be sampled first.

Create new strain

Please enter the reference RNA sequences of the strain below:


Analyse single dataset

Select dataset and define parameters for analysis

The read support cutoff (RSC) is dataset-specific and is usually set to a value between 5 and 30. For reference, our meta-analysis used an RSC of 15.

Flattened data do not take the NGS count into account. In unflattened data, each individual DelVG is weighted by the NGS count.
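As a minimal sketch of how the RSC and the flattening option could act on a dataset, the Python snippet below filters rows by read count and then counts DelVGs per segment, either once each (flattened) or weighted by their NGS count (unflattened). The function names, the row layout, and the cutoff of 40 are assumptions made for illustration and are not taken from the application's code.

```python
from collections import Counter

# Rows follow the four-column layout (Segment, Start, End, NGS_read_count).
ROWS = [
    ("PB2", 163, 2139, 42),
    ("PA", 167, 1990, 161),
    ("PB1", 113, 2165, 37),
    ("PB2", 109, 2152, 69),
    ("PB2", 163, 2152, 73),
]


def apply_rsc(rows, rsc):
    """Keep DelVGs whose NGS count reaches the read support cutoff."""
    return [row for row in rows if row[3] >= rsc]


def count_per_segment(rows, flattened):
    """Count DelVGs per segment, optionally weighted by their NGS count."""
    counts = Counter()
    for segment, _start, _end, ngs_count in rows:
        counts[segment] += 1 if flattened else ngs_count
    return counts


kept = apply_rsc(ROWS, rsc=40)  # hypothetical cutoff chosen for this toy data
flat = count_per_segment(kept, flattened=True)
weighted = count_per_segment(kept, flattened=False)
print(flat)
print(weighted)
```

With this toy input the PB1 candidate (37 reads) is removed by the cutoff, so the flattened view counts three PB2 candidates while the unflattened view sums their read counts.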

Distribution of NGS count

The logarithm of the NGS counts of the DelVGs given in the selected dataset.

Frame shift

Shift of the reading frame introduced by the deletion site. A chi-squared test is performed to compare the distribution against random shifts. (***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
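The following Python sketch illustrates the idea under stated assumptions. It assumes that Start is the last retained nucleotide before the deletion and End is the first retained nucleotide after it, so that End - Start - 1 nucleotides are deleted. It also compares against a uniform expectation (one third per shift) instead of the randomly sampled data the application uses. With three categories the test has two degrees of freedom, for which the chi-squared p-value has the closed form exp(-x/2). All function names are hypothetical.

```python
import math
from collections import Counter


def frame_shift(start, end):
    """Reading frame shift (0, 1 or 2) caused by the deletion.

    Assumption: `end - start - 1` nucleotides are deleted.
    """
    return (end - start - 1) % 3


def chi_squared_df2(observed, expected):
    """Chi-squared statistic and p-value for exactly three categories."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 2 degrees of freedom.
    return statistic, math.exp(-statistic / 2)


def significance_label(p_value):
    """Map a p-value to the labels used in the plots."""
    if p_value < 0.00001:
        return "***"
    if p_value < 0.001:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns."


deletions = [(163, 2139), (167, 1990), (113, 2165), (109, 2152), (163, 2152)]
shifts = Counter(frame_shift(start, end) for start, end in deletions)
observed = [shifts[0], shifts[1], shifts[2]]
expected = [len(deletions) / 3] * 3  # uniform expectation (assumption)
statistic, p_value = chi_squared_df2(observed, expected)
print(observed, round(statistic, 2), significance_label(p_value))
```

Five DelVGs are far too few for a reliable chi-squared test; the example only shows the mechanics.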

Segment distribution

Distribution of the DelVGs over the eight segments. A chi-squared test checks whether the distribution matches the one that would be expected if the occurrence of DelVGs depended solely on the RNA sequence length. (***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
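A length-dependent expectation can be sketched as follows: each segment receives a share of the DelVGs proportional to its sequence length, and the chi-squared statistic measures the deviation of the observed counts from that expectation. The segment lengths and observed counts below are illustrative assumptions, and the function names are hypothetical. Turning the statistic into a p-value (seven degrees of freedom for eight segments) would additionally need a statistics library.

```python
# Illustrative segment lengths in nucleotides (assumed values); in practice
# they would be taken from the reference sequences of the selected strain.
SEGMENT_LENGTHS = {
    "PB2": 2341, "PB1": 2341, "PA": 2233, "HA": 1775,
    "NP": 1565, "NA": 1413, "M": 1027, "NS": 890,
}


def expected_counts(lengths, n_delvgs):
    """Distribute n_delvgs over the segments proportionally to their length."""
    total_length = sum(lengths.values())
    return {seg: n_delvgs * length / total_length for seg, length in lengths.items()}


def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all segments."""
    return sum((observed.get(seg, 0) - exp) ** 2 / exp for seg, exp in expected.items())


# Hypothetical observed DelVG counts per segment.
observed = {"PB2": 40, "PB1": 35, "PA": 20, "HA": 2, "NP": 1, "NA": 1, "M": 1, "NS": 0}
expected = expected_counts(SEGMENT_LENGTHS, sum(observed.values()))
statistic = chi_squared_statistic(observed, expected)
print(round(statistic, 2))
```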

DelVG lengths

The lengths of the individual DelVGs are plotted as a histogram, showing the number of occurrences of each length.

Distribution of deletion sites

Locations and nucleotides of all start and end positions of the deletion sites. If data about the packaging signal is available, the incorporation signal is shown in blue and the bundling signal in red.

3' and 5' sequence end comparison

Comparison of the lengths of the 3' and 5' ends. If data about the packaging signal is available, the incorporation signal is shown in blue and the bundling signal in red.

Deletion site mapping

The plot displays the connection between the start and end positions of the DelVGs.

Frequency of direct repeats

The length of the overlapping sequence at the start and end of the deletion site is calculated and plotted in a bar plot. The results are compared against data from a random sampling approach using a chi-squared test. (***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).
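One way to compute the length of such an overlap is sketched below in Python. It assumes 1-based positions, with Start being the last retained nucleotide before the deletion and End the first retained nucleotide after it, and it counts how many nucleotides directly before both points are identical. The function name, the coordinate convention, and the cap max_len are assumptions for illustration; the application may count differently.

```python
def direct_repeat_length(seq, start, end, max_len=5):
    """Number of identical nucleotides directly before the start and end points.

    Assumed convention: 1-based positions, `start` is the last retained
    nucleotide before the deletion, `end` the first retained one after it.
    """
    if not 1 <= start < end - 1 or end > len(seq):
        raise ValueError("positions do not describe a deletion inside seq")
    i = start - 1  # 0-based index of the last nucleotide before the deletion
    j = end - 2    # 0-based index of the last deleted nucleotide
    n = 0
    # Walk backwards from both points while the nucleotides agree.
    while n < max_len and i - n >= 0 and seq[i - n] == seq[j - n]:
        n += 1
    return n


# Toy sequence: "TAG" precedes both the start (position 6) and the end (position 14).
toy_sequence = "ACGTAGGCCCTAGTT"
print(direct_repeat_length(toy_sequence, start=6, end=14))
```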

Nucleotide enrichment at start of deletion site

For each position the difference to randomly sampled data is estimated using a one-way ANOVA. (***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).

Nucleotide enrichment at end of deletion site

For each position the difference to randomly sampled data is estimated using a one-way ANOVA. (***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).

Analyse multiple datasets

Select dataset and define parameters for analysis

The RSC is dataset-specific and is usually set to a value between 5 and 30. As reference: In our meta-analysis an RSC of 15 was used.

Flattened data do not take the NGS count into account. In unflattened data, each individual DelVG is weighted by the NGS count.

NGS counts

Different statistical parameters for the NGS count of the datasets

Deletion shift

Distribution of the reading frame shift for the selected datasets. A chi-squared test is performed to compare the distribution against random shifts. (***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).

Segment distribution

Distribution of the DelVGs over the eight segments. A chi-squared test checks whether the distribution matches the one that would be expected if the occurrence of DelVGs depended solely on the RNA sequence length. (***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).

DelVG length

Length distribution of the DelVGs. Datasets can be (un-)selected by clicking on them in the legend.

Direct repeats

Distribution of the direct repeat lengths. The distribution is compared against data from a random sampling approach using a chi-squared test. (***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).

Nucleotide enrichment (Start position)

Distribution of nucleotide enrichment at start of the deletion site. For each position the difference to randomly sampled data is estimated using a one-way ANOVA. (***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).

Nucleotide enrichment (End position)

Distribution of nucleotide enrichment at end of the deletion site. For each position the difference to randomly sampled data is estimated using a one-way ANOVA. (***, p < 0.00001; **, p < 0.001; *, p < 0.05; ns., not significant).

Identification of promising DelVGs by intersecting multiple datasets

Select datasets for intersection analysis

Note: While it is possible to select datasets from different strains for experimental purposes, we recommend using datasets from the same strain for these analyses.

Intersecting DelVG candidates per dataset pair

Number of candidates per segment that occur in multiple datasets

Identified candidates by using mean and sum scoring

About this application

What this application is meant for

This application allows users to investigate datasets of deletion-containing viral genomes (DelVGs) of influenza viruses. Influenza DelVGs contain large, usually single, internal deletions in their vRNA sequence. In the corresponding datasets, each deletion is defined by its start and end point. Start corresponds to the 5'-end and End to the 3'-end of the vRNA sequence.

Overview of the different tabs

The application consists of multiple tabs that provide different analyses.
  1. Add new dataset: Allows the user to upload a new custom dataset. If the dataset uses a strain that is not yet included, the strain has to be added first. For that, the FASTA files of the individual segments need to be provided.
  2. Single dataset: Allows the user to investigate a single dataset. The read support cutoff (RSC), which is the minimum NGS count for a DelVG to be included, can be set individually. In addition, the user decides whether the DelVGs are weighted by their NGS count (unflattened) or not (flattened).
  3. Multiple datasets: Provides the same analyses as the 'single dataset' tab, but allows multiple datasets to be compared.
  4. Dataset intersection: In this tab the user can search for DelVGs that are present in multiple datasets. It shows the candidates and the overlap between the datasets. Additionally, a plot of the NGS counts is given in which the DelVGs with the highest occurrence are marked.
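The idea behind the dataset intersection can be sketched in a few lines of Python: each DelVG is identified by its segment, start, and end, and a candidate is a DelVG whose identifier occurs in at least two datasets. The dataset names, the rows, and the function names below are hypothetical, and the sketch does not reproduce the scoring used by the application.

```python
from collections import Counter
from itertools import combinations


def delvg_keys(rows):
    """Identify each DelVG by (Segment, Start, End), ignoring the NGS count."""
    return {(segment, start, end) for segment, start, end, _count in rows}


def shared_candidates(datasets, min_datasets=2):
    """DelVGs found in at least min_datasets datasets, with their dataset count."""
    occurrences = Counter()
    for rows in datasets.values():
        occurrences.update(delvg_keys(rows))
    return {key: n for key, n in occurrences.items() if n >= min_datasets}


def pairwise_overlap(datasets):
    """Number of shared DelVGs for every pair of datasets."""
    return {
        (a, b): len(delvg_keys(datasets[a]) & delvg_keys(datasets[b]))
        for a, b in combinations(sorted(datasets), 2)
    }


# Hypothetical toy datasets.
datasets = {
    "dataset_A": [("PB2", 163, 2139, 42), ("PA", 167, 1990, 161)],
    "dataset_B": [("PB2", 163, 2139, 17), ("PB1", 113, 2165, 37)],
    "dataset_C": [("PB2", 163, 2139, 5), ("PA", 167, 1990, 80)],
}
candidates = shared_candidates(datasets)
overlap = pairwise_overlap(datasets)
print(candidates)
print(overlap)
```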

Verified datasets

The following datasets were added to the application by us. All others were uploaded by other users, and we cannot guarantee their quality.
Dataset name Strain Publication
Alnaji2021 A/Puerto Rico/8/1934 'Influenza A Virus Defective Viral Genomes Are Inefficiently Packaged into Virions Relative to Wild-Type Genomic RNAs.'
Pelz2021 A/Puerto Rico/8/1934 'Semi-continuous Propagation of Influenza A Virus and Its Defective Interfering Particles: Analyzing the Dynamic Competition To Select Candidates for Antiviral Therapy.'
Wang2023 A/Puerto Rico/8/1934 'Influenza Defective Interfering Virus Promotes Multiciliated Cell Differentiation and Reduces the Inflammatory Response in Mice.'
Wang2020 A/Puerto Rico/8/1934 'Cell-to-Cell Variation in Defective Virus Expression and Effects on Host Responses during Influenza Virus Infection'
Zhuravlev2020 A/Puerto Rico/8/1934 'RNA-Seq transcriptome data of human cells infected with influenza A/Puerto Rico/8/1934 (H1N1) virus'
Kupke2020 A/Puerto Rico/8/1934 'Single-Cell Analysis Uncovers a Vast Diversity in Intracellular Viral Defective Interfering RNA Content Affecting the Large Cell-to-Cell Heterogeneity in Influenza A Virus Replication.'
VdHoecke2015 A/Puerto Rico/8/1934 'Analysis of the genetic diversity of influenza A viruses using next-generation DNA sequencing.'
Alnaji2019_Cal07 A/California/07/2009 'Sequencing Framework for the Sensitive Detection and Precise Mapping of Defective Interfering Particle-Associated Deletions across Influenza A and B Viruses.'
Alnaji2019_NC A/New Caledonia/20-JY2/1999 'Sequencing Framework for the Sensitive Detection and Precise Mapping of Defective Interfering Particle-Associated Deletions across Influenza A and B Viruses.'
Mendes2021 A/WSN/1933 'Library-based analysis reveals segment and length dependent characteristics of defective influenza genomes.'
Boussier2020 A/WSN/1933 'RNA-seq accuracy and reproducibility for the mapping and quantification of influenza defective viral genomes.'
Alnaji2019_Perth A/Perth/16/2009 'Sequencing Framework for the Sensitive Detection and Precise Mapping of Defective Interfering Particle-Associated Deletions across Influenza A and B Viruses.'
Berry2021_A A/Connecticut/Flu122/2013 'High confidence identification of intra-host single nucleotide variants for person-to-person influenza transmission tracking in congregate settings'
Penn2022 A/turkey/Turkey/1/2005 'Levels of Influenza A Virus Defective Viral Genomes Determine Pathogenesis in the BALB/c Mouse Model.'
Lui2019 A/Anhui/1/2013 'SMRT sequencing revealed the diversity and characteristics of defective interfering RNAs in influenza A (H7N9) virus infection.'
Alnaji2019_BLEE B/Lee/1940 'Sequencing Framework for the Sensitive Detection and Precise Mapping of Defective Interfering Particle-Associated Deletions across Influenza A and B Viruses.'
Berry2021_B B/Victoria/504/2000 'High confidence identification of intra-host single nucleotide variants for person-to-person influenza transmission tracking in congregate settings'
Valesano2020_Vic B/Victoria/504/2000 'Influenza B Viruses Exhibit Lower Within-Host Diversity than Influenza A Viruses in Human Hosts.'
Sheng2018 B/Brisbane/60/2008 'Identification and characterization of viral defective RNA genomes in influenza B virus.'
Berry2021_B_Yam B/Yamagata/16/1988 'High confidence identification of intra-host single nucleotide variants for person-to-person influenza transmission tracking in congregate settings'
Southgate2019 B/Yamagata/16/1988 'Influenza classification from short reads with VAPOR facilitates robust mapping pipelines and zoonotic strain detection for routine surveillance applications'
Valesano2020_Yam B/Yamagata/16/1988 'Influenza B Viruses Exhibit Lower Within-Host Diversity than Influenza A Viruses in Human Hosts.'

How to enter a custom dataset?

General info:
  1. Custom datasets need to be in *.csv format.
  2. They must have exactly four columns.
  3. The order of the four columns is crucial.
Example dataset:
The headers must use exactly the order and names given here. Each header name starts with a capital letter.
  Column 1: Segment (character string). Name of the segment from which the DelVG originates.
  Column 2: Start (integer). Start position of the deletion site.
  Column 3: End (integer). End position of the deletion site.
  Column 4: NGS_read_count (integer). Number of reads in the NGS data for this specific DelVG candidate.

Example input file with five DelVGs:
Segment Start End NGS_read_count
PB2 163 2139 42
PA 167 1990 161
PB1 113 2165 37
PB2 109 2152 69
PB2 163 2152 73
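Before uploading, a file can be checked against these rules with a short script. The Python sketch below assumes a comma-separated file and uses only the standard csv module. The function name read_delvg_csv is hypothetical and the checks mirror the rules listed above, not the application's actual validation.

```python
import csv
import io

EXPECTED_HEADER = ["Segment", "Start", "End", "NGS_read_count"]


def read_delvg_csv(handle):
    """Read a DelVG dataset and check header names, order and data types."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise ValueError(f"expected header {EXPECTED_HEADER}, got {header}")
    rows = []
    for line_no, fields in enumerate(reader, start=2):
        if not fields:  # skip blank lines
            continue
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        segment, start, end, count = fields
        # int() raises ValueError if a value is not an integer.
        rows.append({
            "Segment": segment,
            "Start": int(start),
            "End": int(end),
            "NGS_read_count": int(count),
        })
    return rows


sample = io.StringIO(
    "Segment,Start,End,NGS_read_count\n"
    "PB2,163,2139,42\n"
    "PA,167,1990,161\n"
)
rows = read_delvg_csv(sample)
print(rows)
```

For a file on disk, the same function can be called with a handle opened via open(path, newline="").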

What are 'direct repeats'?

The RNA sequence directly before the start point and directly before the end point of the deletion site can be identical (repeated). This phenomenon is described in the literature as a 'direct (sequence) repeat'. Direct repeats can have different lengths and are discussed as a driving factor in the generation of DelVGs.

The actual start and end point of a DelVG that has a direct repeat longer than 0 cannot be determined with certainty. The given image depicts a direct repeat of length 3, for which there are four possible ways the DelVG could have been created during the experiment. In general, if n is the length of the direct repeat, there are always n+1 possible ways it could have been created.
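The n+1 options can be made concrete with a small Python sketch. It assumes 1-based positions, with Start being the last retained nucleotide before the deletion and End the first retained nucleotide after it, and a repeat located directly before both points. Shifting both points upstream by 0 to n nucleotides then yields the same DelVG sequence every time. The toy sequence and function names are hypothetical.

```python
def delvg_sequence(seq, start, end):
    """Sequence that remains after removing the nucleotides between start and end.

    Assumed convention: 1-based positions, both `start` and `end` are retained.
    """
    return seq[:start] + seq[end - 1:]


def equivalent_junctions(start, end, repeat_length):
    """All (start, end) pairs that are indistinguishable for a given repeat length."""
    return [(start - k, end - k) for k in range(repeat_length + 1)]


# Toy sequence with the direct repeat "TAG" before position 6 and before position 14.
toy_sequence = "ACGTAGGCCCTAGTT"
junctions = equivalent_junctions(start=6, end=14, repeat_length=3)
products = {delvg_sequence(toy_sequence, s, e) for s, e in junctions}
print(junctions)
print(products)
```

All four junctions produce the same sequence, which is why sequencing data alone cannot tell them apart. Shifting by one more nucleotide than the repeat length gives a different product.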

Contact information

To contact the development team, open a new issue on GitHub. There you can get help with more detailed questions and propose new ideas and features for the application.