Enter new data

Upload new dataset

Select an existing strain:

Give the dataset a unique name:

Upload a custom dataset (*.csv):

Browse...

More information about the structure of the CSV file can be found in the About tab. The upload can take 5-20 minutes, because random data, which serve as expected values, must be sampled first.

Create new strain

Write the name of the new strain (e.g. A/Perth/16/2009):

Please upload the reference RNA sequences of the strain below:

Upload PB2 sequence as FASTA:

Browse...

Upload PB1 sequence as FASTA:

Browse...

Upload PA sequence as FASTA:

Browse...

Upload HA sequence as FASTA:

Browse...

Upload NP sequence as FASTA:

Browse...

Upload NA sequence as FASTA:

Browse...

Upload M sequence as FASTA:

Browse...

Upload NS sequence as FASTA:

Browse...

Identification of promising DelVGs by intersecting multiple datasets

Select datasets for intersection analysis

Select datasets to compare:

Note: While it is possible to select datasets from different strains for experimental purposes, we recommend using datasets from the same strain for these analyses.

Intersecting DelVG candidates per dataset pair

Number of candidates per segment that occur in multiple datasets

Identified candidates by using mean and sum scoring

Select segment:

Using the highest n ranked candidates:

About this application

What this application is meant for

This application allows users to investigate datasets of deletion-containing viral genomes (DelVGs) of influenza viruses. Influenza DelVGs contain a large, usually single, internal deletion in their vRNA sequence. In the corresponding datasets, this deletion is defined by its start and end point. Start corresponds to the 5'-end and End to the 3'-end of the vRNA sequence.

Overview of the different tabs

The application consists of multiple tabs that provide different analyses.

Add new dataset: Allows the user to upload a new custom dataset. If the dataset belongs to a strain that is not yet included, the strain has to be added first. For that, a FASTA file must be provided for each segment.
Single dataset: Allows the user to investigate a single dataset. The read support cutoff (RSC), which is the minimum NGS count for a DelVG to be included, can be set individually. In addition, the user decides whether the DelVGs are weighted by their NGS count (unflattened) or not (flattened).
Multiple datasets: Provides the same analyses as the 'Single dataset' tab, but compares multiple datasets.
Dataset intersection: In this tab the user can search for DelVGs that are present in multiple datasets. It shows the candidates and the overlap between the datasets. Additionally, a plot of the NGS counts is given in which the DelVGs with the highest occurrence are marked.
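The effect of the read support cutoff and of the flattened/unflattened setting can be illustrated with a minimal Python sketch. It is not the application's own code: the function names apply_rsc and segment_distribution are hypothetical, and it assumes that a DelVG is kept when its NGS count is greater than or equal to the RSC.

```python
from collections import Counter

# Rows follow the custom dataset layout: (Segment, Start, End, NGS_read_count).
ROWS = [
    ("PB2", 163, 2139, 42),
    ("PA", 167, 1990, 161),
    ("PB1", 113, 2165, 37),
    ("PB2", 109, 2152, 69),
    ("PB2", 163, 2152, 73),
]


def apply_rsc(rows, rsc):
    """Keep DelVGs whose NGS count reaches the read support cutoff.

    Assumption: the cutoff is inclusive (count >= rsc).
    """
    return [row for row in rows if row[3] >= rsc]


def segment_distribution(rows, flattened):
    """Count DelVGs per segment.

    flattened=True  -> every DelVG counts once.
    flattened=False -> every DelVG is weighted by its NGS count.
    """
    counts = Counter()
    for segment, _start, _end, ngs_count in rows:
        counts[segment] += 1 if flattened else ngs_count
    return dict(counts)


kept = apply_rsc(ROWS, rsc=50)
print(segment_distribution(kept, flattened=True))
print(segment_distribution(kept, flattened=False))
```

With an RSC of 50, the two DelVGs with counts 42 and 37 are dropped. The flattened view then treats the remaining candidates equally, while the unflattened view lets highly abundant candidates dominate.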

Verified datasets

The following datasets were added to the application by us. All others were uploaded by other users, and we cannot guarantee their quality.

Dataset name	Strain	Publication
Alnaji2021	A/Puerto Rico/8/1934	'Influenza A Virus Defective Viral Genomes Are Inefficiently Packaged into Virions Relative to Wild-Type Genomic RNAs.'
Pelz2021	A/Puerto Rico/8/1934	'Semi-continuous Propagation of Influenza A Virus and Its Defective Interfering Particles: Analyzing the Dynamic Competition To Select Candidates for Antiviral Therapy.'
Wang2023	A/Puerto Rico/8/1934	'Influenza Defective Interfering Virus Promotes Multiciliated Cell Differentiation and Reduces the Inflammatory Response in Mice.'
Wang2020	A/Puerto Rico/8/1934	'Cell-to-Cell Variation in Defective Virus Expression and Effects on Host Responses during Influenza Virus Infection'
Zhuravlev2020	A/Puerto Rico/8/1934	'RNA-Seq transcriptome data of human cells infected with influenza A/Puerto Rico/8/1934 (H1N1) virus'
Kupke2020	A/Puerto Rico/8/1934	'Single-Cell Analysis Uncovers a Vast Diversity in Intracellular Viral Defective Interfering RNA Content Affecting the Large Cell-to-Cell Heterogeneity in Influenza A Virus Replication.'
VdHoecke2015	A/Puerto Rico/8/1934	'Analysis of the genetic diversity of influenza A viruses using next-generation DNA sequencing.'
Alnaji2019_Cal07	A/California/07/2009	'Sequencing Framework for the Sensitive Detection and Precise Mapping of Defective Interfering Particle-Associated Deletions across Influenza A and B Viruses.'
Alnaji2019_NC	A/New Caledonia/20-JY2/1999	'Sequencing Framework for the Sensitive Detection and Precise Mapping of Defective Interfering Particle-Associated Deletions across Influenza A and B Viruses.'
Mendes2021	A/WSN/1933	'Library-based analysis reveals segment and length dependent characteristics of defective influenza genomes.'
Boussier2020	A/WSN/1933	'RNA-seq accuracy and reproducibility for the mapping and quantification of influenza defective viral genomes.'
Alnaji2019_Perth	A/Perth/16/2009	'Sequencing Framework for the Sensitive Detection and Precise Mapping of Defective Interfering Particle-Associated Deletions across Influenza A and B Viruses.'
Berry2021_A	A/Connecticut/Flu122/2013	'High confidence identification of intra-host single nucleotide variants for person-to-person influenza transmission tracking in congregate settings'
Penn2022	A/turkey/Turkey/1/2005	'Levels of Influenza A Virus Defective Viral Genomes Determine Pathogenesis in the BALB/c Mouse Model.'
Lui2019	A/Anhui/1/2013	'SMRT sequencing revealed the diversity and characteristics of defective interfering RNAs in influenza A (H7N9) virus infection.'
Alnaji2019_BLEE	B/Lee/1940	'Sequencing Framework for the Sensitive Detection and Precise Mapping of Defective Interfering Particle-Associated Deletions across Influenza A and B Viruses.'
Berry2021_B	B/Victoria/504/2000	'High confidence identification of intra-host single nucleotide variants for person-to-person influenza transmission tracking in congregate settings'
Valesano2020_Vic	B/Victoria/504/2000	'Influenza B Viruses Exhibit Lower Within-Host Diversity than Influenza A Viruses in Human Hosts.'
Sheng2018	B/Brisbane/60/2008	'Identification and characterization of viral defective RNA genomes in influenza B virus.'
Berry2021_B_Yam	B/Yamagata/16/1988	'High confidence identification of intra-host single nucleotide variants for person-to-person influenza transmission tracking in congregate settings'
Southgate2019	B/Yamagata/16/1988	'Influenza classification from short reads with VAPOR facilitates robust mapping pipelines and zoonotic strain detection for routine surveillance applications'
Valesano2020_Yam	B/Yamagata/16/1988	'Influenza B Viruses Exhibit Lower Within-Host Diversity than Influenza A Viruses in Human Hosts.'

How to enter a custom dataset?

General info:

Custom datasets need to be in *.csv format.
They must have exactly four columns.
The order of the four columns is crucial!

Example dataset:
The example shows the required order and names of the headers. Each header name starts with a capital letter.

	Column 1	Column 2	Column 3	Column 4
header names	Segment	Start	End	NGS_read_count
column data type	character (string)	integer	integer	integer
description	Name of the segment from which the DelVG originates	Start position of the deletion site	End position of the deletion site	Number of NGS reads supporting this specific DelVG candidate

Example input file with five DelVGs:

Segment	Start	End	NGS_read_count
PB2	163	2139	42
PA	167	1990	161
PB1	113	2165	37
PB2	109	2152	69
PB2	163	2152	73
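Before uploading, a custom file can be checked against the rules above. The following Python sketch is only an illustration under assumptions: the function name validate_dataset is hypothetical, and the file is assumed to be comma-separated with the header row exactly as shown in the example.

```python
import csv
import io

# Required header names in the required order.
EXPECTED_HEADER = ["Segment", "Start", "End", "NGS_read_count"]


def validate_dataset(text):
    """Parse CSV text and return a list of (segment, start, end, count).

    Raises ValueError if the header, the column count, or a data type
    does not match the expected layout.
    Assumption: fields are separated by commas.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise ValueError(f"expected header {EXPECTED_HEADER}, got {header}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:  # skip empty lines
            continue
        if len(row) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(row)}")
        segment, start, end, count = row
        try:
            rows.append((segment, int(start), int(end), int(count)))
        except ValueError:
            raise ValueError(
                f"line {line_no}: Start, End and NGS_read_count must be integers"
            ) from None
    return rows


example = """Segment,Start,End,NGS_read_count
PB2,163,2139,42
PA,167,1990,161
PB1,113,2165,37
PB2,109,2152,69
PB2,163,2152,73
"""
print(len(validate_dataset(example)), "DelVGs parsed")
```

A file with swapped columns or non-integer positions fails early with a message that names the offending line, which is cheaper than discovering the problem after a long upload.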

What are 'direct repeats'?

The RNA sequence directly before the start point of the deletion site can be the same as the sequence directly before the end point (it is repeated). This phenomenon is described in the literature as a 'direct (sequence) repeat'. Direct repeats can be of different lengths and are discussed as a driving factor in the generation of DelVGs.

The actual start and end point of a DelVG that has a direct repeat longer than 0 cannot be determined with certainty. In the given image a direct repeat of length 3 is depicted. There are four possible ways in which it could have been created during the experiment. If n is the length of the direct repeat, there are always n+1 options for how it could have been created.
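The n+1 rule can be made concrete with a short Python sketch. It is not the application's implementation, and the coordinate convention is an assumption made for illustration: start is the 1-based position of the last nucleotide kept before the deletion, and end is the 1-based position of the first nucleotide kept after it. The function names and the toy sequence are hypothetical.

```python
def direct_repeat_length(seq, start, end):
    """Length of the repeat shared by the sequence ending at `start`
    and the sequence ending directly before `end`.

    Assumed convention: `start` is the last retained position before the
    deletion, `end` the first retained position after it (both 1-based).
    """
    n = 0
    # Walk backwards from both junction sides while the nucleotides match
    # and the two windows do not overlap.
    while (start - n > 0
           and end - 2 - n >= start
           and seq[start - n - 1] == seq[end - 2 - n]):
        n += 1
    return n


def equivalent_junctions(seq, start, end):
    """All (start, end) pairs that yield the same DelVG sequence."""
    n = direct_repeat_length(seq, start, end)
    return [(start - k, end - k) for k in range(n + 1)]


def delvg_sequence(seq, start, end):
    """Join the retained 5' part and the retained 3' part."""
    return seq[:start] + seq[end - 1:]


# Toy sequence: "CGT" occurs directly before start (5) and before end (15).
toy = "AACGTTTTTTGCGTGGAA"
options = equivalent_junctions(toy, 5, 15)
print(options)
print({delvg_sequence(toy, s, e) for s, e in options})
```

For the toy sequence the repeat has length 3, so four start/end pairs are listed, and all of them produce the identical DelVG sequence. This is why the true junction cannot be told apart from sequencing data alone.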

Contact information

To get in contact with the development team, open a new issue on GitHub. There you can get help with more detailed questions and suggest new ideas and features for the application.


Analyse single dataset

Select dataset and define parameters for analysis

Distribution of NGS count

Frame shift

Segment distribution

DelVG lengths

Distribution of deletion sites

3' and 5' sequence end comparison

Deletion site mapping

Frequency of direct repeats

Nucleotide enrichment at start of deletion site

Nucleotide enrichment at end of deletion site

Analyse multiple datasets

Select dataset and define parameters for analysis

NGS counts

Deletion shift

Segment distribution

DelVG length

Direct repeats

Nucleotide enrichment (Start position)

Nucleotide enrichment (End position)
