Standards
- FHIR
- Fast Healthcare Interoperability Resources is a standard describing data formats and elements and an application programming interface for exchanging electronic health records. The standard was created by the Health Level Seven International health-care standards organization.
- FASTA
- FASTA format is a text-based format for representing either nucleotide sequences or amino acid (protein) sequences, in which nucleotides or amino acids are represented using single-letter codes.
- The simplicity of FASTA format makes it easy to manipulate and parse sequences using text-processing tools and scripting languages like the R programming language, Python, Ruby, Haskell, and Perl
- FASTQ
- FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a single ASCII character for brevity.
- It has become the de facto standard for storing the output of high-throughput sequencing instruments such as the Illumina Genome Analyzer.
- OpenWDL
- The Workflow Description Language (WDL) is a way to specify data processing workflows with a human-readable and writeable syntax. WDL makes it straightforward to define complex analysis tasks, chain them together in workflows, and parallelize their execution.
- CWL
- Common Workflow Language (or CWL), is a growing language for defining workflows in a cross-platform and cross-domain manner. In biology in particular, we need workflows to automate complex analyses such as DNA variant calling, RNA sequencing, and genome assembly.
- VCF
- Variant Call Format, The Variant Call Format (VCF) specifies the format of a text file used in bioinformatics for storing gene sequence variations. The format has been developed with the advent of large-scale genotyping and DNA sequencing projects, such as the 1000 Genomes Project.