Subset: sequencing
URI: nmdc:sequencing
Classes
Mixins
Slots
_16s_recover - Can a 16S gene be recovered from the submitted SAG or MAG?
_16s_recover_software - Tools used for 16S rRNA gene extraction
adapters - Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported, in uppercase letters
annot - Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter
assembly_name - Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community
assembly_qual - The assembly quality category is based on the sets of criteria outlined for each category. For MISAG/MIMAG: Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with a consensus error rate equivalent to Q50 or better. High Quality Draft: Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to, total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUViG: Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totaling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated
assembly_software - Tool(s) used for assembly, including version number and parameters
bin_param - The parameters that have been applied during the extraction of genomes from metagenomic datasets
bin_software - Tool(s) used for the extraction of genomes from metagenomic datasets
chimera_check - A chimeric sequence, or chimera for short, is a sequence comprised of two or more phylogenetically distinct parent sequences. Chimeras are usually PCR artifacts thought to occur when a prematurely terminated amplicon reanneals to a foreign DNA strand and is copied to completion in the following PCR cycles. The point at which the chimeric sequence changes from one parent to the next is called the breakpoint or conversion point
compl_appr - The approach used to determine the completeness of a given SAG or MAG, which would typically make use of a set of conserved marker genes or a closely related reference genome. For UViG completeness, include reference genome or group used, and contig feature suggesting a complete genome
compl_score - Completeness score is typically based on either the fraction of markers found as compared to a database or the percent of a genome found as compared to a closely related reference genome. Expected completeness scores are: High Quality Draft: > 90%, Medium Quality Draft: > 50%, and Low Quality Draft: < 50%
compl_software - Tools used for completeness estimation, e.g. CheckM, anvi’o, BUSCO
contam_score - The contamination score is based on the fraction of single-copy genes that are observed more than once in a query genome. Acceptable scores are: High Quality Draft: < 5%, Medium Quality Draft: < 10%, Low Quality Draft: < 10%. Contamination must be below 5% for a SAG or MAG to be deposited into any of the public databases
contam_screen_input - The type of sequence data used as input
contam_screen_param - Specific parameters used in the decontamination software, such as reference database, coverage, and kmers. Combinations of these parameters may also be used, e.g. kmer and coverage, or reference database and kmer
decontam_software - Tool(s) used in contamination screening
detec_type - Type of UViG detection
feat_pred - Method used to predict UViG features such as ORFs, integration site, etc.
host_pred_appr - Tool or approach used for host prediction
host_pred_est_acc - For each tool or approach used for host prediction, estimated false discovery rates should be included, either computed de novo or from the literature
lib_layout - Specify whether to expect single, paired, or other configuration of reads
lib_reads_seqd - Total number of clones sequenced from the library
lib_screen - Specific enrichment or screening methods applied before and/or after creating libraries
lib_size - Total number of clones in the library prepared for the project
lib_vector - Cloning vector type(s) used in construction of libraries
mag_cov_software - Tool(s) used to determine the genome coverage if coverage is used as a binning parameter in the extraction of genomes from metagenomic datasets
mid - Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters
nucl_acid_amp - A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids
nucl_acid_ext - A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample
number_contig - Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG
pcr_cond - Description of reaction conditions and components of PCR in the form of ‘initial denaturation:94degC_1.5min; annealing=…’
pcr_primers - PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. This field should contain all the primers used for a single PCR reaction if multiple forward or reverse primers are present in a single PCR reaction. The primer sequence should be reported in uppercase letters
pred_genome_struc - Expected structure of the viral genome
pred_genome_type - Type of genome predicted for the UViG
reassembly_bin - Has an assembly been performed on a genome bin extracted from a metagenomic assembly?
ref_db - List of database(s) used for ORF annotation, along with version number and reference to website or publication
seq_meth - Sequencing method used, e.g. Sanger, pyrosequencing, ABI SOLiD
seq_quality_check - Indicate if the sequence has been called by automatic systems (none) or undergone a manual editing procedure (e.g. by inspecting the raw data or chromatograms). Applies only to sequences that are not submitted to SRA, ENA or DRA
sim_search_meth - Tool used to compare ORFs with database, along with version and cutoffs used
single_cell_lysis_appr - Method used to free DNA from interior of the cell(s) or particle(s)
single_cell_lysis_prot - Name of the kit or standard protocol used for cell(s) or particle(s) lysis
sop - Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences
sort_tech - Method used to sort/isolate cells or particles of interest
target_gene - Targeted gene or locus name for marker gene studies
target_subfragment - Name of subfragment of a gene or locus. Important, for example, for identifying special regions on marker genes, such as V6 on 16S rRNA
tax_class - Method used for taxonomic classification, along with reference database used, classification rank, and thresholds used to classify new genomes
tax_ident - The phylogenetic marker(s) used to assign an organism name to the SAG or MAG
trna_ext_software - Tools used for tRNA identification
trnas - The total number of tRNAs identified from the SAG or MAG
vir_ident_software - Tool(s) used for the identification of UViG as a viral genome, software or protocol name including version number, parameters, and cutoffs used
votu_class_appr - Cutoffs and approach used when clustering new UViGs in ‘species-level’ vOTUs. Note that results from standard 95% ANI / 85% AF clustering should be provided alongside vOTUs defined from another set of thresholds, even if the latter are the ones primarily used during the analysis
votu_db - Reference database (i.e. sequences not generated as part of the current study) used to cluster new genomes in ‘species-level’ vOTUs, if any
votu_seq_comp_appr - Tool and thresholds used to compare sequences when computing ‘species-level’ vOTUs
wga_amp_appr - Method used to amplify genomic DNA in preparation for sequencing
wga_amp_kit - Kit used to amplify genomic DNA in preparation for sequencing
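
The MISAG/MIMAG draft categories described under assembly_qual, compl_score and contam_score can be read as a small decision rule. The following is a minimal sketch of that reading, not part of the NMDC schema or any NMDC tooling: the function name mimag_draft_category and its parameter names are hypothetical, the thresholds are copied from the slot descriptions above, and the handling of values that fall exactly on a threshold (for example a completeness of exactly 50%) is an assumption, since the descriptions do not specify it. The "Finished" category is not covered because it depends on manual validation rather than on scores.

```python
from typing import Optional


def mimag_draft_category(
    compl_score: float,
    contam_score: float,
    has_all_rrnas: bool = False,
    trnas: int = 0,
) -> Optional[str]:
    """Suggest a MISAG/MIMAG draft category from the quoted thresholds.

    compl_score and contam_score are percentages. has_all_rrnas means the
    23S, 16S and 5S rRNA genes were all found; trnas is the tRNA count.
    Returns None when contamination is 10% or more, because none of the
    draft categories described above accepts that value.
    """
    if not 0 <= compl_score <= 100:
        raise ValueError("compl_score must be a percentage between 0 and 100")
    if contam_score < 0:
        raise ValueError("contam_score must not be negative")

    # All draft categories in the slot description require < 10% contamination.
    if contam_score >= 10:
        return None

    # High Quality Draft: > 90% complete, < 5% contamination, plus the
    # rRNA and tRNA requirements listed under assembly_qual.
    if compl_score > 90 and contam_score < 5 and has_all_rrnas and trnas >= 18:
        return "High Quality Draft"

    # Medium Quality Draft: > 50% complete (contamination already < 10%).
    if compl_score > 50:
        return "Medium Quality Draft"

    # Assumption: exactly 50% is grouped with Low Quality Draft (< 50%).
    return "Low Quality Draft"


if __name__ == "__main__":
    print(mimag_draft_category(95.0, 2.0, has_all_rrnas=True, trnas=20))
    print(mimag_draft_category(95.0, 2.0))
    print(mimag_draft_category(40.0, 3.0))
```

Note that a genome with more than 90% completeness but without the rRNA and tRNA evidence falls back to Medium Quality Draft in this sketch, which follows from the High Quality Draft criteria listed under assembly_qual.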