Description
1. QC of raw data
output files: /projects/rps/emh9/henafflab/2025-Gowanus_Biofilm_BioBAT/data/fastqc_results
output files: /projects/rps/emh9/henafflab/2025-Gowanus_Biofilm_BioBAT/data/multiqc_report
Is the data OK to proceed?
- According to the report, adapters are present, so trimming is required.
- There are some duplicate reads, so deduplication may also be worthwhile.
- The data may be contaminated with human DNA (we need to check).
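A minimal sketch of how the QC step could be scripted, assuming the reports were produced with FastQC and MultiQC (inferred from the output folder names). The helper name `qc_commands`, the input file names, and the thread count are hypothetical; the function only builds the command lines and does not run them.

```python
import shlex


def qc_commands(fastq_files, fastqc_dir, multiqc_dir, threads=4):
    """Build FastQC and MultiQC command lines as argv lists (nothing is executed)."""
    # fastqc: -o sets the output directory, -t the number of threads
    fastqc = ["fastqc", "-o", str(fastqc_dir), "-t", str(threads)]
    fastqc += [str(f) for f in fastq_files]
    # multiqc: scans the FastQC output directory, -o sets the report directory
    multiqc = ["multiqc", str(fastqc_dir), "-o", str(multiqc_dir)]
    return fastqc, multiqc


# Hypothetical sample names; replace with the real raw FASTQ files.
fastqc_cmd, multiqc_cmd = qc_commands(
    ["sample_R1.fastq.gz", "sample_R2.fastq.gz"],
    "data/fastqc_results",
    "data/multiqc_report",
)
print(shlex.join(fastqc_cmd))
print(shlex.join(multiqc_cmd))
```

To actually run the tools, each list can be passed to `subprocess.run(cmd, check=True)`.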
2. Trimming
output files: /projects/rps/emh9/henafflab/2025-Gowanus_Biofilm_BioBAT/data/qc_after_fastp
- Trimming with cutadapt
Good
output files: /projects/rps/emh9/henafflab/2025-Gowanus_Biofilm_BioBAT/data/QC/03cutadapt
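A rough sketch of a paired-end cutadapt call, for reference. The adapter sequence (the common Illumina prefix `AGATCGGAAGAGC`), the minimum length, the core count, and all file names are assumptions, not the settings actually used here; the adapter should be the one reported by the QC step. The helper `cutadapt_command` is hypothetical and only builds the command line.

```python
import shlex


def cutadapt_command(r1, r2, out_r1, out_r2,
                     adapter="AGATCGGAAGAGC", min_len=20, cores=4):
    """Build a paired-end cutadapt command as an argv list (nothing is executed)."""
    return [
        "cutadapt",
        "-a", adapter,        # 3' adapter on read 1 (assumed sequence)
        "-A", adapter,        # 3' adapter on read 2 (assumed sequence)
        "-m", str(min_len),   # discard reads shorter than this after trimming
        "-j", str(cores),     # number of CPU cores
        "-o", str(out_r1),    # trimmed read 1 output
        "-p", str(out_r2),    # trimmed read 2 output
        str(r1), str(r2),
    ]


# Hypothetical file names.
cmd = cutadapt_command(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "sample_R1.trimmed.fastq.gz", "sample_R2.trimmed.fastq.gz",
)
print(shlex.join(cmd))
```

Re-running the QC reports on the trimmed files would show whether the adapter content is gone.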
3. Deduplication
- Deduplication with clumpify:
- Another tool that I would like to check: (GitHub embed; link not preserved)
output files: /projects/rps/emh9/henafflab/2025-Gowanus_Biofilm_BioBAT/data/QC/03cutadapt/deduplicated_clumpify
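A minimal sketch of a paired-end deduplication call with clumpify (BBTools), assuming the trimmed reads are the input. The file names and the helper `clumpify_command` are hypothetical, and only the `dedupe=t` option is set; the exact parameters used for this project are not recorded here.

```python
import shlex


def clumpify_command(r1, r2, out_r1, out_r2):
    """Build a clumpify.sh command that removes duplicate read pairs (nothing is executed)."""
    return [
        "clumpify.sh",
        f"in={r1}",      # read 1 input
        f"in2={r2}",     # read 2 input
        f"out={out_r1}",   # deduplicated read 1 output
        f"out2={out_r2}",  # deduplicated read 2 output
        "dedupe=t",      # remove duplicates instead of only clustering reads
    ]


# Hypothetical file names.
cmd = clumpify_command(
    "sample_R1.trimmed.fastq.gz", "sample_R2.trimmed.fastq.gz",
    "sample_R1.dedup.fastq.gz", "sample_R2.dedup.fastq.gz",
)
print(shlex.join(cmd))
```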
Internal thoughts + questions:
Question: are there human samples in the mix?
MetaPhlAn does human gene removal, I think.
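One possible way to check for human DNA, sketched under assumptions: align the reads to a human reference with Bowtie2 and look at the overall alignment rate it reports. Bowtie2 is a suggestion, not a tool already used in this project; the index prefix, file names, and the helper `human_screen_command` are hypothetical. The function only builds the command line.

```python
import shlex


def human_screen_command(r1, r2, index_prefix, unaligned_prefix, threads=4):
    """Build a Bowtie2 command that aligns read pairs to a human index (nothing is executed)."""
    return [
        "bowtie2",
        "-p", str(threads),                    # number of alignment threads
        "-x", str(index_prefix),               # prefix of a prebuilt human Bowtie2 index (hypothetical)
        "-1", str(r1),
        "-2", str(r2),
        "--un-conc-gz", str(unaligned_prefix), # gzip pairs that did NOT align concordantly
        "-S", "/dev/null",                     # discard alignments; only the summary is needed
    ]


# Hypothetical paths.
cmd = human_screen_command(
    "sample_R1.dedup.fastq.gz", "sample_R2.dedup.fastq.gz",
    "human_index", "sample_nonhuman.fastq.gz",
)
print(shlex.join(cmd))
```

A high overall alignment rate in the Bowtie2 summary would suggest human contamination; the reads written by `--un-conc-gz` are the pairs that did not align concordantly to the human index.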