MobiVision Epigenomics Algorithm Introduction - ATAC

Algorithm Overview

MobiVision ATAC is designed for analyzing single-cell ATAC-seq data generated from the MobiNova platform. The key analytical steps are illustrated in the following diagram:

Barcode Correction

The schematic diagram of the ATAC library generated by the MobiNova platform is shown below:

The MobiDrop scATAC FASTQ data are paired-end. From 5' to 3', Read 1 consists of the cell barcode, the UMI, the MEC fixed sequence, and the insert DNA. When processing the input FASTQ files, mobivision atac first corrects the cell barcode in Read 1. If the cell barcode is found in the built-in mobivision whitelist, the read carries a valid cell barcode and proceeds to the next analysis step. If it is not in the whitelist, the read is invalid and is discarded. When the cell barcode is compared with the whitelist sequences, a Hamming distance of at most 1 per 10 bases is tolerated. In the output valid reads, each Read 1 is assigned its corrected cell barcode. The cell barcode and UMI sequences are stored in the read ID rather than in the read sequence.
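The whitelist check can be illustrated with a short Python sketch. It assumes that "at most 1 mismatch per 10 bases" applies to each consecutive 10-base segment of the barcode (a shorter final segment also tolerates one mismatch), and that a barcode matching more than one whitelist entry is discarded as ambiguous. Both rules are assumptions, and the function names are hypothetical rather than part of mobivision. The scan is brute force for clarity. A real tool would index the whitelist instead.

```python
from typing import Optional

SEGMENT = 10  # assumption: tolerance is "at most 1 mismatch per 10-base segment"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def within_tolerance(observed: str, candidate: str) -> bool:
    """True if every 10-base segment differs by at most one base."""
    if len(observed) != len(candidate):
        return False
    for i in range(0, len(observed), SEGMENT):
        if hamming(observed[i:i + SEGMENT], candidate[i:i + SEGMENT]) > 1:
            return False
    return True


def correct_barcode(observed: str, whitelist: set[str]) -> Optional[str]:
    """Return the corrected barcode, or None if the read should be discarded."""
    if observed in whitelist:
        return observed  # exact match, no correction needed
    hits = [bc for bc in whitelist if within_tolerance(observed, bc)]
    # Assumption: ambiguous barcodes (several whitelist hits) are discarded.
    return hits[0] if len(hits) == 1 else None
```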

Reads with corrected cell barcodes then undergo adapter trimming. From Read 1, the MEB sequence at the 3' end and the reverse complement of the MEC sequence at the 5' end are removed. From Read 2, the MEC sequence at the 3' end is removed. Adapter matching tolerates a mismatch rate of up to 0.1. After trimming, the resulting valid, clean FASTQ files are used for alignment.
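Trimming with a 0.1 mismatch rate can be sketched as a scan for the leftmost read position where the remainder of the read matches the start of the adapter. This follows cutadapt-style 3' adapter semantics. The helper `trim_3prime_adapter`, the minimum overlap of 3 bases, and the rule of allowing floor(0.1 × overlap) mismatches are illustrative assumptions. The sketch covers only the 3'-end case, and the example adapter below is arbitrary, not the MEB or MEC sequence.

```python
def trim_3prime_adapter(read: str, adapter: str,
                        max_error_rate: float = 0.1,
                        min_overlap: int = 3) -> str:
    """Cut the read at the leftmost position where it matches the adapter prefix.

    Hypothetical helper: the overlap may be the full adapter (followed by any
    downstream bases) or a partial adapter running off the read's 3' end.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        mismatches = sum(r != a for r, a in zip(window, adapter[:overlap]))
        # Allowed mismatches scale with overlap length (floor of rate x length).
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]  # drop the adapter and everything after it
    return read  # no adapter found
```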

Alignment

Mobivision atac uses the built-in bowtie2 software for paired-end alignment, generating a .bam output file that includes both mapped and unmapped reads.

The aligned BAM file is then filtered and deduplicated. Only paired-end alignments with MapQ ≥ 30 that span at most 2000 bp are retained. Duplicate fragments, i.e., fragments that share the same cell barcode, chromosome name, alignment start, and alignment end, are removed, producing the filtered and deduplicated filtered.bed file. This file is then used to generate a .bw (bigWig) file for visualization. For dual-species samples, a separate .bw file is generated for each species.
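The filtering and deduplication rules can be written as a single pass over fragment records. This Python sketch assumes a hypothetical `Fragment` record with BED-style half-open coordinates, measures fragment length as end minus start, and keeps the first occurrence of each duplicate. Real input would come from the paired alignments in the BAM file.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_MAPQ = 30
MAX_FRAGMENT_LEN = 2000


@dataclass(frozen=True)
class Fragment:  # hypothetical record for one paired-end alignment
    barcode: str
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    mapq: int


def filter_and_dedup(fragments: Iterable[Fragment]) -> list[Fragment]:
    """Keep high-MapQ, short-enough fragments; drop per-cell duplicates."""
    seen = set()
    kept = []
    for f in fragments:
        if f.mapq < MIN_MAPQ or (f.end - f.start) > MAX_FRAGMENT_LEN:
            continue
        key = (f.barcode, f.chrom, f.start, f.end)
        if key in seen:
            continue  # duplicate: same cell and same fragment coordinates
        seen.add(key)
        kept.append(f)
    return kept


def to_bed_lines(fragments: Iterable[Fragment]) -> list[str]:
    """Render fragments as tab-separated BED lines (chrom, start, end, barcode)."""
    return [f"{f.chrom}\t{f.start}\t{f.end}\t{f.barcode}" for f in fragments]
```

Identical coordinates from different barcodes are kept, because the duplicate key includes the cell barcode.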

Peak Calling and Annotation

The filtered and deduplicated filtered.bed file is used for peak calling with the macs2 software built into mobivision atac. Narrow peaks are called by default. To call broad peaks, the --peaktypebroad parameter must be specified. If --control is specified, IgG data are used as the control during peak calling to correct for background noise. The final output is a peak file with the extension .narrowPeak or .broadPeak.
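The exact MACS2 arguments used by mobivision atac are not documented here. These include genome size, shift or extension settings, and significance cutoffs. As a hypothetical illustration only, the two options described above correspond to MACS2's own `callpeak` options: `-c` for a control sample and `--broad` for broad peaks. `build_macs2_command` and `expected_peak_file` are invented helpers. All other MACS2 settings, including input-format detection, are left at MACS2's defaults.

```python
from typing import Optional


def build_macs2_command(treatment_bed: str, name: str,
                        peak_type: str = "narrow",
                        control_bed: Optional[str] = None) -> list[str]:
    """Build a `macs2 callpeak` argument list (hypothetical helper)."""
    if peak_type not in ("narrow", "broad"):
        raise ValueError("peak_type must be 'narrow' or 'broad'")
    cmd = ["macs2", "callpeak", "-t", treatment_bed, "-n", name]
    if control_bed is not None:
        cmd += ["-c", control_bed]  # e.g. IgG data as background control
    if peak_type == "broad":
        cmd.append("--broad")  # MACS2 calls narrow peaks unless --broad is given
    return cmd


def expected_peak_file(name: str, peak_type: str = "narrow") -> str:
    """MACS2 names its peak output <name>_peaks.narrowPeak or .broadPeak."""
    return f"{name}_peaks.{peak_type}Peak"
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.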

The obtained peaks file is annotated based on the following principles:

● The promoter region is defined as the interval from 1000 bp upstream to 100 bp downstream of the transcription start site (TSS), i.e., (-1 kb, +100 bp).

● A distal peak is a peak that lies within 200 kb of its nearest TSS but does not fall within a promoter region.

● A peak that overlaps a transcript but meets neither the promoter criterion nor the 200 kb distal criterion above is also classified as a distal peak.

● Peaks that fall into none of the above categories are classified as intergenic peaks.
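The annotation rules can be expressed as an ordered sequence of checks. In this Python sketch, the coordinate conventions are assumptions: half-open intervals, strand-aware promoter windows, and distance measured from the nearest peak edge to the TSS. The input tuples are also assumed, and `annotate_peak` is a hypothetical helper, not a mobivision function.

```python
PROMOTER_UP = 1000      # bp upstream of the TSS
PROMOTER_DOWN = 100     # bp downstream of the TSS
DISTAL_MAX = 200_000    # max distance to nearest TSS for a distal peak


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open interval overlap of at least one base."""
    return a_start < b_end and b_start < a_end


def promoter_window(tss: int, strand: str) -> tuple[int, int]:
    """(-1 kb, +100 bp) relative to the TSS, oriented by strand."""
    if strand == "+":
        return tss - PROMOTER_UP, tss + PROMOTER_DOWN
    return tss - PROMOTER_DOWN, tss + PROMOTER_UP


def distance_to_point(start: int, end: int, pos: int) -> int:
    """Distance from a half-open interval to a position (0 if inside)."""
    if pos < start:
        return start - pos
    if pos >= end:
        return pos - end + 1
    return 0


def annotate_peak(chrom, start, end, tss_list, transcripts) -> str:
    """tss_list: (chrom, pos, strand); transcripts: (chrom, start, end)."""
    tss_here = [(p, s) for c, p, s in tss_list if c == chrom]
    # Rule 1: any overlap with a promoter window.
    if any(overlaps(start, end, *promoter_window(p, s)) for p, s in tss_here):
        return "promoter"
    # Rule 2: within 200 kb of the nearest TSS.
    if tss_here and min(distance_to_point(start, end, p) for p, _ in tss_here) <= DISTAL_MAX:
        return "distal"
    # Rule 3: overlaps a transcript but failed both rules above.
    if any(c == chrom and overlaps(start, end, ts, te) for c, ts, te in transcripts):
        return "distal"
    return "intergenic"
```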

Valid Fragments

Valid fragments, also called fragments in peaks (fragmentsInPeaks), are fragments that overlap a peak region by at least one base. The number of such fragments per cell barcode is used as input for cell calling.
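Counting fragments in peaks per barcode is an interval-overlap query. The sketch assumes half-open BED coordinates and non-overlapping peaks. Under those assumptions, a binary search over the sorted peak starts finds the single candidate peak that must be checked for each fragment. The helper names are hypothetical.

```python
import bisect
from collections import Counter, defaultdict


def build_index(peaks):
    """Group (chrom, start, end) peaks by chromosome as sorted start/end lists."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def in_peak(index, chrom, start, end) -> bool:
    """True if the fragment [start, end) shares at least one base with a peak."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Last peak that starts before the fragment ends; with non-overlapping
    # sorted peaks it also has the largest end, so it is the only one to test.
    i = bisect.bisect_left(starts, end) - 1
    return i >= 0 and ends[i] > start


def fragments_in_peaks_per_barcode(fragments, peaks) -> Counter:
    """fragments: (barcode, chrom, start, end) tuples."""
    index = build_index(peaks)
    counts = Counter()
    for barcode, chrom, start, end in fragments:
        if in_peak(index, chrom, start, end):
            counts[barcode] += 1
    return counts
```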

Cell Calling

mobivision atac currently uses a dynamic threshold strategy for cell barcode filtering. First, all barcodes are sorted in descending order by the number of fragments falling within peak regions. Given the expected cell number N (default 3000), the fragment count of the barcode ranked at the 95th-percentile position of N (position 2850 when N = 3000) is taken as m. The threshold is then set to m/10, and all barcodes whose fragment counts exceed this threshold are identified as valid cells. For example, when N = 3000 and m = 20000, the threshold is 2000, and all barcodes with more than 2000 fragments in peaks are retained (9000 cells in the illustrated example). Because the threshold is derived from each dataset's own fragment distribution, the filtering criterion adjusts automatically, giving reliable cell identification across datasets of varying scale.
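The dynamic threshold can be sketched as follows. Two details are assumptions. The 95th-percentile position is rounded down with integer arithmetic (2850 for N = 3000), and it is clamped when fewer barcodes than that are observed. `call_cells` is a hypothetical helper.

```python
def call_cells(fip_counts: dict[str, int], expected_cells: int = 3000,
               percentile: int = 95, divisor: int = 10):
    """Return (cell barcodes, threshold) from per-barcode fragments-in-peaks counts."""
    ranked = sorted(fip_counts.values(), reverse=True)  # descending counts
    if not ranked:
        return set(), 0.0
    # 1-based rank at the 95th percentile of N, e.g. 2850 when N = 3000.
    position = max(1, expected_cells * percentile // 100)
    position = min(position, len(ranked))  # assumption: clamp to observed barcodes
    m = ranked[position - 1]
    threshold = m / divisor
    cells = {bc for bc, n in fip_counts.items() if n > threshold}
    return cells, threshold
```

With N = 3000 and a count of 20000 at rank 2850, the threshold comes out as 2000.0, matching the worked example above.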

Report Generation

Based on the above analysis results and intermediate data, a summary report of the sample analysis is generated, including the following five sections: Sequencing, Mapping, Cell, Targeting, and t-SNE Projection.

1. Sequencing: Primarily provides statistics on the sequencing quality of the input library.

2. Mapping: Summarizes the alignment results of the library.

3. Cell: Provides statistics on the final cell calling results and the generated matrix.

4. Targeting: Includes annotation statistics for fragments and peaks.

5. t-SNE Projection: Uses LSA for dimensionality reduction, t-SNE for mapping, and Louvain for clustering.
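LSA here means a TF-IDF transform of the cell-by-peak matrix followed by truncated SVD. The NumPy sketch below uses one common TF-IDF variant with a scale factor of 10^4. The exact weighting and the number of components used for the report are not specified, so both are assumptions. The function `lsa` is hypothetical, and the t-SNE and Louvain steps are omitted.

```python
import numpy as np


def lsa(counts: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Reduce a cells x peaks count matrix to n_components dimensions."""
    counts = np.asarray(counts, dtype=float)
    # Term frequency: normalize each cell (row) by its total counts.
    cell_totals = counts.sum(axis=1, keepdims=True)
    tf = counts / np.maximum(cell_totals, 1.0)
    # Inverse document frequency: down-weight peaks with high total counts.
    n_cells = counts.shape[0]
    peak_totals = counts.sum(axis=0)
    idf = np.log1p(n_cells / np.maximum(peak_totals, 1.0))
    tfidf = np.log1p(tf * idf * 1e4)  # assumption: scale factor 1e4
    # Truncated SVD: keep the leading components as cell coordinates.
    u, s, _vt = np.linalg.svd(tfidf, full_matrices=False)
    k = min(n_components, s.size)
    return u[:, :k] * s[:k]
```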
