Introduction
This package contains utilities which may be useful in working with AIRR-C JSON germline sets
Installation
The package requires python, and can be installed using pip:
pip install receptor-germline-tools
It requires Biopython, which may need to be installed separately.
Using the rich information in AIRR-C germline sets
Quick summary:
Given an AIRR-C germline set germlines.json
, create a FASTA-formatted IMGT-gapped set, in which the header lists paralogs (other items
in the set with the same sequence) and aliases (other names by which this item has been known):
create_fasta -i germlines.json -o germlines.fasta -g -f paralogs,aliases
Given an AIRR-C rearrangements file or Changeo file rearrangements.tsv
, create a copy that also includes columns for subgroup number,
gene number and allele number of each V,D and J allele, using information from the germline set:
add_germline_annotations rearrangements.tsv germlines.json \
rearrangements_annotated.tsv \
-f subgroup_designation,gene_designation,allele_designation
Fields that can be included with the -f option
More detail:
create_fasta - Create a gapped or ungapped FASTA file from an AIRR-C JSON-format germline set, optionally containing additional information in the header
add_germline_annotations - Annotate an AIRR-C rearrangements file with additional germline information from a AIRR-C JSON-format germline set
Working with IGLabel-style sequence names
Alleles in the mouse germline sets are allocated four-letter IgLabel-style labels with ‘dummy’ subgroup and allele designations. These can be used as-is with many tools.
add_germline_annotations - Convert IgLabel-style labels in an alignment file to dummy IUIS format
convert_fasta_labels - Create a custom auxiliary data file for IgBlast