Ariya Shajii's personal website



Seq is a programming language for computational genomics applications. With a syntax heavily inspired by (and virtually identical to) Python's and an LLVM backend, Seq aims to avoid the tedium of C programming while attaining better performance by applying domain-specific optimizations. (More details coming soon.)


EMA is an alignment tool for barcoded short-read sequencing data, such as that produced by 10x Genomics' Chromium platform. EMA is faster and more accurate than current aligners, and produces not only the final alignments but also interpretable per-alignment probabilities. Moreover, EMA can effectively align reads in particularly tricky genomic regions containing nearby homologous elements by exploiting the non-uniform read densities characteristic of barcoded read sequencing.


LAVA (Lightweight Assignment of Variant Alleles) is an NGS-based genotyping algorithm for a given set of SNP loci, which takes advantage of the fact that inexact matching of mid-size k-mers (with k = 32) can typically uniquely identify loci in the human genome without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and Affymetrix's Genome-Wide Human SNP Array 6.0, while performing 4-9 times faster than a standard NGS genotyping pipeline and optionally using as little as 5.2 GB of RAM. As such, LAVA represents a scalable computational method for population-level genotyping studies as well as a flexible NGS-based replacement for SNP arrays.
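
To give a rough feel for the k-mer idea described above, here is a minimal Python sketch. It is not LAVA's actual implementation: the function names, the toy locus "rs_demo", its flanking sequences, and the tiny k = 5 (instead of 32) are all assumptions made for illustration, and "inexact matching" is simplified here to mean at most one mismatch, used only when no exact match exists.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def one_mismatch_neighbors(kmer):
    """Yield every k-mer at Hamming distance exactly 1 from kmer."""
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]

def build_index(loci, k):
    """Map each k-mer that covers a SNP to the (locus, allele) pairs it supports.

    loci: {name: (left_flank, ref_allele, alt_allele, right_flank)} (hypothetical layout).
    """
    index = defaultdict(set)
    for name, (left, ref, alt, right) in loci.items():
        for allele in (ref, alt):
            # Every k-mer of this (2k - 1)-base window contains the SNP position.
            context = left[-(k - 1):] + allele + right[:k - 1]
            for km in kmers(context, k):
                index[km].add((name, allele))
    return index

def match_kmer(kmer, index):
    """Return the unique (locus, allele) supported by kmer, or None.

    Exact matches win; otherwise fall back to one-mismatch neighbors.
    Ambiguous k-mers (more than one candidate) are discarded.
    """
    hits = set(index.get(kmer, ()))
    if not hits:
        for cand in one_mismatch_neighbors(kmer):
            hits |= index.get(cand, set())
    return next(iter(hits)) if len(hits) == 1 else None

def count_alleles(reads, index, k):
    """Count k-mer support per (locus, allele) across all reads, without alignment."""
    counts = defaultdict(int)
    for read in reads:
        for km in kmers(read, k):
            hit = match_kmer(km, index)
            if hit is not None:
                counts[hit] += 1
    return counts

# Toy data (made up for illustration only).
K = 5
LOCI = {"rs_demo": ("ACGTTGCA", "A", "G", "TTCAGGAC")}
INDEX = build_index(LOCI, K)
COUNTS = count_alleles(["TGCAATT", "GCAGTTC"], INDEX, K)
```

In this sketch a genotype could then be called from the relative support for each allele in COUNTS. Exact matches are checked first because a reference k-mer and its alternate counterpart differ only at the SNP position, so a naive one-mismatch search would always find both.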


Rho is a lightweight, extensible, dynamically typed programming language written in C, including a compiler and virtual machine. While the language is largely inspired by CPython, it offers several features not found in Python, such as GIL-free multithreading via actors. (No longer actively maintained.)


Play poker in IRC. That is all. (No longer actively maintained.)