The Opportunity
Key to insitro’s approach to rethinking drug development is the use of machine learning on high-content human data to build models of biological states and understand the effect of perturbations on those states. A particular focus area is molecular measurements of biological states, including levels of gene expression (transcriptomics) and protein expression (proteomics).
As an applied Machine Learning Scientist for Molecular Omics, you will develop and apply cutting-edge machine learning and bioinformatic methods to analyze high-content omics data, build models of biological state as manifested in these data, and uncover new disease biology. We acquire such data both from human samples and from our high-throughput wet lab, where we build and assay iPSC-derived cellular disease models under genetic and chemical perturbation, using both single-cell and bulk RNA-seq. You will devise new and meaningful representations for these omics data sets that reveal underlying biological processes and the effect of diverse factors and interventions on those biologies.
Your work will involve the development and deployment of cutting-edge methods in classical genomics and machine learning, including deep learning. The data we deal with will require addressing challenges such as distribution shift, experimental artifacts, data sparsity, and small sample sizes. You will need to develop fit-for-purpose approaches that utilize methods such as self-supervised learning, multi-task learning, few-shot learning, network models, and more. You will work in collaboration with the software engineering team to develop these methods as robust, reusable platform components.
You will work closely with biological collaborators to design and analyze in-house experiments, ensuring that the experimental designs produce data that are fit for purpose for machine learning. You will also provide input to our corporate development team on initiatives to acquire or construct data from external sources. Finally, you will integrate data with patient genetics and diverse clinical and cellular phenotypes (including microscopy) to identify molecular targets for impactful therapeutics.
You will be joining an agile and fast-growing biotech startup that has long-term stability due to significant funding. You will have ample opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!
About You
- Ph.D. in computational biology, genetics, biomedical informatics, biostatistics, bioinformatics, computer science, machine learning, or a related discipline, or equivalent practical experience (e.g., a Master’s degree plus 2 years of relevant industry experience)
- Experience using and developing cutting-edge methods for analyzing NGS and/or proteomics data sets
- Strong fundamentals in applied multivariate statistics
- Expertise in machine learning (including deep learning); familiarity with applying machine learning to molecular omics data
- Strong programming skills in Python, or strong programming skills in R and experience in Python
- Interest in uncovering novel disease biology
- Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions in a fast-paced startup environment
- Passion for making a difference in the world
Nice to Have
- Familiarity with common deep learning toolkits such as TensorFlow, PyTorch, and Keras
- Experience with modeling sequencing artifacts (e.g., GC content, fragment length bias, overdispersion) and interpretation of QC measurements to guide assay development
- Expertise with NGS data processing tools (samtools, GATK, IGV, etc.)
- Experience working with diverse functional genomic assays (RNA/DNase/ATAC/ChIP-seq, etc.); exposure to CRISPR-based experiments a plus
- Some understanding of human physiology or disease biology (especially cancer, metabolism, or neurodegeneration)
- Publication record of high-quality work in biomedical, machine learning, or statistics venues
- Proficiency in modern software development tools, such as a Linux environment (including shell/Bash scripting), version control practices and tools (e.g., Git), or modern workflow management frameworks (Snakemake, Cromwell/CWL/WDL, Nextflow, etc.)
- Familiarity with cloud computing services (e.g., AWS or GCP) and workflow management tools or batch scheduling systems (e.g., SLURM)
- Proficiency in C++ or other compiled, statically-typed languages
- Experience with database languages (e.g., SQL)
Compensation & Benefits at insitro
Our target starting salary for successful US-based applicants for this role is $160,000 - $215,000. To determine starting pay, we consider multiple job-related factors including a candidate’s skills, education and experience, the level at which they are actually hired, market demand, business needs, and internal parity. We may also adjust this range in the future based on market data.
This role is eligible for participation in our Annual Performance Bonus Plan (based on company targets by role level and annual company performance) and our Equity Incentive Plan, subject to the terms of those plans and associated policies.
In addition, insitro provides our employees:
- 401(k) plan with employer matching for contributions
- Excellent medical, dental, and vision coverage (insitro pays 100% of premiums for employees), as well as mental health and well-being support
- Open, flexible vacation policy
- Paid parental leave
- Quarterly budget for books and online courses for self-development
- Support to occasionally attend professional conferences that are meaningful to your career growth and development
- New hire stipend for home office setup
- Monthly cell phone & internet stipend
- Access to free onsite baristas and cafe with daily lunch and breakfast
- Access to free onsite fitness center
- Commuter benefits
About insitro
insitro is a data-driven drug discovery and development company using machine learning and data at scale to transform the way that drugs are discovered and developed for patients. insitro is developing predictive machine learning models to discover underlying biologic state based on human cohort data and in-house generated cellular data at scale. These predictive models can be brought to bear on key bottlenecks in pharmaceutical R&D to advance novel targets and patient biomarkers, design therapeutics, and inform clinical strategy. insitro is advancing a wholly owned and partnered pipeline of biologic insights and molecules in neuroscience and metabolic diseases. Since its formation in mid-2018, insitro has raised over $700 million from top tech, biotech, and crossover investors and from collaborations with pharmaceutical partners. For more information on insitro, please visit the company’s website at www.insitro.com.