The Opportunity
Software engineering plays a key role in insitro’s approach to rethinking drug development. Our team is responsible for ensuring that our biological data factory’s robots and instruments produce high-quality data, and for optimizing the storage, querying, and ingestion of petabytes of experimental results. On top of this stack, we build the infrastructure that our machine learning engineers and scientists use to train powerful models that solve key problems in the drug development process.
You will work closely with a cross-functional team of scientists, bioengineers, and data scientists to develop data architectures and systems on the high-throughput platforms that enable our scientists to be maximally productive. You will design, implement, and deploy novel methods that use a broad spectrum of data engineering approaches, including techniques at the forefront of the field. You will work as part of a team to rigorously design our data platform, identify key architectural performance improvements, and support ongoing discovery and automation platforms.
Here are some examples of the kinds of projects you can expect to work on:
- Design cheminformatic tools and pipelines that enable us to wrangle multi-billion molecule screening libraries.
- Build image processing pipelines that transform raw microscopy data into phenotype predictions through our machine learning processes.
- Architect data cataloging and provenance tracking solutions that accelerate multidisciplinary scientific teams.
You will be joining the founding team of a biotech startup that has long-term stability due to significant funding, yet is still very much in formation. A lot can change in this early and exciting phase, providing many opportunities for significant impact. You will work closely with a very talented team, learn a broad range of skills, and help shape insitro’s culture, strategic direction, and outcomes. Join us, and help make a difference to patients!
About You
- BS, MS, or Ph.D. in computer science, statistics, mathematics, physics, engineering, or equivalent practical experience.
- Expertise in one or more general-purpose programming languages such as Python, C/C++, or Go. We primarily use Python.
- Familiarity with cloud computing services. We use AWS.
- Familiarity with database technologies, data pipelines, workflow engines, and distributed computing technologies such as Spark. We primarily use Postgres, redun, and Spark.
- Familiarity with web services and application frameworks such as Django and Flask.
- Ability to communicate effectively and collaborate with people of diverse backgrounds and job functions.
- Proficiency in Linux environments, including shell scripting, and experience with version control practices and tools such as git.
- Passion for making a positive impact through your work.
Nice to Have
- Experience with biological data such as DNA sequences, RNAseq, proteomics, and microscopy images.
- Experience with medium-sized data sets (100TB+).
- Experience with the SciPy/PyData ecosystem (numpy, pandas, scipy, dask, etc.).
- Demonstrated ability to develop novel data engineering methods that go beyond putting together existing code, and to apply problem-solving skills to complex issues.
- Real-world work experience in software development for high-end data processing engines.
Benefits at insitro
- Highly competitive salary
- Health insurance benefits
- Gym allowance
- Flexible work schedule
- Home office equipment
About insitro