Principal Workload Performance Architect

Tenstorrent in Austin, TX

The Tenstorrent team combines technologists from different disciplines who come together with a shared passion for AI and a deep desire to build great products. We value collaboration, curiosity, and a commitment to solving hard problems. Find out more about our culture.

AI is redefining the computing paradigm. The computational demands of this new paradigm cannot be met by existing software and hardware approaches. The best AI solutions require unifying innovations across the software programming model, compiler technology, heterogeneous compute platforms, networking technology, and semiconductor process and packaging technology. Tenstorrent drives these innovations by taking a holistic view of each software and hardware component and unifying them to create the best AI platform.

As a performance architect on the dynamic and motivated Tenstorrent Platform Architecture team, you will work in a cross-functional team on ML software stacks, HPC and general-purpose workloads, graph compilers, cache coherence protocols, superscalar CPUs, fabric/interconnect, networking, and DPUs.
Locations:
We have a presence in Toronto, Austin, Santa Clara, Portland, and Raleigh. We are open to remote candidates on a case-by-case basis.

Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.

Due to U.S. Export Control laws and regulations, Tenstorrent is required to ensure compliance with licensing regulations when transferring technology to nationals of certain countries that have been sanctioned by the U.S. government.

As this position will have direct and/or indirect access to information, systems, or technologies that are subject to U.S. Export Control laws and regulations, please note that citizenship/permanent residency information and/or documentation will be required and considered as Tenstorrent moves through the employment process.
Responsibilities:
    • Collaborate with the software and platform architecture teams to understand hardware requirements for the AI accelerator compiler, OS, video/image/voice processing, security, networking, and virtualization technology. Identify application performance bottlenecks and functional requirements.
    • Perform full-stack workload characterization and performance analysis for AI, HPC, and general-purpose CPU applications. Identify representative benchmarks for the workloads. Perform data-driven analysis based on software profiling, performance model simulation, or analytical models to evaluate software and architecture solutions against power, performance, and area (PPA) goals.
    • Set CPU architecture direction based on the data analysis and work with cross-functional teams to achieve the best hardware/software solutions that meet PPA goals.
    • Characterize real-world workloads and conduct end-to-end system performance analysis and workload decomposition to gather requirements for SoC solutions. Generate representative CPU, accelerator, and SoC traces for the performance model to study PPA impacts and guide architecture decisions.
    • Work with Tenstorrent's graph compiler team and the LLVM/GCC open-source communities to drive AI/CPU performance improvements. Identify compiler optimizations and align the architecture and compiler teams on implementing them.
    • Drive analysis and correlation of performance features, both pre- and post-silicon.
Qualifications:
    • BS/MS/PhD in EE/ECE/CE/CS.
    • Strong background in CPU ISAs, microarchitecture research, and performance benchmarks.
    • Understanding of SoC fabrics, coherency protocols, memory technology, and accelerator technology is a plus.
    • Familiarity with program tracing flows (e.g., SimPoint, SMARTS) for capturing application traces.
    • Strong understanding of ML/AI algorithms, GCC and LLVM compilers, and OS kernels.
    • Proficiency in C/C++ programming and experience developing highly efficient C/C++ performance models.