Sr. Principal SOC Fabric Architect

Tenstorrent in Toronto, Ontario

The Tenstorrent team combines technologists from different disciplines who come together with a shared passion for AI and a deep desire to build great products. We value collaboration, curiosity, and a commitment to solving hard problems. Find out more about our culture.

AI is redefining the computing paradigm. The computational demands of this new paradigm cannot be met with existing software and hardware approaches. The best AI solutions require unifying innovations in the software programming model, compiler technology, heterogeneous computation platforms, networking technology, and semiconductor process and packaging technology. Tenstorrent drives these innovations by taking a holistic view of each technological component, across software and hardware, and unifying them to create the best AI platform.

As a performance architect on the dynamic and motivated Tenstorrent Platform Architecture team, you will work in a cross-functional team on ML software stacks, HPC and general-purpose workloads, graph compilers, cache coherency protocols, superscalar CPUs, fabric/interconnect, networking, and DPUs.
Locations:
We have a presence in Toronto, Austin, Santa Clara, Portland, and Raleigh. We are open to remote candidates on a case-by-case basis.

Tenstorrent offers a highly competitive compensation package and benefits, and we are an equal opportunity employer.
Responsibilities:
    • Collaborate with the software and platform architecture teams to understand fabric bandwidth and latency requirements and real-time constraints for AI accelerator, CPU, security, and networking traffic. Devise QoS and ordering rules among CPU, accelerator, and IO coherent/non-coherent traffic.
    • Identify representative traffic patterns for the software applications. Perform data-driven analysis to evaluate fabric topology, QoS, memory architecture, and microarchitecture solutions that improve performance and power efficiency or reduce hardware cost.
    • Create directory-based cache coherency specifications that satisfy the performance requirements of coherent multi-cluster CPU systems and accelerators. Trade off protocol complexity against performance requirements.
    • Design the cache hierarchy to achieve the best performance.
    • Set the SoC architecture direction based on data analysis and work with a cross-functional team to achieve the best hardware/software solutions that meet PPA goals.
    • Develop a cycle-accurate SoC performance model that describes the microarchitecture, including memory subsystems, directory-based coherent cache controllers, fabric interconnects, and fabric switches, and use it to evaluate new features.
    • Collaborate with RTL and Physical design engineers to make power, performance, and area trade-offs.
    • Drive analysis and correlation of performance features both pre- and post-silicon.
Qualifications:
    • BS/MS/PhD in EE/ECE/CE/CS.
    • Strong grasp of NoC topologies, routing algorithms, queuing, traffic scheduling, and QoS requirements.
    • Expertise in cache coherency protocols (AMBA CHI/AXI), DDR/LPDDR/GDDR memory technologies, and IO technologies (PCIe/CCIX/CXL).
    • Prior experience or strong understanding of traffic patterns for ML/AI algorithms in a heterogeneous computation system is a plus.
    • Prior experience on formal verification of cache coherence protocols is a plus.
    • Proficiency in C/C++ programming and experience developing highly efficient C/C++ CPU models.