Experience

GPU Architect Intern at Samsung ACL (May 2019 - Aug 2019)

Worked on texture unit bottleneck analysis and proposed an approximate texture filtering technique to improve the efficiency of existing and emerging programs.
Explored operand reuse for improving vector register file bandwidth.
Performed studies in the production C++ architecture simulator used by product teams to define next-generation architectures.
Work resulted in filing one US patent application.

CPU Design Intern at Intel Corporation (Sep 2017 - Dec 2017)

Proposed microarchitecture techniques to improve the instruction decode and steering stages (CPU front end).
Developed RTL for the above blocks and performed area, power, and timing analysis, as well as formal verification.
Performed synthesis and PnR of existing blocks to analyze timing failures and study the impact of different architectural choices.

ML Accelerator Architect Intern at Hewlett Packard Labs (May 2017 - Aug 2017)

Designed PUMA, an ISA-programmable, general-purpose architecture built with NVM crossbars that can implement all varieties of ML applications (CNN, LSTM, MLP, etc.).
Developed a cycle-level simulator (performance, power and functionality) for the proposed architecture for design space exploration and benchmarking.
Proposed compiler optimizations for improving performance.
Work has been adopted for Advanced Development projects at HPE (DPE-PUMA).