FPGA programming for the masses DF Bacon, R Rabbah, S Shukla Communications of the ACM 56 (4), 56-63, 2013 | 261 | 2013 |
A scalable multi-TeraOPS deep learning processor core for AI training and inference B Fleischer, S Shukla, M Ziegler, J Silberman, J Oh, V Srinivasan, J Choi, ... 2018 IEEE symposium on VLSI circuits, 35-36, 2018 | 153 | 2018 |
Approximate computing: Challenges and opportunities A Agrawal, J Choi, K Gopalakrishnan, S Gupta, R Nair, J Oh, DA Prener, ... 2016 IEEE International Conference on Rebooting Computing (ICRC), 1-8, 2016 | 122 | 2016 |
Single bit error correction implementation in CRC-16 on FPGA S Shukla, NW Bergmann Proceedings. 2004 IEEE International Conference on Field-Programmable …, 2004 | 85 | 2004 |
A compiler and runtime for heterogeneous computing J Auerbach, DF Bacon, I Burcea, P Cheng, SJ Fink, R Rabbah, S Shukla Proceedings of the 49th Annual Design Automation Conference, 271-276, 2012 | 84 | 2012 |
9.1 A 7nm 4-core AI chip with 25.6 TFLOPS hybrid FP8 training, 102.4 TOPS INT4 inference and workload-aware throttling A Agrawal, SK Lee, J Silberman, M Ziegler, M Kang, S Venkataramani, ... 2021 IEEE International Solid-State Circuits Conference (ISSCC) 64, 144-146, 2021 | 79 | 2021 |
RaPiD: AI accelerator for ultra-low precision training and inference S Venkataramani, V Srinivasan, W Wang, S Sen, J Zhang, A Agrawal, ... 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021 | 78 | 2021 |
QUKU: a two-level reconfigurable architecture S Shukla, NW Bergmann, J Becker IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and …, 2006 | 60 | 2006 |
Efficient AI system design with cross-layer approximate computing S Venkataramani, X Sun, N Wang, CY Chen, J Choi, M Kang, A Agarwal, ... Proceedings of the IEEE 108 (12), 2232-2250, 2020 | 53 | 2020 |
A 3.0 TFLOPS 0.62 V scalable processor core for high compute utilization AI training and inference J Oh, SK Lee, M Kang, M Ziegler, J Silberman, A Agrawal, ... 2020 IEEE Symposium on VLSI Circuits, 1-2, 2020 | 43 | 2020 |
Tightly coupled processor arrays using coarse grained reconfigurable architecture with iteration level commits CY Chen, K Gopalakrishnan, J Oh, SK Shukla, V Srinivasan US Patent 10,120,685, 2018 | 37 | 2018 |
Tightly coupled processor arrays using coarse grained reconfigurable architecture with iteration level commits CY Chen, K Gopalakrishnan, J Oh, LM Saltzman, SK Shukla, V Srinivasan US Patent 10,528,356, 2020 | 35 | 2020 |
A scalable multi-TeraOPS core for AI training and inference S Shukla, B Fleischer, M Ziegler, J Silberman, J Oh, V Srinivasan, J Choi, ... IEEE Solid-State Circuits Letters 1 (12), 217-220, 2018 | 34 | 2018 |
QUKU: a dual-layer reconfigurable architecture NW Bergmann, SK Shukla, J Becker ACM Transactions on Embedded Computing Systems (TECS) 12 (1s), 1-26, 2013 | 30 | 2013 |
QUKU: A FPGA based flexible coarse grain architecture design paradigm using process networks S Shukla, NW Bergmann, J Becker 2007 IEEE International Parallel and Distributed Processing Symposium, 1-7, 2007 | 28 | 2007 |
A 7-nm four-core mixed-precision AI chip with 26.2-TFLOPS hybrid-FP8 training, 104.9-TOPS INT4 inference, and workload-aware throttling SK Lee, A Agrawal, J Silberman, M Ziegler, M Kang, S Venkataramani, ... IEEE Journal of Solid-State Circuits 57 (1), 182-197, 2021 | 20 | 2021 |
QUKU: A fast run time reconfigurable platform for image edge detection S Shukla, NW Bergmann, J Becker International Workshop on Applied Reconfigurable Computing, 93-98, 2006 | 19 | 2006 |
QUKU: A coarse grained paradigm for FPGAs S Shukla, NW Bergmann, J Becker Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2006 | 18 | 2006 |
And then there were none: A stall-free real-time garbage collector for reconfigurable hardware DF Bacon, P Cheng, S Shukla ACM SIGPLAN Notices 47 (6), 23-34, 2012 | 15 | 2012 |
Cycle-accurate replay and debugging of running FPGA systems D Foisy, SK Shukla US Patent 9,217,774, 2015 | 11 | 2015 |