Sunil Shukla

Cited by

	All	Since 2019
Citations	1307	821
h-index	17	13
i10-index	21	14

Citations per year: 2007: 7; 2008: 12; 2009: 19; 2010: 14; 2011: 13; 2012: 14; 2013: 34; 2014: 53; 2015: 55; 2016: 84; 2017: 76; 2018: 72; 2019: 117; 2020: 104; 2021: 160; 2022: 196; 2023: 192; 2024: 51


Co-authors

Jinwook Oh, Rebellions Inc., Verified email at rebellions.ai
Juergen Becker, Karlsruhe Institute of Technology, Verified email at kit.edu
Jungwook Choi, Hanyang University, Verified email at hanyang.ac.kr
Ankur Agrawal, Research Staff Member - IBM Research, Verified email at us.ibm.com
Suyog Gupta, Nvidia, Verified email at stanford.edu

Sunil Shukla

IBM Research

Verified email at us.ibm.com


Title	Cited by	Year
FPGA programming for the masses. DF Bacon, R Rabbah, S Shukla. Communications of the ACM 56 (4), 56-63, 2013	259	2013
A scalable multi-TeraOPS deep learning processor core for AI training and inference. B Fleischer, S Shukla, M Ziegler, J Silberman, J Oh, V Srinivasan, J Choi, ... 2018 IEEE symposium on VLSI circuits, 35-36, 2018	148	2018
Approximate computing: Challenges and opportunities. A Agrawal, J Choi, K Gopalakrishnan, S Gupta, R Nair, J Oh, DA Prener, ... 2016 IEEE International Conference on Rebooting Computing (ICRC), 1-8, 2016	119	2016
Single bit error correction implementation in CRC-16 on FPGA. S Shukla, NW Bergmann. Proceedings. 2004 IEEE International Conference on Field-Programmable …, 2004	84	2004
A compiler and runtime for heterogeneous computing. J Auerbach, DF Bacon, I Burcea, P Cheng, SJ Fink, R Rabbah, S Shukla. Proceedings of the 49th Annual Design Automation Conference, 271-276, 2012	80	2012
9.1 A 7nm 4-core AI chip with 25.6 TFLOPS hybrid FP8 training, 102.4 TOPS INT4 inference and workload-aware throttling. A Agrawal, SK Lee, J Silberman, M Ziegler, M Kang, S Venkataramani, ... 2021 IEEE International Solid-State Circuits Conference (ISSCC) 64, 144-146, 2021	71	2021
RaPiD: AI accelerator for ultra-low precision training and inference. S Venkataramani, V Srinivasan, W Wang, S Sen, J Zhang, A Agrawal, ... 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021	68	2021
QUKU: a two-level reconfigurable architecture. S Shukla, NW Bergmann, J Becker. IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and …, 2006	60	2006
Efficient AI system design with cross-layer approximate computing. S Venkataramani, X Sun, N Wang, CY Chen, J Choi, M Kang, A Agarwal, ... Proceedings of the IEEE 108 (12), 2232-2250, 2020	50	2020
A 3.0 TFLOPS 0.62 V scalable processor core for high compute utilization AI training and inference. J Oh, SK Lee, M Kang, M Ziegler, J Silberman, A Agrawal, ... 2020 IEEE Symposium on VLSI Circuits, 1-2, 2020	41	2020
Tightly coupled processor arrays using coarse grained reconfigurable architecture with iteration level commits. CY Chen, K Gopalakrishnan, J Oh, SK Shukla, V Srinivasan. US Patent 10,120,685, 2018	37	2018
A scalable multi-TeraOPS core for AI training and inference. S Shukla, B Fleischer, M Ziegler, J Silberman, J Oh, V Srinivasan, J Choi, ... IEEE Solid-State Circuits Letters 1 (12), 217-220, 2018	33	2018
Tightly coupled processor arrays using coarse grained reconfigurable architecture with iteration level commits. CY Chen, K Gopalakrishnan, J Oh, LM Saltzman, SK Shukla, V Srinivasan. US Patent 10,528,356, 2020	32	2020
QUKU: a dual-layer reconfigurable architecture. NW Bergmann, SK Shukla, J Becker. ACM Transactions on Embedded Computing Systems (TECS) 12 (1s), 1-26, 2013	30	2013
QUKU: A FPGA based flexible coarse grain architecture design paradigm using process networks. S Shukla, NW Bergmann, J Becker. 2007 IEEE International Parallel and Distributed Processing Symposium, 1-7, 2007	28	2007
QUKU: A fast run time reconfigurable platform for image edge detection. S Shukla, NW Bergmann, J Becker. International Workshop on Applied Reconfigurable Computing, 93-98, 2006	18	2006
QUKU: A coarse grained paradigm for FPGAs. S Shukla, NW Bergmann, J Becker. Schloss-Dagstuhl-Leibniz Zentrum für Informatik, 2006	18	2006
A 7-nm four-core mixed-precision AI chip with 26.2-TFLOPS hybrid-FP8 training, 104.9-TOPS INT4 inference, and workload-aware throttling. SK Lee, A Agrawal, J Silberman, M Ziegler, M Kang, S Venkataramani, ... IEEE Journal of Solid-State Circuits 57 (1), 182-197, 2021	16	2021
And then there were none: A stall-free real-time garbage collector for reconfigurable hardware. DF Bacon, P Cheng, S Shukla. ACM SIGPLAN Notices 47 (6), 23-34, 2012	15	2012
Cycle-accurate replay and debugging of running FPGA systems. D Foisy, SK Shukla. US Patent 9,217,774, 2015	11	2015
