Kazem Shekofteh, Dr.

Kazem Shekofteh is a postdoctoral research fellow at the Institute of Computer Engineering at Heidelberg University. His research interests focus on GPU computing, performance analysis of parallel programs, and high-performance computing in bioinformatics. Previously, he was an assistant professor at Shandiz Institute of Higher Education, Mashhad, Iran. He received his PhD and MSc degrees from Ferdowsi University of Mashhad, Iran, in 2019. In late 2016, he was awarded a visiting scholarship at Heidelberg University. He has published in journals such as IEEE Transactions on Parallel and Distributed Systems. Since 2022, he has been a lecturer for the GPU Computing and seminar courses at Heidelberg University.

Research interests

  • Performance Analysis on GPU
  • Developing algorithms on Intelligence Processing Units (IPU)
  • Dealing with Sparsity (on IPUs)

Recent Service (4-year horizon)

Co-Chair

  • 2023: Workshop on IoT, Edge, and Mobile for Embedded Machine Learning (ITEM)
  • 2022: Workshop on IoT, Edge, and Mobile for Embedded Machine Learning (ITEM)

Student Volunteer Co-Chair

  • 2022: IEEE International Conference on Cluster Computing (CLUSTER)

Program Committee Member

  • 2023: IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGRID)
  • 2023: International Conference on Parallel Processing (ICPP)
  • 2023: International Symposium on Computing and Networking (CANDAR)

Poster Committee Member

  • 2023: The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)

Reviewer

  • IEEE Transactions on Parallel and Distributed Systems (TPDS)
  • Journal of Parallel and Distributed Computing (JPDC)
  • Future Generation Computer Systems (FGCS)
  • International Conference on Supercomputing (ICS)
  • IEEE International Conference on Cluster Computing (CLUSTER)

Publications

  1. S. Kazem Shekofteh, Christian Alles and Holger Fröning
    Reducing Memory Requirements for the IPU using Butterfly Factorizations
    SC ’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, SC-W 2023, Denver, CO, USA, November 12-17, 2023, 1255–1263, ACM, 2023
    @inproceedings{DBLP:conf/sc/ShekoftehAF23,
      author = {Shekofteh, S. Kazem and Alles, Christian and Fr{\"{o}}ning, Holger},
      title = {Reducing Memory Requirements for the {IPU} using Butterfly Factorizations},
      booktitle = {{SC} '23 Workshops of The International Conference
                        on High Performance Computing, Network, Storage, and Analysis, {SC-W}
                        2023, Denver, CO, USA, November 12-17, 2023},
      pages = {1255--1263},
      publisher = {{ACM}},
      year = {2023},
      url = {https://doi.org/10.1145/3624062.3624196},
      doi = {10.1145/3624062.3624196},
      timestamp = {Tue, 28 Nov 2023 00:00:00 +0100},
    }
    
  2. S. Kazem Shekofteh, Christian Alles, Nils Kochendörfer and Holger Fröning
    On Performance Analysis of Graphcore IPUs: Analyzing Squared and Skewed Matrix Multiplication
    CoRR, abs/2310.00256, 2023
    @article{DBLP:journals/corr/abs-2310-00256,
      author = {Shekofteh, S. Kazem and Alles, Christian and Kochend{\"{o}}rfer, Nils and Fr{\"{o}}ning, Holger},
      title = {On Performance Analysis of Graphcore IPUs: Analyzing Squared and Skewed
                        Matrix Multiplication},
      journal = {CoRR},
      volume = {abs/2310.00256},
      year = {2023},
      url = {https://doi.org/10.48550/arXiv.2310.00256},
      doi = {10.48550/ARXIV.2310.00256},
      eprinttype = {arXiv},
      eprint = {2310.00256},
      timestamp = {Wed, 18 Oct 2023 01:00:00 +0200},
    }
    
  3. S. Kazem Shekofteh, Hamid Noori, Mahmoud Naghibzadeh, Holger Fröning and Hadi Sadoghi Yazdi
    cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs
    IEEE Trans. Parallel Distributed Syst., 31(4), 766–778, 2020
    @article{DBLP:journals/tpds/ShekoftehNNFY20,
      author = {Shekofteh, S. Kazem and Noori, Hamid and Naghibzadeh, Mahmoud and Fr{\"{o}}ning, Holger and Yazdi, Hadi Sadoghi},
      title = {cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs},
      journal = {{IEEE} Trans. Parallel Distributed Syst.},
      volume = {31},
      number = {4},
      pages = {766--778},
      year = {2020},
      url = {https://doi.org/10.1109/TPDS.2019.2944602},
      doi = {10.1109/TPDS.2019.2944602},
      timestamp = {Fri, 02 Oct 2020 01:00:00 +0200},
    }
    
  4. S. Kazem Shekofteh, Hamid Noori, Mahmoud Naghibzadeh, Hadi Sadoghi Yazdi and Holger Fröning
    Metric Selection for GPU Kernel Classification
    ACM Trans. Archit. Code Optim., 15(4), 68:1–68:27, 2019
    @article{DBLP:journals/taco/ShekoftehNNYF19,
      author = {Shekofteh, S. Kazem and Noori, Hamid and Naghibzadeh, Mahmoud and Yazdi, Hadi Sadoghi and Fr{\"{o}}ning, Holger},
      title = {Metric Selection for {GPU} Kernel Classification},
      journal = {{ACM} Trans. Archit. Code Optim.},
      volume = {15},
      number = {4},
      pages = {68:1--68:27},
      year = {2019},
      url = {https://doi.org/10.1145/3295690},
      doi = {10.1145/3295690},
      timestamp = {Sat, 08 Jan 2022 00:00:00 +0100},
    }
    
  1. Mohamad Beheshti Roui, S. Kazem Shekofteh, Hamid Noori and Ahad Harati
    Efficient scheduling of streams on GPGPUs
    The Journal of Supercomputing, 76(11), 9270–9302, 2020
    @article{roui2020efficient,
      author = {Beheshti Roui, Mohamad and Shekofteh, S. Kazem and Noori, Hamid and Harati, Ahad},
      date = {2020/11/01},
      date-added = {2024-04-04 16:55:55 +0200},
      date-modified = {2024-04-04 16:56:58 +0200},
      doi = {10.1007/s11227-020-03209-x},
      id = {Beheshti Roui2020},
      issn = {1573-0484},
      journal = {The Journal of Supercomputing},
      number = {11},
      pages = {9270--9302},
      title = {Efficient scheduling of streams on GPGPUs},
      url = {https://doi.org/10.1007/s11227-020-03209-x},
      volume = {76},
      year = {2020},
      bdsk-url-1 = {https://doi.org/10.1007/s11227-020-03209-x}
    }
    
  2. F. Khorshahiyan, S.-K. Shekofteh and H. Noori
    Predicting Execution Time of CUDA Kernels with Unified Memory Capability
    2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), 437–443, 2019
    @inproceedings{khorshahiyan2019predicting,
      author = {Khorshahiyan, F. and Shekofteh, S.-K. and Noori, H.},
      booktitle = {2019 9th International Conference on Computer and Knowledge Engineering (ICCKE)},
      date-added = {2024-04-04 16:57:38 +0200},
      date-modified = {2024-04-04 16:58:19 +0200},
      doi = {10.1109/ICCKE48569.2019.8964952},
      issn = {2643-279X},
      pages = {437--443},
      title = {Predicting Execution Time of CUDA Kernels with Unified Memory Capability},
      year = {2019},
      year1 = {24-25 Oct. 2019},
      bdsk-url-1 = {https://doi.org/10.1109/ICCKE48569.2019.8964952}
    }
    
  3. Ahmadreza Montazerolghaem, S.-Kazem Shekofteh, M. H. Yaghmaee and Mahmoud Naghibzadeh
    A load scheduler for SIP proxy servers: design, implementation and evaluation of a history weighted window approach
    International Journal of Communication Systems, 30(3), e2980, John Wiley & Sons, Ltd, 2017
    @article{montazerolghaem2017load,
      author = {Montazerolghaem, Ahmadreza and Shekofteh, S.-Kazem and Yaghmaee, M. H. and Naghibzadeh, Mahmoud},
      date = {2017/02/01},
      date-added = {2024-04-04 17:00:11 +0200},
      date-modified = {2024-04-04 17:00:43 +0200},
      doi = {10.1002/dac.2980},
      issn = {1074-5351},
      journal = {International Journal of Communication Systems},
      journal3 = {Int J Commun Syst},
      keywords = {load balancer; scheduler; session initiation protocol; asterisk; overload},
      n2 = {Summary: The widespread use of Session Initiation Protocol as a signalling protocol has created various challenges. An important one is that its throughput can be severely degraded when an overload happens in the proxy server because of several retransmissions from the user agent. One common approach to overcome this problem is ``load balancing''. A balancer needs to know the status of proxy servers, which are continuously gathered implicitly or explicitly. Implicit methods have averagely less overhead than explicit ones. This paper attempts to prevent throughput reduction by balancing the loads among available proxy servers properly using an implicit mechanism called History Weighted Average Response time. The proposed algorithm is robust because it incurs no extra processing to proxy servers. The novelty of the mechanism is making use of ``response time history'' to estimate the load being currently processed on servers. By implementing in a real testbed, throughput and scalability are improved compared with an important state-of-the-art similar algorithm. This improvement stems from no need for modification in SIP protocol, easy implementation and application, simple computations for making decision and no need for extra feedback between servers and load balancer. Copyright {\copyright} 2015 John Wiley \& Sons, Ltd.},
      number = {3},
      pages = {e2980},
      publisher = {John Wiley \& Sons, Ltd},
      title = {A load scheduler for SIP proxy servers: design, implementation and evaluation of a history weighted window approach},
      url = {https://doi.org/10.1002/dac.2980},
      volume = {30},
      year = {2017},
      year1 = {2017},
      bdsk-url-1 = {https://doi.org/10.1002/dac.2980}
    }
    
  4. A. Montazerolghaem, S.-K. Shekofteh, G. Khojaste, M. Naghibzadeh and M.-H. Yaghmaee-M
    A novel load scheduling for session initiation protocol networks
    2014 4th International Conference on Computer and Knowledge Engineering (ICCKE), 509–514, 2014
    @inproceedings{montazerolghaem2014novel,
      author = {Montazerolghaem, A. and Shekofteh, S.-K. and Khojaste, G. and Naghibzadeh, M. and Yaghmaee-M, M.-H.},
      booktitle = {2014 4th International Conference on Computer and Knowledge Engineering (ICCKE)},
      date-added = {2024-04-04 17:01:42 +0200},
      date-modified = {2024-04-04 17:01:53 +0200},
      doi = {10.1109/ICCKE.2014.6993376},
      pages = {509--514},
      title = {A novel load scheduling for session initiation protocol networks},
      year = {2014},
      year1 = {29-30 Oct. 2014},
      bdsk-url-1 = {https://doi.org/10.1109/ICCKE.2014.6993376}
    }
    
  5. Javad Mohebbi Najm Abad, S. Kazem Shekofteh, Hamid Tabatabaee and Maryam Mehrnejad
    CoreIIScheduler: Scheduling Tasks in a Multi-core-Based Grid Using NSGA-II Technique
    Intelligent Informatics, 507–518, Springer Berlin Heidelberg, 2013
    @inproceedings{najmabad2013corell,
      address = {Berlin, Heidelberg},
      author = {Najm Abad, Javad Mohebbi and Shekofteh, S. Kazem and Tabatabaee, Hamid and Mehrnejad, Maryam},
      booktitle = {Intelligent Informatics},
      date = {2013},
      date-added = {2024-04-04 17:02:50 +0200},
      date-modified = {2024-04-04 17:03:11 +0200},
      editor = {Abraham, Ajith and Thampi, Sabu M},
      doi = {10.1007/978-3-642-32063-7_54},
      isbn = {978-3-642-32063-7},
      pages = {507--518},
      publisher = {Springer Berlin Heidelberg},
      title = {CoreIIScheduler: Scheduling Tasks in a Multi-core-Based Grid Using NSGA-II Technique},
      year = {2013}
    }
    
  6. S. Kazem Shekofteh, Hossein Deldari and Maryam Baradaran Khalkhali
    Reducing cache contention in a multi-core processor via a scheduler
    2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), 6, V6-555–V6-558, 2010
    @inproceedings{shekofteh2010reducing,
      author = {Shekofteh, S. Kazem and Deldari, Hossein and Khalkhali, Maryam Baradaran},
      booktitle = {2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE)},
      date-added = {2024-04-04 17:06:21 +0200},
      date-modified = {2024-04-04 17:06:30 +0200},
      doi = {10.1109/ICACTE.2010.5579213},
      keywords = {component;multi-core architecture;resource contention;shared cache;thread scheduling},
      pages = {V6-555--V6-558},
      title = {Reducing cache contention in a multi-core processor via a scheduler},
      volume = {6},
      year = {2010},
      bdsk-url-1 = {https://doi.org/10.1109/ICACTE.2010.5579213}
    }
    
  7. H. Salami, H. Saadatfar, Farhad Rahmani Fard, S. Kazem Shekofteh and H. Deldari
    Improving cluster computing performance based on job futurity prediction
    2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), 6, V6-303–V6-307, 2010
    @inproceedings{salami2010improving,
      author = {Salami, H. and Saadatfar, H. and Fard, Farhad Rahmani and Shekofteh, S. Kazem and Deldari, H.},
      booktitle = {2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE)},
      date-added = {2024-04-04 17:05:56 +0200},
      date-modified = {2024-04-04 17:06:09 +0200},
      doi = {10.1109/ICACTE.2010.5579820},
      issn = {2154-7505},
      pages = {V6-303--V6-307},
      title = {Improving cluster computing performance based on job futurity prediction},
      volume = {6},
      year = {2010},
      year1 = {20-22 Aug. 2010},
      bdsk-url-1 = {https://doi.org/10.1109/ICACTE.2010.5579820}
    }
    
  8. M. Baradaran-Khalkhali, S. Kazem Shekofteh, S. Toosizadeh and M.-R. Akbarzadeh-T
    Exploiting fuzzy approximator to head pose estimation
    Signal Processing Algorithms, Architectures, Arrangements, and Applications SPA 2010, 68–72, 2010
    @inproceedings{khalkhali2010exploiting,
      author = {Baradaran-Khalkhali, M. and Shekofteh, S. Kazem and Toosizadeh, S. and Akbarzadeh-T, M.-R.},
      booktitle = {Signal Processing Algorithms, Architectures, Arrangements, and Applications SPA 2010},
      date-added = {2024-04-04 17:04:59 +0200},
      date-modified = {2024-04-04 17:05:17 +0200},
      issn = {2326-0319},
      pages = {68--72},
      title = {Exploiting fuzzy approximator to head pose estimation},
      year = {2010},
      year1 = {23-25 Sept. 2010}
    }
    
  9. S. K. Shekofteh, M. Baradaran-K, S. Toosizadeh, M.-R. Akbarzadeh-T and M. Hashemi
    Head pose estimation using fuzzy approximator augmented by redundant membership functions
    2010 2nd International Conference on Software Technology and Engineering, 2, V2-306–V2-310, 2010
    @inproceedings{shekofteh2010head,
      author = {Shekofteh, S. K. and Baradaran-K, M. and Toosizadeh, S. and Akbarzadeh-T, M.-R. and Hashemi, M.},
      booktitle = {2010 2nd International Conference on Software Technology and Engineering},
      date-added = {2024-04-04 17:03:53 +0200},
      date-modified = {2024-04-04 17:04:03 +0200},
      doi = {10.1109/ICSTE.2010.5608799},
      pages = {V2-306--V2-310},
      title = {Head pose estimation using fuzzy approximator augmented by redundant membership functions},
      volume = {2},
      year = {2010},
      year1 = {3-5 Oct. 2010},
      bdsk-url-1 = {https://doi.org/10.1109/ICSTE.2010.5608799}
    }
    
  10. M. Baradaran-K, S. K. Shekofteh, S. Toosizadeh and M.-R. Akbarzadeh-T
    A fuzzy approximator with Gaussian membership functions to estimate a human’s head pose
    2010 10th International Conference on Intelligent Systems Design and Applications, 1154–1158, 2010
    @inproceedings{baradaran2010fuzzy,
      author = {Baradaran-K, M. and Shekofteh, S. K. and Toosizadeh, S. and Akbarzadeh-T, M.-R.},
      booktitle = {2010 10th International Conference on Intelligent Systems Design and Applications},
      date-added = {2024-04-04 17:03:24 +0200},
      date-modified = {2024-04-04 17:03:34 +0200},
      doi = {10.1109/ISDA.2010.5687029},
      issn = {2164-7151},
      pages = {1154--1158},
      title = {A fuzzy approximator with Gaussian membership functions to estimate a human's head pose},
      year = {2010},
      year1 = {29 Nov.-1 Dec. 2010},
      bdsk-url-1 = {https://doi.org/10.1109/ISDA.2010.5687029}
    }