


CCF Transactions on High Performance Computing, Volume 8
Volume 8, Number 1, February 2026
- Xiran Gao, Li Chen, Xiaobing Feng:
Improving scalability of sequential task flow models with cache-friendly parallel dependency tracking. 1-14
- Xiaofei Liu, Tianhan He, Wencheng Wang:
Online parallel-machine customer order scheduling with delivery time and penalties. 15-21
- Xiaotian Chen, Pengyu Wang, Jianbin Fang, Peng Zhang, Chun Huang:
Optimizing small matrix multiplications via batch grouping on multi-core DSPs. 22-36
- Dazheng Liu, Sheng Xiao, Xiaoli Ren, Wenjuan Liu, Dajiang Yi, Zean Tian, Jianping Wu, Yongan Wu, Zuodong Niu, Keqin Li, Shaoliang Peng:
Para-FDS: a scalable multilevel parallel scheme for fire dynamic simulator on multicore architectures. 37-48
- Hengliang Guo, Yubo Han, Haolei Wang, Shengguang Zhu, Gang Wu, Yang Guo, Xiangdong Liu, Chuanqiang Li:
Optimizing sparse-dense matrix-matrix multiplication for DCUs. 49-60
- Haobo Hua, Chuangzheng Hou, Zhuxin Wen, Xiangkai Zhang, Xiaodong Yu, Jiandong Shang, Litao Zhang:
Optimizing Standard Convolution for Diverse Precision on DCU. 61-79
- Tao Chen, Xiaoning Wang, Guanlong Li, Yining Zhao, Haili Xiao:
Revisiting workflow execution in HPC: a data-flow approach. 80-93
- Runyu Zhou, Yijin Li, Jiacheng Zhao, Ziyang Wang, En Shao, Ziyan Xie, Huimin Cui:
SYCL-MLU: unifying SIMT and SIMD in heterogeneous programming. 94-106
- Jiandong Shang, Fuchang Gao, Zhaopeng Li, Yizhe Sui, Gang Wu, Nan Wang, Lingling Wang, Dujuan Zhang:
Optimizing winograd-based convolution with DCU's matrix cores. 107-119
- Renqian Wan, Jianqiang Huang, Haodong Bian:
Qklu: a two-dimensional block-cyclic sparse direct solver. 120-131
Volume 8, Number 2, April 2026
- Chenghua Xu, Jingwei Sun, Mengna Sai, Fuxin Zhang, Guangzhong Sun, Weiwu Hu:
Fast compiler autotuning framework using design of experiments. 133-147
- Shuoming Zhang, Jiacheng Zhao, Qiuchu Yu, Chunwei Xia, Zheng Wang, Xiaobing Feng, Huimin Cui:
The new compiler stack: a survey on the synergy of LLMs and compilers. 148-179
- Yonghua Hu, Linyun Deng, Xiangyu Gao, Zhezhuo Zhao:
Parallel implementation and optimization of LMS adaptive filtering algorithms based on vector DSP. 180-195
- Shaofeng Yang, Zhi Li, Yunting Wang, Xin He, Guangming Tan:
Optimization of the ParILUT-GPU algorithm. 196-209
- De Dong, Shurui Dai, Nurbol Luktarhan, Yicheng Xu, Guanyu Lin, Jiaxuan Yin:
Accelerating TSA via SpMV-based GPU parallelization in the industrial chain context. 210-220
- Peng Liang, Linbo Qiao, Zhiquan Lai, Dong Sheng Li:
Parallelsim: an accurate, generic, and efficient simulator for distributed deep learning. 221-236
- De Dong, Shurui Dai, Nurbol Luktarhan, Yicheng Xu, Guanyu Lin, Jiaxuan Yin:
Correction: Accelerating TSA via SpMV-based GPU parallelization in the industrial chain context. 237
