


34th PACT 2025: Irvine, CA, USA
- 34th International Conference on Parallel Architectures and Compilation Techniques, PACT 2025, Irvine, CA, USA, November 3-6, 2025. IEEE 2025, ISBN 979-8-3315-8295-1

- Dowon Kim, MinJae Lee, Janghyeon Kim, Hyucksung Kwon, Hyeonggyu Jeong, Sang-Soo Park, Minyong Yoon, Si-Dong Roh, Yongsuk Kwon, Jinin So, Jungwook Choi:
  Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits. 1-13

- Junyeol Ryu, Yujin Jeong, Daeyoung Park, Jinpyo Kim, Heehoon Kim, Jaejin Lee:
  SPipe: Hybrid GPU and CPU Pipeline for Training LLMs under Memory Pressure. 14-29

- Seohong Choi, Huize Hong, Tae Hee Han, Joonsung Kim:
  ScaleMoE: A Fast and Scalable Distributed Training Framework for Large-Scale Mixture-of-Experts Models. 30-42

- Hyeongjun Cho, Yoonho Jang, Hyungi Kim, Seongwook Kim, Keewon Kwon, Gwangsun Kim, Seokin Hong:
  LibraPIM: Dynamic Load Rebalancing to Maximize Utilization in PIM-Assisted LLM Inference Systems. 43-56

- Jiazhi Jiang, Xiao Liu, Jiangsu Du, Dan Huang, Yutong Lu:
  Doppeladler: Adaptive Tensor Parallelism for Latency-Critical LLM Deployment on CPU-GPU Integrated End-User Device. 57-70

- Yiqi Chen, Xiping Dong, Zhe Zhou, Zhao Wang, Jie Zhang, Guangyu Sun:
  Exploring Memory Tiering Systems in the CXL Era via FPGA-based Emulation and Device-Side Management. 71-83

- Keun Soo Lim, Yunjay Hong, Jongheon Jeong, Sam Son, Donguk Kim, Yeonhong Park, Jae W. Lee, Jinkyu Jeong:
  CPC: Coordinated Page Cache for Serverless Computing. 84-96

- Fan Li, Mimi Xie, Yanan Guo, Huize Li, Xin Xin:
  SCREME: A Scalable Framework for Resilient Memory Design. 97-109

- Eishi Arima, Martin Schulz:
  Cache Miss Curve Analysis via Cardinality Domain. 110-121

- Hongyi Guan, Yichuan Gao, Chenlu Miao, Haoyang Wu, Hang Zhu, Mingfeng Lin, Huayue Liang:
  EARTH: Efficient Architecture for RISC-V Vector Memory Access. 122-134

- Yuguang Wang, Yunmo Zhang, Zeyu Liu, Junqiao Qiu, Zhenlin Wang:
  ANG: Accelerating NFA processing on GPUs via Exploring Multi-Level Fine-Grained Parallelism. 135-147

- Chen Chen, Shanzhi Gu, Junsheng Chang, Li Shen:
  Accelerating DFS-based Subgraph Matching on GPU via Reusing Intersection. 148-159

- Eric Lorimer, Ruobing Han, Sung Ha Kang, Hyesoon Kim:
  Multiway Merge Partitioning for Sparse-Sparse Matrix Multiplication on GPUs. 160-171

- Chaemin Lim, Suhyun Lee, Jinwoo Choi, Joonsung Kim, Jinho Lee, Youngsok Kim:
  DMO-DB: Mitigating the Data Movement Bottlenecks of GPU-Accelerated Relational OLAP. 172-185

- Massinissa Merouani, Islem Kara Bernou, Riyadh Baghdadi:
  Agentic Auto-Scheduling: An Experimental Study of LLM-Guided Loop Optimization. 186-200

- Massinissa Merouani, Afif Boudaoud, Iheb Nassim Aouadj, Nassim Tchoulak, Islem Kara Bernou, Hamza Benyamina, Fatima Benbouzid-Si Tayeb, Karima Benatchba, Hugh Leather, Riyadh Baghdadi:
  LOOPer: A Learned Automatic Code Optimizer For Polyhedral Compilers. 201-215

- José Wesley de Souza Magalhães, Jackson Woodruff, Jordi Armengol-Estapé, Alexander Brauckmann, Luc Jaulmes, Elizabeth Polgreen, Michael F. P. O'Boyle:
  Guess, Measure & Edit: Using Lowering to Lift Tensor Code. 216-228

- Jun Shirako, Vivek Sarkar:
  Automatic Generation of Actor-based Parallelism from Shared-Memory Parallel Programs. 229-242

- Beniel Thileepan, Suhaib A. Fahmy, Gihan R. Mudalige:
  Automatic Code-Generation for Accelerating Structured-Mesh-Based Explicit Numerical Solvers on FPGAs. 243-256

- Alireza Tabatabaeian, Arrvindh Shriraman:
  FLASH: An Abstract Machine for Modelling Fully Homomorphic Encryption Accelerators. 257-269

- Yanze Wu, Md Tanvir Arafin:
  Energy-Efficient Acceleration of Hash-Based Post-Quantum Cryptographic Schemes on Embedded Spatial Architectures. 270-280

- Robin Geens, Arne Symons, Marian Verhelst:
  Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration. 281-291

- Rubén Langarita, Jesús Alastruey-Benedé, Pablo Ibáñez-Marín, Santiago Marco-Sola, Miquel Moretó, Adrià Armejach:
  Squire: A General-Purpose Accelerator to Exploit Fine-Grain Parallelism on Dependency-Bound Kernels. 292-305

- Se-Min Lim, Seongyoung Kang, Sang-Woo Jun:
  Bancroft: Genomics Acceleration Beyond On-Device Memory. 306-319

- Yujeong Choi, John Kim, Minsoo Rhu:
  Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations. 320-332

- Cyan Subhra Mishra, Deeksha Chaudhary, Mahmut Taylan Kandemir, Chita R. Das:
  Salient Store: Enabling Smart Storage for Continuous Learning Edge Servers. 333-346

- Hyunsei Lee, Shinhyoung Jang, Jaewoo Gwak, Jongho Park, Yeseong Kim:
  Bit-Level Semantics: Scalable RAG Retrieval with Neurosymbolic Hyperdimensional Computing. 347-358

- Md. Musfiqur Rahman Sanim, Zhihao Shu, Bahram Afsharmanesh, AmirAli Mirian, Jiexiong Guan, Wei Niu, Bin Ren, Gagan Agrawal:
  Optimizing 3D Gaussian Splattering for Mobile GPUs. 359-371

- Naveen Namashivayam, Krishna Kandalla, Pen-Chung Yew, Trey White, Larry Kaplan, Mark Pagel:
  GPU Stream-Aware Communication for Effective Pipelining. 372-384

- Alen Sabu, Harish Patil, Wim Heirman, Changxi Liu, Trevor E. Carlson:
  TPE: XPU-Point: Simulator-Agnostic Sample Selection Methodology for Heterogeneous CPU-GPU Applications. 385-400

- Botao Wu, Martin Kong:
  Generating Two-Level, GPU-Aware Mappings for Distributed Tensor Computations. 401-415

- Jiaxin Liu, Rubao Lee, Cathy H. Xia, Xiaodong Zhang:
  A Stable Marriage Requires a Shared Residence with Low Contention and Mutual Complementarity. 416-430

- Yi Zhou, Qinglin Wang, Lian Wang, Zhiyan Liu, Bingwei Wang, Feiming Liu, Xiangdong Pei, Jie Liu:
  Optimize Winograd Convolution for a Novel MIMD Many-core Architecture PEZY-SC3s. 431-443

- Zhuolun Jiang, Songyue Wang, Xiaokun Pei, Tianyue, Mingyu Chen:
  CoroAMU: Unleashing Memory-Driven Coroutines through Latency-Aware Decoupled Operations. 444-457

- Geraldo F. Oliveira, Alain Kohli, David Novo, Ataberk Olgun, A. Giray Yaglikçi, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu:
  POSTER: DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures. 458-460

- Bahram Afsharmanesh, Md. Musfiqur Rahman Sanim, AmirAli Mirian, Gagan Agrawal:
  Poster: HeteroSched: Co-Optimizing Scheduling and Parallelization for Deep Learning Workloads. 461-463

- Sanil Rao, Mohammad Alaul Haque Monil, Het Mankad, Narasinga Rao Miniskar, Keita Teranishi, Jeffrey S. Vetter, Franz Franchetti:
  POSTER: IRISX: A Dynamic Trade-off System for Performance Portability on Multi-Accelerator Platforms. 464-466

- Manos Frouzakis, Juan Gómez-Luna, Geraldo F. Oliveira, Mohammad Sadrosadati, Onur Mutlu:
  POSTER: PIMAP: Characterizing a Real Processing-in-Memory System for Analytical Data Processing. 467-469

- Haiyue Ma, Kaifeng Xu, David Wentzlaff:
  Poster: Value-Aware Scheduler for Energy Reduction. 470-472
