PagedAttention borrowed a 40-year-old idea from operating systems. The result: 24x higher inference throughput, same hardware.
PagedAttention borrowed a 40-year-old idea from operating systems. The result: 24x higher inference throughput, same hardware.