Parinaya Chaturvedi

This is the website of Parinaya Chaturvedi. I work on AI, NLP, and LLMs. If you like these posts, subscribe to the mailing list.

vLLM PagedAttention: Efficient Memory Management for LLM Inference

December 10, 2024

Understanding PagedAttention, the memory optimization technique that enables efficient serving of large language models.

Sample Blog Post

January 15, 2025

This is a sample blog post to demonstrate the website functionality.

Pipeline-Parallelism: Distributed Training via Model Partitioning

October 03, 2022

Pipeline parallelism makes it possible to train large models that don't fit into a single GPU's memory.

GPU CUDA Primitives: Understanding Parallel Computing Fundamentals

November 15, 2024

A deep dive into CUDA primitives, memory hierarchies, and parallel computation patterns on modern GPUs.

vLLM PagedAttention: Efficient Memory Management for LLM Inference