Parinaya Chaturvedi
This is the website of Parinaya Chaturvedi. I work on AI, NLP, and LLMs. If you like these posts, subscribe to the mailing list.
-
vLLM PagedAttention: Efficient Memory Management for LLM Inference
December 10, 2024
Understanding PagedAttention, the memory optimization technique that enables efficient serving of large language models.
-
Sample Blog Post
January 15, 2025
This is a sample blog post to demonstrate the website functionality.
-
Pipeline-Parallelism: Distributed Training via Model Partitioning
October 03, 2022
Pipeline parallelism partitions a model across multiple GPUs, making it possible to train large models that don't fit in a single GPU's memory.
-
GPU CUDA Primitives: Understanding Parallel Computing Fundamentals
November 15, 2024
A deep dive into CUDA primitives, memory hierarchies, and parallel computation patterns on modern GPUs.