RAG - Sliding Window, Token Based Chunking and PDF Chunking Packages

DEV Community·Ramya Perumal·18 days ago
#ai #beginners #rag #software #token #chunking

Sliding Window Chunking

Sliding window chunking is a more intensive chunking mechanism. A window size is defined as a character or token limit. Instead of creating completely separate chunks, the window moves forward gradually while keeping part of the previous content.

The character or token limit is called the window size.
The amount the window moves forward each time is called the step size.

This is a stricter form of overlapping chunking: when the step size is smaller than the window size, every pair of consecutive chunks overlaps by exactly the window size minus the step size.

How it Works

Suppose:

Window size = 500 characters
Step size = 100 characters

The first chunk contains characters 1–500. The window then moves forward by 100 characters, so the second chunk contains characters 101–600. The two chunks share 400 characters (101–500), and because of this overlap, related information is repeatedly included across chunks.

Benefits

The major benefit of this method is that semantically related points are stored closer together in the vector database, almost like clusters: neighboring chunks share most of their text, so their embeddings tend to be similar. This improves retrieval in scenarios where context changes frequently.…
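To make the window and step arithmetic concrete, here is a minimal Python sketch of character-based sliding window chunking. The function name `sliding_window_chunks` and its parameter names are illustrative assumptions, not taken from the article or from any particular library.

```python
def sliding_window_chunks(text: str, window_size: int = 500, step_size: int = 100) -> list[str]:
    """Split text into overlapping chunks (hypothetical helper for illustration).

    Each chunk is at most `window_size` characters long, and each new chunk
    starts `step_size` characters after the previous one.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")

    chunks = []
    for start in range(0, len(text), step_size):
        chunks.append(text[start:start + window_size])
        # Stop once the window has reached the end of the text, so we do not
        # emit a tail of ever-shorter chunks that add no new content.
        if start + window_size >= len(text):
            break
    return chunks


# Usage: a 700-character text with the article's example settings.
sample = "".join(chr(97 + i % 26) for i in range(700))
chunks = sliding_window_chunks(sample, window_size=500, step_size=100)
# 3 chunks, covering characters 1-500, 101-600 and 201-700 (1-based)
print(len(chunks), [len(c) for c in chunks])
```

The same slicing logic applies to a list of tokens instead of a string if the limit is token based. Note the trade-off: with a step of 100 and a window of 500, each character can appear in up to five chunks, so storage and embedding cost grow accordingly.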
