Day 9 - Sparse embedding continued - RAG

DEV Community: nlp·Indumathi R·3 days ago

In the previous post, we saw some basic methods under sparse embeddings. Among them, term frequency (TF) has a drawback when the same words are repeated too often. To overcome this shortcoming of TF, the next method was introduced. Let us look at it in detail.

Inverse document frequency (IDF)

IDF measures how rare a word is across the input documents. Rare words get higher priority: if a word occurs in few documents, its IDF value is high, and if it occurs in many documents, its IDF value is low. If I query for frequently occurring words (for which the IDF score is low), the results will not be that good. On the other hand, if I query for a rare word (for which the IDF score is high), the results will be comparatively good.

Drawbacks of IDF

If I query for "kubernetes" and the word occurs in only one document, that particular document will be returned.…
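To make the IDF idea concrete, here is a minimal sketch in Python. The post does not give a formula, so this assumes one common variant, idf = log(N / df), where N is the number of documents and df is the number of documents containing the word. The function name `inverse_document_frequency` and the toy corpus are hypothetical, made up for illustration only.

```python
import math


def inverse_document_frequency(documents):
    """Return {term: idf} using idf = log(N / df), one common IDF variant (assumed)."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        # Count each term once per document, however often it repeats inside it.
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy corpus, for illustration only.
docs = [
    "the cluster runs kubernetes",
    "the service runs on the cluster",
    "the logs show the error",
]
idf = inverse_document_frequency(docs)

# Print terms from rarest (highest IDF) to most common (lowest IDF).
for term, score in sorted(idf.items(), key=lambda item: -item[1]):
    print(f"{term:12s} {score:.3f}")
```

In this toy corpus, "the" appears in every document and so scores 0, while "kubernetes" appears in only one document and gets the highest score, which matches the low/high behaviour described above.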
