78. Word Embeddings: Words as Numbers That Actually Mean Something

DEV Community·Akhilesh·20 days ago

The tokenizer gave you integers. "cat" is 2345. "dog" is 7891. Your model sees these numbers and knows nothing. Cat and dog might as well be completely unrelated. The integers carry no information about meaning.

Word embeddings fix this by giving every word a dense vector of real numbers. Hundreds of dimensions. The key insight: words that appear in similar contexts get similar vectors. "cat" and "dog" both appear near "the," "my," "a," "played," "sleeps." Their vectors end up close together in the embedding space.

This idea, learning word meaning from context, led to the most consequential series of advances in NLP history. Word2Vec in 2013. GloVe in 2014. ELMo in 2018. BERT in 2018. Every language model you use today traces its lineage to this one idea.

The Embedding Layer

    import torch
    import torch.nn as nn
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA
    import warnings
    warnings.filterwarnings("ignore")

    torch.manual_seed(42)
    np.random.…
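
To make the "similar contexts, similar vectors" idea concrete, here is a minimal sketch that uses only the Python standard library. It is not how Word2Vec or an embedding layer learns its vectors. It is a count-based stand-in that builds each word's vector from the words seen near it. The toy corpus, the `context_vectors` and `cosine` helpers, and the window size of 2 are all assumptions made up for this illustration.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = [
    "the cat sleeps on the sofa",
    "the dog sleeps on the sofa",
    "my cat played in the garden",
    "my dog played in the garden",
    "the stock market fell sharply today",
    "the stock price rose sharply today",
]


def context_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = context_vectors(corpus)
cat_dog = cosine(vectors["cat"], vectors["dog"])
cat_stock = cosine(vectors["cat"], vectors["stock"])

print(f"cat vs dog:   {cat_dog:.3f}")
print(f"cat vs stock: {cat_stock:.3f}")
```

In this toy corpus "cat" and "dog" occur in exactly the same contexts, so their similarity comes out at 1.0, while "cat" and "stock" share only "the" and score far lower. Real embeddings are dense, learned by training, and never this clean, but the intuition carries over: context overlap is what pulls two vectors together.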
