LLM Study Diary #1: Transformer

DEV Community·Sofia·about 1 month ago

#ai #machinelearning #llm #devjournal #token #attention

About Me

I have been working as a software engineer for almost 8 years, mostly on backend and infrastructure, including distributed systems, nearline processing, and batch processing. I learned some basic ML in school but have no experience with complex ML use cases. This series records what I learn about LLMs as a general software engineer. Feel free to comment if anything seems wrong, and leave your questions.

Transformer

This is a good source for understanding each component of the Transformer: Mastering Tensor Dimensions in Transformers.

Decoder-only models (the GPT family, Llama, Claude) are used for generation. Encoder-decoder models (BART, the original "Attention Is All You Need" Transformer) handle translation and summarization. Encoder-only models such as BERT are used for classification and embeddings. Here we focus on decoder-only LLMs.

To summarize the architecture, the transformer block has two main components: Masked Multi-Head Attention (MMHA) and the Feed Forward Network (FFN).…
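To make the two components concrete, here is a minimal, dependency-free Python sketch of one simplified decoder block. It is not taken from the article or its linked source, and it rests on several assumptions: the weights are random rather than trained, biases and layer normalization are omitted, the FFN uses ReLU as in the original Transformer paper (many modern LLMs use GELU or SwiGLU instead), and all function and parameter names (`decoder_block`, `w_q`, `w_1`, and so on) are hypothetical.

```python
# Sketch only: random weights, no biases, no LayerNorm, ReLU FFN; all names are hypothetical.
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; matrices are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add(a, b):
    """Element-wise sum, used for the residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(row):
    # Subtract the max for numerical stability; exp(-inf) is 0.0, so masked slots get no weight.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Scaled dot-product attention in which token i may only attend to tokens 0..i."""
    d_head = len(q[0])
    scores = matmul(q, transpose(k))
    weights = []
    for i, row in enumerate(scores):
        # The causal mask: future positions (j > i) are set to -inf before the softmax.
        masked = [s / math.sqrt(d_head) if j <= i else -math.inf for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return matmul(weights, v), weights


def multi_head_causal_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Project x to Q/K/V, run causal attention per head, concatenate, project back."""
    d_model = len(x[0])
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    head_outputs = []
    for h in range(n_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's slice of the model dimension
        out, _ = causal_attention([r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v])
        head_outputs.append(out)
    # Concatenate the heads back to d_model columns for each token.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)


def feed_forward(x, w_1, w_2):
    """Position-wise FFN: expand to d_ff, apply ReLU, project back to d_model."""
    hidden = [[max(0.0, val) for val in row] for row in matmul(x, w_1)]
    return matmul(hidden, w_2)


def decoder_block(x, p, n_heads):
    """One simplified decoder block: MMHA + residual, then FFN + residual (no LayerNorm)."""
    x = add(x, multi_head_causal_attention(x, p["w_q"], p["w_k"], p["w_v"], p["w_o"], n_heads))
    return add(x, feed_forward(x, p["w_1"], p["w_2"]))


# Usage with random (untrained) weights: 4 tokens, d_model=8, 2 heads, d_ff=32.
rng = random.Random(0)
seq_len, d_model, d_ff, n_heads = 4, 8, 32, 2


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


params = {name: rand_matrix(d_model, d_model) for name in ("w_q", "w_k", "w_v", "w_o")}
params["w_1"] = rand_matrix(d_model, d_ff)
params["w_2"] = rand_matrix(d_ff, d_model)
x = rand_matrix(seq_len, d_model)
y = decoder_block(x, params, n_heads)
print(len(y), len(y[0]))  # 4 8: the block preserves the (seq_len, d_model) shape
```

Because of the mask, changing the last token leaves the outputs for all earlier positions unchanged, which is what lets a decoder-only model learn next-token prediction for every prefix of a sequence in one pass. Real implementations use a tensor library and batched matrix operations rather than Python lists; lists are used here only to keep the sketch self-contained.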
