Using hf tokenizers in Rust

DEV Community·Wayne·about 1 month ago

The tokenizers library from Hugging Face provides an efficient way to work with text tokenization in Rust. This guide shows you how to get started with pretrained tokenizers.

Setup

First, add the tokenizer library to your project:

```shell
cargo add tokenizers --features http,hf-hub
```

Basic Usage

Here's a complete example that loads a pretrained tokenizer and processes text:

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // Load a pretrained tokenizer
    let tokenizer = Tokenizer::from_pretrained("hf-internal-testing/llama-tokenizer", None)?;

    let text = "This is a sample string to tokenize";

    // Encode the text (false = no special tokens)
    let encoding = tokenizer.encode(text, false)?;

    // Get token IDs
    let token_ids = encoding.get_ids();
    println!("Token IDs: {:?}", token_ids);

    // Get token text
    let tokens = encoding.get_tokens();
    println!("Tokens: {:?}", tokens);

    println!…
```
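The slices returned by `get_ids` and `get_tokens` line up index by index, so it can be handy to view them side by side. The sketch below uses only the standard library and takes the same slice types those two methods return (`&[u32]` and `&[String]`). The helper name `pair_tokens` is hypothetical and not part of the tokenizers crate, and the sample values in `main` are placeholders, not real tokenizer output.

```rust
// Pair each token ID with its token text.
// The parameter types mirror what `Encoding::get_ids` (&[u32]) and
// `Encoding::get_tokens` (&[String]) return. `pair_tokens` is a
// hypothetical helper written for this sketch.
fn pair_tokens(ids: &[u32], tokens: &[String]) -> Vec<(u32, String)> {
    ids.iter()
        .copied()
        .zip(tokens.iter().cloned())
        .collect()
}

fn main() {
    // Placeholder values for illustration only; these are NOT real
    // output from any pretrained tokenizer.
    let ids: Vec<u32> = vec![10, 20, 30];
    let tokens: Vec<String> = vec![
        "tok_a".to_string(),
        "tok_b".to_string(),
        "tok_c".to_string(),
    ];

    // Print one "id  token" row per position.
    for (id, token) in pair_tokens(&ids, &tokens) {
        println!("{id:>6}  {token}");
    }
}
```

In a real program you would pass `encoding.get_ids()` and `encoding.get_tokens()` straight into the helper instead of the placeholder vectors.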
