Splitting with 's

I have a question about splitting words with an apostrophe. I want to split an English text into words so that words like "they're" or "I'm" are recognized as one word and stay together. I also want words connected with a hyphen to stay together. I found a way to do this with a custom tokenizer, namely `tokenizer = RegexpTokenizer(r"['\w-]+|\.")`.

My issue is that this works whenever I try it on a smaller string, but when I apply it to my text file it doesn't, even though the code is pretty much the same. I don't understand why: I don't get an error message, it just splits the words with apostrophes anyway (so the output is 'grandfather', 's' instead of "grandfather's").

I've included my code below because I'm not sure where the mistake could be. If anyone could help or point out why this doesn't work, that would be great. I think it might have to do with the file_content part, but I can't figure it out.…
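
For reference, here is a minimal sketch that reproduces the symptom described above. It applies the same pattern with the standard library's `re.findall`, which is what `RegexpTokenizer` does in its default (non-gaps) mode, so NLTK is not needed to run it. The sample sentences and the helper names `tokenize` and `normalize_apostrophes` are made up for illustration. The sketch rests on an unconfirmed assumption: that the text file contains typographic apostrophes (U+2019, ’) rather than the straight `'` typed in the short test string. The file-reading code is not shown, so this is only one possible cause.

```python
import re

# Same pattern as in the RegexpTokenizer call from the question.
PATTERN = re.compile(r"['\w-]+|\.")


def tokenize(text):
    """Return all non-overlapping matches of PATTERN in text."""
    return PATTERN.findall(text)


def normalize_apostrophes(text):
    """Replace the typographic apostrophe (U+2019) with a straight one."""
    return text.replace("\u2019", "'")


# Hypothetical sample data: one sentence typed with a straight apostrophe,
# one as it might appear in a file saved with "smart" punctuation.
straight = "My grandfather's well-known clock stopped."
curly = "My grandfather\u2019s well-known clock stopped."

print(tokenize(straight))                      # "grandfather's" stays together
print(tokenize(curly))                         # split into 'grandfather', 's'
print(tokenize(normalize_apostrophes(curly)))  # together again after normalizing
```

If this is the cause, either normalizing the text after reading the file or adding `\u2019` to the character class in the pattern should keep such words together. Printing `repr()` of a short slice of the file contents around an affected word is a quick way to check which apostrophe character is actually there.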