Text Analysis for Hybrid Search: Tokenization, Stopwords & Accent Folding
Tokenization makes or breaks hybrid search. See how Weaviate’s accent folding, custom stopwords, and /v1/tokenize endpoint power multilingual BM25.
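For readers unfamiliar with the terms in this teaser, here is a minimal sketch of accent folding and stopword removal in plain Python. It is not Weaviate's implementation and does not call the /v1/tokenize endpoint; the `fold_accents` and `analyze` helpers and the stopword set are hypothetical names chosen for illustration.

```python
import unicodedata

# Hypothetical custom stopword list, for illustration only.
STOPWORDS = {"the", "le", "la", "de"}


def fold_accents(text: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str, stopwords: set = STOPWORDS) -> list:
    """Fold accents, lowercase, split on whitespace, and drop stopwords."""
    tokens = fold_accents(text).lower().split()
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    print(analyze("Le Café de la Gare"))
```

With folding applied at both index and query time, a keyword query for "cafe" can match a document containing "Café", which is the kind of behavior a BM25 text-analysis pipeline is usually tuned for.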
A Beginner-Friendly, No-Math Breakdown of How LLMs Process Context, Predict Tokens, and Produce Outputs.
Tokenizers are the essential bridge between human language and machine understanding.
Why every English tokenizer butchers Italian, the encoding switch that wasted my first attempt, and the regex that keeps “dell’algoritmo” in one piece.
Fabio Angeletti, PhD in Computer Engineering (Sapienza), Adjunct Professor at LUISS and LUISS Busine…
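The teaser does not show the article's regex, so the following is only a sketch of one way to keep Italian elisions such as “dell’algoritmo” together. The pattern `TOKEN_RE` and the `tokenize` helper are assumptions made for illustration, not the author's code.

```python
import re
import unicodedata

# Assumed pattern: a run of word characters, optionally followed by
# apostrophe-joined runs (straight ' or typographic ’).
TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*")


def tokenize(text: str) -> list:
    """Split text into word tokens, keeping elided forms in one piece."""
    # NFC composes accented letters so that \w matches them as single characters.
    normalized = unicodedata.normalize("NFC", text)
    return TOKEN_RE.findall(normalized)


if __name__ == "__main__":
    print(tokenize("la complessità dell’algoritmo"))
```

A whitespace-and-punctuation splitter would typically break the elided form at the apostrophe; allowing an apostrophe between two word runs avoids that, at the cost of also joining English contractions and possessives.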
JSON has served us well. For over a decade, it’s been the lingua franca of the web, powering APIs, config files, and data pipelines…
Before the revolutionary shift of 2017, sequential processing defined the world of artificial intelligence, characterized by a slow…
Claude Token Counter, now with model comparisons
I upgraded my Claude Token Counter tool to add the ability to run the same count against different models in order to compare them.
As far as I can tell Claude Opus 4.7 is the first model to change…
Series: An Engineer’s Explorations in LLM Concepts
“It started with a simple question: What exactly is a token? Three hours later, I had six browser tabs open, a queue of YouTube videos, and had somehow ended up reading about 5–6 other LLM concepts.”
If you’ve w…