Update on Gemma 4 having MTP: Reverse engineering effort

Hey everyone,

In a previous post I mentioned I had found out that Gemma 4 has MTP (multi-token prediction). I was able to extract the model weights, but now I need help from the community, especially people who know C++, to reverse engineer the MTP from the compiled TFLite graph files back into a usable PyTorch nn.Module.
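To give a feel for the PyTorch side of the port: a common pattern when rebuilding an nn.Module from another framework's graph is a mechanical rename from the source tensor names onto PyTorch state_dict keys. This is a minimal sketch of that idea; every tensor name below is a hypothetical placeholder, not an actual name from the Gemma 4 .tflite files.

```python
# Sketch: translate TFLite-style tensor names to PyTorch state_dict keys.
# All names here are HYPOTHETICAL placeholders for illustration only;
# the real names must be read out of the extracted graph.

RENAMES = [
    ("attn/q_proj/kernel", "self_attn.q_proj.weight"),
    ("attn/k_proj/kernel", "self_attn.k_proj.weight"),
    ("mlp/up_proj/kernel", "mlp.up_proj.weight"),
]

def to_torch_key(tflite_name: str) -> str:
    """Translate one (hypothetical) TFLite tensor name to a PyTorch key."""
    for src, dst in RENAMES:
        if tflite_name.endswith(src):
            prefix = tflite_name[: -len(src)]
            # e.g. "layer_0/" -> "layers.0."
            prefix = prefix.rstrip("/").replace("layer_", "layers.").replace("/", ".")
            return (prefix + "." if prefix else "") + dst
    raise KeyError(tflite_name)

print(to_torch_key("layer_0/attn/q_proj/kernel"))
```

Once such a mapping is complete, the dequantized arrays can be loaded into the module with `load_state_dict`.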

I have made a repo on HuggingFace with the extracted files, alongside replication steps and the clues I could find, which I linked here in the post.

TL;DR

  • Extracted the .litertlm into multiple .tflite files
  • It seems to be quantized in INT8, so it might be salvageable with de-quantization, if Google did QAT (quantization-aware training) on their side
  • Reverse-engineerable with Google's AI Edge Model Explorer: https://ai.google.dev/edge/model-explorer
  • The previous Gemini Nano extraction/conversion efforts may be helpful (e.g. converting to safetensors): https://huggingface.co/Xenova/gemini-nano/discussions/1 . This time it should actually be easier to port, since we know Gemma 4's transformer block implementations, which seem to be a core part
  • I extracted a JSON of the GraphDef, which might be usable to reverse engineer this with an LLM. The JSON is available in the extracted/ folder of my repo.
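On the INT8 point above: TFLite uses an affine quantization scheme where each quantized tensor carries a scale and zero point, and the float value is recovered as `real = scale * (q - zero_point)`. A minimal sketch (the scale/zero-point values here are made up; the real ones must be read from each tensor's quantization parameters in the .tflite graph):

```python
# Dequantize INT8 codes using TFLite's affine scheme:
#   real_value = scale * (quantized_value - zero_point)
# scale and zero_point are stored per tensor in the .tflite file;
# the numbers below are invented for illustration.

def dequantize_int8(q_values, scale, zero_point):
    """Map INT8 codes back to floats for one tensor."""
    return [scale * (q - zero_point) for q in q_values]

weights_q = [-128, 0, 42, 127]  # example INT8 codes
print(dequantize_int8(weights_q, scale=0.05, zero_point=0))
```

Whether the result is usable depends on how the quantization was done: if Google trained with QAT, the dequantized weights should behave close to the original floats; a naive post-training quantization would lose more precision.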
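Before handing the GraphDef JSON to an LLM, it can help to mine it mechanically, e.g. counting which ops appear and how often, to spot the transformer-block structure. A sketch of that, using an invented JSON shape; the actual schema of the dump in the extracted/ folder will differ, so the key names here are assumptions:

```python
import json
from collections import Counter

# HYPOTHETICAL graph dump shaped as a flat operator list; the real
# extracted JSON will have a different schema and real op names.
graph_json = """
{
  "operators": [
    {"opcode": "FULLY_CONNECTED"},
    {"opcode": "FULLY_CONNECTED"},
    {"opcode": "SOFTMAX"},
    {"opcode": "RMS_NORM"}
  ]
}
"""

graph = json.loads(graph_json)
op_counts = Counter(op["opcode"] for op in graph["operators"])
print(op_counts.most_common())
```

A histogram like this makes repeated structures (e.g. one matmul/norm pattern per layer) obvious, which narrows down what the MTP-specific subgraph looks like.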
submitted by /u/Electrical-Monitor27
