24/7 Headless AI Server on Xiaomi 12 Pro (Guide & Benchmarks) Gemma4 VS Qwen2.5

https://preview.redd.it/2olx2ckl9evg1.jpg?width=4088&format=pjpg&auto=webp&s=b8ee69bff72a4ca21888dccf6f825da11b2b89a2

Here is the build guide for my setup. While it isn't a massive textbook, it provides enough detail to replicate the steps. Please note that this script ecosystem and the specific instructions were tailor-made for the Xiaomi 12 Pro. I cannot guarantee it will work out of the box on other hardware, though the general concepts apply universally.

Here are the key steps to achieve the build:

1. Unlock the Bootloader

Because unlocking the bootloader isn't strictly related to running Local LLMs, I’ve put together a dedicated post for this on my personal profile.

2. Flash LineageOS

Ditch MIUI/HyperOS for a cleaner, leaner Android experience.

3. Termux Setup & Android Survival Guide

By default, Android acts like a serial killer for background apps. You must grant Termux total freedom to prevent your LLM from being killed mid-generation.

  • 3.1 Disable Battery Optimization (System Level)
    • Go to Settings > Apps > Manage Apps > Termux.
    • Find Battery Saver (or Activity Control) and select "No Restrictions".
  • 3.2 Enable Wake Lock (Termux Level)
    • This prevents the CPU from entering deep sleep when the screen is off.
    • Open Termux, pull down your notification shade, and tap "Acquire wakelock".
    • Alternatively, run this in the terminal: termux-wake-lock
  • 3.3 Disable the Phantom Process Killer (Android 12+)
    • Android 12+ has a hidden mechanism that aggressively kills resource-heavy background processes (like Ollama). Connect your phone to your PC via ADB and run this to set the limit to "infinite":

Bash

adb shell "/system/bin/device_config put activity_manager max_phantom_processes 2147483647"

  • 3.4 Lock the App in Memory (Xiaomi Specific)
    • Open your Recents/Multitasking menu.
    • Long-press the Termux window and tap the Padlock icon. Termux will now survive the "Clear All" button.
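
The ADB command from step 3.3 can be wrapped in a small helper that also reads the value back. This is only a sketch: the function name and the ADB override variable are made up for illustration, and the set_sync_disabled_for_tests call is an extra step that is not in the original instructions (it is commonly used so that a later config sync does not reset the value; check that it behaves as expected on your ROM).

```shell
#!/bin/sh
# Sketch: raise the phantom process limit and read it back.
# ADB can be overridden (for example ADB=echo) for a dry run without a device.

disable_phantom_killer() {
    adb_bin="${ADB:-adb}"
    # Assumed extra step (not in the post): keep config syncs from resetting the value.
    "$adb_bin" shell "/system/bin/device_config set_sync_disabled_for_tests persistent"
    # The command from step 3.3: effectively removes the limit.
    "$adb_bin" shell "/system/bin/device_config put activity_manager max_phantom_processes 2147483647"
    # Read the value back to confirm it was applied.
    "$adb_bin" shell "/system/bin/device_config get activity_manager max_phantom_processes"
}
```

With the phone connected over USB, calling disable_phantom_killer should end by printing 2147483647. Running it as ADB=echo disable_phantom_killer only prints the arguments that would be passed to adb.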

4. Obtain Root Access

Install Magisk (preferably via F-Droid) and root your device. I won't provide a full tutorial here as there are thousands across the web, or you can simply ask an AI for the latest method for LineageOS.

5. The Headless Setup (Stopping the UI & Automation)

To maximize RAM and CPU for text generation, the Android graphical interface must be completely shut down. You do not need to do this manually: the zeus_cryo.sh master script will automatically execute the stop command and configure the headless environment for you.

If you would rather do it by hand, read through zeus_cryo.sh to see which commands it runs.

However, before you execute that script, your device needs the right tools. You must push a series of custom binaries and monitoring scripts to the phone while the UI is still running.

5.1 Wi-Fi Recovery (Post-UI Kill)

When the Android UI is killed by the script, you lose standard Wi-Fi management. We use static binaries to maintain the connection in the background.

  • Kernel Note: Requires nl80211 support (standard on modern Qualcomm chips).
  • Compatibility: Universal aarch64 binary, zero dependencies.

Bash

adb push wpa_supplicant_static /data/local/tmp/wpa_supplicant_static
adb push wpa_cli_static /data/local/tmp/wpa_cli_static
adb shell "su -c 'chmod 755 /data/local/tmp/wpa_supplicant_static /data/local/tmp/wpa_cli_static'"

(GitHub Links: wpa_cli_static | wpa_supplicant_static)
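
For anyone wiring up the Wi-Fi part by hand, here is a rough sketch of what a minimal wpa_supplicant configuration for a WPA2-PSK network could look like. It is not taken from zeus_cryo.sh: the function name, the control socket path, the config path and the wlan0 interface name are all assumptions. Getting an IP address afterwards (DHCP client or static address) is a separate step that is not shown.

```shell
#!/bin/sh
# Sketch: print a minimal wpa_supplicant config for a WPA2-PSK network.
# SSIDs or passphrases containing double quotes are not handled here.

make_wpa_conf() {
    ssid="$1"
    psk="$2"
    # WPA passphrases must be 8 to 63 characters long.
    len=${#psk}
    if [ "$len" -lt 8 ] || [ "$len" -gt 63 ]; then
        echo "passphrase must be 8-63 characters" >&2
        return 1
    fi
    # Control socket directory (assumed path); wpa_cli connects through it.
    printf 'ctrl_interface=/data/local/tmp/wpa_ctrl\n'
    printf 'network={\n'
    printf '    ssid="%s"\n' "$ssid"
    printf '    psk="%s"\n' "$psk"
    printf '}\n'
}

# On the phone, as root, this could be used roughly like so
# (wlan0 and the file paths are assumptions):
#   make_wpa_conf "MyNetwork" "my-passphrase" > /data/local/tmp/wpa.conf
#   /data/local/tmp/wpa_supplicant_static -B -D nl80211 -i wlan0 -c /data/local/tmp/wpa.conf
#   /data/local/tmp/wpa_cli_static -p /data/local/tmp/wpa_ctrl -i wlan0 status
```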

5.2 The "Zeus" Daemon Scripts

Push the automation scripts to your phone:

Bash

adb push zeus_cryo.sh /data/local/tmp/zeus_cryo.sh
adb push zeus_status.sh /data/local/tmp/zeus_status.sh
adb push zeus_battery.sh /data/local/tmp/zeus_battery.sh
adb push zeus_watchdog.sh /data/local/tmp/zeus_watchdog.sh
adb push zeus_watchdog_loop.sh /data/local/tmp/zeus_watchdog_loop.sh

Script Breakdown:

  • zeus_cryo.sh: The master script that launches everything. (Requires your Wi-Fi SSID/Pass).
  • zeus_status.sh: Run this to check current system health.
  • zeus_battery.sh: Cycles battery between 40% and 80%. Connects/disconnects wall power to save battery health. (Requires Telegram Bot Token & ID for alerts).
  • zeus_watchdog.sh: Revives the battery and cooler daemons if the Android OOM (Out of Memory) killer terminates them during heavy LLM usage.
  • zeus_watchdog_loop.sh: Loops the watchdog every 15 seconds.
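
The watchdog idea from the breakdown above (check whether a daemon is still running, restart it if not, repeat every 15 seconds) can be sketched in a few lines. This is not the content of zeus_watchdog.sh: the PID-file approach, the function names and the battery.pid path are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of one watchdog pass: restart a daemon if its recorded PID is gone.

is_alive() {
    # kill -0 sends no signal; it only checks that the PID exists.
    [ -n "$1" ] && kill -0 "$1" 2>/dev/null
}

watchdog_pass() {
    # args: pidfile, then the command used to (re)start the daemon
    pidfile="$1"
    shift
    pid=""
    [ -f "$pidfile" ] && pid="$(cat "$pidfile")"
    if is_alive "$pid"; then
        echo "alive"
    else
        # Relaunch in the background and record the new PID.
        "$@" >/dev/null 2>&1 &
        echo "$!" > "$pidfile"
        echo "restarted"
    fi
}

# A loop in the spirit of zeus_watchdog_loop.sh (15-second interval from the post):
#   while true; do
#       watchdog_pass /data/local/tmp/battery.pid sh /data/local/tmp/zeus_battery.sh
#       sleep 15
#   done
```

A PID check like this can be fooled if the system reuses the PID for another process, so a more careful version might also compare the process name.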

5.3 Smart Cooling Automation (Optional)

If you are using a smart plug (e.g., SONOFF S60 EU via eWeLink) and a phone cooler, you can automate the cooling so the phone stays below the point where it starts to throttle.

Bash

adb push sonoff_ctl /data/local/tmp/sonoff_ctl
adb push zeus_cooler.sh /data/local/tmp/zeus_cooler.sh
adb push zeus_cooler.conf /data/local/tmp/zeus_cooler.conf
adb shell "su -c 'chmod 755 /data/local/tmp/sonoff_ctl'"

How it works: zeus_cooler.sh reads CPU temps every 2 seconds. Hit 45°C? The fan kicks on via sonoff_ctl. Drops to 42°C? Fan turns off. If it hits critical (55°C), it kills Ollama and pings you on Telegram.
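
The on/off behaviour described above is a hysteresis: the fan switches on at 45°C and only switches off again at 42°C, so it does not flap around a single threshold. Here is a minimal sketch of that decision logic using the thresholds from the post. The function names and the thermal zone path are assumptions, and the actual sonoff_ctl and Telegram calls are left out because their arguments are not documented here.

```shell
#!/bin/sh
# Sketch of the fan hysteresis (thresholds from the post, whole degrees Celsius).

FAN_ON_C=45
FAN_OFF_C=42
CRITICAL_C=55

fan_decision() {
    # args: current state ("on" or "off"), temperature in degrees Celsius
    state="$1"
    temp="$2"
    if [ "$temp" -ge "$CRITICAL_C" ]; then
        echo "critical"      # caller would stop Ollama and send an alert
    elif [ "$temp" -ge "$FAN_ON_C" ]; then
        echo "on"
    elif [ "$temp" -le "$FAN_OFF_C" ]; then
        echo "off"
    else
        # Between the two thresholds: keep the fan as it was.
        if [ "$state" = "off" ]; then echo "off"; else echo "on"; fi
    fi
}

read_temp_c() {
    # Linux thermal zones report millidegrees Celsius.
    # Which zone maps to the CPU is device-specific (assumed zone 0 here).
    zone="${1:-/sys/class/thermal/thermal_zone0/temp}"
    echo $(( $(cat "$zone") / 1000 ))
}
```

A polling loop would call read_temp_c every 2 seconds, pass the result to fan_decision together with the previous state, and act only when the returned state changes.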

zeus_cooler.conf

On AliExpress:

  • Smart Plug: SONOFF S60 EU (listed as "SONOFF Wifi Socket Wifi Smart Socket Overload Protection Timer Smart Scene Remote Control Via EWeLink Home IFTTT"). Any SONOFF smart plug will probably work.
  • Cooler: Magnetic Semiconductor Phone Cooler (listed as "Ice/Frost Cooling Pad for Mobile Gaming & Streaming").

5.4 Launching the Server

With files in place, initiate the headless mode and reconnect remotely:

Bash

adb disconnect
adb shell "su -c 'sh /data/local/tmp/zeus_cryo.sh'"
# Reconnect over Wi-Fi (replace with your phone's IP)
adb connect 192.168.1.31:5555
# Check system status
adb -s 192.168.1.31:5555 shell "su -c 'sh /data/local/tmp/zeus_status.sh'"

(You can unplug the USB cable after the connect command).
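
Since the phone needs a moment to bring Wi-Fi back up after the UI is stopped, the first adb connect may fail. A small retry helper can smooth that over. This is a sketch only: the function name, the retry count and the delay are arbitrary choices, and the ADB variable exists just so the helper can be tried without a device.

```shell
#!/bin/sh
# Sketch: retry the Wi-Fi ADB connection until the phone reports "device".

wait_for_adb() {
    # args: host:port, number of attempts (default 10), delay in seconds (default 3)
    host="$1"
    tries="${2:-10}"
    delay="${3:-3}"
    adb_bin="${ADB:-adb}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        "$adb_bin" connect "$host" >/dev/null 2>&1 || true
        # get-state prints "device" once the connection is usable.
        state="$("$adb_bin" -s "$host" get-state 2>/dev/null)" || state=""
        if [ "$state" = "device" ]; then
            echo "connected"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timeout"
    return 1
}

# Example (replace with your phone's IP):
#   wait_for_adb 192.168.1.31:5555 && \
#       adb -s 192.168.1.31:5555 shell "su -c 'sh /data/local/tmp/zeus_status.sh'"
```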

6. Real-World Benchmarks

Per community requests, I ran some heavy tests to see what this Snapdragon chip could handle in a headless state.

Prompt used: "Write a 2000-word IT project essay."

| Metric                 | Model 1: Gemma4 E2B (Q8)       | Model 2: Qwen2.5 7B (Q4) |
|------------------------|--------------------------------|--------------------------|
| Output Generated       | 1,312 words (without thinking) | 3,453 words              |
| Total Duration         | 21m 18s                        | 43m 34s                  |
| Load Duration          | 400.39 ms                      | 282.03 ms                |
| Prompt Eval Time       | 1.01 s (24.67 tokens/s)        | 5.29 s (3.59 tokens/s)   |
| Eval Rate (Generation) | 2.16 tokens/s                  | 1.54 tokens/s            |
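
To put the numbers above on a common footing, here is a quick back-of-the-envelope calculation of words per minute and an approximate token count for each run. It is a rough sketch: it multiplies the eval rate by the total duration, which also includes load and prompt eval time, so the token figures are slight overestimates. The function name is made up.

```shell
#!/bin/sh
# Sketch: derive words/min and an approximate token count from the benchmark figures.

bench_summary() {
    # args: name, words, minutes, seconds, tokens_per_second
    awk -v n="$1" -v w="$2" -v m="$3" -v s="$4" -v r="$5" 'BEGIN {
        secs = m * 60 + s
        printf "%s: %.1f words/min, ~%d tokens\n", n, w * 60 / secs, r * secs
    }'
}

bench_summary "Gemma4 E2B (Q8)" 1312 21 18 2.16
bench_summary "Qwen2.5 7B (Q4)" 3453 43 34 1.54
# Prints:
#   Gemma4 E2B (Q8): 61.6 words/min, ~2760 tokens
#   Qwen2.5 7B (Q4): 79.3 words/min, ~4025 tokens
```

Note that the Gemma word count is given "without thinking", so its words-per-minute figure probably understates how much text the model actually generated.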

I've also attached power measurements, a short real-time video, and the raw model logs to the post.

GEMMA4-E2B-8Q.txt

Qwen2.5-7B-Q4_K_M.txt

https://reddit.com/link/1smedrp/video/tybzuwfkaevg1/player

https://preview.redd.it/4iuh1koraevg1.jpg?width=3072&format=pjpg&auto=webp&s=40d269e87480ac423d718cc933596be816510dee

https://preview.redd.it/r59343ntaevg1.jpg?width=3072&format=pjpg&auto=webp&s=ec6c51bafc75004957af6b5cbe975f3cf9ab7541

Note on llama.cpp: I spent half a day trying to compile llama.cpp natively in Termux but kept hitting fatal spawn.h errors. Because of that, this guide focuses on my stable setup.

I still plan to get it compiled eventually.

Thank you all for the interest. I hope this guide inspires some of you to dust off your old flagships and build something similar!

submitted by /u/Aromatic_Ad_7557