How I Built My Own Spotify (A Full-Stack AI Music Server)

May 26, 2026

If you love music but hate subscriptions, ads, and losing access to tracks because of licensing issues, building your own music server is the ultimate solution.

When I first started this project, I just wanted a simple way to stream my local MP3 files. But as I dove deeper into system architecture and machine learning, it quickly evolved into a fully automated, AI-powered music ecosystem. I ended up building something that doesn't just stream music: it downloads it, organizes it, and uses vector embeddings to recommend songs in the spirit of Spotify's algorithmic "Discover Weekly."

This post details everything I learned and how I actually built my own Spotify alternative, step by step, so you can do it too.



The Infrastructure & Storage System

To build a seamless streaming experience on a budget, I needed an architecture that was lean but highly scalable.

I deployed a personal music streaming server on an Ubuntu VM on Microsoft Azure. Because it was a burstable 1GB RAM VM, I had to optimize it heavily. I disabled unnecessary Linux services, added swap space, and carefully tuned cache sizes to keep the system fast and responsive.

The biggest hurdle was storage. Music libraries grow fast, and VM disk space is expensive. To solve this, I used Microsoft Azure Blob Storage for my music files, mounted directly to the Linux filesystem using blobfuse2.

My music lives seamlessly at /mnt/music. This allowed me to bypass the VM's small local disk and keep growing my library for pennies.



Step 1: Setting Up the Music Server (Navidrome)

With the storage mapped, I needed the actual streaming backend. I chose Navidrome—an open-source, lightweight music server compatible with the Subsonic API.

Instead of installing it directly, I used Docker to keep things clean. I configured it for on-the-fly transcoding, Last.fm scrobbling, and multi-user access.

Having a server is useless if you can only access it at home. Instead of messing with port forwarding and risky firewall rules, I used Cloudflare Tunnels. By installing cloudflared, I routed traffic securely to my local Docker container. Now, I have a fast, SSL-encrypted streaming platform accessible from anywhere via my custom domain.

For the frontend, I deployed Feishin on Vercel. Feishin gives you a gorgeous, Spotify-like UI with web playback, queue systems, and playlists. Setting it up required a bit of debugging around Chromium audio issues and WebAudio vs HTML5 playback, but the end result is a premium web player.


Step 2: Building the Command Center (Telegram Automation)

Managing a server via SSH is annoying when you're on a phone. I wanted a way to manage my music library from my pocket, so I built a custom Telegram Bot using Python (python-telegram-bot).

This bot became my all-in-one assistant. I programmed it to:

  • Monitor system health (RAM, CPU, and disk usage).
  • Manage users and trigger Navidrome library scans.
  • Clean duplicates by executing Linux fdupes scripts automatically.
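
A minimal sketch of the kind of health check such a bot could run, using only the Python standard library. The function names `parse_meminfo` and `health_report` are hypothetical, and the sketch assumes a Linux host where /proc/meminfo is available; wiring the result into a Telegram command handler is left out.

```python
import os
import shutil


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def health_report(meminfo_text: str, path: str = "/") -> str:
    """Build a short plain-text status message suitable for a chat reply."""
    mem = parse_meminfo(meminfo_text)
    total_kb = mem.get("MemTotal", 0)
    available_kb = mem.get("MemAvailable", 0)
    ram_pct = 100 * (total_kb - available_kb) / total_kb if total_kb else 0.0

    disk = shutil.disk_usage(path)
    disk_pct = 100 * disk.used / disk.total if disk.total else 0.0

    load_1m = os.getloadavg()[0]  # 1-minute load average (Unix only)

    return (
        f"RAM: {ram_pct:.0f}% used of {total_kb // 1024} MiB\n"
        f"Disk ({path}): {disk_pct:.0f}% used\n"
        f"Load (1 min): {load_1m:.2f}"
    )


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(health_report(f.read()))
```

Keeping the parsing separate from the file read makes the report easy to test with canned input before it ever touches the real server.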

The Magic – Automating Spotify Downloads: I wanted to send a Spotify link to my bot and have it automatically download every track directly into my library. Because Telegram bots can't easily download tracks directly from Spotify, I built a two-bot ecosystem:

  1. The Main Bot: Takes my /spotify command.
  2. The Userbot (Telethon): A script running on a real Telegram account.

When I text my main bot a Spotify URL, the Userbot quietly forwards that URL to @M#######bot (a public bot that converts Spotify links). As that bot replies with the .mp3 files, my Userbot intercepts them and downloads them directly to my Azure Blob storage. It even features a live progress bar that updates in my Telegram chat as the files hit my server!
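
The progress bar itself can be plain text that the bot re-sends by editing a single chat message. Below is a minimal sketch of the rendering step only; `render_progress` is a hypothetical helper, and the block characters and width are arbitrary choices, not taken from the original bot.

```python
def render_progress(done: int, total: int, width: int = 10) -> str:
    """Return a text progress bar such as '█████░░░░░ 5/10'."""
    if total <= 0:
        return "░" * width + " 0/0"
    done = max(0, min(done, total))   # clamp so the bar never overflows
    filled = width * done // total    # number of filled cells
    return "█" * filled + "░" * (width - filled) + f" {done}/{total}"


if __name__ == "__main__":
    for n in range(0, 11, 5):
        print(render_progress(n, 10))
```

Each time a track finishes downloading, the bot would call this with the updated count and replace the message text with the new string.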



Step 3: The Crown Jewel – A Vector AI Recommendation Engine

This is the part of the project I am most proud of. I didn't just want a music player; I wanted an intelligent recommendation engine.

Most self-hosted recommendation systems rely on rigid metadata—they look at the ID3 tags, check the genre field, or use basic Last.fm tags. The problem with this is that music doesn't fit neatly into text boxes. An aggressive, fast-paced electronic track and a slow, ambient, moody electronic track might both be tagged "Electronic," but they feel completely different.

I wanted a system that understood the vibe. I essentially built a miniature Spotify recommendation engine by mathematically representing songs as vectors in a high-dimensional space.

Converting Audio to Vectors (Embeddings)

For each of the 1,326 songs currently in my library, I extracted an audio embedding. Instead of text, every single song became a point in 512-dimensional space.

An embedding looks like a giant list of floating-point numbers:
[0.183, -0.442, 0.991, 0.004, -0.771 ... ]

These 512 numbers act as the compressed "musical DNA" of the track. The model isn't explicitly told "this is a sad song." Instead, it learns to mathematically encode things like:

  • Mood and emotional tone
  • Sonic texture and instrumentation patterns
  • Energy levels and pacing
  • Rhythmic structure and ambience

Because of how this high-dimensional space works, similar songs cluster together geometrically. Dreamy ambient tracks form a cluster in one area of the 512-dimensional space, aggressive metal clusters far away, and cinematic orchestral tracks form their own region.

I stored this entire mapping in a pickle file called final_music_vectors.pkl, which acts as my vector database. Each entry contains the song title, artist, the 512-float array, and extracted semantic moods.
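
To make the shape of that file concrete, here is a minimal sketch of one possible layout: a list of plain dictionaries written with pickle. The key names (`title`, `artist`, `embedding`, `moods`) and the helper functions are assumptions for illustration; the real file may be organized differently.

```python
import pickle

EMBEDDING_DIM = 512  # dimensionality described in the post


def make_record(title, artist, embedding, moods):
    """Build one library entry, checking the embedding length."""
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} floats, got {len(embedding)}")
    return {
        "title": title,
        "artist": artist,
        "embedding": [float(x) for x in embedding],
        "moods": list(moods),
    }


def save_library(records, path):
    """Write the whole library to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(records, f)


def load_library(path):
    """Read the library back. Only unpickle files you created yourself."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Plain dictionaries keep the file loadable from any script without importing a custom class first.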

Mathematical Similarity Search (Cosine Similarity)

So, how do you actually get recommendations?

When a user asks: "Recommend songs similar to Apocalypse by Cigarettes After Sex," the engine first locates the vector for Apocalypse (the query_vector).

It then compares this query_vector against ALL other 1,325 song vectors in the database. But it doesn't measure raw distance. Instead, it uses Cosine Similarity:

cos(θ) = (A · B) / (||A|| ||B||)

This formula measures the cosine of the angle between two vectors, ignoring their lengths. If two songs have a highly similar sonic profile, texture, and emotional tone, their vectors will point in almost the exact same direction, resulting in a cosine similarity score very close to 1.0.

The engine sorts the database by these scores. For example:

[
  {
    "title": "Space Song - Beach House",
    "score": 0.89
  },
  {
    "title": "After Dark - Mr. Kitty",
    "score": 0.82
  }
]

Higher score = a closer sonic match.
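
The scoring and sorting step can be sketched in a few lines of plain Python. This is an illustrative version, not the original script: `cosine_similarity` and `rank_similar` are hypothetical names, and the library is assumed to be a dictionary mapping titles to vectors.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| ||B||); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query_title, library, top_k=5):
    """Score every other song against the query, one by one, then sort."""
    query_vector = library[query_title]
    scored = []
    for title, vector in library.items():
        if title == query_title:
            continue  # never recommend the query song itself
        scored.append({"title": title,
                       "score": cosine_similarity(query_vector, vector)})
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for 512-dimensional embeddings.
    toy = {"a": [1.0, 0.0, 0.0], "b": [0.9, 0.1, 0.0], "c": [0.0, 1.0, 0.0]}
    print(rank_similar("a", toy))
```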

From Loops to Matrix Optimization

Initially, my Python script compared the query song to the database one-by-one using a standard for loop. It worked, but it was slow.

To make it production-ready, I optimized the mathematics. Instead of looping, I stacked all 1,326 song vectors into a massive NumPy song_matrix with a shape of (1326, 512).

By using optimized linear algebra, a single matrix-vector product computes the dot product of the query track against every row of the matrix at once. As long as every vector is L2-normalized first, those dot products are exactly the cosine similarities. This replaces 1,325 Python-level loop iterations with one vectorized NumPy call, which is much faster and much closer to how production machine learning systems are engineered.
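
A minimal NumPy sketch of this vectorized search follows. The helper names are hypothetical, and the sketch normalizes on every call for clarity; in practice the normalized matrix would likely be computed once at load time and reused.

```python
import numpy as np


def normalize_rows(matrix):
    """Scale each row to unit L2 length (guarding against zero rows)."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.clip(norms, 1e-12, None)


def top_matches(song_matrix, query_index, top_k=5):
    """Return [(row index, cosine score)] for the closest rows to the query."""
    unit = normalize_rows(np.asarray(song_matrix, dtype=float))
    # One matrix-vector product yields the cosine score for every song.
    scores = unit @ unit[query_index]
    scores[query_index] = -np.inf  # exclude the query song itself
    top_k = min(top_k, len(scores) - 1)
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    song_matrix = rng.normal(size=(1326, 512))  # random stand-in data
    print(top_matches(song_matrix, query_index=0))
```

For a (1326, 512) matrix the product touches every song in one call, so the Python interpreter is no longer in the inner loop.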

The Hybrid Recommender: Semantic Boosts & Penalties

Audio embeddings alone are incredibly powerful, but to make the recommendations feel human, I built a hybrid engine.

I attached semantic metadata tags to songs like "dreamy", "melancholic", "night drive", and "romantic". My engine now calculates a combined score: Audio Similarity + Semantic Vibe Similarity.

To prevent the algorithm from getting "stuck" (a common issue in recommendation engines), I implemented ranking logic modeled on Spotify's:

  • Artist Repetition Penalty: If the algorithm tries to recommend 10 Lana Del Rey songs in a row, the engine actively reduces the score of consecutive tracks by the same artist, forcing variety.
  • Semantic Boosts: If a user queries for a "sad dreamy late night" vibe, any song in the vector space that mathematically matches the audio profile and possesses the semantic tags "dreamy" or "ambient" gets a score multiplier.

This creates vibe continuity. The system can recommend songs that aren't the same genre or the same artist but feel emotionally alike.
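
One way such a re-ranking pass could work is a greedy loop that picks the best remaining candidate, then penalizes that artist for later picks. The sketch below is an assumption-laden illustration: the function name, the multiplier values (1.1 and 0.8), and the per-artist decay rule are all hypothetical, and it assumes non-negative similarity scores.

```python
def rerank(candidates, query_tags, top_k=10, tag_boost=1.1, artist_penalty=0.8):
    """Greedy re-rank with a semantic boost and an artist repetition penalty.

    candidates: list of dicts with "title", "artist", "score", "tags".
    Scores are assumed non-negative so multipliers behave as intended.
    """
    wanted = set(query_tags)
    remaining = list(candidates)
    picked = []
    artist_counts = {}

    def adjusted(song):
        score = song["score"]
        if wanted & set(song["tags"]):
            score *= tag_boost  # semantic boost for matching vibe tags
        # Each earlier pick by the same artist shrinks the score again.
        score *= artist_penalty ** artist_counts.get(song["artist"], 0)
        return score

    while remaining and len(picked) < top_k:
        best = max(remaining, key=adjusted)
        remaining.remove(best)
        picked.append({**best, "final_score": adjusted(best)})
        artist_counts[best["artist"]] = artist_counts.get(best["artist"], 0) + 1
    return picked
```

Because the penalty is applied while picking rather than once up front, a second track by the same artist can still appear, just further down the list.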


Step 4: Reverse Engineering & Custom FastAPI Backend

Having the world's coolest Python script is useless if you can't hit "Play." I needed to integrate my AI directly into my music player's frontend.

I wrote a FastAPI service and designed it to mock Subsonic-compatible endpoints. Specifically, I exposed my recommendation engine on /rest/getSimilarSongs.view?id=SONG_ID.

Theoretically, any Subsonic-compatible client (like Feishin) could hit this endpoint, and my FastAPI server would perform the matrix multiplication and return the AI-curated list in standard XML/JSON formats.
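
As a rough sketch of the JSON side, the function below wraps a list of recommended songs in the Subsonic response envelope that a route handler could return. The function name and the minimal set of song fields are assumptions; real clients may expect more fields (album, duration, cover art), and the version string is only an example.

```python
def similar_songs_response(songs, api_version="1.16.1"):
    """Wrap recommended songs in a Subsonic-style JSON envelope.

    songs: iterable of dicts with at least "id", "title", "artist".
    """
    return {
        "subsonic-response": {
            "status": "ok",
            "version": api_version,
            "similarSongs": {
                "song": [
                    {"id": s["id"], "title": s["title"], "artist": s["artist"]}
                    for s in songs
                ]
            },
        }
    }


if __name__ == "__main__":
    import json

    demo = [{"id": "abc", "title": "Space Song", "artist": "Beach House"}]
    print(json.dumps(similar_songs_response(demo), indent=2))
```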

The AutoDJ Discovery

Here is where I had to play detective. I wanted Feishin's "AutoDJ" feature to be powered by my AI. I pointed it at my server, but AutoDJ kept playing random songs.

Using Chrome DevTools, I ran a network trace and conducted a deep HAR (HTTP Archive) analysis of Feishin's fetch requests. I discovered a major architectural quirk: Feishin's AutoDJ does NOT use getSimilarSongs.view!

Instead, it relies entirely on /rest/getRandomSongs.view and handles the logic differently.

This was a massive discovery. To intercept this, I began experimenting with an AI-powered AutoDJ middleware proxy. The architecture I explored involved placing Nginx in front of Navidrome, intercepting the getRandomSongs requests, routing them to my FastAPI endpoint, rewriting the payloads with my vector AI results, and sending them back to the client.
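
The payload-rewriting step of such a proxy can be sketched independently of Nginx. The function below is hypothetical: it assumes the upstream getRandomSongs reply has already been parsed from JSON, swaps in the recommended songs, and falls back to the untouched reply when there is nothing to recommend.

```python
import copy


def rewrite_random_songs(upstream, recommended):
    """Replace the song list in a parsed getRandomSongs JSON reply.

    upstream: dict parsed from the music server's response.
    recommended: list of song dicts produced by the recommender.
    """
    if not recommended:
        return upstream  # nothing to inject, pass the reply through as-is
    body = copy.deepcopy(upstream)  # never mutate the original reply
    envelope = body.setdefault("subsonic-response", {})
    envelope["randomSongs"] = {"song": list(recommended)}
    return body
```

Falling back to the original reply keeps playback working even if the recommender is down or has no vector for the current track.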

While I paused the proxy implementation to polish other features, the groundwork for a transparent AI-interception layer is in place.


Final Thoughts & What I Actually Achieved

By the end of this journey, I had completely replaced a paid service with a superior, self-hosted alternative.

I successfully built:

  • A scalable cloud music server backed by blob storage.
  • Telegram automation for remote management and Spotify importing.
  • A custom AI backend with a 512-dimensional vector recommender.
  • Fast matrix-optimized Cosine Similarity search.
  • Hybrid recommendation ranking with artist penalties and semantic boosts.

More importantly, I gained genuinely advanced full-stack and AI infrastructure knowledge. I learned the deep internals of Docker, Linux tuning, Cloudflare reverse proxies, FastAPI middleware, vector embeddings, and recommendation system engineering.

If you are a developer looking for a fun homelab project, build your own music server. It frees you from subscriptions, teaches you invaluable DevOps and ML skills, and feels like absolute magic every time you hit play on a track.


Try It Out!

If you want to see the final result in action, you can try out my self-hosted Spotify right now.

Head over to the web player: 👉 listen.prashantshirke.me
Head over to the mobile web player: 👉 musico.prashantshirke.me
Test Credentials:

  • Username: test
  • Password: test123

If you're reading this, try setting up a basic Dockerized app today. It's the first step to owning your own digital infrastructure.