That Blog

Network Radio Part 1: The Database

Well, after making my last blog post earlier this month, about the projects I wanted to do and what things I wanted to work on, I got inspired to start on my network radio station thing. If you didn't read that post, I pretty much wanted to write this whole bunch of code to automate playing stuff from my giant music library, like a radio station. I wanted it to also have some jingles, breaks, station IDs, and other such things that it would play in between each song, just like an actual station. Then I would make the stream available on my network so I or whoever could listen to it. I could have different programs, where certain types of music were played at certain times. I could have an automated weather report that I could generate and so forth, and use TTS to read it on air.
You know, the weird thing is that I don't even look forward to listening to it that much. I really just look forward to building the thing. That's the fun part for me.

Starting The project

Well, in order for the station to be as easy as possible to run, and to play songs on, I decided I would need a database containing all the metadata about each track, album, artist, and genre. So I set out to build that. I figured once I got all the tracks inserted, I could then just make some SQL queries and really easily and programatically get the information I needed. So this was the SQL I used to build my database tables:

CREATE TABLE artists (
    id INTEGER PRIMARY KEY,
    name TEXT,
    bio TEXT
);

CREATE TABLE genres (
    id INTEGER PRIMARY KEY,
    name TEXT
);

CREATE TABLE albums (
    id INTEGER PRIMARY KEY,
    name TEXT,
    release_date TEXT,
    cover_art TEXT,
    artist_id INTEGER,
    genre_id INTEGER,
    FOREIGN KEY (artist_id) REFERENCES artists(id),
    FOREIGN KEY (genre_id) REFERENCES genres(id)
);

CREATE TABLE tracks (
    id INTEGER PRIMARY KEY,
    name TEXT,
    duration INTEGER,
    file_path TEXT,
    lyrics TEXT,
    album_id INTEGER,
    FOREIGN KEY (album_id) REFERENCES albums(id)
);

Each table is related to another in some way. I can use SQL to query for any track based on any number of characteristics. For instance, I can select all the tracks from the database where the genre is "Country". Or I could select all tracks in the table where the artist is George Strait. Or I could select every track where the release date is between 1980 and 1990. Or I could select all tracks that are less than two and a half minutes. Or, I could select all of those things at once and get all tracks by George Strait which are Country, between 1980 and 1990, and are no more than two and a half minutes long. You get the picture.

Populating The Database

Well, before I could actually start writing code to pull out tracks and play them, I had to put all the tracks into the database. I had to iterate my entire music library, which is about 2.53TB at the time of writing. The obvious thing to do was write a script to go through and populate all the tables, and so the first program I wrote for the radio project was a directory crawler, in python.

My First crack At the Crawler

My music library is located on my NAS, and is mapped to my computer as the N drive. On the NAS, the music folder contains about 16 different folders, each of them representing a different genre. Inside of each genre folder are artists that write music in that genre. And inside of each artist folder were album folders. The singles the artist released were in the root artist folder, which is how I identified them as singles. An example file path might look like this: "N:\Music\Country & Bluegrass\Ray Pennington\It's All In The Swing\06 - Out Of Nowhere.mp3". Excellent song, by the way, you should listen to it
Well anyway, in my script, I first started by writing four nested loops. The first loop would loop through the genre folders and execute the second loop, which would loop over the artists in that genre folder, and execute the third loop, which would iterate the albums in each artist folder and execute the fourth loop, to get all of the tracks in each album folder and write them to the database. The script looked something like this pseudocode:

genre loop
    insert the genre name into the database
    execute artist loop
        insert artist into the database
        execute album loop
            insert the album information into the database, and also grab any singles that aren't in album folders to insert those too
            execute the track loop
                insert the information about each track

Well, this kind of worked, in a roundabout way. It wasn't clean though, and was very, very hard to troubleshoot. It was also very rigidly designed, and couldn't really handle any other conditions than the ones I had designed it for. For instance, some of the genre folders contained so many artists that I sorted them into folders by letter to speed up windows file explorer when it opened and displayed the directory. The way my code was written couldn't adapt though. Or at least it was hard for me to refactor it to adapt to that. Thus, I ended up abstracting some of the loops into their own functions, so I could more easily call them whenever I really needed them instead of them being explicitly defined as code that would run at a specific point in the program. I really thought I would do this as a quick one-off script that I'd rarely have to use, so I didn't put in a lot of effort initially. But I definitely should have.

Crawler part two

This time, I took the artist loop and put it inside of its own function called "insertArtist". This function would take a path to an artist and then loop through it. If the item the loop was on was a file, it was considered a single, and I just put it right into the database. But, if the item it was on was a folder, it was considered an album and I executed my other function, "insertAlbum".
All this function did was take the path to an album and get all the tracks in the folder. Then it inserted them. The method here was a little different. Instead of creating more nested loops, I used a function of python's OS library, os.path.walk. It just grabbed every path to every file in every folder in a given folder. This was useful here, because sometimes album folders contained CD folders, where the tracks were separated into the discs they would have been on if you physically had them. So after I got the list of paths I needed, I used a loop to go through and insert each one into the database.
This design worked much better in terms of adaptability and further expansion on the script down the road. For instance, if I added a new artist to my library, or an album, and I wanted to put that in the database, I would just need to run the insertArtist or album functions and supply the path. Now, the script looked something like this:

InsertAlbum function

InsertArtist function

genre loop
    insert the genre information
     execute the artist loop
        if the list of artists looked like "#, a, b, C, D, ..."
            execute a letter loop
                 run the insertArtist function and pass the artist the loop is currently on
        else
            call the insertArtist function and pass the artist the loop is on

Other things my script did

Earlier I talked about my database tables and showed you the code for creating them. You may remember that there were a lot of attributes about albums and tracks that I was storing, such as album release date, track duration, and even lyrics. In my script, I was getting this data and putting it in as I processed each album and track.
A lot of the information I needed was stored right in the tags of each audio file. If you're unfamiliar, each mp3 file usually contains a list of tags that tell you things about it, such as the artist, album, disc of the album, release date, lyrics, bpm, just all kinds of stuff. The reason I didn't just rely on tags for all the data is because my library was already fairly well sorted and I could retrieve the important info from just the file path. But also, tracks aren't always tagged correctly, if at all. But most of my music is, and I could get quite a bit of what I needed by just reading the tags.
I used a command line tool called "ffprobe.exe", which comes with "ffmpeg.exe" if you've ever used that. If not, well ffmpeg is a tool for handling and converting between audio formats, and ffprobe simply probes the file for information about it. So, I asked ffprobe to tell me about each track as I looped through them in my script, and got the date and lyrics from the tags, if they were there. Then ffprobe was able to determine the track's length and tell me that too.
Again, though, not all of my music is tagged correctly. Some of it is missing release date tags, and so what I need to do is figure out how to pull that information from an online music database and insert that. But that'll come later on. I'd also like to leverage the Wikipedia API to insert a link to the artist's page into the "bio" field in the artists table.

Running the script

Well, after a lot of testing, my script was finally ready for me to let it run through my library. So I set it to going, and watched it. I had error handling implemented, so it would log any errors and keep going. I wouldn't have to supervise it to make sure it didn't stop. And there for a while, things went great... until they didn't. Oh the script was fine. But my library was way out of order.
You see, I have gathered quite a bit of my music library myself. But I have also obtained significant portions of it from other people, and they like to sort it much, much differently than I do. For instance, a lot of the artist folders I had were sorted not only by album,but but by type of album. Studio, mixtape, live, bootleg... etc. Then, even after that, the album names contained completely useless information that I didn't need. Like the release year of the album. Like the artist's name. Like the bitrate of the album. I should have cleaned all that up years ago, but I'm finally doing that now so I can get it all the way I want it. But, I did get a pretty good bit of music in the database. So I could actually proceed to the next portion of the project, scheduling and playing tracks. ... Or could I?

Fatal Bug

Ok maybe it wasn't fatal, but it did render a lot of the data useless. There was a key, key! relationship that wasn't properly established between the artists and their albums. If you were to select all the albums from the database where the artist was George Strait, well, you'd get no results. Why? All the albums are there, but they aren't tied to the artist, because of one simple programming concept I wasn't clear on.
In python, there are what are known as global variables. A variable is simply something that holds information, and a global variable can be accessed anywhere in your script. Local variables, though, can only be accessed inside the functions they're part of.
When I used my script to insert my data into the database, I would retrieve the ID column of the last row I just inserted. For example, when I inserted an artist into the database, the database would give that artist an ID and then I would store the ID in a variable to use it later. When I inserted an album, I would also insert the artist ID to associate the artist with the album. I did the same thing with albums, to associate them with their tracks.
I wanted to be able to access these ID variables anywhere, so I made them global variables. I wouldn't even have to pass them around from function to function that way, they would just be accessible anywhere I needed them. And they were. But what I forgot was that you needed to actually write: "global variable_name" before actually updating the global variable. You could read it anywhere you wanted, but if you didn't specify that you were updating the global variable, then you would just be saving your value into a new, local variable.
So what was happening was, my insertArtist function would put the each artist into my database, store the ID in a variable called artistID, and call the insertAlbum function. The album function would read from the global variable "artistID", which, since I didn't specify that I wanted to update the global variable artistID in the artist function, was always set to 0. Thus, every album in my database now looks like it was composed, recorded and released by one artist, with the id of 0 in my database. Sqlite starts numbering at 1 though, so in short, none of the albums are related to any artist.

Next Steps

Well, I fixed that bug. My goodness. Now I have to crawl the database all over again. It's not hard, it just takes some time for the script to run. But before I can run it again, I need to go through my music library and organize everything properly. Then, and only then, will I run the crawler again.
After that, I'll start working on how to stream a file to an icecast server from python. Oh and create an Icecast server too. Then I'll work on the scheduler part of the station, which will take primary responsibility for determining what gets played on the station and when. I thought of having files I could build, called "programs", which I could point the scheduler at. They would just be simple files specifying what kind of music I want played and for how long, and at what time, and so on.
Well, that's about all I have for this week! I'll be trying to work on this some soon and update when I'm a little further along on it. Thanks for reading!