
UX DESIGN, FRONT END DEVELOPMENT, MACHINE LEARNING

MoodSpace

A Spotify Soundtrack Generator that uses machine learning to create a personalized soundtrack for a user's favorite movie scene.


My Role

Conceptualization, Design Strategy, Wireframing, Prototyping, Usability Testing, Machine Learning, Brand Identity

Team

Kendall Arata, Ben Brill, Michael Ting

Tools

Figma

Python

Beautiful Soup

Heroku

Duration

PIC 16B - Advanced Python Programming

8 weeks

Project Overview

I’ve always viewed music as a connective medium; it’s something that people across all different cultures and languages can sit down and admire together. With today’s advancements in streaming services, music has never been more accessible.

However, I often find that people – including myself – have a difficult time fully verbalizing their music taste. Especially when music tastes are so diverse, it can be difficult to distill the variety of music you listen to into a couple of sentences. That’s why Spotify Wrapped is so popular on social media; it does all the summarizing for you.

Since discovering the Spotify API, which exposes statistics on the musicality, energy, and even “wordiness” of hundreds of millions of songs, I saw the potential to use this data to help people communicate their music tastes to each other in an automated fashion.

Apparently, this was not an uncommon thought. Ben and Michael had similar ideas about how we could best leverage this unique dataset. I came up with the idea of matching different feelings and emotions to the songs you commonly listen to. That way, when comparing music tastes with another person, you would be able to see what each of you considers a “sad” or “exciting” song, making sharing music tastes more streamlined. And what better way to convey emotion than through movies?

Our group came up with MoodSpace, a sort of “movie soundtrack generator”. Using Spotipy, what we’d learned about TensorFlow and neural networks, and some SQLAlchemy we picked up along the way for the backend, we built a machine learning model that, given one of our curated famous movie scenes (think the throne room scene from The Last Jedi, or the ending of Titanic), could come up with a playlist of songs that fit it. We then deployed it to AWS and built a front-end Python Flask application to serve the recommendation interface.

Our final product lets you choose from 10 famous movie scenes. Once you select a scene, our algorithm outputs three songs from your Spotify library that match its mood. You can then compare these songs with other people’s to see how they would score the same scene given their music taste.

Mission

Connect individuals to each other through their favorite entertainment mediums: music and film

The Machine Learning

Now I want to dive into a little adventure we had with the biggest problem in our project as a whole. We had gotten to a point where we had set up an initial stab at a machine learning pipeline (a rough sketch follows the list):

  1. Perform k-means clustering on a baseline set of songs

  2. Build and train the machine learning model using our created clusters as targets

  3. Use the same model on the movie scripts

  4. Find closest song
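
That first pipeline looked roughly like the sketch below – not our actual code. The DataFrame, column, and function names are placeholders, and the lyrics-to-cluster classifier in steps 2–3 is only stubbed out.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

FEATURES = ["energy", "valence", "tempo", "liveness"]

def cluster_songs(songs: pd.DataFrame, k: int = 8):
    """Step 1: k-means over the baseline songs' audio features.
    (In practice the features should be scaled first, since tempo
    lives on a much larger range than the 0-1 metrics.)"""
    km = KMeans(n_clusters=k, random_state=0, n_init=10)
    labeled = songs.assign(cluster=km.fit_predict(songs[FEATURES]))
    return labeled, km

# Steps 2-3: train a lyrics -> cluster classifier, then run it on each
# screenplay to get a predicted cluster for the scene (omitted here).

def closest_songs(labeled: pd.DataFrame, km: KMeans,
                  scene_cluster: int, k: int = 3) -> list[str]:
    """Step 4: return the songs nearest that cluster's centroid."""
    members = labeled[labeled["cluster"] == scene_cluster]
    dists = np.linalg.norm(
        members[FEATURES].to_numpy() - km.cluster_centers_[scene_cluster], axis=1)
    return members.iloc[np.argsort(dists)[:k]]["name"].tolist()
```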

 

We originally thought this was a clever way to marry together the “feeling” of two kinds of data: movies and songs. We didn’t think too much about it for a while, though, because we were more focused on making the actual website and interface work.

 

However, once we finished that – unfortunately quite late into the project – we started experimenting with the actual recommendations, and soon realized the Big Problem™:

 

Many very different movie scenes were recommending the same songs over and over again.

 

We spent so much of the project getting a functional pipeline and a nice user interface that the actual thinking and effort on the model was lacking until the very end. And once we finally noticed this, it took us quite a while to figure out exactly why it was happening.

 

After quite a bit of experimenting, we finally figured out where to look. We eventually got around to making a notebook that diagnosed the problem in a nice quantitative way, but here’s the long and short of it: all the movies ended up super close together in feature space!
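
In essence, the diagnosis boiled down to a check like the hedged sketch below: predict metrics for every curated scene and measure how spread out those predictions are. The array and function names here are placeholders, not our notebook’s.

```python
import numpy as np

def mean_pairwise_distance(scene_preds: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between the scenes' predicted metrics
    (one row per scene). A tiny value means every scene collapses to roughly
    the same point in feature space, so they all pull in the same songs."""
    diffs = scene_preds[:, None, :] - scene_preds[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    n = len(scene_preds)
    return float(dists.sum() / (n * (n - 1)))
```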

How the Machine Learning Works

Credit for this part goes to Ben Brill, who mostly worked on the machine learning side

Spotify measures several different metrics for each song through its API. For instance, if we wanted to see the statistics for the song “Waiting on the World to Change” by John Mayer, we could pull them straight from the API.
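
A call like the following Spotipy sketch returns those per-track audio features (the client-credentials setup and the track search are illustrative, not our exact code):

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Assumes SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET are set in the environment.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

# Find the track, then pull its audio features.
result = sp.search(q="Waiting on the World to Change John Mayer", type="track", limit=1)
track_id = result["tracks"]["items"][0]["id"]
features = sp.audio_features([track_id])[0]

# A few of the per-track metrics the API reports.
print({key: features[key] for key in ("energy", "valence", "tempo", "liveness", "speechiness")})
```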

However, movie scenes obviously do not have these same metrics, because they have no musicality that can be measured – only dialogue. To bridge the gap between movie and song, we must give the model some medium that both movies and songs share. Luckily, both have some form of text: for songs it is the lyrics, and for movies it is the screenplay.

 

With this in mind, we can now develop an idea of what the inputs and outputs of our model should be. The input should be some sort of text, whether lyrics or screenplay. The output should be a given set of Spotify metrics. We can train our model using song lyrics as input variables and their respective Spotify metrics as target variables. From there, we can apply the model to screenplays to predict what the Spotify metrics would be given the text of that scene. Once we have metrics for both songs and screenplays, we can compare the two to see which songs are most aligned with a selected scene’s screenplay.

Model Details

We implemented the above model infrastructure using a tensorflow.keras neural network. Before the text is fed into the model, it must be vectorized using TensorFlow’s TextVectorization layer, which returns a vector representing the words contained in a given song or screenplay in tokenized form. Once this is complete, we can feed this tokenized vector into our neural network. An outline of our model’s structure is shown below.
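
The summary we actually looked at came from our trained model, but a minimal sketch of a comparable architecture looks something like this. The vocabulary size, sequence length, dropout rates, and hidden-layer width are assumptions; only the 60-dimensional embedding and the four metric outputs come from the write-up.

```python
import tensorflow as tf
from tensorflow.keras import layers

MAX_TOKENS = 2000      # vocabulary size -- an assumption, not our exact value
SEQUENCE_LENGTH = 500  # tokens kept per lyric/screenplay -- also an assumption

# Turn raw strings into fixed-length integer token sequences.
vectorize = layers.TextVectorization(
    max_tokens=MAX_TOKENS,
    output_mode="int",
    output_sequence_length=SEQUENCE_LENGTH,
)
# vectorize.adapt(train_lyrics)  # build the vocabulary from the training lyrics

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vectorize,
    layers.Embedding(MAX_TOKENS, 60),   # n = 60 embedding dimensions
    layers.Dropout(0.2),                # dropout to fight overfitting
    layers.GlobalAveragePooling1D(),    # collapse the sequence dimension
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(4),                    # energy, valence, tempo, liveness
])

model.compile(optimizer="adam",
              loss=tf.keras.losses.MeanSquaredLogarithmicError())
```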

 

Let’s highlight a couple of key features of this model.

Embedding Layer

 

The embedding layer takes an input vector representing a tokenized string and converts it into a vector in n-dimensional space. We chose n = 60. This lets our model place each text in that space, and thus pick up connections and patterns between texts to pass on to subsequent layers of the model.

Dropout Layer

 

These layers are placed at various points in the model to “drop out” some of the connecting neurons within the network, which helps prevent overfitting.

Model Training

 

We trained our model on a random set of songs from the Spotify API that had English lyrics available through the Genius API. The targets were the Spotify metrics energy, valence, tempo, and liveness. These were the results of our final epochs in fitting the model.

Losses are relatively low, as is our mean squared logarithmic error, on both the training and validation data. Once the model was fitted, we saved the weights and used them to predict the energy, valence, tempo, and liveness of our given movie scenes.
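
In Keras terms, that fit–save–predict step looks roughly like the sketch below. The dataset and dictionary arguments are hypothetical placeholders, and the weights filename is illustrative.

```python
import tensorflow as tf

def train_and_score(model, train_ds, val_ds, scene_scripts, epochs=20):
    """Fit the lyrics -> metrics model, save its weights, and predict metrics
    for each curated screenplay. `train_ds`/`val_ds` are assumed to be tf.data
    datasets of (lyric text, [energy, valence, tempo, liveness]) pairs, and
    `scene_scripts` a dict mapping scene name -> screenplay text."""
    history = model.fit(train_ds, validation_data=val_ds, epochs=epochs)
    model.save_weights("moodspace.weights.h5")
    names = list(scene_scripts)
    preds = model.predict(tf.constant([[scene_scripts[n]] for n in names]))
    return history, dict(zip(names, preds))
```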

Finding Song Matches

Now that we have metrics for both our songs and our screenplays, we need a way to determine which songs are most similar to which scenes. We can do this by computing the norm between the metrics of each song and a selected movie scene, which is essentially the distance between two points in n-dimensional space.
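
Concretely, the matching step is a nearest-neighbor lookup in metric space; here is a minimal sketch, with placeholder array and variable names:

```python
import numpy as np

def top_matches(song_metrics: np.ndarray, song_names: list[str],
                scene_metrics: np.ndarray, k: int = 3) -> list[str]:
    """Return the k songs whose predicted metrics sit closest (by Euclidean
    norm) to a scene's predicted metrics. `song_metrics` is (n_songs, 4) and
    `scene_metrics` is (4,); both come from the lyrics/screenplay model."""
    dists = np.linalg.norm(song_metrics - scene_metrics, axis=1)
    return [song_names[i] for i in np.argsort(dists)[:k]]
```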

First, we take the user’s top 20 songs. A random offset is applied so that a unique list of songs is generated each time the user visits the web app, but the list never reaches beyond their top 70 most listened-to tracks. We then run our model on the song lyrics to generate predicted metrics corresponding to the subject of each song. Though the songs already carry Spotify metrics, we wanted metrics generated from the lyrics, so that we match the subject of a song to the subject of a movie scene, rather than the musicality of a song to a movie scene.
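
Pulling that list with Spotipy might look roughly like this (the OAuth setup and time range are assumptions; the offset cap of 50 plus a 20-song window is what keeps everything inside the top 70):

```python
import random

import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Requires a user login with the user-top-read scope; the Spotify app
# credentials are assumed to be set in the environment.
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope="user-top-read"))

# A random offset between 0 and 50, plus a 20-song window, keeps every
# returned track inside the user's top 70.
offset = random.randint(0, 50)
top = sp.current_user_top_tracks(limit=20, offset=offset, time_range="medium_term")
tracks = [(t["name"], t["artists"][0]["name"]) for t in top["items"]]
```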

Binding it all together: The Design

After ironing out all the technical kinks, it was time for the fun part: the design.

I wanted the interface to be easily navigable and instantly recognizable as a Spotify-related app, so I adopted as much of Spotify's original branding and color scheme as possible. I played with gradients and settled on the purple-to-blue gradient because it represented the depth of what we were trying to unveil: the song’s mood. I also paid attention to the user flow, making it as simple and as similar to the Spotify user flow as possible (since our users are almost certainly Spotify users). I also felt our design would work well in dark mode, so I prototyped an additional variation for it. See the frames below for one of the first iterations of the web app.

After settling on the typography, color scheme, components, and iconography, I implemented the design on the front end using JavaScript, CSS, and HTML.

Style Guide

Design Concepts and Wireframing

What could have gone better

Even though our model did eventually end up working, we were honestly quite frustrated when we found that our original model was pretty bad at predictions. I’m still happy that we managed to make the notebook that clearly diagnosed our problem and then found a good way to change the model layers to fix it, but it was a lot more stressful than it needed to be.

I also would have liked to implement the stretch goal we set for the project: letting users input their own movie scene. We started brushing that aside after realizing how much work it would take and how quickly deadlines were approaching. Even without that piece, though, it’s still fun to get recommendations for the movies we curated. It’s also something we have ideas for if we revisit this project later – for example, adding a search bar backed by one of the movie script databases.

With any kind of recommendation or curation system that lives online, there’s also always an important question: what’s the point of it being online if it’s not going to improve with the data it collects from users? We have ideas for this too. We saw another group take what looks to be a pretty standard approach among online recommendation systems: they set up a Google Form where users could submit feedback whenever the model predicted incorrectly. We could almost certainly implement a similar system, maybe even automatically using AWS or Databricks, or some other way of running a cron job that constantly retrains the model without the extra hoop of Google Form data collection. Either way, this would be the next – and maybe even final – machine learning step to make this a truly high-quality web app.

 

At the end of the day, though, I think the thing to be most proud of is the website. As I mentioned before, each of us has a piece of it that we’re really happy with. Personally, I’m really happy that I found an interesting niche to explore and develop, orchestrated the development in our little team, and made a really nice-looking, functional website that can live online for anyone to try out.

What Did I Learn?

More abstractly, I continue to learn about how I work in a team, particularly on school-based technical projects. This project solidified for me that I have a strong tendency to forge ahead and do a bunch of work on my own. Maybe it’s because I have a desire to prove myself as a fast, competent part of the team, or maybe it’s because I work best during degenerate hours of the night, or maybe it’s just plain social anxiety – whatever it is, this project has made it clear that it’s probably something to think about.

 

Not that this is a 100% bad thing, though. One really interesting, almost defining characteristic of our team was that we all naturally fell into separate niches within the overall product, and I believe it made the experience much more fun. We had one person who was responsible for a big chunk of the machine learning and came up with some innovative solutions to the obstacles we ran into – he related the movies and songs in a way I would never have seen myself. And our third teammate pretty much set up the whole backend framework. Either way, we each have a component of the end product we’re really proud of.
