
UX DESIGN, FRONT END DEVELOPMENT, MACHINE LEARNING

MoodSpace

A Spotify Soundtrack Generator that uses machine learning to create a personalized soundtrack for a user's favorite movie scene.


My Role

Conceptualization, Design Strategy, Wireframing, Prototyping, Usability Testing, Machine Learning, Brand Identity

Team

Kendall Arata, Ben Brill, Michael Ting

Tools

Figma

Python

Beautiful Soup

Heroku

Duration

PIC 16B - Advanced Python Programming

8 weeks

Project Overview

I’ve always viewed music as a connective medium; it’s something that people across all different cultures and languages can sit down and admire together. With today’s advancements in streaming services, music has never been more accessible.

However, I often find that people – including myself – have a difficult time fully verbalizing their music taste. Especially when music tastes are so diverse, it can be difficult to distill the variety of music you listen to into a couple of sentences. That’s why Spotify Wrapped is so popular on social media; it does all the summarizing for you.

Since discovering the Spotify API, which exposes statistics on the musicality, energy, and even “wordiness” of hundreds of millions of songs, I saw the potential to use this data to help people communicate their music tastes to each other in an automated fashion.

Apparently, this was not an uncommon thought. Ben and Michael had similar ideas about how we could best leverage this unique dataset. I came up with the idea of matching different feelings and emotions to the songs you commonly listen to. That way, when comparing music tastes with another person, you would be able to see what each of you considers a “sad” or “exciting” song, making sharing music tastes more streamlined. And what better way to convey emotion than through movies?

Our group came up with MoodSpace, a sort of “movie soundtrack generator”. Using Spotipy, what we’d learned about TensorFlow and neural networks, and some SQLAlchemy we picked up along the way for the backend, we built a machine learning model that, given one of our curated famous movie scenes (think the throne room scene from The Last Jedi, or the ending of Titanic), could come up with a playlist of songs that fit it. We then deployed it to AWS and built a front-end Python Flask application to serve the recommendation interface.

Our final product lets you choose from 10 famous movie scenes. Once you select a scene, our algorithm outputs three songs from your Spotify library that match its mood. You can then compare these songs with other people’s to see how they would score the same scene given their music taste.

Mission

Connect individuals to each other through their favorite entertainment mediums: music and film

The Machine Learning

Now I want to dive into a little adventure we had with the biggest problem in our project as a whole. We had gotten to a point where we had set up an initial stab at a machine learning pipeline (a rough sketch follows the list):

  1. Perform k-means clustering on a baseline set of songs

  2. Build and train the machine learning model using our created clusters as targets

  3. Use the same model on the movie scripts

  4. Find closest song
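
That first pipeline looked roughly like the sketch below – not our actual code. The DataFrame, column, and function names are placeholders, and the lyrics-to-cluster classifier in steps 2–3 is only stubbed out.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

FEATURES = ["energy", "valence", "tempo", "liveness"]

def cluster_songs(songs: pd.DataFrame, k: int = 8):
    """Step 1: k-means over the baseline songs' audio features.
    (In practice the features should be scaled first, since tempo
    lives on a much larger range than the 0-1 metrics.)"""
    km = KMeans(n_clusters=k, random_state=0, n_init=10)
    labeled = songs.assign(cluster=km.fit_predict(songs[FEATURES]))
    return labeled, km

# Steps 2-3: train a lyrics -> cluster classifier, then run it on each
# screenplay to get a predicted cluster for the scene (omitted here).

def closest_songs(labeled: pd.DataFrame, km: KMeans,
                  scene_cluster: int, k: int = 3) -> list[str]:
    """Step 4: return the songs nearest that cluster's centroid."""
    members = labeled[labeled["cluster"] == scene_cluster]
    dists = np.linalg.norm(
        members[FEATURES].to_numpy() - km.cluster_centers_[scene_cluster], axis=1)
    return members.iloc[np.argsort(dists)[:k]]["name"].tolist()
```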

 

We originally thought this was a clever way to marry together the “feeling” of two kinds of data: movies and songs. We didn’t think too much about it for a while, though, because we were more focused on making the actual website and interface work.

 

However, once we finished that – unfortunately quite late into the project – we started experimenting with the actual recommendations, and soon realized the Big Problem™:

 

Many very different movie scenes were recommending the same songs over and over again.

 

We spent so much of the project getting a functional pipeline and a nice user interface that the actual thinking and effort on the model was lacking until the very end. And once we finally noticed this, it took us quite a while to figure out exactly why it was happening.

 

After quite a bit of experimenting, we finally figured out where to look. We eventually got around to making a notebook that diagnosed the problem in a nice quantitative way, but here’s the long and short of it: all the movies ended up super close together in feature space!
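
In essence, the diagnosis boiled down to a check like the hedged sketch below: predict metrics for every curated scene and measure how spread out those predictions are. The array and function names here are placeholders, not our notebook’s.

```python
import numpy as np

def mean_pairwise_distance(scene_preds: np.ndarray) -> float:
    """Mean pairwise Euclidean distance between the scenes' predicted metrics
    (one row per scene). A tiny value means every scene collapses to roughly
    the same point in feature space, so they all pull in the same songs."""
    diffs = scene_preds[:, None, :] - scene_preds[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    n = len(scene_preds)
    return float(dists.sum() / (n * (n - 1)))
```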

How the Machine Learning Works

Credit for this part goes to Ben Brill, who mostly worked on the machine learning side

Spotify measures several different metrics for each song through its API. For instance, if we wanted to see the statistics for the song “Waiting on the World to Change” by John Mayer, we could pull them straight from the API.
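
A call like the following Spotipy sketch returns those per-track audio features (the client-credentials setup and the track search are illustrative, not our exact code):

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Assumes SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET are set in the environment.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

# Find the track, then pull its audio features.
result = sp.search(q="Waiting on the World to Change John Mayer", type="track", limit=1)
track_id = result["tracks"]["items"][0]["id"]
features = sp.audio_features([track_id])[0]

# A few of the per-track metrics the API reports.
print({key: features[key] for key in ("energy", "valence", "tempo", "liveness", "speechiness")})
```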

However, movie scenes obviously do not have these same metrics, because they have no musicality that can be measured – only dialogue. To bridge the gap between movie and song, we must give the model some medium that both movies and songs share. Luckily, both have some form of text: for songs it is the lyrics, and for movies it is the screenplay.

 

With this in mind, we can now develop an idea of what the inputs and outputs of our model should be. The input should be some sort of text, whether lyrics or screenplay. The output should be a given set of Spotify metrics. We can train our model using song lyrics as input variables and their respective Spotify metrics as target variables. From there, we can apply the model to screenplays to predict what the Spotify metrics would be given the text of that scene. Once we have metrics for both songs and screenplays, we can compare the two to see which songs are most aligned with a selected scene’s screenplay.

Model Details

We implemented the above model infrastructure using a tensorflow.keras neural network. Before the text is fed into the model, it must be vectorized using TensorFlow’s TextVectorization layer, which returns a vector representing the words contained in a given song or screenplay in tokenized form. Once this is complete, we can feed this tokenized vector into our neural network. An outline of our model’s structure is shown below.
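
The summary we actually looked at came from our trained model, but a minimal sketch of a comparable architecture looks something like this. The vocabulary size, sequence length, dropout rates, and hidden-layer width are assumptions; only the 60-dimensional embedding and the four metric outputs come from the write-up.

```python
import tensorflow as tf
from tensorflow.keras import layers

MAX_TOKENS = 2000      # vocabulary size -- an assumption, not our exact value
SEQUENCE_LENGTH = 500  # tokens kept per lyric/screenplay -- also an assumption

# Turn raw strings into fixed-length integer token sequences.
vectorize = layers.TextVectorization(
    max_tokens=MAX_TOKENS,
    output_mode="int",
    output_sequence_length=SEQUENCE_LENGTH,
)
# vectorize.adapt(train_lyrics)  # build the vocabulary from the training lyrics

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vectorize,
    layers.Embedding(MAX_TOKENS, 60),   # n = 60 embedding dimensions
    layers.Dropout(0.2),                # dropout to fight overfitting
    layers.GlobalAveragePooling1D(),    # collapse the sequence dimension
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(4),                    # energy, valence, tempo, liveness
])

model.compile(optimizer="adam",
              loss=tf.keras.losses.MeanSquaredLogarithmicError())
```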

 

Let’s highlight a couple of key features of this model.

Embedding Layer

 

The embedding layer takes an input vector representing a tokenized string and converts it into a vector in n-dimensional space. We chose n = 60. This lets our model place each text in that space, and thus pick up connections and patterns between texts to pass on to subsequent layers of the model.

Dropout Layer

 

These layers are placed at various points in the model to “drop out” some of the connecting neurons within the network, which helps prevent overfitting.

Model Training

 

We trained our model on a random set of songs from the Spotify API that had English lyrics available through the Genius API. The targets were the Spotify metrics energy, valence, tempo, and liveness. These were the results of our final epochs in fitting the model.

Losses are relatively low, as is our mean squared logarithmic error, on both the training and validation data. Once the model was fitted, we saved the weights and used them to predict the energy, valence, tempo, and liveness of our given movie scenes.
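
In Keras terms, that fit–save–predict step looks roughly like the sketch below. The dataset and dictionary arguments are hypothetical placeholders, and the weights filename is illustrative.

```python
import tensorflow as tf

def train_and_score(model, train_ds, val_ds, scene_scripts, epochs=20):
    """Fit the lyrics -> metrics model, save its weights, and predict metrics
    for each curated screenplay. `train_ds`/`val_ds` are assumed to be tf.data
    datasets of (lyric text, [energy, valence, tempo, liveness]) pairs, and
    `scene_scripts` a dict mapping scene name -> screenplay text."""
    history = model.fit(train_ds, validation_data=val_ds, epochs=epochs)
    model.save_weights("moodspace.weights.h5")
    names = list(scene_scripts)
    preds = model.predict(tf.constant([[scene_scripts[n]] for n in names]))
    return history, dict(zip(names, preds))
```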

Finding Song Matches

Now that we have metrics for both our songs and our screenplays, we need a way to determine which songs are most similar to which scenes. We can do this by computing the norm between the metrics of each song and a selected movie scene, which is essentially the distance between two points in n-dimensional space.
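
Concretely, the matching step is a nearest-neighbor lookup in metric space; here is a minimal sketch, with placeholder array and variable names:

```python
import numpy as np

def top_matches(song_metrics: np.ndarray, song_names: list[str],
                scene_metrics: np.ndarray, k: int = 3) -> list[str]:
    """Return the k songs whose predicted metrics sit closest (by Euclidean
    norm) to a scene's predicted metrics. `song_metrics` is (n_songs, 4) and
    `scene_metrics` is (4,); both come from the lyrics/screenplay model."""
    dists = np.linalg.norm(song_metrics - scene_metrics, axis=1)
    return [song_names[i] for i in np.argsort(dists)[:k]]
```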

First, we take the user’s top 20 songs. A random offset is applied so that a unique list of songs is generated each time the user visits the web app, but the list never reaches beyond their top 70 most listened-to tracks. We then run our model on the song lyrics to generate predicted metrics corresponding to the subject of each song. Though the songs already carry Spotify metrics, we wanted metrics generated from the lyrics, so that we match the subject of a song to the subject of a movie scene, rather than the musicality of a song to a movie scene.
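
Pulling that list with Spotipy might look roughly like this (the OAuth setup and time range are assumptions; the offset cap of 50 plus a 20-song window is what keeps everything inside the top 70):

```python
import random

import spotipy
from spotipy.oauth2 import SpotifyOAuth

# Requires a user login with the user-top-read scope; the Spotify app
# credentials are assumed to be set in the environment.
sp = spotipy.Spotify(auth_manager=SpotifyOAuth(scope="user-top-read"))

# A random offset between 0 and 50, plus a 20-song window, keeps every
# returned track inside the user's top 70.
offset = random.randint(0, 50)
top = sp.current_user_top_tracks(limit=20, offset=offset, time_range="medium_term")
tracks = [(t["name"], t["artists"][0]["name"]) for t in top["items"]]
```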

Binding it all together: The Design

After ironing out all the technical kinks, it was time for the fun part: the design.

I wanted the interface to be easily navigable and instantly recognizable as a Spotify-related app, so I adopted as much of Spotify's original branding and color scheme as possible. I played with gradients and settled on the purple-to-blue gradient because it represented the depth of what we were trying to unveil: the song’s mood. I also paid attention to the user flow, making it as simple and as similar to the Spotify user flow as possible (since our users are almost certainly Spotify users). I also felt our design would work well in dark mode, so I prototyped an additional variation for it. See the frames below for one of the first iterations of the web app.

After settling on the typography, color scheme, components, and iconography, I implemented the design on the front end using JavaScript, CSS, and HTML.

Style Guide

Design Concepts and Wireframing

What could have gone better

Even though our model did eventually end up working, we were honestly quite frustrated when we found that our original model was pretty bad at predictions. I’m still happy that we managed to make the notebook that clearly diagnosed our problem and then found a good way to change the model layers to fix it, but it was a lot more stressful than it needed to be.

I also would have liked to implement the stretch goal we set for the project: letting users input their own movie scene. We started brushing that aside after realizing how much work it would take and how quickly deadlines were approaching. Even without that piece, though, it’s still fun to get recommendations for the movies we curated. It’s also something we have ideas for if we revisit this project later – for example, adding a search bar backed by one of the movie script databases.

With any kind of recommendation or curation system that lives online, there’s also always an important question: what’s the point of it being online if it’s not going to improve with the data it collects from users? We have ideas for this too. We saw another group take what looks to be a pretty standard approach among online recommendation systems: they set up a Google Form where users could submit feedback whenever the model predicted incorrectly. We could almost certainly implement a similar system, maybe even automatically using AWS or Databricks, or some other way of running a cron job that constantly retrains the model without the extra hoop of Google Form data collection. Either way, this would be the next – and maybe even final – machine learning step to make this a truly high-quality web app.

 

At the end of the day, though, I think the thing to be most proud of is the website. As I mentioned before, each of us has a piece of it that we’re really happy with. Personally, I’m really happy that I found an interesting niche to explore and develop, orchestrated the development in our little team, and made a really nice-looking, functional website that can live online for anyone to try out.

What Did I Learn?

More abstractly, I continue to learn about how I work in a team, particularly on school-based technical projects. This project solidified for me that I have a strong tendency to forge ahead and do a bunch of work on my own. Maybe it’s because I have a desire to prove myself as a fast, competent part of the team, or maybe it’s because I work best during degenerate hours of the night, or maybe it’s just plain social anxiety – whatever it is, this project has made it clear that it’s probably something to think about.

 

Not that this is a 100% bad thing, though. One really interesting, almost defining characteristic of our team was that we all naturally fell into separate niches within the overall product, and I believe it made the experience much more fun. We had one person who was responsible for a big chunk of the machine learning and came up with some innovative solutions to the obstacles we ran into – he related the movies and songs in a way I would never have seen myself. And our third teammate pretty much set up the whole backend framework. Either way, we each have a component of the end product we’re really proud of.
