Bias in YouTube Recommendation / Flagging Algorithm
Social platforms like Facebook, Instagram, YouTube, and Twitter host and distribute an enormous amount of content every day, every second. The influence these platforms have on our lives and on our society is pervasive: for some people, these platforms are the internet, the only place they get content from.
Of these platforms, YouTube hosts the longest-form content. Long(ish)-form video is, I think, unique in the context of evaluating biases in machine learning algorithms, because video is a richer medium than text or static images. It allows personalities, complex narratives, and nuances to be embedded in the content, and hence interpreting it becomes far more difficult.
I imagine YouTube’s main goal in its earlier years was to use machine learning to maximize each user’s watch time through its recommendation system. This has resulted in YouTube recommending highly problematic videos (the most high-profile of which is Logan Paul’s suicide forest video) and inadvertently encouraging its creators to make videos of physically dangerous activities.
Although YouTube recently stated that it has moved “beyond optimizing for watchtime”, its recommendation and flagging system is still biased. Some transgender YouTube creators found that YouTube’s algorithm had demonetized their videos just because the title contained the word “trans”, and that YouTube allowed anti-LGBT ads to run on their videos.
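To make the demonetization complaint concrete, here is a minimal sketch of how a naive keyword rule could produce that outcome. I do not know how YouTube's system actually works; the blocklist, the function name `naive_is_ad_friendly`, and the sample titles are all hypothetical and exist only for illustration.

```python
# Hypothetical blocklist for illustration only; not YouTube's actual list.
BLOCKED_TERMS = ["trans"]


def naive_is_ad_friendly(title: str) -> bool:
    """Return False if any blocked term appears anywhere in the title.

    The rule only looks at substrings, so it knows nothing about context.
    """
    lowered = title.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


# Made-up titles, not real videos.
titles = [
    "My trans story: one year later",
    "Public transport in Tokyo",
    "Easy pasta recipe",
]

results = {title: naive_is_ad_friendly(title) for title in titles}
for title, ok in results.items():
    print(f"{title!r}: {'monetized' if ok else 'demonetized'}")
```

In this toy version, a creator's personal story is demonetized purely because of one word, and so is an unrelated video about public transport. A rule like this cannot tell hateful content from a community talking about itself.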
Now, these tech giants' algorithms are opaque and I have no insider insight, but I suspect the problem is rooted in the following:
Optimizing for watch time means rewarding content that triggers our lizard brain. This would include clickbait, porn, stolen copyrighted content, irresponsible behavior, conspiracy theories, etc.
Incorporating user flagging, manual flagging (like Facebook), or sentiment analysis on likes, dislikes, and comments, if not done carefully, tends to punish minority views and prioritize the majority’s agenda.
The sheer volume of content and complexity of the medium make it impossible for algorithms to soundly interpret each and every frame of the content.
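The first two suspicions can be sketched with a toy model. Everything here is an assumption made for illustration: the `Video` fields, the scoring formula (click probability times average minutes watched), the `FLAG_RATE_LIMIT` threshold, and the numbers themselves are invented, and none of it is taken from any real platform.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    click_probability: float    # chance a user clicks when shown the video
    avg_minutes_watched: float  # average minutes watched after a click
    views: int
    flags: int                  # number of user flags received


def expected_watch_minutes(video: Video) -> float:
    """Toy objective: expected minutes watched per impression."""
    return video.click_probability * video.avg_minutes_watched


def rank_by_watch_time(videos):
    """Recommend whatever maximizes the toy watch-time objective."""
    return sorted(videos, key=expected_watch_minutes, reverse=True)


# Hypothetical threshold: hide a video once more than 5% of viewers flag it.
FLAG_RATE_LIMIT = 0.05


def hidden_by_flags(video: Video) -> bool:
    """Toy moderation rule based only on the share of viewers who flagged."""
    return video.views > 0 and video.flags / video.views > FLAG_RATE_LIMIT


# Invented numbers, chosen only to show the mechanism.
videos = [
    Video("Careful explainer", 0.05, 8.0, 10_000, 20),
    Video("SHOCKING stunt (gone wrong)", 0.30, 6.0, 10_000, 100),
    Video("Minority community perspective", 0.04, 9.0, 10_000, 900),
]

ranked = rank_by_watch_time(videos)
hidden = [v.title for v in videos if hidden_by_flags(v)]

print("Recommended first:", ranked[0].title)
print("Hidden by flags:", hidden)
```

With these made-up numbers, the clickbait stunt is recommended first even though people who click on it watch less of it, and the only video hidden is the one that a hostile share of a general audience chose to flag. Neither rule ever looks at what the videos actually say.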
The impossibility of unbiased curation is important to note, because the typical capitalistic pursuit of scaling up tech platforms always neglects the externalities (e.g. filter bubbles or the proliferation of fake news) it eventually produces. Not having a system to slow down, take stock, and carefully and impartially gather feedback from all stakeholders, especially the users, is perhaps the biggest bias.