Posted on December 14, 2018 at 11:02 PM
Programmer Discovers YouTube’s Secret: Text in Videos Can Be Read
Everyone already knows just how much user information Google records during each interaction with the search engine, the browser, and even smartphone apps. However, it appears that YouTube is also not quite innocent in this regard, as a recent event has people thinking that the video platform can even read the text displayed in content creators’ videos themselves.
Austin Burke, a programmer going by the name Sudofox, recently stumbled upon this discovery after finding a cross-site scripting (XSS) issue on another website. Burke attempted to disclose the issue publicly by making a video about it and posting it on YouTube so that the relevant parties would see it and deal with it.
The video Burke made clearly displayed a URL that he had used for testing the XSS exploit. After uploading the video, he realized that someone might visit the displayed URL, so he decided to check whether anyone actually had. Surprisingly, he found several visits from something called Google-Youtube-Links, a user agent, meaning the identifier an automated client sends so that the server knows what kind of program is requesting a URL.
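For readers who want to run the same kind of check on their own server, here is a minimal sketch in Python. It assumes the web server writes the common "combined" access log format, and the sample log lines, IP addresses, and paths are made up for illustration; the exact user agent string YouTube sends is also an assumption, so the code only looks for the Google-Youtube-Links substring. This is not Burke's own script, just one way such a check might look.

```python
import re

# Assumption: the server writes the "combined" access log format,
# where the user agent is the last quoted field on each line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def hits_from_agent(lines, agent_substring="Google-Youtube-Links"):
    """Return (time, request) pairs whose user agent contains the substring."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Case-insensitive substring match, since the full string is not known here.
        if match and agent_substring.lower() in match.group("agent").lower():
            hits.append((match.group("time"), match.group("request")))
    return hits


# Made-up sample lines for illustration only (documentation IP ranges,
# hypothetical paths, assumed user agent strings).
sample_log = [
    '203.0.113.7 - - [14/Dec/2018:10:00:01 +0000] "GET /xss-test/ HTTP/1.1" '
    '404 153 "-" "Mozilla/5.0 (compatible; Google-Youtube-Links)"',
    '198.51.100.23 - - [14/Dec/2018:10:00:05 +0000] "GET /index.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for time, request in hits_from_agent(sample_log):
    print(time, request)
```

In practice, the list of lines would come from the server's real access log file rather than from a hard-coded sample.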
How is YouTube doing this?
This is where Burke realized that the URL was only visible in the video itself; it wasn’t written down anywhere except in the address bar shown on screen. To Burke, this could only mean that YouTube is using optical character recognition (OCR), which located the URL within the video and decided to check it out.
When content creators upload a video to YouTube, they are presented with several classifications for that video. One such classification, named Unlisted, allows anyone to see the video as long as they have a link to it. However, a video marked as Unlisted will not appear in recommendations.
Another classification is Private, which allows only those with specific invites to access the video. Aware of this, and to make sure there was no mistake, Burke decided to test his theory by uploading a video classified as Private. The video displayed a URL pointing to a folder and file on his domain that did not exist. Nevertheless, only a few minutes after the video was uploaded, that URL already had several hits from the same Google-Youtube-Links user agent.
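The logic of this test can be sketched in a few lines of Python: show a URL that has never existed anywhere else, then watch for requests to it. The sketch below is an illustration under stated assumptions, not Burke's actual procedure. The domain example.com, the canary- path layout, and the function names are all hypothetical.

```python
import secrets


def make_canary_url(base="https://example.com"):
    """Build a URL with a random, never-before-used path (hypothetical layout)."""
    token = secrets.token_hex(8)  # 16 hex characters, unguessable in practice
    return f"{base}/canary-{token}/probe.txt", token


def canary_was_requested(log_lines, token):
    """Return True if any access log line mentions the random token."""
    return any(token in line for line in log_lines)


url, token = make_canary_url()
print("Show this URL on screen in the test video:", url)
```

Because the token is random and appears nowhere except inside the video frames, any later request containing it is strong evidence that the URL was read from the video itself.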
While Burke was spooked by the result, it confirmed his theory. However, this behavior can cause a lot of trouble for security researchers who are trying to disclose a vulnerability. In one scenario that Burke himself described, a researcher might use a private video to disclose an SQL injection vulnerability or a malicious URL. While the researcher or the cyber-security company receiving the report would not visit said URL, YouTube might still do so on its own.
Because of issues like this, many believe that free, public websites are not the best place for revealing such serious system flaws. It is much better to use encrypted communication channels for such purposes. While Burke’s hypothetical scenario could mean trouble for researchers, this is only true in theory, as most researchers do not choose YouTube videos for communicating serious flaws anyway.
Protecting…or spying?
Nevertheless, the fact that YouTube is capable of reading URLs contained within videos themselves might be highly disturbing for some of its creators. Many have already suggested that YouTube might be capable of reading the text on a T-shirt, or even a vehicle number plate.
This also raised a question: what if other websites are doing the same? It also made many people realize that they do not really know how YouTube handles data posted on the platform. This discovery happened by accident, but the YouTube community now wants to know how many other things YouTube does that nobody outside the company knows anything about.
It is also understandable for Google to check these videos if it wants to stay ahead of accusations that it is not protecting its users from offensive or illicit content. However, while the due diligence is praiseworthy, many fear that this can easily turn into just another method of spying on said users.