Online porn: the largest dataset with the most benefits

A while back I met with a friend of mine who lamented how little use most machine learning experiments are. Her train of thought went like this: “If you really want to show usefulness, you gotta train your AI on a real and meaningful dataset. But all you have is of dubious value. And if I can’t access real data, how can I reach real achievements?”

Well, this is what I was thinking about when I realized that the MOST accessed databases in the world are actually the MOST real and the MOST meaningful at the same time. And what’s more, a large percentage of these databases are free to access for anyone capable of hitting the button that reads: “I am at least 18 years old”.

Yes, I am talking about the online streaming porn sites. But what is in there apart from the quick and hassle-free joy that these sites cater for?

Well, a practically endless supply of human models to train machine-learning AIs, to begin with.

And I have a feeling that I am not the first one to realize that.

I think that the data stored on online streaming porn sites is among the most structured data there is, because it is categorized not only by what you are about to see, but also by the names of the actors, by date (which correlates with the aging of the models) and by popularity counters that provide an insight into one of the main human weaknesses: the need for intimacy.

Imagine this: someone trains a machine to ID actors in the streaming videos and then uses the trained AI to search for matches on the non-porn-related parts of the Internet. Because I am absolutely sure that at least one third of these videos have been recorded without the informed consent of the actors, or without their full knowledge of how the videos will be used in the future, I think that IF someone does this, well, he or she will have a huge potential for blackmail, to say the least.

And IF it is done by an organization whose intentions are in line with gathering “kompromat”* on a large number of people in an adversary’s society? Well, then there is trouble down the road.

And IF it is done by an individual or a group of individuals to round up, intimidate and use humans by exploiting their past mistakes? It is equally possible, or at least that’s what I think.

A number of video-tweaking tools are widely known and are used to fake videos, and so is a wide array of facial and object recognition solutions. And I am talking about what is CURRENTLY available, not what will be available in the near future.

And because of the relatively good money that can be made by such a site, there will always be a great number of such databases, and they are expanding minute by minute.

Let’s be clear: I am not saying that this is happening, just that it is pretty much doable. And that makes it reasonable to assume that it is already taking place somewhere.

 

(For quick factual support, it is enough to know that the eight largest online streaming porn video sites are together reported to be one of the top three consumers of bandwidth in the world, and their users account for an estimated 35% of total Internet downloads. So, the dataset I am talking about IS huge. And what’s more, about 80% of the datasets are owned and operated by a single tech giant.)

Footnotes:

*Kompromat – embarrassing, compromising or damaging material or information intended to be used against someone. One aspect of “kompromat” that stands the test of time is that the compromising information is often sexual in nature.

 

Sources (for the site’s Sufficient Source Policy, please check HERE):

http://www.slate.com/articles/technology/technology/2014/10/mindgeek_porn_monopoly_its_dominance_is_a_cautionary_tale_for_other_industries.html

https://www.theatlantic.com/business/archive/2016/04/pornography-industry-economics-tarrant/476580/

https://en.wikipedia.org/wiki/Kompromat

Why Are Porn Performers Scared to Talk About Internet Piracy?