Last week a team of Danish university researchers released a dataset of 70,000 OkCupid profiles. The dataset included gender, usernames, age and location information, along with answers to a large list of profile questions. The researchers extracted the data with a web scraper and amalgamated it into an easy-to-analyze format. Many have voiced ethical concerns over this treatment of ‘public data’.
The ethical debate among academics centers on public versus private information. For example, Michael Zimmer, Privacy and Internet Ethics Professor, argues that making public data available in large data sets is an ‘ethical foul’. The counterargument is that public data is just that: public. On this view, data junkies should be able to mine it and glean valuable information from it.
LinkedIn has changed the game for businesses, adding value for recruiters and salespeople alike. Instead of an OkCupid search for a ‘female seeking a male’, LinkedIn users are searching for ‘an organization that might purchase a service’ or ‘a data scientist looking to work for a tech company’. The similarities are uncanny. LinkedIn, too, has complained of scraping: according to InformationWeek.com, “LinkedIn want[ed] Amazon to turn over names of people it says registered fake LinkedIn accounts to extract users’ data”. Businesses have been targeting customers and candidates more effectively with larger data sets, and they consolidate this information whenever possible.
From a corporate perspective, social networking companies such as OkCupid, Facebook and LinkedIn make profits and gain power from controlling the flow of data. The conundrum that social networking sites face is how much data to expose. If they expose too much information, it can be scraped, amalgamated and leaked. If they expose too little, users will not be able to gain valuable insight, making the site unpopular and, in turn, less profitable. Like Lady Justice, social networking sites hold balance scales, with access to data on one side and profit on the other.
From a personal perspective, how much data are we comfortable with being shared? Like any technology, data can be used for good or evil; it is up to each of us, guided by our personal beliefs, to determine how much to expose to the outside world. The leak of data from Ashley Madison, whose slogan is ‘life is short, have an affair’, led people to weigh the morality of the data leak against the morality of having an affair. There will always be debate as to what is ethical or unethical in terms of data release, and it is up to each individual user to determine what they are comfortable with sharing. Today, assume that everything is shared unless it is in the vested interest of the organization housing your data to keep it private. Your personal banking information, for example, will be safeguarded because the bank puts itself at financial risk if it is released.
From an analytics and big data perspective, open data drives more value. Governments are pushing towards open data because, according to the Open Data Institute, it is proving to generate economic value. Big data is the lifeblood of analytics professionals: it underpins a newly emerging field and genuinely drives economic decisions.
Let’s take the OkCupid data from 70,000 individual users, for example. We have the location, age and relationship status of each individual user. From the data, an expert determines that there are a large number of single people in their 60s in the San Francisco Bay Area. An entrepreneur, using the data for good, decides to throw a mixer aimed at that particular demographic. Some of the mixer attendees fall in love, get married and live out the rest of their days together. What happened? The entrepreneur made money, the wedding planner made money and people looking for love got married.
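For the data junkies, here is a minimal sketch of the kind of query the expert in that story might run. Everything in it is an assumption made for illustration: the `Profile` fields, the status labels and the handful of records are invented, and none of it comes from the actual OkCupid release.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    # Hypothetical fields; the real dataset's column names are not given here.
    age: int
    location: str
    status: str


# Invented records for illustration only; not taken from any real dataset.
PROFILES = [
    Profile(63, "San Francisco Bay Area", "single"),
    Profile(67, "San Francisco Bay Area", "single"),
    Profile(61, "San Francisco Bay Area", "married"),
    Profile(34, "San Francisco Bay Area", "single"),
    Profile(65, "Seattle", "single"),
    Profile(29, "Seattle", "single"),
]


def singles_in_sixties_by_location(profiles):
    """Count single people aged 60-69 per location; no usernames involved."""
    return Counter(
        p.location
        for p in profiles
        if p.status == "single" and 60 <= p.age <= 69
    )


counts = singles_in_sixties_by_location(PROFILES)
for location, n in counts.most_common():
    print(f"{location}: {n}")
```

Note that the sketch works on aggregate counts alone. The entrepreneur planning the mixer needs to know how many people fit the demographic in each area, not who they are.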
Open data can be good or bad, ethical or unethical, depending on each individual’s opinion. Ethics vary across people and countries, so this debate will not be settled any time soon.
Have an opinion? Let me know what you think.