Wikipedia, happy 20th birthday

Author: Ringo

One day in 1971, a young man lay drunk in a field on the outskirts of Innsbruck, Austria. He was Douglas Adams, who would go on to become a famous science fiction writer. That day Adams had with him a copy of the "Hitch-hiker's Guide to Europe". As he lay in the field looking up at the stars, inspiration struck: he imagined a "Hitchhiker's Guide to the Galaxy", a hybrid of travel guide and encyclopedia. The real magic was that the book would not be written by experts; anyone could contribute entries.

Those of us living in the 21st century have already seen this vision made real: it is Wikipedia.

Today, this free encyclopedia that anyone can edit is just over 20 years old.

The birth of an amateur encyclopedia

Wikipedia was not originally intended to be an independent information website. It began as a by-product of Nupedia, another encyclopedia project that was written by experts.

At the beginning of the 21st century, the Internet was on the rise. As the Web developed, many people tried to build online encyclopedias so that encyclopedic information could "walk out" of the library.

Nupedia was one such attempt. It was co-founded in 2000 by Jimmy Wales, a former financial trader turned Internet entrepreneur, and Larry Sanger, who holds a PhD.

Throughout that first year, however, Nupedia produced articles very slowly because it relied on experts to create content. In 2001, the two began looking for a more open, complementary project to supplement Nupedia. Around this time they came across the concept of "an encyclopedia that anyone can edit", and Wikipedia, built on wiki technology, was born.

Image: Jimmy Wales and Larry Sanger, the founders of Wikipedia. Source: commons.wikimedia.org

By the end of 2001, Wikipedia had more than 20,000 articles written in 18 languages, and its growth was accelerating, which showed that this model came closer to the founders' expectations than Nupedia did.

In 2003, Wales established the Wikimedia Foundation to operate the servers and software and to raise the necessary funds. Control of the site's content remains in the hands of a community known as "Wikipedians", who have developed complex workflows and guidelines for creating and maintaining content.

Today, Wikipedia has more than 55 million articles in hundreds of languages, all of them written by volunteers. It is the largest and most-read reference work in human history. Web analytics company Alexa Internet ranks Wikipedia as the 13th most popular website on the Internet, ahead of Reddit, Netflix and Instagram.

Wikipedia grew out of a simple idea: ordinary people can use computers and the Internet as tools for liberation, education and enlightenment.

But for a long time, some authorities regarded the idea of creating an amateur encyclopedia as a joke.

"A few people sincerely endorse Wikipedia, and that puzzles me," a former president of the American Library Association wrote in 2007. "A professor who encourages the use of Wikipedia is the equivalent of a dietitian who recommends a steady diet of McDonald's."

Even though some academic studies have confirmed that it can serve as a reliable source of information, Wikipedia still does not enjoy the same recognition as long-established encyclopedias such as Encyclopedia Britannica. After all, the latter is written by paid academic experts.

In 2005, the journal Nature even enlisted expert reviewers to examine the question. They compared 42 science articles taken from the websites of Wikipedia and Encyclopedia Britannica. The result: Wikipedia averaged about 4 errors per article, and Encyclopedia Britannica about 3.

In theory, Wikipedia cannot quell such doubts; in practice, it has achieved an unquestionable victory.

How much is it worth?

The general public already uses Wikipedia every day as an authoritative source of information. In recent years, social platforms plagued by fake news, false information and conspiracy theories (such as Facebook and YouTube) have also begun to promote Wikipedia as a neutral, high-confidence information source.

Wikipedia has also won the favor of official institutions. As rumors spread during the COVID-19 pandemic, the World Health Organization chose to cooperate with Wikipedia to provide information on COVID-19 through the website. The WHO believes that such cooperation is essential to prevent the spread of misinformation about the new coronavirus.

When commercial companies and government agencies start to use this tool, it becomes even more difficult to calculate the value and influence of Wikipedia.

Shane Greenstein, an economist at Harvard University, once said: "Wikipedia is an example of what I would call 'digital dark matter'." He has studied the website closely and compares it to parenting and housework: investing in this kind of activity produces great value, but that value is difficult to measure with standard economic tools.

There have also been attempts to quantify the value that Wikipedia generates. A 2018 study stated that U.S. Internet users value Wikipedia at about $150 per person per year. If that is true, the site is worth as much as $42 billion per year in the United States alone.
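The two figures quoted above imply a user count that the article does not state. A minimal Python sketch of that back-of-the-envelope arithmetic follows; the function name is hypothetical, and the resulting user count is derived from the two quoted numbers rather than taken from the study itself.

```python
# Back-of-the-envelope check of the two figures quoted in the article.
VALUE_PER_USER_USD = 150            # annual per-person value cited from the 2018 study
TOTAL_VALUE_USD = 42_000_000_000    # annual U.S.-wide value cited in the article


def implied_user_count(total_value: float, value_per_user: float) -> float:
    """Return the number of users implied by a total and a per-user value."""
    return total_value / value_per_user


users = implied_user_count(TOTAL_VALUE_USD, VALUE_PER_USER_USD)
print(f"Implied U.S. users: {users / 1e6:.0f} million")
```

In other words, the $42 billion total only follows from the $150 figure if roughly 280 million people in the United States are counted as users.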

With the rise of data intelligence and AI technology, Wikipedia has also produced a more indirect economic benefit: it serves as the source text for a large number of machine learning datasets, which are then "fed" to various natural language processing models.

By our incomplete count, representative datasets that use Wikipedia as their source corpus began to appear in large numbers around 2015: first WikiQA, published at EMNLP in 2015, and then the highly successful SQuAD 1.1 in 2016. More and more teams now use Wikipedia to build datasets. SQuAD deserves special mention. Its appearance marked an important turning point in the field of machine reading comprehension. To this day, SQuAD (together with the later SQuAD 2.0) remains an important benchmark for machine reading comprehension models.

Image: Datasets built with the help of Wikipedia. Picture source: Data Combat Faction

Of course, while datasets of this kind improve machine reading comprehension, they also "absorb" some of Wikipedia's flaws, including misinformation and bias.

One long-standing and widely voiced criticism concerns the lack of diversity among Wikipedia's authors. Studies have found that most of the people who write on Wikipedia are white-collar men who live in developed countries in the northern hemisphere and are comfortable with technology products. What they write about is often what interests them. This has in effect created a kind of "survivor bias": Wikipedia has more than 150 entries about characters in "Lord of the Rings", but fewer than 10 entries about the Vietnam War.

How to correct such biases in Wikipedia-based datasets is therefore becoming an important direction in the field of AI ethics.

Where is it going?

Amazon and Apple train Alexa and Siri to answer factual questions using Wikipedia; Google uses it to fill in the "fact boxes" shown for searches about factual questions; and the voice assistant you use most every day may well have been trained on a Wikipedia-based dataset too.

Even though these commercial companies benefit from it in this way, Wikipedia does nothing special in return. It still has no business model to speak of. This is why, in the eyes of some people, it is a strange and unrepeatable phenomenon.

The pages of contemporary technology media are full of stories about the technology giants burning a lot of investors' money in pursuit of scale and traffic. But Wikipedia is contrary to all this.

Despite its enormous traffic, Wikipedia has not played out the clichéd story of founders striking it rich. It has no shareholders and sells no advertising, so no billionaires have emerged from its founding team. Jimmy Wales reportedly has a personal net worth of only US$1 million, a far cry from the founders of other Internet giants that rake in money every day.

It is a legacy of the technological optimism and grassroots professionalism of the late-20th-century Internet. Its income comes from charitable grants and user donations. It is no exaggeration to call it a miracle "powered by love".

Today, Wikipedia is hosted and funded by the Wikimedia Foundation, a non-profit organization that relies mainly on donations and grants from the public and from companies. In the past few years, important donors have included American investor Warren Buffett, former U.S. President Jimmy Carter, Virgin Group CEO Richard Branson, Amazon.com founder Jeff Bezos and Craigslist founder Craig Newmark.

In recent years, almost every "birthday" has been accompanied by skeptical voices asking how long Wikipedia can survive this way. For example, while information platforms of all kinds are shifting from manual curation to algorithm-driven operation, Wikipedia is still run and managed by people rather than algorithms. Is this a good thing or a bad thing?

The pressure is real. Katherine Maher, executive director and CEO of the Wikimedia Foundation, once said that if Wikipedia did not already exist, it probably could not be created in today's fragmented and commercialized Internet world.

But given that it does exist, Maher is optimistic about its prospects for survival. She believes that Wikipedia appeals to a particular part of human nature: "People like to be right and love to prove their ability."

Moreover, even mistakes have their uses. According to Cunningham's Law, the best way to get the correct answer on the Internet is to post the wrong answer.

References:
1. https://rrchnm.org/essay/can-history-be-open-source-wikipedia-and-the-future-of-the-past/
2. https://www.cs.mcgill.ca/~rwest/wikispeedia/wpcd/wp/h/History_of_Wikipedia.htm
3. https://www.wired.com/story/wikipedia-online-encyclopedia-best-place-internet/
4. https://www.technologyreview.com/2013/10/22/175674/the-decline-of-wikipedia/

WeChat account: Data Combat Faction