arXiv User Guide


If you know exactly what you are looking for, such as the title of a paper (or the name of an algorithm) or its author, searching Google Scholar directly is fastest. But if you are not sure what you want and simply wish to follow the latest developments in a field and see what everyone is working on, and you find that most of what Google Scholar returns is unreliable, please read on.

Introduction

Over the past six months, conversations like the following have often happened to me:

"Hey, do you know how to solve the problem of XXXXXXXXXXXXX?"

"Oh, I happened to have read two related papers and will send them to you later."

......

An awkward silence follows. Generally speaking, once they have the papers, the other person never comes back to me, even though I sincerely want to discuss them. Some of the more studious ones, though, go on to ask:

 

"Where did you find your paper?"

Depending on the field of the papers, I leave them a link. For most areas of mathematics, physics, computer science, and statistics, that link is https://arxiv.org/ .

What is arXiv

arXiv was originally created so that a group of physicists could exchange papers they were about to publish. Picture the 1990s: people were still using floppy disks (5.25-inch, 3.5-inch, and so on), storage space was measured in KB, and mailboxes were no exception. During peak submission periods, a mailbox of a few hundred KB would fill up with papers within minutes. Paul Ginsparg saw that this could not go on: if papers were to be shared and reviewed properly, they needed to be stored centrally. In 1991, the prototype of arXiv was set up at LANL (Los Alamos National Laboratory, http://www.lanl.gov/ ), as shown in the figure below.

 

Back then it had a cute domain name, http://xxx.lanl.gov/ , which can still be accessed today. However, LANL, being a serious scientific laboratory, could not be bothered to run the website, so Cornell University later took over its operation.

When we talk about arXiv today, however, we also have to talk about Open Access. We all know that reading articles used to cost money, and even today most articles still do. As shown in the picture below, reading a single Nature article generally costs US$20, which is more than 100 RMB.

This price is neither outrageous nor cheap. For wealthy companies and top schools, buying a site license or reimbursing the cost is no problem, but for individuals interested in research, or for schools in less developed regions, it is a real obstacle. In countries such as Malawi and the Central African Republic, official figures put per capita GDP in 2016 at only about US$400 (roughly 20 articles). What are people there supposed to do? Do we want knowledge, like wealth, to be concentrated in the hands of a few?

Just as we have always wanted the Internet to be neutral (Internet service providers such as telecom operators and cable TV companies should treat all traffic passing through their networks equally; if they treat different traffic differently, these companies can restrict consumers' freedom), we do not want money to block the spread of knowledge. Hence the Budapest declaration (the Budapest Open Access Initiative):

There are many degrees and kinds of wider and easier access to this literature. By 'open access' to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.

We want to applaud arXiv as a pioneer of open access! The benefit for us is that it is free! Free! Free!

From personal experience, if you want to know which journals offer free access, you can refer to this list: https://en.wikipedia.org/wiki/List_of_open_access_journals . The picture below shows the open access journals listed under the "Mathematics" category.

 

What should I do if I really can’t find free articles?

I usually email the author directly to ask for a copy, and take the opportunity to mention my own research area and a few recent questions related to the paper. The success rate is quite high, and I may even make a good friend.

Speaking of arXiv, there is one more thing that cannot be ignored: LaTeX. I personally think it is the most beautiful text editor (or language?), but after leaving academia I found that hardly anyone can be bothered to use it. For reasons of space, I will not go into it here.

What arXiv has

For various historical reasons, the literature on arXiv is concentrated in the mathematical and physical sciences, including mathematics, physics, computer science, statistics, astronomy, quantitative biology, quantitative finance, and other fields. For submission statistics as of 2016, see the charts below.

 

The chart on the left shows the number of new submissions each year; the chart on the right shows each area's share of that year's submissions (the shares sum to 1). "hep-" stands for high energy physics (hep-th+hep-ph+hep-lat+hep-ex), "cond-mat" for condensed matter physics, "astro-ph" for astrophysics, "math" for mathematics, "other physics" for the remaining areas of physics (physics+nucl+gr-qc+quant-ph+nlin), "biology" for quantitative biology, "finance" for quantitative finance, and "cs" for computer science.
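The grouping described above can be sketched as a small lookup table. This is only an illustration: the group names and member archives come from the legend above, while the archive names "q-bio" and "q-fin", the prefix-matching rule, and the helper name area_of are my own assumptions.

```python
# Chart groups as described in the legend above.
# "q-bio" and "q-fin" are assumed archive names for the
# "biology" and "finance" groups; the legend does not spell them out.
AREA_GROUPS = {
    "hep-": ["hep-th", "hep-ph", "hep-lat", "hep-ex"],
    "cond-mat": ["cond-mat"],
    "astro-ph": ["astro-ph"],
    "math": ["math"],
    "other physics": ["physics", "nucl", "gr-qc", "quant-ph", "nlin"],
    "biology": ["q-bio"],
    "finance": ["q-fin"],
    "cs": ["cs"],
}


def area_of(category):
    """Map a category such as 'cs.LG' or 'hep-th' to its chart group.

    Hypothetical helper: the part before the first dot is treated as the
    archive, and an archive matches a group member if it is equal to it
    or extends it with a hyphen (so 'nucl-th' matches 'nucl').
    """
    archive = category.split(".", 1)[0]
    for area, members in AREA_GROUPS.items():
        for member in members:
            if archive == member or archive.startswith(member + "-"):
                return area
    return "unknown"
```

For example, area_of("cs.LG") falls into the "cs" group, while a category that the legend does not mention, such as "stat.ML", comes back as "unknown".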

We can see that before 2002 the share of computer science was almost negligible. By 2016, however, it accounted for nearly one-fifth, and it is still growing extremely fast. High energy physics, which held the lion's share from 1992 to 1996, has been almost entirely eaten away and now hangs on to only about 10%. How times change.

The total number of papers submitted is as follows:

(Source: https://arxiv.org/help/stats/2016_by_area/index )

From the charts, we can clearly see four things:

  1. Submissions in every area have grown explosively over time, which is especially obvious in the chart of total historical submissions above;
  2. Mathematics dominates, both in annual submissions and in total historical submissions;
  3. Computer science (cs) accounts for only 8.3% of all historical submissions, but for 18.3% of submissions in 2016. Together with the submission chart, this shows very encouraging growth that is worth watching;
  4. Nearly 10,000 articles are submitted every month (the number actually accepted is lower, and the number in your own area of interest is lower still).

So what should you do if your interests go beyond mathematics and physics?

It's very simple. As in any market, once arXiv became popular, many imitators followed, so we now have a biology version of arXiv ( https://www.biorxiv.org/ ), a psychology version ( https://psyarxiv.com/ ), and so on. Of course, accumulating knowledge takes time, and these field-specific "arXivs" are not yet very mature, so I still recommend combining them with the Open Access list mentioned above to find the free resources you want.

How to use arXiv

As I mentioned at the beginning of this article, the greatest use of arXiv is that, when you are not sure what you want, you can check the latest developments in a field and see what everyone is working on. Its biggest advantage is that it is trustworthy. Of course, many other sources are equally trustworthy, although most of them cost money. Journals can generally be ranked by impact factor from high to low (as we all know, the impact factor, like college entrance exam scores, is a very one-sided measure, but it is the most common approach at present). The Nature series mentioned above and the Science series are both very trustworthy, so I will say no more about them.

What makes me very happy is that the venues that charge (some of them quite steeply) are mostly journals. Unlike other fields, especially biology, the top venues in computer science are often conferences rather than journals, and conferences often do not charge! For example, this is the International Conference on Machine Learning (ICML), one of the top machine learning conferences. All of its papers can be found at the link below: https://icml.cc/Conferences/2017/Schedule?type=Poster

So happy! This is also the way I most recommend for finding papers worth reading: keep an eye on the lists of well-known conferences in your field (though it has one drawback: what you come across is largely a matter of luck).

However, newcomers often cannot tell from a keyword search which journals and conferences are truly valuable. There are so many conferences every year. I casually searched for "artificial intelligence" on WikiCFP, and for the coming year there were 3,130 conferences in this one field alone. Honestly, how many of them are truly valuable? At 50 papers per conference, that is more than 150,000 papers. Even if a person read papers full-time for a year without eating, sleeping, or working, how many could they get through?
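To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch. The 3,130 conferences and 50 papers per conference come from the text; the reading speed of one hour per paper is purely my own assumption, so treat the result as an order of magnitude rather than a measurement.

```python
# Figures from the text.
CONFERENCES = 3130
PAPERS_PER_CONFERENCE = 50

# Assumption (not from the text): one hour to read one paper.
HOURS_PER_PAPER = 1.0

total_papers = CONFERENCES * PAPERS_PER_CONFERENCE

# Reading around the clock for a year: no eating, sleeping, or working.
hours_per_year = 365 * 24
readable_papers = int(hours_per_year / HOURS_PER_PAPER)

# Fraction of one year's papers that such a reader could cover.
share = readable_papers / total_papers

print(f"Papers per year: {total_papers}")
print(f"Readable in one year: {readable_papers}")
print(f"Share covered: {share:.1%}")
```

Under this assumption, even nonstop reading covers only a small fraction of one year's output, which is exactly why some way of filtering is needed.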

Of course, from personal experience, for rankings of computer science conferences you can refer to the following link: https://www.aminer.cn/ranks/conf . I took a screenshot of the top of the ranking for AI/PR (Artificial Intelligence/Pattern Recognition), shown below. Those who want to learn about computer vision can look at the entries with "vision" in their names. I will analyze and compare specific conferences in another article, so I will not go into detail here.

 

But what if you do not want to go through the lists one conference at a time? What if you just want to search for a particular topic or keyword, or simply want to know what new algorithms appeared this month? Conferences are held only once a year, after all. This is where arXiv shines. It gives us a more centralized search platform that is also relatively more trustworthy (compared with Google Scholar, which searches everything, there is much less noise to filter out).

Talk without evidence is just hot air, so please click the following link: https://arxiv.org/list/cs.LG/recent . This is the listing for the Machine Learning category. Just from the familiar author names on the page, you can tell that most of these papers will not disappoint you.
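The same listing can also be pulled programmatically. The following is a minimal sketch, assuming the public arXiv API endpoint at export.arxiv.org and its Atom feed format; the function names build_query_url, parse_feed, and fetch_recent are my own and purely illustrative.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed public arXiv API endpoint; the response is an Atom feed.
API_URL = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"


def build_query_url(category, max_results=5):
    """Build a query URL for the newest submissions in one category."""
    params = {
        "search_query": "cat:" + category,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_feed(atom_xml):
    """Extract title, author names, and link from each Atom entry."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        papers.append({
            # Collapse line breaks and repeated spaces inside the title.
            "title": " ".join(title.split()),
            "authors": [a.findtext(ATOM + "name", default="")
                        for a in entry.findall(ATOM + "author")],
            "link": entry.findtext(ATOM + "id", default=""),
        })
    return papers


def fetch_recent(category, max_results=5):
    """Download and parse the newest entries (needs network access)."""
    url = build_query_url(category, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read().decode("utf-8"))

# Example (performs a network request):
# for paper in fetch_recent("cs.LG"):
#     print(paper["title"], "|", ", ".join(paper["authors"]))
```

Building the URL and parsing the feed are kept separate from the download, so the parsing can be tried on a saved response without touching the network.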

 

Although arXiv is positioned as a preprint server, it also includes many papers that have been accepted by top conferences such as NIPS and AAAI. Moreover, all of them come with the original PDF, which can be downloaded for free, so the cost of searching and filtering is extremely low. At the same time, you only need to click on the name of an author you are interested in and all the papers they have posted will be listed (Lei Shu in the picture below), without the trouble of authors sharing the same name that so often occurs in other search engines. Anyone who has searched for the pinyin of a Chinese name will understand this deeply: the saying that Zhangs, Wangs, Lis, and Zhaos are everywhere is no exaggeration.

 

With a database like this, what more could one ask for?

For comparison, here are the results I got from searching Google Scholar for the keyword "Machine Learning". You can judge the timeliness, relevance, and quality for yourself.

 

Of course, the timeliness problem can be addressed by clicking "Sort by date" on the left. After clicking, however, it looks like this:

 

Setting quality aside, Springer's publications cost money, and freely downloadable PDFs are really not common.

More importantly, because Google is a long-established general-purpose search engine, it does not restrict a keyword search to a specialized field, so a large number of off-topic articles clutter the search results (if you do not believe it, search for lenet, vgg, and the like and look at what comes back).

So when the reference you are looking for is in the mathematical or physical sciences, especially anything related to AI/ML/Stat, and Google Scholar fails to give you satisfactory results (or the results are very expensive), try arXiv!



Author: ThoughtWorks
Link: https://www.jianshu.com/p/0c634da4634e
Source: Jianshu
Copyright belongs to the author. For commercial reprinting, please contact the author for authorization. For non-commercial reprinting, please indicate the source.


 


Origin blog.csdn.net/u012057432/article/details/103246142