Interviewer: In an array of one million elements, is there a performance gap between accessing the first element and the last?

This article first appeared on the WeChat official account "Programmer Interviewer".

Arrays are probably the data structure software engineers use most often, and precisely because of this, many developers do not pay enough attention to them.

Interviews often include a question like this: "In an array of one million elements, is there a performance gap between accessing the first element and the last? Why?"

In addition, arrays show up constantly in everyday business development, and in most cases they are what we reach for. Yet developers who make a habit of reading source code may notice that in some low-level libraries, in places where we would normally use an array, the library has chosen a different data structure. Why is that?

Let's keep these questions in mind as we go.

What is an array

The array is one of the most basic data structures in computer science. The vast majority of programming languages have it built in, and it is the data structure developers use most often.

An array is a data structure made up of a collection of elements of the same type, stored in a contiguous block of memory.

Of course, in some dynamic languages, such as Python's list or JavaScript's Array, an "array" is not necessarily stored in contiguous memory and may hold elements of different types.

For example, we have an array as follows:

arr = [1, 2, 3, 4, 5]

In memory, it is laid out like this:

[Figure: layout of an array in memory]

As we can see, the array is stored in memory as one contiguous, linear block. This storage form has both advantages and disadvantages, and only by understanding them can we make good use of arrays in future development.

Characteristics of the array

A data structure usually supports four basic operations: insert, search, delete, and read. We will analyze the performance of each one in turn.

First, we need to pin down one concept: performance.

Performance here does not mean speed in an absolute sense, because raw speed varies enormously across hardware. Here, performance means "complexity" as used in algorithm analysis.

For background on complexity, refer to an introduction to algorithm analysis.

Insert performance

We already know that an array is stored in contiguous memory. So what happens when we want to insert a new element at index k? Every element from index k onward must move back one position, and only then can the new element be placed at index k.

[Figure: inserting a new element]

As we can see, this can require moving most of the elements, so insertion usually has a time complexity of O(n).
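A minimal Python sketch of the shifting step described above, assuming we treat a Python list as if it were a fixed-layout contiguous array. The helper name insert_at is hypothetical; in everyday Python code, list.insert performs the same shift internally.

```python
def insert_at(arr, k, value):
    """Insert `value` at index k by shifting every element from k onward
    back by one slot, as a contiguous array must. O(n) moves."""
    arr.append(None)                      # grow by one slot at the tail
    for i in range(len(arr) - 1, k, -1):  # walk backwards from the new tail
        arr[i] = arr[i - 1]               # move each element one slot right
    arr[k] = value                        # index k is now free
    return arr


print(insert_at([1, 2, 3, 4, 5], 2, 99))  # [1, 2, 99, 3, 4, 5]
```

Because the loop touches every element from k to the end, inserting near the front of a large array costs close to n moves, while inserting at the tail costs almost nothing.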

Delete performance

Deletion is very similar to insertion. To delete the element at index k, we remove it and then, to keep the memory contiguous, move every element after index k forward by one position. This is also O(n).

[Figure: removing an element]
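The deletion step can be sketched the same way, again assuming a Python list stands in for a contiguous array; delete_at is a hypothetical helper name, and list.pop(k) or del arr[k] do this shift internally in real code.

```python
def delete_at(arr, k):
    """Remove and return the element at index k by shifting every later
    element forward one slot, keeping storage contiguous. O(n) moves."""
    removed = arr[k]
    for i in range(k, len(arr) - 1):
        arr[i] = arr[i + 1]  # move each later element one slot left
    arr.pop()                # drop the now-duplicated last slot
    return removed


arr = [1, 2, 3, 4, 5]
print(delete_at(arr, 1), arr)  # 2 [1, 3, 4, 5]
```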

Search performance

For example, suppose we want to know whether an array contains the element 2. How does the computer do that?

With a small amount of data, a person can see at a glance whether there is a 2, but a computer cannot. It has to compare elements one by one, starting from index 0, until it finds the element 2.

[Figure: searching for an element]

This process is the familiar linear search. For an array of length n, it needs about n/2 comparisons on average, so its time complexity is also O(n).
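A short Python sketch of the linear search just described; the function name linear_search and the -1 "not found" convention are illustrative choices, not taken from the original.

```python
def linear_search(arr, target):
    """Compare elements one by one from index 0; return the index of the
    first match, or -1 if the target is absent. O(n)."""
    for i, value in enumerate(arr):
        if value == target:
            return i
    return -1


print(linear_search([1, 2, 3, 4, 5], 2))  # 1
```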

Read Performance

We have emphasized that an array holds elements of the same type in contiguous, linear memory. Because of these properties, arrays have excellent read performance: reading by index takes O(1) time, a clear advantage over linked lists, binary trees, and similar data structures.

So how does an array achieve such a low time complexity?

Suppose the array's memory starts at address start, each element occupies size bytes, and the index is i. The memory address of element i then follows directly from this formula:

arr[i]_address = start + size * i

For example, to read arr[3], we just substitute 3 into the addressing formula and the computer locates the element in a single step, so reading from an array takes only O(1) time.
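To make the addressing formula concrete, here is a small Python sketch that simulates a block of memory with a bytearray. The buffer, the start offset of 16, and the choice of 4-byte little-endian ints are illustrative assumptions; a real array gets its start address from the memory allocator.

```python
import struct

SIZE = struct.calcsize("<i")  # bytes per element: 4 for a little-endian 32-bit int


def build_memory(values, start=0):
    """Lay out `values` contiguously in a simulated memory buffer,
    beginning at byte offset `start`."""
    memory = bytearray(start + SIZE * len(values))
    for i, v in enumerate(values):
        struct.pack_into("<i", memory, start + SIZE * i, v)
    return memory


def read(memory, start, i):
    """O(1) read: compute the address directly instead of scanning."""
    address = start + SIZE * i  # arr[i]_address = start + size * i
    return struct.unpack_from("<i", memory, address)[0]


memory = build_memory([1, 2, 3, 4, 5], start=16)
print(read(memory, 16, 3))  # 4
```

Reading index 0 and reading index 4 each cost one multiplication and one addition, which is why position in the array does not affect read cost.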

Performance Optimization

We now know that, apart from reading, every other operation has a time complexity of O(n). Is there an effective way to optimize them?

Search performance optimization

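When the elements of an array are unordered, we can only use the relatively slow linear search. When they are ordered (ascending or descending), we can use a more efficient method: binary search.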

Suppose we have an array of ints stored in ascending order:

arr = [1, 2, 3, 4, 5, 6, 7]

To find the element with value 6, linear search compares elements by index starting from 0 until it reaches the element at index 5.

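Binary search is more efficient. Since we know the elements of this array are sorted in ascending order:

  1. We take the element at index 3 as the middle value p.
  2. We compare p with the target 6 and find that p is 4, and 4 < 6. Because the array is ascending, the target must be among the elements after index 3.
  3. We then take a new middle value p from the elements between index 3 and the end, compare it with the target, and repeat until the target is found.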

Each comparison eliminates half of the remaining elements, so the overall time complexity is only O(log n), which makes this method very efficient.

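The advantage grows with the amount of data. For example, given a sorted ascending array of one billion elements, linear search needs one billion comparisons in the worst case, while binary search needs at most about 30 (since 2^30 is roughly 1.07 billion).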
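The steps above can be sketched in Python as follows. The function returns the number of comparisons alongside the index so the "about 30 for one billion" figure can be checked; counting steps is an illustrative addition, and in practice Python's standard bisect module (for example bisect.bisect_left) covers the same job.

```python
def binary_search(arr, target):
    """Search an ascending `arr`; return (index, steps), where index is -1
    if the target is absent and steps counts middle-value comparisons."""
    lo, hi = 0, len(arr) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2   # the middle value p
        if arr[mid] == target:
            return mid, steps
        if arr[mid] < target:  # target can only be to the right of mid
            lo = mid + 1
        else:                  # target can only be to the left of mid
            hi = mid - 1
    return -1, steps


print(binary_search([1, 2, 3, 4, 5, 6, 7], 6))  # (5, 2)

# range() supports len() and indexing without building a list,
# so it can stand in for a sorted array of one billion integers.
index, steps = binary_search(range(10**9), 999_999_999)
print(index, steps)
```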

Insert performance optimization

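For example, take the following array. To insert a new member orange at index 1, we would normally shift the last three members back by one position so that orange can occupy index 1.

arr = ['apple', 'banana', 'grape', 'watermelon']

But what if our requirements don't actually depend on index order? In other words, we can treat the array as a set: put orange at index 1 and move banana, which was originally at index 1, into a new slot at the end of the array. The index order is now scrambled, but the whole operation takes only O(1) time (assuming the array has free capacity at its end).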
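A minimal Python sketch of this order-sacrificing insert. The name insert_unordered is hypothetical, and the O(1) claim relies on Python's list.append being amortized O(1).

```python
def insert_unordered(arr, k, value):
    """O(1) insert that gives up index order: the element currently at
    index k moves to a new slot at the tail, and `value` takes its place."""
    arr.append(arr[k])  # relocate the displaced element to the end
    arr[k] = value
    return arr


fruits = ['apple', 'banana', 'grape', 'watermelon']
print(insert_unordered(fruits, 1, 'orange'))
# ['apple', 'orange', 'grape', 'watermelon', 'banana']
```

Only two slots are written regardless of the array's length, which is the whole point; the trade-off is that callers can no longer rely on positions.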

Delete performance optimization

A delete requires moving a whole group of elements forward, which is costly, especially when deletes and inserts happen frequently.

Instead, we can first record the operations without deleting anything immediately, and at a certain point apply all the recorded operations to the array in one go. This turns many repeated element moves into a single pass and can greatly improve performance.

[Figure: deleting from an array]
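Here is a minimal sketch of this record-then-batch idea in Python, assuming deletions are recorded by overwriting the slot with a sentinel ("tombstone") value. The class LazyDeleteArray and its methods are hypothetical names for illustration, not an API from any library.

```python
class LazyDeleteArray:
    """Deletions only mark slots; compact() removes every marked slot
    in a single O(n) pass instead of shifting once per deletion."""

    _DELETED = object()  # sentinel marking a deleted slot

    def __init__(self, values):
        self._slots = list(values)

    def delete(self, k):
        self._slots[k] = self._DELETED  # O(1): just record the deletion

    def compact(self):
        # One pass: move each surviving element to the next write position.
        write = 0
        for value in self._slots:
            if value is not self._DELETED:
                self._slots[write] = value
                write += 1
        del self._slots[write:]  # trim the leftover tail

    def values(self):
        return list(self._slots)


a = LazyDeleteArray([1, 2, 3, 4, 5])
a.delete(1)
a.delete(3)  # indices stay stable until compact() runs
a.compact()
print(a.values())  # [1, 3, 5]
```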

This idea is widely used:

  1. The virtual DOM in front-end frameworks first stores a large number of DOM operations in a diff queue and then applies them in a single update, avoiding repeated DOM reflows and repaints.
  2. The mark-sweep garbage collection used in V8 and the JVM is based on this idea. Mark-sweep runs in two phases: the mark phase flags every reachable object, and the sweep phase reclaims any object found without a mark.
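To illustrate the two phases in item 2, here is a toy mark-sweep in Python over a heap modeled as a dict from object ids to the ids they reference. This model is an assumption made for illustration; the collectors in V8 and the JVM are far more elaborate.

```python
def mark_sweep(heap, roots):
    """Toy mark-sweep. `heap` maps each object id to the list of ids it
    references; returns the heap with unreachable objects swept away."""
    marked = set()
    stack = list(roots)
    # Mark phase: flag every object reachable from the roots.
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])
    # Sweep phase: reclaim every unmarked object in one pass.
    return {obj: refs for obj, refs in heap.items() if obj in marked}


heap = {'a': ['b'], 'b': [], 'c': ['d'], 'd': ['c']}
print(mark_sweep(heap, ['a']))  # {'a': ['b'], 'b': []}
```

Note that 'c' and 'd' reference each other yet are still reclaimed, because neither is reachable from a root.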

Summary

Back to the question in the title: we can now say clearly that, in an array of one million elements, there is no performance gap between accessing the first element and the last. Because an array occupies linear, contiguous memory, the addressing formula locates any element in a single step, regardless of its position.

Finally, a class of problems we often meet in interviews and on LeetCode involves subarrays.

For example: given an array of integers, find the maximum sum of a contiguous subarray of length k.

[Figure: problem statement]

How far can you reduce the time complexity? Feel free to share your ideas.
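The article leaves this as an open question. One common approach, offered here only as a sketch, is a sliding window: rather than re-summing each length-k subarray (O(n*k)), update the running sum by adding the element that enters the window and subtracting the one that leaves, for O(n) overall. The function name max_sum_of_k is hypothetical.

```python
def max_sum_of_k(arr, k):
    """Maximum sum of any contiguous subarray of length k, in O(n)."""
    if not 1 <= k <= len(arr):
        raise ValueError("k must be between 1 and len(arr)")
    window = sum(arr[:k])  # sum of the first window
    best = window
    for i in range(k, len(arr)):
        window += arr[i] - arr[i - k]  # slide: add the new, drop the old
        best = max(best, window)
    return best


print(max_sum_of_k([1, 4, 2, 10, 23, 3, 1, 0, 20], 4))  # 39
```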


 




Source: www.cnblogs.com/duxinyi/p/11491552.html