Alibaba back-end internship experience

Elasticsearch (ES) is used in the project. What is its role?

Elasticsearch is a powerful open-source search engine that helps us quickly find the content we need in massive amounts of data.

 

What are the important concepts in Elasticsearch?

Cluster: A collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. Clusters are identified by a unique name, which is "elasticsearch" by default. This name is important because a node can only be part of a cluster if it is set up to join the cluster by name.

 

Node: A single server that is part of a cluster. It stores data and participates in cluster indexing and search functionality.

 

Index: Like a "database" in a relational database. It has a mapping that defines its types. An index is a logical namespace that maps to one or more primary shards and can have zero or more replica shards.

e.g.: MySQL => Database; Elasticsearch => Index

 

Document: Similar to a row in a relational database. The difference is that each document in the index can have a different structure (fields), but should have the same data type for common fields.

MySQL => Databases => Tables => Columns/Rows
Elasticsearch => Indices => Types => Documents with attributes

 

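Type: A logical category/partition of an index, the semantics of which are entirely up to the user. Note that mapping types were deprecated in later Elasticsearch versions and removed in 8.0, so on recent versions an index holds a single kind of document.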

 

 

Why is Elasticsearch fast? What are its advantages over MySQL?

When querying a database by ID, the query uses the index directly and is very fast. But a fuzzy query on the title can only scan the data row by row. The process is as follows:

  1. The user searches for data, and the condition is that the title matches "%手机%" (手机 means "mobile phone")
  2. Fetch the data row by row, for example the row with id 1
  3. Check whether the title in that row meets the user's search condition
  4. If it matches, the row is put into the result set; if not, it is discarded. Go back to step 2

This kind of full table scan takes a long time when there is a large amount of data. To solve this problem, Elasticsearch uses an inverted index.

First introduce two concepts:

  • Document: The data used for searching. Each piece of data is a document. For example, a web page or product information.
  • Term: A meaningful word obtained by segmenting document data or user search input with some algorithm. For example, 我是中国人 ("I am Chinese") can be segmented into terms such as 我, 是, 中国人, 中国, 国人.

 

An inverted index is built by specially processing the forward index. The process is as follows:

  • The data of each document is segmented into words using an algorithm to obtain the individual terms.
  • Create a table in which each row holds a term together with information such as the IDs of the documents containing it and its positions in them.
  • Because each term is unique, an index can be created on the terms, for example a hash table index.

 


The search process of the inverted index is as follows (taking a search for "华为手机", i.e. "Huawei mobile phone", as an example):

1) The user enters the criterion "华为手机" to search.

2) Segment the user input into words and obtain the terms: 华为, 手机.

3) Look up the terms in the inverted index to get the IDs of the documents containing them: 1, 2, 3.

4) Use the document IDs to fetch the specific documents from the forward index.

 


Although you need to query the inverted index first and then the forward index, both the terms and the document IDs are indexed, so the query is very fast. No full table scan is needed.

  • Advantages: When searching based on terms and fuzzy searches, the speed is very fast
  • Disadvantages: Indexes can only be created for terms, not fields, and cannot be sorted based on fields.
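
The build and search steps above can be sketched in a few lines of Java. This is only a minimal illustration, not how Elasticsearch is implemented: the class name, the sample documents, and the whitespace `tokenize` helper (a stand-in for a real analyzer) are all assumptions made for the example, and a query is treated as the union (OR) of its terms.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Forward index: document ID -> document text.
    private final Map<Integer, String> forwardIndex = new HashMap<>();
    // Inverted index: term -> IDs of the documents that contain the term.
    private final Map<String, SortedSet<Integer>> invertedIndex = new HashMap<>();

    // Hypothetical stand-in for an analyzer: lowercase, then split on whitespace.
    public static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    // Store the document and register each of its terms in the inverted index.
    public void add(int docId, String text) {
        forwardIndex.put(docId, text);
        for (String term : tokenize(text)) {
            invertedIndex.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Segment the query, look up every term, and return the union of the matching IDs.
    public SortedSet<Integer> search(String query) {
        SortedSet<Integer> ids = new TreeSet<>();
        for (String term : tokenize(query)) {
            SortedSet<Integer> hits = invertedIndex.get(term);
            if (hits != null) {
                ids.addAll(hits);
            }
        }
        return ids;
    }

    // Step 4: use a document ID to fetch the document from the forward index.
    public String fetch(int docId) {
        return forwardIndex.get(docId);
    }

    // Made-up sample data for the demo.
    public static InvertedIndexSketch sample() {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Xiaomi phone");
        index.add(2, "Huawei phone");
        index.add(3, "Huawei charger");
        index.add(4, "Xiaomi bracelet");
        return index;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = sample();
        for (int id : index.search("Huawei phone")) {
            System.out.println(id + " -> " + index.fetch(id));
        }
    }
}
```

With this sample data, searching "Huawei phone" touches only two entries of the term table and returns documents 1, 2 and 3, without scanning document 4 at all.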

 

Are transactions used in the project? Let’s talk about transactions

A database transaction is a sequence of database operations that access and possibly modify various data items. These operations are either all executed or not executed at all; they form an indivisible unit of work. A transaction consists of all database operations performed between its start and its end.

Characteristics of Transactions (ACID)

- Atomicity: A transaction is an indivisible unit of work in an application: either all of its operations take effect or none of them do.

- Consistency: Executing a transaction must take the data from one consistent state to another consistent state.

- Isolation: Concurrently executing transactions do not interfere with each other; the internal operations of a transaction are isolated from other transactions.

- Durability: Once a transaction is committed, any changes made to the data must be recorded in permanent storage.

Common concurrency exceptions

The first type of lost update: the rollback of a transaction causes the data that has been updated by another transaction to be lost.

The second type of lost update: the submission of a certain transaction causes the data updated by another transaction to be lost.

 

Dirty read: a transaction reads uncommitted data from another transaction

Non-repeatable read: Within one transaction, reading the same data twice returns different results, because another transaction modified it in between.

Phantom read: Within one transaction, running the same query on a table twice returns a different number of rows, because another transaction inserted or deleted rows in between.

Common isolation levels

Read Uncommitted: A transaction can read data that other transactions have not yet committed.

Read Committed: A transaction can only read data that has already been committed.

Repeatable Read: Builds on Read Committed with stricter control, so that data a transaction has read is not changed underneath it by other transactions while it runs. This solves the problems of dirty reads and non-repeatable reads, but in the standard definition it does not solve the problem of phantom reads.

Serializable: Transactions behave as if they were queued and executed one by one, in sequence.

 

Default isolation levels for different databases

MySQL:

MySQL's default isolation level (with the InnoDB engine) is "Repeatable Read". This means that reading the same data multiple times within the same transaction gives the same result: changes committed by other transactions in the meantime are not visible to ordinary reads in this transaction.

Oracle:

Oracle's default isolation level is "Read Committed". Under this level, transactions can only see committed data. Oracle also provides other isolation levels, such as "Serializable".

 

 


Lock

• Pessimistic locking (database)

Shared lock (S lock): After transaction A adds a shared lock to certain data, other transactions can only add shared locks to that data, not exclusive locks.

Exclusive lock (X lock): After transaction A adds an exclusive lock to certain data, other transactions can add neither shared locks nor exclusive locks to that data.

• Optimistic locking (custom)

Before updating the data, check whether the version number has changed. If it changes, cancel this update, otherwise update the data (version number + 1).
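
The version check described above can be sketched in plain Java. This is an in-memory illustration only: the class and its `value`/`version` fields are hypothetical, and in a real project the same check is usually expressed in SQL, along the lines of `UPDATE ... SET value = ?, version = version + 1 WHERE id = ? AND version = ?`.

```java
public class OptimisticLockSketch {
    // Hypothetical in-memory "row" with a value column and a version column.
    private int value;
    private long version;

    public synchronized int readValue() {
        return value;
    }

    public synchronized long readVersion() {
        return version;
    }

    // Apply the update only if nobody changed the row since expectedVersion was read.
    // Returns false when the version has moved on, so the caller can retry or give up.
    public synchronized boolean update(int newValue, long expectedVersion) {
        if (version != expectedVersion) {
            return false; // the version changed: cancel this update
        }
        value = newValue;
        version++; // version number + 1
        return true;
    }

    public static void main(String[] args) {
        OptimisticLockSketch row = new OptimisticLockSketch();
        long seen = row.readVersion();            // both "transactions" read the same version
        System.out.println(row.update(10, seen)); // first writer succeeds
        System.out.println(row.update(20, seen)); // second writer holds a stale version and fails
    }
}
```

The second writer is not blocked; it simply learns that its version is stale, which is the difference from the pessimistic locks above.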

 

Using transactions in Spring

Declarative transaction: declare the transaction characteristics of a method through annotations.

Add the annotation @Transactional to the service class. In this annotation, transaction-related parameters can be configured.

    // REQUIRED: Join the current (outer) transaction if one exists; otherwise create a new transaction.
    // REQUIRES_NEW: Always create a new transaction, suspending the current (outer) transaction if one exists.
    // NESTED: If there is a current (outer) transaction, run nested inside it (it can roll back independently via a savepoint); otherwise behave like REQUIRED.
    @Transactional(isolation = Isolation.READ_COMMITTED, propagation = Propagation.REQUIRED)
    public Object save1() {
        // New users
        User user = new User();
        user.setUsername("alpha");
        user.setSalt(CommunityUtil.generateUUID().substring(0, 5));
        user.setPassword(CommunityUtil.md5("123" + user.getSalt()));
        user.setEmail("[email protected]");
        user.setHeaderUrl("http://image.nowcoder.com/head/99t.png");
        user.setCreateTime(new Date());
        userMapper.insertUser(user);
 
        //Add new post
        DiscussPost post = new DiscussPost();
        post.setUserId(user.getId());
        post.setTitle("Hello");
        post.setContent("Newcomer report!");
        post.setCreateTime(new Date());
        discussPostMapper.insertDiscussPost(post);
 
        //This step reports an error, observe whether it is rolled back
        Integer.valueOf("abc");
 
        return "ok";
    }

Programmatic transactions: manage transactions through TransactionTemplate and perform database operations through it

    public Object save2() {
        transactionTemplate.setIsolationLevel(TransactionDefinition.ISOLATION_READ_COMMITTED);
        transactionTemplate.setPropagationBehavior(TransactionDefinition.PROPAGATION_REQUIRED);
 
        return transactionTemplate.execute(new TransactionCallback<Object>() {
            @Override
            public Object doInTransaction(TransactionStatus status) {
                // New users
                User user = new User();
                user.setUsername("beta");
                user.setSalt(CommunityUtil.generateUUID().substring(0, 5));
                user.setPassword(CommunityUtil.md5("123" + user.getSalt()));
                user.setEmail("**********");
                user.setHeaderUrl("http://image.nowcoder.com/head/999t.png");
                user.setCreateTime(new Date());
                userMapper.insertUser(user);
 
                //Add new post
                DiscussPost post = new DiscussPost();
                post.setUserId(user.getId());
                post.setTitle("Hello");
                post.setContent("I'm new!");
                post.setCreateTime(new Date());
                discussPostMapper.insertDiscussPost(post);
 
                //This step reports an error, observe whether it is rolled back
                Integer.valueOf("abc");
 
                return "ok";
            }
        });
    }

 

Let’s talk about the Java thread pool and its parameters

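Task submission flow: when a task is submitted, the pool first creates core threads until corePoolSize is reached; after that, new tasks are placed in the work queue; once the queue is full, the pool creates additional threads up to maximumPoolSize; and when that limit is reached as well, the rejection policy is applied.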

The thread pool mainly has the following 6 parameters:

  1. corePoolSize (number of core worker threads): When a task is submitted and the pool has fewer than corePoolSize threads, a new thread is created to run the task even if other threads are idle. This continues until the number of threads reaches corePoolSize.
  2. maximumPoolSize (maximum number of threads): The maximum number of threads the pool is allowed to have. When the queue is full and the number of threads is less than maximumPoolSize, the pool creates new threads to run tasks. With an unbounded queue this parameter has no effect, because the queue never fills up.
  3. keepAliveTime (survival time of excess threads): When the pool has more threads than the core size and a thread has been idle for longer than this time, the thread is destroyed, until the number of threads is no greater than the core size.
  4. workQueue (work queue): A blocking queue that holds tasks waiting to be executed and hands them over to worker threads.
  5. threadFactory (thread creation factory): Used to create new threads. The default factory (Executors.defaultThreadFactory()) creates threads with new Thread() and names them in a uniform style: pool-m-thread-n (m is the number of the thread pool, n is the number of the thread within the pool).
  6. handler (rejection policy): When both the queue and the pool are full, this policy is applied to any newly submitted task.
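
To see how these parameters interact, here is a small sketch that builds a `ThreadPoolExecutor` directly (the constructor also takes a `TimeUnit` for keepAliveTime). The class name and the tiny sizes (1 core thread, 2 threads maximum, queue capacity 1) are assumptions chosen only so that the fourth task is rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolParamsSketch {
    // Submits four blocking tasks and reports whether the fourth one was rejected.
    public static boolean fourthTaskRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1,                                     // corePoolSize
                2,                                     // maximumPoolSize
                30L, TimeUnit.SECONDS,                 // keepAliveTime and its unit
                new ArrayBlockingQueue<>(1),           // workQueue with room for one task
                Executors.defaultThreadFactory(),      // threadFactory
                new ThreadPoolExecutor.AbortPolicy()); // handler

        CountDownLatch release = new CountDownLatch(1);
        // Each task blocks until released, so the threads stay busy during the demo.
        Runnable blocking = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        boolean rejected = false;
        try {
            pool.execute(blocking); // fewer than corePoolSize threads: a core thread is created
            pool.execute(blocking); // core thread busy: the task waits in the queue
            pool.execute(blocking); // queue full: a second, non-core thread is created
            pool.execute(blocking); // queue full and maximumPoolSize reached: rejected
        } catch (RejectedExecutionException e) {
            rejected = true;
        } finally {
            release.countDown(); // let the running tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("Fourth task rejected: " + fourthTaskRejected());
    }
}
```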

 

Four rejection strategies

  1. AbortPolicy: Discards the task and throws RejectedExecutionException. This is the default policy.
  2. DiscardPolicy: Also discards the task, but silently, without throwing an exception.
  3. DiscardOldestPolicy: Discards the task at the head of the queue (the oldest one) and then retries submitting the new task.
  4. CallerRunsPolicy: The rejected task is run directly by the thread that submitted it.
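
As an illustration of the last policy, the sketch below saturates a pool and records which thread ends up running the rejected task. The class name and pool sizes (one thread, queue capacity 1) are assumptions made for the demo.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsSketch {
    // Returns the name of the thread that executed the rejected (third) task.
    public static String threadThatRanRejectedTask() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());

        CountDownLatch release = new CountDownLatch(1);
        Runnable blocking = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        String[] ranOn = new String[1];
        try {
            pool.execute(blocking); // occupies the only worker thread
            pool.execute(blocking); // fills the queue
            // Pool and queue are full, so CallerRunsPolicy runs this task in the submitting thread.
            pool.execute(() -> ranOn[0] = Thread.currentThread().getName());
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return ranOn[0];
    }

    public static void main(String[] args) {
        System.out.println("Caller thread:         " + Thread.currentThread().getName());
        System.out.println("Rejected task ran on:  " + threadThatRanRejectedTask());
    }
}
```

Because the submitting thread is busy running the task, it cannot submit more work in the meantime, which acts as a simple form of back-pressure.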

 

How to create a thread pool?

Create it with the factory methods of the Executors class

1.newFixedThreadPool: Create a fixed-size thread pool

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPool1 {
    public static void main(String[] args) {
        // 1. Create a thread pool with a fixed size of 5
        ExecutorService threadPool = Executors.newFixedThreadPool(5);
        // 2. Submit task one 5 times
        for (int i = 0; i < 5; i++) {
            // Add the task to the thread pool
            threadPool.submit(new Runnable() {
                @Override
                public void run() {
                    System.out.println("Thread " + Thread.currentThread().getName() + " is executing task 1");
                }
            });
        }
        // 3. Submit task two 8 times; tasks beyond the 5 threads wait in the queue
        for (int i = 0; i < 8; i++) {
            // Add the task to the thread pool
            threadPool.submit(new Runnable() {
                @Override
                public void run() {
                    System.out.println("Thread " + Thread.currentThread().getName() + " is executing task 2");
                }
            });
        }
        // 4. Stop accepting new tasks; the JVM can exit once the submitted tasks finish
        threadPool.shutdown();
    }
}

2.newCachedThreadPool: Thread pool with cache, suitable for scenarios with a large number of tasks in a short period of time, but may occupy more resources; the number of threads depends on the amount of tasks.

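import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPool3 {
    public static void main(String[] args) {
        // Create the thread pool
        ExecutorService service = Executors.newCachedThreadPool();
        // Submit 50 tasks
        for (int i = 0; i < 50; i++) {
            int finalI = i;
            service.submit(() -> {
                // The number of distinct thread names printed shows how many threads the pool created
                System.out.println(finalI + " thread name " + Thread.currentThread().getName());
            });
        }
        // Stop accepting new tasks so the program can exit when the work is done
        service.shutdown();
    }
}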

3.newSingleThreadExecutor: Create a thread pool with a single thread

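import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadPool4 {
    public static void main(String[] args) {
        ExecutorService service = Executors.newSingleThreadExecutor();
        for (int i = 0; i < 5; i++) {
            int finalI = i;
            service.submit(() -> {
                // The pool has only one thread, so the printed name is always the same
                System.out.println(finalI + " thread name " + Thread.currentThread().getName());
            });
        }
        // Stop accepting new tasks so the program can exit when the work is done
        service.shutdown();
    }
}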

4.newSingleThreadScheduledExecutor: Create a thread pool for a single thread that executes scheduled tasks

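import java.time.LocalDateTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ThreadPool5 {
    public static void main(String[] args) {
        ScheduledExecutorService service = Executors.newSingleThreadScheduledExecutor();
        System.out.println("Add task: " + LocalDateTime.now());
        service.schedule(new Runnable() {
            @Override
            public void run() {
                System.out.println("Execute task: " + LocalDateTime.now());
            }
        }, 3, TimeUnit.SECONDS); // Delay task execution by 3 seconds
        // By default, an already scheduled delayed task still runs after shutdown()
        service.shutdown();
    }
}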

 

Let’s talk about ThreadLocal

ThreadLocal can be understood as a thread-local variable: each thread that uses a ThreadLocal variable has its own copy of the value, which other threads cannot access, so contention between threads is naturally avoided.

 

Usage:

Create a ThreadLocal object:

private ThreadLocal<Integer> localInt = new ThreadLocal<>();

The above code creates a localInt variable. Since ThreadLocal is a generic class, the type of localInt is specified here as an integer.

Here's how to set and get the value of this variable:

public int setAndGet(){
    localInt.set(8);
    return localInt.get();
}

The above code sets the value of the variable to 8 and then gets this value.

Since the value set in a ThreadLocal is only visible to the current thread, you cannot initialize it on behalf of other threads. To make up for this, ThreadLocal provides the withInitial() method, which supplies the same initial value to every thread:

private ThreadLocal<Integer> localInt = ThreadLocal.withInitial(() -> 6);

The above code sets the initial value of the ThreadLocal to 6. Every thread that reads the variable before setting it gets its own copy initialized to 6.
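
The per-thread isolation can be checked with a short experiment. In this sketch (the class name and the `demo` method are made up for illustration), the calling thread sets the value to 8, while a second thread that never calls `set` still sees the initial value 6.

```java
public class ThreadLocalSketch {
    private static final ThreadLocal<Integer> LOCAL_INT = ThreadLocal.withInitial(() -> 6);

    // Returns {value seen by the calling thread, value seen by another thread}.
    public static int[] demo() {
        LOCAL_INT.set(8); // only affects the calling thread's copy

        int[] seenByOther = new int[1];
        Thread other = new Thread(() -> seenByOther[0] = LOCAL_INT.get());
        other.start();
        try {
            other.join(); // wait so that the other thread's result is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        int seenByCaller = LOCAL_INT.get();
        LOCAL_INT.remove(); // clean up, which matters when threads are reused by a pool
        return new int[] { seenByCaller, seenByOther[0] };
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("caller sees " + result[0] + ", other thread sees " + result[1]);
    }
}
```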

 

How is HashMap implemented under the hood?

HashMap definition:

HashMap is a Map implementation built on an array + singly linked lists + red-black trees. The default initial capacity of its array is 16, the load factor is 0.75, and each resize doubles the capacity.

HashMap implements the Map interface and stores data according to the hashCode of the key, which gives very fast access. It allows at most one null key and is not thread-safe.

HashMap is unordered, that is, the insertion order is not recorded.

 

HashMap storage process:

HashMap computes the array index for an entry from the hash of its key. If there is no element at that index, the entry is stored there directly. If there is already an element at that position, the linked list mentioned above is used: the new entry is appended to the list at that position (or, if an entry with an equal key already exists, its value is replaced).

When the length of a linked list exceeds 8 (and the array has at least 64 slots; otherwise the array is resized instead), the list is "treeified", that is, converted into a red-black tree.

Note that the red-black tree is converted back into a linked list only when the number of nodes drops to 6 or fewer, not as soon as it falls below 8; the gap avoids converting back and forth repeatedly. This process is called "untreeifying".
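
The step "calculate the array subscript from the key" can be made concrete. The sketch below mirrors the bit-spreading used by `java.util.HashMap` in JDK 8 and later (`h ^ (h >>> 16)`, then `(n - 1) & hash`); the class and method names are made up for illustration, and the real class has many more details.

```java
public class HashMapIndexSketch {
    // Mix the high 16 bits of hashCode into the low 16 bits, as HashMap.hash() does.
    public static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // For a power-of-two capacity, (capacity - 1) & hash picks the bucket.
    public static int bucketIndex(Object key, int capacity) {
        return (capacity - 1) & spread(key);
    }

    // Number of entries above which the array is doubled.
    public static int resizeThreshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same hashCode, so they always share a bucket.
        System.out.println(bucketIndex("Aa", 16));
        System.out.println(bucketIndex("BB", 16));
        System.out.println(resizeThreshold(16, 0.75f));
    }
}
```

With the defaults (capacity 16, load factor 0.75) the threshold is 12, so the array doubles to 32 when the 13th entry is added.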

 

Knowledge related to hash tables?

A hash table is a data structure in which elements are accessed directly based on their key.

The hash function is essentially a function, we define it as hash(key), key is the key value of the element, and the value obtained through the hash function is the hash value.

Hash function requirements

1. The hash function should not be too complex. A complex function takes more time to compute, which hurts the performance of the hash table.

2. The hash values produced by the hash function should be as random and evenly distributed as possible, so as to reduce hash conflicts. Even when conflicts occur, the elements are then spread fairly evenly across positions, rather than some positions holding very many elements and others very few.

 

Hash conflicts will inevitably occur when we construct a hash table. We have two methods to solve this problem:

1. Open addressing

With open addressing, when a hash conflict occurs we probe for another free position and insert the element there. Common probing strategies include linear probing and quadratic probing.

2. Separate chaining (linked list method)

Separate chaining is the more commonly used way to resolve hash conflicts and is simpler than open addressing. Each position in the hash table corresponds to a linked list, and all elements whose keys hash to the same position are placed in that list.
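
Separate chaining is what HashMap itself uses, so as a contrast here is a minimal sketch of open addressing with linear probing. The class name, the fixed capacity, and the int-key/String-value layout are assumptions for the example; it does not support deletion or resizing.

```java
public class LinearProbingSketch {
    private final Integer[] keys;
    private final String[] values;

    public LinearProbingSketch(int capacity) {
        keys = new Integer[capacity];
        values = new String[capacity];
    }

    // Home slot of a key: a deliberately simple hash function.
    private int home(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Inserts or overwrites the key and returns the slot that was used.
    public int put(int key, String value) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null || keys[i] == key) {
                keys[i] = key;
                values[i] = value;
                return i;
            }
            i = (i + 1) % keys.length; // conflict: try the next slot
        }
        throw new IllegalStateException("hash table is full");
    }

    // Follows the same probe sequence; an empty slot means the key is absent.
    public String get(int key) {
        int i = home(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) {
                return null;
            }
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) % keys.length;
        }
        return null;
    }

    public static void main(String[] args) {
        LinearProbingSketch table = new LinearProbingSketch(8);
        System.out.println(table.put(1, "one"));  // home slot of 1 is 1
        System.out.println(table.put(9, "nine")); // 9 also hashes to 1, so it probes to the next slot
        System.out.println(table.get(9));
    }
}
```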

 

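How to determine whether two objects are the same

Use equals(). By default, equals() behaves the same as ==: it compares whether the two references point to the same object (the same address). To compare objects by their content, override equals() in the class, and override hashCode() along with it so that equal objects have equal hash codes.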
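
As a small illustration, the hypothetical `Point` class below overrides both methods so that two distinct objects with the same coordinates compare as equal.

```java
import java.util.Objects;

public class EqualsSketch {
    // Hypothetical value class used only for this example.
    public static final class Point {
        private final int x;
        private final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true; // same reference
            }
            if (!(o instanceof Point)) {
                return false; // null or a different type
            }
            Point other = (Point) o;
            return x == other.x && y == other.y; // compare by content
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal points produce equal hash codes
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a == b);      // false: two different objects
        System.out.println(a.equals(b)); // true: same content
    }
}
```

Keeping hashCode() consistent with equals() is what lets such objects work correctly as HashMap keys.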

 

Let’s talk about the factory pattern and observer pattern in design patterns

Simple factory pattern

The factory pattern has a very vivid description. Creating an object class is like a factory, and the objects that need to be created are products; products are processed in the factory, and people who use the products do not need to care about how the products are produced. From a software development perspective, this effectively reduces the coupling between modules.

 


Factory method pattern

Define an interface for creating objects and let its subclasses decide which class to instantiate. The factory method pattern defers the creation process to the subclasses.

Specific process: create a product interface; create concrete classes that implement the interface; create a factory that produces objects of the concrete classes based on the given information; then obtain objects from the factory by passing it the type information.
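
The process above can be sketched as follows. All names here (`Car`, `Benz`, `Bmw` and the factories) are hypothetical; the sketch shows a simple factory, which picks the product from type information, next to a factory method, where each factory subclass decides what to create.

```java
public class FactorySketch {
    // Product interface and two concrete products.
    public interface Car {
        String name();
    }

    public static class Benz implements Car {
        @Override
        public String name() {
            return "Benz";
        }
    }

    public static class Bmw implements Car {
        @Override
        public String name() {
            return "BMW";
        }
    }

    // Simple factory: one class chooses the product from the type information.
    public static class SimpleCarFactory {
        public static Car create(String type) {
            switch (type) {
                case "benz":
                    return new Benz();
                case "bmw":
                    return new Bmw();
                default:
                    throw new IllegalArgumentException("Unknown car type: " + type);
            }
        }
    }

    // Factory method: the interface declares creation, each implementation decides the class.
    public interface CarFactory {
        Car create();
    }

    public static class BenzFactory implements CarFactory {
        @Override
        public Car create() {
            return new Benz();
        }
    }

    public static class BmwFactory implements CarFactory {
        @Override
        public Car create() {
            return new Bmw();
        }
    }

    public static void main(String[] args) {
        System.out.println(SimpleCarFactory.create("benz").name());
        CarFactory factory = new BmwFactory();
        System.out.println(factory.create().name());
    }
}
```

Adding a new product to the simple factory means editing its `switch`, while the factory method version only needs a new factory class.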

 

Abstract factory pattern

The abstract factory pattern creates other factories around a super factory, which is also known as the factory of factories. This type of design pattern is a creational pattern. Unlike the factory method, a concrete factory here (for example, a Mercedes-Benz factory) does not produce just one specific product, but a whole family of products.

 

Differences between the factory patterns

  • Simple factory: Uses one factory object to produce any product in the same hierarchy. (Adding new products requires modifying the factory, so extension is not supported.)
  • Factory method: Uses multiple factory objects, each producing its corresponding fixed product in the same hierarchy. (Supports extension by adding new products.)
  • Abstract factory: Uses multiple factory objects, each producing all the products of one product family. (Adding new products is not supported; adding new product families is supported.)

 

Observer pattern

Define a one-to-many dependency relationship between objects. When the state of an object changes, all objects that depend on it are notified and automatically updated.

For example: during an auction, the auctioneer watches for the highest bid and then notifies the other bidders of it. Similarly, in Redis Sentinel mode, the sentinels monitor the master node.
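
The auction example might look like this in code. It is a minimal sketch with made-up names (`Auctioneer`, `Bidder`, `BidObserver`): the auctioneer is the subject, and every registered bidder is notified whenever the highest bid changes.

```java
import java.util.ArrayList;
import java.util.List;

public class AuctionObserverSketch {
    // Observer: anything that wants to hear about a new highest bid.
    public interface BidObserver {
        void onHighestBid(int amount);
    }

    public static class Bidder implements BidObserver {
        private int lastSeenBid = -1;

        @Override
        public void onHighestBid(int amount) {
            lastSeenBid = amount; // update automatically when notified
        }

        public int lastSeenBid() {
            return lastSeenBid;
        }
    }

    // Subject: keeps the list of observers and notifies them on state changes.
    public static class Auctioneer {
        private final List<BidObserver> observers = new ArrayList<>();
        private int highestBid;

        public void register(BidObserver observer) {
            observers.add(observer);
        }

        public void placeBid(int amount) {
            if (amount > highestBid) {
                highestBid = amount;
                for (BidObserver observer : observers) {
                    observer.onHighestBid(amount);
                }
            }
        }
    }

    public static void main(String[] args) {
        Auctioneer auctioneer = new Auctioneer();
        Bidder alice = new Bidder();
        Bidder bob = new Bidder();
        auctioneer.register(alice);
        auctioneer.register(bob);
        auctioneer.placeBid(100); // both bidders are notified
        auctioneer.placeBid(50);  // not a new highest bid, so nobody is notified
        System.out.println(alice.lastSeenBid() + " " + bob.lastSeenBid());
    }
}
```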



Author: Xiao Yi
Link: Ali Yiyi Mian Jing_Niuke.com
Source: Niuke.com

 

 

 


Origin blog.csdn.net/qq_51118755/article/details/135307940