Starting! DevOps@BOC: The Way of Using Tools, Like Carving and Polishing

About the Author

Yu Hongkui (于洪奎)

Senior Manager of Bank of China Software Center,
System Analyst, CCEP, CSM, DevOps Master
14 years of experience in software development, including 9 years in development management

The following is compiled from Yu Hongkui's talk at DOIS 2018, Beijing.

I come from a development department of the Bank of China Software Center. The Software Center began piloting agile software development and the related CI and CD practices in 2013; our DevOps work in the true sense started later than that.

Bank of China's DevOps spans many product lines and technology stacks. This talk is not a complete introduction to the bank's entire DevOps approach; it covers only the DevOps practices of some X86-related products.

My sharing mainly includes the following four aspects:

First, let's look together at what DevOps is and what "tool use" means in the title of this talk.

Second, I will share our DevOps: the tools we selected and some of the overall thinking behind our practice. I am not saying our approach is the best; I am only sharing what we do. As the saying goes, stones from other hills can be used to polish jade. Consider these a few small stones tossed out for your reference.

Third, I will share some things we think we have done fairly successfully. On the one hand this shows you something concrete; on the other hand it should give colleagues more confidence in practicing DevOps. Some things are not achievable only by Internet companies. Financial companies and large banks like ours can do them too.

Fourth, I will share a few of our "lessons from failure". We often joke, "Come on, tell us something that made you unhappy, so everyone else can be happy." This is partly for everyone's amusement and partly for reflection. Failure is said to be the mother of success, but failure alone is not enough. What matters most is learning and reflection after failure.

1. Tool Use and DevOps


First, let me share a picture. It is the entry page of the DevOps maturity survey site of Atlassian, the company that develops the well-known agile management tool Jira. The URL is: https://www.atlassian.com/devops/maturity-model

Why share this picture? I think its description of DevOps expresses well what we hope to get when we look at the word DevOps.

We are all doing DevOps now, and before that we were doing agile. I believe many of us have asked ourselves what we are doing this for. Two words in this picture struck me because they state clearly what we hope to get from DevOps: "frequent" and "reliable". In the final analysis, we do DevOps because we hope to release our software products more frequently and more stably. Let me tell two short stories here.

The first short story. Last year I sent an anonymous questionnaire to a business department at our head office. Its content was very simple: as the software development department of Bank of China, what do you think we most need to improve?

The options were these. First, our delivery is not fast enough, and we should deliver faster. Second, we do not deliver enough, and we should deliver more. Third, the quality of what we deliver is poor, and we should improve the quality of our software. Fourth, our development costs are high and the same work is expensive when we do it, so we should reduce our development costs. Fifth, the security of the software we deliver is poor, and we should improve it. I summarized these five points as: more, faster, better, cheaper, and safer. I gave the five options to our business colleagues and asked them to choose one, or at most two. What was the result? More than 90% of them think we are slow and should deliver faster.

People often say that the financial industry is very different from other industries. We are not Internet companies, so our agile and DevOps should have their own characteristics. I completely agree. However, being distinctive does not mean departing from the essence.

DevOps in the financial industry cannot simply copy the practices of Internet companies. But we should first learn from them, and the first thing to learn is how fast they are.

I have always held one view: any agile or DevOps that cannot make the enterprise faster is a sham. If you want to do DevOps and agile, you must first be able to get fast. Of course, that speed has to be sustainable and steady.

The second short story. Just a few days ago we held an agile testing salon with colleagues from the quality department and the testing department. Everyone said our quality has "improved" under agile. So I asked: has quality really improved since we started practicing agile? Compared with traditional development, how much have we improved? Everyone smiled knowingly.

In fact, at least according to our internal data, the delivery quality of departments and product teams that use agile methods has not improved compared with those that use traditional development methods.

Agile has not improved the delivery quality of our software products. On the contrary, when we started doing agile, our delivery did get faster, but our delivery quality got worse. This is not a feeling; it is supported by data.

However, as time passed, the delivery quality of the agile products improved gradually and significantly, and some have approached the delivery quality of traditional development. This is also supported by data. It is nothing to be surprised about. The change curve model explained this long ago, and the management thinker Peter Senge said the same: first it gets worse, then it gets better.

I think these two stories interpret our demands on agile and DevOps very well. Whether from the business side or from our own side, we all hope to deliver our software products more frequently and faster, but that delivery must also be high-quality and robust.

It is often said that agile and DevOps are not about being fast. In all seriousness, that statement is inaccurate. The accurate version, which I agree with, is: agile and DevOps are not only about being fast, but first of all they must be fast.

When it comes to DevOps, there are a few related concepts everyone knows, such as agile and lean. Here I want to share how these concepts relate to each other in a narrow sense. In a broad sense, neither Agile, Lean, nor DevOps admits to focusing on only one segment; each says it is end-to-end. DevOps is typically drawn as an infinite loop, Lean does not say it ignores the back end, and Agile does not say it ignores the front end. So I am speaking in a narrow sense here. I first heard this view from Mr. Li Jianhao of Halo. In a narrow sense, these concepts map onto our development process as in the picture below.

My summary of this picture is: Lean focuses more on doing the right thing, Agile focuses more on doing things fast, and DevOps focuses more on doing things smoothly. Speaking of Lean, I recommend two books: "The Lean Startup" and "Lean Analytics". From the Lean perspective, we keep exploring and searching for business value through a cycle of building, feedback, and learning, constantly looking for the "right" point of value. This "lean" is actually different from the original "lean production".

Everyone knows that Agile was started by a group of master developers. At its root, they were more concerned with "uncovering better ways of developing software". From the beginning they also said they wanted to deliver customer value, deliver working software, and respond to change; they never said they did not care about anything else.

Only in its original, narrow sense does agile focus more on software development methods, and it does say less about the search for value at the front and smooth delivery at the back.

I will not say more about DevOps; I have already said a lot and will come back to it. Let's turn to "tool use". It is a very old term. Originally, the word for "tool" referred to weapons and the word for "use" referred to farm implements; later the phrase came to mean the use of tools. Chinese civilization is broad and deep, and every word has many interpretations. Here I use the term simply to mean the use of tools, with no deeper meaning.

This relates to how we view tools.

Take culture and behavior. We all know that culture shapes behavior and behavior in turn shapes culture. The same is true when we introduce tools. You think that you, through your behavior, are using the tools and changing what they do. At the same time, the tools shape, regulate, and even guide your behavior.

2. Our DevOps


Above I talked briefly about the two concepts of DevOps and tools. Next is the DevOps of Bank of China, which I know many of you are curious about. Again, this covers only our X86-related part.

2.1 Selection of tools


I keep saying that the way of using tools is like carving and polishing. Let's look at what we are actually talking about when we talk about tools.

There are many DevOps tool chain maps, and the ones here are only a few. There are subway-style, spider-web-style, and even periodic-table-style maps. You can refer to Mr. Xu Feng's article, which collects 16 of these DevOps maps.

Many of you will know this picture. It is used often and comes from James Bowman. I highly recommend studying it. From front to back, each stage has its fields, and each field lists open source tools that the author recommends or thinks everyone should use. Bank of China uses many of these open source tools.

We have used basically all of the marked tools, open source and otherwise. What I want to tell you is that which tool you use does not matter. What matters is how you choose it and use it in the process, and whether you have really solved your own problems with it!

2.2 DevOps tool chain


The picture above is very complicated. From the user's point of view, our process looks like this. These are roughly the tools we use across the whole process, from requirements at the front to monitoring at the back. Once again, this covers only our X86-related part.

3. Successful experience sharing


3.1 Source code


We all know the industry debate between SVN and Git. Some say SVN is better, some say Git. What do you think the difference is? Which do you advocate?

Voice-over: opinions differ on this question, and there is no standard answer!

In actual development, my department uses SVN and some other departments use Git. Which tool is better? My answer is: "Trunk development is better!"

That completely fails to answer the question! But it is the first of the practices we consider successful that I want to share: trunk development. We all know that eliminating waste is very important for improving efficiency.

So what is waste in software development? We all know the usual ones: task switching, ineffective communication, and giving developers poor machines and small screens. Here I want to name one that many people disagree with but that is an obvious waste: branch merging. In agile and lean we say we want to generate value. Think about it: what user value does branch merging generate? None at all.

I have always held this view: in today's IT environment, branch merging is an unnecessary action in software development that produces no value. There are only two reasons you cannot eliminate it. First, you do not want to, because you are limited by your imagination and feel you have to do it. Second, you lack the ability and the skills, and can only compensate with so-called "management".

3.2 Branch Management


The picture above shows our former branches. Can you imagine it? This is how we went to production, batch by batch, and this was the state of our branches. A dedicated project manager drew this chart just to manage them. Impressive, isn't it?

This is the clean structure we arrived at after more than half a year of hard work once we recognized the problem. What I want to tell everyone is that it can be done. It really is difficult, though.

Many people in the industry have heard of trunk development, but what exactly is it? In ideal trunk development, all code is committed to the trunk, and a release can be cut from the trunk at any time. This is very hard to do. You need to keep the trunk very stable, which requires a team that is self-disciplined enough, with everyone who commits code being self-disciplined enough, and a continuous integration setup that is robust and well maintained.

What we do is a realistic form of trunk development. Commits still go to the trunk, but when we want to release, we pull a branch from the trunk for that release. There is never more than one release branch at a time. We go to production within the iteration, and after going to production the branch is frozen. I use a vivid model for this: it is like a tree whose trunk keeps growing. This is the trunk development we practice now. Even so, trunk development is really difficult.

The picture above is a branching model for version management that is common in the industry, and it is what we used before. The two approaches are clearly different, and the value and efficiency gains they bring are completely different too.

Everyone says trunk development is good, but every coin has two sides. What are its side effects? Where is it difficult? Think about it.

We often talk with peers in the industry. When I introduce trunk development, everyone says it is good. Fine, so why not do it? Everyone smiles knowingly: we can't! We are special, we have this reason and that reason, blah blah blah...

At that point, what if we asked ourselves: why can't we do it? How could we do it? If we really wanted to, what would be the first and smallest practical change?

When I talk with colleagues in the industry, especially in finance, I find we have fallen into a state that the "DevOps Practice Guide" calls learned helplessness (see page xxii of the "DevOps Practice Guide"). For "old hands" like me who have worked in financial IT for more than ten years, I have heard a more fitting term: sophisticated incompetence. This state is caused partly by the environment and partly by ourselves. When we do DevOps, we must first be aware of this problem.

When you are busy rushing around, much of the work you do is merely what you can do, which is not necessarily what you should do. Because the "things that should be done" often feel too difficult, we give up on our own initiative while the idea is still only an idea.

Trunk development: loving you is not easy

  • A sufficiently short iteration cycle (two weeks, with a production-ready version at the end)

  • A short enough lead time for going to production (three working days)

  • Requirements kept serial, at all costs

Trunk development is something that absolutely should be done, but it is also extremely difficult, especially in the financial industry. It takes a short enough iteration cycle and a short enough lead time for production. Those two points alone are enough to discourage many colleagues in finance. Add the need to keep requirements serial, and most give up completely.

What I want to tell everyone is that this can be done and it should be done.

3.3 Continuous integration


What is continuous integration?

  • Developers submit code at least once a day

  • Trigger the build and test immediately after the code is submitted

  • Handle as soon as possible after build and test failure

Continuous integration is not a new term, and everyone talks about it. This is my definition. These three items clearly define the effect that "continuous integration" should achieve. If even these minimum requirements are not met, I generally do not call it continuous integration. What I see more often is so-called continuous integration that in fact achieves only the second item. That should be called automated build.
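The three criteria can be turned into a small self-check. The following Python sketch is only an illustration and not the Software Center's tooling: the `DayRecord` fields and the `classify` function are hypothetical, and "handled as soon as possible" is approximated here as "the build was not left red at the end of the day", which is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DayRecord:
    """One working day of one team (all fields are hypothetical inputs)."""
    day: date
    developers: set[str]      # everyone expected to commit that day
    committers: set[str]      # developers who actually committed
    commits: int              # number of commits that day
    builds_triggered: int     # builds started by those commits
    red_at_end_of_day: bool   # was the build still failing when people left?


def classify(records: list[DayRecord]) -> str:
    """Apply the three criteria from the talk to a list of daily records."""
    if not records:
        return "neither"
    # 1. Every developer commits at least once a day.
    daily_commits = all(r.developers <= r.committers for r in records)
    # 2. Every commit triggers a build and test run.
    build_per_commit = all(r.builds_triggered >= r.commits for r in records)
    # 3. Failures are handled promptly (assumed: never left red overnight).
    fixed_promptly = not any(r.red_at_end_of_day for r in records)

    if daily_commits and build_per_commit and fixed_promptly:
        return "continuous integration"
    if build_per_commit:
        return "automated build"
    return "neither"


# Example: builds run on every commit, but one developer did not commit.
week = [DayRecord(date(2018, 6, 4), {"ann", "bo"}, {"ann"}, 3, 3, False)]
print(classify(week))
```

With only the second criterion satisfied, the example team is classified as doing an automated build rather than continuous integration, which mirrors the distinction drawn above.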

The point I focus on here is "handle as soon as possible after build and test failure". What does "as soon as possible" mean? In our definition it means: if the dashboard is red, nobody leaves work. A display screen is placed near every product development team in our department. We call it a physical dashboard. After several years of hard work, we can finally do it: if the dashboard is red, nobody leaves work.

Think about it: what side effect might this rule, or discipline, have?

Yes, in order to leave on time, everyone commits their code earlier. You can commit at four in the afternoon. We allow this, and we do not mind developers committing at four in the afternoon.

"Handle as soon as possible" may sound vague, but we have clear definitions, rules, and discipline to back it up.

Only when you do every one of the items above does continuous integration become meaningful.

Having a continuous integration discipline sounds simple, but getting everyone to actually follow it until it becomes a habit is not as simple as we imagine. It does not happen because a manager talks about it, preaches it, or hangs up a banner. It takes concrete action.

How? Let me describe our approach. It may not suit everyone, but it may be worth hearing.

We have a physical dashboard like the one in the picture below. It is very simple to build; I believe you could do it with a day and a half of research. The dashboard sits near the development team. Physical things radiate information naturally: they are there whether you want to look at them or not. This is the first step and the easiest one.

The second step is training. We ran many training sessions on how to write unit tests, and on what kind of unit tests will not turn the dashboard red because of unrelated problems such as data and environment.

The third step is for you, as a manager and a promoter of the culture, to go through the first difficult moment together with the team.
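The logic behind the physical dashboard in the first step is small enough to sketch. The Python below is a hedged illustration, not the dashboard the team actually built: the job names, the `"success"`/`"failure"` result strings, and the functions `dashboard_color`, `may_leave_work`, and `render` are all hypothetical, and fetching results from a real CI server is deliberately left out.

```python
from enum import Enum


class Color(Enum):
    GREEN = "green"
    RED = "red"


def dashboard_color(job_results: dict[str, str]) -> Color:
    """Red if any job's latest result is not 'success', otherwise green."""
    if any(result != "success" for result in job_results.values()):
        return Color.RED
    return Color.GREEN


def may_leave_work(job_results: dict[str, str]) -> bool:
    """The team's rule: nobody leaves while the dashboard is red."""
    return dashboard_color(job_results) is Color.GREEN


def render(job_results: dict[str, str]) -> str:
    """Plain-text rendering suitable for a screen near the team."""
    lines = [f"DASHBOARD: {dashboard_color(job_results).value.upper()}"]
    for job, result in sorted(job_results.items()):
        lines.append(f"  {job:<20} {result}")
    return "\n".join(lines)


# Example with made-up job names: one failing job turns the whole board red.
latest = {"product-build": "success", "product-unit-tests": "failure"}
print(render(latest))
print("may leave work:", may_leave_work(latest))
```

Treating a single failing job as a red board keeps the rule binary, which is what makes a discipline like "red means nobody leaves" enforceable.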

Let me tell a little story. In August 2017, a product with four Scrum teams was in closed-door development in Xi'an. Seventy or eighty people were working six days a week, until 10 o'clock every night. I was asked to go and coach the teams as an internal coach. When I arrived, the dashboard was always red. It took nearly two weeks: we ran three training sessions and worked through all sorts of problems that had been turning the dashboard red. At noon on August 24th (I still remember the date clearly), everything was ready. I decided to start the discipline of "if the dashboard is red, nobody leaves work", and I announced it to everyone.

In truth, I was quite unsure of myself. We were already working until ten at night. If the dashboard turned red and we really could not fix it, would I really keep everyone there all night? That seemed unreasonable. Then came Murphy's Law! At 9:30 everyone started committing code, and the dashboard suddenly turned red. Everyone was alarmed, and I was nervous too.

I told everyone: stop all other work. We have half an hour before it is time to leave; fix it as soon as possible, and we can leave once it is fixed. After a flurry of activity and all kinds of digging, the dashboard was fixed at 10:20 that evening, and everyone spontaneously applauded. Since then, that product team has kept the discipline of "if the dashboard is red, nobody leaves work" for a year.

What I want to tell everyone is that when implementing continuous integration, the most important thing is establishing its working rules and discipline; building tools and platforms comes second.

When you want to introduce a rule and establish a discipline, you must define exactly how you want the thing done. Along the way you need to train everyone and walk the road with them, working out how to locate and fix problems as quickly as possible, and then do something to reinforce the behavior. You have to take part as a real participant and accompany everyone through the first and second difficult moments. After that, the team will naturally form new habits and discipline, which is what establishing a new "culture" means.

In many cases coaches or managers are themselves "uncultured", yet they shout every day about building and changing culture. The lesson here for coaches and managers is that they must learn to get their hands dirty. In short, do it together with everyone! Culture is built by doing, not by shouting slogans or hanging banners.

Having said all this, do you think the tools we used in this continuous integration process are still as important as you thought before?

3.4 Automated deployment


Automated deployment is a long-standing difficulty for everyone:

  • Permission management in the production environment is hard to integrate

  • Production environment configuration parameters are hard to collect

  • Resource changes in the production environment are hard to coordinate

  • Incremental and full deployments are hard to support in a unified way

  • Automated packaging standards are hard to put into practice

A meal has to be eaten one bite at a time. In 2016 we wrote into the department's annual quality targets: all products are packaged automatically, and manual packaging is strictly prohibited. In a sense, manual packaging is a waste of the organization's resources, so we pushed automated packaging hard internally. After that came automated deployment.

Across the entire Software Center, more than 20 products are now deployed automatically. Our goal this year is 100+, and I do not know whether we will reach it. Still, more than 20 products are deployed automatically, and automated deployment has been achieved on all platforms, X86 included.

Automated deployment is faster than manual deployment, but speed is only a small part of the benefit. The most important thing is that you can drink coffee while the automated deployment runs, and it generally does not go wrong. Think about manual deployment: what are the operations colleagues doing? Fingers flying, brows furrowed. Compare the two situations, and the happiness and the stability are completely different.

Automated deployment is something we should strive for when doing DevOps. But it involves automated packaging, environment management, parameter management, and permission management. If your data center and development center are still separate, each of these is a hurdle to overcome, and none is simple. It is too complicated to go into detail here. What I want to tell you is that automated deployment is worth overcoming the difficulties for, and it is something that should be done.

4. Retrospect and reflection on failure

Let me talk about our failures and reflections.

The suffering of automated functional testing:

  • In 2016, several products wrote more than 1,000 UI-oriented automated functional test cases (Robot Framework). Half a year later, all of them had been abandoned, and they have not been run since.

  • One product wrote a large number of UI-oriented automated functional test cases that take one to two weeks to run through once. Two or three test developers are responsible for running, writing, and updating these cases.

Seeing everyone's knowing smiles, I know our situation is not special. We have gone through the same suffering as everyone else. Internet companies have set a good example for us, but unfortunately it cannot save us. Our environment is different.

As you can tell from my talk, the way work gets scheduled here is completely different from Internet companies. At some Internet companies, a product manager puts together a PPT, shows it to the boss, and work starts, with people and money provided as requested. In the financial industry, taking a requirement from evaluation to an approved project is impossible in less than half a year or a year, and one that gets through in that time counts as fast.

Back to automated testing. The above is the "suffering" we have had with it. After the suffering came reflection. It is the same sentence again: do what should be done, not just what is easy to do.

In many cases, the things you think should be done but are not easy to do are exactly the key points for improving the system. Otherwise, why has the system not improved after you have done so many of the things you are able to do?

What should we do? We found the point in a published diagram that compares high-performing and low-performing organizations at each stage of transition.

We release every 10 to 100 days; that is our current state. In the picture, you will find that at this stage developers and testers work together to maintain automated test cases. The automated functional test cases I mentioned above were written either by developers on their own or by testers who specialize in automated testing. The one thing they were not is written and maintained together.

When I saw this, it was very inspiring. I had wanted to find the thing to do. I had heard people say that we need to move testing earlier and integrate development and testing, but what is integration? How do you integrate?

Use automated testing as the subject around which development and testing integrate. When automated test cases are jointly maintained, development and testing are integrated.

What is joint maintenance of automated test cases? What exactly does "joint maintenance" mean?

To put it more concretely: development and testing share the automated test code and modify it together. Sometimes we especially like vague and abstract words, because they give everyone a vague sense of security.

From integrating development and testing, to jointly maintaining automated test cases, to sharing automated test code: you can feel the difference at each step. Finding that key point is not that easy, but it is not that hard either.

Finding it is only the first step; achieving it is what matters, and that is another difficult journey. How many of you have development and testing in the same department? How many of your testers can code? I believe that when I asked these two questions, many people silently gave up.

Our development and testing belong to two different departments, and our testers have basically no coding skills. Nevertheless, this is what we are doing: by sharing automated test code, development and testing jointly maintain automated interface and functional test cases, which integrates development and testing. Because this is what should be done.

When it comes to writing automated test cases, the question arises of how to add cases and how many to add.

Here we are enforcing a discipline: when adding automated tests, you must ensure that your automated test cases remain continuously effective and usable. What counts as continuously effective and usable?
It means that every added automated functional test case must:

  • Run at least once a day

  • Be expected to succeed 100% on every run

  • Fail only because of application issues, not non-application issues such as environment and data

I call this the disciplined growth of automated test cases.

Every added automated functional test case (interface test cases count too) runs at least once a day. Automated test cases that cannot be run once a day are basically decoration. Apart from making managers feel good each time they look at the case count, they have no practical use.

Every run should be expected to succeed 100%. If a run fails, it must be because your application has a problem, not because of data problems, an unstable environment, another system being down and taking yours down with it, and so on.

Only if the first case meets this condition, can the second case be written, and only if both cases meet this condition, can the third case be written. If the first case cannot meet the above three conditions, try to find a way to make your first case meet the above three conditions.
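This "one case at a time" rule can be expressed as a simple gate. The Python below is a sketch under stated assumptions rather than the team's actual process: the `Run` record, the failure-cause labels, the seven-day look-back window, and the functions `case_is_healthy` and `may_add_case` are all hypothetical. The second condition (runs are expected to pass) is modeled together with the third: a failing run is tolerated only when its recorded cause is the application itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Run:
    day: date
    passed: bool
    failure_cause: str = ""  # hypothetical labels: "application", "environment", "data"


def case_is_healthy(runs: list[Run], today: date, window_days: int = 7) -> bool:
    """Check one test case against the three conditions over a recent window."""
    window = {today - timedelta(days=i) for i in range(window_days)}
    recent = [r for r in runs if r.day in window]
    # Condition 1: the case ran on every day of the window.
    ran_every_day = window <= {r.day for r in recent}
    # Conditions 2 and 3: any failure must be caused by the application.
    only_application_failures = all(
        r.passed or r.failure_cause == "application" for r in recent
    )
    return ran_every_day and only_application_failures


def may_add_case(histories: dict[str, list[Run]], today: date) -> bool:
    """A new case may be written only if every existing case is healthy."""
    return all(case_is_healthy(runs, today) for runs in histories.values())


# Example with made-up case names and dates.
today = date(2018, 6, 8)
stable = [Run(today - timedelta(days=i), True) for i in range(7)]
flaky = stable[:-1] + [Run(today - timedelta(days=6), False, "environment")]
print(may_add_case({"login": stable}, today))
print(may_add_case({"login": stable, "transfer": flaky}, today))
```

With no existing cases the gate is open, so the first case can always be written; after that, a single case that fails for environmental reasons blocks further growth until it is fixed.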

This is called discipline, and this is called restraint. We are now very cautious about adding automated test cases, and we require our testers to be embedded in our teams and write automated test cases together with us.

Our interface test cases are basically written by testers within the iteration, with support from developers. What if they cannot write them at first? Train! Learn!

On this topic, unit testing has a principle called the FIRST principle, and in some places functional testing is said to follow the AIR principle. I think the extra A is a bit redundant. I think automated functional testing should satisfy the SIR principle. What is the SIR principle? You will find it is simply the FIRST principle of unit testing without the F and the T.

What is F? F is Fast. Your unit tests will be fast. But whatever you do, automated interface tests and automated functional tests can hardly be as fast as unit tests.

T is Timely, meaning the test can be written in time. Unit tests are written by the developers themselves, sometimes with TDD, so they are timely enough. Automated functional and interface tests cannot necessarily be written in a timely way.

So I think automated functional testing should satisfy the SIR principle.

These are our own reflections and actions after a series of failures. We cannot yet claim any results, and the process is very difficult, but we will keep at it.

Having said all this, I wonder whether you now see "tool use" in DevOps differently. Borrowing the "over" form of the Agile Manifesto, I will leave you with one last message:

Tools are important, what is more important?

This question is left for everyone to think and summarize.

Note: this article represents the author's personal views only.

As one of the banks with the longest history in China, Bank of China has passed the "DevOps Standard—Continuous Delivery Capability Level 3 of the R&D and Operation Integration Capability Maturity Model" evaluation!

Origin blog.51cto.com/15127503/2657655