1_How Huawei develops hardware

Preface

This article is reproduced from the WeChat public account "Hardware One Hundred Thousand Whys".

Recently, many friends have asked me about hardware problems, and I found that they had not read the datasheet carefully or done a proper circuit analysis. Let me talk about how Huawei does hardware development, in the hope of giving some inspiration to friends who work in hardware development. If I get something wrong, criticism and corrections are welcome.

Once, in 2007, when I had been working for just two years, I interviewed at a small company. I felt I had done well on the written test, and the interviewer also thought highly of me. But he said: "I need to recruit someone who has worked at a large company and, preferably, knows the hardware development process and specifications. Although you answered the questions well, we need someone with rich experience, preferably someone who has worked at Huawei."

At the time I wondered, "What are Huawei's specifications and processes like?" and I always wanted to see for myself. I had never been very interested in interviewing at Huawei before, but after that I really wanted the chance to go there. I joined Huawei in 2008.

A few things about Huawei's hardware development strike me as different from other companies, and I would like to share them. I will write them down as they come to mind. Criticism and corrections are welcome.

1. Documentation, review, design

When I first joined the company, three people were developing one circuit board. Although the circuit was fairly complex, there was still some spare manpower, so I was assigned to write a piece of PCI-to-UART logic.

I was a new employee and eager to prove myself. Using my weekends as well, I finished writing the code in about a week and started simulation. I thought my mentor and supervisor would praise me, but he did not. He said, "Why didn't you call everyone together to discuss it first, then write up the design and have it reviewed, and only then start writing code?" I did not understand at the time. I felt this was something I could do on my own, so why go to so much trouble?

Looking back now:

First, from the supervisor's point of view, he does not know a new employee's abilities. Only if you can explain clearly what you are doing will he feel at ease.

Second, from the company's perspective, a set of processes is what guarantees project delivery. The project then no longer depends heavily on any one person's abilities, and no single resignation will affect delivery. This is also the most remarkable thing about Huawei: it breaks complex projects down into very small pieces, so that no exceptionally talented people are needed to deliver them. This is why a Huawei engineer's income is 1/N of a Cisco engineer's.

Third, from the perspective of effectiveness, one person's ideas are limited. Documenting your ideas is how you organize them; discussion is how you collect the things you had not thought of; formal review is how everyone reaches a consensus. It is much better to discuss things in advance and let the relevant people take part in your design than to have someone point out a fatal problem after the design is finished.

It is precisely because Huawei breaks work down into small pieces that communication, documentation, review, and discussion become so important.

The shortcomings of this working model are also obvious, including high communication costs and low work efficiency.

2. Huawei’s personnel composition in the hardware field

Huawei has many personnel roles. Hardware engineers are responsible for the product development stage end to end.

A single board hardware engineer covers the most fields. It is also the job with the most complicated work content, the most contacts, and the most wrangling.

Because there are dedicated people for PCB layout, EMC, power supply, and logic, all of which were originally the hardware engineer's own work, it might seem that the hardware engineer loses his skills and is reduced to merely "connecting things up".

In fact, that is not the case. Precisely because everyone works within a small field and no one is in charge of the whole, a good hardware manager is very important: it is the key role that runs through all fields and processes.

As someone on Huawei's internal forum once put it, hardware engineers are like the "cache" in a processor: a transfer station for all links.

The division of labor in large companies is this fine-grained in order to prevent any one group of people from mastering too many of the company's core technologies and leaving to set up on their own.

3. Huawei’s process

Many people know that Huawei's process, IPD (Integrated Product Development), comes from IBM. Huawei also consulted Ericsson at the time. Ericsson's hardware development has no process at all.

My personal understanding: the IPD process has been modified and optimized within Huawei, combining Chinese characteristics with Huawei's own corporate characteristics. If Huawei had rigidly applied IBM's process, it would certainly not have been so successful.

So let’s summarize Huawei’s hardware development process:

Requirements analysis → Overall design → Thematic analysis → Detailed design → Detailed logic design → Schematic → PCB → Review → Glue logic → Board release for fabrication → Trial production → Board bring-up debugging → Unit testing → Specialized tests → System joint debugging → Small-batch trial production → Hardware stabilization → Maintenance.

The essence of the process is that you move on to the next stage only once the current one is complete. The stages themselves are not much different from those at other companies, but the entry criteria for each next stage are strictly enforced. What troubles hardware engineers most is that no checkpoint in the process corresponds to releasing the board for fabrication.

The system Huawei uses to support the IPD process is PDM (nicknamed "slow to climb", a pun on its initials in Chinese).

PDM stands for Product Data Management. PDM is a technology used to manage all product-related information (including part information, configurations, documents, CAD files, structures, permission information, etc.) and all product-related processes (including process definition and management).

All Huawei device information, product parts, tools, documents, schematics, PCBs, logic codes, etc. are stored on this system.

However, the system is too complex and difficult to use. It is also easily confused with server archiving and SVN archiving.

4. Normalization

Device normalization

Hardware engineers generally understand that on a board they should choose lower-cost devices and as few device types as possible, to facilitate centralized procurement and manufacturing. But other companies may not be as meticulous and rigorous about device normalization.

First, because Huawei uses such a wide variety of devices across the company, eliminating a single device code can bring benefits ranging from RMB 100,000 to several million, which other companies may not be able to achieve. So to save one code, I would rather choose a device that may cost more. This still has to be checked by comparing the annual direct cost difference (the per-device cost difference × the number of devices shipped) against the coding cost plus the processing cost difference. In addition, once devices are normalized, the price can be renegotiated with the supplier, so the benefit compounds. Therefore, sometimes even when direct cost argues against it, the conclusion still comes down on the side of normalization. For example, resistors with 5% accuracy are gradually being removed and normalized to 1%.

Second, device normalization requires thematic analysis. Some engineers normalize without fully analyzing the circuit principles, and the normalization then introduces new problems. So my department had an Excel form at the time, the "Device Normalization Analysis" form. Its purposes were: first, to make every employee doing normalization think the analysis through; second, to record the issues for easy review; and third, to make problems easy to fix when they arise.
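The cost comparison described for device normalization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the parameter names, and all example figures are hypothetical and are not Huawei's actual model. It simply weighs the annual direct cost penalty of a pricier normalized device against the savings from retiring one device code.

```python
def normalization_net_benefit(
    unit_price_increase: float,      # extra cost per device for the normalized part (RMB), hypothetical
    annual_shipments: int,           # devices shipped per year
    code_maintenance_saving: float,  # assumed yearly saving from retiring one device code (RMB)
    processing_cost_saving: float,   # assumed yearly saving in procurement/manufacturing handling (RMB)
) -> float:
    """Return the yearly net benefit in RMB; a positive value favors normalization."""
    direct_cost_penalty = unit_price_increase * annual_shipments
    indirect_saving = code_maintenance_saving + processing_cost_saving
    return indirect_saving - direct_cost_penalty


# Hypothetical example: a 1% resistor costs RMB 0.002 more than the 5% part it replaces.
benefit = normalization_net_benefit(
    unit_price_increase=0.002,
    annual_shipments=10_000_000,
    code_maintenance_saving=100_000,
    processing_cost_saving=20_000,
)
print(f"Net yearly benefit: RMB {benefit:,.0f}")
```

The sketch deliberately leaves out the renegotiated supplier price; since that effect only adds to the benefit, a positive result here is a conservative signal.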

Single board normalization

In addition to device normalization, a higher level of normalization is board normalization. (Let me clarify the concept of a "single board". When I first came to Huawei, I also found the term strange. Communication equipment consists of a chassis, a backplane, and circuit boards for the various functional modules. The circuit board of each functional module is called a "single board", and hardware engineers are generally called "single board hardware".)

The main benefit of single board normalization is fewer types of circuits. Fewer types of circuits bring three benefits: first, lower production costs; second, lower hardware maintenance costs; and third, lower software development and maintenance costs.

First, the prerequisite for single board normalization is processor normalization. In fact, some of Huawei's products do not do this well: X86, MIPS, ARM, and PPC are all in use. As a result, one hardware platform needs software staff of every kind and N sets of operating systems, such as VxWorks and Linux, plus various BIOS packages.

Second, when normalizing single boards, pay attention to product derivation. If the functions implemented by a single board in the first-version chassis are needed in later products, the board should be directly usable without further development. If you do not pay attention to this, you will find that first-version boards cannot be reused in the second version, and the first-version circuit board has to be modified to fit the new version. Sometimes it is worse: the board is completely incompatible and has to be redeveloped. Single board planning is very important.

Third, if after normalization the circuit parts are compatible but the structural parts are not, then the market staff who configure orders still face two configurations. That is also a failure.

Platform normalization

If you then find that different hardware platforms have the same architecture and similar functions, the chassis can be normalized too. You only need to build different circuit function modules to meet different functional requirements.

However, different hardware forms exist for a reason. If they are forcibly unified, the market may not accept the result. For example, using an operator's platform to normalize an enterprise or home product may not succeed.

Network architecture normalization

I came up with this term myself. As early as 2008, Huawei was discussing the "cloud-pipe-device" strategy, but I did not quite understand it at the time. When our operator platform department merged with the "server" department, it began to make sense.

When X86 processors are powerful enough, all computation, whether or not that is the most cost-effective choice, will be sent to the cloud for processing, and intermediate storage and computation will become unimportant. The structure of the whole network then becomes terminal + pipe + cloud storage and cloud computing.

Since these are all computing and storage devices, there is no need for such a variety of them. At this point, network storage devices and servers become particularly important.

This is also an important reason why Huawei established an IT product line and made key strategic investments.

So now there is no need for so many network nodes and network platforms. Only massive processing and storage capacity, wide pipes, and all kinds of terminals are needed.

5. Core competitiveness of hardware engineers: thematic analysis

In the early days, the submarines we developed in Greater China were all sea blue, the same blue-grey as warships. Later, on a visit to the military exercises of the former Soviet Union, it was discovered that the Russian submarines were not blue but black. So the discussion began back home: why are Russian submarines black? The guess: black must be hard to see at night, so they are painted black. Hence a nationwide repainting campaign. Later we learned that the Russians' black was not paint but black rubber sound-absorbing tiles. So we put rubber on too, but once the rubber was on, the submarine could hardly move, because our submarines were not as powerful as theirs. (The above story is purely fictitious. If there is any resemblance to real events, please move the time of occurrence to the Qing Dynasty.)

Why tell this story of failed copycat imitation? I think many hardware engineers share a misunderstanding: they believe their core competitiveness lies in knowing a few software tools (Cadence, Protel) and being able to draw schematics and lay out PCBs. One of my early jobs was like that. My biggest skill was copying demo boards and previously proven circuits. When I met a new circuit design, I usually drew the circuit from the reference design and then debugged it: try it, hit problems, then solve them.

So my view now is that what is most valuable in a hardware engineer is an understanding of hardware principles, circuit analysis, analog and digital electronics, and electromagnetic field theory, rather than the ability to use drawing software.

So how does Huawei do circuit design? Why is there such a thing as thematic analysis? Why do we need thematic analysis when designing circuits?

First, as a matter of routine, every circuit has several mandatory topics: power supply, clock, and the minimal system, covering how each pin is used, how it is connected, and whether the levels of connected pins meet requirements. All of this must be documented and analyzed clearly. When a new device is selected, the workload for the hardware engineer is therefore considerable, whereas at another company you would just follow the recommended circuit and be done. The power supply topic analyzes the power requirements: the voltage range, current demand, dynamic response, and power-on sequence of each supply. The clock topic analyzes each clock's input level standard, frequency, jitter, and other parameters, as well as the clock timing, and optimizes the design across the possible clock schemes.

Second, when the circuit design runs into new problems, ones the team has not dealt with before, or content considered key or difficult, we carry out a dedicated thematic analysis of that problem. Examples from our own work include dual-BIOS boot, an infrared LED driver for a camera, and active/standby switchover. We analyze the problem point thoroughly first, and only then start drawing the schematic.

Third, when developing hardware, the demo is only a reference. Every design decision must be grounded in the datasheet. Besides reading the chip's datasheet, you must also carefully check its errata and check the differences between the datasheet and the demo. One more point: if the device has a checklist, you must go through the checklist. When I was developing with AMD devices, the datasheet, demo, and checklist sometimes contradicted one another. There was also once a problem that was hard to reproduce. Later I checked the errata and found that the manufacturer had revised the chip and fixed the bug, but we were still purchasing the old version of the chip.

Fourth, since the project itself has delivery time requirements, it is actually impossible to thoroughly address every problem point within a limited time. So here comes the question:

How is it done? First, every project has an "Issue Tracking Form". Because the hardware team's work is so complex, this form must be used well, otherwise chaos becomes the norm. (I once even applied this form to my home renovation.) The principle is very simple: record the problem, its content, the responsible person, the completion status, and the completion time. But as long as you keep using it, you will find that you do not lose track of problems, your work becomes more organized, and you gain a sense of accomplishment. With this form, when you discover a problem you record it first. Even if it is not solved now, you decide whether it needs to be solved and by when. Second, problems are prioritized. Every project moves forward carrying risk, so identify the high-risk problems, solve those first, and carry the low-risk ones forward. This is also one reason "0 ohm" resistors appear so often in Huawei's circuit designs: when a risk has been identified but the analysis is inconclusive or there is no time for it, we have to design in compatibility options. I have to say something sobering here: if you are sloppy in your design process and do not analyze problems clearly, they will certainly be exposed in the end.
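The issue tracking form described above is essentially a small table sorted by risk. A minimal Python sketch follows; the field names, the 1-to-3 risk scale, and the sample entries (including the owner names) are assumptions made for illustration, not the actual layout of Huawei's form.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Issue:
    content: str                 # what the problem is
    owner: str                   # responsible person
    risk: int                    # assumed scale: 1 = low, 3 = high
    due: Optional[date] = None   # planned completion date, if already decided
    done: bool = False           # completion status


def open_issues_by_priority(issues: List[Issue]) -> List[Issue]:
    """Return unresolved issues, highest risk first, then earliest due date."""
    pending = [i for i in issues if not i.done]
    return sorted(pending, key=lambda i: (-i.risk, i.due or date.max))


# Hypothetical entries: record first, then decide whether and when to solve.
tracker = [
    Issue("Power-on sequence of core supply not verified", "Zhang", risk=3, due=date(2010, 3, 1)),
    Issue("Silkscreen label overlaps test point", "Li", risk=1),
    Issue("Clock jitter margin unclear", "Wang", risk=3, due=date(2010, 2, 15)),
    Issue("Flash footprint compatibility", "Zhao", risk=2, done=True),
]

for issue in open_issues_by_priority(tracker):
    print(issue.risk, issue.owner, issue.content)
```

Issues without a due date sort last within their risk level, which mirrors the idea of recording a problem immediately and scheduling it later.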

Therefore, when working as a hardware engineer in the "Chrysanthemum Factory", "thematic analysis" is the core task of designing hardware, not drawing schematic diagrams.

With this method, it takes 12 months to do the circuit analysis and 12 weeks to draw the schematic, instead of cycling through drawing, debugging, revising, and debugging again from one board revision to the next.

It is impossible to achieve both speed and economy at the same time, so hardware engineers have the responsibility to make good compromises and trade-offs.

6. Device selection

About "Device Selection Specifications"

When I joined Huawei, the whole company was in the middle of a "standardization" campaign. Specifications were written for everything, and everyone wrote specifications. Appointments, performance, and technical grades were all tied to specifications. (Large companies steer by KPI, which easily turns things into a "campaign".)

So at that time, many people wrote device selection specifications for each device type. During schematic reviews, the thing I heard most often was "That is what the specification says." This brought several problems:

1. The people who write the specifications are not necessarily highly skilled, and may not write in detail. If a specification contains mistakes, it does even more harm.

2. Specifications sometimes inhibit developers’ thinking. Everything must be based on specifications, which may not be suitable for actual design scenarios. For example, if I need low-cost design, but the specifications emphasize high quality, it may not be applicable.

3. Once specifications exist, some developers stop thinking. For example, the specification requires pF-level capacitors for power supply filtering on crystal oscillators above 50 MHz, but not on those below 50 MHz. Nobody asks why, so naturally nobody knows why. Another example is network port transformer protection: for indoor and outdoor use, under the design requirements of the various EMC standards, people just follow the reference drawing, but few think about why, and they do not know the test results either. Then, when you actually run into a hard problem, you are at a loss. Specifications do sometimes improve work efficiency and product quality, but as tools become more advanced, people's abilities degrade, which is inevitable.

4. Some devices are not suited to selection specifications because they evolve too fast. By the time you finish writing the specification, the devices may already be obsolete. For example, after X86 processors entered the communications field, processor selection specifications became redundant.

Specifications do bring benefits. However, not every kind of work is suited to specifications. Hardware engineers must be able to think beyond "reference circuits" and "standards" and reason about problems and designs from first principles.

Of course, specifications are still a very useful tool: they distill a large amount of theoretical analysis, accumulated experience, and practical data. The specification I read most at that time was the "Derating Specifications for Device Selection", which was based on a large number of tests and real cases and summarized what needs to be considered when selecting devices.

For example: for aluminum electrolytic capacitors, the steady-state operating voltage must be lower than 90% of the rated withstand voltage; for tantalum capacitors, the steady-state derating requirement is 50%; for ceramic capacitors, it is 85%. These figures take into account factors such as the actual modes of some devices, the harshest environment (high temperature, low temperature, maximum power consumption), and the difference between steady-state power and transient power.
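The derating figures quoted above lend themselves to a simple automated check. The following Python sketch uses only the three steady-state percentages given in the text; the dictionary keys, function names, and example voltages are hypothetical, and a real derating specification would cover many more parameters (temperature, ripple current, transients).

```python
# Steady-state voltage derating limits quoted in the text (fraction of rated voltage).
DERATING_LIMIT = {
    "aluminum_electrolytic": 0.90,
    "tantalum": 0.50,
    "ceramic": 0.85,
}


def max_working_voltage(cap_type: str, rated_voltage: float) -> float:
    """Highest steady-state voltage allowed for a capacitor of the given type."""
    return DERATING_LIMIT[cap_type] * rated_voltage


def passes_derating(cap_type: str, rated_voltage: float, working_voltage: float) -> bool:
    """True if the steady-state working voltage is below the derated limit."""
    return working_voltage < max_working_voltage(cap_type, rated_voltage)


# Hypothetical example: a 12 V supply decoupled with tantalum capacitors.
for rated in (16.0, 25.0):
    ok = passes_derating("tantalum", rated, 12.0)
    print(f"{rated:.0f} V tantalum on 12 V: {'OK' if ok else 'violates derating'}")
```

Under the 50% rule, a 16 V tantalum part is not acceptable on a 12 V supply, while a 25 V part is.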

Factors to consider when selecting devices

In Huawei's PDM system, each device has a preference level such as "preferred", "non-preferred", "forbidden", or "terminal only".

Engineers can tell at a glance from this preference level whether a device is a preferred choice.

So what factors are considered for the preferred grade of devices?

1. Availability

This matters especially for manufacturers like Huawei that ship in large volumes. Be cautious about selecting devices in the decline phase of their life cycle, and never use discontinued devices. I designed a circuit in 2005 by copying someone else's circuit. During manufacturing, I found that the components simply could not be bought: they had been discontinued, and I could only buy refurbished parts at the electronics market.

For key components, there must be at least two brands with interchangeable models, and in some cases substitution at the solution level is considered as well. This is very important. A sole-source device requires level-by-level reporting, decision-making, and risk assessment.

2. Reliability

Heat dissipation: for power devices, give priority to packages with a small junction-to-ambient thermal resistance (RjA) and a higher junction temperature rating (Tj). When selecting processors, choose lower-power devices as long as performance is sufficient. But if it is a device from a monopoly supplier like Intel, you can only put up with it and add a heatsink and a fan.

ESD: the selected components must withstand at least 250V of static discharge. For special devices such as radio frequency devices, the ESD withstand capability must be at least 100V, and anti-static measures are required in the design. (Note: Huawei has strict requirements and prohibits holding boards with bare hands. I did not understand this at first. Later, after I led a team, I found that the team spent a lot of time repairing single boards, so our team became very strict about this. It looks like it reduces efficiency, but it actually improves it. At the very least, there is no need to keep suspecting that a device has been damaged by static electricity.)

Keep the moisture sensitivity level in mind when selecting components.

Safety: The materials used are required to meet antistatic, flame retardant, anti-rust, anti-oxidation and safety regulations.

Failure rate: Avoid devices with a high failure rate, such as labeled DIP switches. Try not to choose bare die devices, which are prone to cracking. Do not choose glass-encapsulated devices. Do not choose ceramic capacitors with large packages.

Failure mode: consider whether a device fails as an open circuit or a short circuit, and evaluate the consequences of each. This is also an important reason for caution when selecting tantalum capacitors.
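The heat dissipation criterion in this list (prefer a small RjA and a high Tj rating) rests on the standard steady-state estimate Tj = Ta + P × RjA. The Python sketch below applies it to compare two packages; the package names, thermal resistances, power, and ambient temperature are made-up illustrative values, not data for any real device.

```python
def junction_temperature(ambient_c: float, power_w: float, r_ja_c_per_w: float) -> float:
    """Steady-state junction temperature estimate: Tj = Ta + P * RjA."""
    return ambient_c + power_w * r_ja_c_per_w


def thermal_margin(tj_max_c: float, ambient_c: float, power_w: float, r_ja_c_per_w: float) -> float:
    """Margin in degrees C between the rated maximum Tj and the estimate; negative means overheating."""
    return tj_max_c - junction_temperature(ambient_c, power_w, r_ja_c_per_w)


# Hypothetical comparison: same die, 1.5 W dissipation, 55 C ambient, Tj(max) = 125 C.
packages = {"package_A": 40.0, "package_B": 62.0}  # RjA in C/W, illustrative values
for name, r_ja in packages.items():
    margin = thermal_margin(125.0, 55.0, 1.5, r_ja)
    print(f"{name}: Tj = {junction_temperature(55.0, 1.5, r_ja):.0f} C, margin = {margin:.0f} C")
```

This single-resistance model ignores airflow, heatsinks, and board copper, so it is a first screening step rather than a full thermal analysis.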

3. Producibility

Do not use devices with package sizes smaller than 0402. Prefer surface-mount devices, so that soldering is completed in a single reflow pass without wave soldering. If some through-hole devices are unavoidable, consider whether the through-hole reflow process can be used to complete the soldering. This reduces soldering steps and cost.

4. Environmental protection

Since a large number of Huawei's products are shipped to Europe, environmental protection requirements are relatively strict. Because of the EU's lead-free requirements, at one point almost every hardware engineer in the company was doing lead-free rework.

5. Consider normalization

For example, once a product has selected a device and ships in large quantities, that device may be chosen again even when it is not the ideal fit. This is because the increased volume allows the cost to be renegotiated, and because it can be chosen with confidence, having already been verified in large quantities. This is also why we tend to choose devices in the mature stage and are careful with those in the introduction and decline stages.

6. Industry management

For each major category, such as power supply, clock, processor, memory, and Flash, dedicated people plan and coordinate usage across the whole company, carry out market research and analysis, and write specifications in advance. They take part in the selection of new devices.

7. Device Department

Colleagues in the device department analyze the causes of device failures, perform reliability analysis, take X-rays of devices, evaluate device life, and so on.

8. Cost

If none of the above factors is fatal, then they all count for little: pay close attention to item 8.

There are many documents on Baidu Library about "Electronic Component Selection Specifications". Interested students can take a look.

7. Whiteboard explanation

Team development culture is the management method of which the head of Huawei's central hardware department is proudest. It remains a very effective management method for projects developed collaboratively by many people.
Personally, I feel that "whiteboard explanation" is the most essential part of team development culture.

Explaining a circuit's principles clearly is something rarely done at other companies or in other development teams. But there is a rule: if you cannot explain it clearly, then you have not figured it out yourself, or have not understood it well, or there is something in it you have overlooked. And in the end, the problem will turn out to be exactly there. I came to see this as a bit like Murphy's Law. Explaining clearly will also help you grow: if you master a point and explain it to everyone, you will be the one who understands it best.

The first benefit of whiteboard explanation: a deep understanding of the details. When several people discuss, the principles get examined more thoroughly. This ensures that the design is correct, and it also raises the whole team to the highest level within it.

In 2010, because the project was stalled at the time, I explained every detail of the switching power supply part, about 10 times in total. I then explained every detail of the Buck circuit, and I felt I had gained a somewhat more thorough understanding of switching power supply principles. I organized the content of the 10 explanations into a textbook, "How to Make a Single-Board Power Supply". Classic cases from Lao Wei, a colleague with rich power supply debugging experience, were added to form a relatively complete power supply textbook, which circulated widely within the company.

The second benefit of whiteboard explanation: many explanations make up one training session, and many training sessions make up a set of teaching materials. The more the whole team explains, the deeper its technical accumulation becomes.

There was a time when I was working on the logic for the PCI protocol, and another colleague was studying it at the same time. Since I had already started debugging and had run various simulations, I had a relatively clear understanding of the whole protocol. The other colleague's main method was to read the code and the original text of the protocol, so he did not understand the reasons behind the code (writing the logic involves some technical subtleties, for example how the base address register is used to determine the size of the address space).

Of course, when he started his explanation, I stayed silent: we were both new employees at the time, our supervisors were all watching, and it would not have been right to steal the limelight during someone else's explanation. Later, because there were too many mistakes in what he said, I could not hold back any longer and pointed them out. Of course he was not convinced and insisted he was right.

But afterwards he told everyone that his original understanding was wrong.
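The base address register detail mentioned in this anecdote can be illustrated with the standard PCI sizing convention: software writes all ones to the BAR, the device keeps the low address bits hardwired to zero, and the size is recovered from the value read back. The Python sketch below models both sides for a 32-bit BAR. The function names are hypothetical, this is not the author's logic code, and it ignores 64-bit BARs and I/O BARs that implement only the lower 16 bits.

```python
def bar_write(value: int, size: int) -> int:
    """Device side (memory BAR): only address bits at or above log2(size) are writable.

    `size` must be a power of two and at least 16 bytes.
    """
    return value & ~(size - 1) & 0xFFFFFFF0


def bar_size_from_readback(readback: int) -> int:
    """Host side: decode the BAR size from the value read back after writing 0xFFFFFFFF."""
    if readback & 0x1:                  # bit 0 set: I/O space BAR
        mask = readback & 0xFFFFFFFC    # bits [1:0] carry type information
    else:                               # bit 0 clear: memory space BAR
        mask = readback & 0xFFFFFFF0    # bits [3:0] carry type/prefetch information
    return (~mask + 1) & 0xFFFFFFFF     # invert and add one; 0 means not implemented


# A device requesting 64 KiB of memory space returns 0xFFFF0000 after the all-ones write.
readback = bar_write(0xFFFFFFFF, 0x10000)
print(hex(readback), bar_size_from_readback(readback))
```

Someone who has only read the protocol text may see the BAR as a plain address register; working through this round trip shows why the hardwired zero bits are what encode the size.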

After this incident, my project manager (PM) told me: the most powerful thing about whiteboard explanations is not just that everyone gets to the bottom of the problem. More than that, a "whiteboard explanation" is a competition: it lets everyone in the team compare their technical skills and pushes everyone to keep improving. It also makes clear, in front of the supervisor, whose level is high and whose is low.

The third benefit of whiteboard explanation: it is the most effective technical competition within the team. As the saying goes, mule or horse, take it out for a walk and you will see. It stops people from looking down on one another like rival scholars and refusing to accept each other's results at evaluation time. Who is capable and who is not becomes clear once they speak.

The success or failure of a team, or even a company or a country, is determined by its performance evaluation system and talent selection system. Whiteboard explanations provide the most solid data to support a team's technical ranking.

Most R&D teams have a dull atmosphere, feel tired and out of form, slip their development schedules, and work inefficiently. This seems to be the current state of R&D at most companies.

Why is this so? Because people are social beings, yet they do not communicate with each other. A team that spends all day just writing code is certainly a team with big problems. If people who sit together, face to face or back to back, still have to communicate through QQ or eSpace, and the team does not say a word all day, then naturally everyone grows indifferent.

Although the whiteboard explanation is a technical competition, if everyone is open-minded, such a competition is actually an important means to promote mutual affection.

The fourth benefit of whiteboard explanation: it is an important way to improve the organizational atmosphere and strengthen technical recognition among team members. Only a team that is willing to express its own views is a team with fighting spirit.

Now that I have started my own business, I find that Huawei's approach of explanations, training, regular meetings, and follow-up really is the most effective.

After all, Huawei has built up a very mature set of R&D management methods, grounded in the characteristics of Chinese people and in long-term practice across many people, teams, and projects. Naturally, Huawei's methods suit large companies and have their own problems. But until a better method comes along, these can be regarded as good methods, especially the whiteboard explanation. If you visit companies in Silicon Valley, large or small, you will find a whiteboard next to the engineers' desks. Whenever a problem is discussed, it is "Let's draw it."

The fifth benefit of whiteboard explanation: its defining feature is "using a whiteboard". The advantages of a whiteboard are that it avoids the errors of one-off verbal communication; it records the points made one by one, which helps in organizing ideas; and its large area displays the content under discussion so that more people can take part.

In addition, I have a few suggestions for whiteboard explanations:

  1. If your team does not practice whiteboard explanations, you can still get the same effect by diligently seeking out others to discuss problems with.

  2. If you lead a team where the habit of whiteboard explanation has not yet formed, you can rigidify first and then optimize: first require everyone to do it until the habit forms and the benefits are felt, and then let people explain spontaneously.

  3. For friends at Huawei (or at other large companies): if you are a technical geek, explain more in front of your supervisor; if you already have the opportunity to report to leaders, get more practice with PPT, because the essence of PPT is still a whiteboard. Of course, the content should be something the supervisor is interested in and approves of, and it should be "customer-centric" - you get the idea.

  4. At the beginning, you have to overcome your own psychology. You may not fully understand the content yet, so you must dare to ask and to speak. Don't let embarrassment about your technical level hinder your technical progress. On the one hand, read more, learn new material often, and work hard; on the other hand, discuss often. Only through discussion can you discover what you lack, misunderstand, or understand incompletely. The more you communicate with different people, the more you will come to understand.

  5. In addition, make more use of the Internet and ask questions in QQ groups and forums. Some people may laugh and call it a low-level question, but if you keep asking you will naturally make progress, because everyone starts from low-level questions.

  6. At Huawei, supervisors require every project team member to explain. Other companies may not have such an environment and atmosphere, so you have to rely on your own diligence in seeking out discussions.

I plan that when my child starts school, I will have him explain to me everything he learns there, so that I can be sure he really understands it.

The whiteboard explanation seems simple, but the philosophy inside is actually quite profound. It depends on how well you understand it.

8. Problem solving

Because nothing in the world is perfect, even products developed to a high standard cannot be as perfect as the Mona Lisa. So problems, big or small, can always occur.

Part One: Three consequences of online problems (problems on the customer's live network):

  1. Online accidents
  2. Online problems
  3. Board returns

1. Online accidents

The most serious are of course "online accidents". Online accidents generally lead to "safety incidents", "customer losses", "customer complaints", and so on.

The most serious of these are naturally "safety incidents", which endanger the personal safety of customers.

For example, there was once a device that shipped in large quantities. When the backplane was modified, a power line was rerouted. After the change, the power line pressed against a metal part of the chassis with only the green oil in between. Since the green oil itself provides some insulation, the problem was not exposed during R&D or production testing.

However, vibration and other stresses during transportation wore the green oil through. After customers powered on, some equipment short-circuited and boards burned out.

Liquid photoimageable solder resist (commonly known as green oil) is a protective layer coated on the traces and substrate areas of a printed circuit board that do not require soldering. Its purpose is to protect the finished trace pattern over the long term.

This is a very serious situation. If a fire breaks out in an operator's equipment room, it is a very serious accident.

However, when this problem occurred, various chassis and boards had been shipped to hundreds of countries on five continents. To solve this problem, we paid a very heavy price.

Another kind of online accident is the interruption of the operator's service. At a call charge of 0.6 yuan per minute, and with an operator's users in one province numbering tens of millions or even over a hundred million, how large is the loss if the customer's service is interrupted for one minute?
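The question above can be turned into a rough back-of-the-envelope estimate. The sketch below is only an illustration: the 0.6 yuan per minute tariff and the order of magnitude of the user count come from the text, while the share of users who are on a call at any given moment (`calls_per_1000_users`) is an assumed figure, and the function name is hypothetical.

```python
def outage_revenue_loss(users, calls_per_1000_users, yuan_per_minute, outage_minutes):
    """Rough call revenue lost during an outage (ignores penalties and churn)."""
    # Number of users assumed to be on a call at any given moment.
    concurrent_calls = users * calls_per_1000_users // 1000
    return concurrent_calls * yuan_per_minute * outage_minutes


# 10 million users, an assumed 50 of every 1000 on a call, 0.6 yuan/min, 1 minute down.
loss = outage_revenue_loss(10_000_000, 50, 0.6, 1)
print(f"Estimated loss: about {loss:,.0f} yuan")
```

Under these assumptions, a single minute of interruption costs on the order of 300,000 yuan in call charges alone, before counting complaints or contractual penalties.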

For this reason, most operator equipment has a backup mechanism. For example, the internal switching module of core-side equipment must have 1+1 redundancy; DSP resources and some signaling processing units generally use N+1 backup. In this way, a single point of failure affects neither user services nor the capacity specification of the equipment.

The third situation is customer complaints. Even when there are no serious consequences, a customer complaint makes the problem more serious. For example, new chassis and new boards are shipped to the operator, and it turns out that the boards cannot be inserted. Naturally, customers are very annoyed, and the company's brand image suffers badly, so the matter becomes a big one. Or, a long time ago, when Boss Ren was on site, a leader from Sichuan Mobile said, "Your equipment is not as good-looking as Datang's." As a result, the people in the structural department were out of luck.
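To make the 1+1 and N+1 schemes concrete, here is a minimal sketch that counts how many units' worth of traffic survive a given number of failures. The function name and the simplified model (identical units, any spare can replace any failed unit) are assumptions made for illustration, not a description of a specific product.

```python
def surviving_capacity(active_units, spare_units, failed_units):
    """Number of units' worth of traffic still carried after some units fail."""
    healthy = max(active_units + spare_units - failed_units, 0)
    # Spares only replace failed units; they never add capacity beyond the
    # number of units the system was dimensioned for.
    return min(active_units, healthy)


print(surviving_capacity(1, 1, 1))  # 1+1 switching module, one failure
print(surviving_capacity(4, 1, 1))  # N+1 with N=4, one failure
print(surviving_capacity(4, 1, 2))  # N+1 with N=4, two failures
```

In this model the first two cases keep full capacity (1 and 4 units), which is the "single point of failure" guarantee described above; only the double failure in the third case drops capacity to 3 units.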

2. Online problems

If a problem occurs online (on the customer's live network), you must use the available means, such as the "maintainability" and "testability" features designed into the software and hardware from the start, to try to locate it.

Of course, these measures cannot affect the normal business of customers.

In addition, there are registers and logs that record device exceptions. You can also read a device's "last words": key information that is saved to a storage area before the processor resets, so that the problem can be found and solved afterwards.

3. Board returns
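The "last words" idea can be sketched in a few lines. This is only an illustration under stated assumptions: on a real board the record would be written by firmware into non-volatile memory, whereas here an ordinary file stands in for that storage area, and the function names and record fields are hypothetical.

```python
import json
import os
import tempfile
import time


def save_last_words(path, reason, context):
    """Persist key state just before a reset so it survives the restart."""
    record = {"timestamp": time.time(), "reason": reason, "context": context}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches storage before reset


def read_last_words(path):
    """Return the record left by the previous run, or None if there is none."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Demo with a temporary file standing in for the reserved storage area.
demo_path = os.path.join(tempfile.mkdtemp(), "last_words.json")
save_last_words(demo_path, "watchdog timeout", {"task": "fwd", "error_count": 3})
print(read_last_words(demo_path)["reason"])
```

After the restart, maintenance software would call something like `read_last_words` to see why the previous run ended, without disturbing the customer's service.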

Front-line delivery personnel often complain: "You R&D people only have three tricks: reset, power off, and replace the board."

In fact, according to online problem analysis, if these three tricks have been used, it means that the problem is already serious, and it is basically a hardware problem.

However, "board return rate" is a very important KPI, which determines everyone's performance evaluation. Maintenance personnel therefore hope that boards will not be returned or counted in the metric. If the hardware really no longer works properly, the board will definitely be returned to the laboratory for failure analysis to find the cause.

Whatever the level of the issues above, and even for problems discovered only in the laboratory, we take them very seriously, because any problem may have unforeseen effects. We therefore dig down to the root of each problem and analyze it thoroughly.

In addition, problems discovered and exposed during experiments (EMC, environmental) or during testing are taken as seriously as online problems, and some of them are tackled as dedicated efforts. Why?

Because there is a theory that the sooner the problem is solved, the smaller the price paid.

Three tenets of problem solving:

  1. All "laboratory" problems, if not solved, will definitely appear online.
  2. Any problems that have occurred can be reproduced.
  3. If a problem cannot be reproduced, it is only because the pattern of its recurrence has not yet been found.

Case 1. There was a NetLogic processor at the time (NetLogic's network processors came from RMI; RMI had acquired the processor startup SandCraft, and RMI was in turn purchased by NetLogic; later NetLogic was acquired by Broadcom). The device failed, but no similar failure had yet been seen online.

However, no pattern could be found in how the device failed, so the two sides entered a stage of wrangling. X-ray inspection then showed that the failed devices had cracked pads. But what makes a pad crack? Stress and high and low temperatures were suspected at the time. Various measures were tried, but there was still no answer.

Later, during discussion and testing, a colleague discovered that a simple low or high temperature was not enough to make the device fail. However, when the device went through high and low temperatures many times, the probability of failure rose significantly. Through many experiments, this colleague then repeatedly used a hot air gun and liquid nitrogen to accelerate the aging of the device, and the solder pads cracked very easily.

When I went to NetLogic with this conclusion, the other party could only concede, admit the problem, and agree to modify the device process.

Two later facts bear this out:

First, the boards that later failed in the laboratory were almost all built before the manufacturer improved the process.

Second, on another product with a large shipment volume, a large number of boards with this problem appeared online two years later.

Case 2: If a problem is discovered during the test, the problem must be analyzed clearly or solved. Maybe this problem is difficult to solve and takes a long time to solve. But this problem must be recorded and finally solved according to priority.

For example, a colleague once discovered during an experiment that a transistor (triode) had leakage current.

In theory, since the transistor was used as a switch, it should not produce a current large enough to change the voltage; replacing the transistor with a MOSFET did not help either.

This leakage current only appeared at low temperatures, so liquid nitrogen was used to keep the transistor at an extremely low temperature (below -10 degrees). During the tests, the temperature was mostly within the range where problems occurred (-40 degrees to 0 degrees).

However, after two weeks of testing, no pattern has been found. The problem occasionally reoccurs, and there is no pattern at all.

My colleague and I found it very confusing. We observed the weather at that time and felt that the leakage current of this triode was related to the weather. If it is cloudy, it will easily reappear, but if it is sunny, it will not reappear at all.

From this pattern, we began to suspect that "humidity" was at work.

Later, by increasing the humidity of the device, the problem was indeed very easy to reproduce.

We took our conclusion to the manufacturer, who confirmed that SOT packaged devices do leak current under the combination of high humidity and low temperature. This leakage current does not flow through the PN junction, so it does not follow the leakage behavior of a PN junction at all.

It is current leaking through the plastic package of the SOT32.

Later, this problem was circumvented by adjusting the circuit parameters.

Therefore, throughout the entire analysis and testing process, we will never let go of problems even under extreme environmental conditions.

In fact, this is how product problems get solved. Only when every problem is solved in a down-to-earth manner can product quality truly improve.

Forms this takes:

  1. Research team: To show that the problem is taken seriously, a dedicated problem research team is usually set up. This means bringing together the relevant people and experienced people to join the discussion, which broadens the ideas and pools experience. It avoids both tunnel vision and flailing about without direction.

  2. Regular meetings: When tackling major issues, there must be daily meetings to summarize and track what was discussed in the previous stage, record the conclusion for each measure tried, and clarify the next steps.

  3. Daily report: This kind of problem solving must have the attention of leaders, so progress is announced every day. Leaders do read it. Occasionally they find that there has been no progress for a long time, and then they allocate resources and bring in more people.

  4. Summary: After the problem is solved, the whole ordeal (the proverbial nine times nine, eighty-one tribulations) must be written up as cases and training material and shared with everyone. Colleagues who did not personally go through the investigation can then learn the relevant professional knowledge and problem-solving ideas through the sharing, and improve.

Solving a problem is painful, and breaking through the problem is also very rewarding. It is painful and happy at the same time.

Two final remarks:

  1. The further you are from your comfort zone and the more uncomfortable you feel, the greater the opportunity for you to grow.

  2. The more difficult it is, the more you have to grit your teeth and endure; as long as you persist, you are always only one step away from success.

9. Meeting

1. Characteristics of Huawei conferences

Some elaborations on the characteristics of "Huawei's Meeting".

  1. First of all, large companies have many people. Because the company is large, there are many departments and responsibilities are finely divided, so any one matter requires the participation of many people, and it is easy to get bogged down in wrangling. When I first arrived at Huawei I was very uncomfortable: documents for everything, reviews for everything, meetings for everything. I was not used to so many meetings and got bored in them. All my high scores in Snake date from that period.

  2. Even so, everything has a person in charge. Huawei gives the person in charge enough authority to push things forward and coordinate resources. For example, marketing is strong enough to push R&D to meet customer needs. Product managers and account managers carry a lot of weight: they can talk directly with the R&D director and push R&D to do this or that.

  3. All issues are eventually recorded, tracked, and driven to closure. This is why operators are still willing to use Huawei equipment even when the quality and performance of some of it do not fully satisfy them. Once, a problem came up. Before anyone could determine which company was at fault, Huawei's people had already rushed in. Two people from China Unicom attended the meeting, and six from Huawei. Through tests and evidence, it was shown that the problem lay with the Juniper equipment. Huawei then delivered a full report telling the customer that this was not our problem but the problem of XXX manufacturer.

  4. The bigger the forest, the more kinds of birds there are. So buck-passing, stalling, and leaning on others will naturally happen. This requires a strong and clear performance evaluation system that guides employees to take on tasks proactively instead of drawing boundaries. This kind of "clearing of responsibilities" is inevitable; left unchecked, it ends like the proverb of the three monks who have no water to drink. (Note: Huawei's approach of fully discussing everything works in the telecom operator field, but often does not work in the consumer field or even the enterprise IT field, because there is not enough profit margin to support it. So here we only talk about some of Huawei's strengths.)

  5. Meetings often go wrong in one of two ways: too divergent or too conservative. In meetings during the product definition phase, people are often reminded not to converge while they should still be diverging; in problem-solving meetings, people are often reminded not to diverge and to focus on the problem. The person who gives these reminders is often very important. Of course, sometimes it is just a formality. Friends, you can click on the original link to see the case "Huawei helps Sun Yang adjust his swimming posture." During that meeting, people kept reminding everyone to focus, but the discussion still wandered.

2. "Robert's Rules of Order"

What is Robert's Rules of Order

More than a hundred years ago, there was a fine young man named Henry Martyn Robert. He was twenty-five years old and, as the Chinese would say, a bit of a hothead (leng tou qing). A graduate of West Point, he was asked to preside over a local church meeting during the Civil War. The result: he made a mess of it. People argued endlessly without reaching any conclusion; holding the meeting was worse than not holding it. The young man was stubborn. He said he would study the matter and come up with a set of rules, or else never chair a meeting again. He studied thousands of years of meetings and debates and came to a conclusion: humans are probably a particularly argumentative animal, and the hardest animal to persuade by reason. Once a disagreement arises, it is difficult to convince the other side through words in a short time, and arguing for days and nights yields no result. The more people argue, the more they feel that they are right and that the other side is an idiot. So there must be a mechanism that lets both sides find common ground and reach a conclusion. He treated this research like a war, with people's argumentative nature as the enemy. In the end the young man won.

The fruit of that victory was Robert's Rules of Order, published in 1876. He published it at his own expense and bought a thousand copies to give away. In 1915, by which time Robert had become a general, he revised the rules. At first people paid no attention: how could a youngster with no hair on his lip be relied on? But to their surprise, it worked. As soon as the rules were adopted, the quarrels stopped and the meetings could go on. Ink bottles and benches no longer flew around. As a result, Robert's Rules of Order became the most popular rules of order in the world.

Common meeting problems:

  1. Off topic: someone starts with a story, and before long the talk has drifted to lunch.

  2. One-man show ("Yiyantang"): typically a leader who loves to talk and, once started, has something to say about everything. There are also people, in rural areas for example, who are particularly talkative, and others who never speak. Once a talkative person has the floor, he keeps it and gives no one else a chance to talk.

  3. Uncivil argument: as soon as an issue is raised, people say that you overpaid by five yuan last time, that you are no good, and question each other's moral character. Out of a hundred sentences they seize on one word and will not let go, and it may even come to blows. The meeting cannot go on.

  4. Interruption: cutting in on other people's legitimate speeches.

One of Robert's rules of procedure is: the moderator solves the above problems.

But in ordinary companies, when the leader is present, the moderator will not remind the leader: "You are off topic," "You are monopolizing the floor," "You should not interrupt other people's speeches." This is a sound method abroad, but some theories and methods are often unsuited to China's soil and cannot be applied mechanically.

In fact, at Huawei, in most meetings, when someone goes off topic, monopolizes the floor, interrupts, or argues uncivilly, the moderator will remind them and get the meeting back on track. In some meetings this is not possible, for example when the leader is overbearing, when the leader himself is the moderator, when the moderator is a flatterer, or when politically sensitive issues must not be allowed to disturb the harmony. No details will be given here.

So how does Huawei solve these problems?

  1. The company has set the overall tone of "customer-centric", so no matter how senior a leader is, he is never bigger than the customer. Customers' needs must always be met. Everyone is therefore working to win customers, and there are no major differences on matters of principle.

  2. Performance orientation: everything is evaluated by results. If a leader proposes a plan that may carry a major hidden danger, the people below have a responsibility to point it out and oppose it. Otherwise, once serious consequences follow, the leader cannot escape and will still come down on the people below him. They are all grasshoppers tied to the same rope. When a colleague puts forward an opinion that differs from the leader's and proves valuable, this is recognized in his performance results. The aim is to teach employees to raise objections and to correct leadership mistakes.

  3. Educating supervisors: Huawei advocates a wolf culture. Supervisors who get promoted are generally full of wolf nature, good talkers, and full of energy. They talk on and on in meetings and when communicating with employees, so it is easy for them to monopolize the floor or go off topic. During supervisor training, people who lead teams are therefore taught that they must be able to listen, and that they must control the rhythm and keep a sense of proportion when communicating.

3. Reduce ineffective meetings

I supported the network construction of CCB (China Construction Bank) for a period of time. When I first went there, I had a meeting with their IT planning department.

That meeting was a typical one-man show. One of their leaders came in and started cursing: "How come your Huawei equipment is no good, your Cisco equipment is shit, and your Siemens service is so bad..." The people from China Construction Bank and from the equipment manufacturers were all dumbfounded by the scolding and just listened to his complaints. After scolding the equipment manufacturers, he started in on his own employees. In the end no one knew what he wanted to achieve, and he could not say what equipment, performance, or services he wanted. Then he left angrily.

Going off topic, monopolizing the floor, and uncivil argument are not fatal. The most fatal thing is an "ineffective meeting." After the leader left, everyone continued the discussion according to their own ideas and methods, and then spent 2 minutes discussing how to deal with this leader.

So meetings are needed, but there are techniques for holding them effectively.

So how to do it?

  1. Regular meetings have agenda items. For weekly meetings, for example, the topics should be arranged in advance rather than raised on the spot. Fix the topics and the time for each topic to ensure the meeting does not go off topic.

  2. Meeting minutes must be kept, and the host and the minute-taker of each meeting must be clearly identified. Minutes are very important and require real skill: you need to take an effective part in the discussion and record the key points, without writing a blow-by-blow log.

  3. Meeting minutes should be divided into conclusions (which cannot be changed at will) and remaining issues (which must follow the SMART principle), each with a responsible person, a required completion time, and so on. There is a template for the minutes to remind everyone that they must follow the SMART principle.

  4. Follow up diligently and close the loop. All remaining issues are reviewed at the next meeting to see whether they have been completed or delayed, until each one is accounted for. Of course, if the original task assignment itself turns out to be problematic, the issue can also be closed or suspended after evaluation.

  5. All decisions must be well-founded and cannot be made by slapping one's forehead. If you slap your forehead to decide beforehand, you will slap your thigh in regret afterwards, and then someone dusts off the seat of his trousers and walks away. Nor should decisions simply follow "subordinates obey superiors" or "the minority obeys the majority".
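The minutes-and-follow-up routine in the list above maps naturally onto a small data structure. The sketch below is one possible shape, not Huawei's actual template: the class names, field names, and status values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ActionItem:
    description: str          # specific and measurable, per SMART
    owner: str                # responsible person
    due: date                 # required completion time
    status: str = "open"      # "open", "closed", or "suspended"


@dataclass
class Minutes:
    meeting_date: date
    host: str
    recorder: str
    conclusions: list = field(default_factory=list)
    action_items: list = field(default_factory=list)


def items_to_review(previous, today):
    """Return (still_open, overdue) items from the previous meeting's minutes."""
    still_open = [a for a in previous.action_items if a.status == "open"]
    overdue = [a for a in still_open if a.due < today]
    return still_open, overdue


last_week = Minutes(date(2015, 3, 2), host="Host A", recorder="Recorder B")
last_week.conclusions.append("Keep the current power scheme")
last_week.action_items.append(
    ActionItem("Re-run low-temperature test on board rev B", "Engineer C", date(2015, 3, 6)))
last_week.action_items.append(
    ActionItem("Review backplane layout change", "Engineer D", date(2015, 3, 13), status="closed"))

open_items, late_items = items_to_review(last_week, today=date(2015, 3, 9))
print(len(open_items), len(late_items))
```

Opening each meeting by calling something like `items_to_review` on the previous minutes is what closes the loop: every remaining issue is either completed, explained, or explicitly closed or suspended.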

Of course, this raises efficiency issues, because some issues cannot be studied thoroughly in a short time, and so no decision gets made. This is where the CCB comes in (here CCB does not mean China Construction Bank). In CMMI (Capability Maturity Model Integration), CCB stands for Change Control Board. A CCB may be a single group or several different groups, responsible for deciding which proposed requirement changes or new product features will be implemented. A typical change control board also decides which errors are corrected in which versions. The CCB is the owner of the system integration project: it represents the rights and interests of the users and decides which changes should be accepted. It is made up of several people involved in the project, usually including decision-makers from both the user side and the implementing side. The CCB is a decision-making body, not an operating body: it usually decides through review whether a change can go ahead, but does not itself propose the change plan. At the very least, this guarantees that the decision reflects collective wisdom.

10. Brotherly culture

No brothers, no research and development - the joy of working together

In November 2014, I recruited a group of interns from Nanjing, all born in the 1990s, some of them from fairly well-off families. While we were pushing the schedule together, there were basically no weekends, and we worked late almost every night. I had expected the boys to complain about the hard work, but at this year's annual summary meeting, their feedback was all about "gains" and "progress". It was the most "positive energy" annual summary meeting I have ever attended.

Why does everyone work so hard? Because the brothers struggle together: they are not alone, because they have company on the road of struggle. We cooperate smoothly, encourage each other, help each other, motivate each other, and at the same time compete with each other.

When I worked in a research institute, a state-owned enterprise with a very stable job and a very stable income, I still worked very hard and sometimes very late. Although I was recognized by the organization and its leaders, I finally chose to leave.

"Whether you work well or badly, there is not much difference in income." This was one of the reasons I left. Another important reason was loneliness. When you work late and see others going home early or going out to play, you will not get far on the road of perseverance. Being unable to hold on for long on your own is not a lack of perseverance; it is human nature.

When you solve a problem late at night, your greatest happiness is actually having someone to tell. When you find that you are the only one who cares about the result, or that nobody cares whether you care, you will feel frustrated.

At Huawei, if a brother was still working overtime, his supervisor was expected not to leave early, because on the road of struggle you cannot be alone. When a supervisor stays, it must not be simply to keep him company, but to help with specific problems, or to check whether the problem really requires overtime, whether he is doing the right thing, and whether he is doing it the right way (giving guidance and help on direction and method), so as to avoid wasted effort. At the same time, it lets brothers who are willing to work hard see that someone supports them and pays attention to them.

A good supervisor must not go home and sleep while his brothers are making meaningless efforts. This is the most demoralizing thing.

Of course, I am not advocating overtime. I am advocating doing things well and on schedule, and advocating that everyone work together to get things done, rather than some people working hard to contribute while others dawdle for the same income, or even more.

My Huawei supervisor once said to me: "Ren Zhengfei believes that one of the most important things he does is to divide the money well and fairly." Therefore, our responsibility now is to divide the money well and fairly.

Teamwork, everyone makes the best use of it

Huawei has a saying: "a knowledge-intensive enterprise." Due to the influence of the Cultural Revolution, China is short of talent between the ages of 40 and 60 and has not accumulated enough engineering skill. In addition, most college students do not find their direction in life during college and lose themselves in games such as StarCraft, Warcraft, Legend, and League of Legends. As a result, the technical capabilities of Chinese engineers are lower than those of American engineers: the starting point is low, the start is late, and the accumulated engineering knowledge is insufficient. Our individual combat capability is therefore far below that of engineers in Silicon Valley. What we can do is break a relatively complex project down and assign it to multiple engineers to work on together. Hence the term "knowledge-intensive", by analogy with "labor-intensive".

The ability to cooperate with each other and the allocation of project management are particularly important. However, no matter how good the distribution and management are, it cannot be perfect, and everyone's responsibilities cannot be divided so accurately. This requires a brotherly culture, where everyone finds his or her corresponding position and responsibility in the team, and can take the initiative to share and take responsibility when the responsibilities are not clear. How can you get team members to do this? Performance orientation is particularly important. (This problem ultimately comes down to dividing the money well)

"If you lose, fight to the death to save each other; if you win, raise a toast to celebrate."
Only with this kind of brotherhood can the final project be successful.

Management: the emphasis is on sorting things out, and only then on managing

Some project managers complain that they “can’t manage, or it’s difficult to manage”. In fact, the reason lies in whether you are deeply involved in the team.

There are some very senior leaders at Huawei who can give specific guidance on projects and offer wise opinions, so they can convince their subordinates. "Prime ministers must rise from local posts, and fierce generals must emerge from the ranks." This is why the professional-manager approach that some foreign companies bring to China often does not work.

As a good supervisor, you don't just demand progress and quality results, you must be able to help your subordinates achieve the goals you set, so that your subordinates can convince you and be willing to follow you.

Don't just issue an order to "capture that hilltop", but after giving the order, you can help your subordinates analyze the enemy's situation, provide intelligence, guide strategies, and finally achieve the goal of "capture that hilltop", and then give rewards.

After several successes like this, your subordinate has the ability to conquer the mountain. You can safely give him some tasks and then help him challenge higher goals; when he is able to challenge higher goals, he himself will also It will have higher value. In fact, during this process, he will be grateful to you, convinced of you, and willing to follow you. Therefore, brotherly culture is not just about eating a few meals and drinking a few drinks (eating and drinking are naturally necessary), but also practical help and care at work.

So if you help your brothers sort things out clearly, there will naturally be little need to "manage" them. Over time, the feeling among everyone is one of brotherhood rather than a conflict between labor and management.

Be Liu Bang, not Xiang Yu

No matter how strong you are, you need everyone to fight together.

Ren Zhengfei once said: "I don't understand technology, foreign languages, or law." For Huawei to become a Fortune 500 company, he could not rely on his own ability alone; he had to rely on an elite team of advisers. It took the concerted efforts of that team, common ideals and goals, shared core values, cohesion, and the replication of one successful team after another.

"The gentleman is not born different from others; he is simply good at making use of what is around him."

The reason why Liu Bang defeated Xiang Yu was because he had brothers.

11. Testing

Comparing the testing of Huawei and Xiaomi from a progress perspective

[Figure: one-week progress chart of Xiaomi UI]

The picture above is a one-week progress chart of Xiaomi UI. According to Xiaomi UI's weekly release schedule, internal testing takes a single day, on Thursday. No matter how I map this onto Huawei's process, I cannot make sense of it.

My doubts are:

  1. Does internal testing refer to developer self-testing or tester testing?
  2. If it refers to developer self-testing, where do testers test?
  3. If it’s tester testing, what about developer self-testing? Where is the point of transitioning from development to testing?

Friends with a Huawei background will definitely ask: How can the testers complete the test in one day?

Some people may say that Xiaomi is very efficient. So let’s take a look at Huawei’s testing process, and you’ll know whether it can be compressed into one day to complete the relevant tests.

First of all, let me explain that Huawei's software departments, including the UI and website development teams, also develop in small iterations. Once a product is stable, new requirements are split into small versions so that development and testing run on the shortest possible cycle. It is also possible that Huawei is weaker than Xiaomi at breaking down requirements, but here we are only talking about the testing process.

The stages Huawei has gone through during testing

Testing is an essential part of the product development process. Nearly one-third of Huawei's R&D personnel are testers. Huawei's testing system started relatively early in China and has gone through roughly the following stages:

  1. The Bronze Age: Workshop-style testing

    The R&D and testing team was established in 1996

    R&D and testing were done by hand, workshop-style

  2. The Iron Age: IPD and CMM Phases

    In 1998, Huawei cooperated with IBM and began to introduce the IPD process.

    The CMM concept was introduced around 1999

    The IPD-CMMI process took shape

  3. The Age of Firearms: The PTM Phase

    In 2004, the PTM process was developed based on IPD and automated testing was carried out on a large scale.

    PTM matured around 2006 to 2007

  4. The Group Army Era: The IPD-RD-I&V Stage

    Agile began to be promoted around 2008, and the R&D organization evolved into the PDU approach.

    The iterative development model was introduced and the IPD-RD-I&V process took shape

    System integration and verification process: IPD-RD-I&V

    (I&V: Integration and Verification)

The project manager writes the "Project Plan" and the developers produce the "SRS". At this point the test team leader starts to write the "Test Plan" according to the SOW. It covers personnel, software and hardware resources, test points, integration sequence, schedule, risk identification, and so on.

After the "Test Plan" is written, it needs to be reviewed. The participants include project managers, test managers, and system engineers. The test team leader revises the "Test Plan" based on the review comments and uploads it to VSS, where it is managed by the configuration administrator.

After the developers have finalized the "SRS" and created a baseline, the test team leader organizes the test members to write a "Test Plan". This test plan must be designed against each requirement point in the "SRS" and has three parts: a brief introduction to the requirement point, the test approach, and the detailed test method. After this "Test Plan" is written, it also needs to be reviewed. The reviewers include project managers, developers, test managers, test team leaders, test members, and system engineers, who return the review results. The test team leader then organizes the test members to revise the test plan, and the next stage, writing test cases, does not begin until the review is passed.

Test cases are written based on the "Test Plan". Through the "Test Plan" stage, testers gain a detailed understanding of the requirements of the entire system. Only then, when they start writing test cases, can they ensure that the cases are executable and cover the requirements. Test cases need to include test items, case levels, preconditions, operation steps, and expected results. The operation steps and expected results need to be written clearly and in detail. Test cases should cover the test plan, and the test plan covers the requirement points, which ensures that no customer requirement is missed. Test cases likewise need to be reviewed by developers, testers, and system engineers, and the test team leader organizes the testers to revise them until they pass the review.
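The coverage chain described here (test cases cover the test plan, and the test plan covers the requirement points in the SRS) can be checked mechanically. The following is a minimal Python sketch, not Huawei's actual tooling; the function name, the ID formats such as REQ-1, TP-1 and TC-1, and the sample data are all hypothetical.

```python
from typing import Dict, List, Set


def uncovered_requirements(
    srs_points: List[str],
    plan_items: Dict[str, str],
    test_cases: Dict[str, str],
) -> Set[str]:
    """Return SRS requirement points that no test case reaches.

    plan_items maps a test-plan item ID to the SRS point it covers.
    test_cases maps a test-case ID to the test-plan item it exercises.
    """
    exercised_items = set(test_cases.values())
    covered = {
        req for item, req in plan_items.items() if item in exercised_items
    }
    return set(srs_points) - covered


# Illustrative data only (hypothetical IDs).
srs = ["REQ-1", "REQ-2", "REQ-3"]
plan = {"TP-1": "REQ-1", "TP-2": "REQ-2", "TP-3": "REQ-3"}
cases = {"TC-1": "TP-1", "TC-2": "TP-1", "TC-3": "TP-3"}

missing = uncovered_requirements(srs, plan, cases)
print(sorted(missing))  # ['REQ-2']
```

In this example the plan item TP-2 has no test case, so REQ-2 is reported as a requirement that would be missed.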

While the test cases are being written, the developers have basically finished writing the code and have completed unit testing. The version is then handed over to the testing department for system testing. The testing department first pre-tests the newly delivered version: if the software does not achieve 10% of the CheckList, the testing department sends the version back; otherwise, it accepts the version for system testing. Following the schedule in the "Test Plan", the test team leader conducts multiple rounds of testing. After each round, the test team leader writes a test report covering use case execution status, defect distribution, causes of defects, risks during testing, and so on, and the testers then modify and add test cases. After the developers have fixed the bugs and delivered a new test version, the testing department begins the second round of system testing: it first regression-tests the problem tickets, then continues testing, and writes the second-round test report. This cycle continues until system testing is over. During system testing, testers also need to write acceptance manuals, acceptance use cases, data test cases, and so on.

Problem tickets are fixed until the specified defect density is met, so that the relevant TR point can be passed.

If the defect rate found during the acceptance is within the range specified in the SOW, the acceptance is successful. If the specified defect rate is exceeded, quality traceback is required.
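The text does not say how the defect density or defect rate is calculated. As an illustration only, the sketch below assumes the common definition of defects per thousand lines of code (KLOC) and a made-up threshold; the function names and all numbers are hypothetical and are not taken from any SOW.

```python
def defect_density(defect_count: int, kloc: float) -> float:
    """Defects per thousand lines of code (an assumed definition)."""
    if kloc <= 0:
        raise ValueError("kloc must be positive")
    return defect_count / kloc


def acceptance_passes(defect_count: int, kloc: float, max_density: float) -> bool:
    """True if the measured density is within the agreed limit."""
    return defect_density(defect_count, kloc) <= max_density


# Hypothetical figures: a 40 KLOC product with an agreed limit of 0.5 defects/KLOC.
print(acceptance_passes(12, 40.0, 0.5))  # True: 0.3 defects/KLOC
print(acceptance_passes(30, 40.0, 0.5))  # False: 0.75 defects/KLOC, needs traceback
```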

SRS: requirements analysis document;

HLD: high-level design document;

LLD: detailed design document;

1. Unit testing (UT)

The object of unit testing is the program unit or module defined in LLD, which is also the largest testable unit in unit test case design. The test object may consist of one or more functions or classes. Test design is to design test cases for the test object.

The purpose of UT is to check, by running the functions, that the module code complies with the LLD document, and to verify that the input and output responses of each function are consistent with what is defined for it in the detailed design document. The function is the most basic unit of product development and implementation; the next unit up is the module. From a testing perspective, we hope that once UT is complete, each function is solid and reliable. The next step, IT, can then focus on whether the cooperation between functions achieves the allocated requirements, without worrying about the input and output response of each function itself.

Unit testing is more suitable for developers.
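As a small illustration of UT checking a function's input and output against its detailed design, here is a sketch using Python's standard `unittest` module. The function `even_parity_bit` is hypothetical and merely stands in for a unit defined in an LLD; the expected values follow from the definition of even parity.

```python
import unittest


def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    return bin(byte).count("1") % 2


class EvenParityBitTest(unittest.TestCase):
    """Each test compares one input/output pair with the specification."""

    def test_even_number_of_ones(self) -> None:
        self.assertEqual(even_parity_bit(0b00000011), 0)
        self.assertEqual(even_parity_bit(0xFF), 0)

    def test_odd_number_of_ones(self) -> None:
        self.assertEqual(even_parity_bit(0b00000111), 1)

    def test_out_of_range_input_is_rejected(self) -> None:
        with self.assertRaises(ValueError):
            even_parity_bit(256)


# Run the tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EvenParityBitTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```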

2. Integration testing (IT)

Integration testing refers to testing that is performed by assembling several units that have been unit tested. Integration testing should be based on HLD and mainly finds errors or imperfections in interfaces and dependencies. The object of integration testing is a combination of several unit test objects, at least two.

The purpose of IT is to decompose the module according to the module design, start from the verified function, and integrate it layer by layer to obtain a runnable module.

IT can be done by developers or testers.

It is not difficult to see that UT tests each unit, while IT tests the interfaces between units. UT and IT can both be classified as "unit-level" testing.
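To make the difference concrete, the sketch below integrates two units and tests only the interface between them. Both `encode_frame` and `decode_frame` are hypothetical examples (a simplified UART-style frame with one start bit, eight data bits sent least significant bit first, an even parity bit, and one stop bit); they are not taken from any Huawei design.

```python
from typing import List


def encode_frame(byte: int) -> List[int]:
    """Unit 1: build start bit, 8 data bits (LSB first), parity, stop bit."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [sum(data) % 2] + [1]


def decode_frame(bits: List[int]) -> int:
    """Unit 2: validate framing and parity, then rebuild the byte."""
    if len(bits) != 11 or bits[0] != 0 or bits[10] != 1:
        raise ValueError("framing error")
    data, parity = bits[1:9], bits[9]
    if sum(data) % 2 != parity:
        raise ValueError("parity error")
    return sum(bit << i for i, bit in enumerate(data))


def run_integration_test() -> bool:
    """Check that the two units agree on the frame format."""
    # Every byte must survive a round trip through both units.
    for value in range(256):
        if decode_frame(encode_frame(value)) != value:
            return False
    # A single corrupted data bit must be detected by the decoder.
    corrupted = encode_frame(0x5A)
    corrupted[3] ^= 1
    try:
        decode_frame(corrupted)
    except ValueError:
        return True
    return False


print(run_integration_test())  # True
```

Each unit could pass its own UT and the pair could still fail here, for example if one side sent the most significant bit first; that kind of interface mismatch is what IT is meant to find.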

3. System Test (ST)

System testing as defined by CMM: system testing is an overall test of the software system developed by the software project team. It runs the software system as a whole, or exercises a clearly defined subset of its behavior. The main method used is black-box testing, which ignores the internal implementation logic of the program and checks whether the inputs and outputs conform to the requirements in the specification. It follows that the test object of ST is the specification, or more precisely the module requirement specification, so it is also generally called MST. The module SRS document gives the requirements for the inputs and outputs of the module. After MST, each module is solid and usable.

4. Inter-module interface testing (BBIT)

BBIT is an inter-module interface test, which verifies whether the interfaces between modules work together. It is sometimes confused with joint debugging, but the purposes are actually different. The purpose of BBIT is to follow the decomposition in the system design, start from verified modules, and integrate them layer by layer to obtain a running system. Joint debugging generally involves testing the coordination between software and hardware, or between different products. MST and BBIT can be classified as "module-level" testing: one verifies the modules, the other verifies the interfaces between modules.

The UT, IT, MST, and BBIT stages above are generally completed by developers. At that point the system basically runs, and testers can carry out SDV, SIT, and SVT.

5. System Design Verification (SDV)

Although SDV is a system test carried out by testers, it leans toward gray-box testing, because SDV verifies whether the cooperation of the subsystems meets the design requirements (DR). It still cares about the internal implementation and verifies whether the integration of multiple modules meets the design requirements.

6. System Integration Test (SIT)

SIT also verifies whether the design requirements are met. Unlike SDV, SIT tests the system purely as a black box and does not care about the internal implementation. In practice, although SDV and SIT are both system-level tests, they are often run separately by testers from different project groups (subsystems) who focus only on their own subsystems, so SDV and SIT are better classified as "subsystem-level" tests.

7. System Validation Test (SVT)

SVT is an acceptance test, and its test object is the product package requirements (OR). The product package requirements define the scope of the product and describe the system from the perspective of the environments in which the product may be used. The purpose of SVT is to confirm (or accept) that the product meets the requirements in the various application scenarios given by the product package requirements.

Even for web development projects, outsourcing projects, and terminal projects, Huawei's testing will still go through the following testing stages:

SIV: System Integration Verification.

SDV: System Design Verification.

SIT: System Integration Test.

SVT: System Validation Test (system confirmation test).

After the iterations are over, all the stories implemented in previous iterations are tested again before the product is officially released. This testing is carried out mainly by testers, covers both functional and non-functional aspects, and ends with a test report. This activity is called SIT or release testing.

If the Story tests and the iteration SDV tests are automated, this round mainly executes the automated use cases, adds tests where earlier testing was insufficient, and carries out detailed performance tests. If the degree of automation is low, this round tests a selected subset. A test report is required when the testing is complete.
SIT testing focus: after all iterative development is completed, the testers in the iterative development team complete regression testing of the entire system to meet the TR4A quality standard. Remaining issues need to meet the DI (defect density) target of TR5.

Xiaomi's incredible 5%

Lei Jun said:

After the reform and opening up, strong hardware production, manufacturing, design, and R&D capabilities have given Chinese companies huge opportunities. Coupled with optimization of the whole process, during Xiaomi's startup years, R&D, manufacturing, maintenance, service, marketing, and channels together accounted for only 5% of Xiaomi's turnover, that is, 5% of the retail price of its goods.

First of all, the average income of Huawei engineers should be lower than that of Xiaomi's, Huawei's device procurement costs should also be lower than Xiaomi's, and its manufacturing costs should be lower too, because Huawei has its own production lines.

So where did Huawei's costs go?

If we look at Huawei's hardware testing process, we will see where the cost lies.

1. Testers participate throughout the whole process:

At the beginning of the project, testers begin to participate in requirements analysis and evaluate the testability of the product, test plans and other factors.

2. Multi-level testing and experimentation.

For circuit designs, unit testing, complete machine testing, small-batch trial production, HALT testing, environmental testing, EMC testing, and thermal testing are performed, and HASS testing is performed once the product enters production. Special equipment also undergoes salt spray tests and sulfurization (sulfur corrosion) tests. The structure of the entire machine also undergoes drop, squeeze, and twist tests, among others.

HALT (Highly Accelerated Life Test) is a process for discovering defects. It accelerates the exposure of defects and weak points in the test samples by applying increasingly severe environmental stresses; the exposed defects and faults are then analyzed and corrected from the design, process, and materials angles in order to improve reliability. Its defining feature is that the environmental stress is set higher than the design operating limit of the sample, so that failures appear in far less time than they would under normal reliability stress conditions.
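The idea of "increasingly severe environmental stresses" can be pictured as a step-stress loop: raise the stress one step at a time and record where the sample first fails. The Python sketch below only illustrates that loop; `step_stress`, `simulated_sample`, the temperature steps, and the 105 °C failure point are all invented for the example, and a real HALT run uses chamber equipment and combined stresses such as temperature and vibration.

```python
from typing import Callable, Optional, Tuple


def step_stress(
    start: float,
    step: float,
    maximum: float,
    survives: Callable[[float], bool],
) -> Tuple[Optional[float], Optional[float]]:
    """Raise the stress level step by step until the sample fails.

    Returns (last level survived, first level failed). Either value is
    None if no level was survived or no failure was seen up to maximum.
    """
    last_ok: Optional[float] = None
    level = start
    while level <= maximum:
        if not survives(level):
            return last_ok, level
        last_ok = level
        level += step
    return last_ok, None


def simulated_sample(temperature_c: float) -> bool:
    """Stand-in for a real chamber run: this fake sample fails above 105 C."""
    return temperature_c <= 105


last_ok, first_fail = step_stress(60, 10, 150, simulated_sample)
print(last_ok, first_fail)  # 100 110
```

The gap between the last surviving level and the first failing level is where the weak point lies, which is the starting point for the failure analysis described in the text.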

Environmental testing is an activity carried out to ensure that the product maintains functional reliability under all environments of expected use, transportation or storage during its specified life span. It is to expose the product to natural or artificial environmental conditions and withstand its effects to evaluate the performance of the product under the environmental conditions of actual use, transportation and storage, and to analyze and study the impact of environmental factors and their mechanism of action.

HASS (Highly Accelerated Stress Screening) is applied during the production phase of the product to ensure that all the improvements identified in HALT have actually been implemented. HASS also ensures that no new defects are introduced by changes in production processes and components.

Hardware engineers are most afraid of HALT tests because they push the device beyond its limits. Why do this? The point is to find the weakest point of the entire device and then improve it. However, because the test goes beyond the allowable working range of the device, many abnormal situations occur and their causes are complicated. Even so, each one must be analyzed clearly according to the specifications, and corrective measures must be given. This is mentally exhausting work, and many classic problems have surfaced during HALT tests.

12. Knowledge management and classification of hardware

Electronic hardware knowledge is extremely complex. Each subdivision can be studied in depth and can become one person's lifelong profession, for example EMC engineer, interconnect engineer, power supply engineer, programmable logic engineer... And the power supply engineer role can itself be subdivided into separate professions: primary power supply, secondary power supply...

In the hardware field, a lot of knowledge is tacit, so you cannot become a qualified hardware engineer if you only know what is written in books. Precisely because the knowledge system is so large and contains so much tacit knowledge, a hardware engineer's knowledge needs to be managed.

A netizen once used the management of movies on a hard drive to illustrate his view of knowledge management.

Viewpoint: if the categories overlap, it is better not to classify at all. The netizen's reasoning: when categorizing movies, people usually start by actor, such as Jackie Chan, Donnie Yen... But what do you do with a film that co-stars several of them? Later, as the number of actor folders grows, classifying by name alone makes things hard to find. So a company level is added above the actor folders: Beijing Studio... Then there are too many companies, and you want to classify by genre: action adventure...

This netizen’s point of view: “Since it cannot be classified, then don’t classify it. We rely on search.”
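The netizen's "rely on search" approach amounts to replacing a folder tree with tags: one film can carry several tags, so overlapping categories stop being a problem. A minimal Python sketch follows; the `TagIndex` class and the film titles are hypothetical, and only the tag names come from the example above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TagIndex:
    """Maps each tag to the set of titles that carry it."""

    def __init__(self) -> None:
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)

    def add(self, title: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._by_tag[tag].add(title)

    def search(self, *tags: str) -> Set[str]:
        """Return the titles that carry every one of the given tags."""
        if not tags:
            return set()
        result = set(self._by_tag.get(tags[0], set()))
        for tag in tags[1:]:
            result &= self._by_tag.get(tag, set())
        return result


index = TagIndex()
index.add("Movie A", ["Jackie Chan", "action adventure"])
index.add("Movie B", ["Jackie Chan", "Donnie Yen", "action adventure"])
index.add("Movie C", ["Donnie Yen"])

# A co-starred film is found under both actors without being filed twice.
print(sorted(index.search("Jackie Chan", "Donnie Yen")))  # ['Movie B']
```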

My view: when you have a lot of knowledge, you still need to sort it out; you cannot rely on search alone.

If you collect scattered pieces of knowledge without organizing them, you cannot build your own knowledge system. You have no idea which knowledge you need to focus on and which you only need to be aware of; finding the tools or knowledge you use all the time takes as long as finding those you rarely use; and when you learn something new, it has no connection to the other things you know, because you have not linked or integrated your knowledge.

The same goes for your hard drive: you should have a way to quickly find the videos you watch often; for those you watch less often, perhaps you only need to keep the torrent; and those you never watch, or watched once and found distasteful, ugly, or blurry, you can simply delete.

[Figure: the knowledge areas of a hardware engineer, drawn in the shape of a car]

The picture above is my summary of where a hardware engineer stands, and of the knowledge areas a hardware engineer needs.

There is a saying: "Hardware is just connecting wires; software is just tapping keys." But when it comes to connecting wires, there are many factors to consider, and in a large company there are also many people to deal with.

The picture is drawn in the shape of a car. The hardware engineer is the body of the car. Power supply, logic, and interconnect are inseparable parts of the hardware engineer's work and are necessary for the hardware car to move forward (some small projects, and some companies, do not split the roles this way).

The bottom part is the road that supports the car's progress, which is also a necessary condition for the car to move forward. Knowledge in these fields is essential for hardware engineers, for example production and production lines, device sourcing, and device failure analysis. Even in large companies with a very fine division of labor, hardware engineers are still the ones responsible for these matters; and in companies that have no specialists in these fields, hardware engineers do the work themselves.

At the front are system design and product design. The thinking that sets the direction for the hardware must also be done by hardware engineers. If you don't understand the business and product scenarios, how can you choose a suitable processor? How do you know how much memory bandwidth and speed you need?

I used to say that a hardware engineer is like the carpenter who also acts as contractor on a renovation project: the soul of the whole renovation, who determines the quality of the project. So in addition to accumulating knowledge in electronics, hardware engineers also need to accumulate knowledge in other fields. The depth to which this peripheral knowledge is mastered needs to be distinguished from that of the core knowledge system: a person's time and energy are limited, so it is impossible to master everything, but one cannot be completely ignorant of it either.

Knowledge management is more than just classifying and organizing knowledge.

First of all, knowledge management is not just a catalog. It must have content, what people often call "real substance". A catalog with a perfect and complete knowledge framework is of little use, because all you need is to find the item you need most. It is even more useless if the folders are all empty.

[Figure: the cycle of learning, utilization, innovation, accumulation, and sharing]

And I also think knowledge management must be goal-oriented. Don’t organize for the sake of organizing. Therefore, for this catalog of knowledge systems, I advocate that you first have problem points or knowledge points, and after accumulating a certain number, form the knowledge system you need most.

It is just like the movies: you first have N movies that you watch often, and then you sort them into categories, instead of creating a bunch of empty folders first.

As in the picture above, this puts you in an effective cycle of learning, utilization, innovation, accumulation, and sharing.

So, let me talk about the origin of "One Hundred Thousand Whys of Hardware". When I first joined Huawei, my mentor was a project manager, who was also my supervisor. He was very busy, but also very responsible. He didn't have much time to tutor me, so he came up with a method: I would ask him 3 questions every week, and he would make sure to answer them for me on Friday. Perhaps some of my questions were too basic, and others were not easy to answer, so he also needed to spend some time studying them before answering carefully. I feel that during my probation period, these "three questions a week" were very effective in helping me master individual points of knowledge in depth.

By the time I was bringing in new employees myself, Huawei was growing rapidly. There were 17 people in my project team: 4 experienced employees and the rest new. There was no way to provide one-on-one coaching, so I used the "three questions a week" method, and it worked very well:

  1. Forcing new employees to think about problems;
  2. When new employees encounter obstacles, they have a place to ask questions;
  3. Faced with various questions from new employees, older employees also need to improve and learn in order to answer them.

Gradually the new employees grew, and their questions became harder to answer. I stopped distinguishing between new and experienced employees: everyone could ask questions, and everyone took turns answering them. Naturally, whoever's turn it was to answer found it painful, and had to adjust their work and set aside some time.

Later I asked myself: why not put all our questions together? When I collected the weekly questions, there were 17 people asking 3 questions each, 51 questions a week. We kept this up for half a year (I was later transferred to the marketing department for training, so it did not continue), and about 1,000 questions were accumulated in total. They covered all areas of hardware: power supply, clock, processor, logic, level standards, interface protocols...

When I sorted the questions into categories and generated the catalog from them in reverse, I got a big surprise: a complete knowledge system for a hardware engineer was laid out in front of me.

But a very interesting situation is that when searching for hardware issues in Huawei's internal technical forums, you will often find this document we compiled, because the problems everyone encounters are often similar.

An advertising slogan captures the true meaning of knowledge management: "The important thing is not to have everything, but to have what you need at hand." We need more than the concept of a complete knowledge framework; what we need most is to be able to solve our real problems.

Although "One Hundred Thousand Whys of Hardware" has accumulated quite a lot of problems, no matter how comprehensive these problems are, they are still fragmented. Although it can solve some specific problems, it cannot form a knowledge framework.

Classic textbooks and knowledge of principles need to be systematized.

When I was still at Huawei, I once tried to compile the "Eighteen Martial Arts of Hardware", covering the essential skills of a hardware engineer, with everyone contributing to a set of systematic training materials on the basics: power supply, clock, processors, high-speed interconnects, discrete devices, JTAG, memory... Systematic and complete mastery of the basics is a must; otherwise you cannot complete your work, let alone do it well.

As a well-known saying among adult-video fans goes: "If you don't know Muto Ran, you have watched all those films in vain." So some basic knowledge is necessary. Otherwise, if you don't even understand the jargon, such as what "cavalry" and "infantry" mean, how can you find the right film to download? Training and teaching materials should be divided into "entry" and "improvement" levels and targeted at the corresponding needs.

Origin blog.csdn.net/qq_37952052/article/details/130775796