How to design an integrated IT operation and maintenance system for large enterprise groups


In large enterprise groups, the member companies usually build their own data centers and their own set of IT operation and maintenance (O&M) systems independently.

As IT evolves, O&M requirements grow more diverse and the infrastructure and O&M systems grow more complex. Building O&M systems independently places ever higher technical and cost demands on each company, so more and more large groups are changing their approach and considering a single integrated O&M system built for the whole group.

This article describes how a large group can draw on Internet-industry ideas and technologies to build an integrated O&M system, laying the foundation for informatization and digital transformation.



Current state of IT operation and maintenance in large groups


1. Organizational characteristics

A large enterprise group generally sets up multiple companies or business divisions under it, organized by business segment or region, or formed through mergers and acquisitions. Each of these companies or divisions in turn sets up a number of subsidiaries or branches. Each company or business unit runs its own line of business independently while remaining closely linked to the others, as shown below:

[Figure: organizational structure of a large enterprise group]

Because organizations at every level bear separate operational responsibility, each unit builds its own IT systems. After years of construction and operation, each unit in the group may have one or more IT data centers running a large number of IT infrastructure components and business systems.


2. Current state of IT operation and maintenance systems

To keep the IT infrastructure and business systems of units at every level running stably, efficiently, and securely, each unit generally plans and builds its own set of IT O&M systems. For example, the figure below shows the categories of O&M and security support systems:


[Figure: categories of O&M and security support systems]


3. Analysis of IT operation and maintenance pain points

From the perspective of integrated management, collaborative integration, and digital transformation, this "chimney-style", "decentralized" model, in which every unit builds its own IT O&M systems, has the following main problems:


Diverse and complex system architectures

Because there has been no unified planning, units at every level have planned and built their IT O&M systems independently over the years, accumulating a large number of O&M products and systems from different vendors and with different architectures for their own use. Each of these O&M systems addresses the needs of only one particular aspect of its unit.

For example, within the monitoring category, server monitoring alone may involve products from multiple vendors such as Microsoft, Hewlett-Packard, IBM, North Tower, Zabbix, and others.



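Horizontal collaboration between systems is difficult

Because the IT O&M systems from different vendors differ in architecture, they offer no interfaces for interacting and collaborating with each other. And because each system follows its own specifications and data formats, the data cannot be effectively unified.

For example, the configuration data in the IT monitoring system differs greatly from the configuration data in ITSM, so the two cannot be connected and shared. Configuration data automatically collected by the IT monitoring system cannot be automatically matched and updated into the IT asset system. And because units use security devices and security systems from different vendors, the available security metrics differ widely, so the company cannot present unified security metrics or compare them across units.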
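The configuration-data mismatch described above can be made concrete with a small sketch. It assumes two hypothetical export formats (the field names `host`, `addr`, `os_name`, `ci_name`, `ip_address`, and `platform` are invented for illustration and do not come from any real product), maps both onto one common schema, and then reports which records the ITSM side is missing or holds in a stale form.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ConfigItem:
    """Common schema that both systems are mapped onto (hypothetical)."""
    hostname: str
    ip: str
    os: str


def from_monitoring(record: dict) -> ConfigItem:
    # Assumed monitoring export: fully qualified host name, mixed-case OS name.
    return ConfigItem(
        hostname=record["host"].split(".")[0].lower(),
        ip=record["addr"].strip(),
        os=record["os_name"].lower(),
    )


def from_itsm(record: dict) -> ConfigItem:
    # Assumed ITSM export: short CI name, differently named fields.
    return ConfigItem(
        hostname=record["ci_name"].strip().lower(),
        ip=record["ip_address"].strip(),
        os=record["platform"].lower(),
    )


def diff_by_ip(monitoring: List[dict], itsm: List[dict]) -> Dict[str, List[str]]:
    """Match records by IP and report what the ITSM side is missing or has stale."""
    discovered = {c.ip: c for c in map(from_monitoring, monitoring)}
    recorded = {c.ip: c for c in map(from_itsm, itsm)}
    return {
        "missing_in_itsm": sorted(set(discovered) - set(recorded)),
        "stale_in_itsm": sorted(
            ip for ip in discovered.keys() & recorded.keys()
            if discovered[ip] != recorded[ip]
        ),
    }


monitoring_export = [
    {"host": "WEB01.corp.local", "addr": "10.0.0.5", "os_name": "Linux"},
    {"host": "db01", "addr": "10.0.0.6", "os_name": "Linux"},
    {"host": "app01", "addr": "10.0.0.7", "os_name": "Windows"},
]
itsm_export = [
    {"ci_name": "web01", "ip_address": "10.0.0.5", "platform": "linux"},
    {"ci_name": "app01", "ip_address": "10.0.0.7", "platform": "linux"},
]
print(diff_by_ip(monitoring_export, itsm_export))
```

Without the shared `ConfigItem` schema, every pair of systems would need its own comparison logic, which is the integration cost the text describes.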



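Poor O&M data quality, unable to deliver value

Because the data in each unit's IT O&M systems follows no unified standards or specifications, accuracy and consistency are poor, data quality problems are prominent, and it is difficult to roll out a unified quality improvement plan.

As a result, the O&M data that units already hold cannot be analyzed, mined, or applied. The data is of little value, and unified operations cannot be achieved.


System construction and O&M costs keep rising

Each unit plans, builds, deploys, and operates its IT O&M systems independently. As O&M requirements grow more complex, more and more requirements for data-driven, automated, and intelligent O&M keep emerging, and most of these requirements are similar across all units.

If each unit continues to plan and build its O&M systems independently, the group's overall system construction and O&M costs will keep rising, with a great deal of duplicated investment.


Uneven adoption of new technologies

Traditional IT O&M systems rely on outdated technology and struggle to meet today's requirements for rapid enterprise application development and fine-grained control. As new Internet technologies keep emerging and maturing, having each unit introduce them on its own will inevitably cause a great deal of duplicated research investment, and the level of adoption will be uneven.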




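A new approach to IT operation and maintenance for large groups


1. A new approach to enterprise IT operation and maintenance

Based on the analysis above, meeting increasingly complex O&M requirements and achieving the digital transformation of O&M requires large groups to change how they build O&M systems: from the original "chimney-style", "decentralized" model to an "integrated", "centralized" model, introducing emerging Internet technologies and tools to build a one-stop "ERP" for the IT O&M business.


2. Design principles of the integrated IT operation and maintenance system

To meet complex O&M requirements while remaining highly extensible, the overall design principles of the integrated IT O&M system are as follows:


Platform + application model

  • Build a base platform for the integrated O&M system and decouple application functions by delivering them as scenarios

  • Provide quick and convenient service composition so that each subsidiary and branch can build customized O&M applications according to its actual management needs, fully supporting full-lifecycle O&M management from the perspective of O&M scenarios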
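One way to picture the "platform + application" decoupling is the minimal sketch below. It is not the design of any particular product: the class `BasePlatform`, the capability names `cmdb.hosts` and `monitor.cpu`, and the sample data are all assumptions made for illustration. The platform exposes shared capabilities by name, and a unit composes its own scenario application from them without touching other applications.

```python
from typing import Any, Callable, Dict, List


class BasePlatform:
    """Toy base platform: shared capabilities plus the scenario apps built on them."""

    def __init__(self) -> None:
        self._capabilities: Dict[str, Callable[..., Any]] = {}
        self._apps: Dict[str, Callable[["BasePlatform"], Any]] = {}

    def provide(self, name: str, func: Callable[..., Any]) -> None:
        """Expose a reusable platform capability under a stable name."""
        self._capabilities[name] = func

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        """Invoke a capability by name, so apps depend only on the platform."""
        return self._capabilities[name](*args, **kwargs)

    def register_app(self, name: str, app: Callable[["BasePlatform"], Any]) -> None:
        """Register a scenario application composed from platform capabilities."""
        self._apps[name] = app

    def run_app(self, name: str) -> Any:
        return self._apps[name](self)


# Hypothetical in-memory data standing in for a CMDB and a monitoring service.
HOSTS = {"unit-a": ["web01", "db01"], "unit-b": ["app01"]}
CPU_PERCENT = {"web01": 35, "db01": 92, "app01": 40}

platform = BasePlatform()
platform.provide("cmdb.hosts", lambda unit: HOSTS[unit])
platform.provide("monitor.cpu", lambda host: CPU_PERCENT[host])


def unit_a_inspection(p: BasePlatform) -> List[str]:
    """A unit-specific app: list unit-a hosts whose CPU usage exceeds 80%."""
    return [h for h in p.call("cmdb.hosts", "unit-a") if p.call("monitor.cpu", h) > 80]


platform.register_app("unit-a.inspection", unit_a_inspection)
print(platform.run_app("unit-a.inspection"))  # ['db01']
```

Because the inspection app only calls named capabilities, the group can replace the CMDB or monitoring backend behind those names without changing any unit's application.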


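Full coverage of IT O&M functions

  • Cover the enterprise's existing functions, including IT asset and configuration management, IT infrastructure monitoring, IT application monitoring, IT service management, IT security monitoring, IT call handling, and IT facility inspection

  • Reserve room for future automated and intelligent O&M scenarios, building O&M management that integrates monitoring, management, and control


Unified portal, centralized deployment

  • Build a unified O&M portal for the whole group, providing a single entry point for O&M and a panoramic view of services and support

  • Adopt a single-tier deployment model: the platform and applications are all deployed at group headquarters, while subsidiaries and branches deploy local agents to integrate and manage the IT infrastructure and systems of units at every level


Advanced technical architecture

  • Abandon the traditional monolithic design and adopt a PaaS + microservices design

  • Use distributed and high-availability technologies to make the platform highly available and high-performing

  • Use open, standardized platform interfaces that support scenario-based extension development on top of the platform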



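Blueprint for IT operation and maintenance in large groups


1. Platform architecture overview

After studying and comparing the technologies and platforms of multiple Internet companies in the industry, my assessment is that the most advanced and most complete O&M system architecture currently available in China is Tencent's integrated R&D and operations PaaS platform.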


[Figure: architecture of the integrated R&D and operations PaaS platform]


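As the figure above shows, the platform is a multi-layer, extensible, integrated R&D and operations capability platform that suits many different kinds of IT data centers:

IaaS layer: supports all kinds of IT infrastructure, including traditional enterprise data centers, virtualization, private cloud, public cloud, and hybrid cloud.

PaaS layer: provides advanced Internet technologies such as configuration management, container services, control services, big data computing, big data storage, and machine learning algorithms, along with modular, reusable capabilities built on these technologies, such as the configuration platform, job platform, control platform, container platform, data platform, and development framework.

Basic SaaS layer: provides application systems and tools used in many IT scenarios, such as continuous integration, release and change, incident handling, experience optimization, operations assistance, and operations security, meeting the enterprise's need for full-lifecycle integrated development and operations across "continuous integration, continuous deployment, continuous operation" (CI-CD-CO).

Scenario SaaS layer: by providing underlying development and O&M capabilities and tools, it supports enterprises in building complex, customized application systems for specific scenarios according to their own needs.


2. Functional architecture of the integrated IT operation and maintenance system

Based on the IT O&M requirements of large groups, combined with advanced Internet technology ideas, the integrated IT O&M system architecture shown below can be designed on top of the platform: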

[Figure: functional architecture of the integrated IT O&M system]

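The system consists of three parts: the base platform, O&M scenario applications, and information presentation.

The base platform provides components such as configuration management (CMDB), common components, a development framework, job services, collection and control services, and data services, giving scenario applications the environment and basic capabilities they need for development and operation.

O&M scenarios support two modes. The group can plan and build unified O&M scenario applications around requirements common to all units and share them across the whole group, and units at each level can build customized O&M applications for their own requirements and use them independently. Both modes support flexible extension development and release.

Examples include common ITOM and ITSM applications, as well as customized applications for self-service inspection and rapid problem handling of specific application systems.

Information presentation supports a unified O&M portal, a mobile O&M entry point, large-screen O&M dashboards, and so on, providing information display and O&M operation entry points for all IT O&M users and managers across the group.


3. Deployment architecture of the integrated IT operation and maintenance system

To achieve unified, centralized O&M across the group and reduce deployment and O&M costs for units at every level, the deployment architecture shown below is planned, interconnecting data and services between the group and its subsidiaries and branches.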


[Figure: deployment architecture of the integrated IT O&M system]


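Group headquarters deploys the integrated IT O&M system centrally and provides access to users at headquarters and in units at every level.

Each subsidiary or branch deploys local agent servers that monitor, manage, and control the infrastructure and application systems in its local IT data center, and that integrate with the integrated IT O&M system at group headquarters through a data bus:

Collection agent:

Collects configuration information, relationships, performance information, running status, and so on from each unit's local infrastructure and application systems, and reports the collected information through the data bus to the collection center at group headquarters, which then makes it available to the O&M scenario applications.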
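The reporting path of a collection agent can be sketched as follows. This is only an illustration under stated assumptions: Python's in-process `queue.Queue` stands in for the data bus (a real deployment would use a message broker), and the envelope fields `unit`, `host`, `kind`, `ts`, and `data` are invented here rather than taken from the platform.

```python
import json
import queue
import time
from typing import List, Optional


def build_report(unit_id: str, host: str, kind: str, payload: dict,
                 timestamp: Optional[float] = None) -> str:
    """Wrap one collected sample in a common envelope and serialize it as JSON."""
    return json.dumps({
        "unit": unit_id,
        "host": host,
        "kind": kind,  # e.g. "config", "relation", "performance", "status"
        "ts": time.time() if timestamp is None else timestamp,
        "data": payload,
    }, sort_keys=True)


class CollectionAgent:
    """Local agent that only pushes reports onto the bus."""

    def __init__(self, unit_id: str, bus: "queue.Queue[str]") -> None:
        self.unit_id = unit_id
        self.bus = bus

    def report(self, host: str, kind: str, payload: dict,
               timestamp: Optional[float] = None) -> None:
        self.bus.put(build_report(self.unit_id, host, kind, payload, timestamp))


def collection_center(bus: "queue.Queue[str]") -> List[dict]:
    """Headquarters side: drain the bus and decode every pending report."""
    reports = []
    while True:
        try:
            reports.append(json.loads(bus.get_nowait()))
        except queue.Empty:
            return reports


bus: "queue.Queue[str]" = queue.Queue()
agent = CollectionAgent("unit-a", bus)
agent.report("web01", "performance", {"cpu_pct": 35}, timestamp=1700000000.0)
agent.report("web01", "status", {"state": "running"}, timestamp=1700000001.0)
for item in collection_center(bus):
    print(item["unit"], item["host"], item["kind"], item["data"])
```

Tagging every report with the unit identifier is what lets the headquarters side store data from all units together and still filter or compare by unit.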


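Control agent:

Performs operations on each unit's local infrastructure and application systems, including file distribution and command execution. It works with the control center at group headquarters and accepts operation instructions, such as starting or stopping a service or replacing a file, from the O&M scenario applications at headquarters.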
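A control agent that accepts instructions from headquarters might be sketched as below. The instruction shape `{"action": ..., "params": ...}`, the action names, and the in-memory `services` table are all assumptions for illustration. The sketch adds one design choice that the article does not state: the agent executes only actions registered on an allow-list, rather than arbitrary commands.

```python
from typing import Any, Callable, Dict


class ControlAgent:
    """Executes only the actions explicitly registered on this agent."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}

    def register(self, action: str, handler: Callable[..., Any]) -> None:
        self._handlers[action] = handler

    def handle(self, instruction: dict) -> dict:
        """Run one instruction of the form {"action": str, "params": dict}."""
        action = instruction.get("action")
        handler = self._handlers.get(action)
        if handler is None:
            return {"ok": False, "error": f"unsupported action: {action}"}
        try:
            result = handler(**instruction.get("params", {}))
        except Exception as exc:  # report the failure instead of crashing the agent
            return {"ok": False, "error": str(exc)}
        return {"ok": True, "result": result}


# In-memory stand-in for local services; a real agent would call the OS service manager.
services = {"nginx": "stopped"}


def set_service_state(name: str, state: str) -> str:
    if name not in services:
        raise ValueError(f"unknown service: {name}")
    services[name] = state
    return services[name]


agent = ControlAgent()
agent.register("service.start", lambda name: set_service_state(name, "running"))
agent.register("service.stop", lambda name: set_service_state(name, "stopped"))

print(agent.handle({"action": "service.start", "params": {"name": "nginx"}}))
print(agent.handle({"action": "shell.exec", "params": {"cmd": "whoami"}}))
```

The first call succeeds and returns the new state; the second is rejected because `shell.exec` was never registered on this agent.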


Data agent:

Receives O&M data in custom formats sent by each unit's local infrastructure or application systems and reports it through the data bus to the data center at group headquarters, where O&M scenario applications can use it for analysis or display.


Service agent:

Registers, wraps, and converts the API interfaces provided by the integrated IT O&M system at group headquarters so that each unit's local application systems can call them. It likewise registers, wraps, and converts the API interfaces provided by local application systems so that the integrated IT O&M system at headquarters can call them.
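The "register, wrap, convert" role of the service agent can be illustrated with a short sketch. Everything named here is hypothetical: the `ServiceAgent` class, the API name `unit-a.ticket.create`, and both message formats are invented to show the idea of adapting a local API to the format the caller expects.

```python
from typing import Callable, Dict

Converter = Callable[[dict], dict]


def identity(message: dict) -> dict:
    """Default converter: pass the message through unchanged."""
    return message


class ServiceAgent:
    """Registry that wraps each API with request and response converters."""

    def __init__(self) -> None:
        self._apis: Dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, api: Callable[[dict], dict],
                 convert_request: Converter = identity,
                 convert_response: Converter = identity) -> None:
        def wrapped(request: dict) -> dict:
            # Convert the caller's format in, call the real API, convert the result out.
            return convert_response(api(convert_request(request)))
        self._apis[name] = wrapped

    def call(self, name: str, request: dict) -> dict:
        if name not in self._apis:
            raise KeyError(f"API not registered: {name}")
        return self._apis[name](request)


def local_create_ticket(request: dict) -> dict:
    """Hypothetical local ticket API with its own field names."""
    return {"id": 7, "title": request["title"], "level": request["level"]}


PRIORITY_TO_LEVEL = {"high": 1, "medium": 2, "low": 3}

service_agent = ServiceAgent()
service_agent.register(
    "unit-a.ticket.create",
    local_create_ticket,
    convert_request=lambda r: {"title": r["summary"], "level": PRIORITY_TO_LEVEL[r["priority"]]},
    convert_response=lambda r: {"ticket_id": f"unit-a-{r['id']}"},
)
print(service_agent.call("unit-a.ticket.create", {"summary": "Disk full", "priority": "high"}))
# {'ticket_id': 'unit-a-7'}
```

The same registry works in the other direction: headquarters APIs can be registered with converters that present them in the format a unit's local systems already use.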



Case study: building an IT operation and maintenance system for a large group

I recently took part in planning and building an integrated IT O&M system for a large domestic group. The group has more than 10 second-tier units and more than 200 third-tier units, and before the project each unit had built a large number of IT O&M systems of its own.

By introducing the platform described above and following the "platform + application" model, the project built a new group-wide integrated IT O&M system. It provides a one-stop, integrated IT O&M management system for the thousands of IT O&M staff in all units across the group, and lets each unit independently extend the system with customized O&M scenarios.

The main functions of the integrated IT O&M system are deployed centrally at group headquarters, with the platform's local agents deployed in the second-tier and third-tier units. Massive volumes of detailed, real-time O&M data from each unit are aggregated at headquarters for unified storage and use.

With the CMDB at its core, the system builds functional scenarios such as IT monitoring, O&M process management, automated O&M, call handling, and security monitoring and early warning. It combines monitoring, defense, management, and control in one system, achieving at the technical level full integration across the security, O&M, and business dimensions.

Meanwhile, with the system going live, the company has moved from traditional manual and script-based O&M to automated and intelligent O&M, making O&M work more normalized, standardized, and systematic, reducing O&M costs, and raising the value of the O&M department.


Author: Martin wins the whole



Origin blog.51cto.com/11811406/2432091