"Data ETL": The Journey from Data Laborer to Data White-Collar Worker (Part 1) - Tools Overview

In an era when everyone is a data analyst, no one can stay detached from data. Data analysis presupposes data that is clean, complete, and well structured. To meet that precondition, many people live as data laborers (hauling bricks every day, doing the bitter and tiring work) and spend a great deal of time on meaningless, worthless, low-output data processing. This series takes a non-IT perspective and lays out a path of progress, so that more people can enjoy the working life of a data white-collar worker (relaxed, brain-driven, respectable).

Do tools really matter?

Many professionals with strong business skills like to say that tools are not what matters most; business thinking is, because that is what produces the greatest value.

If you have climbed high enough to have dedicated subordinates who handle the things you call unimportant, congratulations. But those at the top should show more consideration for how hard those subordinates work: they are spending their own precious lives every day on the very things you consider unimportant.

In the information age, the role of tools cannot be ignored. Having only a head and no tools, and not knowing how to hand the mindless dirty work over to a tool, is like doing arithmetic today with a schoolchild's pen and paper instead of a computer: it is simply asking for trouble.

Which data tools should we choose?

To obtain a usable data source, a data preparation process is required. In more professional terms this is a data ETL process (Extract, Transform, Load). To arrive at the data that serves as the source for analysis (and sometimes we even have to design and distribute our own templates for the data-production stage), we need enough tools to handle the needs of our different scenarios.
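As a rough illustration of the three ETL stages, here is a minimal Python sketch using only the standard library. The sample data, the column names, and the `orders` table are all hypothetical and chosen for illustration; none of the tools discussed in this series are involved.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray spaces and one row with a missing amount.
RAW_CSV = """order_id,customer,amount
1001, Alice ,120.50
1002,Bob,
1003, Carol,75
"""

def extract(text):
    """Extract: read raw rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, convert types, and drop incomplete rows."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows that have no amount
        clean.append((int(row["order_id"]), row["customer"].strip(), float(amount)))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a table that analysis can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)
```

Keeping the three stages as separate functions means the same transform step can be reused when the source or the destination changes.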

Different scenarios suit different tools, and no tool is a panacea. So do not lock yourself into a single data ETL tool for all of this dirty work.

In the author's view, existing tools fall into two groups: Microsoft-family tools and non-Microsoft tools. Given the limits of the author's knowledge, this series covers only the Microsoft family. Still, I believe that for the average data worker, weighing costs against benefits (learning cost, tool cost, range of usage scenarios, and so on), no non-Microsoft tool is compelling enough to enter our field of view.

The best choice for small-data scenarios: Office + third-party plug-ins

For a small, one-off data ETL job, nothing is more suitable than the Office software we use every day (the newer the version the better, since newer versions bring greater productivity; Office 2010 at the very least).

Sometimes it is not enough to complete a task ourselves; we also need to hand it to others so that they can complete it. Building what is commonly called a template, so that others can do the job with a lower barrier to entry, is a very common need in daily work. The goal is to be able to do the work yourself and also let others follow along, ideally handing the work off so that they can finish it easily.

Office is, after all, general-purpose software, while our work is scenario-specific, so it is never as handy or as easy to use as customized software.

Hence third-party plug-ins such as Excel Catalyst (Excel催化剂), which make it easier to accomplish a specific task quickly in a specific scenario without developing a customized software tool from scratch (which is usually hardly feasible: demands are unlimited, budgets are limited, and customization is expensive).

Of the 100+ functions currently in Excel Catalyst, at least 90 serve the data ETL stage. I believe that with the support of these auxiliary functions, the data laborer's situation will improve greatly.

Whether a feature comes from a third-party plug-in or is native to Office, it serves the same purpose: meeting the needs of our work and making it more convenient. There is no need to agonize over refusing plug-ins and insisting on cobbling together assorted native-function tricks to get the job done.

Likewise, Office itself keeps evolving, adding new features that cover more work scenarios and make work easier. It is worth keeping up with new versions of Office, and the best way to do so is to install the latest Office 365.

The choice for medium-sized data scenarios: PowerQuery + Excel Catalyst

Among self-service BI tools, Microsoft's offering is the Power BI family, in which the data ETL part is handled by PowerQuery. PowerQuery is available in Excel, Power BI Desktop, and SQL Server Analysis Services (SSAS).

The learning cost of this tool is low, yet the payoff is substantial. It follows Microsoft's usual product style: a graphical interface, with deeper extension available at the code level. Over the last couple of years the community tutorials have gradually matured, so it is fair to call it a low-investment, high-yield tool.

Excel Catalyst's many functions were designed from the perspective of a senior data analyst. They target the essential needs of the data ETL process, distill complex functionality into general-purpose form, and expose it through simple calls at the plug-in interface level. Its performance and efficiency also meet the needs of medium-sized data scenarios.

When learning PowerQuery and Excel Catalyst, strike a balance: use each tool where it is strongest rather than insisting on one tool for every feature. Take merging workbooks as an example: PowerQuery is good with well-normalized data sources, while Excel Catalyst is good with non-standard ones.
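To make the difference between normalized and non-standard sources concrete, here is a small Python sketch of merging sheets whose headers do not agree. It is not how PowerQuery or Excel Catalyst work internally; the alias table, the file names, and the in-memory rows are hypothetical, and reading real workbooks would need an extra library such as openpyxl.

```python
# Hypothetical mapping from header variants to canonical column names.
HEADER_ALIASES = {
    "store": "store", "shop": "store", "Store Name": "store",
    "sales": "sales", "Sales Amt": "sales", "amount": "sales",
}

def normalize_row(row):
    """Rename known header variants; ignore columns we do not recognize."""
    out = {}
    for header, value in row.items():
        canonical = HEADER_ALIASES.get(header.strip())
        if canonical:
            out[canonical] = value
    return out

def merge_sheets(sheets):
    """Stack rows from every sheet into one table, tagging each row with its source."""
    merged = []
    for name, rows in sheets.items():
        for row in rows:
            record = normalize_row(row)
            record["source"] = name
            merged.append(record)
    return merged

# Stand-ins for two workbooks: one with standard headers, one without.
sheets = {
    "north.xlsx": [{"store": "N1", "sales": 100}],
    "south.xlsx": [{"Store Name": "S1", "Sales Amt": 80, "memo": "draft"}],
}
merged = merge_sheets(sheets)
print(merged)
```

With a normalized source the alias table is unnecessary, which is why a plain append-style merge is enough there; the extra mapping step is what non-standard sources cost.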

Every tool has its strengths and weaknesses. Some enthusiasts and geeks push certain features of a tool far too deep and take pride in doing so. Ordinary learners need to be discerning: by the 80/20 principle, digging too deeply into a tool's less useful features takes a lot of energy for what may be little return.

Medium and large enterprise data scenarios

Enterprise-class applications often have different concerns from personal scenarios. They need to pay more attention to stability, performance, automation, sensible allocation of permissions, and so on.

In professional data ETL, the Microsoft family offers SSIS (SQL Server Integration Services), which ships with SQL Server. Other professional tools get a brief mention here as well, but after a comparison I believe readers will still fall for SSIS.

Comparative analyses by others, excerpted from articles online, show that the best-known tools are Informatica and DataStage. Their prices are eye-watering, though: buying a single tool costs nearly 100 million.

Careful readers will notice that Kettle is available as a free tool. But cost is usually more than the price of the software; there is also the learning cost. At the author's level, at least, I would not dare risk an open-source, free tool with limited functionality and an expensive learning curve.

SSIS can in effect be considered a free tool: it comes bundled with a SQL Server purchase, and its performance is very good, more than capable for the data volumes of small and medium-sized enterprises. Bear in mind that many vendors split such components out into separate units that are sold individually, and at a high price.

Many readers may worry that IT-level products like these are costly to learn and hard to master. But the trend in tools is to package the complexity inside and expose something simple to use. Excel Catalyst is an example: it is operated through a very simple interface, while the complexity has been wrapped up internally and the user does not need to care about it.

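The learning curve of SSIS is actually not steep. It is operated graphically throughout, and anyone with some knowledge of databases, familiarity with SQL statements, and experience with PowerQuery can handle it just as well.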

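In terms of extensibility, SSIS provides a .NET scripting interface, so in theory it can handle processing of any complexity. PowerQuery, by contrast, is closed: it does not provide regular expressions, for example, so you can never use them there. SSIS has no such limitation.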
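To show the kind of transformation that regular expressions make easy, here is a minimal sketch. In SSIS itself such logic would live in a script written in C# or VB.NET with the .NET `System.Text.RegularExpressions` classes; Python is used here only for illustration, and the SKU format and sample remarks are hypothetical.

```python
import re

# Assumed code format for illustration: two capital letters, a hyphen, four digits.
SKU_PATTERN = re.compile(r"\b[A-Z]{2}-\d{4}\b")

def extract_skus(remark):
    """Return every SKU-like code found in a free-text remark."""
    return SKU_PATTERN.findall(remark)

remarks = [
    "returned item SK-1024, box damaged",
    "swap AB-0007 for CD-1234",
    "no code given",
]
result = [extract_skus(r) for r in remarks]
print(result)
```

Pulling structured codes out of free-text columns like this is hard to do with plain split-and-replace steps, which is why a scripting escape hatch matters in an ETL tool.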

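Likewise, in terms of performance and richness of features, if a self-service data ETL tool such as PowerQuery cannot meet your current needs, I strongly recommend taking one more step and having a look at SSIS.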

The choice for the cloud era: Azure Data Factory

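Times are changing, especially in the data field, which has now entered the big data era. Beyond sheer volume, there are large amounts of unstructured data such as speech, long text, video, and images. Traditional tools like SSIS struggle to cope with this, so Microsoft's answer is Azure Data Factory, delivered as a managed cloud service: let the professionals do the professional work, and we simply use it on demand.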

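In the architecture Microsoft presents, beyond data extraction you can also use Azure capabilities such as machine learning and cognitive services (AI) to analyze and process unstructured data, converting it into structured data for downstream data modeling and analysis tools.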

Conclusion

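Times are advancing, and new demands are being placed on people's abilities. In the data field, mastering data ETL skills can free us from heavy data-laborer work. In exchange, we put our brains to work, learn advanced tools, and complete data processing, cleanup, and handling far more easily.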

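Walk the path with the author: from basic Excel operations, to mastering Excel Catalyst's functions, to learning the self-service ETL tool PowerQuery, to the professional ETL tool SSIS, and on to the cloud-era ETL tool Azure Data Factory. Learn as needed; when your current tools no longer suffice, leave your comfort zone and move forward, and open skies will be waiting for you.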

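Going forward, the author will focus on sharing in the data field, not limited to Excel, with more topics on SQL Server, .NET, Azure, Power BI, and others to help upgrade your data analysis abilities. Stay tuned.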

About Excel Catalyst

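Excel Catalyst was first the name of a WeChat official account; the Excel plug-in was later officially released under the same name. The plug-in will be updated continuously. The update cycle depends on the author's available time, with the aim of releasing one major functional module per week. The Excel Catalyst plug-in is promised to be permanently free for individual users!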

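The Excel Catalyst plug-in uses the latest deployment technology: install once, and all later updates are applied automatically. There is no need to keep watching for updates or to manually download and reinstall a package; a single installation keeps you on the latest version.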

Excel Catalyst plug-in download link: https://pan.baidu.com/s/1Iz2_NZJ8v7C9eqhNjdnP3Q

Contact the author

WeChat official account

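The name Catalyst comes from this: Excel is powerful, but not everyone can benefit from that power right away. Most people are still at the stage of being tormented by Excel. They know clearly what result they want, and experts have already achieved it, yet they cannot get there themselves however they try. Worse, some do not even know what Excel can do and remain stuck producing data through endless repetitive, mechanical, manual work, burning away their youth. This prompted the idea of acting as a medium that lets Excel users ignite Excel's potential instantly, without struggling day and night through trick-learning and brain-burning advanced functions, only to end up on the road from beginner to giving up.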

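Finally, powerful as Excel is, one view needs to be established: not everything should be handed to Excel, and Excel is not well suited to everything. The world outside is still vast; Excel is just one shining star in it, and there are many other equally impressive technologies and tools. Excel Catalyst will also draw on these other technologies to let Excel deliver even more power!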

About the author of Excel Catalyst

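Name: 李伟坚 (Li Weijian). Has worked in data analysis for many years (BI focus); a fellow learner who is still on the road.
Industries served: retail, especially footwear and apparel retail; e-commerce (Taobao, Tmall, JD.com, Vipshop)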

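Technical path: started as an ordinary user and entered the world of data through learning Excel; not a formally trained IT professional.
After getting past many hurdles, the author has finally reached a technical plateau on the data road, where learning new knowledge is no longer such a struggle, and has also formed a data solution of his own (data collection, data processing and cleaning, multidimensional data modeling, data report presentation, and so on).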

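Areas of technical expertise: Excel and other Office family software, secondary development with VBA & VSTO, SQL Server database technology, SQL Server business intelligence (BI) technology, Power BI, cloud server deployment, and so on.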

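In 2018 the author made a major career change, moving from a full-time job to freelancing. There is currently no fixed income, and the road ahead is not yet clear. If the author returns to a full-time job, the operation and development of Excel Catalyst will inevitably be greatly affected (during working hours it would be impossible to maintain it or to casually publish results produced on company time, and time outside work is very limited, as the author is past thirty with heavy family responsibilities).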

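Together with its many supporters, the author hopes that Excel Catalyst can keep running and that those who have benefited from it can lend their support: leave encouraging comments, share and recommend it on WeChat Moments, give small tips, and most importantly recommend it to your company and your peers, so that my skills can create value at your company for a win-win outcome (the initial idea is cooperation in the form of data consulting or small project development).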


Origin www.cnblogs.com/ExcelCuiHuaJi/p/11331575.html