From Six Sigma to Data Science.
Last updated: April 15, 2021
The "Data Scientist" profession has exploded in recent years, and today the job market lacks enough professionals to fill all the open positions. This labor shortage has created a huge and lucrative market for online courses, which offer everything from data analysis classes to programming languages in a variety of formats. There are many serious courses and instructors that are worth the investment, but there is also a lot of deception in "crash course" offerings that sell the illusion that this kind of knowledge can be learned quickly and without any prerequisites.
The term "Data Scientist" sounds a little strange to more experienced people, because there have always been professions and positions in industry that dealt with data using some scientific method: statisticians, process engineers, economists, actuaries, and many others.
In the 1990s there was a very strong movement to use data science for process improvement and for reducing defects and waste in industry. It was the period in which Japanese techniques such as Kanban and Just-in-Time were reinvigorated and repackaged under new names and methodologies. The most notable of that era were Six Sigma and Lean.
The essence of the Six Sigma methodology was the use of statistics, mainly statistical inference, to act on a process and move it to a different end state, preferably one with less variation relative to some benchmark, for example a customer or regulatory specification. One example would be reducing the incidence of misrouted or lost bags at an airline, or of packages that never reach their final destination at a logistics company. Note that the application was aimed much more at improving existing processes.
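To make "less variation relative to a benchmark" concrete, here is a minimal sketch of how a defect rate is commonly turned into a sigma level. The baggage figures are hypothetical, the function names are illustrative, and the 1.5-sigma shift is the usual Six Sigma convention rather than something stated in this article.

```python
from statistics import NormalDist


def dpmo(defects: int, units: int, opportunities_per_unit: int = 1) -> float:
    """Defects per million opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Sigma level for a DPMO strictly between 0 and 1,000,000.

    Assumes the conventional 1.5-sigma long-term shift.
    """
    yield_fraction = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(yield_fraction) + shift


# Hypothetical figures: mishandled bags per 100,000 checked bags,
# before and after an improvement project.
before = dpmo(defects=650, units=100_000)
after = dpmo(defects=120, units=100_000)

print(f"before: {before:.0f} DPMO, sigma level {sigma_level(before):.2f}")
print(f"after:  {after:.0f} DPMO, sigma level {sigma_level(after):.2f}")
```

Under this convention, the well-known target of 3.4 defects per million opportunities corresponds to a sigma level of about 6, which is where the methodology gets its name.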
The Six Sigma approach had a very large impact at the time. It was initially promoted by big names in management, for example Jack Welch of GE, with countless success stories. The professional of the moment was called the "Black Belt", a person who went through very rigorous training in statistical techniques and the basics of project management. Later a "hype" also formed, in which many companies, driven by these ideas, rushed to hire and train their Black Belts and implement the methods. Some succeeded greatly and others failed. Over time the fashion cooled, and a few years later it was no longer talked about as much, or the method had simply become commonplace.
How does the current moment, with all the "hype" around Data Science, compare with the Six Sigma movement described above? Is it something fleeting, or something that is here to stay?
Major transformations have taken place since then, notably (i) the exponential growth in the scale of data generated and available, driven by the spread of mobile devices, internet-connected sensors, and social networks, and (ii) the dramatic increase in computing capacity within anyone's reach, driven by the introduction of graphics processors (GPUs) and cloud computing. Together they have fueled a revolution called digital entrepreneurship. These transformations greatly lowered the barriers to entry, so that anyone can design and implement a disruptive business model that challenges the business models of incumbents. That is why I suspect that the demand for these new "data scientists", who know how to navigate this massive volume of data and use computing power to see and generate business value, is vital and here to stay.
But what would be the differences in competencies and in the approach to a problem between a Black Belt and a data scientist? What would they have in common? And how to leverage
From the standpoint of quantitative competencies, the same theoretical foundation in statistics remains. What is new are the data manipulation skills: data that used to fit in an Excel spreadsheet and was mostly structured now comes at massive scale and in a much wider variety of formats (text, video, audio, etc.). As for business acumen, the more sense one can see in the mass of data, the better the chances. The project approach carries over in problem definition and change management; the management of execution, however, is different, with agile methods.
It is April 2021, and I would like to share some thoughts about the career trajectory of a Data Scientist, one of the top professions to have flourished over the last few years and one that, according to research, will continue to rise.
The name Data Scientist is a more recent label, one that arrived with the emergence of AI and Machine Learning in the enterprise.
Why? Recent advances in computing (GPUs, open source), the internet, mobile computing, IoT, and social networks, together with algorithmic breakthroughs such as deep learning, have produced a great boom in AI, the latest of the different phases AI has gone through.
Typical applications that enterprises are looking to develop:
- Social data (from apps such as Instagram) fed instantaneously into algorithms that predict demand, make recommendations, or execute stock trades
- IoT data feeding models for logistics routing and agriculture
- Images, to identify patterns that are not visible to the human eye or that occur at scale: human identification, disease diagnostics, agricultural and environmental monitoring, smart cities
- Natural Language Processing: translators, chatbots, RPA
- Expert systems, robotics, autonomous vehicles, and reinforcement learning: designing agents that can perform tasks autonomously
So, the leap is this:
- Data no longer fits into your Excel spreadsheet. You need to deal with large amounts of data: slice it, dice it, and perform calculations on it. That means parallel and distributed computing, the cloud, and map and reduce.
- Furthermore, data is being generated while you are reading this article and streamed to your repository. That requires a different approach.
- Social networks require a new set of data structures to
- Data is also used to make real-time decisions rather than rear-view-mirror analysis. Therefore, developing predictive models is paramount.
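The "map and reduce" idea mentioned in the list above can be sketched in a few lines. This is only an illustration on a single machine, with hypothetical data and function names: each chunk stands in for a partition that a distributed engine would process on a separate worker.

```python
from functools import reduce


def chunks(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def map_chunk(chunk):
    """Map step: summarize one chunk as (count, sum)."""
    return (len(chunk), sum(chunk))


def combine(left, right):
    """Reduce step: merge two partial (count, sum) summaries."""
    return (left[0] + right[0], left[1] + right[1])


def chunked_mean(values, chunk_size):
    """Mean of a non-empty list, computed from per-chunk partial results."""
    partials = map(map_chunk, chunks(values, chunk_size))
    count, total = reduce(combine, partials, (0, 0.0))
    return total / count


# Hypothetical data: delivery times in days.
delivery_days = [2, 3, 5, 4, 2, 7, 3, 4, 6, 4]
print(chunked_mean(delivery_days, chunk_size=3))
```

The point of the design is that no step ever needs the whole dataset at once: each chunk is reduced to a small summary, and only the summaries are combined.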
However, at its core, what is a Data Scientist in today's world? What are their main jobs and added value?
According to Michale Christesse's classic HBR article, the data scientist unites three competencies, without which it is not possible to perform the role:
- Statistics. It is all about rigor with the data and respect for the laws of mathematics: sampling, cross-validation, descriptive statistics, and statistical inference. Now it is also very important to go deep into predictive models, starting from statistical regression, the foundation, and moving on to neural networks, decision trees, and bagging.
- Computing. As pointed out earlier, today's data scientist needs to use the computer to their advantage: to manipulate large amounts of data, run ETL, spin up model training, and then visualize the data to demonstrate results and make sense of them for customers. There is also the intense work of sanitizing the data, making sure it is free of pollution and retains a minimum of statistical rigor (often this is 80% of the job). The profile can span from a user of tools (Tableau, etc.) to a hard-core programmer in Python, C, and Java (just to name a few) who is fluent in TensorFlow, Dask, and Spark, in matrix calculations, and in the use of GPUs.
- Business acumen and project management skills. This is the curiosity and ability to learn from data by asking the right questions, probing with data or human input, learning, and iterating. It is often the most overlooked skill nowadays and, not surprisingly, one of the main reasons projects fail. These kinds of projects typically consume a lot of time and money and carry high expectations, which vendors inflate with consulting and software offers. It takes a well-grounded, seasoned professional to make the complicated simple for stakeholders and to set a strategy of starting small and iterating. Here agile methods, with small and continuous deliveries, are welcome; waterfall planning that tries to do it all in one big bang often fails.
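Cross-validation, named under the statistics competency above, is a good example of that rigor carried into predictive modeling. The sketch below is a minimal k-fold cross-validation of a simple least-squares line in plain Python; the data is synthetic and the function names are illustrative, not taken from any library.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2
                  for i in held_out]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / k


# Synthetic data: a known line plus Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
print(f"5-fold cross-validated MSE: {k_fold_mse(xs, ys):.3f}")
```

Because every point is scored by a model that never saw it during fitting, the resulting error is a more honest estimate of how the model would behave on new data than the error measured on the training set.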
How easy is it to find professionals in the market who unite all of the above competencies? How long does it take to train them, and for them to acquire the experience to perform and lead initiatives in your company?
While there can be many trajectories to get there, one that stands out for me, especially because it has been one of my personal routes, is a data scientist career built on a Six Sigma background. Six Sigma is known as a set of techniques and tools for process improvement, introduced at Motorola in the 1980s and popularized by General Electric in the 1990s. It was heavily influenced by Japanese total quality management programs, and at its core Six Sigma seeks to improve the quality of a process output by identifying and removing the causes of defects. Its methodology combines quality management methods, statistical analysis, and project management competencies. Business applications span multiple functions and industries: manufacturing, engineering and construction, finance, supply chain, healthcare, and aviation, among many others. In addition, it went through its own curve of expectations, hype, and disillusionment, from which one may learn important lessons. Some traits that I highlight can be transferred immediately to the current Data Scientist role:
- Statistical rigor and data manipulation. Although at an infinitesimally smaller scale, the thinking behind the use of statistics, such as sampling and data collection strategy and reproducibility (Gauge R&R), is behind the whole current discussion of AI models that contain bias and fail to work in different situations. The same factors behind root cause analysis and statistical inference are most likely the factors of a predictive regression model. In fact, Six Sigma practitioners were data scientists even before the term appeared.
- Project management. A typical Six Sigma project always started with a rigorous project charter, in which the practitioner had to write a problem statement, clearly identifying the business problem with outside data, and project the potential gains from the project. While this can be criticized, the point is that the same discipline is extremely important today. Of course, nowadays things like agile and lean startup are complementary and add improvements. Last but not least, Six Sigma had change management components, such as CAP, which continue to be no less important: AI projects come with concerns about job substitution and changes in working methods, and navigating the people side is essential.
- Business acumen. These practitioners were also domain experts, intimately connected with the real world, listening to customers and watching competitors and disruptors. Most Six Sigma practitioners were high talents at the time, and many rose to higher ranks and leadership positions.
Of course, there are many gaps between the two profiles, and they lie mostly in the computing area and in the emergence of new mathematical models and algorithms to deal with emerging data structures. However, I argue that the learning curve is less steep than it is from other areas. The benefit is more than the ability to train and implement ML models: it is the gray hair factor, the experience in similar environments and business problems that startups and organizations may lack. Blending in this kind of professional can result in a good match.
So, think about it: there are many former Six Sigma practitioners out there (Black Belts and Master Black Belts) who may be encouraged to shake the rust off their brains and put all their experience and enthusiasm to use. For them, it is a great opportunity to keep learning, stay active, and accomplish new things.
I intend to keep writing articles on how a seasoned executive may get acquainted with the current state of AI and its different options, as an opportunity to share my own trajectory as an executive who is transitioning to entrepreneurship and going back to university in my late 40s to pursue a Master's degree.