Analyzing the Yelp dataset in Neo4j using Docker
Last updated: May 15, 2022
I recommend reading the famous article "Data Scientist: The Sexiest Job of the 21st Century", in which the authors give a very good description of what they mean by a data scientist. According to them
The name "data scientist" has been adopted more recently, with the emergence of AI and machine learning in the enterprise.
Why? Recent advances and breakthroughs in computing, GPUs, open source, the internet, mobile computing, IoT, algorithms such as deep learning, and social networks have driven a great boom in AI (the different phases of AI; relate to the article).
Typical applications that enterprises are looking to develop:
- Social data (apps such as Instagram) fed instantaneously to algorithms that predict demand, make recommendations, and execute stock trades
- IoT data to feed models for logistics routing and agriculture
- Images to identify patterns (not visible to the human eye, or at scale): human identification, disease diagnostics, agricultural and environmental monitoring, smart cities
- Natural language processing: translators, chatbots, RPA
- Expert systems, robotics, autonomous vehicles, and reinforcement learning: designing agents that can perform tasks autonomously
So, the leapfrog is this:
- Data no longer fits into your Excel spreadsheet. You need to deal with large amounts of data: slice, dice, and perform calculations. That means parallel and distributed computing, the cloud, and map and reduce.
- Furthermore, data is being generated while you are reading this article and streamed to your repository. That requires a different approach.
- Social networks require a new set of data structures, used to make real-time decisions rather than rear-view-mirror analysis. Therefore, the development of predictive models is paramount.
However, at its core, what is a data scientist in today's world? What are their main jobs and value added?
According to Michale Christesse, in a classic HBR article, the data scientist unites three competencies, without which it is not possible to perform:
- Statistics: it is all about rigor with the data and respect for the laws of mathematics. This covers sampling, cross-validation, descriptive statistics, and statistical inference, but it is now very important to go deep into predictive models, starting from statistical regression, which is the basis for regression models, neural networks, decision trees, and bagging.
- Computing: as pointed out earlier, in today's world data scientists need to use the computer in their favor to manipulate large amounts of data, run ETL, train models, and then visualize the data to demonstrate results and make sense of them for customers. There is also the intense work of sanitizing data, making sure it is free of pollution and retains a minimum of statistical rigor (often this is 80% of the job). The profile can span from a user of tools (Tableau, etc.) to a hard-core programmer in Python, C, and Java (just to name a few) who is fluent in TensorFlow, Dask, and Spark, as well as in matrix calculations and the use of GPUs.
- Business acumen and project management skills: the curiosity and ability to learn from data by asking the right questions, probing with data or human input, learning, and iterating. These are often the most overlooked skills nowadays, and not surprisingly one of the main reasons for project failures. Typically these kinds of projects consume a lot of time and money and carry high expectations, and vendors inflate them with consulting and software offers. So it takes a highly grounded, seasoned professional to make the complicated simple for stakeholders, along with a strategy to start small and iterate. Here, agile methods with small, continuous deliveries are welcome, rather than waterfall planning that tries to do it all in one big bang and often fails.
How easy is it to find professionals in the market who unite all of the above competencies? How long does it take to train them, and for them to acquire the experience to perform and lead initiatives in your company?
While there can be many trajectories to get there, one that stands out for me, especially because it has been one of my personal routes, is a data scientist career built on a Six Sigma background. Six Sigma is known as a set of techniques and tools for process improvement, introduced at Motorola in the 1980s and popularized by General Electric in the 1990s. It was heavily influenced by Japanese total quality management programs, and at its core Six Sigma seeks to improve the quality of a process output by identifying and removing the causes of defects. Its methodology combines quality management methods, statistical analysis, and project management competencies. Business applications span multiple functions and industries: manufacturing, engineering and construction, finance, supply chain, healthcare, and aviation, among many others. In addition, it went through its own curve of expectations, hype, and disillusionment, from which one may learn important lessons. Some traits that I highlight, which can be transferred instantly to the current data scientist role:
- Statistical rigor and data manipulation: although at an infinitesimally smaller scale, the thinking behind the use of statistics, such as sampling, data collection strategy, and reproducibility (Gauge R&R), is behind the whole current discussion of AI models that contain bias and fail to work in different situations. The same factors behind root cause analysis and statistical inference are most likely the factors of a predictive regression model. In fact, Six Sigma practitioners were data scientists even before the term appeared.
- Project management: a typical Six Sigma project always started with a rigorous project charter, in which the practitioner had to write a problem statement, clearly identifying the business problem with outside data, and project the potential gains from the project. While this can be criticized, the point is that the same exercise remains extremely important to reflect on. Of course, nowadays, things like agile and lean startup are complementary and add to the improvement. And last, Six Sigma had change-impact components, such as CAP, which continue to be no less important, as AI projects come with several notions of job substitution and changes in working methods, and navigating people management is essential.
- Business acumen: these people were also domain experts, intimately connected with the real world, listening to customers and watching competitors and disruptors. Most Six Sigma practitioners were high talents at the time, and many rose to higher ranks and leadership positions.
Of course, there are many gaps between the two profiles, and they reside much more in the computing area and in the emergence of new mathematical models and algorithms to deal with emerging data structures. However, I argue that the learning curve is less steep than from other areas. The benefit, in my view, is more than the ability to train and implement ML models: it is the gray hair factor, the experience in similar environments and business problems that startups and organizations may lack. Blending in this kind of professional can result in a good match.
So, think about it: there are many former Six Sigma practitioners out there (Black Belts and Master Black Belts) who may be encouraged to shake the rust off their brains and use all their experience and enthusiasm. For them, it is a great opportunity to keep learning, stay active, and accomplish new things.
I intend to keep writing more articles on how a seasoned executive may get acquainted with the current stage of AI and its different options, as an opportunity to share my own trajectory as an executive who is transitioning to entrepreneurship while going back to university in my late 40s to pursue a Master's degree.