Why ML Testing Could Be The Future of Data Science Careers - Sankhyana Kenya

This article looks at testing as a distinct career option in data science and machine learning (ML). It gives a brief overview of testing workflows and processes, and outlines the expertise and key skills a tester needs in order to test an ML application.


Testing in Data Science: Opportunity for Expansion

There is a significant opportunity to extend testing and quality assurance into the field of data science and machine learning (ML).

Working with training data, algorithms and modeling in data science may be a complex yet interesting activity, but testing these applications is no less challenging.

A considerable amount of time goes into testing and quality assurance activities. Experts and researchers believe that 20 to 30% of overall development time goes into testing the application, and that 40 to 50% of a project's total cost is spent on testing.

Moreover, data science practitioners often complain that they have production-ready models, established evaluation criteria and set templates for report generation, but no teams to help them test them. This opens up testing in data science as a full-fledged career option.

Testing in data science calls for an entirely new context and approach, and for these systems it can consume even more time, effort and money than legacy software does.

To understand this complexity, we first need to understand the mechanics behind machine learning systems.

How Machine Learning Systems Work

In machine learning (ML), humans feed in the desired behavior as examples during the training phase through the training data set, and the model optimization process produces the system's rationale (or logic).
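
As a minimal sketch of this idea in Python (scikit-learn assumed; the fraud-style data and the logistic regression model are purely illustrative), the desired behavior is supplied as labelled examples and the "logic" comes out of optimization rather than being written by hand:

    # Minimal illustration: the "logic" of an ML system is learned from examples,
    # not written by hand. The data and model choice here are illustrative only.
    from sklearn.linear_model import LogisticRegression

    # Desired behaviour expressed as examples: [amount, hour_of_day] -> is_fraud
    X_train = [[20, 14], [15, 10], [900, 3], [750, 2]]
    y_train = [0, 0, 1, 1]

    model = LogisticRegression()
    model.fit(X_train, y_train)       # optimization produces the rationale

    print(model.predict([[820, 4]]))  # we can observe the behaviour...
    print(model.coef_)                # ...but the "logic" is just learned weights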

What is lacking, however, is a mechanism to find out whether this optimized rationale will produce the desired behavior consistently.

This is where testing comes in.

An Overview of Machine Learning Testing

The remedy is to create a sufficient number of behavioral tests for the model under consideration, aiming for full coverage of the software and the optimized rationale behind each of its capabilities. It is also advisable to group these tests under different capability headings so that nothing is missed and your approach is easy to trace.
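
As a hedged sketch of what such capability-grouped behavioral tests might look like (pytest-style; the sentiment capabilities and the `predict` function are hypothetical, and the stub below only stands in for real model inference so the example runs):

    # Sketch: behavioural tests grouped under capability headings, pytest-style.
    # `predict` is a trivial stand-in for the model under test; in a real suite
    # it would wrap the trained model's inference call.
    def predict(text: str) -> str:
        # Hypothetical stand-in returning "positive" or "negative".
        return "negative" if "not" in text.lower() else "positive"

    class TestCapabilityNegation:
        def test_simple_negation_flips_sentiment(self):
            assert predict("The service was not good") == "negative"

    class TestCapabilityRobustness:
        def test_typos_do_not_change_prediction(self):
            assert predict("Grreat experiennce overall") == "positive"

    class TestCapabilityInvariance:
        def test_changing_the_city_does_not_change_sentiment(self):
            assert predict("The Nairobi branch was excellent") == \
                   predict("The Mombasa branch was excellent")

Grouping the classes by capability (negation, robustness, invariance) makes it easy to see at a glance which behaviors have tests and which are still uncovered.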

Traditional software testing has coverage metrics such as lines of code (LOC), source lines of code (SLOC) or McCabe's cyclomatic complexity. But for the parameters of a machine learning model, it is much harder to define coverage metrics.

One workable solution, in this context, is to track model logits and capabilities and, for every test executed, quantify the area each test covers around these output layers. Complete traceability between behavioral test cases and the model's logits and capabilities has to be captured.
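
There is no standard tooling for this yet, but a rough sketch of such a traceability record in Python might look like the following (the capability names, logit values and the `record` helper are all illustrative assumptions, not an established API):

    # Sketch of a home-grown traceability record: which capability and which
    # output-layer logits each behavioural test exercises.
    from collections import defaultdict

    coverage = defaultdict(set)   # capability -> set of logit indices touched

    def record(test_name, capability, logits):
        """Log the logit index a test's inputs activated most strongly."""
        top_logit = max(range(len(logits)), key=lambda i: logits[i])
        coverage[capability].add(top_logit)
        print(f"{test_name}: capability={capability}, top logit={top_logit}")

    # Example usage with made-up logits from two behavioural tests
    record("test_negation_flip", "negation", [0.2, 2.1])
    record("test_typo_robustness", "robustness", [1.8, 0.3])

    print(dict(coverage))   # a crude coverage map across capabilities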

Still, a well-established industry-wide convention is lacking in this regard, and testing for machine learning systems is in such an immature state that many professionals still do not take test coverage seriously.

The Two Main Types of Machine Learning Testing

Considering the above scenarios, we derive two broad categories of testing in machine learning applications.

  1. Model evaluation, which reports metrics and curves/plots that explicitly describe model performance on a validation or test dataset.
  2. Model testing, which involves explicit checks for behaviors the model is expected to follow.

For these systems, model evaluation and model testing should be executed in parallel—because both are requisite for building high-quality models.
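
A minimal sketch of the two running side by side in Python (scikit-learn assumed; the Iris dataset, the model and the particular behavioral checks are illustrative only):

    # Sketch: model evaluation and model testing for the same model.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        *load_iris(return_X_y=True), random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # 1. Model evaluation: aggregate metrics on a held-out set
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

    # 2. Model testing: explicit checks for behaviours we expect to hold
    sample = X_test[:1]
    assert (model.predict_proba(sample).sum(axis=1).round(6) == 1.0).all(), \
        "class probabilities for each sample must sum to 1"
    assert model.predict(sample)[0] == model.predict(sample)[0], \
        "predictions must be deterministic for identical inputs"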

In practice, most experts are doing a combination of the two—where evaluation metrics are calculated automatically and some level of model "testing" is done manually through the error analysis process (i.e., through failure mode and effect analysis). But this is not sufficient.

Testers who get involved early in the development phase and build exhaustive model tests for machine learning systems can offer a systematic approach, not only to error analysis but also to achieving complete coverage and automating the entire process.

Required Competencies for a Data Science Testing Team

A good testing team needs to validate the model's outcomes to make sure it works as expected. The model will keep changing as customer requirements come in or new changes are implemented, and the more the team tests and refines the model, the better the results become. This cycle of refinement and modification continues according to the customer's needs.

Conclusion

Machine learning systems are tricky to test because developers and testers do not explicitly write the system's logic; it is generated through optimization.

Testers are well placed to tackle this issue, as they already deal with large sets of data and know how to use them effectively. Moreover, testers are experts at looking critically at data and are concerned less with code and more with data and domain knowledge. All this helps testers embrace data science and machine learning; for them, it is just a matter of shifting gears and fine-tuning the engine for a new route on their ongoing journey.


Register your interest here for ML training: https://lnkd.in/dDzdXSEe

Reach us: +254 740288931, veronica.wahome@sankhyana.com
Visit us: www.sankhyana.com

