The rubber meets the road where the model comes across data points that it probably has not seen in the training set, the sample data given to us, and the model's performance bombs. So how do you prevent a model from failing in production? That concept is called generalization, and we take care of all those things at this point. Sometimes we don't get the minimum required performance from the models, so we may have to tweak certain parameters, which we call hyperparameters. That's what we do here.
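To make the idea of tuning a hyperparameter while watching generalization concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration and none of it comes from the lecture: the synthetic data, the tiny nearest-neighbour model, and the names `make_data`, `knn_predict`, and `candidate_ks`. The hyperparameter `k` is chosen on a validation split, and a separate test split stands in for unseen production data.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D data: label is 1 when x > 0.5, with about 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noisy real-world data
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (k is odd)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)

def accuracy(train, data, k):
    """Fraction of points in data that the model fitted on train gets right."""
    hits = sum(knn_predict(train, x, k) == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)
points = make_data(300, rng)
# Three splits: fit on train, tune on valid, keep test untouched until the end.
train, valid, test = points[:180], points[180:240], points[240:]

# k is the hyperparameter: it is chosen by us, not learned from the data.
candidate_ks = [1, 5, 15, 45]
scores = {k: accuracy(train, valid, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)

# A large gap between training and test accuracy signals poor generalization.
train_acc = accuracy(train, train, best_k)
test_acc = accuracy(train, test, best_k)
generalization_gap = train_acc - test_acc
print(best_k, round(train_acc, 3), round(test_acc, 3), round(generalization_gap, 3))
```

The design point is that the test split is never used to pick `k`. If it were, the reported test accuracy would no longer be an honest estimate of how the model behaves on data it has not seen.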
While we are doing this, we keep an eye on the generalizability of the model: it should be able to perform equally well in production. So we keep an eye on those things while we tune the models.
Finally, the last stage is the deployment of the models. Models can be deployed in various ways. They can be converted to executable files, they can be run as web services, they can run in batch mode, or they can run in real-time mode. In this stage the data science team, especially the data scientist, has to collaborate with other stakeholders and process owners in the business, because those people will be very keen on understanding what performance this model is likely to give. You are now adding a new component to their processes, and there will be some ripple effect on those processes. Process change can become a very cumbersome exercise, so before you get into it, it is very important to strategize your approach to deploying the models in production. And of course we have to have a rollback plan: in case the model does not work, we need to roll it back. All of these things come in the last stage, deployment.
Subsequently, we keep an eye on this model in production. It is not that you deploy today and forget it forever. We observe the model and check whether it is performing as per expectation over a period of time in production.
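Keeping an eye on a deployed model can be sketched as a simple periodic check. This is only an illustration under stated assumptions: the function `check_window`, the `MonitorResult` type, the accuracy metric, and the 0.05 tolerance are all hypothetical choices, not something the lecture prescribes. The idea is to compare accuracy on a recent window of production predictions against the accuracy measured before deployment and to flag the model for review when it drops too far.

```python
from dataclasses import dataclass

@dataclass
class MonitorResult:
    window_accuracy: float  # accuracy on the most recent production window
    needs_review: bool      # True when the drop exceeds the allowed tolerance

def check_window(predictions, actuals, baseline_accuracy, tolerance=0.05):
    """Compare recent production accuracy against the pre-deployment baseline."""
    if not predictions or len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    window_accuracy = correct / len(predictions)
    # Flag the model when it falls more than `tolerance` below the baseline.
    needs_review = window_accuracy < baseline_accuracy - tolerance
    return MonitorResult(window_accuracy, needs_review)

# Hypothetical windows of production predictions and the outcomes observed later.
healthy = check_window([1, 0, 1, 1], [1, 0, 1, 1], baseline_accuracy=0.9)
degraded = check_window([1, 1, 1, 1], [1, 0, 0, 1], baseline_accuracy=0.9)
print(healthy)
print(degraded)
```

A flagged window does not say what to do next. Whether the response is recalibration, retraining, or a rollback is a decision for the team and the process owners.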
We also recalibrate models from time to time. They need recalibration over a period of time; they are not for life. That is why it is very important to keep an eye on the deployment stage and on what follows it.
By the way, before I move on: model building is not a simple waterfall. It is not a linear route. What you see here is a directed acyclic graph which starts from the left and goes to the right, but model building is usually iterations inside iterations, loops inside loops. From one particular stage, say the tuning of models, I might have to go back to the very first stage of collecting the data. Probably I realize later on that the data I have is not sufficient, or that the quality of the attributes I have is not good enough to get me the results, so I might have to go back to stage one. So model building is an iterative exercise, and you will have many iterations in it. What I have shown here is a simple directed acyclic graph from left to right, which is one way of looking at it at a very high level, but it is always an iterative exercise.
The reason I am talking through these different stages is that at every stage of the model-building exercise you need certain kinds of experts with certain kinds of skills.
For example, if you are doing data collection and data is coming from various sources within and outside your organization, you might need people who have expertise in ETL tools. You might need people who have expertise in streaming tools such as Kafka or Flume. You might need people with knowledge of Hadoop to create your data lakes there. When you come to exploratory data analytics, you obviously need people who know some amount of statistics and mathematics. When it comes to the transformation of the data, addressing missing values, outliers, and so on, once again statistics comes into play. Domain expertise is also required here, because when you generate new features from existing features, do the new features make sense? Domain expertise will be called into play.
When you create models, you need people who have expertise in the various algorithms that are used to build models. In the same way, when you come to the tuning of the models, you again need expertise in the algorithms with which you built them. For deployment of the model you need process engineers, people who know the different processes. So at every stage of the model-building exercise you need experts with certain skills, and hence model building is never a single person's responsibility. It is not one data scientist acting like a hero. Model building is always a team exercise, always a team project. There is a team involved in building any model.
What I am trying to do here is represent that team in the form of a pyramid. In this pyramid I have tried to place the various roles that I have come across in data science projects. The x-axis of the pyramid represents the strength, that is, the ratio of the number of people in the team, and the y-axis represents the amount of experience required to be in that role. All roles are important; you can't do a serious data science project with even one of these roles missing. So all roles are equally important, but the strength and the experience required will be different. For example, start with the developers, who are kind of the backbone of this entire project. These are the people who have to have skills in the required languages, such as Python, R, Java, C, or any other language.
These developers are supposed to build the most efficient and most effective applications, with no bugs at all. Testers are responsible for identifying as many bugs as possible in what the developers have done.
Data architects and data engineers are responsible for defining the structures, defining the flow of the data sets, and defining the storage systems where data needs to be stored and how data will flow from point A to point B. Data engineers are also responsible for generating additional features: how do you enrich the data beyond what it is?
Big data experts are all those people who are into big data technologies. They know the Hadoop file system inside out, which is where the data lakes have to be created, and they are well versed with the Spark execution engine. So these are the people who know the ecosystem of big data, the various tools and technologies that are put together to make the entire flow of data storage and processing possible. Of course, not all projects require modeling to be done in a big data environment, but these days knowledge of big data and cloud technology is becoming important.
Junior data scientists are basically people who act as support to the core data scientist. They take responsibility for various activities and help the data scientist in building the models. They work very closely with the technical team and the data scientist and act as a bridge between them. But once again, not all projects need to have a junior data scientist or associate data scientist.
At the top, at the pinnacle, I have shown a senior data scientist.