Okay, last talk of the day. I have here Nins, a software engineer who has metamorphosed into a machine learning engineer and recently finished his Master of Research with a focus on agriculture and image processing. He's going to be talking to us today about a simple way to validate and monitor the performance of ML applications, so a round of applause, please.

Thank you. Hi, hello, can you hear me okay? It's nice to be back at a live event. Hi everyone, I first want to thank the organizers for putting this event together; it's definitely a good one for all of us. So for today, I'm going to talk about a simple way to validate and monitor the performance of your machine learning applications.

A little bit about myself: I mountain bike, I love Dota 2, and you can talk to me about software engineering, data science, and machine learning. I love to lie on grass, as is evident from the picture, and I like custard buns; they're really good. Anyway, let's go.

For today, I hope I can impart some learnings to you, and I hope you get a lot out of this session. We're just going to briefly go through these points right here. First, let's try to identify what model validation is in the context of a machine learning application, to be exact: how do we identify trends, and why do you need validation, specifically right now, when most of us are into AI, ChatGPT, and so on? Then, how do we choose the correct validation metric, and what metrics are available to us? Then let's build a simple model validation module in Python; I'm going to show you an example of one of the simplest ways that we approached this
problem in my previous work, and I'm still using it as of today. And then we're going to briefly talk about retraining and model maintenance: stability versus retraining, how long your model stays in production, and the maintenance cost of those models.

Now, every time I start a machine learning project or talk with data scientists, I always ask these questions. We always discuss whether there is any ground-truth validation data available. Going forward, how can we validate the results of the model? What metrics do we use? (As I mentioned, we're going to go through this later.) Is there going to be a change in the distribution of the features, or are there going to be changes in the relationship between the features and the target? When I deploy the application, will the deployment environment change over time? And lastly, with the data scientists I'm working with, we always talk about how we retrain the model, how we adjust it, and how we can make sure the model is always, you know, performing at its best.

Now, let's talk about performance change. I believe there's going to be a very specific talk about this tomorrow, so we're just going to cover some of the basics and the underlying concepts. Model performance changes: it decreases over time, and that's almost always the case. It can be the result of several factors, mainly: concept drift, where the nature of the problem changes; or data drift, where the distribution of the data and the relationships within it change, which happens most of the time. It can also be an actual deployment setup problem: someone accidentally pushed a new version of numpy in the requirements.txt, and the performance changed. Models are always going to perform at their best right after training. Once you deploy a model, that's its peak, unless you retrain it, because it's going to encounter a new data set eventually.

And sometimes these changes can cause unintentional bias, which is not always bad, because in some contexts or in some problems putting in bias is actually quite good; it depends on the context and it depends on the problem.

Now, just a brief preview of concept drift: it typically involves the problem that the model is trying to solve suddenly changing. I drew that graph; I'm not sure if it's good or not, but it's cute. Here we can see that initially the model is predicting the data points in blue correctly; the data points in orange are the ones used for training. And then suddenly something happened, and it's misdetecting everything. These things actually happen in the wild, and in most development environments, as we've seen before. One example would be COVID-19 prediction: when a variant changes, or if there's a new
variant, essentially it's going to have a different interaction: transmissibility will change, other factors will change, so essentially you're not predicting the same problem anymore. This could happen suddenly, like this one, or it can happen gradually, incrementally, or it can be recurring. Sometimes recurring concept drift happens with things like weather data sets; it's quite common. But I think the most common kind is the gradual one, because sometimes you don't notice it happening, and then suddenly the entire problem changes and your predictions change as well.

There's also data drift. I think this is easier to understand: it basically means you created the model using a certain data range or data distribution, and then as you go forward, new kinds of data distributions occur. This is very common in financial models or demographic data, so if you're dealing with those kinds of problems.
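To make the data-drift idea concrete, here is a small sketch (my own illustration, not something from the talk) of the population stability index, one common way to put a number on how far a live feature distribution has moved away from the training-time distribution; the thresholds in the docstring are a conventional rule of thumb:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time sample and a live sample.
    Conventional rule of thumb: < 0.1 stable, 0.1-0.25 moderate
    shift, > 0.25 significant drift."""
    # Bin edges come from the training-time distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip so an empty bin does not produce log(0).
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 5_000)         # feature values at training time
live_same = rng.normal(0.0, 1.0, 5_000)     # production data, no drift
live_shifted = rng.normal(1.5, 1.0, 5_000)  # production data, mean has drifted

print(population_stability_index(train, live_same))     # small, well under 0.1
print(population_stability_index(train, live_shifted))  # large, well over 0.25
```

A check like this only flags that the inputs have moved; whether the model's outputs are still acceptable is a separate question, which is what the validation module discussed in this talk is for.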
You can expect to encounter this, and it's usually easily fixed by just retraining the model; we'll discuss more about retraining later.

Now, why is it important? You know, Morty... I'm a fan of Rick and Morty. Detecting and identifying these changes, these trends in the performance of the model, is actually quite important, because it's a good indicator of your machine learning application, of how it behaves. People talk about trying to make sure their machine learning applications keep performing well over time, and detecting these changes is a good indication of what's happening in your application. Understanding these changes also lets you make decisions efficiently about retraining and adjustments to your model. We had a project before where we were doing a factory line, so you need to predict really fast and then retrain really fast, something
like that. So if you understand how the performance shifts, you can easily retrain based on the parameters you want, and so on. And I think one of the most interesting things that can happen is that you get to understand more of the data set: if you're able to detect these changes in model performance, then new patterns arise, new features can be extracted, and so on. This is part of feature engineering, and it's quite interesting that it shows up once you understand how your model performs.

So, as I mentioned earlier, here we have an example of fixing a data drift. The initial training data is limited, and as we move forward and encounter new predictions, what we can do is adjust the training data to include the new data that we encounter, in order to correct the model. Here you can clearly see that the model adjusted once we included more training data.

Now, how do we measure model performance? Let's just go through this briefly: it depends on the problem. I'm sure most of you are familiar with regression and classification problems. Depending on the problem, if you're dealing with regression, you might use mean squared error, root mean squared error, and similar metrics. If you're dealing with classification problems, then you typically use accuracy, precision, recall, F1, ROC AUC, and so on.

But what does that mean? Let's say, for example, I have this performance trend right here, and I'm measuring two metrics, say accuracy and recall. Initially the recall started really high, but at some point it went down, while the accuracy went up. Which metric should you use? Is it accuracy, because over time it goes up, or is it recall? It depends on the problem and on its context. There are instances where this really matters: for example, recall is often preferred for medical data sets, where false positives are acceptable but missed positives are not, and there are also certain kinds of problems where you might prefer accuracy or F1. It depends on the circumstances. In our use case, and in my experience, when we're trying to monitor the performance of a model we typically use two, three, up to five metrics for a given model. So it's not just one or two; minimum two, yes, but the more the merrier in this case. The more metrics you use, the more information you have and the better you understand the behavior of the model.

Now, how do we build a simple (that's the key term, simple) validation module? How do we integrate this into our application?
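As a quick refresher on what those classification numbers actually compute, here is a hedged sketch from raw confusion counts; the toy labels are mine, and in a real project you would normally reach for scikit-learn's metrics module instead:

```python
import numpy as np

def classification_report_lite(y_true, y_pred):
    """Accuracy, precision, recall and F1 for a binary problem,
    computed directly from the confusion counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # made-up ground truth
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]  # made-up model output
print(classification_report_lite(y_true, y_pred))
# accuracy 0.7, precision 0.6, recall 0.75, f1 ~0.667
```

Tracking several of these at once, as described above, is exactly what surfaces trade-offs like the recall-versus-accuracy divergence on the slide.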
So there are several ways for you to detect data drift and concept drift. There are statistical tests, which are quite technical, and some of us don't want to deal with those kinds of numbers and approaches. There are also several drift detection algorithms, but typically you need a lot of data for them to work, whereas we want to detect these changes on the go, as they happen. It needs to be simple, so that it can be integrated into any kind of application we want, and the model validation function should be easily understandable, because as developers we want maintainable, easy-to-understand code. So, you know, show me what you got (if you're familiar with the reference). Anyway, that's the thing: we need to catch the early signs. If you look at my poorly drawn graph right there, we need to catch those instances, the circled ones, those data points.

What we learned while building these machine learning applications is that we need to regularly check for performance change. There are applications we have where, maybe every five minutes, we check whether there's been a performance change, because they're running a thousand predictions every second. And one of the things we learned is: let's just assume that the outliers are wrong predictions. Okay, I have no bad blood with outliers, but in this instance it's safe to assume that there's something wrong with those outliers, and we want to understand why they are outliers. So yes, just assume they're wrong predictions.

The next step is to prepare a good validation data set. I'll show you a simple diagram later, but essentially it's different from your testing or training data sets. This is a data set that you can anchor to the model (that's the term we're using),
or attach to the deployed model, so that you have a consistent understanding of its performance; you'll see that later. And ideally you should have a secondary model, a stronger, more powerful model, for automated validation. Or sometimes (actually, most of the time) we do manual validation of sampled incoming data, so that we're sure the application is performing really well. That's very useful for retraining or creating a new version of the model.

Now, here's a basic overview of what typically happens during model development and deploying it into an application. Initially you have model development, where the data scientists and machine learning engineers are involved, using training and testing data sets to produce the best model they can. After that, we use that model in an application, and typically we record... no, you should record the results in a database; you should keep track of the results of your model. That's the key part right there. And then lastly, in most of the applications we've built, we have two ways of doing model validation, of checking the performance of the model: one is using the validation data set, and the other is double-checking the incoming data, the new data that the model encounters.

Now let's take it slow, and treat this as a bit of storytelling; we're going to go through each step. First, data scientists and machine learning engineers typically create a really good model using testing and training data sets. I think most of us are familiar with that step. After that, what we typically do is prepare a separate validation data set, and then we place that model into a controlled environment that's very similar to production, ideally. Then we run the validation data set on the model several times; maybe you're using k-fold or some other kind of sampling, whatever, it's up to you. So we run it and run it and run it, and we record the results. From that we get an acceptable result metric that we record. In this instance, for example: this application setup, paired with this model and this validation data set, should produce an accuracy of 80 to 82 percent and an F1 score of 80 to 81, with 100 predictions in 10 seconds. So you see, we're trying to figure out the constraints of this model in this given environment. Take note of those metrics right there; we typically record them. And then we deploy the model in the application's production setup. The model does its job: users use the application, the model does the prediction, and it stores the results in the database.
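That recorded "acceptable result" can be captured as data. The dictionary and helper below are my own sketch, with the numbers mirroring the example just given:

```python
# Hypothetical recorded baseline for one (model, validation set, environment)
# combination; the numbers mirror the example from the talk.
BASELINE = {
    "accuracy": (0.80, 0.82),
    "f1": (0.80, 0.81),
    "max_seconds_per_100_predictions": 10.0,
}

def within_baseline(accuracy, f1, seconds_per_100):
    """True if a validation run is consistent with the recorded bounds."""
    acc_lo, acc_hi = BASELINE["accuracy"]
    f1_lo, f1_hi = BASELINE["f1"]
    return bool(acc_lo <= accuracy <= acc_hi
                and f1_lo <= f1 <= f1_hi
                and seconds_per_100 <= BASELINE["max_seconds_per_100_predictions"])

print(within_baseline(0.81, 0.805, 8.5))  # True: consistent with the record
print(within_baseline(0.86, 0.805, 8.5))  # False: even "better" accuracy is suspicious
```

The reason to record a range rather than a single number is that a result outside the bounds in either direction, better or worse, is a signal that something in the setup has changed.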
After that, we want to identify whether the model is still doing well as we use the application. So again, we go back to the initial approach: inside the application we have the validation data set, we apply that validation data set to the model, and we check whether the result is still consistent with the recorded results from our testing.

So here is a very simple FastAPI example. Some of our microservices are built with FastAPI, so we're just using the repeat decorator. The first thing we do is load the validation data set. After loading it, we perform the prediction on the validation data set, here (oh good, you can see it). We record the start time and the end time just to get the duration, right here, and then we basically get the F1 and the accuracy, or whichever metrics you are using. After this, we're basically just comparing whether the accuracy, F1, and duration are still within this range. So every time this validation runs (in this case it runs every day, because the frequency is per day), it should always fall within those bounds. If it's not there, if it decreases or if it increases, something might have gone wrong, or something might have changed in the setup.

Okay, so what's it for? Initially, when you look at this setup, it seems pretty trivial, because you're just validating the model against itself every time. What are we gaining from this? First, it's quite simple: you can implement it in most of your applications easily, it's straightforward, and it's flexible enough to support multiple validation metrics. However, what we discovered is that implementing this very simple function can catch, can identify, a change in a library or dependency version, numpy for example.
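I can only sketch the slide's code from the description, so the following is a rough, framework-free reconstruction of the periodic check just walked through; the toy model, validation set, and bounds are all mine:

```python
import time

# Stand-ins for the real artefacts: in the talk this runs inside a FastAPI
# service, with the anchored validation set loaded from storage.
VALIDATION_SET = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]  # (feature, label)
ACCURACY_BOUNDS = (0.9, 1.0)  # recorded earlier in the controlled environment
MAX_DURATION_S = 1.0

def predict(x):
    return 1 if x >= 0.5 else 0  # toy threshold "model"

def run_validation():
    """Re-run the anchored validation set and compare against the record."""
    start = time.perf_counter()
    correct = sum(predict(x) == y for x, y in VALIDATION_SET)
    duration = time.perf_counter() - start
    accuracy = correct / len(VALIDATION_SET)
    lo, hi = ACCURACY_BOUNDS
    healthy = bool(lo <= accuracy <= hi and duration <= MAX_DURATION_S)
    return {"accuracy": accuracy, "duration": duration, "healthy": healthy}

print(run_validation())
```

With fastapi-utils, as I understand its API, a function like this can be scheduled by decorating it with the repeat_every decorator at application startup (for example seconds=60 * 60 * 24 for the daily frequency mentioned above), and an out-of-bounds result would then raise an alert rather than just print.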
00:20:21,780 um one specific example is that we have 524 00:20:19,500 --> 00:20:24,539 an application deployed in an EC2 525 00:20:21,780 --> 00:20:27,539 instance, and for those of you who are 526 00:20:24,539 --> 00:20:30,419 familiar, sometimes Amazon, you know, 527 00:20:27,539 --> 00:20:33,240 they sometimes update the VM that's 528 00:20:30,419 --> 00:20:36,000 underlying the EC2, it's quite annoying 529 00:20:33,240 --> 00:20:37,620 so you think that there's no update but 530 00:20:36,000 --> 00:20:40,080 then they update the underlying 531 00:20:37,620 --> 00:20:42,000 architecture of the VM, which changes 532 00:20:40,080 --> 00:20:43,380 some of the libraries, especially when we 533 00:20:42,000 --> 00:20:45,179 are using deep learning and stuff like 534 00:20:43,380 --> 00:20:47,640 that, so 535 00:20:45,179 --> 00:20:50,280 our results suddenly change and we're 536 00:20:47,640 --> 00:20:51,780 like, what, what's happening, what, 537 00:20:50,280 --> 00:20:54,059 did something change, did someone push 538 00:20:51,780 --> 00:20:55,860 something 539 00:20:54,059 --> 00:20:57,660 so that's, that's one of the use cases 540 00:20:55,860 --> 00:20:59,880 it's actually very useful to detect if 541 00:20:57,660 --> 00:21:03,000 the setup has changed 542 00:20:59,880 --> 00:21:05,039 um there are some instances where the 543 00:21:03,000 --> 00:21:07,400 parameters of the model change when 544 00:21:05,039 --> 00:21:11,100 someone pushes new code or when someone 545 00:21:07,400 --> 00:21:12,960 creates a bug fix and stuff like that 546 00:21:11,100 --> 00:21:14,640 sometimes they accidentally change a 547 00:21:12,960 --> 00:21:15,960 parameter of the model, so we can detect 548 00:21:14,640 --> 00:21:17,220 that change as well 549 00:21:15,960 --> 00:21:19,140 and 550 00:21:17,220 --> 00:21:21,960 when you have a device setup, if you're 551 00:21:19,140 --> 00:21:24,000 using uh hardware in your prediction 552 00:21:21,960 --> 00:21:25,799 this is
definitely useful, because you 553 00:21:24,000 --> 00:21:27,840 don't know if the hardware is broken, and 554 00:21:25,799 --> 00:21:30,059 if it's deployed somewhere, if someone 555 00:21:27,840 --> 00:21:32,760 tampered with the hardware, if the 556 00:21:30,059 --> 00:21:34,380 resource is not enough, so that's the 557 00:21:32,760 --> 00:21:35,880 goal of the first validation, to detect 558 00:21:34,380 --> 00:21:37,919 these kinds of changes 559 00:21:35,880 --> 00:21:41,220 okay 560 00:21:37,919 --> 00:21:43,919 now let's go to the second step, right 561 00:21:41,220 --> 00:21:46,200 so we have our application right here 562 00:21:43,919 --> 00:21:48,240 and our 563 00:21:46,200 --> 00:21:50,400 really good model 564 00:21:48,240 --> 00:21:53,100 didn't do so well on some of the data 565 00:21:50,400 --> 00:21:55,679 points, right, so it's like, I'm not sure, 566 00:21:53,100 --> 00:21:58,440 maybe I'm just 50% sure of this, maybe I'm 567 00:21:55,679 --> 00:22:01,260 70% sure of this, I'm 63% sure of this, and 568 00:21:58,440 --> 00:22:03,120 stuff like that, so in classification 569 00:22:01,260 --> 00:22:05,820 problems there is a predict probability 570 00:22:03,120 --> 00:22:08,460 which is this one, so 571 00:22:05,820 --> 00:22:10,559 this is an example of some results from 572 00:22:08,460 --> 00:22:13,679 a classification problem 573 00:22:10,559 --> 00:22:15,120 right, and then our little model right 574 00:22:13,679 --> 00:22:16,980 here is not sure 575 00:22:15,120 --> 00:22:19,620 I don't know, but 576 00:22:16,980 --> 00:22:23,280 maybe this is true or maybe this is 577 00:22:19,620 --> 00:22:25,260 false, so we record the results, right, 578 00:22:23,280 --> 00:22:27,780 even the probability 579 00:22:25,260 --> 00:22:30,080 we record it every time it performs a 580 00:22:27,780 --> 00:22:30,080 prediction 581 00:22:30,780 --> 00:22:35,520 now we have the probability, we have two 582 00:22:33,299 --> 00:22:37,860 ways of going around this problem 583
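A minimal, framework-agnostic sketch of the two ideas described so far — the periodic self-validation against recorded results, and recording and flagging the low-confidence predictions — might look like this. The baseline numbers, the tolerance, the 0.8 confidence threshold, and all function names are illustrative assumptions, not values from the talk:

```python
import time

# Baseline figures recorded from offline testing; these exact numbers
# and the tolerance are made-up placeholders, not figures from the talk.
EXPECTED = {"accuracy": 0.90, "f1": 0.88, "duration_s": 5.0}
TOLERANCE = 0.05            # allowed absolute drift, in either direction
CONFIDENCE_THRESHOLD = 0.8  # below this, a prediction is "not sure"


def binary_f1_accuracy(y_true, y_pred):
    """F1 and accuracy for binary labels, computed without any ML library."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return f1, accuracy


def run_validation(predict, x_val, y_val):
    """The periodic self-check: predict on the held-out validation set and
    confirm F1, accuracy, and duration still fall inside recorded bounds."""
    start = time.perf_counter()
    y_pred = [predict(x) for x in x_val]
    duration = time.perf_counter() - start
    f1, accuracy = binary_f1_accuracy(y_val, y_pred)
    ok = (abs(accuracy - EXPECTED["accuracy"]) <= TOLERANCE
          and abs(f1 - EXPECTED["f1"]) <= TOLERANCE
          and duration <= EXPECTED["duration_s"])
    return {"f1": f1, "accuracy": accuracy, "duration_s": duration, "ok": ok}


def flag_low_confidence(probabilities):
    """Return indices of predictions whose top class probability falls
    below the confidence threshold (the '50% sure, 63% sure' cases)."""
    return [i for i, probs in enumerate(probabilities)
            if max(probs) < CONFIDENCE_THRESHOLD]
```

In a FastAPI microservice, a job like `run_validation` can be scheduled to repeat (for example with the `repeat_every` decorator from the fastapi-utils package, with `seconds=86400` for a daily frequency), and the flagged indices are what get handed on for review.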
00:22:35,520 --> 00:22:40,440 the first one is we have this big, bad 584 00:22:37,860 --> 00:22:42,480 new model right here, right, so this is 585 00:22:40,440 --> 00:22:45,780 often our secondary model 586 00:22:42,480 --> 00:22:48,780 uh in our deployments we always have two 587 00:22:45,780 --> 00:22:50,400 or three models in place, right, so the 588 00:22:48,780 --> 00:22:52,799 first model is the one that we're using 589 00:22:50,400 --> 00:22:54,360 the one that's actually tailor-made for 590 00:22:52,799 --> 00:22:56,340 that specific problem 591 00:22:54,360 --> 00:22:59,400 and then we have another model right 592 00:22:56,340 --> 00:23:02,940 here that's probably a lot stronger, can 593 00:22:59,400 --> 00:23:05,460 classify better, but it's quite slow 594 00:23:02,940 --> 00:23:07,320 right, so there's a trade-off, but it's 595 00:23:05,460 --> 00:23:09,720 still useful, right, if it can just 596 00:23:07,320 --> 00:23:11,100 validate some sample data it's still 597 00:23:09,720 --> 00:23:13,620 useful 598 00:23:11,100 --> 00:23:15,360 so that's the model, that's the validator 599 00:23:13,620 --> 00:23:18,240 model right here, that's what it's doing 600 00:23:15,360 --> 00:23:20,940 it's trying to figure out, for the results 601 00:23:18,240 --> 00:23:23,280 of this main model that we have 602 00:23:20,940 --> 00:23:25,559 with the low confidence scores, it's trying 603 00:23:23,280 --> 00:23:29,580 to identify, or it's trying to classify 604 00:23:25,559 --> 00:23:31,559 them with better confidence, so the goal is 605 00:23:29,580 --> 00:23:32,340 these classifications right here should 606 00:23:31,559 --> 00:23:34,500 be 607 00:23:32,340 --> 00:23:36,659 you know 608 00:23:34,500 --> 00:23:38,059 they should have a solid classification, they 609 00:23:36,659 --> 00:23:42,179 should be at this 80% 610 00:23:38,059 --> 00:23:44,700 confidence, uh maybe even higher, right 611 00:23:42,179 --> 00:23:46,440 and if you're using this you can do an 612 00:23:44,700 --> 00:23:49,080
automatic retraining if you want, well 613 00:23:46,440 --> 00:23:50,159 I'm going to show you some code snippets 614 00:23:49,080 --> 00:23:52,980 later 615 00:23:50,159 --> 00:23:54,600 uh in some instances we're working with 616 00:23:52,980 --> 00:23:57,240 several 617 00:23:54,600 --> 00:23:58,559 research groups for example 618 00:23:57,240 --> 00:24:00,900 um and then when they're using their 619 00:23:58,559 --> 00:24:03,720 model in the application, what they 620 00:24:00,900 --> 00:24:07,440 wanted is they want to see 621 00:24:03,720 --> 00:24:08,700 the sampled data set right here, so they 622 00:24:07,440 --> 00:24:10,860 want to be able to see it and they want 623 00:24:08,700 --> 00:24:12,659 to be able to verify it themselves, this 624 00:24:10,860 --> 00:24:14,179 is very crucial especially when you're 625 00:24:12,659 --> 00:24:16,860 working with a 626 00:24:14,179 --> 00:24:20,280 medical type of problem, or maybe 627 00:24:16,860 --> 00:24:21,240 research, research in biology, or you 628 00:24:20,280 --> 00:24:23,100 know 629 00:24:21,240 --> 00:24:24,960 so they really want to see their data 630 00:24:23,100 --> 00:24:27,720 they want to be intimate with their data 631 00:24:24,960 --> 00:24:30,480 so in this case what we do is, after we 632 00:24:27,720 --> 00:24:32,580 build the application, we sample this 633 00:24:30,480 --> 00:24:36,960 data set right here and then we give it 634 00:24:32,580 --> 00:24:39,960 to them through a report, right, and then 635 00:24:36,960 --> 00:24:42,120 that's it, they validate it and then they 636 00:24:39,960 --> 00:24:45,480 can retrain based on it 637 00:24:42,120 --> 00:24:47,580 so this is sample code for manual 638 00:24:45,480 --> 00:24:48,960 retraining, so 639 00:24:47,580 --> 00:24:50,880 we just 640 00:24:48,960 --> 00:24:52,919 right here we get the sampled results 641 00:24:50,880 --> 00:24:53,640 that have a very low confidence score and 642 00:24:52,919 --> 00:24:56,039 then 643 00:24:53,640 -->
00:24:58,880 basically send it to the user 644 00:24:56,039 --> 00:24:58,880 do whatever you want 645 00:24:59,520 --> 00:25:04,380 or we can go the automated route, in 646 00:25:02,640 --> 00:25:05,640 this case after we get the sampled 647 00:25:04,380 --> 00:25:08,640 results 648 00:25:05,640 --> 00:25:11,940 right, we have a second validator that 649 00:25:08,640 --> 00:25:13,980 predicts on the flagged data set, and then we 650 00:25:11,940 --> 00:25:15,120 compare the accuracy or whatever metric 651 00:25:13,980 --> 00:25:16,919 you want to use 652 00:25:15,120 --> 00:25:19,860 if we want we can just directly 653 00:25:16,919 --> 00:25:22,620 overwrite the results using the 654 00:25:19,860 --> 00:25:25,559 stronger model, or we can 655 00:25:22,620 --> 00:25:27,720 decide on what to do, maybe the doctors 656 00:25:25,559 --> 00:25:30,000 check this, we overwrite, and then retrain 657 00:25:27,720 --> 00:25:31,080 okay 658 00:25:30,000 --> 00:25:33,179 now 659 00:25:31,080 --> 00:25:35,400 this is really good 660 00:25:33,179 --> 00:25:38,400 for detecting early signs of drift 661 00:25:35,400 --> 00:25:40,320 model drift, concept drift, and possible 662 00:25:38,400 --> 00:25:42,960 changes in the data distribution 663 00:25:40,320 --> 00:25:44,760 so this is what we usually have in place 664 00:25:42,960 --> 00:25:46,559 so that we can identify if there's a 665 00:25:44,760 --> 00:25:48,000 problem, or if there's going to be a 666 00:25:46,559 --> 00:25:49,980 problem, with the performance of the 667 00:25:48,000 --> 00:25:51,240 model 668 00:25:49,980 --> 00:25:52,860 now 669 00:25:51,240 --> 00:25:55,020 lastly we're going to talk about model 670 00:25:52,860 --> 00:25:56,220 maintenance 671 00:25:55,020 --> 00:25:58,559 um 672 00:25:56,220 --> 00:26:01,580 here we have the concept of model 673 00:25:58,559 --> 00:26:01,580 stability versus 674 00:26:01,679 --> 00:26:05,940 retraining, sorry 675 00:26:03,360 --> 00:26:07,860 so models are more stable if you don't 676
00:26:05,940 --> 00:26:11,220 need to retrain them from time to time 677 00:26:07,860 --> 00:26:13,980 like this one, the first model, and 678 00:26:11,220 --> 00:26:16,860 it's less prone to model and data drift 679 00:26:13,980 --> 00:26:19,080 right, however they might require more 680 00:26:16,860 --> 00:26:20,940 time to develop, because you need 681 00:26:19,080 --> 00:26:23,159 more data, you need to gather more 682 00:26:20,940 --> 00:26:25,140 resources, to make the model more stable 683 00:26:23,159 --> 00:26:26,940 unlike the second model right here, where 684 00:26:25,140 --> 00:26:29,840 you need to retrain periodically if the 685 00:26:26,940 --> 00:26:29,840 performance goes down 686 00:26:30,900 --> 00:26:34,080 however 687 00:26:32,159 --> 00:26:36,539 you need to understand that if you 688 00:26:34,080 --> 00:26:37,500 retrain the model over a longer period of 689 00:26:36,539 --> 00:26:40,140 time 690 00:26:37,500 --> 00:26:41,940 it seems that it's actually quite stable 691 00:26:40,140 --> 00:26:43,919 right, as long as you have a good 692 00:26:41,940 --> 00:26:46,320 retraining process 693 00:26:43,919 --> 00:26:49,039 and it benefits from the new data set 694 00:26:46,320 --> 00:26:49,039 that it learns 695 00:26:50,220 --> 00:26:55,380 now for model longevity, it depends 696 00:26:52,860 --> 00:26:57,840 on the context and the expected inputs 697 00:26:55,380 --> 00:26:59,700 if it's just a small problem, then 698 00:26:57,840 --> 00:27:02,159 typically you're gonna have a model 699 00:26:59,700 --> 00:27:03,720 deployed somewhere over a long period of 700 00:27:02,159 --> 00:27:04,740 time, because it doesn't require any 701 00:27:03,720 --> 00:27:06,840 change 702 00:27:04,740 --> 00:27:09,240 it's similar with a dynamic relationship, 703 00:27:06,840 --> 00:27:11,039 like if you need to retrain models 704 00:27:09,240 --> 00:27:12,960 from time to time, the data that it 705 00:27:11,039 --> 00:27:15,659 encounters is dynamic, so you need to 706
00:27:12,960 --> 00:27:17,460 replace the model, and most of the time 707 00:27:15,659 --> 00:27:20,340 you need to retrain them 708 00:27:17,460 --> 00:27:22,020 and lastly, this is my last slide, the 709 00:27:20,340 --> 00:27:23,880 cost of maintenance, you always need to 710 00:27:22,020 --> 00:27:26,520 consider this, if you want to produce 711 00:27:23,880 --> 00:27:29,340 stable models you might have more 712 00:27:26,520 --> 00:27:30,720 upfront cost, again as I mentioned, stable 713 00:27:29,340 --> 00:27:33,900 models are 714 00:27:30,720 --> 00:27:36,779 more expensive to develop, right, and you 715 00:27:33,900 --> 00:27:39,000 know, for models that 716 00:27:36,779 --> 00:27:40,980 need constant retraining, it's better to 717 00:27:39,000 --> 00:27:42,779 automate them to reduce the cost, and you 718 00:27:40,980 --> 00:27:43,919 can always use transfer learning if you 719 00:27:42,779 --> 00:27:47,059 want 720 00:27:43,919 --> 00:27:47,059 and that's the end of my slides 721 00:27:47,159 --> 00:27:51,140 questions 722 00:27:48,720 --> 00:27:51,140 hey 723 00:27:51,720 --> 00:27:54,559 thank you Nins 724 00:27:55,080 --> 00:27:59,340 very very interesting, very insightful 725 00:27:57,480 --> 00:28:01,640 now we've probably got time for some 726 00:27:59,340 --> 00:28:01,640 questions 727 00:28:01,980 --> 00:28:08,120 I can't see anything from here, because 728 00:28:04,860 --> 00:28:08,120 someone over there I think 729 00:28:13,559 --> 00:28:18,539 hi 730 00:28:15,539 --> 00:28:21,960 um you mentioned uh retraining the model 731 00:28:18,539 --> 00:28:24,900 as the model accuracy dips over time 732 00:28:21,960 --> 00:28:26,400 and I just wanted to ask, like, 733 00:28:24,900 --> 00:28:28,580 what are some techniques you have to 734 00:28:26,400 --> 00:28:31,860 avoid model overfitting as you retrain 735 00:28:28,580 --> 00:28:35,279 particularly with things like 736 00:28:31,860 --> 00:28:38,159 unidentified seasonal data, and um 737
00:28:35,279 --> 00:28:39,659 uh yeah, just how you would avoid that 738 00:28:38,159 --> 00:28:42,539 especially when you automate the 739 00:28:39,659 --> 00:28:44,100 retraining process 740 00:28:42,539 --> 00:28:46,200 sure, and I think we get this 741 00:28:44,100 --> 00:28:48,000 question a lot, right, remember in the 742 00:28:46,200 --> 00:28:50,100 earlier part when I told you to use more 743 00:28:48,000 --> 00:28:52,559 than one metric 744 00:28:50,100 --> 00:28:54,419 use five, use 10 metrics, to be able to 745 00:28:52,559 --> 00:28:57,480 fully understand the behavior of your 746 00:28:54,419 --> 00:29:00,480 data and your model, right, so for example 747 00:28:57,480 --> 00:29:02,820 this is a real-life scenario, like um we 748 00:29:00,480 --> 00:29:03,900 have a model deployed in production and 749 00:29:02,820 --> 00:29:07,020 then 750 00:29:03,900 --> 00:29:09,059 the F1 score continually goes down 751 00:29:07,020 --> 00:29:12,120 right, it continually goes down 752 00:29:09,059 --> 00:29:14,940 however, if you look at uh 753 00:29:12,120 --> 00:29:17,159 recall and the log loss function 754 00:29:14,940 --> 00:29:20,760 the performance is still okay 755 00:29:17,159 --> 00:29:23,520 right, if we follow the F1 score we 756 00:29:20,760 --> 00:29:26,100 constantly need to retrain the model 757 00:29:23,520 --> 00:29:31,020 and as a result 758 00:29:26,100 --> 00:29:33,539 the subsequent models would have become overfit 759 00:29:31,020 --> 00:29:36,360 however, we did not do the retraining 760 00:29:33,539 --> 00:29:38,700 because we have the log loss and we have 761 00:29:36,360 --> 00:29:41,880 the other metrics that allowed us to 762 00:29:38,700 --> 00:29:43,860 understand that, okay, it's okay if the F1 763 00:29:41,880 --> 00:29:46,260 score decreases, because 764 00:29:43,860 --> 00:29:47,760 it would be overfit if we retrained, it's 765 00:29:46,260 --> 00:29:50,520 not the target metric that we want to 766 00:29:47,760 --> 00:29:52,919 use anyway, so you know
that's one of the 767 00:29:50,520 --> 00:29:53,700 ways that we approach the problem, and I 768 00:29:52,919 --> 00:29:55,679 think 769 00:29:53,700 --> 00:29:57,179 um you should have a hands-on, or you 770 00:29:55,679 --> 00:29:58,919 should try it hands-on, so that you can 771 00:29:57,179 --> 00:30:02,460 experience it yourself, right, because 772 00:29:58,919 --> 00:30:03,779 it's uh it's something that um you'll 773 00:30:02,460 --> 00:30:05,760 definitely see in the patterns of the 774 00:30:03,779 --> 00:30:07,860 data once you try it yourself 775 00:30:05,760 --> 00:30:10,080 so thank you, use metrics, use as many 776 00:30:07,860 --> 00:30:12,539 metrics as you want 777 00:30:10,080 --> 00:30:14,640 questions, thank you, any other questions 778 00:30:12,539 --> 00:30:17,600 oh I can see 779 00:30:14,640 --> 00:30:17,600 this one over there 780 00:30:18,539 --> 00:30:23,340 um I've got a bit of a two-parter 781 00:30:21,120 --> 00:30:27,120 um so you mentioned the test/train/ 782 00:30:23,340 --> 00:30:29,039 validate split, uh do you have preferred 783 00:30:27,120 --> 00:30:31,679 proportions for splitting your initial 784 00:30:29,039 --> 00:30:33,960 data set into test/train/validate? ah yes 785 00:30:31,679 --> 00:30:36,059 they vary with the size of the data 786 00:30:33,960 --> 00:30:37,799 available. the second part of my question 787 00:30:36,059 --> 00:30:39,980 was, it sounds like you're picking 788 00:30:37,799 --> 00:30:42,120 low confidence predictions and you're 789 00:30:39,980 --> 00:30:45,720 labeling those samples and using them to 790 00:30:42,120 --> 00:30:47,760 bulk up your data set in production, if 791 00:30:45,720 --> 00:30:49,260 you're picking actual samples submitted 792 00:30:47,760 --> 00:30:51,240 to you, have you ever come across privacy 793 00:30:49,260 --> 00:30:54,240 concerns with that 794 00:30:51,240 --> 00:30:55,799 okay so I'll answer the distribution 795 00:30:54,240 --> 00:30:57,419 problem first 796 00:30:55,799 -->
00:31:01,500 um typically 797 00:30:57,419 --> 00:31:04,799 uh, for example, let's say we have 100 798 00:31:01,500 --> 00:31:07,440 data points, right, so typically what we 799 00:31:04,799 --> 00:31:10,020 did before is a 60% training 800 00:31:07,440 --> 00:31:12,779 data set, 20% 801 00:31:10,020 --> 00:31:14,159 um testing, and then 20% validation 802 00:31:12,779 --> 00:31:16,260 however 803 00:31:14,159 --> 00:31:18,659 uh we discovered that 804 00:31:16,260 --> 00:31:22,320 ideally we would want that validation 805 00:31:18,659 --> 00:31:24,720 data set to be sampled differently 806 00:31:22,320 --> 00:31:28,679 from the training and the testing data 807 00:31:24,720 --> 00:31:31,080 set, so if we have an option to ask the 808 00:31:28,679 --> 00:31:33,120 customer, hey look, this is the initial 809 00:31:31,080 --> 00:31:36,000 data that you provided, we can train with 810 00:31:33,120 --> 00:31:38,700 this data set and then we can use some 811 00:31:36,000 --> 00:31:42,600 part of it as a testing data set, however 812 00:31:38,700 --> 00:31:44,460 would you be willing to extend your data 813 00:31:42,600 --> 00:31:46,740 set maybe by another month, like for 814 00:31:44,460 --> 00:31:48,240 example if we're building an 815 00:31:46,740 --> 00:31:49,980 application for them 816 00:31:48,240 --> 00:31:52,799 they're gonna give us another month's 817 00:31:49,980 --> 00:31:56,159 worth of data after we finish the model 818 00:31:52,799 --> 00:31:57,480 and that will be the validation data set 819 00:31:56,159 --> 00:31:59,279 so 820 00:31:57,480 --> 00:32:01,200 we figured, or at least the data 821 00:31:59,279 --> 00:32:03,840 scientists in my team, they figured out 822 00:32:01,200 --> 00:32:06,179 that it's more organic that way, they 823 00:32:03,840 --> 00:32:07,860 were able to capture the relationships 824 00:32:06,179 --> 00:32:10,260 of the data better, because 825 00:32:07,860 --> 00:32:12,240 let's face it, if someone gives you a 826 00:32:10,260 -->
00:32:13,860 training data set and a testing data set 827 00:32:12,240 --> 00:32:15,840 there will always be some form of bias 828 00:32:13,860 --> 00:32:17,880 going on in there, right, so they will say 829 00:32:15,840 --> 00:32:19,380 this is our target problem, and 830 00:32:17,880 --> 00:32:22,200 oftentimes they will give you a really 831 00:32:19,380 --> 00:32:24,480 clean or a really standard relationship 832 00:32:22,200 --> 00:32:27,179 between the data, so that's what we 833 00:32:24,480 --> 00:32:29,039 typically do, we separate that 834 00:32:27,179 --> 00:32:31,799 training and testing data set from the 835 00:32:29,039 --> 00:32:34,020 validation data set, right, so 836 00:32:31,799 --> 00:32:36,539 if you can do that, then do that, right 837 00:32:34,020 --> 00:32:38,279 I'm sorry, what's your second question? uh 838 00:32:36,539 --> 00:32:40,740 it sounded like you were 839 00:32:38,279 --> 00:32:42,539 picking low confidence predictions from the 840 00:32:40,740 --> 00:32:45,000 production use case of the model and 841 00:32:42,539 --> 00:32:46,559 then including them in your data set, if 842 00:32:45,000 --> 00:32:48,299 you are doing that, have you come across 843 00:32:46,559 --> 00:32:49,620 privacy concerns, or are you just not 844 00:32:48,299 --> 00:32:52,260 processing data where that's a problem 845 00:32:49,620 --> 00:32:54,059 oh yeah, so we're picking the low 846 00:32:52,260 --> 00:32:56,520 confidence predictions 847 00:32:54,059 --> 00:32:58,500 just for validation purposes, right, it's 848 00:32:56,520 --> 00:33:01,140 still up to the data scientists if they 849 00:32:58,500 --> 00:33:01,980 want to include those data points in the 850 00:33:01,140 --> 00:33:04,320 prediction 851 00:33:01,980 --> 00:33:06,600 right, we just want to understand why 852 00:33:04,320 --> 00:33:09,120 these guys have low prediction scores 853 00:33:06,600 --> 00:33:10,679 and if it makes sense to include them in 854 00:33:09,120 --> 00:33:12,299 the training data set, or if it makes
855 00:33:10,679 --> 00:33:13,260 sense to create a new model out of them 856 00:33:12,299 --> 00:33:16,080 then 857 00:33:13,260 --> 00:33:18,539 of course we will do that, right, so it's 858 00:33:16,080 --> 00:33:20,580 not, it's not, we just, you know, pick them 859 00:33:18,539 --> 00:33:22,140 and then include them right away, uh 860 00:33:20,580 --> 00:33:23,880 there are some instances, like what I've 861 00:33:22,140 --> 00:33:25,620 mentioned, where if we're really 862 00:33:23,880 --> 00:33:28,140 confident with the model, like if we have 863 00:33:25,620 --> 00:33:30,539 a big model, you know, ready 864 00:33:28,140 --> 00:33:32,640 for those data points, we just overwrite 865 00:33:30,539 --> 00:33:34,500 them completely, but in most cases 866 00:33:32,640 --> 00:33:36,899 there's always another layer of choosing 867 00:33:34,500 --> 00:33:38,760 which of those low confidence data 868 00:33:36,899 --> 00:33:40,679 points we should include in the next 869 00:33:38,760 --> 00:33:42,419 training cycle, right, so there are other 870 00:33:40,679 --> 00:33:44,880 statistical tests that we perform on 871 00:33:42,419 --> 00:33:47,220 those low confidence data points to make sure 872 00:33:44,880 --> 00:33:49,320 that we're including a data point that 873 00:33:47,220 --> 00:33:51,720 represents, or has a good representation 874 00:33:49,320 --> 00:33:54,960 of, the actual problem that we're facing 875 00:33:51,720 --> 00:33:56,640 so, does that answer your question? 876 00:33:54,960 --> 00:33:58,440 I think it just sounds like you're not 877 00:33:56,640 --> 00:34:00,179 processing data that's personally 878 00:33:58,440 --> 00:34:01,919 identifiable, so you're free to do that 879 00:34:00,179 --> 00:34:04,200 which is great, yeah yeah, basically yes 880 00:34:01,919 --> 00:34:06,720 so 881 00:34:04,200 --> 00:34:09,240 okay, thank you. I have a stupid question 882 00:34:06,720 --> 00:34:10,679 perhaps. no no, questions are stupid, all right 883 00:34:09,240 -->
00:34:13,619 wait until you hear it 884 00:34:10,679 --> 00:34:14,820 um if you've got a super brain model, why 885 00:34:13,619 --> 00:34:16,919 aren't you just using that in production 886 00:34:14,820 --> 00:34:18,720 which one? if you've got a super duper 887 00:34:16,919 --> 00:34:21,240 model that's, like, 888 00:34:18,720 --> 00:34:22,679 more accurate, why wouldn't you use that one 889 00:34:21,240 --> 00:34:25,099 in production? well yeah, that's the thing 890 00:34:22,679 --> 00:34:25,099 I mean 891 00:34:25,320 --> 00:34:30,480 in my opinion people are obsessed with, 892 00:34:28,139 --> 00:34:32,580 you know, this very generalized model 893 00:34:30,480 --> 00:34:35,940 that can perform everything really fast, 894 00:34:32,580 --> 00:34:37,260 very accurate, it doesn't exist 895 00:34:35,940 --> 00:34:39,780 not yet 896 00:34:37,260 --> 00:34:41,879 okay, it doesn't exist, or at least not 897 00:34:39,780 --> 00:34:43,679 yet, right, based on our experience there 898 00:34:41,879 --> 00:34:46,560 will always be some sort of trade-offs, 899 00:34:43,679 --> 00:34:49,020 like, uh, you need to communicate this to 900 00:34:46,560 --> 00:34:51,599 your customers or to your partners, one 901 00:34:49,020 --> 00:34:54,839 example would be the factory line setup 902 00:34:51,599 --> 00:34:56,700 right, we need to perform prediction on a 903 00:34:54,839 --> 00:34:59,640 two-megabyte device 904 00:34:56,700 --> 00:35:01,940 so it's very difficult to put a 905 00:34:59,640 --> 00:35:05,099 really powerful model inside that device 906 00:35:01,940 --> 00:35:06,420 yeah, it has small memory, so you know, it 907 00:35:05,099 --> 00:35:08,940 depends on the context, and maybe 908 00:35:06,420 --> 00:35:13,140 someday, maybe someday, hopefully 909 00:35:08,940 --> 00:35:15,420 hopefully soon enough, or not, we will 910 00:35:13,140 --> 00:35:17,040 have that really big AI that can 911 00:35:15,420 --> 00:35:19,380 generalize and predict most of our 912 00:35:17,040 --> 00:35:21,119
problems, maybe, maybe. thanks. okay, thank 913 00:35:19,380 --> 00:35:23,540 you, thank you Nins, and here's a token of 914 00:35:21,119 --> 00:35:26,550 our appreciation 915 00:35:23,540 --> 00:35:29,769 thank you, round of applause, thanks guys 916 00:35:26,550 --> 00:35:29,769 [Applause]
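The multi-metric monitoring described in the Q&A — watching recall and log loss alongside F1 before deciding to retrain — can be sketched like this. The metric names match the talk, but the thresholds, the window size, and the function names are illustrative assumptions:

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood for binary labels, given P(class=1)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives that the model actually caught."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


def should_retrain(history, window=7,
                   f1_floor=0.8, recall_floor=0.8, loss_ceiling=0.5):
    """Retrain only when several metrics agree that performance degraded,
    not when F1 alone dips; history is a list of daily metric dicts."""
    recent = history[-window:]
    if not recent:
        return False
    bad = [m for m in recent
           if m["f1"] < f1_floor
           and (m["recall"] < recall_floor or m["log_loss"] > loss_ceiling)]
    # only retrain if every recent day looks degraded across metrics
    return len(bad) == len(recent)
```

With a history like this, a dip in F1 alone does not trigger retraining unless recall or log loss degraded over the same window, which is the behavior described in the answer about avoiding overfit retraining.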