Okay, last talk of the day. I have here Nins, a software engineer who has metamorphosed into a machine learning engineer and recently finished his Master of Research with a focus on agriculture and image processing. He's going to be talking to us today about a simple way to validate and monitor the performance of ML applications, so a round of applause, please.

Thank you. Hi, hello, can you hear me okay? It's nice to be back at a live event. Hi everyone, I first want to thank the organizers for putting this event together; it's definitely a good one for all of us. So for today, I'm going to talk about a simple way to validate and monitor the performance of your machine learning applications.

A little bit about myself: I mountain bike, I love Dota 2, and you can talk to me about software engineering, data science, and machine learning. I love to lie on grass, as is evident from the picture, and I like custard buns; they're really good. Anyway, let's go.

For today, I hope I can impart some learnings to you, and I hope you get a lot out of this session. We're just going to briefly go through these points right here. First, let's try to identify what model validation is in the context of a machine learning application, to be exact: how do we identify trends, and why do you need validation, specifically right now, when most of us are into AI, ChatGPT, and so on? Then, how do we choose the correct validation metric, and what metrics are available to us? Then let's build a simple model validation module in Python; I'm going to show you an example of one of the simplest ways that we approached this
problem in my previous work, and I'm still using it as of today. And then we're going to briefly talk about retraining and model maintenance: stability versus retraining, how long your model stays in production, and the maintenance cost of those models.

Now, every time I start a machine learning project or talk with data scientists, I always ask these questions. We always discuss whether there is any ground-truth validation data available. Going forward, how can we validate the results of the model? What metrics do we use? (As I mentioned, we're going to go through this later.) Is there going to be a change in the distribution of the features, or are there going to be changes in the relationship between the features and the target? When I deploy the application, will the deployment environment change over time? And lastly, with the data scientists I'm working with, we always talk about how we retrain the model, how we adjust it, and how we can make sure the model is always, you know, performing at its best.

Now, let's talk about performance change. I believe there's going to be a very specific talk about this tomorrow, so we're just going to cover some of the basics and the underlying concepts. Model performance changes: it decreases over time, and that's almost always the case. It can be the result of several factors, mainly: concept drift, where the nature of the problem changes; or data drift, where the distribution of the data and the relationships within it change, which happens most of the time. It can also be an actual deployment setup problem: someone accidentally pushed a new version of numpy in the requirements.txt, and the performance changed. Models are always going to perform at their best right after training. Once you deploy a model, that's its peak, unless you retrain it, because it's going to encounter a new data set eventually.

And sometimes these changes can cause unintentional bias, which is not always bad, because in some contexts or in some problems putting in bias is actually quite good; it depends on the context and it depends on the problem.

Now, just a brief preview of concept drift: it typically involves the problem that the model is trying to solve suddenly changing. I drew that graph; I'm not sure if it's good or not, but it's cute. Here we can see that initially the model is predicting the data points in blue correctly; the data points in orange are the ones used for training. And then suddenly something happened, and it's misdetecting everything. These things actually happen in the wild, and in most development environments, as we've seen before. One example would be COVID-19 prediction: when a variant changes, or if there's a new
variant, essentially it's going to have a different interaction: transmissibility will change, other factors will change, so essentially you're not predicting the same problem anymore. This could happen suddenly, like this one, or it can happen gradually, incrementally, or it can be recurring. Sometimes recurring concept drift happens with things like weather data sets; it's quite common. But I think the most common kind is the gradual one, because sometimes you don't notice it happening, and then suddenly the entire problem changes and your predictions change as well.

There's also data drift. I think this is easier to understand: it basically means you created the model using a certain data range or data distribution, and then as you go forward, new kinds of data distributions occur. This is very common in financial models or demographic data, so if you're dealing with those kinds of problems.
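To make the data-drift idea concrete, here is a small sketch (my own illustration, not something from the talk) of the population stability index, one common way to put a number on how far a live feature distribution has moved away from the training-time distribution; the thresholds in the docstring are a conventional rule of thumb:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time sample and a live sample.
    Conventional rule of thumb: < 0.1 stable, 0.1-0.25 moderate
    shift, > 0.25 significant drift."""
    # Bin edges come from the training-time distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range live values
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip so an empty bin does not produce log(0).
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 5_000)         # feature values at training time
live_same = rng.normal(0.0, 1.0, 5_000)     # production data, no drift
live_shifted = rng.normal(1.5, 1.0, 5_000)  # production data, mean has drifted

print(population_stability_index(train, live_same))     # small, well under 0.1
print(population_stability_index(train, live_shifted))  # large, well over 0.25
```

A check like this only flags that the inputs have moved; whether the model's outputs are still acceptable is a separate question, which is what the validation module discussed in this talk is for.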
You can expect to encounter this, and it's usually easily fixed by just retraining the model; we'll discuss more about retraining later.

Now, why is it important? You know, Morty... I'm a fan of Rick and Morty. Detecting and identifying these changes, these trends in the performance of the model, is actually quite important, because it's a good indicator of your machine learning application, of how it behaves. People talk about trying to make sure their machine learning applications keep performing well over time, and detecting these changes is a good indication of what's happening in your application. Understanding these changes also lets you make decisions efficiently about retraining and adjustments to your model. We had a project before where we were doing a factory line, so you need to predict really fast and then retrain really fast, something
like that. So if you understand how the performance shifts, you can easily retrain based on the parameters you want, and so on. And I think one of the most interesting things that can happen is that you get to understand more of the data set: if you're able to detect these changes in model performance, then new patterns arise, new features can be extracted, and so on. This is part of feature engineering, and it's quite interesting that it shows up once you understand how your model performs.

So, as I mentioned earlier, here we have an example of fixing a data drift. The initial training data is limited, and as we move forward and encounter new predictions, what we can do is adjust the training data to include the new data that we encounter, in order to correct the model. Here you can clearly see that the model adjusted once we included more training data.

Now, how do we measure model performance? Let's just go through this briefly: it depends on the problem. I'm sure most of you are familiar with regression and classification problems. Depending on the problem, if you're dealing with regression, you might use mean squared error, root mean squared error, and similar metrics. If you're dealing with classification problems, then you typically use accuracy, precision, recall, F1, ROC AUC, and so on.

But what does that mean? Let's say, for example, I have this performance trend right here, and I'm measuring two metrics, say accuracy and recall. Initially the recall started really high, but at some point it went down, while the accuracy went up. Which metric should you use? Is it accuracy, because over time it goes up, or is it recall? It depends on the problem and on its context. There are instances where this really matters: for example, recall is often preferred for medical data sets, where false positives are acceptable but missed positives are not, and there are also certain kinds of problems where you might prefer accuracy or F1. It depends on the circumstances. In our use case, and in my experience, when we're trying to monitor the performance of a model we typically use two, three, up to five metrics for a given model. So it's not just one or two; minimum two, yes, but the more the merrier in this case. The more metrics you use, the more information you have and the better you understand the behavior of the model.

Now, how do we build a simple (that's the key term, simple) validation module? How do we integrate this into our application?
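As a quick refresher on what those classification numbers actually compute, here is a hedged sketch from raw confusion counts; the toy labels are mine, and in a real project you would normally reach for scikit-learn's metrics module instead:

```python
import numpy as np

def classification_report_lite(y_true, y_pred):
    """Accuracy, precision, recall and F1 for a binary problem,
    computed directly from the confusion counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # made-up ground truth
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]  # made-up model output
print(classification_report_lite(y_true, y_pred))
# accuracy 0.7, precision 0.6, recall 0.75, f1 ~0.667
```

Tracking several of these at once, as described above, is exactly what surfaces trade-offs like the recall-versus-accuracy divergence on the slide.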
So there are several ways for you to detect data drift and concept drift. There are statistical tests, which are quite technical, and some of us don't want to deal with those kinds of numbers and approaches. There are also several drift detection algorithms, but typically you need a lot of data for them to work, whereas we want to detect these changes on the go, as they happen. It needs to be simple, so that it can be integrated into any kind of application we want, and the model validation function should be easily understandable, because as developers we want maintainable, easy-to-understand code. So, you know, show me what you got (if you're familiar with the reference). Anyway, that's the thing: we need to catch the early signs. If you look at my poorly drawn graph right there, we need to catch those instances, the circled ones, those data points.

What we learned while building these machine learning applications is that we need to regularly check for performance change. There are applications we have where, maybe every five minutes, we check whether there's been a performance change, because they're running a thousand predictions every second. And one of the things we learned is: let's just assume that the outliers are wrong predictions. Okay, I have no bad blood with outliers, but in this instance it's safe to assume that there's something wrong with those outliers, and we want to understand why they are outliers. So yes, just assume they're wrong predictions.

The next step is to prepare a good validation data set. I'll show you a simple diagram later, but essentially it's different from your testing or training data sets. This is a data set that you can anchor to the model (that's the term we're using),
or attach to the deployed model, so that you have a consistent understanding of its performance; you'll see that later. And ideally you should have a secondary model, a stronger, more powerful model, for automated validation. Or sometimes (actually, most of the time) we do manual validation of sampled incoming data, so that we're sure the application is performing really well. That's very useful for retraining or creating a new version of the model.

Now, here's a basic overview of what typically happens during model development and deploying it into an application. Initially you have model development, where the data scientists and machine learning engineers are involved, using training and testing data sets to produce the best model they can. After that, we use that model in an application, and typically we record... no, you should record the results in a database; you should keep track of the results of your model. That's the key part right there. And then lastly, in most of the applications we've built, we have two ways of doing model validation, of checking the performance of the model: one is using the validation data set, and the other is double-checking the incoming data, the new data that the model encounters.

Now let's take it slow, and treat this as a bit of storytelling; we're going to go through each step. First, data scientists and machine learning engineers typically create a really good model using testing and training data sets. I think most of us are familiar with that step. After that, what we typically do is prepare a separate validation data set, and then we place that model into a controlled environment that's very similar to production, ideally. Then we run the validation data set on the model several times; maybe you're using k-fold or some other kind of sampling, whatever, it's up to you. So we run it and run it and run it, and we record the results. From that we get an acceptable result metric that we record. In this instance, for example: this application setup, paired with this model and this validation data set, should produce an accuracy of 80 to 82 percent and an F1 score of 80 to 81, with 100 predictions in 10 seconds. So you see, we're trying to figure out the constraints of this model in this given environment. Take note of those metrics right there; we typically record them. And then we deploy the model in the application's production setup. The model does its job: users use the application, the model does the prediction, and it stores the results in the database.
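That recorded "acceptable result" can be captured as data. The dictionary and helper below are my own sketch, with the numbers mirroring the example just given:

```python
# Hypothetical recorded baseline for one (model, validation set, environment)
# combination; the numbers mirror the example from the talk.
BASELINE = {
    "accuracy": (0.80, 0.82),
    "f1": (0.80, 0.81),
    "max_seconds_per_100_predictions": 10.0,
}

def within_baseline(accuracy, f1, seconds_per_100):
    """True if a validation run is consistent with the recorded bounds."""
    acc_lo, acc_hi = BASELINE["accuracy"]
    f1_lo, f1_hi = BASELINE["f1"]
    return bool(acc_lo <= accuracy <= acc_hi
                and f1_lo <= f1 <= f1_hi
                and seconds_per_100 <= BASELINE["max_seconds_per_100_predictions"])

print(within_baseline(0.81, 0.805, 8.5))  # True: consistent with the record
print(within_baseline(0.86, 0.805, 8.5))  # False: even "better" accuracy is suspicious
```

The reason to record a range rather than a single number is that a result outside the bounds in either direction, better or worse, is a signal that something in the setup has changed.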
After that, we want to identify whether the model is still doing well as we use the application. So again, we go back to the initial approach: inside the application we have the validation data set, we apply that validation data set to the model, and we check whether the result is still consistent with the recorded results from our testing.

So here is a very simple FastAPI example. Some of our microservices are built with FastAPI, so we're just using the repeat decorator. The first thing we do is load the validation data set. After loading it, we perform the prediction on the validation data set, here (oh good, you can see it). We record the start time and the end time just to get the duration, right here, and then we basically get the F1 and the accuracy, or whichever metrics you are using. After this, we're basically just comparing whether the accuracy, F1, and duration are still within this range. So every time this validation runs (in this case it runs every day, because the frequency is per day), it should always fall within those bounds. If it's not there, if it decreases or if it increases, something might have gone wrong, or something might have changed in the setup.

Okay, so what's it for? Initially, when you look at this setup, it seems pretty trivial, because you're just validating the model against itself every time. What are we gaining from this? First, it's quite simple: you can implement it in most of your applications easily, it's straightforward, and it's flexible enough to support multiple validation metrics. However, what we discovered is that implementing this very simple function can catch, can identify, a change in a library or dependency version, numpy for example.
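I can only sketch the slide's code from the description, so the following is a rough, framework-free reconstruction of the periodic check just walked through; the toy model, validation set, and bounds are all mine:

```python
import time

# Stand-ins for the real artefacts: in the talk this runs inside a FastAPI
# service, with the anchored validation set loaded from storage.
VALIDATION_SET = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]  # (feature, label)
ACCURACY_BOUNDS = (0.9, 1.0)  # recorded earlier in the controlled environment
MAX_DURATION_S = 1.0

def predict(x):
    return 1 if x >= 0.5 else 0  # toy threshold "model"

def run_validation():
    """Re-run the anchored validation set and compare against the record."""
    start = time.perf_counter()
    correct = sum(predict(x) == y for x, y in VALIDATION_SET)
    duration = time.perf_counter() - start
    accuracy = correct / len(VALIDATION_SET)
    lo, hi = ACCURACY_BOUNDS
    healthy = bool(lo <= accuracy <= hi and duration <= MAX_DURATION_S)
    return {"accuracy": accuracy, "duration": duration, "healthy": healthy}

print(run_validation())
```

With fastapi-utils, as I understand its API, a function like this can be scheduled by decorating it with the repeat_every decorator at application startup (for example seconds=60 * 60 * 24 for the daily frequency mentioned above), and an out-of-bounds result would then raise an alert rather than just print.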
00:20:21,780 um one specific example is that we have 524 00:20:19,500 --> 00:20:24,539 an application deployed in an EC2 525 00:20:21,780 --> 00:20:27,539 instance, and for those of you who are 526 00:20:24,539 --> 00:20:30,419 familiar, sometimes Amazon, you know, 527 00:20:27,539 --> 00:20:33,240 they sometimes update the VM that's 528 00:20:30,419 --> 00:20:36,000 underlying the EC2, it's quite annoying 529 00:20:33,240 --> 00:20:37,620 so you think that there's no update but 530 00:20:36,000 --> 00:20:40,080 then they update the underlying 531 00:20:37,620 --> 00:20:42,000 architecture of the VM, which changes 532 00:20:40,080 --> 00:20:43,380 some of the libraries, especially when we 533 00:20:42,000 --> 00:20:45,179 are using deep learning and stuff like 534 00:20:43,380 --> 00:20:47,640 that, so 535 00:20:45,179 --> 00:20:50,280 our results suddenly change and we're 536 00:20:47,640 --> 00:20:51,780 like, what, what's happening, what, 537 00:20:50,280 --> 00:20:54,059 did something change, did someone push 538 00:20:51,780 --> 00:20:55,860 something 539 00:20:54,059 --> 00:20:57,660 so that's, that's one of the use cases 540 00:20:55,860 --> 00:20:59,880 it's actually very useful to detect if 541 00:20:57,660 --> 00:21:03,000 the setup has changed 542 00:20:59,880 --> 00:21:05,039 um there are some instances where the 543 00:21:03,000 --> 00:21:07,400 parameters of the model change when 544 00:21:05,039 --> 00:21:11,100 someone pushes new code or when someone 545 00:21:07,400 --> 00:21:12,960 creates a bug fix and stuff like that 546 00:21:11,100 --> 00:21:14,640 sometimes they accidentally change a 547 00:21:12,960 --> 00:21:15,960 parameter of the model, so we can detect 548 00:21:14,640 --> 00:21:17,220 that change as well 549 00:21:15,960 --> 00:21:19,140 and 550 00:21:17,220 --> 00:21:21,960 when you have a device setup, if you're 551 00:21:19,140 --> 00:21:24,000 using uh hardware in your prediction 552 00:21:21,960 --> 00:21:25,799 this is
definitely useful, because you 553 00:21:24,000 --> 00:21:27,840 don't know if the hardware is broken, and 554 00:21:25,799 --> 00:21:30,059 if it's deployed somewhere, if someone 555 00:21:27,840 --> 00:21:32,760 tampered with the hardware, if the 556 00:21:30,059 --> 00:21:34,380 resource is not enough, so that's the 557 00:21:32,760 --> 00:21:35,880 goal of the first validation, to detect 558 00:21:34,380 --> 00:21:37,919 these kinds of changes 559 00:21:35,880 --> 00:21:41,220 okay 560 00:21:37,919 --> 00:21:43,919 now let's go to the second step, right 561 00:21:41,220 --> 00:21:46,200 so we have our application right here 562 00:21:43,919 --> 00:21:48,240 and our 563 00:21:46,200 --> 00:21:50,400 really good model 564 00:21:48,240 --> 00:21:53,100 didn't do so well on some of the data 565 00:21:50,400 --> 00:21:55,679 points, right, so it's like, I'm not sure, 566 00:21:53,100 --> 00:21:58,440 maybe I'm just 50% sure of this, maybe I'm 567 00:21:55,679 --> 00:22:01,260 70% sure of this, I'm 63% sure of this, and 568 00:21:58,440 --> 00:22:03,120 stuff like that, so in classification 569 00:22:01,260 --> 00:22:05,820 problems there is a predict probability 570 00:22:03,120 --> 00:22:08,460 which is this one, so 571 00:22:05,820 --> 00:22:10,559 this is an example of some results from 572 00:22:08,460 --> 00:22:13,679 a classification problem 573 00:22:10,559 --> 00:22:15,120 right, and then our little model right 574 00:22:13,679 --> 00:22:16,980 here is not sure 575 00:22:15,120 --> 00:22:19,620 I don't know, but 576 00:22:16,980 --> 00:22:23,280 maybe this is true or maybe this is 577 00:22:19,620 --> 00:22:25,260 false, so we record the results, right, 578 00:22:23,280 --> 00:22:27,780 even the probability 579 00:22:25,260 --> 00:22:30,080 we record it every time it performs a 580 00:22:27,780 --> 00:22:30,080 prediction 581 00:22:30,780 --> 00:22:35,520 now we have the probability, we have two 582 00:22:33,299 --> 00:22:37,860 ways of going around this problem 583
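A minimal, framework-agnostic sketch of the two ideas described so far — the periodic self-validation against recorded results, and recording and flagging the low-confidence predictions — might look like this. The baseline numbers, the tolerance, the 0.8 confidence threshold, and all function names are illustrative assumptions, not values from the talk:

```python
import time

# Baseline figures recorded from offline testing; these exact numbers
# and the tolerance are made-up placeholders, not figures from the talk.
EXPECTED = {"accuracy": 0.90, "f1": 0.88, "duration_s": 5.0}
TOLERANCE = 0.05            # allowed absolute drift, in either direction
CONFIDENCE_THRESHOLD = 0.8  # below this, a prediction is "not sure"


def binary_f1_accuracy(y_true, y_pred):
    """F1 and accuracy for binary labels, computed without any ML library."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return f1, accuracy


def run_validation(predict, x_val, y_val):
    """The periodic self-check: predict on the held-out validation set and
    confirm F1, accuracy, and duration still fall inside recorded bounds."""
    start = time.perf_counter()
    y_pred = [predict(x) for x in x_val]
    duration = time.perf_counter() - start
    f1, accuracy = binary_f1_accuracy(y_val, y_pred)
    ok = (abs(accuracy - EXPECTED["accuracy"]) <= TOLERANCE
          and abs(f1 - EXPECTED["f1"]) <= TOLERANCE
          and duration <= EXPECTED["duration_s"])
    return {"f1": f1, "accuracy": accuracy, "duration_s": duration, "ok": ok}


def flag_low_confidence(probabilities):
    """Return indices of predictions whose top class probability falls
    below the confidence threshold (the '50% sure, 63% sure' cases)."""
    return [i for i, probs in enumerate(probabilities)
            if max(probs) < CONFIDENCE_THRESHOLD]
```

In a FastAPI microservice, a job like `run_validation` can be scheduled to repeat (for example with the `repeat_every` decorator from the fastapi-utils package, with `seconds=86400` for a daily frequency), and the flagged indices are what get handed on for review.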
00:22:35,520 --> 00:22:40,440 the first one is we have this big, bad 584 00:22:37,860 --> 00:22:42,480 new model right here, right, so this is 585 00:22:40,440 --> 00:22:45,780 often our secondary model 586 00:22:42,480 --> 00:22:48,780 uh in our deployments we always have two 587 00:22:45,780 --> 00:22:50,400 or three models in place, right, so the 588 00:22:48,780 --> 00:22:52,799 first model is the one that we're using 589 00:22:50,400 --> 00:22:54,360 the one that's actually tailor-made for 590 00:22:52,799 --> 00:22:56,340 that specific problem 591 00:22:54,360 --> 00:22:59,400 and then we have another model right 592 00:22:56,340 --> 00:23:02,940 here that's probably a lot stronger, can 593 00:22:59,400 --> 00:23:05,460 classify better, but it's quite slow 594 00:23:02,940 --> 00:23:07,320 right, so there's a trade-off, but it's 595 00:23:05,460 --> 00:23:09,720 still useful, right, if it can just 596 00:23:07,320 --> 00:23:11,100 validate some sample data it's still 597 00:23:09,720 --> 00:23:13,620 useful 598 00:23:11,100 --> 00:23:15,360 so that's the model, that's the validator 599 00:23:13,620 --> 00:23:18,240 model right here, that's what it's doing 600 00:23:15,360 --> 00:23:20,940 it's trying to figure out, for the results 601 00:23:18,240 --> 00:23:23,280 of this main model that we have 602 00:23:20,940 --> 00:23:25,559 with the low confidence scores, it's trying 603 00:23:23,280 --> 00:23:29,580 to identify, or it's trying to classify 604 00:23:25,559 --> 00:23:31,559 them with better confidence, so the goal is 605 00:23:29,580 --> 00:23:32,340 these classifications right here should 606 00:23:31,559 --> 00:23:34,500 be 607 00:23:32,340 --> 00:23:36,659 you know 608 00:23:34,500 --> 00:23:38,059 they should have a solid classification, they 609 00:23:36,659 --> 00:23:42,179 should be at this 80% 610 00:23:38,059 --> 00:23:44,700 confidence, uh maybe even higher, right 611 00:23:42,179 --> 00:23:46,440 and if you're using this you can do an 612 00:23:44,700 --> 00:23:49,080
automatic retraining if you want, well 613 00:23:46,440 --> 00:23:50,159 I'm going to show you some code snippets 614 00:23:49,080 --> 00:23:52,980 later 615 00:23:50,159 --> 00:23:54,600 uh in some instances we're working with 616 00:23:52,980 --> 00:23:57,240 several 617 00:23:54,600 --> 00:23:58,559 research groups for example 618 00:23:57,240 --> 00:24:00,900 um and then when they're using their 619 00:23:58,559 --> 00:24:03,720 model in the application, what they 620 00:24:00,900 --> 00:24:07,440 wanted is they want to see 621 00:24:03,720 --> 00:24:08,700 the sampled data set right here, so they 622 00:24:07,440 --> 00:24:10,860 want to be able to see it and they want 623 00:24:08,700 --> 00:24:12,659 to be able to verify it themselves, this 624 00:24:10,860 --> 00:24:14,179 is very crucial especially when you're 625 00:24:12,659 --> 00:24:16,860 working with a 626 00:24:14,179 --> 00:24:20,280 medical type of problem, or maybe 627 00:24:16,860 --> 00:24:21,240 research, research in biology, or you 628 00:24:20,280 --> 00:24:23,100 know 629 00:24:21,240 --> 00:24:24,960 so they really want to see their data 630 00:24:23,100 --> 00:24:27,720 they want to be intimate with their data 631 00:24:24,960 --> 00:24:30,480 so in this case what we do is, after we 632 00:24:27,720 --> 00:24:32,580 build the application, we sample this 633 00:24:30,480 --> 00:24:36,960 data set right here and then we give it 634 00:24:32,580 --> 00:24:39,960 to them through a report, right, and then 635 00:24:36,960 --> 00:24:42,120 that's it, they validate it and then they 636 00:24:39,960 --> 00:24:45,480 can retrain based on it 637 00:24:42,120 --> 00:24:47,580 so this is sample code for manual 638 00:24:45,480 --> 00:24:48,960 retraining, so 639 00:24:47,580 --> 00:24:50,880 we just 640 00:24:48,960 --> 00:24:52,919 right here we get the sampled results 641 00:24:50,880 --> 00:24:53,640 that have a very low confidence score and 642 00:24:52,919 --> 00:24:56,039 then 643 00:24:53,640 -->
00:24:58,880 basically send it to the user 644 00:24:56,039 --> 00:24:58,880 do whatever you want 645 00:24:59,520 --> 00:25:04,380 or we can go the automated route, in 646 00:25:02,640 --> 00:25:05,640 this case after we get the sampled 647 00:25:04,380 --> 00:25:08,640 results 648 00:25:05,640 --> 00:25:11,940 right, we have a second validator that 649 00:25:08,640 --> 00:25:13,980 predicts on the flagged data set, and then we 650 00:25:11,940 --> 00:25:15,120 compare the accuracy or whatever metric 651 00:25:13,980 --> 00:25:16,919 you want to use 652 00:25:15,120 --> 00:25:19,860 if we want we can just directly 653 00:25:16,919 --> 00:25:22,620 overwrite the results using the 654 00:25:19,860 --> 00:25:25,559 stronger model, or we can 655 00:25:22,620 --> 00:25:27,720 decide on what to do, maybe the doctors 656 00:25:25,559 --> 00:25:30,000 check this, we overwrite, and then retrain 657 00:25:27,720 --> 00:25:31,080 okay 658 00:25:30,000 --> 00:25:33,179 now 659 00:25:31,080 --> 00:25:35,400 this is really good 660 00:25:33,179 --> 00:25:38,400 for detecting early signs of drift 661 00:25:35,400 --> 00:25:40,320 model drift, concept drift, and possible 662 00:25:38,400 --> 00:25:42,960 changes in the data distribution 663 00:25:40,320 --> 00:25:44,760 so this is what we usually have in place 664 00:25:42,960 --> 00:25:46,559 so that we can identify if there's a 665 00:25:44,760 --> 00:25:48,000 problem, or if there's going to be a 666 00:25:46,559 --> 00:25:49,980 problem, with the performance of the 667 00:25:48,000 --> 00:25:51,240 model 668 00:25:49,980 --> 00:25:52,860 now 669 00:25:51,240 --> 00:25:55,020 lastly we're going to talk about model 670 00:25:52,860 --> 00:25:56,220 maintenance 671 00:25:55,020 --> 00:25:58,559 um 672 00:25:56,220 --> 00:26:01,580 here we have the concept of model 673 00:25:58,559 --> 00:26:01,580 stability versus 674 00:26:01,679 --> 00:26:05,940 retraining, sorry 675 00:26:03,360 --> 00:26:07,860 so models are more stable if you don't 676
00:26:05,940 --> 00:26:11,220 need to retrain them from time to time 677 00:26:07,860 --> 00:26:13,980 like this one, the first model, and 678 00:26:11,220 --> 00:26:16,860 it's less prone to model and data drift 679 00:26:13,980 --> 00:26:19,080 right, however they might require more 680 00:26:16,860 --> 00:26:20,940 time to develop, because you need 681 00:26:19,080 --> 00:26:23,159 more data, you need to gather more 682 00:26:20,940 --> 00:26:25,140 resources, to make the model more stable 683 00:26:23,159 --> 00:26:26,940 unlike the second model right here, where 684 00:26:25,140 --> 00:26:29,840 you need to retrain periodically if the 685 00:26:26,940 --> 00:26:29,840 performance goes down 686 00:26:30,900 --> 00:26:34,080 however 687 00:26:32,159 --> 00:26:36,539 you need to understand that if you 688 00:26:34,080 --> 00:26:37,500 retrain the model over a longer period of 689 00:26:36,539 --> 00:26:40,140 time 690 00:26:37,500 --> 00:26:41,940 it seems that it's actually quite stable 691 00:26:40,140 --> 00:26:43,919 right, as long as you have a good 692 00:26:41,940 --> 00:26:46,320 retraining process 693 00:26:43,919 --> 00:26:49,039 and it benefits from the new data set 694 00:26:46,320 --> 00:26:49,039 that it learns 695 00:26:50,220 --> 00:26:55,380 now for model longevity, it depends 696 00:26:52,860 --> 00:26:57,840 on the context and the expected inputs 697 00:26:55,380 --> 00:26:59,700 if it's just a small problem, then 698 00:26:57,840 --> 00:27:02,159 typically you're gonna have a model 699 00:26:59,700 --> 00:27:03,720 deployed somewhere over a long period of 700 00:27:02,159 --> 00:27:04,740 time, because it doesn't require any 701 00:27:03,720 --> 00:27:06,840 change 702 00:27:04,740 --> 00:27:09,240 it's similar with a dynamic relationship, 703 00:27:06,840 --> 00:27:11,039 like if you need to retrain models 704 00:27:09,240 --> 00:27:12,960 from time to time, the data that it 705 00:27:11,039 --> 00:27:15,659 encounters is dynamic, so you need to 706
00:27:12,960 --> 00:27:17,460 replace the model, and most of the time 707 00:27:15,659 --> 00:27:20,340 you need to retrain them 708 00:27:17,460 --> 00:27:22,020 and lastly, this is my last slide, the 709 00:27:20,340 --> 00:27:23,880 cost of maintenance, you always need to 710 00:27:22,020 --> 00:27:26,520 consider this, if you want to produce 711 00:27:23,880 --> 00:27:29,340 stable models you might have more 712 00:27:26,520 --> 00:27:30,720 upfront cost, again as I mentioned, stable 713 00:27:29,340 --> 00:27:33,900 models are 714 00:27:30,720 --> 00:27:36,779 more expensive to develop, right, and you 715 00:27:33,900 --> 00:27:39,000 know, for models that 716 00:27:36,779 --> 00:27:40,980 need constant retraining, it's better to 717 00:27:39,000 --> 00:27:42,779 automate them to reduce the cost, and you 718 00:27:40,980 --> 00:27:43,919 can always use transfer learning if you 719 00:27:42,779 --> 00:27:47,059 want 720 00:27:43,919 --> 00:27:47,059 and that's the end of my slides 721 00:27:47,159 --> 00:27:51,140 questions 722 00:27:48,720 --> 00:27:51,140 hey 723 00:27:51,720 --> 00:27:54,559 thank you Nins 724 00:27:55,080 --> 00:27:59,340 very very interesting, very insightful 725 00:27:57,480 --> 00:28:01,640 now we've probably got time for some 726 00:27:59,340 --> 00:28:01,640 questions 727 00:28:01,980 --> 00:28:08,120 I can't see anything from here, because 728 00:28:04,860 --> 00:28:08,120 someone over there I think 729 00:28:13,559 --> 00:28:18,539 hi 730 00:28:15,539 --> 00:28:21,960 um you mentioned uh retraining the model 731 00:28:18,539 --> 00:28:24,900 as the model accuracy dips over time 732 00:28:21,960 --> 00:28:26,400 and I just wanted to ask, like, 733 00:28:24,900 --> 00:28:28,580 what are some techniques you have to 734 00:28:26,400 --> 00:28:31,860 avoid model overfitting as you retrain 735 00:28:28,580 --> 00:28:35,279 particularly with things like 736 00:28:31,860 --> 00:28:38,159 unidentified seasonal data, and um 737
00:28:35,279 --> 00:28:39,659 uh yeah, just how you would avoid that 738 00:28:38,159 --> 00:28:42,539 especially when you automate the 739 00:28:39,659 --> 00:28:44,100 retraining process 740 00:28:42,539 --> 00:28:46,200 sure, and I think we get this 741 00:28:44,100 --> 00:28:48,000 question a lot, right, remember in the 742 00:28:46,200 --> 00:28:50,100 earlier part when I told you to use more 743 00:28:48,000 --> 00:28:52,559 than one metric 744 00:28:50,100 --> 00:28:54,419 use five, use 10 metrics, to be able to 745 00:28:52,559 --> 00:28:57,480 fully understand the behavior of your 746 00:28:54,419 --> 00:29:00,480 data and your model, right, so for example 747 00:28:57,480 --> 00:29:02,820 this is a real-life scenario, like um we 748 00:29:00,480 --> 00:29:03,900 have a model deployed in production and 749 00:29:02,820 --> 00:29:07,020 then 750 00:29:03,900 --> 00:29:09,059 the F1 score continually goes down 751 00:29:07,020 --> 00:29:12,120 right, it continually goes down 752 00:29:09,059 --> 00:29:14,940 however, if you look at uh 753 00:29:12,120 --> 00:29:17,159 recall and the log loss function 754 00:29:14,940 --> 00:29:20,760 the performance is still okay 755 00:29:17,159 --> 00:29:23,520 right, if we follow the F1 score we 756 00:29:20,760 --> 00:29:26,100 constantly need to retrain the model 757 00:29:23,520 --> 00:29:31,020 and as a result 758 00:29:26,100 --> 00:29:33,539 the subsequent models would have become overfit 759 00:29:31,020 --> 00:29:36,360 however, we did not do the retraining 760 00:29:33,539 --> 00:29:38,700 because we have the log loss and we have 761 00:29:36,360 --> 00:29:41,880 the other metrics that allowed us to 762 00:29:38,700 --> 00:29:43,860 understand that, okay, it's okay if the F1 763 00:29:41,880 --> 00:29:46,260 score decreases, because 764 00:29:43,860 --> 00:29:47,760 it would be overfit if we retrained, it's 765 00:29:46,260 --> 00:29:50,520 not the target metric that we want to 766 00:29:47,760 --> 00:29:52,919 use anyway, so you know
that's one of the 767 00:29:50,520 --> 00:29:53,700 ways that we approach the problem, and I 768 00:29:52,919 --> 00:29:55,679 think 769 00:29:53,700 --> 00:29:57,179 um you should have a hands-on, or you 770 00:29:55,679 --> 00:29:58,919 should try it hands-on, so that you can 771 00:29:57,179 --> 00:30:02,460 experience it yourself, right, because 772 00:29:58,919 --> 00:30:03,779 it's uh it's something that um you'll 773 00:30:02,460 --> 00:30:05,760 definitely see in the patterns of the 774 00:30:03,779 --> 00:30:07,860 data once you try it yourself 775 00:30:05,760 --> 00:30:10,080 so thank you, use metrics, use as many 776 00:30:07,860 --> 00:30:12,539 metrics as you want 777 00:30:10,080 --> 00:30:14,640 questions, thank you, any other questions 778 00:30:12,539 --> 00:30:17,600 oh I can see 779 00:30:14,640 --> 00:30:17,600 this one over there 780 00:30:18,539 --> 00:30:23,340 um I've got a bit of a two-parter 781 00:30:21,120 --> 00:30:27,120 um so you mentioned the test/train/ 782 00:30:23,340 --> 00:30:29,039 validate split, uh do you have preferred 783 00:30:27,120 --> 00:30:31,679 proportions for splitting your initial 784 00:30:29,039 --> 00:30:33,960 data set into test/train/validate? ah yes 785 00:30:31,679 --> 00:30:36,059 they vary with the size of the data 786 00:30:33,960 --> 00:30:37,799 available. the second part of my question 787 00:30:36,059 --> 00:30:39,980 was, it sounds like you're picking 788 00:30:37,799 --> 00:30:42,120 low confidence predictions and you're 789 00:30:39,980 --> 00:30:45,720 labeling those samples and using them to 790 00:30:42,120 --> 00:30:47,760 bulk up your data set in production, if 791 00:30:45,720 --> 00:30:49,260 you're picking actual samples submitted 792 00:30:47,760 --> 00:30:51,240 to you, have you ever come across privacy 793 00:30:49,260 --> 00:30:54,240 concerns with that 794 00:30:51,240 --> 00:30:55,799 okay so I'll answer the distribution 795 00:30:54,240 --> 00:30:57,419 problem first 796 00:30:55,799 -->
00:31:01,500 um typically 797 00:30:57,419 --> 00:31:04,799 uh, for example, let's say we have 100 798 00:31:01,500 --> 00:31:07,440 data points, right, so typically what we 799 00:31:04,799 --> 00:31:10,020 did before is a 60% training 800 00:31:07,440 --> 00:31:12,779 data set, 20% 801 00:31:10,020 --> 00:31:14,159 um testing, and then 20% validation 802 00:31:12,779 --> 00:31:16,260 however 803 00:31:14,159 --> 00:31:18,659 uh we discovered that 804 00:31:16,260 --> 00:31:22,320 ideally we would want that validation 805 00:31:18,659 --> 00:31:24,720 data set to be sampled differently 806 00:31:22,320 --> 00:31:28,679 from the training and the testing data 807 00:31:24,720 --> 00:31:31,080 set, so if we have an option to ask the 808 00:31:28,679 --> 00:31:33,120 customer, hey look, this is the initial 809 00:31:31,080 --> 00:31:36,000 data that you provided, we can train with 810 00:31:33,120 --> 00:31:38,700 this data set and then we can use some 811 00:31:36,000 --> 00:31:42,600 part of it as a testing data set, however 812 00:31:38,700 --> 00:31:44,460 would you be willing to extend your data 813 00:31:42,600 --> 00:31:46,740 set maybe by another month, like for 814 00:31:44,460 --> 00:31:48,240 example if we're building an 815 00:31:46,740 --> 00:31:49,980 application for them 816 00:31:48,240 --> 00:31:52,799 they're gonna give us another month's 817 00:31:49,980 --> 00:31:56,159 worth of data after we finish the model 818 00:31:52,799 --> 00:31:57,480 and that will be the validation data set 819 00:31:56,159 --> 00:31:59,279 so 820 00:31:57,480 --> 00:32:01,200 we figured, or at least the data 821 00:31:59,279 --> 00:32:03,840 scientists in my team, they figured out 822 00:32:01,200 --> 00:32:06,179 that it's more organic that way, they 823 00:32:03,840 --> 00:32:07,860 were able to capture the relationships 824 00:32:06,179 --> 00:32:10,260 of the data better, because 825 00:32:07,860 --> 00:32:12,240 let's face it, if someone gives you a 826 00:32:10,260 -->
00:32:13,860 training data set and a testing data set 827 00:32:12,240 --> 00:32:15,840 there will always be some form of bias 828 00:32:13,860 --> 00:32:17,880 going on in there, right, so they will say 829 00:32:15,840 --> 00:32:19,380 this is our target problem, and 830 00:32:17,880 --> 00:32:22,200 oftentimes they will give you a really 831 00:32:19,380 --> 00:32:24,480 clean or a really standard relationship 832 00:32:22,200 --> 00:32:27,179 between the data, so that's what we 833 00:32:24,480 --> 00:32:29,039 typically do, we separate that 834 00:32:27,179 --> 00:32:31,799 training and testing data set from the 835 00:32:29,039 --> 00:32:34,020 validation data set, right, so 836 00:32:31,799 --> 00:32:36,539 if you can do that, then do that, right 837 00:32:34,020 --> 00:32:38,279 I'm sorry, what's your second question? uh 838 00:32:36,539 --> 00:32:40,740 it sounded like you were 839 00:32:38,279 --> 00:32:42,539 picking low confidence predictions from the 840 00:32:40,740 --> 00:32:45,000 production use case of the model and 841 00:32:42,539 --> 00:32:46,559 then including them in your data set, if 842 00:32:45,000 --> 00:32:48,299 you are doing that, have you come across 843 00:32:46,559 --> 00:32:49,620 privacy concerns, or are you just not 844 00:32:48,299 --> 00:32:52,260 processing data where that's a problem 845 00:32:49,620 --> 00:32:54,059 oh yeah, so we're picking the low 846 00:32:52,260 --> 00:32:56,520 confidence predictions 847 00:32:54,059 --> 00:32:58,500 just for validation purposes, right, it's 848 00:32:56,520 --> 00:33:01,140 still up to the data scientists if they 849 00:32:58,500 --> 00:33:01,980 want to include those data points in the 850 00:33:01,140 --> 00:33:04,320 prediction 851 00:33:01,980 --> 00:33:06,600 right, we just want to understand why 852 00:33:04,320 --> 00:33:09,120 these guys have low prediction scores 853 00:33:06,600 --> 00:33:10,679 and if it makes sense to include them in 854 00:33:09,120 --> 00:33:12,299 the training data set, or if it makes
855 00:33:10,679 --> 00:33:13,260 sense to create a new model out of them 856 00:33:12,299 --> 00:33:16,080 then 857 00:33:13,260 --> 00:33:18,539 of course we will do that, right, so it's 858 00:33:16,080 --> 00:33:20,580 not, it's not, we just, you know, pick them 859 00:33:18,539 --> 00:33:22,140 and then include them right away, uh 860 00:33:20,580 --> 00:33:23,880 there are some instances, like what I've 861 00:33:22,140 --> 00:33:25,620 mentioned, where if we're really 862 00:33:23,880 --> 00:33:28,140 confident with the model, like if we have 863 00:33:25,620 --> 00:33:30,539 a big model, you know, ready 864 00:33:28,140 --> 00:33:32,640 for those data points, we just overwrite 865 00:33:30,539 --> 00:33:34,500 them completely, but in most cases 866 00:33:32,640 --> 00:33:36,899 there's always another layer of choosing 867 00:33:34,500 --> 00:33:38,760 which of those low confidence data 868 00:33:36,899 --> 00:33:40,679 points we should include in the next 869 00:33:38,760 --> 00:33:42,419 training cycle, right, so there are other 870 00:33:40,679 --> 00:33:44,880 statistical tests that we perform on 871 00:33:42,419 --> 00:33:47,220 those low confidence data points to make sure 872 00:33:44,880 --> 00:33:49,320 that we're including a data point that 873 00:33:47,220 --> 00:33:51,720 represents, or has a good representation 874 00:33:49,320 --> 00:33:54,960 of, the actual problem that we're facing 875 00:33:51,720 --> 00:33:56,640 so, does that answer your question? 876 00:33:54,960 --> 00:33:58,440 I think it just sounds like you're not 877 00:33:56,640 --> 00:34:00,179 processing data that's personally 878 00:33:58,440 --> 00:34:01,919 identifiable, so you're free to do that 879 00:34:00,179 --> 00:34:04,200 which is great, yeah yeah, basically yes 880 00:34:01,919 --> 00:34:06,720 so 881 00:34:04,200 --> 00:34:09,240 okay, thank you. I have a stupid question 882 00:34:06,720 --> 00:34:10,679 perhaps. no no, questions are stupid, all right 883 00:34:09,240 -->
00:34:13,619 wait until you hear it 884 00:34:10,679 --> 00:34:14,820 um if you've got a super brain model, why 885 00:34:13,619 --> 00:34:16,919 aren't you just using that in production 886 00:34:14,820 --> 00:34:18,720 which one? if you've got a super duper 887 00:34:16,919 --> 00:34:21,240 model that's, like, 888 00:34:18,720 --> 00:34:22,679 more accurate, why wouldn't you use that one 889 00:34:21,240 --> 00:34:25,099 in production? well yeah, that's the thing 890 00:34:22,679 --> 00:34:25,099 I mean 891 00:34:25,320 --> 00:34:30,480 in my opinion people are obsessed with, 892 00:34:28,139 --> 00:34:32,580 you know, this very generalized model 893 00:34:30,480 --> 00:34:35,940 that can perform everything really fast, 894 00:34:32,580 --> 00:34:37,260 very accurate, it doesn't exist 895 00:34:35,940 --> 00:34:39,780 not yet 896 00:34:37,260 --> 00:34:41,879 okay, it doesn't exist, or at least not 897 00:34:39,780 --> 00:34:43,679 yet, right, based on our experience there 898 00:34:41,879 --> 00:34:46,560 will always be some sort of trade-offs, 899 00:34:43,679 --> 00:34:49,020 like, uh, you need to communicate this to 900 00:34:46,560 --> 00:34:51,599 your customers or to your partners, one 901 00:34:49,020 --> 00:34:54,839 example would be the factory line setup 902 00:34:51,599 --> 00:34:56,700 right, we need to perform prediction on a 903 00:34:54,839 --> 00:34:59,640 two-megabyte device 904 00:34:56,700 --> 00:35:01,940 so it's very difficult to put a 905 00:34:59,640 --> 00:35:05,099 really powerful model inside that device 906 00:35:01,940 --> 00:35:06,420 yeah, it has small memory, so you know, it 907 00:35:05,099 --> 00:35:08,940 depends on the context, and maybe 908 00:35:06,420 --> 00:35:13,140 someday, maybe someday, hopefully 909 00:35:08,940 --> 00:35:15,420 hopefully soon enough, or not, we will 910 00:35:13,140 --> 00:35:17,040 have that really big AI that can 911 00:35:15,420 --> 00:35:19,380 generalize and predict most of our 912 00:35:17,040 --> 00:35:21,119
problems, maybe, maybe. thanks. okay, thank 913 00:35:19,380 --> 00:35:23,540 you, thank you Nins, and here's a token of 914 00:35:21,119 --> 00:35:26,550 our appreciation 915 00:35:23,540 --> 00:35:29,769 thank you, round of applause, thanks guys 916 00:35:26,550 --> 00:35:29,769 [Applause]
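The multi-metric monitoring described in the Q&A — watching recall and log loss alongside F1 before deciding to retrain — can be sketched like this. The metric names match the talk, but the thresholds, the window size, and the function names are illustrative assumptions:

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood for binary labels, given P(class=1)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives that the model actually caught."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


def should_retrain(history, window=7,
                   f1_floor=0.8, recall_floor=0.8, loss_ceiling=0.5):
    """Retrain only when several metrics agree that performance degraded,
    not when F1 alone dips; history is a list of daily metric dicts."""
    recent = history[-window:]
    if not recent:
        return False
    bad = [m for m in recent
           if m["f1"] < f1_floor
           and (m["recall"] < recall_floor or m["log_loss"] > loss_ceiling)]
    # only retrain if every recent day looks degraded across metrics
    return len(bad) == len(recent)
```

With a history like this, a dip in F1 alone does not trigger retraining unless recall or log loss degraded over the same window, which is the behavior described in the answer about avoiding overfit retraining.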