Up next we have Tish, who's a senior engineer at CSIRO's Space and Astronomy division, which sounds like such a cool job. He's previously worked with agencies like NASA and JAXA, so give him a round of applause. He's going to be talking about, I should have said what he's going to be talking about, the Open Data Cube. I'll let you take it away.

Thank you. I'll start by acknowledging the land and the peoples where we stand: we stand on the Kaurna people's land, and it has never been ceded. I work with satellite images, and satellite images look down on the Earth from space and give us a perspective where there are no boundaries and everything looks the same. We try to organize that data and deliver it to people so they can make sense of things over time.

Open Data Cube is a Python library for accessing and processing satellite imagery. I'll use the library metaphor a little bit: we talk about libraries in software engineering a lot, but libraries existed before software, and I'll try to make use of that analogy in my talk.

A little bit about me: I'm a bit of a space cadet. I enjoy working with satellites and have been doing that since about 2004, for pretty much all of my career. Currently I work at CSIRO and sit on the Open Data Cube project steering committee. It's a fairly small community, and I'm doing presentations like this to bring more interest into it and get diverse stakeholders looking after the project.

Previously I worked at Geoscience Australia, which is Australia's geospatial agency. These are some of the team I used to work with; they have won a few government awards. Currently I work at CSIRO, which is semi-government.
There are other participants in the Open Data Cube project across the world, which makes it a thriving multi-owner project. If you go to the ODC GitHub you can see lots and lots of people who have contributed over time, and, as in open source generally, you get to work with different people across different organizations on the same project, which gives you a bit of continuity. I work with the last few people in this list, Matt and Rob, at CSIRO.

So what is Open Data Cube? I started by saying it is a Python library, and for some people that answer is enough: most of the code in Open Data Cube is Python, even though there is a lot of SQL in there too. Earlier there was a talk about PySpark, and it was funny how PySpark is a wrapper around Scala. For me, I mostly work at the infrastructure layer, which I'll talk about later, so to me it looks like a bunch of blob storage, an indexing database of some sort, and a Kubernetes cluster. For an organization which doesn't care about those intricacies, like CSIRO, which is full of scientists, it's an easy way to gain access to large amounts of satellite imagery through a single interface, process it at scale, and get scientific questions answered.

It doesn't necessarily have to be used for satellite imagery. It fits best with that sort of dense array-style data, but it can be used for processing other large matrix data, atmospheric simulations and so on.

So, a bit of history: where did Open Data Cube come from, and how did we get here? I was introduced to Open Data Cube by accident. I was working at a startup in Nairobi, and I turned up at a university where some people from the Ordnance Survey in Britain were demonstrating the East Africa Regional Data Cube.
At that point I thought, oh, what is this thing? I found out it's run by an Australian organization, and I later came back and worked at Geoscience Australia and figured out some of this history.

Geoscience Australia is the data custodian for a lot of the data over Australia. They have been receiving and collecting Landsat data for forty-plus years; a lot of it is collected by ground stations operated by Geoscience Australia and stored at NCI, the National Computational Infrastructure supercomputer in Canberra. More recently the European Space Agency started the Copernicus program, and Australia's regional data hub has collected a lot of the Sentinel data, a series of satellites operated under the Copernicus program. That means there are around five petabytes of data collected, and as some maths later on will show, that is going to explode a lot more in the near future.

So Open Data Cube started as a way to organize this data in the HPC, high-performance computing, supercomputer environment. I joined the project around 2018 or 2019, and looking through the GitHub history it was first released in 2016. It then went through a bit of refactoring, renaming things from being a project for one agency to being open sourced: it was renamed from the Australian Geoscience Data Cube to the Open Data Cube. Currently it's at the 1.8 versions and fairly stable, used by organizations around the world, and there's a bit of restructuring going on around how the index is maintained; I'll talk more about the importance of indexing as I go. In the future it's heading towards a machine-learning-oriented, queryable raster dataset approach, where the metadata is more flexible, you can query it easily, and you can find answers quicker rather than being stuck in the details of the technology implementation.
Talking about details of technology implementation: we are at a technology conference, and some of these details matter for performance reasons. I'm talking about processing large amounts of data, so you have to store it efficiently and then get bits of it back quickly.

This is one of the images I like around the library analogy. It's at MONA in Hobart, a piece of art called The White Library. A library doesn't necessarily make things faster if you don't have the right way to find the right piece of data, the right book, if there are no indexes, or the pages are blank. Ultimately you have to have an interest in finding the right thing; once you have that interest, you work backwards and organize your data, so everything is labelled, there are no missing bits or corrupted data, there is a synopsis, and the actual content is there somewhere in a durable way.

ODC tries to add those layers of organization on top of all of this data we are continually collecting from space, to make the volumes easier to access.

In the early days the storage backend for Open Data Cube was NetCDF, which is common for storing simulation data and is tried and tested in the ocean and atmosphere community. It has a convention attached to it, but it is only a convention, so it's quite flexible and each NetCDF file can be slightly different. It was not cloud native, it couldn't scale, and it required a very high-performance Lustre-like file system, the kind available in HPC environments, to operate.

Around 2016-17 a new approach came along, built on plain HTTP requests.
HTTP has this idea of offset range requests: you can store an index at the head of the file and then the actual chunks of data, and make an HTTP GET range request for just the bits you're interested in, provided you have parsed that first little bit and know where your data sits within the large file. You've looked at the title page, so you know which page to jump to.

That became known as the Cloud Optimized GeoTIFF (COG), which is what ODC currently uses as the storage backend for most deployments.

It started with the USGS, the United States Geological Survey. They had opened up access to the Landsat archive (my first days of working with NASA were on globe visualizers for Landsat), but the delivery system was very naive: you had to download all of the data as a tar file. An experiment was done to see whether you could download just the little bit you want using these offset range requests, if you store the unpacked tar as individual TIFF files with the right index at the beginning saying "look over here if you're after this latitude and longitude". Slowly, a lot of the other space agencies have adopted this standard for storing and delivering data. It's made possible by all of the different public clouds, and similar blob storage providers, supporting HTTP-based file access, and the underlying library is the usual curl library, which lets you make these HTTP requests.

These days I'm working towards supporting the Zarr format in Open Data Cube. Zarr has just gone into the standards body to be organized and put in the right shape, with the right headers and so on, so you can look up a 3D range. I'll talk about that a bit later, but obviously it requires more memory because you're adding extra dimensions.
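To make the range-request idea concrete, here is a minimal sketch using the standard requests package. This is not ODC code, and the URL is a made-up placeholder; any HTTP server or object store that honours the Range header behaves this way.

```python
import requests

# Placeholder URL standing in for a COG sitting in blob storage.
url = "https://example.com/landsat/scene_B04.tif"

# Fetch only the first 16 KiB: for a COG this typically holds the TIFF
# header and tile offsets, the "title page" of the file.
head = requests.get(url, headers={"Range": "bytes=0-16383"}, timeout=30)
print(head.status_code)   # 206 Partial Content when range requests are supported
print(len(head.content))  # at most 16384 bytes transferred

# Having read the header, you know the byte span of the tile you want,
# so you request just that span instead of the whole multi-gigabyte file.
tile = requests.get(url, headers={"Range": "bytes=1048576-1310719"}, timeout=30)
```

In practice you rarely write this by hand: GDAL and rasterio (via curl, as mentioned above) issue exactly these range requests under the hood when you open a remote COG.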
The Open Data Cube data model, which I will go into a bit more later, essentially has a snapshot in time with n dimensions attached to that point in time. For hyperspectral data, which has lots of bands, you'll have x and y for latitude and longitude, or metres in northings and eastings, and then a band as a spectral coordinate.

Following the success of the COG format, other similar cloud-optimized storage formats have come along, and they follow the same idea. This one shows the approach used in Cloud Optimized Point Cloud (COPC), which is used for storing lidar data, individual point measurements: you have a header, you have points, you have chunk tables, and you have other metadata attached so the points can be scaled out as required. For vector data you have GeoParquet: Parquet is common for columnar data, and if you have spatial attributes attached, GeoParquet lets you store a spatial column alongside your data.

Essentially all of this comes down to the idea that you can read bits of a file at a time, very quickly. Still, in a real environment there are limits. To use the Amazon example, S3 lets you read one particular thing concurrently around five and a half thousand times at the same time; for heavier machine-learning loads where you're reading a lot, you may need to add additional layers of caching on top of that.

So that's how things are stored and read quickly, in just the bits of interest. The other part is knowing which bits to look up. Each COG file has its own header saying which bit is where within that particular file, but a data collection may have millions of files, so you have to have another layer of indexing on top to look up which particular file covers the band, space and time point you're interested in.
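Before getting to that indexing layer, it helps to see how the time/band/y/x data model described above looks in code. ODC hands data back as xarray objects (more on that later); here is a toy, self-contained sketch with synthetic values rather than anything loaded through ODC, and the coordinates and band centres are invented for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy cube: 2 timestamps, 4 spectral bands, 100 x 100 pixels.
times = pd.to_datetime(["2023-01-01", "2023-01-11"])
bands = [490, 560, 665, 865]                    # band centres in nanometres (example values)
y = np.linspace(-3_450_000, -3_449_010, 100)    # northings in metres, made up
x = np.linspace(1_550_000, 1_550_990, 100)      # eastings in metres, made up

cube = xr.DataArray(
    np.random.rand(len(times), len(bands), len(y), len(x)).astype("float32"),
    dims=("time", "band", "y", "x"),
    coords={"time": times, "band": bands, "y": y, "x": x},
    name="reflectance",
)

# Label-based selection: one date, one band, a small spatial window.
subset = cube.sel(time="2023-01-01", band=865).isel(y=slice(0, 10), x=slice(0, 10))
print(subset.shape)  # (10, 10)
```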
Finding the right file is typically in service of some analysis: collocating the data with some actual measurement on the ground, or performing some variability analysis over time. So that's where we come to the idea of metadata and indexes over the actual raw storage.

Continuing the library analogy, the data itself is what's critical; the metadata is usually less important, because you can re-derive the metadata as long as you have high durability over the data. Occasionally you may need to reprocess the whole of the data, look at the actual content and re-derive the metadata, and the metadata format standards evolve as the data themselves evolve: new satellites come online with different amounts of dimensionality about them. Each space agency's internal representation of the metadata is its own, because they are independent entities sitting at the source and get to dictate how they do things. Slowly, though, shared standards are emerging, because people are interested in multi-sensor fusion: your data is not useful in isolation, it's useful together with something else, and datasets can only be brought together if they share a data model.

The current convention is the SpatioTemporal Asset Catalog, STAC. Open Data Cube was built pre-STAC, so it has its own slightly different convention, but the Open Data Cube community, Alex Leith and others, worked a little bit with STAC to bring some of those understandings across, because the implementation existed before the standard (this happens a lot of the time): to say, this is what is actually practically usable, it would be good to have these things in the standard everyone else is choosing to adopt.
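For a feel of what STAC actually looks like on the wire, here is a stripped-down item. The structure follows the STAC item spec, but the identifier, geometry and asset URLs are invented for illustration.

```python
stac_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "S2_example_scene_20230115",          # made-up identifier
    "geometry": {"type": "Polygon", "coordinates": [[
        [138.4, -35.1], [139.0, -35.1], [139.0, -34.6], [138.4, -34.6], [138.4, -35.1],
    ]]},
    "bbox": [138.4, -35.1, 139.0, -34.6],
    "properties": {
        "datetime": "2023-01-15T00:56:00Z",
        "eo:cloud_cover": 12.3,
    },
    "assets": {
        # Each asset points at a COG that can be range-read as described earlier.
        "red": {"href": "https://example.com/scene/B04.tif",
                "type": "image/tiff; application=geotiff; profile=cloud-optimized"},
        "nir": {"href": "https://example.com/scene/B08.tif",
                "type": "image/tiff; application=geotiff; profile=cloud-optimized"},
    },
}
```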
Even these days there are STAC implementations which are not compliant, but more maturity is coming into place to make sure conformance testing is done on a STAC catalog implementation, and so on.

Now, getting data into an Open Data Cube. I do Kubernetes every day, so you sort of become a YAML engineer: you don't write code, you write declarative things to say "things look like this, this is the spec of the actual file, this is the coordinate system you're in". YAML is nice because you can write descriptions next to things. The data model comes in when creating these different objects in the YAML that you attach to the binary blobs, the satellite imagery raster datasets, to say: this is a product, a collection of observations; this is a single observation at a point in time and place, and these are the components inside it; and these are the actual measurements that took place. Then, for human readability, we don't refer to "950 nanometres" as such, we say what colour that is, but that becomes a bit of a liability if you have lots of colours. If you have 400 different bands it's "near red", "almost red", "it could be red", "salmon"... anyway.

So what does the metadata look like? This is something that, as in any other convention-based system, people find challenging when adopting Open Data Cube: you have to get a handle on what the data model looks like and define products for what you have. Say you have some UAV data that doesn't have a predefined metadata set; you'll have to come up with one to put it into the Open Data Cube. There are some helper scripts, and we are trying to make them more mature, to make it easier to put metadata on top of whatever datasets you have, or you can also define your own specification if you think this model is insufficient.
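To make the "YAML engineering" concrete, here is a minimal sketch of a product definition, held in a Python string purely for illustration. The field layout follows the documented ODC product schema, but the product name and values are invented; in practice this lives in a .yaml file and is registered with the datacube product add command.

```python
# A minimal, illustrative ODC product definition (values are invented).
example_product_yaml = """
name: example_surface_reflectance
description: Example surface reflectance product (illustrative only)
metadata_type: eo3
measurements:
  - name: band_04        # numbered bands scale better than colour names
    dtype: int16
    nodata: -999
    units: '1'
    aliases: [red]
  - name: band_08
    dtype: int16
    nodata: -999
    units: '1'
    aliases: [nir]
"""
```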
If you want more descriptive things about your platform, you can add more into the model. That flexibility comes at a cost, though. The initial implementation of Open Data Cube uses PostgreSQL as a document database: PostgreSQL is a relational database, but it has this idea of a JSON blob. In my first days working with Open Data Cube databases I was migrating databases and found very large documents, metadata greater than two megabytes in size, so the flexibility of being able to define arbitrary metadata comes with the downside of your data being very blobby in this particular backend implementation. I'll talk later about the future index implementations we are looking at.

The core part of Open Data Cube has this schema, where we are moving from PostgreSQL with JSON blobs, where the geospatial information sat inside the JSON as strings, to being properly geospatial with a PostGIS implementation, which has geospatial functions built into it. The data model can then quickly look up the spatial query, whether the dataset you're looking for is found, as well as time ranges, while any other metadata is still stored as a JSON document, which doesn't play that well with the SQLAlchemy guidelines but is still there from the pre-PostGIS world.

The other way of having a backend is separating out the database and layering an API on top. As I was saying, STAC has a search API, so putting that on lets you make a similar query request, not to a database as a SQL query but to a web endpoint, and then you can find out what datasets are available.
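Under the hood, that "query a web endpoint instead of a database" step is just an HTTP call to a STAC API's /search endpoint. A hedged sketch follows; the endpoint URL and collection name are placeholders, and the exact response fields depend on the catalog.

```python
import requests

# Placeholder endpoint; any STAC-API-compliant service exposes /search.
search_url = "https://example.com/stac/v1/search"

query = {
    "collections": ["sentinel-2-l2a"],      # assumed collection name
    "bbox": [138.4, -35.1, 139.0, -34.6],   # rough Adelaide bounding box
    "datetime": "2023-01-01T00:00:00Z/2023-01-31T23:59:59Z",
    "limit": 10,
}

resp = requests.post(search_url, json=query, timeout=30)
resp.raise_for_status()
# The response is a GeoJSON FeatureCollection of STAC items.
items = resp.json().get("features", [])
print(len(items), "matching datasets")
```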
I think I have a quick demo of that. You can use this library called odc-stac and say: OK, I want to look up this STAC catalog, Sentinel-2 COGs, over some time in January. It gives you back a bunch of JSON, which you don't have to worry about; there's lots of stuff available. There are some scenes available over Adelaide, and then you can have them in the Open Data Cube data model to process. So the YAML engineering becomes a bit more traditional JSON engineering with REST APIs, and you can look things up and process them.

So what is the big advantage, the big idea behind this? What Open Data Cube provides is being able to operate lazily on a lot of this data. In the way we were talking about with Spark, or in the previous DAG talk, you set up the workflow you're trying to apply to these large collections of data, you can test it over small samples, and then you increase your query space to apply the same operation lazily over large extents and spin up clusters to perform those operations.

The magical command it provides is dc.load: using your parameterized query for the resolution and where you're looking for the data, it returns the results as an xarray abstraction, where you have the different dimensions. xarray is sort of a hybrid between pandas and NumPy; you can look things up by their dimension and still work in arrays. xarray is also compatible with Dask, which lets you create these lazy graphs that you can then operate over, farming the data out across a large cluster that you spin up. The bigger part of it is around that five-and-a-half-thousand request limit: you can have five thousand CPUs concurrently loading from your storage and processing your data.
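Roughly what the odc-stac demo and the lazy loading pattern look like in code: a sketch using the public pystac-client and odc-stac packages. The catalog URL, collection name, bounding box and load parameters are assumptions standing in for what was shown on screen.

```python
import odc.stac
import pystac_client

# A public STAC API holding Sentinel-2 COGs (assumed; substitute your own catalog).
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")

# Search for scenes over Adelaide for January 2023.
items = list(
    catalog.search(
        collections=["sentinel-2-l2a"],
        bbox=[138.4, -35.1, 139.0, -34.6],
        datetime="2023-01",
    ).items()
)
print(len(items), "scenes found")

# Load them into the Open Data Cube / xarray data model. Passing chunks
# makes the result a lazy, dask-backed array: nothing is read yet.
ds = odc.stac.load(
    items,
    bands=["red", "green", "blue"],
    crs="EPSG:3577",
    resolution=30,
    chunks={"x": 2048, "y": 2048},
)

# An example lazy computation, only evaluated when .compute() is called.
composite = ds.median(dim="time")
result = composite.compute()
```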
The data model has some shortcomings, though. I did a lot of my work in my PhD on SAR, and when you're trying to reproject SAR data, because SAR pixels are in the complex domain, Open Data Cube needs some enhancements that aren't there yet to convert the complex data and restructure it onto even grids, or to keep uneven grids; you can express uneven grids in the coordinate system, but it's not implemented yet.

The other one is the hyperspectral case I was mentioning. Because of the way we make things easy with string aliases for different bands, you would either end up trying to name a lot of things, or you just add another dimension: like a numbered account, instead of names you just have numbers on the bands, with some mapping elsewhere to say which wavelength ranges your bands cover.

Here's a quick example of the maths for the modern hyperspectral satellites coming online, just some back-of-the-envelope maths. You have 7.6 million square kilometres in Australia; at 30-metre pixels (7.6 million km squared divided by 900 square metres per pixel) that's about 8.5 billion spectra over Australia, each across all the bands. Make that a daily mosaic, multiply by years of data, and it builds out to exabytes of data. Open Data Cube can store it, but it doesn't yet have a good way of scaling out to read that very quickly in parallel and process it.

So, to talk about the scale of the processing that has been done, and the Dask way of doing it: I think a survey found roughly ten percent of the Python community uses Dask. Just quickly, how many people here use Dask? A few people; so maybe less than ten percent in this audience, but around ten percent of the overall Python community uses Dask. Dask is a structured way of doing multiprocessing over numerical data, or indeed any other data.
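A minimal sketch of that style of lazy, chunked computation with Dask, using synthetic data and a small in-process cluster rather than anything ODC-specific; the same driving code scales out to the Kubernetes deployments described next.

```python
import dask.array as da
from dask.distributed import Client

# A small in-process cluster for the sketch; in production this would be a
# dask-kubernetes or dask-gateway cluster with hundreds of workers, but the
# client-side code looks the same.
client = Client(processes=False, n_workers=2, threads_per_worker=2)

# A lazily evaluated 100,000 x 100,000 array split into 2,000 x 2,000 chunks.
arr = da.random.random((100_000, 100_000), chunks=(2_000, 2_000))

# Building the expression only records a task graph...
row_means = arr.mean(axis=1)

# ...nothing is generated or reduced until a result is requested.
print(row_means[:5].compute())

client.close()
```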
For Open Data Cube's usage of Dask, it basically builds a Dask array wrapping NumPy arrays, which is then passed around between the nodes or threads, depending on the sort of cluster you're setting up, in order to scale out the processing. You can start up a local cluster, or you can start up a large thousand-CPU cluster; it depends on what sort of resources you have available.

I work in Kubernetes land most of the time: you set up a job and run pods spread across multiple nodes, where the same Python code is sent out, parameterized with the different data sources that Open Data Cube has looked up, and then operations are applied by each pod independently, based on the Dask graph you have set up through Open Data Cube, to perform your processing.

This is a sample of processing at scale; I think bigger runs than this have been done. This was done as part of the Digital Earth Africa project, and it shows starting up a 4,000-CPU cluster and consuming 50 terabytes of RAM to do a whole-of-Africa processing run. These are some of the results you get using the Landsat data archive: say, the evolution of a city in Egypt.

Beyond the core functionality of being able to load and index lots and lots of satellite imagery, the Open Data Cube project also has a few web applications that are offshoots of it. One of them is essentially being able to see the index, what data you have in your data cube, and to query how much of it is there. This is the Data Cube Explorer application; it shows that on AWS the Digital Earth Australia project has around 400,000 datasets covering all of Australia for this particular collection of data.

There is also an open web services component, which produces the tile services, WMS and WMTS, as well as
a service for actually fetching the data, the Web Coverage Service. This particular instance is from the project I work on, called AquaWatch, which stores data for aquatic reflectance.

That's a segue to the different Open Data Cubes running as infrastructure. You can have it locally as your own data cube, but there are also large-scale deployments: Digital Earth Australia from Geoscience Australia is one of them, I worked on Digital Earth Africa, and currently some people are trying to set up something for the Pacific.

My watch is telling me to take a break; good timing, as I'm coming to an end. I work on the CSIRO EASI data cubes, one of which is the AquaWatch one, and there are lots of them around the world from CSIRO. Open Data Cube is a core component, but there is lots of other stuff around it; there are a thousand moving parts, so I'll just put this diagram up.

Alternative index backends are something we are looking to enhance: to scale down, so that you have embedded databases, and to scale up, so that you don't need a database at all but can query the blob stores directly and index them through some sort of data lake approach. We are also adding hyperspectral support, to create these large task graphs across the band dimension efficiently and be able to index that properly.

Happy to take any questions.

Thank you, Tish. Great. We might be able to fit in one question quickly, if anyone's got any, as long as it's a short answer. I can't see... oh, just over here.

Thanks very much for your presentation. You mentioned that there were some issues using SAR satellite data. Does that mean we cannot use this tool for SAR satellite data, or is it just slower, with some
limitations? And also, can you use this with private satellite data providers, or is it limited to Landsat and European Space Agency products?

So, there is a limitation in using single-look complex data, which is not actual reflectance data but comes from earlier in the SAR processing chain, where the pixels are not square and the values are complex numbers. But if you have your data processed to the point where you only see the reflectance, then that's fine; there is actually a Sentinel-1 collection for all of Africa, and so on, available. Any data can be put into it. People build their own private data cubes based on, say, Planet or any other satellite imagery they have, but then they have to store it themselves and manage access to it and so on; they have to set up the infrastructure to manage that private data themselves. A lot of people have used Open Data Cube in their own private instances to manage their data collections.

Thanks a lot. Brilliant, thank you. I mean, the sheer amount of data involved in this kind of thing makes my head spin, to be honest, but thank you so much for that, and here's a token of our appreciation. Thank you, there you go. Thanks a lot. Thank you, Tish.