Up next we have Tish, who's a senior engineer at CSIRO's Space and Astronomy division, which sounds like such a cool job. He's previously worked with agencies like NASA and JAXA, so give him a round of applause. He's going to be talking about, I should have said what he's going to be talking about, the Open Data Cube. I'll let you take it away.

Thank you. I'll start by acknowledging the land and the peoples where we stand: we stand on the Kaurna people's land, and it has never been ceded. I work with satellite images, and satellite images look down on the Earth from space and give us a perspective where there are no boundaries and everything looks the same. We try to organize that data and deliver it to people so they can make sense of things over time.

Open Data Cube is a Python library for accessing and processing satellite imagery. I'll use the library metaphor a little bit: we talk about libraries in software engineering a lot, but libraries existed before software, and I'll try to make use of that analogy in my talk.

A little bit about me: I'm a bit of a space cadet. I enjoy working with satellites and have been doing that since about 2004, for pretty much all of my career. Currently I work at CSIRO and sit on the Open Data Cube project steering committee. It's a fairly small community, and I'm doing presentations like this to bring more interest into it and get diverse stakeholders looking after the project.

Previously I worked at Geoscience Australia, which is Australia's geospatial agency. These are some of the team I used to work with; they have won a few government awards. Currently I work at CSIRO, which is semi-government.
There are other participants in the Open Data Cube project across the world, which makes it a thriving multi-owner project. If you go to the ODC GitHub you can see lots and lots of people who have contributed over time, and, as in open source generally, you get to work with different people across different organizations on the same project, which gives you a bit of continuity. I work with the last few people in this list, Matt and Rob, at CSIRO.

So what is Open Data Cube? I started by saying it is a Python library, and for some people that answer is enough: most of the code in Open Data Cube is Python, even though there is a lot of SQL in there too. Earlier there was a talk about PySpark, and it was funny how PySpark is a wrapper around Scala. For me, I mostly work at the infrastructure layer, which I'll talk about later, so to me it looks like a bunch of blob storage, an indexing database of some sort, and a Kubernetes cluster. For an organization which doesn't care about those intricacies, like CSIRO, which is full of scientists, it's an easy way to gain access to large amounts of satellite imagery through a single interface, process it at scale, and get scientific questions answered.

It doesn't necessarily have to be used for satellite imagery. It fits best with that sort of dense array-style data, but it can be used for processing other large matrix data, atmospheric simulations and so on.

So, a bit of history: where did Open Data Cube come from, and how did we get here? I was introduced to Open Data Cube by accident. I was working at a startup in Nairobi, and I turned up at a university where some people from the Ordnance Survey in Britain were demonstrating the East Africa Regional Data Cube.
At that point I thought, oh, what is this thing? I found out it's run by an Australian organization, and I later came back and worked at Geoscience Australia and figured out some of this history.

Geoscience Australia is the data custodian for a lot of the data over Australia. They have been receiving and collecting Landsat data for forty-plus years; a lot of it is collected by ground stations operated by Geoscience Australia and stored at NCI, the National Computational Infrastructure supercomputer in Canberra. More recently the European Space Agency started the Copernicus program, and Australia's regional data hub has collected a lot of the Sentinel data, a series of satellites operated under the Copernicus program. That means there are around five petabytes of data collected, and as some maths later on will show, that is going to explode a lot more in the near future.

So Open Data Cube started as a way to organize this data in the HPC, high-performance computing, supercomputer environment. I joined the project around 2018 or 2019, and looking through the GitHub history it was first released in 2016. It then went through a bit of refactoring, renaming things from being a project for one agency to being open sourced: it was renamed from the Australian Geoscience Data Cube to the Open Data Cube. Currently it's at the 1.8 versions and fairly stable, used by organizations around the world, and there's a bit of restructuring going on around how the index is maintained; I'll talk more about the importance of indexing as I go. In the future it's heading towards a machine-learning-oriented, queryable raster dataset approach, where the metadata is more flexible, you can query it easily, and you can find answers quicker rather than being stuck in the details of the technology implementation.
Talking about details of technology implementation: we are at a technology conference, and some of these details matter for performance reasons. I'm talking about processing large amounts of data, so you have to store it efficiently and then get bits of it back quickly.

This is one of the images I like around the library analogy. It's at MONA in Hobart, a piece of art called The White Library. A library doesn't necessarily make things faster if you don't have the right way to find the right piece of data, the right book, if there are no indexes, or the pages are blank. Ultimately you have to have an interest in finding the right thing; once you have that interest, you work backwards and organize your data, so everything is labelled, there are no missing bits or corrupted data, there is a synopsis, and the actual content is there somewhere in a durable way.

ODC tries to add those layers of organization on top of all of this data we are continually collecting from space, to make the volumes easier to access.

In the early days the storage backend for Open Data Cube was NetCDF, which is common for storing simulation data and is tried and tested in the ocean and atmosphere community. It has a convention attached to it, but it is only a convention, so it's quite flexible and each NetCDF file can be slightly different. It was not cloud native, it couldn't scale, and it required a very high-performance Lustre-like file system, the kind available in HPC environments, to operate.

Around 2016-17 a new approach came along, built on plain HTTP requests.
HTTP has this idea of offset range requests: you can store an index at the head of the file and then the actual chunks of data, and make an HTTP GET range request for just the bits you're interested in, provided you have parsed that first little bit and know where your data sits within the large file. You've looked at the title page, so you know which page to jump to.

That became known as the Cloud Optimized GeoTIFF (COG), which is what ODC currently uses as the storage backend for most deployments.

It started with the USGS, the United States Geological Survey. They had opened up access to the Landsat archive (my first days of working with NASA were on globe visualizers for Landsat), but the delivery system was very naive: you had to download all of the data as a tar file. An experiment was done to see whether you could download just the little bit you want using these offset range requests, if you store the unpacked tar as individual TIFF files with the right index at the beginning saying "look over here if you're after this latitude and longitude". Slowly, a lot of the other space agencies have adopted this standard for storing and delivering data. It's made possible by all of the different public clouds, and similar blob storage providers, supporting HTTP-based file access, and the underlying library is the usual curl library, which lets you make these HTTP requests.

These days I'm working towards supporting the Zarr format in Open Data Cube. Zarr has just gone into the standards body to be organized and put in the right shape, with the right headers and so on, so you can look up a 3D range. I'll talk about that a bit later, but obviously it requires more memory because you're adding extra dimensions.
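To make the range-request idea concrete, here is a minimal sketch using the standard requests package. This is not ODC code, and the URL is a made-up placeholder; any HTTP server or object store that honours the Range header behaves this way.

```python
import requests

# Placeholder URL standing in for a COG sitting in blob storage.
url = "https://example.com/landsat/scene_B04.tif"

# Fetch only the first 16 KiB: for a COG this typically holds the TIFF
# header and tile offsets, the "title page" of the file.
head = requests.get(url, headers={"Range": "bytes=0-16383"}, timeout=30)
print(head.status_code)   # 206 Partial Content when range requests are supported
print(len(head.content))  # at most 16384 bytes transferred

# Having read the header, you know the byte span of the tile you want,
# so you request just that span instead of the whole multi-gigabyte file.
tile = requests.get(url, headers={"Range": "bytes=1048576-1310719"}, timeout=30)
```

In practice you rarely write this by hand: GDAL and rasterio (via curl, as mentioned above) issue exactly these range requests under the hood when you open a remote COG.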
The Open Data Cube data model, which I will go into a bit more later, essentially has a snapshot in time with n dimensions attached to that point in time. For hyperspectral data, which has lots of bands, you'll have x and y for latitude and longitude, or metres in northings and eastings, and then a band as a spectral coordinate.

Following the success of the COG format, other similar cloud-optimized storage formats have come along, and they follow the same idea. This one shows the approach used in Cloud Optimized Point Cloud (COPC), which is used for storing lidar data, individual point measurements: you have a header, you have points, you have chunk tables, and you have other metadata attached so the points can be scaled out as required. For vector data you have GeoParquet: Parquet is common for columnar data, and if you have spatial attributes attached, GeoParquet lets you store a spatial column alongside your data.

Essentially all of this comes down to the idea that you can read bits of a file at a time, very quickly. Still, in a real environment there are limits. To use the Amazon example, S3 lets you read one particular thing concurrently around five and a half thousand times at the same time; for heavier machine-learning loads where you're reading a lot, you may need to add additional layers of caching on top of that.

So that's how things are stored and read quickly, in just the bits of interest. The other part is knowing which bits to look up. Each COG file has its own header saying which bit is where within that particular file, but a data collection may have millions of files, so you have to have another layer of indexing on top to look up which particular file covers the band, space and time point you're interested in.
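Before getting to that indexing layer, it helps to see how the time/band/y/x data model described above looks in code. ODC hands data back as xarray objects (more on that later); here is a toy, self-contained sketch with synthetic values rather than anything loaded through ODC, and the coordinates and band centres are invented for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy cube: 2 timestamps, 4 spectral bands, 100 x 100 pixels.
times = pd.to_datetime(["2023-01-01", "2023-01-11"])
bands = [490, 560, 665, 865]                    # band centres in nanometres (example values)
y = np.linspace(-3_450_000, -3_449_010, 100)    # northings in metres, made up
x = np.linspace(1_550_000, 1_550_990, 100)      # eastings in metres, made up

cube = xr.DataArray(
    np.random.rand(len(times), len(bands), len(y), len(x)).astype("float32"),
    dims=("time", "band", "y", "x"),
    coords={"time": times, "band": bands, "y": y, "x": x},
    name="reflectance",
)

# Label-based selection: one date, one band, a small spatial window.
subset = cube.sel(time="2023-01-01", band=865).isel(y=slice(0, 10), x=slice(0, 10))
print(subset.shape)  # (10, 10)
```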
Finding the right file is typically in service of some analysis: collocating the data with some actual measurement on the ground, or performing some variability analysis over time. So that's where we come to the idea of metadata and indexes over the actual raw storage.

Continuing the library analogy, the data itself is what's critical; the metadata is usually less important, because you can re-derive the metadata as long as you have high durability over the data. Occasionally you may need to reprocess the whole of the data, look at the actual content and re-derive the metadata, and the metadata format standards evolve as the data themselves evolve: new satellites come online with different amounts of dimensionality about them. Each space agency's internal representation of the metadata is its own, because they are independent entities sitting at the source and get to dictate how they do things. Slowly, though, shared standards are emerging, because people are interested in multi-sensor fusion: your data is not useful in isolation, it's useful together with something else, and datasets can only be brought together if they share a data model.

The current convention is the SpatioTemporal Asset Catalog, STAC. Open Data Cube was built pre-STAC, so it has its own slightly different convention, but the Open Data Cube community, Alex Leith and others, worked a little bit with STAC to bring some of those understandings across, because the implementation existed before the standard (this happens a lot of the time): to say, this is what is actually practically usable, it would be good to have these things in the standard everyone else is choosing to adopt.
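For a feel of what STAC actually looks like on the wire, here is a stripped-down item. The structure follows the STAC item spec, but the identifier, geometry and asset URLs are invented for illustration.

```python
stac_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "S2_example_scene_20230115",          # made-up identifier
    "geometry": {"type": "Polygon", "coordinates": [[
        [138.4, -35.1], [139.0, -35.1], [139.0, -34.6], [138.4, -34.6], [138.4, -35.1],
    ]]},
    "bbox": [138.4, -35.1, 139.0, -34.6],
    "properties": {
        "datetime": "2023-01-15T00:56:00Z",
        "eo:cloud_cover": 12.3,
    },
    "assets": {
        # Each asset points at a COG that can be range-read as described earlier.
        "red": {"href": "https://example.com/scene/B04.tif",
                "type": "image/tiff; application=geotiff; profile=cloud-optimized"},
        "nir": {"href": "https://example.com/scene/B08.tif",
                "type": "image/tiff; application=geotiff; profile=cloud-optimized"},
    },
}
```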
Even these days there are STAC implementations which are not compliant, but more maturity is coming into place to make sure conformance testing is done on a STAC catalog implementation, and so on.

Now, getting data into an Open Data Cube. I do Kubernetes every day, so you sort of become a YAML engineer: you don't write code, you write declarative things to say "things look like this, this is the spec of the actual file, this is the coordinate system you're in". YAML is nice because you can write descriptions next to things. The data model comes in when creating these different objects in the YAML that you attach to the binary blobs, the satellite imagery raster datasets, to say: this is a product, a collection of observations; this is a single observation at a point in time and place, and these are the components inside it; and these are the actual measurements that took place. Then, for human readability, we don't refer to "950 nanometres" as such, we say what colour that is, but that becomes a bit of a liability if you have lots of colours. If you have 400 different bands it's "near red", "almost red", "it could be red", "salmon"... anyway.

So what does the metadata look like? This is something that, as in any other convention-based system, people find challenging when adopting Open Data Cube: you have to get a handle on what the data model looks like and define products for what you have. Say you have some UAV data that doesn't have a predefined metadata set; you'll have to come up with one to put it into the Open Data Cube. There are some helper scripts, and we are trying to make them more mature, to make it easier to put metadata on top of whatever datasets you have, or you can also define your own specification if you think this model is insufficient.
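To make the "YAML engineering" concrete, here is a minimal sketch of a product definition, held in a Python string purely for illustration. The field layout follows the documented ODC product schema, but the product name and values are invented; in practice this lives in a .yaml file and is registered with the datacube product add command.

```python
# A minimal, illustrative ODC product definition (values are invented).
example_product_yaml = """
name: example_surface_reflectance
description: Example surface reflectance product (illustrative only)
metadata_type: eo3
measurements:
  - name: band_04        # numbered bands scale better than colour names
    dtype: int16
    nodata: -999
    units: '1'
    aliases: [red]
  - name: band_08
    dtype: int16
    nodata: -999
    units: '1'
    aliases: [nir]
"""
```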
If you want more descriptive things about your platform, you can add more into the model. That flexibility comes at a cost, though. The initial implementation of Open Data Cube uses PostgreSQL as a document database: PostgreSQL is a relational database, but it has this idea of a JSON blob. In my first days working with Open Data Cube databases I was migrating databases and found very large documents, metadata greater than two megabytes in size, so the flexibility of being able to define arbitrary metadata comes with the downside of your data being very blobby in this particular backend implementation. I'll talk later about the future index implementations we are looking at.

The core part of Open Data Cube has this schema, where we are moving from PostgreSQL with JSON blobs, where the geospatial information sat inside the JSON as strings, to being properly geospatial with a PostGIS implementation, which has geospatial functions built into it. The data model can then quickly look up the spatial query, whether the dataset you're looking for is found, as well as time ranges, while any other metadata is still stored as a JSON document, which doesn't play that well with the SQLAlchemy guidelines but is still there from the pre-PostGIS world.

The other way of having a backend is separating out the database and layering an API on top. As I was saying, STAC has a search API, so putting that on lets you make a similar query request, not to a database as a SQL query but to a web endpoint, and then you can find out what datasets are available.
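Under the hood, that "query a web endpoint instead of a database" step is just an HTTP call to a STAC API's /search endpoint. A hedged sketch follows; the endpoint URL and collection name are placeholders, and the exact response fields depend on the catalog.

```python
import requests

# Placeholder endpoint; any STAC-API-compliant service exposes /search.
search_url = "https://example.com/stac/v1/search"

query = {
    "collections": ["sentinel-2-l2a"],      # assumed collection name
    "bbox": [138.4, -35.1, 139.0, -34.6],   # rough Adelaide bounding box
    "datetime": "2023-01-01T00:00:00Z/2023-01-31T23:59:59Z",
    "limit": 10,
}

resp = requests.post(search_url, json=query, timeout=30)
resp.raise_for_status()
# The response is a GeoJSON FeatureCollection of STAC items.
items = resp.json().get("features", [])
print(len(items), "matching datasets")
```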
I think I have a quick demo of that. You can use this library called odc-stac and say: OK, I want to look up this STAC catalog, Sentinel-2 COGs, over some time in January. It gives you back a bunch of JSON, which you don't have to worry about; there's lots of stuff available. There are some scenes available over Adelaide, and then you can have them in the Open Data Cube data model to process. So the YAML engineering becomes a bit more traditional JSON engineering with REST APIs, and you can look things up and process them.

So what is the big advantage, the big idea behind this? What Open Data Cube provides is being able to operate lazily on a lot of this data. In the way we were talking about with Spark, or in the previous DAG talk, you set up the workflow you're trying to apply to these large collections of data, you can test it over small samples, and then you increase your query space to apply the same operation lazily over large extents and spin up clusters to perform those operations.

The magical command it provides is dc.load: using your parameterized query for the resolution and where you're looking for the data, it returns the results as an xarray abstraction, where you have the different dimensions. xarray is sort of a hybrid between pandas and NumPy; you can look things up by their dimension and still work in arrays. xarray is also compatible with Dask, which lets you create these lazy graphs that you can then operate over, farming the data out across a large cluster that you spin up. The bigger part of it is around that five-and-a-half-thousand request limit: you can have five thousand CPUs concurrently loading from your storage and processing your data.
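Roughly what the odc-stac demo and the lazy loading pattern look like in code: a sketch using the public pystac-client and odc-stac packages. The catalog URL, collection name, bounding box and load parameters are assumptions standing in for what was shown on screen.

```python
import odc.stac
import pystac_client

# A public STAC API holding Sentinel-2 COGs (assumed; substitute your own catalog).
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")

# Search for scenes over Adelaide for January 2023.
items = list(
    catalog.search(
        collections=["sentinel-2-l2a"],
        bbox=[138.4, -35.1, 139.0, -34.6],
        datetime="2023-01",
    ).items()
)
print(len(items), "scenes found")

# Load them into the Open Data Cube / xarray data model. Passing chunks
# makes the result a lazy, dask-backed array: nothing is read yet.
ds = odc.stac.load(
    items,
    bands=["red", "green", "blue"],
    crs="EPSG:3577",
    resolution=30,
    chunks={"x": 2048, "y": 2048},
)

# An example lazy computation, only evaluated when .compute() is called.
composite = ds.median(dim="time")
result = composite.compute()
```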
The data model has some shortcomings, though. I did a lot of my work in my PhD on SAR, and when you're trying to reproject SAR data, because SAR pixels are in the complex domain, Open Data Cube needs some enhancements that aren't there yet to convert the complex data and restructure it onto even grids, or to keep uneven grids; you can express uneven grids in the coordinate system, but it's not implemented yet.

The other one is the hyperspectral case I was mentioning. Because of the way we make things easy with string aliases for different bands, you would either end up trying to name a lot of things, or you just add another dimension: like a numbered account, instead of names you just have numbers on the bands, with some mapping elsewhere to say which wavelength ranges your bands cover.

Here's a quick example of the maths for the modern hyperspectral satellites coming online, just some back-of-the-envelope maths. You have 7.6 million square kilometres in Australia; at 30-metre pixels (7.6 million km squared divided by 900 square metres per pixel) that's about 8.5 billion spectra over Australia, each across all the bands. Make that a daily mosaic, multiply by years of data, and it builds out to exabytes of data. Open Data Cube can store it, but it doesn't yet have a good way of scaling out to read that very quickly in parallel and process it.

So, to talk about the scale of the processing that has been done, and the Dask way of doing it: I think a survey found roughly ten percent of the Python community uses Dask. Just quickly, how many people here use Dask? A few people; so maybe less than ten percent in this audience, but around ten percent of the overall Python community uses Dask. Dask is a structured way of doing multiprocessing over numerical data, or indeed any other data.
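A minimal sketch of that style of lazy, chunked computation with Dask, using synthetic data and a small in-process cluster rather than anything ODC-specific; the same driving code scales out to the Kubernetes deployments described next.

```python
import dask.array as da
from dask.distributed import Client

# A small in-process cluster for the sketch; in production this would be a
# dask-kubernetes or dask-gateway cluster with hundreds of workers, but the
# client-side code looks the same.
client = Client(processes=False, n_workers=2, threads_per_worker=2)

# A lazily evaluated 100,000 x 100,000 array split into 2,000 x 2,000 chunks.
arr = da.random.random((100_000, 100_000), chunks=(2_000, 2_000))

# Building the expression only records a task graph...
row_means = arr.mean(axis=1)

# ...nothing is generated or reduced until a result is requested.
print(row_means[:5].compute())

client.close()
```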
For Open Data Cube's usage of Dask, it basically builds a Dask array wrapping NumPy arrays, which is then passed around between the nodes or threads, depending on the sort of cluster you're setting up, in order to scale out the processing. You can start up a local cluster, or you can start up a large thousand-CPU cluster; it depends on what sort of resources you have available.

I work in Kubernetes land most of the time: you set up a job and run pods spread across multiple nodes, where the same Python code is sent out, parameterized with the different data sources that Open Data Cube has looked up, and then operations are applied by each pod independently, based on the Dask graph you have set up through Open Data Cube, to perform your processing.

This is a sample of processing at scale; I think bigger runs than this have been done. This was done as part of the Digital Earth Africa project, and it shows starting up a 4,000-CPU cluster and consuming 50 terabytes of RAM to do a whole-of-Africa processing run. These are some of the results you get using the Landsat data archive: say, the evolution of a city in Egypt.

Beyond the core functionality of being able to load and index lots and lots of satellite imagery, the Open Data Cube project also has a few web applications that are offshoots of it. One of them is essentially being able to see the index, what data you have in your data cube, and to query how much of it is there. This is the Data Cube Explorer application; it shows that on AWS the Digital Earth Australia project has around 400,000 datasets covering all of Australia for this particular collection of data.

There is also an open web services component, which produces the tile services, WMS and WMTS, as well as
a service for actually fetching the data, the Web Coverage Service. This particular instance is from the project I work on, called AquaWatch, which stores data for aquatic reflectance.

That's a segue to the different Open Data Cubes running as infrastructure. You can have it locally as your own data cube, but there are also large-scale deployments: Digital Earth Australia from Geoscience Australia is one of them, I worked on Digital Earth Africa, and currently some people are trying to set up something for the Pacific.

My watch is telling me to take a break; good timing, as I'm coming to an end. I work on the CSIRO EASI data cubes, one of which is the AquaWatch one, and there are lots of them around the world from CSIRO. Open Data Cube is a core component, but there is lots of other stuff around it; there are a thousand moving parts, so I'll just put this diagram up.

Alternative index backends are something we are looking to enhance: to scale down, so that you have embedded databases, and to scale up, so that you don't need a database at all but can query the blob stores directly and index them through some sort of data lake approach. We are also adding hyperspectral support, to create these large task graphs across the band dimension efficiently and be able to index that properly.

Happy to take any questions.

Thank you, Tish. Great. We might be able to fit in one question quickly, if anyone's got any, as long as it's a short answer. I can't see... oh, just over here.

Thanks very much for your presentation. You mentioned that there were some issues using SAR satellite data. Does that mean we cannot use this tool for SAR satellite data, or is it just slower, with some
limitations? And also, can you use this with private satellite data providers, or is it limited to Landsat and European Space Agency products?

So, there is a limitation in using single-look complex data, which is not actual reflectance data but comes from earlier in the SAR processing chain, where the pixels are not square and the values are complex numbers. But if you have your data processed to the point where you only see the reflectance, then that's fine; there is actually a Sentinel-1 collection for all of Africa, and so on, available. Any data can be put into it. People build their own private data cubes based on, say, Planet or any other satellite imagery they have, but then they have to store it themselves and manage access to it and so on; they have to set up the infrastructure to manage that private data themselves. A lot of people have used Open Data Cube in their own private instances to manage their data collections.

Thanks a lot. Brilliant, thank you. I mean, the sheer amount of data involved in this kind of thing makes my head spin, to be honest, but thank you so much for that, and here's a token of our appreciation. Thank you, there you go. Thanks a lot. Thank you, Tish.